About 31109 - gprofng not built and installed in a combined binutils+gcc build
Hi, I asked in https://sourceware.org/bugzilla/show_bug.cgi?id=31109 > I prepared a patch for the releases/gcc-13 branch. > Richard Biener rejected my patch for this branch. > Which branch should I use? master, trunk or something else? Do you really need gprofng in the gcc repo ? if yes: the fix is trivial. I did for the releases/gcc-13 branch: git cherry-pick 24552056fd5fc677c0d032f54a5cad1c4303d312 Can anyone do the same for the correct branch. I have no write permissions for gcc.gnu.org/git/gcc.git I maintain binutils-gdb/gprofng. Who will maintain gcc/gprofng ? If no: may I close 31109 ? Thank you, -Vladimir .
Fwd: [PATCH] gprofng: a new GNU profiler
PING. If the patch is approved, can anyone apply this patch. I don't have permissions for `git push`. Thank you, -Vladimir Forwarded Message Subject:[PATCH] gprofng: a new GNU profiler Date: Wed, 13 Dec 2023 17:23:01 -0800 From: vladimir.mezent...@oracle.com To: gcc-patches@gcc.gnu.org CC: Vladimir Mezentsev From: Vladimir Mezentsev This is fixes for releases/gcc-13 for 31109 gprofng not built and installed in a combined binutils+gcc build I only cherry-picked 24552056fd5fc677c0d032f54a5cad1c4303d312 and tested my build. ChangeLog: * Makefile.def: Add gprofng module. * configure.ac: Add --enable-gprofng option. * Makefile.in: Regenerate. * configure: Regenerate. include/ChangeLog: * collectorAPI.h: New file. * libcollector.h: New file. * libfcollector.h: New file. --- Makefile.def | 11 + Makefile.in | 497 configure | 18 ++ configure.ac | 14 ++ include/collectorAPI.h | 73 ++ include/libcollector.h | 89 +++ include/libfcollector.h | 42 7 files changed, 744 insertions(+) create mode 100644 include/collectorAPI.h create mode 100644 include/libcollector.h create mode 100644 include/libfcollector.h diff --git a/Makefile.def b/Makefile.def index 35e994eb77e..9d43b78b51b 100644 --- a/Makefile.def +++ b/Makefile.def @@ -72,6 +72,7 @@ host_modules= { module= isl; lib_path=.libs; bootstrap=true; no_install= true; }; host_modules= { module= gold; bootstrap=true; }; host_modules= { module= gprof; }; +host_modules= { module= gprofng; }; // intl acts on 'host_shared' directly, and does not support --with-pic. host_modules= { module= intl; bootstrap=true; }; host_modules= { module= tcl; @@ -510,6 +511,16 @@ dependencies = { module=all-gprof; on=all-bfd; }; dependencies = { module=all-gprof; on=all-opcodes; }; dependencies = { module=all-gprof; on=all-intl; }; dependencies = { module=all-gprof; on=all-gas; }; + +dependencies = { module=configure-gprofng; on=configure-intl; }; +dependencies = { module=all-gprofng; on=all-libiberty; }; +dependencies = { module=all-gprofng; on=all-bfd; }; +dependencies = { module=all-gprofng; on=all-opcodes; }; +dependencies = { module=all-gprofng; on=all-intl; }; +dependencies = { module=all-gprofng; on=all-gas; }; +dependencies = { module=install-gprofng; on=install-opcodes; }; +dependencies = { module=install-gprofng; on=install-bfd; }; + dependencies = { module=configure-ld; on=configure-intl; }; dependencies = { module=all-ld; on=all-libiberty; }; dependencies = { module=all-ld; on=all-bfd; }; diff --git a/Makefile.in b/Makefile.in index 06a9398e172..2ea8943dc3b 100644 --- a/Makefile.in +++ b/Makefile.in @@ -1084,6 +1084,7 @@ configure-host: \ maybe-configure-isl \ maybe-configure-gold \ maybe-configure-gprof \ + maybe-configure-gprofng \ maybe-configure-intl \ maybe-configure-tcl \ maybe-configure-itcl \ @@ -1237,6 +1238,7 @@ all-host: maybe-all-isl all-host: maybe-all-gold @endif gold-no-bootstrap all-host: maybe-all-gprof +all-host: maybe-all-gprofng @if intl-no-bootstrap all-host: maybe-all-intl @endif intl-no-bootstrap @@ -1376,6 +1378,7 @@ info-host: maybe-info-mpc info-host: maybe-info-isl info-host: maybe-info-gold info-host: maybe-info-gprof +info-host: maybe-info-gprofng info-host: maybe-info-intl info-host: maybe-info-tcl info-host: maybe-info-itcl @@ -1466,6 +1469,7 @@ dvi-host: maybe-dvi-mpc dvi-host: maybe-dvi-isl dvi-host: maybe-dvi-gold dvi-host: maybe-dvi-gprof +dvi-host: maybe-dvi-gprofng dvi-host: maybe-dvi-intl dvi-host: maybe-dvi-tcl dvi-host: maybe-dvi-itcl @@ -1556,6 +1560,7 @@ pdf-host: maybe-pdf-mpc pdf-host: maybe-pdf-isl pdf-host: maybe-pdf-gold pdf-host: maybe-pdf-gprof +pdf-host: maybe-pdf-gprofng pdf-host: maybe-pdf-intl pdf-host: maybe-pdf-tcl pdf-host: maybe-pdf-itcl @@ -1646,6 +1651,7 @@ html-host: maybe-html-mpc html-host: maybe-html-isl html-host: maybe-html-gold html-host: maybe-html-gprof +html-host: maybe-html-gprofng html-host: maybe-html-intl html-host: maybe-html-tcl html-host: maybe-html-itcl @@ -1736,6 +1742,7 @@ TAGS-host: maybe-TAGS-mpc TAGS-host: maybe-TAGS-isl TAGS-host: maybe-TAGS-gold TAGS-host: maybe-TAGS-gprof +TAGS-host: maybe-TAGS-gprofng TAGS-host: maybe-TAGS-intl TAGS-host: maybe-TAGS-tcl TAGS-host: maybe-TAGS-itcl @@ -1826,6 +1833,7 @@ install-info-host: maybe-install-info-mpc install-info-host: maybe-install-info-isl install-info-host: maybe-install-info-gold install-info-host: maybe-install-info-gprof +install-info-host: maybe-install-info-gprofng install-info-host: maybe-install-info-intl install-info-host: maybe-install-info-tcl install-info-host: maybe-install-info-itcl @@ -1916,6 +1924,7 @@ install-dvi-host: maybe-install-dvi-mpc install-dvi-host: maybe-install-dvi-isl install-dvi-host: maybe-install-dvi-gold install-dvi-host: maybe-install-dvi-gprof +install-dvi-host: maybe-install-dvi-gprofng install-dvi-host: maybe-install-dvi-intl install-dvi-host: maybe-install-dvi-tcl install-dvi-host: maybe-i
[PATCH] gprofng: a new GNU profiler
From: Vladimir Mezentsev This is fixes for releases/gcc-13 for 31109 gprofng not built and installed in a combined binutils+gcc build I only cherry-picked 24552056fd5fc677c0d032f54a5cad1c4303d312 and tested my build. ChangeLog: * Makefile.def: Add gprofng module. * configure.ac: Add --enable-gprofng option. * Makefile.in: Regenerate. * configure: Regenerate. include/ChangeLog: * collectorAPI.h: New file. * libcollector.h: New file. * libfcollector.h: New file. --- Makefile.def| 11 + Makefile.in | 497 configure | 18 ++ configure.ac| 14 ++ include/collectorAPI.h | 73 ++ include/libcollector.h | 89 +++ include/libfcollector.h | 42 7 files changed, 744 insertions(+) create mode 100644 include/collectorAPI.h create mode 100644 include/libcollector.h create mode 100644 include/libfcollector.h diff --git a/Makefile.def b/Makefile.def index 35e994eb77e..9d43b78b51b 100644 --- a/Makefile.def +++ b/Makefile.def @@ -72,6 +72,7 @@ host_modules= { module= isl; lib_path=.libs; bootstrap=true; no_install= true; }; host_modules= { module= gold; bootstrap=true; }; host_modules= { module= gprof; }; +host_modules= { module= gprofng; }; // intl acts on 'host_shared' directly, and does not support --with-pic. host_modules= { module= intl; bootstrap=true; }; host_modules= { module= tcl; @@ -510,6 +511,16 @@ dependencies = { module=all-gprof; on=all-bfd; }; dependencies = { module=all-gprof; on=all-opcodes; }; dependencies = { module=all-gprof; on=all-intl; }; dependencies = { module=all-gprof; on=all-gas; }; + +dependencies = { module=configure-gprofng; on=configure-intl; }; +dependencies = { module=all-gprofng; on=all-libiberty; }; +dependencies = { module=all-gprofng; on=all-bfd; }; +dependencies = { module=all-gprofng; on=all-opcodes; }; +dependencies = { module=all-gprofng; on=all-intl; }; +dependencies = { module=all-gprofng; on=all-gas; }; +dependencies = { module=install-gprofng; on=install-opcodes; }; +dependencies = { module=install-gprofng; on=install-bfd; }; + dependencies = { module=configure-ld; on=configure-intl; }; dependencies = { module=all-ld; on=all-libiberty; }; dependencies = { module=all-ld; on=all-bfd; }; diff --git a/Makefile.in b/Makefile.in index 06a9398e172..2ea8943dc3b 100644 --- a/Makefile.in +++ b/Makefile.in @@ -1084,6 +1084,7 @@ configure-host: \ maybe-configure-isl \ maybe-configure-gold \ maybe-configure-gprof \ +maybe-configure-gprofng \ maybe-configure-intl \ maybe-configure-tcl \ maybe-configure-itcl \ @@ -1237,6 +1238,7 @@ all-host: maybe-all-isl all-host: maybe-all-gold @endif gold-no-bootstrap all-host: maybe-all-gprof +all-host: maybe-all-gprofng @if intl-no-bootstrap all-host: maybe-all-intl @endif intl-no-bootstrap @@ -1376,6 +1378,7 @@ info-host: maybe-info-mpc info-host: maybe-info-isl info-host: maybe-info-gold info-host: maybe-info-gprof +info-host: maybe-info-gprofng info-host: maybe-info-intl info-host: maybe-info-tcl info-host: maybe-info-itcl @@ -1466,6 +1469,7 @@ dvi-host: maybe-dvi-mpc dvi-host: maybe-dvi-isl dvi-host: maybe-dvi-gold dvi-host: maybe-dvi-gprof +dvi-host: maybe-dvi-gprofng dvi-host: maybe-dvi-intl dvi-host: maybe-dvi-tcl dvi-host: maybe-dvi-itcl @@ -1556,6 +1560,7 @@ pdf-host: maybe-pdf-mpc pdf-host: maybe-pdf-isl pdf-host: maybe-pdf-gold pdf-host: maybe-pdf-gprof +pdf-host: maybe-pdf-gprofng pdf-host: maybe-pdf-intl pdf-host: maybe-pdf-tcl pdf-host: maybe-pdf-itcl @@ -1646,6 +1651,7 @@ html-host: maybe-html-mpc html-host: maybe-html-isl html-host: maybe-html-gold html-host: maybe-html-gprof +html-host: maybe-html-gprofng html-host: maybe-html-intl html-host: maybe-html-tcl html-host: maybe-html-itcl @@ -1736,6 +1742,7 @@ TAGS-host: maybe-TAGS-mpc TAGS-host: maybe-TAGS-isl TAGS-host: maybe-TAGS-gold TAGS-host: maybe-TAGS-gprof +TAGS-host: maybe-TAGS-gprofng TAGS-host: maybe-TAGS-intl TAGS-host: maybe-TAGS-tcl TAGS-host: maybe-TAGS-itcl @@ -1826,6 +1833,7 @@ install-info-host: maybe-install-info-mpc install-info-host: maybe-install-info-isl install-info-host: maybe-install-info-gold install-info-host: maybe-install-info-gprof +install-info-host: maybe-install-info-gprofng install-info-host: maybe-install-info-intl install-info-host: maybe-install-info-tcl install-info-host: maybe-install-info-itcl @@ -1916,6 +1924,7 @@ install-dvi-host: maybe-install-dvi-mpc install-dvi-host: maybe-install-dvi-isl install-dvi-host: maybe-install-dvi-gold install-dvi-host: maybe-install-dvi-gprof +install-dvi-host: maybe-install-dvi-gprofng install-dvi-host: maybe-install-dvi-intl install-dvi-host: maybe-install-dvi-tcl install-dvi-host: maybe-install-dvi-itcl @@ -2006,6 +2015,7 @@ install-pdf-host: maybe-install-pdf-mpc install-pdf-host: maybe-install-pdf-isl install-pdf-h
Re: [PATCH] PR gcc/84923 - gcc.dg/attr-weakref-1.c failed on aarch64
On 05/21/2018 01:30 PM, Jeff Law wrote: > On 05/17/2018 02:42 AM, Richard Biener wrote: >> On Thu, 17 May 2018, Kyrill Tkachov wrote: >> >>> Hi, >>> >>> Given this is a midend change it's a good idea to CC some of the maintainers >>> of that area. >>> I've copied richi and Honza. >> The patch is ok for trunk (it's actually mine...) and for the branch >> after a while. > Installed. > > Vladimir -- do you have write access? No. I don't. > ISTM it would make sense for you > to have access if you plan to continue submitting changes. yes. I plan to continue submitting changes. -Vladimir > > Jeff
Re: [PATCH] PR gcc/84923 - gcc.dg/attr-weakref-1.c failed on aarch64
Ping. -Vladimir On 05/10/2018 11:30 PM, vladimir.mezent...@oracle.com wrote: > From: Vladimir Mezentsev > > When weakref_targets is not empty a target cannot be removed from the weak > list. > A small example is below when 'wv12' is removed from the weak list on aarch64: > static vtype Wv12 __attribute__((weakref ("wv12"))); > extern vtype wv12 __attribute__((weak)); > > Bootstrapped on aarch64-unknown-linux-gnu including (c,c++ and go). > Tested on aarch64-linux-gnu. > No regression. The attr-weakref-1.c test passed. > > ChangeLog: > 2018-05-10 Vladimir Mezentsev > > PR gcc/84923 > * varasm.c (weak_finish): clean up weak_decls > --- > gcc/varasm.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/gcc/varasm.c b/gcc/varasm.c > index 85296b4..8cf6e1e 100644 > --- a/gcc/varasm.c > +++ b/gcc/varasm.c > @@ -5652,7 +5652,8 @@ weak_finish (void) >tree alias_decl = TREE_PURPOSE (t); >tree target = ultimate_transparent_alias_target (&TREE_VALUE (t)); > > - if (! TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (alias_decl))) > + if (! TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (alias_decl)) > + || TREE_SYMBOL_REFERENCED (target)) > /* Remove alias_decl from the weak list, but leave entries for > the target alone. */ > target = NULL_TREE;
[PATCH] PR gcc/84923 - gcc.dg/attr-weakref-1.c failed on aarch64
From: Vladimir Mezentsev When weakref_targets is not empty a target cannot be removed from the weak list. A small example is below when 'wv12' is removed from the weak list on aarch64: static vtype Wv12 __attribute__((weakref ("wv12"))); extern vtype wv12 __attribute__((weak)); Bootstrapped on aarch64-unknown-linux-gnu including (c,c++ and go). Tested on aarch64-linux-gnu. No regression. The attr-weakref-1.c test passed. ChangeLog: 2018-05-10 Vladimir Mezentsev PR gcc/84923 * varasm.c (weak_finish): clean up weak_decls --- gcc/varasm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/gcc/varasm.c b/gcc/varasm.c index 85296b4..8cf6e1e 100644 --- a/gcc/varasm.c +++ b/gcc/varasm.c @@ -5652,7 +5652,8 @@ weak_finish (void) tree alias_decl = TREE_PURPOSE (t); tree target = ultimate_transparent_alias_target (&TREE_VALUE (t)); - if (! TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (alias_decl))) + if (! TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (alias_decl)) + || TREE_SYMBOL_REFERENCED (target)) /* Remove alias_decl from the weak list, but leave entries for the target alone. */ target = NULL_TREE; -- 1.8.3.1
[PATCH] PR gcc/84923 - gcc.dg/attr-weakref-1.c failed on aarch64
From: Vladimir Mezentsev When weakref_targets is not empty a target cannot be removed from weak_decls. A small example is below when 'wv12' is removed from the weak list on aarch64: static vtype Wv12 __attribute__((weakref ("wv12"))); extern vtype wv12 __attribute__((weak)); Bootstrapped on aarch64-unknown-linux-gnu including (c,c++ and go). Tested on aarch64-linux-gnu. No regression. The attr-weakref-1.c test passed. ChangeLog: 2018-04-12 Vladimir Mezentsev PR gcc/84923 * varasm.c (weak_finish): clean up weak_decls --- gcc/varasm.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/gcc/varasm.c b/gcc/varasm.c index d24bac4..2a70234 100644 --- a/gcc/varasm.c +++ b/gcc/varasm.c @@ -5683,8 +5683,7 @@ weak_finish (void) nor multiple .weak directives for the latter. */ for (p = &weak_decls; (t2 = *p) ; ) { - if (TREE_VALUE (t2) == alias_decl - || target == DECL_ASSEMBLER_NAME (TREE_VALUE (t2))) + if (TREE_VALUE (t2) == alias_decl) *p = TREE_CHAIN (t2); else p = &TREE_CHAIN (t2); -- 1.8.3.1
Re: [PATCH] PR gcc/68256 Defining TARGET_USE_CONSTANT_BLOCKS_P causes go bootstrap failure on aarch64.
On 03/15/2018 07:07 AM, Bin.Cheng wrote: > On Fri, Feb 16, 2018 at 5:18 PM, wrote: >> From: Vladimir Mezentsev >> >> Ramana Radhakrishnan made a workaround in gcc/config/aarch64/aarch64.c to >> resolve >> bootstrap comparison failure (2015-11-10, commit >> bc443a71dafa2e707bae4b2fa74f83b05dea37ab). >> The real bug is in gcc/varasm.c. >> hash_section() returns an unstable value. >> As result, two blocks are created in get_block_for_section() for one unnamed >> section. >> A list of objects in these blocks depends on the -gtoggle option. >> I removed Ramana's workaround in gcc/config/aarch64/aarch64.c and >> I fixed hash_section() in gcc/varasm.c >> >> Bootstrapped on aarch64-unknown-linux-gnu including (c,c++ and go). >> Testing finished ok. > Hi Vladimir, > Thanks for fixing the long standing issue, but this change causes below > failure, > could you have a look? Thanks I did not see this failure. I cannot tell why. I use one CFARM machine (gcc116) and when my fix was approved I clean up my directories on this machine (including binaries and log files). ramana.radhakrish...@arm.com implemented (2015-11-06, commit cd4fcdb8ff16ec2dad56f9e736ac4552f8f52001) a new feature ('Switch constant pools to separate rodata sections'). An additional fix is needed for this feature. The test below demonstrates problem: % cat t.c extern void abort(void); typedef int vtype; static vtype Wv12 __attribute__((weakref ("wv12"))); extern vtype wv12 __attribute__((weak)); #define chk(p) do { if (!p) abort (); } while (0) int main () { chk (!&Wv12); chk (!&wv12); return (0); } % gcc t.c /tmp/cciZgRxK.o:(.rodata+0x0): undefined reference to `wv12' /tmp/cciZgRxK.o:(.rodata+0x8): undefined reference to `wv12' % gcc t.c -S % cat t.s .size main, .-main * .weakref Wv12,wv12 *<<<<<< Not here. This should be after definitions of Wv12 and wv12. Wv12 .section .rodata .align 3 .LC0: .xword Wv12 .align 3 .LC1: .xword wv12 I can file a new PR and fix it by Tuesday/Wednesday or we can temporary restore Ramana's workaround in aarch64_use_blocks_for_constant_p: % git diff gcc/config/aarch64/aarch64.c diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 4b5183b..222ea33 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -7735,7 +7735,11 @@ aarch64_use_blocks_for_constant_p (machine_mode, const_rtx) { /* We can't use blocks for constants when we're using a per-function constant pool. */ - return !aarch64_can_use_per_function_literal_pools_p (); + /* Fixme:: + return !aarch64_can_use_per_function_literal_pools_p (); + This return statement breaks gcc.dg/attr-weakref-1.c test. + For now we workaround this by returning false here. */ + return false; } /* Select appropriate section for constants depending -Vladimir > > Failures: > gcc.dg/attr-weakref-1.c > > Bisected to: > > > commit 536728c16d6a0173930ecfe370302baa471c299e > Author: rguenth > Date: Thu Mar 15 08:55:04 2018 + > > 2018-03-15 Vladimir Mezentsev > > PR target/68256 > * varasm.c (hash_section): Return an unchangeble hash value > * config/aarch64/aarch64.c (aarch64_use_blocks_for_constant_p): > Return !aarch64_can_use_per_function_literal_pools_p (). > > > Thanks, > bin >> ChangeLog: >> 2018-02-15 Vladimir Mezentsev >> >> PR gcc/68256 >> * varasm.c (hash_section): Return an unchangeble hash value >> * config/aarch64/aarch64.c (aarch64_use_blocks_for_constant_p): >> Return !aarch64_can_use_per_function_literal_pools_p (); >> --- >> gcc/config/aarch64/aarch64.c | 8 +++- >> gcc/varasm.c | 2 +- >> 2 files changed, 4 insertions(+), 6 deletions(-) >> >> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c >> index 174310c..a0a495d 100644 >> --- a/gcc/config/aarch64/aarch64.c >> +++ b/gcc/config/aarch64/aarch64.c >> @@ -7596,11 +7596,9 @@ aarch64_can_use_per_function_literal_pools_p (void) >> static bool >> aarch64_use_blocks_for_constant_p (machine_mode, const_rtx) >> { >> - /* Fixme:: In an ideal world this would work similar >> - to the logic in aarch64_select_rtx_section but this >> - breaks bootstrap in gcc go. For now we workaround >> - this by returning false here. */ >> - return false; >> + /* We can't use blocks for constants when we're using a per-function >
[PATCH] PR gcc/68256 Defining TARGET_USE_CONSTANT_BLOCKS_P causes go bootstrap failure on aarch64.
From: Vladimir Mezentsev Ramana Radhakrishnan made a workaround in gcc/config/aarch64/aarch64.c to resolve bootstrap comparison failure (2015-11-10, commit bc443a71dafa2e707bae4b2fa74f83b05dea37ab). The real bug is in gcc/varasm.c. hash_section() returns an unstable value. As result, two blocks are created in get_block_for_section() for one unnamed section. A list of objects in these blocks depends on the -gtoggle option. I removed Ramana's workaround in gcc/config/aarch64/aarch64.c and I fixed hash_section() in gcc/varasm.c Bootstrapped on aarch64-unknown-linux-gnu including (c,c++ and go). Testing finished ok. ChangeLog: 2018-02-15 Vladimir Mezentsev PR gcc/68256 * varasm.c (hash_section): Return an unchangeble hash value * config/aarch64/aarch64.c (aarch64_use_blocks_for_constant_p): Return !aarch64_can_use_per_function_literal_pools_p (); --- gcc/config/aarch64/aarch64.c | 8 +++- gcc/varasm.c | 2 +- 2 files changed, 4 insertions(+), 6 deletions(-) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 174310c..a0a495d 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -7596,11 +7596,9 @@ aarch64_can_use_per_function_literal_pools_p (void) static bool aarch64_use_blocks_for_constant_p (machine_mode, const_rtx) { - /* Fixme:: In an ideal world this would work similar - to the logic in aarch64_select_rtx_section but this - breaks bootstrap in gcc go. For now we workaround - this by returning false here. */ - return false; + /* We can't use blocks for constants when we're using a per-function + constant pool. */ + return !aarch64_can_use_per_function_literal_pools_p (); } /* Select appropriate section for constants depending diff --git a/gcc/varasm.c b/gcc/varasm.c index b045efa..5aae5b4 100644 --- a/gcc/varasm.c +++ b/gcc/varasm.c @@ -225,7 +225,7 @@ hash_section (section *sect) { if (sect->common.flags & SECTION_NAMED) return htab_hash_string (sect->named.name); - return sect->common.flags; + return sect->common.flags & ~SECTION_DECLARED; } /* Helper routines for maintaining object_block_htab. */ -- 1.8.3.1
Re: [PATCH] PR libgcc/59714 complex division is surprising on aarch64
On 02/06/2018 08:53 AM, Joseph Myers wrote: > The files in libgcc/soft-fp must be verbatim copies of the master sources > in glibc. So you can't make any local changes to them, and if you think > changes are needed you need to patch things upstream in glibc first, with > a proper extended explanation of why the fix is needed and why it is safe > and the appropriate design for the fix. There's nothing at all in this > patch submission to explain that change. It is a bug/typo in gcc/libgcc/soft-fp/quad.h when _FP_W_TYPE_SIZE is 64 (for example, on aarch64) . It looks like a code after line 202 was never used. % cat -n libgcc/soft-fp/quad.h ... 201 202 #else /* not _FP_W_TYPE_SIZE < 64 */ 203 union _FP_UNION_Q 204 { 205 TFtype flt /* __attribute__ ((mode (TF))) */ ; 206 struct *_FP_STRUCT_LAYOUT* 207 { 208 _FP_W_TYPE a, b; 209 } longs; 210 struct *_FP_STRUCT_LAYOUT* 211 { 212 # if __BYTE_ORDER == __BIG_ENDIAN 213 unsigned sign : 1; 214 unsigned exp : _FP_EXPBITS_Q; 215 _FP_W_TYPE frac1 : _FP_FRACBITS_Q - (_FP_IMPLBIT_Q != 0) - _FP_W_TYPE_SIZE; 216 _FP_W_TYPE frac0 : _FP_W_TYPE_SIZE; 217 # else 218 _FP_W_TYPE frac0 : _FP_W_TYPE_SIZE; 219 _FP_W_TYPE frac1 : _FP_FRACBITS_Q - (_FP_IMPLBIT_Q != 0) - _FP_W_TYPE_SIZE; 220 unsigned exp : _FP_EXPBITS_Q; 221 unsigned sign : 1; 222 # endif 223 } bits; 224 }; We see 'struct _FP_STRUCT_LAYOUT' is declared twice (in lines 206 and 210) inside union _FP_UNION_Q. Compiler reports warning. -Vladimir P.S.: gcc/libgcc/soft-fp/ and glibc/soft-fp/ are not synchronized now. For example: % diff gcc/libgcc/soft-fp/quad.h glibc/soft-fp/ 3c3 < Copyright (C) 1997-2016 Free Software Foundation, Inc. --- > Copyright (C) 1997-2018 Free Software Foundation, Inc. 96c96 < } bits __attribute__ ((packed)); --- > } bits;
Re: [PATCH] PR libgcc/59714 complex division is surprising on aarch64
On 01/29/2018 12:51 PM, Joseph Myers wrote: > On Mon, 29 Jan 2018, vladimir.mezent...@oracle.com wrote: > >>> What about powerpc __divkc3? >>> >>> What was the rationale for using soft-fp rather than adding appropriate >>> built-in functions as suggested in a comment? >> I had a version with built-in functions and I can restore it. >> >> Let's design what we want to use soft-fp or built-in function. >> I'd prefer to use built-in functions but performance is in two times worse. >> >> The test below demonstrates performance degradation: > So, this is comparing __builtin_frexp (extracting both exponent and > mantissa, including appropriate handling of exponents of subnormal values) > with just extracting the exponent and with no special handling of zero / > infinity / NaN / subnormal values. I compared with __builtin_ilogb. There is a same performance degradation. > We need to consider what functionality is actually needed for this > scaling, if we do wish to use such scaling based on integer exponents. If > what's needed corresponds exactly to some standard functions such as ilogb > and scalbn with all their special cases, then built-in versions of those > standard functions may make sense. The IEEE format for double is . We need a function like extract_exponent_field. All standard function make an additional work to get exponent. It is a reason why we see performance degradation. % cat r2.c #include int main(int argc, char** argv) { double d = 0x0.004p-1023; printf("d=%-24.13a exp=%d\n", d, __builtin_ilogb(d)); return 0; } % gcc -O2 -lm r2.c ; ./a.out d=0x0.00200p-1022 *exp=-1033 * -Vladimir > If what's needed does not correspond > to standard functions and does not need to handle such special cases, > that's more of an indication for examining the representation in libgcc - > but it's still necessary to handle the many different variant > floating-point formats supported, or to safely avoid using the new code > for formats it can't handle. >
[PATCH] PR libgcc/59714 complex division is surprising on aarch64
From: Vladimir Mezentsev Tested on aarch64-linux-gnu, x86_64-pc-linux-gnu and sparc64-unknown-linux-gnu. No regression. New tests now passed. There is a performance degradation for complex double type: failed cases |time old new| old new complex-32 150 | 3.51 3.56 complex-64 190 | 1.88 2.34 complex-128 340 | 3.71 2.88 The formula for complex division (a+bi)/(c+di) is (a*c+b*d)/(c*c+d*d) + (b*c-a*d)/(c*c+d*d)i The current implementation avoids overflow in (c*c + d*d) using x=MAX(|c|, |d|); c1=c/x; d1=d/x; and then calculating the result as (a*c1+b*d1)/(c1*c1+d1*d1)*(1.0/x) + (b*c1-a*d1)/(c1*c1+d1*d1)*(1.0/x)i This approach has two issues: 1. Unexpected rounding when x is not a power of 2.0 For example, (1.0+3.0i) /(1.0+3.0i) should be 1.0 but it is (1.0 - 1.66533e-17i) on aarch64 (PR59714) 2. Unexpected underflow in c/x or d/x. For example, (0x1.0p+1023 + 0x0.8p-1022 i)/(0x1.0p+677 + 0x1.0p-677 i) should be (0x1.0p+346 - 0x1.0p-1008 i) but it is (0x1.0p+346) My implementation avoids these issues. I use soft-fp instead built-in functions because performance with built-in functions (like __builtin_ilogb) is in two times worse. Also libm is needed for built-in functions. Restrictions: An old implementation is used when an architecture doesn't provide sfp-machine.h An old implementation is used for __divkc3 (I have no access on rs6000 for testing) ChangeLog: 2018-02-05 Vladimir Mezentsev PR libgcc/59714 * libgcc/libgcc2.c: New complex division implementation * libgcc/soft-fp/quad.h: struct _FP_STRUCT_LAYOUT was redefined * libgcc/config/no-sfp-machine.h: Add _NO_SFP_MACHINE_H * libgcc/config/sparc/sfp-machine.h: New * gcc/testsuite/gcc.c-torture/execute/pr59714_128.c: New test * gcc/testsuite/gcc.c-torture/execute/pr59714_32.c: New test * gcc/testsuite/gcc.c-torture/execute/pr59714_64.c: New test --- gcc/testsuite/gcc.c-torture/execute/pr59714_128.c | 506 ++ gcc/testsuite/gcc.c-torture/execute/pr59714_32.c | 506 ++ gcc/testsuite/gcc.c-torture/execute/pr59714_64.c | 506 ++ libgcc/config/no-sfp-machine.h| 3 + libgcc/config/sparc/sfp-machine.h | 14 + libgcc/libgcc2.c | 178 +++- libgcc/soft-fp/quad.h | 2 +- 7 files changed, 1710 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr59714_128.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr59714_32.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr59714_64.c create mode 100644 libgcc/config/sparc/sfp-machine.h diff --git a/gcc/testsuite/gcc.c-torture/execute/pr59714_128.c b/gcc/testsuite/gcc.c-torture/execute/pr59714_128.c new file mode 100644 index 000..e26f40a --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr59714_128.c @@ -0,0 +1,506 @@ +#include +#include + +extern void abort (void); +#define FABS(x) ((x) >= 0 ? (x) : -(x)) +#define FTYPE long double +#define FMT "%-39.28La" + +/* + Tables of numbers for complex division testing + All values in hex format + z[i] == correctly rounded value of x[i]/y[i]; + */ + +#define NUMCNT 100 + +__complex FTYPE x[NUMCNT] = + { +0x1.0p0L + 0x1.0p0Li, /* #1 */ +0x1.0p0L + 0x1.0p0Li, /* #2 */ +0x1.0p1023L + 0x1.0p-1023Li, /* #3 */ +0x1.0p1023L + 0x1.0p1023Li,/* #4 */ +0x1.0p1020L + 0x1.0p-844Li,/* #5 */ +0x1.0p-71L + 0x1.0p1021Li, /* #6 */ +0x1.0p-347L + 0x1.0p-54Li, /* #7 */ +0x1.0p-1074L + 0x1.0p-1074Li, /* #8 */ +0x1.0p1015L + 0x1.0p-989Li,/* #9 */ +0x1.0p-622L + 0x1.0p-1071Li, /* #10 */ +0x1.0p0L + 0x3.0p0Li, /* #11 */ +-0x1.0p0L + 0x3.0p0Li, /* #12 */ +0x1.0p0L - 0x3.0p0Li, /* #13 */ +0x3.0p0L + 0x1.0p0Li, /* #14 */ +0x3.0p0L + 0x4.0p0Li, /* #15 */ +0x1.0p0L - 0x3.1p-20Li,/* #16 */ +0x1.0p0L + 0x3.001p-20Li, /* #17 */ +0x1.0p0L + 0x1.80002p1Li, /* #18 */ +0x1.0p0L + 0x1.80004p1Li, /* #19 */ +0x1.0p0L + 0x1.80008p1Li, /* #20 */ +0x1.0p0L + 0x1.80010p1Li, /* #21 */ +0x1.0p0L + 0x1.80020p1Li, /* #22 */ +0x1.0p0L + 0x1.80040p1Li, /* #23 */ +0x1.0p0L + 0x1.80080p1Li, /* #24 */ +0x1.0p0L + 0x1.7ff8p1Li, /* #25 */ +0x1.0p0L + 0x1.7ff0p1Li, /* #26 */ +0x1.0p0L + 0x1.7f80p1Li, /* #27 */ +0x1.0p0L + 0x1.7f00p1Li, /* #28 */ +0x5.0p0L + 0x5.0p0Li, /* #29 */ +0x4.0p0L + 0x3.0p0Li, /* #30 */ +-0x4.0p0L + 0x3.1p0Li, /* #31 */ +0x3.001p-20L + 0x1.0p0Li, /* #32 */ +
Re: [PATCH] PR libgcc/59714 complex division is surprising on aarch64
On 01/25/2018 04:14 PM, Joseph Myers wrote: > On Thu, 25 Jan 2018, vladimir.mezent...@oracle.com wrote: > >> From: Vladimir Mezentsev >> >> Tested on aarch64-linux-gnu, x86_64-pc-linux-gnu and >> sparc64-unknown-linux-gnu. >> No regression. New tests now passed. >> There is a performance degradation for complex double type: > This is, of course, something to consider for GCC 9; it's not suitable for > the current development stage. Why ? > > Could you provide a more extended writeup of the issue (starting with the > argument for it being a bug at all), the approach you took and the > rationale for the approach, when you resubmit the patch for GCC 9 stage 1? The formula for complex division (a+bi)/(c+di) is (a*c+b*d)/(c*c+d*d) + (b*c-a*d)/(c*c+d*d)i The current implementation avoids overflow in (c*c + d*d) using x=MAX(|c|, |d|); c1=c/x; d1=d/x; and then calculating the result as (a*c1+b*d1)/(c1*c1+d1*d1)*(1.0/x) + (b*c1-a*d1)/(c1*c1+d1*d1)*(1.0/x)i This approach has two issues: 1. Unexpected rounding when x is not a power of 2.0 For example, (1.0+3.0i) /(1.0+3.0i) should be 1.0 but it is (1.0 - 1.66533e-17i) on aarch64 (PR59714) 2. Unexpected underflow in c/x or d/x. For example, (0x1.0p+1023 + 0x0.8p-1022 i)/(0x1.0p+677 + 0x1.0p-677 i) should be (0x1.0p+346 - 0x1.0p-1008 i) but it is (0x1.0p+346) My implementation avoids these issues. > >> * libgcc/config/sparc/sfp-machine.h: New > Are you introducing a requirement for all architectures to provide > sfp-machine.h? If so, did you determine that SPARC was the only one > lacking it? If not the only one lacking it, you need to make sure that > you do not break any existing architecture in GCC. > > What about architectures using non-IEEE formats? soft-fp is not suitable > for those, but you mustn't break them. Remember that e.g. TFmode can be > IBM long double, and other ?Fmode similarly need not be IEEE; the name > only indicates how many bytes are in the format, nothing else about it. Agree. An additional fix is needed. Build can be broken without sfp-mashine.h or on architectures using non-IEEE format. > > What about powerpc __divkc3? > > What was the rationale for using soft-fp rather than adding appropriate > built-in functions as suggested in a comment? I had a version with built-in functions and I can restore it. Let's design what we want to use soft-fp or built-in function. I'd prefer to use built-in functions but performance is in two times worse. The test below demonstrates performance degradation: % cat r1.c #include int main(int argc, char** argv) { int i; long long sum = 0; for(i = 1; i < 10; i++) { int exp; double d = (double) i; #ifdef BUITIN double v = __builtin_frexp(d, &exp); #else union _FP_UNION_D { double flt; struct _FP_STRUCT_LAYOUT { unsigned frac0 : 32; unsigned frac1 : 20; unsigned exp : 11; unsigned sign : 1; } bits __attribute__ ((packed)); long long LL; } u; u.flt = d; exp = u.bits.exp - 1022; #endif sum += exp * (i % 2 == 0 ? 1 : -1); } printf("sum = %d\n", sum); return 0; } %gcc -O2 -DBUITIN r1.c ; time ./a.out real 0m4.247s user 0m4.243s sys 0m0.001s %gcc -O2 r1.c ; time ./a.out real 0m1.977s user 0m1.973s sys 0m0.001s -Vladimir
[PATCH] PR libgcc/59714 complex division is surprising on aarch64
From: Vladimir Mezentsev Tested on aarch64-linux-gnu, x86_64-pc-linux-gnu and sparc64-unknown-linux-gnu. No regression. New tests now passed. There is a performance degradation for complex double type: failed cases |time old new| old new complex-32 150 | 3.51 3.56 complex-64 190 | 1.88 2.34 complex-128 340 | 3.71 2.88 ChangeLog: 2018-01-25 Vladimir Mezentsev PR libgcc/59714 * libgcc/libgcc2.c: New complex division implementation * libgcc/config/sparc/sfp-machine.h: New * gcc/testsuite/gcc.c-torture/execute/pr59714_128.c: New test * gcc/testsuite/gcc.c-torture/execute/pr59714_32.c: New test * gcc/testsuite/gcc.c-torture/execute/pr59714_64.c: New test --- gcc/testsuite/gcc.c-torture/execute/pr59714_128.c | 506 ++ gcc/testsuite/gcc.c-torture/execute/pr59714_32.c | 506 ++ gcc/testsuite/gcc.c-torture/execute/pr59714_64.c | 506 ++ libgcc/config/sparc/sfp-machine.h | 13 + libgcc/libgcc2.c | 175 +++- 5 files changed, 1700 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr59714_128.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr59714_32.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr59714_64.c create mode 100644 libgcc/config/sparc/sfp-machine.h diff --git a/gcc/testsuite/gcc.c-torture/execute/pr59714_128.c b/gcc/testsuite/gcc.c-torture/execute/pr59714_128.c new file mode 100644 index 000..e26f40a --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr59714_128.c @@ -0,0 +1,506 @@ +#include +#include + +extern void abort (void); +#define FABS(x) ((x) >= 0 ? (x) : -(x)) +#define FTYPE long double +#define FMT "%-39.28La" + +/* + Tables of numbers for complex division testing + All values in hex format + z[i] == correctly rounded value of x[i]/y[i]; + */ + +#define NUMCNT 100 + +__complex FTYPE x[NUMCNT] = + { +0x1.0p0L + 0x1.0p0Li, /* #1 */ +0x1.0p0L + 0x1.0p0Li, /* #2 */ +0x1.0p1023L + 0x1.0p-1023Li, /* #3 */ +0x1.0p1023L + 0x1.0p1023Li,/* #4 */ +0x1.0p1020L + 0x1.0p-844Li,/* #5 */ +0x1.0p-71L + 0x1.0p1021Li, /* #6 */ +0x1.0p-347L + 0x1.0p-54Li, /* #7 */ +0x1.0p-1074L + 0x1.0p-1074Li, /* #8 */ +0x1.0p1015L + 0x1.0p-989Li,/* #9 */ +0x1.0p-622L + 0x1.0p-1071Li, /* #10 */ +0x1.0p0L + 0x3.0p0Li, /* #11 */ +-0x1.0p0L + 0x3.0p0Li, /* #12 */ +0x1.0p0L - 0x3.0p0Li, /* #13 */ +0x3.0p0L + 0x1.0p0Li, /* #14 */ +0x3.0p0L + 0x4.0p0Li, /* #15 */ +0x1.0p0L - 0x3.1p-20Li,/* #16 */ +0x1.0p0L + 0x3.001p-20Li, /* #17 */ +0x1.0p0L + 0x1.80002p1Li, /* #18 */ +0x1.0p0L + 0x1.80004p1Li, /* #19 */ +0x1.0p0L + 0x1.80008p1Li, /* #20 */ +0x1.0p0L + 0x1.80010p1Li, /* #21 */ +0x1.0p0L + 0x1.80020p1Li, /* #22 */ +0x1.0p0L + 0x1.80040p1Li, /* #23 */ +0x1.0p0L + 0x1.80080p1Li, /* #24 */ +0x1.0p0L + 0x1.7ff8p1Li, /* #25 */ +0x1.0p0L + 0x1.7ff0p1Li, /* #26 */ +0x1.0p0L + 0x1.7f80p1Li, /* #27 */ +0x1.0p0L + 0x1.7f00p1Li, /* #28 */ +0x5.0p0L + 0x5.0p0Li, /* #29 */ +0x4.0p0L + 0x3.0p0Li, /* #30 */ +-0x4.0p0L + 0x3.1p0Li, /* #31 */ +0x3.001p-20L + 0x1.0p0Li, /* #32 */ +0x1.001p-80L + 0x1.0p0Li, /* #33 */ +0x1.001p-8000L + 0x1.0p0Li,/* #34 */ +0x1.0p0L + 0x1.001p-8000Li,/* #35 */ +-0x1.0p0L + 0x1.001p-8000Li,/* #36 */ +-0x1.0p0L + 0x1.001p-8000Li,/* #37 */ +-0x1.0p0L + 0x1.001p-8000Li,/* #38 */ +0x1.0p0L - 0x1.001p-8000Li,/* #39 */ +0x1.0p1000L - 0x1.001p-8000Li, /* #40 */ +0x1.97530p1000L - 0x1.001p-8000Li, /* #41 */ +0x1.97530p3000L - 0x1.001p-8000Li, /* #42 */ +0x1.97530p3770L - 0x1.001p-8000Li, /* #43 */ +0x1.97530p4770L - 0x1.001p-7000Li, /* #44 */ +0x1.97530p0L - 0x1.001p-12000Li, /* #45 */ +0x1.97530p1000L - 0x1.001p-11000Li,/* #46 */ +0x1.97530p2000L - 0x1.001p-9000Li, /* #47 */ +0x1.97530p3000L - 0x1.001p-8000Li, /* #48 */ +0x1.97530p4000L - 0x1.001p-7000Li, /* #49 */ +0x1.97530p5000L - 0x1.001p-6000Li, /* #50 */ +0x1.97530p6000L - 0x1.001p-5000Li, /* #51 */ +0x1.97530p7000L - 0x1.001p-4000Li, /* #52 */ +0x1.97530p8000L - 0x1.001p-1000Li, /* #53 */ +0x1.97530p9000L - 0x1.001p-0Li,/* #54 */ +0x1.97530p1L + 0x1.001p2000Li, /* #55 */ +0x1
Re: [PATCH] PR libgcc/59714 complex division is surprising on aarch64
On 10/25/2017 10:28 AM, Joseph Myers wrote: > On Wed, 25 Oct 2017, vladimir.mezent...@oracle.com wrote: > >> +# Disable FMA (floating-point multiply-add) instructions for complex >> division. >> +# These instructions can produce different result if two operations >> executed separately. >> +LIBGCC2_FFP_CONTRAST_CFLAGS = -ffp-contract=off >> +LIB2_DIV3_FUNCS = _divdc3 _divhc3 _divsc3 > Without regard to whether the change is appropriate in the first place, > I'm doubtful of the logic for selecting floating-point modes. I'd expect > something that applies to all floating-point modes. Even if nothing has > fused XFmode operations, some architectures (e.g. s390, powerpc) have > fused operations on TFmode / KFmode. Hello Joseph and GCC-team, Different people are giving me different directions. I am glad to change my fix but please provide clear directions on how you want to see it. michael.hud...@linaro.org created PR 59714 and suggested to set -ffp-contract=off: > int main(int argc, char** argv) > { > __complex double c = 1.0 + 3.0i; > printf("%g\n", __imag__ (c/c)); > } > ubuntu@arm64:~$ gcc cplx.c -o cplx > ubuntu@arm64:~$ ./cplx > -1.66533e-17 > > This is because libgcc2.c is compiled in a way that lets the compiler used > fused multiply add instructions. > It shouldn't be! > ... > In general, fma makes it hard to reason about expressions > like "a*b-c*d" -- which of the multiplications is being done at the higher precision? > But I guess arguing to fp-contract to default to off is a war I don't want to get into fighting. Thank you, -Vladimir
[PATCH] PR libgcc/59714 complex division is surprising on aarch64
From: Vladimir Mezentsev FMA (floating-point multiply-add) instructions are supported on aarch64. These instructions can produce different result if two operations executed separately. -ffp-contract=off doesn't allow the FMA instructions. Tested on aarch64-unknown-linux-gnu. No regression. Two failed tests now passed. ChangeLog: 2017-10-25 Vladimir Mezentsev PR libgcc/59714 * libgcc/Makefile.in: Add -ffp-contract=off --- libgcc/Makefile.in | 8 1 file changed, 8 insertions(+) diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in index a1a392d..f313556 100644 --- a/libgcc/Makefile.in +++ b/libgcc/Makefile.in @@ -455,6 +455,14 @@ ifeq ($(LIB2_SIDITI_CONV_FUNCS),) lib2funcs += $(subst XX,di,$(dwfloatfuncs)) endif +# Disable FMA (floating-point multiply-add) instructions for complex division. +# These instructions can produce different result if two operations executed separately. +LIBGCC2_FFP_CONTRAST_CFLAGS = -ffp-contract=off +LIB2_DIV3_FUNCS = _divdc3 _divhc3 _divsc3 +lib2-div3-o = $(patsubst %,%$(objext),$(LIB2_DIV3_FUNCS)) +lib2-div3-s-o = $(patsubst %,%_s$(objext),$(LIB2_DIV3_FUNCS)) +$(lib2-div3-o) $(lib2-div3-s-o): LIBGCC2_CFLAGS += $(LIBGCC2_FFP_CONTRAST_CFLAGS) + # These might cause a divide overflow trap and so are compiled with # unwinder info. LIB2_DIVMOD_FUNCS = _divdi3 _moddi3 _divmoddi4 \ -- 1.8.3.1
Re: [PATCH] PR libgcc/59714 complex division is surprising on aarch64
On 10/19/2017 06:07 AM, Wilco Dijkstra wrote: > Vladimir wrote: > > +# Disable floating-point expression contraction > +LIBGCC2_FFP_CONTRAST_CFLAGS = -ffp-contract=off > + > > It looks like this disables fp-contract in all of libgcc... > What is the the number of FMAs in libgcc before/after? In original (without my fix) libgcc_s.so: % objdump -d libgcc_s.so| egrep '(fmadd|fmsub|fnmadd|fnmsub)'|wc -l 38 In my new libgcc_s.so ( see my fix below) % objdump -d libgcc_s.so| egrep '(fmadd|fmsub|fnmadd|fnmsub)'|wc -l 8 > If it disables anything other than the ones in complex division, it > would have an unintended performance impact. It's best to do > this just for complex division. On 10/19/2017 10:17 AM, Wilco Dijkstra wrote: > Vladimir Mezentsev >> On 10/19/2017 06:37 AM, Richard Earnshaw (lists) wrote: >>> On 19/10/17 14:07, Wilco Dijkstra wrote: >>>> Vladimir wrote: >>>> >>>> +# Disable floating-point expression contraction >>>> +LIBGCC2_FFP_CONTRAST_CFLAGS = -ffp-contract=off >>>> + >>>> >>>> It looks like this disables fp-contract in all of libgcc... >>>> What is the the number of FMAs in libgcc before/after? >> How can I find this number ? >> dis | grep | wc > Eg. objdump -d ~/install/gcc/lib64/libgcc_s.so | grep -c fmadd > Also grep fmsub, fnmadd, fnmsub. > >>> It's probably better to do this with an attribute >>> >>> __attribute__((optimize("fp-contract=off"))) >>> >>> on the affected functions. >> I like your suggestion. > I don't think this will work in general, IIRC only a few targets correctly > implement the optimize pragma. Changing the makefile to build only > __divdc3/__divtc3/__divsc3/__divhc3 without FMA looks like a better > approach. Only 3 (_divdc3 / _divhc3 / _divsc3) have FMAs. The other FMAs are in: _fixunsdfdi.o _fixunsdfdi_s.o _fixunssfdi.o _fixunssfdi_s.o _muldc3.o _muldc3_s.o _mulhc3.o _mulhc3_s.o _mulsc3.o _mulsc3_s.o I suggest the following fix: % git diff diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in index a1a392d..f313556 100644 --- a/libgcc/Makefile.in +++ b/libgcc/Makefile.in @@ -455,6 +455,14 @@ ifeq ($(LIB2_SIDITI_CONV_FUNCS),) lib2funcs += $(subst XX,di,$(dwfloatfuncs)) endif +# Disable FMA (floating-point multiply-add) instructions for complex division. +# These instructions can produce different result if two operations executed separately. +LIBGCC2_FFP_CONTRAST_CFLAGS = -ffp-contract=off +LIB2_DIV3_FUNCS = _divdc3 _divhc3 _divsc3 +lib2-div3-o = $(patsubst %,%$(objext),$(LIB2_DIV3_FUNCS)) +lib2-div3-s-o = $(patsubst %,%_s$(objext),$(LIB2_DIV3_FUNCS)) +$(lib2-div3-o) $(lib2-div3-s-o): LIBGCC2_CFLAGS += $(LIBGCC2_FFP_CONTRAST_CFLAGS) + If my fix will be approved I send ChangeLog and a patch file tomorrow. -Vladimir > However we also need to decide whether this is the best possible fix. > It will affect accuracy of other inputs as well... > > Wilco
Re: [PATCH] PR libgcc/59714 complex division is surprising on aarch64
On 10/19/2017 06:37 AM, Richard Earnshaw (lists) wrote: On 19/10/17 14:07, Wilco Dijkstra wrote: Vladimir wrote: +# Disable floating-point expression contraction +LIBGCC2_FFP_CONTRAST_CFLAGS = -ffp-contract=off + It looks like this disables fp-contract in all of libgcc... What is the the number of FMAs in libgcc before/after? How can I find this number ? dis | grep | wc Or are there the other simple ways ? If it disables anything other than the ones in complex division, it would have an unintended performance impact. It's best to do this just for complex division. Wilco It's probably better to do this with an attribute __attribute__((optimize("fp-contract=off"))) on the affected functions. I like your suggestion. -Vladimir R.
[PATCH] PR libgcc/59714 complex division is surprising on aarch64
From: Vladimir Mezentsev FMA (floating-point multiply-add) instructions are supported on aarch64. These instructions can produce different result if two operations executed separately. -ffp-contract=off doesn't allow the FMA instructions. Tested on two platforms: aarch64-unknown-linux-gnu: No regression. Two failed tests now passed. sparc64-unknown-linux-gnu: No regression. ChangeLog: 2017-10-18 Vladimir Mezentsev PR libgcc/59714 * libgcc/Makefile.in: Add -ffp-contract=off --- libgcc/Makefile.in |4 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in index a1a392d..b771875 100644 --- a/libgcc/Makefile.in +++ b/libgcc/Makefile.in @@ -237,12 +237,16 @@ else DECNUMINC = endif +# Disable floating-point expression contraction +LIBGCC2_FFP_CONTRAST_CFLAGS = -ffp-contract=off + # Options to use when compiling libgcc2.a. # LIBGCC2_DEBUG_CFLAGS = -g LIBGCC2_CFLAGS = -O2 $(LIBGCC2_INCLUDES) $(GCC_CFLAGS) $(HOST_LIBGCC2_CFLAGS) \ $(LIBGCC2_DEBUG_CFLAGS) -DIN_LIBGCC2 \ -fbuilding-libgcc -fno-stack-protector \ +$(LIBGCC2_FFP_CONTRAST_CFLAGS) \ $(INHIBIT_LIBC_CFLAGS) # Additional options to use when compiling libgcc2.a. -- 1.7.1
Re: [PATCH 2/2] PR libgcc/59714 complex division is surprising on aarch64
On 10/12/2017 03:40 AM, Richard Earnshaw wrote: > On 12/10/17 06:21, vladimir.mezent...@oracle.com wrote: >> From: Vladimir Mezentsev >> >> FMA (floating-point multiply-add) instructions are supported on aarch64. >> These instructions can produce different result if two operations executed >> separately. >> -ffp-contract=off doesn't allow the FMA instructions. >> >> Tested on aarch64-linux-gnu. >> No regression. Two failed tests now passed. >> >> ChangeLog: >> 2017-10-11 Vladimir Mezentsev >> >> PR libgcc/59714 >> * libgcc/config/aarch64/t-aarch64 (HOST_LIBGCC2_CFLAGS): Add >> -ffp-contract=off >> --- >> libgcc/config/aarch64/t-aarch64 | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/libgcc/config/aarch64/t-aarch64 >> b/libgcc/config/aarch64/t-aarch64 >> index 3af933c..e33bef0 100644 >> --- a/libgcc/config/aarch64/t-aarch64 >> +++ b/libgcc/config/aarch64/t-aarch64 >> @@ -18,4 +18,5 @@ >> # along with GCC; see the file COPYING3. If not see >> # <http://www.gnu.org/licenses/>. >> >> +HOST_LIBGCC2_CFLAGS += -ffp-contract=off >> LIB2ADD += $(srcdir)/config/aarch64/sync-cache.c >> > Why would we want to do this on AArch64 only? If it's right for us, > then surely it would be right for everyone and the option should be > applied at the top level. It is a machine dependent option. We don't need this option, for example, on sparc or intel machines. -Vladimir > > Hint: I'm not convinced on the evidence here that it is right across the > whole of libgcc. Before moving forward on this particular PR I think we > need to understand what behaviour we do want out of the compiler. > > R.
[PATCH 2/2] PR libgcc/59714 complex division is surprising on aarch64
From: Vladimir Mezentsev FMA (floating-point multiply-add) instructions are supported on aarch64. These instructions can produce different result if two operations executed separately. -ffp-contract=off doesn't allow the FMA instructions. Tested on aarch64-linux-gnu. No regression. Two failed tests now passed. ChangeLog: 2017-10-11 Vladimir Mezentsev PR libgcc/59714 * libgcc/config/aarch64/t-aarch64 (HOST_LIBGCC2_CFLAGS): Add -ffp-contract=off --- libgcc/config/aarch64/t-aarch64 | 1 + 1 file changed, 1 insertion(+) diff --git a/libgcc/config/aarch64/t-aarch64 b/libgcc/config/aarch64/t-aarch64 index 3af933c..e33bef0 100644 --- a/libgcc/config/aarch64/t-aarch64 +++ b/libgcc/config/aarch64/t-aarch64 @@ -18,4 +18,5 @@ # along with GCC; see the file COPYING3. If not see # <http://www.gnu.org/licenses/>. +HOST_LIBGCC2_CFLAGS += -ffp-contract=off LIB2ADD += $(srcdir)/config/aarch64/sync-cache.c -- 1.8.3.1
Re: [PATCH] PR target/68256 Defining TARGET_USE_CONSTANT_BLOCKS_P causes go bootstrap failure on aarch64
On 10/05/2017 07:09 AM, Richard Earnshaw (lists) wrote: > On 04/10/17 00:50, vladimir.mezent...@oracle.com wrote: >> From: Vladimir Mezentsev >> >> Tested on aarch64-linux-gnu. >> No regression. >> No bootstrap failure. >> >> gcc/ChangeLog: >> 2017-09-26 Vladimir Mezentsev >> >> * gcc/config/aarch64/aarch64.c: restore fix in >> aarch64_use_blocks_for_constant_p > Vladimir, > > Thanks for the patch, but I can't accept this as it stands. The patch > lacks a description of what is motivating the change, so I can't really > review it sensibly. What's the problem you are trying to address? Why > do you believe this is the right fix? Do you have a test case? Hello Richard, I wanted to fix *Bug 68256* <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68256> - Defining TARGET_USE_CONSTANT_BLOCKS_P causes go bootstrap failure on aarch64. Ramana Radhakrishnan ( ) made a workaround for this bug: commit bc443a71dafa2e707bae4b2fa74f83b05dea37ab Author: ramana Date: Tue Nov 10 08:35:21 2015 + Workaround PR68256 on AArch64 I removed Raman's fixes andI cannot reproduce the problem. I used the gcc116.fsffrance.org machine: % uname -a Linux gcc116 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:45:34 UTC 2016 aarch64 aarch64 aarch64 GNU/Linux I run % ../gcc/configure --enable-languages=c,c++,go --enable-bootstrap --enable-multilib % make -j8 bootstrap Is it correct steps to reproduce the go bootstrap failure ? If yes: There were no bootstrap comparison failure and no regression in testing. My suggestion is to remove Raman's workaround and close PR68256. If no: What is wrong in my steps? Thank you, -Vladimir > > The ChangeLog entry doesn't help me either: restore what fix? > > Is the patch associated with a bugzilla report? If so, then please > include that in the ChangeLog in the form "PR /. > > Incidentally, ChangeLog entries should be complete sentences (begin with > a capital letter and end with a full stop). > > R. > > >> --- >> gcc/config/aarch64/aarch64.c | 8 +++- >> 1 file changed, 3 insertions(+), 5 deletions(-) >> >> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c >> index 1c14008..b377bc7 100644 >> --- a/gcc/config/aarch64/aarch64.c >> +++ b/gcc/config/aarch64/aarch64.c >> @@ -6193,11 +6193,9 @@ aarch64_can_use_per_function_literal_pools_p (void) >> static bool >> aarch64_use_blocks_for_constant_p (machine_mode, const_rtx) >> { >> - /* Fixme:: In an ideal world this would work similar >> - to the logic in aarch64_select_rtx_section but this >> - breaks bootstrap in gcc go. For now we workaround >> - this by returning false here. */ >> - return false; >> + /* We can't use blocks for constants when we're using a per-function >> + constant pool. */ >> + return !aarch64_can_use_per_function_literal_pools_p (); >> } >> >> /* Select appropriate section for constants depending >>
[PATCH] PR target/68256 Defining TARGET_USE_CONSTANT_BLOCKS_P causes go bootstrap failure on aarch64
From: Vladimir Mezentsev Tested on aarch64-linux-gnu. No regression. No bootstrap failure. gcc/ChangeLog: 2017-09-26 Vladimir Mezentsev * gcc/config/aarch64/aarch64.c: restore fix in aarch64_use_blocks_for_constant_p --- gcc/config/aarch64/aarch64.c | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 1c14008..b377bc7 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -6193,11 +6193,9 @@ aarch64_can_use_per_function_literal_pools_p (void) static bool aarch64_use_blocks_for_constant_p (machine_mode, const_rtx) { - /* Fixme:: In an ideal world this would work similar - to the logic in aarch64_select_rtx_section but this - breaks bootstrap in gcc go. For now we workaround - this by returning false here. */ - return false; + /* We can't use blocks for constants when we're using a per-function + constant pool. */ + return !aarch64_can_use_per_function_literal_pools_p (); } /* Select appropriate section for constants depending -- 1.8.3.1