[Bug middle-end/90075] [7/8 Regression] [AArch64] ICE during RTL pass when member of union passed to copysignf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90075

--- Comment #5 from ramana.radhakrishnan at arm dot com ---

The main reason for the ICE is this bit of code. GCC 8 and earlier have the following in the expansion for copysignsf3:

    rtx op2 = lowpart_subreg (V2SFmode, operands[2], SFmode);

which looks quite different from the approach taken with copysigndf3 until your rewrite. This gets an input in operands[2] which is subreg:SF (reg:SI 100), and then lower_subreg->simplify_gen_subreg seems to get into a tangle that it can't get out of. simplify_gen_subreg is unable to do the conversion and ends up returning a null pointer; we then don't check the result and thus ICE with a null pointer error.

Having looked at it again this morning, my reaction is that while there be dragons in subregs of vector modes and such mode casting, the newer rewrite seems reasonable and is not papering over any underlying modes.

For the release branches, I think backporting your patch (and any followups, do you remember any?) should be fine and we should just do it.

Ramana

From: rearnsha at gcc dot gnu.org
Sent: 23 April 2019 15:57
To: ram...@gcc.gnu.org
Subject: [Bug middle-end/90075] [7/8 Regression] [AArch64] ICE during RTL pass when member of union passed to copysignf

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90075

--- Comment #4 from Richard Earnshaw ---

(In reply to Ramana Radhakrishnan from comment #3)
> Seems to have been "fixed" by the commit to fix PR87369,
>
> Richard, is this something to backport ? Prima-facie , it appears not and we
> will need an appropriate fix for the release branches.

Given that the patch for PR87369 eliminates the ICE, backporting it is probably preferable to a separate patch that is only used on the release branches. That patch has been soaking on trunk for a while now, so it is likely to be pretty safe.
I am a bit worried, however, that the patch papers over a likely trunk ICE that isn't really fixed. It would be nice to investigate further whether some additional mitigation is warranted.
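For readers following the thread, the kind of input the bug title describes (a union member passed to copysignf) can be sketched in C as below. This is a hypothetical reduction and not the testcase attached to the PR: the names float_bits and sign_from_union are invented for illustration, and the sketch assumes a 32-bit unsigned int and IEEE-754 single precision. It calls __builtin_copysignf rather than copysignf so that GCC expands it inline even without optimisation.

```c
/* Hypothetical sketch of the input shape named in the bug title.
   Assumes 32-bit unsigned int and IEEE-754 binary32 float.  */
union float_bits
{
  float f;
  unsigned int u;
};

/* Return MAGNITUDE with the sign taken from RAW_BITS reinterpreted as
   a float.  The value reaching the copysign expander was written as an
   integer, which is how a (subreg:SF (reg:SI ...)) operand can arise.  */
float
sign_from_union (float magnitude, unsigned int raw_bits)
{
  union float_bits x;
  x.u = raw_bits;                             /* written as an integer */
  return __builtin_copysignf (magnitude, x.f); /* read back as a float */
}
```

Whether a given compiler version actually ICEs on this exact function is not something the thread states; it only shows the shape of code involved.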
[Bug target/65837] [arm-linux-gnueabihf] lto1 target specific builtin not available
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65837

--- Comment #25 from ramana.radhakrishnan at arm dot com ---

On 29/04/15 09:09, rguenther at suse dot de wrote:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65837

--- Comment #24 from rguenther at suse dot de ---

On Wed, 29 Apr 2015, ramana at gcc dot gnu.org wrote:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65837

--- Comment #23 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> ---

(In reply to rguent...@suse.de from comment #20)
On Tue, 28 Apr 2015, prathamesh3492 at gcc dot gnu.org wrote:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65837

--- Comment #17 from prathamesh3492 at gcc dot gnu.org ---

(In reply to clyon from comment #16)
(In reply to prathamesh3492 from comment #15)

I am not understanding why vfpv3-d16 appears in collect_gcc_options in run_gcc().

Isn't this because you configured GCC --with-fpu=vfpv3-d16?

COLLECT_GCC_OPTIONS is set by gcc.c:set_collect_gcc_options():

    /* Build COLLECT_GCC_OPTIONS to have all of the options specified to
       the compiler.  */
    obstack_grow (&collect_obstack, "COLLECT_GCC_OPTIONS=",
                  sizeof ("COLLECT_GCC_OPTIONS=") - 1);

and at the end of set_collect_gcc_options():

    xputenv (XOBFINISH (&collect_obstack, char *));

which makes it an environment variable.

set_collect_gcc_options() is called by do_spec, which is called by driver::maybe_run_linker(), before executing the linker. So the driver has no knowledge of options passed at compile time; it sets the default -mfpu=vfpv3-d16. When lto-wrapper executes, it gets the linker command-line options from the environment variable COLLECT_GCC_OPTIONS, which contains -mfpu=vfpv3-d16, and since that was being appended after the compile-time options (fdecoded_options), -mfpu=vfpv3-d16 overrides -mfpu=neon.
This also explains why it works in one shot:

    arm-linux-gnueabihf -flto -mfpu=neon test.c

COLLECT_GCC_OPTIONS will have -mfpu=neon since it's mentioned on the command line, and lto-wrapper has access to this COLLECT_GCC_OPTIONS. When the compiler and linker are run separately, at link time the driver has no knowledge of the flags of the compile-time run, and hence sets default flags in COLLECT_GCC_OPTIONS.

I think the correct way to fix this would be in run_gcc(), to get values from COLLECT_GCC_OPTIONS in decoded_options as is currently done. run_gcc() walks through the options in the object file and saves them in fdecoded_options. So override the value in decoded_options for the same option found in fdecoded_options. Would that be the right approach?

No, link-time options always override compile-time ones. I suspect the fix will be to somehow avoid setting defaults when linking?

Well, I'm not sure how easy that's going to be - you will need some amount of defaults to get through, especially if the user hasn't provided the options in the first place :(

The other option is to special-case only those options that are slipping in that way (-march, -mtune, -mcpu, -mfpu?) and emit those always first, hoping that explicit ones will override that. Of course lto-wrapper cannot distinguish between implicitly and explicitly given arguments at link time. Which means the only solution would be to completely ignore these at link time.

IMHO that will cause more problems. Alternatively, add options_default_specs at the beginning and then filter out all those values of options that are the same as the default options, hoping that the user doesn't put that back in again. But again it's sticky tape and doesn't really fix the problem. I'm not sure we're improving the situation in any way by putting in the hack - it just pushes a compile-time breakage into a possibly subtle runtime failure, which can easily be achieved by adding the relaxed option, in this case -mfpu=neon, to the command line at link time.
At least then it's evident to the user that they need to do something specific to their use case to get LTO working, or fail very quickly if their code relies on the absence of SIMD code in the default case or in the case without auto-detection. I'm pretty sure this will be the first thing to sort out when trying to build kernels with LTO, for example.

So, I guess I'm voting for doing this properly with target attributes rather than putting one more bit of sticky tape in a pretty painful area of the compiler.

But I suspect this might break otherwise working cases (due to the fact that which -march/-mtune/-mcpu/-mfpu options lto-wrapper chooses from the object files is essentially random if they don't agree for all objects). It's really unfortunate that these configure-time defaults appear as regular user command-line arguments :( I suppose this was done to make them visible to specs processing.

Yeah, sigh.

regards
Ramana
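The ordering problem discussed in this message can be illustrated with a small model in C. This is only a sketch of the "later option wins" rule described above; it is not code from lto-wrapper or gcc.c, and the function name effective_mfpu is invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: given options in the order they would be
   processed, return the value of the last "-mfpu=" option, or NULL if
   none is present.  Later options override earlier ones.  */
const char *
effective_mfpu (const char *const *opts, size_t n)
{
  static const char prefix[] = "-mfpu=";
  const size_t prefix_len = sizeof prefix - 1;
  const char *value = NULL;

  for (size_t i = 0; i < n; i++)
    if (strncmp (opts[i], prefix, prefix_len) == 0)
      value = opts[i] + prefix_len;   /* remember the most recent one */

  return value;
}
```

With the compile-time option first and the configure-time default appended afterwards, as in { "-flto", "-mfpu=neon", "-mfpu=vfpv3-d16" }, the model returns "vfpv3-d16", which mirrors the override reported in comment #17.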
[Bug target/65837] [arm-linux-gnueabihf] lto1 target specific builtin not available
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65837

--- Comment #8 from ramana.radhakrishnan at arm dot com ---

On 23/04/15 09:18, rguenth at gcc dot gnu.org wrote:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65837

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
   Last reconfirmed|                            |2015-04-23
         Resolution|INVALID                     |---
     Ever confirmed|0                           |1

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---

Not really - it's supported with GCC 5 that way as it uses target attributes.

The ARM backend doesn't support target attributes, either as attributes to functions or as #pragmas. That's the real issue, not that we don't initialize and stream the builtins in this corner case. While doing that work we need to make sure we initialize the builtins for the appropriate options.

So the Chromium use case is INVALID, I'd think, but there should be an additional PR to support target attributes in the ARM backend. There's a start with Christian Bruel's patches for arm and thumb, but it needs extension to other options as well.

Ramana
[Bug jit/64810] jit not working on armv7hl (ld: error: /tmp/libgccjit-ZGemdr/fake.so uses VFP register arguments, /tmp/ccJFCBsE.o does not)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64810

--- Comment #18 from ramana.radhakrishnan at arm dot com ---

On 28/01/15 17:58, dmalcolm at gcc dot gnu.org wrote:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64810

--- Comment #9 from David Malcolm <dmalcolm at gcc dot gnu.org> ---

Thanks Ramana.

I attempted a build of the jit with the configuration you suggested, specifically:

    $ ../src/configure \
        --enable-host-shared \
        --enable-languages=jit,c++ \
        --disable-bootstrap \
        --enable-checking=release \
        --prefix=/home/dmalcolm/gcc-git-jit/install-jit \
        --with-arch=armv7-a \
        --with-float=hard \
        --with-fpu=vfpv3-d16

Unfortunately, I see the same failures.

Hacking in a -v into the driver invocation in jit-playback.c...

    diff --git a/gcc/jit/jit-playback.c b/gcc/jit/jit-playback.c
    index d2549a0..5f570a7 100644
    --- a/gcc/jit/jit-playback.c
    +++ b/gcc/jit/jit-playback.c
    @@ -2271,6 +2271,8 @@ invoke_driver (const char *ctxt_progname,
          time.  */
       ADD_ARG ("-fno-use-linker-plugin");

    +  ADD_ARG ("-v");
    +
       /* pex argv arrays are NULL-terminated.  */
       ADD_ARG (NULL);

...I see that libgccjit attempts to invoke the driver to convert the .s to a .so, but it fails like so:

    Target: armv7l-unknown-linux-gnueabihf
    Configured with: ../src/configure --enable-host-shared --enable-languages=jit,c++ --disable-bootstrap --enable-checking=release --prefix=/home/dmalcolm/gcc-git-jit/install-jit --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16
    Thread model: posix
    gcc version 5.0.0 20150126 (experimental) (GCC)
    COLLECT_GCC_OPTIONS='-shared' '-o' '/tmp/libgccjit-VxeXM1/fake.so' '-fno-use-linker-plugin' '-v' '-march=armv7-a' '-mfloat-abi=hard' '-mfpu=vfpv3-d16' '-mtls-dialect=gnu'
     as -v -march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -meabi=5 -o /tmp/ccCY7c5L.o /tmp/libgccjit-VxeXM1/fake.s
    GNU assembler version 2.24 (armv7hl-redhat-linux-gnueabi) using BFD version version 2.24
    COMPILER_PATH=
    LIBRARY_PATH=/home/dmalcolm/gcc-git-jit/build-jit-comment8/gcc/:/lib/:/usr/lib/
    COLLECT_GCC_OPTIONS='-shared' '-o' '/tmp/libgccjit-VxeXM1/fake.so' '-fno-use-linker-plugin' '-v' '-march=armv7-a' '-mfloat-abi=hard' '-mfpu=vfpv3-d16' '-mtls-dialect=gnu'
     ld --eh-frame-hdr -shared -dynamic-linker /lib/ld-linux-armhf.so.3 -X -m armelf_linux_eabi -o /tmp/libgccjit-VxeXM1/fake.so /lib/crti.o /home/dmalcolm/gcc-git-jit/build-jit-comment8/gcc/crtbeginS.o -L/home/dmalcolm/gcc-git-jit/build-jit-comment8/gcc /tmp/ccCY7c5L.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /home/dmalcolm/gcc-git-jit/build-jit-comment8/gcc/crtendS.o /lib/crtn.o
    ld: error: /tmp/libgccjit-VxeXM1/fake.so uses VFP register arguments, /tmp/ccCY7c5L.o does not
    ld: failed to merge target specific data of file /tmp/ccCY7c5L.o

That said, with the test-empty.c testcase, the generated fake.s looks like this:

            .cpu arm10tdmi
            .fpu softvfp
            .eabi_attribute 20, 1
            .eabi_attribute 21, 1
            .eabi_attribute 23, 3
            .eabi_attribute 24, 1
            .eabi_attribute 25, 1
            .eabi_attribute 26, 2
            .eabi_attribute 30, 2
            .eabi_attribute 34, 0
            .arm
            .syntax divided
            .file   "fake.c"
            .text
    .Ltext0:
            .cfi_sections   .debug_frame
    .Letext0:
            .section        .debug_line,"",%progbits
    .Ldebug_line0:
            .section        .debug_str,"MS",%progbits,1
    .LASF0:
            .ascii  "/tmp/libgccjit-La3Yzk/fake.c\000"
    .LASF1:
            .ascii  "libgccjit 5.0.0 20150126 (experimental) -fPIC -O3 -"
            .ascii  "g --param ggc-min-expand=0 --param ggc-min-heapsize"
            .ascii  "=0\000"
            .ident  "GCC: (GNU) 5.0.0 20150126 (experimental)"
            .section        .note.GNU-stack,"",%progbits

In particular, I'm guessing that the line:

            .fpu softvfp

is at fault here. This appears to come from arm.c:arm_file_start:

    if (TARGET_SOFT_FLOAT)
      {
        fpu_name = "softvfp";
      }
    else

and on debugging:

    (gdb) p global_options.x_arm_float_abi
    $1 = ARM_FLOAT_ABI_SOFT

Is this value bogus, given the configure-time options?

Sorry about the slow response, I was unable to check email last evening.

Yes, this value is bogus, as are the other .cpu values - the assembler output suggests to me that the configure-time options aren't being passed at all from the driver down when used as a jit.

Given the configure options I would expect

    .arch armv7-a
    .fpu vfpv3-d16

and an EABI attribute tag to indicate the PCS.

I think you've worked this out reading down-thread.

Ramana

(if so, I'm guessing this is a jit-specific state-management issue)
[Bug jit/64810] jit not working on armv7hl (ld: error: /tmp/libgccjit-ZGemdr/fake.so uses VFP register arguments, /tmp/ccJFCBsE.o does not)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64810

--- Comment #8 from ramana.radhakrishnan at arm dot com ---

On 27/01/15 20:00, dmalcolm at gcc dot gnu.org wrote:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64810

--- Comment #7 from David Malcolm <dmalcolm at gcc dot gnu.org> ---

(In reply to ramana.radhakrish...@arm.com from comment #4)
On 27/01/15 12:27, jakub at gcc dot gnu.org wrote:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64810

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---

Both the compiler and libgccjit were configured with:

    --with-tune=cortex-a8 --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --with-abi=aapcs-linux

The --with-abi=aapcs-linux in addition to the --with-float=hard argument is just wrong. In fact it allows for a case where, if the order of command-line arguments passed to the compiler is in some way wrong, the compiler will be forced into passing floating-point parameters through the integer registers rather than the fp registers, which is what the --with-float=hard configure-time option selects. Really Fedora should remove this from the configure line as it only confuses people.

Ramana: I'm sorry, I had trouble parsing your comment (e.g. which option were you referring to by "this" in the final sentence above).

By "this" in the final sentence, I referred to the --with-abi=aapcs-linux option.

What should I configure with when debugging this? e.g. should I keep the --with-abi=aapcs-linux and lose the --with-float=hard? Should I be testing with the other options Jakub mentioned? etc.

The architecture-specific options that are commonly used for bootstrapping on ARM hard float are the following:

    --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16

(Sorry about my ignorance here; I'm not particularly familiar with arm, and my test machine is slow, which makes it difficult to exhaustively try every possibility.)

No problem, it's not an issue - there are quite a few options available so it's not easy to get the correct options sorted in the first instance.

Ramana
[Bug jit/64810] jit not working on armv7hl (ld: error: /tmp/libgccjit-ZGemdr/fake.so uses VFP register arguments, /tmp/ccJFCBsE.o does not)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64810

--- Comment #4 from ramana.radhakrishnan at arm dot com ---

On 27/01/15 12:27, jakub at gcc dot gnu.org wrote:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64810

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---

Both the compiler and libgccjit were configured with:

    --with-tune=cortex-a8 --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --with-abi=aapcs-linux

The --with-abi=aapcs-linux in addition to the --with-float=hard argument is just wrong. In fact it allows for a case where, if the order of command-line arguments passed to the compiler is in some way wrong, the compiler will be forced into passing floating-point parameters through the integer registers rather than the fp registers, which is what the --with-float=hard configure-time option selects. Really Fedora should remove this from the configure line as it only confuses people.

For the compiler you built, can you please post back what the output is for a simple function that adds 2 float values and passes the result back?

Ramana
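A function of the kind requested above might look like the following. This is an assumed example, not something posted in the bug; the name add_floats is invented. Compiling it with -S and reading the assembly should show which procedure-call standard the compiler is actually using.

```c
/* Hypothetical probe for the float calling convention.
   Under the hard-float (VFP) variant of the AAPCS, a and b are
   expected to arrive in s0 and s1 and the result to be returned in s0.
   Under the base (soft-float) variant they travel in r0 and r1 and the
   result comes back in r0.  */
float
add_floats (float a, float b)
{
  return a + b;
}
```

For example, something like gcc -O2 -S add.c (the file name is an assumption) leaves the assembly in add.s for inspection; the .fpu and .eabi_attribute lines at the top of that file are also worth comparing against the configure-time options.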
[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503

--- Comment #20 from ramana.radhakrishnan at arm dot com ---

On 23/10/14 00:28, e.menezes at samsung dot com wrote:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503

--- Comment #16 from Evandro <e.menezes at samsung dot com> ---

(In reply to Wilco from comment #15)

Using -Ofast is not any different from -O3 -ffast-math when compiling non-Fortran code. As comment 10 shows, both loops are vectorized, however LLVM unrolls twice and uses multiple accumulators while GCC doesn't.

You're right. LLVM produces:

    .LBB0_1:                        // %vector.body
                                    // =This Inner Loop Header: Depth=1
            add     x11, x9, x8
            add     x12, x10, x8
            ldp     q2, q3, [x11]
            ldp     q4, q5, [x12]
            add     x8, x8, #32             // =32
            fmla    v0.2d, v2.2d, v4.2d
            fmla    v1.2d, v3.2d, v5.2d
            cmp     x8, #128, lsl #12       // =524288
            b.ne    .LBB0_1

And GCC:

    .L3:
            ldr     q2, [x2, x0]
            add     w1, w1, 1
            ldr     q1, [x3, x0]
            cmp     w1, w4
            add     x0, x0, 16
            fmla    v0.2d, v2.2d, v1.2d
            bcc     .L3

I still don't see what this has to do with A57. You should open a generic bug about GCC not applying basic loop optimizations with -O3 (in fact limited unrolling is useful even for -O2).

Indeed, but I think that there's still a code-generation opportunity for A57 here.

What you mention is a general code generation improvement for AArch64. There's nothing Cortex-A57 specific about it. In the AArch64 backend, we think architecture first and then micro-architecture.

Note above that the registers are loaded in pairs by LLVM, while with GCC, when it unrolls the loop (more aggressively, BTW), each vector is loaded individually:

    .L3:
            ldr     q28, [x15, x16]
            add     x17, x16, 16
            ldr     q29, [x14, x16]
            add     x0, x16, 32
            ldr     q30, [x15, x17]
            add     x18, x16, 48
            ldr     q31, [x14, x17]
            add     x1, x16, 64
            ...
            fmla    v27.2d, v28.2d, v29.2d
            ...
            fmla    v27.2d, v30.2d, v31.2d
            ...     # Rest of 8x unroll
            bcc     .L3

It also goes without saying that this code could also benefit from the post-increment addressing mode.
What's the kind of performance delta you see if you managed to unroll the loop just a wee bit ? Probably not much looking at the code produced here. Ramana
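The difference between the two code shapes discussed in this thread can be shown at the source level. The following is a sketch written for illustration only (it is not the benchmark from the PR, and the function names are invented): the first function keeps a single running sum, the second unrolls by two with independent accumulators, which is roughly what the LLVM output above does with v0 and v1.

```c
#include <stddef.h>

/* One accumulator: every multiply-add depends on the previous one.  */
double
dot_one_acc (const double *a, const double *b, size_t n)
{
  double acc = 0.0;
  for (size_t i = 0; i < n; i++)
    acc += a[i] * b[i];
  return acc;
}

/* Unrolled by two with two accumulators: the two dependency chains
   are independent, so consecutive multiply-adds need not wait on each
   other.  */
double
dot_two_acc (const double *a, const double *b, size_t n)
{
  double acc0 = 0.0, acc1 = 0.0;
  size_t i = 0;

  for (; i + 1 < n; i += 2)
    {
      acc0 += a[i] * b[i];
      acc1 += a[i + 1] * b[i + 1];
    }
  if (i < n)                    /* odd element left over */
    acc0 += a[i] * b[i];

  return acc0 + acc1;
}
```

Splitting the sum reassociates the floating-point additions, so the two functions can differ in the last bits for general inputs; a compiler may only make this transformation itself under options such as -ffast-math or -Ofast, which matches the options mentioned in comment #15.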
[Bug libstdc++/61532] [4.9/4.10 regression] make_signed and make_unsigned wchar_t have started failing in the libstdc++ testsuite.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61532

--- Comment #10 from ramana.radhakrishnan at arm dot com ---

On 23/06/14 21:31, redi at gcc dot gnu.org wrote:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61532

--- Comment #9 from Jonathan Wakely <redi at gcc dot gnu.org> ---

The tests should be fixed now - please check.

This commit has fixed the issues with the 2 tests that were still remaining.

https://gcc.gnu.org/ml/gcc-testresults/2014-06/msg02109.html

libstdc++ test results are now clear on armhf. Thanks for fixing this up.

regards
Ramana
[Bug rtl-optimization/59535] [4.9 regression] -Os code size regressions for Thumb1/Thumb2 with LRA
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535

--- Comment #19 from ramana.radhakrishnan at arm dot com ---

On 06/12/14 08:46, fredrik.hederstie...@securitas-direct.com wrote:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535

Fredrik Hederstierna <fredrik.hederstie...@securitas-direct.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fredrik.hederstierna@securitas-direct.com

--- Comment #18 from Fredrik Hederstierna <fredrik.hederstie...@securitas-direct.com> ---

I compared GCC 4.8.3 and GCC 4.9.0 for arm-none-eabi, and I still see a code size increase for thumb1 (and thumb2) for both my arm966e and my cortex-m4 targets.

    GCC 4.8.3
      RAM used    93812
      Flash used  515968

    GCC 4.9.0
      RAM used    93812   (same)
      Flash used  522608  (+1.3%)

Then I tried to disable LRA and results got better:

    GCC 4.9.0, added flag -mno-lra
      RAM used    93812   (same)
      Flash used  519624  (+0.7%)

Flags used are otherwise identical for both tests:

    -Os -g3 -ggdb3 -gdwarf-4 -fvar-tracking-assignments -fverbose-asm -fno-common -ffunction-sections -fdata-sections -fno-exceptions -fno-asynchronous-unwind-tables -fno-unwind-tables -mthumb -mcpu=arm966e-s -msoft-float -mno-unaligned-access

Generally GCC 4.9.0 seems to produce larger code. I tried to experiment with LTO (-flto -flto-fat-objects), but then code size increased even more for both GCC 4.8.3 and GCC 4.9.0; I was expecting a code size decrease though.

Sorry, I cannot share the exact sources used for compilation here. I can share the toolchain build script on request, or try to set up a small test case. I first just wanted to confirm that this bug really is fixed and resolved, so it's not a new bug or another known issue.

It might be another issue or it may well be an issue with LRA; nobody could tell for certain unless we could get a small testcase to look at.

What we'd like is a small testcase that shows the problem compared with gcc 4.8.3 to progress this further. Please file a new bug report following the instructions in https://gcc.gnu.org/bugs/#report - in this particular case we'd be interested in all command-line options that were used.

regards
Ramana

BR /Fredrik
[Bug target/61153] [ARM] vbic vorn tests fail
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61153

--- Comment #8 from ramana.radhakrishnan at arm dot com ---

How do we define cases where we need them? My concern is that some compiler change might cause a suboptimal-yet-functional code to be generated, and we wouldn't notice it.

Well, currently tests in gcc.target/arm/neon serve no functional / correctness purpose. The only thing they check is if the correct instruction is generated at -O0, which is probably enough for that case.

I would rather that we put in scan-assembler tests in your testsuite rather than anything else.

regards
Ramana
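A scan-assembler test of the sort suggested above could take roughly the following shape. This is a sketch and not a test from the GCC tree: the function and typedef names are invented, the use of the generic vector extension instead of arm_neon.h intrinsics is a choice made here to keep the example self-contained, and whether this exact source form is matched to vbic on a given compiler is an assumption that would need checking.

```c
/* { dg-do compile } */
/* { dg-require-effective-target arm_neon_ok } */
/* { dg-options "-O2" } */
/* { dg-add-options arm_neon } */

/* Four 32-bit lanes, using the GNU C vector extension.  */
typedef unsigned int v4u32 __attribute__ ((vector_size (16)));

/* Bit clear: keep the bits of A that are not set in B.  */
v4u32
bic_u32 (v4u32 a, v4u32 b)
{
  return a & ~b;
}

/* The point of the test: fail if the expected instruction disappears
   from the generated assembly.  */
/* { dg-final { scan-assembler "vbic" } } */
```

The dg- directives are ordinary comments to the compiler, so the file still builds anywhere the vector extension is supported; only the DejaGnu harness interprets them.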
[Bug target/43129] Simplify global variable's address loading with option -fpic
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43129

--- Comment #9 from ramana.radhakrishnan at arm dot com 2010-10-14 16:39:26 UTC ---

On Thu, 2010-10-14 at 16:33 +, stephen.clarke at st dot com wrote:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43129

Stephen Clarke <stephen.clarke at st dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |stephen.clarke at st dot com

--- Comment #8 from Stephen Clarke <stephen.clarke at st dot com> 2010-10-14 16:32:56 UTC ---

For arm instruction set, could you fold pc into the indexing to save an instruction?

    foo:
            ldr     r3, .L2         // C
    .LPIC0:
            ldr     r3, [r3,pc]     // C

You'll find that the ARM-ARM thinks that PC in any of the 3 locations in this instruction form is *unpredictable*. Thus this form of the instruction should not be used.

cheers
Ramana