[Bug middle-end/94083] New: inefficient soft-float x!=Inf code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94083

Bug ID: 94083
Summary: inefficient soft-float x!=Inf code
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: wilson at gcc dot gnu.org
Target Milestone: ---

Given a testcase like this

int foo(void)
{
  volatile float f;
  int n;
  f = __builtin_huge_valf();
  n += 1 - (f != __builtin_huge_valf());
  return n;
}

and compiling for soft-float, we end up with a call to __unordsf2 followed by a call to __lesf2. This means the floats have to be unpacked twice and checked for nan twice. This gives both poor performance and poor code size. I've confirmed this for x86, arm, and riscv.

Folding in the C front end is creating an unordered less than or equal comparison against FLT_MAX. From the 004.original file

n = SAVE_EXPR + n;

This optimization is coming from a rule in the match.pd file.

/* x != +Inf is always equal to !(x > DBL_MAX), but this introduces an exception for x a NaN so use an unordered comparison. */

When we generate rtl, we call do_compare_rtx_and_jump which notices that we don't have an operation for UNLE_EXPR, but decides we can't reverse it because it is unsafe. It tries swapping arguments, but we don't have UNGE_EXPR either. So it emits two libcalls.

Converting a NE compare to a UNLE compare looks like an odd optimization. If we want to consider unordered operations as canonical operations, then maybe we should add libgcc support for the unordered operations. Or maybe we should check to see if unordered operations are handled by the target before converting a simple NE into a UNLE.

The match.pd rule was changed to use UNLE in the patch for PR 64811 which fixed a problem with handling NaNs. This happened 2018-01-09. The optimization dates back to 2003-05-22 but was originally using LE which is OK for soft-float. It wasn't until the NaN bug was fixed by using UNLE instead of LE that this became an optimization problem.
Maybe we just shouldn't perform this optimization when honoring NaNs? That would avoid generating the problematic unordered operation early in the optimizer.
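
To make the NaN issue concrete, here is a minimal sketch in C of the three comparisons discussed above. The function names are made up for illustration; isgreater() is the standard C99 quiet comparison macro, used here as a stand-in for the unordered comparison the middle end generates.

```c
#include <float.h>
#include <math.h>

/* The source-level comparison: true for everything except +Inf,
   and in particular true for NaN. */
int ne_inf(float x)
{
    return x != INFINITY;
}

/* Ordered LE against FLT_MAX (the pre-2018 form of the rule):
   false for NaN, so it disagrees with ne_inf() on NaN inputs. */
int le_flt_max(float x)
{
    return x <= FLT_MAX;
}

/* Unordered-or-less-equal (UNLE): true when x <= FLT_MAX or when x is
   NaN.  isgreater() returns 0 for unordered operands. */
int unle_flt_max(float x)
{
    return !isgreater(x, FLT_MAX);
}
```

For a NaN input, ne_inf() and unle_flt_max() both return 1 while le_flt_max() returns 0, which is why the rule needs UNLE rather than LE when NaNs are honored.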
[Bug target/94136] GCC doc for built-in function __builtin___clear_cache() not 100% correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94136 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #3 from Jim Wilson --- Another possible issue is that __clear_cache is defined in the libgcc docs. But only some platforms are defining CLEAR_INSN_CACHE so only some targets have a usable __clear_cache function.
[Bug c++/94044] [10 Regression] internal compiler error: in comptypes, at cp/typeck.c:1490 on riscv64-unknown-linux-gnu and arm-eabi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94044 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #7 from Jim Wilson --- I made an attempt to reproduce this. I wasn't able to reproduce with an arm-eabi build. I was able to reproduce with a riscv64-linux build. The funny part is that I was able to build two compilers from the same gcc sources, one which reproduces and one which does not, which differ only in exactly how I did the build. For the failing build, I had a complete riscv-gnu-toolchain build available when configuring. For the working build it was just binutils and gcc without glibc/linux header files, and a top-of-tree binutils version unlike the first build. Debugging the two side by side, I see that execution diverges at line 9680 in cp/pt.c entry = type_specializations->find_with_hash (&elt, hash); The working compiler has no hash hit and returns zero. The failing compiler has a hash hit, and then dies inside spec_hasher::equal. In the spec_hasher::equal function I see (gdb) print *e1 $29 = {tmpl = , args = , spec = } (gdb) print *e2 $30 = {tmpl = , args = , spec = } (gdb) pt unit-size align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x75deb7e0 precision:64 min max pointer_to_this > readonly arg:0 elt:1 > type_0 type_6 VOID align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x75ef4540> tmp.C:13:23 start: tmp.C:13:23 finish: tmp.C:13:35>> (gdb) print e2->args $32 = (gdb) pt unit-size align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x75deb7e0 precision:64 min max pointer_to_this > readonly arg:0 elt:1 > type_0 type_6 VOID align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x75efa2a0> tmp.C:13:23 start: tmp.C:13:23 finish: tmp.C:13:35>> (gdb) It then eventually dies inside comptypes because TREE_CODE (t1) is type_pack_expansion. And also TREE_CODE (t2) is type_pack_expansion. This is called from the SIZEOF_EXPR case in cp_tree_equal. 
If tree addresses are being used for the hash codes, this could just be bad luck whether it fails or not.
[Bug c++/94044] [10 Regression] internal compiler error: in comptypes, at cp/typeck.c:1490 on riscv64-unknown-linux-gnu and arm-eabi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94044

--- Comment #9 from Jim Wilson ---
(In reply to Jakub Jelinek from comment #8)
> So perhaps to ease reproduction, tweak the hash function in this case to
> always return 0?

Yes, that works. I just didn't have a chance to look at the hash function last night. With the hash function hacked I can reproduce for any target and any -std=c++X value.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 789ccdb..4337928 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -1733,7 +1733,8 @@ hash_tmpl_and_args (tree tmpl, tree args)
 hashval_t
 spec_hasher::hash (spec_entry *e)
 {
-  return hash_tmpl_and_args (e->tmpl, e->args);
+  return 0;
+  //  return hash_tmpl_and_args (e->tmpl, e->args);
 }
 
 /* Recursively calculate a hash value for a template argument ARG, for use
[Bug target/94173] [RISCV] Superfluous stackpointer manipulation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94173

Jim Wilson changed:

What|Removed |Added
CC||wilson at gcc dot gnu.org

--- Comment #1 from Jim Wilson ---
struct Pair has size 8 and align 4, and we have no unaligned load/store support, so we are not able to allocate the temporary local variable to a register. It must be allocated a stack slot. The RTL optimizer is able to figure out that the stack stores and loads don't alias anything and hence are not necessary and optimizes them away. However, we don't have any support to unallocate a stack slot after it has already been allocated, so we end up with the unnecessary stack pointer increment and decrement.

In a degenerate case like this, where there are no longer any stack loads/stores, we may be able to notice that and get rid of the stack pointer manipulation. But in a more complicated case where there are multiple stack slots, and references to all but one are optimized away, then we would still need the stack pointer change, though we would just be wasting stack space in this case with larger decrements/increments than needed.

If you change the type to

struct Pair { char *s; char *t; } __attribute__ ((aligned(8)));

then you get the result you want. That isn't a practical solution, but it demonstrates that this is a size/alignment/strict-alignment problem.

This is more of a middle end problem than a RISC-V backend problem. It should be possible to reproduce on any target with similar strict alignment constraints, and similar calling conventions that allow returning the structure in registers, though I don't know if there are any offhand.
[Bug target/94173] [RISCV] Superfluous stackpointer manipulation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94173 Jim Wilson changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Last reconfirmed||2020-03-16
[Bug target/94173] [RISCV] Superfluous stackpointer manipulation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94173 --- Comment #3 from Jim Wilson --- I was looking at the rv32 output. For the rv64 compiler, you need to use aligned(16).
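
A small sketch of the size/alignment relationship described in this thread. The unattributed struct Pair layout is assumed from the aligned variant quoted in comment #1, and aligning to 2 * sizeof (char *) is just a way to write aligned(8) on rv32 and aligned(16) on rv64 in one declaration; the helper function names are made up for illustration.

```c
#include <stddef.h>

/* Assumed original layout: two pointers with natural alignment, so the
   struct is twice as large as its alignment. */
struct Pair { char *s; char *t; };

/* Over-aligned variant: alignment equals the full size, which is
   aligned(8) on a 32-bit target and aligned(16) on a 64-bit target. */
struct PairAligned { char *s; char *t; }
    __attribute__ ((aligned (2 * sizeof (char *))));

size_t pair_align(void)         { return _Alignof(struct Pair); }
size_t pair_size(void)          { return sizeof(struct Pair); }
size_t pair_aligned_align(void) { return _Alignof(struct PairAligned); }
size_t pair_aligned_size(void)  { return sizeof(struct PairAligned); }
```

Only the over-aligned variant has alignment equal to its size, which is the property that lets a strict-alignment target treat the whole struct as one double-word value.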
[Bug c++/64697] C++11 thread_local: relocation truncated to fit: R_X86_64_PC32 against undefined symbol `TLS init function for N::ptd'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64697

Jim Wilson changed:

What|Removed |Added
CC||wilson at gcc dot gnu.org

--- Comment #22 from Jim Wilson ---
This looks like a binutils bug to me. A call to an undefined weak function should never be executed, so it is OK for the linker to convert that call instruction into anything convenient. There is no need for a relocation that can reach an address of zero. We can convert the call instruction to call itself, or the next instruction, or change it to a nop, whatever is convenient, it doesn't really matter.

A number of binutils ports already have code to handle related problems. ARM and RISC-V for sure. Probably others. It looks like this support is missing from the x86_64 port. I'd suggest refiling this as a binutils bug.

See for instance https://sourceware.org/bugzilla/show_bug.cgi?id=23244 for a RISC-V example of the same problem. But we need a new bug for the x86_64 problem. RISC-V has a register hard wired to zero, so I rewrite the call instruction to use x0 as the base address. The arm port turns the call into a nop.
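
For readers unfamiliar with the pattern, this is a minimal sketch of a guarded call to an undefined weak function, assuming an ELF target and GCC's weak attribute. The name optional_init is hypothetical and deliberately has no definition anywhere.

```c
/* Hypothetical optional hook.  It is declared weak and never defined,
   so on ELF targets its address resolves to zero at link time. */
extern void optional_init(void) __attribute__((weak));

/* Call the hook only if the linker found a definition.
   Returns 1 if the hook ran, 0 if the symbol was undefined. */
int run_optional_init(void)
{
    if (optional_init) {
        optional_init();   /* never executed when the symbol is undefined */
        return 1;
    }
    return 0;
}
```

The call instruction inside the guarded branch still exists in the object file and still carries a relocation against the undefined symbol, even though it can never execute. That relocation is what the linker is free to rewrite into something harmless.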
[Bug c++/64697] C++11 thread_local: relocation truncated to fit: R_X86_64_PC32 against undefined symbol `TLS init function for N::ptd'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64697 --- Comment #24 from Jim Wilson --- Joel Sherrill offered to create a binutils bug report for this.
[Bug bootstrap/92008] Build failure on cygwin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92008

Jim Wilson changed:

What|Removed |Added
CC||wilson at gcc dot gnu.org

--- Comment #21 from Jim Wilson ---
This looks the same as a binutils bug. https://sourceware.org/bugzilla/show_bug.cgi?id=22941

The easy solution is to touch intl/plural.c after checkout, so that bison won't be run. The contrib/gcc_update script already does this. So the simplest solution for the original problem is to use contrib/gcc_update to update a gcc tree, or "contrib/gcc_update --touch" if you want to fix a gcc tree without updating it. If your gcc git source tree was already mangled by a bad bison run, you will have to manually reset it to a clean tree, e.g. "git reset --hard" or "git diff > tmp.file; patch -p1 --reverse < tmp.file; rm tmp.file" or whatever, and then run the contrib/gcc_update --touch command. Binutils unfortunately does not have an equivalent to the gcc_update script and hence requires a fix.

git unfortunately does not preserve file timestamps across commit and checkout, so when you check out a file it gets the current time. git also tends to check out files in alphabetical order. If you are on a fast filesystem, e.g. Linux, plural.c and plural.y almost always get the same timestamp and bison isn't run. If you are on a slow filesystem, e.g. cygwin, plural.c is often older than plural.y, and bison must be run, and the current bison version fails. This is why it is cygwin folk that most commonly run into this problem.
[Bug target/94950] [8/9/10 regression] ICE in gcc.dg/pr94780.c on riscv64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94950 --- Comment #5 from Jim Wilson --- I tested it with an rv64gc-linux cross compiler. The patch fixes these failures: FAIL: gcc.dg/pr94780.c (internal compiler error) FAIL: gcc.dg/pr94780.c (test for excess errors) FAIL: gcc.dg/pr94842.c (internal compiler error) FAIL: gcc.dg/pr94842.c (test for excess errors) There are no regressions. I think it should be backported to the gcc-10 release branch.
[Bug target/94950] [8/9 regression] ICE in gcc.dg/pr94780.c on riscv64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94950 Jim Wilson changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #9 from Jim Wilson --- Fixed on mainline and gcc-10 branch.
[Bug target/94780] [8/9 Regression] ICE in walk_body at gcc/tree-nested.c:713 since r6-3632-gf6f69fb09c5f81df
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94780 Bug 94780 depends on bug 94950, which changed state. Bug 94950 Summary: [8/9 regression] ICE in gcc.dg/pr94780.c on riscv64 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94950 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug target/95115] [10 Regression] RISC-V 64: inf/inf division optimized out, invalid operation not raised
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95115

Jim Wilson changed:

What|Removed |Added
CC||wilson at gcc dot gnu.org

--- Comment #3 from Jim Wilson ---
Marc Glisse's testcase fails even with old gcc versions. My x86_64 Ubuntu 16.04 gcc-5.4.0 also removes the divide with -O. My Ubuntu 18.04 gcc-7.5.0 gives the same result. This seems to be simple constant folding that we have always done. The assumption here seems to be that if the user is dividing constants, then we don't need to worry about setting exception bits. If I write (4.0 / 3.0) for instance, the compiler just folds it and doesn't worry about setting the inexact bit.

Aurelien Jarno's testcase in the attachment is more interesting, as that works with older gcc versions, just not gcc-10. I did a bisect, and tracked this down to Richard Biener's patch for pr83518. It looks like the glibc code was obfuscated a bit to try to avoid the usual trivial constant folding, and the patch for pr83518 just made gcc smart enough to recognize that constants are involved, and then optimize this case the same way we have always optimized FP constant divides.

Newlib incidentally uses (x-x)/(x-x) where x is the input value, so there are no constants involved, and the divide does not get optimized away. This still works with gcc-10. The result is a subtract followed by a divide.

At first glance, this looks more like a glibc problem to me than a gcc problem. But maybe the fact that constants were written to memory and then read back in should prevent the usual trivial FP constant divide folding.

I can almost make the glibc testcase work if I mark the unions as volatile. That prevents the union reads and writes from being optimized away, but the divide gets moved after the fetestexcept call. That looks like a gcc bug, though I think a different problem than this PR. The 234t.optimized dump is correct. The 236r.expand dump is wrong. This happens for both x86_64 and RISC-V.
The resulting code is bigger than what the newlib trick generates though.
[Bug target/95252] testcase gcc.dg/torture/pr67916.c failure when testing with -msave-restore
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95252

Jim Wilson changed:

What|Removed |Added
Ever confirmed|0 |1
Last reconfirmed||2020-05-25
Status|UNCONFIRMED |NEW

--- Comment #1 from Jim Wilson ---
It appears to be failing in the rename register (rnreg) pass. This is because the unspec patterns for the save/restore calls don't mention the registers that they use/modify. This confuses rename reg into thinking that live regs are dead, and it accidentally clobbers them before the save call. This worked OK when save/restore calls could only be at the beginning or end of a function. But now that this works with tail calls and shrink wrapping, we can get them in inner blocks.

Since the different save/restore calls use/modify different sets of registers, fixing this gets a little complicated. Maybe we can just use the max list of registers because listing extra ones shouldn't matter?

Another solution is to disable the rename register pass when -msave-restore is used. This isn't doing any checking for whether regs can be used in compressed instructions or not. This is currently encoded in REG_ALLOC_ORDER which this pass doesn't use. The result is that this is probably increasing code size which is undesirable when -msave-restore is used. Disabling this would reduce code size and fix the -msave-restore problem.

The rename register pass does use the PREFERRED_RENAME_CLASS hook that we haven't defined. We should try defining this to convert register classes to subsets that only include the regs that can be used in compressed instructions. This might result in a code size decrease. If this works, then the rename reg pass should not be disabled, and we should find a way to fix the save/restore pattern register lists instead.

I need to do some builds and experiments to verify this info.
[Bug target/95252] testcase gcc.dg/torture/pr67916.c failure when testing with -msave-restore
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95252

--- Comment #3 from Jim Wilson ---
I tried both. Turning off register renaming works. It gives a code size decrease of about 0.003% for the libraries I looked at which can be ignored. This probably also reduces performance; I didn't check that. I think it would be better to leave register renaming on and define the PREFERRED_RENAME_CLASS hook.

Adding uses to the gpr_save pattern also works for the testcase. I just added uses for all of the saved regs. We shouldn't need an exact list, because there is little or no code before the prologue, and the prologues are added late. An exact list would be cleaner if you want to try to do that. I also needed to fix riscv_remove_unneeded_save_restore_calls to ignore the prologue_matched insn when checking for USEs to avoid gcc testsuite regressions. I now have 3 g++ testsuite regressions I haven't looked at yet.

FAIL: g++.dg/torture/stackalign/throw-1.C -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions -fpic execution test
FAIL: g++.dg/torture/stackalign/throw-2.C -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
FAIL: g++.dg/torture/stackalign/throw-2.C -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions -fpic execution test
[Bug target/95252] testcase gcc.dg/torture/pr67916.c failure when testing with -msave-restore
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95252 --- Comment #4 from Jim Wilson --- Created attachment 48624 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48624&action=edit disable reg rename when -msave-restore the code using MASK_SAVE_RESTORE is just for testing purposes
[Bug target/95252] testcase gcc.dg/torture/pr67916.c failure when testing with -msave-restore
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95252 --- Comment #5 from Jim Wilson --- Created attachment 48625 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48625&action=edit add uses to gpr_save pattern the code using MASK_SAVE_RESTORE is just for testing purposes unfinished, adds 3 new g++ testsuite failures
[Bug target/95252] testcase gcc.dg/torture/pr67916.c failure when testing with -msave-restore
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95252 --- Comment #7 from Jim Wilson --- I've got 3 new g++ testsuite failures. So we might still need an exact list of USEs. I hadn't thought about RVE. That will have to be checked also. RV32/RV64 shouldn't matter, as the mode in the USEs doesn't matter. Unless maybe you want to use a multi-word load to match multiple registers with a single USE to reduce the size of the patterns, in which case it would need to be different for rv32 and rv64. If we do need an exact list of USEs, maybe we can use a match_parallel to simplify the patterns.
[Bug target/84553] -rdynamic generates TEXTREL relocations on ia64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84553

--- Comment #7 from Jim Wilson ---
I was ia64 maintainer when I wrote the patch, but couldn't test it. I'm not the ia64 maintainer anymore. I suggest asking the current ia64 maintainer. Though, oops, I see we don't have one listed in the MAINTAINERS file. I thought we had appointed one. I'm a global maintainer, but that doesn't give me power to approve my own patch for things I don't maintain anymore. I'm hopelessly overcommitted on RISC-V issues so unlikely to have time to do anything here.
[Bug target/95637] Read-only data assigned to `.sdata' rather than `.rodata'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95637 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #1 from Jim Wilson --- The RISC-V backend puts small read-only data in the srodata section. RISC-V is not the only target that supports srodata. I agree that this might be surprising for targets with memory protection that are expecting writes to read-only data to trap but I don't think that standards require traps here. And for targets without memory protection this is a useful code size and performance optimization. We could perhaps disable srodata support for the riscv linux and freebsd targets. I think those are the only ones with memory protection that we support. Maybe make this controlled by an option so people can choose between getting traps and getting smaller faster code.
[Bug target/95632] Redundant zero extension
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95632

Jim Wilson changed:

What|Removed |Added
Last reconfirmed||2020-06-12
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1

--- Comment #1 from Jim Wilson ---
We sign extend HImode constants as that is the natural thing to do to make arithmetic work. This does mean that unsigned short logical operations need a zero extend after the operation which might otherwise be unnecessary. This can't be handled at rtl generation time as we don't know if the constant will be used for arithmetic or logicals or signed or unsigned. But maybe an optimization pass could go over the code and convert HImode constants to signed or unsigned as appropriate to reduce the number of sign/zero extend operations. We have the ree pass that we might be able to extend to handle this.

Handling this in combine requires a 4->3 splitter which is something combine doesn't do. We could work around that by not splitting constants before combine, but that would be a major change and probably not beneficial, as we wouldn't be able to easily optimize the high part of the constants anymore.

Another approach here might be to split the xor along with the constant. If we generated something like

    srli a0,a0,1
    xori a0,a0,1
    li   a5,-24576
    xor  a0,a0,a5

then we can optimize away the following zero extend with a 3->2 splitter which combine already supports via find_split_point. We can still optimize the high part of the constant. Since the immediates are sign extended, if the low part of the immediate has the sign bit set, we would have to invert the high part of the immediate to get the right result. At least I think that works, I haven't double checked it yet. This only works for or if the low part doesn't have the sign bit set. And this only works for and if the low part does have the sign bit set.
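
The arithmetic behind splitting the xor can be checked with a short C sketch. This is only an illustration of the identity, not the proof of concept patch; the function names are made up, and 32-bit values are assumed. The low part is the 12-bit sign-extended immediate that xori would take, and the high part is whatever remains after xoring the low part out, which inverts the upper bits exactly when the low part's sign bit is set.

```c
#include <stdint.h>

/* Low 12 bits of c, sign extended the way an xori immediate is. */
int32_t low12(uint32_t c)
{
    int32_t lo = (int32_t)(c & 0xfff);
    if (lo & 0x800)
        lo -= 0x1000;          /* sign bit set: value is negative */
    return lo;
}

/* Constant for the second xor.  Its low 12 bits are always zero, and
   its upper bits are inverted when low12(c) is negative. */
uint32_t high_part(uint32_t c)
{
    return c ^ (uint32_t)low12(c);
}

/* x ^ c computed as two xors, mirroring xori followed by xor. */
uint32_t xor_split(uint32_t x, uint32_t c)
{
    return (x ^ (uint32_t)low12(c)) ^ high_part(c);
}
```

With c = 0xFFFFA001 (the sign-extended constant implied by the sequence above), low12 gives 1 and high_part gives 0xFFFFA000, which is -24576. With c = 0x1800 the low part is -2048, so the high part becomes 0xFFFFE000 rather than 0x1000.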
[Bug target/95637] Read-only data assigned to `.sdata' rather than `.rodata'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95637

--- Comment #3 from Jim Wilson ---
People have asked about constant pools before, but as far as I know no one has tried to implement support for them yet. We don't have a pc-relative load, so it would be a two instruction sequence with auipc. Unless maybe you load the base address into a register, which is probably OK for rvi but may cause register pressure problems for rve. We have a 12-bit signed offset, +/-2K which limits the range we can address if you want to put the base address in a register. There could also be complications with the aggressive link time code relaxations that we do, depending on where you put the constant pools and how you use them. It isn't clear if constant pools are better or worse than what we already have.
[Bug target/95632] Redundant zero extension
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95632 --- Comment #3 from Jim Wilson --- It isn't possible to have patterns that match only in combine. If we add a pattern to accept (xor (reg) (large constant)) then it could match in any optimization pass, and could prevent us from optimizing away redundant lui instructions. There is a representation issue here with constants. If we split them early, then optimizing redundant lui is easy. If we split them late, then optimizing redundant lui is hard. There are also other optimizations that may be easy or hard depending on whether constants are split early or late. Currently, we always split constants early, and changing that will have a major impact on the code optimization, which may be good or bad, but more likely will be good for some programs and bad for others. I'd rather not change this as it will be a major project to deal with the problems caused by the change. Hence my suggestion at RTL generation time to split xor with constants differently. I have a proof of concept patch for that, but it needs a lot of cleanup to be useful, and a lot of testing to verify that it improves code more often than it harms code. As for ree, splitters after register allocation traditionally check reload_completed which is a global variable set near the end of the last register allocation pass. The split2 pass happens between reload and ree. Maybe moving ree before split2 would help RISC-V, but might hurt other targets. Or might help for some programs and hurt for others.
[Bug target/95632] Redundant zero extension
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95632 --- Comment #4 from Jim Wilson --- Created attachment 48737 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48737&action=edit proof of concept patch for changing xor with a large constant needs cleanup and testing to be useful
[Bug tree-optimization/95685] Loop invariants can't be moved out of the loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95685

--- Comment #1 from Jim Wilson ---
The problem with the constant isn't apparent until we reach RTL generation and see that it requires two instructions to load. Then once in RTL optimization passes we have mostly block local optimizations that aren't going to notice the same constant used in 3 different blocks and optimize it. The if statement inside the unrolled loop bodies prevents RTL optimization passes from fixing this. So yes, this would work better if we could do loop invariant code motion before loop unrolling as you suggested.
[Bug tree-optimization/95760] ivopts with loop variables
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95760

Jim Wilson changed:

What|Removed |Added
CC||wilson at gcc dot gnu.org

--- Comment #1 from Jim Wilson ---
You are compiling with -Os. I get the expected result if I compile with -O2.

Looking at tree dumps, I see the first difference between -O2 and -Os dumps is in the ch2 (copy loop header 2) pass, which explicitly disables loop header copying when -Os is used. Note the optimize_loop_for_size_p check in should_duplicate_loop_header_p in tree-ssa-loop-ch.c. You can see the difference if you add -fdump-tree-ch2-all. In the -O2 ch2 dump file, I see

Loop 1 is not do-while loop: latch is not empty.
Will duplicate bb 7
Not duplicating bb 3: it is single succ.
Duplicating header of the loop 1 up to edge 7->3, 4 insns.
Loop 1 is do-while loop
Loop 1 is now do-while loop.

and in the -Os ch2 dump file, I see

Loop 1 is not do-while loop: latch is not empty.
Not duplicating bb 7: optimizing for size.

The difference in loop optimization here then affects the later ivopt pass.

Normally, duplicating basic blocks will make code bigger. But in this case the duplicated blocks enable better loop optimization which results in smaller code at the end. This kind of thing is hard to handle with the heuristics. We would have to optimize both ways and check to see which one is smaller at the end to get this right every time, and the compiler doesn't work that way currently.

I haven't checked older sources to see if/when a heuristic changed.

This isn't risc-v specific. I see the same issue with x86_64.
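
As a rough source-level picture of what loop header copying does, the sketch below shows a while loop and the equivalent guarded do-while form the pass aims for. This is an illustration only, not the reporter's testcase, and the function names are made up.

```c
/* Before header copying: the exit test sits in the loop header, so the
   latch has to jump back to the header on every iteration. */
int sum_while(const int *a, int n)
{
    int s = 0, i = 0;
    while (i < n) {
        s += a[i];
        i++;
    }
    return s;
}

/* After header copying: the header test is duplicated as a guard in
   front of the loop, and the loop itself becomes a do-while with the
   test at the bottom.  One extra copy of the test, simpler loop. */
int sum_rotated(const int *a, int n)
{
    int s = 0, i = 0;
    if (i < n) {
        do {
            s += a[i];
            i++;
        } while (i < n);
    }
    return s;
}
```

The duplicated test is the code size cost that -Os refuses to pay, even though the do-while shape is what lets later loop passes such as ivopts do a better job.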
[Bug tree-optimization/95760] ivopts with loop variables
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95760

--- Comment #2 from Jim Wilson ---
I took another look, and it turns out that the should_duplicate_loop_header_p for size/speed is not the only issue. There is also an issue in tree-ssa-loop-ivopts.c when computing iv costs. With speed, the +4 iv is computed as cheaper than the +1 iv. With size, the +4 iv and +1 iv have the exact same cost, and since the +1 iv was looked at first that one was chosen. If I hack adjust_setup_cost to always use the speed cost calculation, and retain the should_duplicate_loop_header_p hack, then both the inner and outer loops get the +4 iv with -Os.

Looking at gcc-8.3, I see that the outer loop has the +4 iv and the inner loop has the +1 iv. This looks similar to the result I get with the adjust_setup_cost hack but not the should_duplicate_loop_header_p hack. So I think the regression is solely due to some change in the cost calculation. There is a lot of code involved in cost calculations. This could have even been a riscv backend change. I would suggest doing a bisect over the gcc git tree if you want to see exactly where and how the cost calculation changed.

The -Os and -O2 optimization diverges in try_improve_iv_set where it does "if (acost < best_cost)". Maybe this could be improved to handle the case where acost == best_cost, and use some other criteria to choose which one is better, e.g. maybe a giv is better than a biv if they have the same cost. I haven't tried looking into this.
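
The tie-break idea can be sketched in isolation. Everything below is hypothetical: the struct, its fields, and both functions are invented for illustration and do not correspond to the real data structures in tree-ssa-loop-ivopts.c. It only shows how a strict "<" keeps the first candidate seen on a tie, and how a secondary criterion could change that.

```c
#include <stddef.h>

/* Hypothetical candidate record: a cost plus a flag saying whether the
   candidate is a general iv (giv) rather than a basic iv (biv). */
struct candidate {
    unsigned cost;
    int is_giv;
};

/* Strict comparison, as in "if (acost < best_cost)": when two
   candidates tie, whichever was looked at first stays selected. */
size_t pick_strict(const struct candidate *c, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (c[i].cost < c[best].cost)
            best = i;
    return best;
}

/* Same search with a tie-break: on equal cost, prefer a giv over a biv. */
size_t pick_with_tiebreak(const struct candidate *c, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        int cheaper = c[i].cost < c[best].cost;
        int tie_but_better = c[i].cost == c[best].cost
                             && c[i].is_giv && !c[best].is_giv;
        if (cheaper || tie_but_better)
            best = i;
    }
    return best;
}
```

With two candidates of equal cost where only the second is a giv, pick_strict returns index 0 and pick_with_tiebreak returns index 1. Whether giv-over-biv is actually the right criterion is exactly the open question in the comment above.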
[Bug target/96026] overlap register bewteen DEST and SOURCE in different machine mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96026 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #1 from Jim Wilson --- This is 3 different places where you have asked the same question now. One place would have been good enough. Already answered in other places.
[Bug target/96191] New: aarch64 stack_protect_test canary leak
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96191

Bug ID: 96191
Summary: aarch64 stack_protect_test canary leak
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: wilson at gcc dot gnu.org
Target Milestone: ---

Given a simple testcase

extern int sub (int);

int main (void)
{
  sub (10);
  return 0;
}

compiling with -O -S -fstack-protector-all -mstack-protector-guard=global, in the epilogue for the canary check I see

    ldr  x1, [sp, 40]
    ldr  x0, [x19, #:lo12:__stack_chk_guard]
    eor  x0, x1, x0
    cbnz x0, .L4

Both x0 and x1 have the stack protector canary loaded into them, and the eor clobbers x0, but x1 is left alone. This means the value of the canary is leaking from the epilogue. The canary value is never supposed to survive in a register outside the stack protector patterns.

A powerpc64-linux toolchain build with the same testcase and options generates

    lwz  9,28(1)
    lwz  10,0(31)
    xor. 9,9,10
    li   10,0
    bne- 0,.L4

and note that it clears the second register after the xor to prevent the canary leak. The aarch64 stack_protect_test pattern should do the same thing.
[Bug target/96191] aarch64 stack_protect_test canary leak
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96191 --- Comment #3 from Jim Wilson --- The location of the canary is not known to the attacker. You are not supposed to leak the address of the canary or the value of the canary. If you leak either, then an attacker has a chance to restore the canary after clobbering it. See the descriptions of the stack_protect_set and stack_protect_test patterns in gcc/doc/md.texi which make clear that no intermediate values should be allowed to survive past the end of the pattern.
[Bug sanitizer/96307] [10/11 Regression] ICE in sanopt on riscv64 since r11-2283-g2ca1b6d009b194286c3ec91f9c51cc6b0a475458
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96307 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #2 from Jim Wilson --- It is calling targetm.asan_shadow_offset which is a null function pointer currently for RISC-V. This is related to Kito's recent patch to re-enable ksan support when asan_shadow_offset isn't defined. But it looks like there are multiple params that can cause asan_shadow_offset to be called for ksan when it normally isn't. So this change may need to be removed. Good news is that we have a patch to add asan support for RISC-V which would make Kito's toplev.c patch unnecessary for us.
[Bug bootstrap/97183] New: zstd build failure for gcc 10 on Ubuntu 16.04
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97183 Bug ID: 97183 Summary: zstd build failure for gcc 10 on Ubuntu 16.04 Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: bootstrap Assignee: unassigned at gcc dot gnu.org Reporter: wilson at gcc dot gnu.org Target Milestone: --- This was originally reported here https://github.com/riscv/riscv-gnu-toolchain/issues/718 A build of gcc 10 on Ubuntu 16.04 with the libzstd-dev package installed gives multiple errors. Ubuntu 16.04 has zstd version 0.5.1 which lacks features that gcc is trying to use. The gcc configure test for zstd.h only verifies that it exists; it doesn't verify the version, or that any of the functions or macros we need are present. Ubuntu 18.04 has zstd version 1.3.3, and I verified that it builds. So we can maybe verify the version is 1.3.3 or greater, or maybe check for the specific functions and macros that we are trying to use. Kito did a little research that suggests that we need version 1.3.0 or greater. We haven't tried to verify that. --without-zstd successfully works around the problem. build log info from the original bug report: g++ -fno-PIE -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -DHAVE_CONFIG_H -I. -I. -I../.././riscv-gcc/gcc -I../.././riscv-gcc/gcc/.
-I../.././riscv-gcc/gcc/../include -I../.././riscv-gcc/gcc/../libcpp/include -I../.././riscv-gcc/gcc/../libdecnumber -I../.././riscv-gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I../.././riscv-gcc/gcc/../libbacktrace -o optabs-tree.o -MT optabs-tree.o -MMD -MP -MF ./.deps/optabs-tree.TPo ../.././riscv-gcc/gcc/optabs-tree.c ../.././riscv-gcc/gcc/lto-compress.c: In function ‘int lto_normalized_zstd_level()’: ../.././riscv-gcc/gcc/lto-compress.c:120:36: error: ‘ZSTD_maxCLevel’ was not declared in this scope else if (level > ZSTD_maxCLevel ()) ^ ../.././riscv-gcc/gcc/lto-compress.c: In function ‘void lto_uncompression_zstd(lto_compression_stream*)’: ../.././riscv-gcc/gcc/lto-compress.c:160:74: error: ‘ZSTD_getFrameContentSize’ was not declared in this scope unsigned long long const rsize = ZSTD_getFrameContentSize (cursor, size); ^ ../.././riscv-gcc/gcc/lto-compress.c:161:16: error: ‘ZSTD_CONTENTSIZE_ERROR’ was not declared in this scope if (rsize == ZSTD_CONTENTSIZE_ERROR) ^ ../.././riscv-gcc/gcc/lto-compress.c:163:21: error: ‘ZSTD_CONTENTSIZE_UNKNOWN’ was not declared in this scope else if (rsize == ZSTD_CONTENTSIZE_UNKNOWN) ^
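The version check suggested in the report could look roughly like the sketch below. It mirrors the way zstd.h packs its version into ZSTD_VERSION_NUMBER (major*100*100 + minor*100 + release). The helper names and the 1.3.0 threshold are assumptions for illustration only; the report leaves open whether 1.3.0 or 1.3.3 is the right minimum, and this is not the actual configure test.

```c
/* Hypothetical helper: pack major.minor.release the same way zstd.h
   builds ZSTD_VERSION_NUMBER.  */
static unsigned
pack_zstd_version (unsigned major, unsigned minor, unsigned release)
{
  return major * 100 * 100 + minor * 100 + release;
}

/* Assumed minimum version (1.3.0); unverified, per the report.  */
#define MIN_ZSTD_VERSION_NUMBER 10300u

/* Return nonzero if the given zstd version meets the assumed minimum.  */
static int
zstd_version_ok (unsigned major, unsigned minor, unsigned release)
{
  return pack_zstd_version (major, minor, release) >= MIN_ZSTD_VERSION_NUMBER;
}
```

With this rule the Ubuntu 16.04 package (0.5.1) would be rejected and the Ubuntu 18.04 package (1.3.3) accepted. A real configure test would more likely compare ZSTD_VERSION_NUMBER in a preprocessor check, or probe for the specific functions and macros named in the error log.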
[Bug libstdc++/97182] Add support for targets that only define SYS_futex_time64 and not SYS_futex
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97182 --- Comment #3 from Jim Wilson --- libgomp works on riscv64-linux. It would only be riscv32-linux that is broken. The riscv32 support was only just recently added to FSF glibc, and hasn't appeared in a release yet, so arguably, there is no ABI break for riscv32-linux if we can fix this before the gcc-11 release, as that is the first one that can officially support riscv32-linux. Unofficially we have embedded linux distros with riscv32-linux but they should be able to tolerate ABI breaks, particularly since we never guaranteed that the riscv32-linux ABI was stable. This will be a problem for other 32-bit targets though when they enable 64-bit time_t support.
[Bug middle-end/92263] [10 Regression] ICE in commit_one_edge_insertion, at cfgrtl.c:2087 since r270758
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92263 --- Comment #3 from Jim Wilson --- I did a cross compiler build and check yesterday using up-to-date sources and did not see this failure. I've been testing regularly. I did my build on an x86_64 Ubuntu 16.04 machine with gcc-5.4 as the system compiler. Maybe this depends on the compiler used for the build? Or the exact configure command?
[Bug middle-end/92263] [10 Regression] ICE in commit_one_edge_insertion, at cfgrtl.c:2087 since r270758
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92263 --- Comment #4 from Jim Wilson --- OK, I get it now. You are using non-standard optimization options with a testsuite testcase. I can reproduce when I use your compiler options. I will take a look.
[Bug middle-end/92263] [10 Regression] ICE in commit_one_edge_insertion, at cfgrtl.c:2087 since r270758
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92263 --- Comment #5 from Jim Wilson --- The patch adds a RISC-V movcc pattern. This causes toplev.c to enable flag_tree_cselim. This optimization pass creates a complex long double conditional move via a phi node. complex long double cstore_31; ... [local count: 27903866]: cstore_30 = MEM [(void *)_8]; [local count: 55807731]: # cstore_31 = PHI <__complex__ (0.0, 0.0)(4), cstore_30(5)> MEM [(void *)_8] = cstore_31; When we try to convert gimple to rtl, eliminate_phi calls insert_value_copy_on_edge for the 32-byte long double 0 value. The constant then gets forced to memory, and we end up calling emit_block_move with BLOCK_OP_NO_LIBCALL, which ends up emitting a loop to do the memory to memory copy. Then later in commit_one_edge_insertion we split the edge, insert the code containing the loop, and then trigger an abort because the last instruction inserted is the loop back branch. I don't see where the RISC-V port did anything wrong. The load hoisting code is checking the movcc optab to see if the target supports the operation, but I don't see anything obvious like that in the cselim pass. The only obvious fix I see in the RISC-V back end is to modify riscv_expand_block_move to emit inline non-loop code for a 32-byte memory to memory copy, even when optimizing for size, which I'd rather not do. Maybe it can be fixed in commit_one_edge_insertion by allowing conditional branches but not unconditional branches, but it isn't clear why this is refusing to allow branches here in the first place. I will have to look at other targets to see why they aren't failing.
[Bug middle-end/92263] [10 Regression] ICE in commit_one_edge_insertion, at cfgrtl.c:2087 since r270758
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92263 --- Comment #6 from Jim Wilson --- Looking at some other targets. ARM has movcc but not 128-bit long double. AArch64 has movcc and 128-bit long double, but has 128-bit load/store so this is only 4 instructions. mips64, powerpc64, and sparc64 have movcc and 128-bit long double, but emit the memcpy inline as 8 instructions. riscv64 meanwhile wants the libcall with -Os as that is 4 instructions instead of 8. For rv32 this would be 16 instructions. I'm not sure offhand if the other targets support 32-bit code and 128-bit long double. Anyways, I tracked the use of BLOCK_OP_NO_LIBCALL in emit_move_complex back to bugzilla 15289, fixed by a patch from Richard Henderson on Dec 1 2004. I think it is just an oversight that -Os wasn't considered here. I think the correct fix is to only force BLOCK_OP_NO_LIBCALL when optimizing for speed. With this change, I get the 8 instruction sequence with -O2, and the 4 instruction libcall sequence with -Os, which is what the RISC-V backend wants, and this lets the testcase work.
[Bug middle-end/92263] [10 Regression] ICE in commit_one_edge_insertion, at cfgrtl.c:2087 since r270758
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92263 Jim Wilson changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |wilson at gcc dot gnu.org --- Comment #7 from Jim Wilson --- Created attachment 47139 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47139&action=edit untested proposed fix
[Bug middle-end/92263] [10 Regression] ICE in commit_one_edge_insertion, at cfgrtl.c:2087 since r270758
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92263 --- Comment #8 from Jim Wilson --- Author: wilson Date: Tue Nov 5 22:34:40 2019 New Revision: 277861 URL: https://gcc.gnu.org/viewcvs?rev=277861&root=gcc&view=rev Log: Allow libcalls for complex memcpy when optimizing for size. The RISC-V backend wants to use a libcall when optimizing for size if more than 6 instructions are needed. Emit_move_complex asks for no libcalls. This case requires 8 insns for rv64 and 16 insns for rv32, so we get fallback code that emits a loop. Commit_one_edge_insertion doesn't allow code inserted for a phi node on an edge to end with a branch, and so this triggers an assertion. This problem goes away if we allow libcalls when optimizing for size, which gives the code the RISC-V backend wants, and avoids triggering the assert. gcc/ PR middle-end/92263 * expr.c (emit_move_complex): Only use BLOCK_OP_NO_LIBCALL when optimize_insn_for_speed_p is true. gcc/testsuite/ PR middle-end/92263 * gcc.dg/pr92263.c: New. Added: trunk/gcc/testsuite/gcc.dg/pr92263.c Modified: trunk/gcc/ChangeLog trunk/gcc/expr.c trunk/gcc/testsuite/ChangeLog
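The decision described in the commit log can be modeled with a small standalone sketch. This is a toy model, not the code in expr.c: the enums, function names, and the way the backend's instruction budget is passed in are all hypothetical, chosen only to show why forbidding the libcall at -Os led to the loop fallback.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the block-move request and its outcome.  */
enum block_op { OP_NORMAL, OP_NO_LIBCALL };
enum expansion { EXPAND_INLINE, EXPAND_LIBCALL, EXPAND_LOOP };

/* The fix in a nutshell: only forbid the libcall when optimizing
   for speed.  */
static enum block_op
choose_block_op (bool optimize_for_speed)
{
  return optimize_for_speed ? OP_NO_LIBCALL : OP_NORMAL;
}

/* Toy backend: expand inline when optimizing for speed or when the copy
   fits the size budget; otherwise use a libcall if allowed, else fall
   back to a loop.  */
static enum expansion
expand_copy (enum block_op op, bool optimize_for_speed,
             int insns_needed, int size_budget)
{
  if (optimize_for_speed || insns_needed <= size_budget)
    return EXPAND_INLINE;
  return op == OP_NO_LIBCALL ? EXPAND_LOOP : EXPAND_LIBCALL;
}
```

Plugging in the numbers from the log (8 insns needed on rv64, a budget of 6 when optimizing for size), the old behavior corresponds to expand_copy (OP_NO_LIBCALL, false, 8, 6), which yields the loop that trips the assertion, while choose_block_op (false) now yields the libcall.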
[Bug middle-end/92263] [10 Regression] ICE in commit_one_edge_insertion, at cfgrtl.c:2087 since r270758
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92263 Jim Wilson changed: What|Removed |Added Status|NEW |ASSIGNED --- Comment #9 from Jim Wilson --- Fixed on mainline. Do you need a backport to the gcc-9 branch?
[Bug middle-end/92263] [10 Regression] ICE in commit_one_edge_insertion, at cfgrtl.c:2087 since r270758
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92263 Jim Wilson changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #10 from Jim Wilson --- Fixed on mainline. No backport requested.
[Bug bootstrap/92709] Cross Compilation failed for Latest GCC riscv64-linux-gnu on Linux/WSL2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92709 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #3 from Jim Wilson --- >make[4]: Entering directory >'/home/cqwrteur/gcc-riscv64-build/riscv64-linux-gnu/lib32/ilp32/libgcc' >make[4]: *** No rule to make target 'all'. Stop. Look in the riscv64-linux-gnu/lib32/ilp32/libgcc dir. There should be a Makefile that is a copy of the $srcdir/libgcc/Makefile.in file with some sed substitution. If this isn't true, then it is the configure step that failed. You can force the library dirs to reconfigure by deleting them. It gets a bit more complicated with multilibs, I think you have to delete the top level riscv64-linux-gnu/libgcc to force a reconfigure, but just a rm -rf riscv64-linux-gnu works too and is simpler, though more stuff will be rebuilt. You might want to do a -j1 make to get an easier to read build log. I don't have a Windows machine at work, but there is only one WSL problem that I have seen reported, and it is that WSL makes filesystems case-insensitive by default which is contrary to linux practice. It is known that this will break glibc builds which use .os and .oS for two different kinds of files. I don't think that this breaks gcc builds. But since you are trying to do a cross to riscv64-linux-gnu you will run into this problem if you haven't already.
[Bug target/93062] Failed to generate indirect branch for long branches on riscv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93062 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #1 from Jim Wilson --- We need to check length attributes in the branch patterns, and emit different sequences depending on the length. There are multiple examples to compare with, for instance the condjump pattern in the aarch64.md file.
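As a rough illustration of "different sequences depending on the length", the sketch below picks a branch form from a byte offset. The ranges are the architectural ones for RISC-V (conditional branches reach about +/-4 KiB, jal about +/-1 MiB). The enum and function names are hypothetical, and a real machine description would key off the insn length attribute rather than a raw offset.

```c
#include <stdint.h>

/* Hypothetical names for the three sequences a branch pattern could emit.  */
enum branch_form
{
  BRANCH_DIRECT,            /* bxx target */
  BRANCH_INVERTED_JAL,      /* inverted bxx over a jal to target */
  BRANCH_INVERTED_INDIRECT  /* inverted bxx over an indirect jump */
};

/* Choose a form from the signed byte offset to the target.  */
static enum branch_form
choose_branch_form (int64_t offset)
{
  /* B-type immediate: 13-bit signed, multiple of 2.  */
  if (offset >= -4096 && offset <= 4094)
    return BRANCH_DIRECT;
  /* J-type immediate: 21-bit signed, multiple of 2.  */
  if (offset >= -1048576 && offset <= 1048574)
    return BRANCH_INVERTED_JAL;
  /* Beyond jal range an indirect sequence is needed.  */
  return BRANCH_INVERTED_INDIRECT;
}
```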
[Bug testsuite/93045] gc bug with test "start_unit-test-1.c"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93045 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #1 from Jim Wilson --- This worked for me running natively on a fedora rawhide system with a 4.15 linux kernel.
[Bug target/93062] Failed to generate indirect branch for long branches on riscv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93062 Jim Wilson changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2019-12-25 Ever confirmed|0 |1
[Bug inline-asm/93202] [RISCV] ICE when using inline asm 'h' constraint modifier
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93202 --- Comment #2 from Jim Wilson --- %h is used for the gcc internal implementation of emitting auipc. I'm skeptical that it is useful for asms. Stripping the HIGH rtx is an internal implementation detail, and does not apply to asms, as you can't get a HIGH there. Is there a reason why you are trying to use it? There may be a better solution for what you need. If we really need %h to work in asms then it probably needs some inconvenient work. I'd rather document that %h shouldn't be used in asms, or leave it undocumented as an internal gcc implementation detail. I'm assuming that you are just working on llvm support, and don't actually need %h to work in asms, you just need llvm and gcc compatibility. riscv_print_operand does use output_operand_lossage as it should. But it calls a function riscv_print_operand_reloc which calls gcc_unreachable in a switch statement. That is an oversight. It can be fixed to use output_operand_lossage too.
[Bug inline-asm/93202] [RISCV] ICE when using inline asm 'h' constraint modifier
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93202 --- Comment #5 from Jim Wilson --- Jakub's patch looks OK to me.
[Bug inline-asm/93202] [RISCV] ICE when using inline asm 'h' constraint modifier
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93202 --- Comment #7 from Jim Wilson --- (In reply to Luís Marques from comment #3) > Jim Wilson: I'm not using it, I was only working on the LLVM implementation. > Could you please clarify if following modifiers are also internal only? > > 'C' Print the integer branch condition for comparison OP. > 'A' Print the atomic operation suffix for memory model OP. > 'F' Print a FENCE if the memory model requires a release. 'C' maps an rtx to a string. It is intended to be used for comparisons to emit the appropriate compare instruction, because the instruction names match the gcc internal rtx names. It can't be used for its intended purpose in an asm, as you can't get a comparison operator as an operand to an asm. Since it works with any rtx, it can be used in an asm, but is very unlikely to be useful. 'A' takes a memory order value from stdatomic.h, and emits a .acq if it is one of the memory orders that requires an acquire operation, e.g. __ATOMIC_ACQUIRE. Gcc calls this a memory model internally, and defines the values in memmodel.h. The primary use is for the atomic builtin functions, to map the memory order argument to the right instruction. This takes an integer argument, so in theory it could be used in an asm, but it is unlikely to be very useful. 'F' is similar, except it is for atomic releases, and emits a fence instruction. This one is a bit of a historical accident. The gcc riscv port was written before we had a formal memory model defined, and so to be conservative, it emits fences in a lot of places where we probably don't need them. Now that we do have a formal memory model defined, the gcc port needs to be fixed to implement it, except there is no one to do the work, so it is unclear when it will happen. Meanwhile, the port still emits a lot of fences we don't need via 'F'. This takes an integer argument as above, so likewise in theory could be used in an asm, but is unlikely to be useful.
And this one has the additional problem that it needs to change in a future gcc release, though we could preserve the current meaning of the 'F' letter and use a new letter if necessary in the rewrite. The useful print operand letters are the ones for registers, constants, and addresses. These random ones used for internal gcc features aren't really useful in asms.
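The mapping that 'A' and 'F' perform can be pictured as two predicates over the memory order value. The sketch below is only an illustration using the __ATOMIC_* macros that GCC and Clang predefine. The function names are hypothetical, and the exact set of orders treated as acquire or release (for example, how __ATOMIC_CONSUME is handled) is an assumption here, not a statement of what the port does.

```c
#include <stdbool.h>

/* Hypothetical predicate: does this memory order include acquire
   semantics?  This is the kind of test 'A' would make before printing
   an acquire suffix.  */
static bool
order_needs_acquire (int model)
{
  switch (model)
    {
    case __ATOMIC_ACQUIRE:
    case __ATOMIC_ACQ_REL:
    case __ATOMIC_SEQ_CST:
      return true;
    default:
      return false;
    }
}

/* Hypothetical predicate: does this memory order include release
   semantics?  This is the kind of test 'F' would make before printing
   a fence.  */
static bool
order_needs_release (int model)
{
  switch (model)
    {
    case __ATOMIC_RELEASE:
    case __ATOMIC_ACQ_REL:
    case __ATOMIC_SEQ_CST:
      return true;
    default:
      return false;
    }
}
```

Because both predicates take a plain integer, nothing stops an asm from passing a constant memory order as an operand, which matches the point above that these modifiers technically work in asms but are unlikely to be useful there.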
[Bug target/93304] RISC-V: Function with interrupt attribute use register without save/restore at prologue/epilogue
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93304 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #2 from Jim Wilson --- There is a convention of using all caps for function arguments. See for instance the riscv_build_integer function comment. It would be nice to preserve this convention, but this is a very minor issue. I usually put a blank line between the function comment and the function, but again this is a very minor issue. The patch looks OK to me.
[Bug target/93333] ICE: RTL check: expected code 'const_int', have 'and' in riscv_rtx_costs, at config/riscv/riscv.c:1645
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=9 Jim Wilson changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2020-01-21 Ever confirmed|0 |1 --- Comment #2 from Jim Wilson --- I can reproduce. Reproducing requires enabling rtl checking which is not on by default. I suspect that there are other similar problems, as we probably haven't tested a build with rtl checking enabled before. The problem is in riscv_rtx_costs which only needs to return valid values for valid rtl, and it is failing the rtl check for invalid rtl, so this isn't a major problem if rtl checking is off, but it does need to be fixed to be safe.
[Bug target/93333] ICE: RTL check: expected code 'const_int', have 'and' in riscv_rtx_costs, at config/riscv/riscv.c:1645
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=9 --- Comment #3 from Jim Wilson --- Jakub's patch looks OK, and works for the testcase.
[Bug target/93333] ICE: RTL check: expected code 'const_int', have 'and' in riscv_rtx_costs, at config/riscv/riscv.c:1645
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=9 --- Comment #4 from Jim Wilson --- I tried some cross testing with rtl checking enabled, and found another rtl check bug with the -msave-restore support in config/riscv/riscv-sr.c where it uses XINT to read from a CONST_INT which is wrong, as it is actually an XWINT value, and we should be using INTVAL to read the value. I've tested a patch for that, and can commit it tomorrow. -msave-restore is for embedded code size, so this shouldn't be a problem for linux users.
[Bug target/93333] ICE: RTL check: expected code 'const_int', have 'and' in riscv_rtx_costs, at config/riscv/riscv.c:1645
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=9 Jim Wilson changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #7 from Jim Wilson --- Fixed on mainline.
[Bug target/93304] RISC-V: Function with interrupt attribute use register without save/restore at prologue/epilogue
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93304 Jim Wilson changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #4 from Jim Wilson --- Fixed on mainline.
[Bug target/89627] Miscompiled constructor call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89627 Jim Wilson changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED Assignee|unassigned at gcc dot gnu.org |wilson at gcc dot gnu.org --- Comment #5 from Jim Wilson --- Was fixed in gcc 9 last year.
[Bug target/91602] GCC fails to build for riscv in a combined tree due to misconfigured leb128 support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91602 --- Comment #10 from Jim Wilson --- The proposed binutils patch has multiple problems and has gone through multiple iterations. Not clear when or if we will be able to accept it. The gcc configure patch to eliminate the call to gcc_GAS_VERSION_GTE_IFELSE for in tree gas builds actually looked like the better solution to me though I haven't tried it yet. If we go this way the patch should perhaps eliminate everything related to gcc_GAS_VERSION_GTE_IFELSE which is a much bigger patch. Since combined tree builds are obsolete, this is low on my priority list.
[Bug target/91602] GCC fails to build for riscv in a combined tree due to misconfigured leb128 support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91602 --- Comment #11 from Jim Wilson --- Since Marxin pinged this and got me thinking about this again, I realized that there is a simpler fix based on Serge's second suggestion. We can just delete the gas version number from the uleb128 gas check in configure.ac. This will force a gas feature check for uleb128 only, which solves the RISC-V build problem, and is a nice small change. I'm testing a patch for that.
[Bug target/93532] RISCV g++ hangs with optimization >= -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93532 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #9 from Jim Wilson --- I tried the buildroot instructions. It didn't work on an ubuntu 16.04 server machine. There is a 'python3 pip3 -q docwriter' command that hangs. I also discovered that the script isn't restartable. It runs rm -rf on the build directory and exits with an error. I did get it to work on my ubuntu 18.04 laptop. And it does hang, but it isn't the btPolyhedralContactClipping.cpp file that hangs for me, it is the btBoxBoxDetector.cpp file. I was able to reproduce this with a gcc-8.3.0 build using -O2 -fPIC -fstack-protector-strong options to compile the file. It does not reproduce using the top of the gcc-8-branch svn tree, suggesting that either it is already fixed, or it is maybe a memory corruption problem that is hard to reproduce. Using gdb to attach to the gcc-8.3.0 compiler, I see that it is looping in lra, but I haven't tried to debug that yet. #0 0x00705e7b in bitmap_find_bit (bit=42321, bit@entry=330, head=0x376ae88) at ../../gcc-8.3.0/gcc/bitmap.c:539 #1 bitmap_set_bit (head=0x376ae88, bit=bit@entry=42321) at ../../gcc-8.3.0/gcc/bitmap.c:600 #2 0x0099b95f in mark_regno_dead (regno=42321, mode=, point=) at ../../gcc-8.3.0/gcc/lra-lives.c:362 #3 0x0099c9c4 in process_bb_lives (dead_insn_p=false, curr_point=@0x7ffc9a90: 181876, bb=) at ../../gcc-8.3.0/gcc/lra-lives.c:842 #4 lra_create_live_ranges_1 (all_p=all_p@entry=true, dead_insn_p=dead_insn_p@entry=false) at ../../gcc-8.3.0/gcc/lra-lives.c:1337 #5 0x0099e7c0 in lra_create_live_ranges (all_p=all_p@entry=true, dead_insn_p=dead_insn_p@entry=false) at ../../gcc-8.3.0/gcc/lra-lives.c:1406 #6 0x00982d0c in lra (f=) at ../../gcc-8.3.0/gcc/lra.c:2473 #7 0x0093fa32 in do_reload () at ../../gcc-8.3.0/gcc/ira.c:5465 #8 (anonymous namespace)::pass_reload::execute (this=) at ../../gcc-8.3.0/gcc/ira.c:5649 ...
[Bug target/93532] RISCV g++ hangs with optimization >= -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93532 --- Comment #10 from Jim Wilson --- Created attachment 47774 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47774&action=edit testcase that reproduces for me compile with -O2 -fPIC -fstack-protector-strong
[Bug target/93532] RISCV g++ hangs with optimization >= -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93532 --- Comment #11 from Jim Wilson --- I'm able to reproduce with the gcc-8-branch now. Maybe I made a mistake with my earlier build. Anyways, it looks like it is going wrong here in the reload dump Creating newreg=1856, assigning class NO_REGS to save r1856 434: fa0:SF=call [`sqrtf'] argc:0 REG_UNUSED fa0:SF REG_CALL_DECL `sqrtf' REG_EH_REGION 0 Add reg<-save after: 2446: r114:SF#0=r1856:DF 432: NOTE_INSN_BASIC_BLOCK 24 Add save<-reg after: 2445: r1856:DF=r114:SF#0 then later we appear to end up in a loop generating secondary reloads that need secondary reloads themselves, and so forth. The instruction above looks funny, trying to use a subreg to convert DFmode to SFmode. I don't think we should be generating that. So it looks like a caller save problem. If I add -fno-caller-saves the compile finishes. It appears that we need a definition for HARD_REGNO_CALLER_SAVE_MODE because the default definition can't work here. The comment in sparc.h for HARD_REGNO_CALLER_SAVE_MODE looks relevant. The same definition may work for RISC-V. Looks like the MIPS port does it the same way too.
[Bug target/93532] RISCV g++ hangs with optimization >= -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93532 --- Comment #12 from Jim Wilson --- A bisection on mainline between the gcc-8 and gcc-9 releases shows that this testcase was fixed by a combine patch for PR87600 that stops combining hard regs with pseudos to reduce register pressure. The commentary refers to ira and lra problems. A combine patch won't be as safe as a RISC-V backend patch though. I tried testing the riscv HARD_REGNO_CALLER_SAVE_MODE patch with buildroot but it turns out that it is downloading a pre-built compiler instead of building one. So dropping in the patch doesn't do anything. I will have to figure out what is going on there. Trying the riscv patch with mainline on the testcase, I see that I get better rematerialization without the confusing subregs, and I also get smaller stack frames since we are now saving SFmode to the stack instead of DFmode. Otherwise, I don't see any significant changes to the code. I tried a make check with the riscv patch on mainline, and got an unexpected g++ testsuite failure, so I will have to look into that.
[Bug target/93532] RISCV g++ hangs with optimization >= -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93532 Jim Wilson changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |wilson at gcc dot gnu.org --- Comment #13 from Jim Wilson --- Created attachment 47794 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47794&action=edit untested patch to fix the problem
[Bug target/93532] RISCV g++ hangs with optimization >= -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93532 Jim Wilson changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #19 from Jim Wilson --- Patch applied to mainline. This is just a minor optimization for gcc-10 as a combiner patch between gcc-8 and gcc-9 reduces register pressure enough to prevent the hang. Hence there is no real need for the patch in gcc-9. The patch might be useful in gcc-8, but the problem is hard to reproduce, buildroot is the only one that ran into the problem, and they can always add the patch to their tree, so not clear if we really need it on the gcc-8 branch.
[Bug target/93532] RISCV g++ hangs with optimization >= -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93532 --- Comment #20 from Jim Wilson --- Thanks for confirming that it solves the buildroot build problem. My gcc mainline g++ test failure turned out to be a thread related issue with qemu cross testing. The testcase works always on hardware, but fails maybe 10-20% of the time when run under qemu. RISC-V qemu is known to still have a few bugs in this area, though they might already be fixed in newer qemu versions than what I have.
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #29 from Jim Wilson --- The testcase works for riscv64-elf but does not work for riscv32-elf. The difference is in the einline pass before dse1. riscv64-elf has tmp.C:12:17: optimized: Inlining constexpr C::C()/1 into C slow()/3. whereas riscv32-elf has tmp.C:12:17: missed: will not early inline: C slow()/3->constexpr C::C()/1, call is cold and code would grow by 1 Since the constructor was not early inlined, dse1 can't eliminate the redundant store. The constructor eventually gets inlined between 085i.materialize-all-clones and 088t.fixup_cfg3 which allows dse2 to eliminate the redundant store. I can make the testcase work for riscv32-elf if I add --param max-inline-insns-size=1 to allow the constructor to be inlined during the einline pass. I didn't check to see if this works for the other failing targets.
[Bug rtl-optimization/92656] The zero_extend insn can't be eliminated in the combine pass
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92656 --- Comment #3 from Jim Wilson --- Looking at this, I see that the problem occurs in record_value_for_reg where it does if (!insn || (value && rsp->last_set_table_tick >= label_tick_ebb_start)) rsp->last_set_invalid = 1; last_set_table_tick is 2 and label_tick_ebb_start is 1 because this is the first block of the function. This actually causes a lot of variables set in the first block to be marked invalid if used in a successful combination two or more times, which then prevents the nonzero bits info from being used for any of them. There seems to be a problem with how label_tick is used. In the very first block in the body of the function, label_tick is 2 and label_tick_ebb_start is 1. This is because it is considered to be the second block in the ebb after the entry block. In the second block in the body of the function, label_tick is 3 and label_tick_ebb_start is 3. This means that every variable set in the first block gets treated differently than in every block after the first. If I add a little bit of code before the loop to force it to be the second block, then I get correct output from combine. I just added this before the loop static int j = 0; if (val) j++; This also explains why the problem only occurs with -mtune=sifive-7-series because this enables the conditional move support that turns the loop into a single block, and then the -funroll-loops option fully unrolls the loop, turning the entire function into one block, which prevents combine from handling many of the register sets correctly because everything is in the first block now. This also explains why the problem started when the 2->2 combination support was added, as that causes more successful combinations, and hence more registers getting invalidated in the first block. So the question is why we need label_tick > label_tick_ebb_start for the first block of the function. 
There is nothing set in the entry block other than hard registers, and those could always be handled specially by just marking them as invalid somehow before processing instructions. Or alternatively, in record_value_for_reg, maybe we can add a check for a pseudo reg only set once and not live in the prologue, and avoid marking it as invalid when we process it a second time. There are already a lot of checks like this scattered around the code.
[Bug rtl-optimization/92656] The zero_extend insn can't be eliminated in the combine pass
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92656 --- Comment #5 from Jim Wilson --- A rewrite using dataflow would be better of course. I'm just trying to understand the problem with this testcase better, and maybe find a simple solution, but I don't think that there is one. The workarounds I see just make the code more complicated and add more risk of something else going wrong.
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 --- Comment #32 from Jim Wilson --- The proposed patch looks OK to me. I suggest you submit it to gcc-patches.
[Bug lto/88422] collect2.exe: fatal error: lto-wrapper returned 1 exit status: file not recognized: file truncated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88422 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #2 from Jim Wilson --- I've reproduced this problem with a RISC-V gcc-8.x compiler, and tracked it down to the first patch for bug 81968, in comment #60. With the patch reverted, the testcase works. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81968#c60 The RISC-V testcase works with Linux hosted and Cygwin hosted toolchains, and only fails for mingw32 hosted toolchains. Maybe an LLP64 problem with the patch? I didn't see any obvious type error in the patch though. I had to borrow a windows machine from our IT group to look at this, and they have since taken the loaner back, so I don't have a machine at work I can use to debug this at present.
[Bug lto/88422] collect2.exe: fatal error: lto-wrapper returned 1 exit status: file not recognized: file truncated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88422 --- Comment #4 from Jim Wilson --- I used a cross compiler, so ulong_type is easy enough to check. For simple-object-elf.i I see __extension__ typedef unsigned long long uint64_t; ... __extension__ typedef uint64_t ulong_type; which looks right.
[Bug target/84797] RISC-V: add --with-multilib-list support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84797 --- Comment #2 from Jim Wilson --- Created attachment 43904 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43904&action=edit version 2 patch Missing documentation, doesn't handle architecture aliases, and only lets you specify one ABI, but otherwise seems to be working right. Configuring with --enable-multilib --with-multilib-list=lp64d --with-abi=lp64d --with-arch=rv64gc I get a compiler that does this gamma05:2013$ ./xgcc -B./ --print-multi-lib .; gamma05:2014$ ./xgcc -B./ --print-multi-dir . gamma05:2015$ ./xgcc -B./ --print-multi-os-directory ../lib64/lp64d gamma05:2016$ So one multilib was built, and it was installed into lib64/lp64d where we want it. The --print-multi-dir value shouldn't matter. Most ports only select multilibs based on one option. The RISC-V port uses two options to select multilibs, which makes specifying this stuff a lot more complicated. This is why I'm only allowing one ABI choice at the moment. I see I forgot to clean up the t-linux-withmultilib file. I will do that in the next version of the patch.
[Bug bootstrap/84856] Bootstrap failure on riscv: comparison of integer expressions of different signedness
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84856 --- Comment #8 from Jim Wilson --- I copied the design of the patch from the i386 backend, so in theory it should work. The layout of the stack is completely at the control of the target backend, so uses of STACK_BOUNDARY outside the backend should not be a problem. I did some sanity checking when I made the change, but now that you point the problem out I see that I missed two cases. outgoing_args_size and pretend_args_size are no longer rounded to the PREFERRED_STACK_BOUNDARY size; they are rounded to the smaller STACK_BOUNDARY size instead. We can fix this in riscv_compute_frame_info by adding RISCV_STACK_ALIGN macro calls around the uses of these two values. This is a simple fix. I'm testing a patch now.
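The rounding issue described in the comment above can be illustrated with a minimal sketch. This is not the riscv.c code: the two byte constants, `round_up`, and `frame_size` are hypothetical names with illustrative values, chosen only to show why sizes that arrive aligned to the smaller boundary need to be re-aligned before they are used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the target macros; the values are
   illustrative and are not taken from the RISC-V backend. */
#define STACK_BOUNDARY_BYTES            8   /* smaller, minimum alignment */
#define PREFERRED_STACK_BOUNDARY_BYTES 16   /* alignment the frame layout wants */

/* Round size up to a multiple of align, which must be a power of two. */
static size_t round_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Toy frame-size computation: each component is re-aligned to the
   preferred boundary at the point of use, mirroring the idea of wrapping
   the uses of outgoing_args_size and pretend_args_size in an alignment
   macro. */
static size_t frame_size(size_t locals, size_t outgoing_args,
                         size_t pretend_args)
{
    return round_up(locals, PREFERRED_STACK_BOUNDARY_BYTES)
         + round_up(outgoing_args, PREFERRED_STACK_BOUNDARY_BYTES)
         + round_up(pretend_args, PREFERRED_STACK_BOUNDARY_BYTES);
}
```

With these illustrative values, an outgoing-args size of 24 is already a multiple of the smaller boundary but not of the preferred one, so without the extra rounding the total would leave the stack pointer misaligned.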
[Bug bootstrap/84856] Bootstrap failure on riscv: comparison of integer expressions of different signedness
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84856 --- Comment #10 from Jim Wilson --- Author: wilson Date: Tue Apr 17 21:41:07 2018 New Revision: 259449 URL: https://gcc.gnu.org/viewcvs?rev=259449&root=gcc&view=rev Log: RISC-V: Fix 32-bit stack pointer alignment problem. gcc/ PR 84856 * config/riscv/riscv.c (riscv_compute_frame_info): Add calls to RISCV_STACK_ALIGN when using outgoing_args_size and pretend_args_size. Set arg_pointer_offset after using pretend_args_size. Modified: trunk/gcc/ChangeLog trunk/gcc/config/riscv/riscv.c
[Bug inline-asm/85185] Wider-than-expected load for struct member used as operand of inline-asm with memory clobber at -Og
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85185 Jim Wilson changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2018-04-23 CC||wilson at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #7 from Jim Wilson --- The problem is exposed in combine, where we take two instructions (insn 9 8 10 2 (set (reg:DI 72 [ _2 ]) (sign_extend:DI (subreg:HI (reg:SI 75 [ SD.1554.PD.1553 ]) 0))) "tmp.c":12 92 {extendhidi2} (expr_list:REG_DEAD (reg:SI 75 [ SD.1554.PD.1553 ]) (nil))) (insn 10 9 0 2 (parallel [ (asm_operands/v ("magic %0") ("") 0 [ (subreg/s/u:HI (reg:DI 72 [ _2 ]) 0) ] [ (asm_input:HI ("r") tmp.c:12) ] [] tmp.c:12) (clobber (mem:BLK (scratch) [0 A8])) ]) "tmp.c":12 -1 (expr_list:REG_DEAD (reg:DI 72 [ _2 ]) (nil))) and then produce (insn 10 9 0 2 (parallel [ (asm_operands/v ("magic %0") ("") 0 [ (subreg:HI (reg:SI 75 [ SD.1554.PD.1553 ]) 0) ] [ (asm_input:HI ("r") tmp.c:12) ] [] tmp.c:12) (clobber (mem:BLK (scratch) [0 A8])) ]) "tmp.c":12 -1 (expr_list:REG_DEAD (reg:SI 75 [ SD.1554.PD.1553 ]) (nil))) We have now lost the truncation and sign-extension. The value passed to the asm has the correct value for the low 16 bits, but has garbage in the high 16 bits. However, what combine did does not appear wrong by itself. One could argue that the problem started with the asm, which is taking a HImode argument, even though this makes little sense on RISC-V, since the only instructions operating on HImode are the 16-bit load and store instructions. Maybe the asm should use the sign-extended DImode value directly and assume a DImode input instead of a HImode input? That would prevent the truncate and sign-extend from being optimized away, but might be wrong if someone extends the RISC-V ISA to include instructions that operate directly on HImode values. I can work around the problem by explicitly casting the asm input to int. asm("magic %0" :: "r" ((int)sub.a) : "memory"); and now the asm takes SImode input, and the truncate/sign extend can't be optimized away. 
Asking the user to change their code doesn't seem like the right solution though.
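To make the effect of the lost truncation and sign-extension concrete, here is a small sketch in portable C. It does not involve inline asm or RTL; the function names are hypothetical and simply model the two values an instruction could see in the register: one where the low 16 bits have been sign-extended, and one where the wider register is passed through untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Models (sign_extend (subreg:HI x)): keep the low 16 bits of the wider
   value and sign-extend them. Written with arithmetic rather than casts
   so the result does not depend on implementation-defined narrowing. */
static int64_t truncate_and_sign_extend_hi(uint32_t wide)
{
    int64_t low = (int64_t)(wide & 0xFFFFu);
    return (low ^ 0x8000) - 0x8000;
}

/* Models what the asm receives once the extension has been combined
   away: the wider register contents, upper bits included. */
static int64_t raw_register_value(uint32_t wide)
{
    return (int64_t)wide;
}

/* The two views always agree in the low 16 bits; only the upper bits
   can differ. */
static void check_low_bits_agree(uint32_t wide)
{
    assert((uint16_t)truncate_and_sign_extend_hi(wide)
           == (uint16_t)raw_register_value(wide));
}
```

For an input such as 0x1234FFFF, the first function yields -1 while the second yields 0x1234FFFF, which is the kind of difference an instruction reading the full register would observe.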
[Bug target/85492] riscv64: endless when throwing an exception from a constructor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85492 --- Comment #1 from Jim Wilson --- The testcase fails with default dynamic linking. It works with static linking. It also works if runtime_error is removed and we have just a plain throw. Using github riscv/riscv-gnu-toolchain project, which has older versions of binutils, gcc, and glibc, it works both static and dynamic. If I update binutils and/or gcc to FSF mainline, it still works. If I update glibc to FSF glibc-2.27, it fails dynamic but works static. So apparently the problem was triggered by a glibc change when it was upstreamed. I tried adding aborts to libgcc and libstdc++ unwind/exception routines. They aren't hit. qemu traces suggest it is looping inside the dynamic linker. LD_DEBUG=all isn't helpful. It prints a lot of messages for binding symbols, and then no messages when it gets stuck looping (assuming it is looping inside ld.so). Unfortunately, we don't have gdb support yet. I can't use gdb sim to generate a trace as gdb sim doesn't support dynamic linked binaries. It isn't clear how to debug this. Maybe I can find a clue in the gcc testsuite. I haven't tried running that natively yet. It will likely take a while to run though and may not trigger the same failure.
[Bug target/85492] riscv64: endless when throwing an exception from a constructor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85492 Jim Wilson changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2018-04-24 Ever confirmed|0 |1
[Bug target/85492] riscv64: endless loop when throwing an exception from a constructor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85492 --- Comment #3 from Jim Wilson --- I figured out that I wasn't fully rebuilding and relinking all libraries while trying to debug this with printf, and that sent me down the wrong path. Trying this again, correctly, I see that we have a loop in unwind, because the return address for _start is pointing at _start. This works by accident when static linking, because crt1.o is included before crtbegin.o, crtbegin.o registers FDEs starting from a label it adds to the eh_frame section, and hence the FDE for _start in crt1.o gets lost. When unwinding, we see that there is no FDE for _start, and it isn't an exception frame, so that terminates unwinding. When dynamic linking, we use PT_GNU_EH_FRAME which uses eh_frame section addresses and hence finds every FDE, including the one for _start, so we try to unwind through _start, get a return address pointing at _start, and go into an infinite loop. This requires a glibc patch to fix. Just setting the return address in _start to 0 works.
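The looping behaviour described above can be modelled with a toy frame walker. This is only a sketch and not the libgcc unwinder: `struct frame`, `unwind_depth`, and the step limit are hypothetical, and a NULL caller stands in for a return address of 0 (or one marked undefined).

```c
#include <assert.h>
#include <stddef.h>

/* Toy call frame: caller == NULL models a return address of 0. */
struct frame {
    const char *name;
    const struct frame *caller;
};

/* Walk from f towards the outermost frame. Returns the number of frames
   visited, or -1 if more than limit steps were needed, which in this
   model indicates the walk is stuck in a cycle. */
static int unwind_depth(const struct frame *f, int limit)
{
    int depth = 0;

    assert(limit > 0);
    while (f != NULL) {
        if (++depth > limit)
            return -1;
        f = f->caller;
    }
    return depth;
}
```

A `_start` frame whose caller is NULL ends the walk after two frames when reached from `main`, while a `_start` frame that names itself as its own caller never terminates and hits the limit, matching the infinite loop seen with dynamic linking.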
[Bug target/85492] riscv64: endless loop when throwing an exception from a constructor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85492 --- Comment #4 from Jim Wilson --- Created attachment 44032 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44032&action=edit proposed glibc patch to fix the problem
[Bug target/85492] riscv64: endless loop when throwing an exception from a constructor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85492 --- Comment #6 from Jim Wilson --- I suggest you handle the glibc patch. Note that you can probably also fix this by adding unwind directives to _start to say that the return address is in x0. This would avoid the minor code size increase, but takes a little more effort to figure out how to add the right unwind directives to assembly code to make this work. I haven't tried that.
[Bug target/85492] riscv64: endless loop when throwing an exception from a constructor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85492 --- Comment #8 from Jim Wilson --- (In reply to Aurelien Jarno from comment #7) > Should I just close this bug and open a new one on the glibc side? That is fine if you want to do that. > + /* Mark ra as undefined in order to stop unwinding here! */ > + cfi_undefined (ra) I tried this, and it worked for me.
[Bug target/85596] New: aarch64 --with-multilib-list documentation missing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85596 Bug ID: 85596 Summary: aarch64 --with-multilib-list documentation missing Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: wilson at gcc dot gnu.org Target Milestone: --- config.gcc has aarch64 support for the --with-multilib-list option, but it isn't documented in the doc/install.texi file.
[Bug target/85142] Wrong -print-multi-os-directory & -print-multi-lib output for riscv64 + multilib
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85142 Jim Wilson changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |DUPLICATE --- Comment #12 from Jim Wilson --- Will be fixed by the patch for 84797. *** This bug has been marked as a duplicate of bug 84797 ***
[Bug target/84797] RISC-V: add --with-multilib-list support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84797 --- Comment #3 from Jim Wilson --- *** Bug 85142 has been marked as a duplicate of this bug. ***
[Bug target/84797] RISC-V: add --with-multilib-list support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84797 Jim Wilson changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2018-05-08 Assignee|unassigned at gcc dot gnu.org |wilson at gcc dot gnu.org Ever confirmed|0 |1
[Bug target/84797] RISC-V: add --with-multilib-list support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84797 --- Comment #4 from Jim Wilson --- Author: wilson Date: Wed May 9 21:17:14 2018 New Revision: 260096 URL: https://gcc.gnu.org/viewcvs?rev=260096&root=gcc&view=rev Log: RISC-V: Add with-multilib-list support. gcc/ PR target/84797 * config.gcc (riscv*-*-*): Handle --with-multilib-list. * config/riscv/t-withmultilib: New. * config/riscv/withmultilib.h: New. * doc/install.texi: Document RISC-V --with-multilib-list support. Added: trunk/gcc/config/riscv/t-withmultilib trunk/gcc/config/riscv/withmultilib.h Modified: trunk/gcc/ChangeLog trunk/gcc/config.gcc trunk/gcc/doc/install.texi
[Bug target/84797] RISC-V: add --with-multilib-list support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84797 Jim Wilson changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #5 from Jim Wilson --- Fixed on mainline.
[Bug target/86005] [RISCV] Invalid intermixing of __atomic_* libcalls and inline atomic instruction sequences
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86005 Jim Wilson changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2018-05-31 Ever confirmed|0 |1
[Bug other/86039] Compiler placed in deep/long folder cannot open/run needed files on Windows
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86039 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #1 from Jim Wilson --- Windows has a 260 character default maximum path length. See for instance https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx#maxpath This looks like an OS problem not a gcc problem.
[Bug target/86005] [RISCV] Invalid intermixing of __atomic_* libcalls and inline atomic instruction sequences
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86005 --- Comment #8 from Jim Wilson --- This looks like a generic GCC problem, not a RISC-V specific problem. For instance, if I build an armv6t2 compiler I get bl __atomic_fetch_add_4
[Bug target/86005] [RISCV] Invalid intermixing of __atomic_* libcalls and inline atomic instruction sequences
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86005 --- Comment #9 from Jim Wilson --- Oops, hitting tab doesn't work as expected. Trying again... This looks like a generic GCC problem, not a RISC-V specific problem. Or perhaps, not a gcc bug at all. For instance, if I build an armv6t2 compiler I get bl __atomic_fetch_add_4 ... mcr p15, 0, r0, c7, c10, 5 ldr r3, [r3] mcr p15, 0, r0, c7, c10, 5 where the mcr is equivalent to the RISC-V fence. It looks like MIPS16 and a number of other targets have the same problem. GCC has no support for calling __atomic_load_4 for this testcase. GCC assumes that loads smaller or equal to the word size are always atomic, and will not call a library routine for them. It will emit memory barriers. If what gcc is doing is wrong, then it is broken for all targets that don't inline expand every atomic function call, and/or don't have atomic instructions. I can fix the rv32ia support by inline expanding every atomic function call. I can't fix the rv32i support without target independent optimizer changes.
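The kind of source that leads to the mix described above might look like the following sketch. It is not the testcase from the bug report, which is not quoted in the comment; `bump` and `peek` are hypothetical names. It uses the GCC `__atomic_fetch_add` and `__atomic_load_n` builtins on a 4-byte object.

```c
#include <assert.h>
#include <stdint.h>

/* The comment's claim concerns loads no wider than the machine word. */
static_assert(sizeof(uint32_t) <= sizeof(void *),
              "uint32_t is word-sized or smaller on this target");

/* Read-modify-write. On a target without suitable atomic instructions
   this may be emitted as a call to __atomic_fetch_add_4. */
static uint32_t bump(uint32_t *p)
{
    return __atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST);
}

/* Word-sized load. Per the comment above, GCC expands this inline as an
   ordinary load surrounded by barriers rather than calling
   __atomic_load_4. */
static uint32_t peek(uint32_t *p)
{
    return __atomic_load_n(p, __ATOMIC_SEQ_CST);
}
```

Compiling something like this with -S for a target that lacks inline atomic read-modify-write support is one way to check whether a libcall and an inline load end up operating on the same object.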
[Bug libffi/84410] libffi doesn't support riscv now, but not disabled in configure.ac
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84410 Jim Wilson changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #4 from Jim Wilson --- libffi builds now, and so does libgo compiler. So fixed for GCC 9.
[Bug tree-optimization/91191] New: vrp and boolean arguments
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91191 Bug ID: 91191 Summary: vrp and boolean arguments Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: wilson at gcc dot gnu.org Target Milestone: --- It appears that vrp isn't propagating the ranges of incoming boolean arguments. Given this example: unsigned char reg(_Bool b) { union U { unsigned char f0; _Bool f1; }; union U u; u.f1 = b; if (u.f0 > 1) { // This cannot happen // if b is only allowed // to be 0 or 1: return 42; } return 13; } clang optimizes this to unconditionally return 13, but gcc does a compare and conditionally returns either 42 or 13 depending on the result of the compare. This happens with both x86_64 and RISC-V. Looking at the vrp dumps, I see b_3(D): VARYING
[Bug target/91229] New: RISC-V ABI problem with zero-length bit-fields and float struct fields
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91229 Bug ID: 91229 Summary: RISC-V ABI problem with zero-length bit-fields and float struct fields Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: wilson at gcc dot gnu.org Target Milestone: --- This was noticed by clang development, comparing clang against gcc to verify ABI compliance. There is a problem with the GCC implementation of the ABI, where we are accidentally emitting different code for the same struct when compiled by the C and C++ compilers. There are two cases affected by this. Here is the first example: struct s1 { int : 0; float f; int i; int : 0; }; void dummy(float, int); void f(struct s1 s) { dummy(s.f + 1.0, s.i + 1); } where we have a struct that can be passed in one FP reg and one integer GP, and here is the second example: struct s1 { int : 0; float f; float g; int : 0; }; void dummy(float, float); void f(struct s1 s) { dummy(s.f + 1.0, s.g + 2.0); } where we have a struct that can be passed in two FP regs. In both cases, the C++ compiler passes the float struct fields in FP registers, and the C compiler passes them in integer registers. The general case here is any struct with one or more zero-length bit-fields, exactly two non-zero length fields, one of which must have an FP type that can fit in an FP register, and the other can be an FP type that fits in an FP register or an integer type that fits in an integer register or an integer bit-field that is the exact same size as an integer type that can fit in an integer register. Also, the target must have FP register support. The fundamental problem is that the RISC-V backend is not checking for zero-length bit-fields when deciding if a struct field can be passed in a FP register or not. Meanwhile, the C++ front end is stripping zero-length bit-fields after struct layout. 
So when compiling as C++ we decide that the FP struct fields can be passed in FP regs. But when compiling as C we decide that there are too many struct fields and they all get passed in integer registers. Since having the C and C++ front ends using different ABIs is undesirable, we need an ABI change. Fixing the C++ case would require inconvenient changes to the C++ front end. So fixing the C case with a RISC-V backend patch looks like the best practical solution. The affected structures are a bit obscure and not very useful, so it is hoped no real code will be affected. I've done an open-embedded world build with an instrumented compiler, and I didn't see any case that triggered my code. Not everything built though, since some stuff still doesn't have RISC-V support yet. I did have over 30,000 tasks run, so quite a bit of stuff did build.
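The disagreement between the two front ends can be modelled with a deliberately simplified classifier. This is a sketch, not the logic in riscv.c: `enum field_kind` and `passed_in_fp_regs` are hypothetical, and the rule is reduced to "at most two counted fields, at least one of them floating point", ignoring the size and type checks the real ABI performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum field_kind { FIELD_FLOAT, FIELD_INT, FIELD_ZERO_WIDTH_BITFIELD };

/* Simplified eligibility test for passing a struct's fields in FP (and
   possibly one integer) registers. When ignore_zero_width is false,
   zero-width bit-fields count as fields, as in the C behaviour described
   above; when true they are skipped, as if the front end had already
   stripped them. */
static bool passed_in_fp_regs(const enum field_kind *fields, size_t n,
                              bool ignore_zero_width)
{
    size_t counted = 0;
    bool has_float = false;

    assert(fields != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (fields[i] == FIELD_ZERO_WIDTH_BITFIELD && ignore_zero_width)
            continue;
        counted++;
        if (fields[i] == FIELD_FLOAT)
            has_float = true;
    }
    return counted >= 1 && counted <= 2 && has_float;
}
```

For the first example struct (zero-width bit-field, float, int, zero-width bit-field), this model counts four fields without skipping and rejects it, and counts two fields with skipping and accepts it, which is the C versus C++ difference the report describes.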
[Bug target/91229] RISC-V ABI problem with zero-length bit-fields and float struct fields
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91229 Jim Wilson changed: What|Removed |Added Target||riscv*-*-* Status|UNCONFIRMED |NEW Last reconfirmed||2019-07-22 Assignee|unassigned at gcc dot gnu.org |wilson at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Jim Wilson --- There is a psABI discussion about the problem at https://github.com/riscv/riscv-elf-psabi-doc/issues/99
[Bug target/91229] RISC-V ABI problem with zero-length bit-fields and float struct fields
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91229 --- Comment #2 from Jim Wilson --- Created attachment 46617 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46617&action=edit proposed patch to change ABI and warn for affected structs
[Bug target/91229] RISC-V ABI problem with zero-length bit-fields and float struct fields
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91229 --- Comment #3 from Jim Wilson --- Author: wilson Date: Thu Aug 8 19:04:56 2019 New Revision: 274215 URL: https://gcc.gnu.org/viewcvs?rev=274215&root=gcc&view=rev Log: RISC-V: Fix C ABI for flattened struct with 0-length bitfield. gcc/ PR target/91229 * config/riscv/riscv.c (riscv_flatten_aggregate_field): New arg ignore_zero_width_bit_field_p. Skip zero size bitfields when true. Pass into recursive call. (riscv_flatten_aggregate_argument): New arg. Pass to riscv_flatten_aggregate_field. (riscv_pass_aggregate_in_fpr_pair_p): New local warned. Call riscv_flatten_aggregate_argument twice, with false and true as last arg. Process result twice. Compare results and warn if different. (riscv_pass_aggregate_in_fpr_and_gpr_p): Likewise. gcc/testsuite/ * gcc.target/riscv/flattened-struct-abi-1.c: New test. * gcc.target/riscv/flattened-struct-abi-2.c: New test. Added: trunk/gcc/testsuite/gcc.target/riscv/flattened-struct-abi-1.c trunk/gcc/testsuite/gcc.target/riscv/flattened-struct-abi-2.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/riscv/riscv.c trunk/gcc/testsuite/ChangeLog
[Bug target/91229] RISC-V ABI problem with zero-length bit-fields and float struct fields
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91229 Jim Wilson changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #4 from Jim Wilson --- Fixed on mainline.
[Bug target/91420] relocation truncated to fit: R_RISCV_HI20 against `.LC0' with GCC 8.2/8.3 with "-O2" on RISC-V
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91420 Jim Wilson changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2019-08-12 CC||wilson at gcc dot gnu.org Ever confirmed|0 |1
[Bug target/91602] GCC fails to build for riscv in a combined tree due to misconfigured leb128 support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91602 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #3 from Jim Wilson --- Combined tree builds are obsolete and shouldn't be used anymore. Since this only shows up in a combined tree build, I don't consider it important. If you build the toolchain the correct way, building binutils and gcc separately, the build does work. My preferred solution would be to kill combined tree build support.