[Bug middle-end/94083] New: inefficient soft-float x!=Inf code

2020-03-06 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94083

Bug ID: 94083
   Summary: inefficient soft-float x!=Inf code
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wilson at gcc dot gnu.org
  Target Milestone: ---

Given a testcase like this
int foo(void) {
  volatile float f;
  int n;
  f = __builtin_huge_valf();
  n += 1 - (f != __builtin_huge_valf());
  return n;
}
and compiling for soft-float, we end up with a call to __unordsf2 followed by a
call to __lesf2.  This means the floats have to be unpacked twice and checked
for nan twice. This gives both poor performance and poor code size.  I've
confirmed this for x86, arm, and riscv.

Folding in the C front end is creating an unordered less than or equal
comparison against FLT_MAX.  From the 004.original file
n = SAVE_EXPR  + n;

This optimization is coming from a rule in the match.pd file.
  /* x != +Inf is always equal to !(x > DBL_MAX), but this introduces
 an exception for x a NaN so use an unordered comparison.  */

When we generate rtl, we call do_compare_rtx_and_jump which notices that we
don't have an operation for UNLE_EXPR, but decides we can't reverse it because
it is unsafe.  It tries swapping arguments, but we don't have UNGE_EXPR either.
 So it emits two libcalls.

Converting a NE compare to a UNLE compare looks like an odd optimization.  If
we want to consider unordered operations as canonical operations, then maybe we
should add libgcc support for the unordered operations.  Or maybe we should
check to see if unordered operations are handled by the target before
converting a simple NE into a UNLE.

The match.pd rule was changed to use UNLE in the patch for PR 64811 which fixed
a problem with handling NaNs.  This happened 2018-01-09.  The optimization
dates back to 2003-05-22 but was originally using LE which is OK for
soft-float.  It wasn't until the NaN bug was fixed by using UNLE instead of LE
that this became an optimization problem.

Maybe we just shouldn't perform this optimization when honoring NaNs?  That
would avoid generating the problematic unordered operation early in the
optimizer.
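
The NaN behavior behind this rule can be checked with a short C sketch.  The
function names are illustrative and are not taken from GCC, and
__builtin_isgreater is used here only as a stand-in for the quiet comparison
that UNLE_EXPR represents.  It suggests why the old LE form was wrong for NaN
inputs while the unordered form agrees with x != Inf.

```c
#include <assert.h>
#include <float.h>

/* The comparison as written in the source.  */
int ne_inf (float x)
{
  return x != __builtin_huge_valf ();
}

/* The older folding: ordered x <= FLT_MAX.  False for a NaN input.  */
int le_flt_max (float x)
{
  return x <= FLT_MAX;
}

/* The unordered folding: !(x > FLT_MAX) using a quiet comparison.
   True for a NaN input.  */
int unle_flt_max (float x)
{
  return !__builtin_isgreater (x, FLT_MAX);
}

/* Illustrative self-check: the unordered form agrees with x != Inf.  */
void check_equivalence (float x)
{
  assert (ne_inf (x) == unle_flt_max (x));
}
```

For a NaN input, ne_inf and unle_flt_max both return 1 while le_flt_max
returns 0, which is the difference that the PR 64811 fix addressed.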

[Bug target/94136] GCC doc for built-in function __builtin___clear_cache() not 100% correct

2020-03-11 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94136

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #3 from Jim Wilson  ---
Another possible issue is that __clear_cache is defined in the libgcc docs. 
But only some platforms are defining CLEAR_INSN_CACHE so only some targets have
a usable __clear_cache function.

[Bug c++/94044] [10 Regression] internal compiler error: in comptypes, at cp/typeck.c:1490 on riscv64-unknown-linux-gnu and arm-eabi

2020-03-13 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94044

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #7 from Jim Wilson  ---
I made an attempt to reproduce this.  I wasn't able to reproduce with an
arm-eabi build.  I was able to reproduce with a riscv64-linux build.  The funny
part is that I was able to build two compilers from the same gcc sources, one
which reproduces and one which does not, which differ only in exactly how I did
the build.  For the failing build, I had a complete riscv-gnu-toolchain build
available when configuring.  For the working build it was just binutils and gcc
without glibc/linux header files, and a top-of-tree binutils version unlike the
first build.

Debugging the two side by side, I see that execution diverges at line 9680 in
cp/pt.c
  entry = type_specializations->find_with_hash (&elt, hash);
The working compiler has no hash hit and returns zero.  The failing compiler
has a hash hit, and then dies inside spec_hasher::equal.

In the spec_hasher::equal function I see

(gdb) print *e1
$29 = {tmpl = ,
  args = , spec = }
(gdb) print *e2
$30 = {tmpl = ,
  args = , spec = }

(gdb) pt
 
unit-size 
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x75deb7e0 precision:64 min  max 
pointer_to_this >
readonly
arg:0  elt:1 >
type_0 type_6 VOID
align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x75ef4540>
tmp.C:13:23 start: tmp.C:13:23 finish: tmp.C:13:35>>
(gdb) print e2->args
$32 = 
(gdb) pt
 
unit-size 
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x75deb7e0 precision:64 min  max 
pointer_to_this >
readonly
arg:0  elt:1 >
type_0 type_6 VOID
align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x75efa2a0>
tmp.C:13:23 start: tmp.C:13:23 finish: tmp.C:13:35>>
(gdb) 

It then eventually dies inside comptypes because TREE_CODE (t1) is
type_pack_expansion.  And also TREE_CODE (t2) is type_pack_expansion.  This is
called from the SIZEOF_EXPR case in cp_tree_equal.

If tree addresses are being used for the hash codes, this could just be bad
luck whether it fails or not.
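
Why the hash value decides whether the failure shows up can be sketched with a
toy table in C.  This is not the GCC hash_table implementation; the types and
names are hypothetical.  The point is only that the equality callback, where
the crash happens, runs only when two entries land in the same slot.

```c
#include <assert.h>
#include <stddef.h>

struct entry { int key; };

/* Counts how often the equality callback runs.  */
int equal_calls;

int entry_equal (const struct entry *a, const struct entry *b)
{
  equal_calls++;
  return a->key == b->key;
}

/* Look ELT up in a toy table with one entry per slot.  The equality
   callback is only reached when HASH selects an occupied slot.  */
const struct entry *
lookup (const struct entry *const *slots, size_t nslots,
        unsigned hash, const struct entry *elt)
{
  assert (nslots > 0);
  const struct entry *cand = slots[hash % nslots];
  if (cand != NULL && entry_equal (cand, elt))
    return cand;
  return NULL;
}
```

With address-derived hash values, two entries may or may not share a slot from
one build to the next.  With a hash function that always returns the same
value, every lookup reaches the equality callback.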

[Bug c++/94044] [10 Regression] internal compiler error: in comptypes, at cp/typeck.c:1490 on riscv64-unknown-linux-gnu and arm-eabi

2020-03-14 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94044

--- Comment #9 from Jim Wilson  ---
(In reply to Jakub Jelinek from comment #8)
> So perhaps to ease reproduction, tweak the hash function in this case to
> always return 0?

Yes, that works.  I just didn't have a chance to look at the hash function last
night.  With the hash function hacked I can reproduce for any target and any
-std=c++X value.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 789ccdb..4337928 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -1733,7 +1733,8 @@ hash_tmpl_and_args (tree tmpl, tree args)
 hashval_t
 spec_hasher::hash (spec_entry *e)
 {
-  return hash_tmpl_and_args (e->tmpl, e->args);
+  return 0;
+  //  return hash_tmpl_and_args (e->tmpl, e->args);
 }

 /* Recursively calculate a hash value for a template argument ARG, for use

[Bug target/94173] [RISCV] Superfluous stackpointer manipulation

2020-03-16 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94173

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #1 from Jim Wilson  ---
struct Pair has size 8 and align 4, and we have no unaligned load/store
support, so we are not able to allocate the temporary local variable to a
register.  It must be allocated a stack slot.  The RTL optimizer is able to
figure out that the stack stores and loads don't alias anything and hence are
not necessary and optimizes them away.  However, we don't have any support to
unallocate a stack slot after it has already been allocated, so we end up with
the unnecessary stack pointer increment and decrement.  In a degenerate case
like this, where there are no longer any stack loads/stores, we may be able to
notice that and get rid of the stack pointer manipulation.  But in a more
complicated case where there are multiple stack slots, and references to all
but one are optimized away, then we would still need the stack pointer change,
though we would just be wasting stack space in this case with larger
decrements/increments than needed.

If you change the type to
struct Pair {
char *s;
char *t;
} __attribute__ ((aligned(8)));
then you get the result you want.  That isn't a practical solution, but it
demonstrates that this is a size/alignment/strict-alignment problem.

This is more of a middle end problem than a RISC-V backend problem.  It should
be possible to reproduce on any target with similar strict alignment
constraints, and similar calling conventions that allow returning the structure
in registers, though I don't know if there are any offhand.
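
The size and alignment relationship can be inspected with a small C sketch.
The second struct and the two helper functions are illustrative names only.
The attribute argument is written as 2 * sizeof (char *) so that it works out
to 8 on a 32-bit target and 16 on a 64-bit one.

```c
#include <stddef.h>

/* As in the report: the alignment is that of a single pointer.  */
struct Pair
{
  char *s;
  char *t;
};

/* Alignment raised to the full size of the struct.  */
struct PairAligned
{
  char *s;
  char *t;
} __attribute__ ((aligned (2 * sizeof (char *))));

/* Nonzero if the plain struct is less aligned than its size.  */
int pair_is_underaligned (void)
{
  return __alignof__ (struct Pair) < sizeof (struct Pair);
}

/* Nonzero if the attributed struct is aligned to its full size.  */
int pair_aligned_matches_size (void)
{
  return __alignof__ (struct PairAligned) == sizeof (struct PairAligned);
}
```

On typical targets pair_is_underaligned returns 1, which is the situation the
comment describes.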

[Bug target/94173] [RISCV] Superfluous stackpointer manipulation

2020-03-16 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94173

Jim Wilson  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2020-03-16

[Bug target/94173] [RISCV] Superfluous stackpointer manipulation

2020-03-16 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94173

--- Comment #3 from Jim Wilson  ---
I was looking at the rv32 output.  For the rv64 compiler, you need to use
aligned(16).

[Bug c++/64697] C++11 thread_local: relocation truncated to fit: R_X86_64_PC32 against undefined symbol `TLS init function for N::ptd'

2020-03-26 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64697

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #22 from Jim Wilson  ---
This looks like a binutils bug to me.  A call to an undefined weak function
should never be executed, so it is OK for the linker to convert that call
instruction into anything convenient.  There is no need for a relocation that
can reach an address of zero.  We can convert the call instruction to call
itself, or the next instruction, or change it to a nop, whatever is
convenient, it doesn't really matter.

A number of binutils ports already have code to handle related problems.  ARM
and RISC-V for sure.  Probably others.  It looks like this support is missing
from the x86_64 port.  I'd suggest refiling this as a binutils bug.  See for
instance
  https://sourceware.org/bugzilla/show_bug.cgi?id=23244
for a RISC-V example of the same problem.  But we need a new bug for the x86_64
problem.  RISC-V has a register hard wired to zero, so I rewrite the call
instruction to use x0 as the base address.  The arm port turns the call into a
nop.

[Bug c++/64697] C++11 thread_local: relocation truncated to fit: R_X86_64_PC32 against undefined symbol `TLS init function for N::ptd'

2020-03-26 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64697

--- Comment #24 from Jim Wilson  ---
Joel Sherrill offered to create a binutils bug report for this.

[Bug bootstrap/92008] Build failure on cygwin

2020-04-16 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92008

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #21 from Jim Wilson  ---
This looks the same as a binutils bug.
https://sourceware.org/bugzilla/show_bug.cgi?id=22941

The easy solution is to touch intl/plural.c after checkout, so that bison won't
be run.  The contrib/gcc_update script already does this.  So the simplest
solution for the original problem is to use contrib/gcc_update to update a gcc
tree, or "contrib/gcc_update --touch" if you want to fix a gcc tree without
updating it.  If your gcc git source tree was already mangled by a bad bison
run, you will have to manually reset it to a clean tree, e.g. "git reset
--hard" or "git diff > tmp.file; patch -p1 --reverse < tmp.file; rm tmp.file"
or whatever, and then run the contrib/gcc_update --touch command.  Binutils
unfortunately does not have an equivalent to the gcc_update script and hence
requires a fix.

git unfortunately does not preserve file timestamps across commit and checkout,
so when you checkout a file it gets the current time.  git also tends to check
out files in alphabetical order.  If you are on a fast filesystem, i.e. linux,
plural.c and plural.y almost always get the same timestamp and bison isn't run.
 If you are on a slow filesystem, i.e. cygwin, plural.c is often older than
plural.y, and bison must be run, and the current bison version fails.  This is
why it is cygwin folk that most commonly run into this problem.

[Bug target/94950] [8/9/10 regression] ICE in gcc.dg/pr94780.c on riscv64

2020-05-06 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94950

--- Comment #5 from Jim Wilson  ---
I tested it with an rv64gc-linux cross compiler.  The patch fixes these
failures:
FAIL: gcc.dg/pr94780.c (internal compiler error)
FAIL: gcc.dg/pr94780.c (test for excess errors) 
FAIL: gcc.dg/pr94842.c (internal compiler error)
FAIL: gcc.dg/pr94842.c (test for excess errors)
There are no regressions.

I think it should be backported to the gcc-10 release branch.

[Bug target/94950] [8/9 regression] ICE in gcc.dg/pr94780.c on riscv64

2020-05-07 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94950

Jim Wilson  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Jim Wilson  ---
Fixed on mainline and gcc-10 branch.

[Bug target/94780] [8/9 Regression] ICE in walk_body at gcc/tree-nested.c:713 since r6-3632-gf6f69fb09c5f81df

2020-05-07 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94780
Bug 94780 depends on bug 94950, which changed state.

Bug 94950 Summary: [8/9 regression] ICE in gcc.dg/pr94780.c on riscv64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94950

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug target/95115] [10 Regression] RISC-V 64: inf/inf division optimized out, invalid operation not raised

2020-05-13 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95115

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #3 from Jim Wilson  ---
Marc Glisse's testcase fails even with old gcc versions.  My x86_64 Ubuntu
16.04 gcc-5.4.0 also removes the divide with -O.  My Ubuntu 18.04 gcc-7.5.0
gives the same result.  This seems to be simple constant folding that we have
always done.  The assumption here seems to be that if the user is dividing
constants, then we don't need to worry about setting exception bits.  If I
write (4.0 / 3.0) for instance, the compiler just folds it and doesn't worry
about setting the inexact bit.

Aurelien Jarno's testcase in the attachment is more interesting, as that works
with older gcc versions, just not gcc-10.  I did a bisect, and tracked this
down to Richard Biener's patch for pr83518.  It looks like the glibc code
was obfuscated a bit to try to avoid the usual trivial constant folding, and
the patch for pr83518 just made gcc smart enough to recognize that constants
are involved, and then optimize this case the same way we have always optimized
FP constant divides.

Newlib incidentally uses (x-x)/(x-x) where x is the input value, so there are
no constants involved, and the divide does not get optimized away.  This still
works with gcc-10.  The result is a subtract followed by a divide.
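
The (x-x)/(x-x) idiom can be written as a short C sketch.  The function names
are made up for illustration and this is not the newlib source.  Reading the
FE_INVALID flag with fetestexcept from <fenv.h> is left out to keep the
example free of library dependencies, so it only shows that the result is a
NaN computed from the input rather than from constants.

```c
/* Compute a NaN from a run-time value.  For finite X this is 0/0; for an
   infinite X the subtraction itself already yields a NaN.  */
double invalid_from_input (double x)
{
  return (x - x) / (x - x);
}

/* A NaN is the only value that compares unequal to itself.  */
int is_nan (double v)
{
  return v != v;
}
```

Because the operands depend on the argument, a compiler that cannot see the
argument's value has to keep both the subtract and the divide.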

At first glance, this looks more like a glibc problem to me than a gcc problem.
 But maybe the fact that constants were written to memory and then read back in
should prevent the usual trivial FP constant divide folding.

I can almost make the glibc testcase work if I mark the unions as volatile. 
That prevents the union reads and writes from being optimized away, but the
divide gets moved after the fetestexcept call.  That looks like a gcc bug
though I think a different problem than this pr.  The 234t.optimized dump is
correct.  The 236r.expand dump is wrong.  This happens for both x86_64 and
RISC-V.  The resulting code is bigger than what the newlib trick generates
though.

[Bug target/95252] testcase gcc.dg/torture/pr67916.c failure when testing with -msave-restore

2020-05-25 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95252

Jim Wilson  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2020-05-25
 Status|UNCONFIRMED |NEW

--- Comment #1 from Jim Wilson  ---
It appears to be failing in the rename register (rnreg) pass.  This is because
the unspec patterns for the save/restore calls don't mention the registers that
they use/modify.  This confuses rename reg into thinking that live regs are
dead, and it accidentally clobbers them before the save call.  This worked OK
when save/restore calls could only be at the beginning or end of a function. 
But now that this works with tail calls and shrink wrapping, we can get them in
inner blocks.  Since the different save/restore calls use/modify different sets
of registers, fixing this gets a little complicated.  Maybe we can just use the
max list of registers because listing extra ones shouldn't matter?

Another solution is to disable the rename register pass when -msave-restore is
used.  This isn't doing any checking for whether regs can be used in compressed
instructions or not.  This is currently encoded in REG_ALLOC_ORDER which this
pass doesn't use.  The result is that this is probably increasing code size
which is undesirable when -msave-restore is used.  Disabling this would reduce
code size and fix the -msave-restore problem.

The rename register pass does use the PREFERRED_RENAME_CLASS hook that we
haven't defined.  We should try defining this to convert registers classes to
subsets that only include the regs that can be used in compressed instructions.
 This might result in a code size decrease.  If this works, then the rename reg
pass should not be disabled, and we should find a way to fix the save/restore
pattern register lists instead.

I need to do some builds and experiments to verify this info.

[Bug target/95252] testcase gcc.dg/torture/pr67916.c failure when testing with -msave-restore

2020-05-27 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95252

--- Comment #3 from Jim Wilson  ---
I tried both.  Turning off register renaming works.  It gives a code size
decrease of about 0.003% for the libraries I looked at which can be ignored. 
This probably also reduces performance; I didn't check that.  I think it would
be better to leave register renaming on and define the PREFERRED_RENAME_CLASS
hook.  

Adding uses to the gpr_save pattern also works for the testcase.  I just added
uses for all of the saved regs.  We shouldn't need an exact list, because there
is little or no code before the prologue, and the prologues are added late.  An
exact list would be cleaner if you want to try to do that.  I also needed to
fix riscv_remove_unneeded_save_restore_calls to ignore the prologue_matched
insn when checking for USEs to avoid gcc testsuite regressions.  I now have 3
g++ testsuite regressions I haven't looked at yet.
FAIL: g++.dg/torture/stackalign/throw-1.C   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions -fpic execution test
FAIL: g++.dg/torture/stackalign/throw-2.C   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: g++.dg/torture/stackalign/throw-2.C   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions -fpic execution test

[Bug target/95252] testcase gcc.dg/torture/pr67916.c failure when testing with -msave-restore

2020-05-27 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95252

--- Comment #4 from Jim Wilson  ---
Created attachment 48624
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48624&action=edit
disable reg rename when -msave-restore

the code using MASK_SAVE_RESTORE is just for testing purposes

[Bug target/95252] testcase gcc.dg/torture/pr67916.c failure when testing with -msave-restore

2020-05-27 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95252

--- Comment #5 from Jim Wilson  ---
Created attachment 48625
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48625&action=edit
add uses to gpr_save pattern

the code using MASK_SAVE_RESTORE is just for testing purposes
unfinished, adds 3 new g++ testsuite failures

[Bug target/95252] testcase gcc.dg/torture/pr67916.c failure when testing with -msave-restore

2020-05-28 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95252

--- Comment #7 from Jim Wilson  ---
I've got 3 new g++ testsuite failures.  So we might still need an exact list of
USEs.  I hadn't thought about RVE.  That will have to be checked also. 
RV32/RV64 shouldn't matter, as the mode in the USEs doesn't matter.  Unless
maybe you want to use a multi-word load to match multiple registers with a
single USE to reduce the size of the patterns, in which case it would need to
be different for rv32 and rv64.  If we do need an exact list of USEs, maybe we
can use a match_parallel to simplify the patterns.

[Bug target/84553] -rdynamic generates TEXTREL relocations on ia64

2020-06-02 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84553

--- Comment #7 from Jim Wilson  ---
I was ia64 maintainer when I wrote the patch, but couldn't test it.  I'm not
the ia64 maintainer anymore.  I suggest asking the current ia64 maintainer. 
Though, oops, I see we don't have one listed in the MAINTAINERS file.  I thought
we had appointed one.  I'm a global maintainer, but that doesn't give me power
to approve my own patch for things I don't maintain anymore.  I'm hopelessly
overcommitted on RISC-V issues so unlikely to have time to do anything here.

[Bug target/95637] Read-only data assigned to `.sdata' rather than `.rodata'

2020-06-11 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95637

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #1 from Jim Wilson  ---
The RISC-V backend puts small read-only data in the srodata section.  RISC-V is
not the only target that supports srodata.  I agree that this might be
surprising for targets with memory protection that are expecting writes to
read-only data to trap but I don't think that standards require traps here. 
And for targets without memory protection this is a useful code size and
performance optimization.

We could perhaps disable srodata support for the riscv linux and freebsd
targets.  I think those are the only ones with memory protection that we
support.  Maybe make this controlled by an option so people can choose between
getting traps and getting smaller faster code.

[Bug target/95632] Redundant zero extension

2020-06-11 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95632

Jim Wilson  changed:

   What|Removed |Added

   Last reconfirmed||2020-06-12
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #1 from Jim Wilson  ---
We sign extend HImode constants as that is the natural thing to do to make
arithmetic work.  This does mean that unsigned short logical operations need a
zero extend after the operation which might otherwise be unnecessary.  This
can't be handled at rtl generation time as we don't know if the constant will
be used for arithmetic or logicals or signed or unsigned.  But maybe an
optimization pass could go over the code and convert HImode constants to signed
or unsigned as appropriate to reduce the number of sign/zero extend operations.
 We have the ree pass that we might be able to extend to handle this.

Handling this in combine requires a 4->3 splitter which is something combine
doesn't do.  We could work around that by not splitting constants before
combine, but that would be a major change and probably not beneficial, as we
wouldn't be able to easily optimize the high part of the constants anymore.

Another approach here might be to split the xor along with the constant.  If we
generated something like
srli a0,a0,1
xori a0,a0,1
li  a5,-24576
xor a0,a0,a5
then we can optimize away the following zero extend with a 3->2 splitter which
combine already supports via find_split_point.  We can still optimize the high
part of the constant. Since the immediates are sign extended, if the low part
of the immediate has the sign bit set, we would have to invert the high part of
the immediate to get the right result.  At least I think that works, I haven't
double checked it yet.  This only works for OR if the low part doesn't have the
sign bit set.  And this only works for AND if the low part does have the sign
bit set.
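
The arithmetic behind splitting an xor this way can be checked in C.  This is
a sketch of the identity only, not of the proof of concept patch.  The helper
names are hypothetical and the model uses 32-bit values with a 12-bit sign
extended immediate.

```c
#include <stdint.h>

/* Sign-extend the low 12 bits of C, as an I-type immediate would be.  */
int32_t sext12 (uint32_t c)
{
  int32_t lo = (int32_t) (c & 0xfff);
  return (lo & 0x800) ? lo - 0x1000 : lo;
}

/* The value that still has to be xor-ed in after the xori.  Its low 12
   bits are always zero.  When the low part has its sign bit set, the
   upper 20 bits come out as the inverse of the upper bits of C.  */
uint32_t xor_high_part (uint32_t c)
{
  return c ^ (uint32_t) sext12 (c);
}

/* x ^ c computed as an xori with the low part followed by an xor with
   the adjusted high part.  */
uint32_t split_xor (uint32_t x, uint32_t c)
{
  uint32_t t = x ^ (uint32_t) sext12 (c);
  return t ^ xor_high_part (c);
}
```

For the constant 0xA001 the low part is 1 and the high part is 0xA000.  For
0x12345fff the low part sign extends to -1, so the high part becomes
0xEDCBA000, the inverted upper bits.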

[Bug target/95637] Read-only data assigned to `.sdata' rather than `.rodata'

2020-06-15 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95637

--- Comment #3 from Jim Wilson  ---
People have asked about constant pools before, but as far as I know no one has
tried to implement support for them yet.

We don't have a pc-relative load, so it would be a two instruction sequence
with auipc.  Unless maybe you load the base address into a register, which is
probably OK for rvi but may cause register pressure problems for rve.  We have
a 12-bit signed offset, +/-2K which limits the range we can address if you want
to put the base address in a register.  There could also be complications with the
aggressive link time code relaxations that we do, depending on where you put
the constant pools and how you use them.  It isn't clear if constant pools are
better or worse than what we already have.

[Bug target/95632] Redundant zero extension

2020-06-15 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95632

--- Comment #3 from Jim Wilson  ---
It isn't possible to have patterns that match only in combine.  If we add a
pattern to accept (xor (reg) (large constant)) then it could match in any
optimization pass, and could prevent us from optimizing away redundant lui
instructions.

There is a representation issue here with constants.  If we split them early,
then optimizing redundant lui is easy.  If we split them late, then optimizing
redundant lui is hard.  There are also other optimizations that may be easy or
hard depending on whether constants are split early or late.  Currently, we
always split constants early, and changing that will have a major impact on the
code optimization, which may be good or bad, but more likely will be good for
some programs and bad for others.  I'd rather not change this as it will be a
major project to deal with the problems caused by the change.

Hence my suggestion at RTL generation time to split xor with constants
differently.  I have a proof of concept patch for that, but it needs a lot of
cleanup to be useful, and a lot of testing to verify that it improves code more
often than it harms code.

As for ree, splitters after register allocation traditionally check
reload_completed which is a global variable set near the end of the last
register allocation pass.  The split2 pass happens between reload and ree. 
Maybe moving ree before split2 would help RISC-V, but might hurt other targets.
 Or might help for some programs and hurt for others.

[Bug target/95632] Redundant zero extension

2020-06-15 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95632

--- Comment #4 from Jim Wilson  ---
Created attachment 48737
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48737&action=edit
proof of concept patch for changing xor with a large constant

needs cleanup and testing to be useful

[Bug tree-optimization/95685] Loop invariants can't be moved out of the loop

2020-06-15 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95685

--- Comment #1 from Jim Wilson  ---
The problem with the constant isn't apparent until we reach RTL generation and
see that it requires two instructions to load.  Then once in RTL optimization
passes we have mostly block local optimizations that aren't going to notice the
same constant used in 3 different blocks and optimize it.  The if statement
inside the unrolled loop bodies prevents RTL optimization passes from fixing
this.  So yes, this would work better if we could do loop invariant code motion
before loop unrolling as you suggested.

[Bug tree-optimization/95760] ivopts with loop variables

2020-06-20 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95760

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #1 from Jim Wilson  ---
You are compiling with -Os.  I get the expected result if I compile with -O2.

Looking at tree dumps, I see the first difference between -O2 and -Os dumps is
in the ch2 (copy loop header 2) pass, which explicitly disables loop header
copying when -Os is used.  Note the optimize_loop_for_size_p check in
should_duplicate_loop_header_p in tree-ssa-loop-ch.c.  You can see the
difference if you add -fdump-tree-ch2-all.  In the -O2 ch2 dump file, I see

Loop 1 is not do-while loop: latch is not empty.
Will duplicate bb 7
  Not duplicating bb 3: it is single succ.
Duplicating header of the loop 1 up to edge 7->3, 4 insns.
Loop 1 is do-while loop
Loop 1 is now do-while loop.

and in the -Os ch2 dump file, I see

Loop 1 is not do-while loop: latch is not empty.
  Not duplicating bb 7: optimizing for size.

The difference in loop optimization here then affects the later ivopt pass. 
Normally, duplicating basic blocks will make code bigger.  But in this case the
duplicated blocks enable better loop optimization which results in smaller code
at the end.  This kind of thing is hard to handle with the heuristics.  We
would have to optimize both ways and check to see which one is smaller at the
end to get this right every time, and the compiler doesn't work that way
currently.

I haven't checked older sources to see if/when a heuristic changed.

This isn't risc-v specific.  I see the same issue with x86_64.
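
What loop header copying does to the shape of a loop can be sketched by hand
in C.  The two functions are illustrative and are not the testcase from the
report.  The second one is a hand-written approximation of the shape after
the pass runs: the exit test is duplicated ahead of the loop so that the loop
itself becomes a do-while loop.

```c
/* Loop as written: the exit test is in the loop header.  */
int sum_while (const int *a, int n)
{
  int sum = 0;
  int i = 0;
  while (i < n)
    {
      sum += a[i];
      i++;
    }
  return sum;
}

/* Approximate shape after header copying: one copy of the test guards
   the loop, the other copy sits at the bottom of a do-while loop.  */
int sum_rotated (const int *a, int n)
{
  int sum = 0;
  int i = 0;
  if (i < n)
    do
      {
        sum += a[i];
        i++;
      }
    while (i < n);
  return sum;
}
```

The duplicated test is the extra code that the size heuristic avoids with -Os,
even though the do-while form can allow later loop passes to do better.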

[Bug tree-optimization/95760] ivopts with loop variables

2020-06-22 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95760

--- Comment #2 from Jim Wilson  ---
I took another look, and it turns out that the should_duplicate_loop_header_p
for size/speed is not the only issue.  There is also an issue in
tree-ssa-loop-ivopts.c when computing iv costs.  With speed, the +4 iv is
computed as cheaper than the +1 iv.  With size, the +4 iv and +1 iv have the
exact same cost, and since the +1 iv was looked at first that one was chosen. 
If I hack adjust_setup_cost to always use the speed cost calculation,
and retain the should_duplicate_loop_header_p hack, then both the inner and
outer loops get the +4 iv with -Os.

Looking at gcc-8.3, I see that the outer loop has the +4 iv and the inner loop
has the +1 iv.  This looks similar to the result I get with the
adjust_setup_cost hack but not the should_duplicate_loop_header_p hack.  So I
think the regression is solely due to some change in the cost calculation.

There is a lot of code involved in cost calculations.  This could have even
been a riscv backend change.  I would suggest doing a bisect over the gcc git
tree if you want to see exactly where and how the cost calculation changed.

The -Os and -O2 optimization diverges in try_improve_iv_set where it does "if
(acost < best_cost)".  Maybe this could be improved to handle the case where
acost == best_cost, and use some other criteria to choose which one is better,
e.g. maybe a giv is better than a biv if they have the same cost.  I haven't
tried looking into this.
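
The tie-breaking idea can be sketched in C as follows.  This is not the code
in tree-ssa-loop-ivopts.c; the struct, its fields, and the function name are
hypothetical, and whether preferring a giv is the right second criterion is
untested.

```c
/* Hypothetical summary of one candidate set of induction variables.  */
struct iv_choice
{
  unsigned cost;   /* total estimated cost of the set */
  int uses_giv;    /* nonzero if the set is built on a giv */
};

/* Return nonzero if A should replace BEST.  A strictly lower cost always
   wins.  On equal cost, the secondary criterion decides instead of the
   order in which the candidates were examined.  */
int prefer_candidate (const struct iv_choice *a, const struct iv_choice *best)
{
  if (a->cost != best->cost)
    return a->cost < best->cost;
  return a->uses_giv && !best->uses_giv;
}
```

With a plain "acost < best_cost" test, a candidate with equal cost never
replaces the one seen first, which matches the -Os behavior described above.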

[Bug target/96026] overlap register bewteen DEST and SOURCE in different machine mode

2020-07-02 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96026

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org
 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Jim Wilson  ---
This is 3 different places where you have asked the same question now.  One
place would have been good enough.  Already answered in other places.

[Bug target/96191] New: aarch64 stack_protect_test canary leak

2020-07-13 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96191

Bug ID: 96191
   Summary: aarch64 stack_protect_test canary leak
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wilson at gcc dot gnu.org
  Target Milestone: ---

Given a simple testcase
extern int sub (int);

int
main (void)
{
  sub (10);
  return 0;
}
compiling with -O -S -fstack-protector-all -mstack-protector-guard=global
in the epilogue for the canary check I see
ldr x1, [sp, 40]
ldr x0, [x19, #:lo12:__stack_chk_guard]
eor x0, x1, x0
cbnz x0, .L4
Both x0 and x1 have the stack protector canary loaded into them, and the eor
clobbers x0, but x1 is left alone.  This means the value of the canary is
leaking from the epilogue.  The canary value is never supposed to survive in a
register outside the stack protector patterns.

A powerpc64-linux toolchain build with the same testcase and options generates
lwz 9,28(1)
lwz 10,0(31)
xor. 9,9,10
li 10,0
bne- 0,.L4
and note that it clears the second register after the xor to prevent the canary
leak.  The aarch64 stack_protect_test pattern should do the same thing.
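
To make the difference concrete, here is a small register-file model of the
two sequences.  It is only a sketch: the four-entry register array and the
function names are made up for illustration, and it models neither target's
real register allocation.

```c
#include <stdbool.h>
#include <stdint.h>

enum { NREGS = 4 };

/* Return true if any modeled register still holds CANARY.  */
bool
canary_leaks (const uint64_t regs[NREGS], uint64_t canary)
{
  for (int i = 0; i < NREGS; i++)
    if (regs[i] == canary)
      return true;
  return false;
}

/* Model of the aarch64 sequence: load both copies, combine them into
   register 0, and leave register 1 untouched.  */
void
check_without_clear (uint64_t regs[NREGS], uint64_t saved, uint64_t guard)
{
  regs[1] = saved;
  regs[0] = guard;
  regs[0] = regs[1] ^ regs[0];
}

/* Model of the powerpc sequence: same compare, then zero the register
   that still holds a copy of the canary.  */
void
check_with_clear (uint64_t regs[NREGS], uint64_t saved, uint64_t guard)
{
  regs[0] = saved;
  regs[1] = guard;
  regs[0] = regs[0] ^ regs[1];
  regs[1] = 0;
}
```

After check_without_clear, register 1 still compares equal to the canary;
after check_with_clear, no modeled register does.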

[Bug target/96191] aarch64 stack_protect_test canary leak

2020-07-14 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96191

--- Comment #3 from Jim Wilson  ---
The location of the canary is not known to the attacker.  You are not supposed
to leak the address of the canary or the value of the canary.  If you leak
either, then an attacker has a chance to restore the canary after clobbering
it.

See the descriptions of the stack_protect_set and stack_protect_test patterns
in gcc/doc/md.texi which make clear that no intermediate values should be
allowed to survive past the end of the pattern.

[Bug sanitizer/96307] [10/11 Regression] ICE in sanopt on riscv64 since r11-2283-g2ca1b6d009b194286c3ec91f9c51cc6b0a475458

2020-08-04 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96307

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #2 from Jim Wilson  ---
It is calling targetm.asan_shadow_offset which is a null function pointer
currently for RISC-V.  This is related to Kito's recent patch to re-enable ksan
support when asan_shadow_offset isn't defined.  But it looks like there are
multiple params that can cause asan_shadow_offset to be called for ksan when it
normally isn't.  So this change may need to be removed.

Good news is that we have a patch to add asan support for RISC-V which would
make Kito's toplev.c patch unnecessary for us.

[Bug bootstrap/97183] New: zstd build failure for gcc 10 on Ubuntu 16.04

2020-09-23 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97183

Bug ID: 97183
   Summary: zstd build failure for gcc 10 on Ubuntu 16.04
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wilson at gcc dot gnu.org
  Target Milestone: ---

This was originally reported here
https://github.com/riscv/riscv-gnu-toolchain/issues/718

A build of gcc 10 on Ubuntu 16.04 with the libzstd-dev package installed gives
multiple errors.  Ubuntu 16.04 has zstd version 0.5.1 which lacks features that
gcc is trying to use.  The gcc configure test for zstd.h only verifies that it
exists; it doesn't verify the version, or that any of the functions or macros
we need are present.

Ubuntu 18.04 has zstd version 1.3.3, and I verified that it builds.  So we can maybe
verify the version is 1.3.3 or greater, or maybe check for the specific
functions and macros that we are trying to use.

Kito did a little research that suggests that we need version 1.3.0 or
greater.  We haven't tried to verify that.

--without-zstd successfully works around the problem.
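
A version check could look roughly like the sketch below.  It assumes the
packing that zstd.h uses for ZSTD_VERSION_NUMBER (major * 10000 + minor * 100
+ release); the helper names and MIN_ZSTD_VERSION_NUMBER are hypothetical, and
the 1.3.0 minimum is the unverified figure mentioned above, not a confirmed
requirement.

```c
#include <stdbool.h>

/* Pack a version the way zstd.h packs ZSTD_VERSION_NUMBER.  */
unsigned
zstd_version_number (unsigned major, unsigned minor, unsigned release)
{
  return major * 100 * 100 + minor * 100 + release;
}

/* Hypothetical minimum: 1.3.0, the unverified figure from this report.  */
#define MIN_ZSTD_VERSION_NUMBER 10300u

/* Return true if VERSION_NUMBER is at least the assumed minimum.  */
bool
zstd_new_enough (unsigned version_number)
{
  return version_number >= MIN_ZSTD_VERSION_NUMBER;
}
```

With this model the Ubuntu 16.04 version (0.5.1) is rejected and the Ubuntu
18.04 version (1.3.3) is accepted.  Checking for the specific functions and
macros would be the more direct test.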

build log info from the original bug report:

g++ -fno-PIE -c   -g -O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE  
-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing
-Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual
-pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings  
-DHAVE_CONFIG_H -I. -I. -I../.././riscv-gcc/gcc -I../.././riscv-gcc/gcc/.
-I../.././riscv-gcc/gcc/../include -I../.././riscv-gcc/gcc/../libcpp/include 
-I../.././riscv-gcc/gcc/../libdecnumber
-I../.././riscv-gcc/gcc/../libdecnumber/dpd -I../libdecnumber
-I../.././riscv-gcc/gcc/../libbacktrace   -o optabs-tree.o -MT optabs-tree.o
-MMD -MP -MF ./.deps/optabs-tree.TPo ../.././riscv-gcc/gcc/optabs-tree.c
../.././riscv-gcc/gcc/lto-compress.c: In function ‘int
lto_normalized_zstd_level()’:
../.././riscv-gcc/gcc/lto-compress.c:120:36: error: ‘ZSTD_maxCLevel’ was not
declared in this scope
   else if (level > ZSTD_maxCLevel ())
^
../.././riscv-gcc/gcc/lto-compress.c: In function ‘void
lto_uncompression_zstd(lto_compression_stream*)’:
../.././riscv-gcc/gcc/lto-compress.c:160:74: error: ‘ZSTD_getFrameContentSize’
was not declared in this scope
   unsigned long long const rsize = ZSTD_getFrameContentSize (cursor, size);
  ^
../.././riscv-gcc/gcc/lto-compress.c:161:16: error: ‘ZSTD_CONTENTSIZE_ERROR’
was not declared in this scope
   if (rsize == ZSTD_CONTENTSIZE_ERROR)
^
../.././riscv-gcc/gcc/lto-compress.c:163:21: error: ‘ZSTD_CONTENTSIZE_UNKNOWN’
was not declared in this scope
   else if (rsize == ZSTD_CONTENTSIZE_UNKNOWN)
 ^

[Bug libstdc++/97182] Add support for targets that only define SYS_futex_time64 and not SYS_futex

2020-09-23 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97182

--- Comment #3 from Jim Wilson  ---
libgomp works on riscv64-linux.  It would only be riscv32-linux that is broken.

The riscv32 support was only just recently added to FSF glibc, and hasn't
appeared in a release yet, so arguably, there is no ABI break for riscv32-linux
if we can fix this before the gcc-11 release, as that is the first one that can
officially support riscv32-linux.  Unofficially we have embedded linux distros
with riscv32-linux but they should be able to tolerate ABI breaks, particularly
since we never guaranteed that the riscv32-linux ABI was stable.

This will be a problem for other 32-bit targets though when they enable 64-bit
time_t support.

[Bug middle-end/92263] [10 Regression] ICE in commit_one_edge_insertion, at cfgrtl.c:2087 since r270758

2019-10-29 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92263

--- Comment #3 from Jim Wilson  ---
I did a cross compiler build and check yesterday using up-to-date sources and
did not see this failure.  I've been testing regularly.  I did my build on an
x86_64 Ubuntu 16.04 machine with gcc-5.4 as the system compiler.  Maybe this
depends on the compiler used for the build?  Or the exact configure command?

[Bug middle-end/92263] [10 Regression] ICE in commit_one_edge_insertion, at cfgrtl.c:2087 since r270758

2019-10-29 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92263

--- Comment #4 from Jim Wilson  ---
OK, I get it now.  You are using non-standard optimization options with a
testsuite testcase.  I can reproduce when I use your compiler options.  I will
take a look.

[Bug middle-end/92263] [10 Regression] ICE in commit_one_edge_insertion, at cfgrtl.c:2087 since r270758

2019-10-29 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92263

--- Comment #5 from Jim Wilson  ---
The patch adds a RISC-V movcc pattern.  This causes toplev.c to enable
flag_tree_cselim.  This optimization pass creates a complex long double
conditional move via a phi node.
  complex long double cstore_31;
  ...
   [local count: 27903866]:
  cstore_30 = MEM  [(void *)_8];

   [local count: 55807731]:
  # cstore_31 = PHI <__complex__ (0.0, 0.0)(4), cstore_30(5)>
  MEM  [(void *)_8] = cstore_31;
When we try to convert gimple to rtl, eliminate_phi calls
insert_value_copy_on_edge for the 32-byte long double 0 value.  The constant
then gets forced to memory, and we end up calling emit_block_move with
BLOCK_OP_NO_LIBCALL, which ends up emitting a loop to do the memory to memory
copy.  Then later in commit_one_edge_insertion we split the edge, insert the
code containing the loop, and then trigger an abort because the last
instruction inserted is the loop back branch.

I don't see where the RISC-V port did anything wrong.  The load hoisting code
is checking the movcc optab to see if the target supports the operation, but I
don't see anything obvious like that in the cselim pass.  The only obvious fix
I see in the RISC-V back end is to modify riscv_expand_block_move to emit
inline non-loop code for a 32-byte memory to memory copy, even when optimizing
for size, which I'd rather not do.  Maybe it can be fixed in
commit_one_edge_insertion by allowing conditional branches but not
unconditional branches, but it isn't clear why this is refusing to allow
branches here in the first place.  I will have to look at other targets to see
why they aren't failing.

[Bug middle-end/92263] [10 Regression] ICE in commit_one_edge_insertion, at cfgrtl.c:2087 since r270758

2019-10-30 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92263

--- Comment #6 from Jim Wilson  ---
Looking at some other targets.  ARM has movcc but not 128-bit long double. 
AArch64 has movcc and 128-bit long double, but has 128-bit load/store so this is
only 4 instructions.  mips64, powerpc64, and sparc64 have movcc and 128-bit
long double, but emit the memcpy inline as 8 instructions.  riscv64 meanwhile
wants the libcall with -Os as that is 4 instructions instead of 8.  For rv32
this would be 16 instructions.  I'm not sure offhand if the other targets
support 32-bit code and 128-bit long double.

Anyways, I tracked the use of BLOCK_OP_NO_LIBCALL in emit_move_complex back to
bugzilla 15289, fixed by a patch from Richard Henderson back in Dec 1 2004.  I
think it is just an oversight that -Os wasn't considered here.  I think the
correct fix is to only force BLOCK_OP_NO_LIBCALL when optimizing for speed. 
With this change, I get the 8 instruction sequence with -O2, and the 4
instruction libcall sequence with -Os, which is what the RISC-V backend wants,
and this lets the testcase work.
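
The decision being changed can be modeled as below.  This is a toy sketch, not
the code in expr.c or the RISC-V backend: the enum, the function names, and
the way the choices are combined are all made up for illustration.  The
6-instruction limit is the one given in the commit log for this fix.

```c
#include <stdbool.h>

enum copy_strategy { COPY_INLINE, COPY_LIBCALL, COPY_LOOP };

/* When optimizing for size, inline sequences longer than this are not
   wanted (figure taken from the commit log for this PR).  */
enum { MAX_INLINE_INSNS_FOR_SIZE = 6 };

/* Pick how to copy a block that needs INLINE_INSNS instructions inline.  */
enum copy_strategy
choose_copy_strategy (bool optimize_for_speed, int inline_insns,
                      bool libcall_allowed)
{
  if (optimize_for_speed || inline_insns <= MAX_INLINE_INSNS_FOR_SIZE)
    return COPY_INLINE;
  /* Too long to inline for size: use memcpy if allowed, else a loop.  */
  return libcall_allowed ? COPY_LIBCALL : COPY_LOOP;
}

/* The proposed change: forbid the libcall only when optimizing for speed.  */
bool
libcall_allowed_after_fix (bool optimize_for_speed)
{
  return !optimize_for_speed;
}
```

With the libcall always forbidden, the 8-instruction rv64 case at -Os falls
through to the loop, which is what trips commit_one_edge_insertion.  With the
change it becomes a libcall at -Os and stays inline at -O2.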

[Bug middle-end/92263] [10 Regression] ICE in commit_one_edge_insertion, at cfgrtl.c:2087 since r270758

2019-10-30 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92263

Jim Wilson  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |wilson at gcc dot gnu.org

--- Comment #7 from Jim Wilson  ---
Created attachment 47139
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47139&action=edit
untested proposed fix

[Bug middle-end/92263] [10 Regression] ICE in commit_one_edge_insertion, at cfgrtl.c:2087 since r270758

2019-11-05 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92263

--- Comment #8 from Jim Wilson  ---
Author: wilson
Date: Tue Nov  5 22:34:40 2019
New Revision: 277861

URL: https://gcc.gnu.org/viewcvs?rev=277861&root=gcc&view=rev
Log:
Allow libcalls for complex memcpy when optimizing for size.

The RISC-V backend wants to use a libcall when optimizing for size if
more than 6 instructions are needed.  Emit_move_complex asks for no
libcalls.  This case requires 8 insns for rv64 and 16 insns for rv32,
so we get fallback code that emits a loop.  Commit_one_edge_insertion
doesn't allow code inserted for a phi node on an edge to end with a
branch, and so this triggers an assertion.  This problem goes away if
we allow libcalls when optimizing for size, which gives the code the
RISC-V backend wants, and avoids triggering the assert.

gcc/
PR middle-end/92263
* expr.c (emit_move_complex): Only use BLOCK_OP_NO_LIBCALL when
optimize_insn_for_speed_p is true.

gcc/testsuite/
PR middle-end/92263
* gcc.dg/pr92263.c: New.

Added:
trunk/gcc/testsuite/gcc.dg/pr92263.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/expr.c
trunk/gcc/testsuite/ChangeLog

[Bug middle-end/92263] [10 Regression] ICE in commit_one_edge_insertion, at cfgrtl.c:2087 since r270758

2019-11-05 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92263

Jim Wilson  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED

--- Comment #9 from Jim Wilson  ---
Fixed on mainline.

Do you need a backport to the gcc-9 branch?

[Bug middle-end/92263] [10 Regression] ICE in commit_one_edge_insertion, at cfgrtl.c:2087 since r270758

2019-11-14 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92263

Jim Wilson  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from Jim Wilson  ---
Fixed on mainline.  No backport requested.

[Bug bootstrap/92709] Cross Compilation failed for Latest GCC riscv64-linux-gnu on Linux/WSL2

2019-12-02 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92709

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #3 from Jim Wilson  ---
>make[4]: Entering directory 
>'/home/cqwrteur/gcc-riscv64-build/riscv64-linux-gnu/lib32/ilp32/libgcc'
>make[4]: *** No rule to make target 'all'.  Stop.

Look in the riscv64-linux-gnu/lib32/ilp32/libgcc dir.  There should be a
Makefile that is a copy of the $srcdir/libgcc/Makefile.in file with some sed
substitution.  If this isn't true, then it is the configure step that failed.

You can force the library dirs to reconfigure by deleting them.  It gets a bit
more complicated with multilibs, I think you have to delete the top level
riscv64-linux-gnu/libgcc to force a reconfigure, but just a rm -rf
riscv64-linux-gnu works too and is simpler, though more stuff will be rebuilt.

You might want to do a -j1 make to get an easier to read build log.

I don't have a Windows machine at work, but there is only one WSL problem that
I have seen reported, and it is that WSL makes filesystems case-insensitive by
default which is contrary to linux practice.  It is known that this will break
glibc builds which uses .os and .oS for two different kinds of files.  I don't
think that this breaks gcc builds.  But since you are trying to do a cross to
riscv64-linux-gnu you will run into this problem if you haven't already.
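
The .os versus .oS collision is easy to demonstrate.  The helper below is a
hypothetical illustration (ASCII only) of what a case-insensitive filesystem
effectively does when comparing names; it is not taken from glibc or WSL.

```c
#include <ctype.h>
#include <stdbool.h>

/* Return true if A and B would name the same file on a case-insensitive
   filesystem.  Only ASCII case folding is modeled.  */
bool
names_collide (const char *a, const char *b)
{
  while (*a && *b)
    {
      if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
        return false;
      a++;
      b++;
    }
  /* Equal only if both strings ended together.  */
  return *a == *b;
}
```

Here names_collide ("foo.os", "foo.oS") is true, so the second file written
silently replaces the first.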

[Bug target/93062] Failed to generate indirect branch for long branches on riscv

2019-12-24 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93062

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #1 from Jim Wilson  ---
We need to check length attributes in the branch patterns, and emit different
sequences depending on the length.  There are multiple examples to compare
with, for instance the condjump pattern in the aarch64.md file.
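
As a rough sketch of the selection logic: the function below picks a sequence
from the branch offset, using the RISC-V immediate ranges (conditional
branches reach -4096..+4094 bytes, jal reaches -1048576..+1048574 bytes).  The
enum and function names are hypothetical, and a real pattern would key off the
insn length attribute rather than a raw offset.  It also ignores that the jump
in the longer forms sits one instruction after the inverted branch.

```c
#include <stdint.h>

enum branch_form {
  BRANCH_SHORT,                  /* single conditional branch */
  BRANCH_INVERTED_PLUS_JAL,      /* inverted branch over a jal */
  BRANCH_INVERTED_PLUS_INDIRECT  /* inverted branch over an indirect jump */
};

/* Choose a sequence for a conditional branch OFFSET bytes away.  */
enum branch_form
choose_branch_form (int64_t offset)
{
  if (offset >= -4096 && offset <= 4094)
    return BRANCH_SHORT;
  if (offset >= -1048576 && offset <= 1048574)
    return BRANCH_INVERTED_PLUS_JAL;
  return BRANCH_INVERTED_PLUS_INDIRECT;
}
```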

[Bug testsuite/93045] gc bug with test "start_unit-test-1.c"

2019-12-24 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93045

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #1 from Jim Wilson  ---
This worked for me running natively on a fedora rawhide system with a 4.15
linux kernel.

[Bug target/93062] Failed to generate indirect branch for long branches on riscv

2019-12-24 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93062

Jim Wilson  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-12-25
 Ever confirmed|0   |1

[Bug inline-asm/93202] [RISCV] ICE when using inline asm 'h' constraint modifier

2020-01-08 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93202

--- Comment #2 from Jim Wilson  ---
%h is used for the gcc internal implementation of emitting auipc.  I'm
skeptical that it is useful for asms.  Stripping the HIGH rtx is an internal
implementation detail, and does not apply to asms, as you can't get a HIGH
there.  Is there a reason why you are trying to use it?  There may be a better
solution for what you need.  If we really need %h to work in asms then it
probably needs some inconvenient work.  I'd rather document that %h shouldn't
be used in asms, or leave it undocumented as an internal gcc implementation
detail.  I'm assuming that you are just working on llvm support, and don't
actually need %h to work in asms, you just need llvm and gcc compatibility.

riscv_print_operand does use output_operand_lossage as it should.  But it calls
a function riscv_print_operand_reloc which calls gcc_unreachable in a switch
statement.  That is an oversight.  It can be fixed to use
output_operand_lossage too.

[Bug inline-asm/93202] [RISCV] ICE when using inline asm 'h' constraint modifier

2020-01-08 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93202

--- Comment #5 from Jim Wilson  ---
Jakub's patch looks OK to me.

[Bug inline-asm/93202] [RISCV] ICE when using inline asm 'h' constraint modifier

2020-01-08 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93202

--- Comment #7 from Jim Wilson  ---
(In reply to Luís Marques from comment #3)
> Jim Wilson: I'm not using it, I was only working on the LLVM implementation.
> Could you please clarify if following modifiers are also internal only?
> 
>   'C'  Print the integer branch condition for comparison OP.
>   'A'  Print the atomic operation suffix for memory model OP.
>   'F'  Print a FENCE if the memory model requires a release.

'C' maps an rtx to a string.  It is intended to be used for comparisons to emit
the appropriate compare instruction, because the instruction names match the
gcc internal rtx names.  It can't be used for its intended purpose in an asm,
as you can't get a comparison operator as an operand to an asm.  Since it works
with any rtx, it can be used in an asm, but is very unlikely to be useful.

'A' takes a memory order value from stdatomic.h, and emits a .aq if it is one
of the memory orders that requires an acquire operation, e.g. __ATOMIC_ACQUIRE.
 Gcc calls this a memory model internally, and defines the values in
memmodel.h.  The primary use is for the atomic builtin functions, to map the
memory order argument to the right instruction.  This takes an integer
argument, so in theory it could be used in an asm, but unlikely to be very
useful.

'F' is similar, except it is for atomic releases, and emits a fence instruction.
 This one is a bit of historical accident.  The gcc riscv port was written
before we had a formal memory model defined, and so to be conservative, it
emits fences in a lot of places where we probably don't need them.  Now that we
do have a formal memory model defined, the gcc port needs to be fixed to
implement it, except there is no one to do the work, so it is unclear when it
will happen.  Meanwhile, the port still emits a lot of fences we don't need via
'F'.  This takes an integer argument as above, so likewise in theory could be
used in an asm, but unlikely to be useful.  And this one has the additional
problem that it needs to change in a future gcc release, though we could
preserve the current meaning of the 'F' letter and use a new letter if
necessary in the rewrite.

The useful print operand letters are the ones for registers, constants, and
addresses.  These random ones used for internal gcc features aren't really
useful in asms.
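
For the LLVM side, the mapping that 'A' and 'F' perform can be sketched with
the stdatomic.h memory orders.  This is a simplified model with made-up
function names, not the code in the riscv port.  In particular, treating
memory_order_consume like acquire is an assumption on my part.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Model of 'A': does this memory order want the acquire suffix?  */
bool
needs_acquire_suffix (memory_order mo)
{
  switch (mo)
    {
    case memory_order_consume:  /* assumption: handled like acquire */
    case memory_order_acquire:
    case memory_order_acq_rel:
    case memory_order_seq_cst:
      return true;
    default:
      return false;
    }
}

/* Model of 'F': does this memory order want a fence before the access?  */
bool
needs_release_fence (memory_order mo)
{
  switch (mo)
    {
    case memory_order_release:
    case memory_order_acq_rel:
    case memory_order_seq_cst:
      return true;
    default:
      return false;
    }
}
```

Both take a plain integer memory order, which is why they technically accept
an asm operand even though that isn't their purpose.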

[Bug target/93304] RISC-V: Function with interrupt attribute use register without save/restore at prologue/epilogue

2020-01-17 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93304

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #2 from Jim Wilson  ---
There is a convention of using all caps for function arguments.  See for
instance the riscv_build_integer function comment.  It would be nice to
preserve this convention, but this is a very minor issue.  I usually put a
blank line between the function comment and the function, but again this is a
very minor issue.

The patch looks OK to me.

[Bug target/93333] ICE: RTL check: expected code 'const_int', have 'and' in riscv_rtx_costs, at config/riscv/riscv.c:1645

2020-01-20 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=9

Jim Wilson  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-01-21
 Ever confirmed|0   |1

--- Comment #2 from Jim Wilson  ---
I can reproduce.  Reproducing requires enabling rtl checking which is not on by
default.  I suspect that there are other similar problems, as we probably
haven't tested a build with rtl checking enabled before.

The problem is in riscv_rtx_costs which only needs to return valid values for
valid rtl, and it is failing the rtl check for invalid rtl, so this isn't a
major problem if rtl checking is off, but it does need to be fixed to be safe.

[Bug target/93333] ICE: RTL check: expected code 'const_int', have 'and' in riscv_rtx_costs, at config/riscv/riscv.c:1645

2020-01-20 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=9

--- Comment #3 from Jim Wilson  ---
Jakub's patch looks OK, and works for the testcase.

[Bug target/93333] ICE: RTL check: expected code 'const_int', have 'and' in riscv_rtx_costs, at config/riscv/riscv.c:1645

2020-01-20 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=9

--- Comment #4 from Jim Wilson  ---
I tried some cross testing with rtl checking enabled, and found another rtl
check bug with the -msave-restore support in config/riscv/riscv-sr.c where it
uses XINT to read from a CONST_INT which is wrong, as it is actually an XWINT
value, and we should be using INTVAL to read the value.  I've tested a patch
for that, and can commit it tomorrow.  -msave-restore is for embedded code
size, so this shouldn't be a problem for linux users.

[Bug target/93333] ICE: RTL check: expected code 'const_int', have 'and' in riscv_rtx_costs, at config/riscv/riscv.c:1645

2020-01-21 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=9

Jim Wilson  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Jim Wilson  ---
Fixed on mainline.

[Bug target/93304] RISC-V: Function with interrupt attribute use register without save/restore at prologue/epilogue

2020-01-21 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93304

Jim Wilson  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Jim Wilson  ---
Fixed on mainline.

[Bug target/89627] Miscompiled constructor call

2020-01-27 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89627

Jim Wilson  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED
   Assignee|unassigned at gcc dot gnu.org  |wilson at gcc dot gnu.org

--- Comment #5 from Jim Wilson  ---
Was fixed in gcc 9 last year.

[Bug target/91602] GCC fails to build for riscv in a combined tree due to misconfigured leb128 support

2020-01-29 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91602

--- Comment #10 from Jim Wilson  ---
The proposed binutils patch has multiple problems and has gone through multiple
iterations.  Not clear when or if we will be able to accept it.

The gcc configure patch to eliminate the call to gcc_GAS_VERSION_GTE_IFELSE for
in-tree gas builds actually looked like the better solution to me though I
haven't tried it yet.  If we go this way the patch should perhaps eliminate
everything related to gcc_GAS_VERSION_GTE_IFELSE which is a much bigger patch.

Since combined tree builds are obsolete, this is low on my priority list.

[Bug target/91602] GCC fails to build for riscv in a combined tree due to misconfigured leb128 support

2020-01-29 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91602

--- Comment #11 from Jim Wilson  ---
Since Marxin pinged this and got me thinking about this again, I realized that
there is a simpler fix based on Serge's second suggestion.  We can just delete
the gas version number from the uleb128 gas check in configure.ac.  This will
force a gas feature check for uleb128 only, which solves the RISC-V build
problem, and is a nice small change.  I'm testing a patch for that.

[Bug target/93532] RISCV g++ hangs with optimization >= -O2

2020-02-03 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93532

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #9 from Jim Wilson  ---
I tried the buildroot instructions.  It didn't work on an ubuntu 16.04 server
machine.  There is a 'python3 pip3 -q docwriter' command that hangs.  I also
discovered that the script isn't restartable.  It runs rm -rf on the build
directory and exits with an error.  I did get it to work on my ubuntu 18.04
laptop.  And it does hang, but it isn't the btPolyhedralContactClipping.cpp
file that hangs for me, it is the btBoxBoxDetector.cpp file.  I was able to
reproduce this with a gcc-8.3.0 build using -O2 -fPIC -fstack-protector-strong
options to compile the file.  It does not reproduce using the top of the
gcc-8-branch svn tree, suggesting that either it is already fixed, or it is
maybe a memory corruption problem that is hard to reproduce.

Using gdb to attach to the gcc-8.3.0 compiler, I see that it is looping in lra,
but I haven't tried to debug that yet.

#0  0x00705e7b in bitmap_find_bit (bit=42321, bit@entry=330, 
head=0x376ae88) at ../../gcc-8.3.0/gcc/bitmap.c:539
#1  bitmap_set_bit (head=0x376ae88, bit=bit@entry=42321)
at ../../gcc-8.3.0/gcc/bitmap.c:600
#2  0x0099b95f in mark_regno_dead (regno=42321, mode=, 
point=) at ../../gcc-8.3.0/gcc/lra-lives.c:362
#3  0x0099c9c4 in process_bb_lives (dead_insn_p=false, 
curr_point=@0x7ffc9a90: 181876, bb=)
at ../../gcc-8.3.0/gcc/lra-lives.c:842
#4  lra_create_live_ranges_1 (all_p=all_p@entry=true, 
dead_insn_p=dead_insn_p@entry=false)
at ../../gcc-8.3.0/gcc/lra-lives.c:1337
#5  0x0099e7c0 in lra_create_live_ranges (all_p=all_p@entry=true, 
dead_insn_p=dead_insn_p@entry=false)
at ../../gcc-8.3.0/gcc/lra-lives.c:1406
#6  0x00982d0c in lra (f=)
at ../../gcc-8.3.0/gcc/lra.c:2473
#7  0x0093fa32 in do_reload () at ../../gcc-8.3.0/gcc/ira.c:5465
#8  (anonymous namespace)::pass_reload::execute (this=)
at ../../gcc-8.3.0/gcc/ira.c:5649
...

[Bug target/93532] RISCV g++ hangs with optimization >= -O2

2020-02-03 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93532

--- Comment #10 from Jim Wilson  ---
Created attachment 47774
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47774&action=edit
testcase that reproduces for me

compile with -O2 -fPIC -fstack-protector-strong

[Bug target/93532] RISCV g++ hangs with optimization >= -O2

2020-02-04 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93532

--- Comment #11 from Jim Wilson  ---
I'm able to reproduce with the gcc-8-branch now.  Maybe I made a mistake with
my earlier build.  Anyways, it looks like it is going wrong here in the reload
dump

  
  Creating newreg=1856, assigning class NO_REGS to save r1856
  434: fa0:SF=call [`sqrtf'] argc:0
  REG_UNUSED fa0:SF
  REG_CALL_DECL `sqrtf'
  REG_EH_REGION 0
Add reg<-save after:
 2446: r114:SF#0=r1856:DF

  432: NOTE_INSN_BASIC_BLOCK 24
Add save<-reg after:
 2445: r1856:DF=r114:SF#0

  

then later we appear to end up in a loop generating secondary reloads that need
secondary reloads themselves, and so forth.  The instruction above looks funny,
trying to use a subreg to convert DFmode to SFmode.  I don't think we should be
generating that.

So it looks like a caller save problem.  If I add -fno-caller-saves the compile
finishes.  It appears that we need a definition for HARD_REGNO_CALLER_SAVE_MODE
because the default definition can't work here.  The comment in sparc.h for
HARD_REGNO_CALLER_SAVE_MODE looks relevant.  The same definition may work for
RISC-V.  Looks like the MIPS port does it the same way too.
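
The difference between the two definitions can be shown with a toy model.
Everything here is hypothetical (the enum, the function names, and a register
whose widest mode is DF); it only illustrates the shape of the sparc-style
definition as I read it, which keeps the mode of the value being saved and
only falls back to the widest mode when no mode is known.

```c
enum toy_mode { MODE_VOID, MODE_SF, MODE_DF };

/* Stand-in for "pick the widest mode this register supports".  */
enum toy_mode
widest_mode_for_reg (void)
{
  return MODE_DF;
}

/* Default-style choice: ignore the value's mode, always use the widest.  */
enum toy_mode
caller_save_mode_widest (enum toy_mode value_mode)
{
  (void) value_mode;
  return widest_mode_for_reg ();
}

/* Sparc-style choice: keep the value's own mode when it is known.  */
enum toy_mode
caller_save_mode_by_value (enum toy_mode value_mode)
{
  return value_mode == MODE_VOID ? widest_mode_for_reg () : value_mode;
}
```

In this model an SFmode value is saved as DFmode by the first function, which
is where the odd DF/SF subreg comes from, and as SFmode by the second.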

[Bug target/93532] RISCV g++ hangs with optimization >= -O2

2020-02-06 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93532

--- Comment #12 from Jim Wilson  ---
A bisection on mainline between the gcc-8 and gcc-9 releases shows that this
testcase was fixed by a combine patch for PR87600 that stops combining hard
regs with pseudos to reduce register pressure.  The commentary refers to ira
and lra problems.  A combine patch won't be as safe as a RISC-V backend patch
though.

I tried testing the riscv HARD_REGNO_CALLER_SAVE_MODE patch with buildroot but
it turns out that it is downloading a pre-built compiler instead of building
one.  So dropping in the patch doesn't do anything.  I will have to figure out
what is going on there.

Trying the riscv patch with mainline on the testcase, I see that I get better
rematerialization without the confusing subregs, and I also get smaller stack
frames since we are now saving SFmode to the stack instead of DFmode.
Otherwise, I don't see any significant changes to the code.

I tried a make check with the riscv patch on mainline, and got an unexpected
g++ testsuite failure, so I will have to look into that.

[Bug target/93532] RISCV g++ hangs with optimization >= -O2

2020-02-06 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93532

Jim Wilson  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |wilson at gcc dot gnu.org

--- Comment #13 from Jim Wilson  ---
Created attachment 47794
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47794&action=edit
untested patch to fix the problem

[Bug target/93532] RISCV g++ hangs with optimization >= -O2

2020-02-08 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93532

Jim Wilson  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #19 from Jim Wilson  ---
Patch applied to mainline.  This is just a minor optimization for gcc-10 as a
combiner patch between gcc-8 and gcc-9 reduces register pressure enough to
prevent the hang.  Hence there is no real need for the patch in gcc-9.  The
patch might be useful in gcc-8, but the problem is hard to reproduce, buildroot
is the only one that ran into the problem, and they can always add the patch to
their tree, so not clear if we really need it on the gcc-8 branch.

[Bug target/93532] RISCV g++ hangs with optimization >= -O2

2020-02-08 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93532

--- Comment #20 from Jim Wilson  ---
Thanks for confirming that it solves the buildroot build problem.

My gcc mainline g++ test failure turned out to be a thread related issue with
qemu cross testing.  The testcase always works on hardware, but fails maybe
10-20% of the time when run under qemu.  RISC-V qemu is known to still have a
few bugs in this area, though they might already be fixed in newer qemu
versions than what I have.

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2020-02-20 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #29 from Jim Wilson  ---
The testcase works for riscv64-elf but does not work for riscv32-elf.  The
difference is in the einline pass before dse1.  riscv64-elf has
tmp.C:12:17: optimized:  Inlining constexpr C::C()/1 into C slow()/3.
whereas riscv32-elf has
tmp.C:12:17: missed:   will not early inline: C slow()/3->constexpr
C::C()/1, call is cold and code would grow by 1

Since the constructor was not early inlined, dse1 can't eliminate the redundant
store.  The constructor eventually gets inlined between
085i.materialize-all-clones and 088t.fixup_cfg3 which allows dse2 to eliminate
the redundant store.

I can make the testcase work for riscv32-elf if I add
--param max-inline-insns-size=1
to allow the constructor to be inlined during the einline pass.  I didn't check
to see if this works for the other failing targets.

[Bug rtl-optimization/92656] The zero_extend insn can't be eliminated in the combine pass

2020-02-27 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92656

--- Comment #3 from Jim Wilson  ---
Looking at this, I see that the problem occurs in record_value_for_reg where it
does
  if (!insn
      || (value && rsp->last_set_table_tick >= label_tick_ebb_start))
    rsp->last_set_invalid = 1;
last_set_table_tick is 2 and label_tick_ebb_start is 1 because this is the
first block of the function.  This actually causes a lot of variables set in
the first block to be marked invalid if used in a successful combination two or
more times, which then prevents the nonzero bits info from being used for any
of them.

There seems to be a problem with how label_tick is used.  In the very first
block in the body of the function, label_tick is 2 and label_tick_ebb_start is
1.  This is because it is considered to be the second block in the ebb after
the entry block.  In the second block in the body of the function, label_tick
is 3 and label_tick_ebb_start is 3.  This means that every variable set in the
first block gets treated differently than in every block after the first.
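
To make the tick comparison concrete, here is a toy model of just the quoted
condition.  It is a sketch, not combine's real data structures: the function
name marks_invalid and its parameters are made up for illustration, and the
"later block" values used with it are an assumed case, not taken from a dump.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the quoted condition only; not combine's real data
   structures.  Returns true when the register would have
   last_set_invalid set.  */
bool
marks_invalid (bool have_insn, bool have_value,
               int last_set_table_tick, int label_tick_ebb_start)
{
  /* Ticks are non-negative and an ebb starts at tick 1 or later
     (an assumption of this model).  */
  assert (last_set_table_tick >= 0 && label_tick_ebb_start >= 1);
  return !have_insn
         || (have_value && last_set_table_tick >= label_tick_ebb_start);
}
```

With the first-block values quoted above, marks_invalid (true, true, 2, 1) is
true.  With a reference recorded at tick 2 and an ebb that starts at tick 3,
marks_invalid (true, true, 2, 3) is false.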

If I add a little bit of code before the loop to force it to be the second
block, then I get correct output from combine.  I just added this before the
loop
  static int j = 0;
  if (val)
    j++;

This also explains why the problem only occurs with -mtune=sifive-7-series
because this enables the conditional move support that turns the loop into a
single block, and then the -funroll-loops option fully unrolls the loop,
turning the entire function into one block, which prevents combine from
handling many of the register sets correctly because everything is in the first
block now.

This also explains why the problem started when the 2->2 combination support
was added, as that causes more successful combinations, and hence more
registers getting invalidated in the first block.

So the question is why we need label_tick > label_tick_ebb_start for the first
block of the function.  There is nothing set in the entry block other than hard
registers, and those could always be handled specially by just marking them as
invalid somehow before processing instructions.

Or alternatively, in record_value_for_reg, maybe we can add a check for a
pseudo reg only set once and not live in the prologue, and avoid marking it as
invalid when we process it a second time.  There are already a lot of checks
like this scattered around the code.

[Bug rtl-optimization/92656] The zero_extend insn can't be eliminated in the combine pass

2020-02-27 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92656

--- Comment #5 from Jim Wilson  ---
A rewrite using dataflow would be better of course.  I'm just trying to
understand the problem with this testcase better, and maybe find a simple
solution, but I don't think that there is one.  The workarounds I see just make
the code more complicated and add more risk of something else going wrong.

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2020-03-02 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

--- Comment #32 from Jim Wilson  ---
The proposed patch looks OK to me.  I suggest you submit it to gcc-patches.

[Bug lto/88422] collect2.exe: fatal error: lto-wrapper returned 1 exit status: file not recognized: file truncated

2018-12-09 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88422

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #2 from Jim Wilson  ---
I've reproduced this problem with a RISC-V gcc-8.x compiler, and tracked it
down to the first patch for bug 81968, in comment #60.  With the patch
reverted, the testcase works.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81968#c60

The RISC-V testcase works with Linux hosted and Cygwin hosted toolchains, and
only fails for mingw32 hosted toolchains.  Maybe an LLP64 problem with the
patch?  I didn't see any obvious type error in the patch though.

I had to borrow a windows machine from our IT group to look at this, and they
have since taken the loaner back, so I don't have a machine at work I can use
to debug this at present.

[Bug lto/88422] collect2.exe: fatal error: lto-wrapper returned 1 exit status: file not recognized: file truncated

2018-12-10 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88422

--- Comment #4 from Jim Wilson  ---
I used a cross compiler, so ulong_type is easy enough to check.  For
simple-object-elf.i I see
__extension__ typedef unsigned long long uint64_t;
...
__extension__ typedef uint64_t ulong_type;
which looks right.

[Bug target/84797] RISC-V: add --with-multilib-list support

2018-04-10 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84797

--- Comment #2 from Jim Wilson  ---
Created attachment 43904
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43904&action=edit
version 2 patch

Missing documentation, doesn't handle architecture aliases, and only lets you
specify one ABI, but otherwise seems to be working right.  Configuring with
--enable-multilib --with-multilib-list=lp64d --with-abi=lp64d
--with-arch=rv64gc
I get a compiler that does this

gamma05:2013$ ./xgcc -B./ --print-multi-lib
.;
gamma05:2014$ ./xgcc -B./ --print-multi-dir
.
gamma05:2015$ ./xgcc -B./ --print-multi-os-directory
../lib64/lp64d
gamma05:2016$ 

So one multilib was built, and it was installed into lib64/lp64d where we want
it.  The --print-multi-dir value shouldn't matter.

Most ports only select multilibs based on one option.  The RISC-V port uses
two options to select multilibs, which makes specifying this stuff a lot more
complicated.  This is why I'm only allowing one ABI choice at the moment.

I see I forgot to clean up the t-linux-withmultilib file.  I will do that in the
next version of the patch.

[Bug bootstrap/84856] Bootstrap failure on riscv: comparison of integer expressions of different signedness

2018-04-16 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84856

--- Comment #8 from Jim Wilson  ---
I copied the design of the patch from the i386 backend, so in theory it should
work.  The layout of the stack is completely at the control of the target
backend, so uses of STACK_BOUNDARY outside the backend should not be a problem.

I did some sanity checking when I made the change, but now that you point the
problem out I see that I missed two cases.  outgoing_args_size and
pretend_args_size are no longer rounded to the PREFERRED_STACK_BOUNDARY size,
they are rounded to the smaller STACK_BOUNDARY size instead.  We can fix this
in riscv_compute_frame_info by adding RISCV_STACK_ALIGN macro calls around the
uses of these two values.  This is a simple fix.  I'm testing a patch now.
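
As an illustration of the rounding involved, here is a minimal sketch of
rounding a size up to the preferred boundary.  The names stack_align_up and
PREFERRED_ALIGN_BYTES are hypothetical stand-ins, not the backend's macros, and
the 16-byte boundary is an assumption for the example.

```c
#include <assert.h>

/* Hypothetical stand-in; the real macros live in the RISC-V backend.  */
#define PREFERRED_ALIGN_BYTES 16   /* assumed preferred stack boundary */

/* Round SIZE up to the next multiple of PREFERRED_ALIGN_BYTES.  */
long
stack_align_up (long size)
{
  assert (size >= 0);
  return (size + (PREFERRED_ALIGN_BYTES - 1)) & -(long) PREFERRED_ALIGN_BYTES;
}
```

Under these assumptions an 8-byte outgoing args area rounds up to 16 bytes, and
a 20-byte one rounds up to 32, so values that were only aligned to a smaller
boundary get padded back out to the preferred one.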

[Bug bootstrap/84856] Bootstrap failure on riscv: comparison of integer expressions of different signedness

2018-04-17 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84856

--- Comment #10 from Jim Wilson  ---
Author: wilson
Date: Tue Apr 17 21:41:07 2018
New Revision: 259449

URL: https://gcc.gnu.org/viewcvs?rev=259449&root=gcc&view=rev
Log:
RISC-V: Fix 32-bit stack pointer alignment problem.

gcc/
PR 84856
* config/riscv/riscv.c (riscv_compute_frame_info): Add calls to
RISCV_STACK_ALIGN when using outgoing_args_size and pretend_args_size.
Set arg_pointer_offset after using pretend_args_size.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/riscv/riscv.c

[Bug inline-asm/85185] Wider-than-expected load for struct member used as operand of inline-asm with memory clobber at -Og

2018-04-23 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85185

Jim Wilson  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-04-23
 CC||wilson at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #7 from Jim Wilson  ---
The problem is exposed in combine, where we take two instructions

(insn 9 8 10 2 (set (reg:DI 72 [ _2 ])
(sign_extend:DI (subreg:HI (reg:SI 75 [ SD.1554.PD.1553 ]) 0)))
"tmp.c":12 92 {extendhidi2}
 (expr_list:REG_DEAD (reg:SI 75 [ SD.1554.PD.1553 ])
(nil)))
(insn 10 9 0 2 (parallel [
(asm_operands/v ("magic %0") ("") 0 [
(subreg/s/u:HI (reg:DI 72 [ _2 ]) 0)
]
 [
(asm_input:HI ("r") tmp.c:12)
]
 [] tmp.c:12)
(clobber (mem:BLK (scratch) [0  A8]))
]) "tmp.c":12 -1
 (expr_list:REG_DEAD (reg:DI 72 [ _2 ])
(nil)))

and then produce

(insn 10 9 0 2 (parallel [
(asm_operands/v ("magic %0") ("") 0 [
(subreg:HI (reg:SI 75 [ SD.1554.PD.1553 ]) 0)
]
 [
(asm_input:HI ("r") tmp.c:12)
]
 [] tmp.c:12)
(clobber (mem:BLK (scratch) [0  A8]))
]) "tmp.c":12 -1
 (expr_list:REG_DEAD (reg:SI 75 [ SD.1554.PD.1553 ])
(nil)))

We have now lost the truncation and sign-extension.  The value passed to the
asm has the correct value for the low 16 bits, but has garbage in the high 16 bits.
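
A small worked example of what the lost truncate/sign-extend pair would have
done.  This is only a sketch in portable C: sign_extend_low16 is a made-up
helper, and the register contents in the note after it are arbitrary
illustrative values.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of REG, discarding whatever is in the
   upper bits.  Written with explicit arithmetic to avoid relying on
   implementation-defined narrowing conversions.  */
int64_t
sign_extend_low16 (uint32_t reg)
{
  uint32_t low = reg & 0xffffu;
  int64_t value = (low & 0x8000u) ? (int64_t) low - 0x10000 : (int64_t) low;
  /* The result always fits in a signed 16-bit range.  */
  assert (value >= INT16_MIN && value <= INT16_MAX);
  return value;
}
```

For a register holding 0xABCD8001, the sign-extended halfword is -32767, while
the raw register value that the asm now receives still carries 0xABCD in its
upper half.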

However, what combine did does not appear wrong by itself.  One could argue
that the problem started with the asm, which is taking a HImode argument, even
though this makes little sense on RISC-V, since the only instructions operating
on HImode are the 16-bit load and store instructions.  Maybe the asm should use
the sign-extended DImode value directly and assume a DImode input instead of a
HImode input?  That would prevent the truncate and sign-extend from being
optimized away, but might be wrong if someone extends the RISC-V ISA to include
instructions that operate directly on HImode values.

I can work around the problem by explicitly casting the asm input to int.
  asm("magic %0" :: "r" ((int)sub.a) : "memory");
and now the asm takes SImode input, and the truncate/sign extend can't be
optimized away.  Asking the user to change their code doesn't seem like the
right solution though.

[Bug target/85492] riscv64: endless when throwing an exception from a constructor

2018-04-24 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85492

--- Comment #1 from Jim Wilson  ---
The testcase fails with default dynamic linking.  It works with static linking.

It also works if runtime_error is removed and we have just a plain throw.

Using github riscv/riscv-gnu-toolchain project, which has older versions of
binutils, gcc, and glibc, it works both static and dynamic.  If I update
binutils and/or gcc to FSF mainline, it still works.  If I update glibc to FSF
glibc-2.27, it fails dynamic but works static.  So apparently the problem was
triggered by a glibc change when it was upstreamed.

I tried adding aborts to libgcc and libstdc++ unwind/exception routines.  They
aren't hit.  qemu traces suggest it is looping inside the dynamic linker. 
LD_DEBUG=all isn't helpful.  It prints a lot of messages for binding symbols,
and then no messages when it gets stuck looping (assuming it is looping inside
ld.so).

Unfortunately, we don't have gdb support yet.  I can't use gdb sim to generate
a trace as gdb sim doesn't support dynamic linked binaries.  It isn't clear how
to debug this.  Maybe I can find a clue in the gcc testsuite.  I haven't tried
running that natively yet.  It will likely take a while to run though and may
not trigger the same failure.

[Bug target/85492] riscv64: endless when throwing an exception from a constructor

2018-04-24 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85492

Jim Wilson  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-04-24
 Ever confirmed|0   |1

[Bug target/85492] riscv64: endless loop when throwing an exception from a constructor

2018-04-27 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85492

--- Comment #3 from Jim Wilson  ---
I figured out that I wasn't fully rebuilding and relinking all libraries while
trying to debug this with printf, and that sent me down the wrong path.

Trying this again, correctly, I see that we have a loop in unwind, because the
return address for _start is pointing at _start.  This works by accident when
static linking, because crt1.o is included before crtbegin.o, crtbegin.o
registers FDEs starting from a label it adds to the eh_frame section, and hence
the FDE for _start in crt1.o gets lost.  When unwinding, we see that there is
no FDE for _start, and it isn't an exception frame, so that terminates
unwinding.  When dynamic linking, we use PT_GNU_EH_FRAME which uses eh_frame
section addresses and hence finds every FDE, including the one for _start, so
we try to unwind through _start, get a return address pointing at _start, and
go into an infinite loop.

This requires a glibc patch to fix.  Just setting the return address in _start
to 0 works.
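
The three situations described above can be sketched with a toy unwinder.
This is not libgcc's unwinder: struct toy_fde, toy_unwind, and all addresses
are invented for illustration, each function is reduced to a single address,
and a step limit stands in for "endless loop".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of unwinding, not libgcc's unwinder.  Each entry stands in
   for an FDE: a function's address and the return address the unwinder
   would recover for it.  All names and addresses are made up.  */
struct toy_fde { uintptr_t func; uintptr_t return_addr; };

static const struct toy_fde *
find_fde (const struct toy_fde *table, size_t n, uintptr_t pc)
{
  for (size_t i = 0; i < n; i++)
    if (table[i].func == pc)
      return &table[i];
  return NULL;
}

/* Walk outward from PC.  Returns the number of frames visited, or -1 if
   MAX_STEPS is exceeded, which this model treats as an endless loop.  */
int
toy_unwind (const struct toy_fde *table, size_t n, uintptr_t pc, int max_steps)
{
  assert (max_steps > 0);
  for (int steps = 0; steps < max_steps; steps++)
    {
      const struct toy_fde *fde = find_fde (table, n, pc);
      /* Stop when there is no FDE, or when the return address is 0.  */
      if (fde == NULL || fde->return_addr == 0)
        return steps + (fde != NULL);
      pc = fde->return_addr;
    }
  return -1;
}
```

In this model, a table where the _start entry returns to itself never
terminates, a table with no _start entry at all (the static link case) stops
when the lookup fails, and a table where the _start entry has a return address
of 0 stops cleanly after visiting _start.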

[Bug target/85492] riscv64: endless loop when throwing an exception from a constructor

2018-04-27 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85492

--- Comment #4 from Jim Wilson  ---
Created attachment 44032
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44032&action=edit
proposed glibc patch to fix the problem

[Bug target/85492] riscv64: endless loop when throwing an exception from a constructor

2018-04-27 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85492

--- Comment #6 from Jim Wilson  ---
I suggest you handle the glibc patch.

Note that you can probably also fix this by adding unwind directives to _start
to say that the return address is in x0.  This would avoid the minor code size
increase, but takes a little more effort to figure out how to add the right
unwind directives to assembly code to make this work.  I haven't tried that.

[Bug target/85492] riscv64: endless loop when throwing an exception from a constructor

2018-04-28 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85492

--- Comment #8 from Jim Wilson  ---
(In reply to Aurelien Jarno from comment #7)
> Should I just close this bug and open a new one on the glibc side?

That is fine if you want to do that.

> +   /* Mark ra as undefined in order to stop unwinding here!  */
> +   cfi_undefined (ra)

I tried this, and it worked for me.

[Bug target/85596] New: aarch64 --with-multilib-list documentation missing

2018-05-01 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85596

Bug ID: 85596
   Summary: aarch64 --with-multilib-list documentation missing
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wilson at gcc dot gnu.org
  Target Milestone: ---

config.gcc has aarch64 support for the --with-multilib-list option, but it
isn't documented in the doc/install.texi file.

[Bug target/85142] Wrong -print-multi-os-directory & -print-multi-lib output for riscv64 + multilib

2018-05-08 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85142

Jim Wilson  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #12 from Jim Wilson  ---
Will be fixed by the patch for 84797.

*** This bug has been marked as a duplicate of bug 84797 ***

[Bug target/84797] RISC-V: add --with-multilib-list support

2018-05-08 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84797

--- Comment #3 from Jim Wilson  ---
*** Bug 85142 has been marked as a duplicate of this bug. ***

[Bug target/84797] RISC-V: add --with-multilib-list support

2018-05-08 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84797

Jim Wilson  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2018-05-08
   Assignee|unassigned at gcc dot gnu.org  |wilson at gcc dot gnu.org
 Ever confirmed|0   |1

[Bug target/84797] RISC-V: add --with-multilib-list support

2018-05-09 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84797

--- Comment #4 from Jim Wilson  ---
Author: wilson
Date: Wed May  9 21:17:14 2018
New Revision: 260096

URL: https://gcc.gnu.org/viewcvs?rev=260096&root=gcc&view=rev
Log:
RISC-V: Add with-multilib-list support.

gcc/
PR target/84797
* config.gcc (riscv*-*-*): Handle --with-multilib-list.
* config/riscv/t-withmultilib: New.
* config/riscv/withmultilib.h: New.
* doc/install.texi: Document RISC-V --with-multilib-list support.

Added:
trunk/gcc/config/riscv/t-withmultilib
trunk/gcc/config/riscv/withmultilib.h
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config.gcc
trunk/gcc/doc/install.texi

[Bug target/84797] RISC-V: add --with-multilib-list support

2018-05-09 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84797

Jim Wilson  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Jim Wilson  ---
Fixed on mainline.

[Bug target/86005] [RISCV] Invalid intermixing of __atomic_* libcalls and inline atomic instruction sequences

2018-05-31 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86005

Jim Wilson  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-05-31
 Ever confirmed|0   |1

[Bug other/86039] Compiler placed in deep/long folder cannot open/run needed files on Windows

2018-06-04 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86039

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #1 from Jim Wilson  ---
Windows has a 260 character default maximum path length.  See for instance
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx#maxpath

This looks like an OS problem not a gcc problem.

[Bug target/86005] [RISCV] Invalid intermixing of __atomic_* libcalls and inline atomic instruction sequences

2018-06-04 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86005

--- Comment #8 from Jim Wilson  ---
This looks like a generic GCC problem, not a RISC-V specific problem.

For instance, if I build an armv6t2 compiler I get
bl  __atomic_fetch_add_4

[Bug target/86005] [RISCV] Invalid intermixing of __atomic_* libcalls and inline atomic instruction sequences

2018-06-04 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86005

--- Comment #9 from Jim Wilson  ---
Oops, hitting tab doesn't work as expected.  Trying again...

This looks like a generic GCC problem, not a RISC-V specific problem.  Or
perhaps, not a gcc bug at all.

For instance, if I build an armv6t2 compiler I get
bl  __atomic_fetch_add_4
...
mcr p15, 0, r0, c7, c10, 5
ldr r3, [r3]
mcr p15, 0, r0, c7, c10, 5
where the mcr is equivalent to the RISC-V fence.  It looks like MIPS16 and a
number of other targets have the same problem.

GCC has no support for calling __atomic_load_4 for this testcase.  GCC assumes
that loads smaller or equal to the word size are always atomic, and will not
call a library routine for them.  It will emit memory barriers.
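
For reference, the kind of source that leads to this mix looks roughly like
the following.  This is an assumed shape written with C11 atomics, not the
testcase from the bug report; shared_counter and bump_then_load are made-up
names.

```c
#include <assert.h>
#include <stdatomic.h>

atomic_int shared_counter;

/* A read-modify-write followed by a word-sized atomic load, both with
   the default sequentially consistent ordering.  */
int
bump_then_load (void)
{
  int before = atomic_fetch_add (&shared_counter, 1);
  int after = atomic_load (&shared_counter);
  /* Holds as long as no other thread touches the counter.  */
  assert (after == before + 1);
  return after;
}
```

On a target without atomic read-modify-write instructions, the fetch-add is
the part that can turn into a call such as __atomic_fetch_add_4, while the
word-sized load is expanded inline with barriers around it.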

If what gcc is doing is wrong, then it is broken for all targets that don't
inline expand every atomic function call, and/or don't have atomic
instructions.

I can fix the rv32ia support by inline expanding every atomic function call.
I can't fix the rv32i support without target independent optimizer changes.

[Bug libffi/84410] libffi doesn't support riscv now, but not disabled in configure.ac

2018-06-08 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84410

Jim Wilson  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Jim Wilson  ---
libffi builds now, and so does libgo compiler.  So fixed for GCC 9.

[Bug tree-optimization/91191] New: vrp and boolean arguments

2019-07-17 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91191

Bug ID: 91191
   Summary: vrp and boolean arguments
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wilson at gcc dot gnu.org
  Target Milestone: ---

It appears that vrp isn't propagating the ranges of incoming boolean arguments.
 Given this example:

unsigned char reg(_Bool b) {
  union U {
    unsigned char f0;
    _Bool f1;
  };
  union U u;
  u.f1 = b;
  if (u.f0 > 1) {
    // This cannot happen
    // if b is only allowed
    // to be 0 or 1:
    return 42;
  }
  return 13;
}

clang optimizes this to unconditionally return 13, but gcc does a compare and
conditionally returns either 42 or 13 depending on the result of the compare.
This happens with both x86_64 and RISC-V.

Looking at the vrp dumps, I see
b_3(D): VARYING

[Bug target/91229] New: RISC-V ABI problem with zero-length bit-fields and float struct fields

2019-07-22 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91229

Bug ID: 91229
   Summary: RISC-V ABI problem with zero-length bit-fields and
float struct fields
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wilson at gcc dot gnu.org
  Target Milestone: ---

This was noticed by clang development, comparing clang against gcc to verify
ABI compliance.

There is a problem with the GCC implementation of the ABI, where we are
accidentally emitting different code for the same struct when compiled by the C
and C++ compilers.

There are two cases affected by this.  Here is the first example:
struct s1 { int : 0; float f; int i; int : 0; };

void dummy(float, int);

void f(struct s1 s) {
  dummy(s.f + 1.0, s.i + 1);
}
where we have a struct that can be passed in one FP reg and one integer reg, and
here is the second example:
struct s1 { int : 0; float f; float g; int : 0; };

void dummy(float, float);

void f(struct s1 s) {
  dummy(s.f + 1.0, s.g + 2.0);
}
where we have a struct that can be passed in two FP regs.  In both cases, the
C++ compiler passes the float struct fields in FP registers, and the C compiler
passes them in integer registers.

The general case here is any struct with one or more zero-length bitfields,
exactly two non-zero length fields, one of which must have an FP type that can
fit in an FP register, and the other can be an FP type that fits in an FP
register or an integer type that fits in an integer register or an integer
bit-field that is the exact same size as an integer type that can fit in an
integer register.  Also, the target must have FP register support.

The fundamental problem is that the RISC-V backend is not checking for
zero-length bit-fields when deciding if a struct field can be passed in a FP
register or not.  Meanwhile, the C++ front end is stripping zero-length
bit-fields after struct layout.  So when compiling as C++ we decide that the FP
struct fields can be passed in FP regs.  But when compiling as C we decide that
there are too many struct fields and they all get passed in integer registers.
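
The difference between the two front ends can be sketched with a toy field
counter.  This is a simplified model, not the backend code: the toy_* names
and the field encoding are invented, and the eligibility test only checks the
field count and the presence of a float, leaving out the size checks described
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy field description; names are illustrative, not GCC's.  */
enum toy_kind { TOY_INT, TOY_FLOAT, TOY_BITFIELD };
struct toy_field { enum toy_kind kind; unsigned bits; };

/* Count the fields the flattening step would see, optionally skipping
   zero-width bit-fields.  */
size_t
toy_flatten_count (const struct toy_field *f, size_t n,
                   bool ignore_zero_width_bit_field_p)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (ignore_zero_width_bit_field_p
          && f[i].kind == TOY_BITFIELD && f[i].bits == 0)
        continue;
      count++;
    }
  return count;
}

/* True when one or two fields remain and at least one is a float, a
   simplification of the rule described above.  */
bool
toy_fp_candidate_p (const struct toy_field *f, size_t n, bool ignore)
{
  assert (f != NULL);
  bool has_float = false;
  for (size_t i = 0; i < n; i++)
    has_float |= (f[i].kind == TOY_FLOAT);
  size_t count = toy_flatten_count (f, n, ignore);
  return has_float && count >= 1 && count <= 2;
}
```

For the first struct s1 above, counting without skipping gives four fields, so
the struct is rejected (the C behavior), while skipping the zero-width
bit-fields gives two fields and the struct qualifies (the C++ behavior).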

Since having the C and C++ front ends using different ABIs is undesirable, we
need an ABI change.  Fixing the C++ case would require inconvenient changes to
the C++ front end.  So fixing the C case with a RISC-V backend patch looks like
the best practical solution.

The affected structures are a bit obscure and not very useful, so it is hoped
no real code will be affected.  I've done an open-embedded world build with an
instrumented compiler, and I didn't see any case that triggered my code.  Not
everything built though, since some stuff still doesn't have RISC-V support
yet.  I did have over 30,000 tasks run, so quite a bit of stuff did build.

[Bug target/91229] RISC-V ABI problem with zero-length bit-fields and float struct fields

2019-07-22 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91229

Jim Wilson  changed:

   What|Removed |Added

 Target||riscv*-*-*
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-07-22
   Assignee|unassigned at gcc dot gnu.org  |wilson at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Jim Wilson  ---
There is a psABI discussion about the problem at
https://github.com/riscv/riscv-elf-psabi-doc/issues/99

[Bug target/91229] RISC-V ABI problem with zero-length bit-fields and float struct fields

2019-07-22 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91229

--- Comment #2 from Jim Wilson  ---
Created attachment 46617
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46617&action=edit
proposed patch to change ABI and warn for affected structs

[Bug target/91229] RISC-V ABI problem with zero-length bit-fields and float struct fields

2019-08-08 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91229

--- Comment #3 from Jim Wilson  ---
Author: wilson
Date: Thu Aug  8 19:04:56 2019
New Revision: 274215

URL: https://gcc.gnu.org/viewcvs?rev=274215&root=gcc&view=rev
Log:
RISC-V: Fix C ABI for flattened struct with 0-length bitfield.

gcc/
PR target/91229
* config/riscv/riscv.c (riscv_flatten_aggregate_field): New arg
ignore_zero_width_bit_field_p.  Skip zero size bitfields when true.
Pass into recursive call.
(riscv_flatten_aggregate_argument): New arg.  Pass to
riscv_flatten_aggregate_field.
(riscv_pass_aggregate_in_fpr_pair_p): New local warned.  Call
riscv_flatten_aggregate_argument twice, with false and true as last
arg.  Process result twice.  Compare results and warn if different.
(riscv_pass_aggregate_in_fpr_and_gpr_p): Likewise.

gcc/testsuite/
* gcc.target/riscv/flattened-struct-abi-1.c: New test.
* gcc.target/riscv/flattened-struct-abi-2.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/riscv/flattened-struct-abi-1.c
trunk/gcc/testsuite/gcc.target/riscv/flattened-struct-abi-2.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/riscv/riscv.c
trunk/gcc/testsuite/ChangeLog

[Bug target/91229] RISC-V ABI problem with zero-length bit-fields and float struct fields

2019-08-08 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91229

Jim Wilson  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Jim Wilson  ---
Fixed on mainline.

[Bug target/91420] relocation truncated to fit: R_RISCV_HI20 against `.LC0' with GCC 8.2/8.3 with "-O2" on RISC-V

2019-08-12 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91420

Jim Wilson  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-08-12
 CC||wilson at gcc dot gnu.org
 Ever confirmed|0   |1

[Bug target/91602] GCC fails to build for riscv in a combined tree due to misconfigured leb128 support

2019-08-30 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91602

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #3 from Jim Wilson  ---
Combined tree builds are obsolete and shouldn't be used anymore.  Since this
only shows up in a combined tree build, I don't consider it important.  If you
build the toolchain the correct way, building binutils and gcc separately, the
build does work.  My preferred solution would be to kill combined tree build
support.
