[Bug target/105733] riscv: Poor codegen for large stack frames

2022-06-06 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #2 from Jim Wilson  ---
This is a known problem that I have commented on before.  For instance here
https://github.com/riscv-collab/riscv-gcc/issues/193

Copying my reply to here since this is a better place for the bug report:

This is a known problem. GCC first emits code using a frame pointer, and then
tries to eliminate the frame pointer during register allocation. When the frame
size is larger than the immediate field of an add, we need temporaries to
compute frame offsets. Compiler optimizations like common subexpression
elimination (cse) try to optimize the calculation of the temporaries. Then when
we try to eliminate the frame pointer, we run into trouble because the cse opt
changes interfere with the frame pointer elimination, and we end up with an
ugly inefficient mess.

This problem isn't RISC-V specific, but since RISC-V has 12-bit immediates and
most everyone else has 16-bit immediates, we hit the problem sooner, and thus
makes it much more visible for RISC-V. The example above is putting 12KB on the
stack, which is larger than 12-bits of range (4KB), but well within 16-bits of
range (64KB).

I have a prototype of a patch to fix it by allowing >12-bit offsets for frame
pointer references, and then fixing this when eliminating the frame pointer,
but this can cause other problems, and needs more work to be usable. I have no
idea when someone might try to finish the patch.

End of inclusion from github.

To improve my earlier answer, we have signed 12-bit immediates, so the trouble
actually starts at 2048 bytes, and Jessica's example is larger than that. 
Reduce BUFSIZ to 2044 and you get a reasonable result.

foo:
li  a5,4096
addia5,a5,-2048
addisp,sp,-2048
add a5,a5,a0
add a0,a5,sp
li  t0,4096
sb  zero,-2048(a0)
addit0,t0,-2048
add sp,sp,t0
jr  ra

There is still a bit of oddness here.  We are adding 2048 to a0 and then using
an address -2048(a0).  I think that is more cse trouble.  2048 requires two
instructions to load into a register which is likely confusing something
somewhere.

[Bug target/103889] gccgo is unable to find its standard library by default on 64-Bit RISC-V due to musl not using multilib but still uses t-linux

2022-01-03 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103889

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #14 from Jim Wilson  ---
The RISC-V ABI defines 6 different ABIs currently (not including rv32e
support), so the old /lib and /lib64 distinction isn't enough.  We define 6
different directories where the libraries can be, depending on the ABI in use. 
For rv64/lp64d, the most common one, they go in /lib64/lp64d.  Desktop linux
does not use all of these ABIs, but embedded linux does.

Alibaba/T-Head incidentally has written linux kernel and qemu support for
running 32-bit code on a rv64 kernel, but I don't know if they have hardware
that supports this yet, and don't know if they are multilibbing/multiarching
yet.  I think it is only a matter of time before we have to support both 32-bit
and 64-bit code on rv64 systems.

Anyways, the first thing I would try is to make some links.  You can make
/lib64 a link to /lib, and make a /lib/lp64d link that points back at lib. 
Hence /lib64/lp64d will point at /lib.  Likewise for /usr/lib64/lp64d to point
at /usr/lib.  Then see if gccgo works.  Some linux distros already have these
links.  Adding these links to the OS might be the simplest solution if it
works.

If you don't link the symlink idea, then we probably need a new t-linux-musl
file.  And if you are using GNU Binutils, then you need GNU ld changes, since
it is hardwired to use the same dirs.  And if you are using glibc then you need
glibc changes too, but you mentioned musl so this doesn't apply to you, but
would apply to others if they want the same kind of changes for another linux
distro.

On a Fedora RISC-V system
[wilson@fedora-riscv /]$ ls -lt lib64
lrwxrwxrwx. 1 root root 9 Aug 12  2020 lib64 -> usr/lib64
[wilson@fedora-riscv /]$ cd /usr/lib64
[wilson@fedora-riscv lib64]$ ls -lt lp64d
lrwxrwxrwx. 1 root root 1 Aug  4 07:18 lp64d -> .
[wilson@fedora-riscv lib64]$ 
So both /lib64/lp64d and /usr/lib64/lp64d work, and point to the same place.

[Bug target/103302] wrong code with -fharden-compares

2021-12-08 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103302

--- Comment #11 from Jim Wilson  ---
FYI I have a patch to re-add the movti pattern to riscv.md which should also
fix this and another bug.  Kito removed the pattern in 2016 and I was hoping to
get an answer from him about why he removed it.  The RISC-V Summit is this week
so not much development work happening this week from RISC-V developers.  And
in general I've been in transition away from SiFive and not getting much done.

[Bug target/103271] ICE in assign_stack_temp_for_type with -ftrivial-auto-var-init=pattern and VLAs and -mno-strict-align on riscv64

2021-12-03 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103271

Jim Wilson  changed:

   What|Removed |Added

 CC||kito.cheng at gmail dot com

--- Comment #18 from Jim Wilson  ---
I found the place where the movti patterns were removed.  This is a riscv-gcc
commit from Kito.

commit 38650bdbe91bf37c3cce706ce612097b657a91d5
Author: Kito Cheng 
Date:   Mon Sep 12 14:52:56 2016 +0800

Remove 128 bit support, just let gcc handle it

I don't see a github pull request or issue for it, so I don't know why Kito
removed the support.  This is a little too long ago to easily find info related
to the change.  This is a gcc-6.1 source base.  Unless perhaps Kito remembers
why he made the change, all I can do is readd the support and wait to see if
something else breaks.  I do think that we should readd the movti support to
fix a couple of bugs.  I suspect a code size or performance issue, but wrong
code and ICEs are worse problems than code size and performance.

[Bug target/103271] ICE in assign_stack_temp_for_type with -ftrivial-auto-var-init=pattern and VLAs and -mno-strict-align on riscv64

2021-12-02 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103271

--- Comment #16 from Jim Wilson  ---
I have a patch to add movti to the riscv port.  I agree that we should be
adding this.  I just unfortunately had a kitchen accident and had take some
time off before I finished it.  I noticed that a comment before
riscv_split_doubleword_move implies that we did used to have a movti pattern. 
I wanted to check out the history before re-adding movti to make sure I didn't
break anything.

[Bug target/103271] ICE in assign_stack_temp_for_type with -ftrivial-auto-var-init=pattern and VLAs and -mno-strict-align on riscv64

2021-11-25 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103271

--- Comment #6 from Jim Wilson  ---
See also bug 103302 which can also be fixed by adding a movti pattern.

[Bug target/103302] wrong code with -fharden-compares

2021-11-25 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103302

--- Comment #4 from Jim Wilson  ---
See also bug 103271 which can also be fixed by adding a movti pattern.

[Bug target/103271] ICE in assign_stack_temp_for_type with -ftrivial-auto-var-init=pattern and VLAs and -mno-strict-align on riscv64

2021-11-25 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103271

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #5 from Jim Wilson  ---
SiFive doesn't support -mno-strict-align so I've never tested it.  I doubt that
it works correctly, i.e. I doubt that it optimizes as intended.  I've mentioned
this to other RVI members, but there hasn't been anyone other than SiFive
actively working on upstream gcc so I don't think anyone ever looked at it.  It
shouldn't give an ICE though.

Looking at this, it appears to be another "if only we had a movti pattern"
issue.

In expand_DEFERRED_INIT in internal-fn.c, in the reg_lhs == TRUE case, there is
a test
  && have_insn_for (SET, var_mode))
which fails because var_mode is TImode and we don't have a movti pattern.  The
code calls build_zero_cst which returns a constructor with an array type.  We
then call expand_assignment which gets confused as it doesn't know the size of
the array it is copying.

However, if we had a movti pattern, then the code computes the size of the
array, and creates a VIEW_CONVERT_EXPR to document the array size before
calling expand_assignment.  So it looks like it would work if we had a movti
pattern.

I verified that adding a dummy movti pattern makes the ICE go away.

[Bug target/103302] wrong code with -fharden-compares

2021-11-25 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103302

--- Comment #3 from Jim Wilson  ---
Maybe the register allocator should remove clobbers of pseudos, instead of
turning them into clobbers of hard register pairs.  That would eliminate the
ambiguity after register allocation.  It is also true that we don't needs hard
reg clobbers.  The clobbers are only there for tracking pseudo reg subregs.

[Bug target/103302] wrong code with -fharden-compares

2021-11-25 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103302

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #2 from Jim Wilson  ---
It is the second reversed comparison that is wrong.  This is the
  u32_0 <= (...)
on the first line of foo0.  In the assembly file, this ends up as
mv  a0,a6
mv  a1,a7
xor a6,a0,a6
xor a7,a1,a7
or  a6,a6,a7
seqza6,a6
and note that it is comparing a value against itself when it should be
comparing two different values.

The harden compare pass is generating RTL

insn 156 152 155 6 (set (reg:TI 201)
(asm_operands:TI ("") ("=g") 0 [
(reg:TI 77 [ _8 ])
]
 [
(asm_input:TI ("0"))
]
 [])) -1
 (nil))
(insn 155 156 153 6 (clobber (reg:TI 77 [ _8 ])) -1
 (nil))
(insn 153 155 154 6 (set (subreg:DI (reg:TI 77 [ _8 ]) 0)
(subreg:DI (reg:TI 201) 0)) -1
 (nil))
(insn 154 153 160 6 (set (subreg:DI (reg:TI 77 [ _8 ]) 8)
(subreg:DI (reg:TI 201) 8)) -1
 (nil))

Then the asmcons pass is changing this to

(insn 851 152 849 5 (clobber (reg:TI 201)) -1
 (nil))
(insn 849 851 850 5 (set (subreg:DI (reg:TI 201) 0)
(subreg:DI (reg:TI 77 [ _8 ]) 0)) -1
 (nil))
(insn 850 849 156 5 (set (subreg:DI (reg:TI 201) 8)
(subreg:DI (reg:TI 77 [ _8 ]) 8)) -1
 (nil))
(insn 156 850 155 5 (set (reg:TI 201)
(asm_operands:TI ("") ("=g") 0 [
(reg:TI 201)
]
 [
(asm_input:TI ("0"))
]
 [])) -1
 (expr_list:REG_DEAD (reg:TI 77 [ _8 ])
(nil)))
(insn 155 156 153 5 (clobber (reg:TI 77 [ _8 ])) -1
 (nil))
(insn 153 155 154 5 (set (subreg:DI (reg:TI 77 [ _8 ]) 0)
(subreg:DI (reg:TI 201) 0)) 135 {*movdi_64bit}
 (nil))
(insn 154 153 854 5 (set (subreg:DI (reg:TI 77 [ _8 ]) 8)
(subreg:DI (reg:TI 201) 8)) 135 {*movdi_64bit}
 (expr_list:REG_DEAD (reg:TI 201)
(nil)))

Then the register allocator puts both 77 and 201 in the same register, which
means we are now clobbering values we need.

In the reload dump I see

(insn 851 152 849 5 (clobber (reg:TI 16 a6 [201])) -1
 (nil))
(insn 849 851 850 5 (set (reg:DI 16 a6 [201])
(reg:DI 10 a0 [orig:77 _8 ] [77])) 135 {*movdi_64bit}
 (nil))
(insn 850 849 907 5 (set (reg:DI 17 a7 [+8 ])
(reg:DI 11 a1 [ _8+8 ])) 135 {*movdi_64bit}
 (nil))
(insn 907 850 1014 5 (clobber (reg:TI 16 a6 [201])) -1
 (nil))

so the insns 849 and 850 get optimized away, but we need them.  Also, we have

(insn 854 154 852 5 (clobber (reg:TI 16 a6 [202])) -1
 (nil))
(insn 852 854 853 5 (set (reg:DI 16 a6 [202])
(reg:DI 6 t1 [orig:86 _39 ] [86])) 135 {*movdi_64bit}
 (nil))
(insn 853 852 913 5 (set (reg:DI 17 a7 [+8 ])
(reg:DI 7 t2 [ _39+8 ])) 135 {*movdi_64bit}
 (nil))
(insn 913 853 1010 5 (clobber (reg:TI 16 a6 [202])) -1
 (nil))

and the insns 852 and 853 get optimized away, but we need them.  The comparison
is supposed to be a0/a1 versus t1/t2, but we end up with comparing a6/a7
against itself.

asmcons is calling emit_move_insn to copy the asm source to the asm dest so it
can simplify the asm.  Since this is a multiword mode, and the riscv backend
doesn't have a movti pattern, this ends up calling emit_move_multi_word which
emits the extra clobber that causes the problem.

I suppose we could fix this by adding a movti pattern to the riscv backend to
avoid the clobbers but we shouldn't have to.  Though it would be interesting to
see if this maybe results in better code optimization.

It isn't clear exactly where the problem is.  Maybe asmcons shouldn't try to
fix an asm when the mode is larger than the word mode?  This could be left to
the register allocator to fix.  Or maybe harden compare shouldn't generate RTL
like this?  This could be a harden compare issue, or maybe an issue with the
RTL expander to emit the rtl differently.  Looks like the same issue with the
RTL expander calling emit_move_multi_word which generates the clobber.  Or
maybe a movti pattern is actually required now?

I did verify that disabling asmcons fixes the problem for this testcase.  I had
to hack the code in function.c to do that as there is no option to disable it.

[Bug target/102211] [12 regression] ICE introduced by r12-3277

2021-09-13 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102211

--- Comment #10 from Jim Wilson  ---
I attached a patch which is my proposed solution to the RISC-V backend.  It
adds a new f_register_operand predicate and modifies patterns that use the f
constraint to use it instead of register_operand.

This was tested with an default language gcc build, glibc build, and glibc
check on an unmatched running OpenEmbedded.  And an all language gcc build,
glibc build, and glibc check on an unleashed running Fedora with an old 4.15
kernel.  Both succeeded as well as expected.  I'll be trying gcc check next.

Meanwhile, the validate_subregs patch was reverted, so we don't immediately
need this to fix build errors.  But it still might be useful if
validate_subregs changes again.  Or if it happens to give better code, though I
think it won't do anything if validate_subregs is rejecting the subregs we are
checking for in f_register_operand.

[Bug target/102211] [12 regression] ICE introduced by r12-3277

2021-09-13 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102211

--- Comment #9 from Jim Wilson  ---
Created attachment 51456
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51456=edit
proposed RISC-V backend solution

[Bug target/102250] [11/12 Regression] python is not documented as a Prerequisite for building for riscv

2021-09-09 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102250

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #3 from Jim Wilson  ---
There is no python requirement for building a riscv linux compiler.  It is only
needed if the user specifies the optional --with-arch configure option, and if
the user specifies a non-canonical form of the architecture string in the
--with-arch option string.  Otherwise gcc will build fine without python.

A later patch fixed this to check for python before using it, after Andreas
Schwab complained about this.

[Bug target/102211] [12 regression] ICE introduced by r12-3277

2021-09-08 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102211

--- Comment #5 from Jim Wilson  ---
I have a WIP fix that lets me build newlib and glibc via riscv-gnu-toolchain. 
I haven't tried a bootstrap yet.  I created a new predicate that uses the small
bit of deleted code I need from validate_subregs, and then modified most
patterns with an f constraint to use it.

+;; This predicate should be used for any pattern that has constraints that
+;; accept FP registers.
+(define_predicate "f_register_operand"
+  (match_operand 0 "register_operand")
+{
+  /* We can't allow a subreg that changes size in an FP reg, as that violates
+ the NaN-boxing requirement.  Copied from old validate_subregs code.  */
+  if (GET_CODE (op) == SUBREG)
+{
+  machine_mode imode = GET_MODE (SUBREG_REG (op));
+  machine_mode omode = GET_MODE (op);
+  if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
+   {
+ poly_uint64 isize = GET_MODE_SIZE (imode);
+ poly_uint64 osize = GET_MODE_SIZE (omode);
+
+ if (! (known_eq (isize, osize)
+/* LRA can use subreg to store a floating point value in
+   an integer mode.  Although the floating point and the
+   integer modes need the same number of hard registers,
+   the size of floating point mode can be less than the
+   integer mode.  LRA also uses subregs for a register
+   should be used in different mode in on insn.  */
+|| lra_in_progress))
+   return false;
+   }
+}
+  return true;
+})

I haven't fixed the move patterns yet.  I need a few more new predicates for
that.

The riscv_hard_regno_mode_ok function looks odd as Hongtao Liu mentioned.  The
mov*i patterns have f constraints, but we don't allow FP values in integer
registers here.  I think this is a missed optimization, or maybe good register
allocation creates some no-op moves to hide the problem.  My current patch does
not need a riscv_hard_regno_mode_ok change to work.

The riscv_can_change_mode_class also looks a little odd.  The MIPS one that we
copied doesn't allow mode changes in FP regs, but the alpha one does allow mode
changes in FP regs if the modes have the same size.  I think the alpha one can
work for RISC-V and that this is another missed optimization, though again
maybe good regalloc hides the problem with no-op moves.

But I will worry about the possible missed optimizations after I get bootstrap
working again.

[Bug target/102211] [12 regression] ICE introduced by r12-3277

2021-09-07 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102211

Jim Wilson  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2021-09-08
 Status|UNCONFIRMED |NEW

[Bug target/102211] [12 regression] ICE introduced by r12-3277

2021-09-07 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102211

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #4 from Jim Wilson  ---
Yes, moving SI/DI values to FP regs is OK.  However, RISC-V requires that FP
values in FP registers be stored NaN-boxed.  So an SFmode value in a 64-bit FP
reg has the upper 32-bits of all ones, and the lower 32-bits is the value. 
Thus if accessed as a 64-bit value, you get a NaN.  The hardware may trap if
you access a 32-bit value which is not properly NaN-boxed.  Using qemu to check
this may not be good enough, as last time I looked at qemu it wasn't handling
NaN-boxing correctly, but this was over a year ago, so maybe it has been fixed
since.  I don't know.

Anyways, this code sequence is OK
foo:
fmv.w.x fa0,a0
ret
because we are moving a 32-bit SImode value to an FP reg and then treating it
as SFmode, and the 32-bit move will properly NaN-box the SFmode value.

This code sequence is not OK
foo:
fmv.d.x fa5,a0
fmul.s  fa0,fa0,fa5
because we are moving a 64-bit DImode value to an FP reg and then treating it
as SFmode, which is not OK because the value won't be NaN-boxed and may trap at
run time.

validate_subreg used to prevent the bad subreg from being created.

I would think that TARGET_CAN_CHANGE_MODE_CLASS could help here, but it isn't
being called inside general_operand when called from fwprop1 where the bad
substitution happens.  Because we have a pseudo-register, and it is only called
for hard registers.

I don't see a way to fix this as a backend change with current validate_subreg,
other than by replacing register_operand with riscv_register_operand, and
putting the subreg check I need inside riscv_register_operand.  And likewise
for any other affected predicate, like move_operand.  This will be a big
change, though a lot of it will be mechanical.  As an optimization, we can
continue to use register_operand in any pattern that can't use FP registers.

As a middle end change, I need a new hook in general_operand to reject subregs
that we can't support on RISC-V.

Or maybe re-add the check I need to validate_subreg as a hook, so it can be
conditionally enabled for RISC-V.

We can allow (subreg:SF (reg:DI)) if it gets allocated to an integer register. 
It is only when it is allocated to an FP register that it can't work.  I don't
know offhand if that can be described.  But disallowing the subreg always for
RISC-V is simpler and also works.

[Bug tree-optimization/102139] New: -O3 miscompile due to slp-vectorize on strict align target

2021-08-30 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102139

Bug ID: 102139
   Summary: -O3 miscompile due to slp-vectorize on strict align
target
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wilson at gcc dot gnu.org
  Target Milestone: ---

This was originally reported here.
https://github.com/riscv/riscv-gcc/issues/289

This testcase is miscompiled at -O3 for a riscv64 target, though this is not a
bug in the riscv64 port.  I think it will fail for any strict align target.

typedef unsigned short uint16_t;

void zero_two_uint16(uint16_t* ptr) {
  ptr[0] = 0;
  ptr[1] = 0;
}

void zero(uint16_t* ptr) {
  for (int i = 0; i < 16; ++i) {
zero_two_uint16(ptr);
ptr += 2;
  }
}

The output is
zero:
sd  zero,0(a0)
sd  zero,8(a0)
sd  zero,16(a0)
sd  zero,24(a0)
sd  zero,32(a0)
sd  zero,40(a0)
sd  zero,48(a0)
sd  zero,56(a0)
ret
which fails due to unaligned accesses as a0 only has 2 byte alignment.

A git bisect tracked the problem down to this commit.

commit f5e18dd
Author: Kewen Lin li...@gcc.gnu.org
Date: Tue Nov 3 02:51:47 2020 +

pass: Run cleanup passes before SLP [PR96789]
...

I get correct code if I disable the fre4 pass, which is the fre pass inside
pre_slp_scalar_cleanup which was added by this patch.

The 169t.vectorize pass adds an address alignment check, and then emits a loop
with double-word stores if aligned, and a loop with half-word stores if
unaligned.  172t.cunroll fully unrolls both loops.  The 173t.fre4 pass deletes
a phi node before the half-word stores.  The 172t output has
   [local count: 12627204]:
  # ptr_3 = PHI 
  # ivtmp_15 = PHI <16(2)>
  *ptr_3 = 0;
and the 173t.fre4 output has
   [local count: 12627204]:
  *ptr_4(D) = 0;
In the 175t.slp1 pass, the block of half-word stores gets vectorized which is
wrong.  Then later 207t.dce7 notices duplicate code and deletes the second
block of stores.

Comparing the full slp1 dump with fre4 disabled versus the unmodified slp1
dump, I see that the first significant difference is when computing pointer
alignment.  With fre4 disabled, I get

tmp.c:4:10: note:  recording new base alignment for vectp_ptr.8_125
  alignment:8
  misalignment: 0
  based on: MEM  [(uint16_t
*)vectp_ptr.8_125] = { 0, 0, 0, 0 };
tmp.c:4:10: note:  recording new base alignment for ptr_3
  alignment:2
  misalignment: 0
  based on: *ptr_3 = 0;
tmp.c:4:10: note:   === vect_slp_analyze_instance_alignment ===
tmp.c:4:10: note:   vect_compute_data_ref_alignment:
tmp.c:4:10: note:   can't force alignment of ref: *ptr_3

It then refuses to vectorize.  With the unmodified compiler I get

tmp.c:4:10: note:  recording new base alignment for ptr_4(D)
  alignment:8
  misalignment: 0
  based on: MEM  [(uint16_t *)ptr_4(D)] = {
0, 0, 0, 0 };
tmp.c:4:10: note:   === vect_slp_analyze_instance_alignment ===
tmp.c:4:10: note:   vect_compute_data_ref_alignment:
tmp.c:4:10: missed:   misalign = 0 bytes of ref *ptr_4(D)

and then goes ahead and vectorizes which is wrong.

Maybe fre4 shouldn't optimize away a phi node when the pointers have different
alignment?

I noticed that before slp1 runs, the double-word store block has
  # ALIGN = 8, MISALIGN = 0
but the half-word store block does not.  After slp1 runs, both the double-word
store and the half-word store block have these notes.

[Bug middle-end/101996] libatomic: RISC-V 64: Infinite recursion in __atomic_compare_exchange_1

2021-08-25 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101996

--- Comment #6 from Jim Wilson  ---
Looking at Alpine Linux discussions, I see that it has a
--enable-autolink-libatomic configure option which links in libatomic by
default.  This could break the libatomic autoconf tests that check to see if
libatomic functions exist, by linking in libatomic when we aren't expecting it.
 The libatomic autoconf tests are really checking to see if the calls get
inline expanded.  They don't want the calls to be satisified by a library. 
They want the calls to be undefined functions if they aren't inline expanded. 
These autoconf tests might need to be rewritten if more distros start linking
in libatomic by default.

[Bug middle-end/101996] libatomic: RISC-V 64: Infinite recursion in __atomic_compare_exchange_1

2021-08-23 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101996

--- Comment #5 from Jim Wilson  ---
A riscv-gnu-toolchain build with musl looks OK, so it looks like an Alpine
Linux problem.

[Bug middle-end/101996] libatomic: RISC-V 64: Infinite recursion in __atomic_compare_exchange_1

2021-08-23 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101996

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #4 from Jim Wilson  ---
The only way I can reproduce this is if I hand modify configure output.  In the
riscv64-unknown-linux-gnu/libatomic/ build dir there is a auto-config.h file
that has
#define HAVE_ATOMIC_CAS_1 0
If I modify that to
#define HAVE_ATOMIC_CAS_1 1
then I get the same result where atomic_compare_exchange_1 calls itself.

libat_compare_exchange_1:
addisp,sp,-16
li  a4,0
li  a3,5
sd  ra,8(sp)
call__atomic_compare_exchange_1
ld  ra,8(sp)
addisp,sp,16
jr  ra
.set__atomic_compare_exchange_1,libat_compare_exchange_1

The question would then be why you get this configure result.  In the
config.log file in the same dir I see

configure:13591: checking for __atomic_compare_exchange for size 1
configure:13611:
/scratch/jimw/riscv-gnu-toolchain/X-tmp/build-gcc-linux-stage2/./gcc/xgcc
-B/scratch/jimw/riscv-gnu-toolchain/X-tmp/build-gcc-linux-stage2/./gcc/
-B/scratch/jimw/riscv-gnu-toolchain/X-tmp/build-install/riscv64-unknown-linux-gnu/bin/
-B/scratch/jimw/riscv-gnu-toolchain/X-tmp/build-install/riscv64-unknown-linux-gnu/lib/
-isystem
/scratch/jimw/riscv-gnu-toolchain/X-tmp/build-install/riscv64-unknown-linux-gnu/include
-isystem
/scratch/jimw/riscv-gnu-toolchain/X-tmp/build-install/riscv64-unknown-linux-gnu/sys-include
   -o conftest -O2   -mcmodel=medlow -fno-sync-libcallsconftest.c  >&5
/scratch/jimw/riscv-gnu-toolchain/X-tmp/build-install/riscv64-unknown-linux-gnu/bin/ld:
/tmp/ccEYquRC.o: in function `main':
conftest.c:(.text.startup+0xa): undefined reference to
`__atomic_compare_exchange_1'
collect2: error: ld returned 1 exit status
configure:13614: $? = 1
configure:13642: result: no

You should check the auto-config.h and config.log files in your gcc build.

I don't think that the compiler version matters.  I see you are using musl
which is something I haven't been testing.  Maybe there is something in musl
that is confusing the libatomic build.  Or maybe there is something about the
way that Alpine Linux is compiling gcc that is causing the wrong result.

It looks like either an Alpine Linux problem or a musl problem to me.  I don't
have a system running Alpine Linux and don't know how to set one up offhand
which limits what I can immediately do.

[Bug target/91602] GCC fails to build for riscv in a combined tree due to misconfigured leb128 support

2021-07-19 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91602

--- Comment #14 from Jim Wilson  ---
I posted a patch but didn't get a review, and then got distracted by other
stuff and failed to follow up.
https://gcc.gnu.org/pipermail/gcc-patches/2020-January/539461.html

[Bug c/98892] FAIL: gcc.dg/plugin/diagnostic-test-expressions-1.c

2021-06-28 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98892

--- Comment #4 from Jim Wilson  ---
It turns out that -fmessage-length=0 doesn't work which is odd.  I suspect a
latent bug in the diagnostic code.

I tried -fmessage-length=128, which should work as that is longer than the
error line, and does work for this failure, but that causes a different line to
fail.  Turns out that there is a test that emits a 2.3KB error message.  To
make the entire testcase work, I need a -fmessage-length value longer than the
longest error message, so the smallest power of 2 that works is
-fmessage-length=4096 which seems too stupid to submit as a patch.

[Bug c/98892] FAIL: gcc.dg/plugin/diagnostic-test-expressions-1.c

2021-06-26 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98892

--- Comment #3 from Jim Wilson  ---
On second thought, I don't think that there is anything wrong with
dg-*-multiline-output.

The problem is simply that the diagnostic code is left shifting the error
message by m_x_offset_display, and this left shit causes the printed message to
not match the expected message.  The difference is only a few chars of white
space which makes it very hard to see the problem when inspecting the test log.
 You have to count white space characters to see the problem.  In my case,
there is one less space char in the printed message than in the expected
message.

And it looks like the solution is -fmessage-length=0 which is already added to
ALWAYS_CXXFLAGS and should maybe be added to ALWAYS_TEST_FLAGS instead.  Or
maybe just added to this one testcase for now to make it work.

[Bug c/98892] FAIL: gcc.dg/plugin/diagnostic-test-expressions-1.c

2021-06-25 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98892

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #2 from Jim Wilson  ---
I ran into this with a riscv toolchain make check.

Debugging, I discovered that if my terminal window is 80 characters wide, then
context->max_caret_width gets set to 78 in diagnostic.c, and then in
diagnostic-show-locus.c m_x_offset_display gets set to 9, and then in
print_trailing_fixits it calls move_to_column with column set to 9 which causes
an extra unnecessary newline to be emitted.  This extra newline leads to the "1
blank lines in output" error.  Also, I suspect that there is a contributing bug
in the dg-{begin,end}-multiline-output support that causes it to fail when an
error is followed by two newlines.  This causes the "expected multiline pattern
lines 550-551" failure.

A bit of experimenting with the testcase in isolation shows that if my screen
is 88 chars wide there is no extra newline, and if my screen is 87 chars wide I
get the extra newline.

A bit of experimenting with RUNTESTFLAGS="plugin.exp" show that the testcase
fails every time when my screen is 80 characters wide and works everytime when
my screen is 81 characters wide.  I don't know why this number is different
than above.

A testcase whose success depends on screen width is very annoying.

One way to work around the problem is to make the line shorter.  If I delete
one tab char from line 548 of gcc.dg/plugin/diagnostic-test-expressions-1.c and
8 space characters from the two matching error lines 550 and 551, then the
testcase does work with an 80 character wide terminal.  However, it does now
fail with a 77 char wide screen due to a different line 415-416.  So this isn't
an ideal solution.  But it works for me as my screens are always 80 characters
wide.

ideally there should be a way to turn off the max_caret_width stuff when
running the testsuite so results don't depend on screen width.  Someone who
knows the diagnostic code should probably look at this.

Or maybe print_trailing_fixits can be fixed?  It isn't obvious why it needs to
call move_to_column at the end.

[Bug rtl-optimization/100264] REE does not work on PARALLEL expressions with a single register SET child

2021-06-02 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100264

Jim Wilson  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||wilson at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #3 from Jim Wilson  ---
Fixed on mainline.

[Bug target/86387] [RISCV][ABI] GCC fails to sign/zero-ext integers as necessary for passing/returning int+fp structs on hard-float ABIs

2021-05-05 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86387

Jim Wilson  changed:

   What|Removed |Added

 Resolution|--- |WONTFIX
 Status|ASSIGNED|RESOLVED

--- Comment #3 from Jim Wilson  ---
This was fixed with a psABI change in Nov 2018 and we forgot to update this bug
report.
https://github.com/riscv/riscv-elf-psabi-doc/pull/74

[Bug target/100348] RISC-V extra pointer adjustments for memcpy() from glibc

2021-04-30 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100348

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #1 from Jim Wilson  ---
Most glibc targets already have an assembly language memcpy.S implementation. 
RISC-V should also.  This is likely the most effective solution to the problem. 
 We should fix the optimizer problem if possible of course.  It is just that
the glibc solution is likely easier than the gcc solution.

[Bug target/100348] RISC-V extra pointer adjustments for memcpy() from glibc

2021-04-30 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100348

Jim Wilson  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2021-04-30
 Ever confirmed|0   |1

[Bug target/100316] Regression: __clear_cache() does not support NULL-pointer arguments

2021-04-28 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100316

Jim Wilson  changed:

   What|Removed |Added

 CC||aoliva at gcc dot gnu.org,
   ||wilson at gcc dot gnu.org

--- Comment #1 from Jim Wilson  ---
Constant addresses are VOIDmode, and it only accepts Pmode and ptr_mode.  It
should accept VOIDmode too.  For instance cse_lookup_mem does this
  addr_mode = GET_MODE (XEXP (x, 0));
  if (addr_mode == VOIDmode)
addr_mode = Pmode;

This stems from changes that Alexandre Oliva made.  Adding him to the cc list.

[Bug c/100178] Should the “short” be promoted to “int” when use inline asm?

2021-04-21 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100178

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #1 from Jim Wilson  ---
This looks similar to 85185.  Same problem with a short type not working
correctly as an operand to an asm, and RISC-V is not the only target that the
code fails for.  I would suggest avoiding using char/short with an asm until
someone figures out how to fix this.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85185

[Bug libgcc/99964] android(bionic) cannot find crti.o and crtn.o on aarch64

2021-04-07 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99964

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #1 from Jim Wilson  ---
(In reply to cqwrteur from comment #0)
> bionic simply does not provide crti.o and crtn.o
> https://github.com/aosp-riscv/platform_bionic/tree/master/libc/arch-common/
> bionic

That is the RISC-V Android port in progress.  It should not be used for
anything other than RISC-V, as RISC-V patches might have accidentally broken
other targets.  Use the official Android sources for Aarch64, or maybe
something distributed by Linaro.

[Bug debug/99090] gsplit-dwarf broken on riscv64-linux

2021-02-26 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99090

--- Comment #5 from Jim Wilson  ---
I tested it with a riscv-gnu-toolchain build and check.  The 4 -gsplit-dwarf
testcases that fail without the patch work with the patch.

I also tried a build and check with -gsplit-dwarf enabled by default and
discovered that there are a number of debug tests that fail simply because the
output is a little different than what is expected.  But nothing else appeared
to fail.

[Bug debug/99090] gsplit-dwarf broken on riscv64-linux

2021-02-26 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99090

--- Comment #2 from Jim Wilson  ---
Yes we could have partial uleb128 support.  There is only a problem if at least
one label is in the code section.

There is another proposed solution to add special relaxable relocations for
uleb128 but the initial proposal had flaws, and no one has reviewed the second
proposal yet.

Or we could change the -gsplit-dwarf support to work even when there is no
uleb128 support.

[Bug tree-optimization/94092] Code size and performance degradations after -ftree-loop-distribute-patterns was enabled at -O[2s]+

2021-02-23 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94092

--- Comment #6 from Jim Wilson  ---
Trying Alex's patch, it doesn't work on this testcase.

One problem is that the loop bound is unknown, so the memset size is estimated
as max 0x1fffc bytes.  The code calls
default_use_by_pieces_infrastructure_p which wants the total number of
instructions to be less than SET_RATIO which is 16.  The total number of
instructions is computed as 0x1fffc.  So the attempt fails.

There is a contributing problem here that the alignment of the dest was
computed as 8-bits, even though we have a pointer to int which has 32-bit
alignment on a strict alignment system.  That problem happens in
expand_builtin_memset_args which calls get_pointer_alignment on dest.  DEST is
a SSN_NAME, and the SSA_NAME pointer info says align is 0, so it defaults to
8-bit byte alignment.  I haven't tried to figure out where that went wrong.

The patch does work as advertised on libgcc soft-float code.  For an rv32i
build I see the memset calls in libgcc.a reduced from 31 to 10.

[Bug target/98981] gcc-10.2 for RISC-V has extraneous register moves

2021-02-18 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98981

--- Comment #5 from Jim Wilson  ---
Neither of the two patches I mentioned in comment 1 can fix the problem by
themselves, as we still have a mix of SImode and DImode operations.

I looked at REE.  It doesn't work because there is more than one reaching def. 
But even if it did work, I don't think it would completely solve the problem
because it runs after register allocation and hence won't be able to remove
move instructions.

To get the best result, we need the register allocator to take two registers
with different modes with overlapping live ranges, and realize that they can be
allocated to the same hard reg because the overlapping uses are
non-conflicting.  I haven't tried looking at the register allocator, but it
doesn't seem like a good way to try to solve the problem.

We have an inconvenient mix of SImdoe and DImode because we don't have SImode
compare and branch instructions.  That requires sign extending 32-bit values to
64-bit to compare them, which then results in the sign extend and register
allocation optimization issues.  it is unlikely that 32-bit compare and branch
instructions will be added to the ISA though.

One useful thing I noticed is that the program is doing a max operation, and
the B extension adds a max instruction.  Having one instruction instead of a
series of instructions including a branch to compute max makes the optimization
issues easier, and gcc does give the right result in this case.  Using a
compiler with B support I get
lw  a4,0(a5)
lw  a2,0(a3)
addia5,a5,4
addia3,a3,4
addwa4,a4,a2
max a0,a4,a0
bne a5,a1,.L2
which is good code with the extra moves and sign-extends removed.  So I have a
workaround of sorts, but only if you have the B extension.

The -mtune=sifive-7-series can support conditional move via macro fusion, I was
hopeful that this would work as well as max, but unfortunately the sign-extend
that was removed in the max case does't get removed in the conditional move
case.  Also, the conditional move is 2-address, and the register allocator ends
up needing a reload, which gives us the unwanted mv again.  So the code in this
case is the same as without the option.  I didn't check to see if this is
fixable.

[Bug target/98981] gcc-10.2 for RISC-V has extraneous register moves

2021-02-17 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98981

--- Comment #4 from Jim Wilson  ---
With this testcase

extern void sub2 (void);

void
sub (int *i, int *j)
{
  int k = *i + 1;
  *j = k;
  if (k == 0)
sub2 ();
}

Compiling without the riscv_rtx_cost patch, I get
lw  a5,0(a0)
addiw   a4,a5,1
sw  a4,0(a1)
beq a4,zero,.L4
compiling with the riscv_rtx_cost patch, I get
lw  a5,0(a0)
addiw   a4,a5,1
sw  a4,0(a1)
addiw   a5,a5,1
beq a5,zero,.L4

The problem here is that we have RTL
(insn 9 7 10 2 (set (reg:SI 76)
(plus:SI (subreg:SI (reg:DI 78 [ *i_4(D) ]) 0)
(const_int 1 [0x1]))) "tmp.c":6:7 3 {addsi3}
 (expr_list:REG_DEAD (reg:DI 78 [ *i_4(D) ])
(nil)))
(insn 10 9 11 2 (set (reg/v:DI 73 [ k ])
(sign_extend:DI (reg:SI 76))) "tmp.c":6:7 90 {extendsidi2}
 (nil))
with the SImode 76 used for the sw and the DImode 73 used for the beq.  With
the riscv_rtx_cost change, which makes the sign_extend add cheap, combine folds
the add into the sign-extend to get
(insn 9 7 10 2 (set (reg:SI 76)
(plus:SI (subreg:SI (reg:DI 78 [ *i_4(D) ]) 0)
(const_int 1 [0x1]))) "tmp.c":6:7 3 {addsi3}
 (nil))
(insn 10 9 11 2 (set (reg/v:DI 73 [ k ])
(sign_extend:DI (plus:SI (subreg:SI (reg:DI 78 [ *i_4(D) ]) 0)
(const_int 1 [0x1] "tmp.c":6:7 5 {*addsi3_extended}
 (expr_list:REG_DEAD (reg:DI 78 [ *i_4(D) ])
(nil)))
and now we have two adds which is wrong.  Without the patch, combine does
nothing, then ree manages to merge the sign_extend into the add and subseg the
sw, so we end up with only the one addw and no explicit sign extend.

My take on this is that we never should have generated the SImode add in the
first place, because we don't actually have that instruction.  If we would have
generated the sign-extended add at rtl generation time, and subreged the
result, then we would have gotten the right result.  In theory.

So I think the riscv_rtx_cost change is useful, but only if combined with the
rtl generation change I proposed in comment 1.

[Bug target/99089] unnecessary zero extend before AND

2021-02-15 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99089

--- Comment #2 from Jim Wilson  ---
I don't know if REE can do this optimization, but it is a good place to start
looking.

[Bug debug/99090] New: gsplit-dwarf broken on riscv64-linux

2021-02-13 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99090

Bug ID: 99090
   Summary: gsplit-dwarf broken on riscv64-linux
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wilson at gcc dot gnu.org
  Target Milestone: ---

Enabling -gsplit-dwarf by default and trying a build hits an assert in
dw2_asm_output_delta_uleb128 because HAVE_AS_LEB128 is not defined.

The problem appears to be in output_loc_list in dwarf2out.c which has in the
dwarf_split_debug_info code
  /* FIXME: This will ICE ifndef HAVE_AS_LEB128.
 For that case we probably need to emit DW_LLE_startx_endx, 
 but we'd need 2 .debug_addr entries rather than just one.  */

riscv doesn't allow leb128 because of agressive linker relaxation, so we need
the alternative solution here that works without HAVE_AS_LEB128.

[Bug target/99089] New: unnecessary zero extend before AND

2021-02-13 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99089

Bug ID: 99089
   Summary: unnecessary zero extend before AND
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wilson at gcc dot gnu.org
  Target Milestone: ---

Given this testcase extracted from newlib

struct s
{
  short s;
  int i;
};

extern void sub2 (void);
extern void sub3 (void);

int
sub (struct s* t)
{
  short i = t->s;
  if ((i & 0x8) == 0)
t->s |= 0x800;
  if ((t->s & 0x3) == 0)
sub3 ();
  t->i = i;
  return 0;
}

compiling for riscv32-elf with -O2 -S and looking at assembly code, I see two
places where we have

sllia5,a5,16
srlia5,a5,16
andia5,a5,3

The zero extend before the AND is clearly unnecessary.

It seems a complex set of circumstances leads to here.  The tree level
optimizer extends the lifetime of the first zero extend into a phi, which means
the operation is split across basic blocks.  This also means no REG_DEAD note
and no combine.  It isn't until 309 bbro that the zero extend and AND end up
back in the same basic block again, and that is too late to optimize it as
nothing after 309 bbro can fix this.

So it appears we need a global rtl pass that can notice and fix redundant zero
extends feeding into AND operations across basic blocks.

[Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2021-02-13 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Jim Wilson  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #62 from Jim Wilson  ---
I committed my shorten memrefs patch and Levy's patch minus the combine change.
 I don't believe that we need the combine change.

I did notice one interesting case where we get unnecesssary zero extends.  I
will submit that as a new bug report.

I also noticed with riscv64-linux testing that some -gsplit-dwarf tests fail,
but this turns out to be a latent bug in the split-dwarf support.  I will
submit that as a new bug report.

I believe this is fixed, so closing as resolved.

[Bug rtl-optimization/99067] Missed optimization for induction variable elimination

2021-02-10 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99067

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #1 from Jim Wilson  ---
This looks similar to an ivopts problem I looked at regarding coremark.  Given
this testcase
void matrix_add_const_unsigned(unsigned int N, short *A, short val) {
unsigned int i,j;
for (i=0; i

[Bug target/98981] gcc-10.2 for RISC-V has extraneous register moves

2021-02-05 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98981

--- Comment #3 from Jim Wilson  ---
I suppose cost model problems could explain why combine didn't do the
optimization.  I didn't have a chance to look at that.

I still think there is a fundmental problem with how we represent SImode
operations, but again cost model problems could explain why my experiments to
fix that didn't work as expected.  I probably didn't look at that when I was
experimenting with riscv.md changes.

Your patch does look useful, but setting cost to 1 for MULT is wrong, and would
be just as wrong for DIV.  That is OK for PLUS, MINUS, and NEG though.  I think
a better option is to set *total = 0 and return false.  That gives no extra
cost to the sign extend, and recurs to get the proper cost for the operation
underneath.  That would work for MUL and DIV.  I found code in the rs6000 port
that does this.

[Bug target/98981] gcc-10.2 for RISC-V has extraneous register moves

2021-02-05 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98981

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #1 from Jim Wilson  ---
The extra move instruction is a side effect of how the riscv64 toolchain
handles 32-bit arithmetic.  We lie to the compiler and tell it that we have
instructions that produce 32-bit results.  In fact, we only have instructions
that produce 64-bit sign-extended 32-bit results.  The lie means that the RTL
has some insns with SImode output and some instructions with DImode outputs,
and sometimes we end up with nop moves to convert between the modes.  In this
case, it is peephole2 after regalloc that notices a SImode add followed by a
sign-extend, and converts it to a sign-extending 32-bit add followed by a move,
but can't eliminate the move because we already did register allocation.

This same problem is also why we get the unnecessary sext after the label, as
peephole can't fix that.

This problem has been on my todo list for a few years, and I have ideas of how
to fix it, but I have no idea when I will have time to try to fix it. I did
document it for the RISC-V International Code Speed Optimization task group.
https://github.com/riscv/riscv-code-speed-optimization/blob/main/projects/gcc-optimizations.adoc
This one is the first one in the list.

[Bug target/98596] registers not reused on RISC-V

2021-01-13 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98596

Jim Wilson  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2021-01-14
 CC||wilson at gcc dot gnu.org

--- Comment #1 from Jim Wilson  ---
Part of the problem seems to be the use of volatile, as we disable
optimizations inside volatile operations, and both constants are used inside
volatile operations.

But a bigger issue seems to be how we calculate constant costs.  In
riscv_rtx_costs we have
  /* If the constant is likely to be stored in a GPR, SETs of   
 single-insn constants are as cheap as register sets; we
 never want to CSE them.  */
  if (cost == 1 && outer_code == SET)
*total = 0;
which tells the compiler that constants are cheaper than registers.  If I
change that to "*total = 1;" then the two constants get optimized.  Changing
this cost means we will likely extend register lifetimes and increase register
pressure, which may reduce performance in some applications.  We would need a
lot of tersting to see what happens.  We are already computing a cost of 0 for
constants in the arithmetic immediate range, so setting costs to 0 here seems
unnecessary, but it is hard to predict what might happen with this change. 
There might be something else I'm missing here.

[Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-12-17 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #52 from Jim Wilson  ---
I did some simple testing of my patch alone.  I modified the
riscv-gnu-toolchain makefile to use -Os to build all libraries, built a
rv32imac/ilp32 toolchain, and looked at library code size.  I see a few cases
where my patch gives smaller code, and no obvious cases where it gives larger
code, so I think it is OK.

I haven't tried a full test with my patch combined with Levy's patch minus the
combine change.

[Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-12-15 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #51 from Jim Wilson  ---
Created attachment 49773
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49773=edit
untested fix to use instead of levy's combine.c patch

Needs testing without Levy's patch to make sure it doesn't accidentally
increase code size.

[Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-12-15 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #50 from Jim Wilson  ---
The combine change is inconvenient.  We can't do that in stage3, and it means
we need to make sure that this doesn't break other targets.

If the combine change is a good idea, then I think you can just modify
is_just_move rather than add a new function.  Just skip over a ZERO_EXTEND or
SIGN_EXTEND before the the general_operand check.  We might need a mode check
against UNITS_PER_WORD since extending past the word size is not necessarily a
simple move.

So the problem stems from the code in combine that is willing to do a 2->2
split if neither original instruction is a simple move.  When we add a
SIGN_EXTEND or ZERO_EXTEND they aren't considered simple moves anymore.

There is still the question of why the instruction cost allows the change. 
First I noticed that riscv_address_cost has a hook to check for shorten_memrefs
but that riscv_rtx_costs isn't calling it.  It uses riscv_address_insns
instead.  So it seems like adding a shorten_memref check to the MEM case
riscv_rtx_costs might solve the problem.  But riscv_compressed_lw_address_p has
a !reload_completed check, and combine runs before reload, so that returns the
same result for both the new and old insns.  I think that reload_completed
check isn't quite right.  If we have a pseudo-reg, then we should allow that,
but if we have an offset that is clearly not compressible, then we should
reject it before reload.  So I think the reload_completed check should be moved
down to where it checks for a compressible register.  With those two changes, I
can make the testcase work without a combine change.  Though since I changed
how shorten_memrefs works we need a check to make sure this patch doens't
change code size.  I haven't tried to do that yet.

With my changes, in the combine dump, I see

Successfully matched this instruction:
(set (reg/f:DI 92)
(plus:DI (reg:DI 96)
(const_int 768 [0x300])))
Successfully matched this instruction:
(set (reg:DI 82 [ MEM[(intD.1 *)array_5(D) + 800B] ])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
(const_int 800 [0x320])) [1 MEM[(intD.1 *)array_5(D) + 800B]+0
S4 A32])))
rejecting combination of insns 27 and 6
original costs 4 + 16 = 20
replacement costs 4 + 20 = 24

so now the replacement gets rejected because of the increased address cost.

[Bug middle-end/98227] [11 Regression] ICE: tree check: expected tree that contains 'decl common' structure, have 'constructor' in get_section, at varasm.c:297 on riscv64-linux-gnu

2020-12-12 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98227

--- Comment #5 from Jim Wilson  ---
My bootstrap with ada succeeded.  I used the same configure options except for
--prefix.  make check is still running.

[Bug middle-end/98227] [11 Regression] ICE: tree check: expected tree that contains 'decl common' structure, have 'constructor' in get_section, at varasm.c:297 on riscv64-linux-gnu

2020-12-11 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98227

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #4 from Jim Wilson  ---
This should already be fixed.  I ran into the same problem in my testing and
committed a patch today that HJ helpfully pointed at.  I don't test ada builds
normally.  I can try that to verify.

[Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-19 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #42 from Jim Wilson  ---
riscv_address_cost is a hook, so it is targetm.address_cost which is only
called from address_cost which is only called in a few places one of which is
in postreload.c so that is the one I would look at first.  This is in
try_replace_in_use which is called from reload_combine_recognize_const_pattern
which is trying to put offsets back into mems which is exactly what we don't
want here.  This suggests that containing_mem isn't getting set when we have a
sign/zero extend.  It should get set in reload_combine_note_use.

[Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-18 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #40 from Jim Wilson  ---
If you look at riscv.opt you will see that the -mshorten-memrefs option sets
the variable riscv_mshorten_memrefs.  If you grep for that, you will see that
it is used in riscv_address_cost in riscv.c.  I believe it is this change to
the address cost that is supposed to prevent the recombination back into
addresses that don't fit in compressed instructions.  So you need to look at
why this works in the current code, but not with your zero/sign extend load
patch.  Maybe there is something about the rtx costs for a regular load versus
a zero/sign extend load that causes the problem.  In the combine dump where it
says "original costs" and "replacement costs" that is where it is using
rtx_cost and riscv_address_cost.  The replacement cost should be more than the
original cost to prevent the recombination.  You should see that if you look at
the combine dump for the unpatched compiler.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-12 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #35 from Jim Wilson  ---
By combine issue, are you referring to the regression I mentioned in comment 3
and filed as bug 97747?  We can handle that as a separate issue.  It should be
uncommon.  I expect to get much more benefit from this patch than the downside
due to that combine issue.

As for the shorten-memrefs problem, I didn't notice that one in my testing.  It
does need to be fixed.  Taking a look now, it looks pretty simple to fix.  The
code currently looks for MEM, it needs to handle (SIGN_EXTEND (MEM)) and
((ZERO_EXTEND (MEM)).  See the get_si_mem_base_reg function.  You need to skip
over the sign_extend or zero_extend when looking fot the mem at the first call.
 Then at the second call you need to be careful to put the sign_extend or
zero_extend back when creating the new RTL.  Maybe just another arg to
get_si_mem_base so it can record the parent rtx code of the mem.  Or maybe do
this outside get_si_mem_base to skip over a sign/zero extend at the first call,
and then do the same at the second call but record what rtx we skipped over so
that we can put it back.  We can either handle this here or as another patch. 
But since you have some time while waiting for paperwork maybe you can try
writing a fix.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-11 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #33 from Jim Wilson  ---
I did say that I'm willing to fix code style issues.  All major software
projects I have worked with have coding style conventions.  It makes it easier
to maintain a large software base when everything is using the same coding
style.  We do have info to help you follow the GNU coding style.  See
https://gcc.gnu.org/wiki/FormattingCodeForGCC
which has emacs and vim settings to get the right code style.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-10 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #30 from Jim Wilson  ---
Looking at your v2 patch, the first verison fails because you are passing
mismatched modes to emit_move_insn.  The version with gen_lowpart solves that
problem, but fails because of infinite recursion.

The v4 patch looks good.  There are some coding style issues, but I can always
fix them for you.  There should be a space before open paren.  The first &&
should line up with the last two.  The braces should be indented two more
spaces.  The comment needs to end with a period and two spaces.

In the comment, instead of "Separate ... to ..." I'd say "Expand ... to ..." or
maybe "Emit ... as ...".

Now we need the copyright assignment paperwork.

[Bug middle-end/94083] inefficient soft-float x!=Inf code

2020-11-07 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94083

The original bug report was apparently lost in the sourceware/gcc migration
back in the spring and I didn't notice until now.

This testcase

int foo(void) {
  volatile float f, g;
  intn;
  f = __builtin_huge_valf();
  g = __builtin_huge_valf();
  n += 1 - (f != __builtin_huge_valf());
  return n;
}

compiled for soft-float with -O2, and looking at the original tree dump I see

  f =  Inf;
  g =  Inf;
  SAVE_EXPR ;, n =
SAVE_EX
PR  + n;;

So the C front end converted the f != Inf compare to a f u<=
 compare, but the problem here is that the !=
operation is a single libcall, but u<= is two libcalls.  So code that should
have a single soft-float libcall ends up with two.  First a call to __unordsf2,
then a compare and branch, and then a call to __lesf2.  This is a
de-optimization.

Perhaps we can convert the f u<=  back to f != Inf in
the optimization to get a single libcall.  Or maybe we can add unordered
soft-float libcalls like ulesf2.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-06 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #21 from Jim Wilson  ---
I submitted my testcase as 97747 so it will get more attention.

[Bug rtl-optimization/97747] New: missed combine opt with logical ops after zero extended load

2020-11-06 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97747

Bug ID: 97747
   Summary: missed combine opt with logical ops after zero
extended load
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wilson at gcc dot gnu.org
  Target Milestone: ---

Consider this testcase
struct
{
  unsigned int a : 1;
  unsigned int b : 1;
  unsigned int c : 1;
  unsigned int d : 1;
  unsigned int pad1 : 28;
} s;

void
sub (void)
{
  s.a = 1;
  s.c = 1;
}

Compiling with -O2 -S for ARM I get
sub:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
movwr2, #:lower16:.LANCHOR0
movtr2, #:upper16:.LANCHOR0
ldrbr3, [r2]@ zero_extendqisi2
bic r3, r3, #5
orr r3, r3, #5
strbr3, [r2]
bx  lr
The bic bit-clear instruction is obviously unnecessary.

In the combine dump file I see that we have
(insn 9 7 11 2 (set (reg:SI 120)
(and:SI (reg:SI 119 [ MEM  [(struct  *)] ])
(const_int -6 [0xfffa]))) "tmp.c":13:7 90
{*arm_andsi3_insn}
 (expr_list:REG_DEAD (reg:SI 119 [ MEM  [(struct 
*)] ])
(nil)))
(insn 11 9 13 2 (set (reg:SI 122)
(ior:SI (reg:SI 120)
(const_int 5 [0x5]))) "tmp.c":13:7 106 {*iorsi3_insn}
 (expr_list:REG_DEAD (reg:SI 120)
(nil)))

And the combiner does:
Trying 9 -> 11:
9: r120:SI=r119:SI&0xfffa
  REG_DEAD r119:SI
   11: r122:SI=r120:SI|0x5
  REG_DEAD r120:SI
Failed to match this instruction:
(set (reg:SI 122)
(ior:SI (and:SI (reg:SI 119 [ MEM  [(struct  *)] ])
(const_int 250 [0xfa]))
(const_int 5 [0x5])))

The problem here is that the ARM port generated a zero_extend for the load
byte, so combine knows that r120 has only 8 nonzero bits, it modified the -6 to
250 and then fails to notice that the and operation can be folded away because
in SImode the operation is no longer redundant with the modified constant.

On targets that do not generate the zero_extend, the and -6 operation gets
optimized away in combine.  For instance, with the current RISC-V port I get
sub:
lui a4,%hi(s)
lbu a5,%lo(s)(a4)
ori a5,a5,5
sb  a5,%lo(s)(a4)
ret

This likely fails on any target where movqi generates a zero extended load.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-06 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #20 from Jim Wilson  ---
Maybe convert_to_mode is recursively calling convert_to_mode until you run out
of stack space.  You can use --save-temps to generate a .i preprocessed file
form your input testcasxe, then run cc1 under gdb.  You need to give the
default arch/abi options or you will get an error
(gdb) run -march=rv64gc -mabi=lp64d -O2 tmp.i
when look at the backtrace when you hit the segfault.

There are other ways to get the  zero extend we need here.  We could just
generate the RTL for the insn directly instead of calling the
gen_zero_extendXiYi2 functions because we know that they exist and what form
they are in.  This is inconvenient though.  We could try querying the optabs
table to get the right insn code.  There is a gen_extend_insn function that
looks like it would work and is more direct than convert_to_mode. 
gen_extend_insn does require that the input is in a form that the convert will
accept, so it must be forced to a register if it isn't already a register. 
Another solution would be to use one of the rtx simplier functions, e.g. you
can do
 simplify_gen_unary (ZERO_EXTEND, word_mode, src, mode);
but the simplify functions are really for simplifying complex RTL to simpler
RTL, so I think not the right approach here.

I think gen_extend_insn is the best approach if we can't use convert_to_mode.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-05 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #17 from Jim Wilson  ---
Yes, LOAD_EXTEND_OP is a good suggestion.  We can maybe do something like
  int extend = (LOAD_EXTEND_OP (mode) == ZERO_EXTEND);
  temp_reg = force_reg (word_mode, convert_to_mode (word_mode, src, extend));

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-05 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #15 from Jim Wilson  ---
I tried testing your first patch.  I just built unpatched/patch
rv32-elf/rv64-linux toolchains and used nm/objdump to look at libraries like
libc, libstdc++, libm.

I managed to find a testcase from glibc that gives worse code with the patch.
struct
{
  unsigned int a : 1;
  unsigned int b : 1;
  unsigned int c : 1;
  unsigned int d : 1;
  unsigned int pad1 : 28;
} s;

void
sub (void)
{
  s.a = 1;
  s.c = 1;
}

Without the patch we get
sub:
lui a5,%hi(s)
addia5,a5,%lo(s)
lbu a4,1(a5)
ori a4,a4,5
sb  a4,1(a5)
ret
and with the patch we get
sub:
lui a4,%hi(s)
lbu a5,%lo(s)(a4)
andia5,a5,-6
ori a5,a5,5
sb  a5,%lo(s)(a4)
ret
Note the extra and instruction.

This seems to be a combine problem.  With the patched compiler, I see in the
-fdump-rtl-combine-all dump file
Trying 9 -> 11:
9: r79:DI=r78:DI&0xfffa
  REG_DEAD r78:DI
   11: r81:DI=r79:DI|0x5
  REG_DEAD r79:DI
Failed to match this instruction:
(set (reg:DI 81)
(ior:DI (and:DI (reg:DI 78 [ MEM  [(struct  *)] ])
(const_int 250 [0xfa]))
(const_int 5 [0x5])))
Combine knows that reg 78 only has 8 nonzero bits, so it simplified the AND -6
to AND 250.  If combine had left that constant alone, or if maybe combine
propagated that info about nonzero bits through to r81, then it should have
been able to optimize out the AND operation.  This does work when the load does
not zero extend by default.

The ARM port shows the exact same problem.  I see
sub:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
movwr2, #:lower16:.LANCHOR0
movtr2, #:upper16:.LANCHOR0
ldrbr3, [r2]@ zero_extendqisi2
bic r3, r3, #5
orr r3, r3, #5
strbr3, [r2]
bx  lr
and the bic (bit clear) is obviously unnecessary.

This probably should be submitted as a separate bug if we don't want to fix it
here.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-05 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #14 from Jim Wilson  ---
Actually, for the SImode to DImode conversion, zero extending is unlikely to be
correct in that case.  The rv64 'w' instructions require the input to be
sign-extended from 32-bits to 64-bits, so a sign-extend is probably required. 
There will be a similar consideration for rv128 when we get to that.  So maybe
we should only handle QImode and HImode here.

Or maybe we make the choice of zero-extend or sign-extend depend on the mode,
and use zero-extend for smaller than SImode and sign-extend for SImode and
larger.

For qimode, char is unsigned by default, so zero extension is likely the right
choice.  For himode, it isn't clear which is best, but the arm port does a zero
extend.  Also, the Huawei code size proposal says that zero exnteded byte and
short loads are more important than sign extended byte and short load, so that
is more evidence that zero extend is more useful in those cases.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-05 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #13 from Jim Wilson  ---
The attachments show the entire riscv.c file being deleted and then readded
with your change.  This is due to a line ending problem.  The original file has
the unix style linefeed and the new file has the windows style carriage return
and linefeed.  Git has a setting called core.autocrlf that can help with this. 
This will require using git diff instead of just diff though to generate
patche, but should give better diffs.  Or alternatively, maybe to can force the
original file to have msdos line endings before you make the diff, e.g. maybe
just loading the original file into your editor and saving it without making
changes will fix the line endings.

You have in the second patch
(mode == QImode || mode == SImode || mode == HImode)
which is wrong but harmless for rv32 since we can't extend SImode, and is also
wrong for the eventual rv128 support.  You can fix this by using something like
(GET_MODE_CLASS (mode) == MODE_INT && GET_MODE_SIZE (mode) < UNITS_PER_WORD
There is one place in riscv_legitimize_move that already uses this.

The code inside the if statement is a lot more verbose then necessary.  You can
use some helper functions to simplify it.  First you can use word_mode which is
DImode for rv64 and SImode for rv32 (and will be OImode for rv128).  Then you
can call a helper to do the mode conversion.  So for instance something like
  temp_reg = force_reg (word_mode, convert_to_mode (word_mode, src, 1));
should work.  That should emit an insn to do the zero extend and put it in a
reg.  Now you no longer need to check src mode or TARGET_64BIT as the code is
the same in all cases.  So it should just be about 3 or 4 lines of code for the
body of the if statement.

You have a check for REG_P (dest).  This is stricter than you need, since it
only works for REG and doesn't accept SUBREG.  We should handle both.  Also, it
isn't hard to also handle non-reg or non-subreg dests.  You just need to force
the source to a reg, and you already did that when you generated the
zero_extend operation.  So this code looks like it should work for any dest and
you should be able to drop the REG_P (dest) check.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-10-29 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #7 from Jim Wilson  ---
That patch is basically correct.  I would suggest using gen_lowpart instead of
gen_rtx_SUBREG as a minor cleanup.  It will do the same thing, and is shorter
and easier to read.

There is one problem here that you can't generate new pseudo registers during
register allocation, or after register allocation is complete.  So you need to
disable this optimization in this case.  You can do that by adding a check for
can_create_pseudo_p ().  This is already used explicitly in one place in
riscv_legitimize_move and implicitly in some subfunctions, and is used in the
arm.md movqi pattern.

The next part is testing the patch.  We need some correctness testing.  You can
just run the gcc testsuite for that.  And we need some code size/performance
testing.  I'd suggest compiling some code with and without the patch and check
function sizes and look for anything that got bigger with the patch, and check
to see if it is a problem.  I like to use the toolchain libraries like libc.a
and libstdc++.a since they are being built anways, but using a nice benchmark
is OK also as long as it is big enough to stress the compiler.

If the patch passes testing, then we can look at expanding the scope to handle
more modes, and also handle MEM dest in addition to REG dest.

Yes, we can discuss this Monday.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-10-27 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #5 from Jim Wilson  ---
Yes, the volatile is the problem.  We need to disable some optimizations like
the combiner to avoid breaking the semantics of volatile.  However, if you try
looking at other ports, like arm, you can see that they don't have this problem
because they generate different RTL at the start and hence do not need the
combiner to generate the sign-extended load.  So the proposal here is that we
modify the RISC-V gcc port to also emit the sign-extended load at RTL
generation time, which solves this problem. And then we need to do some testing
to make sure that this actually generates good code in every case, as we don't
want to accidentally introduce a code size or performance regression while
fixing this volatile optimization problem.

If you are curious about the combiner issue, see the init_recog_no_volatile
call in combine.c.  If you comment that out, the andi will be optimized away. 
But we can't remove that call, because that would break programs using
volatile.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-10-20 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #3 from Jim Wilson  ---
The basic idea here is that the movqi pattern in riscv.md currently emits RTL
for a load that looks like this
  (set (reg:QI target) (mem:QI (address)))
As an experiment, we want to try changing that to something like this
  (set (reg:DI temp) (zero_extend:DI (mem:DI (address
  (set (reg:QI target) (subreg:QI (reg:DI temp) 0))
The hope is that the optimizer will combine the subreg with following
operations resulting in smaller faster code at the end.  And this should also
solve the volatile load optimization problem.  So we need a patch, and then we
need experiments to see if the patch actually produces better code on real
programs.  It should be fairly easy to write the patch even if you don't have
any gcc experience.  The testing part of this is probably more work than the
patch writing.

The movqi pattern calls riscv_legitmize_move in riscv.c, so that would have to
be modified to look for qimode loads from memory, allocate a temporary
register, do a zero_extending load into the temp reg, and then a subreg copy
into the target register.

You will probably also need to handle cases where both the target and source
are memory locations, in which case this already gets split into two
instructions, a load followed by a store.

You can look at the movqi pattern in arm.md file to see an example of how to do
this, where it calls gen_zero_extendqisi2.  Though for RISC-V, we would want
gen_zero_extendqidi2 for rv64 and gen_zero_extendqisi2 for rv32.

If the movqi change works, then we would want similar changes for movhi and
maybe also movsi for rv64.

It might also be worth checking whether zero-extend or sign-extend is the
better choice.  We zero extend char by default, so that should be best.  For
rv64, the hardware sign extends simode to dimode by default, so sign-extend is
probably best for that.  For himode I'm not sure, I think we prefer sign-extend
by default, but that isn't necessarily the best choice for loads.  This would
have to be tested.

You can see rtl dumps by using -fdump-rtl-all.  The combiner is the pass that
should be optimizing away the unnecessary zero-extend.  You can see details of
what the combiner pass is doing by using -fdump-rtl-combine-all.

[Bug target/97481] GCC ice when build with RISCV on msys2

2020-10-19 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97481

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #2 from Jim Wilson  ---
riscv-unknown-linux-gnu is not a supported target.  You must use either
riscv32-unknown-linux-gnu or riscv64-unknwon-linux-gnu.  Both compilers support
both word sizes, the only difference is which is the default word size.  This
may be part of the reason why it failed.

The attachment doesn't show any ICE.  It is just a config.log file, and I don't
see anything interesting in there.

Note that mingw64 builds of a linux toolchain are unlikely to work as glibc
requires a case sensitive file system.  I'd suggest using WSL2 as something
more likely to work, but I haven't tried that myself.

It looks like you have a badly broken compiler build.  You will need to figure
out why it is broken.  This doesn't seem to be a gcc problem, but rather a
build problem on your side.

[Bug bootstrap/97409] riscv cross toolchain build fails

2020-10-19 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97409

Jim Wilson  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |INVALID

--- Comment #8 from Jim Wilson  ---
Apparent user error, and no response in several days, so closing.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-10-15 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-10-15
 Ever confirmed|0   |1

--- Comment #1 from Jim Wilson  ---
Comparing with the ARM port, I see that in the ARM port, the movqi pattern
emits
(insn 8 7 9 2 (set (reg:SI 117)
(zero_extend:SI (mem/v/c:QI (reg/f:SI 115) [1 active+0 S1 A8])))
"tmp.c\
":7:7 -1
 (nil))
(insn 9 8 10 2 (set (reg:QI 116)
(subreg:QI (reg:SI 117) 0)) "tmp.c":7:7 -1
 (nil))
and then later it combines the subreg operation with the following zero_extend
and cancels them out.

Whereas in the RISC-V port, the movqi pattern emits
(insn 9 7 10 2 (set (reg:QI 76)
(mem/v/c:QI (lo_sum:DI (reg:DI 74)
(symbol_ref:DI ("active") [flags 0xc4]  )) [1 active+0 S1 A8])) "tmp.c":7:7 -1
 (nil))
and then combine refuses to combine the following zero-extend with this insn as
the memory operation is volatile.

So it seems we need to rewrite the RISC-V port to make movqi and movhi zero
extend to si/di mode and then subreg.  That probably will require cascading
changes to avoid code size and performance regressions.

Looks like a tractable small to medium size project, but will need to wait for
a volunteer to work on it.

[Bug bootstrap/97409] riscv cross toolchain build fails

2020-10-14 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97409

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #7 from Jim Wilson  ---
To follow up on what Andreas mentioned, it appears that the user is trying to
use a linux gcc with an embedded elf binutils which will not work.  They aren't
fully compatible.  One of the differences is the set of supported emulations.

rohan:2069$ riscv64-unknown-elf-ld -mfoo
riscv64-unknown-elf-ld: unrecognised emulation mode: foo
Supported emulations: elf64lriscv elf32lriscv
rohan:2070$ riscv64-unknown-linux-gnu-ld -mfoo
riscv64-unknown-linux-gnu-ld: unrecognised emulation mode: foo
Supported emulations: elf64lriscv elf64lriscv_lp64f elf64lriscv_lp64
elf32lriscv elf32lriscv_ilp32f elf32lriscv_ilp32
rohan:2071$ 

So as Andreas mentioned, you need to use the same target for binutils as for
gcc.

[Bug bootstrap/97183] zstd build failure for gcc 10 on Ubuntu 16.04

2020-09-30 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97183

Jim Wilson  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #8 from Jim Wilson  ---
Remember to set it to resolved/fixed this tine,

[Bug bootstrap/97183] zstd build failure for gcc 10 on Ubuntu 16.04

2020-09-30 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97183

--- Comment #7 from Jim Wilson  ---
Fixed on mainline and the gcc-10 branch.

[Bug target/96700] undefined reference to `failure_on_line_796'

2020-09-29 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96700

Jim Wilson  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from Jim Wilson  ---
I took another look at this.

I can get this to fail with an x86_64 gcc, but that is the system compiler
which is gcc-7.  A little testing shows that the testcase works for x86_64
gcc-8 and later and fails for gcc-7.  However, since this is a testcase for
gcc-11, there is no expectation that it should work with older gcc versions. 
We sometimes need to modify testcases to work when gcc changes.

I also tried this with rv32-elf compilers, gcc-11, gcc-10, gcc-9, and I
couldn't get the testcase to fail for any of them, with or without -flto.

So I can't reproduce the original poster's problem, which gets me back to my
original claim that I don't think that this is a gcc bug.

Actually, on second thought, I realized that I'm using the original testcase
from the gcc source tree gcc.dg/tree-ssa/buitlin-sprintf.c not the attachment. 
Looking at the attachment, I see that some numbers have been modified.  So that
is why it fails.  It is a bogus testcase.  Looking further, I see that you are
using an old version of the testcase before it was changed in 2018.  See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86274
which both changed an optimization and changed the testcase to match the new
optimization.  The fix was put on mainlnie (gcc-9) and backported to gcc-8, so
the testcase should fail for any gcc-8 or later.

This is invalid.  You can't take a testcase from gcc-X and expect it to work
with gcc-Y.

[Bug bootstrap/97183] zstd build failure for gcc 10 on Ubuntu 16.04

2020-09-24 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97183

--- Comment #3 from Jim Wilson  ---
I installed Ubuntu 16.04 on an old laptop so I can directly reproduce the build
failure.

Checking for the zstd version looks like the easier patch.

Checking for specific macros and functions might be better, but what do we do
with the info?  I'm not familiar with zstd or the lto-compress.c code, and it
doesn't look like we can easily change it to not use certain macros/functions
when they don't exist.  Checking for half a dozen macros/functions and
disabling all of the code if one or more is missing seems silly.  Though
looking at this, as Kito pointed out, it appears that
ZSTD_getFrameContentSize() is the most recently added function that we are
using, so maybe we only need to check for that one.  That was added in version
1.3.0.  Or we can check for the 1.3.0 version which seems simpler.  I do know
that version 1.3.3 of zstd works as this is what Ubuntu 18.04 has.  I haven't
tried the versions 1.3.[012].

[Bug libstdc++/97182] Add support for targets that only define SYS_futex_time64 and not SYS_futex

2020-09-24 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97182

--- Comment #5 from Jim Wilson  ---
I see that a riscv64-linux libgomp.so has mutex calls and no obvious futex
calls.  Though I find it a little curious that futex support isn't
auto-detected.  There is already config/futex.m4 to detect futex support, and
libstdc++ is using it.  So yes, we are missing riscv*-linux futex support in
libgomp.

[Bug target/96700] undefined reference to `failure_on_line_796'

2020-09-23 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96700

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #1 from Jim Wilson  ---
The testcase is not strictly valid C code, as it has references to undefined
funtions.  The testcase was written this way to check for a specific compiler
optimization.  If the testcase is used correctly, and the optimization
succeeds, then the testcase links.  If the testcase is used correctly, and the
optimization fails, then the testcase gives link errors.  Also, if you use the
testcase incorrectly, i.e. the wrong optimziation level, it can give link
errors.  

The testcase as written can't work with -flto, but does correectly verify
whether the optimization works, so it is OK, but could probably be improved. 
This looks like more of an enhancement request than a bug.

This is not a riscv specific problem.