[Bug target/89012] SH2 (FDPIC) duplicate symbols in generated assembly.

2019-10-02 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89012

--- Comment #10 from Oleg Endo  ---
(In reply to Rich Felker from comment #9)
> I think it's actually just a matter of removing the patterns for generating
> bsrf, but I may be mistaken. Generating jsr should be what happens "by
> default" in some sense if GCC just has to load the address, no?

I think so, yes.

> Of course 
> that would also explicitly load the GOT pointer for the callee which we
> don't need since it's local.

Can you make an example?  Maybe it can get optimized away afterwards, if it's
not used?

[Bug target/89012] SH2 (FDPIC) duplicate symbols in generated assembly.

2019-09-30 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89012

--- Comment #9 from Rich Felker  ---
I think it's actually just a matter of removing the patterns for generating
bsrf, but I may be mistaken. Generating jsr should be what happens "by default"
in some sense if GCC just has to load the address, no? Of course that would
also explicitly load the GOT pointer for the callee which we don't need since
it's local. I'll try to take a look at this in more detail soon and see what I
can find.

[Bug target/89012] SH2 (FDPIC) duplicate symbols in generated assembly.

2019-09-28 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89012

--- Comment #8 from Oleg Endo  ---
Converting the FDPIC code from bsrf back to jsr sounds like quite some work.
However, I think the chances of success are higher if it does the same thing as
the normal PIC code.

Do you know what the main reason was to use bsrf for FDPIC?  How does FDPIC
differ from regular PIC?

[Bug target/89012] SH2 (FDPIC) duplicate symbols in generated assembly.

2019-09-28 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89012

--- Comment #7 from Rich Felker  ---
I think all the FDPIC work done to use bsrf like this was probably a mistake.
It ends up greatly enlarging functions that make a lot of such calls, for
example soft-float that does it for each floating point operation. We actually
encountered this in production usage. I think the bulk of the FDPIC patch was
adding this stuff, and I wouldn't be opposed to removing all that (just
generating jsr) if we can determine that it fixes bugs and results in better
codegen.

[Bug target/89012] SH2 (FDPIC) duplicate symbols in generated assembly.

2019-09-28 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89012

Oleg Endo  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-09-29
     Ever confirmed|0                           |1

--- Comment #6 from Oleg Endo  ---
The ashlsi3_d_call insn, alternative 1 seems to be the problem here, but I
think the same problem could happen for all other libcalls that expand to a
bsrf sequence.

In this particular case, something in the optimizers decides that moving the
shift instruction out of the loop is a good idea.  To do that, it copies the
shift instruction only -- which is "ashlsi3_d_call", which is the bsrf plus the
following label.  The label is generated once during RTL expansion.

This works for normal PIC code which calculates the call address and puts it in
a register first.  This happens only once during RTL expansion and the address
remains fixed.  Subsequent copies of the shift instruction (which is just a
jsr) work, because they re-use the calculated address.

In case of FDPIC, the use of bsrf will always require a unique address/offset
in the symbols.  It can't just make arbitrary copies of the shift instruction
(which is a bsrf) because the pre-calculated address/offset will be wrong.
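
The point about per-site offsets can be illustrated with a small model. This is only a sketch under stated assumptions: `bsrf_target`, `site_offset` and `reuse_error` are hypothetical helper names, all addresses are made up, and the label is assumed to sit directly after the 2-byte bsrf instruction, which is what the `sym-(.LPCS0+2)` form of the constant suggests. It models bsrf as branching to the instruction's address plus 4 plus the register value.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of SH "bsrf Rn": the branch target is the address of
   the bsrf instruction plus 4, plus the offset held in Rn. */
static uint32_t bsrf_target(uint32_t bsrf_addr, int32_t rn)
{
    return bsrf_addr + 4u + (uint32_t)rn;
}

/* Constant stored for one call site: callee - (label + 2), where the
   label is ASSUMED to sit directly after the 2-byte bsrf instruction. */
static int32_t site_offset(uint32_t callee_addr, uint32_t bsrf_addr)
{
    uint32_t label_addr = bsrf_addr + 2u;
    return (int32_t)(callee_addr - (label_addr + 2u));
}

/* Returns how far a copied bsrf misses the callee when it reuses the
   offset that was computed for the original site. */
static int32_t reuse_error(uint32_t callee, uint32_t orig_site,
                           uint32_t copy_site)
{
    int32_t off = site_offset(callee, orig_site);
    assert(bsrf_target(orig_site, off) == callee); /* original site is exact */
    return (int32_t)(bsrf_target(copy_site, off) - callee);
}
```

In this model, `reuse_error(0x2000, 0x1000, 0x1040)` is the distance between the two sites: the copy lands that many bytes past the callee. A jsr through a register holding the absolute address has no such dependence on where the instruction ends up.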


1) If not already there, something needs to be added that allows re-generating
the address/offset of the copied/cloned instruction and also re-emitting the
constant load.  This is probably not so easy to do, as instructions can be
copied in many places during the compilation.

2) Postpone the calculation and emission of symbols and constant loads until a
very late stage of the compilation using very late splitters.

3) Use jsr instead of bsrf in the compiler and let the linker post-optimize
calls via relaxation (although linker relaxation on SH has been broken for a
long time).


I don't know much about PIC/FDPIC, but one thing I've noticed looks strange.
The right-shift pattern "ashrsi3_n" will result in

bsrf    r1      ! 268   [c=5 l=2]  call_valuei_pcrel
...

.long   ___ashrsi3@PLT-(.LPCS0+2-.)



While the left-shift pattern "ashlsi3_d_call" will result in

bsrf    r6      ! 24    [c=80 l=2]  ashlsi3_d_call/1
...

.long   ___ashlsi3_r0-(.LPCS0+2)

[Bug target/89012] SH2 (FDPIC) duplicate symbols in generated assembly.

2019-01-28 Thread me at zv dot io
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89012

--- Comment #5 from Zach van Rijn  ---
Created attachment 45546
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45546&action=edit
All files produced by -O2 -da

[Bug target/89012] SH2 (FDPIC) duplicate symbols in generated assembly.

2019-01-28 Thread me at zv dot io
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89012

--- Comment #4 from Zach van Rijn  ---
The error can be reproduced at the `-O1` optimization level only when both
of the following options are given:

./cc -c mintest.c -O1 -freorder-blocks-algorithm=stc -ftree-pre

Changing to `-freorder-blocks-algorithm=simple` will not reveal
the issue at `O1`, `O2` or `O3`.

In summary, the only known ways to reproduce this issue are:

(0) `-O2` as described in original bug report;

(1) `-O1 -freorder-blocks-algorithm=stc -ftree-pre`, exclusively
not at any other optimization level;

and the only known ways to mitigate this issue using either of
the above configurations are:

(2) `-O2 -freorder-blocks-algorithm=simple`;

(3) `-O1` without specifically both of the aforementioned flags.

The attached tarball contains 5 files named by letters 'A' - 'E'
containing the generated assembly, each built with `-dp` as suggested:

(A) FAIL: `-O2 -freorder-blocks-algorithm=stc`

(B) PASS: `-O2 -freorder-blocks-algorithm=simple`

(C) FAIL: `-O1 -freorder-blocks-algorithm=stc -ftree-pre`

(D) PASS: `-O1 -freorder-blocks-algorithm=simple -ftree-pre`

(E) PASS: `-O1 -freorder-blocks-algorithm=stc`
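
One way to tell the FAIL files from the PASS files without running the assembler is to look for a label that is defined twice. The following is a minimal sketch, not part of the attached tarball: `count_duplicate_labels` is a hypothetical helper, and it assumes a label definition is a name with no whitespace followed by a colon at the very start of a line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LABELS 256
#define MAX_NAME   64

/* Counts label definitions ("name:" at the start of a line) that occur
   more than once in a buffer of assembly text. Simplified: labels that
   follow other tokens on a line are ignored, and at most MAX_LABELS
   distinct names are remembered. */
static int count_duplicate_labels(const char *text)
{
    char seen[MAX_LABELS][MAX_NAME];
    size_t nseen = 0;
    int dups = 0;

    assert(text != NULL);
    while (*text != '\0') {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) : strlen(text);
        const char *colon = memchr(text, ':', len);

        if (colon != NULL && colon > text) {
            size_t nlen = (size_t)(colon - text);
            /* Accept only names with no blank or tab before the colon. */
            if (nlen < MAX_NAME && strcspn(text, " \t") >= nlen) {
                int found = 0;
                for (size_t i = 0; i < nseen; i++) {
                    if (strlen(seen[i]) == nlen &&
                        memcmp(seen[i], text, nlen) == 0) {
                        found = 1;
                        break;
                    }
                }
                if (found) {
                    dups++;
                } else if (nseen < MAX_LABELS) {
                    memcpy(seen[nseen], text, nlen);
                    seen[nseen][nlen] = '\0';
                    nseen++;
                }
            }
        }
        if (eol == NULL)
            break;
        text = eol + 1;
    }
    return dups;
}
```

Reading one of the generated `.s` files into a buffer and passing it to this function should give a non-zero count whenever the same label is emitted more than once.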

[Bug target/89012] SH2 (FDPIC) duplicate symbols in generated assembly.

2019-01-28 Thread me at zv dot io
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89012

--- Comment #3 from Zach van Rijn  ---
Created attachment 45545
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45545&action=edit
Tarball containing intermediate asm (with -dp) for each of 5 cases.

[Bug target/89012] SH2 (FDPIC) duplicate symbols in generated assembly.

2019-01-25 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89012

--- Comment #2 from Oleg Endo  ---
You can compile the code with the '-dp' option to see which insn patterns make
up the asm code.  The pattern names will be emitted as comments in the asm
output.

[Bug target/89012] SH2 (FDPIC) duplicate symbols in generated assembly.

2019-01-23 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89012

--- Comment #1 from Rich Felker  ---
The binutils version should not be relevant; the bug here is before anything
even gets to binutils. It looks like one of the RTL patterns used for calling
libgcc bitshift functions from FDPIC was intended to be non-duplicable but
isn't. This bug goes undetected on the J2 (-mj2) target because the J2 has the
SH3 barrel shift instructions and does not need libgcc for variable shifts, but
it affects the baseline SH2 ISA. I'll look back at the source and see if I can
figure out what's happening.
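
For illustration, this is the general shape of C code that needs a variable shift. It is a hypothetical sketch, not the reporter's `mintest.c`, and it is not confirmed to trigger the duplicate label; `shift_all` is an invented name and a 32-bit `unsigned int` is assumed. Because the shift amount is only known at run time, a target without barrel-shift instructions has to call a libgcc helper for `value << amount`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test shape. The shift is loop-invariant, which makes it
   a candidate for being hoisted or copied by the optimizers. */
unsigned int shift_all(unsigned int *out, size_t n,
                       unsigned int value, unsigned int amount)
{
    unsigned int sum = 0;

    assert(amount < 32u); /* larger shift counts are undefined in C */
    for (size_t i = 0; i < n; i++) {
        out[i] = (value << amount) + (unsigned int)i;
        sum += out[i];
    }
    return sum;
}
```

Compiling a file like this for baseline SH2 with `-dp` and inspecting the pattern names in the output is one way to see whether a libcall shift pattern such as `ashlsi3_d_call` is used.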