[Bug rtl-optimization/78041] Wrong code on ARMv7 with -mthumb -mfpu=neon-fp16 -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78041 ktkachov at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED --- Comment #13 from ktkachov at gcc dot gnu.org --- .
[Bug rtl-optimization/78041] Wrong code on ARMv7 with -mthumb -mfpu=neon-fp16 -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78041 Bernd Edlinger changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |wdijkstr at arm dot com --- Comment #12 from Bernd Edlinger --- (In reply to Wilco from comment #11) > (In reply to ktkachov from comment #10) > > Confirmed then. Wilco, if you're working on this can you please assign it to > > yourself? > > Unfortunately the form doesn't allow me to do anything with the headers... I know that happens.
[Bug rtl-optimization/78041] Wrong code on ARMv7 with -mthumb -mfpu=neon-fp16 -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78041 --- Comment #11 from Wilco --- (In reply to ktkachov from comment #10) > Confirmed then. Wilco, if you're working on this can you please assign it to > yourself? Unfortunately the form doesn't allow me to do anything with the headers...
[Bug rtl-optimization/78041] Wrong code on ARMv7 with -mthumb -mfpu=neon-fp16 -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78041 ktkachov at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2016-10-21 CC||ktkachov at gcc dot gnu.org Ever confirmed|0 |1 Known to fail||4.9.4, 5.4.1, 6.2.1, 7.0 --- Comment #10 from ktkachov at gcc dot gnu.org --- Confirmed then. Wilco, if you're working on this can you please assign it to yourself?
[Bug rtl-optimization/78041] Wrong code on ARMv7 with -mthumb -mfpu=neon-fp16 -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78041 --- Comment #9 from Bernd Edlinger --- (In reply to Wilco from comment #8) > > I've got a patch that fixes it, it's being tested. > > While looking at how DI mode operations get expanded, I noticed there is a > CQ issue with your shift change. Shifts that are expanded early now use > extra registers due to the DI mode write of zero. Given all other DI mode > operations are expanded after reload, it may be better to do the same for > shifts too. Interesting idea. After reload there is no need anymore to zero the DI mode register at all, so that could become completely unnecessary.
[Bug rtl-optimization/78041] Wrong code on ARMv7 with -mthumb -mfpu=neon-fp16 -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78041 --- Comment #8 from Wilco --- (In reply to Bernd Edlinger from comment #7) > (In reply to Richard Earnshaw from comment #6) > > (In reply to Bernd Edlinger from comment #5) > > > (In reply to Wilco from comment #4) > > > > However dealing with partial overlaps is complex so maybe the best > > > > option > > > > would be to add alternatives to di3_neon to either allow full > > > > overlap > > > > "r 0 X X X" or no overlap " r X X X". The shift code works with full > > > > overlap. > > > > > > That sounds like a good idea. > > > > > > Then this condition in di3_neon could go away too: > > > > > > && (!reg_overlap_mentioned_p (operands[0], operands[1]) > > > || REGNO (operands[0]) == REGNO (operands[1]))) > > > > Note that we don't want to restrict complete overlaps, only partial > > overlaps. Restricting complete overlaps leads to significant increase in > > register pressure and a lot of redundant copying. > > Yes. > > That is Wilco's idea: instead of =r 0r X X X > use =r 0 X X X and = r X X X, that should ensure that > no partial overlap happens, just full overlap or nothing. > > That's what arm_emit_coreregs_64bit_shift > and arm_ashldi3_1bit can handle. > > Who will do it? I've got a patch that fixes it, it's being tested. While looking at how DI mode operations get expanded, I noticed there is a CQ issue with your shift change. Shifts that are expanded early now use extra registers due to the DI mode write of zero. Given all other DI mode operations are expanded after reload, it may be better to do the same for shifts too.
[Bug rtl-optimization/78041] Wrong code on ARMv7 with -mthumb -mfpu=neon-fp16 -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78041 --- Comment #7 from Bernd Edlinger --- (In reply to Richard Earnshaw from comment #6) > (In reply to Bernd Edlinger from comment #5) > > (In reply to Wilco from comment #4) > > > However dealing with partial overlaps is complex so maybe the best option > > > would be to add alternatives to di3_neon to either allow full > > > overlap > > > "r 0 X X X" or no overlap " r X X X". The shift code works with full > > > overlap. > > > > That sounds like a good idea. > > > > Then this condition in di3_neon could go away too: > > > > && (!reg_overlap_mentioned_p (operands[0], operands[1]) > > || REGNO (operands[0]) == REGNO (operands[1]))) > > Note that we don't want to restrict complete overlaps, only partial > overlaps. Restricting complete overlaps leads to significant increase in > register pressure and a lot of redundant copying. Yes. That is Wilco's idea: instead of =r 0r X X X use =r 0 X X X and = r X X X, that should ensure that no partial overlap happens, just full overlap or nothing. That's what arm_emit_coreregs_64bit_shift and arm_ashldi3_1bit can handle. Who will do it?
[Bug rtl-optimization/78041] Wrong code on ARMv7 with -mthumb -mfpu=neon-fp16 -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78041 --- Comment #6 from Richard Earnshaw --- (In reply to Bernd Edlinger from comment #5) > (In reply to Wilco from comment #4) > > However dealing with partial overlaps is complex so maybe the best option > > would be to add alternatives to di3_neon to either allow full overlap > > "r 0 X X X" or no overlap " r X X X". The shift code works with full > > overlap. > > That sounds like a good idea. > > Then this condition in di3_neon could go away too: > > && (!reg_overlap_mentioned_p (operands[0], operands[1]) > || REGNO (operands[0]) == REGNO (operands[1]))) Note that we don't want to restrict complete overlaps, only partial overlaps. Restricting complete overlaps leads to significant increase in register pressure and a lot of redundant copying.
[Bug rtl-optimization/78041] Wrong code on ARMv7 with -mthumb -mfpu=neon-fp16 -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78041 --- Comment #5 from Bernd Edlinger --- (In reply to Wilco from comment #4) > However dealing with partial overlaps is complex so maybe the best option > would be to add alternatives to di3_neon to either allow full overlap > "r 0 X X X" or no overlap " r X X X". The shift code works with full > overlap. That sounds like a good idea. Then this condition in di3_neon could go away too: && (!reg_overlap_mentioned_p (operands[0], operands[1]) || REGNO (operands[0]) == REGNO (operands[1])))
[Bug rtl-optimization/78041] Wrong code on ARMv7 with -mthumb -mfpu=neon-fp16 -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78041 --- Comment #4 from Wilco --- (In reply to Bernd Edlinger from comment #3) > (In reply to Wilco from comment #2) > > (In reply to Bernd Edlinger from comment #1) > > > some background about this bug can be found here: > > > > > > https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01561.html > > > > The di3_neon pattern does not constrain the input not to overlap with > > the output for immediate shifts, and the arm_emit_coreregs_64bit_shift code > > does not handle partial overlaps. However it is feasible to fix that by > > swapping the order for the various cases. > > from di3_neon: > > if (INTVAL (operands[2]) < 1) > { > emit_insn (gen_movdi (operands[0], operands[1])); > DONE; > } > > Will the movdi pattern work with partial overlaps? > It does basically this: > > emit_move_insn (gen_lowpart (SImode, operands[0]), > gen_lowpart (SImode, operands[1])); > emit_move_insn (gen_highpart (SImode, operands[0]), > gen_highpart (SImode, operands[1])); I think it's OK - that code only triggers if a movdi has a physical register that is not a valid DI register which is not the case we're dealing with. movdi has a split that does check for partial overlap around line 5900 in arm.md: /* Handle a partial overlap. */ if (rtx_equal_p (operands[0], operands[3])) ... However dealing with partial overlaps is complex so maybe the best option would be to add alternatives to di3_neon to either allow full overlap "r 0 X X X" or no overlap " r X X X". The shift code works with full overlap.
[Bug rtl-optimization/78041] Wrong code on ARMv7 with -mthumb -mfpu=neon-fp16 -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78041 --- Comment #3 from Bernd Edlinger --- (In reply to Wilco from comment #2) > (In reply to Bernd Edlinger from comment #1) > > some background about this bug can be found here: > > > > https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01561.html > > The di3_neon pattern does not constrain the input not to overlap with > the output for immediate shifts, and the arm_emit_coreregs_64bit_shift code > does not handle partial overlaps. However it is feasible to fix that by > swapping the order for the various cases. from di3_neon: if (INTVAL (operands[2]) < 1) { emit_insn (gen_movdi (operands[0], operands[1])); DONE; } Will the movdi pattern work with partial overlaps? It does basically this: emit_move_insn (gen_lowpart (SImode, operands[0]), gen_lowpart (SImode, operands[1])); emit_move_insn (gen_highpart (SImode, operands[0]), gen_highpart (SImode, operands[1]));
[Bug rtl-optimization/78041] Wrong code on ARMv7 with -mthumb -mfpu=neon-fp16 -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78041 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #2 from Wilco --- (In reply to Bernd Edlinger from comment #1) > some background about this bug can be found here: > > https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01561.html The di3_neon pattern does not constrain the input not to overlap with the output for immediate shifts, and the arm_emit_coreregs_64bit_shift code does not handle partial overlaps. However it is feasible to fix that by swapping the order for the various cases.
[Bug rtl-optimization/78041] Wrong code on ARMv7 with -mthumb -mfpu=neon-fp16 -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78041 --- Comment #1 from Bernd Edlinger --- some background about this bug can be found here: https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01561.html