This patch fixes PR rtl-optimization/52175, which is another case of dbr_schedule mishandling the result of shrink-wrapping. There is code to allow:
add rN,rN,CONST to be put into a delay slot by adding a compensating: add rN,rN,-CONST to the opposite thread. (Which has caused headaches before because it increases code size.) In the testcase, we end up doing this for a frame-related stack allocation: we allocate stack in the delay slot, then immediately deallocate it again on the opposite thread. This triggers an ICE because the deallocation is not marked as frame-related, so it looks to the cfi code as though we can reach the same point with two different CFAs. We don't really want the optimisers to add new frame-related annotations, and it's difficult to do that anyway in the general case where the annotation is based on an instruction that had REG_FRAME_RELATED_EXPR notes attached. I also can't really think of many cases where it would be worthwhile. So this patch stops us applying for the optimisation in that case, just like we did for PR 51471. I strongly suspect that doing this optimisation for the stack pointer is independently wrong -- e.g. if an alloca is protected by a bounds check -- but since that part isn't a regression, I'm not tackling it here. And, to really over-egg this one-liner, I thought I'd better mention why I fixed it this way. The code reads: /* If we haven't found anything for this delay slot and it is very likely that the branch will be taken, see if the insn at our target increments or decrements a register with an increment that does not depend on the destination register. If so, try to place the opposite arithmetic insn after the jump insn and put the arithmetic insn in the delay slot. If we can't do this, return. */ if (delay_list == 0 && likely && new_thread && !ANY_RETURN_P (new_thread) && NONJUMP_INSN_P (new_thread) && GET_CODE (PATTERN (new_thread)) != ASM_INPUT && asm_noperands (PATTERN (new_thread)) < 0) { rtx pat = PATTERN (new_thread); rtx dest; rtx src; trial = new_thread; pat = PATTERN (trial); if (!NONJUMP_INSN_P (trial) || GET_CODE (pat) != SET || ! eligible_for_delay (insn, 0, trial, flags) || can_throw_internal (trial)) return 0; And there's obviously some redundancy between the two if blocks here. So which is the right way to handle the unoptimisable case? Returning, or falling through? I think it's falling through, so that we still redirect the jump in cases where we have skipped a redundant insn without actually filling a delay slot. I therefore added the check to the first "if" statement rather than the second. Despite all that, this felt obvious enough to self-approve. Tested on various mips* targets and applied. Richard gcc/ PR rtl-optimization/52175 * reorg.c (fill_slots_from_thread): Don't apply add/sub optimization to frame-related instructions. gcc/testsuite/ PR rtl-optimization/52175 * gcc.c-torture/compile/pr52175.c: New test. Index: gcc/reorg.c =================================================================== --- gcc/reorg.c 2012-02-07 22:28:46.000000000 +0000 +++ gcc/reorg.c 2012-02-07 22:40:58.000000000 +0000 @@ -2937,6 +2937,7 @@ fill_slots_from_thread (rtx insn, rtx co if (delay_list == 0 && likely && new_thread && !ANY_RETURN_P (new_thread) && NONJUMP_INSN_P (new_thread) + && !RTX_FRAME_RELATED_P (new_thread) && GET_CODE (PATTERN (new_thread)) != ASM_INPUT && asm_noperands (PATTERN (new_thread)) < 0) { Index: gcc/testsuite/gcc.c-torture/compile/pr52175.c =================================================================== --- /dev/null 2012-02-08 18:03:45.062776123 +0000 +++ gcc/testsuite/gcc.c-torture/compile/pr52175.c 2012-02-08 19:13:26.000000000 +0000 @@ -0,0 +1,25 @@ +void bad (void); +char *foo (char *src, char **last) +{ + char *dst; + int ch; + dst = src = (src ? src : *last); + + if (*src == 0) + return 0; + + while (src[0]) + { + if (!src[1]) + { + bad (); + break; + } + *dst = *src; + dst += 1; + src += 2; + } + *last = src; + *dst = 0; + return *last; +}