Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 10/08/2014 08:31 AM, Jiong Wang wrote:
> On 30/09/14 19:36, Jiong Wang wrote:
>> 2014-09-30 17:30 GMT+01:00 Jeff Law <l...@redhat.com>:
>>> On 09/30/14 08:37, Jiong Wang wrote:
>>>> On 30/09/14 05:21, Jeff Law wrote:
>>>>> I do agree with Richard that it would be useful to see the insns
>>>>> that are incorrectly sunk and the surrounding context.
>>> So I must be missing something.  I thought the shrink-wrapping code
>>> wouldn't sink arithmetic/logical insns like we see with insn 14 and
>>> insn 182.  I thought it was limited to reg-reg copies and constant
>>> initializations.
>> yes, it was limited to reg-reg copies, and my previous sink
>> improvement aimed to sink any rtx that:
>>
>>   A: is a single_set
>>   B: has a src operand that is any combination of no more than one
>>      register and no non-constant objects.
>>
>> while some operators like shift may have side effects.  IMHO, all
>> side effects are reflected in the RTX; together with this
>> fail_on_clobber_use modification, the rtx returned by
>> single_set_no_clobber_use is safe to sink if it meets the above
>> limit B and passes the later register use/def check in
>> move_insn_for_shrink_wrap?
>
> Ping ~
>
> And as there is a NONDEBUG_INSN_P check before
> move_insn_for_shrink_wrap is invoked, we could avoid creating a new
> wrapper function by invoking single_set_2 directly.

I'm committing the following to fix this.

  (1) Don't bother modifying single_set; just look for a bare SET.
  (2) Tighten the set of expressions we're willing to move.
  (3) Use a direct "return false" in the failure case, rather than
      counting a non-zero number of non-constants, etc.

Tested on x86_64, and against Andi's test case (unfortunately
unreduced).

r~

2014-10-10  Richard Henderson  <r...@redhat.com>

	PR target/63404
	* shrink-wrap.c (move_insn_for_shrink_wrap): Don't use single_set.
	Restrict the set of expressions we're willing to move.
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index b1ff8a2..257812c 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -176,17 +176,40 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
   basic_block next_block;
   edge live_edge;
 
-  /* Look for a simple register copy.  */
-  set = single_set (insn);
-  if (!set)
+  /* Look for a simple register assignment.  We don't use single_set here
+     because we can't deal with any CLOBBERs, USEs, or REG_UNUSED secondary
+     destinations.  */
+  if (!INSN_P (insn))
+    return false;
+  set = PATTERN (insn);
+  if (GET_CODE (set) != SET)
     return false;
   src = SET_SRC (set);
   dest = SET_DEST (set);
 
-  if (!REG_P (src))
+  /* For the destination, we want only a register.  Also disallow STACK
+     or FRAME related adjustments.  They are likely part of the prologue,
+     so keep them in the entry block.  */
+  if (!REG_P (dest)
+      || dest == stack_pointer_rtx
+      || dest == frame_pointer_rtx
+      || dest == hard_frame_pointer_rtx)
+    return false;
+
+  /* For the source, we want one of:
+     (1) A (non-overlapping) register
+     (2) A constant,
+     (3) An expression involving no more than one register.
+
+     That last point comes from the code following, which was originally
+     written to handle only register move operations, and still only
+     handles a single source register when checking for overlaps.  Happily,
+     the same checks can be applied to expressions like (plus reg const).  */
+
+  if (CONSTANT_P (src))
+    ;
+  else if (!REG_P (src))
     {
-      unsigned int reg_num = 0;
-      unsigned int nonconstobj_num = 0;
       rtx src_inner = NULL_RTX;
 
       if (can_throw_internal (insn))
@@ -196,30 +219,50 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
       FOR_EACH_SUBRTX_VAR (iter, array, src, ALL)
 	{
 	  rtx x = *iter;
-	  if (REG_P (x))
+	  switch (GET_RTX_CLASS (GET_CODE (x)))
 	    {
-	      reg_num++;
-	      src_inner = x;
+	    case RTX_CONST_OBJ:
+	    case RTX_COMPARE:
+	    case RTX_COMM_COMPARE:
+	    case RTX_BIN_ARITH:
+	    case RTX_COMM_ARITH:
+	    case RTX_UNARY:
+	    case RTX_TERNARY:
+	      /* Constant or expression.  Continue.  */
+	      break;
+
+	    case RTX_OBJ:
+	    case RTX_EXTRA:
+	      switch (GET_CODE (x))
+		{
+		case UNSPEC:
+		case SUBREG:
+		case STRICT_LOW_PART:
+		case PC:
+		  /* Ok.  Continue.  */
+		  break;
+
+		case REG:
+		  /* Fail if we see a second inner register.  */
+		  if (src_inner != NULL)
+		    return false;
+		  src_inner = x;
+		  break;
+
+		default:
+		  return false;
+		}
+	      break;
+
+	    default:
+	      return false;
 	    }
-	  else if (!CONSTANT_P (x) && OBJECT_P (x))
-	    nonconstobj_num++;
 	}
-      if (nonconstobj_num > 0
-
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 10/10/14 16:59, Richard Henderson wrote:
> On 10/08/2014 08:31 AM, Jiong Wang wrote:
>> Ping ~
>>
>> And as there is a NONDEBUG_INSN_P check before
>> move_insn_for_shrink_wrap is invoked, we could avoid creating a new
>> wrapper function by invoking single_set_2 directly.
>
> I'm committing the following to fix this.
>
>   (1) Don't bother modifying single_set; just look for a bare SET.
>   (2) Tighten the set of expressions we're willing to move.
>   (3) Use a direct "return false" in the failure case, rather than
>       counting a non-zero number of non-constants, etc.
>
> Tested on x86_64, and against Andi's test case (unfortunately
> unreduced).
>
> r~

A minor nit: after this patch, gcc.target/aarch64/shrink_wrap_symbol_ref_1.c
is still not shrink-wrapped under -mabi=ilp32.  The reason is, as Pinski
reported, that LO_SUM is in the RTX_OBJ class, while it could be treated
as an expression.  In your fix, if it's RTX_OBJ and LO_SUM, then the move
is aborted by the early return false.

Anyway, thanks for the fix.  I verified that the shrink-wrapping of some
hot functions in benchmarks on aarch64 is not affected.

Regards,
Jiong
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 10/10/2014 09:39 AM, Jiong Wang wrote:
>> (1) Don't bother modifying single_set; just look for a bare SET.
>> (2) Tighten the set of expressions we're willing to move.
>> (3) Use a direct "return false" in the failure case, rather than
>>     counting a non-zero number of non-constants, etc.
>>
>> Tested on x86_64, and against Andi's test case (unfortunately
>> unreduced).
>
> A minor nit: after this patch,
> gcc.target/aarch64/shrink_wrap_symbol_ref_1.c is still not
> shrink-wrapped under -mabi=ilp32.  The reason is, as Pinski reported,
> that LO_SUM is in the RTX_OBJ class, while it could be treated as an
> expression.  In your fix, if it's RTX_OBJ and LO_SUM, then the move is
> aborted by the early return false.

I missed that message.  Interesting.

I wonder what kind of fallout there would be from changing LO_SUM to
RTX_BIN_ARITH, which is what it should have been all along.  My guess is
that someone looked at HIGH being RTX_CONST_OBJ and thought that LO_SUM
should also be a kind of object too.  But it's really a lot more like a
kind of PLUS.  If instead we had a LOW to match HIGH it would have been

  (plus reg (low symbol))

and thus more obvious.

I'll see if I can bootstrap such a change on e.g. sparc or ppc32, which
use lo_sum heavily.

> anyway, thanks for the fix, I verified that the shrink-wrapping of
> some hot functions in benchmarks on aarch64 is not affected.

Thanks for the additional testing.

r~
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 10/10/14 10:52, Richard Henderson wrote:
> I missed that message.  Interesting.
>
> I wonder what kind of fallout there would be from changing LO_SUM to
> RTX_BIN_ARITH, which is what it should have been all along.  My guess
> is that someone looked at HIGH being RTX_CONST_OBJ and thought that
> LO_SUM should also be a kind of object too.

Most likely.  One could even make an argument that LO_SUM should be a
constant, but that's a more intrusive change.

> But it's really a lot more like a kind of PLUS.  If instead we had a
> LOW to match HIGH it would have been

Right.  In fact, I believe at the hardware level it's typically
implemented as addition.

Jeff
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On Fri, Oct 10, 2014 at 11:00:54AM -0600, Jeff Law wrote:
>> But it's really a lot more like a kind of PLUS.  If instead we had a
>> LOW to match HIGH it would have been
>
> Right.  In fact, I believe at the hardware level it's typically
> implemented as addition.

Can be addition or bitwise or; as HIGH should have some low bits zero,
both usually work the same.

	Jakub
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 10/10/14 11:04, Jakub Jelinek wrote:
> On Fri, Oct 10, 2014 at 11:00:54AM -0600, Jeff Law wrote:
>>> But it's really a lot more like a kind of PLUS.  If instead we had a
>>> LOW to match HIGH it would have been
>> Right.  In fact, I believe at the hardware level it's typically
>> implemented as addition.
> Can be addition or bitwise or; as HIGH should have some low bits zero,
> both usually work the same.

It can be bitwise-or on some architectures, but I believe it's typically
implemented as addition.  There's also typically an overlap between the
high and low parts when building up addresses.  GCC actually takes
advantage of that overlap to reduce unnecessary HIGH expressions.

And you can always have an oddball architecture like the PA where the
LO_SUM does something utterly braindead.  It looks like this:

  addil %r27,symbolic nonsense

You might think %r27 holds the high bits of the address...  Umm, no it
doesn't.  There's an implicit %r1 source/destination operand (which
holds the high bits).  So what this really does is add %r27, %r1 and the
symbolic constant.  %r27 is a data pointer.  Obviously the implicit
operand is used to get more bits holding the symbolic constant in the
instruction.

If that's not bad enough, if the object is in read-only memory, then the
linker will modify the instruction to change %r27 to %r0 (hardwired 0
constant).  But I'm getting off-topic here :-)

jeff
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 10/10/2014 10:21 AM, Jeff Law wrote:
> On 10/10/14 11:04, Jakub Jelinek wrote:
>> On Fri, Oct 10, 2014 at 11:00:54AM -0600, Jeff Law wrote:
>>>> But it's really a lot more like a kind of PLUS.  If instead we had
>>>> a LOW to match HIGH it would have been
>>> Right.  In fact, I believe at the hardware level it's typically
>>> implemented as addition.
>> Can be addition or bitwise or, as HIGH should have some low bits
>> zero, both usually work the same.
> It can be bitwise-or on some architectures...

Just to be clear, the LOW rtx code I mentioned upthread was for
illustration purposes only.  I only intend to change the rtx class of
the existing LO_SUM.  :-)

r~
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 10/10/14 11:25, Richard Henderson wrote:
> On 10/10/2014 10:21 AM, Jeff Law wrote:
>> On 10/10/14 11:04, Jakub Jelinek wrote:
>>> On Fri, Oct 10, 2014 at 11:00:54AM -0600, Jeff Law wrote:
>>>>> But it's really a lot more like a kind of PLUS.  If instead we had
>>>>> a LOW to match HIGH it would have been
>>>> Right.  In fact, I believe at the hardware level it's typically
>>>> implemented as addition.
>>> Can be addition or bitwise or, as HIGH should have some low bits
>>> zero, both usually work the same.
>> It can be bitwise-or on some architectures...
> Just to be clear, the LOW rtx code I mentioned upthread was for
> illustration purposes only.  I only intend to change the rtx class of
> the existing LO_SUM.

That was my expectation :-)

jeff
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 10/10/2014 09:52 AM, Richard Henderson wrote:
> I wonder what kind of fallout there would be from changing LO_SUM to
> RTX_BIN_ARITH, which is what it should have been all along.

The answer is: a lot.  Mostly throughout simplify-rtx.c, wherein we'd
have to move all sorts of code around to accommodate the class change.

I've abandoned that line and I'm just testing a one-line change to
shrink-wrap.c to accept LO_SUM.

r~
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 30/09/14 19:36, Jiong Wang wrote:
> 2014-09-30 17:30 GMT+01:00 Jeff Law <l...@redhat.com>:
>> On 09/30/14 08:37, Jiong Wang wrote:
>>> On 30/09/14 05:21, Jeff Law wrote:
>>>> I do agree with Richard that it would be useful to see the insns
>>>> that are incorrectly sunk and the surrounding context.
>> So I must be missing something.  I thought the shrink-wrapping code
>> wouldn't sink arithmetic/logical insns like we see with insn 14 and
>> insn 182.  I thought it was limited to reg-reg copies and constant
>> initializations.
> yes, it was limited to reg-reg copies, and my previous sink
> improvement aimed to sink any rtx that:
>
>   A: is a single_set
>   B: has a src operand that is any combination of no more than one
>      register and no non-constant objects.
>
> while some operators like shift may have side effects.  IMHO, all side
> effects are reflected in the RTX; together with this
> fail_on_clobber_use modification, the rtx returned by
> single_set_no_clobber_use is safe to sink if it meets the above limit
> B and passes the later register use/def check in
> move_insn_for_shrink_wrap?

Ping ~

And as there is a NONDEBUG_INSN_P check before move_insn_for_shrink_wrap
is invoked, we could avoid creating a new wrapper function by invoking
single_set_2 directly.

comments?

bootstrap ok on x86-64, and no regression on check-gcc/g++.  aarch64
bootstrapping/regression testing is ongoing.

2014-10-08  Jiong Wang  <jiong.w...@arm.com>

	* rtl.h (single_set_2): New parameter fail_on_clobber_use.
	(single_set): Likewise.
	* config/ia64/ia64.c (ia64_single_set): Likewise.
	* rtlanal.c (single_set_2): Return NULL_RTX if fail_on_clobber_use
	be true.
	* shrink-wrap.c (move_insn_for_shrink_wrap): Use single_set_2.

Regards,
Jiong

diff --git a/gcc/config/ia64/ia64.c b/gcc/config/ia64/ia64.c
index 9337be1..09d3c4a 100644
--- a/gcc/config/ia64/ia64.c
+++ b/gcc/config/ia64/ia64.c
@@ -7172,7 +7172,7 @@ ia64_single_set (rtx_insn *insn)
       break;
 
     default:
-      ret = single_set_2 (insn, x);
+      ret = single_set_2 (insn, x, false);
       break;
     }
 
diff --git a/gcc/rtl.h b/gcc/rtl.h
index e73f731..c0b5bf5 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -2797,7 +2797,7 @@ extern void set_insn_deleted (rtx);
 
 /* Functions in rtlanal.c */
 
-extern rtx single_set_2 (const rtx_insn *, const_rtx);
+extern rtx single_set_2 (const rtx_insn *, const_rtx, bool fail_on_clobber_use);
 
 /* Handle the cheap and common cases inline for performance.  */
 
@@ -2810,7 +2810,7 @@ inline rtx single_set (const rtx_insn *insn)
     return PATTERN (insn);
 
   /* Defer to the more expensive case.  */
-  return single_set_2 (insn, PATTERN (insn));
+  return single_set_2 (insn, PATTERN (insn), false);
 }
 
 extern enum machine_mode get_address_mode (rtx mem);
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 3063458..7d6ed27 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -1182,7 +1182,7 @@ record_hard_reg_uses (rtx *px, void *data)
    will not be used, which we ignore.  */
 
 rtx
-single_set_2 (const rtx_insn *insn, const_rtx pat)
+single_set_2 (const rtx_insn *insn, const_rtx pat, bool fail_on_clobber_use)
 {
   rtx set = NULL;
   int set_verified = 1;
@@ -1197,6 +1197,8 @@ single_set_2 (const rtx_insn *insn, const_rtx pat)
 	{
 	case USE:
 	case CLOBBER:
+	  if (fail_on_clobber_use)
+	    return NULL_RTX;
 	  break;
 
 	case SET:
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index b1ff8a2..a3b57b6 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -177,7 +177,8 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
   edge live_edge;
 
   /* Look for a simple register copy.  */
-  set = single_set (insn);
+  set = (GET_CODE (PATTERN (insn)) == SET ? PATTERN (insn)
+	 : single_set_2 (insn, PATTERN (insn), true));
   if (!set)
     return false;
   src = SET_SRC (set);
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 29/09/14 19:32, Richard Henderson wrote:
> On 09/29/2014 11:12 AM, Jiong Wang wrote:
>> +inline rtx single_set_no_clobber_use (const rtx_insn *insn)
>> +{
>> +  if (!INSN_P (insn))
>> +    return NULL_RTX;
>> +
>> +  if (GET_CODE (PATTERN (insn)) == SET)
>> +    return PATTERN (insn);
>> +
>> +  /* Defer to the more expensive case, and return NULL_RTX if there is
>> +     USE or CLOBBER.  */
>> +  return single_set_2 (insn, PATTERN (insn), true);
>> +}
>
> What more expensive case?  If you're disallowing USE and CLOBBER, then
> single_set is just GET_CODE == SET.
>
> I think this function is somewhat useless, and should not be added.  An
> adjustment to move_insn_for_shrink_wrap may be reasonable though.
>
> I haven't tried to understand the miscompilation yet.  I can imagine
> that this would disable quite a bit of shrink wrapping for x86 though.
> Can we do better in understanding when the clobbered register is live
> at the location to which we'd like to move the insns?
>
> r~

I think part of the problem is in the naming of single_set().  From the
name it's not entirely obvious to users that this includes insns that
clobber registers or which write other registers that are unused after
that point.  I've previously had to fix a bug where this assumption was
made (eg https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54300).

Most uses of single_set prior to register allocation are probably safe;
but later uses are fraught with potential problems of this nature and
may well be bugs waiting to happen.

R.
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 30/09/14 05:21, Jeff Law wrote:
> On 09/29/14 13:24, Jiong Wang wrote:
>> I don't think so.  From the x86-64 bootstrap, there is no regression
>> in the number of functions shrink-wrapped.  Actually speaking,
>> previously only a single "mov dest, src" was handled, so disallowing
>> USE/CLOBBER will not rule out any shrink-wrap opportunity which was
>> allowed previously.
> This is the key, of course.  shrink-wrapping is very restrictive in
> its ability to sink insns.  The only forms it'll currently shrink are
> simple moves.  Arithmetic, logicals, etc are left alone.  Thus
> disallowing USE/CLOBBER does not impact the x86 port in any
> significant way.

yes, and we could get +1.22% (2567 compared with 2536) functions
shrink-wrapped after we sink more insns than the simple "mov dest, src"
on the x86-64 bootstrap.  and I remember a similar percentage on a glibc
build.  while on aarch64, the overall number of functions shrink-wrapped
increased +25% on some programs.  maybe we could gain the same on other
RISC backends.

> I do agree with Richard that it would be useful to see the insns that
> are incorrectly sunk and the surrounding context.

insn 14 and 182 are sunk incorrectly.  below are the details.

(insn 14 173 174 2 (parallel [
            (set (reg:QI 37 r8 [orig:86 D.32480 ] [86])
                (lshiftrt:QI (reg:QI 37 r8 [orig:86 D.32480 ] [86])
                    (const_int 2 [0x2])))
            (clobber (reg:CC 17 flags))
        ]) /home/andi/lsrc/linux/block/blk-flush2.c:50 547 {*lshrqi3_1}
     (expr_list:REG_EQUAL (lshiftrt:QI (mem:QI (plus:DI (reg/v/f:DI 43 r14 [orig:85 q ] [85])
                    (const_int 1612 [0x64c])) [20 *q_7+1612 S1 A32])
            (const_int 2 [0x2]))
        (nil)))

(insn 174 14 182 2 (set (reg:QI 44 r15 [orig:86 D.32480 ] [86])
        (reg:QI 37 r8 [orig:86 D.32480 ] [86]))
     /home/andi/lsrc/linux/block/blk-flush2.c:50 93 {*movqi_internal}
     (nil))

(insn 182 174 16 2 (parallel [
            (set (reg:SI 44 r15 [orig:86 D.32480 ] [86])
                (and:SI (reg:SI 44 r15 [orig:86 D.32480 ] [86])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) /home/andi/lsrc/linux/block/blk-flush2.c:50 376 {*andsi_1}
     (nil))
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 09/30/14 08:37, Jiong Wang wrote:
> On 30/09/14 05:21, Jeff Law wrote:
>> This is the key, of course.  shrink-wrapping is very restrictive in
>> its ability to sink insns.  The only forms it'll currently shrink are
>> simple moves.  Arithmetic, logicals, etc are left alone.  Thus
>> disallowing USE/CLOBBER does not impact the x86 port in any
>> significant way.
> yes, and we could get +1.22% (2567 compared with 2536) functions
> shrink-wrapped after we sink more insns than the simple "mov dest,
> src" on the x86-64 bootstrap.  and I remember a similar percentage on
> a glibc build.  while on aarch64, the overall number of functions
> shrink-wrapped increased +25% on some programs.  maybe we could gain
> the same on other RISC backends.
>> I do agree with Richard that it would be useful to see the insns that
>> are incorrectly sunk and the surrounding context.

So I must be missing something.  I thought the shrink-wrapping code
wouldn't sink arithmetic/logical insns like we see with insn 14 and insn
182.  I thought it was limited to reg-reg copies and constant
initializations.

Jeff
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 09/30/14 08:15, Richard Earnshaw wrote:
> I think part of the problem is in the naming of single_set().  From
> the name it's not entirely obvious to users that this includes insns
> that clobber registers or which write other registers that are unused
> after that point.  I've previously had to fix a bug where this
> assumption was made (eg
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54300)
>
> Most uses of single_set prior to register allocation are probably
> safe; but later uses are fraught with potential problems of this
> nature and may well be bugs waiting to happen.

Very possibly.  There's a bit of a natural tension here in that often we
don't much care about the additional CLOBBERs, but when we get it wrong,
obviously it's bad.

I haven't done any research, but I suspect the change to ignore clobbers
in single_set came in as part of exposing the CC register and avoiding
regressions all over the place as a result.

I wonder what would happen if we ignored them prior to register
allocation, then rejected insns with those CLOBBERs once register
allocation started.

Jeff
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 30/09/14 17:45, Jeff Law wrote:
> On 09/30/14 08:15, Richard Earnshaw wrote:
>> I think part of the problem is in the naming of single_set().  From
>> the name it's not entirely obvious to users that this includes insns
>> that clobber registers or which write other registers that are unused
>> after that point.  I've previously had to fix a bug where this
>> assumption was made (eg
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54300)
>>
>> Most uses of single_set prior to register allocation are probably
>> safe; but later uses are fraught with potential problems of this
>> nature and may well be bugs waiting to happen.
> Very possibly.  There's a bit of a natural tension here in that often
> we don't much care about the additional CLOBBERs, but when we get it
> wrong, obviously it's bad.
>
> I haven't done any research, but I suspect the change to ignore
> clobbers in single_set came in as part of exposing the CC register and
> avoiding regressions all over the place as a result.

It's not just clobbers; it ignores patterns like

  (parallel [(set (a) (...)) (set (b) (...))])

with a (reg_note (reg_unused (b))), which is probably fine before
register allocation but definitely something you have to think about
afterwards.

> I wonder what would happen if we ignored them prior to register
> allocation, then rejected insns with those CLOBBERs once register
> allocation started.

Might work; but it might miss some useful cases as well...

R.
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
2014-09-30 17:30 GMT+01:00 Jeff Law <l...@redhat.com>:
> On 09/30/14 08:37, Jiong Wang wrote:
>> On 30/09/14 05:21, Jeff Law wrote:
>>> On 09/29/14 13:24, Jiong Wang wrote:
>>>> I don't think so.  From the x86-64 bootstrap, there is no
>>>> regression in the number of functions shrink-wrapped.  Actually
>>>> speaking, previously only a single "mov dest, src" was handled, so
>>>> disallowing USE/CLOBBER will not rule out any shrink-wrap
>>>> opportunity which was allowed previously.
>>> This is the key, of course.  shrink-wrapping is very restrictive in
>>> its ability to sink insns.  The only forms it'll currently shrink
>>> are simple moves.  Arithmetic, logicals, etc are left alone.  Thus
>>> disallowing USE/CLOBBER does not impact the x86 port in any
>>> significant way.
>> yes, and we could get +1.22% (2567 compared with 2536) functions
>> shrink-wrapped after we sink more insns than the simple "mov dest,
>> src" on the x86-64 bootstrap.  and I remember a similar percentage
>> on a glibc build.  while on aarch64, the overall number of functions
>> shrink-wrapped increased +25% on some programs.  maybe we could gain
>> the same on other RISC backends.
>>> I do agree with Richard that it would be useful to see the insns
>>> that are incorrectly sunk and the surrounding context.
> So I must be missing something.  I thought the shrink-wrapping code
> wouldn't sink arithmetic/logical insns like we see with insn 14 and
> insn 182.  I thought it was limited to reg-reg copies and constant
> initializations.

yes, it was limited to reg-reg copies, and my previous sink improvement
aimed to sink any rtx that:

  A: is a single_set
  B: has a src operand that is any combination of no more than one
     register and no non-constant objects.

while some operators like shift may have side effects.  IMHO, all side
effects are reflected in the RTX; together with this fail_on_clobber_use
modification, the rtx returned by single_set_no_clobber_use is safe to
sink if it meets the above limit B and passes the later register use/def
check in move_insn_for_shrink_wrap?

Regards,
Jiong
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On Tue, Sep 30, 2014 at 6:51 PM, Richard Earnshaw wrote:
> It's not just clobbers; it ignores patterns like
>
>   (parallel [(set (a) (...)) (set (b) (...))])
>
> with a (reg_note (reg_unused (b))), which is probably fine before
> register allocation but definitely something you have to think about
> afterwards.

Even before RA this isn't always fine.  We have checks for
!multiple_sets for this.

Ciao!
Steven
[PATCH] PR63404, gcc 5 miscompiles linux block layer
It's exposed by the linux kernel for x86.

The root cause is that the current single_set will ignore CLOBBER and
USE, while we need to take them into account when handling shrink-wrap.

This patch adds one parameter to single_set_2 to support returning
NULL_RTX if we want to rule out any side-effect, and adds a new helper
function, single_set_no_clobber_use.

Passes x86-64 bootstrap and check-gcc/g++; I also manually checked that
the issue reported in 63404 has gone away.  Also no regression on the
aarch64-none-elf regression test.

comments?

thanks.

2014-09-26  Jiong Wang  <jiong.w...@arm.com>

	* rtl.h (single_set_no_clobber_use): New function.
	(single_set_2): New parameter fail_on_clobber_use.
	(single_set): Likewise.
	* config/ia64/ia64.c (ia64_single_set): Likewise.
	* rtlanal.c (single_set_2): Return NULL_RTX if fail_on_clobber_use
	be true.
	* shrink-wrap.c (move_insn_for_shrink_wrap): Use
	single_set_no_clobber_use.

diff --git a/gcc/config/ia64/ia64.c b/gcc/config/ia64/ia64.c
index 9337be1..09d3c4a 100644
--- a/gcc/config/ia64/ia64.c
+++ b/gcc/config/ia64/ia64.c
@@ -7172,7 +7172,7 @@ ia64_single_set (rtx_insn *insn)
       break;
 
     default:
-      ret = single_set_2 (insn, x);
+      ret = single_set_2 (insn, x, false);
       break;
     }
 
diff --git a/gcc/rtl.h b/gcc/rtl.h
index e73f731..7c40d5a 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -2797,7 +2797,7 @@ extern void set_insn_deleted (rtx);
 
 /* Functions in rtlanal.c */
 
-extern rtx single_set_2 (const rtx_insn *, const_rtx);
+extern rtx single_set_2 (const rtx_insn *, const_rtx, bool fail_on_clobber_use);
 
 /* Handle the cheap and common cases inline for performance.  */
 
@@ -2810,7 +2810,20 @@ inline rtx single_set (const rtx_insn *insn)
     return PATTERN (insn);
 
   /* Defer to the more expensive case.  */
-  return single_set_2 (insn, PATTERN (insn));
+  return single_set_2 (insn, PATTERN (insn), false);
+}
+
+inline rtx single_set_no_clobber_use (const rtx_insn *insn)
+{
+  if (!INSN_P (insn))
+    return NULL_RTX;
+
+  if (GET_CODE (PATTERN (insn)) == SET)
+    return PATTERN (insn);
+
+  /* Defer to the more expensive case, and return NULL_RTX if there is
+     USE or CLOBBER.  */
+  return single_set_2 (insn, PATTERN (insn), true);
 }
 
 extern enum machine_mode get_address_mode (rtx mem);
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 3063458..cb5e36a 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -1182,7 +1182,7 @@ record_hard_reg_uses (rtx *px, void *data)
    will not be used, which we ignore.  */
 
 rtx
-single_set_2 (const rtx_insn *insn, const_rtx pat)
+single_set_2 (const rtx_insn *insn, const_rtx pat, bool fail_on_clobber_use)
 {
   rtx set = NULL;
   int set_verified = 1;
@@ -1197,6 +1197,8 @@ single_set_2 (const rtx_insn *insn, const_rtx pat)
 	{
 	case USE:
 	case CLOBBER:
+	  if (fail_on_clobber_use)
+	    return NULL_RTX;
 	  break;
 
 	case SET:
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index b1ff8a2..5624ef7 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -177,7 +177,7 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
   edge live_edge;
 
   /* Look for a simple register copy.  */
-  set = single_set (insn);
+  set = single_set_no_clobber_use (insn);
   if (!set)
     return false;
   src = SET_SRC (set);
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 09/29/2014 11:12 AM, Jiong Wang wrote:
> +inline rtx single_set_no_clobber_use (const rtx_insn *insn)
> +{
> +  if (!INSN_P (insn))
> +    return NULL_RTX;
> +
> +  if (GET_CODE (PATTERN (insn)) == SET)
> +    return PATTERN (insn);
> +
> +  /* Defer to the more expensive case, and return NULL_RTX if there is
> +     USE or CLOBBER.  */
> +  return single_set_2 (insn, PATTERN (insn), true);
> +}

What more expensive case?  If you're disallowing USE and CLOBBER, then
single_set is just GET_CODE == SET.

I think this function is somewhat useless, and should not be added.  An
adjustment to move_insn_for_shrink_wrap may be reasonable though.

I haven't tried to understand the miscompilation yet.  I can imagine
that this would disable quite a bit of shrink wrapping for x86 though.
Can we do better in understanding when the clobbered register is live at
the location to which we'd like to move the insns?

r~
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 29/09/14 19:32, Richard Henderson wrote:
> On 09/29/2014 11:12 AM, Jiong Wang wrote:
>> +inline rtx single_set_no_clobber_use (const rtx_insn *insn)
>> +{
>> +  if (!INSN_P (insn))
>> +    return NULL_RTX;
>> +
>> +  if (GET_CODE (PATTERN (insn)) == SET)
>> +    return PATTERN (insn);
>> +
>> +  /* Defer to the more expensive case, and return NULL_RTX if there is
>> +     USE or CLOBBER.  */
>> +  return single_set_2 (insn, PATTERN (insn), true);
>> +}

Richard, thanks for the review.

> What more expensive case?

single_set_no_clobber_use is just a clone of single_set; I copied the
comments with only minor modifications.  I think the "more expensive
case" here means the case where there is a PARALLEL and we need to check
the inner rtxes.

> If you're disallowing USE and CLOBBER, then single_set is just
> GET_CODE == SET.
>
> I think this function is somewhat useless, and should not be added.  An
> adjustment to move_insn_for_shrink_wrap may be reasonable though.
>
> I haven't tried to understand the miscompilation yet.  I can imagine
> that this would disable quite a bit of shrink wrapping for x86 though.

I don't think so.  From the x86-64 bootstrap, there is no regression in
the number of functions shrink-wrapped.  Actually speaking, previously
only a single "mov dest, src" was handled, so disallowing USE/CLOBBER
will not rule out any shrink-wrap opportunity which was allowed
previously.

And I am afraid that if we don't reuse single_set_2, there will be
another loop to check all those inner rtxes which single_set_2 already
checks.  So, IMHO, just modifying single_set_2 will be more efficient.

> Can we do better in understanding when the clobbered register is live
> at the location to which we'd like to move the insns?

Currently, the generic code in move_insn_for_shrink_wrap only handles
dest/src being a single register, so if there is a clobber or use, we
might need to check multiple regs, and there might be a few
modifications needed.  I think that's better done after all single
dest/src issues are fixed.

--
Jiong
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On Mon, 2014-09-29 at 20:24 +0100, Jiong Wang wrote:
> On 29/09/14 19:32, Richard Henderson wrote:
>> On 09/29/2014 11:12 AM, Jiong Wang wrote:
>>> +inline rtx single_set_no_clobber_use (const rtx_insn *insn)
>>> +{
>>> +  if (!INSN_P (insn))
>>> +    return NULL_RTX;
>>> +
>>> +  if (GET_CODE (PATTERN (insn)) == SET)
>>> +    return PATTERN (insn);
>>> +
>>> +  /* Defer to the more expensive case, and return NULL_RTX if there
>>> +     is USE or CLOBBER.  */
>>> +  return single_set_2 (insn, PATTERN (insn), true);
>>> +}
>
> Richard, thanks for the review.
>
>> What more expensive case?
>
> single_set_no_clobber_use is just a clone of single_set; I copied the
> comments with only minor modifications.

I introduced that comment to single_set in r215089, when making
single_set into an inline function (so that it could check that it
received an rtx_insn *, rather than an rtx).

> I think the "more expensive case" here means the case where there is a
> PARALLEL and we need to check the inner rtxes.

My comment may have been misleading, sorry.

IIRC, what I was thinking of was that the old implementation had split
single_set into a macro and a function.  This was by Honza (CCed), 14
years ago to the day, back in r36664 (on 2000-09-29):
https://gcc.gnu.org/ml/gcc-patches/2000-09/msg00893.html

  /* Single set is implemented as macro for performance reasons.  */
  #define single_set(I) (INSN_P (I) \
			 ? (GET_CODE (PATTERN (I)) == SET \
			    ? PATTERN (I) : single_set_1 (I)) \
			 : NULL_RTX)

I think by "the more expensive case" I meant having to make a function
call to handle the less-common cases (which indeed covers the PARALLEL
case), rather than having the logic inline; preserving that inlined vs
not-inlined split was one of my aims for r215089.

Perhaps it should be rewritten to "Defer to a function call to handle
the less common cases", or somesuch?

[...snip rest of post...]

Dave
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On 09/29/14 13:24, Jiong Wang wrote:
> I don't think so.  From the x86-64 bootstrap, there is no regression
> in the number of functions shrink-wrapped.  Actually speaking,
> previously only a single "mov dest, src" was handled, so disallowing
> USE/CLOBBER will not rule out any shrink-wrap opportunity which was
> allowed previously.

This is the key, of course.  shrink-wrapping is very restrictive in its
ability to sink insns.  The only forms it'll currently shrink are simple
moves.  Arithmetic, logicals, etc are left alone.  Thus disallowing
USE/CLOBBER does not impact the x86 port in any significant way.

I do agree with Richard that it would be useful to see the insns that
are incorrectly sunk and the surrounding context.

Jeff