Re: [PR80693] drop value of parallel SETs dropped by combine
Hi! On Thu, Dec 07, 2017 at 07:01:03PM -0200, Alexandre Oliva wrote: [ Big snip; thanks for all the detailed information! ] > Still, I'm concerned I haven't caught all of the cases in which > adjusting REG_N_SETS would be needed: we might have dropped multiple > SETs of the same pseudo, each with its own REG_UNUSED note (say, from > all of i3, i2, i1, and i0), and the current logic will only decrement > REG_N_SETS once, and only if i2 no longer sets the register. The only thing that matters for combine is that if REG_N_SETS == 1, then it must be correct. So "forgetting" to decrement is okay, forgetting to increment is not. > > It wasn't obvious to me (this code is horribly complicated). Whether > > all existing code is correct... it's probably best not to look too > > closely :-/ > > You're right about its being horribly complicated. I would like to remove all regstat things from combine, use "real" DF instead. This will simplify many things, and allow better optimisation as well. But is it fast enough? We'll see. > > If you have a patch you feel confident in, could you post it again > > please? > > So let me tell you how I feel about this. It has waited long enough, > and there are at least 3 bugs known to be fixed by the first very simple > patch below. The catch is that it doesn't adjust REG_N_SETS at all (we > didn't before the patch, and that didn't seem to hurt too much). I've > regstrapped this successfully on x86_64-linux-gnu and i686-linux-gnu. > PR rtl-optimization/80693 > PR rtl-optimization/81019 > PR rtl-optimization/81020 > * combine.c (distribute_notes): Reset any REG_UNUSED REGs that > are not mentioned in i3. Place the REG_UNUSED note on i2, > possibly modified to REG_DEAD, if it did not originate in i3. > > for gcc/testsuite/ChangeLog > > PR rtl-optimization/80693 > PR rtl-optimization/81019 > PR rtl-optimization/81020 > * gcc.dg/pr80693.c: New. > * gcc.dg/pr81019.c: New. > But then it occurred to me that the REG_UNUSED register from i1 might > have *also* been set in i2, and we might then have used it as a scratch > register for split, while the set that the REG_UNUSED pertained to might > have been completely discarded from a parallel in i1 or i0. And then I > wasn't unsure whether or not to decrement REG_N_SETS. [ more snip ] > Then I realized we might have to also reset the cached value when it's > referenced in i3, and possibly also adjust the counts. And also some of > that when it's set in i3. Then I lost it. Help? :-) > > FWIW, I'd be glad just installing the patch between --->cut<---s, or > even a simpler version thereof that just resets the recorded value and > doesn't even attempt to place notes in i2, to get the bugs fixed while > we sort out the really tricky issues of adjusting REG_N_SETS. The > (incomplete, but functional) fix has been known for a while, and we > shouldn't subject users to wrong code when we can help it, even if we > might miss optimization opportunities for that, right? Thoughts? Yes, that first patch is okay for trunk. Thanks for all the work on this! I don't think this patch makes anything worse, and it does make some things better. Segher
Re: [PR80693] drop value of parallel SETs dropped by combine
Hi Alexandre, On 07/12/17 21:01, Alexandre Oliva wrote: On Jul 7, 2017, Segher Boessenkoolwrote: > I meant, just double check if your new > code does the correct thing for the set count. Sorry this took me so long to get back to. Even this was difficult for me to answer for sure, then and now. We don't (and can't?) know whether the REG_UNUSED note originally pertained to a clobber or a set, but I think that doesn't matter: unless we've reused the REG in i2 as a scratch, I think we should decrement the set count, because we used to have a SET or a CLOBBER that is now gone. Looking back at the issue, I realized we should keep/add the REG_UNUSED note to i2, if the reg is mentioned in i2, possibly turned into REG_DEAD if i2 is referenced but not set. Still, I'm concerned I haven't caught all of the cases in which adjusting REG_N_SETS would be needed: we might have dropped multiple SETs of the same pseudo, each with its own REG_UNUSED note (say, from all of i3, i2, i1, and i0), and the current logic will only decrement REG_N_SETS once, and only if i2 no longer sets the register. I'm also concerned that the logic for when the REG is set or referenced in i3 is incomplete: references in i3 might have come from any of the previous insns, even if intervening sets of the same register were substituted and thus removed. Consider the following nightmarish scenario: i0: (parallel [(set (reg CC) (op1 (reg A))) (set (reg A) (plus (reg A) (const_int 1)))]) (REG_UNUSED (reg A)) i1: (set (reg A) (ne (reg CC) (const_int 0))) (REG_DEAD (reg CC)) i2: (parallel [(set (reg CC) (op2 (reg A))) (set (reg A) (plus (reg A) (const_int 1)))]) (REG_UNUSED (reg A))) i3: (set (reg A) (eq (reg CC) (const_int 0))) (REG_DEAD (reg CC)) we might turn that into say: i2: (set (reg CC) (op3 (reg A))) (REG_DEAD (reg A)) i3: (set (reg A) (op4 (reg CC))) (REG_DEAD (reg CC)) and now we'd have to somehow figure out that we're to discount the two unused sets of reg A, those from i0 and i2, and to turn either REG_UNUSED note about reg A into a REG_DEAD note to be placed at i2. A is set at i3, so combine should record its new value, but if it's computed in terms of the scratch CC and a much-older A, will we get the correct value? Or is the value unchanged because it's the output of the latest insn? Now, consider this slightly simpler scenario (trying to combine only the first 3 insns): i0: nil i1: (parallel [(set (reg CC) (op1 (reg A))) (set (reg A) (plus (reg A) (const_int 1)))]) (REG_UNUSED (reg A)) i2: (set (reg A) (ne (reg CC) (const_int 0))) (REG_DEAD (reg CC)) i3: (parallel [(set (reg CC) (op2 (reg A))) (set (reg A) (plus (reg A) (const_int 1)))]) (REG_UNUSED (reg A))) this might combine into: i2: (set (reg A) (op5 (reg A))) i3: (set (reg CC) (op6 (reg A))) (REG_DEAD (reg A)) and now we have removed 3 sets to A, but added 1 by splitting within combine using A as scratch. Would we then have to figure out that for each of the REG_UNUSED notes pertaining to A we have to drop the REG_N_SETS count by 1, although A remains used in i3, and set and used in i2? I don't see how. I see that combine would record the value for reg A at i2 in this case, but would it express it in terms of which earlier value of reg A? Shouldn't we have reset it while placing notes in this case too? > It wasn't obvious to me (this code is horribly complicated). Whether > all existing code is correct... it's probably best not to look too > closely :-/ You're right about its being horribly complicated. Maybe I should go about it incrementally. > If you have a patch you feel confident in, could you post it again > please? So let me tell you how I feel about this. It has waited long enough, and there are at least 3 bugs known to be fixed by the first very simple patch below. The catch is that it doesn't adjust REG_N_SETS at all (we didn't before the patch, and that didn't seem to hurt too much). I've regstrapped this successfully on x86_64-linux-gnu and i686-linux-gnu. --->cut<--- When combine drops a REG_UNUSED SET in a parallel, we have to clear cached values, so that, even if the REGs remain used (e.g. because they were referenced in the used SET_SRC), we will not use properties of the dropped modified value as if they applied to the preserved original one. We fail to adjust REG_N_SETS. for gcc/ChangeLog PR rtl-optimization/80693 PR rtl-optimization/81019 PR rtl-optimization/81020 * combine.c (distribute_notes): Reset any REG_UNUSED REGs that are not mentioned in i3. Place the REG_UNUSED note on i2, possibly modified to REG_DEAD, if it did not originate in i3. for gcc/testsuite/ChangeLog PR rtl-optimization/80693 PR rtl-optimization/81019 PR rtl-optimization/81020 * gcc.dg/pr80693.c: New. * gcc.dg/pr81019.c: New. ---
Re: [PR80693] drop value of parallel SETs dropped by combine
On Jul 7, 2017, Segher Boessenkoolwrote: > I meant, just double check if your new > code does the correct thing for the set count. Sorry this took me so long to get back to. Even this was difficult for me to answer for sure, then and now. We don't (and can't?) know whether the REG_UNUSED note originally pertained to a clobber or a set, but I think that doesn't matter: unless we've reused the REG in i2 as a scratch, I think we should decrement the set count, because we used to have a SET or a CLOBBER that is now gone. Looking back at the issue, I realized we should keep/add the REG_UNUSED note to i2, if the reg is mentioned in i2, possibly turned into REG_DEAD if i2 is referenced but not set. Still, I'm concerned I haven't caught all of the cases in which adjusting REG_N_SETS would be needed: we might have dropped multiple SETs of the same pseudo, each with its own REG_UNUSED note (say, from all of i3, i2, i1, and i0), and the current logic will only decrement REG_N_SETS once, and only if i2 no longer sets the register. I'm also concerned that the logic for when the REG is set or referenced in i3 is incomplete: references in i3 might have come from any of the previous insns, even if intervening sets of the same register were substituted and thus removed. Consider the following nightmarish scenario: i0: (parallel [(set (reg CC) (op1 (reg A))) (set (reg A) (plus (reg A) (const_int 1)))]) (REG_UNUSED (reg A)) i1: (set (reg A) (ne (reg CC) (const_int 0))) (REG_DEAD (reg CC)) i2: (parallel [(set (reg CC) (op2 (reg A))) (set (reg A) (plus (reg A) (const_int 1)))]) (REG_UNUSED (reg A))) i3: (set (reg A) (eq (reg CC) (const_int 0))) (REG_DEAD (reg CC)) we might turn that into say: i2: (set (reg CC) (op3 (reg A))) (REG_DEAD (reg A)) i3: (set (reg A) (op4 (reg CC))) (REG_DEAD (reg CC)) and now we'd have to somehow figure out that we're to discount the two unused sets of reg A, those from i0 and i2, and to turn either REG_UNUSED note about reg A into a REG_DEAD note to be placed at i2. A is set at i3, so combine should record its new value, but if it's computed in terms of the scratch CC and a much-older A, will we get the correct value? Or is the value unchanged because it's the output of the latest insn? Now, consider this slightly simpler scenario (trying to combine only the first 3 insns): i0: nil i1: (parallel [(set (reg CC) (op1 (reg A))) (set (reg A) (plus (reg A) (const_int 1)))]) (REG_UNUSED (reg A)) i2: (set (reg A) (ne (reg CC) (const_int 0))) (REG_DEAD (reg CC)) i3: (parallel [(set (reg CC) (op2 (reg A))) (set (reg A) (plus (reg A) (const_int 1)))]) (REG_UNUSED (reg A))) this might combine into: i2: (set (reg A) (op5 (reg A))) i3: (set (reg CC) (op6 (reg A))) (REG_DEAD (reg A)) and now we have removed 3 sets to A, but added 1 by splitting within combine using A as scratch. Would we then have to figure out that for each of the REG_UNUSED notes pertaining to A we have to drop the REG_N_SETS count by 1, although A remains used in i3, and set and used in i2? I don't see how. I see that combine would record the value for reg A at i2 in this case, but would it express it in terms of which earlier value of reg A? Shouldn't we have reset it while placing notes in this case too? > It wasn't obvious to me (this code is horribly complicated). Whether > all existing code is correct... it's probably best not to look too > closely :-/ You're right about its being horribly complicated. Maybe I should go about it incrementally. > If you have a patch you feel confident in, could you post it again > please? So let me tell you how I feel about this. It has waited long enough, and there are at least 3 bugs known to be fixed by the first very simple patch below. The catch is that it doesn't adjust REG_N_SETS at all (we didn't before the patch, and that didn't seem to hurt too much). I've regstrapped this successfully on x86_64-linux-gnu and i686-linux-gnu. --->cut<--- When combine drops a REG_UNUSED SET in a parallel, we have to clear cached values, so that, even if the REGs remain used (e.g. because they were referenced in the used SET_SRC), we will not use properties of the dropped modified value as if they applied to the preserved original one. We fail to adjust REG_N_SETS. for gcc/ChangeLog PR rtl-optimization/80693 PR rtl-optimization/81019 PR rtl-optimization/81020 * combine.c (distribute_notes): Reset any REG_UNUSED REGs that are not mentioned in i3. Place the REG_UNUSED note on i2, possibly modified to REG_DEAD, if it did not originate in i3. for gcc/testsuite/ChangeLog PR rtl-optimization/80693 PR rtl-optimization/81019 PR rtl-optimization/81020 * gcc.dg/pr80693.c: New. * gcc.dg/pr81019.c: New. --- gcc/combine.c | 40
Re: [PR80693] drop value of parallel SETs dropped by combine
Hi again, sorry for the delay, On Fri, Jun 23, 2017 at 11:01:12PM -0300, Alexandre Oliva wrote: > > Things should probably be restructured a bit so we keep the sets count > > correct, if that is possible? > > I'll have to think a bit to figure out the exact conditions in which to > decrement the sets count, and reset the recorded value. I was thinking > the conditions were the same; am I missing something? > > Or are you getting at cases in which we should do both and don't, or > vice-versa? E.g., if reg_referenced_p holds but the subsequent test > doesn't? I guess we do, but don't we have to distinguish the cases of > an original unused set remaining from that of reusing the pseudo for a > new set? > > Do we have to test whether from_insn still reg_sets_p the REG_UNUSED > operand, when from_insn is not i3? (e.g., it could be something that > remains set in i1 as a side effect, but that's not used in either i2 or > i3) > > Am I overdoing this? The situations I had to analyze in the patch I > posted before were much simpler, and even then I now think I missed a > number of them :-) Yeah you're overdoing it ;-) I meant, just double check if your new code does the correct thing for the set count. It wasn't obvious to me (this code is horribly complicated). Whether all existing code is correct... it's probably best not to look too closely :-/ If you have a patch you feel confident in, could you post it again please? Segher p.s. What I still want to do is never reuse a set pseudo, always create a new pseudo instead. This will get rid of many existing bugs, and a lot of the complications in existing code. Unfortunately not everything is set up yet for creating new pseudos during combine.
Re: [PR80693] drop value of parallel SETs dropped by combine
On Jun 22, 2017, Segher Boessenkoolwrote: > On Thu, Jun 22, 2017 at 03:21:01AM -0300, Alexandre Oliva wrote: >> On Jun 8, 2017, Segher Boessenkool wrote: >> >> > [ I missed this patch the first time around; please cc: me to prevent this >> > ] >> >> > On Thu, May 18, 2017 at 07:25:57AM -0300, Alexandre Oliva wrote: >> >> When an insn used by combine has multiple SETs, only the non-REG_UNUSED >> >> set is used: others will end up dropped on the floor. >> >> > Sometimes, yes; not always. >> >> You mean sets to non-REGs, I suppose. I didn't take them into account >> in my statement indeed, but I think it still applies: can_combine_p will >> reject parallel SETs if two or more of them don't have a REG_UNUSED note >> for their respective SET_DESTs. > can_combine_p is not called for I3; it also isn't called until after > I2 is split, if that happens. Oh, I see what you mean now. I just don't think of I3 as "used"; other insns are "used" in that they are substituted, in whole or in part, into I3. On Jun 22, 2017, Segher Boessenkool wrote: > On Thu, Jun 22, 2017 at 09:25:21AM -0300, Alexandre Oliva wrote: >> On Jun 8, 2017, Segher Boessenkool wrote: >> >> > Would it work to just have "else" instead if this "if"? Or hrm, we'll >> > need to kill the recorded reg_stat value in the last case before this >> > as well? >> >> The patch below (is this what you meant?) > Yes exactly. >> How's this? (I haven't run regression tests yet) > Looks a lot better digestable than the previous patch, thanks! > Things should probably be restructured a bit so we keep the sets count > correct, if that is possible? I'll have to think a bit to figure out the exact conditions in which to decrement the sets count, and reset the recorded value. I was thinking the conditions were the same; am I missing something? Or are you getting at cases in which we should do both and don't, or vice-versa? E.g., if reg_referenced_p holds but the subsequent test doesn't? I guess we do, but don't we have to distinguish the cases of an original unused set remaining from that of reusing the pseudo for a new set? Do we have to test whether from_insn still reg_sets_p the REG_UNUSED operand, when from_insn is not i3? (e.g., it could be something that remains set in i1 as a side effect, but that's not used in either i2 or i3) Am I overdoing this? The situations I had to analyze in the patch I posted before were much simpler, and even then I now think I missed a number of them :-) >> When an insn used by combine has multiple SETs, only the non-REG_UNUSED >> set is used: others will end up dropped on the floor. > write something simpler like > If combine drops a REG_UNUSED SET, [...] Nice, thanks. > Similar for this comment. (It has a stray tab btw, before "We"). *nod* > Maybe use "long long"? Less incorrect/misleading on 32-bit targets ;-) Sure, thanks. -- Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
Re: [PR80693] drop value of parallel SETs dropped by combine
On Thu, Jun 22, 2017 at 09:25:21AM -0300, Alexandre Oliva wrote: > On Jun 8, 2017, Segher Boessenkoolwrote: > > > Would it work to just have "else" instead if this "if"? Or hrm, we'll > > need to kill the recorded reg_stat value in the last case before this > > as well? > > The patch below (is this what you meant?) Yes exactly. > fixes the PR testcase, and the > new else block doesn't get exercised in an x86_64-linux-gnu bootstrap. > However, it seems to me that it might very well drop parallel SETs > without decrementing the sets count, though probably in cases in which > this won't matter. > > How's this? (I haven't run regression tests yet) Looks a lot better digestable than the previous patch, thanks! Things should probably be restructured a bit so we keep the sets count correct, if that is possible? > When an insn used by combine has multiple SETs, only the non-REG_UNUSED > set is used: others will end up dropped on the floor. This isn't exactly true, as I described in a previous email... You can write something simpler like If combine drops a REG_UNUSED SET, [...] > We have to take > note of the dropped REG_UNUSED SETs, clearing their cached values, so > that, even if the REGs remain used (e.g. because they were referenced > in the used SET_SRC), we will not use properties of the latest value > as if they applied to the earlier one. > --- a/gcc/combine.c > +++ b/gcc/combine.c > @@ -14087,6 +14087,25 @@ distribute_notes (rtx notes, rtx_insn *from_insn, > rtx_insn *i3, rtx_insn *i2, > PUT_REG_NOTE_KIND (note, REG_DEAD); > place = i3; > } > + > + /* If there were any parallel sets in FROM_INSN other than > + the one setting IDEST, it must be REG_UNUSED, otherwise > + we could not have used FROM_INSN in combine. Since this > + combine attempt succeeded, we know this unused SET was > + dropped on the floor, because the insn was either deleted > + or created from a new pattern that does not use its > + SET_DEST. We must forget whatever we knew about the > + value that was stored by that SET, since the prior value > + may still be present in IDEST's src expression or > + elsewhere, and we do not want to use properties of the > + dropped value as if they applied to the prior one when > + simplifying e.g. subsequent combine attempts. */ Similar for this comment. (It has a stray tab btw, before "We"). > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/pr80693.c > @@ -0,0 +1,26 @@ > +/* { dg-do run } */ > +/* { dg-options "-O -fno-tree-coalesce-vars" } */ > +typedef unsigned char u8; > +typedef unsigned short u16; > +typedef unsigned u32; > +typedef unsigned long u64; Maybe use "long long"? Less incorrect/misleading on 32-bit targets ;-) Thanks, Segher
Re: [PR80693] drop value of parallel SETs dropped by combine
On Thu, Jun 22, 2017 at 03:21:01AM -0300, Alexandre Oliva wrote: > On Jun 8, 2017, Segher Boessenkoolwrote: > > > [ I missed this patch the first time around; please cc: me to prevent this ] > > > On Thu, May 18, 2017 at 07:25:57AM -0300, Alexandre Oliva wrote: > >> When an insn used by combine has multiple SETs, only the non-REG_UNUSED > >> set is used: others will end up dropped on the floor. > > > Sometimes, yes; not always. > > You mean sets to non-REGs, I suppose. I didn't take them into account > in my statement indeed, but I think it still applies: can_combine_p will > reject parallel SETs if two or more of them don't have a REG_UNUSED note > for their respective SET_DESTs. can_combine_p is not called for I3; it also isn't called until after I2 is split, if that happens. It is fairly normal for an I3 with two SETs to have both survive in the combined insn, even if one is unused. (And this is a good thing! The insn with one SET removed might not exist. OTOH, if we can remove one of the SETs we probably should; we don't try that currently, if the "full" instruction is recognised.) Segher
Re: [PR80693] drop value of parallel SETs dropped by combine
On Jun 22, 2017, Alexandre Olivawrote: > The patch below (is this what you meant?) fixes the PR testcase, and the > new else block doesn't get exercised in an x86_64-linux-gnu bootstrap. Err, I misdescribed the situation. It's not that it doesn't get exercised, it's that this new block doesn't make any noticeable difference in the generated code. I checked that by adding a new option in common.opt, running the new block conditionally on the new option, building stage1 host normally, then continuing bootstrap with GCC_COMPARE_DEBUG=-fthe-new-option set in the environment, building target libs and stage2 build and host like that. GCC_COMPARE_DEBUG, like -fcompare-debug, checks that the compiler dumps at final are the same. -- Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
Re: [PR80693] drop value of parallel SETs dropped by combine
On Jun 8, 2017, Segher Boessenkool <seg...@kernel.crashing.org> wrote: > Would it work to just have "else" instead if this "if"? Or hrm, we'll > need to kill the recorded reg_stat value in the last case before this > as well? The patch below (is this what you meant?) fixes the PR testcase, and the new else block doesn't get exercised in an x86_64-linux-gnu bootstrap. However, it seems to me that it might very well drop parallel SETs without decrementing the sets count, though probably in cases in which this won't matter. How's this? (I haven't run regression tests yet) [PR80693] drop value of parallel SETs dropped by combine When an insn used by combine has multiple SETs, only the non-REG_UNUSED set is used: others will end up dropped on the floor. We have to take note of the dropped REG_UNUSED SETs, clearing their cached values, so that, even if the REGs remain used (e.g. because they were referenced in the used SET_SRC), we will not use properties of the latest value as if they applied to the earlier one. for gcc/ChangeLog PR rtl-optimization/80693 * combine.c (distribute_notes): Reset any REG_UNUSED REGs that are not set or referenced in i3. for gcc/testsuite/ChangeLog PR rtl-optimization/80693 * gcc.dg/pr80693.c: New. --- gcc/combine.c | 19 +++ gcc/testsuite/gcc.dg/pr80693.c | 26 ++ 2 files changed, 45 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/pr80693.c diff --git a/gcc/combine.c b/gcc/combine.c index 2d49bc2..477010d 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -14087,6 +14087,25 @@ distribute_notes (rtx notes, rtx_insn *from_insn, rtx_insn *i3, rtx_insn *i2, PUT_REG_NOTE_KIND (note, REG_DEAD); place = i3; } + + /* If there were any parallel sets in FROM_INSN other than +the one setting IDEST, it must be REG_UNUSED, otherwise +we could not have used FROM_INSN in combine. Since this +combine attempt succeeded, we know this unused SET was +dropped on the floor, because the insn was either deleted +or created from a new pattern that does not use its +SET_DEST. We must forget whatever we knew about the +value that was stored by that SET, since the prior value +may still be present in IDEST's src expression or +elsewhere, and we do not want to use properties of the +dropped value as if they applied to the prior one when +simplifying e.g. subsequent combine attempts. */ + else + { + gcc_assert (REG_P (XEXP (note, 0))); + record_value_for_reg (XEXP (note, 0), NULL, NULL_RTX); + INC_REG_N_SETS (REGNO (XEXP (note, 0)), -1); + } break; case REG_EQUAL: diff --git a/gcc/testsuite/gcc.dg/pr80693.c b/gcc/testsuite/gcc.dg/pr80693.c new file mode 100644 index 000..aecddd0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr80693.c @@ -0,0 +1,26 @@ +/* { dg-do run } */ +/* { dg-options "-O -fno-tree-coalesce-vars" } */ +typedef unsigned char u8; +typedef unsigned short u16; +typedef unsigned u32; +typedef unsigned long u64; + +static u64 __attribute__((noinline, noclone)) +foo(u8 u8_0, u16 u16_0, u32 u32_0, u64 u64_0, u16 u16_1) +{ + u16_1 += 0x1051; + u16_1 &= 1; + u8_0 <<= u32_0 & 7; + u16_0 -= !u16_1; + u16_1 >>= ((u16)-u8_0 != 0xff); + return u8_0 + u16_0 + u64_0 + u16_1; +} + +int +main (void) +{ + u64 x = foo(1, 1, 0x, 0, 1); + if (x != 0x80) +__builtin_abort(); + return 0; +} -- Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
Re: [PR80693] drop value of parallel SETs dropped by combine
On Jun 8, 2017, Segher Boessenkoolwrote: > [ I missed this patch the first time around; please cc: me to prevent this ] > On Thu, May 18, 2017 at 07:25:57AM -0300, Alexandre Oliva wrote: >> When an insn used by combine has multiple SETs, only the non-REG_UNUSED >> set is used: others will end up dropped on the floor. > Sometimes, yes; not always. You mean sets to non-REGs, I suppose. I didn't take them into account in my statement indeed, but I think it still applies: can_combine_p will reject parallel SETs if two or more of them don't have a REG_UNUSED note for their respective SET_DESTs. >> PR rtl-optimization/80693 >> * combine.c (distribute_notes): Add IDEST parameter. Reset any >> REG_UNUSED REGs that are not IDEST, if IDEST is given. Adjust >> all callers. > Most callers use NULL_RTX for idest. It isn't obvious to me that this > is correct. My reasoning is that the block of try_combine that passes i[0-3]dest will cover all combine-affected insns that could possibly have REG_UNUSED notes, since these notes would have to be put back in the insns at that point. Since it's enough to reset the reg stats once, that would do it, and so we need not be concerned with any other cases, so we can pass NULL_RTX for idest elsewhere. >> @@ -14087,6 +14090,26 @@ distribute_notes (rtx notes, rtx_insn *from_insn, >> rtx_insn *i3, rtx_insn *i2, >> PUT_REG_NOTE_KIND (note, REG_DEAD); >> place = i3; >> } >> + >> + /* If there were any parallel sets in FROM_INSN other than >> + the one setting IDEST, it must be REG_UNUSED, otherwise >> + we could not have used FROM_INSN in combine. Since this >> + combine attempt succeeded, we know this unused SET was >> + dropped on the floor, because the insn was either deleted >> + or created from a new pattern that does not use its >> + SET_DEST. We must forget whatever we knew about the >> + value that was stored by that SET, since the prior value >> + may still be present in IDEST's src expression or >> + elsewhere, and we do not want to use properties of the >> + dropped value as if they applied to the prior one when >> + simplifying e.g. subsequent combine attempts. */ >> + if (idest && XEXP (note, 0) != idest) > Would it work to just have "else" instead if this "if"? Or hrm, we'll > need to kill the recorded reg_stat value in the last case before this > as well? You mean instead of passing an idest to distribute_notes and testing it, right? We might also need to catch the case in which the first if block breaks. I think I had missed that. > Could you try that out? Or I can do it, let me know what you prefer. I'll give it a shot. -- Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
Re: [PR80693] drop value of parallel SETs dropped by combine
Hi! [ I missed this patch the first time around; please cc: me to prevent this ] On Thu, May 18, 2017 at 07:25:57AM -0300, Alexandre Oliva wrote: > When an insn used by combine has multiple SETs, only the non-REG_UNUSED > set is used: others will end up dropped on the floor. Sometimes, yes; not always. > We have to take > note of the dropped REG_UNUSED SETs, clearing their cached values, so > that, even if the REGs remain used (e.g. because they were referenced > in the used SET_SRC), we will not use properties of the latest value > as if they applied to the earlier one. The reg_stat stuff is no end of pain, sigh. > PR rtl-optimization/80693 > * combine.c (distribute_notes): Add IDEST parameter. Reset any > REG_UNUSED REGs that are not IDEST, if IDEST is given. Adjust > all callers. Most callers use NULL_RTX for idest. It isn't obvious to me that this is correct. > @@ -14087,6 +14090,26 @@ distribute_notes (rtx notes, rtx_insn *from_insn, > rtx_insn *i3, rtx_insn *i2, > PUT_REG_NOTE_KIND (note, REG_DEAD); > place = i3; > } > + > + /* If there were any parallel sets in FROM_INSN other than > + the one setting IDEST, it must be REG_UNUSED, otherwise > + we could not have used FROM_INSN in combine. Since this > + combine attempt succeeded, we know this unused SET was > + dropped on the floor, because the insn was either deleted > + or created from a new pattern that does not use its > + SET_DEST. We must forget whatever we knew about the > + value that was stored by that SET, since the prior value > + may still be present in IDEST's src expression or > + elsewhere, and we do not want to use properties of the > + dropped value as if they applied to the prior one when > + simplifying e.g. subsequent combine attempts. */ > + if (idest && XEXP (note, 0) != idest) Would it work to just have "else" instead if this "if"? Or hrm, we'll need to kill the recorded reg_stat value in the last case before this as well? > + { > + gcc_assert (REG_P (XEXP (note, 0))); > + record_value_for_reg (XEXP (note, 0), NULL, NULL_RTX); > + INC_REG_N_SETS (REGNO (XEXP (note, 0)), -1); > + } > + > break; Could you try that out? Or I can do it, let me know what you prefer. Thanks, Segher
Re: [PR80693] drop value of parallel SETs dropped by combine
On May 18, 2017, Alexandre Olivawrote: > When an insn used by combine has multiple SETs, only the non-REG_UNUSED > set is used: others will end up dropped on the floor. We have to take > note of the dropped REG_UNUSED SETs, clearing their cached values, so > that, even if the REGs remain used (e.g. because they were referenced > in the used SET_SRC), we will not use properties of the latest value > as if they applied to the earlier one. > Regstrapped on x86_64-linux-gnu. Ok to install? > for gcc/ChangeLog > PR rtl-optimization/80693 > * combine.c (distribute_notes): Add IDEST parameter. Reset any > REG_UNUSED REGs that are not IDEST, if IDEST is given. Adjust > all callers. > for gcc/testsuite/ChangeLog > PR rtl-optimization/80693 > * gcc.dg/pr80693.c: New. Ping? https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01444.html -- Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
[PR80693] drop value of parallel SETs dropped by combine
When an insn used by combine has multiple SETs, only the non-REG_UNUSED set is used: others will end up dropped on the floor. We have to take note of the dropped REG_UNUSED SETs, clearing their cached values, so that, even if the REGs remain used (e.g. because they were referenced in the used SET_SRC), we will not use properties of the latest value as if they applied to the earlier one. Regstrapped on x86_64-linux-gnu. Ok to install? for gcc/ChangeLog PR rtl-optimization/80693 * combine.c (distribute_notes): Add IDEST parameter. Reset any REG_UNUSED REGs that are not IDEST, if IDEST is given. Adjust all callers. for gcc/testsuite/ChangeLog PR rtl-optimization/80693 * gcc.dg/pr80693.c: New. --- gcc/combine.c | 80 ++-- gcc/testsuite/gcc.dg/pr80693.c | 26 + 2 files changed, 78 insertions(+), 28 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/pr80693.c diff --git a/gcc/combine.c b/gcc/combine.c index 39ef3c6..6954f92 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -483,7 +483,7 @@ static void reg_dead_at_p_1 (rtx, const_rtx, void *); static int reg_dead_at_p (rtx, rtx_insn *); static void move_deaths (rtx, rtx, int, rtx_insn *, rtx *); static int reg_bitfield_target_p (rtx, rtx); -static void distribute_notes (rtx, rtx_insn *, rtx_insn *, rtx_insn *, rtx, rtx, rtx); +static void distribute_notes (rtx, rtx_insn *, rtx, rtx_insn *, rtx_insn *, rtx, rtx, rtx); static void distribute_links (struct insn_link *); static void mark_used_regs_combine (rtx); static void record_promoted_value (rtx_insn *, rtx); @@ -4170,7 +4170,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, remove_note (undobuf.other_insn, note); } - distribute_notes (new_other_notes, undobuf.other_insn, + distribute_notes (new_other_notes, undobuf.other_insn, NULL_RTX, undobuf.other_insn, NULL, NULL_RTX, NULL_RTX, NULL_RTX); } @@ -4424,19 +4424,19 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, /* Distribute all the LOG_LINKS and REG_NOTES from I1, I2, and I3. */ if (i3notes) - distribute_notes (i3notes, i3, i3, newi2pat ? i2 : NULL, + distribute_notes (i3notes, i3, NULL_RTX, i3, newi2pat ? i2 : NULL, elim_i2, elim_i1, elim_i0); if (i2notes) - distribute_notes (i2notes, i2, i3, newi2pat ? i2 : NULL, + distribute_notes (i2notes, i2, i2dest, i3, newi2pat ? i2 : NULL, elim_i2, elim_i1, elim_i0); if (i1notes) - distribute_notes (i1notes, i1, i3, newi2pat ? i2 : NULL, + distribute_notes (i1notes, i1, i1dest, i3, newi2pat ? i2 : NULL, elim_i2, local_elim_i1, local_elim_i0); if (i0notes) - distribute_notes (i0notes, i0, i3, newi2pat ? i2 : NULL, + distribute_notes (i0notes, i0, i0dest, i3, newi2pat ? i2 : NULL, elim_i2, elim_i1, local_elim_i0); if (midnotes) - distribute_notes (midnotes, NULL, i3, newi2pat ? i2 : NULL, + distribute_notes (midnotes, NULL, NULL_RTX, i3, newi2pat ? i2 : NULL, elim_i2, elim_i1, elim_i0); /* Distribute any notes added to I2 or I3 by recog_for_combine. We @@ -,12 +,12 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, so we always pass it as i3. */ if (newi2pat && new_i2_notes) - distribute_notes (new_i2_notes, i2, i2, NULL, NULL_RTX, NULL_RTX, - NULL_RTX); + distribute_notes (new_i2_notes, i2, NULL_RTX, i2, NULL, + NULL_RTX, NULL_RTX, NULL_RTX); if (new_i3_notes) - distribute_notes (new_i3_notes, i3, i3, NULL, NULL_RTX, NULL_RTX, - NULL_RTX); + distribute_notes (new_i3_notes, i3, NULL_RTX, i3, NULL, + NULL_RTX, NULL_RTX, NULL_RTX); /* If I3DEST was used in I3SRC, it really died in I3. We may need to put a REG_DEAD note for it somewhere. If NEWI2PAT exists and sets @@ -4462,10 +4462,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, { rtx new_note = alloc_reg_note (REG_DEAD, i3dest_killed, NULL_RTX); if (newi2pat && reg_set_p (i3dest_killed, newi2pat)) - distribute_notes (new_note, NULL, i2, NULL, elim_i2, + distribute_notes (new_note, NULL, NULL_RTX, i2, NULL, elim_i2, elim_i1, elim_i0); else - distribute_notes (new_note, NULL, i3, newi2pat ? i2 : NULL, + distribute_notes (new_note, NULL, NULL_RTX, i3, newi2pat ? i2 : NULL, elim_i2, elim_i1, elim_i0); } @@ -4473,10 +4473,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, { rtx new_note = alloc_reg_note (REG_DEAD, i2dest,