[Bug tree-optimization/100922] CSE leads to fully redundant (back to back) zero-extending loads of the same thing in a loop, or a register copy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100922 Richard Biener changed: What|Removed |Added CC||rguenth at gcc dot gnu.org --- Comment #4 from Richard Biener --- (In reply to Andrew Pinski from comment #1) > Pre takes: >[local count: 1073741824]: > # str_9 = PHI > # out_10 = PHI > str_16 = str_9 + 1; > c_17 = *str_9; > _5 = (sizetype) c_17; > _6 = keep_lut_14(D) + _5; > inc_18 = *_6; > *out_10 = c_17; > _7 = (sizetype) inc_18; > out_20 = out_10 + _7; > if (c_17 != 0) > goto ; [89.00%] > else > goto ; [11.00%] > >[local count: 955630224]: > goto ; [100.00%] > > Changes it to: >[local count: 1073741824]: > # str_9 = PHI > # out_10 = PHI > # prephitmp_60 = PHI > str_16 = str_9 + 1; > c_17 = *str_9; > _5 = (sizetype) c_17; > _6 = keep_lut_14(D) + _5; > *out_10 = c_17; > _7 = (sizetype) prephitmp_60; > out_20 = out_10 + _7; > if (c_17 != 0) > goto ; [89.00%] > else > goto ; [11.00%] > >[local count: 955630224]: > pretmp_55 = MEM[(char *)str_9 + 1B]; > _57 = (sizetype) pretmp_55; > _58 = keep_lut_14(D) + _57; > pretmp_59 = *_58; > goto ; [100.00%] > > And then ch_vect changes it to (basically undoing the PRE): >[local count: 850510900]: > >[local count: 955630224]: > # str_42 = PHI > # str_41 = PHI > # out_38 = PHI > pretmp_55 = MEM[(char *)str_42 + 1B]; > _57 = (sizetype) pretmp_55; > _58 = keep_lut_14(D) + _57; > pretmp_59 = *_58; > str_16 = str_41 + 1; > c_17 = *str_41; > *out_38 = c_17; > _7 = (sizetype) pretmp_59; > out_20 = out_38 + _7; > if (c_17 != 0) > goto ; [89.00%] > else > goto ; [11.00%] > > And then IVOPTS comes around and does: >[local count: 850510900]: > >[local count: 955630224]: > # str_41 = PHI > # out_38 = PHI > pretmp_55 = MEM[(char *)str_41]; > _57 = (sizetype) pretmp_55; > _58 = keep_lut_14(D) + _57; > pretmp_59 = *_58; > str_16 = str_41 + 1; > c_17 = MEM[(char *)str_16 + -1B]; > *out_38 = c_17; > _7 = (sizetype) pretmp_59; > out_20 = out_38 + _7; > if (c_17 != 0) > goto ; [89.00%] > else > goto ; [11.00%] > > Which is fine except FRE (and DOM), don't recognize the MEM[(char *)str_16 + > -1B] and MEM[(char *)str_41] being the same. For FRE, it almost looks like > TARGET_MEM_REF is not handled TARGET_MEM_REF is handled but what is not is "forwarding" the str_16 definition to the str_16 + -1B TARGET_MEM_REF - vn_reference_maybe_forwprop_address only forwards to MEM_REFs (see the caller in valueize_refs_1 - the TARGET_MEM_REF representation is different enough to make a trivial patch impossible). Now, there's code to fold TARGET_MEM_REF back to MEM_REF, but it doesn't trigger here as /* If possible use a plain MEM_REF instead of a TARGET_MEM_REF. ??? As IVOPTs does not follow restrictions to where the base pointer may point to create a MEM_REF only if we know that base is valid. */ if ((TREE_CODE (base) == ADDR_EXPR || TREE_CODE (base) == INTEGER_CST) I _think_ that IVOPTs is fixed, but I'm not 100% sure ...
[Bug tree-optimization/100922] CSE leads to fully redundant (back to back) zero-extending loads of the same thing in a loop, or a register copy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100922 --- Comment #3 from Andrew Pinski --- (In reply to Peter Cordes from comment #2) > Possibly also related: Fully unrelated.
[Bug tree-optimization/100922] CSE leads to fully redundant (back to back) zero-extending loads of the same thing in a loop, or a register copy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100922 --- Comment #2 from Peter Cordes --- Possibly also related: With different surrounding code, this loop can compile to asm which has two useless movz / mov register copies in the loop at -O2 (https://godbolt.org/z/PTcqzM6q7). (To set up for entry into the next loop in over-complicated ways, and doing this in the loop is unnecessary.) while( lut[(unsigned char)*str] == 0 ){ // also catches terminating 0 str++; } .L19: movzbl 1(%rdi), %edx addq$1, %rdi movzbl %dl, %ecx movl%edx, %eax cmpb$0, -120(%rsp,%rcx) je .L19 from source void remove_chars(char *restrict str, const char *restrict remove) { char lut[256] = {0}; do { lut[(unsigned char)*remove] = -1; }while(*remove++); /*** Over complicated asm in this loop */ while( lut[(unsigned char)*str] == 0 ){ // also catches terminating 0 str++; } // str points at first char to *not* keep (or the terminating 0) const char *in = str; char *out = str; while (*in) { char mask = lut[(unsigned char)*in]; unsigned char cin = *in, cout = *out; *out = mask ? cout : cin; out += mask + 1; in++; } *out = *in; }
[Bug tree-optimization/100922] CSE leads to fully redundant (back to back) zero-extending loads of the same thing in a loop, or a register copy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100922 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2021-06-05 Ever confirmed|0 |1 --- Comment #1 from Andrew Pinski --- Pre takes: [local count: 1073741824]: # str_9 = PHI # out_10 = PHI str_16 = str_9 + 1; c_17 = *str_9; _5 = (sizetype) c_17; _6 = keep_lut_14(D) + _5; inc_18 = *_6; *out_10 = c_17; _7 = (sizetype) inc_18; out_20 = out_10 + _7; if (c_17 != 0) goto ; [89.00%] else goto ; [11.00%] [local count: 955630224]: goto ; [100.00%] Changes it to: [local count: 1073741824]: # str_9 = PHI # out_10 = PHI # prephitmp_60 = PHI str_16 = str_9 + 1; c_17 = *str_9; _5 = (sizetype) c_17; _6 = keep_lut_14(D) + _5; *out_10 = c_17; _7 = (sizetype) prephitmp_60; out_20 = out_10 + _7; if (c_17 != 0) goto ; [89.00%] else goto ; [11.00%] [local count: 955630224]: pretmp_55 = MEM[(char *)str_9 + 1B]; _57 = (sizetype) pretmp_55; _58 = keep_lut_14(D) + _57; pretmp_59 = *_58; goto ; [100.00%] And then ch_vect changes it to (basically undoing the PRE): [local count: 850510900]: [local count: 955630224]: # str_42 = PHI # str_41 = PHI # out_38 = PHI pretmp_55 = MEM[(char *)str_42 + 1B]; _57 = (sizetype) pretmp_55; _58 = keep_lut_14(D) + _57; pretmp_59 = *_58; str_16 = str_41 + 1; c_17 = *str_41; *out_38 = c_17; _7 = (sizetype) pretmp_59; out_20 = out_38 + _7; if (c_17 != 0) goto ; [89.00%] else goto ; [11.00%] And then IVOPTS comes around and does: [local count: 850510900]: [local count: 955630224]: # str_41 = PHI # out_38 = PHI pretmp_55 = MEM[(char *)str_41]; _57 = (sizetype) pretmp_55; _58 = keep_lut_14(D) + _57; pretmp_59 = *_58; str_16 = str_41 + 1; c_17 = MEM[(char *)str_16 + -1B]; *out_38 = c_17; _7 = (sizetype) pretmp_59; out_20 = out_38 + _7; if (c_17 != 0) goto ; [89.00%] else goto ; [11.00%] Which is fine except FRE (and DOM), don't recognize the MEM[(char *)str_16 + -1B] and MEM[(char *)str_41] being the same. For FRE, it almost looks like TARGET_MEM_REF is not handled