[Bug rtl-optimization/63259] Detecting byteswap sequence
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259 --- Comment #21 from thopre01 at gcc dot gnu.org --- Author: thopre01 Date: Tue Jan 6 11:51:16 2015 New Revision: 219256 URL: https://gcc.gnu.org/viewcvs?rev=219256root=gccview=rev Log: 2015-01-06 Thomas Preud'homme thomas.preudho...@arm.com gcc/ PR tree-optimization/63259 * tree-ssa-math-opts.c (pass_optimize_bswap::execute): Stop checking if optab exists for 16bit byteswap. Modified: trunk/gcc/ChangeLog trunk/gcc/tree-ssa-math-opts.c
[Bug rtl-optimization/63259] Detecting byteswap sequence
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259 --- Comment #19 from thopre01 at gcc dot gnu.org --- Yeah, when doing something like (x[0] 8) | x[1]) 8) | x[2]) 8) | x[3] there is already a depth proportional to the size of the value being byte swapped with a coefficient due to casting. But I need to evaluate the impact of increasing the limit in terms of compilation time. If the impact is noticeable, it might be necessary to do the refactoring suggested by Richard Biener in [1] first. [1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00616.html From the performance side the pass could be re-structured to populate a lattice, thus work from def to use instead of the other way around. Which means we visit each stmt exactly once, compute its value symbolically and check it against a rotate.
[Bug rtl-optimization/63259] Detecting byteswap sequence
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259 --- Comment #20 from Oleg Endo olegendo at gcc dot gnu.org --- BTW, PR 42587 contains a few other cases. I've created PR 64376 for the SH specific parts.
[Bug rtl-optimization/63259] Detecting byteswap sequence
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259 --- Comment #18 from Oleg Endo olegendo at gcc dot gnu.org --- (In reply to Oleg Endo from comment #17) (In reply to thopre01 from comment #16) Did we? All I can find is you and Andreas mentionning that it should work because it will be sign extended to int when doing the bitwise AND with 0xFF00. What did I miss? Ah sorry ... me bad. I haven't tried increasing the 'level' with the bswaphi expander pattern in place. Will do that later. So I did. Increasing the limit to limit += 2 + (int) ceil_log2 ... works for me for the signed short case. However, the following function from the CSiBE set: void _nrrdSwap32Endian(void *_data, size_t N) { int *data, w, fix; size_t I; if (_data) { data = (int *)_data; for (I=0; IN; I++) { w = data[I]; fix = (w 0x00FF); fix = ((w 0xFF00) 0x08) | (fix 0x08); fix = ((w 0x00FF) 0x10) | (fix 0x08); fix = ((w 0xFF00) 0x18) | (fix 0x08); data[I] = fix; } } } seems to require 'limit += 3 + (int) ...' for the bswap insn to be detected. Then, this fine function (from the same set) void _nrrdSwap64Endian(void *_data, size_t N) { airLLong *data, l, fix; size_t I; if (_data) { data = (airLLong *)_data; for (I=0; IN; I++) { l = data[I]; fix = (l 0x00FF); fix = ((l 0xFF00) 0x08) | (fix 0x08); fix = ((l 0x00FF) 0x10) | (fix 0x08); fix = ((l 0xFF00) 0x18) | (fix 0x08); #if defined(_WIN32) fix = ((l 0x00FFi64) 0x20) | (fix 0x08); fix = ((l 0xFF00i64) 0x28) | (fix 0x08); fix = ((l 0x00FFi64) 0x30) | (fix 0x08); fix = ((l 0xFF00i64) 0x38) | (fix 0x08); #else fix = ((l 0x00FFLL) 0x20) | (fix 0x08); fix = ((l 0xFF00LL) 0x28) | (fix 0x08); fix = ((l 0x00FFLL) 0x30) | (fix 0x08); fix = ((l 0xFF00LL) 0x38) | (fix 0x08); #endif data[I] = fix; } } } requires 'limit += 6 + (int) ...' (luckily, there's no _nrrdSwap128Endian in the set)
[Bug rtl-optimization/63259] Detecting byteswap sequence
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259 --- Comment #14 from thopre01 at gcc dot gnu.org --- (In reply to Oleg Endo from comment #13) (In reply to thopre01 from comment #12) That's good, it means the pattern is recognized. Is there an optab defined for bswap16? Nope. Just this: (define_insn rotlhi3_8 [(set (match_operand:HI 0 arith_reg_dest =r) (rotate:HI (match_operand:HI 1 arith_reg_operand r) (const_int 8)))] TARGET_SH1 swap.b %1,%0 [(set_attr type arith)]) If I remember correctly, there was something that would check whether it can use the rotate pattern instead of a bswaphi2. It's rather the contrary. The bswap pass will replace the statements by a 8 bit rotation if the value is 16bit and the expander will choose a bswaphi pattern for that if the backend has one, otherwise it will keep the rotation. The problem is that currently the bswap pass still bails out if there is no 16 bit bswap available. I shall fix that. After adding the following pattern: (define_expand bswaphi2 [(set (match_operand:HI 0 arith_reg_dest) (bswap:HI (match_operand:HI 1 arith_reg_operand)))] TARGET_SH1 { if (!can_create_pseudo_p ()) FAIL; else { emit_insn (gen_rotlhi3_8 (operands[0], operands[1])); DONE; } }) it looks much better. The cases above work, except for the signed short. You mean with the added bswaphi2 pattern the pattern is still unchanged? On SH the bswap:HI HW insn actually doesn't modify the upper 16 bit of the 32 bit register. What it does is: unsigned int test_0999 (unsigned int a, unsigned int b) { return (a 0x) | ((a 0xFF00) 8) | ((a 0xFF) 8); } I was afraid that using a bswap:HI will result in unnecessary code around it: mov.l .L6,r0 ! r0 = 0x and r4,r0 extu.w r4,r4 swap.b r4,r4 rts or r4,r0 Looks good to me, what exactly is the problem? In my case, combine is looking for the pattern: Failed to match this instruction: (set (reg:SI 172 [ D.1356 ]) (ior:SI (ior:SI (and:SI (ashift:SI (reg/v:SI 170 [ a ]) (const_int 8 [0x8])) (const_int 65280 [0xff00])) (zero_extract:SI (reg/v:SI 170 [ a ]) (const_int 8 [0x8]) (const_int 8 [0x8]))) (and:SI (reg/v:SI 170 [ a ]) (const_int -65536 [0x] Which should then work. If you have a bswap instruction it seems better to define a pattern for that which the expander will use. That's the job of the bswap pass to detect a bswap, it shouldn't be done by combine. Same as the other pattern, can you try to print n-n in hex with this new limit? My guess is that the pattern is now recognized but fails later for the same reason as above. With increased limit, the number is/was the same. The bswap:HI expander was missing. Thanks! Ok so the limit ought to be increased and the check for bswaphi removed. I'll take a stab at that but that will probably make it only after Christmas.
[Bug rtl-optimization/63259] Detecting byteswap sequence
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259 --- Comment #15 from Oleg Endo olegendo at gcc dot gnu.org --- (In reply to thopre01 from comment #14) It's rather the contrary. The bswap pass will replace the statements by a 8 bit rotation if the value is 16bit and the expander will choose a bswaphi pattern for that if the backend has one, otherwise it will keep the rotation. The problem is that currently the bswap pass still bails out if there is no 16 bit bswap available. I shall fix that. Ah, I see. You mean with the added bswaphi2 pattern the pattern is still unchanged? After adding bswaphi2, the bswap pass does the transformation. Except for the non-working 'signed short' mentioned above. But we already figured that out earlier. On SH the bswap:HI HW insn actually doesn't modify the upper 16 bit of the 32 bit register. What it does is: unsigned int test_0999 (unsigned int a, unsigned int b) { return (a 0x) | ((a 0xFF00) 8) | ((a 0xFF) 8); } I was afraid that using a bswap:HI will result in unnecessary code around it: mov.l .L6,r0 ! r0 = 0x and r4,r0 extu.w r4,r4 swap.b r4,r4 rts or r4,r0 Looks good to me, what exactly is the problem? The expected sequence for the function above is: rts swap.b r4,r0 i.e. no anding and oring of lower/higher 16 bit word, since the swap.b insn operates on a SImode value and does not alter the high 16 bits. If you have a bswap instruction it seems better to define a pattern for that which the expander will use. That's the job of the bswap pass to detect a bswap, it shouldn't be done by combine. The combine parts I was talking about are to eliminate the anding and oring of higher 16 bits when a 16 bit byte swap is done on a 32 bit value. Ok so the limit ought to be increased and the check for bswaphi removed. I'll take a stab at that but that will probably make it only after Christmas. No problem.
[Bug rtl-optimization/63259] Detecting byteswap sequence
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259 --- Comment #16 from thopre01 at gcc dot gnu.org --- (In reply to Oleg Endo from comment #15) (In reply to thopre01 from comment #14) You mean with the added bswaphi2 pattern the pattern is still unchanged? After adding bswaphi2, the bswap pass does the transformation. Except for the non-working 'signed short' mentioned above. But we already figured that out earlier. Did we? All I can find is you and Andreas mentionning that it should work because it will be sign extended to int when doing the bitwise AND with 0xFF00. What did I miss? The expected sequence for the function above is: rts swap.b r4,r0 i.e. no anding and oring of lower/higher 16 bit word, since the swap.b insn operates on a SImode value and does not alter the high 16 bits. Oh yeah right. If you have a bswap instruction it seems better to define a pattern for that which the expander will use. That's the job of the bswap pass to detect a bswap, it shouldn't be done by combine. The combine parts I was talking about are to eliminate the anding and oring of higher 16 bits when a 16 bit byte swap is done on a 32 bit value. I'm surprised that it's not the semantic of a bswaphi. Best regards, Thomas
[Bug rtl-optimization/63259] Detecting byteswap sequence
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259 --- Comment #17 from Oleg Endo olegendo at gcc dot gnu.org --- (In reply to thopre01 from comment #16) Did we? All I can find is you and Andreas mentionning that it should work because it will be sign extended to int when doing the bitwise AND with 0xFF00. What did I miss? Ah sorry ... me bad. I haven't tried increasing the 'level' with the bswaphi expander pattern in place. Will do that later. If you have a bswap instruction it seems better to define a pattern for that which the expander will use. That's the job of the bswap pass to detect a bswap, it shouldn't be done by combine. The combine parts I was talking about are to eliminate the anding and oring of higher 16 bits when a 16 bit byte swap is done on a 32 bit value. I'm surprised that it's not the semantic of a bswaphi. As far as I know... bswaphi's inputs/output modes are HImode and it operates on the whole reg, not on a subreg. If applied to a SImode subreg the other bits need to be saved/restored, which is what happens wiit the anding/oring.
[Bug rtl-optimization/63259] Detecting byteswap sequence
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259 --- Comment #13 from Oleg Endo olegendo at gcc dot gnu.org --- (In reply to thopre01 from comment #12) That's good, it means the pattern is recognized. Is there an optab defined for bswap16? Nope. Just this: (define_insn rotlhi3_8 [(set (match_operand:HI 0 arith_reg_dest =r) (rotate:HI (match_operand:HI 1 arith_reg_operand r) (const_int 8)))] TARGET_SH1 swap.b%1,%0 [(set_attr type arith)]) If I remember correctly, there was something that would check whether it can use the rotate pattern instead of a bswaphi2. After adding the following pattern: (define_expand bswaphi2 [(set (match_operand:HI 0 arith_reg_dest) (bswap:HI (match_operand:HI 1 arith_reg_operand)))] TARGET_SH1 { if (!can_create_pseudo_p ()) FAIL; else { emit_insn (gen_rotlhi3_8 (operands[0], operands[1])); DONE; } }) it looks much better. The cases above work, except for the signed short. On SH the bswap:HI HW insn actually doesn't modify the upper 16 bit of the 32 bit register. What it does is: unsigned int test_0999 (unsigned int a, unsigned int b) { return (a 0x) | ((a 0xFF00) 8) | ((a 0xFF) 8); } I was afraid that using a bswap:HI will result in unnecessary code around it: mov.l.L6,r0 ! r0 = 0x andr4,r0 extu.wr4,r4 swap.br4,r4 rts orr4,r0 In my case, combine is looking for the pattern: Failed to match this instruction: (set (reg:SI 172 [ D.1356 ]) (ior:SI (ior:SI (and:SI (ashift:SI (reg/v:SI 170 [ a ]) (const_int 8 [0x8])) (const_int 65280 [0xff00])) (zero_extract:SI (reg/v:SI 170 [ a ]) (const_int 8 [0x8]) (const_int 8 [0x8]))) (and:SI (reg/v:SI 170 [ a ]) (const_int -65536 [0x] Which should then work. Same as the other pattern, can you try to print n-n in hex with this new limit? My guess is that the pattern is now recognized but fails later for the same reason as above. With increased limit, the number is/was the same. The bswap:HI expander was missing. Thanks!
[Bug rtl-optimization/63259] Detecting byteswap sequence
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259 --- Comment #12 from thopre01 at gcc dot gnu.org --- (In reply to Oleg Endo from comment #11) (In reply to thopre01 from comment #10) I have the same gimple and for me the bswap is correctly detected. Can you break at find_bswap_or_nop just after calling find_bswap_or_nop_1 on the if (!source_stmt) and show me the output of p/x n-n ? n-n = 0x0102 limit = 4 That's good, it means the pattern is recognized. Is there an optab defined for bswap16? For both, test_099 and test_08. Indeed, my mistake. Ok I tested a bit and found that the problem is the depth at which it's looking. Try to recompile tree-ssa-math-opts.c after increasing the limit number in find_bswap_or_nop. Right now the limit will evaluate to 4 and the gimple I have has a depth of 5. I've tried ... limit += 10 + (int) ceil_log2 ((unsigned HOST_WIDE_INT) limit); ... but it doesn't change anything here. Same as the other pattern, can you try to print n-n in hex with this new limit? My guess is that the pattern is now recognized but fails later for the same reason as above. Best regards.
[Bug rtl-optimization/63259] Detecting byteswap sequence
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259 --- Comment #11 from Oleg Endo olegendo at gcc dot gnu.org --- (In reply to thopre01 from comment #10) I have the same gimple and for me the bswap is correctly detected. Can you break at find_bswap_or_nop just after calling find_bswap_or_nop_1 on the if (!source_stmt) and show me the output of p/x n-n ? n-n = 0x0102 limit = 4 For both, test_099 and test_08. Indeed, my mistake. Ok I tested a bit and found that the problem is the depth at which it's looking. Try to recompile tree-ssa-math-opts.c after increasing the limit number in find_bswap_or_nop. Right now the limit will evaluate to 4 and the gimple I have has a depth of 5. I've tried ... limit += 10 + (int) ceil_log2 ((unsigned HOST_WIDE_INT) limit); ... but it doesn't change anything here.
[Bug rtl-optimization/63259] Detecting byteswap sequence
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259 --- Comment #10 from thopre01 at gcc dot gnu.org --- (In reply to Oleg Endo from comment #9) (In reply to thopre01 from comment #7) Strange, could you show the output of -fdump-tree-bswap? Not so strange at all. What is strange is that it should detect such bswap pattern. After looking at the RTL dumps, I've noticed that the swap.b insn is generated by the combine pass. I've got a few combine patterns for matching byte swaps on SH. The pattern for swap.b doesn't combine well with other ops around/on it. -fdump-tree-bswap says: ;; Function test_08 (test_08, funcdef_no=1, decl_uid=1333, cgraph_uid=1, symbol_order=1) test_08 (short unsigned int a, short unsigned int b) { short unsigned int _2; signed short _3; int _4; int _5; signed short _6; signed short _7; short unsigned int _8; short unsigned int _10; bb 2: _2 = a_1(D) 8; _3 = (signed short) _2; _4 = (int) a_1(D); _5 = _4 8; _6 = (signed short) _5; _7 = _3 | _6; _8 = (short unsigned int) _7; _10 = _8 + b_9(D); return _10; } I have the same gimple and for me the bswap is correctly detected. Can you break at find_bswap_or_nop just after calling find_bswap_or_nop_1 on the if (!source_stmt) and show me the output of p/x n-n ? Byte swapping of signed short types seems to be not working: short test_func_111 (short a, short b, short c) { return (((a 0xFF00) 8) | ((a 0xFF) 8)); } exts.w r4,r4 mov r4,r0 shlr8 r0 extu.b r0,r0 shll8 r4 or r0,r4 rts exts.w r4,r0 That's expected. Think about what happens if a = 0x8001. Doing a right shift by 8 bit would give 0xFF80 (due to the most significant bit being 1). The right part of the bitwise OR would give 0x0100 as expected and the result would be 0xFF80, so not a byte swap. It would work with an int though as the highest bit would then be 0, or with unsigned short as a right shift would introduce 0 in the most significant bits. As Andreas mentioned, 'a' is promoted to int, so this should be a byte swap. Indeed, my mistake. Ok I tested a bit and found that the problem is the depth at which it's looking. Try to recompile tree-ssa-math-opts.c after increasing the limit number in find_bswap_or_nop. Right now the limit will evaluate to 4 and the gimple I have has a depth of 5.
[Bug rtl-optimization/63259] Detecting byteswap sequence
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259 --- Comment #6 from Oleg Endo olegendo at gcc dot gnu.org --- With r218705 on SH (-O2 -m4 -ml) I get the following: unsigned short test_099 (unsigned short a, unsigned short b) { return (((a 0xFF00) 8) | ((a 0xFF) 8)); } compiles to: extu.wr4,r4 rts swap.br4,r0 unsigned short test_08 (unsigned short a, unsigned short b) { return b + (((a 0xFF00) 8) | ((a 0xFF) 8)); } compiles to: extu.w r4,r4 mov r4,r0 shll8 r4 shlr8 r0 or r0,r4 add r4,r5 rts extu.w r5,r0 Byte swapping of signed short types seems to be not working: short test_func_111 (short a, short b, short c) { return (((a 0xFF00) 8) | ((a 0xFF) 8)); } exts.w r4,r4 mov r4,r0 shlr8 r0 extu.b r0,r0 shll8 r4 or r0,r4 rts exts.w r4,r0
[Bug rtl-optimization/63259] Detecting byteswap sequence
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259 --- Comment #7 from thopre01 at gcc dot gnu.org --- (In reply to Oleg Endo from comment #6) With r218705 on SH (-O2 -m4 -ml) I get the following: unsigned short test_099 (unsigned short a, unsigned short b) { return (((a 0xFF00) 8) | ((a 0xFF) 8)); } compiles to: extu.wr4,r4 rts swap.br4,r0 This one looks ok except for the zero-extension. Is the swap.b instruction limited to 32 bit values? unsigned short test_08 (unsigned short a, unsigned short b) { return b + (((a 0xFF00) 8) | ((a 0xFF) 8)); } compiles to: extu.w r4,r4 mov r4,r0 shll8 r4 shlr8 r0 or r0,r4 add r4,r5 rts extu.w r5,r0 Strange, could you show the output of -fdump-tree-bswap? Byte swapping of signed short types seems to be not working: short test_func_111 (short a, short b, short c) { return (((a 0xFF00) 8) | ((a 0xFF) 8)); } exts.w r4,r4 mov r4,r0 shlr8 r0 extu.b r0,r0 shll8 r4 or r0,r4 rts exts.w r4,r0 That's expected. Think about what happens if a = 0x8001. Doing a right shift by 8 bit would give 0xFF80 (due to the most significant bit being 1). The right part of the bitwise OR would give 0x0100 as expected and the result would be 0xFF80, so not a byte swap. It would work with an int though as the highest bit would then be 0, or with unsigned short as a right shift would introduce 0 in the most significant bits. Best regards, Thomas
[Bug rtl-optimization/63259] Detecting byteswap sequence
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259 --- Comment #8 from Andreas Schwab sch...@linux-m68k.org --- (a 0xFF00) 8 with short a = 0x8001 evaluates to 0x80, since all operands are first promoted to int.
[Bug rtl-optimization/63259] Detecting byteswap sequence
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259 --- Comment #9 from Oleg Endo olegendo at gcc dot gnu.org --- (In reply to thopre01 from comment #7) (In reply to Oleg Endo from comment #6) With r218705 on SH (-O2 -m4 -ml) I get the following: unsigned short test_099 (unsigned short a, unsigned short b) { return (((a 0xFF00) 8) | ((a 0xFF) 8)); } compiles to: extu.w r4,r4 rts swap.b r4,r0 This one looks ok except for the zero-extension. Is the swap.b instruction limited to 32 bit values? Yes, in sh.md swap.b is defined for SImode values only. But never mind the extu.w, it's an SH ABI thing (PR 52441). unsigned short test_08 (unsigned short a, unsigned short b) { return b + (((a 0xFF00) 8) | ((a 0xFF) 8)); } compiles to: extu.w r4,r4 mov r4,r0 shll8 r4 shlr8 r0 or r0,r4 add r4,r5 rts extu.w r5,r0 Strange, could you show the output of -fdump-tree-bswap? Not so strange at all. After looking at the RTL dumps, I've noticed that the swap.b insn is generated by the combine pass. I've got a few combine patterns for matching byte swaps on SH. The pattern for swap.b doesn't combine well with other ops around/on it. -fdump-tree-bswap says: ;; Function test_08 (test_08, funcdef_no=1, decl_uid=1333, cgraph_uid=1, symbol_order=1) test_08 (short unsigned int a, short unsigned int b) { short unsigned int _2; signed short _3; int _4; int _5; signed short _6; signed short _7; short unsigned int _8; short unsigned int _10; bb 2: _2 = a_1(D) 8; _3 = (signed short) _2; _4 = (int) a_1(D); _5 = _4 8; _6 = (signed short) _5; _7 = _3 | _6; _8 = (short unsigned int) _7; _10 = _8 + b_9(D); return _10; } Byte swapping of signed short types seems to be not working: short test_func_111 (short a, short b, short c) { return (((a 0xFF00) 8) | ((a 0xFF) 8)); } exts.w r4,r4 mov r4,r0 shlr8 r0 extu.b r0,r0 shll8 r4 or r0,r4 rts exts.w r4,r0 That's expected. Think about what happens if a = 0x8001. Doing a right shift by 8 bit would give 0xFF80 (due to the most significant bit being 1). The right part of the bitwise OR would give 0x0100 as expected and the result would be 0xFF80, so not a byte swap. It would work with an int though as the highest bit would then be 0, or with unsigned short as a right shift would introduce 0 in the most significant bits. As Andreas mentioned, 'a' is promoted to int, so this should be a byte swap.
[Bug rtl-optimization/63259] Detecting byteswap sequence
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259 --- Comment #5 from thopre01 at gcc dot gnu.org --- Author: thopre01 Date: Fri Oct 31 11:55:07 2014 New Revision: 216971 URL: https://gcc.gnu.org/viewcvs?rev=216971root=gccview=rev Log: 2014-10-31 Thomas Preud'homme thomas.preudho...@arm.com gcc/ PR tree-optimization/63259 * tree-ssa-math-opts.c (bswap_replace): Replace expression by a rotation left if it is a 16 bit byte swap. (pass_optimize_bswap::execute): Also consider bswap in LROTATE_EXPR and RROTATE_EXPR statements if it is a byte rotation. gcc/testsuite/ PR tree-optimization/63259 * optimize-bswapsi-1.c (swap32_f): New bswap pass test. * optimize-bswaphi-1.c: Drop useless SIType definition and fix typo in following comment. Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c trunk/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c trunk/gcc/tree-ssa-math-opts.c
[Bug rtl-optimization/63259] Detecting byteswap sequence
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259 --- Comment #4 from thopre01 at gcc dot gnu.org --- I detect no noticeable difference when bootstrapping gcc with or without the patch so I think we're in for a fix. :-)
[Bug rtl-optimization/63259] Detecting byteswap sequence
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259 thopre01 at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED --- Comment #3 from thopre01 at gcc dot gnu.org --- The reason is that the last gimple statement of this bswap is a rotation right on the result of a bitwise OR. Right now bswap are searched only from bitwise OR statement doing recursion until the source. Therefore the analysis is only done on a subset of the statements making up this bswap. The fix is easy but needs compilation benchmarking to make sure this doesn't increase the cost of compiling in a noticeable way. The more statement are considered as finishing a bswap, the closer we are to a O(n²) algorithm.
[Bug rtl-optimization/63259] Detecting byteswap sequence
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259 Oleg Endo olegendo at gcc dot gnu.org changed: What|Removed |Added CC||olegendo at gcc dot gnu.org, ||thopre01 at gcc dot gnu.org --- Comment #2 from Oleg Endo olegendo at gcc dot gnu.org --- Thomas just recently did some bswap patterns work, maybe he's got an idea.
[Bug rtl-optimization/63259] Detecting byteswap sequence
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259 Hans-Peter Nilsson hp at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2014-09-19 CC||hp at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Hans-Peter Nilsson hp at gcc dot gnu.org --- Confirmed the general observation; observed (as far back as) 4.7 era and trunk at r215401 for cris-elf (-march=v8 or higher needed).