x86 gcc lacks simple optimization

2013-12-06 Thread Konstantin Vladimirov
Hi, Consider code: int foo(char *t, char *v, int w) { int i; for (i = 1; i != w; ++i) { int x = i << 2; v[x + 4] = t[x + 4]; } return 0; } Compile it to x86 (I used both gcc 4.7.2 and gcc 4.8.1) with options: gcc -O2 -m32 -S test.c You will see loop, formed like: .L5: leal 0(,%eax,4), %edx

Re: x86 gcc lacks simple optimization

2013-12-06 Thread David Brown
On 06/12/13 09:30, Konstantin Vladimirov wrote: Hi, Consider code: int foo(char *t, char *v, int w) { int i; for (i = 1; i != w; ++i) { int x = i << 2; v[x + 4] = t[x + 4]; } return 0; } Compile it to x86 (I used both gcc 4.7.2 and gcc 4.8.1) with options: gcc -O2 -m32 -S

Re: x86 gcc lacks simple optimization

2013-12-06 Thread Konstantin Vladimirov
Hi, Example from x86 code was only for ease of reproduction. I am pretty sure, this is architecture-independent issue. Say on ARM: .L2: mov ip, r3, asl #2 add ip, ip, #4 add r3, r3, #1 ldrb r4, [r0, ip] @ zero_extendqisi2 cmp r3, r2 strb r4, [r1, ip] bne .L2 May be improved to: .L2: add r3,
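The improvement being asked for amounts to strength reduction of the index expression: instead of recomputing `(i << 2) + 4` from `i` on every iteration, the offset is kept in its own variable and advanced by 4. A minimal hand-written sketch in C follows; it is not GCC output, and the names `foo_reduced` and `same_result` are hypothetical. It assumes `w >= 1` and buffers large enough for the largest offset.

```c
#include <assert.h>
#include <string.h>

/* Original loop from the thread: the offset is recomputed from i each time. */
int foo_shift(char *t, char *v, int w)
{
    int i;
    for (i = 1; i != w; ++i) {
        int x = i << 2;
        v[x + 4] = t[x + 4];
    }
    return 0;
}

/* Hand-written equivalent (hypothetical name): the offset 4*i + 4 lives in
   its own variable and is bumped by 4 per iteration, so no shift is needed. */
int foo_reduced(char *t, char *v, int w)
{
    int i;
    int off = 8;                /* (1 << 2) + 4 for the first iteration */
    for (i = 1; i != w; ++i) {
        v[off] = t[off];
        off += 4;
    }
    return 0;
}

/* Returns 1 if both versions write the same bytes for a small w. */
int same_result(int w)
{
    char src[256], a[256], b[256];
    int k;

    /* Largest offset touched is 4*w, which must stay inside the buffers. */
    assert(w >= 1 && w <= 63);

    for (k = 0; k < 256; ++k)
        src[k] = (char)(k % 100 + 1);
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);

    foo_shift(src, a, w);
    foo_reduced(src, b, w);
    return memcmp(a, b, sizeof a) == 0;
}
```

Calling `same_result(10)` compares the two loops on a small input. Whether a compiler performs this rewrite on its own depends on its induction-variable passes, which is the question the thread is about.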

Re: x86 gcc lacks simple optimization

2013-12-06 Thread Jakub Jelinek
On Fri, Dec 06, 2013 at 12:30:54PM +0400, Konstantin Vladimirov wrote: Consider code: int foo(char *t, char *v, int w) { int i; for (i = 1; i != w; ++i) { int x = i << 2; v[x + 4] = t[x + 4]; } return 0; } This is either job of ivopts pass, dunno why it doesn't consider turning

Re: x86 gcc lacks simple optimization

2013-12-06 Thread Richard Biener
On Fri, Dec 6, 2013 at 9:30 AM, Konstantin Vladimirov konstantin.vladimi...@gmail.com wrote: Hi, Consider code: int foo(char *t, char *v, int w) { int i; for (i = 1; i != w; ++i) { int x = i << 2; v[x + 4] = t[x + 4]; } return 0; } Compile it to x86 (I used both gcc 4.7.2 and gcc

Re: x86 gcc lacks simple optimization

2013-12-06 Thread Konstantin Vladimirov
Hi, nothing changes if everything is unsigned and we are guaranteed to not raise UB on overflow: unsigned foo(unsigned char *t, unsigned char *v, unsigned w) { unsigned i; for (i = 1; i != w; ++i) { unsigned x = i << 2; v[x + 4] = t[x + 4]; } return 0; } yields: .L5: leal 0(,%eax,4), %edx addl

Re: x86 gcc lacks simple optimization

2013-12-06 Thread Richard Biener
On Fri, Dec 6, 2013 at 11:19 AM, Konstantin Vladimirov konstantin.vladimi...@gmail.com wrote: Hi, nothing changes if everything is unsigned and we are guaranteed to not raise UB on overflow: unsigned foo(unsigned char *t, unsigned char *v, unsigned w) { unsigned i; for (i = 1; i != w;

Re: x86 gcc lacks simple optimization

2013-12-06 Thread H.J. Lu
On Fri, Dec 6, 2013 at 2:25 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Dec 6, 2013 at 11:19 AM, Konstantin Vladimirov konstantin.vladimi...@gmail.com wrote: Hi, nothing changes if everything is unsigned and we are guaranteed to not raise UB on overflow: unsigned

Re: x86 gcc lacks simple optimization

2013-12-06 Thread Marc Glisse
On Fri, 6 Dec 2013, Konstantin Vladimirov wrote: Consider code: int foo(char *t, char *v, int w) { int i; for (i = 1; i != w; ++i) { int x = i << 2; A side note, but something too few people seem to be aware of: writing i<<2 can pessimize code compared to i*4 (and it is never faster). That is
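As a small illustration of why the two spellings are interchangeable in source code, the sketch below checks that `i << 2` and `i * 4` produce the same value for non-negative `i` small enough that neither overflows. It does not measure or demonstrate the code-generation difference mentioned in the message; the helper names `scale_shift`, `scale_mul` and `forms_agree` are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* The two spellings discussed in the message above. */
int scale_shift(int i) { return i << 2; }
int scale_mul(int i)   { return i * 4; }

/* Returns 1 if both spellings agree for every i in [0, limit].
   The limit is capped so that neither expression can overflow int. */
int forms_agree(int limit)
{
    int i;

    assert(limit >= 0 && limit <= INT_MAX / 4);

    for (i = 0; i <= limit; ++i) {
        if (scale_shift(i) != scale_mul(i))
            return 0;
    }
    return 1;
}
```

Since compilers routinely turn a multiplication by a power of two into a shift themselves, writing `i * 4` costs nothing and leaves the optimizer with the arithmetic form.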

Re: x86 gcc lacks simple optimization

2013-12-06 Thread Konstantin Vladimirov
Hi, Richard, I tried to add LSHIFT_EXPR case to tree-scalar-evolution.c and now it yields code like (x86 again): .L5: movzbl 4(%esi,%eax,4), %edx movb %dl, 4(%ebx,%eax,4) addl $1, %eax cmpl %ecx, %eax jne .L5 So, excessive lea is gone. It is great, thank you so much. But I wonder what else can
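For readers unfamiliar with scalar evolution: an induction variable is described as an affine recurrence, conventionally written {base, +, step}. Under that notation `i` is {1, +, 1}; once a left shift by 2 is treated like a multiplication by 4, `i << 2` becomes {4, +, 4} and `x + 4` becomes {8, +, 4}, which is what lets the address be folded into the addressing mode. The C sketch below only models this arithmetic. The type `affine_rec` and its helpers are hypothetical and are not GCC internals, and the model ignores wrap-around, which a real compiler has to reason about.

```c
#include <assert.h>

/* Hypothetical model of an affine recurrence {base, +, step}. */
struct affine_rec {
    long base;
    long step;
};

/* Value of the recurrence at iteration n, counting from 0. */
long rec_at(struct affine_rec r, long n)
{
    return r.base + r.step * n;
}

/* Left shift by a constant k, handled as multiplication by 2^k. */
struct affine_rec rec_shl(struct affine_rec r, unsigned k)
{
    struct affine_rec out;
    out.base = r.base * (1L << k);
    out.step = r.step * (1L << k);
    return out;
}

/* Adding a loop-invariant constant only moves the base. */
struct affine_rec rec_add_const(struct affine_rec r, long c)
{
    struct affine_rec out;
    out.base = r.base + c;
    out.step = r.step;
    return out;
}

/* Returns 1 if the recurrence for (i << 2) + 4 matches the value computed
   directly in the loop, for iterations 0..iterations with i starting at 1. */
int offsets_match(int iterations)
{
    struct affine_rec i_rec = { 1, 1 };
    struct affine_rec off = rec_add_const(rec_shl(i_rec, 2), 4);
    long n;

    assert(off.base == 8 && off.step == 4);

    for (n = 0; n <= iterations; ++n) {
        long i = 1 + n;
        if (((i << 2) + 4) != rec_at(off, n))
            return 0;
    }
    return 1;
}
```

`offsets_match(1000)` confirms that the closed form {8, +, 4} reproduces the offsets the loop computes for the first thousand iterations.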

Re: x86 gcc lacks simple optimization

2013-12-06 Thread Richard Biener
On Fri, Dec 6, 2013 at 2:52 PM, Konstantin Vladimirov konstantin.vladimi...@gmail.com wrote: Hi, Richard, I tried to add LSHIFT_EXPR case to tree-scalar-evolution.c and now it yields code like (x86 again): .L5: movzbl 4(%esi,%eax,4), %edx movb %dl, 4(%ebx,%eax,4) addl $1, %eax cmpl %ecx,

Re: x86 gcc lacks simple optimization

2013-12-06 Thread Jeff Law
On 12/06/13 07:17, Richard Biener wrote: On Fri, Dec 6, 2013 at 2:52 PM, Konstantin Vladimirov konstantin.vladimi...@gmail.com wrote: Hi, Richard, I tried to add LSHIFT_EXPR case to tree-scalar-evolution.c and now it yields code like (x86 again): .L5: movzbl 4(%esi,%eax,4), %edx movb %dl,