Re: [Open64-devel] sub-optimal strength reduction bug?

Yiran Wang Thu, 12 Jul 2012 17:17:01 -0700

OK, I see.

My guess is that such propagation is allowed, when the designer/developer
feel that EPRE is strong and robust enough.


As, EPRE should be able to move loop invariant computation out, at least
they may be int the loop at the beginning. Also, after copy-propagation
into the loop, the compiler may find more redundancy in the loop.

-yiran

On Thu, Jul 12, 2012 at 5:04 PM, Sun Chan <[email protected]> wrote:

> the adding of 44 was copy propagated into the loop. there used to be
> code that guards such kind of propagation inside the loop
> Sun
>
> On Fri, Jul 13, 2012 at 7:53 AM, Yiran Wang <[email protected]> wrote:
> > Hi Sun,
> >
> > BTW, the optimal 5 instructions sequence of the loop should be as
> following:
> >
> > addl r4, ....          au = a+b*4;
> >
> > L_loop:
> >
> > addl r1, r2;          x = x+d
> > movl r1, (r3);       *a = x
> > addl r3, 4;           a+=4
> > test r3, r4;           t = a-au
> > jl L_loop;             jump if t < 0
> >
> > Without LFTR, the original IV i can not be removed, say, one more
> > instruction is needed.
> >
> > Best Regards,
> > Yiran
> >
> > On Thu, Jul 12, 2012 at 4:28 PM, Sun Chan <[email protected]> wrote:
> >>
> >> BTW, if x is needed on return, which instruction is redundant?
> >> Sun
> >>
> >> On Fri, Jul 13, 2012 at 7:21 AM, Sun Chan <[email protected]> wrote:
> >> > if x is local and is dead outside of the loop, the add to x should be
> >> > eliminated. if not, it is an alias issue
> >> > Sun
> >> >
> >> > On Fri, Jul 13, 2012 at 7:12 AM, Yiran Wang <[email protected]>
> >> > wrote:
> >> >> Hi Sun,
> >> >>
> >> >> Thanks for your reply.
> >> >>
> >> >> FYI, if we remove the last use of x, there are still 7 instructions.
> >> >>
> >> >> This is just an example, the real cases may not be able to be
> >> >> vectorized. I
> >> >> just use the unroll_times_max option to simplify the output a little
> >> >> bit.
> >> >>
> >> >> Regards,
> >> >> Yiran
> >> >>
> >> >>
> >> >> On Thu, Jul 12, 2012 at 3:56 PM, Sun Chan <[email protected]>
> wrote:
> >> >>>
> >> >>> I'm not too familiar with x86 assembly, but I see at least 6
> >> >>> instructions needed (he has unroll max = 1, so assume he doesn't
> want
> >> >>> vector). Value of x is needed outside of loop.
> >> >>> Sun
> >> >>>
> >> >>> On Fri, Jul 13, 2012 at 6:42 AM, shuxin yang <
> [email protected]>
> >> >>> wrote:
> >> >>> > hi buddy, this loop can be vectorized:
> >> >>> >
> >> >>> >     v_init = <x, x+d, x+2d, x+3d>
> >> >>> >     v_inc = <4d, 4d, 4d, 4d>
> >> >>> >
> >> >>> >     the statement can be vect into:
> >> >>> >     a[0:3] = v_init;
> >> >>> >     v_init += v_inc;
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> > On 07/12/2012 03:28 PM, Yiran Wang wrote:
> >> >>> >
> >> >>> > Hi All,
> >> >>> >
> >> >>> > It looks like strength reduction is not optimal for the following
> >> >>> > example?
> >> >>> >
> >> >>> > 7 instructions per iteration is used, but 4 (or 5 without LFTR)
> are
> >> >>> > necessary.
> >> >>> >
> >> >>> > Best Regards,
> >> >>> > Yiran Wang
> >> >>> >
> >> >>> > bash-4.0$ cat x.c
> >> >>> > int foo(int x, int b, int *__restrict a)
> >> >>> > {
> >> >>> >   int i;
> >> >>> >   int c,d ;
> >> >>> >   c = b*60;
> >> >>> >   d = c+44;
> >> >>> >
> >> >>> >   for (i = 0; i< b; i++)
> >> >>> >   {
> >> >>> >     x = x+d;
> >> >>> >     *a++=x;
> >> >>> >   }
> >> >>> >   return x;
> >> >>> > }
> >> >>> > bash-4.0$ /opt/open64tr/bin/opencc -c -O3 -keep x.c
> >> >>> > -Wb,-trlow,-tt25:0xffffffff -OPT:unroll_times_max=1
> -march=barcelona
> >> >>> > bash-4.0$ cat x.s
> >> >>> > #  /opt/open64tr/lib/gcc-lib/x86_64-open64-linux/5.0/be::5.0
> >> >>> >
> >> >>> > #-----------------------------------------------------------
> >> >>> > # Compiling x.c (x.I)
> >> >>> > #-----------------------------------------------------------
> >> >>> >
> >> >>> > #-----------------------------------------------------------
> >> >>> > # Options:
> >> >>> > #-----------------------------------------------------------
> >> >>> > #  Target:Barcelona, ISA:ISA_1, Endian:little, Pointer Size:32
> >> >>> > #  -O3 (Optimization level)
> >> >>> > #  -g0 (Debug level)
> >> >>> > #  -m2 (Report advisories)
> >> >>> > #-----------------------------------------------------------
> >> >>> >
> >> >>> > int foo(int x, int b, int *__restrict a)
> >> >>> > {
> >> >>> >   int i;
> >> >>> >   int c,d ;
> >> >>> >   c = b*60;
> >> >>> >   d = c+44;
> >> >>> >
> >> >>> >   for (i = 0; i< b; i++)
> >> >>> >   {
> >> >>> >     x = x+d;
> >> >>> >     *a++=x;
> >> >>> >   }
> >> >>> >   return x;
> >> >>> > }
> >> >>> > bash-4.0$ /opt/open64tr/bin/opencc -c -O3 -keep x.c
> >> >>> > -Wb,-trlow,-tt25:0xffffffff -OPT:unroll_times_max=1
> -march=barcelona
> >> >>> > bash-4.0$ cat x.s
> >> >>> > #  /opt/open64tr/lib/gcc-lib/x86_64-open64-linux/5.0/be::5.0
> >> >>> >
> >> >>> > #-----------------------------------------------------------
> >> >>> > # Compiling x.c (x.I)
> >> >>> > #-----------------------------------------------------------
> >> >>> >
> >> >>> > #-----------------------------------------------------------
> >> >>> > # Options:
> >> >>> > #-----------------------------------------------------------
> >> >>> > #  Target:Barcelona, ISA:ISA_1, Endian:little, Pointer Size:32
> >> >>> > #  -O3 (Optimization level)
> >> >>> > #  -g0 (Debug level)
> >> >>> > #  -m2 (Report advisories)
> >> >>> > #-----------------------------------------------------------
> >> >>> >
> >> >>> > .text
> >> >>> > .align 2
> >> >>> > .section .text
> >> >>> > .p2align 5,,
> >> >>> >
> >> >>> > # Program Unit: foo
> >> >>> > .globl foo
> >> >>> > .type foo, @function
> >> >>> > foo: # 0x0
> >> >>> > # .frame %esp, 16, %esp
> >> >>> > # _temp_gra_spill0 = 0
> >> >>> > .loc 1 2 0
> >> >>> >  #   1  int foo(int x, int b, int *__restrict a)
> >> >>> >  #   2  {
> >> >>> > .LBB1_foo:
> >> >>> > pushl %ebp                     # [0]
> >> >>> > pushl %ebx                     # [3]
> >> >>> > pushl %edi                     # [6]
> >> >>> > addl $-16,%esp                 # [9]
> >> >>> > movl 36(%esp),%edi             # [10] b
> >> >>> > leal -1(%edi),%eax             # [13]
> >> >>> > testl %eax,%eax               # [14]
> >> >>> > jl .Lt_0_2818                 # [15]
> >> >>> > .LBB2_foo:
> >> >>> > movl %edi,%ebp                 # [0]
> >> >>> > .loc 1 8 0
> >> >>> >  #   4    int c,d ;
> >> >>> >  #   5    c = b*60;
> >> >>> >  #   6    d = c+44;
> >> >>> >  #   7
> >> >>> >  #   8    for (i = 0; i< b; i++)
> >> >>> > movl %edi,%ecx                 # [0]
> >> >>> > movl 32(%esp),%ebx             # [0] x
> >> >>> > movl %ecx,0(%esp)             # [1] _temp_gra_spill0
> >> >>> > imull $60,%ebp                 # [1]
> >> >>> > movl 40(%esp),%eax             # [1] a
> >> >>> > xorl %edx,%edx                 # [2]
> >> >>> > .p2align 5,,31
> >> >>> > .Lt_0_3586:
> >> >>> >  #<loop> Loop body line 8, nesting depth: 1, estimated iterations:
> >> >>> > 1000
> >> >>> > .loc 1 11 0
> >> >>> >  #   9    {
> >> >>> >  #  10      x = x+d;
> >> >>> >  #  11      *a++=x;
> >> >>> > addl $1,%edx                   # [0]
> >> >>> > .loc 1 10 0
> >> >>> > addl %ebp,%ebx                 # [0]
> >> >>> > .loc 1 11 0
> >> >>> > addl $4,%eax                   # [0]
> >> >>> > .loc 1 10 0
> >> >>> > addl $44,%ebx                 # [1]
> >> >>> > .loc 1 11 0
> >> >>> > cmpl %edi,%edx                 # [1]
> >> >>> > movl %ebx,-4(%eax)             # [2] id:17
> >> >>> > jl .Lt_0_3586                 # [2]
> >> >>> > .Lt_0_4098:
> >> >>> > .loc 1 13 0
> >> >>> >  #  12    }
> >> >>> >  #  13    return x;
> >> >>> > movl %ebx,%eax                 # [0]
> >> >>> > addl $16,%esp                 # [0]
> >> >>> > popl %edi                     # [1]
> >> >>> > popl %ebx                     # [4]
> >> >>> > popl %ebp                     # [7]
> >> >>> > ret                           # [7]
> >> >>> > .p2align 5,,31
> >> >>> > .Lt_0_2818:
> >> >>> > .loc 1 11 0
> >> >>> > movl 32(%esp),%eax             # [0] x
> >> >>> > .loc 1 13 0
> >> >>> > addl $16,%esp                 # [0]
> >> >>> > popl %edi                     # [1]
> >> >>> > popl %ebx                     # [4]
> >> >>> > popl %ebp                     # [7]
> >> >>> > ret                           # [7]
> >> >>> > .LDWend_foo:
> >> >>> > .size foo, .LDWend_foo-foo
> >> >>> > .section .text
> >> >>> > .align 4
> >> >>> >
> >> >>> > .section .eh_frame, "a",@progbits
> >> >>> > .LEHCIE:
> >> >>> > .4byte .LEHCIE_end - .LEHCIE_begin
> >> >>> > .LEHCIE_begin:
> >> >>> > .4byte 0x0
> >> >>> > .byte 0x01, 0x00, 0x01, 0x7c, 0x08, 0x0c, 0x04, 0x04
> >> >>> > .byte 0x88, 0x01
> >> >>> > .align 4
> >> >>> > .LEHCIE_end:
> >> >>> >
> >> >>> > .section .debug_line, ""
> >> >>> > .section .note.GNU-stack,"",@progbits
> >> >>> > .ident "#Open64 Compiler Version 5.0 : x.c compiled with : -O3
> >> >>> > -OPT:unroll_times_max=1 -march=barcelona -msse2 -msse3 -mno-3dnow
> >> >>> > -mno-sse4a
> >> >>> > -mno-ssse3 -mno-sse41 -mno-sse42 -mno-aes -mno-pclmul -mno-avx
> >> >>> > -mno-xop
> >> >>> > -mno-fma -mno-fma4 -m32"
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> > .text
> >> >>> > .align 2
> >> >>> > .section .text
> >> >>> > .p2align 5,,
> >> >>> >
> >> >>> > # Program Unit: foo
> >> >>> > .globl foo
> >> >>> > .type foo, @function
> >> >>> > foo: # 0x0
> >> >>> > # .frame %esp, 16, %esp
> >> >>> > # _temp_gra_spill0 = 0
> >> >>> > .loc 1 2 0
> >> >>> >  #   1  int foo(int x, int b, int *__restrict a)
> >> >>> >  #   2  {
> >> >>> > .LBB1_foo:
> >> >>> > pushl %ebp                     # [0]
> >> >>> > pushl %ebx                     # [3]
> >> >>> > pushl %edi                     # [6]
> >> >>> > addl $-16,%esp                 # [9]
> >> >>> > movl 36(%esp),%edi             # [10] b
> >> >>> > leal -1(%edi),%eax             # [13]
> >> >>> > testl %eax,%eax               # [14]
> >> >>> > jl .Lt_0_2818                 # [15]
> >> >>> > .LBB2_foo:
> >> >>> > movl %edi,%ebp                 # [0]
> >> >>> > .loc 1 8 0
> >> >>> >  #   4    int c,d ;
> >> >>> >  #   5    c = b*60;
> >> >>> >  #   6    d = c+44;
> >> >>> >  #   7
> >> >>> >  #   8    for (i = 0; i< b; i++)
> >> >>> > movl %edi,%ecx                 # [0]
> >> >>> > movl 32(%esp),%ebx             # [0] x
> >> >>> > movl %ecx,0(%esp)             # [1] _temp_gra_spill0
> >> >>> > imull $60,%ebp                 # [1]
> >> >>> > movl 40(%esp),%eax             # [1] a
> >> >>> > xorl %edx,%edx                 # [2]
> >> >>> > .p2align 5,,31
> >> >>> > .Lt_0_3586:
> >> >>> >  #<loop> Loop body line 8, nesting depth: 1, estimated iterations:
> >> >>> > 1000
> >> >>> > .loc 1 11 0
> >> >>> >  #   9    {
> >> >>> >  #  10      x = x+d;
> >> >>> >  #  11      *a++=x;
> >> >>> > addl $1,%edx                   # [0]
> >> >>> > .loc 1 10 0
> >> >>> > addl %ebp,%ebx                 # [0]
> >> >>> > .loc 1 11 0
> >> >>> > addl $4,%eax                   # [0]
> >> >>> > .loc 1 10 0
> >> >>> > addl $44,%ebx                 # [1]
> >> >>> > .loc 1 11 0
> >> >>> > cmpl %edi,%edx                 # [1]
> >> >>> > movl %ebx,-4(%eax)             # [2] id:17
> >> >>> > jl .Lt_0_3586                 # [2]
> >> >>> > .Lt_0_4098:
> >> >>> > .loc 1 13 0
> >> >>> >  #  12    }
> >> >>> >  #  13    return x;
> >> >>> > movl %ebx,%eax                 # [0]
> >> >>> > addl $16,%esp                 # [0]
> >> >>> > popl %edi                     # [1]
> >> >>> > popl %ebx                     # [4]
> >> >>> > popl %ebp                     # [7]
> >> >>> > ret                           # [7]
> >> >>> > .p2align 5,,31
> >> >>> > .Lt_0_2818:
> >> >>> > .loc 1 11 0
> >> >>> > movl 32(%esp),%eax             # [0] x
> >> >>> > .loc 1 13 0
> >> >>> > addl $16,%esp                 # [0]
> >> >>> > popl %edi                     # [1]
> >> >>> > popl %ebx                     # [4]
> >> >>> > popl %ebp                     # [7]
> >> >>> > ret                           # [7]
> >> >>> > .LDWend_foo:
> >> >>> > .size foo, .LDWend_foo-foo
> >> >>> > .section .text
> >> >>> > .align 4
> >> >>> >
> >> >>> > .section .eh_frame, "a",@progbits
> >> >>> > .LEHCIE:
> >> >>> > .4byte .LEHCIE_end - .LEHCIE_begin
> >> >>> > .LEHCIE_begin:
> >> >>> > .4byte 0x0
> >> >>> > .byte 0x01, 0x00, 0x01, 0x7c, 0x08, 0x0c, 0x04, 0x04
> >> >>> > .byte 0x88, 0x01
> >> >>> > .align 4
> >> >>> > .LEHCIE_end:
> >> >>> >
> >> >>> > .section .debug_line, ""
> >> >>> > .section .note.GNU-stack,"",@progbits
> >> >>> > .ident "#Open64 Compiler Version 5.0 : x.c compiled with : -O3
> >> >>> > -OPT:unroll_times_max=1 -march=barcelona -msse2 -msse3 -mno-3dnow
> >> >>> > -mno-sse4a
> >> >>> > -mno-ssse3 -mno-sse41 -mno-sse42 -mno-aes -mno-pclmul -mno-avx
> >> >>> > -mno-xop
> >> >>> > -mno-fma -mno-fma4 -m32"
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> >
> ------------------------------------------------------------------------------
> >> >>> > Live Security Virtual Conference
> >> >>> > Exclusive live event will cover all the ways today's security and
> >> >>> > threat landscape has changed and how IT managers can respond.
> >> >>> > Discussions
> >> >>> > will include endpoint security, mobile security and the latest in
> >> >>> > malware
> >> >>> > threats.
> http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> > _______________________________________________
> >> >>> > Open64-devel mailing list
> >> >>> > [email protected]
> >> >>> > https://lists.sourceforge.net/lists/listinfo/open64-devel
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> >
> ------------------------------------------------------------------------------
> >> >>> > Live Security Virtual Conference
> >> >>> > Exclusive live event will cover all the ways today's security and
> >> >>> > threat landscape has changed and how IT managers can respond.
> >> >>> > Discussions
> >> >>> > will include endpoint security, mobile security and the latest in
> >> >>> > malware
> >> >>> > threats.
> http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> >> >>> > _______________________________________________
> >> >>> > Open64-devel mailing list
> >> >>> > [email protected]
> >> >>> > https://lists.sourceforge.net/lists/listinfo/open64-devel
> >> >>> >
> >> >>
> >> >>
> >
> >
>

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/

_______________________________________________
Open64-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/open64-devel

Re: [Open64-devel] sub-optimal strength reduction bug?

Reply via email to