OK, I see.
My guess is that such propagation is allowed, when the designer/developer
feel that EPRE is strong and robust enough.
As, EPRE should be able to move loop invariant computation out, at least
they may be int the loop at the beginning. Also, after copy-propagation
into the loop, the compiler may find more redundancy in the loop.
-yiran
On Thu, Jul 12, 2012 at 5:04 PM, Sun Chan <sun.c...@gmail.com> wrote:
> the adding of 44 was copy propagated into the loop. there used to be
> code that guards such kind of propagation inside the loop
> Sun
>
> On Fri, Jul 13, 2012 at 7:53 AM, Yiran Wang <yiran.w...@gmail.com> wrote:
> > Hi Sun,
> >
> > BTW, the optimal 5 instructions sequence of the loop should be as
> following:
> >
> > addl r4, .... au = a+b*4;
> >
> > L_loop:
> >
> > addl r1, r2; x = x+d
> > movl r1, (r3); *a = x
> > addl r3, 4; a+=4
> > test r3, r4; t = a-au
> > jl L_loop; jump if t < 0
> >
> > Without LFTR, the original IV i can not be removed, say, one more
> > instruction is needed.
> >
> > Best Regards,
> > Yiran
> >
> > On Thu, Jul 12, 2012 at 4:28 PM, Sun Chan <sun.c...@gmail.com> wrote:
> >>
> >> BTW, if x is needed on return, which instruction is redundant?
> >> Sun
> >>
> >> On Fri, Jul 13, 2012 at 7:21 AM, Sun Chan <sun.c...@gmail.com> wrote:
> >> > if x is local and is dead outside of the loop, the add to x should be
> >> > eliminated. if not, it is an alias issue
> >> > Sun
> >> >
> >> > On Fri, Jul 13, 2012 at 7:12 AM, Yiran Wang <yiran.w...@gmail.com>
> >> > wrote:
> >> >> Hi Sun,
> >> >>
> >> >> Thanks for your reply.
> >> >>
> >> >> FYI, if we remove the last use of x, there are still 7 instructions.
> >> >>
> >> >> This is just an example, the real cases may not be able to be
> >> >> vectorized. I
> >> >> just use the unroll_times_max option to simplify the output a little
> >> >> bit.
> >> >>
> >> >> Regards,
> >> >> Yiran
> >> >>
> >> >>
> >> >> On Thu, Jul 12, 2012 at 3:56 PM, Sun Chan <sun.c...@gmail.com>
> wrote:
> >> >>>
> >> >>> I'm not too familiar with x86 assembly, but I see at least 6
> >> >>> instructions needed (he has unroll max = 1, so assume he doesn't
> want
> >> >>> vector). Value of x is needed outside of loop.
> >> >>> Sun
> >> >>>
> >> >>> On Fri, Jul 13, 2012 at 6:42 AM, shuxin yang <
> shuxin.ope...@gmail.com>
> >> >>> wrote:
> >> >>> > hi buddy, this loop can be vectorized:
> >> >>> >
> >> >>> > v_init = <x, x+d, x+2d, x+3d>
> >> >>> > v_inc = <4d, 4d, 4d, 4d>
> >> >>> >
> >> >>> > the statement can be vect into:
> >> >>> > a[0:3] = v_init;
> >> >>> > v_init += v_inc;
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> > On 07/12/2012 03:28 PM, Yiran Wang wrote:
> >> >>> >
> >> >>> > Hi All,
> >> >>> >
> >> >>> > It looks like strength reduction is not optimal for the following
> >> >>> > example?
> >> >>> >
> >> >>> > 7 instructions per iteration is used, but 4 (or 5 without LFTR)
> are
> >> >>> > necessary.
> >> >>> >
> >> >>> > Best Regards,
> >> >>> > Yiran Wang
> >> >>> >
> >> >>> > bash-4.0$ cat x.c
> >> >>> > int foo(int x, int b, int *__restrict a)
> >> >>> > {
> >> >>> > int i;
> >> >>> > int c,d ;
> >> >>> > c = b*60;
> >> >>> > d = c+44;
> >> >>> >
> >> >>> > for (i = 0; i< b; i++)
> >> >>> > {
> >> >>> > x = x+d;
> >> >>> > *a++=x;
> >> >>> > }
> >> >>> > return x;
> >> >>> > }
> >> >>> > bash-4.0$ /opt/open64tr/bin/opencc -c -O3 -keep x.c
> >> >>> > -Wb,-trlow,-tt25:0xffffffff -OPT:unroll_times_max=1
> -march=barcelona
> >> >>> > bash-4.0$ cat x.s
> >> >>> > # /opt/open64tr/lib/gcc-lib/x86_64-open64-linux/5.0/be::5.0
> >> >>> >
> >> >>> > #-----------------------------------------------------------
> >> >>> > # Compiling x.c (x.I)
> >> >>> > #-----------------------------------------------------------
> >> >>> >
> >> >>> > #-----------------------------------------------------------
> >> >>> > # Options:
> >> >>> > #-----------------------------------------------------------
> >> >>> > # Target:Barcelona, ISA:ISA_1, Endian:little, Pointer Size:32
> >> >>> > # -O3 (Optimization level)
> >> >>> > # -g0 (Debug level)
> >> >>> > # -m2 (Report advisories)
> >> >>> > #-----------------------------------------------------------
> >> >>> >
> >> >>> > int foo(int x, int b, int *__restrict a)
> >> >>> > {
> >> >>> > int i;
> >> >>> > int c,d ;
> >> >>> > c = b*60;
> >> >>> > d = c+44;
> >> >>> >
> >> >>> > for (i = 0; i< b; i++)
> >> >>> > {
> >> >>> > x = x+d;
> >> >>> > *a++=x;
> >> >>> > }
> >> >>> > return x;
> >> >>> > }
> >> >>> > bash-4.0$ /opt/open64tr/bin/opencc -c -O3 -keep x.c
> >> >>> > -Wb,-trlow,-tt25:0xffffffff -OPT:unroll_times_max=1
> -march=barcelona
> >> >>> > bash-4.0$ cat x.s
> >> >>> > # /opt/open64tr/lib/gcc-lib/x86_64-open64-linux/5.0/be::5.0
> >> >>> >
> >> >>> > #-----------------------------------------------------------
> >> >>> > # Compiling x.c (x.I)
> >> >>> > #-----------------------------------------------------------
> >> >>> >
> >> >>> > #-----------------------------------------------------------
> >> >>> > # Options:
> >> >>> > #-----------------------------------------------------------
> >> >>> > # Target:Barcelona, ISA:ISA_1, Endian:little, Pointer Size:32
> >> >>> > # -O3 (Optimization level)
> >> >>> > # -g0 (Debug level)
> >> >>> > # -m2 (Report advisories)
> >> >>> > #-----------------------------------------------------------
> >> >>> >
> >> >>> > .text
> >> >>> > .align 2
> >> >>> > .section .text
> >> >>> > .p2align 5,,
> >> >>> >
> >> >>> > # Program Unit: foo
> >> >>> > .globl foo
> >> >>> > .type foo, @function
> >> >>> > foo: # 0x0
> >> >>> > # .frame %esp, 16, %esp
> >> >>> > # _temp_gra_spill0 = 0
> >> >>> > .loc 1 2 0
> >> >>> > # 1 int foo(int x, int b, int *__restrict a)
> >> >>> > # 2 {
> >> >>> > .LBB1_foo:
> >> >>> > pushl %ebp # [0]
> >> >>> > pushl %ebx # [3]
> >> >>> > pushl %edi # [6]
> >> >>> > addl $-16,%esp # [9]
> >> >>> > movl 36(%esp),%edi # [10] b
> >> >>> > leal -1(%edi),%eax # [13]
> >> >>> > testl %eax,%eax # [14]
> >> >>> > jl .Lt_0_2818 # [15]
> >> >>> > .LBB2_foo:
> >> >>> > movl %edi,%ebp # [0]
> >> >>> > .loc 1 8 0
> >> >>> > # 4 int c,d ;
> >> >>> > # 5 c = b*60;
> >> >>> > # 6 d = c+44;
> >> >>> > # 7
> >> >>> > # 8 for (i = 0; i< b; i++)
> >> >>> > movl %edi,%ecx # [0]
> >> >>> > movl 32(%esp),%ebx # [0] x
> >> >>> > movl %ecx,0(%esp) # [1] _temp_gra_spill0
> >> >>> > imull $60,%ebp # [1]
> >> >>> > movl 40(%esp),%eax # [1] a
> >> >>> > xorl %edx,%edx # [2]
> >> >>> > .p2align 5,,31
> >> >>> > .Lt_0_3586:
> >> >>> > #<loop> Loop body line 8, nesting depth: 1, estimated iterations:
> >> >>> > 1000
> >> >>> > .loc 1 11 0
> >> >>> > # 9 {
> >> >>> > # 10 x = x+d;
> >> >>> > # 11 *a++=x;
> >> >>> > addl $1,%edx # [0]
> >> >>> > .loc 1 10 0
> >> >>> > addl %ebp,%ebx # [0]
> >> >>> > .loc 1 11 0
> >> >>> > addl $4,%eax # [0]
> >> >>> > .loc 1 10 0
> >> >>> > addl $44,%ebx # [1]
> >> >>> > .loc 1 11 0
> >> >>> > cmpl %edi,%edx # [1]
> >> >>> > movl %ebx,-4(%eax) # [2] id:17
> >> >>> > jl .Lt_0_3586 # [2]
> >> >>> > .Lt_0_4098:
> >> >>> > .loc 1 13 0
> >> >>> > # 12 }
> >> >>> > # 13 return x;
> >> >>> > movl %ebx,%eax # [0]
> >> >>> > addl $16,%esp # [0]
> >> >>> > popl %edi # [1]
> >> >>> > popl %ebx # [4]
> >> >>> > popl %ebp # [7]
> >> >>> > ret # [7]
> >> >>> > .p2align 5,,31
> >> >>> > .Lt_0_2818:
> >> >>> > .loc 1 11 0
> >> >>> > movl 32(%esp),%eax # [0] x
> >> >>> > .loc 1 13 0
> >> >>> > addl $16,%esp # [0]
> >> >>> > popl %edi # [1]
> >> >>> > popl %ebx # [4]
> >> >>> > popl %ebp # [7]
> >> >>> > ret # [7]
> >> >>> > .LDWend_foo:
> >> >>> > .size foo, .LDWend_foo-foo
> >> >>> > .section .text
> >> >>> > .align 4
> >> >>> >
> >> >>> > .section .eh_frame, "a",@progbits
> >> >>> > .LEHCIE:
> >> >>> > .4byte .LEHCIE_end - .LEHCIE_begin
> >> >>> > .LEHCIE_begin:
> >> >>> > .4byte 0x0
> >> >>> > .byte 0x01, 0x00, 0x01, 0x7c, 0x08, 0x0c, 0x04, 0x04
> >> >>> > .byte 0x88, 0x01
> >> >>> > .align 4
> >> >>> > .LEHCIE_end:
> >> >>> >
> >> >>> > .section .debug_line, ""
> >> >>> > .section .note.GNU-stack,"",@progbits
> >> >>> > .ident "#Open64 Compiler Version 5.0 : x.c compiled with : -O3
> >> >>> > -OPT:unroll_times_max=1 -march=barcelona -msse2 -msse3 -mno-3dnow
> >> >>> > -mno-sse4a
> >> >>> > -mno-ssse3 -mno-sse41 -mno-sse42 -mno-aes -mno-pclmul -mno-avx
> >> >>> > -mno-xop
> >> >>> > -mno-fma -mno-fma4 -m32"
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> > .text
> >> >>> > .align 2
> >> >>> > .section .text
> >> >>> > .p2align 5,,
> >> >>> >
> >> >>> > # Program Unit: foo
> >> >>> > .globl foo
> >> >>> > .type foo, @function
> >> >>> > foo: # 0x0
> >> >>> > # .frame %esp, 16, %esp
> >> >>> > # _temp_gra_spill0 = 0
> >> >>> > .loc 1 2 0
> >> >>> > # 1 int foo(int x, int b, int *__restrict a)
> >> >>> > # 2 {
> >> >>> > .LBB1_foo:
> >> >>> > pushl %ebp # [0]
> >> >>> > pushl %ebx # [3]
> >> >>> > pushl %edi # [6]
> >> >>> > addl $-16,%esp # [9]
> >> >>> > movl 36(%esp),%edi # [10] b
> >> >>> > leal -1(%edi),%eax # [13]
> >> >>> > testl %eax,%eax # [14]
> >> >>> > jl .Lt_0_2818 # [15]
> >> >>> > .LBB2_foo:
> >> >>> > movl %edi,%ebp # [0]
> >> >>> > .loc 1 8 0
> >> >>> > # 4 int c,d ;
> >> >>> > # 5 c = b*60;
> >> >>> > # 6 d = c+44;
> >> >>> > # 7
> >> >>> > # 8 for (i = 0; i< b; i++)
> >> >>> > movl %edi,%ecx # [0]
> >> >>> > movl 32(%esp),%ebx # [0] x
> >> >>> > movl %ecx,0(%esp) # [1] _temp_gra_spill0
> >> >>> > imull $60,%ebp # [1]
> >> >>> > movl 40(%esp),%eax # [1] a
> >> >>> > xorl %edx,%edx # [2]
> >> >>> > .p2align 5,,31
> >> >>> > .Lt_0_3586:
> >> >>> > #<loop> Loop body line 8, nesting depth: 1, estimated iterations:
> >> >>> > 1000
> >> >>> > .loc 1 11 0
> >> >>> > # 9 {
> >> >>> > # 10 x = x+d;
> >> >>> > # 11 *a++=x;
> >> >>> > addl $1,%edx # [0]
> >> >>> > .loc 1 10 0
> >> >>> > addl %ebp,%ebx # [0]
> >> >>> > .loc 1 11 0
> >> >>> > addl $4,%eax # [0]
> >> >>> > .loc 1 10 0
> >> >>> > addl $44,%ebx # [1]
> >> >>> > .loc 1 11 0
> >> >>> > cmpl %edi,%edx # [1]
> >> >>> > movl %ebx,-4(%eax) # [2] id:17
> >> >>> > jl .Lt_0_3586 # [2]
> >> >>> > .Lt_0_4098:
> >> >>> > .loc 1 13 0
> >> >>> > # 12 }
> >> >>> > # 13 return x;
> >> >>> > movl %ebx,%eax # [0]
> >> >>> > addl $16,%esp # [0]
> >> >>> > popl %edi # [1]
> >> >>> > popl %ebx # [4]
> >> >>> > popl %ebp # [7]
> >> >>> > ret # [7]
> >> >>> > .p2align 5,,31
> >> >>> > .Lt_0_2818:
> >> >>> > .loc 1 11 0
> >> >>> > movl 32(%esp),%eax # [0] x
> >> >>> > .loc 1 13 0
> >> >>> > addl $16,%esp # [0]
> >> >>> > popl %edi # [1]
> >> >>> > popl %ebx # [4]
> >> >>> > popl %ebp # [7]
> >> >>> > ret # [7]
> >> >>> > .LDWend_foo:
> >> >>> > .size foo, .LDWend_foo-foo
> >> >>> > .section .text
> >> >>> > .align 4
> >> >>> >
> >> >>> > .section .eh_frame, "a",@progbits
> >> >>> > .LEHCIE:
> >> >>> > .4byte .LEHCIE_end - .LEHCIE_begin
> >> >>> > .LEHCIE_begin:
> >> >>> > .4byte 0x0
> >> >>> > .byte 0x01, 0x00, 0x01, 0x7c, 0x08, 0x0c, 0x04, 0x04
> >> >>> > .byte 0x88, 0x01
> >> >>> > .align 4
> >> >>> > .LEHCIE_end:
> >> >>> >
> >> >>> > .section .debug_line, ""
> >> >>> > .section .note.GNU-stack,"",@progbits
> >> >>> > .ident "#Open64 Compiler Version 5.0 : x.c compiled with : -O3
> >> >>> > -OPT:unroll_times_max=1 -march=barcelona -msse2 -msse3 -mno-3dnow
> >> >>> > -mno-sse4a
> >> >>> > -mno-ssse3 -mno-sse41 -mno-sse42 -mno-aes -mno-pclmul -mno-avx
> >> >>> > -mno-xop
> >> >>> > -mno-fma -mno-fma4 -m32"
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> >
> ------------------------------------------------------------------------------
> >> >>> > Live Security Virtual Conference
> >> >>> > Exclusive live event will cover all the ways today's security and
> >> >>> > threat landscape has changed and how IT managers can respond.
> >> >>> > Discussions
> >> >>> > will include endpoint security, mobile security and the latest in
> >> >>> > malware
> >> >>> > threats.
> http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> > _______________________________________________
> >> >>> > Open64-devel mailing list
> >> >>> > Open64-devel@lists.sourceforge.net
> >> >>> > https://lists.sourceforge.net/lists/listinfo/open64-devel
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> >
> ------------------------------------------------------------------------------
> >> >>> > Live Security Virtual Conference
> >> >>> > Exclusive live event will cover all the ways today's security and
> >> >>> > threat landscape has changed and how IT managers can respond.
> >> >>> > Discussions
> >> >>> > will include endpoint security, mobile security and the latest in
> >> >>> > malware
> >> >>> > threats.
> http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> >> >>> > _______________________________________________
> >> >>> > Open64-devel mailing list
> >> >>> > Open64-devel@lists.sourceforge.net
> >> >>> > https://lists.sourceforge.net/lists/listinfo/open64-devel
> >> >>> >
> >> >>
> >> >>
> >
> >
>
------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Open64-devel mailing list
Open64-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/open64-devel