Martin v. Löwis <[email protected]> added the comment:
Marc-Andre: gcc normally does not unroll loops unless -funroll-loops is given
on the command line. With that option, it unrolls many loops, putting 8
iterations of the original body into each iteration of the unrolled loop. This
typically causes significant code bloat, which is why unrolling is normally
disabled and left to the programmer.
For those who want to experiment with this, I attach a C file with just the
code in question. Compile this with your favorite compiler settings, and see
what the compiler generates. clang, on an x64 system, compiles the original loop
into
LBB0_2:                         ## =>This Inner Loop Header: Depth=1
        movzbl  (%rdi), %eax
        movw    %ax, (%rdx)
        incq    %rdi
        addq    $2, %rdx
        decq    %rsi
        jne     LBB0_2
and the unrolled loop into
LBB1_2:                         ## %.lr.ph6
                                ## =>This Inner Loop Header: Depth=1
        movzbl  (%rdi,%rcx), %r8d
        movw    %r8w, (%rdx)
        movzbl  1(%rdi,%rcx), %r8d
        movw    %r8w, 2(%rdx)
        movzbl  2(%rdi,%rcx), %r8d
        movw    %r8w, 4(%rdx)
        movzbl  3(%rdi,%rcx), %r8d
        movw    %r8w, 6(%rdx)
        addq    $8, %rdx
        addq    $4, %rcx
        cmpq    %rax, %rcx
        jl      LBB1_2
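
For readers without the attachment at hand, here is a rough C sketch of the two
loop shapes that the listings above appear to correspond to: a byte-to-16-bit
widening copy, once written plainly and once unrolled by hand four times. This
is not the contents of unroll.c; the function names widen_simple and
widen_unrolled, the uint16_t element type, and the tail loop are assumptions
made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: copy n bytes from src into dst, widening each
   byte to a 16-bit unit. One load and one store per iteration. */
void widen_simple(const unsigned char *src, size_t n, uint16_t *dst)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Hypothetical name: the same copy, unrolled by hand so that each
   pass of the main loop handles four elements. */
void widen_unrolled(const unsigned char *src, size_t n, uint16_t *dst)
{
    size_t i = 0;

    /* Main loop: four load/store pairs per loop-counter update. */
    for (; i + 4 <= n; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }

    /* Tail loop (an assumption of this sketch): handle the 0 to 3
       elements left over when n is not a multiple of four. */
    for (; i < n; i++)
        dst[i] = src[i];
}
```

Compiling such a file with -S (for example with and without -funroll-loops on
gcc) should let you compare the generated loops, though the exact output will
depend on the compiler version and optimization level.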
----------
nosy: +loewis
Added file: http://bugs.python.org/file23353/unroll.c
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue13136>
_______________________________________