Paolo 'Blaisorblade' Giarrusso <p.giarru...@gmail.com> added the comment:
1st note: is that code from the threaded version? Note that you need to modify the source to make it accept ICC as well before you can try that. If you already did that, I guess the patch is not useful at all with ICC since, as far as I can see, the jump is shared. It is vital to this patch that the jump is not shared; something similar to GCC's -fno-crossjumping should be found for ICC.

2nd note: the answer to your question seems to be yes, ICC has fewer register spills. Look for instance at:

    movl -272(%ebp), %ecx
    movzbl (%ecx), %eax
    addl $1, %ecx

and

    movzbl (%esi), %ecx
    incl %esi

Both sequences represent the increment of the program counter after loading the next opcode. In the code you posted, one can see that the program counter is spilled to memory by GCC (the first sequence reloads it from -272(%ebp)), but isn't by ICC (the second sequence keeps it in %esi). Either the spill is elsewhere, or ICC is better here. It's widely known that ICC has a much better optimizer in many cases, and I remember that GCC's register allocator really needs improvement.

Finally, I'm a bit surprised by "addl $1, %ecx", since any peephole optimizer should remove that; I'm not shocked only because I've never seen perfect GCC output.

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue4753>
_______________________________________
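
For reference, the "jump is not shared" requirement from the first note can be sketched roughly as below. This is only a minimal illustration, not the code from the patch: the opcodes (OP_INC, OP_DEC, OP_HALT), the function name run_threaded and the DISPATCH macro are all hypothetical, and the code relies on the GCC "labels as values" extension. Each handler ends with its own indirect jump; if the compiler merges those identical tails into one (which is what cross-jumping does), the benefit is presumably lost.

```c
/* Hypothetical opcodes, for illustration only (not CPython's). */
enum { OP_INC, OP_DEC, OP_HALT };

/* Threaded dispatch sketch using the GCC "labels as values" extension.
   Every handler ends with its own copy of the indirect jump, so no
   dispatch jump is shared between opcodes in the source. */
static int run_threaded(const unsigned char *pc)
{
    static void *const targets[] = { &&op_inc, &&op_dec, &&op_halt };
    int acc = 0;

/* Load the next opcode, advance the program counter, jump to its handler. */
#define DISPATCH() goto *targets[*pc++]

    DISPATCH();
op_inc:
    acc++;
    DISPATCH();
op_dec:
    acc--;
    DISPATCH();
op_halt:
    return acc;
#undef DISPATCH
}
```

The `*pc++` inside DISPATCH is the "load the next opcode, then increment the program counter" step that the two assembly sequences above implement. Whether the compiler keeps pc in a register and keeps the jumps separate can only be checked by inspecting the generated assembly.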