> On Sep 14, 2015, at 12:49 PM, Brett Cannon <bcan...@gmail.com> wrote:
>
> Would it be worth adding a comment that the block of code is an inlined copy
> of deque_append()?
> Or maybe even turn the append() function into a macro so you minimize code
> duplication?

I don't think either would be helpful. The point of the inlining was to let the code evolve independently from deque_append().

Once separated from the mother ship, the code in deque_inplace_repeat() could shed the unnecessary work. The state variable is updated once. The updates within a single block are now in their own inner loop. The deque size is updated outside of that loop, etc. In other words, they are no longer the same code.

The original append-in-a-loop version was already being in-lined by the compiler but was doing way too much work. For each item written in the original, there were 7 memory reads, 5 writes, 6 predictable compare-and-branches, and 5 add/sub operations. In the current form, there are 0 reads, 1 write, 2 predictable compare-and-branches, and 3 add/sub operations.

FWIW, my work flow is that periodically I expand the code with new features (the upcoming work is to add slicing support http://bugs.python.org/issue17394), then once it is correct and tested, I make a series of optimization passes (such as the work I just described above). After that, I come along and factor-out common code, usually with clean, in-lineable functions rather than macros (such as the recent check-in replacing redundant code in deque_repeat with a call to the common code in deque_inplace_repeat).

My schedule lately hasn't given me any big blocks of time to work with, so I do the steps piecemeal as I get snippets of development time.


Raymond


P.S.
For those who are interested, here is the before and after:

---- before ---------------------------------
L1152:
    movq    __Py_NoneStruct@GOTPCREL(%rip), %rdi
    cmpq    $0, (%rdi)                       <
    je      L1257
L1159:
    addq    $1, %r13
    cmpq    %r14, %r13
    je      L1141
    movq    16(%rbx), %rsi                   <
L1142:
    movq    48(%rbx), %rdx                   <
    addq    $1, 56(%rbx)                     <>
    cmpq    $63, %rdx
    je      L1143
    movq    32(%rbx), %rax                   <
    addq    $1, %rdx
L1144:
    addq    $1, 0(%rbp)                      <>
    leaq    1(%rsi), %rcx
    movq    %rdx, 48(%rbx)                   >
    movq    %rcx, 16(%rbx)                   >
    movq    %rbp, 8(%rax,%rdx,8)             >
    movq    64(%rbx), %rax                   <
    cmpq    %rax, %rcx
    jle     L1152
    cmpq    $-1, %rax
    je      L1152

---- after ------------------------------------
L777:
    cmpq    $63, %rdx
    je      L816
L779:
    addq    $1, %rdx
    movq    %rbp, 16(%rsi,%rbx,8)            >
    addq    $1, %rbx
    leaq    (%rdx,%r9), %rcx
    subq    %r8, %rcx
    cmpq    %r12, %rbx
    jl      L777

    # outside the inner-loop
    movq    %rdx, 48(%r13)
    movq    %rcx, 0(%rbp)
    cmpq    %r12, %rbx
    jl      L780

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
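
For readers who would rather see the idea in C than in assembly, the restructuring described in the message (one state update, a per-block inner loop, and the size updated outside that loop) can be sketched roughly as follows. This is not the CPython source. The names toy_deque, append_one, repeat_naive, repeat_hoisted, and toy_get are hypothetical, items are plain ints rather than object pointers, and reference counting and the maxlen check are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCKLEN 64                 /* slots per block (assumed; the listings compare against 63) */

typedef struct block {
    struct block *next;
    int data[BLOCKLEN];
} block;

typedef struct {                    /* hypothetical, simplified stand-in for a deque */
    block *first;                   /* head of the block chain, kept for cleanup */
    block *right;                   /* block currently being filled */
    int rightindex;                 /* last used slot in `right`, -1 if none */
    size_t size;                    /* number of stored items */
    unsigned long state;            /* mutation counter */
} toy_deque;

static int toy_init(toy_deque *d) {
    d->first = d->right = calloc(1, sizeof(block));
    if (d->first == NULL) return -1;
    d->rightindex = -1;
    d->size = 0;
    d->state = 0;
    return 0;
}

static void toy_free(toy_deque *d) {
    for (block *b = d->first; b != NULL; ) {
        block *next = b->next;
        free(b);
        b = next;
    }
}

/* Start a fresh block on the right once the current one is full. */
static int grow_if_full(toy_deque *d) {
    if (d->rightindex == BLOCKLEN - 1) {
        block *b = calloc(1, sizeof *b);
        if (b == NULL) return -1;
        d->right->next = b;
        d->right = b;
        d->rightindex = -1;
    }
    return 0;
}

/* Generic append: every call re-reads and re-writes the bookkeeping fields. */
static int append_one(toy_deque *d, int item) {
    if (grow_if_full(d) < 0) return -1;
    d->rightindex++;
    d->right->data[d->rightindex] = item;
    d->size++;
    d->state++;
    return 0;
}

/* "Before": append in a loop. */
static int repeat_naive(toy_deque *d, int item, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (append_one(d, item) < 0) return -1;
    return 0;
}

/* "After": bookkeeping hoisted out of the per-item loop. */
static int repeat_hoisted(toy_deque *d, int item, size_t n) {
    size_t done = 0;
    int rc = 0;
    while (done < n) {
        if (grow_if_full(d) < 0) { rc = -1; break; }
        size_t room = (size_t)(BLOCKLEN - 1 - d->rightindex);
        size_t m = (n - done < room) ? n - done : room;
        int *slot = &d->right->data[d->rightindex + 1];
        for (size_t i = 0; i < m; i++)   /* inner loop: one store per item */
            slot[i] = item;
        d->rightindex += (int)m;         /* index written once per block */
        done += m;
    }
    d->size += done;                     /* size written once, after the loops */
    d->state++;                          /* state written once per call */
    return rc;
}

/* Read item i by walking the chain (test helper only). */
static int toy_get(const toy_deque *d, size_t i) {
    assert(i < d->size);
    const block *b = d->first;
    for (size_t k = i / BLOCKLEN; k > 0; k--)
        b = b->next;
    return b->data[i % BLOCKLEN];
}
```

Both functions leave the same items in the same slots. The difference is only in how often the bookkeeping fields are touched, which is why the two versions cannot share a body without giving the savings back.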