> On Sep 14, 2015, at 12:49 PM, Brett Cannon <bcan...@gmail.com> wrote:
> 
> Would it be worth adding a comment that the block of code is an inlined copy 
> of deque_append()?
> Or maybe even turn the append() function into a macro so you minimize code 
> duplication?

I don't think either would be helpful.  The point of the inlining was to let 
the code evolve independently from deque_append().   

Once separated from the mother ship, the code in deque_inplace_repeat() could 
shed the unnecessary work.  The state variable is updated once.  The 
updates within a single block are now in their own inner loop.  The deque size 
is updated outside of that loop, etc.   In other words, they are no longer the 
same code.

The original append-in-a-loop version was already being in-lined by the 
compiler but was doing way too much work.  For each item written in the 
original, there were 7 memory reads, 5 writes, 6 predictable 
compare-and-branches, and 5 add/sub operations.  In the current form, there are 
0 reads, 1 write, 2 predictable compare-and-branches, and 3 add/sub operations.
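
To make the difference concrete, here is a minimal sketch in C of the two shapes. It is a toy, not the actual code in _collectionsmodule.c: the toy_deque struct, the function names, and the int payload are all hypothetical, and it leaves out reference counting, maxlen handling, and the left side of the deque. The first function does the full bookkeeping on every item; the second keeps the working values in locals, fills each block in its own inner loop, and writes the bookkeeping back once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCKLEN 64              /* items per block in this toy */

typedef struct block {
    struct block *next;
    int data[BLOCKLEN];
} block;

typedef struct {
    block *first;
    block *last;
    size_t rightindex;           /* number of used slots in the last block */
    size_t size;                 /* total number of items */
    size_t state;                /* mutation counter */
} toy_deque;

static block *new_block(void)
{
    block *b = calloc(1, sizeof(block));
    assert(b != NULL);
    return b;
}

static void toy_init(toy_deque *d)
{
    d->first = d->last = new_block();
    d->rightindex = 0;
    d->size = 0;
    d->state = 0;
}

/* Per-item append: every call reads and writes the deque's fields. */
static void toy_append(toy_deque *d, int item)
{
    if (d->rightindex == BLOCKLEN) {
        block *nb = new_block();
        d->last->next = nb;
        d->last = nb;
        d->rightindex = 0;
    }
    d->last->data[d->rightindex++] = item;
    d->size++;
    d->state++;
}

/* Specialized repeat: bookkeeping hoisted out of the per-item loop. */
static void toy_append_repeat(toy_deque *d, int item, size_t n)
{
    block *b = d->last;
    size_t ri = d->rightindex;
    size_t remaining = n;

    if (n == 0)
        return;
    d->state++;                              /* updated once, not n times */
    while (remaining > 0) {
        if (ri == BLOCKLEN) {
            block *nb = new_block();
            b->next = nb;
            b = nb;
            ri = 0;
        }
        size_t room = BLOCKLEN - ri;
        size_t m = remaining < room ? remaining : room;
        for (size_t i = 0; i < m; i++)       /* inner loop: one store per item */
            b->data[ri + i] = item;
        ri += m;
        remaining -= m;
    }
    d->last = b;                             /* written back outside the loops */
    d->rightindex = ri;
    d->size += n;
}

/* Read item i by walking the block chain (for checking results only). */
static int toy_get(const toy_deque *d, size_t i)
{
    const block *b = d->first;
    for (size_t k = i / BLOCKLEN; k > 0; k--)
        b = b->next;
    return b->data[i % BLOCKLEN];
}
```

Both functions leave the toy deque with the same contents and size; only the amount of work per item differs, which is the kind of divergence that makes sharing one body between them unattractive.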

FWIW, my workflow is that periodically I expand the code with new features 
(the upcoming work is to add slicing support 
http://bugs.python.org/issue17394), then once it is correct and tested, I make 
a series of optimization passes (such as the work I just described above).  
After that, I come along and factor out common code, usually with clean, 
in-lineable functions rather than macros (such as the recent check-in replacing 
redundant code in deque_repeat with a call to the common code in 
deque_inplace_repeat).

My schedule lately hasn't given me any big blocks of time to work with, so I do 
the steps piecemeal as I get snippets of development time.


Raymond


P.S. For those who are interested, here is the before and after
(< marks a memory read, > marks a memory write):

---- before ---------------------------------
L1152:
    movq    __Py_NoneStruct@GOTPCREL(%rip), %rdi
    cmpq    $0, (%rdi)                                   <
    je  L1257
L1159:
    addq    $1, %r13
    cmpq    %r14, %r13
    je  L1141
    movq    16(%rbx), %rsi                               <
L1142:
    movq    48(%rbx), %rdx                               <
    addq    $1, 56(%rbx)                                 <>
    cmpq    $63, %rdx
    je  L1143
    movq    32(%rbx), %rax                               <
    addq    $1, %rdx
L1144:
    addq    $1, 0(%rbp)                                  <>
    leaq    1(%rsi), %rcx
    movq    %rdx, 48(%rbx)                                >
    movq    %rcx, 16(%rbx)                                >
    movq    %rbp, 8(%rax,%rdx,8)                          >
    movq    64(%rbx), %rax                               <
    cmpq    %rax, %rcx
    jle L1152
    cmpq    $-1, %rax
    je  L1152


---- after ------------------------------------
L777:
    cmpq    $63, %rdx
    je  L816
L779:
    addq    $1, %rdx
    movq    %rbp, 16(%rsi,%rbx,8)                 >
    addq    $1, %rbx
    leaq    (%rdx,%r9), %rcx
    subq    %r8, %rcx
    cmpq    %r12, %rbx
    jl  L777

    # outside the inner-loop
    movq    %rdx, 48(%r13)                  
    movq    %rcx, 0(%rbp)
    cmpq    %r12, %rbx
    jl  L780