https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81461

            Bug ID: 81461
           Summary: Optimization for removing same variable comparisons in
                    loop: while(it != end1 && it != end2)
           Product: gcc
           Version: 7.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Simple iteration by std::deque elements produces suboptimal code. For example 

#include <deque>

unsigned sum(std::deque<unsigned> cont) {
    unsigned sum = 0;
    for (unsigned v : cont)
        sum += v;

    return sum;
}


produces the following loop:


.L2:
        cmp     rcx, rdx
        je      .L1
        add     eax, DWORD PTR [rdx]
        add     rdx, 4
        cmp     rdx, rsi
        jne     .L2
        mov     rdx, QWORD PTR [r8+8]
        add     r8, 8
        lea     rsi, [rdx+512]
        jmp     .L2

The loop has two comparisons in it and behaves as the following C code:

unsigned sum_like0(unsigned** chunks, unsigned* end) {
    unsigned sum = 0;

    for (unsigned* it = *chunks; it != end; it = *(++chunks)) {
        for (;it != end && it != *chunks + 128; ++it) {
            sum += *it;
        }
    }

    return sum;
}


Note the `it != end && it != *chunks + 128` condition. It could be simplified:
if `end` belongs to `[it, *chunks + 128]` change the condition to `it != end`
and use the condition `it != *chunks + 128` otherwise. Such optimization
removes the cmp from the loop and produces a much more faster loop:

.L15:
        add     eax, DWORD PTR [rdx]
        add     rdx, 4
        cmp     rdx, rcx
        jne     .L15

Synthetic tests show up to 2 times better performance. Assembly outputs:
https://godbolt.org/g/L7Mr4M

Reply via email to