https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81461
Bug ID: 81461 Summary: Optimization for removing same variable comparisons in loop: while(it != end1 && it != end2) Product: gcc Version: 7.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: antoshkka at gmail dot com Target Milestone: --- Simple iteration by std::deque elements produces suboptimal code. For example #include <deque> unsigned sum(std::deque<unsigned> cont) { unsigned sum = 0; for (unsigned v : cont) sum += v; return sum; } produces the following loop: .L2: cmp rcx, rdx je .L1 add eax, DWORD PTR [rdx] add rdx, 4 cmp rdx, rsi jne .L2 mov rdx, QWORD PTR [r8+8] add r8, 8 lea rsi, [rdx+512] jmp .L2 The loop has two comparisons in it and behaves as the following C code: unsigned sum_like0(unsigned** chunks, unsigned* end) { unsigned sum = 0; for (unsigned* it = *chunks; it != end; it = *(++chunks)) { for (;it != end && it != *chunks + 128; ++it) { sum += *it; } } return sum; } Note the `it != end && it != *chunks + 128` condition. It could be simplified: if `end` belongs to `[it, *chunks + 128]` change the condition to `it != end` and use the condition `it != *chunks + 128` otherwise. Such optimization removes the cmp from the loop and produces a much more faster loop: .L15: add eax, DWORD PTR [rdx] add rdx, 4 cmp rdx, rcx jne .L15 Synthetic tests show up to 2 times better performance. Assembly outputs: https://godbolt.org/g/L7Mr4M