https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84646

            Bug ID: 84646
           Summary: Missed optimisation for hoisting conditions outside
                    nested loops
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: david at westcontrol dot com
  Target Milestone: ---

This is a missed optimisation opportunity.  In a discussion about the "best"
way to break out of a nested loop, I tested this code with gcc:

int foo(const int * p, const int * q, int m, int n) {
    int sum = 0;
    bool running = true;
    const int max = 20000;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            if (running) {
                sum += (p[i] * q[j]);
                if (sum >= max) {
                    running = false;
                    sum = max;
                }
            }
        }
    }
    return sum;
}


The test for "running" is hoisted outside the inner loop, so that the generated
code is changed to approximately:

int foo(const int * p, const int * q, int m, int n) {
    int sum = 0;
    bool running = true;
    const int max = 20000;
    for (int i = 0; i < n; i++) {
loop:
        if (running) {
            for (int j = 0; j < m; j++) {
                sum += (p[i] * q[j]);
                if (sum >= max) {
                    running = false;
                    sum = max;
                    goto loop;
                }
            }
        }
    }
    return sum;
}

This is definitely a good step - avoiding the check for "running" in the inner
loop, and breaking out of it when the max condition is reached is a clear win. 
But it would be even nicer if the check could be hoisted further - the
"running" flag could be completely eliminated to give the transformation:

int foo(const int * p, const int * q, int m, int n) {
    int sum = 0;
    //bool running = true;
    const int max = 20000;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            if (running) {
                sum += (p[i] * q[j]);
                if (sum >= max) {
                    //running = false;
                    sum = max;
                    goto exit;
                }
            }
        }
    }
exit:
    return sum;
}


Testing was done with -O2 and -O3, on a variety of gcc versions and targets
(thanks, godbolt.org!) up to version 8.0 (trunk at this time).  The generated
code showed approximately the same transformations and optimisations.

Reply via email to