http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55266
Bug #: 55266
Summary: vector expansion: 36 movs for 4 adds
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: gli...@gcc.gnu.org
Target: x86_64-linux-gnu

I already mentioned this example, but I don't think it is in any PR:

typedef double vec __attribute__((vector_size(4*sizeof(double))));
void f(vec *x) { *x += *x + *x; }

Compiled with -S -O3 -msse4, this produces 4 add insns (normal) and 36 mov insns, which is a bit much. For comparison, it should be equivalent to the following code, which generates only 6 mov insns:

typedef double vec __attribute__((vector_size(2*sizeof(double))));
void f(vec *x) { x[0] += x[0] + x[0]; x[1] += x[1] + x[1]; }

One minor enhancement would be to have fold_ternary handle BIT_FIELD_REF of a CONSTRUCTOR of vectors (I think this is already tracked elsewhere, though I couldn't find it).

But the main issue is the copying of these fake vectors. Their fake "registers" live in memory, and copying between them (4 times 2 movs going through rax in DImode; I assume that is faster than going through xmm registers?) is not optimized away. In this example, the content of *x is first copied to a fake register. Then the V2DF parts are extracted, added, and stored back to memory. That fake register is then copied to a new fake register; V2DF halves are taken from it, added to the V2DF halves that were still there, and stored to memory. The result is finally copied back to the memory location *x.

I don't know how this should be improved. Maybe the vector lowering pass should go even further, turning the first program into the second one, so that no extra-long vectors are left for the back-end to handle? Optimizing this in the back-end seems hard; by then it is too late. Or maybe something can be done at expansion time?