https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86020
Bug ID: 86020 Summary: Performance regression in Eigen geometry.cpp test starting with r248333 Product: gcc Version: 8.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: wschmidt at gcc dot gnu.org Target Milestone: --- GCC 8.1 has regressed by about 30% compared with GCC 7.3 on one of the Eigen test cases when measured on Power9 hardware (powerpc64le-linux-gnu). Similar performance loss is seen on Power8 hardware as well. Pat Haugen did some bisecting and found that this began with r248334. 2017-05-22 Jan Hubicka <hubi...@ucw.cz> * ipa-inline.c (edge_badness): Use inlined_time instead of inline_summaries->get. The performance difference seems to be due to a change in inlining as a result of the above patch. The symptom is a marked increase in stack load/store activity. An example occurs in one of the hottest routines: _ZN5Eigen8internal20generic_product_implINS_5BlockIKNS_6MatrixIfLi4ELi4ELi0ELi4ELi4EEELi3ELi3ELb0EEENS3_IfLi3ELi8ELi0ELi3ELi8EEENS_10DenseShapeES8_Li3EE5addToINS2_IS7_Li3ELi8ELb1EEEEEvRT_RKS6_RKS7_ Below is the code with GCC 8.1.0 : 0.15 : 1000b274: lfs f7,8(r9) 0.00 : 1000b278: li r5,2 0.00 : 1000b27c: addi r4,r1,464 0.06 : 1000b280: lfs f8,4(r9) 0.09 : 1000b284: lfs f9,0(r9) 0.00 : 1000b288: lfs f0,4(r31) 0.00 : 1000b28c: addi r3,r1,648 2.61 : 1000b290: lfs f10,32(r10) 0.71 : 1000b294: lfs f11,16(r10) 0.46 : 1000b298: lfs f12,0(r10) 1.99 : 1000b29c: fmuls f10,f10,f7 4.36 : 1000b2a0: fmadds f11,f11,f8,f10 4.20 : 1000b2a4: fmadds f12,f12,f9,f11 4.08 : 1000b2a8: fadds f0,f0,f12 7.67 : 1000b2ac: stfs f0,4(r31) 0.00 : 1000b2b0: bl 10004838 0.00 : 1000b2b4: nop 0.06 : 1000b2b8: ld r5,664(r1) 0.00 : 1000b2bc: ld r10,680(r1) 0.00 : 1000b2c0: ld r9,512(r1) 0.00 : 1000b2c4: addi r4,r1,32 0.00 : 1000b2c8: ld r6,688(r1) 0.06 : 1000b2cc: ld r7,696(r1) 0.00 : 1000b2d0: ld r8,712(r1) 0.03 : 1000b2d4: ld r0,648(r1) 0.00 : 1000b2d8: addi r3,r1,1104 0.00 : 1000b2dc: ld r11,704(r1) 0.09 : 1000b2e0: std r28,144(r1) 0.00 : 1000b2e4: std r30,152(r1) 0.00 : 1000b2e8: std r26,160(r1) 0.18 : 1000b2ec: std r5,48(r1) 0.09 : 1000b2f0: std r10,64(r1) 0.03 : 1000b2f4: ld r5,728(r1) 0.00 : 1000b2f8: ld r10,720(r1) 0.06 : 1000b2fc: std r9,136(r1) 0.00 : 1000b300: add r9,r9,r29 0.09 : 1000b304: std r0,32(r1) 0.46 : 1000b308: std r6,600(r1) 0.06 : 1000b30c: std r6,72(r1) 0.18 : 1000b310: std r7,608(r1) 0.09 : 1000b314: std r7,80(r1) 0.12 : 1000b318: std r11,88(r1) 0.40 : 1000b31c: std r9,120(r1) 0.09 : 1000b320: std r8,624(r1) 0.06 : 1000b324: std r8,96(r1) 0.03 : 1000b328: std r10,632(r1) 0.06 : 1000b32c: std r10,104(r1) 0.06 : 1000b330: std r5,112(r1) 0.15 : 1000b334: bl 1000a248 <calls the map base evaluator function> The additional instructions can be seen between addresses 1000b2b8 and 1000b330. Below is the code for GCC 7.3.0 : 0.16 : 1000ac88: lfs f7,8(r9) 0.04 : 1000ac8c: ld r10,488(r1) 0.08 : 1000ac90: addi r4,r1,176 0.35 : 1000ac94: lfs f8,4(r9) 0.28 : 1000ac98: lfs f9,0(r9) 0.12 : 1000ac9c: lfs f0,0(r31) 0.00 : 1000aca0: ld r9,496(r1) 0.00 : 1000aca4: addi r3,r1,1160 3.39 : 1000aca8: lfs f10,32(r8) 0.87 : 1000acac: lfs f11,16(r8) 0.83 : 1000acb0: lfs f12,0(r8) 0.32 : 1000acb4: std r10,864(r1) 0.12 : 1000acb8: std r10,776(r1) 0.04 : 1000acbc: std r9,872(r1) 0.24 : 1000acc0: std r9,784(r1) 1.93 : 1000acc4: fmuls f10,f10,f7 4.38 : 1000acc8: fmadds f11,f11,f8,f10 5.48 : 1000accc: fmadds f12,f12,f9,f11 5.99 : 1000acd0: fadds f0,f0,f12 10.25 : 1000acd4: stfs f0,0(r31) 0.00 : 1000acd8: std r7,304(r1) 0.20 : 1000acdc: std r9,224(r1) 0.00 : 1000ace0: std r21,176(r1) 0.24 : 1000ace4: std r27,192(r1) 0.00 : 1000ace8: std r23,208(r1) 0.00 : 1000acec: std r24,232(r1) 0.04 : 1000acf0: std r26,240(r1) 0.00 : 1000acf4: std r30,248(r1) 0.12 : 1000acf8: std r26,256(r1) 0.79 : 1000acfc: std r28,264(r1) 0.16 : 1000ad00: std r25,280(r1) 0.08 : 1000ad04: std r30,288(r1) 0.28 : 1000ad08: std r29,296(r1) 0.00 : 1000ad0c: std r10,216(r1) 0.00 : 1000ad10: bl 1000aae8 <calls the map base evaluator function> The GCC 8.1 code has an additional call, no longer inlined, to the constructor Eigen::Matrix(). We suspect this and similar changes in other variants of this function are responsible. It is hard to understand the patch that made this change out of context. I went back to the gcc-patches mailing list to look for discussion, but I wasn't able to find a justification for this change. Jan, could you explain its purpose?