[Bug target/85216] Performance issue with PHP on ppc64 systems
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85216

--- Comment #18 from Bill Schmidt ---
I asked around a bit. On x86, user-user attacks are not mitigated by default. To enable user-user mitigation:

  echo 2 > /sys/kernel/debug/x86/ibrs_enabled

My source tells me:

8<---
Red Hat explains the above setting as follows in https://access.redhat.com/articles/3311301:

"When IBRS is set to 2 (spectre_v2=ibrs_always), both userland and kernel runs with indirect branch restricted speculation. This protects userspace from hyperthreading/simultaneous multi-threading attacks as well, and is also the default on certain old AMD processors (family 10h, 12h and 16h). This feature addresses CVE-2017-5715, variant #2."

If a GCC compiler with support for "thunks" is available, one might also build applications (for example, PHP) with the following flags added to mitigate Spectre variant #2:

  -mindirect-branch=thunk-inline -mfunction-return=thunk-inline -mindirect-branch-register

However, to properly mitigate Spectre variant #2 on Skylake processors, it may be necessary to set ibrs_enabled to 2 AND use thunks, although I am not sure about this.
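To check whether a given configuration (IBRS mode, thunk-built binaries, or the POWER9 firmware mitigations) actually slows down indirect branches, a small microbenchmark can help. The C sketch below is an illustration, not code from this thread: the handler functions and `handler_table` are hypothetical, and it simply contrasts a loop whose indirect-call target never changes with one whose target is chosen pseudo-randomly. On x86 with GCC, building it once normally and once with the thunk flags quoted above should expose the mitigation's cost on the predictable loop; that is an expectation, not a measured result.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef uint64_t (*handler_fn)(uint64_t);

/* Hypothetical handlers, reached only through function pointers. */
static uint64_t h_add1(uint64_t x) { return x + 1; }
static uint64_t h_add3(uint64_t x) { return x + 3; }
static uint64_t h_xor5(uint64_t x) { return x ^ 5; }
static uint64_t h_mul3(uint64_t x) { return x * 3; }

/* Deliberately a writable global with external linkage, so the compiler
   cannot easily prove which function each call reaches and turn the
   indirect calls into direct ones. */
handler_fn handler_table[4] = { h_add1, h_add3, h_xor5, h_mul3 };

/* Perform n indirect calls. The LCG runs in both modes so the only
   difference is the target pattern: always entry 0 (easy to predict)
   or pseudo-random (hard to predict). */
uint64_t indirect_call_loop(size_t n, int random_targets)
{
    uint64_t acc = 0;
    uint32_t seed = 12345u;
    for (size_t i = 0; i < n; i++) {
        seed = seed * 1103515245u + 12345u;
        unsigned idx = random_targets ? (unsigned)((seed >> 16) & 3u) : 0u;
        acc = handler_table[idx](acc);
    }
    return acc;
}

/* Wall-clock seconds for one run; the result is returned through *result
   so the calls cannot be optimized away. */
double time_indirect_calls(size_t n, int random_targets, uint64_t *result)
{
    struct timespec t0, t1;
    assert(result != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    *result = indirect_call_loop(n, random_targets);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

For example, call `time_indirect_calls(100000000, 0, &r)` and `time_indirect_calls(100000000, 1, &r)` from a small `main` and compare both timings across kernel, firmware, and compiler-flag settings.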
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85216

--- Comment #17 from Bill Schmidt ---
OK, thanks! I'd be very interested in hearing what you discover.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85216

--- Comment #16 from Timothy Pearson ---
(In reply to Bill Schmidt from comment #15)
> PHP's reliance on frequent indirect branches makes it essentially the worst
> case for this sort of thing. When Spectre v2 CVE mitigations are in place
> for user code, you will see performance issues on all architectures that
> rely on speculation for indirect branch performance. When user code is
> running in an "unsafe" configuration, you will not see those issues. (We
> have seen similar issues on x86 when retpoline is used for user code.)

What's most puzzling is that we're looking at benchmarks on x86 systems that are supposed to be mitigated, but the performance drop isn't really showing up. At this point I'm wondering if:

a.) The user/user attack isn't actually mitigated on these systems, only the user/kernel attack
b.) Intel/AMD found some way to update the microcode so as not to have a heavy performance loss

In any case, we'll continue to investigate / run benchmarks to see if any light can be shed on this.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85216

--- Comment #15 from Bill Schmidt ---
PHP's reliance on frequent indirect branches makes it essentially the worst case for this sort of thing. When Spectre v2 CVE mitigations are in place for user code, you will see performance issues on all architectures that rely on speculation for indirect branch performance. When user code is running in an "unsafe" configuration, you will not see those issues. (We have seen similar issues on x86 when retpoline is used for user code.)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85216

--- Comment #14 from Timothy Pearson ---
(In reply to Bill Schmidt from comment #13)
> This was prototyped and measured against the firmware fixes with
> indistinguishable results. So the complexity of a software solution, with
> its impacts on Linux distributions, was not warranted. (That is, the
> firmware workarounds are already tightly targeted.)

Good to know, thank you. I was about to test it on this end.

I guess the main takeaway, then, is that POWER9 handles interpreted workloads quite badly, or is there still some possibility of additional optimization here?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85216

--- Comment #13 from Bill Schmidt ---
This was prototyped and measured against the firmware fixes with indistinguishable results. So the complexity of a software solution, with its impacts on Linux distributions, was not warranted. (That is, the firmware workarounds are already tightly targeted.)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85216

--- Comment #12 from Timothy Pearson ---
After quite a bit of investigation, this is down to the Spectre v2 user mode protections on POWER9, which (from what I understand) involve completely disabling the branch predictor.

My question then comes down to: why wasn't a retpoline-style mitigation used on ppc64el? Nuking hardware elements seems extreme and is obviously causing serious problems for indirect-branch-heavy code (like that being seen here). What would be involved in creating a retpoline-type mitigation for ppc64el, so that we can run with the branch predictor turned back on (already verified to fix many of the performance issues seen on POWER9)?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85216

--- Comment #11 from Bill Schmidt ---
(In reply to Timothy Pearson from comment #10)
>
> It's even slow compared to P8 with mitigations applied. Do you have a link
> to the hostboot commit that may have enabled the P9 mitigation, or to the
> register name (SCOM) that was modified to enable the mitigation?

No, I'm sorry, I don't know those details. If you contact me offline I can probably find someone who does.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85216

--- Comment #10 from Timothy Pearson ---
(In reply to Bill Schmidt from comment #9)
> You mentioned you're on a POWER9 machine. It could be that you have
> firmware with Spectre mitigations applied, which will affect all indirect
> branches. It may be that you do not have Spectre mitigations applied on
> your x86 machine, in which case the comparison would be expected to be quite
> different. Depending on firmware levels, the mitigations may be able to be
> switched off, so you should check into that first. PHP is known to be
> sensitive to indirect branch performance.
>
> The Power landing page for these mitigations is
> https://www.ibm.com/blogs/psirt/potential-impact-processors-power-family/.
> From here you should be able to get to further information for your specific
> hardware and OS version.

It's even slow compared to P8 with mitigations applied. Do you have a link to the hostboot commit that may have enabled the P9 mitigation, or to the register name (SCOM) that was modified to enable the mitigation?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85216

--- Comment #9 from Bill Schmidt ---
You mentioned you're on a POWER9 machine. It could be that you have firmware with Spectre mitigations applied, which will affect all indirect branches. It may be that you do not have Spectre mitigations applied on your x86 machine, in which case the comparison would be expected to be quite different. Depending on firmware levels, the mitigations may be able to be switched off, so you should check into that first. PHP is known to be sensitive to indirect branch performance.

The Power landing page for these mitigations is https://www.ibm.com/blogs/psirt/potential-impact-processors-power-family/. From here you should be able to get to further information for your specific hardware and OS version.
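Before comparing POWER9 and x86 numbers, it helps to record what each kernel reports about its Spectre v2 mitigation state. Linux kernels new enough to report it expose this in `/sys/devices/system/cpu/vulnerabilities/spectre_v2`. The C sketch below just reads the first line of such a file; the function name `read_status_line` is hypothetical, and the exact wording of the report varies by kernel version and architecture, so no particular string is assumed.

```c
#include <stdio.h>
#include <string.h>

/* Status file exported by Linux kernels that report Spectre v2
   mitigation state (availability depends on kernel version). */
#define SPECTRE_V2_STATUS_PATH "/sys/devices/system/cpu/vulnerabilities/spectre_v2"

/* Copy the first line of the file at 'path' into buf, without the
   trailing newline. Returns 0 on success, -1 on any failure. */
int read_status_line(const char *path, char *buf, size_t len)
{
    if (path == NULL || buf == NULL || len == 0)
        return -1;

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;  /* file missing: kernel may be too old to report */

    int ok = fgets(buf, (int)len, f) != NULL;
    fclose(f);
    if (!ok)
        return -1;

    buf[strcspn(buf, "\n")] = '\0';  /* strip the newline, if any */
    return 0;
}
```

A typical use is `char line[256]; if (read_status_line(SPECTRE_V2_STATUS_PATH, line, sizeof line) == 0) puts(line);` on both machines, so that benchmark results can be labeled with the mitigation state they were taken under.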
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85216

--- Comment #8 from Richard Biener ---
(In reply to Timothy Pearson from comment #4)
> (In reply to Andrew Pinski from comment #3)
> > This is 100% the equivalent code.
> >
> > jmp *(%r15) # opline.199_67->handler
> > Does two things:
> > loads a pointer from %r15 and then jumps to that pointer.
> >
> > In PowerPC, you can only jump indirectly via the CTR or LR registers.
> >
> > ld 9,0(29) # opline.200_67->handler, gotovar.1505_2678
> > mtctr 9 # gotovar.1505_2678, gotovar.1505_2678
> > bctr
> >
> >
> > Most likely what is happening is the indirect branch predictor is not
> > predicting the branch correctly on the powerpc side while it is on the x86
> > side. This is a micro-architecture difference between the two chips and is
> > unrelated to the ISA differences.
>
> I'm forwarding this for analysis to see if there's anything we can do in
> firmware to "fix" the branch predictor. If not, is there a way to prime the
> predictor in this scenario, or is this too specific to be added
> compiler-side?

The usual way is speculative devirtualization. You replace

  jmp *(%r15)

with

  if (*(%r15) == constant-address)
    jmp constant-address
  else
    jmp *(%r15)

the hope being that this helps branch prediction. Other than that, are there very many such indirect branches, or is it just one?
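At the C level, Richard's suggestion amounts to guarding the indirect call with a comparison against the expected target. The sketch below is illustrative only: `handler_fn`, `likely_handler`, and `unlikely_handler` are hypothetical stand-ins for `opline->handler` and its targets. For ordinary indirect calls, GCC can perform a similar promotion automatically when profile feedback (`-fprofile-generate` / `-fprofile-use`) shows one dominant target.

```c
/* Hypothetical handler type standing in for opline->handler. */
typedef int (*handler_fn)(int);

static int likely_handler(int x)   { return x + 1; }
static int unlikely_handler(int x) { return x * 2; }

/* Plain dispatch: always one indirect call. */
int dispatch_plain(handler_fn h, int x)
{
    return h(x);
}

/* Speculative devirtualization: test for the expected target first.
   The common case becomes a compare, a conditional branch and a direct
   call (which the compiler may also inline); only other targets pay
   for the indirect call. */
int dispatch_devirtualized(handler_fn h, int x)
{
    if (h == likely_handler)
        return likely_handler(x);
    return h(x);
}
```

This only pays off when one target dominates at that call site. A bytecode dispatch point that cycles through many handlers would mostly take the fallback path, which is why the number and behavior of such branch sites matters.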
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85216

--- Comment #7 from David Edelsohn ---
One possibility is bad luck: the branch happens to fall on an address that conflicts with another branch in the predictor.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85216

--- Comment #6 from Timothy Pearson ---
Understood. I'll update this report if we find a way to get the predictor working optimally in this scenario.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85216

David Edelsohn changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |---

--- Comment #5 from David Edelsohn ---
The issue is *why* the branch predictor is not predicting it correctly. It may be that the details of the branch predictor are causing the prediction to conflict with another branch, for example, nullifying the correct prediction. One should not leap to the conclusion that the predictor is not initialized.
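David's point is that a predictor can be working and still mispredict if two branches compete for the same table entry. The toy model below is purely illustrative: a 16-entry, direct-mapped, untagged target table indexed by branch-address bits. The table size, the indexing, and the addresses used are all assumptions and say nothing about the actual POWER9 design.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy indirect-branch target table (hypothetical, not POWER9's). */
#define TOY_ENTRIES 16u

typedef struct {
    uintptr_t target[TOY_ENTRIES];
} toy_predictor;

/* Index by instruction-address bits above the 4-byte alignment. */
static unsigned toy_index(uintptr_t branch_addr)
{
    return (unsigned)((branch_addr >> 2) % TOY_ENTRIES);
}

/* Predict, then train with the actual target. Returns 1 on a miss. */
static int toy_step(toy_predictor *p, uintptr_t branch_addr, uintptr_t actual)
{
    unsigned i = toy_index(branch_addr);
    int miss = p->target[i] != actual;
    p->target[i] = actual;
    return miss;
}

/* Alternate two branches, each always going to its own fixed target,
   and count how many predictions miss. */
size_t count_misses(uintptr_t addr_a, uintptr_t tgt_a,
                    uintptr_t addr_b, uintptr_t tgt_b, size_t rounds)
{
    toy_predictor p = {{0}};
    size_t misses = 0;
    for (size_t r = 0; r < rounds; r++) {
        misses += (size_t)toy_step(&p, addr_a, tgt_a);
        misses += (size_t)toy_step(&p, addr_b, tgt_b);
    }
    return misses;
}
```

Branches at 0x1000 and 0x1004 land in different entries, so after two cold misses every prediction hits. Branches at 0x1000 and 0x1040 both map to entry 0, so each evicts the other's target and every prediction misses, even though each branch on its own is perfectly predictable. Real predictors use larger, tagged, history-indexed tables, so this only shows the mechanism.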
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85216

--- Comment #4 from Timothy Pearson ---
(In reply to Andrew Pinski from comment #3)
> This is 100% the equivalent code.
>
> jmp *(%r15) # opline.199_67->handler
> Does two things:
> loads a pointer from %r15 and then jumps to that pointer.
>
> In PowerPC, you can only jump indirectly via the CTR or LR registers.
>
> ld 9,0(29) # opline.200_67->handler, gotovar.1505_2678
> mtctr 9 # gotovar.1505_2678, gotovar.1505_2678
> bctr
>
>
> Most likely what is happening is the indirect branch predictor is not
> predicting the branch correctly on the powerpc side while it is on the x86
> side. This is a micro-architecture difference between the two chips and is
> unrelated to the ISA differences.

I'm forwarding this for analysis to see if there's anything we can do in firmware to "fix" the branch predictor. If not, is there a way to prime the predictor in this scenario, or is this too specific to be added compiler-side?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85216

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
   Last reconfirmed|2018-04-05 00:00:00         |
         Resolution|---                         |INVALID

--- Comment #3 from Andrew Pinski ---
This is 100% the equivalent code.

  jmp *(%r15) # opline.199_67->handler

does two things: it loads a pointer from %r15 and then jumps to that pointer.

In PowerPC, you can only jump indirectly via the CTR or LR registers:

  ld 9,0(29)   # opline.200_67->handler, gotovar.1505_2678
  mtctr 9      # gotovar.1505_2678, gotovar.1505_2678
  bctr

Most likely what is happening is that the indirect branch predictor is not predicting the branch correctly on the powerpc side while it is on the x86 side. This is a micro-architecture difference between the two chips and is unrelated to the ISA differences.
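The `gotovar` names in the annotated assembly suggest that the hot branch comes from a computed goto (GCC's "labels as values" extension), a common way for threaded interpreters to dispatch. The C sketch below shows the general shape of such a loop; the opcodes, `opline_t`, and `run` are hypothetical and far simpler than PHP's executor. Each `goto *opline->handler` compiles to a single indirect jump, such as `jmp *(...)` on x86 or an `ld`/`mtctr`/`bctr` sequence on Power.

```c
#include <stddef.h>

/* Hypothetical mini bytecode; names are illustrative, not PHP's. */
enum opcode { OP_INC, OP_DOUBLE, OP_HALT };

typedef struct {
    enum opcode op;
    void *handler;  /* label address, filled in before execution */
} opline_t;

/* Execute n oplines starting from acc. The program must end with
   OP_HALT. Requires GCC or Clang (&&label and goto * are extensions). */
long run(opline_t *code, size_t n, long acc)
{
    static void *labels[] = { &&do_inc, &&do_double, &&do_halt };

    for (size_t i = 0; i < n; i++)
        code[i].handler = labels[code[i].op];

    opline_t *opline = code;
    goto *opline->handler;          /* first dispatch */

do_inc:
    acc += 1;
    opline++;
    goto *opline->handler;          /* indirect jump per opline */
do_double:
    acc *= 2;
    opline++;
    goto *opline->handler;
do_halt:
    return acc;
}
```

Because every executed opline ends in one of these indirect jumps, the interpreter's speed depends heavily on how well those jumps are predicted, which is why disabling or flushing the indirect predictor hurts this kind of code so much.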
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85216

--- Comment #2 from Timothy Pearson ---
(In reply to David Edelsohn from comment #1)
> What two additional instructions? x86 is a CISC architecture and Power is a
> RISC architecture. x86 has an instruction that directly performs an
> indirect call through a pointer. Power must explicitly load the pointer and
> move it to the appropriate register to perform an indirect branch.
>
> One can comment / questions that the *SEQUENCE* appears to require more time
> on Power than the equivalent sequence on x86. But directly comparing
> instructions and counting instructions in two different ISAs without context
> is not meaningful.

That is in fact what I am concerned with: the sequence is taking longer than the equivalent sequence on x86. I am aware that the two instruction sequences accomplish the same goal, but for some reason the x86 one is fast enough that it doesn't even show up in the perf output as a hot instruction, while the ppc64 sequence stalls twice (two hot instructions), once on the load and once on the register move.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85216

David Edelsohn changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2018-04-05
     Ever confirmed|0                           |1

--- Comment #1 from David Edelsohn ---
What two additional instructions? x86 is a CISC architecture and Power is a RISC architecture. x86 has an instruction that directly performs an indirect call through a pointer. Power must explicitly load the pointer and move it to the appropriate register to perform an indirect branch.

One can comment on / question the fact that the *SEQUENCE* appears to require more time on Power than the equivalent sequence on x86. But directly comparing instructions and counting instructions in two different ISAs without context is not meaningful.