[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 Peter Bergner changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #42 from Peter Bergner --- Fixed everywhere as much as we will. It was decided (Comment 34) not to backport Kewen's patch that fixes the HTM issue to the GCC 11 and older release branches and that instead we will live with the workarounds described in Comment 35 for them. The -mpower8-fusion bug is fixed everywhere.
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #41 from CVS Commits --- The releases/gcc-11 branch has been updated by Peter Bergner : https://gcc.gnu.org/g:3ea6d0ed6860e4f1d8f609c26de9d9a7f7a9ad2d commit r11-10022-g3ea6d0ed6860e4f1d8f609c26de9d9a7f7a9ad2d Author: Michael Meissner Date: Fri May 6 11:39:13 2022 -0500 rs6000: Ignore fusion option flags for inlining test [PR102059] The -mpower8-fusion and -mpower10-fusion options do not modify which instructions we can generate, so ignore them when deciding whether we can inline callee into caller. 2022-05-06 Michael Meissner Segher Boessenkool gcc/ PR target/102059 * config/rs6000/rs6000.c (rs6000_can_inline_p): Ignore -mpower8-fusion and -mpower10-fusion options for inlining purposes. gcc/testsuite/ PR target/102059 * gcc.target/powerpc/pr102059-4.c: New test. (cherry picked from commit 2fb654f77d5292864ef57040f7bc01d7a975f6d9)
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #40 from CVS Commits --- The releases/gcc-12 branch has been updated by Peter Bergner : https://gcc.gnu.org/g:e6b1ac334ac61f72536f3479f735ea3514f1309d commit r12-8353-ge6b1ac334ac61f72536f3479f735ea3514f1309d Author: Michael Meissner Date: Fri May 6 11:39:13 2022 -0500 rs6000: Ignore fusion option flags for inlining test [PR102059] The -mpower8-fusion and -mpower10-fusion options do not modify which instructions we can generate, so ignore them when deciding whether we can inline callee into caller. 2022-05-06 Michael Meissner gcc/ PR target/102059 * config/rs6000/rs6000.cc (rs6000_can_inline_p): Ignore -mpower8-fusion and -mpower10-fusion options for inlining purposes. gcc/testsuite/ PR target/102059 * gcc.target/powerpc/pr102059-4.c: New test. (cherry picked from commit 2fb654f77d5292864ef57040f7bc01d7a975f6d9)
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #39 from CVS Commits --- The releases/gcc-10 branch has been updated by Peter Bergner : https://gcc.gnu.org/g:aa33a2d866c0f3f2f9b3cb26f8ff2074bcd169b4 commit r10-10596-gaa33a2d866c0f3f2f9b3cb26f8ff2074bcd169b4 Author: Michael Meissner Date: Fri May 6 11:39:13 2022 -0500 rs6000: Ignore fusion option flags for inlining test [PR102059] The -mpower8-fusion option does not modify which instructions we can generate, so ignore it when deciding whether we can inline callee into caller. 2022-05-06 Michael Meissner gcc/ PR target/102059 * config/rs6000/rs6000.c (rs6000_can_inline_p): Ignore -mpower8-fusion option for inlining purposes. gcc/testsuite/ PR target/102059 * gcc.target/powerpc/pr102059-4.c: New test. (cherry picked from commit 2fb654f77d5292864ef57040f7bc01d7a975f6d9)
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #38 from CVS Commits --- The master branch has been updated by Peter Bergner : https://gcc.gnu.org/g:2fb654f77d5292864ef57040f7bc01d7a975f6d9 commit r13-157-g2fb654f77d5292864ef57040f7bc01d7a975f6d9 Author: Michael Meissner Date: Fri May 6 11:39:13 2022 -0500 rs6000: Ignore fusion option flags for inlining test [PR102059] The -mpower8-fusion and -mpower10-fusion options do not modify which instructions we can generate, so ignore them when deciding whether we can inline callee into caller. 2022-05-06 Michael Meissner gcc/ PR target/102059 * config/rs6000/rs6000.cc (rs6000_can_inline_p): Ignore -mpower8-fusion and -mpower10-fusion options for inlining purposes. gcc/testsuite/ PR target/102059 * gcc.target/powerpc/pr102059-4.c: New test.
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #37 from Peter Bergner --- (In reply to Kewen Lin from comment #36) > Mike had one patch [1] under review for the power8 fusion piece, moving this > under his name. Thanks Mike! > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591496.html Pinged a 5th time here: https://gcc.gnu.org/pipermail/gcc-patches/2022-May/593943.html
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 Kewen Lin changed: What|Removed |Added Assignee|linkw at gcc dot gnu.org |meissner at gcc dot gnu.org URL||https://gcc.gnu.org/piperma ||il/gcc-patches/2022-March/5 ||91496.html --- Comment #36 from Kewen Lin --- Mike had one patch [1] under review for the power8 fusion piece, moving this under his name. Thanks Mike! [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591496.html
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #35 from Kewen Lin --- > I don't think the r12-6219 commit qualifies for backporting. What about the > comment#31 patch? Does it address the issue for Eigen on the branches? Got it. comment#31 patch can only address the mismatch issue on power8-fusion, it doesn't help htm. (FWIW, the previous posted patch https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587635.html doesn't either.) For htm issue, I think we can go with the workarounds for branches like: 1) target attribute "cpu=power10,htm" for callers to pretend power10 having htm to avoid mismatch. 2) adding -mno-htm for the whole Eigen build if it doesn't use any HTM features. 3) if unfortunately it does use HTM, make it fine grain to only those inlined callees with target attribute "no-htm".
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #34 from Richard Biener --- (In reply to Kewen Lin from comment #33) > Since this issue affects Eigen building on Power and we have GCC11 and GCC10 > to support Power10 MMA, one of our colleagues is wondering if we can > backport all the fixes there for this PR. In my limited understanding, the > commit r12-6219 in comment 26 is more like a small feature, it might not be > recommended to be backported and we have to use some workaround for HTM. > > @Richi and @Jakub, what's your opinions? I don't think the r12-6219 commit qualifies for backporting. What about the comment#31 patch? Does it address the issue for Eigen on the branches?
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 Kewen Lin changed: What|Removed |Added CC||jakub at gcc dot gnu.org, ||rguenth at gcc dot gnu.org --- Comment #33 from Kewen Lin --- Since this issue affects Eigen building on Power and we have GCC11 and GCC10 to support Power10 MMA, one of our colleagues is wondering if we can backport all the fixes there for this PR. In my limited understanding, the commit r12-6219 in comment 26 is more like a small feature, it might not be recommended to be backported and we have to use some workaround for HTM. @Richi and @Jakub, what's your opinions?
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #32 from Kewen Lin --- (In reply to Michael Meissner from comment #31) > Created attachment 52383 [details] > Simpler patch to fix the problem with power8-fusion. > > This patch just ignores the -mpower8-fusion option in the callee if the > caller does not have it set, and the option wasn't set explicitly. Thanks for the patch Mike! IMHO, it can also remove the option -mno-power8-fusion in gcc/testsuite/gcc.dg/lto/pr102059-1_0.c, the case is designed for the usage reflected by this PR.
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #31 from Michael Meissner --- Created attachment 52383 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52383&action=edit Simpler patch to fix the problem with power8-fusion. This patch just ignores the -mpower8-fusion option in the callee if the caller does not have it set, and the option wasn't set explicitly.
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #30 from Kewen Lin --- (In reply to pc from comment #27) > There was a commit related to this bug, but it is still in ASSIGNED state, > so I'm not sure if this was to be considered "fixed", but... > > Chip discovered that, with a build of today's trunk, the original test case, > and at least gcc.target/powerpc/pr102059-1.c still fail (I didn't try > others), and it seems to be related to the presence of "-flto": > -- > $ gcc -c gcc/testsuite/gcc.target/powerpc/pr102059-1.c -O2 -mcpu=power8 > -Wno-attributes -flto > gcc/testsuite/gcc.target/powerpc/pr102059-1.c: In function 'bar': > gcc/testsuite/gcc.target/powerpc/pr102059-1.c:8:1: error: inlining failed in > call to 'always_inline' 'foo': target specific option mismatch > 8 | foo (int *b) > | ^~~ > gcc/testsuite/gcc.target/powerpc/pr102059-1.c:18:8: note: called from here >18 | *a = foo (a); > |^~~ > > $ gcc -c gcc/testsuite/gcc.target/powerpc/pr102059-1.c -O2 -mcpu=power8 > -Wno-attributes > $ > -- > > The testcases included with the commit do not use "-flto", so these tests > are PASSing. Your spotted failure is expected, as comment 25 there are two patches for this issue, the pushed commit was just for the 2nd issue. The first issue is pending on review. Please see the link: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587635.html, I just ping-ed it before my vacation, will ping it again later.
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 Peter Bergner changed: What|Removed |Added Known to fail||12.0 Known to work|12.0| --- Comment #29 from Peter Bergner --- Since this is still broken when using trunk, I'm moving GCC 12.0 from Known to Work back to Known to Fail.
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #28 from Dan Horák --- comment #27 matches our experience in Fedora, still a build issue in Eigen with gcc12 and LTO
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 pc at gcc dot gnu.org changed: What|Removed |Added CC||pc at gcc dot gnu.org --- Comment #27 from pc at gcc dot gnu.org --- There was a commit related to this bug, but it is still in ASSIGNED state, so I'm not sure if this was to be considered "fixed", but... Chip discovered that, with a build of today's trunk, the original test case, and at least gcc.target/powerpc/pr102059-1.c still fail (I didn't try others), and it seems to be related to the presence of "-flto": -- $ gcc -c gcc/testsuite/gcc.target/powerpc/pr102059-1.c -O2 -mcpu=power8 -Wno-attributes -flto gcc/testsuite/gcc.target/powerpc/pr102059-1.c: In function 'bar': gcc/testsuite/gcc.target/powerpc/pr102059-1.c:8:1: error: inlining failed in call to 'always_inline' 'foo': target specific option mismatch 8 | foo (int *b) | ^~~ gcc/testsuite/gcc.target/powerpc/pr102059-1.c:18:8: note: called from here 18 | *a = foo (a); |^~~ $ gcc -c gcc/testsuite/gcc.target/powerpc/pr102059-1.c -O2 -mcpu=power8 -Wno-attributes $ -- The testcases included with the commit do not use "-flto", so these tests are PASSing.
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #26 from CVS Commits --- The master branch has been updated by Kewen Lin : https://gcc.gnu.org/g:0fc60c183358be2f2003b94226ab49e21c585b13 commit r12-6219-g0fc60c183358be2f2003b94226ab49e21c585b13 Author: Kewen Lin Date: Tue Jan 4 20:27:18 2022 -0600 ipa-inline: Add target info into fn summary [PR102059] Power ISA 2.07 (Power8) introduces transactional memory feature but ISA3.1 (Power10) removes it. It exposes one troublesome issue as PR102059 shows. Users define some function with target pragma cpu=power10 then it calls one function with attribute always_inline which inherits command line option -mcpu=power8 which enables HTM implicitly. The current isa_flags check doesn't allow this inlining due to "target specific option mismatch" and error mesasge is emitted. Normally, the callee function isn't intended to exploit HTM feature, but the default flag setting make it look it has. As Richi raised in the PR, we have fp_expressions flag in function summary, and allow us to check the function actually contains any floating point expressions to avoid overkill. So this patch follows the similar idea but is more target specific, for this rs6000 port specific requirement on HTM feature check, we would like to check rs6000 specific HTM built-in functions and inline assembly, it allows targets to do their own customized checks and updates. It introduces two target hooks need_ipa_fn_target_info and update_ipa_fn_target_info. The former allows target to do some previous check and decides to collect target specific information for this function or not. For some special case, it can predict the analysis result and set it early without any scannings. The latter allows the analyze_function_body to pass gimple stmts down just like fp_expressions handlings, target can do its own tricks. I put them together as one hook initially with one boolean to indicate whether it's initial time, but the code looks a bit ugly, to separate them seems to have better readability. gcc/ChangeLog: PR ipa/102059 * config/rs6000/rs6000.c (TARGET_NEED_IPA_FN_TARGET_INFO): New macro. (TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise. (rs6000_need_ipa_fn_target_info): New function. (rs6000_update_ipa_fn_target_info): Likewise. (rs6000_can_inline_p): Adjust for ipa function summary target info. * config/rs6000/rs6000.h (RS6000_FN_TARGET_INFO_HTM): New macro. * ipa-fnsummary.c (ipa_dump_fn_summary): Adjust for ipa function summary target info. (analyze_function_body): Adjust for ipa function summary target info and call hook rs6000_need_ipa_fn_target_info and rs6000_update_ipa_fn_target_info. (ipa_merge_fn_summary_after_inlining): Adjust for ipa function summary target info. (inline_read_section): Likewise. (ipa_fn_summary_write): Likewise. * ipa-fnsummary.h (ipa_fn_summary::target_info): New member. * doc/tm.texi: Regenerate. * doc/tm.texi.in (TARGET_UPDATE_IPA_FN_TARGET_INFO): Document new hook. (TARGET_NEED_IPA_FN_TARGET_INFO): Likewise. * target.def (update_ipa_fn_target_info): New hook. (need_ipa_fn_target_info): Likewise. * targhooks.c (default_need_ipa_fn_target_info): New function. (default_update_ipa_fn_target_info): Likewise. * targhooks.h (default_update_ipa_fn_target_info): New declare. (default_need_ipa_fn_target_info): Likewise. gcc/testsuite/ChangeLog: PR ipa/102059 * gcc.dg/lto/pr102059-1_0.c: New test. * gcc.dg/lto/pr102059-1_1.c: New test. * gcc.dg/lto/pr102059-1_2.c: New test. * gcc.dg/lto/pr102059-2_0.c: New test. * gcc.dg/lto/pr102059-2_1.c: New test. * gcc.dg/lto/pr102059-2_2.c: New test. * gcc.target/powerpc/pr102059-1.c: New test. * gcc.target/powerpc/pr102059-2.c: New test. * gcc.target/powerpc/pr102059-3.c: New test.
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #25 from Kewen Lin --- Status update: > > The fusion related flags have been considered in the posted patch: > https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578552.html. > It's still being ping-ed for review since it's posted on Sep. 01. > One RFC/Patch > https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578555.html is also > posted to see if we can avoid to change implicit option behavior for > Power8/9. The patch v3 (https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579658.html) was approved with some additional required adjustments. But the cases were written/tested on top of the above fusion related patch, so I hold to commit it.
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #24 from Chip Kerchner --- (In reply to Kewen Lin from comment #23) > Hi Chip, I can reproduce this error with trunk. With some investigation, I > think it's not duplicated of this PR, some information restoring seems wrong > when lto. Could you please file a separated PR? Thanks in advance! https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102347
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #23 from Kewen Lin --- (In reply to Chip Kerchner from comment #22) > (In reply to Chip Kerchner from comment #21) - Forgot one line of code > > -- > > #pragma GCC target "cpu=power10" > > int main() { > > float *b; > > __vector_quad c; > > __builtin_mma_disassemble_acc(b, &c); > > return 0; > > } > > -- Hi Chip, I can reproduce this error with trunk. With some investigation, I think it's not duplicated of this PR, some information restoring seems wrong when lto. Could you please file a separated PR? Thanks in advance!
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #22 from Chip Kerchner --- (In reply to Chip Kerchner from comment #21) - Forgot one line of code > -- > #pragma GCC target "cpu=power10" > int main() { > float *b; > __vector_quad c; > __builtin_mma_disassemble_acc(b, &c); > return 0; > } > --
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #21 from Chip Kerchner --- I'm also seeing MMA problems with LTO. With this simple program (main.ii) -- int main() { float *b; __vector_quad c; __builtin_mma_disassemble_acc(b, &c); return 0; } -- And this compile (using gcc 10.3.1) -- g++ -flto=auto -mcpu=power9 main.ii -- I'm seeing this error (which does NOT occur without LTO) -- lto1: error: '__builtin_mma_xxmfacc_internal' requires the '-mmma' option lto1: fatal error: target specific builtin not available compilation terminated. --
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #20 from Kewen Lin --- Thanks for the detailed explanation, Mike! The fusion related flags have been considered in the posted patch: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578552.html. One RFC/Patch https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578555.html is also posted to see if we can avoid to change implicit option behavior for Power8/9.
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 Michael Meissner changed: What|Removed |Added CC||meissner at gcc dot gnu.org --- Comment #19 from Michael Meissner --- The main power8 fusion that GCC does is combining: addis rtmp,r0,symbol@hi(r2) ld/lbz/lwz rx,symbol@lo(rtmp) into: addis rx,symbol@hi(r2) ld/lbz/lwz rx,symbol@lo(rx) This fusion is listed as one of the fusion types in the power10 documents. The fusion type is wideimmediate. Note, when you are compiling for -mcpu=power10, this fusion case doesn't often get used because we use PC-relative loads. But the machine does support it. In addition, it combines loads to a traditional floating point register, and then a move to a traditional Altivec register. Similarly, it will combine a move from a traditional Altivec register to a traditional floating point register, and then a store: lfd fy,32(rx)xxlor fy,vsrx xxlor vsrz,fy,fy stfd fy,32(rz) into: li rtmp,32 lirtmp,32 lxdx vsrz,2,rtmp stxdx vsrx.rz.rtmp Now on power9 and power10, this sequence is not generated because we have the lxsd and stxsd instructions (and plxsd/pstxsd in power10). So I suspect, we may want to move the p8 load fusion case support to fusion.md, and do it for power10 as well. Aaron Sawdey may have other thoughts, since he has been working on the power10 fusion support, and knows more what is actually implemented in current hardware. Then for inlining, we may want to exclude p8_fusion and p10_fusion in the comparison in rs6000_can_inline_p, since these are optimizations that don't affect the instructions generated. Note, there were so-called power9 fusion code that was originally in the power9 spec, but was not implemented in the hardware. I removed support for these in November 2018.
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #18 from Kewen Lin --- (In reply to Martin Liška from comment #16) > > > > Thanks for the example, it looks useful! Now the field fp_expressions is > > generic, one target specific summary class seems required then. And not sure > > if the users still have interests/senses to further try flto when they fail > > to get expected inlining with some basic optimization options. :) > > Note the Comment 7, fixing that people will see the error also in non-LTO > mode. Yeah, the patch draft in comment 17 can make it consistent. Thanks for the finding again! I think I was misunderstanding on ipa_fn_summaries, thought it's only generated for LTO. Your reply makes me conclude it's generated for both. :)
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #17 from Kewen Lin --- Created attachment 51357 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51357&action=edit Fix some issues in rs6000_can_inline_p As Martin pointed out, currently function rs6000_can_inline_p just returns true when the callee_tree is NULL, we should use the target_option_default_node instead. Besides, for the else arm we should use target_option_default_node for caller_tree since I noticed rs6000_isa_flags can probably vary since global initialization (like from later calls to rs6000_option_override_internal). The existing explicit option checks only work for non-NULL callee_tree before, but it should apply for NULL callee_tree case, *-2.c case is used to cover/verify it. The testing is ongoing.
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #16 from Martin Liška --- > > Thanks for the example, it looks useful! Now the field fp_expressions is > generic, one target specific summary class seems required then. And not sure > if the users still have interests/senses to further try flto when they fail > to get expected inlining with some basic optimization options. :) Note the Comment 7, fixing that people will see the error also in non-LTO mode.
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #15 from Kewen Lin --- (In reply to Florian Weimer from comment #12) > (In reply to Richard Biener from comment #10) > > As of HTM it would make the testcase a user error - when using -mcpu=power10 > > it would require building with -mcpu=power8 -mno-htm? > > Or -mcpu=power8 should imply -mno-htm. It's not 100% compatible for sure, > but so is removing a feature from a later CPU that was present in an earlier > version and enabled by default in GCC. I suspect it's better to take the > compatibility hit at run time, rather than later at run time (once the users > get their hands on POWER10 hardware). Not sure if we faced similar situations before, both power8 and power9 support HTM, looking forward to the inputs from port maintainers and experts.
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #14 from Kewen Lin --- (In reply to Richard Biener from comment #11) > Note that x86 uses for example > > else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath >/* If the calle doesn't use FP expressions differences in > ix86_fpmath can be ignored. We are called from FEs > for multi-versioning call optimization, so beware of > ipa_fn_summaries not available. */ >&& (! ipa_fn_summaries >|| ipa_fn_summaries->get (callee_node) == NULL >|| ipa_fn_summaries->get (callee_node)->fp_expressions)) > ret = false; > > I wonder if we need to give targets the ability to compute IPA inline > summary bits, like "uses HTM" or "uses intrinsics for ISA X", to > selectively ignore bits that are not actually used. > Thanks for the example, it looks useful! Now the field fp_expressions is generic, one target specific summary class seems required then. And not sure if the users still have interests/senses to further try flto when they fail to get expected inlining with some basic optimization options. :)
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #13 from Kewen Lin --- (In reply to Richard Biener from comment #10) > OPTION_MASK_P8_FUSION is purely optimization and shouldn't prevent inlining, > no? > > As of HTM it would make the testcase a user error - when using -mcpu=power10 > it would require building with -mcpu=power8 -mno-htm? > > Thus INVALID based on HTM, and the OPTION_MASK_P8_FUSION (and other "tuning" > things) should be excluded from the compares in the backend, at least for the > always-inline case. Good point, yes, the fusion should only affect performance, I'll collect some similar flags and do some special handlings for it.
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #12 from Florian Weimer --- (In reply to Richard Biener from comment #10) > As of HTM it would make the testcase a user error - when using -mcpu=power10 > it would require building with -mcpu=power8 -mno-htm? Or -mcpu=power8 should imply -mno-htm. It's not 100% compatible for sure, but so is removing a feature from a later CPU that was present in an earlier version and enabled by default in GCC. I suspect it's better to take the compatibility hit at run time, rather than later at run time (once the users get their hands on POWER10 hardware).
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 Richard Biener changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment #11 from Richard Biener --- Note that x86 uses for example else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath /* If the calle doesn't use FP expressions differences in ix86_fpmath can be ignored. We are called from FEs for multi-versioning call optimization, so beware of ipa_fn_summaries not available. */ && (! ipa_fn_summaries || ipa_fn_summaries->get (callee_node) == NULL || ipa_fn_summaries->get (callee_node)->fp_expressions)) ret = false; I wonder if we need to give targets the ability to compute IPA inline summary bits, like "uses HTM" or "uses intrinsics for ISA X", to selectively ignore bits that are not actually used. And as always my stance on "always_inline" is that we should give thumbs up to inlining and let the user shoot itself in the foot if he likes to.
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #10 from Richard Biener --- OPTION_MASK_P8_FUSION is purely optimization and shouldn't prevent inlining, no? As of HTM it would make the testcase a user error - when using -mcpu=power10 it would require building with -mcpu=power8 -mno-htm? Thus INVALID based on HTM, and the OPTION_MASK_P8_FUSION (and other "tuning" things) should be excluded from the compares in the backend, at least for the always-inline case.
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #9 from Kewen Lin --- One more reduced test case: fail cmd: gcc -c -O2 -flto -mcpu=power8 pass cmd: gcc -c -O2 -flto -mcpu=power8 -mno-htm -mno-power8-fusion -- __attribute__((always_inline)) int foo(int *b) { *b += 10; return *b; } #pragma GCC target "cpu=power10" int bar(int* a){ *a = foo(a); return 0; } -- I confirm the mismatch subset on HTM is expected, ISA 3.1 has removed the transnational memory support. As to OPTION_MASK_P8_FUSION, it's reasonable to exclude it from ISA3.0 (power9), but I'm not sure if it's not a subset of Power10 new fusion, from the current implementation it isn't. Hi @Mike, could you help to confirm this?
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 Kewen Lin changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |linkw at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #8 from Kewen Lin --- Thanks for the investigation, @Martin. I'll take a look tomorrow.
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 Martin Liška changed: What|Removed |Added CC||meissner at linux dot ibm.com --- Comment #7 from Martin Liška --- About the non-LTO mode, the rs6000_can_inline_p succeeds due to: /* If the callee has no option attributes, then it is ok to inline. */ if (!callee_tree) ret = true; This is not correct in my opinion, please see i386.c: static bool ix86_can_inline_p (tree caller, tree callee) { ... if (!callee_tree) callee_tree = target_option_default_node; if (!caller_tree) caller_tree = target_option_default_node; ... So I think the following should be applied: diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index e073b26b430..4a997b0cabc 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -25036,9 +25036,8 @@ rs6000_can_inline_p (tree caller, tree callee) tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller); tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee); - /* If the callee has no option attributes, then it is ok to inline. */ if (!callee_tree) -ret = true; +callee_tree = target_option_default_node; else {
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #6 from Martin Liška --- And the OPTION_MASK_P8_FUSION mask is set here: /* Enable power8 fusion if we are tuning for power8, even if we aren't generating power8 instructions. Power9 does not optimize power8 fusion cases. */ if (!(rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION)) { if (processor_target_table[tune_index].processor == PROCESSOR_POWER8) rs6000_isa_flags |= OPTION_MASK_P8_FUSION; else rs6000_isa_flags &= ~OPTION_MASK_P8_FUSION; }
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 Martin Liška changed: What|Removed |Added CC||linkw at gcc dot gnu.org Status|ASSIGNED|NEW Assignee|marxin at gcc dot gnu.org |unassigned at gcc dot gnu.org --- Comment #5 from Martin Liška --- So what happens in LTO mode: rs6000_can_inline_p is called with: #0 rs6000_can_inline_p (caller=, callee=) at /home/marxin/Programming/gcc/gcc/config/rs6000/rs6000.c:25035 #1 0x02529129 in can_inline_edge_p (e= -> )>, report=true, early=true) at /home/marxin/Programming/gcc/gcc/ipa-inline.c:381 #2 0x0252a48a in can_early_inline_edge_p (e= -> )>) at /home/marxin/Programming/gcc/gcc/ipa-inline.c:646 #3 0x0253135a in inline_always_inline_functions (node=) at /home/marxin/Programming/gcc/gcc/ipa-inline.c:2838 The following condition is false: /* The callee's options must be a subset of the caller's options, i.e. a vsx function may inline an altivec function, but a no-vsx function must not inline a vsx function. However, for those options that the callee has explicitly enabled or disabled, then we must enforce that the callee's and caller's options match exactly; see PR70010. */ >25062 if (((caller_isa & callee_isa) == callee_isa) because: (gdb) p /x (caller_isa & callee_isa) ^ callee_isa $3 = 0x22 which are the following masks: #define OPTION_MASK_HTM (HOST_WIDE_INT_1U << 17) #define OPTION_MASK_P8_FUSION (HOST_WIDE_INT_1U << 37) So looking at the condition, callers options are not subset. Which is caused by g:82800987cb3b22427a8799b3e8491eff496724b9 commit 82800987cb3b22427a8799b3e8491eff496724b9 Author: Kewen Lin Date: Wed Dec 2 01:55:34 2020 -0600 rs6000: Disable HTM for Power10 and later by default Power ISA 3.1 has dropped transactional memory support, this patch is to disable HTM feature for power10 and later by default. Bootstrapped/regtested on powerpc64le-linux-gnu P8 and P10. gcc/ChangeLog: * config/rs6000/rs6000.c (rs6000_option_override_internal): Use OPTION_MASK_DIRECT_MOVE for Power8 target_enable instead of OPTION_MASK_HTM. * config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER): Remove OPTION_MASK_HTM. (RS6000_CPU): Add OPTION_MASK_HTM to power8, power9 and powerpc64le entries. @Kewen: Can you please take a look?
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 --- Comment #4 from Martin Liška --- Reduced test-case: $ cat /tmp/basic_op.ii enum { Unaligned, Aligned }; enum { ColMajor }; enum { ReadOnlyAccessors, DefaultProduct }; template struct traits; struct accessors_level { enum { has_direct_access, has_write_access }; }; template class Matrix; template class MatrixBase; template class NoAlias; template class Product; template class MapBase; template class Map; template struct dense_xpr_base { typedef MatrixBase type; }; template struct assign_op {}; struct DenseBase { typedef int Scalar; enum { Flags }; }; template struct MatrixBase : DenseBase { NoAlias noalias(); }; template class ProductImpl; template struct Product : ProductImpl { _Lhs lhs(); _Rhs rhs(); }; template struct ProductImpl : dense_xpr_base::type {}; template struct Assignment; template void call_assignment_no_alias(Dst dst, Src src, Func func) { Assignment::run(dst, src, func); } template struct general_matrix_matrix_product; template class blas_data_mapper; template struct blas_data_mapper { blas_data_mapper(Scalar *, Index, Index); }; template struct NoAlias { template void operator=(OtherDerived) { Product, 1>, Map, 1>> __trans_tmp_4; call_assignment_no_alias(m_expression, __trans_tmp_4, assign_op()); } ExpressionType m_expression; }; template struct traits> { typedef _Scalar Scalar; }; template struct MapBase : dense_xpr_base::type { typedef typename traits::Scalar *PointerType; }; template struct MapBase : MapBase {}; template struct traits> : traits {}; template struct Map : MapBase, MapOptions>> { typedef Map Base; Map(typename Base::PointerType, long, long); }; struct gebp_traits { enum { nr, mr }; }; template struct gebp_kernel; long parallelize_gemm_cols; template void parallelize_gemm(Functor func, Index, bool) { func(0, parallelize_gemm_cols); } template struct generic_product_impl; template struct Assignment, assign_op> { static void run(DstXprType dst, Product src, assign_op) { Map __trans_tmp_5 = src.rhs(); generic_product_impl::evalTo(dst, src.lhs(), __trans_tmp_5); } }; template struct general_matrix_matrix_product { typedef LhsScalar ResScalar; static void run(ResScalar *_res, ResScalar alpha) { Index resStride, resIncr, actual_mc, actual_kc, actual_nc; typedef blas_data_mapper ResMapper; ResMapper res(_res, resStride, resIncr); gebp_kernel gebp; LhsScalar blockA, blockB; gebp(res, &blockA, &blockB, actual_mc, actual_kc, actual_nc, alpha); } }; struct gemm_blocking_space; struct gemm_functor { gemm_functor(Map, 1>, Map, 1>, Map, 1>, float, gemm_blocking_space); void operator()(int, int) { general_matrix_matrix_product::run( 0, m_actualAlpha); } float m_actualAlpha; }; struct gemm_blocking_space { gemm_blocking_space(long, long, long, long, bool); }; int scaleAndAddTo___trans_tmp_3; template struct generic_product_impl { template static void evalTo(Dst dst, Lhs lhs, Rhs rhs) { scaleAndAddTo(dst, lhs, rhs); } template static void scaleAndAddTo(Dest dst, Lhs a_lhs, Rhs a_rhs) { Map lhs(a_lhs); Map rhs(a_rhs); typedef gemm_functor GemmFunctor; gemm_blocking_space blocking(0, 0, 0, 1, true); parallelize_gemm<0>(GemmFunctor(lhs, rhs, dst, 0, blocking), scaleAndAddTo___trans_tmp_3, Dest::Flags); } }; template Packet bmask(); #pragma GCC target "cpu=power10" template void gemmMMA(const DataMapper &, const float *, const float *, long, long, long, float, long, long, long, long) { bmask(); } #pragma GCC reset_options struct quad_traits { enum { size, rows }; }; template __attribute__((always_inline)) Packet bmask() {} template struct gebp_kernel { void operator()(const DataMapper &, const float *, const float *, Index, Index, Index, float, Index = 1, Index = 1, Index = 0, Index = 0); }; template void gebp_kernel::operator()(const DataMapper &, const float *, const float *, Index, Index, Index, float, Index, Index, Index, Index) { void (*gemm_function)(const DataMapper &, const float *, const float *, Index, Index, Index, float, Index, Index, Index, Index) = gemmMMA; } template struct Data_ { float &operator[](long); Data_ *MatrixOp(); }; Product __trans_tmp_7; template Data_ *Data_::MatrixOp() { long NbCol0, NbRow1; Data_ res; Map m2(&res[0], NbCol0, NbRow1); m2.noalias() = __trans_tmp_7; } template class Data_;
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 Chip Kerchner changed: What|Removed |Added CC||chip.kerchner at ibm dot com --- Comment #3 from Chip Kerchner --- It also fails with version 10.3
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 Martin Liška changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |marxin at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #2 from Martin Liška --- Interesting, let me take a look.
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059 Richard Biener changed: What|Removed |Added Ever confirmed|0 |1 Keywords||lto Last reconfirmed||2021-08-25 Known to fail||12.0 Component|lto |ipa Status|UNCONFIRMED |NEW --- Comment #1 from Richard Biener --- Interesting. Note this happens at compile-time where -flto shouldn't have any such effects ... I happen to have a trunk cc1plus from some point around and that reproduces the issue as well.