[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2022-05-19 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

Peter Bergner  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #42 from Peter Bergner  ---
Fixed everywhere as much as we will.

It was decided (Comment 34) not to backport Kewen's patch that fixes the HTM
issue to the GCC 11 and older release branches and that instead we will live
with the workarounds described in Comment 35 for them.  The -mpower8-fusion bug
is fixed everywhere.

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2022-05-19 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #41 from CVS Commits  ---
The releases/gcc-11 branch has been updated by Peter Bergner
:

https://gcc.gnu.org/g:3ea6d0ed6860e4f1d8f609c26de9d9a7f7a9ad2d

commit r11-10022-g3ea6d0ed6860e4f1d8f609c26de9d9a7f7a9ad2d
Author: Michael Meissner 
Date:   Fri May 6 11:39:13 2022 -0500

rs6000: Ignore fusion option flags for inlining test [PR102059]

The -mpower8-fusion and -mpower10-fusion options do not modify which
instructions we can generate, so ignore them when deciding whether we
can inline callee into caller.

2022-05-06  Michael Meissner  
Segher Boessenkool  

gcc/
PR target/102059
* config/rs6000/rs6000.c (rs6000_can_inline_p): Ignore
-mpower8-fusion
and -mpower10-fusion options for inlining purposes.

gcc/testsuite/
PR target/102059
* gcc.target/powerpc/pr102059-4.c: New test.

(cherry picked from commit 2fb654f77d5292864ef57040f7bc01d7a975f6d9)

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2022-05-06 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #40 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Peter Bergner
:

https://gcc.gnu.org/g:e6b1ac334ac61f72536f3479f735ea3514f1309d

commit r12-8353-ge6b1ac334ac61f72536f3479f735ea3514f1309d
Author: Michael Meissner 
Date:   Fri May 6 11:39:13 2022 -0500

rs6000: Ignore fusion option flags for inlining test [PR102059]

The -mpower8-fusion and -mpower10-fusion options do not modify which
instructions we can generate, so ignore them when deciding whether we
can inline callee into caller.

2022-05-06   Michael Meissner  

gcc/
PR target/102059
* config/rs6000/rs6000.cc (rs6000_can_inline_p): Ignore
-mpower8-fusion
and -mpower10-fusion options for inlining purposes.

gcc/testsuite/
PR target/102059
* gcc.target/powerpc/pr102059-4.c: New test.

(cherry picked from commit 2fb654f77d5292864ef57040f7bc01d7a975f6d9)

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2022-05-06 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #39 from CVS Commits  ---
The releases/gcc-10 branch has been updated by Peter Bergner
:

https://gcc.gnu.org/g:aa33a2d866c0f3f2f9b3cb26f8ff2074bcd169b4

commit r10-10596-gaa33a2d866c0f3f2f9b3cb26f8ff2074bcd169b4
Author: Michael Meissner 
Date:   Fri May 6 11:39:13 2022 -0500

rs6000: Ignore fusion option flags for inlining test [PR102059]

The -mpower8-fusion option does not modify which instructions we can
generate,
so ignore it when deciding whether we can inline callee into caller.

2022-05-06   Michael Meissner  

gcc/
PR target/102059
* config/rs6000/rs6000.c (rs6000_can_inline_p): Ignore
-mpower8-fusion
option for inlining purposes.

gcc/testsuite/
PR target/102059
* gcc.target/powerpc/pr102059-4.c: New test.

(cherry picked from commit 2fb654f77d5292864ef57040f7bc01d7a975f6d9)

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2022-05-06 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #38 from CVS Commits  ---
The master branch has been updated by Peter Bergner :

https://gcc.gnu.org/g:2fb654f77d5292864ef57040f7bc01d7a975f6d9

commit r13-157-g2fb654f77d5292864ef57040f7bc01d7a975f6d9
Author: Michael Meissner 
Date:   Fri May 6 11:39:13 2022 -0500

rs6000: Ignore fusion option flags for inlining test [PR102059]

The -mpower8-fusion and -mpower10-fusion options do not modify which
instructions we can generate, so ignore them when deciding whether we
can inline callee into caller.

2022-05-06   Michael Meissner  

gcc/
PR target/102059
* config/rs6000/rs6000.cc (rs6000_can_inline_p): Ignore
-mpower8-fusion
and -mpower10-fusion options for inlining purposes.

gcc/testsuite/
PR target/102059
* gcc.target/powerpc/pr102059-4.c: New test.

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2022-05-05 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #37 from Peter Bergner  ---
(In reply to Kewen Lin from comment #36)
> Mike had one patch [1] under review for the power8 fusion piece, moving this
> under his name. Thanks Mike!
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591496.html

Pinged a 5th time here:

  https://gcc.gnu.org/pipermail/gcc-patches/2022-May/593943.html

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2022-03-14 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

Kewen Lin  changed:

   What|Removed |Added

   Assignee|linkw at gcc dot gnu.org   |meissner at gcc dot 
gnu.org
URL||https://gcc.gnu.org/piperma
   ||il/gcc-patches/2022-March/5
   ||91496.html

--- Comment #36 from Kewen Lin  ---
Mike had one patch [1] under review for the power8 fusion piece, moving this
under his name. Thanks Mike!

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591496.html

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2022-02-08 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #35 from Kewen Lin  ---
> I don't think the r12-6219 commit qualifies for backporting.  What about the
> comment#31 patch?  Does it address the issue for Eigen on the branches?

Got it. comment#31 patch can only address the mismatch issue on power8-fusion,
it doesn't help htm. (FWIW, the previous posted patch
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587635.html doesn't
either.)

For htm issue, I think we can go with the workarounds for branches like:
  1) target attribute "cpu=power10,htm" for callers to pretend power10 having
htm to avoid mismatch.
  2) adding -mno-htm for the whole Eigen build if it doesn't use any HTM
features.
  3) if unfortunately it does use HTM, make it fine grain to only those inlined
callees with target attribute "no-htm".

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2022-02-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #34 from Richard Biener  ---
(In reply to Kewen Lin from comment #33)
> Since this issue affects Eigen building on Power and we have GCC11 and GCC10
> to support Power10 MMA, one of our colleagues is wondering if we can
> backport all the fixes there for this PR. In my limited understanding, the
> commit r12-6219 in comment 26 is more like a small feature, it might not be
> recommended to be backported and we have to use some workaround for HTM.
> 
> @Richi and @Jakub, what's your opinions?

I don't think the r12-6219 commit qualifies for backporting.  What about the
comment#31 patch?  Does it address the issue for Eigen on the branches?

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2022-02-08 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

Kewen Lin  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org

--- Comment #33 from Kewen Lin  ---
Since this issue affects Eigen building on Power and we have GCC11 and GCC10 to
support Power10 MMA, one of our colleagues is wondering if we can backport all
the fixes there for this PR. In my limited understanding, the commit r12-6219
in comment 26 is more like a small feature, it might not be recommended to be
backported and we have to use some workaround for HTM.

@Richi and @Jakub, what's your opinions?

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2022-02-08 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #32 from Kewen Lin  ---
(In reply to Michael Meissner from comment #31)
> Created attachment 52383 [details]
> Simpler patch to fix the problem with power8-fusion.
> 
> This patch just ignores the -mpower8-fusion option in the callee if the
> caller does not have it set, and the option wasn't set explicitly.

Thanks for the patch Mike! IMHO, it can also remove the option
-mno-power8-fusion in gcc/testsuite/gcc.dg/lto/pr102059-1_0.c, the case is
designed for the usage reflected by this PR.

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2022-02-08 Thread meissner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #31 from Michael Meissner  ---
Created attachment 52383
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52383&action=edit
Simpler patch to fix the problem with power8-fusion.

This patch just ignores the -mpower8-fusion option in the callee if the caller
does not have it set, and the option wasn't set explicitly.

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2022-02-04 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #30 from Kewen Lin  ---
(In reply to pc from comment #27)
> There was a commit related to this bug, but it is still in ASSIGNED state,
> so I'm not sure if this was to be considered "fixed", but...
> 
> Chip discovered that, with a build of today's trunk, the original test case,
> and at least gcc.target/powerpc/pr102059-1.c still fail (I didn't try
> others), and it seems to be related to the presence of "-flto":
> --
> $ gcc -c gcc/testsuite/gcc.target/powerpc/pr102059-1.c -O2 -mcpu=power8
> -Wno-attributes -flto 
> gcc/testsuite/gcc.target/powerpc/pr102059-1.c: In function 'bar':
> gcc/testsuite/gcc.target/powerpc/pr102059-1.c:8:1: error: inlining failed in
> call to 'always_inline' 'foo': target specific option mismatch
> 8 | foo (int *b)
>   | ^~~
> gcc/testsuite/gcc.target/powerpc/pr102059-1.c:18:8: note: called from here
>18 |   *a = foo (a);
>   |^~~
> 
> $ gcc -c gcc/testsuite/gcc.target/powerpc/pr102059-1.c -O2 -mcpu=power8
> -Wno-attributes
> $ 
> --
> 
> The testcases included with the commit do not use "-flto", so these tests
> are PASSing.

Your spotted failure is expected, as comment 25 there are two patches for this
issue, the pushed commit was just for the 2nd issue. The first issue is pending
on review. Please see the link:
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587635.html, I just
ping-ed it before my vacation, will ping it again later.

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2022-02-04 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

Peter Bergner  changed:

   What|Removed |Added

  Known to fail||12.0
  Known to work|12.0|

--- Comment #29 from Peter Bergner  ---
Since this is still broken when using trunk, I'm moving GCC 12.0 from Known to
Work back to Known to Fail.

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2022-02-04 Thread dan at danny dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #28 from Dan Horák  ---
comment #27 matches our experience in Fedora, still a build issue in Eigen with
gcc12 and LTO

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2022-02-04 Thread pc at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

pc at gcc dot gnu.org changed:

   What|Removed |Added

 CC||pc at gcc dot gnu.org

--- Comment #27 from pc at gcc dot gnu.org ---
There was a commit related to this bug, but it is still in ASSIGNED state, so
I'm not sure if this was to be considered "fixed", but...

Chip discovered that, with a build of today's trunk, the original test case,
and at least gcc.target/powerpc/pr102059-1.c still fail (I didn't try others),
and it seems to be related to the presence of "-flto":
--
$ gcc -c gcc/testsuite/gcc.target/powerpc/pr102059-1.c -O2 -mcpu=power8
-Wno-attributes -flto 
gcc/testsuite/gcc.target/powerpc/pr102059-1.c: In function 'bar':
gcc/testsuite/gcc.target/powerpc/pr102059-1.c:8:1: error: inlining failed in
call to 'always_inline' 'foo': target specific option mismatch
8 | foo (int *b)
  | ^~~
gcc/testsuite/gcc.target/powerpc/pr102059-1.c:18:8: note: called from here
   18 |   *a = foo (a);
  |^~~

$ gcc -c gcc/testsuite/gcc.target/powerpc/pr102059-1.c -O2 -mcpu=power8
-Wno-attributes
$ 
--

The testcases included with the commit do not use "-flto", so these tests are
PASSing.

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2022-01-04 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #26 from CVS Commits  ---
The master branch has been updated by Kewen Lin :

https://gcc.gnu.org/g:0fc60c183358be2f2003b94226ab49e21c585b13

commit r12-6219-g0fc60c183358be2f2003b94226ab49e21c585b13
Author: Kewen Lin 
Date:   Tue Jan 4 20:27:18 2022 -0600

ipa-inline: Add target info into fn summary [PR102059]

Power ISA 2.07 (Power8) introduces transactional memory
feature but ISA3.1 (Power10) removes it.  It exposes one
troublesome issue as PR102059 shows.  Users define some
function with target pragma cpu=power10 then it calls one
function with attribute always_inline which inherits
command line option -mcpu=power8 which enables HTM
implicitly.  The current isa_flags check doesn't allow this
inlining due to "target specific option mismatch" and error
mesasge is emitted.

Normally, the callee function isn't intended to exploit HTM
feature, but the default flag setting make it look it has.
As Richi raised in the PR, we have fp_expressions flag in
function summary, and allow us to check the function
actually contains any floating point expressions to avoid
overkill.  So this patch follows the similar idea but is
more target specific, for this rs6000 port specific
requirement on HTM feature check, we would like to check
rs6000 specific HTM built-in functions and inline assembly,
it allows targets to do their own customized checks and
updates.

It introduces two target hooks need_ipa_fn_target_info and
update_ipa_fn_target_info.  The former allows target to do
some previous check and decides to collect target specific
information for this function or not.  For some special
case, it can predict the analysis result and set it early
without any scannings.  The latter allows the
analyze_function_body to pass gimple stmts down just like
fp_expressions handlings, target can do its own tricks.
I put them together as one hook initially with one boolean
to indicate whether it's initial time, but the code looks a
bit ugly, to separate them seems to have better readability.

gcc/ChangeLog:

PR ipa/102059
* config/rs6000/rs6000.c (TARGET_NEED_IPA_FN_TARGET_INFO): New
macro.
(TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise.
(rs6000_need_ipa_fn_target_info): New function.
(rs6000_update_ipa_fn_target_info): Likewise.
(rs6000_can_inline_p): Adjust for ipa function summary target info.
* config/rs6000/rs6000.h (RS6000_FN_TARGET_INFO_HTM): New macro.
* ipa-fnsummary.c (ipa_dump_fn_summary): Adjust for ipa function
summary target info.
(analyze_function_body): Adjust for ipa function summary target
info
and call hook rs6000_need_ipa_fn_target_info and
rs6000_update_ipa_fn_target_info.
(ipa_merge_fn_summary_after_inlining): Adjust for ipa function
summary
target info.
(inline_read_section): Likewise.
(ipa_fn_summary_write): Likewise.
* ipa-fnsummary.h (ipa_fn_summary::target_info): New member.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_UPDATE_IPA_FN_TARGET_INFO): Document new
hook.
(TARGET_NEED_IPA_FN_TARGET_INFO): Likewise.
* target.def (update_ipa_fn_target_info): New hook.
(need_ipa_fn_target_info): Likewise.
* targhooks.c (default_need_ipa_fn_target_info): New function.
(default_update_ipa_fn_target_info): Likewise.
* targhooks.h (default_update_ipa_fn_target_info): New declare.
(default_need_ipa_fn_target_info): Likewise.

gcc/testsuite/ChangeLog:

PR ipa/102059
* gcc.dg/lto/pr102059-1_0.c: New test.
* gcc.dg/lto/pr102059-1_1.c: New test.
* gcc.dg/lto/pr102059-1_2.c: New test.
* gcc.dg/lto/pr102059-2_0.c: New test.
* gcc.dg/lto/pr102059-2_1.c: New test.
* gcc.dg/lto/pr102059-2_2.c: New test.
* gcc.target/powerpc/pr102059-1.c: New test.
* gcc.target/powerpc/pr102059-2.c: New test.
* gcc.target/powerpc/pr102059-3.c: New test.

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-11-25 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #25 from Kewen Lin  ---
Status update:

> 
> The fusion related flags have been considered in the posted patch:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578552.html. 
> 

It's still being ping-ed for review since it's posted on Sep. 01.

> One RFC/Patch
> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578555.html is also
> posted to see if we can avoid to change implicit option behavior for
> Power8/9.

The patch v3
(https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579658.html) was
approved with some additional required adjustments. But the cases were
written/tested on top of the above fusion related patch, so I hold to commit
it.

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-09-15 Thread chip.kerchner at ibm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #24 from Chip Kerchner  ---
(In reply to Kewen Lin from comment #23)
> Hi Chip, I can reproduce this error with trunk. With some investigation, I
> think it's not duplicated of this PR, some information restoring seems wrong
> when lto. Could you please file a separated PR?  Thanks in advance!

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102347

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-09-15 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #23 from Kewen Lin  ---
(In reply to Chip Kerchner from comment #22)
> (In reply to Chip Kerchner from comment #21) - Forgot one line of code
> > --
> > #pragma GCC target "cpu=power10"
> > int main() {
> >   float *b;
> >   __vector_quad c;
> >   __builtin_mma_disassemble_acc(b, &c);
> >   return 0;
> > }
> > --

Hi Chip, I can reproduce this error with trunk. With some investigation, I
think it's not duplicated of this PR, some information restoring seems wrong
when lto. Could you please file a separated PR?  Thanks in advance!

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-09-14 Thread chip.kerchner at ibm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #22 from Chip Kerchner  ---
(In reply to Chip Kerchner from comment #21) - Forgot one line of code
> --
> #pragma GCC target "cpu=power10"
> int main() {
>   float *b;
>   __vector_quad c;
>   __builtin_mma_disassemble_acc(b, &c);
>   return 0;
> }
> --

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-09-14 Thread chip.kerchner at ibm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #21 from Chip Kerchner  ---
I'm also seeing MMA problems with LTO.  With this simple program (main.ii)

--
int main() {
  float *b;
  __vector_quad c;
  __builtin_mma_disassemble_acc(b, &c);
  return 0;
}
--

And this compile (using gcc 10.3.1)

--
g++ -flto=auto -mcpu=power9 main.ii
--

I'm seeing this error (which does NOT occur without LTO)

--
lto1: error: '__builtin_mma_xxmfacc_internal' requires the '-mmma' option
lto1: fatal error: target specific builtin not available
compilation terminated.
--

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-09-01 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #20 from Kewen Lin  ---
Thanks for the detailed explanation, Mike!

The fusion related flags have been considered in the posted patch:
https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578552.html. 

One RFC/Patch
https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578555.html is also
posted to see if we can avoid to change implicit option behavior for Power8/9.

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-26 Thread meissner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

Michael Meissner  changed:

   What|Removed |Added

 CC||meissner at gcc dot gnu.org

--- Comment #19 from Michael Meissner  ---
The main power8 fusion that GCC does is combining:

addis   rtmp,r0,symbol@hi(r2)
ld/lbz/lwz  rx,symbol@lo(rtmp)

into:

addis   rx,symbol@hi(r2)
ld/lbz/lwz  rx,symbol@lo(rx)

This fusion is listed as one of the fusion types in the power10 documents.  The
fusion type is wideimmediate.  Note, when you are compiling for -mcpu=power10,
this fusion case doesn't often get used because we use PC-relative loads.  But
the machine does support it.

In addition, it combines loads to a traditional floating point register, and
then a move to a traditional Altivec register.   Similarly, it will combine a
move from a traditional Altivec register to a traditional floating point
register, and then a store:

lfd   fy,32(rx)xxlor fy,vsrx
xxlor vsrz,fy,fy   stfd  fy,32(rz)

into:

li   rtmp,32   lirtmp,32
lxdx vsrz,2,rtmp   stxdx vsrx.rz.rtmp

Now on power9 and power10, this sequence is not generated because we have the
lxsd and stxsd instructions (and plxsd/pstxsd in power10).

So I suspect, we may want to move the p8 load fusion case support to fusion.md,
and do it for power10 as well.  Aaron Sawdey may have other thoughts, since he
has been working on the power10 fusion support, and knows more what is actually
implemented in current hardware.

Then for inlining, we may want to exclude p8_fusion and p10_fusion in the
comparison in rs6000_can_inline_p, since these are optimizations that don't
affect the instructions generated.

Note, there were so-called power9 fusion code that was originally in the power9
spec, but was not implemented in the hardware.  I removed support for these in
November 2018.

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-26 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #18 from Kewen Lin  ---
(In reply to Martin Liška from comment #16)
> > 
> > Thanks for the example, it looks useful! Now the field fp_expressions is
> > generic, one target specific summary class seems required then. And not sure
> > if the users still have interests/senses to further try flto when they fail
> > to get expected inlining with some basic optimization options. :)
> 
> Note the Comment 7, fixing that people will see the error also in non-LTO
> mode.

Yeah, the patch draft in comment 17 can make it consistent. Thanks for the
finding again! I think I was misunderstanding on ipa_fn_summaries, thought it's
only generated for LTO. Your reply makes me conclude it's generated for both.
:)

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-26 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #17 from Kewen Lin  ---
Created attachment 51357
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51357&action=edit
Fix some issues in rs6000_can_inline_p

As Martin pointed out, currently function rs6000_can_inline_p just returns true
when the callee_tree is NULL, we should use the target_option_default_node
instead. Besides, for the else arm we should use target_option_default_node for
caller_tree since I noticed rs6000_isa_flags can probably vary since global
initialization (like from later calls to rs6000_option_override_internal). The
existing explicit option checks only work for non-NULL callee_tree before, but
it should apply for NULL callee_tree case, *-2.c case is used to cover/verify
it. The testing is ongoing.

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-26 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #16 from Martin Liška  ---
> 
> Thanks for the example, it looks useful! Now the field fp_expressions is
> generic, one target specific summary class seems required then. And not sure
> if the users still have interests/senses to further try flto when they fail
> to get expected inlining with some basic optimization options. :)

Note the Comment 7, fixing that people will see the error also in non-LTO mode.

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-26 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #15 from Kewen Lin  ---
(In reply to Florian Weimer from comment #12)
> (In reply to Richard Biener from comment #10)
> > As of HTM it would make the testcase a user error - when using -mcpu=power10
> > it would require building with -mcpu=power8 -mno-htm?
> 
> Or -mcpu=power8 should imply -mno-htm. It's not 100% compatible for sure,
> but so is removing a feature from a later CPU that was present in an earlier
> version and enabled by default in GCC. I suspect it's better to take the
> compatibility hit at run time, rather than later at run time (once the users
> get their hands on POWER10 hardware).

Not sure if we faced similar situations before, both power8 and power9 support
HTM, looking forward to the inputs from port maintainers and experts.

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-26 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #14 from Kewen Lin  ---
(In reply to Richard Biener from comment #11)
> Note that x86 uses for example
> 
>   else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
>/* If the calle doesn't use FP expressions differences in
>   ix86_fpmath can be ignored.  We are called from FEs
>   for multi-versioning call optimization, so beware of
>   ipa_fn_summaries not available.  */
>&& (! ipa_fn_summaries
>|| ipa_fn_summaries->get (callee_node) == NULL
>|| ipa_fn_summaries->get (callee_node)->fp_expressions))
> ret = false;
> 
> I wonder if we need to give targets the ability to compute IPA inline
> summary bits, like "uses HTM" or "uses intrinsics for ISA X", to
> selectively ignore bits that are not actually used.
> 

Thanks for the example, it looks useful! Now the field fp_expressions is
generic, one target specific summary class seems required then. And not sure if
the users still have interests/senses to further try flto when they fail to get
expected inlining with some basic optimization options. :)

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-26 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #13 from Kewen Lin  ---
(In reply to Richard Biener from comment #10)
> OPTION_MASK_P8_FUSION is purely optimization and shouldn't prevent inlining,
> no?
> 
> As of HTM it would make the testcase a user error - when using -mcpu=power10
> it would require building with -mcpu=power8 -mno-htm?
> 
> Thus INVALID based on HTM, and the OPTION_MASK_P8_FUSION (and other "tuning"
> things) should be excluded from the compares in the backend, at least for the
> always-inline case.

Good point, yes, the fusion should only affect performance, I'll collect some
similar flags and do some special handlings for it.

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-26 Thread fw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #12 from Florian Weimer  ---
(In reply to Richard Biener from comment #10)
> As of HTM it would make the testcase a user error - when using -mcpu=power10
> it would require building with -mcpu=power8 -mno-htm?

Or -mcpu=power8 should imply -mno-htm. It's not 100% compatible for sure, but
so is removing a feature from a later CPU that was present in an earlier
version and enabled by default in GCC. I suspect it's better to take the
compatibility hit at run time, rather than later at run time (once the users
get their hands on POWER10 hardware).

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-26 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

Richard Biener  changed:

   What|Removed |Added

 CC||hubicka at gcc dot gnu.org

--- Comment #11 from Richard Biener  ---
Note that x86 uses for example

  else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
   /* If the calle doesn't use FP expressions differences in
  ix86_fpmath can be ignored.  We are called from FEs
  for multi-versioning call optimization, so beware of
  ipa_fn_summaries not available.  */
   && (! ipa_fn_summaries
   || ipa_fn_summaries->get (callee_node) == NULL
   || ipa_fn_summaries->get (callee_node)->fp_expressions))
ret = false;

I wonder if we need to give targets the ability to compute IPA inline
summary bits, like "uses HTM" or "uses intrinsics for ISA X", to
selectively ignore bits that are not actually used.

And as always my stance on "always_inline" is that we should give thumbs
up to inlining and let the user shoot itself in the foot if he likes to.

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-26 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #10 from Richard Biener  ---
OPTION_MASK_P8_FUSION is purely optimization and shouldn't prevent inlining,
no?

As of HTM it would make the testcase a user error - when using -mcpu=power10 it
would require building with -mcpu=power8 -mno-htm?

Thus INVALID based on HTM, and the OPTION_MASK_P8_FUSION (and other "tuning"
things) should be excluded from the compares in the backend, at least for the
always-inline case.

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-25 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #9 from Kewen Lin  ---
One more reduced test case:

fail cmd: gcc -c -O2 -flto -mcpu=power8
pass cmd: gcc -c -O2 -flto -mcpu=power8 -mno-htm -mno-power8-fusion

--
__attribute__((always_inline)) int foo(int *b) {
  *b += 10;
  return *b;
}

#pragma GCC target "cpu=power10"
int bar(int* a){
  *a = foo(a);
  return 0;
}

--

I confirm the mismatch subset on HTM is expected, ISA 3.1 has removed the
transnational memory support.  As to OPTION_MASK_P8_FUSION, it's reasonable to
exclude it from ISA3.0 (power9), but I'm not sure if it's not a subset of
Power10 new fusion, from the current implementation it isn't.

Hi @Mike, could you help to confirm this?

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-25 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

Kewen Lin  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |linkw at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #8 from Kewen Lin  ---
Thanks for the investigation, @Martin. I'll take a look tomorrow.

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-25 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

Martin Liška  changed:

   What|Removed |Added

 CC||meissner at linux dot ibm.com

--- Comment #7 from Martin Liška  ---
About the non-LTO mode, the rs6000_can_inline_p succeeds due to:

  /* If the callee has no option attributes, then it is ok to inline.  */
  if (!callee_tree)
ret = true;

This is not correct in my opinion, please see i386.c:

static bool
ix86_can_inline_p (tree caller, tree callee)
{
...
  if (!callee_tree)
callee_tree = target_option_default_node;
  if (!caller_tree)
caller_tree = target_option_default_node;
...

So I think the following should be applied:

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index e073b26b430..4a997b0cabc 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -25036,9 +25036,8 @@ rs6000_can_inline_p (tree caller, tree callee)
   tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller);
   tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);

-  /* If the callee has no option attributes, then it is ok to inline.  */
   if (!callee_tree)
-ret = true;
+callee_tree = target_option_default_node;

   else
 {

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-25 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #6 from Martin Liška  ---
And the OPTION_MASK_P8_FUSION mask is set here:

  /* Enable power8 fusion if we are tuning for power8, even if we aren't
 generating power8 instructions.  Power9 does not optimize power8 fusion
 cases.  */
  if (!(rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION))
{
  if (processor_target_table[tune_index].processor == PROCESSOR_POWER8)
rs6000_isa_flags |= OPTION_MASK_P8_FUSION;
  else
rs6000_isa_flags &= ~OPTION_MASK_P8_FUSION;
}

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-25 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

Martin Liška  changed:

   What|Removed |Added

 CC||linkw at gcc dot gnu.org
 Status|ASSIGNED|NEW
   Assignee|marxin at gcc dot gnu.org  |unassigned at gcc dot 
gnu.org

--- Comment #5 from Martin Liška  ---
So what happens in LTO mode: rs6000_can_inline_p is called with:

#0  rs6000_can_inline_p (caller=,
callee=) at
/home/marxin/Programming/gcc/gcc/config/rs6000/rs6000.c:25035
#1  0x02529129 in can_inline_edge_p (e= -> )>, report=true, early=true) at
/home/marxin/Programming/gcc/gcc/ipa-inline.c:381
#2  0x0252a48a in can_early_inline_edge_p (e= -> )>) at
/home/marxin/Programming/gcc/gcc/ipa-inline.c:646
#3  0x0253135a in inline_always_inline_functions (node=) at
/home/marxin/Programming/gcc/gcc/ipa-inline.c:2838

The following condition is false:

  /* The callee's options must be a subset of the caller's options, i.e.
 a vsx function may inline an altivec function, but a no-vsx function
 must not inline a vsx function.  However, for those options that the
 callee has explicitly enabled or disabled, then we must enforce that
 the callee's and caller's options match exactly; see PR70010.  */


>25062 if (((caller_isa & callee_isa) == callee_isa)

because:

(gdb) p /x (caller_isa & callee_isa) ^ callee_isa
$3 = 0x22

which are the following masks:

#define OPTION_MASK_HTM (HOST_WIDE_INT_1U << 17)
#define OPTION_MASK_P8_FUSION (HOST_WIDE_INT_1U << 37)

So looking at the condition, callers options are not subset.

Which is caused by g:82800987cb3b22427a8799b3e8491eff496724b9

commit 82800987cb3b22427a8799b3e8491eff496724b9
Author: Kewen Lin 
Date:   Wed Dec 2 01:55:34 2020 -0600

rs6000: Disable HTM for Power10 and later by default

Power ISA 3.1 has dropped transactional memory support, this patch
is to disable HTM feature for power10 and later by default.

Bootstrapped/regtested on powerpc64le-linux-gnu P8 and P10.

gcc/ChangeLog:

* config/rs6000/rs6000.c (rs6000_option_override_internal):
Use OPTION_MASK_DIRECT_MOVE for Power8 target_enable instead
of OPTION_MASK_HTM.
* config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER):
Remove OPTION_MASK_HTM.
(RS6000_CPU): Add OPTION_MASK_HTM to power8, power9 and
powerpc64le entries.

@Kewen: Can you please take a look?

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-25 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #4 from Martin Liška  ---
Reduced test-case:

$ cat /tmp/basic_op.ii
enum { Unaligned, Aligned };
enum { ColMajor };
enum { ReadOnlyAccessors, DefaultProduct };
template  struct traits;
struct accessors_level {
  enum { has_direct_access, has_write_access };
};
template 
class Matrix;
template  class MatrixBase;
template  class NoAlias;
template  class Product;
template  class MapBase;
template  class Map;
template  struct dense_xpr_base {
  typedef MatrixBase type;
};
template  struct assign_op {};
struct DenseBase {
  typedef int Scalar;
  enum { Flags };
};
template  struct MatrixBase : DenseBase {
  NoAlias noalias();
};
template  class ProductImpl;
template 
struct Product : ProductImpl {
  _Lhs lhs();
  _Rhs rhs();
};
template  struct ProductImpl : dense_xpr_base::type {};
template  struct Assignment;
template 
void call_assignment_no_alias(Dst dst, Src src, Func func) {
  Assignment::run(dst, src, func);
}
template 
struct general_matrix_matrix_product;
template  class blas_data_mapper;
template 
struct blas_data_mapper {
  blas_data_mapper(Scalar *, Index, Index);
};
template  struct NoAlias {
  template  void operator=(OtherDerived) {
Product, 1>, Map, 1>>
__trans_tmp_4;
call_assignment_no_alias(m_expression, __trans_tmp_4,
 assign_op());
  }
  ExpressionType m_expression;
};
template 
struct traits> {
  typedef _Scalar Scalar;
};
template 
struct MapBase : dense_xpr_base::type {
  typedef typename traits::Scalar *PointerType;
};
template 
struct MapBase : MapBase {};
template 
struct traits>
: traits {};
template 
struct Map : MapBase, MapOptions>> {
  typedef Map Base;
  Map(typename Base::PointerType, long, long);
};
struct gebp_traits {
  enum { nr, mr };
};
template 
struct gebp_kernel;
long parallelize_gemm_cols;
template 
void parallelize_gemm(Functor func, Index, bool) {
  func(0, parallelize_gemm_cols);
}
template  struct generic_product_impl;
template 
struct Assignment, assign_op> {
  static void run(DstXprType dst, Product src, assign_op) {
Map __trans_tmp_5 = src.rhs();
generic_product_impl::evalTo(dst, src.lhs(), __trans_tmp_5);
  }
};
template 
struct general_matrix_matrix_product {
  typedef LhsScalar ResScalar;
  static void run(ResScalar *_res, ResScalar alpha) {
Index resStride, resIncr, actual_mc, actual_kc, actual_nc;
typedef blas_data_mapper ResMapper;
ResMapper res(_res, resStride, resIncr);
gebp_kernel
gebp;
LhsScalar blockA, blockB;
gebp(res, &blockA, &blockB, actual_mc, actual_kc, actual_nc, alpha);
  }
};
struct gemm_blocking_space;
struct gemm_functor {
  gemm_functor(Map, 1>,
   Map, 1>,
   Map, 1>, float, gemm_blocking_space);
  void operator()(int, int) {
general_matrix_matrix_product::run(
0, m_actualAlpha);
  }
  float m_actualAlpha;
};
struct gemm_blocking_space {
  gemm_blocking_space(long, long, long, long, bool);
};
int scaleAndAddTo___trans_tmp_3;
template  struct generic_product_impl {
  template  static void evalTo(Dst dst, Lhs lhs, Rhs rhs) {
scaleAndAddTo(dst, lhs, rhs);
  }
  template 
  static void scaleAndAddTo(Dest dst, Lhs a_lhs, Rhs a_rhs) {
Map lhs(a_lhs);
Map rhs(a_rhs);
typedef gemm_functor GemmFunctor;
gemm_blocking_space blocking(0, 0, 0, 1, true);
parallelize_gemm<0>(GemmFunctor(lhs, rhs, dst, 0, blocking),
scaleAndAddTo___trans_tmp_3, Dest::Flags);
  }
};
template  Packet bmask();
#pragma GCC target "cpu=power10"
template 
void gemmMMA(const DataMapper &, const float *, const float *, long, long,
long,
 float, long, long, long, long) {
  bmask();
}
#pragma GCC reset_options
struct quad_traits {
  enum { size, rows };
};
template  __attribute__((always_inline)) Packet bmask() {}
template 
struct gebp_kernel {
  void operator()(const DataMapper &, const float *, const float *, Index,
  Index, Index, float, Index = 1, Index = 1, Index = 0,
  Index = 0);
};
template 
void gebp_kernel::operator()(const DataMapper &, const float *,
   const float *, Index, Index, Index,
   float, Index, Index, Index, Index) {
  void (*gemm_function)(const DataMapper &, const float *, const float *,
Index,
Index, Index, float, Index, Index, Index, Index) =
  gemmMMA;
}
template  struct Data_ {
  float &operator[](long);
  Data_ *MatrixOp();
};
Product __trans_tmp_7;
template  Data_ *Data_::MatrixOp() {
  long NbCol0, NbRow1;
  Data_ res;
  Map m2(&res[0], NbCol0, NbRow1);
  m2.noalias() = __trans_tmp_7;
}
template class Data_;

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-25 Thread chip.kerchner at ibm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

Chip Kerchner  changed:

   What|Removed |Added

 CC||chip.kerchner at ibm dot com

--- Comment #3 from Chip Kerchner  ---
It also fails with version 10.3

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-25 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

Martin Liška  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |marxin at gcc dot 
gnu.org
 Status|NEW |ASSIGNED

--- Comment #2 from Martin Liška  ---
Interesting, let me take a look.

[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")

2021-08-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Keywords||lto
   Last reconfirmed||2021-08-25
  Known to fail||12.0
  Component|lto |ipa
 Status|UNCONFIRMED |NEW

--- Comment #1 from Richard Biener  ---
Interesting.  Note this happens at compile-time where -flto shouldn't have any
such effects ...

I happen to have a trunk cc1plus from some point around and that reproduces the
issue as well.