[Bug gcov-profile/110082] Coverage analysis vs. offloading compilation

2023-06-02 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110082

--- Comment #6 from rguenther at suse dot de  ---
On Fri, 2 Jun 2023, tschwinge at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110082
> 
> --- Comment #5 from Thomas Schwinge  ---
> (In reply to Jakub Jelinek from comment #4)
> > (In reply to rguent...@suse.de from comment #3)
> > > I suppose you want to apply this generally, not only to offloaded
> > > functions and when offloading is enabled?
> > 
> > It could be done just for the functions that aren't host only, i.e.
> > the offloading kernels or declare target functions, what the offloading LTO
> > streams out.
> 
> Indeed my idea has been to apply this abstraction generally, without any
> conditionals on offloading constructs etc.  That's for reasons of
> maintainability: to not add any more diverging code paths, requiring special
> testing (now, and for future changes), and to lessen possibility of surprising
> behavior re the diverging code paths doing different things.  OK?

Yes, I think that's good.

[Bug gcov-profile/110082] Coverage analysis vs. offloading compilation

2023-06-02 Thread tschwinge at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110082

--- Comment #5 from Thomas Schwinge  ---
(In reply to Jakub Jelinek from comment #4)
> (In reply to rguent...@suse.de from comment #3)
> > I suppose you want to apply this generally, not only to offloaded
> > functions and when offloading is enabled?
> 
> It could be done just for the functions that aren't host only, i.e.
> the offloading kernels or declare target functions, what the offloading LTO
> streams out.

Indeed my idea has been to apply this abstraction generally, without any
conditionals on offloading constructs etc.  That's for reasons of
maintainability: to not add any more diverging code paths, requiring special
testing (now, and for future changes), and to lessen possibility of surprising
behavior re the diverging code paths doing different things.  OK?


Thanks for your input!

[Bug gcov-profile/110082] Coverage analysis vs. offloading compilation

2023-06-02 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110082

--- Comment #4 from Jakub Jelinek  ---
(In reply to rguent...@suse.de from comment #3)
> I suppose you want to apply this generally, not only to offloaded
> functions and when offloading is enabled?

It could be done just for the functions that aren't host only, i.e.
the offloading kernels or declare target functions, what the offloading LTO
streams out.

[Bug gcov-profile/110082] Coverage analysis vs. offloading compilation

2023-06-02 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110082

--- Comment #3 from rguenther at suse dot de  ---
On Fri, 2 Jun 2023, tschwinge at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110082
> 
> Thomas Schwinge changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>      Ever confirmed|0                           |1
>    Last reconfirmed|                            |2023-06-02
>              Status|UNCONFIRMED                 |NEW
> 
> --- Comment #2 from Thomas Schwinge  ---
> (In reply to Richard Biener from comment #1)
> > Note that when you do it as
> > proposed the code will appear as having no coverage (the counters will be
> > allocated at the host side but nothing will increment them).
> 
> ACK, our customer does understand this.
> 
> Do I infer correctly that "do it as proposed" seems fine to you?
> 
> (In reply to me from comment #0)
> > My idea is to abstract the "increment the edge execution count" operations
> > into some new GIMPLE/IFN code (?), and then later, once the offloading code
> > has been split off, lower it to the current form (host-side), or no-op
> > (device-side).  I'd appreciate a quick review if that approach makes sense?

Yes, I think this is a reasonable way to do this - I'll note there's
IPA pass analysis that might need adjustments to correctly capture
the semantics of the internal functions.

I suppose you want to apply this generally, not only to offloaded
functions and when offloading is enabled?

I briefly considered whether it's possible/useful to move profile
instrumentation to the main IPA _transform_ stage, but I guess
this would unnecessarily complicate the intricate web of things
there.  Profile read for -fprofile-use would then still need to
happen at the IPA analysis phase, so keeping meta-data between the
compile and LTRANS phases in sync to make that work out nicely
would be another challenge.

[Bug gcov-profile/110082] Coverage analysis vs. offloading compilation

2023-06-02 Thread tschwinge at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110082

Thomas Schwinge changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-06-02
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Thomas Schwinge  ---
(In reply to Richard Biener from comment #1)
> Note that when you do it as
> proposed the code will appear as having no coverage (the counters will be
> allocated at the host side but nothing will increment them).

ACK, our customer does understand this.

Do I infer correctly that "do it as proposed" seems fine to you?

(In reply to me from comment #0)
> My idea is to abstract the "increment the edge execution count" operations
> into some new GIMPLE/IFN code (?), and then later, once the offloading code
> has been split off, lower it to the current form (host-side), or no-op
> (device-side).  I'd appreciate a quick review if that approach makes sense?
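
The approach quoted above can be illustrated with a toy model.  This is
only a sketch under stated assumptions, not GCC code: the statement
kinds, the EdgeCountInc operation (standing in for the proposed new
GIMPLE/IFN code), and the functions lower_edge_counts and run are all
hypothetical names invented for illustration.  It shows the abstract
increment being lowered to a concrete counter update on the host and
dropped on the device.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy statement kinds.  EdgeCountInc is a hypothetical stand-in for the
// proposed abstract "increment the edge execution count" operation; it
// is not an existing GCC internal function.
enum class StmtKind { Work, EdgeCountInc, CounterAdd };

struct Stmt {
  StmtKind kind;
  int counter;  // counter index; -1 for statements that touch no counter
};

enum class Target { Host, Device };

// Lowering step, run after host and device code have been split:
// the host gets the concrete counter update, the device gets nothing.
std::vector<Stmt> lower_edge_counts(const std::vector<Stmt> &body, Target t) {
  std::vector<Stmt> out;
  for (const Stmt &s : body) {
    if (s.kind != StmtKind::EdgeCountInc) {
      out.push_back(s);  // unrelated statements pass through unchanged
      continue;
    }
    if (t == Target::Host)
      out.push_back({StmtKind::CounterAdd, s.counter});
    // Device: the abstract increment is lowered to a no-op (dropped).
  }
  return out;
}

// Minimal interpreter: only lowered CounterAdd statements bump counters.
void run(const std::vector<Stmt> &body, std::vector<std::uint64_t> &counters) {
  for (const Stmt &s : body) {
    if (s.kind != StmtKind::CounterAdd)
      continue;
    assert(s.counter >= 0 &&
           static_cast<std::size_t>(s.counter) < counters.size());
    ++counters[static_cast<std::size_t>(s.counter)];
  }
}
```

In this model, running the device-lowered body leaves every counter at
zero, which matches the remark in comment #1 that the counters are
allocated on the host side but nothing increments them for offloaded
code.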

[Bug gcov-profile/110082] Coverage analysis vs. offloading compilation

2023-06-02 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110082

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sebastian.huber@embedded-brains.de

--- Comment #1 from Richard Biener  ---
Sebastian was also working in this area.  Note that when you do it as proposed
the code will appear as having no coverage (the counters will be allocated at
the host side but nothing will increment them).

I suppose the very same issue exists for -fprofile-generate/-fprofile-use,
where this will cause the offload code to be optimized for size because
it's cold (unless you use -fprofile-partial-training)?