Re: On the subject of module consumer diagnostics.

2024-09-03 Thread Nathaniel Shead via Gcc
On Tue, Sep 03, 2024 at 03:30:05PM +0100, Iain Sandoe wrote:
> 
> 
> > On 3 Sep 2024, at 13:59, Nathaniel Shead  wrote:
> > 
> > On Tue, Sep 03, 2024 at 10:14:29AM +0100, Iain Sandoe wrote:
> >> Hi Folks,
> >> 
> >> When we build a C++ binary module (CMI/BMI), we obviously have access to 
> >> its source to produce diagnostics, all fine.
> >> 
> >> However, when we consume that module we might also need access to the 
> >> sources used to build it - since diagnostics triggered in the consumer can 
> >> refer back to the sources used.
> >> 
> >> Currently clang has been experimenting with embedding the sources into the 
> >> BMI - this can make things seem more efficient when, for example, 
> >> distributing BMIs to remote nodes in a large-scale distributed build.
> >> 
> >> There was a patch proposed to make this the default for clang, which has 
> >> resulted in the discussion here:
> >> 
> >> https://discourse.llvm.org/t/rfc-modules-should-we-embed-sources-to-the-bmi/81029
> >> 
> >> Does GCC have a plan to deal with this?
> >> .. if so can we communicate it..
> >> .. if not, then what do we think about this strategy?
> >> .. at the very least trying to avoid tooling divergence seems a worthwhile 
> >> objective.
> >> 
> >> thanks
> >> Iain
> >> 
> > 
> > I've thought a little bit about this in the past but I don't currently
> > think embedding source files by default would be a great idea.
> > 
> > For GCC, the main benefit of embedding the original source would be that
> > currently, if you compile a module interface, then edit the original
> > source file before compiling a TU that consumes that source file,
> > diagnostics referring to the original module interface may appear
> > "incorrect" and quote unrelated code or point to incorrect lines, which
> > can potentially be confusing:
> > 
> > x.cpp: In function ‘int main()’:
> > x.cpp:4:7: error: invalid conversion from ‘const char*’ to ‘int’ 
> > [-fpermissive]
> >4 |   foo("hello");
> >  |   ^~~
> >  |   |
> >  |   const char*
> > In module M, imported at x.cpp:1:
> > m.cpp:2:17: note:   initialising argument 1 of ‘void foo@M(int)’
> >2 | hello this file is edited
> >  | ^~~
> 
> I would expect, in most build systems, that any change in the BMI sources
> would cause a rebuild of the BMI itself - anything else is not actually safe?
> 

Right; this mostly only comes up if you're compiling 'by hand' and
forget to recompile a dependent module, which as you say is not safe to
start with, hence why I don't think it's much of an issue.  I mostly
bring this up because it can be quite confusing if it does happen.

> > This is the same issue you might get with listing code using GDB on a
> > binary that is older than the source files it was built with; GDB
> > currently provides a nice warning when this happens that the source file
> > may be out of date, that I imagine could be helpful in this circumstance
> > as well.  But this would not require embedding any source files, just
> > comparing the mtime of the CMI vs. the source and/or header file in
> > question.
> > 
> > Either way I don't think this is much of an issue in practice: I would
> > expect users to typically be using build systems that will ensure this
> > above situation never occurs to begin with, and even in cases where the
> > source files are completely unavailable diagnostics still work fine as
> > it is, just without quoting of the source code (though line:column
> > information is still available).
> > 
> > And on the flip side, embedding source files could grow the CMIs quite a
> > lot; naively just embedding all source files for something like
> > 
> >  module;
> >  #include 
> >  export module M;
> >  export std::string foo();
> > 
> > would presumably also require embedding the entire contents of 
> > and all headers that it depends on, recursively.  And this is quite
> > apart from the potential questions around IP mentioned in the thread you
> > linked.
> 
> There is some (anecdotal) evidence that for current AST representation on
> clang, at least, that the BMI size does not grow much.  See discourse 
> discussion
> for cross references.
> 

With the above (admittedly somewhat contrived) example, the CMI that GCC
emits is currently 2381 KB, and storing the (uncompressed) contents of
 with '-E -fdirectives-only' as a guesstimate for embedding
would add 1736 KB, for an ~70% increase in size.  Obviously as the
proportion of "useful work" in a module interface grows in comparison to
the amount of code included via headers this ratio would improve.

That said, this would not be an issue in this particular case once
'import std' becomes available, and using a header unit would not have
this issue either (only the header unit CMI would need to embed these
source files rather than every CMI including the header), so perhaps
this is not a large problem?

Yours,
Nathaniel

> cheers
> Iain
> 
> > 
> > Yours,
> > Nathaniel
> 


Re: On the subject of module consumer diagnostics.

2024-09-03 Thread Nathaniel Shead via Gcc
On Tue, Sep 03, 2024 at 10:14:29AM +0100, Iain Sandoe wrote:
> Hi Folks,
> 
> When we build a C++ binary module (CMI/BMI), we obviously have access to its 
> source to produce diagnostics, all fine.
> 
> However, when we consume that module we might also need access to the sources 
> used to build it - since diagnostics triggered in the consumer can refer back 
> to the sources used.
> 
> Currently clang has been experimenting with embedding the sources into the 
> BMI - this can make things seem more efficient when, for example, 
> distributing BMIs to remote nodes in a large-scale distributed build.
> 
> There was a patch proposed to make this the default for clang, which has 
> resulted in the discussion here:
> 
> https://discourse.llvm.org/t/rfc-modules-should-we-embed-sources-to-the-bmi/81029
> 
> Does GCC have a plan to deal with this?
> .. if so can we communicate it..
> .. if not, then what do we think about this strategy?
> .. at the very least trying to avoid tooling divergence seems a worthwhile 
> objective.
> 
> thanks
> Iain
> 

I've thought a little bit about this in the past but I don't currently
think embedding source files by default would be a great idea.

For GCC, the main benefit of embedding the original source would be that
currently, if you compile a module interface, then edit the original
source file before compiling a TU that consumes that source file,
diagnostics referring to the original module interface may appear
"incorrect" and quote unrelated code or point to incorrect lines, which
can potentially be confusing:

x.cpp: In function ‘int main()’:
x.cpp:4:7: error: invalid conversion from ‘const char*’ to ‘int’ [-fpermissive]
4 |   foo("hello");
  |   ^~~
  |   |
  |   const char*
In module M, imported at x.cpp:1:
m.cpp:2:17: note:   initialising argument 1 of ‘void foo@M(int)’
2 | hello this file is edited
  | ^~~

This is the same issue you might get with listing code using GDB on a
binary that is older than the source files it was built with; GDB
currently provides a nice warning when this happens that the source file
may be out of date, that I imagine could be helpful in this circumstance
as well.  But this would not require embedding any source files, just
comparing the mtime of the CMI vs. the source and/or header file in
question.

Either way I don't think this is much of an issue in practice: I would
expect users to typically be using build systems that will ensure this
above situation never occurs to begin with, and even in cases where the
source files are completely unavailable diagnostics still work fine as
it is, just without quoting of the source code (though line:column
information is still available).

And on the flip side, embedding source files could grow the CMIs quite a
lot; naively just embedding all source files for something like

  module;
  #include 
  export module M;
  export std::string foo();

would presumably also require embedding the entire contents of 
and all headers that it depends on, recursively.  And this is quite
apart from the potential questions around IP mentioned in the thread you
linked.

Yours,
Nathaniel


Re: Improvement of CLOBBER descriptions

2024-02-21 Thread Nathaniel Shead via Gcc
On Wed, Feb 21, 2024 at 03:02:55PM +0400, Daniil Frolov wrote:
> Hi.
> 
> Following the recent introduction of more detailed CLOBBER types in GCC, a
> minor
> inconsistency has been identified in the description of
> CLOBBER_OBJECT_BEGIN:
> 
>   /* Beginning of object lifetime, e.g. C++ constructor.  */
>   CLOBBER_OBJECT_BEGIN
> 
> This comment appears somewhat contradictory, as according to the C++
> standard,
> an object's lifetime commences upon completion of its initialization:
> 
> > The lifetime of an object of type T begins when:
> >   -- storage with the proper alignment and size for type T is obtained,
> > and
> >   -- its initialization (if any) is complete (including vacuous
> > initialization)
> etc.
> 
> However, GCC emits CLOBBERs of this type at the beginning of a constructor.
> 

And similarly for CLOBBER_OBJECT_END; by the standard an object ends its
lifetime at the start of the destructor, while we emit this at the end
of the destructor call.

However, both placements are useful. At least for constexpr, the current
meaning of CLOBBER_OBJECT_END is important to know when we can still
access members of a class undergoing construction or destruction (see
[class.cdtor] p1) since this isn't directly tied to the lifetime of the
containing object itself.

But I'm still working for GCC 15 on correctly handling "partially
constructed" objects in constexpr (e.g. PR109518), and having a version
of CLOBBER_OBJECT_BEGIN at the /end/ of the constructor to signal this
might be useful as due to splitting of non-constant initialisers this is
otherwise a little awkward to determine from the IR currently.

> Does anybody have any ideas how to make this description more precise?  It'd
> be
> better to clearly define what an object's lifetime is at the GIMPLE IMHO.

So, given the above, I think CLOBBER_OBJECT_BEGIN is still a pretty good
name (though maybe not perfect?), but maybe the comment should say
something like "Beginning of object construction"? Or otherwise mention
that this is the start of the lifetime of either the object or its first
subobject.

Yours,
Nathaniel.

> ---
> With best regards,
> Daniil