Re: On the subject of module consumer diagnostics.
On Tue, Sep 03, 2024 at 03:30:05PM +0100, Iain Sandoe wrote: > > > > On 3 Sep 2024, at 13:59, Nathaniel Shead wrote: > > > > On Tue, Sep 03, 2024 at 10:14:29AM +0100, Iain Sandoe wrote: > >> Hi Folks, > >> > >> When we build a C++ binary module (CMI/BMI), we obviously have access to > >> its source to produce diagnostics, all fine. > >> > >> However, when we consume that module we might also need access to the > >> sources used to build it - since diagnostics triggered in the consumer can > >> refer back to the sources used. > >> > >> Currently clang has been experimenting with embedding the sources into the > >> BMI - this can make things seem more efficient when, for example, > >> distributing BMIs to remote nodes in a large-scale distributed build. > >> > >> There was a patch proposed to make this the default for clang, which has > >> resulted in the discussion here: > >> > >> https://discourse.llvm.org/t/rfc-modules-should-we-embed-sources-to-the-bmi/81029 > >> > >> Does GCC have a plan to deal with this? > >> .. if so can we communicate it.. > >> .. if not, then what do we think about this strategy? > >> .. at the very least trying to avoid tooling divergence seems a worthwhile > >> objective. > >> > >> thanks > >> Iain > >> > > > > I've thought a little bit about this in the past but I don't currently > > think embedding source files by default would be a great idea. > > > > For GCC, the main benefit of embedding the original source would be that > > currently, if you compile a module interface, then edit the original > > source file before compiling a TU that consumes that source file, > > diagnostics referring to the original module interface may appear > > "incorrect" and quote unrelated code or point to incorrect lines, which > > can potentially be confusing: > > > > x.cpp: In function ‘int main()’: > > x.cpp:4:7: error: invalid conversion from ‘const char*’ to ‘int’ > > [-fpermissive] > >4 | foo("hello"); > > | ^~~ > > | | > > | const char* > > In module M, imported at x.cpp:1: > > m.cpp:2:17: note: initialising argument 1 of ‘void foo@M(int)’ > >2 | hello this file is edited > > | ^~~ > > I would expect, in most build systems, that any change in the BMI sources > would cause a rebuild of the BMI itself - anything else is not actually safe? > Right; this mostly only comes up if you're compiling 'by hand' and forget to recompile a dependent module, which as you say is not safe to start with, hence why I don't think it's much of an issue. I mostly bring this up because it can be quite confusing if it does happen. > > This is the same issue you might get with listing code using GDB on a > > binary that is older than the source files it was built with; GDB > > currently provides a nice warning when this happens that the source file > > may be out of date, that I imagine could be helpful in this circumstance > > as well. But this would not require embedding any source files, just > > comparing the mtime of the CMI vs. the source and/or header file in > > question. > > > > Either way I don't think this is much of an issue in practice: I would > > expect users to typically be using build systems that will ensure this > > above situation never occurs to begin with, and even in cases where the > > source files are completely unavailable diagnostics still work fine as > > it is, just without quoting of the source code (though line:column > > information is still available). > > > > And on the flip side, embedding source files could grow the CMIs quite a > > lot; naively just embedding all source files for something like > > > > module; > > #include > > export module M; > > export std::string foo(); > > > > would presumably also require embedding the entire contents of > > and all headers that it depends on, recursively. And this is quite > > apart from the potential questions around IP mentioned in the thread you > > linked. > > There is some (anecdotal) evidence that for current AST representation on > clang, at least, that the BMI size does not grow much. See discourse > discussion > for cross references. > With the above (admittedly somewhat contrived) example, the CMI that GCC emits is currently 2381 KB, and storing the (uncompressed) contents of with '-E -fdirectives-only' as a guesstimate for embedding would add 1736 KB, for an ~70% increase in size. Obviously as the proportion of "useful work" in a module interface grows in comparison to the amount of code included via headers this ratio would improve. That said, this would not be an issue in this particular case once 'import std' becomes available, and using a header unit would not have this issue either (only the header unit CMI would need to embed these source files rather than every CMI including the header), so perhaps this is not a large problem? Yours, Nathaniel > cheers > Iain > > > > > Yours, > > Nathaniel >
Re: On the subject of module consumer diagnostics.
On Tue, Sep 03, 2024 at 10:14:29AM +0100, Iain Sandoe wrote: > Hi Folks, > > When we build a C++ binary module (CMI/BMI), we obviously have access to its > source to produce diagnostics, all fine. > > However, when we consume that module we might also need access to the sources > used to build it - since diagnostics triggered in the consumer can refer back > to the sources used. > > Currently clang has been experimenting with embedding the sources into the > BMI - this can make things seem more efficient when, for example, > distributing BMIs to remote nodes in a large-scale distributed build. > > There was a patch proposed to make this the default for clang, which has > resulted in the discussion here: > > https://discourse.llvm.org/t/rfc-modules-should-we-embed-sources-to-the-bmi/81029 > > Does GCC have a plan to deal with this? > .. if so can we communicate it.. > .. if not, then what do we think about this strategy? > .. at the very least trying to avoid tooling divergence seems a worthwhile > objective. > > thanks > Iain > I've thought a little bit about this in the past but I don't currently think embedding source files by default would be a great idea. For GCC, the main benefit of embedding the original source would be that currently, if you compile a module interface, then edit the original source file before compiling a TU that consumes that source file, diagnostics referring to the original module interface may appear "incorrect" and quote unrelated code or point to incorrect lines, which can potentially be confusing: x.cpp: In function ‘int main()’: x.cpp:4:7: error: invalid conversion from ‘const char*’ to ‘int’ [-fpermissive] 4 | foo("hello"); | ^~~ | | | const char* In module M, imported at x.cpp:1: m.cpp:2:17: note: initialising argument 1 of ‘void foo@M(int)’ 2 | hello this file is edited | ^~~ This is the same issue you might get with listing code using GDB on a binary that is older than the source files it was built with; GDB currently provides a nice warning when this happens that the source file may be out of date, that I imagine could be helpful in this circumstance as well. But this would not require embedding any source files, just comparing the mtime of the CMI vs. the source and/or header file in question. Either way I don't think this is much of an issue in practice: I would expect users to typically be using build systems that will ensure this above situation never occurs to begin with, and even in cases where the source files are completely unavailable diagnostics still work fine as it is, just without quoting of the source code (though line:column information is still available). And on the flip side, embedding source files could grow the CMIs quite a lot; naively just embedding all source files for something like module; #include export module M; export std::string foo(); would presumably also require embedding the entire contents of and all headers that it depends on, recursively. And this is quite apart from the potential questions around IP mentioned in the thread you linked. Yours, Nathaniel
Re: Improvement of CLOBBER descriptions
On Wed, Feb 21, 2024 at 03:02:55PM +0400, Daniil Frolov wrote: > Hi. > > Following the recent introduction of more detailed CLOBBER types in GCC, a > minor > inconsistency has been identified in the description of > CLOBBER_OBJECT_BEGIN: > > /* Beginning of object lifetime, e.g. C++ constructor. */ > CLOBBER_OBJECT_BEGIN > > This comment appears somewhat contradictory, as according to the C++ > standard, > an object's lifetime commences upon completion of its initialization: > > > The lifetime of an object of type T begins when: > > -- storage with the proper alignment and size for type T is obtained, > > and > > -- its initialization (if any) is complete (including vacuous > > initialization) > etc. > > However, GCC emits CLOBBERs of this type at the beginning of a constructor. > And similarly for CLOBBER_OBJECT_END; by the standard an object ends its lifetime at the start of the destructor, while we emit this at the end of the destructor call. However, both placements are useful. At least for constexpr, the current meaning of CLOBBER_OBJECT_END is important to know when we can still access members of a class undergoing construction or destruction (see [class.cdtor] p1) since this isn't directly tied to the lifetime of the containing object itself. But I'm still working for GCC 15 on correctly handling "partially constructed" objects in constexpr (e.g. PR109518), and having a version of CLOBBER_OBJECT_BEGIN at the /end/ of the constructor to signal this might be useful as due to splitting of non-constant initialisers this is otherwise a little awkward to determine from the IR currently. > Does anybody have any ideas how to make this description more precise? It'd > be > better to clearly define what an object's lifetime is at the GIMPLE IMHO. So, given the above, I think CLOBBER_OBJECT_BEGIN is still a pretty good name (though maybe not perfect?), but maybe the comment should say something like "Beginning of object construction"? Or otherwise mention that this is the start of the lifetime of either the object or its first subobject. Yours, Nathaniel. > --- > With best regards, > Daniil