Re: [PATCH v5 3/5] p1689r5: initial support

2023-06-25 Thread Ben Boeckel via Gcc-patches
On Fri, Jun 23, 2023 at 14:31:17 -0400, Jason Merrill wrote:
> On 6/20/23 15:46, Ben Boeckel wrote:
> > On Tue, Feb 14, 2023 at 16:50:27 -0500, Jason Merrill wrote:
> >> On 1/25/23 13:06, Ben Boeckel wrote:
> 
> >>> Header units (including the standard library headers) are 100%
> >>> unsupported right now because the `-E` mechanism wants to import their
> >>> BMIs. A new mode (i.e., something more workable than existing `-E`
> >>> behavior) that mocks up header units as if they were imported purely
> >>> from their path and content would be required.
> >> >> I notice that the cpp dependency generation tries (in open_file_failed)
> >> to continue after encountering a missing file, is that not sufficient 
> >> for header units?  Or adjustable to be sufficient?
> > 
> > No. Header units can introduce macros which can be used to modify the
> > set of modules that are imported. Included headers are "discovered"
> > dependencies and don't modify the build graph (just add more files that
> > trigger a rebuild) and can be collected during compilation. Module
> > dependencies are needed to get the build correct in the first place in
> > order to:
> > 
> > - order module compilations in the build graph so that imported modules
> >   are ready before anything using them; and
> > - computing the set of flags needed for telling the compiler where
> >   imported modules' CMI files should be located.
> 
> So if the header unit CMI isn't available during dependency generation, 
> would it be better to just #include the header?

It's not so simple: the preprocessor state needs to isolate out
`LOCAL_ONLY` from this case:

```
#define LOCAL_ONLY 1
import ; // The preprocessing of this should *not* see
// `LOCAL_ONLY`.
```

> > Hmm. But `stdout` is probably fine to use for both though. Basically:
> > 
> >  if (fdeps_stream == out_stream && fdeps_stream != stdout)
> >make_diagnostic_noise ();
> 
> (fdeps_stream == deps_stream, but sure, that's reasonable.

Done.

> >> So, I take it this is the common use case you have in mind, generating
> >> Make dependencies for the p1689 file?  When are you thinking the Make
> >> dependencies for the .o are generated?  At build time?
> > 
> > Yes. If an included file changes, the scanning should be performed
> > again. The compilation will have its own `-MF` as well (which should
> > point to the same files plus the CMI files it ends up reading).
> > 
> >> I'm a bit surprised you're using .json rather than an extension that
> >> indicates what the information is.
> > 
> > I can change that; the filename doesn't *really* matter (e.g., CMake
> > uses `.ddi` for "dynamic dependency information").
> 
> That works.

Done.

> >>> `-M` is about discovered dependencies: those that you find out while
> >>> doing work. `-fdep-*` is about ordering dependencies: extracting
> >>> information from file content in order to even order future work around.
> >>
> >> I'm not sure I see the distinction; Makefiles also express ordering
> >> dependencies.  In both cases, you want to find out from the files what
> >> order you will want to process them in when building the project.
> > 
> > Makefiles can express ordering dependencies, but not the `-M` snippets;
> > these are for files that, if changed, should trigger a rebuild. This is > 
> > fundamentally different than module dependencies which instead indicate
> > which *compiles* (or CMI generation if using a 2-phase setup) need to
> > complete before compilation (or CMI generation) of the scanned TU can be
> > performed. Generally generated headers will be ordered manually in the
> > build system description. However, maintaining that same level for
> > in-source dependency information on a per-source level is a *far* higher
> > burden.
> 
> The main difference I see is that the CMI might not exist yet.  As you 
> say, we don't want to require people to write all the dependencies by 
> hand, but that just means we need to be able to generate the 
> dependencies automatically.  In the Make-only model I'm thinking of, one 
> would collect dependencies on an initial failing build, and then start 
> over from the beginning again with the dependencies we discovered.  It's 
> the same two-phase scan and build, but one that uses the same compile 
> commands for both phases.

It's a potentially unbounded set of phases:

- 2 phases per tool that is built that generates other module-using
  code for other tools:

- scan files for toolA
- build files for toolA
- scan files written by toolA (for toolB)
- build files written by toolA (for toolB)
- …

- if a referenced module does not exist, knowing when one is "done" is
  difficult (an import cycle would appear like this because while module
  X does exist, it depends on Y which itself claims a dependency on X).

*Something* needs to interpret the information being provided in the
`-fdeps-file=` and communicate back to the build tool what is going on.
This 

Re: [PATCH v5 3/5] p1689r5: initial support

2023-06-23 Thread Jason Merrill via Gcc-patches

On 6/20/23 15:46, Ben Boeckel wrote:

On Tue, Feb 14, 2023 at 16:50:27 -0500, Jason Merrill wrote:

On 1/25/23 13:06, Ben Boeckel wrote:



Header units (including the standard library headers) are 100%
unsupported right now because the `-E` mechanism wants to import their
BMIs. A new mode (i.e., something more workable than existing `-E`
behavior) that mocks up header units as if they were imported purely
from their path and content would be required.

>> I notice that the cpp dependency generation tries (in open_file_failed)
to continue after encountering a missing file, is that not sufficient 
for header units?  Or adjustable to be sufficient?


No. Header units can introduce macros which can be used to modify the
set of modules that are imported. Included headers are "discovered"
dependencies and don't modify the build graph (just add more files that
trigger a rebuild) and can be collected during compilation. Module
dependencies are needed to get the build correct in the first place in
order to:

- order module compilations in the build graph so that imported modules
  are ready before anything using them; and
- computing the set of flags needed for telling the compiler where
  imported modules' CMI files should be located.


So if the header unit CMI isn't available during dependency generation, 
would it be better to just #include the header?



+  if (cpp_opts->deps.format != DEPS_FMT_NONE)
+{
+  if (!fdeps_file)
+   fdeps_stream = out_stream;
+  else if (fdeps_file[0] == '-' && fdeps_file[1] == '\0')
+   fdeps_stream = stdout;


You probably want to check that deps_stream and fdeps_stream don't end
up as the same stream.


Hmm. But `stdout` is probably fine to use for both though. Basically:

 if (fdeps_stream == out_stream && fdeps_stream != stdout)
   make_diagnostic_noise ();


(fdeps_stream == deps_stream, but sure, that's reasonable.


So, I take it this is the common use case you have in mind, generating
Make dependencies for the p1689 file?  When are you thinking the Make
dependencies for the .o are generated?  At build time?


Yes. If an included file changes, the scanning should be performed
again. The compilation will have its own `-MF` as well (which should
point to the same files plus the CMI files it ends up reading).


I'm a bit surprised you're using .json rather than an extension that
indicates what the information is.


I can change that; the filename doesn't *really* matter (e.g., CMake
uses `.ddi` for "dynamic dependency information").


That works.


`-M` is about discovered dependencies: those that you find out while
doing work. `-fdep-*` is about ordering dependencies: extracting
information from file content in order to even order future work around.


I'm not sure I see the distinction; Makefiles also express ordering
dependencies.  In both cases, you want to find out from the files what
order you will want to process them in when building the project.


Makefiles can express ordering dependencies, but not the `-M` snippets;
these are for files that, if changed, should trigger a rebuild. This is > 
fundamentally different than module dependencies which instead indicate
which *compiles* (or CMI generation if using a 2-phase setup) need to
complete before compilation (or CMI generation) of the scanned TU can be
performed. Generally generated headers will be ordered manually in the
build system description. However, maintaining that same level for
in-source dependency information on a per-source level is a *far* higher
burden.


The main difference I see is that the CMI might not exist yet.  As you 
say, we don't want to require people to write all the dependencies by 
hand, but that just means we need to be able to generate the 
dependencies automatically.  In the Make-only model I'm thinking of, one 
would collect dependencies on an initial failing build, and then start 
over from the beginning again with the dependencies we discovered.  It's 
the same two-phase scan and build, but one that uses the same compile 
commands for both phases.


Anyway, this isn't an objection to this patch, just another model I also 
want to support.






Is there a reason not to use the gcc/json.h interface for JSON output?


This is `libcpp`; is that not a dependency cycle?


Ah, indeed.  We could move it to libiberty, but it would need 
significant adjustments to remove its dependencies on other stuff in 
gcc/.  So maybe just add a TODO comment about that, along with adding 
comments before the functions.


Jason



Re: [PATCH v5 3/5] p1689r5: initial support

2023-06-20 Thread Ben Boeckel via Gcc-patches
On Tue, Feb 14, 2023 at 16:50:27 -0500, Jason Merrill wrote:
> On 1/25/23 13:06, Ben Boeckel wrote:
> > - header-unit information fields
> > 
> > Header units (including the standard library headers) are 100%
> > unsupported right now because the `-E` mechanism wants to import their
> > BMIs. A new mode (i.e., something more workable than existing `-E`
> > behavior) that mocks up header units as if they were imported purely
> > from their path and content would be required.
> 
> I notice that the cpp dependency generation tries (in open_file_failed) 
> to continue after encountering a missing file, is that not sufficient 
> for header units?  Or adjustable to be sufficient?

No. Header units can introduce macros which can be used to modify the
set of modules that are imported. Included headers are "discovered"
dependencies and don't modify the build graph (just add more files that
trigger a rebuild) and can be collected during compilation. Module
dependencies are needed to get the build correct in the first place in
order to:

- order module compilations in the build graph so that imported modules
  are ready before anything using them; and
- computing the set of flags needed for telling the compiler where
  imported modules' CMI files should be located.

> > - non-utf8 paths
> > 
> > The current standard says that paths that are not unambiguously
> > represented using UTF-8 are not supported (because these cases are rare
> > and the extra complication is not worth it at this time). Future
> > versions of the format might have ways of encoding non-UTF-8 paths. For
> > now, this patch just doesn't support non-UTF-8 paths (ignoring the
> > "unambiguously represetable in UTF-8" case).
> 
> typo "representable"

Fixed.

> > diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
> > index c68a2a27469..1c14ce3fe8e 100644
> > --- a/gcc/c-family/c-opts.cc
> > +++ b/gcc/c-family/c-opts.cc
> > @@ -77,6 +77,9 @@ static bool verbose;
> >   /* Dependency output file.  */
> >   static const char *deps_file;
> >   
> > +/* Enhanced dependency output file.  */
> 
> Maybe "structured", as in the docs?  It isn't really a direct 
> enhancement of the makefile dependencies.

Agreed. I'll also add a link to p1689r5 as a comment for what
"structured" means where it is parsed out.

> > +  if (cpp_opts->deps.format != DEPS_FMT_NONE)
> > +{
> > +  if (!fdeps_file)
> > +   fdeps_stream = out_stream;
> > +  else if (fdeps_file[0] == '-' && fdeps_file[1] == '\0')
> > +   fdeps_stream = stdout;
> 
> You probably want to check that deps_stream and fdeps_stream don't end 
> up as the same stream.

Hmm. But `stdout` is probably fine to use for both though. Basically:

if (fdeps_stream == out_stream && fdeps_stream != stdout)
  make_diagnostic_noise ();

> > @@ -1374,6 +1410,8 @@ handle_deferred_opts (void)
> >   
> > if (opt->code == OPT_MT || opt->code == OPT_MQ)
> >   deps_add_target (deps, opt->arg, opt->code == OPT_MQ);
> > +   else if (opt->code == OPT_fdep_output_)
> > + deps_add_output (deps, opt->arg, true);
> 
> How about fdeps_add_target?

Renamed.

> > diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> > index ef371ca8c26..630781fdf8a 100644
> > --- a/gcc/c-family/c.opt
> > +++ b/gcc/c-family/c.opt
> > @@ -256,6 +256,18 @@ MT
> >   C ObjC C++ ObjC++ Joined Separate MissingArgError(missing makefile target 
> > after %qs)
> >   -MT   Add a target that does not require quoting.
> >   
> > +fdep-format=
> > +C ObjC C++ ObjC++ NoDriverArg Joined MissingArgError(missing format after 
> > %qs)
> > +Format for output dependency information.  Supported (\"p1689r5\").
> 
> I think we want "structured" here, as well.

Fixed.

> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index 06d77983e30..b61c3ebd3ec 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -2791,6 +2791,21 @@ is @option{-fpermitted-flt-eval-methods=c11}.  The 
> > default when in a GNU
> >   dialect (@option{-std=gnu11} or similar) is
> >   @option{-fpermitted-flt-eval-methods=ts-18661-3}.
> >   
> > +@item -fdep-file=@var{file}
> > +@opindex fdep-file
> > +Where to write structured dependency information.
> > +
> > +@item -fdep-format=@var{format}
> > +@opindex fdep-format
> > +The format to use for structured dependency information. @samp{p1689r5} is 
> > the
> > +only supported format right now.  Note that when this argument is 
> > specified, the
> > +output of @samp{-MF} is stripped of some information (namely C++ modules) 
> > so
> > +that it does not use extended makefile syntax not understood by most tools.
> > +
> > +@item -fdep-output=@var{file}
> > +@opindex fdep-output
> > +Analogous to @option{-MT} but for structured dependency information.
> 
> Please add more detail about how these are intended to be used.

Will do.

> > diff --git a/gcc/testsuite/g++.dg/modules/p1689-1.C 
> > b/gcc/testsuite/g++.dg/modules/p1689-1.C
> > new file mode 100644
> > index 000..245e30d09ce

Re: [PATCH v5 3/5] p1689r5: initial support

2023-06-20 Thread Ben Boeckel via Gcc-patches
On Mon, Jun 19, 2023 at 17:33:58 -0400, Jason Merrill wrote:
> On 5/12/23 10:24, Ben Boeckel wrote:
> > `file` can be omitted (the `output_stream` will be used then). I *think*
> > I see that adding:
> > 
> >  %{fdeps_file:-fdeps-file=%{!o:%b.ddi}%{o*:%.ddi%*}}
> 
> %{!fdeps-file: but yes.
> 
> > would at least do for `-fdeps-file` defaults? I don't know if there's a
> > reasonable default for `-fdeps-target=` though given that this command
> > line has no information about the object file that will be used (`-o` is
> > used for preprocessor output since we're leaning on `-E` here).
> 
> I would think it could default to %b.o?

I suppose that could work, yes.

> I had quite a few more comments on the v5 patch that you didn't respond 
> to here or address in the v6 patch; did your mail client hide them from you?

Oof. Sorry, I saw large chunks of quoting and apparently assumed the
rest was fine (I usually do aggressive trimming when doing that style of
review). I see them now. Will go through and include in v7.

--Ben


Re: [PATCH v5 3/5] p1689r5: initial support

2023-06-19 Thread Jason Merrill via Gcc-patches

On 5/12/23 10:24, Ben Boeckel wrote:

On Tue, Feb 14, 2023 at 16:50:27 -0500, Jason Merrill wrote:

I notice that the actual flags are all -fdep-*, though some of them are
-fdeps-* here, and the internal variables all seem to be fdeps_*.  I
lean toward harmonizing on "deps", I think.


Done.


I don't love the three separate options, but I suppose it's fine.  I'd
prefer "target" instead of "output".


Done.


It should be possible to omit both -file and -target and get reasonable
defaults, like the ones for -MD/-MQ in gcc.cc:cpp_unique_options.


`file` can be omitted (the `output_stream` will be used then). I *think*
I see that adding:

 %{fdeps_file:-fdeps-file=%{!o:%b.ddi}%{o*:%.ddi%*}}


%{!fdeps-file: but yes.


would at least do for `-fdeps-file` defaults? I don't know if there's a
reasonable default for `-fdeps-target=` though given that this command
line has no information about the object file that will be used (`-o` is
used for preprocessor output since we're leaning on `-E` here).


I would think it could default to %b.o?

I had quite a few more comments on the v5 patch that you didn't respond 
to here or address in the v6 patch; did your mail client hide them from you?


Jason



Re: [PATCH v5 3/5] p1689r5: initial support

2023-05-12 Thread Ben Boeckel via Gcc-patches
On Tue, Feb 14, 2023 at 16:50:27 -0500, Jason Merrill wrote:
> I notice that the actual flags are all -fdep-*, though some of them are 
> -fdeps-* here, and the internal variables all seem to be fdeps_*.  I 
> lean toward harmonizing on "deps", I think.

Done.

> I don't love the three separate options, but I suppose it's fine.  I'd 
> prefer "target" instead of "output".

Done.

> It should be possible to omit both -file and -target and get reasonable 
> defaults, like the ones for -MD/-MQ in gcc.cc:cpp_unique_options.

`file` can be omitted (the `output_stream` will be used then). I *think*
I see that adding:

%{fdeps_file:-fdeps-file=%{!o:%b.ddi}%{o*:%.ddi%*}}

would at least do for `-fdeps-file` defaults? I don't know if there's a
reasonable default for `-fdeps-target=` though given that this command
line has no information about the object file that will be used (`-o` is
used for preprocessor output since we're leaning on `-E` here).

--Ben


Re: [PATCH v5 3/5] p1689r5: initial support

2023-02-14 Thread Jason Merrill via Gcc-patches

On 1/25/23 13:06, Ben Boeckel wrote:

This patch implements support for [P1689R5][] to communicate to a build
system the C++20 module dependencies to build systems so that they may
build `.gcm` files in the proper order.


Thanks again.


Support is communicated through the following three new flags:

- `-fdeps-format=` specifies the format for the output. Currently named
   `p1689r5`.

- `-fdeps-file=` specifies the path to the file to write the format to.

- `-fdep-output=` specifies the `.o` that will be written for the TU
   that is scanned. This is required so that the build system can
   correlate the dependency output with the actual compilation that will
   occur.


I notice that the actual flags are all -fdep-*, though some of them are 
-fdeps-* here, and the internal variables all seem to be fdeps_*.  I 
lean toward harmonizing on "deps", I think.


I don't love the three separate options, but I suppose it's fine.  I'd 
prefer "target" instead of "output".


It should be possible to omit both -file and -target and get reasonable 
defaults, like the ones for -MD/-MQ in gcc.cc:cpp_unique_options.



CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0)
using an experimental feature selection (to allow for future usage
evolution without committing to how it works today). While it remains
experimental, docs may be found in CMake's documentation for
experimental features.

Future work may include using this format for Fortran module
dependencies as well, however this is still pending work.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
[cmake-experimental]: 
https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst

TODO:

- header-unit information fields

Header units (including the standard library headers) are 100%
unsupported right now because the `-E` mechanism wants to import their
BMIs. A new mode (i.e., something more workable than existing `-E`
behavior) that mocks up header units as if they were imported purely
from their path and content would be required.


I notice that the cpp dependency generation tries (in open_file_failed) 
to continue after encountering a missing file, is that not sufficient 
for header units?  Or adjustable to be sufficient?



- non-utf8 paths

The current standard says that paths that are not unambiguously
represented using UTF-8 are not supported (because these cases are rare
and the extra complication is not worth it at this time). Future
versions of the format might have ways of encoding non-UTF-8 paths. For
now, this patch just doesn't support non-UTF-8 paths (ignoring the
"unambiguously represetable in UTF-8" case).


typo "representable"


- figure out why junk gets placed at the end of the file

Sometimes it seems like the file gets a lot of `NUL` bytes appended to
it. It happens rarely and seems to be the result of some
`ftruncate`-style call which results in extra padding in the contents.
Noting it here as an observation at least.

libcpp/

* include/cpplib.h: Add cpp_deps_format enum.
(cpp_options): Add format field
(cpp_finish): Add dependency stream parameter.
* include/mkdeps.h (deps_add_module_target): Add new preprocessor
parameter used for C++ module tracking.
* init.cc (cpp_finish): Add new preprocessor parameter used for C++
module tracking.
* mkdeps.cc (mkdeps): Implement P1689R5 output.

gcc/

* doc/invoke.texi: Document -fdeps-format=, -fdep-file=, and
-fdep-output= flags.

gcc/c-family/

* c-opts.cc (c_common_handle_option): Add fdeps_file variable and
-fdeps-format=, -fdep-file=, and -fdep-output= parsing.
* c.opt: Add -fdeps-format=, -fdep-file=, and -fdep-output= flags.

gcc/cp/

* module.cc (preprocessed_module): Pass whether the module is
exported to dependency tracking.

gcc/testsuite/

* g++.dg/modules/depflags-f-MD.C: New test.
* g++.dg/modules/depflags-f.C: New test.
* g++.dg/modules/depflags-fi.C: New test.
* g++.dg/modules/depflags-fj-MD.C: New test.
* g++.dg/modules/depflags-fj.C: New test.
* g++.dg/modules/depflags-fjo-MD.C: New test.
* g++.dg/modules/depflags-fjo.C: New test.
* g++.dg/modules/depflags-fo-MD.C: New test.
* g++.dg/modules/depflags-fo.C: New test.
* g++.dg/modules/depflags-j-MD.C: New test.
* g++.dg/modules/depflags-j.C: New test.
* g++.dg/modules/depflags-jo-MD.C: New test.
* g++.dg/modules/depflags-jo.C: New test.
* g++.dg/modules/depflags-o-MD.C: New test.
* g++.dg/modules/depflags-o.C: New test.
* g++.dg/modules/p1689-1.C: New test.
* g++.dg/modules/p1689-1.exp.json: New test expectation.
* g++.dg/modules/p1689-2.C: New test.
* g++.dg/modules/p1689-2.exp.json: New test expectation.
* g++.dg/modules/p1689-3.C: New test.
* g++.dg/modules/p1689-3.exp.json: New test expectation.