[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-09-15 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

I sent the next patch at: https://github.com/llvm/llvm-project/pull/66462


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-08-31 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

Got it. Thanks for your explanation. I can continue my experiment : )


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-08-31 Thread Sam McCall via Phabricator via cfe-commits
sammccall added a comment.

In D153114#4630318 , @ChuanqiXu wrote:

> @sammccall @nridge while I am looking for the initial support for modules in 
> clangd, I failed to find the mechanism to update files after I update a 
> header file.
>
> e.g., when I am opening the following file:
>
>   // a.cc
>   #include "a.h"
>   ...
>
> and there is a concurrent update to `a.h`. How can the ASTWorker of `a.cc` 
> know such changes so that it can update the corresponding Preamble of `a.cc`?

The PrecompiledPreamble structure (defined in clang, not clangd) consists of a 
handle to the `*.pch` file, and also a list of filenames that it was built 
from. We can test whether it's out of date by `stat()`ing the input files 
(`PrecompiledPreamble::CanReuse`).

Once upon a time, clangd used this in a simple way: the ASTWorker always called 

 [`clangd::buildPreamble(inputs, old 
preamble)`](https://github.com/llvm/llvm-project/blob/release/10.x/clang-tools-extra/clangd/Preamble.cpp#L101-L107)
 which would just return the old one if it was still valid.

For a while now we've had async preambles which are more complicated but use 
the same underlying mechanism. Each file has an ASTWorker thread and a 
PreambleThread. When the ASTWorker thread wants to reuse preamble, it notifies 
the PreambleThread "hey, maybe rebuild the preamble?" but meanwhile it charges 
on using the old stale preamble. The preamble asynchronously performs the 
validity check 
,
 rebuilds the preamble if needed, and eventually informs the ASTWorker which 
ensures the up-to-date preamble is eventually used.

This is a "pull" system: we only check if the preamble is valid when we tried 
to use it, i.e. when the main-file changed. If you just touch a header on disk 
but do nothing else, we won't rebuild either the main file or the preamble.

> In the comments of `ClangdServer::reparseOpenFilesIfNeeded()`, I see:
>
>> /// Requests a reparse of currently opened files using their latest source.
>> /// This will typically only rebuild if something other than the source has
>> /// changed (e.g. the CDB yields different flags, or **files included in the
>> /// preamble have been modified**).

Because the above is a pull system, editing a header doesn't update e.g. 
diagnostics in files that include that header.
So this is a workaround: it requests all files to be rebuilt from current 
sources, so we pull on all preambles.
Then at every layer we try to ensure this does no work if nothing has changed 
:-)

In practice we call this in response to didSave (user edited+saved a header in 
their editor), we could potentially call it in response to changes on disk as 
Nathan suggested, and I think we have a custom integration somewhere that calls 
it when we know externally that compile flags have changed.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-08-31 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

In D153114#4630408 , @nridge wrote:

> In D153114#4630318 , @ChuanqiXu 
> wrote:
>
>> However, I can't search the caller of `reparseOpenFilesIfNeeded` which 
>> semantics matches the behavior. The two callers of  
>> `reparseOpenFilesIfNeeded` I found are 
>> `ClangdLSPServer::applyConfiguration()` and 
>> `ClangdLSPServer::onDocumentDidSave()` and neither of them matches 
>> description `files included in the  preamble have been modified`.
>>
>> So I want to ask what's the behavior when I update a header and where is the 
>> corresponding code. Thanks.
>
> I'm afraid `onDocumentDidSave()` is all we have for now. It detects changes 
> to the header when editing the header in the client (when the header is 
> saved). I don't believe we have a mechanism for detecting changes to the 
> header made in other ways.

IIUC, when we open `a.cpp` and `b.h` (there is no relationship between them), 
and we edit and save `b.h`, then the AST Worker of `a.cpp` will receive a 
request to update `a.cpp`. Did I understand correct? I imaged there may be an 
early exit point in the path of `ASTWorker::update` or `PreambleThread::update` 
if we detects the preamble doesn't change actually. But I failed to find it.

> If/when we want to add such a mechanism, I think the way to do it is using 
> didChangeWatchedFiles (there is some discussion there about why LSP 
> recommends servers delegate file-watching to the client rather than 
> implementing file-watching in the server).

Got it. And I am wondering the reason we didn't implement it may be that it is 
not so bad actually. Since a user generally won't open too many tabs. Do I 
understand right? And if it is the case, maybe we need to look at it in the 
future since it may be a concern with modules.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-08-31 Thread Nathan Ridge via Phabricator via cfe-commits
nridge added a comment.

In D153114#4630318 , @ChuanqiXu wrote:

> However, I can't search the caller of `reparseOpenFilesIfNeeded` which 
> semantics matches the behavior. The two callers of  
> `reparseOpenFilesIfNeeded` I found are 
> `ClangdLSPServer::applyConfiguration()` and 
> `ClangdLSPServer::onDocumentDidSave()` and neither of them matches 
> description `files included in the  preamble have been modified`.
>
> So I want to ask what's the behavior when I update a header and where is the 
> corresponding code. Thanks.

I'm afraid `onDocumentDidSave()` is all we have for now. It detects changes to 
the header when editing the header in the client (when the header is saved). I 
don't believe we have a mechanism for detecting changes to the header made in 
other ways.

If/when we want to add such a mechanism, I think the way to do it is using 
didChangeWatchedFiles 

 (there is some discussion there about why LSP recommends servers delegate 
file-watching to the client rather than implementing file-watching in the 
server).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-08-31 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

@sammccall @nridge while I am looking for the initial support for modules in 
clangd, I failed to find the mechanism to update files after I update a header 
file.

e.g., when I am opening the following file:

  // a.cc
  #include "a.h"
  ...

and there is a concurrent update to `a.h`. How can the ASTWorker of `a.cc` know 
such changes so that it can update the corresponding Preamble of `a.cc`?

In the comments of `ClangdServer::reparseOpenFilesIfNeeded()`, I see:

> /// Requests a reparse of currently opened files using their latest source.
> /// This will typically only rebuild if something other than the source has
> /// changed (e.g. the CDB yields different flags, or **files included in the
> /// preamble have been modified**).

So I thought this is what I want. However, I can't search the caller of 
`reparseOpenFilesIfNeeded` which semantics matches the behavior. The two 
callers of  `reparseOpenFilesIfNeeded` I found are 
`ClangdLSPServer::applyConfiguration()` and 
`ClangdLSPServer::onDocumentDidSave()` and neither of them matches description 
`files included in the  preamble have been modified`.

So I want to ask what's the behavior when I update a header and where is the 
corresponding code. Thanks.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-08-24 Thread Sam McCall via Phabricator via cfe-commits
sammccall added a comment.

Aside: I've been doing some investigation into how modules+clangd could work in 
our huge monorepo (specifically bazel + distributed build cluster).
It looks feasible (with some serious effort) to get all BMI/index/etc data we 
need for transitive modules to be generated by a copy of clangd running in the 
distributed build system itself, and likely this will perform better than 
anything involving building in-process.
So it seems plausible to add a coarse-grained extension point for this, and 
assume that the "find-and-build-modules" part we're adding now doesn't have to 
scale to that size. (We'd still like it to scale to projects the size of 
Chromium, which is roughly 10x LLVM and our usual proxy for "large local 
project"). I think we can make some simplifying assumptions even beyond 
experimental stage:

- the module graph will fit in memory
- the compilation database will fit in memory
- the set of files in the project is enumerable
- reading all the source files in the project is possible: takes minutes but 
not hours

In D153114#4612843 , @ChuanqiXu wrote:

> @sammccall here is a question (or double check) about the intended initial 
> version.
>
> This is the requirement for the initial version:
>
>> Don't attempt any cross-file or cross-version coordination: i.e. don't try 
>> to reuse BMIs between different files, don't try to reuse BMIs between 
>> (preamble) reparses of the same file, don't try to persist the module graph. 
>> Instead, when building a preamble, synchronously scan for the module graph, 
>> build the required PCMs on the single preamble thread with filenames private 
>> to that preamble, and then proceed to build the preamble.
>
> And all of us agree that it will be the job of the preamble to manage the 
> module files in the end of the day. I just want to ask or double check that 
> it might be OK to not related the module files with preamble in the initial 
> version, right?
>
> Since the definition (or description) of preamble is:
>
>> A preamble can be reused between multiple versions of the file until 
>> invalidated by a modification to a header, compile commands or modification 
>> to relevant part of the current file.
>
> And it looks like it beyonds the scope of the our requirement of the initial 
> patch.

On the one hand, if building modules in the AST works and is simpler, sure we 
can delay the "preamble optimization" for them.

But I don't really expect this is the case:

- as we've established, you can reach `import` statements inside the preamble 
region (`#include` a textual header that itself contains an `import`). So you 
need to build at least some of the required BMIs before building the preamble, 
doing it when parsing the main AST is not enough. (I believe the current 
version of the patch will just fail to parse some code in the preamble in this 
case).
- rebuilding BMIs with every preamble (not trying to reuse/cache/version them 
yet) will be pretty slow. But if we rebuild them every main-file reparse (i.e. 
every keystroke!) I think this is going to be intolerably slow to the point of 
not even being useful as a prototype.
- I only see two things that are really different between building during 
preamble vs building during AST, and neither are a lot of work:
  - invalidation: we need the preamble to be invalid if imports outside the 
preamble region have changed, or if the inputs of modules themselves have 
changed. But I think we can punt on all of this for now, and just not rebuild 
the preamble sometimes when we should. (Workaround: manually touch the preamble 
region after adding imports)
  - we need to attach the modules to PreambleData, rather than having them 
local to `ParsedAST::build`. But this really does not seem like much work at 
all - as we're not (yet) sharing modules through the filesystem, encapsulating 
the lifetime of a generated module in an object is something we should be doing 
right from the start anyway.
- given this, adding the code to ASTWorker and then moving it to PreambleWorker 
later seems like it doesn't save anything much over doing it right in the first 
place


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-08-24 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

@sammccall here is a question (or double check) about the intended initial 
version.

This is the requirement for the initial version:

> Don't attempt any cross-file or cross-version coordination: i.e. don't try to 
> reuse BMIs between different files, don't try to reuse BMIs between 
> (preamble) reparses of the same file, don't try to persist the module graph. 
> Instead, when building a preamble, synchronously scan for the module graph, 
> build the required PCMs on the single preamble thread with filenames private 
> to that preamble, and then proceed to build the preamble.

And all of us agree that it will be the job of the preamble to manage the 
module files in the end of the day. I just want to ask or double check that it 
might be OK to not related the module files with preamble in the initial 
version, right?

Since the definition (or description) of preamble is:

> A preamble can be reused between multiple versions of the file until 
> invalidated by a modification to a header, compile commands or modification 
> to relevant part of the current file.

And it looks like it beyonds the scope of the our requirement of the initial 
patch.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-08-22 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

In D153114#4603579 , @sammccall wrote:

> In D153114#4602414 , @ChuanqiXu 
> wrote:
>
>>> Don't attempt any cross-file or cross-version coordination: i.e. don't try 
>>> to reuse BMIs between different files, don't try to reuse BMIs between 
>>> (preamble) reparses of the same file, don't try to persist the module 
>>> graph. Instead, when building a preamble, synchronously scan for the module 
>>> graph, build the required PCMs on the single preamble thread with filenames 
>>> private to that preamble, and then proceed to build the preamble.
>>
>> Do I understand right? If I understand correctly, I fully agree with the 
>> direction. We can go slowly, as long as we keep moving forward.
>>
>> Then I'd like to leave the patch as-is for referencing and create new 
>> patches following the suggestion.
>
> Yes, that's the suggestion, and that plan makes sense to me, thanks!
>
> I did some more thinking about this (having a concrete implementation helps a 
> lot!) and had a couple more thoughts.
> At some point we should write down a design somewhere, need to strike a 
> balance between doing it early enough to be useful but late enough that we've 
> understood!

Yeah, then let's make the page into some design ideas discussing page.

> Dep scanning - roles
> 
>
> IIUC we do this for two reasons:
>
> - to identify what module names we must have PCMs for in order to build a 
> given TU (either an open file, or a module we're building as PCM)
> - to build a database mapping module name => filename, which we compose with 
> the CDB to know how to build a PCM for a given module name
>
> I think it would be good to clearly separate these. The latter is simpler, 
> more performance-critical, async, and is probably not used at all if the 
> build system can tell us this mapping.
> The latter is more complex, and will always be needed synchronously for the 
> mainfile regardless of the build system.

Yes, agreed.

> Dep scanning - implementation
> -
>
> The dep scanner seems to work by getting the compile command and running the 
> preprocessor. This is fairly heavyweight, and I can't see anywhere it's going 
> into single-file mode - is it really reading all `#included` headers? This is 
> certainly not workable for reparses of the mainfile (when no headers have 
> changed).
>
> It seems unneccesary: the standard seems to go to some lengths to ensure that 
> we (almost) only need to lex the top-level file:
>
> - module and import decls can't be produced by macros (seems to be the effect 
> of the `pp-module` directive etc)
> - module and import decls can't be `#include`d (definition of `module-file` 
> and `[cpp.import]` rules)
>
> The wrinkle I see is that some PP usage is possible: the module name can be 
> produced by a macro, and imports can be `#ifdef`d. I think the former is very 
> unlikely (like `#include MACRO_NAME`) and we can not support it, and the 
> latter will just result in us overestimating the deps, which seems OK.
> You have more context here though, and maybe I'm misreading the restrictions 
> placed by the standard. Today clang doesn't seem to enforce most of these 
> sort of restrictions, which is probably worth fixing if they're real.
>
> (This doesn't apply to header modules: it's perfectly possible to include a 
> textual header which includes a modular header, and it's impossible to know 
> without actually preprocessing. This divergence is nasty, but I don't think 
> we should pessimize standard modules for it).

There are some problems due to the complexities of the standard...

The take away may be:

- module and import decls can't be produced by macros (seems to be the effect 
of the `pp-module` directive etc)
  - yes
- module and import decls can't be `#include`d (definition of `module-file` and 
`[cpp.import]` rules)
  - the module declarations can't be `#include`d.
  - but the import decls can be `#include`d partially. See the discussion of 
https://github.com/llvm/llvm-project/issues/59688 for detail. The explanation 
is:
- the wording(http://eel.is/c++draft/cpp.import#3) is "If a pp-import is 
produced by source file inclusion (including by the rewrite produced when a 
#include directive names an importable header) while processing the group of a 
module-file, the program is ill-formed."
- and the definition of `module-file` 
(http://eel.is/c++draft/cpp.pre#nt:module-file) is `pp-global-module-fragment 
pp-module group pp-private-module-fragment`.
- so the phrase `the group of a module-file` only refers to the `group` in 
the definition of `module-file` literally. We can't expand the grammar.
- the module name can be produced by a macro
  - yes
- imports can be #ifdefd.
  - yes. And this is a pretty important use-case for using modules in practice.

So possibly we have to look into the `#include`s when scanning.

> 

[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-08-21 Thread Sam McCall via Phabricator via cfe-commits
sammccall added a comment.

In D153114#4604438 , @nridge wrote:

> In D153114#4603579 , @sammccall 
> wrote:
>
>> - to identify what module names we must have PCMs for in order to build a 
>> given TU (either an open file, or a module we're building as PCM)
>> - to build a database mapping module name => filename, which we compose with 
>> the CDB to know how to build a PCM for a given module name
>>
>> I think it would be good to clearly separate these. The latter is simpler, 
>> more performance-critical, async, and is probably not used at all if the 
>> build system can tell us this mapping.
>> The latter is more complex, and will always be needed synchronously for the 
>> mainfile regardless of the build system.
>
> I think the second instance of "the latter" was meant to be "the former" :)

Oops, yes!

>> At a high level, `import` decls should be processed with the preamble: [...]
>> However they don't appear in a prefix of the file
>
> This point is not obvious to me: do you mean that there are coding styles 
> that place `import` statements further down in the file, after non-trivial 
> declarations?

Yes:

- inside a module unit, imports must appear directly underneath the module 
declaration, above non-trivial declarations
- but non-module TUs can place imports anywhere in the file (at TU scope)

> If not, what stops us from altering the definition of "preamble" to something 
> like "everything before the first declaration which is not an import/module 
> declaration"?

We *have* to find and build *all* the PCMs that will be imported - unlike with 
the current preamble PCH, we can't give up on some of them and fall back to 
textual inclusion.

We could also have the preamble region cover the imports that happen to be at 
the top - i.e. the PCH would contain these ImportDecls. But I don't know that 
actually achieves anything meaningful over stopping the preamble before the 
imports. Either way, each time we use the preamble we'll have load the imported 
PCMs.

---

I just realized my ideas for simplifying the dep scanning may not work: as well 
as named modules we could also have imports of header units, which presumably 
means we need to at least build an include path and stat files on it to resolve 
a header unit name :-(


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-08-21 Thread Nathan Ridge via Phabricator via cfe-commits
nridge added a comment.

In D153114#4603579 , @sammccall wrote:

> Dep scanning - roles
> 
>
> IIUC we do this for two reasons:
>
> - to identify what module names we must have PCMs for in order to build a 
> given TU (either an open file, or a module we're building as PCM)
> - to build a database mapping module name => filename, which we compose with 
> the CDB to know how to build a PCM for a given module name
>
> I think it would be good to clearly separate these. The latter is simpler, 
> more performance-critical, async, and is probably not used at all if the 
> build system can tell us this mapping.
> The latter is more complex, and will always be needed synchronously for the 
> mainfile regardless of the build system.

I think the second instance of "the latter" was meant to be "the former" :)

> At a high level, `import` decls should be processed with the preamble: [...]
> However they don't appear in a prefix of the file

This point is not obvious to me: do you mean that there are coding styles that 
place `import` statements further down in the file, after non-trivial 
declarations?

If not, what stops us from altering the definition of "preamble" to something 
like "everything before the first declaration which is not an import/module 
declaration"?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-08-21 Thread Sam McCall via Phabricator via cfe-commits
sammccall added a comment.

In D153114#4602414 , @ChuanqiXu wrote:

>> Don't attempt any cross-file or cross-version coordination: i.e. don't try 
>> to reuse BMIs between different files, don't try to reuse BMIs between 
>> (preamble) reparses of the same file, don't try to persist the module graph. 
>> Instead, when building a preamble, synchronously scan for the module graph, 
>> build the required PCMs on the single preamble thread with filenames private 
>> to that preamble, and then proceed to build the preamble.
>
> Do I understand right? If I understand correctly, I fully agree with the 
> direction. We can go slowly, as long as we keep moving forward.
>
> Then I'd like to leave the patch as-is for referencing and create new patches 
> following the suggestion.

Yes, that's the suggestion, and that plan makes sense to me, thanks!

I did some more thinking about this (having a concrete implementation helps a 
lot!) and had a couple more thoughts.
At some point we should write down a design somewhere, need to strike a balance 
between doing it early enough to be useful but late enough that we've 
understood!

Dep scanning - roles


IIUC we do this for two reasons:

- to identify what module names we must have PCMs for in order to build a given 
TU (either an open file, or a module we're building as PCM)
- to build a database mapping module name => filename, which we compose with 
the CDB to know how to build a PCM for a given module name

I think it would be good to clearly separate these. The latter is simpler, more 
performance-critical, async, and is probably not used at all if the build 
system can tell us this mapping.
The latter is more complex, and will always be needed synchronously for the 
mainfile regardless of the build system.

Dep scanning - implementation
-

The dep scanner seems to work by getting the compile command and running the 
preprocessor. This is fairly heavyweight, and I can't see anywhere it's going 
into single-file mode - is it really reading all `#included` headers? This is 
certainly not workable for reparses of the mainfile (when no headers have 
changed).

It seems unneccesary: the standard seems to go to some lengths to ensure that 
we (almost) only need to lex the top-level file:

- module and import decls can't be produced by macros (seems to be the effect 
of the `pp-module` directive etc)
- module and import decls can't be `#include`d (definition of `module-file` and 
`[cpp.import]` rules)

The wrinkle I see is that some PP usage is possible: the module name can be 
produced by a macro, and imports can be `#ifdef`d. I think the former is very 
unlikely (like `#include MACRO_NAME`) and we can not support it, and the latter 
will just result in us overestimating the deps, which seems OK.
You have more context here though, and maybe I'm misreading the restrictions 
placed by the standard. Today clang doesn't seem to enforce most of these sort 
of restrictions, which is probably worth fixing if they're real.

(This doesn't apply to header modules: it's perfectly possible to include a 
textual header which includes a modular header, and it's impossible to know 
without actually preprocessing. This divergence is nasty, but I don't think we 
should pessimize standard modules for it).

Interaction with preamble
-

At a high level, `import` decls should be processed with the preamble: they 
should change infrequently, rebuilding modules is expensive, coarse-grained 
work, we want to make the same policy decisions on whether to use stale PCMs or 
block on fresh ones etc.
However they don't appear in a prefix of the file, and this is pretty important 
to how the preamble action works, so exactly in what sense are they part of the 
preamble?

I believe the best answer is:

- "preamble" is really a set of required artifacts + conditions to check for 
validity
- `import foo` in a file means `foo.pcm` is a required artifact, not that 
`preamble.pcm` contains an `ImportDecl`

So given this code:

  module;
  #include 
  module foo;
  import dep1;
  module :private;
  import dep2;

The "preamble region" should end after `#include ` and preamble.pcm should 
contain the AST & PP state up to that point.
Meanwhile `dep1.pcm` and `dep2.pcm` are separate PCM files that will be loaded 
on each parse.
For a preamble to be usable at all, we need to have built `preamble.pcm`, 
`dep1.pcm`, `dep2.pcm`.
For a preamble to be up-to-date, the preamble region + set of imported modules 
must be unchanged, the preamble.pcm must be up to date with respect to its 
sources, and the module PCMs must be up-to-date with respect to their sources. 
(unclear exactly how to implement the latter, may need to add extra tracking)

(When building this module as a PCM we may not want to treat dep2 as a 
dependency or parse the private fragment... but this is not relevant to 
preambles as we won't be 

[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-08-20 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

@sammccall Hi, Sam. Thanks for your high-quality comments! It is valuable. All 
the low-level inline comments are helpful. But I didn't reply them for the 
suggested direction in the higher level comments.

I'll repeat your suggestion in my mind again to avoid any misunderstandings:

While we should leave the space for future development, we should do the 
following thing in the initial patch:

> Don't attempt any cross-file or cross-version coordination: i.e. don't try to 
> reuse BMIs between different files, don't try to reuse BMIs between 
> (preamble) reparses of the same file, don't try to persist the module graph. 
> Instead, when building a preamble, synchronously scan for the module graph, 
> build the required PCMs on the single preamble thread with filenames private 
> to that preamble, and then proceed to build the preamble.

Do I understand right? If I understand correctly, I fully agree with the 
direction. We can go slowly, as long as we keep moving forward.

Then I'd like to leave the patch as-is for referencing and create new patches 
following the suggestion.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-08-17 Thread Sam McCall via Phabricator via cfe-commits
sammccall added a comment.

Sorry for the long radio silence here. There's a lot to chew on, and I put it 
off too long. Thanks for your patience!

I agree we should get experimental modules support landed in some form on the 
LLVM 18 timeline.
It's fine if this doesn't have extension points for large projects, or for 
header modules, and if it'll take some refactoring later to get there. Still, I 
think we should consider them in the architecture (partly because a second impl 
helps reason about interfaces).

I do think it's important we play nicely with the rest of clangd 
infrastructure: things like threading/context, VFS clean-ness, ability to 
easily write gtests for modules-enabled ASTs.

This initial pass probably mostly comes across as a list of complaints... I'd 
like to be more constructive & provide ideas, but this is long already so I'll 
follow up with that.

I've tried to limit to mostly high-level comments (or at least details that 
require a bit of thought!)
A couple more that don't fit anywhere:

---

Indexing


Much of clangd's functionality relies on having an up-to-date index of the 
transitively included headers (and now, modules).
While we're more relaxed about having a full-project index, this preamble index 
really is essential.

It's infeasible to build this index from the PCM files themselves: this means 
fully deserializing them and is too expensive (we tried this with PCH).
Traditionally we indexed the preamble the ASTContext before serializing the 
PCH, though recently this is optionally async.
(You can see this in `onPreambleAST` in ClangdServer.cpp and trace from there.)

I think indexing can be left out of this initial patch, but we should work out 
where/how it fits and leave a comment!

---

Scope and incremental development
-

There are a lot of moving pieces to this. That's useful to understand how they 
all fit together (which is necessary whether they're in the same patch or not).
However it makes difficult to review, and to drive a review to a conclusion. I 
think we should change the scope of the initial patch to establish a small 
subset that works and we understand well:

**Don't attempt any cross-file or cross-version coordination**: i.e. don't try 
to reuse BMIs between different files, don't try to reuse BMIs between 
(preamble) reparses of the same file, don't try to persist the module graph. 
Instead, when building a preamble, synchronously scan for the module graph, 
build the required PCMs on the single preamble thread with filenames private to 
that preamble, and then proceed to build the preamble.

Obviously this no longer provides any performance benefits from modules! But it 
gets us to a modules-based build that works, and these aspects can be added on 
top in a relatively clear way.
It avoids a lot of tricky problems:

- scheduling
- (lack of) transactionality of reads/writes from the module graph
- the possibility of BMI version conflicts
- various events that can invalidate the module graph
- rebuilding of BMIs when source files change (the current patch deliberately 
doesn't do this to keep the scope down, but that means the result is both 
complex and incorrect, which is a difficult situation to get out of)

I think this also encourages us to unbundle some of the "ModulesManager"'s 
responsibilities and will lead to a better API there.

**Do consider the lifecycle/namespacing of BMI files on disk**

While we may eventually want to reuse BMIs between (sequential or concurrent) 
clangd instances, this is complicated and IMO won't happen soon (needs to deal 
with source versioning, I believe).

Therefore for now we should treat the BMIs as transient, conceptually like 
in-memory objects. In practice we want to store them on disk for size reasons, 
but these should have transient tempfile names that will never conflict, should 
be deleted on shutdown etc. (It's fine to do basic delete-on-destruction and 
leave crash-handling etc out for now).




Comment at: clang-tools-extra/clangd/ModulesManager.cpp:50
+
+  DependencyScanningTool ScanningTool(Service);
+

the docs suggest this tool is going to spawn threads.

clangd needs to retain control of thread spawning, e.g. to ensure 
clangd::Context is propagated (required for config, tracing and others), thread 
priorities are set appropriately etc.



Comment at: clang-tools-extra/clangd/ModulesManager.cpp:55
+  llvm::Expected P1689Rule =
+  ScanningTool.getP1689ModuleDependencyFile(
+  *Cmd, Cmd->Directory, MakeformatOutput, MakeformatOutputPath);

It looks like this performs IO for the dependency scanning.
The files may not be on a real filesystem, IO on source files should go through 
the configured VFS (see ThreadsafeFS from which you can obtain a concrete FS 
for the desired operation).



Comment at: clang-tools-extra/clangd/ModulesManager.cpp:56
+  

[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-08-09 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov added a comment.

In D153114#4571703 , @ChuanqiXu wrote:

> For example, may it be a reasonable expectation that we can have named 
> modules support in clangd in clang18? Clang18 will be released in the first 
> week of March 2024. So that's still roughly 7 months away from now. I guess 
> the time span may be sufficient. How do you feel about this? And if we have 
> consensus on that, then we will need to move forward from the patch if we 
> don't have solution in December at least. Since it takes time to review and 
> experiment this further.
>
> BTW, I don't mind someone else to take this job away completely as long as we 
> can get the support in time. Since I am not a clangd developer : )

Setting some deadlines totally makes sense and Clang 18 release looks like a 
very reasonable target.
Give us a few days to see if we can set this up. I hope to be back with 
concrete commitments by the end of this week or early next week.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-08-08 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

Got it. The explanation makes sense. A well designed and scaling solution is 
what I (and probably every one) want.

Then from the expectation, the difference between supporting header modules and 
C++20 named modules from the **user** side may be:

- For supporting header modules in clangd, it is mainly a speed issue and some 
corner cases.
- For supporting C++20 named modules in clangd, it will be pretty hard for 
users to use named modules in practice.

In another word, it may not be super bad for not supporting header modules in 
clangd. But it is super bad for named modules. So I am wondering if we can have 
an expectation for the time points to support named modules (even only 
experimentally) in clangd. For example, may it be a reasonable expectation that 
we can have named modules support in clangd in clang18? Clang18 will be 
released in the first week of March 2024. So that's still roughly 7 months away 
from now. I guess the time span may be sufficient. How do you feel about this? 
And if we have consensus on that, then we will need to move forward from the 
patch if we don't have solution in December at least. Since it takes time to 
review and experiment this further.

BTW, I don't mind someone else to take this job away completely as long as we 
can get the support in time since I am not a clangd developer : )


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-08-08 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov added a comment.

In D153114#4569591 , @ChuanqiXu wrote:

> BTW, I have a question about supporting header modules (including clang 
> header modules and C++20 header units) in static analysing tools (including 
> clangd), "why can't we fallback to include the corresponding headers 
> simply?". So I still feel it is not a problem to support header modules in 
> static tools. Since I feel the semantics of header modules is almost 
> transparent to other tools.

We could and it mostly works. In fact, this is what we do internally at Google 
at build system level since our source tools don't support header modules, but 
the build does support them.
There are problems with this approach, though:

- Modules provide build time improvement that add up if they are widely used in 
the codebase. This leads to source tools being much slower than builds. This 
results in slowness and timeouts when source tools are used.
- There are still incompatibilities between header modules and preprocessor and 
we occasionally get code that builds, but source tools don't work on it.

I think we should be open to special-casing the header modules if they turn out 
to be hard to support compared to C++20 Modules.
However, my intuition is that that would not be the case and the major problems 
are actually quite similar between the two (versioning of PCMs we consume, when 
PCMs need to be rebuilt, etc) and we get header modules almost for free if we 
solve those problem for C++20 Modules.
But we do want to make sure we don't miss anything important for header modules 
in the design phase as that's actually something used by Google today and it 
would be valuable to solve the aforementioned problems.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-08-08 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

BTW, I have a question about supporting header modules (including clang header 
modules and C++20 header units) in static analysing tools (including clangd), 
"why can't we fallback to include the corresponding headers simply?". So I 
still feel it is not a problem to support header modules in static tools. Since 
I feel the semantics of header modules is almost transparent to other tools.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-08-07 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

Got it. Being patience is not bad and it is good enough to know that we are 
moving forward : )


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-08-07 Thread Ilya Biryukov via Phabricator via cfe-commits
ilya-biryukov requested changes to this revision.
ilya-biryukov added a comment.
This revision now requires changes to proceed.

Hi @ChuanqiXu,

Sam is on vacation now, but we are in the same team and I am responding on 
behalf of the team.

First, sorry for not getting to this review earlier.
The change is quite big, and, as Sam mentioned, we want to make sure this 
scales to our (unreasonable) project sizes and supports for header modules.
That is to say, we feel that the stakes are high for getting it right.

In order to move forward with this change, we need to do some homework to 
figure out the longer-term strategy for modules in Clangd.
We plan to come back with answers, but it may take a few months because 
planning this urgently is tough.
I do want to stress that we want to get modules in Clangd right, plan to work 
on it.

Sorry for these delays and confusion.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-08-06 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

@sammccall gentle ping


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-07-25 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

@sammccall gentle ping~


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-07-12 Thread Nathan Ridge via Phabricator via cfe-commits
nridge resigned from this revision.
nridge added a comment.

I'm sorry, I feel like I don't have a good enough level of insight into the 
requirements to usefully critique this patch, nor the bandwidth to take a 
detailed look through the implementation right now.

I think it's best for me to resign as reviewer for the time being, and leave 
the review in Sam's capable hands.

Sam, I hope this is ok; if you'd like a second opinion on any particular point, 
or a second pair of eyes on the implementation, please feel free to re-add me 
and I will do my best to make time to weigh in.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-07-11 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

@sammccall @nridge gentle ping~


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-06-29 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

@sammccall @nridge gentle ping~


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-06-19 Thread Pol M via Phabricator via cfe-commits
Destroyerrrocket added inline comments.



Comment at: clang-tools-extra/clangd/ModulesManager.cpp:413-414
+  else
+WaitingCallables[Filename.str()].push_back(
+{std::move(ReadyCallback), std::move(ReadyCallback)});
+}

ChuanqiXu wrote:
> Destroyerrrocket wrote:
> > This is a bug; The second move is invalid. You could make a copy
> Done. Thanks for looking this. I changed it with a new signature for the 
> callbacks with a bool argument.
No problem! I'd love to help :)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-06-18 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added inline comments.



Comment at: clang-tools-extra/clangd/ModulesManager.cpp:413-414
+  else
+WaitingCallables[Filename.str()].push_back(
+{std::move(ReadyCallback), std::move(ReadyCallback)});
+}

Destroyerrrocket wrote:
> This is a bug; The second move is invalid. You could make a copy
Done. Thanks for looking this. I changed it with a new signature for the 
callbacks with a bool argument.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-06-18 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu updated this revision to Diff 532502.
ChuanqiXu added a comment.

Address comments.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

Files:
  clang-tools-extra/clangd/CMakeLists.txt
  clang-tools-extra/clangd/ClangdLSPServer.cpp
  clang-tools-extra/clangd/ClangdLSPServer.h
  clang-tools-extra/clangd/GlobalCompilationDatabase.cpp
  clang-tools-extra/clangd/GlobalCompilationDatabase.h
  clang-tools-extra/clangd/ModulesManager.cpp
  clang-tools-extra/clangd/ModulesManager.h
  clang-tools-extra/clangd/TUScheduler.cpp
  clang-tools-extra/clangd/test/CMakeLists.txt
  clang-tools-extra/clangd/test/modules.test
  clang-tools-extra/clangd/tool/ClangdMain.cpp
  clang-tools-extra/clangd/unittests/CMakeLists.txt
  clang-tools-extra/clangd/unittests/ModulesManagerTests.cpp
  clang-tools-extra/docs/ReleaseNotes.rst

Index: clang-tools-extra/docs/ReleaseNotes.rst
===
--- clang-tools-extra/docs/ReleaseNotes.rst
+++ clang-tools-extra/docs/ReleaseNotes.rst
@@ -48,6 +48,9 @@
 Improvements to clangd
 --
 
+- Implemented the experimental support for C++20 modules. This can be enabled by
+  `-experimental-modules-support` option.
+
 Inlay hints
 ^^^
 
Index: clang-tools-extra/clangd/unittests/ModulesManagerTests.cpp
===
--- /dev/null
+++ clang-tools-extra/clangd/unittests/ModulesManagerTests.cpp
@@ -0,0 +1,350 @@
+//===-- ModulesManagerTests.cpp -*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "Config.h"
+#include "ModulesManager.h"
+
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/raw_ostream.h"
+
+#include "gmock/gmock.h"
+#include "gtest/gtest.h"
+
+using namespace clang;
+using namespace clang::clangd;
+using namespace llvm;
+
+namespace {
+class ModulesManagerTest : public ::testing::Test {
+  void SetUp() override {
+ASSERT_FALSE(sys::fs::createUniqueDirectory("modules-test", TestDir));
+llvm::errs() << "Created TestDir: " << TestDir << "\n";
+  }
+
+  void TearDown() override {
+// sys::fs::remove_directories(TestDir);
+  }
+
+public:
+  SmallString<256> TestDir;
+
+  // Add files to the working testing directory and repalce all the
+  // `__DIR__` to TestDir.
+  void addFile(StringRef Path, StringRef Contents) {
+ASSERT_FALSE(sys::path::is_absolute(Path));
+
+SmallString<256> AbsPath(TestDir);
+sys::path::append(AbsPath, Path);
+
+ASSERT_FALSE(
+sys::fs::create_directories(llvm::sys::path::parent_path(AbsPath)));
+
+std::error_code EC;
+llvm::raw_fd_ostream OS(AbsPath, EC);
+ASSERT_FALSE(EC);
+
+std::size_t Pos = Contents.find("__DIR__");
+while (Pos != llvm::StringRef::npos) {
+  OS << Contents.take_front(Pos);
+  OS << TestDir;
+  Contents = Contents.drop_front(Pos + sizeof("__DIR__") - 1);
+  Pos = Contents.find("__DIR__");
+}
+
+OS << Contents;
+  }
+
+  // Get the absolute path for file specified by Path under testing working
+  // directory.
+  std::string getFullPath(StringRef Path) {
+SmallString<128> Result(TestDir);
+sys::path::append(Result, Path);
+return Result.str().str();
+  }
+};
+
+TEST_F(ModulesManagerTest, ReplaceCommandsTest) {
+  addFile("build/compile_commands.json", R"cpp(
+[
+{
+  "directory": "__DIR__",
+  "command": "clang++ -std=c++20 __DIR__/M.cppm -c -o __DIR__/M.o -fmodule-file=D=__DIR__/D.pcm",
+  "file": "__DIR__/M.cppm",
+  "output": "__DIR__/M.o"
+}
+]
+  )cpp");
+
+  addFile("M.cppm", R"cpp(
+export module M;
+import D;
+  )cpp");
+
+  RealThreadsafeFS TFS;
+  DirectoryBasedGlobalCompilationDatabase::Options Opts(TFS);
+  DirectoryBasedModulesGlobalCompilationDatabase MCDB(Opts,
+  /*AsyncThreadsCount*/ 4);
+
+  std::optional Cmd =
+  MCDB.getCompileCommand(getFullPath("M.cppm"));
+  EXPECT_TRUE(Cmd);
+  // Since the graph is not built yet. We don't expect to see the mutated
+  // command line for modules.
+  EXPECT_FALSE(any_of(Cmd->CommandLine, [](StringRef Arg) {
+return Arg.count("-fprebuilt-module-path");
+  }));
+  EXPECT_TRUE(any_of(Cmd->CommandLine, [](StringRef Arg) {
+return Arg.count("-fmodule-file=");
+  }));
+
+  ModulesManager *MMgr = MCDB.getModulesManager();
+  EXPECT_TRUE(MMgr);
+  MMgr->UpdateNode(getFullPath("M.cppm"));
+
+  MMgr->waitUntilInitialized();
+
+  Cmd = MCDB.getCompileCommand(getFullPath("M.cppm"));
+  EXPECT_TRUE(Cmd);
+  // Since the graph has been built. We expect to see the mutated command line
+  // for modules.
+  

[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-06-18 Thread Pol M via Phabricator via cfe-commits
Destroyerrrocket requested changes to this revision.
Destroyerrrocket added inline comments.
This revision now requires changes to proceed.



Comment at: clang-tools-extra/clangd/ModulesManager.cpp:413-414
+  else
+WaitingCallables[Filename.str()].push_back(
+{std::move(ReadyCallback), std::move(ReadyCallback)});
+}

This is a bug; The second move is invalid. You could make a copy


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-06-16 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

> That said, there are two large issues that I think should be addressed in the 
> design (though not necessarily *implemented* now).

Yeah, totally agreed. Design is pretty important especially in open source 
softwares. I'm pretty open to the suggestions.

---

> support for clang header modules
>
> Google has a large deployment of clangd, serving a codebase that builds 
> significant parts as clang header modules for performance reasons. There is 
> no near-term plan of adopting C++20 modules (multiple reasons, which I 
> probably can't represent well).
> We've been disabling header modules for clangd & other tools for a long time. 
> But it's come to a head, and we expect to get someone working on enabling 
> modules in clangd in ~2 months.
> As I understand it, Meta are in a similar situation (though their solution is 
> to version-lock clangd with the toolchain, keep modules enabled, and accept 
> that some things are broken).
>
> I suspect the answer for header modules is that we can study this patch and 
> understand what the equivalents of graph nodes/deps/names/scanning look like 
> for explicit header modules, and understand that we'll be able to abstractify 
> some names and add some levels of indirection and it'll all work out.

Yeah. On the one hand, I always think that named modules are far different from 
header modules (including clang header modules and header units) for tools. 
Since for tools, it makes sense to fallback to traditional inclusion when they 
saw header units or header modules. For example, 
for clang header modules, IIUC, clangd can work if we rewrite the compile 
commands to strip a lot of clang header modules related options. And we reuse 
the rewrite process in the patch. (I know there are some exceptions due to 
macros). On the other hand, I'm curious about how do you handle/model clang 
explicit header modules with compilation database? I didn't look about this. Is 
the headers an entry in the compilation database? And if is it possible for one 
header to have multiple BMIs? Such problems look not easy to handle so I prefer 
to fallback it to inclusions.

> Our specific build system setup is that all module-build actions and inputs 
> explicit (-fmodule-file=, -fmodule-map-file=). The build system will not 
> produce usable PCMs due to version skew.
> I know that Meta folks *would* like to make use of available PCMs from the 
> build system.

Also it looks like to me the Meta's method are not  a solution we want. This 
looks basically what I want for named modules half a year ago. Basically I did 
nothing and just tried to let clangd fallback to clang to handle everything.

---

> support for large projects
>
> Apart from the concrete scanning of deps, keeping the full project module 
> graph in memory won't always be possible. (It's a perfectly reasonable 
> default implementation though).
>
>> We'll need some abstraction layer (like CompilationDatabase is to 
>> compile_commands.json) that exposes enough data to run the algorithms we 
>> need, without exposing so much that you have to hold the whole graph in 
>> memory. It could be backed by in-memory depscan results, or a build-system 
>> artifact, or a live query of the build system.

Yeah, I agree the performance is a key point. In the current design, the 
ModulesDependencyGraph is only visible to ModulesManager so I feel we left the 
space for further optimization while I don't have concrete ideas. The current 
patch only left 5 interfaces for outer users  (currently only TUScheduler) (the 
interfaces: UpdateNode(PathRef), isReadyToCompile(PathRef), 
HasInvalidDependencies(PathRef), addCallbackAfterReady(PathRef, 
std::function) and GenerateModuleInterfacesFor(PathRef)), which didn't 
leak any special data to outer users. So I feel while it is OK to adopt ideas 
to enhance the scaling ability, it is OK to forward now since we've left the 
space for further optimization. (premature optimization is the root of all evil 
:) )

> For "every time we update a file, all the affected files (e.g., we changed a 
> header file) will be re-scanned" - we need a way to express this more 
> abstractly than "re-scanned", and more narrowly than "all the affected files" 
> - at least it needs to be limited to all the files that could themselves 
> affect any open file.

Yeah, actually now this patch doesn't make it **explicitly**. Now we achieve 
this **implicitly** by making `TUScheduler::update` (which will be called every 
time the TU changes) to call `ModulesManager::UpdateNode(PathRef)`. What I want 
to do is to let `ModulesManager::UpdateNode(PathRef Node)` to call 
`TUScheduler::update` for the users of Node.

So here "re-scanned" means to call `TUScheduler::update`. And "all the affected 
files" is not reflected in the patch actually. Do you have suggestions to 
improve this or do you feel it is OK to make it implicitly as now?


CHANGES SINCE LAST ACTION
  

[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-06-16 Thread Nathan Ridge via Phabricator via cfe-commits
nridge added a comment.

> The major problem I see now is that we can't handle third party modules. 
> Third party modules refer to the modules whose source codes are not in 
> current project. The users are still able to see the hint from clangd if the 
> BMI of the third party modules are built ahead of time. I think this is true 
> for a lot use cases. But this breaks our goal to not be locked by the same 
> version compiler.
>
> I think we can only solve the issue after SG15 solves it. I know SG15 is 
> discussing how to let the modules communicate across libraries boundary. And 
> it is not wise to invent wheels agains SG15. So let's wait for that.

For context, SG15 is the ISO C++ standards committee's Tooling Study Group. One 
proposal being looked at that may be relevant here is P2581 - Specifying the 
Interoperability of Built
Module Interface Files 
.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-06-16 Thread Sam McCall via Phabricator via cfe-commits
sammccall added a comment.

(Sorry, hit send too soon)

I suspect the answer for header modules is that we can study this patch and 
understand what the equivalents of graph nodes/deps/names/scanning look like 
for explicit header modules, and understand that we'll be able to abstractify 
some names and add some levels of indirection and it'll all work out.

For large project support, I suspect a bit more thought is needed.
We'll need some abstraction layer (like CompilationDatabase is to 
compile_commands.json) that exposes enough data to run the algorithms we need, 
without exposing so much that you have to hold the whole graph in memory. It 
could be backed by in-memory depscan results, or a build-system artifact, or a 
live query of the build system.
For "every time we update a file, all the affected files (e.g., we changed a 
header file) will be re-scanned" - we need a way to express this more 
abstractly than "re-scanned", and more narrowly than "all the affected files" - 
at least it needs to be limited to all the files that could themselves affect 
any open file.

(Going to dig into the design now and come back with more thoughts)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D153114: [clangd] [C++20] [Modules] Support C++20 modules for clangd

2023-06-16 Thread Sam McCall via Phabricator via cfe-commits
sammccall added a comment.

Thanks for putting this together, I'm going to study it carefully and try it 
out!

That said, there are two large issues that I think should be addressed in the 
design (though not necessarily *implemented* now).
I'll be upfront: these are things without which $EMPLOYER will not be able to 
use this at all.
Since there isn't really room for multiple modules implementations in clangd, 
I'd like to make sure these fit into the design, or can be bolted on.

In the past, we've been reasonably successful at finding extension points that 
let clangd scale to unreasonable codebase sizes, while doing the right thing 
for smaller projects too. (Index, build system integration, VFS support, etc). 
To quote from the bug:

> you will need an alternative for them (luckily most of them already are 
> willing to invest in custom tooling), but scanning all project files should 
> still be fine for the vast majority of clangd users

For historical context: clangd **was** that custom tooling designed to scale to 
huge monorepos. This isn

---

**support for clang header modules**

Google has a large deployment of clangd, serving a codebase that builds 
significant parts as clang header modules for performance reasons. There is no 
near-term plan of adopting C++20 modules (multiple reasons, which I probably 
can't represent well).
We've been disabling header modules for clangd & other tools for a long time. 
But it's come to a head, and we expect to get someone working on enabling 
modules in clangd in ~2 months.
As I understand it, Meta are in a similar situation (though their solution is 
to version-lock clangd with the toolchain, keep modules enabled, and accept 
that some things are broken).

Our specific build system setup is that all module-build actions and inputs 
explicit (-fmodule-file=, -fmodule-map-file=). The build system will not 
produce usable PCMs due to version skew.
I know that Meta folks *would* like to make use of available PCMs from the 
build system.

---

**support for large projects**

We have multiple codebases large enough that touching every file to discover 
dependencies isn't feasible. 
The largest internal one had 2B LOC 7 years ago, and is now much larger. But 
even Chrome for example has 20M. Such projects are prone to adopt modules (in 
some form) for build scalability.
Apart from the concrete scanning of deps, keeping the full project module graph 
in memory won't always be possible. (It's a perfectly reasonable default 
implementation though).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153114/new/

https://reviews.llvm.org/D153114

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits