[PATCH] D135849: [llvm] Return canonical virtual path from `RedirectingFileSystem`

2023-07-06 Thread Jan Svoboda via Phabricator via cfe-commits
jansvoboda11 updated this revision to Diff 537689.
jansvoboda11 added a comment.

Add unit tests, use `SmallVector`, make `clang-scan-deps` test portable


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135849/new/

https://reviews.llvm.org/D135849

Files:
  clang/lib/Lex/ModuleMap.cpp
  clang/test/ClangScanDeps/modules-canononical-module-map-case.c
  llvm/include/llvm/Support/VirtualFileSystem.h
  llvm/lib/Support/VirtualFileSystem.cpp
  llvm/unittests/Support/VirtualFileSystemTest.cpp

Index: llvm/unittests/Support/VirtualFileSystemTest.cpp
===
--- llvm/unittests/Support/VirtualFileSystemTest.cpp
+++ llvm/unittests/Support/VirtualFileSystemTest.cpp
@@ -2580,6 +2580,7 @@
   Lower->addSymlink("/link");
   IntrusiveRefCntPtr FS = getFromYAMLString(
   "{ 'use-external-names': false,\n"
+  "  'case-sensitive': false,\n"
   "  'roots': [\n"
   "{\n"
   "  'type': 'directory',\n"
@@ -2588,6 +2589,11 @@
   "  'type': 'file',\n"
   "  'name': 'bar',\n"
   "  'external-contents': '/link'\n"
+  "},\n"
+  "{\n"
+  "  'type': 'directory',\n"
+  "  'name': 'baz',\n"
+  "  'contents': []\n"
   "}\n"
   "  ]\n"
   "},\n"
@@ -2610,9 +2616,9 @@
   EXPECT_FALSE(FS->getRealPath("//root/bar", RealPath));
   EXPECT_EQ(RealPath.str(), "/symlink");
 
-  // Directories should fall back to the underlying file system is possible.
-  EXPECT_FALSE(FS->getRealPath("//dir/", RealPath));
-  EXPECT_EQ(RealPath.str(), "//dir/");
+  // Directories should return the virtual path as written in the definition.
+  EXPECT_FALSE(FS->getRealPath("//ROOT/baz", RealPath));
+  EXPECT_EQ(RealPath.str(), "//root/baz");
 
   // Try a non-existing file.
   EXPECT_EQ(FS->getRealPath("/non_existing", RealPath),
Index: llvm/lib/Support/VirtualFileSystem.cpp
===
--- llvm/lib/Support/VirtualFileSystem.cpp
+++ llvm/lib/Support/VirtualFileSystem.cpp
@@ -2239,6 +2239,14 @@
   }
 }
 
+void RedirectingFileSystem::LookupResult::getPath(
+llvm::SmallVectorImpl ) const {
+  Result.clear();
+  for (Entry *Parent : Parents)
+llvm::sys::path::append(Result, Parent->getName());
+  llvm::sys::path::append(Result, E->getName());
+}
+
 std::error_code
 RedirectingFileSystem::makeCanonical(SmallVectorImpl ) const {
   if (std::error_code EC = makeAbsolute(Path))
@@ -2257,11 +2265,14 @@
 RedirectingFileSystem::lookupPath(StringRef Path) const {
   sys::path::const_iterator Start = sys::path::begin(Path);
   sys::path::const_iterator End = sys::path::end(Path);
+  llvm::SmallVector Entries;
   for (const auto  : Roots) {
 ErrorOr Result =
-lookupPathImpl(Start, End, Root.get());
-if (Result || Result.getError() != llvm::errc::no_such_file_or_directory)
+lookupPathImpl(Start, End, Root.get(), Entries);
+if (Result || Result.getError() != llvm::errc::no_such_file_or_directory) {
+  Result->Parents = std::move(Entries);
   return Result;
+}
   }
   return make_error_code(llvm::errc::no_such_file_or_directory);
 }
@@ -2269,7 +2280,8 @@
 ErrorOr
 RedirectingFileSystem::lookupPathImpl(
 sys::path::const_iterator Start, sys::path::const_iterator End,
-RedirectingFileSystem::Entry *From) const {
+RedirectingFileSystem::Entry *From,
+llvm::SmallVectorImpl ) const {
   assert(!isTraversalComponent(*Start) &&
  !isTraversalComponent(From->getName()) &&
  "Paths should not contain traversal components");
@@ -2298,10 +2310,12 @@
   auto *DE = cast(From);
   for (const std::unique_ptr  :
llvm::make_range(DE->contents_begin(), DE->contents_end())) {
+Entries.push_back(From);
 ErrorOr Result =
-lookupPathImpl(Start, End, DirEntry.get());
+lookupPathImpl(Start, End, DirEntry.get(), Entries);
 if (Result || Result.getError() != llvm::errc::no_such_file_or_directory)
   return Result;
+Entries.pop_back();
   }
 
   return make_error_code(llvm::errc::no_such_file_or_directory);
@@ -2543,10 +2557,12 @@
 return P;
   }
 
-  // If we found a DirectoryEntry, still fallthrough to the original path if
-  // allowed, because directories don't have a single external contents path.
-  if (Redirection == RedirectKind::Fallthrough)
-return ExternalFS->getRealPath(CanonicalPath, Output);
+  // We found a DirectoryEntry, which does not have a single external contents
+  // path. Use the canonical virtual path.
+  if (Redirection == RedirectKind::Fallthrough) {
+Result->getPath(Output);
+return {};
+  }
   return llvm::errc::invalid_argument;
 }
 
Index: llvm/include/llvm/Support/VirtualFileSystem.h
===
--- 

[PATCH] D135849: [llvm] Return canonical virtual path from `RedirectingFileSystem`

2023-07-05 Thread Jan Svoboda via Phabricator via cfe-commits
jansvoboda11 added a comment.

In D135849#4469376 , @benlangmuir 
wrote:

> In D135849#4469347 , @jansvoboda11 
> wrote:
>
>> It'd be really nice to have DirectoryEntry::getDir() API, so that we can 
>> walk up the directory hierarchy while preserving the virtual/real 
>> distinction between directories in the overlay/on disk, never accidentally 
>> "bubbling up" into the overlay again. What's your take on that?
>
> Can you say more about how you would do this and preserve the real/virtual 
> distinction? Currently FileManager is just filling in the directory based on 
> the parent path with a lookup to the VFS, so isn't it the same issue? Or did 
> you mean we would keep more info?

Yes, the mechanism you mention has the same issue. My idea was that we could 
keep more information when looking up the path in the VFS, associate each 
`{File,Directory}Entry` with list of `FileSystem *` that "owns" each of its 
parent directories. We could then send the `llvm::sys::path::parent_path(...)` 
argument directly to the owning FS instead to the top one. This is not 
realistic from performance standpoint, though.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135849/new/

https://reviews.llvm.org/D135849

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D135849: [llvm] Return canonical virtual path from `RedirectingFileSystem`

2023-07-03 Thread Ben Langmuir via Phabricator via cfe-commits
benlangmuir added a comment.

In D135849#4469347 , @jansvoboda11 
wrote:

> It'd be really nice to have DirectoryEntry::getDir() API, so that we can walk 
> up the directory hierarchy while preserving the virtual/real distinction 
> between directories in the overlay/on disk, never accidentally "bubbling up" 
> into the overlay again. What's your take on that?

Can you say more about how you would do this and preserve the real/virtual 
distinction? Currently FileManager is just filling in the directory based on 
the parent path with a lookup to the VFS, so isn't it the same issue? Or did 
you mean we would keep more info?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135849/new/

https://reviews.llvm.org/D135849

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D135849: [llvm] Return canonical virtual path from `RedirectingFileSystem`

2023-07-03 Thread Jan Svoboda via Phabricator via cfe-commits
jansvoboda11 added a comment.

Okay, I'll try to figure out how expensive this would be. I'd like Clang to be 
stricter in situations like these, but if that's not feasible, I'll probably 
implement the first workaround.

For the second workaround, I don't think that moves us in better direction - we 
should be able to assign one canonical real path to each `DirectoryEntry`.

The unfortunate aspect of the `ModuleMap::canonicalizeModuleMapPath()` 
implementation is that it uses 
`FileManager.getDirectoryRef(llvm::sys::path::parent_path(ModuleMapFilePath))`. 
If the request for getting the module map file fell through the VFS into the 
underlying FS, parent `DirectoryEntry` of that file is different than the 
virtual directory made for the VFS. Chopping off the file name and going 
through the VFS again doesn't guarantee falling through the VFS again, which 
that code doesn't account for. It'd be really nice to have 
`DirectoryEntry::getDir()` API, so that we can walk up the directory hierarchy 
while preserving the virtual/real distinction between directories in the 
overlay/on disk, never accidentally "bubbling up" into the overlay again. 
What's your take on that?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135849/new/

https://reviews.llvm.org/D135849

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D135849: [llvm] Return canonical virtual path from `RedirectingFileSystem`

2023-07-03 Thread Ben Langmuir via Phabricator via cfe-commits
benlangmuir added a comment.

> You mean diagnosing whenever the spelling in the VFS definition differs from 
> its realpath?

Right, this would be ideal, but may be too expensive in practice.

> How could we make that work with symlinks in place?

Ah right, we are intentionally allowing symlinks in the VFS path (e.g. 
Foo.framework/Headers) because we don't correctly handle symlink resolution in 
the VFS itself.

You can get the canonical spelling of a symlink in a couple of ways: you can 
readdir the parent directory and find all case-insensitive matches.  If there 
is only 1, that's the spelling, if there are more than 1  then it's a 
case-sensitive filesystem and the original path must be correct.  Another way 
that is OS-specific is on Darwin you can `open(..., O_SYMLINK)` the path 
component to open the symlink itself and then `fcntl(..., F_GETPATH, ...)` to 
get its realpath.  Both of these approaches require handling each component of 
the path individually, though each step is cacheable.  Not sure if there are 
any better ways.

As I said, not sure we can afford to diagnose this.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135849/new/

https://reviews.llvm.org/D135849

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D135849: [llvm] Return canonical virtual path from `RedirectingFileSystem`

2023-07-03 Thread Jan Svoboda via Phabricator via cfe-commits
jansvoboda11 added a comment.

In D135849#4464234 , @benlangmuir 
wrote:

>> Just realized this most likely won't work if the case-insensitive VFS is 
>> specified with wrong spelling too, e.g. Fw.framework.
>
> Is this about the spelling in the VFS definition itself, or about the path 
> being looked up?

Yes, spelling in the VFS definition itself.

> If it's the VFS definition maybe we can say that's a bug for whoever wrote 
> the VFS? Ideally we could diagnose but I'm not sure we want to pay for a lot 
> of calls to realpath.

You mean diagnosing whenever the spelling in the VFS definition differs from 
its realpath? How could we make that work with symlinks in place?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135849/new/

https://reviews.llvm.org/D135849

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D135849: [llvm] Return canonical virtual path from `RedirectingFileSystem`

2023-06-30 Thread Ben Langmuir via Phabricator via cfe-commits
benlangmuir added a comment.

> Just realized this most likely won't work if the case-insensitive VFS is 
> specified with wrong spelling too, e.g. Fw.framework.

Is this about the spelling in the VFS definition itself, or about the path 
being looked up? If it's the VFS definition maybe we can say that's a bug for 
whoever wrote the VFS? Ideally we could diagnose but I'm not sure we want to 
pay for a lot of calls to realpath.




Comment at: llvm/include/llvm/Support/VirtualFileSystem.h:876
+/// Chain of parent directory entries for \c E.
+std::vector Parents;
+

SmallVector?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135849/new/

https://reviews.llvm.org/D135849

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D135849: [llvm] Return canonical virtual path from `RedirectingFileSystem`

2023-06-30 Thread Jan Svoboda via Phabricator via cfe-commits
jansvoboda11 added a comment.

Just realized this most likely won't work if the case-insensitive VFS is 
specified with wrong spelling too, e.g. `Fw.framework`. Maybe we'll need to 
combine this patch with one of the alternatives after all.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135849/new/

https://reviews.llvm.org/D135849

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D135849: [llvm] Return canonical virtual path from `RedirectingFileSystem`

2023-06-30 Thread Jan Svoboda via Phabricator via cfe-commits
jansvoboda11 added a comment.

Module map canonicalization in `clang-scan-deps` causes issues even after 
D135841  landed.

Here's the problematic scenario this patch aims to solve:

- there is a framework `FW` whose headers (but not module maps) are redirected 
in a VFS overlay file
- there is a TU with `#import ` (note the incorrect case of the 
framework name)
- what follows is how Clang treats the involved file/directory paths:
  - find the incorrectly-cased header through the case-insensitive VFS overlay
  - chop off `Headers/Header.h`
  - call `FileManager::getCanonicalName()` on the `fw.framework` directory
- `RedirectingFileSystem` calls the underlying FS to get the real path
  - note: this happens since the node is not 
`RedirectingFileSystem::{FileEntry,DirectoryRemapEntry}`, see the last branch 
of `RedirectingFileSystem::getRealPath()`
  - the underlying case-sensitive FS doesn't recognize the 
incorrectly-cased path
- `FileManager` falls back to just using `Dir.getName()` (`fw.framework`) 
for that virtual `DirectoryEntry` as the canonical path
  - the `fw.framework/Modules/module.modulemap` file is not in the VFS overlay, 
and due to the wrong case, the underlying case-sensitive FW doesn't recognize 
it either
  - no matching module map means `#include ` is resolved to 
textual include
  - that header then includes other headers from the same FW using correct 
case: `#import `
  - the correctly-cased module map is found in the underlying FS and the 
framework gets compiled into a PCM file
  - at the end of dependency scan, scanner takes the correctly-cased module map 
path from the PCM file
  - scanner tries to canonicalize the module map path
- `Modules/module.modulemap` is chopped off and 
`FileManager::getCanonicalName()` is called on the framework directory
- `FileManager::getDirectory("FW.framework")` resolves to the same 
`DirectoryEntry` that `fw.framework` resolved to (VFS overlay is 
case-insensitive)
- `FileManager` looks in the cache of canonical names and finds 
`fw.framework` for that virtual `DirectoryEntry`
- `FW.framework/Modules/module.modulemap` is canonicalized to 
`fw.framework/Modules/module.modulemap`
  - scanner asserts that file exists, but it's not in the VFS nor in the 
underlying case-sensitive FS

This patch fixes this issue by returning the **canonical virtual path** for 
`fw.framework` at the very beginning. This is achieved by using the as-written 
spelling from the overlay file.

I can think of two other alternatives to this patch:

- In `ModuleMap::canonicalizeModuleMapPath()`, avoid canonicalization if the 
new module map path resolves to a different (or no) `FileEntry`.
- In `FileManager`, cache real paths based on the directory name, not the 
`DirectoryEntry`.

Both of these alternatives are workarounds, whereas the VFS fix seems fairly 
straightforward.

Note that I plan to add unit tests for this and add `// XFAIL: 
case-insensitive-filesystem` to the clang-scan-deps test if we get consensus on 
this approach.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135849/new/

https://reviews.llvm.org/D135849

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits