[PATCH] D78058: option to write files to memory instead of disk

2021-01-26 Thread Duncan P. N. Exon Smith via Phabricator via cfe-commits
dexonsmith added a comment.

In D78058#2481050 , @dexonsmith wrote:

> In D78058#2480411 , @dexonsmith 
> wrote:
>
>> In D78058#2471735 , @sammccall 
>> wrote:
>>
>>> @dexonsmith thanks for sharing!
>>> Some initial thoughts since abstracting outputs is something we're starting 
>>> to care about too...
>>
>> Thanks for looking.
>
> I think I've found a fairly clean way to get the write-through memory buffer 
> optimization with pluggable backends; still working through the other 
> feedback, but hoping to have the patch / RFC out soonish.

FYI, I posted an RFC here: 
https://lists.llvm.org/pipermail/cfe-dev/2021-January/067576.html


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78058/new/

https://reviews.llvm.org/D78058

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D78058: option to write files to memory instead of disk

2021-01-05 Thread Duncan P. N. Exon Smith via Phabricator via cfe-commits
dexonsmith added a comment.

In D78058#2480411 , @dexonsmith wrote:

> In D78058#2471735 , @sammccall wrote:
>
>> @dexonsmith thanks for sharing!
>> Some initial thoughts since abstracting outputs is something we're starting 
>> to care about too...
>
> Thanks for looking.

I think I've found a fairly clean way to get the write-through memory buffer 
optimization with pluggable backends; still working through the other feedback, 
but hoping to have the patch / RFC out soonish.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78058/new/

https://reviews.llvm.org/D78058

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D78058: option to write files to memory instead of disk

2021-01-05 Thread Duncan P. N. Exon Smith via Phabricator via cfe-commits
dexonsmith added a comment.

In D78058#2471735 , @sammccall wrote:

> @dexonsmith thanks for sharing!
> Some initial thoughts since abstracting outputs is something we're starting 
> to care about too...

Thanks for looking.

> This doesn't appear to be an extension point - you can write to an InMemoryFS 
> or to real disk, but not to anything else. If we're going to add abstractions 
> around output, I think they should support flexible use in libraries (which 
> doesn't need to cost complexity here: FS vs MemoryFS vs custom storage can 
> implement the same interface).
> With the right interface, I think this obviates the need for RequestedOutput 
> and generalizes it - a caller can install an intercepting output backend that 
> hooks certain paths and delegates the rest to the old backend.
> (Of course such a backend could be provided, but it gets the complexity out 
> of the way of the core abstractions)

I agree with your concerns with the monolithic design. My motivation for it is 
to get the memory optimization of using a write-through memory buffer when the 
output is needed both on-disk and in-memory. I imagine similar concerns if/when 
we add a CAS-based storage option, where it's likely more efficient to ask the 
CAS to materialize the on-disk file than to write it ourselves.

I'll think about whether there's a clean way to model these kinds of 
interactions between storage implementations without a monolithic design; let 
me know if you have any concrete ideas.

The option I see is to have an abstract extension point that allows libraries 
to plug in custom storage, but keep the monolithic design for on-disk and 
in-memory (and eventually CAS).

> I think the lifecycle and allowed operations on outputs would be clearer if 
> an `Output` class was returned, with methods, rather than a stream + handle 
> and various "free" functions (well, methods on OutputManager) than manipulate 
> the handle.
> This would again make it easier to reason about what operations are part of a 
> storage backend: there'd be some `OutputBackend` interface to implement, and 
> it would hand out `Output` objects which again must be implemented. (This 
> would be reminiscent of FileManager + vfs::FileSystem + vfs::File).

I'll think some more about the options here. The current design is motivated by 
how outputs are currently used / referenced in `CompilerInstance` (and how the 
streams are passed around separately to APIs that don't know anything about 
outputs); i.e., the current design seems simple to land as an incremental 
change. But it's probably not hard to make it a bit cleaner.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78058/new/

https://reviews.llvm.org/D78058

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D78058: option to write files to memory instead of disk

2020-12-26 Thread Sam McCall via Phabricator via cfe-commits
sammccall added a comment.

@dexonsmith thanks for sharing!
Some initial thoughts since abstracting outputs is something we're starting to 
care about too...

This doesn't appear to be an extension point - you can write to an InMemoryFS 
or to real disk, but not to anything else. If we're going to add abstractions 
around output, I think they should support flexible use in libraries (which 
doesn't need to cost complexity here: FS vs MemoryFS vs custom storage can 
implement the same interface).
With the right interface, I think this obviates the need for RequestedOutput 
and generalizes it - a caller can install an intercepting output backend that 
hooks certain paths and delegates the rest to the old backend.
(Of course such a backend could be provided, but it gets the complexity out of 
the way of the core abstractions)

I think the lifecycle and allowed operations on outputs would be clearer if an 
`Output` class was returned, with methods, rather than a stream + handle and 
various "free" functions (well, methods on OutputManager) than manipulate the 
handle.
This would again make it easier to reason about what operations are part of a 
storage backend: there'd be some `OutputBackend` interface to implement, and it 
would hand out `Output` objects which again must be implemented. (This would be 
reminiscent of FileManager + vfs::FileSystem + vfs::File).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78058/new/

https://reviews.llvm.org/D78058

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D78058: option to write files to memory instead of disk

2020-12-23 Thread Duncan P. N. Exon Smith via Phabricator via cfe-commits
dexonsmith added a comment.

In D78058#2464262 , @dexonsmith wrote:

> I'll reply here once I've posted the RFC and patch (as I said, I'm hoping 
> next week) so you can take a look.

I don't quite have my patch and RFC ready (and I may not until I'm back from 
holidays in January), but if you're curious the current state is at 
https://github.com/dexonsmith/llvm-project/tree/output-manager / 
https://github.com/dexonsmith/llvm-project/commit/924cb118272acab5f031c04f55465ae4799be4bf.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78058/new/

https://reviews.llvm.org/D78058

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D78058: option to write files to memory instead of disk

2020-12-18 Thread Duncan P. N. Exon Smith via Phabricator via cfe-commits
dexonsmith added subscribers: erik.pilkington, arphaman, Bigcheese.
dexonsmith added a comment.

I think this is important; thanks for working on it!

I'm sorry I missed it before... coincidentally, I'm working in a similar 
problem space right now (likely I'll post a patch with an RFC next week) but my 
design is a bit different.

The main abstraction I came up with is in two parts:

- An `OutputManager`, which can support opening / closing outputs with 
configurable destinations (kind of like a chain of consumers, but my initial 
design is monolithic to simplify optional optimizations like using a 
`WriteThroughMemoryBuffer` for mirroring a just-written on-disk file in 
memory). The expectation is that various `CompilerInstances` (and other things) 
could share one of these. This is not a file system; it just has very simple 
"open" / "close" / "erase" APIs for output paths (initially it has three 
possible destinations: on-disk, in-memory (store in an `InMemoryFileSystem`), 
and a request-provided `raw_pwrite_ostream`; it can do any or all of these).
- An `OutputContext`, which can be instantiated to track multiple outputs in a 
specific context. Each `CompilerInstance` would have one of these, effectively 
replacing the `std::list` it currently has (`clearOutputFiles` 
would call `OutputContext::closeAllOutputs`, which then calls 
`OutputManager::closeOutput` for each output its tracking). This can have 
partial configuration (partially overriding the manager's default 
configuration).

(Initially, I was thinking of a virtual base class `OutputManager` with 
subclasses `OnDiskOutputManager` and `InMemoryOutputManager`, and I didn't have 
`OutputContext` as a separate abstraction, but I don't think that gives us the 
flexibility we need for various use cases; see below.)

I also have:

- `OutputConfig`, configuration for a specific file; the default configuration 
lives on `OutputManager` (and is configurable).
- `PartialOutputConfig`, a partial configuration that overrides only some of a 
configuration; can be put on an `OutputContext` ("mirror all of my outputs in 
memory" or "don't write to disk, only to memory, for these outputs"), or put on 
an individual "open file" request ("the client needs seeking support for this 
output").

These are the use cases I think we should support:

- A "normal" clang invocation without implicit modules should just write to 
disk.
- If invoking clang from the command-line and implicit modules is turned on, 
the modules should be mirrored in memory (but also written to disk). My idea 
here is that VFS is constructed with an `OverlayFileSystem` at the top level 
and an initially-empty `InMemoryFileSystem` underneath in front of the "usual" 
one. The `OutputContext`s for the modules builds would be configured to mirror 
all outputs into the `InMemoryFileSystem`. (This is a step toward dropping the 
rather awkward `InMemoryModuleCache`; I think that will be easy once this 
lands.)
- Index services often don't want outputs written to disk. If they're happy 
passing in an `InMemoryFileSystem` to the `OutputManager`, they can use the 
same approach.
- When clang-scan-deps is used to do a modules pre-scan, its source-minimized 
modules should never hit the disk. The `OutputManager` can be configured to 
only fill up the `InMemoryFileSystem`.
- For unit tests or a `-cc1` compile-in-memory API where we care about a single 
output file, it might be nice to be able to ignore most files (write to 
`raw_null_ostream`), but get back specific file content and/or configure 
certain files in some special way.
- There are lots of other outputs in clang (e.g., index-while-building files 
from `-index-store-path`, which is still being upstreamed), which all do their 
own temp-file dance (... or worse, they don't), and they should be able to 
delete code and use this as well; I think even references to `errs()` should go 
through so this could be captured in unit tests (or an in-memory compile API, 
or...).
- It should be easy to cleanly add more output destinations (e.g., CAS) as we 
find the need, and `OutputContext` users would not need to know about specific 
destinations unless they wanted some special configuration.

I'll reply here once I've posted the RFC and patch (as I said, I'm hoping next 
week) so you can take a look.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78058/new/

https://reviews.llvm.org/D78058

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D78058: option to write files to memory instead of disk

2020-12-17 Thread Sam McCall via Phabricator via cfe-commits
sammccall added a comment.

It's great that you're looking at this! Clang tooling in general and clangd in 
particular a lot of use of VFS (particularly when deployed inside google), so 
I've got a fair bit of interest/experience in these interfaces. And we'd also 
looked at virtualizing module caches in particular for clangd (but haven't 
tackled it yet).

A couple of high-level design questions:

- if the goal is really to make module caching work without a physical disk, is 
virtualizing the FS underlying the cache really the right abstraction? Compared 
to an alternative like plugging in a ModuleCache interface, it seems like quite 
a complicated contract to have to maintain, and less flexible too (we've 
dreamed about ideas like a network-distributed cache, expiry, selectively 
sharing cached modules based on security context etc... easier if the objects 
are modules+metadata rather than filenames+bytes). Of course it's pretty 
generic and maybe can be used for other stuff too, but...
- if we *are* going to introduce a writable VFS concept to LLVM, I feel we must 
allow this to be reused for virtualizing file writes in general. (clang has 
several ad-hoc mechanisms for virtual file inputs that AIUI predate llvm::vfs, 
and they're a pain). I think at minimum we should ensure that the writable FS 
is an interface rather than just one implementation, and that we can implement 
it well for the real FS also. Does this seem reasonable?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78058/new/

https://reviews.llvm.org/D78058

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D78058: option to write files to memory instead of disk

2020-12-02 Thread Marc Rasi via Phabricator via cfe-commits
marcrasi added a comment.

Ping on this. Could anyone take a look?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78058/new/

https://reviews.llvm.org/D78058

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D78058: option to write files to memory instead of disk

2020-04-13 Thread Marc Rasi via Phabricator via cfe-commits
marcrasi created this revision.
Herald added subscribers: cfe-commits, jfb, mgorny.
Herald added a project: clang.
marcrasi edited the summary of this revision.

This allows clients to invoke clang without writing anything to disk.

I'm not sure if this is the right design, so I'm sending this patch to
start a discussion about the right way to do this. I'll be happy to make
changes and add tests after talking about the right way to do this!

Motivation:

At Google, we run some Swift source tooling on diskless servers. The
Swift source tooling calls Clang through the ClangImporter
(https://github.com/apple/swift/blob/master/lib/ClangImporter/ClangImporter.cpp).
Clang compiles module to an on-disk module cache, which does not work on
the diskless servers. This patch lets clients ask Clang to write the
module cache to memory.

Current design:

- Adds `class InMemoryOutputFileSystem : public llvm::vfs::FileSystem` that 
supports write operations backed by memory. After a file is written, its 
contents are accessible through the `llvm::vfs::FileSystem` interface.
- Adds an `InMemoryOutputFileSystem` field to `CompilerInstance`. When this 
field is set, the `CompilerInstance` writes to the `InMemoryOutputFileSystem` 
instead of the real file system.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D78058

Files:
  clang/include/clang/Basic/InMemoryOutputFileSystem.h
  clang/include/clang/Frontend/CompilerInstance.h
  clang/lib/Basic/CMakeLists.txt
  clang/lib/Basic/InMemoryOutputFileSystem.cpp
  clang/lib/Frontend/CompilerInstance.cpp

Index: clang/lib/Frontend/CompilerInstance.cpp
===
--- clang/lib/Frontend/CompilerInstance.cpp
+++ clang/lib/Frontend/CompilerInstance.cpp
@@ -642,6 +642,18 @@
 
 void CompilerInstance::clearOutputFiles(bool EraseFiles) {
   for (OutputFile  : OutputFiles) {
+if (InMemoryOutputFileSystem) {
+  assert(!OF.TempFilename.empty() &&
+ "InMemoryOutputFileSystem requires using temporary files");
+  if (EraseFiles) {
+InMemoryOutputFileSystem->DeleteTemporaryBuffer(OF.TempFilename);
+  } else {
+InMemoryOutputFileSystem->FinalizeTemporaryBuffer(OF.Filename,
+  OF.TempFilename);
+  }
+  continue;
+}
+
 if (!OF.TempFilename.empty()) {
   if (EraseFiles) {
 llvm::sys::fs::remove(OF.TempFilename);
@@ -728,6 +740,18 @@
 OutFile = "-";
   }
 
+  if (InMemoryOutputFileSystem) {
+assert(UseTemporary &&
+   "InMemoryOutputFileSystem requires using temporary files");
+auto stream =
+InMemoryOutputFileSystem->CreateTemporaryBuffer(OutFile, );
+if (ResultPathName)
+  *ResultPathName = OutFile;
+if (TempPathName)
+  *TempPathName = TempFile;
+return stream;
+  }
+
   std::unique_ptr OS;
   std::string OSFile;
 
@@ -1126,6 +1150,9 @@
ImportingInstance.getDiagnosticClient()),
  /*ShouldOwnClient=*/true);
 
+  Instance.setInMemoryOutputFileSystem(
+  ImportingInstance.getInMemoryOutputFileSystem());
+
   // Note that this module is part of the module build stack, so that we
   // can detect cycles in the module graph.
   Instance.setFileManager(());
@@ -1271,6 +1298,32 @@
 << Module->Name << SourceRange(ImportLoc, ModuleNameLoc);
   };
 
+  // If we're writing to an InMemoryOutputFileSystem, then immediately compile
+  // and read the module, rather than doing all the lockfile based locking logic
+  // below, because the InMemoryOutputFileSystem doesn't support lockfiles. This
+  // is okay because the locks are only necessary for performance, not
+  // correctness.
+  if (ImportingInstance.getInMemoryOutputFileSystem()) {
+if (!compileModule(ImportingInstance, ModuleNameLoc, Module,
+   ModuleFileName)) {
+  diagnoseBuildFailure();
+  return false;
+}
+
+// Try to read the module file, now that we've compiled it.
+ASTReader::ASTReadResult ReadResult =
+ImportingInstance.getASTReader()->ReadAST(
+ModuleFileName, serialization::MK_ImplicitModule, ImportLoc,
+ASTReader::ARR_None);
+
+if (ReadResult != ASTReader::Success) {
+  diagnoseBuildFailure();
+  return false;
+}
+
+return true;
+  }
+
   // FIXME: have LockFileManager return an error_code so that we can
   // avoid the mkdir when the directory already exists.
   StringRef Dir = llvm::sys::path::parent_path(ModuleFileName);
Index: clang/lib/Basic/InMemoryOutputFileSystem.cpp
===
--- /dev/null
+++ clang/lib/Basic/InMemoryOutputFileSystem.cpp
@@ -0,0 +1,56 @@
+//=== InMemoryOutputFileSystem.cpp - Collects outputs in memory -*- C++ -*-===//
+//
+// The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois