[PATCH] D69582: Let clang driver support parallel jobs

2020-02-05 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl marked an inline comment as done.
yaxunl added inline comments.



Comment at: clang/lib/Driver/Compilation.cpp:332
+if (!Next) {
+  std::this_thread::yield();
   continue;

aganea wrote:
> yaxunl wrote:
> > aganea wrote:
> > > In addition to what @thakis said above, yielding here may not be a good 
> > > idea. This causes the process to spin and remain in the OS's active 
> > > process list, which uselessly eats CPU cycles. This can become 
> > > significant over the course of several minutes of compilation.
> > > 
> > > Here's a //tiny// example of what happens when threads are waiting for 
> > > something to happen:
> > > (the top part yields frequently; the bottom part does not yield - see 
> > > D68820)
> > > {F10592208}
> > > 
> > > You would need here to go through an OS primitive that suspends the 
> > > process until at least one job in the pool completes. On Windows this can 
> > > be achieved through `WaitForMultipleObjects()` or I/O completion ports 
> > > like the ones provided by @thakis. You can take a look at 
> > > `Compilation::executeJobs()` in D52193 and, further down the line, 
> > > `WaitMany()`, which waits for at least one job/process to complete.
> > Sorry for the delay.
> > 
> > If D52193 is committed, I will probably only need some minor changes to 
> > support parallel compilation for HIP. Therefore I hope D52193 can get 
> > committed soon.
> > 
> > I am wondering what the current status of D52193 is and what is blocking 
> > it. Is there any chance to get it committed soon?
> > 
> > Thanks.
> Hi @yaxunl! Nothing prevents us from finishing D52193 :-) It was meant as a 
> prototype, but I could transform it into a more desirable state.
> I left it aside because we made another (unpublished) prototype, where the 
> process invocations were in fact collapsed into the calling process, i.e. run 
> in a thread pool in the manner of the recent `-fintegrated-cc1` flag. But 
> that requires `cl::opt` to support different contexts, as opposed to just 
> one global state ([[ 
> http://lists.llvm.org/pipermail/llvm-dev/2018-October/127039.html | an RFC 
> was discussed ]] about a year ago, but there was no consensus).
> Having a thread pool instead of the process pool is faster when compiling 
> .C/.CPP files with `clang-cl /MP`, but perhaps in your case that won't work; 
> you need to call external binaries, don't you? Binaries that are not part of 
> LLVM? If so, then landing D52193 first would make sense.
The HIP toolchain needs to launch executables other than clang during a 
compilation, therefore D52193 is more relevant to us. I believe this is also 
the case for CUDA, OpenMP, and probably more general situations involving 
linkers. I think both parallelism by threads and parallelism by processes are 
useful; however, parallelism by processes is probably more generic. Therefore 
landing D52193 first would benefit us a lot.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69582/new/

https://reviews.llvm.org/D69582



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69582: Let clang driver support parallel jobs

2020-02-05 Thread Alexandre Ganea via Phabricator via cfe-commits
aganea added inline comments.



Comment at: clang/lib/Driver/Compilation.cpp:332
+if (!Next) {
+  std::this_thread::yield();
   continue;

yaxunl wrote:
> aganea wrote:
> > In addition to what @thakis said above, yielding here may not be a good 
> > idea. This causes the process to spin and remain in the OS's active process 
> > list, which uselessly eats CPU cycles. This can become significant over the 
> > course of several minutes of compilation.
> > 
> > Here's a //tiny// example of what happens when threads are waiting for 
> > something to happen:
> > (the top part yields frequently; the bottom part does not yield - see 
> > D68820)
> > {F10592208}
> > 
> > You would need here to go through an OS primitive that suspends the process 
> > until at least one job in the pool completes. On Windows this can be 
> > achieved through `WaitForMultipleObjects()` or I/O completion ports like 
> > the ones provided by @thakis. You can take a look at 
> > `Compilation::executeJobs()` in D52193 and, further down the line, 
> > `WaitMany()`, which waits for at least one job/process to complete.
> Sorry for the delay.
> 
> If D52193 is committed, I will probably only need some minor changes to 
> support parallel compilation for HIP. Therefore I hope D52193 can get 
> committed soon.
> 
> I am wondering what the current status of D52193 is and what is blocking it. 
> Is there any chance to get it committed soon?
> 
> Thanks.
Hi @yaxunl! Nothing prevents us from finishing D52193 :-) It was meant as a 
prototype, but I could transform it into a more desirable state.
I left it aside because we made another (unpublished) prototype, where the 
process invocations were in fact collapsed into the calling process, i.e. run 
in a thread pool in the manner of the recent `-fintegrated-cc1` flag. But that 
requires `cl::opt` to support different contexts, as opposed to just one 
global state ([[ 
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127039.html | an RFC was 
discussed ]] about a year ago, but there was no consensus).
Having a thread pool instead of the process pool is faster when compiling 
.C/.CPP files with `clang-cl /MP`, but perhaps in your case that won't work; 
you need to call external binaries, don't you? Binaries that are not part of 
LLVM? If so, then landing D52193 first would make sense.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69582/new/

https://reviews.llvm.org/D69582





[PATCH] D69582: Let clang driver support parallel jobs

2020-02-04 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl marked an inline comment as done.
yaxunl added inline comments.



Comment at: clang/lib/Driver/Compilation.cpp:332
+if (!Next) {
+  std::this_thread::yield();
   continue;

aganea wrote:
> In addition to what @thakis said above, yielding here may not be a good 
> idea. This causes the process to spin and remain in the OS's active process 
> list, which uselessly eats CPU cycles. This can become significant over the 
> course of several minutes of compilation.
> 
> Here's a //tiny// example of what happens when threads are waiting for 
> something to happen:
> (the top part yields frequently; the bottom part does not yield - see D68820)
> {F10592208}
> 
> You would need here to go through an OS primitive that suspends the process 
> until at least one job in the pool completes. On Windows this can be achieved 
> through `WaitForMultipleObjects()` or I/O completion ports like the ones 
> provided by @thakis. You can take a look at `Compilation::executeJobs()` in 
> D52193 and, further down the line, `WaitMany()`, which waits for at least one 
> job/process to complete.
Sorry for the delay.

If D52193 is committed, I will probably only need some minor changes to support 
parallel compilation for HIP. Therefore I hope D52193 can get committed soon.

I am wondering what the current status of D52193 is and what is blocking it. Is 
there any chance to get it committed soon?

Thanks.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69582/new/

https://reviews.llvm.org/D69582





[PATCH] D69582: Let clang driver support parallel jobs

2019-12-05 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl updated this revision to Diff 232405.
yaxunl added a comment.

split the patch


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69582/new/

https://reviews.llvm.org/D69582

Files:
  clang/include/clang/Driver/Driver.h
  clang/include/clang/Driver/Job.h
  clang/include/clang/Driver/Options.td
  clang/lib/Driver/Compilation.cpp
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/Job.cpp
  clang/lib/Driver/ToolChains/Clang.cpp

Index: clang/lib/Driver/ToolChains/Clang.cpp
===
--- clang/lib/Driver/ToolChains/Clang.cpp
+++ clang/lib/Driver/ToolChains/Clang.cpp
@@ -6483,7 +6483,7 @@
   C.addCommand(std::make_unique<Command>(
   JA, *this,
   TCArgs.MakeArgString(getToolChain().GetProgramPath(getShortName())),
-  CmdArgs, None));
+  CmdArgs, Inputs));
 }
 
 void OffloadBundler::ConstructJobMultipleOutputs(
@@ -6549,7 +6549,7 @@
   C.addCommand(std::make_unique<Command>(
   JA, *this,
   TCArgs.MakeArgString(getToolChain().GetProgramPath(getShortName())),
-  CmdArgs, None));
+  CmdArgs, Inputs));
 }
 
 void OffloadWrapper::ConstructJob(Compilation &C, const JobAction &JA,
Index: clang/lib/Driver/Job.cpp
===
--- clang/lib/Driver/Job.cpp
+++ clang/lib/Driver/Job.cpp
@@ -39,9 +39,11 @@
 ArrayRef<InputInfo> Inputs)
 : Source(Source), Creator(Creator), Executable(Executable),
   Arguments(Arguments) {
-  for (const auto &II : Inputs)
+  for (const auto &II : Inputs) {
 if (II.isFilename())
   InputFilenames.push_back(II.getFilename());
+DependentActions.push_back(II.getAction());
+  }
 }
 
 /// Check if the compiler flag in question should be skipped when
Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -38,13 +38,14 @@
 #include "ToolChains/NaCl.h"
 #include "ToolChains/NetBSD.h"
 #include "ToolChains/OpenBSD.h"
-#include "ToolChains/PS4CPU.h"
 #include "ToolChains/PPCLinux.h"
+#include "ToolChains/PS4CPU.h"
 #include "ToolChains/RISCVToolchain.h"
 #include "ToolChains/Solaris.h"
 #include "ToolChains/TCE.h"
 #include "ToolChains/WebAssembly.h"
 #include "ToolChains/XCore.h"
+#include "clang/Basic/OptionUtils.h"
 #include "clang/Basic/Version.h"
 #include "clang/Config/config.h"
 #include "clang/Driver/Action.h"
@@ -55,6 +56,7 @@
 #include "clang/Driver/SanitizerArgs.h"
 #include "clang/Driver/Tool.h"
 #include "clang/Driver/ToolChain.h"
+#include "clang/Driver/Util.h"
 #include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallSet.h"
@@ -130,7 +132,7 @@
   CCLogDiagnostics(false), CCGenDiagnostics(false),
   TargetTriple(TargetTriple), CCCGenericGCCName(""), Saver(Alloc),
   CheckInputsExist(true), GenReproducer(false),
-  SuppressMissingInputWarning(false) {
+  SuppressMissingInputWarning(false), NumParallelJobs(1) {
 
   // Provide a sane fallback if no VFS is specified.
   if (!this->VFS)
@@ -1103,6 +1105,9 @@
   BitcodeEmbed = static_cast<BitcodeEmbedMode>(Model);
   }
 
+  setNumberOfParallelJobs(
+  getLastArgIntValue(Args, options::OPT_parallel_jobs_EQ, 1, Diags));
+
   std::unique_ptr<InputArgList> UArgs =
   std::make_unique<InputArgList>(std::move(Args));
 
Index: clang/lib/Driver/Compilation.cpp
===
--- clang/lib/Driver/Compilation.cpp
+++ clang/lib/Driver/Compilation.cpp
@@ -15,6 +15,7 @@
 #include "clang/Driver/Options.h"
 #include "clang/Driver/ToolChain.h"
 #include "clang/Driver/Util.h"
+#include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/None.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallVector.h"
@@ -25,8 +26,11 @@
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/raw_ostream.h"
 #include 
+#include 
+#include 
 #include 
 #include 
+#include 
 #include 
 
 using namespace clang;
@@ -220,22 +224,134 @@
   return !ActionFailed(&C.getSource(), FailingCommands);
 }
 
+namespace {
+class JobScheduler {
+public:
+  enum JobState { JS_WAIT, JS_RUN, JS_DONE, JS_FAIL };
+  JobScheduler(const JobList &Jobs, size_t NJobs = 1)
+  : Jobs(Jobs), NumJobs(NJobs) {
+#if !LLVM_ENABLE_THREADS
+NumJobs = 1;
+#endif
+for (auto &Job : Jobs) {
+  JState[&Job] = JS_WAIT;
+  for (const auto *AI : Job.getDependentActions()) {
+for (const auto *CI : ActToCmds[AI]) {
+  DependentCmds[&Job].push_back(CI);
+}
+  }
+  for (const auto *CI : ActToCmds[&Job.getSource()]) {
+DependentCmds[&Job].push_back(CI);
+  }
+  ActToCmds[&Job.getSource()].push_back(&Job);
+}
+  }
+  /// \return true if all jobs are done. Otherwise, \p Next contains the
+  /// next job ready to be executed if it is not a null pointer; otherwise
+  /// all jobs are running or waiting.
+  bool IsDone(const Command *&Next) {
+std::lock_guard<std::mutex> lock(Mutex);
+Next = nullptr;
+unsigned 

[PATCH] D69582: Let clang driver support parallel jobs

2019-10-31 Thread Alexandre Ganea via Phabricator via cfe-commits
aganea added a comment.

This is somewhat similar to what I was proposing in D52193.

Could you please provide tests and/or an example of your usage?




Comment at: clang/lib/Driver/Compilation.cpp:303
+}
+std::thread Th(Work);
+Th.detach();

thakis wrote:
> Maybe a select() / fork() loop is a better approach than spawning one thread 
> per subprocess? This is doing thread-level parallelism in addition to 
> process-level parallelism :)
> 
> If llvm doesn't have a subprocess pool abstraction yet, ninja's is pretty 
> small, self-contained, battle-tested and open-source, maybe you could copy 
> that over (and remove bits you don't need)?
> 
> https://github.com/ninja-build/ninja/blob/master/src/subprocess.h
> https://github.com/ninja-build/ninja/blob/master/src/subprocess-win32.cc
> https://github.com/ninja-build/ninja/blob/master/src/subprocess-posix.cc
@thakis How would this new `Subprocess` interface with the existing 
`llvm/include/llvm/Support/Program.h` APIs? Wouldn't it be better to simply 
extend what is already there with a `WaitMany()` and a `Terminate()` API, like 
I was suggesting in D52193? That would cover all that's needed. Or are you 
suggesting to further stub `ExecuteAndWait()` with this new `Subprocess` API?



Comment at: clang/lib/Driver/Compilation.cpp:332
+if (!Next) {
+  std::this_thread::yield();
   continue;

In addition to what @thakis said above, yielding here may not be a good idea. 
This causes the process to spin and remain in the OS's active process list, 
which uselessly eats CPU cycles. This can become significant over the course of 
several minutes of compilation.

Here's a //tiny// example of what happens when threads are waiting for 
something to happen:
(the top part yields frequently; the bottom part does not yield - see D68820)
{F10592208}

You would need here to go through an OS primitive that suspends the process 
until at least one job in the pool completes. On Windows this can be achieved 
through `WaitForMultipleObjects()` or I/O completion ports like the ones 
provided by @thakis. You can take a look at `Compilation::executeJobs()` in 
D52193 and, further down the line, `WaitMany()`, which waits for at least one 
job/process to complete.



Comment at: clang/lib/Driver/Compilation.cpp:354
+};
+JS.launch(Work);
   }

It's a waste to start a new thread here just because `ExecuteAndWait()` is used 
inside `Command::Execute()`. An async mechanism would be a lot better, as 
stated above.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69582/new/

https://reviews.llvm.org/D69582





[PATCH] D69582: Let clang driver support parallel jobs

2019-10-30 Thread Nico Weber via Phabricator via cfe-commits
thakis added inline comments.



Comment at: clang/lib/Driver/Compilation.cpp:303
+}
+std::thread Th(Work);
+Th.detach();

Maybe a select() / fork() loop is a better approach than spawning one thread 
per subprocess? This is doing thread-level parallelism in addition to 
process-level parallelism :)

If llvm doesn't have a subprocess pool abstraction yet, ninja's is pretty 
small, self-contained, battle-tested and open-source, maybe you could copy that 
over (and remove bits you don't need)?

https://github.com/ninja-build/ninja/blob/master/src/subprocess.h
https://github.com/ninja-build/ninja/blob/master/src/subprocess-win32.cc
https://github.com/ninja-build/ninja/blob/master/src/subprocess-posix.cc


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69582/new/

https://reviews.llvm.org/D69582





[PATCH] D69582: Let clang driver support parallel jobs

2019-10-30 Thread Reid Kleckner via Phabricator via cfe-commits
rnk added subscribers: aganea, amccarth, rnk.
rnk added a comment.

+@aganea @amccarth 
Users have been asking for /MP support in clang-cl for a while, which is 
basically this.

Is there anything in JobScheduler that could reasonably be moved down to 
llvm/lib/Support? I would also like to be able to use it to implement 
multi-process ThinLTO instead of multi-threaded ThinLTO.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69582/new/

https://reviews.llvm.org/D69582





[PATCH] D69582: Let clang driver support parallel jobs

2019-10-29 Thread Artem Belevich via Phabricator via cfe-commits
tra added a reviewer: echristo.
tra added a subscriber: echristo.
tra added a comment.

@echristo Eric, any thoughts/concerns on the overall direction for the driver?

@yaxunl One concern I have is diagnostics. When the jobs are executed in 
parallel, I assume all of them will go to the standard error and will be 
interleaved randomly across all parallel compilations. Figuring out what went 
wrong will be hard. Ideally we may want to collect output from individual 
sub-commands and print them once the job has finished, so there's no confusion 
about the source of the error.

> It is observed that device code compilation takes most of the compilation 
> time when clang compiles CUDA/HIP programs, since device code usually 
> contains complicated computation code. Often such code is highly coupled, 
> which results in a few large source files that become bottlenecks of a whole 
> project. Things become worse when such code is compiled for multiple GPU 
> archs, since clang compiles for each GPU arch sequentially. In practice, it 
> is common to compile for more than 5 GPU archs.

I think this change will only help with relatively small local builds with a 
few relatively large CUDA/HIP files. We did talk internally about parallelizing 
CUDA builds in the past and came to the conclusion that it's not very useful in 
practice, at least for us. We have a lot of non-CUDA files to compile, too, and 
that usually provides enough work for the build to hide the long CUDA 
compilations. Distributed builds (and I guess local ones, too) often assume one 
compilation per CPU, so launching multiple parallel subcompilations for each 
top-level job may not be that helpful in practice beyond manual compilation of 
one file. That said, the change will be a nice improvement for quick rebuilds 
where only one or a few CUDA files need to be recompiled. However, in that case 
being able to get comprehensible error messages would also be very important.

Overall I'm on the fence about this change. It may be more trouble than it's 
worth.




Comment at: clang/include/clang/Driver/Job.h:77
+  /// Dependent actions
+  llvm::SmallVector<const Action *, 4> DependentActions;
+

Nit: Using a pointer as a key will result in sub-compilations being done in a 
different order from run to run, and that may result in build results changing 
from run to run.

I can't think of a realistic scenario yet. One case where it may make a 
difference is the generation of dependency files.
We currently leak some output file name flags to device-side compilations. E.g. 
`-fsyntax-only -MD -MF foo.d` will write foo.d for each compilation. At best 
we'll end up with the result of whichever sub-compilation finished last. At 
worst we'll end up with corrupt output. In this case it's the output argument 
leak that's the problem, but I suspect there may be other cases where execution 
order will be observable.



Comment at: clang/lib/Driver/Compilation.cpp:284-286
+  }
+  }
+}

Indentation seems to be off. Run through clang-format?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69582/new/

https://reviews.llvm.org/D69582





[PATCH] D69582: Let clang driver support parallel jobs

2019-10-29 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl created this revision.
yaxunl added reviewers: tra, rsmith, rjmccall.
Herald added subscribers: jfb, mgorny.

It is observed that device code compilation takes most of the compilation time 
when clang compiles CUDA/HIP programs, since device code usually contains 
complicated computation code. Often such code is highly coupled, which results 
in a few large source files that become bottlenecks of a whole project. Things 
become worse when such code is compiled for multiple GPU archs, since clang 
compiles for each GPU arch sequentially. In practice, it is common to compile 
for more than 5 GPU archs.

To alleviate this issue, this patch implements a simple scheduler which lets 
the clang driver compile independent jobs in parallel.

This patch tries to minimize the impact on the existing clang driver: there are 
no changes to the action builder or the tool chains. It introduces a driver 
option -parallel-jobs=n to control the number of parallel jobs to launch. By 
default it is 1, which is NFC with respect to current clang driver behavior. If 
llvm/clang is built with LLVM_ENABLE_THREADS off, this change is also NFC.

The basic design of the scheduler is to find the dependencies among the jobs 
and use a thread to launch a job once the jobs it depends on are done.


https://reviews.llvm.org/D69582

Files:
  clang/include/clang/Basic/OptionUtils.h
  clang/include/clang/Driver/Driver.h
  clang/include/clang/Driver/Job.h
  clang/include/clang/Driver/Options.td
  clang/include/clang/Frontend/Utils.h
  clang/lib/Basic/CMakeLists.txt
  clang/lib/Basic/OptionUtils.cpp
  clang/lib/Driver/Compilation.cpp
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/Job.cpp
  clang/lib/Frontend/CompilerInvocation.cpp

Index: clang/lib/Frontend/CompilerInvocation.cpp
===
--- clang/lib/Frontend/CompilerInvocation.cpp
+++ clang/lib/Frontend/CompilerInvocation.cpp
@@ -3622,35 +3622,8 @@
   return llvm::APInt(64, code).toString(36, /*Signed=*/false);
 }
 
-template <typename IntTy>
-static IntTy getLastArgIntValueImpl(const ArgList &Args, OptSpecifier Id,
-IntTy Default,
-DiagnosticsEngine *Diags) {
-  IntTy Res = Default;
-  if (Arg *A = Args.getLastArg(Id)) {
-if (StringRef(A->getValue()).getAsInteger(10, Res)) {
-  if (Diags)
-Diags->Report(diag::err_drv_invalid_int_value) << A->getAsString(Args)
-   << A->getValue();
-}
-  }
-  return Res;
-}
-
 namespace clang {
 
-// Declared in clang/Frontend/Utils.h.
-int getLastArgIntValue(const ArgList &Args, OptSpecifier Id, int Default,
-   DiagnosticsEngine *Diags) {
-  return getLastArgIntValueImpl(Args, Id, Default, Diags);
-}
-
-uint64_t getLastArgUInt64Value(const ArgList &Args, OptSpecifier Id,
-   uint64_t Default,
-   DiagnosticsEngine *Diags) {
-  return getLastArgIntValueImpl(Args, Id, Default, Diags);
-}
-
 IntrusiveRefCntPtr<llvm::vfs::FileSystem>
 createVFSFromCompilerInvocation(const CompilerInvocation &CI,
 DiagnosticsEngine &Diags) {
Index: clang/lib/Driver/Job.cpp
===
--- clang/lib/Driver/Job.cpp
+++ clang/lib/Driver/Job.cpp
@@ -39,9 +39,11 @@
 ArrayRef<InputInfo> Inputs)
 : Source(Source), Creator(Creator), Executable(Executable),
   Arguments(Arguments) {
-  for (const auto &II : Inputs)
+  for (const auto &II : Inputs) {
 if (II.isFilename())
   InputFilenames.push_back(II.getFilename());
+DependentActions.push_back(II.getAction());
+  }
 }
 
 /// Check if the compiler flag in question should be skipped when
Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -37,13 +37,14 @@
 #include "ToolChains/NaCl.h"
 #include "ToolChains/NetBSD.h"
 #include "ToolChains/OpenBSD.h"
-#include "ToolChains/PS4CPU.h"
 #include "ToolChains/PPCLinux.h"
+#include "ToolChains/PS4CPU.h"
 #include "ToolChains/RISCVToolchain.h"
 #include "ToolChains/Solaris.h"
 #include "ToolChains/TCE.h"
 #include "ToolChains/WebAssembly.h"
 #include "ToolChains/XCore.h"
+#include "clang/Basic/OptionUtils.h"
 #include "clang/Basic/Version.h"
 #include "clang/Config/config.h"
 #include "clang/Driver/Action.h"
@@ -54,6 +55,7 @@
 #include "clang/Driver/SanitizerArgs.h"
 #include "clang/Driver/Tool.h"
 #include "clang/Driver/ToolChain.h"
+#include "clang/Driver/Util.h"
 #include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallSet.h"
@@ -129,7 +131,7 @@
   CCLogDiagnostics(false), CCGenDiagnostics(false),
   TargetTriple(TargetTriple), CCCGenericGCCName(""), Saver(Alloc),
   CheckInputsExist(true), GenReproducer(false),
-  SuppressMissingInputWarning(false) {
+  SuppressMissingInputWarning(false), NumParallelJobs(1) {
 
   // Provide