[PATCH] D75153: [ThinLTO] Allow usage of all SMT threads in the system

2020-10-11 Thread Nico Weber via Phabricator via cfe-commits
thakis added a comment.

https://clang.llvm.org/docs/ThinLTO.html#controlling-backend-parallelism should 
probably be updated to mention "all".
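Something like the following, perhaps (flag spellings are from this patch's 
summary; the exact doc wording is hypothetical):

  # Default: one thread per physical core.
  clang -flto=thin -fuse-ld=lld -Wl,--thinlto-jobs=0 ...
  # Use every hardware thread in the system, SMT/hyper-threads included.
  clang -flto=thin -fuse-ld=lld -Wl,--thinlto-jobs=all ...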


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75153/new/

https://reviews.llvm.org/D75153



[PATCH] D75153: [ThinLTO] Allow usage of all SMT threads in the system

2020-04-16 Thread Alexandre Ganea via Phabricator via cfe-commits
aganea added a comment.

In D75153#1987538, @MaskRay wrote:

> I remember I saw a related bugs.llvm.org report yesterday but I can't find it 
> now...


This? https://bugs.llvm.org/show_bug.cgi?id=45556


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75153/new/

https://reviews.llvm.org/D75153





[PATCH] D75153: [ThinLTO] Allow usage of all SMT threads in the system

2020-04-16 Thread Fangrui Song via Phabricator via cfe-commits
MaskRay added a comment.

In D75153#1942336, @aganea wrote:

> In D75153#1906580, @MaskRay wrote:
>
> > Does `taskset -c 0-3 lld -flavor ...` restrict the number of cores?
> >
> >   cpu_set_t cpu;
> >   sched_getaffinity(0, sizeof(cpu), &cpu);
> >   CPU_COUNT(&cpu)
>
>
> Thanks for raising this! This does not seem to work (I currently only have 
> WSL at hand, no real Linux machine). I don't think it worked before my patch. 
> The current code in LLVM reads as follows (note the "if" statement):
>
>   #if defined(HAVE_SCHED_GETAFFINITY) && defined(HAVE_CPU_COUNT)
>     cpu_set_t Set;
>     if (sched_getaffinity(0, sizeof(Set), &Set))
>       return CPU_COUNT(&Set);
>   #endif
>
>
>
>
> The doc (https://linux.die.net/man/2/sched_getaffinity) for 
> `sched_getaffinity` says:
>
> > On success, sched_setaffinity() and sched_getaffinity() return 0. On error, 
> > -1 is returned, and errno is set appropriately.
>
> So it would always fall back to `std::thread::hardware_concurrency`, which 
> apparently does not always take affinity into account, according to 
> @respindola (see rG8c0ff9508da5f02e8ce6580a126a2018c9bf702a).
>
> I'll write a follow-up patch to test affinity on Linux and Windows.


Created D78324 to fix this pre-existing issue. I remember I saw a related 
bugs.llvm.org report yesterday but I can't find it now...
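For reference, since `sched_getaffinity` returns 0 on success, the fix for the 
inverted check quoted above should look roughly like this (a sketch, not 
necessarily the exact D78324 diff):

  #if defined(HAVE_SCHED_GETAFFINITY) && defined(HAVE_CPU_COUNT)
    cpu_set_t Set;
    if (sched_getaffinity(0, sizeof(Set), &Set) == 0)
      return CPU_COUNT(&Set); // CPUs this process is allowed to run on
  #endif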


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75153/new/

https://reviews.llvm.org/D75153





[PATCH] D75153: [ThinLTO] Allow usage of all SMT threads in the system

2020-04-16 Thread Alexandre Ganea via Phabricator via cfe-commits
aganea added a comment.

In D75153#1987320, @phosek wrote:

> In D75153#1987272, @phosek wrote:
>
> > We've started seeing `llvm-cov` fail on our Linux bots with this error:
> >
> >   terminating with uncaught exception of type std::__2::system_error: thread constructor failed: Resource temporarily unavailable
> >
> > Specifically, we're running `llvm-cov export`, which uses 
> > `heavyweight_hardware_concurrency` (we use the default number of threads, 
> > i.e. `0`): 
> > https://github.com/llvm/llvm-project/blob/master/llvm/tools/llvm-cov/CoverageExporterJson.cpp#L169
> >
> > I'm not yet sure what the problem is, but bisecting is pointing at this 
> > change.
>
> Also, on runs that succeeded, we see execution times more than doubled.


This is caused by a mix of the previous change (rG8404aeb5, see 
https://github.com/llvm/llvm-project/commit/8404aeb56a73ab24f9b295111de3b37a37f0b841#diff-9c7f49c15e22d38241ccb9d294f880f4L950), 
which constrained the ThreadPool to the number of hardware threads, and this 
patch, which makes the number of threads in the ThreadPool limitless.

I will revert the behavior of `llvm-cov` to its previous state, and will check 
if there are other cases like this.
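A possible shape for the `llvm-cov` fix (a sketch under assumptions -- 
`NumThreads` and `SourceFiles` stand in for values the tool already has; the 
point is to bound the pool rather than leave it limitless):

  // Cap the worker count by the number of tasks and by the hardware; an
  // unbounded strategy can exhaust the OS thread limit.
  unsigned ThreadCount = NumThreads;
  if (ThreadCount == 0)
    ThreadCount = std::min<unsigned>(SourceFiles.size(),
                                     std::thread::hardware_concurrency());
  ThreadPool Pool(heavyweight_hardware_concurrency(ThreadCount));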


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75153/new/

https://reviews.llvm.org/D75153





[PATCH] D75153: [ThinLTO] Allow usage of all SMT threads in the system

2020-04-16 Thread Petr Hosek via Phabricator via cfe-commits
phosek added a comment.

In D75153#1987272, @phosek wrote:

> We've started seeing `llvm-cov` fail on our Linux bots with this error:
>
>   terminating with uncaught exception of type std::__2::system_error: thread constructor failed: Resource temporarily unavailable
>
> Specifically, we're running `llvm-cov export`, which uses 
> `heavyweight_hardware_concurrency` (we use the default number of threads, 
> i.e. `0`): 
> https://github.com/llvm/llvm-project/blob/master/llvm/tools/llvm-cov/CoverageExporterJson.cpp#L169
>
> I'm not yet sure what the problem is, but bisecting is pointing at this 
> change.

Also, on runs that succeeded, we see execution times more than doubled.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75153/new/

https://reviews.llvm.org/D75153





[PATCH] D75153: [ThinLTO] Allow usage of all SMT threads in the system

2020-04-16 Thread Petr Hosek via Phabricator via cfe-commits
phosek added a comment.
Herald added a reviewer: MaskRay.

We've started seeing `llvm-cov` fail on our Linux bots with this error:

  terminating with uncaught exception of type std::__2::system_error: thread constructor failed: Resource temporarily unavailable

Specifically, we're running `llvm-cov export`, which uses 
`heavyweight_hardware_concurrency` (we use the default number of threads, i.e. 
`0`): 
https://github.com/llvm/llvm-project/blob/master/llvm/tools/llvm-cov/CoverageExporterJson.cpp#L169

I'm not yet sure what the problem is, but bisecting is pointing at this change.
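For what it's worth, that message is what unbounded thread creation looks like 
in libc++: the `std::thread` constructor throws `std::system_error` with 
EAGAIN once the process hits the OS thread limit. A standalone illustration 
(not LLVM code):

  #include <thread>
  #include <vector>

  int main() {
    std::vector<std::thread> Workers;
    // Requesting far more threads than any OS allows eventually makes the
    // std::thread constructor throw "thread constructor failed: Resource
    // temporarily unavailable".
    for (int I = 0; I < 1000000; ++I)
      Workers.emplace_back([] {});
    for (auto &T : Workers)
      T.join();
  }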


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75153/new/

https://reviews.llvm.org/D75153





[PATCH] D75153: [ThinLTO] Allow usage of all SMT threads in the system

2020-03-28 Thread Alexandre Ganea via Phabricator via cfe-commits
aganea added a comment.

In D75153#1947956, @zero9178 wrote:

> This patch seems to break when disabling threading (i.e. LLVM_ENABLE_THREADS 
> == 0), as `get_threadpool_strategy` is undefined, causing linker failures in 
> clang (tested on Windows).


Should be fixed after rG3ab3f3c5d5825476dc1be15992f7c964629de688.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75153/new/

https://reviews.llvm.org/D75153





[PATCH] D75153: [ThinLTO] Allow usage of all SMT threads in the system

2020-03-28 Thread Alexandre Ganea via Phabricator via cfe-commits
aganea added a comment.

In D75153#1947956, @zero9178 wrote:

> This patch seems to break when disabling threading (i.e. LLVM_ENABLE_THREADS 
> == 0), as `get_threadpool_strategy` is undefined, causing linker failures in 
> clang (tested on Windows).


Taking a look now.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75153/new/

https://reviews.llvm.org/D75153





[PATCH] D75153: [ThinLTO] Allow usage of all SMT threads in the system

2020-03-28 Thread Markus Böck via Phabricator via cfe-commits
zero9178 added a comment.

This patch seems to break when disabling threading (i.e. LLVM_ENABLE_THREADS == 
0), as `get_threadpool_strategy` is undefined, causing linker failures in clang 
(tested on Windows).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75153/new/

https://reviews.llvm.org/D75153





[PATCH] D75153: [ThinLTO] Allow usage of all SMT threads in the system

2020-03-27 Thread Fangrui Song via Phabricator via cfe-commits
MaskRay added inline comments.



Comment at: lld/test/ELF/basic.s:252
 
-# RUN: not ld.lld %t --thinlto-jobs=0 2>&1 | FileCheck --check-prefix=NOTHREADSTHIN %s
-# RUN: not ld.lld %t --plugin-opt=jobs=0 2>&1 | FileCheck --check-prefix=NOTHREADSTHIN %s
-# NOTHREADSTHIN: --thinlto-jobs: number of threads must be > 0
+# RUN: ld.lld %t --thinlto-jobs=0 -verbose 2>&1 | FileCheck --check-prefix=THREADSTHIN %s
+# RUN: ld.lld %t --thinlto-jobs=1 -verbose 2>&1 | FileCheck --check-prefix=THREADSTHIN %s

This change is not needed. lto/thinlto.ll has already tested the functionality.

basic.s should also be split. I did this in 
34bdddf9a13cfdbbb5506dc89cf8e781be53105f



Comment at: lld/test/ELF/basic.s:253
+# RUN: ld.lld %t --thinlto-jobs=0 -verbose 2>&1 | FileCheck --check-prefix=THREADSTHIN %s
+# RUN: ld.lld %t --thinlto-jobs=1 -verbose 2>&1 | FileCheck --check-prefix=THREADSTHIN %s
+# RUN: ld.lld %t --thinlto-jobs=2 -verbose 2>&1 | FileCheck --check-prefix=THREADSTHIN %s

`-verbose` is not needed, because verbose just prints input filenames, which 
has nothing to do with `--thinlto-jobs=0`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75153/new/

https://reviews.llvm.org/D75153





[PATCH] D75153: [ThinLTO] Allow usage of all SMT threads in the system

2020-03-27 Thread Alexandre Ganea via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
aganea marked an inline comment as done.
Closed by commit rG09158252f777: [ThinLTO] Allow usage of all hardware threads 
in the system (authored by aganea).

Changed prior to commit:
  https://reviews.llvm.org/D75153?vs=248254&id=253116#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75153/new/

https://reviews.llvm.org/D75153

Files:
  clang/lib/Driver/ToolChains/CommonArgs.cpp
  clang/lib/Driver/ToolChains/CommonArgs.h
  clang/lib/Driver/ToolChains/Darwin.cpp
  lld/COFF/Config.h
  lld/COFF/Driver.cpp
  lld/COFF/LTO.cpp
  lld/ELF/Config.h
  lld/ELF/Driver.cpp
  lld/ELF/LTO.cpp
  lld/test/COFF/thinlto.ll
  lld/test/ELF/basic.s
  lld/test/ELF/lto/thinlto.ll
  lld/test/wasm/lto/thinlto.ll
  lld/wasm/Config.h
  lld/wasm/Driver.cpp
  lld/wasm/LTO.cpp
  llvm/include/llvm/LTO/LTO.h
  llvm/include/llvm/Support/Threading.h
  llvm/lib/LTO/LTO.cpp
  llvm/lib/Support/Threading.cpp
  llvm/lib/Support/Unix/Threading.inc
  llvm/lib/Support/Windows/Threading.inc
  llvm/tools/gold/gold-plugin.cpp
  llvm/tools/llvm-lto2/llvm-lto2.cpp

Index: llvm/tools/llvm-lto2/llvm-lto2.cpp
===
--- llvm/tools/llvm-lto2/llvm-lto2.cpp
+++ llvm/tools/llvm-lto2/llvm-lto2.cpp
@@ -68,9 +68,10 @@
"distributed backend case"));
 
 // Default to using all available threads in the system, but using only one
-// thread per core, as indicated by the usage of
-// heavyweight_hardware_concurrency() in the InProcessThinBackend constructor.
-static cl::opt<int> Threads("thinlto-threads", cl::init(0));
+// thread per core (no SMT).
+// Use -thinlto-threads=all to use hardware_concurrency() instead, which means
+// to use all hardware threads or cores in the system.
+static cl::opt<std::string> Threads("thinlto-threads");
 
 static cl::list<std::string> SymbolResolutions(
 "r",
@@ -286,7 +287,8 @@
 /* LinkedObjectsFile */ nullptr,
 /* OnWrite */ {});
   else
-    Backend = createInProcessThinBackend(Threads);
+    Backend = createInProcessThinBackend(
+        llvm::heavyweight_hardware_concurrency(Threads));
   LTO Lto(std::move(Conf), std::move(Backend));
 
   bool HasErrors = false;
Index: llvm/tools/gold/gold-plugin.cpp
===
--- llvm/tools/gold/gold-plugin.cpp
+++ llvm/tools/gold/gold-plugin.cpp
@@ -28,6 +28,7 @@
 #include "llvm/Support/MemoryBuffer.h"
 #include "llvm/Support/Path.h"
 #include "llvm/Support/TargetSelect.h"
+#include "llvm/Support/Threading.h"
 #include "llvm/Support/raw_ostream.h"
 #include <list>
 #include <map>
@@ -134,11 +135,9 @@
   };
   static OutputType TheOutputType = OT_NORMAL;
   static unsigned OptLevel = 2;
-  // Default parallelism of 0 used to indicate that user did not specify.
-  // Actual parallelism default value depends on implementation.
   // Currently only affects ThinLTO, where the default is the max cores in the
-  // system.
-  static unsigned Parallelism = 0;
+  // system. See llvm::get_threadpool_strategy() for acceptable values.
+  static std::string Parallelism;
   // Default regular LTO codegen parallelism (number of partitions).
   static unsigned ParallelCodeGenParallelismLevel = 1;
 #ifdef NDEBUG
@@ -272,8 +271,10 @@
 message(LDPL_FATAL, "Optimization level must be between 0 and 3");
   OptLevel = opt[1] - '0';
 } else if (opt.startswith("jobs=")) {
-  if (StringRef(opt_ + 5).getAsInteger(10, Parallelism))
-message(LDPL_FATAL, "Invalid parallelism level: %s", opt_ + 5);
+  StringRef Num(opt_ + 5);
+  if (!get_threadpool_strategy(Num))
+message(LDPL_FATAL, "Invalid parallelism level: %s", Num.data());
+  Parallelism = Num;
 } else if (opt.startswith("lto-partitions=")) {
   if (opt.substr(strlen("lto-partitions="))
   .getAsInteger(10, ParallelCodeGenParallelismLevel))
@@ -877,14 +878,15 @@
   Conf.PTO.LoopVectorization = options::OptLevel > 1;
   Conf.PTO.SLPVectorization = options::OptLevel > 1;
 
-  if (options::Parallelism)
-Backend = createInProcessThinBackend(options::Parallelism);
   if (options::thinlto_index_only) {
 std::string OldPrefix, NewPrefix;
 getThinLTOOldAndNewPrefix(OldPrefix, NewPrefix);
 Backend = createWriteIndexesThinBackend(OldPrefix, NewPrefix,
 options::thinlto_emit_imports_files,
 LinkedObjectsFile, OnIndexWrite);
+  } else {
+Backend = createInProcessThinBackend(
+llvm::heavyweight_hardware_concurrency(options::Parallelism));
   }
 
   Conf.OverrideTriple = options::triple;
Index: llvm/lib/Support/Windows/Threading.inc
===
--- llvm/lib/Support/Windows/Threading.inc
+++ llvm/lib/Support/Windows/Threading.inc

[PATCH] D75153: [ThinLTO] Allow usage of all SMT threads in the system

2020-03-25 Thread Alexandre Ganea via Phabricator via cfe-commits
aganea marked 3 inline comments as done.
aganea added a subscriber: respindola.
aganea added a comment.

In D75153#1906580, @MaskRay wrote:

> Does `taskset -c 0-3 lld -flavor ...` restrict the number of cores?
>
>   cpu_set_t cpu;
>   sched_getaffinity(0, sizeof(cpu), &cpu);
>   CPU_COUNT(&cpu)


Thanks for raising this! This does not seem to work (I currently only have WSL 
at hand, no real Linux machine). I don't think it worked before my patch. The 
current code in LLVM reads as follows (note the "if" statement):

  #if defined(HAVE_SCHED_GETAFFINITY) && defined(HAVE_CPU_COUNT)
    cpu_set_t Set;
    if (sched_getaffinity(0, sizeof(Set), &Set))
      return CPU_COUNT(&Set);
  #endif

The doc (https://linux.die.net/man/2/sched_getaffinity) for 
`sched_getaffinity` says:

> On success, sched_setaffinity() and sched_getaffinity() return 0. On error, 
> -1 is returned, and errno is set appropriately.

So it would always fall back to `std::thread::hardware_concurrency`, which 
apparently does not always take affinity into account, according to 
@respindola (see rG8c0ff9508da5f02e8ce6580a126a2018c9bf702a).

I'll write a follow-up patch to test affinity on Linux and Windows.




Comment at: llvm/include/llvm/Support/Threading.h:201
+  /// hardware core is used.
+  inline ThreadPoolStrategy heavyweight_hardware_concurrency(StringRef Num) {
+    Optional<ThreadPoolStrategy> S =

abrachet wrote:
> Nit: Remove `inline` 
> https://llvm.org/docs/CodingStandards.html#don-t-use-inline-when-defining-a-function-in-a-class-definition
After discussing offline with @abrachet, I'll leave the `inline` for now. It 
makes the symbol weak; removing `inline` would otherwise fail linking. I can 
move the function(s) to the .cpp after this patch, to save on link time.
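A minimal illustration of the constraint (generic C++, not the LLVM header 
itself): a non-template function defined in a header needs `inline` so that 
each translation unit's copy merges into one weak symbol at link time.

  // util.h
  inline int twice(int X) { return 2 * X; } // mergeable: OK in many TUs
  // Without 'inline', two TUs including this header would each emit a
  // strong definition of twice(), and the link would fail with a
  // duplicate-symbol error.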


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75153/new/

https://reviews.llvm.org/D75153





[PATCH] D75153: [ThinLTO] Allow usage of all SMT threads in the system

2020-03-04 Thread Alex Brachet via Phabricator via cfe-commits
abrachet added inline comments.



Comment at: llvm/include/llvm/Support/Threading.h:201
+  /// hardware core is used.
+  inline ThreadPoolStrategy heavyweight_hardware_concurrency(StringRef Num) {
+    Optional<ThreadPoolStrategy> S =

Nit: Remove `inline` 
https://llvm.org/docs/CodingStandards.html#don-t-use-inline-when-defining-a-function-in-a-class-definition


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75153/new/

https://reviews.llvm.org/D75153





[PATCH] D75153: [ThinLTO] Allow usage of all SMT threads in the system

2020-03-04 Thread Fangrui Song via Phabricator via cfe-commits
MaskRay added a comment.

Does `taskset -c 0-3 lld -flavor ...` restrict the number of cores?

  cpu_set_t cpu;
  sched_getaffinity(0, sizeof(cpu), &cpu);
  CPU_COUNT(&cpu)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75153/new/

https://reviews.llvm.org/D75153





[PATCH] D75153: [ThinLTO] Allow usage of all SMT threads in the system

2020-03-04 Thread Reid Kleckner via Phabricator via cfe-commits
rnk added a comment.

I think making the build system responsible for running the backend actions is 
the ideal solution, actually. The main reason we have all this threading logic 
in the linker is to make it easy for users of traditional build systems to use 
ThinLTO with a few flag flips. We haven't actually made build system changes 
for Chrome yet. Instead, we replaced the linker with a script that wraps the 
thin link + backend jobs + native link steps. https://crbug.com/877722 has a 
partial record of ideas we had and stuff we tried for Chrome.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75153/new/

https://reviews.llvm.org/D75153





[PATCH] D75153: [ThinLTO] Allow usage of all SMT threads in the system

2020-03-04 Thread Alexandre Ganea via Phabricator via cfe-commits
aganea added a comment.

Thanks Reid! I will leave this open for a few days, in case there is any other 
feedback.

As for `-fthinlto-index`, we have someone looking at distributing it remotely 
with Fastbuild. I think it is reasonable in the short term to let the build 
system handle that (I assume that's what you did). One adjacent question is how 
to address the size of the local cache folder (i.e. `lto.cache` when compiling 
LLVM), which grows big very quickly. Pruning periodically is fine, but I wonder 
if we could keep it compressed on disk. Maybe do 
`DeviceIoControl(...FSCTL_SET_COMPRESSION..)` and let Windows handle that? I'll 
take a look in the weeks to come.
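A rough sketch of that compression idea (Win32 only; error handling elided, 
and the cache path is a placeholder):

  #include <windows.h>
  #include <winioctl.h>

  void compressCacheDir(const char *Path) {
    // Opening a directory handle requires FILE_FLAG_BACKUP_SEMANTICS.
    HANDLE Dir = CreateFileA(Path, GENERIC_READ | GENERIC_WRITE,
                             FILE_SHARE_READ | FILE_SHARE_WRITE, nullptr,
                             OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, nullptr);
    if (Dir == INVALID_HANDLE_VALUE)
      return;
    USHORT Format = COMPRESSION_FORMAT_DEFAULT; // let NTFS pick the format
    DWORD Returned = 0;
    // Files created in the directory afterwards inherit the compression bit.
    DeviceIoControl(Dir, FSCTL_SET_COMPRESSION, &Format, sizeof(Format),
                    nullptr, 0, &Returned, nullptr);
    CloseHandle(Dir);
  }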


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75153/new/

https://reviews.llvm.org/D75153





[PATCH] D75153: [ThinLTO] Allow usage of all SMT threads in the system

2020-03-04 Thread Reid Kleckner via Phabricator via cfe-commits
rnk accepted this revision.
rnk added a comment.
This revision is now accepted and ready to land.

I'm busy and I haven't looked at the code in detail, but I'm OK with going back 
to the old way of doing things. I think honoring user requests to use more 
threads than cores is an important use case. We had plans at one point to add 
indirection points to allow us to distribute these backend actions to other 
machines. Essentially, this would be done by having LLD invoke `$mydiscc clang 
-c foo.o -fuse-thin-lto-index=foo.thinlto.bc`. We've gone a different 
direction, so that change is not likely to come soon, but it seems like a 
reasonable use case where one would want to pass -j LARGE and have it be 
honored.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75153/new/

https://reviews.llvm.org/D75153





[PATCH] D75153: [ThinLTO] Allow usage of all SMT threads in the system

2020-03-04 Thread Alexandre Ganea via Phabricator via cfe-commits
aganea updated this revision to Diff 248254.
aganea edited the summary of this revision.
aganea edited reviewers, added: dexonsmith, ruiu; removed: RobRich999, 
espindola.
aganea added subscribers: ruiu, RobRich999.
aganea added a comment.
Herald added a reviewer: espindola.

Simplify. Revert to previous behavior, where any number of threads can be 
specified on the cmd-line, as suggested by @mehdi_amini. See the updated 
Summary for cmd-line usage.

+ @dexonsmith for the Darwin part.
+ @ruiu for the LLD part.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75153/new/

https://reviews.llvm.org/D75153

Files:
  clang/lib/Driver/ToolChains/CommonArgs.cpp
  clang/lib/Driver/ToolChains/CommonArgs.h
  clang/lib/Driver/ToolChains/Darwin.cpp
  lld/COFF/Config.h
  lld/COFF/Driver.cpp
  lld/COFF/LTO.cpp
  lld/ELF/Config.h
  lld/ELF/Driver.cpp
  lld/ELF/LTO.cpp
  lld/test/COFF/thinlto.ll
  lld/test/ELF/basic.s
  lld/test/ELF/lto/thinlto.ll
  lld/test/wasm/lto/thinlto.ll
  lld/wasm/Config.h
  lld/wasm/Driver.cpp
  lld/wasm/LTO.cpp
  llvm/include/llvm/LTO/LTO.h
  llvm/include/llvm/Support/Threading.h
  llvm/lib/LTO/LTO.cpp
  llvm/lib/Support/Threading.cpp
  llvm/lib/Support/Windows/Threading.inc
  llvm/test/Transforms/PGOProfile/thinlto_samplepgo_icp3.ll
  llvm/tools/gold/gold-plugin.cpp
  llvm/tools/llvm-lto2/llvm-lto2.cpp

Index: llvm/tools/llvm-lto2/llvm-lto2.cpp
===
--- llvm/tools/llvm-lto2/llvm-lto2.cpp
+++ llvm/tools/llvm-lto2/llvm-lto2.cpp
@@ -66,9 +66,10 @@
"distributed backend case"));
 
 // Default to using all available threads in the system, but using only one
-// thread per core, as indicated by the usage of
-// heavyweight_hardware_concurrency() in the InProcessThinBackend constructor.
-static cl::opt<int> Threads("thinlto-threads", cl::init(0));
+// thread per core (no SMT).
+// Use -thinlto-threads=all to use hardware_concurrency() instead, which means
+// to use all hardware threads or cores in the system.
+static cl::opt<std::string> Threads("thinlto-threads");
 
 static cl::list<std::string> SymbolResolutions(
 "r",
@@ -284,7 +285,8 @@
 /* LinkedObjectsFile */ nullptr,
 /* OnWrite */ {});
   else
-    Backend = createInProcessThinBackend(Threads);
+    Backend = createInProcessThinBackend(
+        llvm::heavyweight_hardware_concurrency(Threads));
   LTO Lto(std::move(Conf), std::move(Backend));
 
   bool HasErrors = false;
Index: llvm/tools/gold/gold-plugin.cpp
===
--- llvm/tools/gold/gold-plugin.cpp
+++ llvm/tools/gold/gold-plugin.cpp
@@ -28,6 +28,7 @@
 #include "llvm/Support/MemoryBuffer.h"
 #include "llvm/Support/Path.h"
 #include "llvm/Support/TargetSelect.h"
+#include "llvm/Support/Threading.h"
 #include "llvm/Support/raw_ostream.h"
 #include <list>
 #include <map>
@@ -132,11 +133,9 @@
   };
   static OutputType TheOutputType = OT_NORMAL;
   static unsigned OptLevel = 2;
-  // Default parallelism of 0 used to indicate that user did not specify.
-  // Actual parallelism default value depends on implementation.
   // Currently only affects ThinLTO, where the default is the max cores in the
-  // system.
-  static unsigned Parallelism = 0;
+  // system. See llvm::get_threadpool_strategy() for acceptable values.
+  static std::string Parallelism;
   // Default regular LTO codegen parallelism (number of partitions).
   static unsigned ParallelCodeGenParallelismLevel = 1;
 #ifdef NDEBUG
@@ -270,8 +269,10 @@
 message(LDPL_FATAL, "Optimization level must be between 0 and 3");
   OptLevel = opt[1] - '0';
 } else if (opt.startswith("jobs=")) {
-  if (StringRef(opt_ + 5).getAsInteger(10, Parallelism))
-message(LDPL_FATAL, "Invalid parallelism level: %s", opt_ + 5);
+  StringRef Num(opt_ + 5);
+  if (!get_threadpool_strategy(Num))
+message(LDPL_FATAL, "Invalid parallelism level: %s", Num.data());
+  Parallelism = Num;
 } else if (opt.startswith("lto-partitions=")) {
   if (opt.substr(strlen("lto-partitions="))
   .getAsInteger(10, ParallelCodeGenParallelismLevel))
@@ -875,14 +876,15 @@
   Conf.PTO.LoopVectorization = options::OptLevel > 1;
   Conf.PTO.SLPVectorization = options::OptLevel > 1;
 
-  if (options::Parallelism)
-Backend = createInProcessThinBackend(options::Parallelism);
   if (options::thinlto_index_only) {
 std::string OldPrefix, NewPrefix;
 getThinLTOOldAndNewPrefix(OldPrefix, NewPrefix);
 Backend = createWriteIndexesThinBackend(OldPrefix, NewPrefix,
 options::thinlto_emit_imports_files,
 LinkedObjectsFile, OnIndexWrite);
+  } else {
+Backend = createInProcessThinBackend(
+llvm::heavyweight_hardware_concurrency(options::Parallelism));
   }
 
   Conf.OverrideTriple = options::triple;

[PATCH] D75153: [ThinLTO] Allow usage of all SMT threads in the system

2020-03-01 Thread Alexandre Ganea via Phabricator via cfe-commits
aganea added a comment.

@mehdi_amini Agreed. In that case, I just need to slightly change this function 
to ensure threads are equally dispatched on all CPU sockets, regardless of the 
thread strategy.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75153/new/

https://reviews.llvm.org/D75153





[PATCH] D75153: [ThinLTO] Allow usage of all SMT threads in the system

2020-03-01 Thread Mehdi AMINI via Phabricator via cfe-commits
mehdi_amini added a comment.

> /opt:lldltojobs=N -- limit usage to N threads, but constrained by usage of 
> heavyweight_hardware_concurrency().

I really dislike this behavior: this seems user-hostile to me. I would either:

- honor the user request (possibly displaying a warning); this is in line with 
other systems' behavior, like `ninja -j N` for instance.
- reject the user request

If you want such a behavior, then it should be another flag which expresses it 
in the name, like `/opt:lldltomaxnumjobs`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75153/new/

https://reviews.llvm.org/D75153





[PATCH] D75153: [ThinLTO] Allow usage of all SMT threads in the system

2020-02-27 Thread Robert Richmond via Phabricator via cfe-commits
RobRich999 added a comment.

Based upon the description, I think this patch is more broadly applicable than 
just targeting a specific AMD proc family, since it allows the end-user a 
choice for maximizing threading with both CMT and SMT on all supported 
platforms.

BTW, until if/when this patch lands, I just set a static value in source as a 
local workaround for now:

In llvm/lib/Support/Host.cpp:

  #elif defined(_WIN32)
  // Defined in llvm/lib/Support/Windows/Threading.inc
  static int computeHostNumPhysicalCores() { return 32; }
  #else


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75153/new/

https://reviews.llvm.org/D75153





[PATCH] D75153: [ThinLTO] Allow usage of all SMT threads in the system

2020-02-27 Thread Alexandre Ganea via Phabricator via cfe-commits
aganea added a comment.

If a simpler (more yucky?) patch is needed to fix what @thakis was suggesting 
in https://reviews.llvm.org/D71775#1891709, and if we don't want this extra new 
flag, we can also check the CPU brand for "AMD Opteron", and keep the old 
behavior in that case.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75153/new/

https://reviews.llvm.org/D75153





[PATCH] D75153: [ThinLTO] Allow usage of all SMT threads in the system

2020-02-25 Thread Alexandre Ganea via Phabricator via cfe-commits
aganea created this revision.
aganea added reviewers: tejohnson, thakis, rnk, RobRich999.
Herald added subscribers: cfe-commits, dang, dexonsmith, mikhail.ramalho, 
steven_wu, MaskRay, aheejin, hiraditya, arichardson, inglorion, sbc100, emaste.
Herald added a reviewer: espindola.
Herald added projects: clang, LLVM.

Before this patch, it wasn't possible to extend the ThinLTO threads to all 
SMT/CMT/hyper-threads in the system. Only one thread per core was allowed, 
enforced by the usage of `llvm::heavyweight_hardware_concurrency()` in the 
places where the ThinLTO objects are initialized.

Any number passed to the LLD flag `/opt:lldltojobs=...`, or any other 
ThinLTO-specific flag, would be interpreted in the context of 
`llvm::heavyweight_hardware_concurrency()`, which means SMT disabled.

After this patch, one can say in LLD: `/opt:lldltojobs=all` -- which means to 
ignore the `llvm::heavyweight_hardware_concurrency()` requirement, and instead 
use all SMT/CMT/hyper-threads in the system, which is equivalent to using 
`llvm::hardware_concurrency()`.

All cmd-line flags and code paths that lead to initializing ThinLTO have been 
modified by this patch: `-flto-jobs=...`, `-threads=...`, `--thinlto-jobs=...`, 
`-thinlto-threads=...`, `--plugin-opt=jobs=...`.

Summary
---

To sum up:
`/opt:lldltojobs=0` -- use all threads available, but constrained by usage of 
`heavyweight_hardware_concurrency()` or `hardware_concurrency()`.
`/opt:lldltojobs=N` -- limit usage to N threads, but constrained by usage of 
`heavyweight_hardware_concurrency()` or `hardware_concurrency()`.
`/opt:lldltojobs=all` -- use all threads available, regardless of usage of 
`heavyweight_hardware_concurrency()` or `hardware_concurrency()`.
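Hedged invocation sketches across the drivers touched by this patch (flag 
spellings as listed above; inputs and other flags are placeholders):

  lld-link /opt:lldltojobs=all ...    # COFF: all hardware threads, SMT included
  ld.lld --thinlto-jobs=all ...       # ELF: same behavior
  clang -flto=thin -flto-jobs=4 ...   # clang driver: cap ThinLTO backends at 4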


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D75153

Files:
  clang/lib/Driver/ToolChains/CommonArgs.cpp
  clang/lib/Driver/ToolChains/CommonArgs.h
  clang/lib/Driver/ToolChains/Darwin.cpp
  lld/COFF/Config.h
  lld/COFF/Driver.cpp
  lld/COFF/LTO.cpp
  lld/ELF/Config.h
  lld/ELF/Driver.cpp
  lld/ELF/LTO.cpp
  lld/test/COFF/thinlto.ll
  lld/test/ELF/basic.s
  lld/test/ELF/lto/thinlto.ll
  lld/test/wasm/lto/thinlto.ll
  lld/wasm/Config.h
  lld/wasm/Driver.cpp
  lld/wasm/LTO.cpp
  llvm/include/llvm/LTO/LTO.h
  llvm/lib/LTO/LTO.cpp
  llvm/test/Transforms/PGOProfile/thinlto_samplepgo_icp3.ll
  llvm/tools/gold/gold-plugin.cpp
  llvm/tools/llvm-lto2/llvm-lto2.cpp

Index: llvm/tools/llvm-lto2/llvm-lto2.cpp
===
--- llvm/tools/llvm-lto2/llvm-lto2.cpp
+++ llvm/tools/llvm-lto2/llvm-lto2.cpp
@@ -66,9 +66,10 @@
"distributed backend case"));
 
 // Default to using all available threads in the system, but using only one
-// thread per core, as indicated by the usage of
-// heavyweight_hardware_concurrency() in the InProcessThinBackend constructor.
-static cl::opt<int> Threads("thinlto-threads", cl::init(0));
+// thread per core (no SMT).
+// Use -thinlto-threads=all to use hardware_concurrency() instead, which means
+// to use all hardware threads or cores in the system.
+static cl::opt<std::string> Threads("thinlto-threads");
 
 static cl::list<std::string> SymbolResolutions(
 "r",
@@ -276,6 +277,16 @@
   Conf.PTO.LoopVectorization = Conf.OptLevel > 1;
   Conf.PTO.SLPVectorization = Conf.OptLevel > 1;
 
+  auto getStrategy = [](StringRef Num) {
+    if (Num == "all")
+      return llvm::hardware_concurrency();
+    if (Num.empty())
+      return ThreadPoolStrategy();
+    unsigned V;
+    Num.getAsInteger(10, V);
+    return llvm::heavyweight_hardware_concurrency(V);
+  };
+
   ThinBackend Backend;
   if (ThinLTODistributedIndexes)
 Backend = createWriteIndexesThinBackend(/* OldPrefix */ "",
@@ -284,7 +295,7 @@
 /* LinkedObjectsFile */ nullptr,
 /* OnWrite */ {});
   else
-    Backend = createInProcessThinBackend(Threads);
+    Backend = createInProcessThinBackend(getStrategy(Threads));
   LTO Lto(std::move(Conf), std::move(Backend));
 
   bool HasErrors = false;
Index: llvm/tools/gold/gold-plugin.cpp
===
--- llvm/tools/gold/gold-plugin.cpp
+++ llvm/tools/gold/gold-plugin.cpp
@@ -139,6 +139,7 @@
   static unsigned Parallelism = 0;
   // Default regular LTO codegen parallelism (number of partitions).
   static unsigned ParallelCodeGenParallelismLevel = 1;
+  static bool ParallelismHeavyWeight = true;
 #ifdef NDEBUG
   static bool DisableVerify = true;
 #else
@@ -270,7 +271,10 @@
 message(LDPL_FATAL, "Optimization level must be between 0 and 3");
   OptLevel = opt[1] - '0';
 } else if (opt.startswith("jobs=")) {
-  if (StringRef(opt_ + 5).getAsInteger(10, Parallelism))
+  StringRef Num(opt_ + 5);
+  if (Num == "all")
+ParallelismHeavyWeight = false;
+  else if (Num.getAsInteger(10, Parallelism))
 message(LDPL_FATAL, "Invalid parallelism