[PATCH] D88666: DirectoryWatcher: add an implementation for Windows

2020-10-04 Thread Saleem Abdulrasool via Phabricator via cfe-commits
compnerd updated this revision to Diff 296092.
compnerd added a comment.
Herald added a subscriber: mgorny.

Address feedback


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88666/new/

https://reviews.llvm.org/D88666

Files:
  clang/lib/DirectoryWatcher/windows/DirectoryWatcher-windows.cpp
  clang/unittests/DirectoryWatcher/CMakeLists.txt

Index: clang/unittests/DirectoryWatcher/CMakeLists.txt
===
--- clang/unittests/DirectoryWatcher/CMakeLists.txt
+++ clang/unittests/DirectoryWatcher/CMakeLists.txt
@@ -1,4 +1,4 @@
-if(APPLE OR CMAKE_SYSTEM_NAME MATCHES "Linux")
+if(APPLE OR CMAKE_SYSTEM_NAME MATCHES "Linux" OR CMAKE_SYSTEM_NAME STREQUAL Windows)
 
   set(LLVM_LINK_COMPONENTS
 Support
Index: clang/lib/DirectoryWatcher/windows/DirectoryWatcher-windows.cpp
===
--- clang/lib/DirectoryWatcher/windows/DirectoryWatcher-windows.cpp
+++ clang/lib/DirectoryWatcher/windows/DirectoryWatcher-windows.cpp
@@ -6,18 +6,12 @@
 //
 //===--===//
 
-// TODO: This is not yet an implementation, but it will make it so Windows
-//   builds don't fail.
-
 #include "DirectoryScanner.h"
 #include "clang/DirectoryWatcher/DirectoryWatcher.h"
-
 #include "llvm/ADT/STLExtras.h"
-#include "llvm/ADT/ScopeExit.h"
-#include "llvm/Support/AlignOf.h"
-#include "llvm/Support/Errno.h"
-#include "llvm/Support/Mutex.h"
+#include "llvm/Support/ConvertUTF.h"
 #include "llvm/Support/Path.h"
+#include "llvm/Support/Windows/WindowsSupport.h"
 #include 
 #include 
 #include 
@@ -26,25 +20,240 @@
 #include 
 #include 
 
+
 namespace {
 
+using DirectoryWatcherCallback =
+    std::function<void(llvm::ArrayRef<clang::DirectoryWatcher::Event>, bool)>;
+
 using namespace llvm;
 using namespace clang;
 
 class DirectoryWatcherWindows : public clang::DirectoryWatcher {
+  OVERLAPPED Overlapped;
+
+  alignas(DWORD)
+  CHAR Notifications[4 * (sizeof(FILE_NOTIFY_INFORMATION) + MAX_PATH * sizeof(WCHAR))];
+
+  std::thread WatcherThread;
+  std::thread HandlerThread;
+  std::function<void(llvm::ArrayRef<DirectoryWatcher::Event>, bool)> Callback;
+  SmallString Path;
+
+  class EventQueue {
+std::mutex M;
+std::queue<DirectoryWatcher::Event> Q;
+std::condition_variable CV;
+
+  public:
+void emplace(DirectoryWatcher::Event::EventKind Kind, StringRef Path) {
+  {
+std::unique_lock L(M);
+Q.emplace(Kind, Path);
+  }
+  CV.notify_one();
+}
+
+DirectoryWatcher::Event pop_front() {
+  std::unique_lock L(M);
+  while (true) {
+if (!Q.empty()) {
+  DirectoryWatcher::Event E = Q.front();
+  Q.pop();
+  return E;
+}
+CV.wait(L, [this]() { return !Q.empty(); });
+  }
+}
+  } Q;
+
 public:
-  ~DirectoryWatcherWindows() override { }
-  void InitialScan() { }
-  void EventReceivingLoop() { }
-  void StopWork() { }
+  DirectoryWatcherWindows(HANDLE DirectoryHandle, bool WaitForInitialSync,
+  DirectoryWatcherCallback Receiver);
+
+  ~DirectoryWatcherWindows() override;
+
+  void InitialScan();
 };
+
+DirectoryWatcherWindows::DirectoryWatcherWindows(
+HANDLE DirectoryHandle, bool WaitForInitialSync,
+DirectoryWatcherCallback Receiver)
+: Callback(Receiver) {
+  // Pre-compute the real location as we will be handing over the directory
+  // handle to the watcher and performing synchronous operations.
+  {
+DWORD Length = GetFinalPathNameByHandleW(DirectoryHandle, NULL, 0, 0);
+
+std::vector<WCHAR> Buffer;
+Buffer.reserve(Length);
+
+Length = GetFinalPathNameByHandleW(DirectoryHandle, Buffer.data(),
+   Buffer.capacity(), 0);
+Buffer.resize(Length);
+
+llvm::sys::windows::UTF16ToUTF8(Buffer.data(), Buffer.size(), Path);
+  }
+
+  memset(&Overlapped, 0, sizeof(Overlapped));
+  Overlapped.hEvent =
+  CreateEventW(NULL, /*bManualReset=*/TRUE, /*bInitialState=*/FALSE, NULL);
+  assert(Overlapped.hEvent && "unable to create event");
+
+  WatcherThread = std::thread([this, DirectoryHandle]() {
+while (true) {
+  // We do not guarantee subdirectories, but macOS already provides
+  // subdirectories, might as well as ...
+  BOOL WatchSubtree = TRUE;
+  DWORD NotifyFilter = FILE_NOTIFY_CHANGE_FILE_NAME
+ | FILE_NOTIFY_CHANGE_DIR_NAME
+ | FILE_NOTIFY_CHANGE_SIZE
+ | FILE_NOTIFY_CHANGE_LAST_ACCESS
+ | FILE_NOTIFY_CHANGE_LAST_WRITE
+ | FILE_NOTIFY_CHANGE_CREATION;
+
+  DWORD BytesTransferred;
+  if (!ReadDirectoryChangesW(DirectoryHandle, Notifications,
+ sizeof(Notifications), WatchSubtree,
+ NotifyFilter, &BytesTransferred, &Overlapped,
+ NULL)) {
+Q.emplace(DirectoryWatcher::Event::EventKind::WatcherGotInvalidated,
+  "");
+
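
The `EventQueue` in the patch is a mutex-plus-condition-variable blocking queue. Below is a minimal, portable sketch of that pattern, not the patch's code: the `Event` struct is a hypothetical stand-in for `DirectoryWatcher::Event`, and the integer `Kind` is an assumption made so the example builds without clang headers.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Hypothetical stand-in for DirectoryWatcher::Event.
struct Event {
  int Kind;
  std::string Path;
};

class EventQueue {
  std::mutex M;
  std::queue<Event> Q;
  std::condition_variable CV;

public:
  // Producer side: enqueue under the lock, notify after releasing it.
  void emplace(int Kind, std::string Path) {
    {
      std::lock_guard<std::mutex> L(M);
      Q.push(Event{Kind, std::move(Path)});
    }
    CV.notify_one();
  }

  // Consumer side: block until an event is available, then dequeue it.
  Event pop_front() {
    std::unique_lock<std::mutex> L(M);
    // The predicate overload re-checks the condition after every wakeup,
    // so spurious wakeups are handled without an explicit outer loop.
    CV.wait(L, [this] { return !Q.empty(); });
    Event E = std::move(Q.front());
    Q.pop();
    return E;
  }
};
```

Because `wait` with a predicate already loops internally, the `while (true)` around it in the patch is probably redundant, though harmless.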

[PATCH] D88666: DirectoryWatcher: add an implementation for Windows

2020-10-04 Thread Saleem Abdulrasool via Phabricator via cfe-commits
compnerd added inline comments.



Comment at: clang/lib/DirectoryWatcher/windows/DirectoryWatcher-windows.cpp:87
+DWORD dwLength = GetFinalPathNameByHandleW(hDirectory, NULL, 0, 0);
+std::unique_ptr<WCHAR[]> buffer{new WCHAR[dwLength + 1]};
+(void)GetFinalPathNameByHandleW(hDirectory, buffer.get(), dwLength + 1, 0);

compnerd wrote:
> amccarth wrote:
> > compnerd wrote:
> > > aaron.ballman wrote:
> > > > Is a smart pointer required here or could you use `std::vector` 
> > > > and reserve the space that way?
> > > Sure, I can convert this to a `std::vector` instead.
> > * I guess it's fine to use the array form of `std::unique_ptr` (but then 
> > you should `#include <memory>`).  If it were me, I'd probably just use a 
> > `std::wstring` or `std::vector`.
> > 
> > * `dwLength` already includes the size of the null terminator.  Your first 
> > `GetFinalPathNameByHandleW` call "fails" because the buffer is too 
> > small.  The docs say that, if it fails because the buffer is too small, 
> > then the return value is the required size _including_ the null terminator. 
> >  (In the success case, it's the size w/o the terminator.)
> > 
> > * I know this is the Windows-specific implementation, but it might be best 
> > to just use the Support API `realPathFromHandle`, which does this and has 
> > tests.
> I didn't know about `realPathFromHandle` - I prefer that actually.
Actually, `realPathFromHandle` is private to `Path.cpp` :-(
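
The size convention described in the review can be sketched portably. This is an illustration under stated assumptions, not the patch's code: `queryPath` is a hypothetical stand-in that only mimics the return-value behaviour attributed to `GetFinalPathNameByHandleW` in the comment above, and the path it returns is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical stand-in mimicking the convention described in the review:
// if the buffer is too small, return the required size INCLUDING the null
// terminator; on success, return the characters written EXCLUDING it.
inline std::size_t queryPath(wchar_t *Buffer, std::size_t Capacity) {
  static const wchar_t Path[] = L"C:\\watched\\dir";
  const std::size_t Length = std::wcslen(Path);
  if (Capacity < Length + 1)
    return Length + 1;
  std::wmemcpy(Buffer, Path, Length + 1);
  return Length;
}

inline std::wstring readPath() {
  // First call: no buffer, so the result already counts the terminator.
  const std::size_t Required = queryPath(nullptr, 0);

  // Size the vector (rather than only reserving capacity) so the elements
  // the callee writes are part of the vector; no extra "+ 1" is needed.
  std::vector<wchar_t> Buffer(Required);

  // Second call: succeeds and reports the length without the terminator.
  const std::size_t Written = queryPath(Buffer.data(), Buffer.size());
  assert(Written + 1 == Required && "size convention violated");
  return std::wstring(Buffer.data(), Written);
}
```

Sizing the vector up front matters: writing through `data()` into capacity that was only reserved, and then calling `resize`, would value-initialize those elements and discard what was written.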


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88666/new/

https://reviews.llvm.org/D88666

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D88666: DirectoryWatcher: add an implementation for Windows

2020-10-04 Thread Saleem Abdulrasool via Phabricator via cfe-commits
compnerd marked 14 inline comments as done.
compnerd added a comment.

There already is testing coverage for this - I just missed the CMake changes.




Comment at: clang/lib/DirectoryWatcher/windows/DirectoryWatcher-windows.cpp:36
+  alignas(DWORD)
+  CHAR Buffer[4 * (sizeof(FILE_NOTIFY_INFORMATION) + MAX_PATH * sizeof(WCHAR))];
+

amccarth wrote:
> If it were me, I'd probably make this a `std::vector`.
> 
> * If an off-by-one bug causes an overrun of one WCHAR, you could trash a 
> crucial member variable.  On the heap, the damage is less likely to be 
> catastrophic.
> * You wouldn't need `alignas`.
> * I don't think these are created in a tight loop, so the overhead doesn't 
> concern me.
> 
> Also, I'd probably go with a slightly more descriptive name, like 
> `Notifications` rather than `Buffer`.
The `alignas` is because the documentation states that the buffer should be 
DWORD aligned.  It is more for pedantic reasons rather than anything else.  I 
think that making it a catastrophic failure is a good thing though - it would 
catch the error :)  You are correct about the allocation - it is once per 
watch.  I'll rename it at least.
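
The two options weighed above (an aligned member array versus heap storage) can be compared in a small portable sketch. The names here are hypothetical: `Dword` is `std::uint32_t` standing in for the Windows `DWORD` type, and the 4096-byte size is arbitrary.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// std::uint32_t stands in for DWORD (an assumption so the sketch builds on
// any platform).
using Dword = std::uint32_t;

// Option A: fixed member array. alignas is needed because a char array has
// alignment 1 by default.
struct InlineBuffer {
  alignas(Dword) char Bytes[4096];
};

// Option B: heap storage sized in Dword units. The element type guarantees
// Dword alignment, so no alignas is needed.
struct HeapBuffer {
  std::vector<Dword> Words = std::vector<Dword>(4096 / sizeof(Dword));
  char *bytes() { return reinterpret_cast<char *>(Words.data()); }
  std::size_t size() const { return Words.size() * sizeof(Dword); }
};

// True if P is suitably aligned for a Dword.
inline bool isDwordAligned(const void *P) {
  return reinterpret_cast<std::uintptr_t>(P) % alignof(Dword) == 0;
}
```

Either form satisfies a DWORD-alignment requirement; the trade-off discussed in the thread is where an overrun would land, not whether the alignment holds.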



Comment at: clang/lib/DirectoryWatcher/windows/DirectoryWatcher-windows.cpp:82
+DirectoryWatcherCallback Receiver)
+: Callback(Receiver) {
+  // Pre-compute the real location as we will be handing over the directory

amccarth wrote:
> There's a lot going on in this constructor.  Is this how the other 
> implementations are arranged?
> 
> Would it make sense to just initialize the object, and save most of the 
> actual work to a `Watch` method?
Largely the same.  However, the majority of the "work" is actually the thread 
proc for the two threads.



Comment at: clang/lib/DirectoryWatcher/windows/DirectoryWatcher-windows.cpp:87
+DWORD dwLength = GetFinalPathNameByHandleW(hDirectory, NULL, 0, 0);
+std::unique_ptr<WCHAR[]> buffer{new WCHAR[dwLength + 1]};
+(void)GetFinalPathNameByHandleW(hDirectory, buffer.get(), dwLength + 1, 0);

amccarth wrote:
> compnerd wrote:
> > aaron.ballman wrote:
> > > Is a smart pointer required here or could you use `std::vector` 
> > > and reserve the space that way?
> > Sure, I can convert this to a `std::vector` instead.
> * I guess it's fine to use the array form of `std::unique_ptr` (but then you 
> should `#include <memory>`).  If it were me, I'd probably just use a 
> `std::wstring` or `std::vector`.
> 
> * `dwLength` already includes the size of the null terminator.  Your first 
> `GetFinalPathNameByHandleW` call "fails" because the buffer is too small. 
>  The docs say that, if it fails because the buffer is too small, then the 
> return value is the required size _including_ the null terminator.  (In the 
> success case, it's the size w/o the terminator.)
> 
> * I know this is the Windows-specific implementation, but it might be best to 
> just use the Support API `realPathFromHandle`, which does this and has tests.
I didn't know about `realPathFromHandle` - I prefer that actually.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88666/new/

https://reviews.llvm.org/D88666



[PATCH] D88550: [HIP] Fix -fgpu-allow-device-init option

2020-10-04 Thread Yaxun Liu via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rGe372c1d7624e: [HIP] Fix -fgpu-allow-device-init option 
(authored by yaxunl).
Herald added a project: clang.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88550/new/

https://reviews.llvm.org/D88550

Files:
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Driver/ToolChains/HIP.cpp
  clang/test/Driver/hip-options.hip


Index: clang/test/Driver/hip-options.hip
===
--- clang/test/Driver/hip-options.hip
+++ clang/test/Driver/hip-options.hip
@@ -9,6 +9,11 @@
 // CHECK: clang{{.*}}" "-cc1" {{.*}} "-fcuda-is-device"
 // CHECK-SAME: "--gpu-max-threads-per-block=1024"
 
+// RUN: %clang -### -nogpuinc -nogpulib -fgpu-allow-device-init \
+// RUN:   %s 2>&1 | FileCheck -check-prefix=DEVINIT %s
+// DEVINIT: clang{{.*}}" "-cc1" {{.*}}"-fgpu-allow-device-init"
+// DEVINIT: clang{{.*}}" "-cc1" {{.*}}"-fgpu-allow-device-init"
+
 // RUN: %clang -### -x hip -target x86_64-pc-windows-msvc -fms-extensions \
 // RUN:   -mllvm -amdgpu-early-inline-all=true  %s 2>&1 | \
 // RUN:   FileCheck -check-prefix=MLLVM %s
Index: clang/lib/Driver/ToolChains/HIP.cpp
===
--- clang/lib/Driver/ToolChains/HIP.cpp
+++ clang/lib/Driver/ToolChains/HIP.cpp
@@ -268,10 +268,6 @@
 CC1Args.push_back(DriverArgs.MakeArgStringRef(ArgStr));
   }
 
-  if (DriverArgs.hasFlag(options::OPT_fgpu_allow_device_init,
- options::OPT_fno_gpu_allow_device_init, false))
-CC1Args.push_back("-fgpu-allow-device-init");
-
   CC1Args.push_back("-fcuda-allow-variadic-functions");
 
   // Default to "hidden" visibility, as object level linking will not be
Index: clang/lib/Driver/ToolChains/Clang.cpp
===
--- clang/lib/Driver/ToolChains/Clang.cpp
+++ clang/lib/Driver/ToolChains/Clang.cpp
@@ -5476,9 +5476,14 @@
   // Forward -cl options to -cc1
   RenderOpenCLOptions(Args, CmdArgs);
 
-  if (IsHIP && Args.hasFlag(options::OPT_fhip_new_launch_api,
-options::OPT_fno_hip_new_launch_api, true))
-CmdArgs.push_back("-fhip-new-launch-api");
+  if (IsHIP) {
+if (Args.hasFlag(options::OPT_fhip_new_launch_api,
+ options::OPT_fno_hip_new_launch_api, true))
+  CmdArgs.push_back("-fhip-new-launch-api");
+if (Args.hasFlag(options::OPT_fgpu_allow_device_init,
+ options::OPT_fno_gpu_allow_device_init, false))
+  CmdArgs.push_back("-fgpu-allow-device-init");
+  }
 
   if (Arg *A = Args.getLastArg(options::OPT_fcf_protection_EQ)) {
 CmdArgs.push_back(



[clang] e372c1d - [HIP] Fix -fgpu-allow-device-init option

2020-10-04 Thread Yaxun Liu via cfe-commits

Author: Yaxun (Sam) Liu
Date: 2020-10-04T22:13:05-04:00
New Revision: e372c1d7624e2402a5f91a640780fb32649922b6

URL: 
https://github.com/llvm/llvm-project/commit/e372c1d7624e2402a5f91a640780fb32649922b6
DIFF: 
https://github.com/llvm/llvm-project/commit/e372c1d7624e2402a5f91a640780fb32649922b6.diff

LOG: [HIP] Fix -fgpu-allow-device-init option

The option needs to be passed to both host and device compilation.

Differential Revision: https://reviews.llvm.org/D88550

Added: 


Modified: 
clang/lib/Driver/ToolChains/Clang.cpp
clang/lib/Driver/ToolChains/HIP.cpp
clang/test/Driver/hip-options.hip

Removed: 




diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 272a49899012..f6eeb53964a7 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -5476,9 +5476,14 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
   // Forward -cl options to -cc1
   RenderOpenCLOptions(Args, CmdArgs);
 
-  if (IsHIP && Args.hasFlag(options::OPT_fhip_new_launch_api,
-options::OPT_fno_hip_new_launch_api, true))
-CmdArgs.push_back("-fhip-new-launch-api");
+  if (IsHIP) {
+if (Args.hasFlag(options::OPT_fhip_new_launch_api,
+ options::OPT_fno_hip_new_launch_api, true))
+  CmdArgs.push_back("-fhip-new-launch-api");
+if (Args.hasFlag(options::OPT_fgpu_allow_device_init,
+ options::OPT_fno_gpu_allow_device_init, false))
+  CmdArgs.push_back("-fgpu-allow-device-init");
+  }
 
   if (Arg *A = Args.getLastArg(options::OPT_fcf_protection_EQ)) {
 CmdArgs.push_back(

diff --git a/clang/lib/Driver/ToolChains/HIP.cpp b/clang/lib/Driver/ToolChains/HIP.cpp
index f1044f316fc8..4d1e0f9f2fdf 100644
--- a/clang/lib/Driver/ToolChains/HIP.cpp
+++ b/clang/lib/Driver/ToolChains/HIP.cpp
@@ -268,10 +268,6 @@ void HIPToolChain::addClangTargetOptions(
 CC1Args.push_back(DriverArgs.MakeArgStringRef(ArgStr));
   }
 
-  if (DriverArgs.hasFlag(options::OPT_fgpu_allow_device_init,
- options::OPT_fno_gpu_allow_device_init, false))
-CC1Args.push_back("-fgpu-allow-device-init");
-
   CC1Args.push_back("-fcuda-allow-variadic-functions");
 
   // Default to "hidden" visibility, as object level linking will not be

diff --git a/clang/test/Driver/hip-options.hip b/clang/test/Driver/hip-options.hip
index a7a6e02a3c81..fa7b019e5762 100644
--- a/clang/test/Driver/hip-options.hip
+++ b/clang/test/Driver/hip-options.hip
@@ -9,6 +9,11 @@
 // CHECK: clang{{.*}}" "-cc1" {{.*}} "-fcuda-is-device"
 // CHECK-SAME: "--gpu-max-threads-per-block=1024"
 
+// RUN: %clang -### -nogpuinc -nogpulib -fgpu-allow-device-init \
+// RUN:   %s 2>&1 | FileCheck -check-prefix=DEVINIT %s
+// DEVINIT: clang{{.*}}" "-cc1" {{.*}}"-fgpu-allow-device-init"
+// DEVINIT: clang{{.*}}" "-cc1" {{.*}}"-fgpu-allow-device-init"
+
 // RUN: %clang -### -x hip -target x86_64-pc-windows-msvc -fms-extensions \
 // RUN:   -mllvm -amdgpu-early-inline-all=true  %s 2>&1 | \
 // RUN:   FileCheck -check-prefix=MLLVM %s
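
The effect of moving the check out of the HIP toolchain can be sketched with a simplified model. This is an illustration only: `hasFlag` below is a hypothetical stand-in for a "last one wins" flag query, and `buildCC1Args` is an invented helper, not a driver function.

```cpp
#include <string>
#include <vector>

// Simplified "last one wins" query: true if Pos is the last of the two
// flags on the command line, false if Neg is, Default if neither appears.
inline bool hasFlag(const std::vector<std::string> &Args,
                    const std::string &Pos, const std::string &Neg,
                    bool Default) {
  for (auto It = Args.rbegin(); It != Args.rend(); ++It) {
    if (*It == Pos)
      return true;
    if (*It == Neg)
      return false;
  }
  return Default;
}

// Builds the cc1 arguments for one compilation (host or device). The flag
// check is not tied to the device side, so both compilations receive it.
inline std::vector<std::string>
buildCC1Args(const std::vector<std::string> &DriverArgs, bool IsDevice) {
  std::vector<std::string> CC1 = {"-cc1"};
  if (IsDevice)
    CC1.push_back("-fcuda-is-device");
  if (hasFlag(DriverArgs, "-fgpu-allow-device-init",
              "-fno-gpu-allow-device-init", /*Default=*/false))
    CC1.push_back("-fgpu-allow-device-init");
  return CC1;
}
```

This is also why the test repeats the `DEVINIT` check line: the option is expected once in the device cc1 invocation and once in the host one.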





[PATCH] D88730: [HIP] Fix default output file for -E

2020-10-04 Thread Yaxun Liu via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG5b551b79d3bb: [HIP] Fix default output file for -E (authored 
by yaxunl).
Herald added a project: clang.

Changed prior to commit:
  https://reviews.llvm.org/D88730?vs=295799=296088#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88730/new/

https://reviews.llvm.org/D88730

Files:
  clang/lib/Driver/Driver.cpp
  clang/test/Driver/hip-output-file-name.hip


Index: clang/test/Driver/hip-output-file-name.hip
===
--- clang/test/Driver/hip-output-file-name.hip
+++ clang/test/Driver/hip-output-file-name.hip
@@ -7,3 +7,45 @@
 // RUN: 2>&1 | FileCheck %s
 
 // CHECK: {{.*}}clang-offload-bundler{{.*}}"-outputs=hip-output-file-name.o"
+
+// Check -E default output is "-" (stdout).
+
+// RUN: %clang -### -E -target x86_64-linux-gnu \
+// RUN:   --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s \
+// RUN: 2>&1 | FileCheck -check-prefixes=DASH %s
+
+// RUN: %clang -### -E -save-temps -target x86_64-linux-gnu \
+// RUN:   --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s \
+// RUN: 2>&1 | FileCheck -check-prefixes=DASH %s
+
+// RUN: %clang -### -E --cuda-device-only -target x86_64-linux-gnu \
+// RUN:   --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s \
+// RUN: 2>&1 | FileCheck -check-prefixes=CLANG-DASH %s
+
+// RUN: %clang -### -E --cuda-host-only -target x86_64-linux-gnu \
+// RUN:   --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s \
+// RUN: 2>&1 | FileCheck -check-prefixes=CLANG-DASH %s
+
+// DASH: {{.*}}clang-offload-bundler{{.*}}"-outputs=-"
+// CLANG-DASH: {{.*}}clang{{.*}}"-o" "-"
+
+// Check -E with -o.
+
+// RUN: %clang -### -E -o test.cui -target x86_64-linux-gnu \
+// RUN:   --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s \
+// RUN: 2>&1 | FileCheck -check-prefixes=OUT %s
+
+// RUN: %clang -### -E -o test.cui -save-temps -target x86_64-linux-gnu \
+// RUN:   --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s \
+// RUN: 2>&1 | FileCheck -check-prefixes=OUT %s
+
+// RUN: %clang -### -E -o test.cui --cuda-device-only -target x86_64-linux-gnu \
+// RUN:   --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s \
+// RUN: 2>&1 | FileCheck -check-prefixes=CLANG-OUT %s
+
+// RUN: %clang -### -E -o test.cui --cuda-host-only -target x86_64-linux-gnu \
+// RUN:   --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s \
+// RUN: 2>&1 | FileCheck -check-prefixes=CLANG-OUT %s
+
+// OUT: {{.*}}clang-offload-bundler{{.*}}"-outputs=test.cui"
+// CLANG-OUT: {{.*}}clang{{.*}}"-o" "test.cui"
Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -4604,6 +4604,17 @@
   return Args.MakeArgString(Filename.c_str());
 }
 
+static bool HasPreprocessOutput(const Action &JA) {
+  if (isa<PreprocessJobAction>(JA))
+    return true;
+  if (isa<OffloadAction>(JA) && isa<PreprocessJobAction>(JA.getInputs()[0]))
+    return true;
+  if (isa<OffloadBundlingJobAction>(JA) &&
+      HasPreprocessOutput(*(JA.getInputs()[0])))
+    return true;
+  return false;
+}
+
 const char *Driver::GetNamedOutputPath(Compilation &C, const JobAction &JA,
const char *BaseInput,
StringRef BoundArch, bool AtTopLevel,
@@ -4629,8 +4640,9 @@
   }
 
   // Default to writing to stdout?
-  if (AtTopLevel && !CCGenDiagnostics && isa<PreprocessJobAction>(JA))
+  if (AtTopLevel && !CCGenDiagnostics && HasPreprocessOutput(JA)) {
 return "-";
+  }
 
   // Is this the assembly listing for /FA?
   if (JA.getType() == types::TY_PP_Asm &&



[clang] 5b551b7 - [HIP] Fix default output file for -E

2020-10-04 Thread Yaxun Liu via cfe-commits

Author: Yaxun (Sam) Liu
Date: 2020-10-04T22:03:16-04:00
New Revision: 5b551b79d3bba5a8a282bf5f72c7baaccf925870

URL: 
https://github.com/llvm/llvm-project/commit/5b551b79d3bba5a8a282bf5f72c7baaccf925870
DIFF: 
https://github.com/llvm/llvm-project/commit/5b551b79d3bba5a8a282bf5f72c7baaccf925870.diff

LOG: [HIP] Fix default output file for -E

By convention the default output file for -E is "-" (stdout).
Tools like ccache rely on this: ccache uses the output of -E to
determine whether a file and its dependencies have changed.

Currently clang does not use stdout as the default output file for -E
for HIP, which breaks ccache.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D88730

Added: 


Modified: 
clang/lib/Driver/Driver.cpp
clang/test/Driver/hip-output-file-name.hip

Removed: 




diff  --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index 96798b3d0adb..6f2a030290ed 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -4604,6 +4604,17 @@ static const char *MakeCLOutputFilename(const ArgList &Args, StringRef ArgValue,
   return Args.MakeArgString(Filename.c_str());
 }
 
+static bool HasPreprocessOutput(const Action &JA) {
+  if (isa<PreprocessJobAction>(JA))
+    return true;
+  if (isa<OffloadAction>(JA) && isa<PreprocessJobAction>(JA.getInputs()[0]))
+    return true;
+  if (isa<OffloadBundlingJobAction>(JA) &&
+      HasPreprocessOutput(*(JA.getInputs()[0])))
+    return true;
+  return false;
+}
+
 const char *Driver::GetNamedOutputPath(Compilation &C, const JobAction &JA,
const char *BaseInput,
StringRef BoundArch, bool AtTopLevel,
@@ -4629,8 +4640,9 @@ const char *Driver::GetNamedOutputPath(Compilation &C, const JobAction &JA,
   }
 
   // Default to writing to stdout?
-  if (AtTopLevel && !CCGenDiagnostics && isa<PreprocessJobAction>(JA))
+  if (AtTopLevel && !CCGenDiagnostics && HasPreprocessOutput(JA)) {
 return "-";
+  }
 
   // Is this the assembly listing for /FA?
   if (JA.getType() == types::TY_PP_Asm &&

diff --git a/clang/test/Driver/hip-output-file-name.hip b/clang/test/Driver/hip-output-file-name.hip
index d57f7e87f89e..b0b1a9d7ff3d 100644
--- a/clang/test/Driver/hip-output-file-name.hip
+++ b/clang/test/Driver/hip-output-file-name.hip
@@ -7,3 +7,45 @@
 // RUN: 2>&1 | FileCheck %s
 
 // CHECK: {{.*}}clang-offload-bundler{{.*}}"-outputs=hip-output-file-name.o"
+
+// Check -E default output is "-" (stdout).
+
+// RUN: %clang -### -E -target x86_64-linux-gnu \
+// RUN:   --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s \
+// RUN: 2>&1 | FileCheck -check-prefixes=DASH %s
+
+// RUN: %clang -### -E -save-temps -target x86_64-linux-gnu \
+// RUN:   --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s \
+// RUN: 2>&1 | FileCheck -check-prefixes=DASH %s
+
+// RUN: %clang -### -E --cuda-device-only -target x86_64-linux-gnu \
+// RUN:   --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s \
+// RUN: 2>&1 | FileCheck -check-prefixes=CLANG-DASH %s
+
+// RUN: %clang -### -E --cuda-host-only -target x86_64-linux-gnu \
+// RUN:   --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s \
+// RUN: 2>&1 | FileCheck -check-prefixes=CLANG-DASH %s
+
+// DASH: {{.*}}clang-offload-bundler{{.*}}"-outputs=-"
+// CLANG-DASH: {{.*}}clang{{.*}}"-o" "-"
+
+// Check -E with -o.
+
+// RUN: %clang -### -E -o test.cui -target x86_64-linux-gnu \
+// RUN:   --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s \
+// RUN: 2>&1 | FileCheck -check-prefixes=OUT %s
+
+// RUN: %clang -### -E -o test.cui -save-temps -target x86_64-linux-gnu \
+// RUN:   --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s \
+// RUN: 2>&1 | FileCheck -check-prefixes=OUT %s
+
+// RUN: %clang -### -E -o test.cui --cuda-device-only -target x86_64-linux-gnu \
+// RUN:   --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s \
+// RUN: 2>&1 | FileCheck -check-prefixes=CLANG-OUT %s
+
+// RUN: %clang -### -E -o test.cui --cuda-host-only -target x86_64-linux-gnu \
+// RUN:   --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s \
+// RUN: 2>&1 | FileCheck -check-prefixes=CLANG-OUT %s
+
+// OUT: {{.*}}clang-offload-bundler{{.*}}"-outputs=test.cui"
+// CLANG-OUT: {{.*}}clang{{.*}}"-o" "test.cui"
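
The recursive check added in this commit can be illustrated with a simplified action graph. This is a sketch under stated assumptions: the `Kind` enum, the `Action` struct, and the `"a.out"` fallback are hypothetical stand-ins invented for the example, not the driver's real types or naming logic.

```cpp
#include <vector>

// Hypothetical, simplified action kinds; the real driver uses a class
// hierarchy rather than an enum.
enum class Kind { Input, Preprocess, Compile, Offload, OffloadBundling };

struct Action {
  Kind K;
  std::vector<const Action *> Inputs;
};

// True if the action is a preprocess step, an offload wrapper directly
// around one, or a bundling step whose first input (recursively) has
// preprocess output.
inline bool hasPreprocessOutput(const Action &A) {
  if (A.K == Kind::Preprocess)
    return true;
  if (A.K == Kind::Offload && !A.Inputs.empty() &&
      A.Inputs[0]->K == Kind::Preprocess)
    return true;
  if (A.K == Kind::OffloadBundling && !A.Inputs.empty())
    return hasPreprocessOutput(*A.Inputs[0]);
  return false;
}

// Mirrors the "default to stdout" decision: "-" means standard output.
// The "a.out" fallback is a placeholder for the rest of the naming logic.
inline const char *defaultOutput(const Action &Top, bool AtTopLevel) {
  return (AtTopLevel && hasPreprocessOutput(Top)) ? "-" : "a.out";
}
```

With HIP, the top-level action for `-E` is a bundling step wrapped around offload actions rather than a bare preprocess action, which is why a plain type check on the top-level action missed it.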





[clang] 9756a40 - Recommit "[HIP] Add option --gpu-instrument-lib="

2020-10-04 Thread Yaxun Liu via cfe-commits

Author: Yaxun (Sam) Liu
Date: 2020-10-04T21:41:43-04:00
New Revision: 9756a402f297d0030689aaade3651785b7496649

URL: 
https://github.com/llvm/llvm-project/commit/9756a402f297d0030689aaade3651785b7496649
DIFF: 
https://github.com/llvm/llvm-project/commit/9756a402f297d0030689aaade3651785b7496649.diff

LOG: Recommit "[HIP] Add option --gpu-instrument-lib="

recommit 64f7790e7d2309b5d38949921a256acf8068e659 after
fixing hip-device-libs.hip.

Added: 
clang/test/Driver/Inputs/hip_multiple_inputs/instrument.bc

Modified: 
clang/include/clang/Driver/Options.td
clang/lib/Driver/ToolChains/HIP.cpp
clang/test/Driver/hip-device-libs.hip

Removed: 




diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 672a833c9d4d..18a123476253 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -672,6 +672,9 @@ defm gpu_allow_device_init : OptInFFlag<"gpu-allow-device-init",
 def gpu_max_threads_per_block_EQ : Joined<["--"], "gpu-max-threads-per-block=">,
   Flags<[CC1Option]>,
   HelpText<"Default max threads per block for kernel launch bounds for HIP">;
+def gpu_instrument_lib_EQ : Joined<["--"], "gpu-instrument-lib=">,
+  HelpText<"Instrument device library for HIP, which is a LLVM bitcode containing "
+  "__cyg_profile_func_enter and __cyg_profile_func_exit">;
 def libomptarget_nvptx_path_EQ : Joined<["--"], "libomptarget-nvptx-path=">, Group,
   HelpText<"Path to libomptarget-nvptx libraries">;
 def dD : Flag<["-"], "dD">, Group, Flags<[CC1Option]>,

diff --git a/clang/lib/Driver/ToolChains/HIP.cpp b/clang/lib/Driver/ToolChains/HIP.cpp
index 07d72c073b4b..f1044f316fc8 100644
--- a/clang/lib/Driver/ToolChains/HIP.cpp
+++ b/clang/lib/Driver/ToolChains/HIP.cpp
@@ -330,6 +330,17 @@ void HIPToolChain::addClangTargetOptions(
 RocmInstallation.addCommonBitcodeLibCC1Args(
   DriverArgs, CC1Args, LibDeviceFile, Wave64, DAZ, FiniteOnly,
   UnsafeMathOpt, FastRelaxedMath, CorrectSqrt);
+
+// Add instrument lib.
+auto InstLib =
+DriverArgs.getLastArgValue(options::OPT_gpu_instrument_lib_EQ);
+if (InstLib.empty())
+  return;
+if (llvm::sys::fs::exists(InstLib)) {
+  CC1Args.push_back("-mlink-builtin-bitcode");
+  CC1Args.push_back(DriverArgs.MakeArgString(InstLib));
+} else
+  getDriver().Diag(diag::err_drv_no_such_file) << InstLib;
   }
 }
 

diff --git a/clang/test/Driver/Inputs/hip_multiple_inputs/instrument.bc b/clang/test/Driver/Inputs/hip_multiple_inputs/instrument.bc
new file mode 100644
index ..e69de29bb2d1

diff --git a/clang/test/Driver/hip-device-libs.hip b/clang/test/Driver/hip-device-libs.hip
index 3dd798476e2b..c3e89d1a4fed 100644
--- a/clang/test/Driver/hip-device-libs.hip
+++ b/clang/test/Driver/hip-device-libs.hip
@@ -92,7 +92,7 @@
 
 // Test --hip-device-lib-path flag
 // RUN: %clang -### -target x86_64-linux-gnu \
-// RUN:   --cuda-gpu-arch=gfx803 \
+// RUN:   --cuda-gpu-arch=gfx803 -nogpuinc \
 // RUN:   --hip-device-lib-path=%S/Inputs/rocm/amdgcn/bitcode   \
 // RUN:   %S/Inputs/hip_multiple_inputs/b.hip \
 // RUN: 2>&1 | FileCheck %s --check-prefixes=ALL,FLUSHD
@@ -101,10 +101,19 @@
 // Test environment variable HIP_DEVICE_LIB_PATH
 // RUN: env HIP_DEVICE_LIB_PATH=%S/Inputs/rocm/amdgcn/bitcode \
 // RUN:   %clang -### -target x86_64-linux-gnu \
-// RUN:   --cuda-gpu-arch=gfx900 \
+// RUN:   --cuda-gpu-arch=gfx900 -nogpuinc \
 // RUN:   %S/Inputs/hip_multiple_inputs/b.hip \
 // RUN: 2>&1 | FileCheck %s --check-prefixes=ALL
 
+// Test --gpu-instrument-lib
+// RUN: %clang -### -target x86_64-linux-gnu \
+// RUN:   --cuda-gpu-arch=gfx900 \
+// RUN:   --rocm-path=%S/Inputs/rocm \
+// RUN:   --gpu-instrument-lib=%S/Inputs/hip_multiple_inputs/instrument.bc \
+// RUN:   %S/Inputs/hip_multiple_inputs/b.hip \
+// RUN: 2>&1 | FileCheck %s --check-prefixes=ALL,INST
+
+// ALL-NOT: error:
 // ALL: {{"[^"]*clang[^"]*"}}
 // ALL-SAME: "-mlink-builtin-bitcode" "{{.*}}hip.bc"
 // ALL-SAME: "-mlink-builtin-bitcode" "{{.*}}ocml.bc"
@@ -118,3 +127,4 @@
 // ALL-SAME: "-mlink-builtin-bitcode" "{{.*}}oclc_correctly_rounded_sqrt_on.bc"
 // ALL-SAME: "-mlink-builtin-bitcode" "{{.*}}oclc_wavefrontsize64_on.bc"
 // ALL-SAME: "-mlink-builtin-bitcode" "{{.*}}oclc_isa_version_{{[0-9]+}}.bc"
+// INST-SAME: "-mlink-builtin-bitcode" "{{.*}}instrument.bc"
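
The option handling added in HIP.cpp follows a common shape: do nothing if the option is absent, report an error if the file is missing, otherwise forward it as a link argument. A minimal portable sketch of that shape follows. The helper names and the error text are hypothetical, and `std::ifstream` is used here in place of the existence check in the diff.

```cpp
#include <fstream>
#include <string>
#include <vector>

// True when File can be opened for reading (a portable stand-in for the
// existence check in the diff).
inline bool fileExists(const std::string &File) {
  return std::ifstream(File).good();
}

// Appends the link option for the instrument library. Returns false and
// fills Error when a library was requested but cannot be found.
inline bool addInstrumentLib(const std::string &InstLib,
                             std::vector<std::string> &CC1Args,
                             std::string &Error) {
  if (InstLib.empty())
    return true; // Option not given: nothing to add.
  if (!fileExists(InstLib)) {
    Error = "no such file or directory: '" + InstLib + "'";
    return false;
  }
  CC1Args.push_back("-mlink-builtin-bitcode");
  CC1Args.push_back(InstLib);
  return true;
}
```

The `ALL-NOT: error:` line added to the test guards the same path: a missing input file would surface as a driver error rather than a silently dropped option.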





[clang] fef0ebb - Revert "[HIP] Add option --gpu-instrument-lib="

2020-10-04 Thread Yaxun Liu via cfe-commits

Author: Yaxun (Sam) Liu
Date: 2020-10-04T21:27:29-04:00
New Revision: fef0ebbc0b39167656bd11283e3084b000b309dd

URL: 
https://github.com/llvm/llvm-project/commit/fef0ebbc0b39167656bd11283e3084b000b309dd
DIFF: 
https://github.com/llvm/llvm-project/commit/fef0ebbc0b39167656bd11283e3084b000b309dd.diff

LOG: Revert "[HIP] Add option --gpu-instrument-lib="

This reverts commit 64f7790e7d2309b5d38949921a256acf8068e659 due
to regression in hip-device-libs.hip.

Added: 


Modified: 
clang/include/clang/Driver/Options.td
clang/lib/Driver/ToolChains/HIP.cpp
clang/test/Driver/hip-device-libs.hip

Removed: 
clang/test/Driver/Inputs/hip_multiple_inputs/instrument.bc



diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 18a123476253..672a833c9d4d 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -672,9 +672,6 @@ defm gpu_allow_device_init : OptInFFlag<"gpu-allow-device-init",
 def gpu_max_threads_per_block_EQ : Joined<["--"], "gpu-max-threads-per-block=">,
   Flags<[CC1Option]>,
   HelpText<"Default max threads per block for kernel launch bounds for HIP">;
-def gpu_instrument_lib_EQ : Joined<["--"], "gpu-instrument-lib=">,
-  HelpText<"Instrument device library for HIP, which is a LLVM bitcode containing "
-  "__cyg_profile_func_enter and __cyg_profile_func_exit">;
 def libomptarget_nvptx_path_EQ : Joined<["--"], "libomptarget-nvptx-path=">, Group,
   HelpText<"Path to libomptarget-nvptx libraries">;
 def dD : Flag<["-"], "dD">, Group, Flags<[CC1Option]>,

diff  --git a/clang/lib/Driver/ToolChains/HIP.cpp b/clang/lib/Driver/ToolChains/HIP.cpp
index f1044f316fc8..07d72c073b4b 100644
--- a/clang/lib/Driver/ToolChains/HIP.cpp
+++ b/clang/lib/Driver/ToolChains/HIP.cpp
@@ -330,17 +330,6 @@ void HIPToolChain::addClangTargetOptions(
 RocmInstallation.addCommonBitcodeLibCC1Args(
   DriverArgs, CC1Args, LibDeviceFile, Wave64, DAZ, FiniteOnly,
   UnsafeMathOpt, FastRelaxedMath, CorrectSqrt);
-
-// Add instrument lib.
-auto InstLib =
-DriverArgs.getLastArgValue(options::OPT_gpu_instrument_lib_EQ);
-if (InstLib.empty())
-  return;
-if (llvm::sys::fs::exists(InstLib)) {
-  CC1Args.push_back("-mlink-builtin-bitcode");
-  CC1Args.push_back(DriverArgs.MakeArgString(InstLib));
-} else
-  getDriver().Diag(diag::err_drv_no_such_file) << InstLib;
   }
 }
 

diff  --git a/clang/test/Driver/Inputs/hip_multiple_inputs/instrument.bc b/clang/test/Driver/Inputs/hip_multiple_inputs/instrument.bc
deleted file mode 100644
index e69de29bb2d1..

diff  --git a/clang/test/Driver/hip-device-libs.hip b/clang/test/Driver/hip-device-libs.hip
index 1ffaeda18390..3dd798476e2b 100644
--- a/clang/test/Driver/hip-device-libs.hip
+++ b/clang/test/Driver/hip-device-libs.hip
@@ -105,15 +105,6 @@
 // RUN:   %S/Inputs/hip_multiple_inputs/b.hip \
 // RUN: 2>&1 | FileCheck %s --check-prefixes=ALL
 
-// Test --gpu-instrument-lib
-// RUN: %clang -### -target x86_64-linux-gnu \
-// RUN:   --cuda-gpu-arch=gfx900 \
-// RUN:   --rocm-path=%S/Inputs/rocm \
-// RUN:   --gpu-instrument-lib=%S/Inputs/hip_multiple_inputs/instrument.bc \
-// RUN:   %S/Inputs/hip_multiple_inputs/b.hip \
-// RUN: 2>&1 | FileCheck %s --check-prefixes=ALL,INST
-
-// ALL-NOT: error:
 // ALL: {{"[^"]*clang[^"]*"}}
 // ALL-SAME: "-mlink-builtin-bitcode" "{{.*}}hip.bc"
 // ALL-SAME: "-mlink-builtin-bitcode" "{{.*}}ocml.bc"
@@ -127,4 +118,3 @@
 // ALL-SAME: "-mlink-builtin-bitcode" "{{.*}}oclc_correctly_rounded_sqrt_on.bc"
 // ALL-SAME: "-mlink-builtin-bitcode" "{{.*}}oclc_wavefrontsize64_on.bc"
 // ALL-SAME: "-mlink-builtin-bitcode" "{{.*}}oclc_isa_version_{{[0-9]+}}.bc"
-// INST-SAME: "-mlink-builtin-bitcode" "{{.*}}instrument.bc"





[PATCH] D88557: [HIP] Add option --gpu-instrument-lib=

2020-10-04 Thread Yaxun Liu via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG64f7790e7d23: [HIP] Add option --gpu-instrument-lib= (authored by yaxunl).
Herald added a project: clang.

Changed prior to commit:
  https://reviews.llvm.org/D88557?vs=295244&id=296087#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88557/new/

https://reviews.llvm.org/D88557

Files:
  clang/include/clang/Driver/Options.td
  clang/lib/Driver/ToolChains/HIP.cpp
  clang/test/Driver/Inputs/hip_multiple_inputs/instrument.bc
  clang/test/Driver/hip-device-libs.hip


Index: clang/test/Driver/hip-device-libs.hip
===
--- clang/test/Driver/hip-device-libs.hip
+++ clang/test/Driver/hip-device-libs.hip
@@ -105,6 +105,15 @@
 // RUN:   %S/Inputs/hip_multiple_inputs/b.hip \
 // RUN: 2>&1 | FileCheck %s --check-prefixes=ALL
 
+// Test --gpu-instrument-lib
+// RUN: %clang -### -target x86_64-linux-gnu \
+// RUN:   --cuda-gpu-arch=gfx900 \
+// RUN:   --rocm-path=%S/Inputs/rocm \
+// RUN:   --gpu-instrument-lib=%S/Inputs/hip_multiple_inputs/instrument.bc \
+// RUN:   %S/Inputs/hip_multiple_inputs/b.hip \
+// RUN: 2>&1 | FileCheck %s --check-prefixes=ALL,INST
+
+// ALL-NOT: error:
 // ALL: {{"[^"]*clang[^"]*"}}
 // ALL-SAME: "-mlink-builtin-bitcode" "{{.*}}hip.bc"
 // ALL-SAME: "-mlink-builtin-bitcode" "{{.*}}ocml.bc"
@@ -118,3 +127,4 @@
 // ALL-SAME: "-mlink-builtin-bitcode" "{{.*}}oclc_correctly_rounded_sqrt_on.bc"
 // ALL-SAME: "-mlink-builtin-bitcode" "{{.*}}oclc_wavefrontsize64_on.bc"
 // ALL-SAME: "-mlink-builtin-bitcode" "{{.*}}oclc_isa_version_{{[0-9]+}}.bc"
+// INST-SAME: "-mlink-builtin-bitcode" "{{.*}}instrument.bc"
Index: clang/lib/Driver/ToolChains/HIP.cpp
===
--- clang/lib/Driver/ToolChains/HIP.cpp
+++ clang/lib/Driver/ToolChains/HIP.cpp
@@ -330,6 +330,17 @@
 RocmInstallation.addCommonBitcodeLibCC1Args(
   DriverArgs, CC1Args, LibDeviceFile, Wave64, DAZ, FiniteOnly,
   UnsafeMathOpt, FastRelaxedMath, CorrectSqrt);
+
+// Add instrument lib.
+auto InstLib =
+DriverArgs.getLastArgValue(options::OPT_gpu_instrument_lib_EQ);
+if (InstLib.empty())
+  return;
+if (llvm::sys::fs::exists(InstLib)) {
+  CC1Args.push_back("-mlink-builtin-bitcode");
+  CC1Args.push_back(DriverArgs.MakeArgString(InstLib));
+} else
+  getDriver().Diag(diag::err_drv_no_such_file) << InstLib;
   }
 }
 
Index: clang/include/clang/Driver/Options.td
===
--- clang/include/clang/Driver/Options.td
+++ clang/include/clang/Driver/Options.td
@@ -672,6 +672,9 @@
 def gpu_max_threads_per_block_EQ : Joined<["--"], "gpu-max-threads-per-block=">,
   Flags<[CC1Option]>,
   HelpText<"Default max threads per block for kernel launch bounds for HIP">;
+def gpu_instrument_lib_EQ : Joined<["--"], "gpu-instrument-lib=">,
+  HelpText<"Instrument device library for HIP, which is a LLVM bitcode containing "
+  "__cyg_profile_func_enter and __cyg_profile_func_exit">;
 def libomptarget_nvptx_path_EQ : Joined<["--"], "libomptarget-nvptx-path=">, Group,
   HelpText<"Path to libomptarget-nvptx libraries">;
 def dD : Flag<["-"], "dD">, Group, Flags<[CC1Option]>,



[clang] 64f7790 - [HIP] Add option --gpu-instrument-lib=

2020-10-04 Thread Yaxun Liu via cfe-commits

Author: Yaxun (Sam) Liu
Date: 2020-10-04T21:16:36-04:00
New Revision: 64f7790e7d2309b5d38949921a256acf8068e659

URL: 
https://github.com/llvm/llvm-project/commit/64f7790e7d2309b5d38949921a256acf8068e659
DIFF: 
https://github.com/llvm/llvm-project/commit/64f7790e7d2309b5d38949921a256acf8068e659.diff

LOG: [HIP] Add option --gpu-instrument-lib=

Add an option --gpu-instrument-lib= to allow users to specify
an instrument device library. This is for supporting -finstrument
in device code for debugging/profiling tools.
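
The option handling described above can be sketched outside the driver. This is a minimal, self-contained illustration, not the committed code: `ArgList` and `addInstrumentLib` are hypothetical stand-ins for the driver's argument list and for the logic added to `HIPToolChain::addClangTargetOptions`, and a `bool` return value replaces the `err_drv_no_such_file` diagnostic. It assumes C++17 for `std::filesystem`.

```cpp
#include <filesystem>
#include <string>
#include <vector>

// Hypothetical stand-in for the driver's CC1 argument list.
using ArgList = std::vector<std::string>;

// Mirrors the three cases in the patch:
//  - option not given (empty value): add nothing, succeed;
//  - file exists: append "-mlink-builtin-bitcode" followed by the path;
//  - file missing: add nothing and report failure (the real driver emits
//    a "no such file" diagnostic at this point instead of returning false).
bool addInstrumentLib(const std::string &InstLib, ArgList &CC1Args) {
  if (InstLib.empty())
    return true;
  if (!std::filesystem::exists(InstLib))
    return false;
  CC1Args.push_back("-mlink-builtin-bitcode");
  CC1Args.push_back(InstLib);
  return true;
}
```

Because the instrument library is appended after the common device libraries, it ends up last on the cc1 command line, which is what the `INST-SAME` check in the test expects.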

Differential Revision: https://reviews.llvm.org/D88557

Added: 
clang/test/Driver/Inputs/hip_multiple_inputs/instrument.bc

Modified: 
clang/include/clang/Driver/Options.td
clang/lib/Driver/ToolChains/HIP.cpp
clang/test/Driver/hip-device-libs.hip

Removed: 




diff  --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 672a833c9d4d..18a123476253 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -672,6 +672,9 @@ defm gpu_allow_device_init : OptInFFlag<"gpu-allow-device-init",
 def gpu_max_threads_per_block_EQ : Joined<["--"], "gpu-max-threads-per-block=">,
   Flags<[CC1Option]>,
   HelpText<"Default max threads per block for kernel launch bounds for HIP">;
+def gpu_instrument_lib_EQ : Joined<["--"], "gpu-instrument-lib=">,
+  HelpText<"Instrument device library for HIP, which is a LLVM bitcode containing "
+  "__cyg_profile_func_enter and __cyg_profile_func_exit">;
 def libomptarget_nvptx_path_EQ : Joined<["--"], "libomptarget-nvptx-path=">, Group,
   HelpText<"Path to libomptarget-nvptx libraries">;
 def dD : Flag<["-"], "dD">, Group, Flags<[CC1Option]>,

diff  --git a/clang/lib/Driver/ToolChains/HIP.cpp b/clang/lib/Driver/ToolChains/HIP.cpp
index 07d72c073b4b..f1044f316fc8 100644
--- a/clang/lib/Driver/ToolChains/HIP.cpp
+++ b/clang/lib/Driver/ToolChains/HIP.cpp
@@ -330,6 +330,17 @@ void HIPToolChain::addClangTargetOptions(
 RocmInstallation.addCommonBitcodeLibCC1Args(
   DriverArgs, CC1Args, LibDeviceFile, Wave64, DAZ, FiniteOnly,
   UnsafeMathOpt, FastRelaxedMath, CorrectSqrt);
+
+// Add instrument lib.
+auto InstLib =
+DriverArgs.getLastArgValue(options::OPT_gpu_instrument_lib_EQ);
+if (InstLib.empty())
+  return;
+if (llvm::sys::fs::exists(InstLib)) {
+  CC1Args.push_back("-mlink-builtin-bitcode");
+  CC1Args.push_back(DriverArgs.MakeArgString(InstLib));
+} else
+  getDriver().Diag(diag::err_drv_no_such_file) << InstLib;
   }
 }
 

diff  --git a/clang/test/Driver/Inputs/hip_multiple_inputs/instrument.bc b/clang/test/Driver/Inputs/hip_multiple_inputs/instrument.bc
new file mode 100644
index ..e69de29bb2d1

diff  --git a/clang/test/Driver/hip-device-libs.hip b/clang/test/Driver/hip-device-libs.hip
index 3dd798476e2b..1ffaeda18390 100644
--- a/clang/test/Driver/hip-device-libs.hip
+++ b/clang/test/Driver/hip-device-libs.hip
@@ -105,6 +105,15 @@
 // RUN:   %S/Inputs/hip_multiple_inputs/b.hip \
 // RUN: 2>&1 | FileCheck %s --check-prefixes=ALL
 
+// Test --gpu-instrument-lib
+// RUN: %clang -### -target x86_64-linux-gnu \
+// RUN:   --cuda-gpu-arch=gfx900 \
+// RUN:   --rocm-path=%S/Inputs/rocm \
+// RUN:   --gpu-instrument-lib=%S/Inputs/hip_multiple_inputs/instrument.bc \
+// RUN:   %S/Inputs/hip_multiple_inputs/b.hip \
+// RUN: 2>&1 | FileCheck %s --check-prefixes=ALL,INST
+
+// ALL-NOT: error:
 // ALL: {{"[^"]*clang[^"]*"}}
 // ALL-SAME: "-mlink-builtin-bitcode" "{{.*}}hip.bc"
 // ALL-SAME: "-mlink-builtin-bitcode" "{{.*}}ocml.bc"
@@ -118,3 +127,4 @@
 // ALL-SAME: "-mlink-builtin-bitcode" "{{.*}}oclc_correctly_rounded_sqrt_on.bc"
 // ALL-SAME: "-mlink-builtin-bitcode" "{{.*}}oclc_wavefrontsize64_on.bc"
 // ALL-SAME: "-mlink-builtin-bitcode" "{{.*}}oclc_isa_version_{{[0-9]+}}.bc"
+// INST-SAME: "-mlink-builtin-bitcode" "{{.*}}instrument.bc"





[clang] 2c94d88 - [NewPM] collapsing nested pass mangers of the same type

2020-10-04 Thread Yuanfang Chen via cfe-commits

Author: Yuanfang Chen
Date: 2020-10-04T15:57:13-07:00
New Revision: 2c94d88e076990a7b533578a392a150d4b9b0fa8

URL: 
https://github.com/llvm/llvm-project/commit/2c94d88e076990a7b533578a392a150d4b9b0fa8
DIFF: 
https://github.com/llvm/llvm-project/commit/2c94d88e076990a7b533578a392a150d4b9b0fa8.diff

LOG: [NewPM] collapsing nested pass mangers of the same type

This is one of the reasons for the extra invalidations in D84959. In
practice, I don't think we have use cases needing this. This simplifies
the pipeline a bit and prunes corner cases when considering
invalidations.
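
The flattening can be illustrated with a toy pass manager. This is a sketch under stated assumptions, not LLVM's `PassManager`: `ToyPassManager`, its `Pass` alias, and the `size`/`run` helpers are hypothetical names, and a "pass" is reduced to a callable that appends to a log. The two `addPass` overloads are selected with `std::enable_if_t`, in the same spirit as the change to `llvm/include/llvm/IR/PassManager.h`.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Toy pass manager: a "pass" is any callable that appends to a log.
class ToyPassManager {
public:
  using Log = std::vector<std::string>;
  using Pass = std::function<void(Log &)>;

  // Ordinary passes (anything that is not a ToyPassManager) are stored as-is.
  template <typename PassT>
  std::enable_if_t<!std::is_same<PassT, ToyPassManager>::value>
  addPass(PassT P) {
    Passes.emplace_back(std::move(P));
  }

  // A nested manager of the same type is not stored as a single pass.
  // Its passes are moved into this manager, so no nesting level remains.
  // PassT is deduced as ToyPassManager only for rvalue arguments.
  template <typename PassT>
  std::enable_if_t<std::is_same<PassT, ToyPassManager>::value>
  addPass(PassT &&Nested) {
    for (auto &P : Nested.Passes)
      Passes.emplace_back(std::move(P));
  }

  std::size_t size() const { return Passes.size(); }

  // Run every pass in insertion order.
  void run(Log &L) {
    for (auto &P : Passes)
      P(L);
  }

private:
  std::vector<Pass> Passes;
};
```

Adding an inner manager that holds two passes with `Outer.addPass(std::move(Inner))` leaves `Outer` holding those two passes directly, which is why the test updates in this commit drop the second "Starting ... Module pass manager run" line.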

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D85676

Added: 


Modified: 
clang/test/CodeGen/thinlto-distributed-newpm.ll
llvm/include/llvm/IR/PassManager.h
llvm/test/Other/new-pass-manager.ll
llvm/test/Other/new-pm-defaults.ll
llvm/test/Other/new-pm-lto-defaults.ll
llvm/test/Other/new-pm-thinlto-defaults.ll
llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
llvm/test/Other/pass-pipeline-parsing.ll

Removed: 




diff  --git a/clang/test/CodeGen/thinlto-distributed-newpm.ll b/clang/test/CodeGen/thinlto-distributed-newpm.ll
index 9f9a8bec4ef5..ec56845a8fdf 100644
--- a/clang/test/CodeGen/thinlto-distributed-newpm.ll
+++ b/clang/test/CodeGen/thinlto-distributed-newpm.ll
@@ -25,7 +25,6 @@
 ; CHECK-O: Running pass: LowerTypeTestsPass
 ; CHECK-O: Invalidating analysis: InnerAnalysisManagerProxy
 ; CHECK-O: Running pass: ForceFunctionAttrsPass
-; CHECK-O: Starting {{.*}}Module pass manager run.
 ; CHECK-O: Running pass: PGOIndirectCallPromotion
 ; CHECK-O: Running analysis: ProfileSummaryAnalysis
 ; CHECK-O: Running analysis: InnerAnalysisManagerProxy
@@ -151,8 +150,6 @@
 ; CHECK-O: Invalidating analysis: DemandedBitsAnalysis on main
 ; CHECK-O: Invalidating analysis: PostDominatorTreeAnalysis on main
 ; CHECK-O: Invalidating analysis: CallGraphAnalysis
-; CHECK-O: Finished {{.*}}Module pass manager run.
-; CHECK-O: Starting {{.*}}Module pass manager run.
 ; CHECK-O: Running pass: GlobalOptPass
 ; CHECK-O: Running pass: GlobalDCEPass
 ; CHECK-O: Running pass: EliminateAvailableExternallyPass
@@ -207,7 +204,6 @@
 ; CHECK-O: Running pass: GlobalDCEPass
 ; CHECK-O: Running pass: ConstantMergePass
 ; CHECK-O: Finished {{.*}}Module pass manager run.
-; CHECK-O: Finished {{.*}}Module pass manager run.
 
 target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-grtev4-linux-gnu"

diff  --git a/llvm/include/llvm/IR/PassManager.h b/llvm/include/llvm/IR/PassManager.h
index 6b4f8e3140ee..44f8900f2ebf 100644
--- a/llvm/include/llvm/IR/PassManager.h
+++ b/llvm/include/llvm/IR/PassManager.h
@@ -548,7 +548,9 @@ class PassManager : public PassInfoMixin<
 return PA;
   }
 
-  template <typename PassT> void addPass(PassT Pass) {
+  template <typename PassT>
+  std::enable_if_t<!std::is_same<PassT, PassManager>::value>
+  addPass(PassT Pass) {
 using PassModelT =
 detail::PassModel;
@@ -556,6 +558,18 @@ class PassManager : public PassInfoMixin<
 Passes.emplace_back(new PassModelT(std::move(Pass)));
   }
 
+  /// When adding a pass manager pass that has the same type as this pass
+  /// manager, simply move the passes over. This is because we don't have use
+  /// cases rely on executing nested pass managers. Doing this could reduce
+  /// implementation complexity and avoid potential invalidation issues that may
+  /// happen with nested pass managers of the same type.
+  template <typename PassT>
+  std::enable_if_t<std::is_same<PassT, PassManager>::value>
+  addPass(PassT &&Pass) {
+for (auto &P : Pass.Passes)
+  Passes.emplace_back(std::move(P));
+  }
+
   static bool isRequired() { return true; }
 
 protected:

diff  --git a/llvm/test/Other/new-pass-manager.ll b/llvm/test/Other/new-pass-manager.ll
index 31be3adb6897..70d1f7152120 100644
--- a/llvm/test/Other/new-pass-manager.ll
+++ b/llvm/test/Other/new-pass-manager.ll
@@ -207,7 +207,6 @@
 ; CHECK-INVALIDATE-ALL: Starting llvm::Module pass manager run
 ; CHECK-INVALIDATE-ALL: Running pass: RequireAnalysisPass
 ; CHECK-INVALIDATE-ALL: Running analysis: NoOpModuleAnalysis
-; CHECK-INVALIDATE-ALL: Starting llvm::Module pass manager run
 ; CHECK-INVALIDATE-ALL: Running pass: RequireAnalysisPass
 ; CHECK-INVALIDATE-ALL-NOT: Running analysis: NoOpModuleAnalysis
 ; CHECK-INVALIDATE-ALL: Starting llvm::Function pass manager run
@@ -221,7 +220,6 @@
 ; CHECK-INVALIDATE-ALL: Invalidating analysis: NoOpModuleAnalysis
 ; CHECK-INVALIDATE-ALL: Running pass: RequireAnalysisPass
 ; CHECK-INVALIDATE-ALL: Running analysis: NoOpModuleAnalysis
-; CHECK-INVALIDATE-ALL: Finished llvm::Module pass manager run
 ; CHECK-INVALIDATE-ALL-NOT: Invalidating analysis: NoOpModuleAnalysis
 ; CHECK-INVALIDATE-ALL: Running pass: 

[PATCH] D85676: [NewPM] collapsing nested pass mangers of the same type

2020-10-04 Thread Yuanfang Chen via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG2c94d88e0769: [NewPM] collapsing nested pass mangers of the same type (authored by ychen).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85676/new/

https://reviews.llvm.org/D85676

Files:
  clang/test/CodeGen/thinlto-distributed-newpm.ll
  llvm/include/llvm/IR/PassManager.h
  llvm/test/Other/new-pass-manager.ll
  llvm/test/Other/new-pm-defaults.ll
  llvm/test/Other/new-pm-lto-defaults.ll
  llvm/test/Other/new-pm-thinlto-defaults.ll
  llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
  llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
  llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
  llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
  llvm/test/Other/pass-pipeline-parsing.ll

Index: llvm/test/Other/pass-pipeline-parsing.ll
===
--- llvm/test/Other/pass-pipeline-parsing.ll
+++ llvm/test/Other/pass-pipeline-parsing.ll
@@ -10,11 +10,9 @@
 ; RUN: -passes='module(no-op-module,no-op-module)' %s 2>&1 \
 ; RUN: | FileCheck %s --check-prefix=CHECK-NESTED-TWO-NOOP-MP
 ; CHECK-NESTED-TWO-NOOP-MP: Starting llvm::Module pass manager run
-; CHECK-NESTED-TWO-NOOP-MP: Starting llvm::Module pass manager run
 ; CHECK-NESTED-TWO-NOOP-MP: Running pass: NoOpModulePass
 ; CHECK-NESTED-TWO-NOOP-MP: Running pass: NoOpModulePass
 ; CHECK-NESTED-TWO-NOOP-MP: Finished llvm::Module pass manager run
-; CHECK-NESTED-TWO-NOOP-MP: Finished llvm::Module pass manager run
 
 ; RUN: opt -disable-output -debug-pass-manager \
 ; RUN: -passes=no-op-function,no-op-function %s 2>&1 \
@@ -112,7 +110,6 @@
 ; RUN: -passes='module(function(no-op-function),cgscc(no-op-cgscc,function(no-op-function),no-op-cgscc),function(no-op-function))' %s 2>&1 \
 ; RUN: | FileCheck %s --check-prefix=CHECK-NESTED-MP-CG-FP
 ; CHECK-NESTED-MP-CG-FP: Starting llvm::Module pass manager run
-; CHECK-NESTED-MP-CG-FP: Starting llvm::Module pass manager run
 ; CHECK-NESTED-MP-CG-FP: Starting llvm::Function pass manager run
 ; CHECK-NESTED-MP-CG-FP: Running pass: NoOpFunctionPass
 ; CHECK-NESTED-MP-CG-FP: Finished llvm::Function pass manager run
@@ -127,7 +124,6 @@
 ; CHECK-NESTED-MP-CG-FP: Running pass: NoOpFunctionPass
 ; CHECK-NESTED-MP-CG-FP: Finished llvm::Function pass manager run
 ; CHECK-NESTED-MP-CG-FP: Finished llvm::Module pass manager run
-; CHECK-NESTED-MP-CG-FP: Finished llvm::Module pass manager run
 
 ; RUN: opt -disable-output -debug-pass-manager \
 ; RUN: -passes='no-op-loop,no-op-loop' %s 2>&1 \
@@ -165,7 +161,6 @@
 ; RUN: -passes='module(no-op-function,no-op-loop,no-op-cgscc,cgscc(no-op-function,no-op-loop),function(no-op-loop))' %s 2>&1 \
 ; RUN: | FileCheck %s --check-prefix=CHECK-ADAPTORS
 ; CHECK-ADAPTORS: Starting llvm::Module pass manager run
-; CHECK-ADAPTORS: Starting llvm::Module pass manager run
 ; CHECK-ADAPTORS: Running pass: ModuleToFunctionPassAdaptor<{{.*}}NoOpFunctionPass>
 ; CHECK-ADAPTORS: Running pass: NoOpFunctionPass
 ; CHECK-ADAPTORS: Running pass: ModuleToFunctionPassAdaptor<{{.*}}FunctionToLoopPassAdaptor<{{.*}}NoOpLoopPass>{{.*}}>
@@ -187,7 +182,6 @@
 ; CHECK-ADAPTORS: Running pass: NoOpLoopPass on Loop at depth 1 containing: %loop
 ; CHECK-ADAPTORS: Finished llvm::Function pass manager run
 ; CHECK-ADAPTORS: Finished llvm::Module pass manager run
-; CHECK-ADAPTORS: Finished llvm::Module pass manager run
 
 ; RUN: opt -disable-output -debug-pass-manager \
 ; RUN: -passes='cgscc(print)' %s 2>&1 \
Index: llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
===
--- llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
+++ llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
@@ -26,12 +26,10 @@
 ; RUN: | FileCheck %s --check-prefixes=CHECK-DIS,CHECK-O,CHECK-O2,CHECK-O23SZ,CHECK-O123 --dump-input=fail
 ;
 ; CHECK-O: Starting {{.*}}Module pass manager run.
-; CHECK-O-NEXT: Starting {{.*}}Module pass manager run.
 ; CHECK-O-NEXT: Running pass: ForceFunctionAttrsPass
 ; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
 ; CHECK-O-NEXT: Running pass: AddDiscriminatorsPass
 ; CHECK-EP-PIPELINE-START-NEXT: Running pass: NoOpModulePass
-; CHECK-O-NEXT: Starting {{.*}}Module pass manager run.
 ; CHECK-O-NEXT: Running pass: InferFunctionAttrsPass
 ; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis
 ; CHECK-O-NEXT: Starting {{.*}}Function pass manager run.
@@ -171,9 +169,7 @@
 ; CHECK-O-NEXT: Finished {{.*}}Function pass manager run.
 ; CHECK-O-NEXT: Finished CGSCC pass manager run.
 ; CHECK-O-NEXT: Finished {{.*}}Module pass manager run.
-; CHECK-O-NEXT: Finished {{.*}}Module pass manager run.
 ; CHECK-O-NEXT: Running pass: GlobalOptPass
-; CHECK-O-NEXT: Finished {{.*}}Module pass manager run.
 ; CHECK-O-NEXT: 

[PATCH] D88594: [OpenMP] Add Error Handling for Conflicting Pointer Sizes for Target Offload

2020-10-04 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 updated this revision to Diff 296081.
jhuber6 added a comment.

Changed the lit substitution so that it targets this problem specifically. The
previous, more general substitution made the tests too unreadable and wasn't a
good solution, since it didn't detect 16-bit architectures anyway.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88594/new/

https://reviews.llvm.org/D88594

Files:
  clang/include/clang/Basic/DiagnosticDriverKinds.td
  clang/lib/Frontend/CompilerInvocation.cpp
  clang/test/OpenMP/distribute_parallel_for_if_codegen.cpp
  clang/test/OpenMP/distribute_parallel_for_num_threads_codegen.cpp
  clang/test/OpenMP/distribute_parallel_for_simd_if_codegen.cpp
  clang/test/OpenMP/distribute_parallel_for_simd_num_threads_codegen.cpp
  clang/test/OpenMP/nvptx_target_parallel_reduction_codegen_tbaa_PR46146.cpp
  clang/test/OpenMP/target_incompatible_architecture_messages.cpp
  clang/test/OpenMP/target_teams_distribute_parallel_for_if_codegen.cpp
  clang/test/OpenMP/target_teams_distribute_parallel_for_simd_if_codegen.cpp
  clang/test/OpenMP/teams_distribute_parallel_for_if_codegen.cpp
  clang/test/OpenMP/teams_distribute_parallel_for_simd_if_codegen.cpp
  llvm/utils/lit/lit/llvm/config.py

Index: llvm/utils/lit/lit/llvm/config.py
===
--- llvm/utils/lit/lit/llvm/config.py
+++ llvm/utils/lit/lit/llvm/config.py
@@ -456,6 +456,8 @@
   self.make_itanium_abi_triple(self.config.target_triple)))
 self.config.substitutions.append(('%ms_abi_triple',
   self.make_msabi_triple(self.config.target_triple)))
+self.config.substitutions.append(('%omp_powerpc_triple',
+  'powerpc' + str(sys.hash_info.width) + 'le-ibm-linux-gnu'))
 self.config.substitutions.append(
 ('%resource_dir', builtin_include_dir))
 
Index: clang/test/OpenMP/teams_distribute_parallel_for_simd_if_codegen.cpp
===
--- clang/test/OpenMP/teams_distribute_parallel_for_simd_if_codegen.cpp
+++ clang/test/OpenMP/teams_distribute_parallel_for_simd_if_codegen.cpp
@@ -1,16 +1,16 @@
-// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple %itanium_abi_triple -emit-llvm %s -o - | FileCheck %s --check-prefix CHECK --check-prefix OMP45
-// RUN: %clang_cc1 -fopenmp -fopenmp-version=45 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple %itanium_abi_triple -emit-pch -o %t %s
-// RUN: %clang_cc1 -fopenmp -fopenmp-version=45 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple %itanium_abi_triple -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s --check-prefix CHECK --check-prefix OMP45
-// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=50 -DOMP5 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple %itanium_abi_triple -emit-llvm %s -o - | FileCheck %s --check-prefix CHECK --check-prefix OMP50
-// RUN: %clang_cc1 -fopenmp -fopenmp-version=50 -DOMP5 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple %itanium_abi_triple -emit-pch -o %t %s
-// RUN: %clang_cc1 -fopenmp -fopenmp-version=50 -DOMP5 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple %itanium_abi_triple -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s --check-prefix CHECK --check-prefix OMP50
-
-// RUN: %clang_cc1 -verify -fopenmp-simd -fopenmp-version=45 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple %itanium_abi_triple -emit-llvm %s -o - | FileCheck --check-prefix SIMD-ONLY0 %s
-// RUN: %clang_cc1 -fopenmp-simd -fopenmp-version=45 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple %itanium_abi_triple -emit-pch -o %t %s
-// RUN: %clang_cc1 -fopenmp-simd -fopenmp-version=45 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple %itanium_abi_triple -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck --check-prefix SIMD-ONLY0 %s
-// RUN: %clang_cc1 -verify -fopenmp-simd -fopenmp-version=50 -DOMP5 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple %itanium_abi_triple -emit-llvm %s -o - | FileCheck --check-prefix SIMD-ONLY0 %s
-// RUN: %clang_cc1 -fopenmp-simd -fopenmp-version=50 -DOMP5 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple %itanium_abi_triple -emit-pch -o %t %s
-// RUN: %clang_cc1 -fopenmp-simd -fopenmp-version=50 -DOMP5 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple %itanium_abi_triple -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck --check-prefix SIMD-ONLY0 %s
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -fopenmp-targets=%omp_powerpc_triple -x c++ -triple %itanium_abi_triple -emit-llvm %s -o - | FileCheck %s --check-prefix CHECK --check-prefix OMP45
+// RUN: %clang_cc1 -fopenmp -fopenmp-version=45 -fopenmp-targets=%omp_powerpc_triple -x c++ 

[PATCH] D88789: [InstCombine] Revert rL226781 "Teach InstCombine to canonicalize loads which are only ever stored to always use a legal integer type if one is available." (PR47592)

2020-10-04 Thread Eli Friedman via Phabricator via cfe-commits
efriedma added a comment.

In D88789#2310606, @chandlerc wrote:

> FWIW, I still very much feel that this is the correct canonicalization, and 
> that downstream problems *must* be fixed downstream. Avoiding this 
> canonicalization doesn't actually fix them, it just makes us less *aware* of 
> the problems that still fundamentally exist. =[

I'd agree if we excluded all pointers from canonicalization.  But the semantics 
of inttoptr and inttoptr-equivalent memory operations are weird; in general, 
I'm not sure we can recover the original semantics of the code if we throw away 
the pointer-ness of pointer load/store operations.

To address the issue at hand, I think changing the isNonIntegralPointerType() 
check to just isPtrOrPtrVectorTy() would be enough.  I think that might make 
sense?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88789/new/

https://reviews.llvm.org/D88789



[clang] 6c6cd5f - [X86] Consolidate wide Key Locker intrinsics into the same header as the other Key Locker intrinsics.

2020-10-04 Thread Craig Topper via cfe-commits

Author: Craig Topper
Date: 2020-10-04T12:09:21-07:00
New Revision: 6c6cd5f8a9750865800ce26bdeacd8455db3

URL: 
https://github.com/llvm/llvm-project/commit/6c6cd5f8a9750865800ce26bdeacd8455db3
DIFF: 
https://github.com/llvm/llvm-project/commit/6c6cd5f8a9750865800ce26bdeacd8455db3.diff

LOG: [X86] Consolidate wide Key Locker intrinsics into the same header as the other Key Locker intrinsics.

Added: 


Modified: 
clang/lib/Headers/CMakeLists.txt
clang/lib/Headers/immintrin.h
clang/lib/Headers/keylockerintrin.h

Removed: 
clang/lib/Headers/keylocker_wide_intrin.h



diff  --git a/clang/lib/Headers/CMakeLists.txt b/clang/lib/Headers/CMakeLists.txt
index 8c12d5ab935d..95047e7069e7 100644
--- a/clang/lib/Headers/CMakeLists.txt
+++ b/clang/lib/Headers/CMakeLists.txt
@@ -73,7 +73,6 @@ set(files
   invpcidintrin.h
   iso646.h
   keylockerintrin.h
-  keylocker_wide_intrin.h
   limits.h
   lwpintrin.h
   lzcntintrin.h

diff  --git a/clang/lib/Headers/immintrin.h b/clang/lib/Headers/immintrin.h
index 1beade1be248..8fb5447a5919 100644
--- a/clang/lib/Headers/immintrin.h
+++ b/clang/lib/Headers/immintrin.h
@@ -472,15 +472,10 @@ _storebe_i64(void * __P, long long __D) {
 #endif
 
 #if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) ||  \
-defined(__KL__)
+defined(__KL__) || defined(__WIDEKL__)
 #include <keylockerintrin.h>
 #endif
 
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) ||  \
-defined(__WIDEKL__)
-#include <keylocker_wide_intrin.h>
-#endif
-
 #if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) ||  \
 defined(__AMXTILE__) || defined(__AMXINT8__) || defined(__AMXBF16__)
 #include <amxintrin.h>

diff  --git a/clang/lib/Headers/keylocker_wide_intrin.h b/clang/lib/Headers/keylocker_wide_intrin.h
deleted file mode 100644
index 9b6c9ccab811..
--- a/clang/lib/Headers/keylocker_wide_intrin.h
+++ /dev/null
@@ -1,259 +0,0 @@
-/*===-- keylocker_wide_intrin.h - KL_WIDE Intrinsics 
===
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- * THE SOFTWARE.
- *
- *===---===
- */
-
-#ifndef __IMMINTRIN_H
-#error "Never use <keylocker_wide_intrin.h> directly; include <immintrin.h> instead."
-#endif
-
-#ifndef _KEYLOCKERINTRIN_WIDE_H
-#define _KEYLOCKERINTRIN_WIDE_H
-
-/* Define the default attributes for the functions in this file. */
-#define __DEFAULT_FN_ATTRS \
-  __attribute__((__always_inline__, __nodebug__, __target__("kl,widekl"),\
- __min_vector_width__(128)))
-
-/// Encrypt __idata[0] to __idata[7] using 128-bit AES key indicated by handle
-/// at __h and store each resultant block back from __odata to __odata+7. And
-/// return the affected ZF flag status.
-///
-/// \headerfile <immintrin.h>
-///
-/// This intrinsic corresponds to the <c> AESENCWIDE128KL </c> instructions.
-///
-/// \operation
-/// Handle := MEM[__h+383:__h]
-/// IllegalHandle := ( HandleReservedBitSet (Handle[383:0]) ||
-///(Handle[127:0] AND (CPL > 0)) ||
-///Handle[255:128] ||
-///HandleKeyType (Handle[383:0]) != HANDLE_KEY_TYPE_AES128 )
-/// IF (IllegalHandle)
-///   ZF := 1
-/// ELSE
-///   (UnwrappedKey, Authentic) := UnwrapKeyAndAuthenticate384 (Handle[383:0], IWKey)
-///   IF Authentic == 0
-/// ZF := 1
-///   ELSE
-/// FOR i := 0 to 7
-///   __odata[i] := AES128Encrypt (__idata[i], UnwrappedKey)
-/// ENDFOR
-/// ZF := 0
-///   FI
-/// FI
-/// dst := ZF
-/// OF := 0
-/// SF := 0
-/// AF := 0
-/// PF := 0
-/// CF := 0
-/// \endoperation
-static __inline__ unsigned char __DEFAULT_FN_ATTRS
-_mm_aesencwide128kl_u8(__m128i __odata[8], const __m128i __idata[8], const void* __h) {
-  return __builtin_ia32_aesencwide128kl(__h,
-__odata,
-__odata + 1,
-  

[clang] a02b449 - [X86] Sync AESENC/DEC Key Locker builtins with gcc.

2020-10-04 Thread Craig Topper via cfe-commits

Author: Craig Topper
Date: 2020-10-04T12:09:41-07:00
New Revision: a02b449bb1556fe0f17b86eaa69f6bcda945d123

URL: 
https://github.com/llvm/llvm-project/commit/a02b449bb1556fe0f17b86eaa69f6bcda945d123
DIFF: 
https://github.com/llvm/llvm-project/commit/a02b449bb1556fe0f17b86eaa69f6bcda945d123.diff

LOG: [X86] Sync AESENC/DEC Key Locker builtins with gcc.

For the wide builtins, pass a single input and output pointer to
the builtins. Emit the GEPs and input loads from CGBuiltin.
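
The calling shape described in the log can be sketched as follows. This is only an illustration under stated assumptions: `Block128`, `transformBlock`, and `processWide` are hypothetical names, and the XOR is a stand-in for the real Key Locker operation. It shows a routine that takes one input pointer and one output pointer and derives the eight per-block loads and stores itself, rather than taking sixteen separate operands.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for one 128-bit block (<2 x i64> in the IR).
using Block128 = std::array<std::uint64_t, 2>;
constexpr std::size_t kWideBlocks = 8;

// Hypothetical per-block operation; a plain XOR, NOT real AES.
inline Block128 transformBlock(const Block128 &in, std::uint64_t key) {
  return {in[0] ^ key, in[1] ^ key};
}

// Takes a single input pointer and a single output pointer. The eight
// element accesses are computed here from the base pointers.
inline unsigned char processWide(Block128 *out, const Block128 *in,
                                 const void *handle) {
  std::uint64_t key;
  std::memcpy(&key, handle, sizeof(key)); // no alignment assumed for handle
  for (std::size_t i = 0; i < kWideBlocks; ++i)
    out[i] = transformBlock(in[i], key);
  return 0; // this sketch always reports success
}
```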

Added: 


Modified: 
clang/include/clang/Basic/BuiltinsX86.def
clang/lib/CodeGen/CGBuiltin.cpp
clang/lib/Headers/keylockerintrin.h
clang/test/CodeGen/X86/keylocker.c
llvm/test/CodeGen/X86/keylocker-intrinsics-fast-isel.ll

Removed: 




diff  --git a/clang/include/clang/Basic/BuiltinsX86.def b/clang/include/clang/Basic/BuiltinsX86.def
index c33026139b3c..8f9cfe4b6dc5 100644
--- a/clang/include/clang/Basic/BuiltinsX86.def
+++ b/clang/include/clang/Basic/BuiltinsX86.def
@@ -1902,22 +1902,16 @@ TARGET_BUILTIN(__builtin_ia32_enqcmds, "Ucv*vC*", "n", "enqcmd")
 
 // KEY LOCKER
 TARGET_BUILTIN(__builtin_ia32_loadiwkey, "vV2OiV2OiV2OiUi", "nV:128:", "kl")
-TARGET_BUILTIN(__builtin_ia32_encodekey128_u32,
-   "UiUiV2Oiv*", "nV:128:", "kl")
-TARGET_BUILTIN(__builtin_ia32_encodekey256_u32,
-   "UiUiV2OiV2Oiv*", "nV:128:", "kl")
-TARGET_BUILTIN(__builtin_ia32_aesenc128kl, "UcV2Oi*V2OivC*", "nV:128:", "kl")
-TARGET_BUILTIN(__builtin_ia32_aesenc256kl, "UcV2Oi*V2OivC*", "nV:128:", "kl")
-TARGET_BUILTIN(__builtin_ia32_aesdec128kl, "UcV2Oi*V2OivC*", "nV:128:", "kl")
-TARGET_BUILTIN(__builtin_ia32_aesdec256kl, "UcV2Oi*V2OivC*", "nV:128:", "kl")
-TARGET_BUILTIN(__builtin_ia32_aesencwide128kl,
-   "UcvC*V2Oi*V2Oi*V2Oi*V2Oi*V2Oi*V2Oi*V2Oi*V2Oi*V2OiV2OiV2OiV2OiV2OiV2OiV2OiV2Oi", "nV:128:", "kl,widekl")
-TARGET_BUILTIN(__builtin_ia32_aesencwide256kl,
-   "UcvC*V2Oi*V2Oi*V2Oi*V2Oi*V2Oi*V2Oi*V2Oi*V2Oi*V2OiV2OiV2OiV2OiV2OiV2OiV2OiV2Oi", "nV:128:", "kl,widekl")
-TARGET_BUILTIN(__builtin_ia32_aesdecwide128kl,
-   "UcvC*V2Oi*V2Oi*V2Oi*V2Oi*V2Oi*V2Oi*V2Oi*V2Oi*V2OiV2OiV2OiV2OiV2OiV2OiV2OiV2Oi", "nV:128:", "kl,widekl")
-TARGET_BUILTIN(__builtin_ia32_aesdecwide256kl,
-   "UcvC*V2Oi*V2Oi*V2Oi*V2Oi*V2Oi*V2Oi*V2Oi*V2Oi*V2OiV2OiV2OiV2OiV2OiV2OiV2OiV2Oi", "nV:128:", "kl,widekl")
+TARGET_BUILTIN(__builtin_ia32_encodekey128_u32, "UiUiV2Oiv*", "nV:128:", "kl")
+TARGET_BUILTIN(__builtin_ia32_encodekey256_u32, "UiUiV2OiV2Oiv*", "nV:128:", "kl")
+TARGET_BUILTIN(__builtin_ia32_aesenc128kl_u8, "UcV2Oi*V2OivC*", "nV:128:", "kl")
+TARGET_BUILTIN(__builtin_ia32_aesenc256kl_u8, "UcV2Oi*V2OivC*", "nV:128:", "kl")
+TARGET_BUILTIN(__builtin_ia32_aesdec128kl_u8, "UcV2Oi*V2OivC*", "nV:128:", "kl")
+TARGET_BUILTIN(__builtin_ia32_aesdec256kl_u8, "UcV2Oi*V2OivC*", "nV:128:", "kl")
+TARGET_BUILTIN(__builtin_ia32_aesencwide128kl_u8, "UcV2Oi*V2OiC*vC*", "nV:128:", "kl,widekl")
+TARGET_BUILTIN(__builtin_ia32_aesencwide256kl_u8, "UcV2Oi*V2OiC*vC*", "nV:128:", "kl,widekl")
+TARGET_BUILTIN(__builtin_ia32_aesdecwide128kl_u8, "UcV2Oi*V2OiC*vC*", "nV:128:", "kl,widekl")
+TARGET_BUILTIN(__builtin_ia32_aesdecwide256kl_u8, "UcV2Oi*V2OiC*vC*", "nV:128:", "kl,widekl")
 
 // SERIALIZE
 TARGET_BUILTIN(__builtin_ia32_serialize, "v", "n", "serialize")

diff  --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index d3603579844d..dc3cafa5d062 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -14070,75 +14070,67 @@ Value *CodeGenFunction::EmitX86BuiltinExpr(unsigned BuiltinID,
 
 return Builder.CreateExtractValue(Call, 0);
   }
-  case X86::BI__builtin_ia32_aesenc128kl:
-  case X86::BI__builtin_ia32_aesdec128kl:
-  case X86::BI__builtin_ia32_aesenc256kl:
-  case X86::BI__builtin_ia32_aesdec256kl:
-  case X86::BI__builtin_ia32_aesencwide128kl:
-  case X86::BI__builtin_ia32_aesdecwide128kl:
-  case X86::BI__builtin_ia32_aesencwide256kl:
-  case X86::BI__builtin_ia32_aesdecwide256kl: {
-int FirstReturnOp;
-int ResultCount;
-SmallVector InOps;
-unsigned ID;
-
+  case X86::BI__builtin_ia32_aesenc128kl_u8:
+  case X86::BI__builtin_ia32_aesdec128kl_u8:
+  case X86::BI__builtin_ia32_aesenc256kl_u8:
+  case X86::BI__builtin_ia32_aesdec256kl_u8: {
+Intrinsic::ID IID;
 switch (BuiltinID) {
-default: llvm_unreachable("Unsupported intrinsic!");
-case X86::BI__builtin_ia32_aesenc128kl:
-case X86::BI__builtin_ia32_aesdec128kl:
-case X86::BI__builtin_ia32_aesenc256kl:
-case X86::BI__builtin_ia32_aesdec256kl: {
-  InOps = {Ops[1], Ops[2]};
-  FirstReturnOp = 0;
-  ResultCount = 1;
-  switch (BuiltinID) {
-  case X86::BI__builtin_ia32_aesenc128kl:
-ID = Intrinsic::x86_aesenc128kl;
-break;
-  case X86::BI__builtin_ia32_aesdec128kl:
-ID = Intrinsic::x86_aesdec128kl;
-

[clang] 230c57b - [X86] Synchronize the encodekey builtins with gcc. Don't assume void* is 16 byte aligned.

2020-10-04 Thread Craig Topper via cfe-commits

Author: Craig Topper
Date: 2020-10-04T12:09:35-07:00
New Revision: 230c57b0bd8321085a5e0339baf37b509d5c76f6

URL: 
https://github.com/llvm/llvm-project/commit/230c57b0bd8321085a5e0339baf37b509d5c76f6
DIFF: 
https://github.com/llvm/llvm-project/commit/230c57b0bd8321085a5e0339baf37b509d5c76f6.diff

LOG: [X86] Synchronize the encodekey builtins with gcc. Don't assume void* is 16 byte aligned.

We were taking multiple pointer arguments in the builtin.
gcc accepts a single void*.

The cast from void* to _m128i* caused the IR generation to assume
the pointer was aligned.

Instead make the builtin take a single void*, emit i8* GEPs to
adjust then cast to <2 x i64>* and perform a store with align of 1.
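
The store pattern described above can be sketched in portable C++. This is an illustration only, not the code the patch emits: `Block128`, `storeBlocksUnaligned`, and `loadBlockUnaligned` are hypothetical names. The sketch assumes each result is a 16-byte block written at byte offset `i * 16` from an untyped pointer, using `std::memcpy` so that no alignment is assumed for the destination.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for one 128-bit result block (<2 x i64> in the IR).
using Block128 = std::array<std::uint64_t, 2>;

// Store `count` blocks at byte offsets 0, 16, 32, ... from `handle`.
// std::memcpy makes no alignment assumption about `handle`, which is the
// portable analogue of a store with an alignment of 1.
inline void storeBlocksUnaligned(void *handle, const Block128 *blocks,
                                 std::size_t count) {
  auto *bytes = static_cast<unsigned char *>(handle);
  for (std::size_t i = 0; i < count; ++i)
    std::memcpy(bytes + i * sizeof(Block128), &blocks[i], sizeof(Block128));
}

// Read one block back from a possibly unaligned location.
inline Block128 loadBlockUnaligned(const void *handle, std::size_t index) {
  Block128 out;
  const auto *bytes = static_cast<const unsigned char *>(handle);
  std::memcpy(&out, bytes + index * sizeof(Block128), sizeof(Block128));
  return out;
}
```

Casting the `void *` to a vector pointer type and dereferencing it would instead let the compiler assume 16-byte alignment, which is the problem the log describes.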

Added: 
llvm/test/CodeGen/X86/keylocker-intrinsics-fast-isel.ll

Modified: 
clang/include/clang/Basic/BuiltinsX86.def
clang/lib/CodeGen/CGBuiltin.cpp
clang/lib/Headers/keylockerintrin.h
clang/test/CodeGen/X86/keylocker.c

Removed: 




diff  --git a/clang/include/clang/Basic/BuiltinsX86.def b/clang/include/clang/Basic/BuiltinsX86.def
index 1fbc950998a1..c33026139b3c 100644
--- a/clang/include/clang/Basic/BuiltinsX86.def
+++ b/clang/include/clang/Basic/BuiltinsX86.def
@@ -1902,10 +1902,10 @@ TARGET_BUILTIN(__builtin_ia32_enqcmds, "Ucv*vC*", "n", "enqcmd")
 
 // KEY LOCKER
 TARGET_BUILTIN(__builtin_ia32_loadiwkey, "vV2OiV2OiV2OiUi", "nV:128:", "kl")
-TARGET_BUILTIN(__builtin_ia32_encodekey128,
-   "UiUiV2OiV2Oi*V2Oi*V2Oi*V2Oi*V2Oi*V2Oi*", "nV:128:", "kl")
-TARGET_BUILTIN(__builtin_ia32_encodekey256,
-   "UiUiV2OiV2OiV2Oi*V2Oi*V2Oi*V2Oi*V2Oi*V2Oi*V2Oi*", "nV:128:", "kl")
+TARGET_BUILTIN(__builtin_ia32_encodekey128_u32,
+   "UiUiV2Oiv*", "nV:128:", "kl")
+TARGET_BUILTIN(__builtin_ia32_encodekey256_u32,
+   "UiUiV2OiV2Oiv*", "nV:128:", "kl")
 TARGET_BUILTIN(__builtin_ia32_aesenc128kl, "UcV2Oi*V2OivC*", "nV:128:", "kl")
 TARGET_BUILTIN(__builtin_ia32_aesenc256kl, "UcV2Oi*V2OivC*", "nV:128:", "kl")
 TARGET_BUILTIN(__builtin_ia32_aesdec128kl, "UcV2Oi*V2OivC*", "nV:128:", "kl")

diff  --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index e5f6ee138a21..d3603579844d 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -14039,8 +14039,37 @@ Value *CodeGenFunction::EmitX86BuiltinExpr(unsigned BuiltinID,
   case X86::BI__builtin_ia32_psubusb128:
   case X86::BI__builtin_ia32_psubusw128:
 return EmitX86BinaryIntrinsic(*this, Ops, Intrinsic::usub_sat);
-  case X86::BI__builtin_ia32_encodekey128:
-  case X86::BI__builtin_ia32_encodekey256:
+  case X86::BI__builtin_ia32_encodekey128_u32: {
+Intrinsic::ID IID = Intrinsic::x86_encodekey128;
+
+Value *Call = Builder.CreateCall(CGM.getIntrinsic(IID), {Ops[0], Ops[1]});
+
+for (int i = 0; i < 6; ++i) {
+  Value *Extract = Builder.CreateExtractValue(Call, i + 1);
+  Value *Ptr = Builder.CreateConstGEP1_32(Ops[2], i * 16);
+  Ptr = Builder.CreateBitCast(
+  Ptr, llvm::PointerType::getUnqual(Extract->getType()));
+  Builder.CreateAlignedStore(Extract, Ptr, Align(1));
+}
+
+return Builder.CreateExtractValue(Call, 0);
+  }
+  case X86::BI__builtin_ia32_encodekey256_u32: {
+Intrinsic::ID IID = Intrinsic::x86_encodekey256;
+
+Value *Call =
+Builder.CreateCall(CGM.getIntrinsic(IID), {Ops[0], Ops[1], Ops[2]});
+
+for (int i = 0; i < 7; ++i) {
+  Value *Extract = Builder.CreateExtractValue(Call, i + 1);
+  Value *Ptr = Builder.CreateConstGEP1_32(Ops[3], i * 16);
+  Ptr = Builder.CreateBitCast(
+  Ptr, llvm::PointerType::getUnqual(Extract->getType()));
+  Builder.CreateAlignedStore(Extract, Ptr, Align(1));
+}
+
+return Builder.CreateExtractValue(Call, 0);
+  }
   case X86::BI__builtin_ia32_aesenc128kl:
   case X86::BI__builtin_ia32_aesdec128kl:
   case X86::BI__builtin_ia32_aesenc256kl:
@@ -14056,18 +14085,6 @@ Value *CodeGenFunction::EmitX86BuiltinExpr(unsigned BuiltinID,
 
 switch (BuiltinID) {
 default: llvm_unreachable("Unsupported intrinsic!");
-case X86::BI__builtin_ia32_encodekey128:
-  ID = Intrinsic::x86_encodekey128;
-  InOps = {Ops[0], Ops[1]};
-  FirstReturnOp = 2;
-  ResultCount = 6;
-  break;
-case X86::BI__builtin_ia32_encodekey256:
-  ID = Intrinsic::x86_encodekey256;
-  InOps = {Ops[0], Ops[1], Ops[2]};
-  FirstReturnOp = 3;
-  ResultCount = 7;
-  break;
 case X86::BI__builtin_ia32_aesenc128kl:
 case X86::BI__builtin_ia32_aesdec128kl:
 case X86::BI__builtin_ia32_aesenc256kl:

diff  --git a/clang/lib/Headers/keylockerintrin.h b/clang/lib/Headers/keylockerintrin.h
index 718771c869cc..c31ba16122a5 100644
--- a/clang/lib/Headers/keylockerintrin.h
+++ b/clang/lib/Headers/keylockerintrin.h
@@ -132,15 +132,7 @@ _mm_loadiwkey (unsigned int __ctl, __m128i __intkey,
 /// \endoperation
 static __inline__ 

[clang] 28595cb - [X86] Synchronize the loadiwkey builtin operand order with gcc version.

2020-10-04 Thread Craig Topper via cfe-commits

Author: Craig Topper
Date: 2020-10-04T12:09:29-07:00
New Revision: 28595cbbeb2cc75584410b8b974f67ec99a853f2

URL: 
https://github.com/llvm/llvm-project/commit/28595cbbeb2cc75584410b8b974f67ec99a853f2
DIFF: 
https://github.com/llvm/llvm-project/commit/28595cbbeb2cc75584410b8b974f67ec99a853f2.diff

LOG: [X86] Synchronize the loadiwkey builtin operand order with gcc version.

Added: 


Modified: 
clang/include/clang/Basic/BuiltinsX86.def
clang/lib/Headers/keylockerintrin.h
llvm/include/llvm/IR/IntrinsicsX86.td
llvm/lib/Target/X86/X86InstrKL.td
llvm/test/CodeGen/X86/keylocker-intrinsics.ll

Removed: 




diff  --git a/clang/include/clang/Basic/BuiltinsX86.def b/clang/include/clang/Basic/BuiltinsX86.def
index e212d0a2a0cc..1fbc950998a1 100644
--- a/clang/include/clang/Basic/BuiltinsX86.def
+++ b/clang/include/clang/Basic/BuiltinsX86.def
@@ -1901,7 +1901,7 @@ TARGET_BUILTIN(__builtin_ia32_enqcmd, "Ucv*vC*", "n", "enqcmd")
 TARGET_BUILTIN(__builtin_ia32_enqcmds, "Ucv*vC*", "n", "enqcmd")
 
 // KEY LOCKER
-TARGET_BUILTIN(__builtin_ia32_loadiwkey, "vUiV2OiV2OiV2Oi", "nV:128:", "kl")
+TARGET_BUILTIN(__builtin_ia32_loadiwkey, "vV2OiV2OiV2OiUi", "nV:128:", "kl")
 TARGET_BUILTIN(__builtin_ia32_encodekey128,
"UiUiV2OiV2Oi*V2Oi*V2Oi*V2Oi*V2Oi*V2Oi*", "nV:128:", "kl")
 TARGET_BUILTIN(__builtin_ia32_encodekey256,

diff  --git a/clang/lib/Headers/keylockerintrin.h b/clang/lib/Headers/keylockerintrin.h
index 2d6a1ca5851f..718771c869cc 100644
--- a/clang/lib/Headers/keylockerintrin.h
+++ b/clang/lib/Headers/keylockerintrin.h
@@ -95,7 +95,7 @@
 static __inline__ void __DEFAULT_FN_ATTRS
 _mm_loadiwkey (unsigned int __ctl, __m128i __intkey,
__m128i __enkey_lo, __m128i __enkey_hi) {
-  __builtin_ia32_loadiwkey (__ctl, __intkey, __enkey_lo, __enkey_hi);
+  __builtin_ia32_loadiwkey (__intkey, __enkey_lo, __enkey_hi, __ctl);
 }
 
 /// Wrap a 128-bit AES key from __key into a key handle and output in

diff  --git a/llvm/include/llvm/IR/IntrinsicsX86.td b/llvm/include/llvm/IR/IntrinsicsX86.td
index 5708a761919f..8546dc34 100644
--- a/llvm/include/llvm/IR/IntrinsicsX86.td
+++ b/llvm/include/llvm/IR/IntrinsicsX86.td
@@ -4953,7 +4953,7 @@ let TargetPrefix = "x86" in {
 // Key Locker
 let TargetPrefix = "x86" in {
   def int_x86_loadiwkey : GCCBuiltin<"__builtin_ia32_loadiwkey">,
-  Intrinsic<[], [llvm_i32_ty, llvm_v2i64_ty, llvm_v2i64_ty, llvm_v2i64_ty],
+  Intrinsic<[], [llvm_v2i64_ty, llvm_v2i64_ty, llvm_v2i64_ty, llvm_i32_ty],
 []>;
   def int_x86_encodekey128 :
   Intrinsic<[llvm_i32_ty, llvm_v2i64_ty, llvm_v2i64_ty,

diff  --git a/llvm/lib/Target/X86/X86InstrKL.td b/llvm/lib/Target/X86/X86InstrKL.td
index aa7df4256cec..7a7e6467ae97 100644
--- a/llvm/lib/Target/X86/X86InstrKL.td
+++ b/llvm/lib/Target/X86/X86InstrKL.td
@@ -20,7 +20,7 @@ let SchedRW = [WriteSystem], Predicates = [HasKL] in {
   let Uses = [XMM0, EAX], Defs = [EFLAGS] in {
 def LOADIWKEY : I<0xDC, MRMSrcReg, (outs), (ins VR128:$src1, VR128:$src2),
   "loadiwkey\t{$src2, $src1|$src1, $src2}",
-  [(int_x86_loadiwkey EAX, XMM0, VR128:$src1, VR128:$src2)]>, T8XS;
+  [(int_x86_loadiwkey XMM0, VR128:$src1, VR128:$src2, EAX)]>, T8XS;
   }
 
   let Uses = [XMM0], Defs = [XMM0, XMM1, XMM2, XMM4, XMM5, XMM6, EFLAGS] in {

diff  --git a/llvm/test/CodeGen/X86/keylocker-intrinsics.ll b/llvm/test/CodeGen/X86/keylocker-intrinsics.ll
index 584391f2eafd..2f9797e437b7 100644
--- a/llvm/test/CodeGen/X86/keylocker-intrinsics.ll
+++ b/llvm/test/CodeGen/X86/keylocker-intrinsics.ll
@@ -4,7 +4,7 @@
 ; RUN: llc < %s -verify-machineinstrs -mtriple=x86_64-unkown-unknown -mattr=+widekl | FileCheck %s --check-prefix=X64
 ; RUN: llc < %s -verify-machineinstrs -mtriple=i386-unkown-unknown -mattr=+widekl -mattr=+avx2 | FileCheck %s --check-prefix=X32
 
-declare void @llvm.x86.loadiwkey(i32, <2 x i64>, <2 x i64>, <2 x i64>)
+declare void @llvm.x86.loadiwkey(<2 x i64>, <2 x i64>, <2 x i64>, i32)
 declare { i32, <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> } @llvm.x86.encodekey128(i32, <2 x i64>)
 declare { i32, <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64>, <2 x i64> } @llvm.x86.encodekey256(i32, <2 x i64>, <2 x i64>)
 declare { i8, <2 x i64> } @llvm.x86.aesenc128kl(<2 x i64>, i8*)
@@ -29,7 +29,7 @@ define void @test_loadiwkey(i32 %ctl, <2 x i64> %intkey, <2 x i64> %enkey_lo, <2
 ; X32-NEXT:loadiwkey %xmm2, %xmm1
 ; X32-NEXT:retl
 entry:
-  tail call void @llvm.x86.loadiwkey(i32 %ctl, <2 x i64> %intkey, <2 x i64> %enkey_lo, <2 x i64> %enkey_hi)
+  tail call void @llvm.x86.loadiwkey(<2 x i64> %intkey, <2 x i64> %enkey_lo, <2 x i64> %enkey_hi, i32 %ctl)
   ret void
 }
 



___
cfe-commits mailing list
cfe-commits@lists.llvm.org

[PATCH] D87989: [Flang][Driver] Add infrastructure for basic frontend actions and file I/O

2020-10-04 Thread Caroline via Phabricator via cfe-commits
CarolineConcatto marked 7 inline comments as done.
CarolineConcatto added a comment.

@awarzynski patch updated.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87989/new/

https://reviews.llvm.org/D87989



[PATCH] D87989: [Flang][Driver] Add infrastructure for basic frontend actions and file I/O

2020-10-04 Thread Caroline via Phabricator via cfe-commits
CarolineConcatto updated this revision to Diff 296066.
CarolineConcatto edited the summary of this revision.
CarolineConcatto added a comment.

Address review comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87989/new/

https://reviews.llvm.org/D87989

Files:
  clang/include/clang/Driver/Options.h
  clang/include/clang/Driver/Options.td
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/Flang.cpp
  clang/lib/Driver/Types.cpp
  clang/test/Driver/immediate-options.c
  flang/include/flang/Frontend/CompilerInstance.h
  flang/include/flang/Frontend/CompilerInvocation.h
  flang/include/flang/Frontend/FrontendAction.h
  flang/include/flang/Frontend/FrontendActions.h
  flang/include/flang/Frontend/FrontendOptions.h
  flang/include/flang/FrontendTool/Utils.h
  flang/lib/Frontend/CMakeLists.txt
  flang/lib/Frontend/CompilerInstance.cpp
  flang/lib/Frontend/CompilerInvocation.cpp
  flang/lib/Frontend/FrontendAction.cpp
  flang/lib/Frontend/FrontendActions.cpp
  flang/lib/Frontend/FrontendOptions.cpp
  flang/lib/FrontendTool/CMakeLists.txt
  flang/lib/FrontendTool/ExecuteCompilerInvocation.cpp
  flang/test/Flang-Driver/driver-help-hidden.f90
  flang/test/Flang-Driver/driver-help.f90
  flang/test/Flang-Driver/emit-obj.f90
  flang/test/Frontend/Inputs/hello-world.f90
  flang/test/Frontend/input-output-file.f90
  flang/test/Frontend/multiple-input-files.f90
  flang/test/lit.cfg.py
  flang/tools/flang-driver/fc1_main.cpp
  flang/unittests/Frontend/CMakeLists.txt
  flang/unittests/Frontend/CompilerInstanceTest.cpp
  flang/unittests/Frontend/InputOutputTest.cpp
  llvm/include/llvm/Option/OptTable.h

Index: llvm/include/llvm/Option/OptTable.h
===
--- llvm/include/llvm/Option/OptTable.h
+++ llvm/include/llvm/Option/OptTable.h
@@ -243,7 +243,8 @@
   /// \param Usage - USAGE: Usage
   /// \param Title - OVERVIEW: Title
   /// \param FlagsToInclude - If non-zero, only include options with any
-  /// of these flags set.
+  /// of these flags set. Takes precedence over
+  /// FlagsToExclude.
   /// \param FlagsToExclude - Exclude options with any of these flags set.
   /// \param ShowAllAliases - If true, display all options including aliases
   /// that don't have help texts. By default, we display
Index: flang/unittests/Frontend/InputOutputTest.cpp
===
--- /dev/null
+++ flang/unittests/Frontend/InputOutputTest.cpp
@@ -0,0 +1,72 @@
+//===- unittests/Frontend/OutputStreamTest.cpp --- FrontendAction tests --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "flang/Frontend/CompilerInstance.h"
+#include "flang/Frontend/CompilerInvocation.h"
+#include "flang/Frontend/FrontendOptions.h"
+#include "flang/FrontendTool/Utils.h"
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/raw_ostream.h"
+#include "gtest/gtest.h"
+
+using namespace Fortran::frontend;
+
+namespace {
+
+TEST(FrontendAction, TestInputOutputTestAction) {
+  std::string inputFile = "io-file-test.f";
+  std::error_code ec;
+
+  // 1. Create the input file for the file manager
+  // AllSources (which is used to manage files inside every compiler instance),
+  // works with paths. This means that it requires a physical file. Create one.
+  std::unique_ptr<llvm::raw_fd_ostream> os{
+  new llvm::raw_fd_ostream(inputFile, ec, llvm::sys::fs::OF_None)};
+  if (ec)
+FAIL() << "Failed to create the input file";
+
+  // Populate the input file with the pre-defined input and flush it.
+  *(os) << "End Program arithmetic";
+  os.reset();
+
+  // Get the path of the input file
+  llvm::SmallString<64> cwd;
+  if (std::error_code ec = llvm::sys::fs::current_path(cwd))
+FAIL() << "Failed to obtain the current working directory";
+  std::string testFilePath(cwd.c_str());
+  testFilePath += "/" + inputFile;
+
+  // 2. Prepare the compiler (CompilerInvocation + CompilerInstance)
+  CompilerInstance compInst;
+  compInst.CreateDiagnostics();
+  auto invocation = std::make_shared<CompilerInvocation>();
+  invocation->GetFrontendOpts().programAction_ = InputOutputTest;
+  compInst.SetInvocation(std::move(invocation));
+  compInst.GetFrontendOpts().inputs_.push_back(
+  FrontendInputFile(/*File=*/testFilePath, Language::Fortran));
+
+  // 3. Set-up the output stream. Using output buffer wrapped as an output
+  // stream, as opposed to an actual file (or a file descriptor).
+  llvm::SmallVector outputFileBuffer;
+  std::unique_ptr outputFileStream(
+  new llvm::raw_svector_ostream(outputFileBuffer));
+  compInst.SetOutputStream(std::move(outputFileStream));
+
+  // 

[PATCH] D88394: [Driver][M68K] (Patch 8/8) Add driver support for M68K

2020-10-04 Thread Min-Yih Hsu via Phabricator via cfe-commits
myhsu updated this revision to Diff 296065.
myhsu added a comment.

- Use the canonical CPU name (i.e. names starting with an uppercase 'M')
- Tell the driver to use integrated assembler (i.e. MC) by default
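
The name canonicalization in the first item can be sketched as below. This is a hedged illustration, not the driver code in this patch: `canonicalizeM68kCPUName` is a hypothetical helper, and the three accepted spellings are taken from the `-mcpu=` cases in `m680x0-sub-archs.cpp`.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: map "68000", "m68000" and "M68000" to "M68000".
// Any other spelling is returned unchanged.
inline std::string canonicalizeM68kCPUName(const std::string &name) {
  if (name.empty())
    return name;
  if (name[0] == 'm')
    return "M" + name.substr(1); // lower-case prefix -> upper-case prefix
  if (std::isdigit(static_cast<unsigned char>(name[0])))
    return "M" + name; // bare model number -> add the prefix
  return name; // already canonical, or not recognised
}
```

With this sketch, all three spellings exercised by the tests yield the same `-target-cpu` value.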


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88394/new/

https://reviews.llvm.org/D88394

Files:
  clang/include/clang/Driver/Options.td
  clang/lib/Driver/CMakeLists.txt
  clang/lib/Driver/ToolChains/Arch/M680x0.cpp
  clang/lib/Driver/ToolChains/Arch/M680x0.h
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/lib/Driver/ToolChains/CommonArgs.cpp
  clang/lib/Driver/ToolChains/Gnu.cpp
  clang/lib/Driver/ToolChains/Linux.cpp
  clang/test/Driver/m680x0-features.cpp
  clang/test/Driver/m680x0-sub-archs.cpp

Index: clang/test/Driver/m680x0-sub-archs.cpp
===
--- /dev/null
+++ clang/test/Driver/m680x0-sub-archs.cpp
@@ -0,0 +1,29 @@
+// RUN: %clang -### -target m680x0-unknown-linux -mcpu=68000 %s 2>&1 | FileCheck --check-prefix=CHECK-MX00 %s
+// RUN: %clang -### -target m680x0-unknown-linux -mcpu=m68000 %s 2>&1 | FileCheck --check-prefix=CHECK-MX00 %s
+// RUN: %clang -### -target m680x0-unknown-linux -mcpu=M68000 %s 2>&1 | FileCheck --check-prefix=CHECK-MX00 %s
+// RUN: %clang -### -target m680x0-unknown-linux -m68000 %s 2>&1 | FileCheck --check-prefix=CHECK-MX00 %s
+// CHECK-MX00: "-target-cpu" "M68000"
+
+// RUN: %clang -### -target m680x0-unknown-linux -mcpu=68010 %s 2>&1 | FileCheck --check-prefix=CHECK-MX10 %s
+// RUN: %clang -### -target m680x0-unknown-linux -mcpu=m68010 %s 2>&1 | FileCheck --check-prefix=CHECK-MX10 %s
+// RUN: %clang -### -target m680x0-unknown-linux -mcpu=M68010 %s 2>&1 | FileCheck --check-prefix=CHECK-MX10 %s
+// RUN: %clang -### -target m680x0-unknown-linux -m68010 %s 2>&1 | FileCheck --check-prefix=CHECK-MX10 %s
+// CHECK-MX10: "-target-cpu" "M68010"
+
+// RUN: %clang -### -target m680x0-unknown-linux -mcpu=68020 %s 2>&1 | FileCheck --check-prefix=CHECK-MX20 %s
+// RUN: %clang -### -target m680x0-unknown-linux -mcpu=m68020 %s 2>&1 | FileCheck --check-prefix=CHECK-MX20 %s
+// RUN: %clang -### -target m680x0-unknown-linux -mcpu=M68020 %s 2>&1 | FileCheck --check-prefix=CHECK-MX20 %s
+// RUN: %clang -### -target m680x0-unknown-linux -m68020 %s 2>&1 | FileCheck --check-prefix=CHECK-MX20 %s
+// CHECK-MX20: "-target-cpu" "M68020"
+
+// RUN: %clang -### -target m680x0-unknown-linux -mcpu=68030 %s 2>&1 | FileCheck --check-prefix=CHECK-MX30 %s
+// RUN: %clang -### -target m680x0-unknown-linux -mcpu=m68030 %s 2>&1 | FileCheck --check-prefix=CHECK-MX30 %s
+// RUN: %clang -### -target m680x0-unknown-linux -mcpu=M68030 %s 2>&1 | FileCheck --check-prefix=CHECK-MX30 %s
+// RUN: %clang -### -target m680x0-unknown-linux -m68030 %s 2>&1 | FileCheck --check-prefix=CHECK-MX30 %s
+// CHECK-MX30: "-target-cpu" "M68030"
+
+// RUN: %clang -### -target m680x0-unknown-linux -mcpu=68040 %s 2>&1 | FileCheck --check-prefix=CHECK-MX40 %s
+// RUN: %clang -### -target m680x0-unknown-linux -mcpu=m68040 %s 2>&1 | FileCheck --check-prefix=CHECK-MX40 %s
+// RUN: %clang -### -target m680x0-unknown-linux -mcpu=M68040 %s 2>&1 | FileCheck --check-prefix=CHECK-MX40 %s
+// RUN: %clang -### -target m680x0-unknown-linux -m68040 %s 2>&1 | FileCheck --check-prefix=CHECK-MX40 %s
+// CHECK-MX40: "-target-cpu" "M68040"
Index: clang/test/Driver/m680x0-features.cpp
===
--- /dev/null
+++ clang/test/Driver/m680x0-features.cpp
@@ -0,0 +1,37 @@
+// Check macro definitions
+// RUN: %clang -target m680x0-unknown-linux -m68000 -dM -E %s | FileCheck --check-prefix=CHECK-MX %s
+// CHECK-MX: #define __mc68000 1
+// CHECK-MX: #define __mc68000__ 1
+// CHECK-MX: #define mc68000 1
+
+// RUN: %clang -target m680x0-unknown-linux -m68010 -dM -E %s | FileCheck --check-prefix=CHECK-MX10 %s
+// CHECK-MX10: #define __mc68000 1
+// CHECK-MX10: #define __mc68000__ 1
+// CHECK-MX10: #define __mc68010 1
+// CHECK-MX10: #define __mc68010__ 1
+// CHECK-MX10: #define mc68000 1
+// CHECK-MX10: #define mc68010 1
+
+// RUN: %clang -target m680x0-unknown-linux -m68020 -dM -E %s | FileCheck --check-prefix=CHECK-MX20 %s
+// CHECK-MX20: #define __mc68000 1
+// CHECK-MX20: #define __mc68000__ 1
+// CHECK-MX20: #define __mc68020 1
+// CHECK-MX20: #define __mc68020__ 1
+// CHECK-MX20: #define mc68000 1
+// CHECK-MX20: #define mc68020 1
+
+// RUN: %clang -target m680x0-unknown-linux -m68030 -dM -E %s | FileCheck --check-prefix=CHECK-MX30 %s
+// CHECK-MX30: #define __mc68000 1
+// CHECK-MX30: #define __mc68000__ 1
+// CHECK-MX30: #define __mc68030 1
+// CHECK-MX30: #define __mc68030__ 1
+// CHECK-MX30: #define mc68000 1
+// CHECK-MX30: #define mc68030 1
+
+// RUN: %clang -target m680x0-unknown-linux -m68040 -dM -E %s | FileCheck --check-prefix=CHECK-MX40 %s
+// CHECK-MX40: #define __mc68000 1
+// CHECK-MX40: #define __mc68000__ 1
+// CHECK-MX40: #define __mc68040 1
+// CHECK-MX40: #define __mc68040__ 1
+// CHECK-MX40: #define mc68000 1
+// 

[PATCH] D88393: [cfe][M68K] (Patch 7/8) Basic Clang support

2020-10-04 Thread Min-Yih Hsu via Phabricator via cfe-commits
myhsu updated this revision to Diff 296064.
myhsu added a comment.

Fix the CPU name passed to the backend.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88393/new/

https://reviews.llvm.org/D88393

Files:
  clang/include/clang/Basic/Attr.td
  clang/lib/Basic/CMakeLists.txt
  clang/lib/Basic/Targets.cpp
  clang/lib/Basic/Targets/M680x0.cpp
  clang/lib/Basic/Targets/M680x0.h
  clang/lib/CodeGen/TargetInfo.cpp
  clang/lib/Sema/SemaDeclAttr.cpp

Index: clang/lib/Sema/SemaDeclAttr.cpp
===
--- clang/lib/Sema/SemaDeclAttr.cpp
+++ clang/lib/Sema/SemaDeclAttr.cpp
@@ -5797,6 +5797,39 @@
   D->addAttr(::new (S.Context) MipsInterruptAttr(S.Context, AL, Kind));
 }
 
+static void handleM680x0InterruptAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
+  if (!checkAttributeNumArgs(S, AL, 1))
+return;
+
+  if (!AL.isArgExpr(0)) {
+S.Diag(AL.getLoc(), diag::err_attribute_argument_type)
+<< AL << AANT_ArgumentIntegerConstant;
+return;
+  }
+
+  // FIXME: Check for decl - it should be void ()(void).
+
+  Expr *NumParamsExpr = static_cast<Expr *>(AL.getArgAsExpr(0));
+  auto MaybeNumParams = NumParamsExpr->getIntegerConstantExpr(S.Context);
+  if (!MaybeNumParams) {
+S.Diag(AL.getLoc(), diag::err_attribute_argument_type)
+<< AL << AANT_ArgumentIntegerConstant
+<< NumParamsExpr->getSourceRange();
+return;
+  }
+
+  unsigned Num = MaybeNumParams->getLimitedValue(255);
+  if ((Num & 1) || Num > 30) {
+S.Diag(AL.getLoc(), diag::err_attribute_argument_out_of_bounds)
+<< AL << (int)MaybeNumParams->getSExtValue()
+<< NumParamsExpr->getSourceRange();
+return;
+  }
+
+  D->addAttr(::new (S.Context) M680x0InterruptAttr(S.Context, AL, Num));
+  D->addAttr(UsedAttr::CreateImplicit(S.Context));
+}
+
 static void handleAnyX86InterruptAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
   // Semantic checks for a function with the 'interrupt' attribute.
   // a) Must be a function.
@@ -6069,6 +6102,9 @@
   case llvm::Triple::mips:
 handleMipsInterruptAttr(S, D, AL);
 break;
+  case llvm::Triple::m680x0:
+handleM680x0InterruptAttr(S, D, AL);
+break;
   case llvm::Triple::x86:
   case llvm::Triple::x86_64:
 handleAnyX86InterruptAttr(S, D, AL);
Index: clang/lib/CodeGen/TargetInfo.cpp
===
--- clang/lib/CodeGen/TargetInfo.cpp
+++ clang/lib/CodeGen/TargetInfo.cpp
@@ -8064,6 +8064,45 @@
   return false;
 }
 
+//===--===//
+// M680x0 ABI Implementation
+//===--===//
+
+namespace {
+
+class M680x0TargetCodeGenInfo : public TargetCodeGenInfo {
+public:
+  M680x0TargetCodeGenInfo(CodeGenTypes &CGT)
+  : TargetCodeGenInfo(std::make_unique(CGT)) {}
+  void setTargetAttributes(const Decl *D, llvm::GlobalValue *GV,
+   CodeGen::CodeGenModule ) const override;
+};
+
+} // namespace
+
+// TODO Does not actually work right now
+void M680x0TargetCodeGenInfo::setTargetAttributes(
+const Decl *D, llvm::GlobalValue *GV, CodeGen::CodeGenModule ) const {
+  if (const FunctionDecl *FD = dyn_cast_or_null<FunctionDecl>(D)) {
+if (const M680x0InterruptAttr *attr = FD->getAttr<M680x0InterruptAttr>()) {
+  // Handle 'interrupt' attribute:
+  llvm::Function *F = cast<llvm::Function>(GV);
+
+  // Step 1: Set ISR calling convention.
+  F->setCallingConv(llvm::CallingConv::M680x0_INTR);
+
+  // Step 2: Add attributes goodness.
+  F->addFnAttr(llvm::Attribute::NoInline);
+
+  // ??? is this right
+  // Step 3: Emit ISR vector alias.
+  unsigned Num = attr->getNumber() / 2;
+  llvm::GlobalAlias::create(llvm::Function::ExternalLinkage,
+"__isr_" + Twine(Num), F);
+}
+  }
+}
+
 //===--===//
 // AVR ABI Implementation.
 //===--===//
Index: clang/lib/Basic/Targets/M680x0.h
===
--- /dev/null
+++ clang/lib/Basic/Targets/M680x0.h
@@ -0,0 +1,57 @@
+//===--- M680x0.h - Declare M680x0 target feature support ---*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This file declares M680x0 TargetInfo objects.
+//
+//===--===//
+
+#ifndef M680X0_H_LTNCIPAD
+#define M680X0_H_LTNCIPAD
+
+#include "OSTargets.h"
+#include "clang/Basic/TargetInfo.h"
+#include "clang/Basic/TargetOptions.h"
+#include "llvm/ADT/Triple.h"
+#include "llvm/Support/Compiler.h"
+
+namespace clang {

[PATCH] D88789: [InstCombine] Revert rL226781 "Teach InstCombine to canonicalize loads which are only ever stored to always use a legal integer type if one is available." (PR47592)

2020-10-04 Thread Roman Lebedev via Phabricator via cfe-commits
lebedev.ri updated this revision to Diff 296055.
lebedev.ri added a comment.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Rebase/fix remaining tests.

In D88789#2310441, @nlopes wrote:

> Not introducing inttoptr during optimization is a very healthy goal.

Thank you for pointing that out.
Indeed, that is very precisely my goal here.

In D88789#2310593, @nikic wrote:

>> as it was rightfully pointed out, that is very much not compile-time free: 
>> https://llvm-compile-time-tracker.com/compare.php?from=871d03a6751e0f82e210c80a881ef357c5633a26&to=782be5b99377b62e998e4157ddede0fa296664b5&stat=instructions
>
> Looks free to me?

We can revisit that patch afterwards.

In D88789#2310593, @nikic wrote:

> In any case, this change looks reasonable to me. GVN has no problems 
> deduplicating load/stores from different types 
> (https://llvm.godbolt.org/z/5nTjWE), so I'm not sure what this 
> canonicalization was useful for.

Yep.

In D88789#2310606, @chandlerc wrote:

> FWIW, I still very much feel that this is the correct canonicalization, and 
> that downstream problems *must* be fixed downstream. Avoiding this 
> canonicalization doesn't actually fix them, it just makes us less *aware* of 
> the problems that still fundamentally exist. =[
>
> That said, I'm not heavily involved in LLVM, and so if everyone currently 
> involved thinks this is a good change, I'm not going to stand in the way. It 
> just makes no sense to me.

Thank you for commenting!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88789/new/

https://reviews.llvm.org/D88789

Files:
  clang/test/CodeGen/attr-arm-sve-vector-bits-bitcast.c
  clang/test/CodeGen/attr-arm-sve-vector-bits-call.c
  clang/test/CodeGen/attr-arm-sve-vector-bits-cast.c
  clang/test/CodeGen/attr-arm-sve-vector-bits-globals.c
  llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
  llvm/test/Transforms/InstCombine/atomic.ll
  llvm/test/Transforms/InstCombine/load.ll
  llvm/test/Transforms/InstCombine/loadstore-metadata.ll
  llvm/test/Transforms/InstCombine/non-integral-pointers.ll
  llvm/test/Transforms/PhaseOrdering/instcombine-sroa-inttoptr.ll

Index: llvm/test/Transforms/PhaseOrdering/instcombine-sroa-inttoptr.ll
===
--- llvm/test/Transforms/PhaseOrdering/instcombine-sroa-inttoptr.ll
+++ llvm/test/Transforms/PhaseOrdering/instcombine-sroa-inttoptr.ll
@@ -50,10 +50,10 @@
 define dso_local void @_Z3gen1S(%0* noalias sret align 8 %arg, %0* byval(%0) align 8 %arg1) {
 ; CHECK-LABEL: @_Z3gen1S(
 ; CHECK-NEXT:  bb:
-; CHECK-NEXT:[[TMP0:%.*]] = bitcast %0* [[ARG1:%.*]] to i64*
-; CHECK-NEXT:[[I21:%.*]] = load i64, i64* [[TMP0]], align 8
-; CHECK-NEXT:[[TMP1:%.*]] = bitcast %0* [[ARG:%.*]] to i64*
-; CHECK-NEXT:store i64 [[I21]], i64* [[TMP1]], align 8
+; CHECK-NEXT:[[I:%.*]] = getelementptr inbounds [[TMP0:%.*]], %0* [[ARG1:%.*]], i64 0, i32 0
+; CHECK-NEXT:[[I2:%.*]] = load i32*, i32** [[I]], align 8
+; CHECK-NEXT:[[I3:%.*]] = getelementptr inbounds [[TMP0]], %0* [[ARG:%.*]], i64 0, i32 0
+; CHECK-NEXT:store i32* [[I2]], i32** [[I3]], align 8
 ; CHECK-NEXT:ret void
 ;
 bb:
@@ -68,12 +68,12 @@
 ; CHECK-LABEL: @_Z3foo1S(
 ; CHECK-NEXT:  bb:
 ; CHECK-NEXT:[[I2:%.*]] = alloca [[TMP0:%.*]], align 8
-; CHECK-NEXT:[[TMP0]] = getelementptr inbounds [[TMP0]], %0* [[ARG:%.*]], i64 0, i32 0
-; CHECK-NEXT:[[I1_SROA_0_0_COPYLOAD15:%.*]] = load i32*, i32** [[TMP0]], align 8
+; CHECK-NEXT:[[I1_SROA_0_0_I5_SROA_IDX:%.*]] = getelementptr inbounds [[TMP0]], %0* [[ARG:%.*]], i64 0, i32 0
+; CHECK-NEXT:[[I1_SROA_0_0_COPYLOAD:%.*]] = load i32*, i32** [[I1_SROA_0_0_I5_SROA_IDX]], align 8
 ; CHECK-NEXT:[[I_SROA_0_0_I6_SROA_IDX:%.*]] = getelementptr inbounds [[TMP0]], %0* [[I2]], i64 0, i32 0
-; CHECK-NEXT:store i32* [[I1_SROA_0_0_COPYLOAD15]], i32** [[I_SROA_0_0_I6_SROA_IDX]], align 8
+; CHECK-NEXT:store i32* [[I1_SROA_0_0_COPYLOAD]], i32** [[I_SROA_0_0_I6_SROA_IDX]], align 8
 ; CHECK-NEXT:tail call void @_Z7escape01S(%0* nonnull byval(%0) align 8 [[I2]])
-; CHECK-NEXT:ret i32* [[I1_SROA_0_0_COPYLOAD15]]
+; CHECK-NEXT:ret i32* [[I1_SROA_0_0_COPYLOAD]]
 ;
 bb:
   %i = alloca %0, align 8
@@ -107,21 +107,21 @@
 define dso_local i32* @_Z3bar1S(%0* byval(%0) align 8 %arg) {
 ; CHECK-LABEL: @_Z3bar1S(
 ; CHECK-NEXT:  bb:
-; CHECK-NEXT:[[TMP0:%.*]] = getelementptr inbounds [[TMP0]], %0* [[ARG:%.*]], i64 0, i32 0
-; CHECK-NEXT:[[I1_SROA_0_0_COPYLOAD14:%.*]] = load i32*, i32** [[TMP0]], align 8
+; CHECK-NEXT:[[I1_SROA_0_0_I4_SROA_IDX:%.*]] = getelementptr inbounds [[TMP0:%.*]], %0* [[ARG:%.*]], i64 0, i32 0
+; CHECK-NEXT:[[I1_SROA_0_0_COPYLOAD:%.*]] = load i32*, i32** [[I1_SROA_0_0_I4_SROA_IDX]], align 8
 ; 

[clang] aaae13d - [NFC][clang][codegen] Autogenerate a few ARM SVE tests that are being affected by an upcoming patch

2020-10-04 Thread Roman Lebedev via cfe-commits

Author: Roman Lebedev
Date: 2020-10-04T19:54:09+03:00
New Revision: aaae13d0c29ec2a20f93e6adb9d9b5c2656d2af6

URL: 
https://github.com/llvm/llvm-project/commit/aaae13d0c29ec2a20f93e6adb9d9b5c2656d2af6
DIFF: 
https://github.com/llvm/llvm-project/commit/aaae13d0c29ec2a20f93e6adb9d9b5c2656d2af6.diff

LOG: [NFC][clang][codegen] Autogenerate a few ARM SVE tests that are being 
affected by an upcoming patch

Added: 


Modified: 
clang/test/CodeGen/attr-arm-sve-vector-bits-bitcast.c
clang/test/CodeGen/attr-arm-sve-vector-bits-call.c
clang/test/CodeGen/attr-arm-sve-vector-bits-cast.c
clang/test/CodeGen/attr-arm-sve-vector-bits-globals.c

Removed: 




diff  --git a/clang/test/CodeGen/attr-arm-sve-vector-bits-bitcast.c 
b/clang/test/CodeGen/attr-arm-sve-vector-bits-bitcast.c
index 84559e9edb9a..3a5628d7f57e 100644
--- a/clang/test/CodeGen/attr-arm-sve-vector-bits-bitcast.c
+++ b/clang/test/CodeGen/attr-arm-sve-vector-bits-bitcast.c
@@ -31,21 +31,21 @@ DEFINE_STRUCT(bool)
 // CHECK-128-NEXT:  entry:
 // CHECK-128-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [[STRUCT_STRUCT_INT64:%.*]], %struct.struct_int64* [[S:%.*]], i64 0, i32 1, i64 0
 // CHECK-128-NEXT:    [[TMP0:%.*]] = bitcast <2 x i64>* [[ARRAYIDX]] to <vscale x 2 x i64>*
-// CHECK-128-NEXT:    [[TMP1:%.*]] = load <vscale x 2 x i64>, <vscale x 2 x i64>* [[TMP0]], align 16, [[TBAA2:!tbaa !.*]]
+// CHECK-128-NEXT:    [[TMP1:%.*]] = load <vscale x 2 x i64>, <vscale x 2 x i64>* [[TMP0]], align 16, [[TBAA6:!tbaa !.*]]
 // CHECK-128-NEXT:    ret <vscale x 2 x i64> [[TMP1]]
 //
 // CHECK-256-LABEL: @read_int64(
 // CHECK-256-NEXT:  entry:
 // CHECK-256-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [[STRUCT_STRUCT_INT64:%.*]], %struct.struct_int64* [[S:%.*]], i64 0, i32 1, i64 0
 // CHECK-256-NEXT:    [[TMP0:%.*]] = bitcast <4 x i64>* [[ARRAYIDX]] to <vscale x 2 x i64>*
-// CHECK-256-NEXT:    [[TMP1:%.*]] = load <vscale x 2 x i64>, <vscale x 2 x i64>* [[TMP0]], align 16, [[TBAA2:!tbaa !.*]]
+// CHECK-256-NEXT:    [[TMP1:%.*]] = load <vscale x 2 x i64>, <vscale x 2 x i64>* [[TMP0]], align 16, [[TBAA6:!tbaa !.*]]
 // CHECK-256-NEXT:    ret <vscale x 2 x i64> [[TMP1]]
 //
 // CHECK-512-LABEL: @read_int64(
 // CHECK-512-NEXT:  entry:
 // CHECK-512-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [[STRUCT_STRUCT_INT64:%.*]], %struct.struct_int64* [[S:%.*]], i64 0, i32 1, i64 0
 // CHECK-512-NEXT:    [[TMP0:%.*]] = bitcast <8 x i64>* [[ARRAYIDX]] to <vscale x 2 x i64>*
-// CHECK-512-NEXT:    [[TMP1:%.*]] = load <vscale x 2 x i64>, <vscale x 2 x i64>* [[TMP0]], align 16, [[TBAA2:!tbaa !.*]]
+// CHECK-512-NEXT:    [[TMP1:%.*]] = load <vscale x 2 x i64>, <vscale x 2 x i64>* [[TMP0]], align 16, [[TBAA6:!tbaa !.*]]
 // CHECK-512-NEXT:    ret <vscale x 2 x i64> [[TMP1]]
 //
 svint64_t read_int64(struct struct_int64 *s) {
@@ -55,31 +55,31 @@ svint64_t read_int64(struct struct_int64 *s) {
 // CHECK-128-LABEL: @write_int64(
 // CHECK-128-NEXT:  entry:
 // CHECK-128-NEXT:    [[X_ADDR:%.*]] = alloca <vscale x 2 x i64>, align 16
-// CHECK-128-NEXT:    store <vscale x 2 x i64> [[X:%.*]], <vscale x 2 x i64>* [[X_ADDR]], align 16, [[TBAA5:!tbaa !.*]]
+// CHECK-128-NEXT:    store <vscale x 2 x i64> [[X:%.*]], <vscale x 2 x i64>* [[X_ADDR]], align 16, [[TBAA9:!tbaa !.*]]
 // CHECK-128-NEXT:    [[TMP0:%.*]] = bitcast <vscale x 2 x i64>* [[X_ADDR]] to <2 x i64>*
-// CHECK-128-NEXT:    [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* [[TMP0]], align 16, [[TBAA2]]
+// CHECK-128-NEXT:    [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* [[TMP0]], align 16, [[TBAA6]]
 // CHECK-128-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [[STRUCT_STRUCT_INT64:%.*]], %struct.struct_int64* [[S:%.*]], i64 0, i32 1, i64 0
-// CHECK-128-NEXT:    store <2 x i64> [[TMP1]], <2 x i64>* [[ARRAYIDX]], align 16, [[TBAA2]]
+// CHECK-128-NEXT:    store <2 x i64> [[TMP1]], <2 x i64>* [[ARRAYIDX]], align 16, [[TBAA6]]
 // CHECK-128-NEXT:    ret void
 //
 // CHECK-256-LABEL: @write_int64(
 // CHECK-256-NEXT:  entry:
 // CHECK-256-NEXT:    [[X_ADDR:%.*]] = alloca <vscale x 2 x i64>, align 16
-// CHECK-256-NEXT:    store <vscale x 2 x i64> [[X:%.*]], <vscale x 2 x i64>* [[X_ADDR]], align 16, [[TBAA5:!tbaa !.*]]
+// CHECK-256-NEXT:    store <vscale x 2 x i64> [[X:%.*]], <vscale x 2 x i64>* [[X_ADDR]], align 16, [[TBAA9:!tbaa !.*]]
 // CHECK-256-NEXT:    [[TMP0:%.*]] = bitcast <vscale x 2 x i64>* [[X_ADDR]] to <4 x i64>*
-// CHECK-256-NEXT:    [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* [[TMP0]], align 16, [[TBAA2]]
+// CHECK-256-NEXT:    [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* [[TMP0]], align 16, [[TBAA6]]
 // CHECK-256-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [[STRUCT_STRUCT_INT64:%.*]], %struct.struct_int64* [[S:%.*]], i64 0, i32 1, i64 0
-// CHECK-256-NEXT:    store <4 x i64> [[TMP1]], <4 x i64>* [[ARRAYIDX]], align 16, [[TBAA2]]
+// CHECK-256-NEXT:    store <4 x i64> [[TMP1]], <4 x i64>* [[ARRAYIDX]], align 16, [[TBAA6]]
 // CHECK-256-NEXT:    ret void
 //
 // CHECK-512-LABEL: @write_int64(
 // CHECK-512-NEXT:  entry:
 // CHECK-512-NEXT:    [[X_ADDR:%.*]] = alloca <vscale x 2 x i64>, align 16
-// CHECK-512-NEXT:    store <vscale x 2 x i64> [[X:%.*]], <vscale x 2 x i64>* [[X_ADDR]], align 16, [[TBAA5:!tbaa !.*]]
+// CHECK-512-NEXT:    store <vscale x 2 x i64> [[X:%.*]], <vscale x 2 x i64>* [[X_ADDR]], align 16, [[TBAA9:!tbaa !.*]]
 // CHECK-512-NEXT:    [[TMP0:%.*]] = bitcast <vscale x 2 x i64>* [[X_ADDR]] to <8 x i64>*
-// CHECK-512-NEXT:    [[TMP1:%.*]] = load <8 x i64>, <8 x i64>* [[TMP0]], align 16, [[TBAA2]]
+// 

[PATCH] D88278: [PowerPC] Add builtins for xvtdiv(dp|sp) and xvtsqrt(dp|sp).

2020-10-04 Thread EsmeYi via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGe3475f5b91c8: [PowerPC] Add builtins for xvtdiv(dp|sp) and 
xvtsqrt(dp|sp). (authored by Esme).

Changed prior to commit:
  https://reviews.llvm.org/D88278?vs=294229&id=296054#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88278/new/

https://reviews.llvm.org/D88278

Files:
  clang/include/clang/Basic/BuiltinsPPC.def
  clang/lib/Headers/altivec.h
  clang/test/CodeGen/builtins-ppc-vsx.c
  llvm/include/llvm/IR/IntrinsicsPowerPC.td
  llvm/lib/Target/PowerPC/PPCInstrVSX.td
  llvm/test/CodeGen/PowerPC/vsx_builtins.ll

Index: llvm/test/CodeGen/PowerPC/vsx_builtins.ll
===
--- llvm/test/CodeGen/PowerPC/vsx_builtins.ll
+++ llvm/test/CodeGen/PowerPC/vsx_builtins.ll
@@ -54,3 +54,55 @@
 }
 ; Function Attrs: nounwind readnone
 declare void @llvm.ppc.vsx.stxvd2x.be(<2 x double>, i8*)
+
+define i32 @test_vec_test_swdiv(<2 x double> %a, <2 x double> %b) {
+; CHECK-LABEL: test_vec_test_swdiv:
+; CHECK:   # %bb.0: # %entry
+; CHECK-NEXT:xvtdivdp cr0, v2, v3
+; CHECK-NEXT:mfocrf r3, 128
+; CHECK-NEXT:srwi r3, r3, 28
+; CHECK-NEXT:blr
+  entry:
+%0 = tail call i32 @llvm.ppc.vsx.xvtdivdp(<2 x double> %a, <2 x double> %b)
+ret i32 %0
+}
+declare i32 @llvm.ppc.vsx.xvtdivdp(<2 x double>, <2 x double>)
+
+define i32 @test_vec_test_swdivs(<4 x float> %a, <4 x float> %b) {
+; CHECK-LABEL: test_vec_test_swdivs:
+; CHECK:   # %bb.0: # %entry
+; CHECK-NEXT:xvtdivsp cr0, v2, v3
+; CHECK-NEXT:mfocrf r3, 128
+; CHECK-NEXT:srwi r3, r3, 28
+; CHECK-NEXT:blr
+  entry:
+%0 = tail call i32 @llvm.ppc.vsx.xvtdivsp(<4 x float> %a, <4 x float> %b)
+ret i32 %0
+}
+declare i32 @llvm.ppc.vsx.xvtdivsp(<4 x float>, <4 x float>)
+
+define i32 @test_vec_test_swsqrt(<2 x double> %a) {
+; CHECK-LABEL: test_vec_test_swsqrt:
+; CHECK:   # %bb.0: # %entry
+; CHECK-NEXT:xvtsqrtdp cr0, v2
+; CHECK-NEXT:mfocrf r3, 128
+; CHECK-NEXT:srwi r3, r3, 28
+; CHECK-NEXT:blr
+  entry:
+%0 = tail call i32 @llvm.ppc.vsx.xvtsqrtdp(<2 x double> %a)
+ret i32 %0
+}
+declare i32 @llvm.ppc.vsx.xvtsqrtdp(<2 x double>)
+
+define i32 @test_vec_test_swsqrts(<4 x float> %a) {
+; CHECK-LABEL: test_vec_test_swsqrts:
+; CHECK:   # %bb.0: # %entry
+; CHECK-NEXT:xvtsqrtsp cr0, v2
+; CHECK-NEXT:mfocrf r3, 128
+; CHECK-NEXT:srwi r3, r3, 28
+; CHECK-NEXT:blr
+  entry:
+%0 = tail call i32 @llvm.ppc.vsx.xvtsqrtsp(<4 x float> %a)
+ret i32 %0
+}
+declare i32 @llvm.ppc.vsx.xvtsqrtsp(<4 x float>)
Index: llvm/lib/Target/PowerPC/PPCInstrVSX.td
===
--- llvm/lib/Target/PowerPC/PPCInstrVSX.td
+++ llvm/lib/Target/PowerPC/PPCInstrVSX.td
@@ -2591,6 +2591,16 @@
 def : Pat<(int_ppc_vsx_xvdivdp v2f64:$A, v2f64:$B),
   (XVDIVDP $A, $B)>;
 
+// Vector test for software divide and sqrt.
+def : Pat<(i32 (int_ppc_vsx_xvtdivdp v2f64:$A, v2f64:$B)),
+  (COPY_TO_REGCLASS (XVTDIVDP $A, $B), GPRC)>;
+def : Pat<(i32 (int_ppc_vsx_xvtdivsp v4f32:$A, v4f32:$B)),
+  (COPY_TO_REGCLASS (XVTDIVSP $A, $B), GPRC)>;
+def : Pat<(i32 (int_ppc_vsx_xvtsqrtdp v2f64:$A)),
+  (COPY_TO_REGCLASS (XVTSQRTDP $A), GPRC)>;
+def : Pat<(i32 (int_ppc_vsx_xvtsqrtsp v4f32:$A)),
+  (COPY_TO_REGCLASS (XVTSQRTSP $A), GPRC)>;
+
 // Reciprocal estimate
 def : Pat<(int_ppc_vsx_xvresp v4f32:$A),
   (XVRESP $A)>;
Index: llvm/include/llvm/IR/IntrinsicsPowerPC.td
===
--- llvm/include/llvm/IR/IntrinsicsPowerPC.td
+++ llvm/include/llvm/IR/IntrinsicsPowerPC.td
@@ -1249,6 +1249,16 @@
 def int_ppc_vsx_xvtlsbb :
   PowerPC_VSX_Intrinsic<"xvtlsbb", [llvm_i32_ty],
 [llvm_v16i8_ty, llvm_i32_ty], [IntrNoMem]>;
+def int_ppc_vsx_xvtdivdp :
+  PowerPC_VSX_Intrinsic<"xvtdivdp", [llvm_i32_ty],
+[llvm_v2f64_ty, llvm_v2f64_ty], [IntrNoMem]>;
+def int_ppc_vsx_xvtdivsp :
+  PowerPC_VSX_Intrinsic<"xvtdivsp", [llvm_i32_ty],
+[llvm_v4f32_ty, llvm_v4f32_ty], [IntrNoMem]>;
+def int_ppc_vsx_xvtsqrtdp :
+  PowerPC_VSX_Intrinsic<"xvtsqrtdp", [llvm_i32_ty], [llvm_v2f64_ty], [IntrNoMem]>;
+def int_ppc_vsx_xvtsqrtsp :
+  PowerPC_VSX_Intrinsic<"xvtsqrtsp", [llvm_i32_ty], [llvm_v4f32_ty], [IntrNoMem]>;
 def int_ppc_vsx_xxeval :
   PowerPC_VSX_Intrinsic<"xxeval", [llvm_v2i64_ty],
[llvm_v2i64_ty, llvm_v2i64_ty,
Index: clang/test/CodeGen/builtins-ppc-vsx.c
===
--- clang/test/CodeGen/builtins-ppc-vsx.c
+++ clang/test/CodeGen/builtins-ppc-vsx.c
@@ -52,6 +52,7 @@
 vector signed __int128 res_vslll;
 
 double res_d;
+int res_i;
 float res_af[4];
 double 

[clang] e3475f5 - [PowerPC] Add builtins for xvtdiv(dp|sp) and xvtsqrt(dp|sp).

2020-10-04 Thread via cfe-commits

Author: Esme-Yi
Date: 2020-10-04T16:24:20Z
New Revision: e3475f5b91c8dc3142b90b2bb4a1884d6e8d8c2c

URL: 
https://github.com/llvm/llvm-project/commit/e3475f5b91c8dc3142b90b2bb4a1884d6e8d8c2c
DIFF: 
https://github.com/llvm/llvm-project/commit/e3475f5b91c8dc3142b90b2bb4a1884d6e8d8c2c.diff

LOG: [PowerPC] Add builtins for xvtdiv(dp|sp) and xvtsqrt(dp|sp).

Summary: This patch implements the builtins for xvtdivdp, xvtdivsp, xvtsqrtdp, 
xvtsqrtsp.
The instructions correspond to the following builtins:
int vec_test_swdiv(vector double v1, vector double v2);
int vec_test_swdivs(vector float v1, vector float v2);
int vec_test_swsqrt(vector double v1);
int vec_test_swsqrts(vector float v1);
This patch depends on D88274, which fixes the bug in copying from CRRC to 
GPRC/G8RC.

Reviewed By: steven.zhang, amyk

Differential Revision: https://reviews.llvm.org/D88278
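
For readers following the expected assembly in vsx_builtins.ll: after each test instruction writes CR0, the `mfocrf r3, 128` / `srwi r3, r3, 28` pair moves that 4-bit field into the low bits of the result register. The following is only a sketch of that bit arithmetic in portable C++, assuming a hypothetical 32-bit condition-register image with CR0 in its top four bits. The helper name `extractCR0` is illustrative and not part of the patch.

```cpp
#include <cstdint>

// Hypothetical model, not part of the patch: `cr` is a 32-bit image of the
// condition register with field CR0 in bits 31..28.
// `mfocrf r3, 128` copies CR0 into the same bit positions of r3, and
// `srwi r3, r3, 28` shifts it down so that only the 4-bit field remains.
constexpr std::uint32_t extractCR0(std::uint32_t cr) {
  return cr >> 28;
}

// Only the top four bits survive the shift; lower bits are discarded.
static_assert(extractCR0(0x80000000u) == 8u, "top bit maps to value 8");
static_assert(extractCR0(0x2FFFFFFFu) == 2u, "lower 28 bits are dropped");
```

The `int` returned by `vec_test_swdiv` and the related builtins is this 4-bit value; what each bit means is defined by the instruction, not by this sketch.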

Added: 


Modified: 
clang/include/clang/Basic/BuiltinsPPC.def
clang/lib/Headers/altivec.h
clang/test/CodeGen/builtins-ppc-vsx.c
llvm/include/llvm/IR/IntrinsicsPowerPC.td
llvm/lib/Target/PowerPC/PPCInstrVSX.td
llvm/test/CodeGen/PowerPC/vsx_builtins.ll

Removed: 




diff  --git a/clang/include/clang/Basic/BuiltinsPPC.def 
b/clang/include/clang/Basic/BuiltinsPPC.def
index 29bce799c8f4..015411abc508 100644
--- a/clang/include/clang/Basic/BuiltinsPPC.def
+++ b/clang/include/clang/Basic/BuiltinsPPC.def
@@ -558,6 +558,11 @@ BUILTIN(__builtin_vsx_xxeval, 
"V2ULLiV2ULLiV2ULLiV2ULLiIi", "")
 
 BUILTIN(__builtin_vsx_xvtlsbb, "iV16UcUi", "")
 
+BUILTIN(__builtin_vsx_xvtdivdp, "iV2dV2d", "")
+BUILTIN(__builtin_vsx_xvtdivsp, "iV4fV4f", "")
+BUILTIN(__builtin_vsx_xvtsqrtdp, "iV2d", "")
+BUILTIN(__builtin_vsx_xvtsqrtsp, "iV4f", "")
+
 // P10 Vector Permute Extended built-in.
 BUILTIN(__builtin_vsx_xxpermx, "V16UcV16UcV16UcV16UcIi", "")
 

diff  --git a/clang/lib/Headers/altivec.h b/clang/lib/Headers/altivec.h
index 572b8863dd1a..1d7bc201d330 100644
--- a/clang/lib/Headers/altivec.h
+++ b/clang/lib/Headers/altivec.h
@@ -3504,6 +3504,20 @@ vec_div(vector signed __int128 __a, vector signed 
__int128 __b) {
 }
 #endif __POWER10_VECTOR__
 
+/* vec_xvtdiv */
+
+#ifdef __VSX__
+static __inline__ int __ATTRS_o_ai vec_test_swdiv(vector double __a,
+  vector double __b) {
+  return __builtin_vsx_xvtdivdp(__a, __b);
+}
+
+static __inline__ int __ATTRS_o_ai vec_test_swdivs(vector float __a,
+   vector float __b) {
+  return __builtin_vsx_xvtdivsp(__a, __b);
+}
+#endif
+
 /* vec_dss */
 
 #define vec_dss __builtin_altivec_dss
@@ -8057,6 +8071,18 @@ vec_vrsqrtefp(vector float __a) {
   return __builtin_altivec_vrsqrtefp(__a);
 }
 
+/* vec_xvtsqrt */
+
+#ifdef __VSX__
+static __inline__ int __ATTRS_o_ai vec_test_swsqrt(vector double __a) {
+  return __builtin_vsx_xvtsqrtdp(__a);
+}
+
+static __inline__ int __ATTRS_o_ai vec_test_swsqrts(vector float __a) {
+  return __builtin_vsx_xvtsqrtsp(__a);
+}
+#endif
+
 /* vec_sel */
 
 #define __builtin_altivec_vsel_4si vec_sel

diff  --git a/clang/test/CodeGen/builtins-ppc-vsx.c 
b/clang/test/CodeGen/builtins-ppc-vsx.c
index 2542b30590bf..d99b0c1e8f41 100644
--- a/clang/test/CodeGen/builtins-ppc-vsx.c
+++ b/clang/test/CodeGen/builtins-ppc-vsx.c
@@ -52,6 +52,7 @@ vector unsigned long long res_vull;
 vector signed __int128 res_vslll;
 
 double res_d;
+int res_i;
 float res_af[4];
 double res_ad[2];
 signed char res_asc[16];
@@ -878,6 +879,23 @@ void test1() {
 // CHECK: call <2 x double> @llvm.ppc.vsx.xvrsqrtedp(<2 x double> %{{[0-9]+}})
 // CHECK-LE: call <2 x double> @llvm.ppc.vsx.xvrsqrtedp(<2 x double> 
%{{[0-9]+}})
 
+  res_i = vec_test_swsqrt(vd);
+// CHECK: call i32 @llvm.ppc.vsx.xvtsqrtdp(<2 x double> %{{[0-9]+}})
+// CHECK-LE: call i32 @llvm.ppc.vsx.xvtsqrtdp(<2 x double> %{{[0-9]+}})
+
+  res_i = vec_test_swsqrts(vf);
+// CHECK: call i32 @llvm.ppc.vsx.xvtsqrtsp(<4 x float> %{{[0-9]+}})
+// CHECK-LE: call i32 @llvm.ppc.vsx.xvtsqrtsp(<4 x float> %{{[0-9]+}})
+
+  res_i = vec_test_swdiv(vd, vd);
+// CHECK: call i32 @llvm.ppc.vsx.xvtdivdp(<2 x double> %{{[0-9]+}}, <2 x 
double> %{{[0-9]+}})
+// CHECK-LE: call i32 @llvm.ppc.vsx.xvtdivdp(<2 x double> %{{[0-9]+}}, <2 x 
double> %{{[0-9]+}})
+
+  res_i = vec_test_swdivs(vf, vf);
+// CHECK: call i32 @llvm.ppc.vsx.xvtdivsp(<4 x float> %{{[0-9]+}}, <4 x float> 
%{{[0-9]+}})
+// CHECK-LE: call i32 @llvm.ppc.vsx.xvtdivsp(<4 x float> %{{[0-9]+}}, <4 x 
float> %{{[0-9]+}})
+
+
   dummy();
 // CHECK: call void @dummy()
 // CHECK-LE: call void @dummy()

diff  --git a/llvm/include/llvm/IR/IntrinsicsPowerPC.td 
b/llvm/include/llvm/IR/IntrinsicsPowerPC.td
index 7b11555296a4..7ab4ee301bb5 100644
--- a/llvm/include/llvm/IR/IntrinsicsPowerPC.td
+++ b/llvm/include/llvm/IR/IntrinsicsPowerPC.td
@@ -1249,6 +1249,16 @@ def int_ppc_vsx_xxinsertw :
 def int_ppc_vsx_xvtlsbb :
   

[PATCH] D87981: [X86] AMX programming model prototype.

2020-10-04 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke updated this revision to Diff 296050.
LuoYuanke added a comment.
Herald added a subscriber: pengfei.

Rebase


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87981/new/

https://reviews.llvm.org/D87981

Files:
  clang/include/clang/Basic/BuiltinsX86_64.def
  clang/lib/Headers/amxintrin.h
  clang/test/CodeGen/AMX/amx_api.c
  llvm/include/llvm/CodeGen/LiveIntervalUnion.h
  llvm/include/llvm/CodeGen/LiveRegMatrix.h
  llvm/include/llvm/CodeGen/Passes.h
  llvm/include/llvm/CodeGen/TileShapeInfo.h
  llvm/include/llvm/CodeGen/VirtRegMap.h
  llvm/include/llvm/IR/Intrinsics.td
  llvm/include/llvm/IR/IntrinsicsX86.td
  llvm/lib/CodeGen/InlineSpiller.cpp
  llvm/lib/CodeGen/LiveIntervalUnion.cpp
  llvm/lib/CodeGen/LiveRegMatrix.cpp
  llvm/lib/CodeGen/VirtRegMap.cpp
  llvm/lib/IR/Function.cpp
  llvm/lib/Target/X86/CMakeLists.txt
  llvm/lib/Target/X86/X86.h
  llvm/lib/Target/X86/X86ExpandPseudo.cpp
  llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86InstrAMX.td
  llvm/lib/Target/X86/X86InstrInfo.cpp
  llvm/lib/Target/X86/X86LowerAMXType.cpp
  llvm/lib/Target/X86/X86RegisterInfo.cpp
  llvm/lib/Target/X86/X86RegisterInfo.h
  llvm/lib/Target/X86/X86RegisterInfo.td
  llvm/lib/Target/X86/X86Subtarget.h
  llvm/lib/Target/X86/X86TargetMachine.cpp
  llvm/lib/Target/X86/X86TileConfig.cpp
  llvm/test/CodeGen/X86/AMX/amx-across-func.ll
  llvm/test/CodeGen/X86/AMX/amx-config.ll
  llvm/test/CodeGen/X86/AMX/amx-spill.ll
  llvm/test/CodeGen/X86/AMX/amx-type.ll
  llvm/test/CodeGen/X86/O0-pipeline.ll
  llvm/test/CodeGen/X86/opt-pipeline.ll
  llvm/utils/TableGen/IntrinsicEmitter.cpp

Index: llvm/utils/TableGen/IntrinsicEmitter.cpp
===
--- llvm/utils/TableGen/IntrinsicEmitter.cpp
+++ llvm/utils/TableGen/IntrinsicEmitter.cpp
@@ -197,25 +197,25 @@
 enum IIT_Info {
   // Common values should be encoded with 0-15.
   IIT_Done = 0,
-  IIT_I1   = 1,
-  IIT_I8   = 2,
-  IIT_I16  = 3,
-  IIT_I32  = 4,
-  IIT_I64  = 5,
-  IIT_F16  = 6,
-  IIT_F32  = 7,
-  IIT_F64  = 8,
-  IIT_V2   = 9,
-  IIT_V4   = 10,
-  IIT_V8   = 11,
-  IIT_V16  = 12,
-  IIT_V32  = 13,
-  IIT_PTR  = 14,
-  IIT_ARG  = 15,
+  IIT_I1 = 1,
+  IIT_I8 = 2,
+  IIT_I16 = 3,
+  IIT_I32 = 4,
+  IIT_I64 = 5,
+  IIT_F16 = 6,
+  IIT_F32 = 7,
+  IIT_F64 = 8,
+  IIT_V2 = 9,
+  IIT_V4 = 10,
+  IIT_V8 = 11,
+  IIT_V16 = 12,
+  IIT_V32 = 13,
+  IIT_PTR = 14,
+  IIT_ARG = 15,
 
   // Values from 16+ are only encodable with the inefficient encoding.
-  IIT_V64  = 16,
-  IIT_MMX  = 17,
+  IIT_V64 = 16,
+  IIT_MMX = 17,
   IIT_TOKEN = 18,
   IIT_METADATA = 19,
   IIT_EMPTYSTRUCT = 20,
@@ -226,7 +226,7 @@
   IIT_EXTEND_ARG = 25,
   IIT_TRUNC_ARG = 26,
   IIT_ANYPTR = 27,
-  IIT_V1   = 28,
+  IIT_V1 = 28,
   IIT_VARARG = 29,
   IIT_HALF_VEC_ARG = 30,
   IIT_SAME_VEC_WIDTH_ARG = 31,
@@ -247,7 +247,8 @@
   IIT_VEC_OF_BITCASTS_TO_INT = 46,
   IIT_V128 = 47,
   IIT_BF16 = 48,
-  IIT_STRUCT9 = 49
+  IIT_STRUCT9 = 49,
+  IIT_V256 = 50
 };
 
 static void EncodeFixedValueType(MVT::SimpleValueType VT,
@@ -385,6 +386,9 @@
 case 32: Sig.push_back(IIT_V32); break;
 case 64: Sig.push_back(IIT_V64); break;
 case 128: Sig.push_back(IIT_V128); break;
+case 256:
+  Sig.push_back(IIT_V256);
+  break;
 case 512: Sig.push_back(IIT_V512); break;
 case 1024: Sig.push_back(IIT_V1024); break;
 }
Index: llvm/test/CodeGen/X86/opt-pipeline.ll
===
--- llvm/test/CodeGen/X86/opt-pipeline.ll
+++ llvm/test/CodeGen/X86/opt-pipeline.ll
@@ -24,6 +24,7 @@
 ; CHECK-NEXT: Pre-ISel Intrinsic Lowering
 ; CHECK-NEXT: FunctionPass Manager
 ; CHECK-NEXT:   Expand Atomic instructions
+; CHECK-NEXT:   Lower AMX type for load/store
 ; CHECK-NEXT:   Module Verifier
 ; CHECK-NEXT:   Dominator Tree Construction
 ; CHECK-NEXT:   Basic Alias Analysis (stateless AA impl)
@@ -141,6 +142,7 @@
 ; CHECK-NEXT:   Lazy Machine Block Frequency Analysis
 ; CHECK-NEXT:   Machine Optimization Remark Emitter
 ; CHECK-NEXT:   Greedy Register Allocator
+; CHECK-NEXT:   Tile Register Configure
 ; CHECK-NEXT:   Virtual Register Rewriter
 ; CHECK-NEXT:   Stack Slot Coloring
 ; CHECK-NEXT:   Machine Copy Propagation Pass
Index: llvm/test/CodeGen/X86/O0-pipeline.ll
===
--- llvm/test/CodeGen/X86/O0-pipeline.ll
+++ llvm/test/CodeGen/X86/O0-pipeline.ll
@@ -18,6 +18,7 @@
 ; CHECK-NEXT: Pre-ISel Intrinsic Lowering
 ; CHECK-NEXT: FunctionPass Manager
 ; CHECK-NEXT:   Expand Atomic instructions
+; CHECK-NEXT:   Lower AMX type for load/store
 ; CHECK-NEXT:   Module Verifier
 ; CHECK-NEXT:   Lower Garbage Collection Instructions
 ; CHECK-NEXT:   Shadow Stack GC Lowering
Index: llvm/test/CodeGen/X86/AMX/amx-type.ll

[clang] 1113fbf - [CodeGen] Improve likelihood branch weights

2020-10-04 Thread Mark de Wever via cfe-commits

Author: Mark de Wever
Date: 2020-10-04T14:24:27+02:00
New Revision: 1113fbf44c2250621548e278d2a1e11ab2b2d63d

URL: 
https://github.com/llvm/llvm-project/commit/1113fbf44c2250621548e278d2a1e11ab2b2d63d
DIFF: 
https://github.com/llvm/llvm-project/commit/1113fbf44c2250621548e278d2a1e11ab2b2d63d.diff

LOG: [CodeGen] Improve likelihood branch weights

Bruno De Fraine discovered some issues with D85091. The branch weights
generated for `logical not` and `ternary conditional` were wrong. The
`logical and` and `logical or` differed from the code generated for
`__builtin_expect`.

Adjusted the generated code for the likelihood to match
`__builtin_expect`. The patch is based on Bruno's suggestions.

Differential Revision: https://reviews.llvm.org/D88363
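
The equivalence this commit aims for can be sketched with two functions that differ only in how the hint is spelled. The function names below are made up for illustration and do not come from the patch; the added test attr-likelihood-if-vs-builtin-expect.cpp checks pairs of this shape. The sketch assumes a compiler that accepts the C++20 likelihood attributes.

```cpp
// Illustrative only: the names are hypothetical and not taken from the patch.

// Hint expressed with the builtin: the condition is expected to be true.
int viaBuiltin(bool a, bool b, int i) {
  if (__builtin_expect(a && b, 1)) {
    ++i;
  } else {
    --i;
  }
  return i;
}

// The same hint expressed with the [[likely]] attribute on the taken branch.
int viaAttribute(bool a, bool b, int i) {
  if (a && b) [[likely]] {
    ++i;
  } else {
    --i;
  }
  return i;
}
```

Both functions compute the same result. The hint only affects the branch weights the compiler attaches, which is what the FileCheck lines in the new test compare.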

Added: 
clang/test/CodeGenCXX/attr-likelihood-if-vs-builtin-expect.cpp

Modified: 
clang/lib/CodeGen/CGStmt.cpp
clang/lib/CodeGen/CodeGenFunction.cpp
clang/lib/CodeGen/CodeGenFunction.h

Removed: 




diff  --git a/clang/lib/CodeGen/CGStmt.cpp b/clang/lib/CodeGen/CGStmt.cpp
index 83dd1be31633..c9e6ce2df2c0 100644
--- a/clang/lib/CodeGen/CGStmt.cpp
+++ b/clang/lib/CodeGen/CGStmt.cpp
@@ -27,7 +27,6 @@
 #include "llvm/IR/Intrinsics.h"
 #include "llvm/IR/MDBuilder.h"
 #include "llvm/Support/SaveAndRestore.h"
-#include "llvm/Transforms/Scalar/LowerExpectIntrinsic.h"
 
 using namespace clang;
 using namespace CodeGen;
@@ -652,20 +651,6 @@ void CodeGenFunction::EmitIndirectGotoStmt(const IndirectGotoStmt &S) {
 
   EmitBranch(IndGotoBB);
 }
-static Optional<std::pair<uint32_t, uint32_t>>
-getLikelihoodWeights(const IfStmt &If) {
-  switch (Stmt::getLikelihood(If.getThen(), If.getElse())) {
-  case Stmt::LH_Unlikely:
-    return std::pair<uint32_t, uint32_t>(llvm::UnlikelyBranchWeight,
-                                         llvm::LikelyBranchWeight);
-  case Stmt::LH_None:
-    return None;
-  case Stmt::LH_Likely:
-    return std::pair<uint32_t, uint32_t>(llvm::LikelyBranchWeight,
-                                         llvm::UnlikelyBranchWeight);
-  }
-  llvm_unreachable("Unknown Likelihood");
-}
 
 void CodeGenFunction::EmitIfStmt(const IfStmt &S) {
   // C99 6.8.4.1: The first substatement is executed if the expression compares
@@ -713,17 +698,11 @@ void CodeGenFunction::EmitIfStmt(const IfStmt &S) {
   // Prefer the PGO based weights over the likelihood attribute.
   // When the build isn't optimized the metadata isn't used, so don't generate
   // it.
-  llvm::MDNode *Weights = nullptr;
+  Stmt::Likelihood LH = Stmt::LH_None;
   uint64_t Count = getProfileCount(S.getThen());
-  if (!Count && CGM.getCodeGenOpts().OptimizationLevel) {
-    Optional<std::pair<uint32_t, uint32_t>> LHW = getLikelihoodWeights(S);
-    if (LHW) {
-      llvm::MDBuilder MDHelper(CGM.getLLVMContext());
-      Weights = MDHelper.createBranchWeights(LHW->first, LHW->second);
-    }
-  }
-
-  EmitBranchOnBoolExpr(S.getCond(), ThenBlock, ElseBlock, Count, Weights);
+  if (!Count && CGM.getCodeGenOpts().OptimizationLevel)
+    LH = Stmt::getLikelihood(S.getThen(), S.getElse());
+  EmitBranchOnBoolExpr(S.getCond(), ThenBlock, ElseBlock, Count, LH);
 
   // Emit the 'then' code.
   EmitBlock(ThenBlock);

diff  --git a/clang/lib/CodeGen/CodeGenFunction.cpp 
b/clang/lib/CodeGen/CodeGenFunction.cpp
index 47ef5c830723..363b418dc198 100644
--- a/clang/lib/CodeGen/CodeGenFunction.cpp
+++ b/clang/lib/CodeGen/CodeGenFunction.cpp
@@ -42,6 +42,7 @@
 #include "llvm/IR/MDBuilder.h"
 #include "llvm/IR/Operator.h"
 #include "llvm/Support/CRC.h"
+#include "llvm/Transforms/Scalar/LowerExpectIntrinsic.h"
 #include "llvm/Transforms/Utils/PromoteMemToReg.h"
 using namespace clang;
 using namespace CodeGen;
@@ -1477,15 +1478,30 @@ bool 
CodeGenFunction::ConstantFoldsToSimpleInteger(const Expr *Cond,
   return true;
 }
 
+static Optional<std::pair<uint32_t, uint32_t>>
+getLikelihoodWeights(Stmt::Likelihood LH) {
+  switch (LH) {
+  case Stmt::LH_Unlikely:
+    return std::pair<uint32_t, uint32_t>(llvm::UnlikelyBranchWeight,
+                                         llvm::LikelyBranchWeight);
+  case Stmt::LH_None:
+    return None;
+  case Stmt::LH_Likely:
+    return std::pair<uint32_t, uint32_t>(llvm::LikelyBranchWeight,
+                                         llvm::UnlikelyBranchWeight);
+  }
+  llvm_unreachable("Unknown Likelihood");
+}
+
 /// EmitBranchOnBoolExpr - Emit a branch on a boolean condition (e.g. for an if
 /// statement) to the specified blocks.  Based on the condition, this might try
 /// to simplify the codegen of the conditional based on the branch.
-/// \param Weights The weights determined by the likelihood attributes.
+/// \param LH The value of the likelihood attribute on the True branch.
 void CodeGenFunction::EmitBranchOnBoolExpr(const Expr *Cond,
llvm::BasicBlock *TrueBlock,
llvm::BasicBlock *FalseBlock,
uint64_t TrueCount,
-   llvm::MDNode *Weights) {
+   

[PATCH] D88363: [CodeGen] Improve likelihood attribute branch weights

2020-10-04 Thread Mark de Wever via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Mordante marked 2 inline comments as done.
Closed by commit rG1113fbf44c22: [CodeGen] Improve likelihood branch weights 
(authored by Mordante).

Changed prior to commit:
  https://reviews.llvm.org/D88363?vs=294509&id=296042#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88363/new/

https://reviews.llvm.org/D88363

Files:
  clang/lib/CodeGen/CGStmt.cpp
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/CodeGenCXX/attr-likelihood-if-vs-builtin-expect.cpp

Index: clang/test/CodeGenCXX/attr-likelihood-if-vs-builtin-expect.cpp
===
--- /dev/null
+++ clang/test/CodeGenCXX/attr-likelihood-if-vs-builtin-expect.cpp
@@ -0,0 +1,223 @@
+// RUN: %clang_cc1 -O1 -emit-llvm %s -o - -triple=x86_64-linux-gnu | FileCheck %s
+
+// Verifies the output of __builtin_expect versus the output of the likelihood
+// attributes. They should generate the same probabilities for the branches.
+
+extern bool a();
+extern bool b();
+extern bool c();
+
+void ab1(int &i) {
+  // CHECK-LABEL: define{{.*}}ab1
+  // CHECK: br {{.*}} !prof !2
+  // CHECK: br {{.*}} !prof !2
+  // CHECK: br {{.*}} !prof !2
+  if (__builtin_expect(a() && b() && a(), 1)) {
+++i;
+  } else {
+--i;
+  }
+}
+
+void al(int &i) {
+  // CHECK-LABEL: define{{.*}}al
+  // CHECK: br {{.*}} !prof !2
+  // CHECK: br {{.*}} !prof !2
+  // CHECK: br {{.*}} !prof !2
+  if (a() && b() && c()) [[likely]] {
+++i;
+  } else {
+--i;
+  }
+}
+
+void ab0(int &i) {
+  // CHECK-LABEL: define{{.*}}ab0
+  // CHECK: br {{.*}}else{{$}}
+  // CHECK: br {{.*}}else{{$}}
+  // CHECK: br {{.*}} !prof !8
+  if (__builtin_expect(a() && b() && c(), 0)) {
+++i;
+  } else {
+--i;
+  }
+}
+
+void au(int &i) {
+  // CHECK-LABEL: define{{.*}}au
+  // CHECK: br {{.*}}else{{$}}
+  // CHECK: br {{.*}}else{{$}}
+  // CHECK: br {{.*}} !prof !8
+  if (a() && b() && c()) [[unlikely]] {
+++i;
+  } else {
+--i;
+  }
+}
+
+void ob1(int &i) {
+  // CHECK-LABEL: define{{.*}}ob1
+  // CHECK: br {{.*}}false{{$}}
+  // CHECK: br {{.*}}rhs{{$}}
+  // CHECK: br {{.*}} !prof !2
+  if (__builtin_expect(a() || b() || a(), 1)) {
+i = 0;
+  } else {
+--i;
+  }
+}
+
+void ol(int &i) {
+  // CHECK-LABEL: define{{.*}}ol
+  // CHECK: br {{.*}}false{{$}}
+  // CHECK: br {{.*}}false2{{$}}
+  // CHECK: br {{.*}} !prof !2
+  if (a() || b() || c()) [[likely]] {
+i = 0;
+  } else {
+--i;
+  }
+}
+
+void ob0(int &i) {
+  // CHECK-LABEL: define{{.*}}ob0
+  // CHECK: br {{.*}} !prof !8
+  // CHECK: br {{.*}} !prof !8
+  // CHECK: br {{.*}} !prof !8
+  if (__builtin_expect(a() || b() || c(), 0)) {
+i = 0;
+  } else {
+--i;
+  }
+}
+
+void ou(int &i) {
+  // CHECK-LABEL: define{{.*}}ou
+  // CHECK: br {{.*}} !prof !8
+  // CHECK: br {{.*}} !prof !8
+  // CHECK: br {{.*}} !prof !8
+  if (a() || b() || c()) [[unlikely]] {
+i = 0;
+  } else {
+--i;
+  }
+}
+
+void nb1(int &i) {
+  // CHECK-LABEL: define{{.*}}nb1
+  // CHECK: storemerge{{.*}} !prof !8
+  if (__builtin_expect(!a(), 1)) {
+++i;
+  } else {
+--i;
+  }
+}
+
+void nl(int &i) {
+  // CHECK-LABEL: define{{.*}}nl
+  // CHECK: storemerge{{.*}} !prof !8
+  if (!a()) [[likely]] {
+++i;
+  } else {
+--i;
+  }
+}
+
+void nb0(int &i) {
+  // CHECK-LABEL: define{{.*}}nb0
+  // CHECK: storemerge{{.*}} !prof !2
+  if (__builtin_expect(!a(), 0)) {
+++i;
+  } else {
+--i;
+  }
+}
+
+void nu(int &i) {
+  // CHECK-LABEL: define{{.*}}nu
+  // CHECK: storemerge{{.*}} !prof !2
+  if (!a()) [[unlikely]] {
+++i;
+  } else {
+--i;
+  }
+}
+
+void tb1(int &i) {
+  // CHECK-LABEL: define{{.*}}tb1
+  // CHECK: br {{.*}}false{{$}}
+  // CHECK: br {{.*}}end{{$}}
+  // CHECK: br {{.*}}end{{$}}
+  // CHECK: storemerge{{.*}} !prof !2
+  if (__builtin_expect(a() ? b() : c(), 1)) {
+++i;
+  } else {
+--i;
+  }
+}
+
+void tl(int &i) {
+  // CHECK-LABEL: define{{.*}}tl
+  // CHECK: br {{.*}}false{{$}}
+  // CHECK: br {{.*}}end{{$}}
+  // CHECK: br {{.*}}end{{$}}
+  // CHECK: storemerge{{.*}} !prof !2
+  if (bool d = a() ? b() : c()) [[likely]] {
+++i;
+  } else {
+--i;
+  }
+}
+
+void tl2(int &i) {
+  // CHECK-LABEL: define{{.*}}tl
+  // CHECK: br {{.*}}false{{$}}
+  // CHECK: br {{.*}} !prof !2
+  // CHECK: br {{.*}} !prof !2
+  if (a() ? b() : c()) [[likely]] {
+++i;
+  } else {
+--i;
+  }
+}
+
+void tb0(int &i) {
+  // CHECK-LABEL: define{{.*}}tb0
+  // CHECK: br {{.*}}false{{$}}
+  // CHECK: br {{.*}}end{{$}}
+  // CHECK: br {{.*}}end{{$}}
+  // CHECK: storemerge{{.*}} !prof !8
+  if (__builtin_expect(a() ? b() : c(), 0)) {
+++i;
+  } else {
+--i;
+  }
+}
+
+void tu(int &i) {
+  // CHECK-LABEL: define{{.*}}tu
+  // CHECK: br {{.*}}false{{$}}
+  // CHECK: br {{.*}}end{{$}}
+  // CHECK: br {{.*}}end{{$}}
+  // CHECK: storemerge{{.*}} !prof !8

[PATCH] D88154: Initial support for vectorization using Libmvec (GLIBC vector math library).

2020-10-04 Thread Venkataramanan Kumar via Phabricator via cfe-commits
venkataramanan.kumar.llvm added a comment.

In D88154#2290205, @abique wrote:

> Looks good to me.
> Regarding the tests, it seems that you check whether auto-vectorization takes 
> advantage of libmvec?
> Would it be interesting to have a test which declares a vector and calls the 
> builtin sin on it?
>
> Thank you very much for the changes! :)

Do we have built-in support for sin that takes vector types?

I tried

__m128d compute_sin(__m128d x)
{
  return __builtin_sin(x);
}

>> error: passing '__m128d' (vector of 2 'double' values) to parameter of 
>> incompatible type 'double'
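
As the error shows, `__builtin_sin` only accepts a scalar `double`. The loop vectorizer works on a different shape: a plain scalar loop over `sin`, which it may rewrite into libmvec vector calls when the vector library from this patch is selected. A minimal sketch of that shape follows; the function name and the idea of passing raw arrays are illustrative assumptions, and whether vectorization actually happens depends on the target, the optimization level, and the floating-point flags in use.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical example: each iteration calls the scalar sin(). With a
// vector-library mapping enabled, the loop vectorizer is allowed to replace
// groups of these calls with a vector variant; otherwise the loop stays
// scalar. The observable results are the same either way, up to the
// accuracy of the vector routine.
void computeSin(const double *in, double *out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    out[i] = std::sin(in[i]);
}
```

A test along the lines @abique suggests would therefore check a loop like this one, not a direct vector-typed call.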


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88154/new/

https://reviews.llvm.org/D88154

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D88154: Initial support for vectorization using Libmvec (GLIBC vector math library).

2020-10-04 Thread Venkataramanan Kumar via Phabricator via cfe-commits
venkataramanan.kumar.llvm updated this revision to Diff 296035.
venkataramanan.kumar.llvm added a reviewer: spatel.
venkataramanan.kumar.llvm added a comment.
Herald added a subscriber: pengfei.

Selection of the Glibc vector math library is enabled via the option 
-fveclib=libmvec.
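
A sketch of the kind of scalar loop this option is aimed at. Assuming an x86-64 Linux target with glibc, building it with optimization and the option above gives the loop vectorizer the choice of calling a libmvec routine such as `_ZGVdN4v_sin` instead of scalar `sin`. Whether it actually does so depends on the target features and math flags in use. The function name `fill_sin` is hypothetical.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical example: one scalar sin call per element, which is the
// pattern the loop vectorizer may map to a vector math library call.
void fill_sin(double *out, const double *in, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    out[i] = std::sin(in[i]);
}
```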


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88154/new/

https://reviews.llvm.org/D88154

Files:
  clang/include/clang/Basic/CodeGenOptions.def
  clang/include/clang/Basic/CodeGenOptions.h
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/BackendUtil.cpp
  clang/lib/Frontend/CompilerInvocation.cpp
  clang/test/Driver/autocomplete.c
  llvm/include/llvm/Analysis/TargetLibraryInfo.h
  llvm/include/llvm/Analysis/VecFuncs.def
  llvm/lib/Analysis/TargetLibraryInfo.cpp
  llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-finite.ll
  llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls.ll
  llvm/test/Transforms/Util/add-TLI-mappings.ll

Index: llvm/test/Transforms/Util/add-TLI-mappings.ll
===
--- llvm/test/Transforms/Util/add-TLI-mappings.ll
+++ llvm/test/Transforms/Util/add-TLI-mappings.ll
@@ -3,6 +3,8 @@
 ; RUN: opt -vector-library=MASSV  -inject-tli-mappings -S < %s | FileCheck %s  --check-prefixes=COMMON,MASSV
 ; RUN: opt -vector-library=MASSV  -passes=inject-tli-mappings -S < %s | FileCheck %s  --check-prefixes=COMMON,MASSV
 ; RUN: opt -vector-library=Accelerate -inject-tli-mappings -S < %s | FileCheck %s  --check-prefixes=COMMON,ACCELERATE
+; RUN: opt -vector-library=LIBMVEC -inject-tli-mappings -S < %s | FileCheck %s  --check-prefixes=COMMON,LIBMVEC
+; RUN: opt -vector-library=LIBMVEC -passes=inject-tli-mappings -S < %s | FileCheck %s  --check-prefixes=COMMON,LIBMVEC
 ; RUN: opt -vector-library=Accelerate -passes=inject-tli-mappings -S < %s | FileCheck %s  --check-prefixes=COMMON,ACCELERATE
 
 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
@@ -21,6 +23,9 @@
 ; MASSV-SAME: i8* bitcast (<4 x float> (<4 x float>)* @__log10f4_massv to i8*)
 ; ACCELERATE-SAME:  [1 x i8*] [
 ; ACCELERATE-SAME:i8* bitcast (<4 x float> (<4 x float>)* @vlog10f to i8*)
+; LIBMVEC-SAME: [2 x i8*] [
+; LIBMVEC-SAME:   i8* bitcast (<2 x double> (<2 x double>)* @_ZGVbN2v_sin to i8*),
+; LIBMVEC-SAME:   i8* bitcast (<4 x double> (<4 x double>)* @_ZGVdN4v_sin to i8*)
 ; COMMON-SAME:  ], section "llvm.metadata"
 
 define double @sin_f64(double %in) {
@@ -28,6 +33,7 @@
 ; SVML: call double @sin(double %{{.*}}) #[[SIN:[0-9]+]]
 ; MASSV:call double @sin(double %{{.*}}) #[[SIN:[0-9]+]]
 ; ACCELERATE:   call double @sin(double %{{.*}})
+; LIBMVEC:  call double @sin(double %{{.*}}) #[[SIN:[0-9]+]]
 ; No mapping of "sin" to a vector function for Accelerate.
 ; ACCELERATE-NOT: _ZGV_LLVM_{{.*}}_sin({{.*}})
   %call = tail call double @sin(double %in)
@@ -39,10 +45,12 @@
 define float @call_llvm.log10.f32(float %in) {
 ; COMMON-LABEL: @call_llvm.log10.f32(
 ; SVML: call float @llvm.log10.f32(float %{{.*}})
+; LIBMVEC:  call float @llvm.log10.f32(float %{{.*}})
 ; MASSV:call float @llvm.log10.f32(float %{{.*}}) #[[LOG10:[0-9]+]]
 ; ACCELERATE:   call float @llvm.log10.f32(float %{{.*}}) #[[LOG10:[0-9]+]]
 ; No mapping of "llvm.log10.f32" to a vector function for SVML.
 ; SVML-NOT: _ZGV_LLVM_{{.*}}_llvm.log10.f32({{.*}})
+; LIBMVEC-NOT:  _ZGV_LLVM_{{.*}}_llvm.log10.f32({{.*}})
   %call = tail call float @llvm.log10.f32(float %in)
   ret float %call
 }
@@ -62,3 +70,7 @@
 
 ; ACCELERATE:  attributes #[[LOG10]] = { "vector-function-abi-variant"=
 ; ACCELERATE-SAME:   "_ZGV_LLVM_N4v_llvm.log10.f32(vlog10f)" }
+
+; LIBMVEC:  attributes #[[SIN]] = { "vector-function-abi-variant"=
+; LIBMVEC-SAME:   "_ZGV_LLVM_N2v_sin(_ZGVbN2v_sin),
+; LIBMVEC-SAME:   _ZGV_LLVM_N4v_sin(_ZGVdN4v_sin)" }
Index: llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls.ll
===
--- /dev/null
+++ llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls.ll
@@ -0,0 +1,325 @@
+; RUN: opt -vector-library=LIBMVEC  -inject-tli-mappings -force-vector-width=4 -force-vector-interleave=1 -loop-vectorize -S < %s | FileCheck %s
+
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+declare double @sin(double) #0
+declare float @sinf(float) #0
+declare double @llvm.sin.f64(double) #0
+declare float @llvm.sin.f32(float) #0
+
+declare double @cos(double) #0
+declare float @cosf(float) #0
+declare double @llvm.cos.f64(double) #0
+declare float @llvm.cos.f32(float) #0
+
+define void @sin_f64(double* nocapture %varray) {
+; CHECK-LABEL: @sin_f64(
+; CHECK:[[TMP5:%.*]] = call <4 x double> @_ZGVdN4v_sin(<4 x double> [[TMP4:%.*]])
+; CHECK:ret void
+;
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %tmp = trunc i64 %iv to i32
+  %conv 
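
The `_ZGVbN2v_sin` and `_ZGVdN4v_sin` names checked in the tests above follow the x86 Vector Function ABI mangling: `_ZGV`, an ISA letter (`b` for SSE, `c` for AVX, `d` for AVX2, `e` for AVX-512), `N` for an unmasked variant, the vector length, one `v` per vector parameter, then `_` and the scalar name. A minimal sketch of that composition follows. `vectorVariantName` is a hypothetical helper, not an LLVM API, and it only covers the unmasked, all-vector-parameter case.

```cpp
#include <string>

// Hypothetical helper (not part of LLVM): compose an unmasked x86 vector
// variant name from its ISA letter, vector length and scalar function name.
std::string vectorVariantName(char isa, unsigned vlen,
                              const std::string &scalarName,
                              unsigned numVectorParams = 1) {
  std::string name = "_ZGV";
  name += isa;                        // 'b' = SSE, 'c' = AVX, 'd' = AVX2, 'e' = AVX-512
  name += 'N';                        // 'N' = unmasked variant
  name += std::to_string(vlen);       // number of lanes
  name.append(numVectorParams, 'v');  // one 'v' per vector parameter
  name += '_';
  name += scalarName;
  return name;
}
```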

[PATCH] D87972: [OldPM] Pass manager: run SROA after (simple) loop unrolling

2020-10-04 Thread Roman Lebedev via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG03bd5198b6f7: [OldPM] Pass manager: run SROA after (simple) 
loop unrolling (authored by lebedev.ri).

Changed prior to commit:
  https://reviews.llvm.org/D87972?vs=296005&id=296028#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87972/new/

https://reviews.llvm.org/D87972

Files:
  clang/test/CodeGenCXX/union-tbaa2.cpp
  clang/test/Misc/loop-opt-setup.c
  llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
  llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
  llvm/test/Other/opt-O2-pipeline.ll
  llvm/test/Other/opt-O3-pipeline-enable-matrix.ll
  llvm/test/Other/opt-O3-pipeline.ll
  llvm/test/Other/opt-Os-pipeline.ll
  llvm/test/Transforms/PhaseOrdering/X86/SROA-after-loop-unrolling.ll

Index: llvm/test/Transforms/PhaseOrdering/X86/SROA-after-loop-unrolling.ll
===
--- llvm/test/Transforms/PhaseOrdering/X86/SROA-after-loop-unrolling.ll
+++ llvm/test/Transforms/PhaseOrdering/X86/SROA-after-loop-unrolling.ll
@@ -22,55 +22,21 @@
 %"struct.std::array" = type { [6 x i32] }
 
 define dso_local void @_Z3fooi(i32 %cnt) {
-; OLDPM-LABEL: @_Z3fooi(
-; OLDPM-NEXT:  entry:
-; OLDPM-NEXT:[[ARR:%.*]] = alloca %"struct.std::array", align 16
-; OLDPM-NEXT:[[TMP0:%.*]] = bitcast %"struct.std::array"* [[ARR]] to i8*
-; OLDPM-NEXT:call void @llvm.lifetime.start.p0i8(i64 24, i8* nonnull [[TMP0]])
-; OLDPM-NEXT:[[ARRAYDECAY_I_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* [[ARR]], i64 0, i32 0, i64 0
-; OLDPM-NEXT:[[INCDEC_PTR:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* [[ARR]], i64 0, i32 0, i64 1
-; OLDPM-NEXT:[[INCDEC_PTR_1:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* [[ARR]], i64 0, i32 0, i64 2
-; OLDPM-NEXT:[[INCDEC_PTR_2:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* [[ARR]], i64 0, i32 0, i64 3
-; OLDPM-NEXT:[[TMP1:%.*]] = insertelement <4 x i32> undef, i32 [[CNT:%.*]], i32 0
-; OLDPM-NEXT:[[TMP2:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> undef, <4 x i32> zeroinitializer
-; OLDPM-NEXT:[[TMP3:%.*]] = add nsw <4 x i32> [[TMP2]], 
-; OLDPM-NEXT:[[TMP4:%.*]] = bitcast %"struct.std::array"* [[ARR]] to <4 x i32>*
-; OLDPM-NEXT:store <4 x i32> [[TMP3]], <4 x i32>* [[TMP4]], align 16
-; OLDPM-NEXT:[[INCDEC_PTR_3:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* [[ARR]], i64 0, i32 0, i64 4
-; OLDPM-NEXT:[[INC_4:%.*]] = add nsw i32 [[CNT]], 5
-; OLDPM-NEXT:store i32 [[INC_4]], i32* [[INCDEC_PTR_3]], align 16
-; OLDPM-NEXT:[[INCDEC_PTR_4:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* [[ARR]], i64 0, i32 0, i64 5
-; OLDPM-NEXT:[[INC_5:%.*]] = add nsw i32 [[CNT]], 6
-; OLDPM-NEXT:store i32 [[INC_5]], i32* [[INCDEC_PTR_4]], align 4
-; OLDPM-NEXT:[[TMP5:%.*]] = load i32, i32* [[ARRAYDECAY_I_I_I]], align 16
-; OLDPM-NEXT:call void @_Z3usei(i32 [[TMP5]])
-; OLDPM-NEXT:[[TMP6:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4
-; OLDPM-NEXT:call void @_Z3usei(i32 [[TMP6]])
-; OLDPM-NEXT:[[TMP7:%.*]] = load i32, i32* [[INCDEC_PTR_1]], align 8
-; OLDPM-NEXT:call void @_Z3usei(i32 [[TMP7]])
-; OLDPM-NEXT:[[TMP8:%.*]] = load i32, i32* [[INCDEC_PTR_2]], align 4
-; OLDPM-NEXT:call void @_Z3usei(i32 [[TMP8]])
-; OLDPM-NEXT:[[TMP9:%.*]] = load i32, i32* [[INCDEC_PTR_3]], align 16
-; OLDPM-NEXT:call void @_Z3usei(i32 [[TMP9]])
-; OLDPM-NEXT:call void @_Z3usei(i32 [[INC_5]])
-; OLDPM-NEXT:call void @llvm.lifetime.end.p0i8(i64 24, i8* nonnull [[TMP0]])
-; OLDPM-NEXT:ret void
-;
-; NEWPM-LABEL: @_Z3fooi(
-; NEWPM-NEXT:  entry:
-; NEWPM-NEXT:[[INC:%.*]] = add nsw i32 [[CNT:%.*]], 1
-; NEWPM-NEXT:[[INC_1:%.*]] = add nsw i32 [[CNT]], 2
-; NEWPM-NEXT:[[INC_2:%.*]] = add nsw i32 [[CNT]], 3
-; NEWPM-NEXT:[[INC_3:%.*]] = add nsw i32 [[CNT]], 4
-; NEWPM-NEXT:[[INC_4:%.*]] = add nsw i32 [[CNT]], 5
-; NEWPM-NEXT:[[INC_5:%.*]] = add nsw i32 [[CNT]], 6
-; NEWPM-NEXT:call void @_Z3usei(i32 [[INC]])
-; NEWPM-NEXT:call void @_Z3usei(i32 [[INC_1]])
-; NEWPM-NEXT:call void @_Z3usei(i32 [[INC_2]])
-; NEWPM-NEXT:call void @_Z3usei(i32 [[INC_3]])
-; NEWPM-NEXT:call void @_Z3usei(i32 [[INC_4]])
-; NEWPM-NEXT:call void @_Z3usei(i32 [[INC_5]])
-; NEWPM-NEXT:ret void
+; CHECK-LABEL: @_Z3fooi(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[INC:%.*]] = add nsw i32 [[CNT:%.*]], 1
+; CHECK-NEXT:[[INC_1:%.*]] = add nsw i32 [[CNT]], 2
+; CHECK-NEXT:[[INC_2:%.*]] = add nsw i32 [[CNT]], 3
+; CHECK-NEXT:[[INC_3:%.*]] = add nsw i32 [[CNT]], 4
+; CHECK-NEXT:[[INC_4:%.*]] = add nsw i32 [[CNT]], 5
+; CHECK-NEXT:[[INC_5:%.*]] = add nsw i32 [[CNT]], 6
+; CHECK-NEXT:call void @_Z3usei(i32 [[INC]])
+; CHECK-NEXT:call void 

[clang] 03bd519 - [OldPM] Pass manager: run SROA after (simple) loop unrolling

2020-10-04 Thread Roman Lebedev via cfe-commits

Author: Roman Lebedev
Date: 2020-10-04T11:53:50+03:00
New Revision: 03bd5198b6f7d9f49d72e6516d813a206f3b6d0d

URL: 
https://github.com/llvm/llvm-project/commit/03bd5198b6f7d9f49d72e6516d813a206f3b6d0d
DIFF: 
https://github.com/llvm/llvm-project/commit/03bd5198b6f7d9f49d72e6516d813a206f3b6d0d.diff

LOG: [OldPM] Pass manager: run SROA after (simple) loop unrolling

I have stumbled into this pretty accidentally, when rewriting
some spaghetti-like code into something more structured,
which involved using some `std::array<>`s. And to my surprise,
the `alloca`s remained, causing about `+160%` perf regression.
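
A sketch of the kind of code described here. It is reconstructed from the `_Z3fooi` / `_Z3usei` CHECK lines in SROA-after-loop-unrolling.ll rather than copied from the test, so treat the exact source as an assumption. In the test `use` is an external function; here it is a stand-in that records its arguments so the example is self-contained.

```cpp
#include <array>
#include <vector>

// Stand-in for the external use(int) of the test; records its arguments.
std::vector<int> seen;
void use(int v) { seen.push_back(v); }

void foo(int cnt) {
  std::array<int, 6> arr;
  for (int &elt : arr)  // after full unrolling: six stores into arr
    elt = ++cnt;
  for (int elt : arr)   // after full unrolling: six loads from arr
    use(elt);
}
```

Once both loops are fully unrolled, every element of `arr` is accessed at a constant index, which is what lets SROA replace the `alloca` with plain values. That matches the `add nsw` / `call` sequence in the CHECK lines.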

https://llvm-compile-time-tracker.com/compare.php?from=bb6f4d32aac3eecb51909f4facc625219307ee68&to=d563e66f40f9d4d145cb2050e41cb961e2b37785&stat=instructions
suggests that this has a geomean compile-time cost of `+0.08%`.

Note that D68593 / cecc0d27ad58c0aed8ef9ed99bbf691e137a0f26
already did this change for NewPM, but left OldPM in a pessimized state.

This fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40011 | PR40011 ]], [[ 
https://bugs.llvm.org/show_bug.cgi?id=42794 | PR42794 ]] and probably some 
other reports.

Reviewed By: nikic, xbolva00

Differential Revision: https://reviews.llvm.org/D87972

Added: 


Modified: 
clang/test/CodeGenCXX/union-tbaa2.cpp
clang/test/Misc/loop-opt-setup.c
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
llvm/test/Other/opt-O2-pipeline.ll
llvm/test/Other/opt-O3-pipeline-enable-matrix.ll
llvm/test/Other/opt-O3-pipeline.ll
llvm/test/Other/opt-Os-pipeline.ll
llvm/test/Transforms/PhaseOrdering/X86/SROA-after-loop-unrolling.ll

Removed: 




diff  --git a/clang/test/CodeGenCXX/union-tbaa2.cpp 
b/clang/test/CodeGenCXX/union-tbaa2.cpp
index 5d13ff1ad8d9..65872d4a98ae 100644
--- a/clang/test/CodeGenCXX/union-tbaa2.cpp
+++ b/clang/test/CodeGenCXX/union-tbaa2.cpp
@@ -1,4 +1,4 @@
-// RUN: %clang_cc1 %s -O2 -fno-experimental-new-pass-manager -std=c++11 
-triple x86_64-unknown-linux-gnu -target-cpu x86-64 -target-feature +sse4.2 
-target-feature +avx -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 %s -O1 -std=c++11 -triple x86_64-unknown-linux-gnu 
-target-cpu x86-64 -target-feature +sse4.2 -target-feature +avx -emit-llvm -o - 
| FileCheck %s
 
 // Testcase from llvm.org/PR32056
 

diff  --git a/clang/test/Misc/loop-opt-setup.c 
b/clang/test/Misc/loop-opt-setup.c
index 868c716c6ed7..322f5e0e6d4a 100644
--- a/clang/test/Misc/loop-opt-setup.c
+++ b/clang/test/Misc/loop-opt-setup.c
@@ -1,5 +1,5 @@
-// RUN: %clang -O1 -fexperimental-new-pass-manager -fno-unroll-loops -S -o - 
%s -emit-llvm | FileCheck %s -check-prefix=CHECK-NEWPM
-// RUN: %clang -O1 -fno-experimental-new-pass-manager -fno-unroll-loops -S -o 
- %s -emit-llvm | FileCheck %s -check-prefix=CHECK-OLDPM
+// RUN: %clang -O1 -fno-unroll-loops -S -o - %s -emit-llvm | FileCheck %s
+
 extern int a[16];
 int b = 0;
 int foo(void) {
@@ -9,10 +9,8 @@ int foo(void) {
   return b;
 }
 // Check br i1 to make sure that the loop is fully unrolled
-// CHECK-LABEL-NEWPM: foo
-// CHECK-NOT-NEWPM: br i1
-// CHECK-LABEL-OLDPM: foo
-// CHECK-NOT-OLDPM: br i1
+// CHECK-LABEL: foo
+// CHECK-NOT: br i1
 
 void Helper() {
   const int *nodes[5];
@@ -26,17 +24,7 @@ void Helper() {
 }
 
 // Check br i1 to make sure the loop is gone, there will still be a label 
branch for the infinite loop.
-// CHECK-LABEL-NEWPM: Helper
-// CHECK-NEWPM: br label
-// CHECK-NEWPM-NOT: br i1
-// CHECK-NEWPM: br label
-
-// The old pass manager doesn't remove the while loop so check for 5 load i32*.
-// CHECK-LABEL-OLDPM: Helper
-// CHECK-OLDPM: br label
-// CHECK-OLDPM: load i32*
-// CHECK-OLDPM: load i32*
-// CHECK-OLDPM: load i32*
-// CHECK-OLDPM: load i32*
-// CHECK-OLDPM: load i32*
-// CHECK-OLDPM: ret
+// CHECK-LABEL: Helper
+// CHECK: br label
+// CHECK-NOT: br i1
+// CHECK: br label

diff  --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index ccc493640b29..043effc97f2b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -479,14 +479,6 @@ void 
AMDGPUTargetMachine::adjustPassManager(PassManagerBuilder &Builder) {
   if (EnableOpt)
 PM.add(createAMDGPUPromoteAllocaToVector());
   });
-
-  Builder.addExtension(
-  PassManagerBuilder::EP_LoopOptimizerEnd,
-  [](const PassManagerBuilder &, legacy::PassManagerBase &PM) {
-// Add SROA after loop unrolling as more promotable patterns are
-// exposed after small loops are fully unrolled.
-PM.add(createSROAPass());
-  });
 }
 
 
//===--===//

diff  --git a/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp 
b/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
index c63705a4ee94..088f1e25f3d1 100644
--- 

[PATCH] D87972: [OldPM] Pass manager: run SROA after (simple) loop unrolling

2020-10-04 Thread Roman Lebedev via Phabricator via cfe-commits
lebedev.ri added a comment.

In D87972#2310603, @xbolva00 wrote:

>>! In D87972#2310595, @nikic wrote:
>>
>>> I'll just say this LGTM as it establishes parity with what NewPM has been 
>>> doing for a while already.
>
> +1

Thank you. 
I'm gonna just land this as is then.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87972/new/

https://reviews.llvm.org/D87972



[PATCH] D87972: [OldPM] Pass manager: run SROA after (simple) loop unrolling

2020-10-04 Thread Dávid Bolvanský via Phabricator via cfe-commits
xbolva00 accepted this revision.
xbolva00 added a comment.

>> I'll just say this LGTM as it establishes parity with what NewPM has been 
>> doing for a while already.

+1


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87972/new/

https://reviews.llvm.org/D87972



[PATCH] D87972: [OldPM] Pass manager: run SROA after (simple) loop unrolling

2020-10-04 Thread Nikita Popov via Phabricator via cfe-commits
nikic accepted this revision.
nikic added a comment.
This revision is now accepted and ready to land.

I'll just say this LGTM as it establishes parity with what NewPM has been doing 
for a while already.

Reviewers, in the future, please reject any patches that only change the NewPM 
pipeline or only change the LegacyPM pipeline, unless there is some good 
technical reason to do so. If there was one here, it was not mentioned in the 
original patch.




Comment at: clang/test/CodeGenCXX/union-tbaa2.cpp:1
-// RUN: %clang_cc1 %s -O2 -fno-experimental-new-pass-manager -std=c++11 
-triple x86_64-unknown-linux-gnu -target-cpu x86-64 -target-feature +sse4.2 
-target-feature +avx -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 %s -O1 -fno-experimental-new-pass-manager -std=c++11 
-triple x86_64-unknown-linux-gnu -target-cpu x86-64 -target-feature +sse4.2 
-target-feature +avx -emit-llvm -o - | FileCheck %s
 

Remove `-fno-experimental-new-pass-manager`? It was added to work around the 
NewPM/LegacyPM discrepancy.



Comment at: clang/test/Misc/loop-opt-setup.c:2
+// RUN: %clang -O1 -fexperimental-new-pass-manager -fno-unroll-loops -S -o - 
%s -emit-llvm | FileCheck %s -check-prefixes=CHECK-ALL,CHECK-NEWPM
+// RUN: %clang -O1 -fno-experimental-new-pass-manager -fno-unroll-loops -S -o 
- %s -emit-llvm | FileCheck %s -check-prefixes=CHECK-ALL,CHECK-NEWPM
 extern int a[16];

xbolva00 wrote:
> OLDPM?
Remove the NewPM/OldPM tests now that behavior is the same?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87972/new/

https://reviews.llvm.org/D87972
