from:"Jay Foad via Phabricator via cfe-commits"

[PATCH] D156989: FloatingPointMode: Use -1 for "Dynamic"

2023-08-03 Thread Jay Foad via Phabricator via cfe-commits

foad added inline comments.



Comment at: llvm/include/llvm/ADT/FloatingPointMode.h:39
+  // Special values.
+  Invalid = -2,
+

Lost the `///<` comment here.



Comment at: llvm/include/llvm/ADT/FloatingPointMode.h:41
+
+  ///< Denotes mode unknown at compile time.
+  Dynamic = -1,

Does `///<` work //before// the field name?
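For reference, here is a minimal C++ sketch of the two Doxygen comment styles under discussion. As we understand Doxygen's rules, `///<` documents the member that precedes it, while a plain `///` documents the member that follows it. A `///<` line placed before `Dynamic` would therefore most likely attach to `Invalid` instead. The enum name and the non-special members below are hypothetical; only the `Invalid = -2` and `Dynamic = -1` values come from the patch.

```cpp
// Hypothetical enum modelled on the FloatingPointMode.h hunk above; only the
// values of Invalid and Dynamic are taken from the patch.
enum DenormalKindSketch : int {
  // Special values.
  Invalid = -2, ///< Trailing "///<" documents the member to its left.

  /// A leading "///" documents the member that follows it.
  Dynamic = -1,

  IEEE = 0,     ///< Hypothetical ordinary mode.
  PreserveSign, ///< Hypothetical ordinary mode.
  PositiveZero  ///< Hypothetical ordinary mode.
};

// In this sketch the special values are negative so they cannot collide with
// the ordinary modes, which start at 0.
static_assert(Invalid < 0 && Dynamic < 0 && IEEE == 0,
              "special values must not overlap ordinary modes");
```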


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156989/new/

https://reviews.llvm.org/D156989

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D74094: [IRGen] Emit lifetime intrinsics around temporary aggregate argument allocas

2023-08-02 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.
Herald added a project: All.

Hi @erik.pilkington, I see this got reverted:

  commit e26c24b849211f35a988d001753e0cd15e4a9d7b
  Author: Erik Pilkington 
  Date:   Wed Feb 12 12:02:58 2020 -0800
  
  Revert "[IRGen] Emit lifetime intrinsics around temporary aggregate argument allocas"
  
  This reverts commit fafc6e4fdf3673dcf557d6c8ae0c0a4bb3184402.
  
  Should fix ppc stage2 failure: http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/23546

Do you have any more info on the "ppc stage2 failure"? I'd like to pursue 
something like this patch to get more accurate lifetime markers for 
temporaries, so that LLVM stack slot coloring can do a better job, and we get 
smaller stack usage. This is prompted by 
https://github.com/llvm/llvm-project/issues/41896
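Here is a hypothetical C++ illustration of the pattern being described: two calls that each pass a temporary aggregate by value. Several factors determine whether Clang creates a separate stack temporary for each argument and whether those temporaries end up sharing a slot: the target ABI, the optimization level, and the Clang version. This is a sketch of the source pattern only, not a reproducer for the linked issue.

```cpp
#include <cassert>

// Hypothetical aggregate, large enough that common 64-bit ABIs pass it in
// memory rather than in registers.
struct Big {
  int Data[64];
};

// Hypothetical callee taking the aggregate by value.
int consume(Big B) { return B.Data[0] + B.Data[63]; }

// Each call needs its own temporary Big in the caller's frame. If lifetime
// markers end each temporary's lifetime right after its call, stack slot
// coloring may let the two temporaries share one slot; without markers they
// are typically kept in separate slots.
int caller() {
  int Sum = consume(Big{{1}}); // first temporary: Data[0] == 1, rest 0
  Sum += consume(Big{{2}});    // second temporary: Data[0] == 2, rest 0
  return Sum;
}
```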


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74094/new/

https://reviews.llvm.org/D74094

[PATCH] D155429: [AMDGPU] Add targets gfx1150 and gfx1151

2023-07-17 Thread Jay Foad via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG92542f2a4000: [AMDGPU] Add targets gfx1150 and gfx1151 
(authored by foad).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155429/new/

https://reviews.llvm.org/D155429

Files:
  clang/include/clang/Basic/Cuda.h
  clang/lib/Basic/Cuda.cpp
  clang/lib/Basic/Targets/NVPTX.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/test/CodeGenOpenCL/amdgpu-features.cl
  clang/test/CodeGenOpenCL/builtins-amdgcn-gfx11.cl
  clang/test/Driver/amdgpu-macros.cl
  clang/test/Driver/amdgpu-mcpu.cl
  clang/test/Misc/target-invalid-cpu-note.c
  flang/runtime/CMakeLists.txt
  libc/cmake/modules/prepare_libc_gpu_build.cmake
  libc/src/math/gpu/vendor/amdgpu/platform.h
  llvm/docs/AMDGPUUsage.rst
  llvm/include/llvm/BinaryFormat/ELF.h
  llvm/include/llvm/TargetParser/TargetParser.h
  llvm/lib/Object/ELFObjectFile.cpp
  llvm/lib/ObjectYAML/ELFYAML.cpp
  llvm/lib/Target/AMDGPU/AMDGPU.td
  llvm/lib/Target/AMDGPU/GCNProcessors.td
  llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
  llvm/lib/TargetParser/TargetParser.cpp
  llvm/test/CodeGen/AMDGPU/directive-amdgcn-target-v3.ll
  llvm/test/CodeGen/AMDGPU/directive-amdgcn-target.ll
  llvm/test/CodeGen/AMDGPU/elf-header-flags-mach.ll
  llvm/test/CodeGen/AMDGPU/occupancy-levels.ll
  llvm/test/Object/AMDGPU/elf-header-flags-mach.yaml
  llvm/test/tools/llvm-objdump/ELF/AMDGPU/subtarget.ll
  llvm/test/tools/llvm-readobj/ELF/amdgpu-elf-headers.test
  llvm/tools/llvm-readobj/ELFDumper.cpp
  openmp/libomptarget/DeviceRTL/CMakeLists.txt

Index: openmp/libomptarget/DeviceRTL/CMakeLists.txt
===
--- openmp/libomptarget/DeviceRTL/CMakeLists.txt
+++ openmp/libomptarget/DeviceRTL/CMakeLists.txt
@@ -57,7 +57,7 @@
 set(all_amdgpu_architectures "gfx700;gfx701;gfx801;gfx803;gfx900;gfx902;gfx906"
  "gfx908;gfx90a;gfx90c;gfx940;gfx1010;gfx1030"
  "gfx1031;gfx1032;gfx1033;gfx1034;gfx1035;gfx1036"
- "gfx1100;gfx1101;gfx1102;gfx1103")
+ "gfx1100;gfx1101;gfx1102;gfx1103;gfx1150;gfx1151")
 set(all_nvptx_architectures "sm_35;sm_37;sm_50;sm_52;sm_53;sm_60;sm_61;sm_62"
 "sm_70;sm_72;sm_75;sm_80;sm_86;sm_87;sm_89;sm_90")
 set(all_gpu_architectures
Index: llvm/tools/llvm-readobj/ELFDumper.cpp
===
--- llvm/tools/llvm-readobj/ELFDumper.cpp
+++ llvm/tools/llvm-readobj/ELFDumper.cpp
@@ -1605,6 +1605,8 @@
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1101),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1102),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1103),
+  LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1150),
+  LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1151),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_FEATURE_XNACK_V3),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_FEATURE_SRAMECC_V3)
 };
@@ -1667,6 +1669,8 @@
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1101),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1102),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1103),
+  LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1150),
+  LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1151),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_FEATURE_XNACK_ANY_V4),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_FEATURE_XNACK_OFF_V4),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_FEATURE_XNACK_ON_V4),
Index: llvm/test/tools/llvm-readobj/ELF/amdgpu-elf-headers.test
===
--- llvm/test/tools/llvm-readobj/ELF/amdgpu-elf-headers.test
+++ llvm/test/tools/llvm-readobj/ELF/amdgpu-elf-headers.test
@@ -391,6 +391,24 @@
 # RUN: yaml2obj %s -o %t -DABI_VERSION=2 -DFLAG_NAME=EF_AMDGPU_MACH_AMDGCN_GFX1103
 # RUN: llvm-readobj -h %t | FileCheck %s --check-prefixes=ALL,KNOWN-ABI-VERSION,SINGLE-FLAG --match-full-lines -DABI_VERSION=2 -DFILE=%t -DFLAG_NAME=EF_AMDGPU_MACH_AMDGCN_GFX1103 -DFLAG_VALUE=0x44
 
+# RUN: yaml2obj %s -o %t -DABI_VERSION=0 -DFLAG_NAME=EF_AMDGPU_MACH_AMDGCN_GFX1150
+# RUN: llvm-readobj -h %t | FileCheck %s --check-prefixes=ALL,KNOWN-ABI-VERSION,SINGLE-FLAG --match-full-lines -DABI_VERSION=0 -DFILE=%t -DFLAG_NAME=EF_AMDGPU_MACH_AMDGCN_GFX1150 -DFLAG_VALUE=0x43
+
+# RUN: yaml2obj %s -o %t -DABI_VERSION=1 -DFLAG_NAME=EF_AMDGPU_MACH_AMDGCN_GFX1150
+# RUN: llvm-readobj -h %t | FileCheck %s --check-prefixes=ALL,KNOWN-ABI-VERSION,SINGLE-FLAG --match-full-lines -DABI_VERSION=1 -DFILE=%t -DFLAG_NAME=EF_AMDGPU_MACH_AMDGCN_GFX1150 -DFLAG_VALUE=0x43
+
+# RUN: yaml2obj %s -o %t -DABI_VERSION=2 -DFLAG_NAME=EF_AMDGPU_MACH_AMDGCN_GFX1150
+# RUN: llvm-readobj -h %t | FileCheck %s --check-prefixes=ALL,KNOWN-ABI-VERSION,SINGLE-FLAG --match-full-lines

[PATCH] D155429: [AMDGPU] Add targets gfx1150 and gfx1151

2023-07-17 Thread Jay Foad via Phabricator via cfe-commits

foad created this revision.
foad added a reviewer: AMDGPU.
Herald added subscribers: libc-commits, mattd, gchakrabarti, asavonic, 
StephenFan, kerbowa, hiraditya, tpr, dstuttard, yaxunl, jvesely, kzhuravl, 
emaste, arsenm.
Herald added a reviewer: jhenderson.
Herald added a reviewer: MaskRay.
Herald added projects: libc-project, Flang, All.
foad requested review of this revision.
Herald added subscribers: llvm-commits, openmp-commits, cfe-commits, jdoerfert, 
wdng, jholewinski.
Herald added projects: clang, OpenMP, LLVM.

This is the target definition only. Currently they are treated the same
as GFX 11.0.x.
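To make the ELFDumper and llvm-readobj test changes in this patch concrete, here is a small C++ sketch that decodes the machine field of an AMDGPU ELF header's e_flags. It assumes the machine field is the low byte (mask 0xff, our understanding of EF_AMDGPU_MACH). It lists only the two values that appear in this patch's test: 0x44 for GFX1103 and 0x43 for GFX1150. The function and constant names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumed mask selecting the machine field (low byte) of AMDGPU e_flags.
constexpr std::uint32_t kMachMaskSketch = 0x0ff;

// Maps the machine field to a processor name. Only the two values used by the
// llvm-readobj test in this patch are listed; a real table, like the one in
// ELFDumper.cpp, covers every processor.
std::string amdgpuMachNameSketch(std::uint32_t EFlags) {
  switch (EFlags & kMachMaskSketch) {
  case 0x44:
    return "gfx1103";
  case 0x43:
    return "gfx1150";
  default:
    return "unknown";
  }
}
```

Bits outside the mask are ignored, so in this sketch 0x143 still decodes as gfx1150.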


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D155429

[PATCH] D153953: Revert "[AMDGPU] Mark mbcnt as convergent"

2023-06-29 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

In D153953#4458134 , @sameerds wrote:

> @pravinjagtap @arsenm ... reverting the mbcnt intrinsic affects tests that 
> were added for atomic optimizations. In particular, the mbcnt is now being 
> moved across/into/out of control flow because it is no longer convergent. I 
> eyeballed one example and it seemed okay to me, but a more thorough check 
> will be useful.

They are just being moved from before the loop to after the loop. This is fine. 
It is even a bit weird that the atomic optimizer pass emits them before the 
loop in the first place.
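Here is a scalar C++ model of what a single lane computes with the usual mbcnt_lo/mbcnt_hi pair. It is based on our reading of the AMDGPU ISA documentation and, for brevity, folds the pair into one 64-bit function; the name and signature are hypothetical. The result depends only on the explicit mask operand and the lane id, not on which lanes are currently active. This is consistent with the point above that moving these calls out of the loop is fine.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Models mbcnt_hi(mask_hi, mbcnt_lo(mask_lo, Acc)) for one lane of a wave64:
// Acc plus the number of set bits of Mask strictly below Lane. Nothing here
// reads the exec mask, so the value does not depend on control flow.
std::uint32_t mbcntModel(std::uint64_t Mask, unsigned Lane,
                         std::uint32_t Acc = 0) {
  assert(Lane < 64 && "wave64 lane id expected");
  std::uint64_t Below = (std::uint64_t{1} << Lane) - 1; // bits [0, Lane)
  return Acc +
         static_cast<std::uint32_t>(std::bitset<64>(Mask & Below).count());
}
```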


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153953/new/

https://reviews.llvm.org/D153953

[PATCH] D152901: AMDGPU: Add llvm.amdgcn.exp2 intrinsic

2023-06-14 Thread Jay Foad via Phabricator via cfe-commits

foad added inline comments.



Comment at: llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h:463
+  // exp2, no denormal handling for f32.
+  EXP,
+

Is this used anywhere?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152901/new/

https://reviews.llvm.org/D152901

[PATCH] D152697: AMDGPU: Add llvm.amdgcn.log intrinsic

2023-06-12 Thread Jay Foad via Phabricator via cfe-commits

foad accepted this revision.
foad added a comment.
This revision is now accepted and ready to land.

Seems fine.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152697/new/

https://reviews.llvm.org/D152697

[PATCH] D149776: Re-land "[AMDGPU] Define data layout entries for buffers""

2023-05-05 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

Hi, with the new datalayout we're hitting this crash:

  ; RUN: opt -passes=indvars -S < %s
  
  target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8"
  target triple = "amdgcn--amdpal"
  
  define void @f(ptr addrspace(7) %arg) {
  bb:
    br label %bb1
  bb1:
    %i = getelementptr i32, ptr addrspace(7) %arg, i32 2
    br i1 false, label %bb2, label %bb1
  bb2:
    br label %bb3
  bb3:
    %i4 = load i32, ptr addrspace(7) %i, align 4
    br label %bb3
  }

  PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
  Stack dump:
  0. Program arguments: /home/jayfoad2/llvm-release/bin/opt -passes=indvars -S
   #0 0x063a25e7 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/jayfoad2/llvm-release/bin/opt+0x63a25e7)
   #1 0x063a049e llvm::sys::RunSignalHandlers() (/home/jayfoad2/llvm-release/bin/opt+0x63a049e)
   #2 0x063a2c8a SignalHandler(int) Signals.cpp:0:0
   #3 0x7fdb0be42520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
   #4 0x05b0b80f llvm::EVT::isExtendedVector() const (/home/jayfoad2/llvm-release/bin/opt+0x5b0b80f)
   #5 0x05a59bf9 llvm::TargetLoweringBase::getTypeConversion(llvm::LLVMContext&, llvm::EVT) const (/home/jayfoad2/llvm-release/bin/opt+0x5a59bf9)
   #6 0x0423cf63 llvm::BasicTTIImplBase::getTypeLegalizationCost(llvm::Type*) const AMDGPUTargetMachine.cpp:0:0
   #7 0x04305117 llvm::GCNTTIImpl::getArithmeticInstrCost(unsigned int, llvm::Type*, llvm::TargetTransformInfo::TargetCostKind, llvm::TargetTransformInfo::OperandValueInfo, llvm::TargetTransformInfo::OperandValueInfo, llvm::ArrayRef, llvm::Instruction const*) AMDGPUTargetTransformInfo.cpp:0:0
   #8 0x05549453 llvm::TargetTransformInfo::getArithmeticInstrCost(unsigned int, llvm::Type*, llvm::TargetTransformInfo::TargetCostKind, llvm::TargetTransformInfo::OperandValueInfo, llvm::TargetTransformInfo::OperandValueInfo, llvm::ArrayRef, llvm::Instruction const*) const (/home/jayfoad2/llvm-release/bin/opt+0x5549453)
   #9 0x064b5695 llvm::SCEVExpander::isHighCostExpansionHelper(llvm::SCEVOperand const&, llvm::Loop*, llvm::Instruction const&, llvm::InstructionCost&, unsigned int, llvm::TargetTransformInfo const&, llvm::SmallPtrSetImpl&, llvm::SmallVectorImpl&) (/home/jayfoad2/llvm-release/bin/opt+0x64b5695)
  #10 0x06475d2b llvm::SCEVExpander::isHighCostExpansion(llvm::ArrayRef, llvm::Loop*, unsigned int, llvm::TargetTransformInfo const*, llvm::Instruction const*) LoopUnrollRuntime.cpp:0:0
  #11 0x064813ab llvm::rewriteLoopExitValues(llvm::Loop*, llvm::LoopInfo*, llvm::TargetLibraryInfo*, llvm::ScalarEvolution*, llvm::TargetTransformInfo const*, llvm::SCEVExpander&, llvm::DominatorTree*, llvm::ReplaceExitVal, llvm::SmallVector&) (/home/jayfoad2/llvm-release/bin/opt+0x64813ab)
  #12 0x06a71b09 (anonymous namespace)::IndVarSimplify::run(llvm::Loop*) IndVarSimplify.cpp:0:0
  #13 0x06a7 llvm::IndVarSimplifyPass::run(llvm::Loop&, llvm::AnalysisManager&, llvm::LoopStandardAnalysisResults&, llvm::LPMUpdater&) (/home/jayfoad2/llvm-release/bin/opt+0x6a7)
  #14 0x065bef0d llvm::detail::PassModel, llvm::LoopStandardAnalysisResults&, llvm::LPMUpdater&>::run(llvm::Loop&, llvm::AnalysisManager&, llvm::LoopStandardAnalysisResults&, llvm::LPMUpdater&) PassBuilder.cpp:0:0
  #15 0x0679b933 std::optional llvm::PassManager, llvm::LoopStandardAnalysisResults&, llvm::LPMUpdater&>::runSinglePass, llvm::LoopStandardAnalysisResults&, llvm::LPMUpdater&>, std::default_delete, llvm::LoopStandardAnalysisResults&, llvm::LPMUpdater&(llvm::Loop&, std::unique_ptr, llvm::LoopStandardAnalysisResults&, llvm::LPMUpdater&>, std::default_delete, llvm::LoopStandardAnalysisResults&, llvm::LPMUpdater&>>>&, llvm::AnalysisManager&, llvm::LoopStandardAnalysisResults&, llvm::LPMUpdater&, llvm::PassInstrumentation&) (/home/jayfoad2/llvm-release/bin/opt+0x679b933)
  #16 0x0679b5a2 llvm::PassManager, llvm::LoopStandardAnalysisResults&, llvm::LPMUpdater&>::runWithoutLoopNestPasses(llvm::Loop&, llvm::AnalysisManager&, llvm::LoopStandardAnalysisResults&, llvm::LPMUpdater&) (/home/jayfoad2/llvm-release/bin/opt+0x679b5a2)
  #17 0x0679ab88 llvm::PassManager, llvm::LoopStandardAnalysisResults&, llvm::LPMUpdater&>::run(llvm::Loop&, llvm::AnalysisManager&, llvm::LoopStandardAnalysisResults&, llvm::LPMUpdater&) (/home/jayfoad2/llvm-release/bin/opt+0x679ab88)
  #18 0x0659ddbd llvm::detail::PassModel, llvm::LoopStandardAnalysisResults&, llvm::LPMUpdater&>, llvm::PreservedAnalyses, llvm::AnalysisManager, llvm::LoopStandardAnalysisResults&, llvm::LPMUpdater&>::run(llvm::Loop&, llvm::AnalysisManager&,

[PATCH] D147732: [AMDGPU] Add f32 permlane{16, x16} builtin variants

2023-04-14 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

In D147732#4268661 , @rampitec wrote:

> In D147732#4267553 , @foad wrote:
>
>> Changing the existing intrinsics to use type mangling could break clients 
>> like LLPC and Mesa. I've put up a patch for LLPC to protect it against this 
>> change: https://github.com/GPUOpen-Drivers/llpc/pull/2404
>
> It can be fixed with IR autoupgrade I suppose.

No, I'm thinking of clients that use IRBuilder to create intrinsic calls 
programmatically.
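Here is a simplified C++ model of why such clients break. Once an intrinsic is overloaded, its name carries a type suffix, as in llvm.fabs.f32. Code that builds calls through the C++ API must then supply the overload types, e.g. via the Tys argument of Intrinsic::getDeclaration or the Types argument of IRBuilderBase::CreateIntrinsic. As we understand it, IR autoupgrade only rewrites old-style calls in IR that is read from text or bitcode, so it never sees calls constructed this way. The helper below is hypothetical and handles only plain scalar type names. The .i32 suffix for permlane16 is an assumption about the post-patch naming.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified model of overloaded-intrinsic naming: each overloaded type
// appends ".<type>" to the base name. Real LLVM mangling covers many more
// type kinds (vectors, pointers, structs); this handles scalar names only.
std::string mangleSketch(const std::string &Base,
                         const std::vector<std::string> &OverloadTypes) {
  std::string Name = Base;
  for (const std::string &Ty : OverloadTypes)
    Name += "." + Ty;
  return Name;
}
```

A client that still asks for the unsuffixed name (no overload types) gets a different declaration from the one an overloaded definition expects.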


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147732/new/

https://reviews.llvm.org/D147732

[PATCH] D147732: [AMDGPU] Add f32 permlane{16, x16} builtin variants

2023-04-14 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

Changing the existing intrinsics to use type mangling could break clients like 
LLPC and Mesa. I've put up a patch for LLPC to protect it against this change: 
https://github.com/GPUOpen-Drivers/llpc/pull/2404


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147732/new/

https://reviews.llvm.org/D147732

[PATCH] D146808: [AMDGPU] Add clang builtin for __builtin_amdgcn_ds_atomic_fadd_v2f16

2023-03-24 Thread Jay Foad via Phabricator via cfe-commits

foad accepted this revision.
foad added a comment.
This revision is now accepted and ready to land.

LGTM.




Comment at: clang/include/clang/Basic/BuiltinsAMDGPU.def:234
 TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_v2bf16, "V2sV2s*3V2s", "t", "atomic-ds-pk-add-16-insts")
+TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_v2f16, "V2hV2h*3V2h", "t", "atomic-ds-pk-add-16-insts")
 

Just curious - is there a reason that these builtins can't be declared in 
LLVM's IntrinsicsAMDGPU.td?
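For readers unfamiliar with the prototype strings in these TARGET_BUILTIN entries, the encoding works as follows:

- The first type is the return type; the rest are the parameters.
- V<N> introduces an N-element vector.
- h is half and s is short.
- * makes the preceding type a pointer, optionally in the numbered address space (3 is LDS and 1 is global on AMDGPU).

So "V2hV2h*3V2h" reads as a <2 x half> result, taking a pointer to <2 x half> in address space 3 and a <2 x half> value. The hypothetical decoder below handles only that subset of the Builtins.def alphabet and prints LLVM-IR-like spellings.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Decodes one scalar code from a small subset of the Builtins.def alphabet.
static std::string scalarSketch(char C) {
  switch (C) {
  case 'h': return "half";
  case 's': return "short";
  case 'f': return "float";
  case 'i': return "int";
  default:  return "?";
  }
}

// Reads a run of decimal digits starting at I, advancing I past them.
static std::string digitsSketch(const std::string &S, std::size_t &I) {
  std::string D;
  while (I < S.size() && std::isdigit(static_cast<unsigned char>(S[I])))
    D += S[I++];
  return D;
}

// Splits a prototype such as "V2hV2h*3V2h" into the return type followed by
// the parameter types. Handles only V<N><scalar>, plain scalars, and a '*'
// suffix with an optional address-space number.
std::vector<std::string> decodePrototypeSketch(const std::string &Proto) {
  std::vector<std::string> Types;
  std::size_t I = 0;
  while (I < Proto.size()) {
    std::string Ty;
    if (Proto[I] == 'V') {
      ++I;
      std::string N = digitsSketch(Proto, I);
      Ty = "<" + N + " x " + scalarSketch(Proto[I++]) + ">";
    } else {
      Ty = scalarSketch(Proto[I++]);
    }
    // A '*' turns the type just decoded into a pointer.
    if (I < Proto.size() && Proto[I] == '*') {
      ++I;
      std::string AS = digitsSketch(Proto, I);
      Ty += AS.empty() ? std::string("*") : " addrspace(" + AS + ")*";
    }
    Types.push_back(Ty);
  }
  return Types;
}
```

The bf16 variants spell their element type as s (short), presumably because the values are carried as 16-bit integers at the builtin level.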


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146808/new/

https://reviews.llvm.org/D146808

[PATCH] D146701: [AMDGPU] Create Subtarget Features for some of 16 bits atomic fadd instructions

2023-03-24 Thread Jay Foad via Phabricator via cfe-commits

foad accepted this revision.
foad added a comment.
This revision is now accepted and ready to land.

LGTM, thanks!

If you want to remove some of the other unnecessary predicates from Real 
instructions you could do that in a separate NFC patch.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146701/new/

https://reviews.llvm.org/D146701

[PATCH] D146701: [AMDGPU] Create Subtarget Features for some of 16 bits atomic fadd instructions

2023-03-23 Thread Jay Foad via Phabricator via cfe-commits

foad added inline comments.



Comment at: llvm/lib/Target/AMDGPU/BUFInstructions.td:2891
 
+let SubtargetPredicate = HasAtomicFaddNoRtnInsts in {
+defm BUFFER_ATOMIC_ADD_F32: MUBUF_Real_Atomic_vi <0x4d>;

mariusz-sikora-at-amd wrote:
> foad wrote:
> > Is this still required?
> No. We can remove this. But I wanted to limit this change to atomic 
> f16/bf16 only and not go any deeper.
OK. I thought this was something you added in this patch, but now I see it is 
just moved around.



Comment at: llvm/lib/Target/AMDGPU/FLATInstructions.td:1915
 
+defm GLOBAL_ATOMIC_PK_ADD_F16 : FLAT_Global_Real_Atomics_vi <0x04e, 0>;
+defm GLOBAL_ATOMIC_PK_ADD_BF16 : FLAT_Global_Real_Atomics_vi<0x52>;

mariusz-sikora-at-amd wrote:
> foad wrote:
> > Are these changes (from here to the end of the file) still required?
> Not sure I understand what you mean. Could you please elaborate? Are 
> you referring to the fact that both flat_atomic and global_atomic have FLAT 
> encoding and could be unified ?
> I thought this is required, but now you got me thinking ...
I don't understand why the changes from here to the end of the file are 
required. It looks like you have just moved some definitions around, so that 
they no longer have a SubtargetPredicate applied. Is that correct? Why?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146701/new/

https://reviews.llvm.org/D146701

[PATCH] D146701: [AMDGPU] Create Subtarget Features for some of 16 bits atomic fadd instructions

2023-03-23 Thread Jay Foad via Phabricator via cfe-commits

foad added inline comments.



Comment at: llvm/lib/Target/AMDGPU/BUFInstructions.td:2891
 
+let SubtargetPredicate = HasAtomicFaddNoRtnInsts in {
+defm BUFFER_ATOMIC_ADD_F32: MUBUF_Real_Atomic_vi <0x4d>;

Is this still required?



Comment at: llvm/lib/Target/AMDGPU/FLATInstructions.td:1915
 
+defm GLOBAL_ATOMIC_PK_ADD_F16 : FLAT_Global_Real_Atomics_vi <0x04e, 0>;
+defm GLOBAL_ATOMIC_PK_ADD_BF16 : FLAT_Global_Real_Atomics_vi<0x52>;

Are these changes (from here to the end of the file) still required?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146701/new/

https://reviews.llvm.org/D146701

[PATCH] D146701: [AMDGPU] Create Subtarget Features for some of 16 bits atomic fadd instructions

2023-03-23 Thread Jay Foad via Phabricator via cfe-commits

foad added inline comments.



Comment at: llvm/lib/Target/AMDGPU/BUFInstructions.td:2889
 
-defm BUFFER_ATOMIC_ADD_F32: MUBUF_Real_Atomic_vi <0x4d>;
+let SubtargetPredicate = HasAtomicBufferGlobalPkAddF16NoRtnInsts in {
 defm BUFFER_ATOMIC_PK_ADD_F16 : MUBUF_Real_Atomic_vi <0x4e>;

mariusz-sikora-at-amd wrote:
> foad wrote:
> > Could remove the braces if you prefer - then you don't need the "End" 
> > comment either.
> So, as I understand from the other comment:
> 
> > Generally Real instructions copy their predicates from the corresponding 
> > Pseudo, so this should not be required here. Please check the other places 
> > where you have added predicates to Real instructions too.
> 
> We do not need this (L2889) Predicate, because it was added to the Pseudo 
> Instruction?
Correct. See the places commented "copy relevant pseudo op flags" in this file.
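Here is a C++ analogy of the mechanism being referred to. It is not TableGen, and all names are hypothetical. Each Real encoding is built from its Pseudo and takes the Pseudo's SubtargetPredicate, roughly what the "copy relevant pseudo op flags" sections do. Setting the predicate on the Pseudo is therefore enough; repeating it on the Real adds nothing.

```cpp
#include <cassert>
#include <string>

// Stand-in for a Pseudo instruction record.
struct PseudoSketch {
  std::string Name;
  std::string SubtargetPredicate;
};

// Stand-in for a Real (encoded) instruction record.
struct RealSketch {
  std::string Name;
  unsigned Opcode;
  std::string SubtargetPredicate;

  // The Real copies its predicate from the Pseudo it encodes, so it never
  // needs a predicate of its own.
  RealSketch(const PseudoSketch &PS, unsigned Op, const std::string &Suffix)
      : Name(PS.Name + Suffix), Opcode(Op),
        SubtargetPredicate(PS.SubtargetPredicate) {}
};
```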


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146701/new/

https://reviews.llvm.org/D146701

[PATCH] D146701: [AMDGPU] Create Subtarget Features for some of 16 bits atomic fadd instructions

2023-03-23 Thread Jay Foad via Phabricator via cfe-commits

foad added inline comments.



Comment at: clang/include/clang/Basic/BuiltinsAMDGPU.def:233
+TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_v2bf16, "V2sV2s*1V2s", "t", "atomic-global-pk-add-bf16-inst")
+TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_v2bf16, "V2sV2s*3V2s", "t", "atomic-ds-pk-add-16-insts")
 

So __builtin_amdgcn_ds_atomic_fadd_v2f16 is missing here? (Just curious - I am 
not asking you to add it in this patch.)



Comment at: llvm/lib/Target/AMDGPU/BUFInstructions.td:2889
 
-defm BUFFER_ATOMIC_ADD_F32: MUBUF_Real_Atomic_vi <0x4d>;
+let SubtargetPredicate = HasAtomicBufferGlobalPkAddF16NoRtnInsts in {
 defm BUFFER_ATOMIC_PK_ADD_F16 : MUBUF_Real_Atomic_vi <0x4e>;

Could remove the braces if you prefer - then you don't need the "End" comment 
either.



Comment at: llvm/lib/Target/AMDGPU/FLATInstructions.td:1916
+let SubtargetPredicate = HasAtomicBufferGlobalPkAddF16NoRtnInsts in
+defm GLOBAL_ATOMIC_PK_ADD_F16 : FLAT_Global_Real_Atomics_vi <0x04e, 0>;
+

Generally Real instructions copy their predicates from the corresponding 
Pseudo, so this should not be required here. Please check the other places 
where you have added predicates to Real instructions too.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146701/new/

https://reviews.llvm.org/D146701

[PATCH] D145441: [AMDGPU] Define data layout entries for buffers

2023-03-07 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

Just my 2p: it feels a bit premature to commit patches for this. It feels more 
like something you could prototype on a branch somewhere and come back when you 
have more experience with how it all works out in practice.

But I don't actually object to the patch, if the other reviewers are happy and 
it doesn't break anything.

> The first is address space 7, a non-integral address space (which was
> already in the data layout) that has 160-bit pointers (which are
> 256-bit aligned)

Any particular reason for choosing 256-bit alignment?
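For reference, the LLVM LangRef describes pointer entries in a data layout string as p[n]:<size>:<abi>[:<pref>][:<idx>]. All values are in bits; <pref> defaults to <abi>, and <idx> (the index width) defaults to <size>. The layout in the D149776 crash report above contains two such entries:

- p7:160:256:256:32, i.e. 160-bit pointers with 256-bit ABI and preferred alignment and 32-bit indices.
- p8:128:128.

Below is a minimal C++ sketch of reading one such entry, assuming well-formed input; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Fields of one pointer entry in a data layout string, in bits.
struct PointerSpecSketch {
  unsigned AddrSpace = 0;
  unsigned SizeBits = 0;
  unsigned AbiAlignBits = 0;
  unsigned PrefAlignBits = 0;
  unsigned IndexBits = 0;
};

static unsigned toUnsignedSketch(const std::string &S) {
  return static_cast<unsigned>(std::stoul(S));
}

// Parses "p[n]:<size>:<abi>[:<pref>][:<idx>]". Assumes well-formed input.
PointerSpecSketch parsePointerSpecSketch(const std::string &Entry) {
  std::vector<std::string> Parts;
  std::stringstream SS(Entry);
  std::string Part;
  while (std::getline(SS, Part, ':'))
    Parts.push_back(Part);

  PointerSpecSketch P;
  // "p" alone is address space 0; "p7" is address space 7.
  P.AddrSpace = Parts[0].size() > 1 ? toUnsignedSketch(Parts[0].substr(1)) : 0;
  P.SizeBits = toUnsignedSketch(Parts[1]);
  P.AbiAlignBits = toUnsignedSketch(Parts[2]);
  // Optional fields fall back to the documented defaults.
  P.PrefAlignBits = Parts.size() > 3 ? toUnsignedSketch(Parts[3]) : P.AbiAlignBits;
  P.IndexBits = Parts.size() > 4 ? toUnsignedSketch(Parts[4]) : P.SizeBits;
  return P;
}
```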

> However, they must not be used as the arguments to
> getelementptr or otherwise used in address computations

I don't understand what kind of rule this is and how it would be enforced. Is 
it something that will be written into the IR LangRef?

> This commit also updates the "fallback address space" for buffer
> intrinsics to the buffer resource,

It's not clear to me that this is any more or less correct, since 7 and 8 
behave identically wrt alias analysis don't they?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145441/new/

https://reviews.llvm.org/D145441

[PATCH] D142823: Intrinsics: Allow tablegen to mark parameters with dereferenceable

2023-01-31 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

In D142823#4093363 , @arsenm wrote:

> In D142823#4093357 , @foad wrote:
>
>> I think the tablegen functionality should be a separate patch from the 
>> amdgpu changes.
>
> Maybe, but then it’s untested in the patch which adds it

Not if you add a test. There are some already like test/TableGen/immarg.td and 
test/TableGen/intrin-side-effects.td.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142823/new/

https://reviews.llvm.org/D142823

[PATCH] D142968: [NFC] Extract `CodeGenInstAlias` into its own .h/.cpp

2023-01-31 Thread Jay Foad via Phabricator via cfe-commits

foad accepted this revision.
foad added a comment.
This revision is now accepted and ready to land.

Seems reasonable.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142968/new/

https://reviews.llvm.org/D142968

[PATCH] D142823: Intrinsics: Allow tablegen to mark parameters with dereferenceable

2023-01-31 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

I think the tablegen functionality should be a separate patch from the amdgpu 
changes.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142823/new/

https://reviews.llvm.org/D142823

[PATCH] D142968: [NFC] Extract `CodeGenInstAlias` into its own .h/.cpp

2023-01-31 Thread Jay Foad via Phabricator via cfe-commits

foad added inline comments.



Comment at: clang/docs/tools/clang-formatted-files.txt:7421
 llvm/utils/not/not.cpp
+llvm/Utils/TableGen/CodeGenInstAlias.cpp
+llvm/Utils/TableGen/CodeGenInstAlias.h

Should come after CodeBeads in alphabetical order?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142968/new/

https://reviews.llvm.org/D142968

[PATCH] D142968: [NFC] Extract `CodeGenInstAlias` into its own .h/.cpp

2023-01-31 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

Looks OK, but what's the motivation for it?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142968/new/

https://reviews.llvm.org/D142968

[PATCH] D141798: Remove ZeroBehavior of countLeadingZeros and the like (NFC)

2023-01-17 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.
Herald added a subscriber: luke.

In D141798#4055050 , @barannikov88 
wrote:

> It would be nice to have comments reflecting the new behavior in the case of 
> 0 / max value.

+1
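Here is a sketch of the kind of comment being asked for, written on a hypothetical portable countLeadingZeros for 32-bit values. It assumes that, with ZeroBehavior gone, a zero input returns the bit width, which is our understanding of the old default (ZB_Width). The loop implementation is for illustration only.

```cpp
#include <cassert>
#include <cstdint>

/// Count the number of zero bits from the most significant bit down to the
/// first set bit. Returns 32 (the bit width) when \p Val is 0.
unsigned countLeadingZerosSketch(std::uint32_t Val) {
  if (Val == 0)
    return 32; // documented zero case: the full bit width
  unsigned N = 0;
  // Walk down from the top bit until a set bit is found.
  for (std::uint32_t Bit = 0x80000000u; (Val & Bit) == 0; Bit >>= 1)
    ++N;
  return N;
}
```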


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141798/new/

https://reviews.llvm.org/D141798

[PATCH] D140467: [X86][Reduce] Preserve fast math flags when change it. NFCI

2022-12-21 Thread Jay Foad via Phabricator via cfe-commits

foad added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:14740
 CGM.getIntrinsic(Intrinsic::vector_reduce_fadd, Ops[1]->getType());
+FastMathFlags FMF = Builder.getFastMathFlags();
 Builder.getFastMathFlags().setAllowReassoc();

We have FastMathFlagGuard for automatically saving and restoring fast math 
flags.
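
As an illustration, here is a minimal sketch of the save/restore pattern such a guard provides. `FastMathFlags`, `Builder` and `FastMathFlagGuard` below are simplified, hypothetical stand-ins written for this example, not LLVM's classes. In LLVM the real class is `IRBuilderBase::FastMathFlagGuard`, and it may also save other floating-point builder state. The guard copies the flags when it is constructed and writes them back in its destructor. That means the `setAllowReassoc()` change cannot leak past the end of the scope, and the manual `FMF` copy in the diff is no longer needed.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for llvm::FastMathFlags.
struct FastMathFlags {
  bool AllowReassoc = false;
  void setAllowReassoc() { AllowReassoc = true; }
  bool allowReassoc() const { return AllowReassoc; }
};

// Hypothetical stand-in for the IR builder: it just owns the current flags.
struct Builder {
  FastMathFlags FMF;
  FastMathFlags &getFastMathFlags() { return FMF; }
};

// RAII guard: copies the builder's flags on construction and writes them
// back on destruction, undoing any change made while it is alive.
class FastMathFlagGuard {
  Builder &B;
  FastMathFlags Saved;

public:
  explicit FastMathFlagGuard(Builder &Bld)
      : B(Bld), Saved(Bld.getFastMathFlags()) {}
  ~FastMathFlagGuard() { B.getFastMathFlags() = Saved; }
  FastMathFlagGuard(const FastMathFlagGuard &) = delete;
  FastMathFlagGuard &operator=(const FastMathFlagGuard &) = delete;
};

// Shaped like the reduce_fadd case: reassociation is enabled only while the
// guard is alive. Returns the flag value as seen inside the scope.
bool emitReduceFAdd(Builder &B) {
  FastMathFlagGuard Guard(B);
  B.getFastMathFlags().setAllowReassoc();
  // ... the call to the reduction intrinsic would be created here ...
  return B.getFastMathFlags().allowReassoc();
}
```

In CGBuiltin.cpp the equivalent would presumably be a single `IRBuilderBase::FastMathFlagGuard FMFGuard(Builder);` declaration placed before the `setAllowReassoc()` call.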


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140467/new/

https://reviews.llvm.org/D140467

[PATCH] D140294: clang: Replace implementation of __builtin_isnormal

2022-12-19 Thread Jay Foad via Phabricator via cfe-commits

foad added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:3308
   case Builtin::BI__builtin_isnormal: {
-// isnormal(x) --> x == x && fabsf(x) < infinity && fabsf(x) >= float_min
+// isnormal(x) --> fabs(x) < infinity && !(fabs(x) < float_min)
 CodeGenFunction::CGFPOptionsRAII FPOptsRAII(*this, E);

Why not make both compares ordered and write this as just `fabs(x) < infinity 
&& fabs(x) >= float_min`? That seems conceptually simpler - i.e. it makes the 
comment easier to understand and the IR no worse.
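
Here is a small C++ sketch of the suggested expansion, assuming `float` operands. The name `isNormalSketch` is hypothetical. In C++, `<` and `>=` on floating-point values are ordered comparisons, meaning they are false whenever an operand is NaN. As a result, NaN, infinities, zeros and subnormals are all rejected without a separate `x == x` test.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// isnormal(x) --> fabs(x) < infinity && fabs(x) >= float_min
// Both comparisons are ordered, so a NaN input makes each of them false.
bool isNormalSketch(float X) {
  const float Inf = std::numeric_limits<float>::infinity();
  // Smallest positive *normal* float (FLT_MIN), not the smallest subnormal.
  const float MinNormal = std::numeric_limits<float>::min();
  const float A = std::fabs(X);
  return A < Inf && A >= MinNormal;
}
```

For every input, the result should agree with `std::isnormal`, which makes that function a convenient cross-check.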


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140294/new/

https://reviews.llvm.org/D140294

[PATCH] D137960: [Lexer] Speedup LexTokenInternal

2022-11-17 Thread Jay Foad via Phabricator via cfe-commits

foad added inline comments.



Comment at: clang/lib/Lex/Lexer.cpp:3520-3521
+LexStart:
+  assert(!Result.needsCleaning() && "Result doesn't need cleaning");
+  assert(!Result.hasPtrData() && "Result has been reset");
 

Messages are backwards. They should be like "Result needs cleaning!" and 
"Result has not been reset!".

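In the `assert(Cond && "message")` idiom, the string appears in the diagnostic only when `Cond` is false. That is why the message should describe the failure. The sketch below has the messages the right way round. `Token` is a minimal hypothetical stand-in for the lexer's token class and models only the two queries used.

```cpp
#include <cassert>

// Hypothetical stand-in for the lexer's token; only the two queries used by
// the assertions are modelled.
struct Token {
  bool NeedsCleaning = false;
  const char *PtrData = nullptr;
  bool needsCleaning() const { return NeedsCleaning; }
  bool hasPtrData() const { return PtrData != nullptr; }
};

// Each message states what went wrong, because it is only reported when the
// asserted condition does not hold.
void checkFreshToken(const Token &Result) {
  assert(!Result.needsCleaning() && "Result needs cleaning!");
  assert(!Result.hasPtrData() && "Result has not been reset!");
  (void)Result; // Avoid an unused-parameter warning when NDEBUG is defined.
}
```
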

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137960/new/

https://reviews.llvm.org/D137960

[PATCH] D137524: clang/AMDGPU: Emit atomicrmw for atomic_inc/dec builtins

2022-11-07 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

In D137524#3911439, @JonChesterfield wrote:

> Do you know where the uinc_wrap etc were introduced?

D137361 in the stack for this patch.
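
For context, the sketch below shows the value that the `uinc_wrap` and `udec_wrap` operations store, based on our reading of the LLVM Language Reference. The function names are hypothetical, and the sketch is not atomic. The real `atomicrmw` performs the read-modify-write atomically and returns the old value.

```cpp
#include <cassert>
#include <cstdint>

// uinc_wrap: new = (old u>= val) ? 0 : old + 1
std::uint32_t uincWrap(std::uint32_t Old, std::uint32_t Val) {
  return Old >= Val ? 0 : Old + 1;
}

// udec_wrap: new = (old == 0 || old u> val) ? val : old - 1
std::uint32_t udecWrap(std::uint32_t Old, std::uint32_t Val) {
  return (Old == 0 || Old > Val) ? Val : Old - 1;
}
```
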


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137524/new/

https://reviews.llvm.org/D137524

[PATCH] D137338: Fix dupe word typos

2022-11-03 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

I committed the lib/Target/AMDGPU parts as 5073ae2a883f.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137338/new/

https://reviews.llvm.org/D137338

[PATCH] D125557: [APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf

2022-05-19 Thread Jay Foad via Phabricator via cfe-commits

This revision was automatically updated to reflect the committed changes.
Closed by commit rG6bec3e9303d6: [APInt] Remove all uses of zextOrSelf, 
sextOrSelf and truncOrSelf (authored by foad).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125557/new/

https://reviews.llvm.org/D125557

Files:
  clang/lib/AST/ExprConstant.cpp
  clang/lib/AST/MicrosoftMangle.cpp
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/Sema/SemaDecl.cpp
  clang/lib/StaticAnalyzer/Core/LoopUnrolling.cpp
  llvm/lib/Analysis/BasicAliasAnalysis.cpp
  llvm/lib/Analysis/ConstantFolding.cpp
  llvm/lib/Analysis/LazyValueInfo.cpp
  llvm/lib/Analysis/MemoryBuiltins.cpp
  llvm/lib/Analysis/ScalarEvolution.cpp
  llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
  llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
  llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
  llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
  llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
  llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
  llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
  llvm/lib/IR/ConstantRange.cpp
  llvm/lib/Support/APFixedPoint.cpp
  llvm/lib/Support/APInt.cpp
  llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp
  llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
  llvm/lib/Target/Hexagon/HexagonConstPropagation.cpp
  llvm/lib/Target/RISCV/RISCVISelLowering.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86TargetTransformInfo.cpp
  llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
  llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
  llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
  llvm/test/TableGen/VarLenEncoder.td
  llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
  polly/lib/CodeGen/IslExprBuilder.cpp

Index: polly/lib/CodeGen/IslExprBuilder.cpp
===
--- polly/lib/CodeGen/IslExprBuilder.cpp
+++ polly/lib/CodeGen/IslExprBuilder.cpp
@@ -765,7 +765,7 @@
   else
 T = Builder.getIntNTy(BitWidth);
 
-  APValue = APValue.sextOrSelf(T->getBitWidth());
+  APValue = APValue.sext(T->getBitWidth());
   V = ConstantInt::get(T, APValue);
 
   isl_ast_expr_free(Expr);
Index: llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
===
--- llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
+++ llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
@@ -424,7 +424,7 @@
   raw_string_ostream SS(Case);
   // Resize the scratch buffer.
   if (BitWidth && !VLI.isFixedValueOnly())
-SS.indent(6) << "Scratch = Scratch.zextOrSelf(" << BitWidth << ");\n";
+SS.indent(6) << "Scratch = Scratch.zext(" << BitWidth << ");\n";
   // Populate based value.
   SS.indent(6) << "Inst = getInstBits(opcode);\n";
 
Index: llvm/test/TableGen/VarLenEncoder.td
===
--- llvm/test/TableGen/VarLenEncoder.td
+++ llvm/test/TableGen/VarLenEncoder.td
@@ -65,7 +65,7 @@
 // CHECK: UINT64_C(46848), // FOO32
 
 // CHECK-LABEL: case ::FOO16: {
-// CHECK: Scratch = Scratch.zextOrSelf(41);
+// CHECK: Scratch = Scratch.zext(41);
 // src.reg
 // CHECK: getMachineOpValue(MI, MI.getOperand(1), /*Pos=*/0, Scratch, Fixups, STI);
 // CHECK: Inst.insertBits(Scratch.extractBits(8, 0), 0);
@@ -83,7 +83,7 @@
 // CHECK: Inst.insertBits(Scratch.extractBits(2, 0), 39);
 
 // CHECK-LABEL: case ::FOO32: {
-// CHECK: Scratch = Scratch.zextOrSelf(57);
+// CHECK: Scratch = Scratch.zext(57);
 // src.reg
 // CHECK: getMachineOpValue(MI, MI.getOperand(1), /*Pos=*/0, Scratch, Fixups, STI);
 // CHECK: Inst.insertBits(Scratch.extractBits(8, 0), 0);
Index: llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
===
--- llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
+++ llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
@@ -496,7 +496,7 @@
   if (PtrDelta.urem(Stride) != 0)
 return false;
   unsigned IdxBitWidth = OpA->getType()->getScalarSizeInBits();
-  APInt IdxDiff = PtrDelta.udiv(Stride).zextOrSelf(IdxBitWidth);
+  APInt IdxDiff = PtrDelta.udiv(Stride).zext(IdxBitWidth);
 
   // Only look through a ZExt/SExt.
   if (!isa(OpA) && !isa(OpA))
Index: llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
===
--- llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
+++ llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
@@ -741,8 +741,7 @@
   // sdiv/srem is UB if divisor is -1 and divident is INT_MIN, so unless we can
   // prove that such a combination is impossible, we need to bump the bitwidth.
   if (CRs[1]->contains(APInt::getAllOnes(OrigWidth)) &&
-  CRs[0]->contains(
-  APInt::getSignedMinValue(MinSignedBits).sextOrSelf(OrigWidth)))
+

[PATCH] D125557: [APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf

2022-05-17 Thread Jay Foad via Phabricator via cfe-commits

foad added inline comments.



Comment at: llvm/lib/IR/ConstantRange.cpp:724
 auto BW = getBitWidth();
-APInt Min = APInt::getMinValue(BW).zextOrSelf(ResultBitWidth);
-APInt Max = APInt::getMaxValue(BW).zextOrSelf(ResultBitWidth);
+APInt Min = APInt::getMinValue(BW);
+APInt Max = APInt::getMaxValue(BW);

foad wrote:
> efriedma wrote:
> > efriedma wrote:
> > > Making the bitwidth of the result here not equal to ResultBitWidth seems 
> > > suspect.
> > > 
> > > I think there should just be an `if (ResultBitWidth < BW) return 
> > > getFull(ResultBitWidth);` here.  Then a simple conversion just works.
> > Actually, looking at D27294 again, maybe it is actually making the result 
> > bitwidth intentionally inflate like this.
> > 
> > This could use a comment explaining what it's doing, in any case.
> I agree it could use a comment but I don't feel qualified to write it - I am 
> just trying to preserve the current behaviour.
@efriedma do you have any objection to the patch as-is?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125557/new/

https://reviews.llvm.org/D125557

[PATCH] D125557: [APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf

2022-05-14 Thread Jay Foad via Phabricator via cfe-commits

foad marked 2 inline comments as done.
foad added inline comments.



Comment at: llvm/lib/Analysis/ConstantFolding.cpp:2884
 if (IntrinsicID == Intrinsic::smul_fix_sat) {
-  APInt Max = APInt::getSignedMaxValue(Width).sextOrSelf(ExtendedWidth);
-  APInt Min = APInt::getSignedMinValue(Width).sextOrSelf(ExtendedWidth);
+  APInt Max = APInt::getSignedMaxValue(Width).sext(ExtendedWidth);
+  APInt Min = APInt::getSignedMinValue(Width).sext(ExtendedWidth);

lattner wrote:
> I think this can be a zext given the top bit will be zero
Sure the first one could be zext, but the second one can't be, so it feels 
conceptually simpler (to me) to keep them both as sext.
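
To make the zext/sext point concrete, here is a sketch with `Width = 8` and `ExtendedWidth = 16`. It uses fixed-size integers rather than `APInt`, an assumption made only to keep the example self-contained. The signed maximum 0x7F has its top bit clear, so both extensions give 0x007F. The signed minimum 0x80 keeps its value of -128 only under sign extension.

```cpp
#include <cassert>
#include <cstdint>

// Sign extension: view the 8 bits as signed, widen, then view the 16-bit
// result as unsigned bits again.
std::uint16_t sext8to16(std::uint8_t V) {
  return static_cast<std::uint16_t>(
      static_cast<std::int16_t>(static_cast<std::int8_t>(V)));
}

// Zero extension: widen the unsigned bits, filling the new bits with zeros.
std::uint16_t zext8to16(std::uint8_t V) { return static_cast<std::uint16_t>(V); }

// 8-bit equivalents of getSignedMaxValue(8) and getSignedMinValue(8).
const std::uint8_t SignedMax8 = 0x7F; // top bit clear
const std::uint8_t SignedMin8 = 0x80; // top bit set
```
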



Comment at: llvm/lib/IR/ConstantRange.cpp:724
 auto BW = getBitWidth();
-APInt Min = APInt::getMinValue(BW).zextOrSelf(ResultBitWidth);
-APInt Max = APInt::getMaxValue(BW).zextOrSelf(ResultBitWidth);
+APInt Min = APInt::getMinValue(BW);
+APInt Max = APInt::getMaxValue(BW);

efriedma wrote:
> efriedma wrote:
> > Making the bitwidth of the result here not equal to ResultBitWidth seems 
> > suspect.
> > 
> > I think there should just be an `if (ResultBitWidth < BW) return 
> > getFull(ResultBitWidth);` here.  Then a simple conversion just works.
> Actually, looking at D27294 again, maybe it is actually making the result 
> bitwidth intentionally inflate like this.
> 
> This could use a comment explaining what it's doing, in any case.
I agree it could use a comment but I don't feel qualified to write it - I am 
just trying to preserve the current behaviour.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125557/new/

https://reviews.llvm.org/D125557

[PATCH] D125557: [APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf

2022-05-14 Thread Jay Foad via Phabricator via cfe-commits

foad updated this revision to Diff 429466.
foad added a comment.

Address some review comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125557/new/

https://reviews.llvm.org/D125557

Files:
  clang/lib/AST/ExprConstant.cpp
  clang/lib/AST/MicrosoftMangle.cpp
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/Sema/SemaDecl.cpp
  clang/lib/StaticAnalyzer/Core/LoopUnrolling.cpp
  llvm/lib/Analysis/BasicAliasAnalysis.cpp
  llvm/lib/Analysis/ConstantFolding.cpp
  llvm/lib/Analysis/LazyValueInfo.cpp
  llvm/lib/Analysis/MemoryBuiltins.cpp
  llvm/lib/Analysis/ScalarEvolution.cpp
  llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
  llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
  llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
  llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
  llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
  llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
  llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
  llvm/lib/IR/ConstantRange.cpp
  llvm/lib/Support/APFixedPoint.cpp
  llvm/lib/Support/APInt.cpp
  llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp
  llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
  llvm/lib/Target/Hexagon/HexagonConstPropagation.cpp
  llvm/lib/Target/RISCV/RISCVISelLowering.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86TargetTransformInfo.cpp
  llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
  llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
  llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
  llvm/test/TableGen/VarLenEncoder.td
  llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
  polly/lib/CodeGen/IslExprBuilder.cpp

Index: polly/lib/CodeGen/IslExprBuilder.cpp
===
--- polly/lib/CodeGen/IslExprBuilder.cpp
+++ polly/lib/CodeGen/IslExprBuilder.cpp
@@ -765,7 +765,7 @@
   else
 T = Builder.getIntNTy(BitWidth);
 
-  APValue = APValue.sextOrSelf(T->getBitWidth());
+  APValue = APValue.sext(T->getBitWidth());
   V = ConstantInt::get(T, APValue);
 
   isl_ast_expr_free(Expr);
Index: llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
===
--- llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
+++ llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
@@ -424,7 +424,7 @@
   raw_string_ostream SS(Case);
   // Resize the scratch buffer.
   if (BitWidth && !VLI.isFixedValueOnly())
-SS.indent(6) << "Scratch = Scratch.zextOrSelf(" << BitWidth << ");\n";
+SS.indent(6) << "Scratch = Scratch.zext(" << BitWidth << ");\n";
   // Populate based value.
   SS.indent(6) << "Inst = getInstBits(opcode);\n";
 
Index: llvm/test/TableGen/VarLenEncoder.td
===
--- llvm/test/TableGen/VarLenEncoder.td
+++ llvm/test/TableGen/VarLenEncoder.td
@@ -65,7 +65,7 @@
 // CHECK: UINT64_C(46848), // FOO32
 
 // CHECK-LABEL: case ::FOO16: {
-// CHECK: Scratch = Scratch.zextOrSelf(41);
+// CHECK: Scratch = Scratch.zext(41);
 // src.reg
 // CHECK: getMachineOpValue(MI, MI.getOperand(1), /*Pos=*/0, Scratch, Fixups, STI);
 // CHECK: Inst.insertBits(Scratch.extractBits(8, 0), 0);
@@ -83,7 +83,7 @@
 // CHECK: Inst.insertBits(Scratch.extractBits(2, 0), 39);
 
 // CHECK-LABEL: case ::FOO32: {
-// CHECK: Scratch = Scratch.zextOrSelf(57);
+// CHECK: Scratch = Scratch.zext(57);
 // src.reg
 // CHECK: getMachineOpValue(MI, MI.getOperand(1), /*Pos=*/0, Scratch, Fixups, STI);
 // CHECK: Inst.insertBits(Scratch.extractBits(8, 0), 0);
Index: llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
===
--- llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
+++ llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
@@ -496,7 +496,7 @@
   if (PtrDelta.urem(Stride) != 0)
 return false;
   unsigned IdxBitWidth = OpA->getType()->getScalarSizeInBits();
-  APInt IdxDiff = PtrDelta.udiv(Stride).zextOrSelf(IdxBitWidth);
+  APInt IdxDiff = PtrDelta.udiv(Stride).zext(IdxBitWidth);
 
   // Only look through a ZExt/SExt.
   if (!isa(OpA) && !isa(OpA))
Index: llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
===
--- llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
+++ llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
@@ -741,8 +741,7 @@
   // sdiv/srem is UB if divisor is -1 and divident is INT_MIN, so unless we can
   // prove that such a combination is impossible, we need to bump the bitwidth.
   if (CRs[1]->contains(APInt::getAllOnes(OrigWidth)) &&
-  CRs[0]->contains(
-  APInt::getSignedMinValue(MinSignedBits).sextOrSelf(OrigWidth)))
+  CRs[0]->contains(APInt::getSignedMinValue(MinSignedBits).sext(OrigWidth)))
 ++MinSignedBits;
 
   // Don't shrink below 8 bits wide.
Index:

[PATCH] D125557: [APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf

2022-05-13 Thread Jay Foad via Phabricator via cfe-commits

foad created this revision.
foad added reviewers: lattner, RKSimon, lebedev.ri, spatel.
Herald added subscribers: kosarev, jsilvanus, hsmhsm, jeroen.dobbelaere, 
frasercrmck, ecnelises, martong, kerbowa, luismarques, apazos, sameer.abuasal, 
pengfei, s.egerton, Jim, jocewei, PkmX, the_o, brucehoult, MartinMosbeck, 
rogfer01, edward-jones, zzheng, jrtc27, niosHD, sabuasal, simoncook, johnrusso, 
rbar, asb, hiraditya, arichardson, nhaehnle, jvesely, arsenm.
Herald added a reviewer: bollu.
Herald added a project: All.
foad requested review of this revision.
Herald added subscribers: llvm-commits, cfe-commits, pcwang-thead, MaskRay.
Herald added projects: clang, LLVM.

Most clients only used these methods because they wanted to be able to
extend or truncate to the same bit width (which is a no-op). Now that
the standard zext, sext and trunc allow this, there is no reason to use
the OrSelf versions.

The OrSelf versions additionally have the strange behaviour of allowing
extending to a *smaller* width, or truncating to a *larger* width, which
are also treated as no-ops. A small amount of client code relied on this
(ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and
needed rewriting.
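
The sketch below contrasts the two behaviours using a hypothetical `MiniAPInt` type, a simplified stand-in rather than `llvm::APInt`. The standard `zext` accepts an equal width as a no-op but rejects a smaller one. The `zextOrSelf`-style helper instead returns the value unchanged when asked for a smaller width, so the result ends up wider than requested. Client code that depended on that silent no-op is the kind that needed rewriting.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified fixed-width integer (at most 64 bits).
// Only the low BitWidth bits of Val are used; higher bits are kept zero.
struct MiniAPInt {
  unsigned BitWidth;
  std::uint64_t Val;

  // Like zext after the change: an equal width is allowed (a no-op),
  // a smaller width is a programming error.
  MiniAPInt zext(unsigned Width) const {
    assert(Width >= BitWidth && Width <= 64 && "Invalid zext width!");
    return {Width, Val};
  }

  // Like the removed zextOrSelf: a width that is not larger is silently
  // ignored, so the result can be wider than the caller asked for.
  MiniAPInt zextOrSelf(unsigned Width) const {
    if (Width > BitWidth)
      return zext(Width);
    return *this;
  }
};
```
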


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D125557

Files:
  clang/lib/AST/ExprConstant.cpp
  clang/lib/AST/MicrosoftMangle.cpp
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/Sema/SemaDecl.cpp
  clang/lib/StaticAnalyzer/Core/LoopUnrolling.cpp
  llvm/lib/Analysis/BasicAliasAnalysis.cpp
  llvm/lib/Analysis/ConstantFolding.cpp
  llvm/lib/Analysis/LazyValueInfo.cpp
  llvm/lib/Analysis/MemoryBuiltins.cpp
  llvm/lib/Analysis/ScalarEvolution.cpp
  llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
  llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
  llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
  llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
  llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
  llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
  llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
  llvm/lib/IR/ConstantRange.cpp
  llvm/lib/Support/APFixedPoint.cpp
  llvm/lib/Support/APInt.cpp
  llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp
  llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
  llvm/lib/Target/Hexagon/HexagonConstPropagation.cpp
  llvm/lib/Target/RISCV/RISCVISelLowering.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86TargetTransformInfo.cpp
  llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
  llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
  llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
  llvm/test/TableGen/VarLenEncoder.td
  llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
  polly/lib/CodeGen/IslExprBuilder.cpp

Index: polly/lib/CodeGen/IslExprBuilder.cpp
===
--- polly/lib/CodeGen/IslExprBuilder.cpp
+++ polly/lib/CodeGen/IslExprBuilder.cpp
@@ -765,7 +765,7 @@
   else
 T = Builder.getIntNTy(BitWidth);
 
-  APValue = APValue.sextOrSelf(T->getBitWidth());
+  APValue = APValue.sext(T->getBitWidth());
   V = ConstantInt::get(T, APValue);
 
   isl_ast_expr_free(Expr);
Index: llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
===
--- llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
+++ llvm/utils/TableGen/VarLenCodeEmitterGen.cpp
@@ -424,7 +424,7 @@
   raw_string_ostream SS(Case);
   // Resize the scratch buffer.
   if (BitWidth && !VLI.isFixedValueOnly())
-SS.indent(6) << "Scratch = Scratch.zextOrSelf(" << BitWidth << ");\n";
+SS.indent(6) << "Scratch = Scratch.zext(" << BitWidth << ");\n";
   // Populate based value.
   SS.indent(6) << "Inst = getInstBits(opcode);\n";
 
Index: llvm/test/TableGen/VarLenEncoder.td
===
--- llvm/test/TableGen/VarLenEncoder.td
+++ llvm/test/TableGen/VarLenEncoder.td
@@ -65,7 +65,7 @@
 // CHECK: UINT64_C(46848), // FOO32
 
 // CHECK-LABEL: case ::FOO16: {
-// CHECK: Scratch = Scratch.zextOrSelf(41);
+// CHECK: Scratch = Scratch.zext(41);
 // src.reg
 // CHECK: getMachineOpValue(MI, MI.getOperand(1), /*Pos=*/0, Scratch, Fixups, STI);
 // CHECK: Inst.insertBits(Scratch.extractBits(8, 0), 0);
@@ -83,7 +83,7 @@
 // CHECK: Inst.insertBits(Scratch.extractBits(2, 0), 39);
 
 // CHECK-LABEL: case ::FOO32: {
-// CHECK: Scratch = Scratch.zextOrSelf(57);
+// CHECK: Scratch = Scratch.zext(57);
 // src.reg
 // CHECK: getMachineOpValue(MI, MI.getOperand(1), /*Pos=*/0, Scratch, Fixups, STI);
 // CHECK: Inst.insertBits(Scratch.extractBits(8, 0), 0);
Index: llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
===
--- llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
+++ llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
@@ -496,7 +496,7 @@
   if

[PATCH] D124536: [AMDGPU] Add gfx11 subtarget ELF definition

2022-04-28 Thread Jay Foad via Phabricator via cfe-commits

foad accepted this revision.
foad added a reviewer: t-tye.
foad added a comment.
This revision is now accepted and ready to land.

LGTM, thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124536/new/

https://reviews.llvm.org/D124536

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D114957: [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args

2021-12-07 Thread Jay Foad via Phabricator via cfe-commits

foad abandoned this revision.
foad added a comment.

Abandoned in favour of D115032.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114957/new/

https://reviews.llvm.org/D114957

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D115032: [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args

2021-12-04 Thread Jay Foad via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG2774bad11242: [AMDGPU] Change 
llvm.amdgcn.image.bvh.intersect.ray to take vec3 args (authored by foad).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115032/new/

https://reviews.llvm.org/D115032

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/test/CodeGenOpenCL/builtins-amdgcn-raytracing.cl
  llvm/include/llvm/IR/IntrinsicsAMDGPU.td
  llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.intersect_ray.ll
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.intersect_ray.ll

Index: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.intersect_ray.ll
===
--- llvm/test/CodeGen/AMDGPU/llvm.amdgcn.intersect_ray.ll
+++ llvm/test/CodeGen/AMDGPU/llvm.amdgcn.intersect_ray.ll
@@ -3,15 +3,15 @@
 ; RUN: llc -march=amdgcn -mcpu=gfx1030 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX1030 %s
 ; RUN: not --crash llc -march=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s 2>&1 | FileCheck -check-prefix=ERR %s
 
-; uint4 llvm.amdgcn.image.bvh.intersect.ray.i32.v4f32(uint node_ptr, float ray_extent, float4 ray_origin, float4 ray_dir, float4 ray_inv_dir, uint4 texture_descr)
-; uint4 llvm.amdgcn.image.bvh.intersect.ray.i32.v4f16(uint node_ptr, float ray_extent, float4 ray_origin, half4 ray_dir, half4 ray_inv_dir, uint4 texture_descr)
-; uint4 llvm.amdgcn.image.bvh.intersect.ray.i64.v4f32(ulong node_ptr, float ray_extent, float4 ray_origin, float4 ray_dir, float4 ray_inv_dir, uint4 texture_descr)
-; uint4 llvm.amdgcn.image.bvh.intersect.ray.i64.v4f16(ulong node_ptr, float ray_extent, float4 ray_origin, half4 ray_dir, half4 ray_inv_dir, uint4 texture_descr)
+; uint4 llvm.amdgcn.image.bvh.intersect.ray.i32.v4f32(uint node_ptr, float ray_extent, float3 ray_origin, float3 ray_dir, float3 ray_inv_dir, uint4 texture_descr)
+; uint4 llvm.amdgcn.image.bvh.intersect.ray.i32.v4f16(uint node_ptr, float ray_extent, float3 ray_origin, half3 ray_dir, half3 ray_inv_dir, uint4 texture_descr)
+; uint4 llvm.amdgcn.image.bvh.intersect.ray.i64.v4f32(ulong node_ptr, float ray_extent, float3 ray_origin, float3 ray_dir, float3 ray_inv_dir, uint4 texture_descr)
+; uint4 llvm.amdgcn.image.bvh.intersect.ray.i64.v4f16(ulong node_ptr, float ray_extent, float3 ray_origin, half3 ray_dir, half3 ray_inv_dir, uint4 texture_descr)
 
-declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i32.v4f32(i32, float, <4 x float>, <4 x float>, <4 x float>, <4 x i32>)
-declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i32.v4f16(i32, float, <4 x float>, <4 x half>, <4 x half>, <4 x i32>)
-declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i64.v4f32(i64, float, <4 x float>, <4 x float>, <4 x float>, <4 x i32>)
-declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i64.v4f16(i64, float, <4 x float>, <4 x half>, <4 x half>, <4 x i32>)
+declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i32.v4f32(i32, float, <3 x float>, <3 x float>, <3 x float>, <4 x i32>)
+declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i32.v4f16(i32, float, <3 x float>, <3 x half>, <3 x half>, <4 x i32>)
+declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i64.v4f32(i64, float, <3 x float>, <3 x float>, <3 x float>, <4 x i32>)
+declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i64.v4f16(i64, float, <3 x float>, <3 x half>, <3 x half>, <4 x i32>)
 
 ; ERR: in function image_bvh_intersect_ray{{.*}}intrinsic not supported on subtarget
 ; Arguments are flattened to represent the actual VGPR_A layout, so we have no
@@ -23,43 +23,43 @@
 ; GCN-NEXT:s_waitcnt vmcnt(0)
 ; GCN-NEXT:; return to shader part epilog
 main_body:
-  %ray_origin0 = insertelement <4 x float> undef, float %ray_origin_x, i32 0
-  %ray_origin1 = insertelement <4 x float> %ray_origin0, float %ray_origin_y, i32 1
-  %ray_origin = insertelement <4 x float> %ray_origin1, float %ray_origin_z, i32 2
-  %ray_dir0 = insertelement <4 x float> undef, float %ray_dir_x, i32 0
-  %ray_dir1 = insertelement <4 x float> %ray_dir0, float %ray_dir_y, i32 1
-  %ray_dir = insertelement <4 x float> %ray_dir1, float %ray_dir_z, i32 2
-  %ray_inv_dir0 = insertelement <4 x float> undef, float %ray_inv_dir_x, i32 0
-  %ray_inv_dir1 = insertelement <4 x float> %ray_inv_dir0, float %ray_inv_dir_y, i32 1
-  %ray_inv_dir = insertelement <4 x float> %ray_inv_dir1, float %ray_inv_dir_z, i32 2
-  %v = call <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i32.v4f32(i32 %node_ptr, float %ray_extent, <4 x float> %ray_origin, <4 x float> %ray_dir, <4 x float> %ray_inv_dir, <4 x i32> %tdescr)
+  %ray_origin0 = insertelement <3 x float> undef, float %ray_origin_x, i32 0
+  %ray_origin1 = insertelement <3 x float> %ray_origin0, float %ray_origin_y, i32 1
+  %ray_origin = insertelement <3 x float> %ray_origin1,

[PATCH] D114957: [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args

2021-12-03 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

In D114957#3167700, @arsenm wrote:

> I think this macro is purely terrible and should not be added (and at least 
> should be all caps?). If we can't just hard break users, I would rather just 
> leave the builtin signatures broken

OK, how about D115032?

Personally I have no opinion about what's best to do with the OpenCL builtins, 
but I would like to make progress with changing the intrinsics. So I have a 
slight preference for D115032 because it
gives me a way forward without changing OpenCL behaviour. The OpenCL team can 
then decide whether or not to update the builtins at their leisure.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114957/new/

https://reviews.llvm.org/D114957

[PATCH] D115032: [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args

2021-12-03 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

This is an alternative to D114957 that does
not update the API of the OpenCL builtins.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115032/new/

https://reviews.llvm.org/D115032

[PATCH] D115032: [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args

2021-12-03 Thread Jay Foad via Phabricator via cfe-commits

foad created this revision.
foad added reviewers: arsenm, rampitec, yaxunl, critson, b-sumner.
Herald added subscribers: kerbowa, hiraditya, t-tye, Anastasia, tpr, dstuttard, 
nhaehnle, jvesely, kzhuravl.
foad requested review of this revision.
Herald added subscribers: llvm-commits, cfe-commits, wdng.
Herald added projects: clang, LLVM.

The ray_origin, ray_dir and ray_inv_dir arguments should all be vec3 to
match how the hardware instruction works.

Don't change the API of the corresponding OpenCL builtins.
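
The sketch below shows what keeping the builtin API means for the front end. It uses hypothetical `Float4`/`Float3` structs in place of OpenCL vector types. The builtin still receives a four-element vector, and only the x, y and z lanes are passed on to the vec3 intrinsic. The old tests only ever filled lanes 0 to 2. In IR this would presumably be a `shufflevector` that selects lanes 0, 1 and 2, but that detail is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-ins for OpenCL float4 / float3.
struct Float4 { float X, Y, Z, W; };
struct Float3 { float X, Y, Z; };

// The builtin keeps its float4 parameter; the hardware instruction only
// consumes x, y and z, so the w lane is dropped when forming the vec3
// argument for the intrinsic.
Float3 toVec3(const Float4 &V) { return {V.X, V.Y, V.Z}; }
```
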


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D115032

Files:
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/test/CodeGenOpenCL/builtins-amdgcn-raytracing.cl
  llvm/include/llvm/IR/IntrinsicsAMDGPU.td
  llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.intersect_ray.ll
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.intersect_ray.ll

Index: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.intersect_ray.ll
===
--- llvm/test/CodeGen/AMDGPU/llvm.amdgcn.intersect_ray.ll
+++ llvm/test/CodeGen/AMDGPU/llvm.amdgcn.intersect_ray.ll
@@ -3,15 +3,15 @@
 ; RUN: llc -march=amdgcn -mcpu=gfx1030 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX1030 %s
 ; RUN: not --crash llc -march=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s 2>&1 | FileCheck -check-prefix=ERR %s
 
-; uint4 llvm.amdgcn.image.bvh.intersect.ray.i32.v4f32(uint node_ptr, float ray_extent, float4 ray_origin, float4 ray_dir, float4 ray_inv_dir, uint4 texture_descr)
-; uint4 llvm.amdgcn.image.bvh.intersect.ray.i32.v4f16(uint node_ptr, float ray_extent, float4 ray_origin, half4 ray_dir, half4 ray_inv_dir, uint4 texture_descr)
-; uint4 llvm.amdgcn.image.bvh.intersect.ray.i64.v4f32(ulong node_ptr, float ray_extent, float4 ray_origin, float4 ray_dir, float4 ray_inv_dir, uint4 texture_descr)
-; uint4 llvm.amdgcn.image.bvh.intersect.ray.i64.v4f16(ulong node_ptr, float ray_extent, float4 ray_origin, half4 ray_dir, half4 ray_inv_dir, uint4 texture_descr)
+; uint4 llvm.amdgcn.image.bvh.intersect.ray.i32.v4f32(uint node_ptr, float ray_extent, float3 ray_origin, float3 ray_dir, float3 ray_inv_dir, uint4 texture_descr)
+; uint4 llvm.amdgcn.image.bvh.intersect.ray.i32.v4f16(uint node_ptr, float ray_extent, float3 ray_origin, half3 ray_dir, half3 ray_inv_dir, uint4 texture_descr)
+; uint4 llvm.amdgcn.image.bvh.intersect.ray.i64.v4f32(ulong node_ptr, float ray_extent, float3 ray_origin, float3 ray_dir, float3 ray_inv_dir, uint4 texture_descr)
+; uint4 llvm.amdgcn.image.bvh.intersect.ray.i64.v4f16(ulong node_ptr, float ray_extent, float3 ray_origin, half3 ray_dir, half3 ray_inv_dir, uint4 texture_descr)
 
-declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i32.v4f32(i32, float, <4 x float>, <4 x float>, <4 x float>, <4 x i32>)
-declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i32.v4f16(i32, float, <4 x float>, <4 x half>, <4 x half>, <4 x i32>)
-declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i64.v4f32(i64, float, <4 x float>, <4 x float>, <4 x float>, <4 x i32>)
-declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i64.v4f16(i64, float, <4 x float>, <4 x half>, <4 x half>, <4 x i32>)
+declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i32.v4f32(i32, float, <3 x float>, <3 x float>, <3 x float>, <4 x i32>)
+declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i32.v4f16(i32, float, <3 x float>, <3 x half>, <3 x half>, <4 x i32>)
+declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i64.v4f32(i64, float, <3 x float>, <3 x float>, <3 x float>, <4 x i32>)
+declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i64.v4f16(i64, float, <3 x float>, <3 x half>, <3 x half>, <4 x i32>)
 
 ; ERR: in function image_bvh_intersect_ray{{.*}}intrinsic not supported on subtarget
 ; Arguments are flattened to represent the actual VGPR_A layout, so we have no
@@ -23,43 +23,43 @@
 ; GCN-NEXT:s_waitcnt vmcnt(0)
 ; GCN-NEXT:; return to shader part epilog
 main_body:
-  %ray_origin0 = insertelement <4 x float> undef, float %ray_origin_x, i32 0
-  %ray_origin1 = insertelement <4 x float> %ray_origin0, float %ray_origin_y, i32 1
-  %ray_origin = insertelement <4 x float> %ray_origin1, float %ray_origin_z, i32 2
-  %ray_dir0 = insertelement <4 x float> undef, float %ray_dir_x, i32 0
-  %ray_dir1 = insertelement <4 x float> %ray_dir0, float %ray_dir_y, i32 1
-  %ray_dir = insertelement <4 x float> %ray_dir1, float %ray_dir_z, i32 2
-  %ray_inv_dir0 = insertelement <4 x float> undef, float %ray_inv_dir_x, i32 0
-  %ray_inv_dir1 = insertelement <4 x float> %ray_inv_dir0, float %ray_inv_dir_y, i32 1
-  %ray_inv_dir = insertelement <4 x float> %ray_inv_dir1, float %ray_inv_dir_z, i32 2
-  %v = call <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i32.v4f32(i32 %node_ptr, float %ray_extent, <4 x float> %ray_origin, <4 x float> %ray_dir, <4 x float> %ray_inv_dir, <4 x i32> %tdescr)
+  %ray_origin0 =

[PATCH] D114957: [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args

2021-12-02 Thread Jay Foad via Phabricator via cfe-commits

foad updated this revision to Diff 391403.
foad added a comment.

Define __amdgcn_bvh_use_vec3__.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114957/new/

https://reviews.llvm.org/D114957

Files:
  clang/include/clang/Basic/BuiltinsAMDGPU.def
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/test/CodeGenOpenCL/builtins-amdgcn-raytracing.cl
  clang/test/Preprocessor/predefined-macros.c
  llvm/include/llvm/IR/IntrinsicsAMDGPU.td
  llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.intersect_ray.ll
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.intersect_ray.ll

Index: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.intersect_ray.ll
===
--- llvm/test/CodeGen/AMDGPU/llvm.amdgcn.intersect_ray.ll
+++ llvm/test/CodeGen/AMDGPU/llvm.amdgcn.intersect_ray.ll
@@ -3,15 +3,15 @@
 ; RUN: llc -march=amdgcn -mcpu=gfx1030 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX1030 %s
 ; RUN: not --crash llc -march=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s 2>&1 | FileCheck -check-prefix=ERR %s
 
-; uint4 llvm.amdgcn.image.bvh.intersect.ray.i32.v4f32(uint node_ptr, float ray_extent, float4 ray_origin, float4 ray_dir, float4 ray_inv_dir, uint4 texture_descr)
-; uint4 llvm.amdgcn.image.bvh.intersect.ray.i32.v4f16(uint node_ptr, float ray_extent, float4 ray_origin, half4 ray_dir, half4 ray_inv_dir, uint4 texture_descr)
-; uint4 llvm.amdgcn.image.bvh.intersect.ray.i64.v4f32(ulong node_ptr, float ray_extent, float4 ray_origin, float4 ray_dir, float4 ray_inv_dir, uint4 texture_descr)
-; uint4 llvm.amdgcn.image.bvh.intersect.ray.i64.v4f16(ulong node_ptr, float ray_extent, float4 ray_origin, half4 ray_dir, half4 ray_inv_dir, uint4 texture_descr)
+; uint4 llvm.amdgcn.image.bvh.intersect.ray.i32.v4f32(uint node_ptr, float ray_extent, float3 ray_origin, float3 ray_dir, float3 ray_inv_dir, uint4 texture_descr)
+; uint4 llvm.amdgcn.image.bvh.intersect.ray.i32.v4f16(uint node_ptr, float ray_extent, float3 ray_origin, half3 ray_dir, half3 ray_inv_dir, uint4 texture_descr)
+; uint4 llvm.amdgcn.image.bvh.intersect.ray.i64.v4f32(ulong node_ptr, float ray_extent, float3 ray_origin, float3 ray_dir, float3 ray_inv_dir, uint4 texture_descr)
+; uint4 llvm.amdgcn.image.bvh.intersect.ray.i64.v4f16(ulong node_ptr, float ray_extent, float3 ray_origin, half3 ray_dir, half3 ray_inv_dir, uint4 texture_descr)
 
-declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i32.v4f32(i32, float, <4 x float>, <4 x float>, <4 x float>, <4 x i32>)
-declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i32.v4f16(i32, float, <4 x float>, <4 x half>, <4 x half>, <4 x i32>)
-declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i64.v4f32(i64, float, <4 x float>, <4 x float>, <4 x float>, <4 x i32>)
-declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i64.v4f16(i64, float, <4 x float>, <4 x half>, <4 x half>, <4 x i32>)
+declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i32.v4f32(i32, float, <3 x float>, <3 x float>, <3 x float>, <4 x i32>)
+declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i32.v4f16(i32, float, <3 x float>, <3 x half>, <3 x half>, <4 x i32>)
+declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i64.v4f32(i64, float, <3 x float>, <3 x float>, <3 x float>, <4 x i32>)
+declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i64.v4f16(i64, float, <3 x float>, <3 x half>, <3 x half>, <4 x i32>)
 
 ; ERR: in function image_bvh_intersect_ray{{.*}}intrinsic not supported on subtarget
 ; Arguments are flattened to represent the actual VGPR_A layout, so we have no
@@ -23,43 +23,43 @@
 ; GCN-NEXT:s_waitcnt vmcnt(0)
 ; GCN-NEXT:; return to shader part epilog
 main_body:
-  %ray_origin0 = insertelement <4 x float> undef, float %ray_origin_x, i32 0
-  %ray_origin1 = insertelement <4 x float> %ray_origin0, float %ray_origin_y, i32 1
-  %ray_origin = insertelement <4 x float> %ray_origin1, float %ray_origin_z, i32 2
-  %ray_dir0 = insertelement <4 x float> undef, float %ray_dir_x, i32 0
-  %ray_dir1 = insertelement <4 x float> %ray_dir0, float %ray_dir_y, i32 1
-  %ray_dir = insertelement <4 x float> %ray_dir1, float %ray_dir_z, i32 2
-  %ray_inv_dir0 = insertelement <4 x float> undef, float %ray_inv_dir_x, i32 0
-  %ray_inv_dir1 = insertelement <4 x float> %ray_inv_dir0, float %ray_inv_dir_y, i32 1
-  %ray_inv_dir = insertelement <4 x float> %ray_inv_dir1, float %ray_inv_dir_z, i32 2
-  %v = call <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i32.v4f32(i32 %node_ptr, float %ray_extent, <4 x float> %ray_origin, <4 x float> %ray_dir, <4 x float> %ray_inv_dir, <4 x i32> %tdescr)
+  %ray_origin0 = insertelement <3 x float> undef, float %ray_origin_x, i32 0
+  %ray_origin1 = insertelement <3 x float> %ray_origin0, float %ray_origin_y, i32 1
+  %ray_origin = insertelement <3 x float> %ray_origin1, float %ray_origin_z, i32 2
+  %ray_dir0 = insertelement <3 x

[PATCH] D114957: [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args

2021-12-02 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

In D114957#3166948, @b-sumner wrote:

> In D114957#3166936, @foad wrote:
>
>> In D114957#3166858, @yaxunl wrote:
>>
>>> In D114957#3166817, @foad wrote:
>>>
>>>> This is a flag-day change to the signatures of the LLVM intrinsics and the
>>>> OpenCL builtins. Is that OK?
>>>
>>> This breaks users' code. If we have to do this, at least let clang emit a 
>>> pre-defined macro e.g. `__amdgcn_bvh_use_vec3__`=1 so that users can make 
>>> their code work before and after the change.
>>
>> I don't know anything about OpenCL macros. Is it good enough to put this in 
>> `AMDGPUTargetInfo::getTargetDefines`:
>>
>>   if (Opts.OpenCL)
>>     Builder.defineMacro("__amdgcn_bvh_use_vec3__");
>>
>> Does it need tests, documentation, etc?
>
> But how long would that be carried?  And then deprecated?

Then do you think the patch is OK as-is?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114957/new/

https://reviews.llvm.org/D114957

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D114957: [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args

2021-12-02 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

In D114957#3166858, @yaxunl wrote:

> In D114957#3166817, @foad wrote:
>
>> This is a flag-day change to the signatures of the LLVM intrinsics and the 
>> OpenCL builtins. Is that OK?
>
> This breaks users' code. If we have to do this, at least let clang emit a 
> pre-defined macro e.g. `__amdgcn_bvh_use_vec3__`=1 so that users can make 
> their code work before and after the change.

I don't know anything about OpenCL macros. Is it good enough to put this in 
`AMDGPUTargetInfo::getTargetDefines`:

  if (Opts.OpenCL)
    Builder.defineMacro("__amdgcn_bvh_use_vec3__");

Does it need tests, documentation, etc?
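yaxunl's suggestion amounts to letting source code branch on the macro. Below is a minimal sketch of such a shim, written in C++ rather than OpenCL so that it is self-contained: std::array stands in for float3/float4, and kRayLanes, RayVec and makeRayVec are hypothetical names, not part of any real header. It only assumes that the macro is defined when the new vec3 signatures are in effect.

```cpp
#include <array>
#include <cstddef>

// Hypothetical user-side shim keyed off the proposed predefined macro.
// std::array<float, N> stands in for OpenCL's float3/float4 vector types.
#ifdef __amdgcn_bvh_use_vec3__
constexpr std::size_t kRayLanes = 3;  // new signatures: 3-wide ray vectors
#else
constexpr std::size_t kRayLanes = 4;  // old signatures: 4-wide ray vectors
#endif

using RayVec = std::array<float, kRayLanes>;

// Build a ray_origin / ray_dir / ray_inv_dir value from x, y, z. Under the old
// 4-wide layout the fourth lane is unused and left as 0.0f.
inline RayVec makeRayVec(float X, float Y, float Z) {
  RayVec V{};  // value-initialize so any padding lane is zero
  V[0] = X;
  V[1] = Y;
  V[2] = Z;
  return V;
}
```

Code written this way compiles against either version of the builtins, which is the point of the macro; the open question in the thread is how long such a macro would need to be carried.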


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114957/new/

https://reviews.llvm.org/D114957

[PATCH] D114957: [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args

2021-12-02 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

This is a flag-day change to the signatures of the LLVM intrinsics and the 
OpenCL builtins. Is that OK?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114957/new/

https://reviews.llvm.org/D114957

[PATCH] D114957: [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args

2021-12-02 Thread Jay Foad via Phabricator via cfe-commits

foad created this revision.
foad added reviewers: arsenm, rampitec, critson, yaxunl, b-sumner.
Herald added subscribers: kerbowa, hiraditya, t-tye, tpr, dstuttard, nhaehnle, 
jvesely, kzhuravl.
foad requested review of this revision.
Herald added subscribers: llvm-commits, cfe-commits, wdng.
Herald added projects: clang, LLVM.

The ray_origin, ray_dir and ray_inv_dir arguments should all be vec3 to
match how the hardware instruction works.
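To put the change in concrete terms, here is a back-of-the-envelope count of how many 32-bit words the three vector operands occupy with the old 4-wide types versus the new 3-wide ones. Per the signatures in the test, ray_origin is always float, while ray_dir and ray_inv_dir are float or half. This is plain byte arithmetic and deliberately ignores how the hardware instruction packs operands into VGPRs; the function and constant names are hypothetical.

```cpp
#include <cstddef>

// Bytes per element: ray_origin is always float; ray_dir and ray_inv_dir are
// float (v4f32 variants) or half (v4f16 variants).
constexpr std::size_t kFloatBytes = 4;
constexpr std::size_t kHalfBytes = 2;
constexpr std::size_t kDwordBytes = 4;

// 32-bit words needed to hold ray_origin, ray_dir and ray_inv_dir when each is
// a vector of `Lanes` elements, rounded up to whole dwords. This does not
// model the instruction's actual register packing.
constexpr std::size_t rayOperandDwords(std::size_t Lanes,
                                       std::size_t DirElemBytes) {
  return (Lanes * kFloatBytes + 2 * Lanes * DirElemBytes + kDwordBytes - 1) /
         kDwordBytes;
}

static_assert(rayOperandDwords(4, kFloatBytes) == 12, "old <4 x float> operands");
static_assert(rayOperandDwords(3, kFloatBytes) == 9, "new <3 x float> operands");
```

Each 4-wide operand carried one unused lane, so dropping it removes 3 dwords of padding in the float variants and 2 in the half variants.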


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D114957

Files:
  clang/include/clang/Basic/BuiltinsAMDGPU.def
  clang/test/CodeGenOpenCL/builtins-amdgcn-raytracing.cl
  llvm/include/llvm/IR/IntrinsicsAMDGPU.td
  llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.intersect_ray.ll
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.intersect_ray.ll

Index: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.intersect_ray.ll
===
--- llvm/test/CodeGen/AMDGPU/llvm.amdgcn.intersect_ray.ll
+++ llvm/test/CodeGen/AMDGPU/llvm.amdgcn.intersect_ray.ll
@@ -3,15 +3,15 @@
 ; RUN: llc -march=amdgcn -mcpu=gfx1030 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX1030 %s
 ; RUN: not --crash llc -march=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s 2>&1 | FileCheck -check-prefix=ERR %s
 
-; uint4 llvm.amdgcn.image.bvh.intersect.ray.i32.v4f32(uint node_ptr, float ray_extent, float4 ray_origin, float4 ray_dir, float4 ray_inv_dir, uint4 texture_descr)
-; uint4 llvm.amdgcn.image.bvh.intersect.ray.i32.v4f16(uint node_ptr, float ray_extent, float4 ray_origin, half4 ray_dir, half4 ray_inv_dir, uint4 texture_descr)
-; uint4 llvm.amdgcn.image.bvh.intersect.ray.i64.v4f32(ulong node_ptr, float ray_extent, float4 ray_origin, float4 ray_dir, float4 ray_inv_dir, uint4 texture_descr)
-; uint4 llvm.amdgcn.image.bvh.intersect.ray.i64.v4f16(ulong node_ptr, float ray_extent, float4 ray_origin, half4 ray_dir, half4 ray_inv_dir, uint4 texture_descr)
+; uint4 llvm.amdgcn.image.bvh.intersect.ray.i32.v4f32(uint node_ptr, float ray_extent, float3 ray_origin, float3 ray_dir, float3 ray_inv_dir, uint4 texture_descr)
+; uint4 llvm.amdgcn.image.bvh.intersect.ray.i32.v4f16(uint node_ptr, float ray_extent, float3 ray_origin, half3 ray_dir, half3 ray_inv_dir, uint4 texture_descr)
+; uint4 llvm.amdgcn.image.bvh.intersect.ray.i64.v4f32(ulong node_ptr, float ray_extent, float3 ray_origin, float3 ray_dir, float3 ray_inv_dir, uint4 texture_descr)
+; uint4 llvm.amdgcn.image.bvh.intersect.ray.i64.v4f16(ulong node_ptr, float ray_extent, float3 ray_origin, half3 ray_dir, half3 ray_inv_dir, uint4 texture_descr)
 
-declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i32.v4f32(i32, float, <4 x float>, <4 x float>, <4 x float>, <4 x i32>)
-declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i32.v4f16(i32, float, <4 x float>, <4 x half>, <4 x half>, <4 x i32>)
-declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i64.v4f32(i64, float, <4 x float>, <4 x float>, <4 x float>, <4 x i32>)
-declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i64.v4f16(i64, float, <4 x float>, <4 x half>, <4 x half>, <4 x i32>)
+declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i32.v4f32(i32, float, <3 x float>, <3 x float>, <3 x float>, <4 x i32>)
+declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i32.v4f16(i32, float, <3 x float>, <3 x half>, <3 x half>, <4 x i32>)
+declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i64.v4f32(i64, float, <3 x float>, <3 x float>, <3 x float>, <4 x i32>)
+declare <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i64.v4f16(i64, float, <3 x float>, <3 x half>, <3 x half>, <4 x i32>)
 
 ; ERR: in function image_bvh_intersect_ray{{.*}}intrinsic not supported on subtarget
 ; Arguments are flattened to represent the actual VGPR_A layout, so we have no
@@ -23,43 +23,43 @@
 ; GCN-NEXT:s_waitcnt vmcnt(0)
 ; GCN-NEXT:; return to shader part epilog
 main_body:
-  %ray_origin0 = insertelement <4 x float> undef, float %ray_origin_x, i32 0
-  %ray_origin1 = insertelement <4 x float> %ray_origin0, float %ray_origin_y, i32 1
-  %ray_origin = insertelement <4 x float> %ray_origin1, float %ray_origin_z, i32 2
-  %ray_dir0 = insertelement <4 x float> undef, float %ray_dir_x, i32 0
-  %ray_dir1 = insertelement <4 x float> %ray_dir0, float %ray_dir_y, i32 1
-  %ray_dir = insertelement <4 x float> %ray_dir1, float %ray_dir_z, i32 2
-  %ray_inv_dir0 = insertelement <4 x float> undef, float %ray_inv_dir_x, i32 0
-  %ray_inv_dir1 = insertelement <4 x float> %ray_inv_dir0, float %ray_inv_dir_y, i32 1
-  %ray_inv_dir = insertelement <4 x float> %ray_inv_dir1, float %ray_inv_dir_z, i32 2
-  %v = call <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i32.v4f32(i32 %node_ptr, float %ray_extent, <4 x float> %ray_origin, <4 x float> %ray_dir, <4 x float> %ray_inv_dir, <4 x i32> %tdescr)
+  %ray_origin0 = insertelement <3 x float> undef, float %ray_origin_x, i32 0
+

[PATCH] D112791: [IR] Merge createReplacementInstr into ConstantExpr::getAsInstruction

2021-10-29 Thread Jay Foad via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG1b758925adf6: [IR] Merge createReplacementInstr into 
ConstantExpr::getAsInstruction (authored by foad).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112791/new/

https://reviews.llvm.org/D112791

Files:
  clang/lib/CodeGen/CGCUDANV.cpp
  llvm/include/llvm/IR/Constants.h
  llvm/lib/IR/Constants.cpp
  llvm/lib/IR/ReplaceConstant.cpp
  llvm/lib/Target/XCore/XCoreLowerThreadLocal.cpp
  llvm/lib/Transforms/IPO/GlobalOpt.cpp
  llvm/lib/Transforms/Scalar/ConstantHoisting.cpp

Index: llvm/lib/Transforms/Scalar/ConstantHoisting.cpp
===
--- llvm/lib/Transforms/Scalar/ConstantHoisting.cpp
+++ llvm/lib/Transforms/Scalar/ConstantHoisting.cpp
@@ -819,10 +819,9 @@
 
 // Aside from constant GEPs, only constant cast expressions are collected.
 assert(ConstExpr->isCast() && "ConstExpr should be a cast");
-Instruction *ConstExprInst = ConstExpr->getAsInstruction();
+Instruction *ConstExprInst = ConstExpr->getAsInstruction(
+findMatInsertPt(ConstUser.Inst, ConstUser.OpndIdx));
 ConstExprInst->setOperand(0, Mat);
-ConstExprInst->insertBefore(findMatInsertPt(ConstUser.Inst,
-ConstUser.OpndIdx));
 
 // Use the same debug location as the instruction we are about to update.
 ConstExprInst->setDebugLoc(ConstUser.Inst->getDebugLoc());
Index: llvm/lib/Transforms/IPO/GlobalOpt.cpp
===
--- llvm/lib/Transforms/IPO/GlobalOpt.cpp
+++ llvm/lib/Transforms/IPO/GlobalOpt.cpp
@@ -1490,8 +1490,7 @@
 append_range(UUsers, U->users());
 for (auto *UU : UUsers) {
   Instruction *UI = cast<Instruction>(UU);
-  Instruction *NewU = U->getAsInstruction();
-  NewU->insertBefore(UI);
+  Instruction *NewU = U->getAsInstruction(UI);
   UI->replaceUsesOfWith(U, NewU);
 }
 // We've replaced all the uses, so destroy the constant. (destroyConstant
Index: llvm/lib/Target/XCore/XCoreLowerThreadLocal.cpp
===
--- llvm/lib/Target/XCore/XCoreLowerThreadLocal.cpp
+++ llvm/lib/Target/XCore/XCoreLowerThreadLocal.cpp
@@ -21,7 +21,6 @@
 #include "llvm/IR/IntrinsicsXCore.h"
 #include "llvm/IR/Module.h"
 #include "llvm/IR/NoFolder.h"
-#include "llvm/IR/ReplaceConstant.h"
 #include "llvm/IR/ValueHandle.h"
 #include "llvm/Pass.h"
 #include "llvm/Support/CommandLine.h"
@@ -90,11 +89,11 @@
   if (PredBB->getTerminator()->getNumSuccessors() > 1)
 PredBB = SplitEdge(PredBB, PN->getParent());
   Instruction *InsertPos = PredBB->getTerminator();
-  Instruction *NewInst = createReplacementInstr(CE, InsertPos);
+  Instruction *NewInst = CE->getAsInstruction(InsertPos);
   PN->setOperand(I, NewInst);
 }
 } else if (Instruction *Instr = dyn_cast<Instruction>(WU)) {
-  Instruction *NewInst = createReplacementInstr(CE, Instr);
+  Instruction *NewInst = CE->getAsInstruction(Instr);
   Instr->replaceUsesOfWith(CE, NewInst);
 } else {
   ConstantExpr *CExpr = dyn_cast<ConstantExpr>(WU);
@@ -103,7 +102,7 @@
 }
   }
   } while (CE->hasNUsesOrMore(1)); // We need to check because a recursive
-  // sibling may have used 'CE' when createReplacementInstr was called.
+  // sibling may have used 'CE' when getAsInstruction was called.
   CE->destroyConstant();
   return true;
 }
Index: llvm/lib/IR/ReplaceConstant.cpp
===
--- llvm/lib/IR/ReplaceConstant.cpp
+++ llvm/lib/IR/ReplaceConstant.cpp
@@ -20,9 +20,7 @@
 // Replace a constant expression by instructions with equivalent operations at
 // a specified location.
 Instruction *createReplacementInstr(ConstantExpr *CE, Instruction *Instr) {
-  auto *CEInstr = CE->getAsInstruction();
-  CEInstr->insertBefore(Instr);
-  return CEInstr;
+  return CE->getAsInstruction(Instr);
 }
 
 void convertConstantExprsToInstructions(Instruction *I, ConstantExpr *CE,
@@ -63,8 +61,7 @@
   for (auto *CE : Path) {
 if (!Visited.insert(CE).second)
   continue;
-auto *NI = CE->getAsInstruction();
-NI->insertBefore(BI);
+auto *NI = CE->getAsInstruction(BI);
 II->replaceUsesOfWith(CE, NI);
 CE->removeDeadConstantUsers();
 BI = II = NI;
Index: llvm/lib/IR/Constants.cpp
===
--- llvm/lib/IR/Constants.cpp
+++ llvm/lib/IR/Constants.cpp
@@ -3492,7 +3492,7 @@
   NewOps, this, From, To, NumUpdated, OperandNo);
 }
 
-Instruction *ConstantExpr::getAsInstruction() const {
+Instruction *ConstantExpr::getAsInstruction(Instruction *InsertBefore) const {
   SmallVector

[PATCH] D112791: [IR] Merge createReplacementInstr into ConstantExpr::getAsInstruction

2021-10-29 Thread Jay Foad via Phabricator via cfe-commits

foad added inline comments.



Comment at: llvm/include/llvm/IR/Constants.h:1317
   /// would make it harder to remove ConstantExprs altogether.
-  Instruction *getAsInstruction() const;
+  Instruction *getAsInstruction(Instruction *InsertBefore = nullptr) const;
 

yaxunl wrote:
> Can you add a comment about the insertion location when 'InsertBefore' is 
> nullptr? Thanks.
Done, although there are loads of InsertBefore arguments in Instructions.h that 
all work the same way, and no comments explaining them :)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112791/new/

https://reviews.llvm.org/D112791

[PATCH] D112791: [IR] Merge createReplacementInstr into ConstantExpr::getAsInstruction

2021-10-29 Thread Jay Foad via Phabricator via cfe-commits

foad updated this revision to Diff 383341.
foad added a comment.

Add comment about InsertBefore.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112791/new/

https://reviews.llvm.org/D112791

Files:
  clang/lib/CodeGen/CGCUDANV.cpp
  llvm/include/llvm/IR/Constants.h
  llvm/lib/IR/Constants.cpp
  llvm/lib/IR/ReplaceConstant.cpp
  llvm/lib/Target/XCore/XCoreLowerThreadLocal.cpp
  llvm/lib/Transforms/IPO/GlobalOpt.cpp
  llvm/lib/Transforms/Scalar/ConstantHoisting.cpp

Index: llvm/lib/Transforms/Scalar/ConstantHoisting.cpp
===
--- llvm/lib/Transforms/Scalar/ConstantHoisting.cpp
+++ llvm/lib/Transforms/Scalar/ConstantHoisting.cpp
@@ -819,10 +819,9 @@
 
 // Aside from constant GEPs, only constant cast expressions are collected.
 assert(ConstExpr->isCast() && "ConstExpr should be a cast");
-Instruction *ConstExprInst = ConstExpr->getAsInstruction();
+Instruction *ConstExprInst = ConstExpr->getAsInstruction(
+findMatInsertPt(ConstUser.Inst, ConstUser.OpndIdx));
 ConstExprInst->setOperand(0, Mat);
-ConstExprInst->insertBefore(findMatInsertPt(ConstUser.Inst,
-ConstUser.OpndIdx));
 
 // Use the same debug location as the instruction we are about to update.
 ConstExprInst->setDebugLoc(ConstUser.Inst->getDebugLoc());
Index: llvm/lib/Transforms/IPO/GlobalOpt.cpp
===
--- llvm/lib/Transforms/IPO/GlobalOpt.cpp
+++ llvm/lib/Transforms/IPO/GlobalOpt.cpp
@@ -1490,8 +1490,7 @@
 append_range(UUsers, U->users());
 for (auto *UU : UUsers) {
   Instruction *UI = cast<Instruction>(UU);
-  Instruction *NewU = U->getAsInstruction();
-  NewU->insertBefore(UI);
+  Instruction *NewU = U->getAsInstruction(UI);
   UI->replaceUsesOfWith(U, NewU);
 }
 // We've replaced all the uses, so destroy the constant. (destroyConstant
Index: llvm/lib/Target/XCore/XCoreLowerThreadLocal.cpp
===
--- llvm/lib/Target/XCore/XCoreLowerThreadLocal.cpp
+++ llvm/lib/Target/XCore/XCoreLowerThreadLocal.cpp
@@ -21,7 +21,6 @@
 #include "llvm/IR/IntrinsicsXCore.h"
 #include "llvm/IR/Module.h"
 #include "llvm/IR/NoFolder.h"
-#include "llvm/IR/ReplaceConstant.h"
 #include "llvm/IR/ValueHandle.h"
 #include "llvm/Pass.h"
 #include "llvm/Support/CommandLine.h"
@@ -90,11 +89,11 @@
   if (PredBB->getTerminator()->getNumSuccessors() > 1)
 PredBB = SplitEdge(PredBB, PN->getParent());
   Instruction *InsertPos = PredBB->getTerminator();
-  Instruction *NewInst = createReplacementInstr(CE, InsertPos);
+  Instruction *NewInst = CE->getAsInstruction(InsertPos);
   PN->setOperand(I, NewInst);
 }
 } else if (Instruction *Instr = dyn_cast<Instruction>(WU)) {
-  Instruction *NewInst = createReplacementInstr(CE, Instr);
+  Instruction *NewInst = CE->getAsInstruction(Instr);
   Instr->replaceUsesOfWith(CE, NewInst);
 } else {
   ConstantExpr *CExpr = dyn_cast<ConstantExpr>(WU);
@@ -103,7 +102,7 @@
 }
   }
   } while (CE->hasNUsesOrMore(1)); // We need to check because a recursive
-  // sibling may have used 'CE' when createReplacementInstr was called.
+  // sibling may have used 'CE' when getAsInstruction was called.
   CE->destroyConstant();
   return true;
 }
Index: llvm/lib/IR/ReplaceConstant.cpp
===
--- llvm/lib/IR/ReplaceConstant.cpp
+++ llvm/lib/IR/ReplaceConstant.cpp
@@ -20,9 +20,7 @@
 // Replace a constant expression by instructions with equivalent operations at
 // a specified location.
 Instruction *createReplacementInstr(ConstantExpr *CE, Instruction *Instr) {
-  auto *CEInstr = CE->getAsInstruction();
-  CEInstr->insertBefore(Instr);
-  return CEInstr;
+  return CE->getAsInstruction(Instr);
 }
 
 void convertConstantExprsToInstructions(Instruction *I, ConstantExpr *CE,
@@ -63,8 +61,7 @@
   for (auto *CE : Path) {
 if (!Visited.insert(CE).second)
   continue;
-auto *NI = CE->getAsInstruction();
-NI->insertBefore(BI);
+auto *NI = CE->getAsInstruction(BI);
 II->replaceUsesOfWith(CE, NI);
 CE->removeDeadConstantUsers();
 BI = II = NI;
Index: llvm/lib/IR/Constants.cpp
===
--- llvm/lib/IR/Constants.cpp
+++ llvm/lib/IR/Constants.cpp
@@ -3492,7 +3492,7 @@
   NewOps, this, From, To, NumUpdated, OperandNo);
 }
 
-Instruction *ConstantExpr::getAsInstruction() const {
+Instruction *ConstantExpr::getAsInstruction(Instruction *InsertBefore) const {
   SmallVector ValueOperands(operands());
   ArrayRef Ops(ValueOperands);
 
@@ -3510,40 +3510,43 @@
   case Instruction::IntToPtr:
   case Instruction::BitCast:
   case

[PATCH] D112791: [IR] Merge createReplacementInstr into ConstantExpr::getAsInstruction

2021-10-29 Thread Jay Foad via Phabricator via cfe-commits

foad created this revision.
Herald added subscribers: ormris, dexonsmith, hiraditya.
foad requested review of this revision.
Herald added projects: clang, LLVM.
Herald added subscribers: llvm-commits, cfe-commits.

createReplacementInstr was a trivial wrapper around
ConstantExpr::getAsInstruction, which also inserted the new instruction
into a basic block. Implement this directly in getAsInstruction by
adding an InsertBefore parameter and change all callers to use it. NFC.

A follow-up patch will remove createReplacementInstr.
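The before/after shape of this refactor can be modelled without LLVM. The sketch below uses hypothetical toy types (ToyInst, ToyBlock, ToyConstExpr), not LLVM's classes. It follows the usual LLVM convention for InsertBefore parameters mentioned in the review, where a null InsertBefore leaves the new instruction unattached, and it checks that the one-step call places the instruction where the old create-then-insertBefore idiom did.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <memory>
#include <string>

struct ToyBlock;

// A toy "instruction": just an opcode name and the block it lives in.
struct ToyInst {
  std::string Op;
  ToyBlock *Parent = nullptr;  // nullptr: not inserted into any block
};

using InstPtr = std::shared_ptr<ToyInst>;

// A toy basic block: an ordered list of instructions.
struct ToyBlock {
  std::list<InstPtr> Insts;

  // Insert NewI immediately before Pos, which must already be in this block.
  void insertBefore(const InstPtr &NewI, const ToyInst *Pos) {
    auto It = std::find_if(Insts.begin(), Insts.end(),
                           [Pos](const InstPtr &I) { return I.get() == Pos; });
    assert(It != Insts.end() && "insertion point must be in this block");
    Insts.insert(It, NewI);
    NewI->Parent = this;
  }
};

// A toy constant expression that can be materialized as an instruction.
struct ToyConstExpr {
  std::string Op;

  // Same shape as the patched API: with a null InsertBefore the new
  // instruction is returned unattached; otherwise it is inserted right
  // before InsertBefore in InsertBefore's block.
  InstPtr getAsInstruction(ToyInst *InsertBefore = nullptr) const {
    auto NewI = std::make_shared<ToyInst>();
    NewI->Op = Op;
    if (InsertBefore) {
      assert(InsertBefore->Parent && "InsertBefore must be in a block");
      InsertBefore->Parent->insertBefore(NewI, InsertBefore);
    }
    return NewI;
  }
};

// The two-step idiom the patch folds away (cf. the old
// createReplacementInstr wrapper): create unattached, then insert.
InstPtr createThenInsert(const ToyConstExpr &CE, ToyInst *Pos) {
  InstPtr NewI = CE.getAsInstruction();
  Pos->Parent->insertBefore(NewI, Pos);
  return NewI;
}
```

Defaulting the parameter to nullptr keeps existing callers of the zero-argument form working unchanged, which is what lets the patch be NFC.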


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D112791

Files:
  clang/lib/CodeGen/CGCUDANV.cpp
  llvm/include/llvm/IR/Constants.h
  llvm/lib/IR/Constants.cpp
  llvm/lib/IR/ReplaceConstant.cpp
  llvm/lib/Target/XCore/XCoreLowerThreadLocal.cpp
  llvm/lib/Transforms/IPO/GlobalOpt.cpp
  llvm/lib/Transforms/Scalar/ConstantHoisting.cpp

Index: llvm/lib/Transforms/Scalar/ConstantHoisting.cpp
===
--- llvm/lib/Transforms/Scalar/ConstantHoisting.cpp
+++ llvm/lib/Transforms/Scalar/ConstantHoisting.cpp
@@ -819,10 +819,9 @@
 
 // Aside from constant GEPs, only constant cast expressions are collected.
 assert(ConstExpr->isCast() && "ConstExpr should be a cast");
-Instruction *ConstExprInst = ConstExpr->getAsInstruction();
+Instruction *ConstExprInst = ConstExpr->getAsInstruction(
+findMatInsertPt(ConstUser.Inst, ConstUser.OpndIdx));
 ConstExprInst->setOperand(0, Mat);
-ConstExprInst->insertBefore(findMatInsertPt(ConstUser.Inst,
-ConstUser.OpndIdx));
 
 // Use the same debug location as the instruction we are about to update.
 ConstExprInst->setDebugLoc(ConstUser.Inst->getDebugLoc());
Index: llvm/lib/Transforms/IPO/GlobalOpt.cpp
===
--- llvm/lib/Transforms/IPO/GlobalOpt.cpp
+++ llvm/lib/Transforms/IPO/GlobalOpt.cpp
@@ -1490,8 +1490,7 @@
 append_range(UUsers, U->users());
 for (auto *UU : UUsers) {
   Instruction *UI = cast<Instruction>(UU);
-  Instruction *NewU = U->getAsInstruction();
-  NewU->insertBefore(UI);
+  Instruction *NewU = U->getAsInstruction(UI);
   UI->replaceUsesOfWith(U, NewU);
 }
 // We've replaced all the uses, so destroy the constant. (destroyConstant
Index: llvm/lib/Target/XCore/XCoreLowerThreadLocal.cpp
===
--- llvm/lib/Target/XCore/XCoreLowerThreadLocal.cpp
+++ llvm/lib/Target/XCore/XCoreLowerThreadLocal.cpp
@@ -21,7 +21,6 @@
 #include "llvm/IR/IntrinsicsXCore.h"
 #include "llvm/IR/Module.h"
 #include "llvm/IR/NoFolder.h"
-#include "llvm/IR/ReplaceConstant.h"
 #include "llvm/IR/ValueHandle.h"
 #include "llvm/Pass.h"
 #include "llvm/Support/CommandLine.h"
@@ -90,11 +89,11 @@
   if (PredBB->getTerminator()->getNumSuccessors() > 1)
 PredBB = SplitEdge(PredBB, PN->getParent());
   Instruction *InsertPos = PredBB->getTerminator();
-  Instruction *NewInst = createReplacementInstr(CE, InsertPos);
+  Instruction *NewInst = CE->getAsInstruction(InsertPos);
   PN->setOperand(I, NewInst);
 }
 } else if (Instruction *Instr = dyn_cast<Instruction>(WU)) {
-  Instruction *NewInst = createReplacementInstr(CE, Instr);
+  Instruction *NewInst = CE->getAsInstruction(Instr);
   Instr->replaceUsesOfWith(CE, NewInst);
 } else {
   ConstantExpr *CExpr = dyn_cast<ConstantExpr>(WU);
@@ -103,7 +102,7 @@
 }
   }
   } while (CE->hasNUsesOrMore(1)); // We need to check because a recursive
-  // sibling may have used 'CE' when createReplacementInstr was called.
+  // sibling may have used 'CE' when getAsInstruction was called.
   CE->destroyConstant();
   return true;
 }
Index: llvm/lib/IR/ReplaceConstant.cpp
===
--- llvm/lib/IR/ReplaceConstant.cpp
+++ llvm/lib/IR/ReplaceConstant.cpp
@@ -20,9 +20,7 @@
 // Replace a constant expression by instructions with equivalent operations at
 // a specified location.
 Instruction *createReplacementInstr(ConstantExpr *CE, Instruction *Instr) {
-  auto *CEInstr = CE->getAsInstruction();
-  CEInstr->insertBefore(Instr);
-  return CEInstr;
+  return CE->getAsInstruction(Instr);
 }
 
 void convertConstantExprsToInstructions(Instruction *I, ConstantExpr *CE,
@@ -63,8 +61,7 @@
   for (auto *CE : Path) {
 if (!Visited.insert(CE).second)
   continue;
-auto *NI = CE->getAsInstruction();
-NI->insertBefore(BI);
+auto *NI = CE->getAsInstruction(BI);
 II->replaceUsesOfWith(CE, NI);
 CE->removeDeadConstantUsers();
 BI = II = NI;
Index: llvm/lib/IR/Constants.cpp
===
--- llvm/lib/IR/Constants.cpp
+++ llvm/lib/IR/Constants.cpp
@@ -3492,7 +3492,7 @@

[PATCH] D110808: [APInt] Stop using soft-deprecated constructors and methods in clang. NFC.

2021-10-04 Thread Jay Foad via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGd933adeaca7b: [APInt] Stop using soft-deprecated 
constructors and methods in clang. NFC. (authored by foad).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110808/new/

https://reviews.llvm.org/D110808

Files:
  clang-tools-extra/clang-tidy/bugprone/StringConstructorCheck.cpp
  clang/lib/AST/ExprConstant.cpp
  clang/lib/AST/OpenMPClause.cpp
  clang/lib/CodeGen/CGExprScalar.cpp
  clang/lib/Lex/LiteralSupport.cpp
  clang/lib/Sema/SemaChecking.cpp
  clang/lib/Sema/SemaExpr.cpp
  clang/lib/Sema/SemaOpenMP.cpp
  clang/lib/StaticAnalyzer/Core/MemRegion.cpp
  clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
  clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp

Index: clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
===
--- clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
+++ clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
@@ -128,14 +128,14 @@
 // a&0 and a&(~0)
 if (RHS == 0)
   return makeIntVal(0, resultTy);
-else if (RHS.isAllOnesValue())
+else if (RHS.isAllOnes())
   isIdempotent = true;
 break;
   case BO_Or:
 // a|0 and a|(~0)
 if (RHS == 0)
   isIdempotent = true;
-else if (RHS.isAllOnesValue()) {
+else if (RHS.isAllOnes()) {
   const llvm::APSInt &Result = BasicVals.Convert(resultTy, RHS);
   return nonloc::ConcreteInt(Result);
 }
@@ -509,7 +509,7 @@
 continue;
   case BO_Shr:
 // (~0)>>a
-if (LHSValue.isAllOnesValue() && LHSValue.isSigned())
+if (LHSValue.isAllOnes() && LHSValue.isSigned())
   return evalCast(lhs, resultTy, QualType{});
 LLVM_FALLTHROUGH;
   case BO_Shl:
Index: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
===
--- clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
+++ clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
@@ -1568,7 +1568,7 @@
 assert(!Constraint.isEmpty() && "Empty ranges shouldn't get here");
 
 if (Constraint.getConcreteValue())
-  return !Constraint.getConcreteValue()->isNullValue();
+  return !Constraint.getConcreteValue()->isZero();
 
 APSIntType T{Constraint.getMinValue()};
 Const Zero = T.getZeroValue();
Index: clang/lib/StaticAnalyzer/Core/MemRegion.cpp
===
--- clang/lib/StaticAnalyzer/Core/MemRegion.cpp
+++ clang/lib/StaticAnalyzer/Core/MemRegion.cpp
@@ -794,7 +794,7 @@
 
 const AnalyzerOptions &Opts = SVB.getAnalyzerOptions();
 if (Opts.ShouldConsiderSingleElementArraysAsFlexibleArrayMembers &&
-Size.isOneValue())
+Size.isOne())
   return true;
   }
   return false;
Index: clang/lib/Sema/SemaOpenMP.cpp
===
--- clang/lib/Sema/SemaOpenMP.cpp
+++ clang/lib/Sema/SemaOpenMP.cpp
@@ -18895,7 +18895,7 @@
   Expr::EvalResult ResultL;
   if (!OASE->getLength()->isValueDependent() &&
   OASE->getLength()->EvaluateAsInt(ResultR, SemaRef.getASTContext()) &&
-  !ResultR.Val.getInt().isOneValue()) {
+  !ResultR.Val.getInt().isOne()) {
 SemaRef.Diag(OASE->getLength()->getExprLoc(),
  diag::err_omp_invalid_map_this_expr);
 SemaRef.Diag(OASE->getLength()->getExprLoc(),
Index: clang/lib/Sema/SemaExpr.cpp
===
--- clang/lib/Sema/SemaExpr.cpp
+++ clang/lib/Sema/SemaExpr.cpp
@@ -3813,7 +3813,7 @@
 
 llvm::APInt Val(bit_width, 0, isSigned);
 bool Overflowed = Literal.GetFixedPointValue(Val, scale);
-bool ValIsZero = Val.isNullValue() && !Overflowed;
+bool ValIsZero = Val.isZero() && !Overflowed;
 
 auto MaxVal = Context.getFixedPointMax(Ty).getValue();
 if (Literal.isFract && Val == MaxVal + 1 && !ValIsZero)
@@ -5254,7 +5254,7 @@
   // OpenMP 5.0, 2.1.6 Iterators, Restrictions
   // If the step expression of a range-specification equals zero, the
   // behavior is unspecified.
-  if (Result && Result->isNullValue()) {
+  if (Result && Result->isZero()) {
 Diag(Step->getExprLoc(), diag::err_omp_iterator_step_constant_zero)
 << Step << Step->getSourceRange();
 IsCorrect = false;
Index: clang/lib/Sema/SemaChecking.cpp
===
--- clang/lib/Sema/SemaChecking.cpp
+++ clang/lib/Sema/SemaChecking.cpp
@@ -1729,7 +1729,7 @@
 // value so we bail out.
 if (SizeOp->isValueDependent())
   break;
-if (!SizeOp->EvaluateKnownConstInt(Context).isNullValue()) {
+if (!SizeOp->EvaluateKnownConstInt(Context).isZero()) {
   CheckNonNullArgument(*this,

[PATCH] D110808: [APInt] Stop using soft-deprecated constructors and methods in clang. NFC.

2021-09-30 Thread Jay Foad via Phabricator via cfe-commits

foad created this revision.
Herald added a subscriber: martong.
foad requested review of this revision.
Herald added projects: clang, clang-tools-extra.
Herald added a subscriber: cfe-commits.

Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in 
clang.
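The diff below swaps `isNullValue()` for `isZero()`, `isAllOnesValue()` for `isAllOnes()` and `isOneValue()` for `isOne()`. The sketch shows what "soft-deprecated" plausibly means here: the old spellings still compile because they forward to the new ones, so call sites can be migrated one patch at a time. It uses a hypothetical `MiniAPInt` class, not `llvm::APInt`. Whether D109483 implemented the forwarding exactly like this is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::APInt (NOT the real class): a value of
// 1..64 bits stored in a uint64_t, used only to illustrate the renaming.
class MiniAPInt {
  unsigned BitWidth;
  uint64_t Val;

  static uint64_t mask(unsigned BW) {
    return BW >= 64 ? ~uint64_t(0) : ((uint64_t(1) << BW) - 1);
  }

public:
  MiniAPInt(unsigned BW, uint64_t V) : BitWidth(BW), Val(V & mask(BW)) {
    assert(BW >= 1 && BW <= 64 && "toy class only supports 1..64 bits");
  }

  // Preferred predicate names, as used in the diff.
  bool isZero() const { return Val == 0; }
  bool isOne() const { return Val == 1; }
  bool isAllOnes() const { return Val == mask(BitWidth); }

  // Older spellings kept as thin forwarders, so existing callers keep
  // compiling while they are migrated to the new names.
  bool isNullValue() const { return isZero(); }
  bool isOneValue() const { return isOne(); }
  bool isAllOnesValue() const { return isAllOnes(); }
};
```

Because each old name only forwards to the new one, a mechanical rename at every call site, like the one in this patch, does not change behaviour.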


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D110808

Files:
  clang-tools-extra/clang-tidy/bugprone/StringConstructorCheck.cpp
  clang/lib/AST/ExprConstant.cpp
  clang/lib/AST/OpenMPClause.cpp
  clang/lib/CodeGen/CGExprScalar.cpp
  clang/lib/Lex/LiteralSupport.cpp
  clang/lib/Sema/SemaChecking.cpp
  clang/lib/Sema/SemaExpr.cpp
  clang/lib/Sema/SemaOpenMP.cpp
  clang/lib/StaticAnalyzer/Core/MemRegion.cpp
  clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
  clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp

Index: clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
===
--- clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
+++ clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
@@ -128,14 +128,14 @@
 // a&0 and a&(~0)
 if (RHS == 0)
   return makeIntVal(0, resultTy);
-else if (RHS.isAllOnesValue())
+else if (RHS.isAllOnes())
   isIdempotent = true;
 break;
   case BO_Or:
 // a|0 and a|(~0)
 if (RHS == 0)
   isIdempotent = true;
-else if (RHS.isAllOnesValue()) {
+else if (RHS.isAllOnes()) {
   const llvm::APSInt &Result = BasicVals.Convert(resultTy, RHS);
   return nonloc::ConcreteInt(Result);
 }
@@ -509,7 +509,7 @@
 continue;
   case BO_Shr:
 // (~0)>>a
-if (LHSValue.isAllOnesValue() && LHSValue.isSigned())
+if (LHSValue.isAllOnes() && LHSValue.isSigned())
   return evalCast(lhs, resultTy, QualType{});
 LLVM_FALLTHROUGH;
   case BO_Shl:
Index: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
===
--- clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
+++ clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
@@ -1568,7 +1568,7 @@
 assert(!Constraint.isEmpty() && "Empty ranges shouldn't get here");
 
 if (Constraint.getConcreteValue())
-  return !Constraint.getConcreteValue()->isNullValue();
+  return !Constraint.getConcreteValue()->isZero();
 
 APSIntType T{Constraint.getMinValue()};
 Const Zero = T.getZeroValue();
Index: clang/lib/StaticAnalyzer/Core/MemRegion.cpp
===
--- clang/lib/StaticAnalyzer/Core/MemRegion.cpp
+++ clang/lib/StaticAnalyzer/Core/MemRegion.cpp
@@ -794,7 +794,7 @@
 
 const AnalyzerOptions &Opts = SVB.getAnalyzerOptions();
 if (Opts.ShouldConsiderSingleElementArraysAsFlexibleArrayMembers &&
-Size.isOneValue())
+Size.isOne())
   return true;
   }
   return false;
Index: clang/lib/Sema/SemaOpenMP.cpp
===
--- clang/lib/Sema/SemaOpenMP.cpp
+++ clang/lib/Sema/SemaOpenMP.cpp
@@ -18895,7 +18895,7 @@
   Expr::EvalResult ResultL;
   if (!OASE->getLength()->isValueDependent() &&
   OASE->getLength()->EvaluateAsInt(ResultR, SemaRef.getASTContext()) &&
-  !ResultR.Val.getInt().isOneValue()) {
+  !ResultR.Val.getInt().isOne()) {
 SemaRef.Diag(OASE->getLength()->getExprLoc(),
  diag::err_omp_invalid_map_this_expr);
 SemaRef.Diag(OASE->getLength()->getExprLoc(),
Index: clang/lib/Sema/SemaExpr.cpp
===
--- clang/lib/Sema/SemaExpr.cpp
+++ clang/lib/Sema/SemaExpr.cpp
@@ -3813,7 +3813,7 @@
 
 llvm::APInt Val(bit_width, 0, isSigned);
 bool Overflowed = Literal.GetFixedPointValue(Val, scale);
-bool ValIsZero = Val.isNullValue() && !Overflowed;
+bool ValIsZero = Val.isZero() && !Overflowed;
 
 auto MaxVal = Context.getFixedPointMax(Ty).getValue();
 if (Literal.isFract && Val == MaxVal + 1 && !ValIsZero)
@@ -5254,7 +5254,7 @@
   // OpenMP 5.0, 2.1.6 Iterators, Restrictions
   // If the step expression of a range-specification equals zero, the
   // behavior is unspecified.
-  if (Result && Result->isNullValue()) {
+  if (Result && Result->isZero()) {
 Diag(Step->getExprLoc(), diag::err_omp_iterator_step_constant_zero)
 << Step << Step->getSourceRange();
 IsCorrect = false;
Index: clang/lib/Sema/SemaChecking.cpp
===
--- clang/lib/Sema/SemaChecking.cpp
+++ clang/lib/Sema/SemaChecking.cpp
@@ -1729,7 +1729,7 @@
 // value so we bail out.
 if (SizeOp->isValueDependent())
   break;
-if (!SizeOp->EvaluateKnownConstInt(Context).isNullValue()) {
+if (!SizeOp->EvaluateKnownConstInt(Context).isZero()) {

[PATCH] D109483: [APInt] Normalize naming on keep constructors / predicate methods.

2021-09-13 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

What is a "keep constructor"?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109483/new/

https://reviews.llvm.org/D109483

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D81886: [AMDGPU] Add gfx1030 target

2021-06-25 Thread Jay Foad via Phabricator via cfe-commits

foad added inline comments.
Herald added a subscriber: dexonsmith.



Comment at: llvm/lib/Target/AMDGPU/AMDGPU.td:1245
+
+def HasDsSrc2Insts : Predicate<"!Subtarget->hasDsSrc2Insts()">,
+  AssemblerPredicate<(all_of FeatureDsSrc2Insts)>;

The `!` is obviously wrong in this definition, but if I remove it, all the 
tests still pass. So does this predicate actually control anything?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81886/new/

https://reviews.llvm.org/D81886

[PATCH] D104804: [AMDGPU] Add gfx1035 target

2021-06-24 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

Looks OK. Have you run check-llvm and check-clang?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104804/new/

https://reviews.llvm.org/D104804

[PATCH] D104804: [AMDGPU] Add gfx1035 target

2021-06-24 Thread Jay Foad via Phabricator via cfe-commits

foad added inline comments.



Comment at: clang/test/Driver/amdgpu-mcpu.cl:138
 // GFX1034:   "-target-cpu" "gfx1034"
+// GFX1034:   "-target-cpu" "gfx1035"

Typo in check prefix. Why didn't this make the test fail?



Comment at: llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:177
   case GK_GFX1034: return ELF::EF_AMDGPU_MACH_AMDGCN_GFX1034;
   case GK_NONE:return ELF::EF_AMDGPU_MACH_NONE;
   }

Please update this switch too.



Comment at: llvm/test/CodeGen/AMDGPU/directive-amdgcn-target.ll:91
 ; RUN: llc -mtriple=amdgcn-amd-amdhsa --amdhsa-code-object-version=3 -mcpu=gfx1034 < %s | FileCheck --check-prefixes=V3-GFX1034 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa --amdhsa-code-object-version=3 -mcpu=gfx1035 < %s | FileCheck --check-prefixes=V3-GFX1035 %s
 

You've added a RUN line here but no checks that use the new prefix. (Doesn't 
that make llvm-lit complain?)



Comment at: llvm/test/CodeGen/AMDGPU/directive-amdgcn-target.ll:183
 ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1034 < %s | FileCheck --check-prefixes=GFX1034 %s
 
 ; V3-GFX600: .amdgcn_target "amdgcn-amd-amdhsa--gfx600"

Add a RUN line here?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104804/new/

https://reviews.llvm.org/D104804

[PATCH] D104124: [IR] Simplify createReplacementInstr

2021-06-23 Thread Jay Foad via Phabricator via cfe-commits

This revision was automatically updated to reflect the committed changes.
Closed by commit rG157473a58f02: [IR] Simplify createReplacementInstr (authored 
by foad).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104124/new/

https://reviews.llvm.org/D104124

Files:
  clang/test/CodeGenCUDA/managed-var.cu
  llvm/lib/IR/ReplaceConstant.cpp


Index: llvm/lib/IR/ReplaceConstant.cpp
===
--- llvm/lib/IR/ReplaceConstant.cpp
+++ llvm/lib/IR/ReplaceConstant.cpp
@@ -20,53 +20,9 @@
 // Replace a constant expression by instructions with equivalent operations at
 // a specified location.
 Instruction *createReplacementInstr(ConstantExpr *CE, Instruction *Instr) {
-  IRBuilder Builder(Instr);
-  unsigned OpCode = CE->getOpcode();
-  switch (OpCode) {
-  case Instruction::GetElementPtr: {
-SmallVector CEOpVec(CE->operands());
-ArrayRef CEOps(CEOpVec);
-return dyn_cast(
-Builder.CreateInBoundsGEP(cast(CE)->getSourceElementType(),
-  CEOps[0], CEOps.slice(1)));
-  }
-  case Instruction::Add:
-  case Instruction::Sub:
-  case Instruction::Mul:
-  case Instruction::UDiv:
-  case Instruction::SDiv:
-  case Instruction::FDiv:
-  case Instruction::URem:
-  case Instruction::SRem:
-  case Instruction::FRem:
-  case Instruction::Shl:
-  case Instruction::LShr:
-  case Instruction::AShr:
-  case Instruction::And:
-  case Instruction::Or:
-  case Instruction::Xor:
-return dyn_cast(
-Builder.CreateBinOp((Instruction::BinaryOps)OpCode, CE->getOperand(0),
-CE->getOperand(1), CE->getName()));
-  case Instruction::Trunc:
-  case Instruction::ZExt:
-  case Instruction::SExt:
-  case Instruction::FPToUI:
-  case Instruction::FPToSI:
-  case Instruction::UIToFP:
-  case Instruction::SIToFP:
-  case Instruction::FPTrunc:
-  case Instruction::FPExt:
-  case Instruction::PtrToInt:
-  case Instruction::IntToPtr:
-  case Instruction::BitCast:
-  case Instruction::AddrSpaceCast:
-return dyn_cast(
-Builder.CreateCast((Instruction::CastOps)OpCode, CE->getOperand(0),
-   CE->getType(), CE->getName()));
-  default:
-llvm_unreachable("Unhandled constant expression!\n");
-  }
+  auto *CEInstr = CE->getAsInstruction();
+  CEInstr->insertBefore(Instr);
+  return CEInstr;
 }
 
 void convertConstantExprsToInstructions(Instruction *I, ConstantExpr *CE,
Index: clang/test/CodeGenCUDA/managed-var.cu
===
--- clang/test/CodeGenCUDA/managed-var.cu
+++ clang/test/CodeGenCUDA/managed-var.cu
@@ -146,7 +146,7 @@
 // HOST:  %3 = getelementptr inbounds [100 x %struct.vec], [100 x %struct.vec]* %2, i64 0, i64 1, i32 1
 // HOST:  %4 = ptrtoint float* %3 to i64
 // HOST:  %5 = sub i64 %4, %1
-// HOST:  %6 = sdiv i64 %5, 4
+// HOST:  %6 = sdiv exact i64 %5, 4
 // HOST:  %7 = sitofp i64 %6 to float
 // HOST:  ret float %7
 float addr_taken2() {


[PATCH] D104124: [IR] Simplify createReplacementInstr

2021-06-11 Thread Jay Foad via Phabricator via cfe-commits

foad added reviewers: yaxunl, robertlytton.
foad added a comment.

Given how simple createReplacementInstr is now, this does make me wonder if it 
was really worth creating ReplaceConstant.{cpp,h} in the first place.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104124/new/

https://reviews.llvm.org/D104124

[PATCH] D104124: [IR] Simplify createReplacementInstr

2021-06-11 Thread Jay Foad via Phabricator via cfe-commits

foad created this revision.
Herald added subscribers: dexonsmith, hiraditya.
foad requested review of this revision.
Herald added projects: clang, LLVM.
Herald added subscribers: llvm-commits, cfe-commits.

NFCI, although the test change shows that ConstantExpr::getAsInstruction
is better than the old implementation of createReplacementInstr because
it propagates things like the sdiv "exact" flag.
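The point about the `exact` flag can be shown with a toy model. The old switch re-created each operation from its opcode and operands through `IRBuilder`, so any flag it did not explicitly pass along was lost. Copying the whole operation, which is what the patch relies on `getAsInstruction()` to do, keeps it. This is a sketch under stated assumptions: `Op`, `rebuildFromOpcode`, `cloneWithFlags` and `print` are hypothetical names, and the model is far simpler than LLVM's `ConstantExpr` and `Instruction` classes.

```cpp
#include <cassert>
#include <string>

// Hypothetical, heavily simplified operation: opcode, two operand names,
// and an optional "exact" flag (as on sdiv).
struct Op {
  std::string Opcode;
  std::string LHS, RHS;
  bool Exact = false;
};

// Old style (sketch): rebuild from opcode and operands only. Exact is not
// passed along, so it falls back to its default value of false.
Op rebuildFromOpcode(const Op &CE) { return Op{CE.Opcode, CE.LHS, CE.RHS}; }

// New style (sketch): copy the whole operation, flags included.
Op cloneWithFlags(const Op &CE) { return CE; }

// Render an operation in a textual form loosely modelled on LLVM IR.
std::string print(const Op &O) {
  return O.Opcode + (O.Exact ? " exact " : " ") + O.LHS + ", " + O.RHS;
}
```

With an input of `Op{"sdiv", "%5", "4", true}`, the rebuilt copy prints as `sdiv %5, 4` while the clone prints as `sdiv exact %5, 4`, mirroring the CHECK-line change in `managed-var.cu`.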


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D104124

Files:
  clang/test/CodeGenCUDA/managed-var.cu
  llvm/lib/IR/ReplaceConstant.cpp


Index: llvm/lib/IR/ReplaceConstant.cpp
===
--- llvm/lib/IR/ReplaceConstant.cpp
+++ llvm/lib/IR/ReplaceConstant.cpp
@@ -20,53 +20,9 @@
 // Replace a constant expression by instructions with equivalent operations at
 // a specified location.
 Instruction *createReplacementInstr(ConstantExpr *CE, Instruction *Instr) {
-  IRBuilder Builder(Instr);
-  unsigned OpCode = CE->getOpcode();
-  switch (OpCode) {
-  case Instruction::GetElementPtr: {
-SmallVector CEOpVec(CE->operands());
-ArrayRef CEOps(CEOpVec);
-return dyn_cast(
-Builder.CreateInBoundsGEP(cast(CE)->getSourceElementType(),
-  CEOps[0], CEOps.slice(1)));
-  }
-  case Instruction::Add:
-  case Instruction::Sub:
-  case Instruction::Mul:
-  case Instruction::UDiv:
-  case Instruction::SDiv:
-  case Instruction::FDiv:
-  case Instruction::URem:
-  case Instruction::SRem:
-  case Instruction::FRem:
-  case Instruction::Shl:
-  case Instruction::LShr:
-  case Instruction::AShr:
-  case Instruction::And:
-  case Instruction::Or:
-  case Instruction::Xor:
-return dyn_cast(
-Builder.CreateBinOp((Instruction::BinaryOps)OpCode, CE->getOperand(0),
-CE->getOperand(1), CE->getName()));
-  case Instruction::Trunc:
-  case Instruction::ZExt:
-  case Instruction::SExt:
-  case Instruction::FPToUI:
-  case Instruction::FPToSI:
-  case Instruction::UIToFP:
-  case Instruction::SIToFP:
-  case Instruction::FPTrunc:
-  case Instruction::FPExt:
-  case Instruction::PtrToInt:
-  case Instruction::IntToPtr:
-  case Instruction::BitCast:
-  case Instruction::AddrSpaceCast:
-return dyn_cast(
-Builder.CreateCast((Instruction::CastOps)OpCode, CE->getOperand(0),
-   CE->getType(), CE->getName()));
-  default:
-llvm_unreachable("Unhandled constant expression!\n");
-  }
+  auto *CEInstr = CE->getAsInstruction();
+  CEInstr->insertBefore(Instr);
+  return CEInstr;
 }
 
 void convertConstantExprsToInstructions(Instruction *I, ConstantExpr *CE,
Index: clang/test/CodeGenCUDA/managed-var.cu
===
--- clang/test/CodeGenCUDA/managed-var.cu
+++ clang/test/CodeGenCUDA/managed-var.cu
@@ -146,7 +146,7 @@
 // HOST:  %3 = getelementptr inbounds [100 x %struct.vec], [100 x %struct.vec]* %2, i64 0, i64 1, i32 1
 // HOST:  %4 = ptrtoint float* %3 to i64
 // HOST:  %5 = sub i64 %4, %1
-// HOST:  %6 = sdiv i64 %5, 4
+// HOST:  %6 = sdiv exact i64 %5, 4
 // HOST:  %7 = sitofp i64 %6 to float
 // HOST:  ret float %7
 float addr_taken2() {


[PATCH] D103663: [AMDGPU] Add gfx1013 target

2021-06-09 Thread Jay Foad via Phabricator via cfe-commits

foad added inline comments.



Comment at: llvm/lib/Target/AMDGPU/AMDGPU.td:471
+  "true",
+  "Encoding format GFX10_A"
+>;

bcahoon wrote:
> foad wrote:
> > I realise you're just following the precedent set by GFX10_B, but is this 
> > terminology actually used in any documentation anywhere? And if not could 
> > we describe it a little better here?
> I changed the description to be specific w.r.t what the target feature 
> enables.
Thank you. I think that is much more useful.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103663/new/

https://reviews.llvm.org/D103663

[PATCH] D103663: [AMDGPU] Add gfx1013 target

2021-06-08 Thread Jay Foad via Phabricator via cfe-commits

foad accepted this revision.
foad added a comment.

LGTM anyway, with or without any action on my last couple of comments.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103663/new/

https://reviews.llvm.org/D103663

[PATCH] D103663: [AMDGPU] Add gfx1013 target

2021-06-08 Thread Jay Foad via Phabricator via cfe-commits

foad added inline comments.



Comment at: llvm/lib/Target/AMDGPU/AMDGPU.td:471
+  "true",
+  "Encoding format GFX10_A"
+>;

I realise you're just following the precedent set by GFX10_B, but is this 
terminology actually used in any documentation anywhere? And if not could we 
describe it a little better here?



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:7341
+if (!Subtarget->hasGFX10_AEncoding())
+  emitRemovedIntrinsicError(DAG, DL, Op.getValueType());
+

bcahoon wrote:
> rampitec wrote:
> > return emitRemovedIntrinsicError();
> I've changed this to return. Thanks for catching that. But, it returns a 
> UNDEF value instead of SDValue() so that it doesn't crash. I can change the 
> behavior if that's preferred. 
Personally I would follow all the existing precedents and "return 
emitRemovedIntrinsicError(...)". I don't see any value in deliberately trying 
to make the compiler crash harder.
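The difference between only calling the error helper and returning its result can be sketched in isolation. Without `return`, control falls through into the normal lowering path after the diagnostic has been emitted. The names below (`Diags`, `emitRemovedIntrinsicErrorSketch`, `lowerBuggy`, `lowerFixed`) are hypothetical and do not reflect the real `SIISelLowering.cpp` signatures.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical diagnostic sink.
struct Diags {
  std::vector<std::string> Errors;
};

// Records an error and returns a placeholder result (-1), loosely in the
// spirit of emitRemovedIntrinsicError; not the real signature.
int emitRemovedIntrinsicErrorSketch(Diags &D) {
  D.Errors.push_back("intrinsic not supported on this subtarget");
  return -1;
}

// Buggy pattern: the error is reported, but lowering carries on anyway.
int lowerBuggy(bool Supported, Diags &D) {
  if (!Supported)
    emitRemovedIntrinsicErrorSketch(D); // result discarded, no return
  return 42; // stands in for the normal lowering path
}

// Fixed pattern: return the helper's result so the normal path is skipped.
int lowerFixed(bool Supported, Diags &D) {
  if (!Supported)
    return emitRemovedIntrinsicErrorSketch(D);
  return 42;
}
```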


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103663/new/

https://reviews.llvm.org/D103663

[PATCH] D103663: [AMDGPU] Add gfx1013 target

2021-06-04 Thread Jay Foad via Phabricator via cfe-commits

foad added inline comments.



Comment at: llvm/docs/AMDGPUUsage.rst:389
 - xnack scratch   - *pal-amdpal*
+ ``gfx1013`` ``amdgcn``   dGPU  - cumode  - Absolute  - *rocm-amdhsa* *TBA*
+- wavefrontsize64   flat   - *pal-amdhsa*

Is it dGPU or APU?

Every other entry with `*TBA*` also has a `TODO::` message.



Comment at: llvm/lib/Target/AMDGPU/AMDGPU.td:468
 
+def FeatureGFX10_AEncoding : SubtargetFeature<"gfx10_a-encoding",
+  "GFX10_AEncoding",

What is this new encoding? It doesn't seem to be used for anything.



Comment at: llvm/lib/Target/AMDGPU/GCNSubtarget.h:879
+  }
+  
   bool hasGFX10_BEncoding() const {

Stray whitespace on this line.



Comment at: llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp:1453
+}
+  
 bool isGFX10_BEncoding(const MCSubtargetInfo ) {

Stray whitespace.



Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.intersect_ray.ll:4
+; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1013 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
+; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
 

This test surely should not pass for gfx1012, since it does not have these 
instructions. And with your patch as written it should fail for gfx1013 too, 
since they are predicated on HasGFX10_BEncoding.

@rampitec any idea what is wrong here? Apparently the backend will happily 
generate image_bvh_intersect_ray instructions even for gfx900!



Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.intersect_ray.ll:3
+; RUN: llc -march=amdgcn -mcpu=gfx1013 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
+; RUN: llc -march=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
 

Likewise.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103663/new/

https://reviews.llvm.org/D103663

[PATCH] D103663: [AMDGPU] Add gfx1013 target

2021-06-04 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

Please also update `llvm/test/Object/AMDGPU/elf-header-flags-mach.yaml`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103663/new/

https://reviews.llvm.org/D103663

[PATCH] D102306: Add gfx1034

2021-05-12 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

Can you also check for updates in:
clang/lib/Basic/Cuda.cpp
openmp/libomptarget/plugins/amdgpu/impl/get_elf_mach_gfx_name.cpp
llvm/test/CodeGen/AMDGPU/directive-amdgcn-target.ll
llvm/test/tools/llvm-objdump/ELF/AMDGPU/subtarget.ll
llvm/test/tools/llvm-readobj/ELF/amdgpu-elf-headers.test




Comment at: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp:4489
   case CudaArch::GFX1033:
+  case CudaArch::GFX1034:
   case CudaArch::UNUSED:

I think lint is complaining about trailing whitespace here -- please check the 
whole patch for it.



Comment at: llvm/test/Object/AMDGPU/elf-header-flags-mach.yaml:334
 
+# ELF-AMDGCN-GFX1034:   EF_AMDGPU_MACH_AMDGCN_GFX1034 (0x3E)
+# YAML-AMDGCN-GFX1034:  Flags: [ EF_AMDGPU_MACH_AMDGCN_GFX1034 ]

You also need to add RUN lines at the top of this file.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102306/new/

https://reviews.llvm.org/D102306

[PATCH] D69498: IR: Invert convergent attribute handling

2021-04-23 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

In D69498#2705441, @sameerds wrote:

> I would propose refining the definition of the `noconvergent` attribute as 
> follows:
>
>> noconvergent:
>>
>> Some targets with a parallel execution model provide cross-thread operations 
>> whose outcome is affected by the presence of divergent control flow. We call 
>> such operations convergent. Optimizations that change control flow may 
>> affect the correctness of a program that uses convergent operations. In the 
>> presence of divergent control flow, such optimizations conservatively treat 
>> every call/invoke instruction as convergent by default. The noconvergent 
>> attribute relaxes this constraint as follows:
>>
>> - The noconvergent attribute can be added to a call/invoke to indicate that 
>> it is not affected by changes to the control flow that reaches it.
>> - The noconvergent attribute can be added to a function to indicate that it 
>> does not execute any convergent operations. A call/invoke automatically 
>> inherits the noconvergent attribute if it is set on the callee.

I don't have much to add to the conversation except to point out that this 
definition defines `noconvergent` in terms of divergent control flow, but the 
langref itself doesn't define what divergent control flow //is//, which makes 
it an incomplete spec. (Perhaps I'm just restating @arsenm's objections.) This 
seems unsatisfactory to me but I have no idea what to do about it. I agree with 
@sameerds that the current definition of `convergent` is too restrictive 
because in practice we really do want to be able to move convergent calls past 
uniform control flow.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69498/new/

https://reviews.llvm.org/D69498

[PATCH] D69498: IR: Invert convergent attribute handling

2021-04-21 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

> But in practice, the main issue for everyone is the effect on compile time 
> for targets that don't care about convergence/divergence. For such targets, 
> running even the divergence analysis is an unnecessary cost.

LegacyDivergenceAnalysis::runOnFunction bails out immediately if 
!hasBranchDivergence.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69498/new/

https://reviews.llvm.org/D69498

[PATCH] D69498: IR: Invert convergent attribute handling

2021-04-21 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

In D69498#2703317, @sameerds wrote:

> The way I see it, the notion of convergence is relevant only to a certain 
> class of targets (usually represented by GPUs) and it only affects certain 
> optimizations. Then why not have only these optimizations check `TTI` to see 
> if convergence matters? `TTI.hasBranchDivergence()` seems like a sufficient 
> proxy for this information.
>
> 1. `convergent` becomes the default in LLVM IR, but it does not affect 
> optimizations on non-GPU targets.
> 2. This is not a reinterpretation of the same IR on different targets. The 
> notional execution model of LLVM IR will say that all function calls are 
> convergent. Targets that only care about one thread at a time represent the 
> degenerate case where all executions are convergent anyway.
>
> This recasts the whole question to be one about combining optimizations with 
> target-specific information. The only changes required are in transforms that 
> check `CallInst::isConvergent()`. These should now also check `TTI`, possibly 
> adding a dependency on the `TTI` analysis where it didn't exist earlier.

@sameerds I agree with your conclusions but I would describe the situation a 
little differently. As I understand it, the optimizations that check 
isConvergent really only care about moving convergent calls past control flow 
//that might be divergent//. !hasBranchDivergence is a promise that there are 
no possible sources of divergence for a target, so you can run a divergence 
analysis if you like but it will just tell you that everything is uniform, so 
all control flow is uniform, so it's OK to move isConvergent calls around.

In practice the optimizations that check isConvergent don't seem to use 
divergence analysis, they just pessimistically assume that any control flow 
might be divergent (if hasBranchDivergence). But they could and perhaps should 
use divergence analysis, and then it would all just fall out in the wash with 
no need for an explicit hasBranchDivergence test.
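The argument can be illustrated with a toy model. It assumes a hypothetical IR in which each value lists the values it is computed from, and a value named `tid` (a thread id) is the only source of divergence. None of these names are LLVM APIs, and this is not LLVM's divergence analysis. The analysis reports "everything uniform" straight away when the target has no branch divergence. A legality check phrased purely in terms of divergence therefore needs no separate `hasBranchDivergence` test.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy IR: value name -> names of the operands it is computed from.
using Defs = std::map<std::string, std::vector<std::string>>;

// Toy divergence analysis: "tid" is divergent, and so is any value that
// (transitively) uses a divergent value.
std::set<std::string> divergentValues(const Defs &D, bool HasBranchDivergence) {
  std::set<std::string> Div;
  // Early exit: on such a target every value is uniform.
  if (!HasBranchDivergence)
    return Div;
  Div.insert("tid");
  bool Changed = true;
  while (Changed) { // propagate until a fixed point is reached
    Changed = false;
    for (const auto &Entry : D) {
      if (Div.count(Entry.first))
        continue;
      for (const std::string &Operand : Entry.second) {
        if (Div.count(Operand)) {
          Div.insert(Entry.first);
          Changed = true;
          break;
        }
      }
    }
  }
  return Div;
}

// A convergent call may be moved across a branch only if the branch
// condition is uniform.
bool mayMoveConvergentCallAcross(const std::string &Cond, const Defs &D,
                                 bool HasBranchDivergence) {
  return divergentValues(D, HasBranchDivergence).count(Cond) == 0;
}
```

For example, if `c2` is computed from `x` and `x` from `tid`, a branch on `c2` blocks the move only when `HasBranchDivergence` is true. The early exit also reflects the compile-time point above: targets without branch divergence pay almost nothing for the analysis.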

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69498/new/

https://reviews.llvm.org/D69498

[PATCH] D96906: [AMDGPU] gfx90a support

2021-03-29 Thread Jay Foad via Phabricator via cfe-commits

foad added inline comments.
Herald added a subscriber: mstorsjo.



Comment at: llvm/lib/Target/AMDGPU/SIFoldOperands.cpp:100
   bool tryFoldOMod(MachineInstr );
+  bool tryFoldRegSeqence(MachineInstr );
+  bool tryFoldLCSSAPhi(MachineInstr );

Spelling "sequence".


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96906/new/

https://reviews.llvm.org/D96906

[PATCH] D98717: [AMDGPU] Split dot2-insts feature

2021-03-17 Thread Jay Foad via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG967b64beb4bf: [AMDGPU] Split dot2-insts feature (authored by 
foad).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98717/new/

https://reviews.llvm.org/D98717

Files:
  clang/include/clang/Basic/BuiltinsAMDGPU.def
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/test/CodeGenOpenCL/amdgpu-features.cl
  clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl
  llvm/lib/Target/AMDGPU/AMDGPU.td
  llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
  llvm/lib/Target/AMDGPU/GCNSubtarget.h
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/lib/Target/AMDGPU/VOP3PInstructions.td

Index: llvm/lib/Target/AMDGPU/VOP3PInstructions.td
===
--- llvm/lib/Target/AMDGPU/VOP3PInstructions.td
+++ llvm/lib/Target/AMDGPU/VOP3PInstructions.td
@@ -287,19 +287,24 @@
 let IsDOT = 1 in {
 let SubtargetPredicate = HasDot2Insts in {
 
-def V_DOT2_F32_F16 : VOP3PInst<"v_dot2_f32_f16",
-  VOP3_Profile,
-  AMDGPUfdot2, 1/*ExplicitClamp*/>;
 def V_DOT2_I32_I16 : VOP3PInst<"v_dot2_i32_i16",
   VOP3_Profile, int_amdgcn_sdot2, 1>;
 def V_DOT2_U32_U16 : VOP3PInst<"v_dot2_u32_u16",
   VOP3_Profile, int_amdgcn_udot2, 1>;
+
+} // End SubtargetPredicate = HasDot2Insts
+
+let SubtargetPredicate = HasDot7Insts in {
+
+def V_DOT2_F32_F16 : VOP3PInst<"v_dot2_f32_f16",
+  VOP3_Profile,
+  AMDGPUfdot2, 1/*ExplicitClamp*/>;
 def V_DOT4_U32_U8  : VOP3PInst<"v_dot4_u32_u8",
   VOP3_Profile, int_amdgcn_udot4, 1>;
 def V_DOT8_U32_U4  : VOP3PInst<"v_dot8_u32_u4",
   VOP3_Profile, int_amdgcn_udot8, 1>;
 
-} // End SubtargetPredicate = HasDot2Insts
+} // End SubtargetPredicate = HasDot7Insts
 
 let SubtargetPredicate = HasDot1Insts in {
 
@@ -564,13 +569,18 @@
 
 let SubtargetPredicate = HasDot2Insts in {
 
-defm V_DOT2_F32_F16 : VOP3P_Real_vi <0x23>;
 defm V_DOT2_I32_I16 : VOP3P_Real_vi <0x26>;
 defm V_DOT2_U32_U16 : VOP3P_Real_vi <0x27>;
+
+} // End SubtargetPredicate = HasDot2Insts
+
+let SubtargetPredicate = HasDot7Insts in {
+
+defm V_DOT2_F32_F16 : VOP3P_Real_vi <0x23>;
 defm V_DOT4_U32_U8  : VOP3P_Real_vi <0x29>;
 defm V_DOT8_U32_U4  : VOP3P_Real_vi <0x2b>;
 
-} // End SubtargetPredicate = HasDot2Insts
+} // End SubtargetPredicate = HasDot7Insts
 
 let SubtargetPredicate = HasDot1Insts in {
 
@@ -657,13 +667,18 @@
 
 let SubtargetPredicate = HasDot2Insts in {
 
-defm V_DOT2_F32_F16 : VOP3P_Real_gfx10 <0x13>;
 defm V_DOT2_I32_I16 : VOP3P_Real_gfx10 <0x14>;
 defm V_DOT2_U32_U16 : VOP3P_Real_gfx10 <0x15>;
+
+} // End SubtargetPredicate = HasDot2Insts
+
+let SubtargetPredicate = HasDot7Insts in {
+
+defm V_DOT2_F32_F16 : VOP3P_Real_gfx10 <0x13>;
 defm V_DOT4_U32_U8  : VOP3P_Real_gfx10 <0x17>;
 defm V_DOT8_U32_U4  : VOP3P_Real_gfx10 <0x19>;
 
-} // End SubtargetPredicate = HasDot2Insts
+} // End SubtargetPredicate = HasDot7Insts
 
 let SubtargetPredicate = HasDot1Insts in {
 
Index: llvm/lib/Target/AMDGPU/SIISelLowering.cpp
===
--- llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -10486,7 +10486,7 @@
   EVT VT = N->getValueType(0);
   SDLoc SL(N);
 
-  if (!Subtarget->hasDot2Insts() || VT != MVT::f32)
+  if (!Subtarget->hasDot7Insts() || VT != MVT::f32)
 return SDValue();
 
   // FMA((F32)S0.x, (F32)S1. x, FMA((F32)S0.y, (F32)S1.y, (F32)z)) ->
Index: llvm/lib/Target/AMDGPU/GCNSubtarget.h
===
--- llvm/lib/Target/AMDGPU/GCNSubtarget.h
+++ llvm/lib/Target/AMDGPU/GCNSubtarget.h
@@ -150,6 +150,7 @@
   bool HasDot4Insts;
   bool HasDot5Insts;
   bool HasDot6Insts;
+  bool HasDot7Insts;
   bool HasMAIInsts;
   bool HasPkFmacF16Inst;
   bool HasAtomicFaddInsts;
@@ -687,6 +688,10 @@
 return HasDot6Insts;
   }
 
+  bool hasDot7Insts() const {
+return HasDot7Insts;
+  }
+
   bool hasMAIInsts() const {
 return HasMAIInsts;
   }
Index: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
===
--- llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
+++ llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
@@ -267,6 +267,7 @@
 HasDot4Insts(false),
 HasDot5Insts(false),
 HasDot6Insts(false),
+HasDot7Insts(false),
 HasMAIInsts(false),
 HasPkFmacF16Inst(false),
 HasAtomicFaddInsts(false),
Index: llvm/lib/Target/AMDGPU/AMDGPU.td
===
--- llvm/lib/Target/AMDGPU/AMDGPU.td
+++ llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -480,7 +480,7 @@
 def FeatureDot2Insts : SubtargetFeature<"dot2-insts",
   "HasDot2Insts",
   "true",
-  "Has v_dot2_f32_f16, v_dot2_i32_i16, v_dot2_u32_u16, v_dot4_u32_u8, v_dot8_u32_u4 instructions"
+  "Has v_dot2_i32_i16, v_dot2_u32_u16 instructions"
 >;
 
 def FeatureDot3Insts : SubtargetFeature<"dot3-insts",
@@ -507,6

[PATCH] D98717: [AMDGPU] Split dot2-insts feature

2021-03-16 Thread Jay Foad via Phabricator via cfe-commits

foad added inline comments.



Comment at: llvm/lib/Target/AMDGPU/AMDGPU.td:511
+def FeatureDot7Insts : SubtargetFeature<"dot7-insts",
+  "HasDot7Insts",
+  "true",

arsenm wrote:
> I'm not sure where the "7" is coming from
It's the next number after 6. I don't think any of the dotN-insts numbers have 
any real meaning.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98717/new/

https://reviews.llvm.org/D98717

[PATCH] D98717: [AMDGPU] Split dot2-insts feature

2021-03-16 Thread Jay Foad via Phabricator via cfe-commits

foad created this revision.
foad added reviewers: rampitec, kzhuravl, b-sumner.
Herald added subscribers: kerbowa, hiraditya, t-tye, tpr, dstuttard, yaxunl, 
nhaehnle, jvesely, arsenm.
foad requested review of this revision.
Herald added subscribers: llvm-commits, cfe-commits, wdng.
Herald added projects: clang, LLVM.

Split out some of the instructions predicated on the dot2-insts target
feature into a new dot7-insts, in preparation for subtargets that have
some but not all of these instructions. NFCI.
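A minimal C++ sketch of what the split means for instruction availability. It assumes, as "NFCI" implies, that every existing subtarget that had dot2-insts now gets dot7-insts as well. The helper names and the string-based feature set are hypothetical, not the AMDGPU backend's API. The instruction-to-feature mapping follows the VOP3PInstructions.td hunks in this patch.

```cpp
#include <cassert>
#include <set>
#include <string>

// Feature guarding each instruction after the split, following the
// SubtargetPredicate blocks in VOP3PInstructions.td (hypothetical helper).
const char *requiredFeature(const std::string &Inst) {
  if (Inst == "v_dot2_i32_i16" || Inst == "v_dot2_u32_u16")
    return "dot2-insts";
  if (Inst == "v_dot2_f32_f16" || Inst == "v_dot4_u32_u8" ||
      Inst == "v_dot8_u32_u4")
    return "dot7-insts";
  return nullptr; // other instructions are not modelled here
}

// An instruction is usable when the subtarget has the feature guarding it.
bool isAvailable(const std::set<std::string> &Features,
                 const std::string &Inst) {
  const char *Feature = requiredFeature(Inst);
  return Feature != nullptr && Features.count(Feature) != 0;
}
```

A subtarget with both features sees no change. A future subtarget with only dot7-insts gets v_dot2_f32_f16, v_dot4_u32_u8 and v_dot8_u32_u4, but not the integer v_dot2 variants.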


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D98717

Files:
  clang/include/clang/Basic/BuiltinsAMDGPU.def
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/test/CodeGenOpenCL/amdgpu-features.cl
  clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl
  llvm/lib/Target/AMDGPU/AMDGPU.td
  llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
  llvm/lib/Target/AMDGPU/GCNSubtarget.h
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/lib/Target/AMDGPU/VOP3PInstructions.td


[PATCH] D97928: [AMDGPU] Restore the s_memtime instruction in gfx1030

2021-03-05 Thread Jay Foad via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGfc28f600e558: [AMDGPU] Restore the s_memtime instruction in 
gfx1030 (authored by foad).
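A toy C++ model of how the two readcyclecounter patterns in SMInstructions.td compete after this change. It assumes a gfx1030-like subtarget that has both s_memtime and the SHADER_CYCLES register. The struct and function names are hypothetical. The only point illustrated is that, among patterns whose predicates hold, the one with the higher AddedComplexity is preferred.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical subtarget flags mirroring HasSMemTimeInst and
// HasShaderCyclesRegister.
struct Subtarget {
  bool HasSMemTimeInst;
  bool HasShaderCyclesRegister;
};

// A candidate selection: its result, the predicate it needs, and its
// complexity (the patch gives the SHADER_CYCLES pattern AddedComplexity = 1).
struct Pattern {
  std::string Result;
  bool (*Predicate)(const Subtarget &);
  int Complexity;
};

std::string selectReadCycleCounter(const Subtarget &ST) {
  const std::vector<Pattern> Patterns = {
      {"s_memtime", [](const Subtarget &S) { return S.HasSMemTimeInst; }, 0},
      {"s_getreg_b32 SHADER_CYCLES",
       [](const Subtarget &S) { return S.HasShaderCyclesRegister; }, 1},
  };
  // Among the patterns whose predicates hold, keep the most complex one.
  const Pattern *Best = nullptr;
  for (const Pattern &P : Patterns)
    if (P.Predicate(ST) && (!Best || P.Complexity > Best->Complexity))
      Best = &P;
  return Best ? Best->Result : "<no pattern>";
}
```

With both features present the s_getreg form wins, matching the "lower and more predictable latency" comment in the patch. A subtarget with only s_memtime still selects s_memtime.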

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97928/new/

https://reviews.llvm.org/D97928

Files:
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/test/CodeGenOpenCL/amdgpu-features.cl
  clang/test/SemaOpenCL/builtins-amdgcn-error-gfx1030.cl
  llvm/lib/Target/AMDGPU/AMDGPU.td
  llvm/lib/Target/AMDGPU/GCNSubtarget.h
  llvm/lib/Target/AMDGPU/SMInstructions.td
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.memtime.ll
  llvm/test/MC/AMDGPU/gfx1030_err.s

Index: llvm/test/MC/AMDGPU/gfx1030_err.s
===
--- llvm/test/MC/AMDGPU/gfx1030_err.s
+++ llvm/test/MC/AMDGPU/gfx1030_err.s
@@ -21,9 +21,6 @@
 s_get_waveid_in_workgroup s0
 // GFX10: :[[@LINE-1]]:{{[0-9]+}}: error: instruction not supported on this GPU
 
-s_memtime s[0:1]
-// GFX10: :[[@LINE-1]]:{{[0-9]+}}: error: instruction not supported on this GPU
-
 s_getreg_b32 s2, hwreg(HW_REG_XNACK_MASK)
 // GFX10: :[[@LINE-1]]:{{[0-9]+}}: error: specified hardware register is not supported on this GPU
 
Index: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.memtime.ll
===
--- llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.memtime.ll
+++ llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.memtime.ll
@@ -1,7 +1,7 @@
 ; RUN: llc -march=amdgcn -mcpu=tahiti -verify-machineinstrs < %s | FileCheck --check-prefixes=SIVI,GCN %s
 ; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck --check-prefixes=SIVI,GCN %s
 ; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
-; RUN: not --crash llc -march=amdgcn -mcpu=gfx1030 -mattr=-flat-for-global -verify-machineinstrs < %s 2>&1 | FileCheck -check-prefix=GFX1030-ERR %s
+; RUN: llc -march=amdgcn -mcpu=gfx1030 -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
 
 declare i64 @llvm.amdgcn.s.memtime() #0
 
@@ -13,7 +13,6 @@
 ; SIVI-NOT: lgkmcnt
 ; GCN: s_memtime s{{\[[0-9]+:[0-9]+\]}}
 ; GCN: {{buffer|global}}_store_dwordx2
-; GFX1030-ERR: ERROR
 define amdgpu_kernel void @test_s_memtime(i64 addrspace(1)* %out) #0 {
   %cycle0 = call i64 @llvm.amdgcn.s.memtime()
   store volatile i64 %cycle0, i64 addrspace(1)* %out
Index: llvm/lib/Target/AMDGPU/SMInstructions.td
===
--- llvm/lib/Target/AMDGPU/SMInstructions.td
+++ llvm/lib/Target/AMDGPU/SMInstructions.td
@@ -866,14 +866,16 @@
 >;
 } // let OtherPredicates = [HasSMemTimeInst]
 
-let OtherPredicates = [HasNoSMemTimeInst] in {
+let OtherPredicates = [HasShaderCyclesRegister] in {
 def : GCNPat <
   (i64 (readcyclecounter)),
   (REG_SEQUENCE SReg_64,
 (S_GETREG_B32 getHwRegImm.ret), sub0,
-(S_MOV_B32 (i32 0)), sub1)
->;
-} // let OtherPredicates = [HasNoSMemTimeInst]
+(S_MOV_B32 (i32 0)), sub1)> {
+  // Prefer this to s_memtime because it has lower and more predictable latency.
+  let AddedComplexity = 1;
+}
+} // let OtherPredicates = [HasShaderCyclesRegister]
 
 //===--===//
 // GFX10.
Index: llvm/lib/Target/AMDGPU/GCNSubtarget.h
===
--- llvm/lib/Target/AMDGPU/GCNSubtarget.h
+++ llvm/lib/Target/AMDGPU/GCNSubtarget.h
@@ -163,6 +163,7 @@
   bool HasVscnt;
   bool HasGetWaveIdInst;
   bool HasSMemTimeInst;
+  bool HasShaderCyclesRegister;
   bool HasRegisterBanking;
   bool HasVOP3Literal;
   bool HasNoDataDepHazard;
@@ -714,6 +715,10 @@
 return HasSMemTimeInst;
   }
 
+  bool hasShaderCyclesRegister() const {
+return HasShaderCyclesRegister;
+  }
+
   bool hasRegisterBanking() const {
 return HasRegisterBanking;
   }
Index: llvm/lib/Target/AMDGPU/AMDGPU.td
===
--- llvm/lib/Target/AMDGPU/AMDGPU.td
+++ llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -563,6 +563,12 @@
   "Has s_memtime instruction"
 >;
 
+def FeatureShaderCyclesRegister : SubtargetFeature<"shader-cycles-register",
+  "HasShaderCyclesRegister",
+  "true",
+  "Has SHADER_CYCLES hardware register"
+>;
+
 def FeatureMadMacF32Insts : SubtargetFeature<"mad-mac-f32-insts",
   "HasMadMacF32Insts",
   "true",
@@ -777,7 +783,7 @@
FeatureNoSdstCMPX, FeatureVscnt, FeatureRegisterBanking,
FeatureVOP3Literal, FeatureDPP8, FeatureExtendedImageInsts,
FeatureNoDataDepHazard, FeaturePkFmacF16Inst,
-   FeatureGFX10A16, FeatureFastDenormalF32, FeatureG16,
+   FeatureGFX10A16, FeatureSMemTimeInst, FeatureFastDenormalF32, FeatureG16,
FeatureUnalignedBufferAccess, FeatureUnalignedDSAccess
   ]
 >;
@@ -988,7 +994,6 @@
  FeatureScalarAtomics,

[PATCH] D97928: [AMDGPU] Restore the s_memtime instruction in gfx1030

2021-03-05 Thread Jay Foad via Phabricator via cfe-commits

foad updated this revision to Diff 328429.
foad added a comment.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Add clang changes.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97928/new/

https://reviews.llvm.org/D97928

Files:
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/test/CodeGenOpenCL/amdgpu-features.cl
  clang/test/SemaOpenCL/builtins-amdgcn-error-gfx1030.cl
  llvm/lib/Target/AMDGPU/AMDGPU.td
  llvm/lib/Target/AMDGPU/GCNSubtarget.h
  llvm/lib/Target/AMDGPU/SMInstructions.td
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.memtime.ll
  llvm/test/MC/AMDGPU/gfx1030_err.s

Index: llvm/test/MC/AMDGPU/gfx1030_err.s
===
--- llvm/test/MC/AMDGPU/gfx1030_err.s
+++ llvm/test/MC/AMDGPU/gfx1030_err.s
@@ -21,9 +21,6 @@
 s_get_waveid_in_workgroup s0
 // GFX10: :[[@LINE-1]]:{{[0-9]+}}: error: instruction not supported on this GPU
 
-s_memtime s[0:1]
-// GFX10: :[[@LINE-1]]:{{[0-9]+}}: error: instruction not supported on this GPU
-
 s_getreg_b32 s2, hwreg(HW_REG_XNACK_MASK)
 // GFX10: :[[@LINE-1]]:{{[0-9]+}}: error: specified hardware register is not supported on this GPU
 
Index: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.memtime.ll
===
--- llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.memtime.ll
+++ llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.memtime.ll
@@ -1,7 +1,7 @@
 ; RUN: llc -march=amdgcn -mcpu=tahiti -verify-machineinstrs < %s | FileCheck --check-prefixes=SIVI,GCN %s
 ; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck --check-prefixes=SIVI,GCN %s
 ; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
-; RUN: not --crash llc -march=amdgcn -mcpu=gfx1030 -mattr=-flat-for-global -verify-machineinstrs < %s 2>&1 | FileCheck -check-prefix=GFX1030-ERR %s
+; RUN: llc -march=amdgcn -mcpu=gfx1030 -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
 
 declare i64 @llvm.amdgcn.s.memtime() #0
 
@@ -13,7 +13,6 @@
 ; SIVI-NOT: lgkmcnt
 ; GCN: s_memtime s{{\[[0-9]+:[0-9]+\]}}
 ; GCN: {{buffer|global}}_store_dwordx2
-; GFX1030-ERR: ERROR
 define amdgpu_kernel void @test_s_memtime(i64 addrspace(1)* %out) #0 {
   %cycle0 = call i64 @llvm.amdgcn.s.memtime()
   store volatile i64 %cycle0, i64 addrspace(1)* %out
Index: llvm/lib/Target/AMDGPU/SMInstructions.td
===
--- llvm/lib/Target/AMDGPU/SMInstructions.td
+++ llvm/lib/Target/AMDGPU/SMInstructions.td
@@ -866,14 +866,16 @@
 >;
 } // let OtherPredicates = [HasSMemTimeInst]
 
-let OtherPredicates = [HasNoSMemTimeInst] in {
+let OtherPredicates = [HasShaderCyclesRegister] in {
 def : GCNPat <
   (i64 (readcyclecounter)),
   (REG_SEQUENCE SReg_64,
 (S_GETREG_B32 getHwRegImm.ret), sub0,
-(S_MOV_B32 (i32 0)), sub1)
->;
-} // let OtherPredicates = [HasNoSMemTimeInst]
+(S_MOV_B32 (i32 0)), sub1)> {
+  // Prefer this to s_memtime because it has lower and more predictable latency.
+  let AddedComplexity = 1;
+}
+} // let OtherPredicates = [HasShaderCyclesRegister]
 
 //===--===//
 // GFX10.
Index: llvm/lib/Target/AMDGPU/GCNSubtarget.h
===
--- llvm/lib/Target/AMDGPU/GCNSubtarget.h
+++ llvm/lib/Target/AMDGPU/GCNSubtarget.h
@@ -163,6 +163,7 @@
   bool HasVscnt;
   bool HasGetWaveIdInst;
   bool HasSMemTimeInst;
+  bool HasShaderCyclesRegister;
   bool HasRegisterBanking;
   bool HasVOP3Literal;
   bool HasNoDataDepHazard;
@@ -714,6 +715,10 @@
 return HasSMemTimeInst;
   }
 
+  bool hasShaderCyclesRegister() const {
+return HasShaderCyclesRegister;
+  }
+
   bool hasRegisterBanking() const {
 return HasRegisterBanking;
   }
Index: llvm/lib/Target/AMDGPU/AMDGPU.td
===
--- llvm/lib/Target/AMDGPU/AMDGPU.td
+++ llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -563,6 +563,12 @@
   "Has s_memtime instruction"
 >;
 
+def FeatureShaderCyclesRegister : SubtargetFeature<"shader-cycles-register",
+  "HasShaderCyclesRegister",
+  "true",
+  "Has SHADER_CYCLES hardware register"
+>;
+
 def FeatureMadMacF32Insts : SubtargetFeature<"mad-mac-f32-insts",
   "HasMadMacF32Insts",
   "true",
@@ -777,7 +783,7 @@
FeatureNoSdstCMPX, FeatureVscnt, FeatureRegisterBanking,
FeatureVOP3Literal, FeatureDPP8, FeatureExtendedImageInsts,
FeatureNoDataDepHazard, FeaturePkFmacF16Inst,
-   FeatureGFX10A16, FeatureFastDenormalF32, FeatureG16,
+   FeatureGFX10A16, FeatureSMemTimeInst, FeatureFastDenormalF32, FeatureG16,
FeatureUnalignedBufferAccess, FeatureUnalignedDSAccess
   ]
 >;
@@ -988,7 +994,6 @@
  FeatureScalarAtomics,
  FeatureScalarFlatScratchInsts,
  FeatureGetWaveIdInst,
- FeatureSMemTimeInst,

[PATCH] D61112: AMDGPU: Enable _Float16

2021-02-24 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.
Herald added a subscriber: kerbowa.

Should have updated 
https://clang.llvm.org/docs/LanguageExtensions.html#half-precision-floating-point
 "_Float16 is currently only supported on the following targets, with further 
targets pending ABI standardization: ..."


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61112/new/

https://reviews.llvm.org/D61112

[PATCH] D96906: [AMDGPU] gfx90a support

2021-02-19 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

In D96906#2573265, @echristo wrote:

> In D96906#2572842, @msearles wrote:
>
>> In D96906#2572749, @kzhuravl wrote:
>>
>>>> The point is that nobody upstream even got a chance to chime in.
>>>
>>> We are and will be taking care of any feedback provided in this review 
>>> post-commit.
>>
>> To be fair to @rampitec , it was not his desire to push this up in 1 big 
>> patch. We needed this upstreamed and no time was given to him to break it up 
>> into reasonably sized pieces. If it appears to be his doing/his intent, 
>> well, it should not. There have been a couple comments; I believe most 
>> addressed; comments will continue to be addressed.
>
> "we needed this upstream" is a business issue on AMD's side, not an issue for 
> the llvm project. In general the expectation is that code is reviewed 
> according to the guidelines and a single reviewer with one (small) patch that 
> wasn't a revert doesn't feel like sufficient review for something of this 
> size. For something this size I'd have expected Matt to at least be on the 
> reviewer line and that also wasn't done. This feels like an abuse of the 
> review system and probably should be reverted.
>
> Thanks.
>
> -eric

I'd appreciate it if you could find a solution that does not involve reverting 
and reapplying later, as this will triple the amount of churn we get 
downstream. (I realise LLVM policy is not to care about downstream but I 
thought I'd plead my case anyway!)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96906/new/

https://reviews.llvm.org/D96906

[PATCH] D96906: [AMDGPU] gfx90a support

2021-02-18 Thread Jay Foad via Phabricator via cfe-commits

foad added inline comments.



Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.workitem.id.ll:23
+
+; CO-V3: .amdhsa_system_vgpr_workitem_id 0
+; PACKED-TID: .amdhsa_system_vgpr_workitem_id 0

CO-V3 isn't tested by any RUN line. I think FileCheck might complain about this 
in future.



Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.workitem.id.ll:41
+
+; UNPACKED-TID-NOT: v1
+; UNPACKED-TID: {{buffer|flat}}_store_dword {{.*}}v1

UNPACKED-TID isn't tested by any RUN line.
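A rough C++ sketch of the check being suggested: collect the prefixes that RUN lines enable, then report directive prefixes that no RUN line enables. This is a hypothetical standalone lint, not FileCheck's own implementation. It only understands `;` comments, `-check-prefix=`/`--check-prefixes=` flags and a few directive suffixes, and it ignores FileCheck's default CHECK prefix.

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <sstream>
#include <string>

// Returns prefixes used by check directives but never enabled on a RUN line.
std::set<std::string> unusedCheckPrefixes(const std::string &Test) {
  static const std::regex Flag(R"(--?check-prefix(?:es)?=([A-Za-z0-9_,-]+))");
  static const std::regex Directive(
      R"(^;\s*([A-Za-z0-9_-]+?)(?:-NOT|-NEXT|-SAME|-DAG|-LABEL)?:)");
  std::set<std::string> Enabled, Used;
  std::istringstream In(Test);
  std::string Line;
  while (std::getline(In, Line)) {
    std::smatch M;
    if (Line.find("RUN:") != std::string::npos) {
      // A RUN line may enable several comma-separated prefixes.
      for (std::sregex_iterator It(Line.begin(), Line.end(), Flag), End;
           It != End; ++It) {
        std::stringstream List((*It)[1].str());
        std::string Prefix;
        while (std::getline(List, Prefix, ','))
          Enabled.insert(Prefix);
      }
    } else if (std::regex_search(Line, M, Directive)) {
      Used.insert(M[1].str());
    }
  }
  std::set<std::string> Unused;
  for (const std::string &P : Used)
    if (!Enabled.count(P))
      Unused.insert(P);
  return Unused;
}
```

Run over a reduced version of this test, it would flag CO-V3 and UNPACKED-TID while accepting PACKED-TID.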


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96906/new/

https://reviews.llvm.org/D96906

[PATCH] D90809: [amdgpu] Add `llvm.amdgcn.endpgm` support.

2020-11-06 Thread Jay Foad via Phabricator via cfe-commits

foad added inline comments.



Comment at: llvm/include/llvm/IR/IntrinsicsAMDGPU.td:1580
 
+def int_amdgcn_endpgm : GCCBuiltin<"__builtin_amdgcn_endpgm">,
+  Intrinsic<[], [], [IntrNoReturn, IntrCold, IntrNoMem, IntrHasSideEffects]

The intrinsic def needs a comment. Is it supposed to literally just generate an 
s_endpgm instruction, or is it something more high-level?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90809/new/

https://reviews.llvm.org/D90809

[PATCH] D90419: [AMDGPU] Add gfx90c target

2020-10-30 Thread Jay Foad via Phabricator via cfe-commits

foad added inline comments.



Comment at: llvm/docs/AMDGPUUsage.rst:261
+ ``gfx90c``  ``amdgcn``   APU   - xnack
   *TBA*
+  [off]
+   
   .. TODO::

Is xnack really supposed to be off for this target?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90419/new/

https://reviews.llvm.org/D90419

[PATCH] D90447: [AMDGPU] Add gfx1033 target

2020-10-30 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

Don't you need to update lib/Object/ELFObjectFile.cpp and 
test/Object/AMDGPU/elf-header-flags-mach.yaml?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90447/new/

https://reviews.llvm.org/D90447

[PATCH] D89487: [AMDGPU] gfx1032 target

2020-10-16 Thread Jay Foad via Phabricator via cfe-commits

foad added inline comments.



Comment at: llvm/docs/AMDGPUUsage.rst:280

  names.
+ ``gfx1032`` ``amdgcn``   dGPU  - xnack   
*TBA*
+  [off]

xnack looks like a mistake here?



Comment at: llvm/lib/Support/TargetParser.cpp:66
 // Don't bother listing the implicitly true features
-constexpr GPUInfo AMDGCNGPUs[43] = {
+constexpr GPUInfo AMDGCNGPUs[44] = {
   // Name CanonicalKindFeatures

Use `[]` so we don't have to keep updating the number?
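A sketch of the suggestion. A hypothetical two-field GPUInfo stands in for the real struct. The idea is to leave the bound empty and let the compiler count the initializers. Note that std::size needs C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical two-field stand-in for TargetParser's GPUInfo.
struct GPUInfo {
  const char *Name;
  unsigned Features;
};

// No explicit bound: the array size is deduced from the initializer list,
// so adding an entry does not require editing a count.
constexpr GPUInfo AMDGCNGPUs[] = {
    {"gfx1010", 0},
    {"gfx1030", 0},
    {"gfx1032", 0},
};

// Where a count is still needed, derive it instead of hard-coding it.
constexpr std::size_t NumAMDGCNGPUs = std::size(AMDGCNGPUs);
static_assert(NumAMDGCNGPUs == 3, "bound deduced from the initializer");
```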


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89487/new/

https://reviews.llvm.org/D89487

[PATCH] D81728: [InstCombine] Add target-specific inst combining

2020-07-17 Thread Jay Foad via Phabricator via cfe-commits

foad added inline comments.



Comment at: llvm/include/llvm/Analysis/TargetTransformInfo.h:552-555
+  /// \returns false to not do anything target specific or true to return the
+  /// value in \p ResultI from the InstCombiner. It is possible to return null
+  /// and stop further processing of the intrinsic by writing nullptr into
+  /// \p ResultI and returning true.

Did you consider returning `std::pair`?
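A hypothetical side-by-side comparison of the two interface shapes. It uses a stand-in Instruction type and made-up function names. The patch's actual hook and the elided std::pair template arguments are not reproduced here. The pair keeps the "handled" flag and the result together, instead of splitting them between the return value and an out-parameter.

```cpp
#include <cassert>
#include <utility>

struct Instruction {}; // stand-in for llvm::Instruction

// What the target hook decides to do (hypothetical).
enum class Outcome { NotHandled, Replace, Stop };

// Shape described in the doc comment: the bool says whether the target
// handled the intrinsic; the result travels through an out-parameter, and
// nullptr there means "handled, stop further processing".
bool combineOutParam(Outcome O, Instruction *Replacement,
                     Instruction *&ResultI) {
  if (O == Outcome::NotHandled)
    return false;
  ResultI = O == Outcome::Replace ? Replacement : nullptr;
  return true;
}

// Suggested alternative: return the flag and the result as one value.
std::pair<bool, Instruction *> combinePair(Outcome O,
                                           Instruction *Replacement) {
  if (O == Outcome::NotHandled)
    return {false, nullptr};
  return {true, O == Outcome::Replace ? Replacement : nullptr};
}
```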


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81728/new/

https://reviews.llvm.org/D81728



[PATCH] D81728: [InstCombine] Add target-specific inst combining

2020-06-30 Thread Jay Foad via Phabricator via cfe-commits

foad added a subscriber: bogner.
foad added inline comments.



Comment at: llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp:1444
+  *this, *II, DemandedElts, UndefElts, UndefElts2, UndefElts3,
+  simplifyAndSetOp, ))
+return V;

efriedma wrote:
> Is there some way we can check that an intrinsic is actually target-specific, 
> to discourage people from handling generic intrinsics in target-specific ways?
That was the intent of @bogner's rG92a8c6112c6571112e8b622bfddc7e4d1685a6fe.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81728/new/

https://reviews.llvm.org/D81728



[PATCH] D82085: [TRE] markTails marks call sites as tailcalls though some of them are not.

2020-06-18 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

> markTails function set IsTailcall bit for functions which are not
>  last calls:

It's OK to set "tail" on any call that satisfies these requirements (from 
https://llvm.org/docs/LangRef.html#call-instruction): "Both markers [tail and 
musttail] imply that the callee does not access allocas from the caller. The 
tail marker additionally implies that the callee does not access varargs from 
the caller."

"tail" does not mean that the call *must* be generated as a tail call. It just 
means that it's safe to generate it as a tail call if it turns out to be 
possible (e.g. if the compiler can prove that @noarg doesn't return, or if it 
can prove that all the code after the call to @noarg has no effect, and so on).

So I don't think there is a bug here that needs to be fixed.
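A toy C++ model of the distinction. It assumes the simplest possible rule for tail position: the call is immediately followed by a ret that returns void or the call's result. Real code generation checks many more conditions. The point is only that the marker grants permission and is not acted on where the call is not in tail position.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction: just enough to ask whether a call is in tail position.
struct Inst {
  std::string Op;        // "call" or "ret"
  bool TailMarker;       // IR-level "tail" on a call
  bool RetOfCallResult;  // on a "ret": returns void or the call's result
};

// Marks which instructions would be emitted as real tail calls under the
// toy rule: the call must carry the marker AND be immediately followed by
// a suitable ret. A marked call anywhere else is still valid; the marker
// is simply not used.
std::vector<bool> emittedAsTailCall(const std::vector<Inst> &Block) {
  std::vector<bool> Result(Block.size(), false);
  for (std::size_t I = 0; I + 1 < Block.size(); ++I)
    Result[I] = Block[I].Op == "call" && Block[I].TailMarker &&
                Block[I + 1].Op == "ret" && Block[I + 1].RetOfCallResult;
  return Result;
}
```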


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82085/new/

https://reviews.llvm.org/D82085



[PATCH] D81886: [AMDGPU] Add gfx1030 target

2020-06-17 Thread Jay Foad via Phabricator via cfe-commits

foad added inline comments.



Comment at: llvm/docs/AMDGPUUsage.rst:266-267

  names.
+ ``gfx1030`` ``amdgcn``   dGPU  - xnack   
*TBA*
+  [off]
+- wavefrontsize64

Seems odd to list xnack as a "supported feature" when the hardware doesn't 
support it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81886/new/

https://reviews.llvm.org/D81886



[PATCH] D71213: [Alignment][NFC] CreateMemSet use MaybeAlign

2019-12-11 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

In D71213#1780088, @gchatelet wrote:

> In D71213#1779841, @foad wrote:
>
> > @gchatelet in general would it be possible to make changes like this in a 
> > backwards-compatible way, or in two stages without a "flag day" change? We 
> > have out-of-tree users of CreateMemSet and it's awkward to change them all 
> > at exactly the same time as we merge in this change from upstream llvm, and 
> > we have had the same problem with other MaybeAlign changes recently. I 
> > realise that LLVM doesn't make any official promises about API stability.
>
>
> Thx for letting me know @foad . I'll make sure to keep the old API with a 
> deprecation message from now on.
>  Do you have any other suggestions on how to make this less painful for 
> out-of-tree users? I'm afraid that the cleanup phase (removal of deprecated 
> function) will be disruptive as well.


Removing deprecated functions is generally OK, as long as it happens *after* 
the preferred function is introduced, so we have time to switch over.

In the specific case of functions taking `Align` instead of `unsigned`, perhaps 
you could start off allowing implicit conversion from `unsigned` to `Align` and 
then remove it later, when all callers have been updated? Or perhaps it's too 
late for that now.
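A minimal sketch of the implicit-conversion approach. It uses a hypothetical ToyMaybeAlign type and createMemSet function rather than LLVM's real Align/MaybeAlign or IRBuilder API. While the converting constructor is implicit, old callers that pass an unsigned keep compiling unchanged.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical alignment wrapper; 0 means "alignment unknown".
class ToyMaybeAlign {
public:
  ToyMaybeAlign() = default;
  // Stage 1: implicit on purpose, so callers still passing an unsigned
  // compile unchanged. Stage 2 (after callers migrate): make it explicit.
  ToyMaybeAlign(unsigned B) : Bytes(B) {}
  std::uint64_t valueOrOne() const { return Bytes ? Bytes : 1; }

private:
  std::uint64_t Bytes = 0;
};

// Hypothetical API that has already switched to the new type; it just
// reports the alignment it would use.
std::uint64_t createMemSet(std::uint64_t Size, ToyMaybeAlign Align) {
  (void)Size;
  return Align.valueOrOne();
}
```

Once out-of-tree callers pass ToyMaybeAlign explicitly, the constructor can be marked explicit. Any remaining unsigned call sites then become compile errors, which is the "remove it later" step.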


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71213/new/

https://reviews.llvm.org/D71213



[PATCH] D71213: [Alignment][NFC] CreateMemSet use MaybeAlign

2019-12-11 Thread Jay Foad via Phabricator via cfe-commits

foad added a comment.

@gchatelet in general would it be possible to make changes like this in a 
backwards-compatible way, or in two stages without a "flag day" change? We have 
out-of-tree users of CreateMemSet and it's awkward to change them all at 
exactly the same time as we merge in this change from upstream llvm, and we 
have had the same problem with other MaybeAlign changes recently. I realise 
that LLVM doesn't make any official promises about API stability.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71213/new/

https://reviews.llvm.org/D71213


