[PATCH] D155213: [HIP] Add `-fno-hip-uniform-block`

2023-07-20 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl marked 2 inline comments as done.
yaxunl added inline comments.



Comment at: clang/include/clang/Driver/Options.td:1092
   ShouldParseIf;
+defm hip_uniform_block : BoolFOption<"hip-uniform-block",
+  LangOpts<"HIPUniformBlock">, DefaultTrue,

arsenm wrote:
> scchan wrote:
> > yaxunl wrote:
> > > arsenm wrote:
> > > > Can we avoid adding yet another language flag for something that's 
> > > > reusable for everything? Is there an --offload- ?
> > > Currently, the naming convention for shared CUDA/HIP language options is 
> > > `-fgpu-*` or `--gpu-* . The shared CUDA/HIP/OpenMP driver options are 
> > > named `--offload-*`.
> > > 
> > > This option is named `-fhip-uniform-block` because AFAIK CUDA does not 
> > > support non-uniform block size.
> > > 
> > > If we want to make it a generic option, it should be named as 
> > > `-fgpu-uniform-block` by the current naming convention. Unless we want to 
> > > change the naming convention for generic offloading language options.
> > > 
> > > @tra What do you think? Thanks.
> > Don't we need a different default value for some languages like OpenCL?
> Yes, but opencl already has a spec'd flag for this. If we're making up a new 
> one, it could be something generic that aliases the opencl one in that case. 
> Plus the +/- value of a new flag should work (the CL one only goes in one 
> direction)
I am thinking, maybe it is time to start moving towards the final direction.

How about renaming it as `--offload-uniform-block` ?

@MaskRay @tra 



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155213/new/

https://reviews.llvm.org/D155213

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155213: [HIP] Add `-fno-hip-uniform-block`

2023-07-14 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl marked an inline comment as done.
yaxunl added inline comments.



Comment at: clang/include/clang/Driver/Options.td:1092
   ShouldParseIf;
+defm hip_uniform_block : BoolFOption<"hip-uniform-block",
+  LangOpts<"HIPUniformBlock">, DefaultTrue,

arsenm wrote:
> Can we avoid adding yet another language flag for something that's reusable 
> for everything? Is there an --offload- ?
Currently, the naming convention for shared CUDA/HIP language options is 
`-fgpu-*` or `--gpu-* . The shared CUDA/HIP/OpenMP driver options are named 
`--offload-*`.

This option is named `-fhip-uniform-block` because AFAIK CUDA does not support 
non-uniform block size.

If we want to make it a generic option, it should be named as 
`-fgpu-uniform-block` by the current naming convention. Unless we want to 
change the naming convention for generic offloading language options.

@tra What do you think? Thanks.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155213/new/

https://reviews.llvm.org/D155213

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155213: [HIP] Add `-fno-hip-uniform-block`

2023-07-14 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/include/clang/Driver/Options.td:1092
   ShouldParseIf;
+defm hip_uniform_block : BoolFOption<"hip-uniform-block",
+  LangOpts<"HIPUniformBlock">, DefaultTrue,

scchan wrote:
> arsenm wrote:
> > Can we avoid adding yet another language flag for something that's reusable 
> > for everything? Is there an --offload- ?
> Don't we need a different default value for some languages like OpenCL?
Yes, but opencl already has a spec'd flag for this. If we're making up a new 
one, it could be something generic that aliases the opencl one in that case. 
Plus the +/- value of a new flag should work (the CL one only goes in one 
direction)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155213/new/

https://reviews.llvm.org/D155213

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155213: [HIP] Add `-fno-hip-uniform-block`

2023-07-14 Thread Siu Chi Chan via Phabricator via cfe-commits
scchan added inline comments.



Comment at: clang/include/clang/Driver/Options.td:1092
   ShouldParseIf;
+defm hip_uniform_block : BoolFOption<"hip-uniform-block",
+  LangOpts<"HIPUniformBlock">, DefaultTrue,

arsenm wrote:
> Can we avoid adding yet another language flag for something that's reusable 
> for everything? Is there an --offload- ?
Don't we need a different default value for some languages like OpenCL?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155213/new/

https://reviews.llvm.org/D155213

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155213: [HIP] Add `-fno-hip-uniform-block`

2023-07-14 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments.



Comment at: clang/include/clang/Driver/Options.td:1092
   ShouldParseIf;
+defm hip_uniform_block : BoolFOption<"hip-uniform-block",
+  LangOpts<"HIPUniformBlock">, DefaultTrue,

Can we avoid adding yet another language flag for something that's reusable for 
everything? Is there an --offload- ?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155213/new/

https://reviews.llvm.org/D155213

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155213: [HIP] Add `-fno-hip-uniform-block`

2023-07-14 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl updated this revision to Diff 540397.
yaxunl marked an inline comment as done.
yaxunl added a comment.

revised by comments


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155213/new/

https://reviews.llvm.org/D155213

Files:
  clang/include/clang/Basic/LangOptions.def
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/Targets/AMDGPU.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/CodeGenHIP/default-attributes.hip
  clang/test/Driver/hip-options.hip

Index: clang/test/Driver/hip-options.hip
===
--- clang/test/Driver/hip-options.hip
+++ clang/test/Driver/hip-options.hip
@@ -169,3 +169,25 @@
 // RUN: %clang -### -nogpuinc -nogpulib -fhip-fp32-correctly-rounded-divide-sqrt \
 // RUN:   --cuda-gpu-arch=gfx906  %s 2>&1 | FileCheck -check-prefixes=CRDS %s
 // CRDS-NOT: "-f{{(no-)?}}hip-fp32-correctly-rounded-divide-sqrt"
+
+// Check -fno-hip-uniform-block is passed to clang -cc1 but
+// (default) -fhip-uniform-block is not.
+
+// RUN: %clang -### --target=x86_64-unknown-linux-gnu -nogpuinc -nogpulib -fno-hip-uniform-block \
+// RUN:   --cuda-gpu-arch=gfx906 %s 2>&1 | FileCheck -check-prefix=NOUNIBLK %s
+
+// NOUNIBLK: "-cc1"{{.*}} "-triple" "amdgcn-amd-amdhsa" {{.*}} "-fno-hip-uniform-block"
+// NOUNIBLK: "-cc1"{{.*}} "-triple" "x86_64-unknown-linux-gnu" {{.*}} "-fno-hip-uniform-block"
+
+// RUN: %clang -### -nogpuinc -nogpulib -fhip-uniform-block \
+// RUN:   --cuda-gpu-arch=gfx906 %s 2>&1 | FileCheck -check-prefix=UNIBLK %s
+
+// RUN: %clang -### -nogpuinc -nogpulib \
+// RUN:   --cuda-gpu-arch=gfx906 %s 2>&1 | FileCheck -check-prefix=UNIBLK %s
+
+// UNIBLK-NOT: "-f{{(no-)?}}hip-uniform-block"
+
+// Check no warnings for -f[no-]uniform-block.
+
+// RUN: %clang -fdriver-only -Werror --target=x86_64-unknown-linux-gnu -nogpuinc -nogpulib -fno-hip-uniform-block \
+// RUN:   -fhip-uniform-block --cuda-gpu-arch=gfx906 %s 2>&1 | count 0
Index: clang/test/CodeGenHIP/default-attributes.hip
===
--- clang/test/CodeGenHIP/default-attributes.hip
+++ clang/test/CodeGenHIP/default-attributes.hip
@@ -5,6 +5,9 @@
 // RUN: %clang_cc1 -O3 -triple amdgcn-amd-amdhsa -x hip -fno-ident -fcuda-is-device \
 // RUN:-emit-llvm -o - %s | FileCheck -check-prefix=OPT %s
 
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -x hip -fno-ident -fcuda-is-device -fno-hip-uniform-block \
+// RUN:-emit-llvm -o - %s | FileCheck -check-prefix=NOUNIBLK %s
+
 #define __device__ __attribute__((device))
 #define __global__ __attribute__((global))
 
@@ -20,6 +23,12 @@
 // OPT-NEXT:  entry:
 // OPT-NEXT:ret void
 //
+// NOUNIBLK: Function Attrs: convergent mustprogress noinline nounwind optnone
+// NOUNIBLK-LABEL: define {{[^@]+}}@_Z4funcv
+// NOUNIBLK-SAME: () #[[ATTR0:[0-9]+]] {
+// NOUNIBLK-NEXT:  entry:
+// NOUNIBLK-NEXT:ret void
+//
 __device__ void func() {
 
 }
@@ -36,21 +45,34 @@
 // OPT-NEXT:  entry:
 // OPT-NEXT:ret void
 //
+// NOUNIBLK: Function Attrs: convergent mustprogress noinline norecurse nounwind optnone
+// NOUNIBLK-LABEL: define {{[^@]+}}@_Z6kernelv
+// NOUNIBLK-SAME: () #[[ATTR1:[0-9]+]] {
+// NOUNIBLK-NEXT:  entry:
+// NOUNIBLK-NEXT:ret void
+//
 __global__ void kernel() {
 
 }
 //.
-// OPTNONE: attributes #0 = { convergent mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
-// OPTNONE: attributes #1 = { convergent mustprogress noinline norecurse nounwind optnone "amdgpu-flat-work-group-size"="1,1024" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
+// OPTNONE: attributes #[[ATTR0]] = { convergent mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+// OPTNONE: attributes #[[ATTR1]] = { convergent mustprogress noinline norecurse nounwind optnone "amdgpu-flat-work-group-size"="1,1024" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
+//.
+// OPT: attributes #[[ATTR0]] = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+// OPT: attributes #[[ATTR1]] = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "amdgpu-flat-work-group-size"="1,1024" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
+//.
+// NOUNIBLK: attributes #[[ATTR0]] = { convergent mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+// NOUNIBLK: attributes #[[ATTR1]] = { convergent mustprogress noinline norecurse nounwind optnone "amdgpu-flat-work-group-size"="1,1024" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
 //.
-// OPT: attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "no-trapping-math"="true" 

[PATCH] D155213: [HIP] Add `-fno-hip-uniform-block`

2023-07-13 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl marked 2 inline comments as done.
yaxunl added inline comments.



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:7226
+  } else {
+Args.claimAllArgs(options::OPT_fhip_uniform_block,
+  options::OPT_fno_hip_uniform_block);

MaskRay wrote:
> yaxunl wrote:
> > MaskRay wrote:
> > > Why is the -Wunused-command-line-argument warning suppressed in non-IsHIP 
> > > mode?
> > Users may want to add these options to clang config file.
> > 
> > Is there a general rule which options should be claimed?
> Options in a configuration file are automatically claimed.
> 
> I don't know a general rule, but we generally don't claim newly introduced 
> options.
I think I should remove the claimAllArgs for this option. It should behave like 
the usual options when not used.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155213/new/

https://reviews.llvm.org/D155213

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155213: [HIP] Add `-fno-hip-uniform-block`

2023-07-13 Thread Fangrui Song via Phabricator via cfe-commits
MaskRay added inline comments.



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:7226
+  } else {
+Args.claimAllArgs(options::OPT_fhip_uniform_block,
+  options::OPT_fno_hip_uniform_block);

yaxunl wrote:
> MaskRay wrote:
> > Why is the -Wunused-command-line-argument warning suppressed in non-IsHIP 
> > mode?
> Users may want to add these options to clang config file.
> 
> Is there a general rule which options should be claimed?
Options in a configuration file are automatically claimed.

I don't know a general rule, but we generally don't claim newly introduced 
options.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155213/new/

https://reviews.llvm.org/D155213

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155213: [HIP] Add `-fno-hip-uniform-block`

2023-07-13 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added inline comments.



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:7226
+  } else {
+Args.claimAllArgs(options::OPT_fhip_uniform_block,
+  options::OPT_fno_hip_uniform_block);

MaskRay wrote:
> Why is the -Wunused-command-line-argument warning suppressed in non-IsHIP 
> mode?
Users may want to add these options to clang config file.

Is there a general rule which options should be claimed?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155213/new/

https://reviews.llvm.org/D155213

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155213: [HIP] Add `-fno-hip-uniform-block`

2023-07-13 Thread Fangrui Song via Phabricator via cfe-commits
MaskRay added inline comments.



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:7226
+  } else {
+Args.claimAllArgs(options::OPT_fhip_uniform_block,
+  options::OPT_fno_hip_uniform_block);

Why is the -Wunused-command-line-argument warning suppressed in non-IsHIP mode?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155213/new/

https://reviews.llvm.org/D155213

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155213: [HIP] Add `-fno-hip-uniform-block`

2023-07-13 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl created this revision.
yaxunl added reviewers: tra, b-sumner, MaskRay, arsenm, scchan.
Herald added subscribers: jdoerfert, kerbowa, jvesely.
Herald added a project: All.
yaxunl requested review of this revision.
Herald added a subscriber: wdng.

By default, clang assumes HIP kernels are launched with uniform block size,
which is the case for kernels launched through triple chevron or
hipLaunchKernelGGL. Clang adds uniform-work-group-size function attribute
to HIP kernels to allow the backend to do optimizations on that.

However, in some rare cases, HIP kernels can be launched
through hipExtModuleLaunchKernel where global work size is specified,
which may result in non-uniform block size.

To be able to support non-uniform block size for HIP kernels,
an option `-f[no-]hip-uniform-block is added. By default it is on.


https://reviews.llvm.org/D155213

Files:
  clang/include/clang/Basic/LangOptions.def
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/Targets/AMDGPU.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/CodeGenHIP/default-attributes.hip
  clang/test/Driver/hip-options.hip

Index: clang/test/Driver/hip-options.hip
===
--- clang/test/Driver/hip-options.hip
+++ clang/test/Driver/hip-options.hip
@@ -169,3 +169,28 @@
 // RUN: %clang -### -nogpuinc -nogpulib -fhip-fp32-correctly-rounded-divide-sqrt \
 // RUN:   --cuda-gpu-arch=gfx906  %s 2>&1 | FileCheck -check-prefixes=CRDS %s
 // CRDS-NOT: "-f{{(no-)?}}hip-fp32-correctly-rounded-divide-sqrt"
+
+// Check -fno-hip-uniform-block is passed to clang -cc1 but
+// (default) -fhip-uniform-block is not.
+
+// RUN: %clang -### --target=x86_64-unknown-linux-gnu -nogpuinc -nogpulib -fno-hip-uniform-block \
+// RUN:   --cuda-gpu-arch=gfx906 %s 2>&1 | FileCheck -check-prefix=NOUNIBLK %s
+
+// NOUNIBLK: "-cc1"{{.*}} "-triple" "amdgcn-amd-amdhsa" {{.*}} "-fno-hip-uniform-block"
+// NOUNIBLK: "-cc1"{{.*}} "-triple" "x86_64-unknown-linux-gnu" {{.*}} "-fno-hip-uniform-block"
+
+// RUN: %clang -### -nogpuinc -nogpulib -fhip-uniform-block \
+// RUN:   --cuda-gpu-arch=gfx906 %s 2>&1 | FileCheck -check-prefix=UNIBLK %s
+
+// RUN: %clang -### -nogpuinc -nogpulib \
+// RUN:   --cuda-gpu-arch=gfx906 %s 2>&1 | FileCheck -check-prefix=UNIBLK %s
+
+// UNIBLK-NOT: "-f{{(no-)?}}hip-uniform-block"
+
+// Check no warnings for -f[no-]uniform-block.
+
+// RUN: %clang -fdriver-only -Werror --target=x86_64-unknown-linux-gnu -nogpuinc -nogpulib -fno-hip-uniform-block \
+// RUN:   -fhip-uniform-block --cuda-gpu-arch=gfx906 %s 2>&1 | count 0
+
+// RUN: %clang  -fdriver-only -Werror --target=x86_64-unknown-linux-gnu -nostdinc -nostdlib -fno-hip-uniform-block \
+// RUN:   -fhip-uniform-block -x c++ %s 2>&1 | count 0
Index: clang/test/CodeGenHIP/default-attributes.hip
===
--- clang/test/CodeGenHIP/default-attributes.hip
+++ clang/test/CodeGenHIP/default-attributes.hip
@@ -5,6 +5,9 @@
 // RUN: %clang_cc1 -O3 -triple amdgcn-amd-amdhsa -x hip -fno-ident -fcuda-is-device \
 // RUN:-emit-llvm -o - %s | FileCheck -check-prefix=OPT %s
 
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -x hip -fno-ident -fcuda-is-device -fno-hip-uniform-block \
+// RUN:-emit-llvm -o - %s | FileCheck -check-prefix=NOUNIBLK %s
+
 #define __device__ __attribute__((device))
 #define __global__ __attribute__((global))
 
@@ -20,6 +23,12 @@
 // OPT-NEXT:  entry:
 // OPT-NEXT:ret void
 //
+// NOUNIBLK: Function Attrs: convergent mustprogress noinline nounwind optnone
+// NOUNIBLK-LABEL: define {{[^@]+}}@_Z4funcv
+// NOUNIBLK-SAME: () #[[ATTR0:[0-9]+]] {
+// NOUNIBLK-NEXT:  entry:
+// NOUNIBLK-NEXT:ret void
+//
 __device__ void func() {
 
 }
@@ -36,21 +45,34 @@
 // OPT-NEXT:  entry:
 // OPT-NEXT:ret void
 //
+// NOUNIBLK: Function Attrs: convergent mustprogress noinline norecurse nounwind optnone
+// NOUNIBLK-LABEL: define {{[^@]+}}@_Z6kernelv
+// NOUNIBLK-SAME: () #[[ATTR1:[0-9]+]] {
+// NOUNIBLK-NEXT:  entry:
+// NOUNIBLK-NEXT:ret void
+//
 __global__ void kernel() {
 
 }
 //.
-// OPTNONE: attributes #0 = { convergent mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
-// OPTNONE: attributes #1 = { convergent mustprogress noinline norecurse nounwind optnone "amdgpu-flat-work-group-size"="1,1024" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
+// OPTNONE: attributes #[[ATTR0]] = { convergent mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+// OPTNONE: attributes #[[ATTR1]] = { convergent mustprogress noinline norecurse nounwind optnone "amdgpu-flat-work-group-size"="1,1024" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
+//.
+// OPT: attributes #[[ATTR0]] = { mustprogress nofree norecurse nosync nounwind willreturn