[PATCH] D155213: [HIP] Add `-fno-hip-uniform-block`
yaxunl marked 2 inline comments as done. yaxunl added inline comments. Comment at: clang/include/clang/Driver/Options.td:1092 ShouldParseIf; +defm hip_uniform_block : BoolFOption<"hip-uniform-block", + LangOpts<"HIPUniformBlock">, DefaultTrue, arsenm wrote: > scchan wrote: > > yaxunl wrote: > > > arsenm wrote: > > > > Can we avoid adding yet another language flag for something that's > > > > reusable for everything? Is there an --offload- ? > > > Currently, the naming convention for shared CUDA/HIP language options is > > > `-fgpu-*` or `--gpu-* . The shared CUDA/HIP/OpenMP driver options are > > > named `--offload-*`. > > > > > > This option is named `-fhip-uniform-block` because AFAIK CUDA does not > > > support non-uniform block size. > > > > > > If we want to make it a generic option, it should be named as > > > `-fgpu-uniform-block` by the current naming convention. Unless we want to > > > change the naming convention for generic offloading language options. > > > > > > @tra What do you think? Thanks. > > Don't we need a different default value for some languages like OpenCL? > Yes, but opencl already has a spec'd flag for this. If we're making up a new > one, it could be something generic that aliases the opencl one in that case. > Plus the +/- value of a new flag should work (the CL one only goes in one > direction) I am thinking, maybe it is time to start moving towards the final direction. How about renaming it as `--offload-uniform-block` ? @MaskRay @tra CHANGES SINCE LAST ACTION https://reviews.llvm.org/D155213/new/ https://reviews.llvm.org/D155213 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D155213: [HIP] Add `-fno-hip-uniform-block`
yaxunl marked an inline comment as done. yaxunl added inline comments. Comment at: clang/include/clang/Driver/Options.td:1092 ShouldParseIf; +defm hip_uniform_block : BoolFOption<"hip-uniform-block", + LangOpts<"HIPUniformBlock">, DefaultTrue, arsenm wrote: > Can we avoid adding yet another language flag for something that's reusable > for everything? Is there an --offload- ? Currently, the naming convention for shared CUDA/HIP language options is `-fgpu-*` or `--gpu-* . The shared CUDA/HIP/OpenMP driver options are named `--offload-*`. This option is named `-fhip-uniform-block` because AFAIK CUDA does not support non-uniform block size. If we want to make it a generic option, it should be named as `-fgpu-uniform-block` by the current naming convention. Unless we want to change the naming convention for generic offloading language options. @tra What do you think? Thanks. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D155213/new/ https://reviews.llvm.org/D155213 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D155213: [HIP] Add `-fno-hip-uniform-block`
arsenm added inline comments. Comment at: clang/include/clang/Driver/Options.td:1092 ShouldParseIf; +defm hip_uniform_block : BoolFOption<"hip-uniform-block", + LangOpts<"HIPUniformBlock">, DefaultTrue, scchan wrote: > arsenm wrote: > > Can we avoid adding yet another language flag for something that's reusable > > for everything? Is there an --offload- ? > Don't we need a different default value for some languages like OpenCL? Yes, but opencl already has a spec'd flag for this. If we're making up a new one, it could be something generic that aliases the opencl one in that case. Plus the +/- value of a new flag should work (the CL one only goes in one direction) CHANGES SINCE LAST ACTION https://reviews.llvm.org/D155213/new/ https://reviews.llvm.org/D155213 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D155213: [HIP] Add `-fno-hip-uniform-block`
scchan added inline comments. Comment at: clang/include/clang/Driver/Options.td:1092 ShouldParseIf; +defm hip_uniform_block : BoolFOption<"hip-uniform-block", + LangOpts<"HIPUniformBlock">, DefaultTrue, arsenm wrote: > Can we avoid adding yet another language flag for something that's reusable > for everything? Is there an --offload- ? Don't we need a different default value for some languages like OpenCL? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D155213/new/ https://reviews.llvm.org/D155213 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D155213: [HIP] Add `-fno-hip-uniform-block`
arsenm added inline comments. Comment at: clang/include/clang/Driver/Options.td:1092 ShouldParseIf; +defm hip_uniform_block : BoolFOption<"hip-uniform-block", + LangOpts<"HIPUniformBlock">, DefaultTrue, Can we avoid adding yet another language flag for something that's reusable for everything? Is there an --offload- ? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D155213/new/ https://reviews.llvm.org/D155213 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D155213: [HIP] Add `-fno-hip-uniform-block`
yaxunl updated this revision to Diff 540397. yaxunl marked an inline comment as done. yaxunl added a comment. revised by comments CHANGES SINCE LAST ACTION https://reviews.llvm.org/D155213/new/ https://reviews.llvm.org/D155213 Files: clang/include/clang/Basic/LangOptions.def clang/include/clang/Driver/Options.td clang/lib/CodeGen/CGCall.cpp clang/lib/CodeGen/Targets/AMDGPU.cpp clang/lib/Driver/ToolChains/Clang.cpp clang/test/CodeGenHIP/default-attributes.hip clang/test/Driver/hip-options.hip Index: clang/test/Driver/hip-options.hip === --- clang/test/Driver/hip-options.hip +++ clang/test/Driver/hip-options.hip @@ -169,3 +169,25 @@ // RUN: %clang -### -nogpuinc -nogpulib -fhip-fp32-correctly-rounded-divide-sqrt \ // RUN: --cuda-gpu-arch=gfx906 %s 2>&1 | FileCheck -check-prefixes=CRDS %s // CRDS-NOT: "-f{{(no-)?}}hip-fp32-correctly-rounded-divide-sqrt" + +// Check -fno-hip-uniform-block is passed to clang -cc1 but +// (default) -fhip-uniform-block is not. + +// RUN: %clang -### --target=x86_64-unknown-linux-gnu -nogpuinc -nogpulib -fno-hip-uniform-block \ +// RUN: --cuda-gpu-arch=gfx906 %s 2>&1 | FileCheck -check-prefix=NOUNIBLK %s + +// NOUNIBLK: "-cc1"{{.*}} "-triple" "amdgcn-amd-amdhsa" {{.*}} "-fno-hip-uniform-block" +// NOUNIBLK: "-cc1"{{.*}} "-triple" "x86_64-unknown-linux-gnu" {{.*}} "-fno-hip-uniform-block" + +// RUN: %clang -### -nogpuinc -nogpulib -fhip-uniform-block \ +// RUN: --cuda-gpu-arch=gfx906 %s 2>&1 | FileCheck -check-prefix=UNIBLK %s + +// RUN: %clang -### -nogpuinc -nogpulib \ +// RUN: --cuda-gpu-arch=gfx906 %s 2>&1 | FileCheck -check-prefix=UNIBLK %s + +// UNIBLK-NOT: "-f{{(no-)?}}hip-uniform-block" + +// Check no warnings for -f[no-]uniform-block. + +// RUN: %clang -fdriver-only -Werror --target=x86_64-unknown-linux-gnu -nogpuinc -nogpulib -fno-hip-uniform-block \ +// RUN: -fhip-uniform-block --cuda-gpu-arch=gfx906 %s 2>&1 | count 0 Index: clang/test/CodeGenHIP/default-attributes.hip === --- clang/test/CodeGenHIP/default-attributes.hip +++ clang/test/CodeGenHIP/default-attributes.hip @@ -5,6 +5,9 @@ // RUN: %clang_cc1 -O3 -triple amdgcn-amd-amdhsa -x hip -fno-ident -fcuda-is-device \ // RUN:-emit-llvm -o - %s | FileCheck -check-prefix=OPT %s +// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -x hip -fno-ident -fcuda-is-device -fno-hip-uniform-block \ +// RUN:-emit-llvm -o - %s | FileCheck -check-prefix=NOUNIBLK %s + #define __device__ __attribute__((device)) #define __global__ __attribute__((global)) @@ -20,6 +23,12 @@ // OPT-NEXT: entry: // OPT-NEXT:ret void // +// NOUNIBLK: Function Attrs: convergent mustprogress noinline nounwind optnone +// NOUNIBLK-LABEL: define {{[^@]+}}@_Z4funcv +// NOUNIBLK-SAME: () #[[ATTR0:[0-9]+]] { +// NOUNIBLK-NEXT: entry: +// NOUNIBLK-NEXT:ret void +// __device__ void func() { } @@ -36,21 +45,34 @@ // OPT-NEXT: entry: // OPT-NEXT:ret void // +// NOUNIBLK: Function Attrs: convergent mustprogress noinline norecurse nounwind optnone +// NOUNIBLK-LABEL: define {{[^@]+}}@_Z6kernelv +// NOUNIBLK-SAME: () #[[ATTR1:[0-9]+]] { +// NOUNIBLK-NEXT: entry: +// NOUNIBLK-NEXT:ret void +// __global__ void kernel() { } //. -// OPTNONE: attributes #0 = { convergent mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" } -// OPTNONE: attributes #1 = { convergent mustprogress noinline norecurse nounwind optnone "amdgpu-flat-work-group-size"="1,1024" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" } +// OPTNONE: attributes #[[ATTR0]] = { convergent mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" } +// OPTNONE: attributes #[[ATTR1]] = { convergent mustprogress noinline norecurse nounwind optnone "amdgpu-flat-work-group-size"="1,1024" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" } +//. +// OPT: attributes #[[ATTR0]] = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "no-trapping-math"="true" "stack-protector-buffer-size"="8" } +// OPT: attributes #[[ATTR1]] = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "amdgpu-flat-work-group-size"="1,1024" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" } +//. +// NOUNIBLK: attributes #[[ATTR0]] = { convergent mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" } +// NOUNIBLK: attributes #[[ATTR1]] = { convergent mustprogress noinline norecurse nounwind optnone "amdgpu-flat-work-group-size"="1,1024" "no-trapping-math"="true" "stack-protector-buffer-size"="8" } //. -// OPT: attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "no-trapping-math"="true"
[PATCH] D155213: [HIP] Add `-fno-hip-uniform-block`
yaxunl marked 2 inline comments as done. yaxunl added inline comments. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:7226 + } else { +Args.claimAllArgs(options::OPT_fhip_uniform_block, + options::OPT_fno_hip_uniform_block); MaskRay wrote: > yaxunl wrote: > > MaskRay wrote: > > > Why is the -Wunused-command-line-argument warning suppressed in non-IsHIP > > > mode? > > Users may want to add these options to clang config file. > > > > Is there a general rule which options should be claimed? > Options in a configuration file are automatically claimed. > > I don't know a general rule, but we generally don't claim newly introduced > options. I think I should remove the claimAllArgs for this option. It should behave like the usual options when not used. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D155213/new/ https://reviews.llvm.org/D155213 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D155213: [HIP] Add `-fno-hip-uniform-block`
MaskRay added inline comments. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:7226 + } else { +Args.claimAllArgs(options::OPT_fhip_uniform_block, + options::OPT_fno_hip_uniform_block); yaxunl wrote: > MaskRay wrote: > > Why is the -Wunused-command-line-argument warning suppressed in non-IsHIP > > mode? > Users may want to add these options to clang config file. > > Is there a general rule which options should be claimed? Options in a configuration file are automatically claimed. I don't know a general rule, but we generally don't claim newly introduced options. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D155213/new/ https://reviews.llvm.org/D155213 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D155213: [HIP] Add `-fno-hip-uniform-block`
yaxunl added inline comments. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:7226 + } else { +Args.claimAllArgs(options::OPT_fhip_uniform_block, + options::OPT_fno_hip_uniform_block); MaskRay wrote: > Why is the -Wunused-command-line-argument warning suppressed in non-IsHIP > mode? Users may want to add these options to clang config file. Is there a general rule which options should be claimed? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D155213/new/ https://reviews.llvm.org/D155213 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D155213: [HIP] Add `-fno-hip-uniform-block`
MaskRay added inline comments. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:7226 + } else { +Args.claimAllArgs(options::OPT_fhip_uniform_block, + options::OPT_fno_hip_uniform_block); Why is the -Wunused-command-line-argument warning suppressed in non-IsHIP mode? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D155213/new/ https://reviews.llvm.org/D155213 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D155213: [HIP] Add `-fno-hip-uniform-block`
yaxunl created this revision. yaxunl added reviewers: tra, b-sumner, MaskRay, arsenm, scchan. Herald added subscribers: jdoerfert, kerbowa, jvesely. Herald added a project: All. yaxunl requested review of this revision. Herald added a subscriber: wdng. By default, clang assumes HIP kernels are launched with uniform block size, which is the case for kernels launched through triple chevron or hipLaunchKernelGGL. Clang adds uniform-work-group-size function attribute to HIP kernels to allow the backend to do optimizations on that. However, in some rare cases, HIP kernels can be launched through hipExtModuleLaunchKernel where global work size is specified, which may result in non-uniform block size. To be able to support non-uniform block size for HIP kernels, an option `-f[no-]hip-uniform-block is added. By default it is on. https://reviews.llvm.org/D155213 Files: clang/include/clang/Basic/LangOptions.def clang/include/clang/Driver/Options.td clang/lib/CodeGen/CGCall.cpp clang/lib/CodeGen/Targets/AMDGPU.cpp clang/lib/Driver/ToolChains/Clang.cpp clang/test/CodeGenHIP/default-attributes.hip clang/test/Driver/hip-options.hip Index: clang/test/Driver/hip-options.hip === --- clang/test/Driver/hip-options.hip +++ clang/test/Driver/hip-options.hip @@ -169,3 +169,28 @@ // RUN: %clang -### -nogpuinc -nogpulib -fhip-fp32-correctly-rounded-divide-sqrt \ // RUN: --cuda-gpu-arch=gfx906 %s 2>&1 | FileCheck -check-prefixes=CRDS %s // CRDS-NOT: "-f{{(no-)?}}hip-fp32-correctly-rounded-divide-sqrt" + +// Check -fno-hip-uniform-block is passed to clang -cc1 but +// (default) -fhip-uniform-block is not. + +// RUN: %clang -### --target=x86_64-unknown-linux-gnu -nogpuinc -nogpulib -fno-hip-uniform-block \ +// RUN: --cuda-gpu-arch=gfx906 %s 2>&1 | FileCheck -check-prefix=NOUNIBLK %s + +// NOUNIBLK: "-cc1"{{.*}} "-triple" "amdgcn-amd-amdhsa" {{.*}} "-fno-hip-uniform-block" +// NOUNIBLK: "-cc1"{{.*}} "-triple" "x86_64-unknown-linux-gnu" {{.*}} "-fno-hip-uniform-block" + +// RUN: %clang -### -nogpuinc -nogpulib -fhip-uniform-block \ +// RUN: --cuda-gpu-arch=gfx906 %s 2>&1 | FileCheck -check-prefix=UNIBLK %s + +// RUN: %clang -### -nogpuinc -nogpulib \ +// RUN: --cuda-gpu-arch=gfx906 %s 2>&1 | FileCheck -check-prefix=UNIBLK %s + +// UNIBLK-NOT: "-f{{(no-)?}}hip-uniform-block" + +// Check no warnings for -f[no-]uniform-block. + +// RUN: %clang -fdriver-only -Werror --target=x86_64-unknown-linux-gnu -nogpuinc -nogpulib -fno-hip-uniform-block \ +// RUN: -fhip-uniform-block --cuda-gpu-arch=gfx906 %s 2>&1 | count 0 + +// RUN: %clang -fdriver-only -Werror --target=x86_64-unknown-linux-gnu -nostdinc -nostdlib -fno-hip-uniform-block \ +// RUN: -fhip-uniform-block -x c++ %s 2>&1 | count 0 Index: clang/test/CodeGenHIP/default-attributes.hip === --- clang/test/CodeGenHIP/default-attributes.hip +++ clang/test/CodeGenHIP/default-attributes.hip @@ -5,6 +5,9 @@ // RUN: %clang_cc1 -O3 -triple amdgcn-amd-amdhsa -x hip -fno-ident -fcuda-is-device \ // RUN:-emit-llvm -o - %s | FileCheck -check-prefix=OPT %s +// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -x hip -fno-ident -fcuda-is-device -fno-hip-uniform-block \ +// RUN:-emit-llvm -o - %s | FileCheck -check-prefix=NOUNIBLK %s + #define __device__ __attribute__((device)) #define __global__ __attribute__((global)) @@ -20,6 +23,12 @@ // OPT-NEXT: entry: // OPT-NEXT:ret void // +// NOUNIBLK: Function Attrs: convergent mustprogress noinline nounwind optnone +// NOUNIBLK-LABEL: define {{[^@]+}}@_Z4funcv +// NOUNIBLK-SAME: () #[[ATTR0:[0-9]+]] { +// NOUNIBLK-NEXT: entry: +// NOUNIBLK-NEXT:ret void +// __device__ void func() { } @@ -36,21 +45,34 @@ // OPT-NEXT: entry: // OPT-NEXT:ret void // +// NOUNIBLK: Function Attrs: convergent mustprogress noinline norecurse nounwind optnone +// NOUNIBLK-LABEL: define {{[^@]+}}@_Z6kernelv +// NOUNIBLK-SAME: () #[[ATTR1:[0-9]+]] { +// NOUNIBLK-NEXT: entry: +// NOUNIBLK-NEXT:ret void +// __global__ void kernel() { } //. -// OPTNONE: attributes #0 = { convergent mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" } -// OPTNONE: attributes #1 = { convergent mustprogress noinline norecurse nounwind optnone "amdgpu-flat-work-group-size"="1,1024" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" } +// OPTNONE: attributes #[[ATTR0]] = { convergent mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" } +// OPTNONE: attributes #[[ATTR1]] = { convergent mustprogress noinline norecurse nounwind optnone "amdgpu-flat-work-group-size"="1,1024" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" } +//. +// OPT: attributes #[[ATTR0]] = { mustprogress nofree norecurse nosync nounwind willreturn