[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes
yaxunl abandoned this revision. yaxunl added a comment. We implemented this optimization through some target specific llvm pass. https://reviews.llvm.org/D36327 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes
yaxunl added a comment. In https://reviews.llvm.org/D36327#840658, @bader wrote: > In https://reviews.llvm.org/D36327#840616, @yaxunl wrote: > > > In https://reviews.llvm.org/D36327#839809, @rjmccall wrote: > > > > > Could you just implement this in SimplifyLibCalls? I assume there's some > > > way to fill in TargetLibraryInfo appropriately for a platform. Is that > > > too late for your linking requirements? > > > > > > Both the optimized and generic versions of __read_pipe function contains > > call of other library functions and are complicate enough not to be > > generated programmatically. amdgpu target does not have the capability to > > link in library code after LLVM codegen. The linking has to be done before > > SimplifyLibCalls. > > > If I understand correctly, SimplifyLibCalls is LLVM IR transformation, so it > works before linking and LLVM codegen (e.g. InstCombine passes run this > transformation). This pass is doing something similar to what you are trying > to achieve for __read_pipe builti-ins: pow(2.0, x) -> llvm.exp2(x). Thanks. I will take a look. https://reviews.llvm.org/D36327 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes
bader added a comment. In https://reviews.llvm.org/D36327#840616, @yaxunl wrote: > In https://reviews.llvm.org/D36327#839809, @rjmccall wrote: > > > Could you just implement this in SimplifyLibCalls? I assume there's some > > way to fill in TargetLibraryInfo appropriately for a platform. Is that too > > late for your linking requirements? > > > Both the optimized and generic versions of __read_pipe function contains call > of other library functions and are complicate enough not to be generated > programmatically. amdgpu target does not have the capability to link in > library code after LLVM codegen. The linking has to be done before > SimplifyLibCalls. If I understand correctly, SimplifyLibCalls is LLVM IR transformation, so it works before linking and LLVM codegen (e.g. InstCombine passes run this transformation). This pass is doing something similar to what you are trying to achieve for __read_pipe builti-ins: pow(2.0, x) -> llvm.exp2(x). https://reviews.llvm.org/D36327 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes
yaxunl added a comment. In https://reviews.llvm.org/D36327#839809, @rjmccall wrote: > Could you just implement this in SimplifyLibCalls? I assume there's some way > to fill in TargetLibraryInfo appropriately for a platform. Is that too late > for your linking requirements? Both the optimized and generic versions of __read_pipe function contains call of other library functions and are complicate enough not to be generated programmatically. amdgpu target does not have the capability to link in library code after LLVM codegen. The linking has to be done before SimplifyLibCalls. https://reviews.llvm.org/D36327 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes
rjmccall added a comment. Could you just implement this in SimplifyLibCalls? I assume there's some way to fill in TargetLibraryInfo appropriately for a platform. Is that too late for your linking requirements? https://reviews.llvm.org/D36327 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes
yaxunl added a comment. John, do you have any comments? Thanks. https://reviews.llvm.org/D36327 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes
bader added a comment. @rsmith do you have an opinion on what would be the right place for the kind of proposed optimization? It looks like it can be implemented as target independent optimization, acting only for target with specified properties - in this case target must provide required built-in functions. https://reviews.llvm.org/D36327 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes
yaxunl added a comment. In https://reviews.llvm.org/D36327#835634, @Anastasia wrote: > In https://reviews.llvm.org/D36327#835153, @b-sumner wrote: > > > In https://reviews.llvm.org/D36327#834032, @Anastasia wrote: > > > > > In https://reviews.llvm.org/D36327#833891, @yaxunl wrote: > > > > > > > In https://reviews.llvm.org/D36327#833653, @bader wrote: > > > > > > > > > Hi Sam, > > > > > > > > > > What do you think about implementing this optimization in target > > > > > specific optimization pass? Since size/alignment is saved as function > > > > > parameter in LLVM IR, the optimization can be done in target specific > > > > > components w/o adding additional conditions to generic library. > > > > > > > > > > Thanks, > > > > > Alexey > > > > > > > > > > > > Hi Alexey, > > > > > > > > The optimization of the power-of-2 type size is implemented as a > > > > library function. Our backend lacks the capability to link in library > > > > code at ISA level, so linking of the optimized library function has to > > > > be done before any target-specific passes. It seems the only place to > > > > do this is Clang codegen since Clang/llvm does not support > > > > target-specific pre-linking passes. > > > > > > > > > My general feeling is that it doesn't look like a generic enough change > > > for the frontend. Even though it is implemented in a generic way, not > > > every target might have a special support for the power of 2 size and > > > also if there is such a support not every implementation would handle it > > > as a library function. But I can see that perhaps LLVM is missing > > > flexibility in the flow to accommodate these needs. Any change we could > > > try to extend the compilation flow such that this target specific > > > optimization could happen before the IR linking? > > > > > > It is trivial to implement the small number of specialized functions this > > patch adds in terms of the general one if desired, and the general one can > > continue to be handled as it had been. > > > > We had actually proposed a patch (sorry I don't have the reference handy) > > to add general mechanism for targets to introduce pre-link passes, but it > > was not accepted. We can try again, but I don't really expect more > > progress. > > > It would be nice to understand why it has not been accepted and whether we > could try to argument using this case as an example. It seems like a useful > feature for toolchains with the IR linking. The original review is here: https://reviews.llvm.org/D20682 To cite the reason why it was rejected: "I fundamentally do not believe that the TargetMachine should be involved in fixing language semantics issues with a "pre linking" pass at the LLVM level. Why? Because there is nothing "target" about this. There needs to be a fundamentally more principled way of handling this at the language and frontend level IMO." However, until now, we could not find "a fundamentally more principled way of handling this at the language and frontend level". https://reviews.llvm.org/D36327 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes
Anastasia added a comment. In https://reviews.llvm.org/D36327#835153, @b-sumner wrote: > In https://reviews.llvm.org/D36327#834032, @Anastasia wrote: > > > In https://reviews.llvm.org/D36327#833891, @yaxunl wrote: > > > > > In https://reviews.llvm.org/D36327#833653, @bader wrote: > > > > > > > Hi Sam, > > > > > > > > What do you think about implementing this optimization in target > > > > specific optimization pass? Since size/alignment is saved as function > > > > parameter in LLVM IR, the optimization can be done in target specific > > > > components w/o adding additional conditions to generic library. > > > > > > > > Thanks, > > > > Alexey > > > > > > > > > Hi Alexey, > > > > > > The optimization of the power-of-2 type size is implemented as a library > > > function. Our backend lacks the capability to link in library code at ISA > > > level, so linking of the optimized library function has to be done before > > > any target-specific passes. It seems the only place to do this is Clang > > > codegen since Clang/llvm does not support target-specific pre-linking > > > passes. > > > > > > My general feeling is that it doesn't look like a generic enough change for > > the frontend. Even though it is implemented in a generic way, not every > > target might have a special support for the power of 2 size and also if > > there is such a support not every implementation would handle it as a > > library function. But I can see that perhaps LLVM is missing flexibility in > > the flow to accommodate these needs. Any change we could try to extend the > > compilation flow such that this target specific optimization could happen > > before the IR linking? > > > It is trivial to implement the small number of specialized functions this > patch adds in terms of the general one if desired, and the general one can > continue to be handled as it had been. > > We had actually proposed a patch (sorry I don't have the reference handy) to > add general mechanism for targets to introduce pre-link passes, but it was > not accepted. We can try again, but I don't really expect more progress. It would be nice to understand why it has not been accepted and whether we could try to argument using this case as an example. It seems like a useful feature for toolchains with the IR linking. https://reviews.llvm.org/D36327 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes
b-sumner added a comment. In https://reviews.llvm.org/D36327#834032, @Anastasia wrote: > In https://reviews.llvm.org/D36327#833891, @yaxunl wrote: > > > In https://reviews.llvm.org/D36327#833653, @bader wrote: > > > > > Hi Sam, > > > > > > What do you think about implementing this optimization in target specific > > > optimization pass? Since size/alignment is saved as function parameter in > > > LLVM IR, the optimization can be done in target specific components w/o > > > adding additional conditions to generic library. > > > > > > Thanks, > > > Alexey > > > > > > Hi Alexey, > > > > The optimization of the power-of-2 type size is implemented as a library > > function. Our backend lacks the capability to link in library code at ISA > > level, so linking of the optimized library function has to be done before > > any target-specific passes. It seems the only place to do this is Clang > > codegen since Clang/llvm does not support target-specific pre-linking > > passes. > > > My general feeling is that it doesn't look like a generic enough change for > the frontend. Even though it is implemented in a generic way, not every > target might have a special support for the power of 2 size and also if there > is such a support not every implementation would handle it as a library > function. But I can see that perhaps LLVM is missing flexibility in the flow > to accommodate these needs. Any change we could try to extend the compilation > flow such that this target specific optimization could happen before the IR > linking? It is trivial to implement the small number of specialized functions this patch adds in terms of the general one if desired, and the general one can continue to be handled as it had been. We had actually proposed a patch (sorry I don't have the reference handy) to add general mechanism for targets to introduce pre-link passes, but it was not accepted. We can try again, but I don't really expect more progress. https://reviews.llvm.org/D36327 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes
Anastasia added a comment. In https://reviews.llvm.org/D36327#833891, @yaxunl wrote: > In https://reviews.llvm.org/D36327#833653, @bader wrote: > > > Hi Sam, > > > > What do you think about implementing this optimization in target specific > > optimization pass? Since size/alignment is saved as function parameter in > > LLVM IR, the optimization can be done in target specific components w/o > > adding additional conditions to generic library. > > > > Thanks, > > Alexey > > > Hi Alexey, > > The optimization of the power-of-2 type size is implemented as a library > function. Our backend lacks the capability to link in library code at ISA > level, so linking of the optimized library function has to be done before any > target-specific passes. It seems the only place to do this is Clang codegen > since Clang/llvm does not support target-specific pre-linking passes. My general feeling is that it doesn't look like a generic enough change for the frontend. Even though it is implemented in a generic way, not every target might have a special support for the power of 2 size and also if there is such a support not every implementation would handle it as a library function. But I can see that perhaps LLVM is missing flexibility in the flow to accommodate these needs. Any change we could try to extend the compilation flow such that this target specific optimization could happen before the IR linking? https://reviews.llvm.org/D36327 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes
yaxunl added a comment. In https://reviews.llvm.org/D36327#833653, @bader wrote: > Hi Sam, > > What do you think about implementing this optimization in target specific > optimization pass? Since size/alignment is saved as function parameter in > LLVM IR, the optimization can be done in target specific components w/o > adding additional conditions to generic library. > > Thanks, > Alexey Hi Alexey, The optimization of the power-of-2 type size is implemented as a library function. Our backend lacks the capability to link in library code at ISA level, so linking of the optimized library function has to be done before any target-specific passes. It seems the only place to do this is Clang codegen since Clang/llvm does not support target-specific pre-linking passes. https://reviews.llvm.org/D36327 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes
bader added a comment. Hi Sam, What do you think about implementing this optimization in target specific optimization pass? Since size/alignment is saved as function parameter in LLVM IR, the optimization can be done in target specific components w/o adding additional conditions to generic library. Thanks, Alexey https://reviews.llvm.org/D36327 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes
yaxunl created this revision. Herald added a subscriber: tpr. Currently Clang emits call of __read_pipe_2 or __read_pipe_4 for OpenCL read_pipe builtin, with appended type size and alignment arguments, where 2 or 4 indicates the original number of arguments. For certain targets (e.g. amdgpu), there are optimized version of __read_pipe_2/__read_pipe_4 when the type size and alignment has the same power of 2 value. It is desired that Clang emits a different function for these cases. This patch let Clang emits __read_pipe_2_N for such cases where N is the size in bytes of the type. (N = 1,2,4,8,...,128), so that the target runtime can use an optimized version of read_pipe. The same with __read_pipe_4, __write_pipe_2 and __wirte_pipe_4. This optimization is controlled by TargetCodeGenInfo::hasOptimizedPipeBuiltin, which returns false by default. Each target can override this function to turn on this optimization. https://reviews.llvm.org/D36327 Files: lib/CodeGen/CGBuiltin.cpp lib/CodeGen/TargetInfo.cpp lib/CodeGen/TargetInfo.h test/CodeGenOpenCL/pipe_builtin.cl Index: test/CodeGenOpenCL/pipe_builtin.cl === --- test/CodeGenOpenCL/pipe_builtin.cl +++ test/CodeGenOpenCL/pipe_builtin.cl @@ -1,73 +1,90 @@ -// RUN: %clang_cc1 -emit-llvm -cl-ext=+cl_khr_subgroups -O0 -cl-std=CL2.0 -o - %s | FileCheck %s +// RUN: %clang_cc1 -emit-llvm -cl-ext=+cl_khr_subgroups -O0 -cl-std=CL2.0 -o - %s | FileCheck -check-prefixes=CHECK,NAMD %s +// RUN: %clang_cc1 -triple amdgcn---amdgizcl -emit-llvm -cl-ext=+cl_khr_subgroups -O0 -cl-std=CL2.0 -o - %s | FileCheck -check-prefixes=CHECK,AMD %s // CHECK: %opencl.pipe_t = type opaque // CHECK: %opencl.reserve_id_t = type opaque #pragma OPENCL EXTENSION cl_khr_subgroups : enable +typedef struct { + int x[100]; +} S; + +typedef long long2 __attribute__((ext_vector_type(2))); +typedef long long3 __attribute__((ext_vector_type(3))); +typedef long long4 __attribute__((ext_vector_type(4))); +typedef long long8 __attribute__((ext_vector_type(8))); +typedef long long16 __attribute__((ext_vector_type(16))); + void test1(read_only pipe int p, global int *ptr) { - // CHECK: call i32 @__read_pipe_2(%opencl.pipe_t* %{{.*}}, i8* %{{.*}}, i32 4, i32 4) + // NAMD: call i32 @__read_pipe_2(%opencl.pipe_t* %{{.*}}, i8* %{{.*}}, i32 4, i32 4) + // AMD: call i32 @__read_pipe_2_4(%opencl.pipe_t addrspace(1)* %{{.*}}, i32* %{{.*}}) read_pipe(p, ptr); - // CHECK: call %opencl.reserve_id_t* @__reserve_read_pipe(%opencl.pipe_t* %{{.*}}, i32 {{.*}}, i32 4, i32 4) + // CHECK: call %opencl.reserve_id_t* @__reserve_read_pipe(%opencl.pipe_t{{.*}}* %{{.*}}, i32 {{.*}}, i32 4, i32 4) reserve_id_t rid = reserve_read_pipe(p, 2); - // CHECK: call i32 @__read_pipe_4(%opencl.pipe_t* %{{.*}}, %opencl.reserve_id_t* %{{.*}}, i32 {{.*}}, i8* %{{.*}}, i32 4, i32 4) + // NAMD: call i32 @__read_pipe_4(%opencl.pipe_t* %{{.*}}, %opencl.reserve_id_t* %{{.*}}, i32 {{.*}}, i8* %{{.*}}, i32 4, i32 4) + // AMD: call i32 @__read_pipe_4_4(%opencl.pipe_t addrspace(1)* %{{.*}}, %opencl.reserve_id_t* %{{.*}}, i32 {{.*}}, i32* %{{.*}}) read_pipe(p, rid, 2, ptr); - // CHECK: call void @__commit_read_pipe(%opencl.pipe_t* %{{.*}}, %opencl.reserve_id_t* %{{.*}}, i32 4, i32 4) + // CHECK: call void @__commit_read_pipe(%opencl.pipe_t{{.*}}* %{{.*}}, %opencl.reserve_id_t{{.*}}* %{{.*}}, i32 4, i32 4) commit_read_pipe(p, rid); } void test2(write_only pipe int p, global int *ptr) { - // CHECK: call i32 @__write_pipe_2(%opencl.pipe_t* %{{.*}}, i8* %{{.*}}, i32 4, i32 4) + // NAMD: call i32 @__write_pipe_2(%opencl.pipe_t* %{{.*}}, i8* %{{.*}}, i32 4, i32 4) + // AMD: call i32 @__write_pipe_2_4(%opencl.pipe_t addrspace(1)* %{{.*}}, i32* %{{.*}}) write_pipe(p, ptr); - // CHECK: call %opencl.reserve_id_t* @__reserve_write_pipe(%opencl.pipe_t* %{{.*}}, i32 {{.*}}, i32 4, i32 4) + // CHECK: call %opencl.reserve_id_t* @__reserve_write_pipe(%opencl.pipe_t{{.*}}* %{{.*}}, i32 {{.*}}, i32 4, i32 4) reserve_id_t rid = reserve_write_pipe(p, 2); - // CHECK: call i32 @__write_pipe_4(%opencl.pipe_t* %{{.*}}, %opencl.reserve_id_t* %{{.*}}, i32 {{.*}}, i8* %{{.*}}, i32 4, i32 4) + // NAMD: call i32 @__write_pipe_4(%opencl.pipe_t* %{{.*}}, %opencl.reserve_id_t* %{{.*}}, i32 {{.*}}, i8* %{{.*}}, i32 4, i32 4) + // AMD: call i32 @__write_pipe_4_4(%opencl.pipe_t addrspace(1)* %{{.*}}, %opencl.reserve_id_t* %{{.*}}, i32 {{.*}}, i32* %{{.*}}) write_pipe(p, rid, 2, ptr); - // CHECK: call void @__commit_write_pipe(%opencl.pipe_t* %{{.*}}, %opencl.reserve_id_t* %{{.*}}, i32 4, i32 4) + // CHECK: call void @__commit_write_pipe(%opencl.pipe_t{{.*}}* %{{.*}}, %opencl.reserve_id_t* %{{.*}}, i32 4, i32 4) commit_write_pipe(p, rid); } void test3(read_only pipe int p, global int *ptr) { - // CHECK: call %opencl.reserve_id_t* @__work_group_reserve_read_pipe(%opencl.pipe_t* %{{.*}}, i32 {{.*}}, i32 4, i32 4) + // CHECK: call