[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes

2017-09-08 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl abandoned this revision.
yaxunl added a comment.

We implemented this optimization through some target specific llvm pass.


https://reviews.llvm.org/D36327



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes

2017-08-14 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

In https://reviews.llvm.org/D36327#840658, @bader wrote:

> In https://reviews.llvm.org/D36327#840616, @yaxunl wrote:
>
> > In https://reviews.llvm.org/D36327#839809, @rjmccall wrote:
> >
> > > Could you just implement this in SimplifyLibCalls?  I assume there's some 
> > > way to fill in TargetLibraryInfo appropriately for a platform.  Is that 
> > > too late for your linking requirements?
> >
> >
> > Both the optimized and generic versions of __read_pipe function contains 
> > call of other library functions and are complicate enough not to be 
> > generated programmatically. amdgpu target does not have the capability to 
> > link in library code after LLVM codegen. The linking has to be done before 
> > SimplifyLibCalls.
>
>
> If I understand correctly, SimplifyLibCalls is LLVM IR transformation, so it 
> works before linking and LLVM codegen (e.g. InstCombine passes run this 
> transformation). This pass is doing something similar to what you are trying 
> to achieve for __read_pipe builti-ins: pow(2.0, x) -> llvm.exp2(x).


Thanks. I will take a look.


https://reviews.llvm.org/D36327



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes

2017-08-14 Thread Alexey Bader via Phabricator via cfe-commits
bader added a comment.

In https://reviews.llvm.org/D36327#840616, @yaxunl wrote:

> In https://reviews.llvm.org/D36327#839809, @rjmccall wrote:
>
> > Could you just implement this in SimplifyLibCalls?  I assume there's some 
> > way to fill in TargetLibraryInfo appropriately for a platform.  Is that too 
> > late for your linking requirements?
>
>
> Both the optimized and generic versions of __read_pipe function contains call 
> of other library functions and are complicate enough not to be generated 
> programmatically. amdgpu target does not have the capability to link in 
> library code after LLVM codegen. The linking has to be done before 
> SimplifyLibCalls.


If I understand correctly, SimplifyLibCalls is LLVM IR transformation, so it 
works before linking and LLVM codegen (e.g. InstCombine passes run this 
transformation). This pass is doing something similar to what you are trying to 
achieve for __read_pipe builti-ins: pow(2.0, x) -> llvm.exp2(x).


https://reviews.llvm.org/D36327



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes

2017-08-14 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

In https://reviews.llvm.org/D36327#839809, @rjmccall wrote:

> Could you just implement this in SimplifyLibCalls?  I assume there's some way 
> to fill in TargetLibraryInfo appropriately for a platform.  Is that too late 
> for your linking requirements?


Both the optimized and generic versions of __read_pipe function contains call 
of other library functions and are complicate enough not to be generated 
programmatically. amdgpu target does not have the capability to link in library 
code after LLVM codegen. The linking has to be done before SimplifyLibCalls.


https://reviews.llvm.org/D36327



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes

2017-08-11 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

Could you just implement this in SimplifyLibCalls?  I assume there's some way 
to fill in TargetLibraryInfo appropriately for a platform.  Is that too late 
for your linking requirements?


https://reviews.llvm.org/D36327



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes

2017-08-11 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

John, do you have any comments? Thanks.


https://reviews.llvm.org/D36327



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes

2017-08-08 Thread Alexey Bader via Phabricator via cfe-commits
bader added a comment.

@rsmith do you have an opinion on what would be the right place for the kind of 
proposed optimization?
It looks like it can be implemented as target independent optimization, acting 
only for target with specified properties - in this case target must provide 
required built-in functions.


https://reviews.llvm.org/D36327



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes

2017-08-08 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

In https://reviews.llvm.org/D36327#835634, @Anastasia wrote:

> In https://reviews.llvm.org/D36327#835153, @b-sumner wrote:
>
> > In https://reviews.llvm.org/D36327#834032, @Anastasia wrote:
> >
> > > In https://reviews.llvm.org/D36327#833891, @yaxunl wrote:
> > >
> > > > In https://reviews.llvm.org/D36327#833653, @bader wrote:
> > > >
> > > > > Hi Sam,
> > > > >
> > > > > What do you think about implementing this optimization in target 
> > > > > specific optimization pass? Since size/alignment is saved as function 
> > > > > parameter in LLVM IR, the optimization can be done in target specific 
> > > > > components w/o adding additional conditions to generic library.
> > > > >
> > > > > Thanks,
> > > > > Alexey
> > > >
> > > >
> > > > Hi Alexey,
> > > >
> > > > The optimization of the power-of-2 type size is implemented as a 
> > > > library function. Our backend lacks the capability to link in library 
> > > > code at ISA level, so linking of the optimized library function has to 
> > > > be done before any target-specific passes. It seems the only place to 
> > > > do this is Clang codegen since Clang/llvm does not support 
> > > > target-specific pre-linking passes.
> > >
> > >
> > > My general feeling is that it doesn't look like a generic enough change 
> > > for the frontend. Even though it is implemented in a generic way, not 
> > > every target might have a special support for the power of 2 size and 
> > > also if there is such a support not every implementation would handle it 
> > > as a library function. But I can see that perhaps LLVM is missing 
> > > flexibility in the flow to accommodate these needs. Any change we could 
> > > try to extend the compilation flow such that this target specific 
> > > optimization could happen before the IR linking?
> >
> >
> > It is trivial to implement the small number of specialized functions this 
> > patch adds in terms of the general one if desired, and the general one can 
> > continue to be handled as it had been.
> >
> > We had actually proposed a patch (sorry I don't have the reference handy) 
> > to add general mechanism for targets to introduce pre-link passes, but it 
> > was not accepted.  We can try again, but I don't really expect more 
> > progress.
>
>
> It would be nice to understand why it has not been accepted and whether we 
> could try to argument using this case as an example. It seems like a useful 
> feature for toolchains with the IR linking.


The original review is here:

https://reviews.llvm.org/D20682

To cite the reason why it was rejected:

"I fundamentally do not believe that the TargetMachine should be involved in 
fixing language semantics issues with a "pre linking" pass at the LLVM level. 
Why? Because there is nothing "target" about this.

There needs to be a fundamentally more principled way of handling this at the 
language and frontend level IMO."

However, until now, we could not find "a fundamentally more principled way of 
handling this at the language and frontend level".


https://reviews.llvm.org/D36327



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes

2017-08-08 Thread Anastasia Stulova via Phabricator via cfe-commits
Anastasia added a comment.

In https://reviews.llvm.org/D36327#835153, @b-sumner wrote:

> In https://reviews.llvm.org/D36327#834032, @Anastasia wrote:
>
> > In https://reviews.llvm.org/D36327#833891, @yaxunl wrote:
> >
> > > In https://reviews.llvm.org/D36327#833653, @bader wrote:
> > >
> > > > Hi Sam,
> > > >
> > > > What do you think about implementing this optimization in target 
> > > > specific optimization pass? Since size/alignment is saved as function 
> > > > parameter in LLVM IR, the optimization can be done in target specific 
> > > > components w/o adding additional conditions to generic library.
> > > >
> > > > Thanks,
> > > > Alexey
> > >
> > >
> > > Hi Alexey,
> > >
> > > The optimization of the power-of-2 type size is implemented as a library 
> > > function. Our backend lacks the capability to link in library code at ISA 
> > > level, so linking of the optimized library function has to be done before 
> > > any target-specific passes. It seems the only place to do this is Clang 
> > > codegen since Clang/llvm does not support target-specific pre-linking 
> > > passes.
> >
> >
> > My general feeling is that it doesn't look like a generic enough change for 
> > the frontend. Even though it is implemented in a generic way, not every 
> > target might have a special support for the power of 2 size and also if 
> > there is such a support not every implementation would handle it as a 
> > library function. But I can see that perhaps LLVM is missing flexibility in 
> > the flow to accommodate these needs. Any change we could try to extend the 
> > compilation flow such that this target specific optimization could happen 
> > before the IR linking?
>
>
> It is trivial to implement the small number of specialized functions this 
> patch adds in terms of the general one if desired, and the general one can 
> continue to be handled as it had been.
>
> We had actually proposed a patch (sorry I don't have the reference handy) to 
> add general mechanism for targets to introduce pre-link passes, but it was 
> not accepted.  We can try again, but I don't really expect more progress.


It would be nice to understand why it has not been accepted and whether we 
could try to argument using this case as an example. It seems like a useful 
feature for toolchains with the IR linking.


https://reviews.llvm.org/D36327



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes

2017-08-08 Thread Brian Sumner via Phabricator via cfe-commits
b-sumner added a comment.

In https://reviews.llvm.org/D36327#834032, @Anastasia wrote:

> In https://reviews.llvm.org/D36327#833891, @yaxunl wrote:
>
> > In https://reviews.llvm.org/D36327#833653, @bader wrote:
> >
> > > Hi Sam,
> > >
> > > What do you think about implementing this optimization in target specific 
> > > optimization pass? Since size/alignment is saved as function parameter in 
> > > LLVM IR, the optimization can be done in target specific components w/o 
> > > adding additional conditions to generic library.
> > >
> > > Thanks,
> > > Alexey
> >
> >
> > Hi Alexey,
> >
> > The optimization of the power-of-2 type size is implemented as a library 
> > function. Our backend lacks the capability to link in library code at ISA 
> > level, so linking of the optimized library function has to be done before 
> > any target-specific passes. It seems the only place to do this is Clang 
> > codegen since Clang/llvm does not support target-specific pre-linking 
> > passes.
>
>
> My general feeling is that it doesn't look like a generic enough change for 
> the frontend. Even though it is implemented in a generic way, not every 
> target might have a special support for the power of 2 size and also if there 
> is such a support not every implementation would handle it as a library 
> function. But I can see that perhaps LLVM is missing flexibility in the flow 
> to accommodate these needs. Any change we could try to extend the compilation 
> flow such that this target specific optimization could happen before the IR 
> linking?


It is trivial to implement the small number of specialized functions this patch 
adds in terms of the general one if desired, and the general one can continue 
to be handled as it had been.

We had actually proposed a patch (sorry I don't have the reference handy) to 
add general mechanism for targets to introduce pre-link passes, but it was not 
accepted.  We can try again, but I don't really expect more progress.


https://reviews.llvm.org/D36327



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes

2017-08-07 Thread Anastasia Stulova via Phabricator via cfe-commits
Anastasia added a comment.

In https://reviews.llvm.org/D36327#833891, @yaxunl wrote:

> In https://reviews.llvm.org/D36327#833653, @bader wrote:
>
> > Hi Sam,
> >
> > What do you think about implementing this optimization in target specific 
> > optimization pass? Since size/alignment is saved as function parameter in 
> > LLVM IR, the optimization can be done in target specific components w/o 
> > adding additional conditions to generic library.
> >
> > Thanks,
> > Alexey
>
>
> Hi Alexey,
>
> The optimization of the power-of-2 type size is implemented as a library 
> function. Our backend lacks the capability to link in library code at ISA 
> level, so linking of the optimized library function has to be done before any 
> target-specific passes. It seems the only place to do this is Clang codegen 
> since Clang/llvm does not support target-specific pre-linking passes.


My general feeling is that it doesn't look like a generic enough change for the 
frontend. Even though it is implemented in a generic way, not every target 
might have a special support for the power of 2 size and also if there is such 
a support not every implementation would handle it as a library function. But I 
can see that perhaps LLVM is missing flexibility in the flow to accommodate 
these needs. Any change we could try to extend the compilation flow such that 
this target specific optimization could happen before the IR linking?


https://reviews.llvm.org/D36327



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes

2017-08-07 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

In https://reviews.llvm.org/D36327#833653, @bader wrote:

> Hi Sam,
>
> What do you think about implementing this optimization in target specific 
> optimization pass? Since size/alignment is saved as function parameter in 
> LLVM IR, the optimization can be done in target specific components w/o 
> adding additional conditions to generic library.
>
> Thanks,
> Alexey


Hi Alexey,

The optimization of the power-of-2 type size is implemented as a library 
function. Our backend lacks the capability to link in library code at ISA 
level, so linking of the optimized library function has to be done before any 
target-specific passes. It seems the only place to do this is Clang codegen 
since Clang/llvm does not support target-specific pre-linking passes.


https://reviews.llvm.org/D36327



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes

2017-08-07 Thread Alexey Bader via Phabricator via cfe-commits
bader added a comment.

Hi Sam,

What do you think about implementing this optimization in target specific 
optimization pass? Since size/alignment is saved as function parameter in LLVM 
IR, the optimization can be done in target specific components w/o adding 
additional conditions to generic library.

Thanks,
Alexey


https://reviews.llvm.org/D36327



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36327: [OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes

2017-08-04 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl created this revision.
Herald added a subscriber: tpr.

Currently Clang emits call of __read_pipe_2 or __read_pipe_4 for OpenCL 
read_pipe builtin,
with appended type size and alignment arguments, where 2 or 4 indicates the 
original
number of arguments.

For certain targets (e.g. amdgpu), there are optimized version of 
__read_pipe_2/__read_pipe_4
when the type size and alignment has the same power of 2 value. It is desired 
that Clang
emits a different function for these cases.

This patch let Clang emits __read_pipe_2_N for such cases where N is the size 
in bytes of
the type. (N = 1,2,4,8,...,128), so that the target runtime can use an 
optimized version of
read_pipe.

The same with __read_pipe_4, __write_pipe_2 and __wirte_pipe_4.

This optimization is controlled by TargetCodeGenInfo::hasOptimizedPipeBuiltin, 
which returns
false by default. Each target can override this function to turn on this 
optimization.


https://reviews.llvm.org/D36327

Files:
  lib/CodeGen/CGBuiltin.cpp
  lib/CodeGen/TargetInfo.cpp
  lib/CodeGen/TargetInfo.h
  test/CodeGenOpenCL/pipe_builtin.cl

Index: test/CodeGenOpenCL/pipe_builtin.cl
===
--- test/CodeGenOpenCL/pipe_builtin.cl
+++ test/CodeGenOpenCL/pipe_builtin.cl
@@ -1,73 +1,90 @@
-// RUN: %clang_cc1 -emit-llvm -cl-ext=+cl_khr_subgroups -O0 -cl-std=CL2.0 -o - %s | FileCheck %s
+// RUN: %clang_cc1 -emit-llvm -cl-ext=+cl_khr_subgroups -O0 -cl-std=CL2.0 -o - %s | FileCheck -check-prefixes=CHECK,NAMD %s
+// RUN: %clang_cc1 -triple amdgcn---amdgizcl -emit-llvm -cl-ext=+cl_khr_subgroups -O0 -cl-std=CL2.0 -o - %s | FileCheck -check-prefixes=CHECK,AMD %s
 
 // CHECK: %opencl.pipe_t = type opaque
 // CHECK: %opencl.reserve_id_t = type opaque
 
 #pragma OPENCL EXTENSION cl_khr_subgroups : enable
 
+typedef struct {
+  int x[100];
+} S;
+
+typedef long long2 __attribute__((ext_vector_type(2)));
+typedef long long3 __attribute__((ext_vector_type(3)));
+typedef long long4 __attribute__((ext_vector_type(4)));
+typedef long long8 __attribute__((ext_vector_type(8)));
+typedef long long16 __attribute__((ext_vector_type(16)));
+
 void test1(read_only pipe int p, global int *ptr) {
-  // CHECK: call i32 @__read_pipe_2(%opencl.pipe_t* %{{.*}}, i8* %{{.*}}, i32 4, i32 4)
+  // NAMD: call i32 @__read_pipe_2(%opencl.pipe_t* %{{.*}}, i8* %{{.*}}, i32 4, i32 4)
+  // AMD: call i32 @__read_pipe_2_4(%opencl.pipe_t addrspace(1)* %{{.*}}, i32* %{{.*}})
   read_pipe(p, ptr);
-  // CHECK: call %opencl.reserve_id_t* @__reserve_read_pipe(%opencl.pipe_t* %{{.*}}, i32 {{.*}}, i32 4, i32 4)
+  // CHECK: call %opencl.reserve_id_t* @__reserve_read_pipe(%opencl.pipe_t{{.*}}* %{{.*}}, i32 {{.*}}, i32 4, i32 4)
   reserve_id_t rid = reserve_read_pipe(p, 2);
-  // CHECK: call i32 @__read_pipe_4(%opencl.pipe_t* %{{.*}}, %opencl.reserve_id_t* %{{.*}}, i32 {{.*}}, i8* %{{.*}}, i32 4, i32 4)
+  // NAMD: call i32 @__read_pipe_4(%opencl.pipe_t* %{{.*}}, %opencl.reserve_id_t* %{{.*}}, i32 {{.*}}, i8* %{{.*}}, i32 4, i32 4)
+  // AMD: call i32 @__read_pipe_4_4(%opencl.pipe_t addrspace(1)* %{{.*}}, %opencl.reserve_id_t* %{{.*}}, i32 {{.*}}, i32* %{{.*}})
   read_pipe(p, rid, 2, ptr);
-  // CHECK: call void @__commit_read_pipe(%opencl.pipe_t* %{{.*}}, %opencl.reserve_id_t* %{{.*}}, i32 4, i32 4)
+  // CHECK: call void @__commit_read_pipe(%opencl.pipe_t{{.*}}* %{{.*}}, %opencl.reserve_id_t{{.*}}* %{{.*}}, i32 4, i32 4)
   commit_read_pipe(p, rid);
 }
 
 void test2(write_only pipe int p, global int *ptr) {
-  // CHECK: call i32 @__write_pipe_2(%opencl.pipe_t* %{{.*}}, i8* %{{.*}}, i32 4, i32 4)
+  // NAMD: call i32 @__write_pipe_2(%opencl.pipe_t* %{{.*}}, i8* %{{.*}}, i32 4, i32 4)
+  // AMD: call i32 @__write_pipe_2_4(%opencl.pipe_t addrspace(1)* %{{.*}}, i32* %{{.*}})
   write_pipe(p, ptr);
-  // CHECK: call %opencl.reserve_id_t* @__reserve_write_pipe(%opencl.pipe_t* %{{.*}}, i32 {{.*}}, i32 4, i32 4)
+  // CHECK: call %opencl.reserve_id_t* @__reserve_write_pipe(%opencl.pipe_t{{.*}}* %{{.*}}, i32 {{.*}}, i32 4, i32 4)
   reserve_id_t rid = reserve_write_pipe(p, 2);
-  // CHECK: call i32 @__write_pipe_4(%opencl.pipe_t* %{{.*}}, %opencl.reserve_id_t* %{{.*}}, i32 {{.*}}, i8* %{{.*}}, i32 4, i32 4)
+  // NAMD: call i32 @__write_pipe_4(%opencl.pipe_t* %{{.*}}, %opencl.reserve_id_t* %{{.*}}, i32 {{.*}}, i8* %{{.*}}, i32 4, i32 4)
+  // AMD: call i32 @__write_pipe_4_4(%opencl.pipe_t addrspace(1)* %{{.*}}, %opencl.reserve_id_t* %{{.*}}, i32 {{.*}}, i32* %{{.*}})
   write_pipe(p, rid, 2, ptr);
-  // CHECK: call void @__commit_write_pipe(%opencl.pipe_t* %{{.*}}, %opencl.reserve_id_t* %{{.*}}, i32 4, i32 4)
+  // CHECK: call void @__commit_write_pipe(%opencl.pipe_t{{.*}}* %{{.*}}, %opencl.reserve_id_t* %{{.*}}, i32 4, i32 4)
   commit_write_pipe(p, rid);
 }
 
 void test3(read_only pipe int p, global int *ptr) {
-  // CHECK: call %opencl.reserve_id_t* @__work_group_reserve_read_pipe(%opencl.pipe_t* %{{.*}}, i32 {{.*}}, i32 4, i32 4)
+  // CHECK: call