[PATCH] D63756: [AMDGPU] Increased the number of implicit argument bytes for both OpenCL and HIP (CLANG).

2019-07-10 Thread Christudasan Devadasan via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL365643: [AMDGPU] Increased the number of implicit argument 
bytes for both OpenCL and… (authored by cdevadas, committed by ).
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

Changed prior to commit:
  https://reviews.llvm.org/D63756?vs=206377=208969#toc

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63756/new/

https://reviews.llvm.org/D63756

Files:
  cfe/trunk/lib/CodeGen/TargetInfo.cpp
  cfe/trunk/test/CodeGenCUDA/amdgpu-hip-implicit-kernarg.cu
  cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl


Index: cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl
===
--- cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl
+++ cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl
@@ -158,30 +158,30 @@
 // CHECK-NOT: "amdgpu-num-sgpr"="0"
 // CHECK-NOT: "amdgpu-num-vgpr"="0"
 
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" 
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_64_64]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="64,64" 
"amdgpu-implicitarg-num-bytes"="48" 
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_16_128]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="16,128" 
"amdgpu-implicitarg-num-bytes"="48" 
-// CHECK-DAG: attributes [[WAVES_PER_EU_2]] = { convergent noinline nounwind 
optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes [[WAVES_PER_EU_2_4]] = { convergent noinline nounwind 
optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-waves-per-eu"="2,4"
-// CHECK-DAG: attributes [[NUM_SGPR_32]] = { convergent noinline nounwind 
optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-sgpr"="32" 
-// CHECK-DAG: attributes [[NUM_VGPR_64]] = { convergent noinline nounwind 
optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-vgpr"="64" 
-
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2]] = { 
convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_4]] = { 
convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-waves-per-eu"="2,4"
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_NUM_SGPR_32]] = { 
convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-sgpr"="32" 
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_NUM_VGPR_64]] = { 
convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-vgpr"="64" 
-// CHECK-DAG: attributes [[WAVES_PER_EU_2_NUM_SGPR_32]] = { convergent 
noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48" 
"amdgpu-num-sgpr"="32" "amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes [[WAVES_PER_EU_2_NUM_VGPR_64]] = { convergent 
noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48" 
"amdgpu-num-vgpr"="64" "amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes [[WAVES_PER_EU_2_4_NUM_SGPR_32]] = { convergent 
noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48" 
"amdgpu-num-sgpr"="32" "amdgpu-waves-per-eu"="2,4"
-// CHECK-DAG: attributes [[WAVES_PER_EU_2_4_NUM_VGPR_64]] = { convergent 
noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48" 
"amdgpu-num-vgpr"="64" "amdgpu-waves-per-eu"="2,4"
-// CHECK-DAG: attributes [[NUM_SGPR_32_NUM_VGPR_64]] = { convergent noinline 
nounwind optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-sgpr"="32" 
"amdgpu-num-vgpr"="64" 
-
-// CHECK-DAG: attributes 
[[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_NUM_SGPR_32]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-sgpr"="32" 
"amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes 
[[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_NUM_VGPR_64]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-vgpr"="64" 
"amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes 
[[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_4_NUM_SGPR_32]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-sgpr"="32" 
"amdgpu-waves-per-eu"="2,4"
-// CHECK-DAG: attributes 
[[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_4_NUM_VGPR_64]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-vgpr"="64" 
"amdgpu-waves-per-eu"="2,4"
+// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64]] = { convergent 
noinline nounwind optnone 

[PATCH] D63756: [AMDGPU] Increased the number of implicit argument bytes for both OpenCL and HIP (CLANG).

2019-07-08 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl accepted this revision.
yaxunl added a comment.
This revision is now accepted and ready to land.

LGTM. Thanks.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63756/new/

https://reviews.llvm.org/D63756



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63756: [AMDGPU] Increased the number of implicit argument bytes for both OpenCL and HIP (CLANG).

2019-07-05 Thread Christudasan Devadasan via Phabricator via cfe-commits
cdevadas added a comment.

The backend changes, D63886  are in the 
upstream now with revision rL365217 .
Please review this patch.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63756/new/

https://reviews.llvm.org/D63756



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63756: [AMDGPU] Increased the number of implicit argument bytes for both OpenCL and HIP (CLANG).

2019-06-27 Thread Christudasan Devadasan via Phabricator via cfe-commits
cdevadas added a comment.

I have created the review for adding the metadata in the backend. 
https://reviews.llvm.org/D63886 
Marking the current review depended on D63886 .


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63756/new/

https://reviews.llvm.org/D63756



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63756: [AMDGPU] Increased the number of implicit argument bytes for both OpenCL and HIP (CLANG).

2019-06-25 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

In D63756#1557658 , @cdevadas wrote:

> Hi Sam,
>  The compiler generates metadata for the first 48 bytes. I compiled a sample 
> code and verified it. The backend does nothing for the extra bytes now.
>  I will soon submit the backend patch to generate the new metadata.


Something that adds ability to handle extra args should be in place at the same 
time or earlier than the knob that enables the feature.
Ordering of patches matters. You don't know at which revision users will check 
out the tree and we do want the tree to be buildable and functional at every 
revision.
Even if breaking the logical order appears to be benign in this case, it would 
still be better to commit them in order.

I propose to wait until your upcoming patch is up for review, add it as a 
dependency here and land patches in the right order.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63756/new/

https://reviews.llvm.org/D63756



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63756: [AMDGPU] Increased the number of implicit argument bytes for both OpenCL and HIP (CLANG).

2019-06-25 Thread Christudasan Devadasan via Phabricator via cfe-commits
cdevadas added a comment.

Hi Sam,
The compiler generates metadata for the first 48 bytes. I compiled a sample 
code and verified it. The backend does nothing for the extra bytes now.
I will soon submit the backend patch to generate the new metadata.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63756/new/

https://reviews.llvm.org/D63756



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63756: [AMDGPU] Increased the number of implicit argument bytes for both OpenCL and HIP (CLANG).

2019-06-25 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

can you try compile an empty HIP kernel and see what metadata is generated by 
backend?

If I remember correctly, backend generates kernel arg metadata based on the 
number of implicit kernel args. It knows how to handle 48 but I am not sure 
what will happen if it becomes 56.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63756/new/

https://reviews.llvm.org/D63756



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63756: [AMDGPU] Increased the number of implicit argument bytes for both OpenCL and HIP (CLANG).

2019-06-25 Thread Christudasan Devadasan via Phabricator via cfe-commits
cdevadas created this revision.
cdevadas added reviewers: yaxunl, b-sumner.
Herald added subscribers: cfe-commits, t-tye, Anastasia, tpr, dstuttard, 
nhaehnle, wdng, jvesely, kzhuravl.
Herald added a project: clang.

To enable a new implicit kernel argument, increasing the number of argument 
bytes from 48 to 56.


Repository:
  rC Clang

https://reviews.llvm.org/D63756

Files:
  lib/CodeGen/TargetInfo.cpp
  test/CodeGenCUDA/amdgpu-hip-implicit-kernarg.cu
  test/CodeGenOpenCL/amdgpu-attrs.cl


Index: test/CodeGenOpenCL/amdgpu-attrs.cl
===
--- test/CodeGenOpenCL/amdgpu-attrs.cl
+++ test/CodeGenOpenCL/amdgpu-attrs.cl
@@ -158,30 +158,30 @@
 // CHECK-NOT: "amdgpu-num-sgpr"="0"
 // CHECK-NOT: "amdgpu-num-vgpr"="0"
 
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" 
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_64_64]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="64,64" 
"amdgpu-implicitarg-num-bytes"="48" 
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_16_128]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="16,128" 
"amdgpu-implicitarg-num-bytes"="48" 
-// CHECK-DAG: attributes [[WAVES_PER_EU_2]] = { convergent noinline nounwind 
optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes [[WAVES_PER_EU_2_4]] = { convergent noinline nounwind 
optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-waves-per-eu"="2,4"
-// CHECK-DAG: attributes [[NUM_SGPR_32]] = { convergent noinline nounwind 
optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-sgpr"="32" 
-// CHECK-DAG: attributes [[NUM_VGPR_64]] = { convergent noinline nounwind 
optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-vgpr"="64" 
-
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2]] = { 
convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_4]] = { 
convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-waves-per-eu"="2,4"
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_NUM_SGPR_32]] = { 
convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-sgpr"="32" 
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_NUM_VGPR_64]] = { 
convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-vgpr"="64" 
-// CHECK-DAG: attributes [[WAVES_PER_EU_2_NUM_SGPR_32]] = { convergent 
noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48" 
"amdgpu-num-sgpr"="32" "amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes [[WAVES_PER_EU_2_NUM_VGPR_64]] = { convergent 
noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48" 
"amdgpu-num-vgpr"="64" "amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes [[WAVES_PER_EU_2_4_NUM_SGPR_32]] = { convergent 
noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48" 
"amdgpu-num-sgpr"="32" "amdgpu-waves-per-eu"="2,4"
-// CHECK-DAG: attributes [[WAVES_PER_EU_2_4_NUM_VGPR_64]] = { convergent 
noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48" 
"amdgpu-num-vgpr"="64" "amdgpu-waves-per-eu"="2,4"
-// CHECK-DAG: attributes [[NUM_SGPR_32_NUM_VGPR_64]] = { convergent noinline 
nounwind optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-sgpr"="32" 
"amdgpu-num-vgpr"="64" 
-
-// CHECK-DAG: attributes 
[[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_NUM_SGPR_32]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-sgpr"="32" 
"amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes 
[[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_NUM_VGPR_64]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-vgpr"="64" 
"amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes 
[[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_4_NUM_SGPR_32]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-sgpr"="32" 
"amdgpu-waves-per-eu"="2,4"
-// CHECK-DAG: attributes 
[[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_4_NUM_VGPR_64]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-vgpr"="64" 
"amdgpu-waves-per-eu"="2,4"
-
-// CHECK-DAG: attributes 
[[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_NUM_SGPR_32_NUM_VGPR_64]] = { 
convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-sgpr"="32" 
"amdgpu-num-vgpr"="64" "amdgpu-waves-per-eu"="2"
-//