[clang] [Clang][TableGen] Support specifying address space in clang builtin prototypes (PR #108497)
vikramRH wrote:

> > > > Gentle ping @AaronBallman, @philnik777, @fpetrogalli :)
> > >
> > > Ah, sorry -- because the PR is marked as a Draft, I figured it wasn't ready for review yet.
> > >
> > > I think I'd rather this was expressed differently; we already don't put attribute information in the prototype anyway (`noexcept` as an example), so I'd prefer to continue down that road and put the address space information into the `Attributes` field. e.g.,
> > >
> > > ```
> > > def BuiltinCPUIs : Builtin {
> > >   let Spellings = ["__builtin_cpu_is"];
> > >   let Attributes = [NoThrow, Const, AddressSpace<2>];
> > >   let Prototype = "bool(char const*)";
> > > }
> > > ```
> > >
> > > I think that makes it more clean in terms of specifying the attribute, and it also means we can name the address spaces in `BuiltinsBase.td` if we would like, which is even easier for folks to understand when reading `Builtins.td`
> > >
> > > WDYT?
> >
> > Thanks for the reply @AaronBallman. The reason this is still a draft is that I wanted it to be an initial proposal to get some inputs and a consensus on the final design. As for making it part of the "Attributes" field, one major issue is that the address space information should be per argument, including the return type. The "Attributes" field currently expresses attributes of the function. If an attribute in the prototype is not desired, perhaps a new field that lets us specify per-argument attributes makes sense?
>
> Oh! I hadn't realized this was needed on a per-parameter basis. Oof that makes this more awkward. I'd still love to avoid writing this as part of the signature; I think we could use the existing `IndexedAttribute` to specify which argument the attribute applies to. e.g.,
>
> ```
> class AddressSpace : IndexedAttribute<"something", Idx> {
>   int SpaceNum = AddrSpaceNum;
> }
> ```

Makes sense, I will give this a try and update the PR.

https://github.com/llvm/llvm-project/pull/108497
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][TableGen] Support specifying address space in clang builtin prototypes (PR #108497)
vikramRH wrote:

> > Gentle ping @AaronBallman, @philnik777, @fpetrogalli :)
>
> Ah, sorry -- because the PR is marked as a Draft, I figured it wasn't ready for review yet.
>
> I think I'd rather this was expressed differently; we already don't put attribute information in the prototype anyway (`noexcept` as an example), so I'd prefer to continue down that road and put the address space information into the `Attributes` field. e.g.,
>
> ```
> def BuiltinCPUIs : Builtin {
>   let Spellings = ["__builtin_cpu_is"];
>   let Attributes = [NoThrow, Const, AddressSpace<2>];
>   let Prototype = "bool(char const*)";
> }
> ```
>
> I think that makes it more clean in terms of specifying the attribute, and it also means we can name the address spaces in `BuiltinsBase.td` if we would like, which is even easier for folks to understand when reading `Builtins.td`
>
> WDYT?

Thanks for the reply @AaronBallman. The reason this is still a draft is that I wanted it to be an initial proposal to get some inputs and a consensus on the final design. As for making it part of the "Attributes" field, one major issue is that the address space information should be per argument, including the return type. The "Attributes" field currently expresses attributes of the function. If an attribute in the prototype is not desired, perhaps a new field that lets us specify per-argument attributes makes sense?

https://github.com/llvm/llvm-project/pull/108497
[clang] [Clang][TableGen] Support specifying address space in clang builtin prototypes (PR #108497)
vikramRH wrote: Gentle ping @AaronBallman, @philnik777, @fpetrogalli :) https://github.com/llvm/llvm-project/pull/108497
[clang] [Clang][TableGen] Support specifying address space in clang builtin prototypes (PR #108497)
https://github.com/vikramRH created https://github.com/llvm/llvm-project/pull/108497

This is a follow-up to the discussion in https://github.com/llvm/llvm-project/pull/86801 (apologies for the long delay...). This PR proposes a way to specify address spaces in builtin prototypes. The idea is to specify address space numbers using C++11 attribute list syntax (`[[]]`), with the following limitations:

1. The attribute `[[addrspace[n]]]` is strictly a prefix to the builtin type, i.e. something like the following is not accepted: `int* const [[addrspace[3]]]`.
2. I really wanted the syntax to be `[[addrspace(n)]]`, but the '(' token conflicts with the function signature, so the current approach is to use `[[addrspace[n]]]`.
3. The attribute is only valid with pointer and reference types (as per the restriction imposed by .def files).

I would like some views on this approach, and alternative suggestions if any. Also, please let me know if there are any parallel efforts towards this that I might not be aware of.

>From 6afc2e91d8877cc330f6e317a404a74990d9c607 Mon Sep 17 00:00:00 2001
From: vikhegde
Date: Wed, 4 Sep 2024 10:34:54 +
Subject: [PATCH] [clang][TableGen] Support specifying address space in clang builtin prototypes

---
 .../target-builtins-prototype-parser.td | 71 +++
 clang/utils/TableGen/ClangBuiltinsEmitter.cpp | 52 --
 2 files changed, 119 insertions(+), 4 deletions(-)

diff --git a/clang/test/TableGen/target-builtins-prototype-parser.td b/clang/test/TableGen/target-builtins-prototype-parser.td
index 555aebb3ccfb1f..dcff11046603ef 100644
--- a/clang/test/TableGen/target-builtins-prototype-parser.td
+++ b/clang/test/TableGen/target-builtins-prototype-parser.td
@@ -6,6 +6,12 @@
 // RUN: not clang-tblgen -I %p/../../include/ %s --gen-clang-builtins -DERROR_EXPECTED_B 2>&1 | FileCheck %s --check-prefix=ERROR_EXPECTED_B
 // RUN: not clang-tblgen -I %p/../../include/ %s --gen-clang-builtins -DERROR_EXPECTED_C 2>&1 | FileCheck %s --check-prefix=ERROR_EXPECTED_C
 // RUN: not clang-tblgen -I %p/../../include/ %s --gen-clang-builtins -DERROR_EXPECTED_D 2>&1 | FileCheck %s --check-prefix=ERROR_EXPECTED_D
+// RUN: not clang-tblgen -I %p/../../include/ %s --gen-clang-builtins -DERROR_EXPECTED_E 2>&1 | FileCheck %s --check-prefix=ERROR_EXPECTED_E
+// RUN: not clang-tblgen -I %p/../../include/ %s --gen-clang-builtins -DERROR_EXPECTED_F 2>&1 | FileCheck %s --check-prefix=ERROR_EXPECTED_F
+// RUN: not clang-tblgen -I %p/../../include/ %s --gen-clang-builtins -DERROR_EXPECTED_G 2>&1 | FileCheck %s --check-prefix=ERROR_EXPECTED_G
+// RUN: not clang-tblgen -I %p/../../include/ %s --gen-clang-builtins -DERROR_EXPECTED_H 2>&1 | FileCheck %s --check-prefix=ERROR_EXPECTED_H
+// RUN: not clang-tblgen -I %p/../../include/ %s --gen-clang-builtins -DERROR_EXPECTED_I 2>&1 | FileCheck %s --check-prefix=ERROR_EXPECTED_I
+// RUN: not clang-tblgen -I %p/../../include/ %s --gen-clang-builtins -DERROR_EXPECTED_J 2>&1 | FileCheck %s --check-prefix=ERROR_EXPECTED_J
 
 include "clang/Basic/BuiltinsBase.td"
 
@@ -113,3 +119,68 @@ def : Builtin {
 }
 #endif
 
+def : Builtin {
+// CHECK: BUILTIN(__builtin_test_addrspace_attribute_01, "di*3", "")
+  let Prototype = "double( [[addrspace[3]]] int*)";
+  let Spellings = ["__builtin_test_addrspace_attribute_01"];
+}
+
+def : Builtin {
+// CHECK: BUILTIN(__builtin_test_addrspace_attribute_02, "Ii*5i*d", "")
+  let Prototype = "_Constant [[addrspace[5]]] int* (int*, double)";
+  let Spellings = ["__builtin_test_addrspace_attribute_02"];
+}
+
+def : Builtin {
+// CHECK: BUILTIN(__builtin_test_addrspace_attribute_04, "Ii&4id*7", "")
+  let Prototype = "_Constant [[addrspace[4]]] int& (int , [[addrspace[7]]] double*)";
+  let Spellings = ["__builtin_test_addrspace_attribute_04"];
+}
+
+#ifdef ERROR_EXPECTED_E
+def : Builtin {
+// ERROR_EXPECTED_E: :[[# @LINE + 1]]:7: error: Expected opening bracket '[' after 'addrspace'
+  let Prototype = "_Constant [[addrspace]] int& (int , double*)";
+  let Spellings = ["__builtin_test_addrspace_attribute_04"];
+}
+#endif
+
+#ifdef ERROR_EXPECTED_F
+def : Builtin {
+// ERROR_EXPECTED_F: :[[# @LINE + 1]]:7: error: Address space attribute can only be specified with a pointer or reference type
+  let Prototype = "_Constant [[addrspace[4]]] int (int , double*)";
+  let Spellings = ["__builtin_test_addrspace_attribute_04"];
+}
+#endif
+
+#ifdef ERROR_EXPECTED_G
+def : Builtin {
+// ERROR_EXPECTED_G: :[[# @LINE + 1]]:7: error: Expecetd valid integer for 'addrspace' attribute
+  let Prototype = "_Constant [[addrspace[k]]] int* (int , double*)";
+  let Spellings = ["__builtin_test_addrspace_attribute_04"];
+}
+#endif
+
+#ifdef ERROR_EXPECTED_H
+def : Builtin {
+// ERROR_EXPECTED_H: :[[# @LINE + 1]]:7: error: Expected closing bracket ']' after address space specification
+  let Prototype = "_Constant [[addrspace[6 int* (int , double*)";
+  let Spellings = ["__builtin_test_addrspace_attribute_04"];
+}
+#endif
+
+#ifdef ERROR_EXP
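
As a rough illustration of the prefix rule described in the PR text (this is not the code from `ClangBuiltinsEmitter.cpp`, which is not shown in this message), the sketch below parses a leading `[[addrspace[N]]]` and appends the number after the `*` in the encoded string, matching the `"di*3"` shape of the CHECK lines. `AddrSpacePrefix`, `parseAddrSpacePrefix` and `encodeIntPointer` are hypothetical names, and the pointer-or-reference restriction is deliberately left out.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result type: the parsed address space and the remaining text.
struct AddrSpacePrefix {
  unsigned Space;
  std::string_view Rest;
};

// Hypothetical helper: parses a leading "[[addrspace[N]]]" prefix.
// Returns std::nullopt when the prefix is absent or malformed
// (missing '[', non-integer N, or missing closing brackets).
inline std::optional<AddrSpacePrefix> parseAddrSpacePrefix(std::string_view T) {
  constexpr std::string_view Open = "[[addrspace[";
  constexpr std::string_view Close = "]]]";
  while (!T.empty() && T.front() == ' ')
    T.remove_prefix(1);
  if (T.substr(0, Open.size()) != Open)
    return std::nullopt;
  T.remove_prefix(Open.size());

  unsigned N = 0;
  std::size_t Digits = 0;
  while (Digits < T.size() &&
         std::isdigit(static_cast<unsigned char>(T[Digits]))) {
    N = N * 10 + static_cast<unsigned>(T[Digits] - '0');
    ++Digits;
  }
  if (Digits == 0)
    return std::nullopt; // e.g. "[[addrspace[k]]]"
  T.remove_prefix(Digits);

  if (T.substr(0, Close.size()) != Close)
    return std::nullopt; // e.g. "[[addrspace[6 int*"
  T.remove_prefix(Close.size());
  while (!T.empty() && T.front() == ' ')
    T.remove_prefix(1);
  return AddrSpacePrefix{N, T};
}

// Encodes an int pointer in the style of the CHECK lines: "i*" plus an
// optional address space number.
inline std::string encodeIntPointer(std::optional<unsigned> Space) {
  std::string Out = "i*";
  if (Space)
    Out += std::to_string(*Space);
  return Out;
}
```

With these assumptions, `" [[addrspace[3]]] int*"` yields address space 3 and the remainder `int*`, which would then encode as `i*3`.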
[clang] [clang][ExprConst] allow single element access of vector object to be constant expression (PR #72607)
vikramRH wrote: Closing this, since it's handled via https://github.com/llvm/llvm-project/pull/101126 https://github.com/llvm/llvm-project/pull/72607
[clang] [clang][ExprConst] allow single element access of vector object to be constant expression (PR #72607)
https://github.com/vikramRH closed https://github.com/llvm/llvm-project/pull/72607
[clang] [clang][ExprConst] allow single element access of vector object to be constant expression (PR #101126)
vikramRH wrote:

> @vikramRH Do you need someone else to merge this for you?

Sorry for the delay, merged.

https://github.com/llvm/llvm-project/pull/101126
[clang] [clang][ExprConst] allow single element access of vector object to be constant expression (PR #101126)
https://github.com/vikramRH closed https://github.com/llvm/llvm-project/pull/101126
[clang] [clang][ExprConst] allow single element access of vector object to be constant expression (PR #101126)
vikramRH wrote:

### Merge activity

* **Aug 7, 6:38 AM EDT**: @vikramRH started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/101126).

https://github.com/llvm/llvm-project/pull/101126
[clang] [clang][ExprConst] allow single element access of vector object to be constant expression (PR #101126)
@@ -64,6 +64,9 @@ sections with improvements to Clang's support for those languages.
 C++ Language Changes
+- Allow single element access of vector object to be constant expression.

vikramRH wrote:

done

https://github.com/llvm/llvm-project/pull/101126
[clang] [clang][ExprConst] allow single element access of vector object to be constant expression (PR #101126)
@@ -3,40 +3,40 @@ typedef int __attribute__((vector_size(16))) VI4;
 constexpr VI4 A = {1,2,3,4};
-static_assert(A[0] == 1, ""); // ref-error {{not an integral constant expression}}
-static_assert(A[1] == 2, ""); // ref-error {{not an integral constant expression}}
-static_assert(A[2] == 3, ""); // ref-error {{not an integral constant expression}}
-static_assert(A[3] == 4, ""); // ref-error {{not an integral constant expression}}
+static_assert(A[0] == 1, "");
+static_assert(A[1] == 2, "");
+static_assert(A[2] == 3, "");
+static_assert(A[3] == 4, "");
 /// FIXME: It would be nice if the note said 'vector' instead of 'array'.
-static_assert(A[12] == 4, ""); // ref-error {{not an integral constant expression}} \
-                               // expected-error {{not an integral constant expression}} \
-                               // expected-note {{cannot refer to element 12 of array of 4 elements in a constant expression}}
+static_assert(A[12] == 4, ""); // both-error {{not an integral constant expression}} \
+                               // expected-note {{cannot refer to element 12 of array of 4 elements in a constant expression}} \
+                               // ref-note {{read of dereferenced one-past-the-end pointer is not allowed in a constant expression}}

vikramRH wrote:

Done, let me know if the updated changes are okay.

https://github.com/llvm/llvm-project/pull/101126
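
For readers who want to try the behaviour under review, here is a small sketch built on the same `VI4` typedef as the test. The `static_assert` lines only compile on a compiler that treats a single vector element access as a constant expression, so they sit behind a hypothetical opt-in macro (`HYPOTHETICAL_CONSTEXPR_VECTOR_SUBSCRIPT`, not a real feature-test macro). `elementAt` is likewise an illustrative helper for the runtime path, which relies only on the GNU vector extension.

```cpp
#include <cassert>

// Same vector typedef as in clang/test/AST/Interp/vectors.cpp.
typedef int __attribute__((vector_size(16))) VI4;

constexpr VI4 A = {1, 2, 3, 4};

// Hypothetical opt-in macro: define it only when the compiler accepts
// vector element access in constant expressions.
#ifdef HYPOTHETICAL_CONSTEXPR_VECTOR_SUBSCRIPT
static_assert(A[0] == 1, "");
static_assert(A[3] == 4, "");
#endif

// Runtime element access; the caller must keep I within [0, 4).
inline int elementAt(int I) {
  assert(I >= 0 && I < 4 && "index out of range");
  return A[I];
}
```

An out-of-range index such as `A[12]` is what the diagnostics in the diff above are about; the runtime helper guards against it with an assertion instead.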
[clang] [clang][ExprConst] allow single element access of vector object to be constant expression (PR #101126)
https://github.com/vikramRH updated https://github.com/llvm/llvm-project/pull/101126

>From 690901f2370381285afa7cf7c2f7401d89e568f6 Mon Sep 17 00:00:00 2001
From: Vikram
Date: Mon, 29 Jul 2024 08:56:07 -0400
Subject: [PATCH 1/2] [clang][ExprConst] allow single element access of vector object to be constant expression

---
 clang/docs/ReleaseNotes.rst | 3 +
 clang/lib/AST/ExprConstant.cpp | 102 +-
 clang/lib/AST/Interp/State.h | 3 +-
 clang/test/AST/Interp/builtin-functions.cpp | 26 ++---
 clang/test/AST/Interp/vectors.cpp | 50 -
 clang/test/CodeGenCXX/temporaries.cpp | 41 ---
 .../constexpr-vectors-access-elements.cpp | 29 +
 7 files changed, 190 insertions(+), 64 deletions(-)
 create mode 100644 clang/test/SemaCXX/constexpr-vectors-access-elements.cpp

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index ddad083571eb1..2179aaea12387 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -64,6 +64,9 @@ sections with improvements to Clang's support for those languages.
 C++ Language Changes
+- Allow single element access of vector object to be constant expression.
+  Supports the `V.xyzw` syntax and other tidbits as seen in OpenCL.
+  Selecting multiple elements is left as a future work.
 C++17 Feature Support
 ^
diff --git a/clang/lib/AST/ExprConstant.cpp b/clang/lib/AST/ExprConstant.cpp
index 558e20ed3e423..08f49ac896153 100644
--- a/clang/lib/AST/ExprConstant.cpp
+++ b/clang/lib/AST/ExprConstant.cpp
@@ -222,6 +222,11 @@ namespace {
         ArraySize = 2;
         MostDerivedLength = I + 1;
         IsArray = true;
+      } else if (const auto *VT = Type->getAs<VectorType>()) {
+        Type = VT->getElementType();
+        ArraySize = VT->getNumElements();
+        MostDerivedLength = I + 1;
+        IsArray = true;
       } else if (const FieldDecl *FD = getAsField(Path[I])) {
         Type = FD->getType();
         ArraySize = 0;
@@ -268,7 +273,6 @@ namespace {
     /// If the current array is an unsized array, the value of this is
     /// undefined.
     uint64_t MostDerivedArraySize;
-
     /// The type of the most derived object referred to by this address.
     QualType MostDerivedType;
 
@@ -442,6 +446,16 @@ namespace {
       MostDerivedArraySize = 2;
       MostDerivedPathLength = Entries.size();
     }
+
+    void addVectorElementUnchecked(QualType EltTy, uint64_t Size,
+                                   uint64_t Idx) {
+      Entries.push_back(PathEntry::ArrayIndex(Idx));
+      MostDerivedType = EltTy;
+      MostDerivedPathLength = Entries.size();
+      MostDerivedArraySize = 0;
+      MostDerivedIsArrayElement = false;
+    }
+
     void diagnoseUnsizedArrayPointerArithmetic(EvalInfo &Info, const Expr *E);
     void diagnosePointerArithmetic(EvalInfo &Info, const Expr *E,
                                    const APSInt &N);
@@ -1737,6 +1751,11 @@ namespace {
       if (checkSubobject(Info, E, Imag ? CSK_Imag : CSK_Real))
         Designator.addComplexUnchecked(EltTy, Imag);
     }
+    void addVectorElement(EvalInfo &Info, const Expr *E, QualType EltTy,
+                          uint64_t Size, uint64_t Idx) {
+      if (checkSubobject(Info, E, CSK_VectorElement))
+        Designator.addVectorElementUnchecked(EltTy, Size, Idx);
+    }
     void clearIsNullPointer() {
       IsNullPtr = false;
     }
@@ -3310,6 +3329,19 @@ static bool HandleLValueComplexElement(EvalInfo &Info, const Expr *E,
   return true;
 }
 
+static bool HandleLValueVectorElement(EvalInfo &Info, const Expr *E,
+                                      LValue &LVal, QualType EltTy,
+                                      uint64_t Size, uint64_t Idx) {
+  if (Idx) {
+    CharUnits SizeOfElement;
+    if (!HandleSizeof(Info, E->getExprLoc(), EltTy, SizeOfElement))
+      return false;
+    LVal.Offset += SizeOfElement * Idx;
+  }
+  LVal.addVectorElement(Info, E, EltTy, Size, Idx);
+  return true;
+}
+
 /// Try to evaluate the initializer for a variable declaration.
 ///
 /// \param Info Information about the ongoing evaluation.
@@ -3855,6 +3887,19 @@ findSubobject(EvalInfo &Info, const Expr *E, const CompleteObject &Obj,
         return handler.found(Index ? O->getComplexFloatImag()
                                    : O->getComplexFloatReal(), ObjType);
       }
+    } else if (const auto *VT = ObjType->getAs<VectorType>()) {
+      uint64_t Index = Sub.Entries[I].getAsArrayIndex();
+      if (Index >= VT->getNumElements()) {
+        if (Info.getLangOpts().CPlusPlus11)
+          Info.FFDiag(E, diag::note_constexpr_access_past_end)
+              << handler.AccessKind;
+        else
+          Info.FFDiag(E);
+        return handler.failed();
+      }
+      ObjType = VT->getElementType();
+      assert(I == N - 1 && "extracting subobject of scalar?");
+      return handler.found(O->getVectorElt(Index), ObjType);
     } else if
[clang] [clang][ExprConst] allow single element access of vector object to be constant expression (PR #101126)
@@ -3,40 +3,40 @@ typedef int __attribute__((vector_size(16))) VI4;
 constexpr VI4 A = {1,2,3,4};
-static_assert(A[0] == 1, ""); // ref-error {{not an integral constant expression}}
-static_assert(A[1] == 2, ""); // ref-error {{not an integral constant expression}}
-static_assert(A[2] == 3, ""); // ref-error {{not an integral constant expression}}
-static_assert(A[3] == 4, ""); // ref-error {{not an integral constant expression}}
+static_assert(A[0] == 1, "");
+static_assert(A[1] == 2, "");
+static_assert(A[2] == 3, "");
+static_assert(A[3] == 4, "");
 /// FIXME: It would be nice if the note said 'vector' instead of 'array'.
-static_assert(A[12] == 4, ""); // ref-error {{not an integral constant expression}} \
-                               // expected-error {{not an integral constant expression}} \
-                               // expected-note {{cannot refer to element 12 of array of 4 elements in a constant expression}}
+static_assert(A[12] == 4, ""); // both-error {{not an integral constant expression}} \
+                               // expected-note {{cannot refer to element 12 of array of 4 elements in a constant expression}} \
+                               // ref-note {{read of dereferenced one-past-the-end pointer is not allowed in a constant expression}}

vikramRH wrote:

I just kept the original version of the PR, but the message "cannot refer to element 12 of array of 4 elements" seems correct here. I shall update this.

https://github.com/llvm/llvm-project/pull/101126
[clang] [clang][ExprConst] allow single element access of vector object to be constant expression (PR #101126)
@@ -442,6 +446,16 @@ namespace {
       MostDerivedArraySize = 2;
       MostDerivedPathLength = Entries.size();
     }
+
+    void addVectorElementUnchecked(QualType EltTy, uint64_t Size,
+                                   uint64_t Idx) {
+      Entries.push_back(PathEntry::ArrayIndex(Idx));

vikramRH wrote:

Yes, I thought about having a new accessor along the lines of "vectorIndex", but all it seems to achieve is adding a new API that does exactly the same thing as the array one (other than perhaps adding a new meaning to the PathEntry value). I will update it if you feel this makes sense.

https://github.com/llvm/llvm-project/pull/101126
[clang] [clang][ExprConst] allow single element access of vector object to be constant expression (PR #101126)
https://github.com/vikramRH ready_for_review https://github.com/llvm/llvm-project/pull/101126
[clang] [clang][ExprConst] allow single element access of vector object to be constant expression (PR #101126)
https://github.com/vikramRH edited https://github.com/llvm/llvm-project/pull/101126
[clang] [clang][ExprConst] allow single element access of vector object to be constant expression (PR #101126)
vikramRH wrote:

* **#101126** 👈
* `main`

This stack of pull requests is managed by Graphite. Learn more about stacking: https://stacking.dev/?utm_source=stack-comment

Join @vikramRH and the rest of your teammates on Graphite: https://graphite.dev?utm-source=stack-comment

https://github.com/llvm/llvm-project/pull/101126
[clang] [clang][ExprConst] allow single element access of vector object to be constant expression (PR #101126)
https://github.com/vikramRH created https://github.com/llvm/llvm-project/pull/101126

None

>From 690901f2370381285afa7cf7c2f7401d89e568f6 Mon Sep 17 00:00:00 2001
From: Vikram
Date: Mon, 29 Jul 2024 08:56:07 -0400
Subject: [PATCH] [clang][ExprConst] allow single element access of vector object to be constant expression

---
 clang/docs/ReleaseNotes.rst | 3 +
 clang/lib/AST/ExprConstant.cpp | 102 +-
 clang/lib/AST/Interp/State.h | 3 +-
 clang/test/AST/Interp/builtin-functions.cpp | 26 ++---
 clang/test/AST/Interp/vectors.cpp | 50 -
 clang/test/CodeGenCXX/temporaries.cpp | 41 ---
 .../constexpr-vectors-access-elements.cpp | 29 +
 7 files changed, 190 insertions(+), 64 deletions(-)
 create mode 100644 clang/test/SemaCXX/constexpr-vectors-access-elements.cpp

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index ddad083571eb1..2179aaea12387 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -64,6 +64,9 @@ sections with improvements to Clang's support for those languages.
 C++ Language Changes
+- Allow single element access of vector object to be constant expression.
+  Supports the `V.xyzw` syntax and other tidbits as seen in OpenCL.
+  Selecting multiple elements is left as a future work.
 C++17 Feature Support
 ^
diff --git a/clang/lib/AST/ExprConstant.cpp b/clang/lib/AST/ExprConstant.cpp
index 558e20ed3e423..08f49ac896153 100644
--- a/clang/lib/AST/ExprConstant.cpp
+++ b/clang/lib/AST/ExprConstant.cpp
@@ -222,6 +222,11 @@ namespace {
         ArraySize = 2;
         MostDerivedLength = I + 1;
         IsArray = true;
+      } else if (const auto *VT = Type->getAs<VectorType>()) {
+        Type = VT->getElementType();
+        ArraySize = VT->getNumElements();
+        MostDerivedLength = I + 1;
+        IsArray = true;
       } else if (const FieldDecl *FD = getAsField(Path[I])) {
         Type = FD->getType();
         ArraySize = 0;
@@ -268,7 +273,6 @@ namespace {
     /// If the current array is an unsized array, the value of this is
     /// undefined.
     uint64_t MostDerivedArraySize;
-
     /// The type of the most derived object referred to by this address.
     QualType MostDerivedType;
 
@@ -442,6 +446,16 @@ namespace {
       MostDerivedArraySize = 2;
       MostDerivedPathLength = Entries.size();
     }
+
+    void addVectorElementUnchecked(QualType EltTy, uint64_t Size,
+                                   uint64_t Idx) {
+      Entries.push_back(PathEntry::ArrayIndex(Idx));
+      MostDerivedType = EltTy;
+      MostDerivedPathLength = Entries.size();
+      MostDerivedArraySize = 0;
+      MostDerivedIsArrayElement = false;
+    }
+
     void diagnoseUnsizedArrayPointerArithmetic(EvalInfo &Info, const Expr *E);
     void diagnosePointerArithmetic(EvalInfo &Info, const Expr *E,
                                    const APSInt &N);
@@ -1737,6 +1751,11 @@ namespace {
       if (checkSubobject(Info, E, Imag ? CSK_Imag : CSK_Real))
         Designator.addComplexUnchecked(EltTy, Imag);
     }
+    void addVectorElement(EvalInfo &Info, const Expr *E, QualType EltTy,
+                          uint64_t Size, uint64_t Idx) {
+      if (checkSubobject(Info, E, CSK_VectorElement))
+        Designator.addVectorElementUnchecked(EltTy, Size, Idx);
+    }
     void clearIsNullPointer() {
       IsNullPtr = false;
     }
@@ -3310,6 +3329,19 @@ static bool HandleLValueComplexElement(EvalInfo &Info, const Expr *E,
   return true;
 }
 
+static bool HandleLValueVectorElement(EvalInfo &Info, const Expr *E,
+                                      LValue &LVal, QualType EltTy,
+                                      uint64_t Size, uint64_t Idx) {
+  if (Idx) {
+    CharUnits SizeOfElement;
+    if (!HandleSizeof(Info, E->getExprLoc(), EltTy, SizeOfElement))
+      return false;
+    LVal.Offset += SizeOfElement * Idx;
+  }
+  LVal.addVectorElement(Info, E, EltTy, Size, Idx);
+  return true;
+}
+
 /// Try to evaluate the initializer for a variable declaration.
 ///
 /// \param Info Information about the ongoing evaluation.
@@ -3855,6 +3887,19 @@ findSubobject(EvalInfo &Info, const Expr *E, const CompleteObject &Obj,
         return handler.found(Index ? O->getComplexFloatImag()
                                    : O->getComplexFloatReal(), ObjType);
       }
+    } else if (const auto *VT = ObjType->getAs<VectorType>()) {
+      uint64_t Index = Sub.Entries[I].getAsArrayIndex();
+      if (Index >= VT->getNumElements()) {
+        if (Info.getLangOpts().CPlusPlus11)
+          Info.FFDiag(E, diag::note_constexpr_access_past_end)
+              << handler.AccessKind;
+        else
+          Info.FFDiag(E);
+        return handler.failed();
+      }
+      ObjType = VT->getElementType();
+      assert(I == N - 1 && "extracting subobject of scalar?");
+      return handler.found(O->getVectorElt(Index), ObjType);
     } else
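
The two pieces of arithmetic in the patch above, the byte offset added in `HandleLValueVectorElement` and the bounds check in `findSubobject`, can be modelled outside of Clang. The following is only a standalone sketch: `vectorElementOffset` and `readVectorElement` are hypothetical names, `std::vector` stands in for the evaluated vector value, and an empty `std::optional` stands in for `handler.failed()`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Mirrors "LVal.Offset += SizeOfElement * Idx": the byte offset of element
// Idx from the start of the vector object.
inline std::uint64_t vectorElementOffset(std::uint64_t EltSizeInBytes,
                                         std::uint64_t Idx) {
  return EltSizeInBytes * Idx;
}

// Mirrors the "Index >= VT->getNumElements()" check: reading past the end
// fails instead of producing a value.
template <typename T>
std::optional<T> readVectorElement(const std::vector<T> &Elts,
                                   std::uint64_t Index) {
  if (Index >= Elts.size())
    return std::nullopt; // corresponds to the access-past-end diagnostic
  return Elts[Index];
}
```

For a four-element vector of 4-byte integers, element 3 sits at byte offset 12, while index 12 is rejected, which matches the `A[12]` case in the updated test.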
[clang] [llvm] [AMDGPU] Enable atomic optimizer for 64 bit divergent values (PR #96473)
vikramRH wrote: Closing this in favour of https://github.com/llvm/llvm-project/pull/96933 and https://github.com/llvm/llvm-project/pull/96934 https://github.com/llvm/llvm-project/pull/96473
[clang] [llvm] [AMDGPU] Enable atomic optimizer for 64 bit divergent values (PR #96473)
https://github.com/vikramRH closed https://github.com/llvm/llvm-project/pull/96473
[clang] [llvm] [AMDGPU] Enable atomic optimizer for 64 bit divergent values (PR #96473)
vikramRH wrote: Apologies for the commit spam here; Graphite seems like a good option from now on. All dependent patches have now landed, so the diff here is up to date. https://github.com/llvm/llvm-project/pull/96473
[clang] [llvm] [AMDGPU] Enable atomic optimizer for 64 bit divergent values (PR #96473)
@@ -228,10 +228,11 @@ void AMDGPUAtomicOptimizerImpl::visitAtomicRMWInst(AtomicRMWInst &I) {
   // If the value operand is divergent, each lane is contributing a different
   // value to the atomic calculation. We can only optimize divergent values if
-  // we have DPP available on our subtarget, and the atomic operation is 32
-  // bits.
+  // we have DPP available on our subtarget, and the atomic operation is either
+  // 32 or 64 bits.
   if (ValDivergent &&
-      (!ST->hasDPP() || DL->getTypeSizeInBits(I.getType()) != 32)) {
+      (!ST->hasDPP() || (DL->getTypeSizeInBits(I.getType()) != 32 &&
+                         DL->getTypeSizeInBits(I.getType()) != 64))) {

vikramRH wrote:

Done

https://github.com/llvm/llvm-project/pull/96473
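
The updated condition is easy to misread because of the nested negations, so here it is restated as a plain predicate. This is a sketch only: `rejectsDivergentValue` is a hypothetical name, and its parameters stand in for `ValDivergent`, `ST->hasDPP()` and `DL->getTypeSizeInBits(I.getType())` from the hunk.

```cpp
#include <cassert>

// Returns true when the optimizer would bail out on this atomic because of
// the divergent-value rule alone: the value is divergent and either DPP is
// unavailable or the operation is neither 32 nor 64 bits wide.
inline bool rejectsDivergentValue(bool ValDivergent, bool HasDPP,
                                  unsigned SizeInBits) {
  return ValDivergent &&
         (!HasDPP || (SizeInBits != 32 && SizeInBits != 64));
}
```

Under this reading, a divergent 64-bit value on a DPP-capable subtarget is no longer rejected by this check, while a divergent 16-bit value still is.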
[clang] [llvm] [AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)
https://github.com/vikramRH closed https://github.com/llvm/llvm-project/pull/92725
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/vikramRH closed https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU] Enable atomic optimizer for 64 bit divergent values (PR #96473)
https://github.com/vikramRH edited https://github.com/llvm/llvm-project/pull/96473
[clang] [llvm] [AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)
https://github.com/vikramRH ready_for_review https://github.com/llvm/llvm-project/pull/92725
[clang] [llvm] [AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)
https://github.com/vikramRH edited https://github.com/llvm/llvm-project/pull/92725
[clang] [clang][ExprConst] allow single element access of vector object to be constant expression (PR #72607)
vikramRH wrote:

> Hello @vikramRH, please feel free to commandeer this.

Thanks @yuanfang-chen. Also, clang already rejects expressions like &V[0] (https://godbolt.org/z/eGcxzGo66), which is also true with constexprs and this PR. What's the specific concern here?

https://github.com/llvm/llvm-project/pull/72607
[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)
vikramRH wrote: Updated this PR to be in sync with #89217. However, the plan is still to land this only after the changes in #89217 are accepted. https://github.com/llvm/llvm-project/pull/92725
[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)
@@ -18479,6 +18479,28 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
         CGM.getIntrinsic(Intrinsic::amdgcn_update_dpp, Args[0]->getType());
     return Builder.CreateCall(F, Args);
   }
+  case AMDGPU::BI__builtin_amdgcn_permlane16:
+  case AMDGPU::BI__builtin_amdgcn_permlanex16: {
+    llvm::Value *Src0 = EmitScalarExpr(E->getArg(0));

vikramRH wrote:

Added a new helper.

https://github.com/llvm/llvm-project/pull/92725
[clang] [clang][ExprConst] allow single element access of vector object to be constant expression (PR #72607)
vikramRH wrote: @yuanfang-chen, @AaronBallman, @shafik, are we still actively looking into this? (I would be willing to commandeer this if it's not high on your priority list.) https://github.com/llvm/llvm-project/pull/72607
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -0,0 +1,65 @@
+; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s
+
+; CHECK-LABEL: name:basic_readfirstlane_i64
+; CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL_ANCHOR

vikramRH wrote:

Makes sense, updated.

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/vikramRH edited https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -0,0 +1,65 @@
+; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s
+
+; CHECK-LABEL: name:basic_readfirstlane_i64
+; CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL_ANCHOR

vikramRH wrote:

This is a preexisting error, and the failure is further down the pipeline (after SGPR allocation now, I guess). Does it make sense to mark it as XFAIL for now rather than stopping after isel?

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -0,0 +1,65 @@
+; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s
+
+; CHECK-LABEL: name:basic_readfirstlane_i64
+; CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL_ANCHOR

vikramRH wrote:

I currently see a machine verifier failure that is not related to this patch. Here is an i32 example with trunc: https://godbolt.org/z/he8asMe77. The same failure shows up with the wider-type legalizations that we do now, so I cannot integrate these with the existing tests just yet. Am I missing something here?

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
vikramRH wrote:

> That's another option. The only real plus to the intermediate is it's
> slightly less annoying to write combines for. But there are limited combining
> opportunities for these

We now legalize to intrinsics directly. The SDAG lowering uses a new helper to unroll vector cases while also handling convergence tokens.

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
vikramRH wrote:

> > > > @jayfoad's testcase fails and the same test should be repeated for all
> > > > 3 intrinsics
> > >
> > > added MIR tests for 3 intrinsics. The issue is that Im not able to attach
> > > the glue nodes to newly created laneop pieces since they fail at
> > > selection. #87509 should enable this,
> >
> > I am not really comfortable waiting for #87509 to fix convergence tokens in
> > this expansion. Is it really true that this expansion cannot be fixed
> > independent of future work on `CONVERGENCE_GLUE`? There is no way to
> > manually handle the same glue operands??
>
> I guess one way would be to have custom selection for each of the new node
> type introduced, but would this be a proper way forward ? (this would be in
> general for all convergent SDNodes i guess if selection is not made generic)

Or should we drop the new nodes altogether and legalize to intrinsics directly?

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
vikramRH wrote:

> > > @jayfoad's testcase fails and the same test should be repeated for all 3
> > > intrinsics
> >
> > added MIR tests for 3 intrinsics. The issue is that Im not able to attach
> > the glue nodes to newly created laneop pieces since they fail at selection.
> > #87509 should enable this,
>
> I am not really comfortable waiting for #87509 to fix convergence tokens in
> this expansion. Is it really true that this expansion cannot be fixed
> independent of future work on `CONVERGENCE_GLUE`? There is no way to manually
> handle the same glue operands??

I guess one way would be to have custom selection for each of the newly introduced node types, but would that be a proper way forward? (If selection is not made generic, I guess this would be needed for all convergent SDNodes in general.)

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -0,0 +1,46 @@ +# RUN: not --crash llc -mtriple=amdgcn -run-pass=none -verify-machineinstrs -o /dev/null %s 2>&1 | FileCheck %s vikramRH wrote: Okay, I'll update with IR's https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
vikramRH wrote:

> @jayfoad's testcase fails and the same test should be repeated for all 3
> intrinsics

Added MIR tests for all 3 intrinsics. The issue is that I'm not able to attach the glue nodes to the newly created laneop pieces, since they fail at selection. https://github.com/llvm/llvm-project/pull/87509 should enable this.

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
vikramRH wrote:

> You should add the mentioned convergence-tokens.ll test function

Added the test in a separate test file.

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5496,6 +5496,9 @@ const char* AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const { NODE_NAME_CASE(LDS) NODE_NAME_CASE(FPTRUNC_ROUND_UPWARD) NODE_NAME_CASE(FPTRUNC_ROUND_DOWNWARD) + NODE_NAME_CASE(READLANE) + NODE_NAME_CASE(READFIRSTLANE) vikramRH wrote: done https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5461,8 +5461,7 @@ bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper, SmallVector PartialRes; unsigned NumParts = Size / 32; - MachineInstrBuilder Src0Parts, Src2Parts; - Src0Parts = B.buildUnmerge(PartialResTy, Src0); + MachineInstrBuilder Src0Parts = B.buildUnmerge(PartialResTy, Src0), Src2Parts; vikramRH wrote: Done https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -1170,6 +1170,23 @@ The AMDGPU backend implements the following LLVM IR intrinsics. :ref:`llvm.set.fpenv` Sets the floating point environment to the specifies state. + llvm.amdgcn.readfirstlaneProvides direct access to v_readfirstlane_b32. Returns the value in + the lowest active lane of the input operand. Currently implemented + for i16, i32, float, half, bf16, <2 x i16>, <2 x half>, <2 x bfloat>, vikramRH wrote: done https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -6086,6 +6086,63 @@ static SDValue lowerBALLOTIntrinsic(const SITargetLowering &TLI, SDNode *N, DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE)); } +static SDValue lowerLaneOp(const SITargetLowering &TLI, SDNode *N, + SelectionDAG &DAG) { + EVT VT = N->getValueType(0); + unsigned ValSize = VT.getSizeInBits(); + unsigned IntrinsicID = N->getConstantOperandVal(0); + SDValue Src0 = N->getOperand(1); + SDLoc SL(N); + MVT IntVT = MVT::getIntegerVT(ValSize); + + auto createLaneOp = [&DAG, &SL](SDValue Src0, SDValue Src1, SDValue Src2, + MVT VT) -> SDValue { +return (Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, Src2}) +: Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1}) + : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0})); + }; + + SDValue Src1, Src2; + if (IntrinsicID == Intrinsic::amdgcn_readlane || + IntrinsicID == Intrinsic::amdgcn_writelane) { +Src1 = N->getOperand(2); +if (IntrinsicID == Intrinsic::amdgcn_writelane) + Src2 = N->getOperand(3); + } + + if (ValSize == 32) { +// Already legal +return SDValue(); + } + + if (ValSize < 32) { +bool IsFloat = VT.isFloatingPoint(); +Src0 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src0) : Src0, +SL, MVT::i32); +if (Src2.getNode()) { + Src2 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src2) : Src2, + SL, MVT::i32); +} +SDValue LaneOp = createLaneOp(Src0, Src1, Src2, MVT::i32); +SDValue Trunc = DAG.getAnyExtOrTrunc(LaneOp, SL, IntVT); +return IsFloat ? DAG.getBitcast(VT, Trunc) : Trunc; + } + + if ((ValSize % 32) == 0) { +MVT VecVT = MVT::getVectorVT(MVT::i32, ValSize / 32); vikramRH wrote: Updated https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -1170,6 +1170,23 @@ The AMDGPU backend implements the following LLVM IR intrinsics. :ref:`llvm.set.fpenv` Sets the floating point environment to the specifies state. + llvm.amdgcn.readfirstlaneProvides direct access to v_readfirstlane_b32. Returns the value in + the lowest active lane of the input operand. Currently + implemented for i16, i32, float, half, bf16, v2i16, v2f16 and types + whose sizes are multiples of 32-bit. + + llvm.amdgcn.readlane Provides direct access to v_readlane_b32. Returns the value in the + specified lane of the first input operand. The second operand + specifies the lane to read from. Currently implemented + for i16, i32, float, half, bf16, v2i16, v2f16 and types whose sizes vikramRH wrote: Updated https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5387,6 +5387,124 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper, return true; } +// TODO: Fix pointer type handling +bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper, + MachineInstr &MI, + Intrinsic::ID IID) const { + + MachineIRBuilder &B = Helper.MIRBuilder; + MachineRegisterInfo &MRI = *B.getMRI(); + + Register DstReg = MI.getOperand(0).getReg(); + Register Src0 = MI.getOperand(2).getReg(); + + auto createLaneOp = [&](Register Src0, Register Src1, + Register Src2) -> Register { +auto LaneOp = B.buildIntrinsic(IID, {S32}).addUse(Src0); +switch (IID) { +case Intrinsic::amdgcn_readfirstlane: + return LaneOp.getReg(0); +case Intrinsic::amdgcn_readlane: + return LaneOp.addUse(Src1).getReg(0); +case Intrinsic::amdgcn_writelane: + return LaneOp.addUse(Src1).addUse(Src2).getReg(0); +default: + llvm_unreachable("unhandled lane op"); +} + }; + + Register Src1, Src2; + if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) { +Src1 = MI.getOperand(3).getReg(); +if (IID == Intrinsic::amdgcn_writelane) { + Src2 = MI.getOperand(4).getReg(); +} + } + + LLT Ty = MRI.getType(DstReg); + unsigned Size = Ty.getSizeInBits(); + + if (Size == 32) { +// Already legal +return true; + } + + if (Size < 32) { +Src0 = B.buildAnyExt(S32, Src0).getReg(0); +if (Src2.isValid()) + Src2 = B.buildAnyExt(LLT::scalar(32), Src2).getReg(0); + +Register LaneOpDst = createLaneOp(Src0, Src1, Src2); +B.buildTrunc(DstReg, LaneOpDst); + +MI.eraseFromParent(); +return true; + } + + if ((Size % 32) == 0) { +SmallVector PartialRes; +unsigned NumParts = Size / 32; +LLT PartialResTy = +Ty.isVector() && Ty.getElementType() == S16 ? V2S16 : S32; vikramRH wrote: Done https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
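The hunk above dispatches on the result size in three steps: 32-bit values are already legal, narrower values are any-extended to 32 bits and truncated afterwards, and multiples of 32 bits are split into 32-bit pieces. The following is only a minimal standalone sketch of that size dispatch, not the LLVM code itself. The names `LaneOpAction`, `classifyLaneOpSize` and `numLaneOpParts` are hypothetical, and the `Unsupported` result is an assumption, since the quoted hunk is cut off before its final fallthrough.

```cpp
#include <cassert>

// Hypothetical summary of what the legalizer does for a given result size.
enum class LaneOpAction {
  Legal,               // exactly 32 bits: nothing to do
  WidenTo32,           // narrower than 32 bits: anyext, lane op, trunc
  SplitInto32BitParts, // multiple of 32 bits: one lane op per 32-bit piece
  Unsupported          // assumption: anything else is not handled
};

// Mirrors the order of the size checks in the quoted hunk.
inline LaneOpAction classifyLaneOpSize(unsigned SizeInBits) {
  assert(SizeInBits > 0 && "value must have a non-zero size");
  if (SizeInBits == 32)
    return LaneOpAction::Legal;
  if (SizeInBits < 32)
    return LaneOpAction::WidenTo32;
  if (SizeInBits % 32 == 0)
    return LaneOpAction::SplitInto32BitParts;
  return LaneOpAction::Unsupported;
}

// Number of 32-bit lane ops needed for a splittable size (Size / 32 in the hunk).
inline unsigned numLaneOpParts(unsigned SizeInBits) {
  assert(SizeInBits % 32 == 0 && "only whole 32-bit pieces can be split");
  return SizeInBits / 32;
}
```

For example, under these assumptions an i16 would be widened, an i64 would be split into two pieces, and a 48-bit type would fall into the unsupported bucket.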
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -6086,6 +6086,63 @@ static SDValue lowerBALLOTIntrinsic(const SITargetLowering &TLI, SDNode *N, DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE)); } +static SDValue lowerLaneOp(const SITargetLowering &TLI, SDNode *N, + SelectionDAG &DAG) { + EVT VT = N->getValueType(0); + unsigned ValSize = VT.getSizeInBits(); + unsigned IntrinsicID = N->getConstantOperandVal(0); + SDValue Src0 = N->getOperand(1); + SDLoc SL(N); + MVT IntVT = MVT::getIntegerVT(ValSize); + + auto createLaneOp = [&DAG, &SL](SDValue Src0, SDValue Src1, SDValue Src2, + MVT VT) -> SDValue { +return (Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, Src2}) +: Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1}) + : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0})); + }; + + SDValue Src1, Src2; + if (IntrinsicID == Intrinsic::amdgcn_readlane || + IntrinsicID == Intrinsic::amdgcn_writelane) { +Src1 = N->getOperand(2); +if (IntrinsicID == Intrinsic::amdgcn_writelane) + Src2 = N->getOperand(3); + } + + if (ValSize == 32) { +// Already legal +return SDValue(); + } + + if (ValSize < 32) { +bool IsFloat = VT.isFloatingPoint(); +Src0 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src0) : Src0, +SL, MVT::i32); +if (Src2.getNode()) { + Src2 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src2) : Src2, + SL, MVT::i32); +} +SDValue LaneOp = createLaneOp(Src0, Src1, Src2, MVT::i32); +SDValue Trunc = DAG.getAnyExtOrTrunc(LaneOp, SL, IntVT); +return IsFloat ? DAG.getBitcast(VT, Trunc) : Trunc; + } + + if ((ValSize % 32) == 0) { +MVT VecVT = MVT::getVectorVT(MVT::i32, ValSize / 32); vikramRH wrote: Understood. Thanks ! https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/vikramRH edited https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)
vikramRH wrote:

1. Added/updated tests for permlanex16, permlane64.
2. This needs https://github.com/llvm/llvm-project/pull/89217 to land first so that only incremental changes can be reviewed.

https://github.com/llvm/llvm-project/pull/92725
[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)
https://github.com/vikramRH edited https://github.com/llvm/llvm-project/pull/92725 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -6086,6 +6086,63 @@ static SDValue lowerBALLOTIntrinsic(const SITargetLowering &TLI, SDNode *N,
       DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE));
 }
 
+static SDValue lowerLaneOp(const SITargetLowering &TLI, SDNode *N,
+                           SelectionDAG &DAG) {
+  EVT VT = N->getValueType(0);
+  unsigned ValSize = VT.getSizeInBits();
+  unsigned IntrinsicID = N->getConstantOperandVal(0);
+  SDValue Src0 = N->getOperand(1);
+  SDLoc SL(N);
+  MVT IntVT = MVT::getIntegerVT(ValSize);
+
+  auto createLaneOp = [&DAG, &SL](SDValue Src0, SDValue Src1, SDValue Src2,
+                                  MVT VT) -> SDValue {
+    return (Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, Src2})
+            : Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1})
+                   : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0}));
+  };
+
+  SDValue Src1, Src2;
+  if (IntrinsicID == Intrinsic::amdgcn_readlane ||
+      IntrinsicID == Intrinsic::amdgcn_writelane) {
+    Src1 = N->getOperand(2);
+    if (IntrinsicID == Intrinsic::amdgcn_writelane)
+      Src2 = N->getOperand(3);
+  }
+
+  if (ValSize == 32) {
+    // Already legal
+    return SDValue();
+  }
+
+  if (ValSize < 32) {
+    bool IsFloat = VT.isFloatingPoint();
+    Src0 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src0) : Src0,
+                                SL, MVT::i32);
+    if (Src2.getNode()) {
+      Src2 = DAG.getAnyExtOrTrunc(IsFloat ? DAG.getBitcast(IntVT, Src2) : Src2,
+                                  SL, MVT::i32);
+    }
+    SDValue LaneOp = createLaneOp(Src0, Src1, Src2, MVT::i32);
+    SDValue Trunc = DAG.getAnyExtOrTrunc(LaneOp, SL, IntVT);
+    return IsFloat ? DAG.getBitcast(VT, Trunc) : Trunc;
+  }
+
+  if ((ValSize % 32) == 0) {
+    MVT VecVT = MVT::getVectorVT(MVT::i32, ValSize / 32);
+    Src0 = DAG.getBitcast(VecVT, Src0);
+
+    if (Src2.getNode())
+      Src2 = DAG.getBitcast(VecVT, Src2);
+
+    SDValue LaneOp = createLaneOp(Src0, Src1, Src2, VecVT);
+    SDValue UnrolledLaneOp = DAG.UnrollVectorOp(LaneOp.getNode());
+    return DAG.getBitcast(VT, UnrolledLaneOp);

vikramRH wrote:

```suggestion
    MVT LaneOpT =
        VT.isVector() && VT.getVectorElementType().getSizeInBits() == 16
            ? MVT::v2i16
            : MVT::i32;
    SDValue Src0SubReg, Src2SubReg;
    SmallVector LaneOps;
    LaneOps.push_back(DAG.getTargetConstant(
        TLI.getRegClassFor(VT.getSimpleVT(), N->isDivergent())->getID(), SL,
        MVT::i32));
    for (unsigned i = 0; i < (ValSize / 32); i++) {
      unsigned SubRegIdx = SIRegisterInfo::getSubRegFromChannel(i);
      Src0SubReg = DAG.getTargetExtractSubreg(SubRegIdx, SL, LaneOpT, Src0);
      if (Src2)
        Src2SubReg = DAG.getTargetExtractSubreg(SubRegIdx, SL, LaneOpT, Src2);
      LaneOps.push_back(createLaneOp(Src0SubReg, Src1, Src2SubReg, LaneOpT));
      LaneOps.push_back(DAG.getTargetConstant(SubRegIdx, SL, MVT::i32));
    }
    return SDValue(
        DAG.getMachineNode(TargetOpcode::REG_SEQUENCE, SL, VT, LaneOps), 0);
```

@arsenm, @jayfoad, here is an alternate idea that is much closer in logic to the GISel implementation and doesn't rely on bitcasts. How does this look?

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)
@@ -5433,7 +5450,16 @@ bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper, ? Src0 : B.buildBitcast(LLT::scalar(Size), Src0).getReg(0); Src0 = B.buildAnyExt(S32, Src0Cast).getReg(0); -if (Src2.isValid()) { + +if (IsPermLane16) { + Register Src1Cast = + MRI.getType(Src1).isScalar() + ? Src1 + : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0); vikramRH wrote: Yes, I will take over the changes from https://github.com/llvm/llvm-project/pull/89217 once finalized, https://github.com/llvm/llvm-project/pull/92725 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)
@@ -18479,6 +18479,25 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, CGM.getIntrinsic(Intrinsic::amdgcn_update_dpp, Args[0]->getType()); return Builder.CreateCall(F, Args); } + case AMDGPU::BI__builtin_amdgcn_permlane16: + case AMDGPU::BI__builtin_amdgcn_permlanex16: { +Intrinsic::ID IID; +IID = BuiltinID == AMDGPU::BI__builtin_amdgcn_permlane16 + ? Intrinsic::amdgcn_permlane16 + : Intrinsic::amdgcn_permlanex16; + +llvm::Value *Src0 = EmitScalarExpr(E->getArg(0)); +llvm::Value *Src1 = EmitScalarExpr(E->getArg(1)); +llvm::Value *Src2 = EmitScalarExpr(E->getArg(2)); vikramRH wrote: yes https://github.com/llvm/llvm-project/pull/92725 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5456,43 +5444,32 @@ bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper, if ((Size % 32) == 0) { SmallVector PartialRes; unsigned NumParts = Size / 32; -auto IsS16Vec = Ty.isVector() && Ty.getElementType() == S16; +bool IsS16Vec = Ty.isVector() && Ty.getElementType() == S16; vikramRH wrote: done https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/vikramRH edited https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
vikramRH wrote:

> > 1. What's the proper way to legalize f16 and bf16 for SDAG case without
> > bitcasts ? (I would think "fp_extend -> LaneOp -> Fptrunc" is wrong)
>
> Bitcast to i16, anyext to i32, laneop, trunc to i16, bitcast to original type.
>
> Why wouldn't you use bitcasts?

Just a doubt I had about the previous comments, sorry for the noise!

https://github.com/llvm/llvm-project/pull/89217
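The sequence quoted above (bitcast to i16, anyext to i32, lane op, trunc to i16, bitcast back) moves raw bits only, so the 16-bit pattern arrives unchanged regardless of whether it encodes a half, a bf16 or an integer. A value conversion such as fp_extend followed by fptrunc is not guaranteed to preserve every bit pattern (NaN payloads are the usual concern). The sketch below models this on plain integers under loud assumptions: the wavefront is a hypothetical 4-lane array, and `readLane32` and `readLane16` are illustrative stand-ins, not LLVM or hardware APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a wavefront: one 32-bit register value per lane.
constexpr std::size_t kNumLanes = 4;
using Wave32 = std::array<std::uint32_t, kNumLanes>;
using Wave16 = std::array<std::uint16_t, kNumLanes>;

// Stand-in for a native 32-bit readlane: returns the value held by `lane`.
inline std::uint32_t readLane32(const Wave32 &wave, std::size_t lane) {
  assert(lane < kNumLanes && "lane index out of range");
  return wave[lane];
}

// 16-bit readlane built on top of the 32-bit one. Each input element is the
// raw bit pattern of an i16/half/bf16 value (the "bitcast to i16" step).
inline std::uint16_t readLane16(const Wave16 &bits, std::size_t lane) {
  Wave32 widened{};
  for (std::size_t i = 0; i < kNumLanes; ++i)
    widened[i] = bits[i]; // anyext to i32; the high bits are don't-care
  std::uint32_t result = readLane32(widened, lane);
  // trunc to i16; the caller reinterprets the bits as the original type
  return static_cast<std::uint16_t>(result);
}
```

In this model a lane holding `0x7D01` (a signaling-NaN pattern in IEEE half) comes back as exactly `0x7D01`, which is the property the bitcast-based route is meant to keep.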
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
vikramRH wrote:

Updated the GISel legalizer. I still have a couple of questions for the SDAG case, though:

1. What's the proper way to legalize f16 and bf16 for the SDAG case without bitcasts? (I would think "fp_extend -> LaneOp -> Fptrunc" is wrong.)
2. For scalar cases such as i64, f64, i128, ... (i.e. 32-bit multiples), I guess a bitcast to vectors (v2i32, v2f32, v4i32) is unavoidable, since "UnrollVectorOp" wouldn't work otherwise. Any alternate suggestions here?

https://github.com/llvm/llvm-project/pull/89217
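The second question concerns values whose size is a multiple of 32 bits. Whatever the DAG-level representation ends up being, the underlying idea is to cut the value into 32-bit pieces, run one 32-bit lane op per piece with the same lane index, and reassemble the pieces in order. The following is a rough model of that idea on a 64-bit integer. The 4-lane wavefront and the helpers `laneRead32` and `laneRead64` are hypothetical names introduced for illustration only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical wavefront width for the model.
constexpr std::size_t kLanes = 4;

// Stand-in for a native 32-bit readlane.
inline std::uint32_t laneRead32(const std::array<std::uint32_t, kLanes> &wave,
                                std::size_t lane) {
  assert(lane < kLanes && "lane index out of range");
  return wave[lane];
}

// 64-bit readlane expressed as two 32-bit readlanes on the low and high halves.
inline std::uint64_t laneRead64(const std::array<std::uint64_t, kLanes> &wave,
                                std::size_t lane) {
  std::array<std::uint32_t, kLanes> lo{};
  std::array<std::uint32_t, kLanes> hi{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    lo[i] = static_cast<std::uint32_t>(wave[i]);       // piece 0 (bits 31..0)
    hi[i] = static_cast<std::uint32_t>(wave[i] >> 32); // piece 1 (bits 63..32)
  }
  // Both pieces use the same lane index, then get merged back in order.
  std::uint64_t loRes = laneRead32(lo, lane);
  std::uint64_t hiRes = laneRead32(hi, lane);
  return (hiRes << 32) | loRes;
}
```

The same pattern extends to i128 with four pieces; for f64 the caller would first reinterpret the value as a 64-bit integer, which matches the bit-preserving approach used for the narrow types.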
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5387,6 +5387,192 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper, return true; } +bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper, + MachineInstr &MI, + Intrinsic::ID IID) const { + + MachineIRBuilder &B = Helper.MIRBuilder; + MachineRegisterInfo &MRI = *B.getMRI(); + + Register DstReg = MI.getOperand(0).getReg(); + Register Src0 = MI.getOperand(2).getReg(); + + auto createLaneOp = [&](Register Src0, Register Src1, + Register Src2) -> Register { +auto LaneOp = B.buildIntrinsic(IID, {S32}).addUse(Src0); +switch (IID) { +case Intrinsic::amdgcn_readfirstlane: + return LaneOp.getReg(0); +case Intrinsic::amdgcn_readlane: + return LaneOp.addUse(Src1).getReg(0); +case Intrinsic::amdgcn_writelane: + return LaneOp.addUse(Src1).addUse(Src2).getReg(0); +default: + llvm_unreachable("unhandled lane op"); +} + }; + + Register Src1, Src2; + if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) { +Src1 = MI.getOperand(3).getReg(); +if (IID == Intrinsic::amdgcn_writelane) { + Src2 = MI.getOperand(4).getReg(); +} + } + + LLT Ty = MRI.getType(DstReg); + unsigned Size = Ty.getSizeInBits(); + + if (Size == 32) { +// Already legal +return true; + } + + if (Size < 32) { +Register Src0Cast = MRI.getType(Src0).isScalar() +? Src0 +: B.buildBitcast(LLT::scalar(Size), Src0).getReg(0); +Src0 = B.buildAnyExt(S32, Src0Cast).getReg(0); +if (Src2.isValid()) { + Register Src2Cast = + MRI.getType(Src2).isScalar() + ? 
Src2 + : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0); + Src2 = B.buildAnyExt(LLT::scalar(32), Src2Cast).getReg(0); +} + +Register LaneOpDst = createLaneOp(Src0, Src1, Src2); +if (Ty.isScalar()) + B.buildTrunc(DstReg, LaneOpDst); +else { + auto Trunc = B.buildTrunc(LLT::scalar(Size), LaneOpDst); + B.buildBitcast(DstReg, Trunc); +} + +MI.eraseFromParent(); +return true; + } + + if ((Size % 32) == 0) { +SmallVector PartialRes; +unsigned NumParts = Size / 32; +auto IsS16Vec = Ty.isVector() && Ty.getElementType() == S16; +MachineInstrBuilder Src0Parts; + +if (Ty.isPointer()) { + auto PtrToInt = B.buildPtrToInt(LLT::scalar(Size), Src0); + Src0Parts = B.buildUnmerge(S32, PtrToInt); +} else if (Ty.isPointerVector()) { + LLT IntVecTy = Ty.changeElementType( + LLT::scalar(Ty.getElementType().getSizeInBits())); + auto PtrToInt = B.buildPtrToInt(IntVecTy, Src0); + Src0Parts = B.buildUnmerge(S32, PtrToInt); +} else + Src0Parts = + IsS16Vec ? B.buildUnmerge(V2S16, Src0) : B.buildUnmerge(S32, Src0); + +switch (IID) { +case Intrinsic::amdgcn_readlane: { + Register Src1 = MI.getOperand(3).getReg(); + for (unsigned i = 0; i < NumParts; ++i) { +Src0 = IsS16Vec ? B.buildBitcast(S32, Src0Parts.getReg(i)).getReg(0) +: Src0Parts.getReg(i); +PartialRes.push_back( +(B.buildIntrinsic(Intrinsic::amdgcn_readlane, {S32}) + .addUse(Src0) + .addUse(Src1)) +.getReg(0)); + } + break; +} +case Intrinsic::amdgcn_readfirstlane: { + for (unsigned i = 0; i < NumParts; ++i) { +Src0 = IsS16Vec ? 
B.buildBitcast(S32, Src0Parts.getReg(i)).getReg(0) +: Src0Parts.getReg(i); +PartialRes.push_back( +(B.buildIntrinsic(Intrinsic::amdgcn_readfirstlane, {S32}) + .addUse(Src0) + .getReg(0))); + } + + break; +} +case Intrinsic::amdgcn_writelane: { + Register Src1 = MI.getOperand(3).getReg(); + Register Src2 = MI.getOperand(4).getReg(); + MachineInstrBuilder Src2Parts; + + if (Ty.isPointer()) { +auto PtrToInt = B.buildPtrToInt(S64, Src2); +Src2Parts = B.buildUnmerge(S32, PtrToInt); + } else if (Ty.isPointerVector()) { +LLT IntVecTy = Ty.changeElementType( +LLT::scalar(Ty.getElementType().getSizeInBits())); +auto PtrToInt = B.buildPtrToInt(IntVecTy, Src2); +Src2Parts = B.buildUnmerge(S32, PtrToInt); + } else +Src2Parts = +IsS16Vec ? B.buildUnmerge(V2S16, Src2) : B.buildUnmerge(S32, Src2); vikramRH wrote: done https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/vikramRH edited https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -6086,6 +6086,62 @@ static SDValue lowerBALLOTIntrinsic(const SITargetLowering &TLI, SDNode *N, DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE)); } +static SDValue lowerLaneOp(const SITargetLowering &TLI, SDNode *N, + SelectionDAG &DAG) { + EVT VT = N->getValueType(0); + unsigned ValSize = VT.getSizeInBits(); + unsigned IntrinsicID = N->getConstantOperandVal(0); + SDValue Src0 = N->getOperand(1); + SDLoc SL(N); + MVT IntVT = MVT::getIntegerVT(ValSize); + + auto createLaneOp = [&DAG, &SL](SDValue Src0, SDValue Src1, SDValue Src2, + MVT VT) -> SDValue { +return (Src2 ? DAG.getNode(AMDGPUISD::WRITELANE, SL, VT, {Src0, Src1, Src2}) +: Src1 ? DAG.getNode(AMDGPUISD::READLANE, SL, VT, {Src0, Src1}) + : DAG.getNode(AMDGPUISD::READFIRSTLANE, SL, VT, {Src0})); + }; + + SDValue Src1, Src2; + if (IntrinsicID == Intrinsic::amdgcn_readlane || + IntrinsicID == Intrinsic::amdgcn_writelane) { +Src1 = N->getOperand(2); +if (IntrinsicID == Intrinsic::amdgcn_writelane) + Src2 = N->getOperand(3); + } + + if (ValSize == 32) { +// Already legal +return SDValue(); + } + + if (ValSize < 32) { +SDValue InitBitCast = DAG.getBitcast(IntVT, Src0); +Src0 = DAG.getAnyExtOrTrunc(InitBitCast, SL, MVT::i32); +if (Src2.getNode()) { + SDValue Src2Cast = DAG.getBitcast(IntVT, Src2); vikramRH wrote: What would be the proper way to legalize f16 and bf16 for SDAG case without bitcasts ? (Im currently thinking "fp_extend -> LaneOp -> Fptrunc" which seems wrong) https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)
https://github.com/vikramRH edited https://github.com/llvm/llvm-project/pull/92725 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/vikramRH edited https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -243,11 +243,16 @@ def VOP_READFIRSTLANE : VOPProfile <[i32, i32, untyped, untyped]> { // FIXME: Specify SchedRW for READFIRSTLANE_B32 // TODO: There is VOP3 encoding also def V_READFIRSTLANE_B32 : VOP1_Pseudo <"v_readfirstlane_b32", VOP_READFIRSTLANE, - getVOP1Pat.ret, 1> { + [], 1> { let isConvergent = 1; } +foreach vt = Reg32Types.types in { + def : GCNPat<(vt (AMDGPUreadfirstlane (vt VRegOrLdsSrc_32:$src0))), +(V_READFIRSTLANE_B32 (vt VRegOrLdsSrc_32:$src0)) vikramRH wrote: Done https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -243,11 +243,16 @@ def VOP_READFIRSTLANE : VOPProfile <[i32, i32, untyped, untyped]> {
 // FIXME: Specify SchedRW for READFIRSTLANE_B32
 // TODO: There is VOP3 encoding also
 def V_READFIRSTLANE_B32 : VOP1_Pseudo <"v_readfirstlane_b32", VOP_READFIRSTLANE,
-  getVOP1Pat.ret, 1> {
+  [], 1> {
   let isConvergent = 1;
 }
 
+foreach vt = Reg32Types.types in {
+  def : GCNPat<(vt (AMDGPUreadfirstlane (vt VRegOrLdsSrc_32:$src0))),
+    (V_READFIRSTLANE_B32 (vt VRegOrLdsSrc_32:$src0))

vikramRH wrote:

Do you think these changes are okay until I figure out the root cause?

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/vikramRH edited https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -342,6 +342,22 @@ def AMDGPUfdot2_impl : SDNode<"AMDGPUISD::FDOT2",
 def AMDGPUperm_impl : SDNode<"AMDGPUISD::PERM", AMDGPUDTIntTernaryOp, []>;
 
+def AMDGPUReadfirstlaneOp : SDTypeProfile<1, 1, [
+  SDTCisSameAs<0, 1>
+]>;
+
+def AMDGPUReadlaneOp : SDTypeProfile<1, 2, [
+  SDTCisSameAs<0, 1>, SDTCisInt<2>
+]>;
+
+def AMDGPUDWritelaneOp : SDTypeProfile<1, 3, [
+  SDTCisSameAs<1, 1>, SDTCisInt<2>, SDTCisSameAs<0, 3>,

vikramRH wrote:

Thanks for pointing this out; I missed updating this in the latest version. It is updated now, however the issue is not related to this.

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/vikramRH edited https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -243,11 +243,16 @@ def VOP_READFIRSTLANE : VOPProfile <[i32, i32, untyped, untyped]> {
 // FIXME: Specify SchedRW for READFIRSTLANE_B32
 // TODO: There is VOP3 encoding also
 def V_READFIRSTLANE_B32 : VOP1_Pseudo <"v_readfirstlane_b32", VOP_READFIRSTLANE,
-  getVOP1Pat.ret, 1> {
+  [], 1> {
   let isConvergent = 1;
 }
 
+foreach vt = Reg32Types.types in {
+  def : GCNPat<(vt (AMDGPUreadfirstlane (vt VRegOrLdsSrc_32:$src0))),
+    (V_READFIRSTLANE_B32 (vt VRegOrLdsSrc_32:$src0))

vikramRH wrote:

Attaching example match table snippets for v2i16 and p3 here, which should make the scenario a bit clearer. For v2i16:

```
GIM_Try, /*On fail goto*//*Label 3499*/ GIMT_Encode4(202699), // Rule ID 2117 //
  GIM_CheckIntrinsicID, /*MI*/0, /*Op*/1, GIMT_Encode2(Intrinsic::amdgcn_writelane),
  GIM_RootCheckType, /*Op*/0, /*Type*/GILLT_v2s16,
  GIM_RootCheckType, /*Op*/2, /*Type*/GILLT_v2s16,
  GIM_RootCheckType, /*Op*/3, /*Type*/GILLT_s32,
  GIM_RootCheckType, /*Op*/4, /*Type*/GILLT_v2s16,
  GIM_RootCheckRegBankForClass, /*Op*/0, /*RC*/GIMT_Encode2(AMDGPU::VGPR_32RegClassID),
  // (intrinsic_wo_chain:{ *:[v2i16] } 2863:{ *:[iPTR] }, v2i16:{ *:[v2i16] }:$src0, i32:{ *:[i32] }:$src1, v2i16:{ *:[v2i16] }:$src2) => (V_WRITELANE_B32:{ *:[v2i16] } SCSrc_b32:{ *:[v2i16] }:$src0, SCSrc_b32:{ *:[i32] }:$src1, VGPR_32:{ *:[v2i16] }:$src2)
  GIR_BuildRootMI, /*Opcode*/GIMT_Encode2(AMDGPU::V_WRITELANE_B32),
```

and for p3:

```
GIM_Try, /*On fail goto*//*Label 3502*/ GIMT_Encode4(202816), // Rule ID 2129 //
  GIM_CheckIntrinsicID, /*MI*/0, /*Op*/1, GIMT_Encode2(Intrinsic::amdgcn_writelane),
  GIM_RootCheckType, /*Op*/0, /*Type*/GILLT_s32,
  GIM_RootCheckType, /*Op*/2, /*Type*/GILLT_p2s32,
  GIM_RootCheckType, /*Op*/3, /*Type*/GILLT_s32,
  GIM_RootCheckType, /*Op*/4, /*Type*/GILLT_p2s32,
  GIM_RootCheckRegBankForClass, /*Op*/0, /*RC*/GIMT_Encode2(AMDGPU::VGPR_32RegClassID),
  // (intrinsic_wo_chain:{ *:[i32] } 2863:{ *:[iPTR] }, p2:{ *:[i32] }:$src0, i32:{ *:[i32] }:$src1, p2:{ *:[i32] }:$src2) => (V_WRITELANE_B32:{ *:[i32] } SCSrc_b32:{ *:[i32] }:$src0, SCSrc_b32:{ *:[i32] }:$src1, VGPR_32:{ *:[i32] }:$src2)
  GIR_BuildRootMI, /*Opcode*/GIMT_Encode2(AMDGPU::V_WRITELANE_B32),
```

The destination type check for the p3 case is still for "GILLT_s32".

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -243,11 +243,16 @@ def VOP_READFIRSTLANE : VOPProfile <[i32, i32, untyped, untyped]> {
 // FIXME: Specify SchedRW for READFIRSTLANE_B32
 // TODO: There is VOP3 encoding also
 def V_READFIRSTLANE_B32 : VOP1_Pseudo <"v_readfirstlane_b32", VOP_READFIRSTLANE,
-  getVOP1Pat.ret, 1> {
+  [], 1> {
   let isConvergent = 1;
 }
 
+foreach vt = Reg32Types.types in {
+  def : GCNPat<(vt (AMDGPUreadfirstlane (vt VRegOrLdsSrc_32:$src0))),
+    (V_READFIRSTLANE_B32 (vt VRegOrLdsSrc_32:$src0))

vikramRH wrote:

Unfortunately no. I had tried this and a couple of other variations. The issue seems to be specific to GISel pointers.

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/vikramRH edited https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5387,6 +5387,212 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper, return true; } +bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper, + MachineInstr &MI, + Intrinsic::ID IID) const { + + MachineIRBuilder &B = Helper.MIRBuilder; + MachineRegisterInfo &MRI = *B.getMRI(); + + Register DstReg = MI.getOperand(0).getReg(); + Register Src0 = MI.getOperand(2).getReg(); + + auto createLaneOp = [&](Register Src0, Register Src1, + Register Src2) -> Register { +auto LaneOp = B.buildIntrinsic(IID, {S32}).addUse(Src0); +switch (IID) { +case Intrinsic::amdgcn_readfirstlane: + return LaneOp.getReg(0); +case Intrinsic::amdgcn_readlane: + return LaneOp.addUse(Src1).getReg(0); +case Intrinsic::amdgcn_writelane: + return LaneOp.addUse(Src1).addUse(Src2).getReg(0); +default: + llvm_unreachable("unhandled lane op"); +} + }; + + Register Src1, Src2; + if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) { +Src1 = MI.getOperand(3).getReg(); +if (IID == Intrinsic::amdgcn_writelane) { + Src2 = MI.getOperand(4).getReg(); +} + } + + LLT Ty = MRI.getType(DstReg); + unsigned Size = Ty.getSizeInBits(); + + if (Size == 32) { +if (Ty.isScalar()) + // Already legal vikramRH wrote: Also the issue is only for pointer types, float, v2i16 etc work just fine https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/vikramRH edited https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5387,6 +5387,212 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper, return true; } +bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper, + MachineInstr &MI, + Intrinsic::ID IID) const { + + MachineIRBuilder &B = Helper.MIRBuilder; + MachineRegisterInfo &MRI = *B.getMRI(); + + Register DstReg = MI.getOperand(0).getReg(); + Register Src0 = MI.getOperand(2).getReg(); + + auto createLaneOp = [&](Register Src0, Register Src1, + Register Src2) -> Register { +auto LaneOp = B.buildIntrinsic(IID, {S32}).addUse(Src0); +switch (IID) { +case Intrinsic::amdgcn_readfirstlane: + return LaneOp.getReg(0); +case Intrinsic::amdgcn_readlane: + return LaneOp.addUse(Src1).getReg(0); +case Intrinsic::amdgcn_writelane: + return LaneOp.addUse(Src1).addUse(Src2).getReg(0); +default: + llvm_unreachable("unhandled lane op"); +} + }; + + Register Src1, Src2; + if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) { +Src1 = MI.getOperand(3).getReg(); +if (IID == Intrinsic::amdgcn_writelane) { + Src2 = MI.getOperand(4).getReg(); +} + } + + LLT Ty = MRI.getType(DstReg); + unsigned Size = Ty.getSizeInBits(); + + if (Size == 32) { +if (Ty.isScalar()) + // Already legal vikramRH wrote: Done except for pointers. I currently see an issue where pattern type inference somehow deduces destination type to scalars (instead of say LLT_ p3s32). not currently sure why , any ideas ? https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][ExprConst] allow single element access of vector object to be constant expression (PR #72607)
vikramRH wrote: @yuanfang-chen, any plans to continue with this PR? https://github.com/llvm/llvm-project/pull/72607
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
vikramRH wrote: Added new tests for 32-bit pointers and <8 x i16>. https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5386,6 +5386,153 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper, return true; } +bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper, + MachineInstr &MI, + Intrinsic::ID IID) const { + + MachineIRBuilder &B = Helper.MIRBuilder; + MachineRegisterInfo &MRI = *B.getMRI(); + + Register DstReg = MI.getOperand(0).getReg(); + Register Src0 = MI.getOperand(2).getReg(); + + Register Src1, Src2; + if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) { +Src1 = MI.getOperand(3).getReg(); +if (IID == Intrinsic::amdgcn_writelane) { + Src2 = MI.getOperand(4).getReg(); +} + } + + LLT Ty = MRI.getType(DstReg); + unsigned Size = Ty.getSizeInBits(); + + if (Size == 32) { +if (Ty.isScalar()) + // Already legal + return true; + +Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0); +MachineInstrBuilder LaneOpDst; +switch (IID) { +case Intrinsic::amdgcn_readfirstlane: { + LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid); + break; +} +case Intrinsic::amdgcn_readlane: { + LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid).addUse(Src1); + break; +} +case Intrinsic::amdgcn_writelane: { + Register Src2Valid = B.buildBitcast(S32, Src2).getReg(0); + LaneOpDst = B.buildIntrinsic(IID, {S32}) + .addUse(Src0Valid) + .addUse(Src1) + .addUse(Src2Valid); +} +} + +Register LaneOpDstReg = LaneOpDst.getReg(0); +B.buildBitcast(DstReg, LaneOpDstReg); +MI.eraseFromParent(); +return true; + } + + if (Size < 32) { +Register Src0Cast = MRI.getType(Src0).isScalar() +? 
Src0 +: B.buildBitcast(LLT::scalar(Size), Src0).getReg(0); +Register Src0Valid = B.buildAnyExt(S32, Src0Cast).getReg(0); + +MachineInstrBuilder LaneOpDst; +switch (IID) { +case Intrinsic::amdgcn_readfirstlane: { + LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid); + break; +} +case Intrinsic::amdgcn_readlane: { + LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid).addUse(Src1); + break; +} +case Intrinsic::amdgcn_writelane: { + Register Src2Cast = + MRI.getType(Src2).isScalar() + ? Src2 + : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0); + Register Src2Valid = B.buildAnyExt(LLT::scalar(32), Src2Cast).getReg(0); + LaneOpDst = B.buildIntrinsic(IID, {S32}) + .addUse(Src0Valid) + .addUse(Src1) + .addUse(Src2Valid); +} +} + +Register LaneOpDstReg = LaneOpDst.getReg(0); +if (Ty.isScalar()) + B.buildTrunc(DstReg, LaneOpDstReg); +else { + auto Trunc = B.buildTrunc(LLT::scalar(Size), LaneOpDstReg); + B.buildBitcast(DstReg, Trunc); +} + +MI.eraseFromParent(); +return true; + } + + if ((Size % 32) == 0) { +SmallVector PartialRes; +unsigned NumParts = Size / 32; +auto Src0Parts = B.buildUnmerge(S32, Src0); vikramRH wrote: done I hope as per the expectation, however I don't understand the plus here https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/vikramRH deleted https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5386,6 +5386,153 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper, return true; } +bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper, + MachineInstr &MI, + Intrinsic::ID IID) const { + + MachineIRBuilder &B = Helper.MIRBuilder; + MachineRegisterInfo &MRI = *B.getMRI(); + + Register DstReg = MI.getOperand(0).getReg(); + Register Src0 = MI.getOperand(2).getReg(); + + Register Src1, Src2; + if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) { +Src1 = MI.getOperand(3).getReg(); +if (IID == Intrinsic::amdgcn_writelane) { + Src2 = MI.getOperand(4).getReg(); +} + } + + LLT Ty = MRI.getType(DstReg); + unsigned Size = Ty.getSizeInBits(); + + if (Size == 32) { +if (Ty.isScalar()) + // Already legal + return true; + +Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0); +MachineInstrBuilder LaneOpDst; +switch (IID) { +case Intrinsic::amdgcn_readfirstlane: { + LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid); + break; +} +case Intrinsic::amdgcn_readlane: { + LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid).addUse(Src1); + break; +} +case Intrinsic::amdgcn_writelane: { + Register Src2Valid = B.buildBitcast(S32, Src2).getReg(0); + LaneOpDst = B.buildIntrinsic(IID, {S32}) + .addUse(Src0Valid) + .addUse(Src1) + .addUse(Src2Valid); +} +} + +Register LaneOpDstReg = LaneOpDst.getReg(0); +B.buildBitcast(DstReg, LaneOpDstReg); +MI.eraseFromParent(); +return true; + } + + if (Size < 32) { +Register Src0Cast = MRI.getType(Src0).isScalar() +? 
Src0 +: B.buildBitcast(LLT::scalar(Size), Src0).getReg(0); +Register Src0Valid = B.buildAnyExt(S32, Src0Cast).getReg(0); + +MachineInstrBuilder LaneOpDst; +switch (IID) { +case Intrinsic::amdgcn_readfirstlane: { + LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid); + break; +} +case Intrinsic::amdgcn_readlane: { + LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0Valid).addUse(Src1); + break; +} +case Intrinsic::amdgcn_writelane: { + Register Src2Cast = + MRI.getType(Src2).isScalar() + ? Src2 + : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0); + Register Src2Valid = B.buildAnyExt(LLT::scalar(32), Src2Cast).getReg(0); + LaneOpDst = B.buildIntrinsic(IID, {S32}) + .addUse(Src0Valid) + .addUse(Src1) + .addUse(Src2Valid); +} +} + +Register LaneOpDstReg = LaneOpDst.getReg(0); +if (Ty.isScalar()) + B.buildTrunc(DstReg, LaneOpDstReg); +else { + auto Trunc = B.buildTrunc(LLT::scalar(Size), LaneOpDstReg); + B.buildBitcast(DstReg, Trunc); +} + +MI.eraseFromParent(); +return true; + } + + if ((Size % 32) == 0) { +SmallVector PartialRes; +unsigned NumParts = Size / 32; +auto Src0Parts = B.buildUnmerge(S32, Src0); vikramRH wrote: Do you mean extract s16 elements individually and handle them as (Size < 32) case ? https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
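For readers following the size cases in the `legalizeLaneOp` diff quoted above, here is a minimal standalone sketch of the idea on plain integers. It is not LLVM code and uses no LLVM API: `LaneOp32` and `applyLaneOp` are hypothetical names, and the value is modelled as little-endian 32-bit words. The narrow case mirrors "extend to 32 bits, run the op, truncate back"; the wide case mirrors "split into 32-bit parts, run the op on each part, merge the results".

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a lane operation that only handles 32 bits.
using LaneOp32 = std::function<uint32_t(uint32_t)>;

// Apply a 32-bit-only operation to a value of `sizeBits` bits that is
// stored as little-endian 32-bit words.
std::vector<uint32_t> applyLaneOp(const std::vector<uint32_t> &words,
                                  unsigned sizeBits, const LaneOp32 &op) {
  if (sizeBits < 32) {
    // Narrow case: the value sits in the low bits of one word. The high
    // bits are "don't care" on input and are masked off on output,
    // which plays the role of the truncate.
    assert(words.size() == 1 && "narrow values occupy a single word");
    const uint32_t mask = (1u << sizeBits) - 1u;
    return {op(words[0]) & mask};
  }

  // 32-bit and wide case: one operation per 32-bit part, results
  // collected in order (the "merge" step).
  assert(sizeBits % 32 == 0 && words.size() == sizeBits / 32 &&
         "only whole multiples of 32 bits are handled in this sketch");
  std::vector<uint32_t> result;
  result.reserve(words.size());
  for (uint32_t word : words)
    result.push_back(op(word));
  return result;
}
```

A 64-bit value would be passed as two words with `sizeBits == 64`, and a 16-bit value as one word with `sizeBits == 16`. Sizes that are neither below 32 nor a multiple of 32 are deliberately left out, as in the quoted hunk.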
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5386,6 +5386,153 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper, return true; } +bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper, + MachineInstr &MI, + Intrinsic::ID IID) const { + + MachineIRBuilder &B = Helper.MIRBuilder; + MachineRegisterInfo &MRI = *B.getMRI(); + + Register DstReg = MI.getOperand(0).getReg(); + Register Src0 = MI.getOperand(2).getReg(); + + Register Src1, Src2; + if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) { +Src1 = MI.getOperand(3).getReg(); +if (IID == Intrinsic::amdgcn_writelane) { + Src2 = MI.getOperand(4).getReg(); +} + } + + LLT Ty = MRI.getType(DstReg); + unsigned Size = Ty.getSizeInBits(); + + if (Size == 32) { +if (Ty.isScalar()) + // Already legal + return true; + +Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0); +MachineInstrBuilder LaneOpDst; +switch (IID) { vikramRH wrote: my bad, I will improve the helper https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5386,6 +5386,153 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper, return true; } +bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper, + MachineInstr &MI, + Intrinsic::ID IID) const { + + MachineIRBuilder &B = Helper.MIRBuilder; + MachineRegisterInfo &MRI = *B.getMRI(); + + Register DstReg = MI.getOperand(0).getReg(); + Register Src0 = MI.getOperand(2).getReg(); + + Register Src1, Src2; + if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) { +Src1 = MI.getOperand(3).getReg(); +if (IID == Intrinsic::amdgcn_writelane) { + Src2 = MI.getOperand(4).getReg(); +} + } + + LLT Ty = MRI.getType(DstReg); + unsigned Size = Ty.getSizeInBits(); + + if (Size == 32) { +if (Ty.isScalar()) + // Already legal + return true; + +Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0); +MachineInstrBuilder LaneOpDst; +switch (IID) { vikramRH wrote: I removed the helper in the recent commit following @arsenm's suggestion. Only reason is readability https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/vikramRH edited https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
vikramRH wrote:

> > add f32 pattern to select read/writelane operations
>
> Why would you need this? Don't you legalize f32 to i32?

Sorry about this. It's a leftover comment from the initial implementation which I should have removed.

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
@@ -5386,6 +5386,130 @@ bool AMDGPULegalizerInfo::legalizeDSAtomicFPIntrinsic(LegalizerHelper &Helper, return true; } +bool AMDGPULegalizerInfo::legalizeLaneOp(LegalizerHelper &Helper, + MachineInstr &MI, + Intrinsic::ID IID) const { + + MachineIRBuilder &B = Helper.MIRBuilder; + MachineRegisterInfo &MRI = *B.getMRI(); + + Register DstReg = MI.getOperand(0).getReg(); + Register Src0 = MI.getOperand(2).getReg(); + + auto createLaneOp = [&](Register &Src0, Register &Src1, + Register &Src2) -> Register { +auto LaneOpDst = B.buildIntrinsic(IID, {S32}).addUse(Src0); +if (Src2.isValid()) + return (LaneOpDst.addUse(Src1).addUse(Src2)).getReg(0); +if (Src1.isValid()) + return (LaneOpDst.addUse(Src1)).getReg(0); +return LaneOpDst.getReg(0); + }; + + Register Src1, Src2, Src0Valid, Src2Valid; + if (IID == Intrinsic::amdgcn_readlane || IID == Intrinsic::amdgcn_writelane) { +Src1 = MI.getOperand(3).getReg(); +if (IID == Intrinsic::amdgcn_writelane) { + Src2 = MI.getOperand(4).getReg(); +} + } + + LLT Ty = MRI.getType(DstReg); + unsigned Size = Ty.getSizeInBits(); + + if (Size == 32) { +if (Ty.isScalar()) + // Already legal + return true; + +Register Src0Valid = B.buildBitcast(S32, Src0).getReg(0); +if (Src2.isValid()) + Src2Valid = B.buildBitcast(S32, Src2).getReg(0); +Register LaneOp = createLaneOp(Src0Valid, Src1, Src2Valid); +B.buildBitcast(DstReg, LaneOp); +MI.eraseFromParent(); +return true; + } + + if (Size < 32) { +Register Src0Cast = MRI.getType(Src0).isScalar() +? Src0 +: B.buildBitcast(LLT::scalar(Size), Src0).getReg(0); +Src0Valid = B.buildAnyExt(S32, Src0Cast).getReg(0); + +if (Src2.isValid()) { + Register Src2Cast = + MRI.getType(Src2).isScalar() + ? 
Src2 + : B.buildBitcast(LLT::scalar(Size), Src2).getReg(0); + Src2Valid = B.buildAnyExt(LLT::scalar(32), Src2Cast).getReg(0); +} +Register LaneOp = createLaneOp(Src0Valid, Src1, Src2Valid); +if (Ty.isScalar()) + B.buildTrunc(DstReg, LaneOp); +else { + auto Trunc = B.buildTrunc(LLT::scalar(Size), LaneOp); + B.buildBitcast(DstReg, Trunc); +} + +MI.eraseFromParent(); +return true; + } + + if ((Size % 32) == 0) { +SmallVector PartialRes; +unsigned NumParts = Size / 32; +auto Src0Parts = B.buildUnmerge(S32, Src0); + +switch (IID) { +case Intrinsic::amdgcn_readlane: { + Register Src1 = MI.getOperand(3).getReg(); + for (unsigned i = 0; i < NumParts; ++i) +PartialRes.push_back( +(B.buildIntrinsic(Intrinsic::amdgcn_readlane, {S32}) + .addUse(Src0Parts.getReg(i)) + .addUse(Src1)) +.getReg(0)); vikramRH wrote: should this be a seperate change that addresses other such instances too ? https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU][WIP] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)
https://github.com/vikramRH edited https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Add support for i64/f64 readlane, writelane and readfirstlane operations. (PR #89217)
vikramRH wrote:

1. Addressed review comments.
2. Improved GISel lowering.
3. Added tests for half, bfloat, float2, ptr, vector of ptr and int.
4. Removed gfx700 checks from the writelane test since they caused issues with f16 legalization. Is this required?

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Add support for i64/f64 readlane, writelane and readfirstlane operations. (PR #89217)
vikramRH wrote: The new commit extends @jayfoad's implementation with GISel support. Tests for half, floats and some vectors are yet to be added. https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Add support for i64/f64 readlane, writelane and readfirstlane operations. (PR #89217)
@@ -4822,6 +4822,111 @@ static MachineBasicBlock *lowerWaveReduce(MachineInstr &MI,
   return RetBB;
 }
 
+static MachineBasicBlock *lowerPseudoLaneOp(MachineInstr &MI,

vikramRH wrote:

@arsenm, would "PreISelIntrinsicLowering" be a proper place for this?

https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Add support for i64/f64 readlane, writelane and readfirstlane operations. (PR #89217)
vikramRH wrote: Gentle ping :) https://github.com/llvm/llvm-project/pull/89217
[clang] [llvm] [AMDGPU][WIP] Add support for i64/f64 readlane, writelane and readfirstlane operations. (PR #89217)
vikramRH wrote: Added/updated tests for readfirstlane and writelane ops https://github.com/llvm/llvm-project/pull/89217
[clang] [RFC][Clang] Enable custom type checking for printf (PR #86801)
vikramRH wrote:

> I looked at the OpenCL spec for C standard library support and was surprised
> that 1) it's only talking about C99 so it's unclear what happens for C11
> (clause 6 says "This document describes the modifications and restrictions to
> C99 and C11 in OpenCL C" but 6.11 only talks about C99 headers and leaves
> `iso646.h`, `math.h`, `stdbool.h`, `stddef.h`, (all in C99) as well as
> `stdalign.h`, `stdatomic.h`, `stdnoreturn.h`, `threads.h`, and `uchar.h`
> available?), and 2) OpenCL's `printf` is not really the same function as C's
> `printf`
> (https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#differences-between-opencl-c-and-c99-printf).
>
> #1 is probably more of an oversight than anything, at least with the C11
> headers. So maybe this isn't a super slippery slope, but maybe C23 will
> change that (I can imagine `stdbit.h` being of use in OpenCL for bit-bashing
> operations). However, the fact that the builtin isn't really `printf` but is
> `printf`-like makes me think we should make it a separate builtin to avoid
> surprises (we do diagnostics based on builtin IDs and we have special
> checking logic that we perhaps should be exempting in some cases).

Understood. Then I propose the following.

1. Currently, Builtin TableGen does not seem to support specifying language address spaces in function prototypes. This needs to be implemented first, if it is not already in development.
2. We could have two new macro variants, probably named "OCL_BUILTIN" and "OCL_LIB_BUILTIN", which would take IDs of the form "BI_OCL##". We would also need corresponding TableGen classes (probably named similarly to the macros) which can expose such overloaded prototypes when required.

How does this sound?

https://github.com/llvm/llvm-project/pull/86801
[clang] [RFC][Clang] Enable custom type checking for printf (PR #86801)
vikramRH wrote:

Thanks for the comments @AaronBallman. The core issue here is that the current builtin handling design does not allow multiple overloads of the same identifier to coexist (ref. https://github.com/llvm/llvm-project/blob/eacda36c7dd842cb15c0c954eda74b67d0c73814/clang/include/clang/Basic/Builtins.h#L66), unless the builtins are defined in target-specific namespaces, which is what I tried in my original patch. If we want to change this approach, I can currently think of a couple of ways at a top level:

1. As you said, we could have OCL-specific LibBuiltin and LangBuiltin TableGen classes (and corresponding macros in Builtins.inc). To make this work they would need new builtin IDs of a different form (say "BI_OCL##"). This is very language specific.
2. Probably change the current builtin Info structure to allow a vector of possible signatures for an identifier. The builtin type decoder could choose the appropriate signature based on LangOpt. (This wording is vague and could be a separate discussion in itself.)

Either way, changes to the current design are required. printf is the only current use case I know of that can benefit from this (since OpenCL v1.2 s6.9.f says other library functions defined in C standard headers are not available, so 🤷♂️). But I guess we could have more use cases in the future. Can this be a separate discussion? This patch would unblock my current work for now.

https://github.com/llvm/llvm-project/pull/86801
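To make the second option a little more concrete, here is a rough standalone sketch of "a vector of possible signatures per identifier, selected by language". Everything in it is hypothetical and for illustration only: `Lang`, `Signature`, `BuiltinEntry` and `selectPrototype` are made-up names, the prototype strings are plain text rather than Clang's encoded builtin type strings, and none of this reflects how `Builtin::Info` is actually laid out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical language selector; a real implementation would consult
// the compiler's language options instead.
enum class Lang { C, OpenCL };

// One candidate prototype for a builtin, valid for a single language.
struct Signature {
  Lang lang;
  std::string prototype; // illustrative text, not an encoded type string
};

// A builtin identifier together with all of its candidate signatures.
struct BuiltinEntry {
  std::string name;
  std::vector<Signature> signatures;
};

// Return the prototype registered for `lang`, or nullptr when the
// builtin has no signature for that language.
const std::string *selectPrototype(const BuiltinEntry &entry, Lang lang) {
  for (const Signature &sig : entry.signatures)
    if (sig.lang == lang)
      return &sig.prototype;
  return nullptr;
}
```

With an entry for `printf` that carries one signature per language, a lookup for `Lang::OpenCL` would return the address-space-qualified prototype while `Lang::C` would return the ordinary one, so both overloads can coexist under a single identifier.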
[clang] [RFC][Clang] Enable custom type checking for printf (PR #86801)
https://github.com/vikramRH ready_for_review https://github.com/llvm/llvm-project/pull/86801
[clang] [AMDGPU] Treat printf as builtin for OpenCL (PR #72554)
vikramRH wrote: Closing this in favour of https://github.com/llvm/llvm-project/pull/86801. https://github.com/llvm/llvm-project/pull/72554
[clang] [AMDGPU] Treat printf as builtin for OpenCL (PR #72554)
https://github.com/vikramRH closed https://github.com/llvm/llvm-project/pull/72554