[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
phoebewang wrote: Thanks all! Fixed in https://github.com/llvm/llvm-project/pull/115581 https://github.com/llvm/llvm-project/pull/114070 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
alanzhao1 wrote:

Figured out a repro - `immintrin.h` doesn't compile if both `-msse` and `-mno-sse2` are passed to clang:

```
$ cat ~/src/test-mac.c
#include <immintrin.h>
$ bin/clang -msse -mno-sse2 -o /dev/null -c ~/src/test-mac.c
In file included from /usr/local/google/home/ayzhao/src/test-mac.c:1:
In file included from /usr/local/google/home/ayzhao/src/llvm-project/build/lib/clang/20/include/immintrin.h:660:
/usr/local/google/home/ayzhao/src/llvm-project/build/lib/clang/20/include/amxavx512intrin.h:240:19: error: unknown type name '__m512bh'
  240 | static __inline__ __m512bh __DEFAULT_FN_ATTRS_AVX512
      |                   ^
/usr/local/google/home/ayzhao/src/llvm-project/build/lib/clang/20/include/amxavx512intrin.h:243:10: error: returning '__attribute__((__vector_size__(32 * sizeof(__bf16)))) __bf16' (vector of 32 '__bf16' values) from a function with incompatible result type 'int'
  243 |   return __builtin_ia32_tcvtrowps2pbf16h_internal(m, n, src, u);
      |          ^~
/usr/local/google/home/ayzhao/src/llvm-project/build/lib/clang/20/include/amxavx512intrin.h:246:19: error: unknown type name '__m512bh'
  246 | static __inline__ __m512bh __DEFAULT_FN_ATTRS_AVX512
      |                   ^
/usr/local/google/home/ayzhao/src/llvm-project/build/lib/clang/20/include/amxavx512intrin.h:249:10: error: returning '__attribute__((__vector_size__(32 * sizeof(__bf16)))) __bf16' (vector of 32 '__bf16' values) from a function with incompatible result type 'int'
  249 |   return __builtin_ia32_tcvtrowps2pbf16l_internal(m, n, src, u);
      |          ^~
/usr/local/google/home/ayzhao/src/llvm-project/build/lib/clang/20/include/amxavx512intrin.h:252:19: error: unknown type name '__m512h'
  252 | static __inline__ __m512h __DEFAULT_FN_ATTRS_AVX512 _tile_cvtrowps2phh_internal(
      |                   ^
/usr/local/google/home/ayzhao/src/llvm-project/build/lib/clang/20/include/amxavx512intrin.h:254:10: error: returning '__attribute__((__vector_size__(32 * sizeof(_Float16)))) _Float16' (vector of 32 '_Float16' values) from a function with incompatible result type 'int'
  254 |   return __builtin_ia32_tcvtrowps2phh_internal(m, n, src, u);
      |          ^~~
/usr/local/google/home/ayzhao/src/llvm-project/build/lib/clang/20/include/amxavx512intrin.h:257:19: error: unknown type name '__m512h'
  257 | static __inline__ __m512h __DEFAULT_FN_ATTRS_AVX512 _tile_cvtrowps2phl_internal(
      |                   ^
/usr/local/google/home/ayzhao/src/llvm-project/build/lib/clang/20/include/amxavx512intrin.h:259:10: error: returning '__attribute__((__vector_size__(32 * sizeof(_Float16)))) _Float16' (vector of 32 '_Float16' values) from a function with incompatible result type 'int'
  259 |   return __builtin_ia32_tcvtrowps2phl_internal(m, n, src, u);
      |          ^~~
/usr/local/google/home/ayzhao/src/llvm-project/build/lib/clang/20/include/amxavx512intrin.h:302:8: error: unknown type name '__m512bh'
  302 | static __m512bh __tile_cvtrowps2pbf16h(__tile1024i src0, unsigned src1) {
      |        ^
/usr/local/google/home/ayzhao/src/llvm-project/build/lib/clang/20/include/amxavx512intrin.h:321:8: error: unknown type name '__m512bh'
  321 | static __m512bh __tile_cvtrowps2pbf16l(__tile1024i src0, unsigned src1) {
      |        ^
/usr/local/google/home/ayzhao/src/llvm-project/build/lib/clang/20/include/amxavx512intrin.h:340:8: error: unknown type name '__m512h'
  340 | static __m512h __tile_cvtrowps2phh(__tile1024i src0, unsigned src1) {
      |        ^
/usr/local/google/home/ayzhao/src/llvm-project/build/lib/clang/20/include/amxavx512intrin.h:359:8: error: unknown type name '__m512h'
  359 | static __m512h __tile_cvtrowps2phl(__tile1024i src0, unsigned src1) {
      |        ^
12 errors generated.
```

Test was on an x64 linux system.
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
pranavk wrote: This is causing similar errors for us as well.
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
alanzhao1 wrote:

> FYI this is causing Chrome X86 MacOS builds to fail due to `error: unknown
> type name '__m512bh'`: https://crbug.com/378111077

As I mentioned in https://crbug.com/378111077#comment3, the issue is that we pull in avx512bf16intrin.h because `__SCE__` is not defined, but `__SSE2__` is also not defined, so we don't pull in the `typedef` for `__m512bh` and so on (they're guarded by macros that look for `__SSE2__`). We then pull in amxavx512intrin.h, which was introduced in this PR and which then fails to compile because we don't have the `typedef`s.
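The guard interaction described in this message can be modeled in a few lines of C. This is only a minimal sketch under stated assumptions, not the actual header logic: `demo_config`, `demo_typedefs_visible`, `demo_amx_header_included`, `demo_header_breaks`, and `demo_self_check` are hypothetical names, and the two conditions are simplified from the message (the real `immintrin.h` include conditions likely contain more terms).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two preprocessor facts mentioned in the message.
   These names are for illustration only; they are not part of any Clang header. */
struct demo_config {
  bool sce_defined;  /* stands in for defined(__SCE__)  */
  bool sse2_defined; /* stands in for defined(__SSE2__) */
};

/* Assumption taken from the message: the __m512bh/__m512h typedefs are
   only visible when __SSE2__ is defined. */
bool demo_typedefs_visible(struct demo_config c) {
  return c.sse2_defined;
}

/* Assumption taken from the message: the AMX-AVX512 header is pulled in
   whenever __SCE__ is not defined. */
bool demo_amx_header_included(struct demo_config c) {
  return !c.sce_defined;
}

/* The build breaks when the header is included but the typedefs it
   relies on are not visible. */
bool demo_header_breaks(struct demo_config c) {
  return demo_amx_header_included(c) && !demo_typedefs_visible(c);
}

/* Self-check covering the two configurations discussed in the thread. */
void demo_self_check(void) {
  struct demo_config no_sse2 = {false, false};  /* e.g. -msse -mno-sse2 */
  struct demo_config with_sse2 = {false, true}; /* SSE2 enabled */
  assert(demo_header_breaks(no_sse2));
  assert(!demo_header_breaks(with_sse2));
}
```

Calling `demo_self_check()` from a `main` function exercises both cases. The point of the model is that the failure needs both conditions at once, which matches the `-msse -mno-sse2` repro reported in this thread.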
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
alanzhao1 wrote: FYI this is causing Chrome X86 MacOS builds to fail due to `error: unknown type name '__m512bh'`: https://crbug.com/378111077
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
llvm-ci wrote:

LLVM Buildbot has detected a new failure on builder `clang-s390x-linux` running on `systemz-1` while building `clang,llvm` at step 5 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/42/builds/1812

Here is the relevant piece of the build log for the reference

```
Step 5 (ninja check 1) failure: stage 1 checked (failure)
TEST 'libFuzzer-s390x-default-Linux :: fuzzer-timeout.test' FAILED
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /home/uweigand/sandbox/buildbot/clang-s390x-linux/stage1/./bin/clang -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/home/uweigand/sandbox/buildbot/clang-s390x-linux/llvm/compiler-rt/lib/fuzzer /home/uweigand/sandbox/buildbot/clang-s390x-linux/llvm/compiler-rt/test/fuzzer/TimeoutTest.cpp -o /home/uweigand/sandbox/buildbot/clang-s390x-linux/stage1/runtimes/runtimes-bins/compiler-rt/test/fuzzer/S390XDefaultLinuxConfig/Output/fuzzer-timeout.test.tmp-TimeoutTest
+ /home/uweigand/sandbox/buildbot/clang-s390x-linux/stage1/./bin/clang -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/home/uweigand/sandbox/buildbot/clang-s390x-linux/llvm/compiler-rt/lib/fuzzer /home/uweigand/sandbox/buildbot/clang-s390x-linux/llvm/compiler-rt/test/fuzzer/TimeoutTest.cpp -o /home/uweigand/sandbox/buildbot/clang-s390x-linux/stage1/runtimes/runtimes-bins/compiler-rt/test/fuzzer/S390XDefaultLinuxConfig/Output/fuzzer-timeout.test.tmp-TimeoutTest
RUN: at line 2: /home/uweigand/sandbox/buildbot/clang-s390x-linux/stage1/./bin/clang -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/home/uweigand/sandbox/buildbot/clang-s390x-linux/llvm/compiler-rt/lib/fuzzer /home/uweigand/sandbox/buildbot/clang-s390x-linux/llvm/compiler-rt/test/fuzzer/TimeoutEmptyTest.cpp -o /home/uweigand/sandbox/buildbot/clang-s390x-linux/stage1/runtimes/runtimes-bins/compiler-rt/test/fuzzer/S390XDefaultLinuxConfig/Output/fuzzer-timeout.test.tmp-TimeoutEmptyTest
+ /home/uweigand/sandbox/buildbot/clang-s390x-linux/stage1/./bin/clang -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/home/uweigand/sandbox/buildbot/clang-s390x-linux/llvm/compiler-rt/lib/fuzzer /home/uweigand/sandbox/buildbot/clang-s390x-linux/llvm/compiler-rt/test/fuzzer/TimeoutEmptyTest.cpp -o /home/uweigand/sandbox/buildbot/clang-s390x-linux/stage1/runtimes/runtimes-bins/compiler-rt/test/fuzzer/S390XDefaultLinuxConfig/Output/fuzzer-timeout.test.tmp-TimeoutEmptyTest
RUN: at line 3: not /home/uweigand/sandbox/buildbot/clang-s390x-linux/stage1/runtimes/runtimes-bins/compiler-rt/test/fuzzer/S390XDefaultLinuxConfig/Output/fuzzer-timeout.test.tmp-TimeoutTest -timeout=1 2>&1 | FileCheck /home/uweigand/sandbox/buildbot/clang-s390x-linux/llvm/compiler-rt/test/fuzzer/fuzzer-timeout.test --check-prefix=TimeoutTest
+ not /home/uweigand/sandbox/buildbot/clang-s390x-linux/stage1/runtimes/runtimes-bins/compiler-rt/test/fuzzer/S390XDefaultLinuxConfig/Output/fuzzer-timeout.test.tmp-TimeoutTest -timeout=1
+ FileCheck /home/uweigand/sandbox/buildbot/clang-s390x-linux/llvm/compiler-rt/test/fuzzer/fuzzer-timeout.test --check-prefix=TimeoutTest
/home/uweigand/sandbox/buildbot/clang-s390x-linux/llvm/compiler-rt/test/fuzzer/fuzzer-timeout.test:7:14: error: TimeoutTest: expected string not found in input
TimeoutTest: #0
 ^
<stdin>:19:44: note: scanning from here
==3335512== ERROR: libFuzzer: timeout after 1 seconds
 ^
<stdin>:24:104: note: possible intended match here
AddressSanitizer: CHECK failed: asan_report.cpp:199 "((current_error_.kind)) == ((kErrorKindInvalid))" (0x1, 0x0) (tid=3335512)
 ^

Input file: <stdin>
Check file: /home/uweigand/sandbox/buildbot/clang-s390x-linux/llvm/compiler-rt/test/fuzzer/fuzzer-timeout.test

-dump-input=help explains the following input dump.

Input was:
<<
 .
 .
 .
 14: MS: 1 InsertByte-; base unit: bf397014ecbce0b1be8d9011c77f6181927a357f
 15: 0x48,0x69,0x21,0x48,
 16: Hi!H
 17: artifact_prefix='./'; Test unit written to ./timeout-c9f7ef19d5ac7565f3dcaf7a3221ae711a187db5
 18: Base64: SGkhSA==
 19: ==3335512== ERROR: libFuzzer: timeout after 1 seconds
check:7'0X~~ error: no match found
 20: AddressSanitizer:DEADLYSIGNAL
check:7'0 ~~
 21: =
check:7'0
```
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
llvm-ci wrote:

LLVM Buildbot has detected a new failure on builder `sanitizer-x86_64-linux-bootstrap-asan` running on `sanitizer-buildbot2` while building `clang,llvm` at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/52/builds/3564

Here is the relevant piece of the build log for the reference

```
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 87001 of 87002 tests, 88 workers --
Testing: 0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.
FAIL: lld :: ELF/allow-shlib-undefined.s (84780 of 87001)
TEST 'lld :: ELF/allow-shlib-undefined.s' FAILED
Exit Code: 1

Command Output (stderr):
--
RUN: at line 3: rm -rf /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/Output/allow-shlib-undefined.s.tmp && split-file /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/lld/test/ELF/allow-shlib-undefined.s /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/Output/allow-shlib-undefined.s.tmp && cd /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/Output/allow-shlib-undefined.s.tmp
+ rm -rf /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/Output/allow-shlib-undefined.s.tmp
+ split-file /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/lld/test/ELF/allow-shlib-undefined.s /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/Output/allow-shlib-undefined.s.tmp
+ cd /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/tools/lld/test/ELF/Output/allow-shlib-undefined.s.tmp
RUN: at line 4: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-mc -filetype=obj -triple=x86_64 main.s -o main.o
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-mc -filetype=obj -triple=x86_64 main.s -o main.o
RUN: at line 5: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-mc -filetype=obj -triple=x86_64 def.s -o def.o
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-mc -filetype=obj -triple=x86_64 def.s -o def.o
RUN: at line 6: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-mc -filetype=obj -triple=x86_64 def-hidden.s -o def-hidden.o
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-mc -filetype=obj -triple=x86_64 def-hidden.s -o def-hidden.o
RUN: at line 7: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-mc -filetype=obj -triple=x86_64 ref.s -o ref.o
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-mc -filetype=obj -triple=x86_64 ref.s -o ref.o
RUN: at line 8: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llvm-mc -filetype=obj -triple=x86_64 a.s -o a.o && /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld -shared a.o -o a.so
+ /home/b/
```
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
https://github.com/fzou1 approved this pull request. LGTM
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
@@ -369,3 +369,150 @@ let Predicates = [HasAMXTRANSPOSE, In64BitMode] in {
 }
 }
 } // HasAMXTILE, HasAMXTRANSPOSE
+
+multiclass m_tcvtrowd2ps {
+  let Predicates = [HasAMXAVX512, In64BitMode] in {

phoebewang wrote:

Done, thanks!
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
https://github.com/phoebewang updated https://github.com/llvm/llvm-project/pull/114070

>From 587d0105e7724db0f35fc5c8179519fa6319e5c8 Mon Sep 17 00:00:00 2001
From: "Wang, Phoebe"
Date: Tue, 29 Oct 2024 22:29:25 +0800
Subject: [PATCH 1/6] [X86][AMX] Support AMX-AVX512

---
 clang/docs/ReleaseNotes.rst | 2 +
 clang/include/clang/Basic/BuiltinsX86_64.def | 13 +
 clang/include/clang/Driver/Options.td | 2 +
 clang/lib/Basic/Targets/X86.cpp | 6 +
 clang/lib/Basic/Targets/X86.h | 1 +
 clang/lib/Headers/CMakeLists.txt | 1 +
 clang/lib/Headers/amxavx512intrin.h | 381 ++
 clang/lib/Headers/immintrin.h | 4 +
 clang/lib/Sema/SemaX86.cpp | 6 +
 clang/test/CodeGen/X86/amx_avx512_api.c | 52 +++
 clang/test/CodeGen/X86/amxavx512-builtins.c | 41 ++
 clang/test/CodeGen/attr-target-x86.c | 8 +-
 clang/test/Driver/x86-target-features.c | 7 +
 clang/test/Preprocessor/x86_target_features.c | 7 +
 llvm/include/llvm/IR/IntrinsicsX86.td | 50 +++
 .../llvm/TargetParser/X86TargetParser.def | 1 +
 llvm/lib/Target/X86/X86.td | 4 +
 llvm/lib/Target/X86/X86ExpandPseudo.cpp | 64 ++-
 llvm/lib/Target/X86/X86ISelLowering.cpp | 76
 llvm/lib/Target/X86/X86InstrAMX.td | 147 +++
 llvm/lib/Target/X86/X86InstrPredicates.td | 1 +
 llvm/lib/Target/X86/X86LowerAMXType.cpp | 11 +
 llvm/lib/Target/X86/X86PreTileConfig.cpp | 19 +-
 llvm/lib/TargetParser/Host.cpp | 4 +
 llvm/lib/TargetParser/X86TargetParser.cpp | 2 +
 .../CodeGen/X86/amx-across-func-tilemovrow.ll | 171
 .../test/CodeGen/X86/amx-avx512-intrinsics.ll | 116 ++
 .../CodeGen/X86/amx-tile-avx512-internals.ll | 61 +++
 llvm/test/MC/Disassembler/X86/amx-avx512.txt | 106 +
 llvm/test/MC/X86/amx-avx512-att.s | 105 +
 llvm/test/MC/X86/amx-avx512-intel.s | 105 +
 31 files changed, 1564 insertions(+), 10 deletions(-)
 create mode 100644 clang/lib/Headers/amxavx512intrin.h
 create mode 100644 clang/test/CodeGen/X86/amx_avx512_api.c
 create mode 100644 clang/test/CodeGen/X86/amxavx512-builtins.c
 create mode 100644 llvm/test/CodeGen/X86/amx-across-func-tilemovrow.ll
 create mode 100644 llvm/test/CodeGen/X86/amx-avx512-intrinsics.ll
 create mode 100644 llvm/test/CodeGen/X86/amx-tile-avx512-internals.ll
 create mode 100644 llvm/test/MC/Disassembler/X86/amx-avx512.txt
 create mode 100644 llvm/test/MC/X86/amx-avx512-att.s
 create mode 100644 llvm/test/MC/X86/amx-avx512-intel.s

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index ce046a305c89b6..d45bd1240dd173 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -611,6 +611,8 @@ X86 Support
 * Supported MINMAX intrinsics of ``*_(mask(z)))_minmax(ne)_p[s|d|h|bh]`` and ``*_(mask(z)))_minmax_s[s|d|h]``.

+- Support ISA of ``AMX-AVX512``.
+
 - All intrinsics in adcintrin.h can now be used in constant expressions.

 - All intrinsics in adxintrin.h can now be used in constant expressions.
diff --git a/clang/include/clang/Basic/BuiltinsX86_64.def b/clang/include/clang/Basic/BuiltinsX86_64.def
index 2c591edb2835cd..70644f3f6b6054 100644
--- a/clang/include/clang/Basic/BuiltinsX86_64.def
+++ b/clang/include/clang/Basic/BuiltinsX86_64.def
@@ -128,6 +128,12 @@ TARGET_BUILTIN(__builtin_ia32_tdpbf16ps_internal, "V256iUsUsUsV256iV256iV256i",
 TARGET_BUILTIN(__builtin_ia32_tdpfp16ps_internal, "V256iUsUsUsV256iV256iV256i", "n", "amx-fp16")
 TARGET_BUILTIN(__builtin_ia32_tcmmimfp16ps_internal, "V256iUsUsUsV256iV256iV256i", "n", "amx-complex")
 TARGET_BUILTIN(__builtin_ia32_tcmmrlfp16ps_internal, "V256iUsUsUsV256iV256iV256i", "n", "amx-complex")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowd2ps_internal, "V16fUsUsV256iUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2pbf16h_internal, "V32yUsUsV256iUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2pbf16l_internal, "V32yUsUsV256iUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2phh_internal, "V32xUsUsV256iUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2phl_internal, "V32xUsUsV256iUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tilemovrow_internal, "V16iUsUsV256iUi", "n", "amx-avx512")
 // AMX
 TARGET_BUILTIN(__builtin_ia32_tile_loadconfig, "vvC*", "n", "amx-tile")
 TARGET_BUILTIN(__builtin_ia32_tile_storeconfig, "vvC*", "n", "amx-tile")
@@ -148,6 +154,13 @@ TARGET_BUILTIN(__builtin_ia32_ptwrite64, "vUOi", "n", "ptwrite")
 TARGET_BUILTIN(__builtin_ia32_tcmmimfp16ps, "vIUcIUcIUc", "n", "amx-complex")
 TARGET_BUILTIN(__builtin_ia32_tcmmrlfp16ps, "vIUcIUcIUc", "n", "amx-complex")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowd2ps, "V16fIUcUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2pbf16h, "V32yIUcUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_i
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
@@ -369,3 +369,150 @@ let Predicates = [HasAMXTRANSPOSE, In64BitMode] in {
 }
 }
 } // HasAMXTILE, HasAMXTRANSPOSE
+
+multiclass m_tcvtrowd2ps {
+  let Predicates = [HasAMXAVX512, In64BitMode] in {

fzou1 wrote:

Should we add HasAVX10_2_512 at lines 374, 390, and 454?
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
https://github.com/fzou1 edited https://github.com/llvm/llvm-project/pull/114070
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
https://github.com/fzou1 commented: LGTM, except that the last place is probably missing the avx10.2-512 dependency.
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
@@ -0,0 +1,381 @@
+/*===- amxavx512intrin.h - AMXAVX512 ===
+ *
+ * Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+ * See https://llvm.org/LICENSE.txt for license information.
+ * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+ *
+ *======
+ */
+#ifndef __IMMINTRIN_H
+#error "Never use <amxavx512intrin.h> directly; include <immintrin.h> instead."
+#endif // __IMMINTRIN_H
+
+#ifndef __AMX_AVX512INTRIN_H
+#define __AMX_AVX512INTRIN_H
+#ifdef __x86_64__
+
+#define __DEFAULT_FN_ATTRS_AVX512 \
+  __attribute__((__always_inline__, __nodebug__, __target__("amx-avx512")))

phoebewang wrote:

Yes, we need them all. Good catch!
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
https://github.com/phoebewang updated https://github.com/llvm/llvm-project/pull/114070

>From 587d0105e7724db0f35fc5c8179519fa6319e5c8 Mon Sep 17 00:00:00 2001
From: "Wang, Phoebe"
Date: Tue, 29 Oct 2024 22:29:25 +0800
Subject: [PATCH 1/5] [X86][AMX] Support AMX-AVX512
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
@@ -133,6 +133,12 @@ TARGET_BUILTIN(__builtin_ia32_t2rpntlvwz0t1_internal, "vUsUsUsV256i*V256i*vC*z",
 TARGET_BUILTIN(__builtin_ia32_t2rpntlvwz1_internal, "vUsUsUsV256i*V256i*vC*z", "n", "amx-transpose")
 TARGET_BUILTIN(__builtin_ia32_t2rpntlvwz1t1_internal, "vUsUsUsV256i*V256i*vC*z", "n", "amx-transpose")
 TARGET_BUILTIN(__builtin_ia32_ttransposed_internal, "V256iUsUsV256i", "n", "amx-transpose")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowd2ps_internal, "V16fUsUsV256iUi", "n", "amx-avx512")

phoebewang wrote:

Yes, sorry, I missed those parts.
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
https://github.com/phoebewang updated https://github.com/llvm/llvm-project/pull/114070

>From 587d0105e7724db0f35fc5c8179519fa6319e5c8 Mon Sep 17 00:00:00 2001
From: "Wang, Phoebe"
Date: Tue, 29 Oct 2024 22:29:25 +0800
Subject: [PATCH 1/4] [X86][AMX] Support AMX-AVX512
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
@@ -0,0 +1,381 @@
+/*===- amxavx512intrin.h - AMXAVX512 ===
+ *
+ * Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+ * See https://llvm.org/LICENSE.txt for license information.
+ * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+ *
+ *======
+ */
+#ifndef __IMMINTRIN_H
+#error "Never use <amxavx512intrin.h> directly; include <immintrin.h> instead."
+#endif // __IMMINTRIN_H
+
+#ifndef __AMX_AVX512INTRIN_H
+#define __AMX_AVX512INTRIN_H
+#ifdef __x86_64__
+
+#define __DEFAULT_FN_ATTRS_AVX512 \
+  __attribute__((__always_inline__, __nodebug__, __target__("amx-avx512")))

fzou1 wrote:

If the internal APIs need to depend on the AVX10.2-512 feature, we could create another attribute that includes AVX10.2-512 and apply it to the internal APIs.

https://github.com/llvm/llvm-project/pull/114070
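For readers following the suggestion above, a second attribute might look roughly like the sketch below. This is only an illustration, not the change that landed: the macro names `SKETCH_FN_ATTRS_AVX512` and `SKETCH_FN_ATTRS_AVX512_INTERNAL` are hypothetical, and the feature string assumes the `avx10.2-512` spelling used elsewhere in this thread.

```c
/* Attribute as quoted from the patch: only the amx-avx512 feature is
 * required. The macro name is a hypothetical stand-in for the reserved
 * __DEFAULT_FN_ATTRS_AVX512 identifier in the real header. */
#define SKETCH_FN_ATTRS_AVX512 \
  __attribute__((__always_inline__, __nodebug__, __target__("amx-avx512")))

/* Hypothetical second attribute for the *_internal wrappers. It lists both
 * features (comma-separated inside __target__), so a caller compiled without
 * AVX10.2-512 would be diagnosed instead of silently accepted.
 * ASSUMPTION: the feature spelling "avx10.2-512" is taken from the review
 * comments, not from the final header. */
#define SKETCH_FN_ATTRS_AVX512_INTERNAL \
  __attribute__((__always_inline__, __nodebug__, \
                 __target__("amx-avx512,avx10.2-512")))
```

Keeping two macros rather than widening the existing one would leave the public macro-style intrinsics untouched while tightening only the internal wrappers.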
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
@@ -133,6 +133,12 @@ TARGET_BUILTIN(__builtin_ia32_t2rpntlvwz0t1_internal, "vUsUsUsV256i*V256i*vC*z",
 TARGET_BUILTIN(__builtin_ia32_t2rpntlvwz1_internal, "vUsUsUsV256i*V256i*vC*z", "n", "amx-transpose")
 TARGET_BUILTIN(__builtin_ia32_t2rpntlvwz1t1_internal, "vUsUsUsV256i*V256i*vC*z", "n", "amx-transpose")
 TARGET_BUILTIN(__builtin_ia32_ttransposed_internal, "V256iUsUsV256i", "n", "amx-transpose")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowd2ps_internal, "V16fUsUsV256iUi", "n", "amx-avx512")

fzou1 wrote:

Is it necessary to add the avx10.2-512 feature to these internal APIs? With it, we could diagnose uses of the new APIs when the AVX10.2-512 target feature is missing.

https://github.com/llvm/llvm-project/pull/114070
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
@@ -133,6 +133,12 @@ TARGET_BUILTIN(__builtin_ia32_t2rpntlvwz0t1_internal, "vUsUsUsV256i*V256i*vC*z",
 TARGET_BUILTIN(__builtin_ia32_t2rpntlvwz1_internal, "vUsUsUsV256i*V256i*vC*z", "n", "amx-transpose")
 TARGET_BUILTIN(__builtin_ia32_t2rpntlvwz1t1_internal, "vUsUsUsV256i*V256i*vC*z", "n", "amx-transpose")
 TARGET_BUILTIN(__builtin_ia32_ttransposed_internal, "V256iUsUsV256i", "n", "amx-transpose")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowd2ps_internal, "V16fUsUsV256iUi", "n", "amx-avx512")

phoebewang wrote:

Good catch! Done.

https://github.com/llvm/llvm-project/pull/114070
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
https://github.com/phoebewang updated https://github.com/llvm/llvm-project/pull/114070

>From 587d0105e7724db0f35fc5c8179519fa6319e5c8 Mon Sep 17 00:00:00 2001
From: "Wang, Phoebe"
Date: Tue, 29 Oct 2024 22:29:25 +0800
Subject: [PATCH 1/3] [X86][AMX] Support AMX-AVX512

---
 clang/docs/ReleaseNotes.rst | 2 +
 clang/include/clang/Basic/BuiltinsX86_64.def | 13 +
 clang/include/clang/Driver/Options.td | 2 +
 clang/lib/Basic/Targets/X86.cpp | 6 +
 clang/lib/Basic/Targets/X86.h | 1 +
 clang/lib/Headers/CMakeLists.txt | 1 +
 clang/lib/Headers/amxavx512intrin.h | 381 ++
 clang/lib/Headers/immintrin.h | 4 +
 clang/lib/Sema/SemaX86.cpp | 6 +
 clang/test/CodeGen/X86/amx_avx512_api.c | 52 +++
 clang/test/CodeGen/X86/amxavx512-builtins.c | 41 ++
 clang/test/CodeGen/attr-target-x86.c | 8 +-
 clang/test/Driver/x86-target-features.c | 7 +
 clang/test/Preprocessor/x86_target_features.c | 7 +
 llvm/include/llvm/IR/IntrinsicsX86.td | 50 +++
 .../llvm/TargetParser/X86TargetParser.def | 1 +
 llvm/lib/Target/X86/X86.td | 4 +
 llvm/lib/Target/X86/X86ExpandPseudo.cpp | 64 ++-
 llvm/lib/Target/X86/X86ISelLowering.cpp | 76
 llvm/lib/Target/X86/X86InstrAMX.td | 147 +++
 llvm/lib/Target/X86/X86InstrPredicates.td | 1 +
 llvm/lib/Target/X86/X86LowerAMXType.cpp | 11 +
 llvm/lib/Target/X86/X86PreTileConfig.cpp | 19 +-
 llvm/lib/TargetParser/Host.cpp | 4 +
 llvm/lib/TargetParser/X86TargetParser.cpp | 2 +
 .../CodeGen/X86/amx-across-func-tilemovrow.ll | 171
 .../test/CodeGen/X86/amx-avx512-intrinsics.ll | 116 ++
 .../CodeGen/X86/amx-tile-avx512-internals.ll | 61 +++
 llvm/test/MC/Disassembler/X86/amx-avx512.txt | 106 +
 llvm/test/MC/X86/amx-avx512-att.s | 105 +
 llvm/test/MC/X86/amx-avx512-intel.s | 105 +
 31 files changed, 1564 insertions(+), 10 deletions(-)
 create mode 100644 clang/lib/Headers/amxavx512intrin.h
 create mode 100644 clang/test/CodeGen/X86/amx_avx512_api.c
 create mode 100644 clang/test/CodeGen/X86/amxavx512-builtins.c
 create mode 100644 llvm/test/CodeGen/X86/amx-across-func-tilemovrow.ll
 create mode 100644 llvm/test/CodeGen/X86/amx-avx512-intrinsics.ll
 create mode 100644 llvm/test/CodeGen/X86/amx-tile-avx512-internals.ll
 create mode 100644 llvm/test/MC/Disassembler/X86/amx-avx512.txt
 create mode 100644 llvm/test/MC/X86/amx-avx512-att.s
 create mode 100644 llvm/test/MC/X86/amx-avx512-intel.s

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index ce046a305c89b6..d45bd1240dd173 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -611,6 +611,8 @@ X86 Support
   * Supported MINMAX intrinsics of ``*_(mask(z)))_minmax(ne)_p[s|d|h|bh]`` and
     ``*_(mask(z)))_minmax_s[s|d|h]``.
 
+- Support ISA of ``AMX-AVX512``.
+
 - All intrinsics in adcintrin.h can now be used in constant expressions.
 
 - All intrinsics in adxintrin.h can now be used in constant expressions.
diff --git a/clang/include/clang/Basic/BuiltinsX86_64.def b/clang/include/clang/Basic/BuiltinsX86_64.def
index 2c591edb2835cd..70644f3f6b6054 100644
--- a/clang/include/clang/Basic/BuiltinsX86_64.def
+++ b/clang/include/clang/Basic/BuiltinsX86_64.def
@@ -128,6 +128,12 @@ TARGET_BUILTIN(__builtin_ia32_tdpbf16ps_internal, "V256iUsUsUsV256iV256iV256i",
 TARGET_BUILTIN(__builtin_ia32_tdpfp16ps_internal, "V256iUsUsUsV256iV256iV256i", "n", "amx-fp16")
 TARGET_BUILTIN(__builtin_ia32_tcmmimfp16ps_internal, "V256iUsUsUsV256iV256iV256i", "n", "amx-complex")
 TARGET_BUILTIN(__builtin_ia32_tcmmrlfp16ps_internal, "V256iUsUsUsV256iV256iV256i", "n", "amx-complex")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowd2ps_internal, "V16fUsUsV256iUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2pbf16h_internal, "V32yUsUsV256iUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2pbf16l_internal, "V32yUsUsV256iUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2phh_internal, "V32xUsUsV256iUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2phl_internal, "V32xUsUsV256iUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tilemovrow_internal, "V16iUsUsV256iUi", "n", "amx-avx512")
 // AMX
 TARGET_BUILTIN(__builtin_ia32_tile_loadconfig, "vvC*", "n", "amx-tile")
 TARGET_BUILTIN(__builtin_ia32_tile_storeconfig, "vvC*", "n", "amx-tile")
@@ -148,6 +154,13 @@ TARGET_BUILTIN(__builtin_ia32_ptwrite64, "vUOi", "n", "ptwrite")
 TARGET_BUILTIN(__builtin_ia32_tcmmimfp16ps, "vIUcIUcIUc", "n", "amx-complex")
 TARGET_BUILTIN(__builtin_ia32_tcmmrlfp16ps, "vIUcIUcIUc", "n", "amx-complex")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowd2ps, "V16fIUcUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2pbf16h, "V32yIUcUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_i
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
@@ -133,6 +133,12 @@ TARGET_BUILTIN(__builtin_ia32_t2rpntlvwz0t1_internal, "vUsUsUsV256i*V256i*vC*z",
 TARGET_BUILTIN(__builtin_ia32_t2rpntlvwz1_internal, "vUsUsUsV256i*V256i*vC*z", "n", "amx-transpose")
 TARGET_BUILTIN(__builtin_ia32_t2rpntlvwz1t1_internal, "vUsUsUsV256i*V256i*vC*z", "n", "amx-transpose")
 TARGET_BUILTIN(__builtin_ia32_ttransposed_internal, "V256iUsUsV256i", "n", "amx-transpose")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowd2ps_internal, "V16fUsUsV256iUi", "n", "amx-avx512")

fzou1 wrote:

Does the "avx10.2-512" feature need to be added for the intrinsics here and elsewhere?

https://github.com/llvm/llvm-project/pull/114070
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
github-actions[bot] wrote:

:warning: C/C++ code formatter, clang-format found issues in your code. :warning:

You can test this locally with the following command:

```bash
git-clang-format --diff 37ce18951fded6be1de319b05b968918cb45c00b c38da4e614434b02158444f31f50aee61f9879f6 --extensions cpp,c,h -- clang/lib/Headers/amxavx512intrin.h clang/test/CodeGen/X86/amx_avx512_api.c clang/test/CodeGen/X86/amxavx512-builtins.c clang/lib/Basic/Targets/X86.cpp clang/lib/Basic/Targets/X86.h clang/lib/Headers/immintrin.h clang/lib/Sema/SemaX86.cpp clang/test/CodeGen/attr-target-x86.c clang/test/Driver/x86-target-features.c clang/test/Preprocessor/x86_target_features.c llvm/lib/Target/X86/X86ExpandPseudo.cpp llvm/lib/Target/X86/X86ISelLowering.cpp llvm/lib/Target/X86/X86LowerAMXType.cpp llvm/lib/Target/X86/X86PreTileConfig.cpp llvm/lib/TargetParser/Host.cpp llvm/lib/TargetParser/X86TargetParser.cpp
```

View the diff from clang-format here.

```diff
diff --git a/clang/lib/Headers/amxavx512intrin.h b/clang/lib/Headers/amxavx512intrin.h
index 9bfa868cf4..1e6ee35dea 100644
--- a/clang/lib/Headers/amxavx512intrin.h
+++ b/clang/lib/Headers/amxavx512intrin.h
@@ -36,7 +36,8 @@
 ///     IF i + row_chunk / 4 >= tsrc.colsb / 4
 ///         dst.dword[i] := 0
 ///     ELSE
-///         dst.f32[i] := CONVERT_INT32_TO_FP32(tsrc.row[row_index].dword[row_chunk/4+i], RNE)
+///         dst.f32[i] :=
+///         CONVERT_INT32_TO_FP32(tsrc.row[row_index].dword[row_chunk/4+i], RNE)
 ///     FI
 /// ENDFOR
 /// dst[MAX_VL-1:VL] := 0
@@ -72,7 +73,8 @@
 ///         dst.dword[i] := 0
 ///     ELSE
 ///         dst.word[2*i+0] := 0
-///         dst.bf16[2*i+1] := CONVERT_FP32_TO_BF16(tsrc.row[row_index].fp32[row_chunk/4+i], RNE)
+///         dst.bf16[2*i+1] :=
+///         CONVERT_FP32_TO_BF16(tsrc.row[row_index].fp32[row_chunk/4+i], RNE)
 ///     FI
 /// ENDFOR
 /// dst[MAX_VL-1:VL] := 0
@@ -109,7 +111,8 @@
 ///         dst.dword[i] := 0
 ///     ELSE
 ///         dst.word[2*i+1] := 0
-///         dst.bf16[2*i+0] := CONVERT_FP32_TO_BF16(tsrc.row[row_index].fp32[row_chunk/4+i], RNE)
+///         dst.bf16[2*i+0] :=
+///         CONVERT_FP32_TO_BF16(tsrc.row[row_index].fp32[row_chunk/4+i], RNE)
 ///     FI
 /// ENDFOR
 /// dst[MAX_VL-1:VL] := 0
@@ -146,7 +149,8 @@
 ///         dst.dword[i] := 0
 ///     ELSE
 ///         dst.word[2*i+0] := 0
-///         dst.fp16[2*i+1] := CONVERT_FP32_TO_FP16(tsrc.row[row_index].fp32[row_chunk/4+i], RNE)
+///         dst.fp16[2*i+1] :=
+///         CONVERT_FP32_TO_FP16(tsrc.row[row_index].fp32[row_chunk/4+i], RNE)
 ///     FI
 /// ENDFOR
 /// dst[MAX_VL-1:VL] := 0
@@ -182,7 +186,8 @@
 ///         dst.dword[i] := 0
 ///     ELSE
 ///         dst.word[2*i+1] := 0
-///         dst.fp16[2*i+0] := CONVERT_FP32_TO_FP16(tsrc.row[row_index].fp32[row_chunk/4+i], RNE)
+///         dst.fp16[2*i+0] :=
+///         CONVERT_FP32_TO_FP16(tsrc.row[row_index].fp32[row_chunk/4+i], RNE)
 ///     FI
 /// ENDFOR
 /// dst[MAX_VL-1:VL] := 0
diff --git a/llvm/lib/Target/X86/X86ExpandPseudo.cpp b/llvm/lib/Target/X86/X86ExpandPseudo.cpp
index 52519f49e7..9511a82f0e 100644
--- a/llvm/lib/Target/X86/X86ExpandPseudo.cpp
+++ b/llvm/lib/Target/X86/X86ExpandPseudo.cpp
@@ -770,7 +770,8 @@ bool X86ExpandPseudo::expandMI(MachineBasicBlock &MBB,
     case X86::PTDPBUUDV: Opc = X86::TDPBUUD; break;
     case X86::PTDPBF16PSV: Opc = X86::TDPBF16PS; break;
     case X86::PTDPFP16PSV: Opc = X86::TDPFP16PS; break;
-    default: llvm_unreachable("Unexpected Opcode");
+    default:
+      llvm_unreachable("Unexpected Opcode");
     }
     MI.setDesc(TII->get(Opc));
     MI.tieOperands(0, 1);
```

https://github.com/llvm/llvm-project/pull/114070
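The bf16 pseudocode quoted in the diff above differs between the H and L variants only in which 16-bit half of each dword receives the converted value. A minimal scalar sketch of that packing follows. The helper names are hypothetical, and the conversion is a generic round-to-nearest-even truncation that does not claim to reproduce the hardware's handling of denormals or NaN payloads.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its IEEE-754 bit pattern. */
static uint32_t bits_of(float f) {
  uint32_t b;
  assert(sizeof(float) == sizeof(uint32_t));
  memcpy(&b, &f, sizeof b);
  return b;
}

/* Generic fp32 -> bf16 conversion with round-to-nearest-even (RNE).
 * ASSUMPTION: NaNs are simply quieted; the real instruction may differ. */
static uint16_t fp32_to_bf16_rne(float f) {
  uint32_t bits = bits_of(f);
  if ((bits & 0x7fffffffu) > 0x7f800000u)      /* NaN: keep it a NaN */
    return (uint16_t)((bits >> 16) | 0x0040u);
  /* Add 0x7fff plus the lowest kept bit so that ties round to even. */
  uint32_t bias = 0x7fffu + ((bits >> 16) & 1u);
  return (uint16_t)((bits + bias) >> 16);
}

/* H variant: dst.word[2*i+0] := 0, dst.bf16[2*i+1] := converted value. */
static uint32_t dword_bf16_high(float f) {
  return (uint32_t)fp32_to_bf16_rne(f) << 16;
}

/* L variant: dst.word[2*i+1] := 0, dst.bf16[2*i+0] := converted value. */
static uint32_t dword_bf16_low(float f) {
  return (uint32_t)fp32_to_bf16_rne(f);
}
```

For example, `1.0f` (bit pattern `0x3f800000`) converts to the bf16 value `0x3f80`, so the H form yields the dword `0x3f800000` and the L form yields `0x00003f80`.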
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
@@ -559,12 +559,68 @@ bool X86ExpandPseudo::expandMI(MachineBasicBlock &MBB,
     return true;
   }
   case X86::PTILELOADDV:
-  case X86::PTILELOADDT1V: {
+  case X86::PTILELOADDT1V:
+  case X86::PTCVTROWD2PSrreV:
+  case X86::PTCVTROWD2PSrriV:
+  case X86::PTCVTROWPS2PBF16HrreV:
+  case X86::PTCVTROWPS2PBF16HrriV:
+  case X86::PTCVTROWPS2PBF16LrreV:
+  case X86::PTCVTROWPS2PBF16LrriV:
+  case X86::PTCVTROWPS2PHHrreV:
+  case X86::PTCVTROWPS2PHHrriV:
+  case X86::PTCVTROWPS2PHLrreV:
+  case X86::PTCVTROWPS2PHLrriV:
+  case X86::PTILEMOVROWrreV:
+  case X86::PTILEMOVROWrriV: {
     for (unsigned i = 2; i > 0; --i)
       MI.removeOperand(i);
-    unsigned Opc = Opcode == X86::PTILELOADDV
-                       ? GET_EGPR_IF_ENABLED(X86::TILELOADD)
-                       : GET_EGPR_IF_ENABLED(X86::TILELOADDT1);
+    unsigned Opc;
+    switch (Opcode) {
+    case X86::PTILELOADDV:
+      Opc = GET_EGPR_IF_ENABLED(X86::TILELOADD);
+      break;
+    case X86::PTILELOADDT1V:
+      Opc = GET_EGPR_IF_ENABLED(X86::TILELOADDT1);
+      break;
+    case X86::PTCVTROWD2PSrreV:
+      Opc = X86::TCVTROWD2PSrre;
+      break;
+    case X86::PTCVTROWD2PSrriV:
+      Opc = X86::TCVTROWD2PSrri;
+      break;
+    case X86::PTCVTROWPS2PBF16HrreV:
+      Opc = X86::TCVTROWPS2PBF16Hrre;
+      break;
+    case X86::PTCVTROWPS2PBF16HrriV:
+      Opc = X86::TCVTROWPS2PBF16Hrri;
+      break;
+    case X86::PTCVTROWPS2PBF16LrreV:
+      Opc = X86::TCVTROWPS2PBF16Lrre;
+      break;
+    case X86::PTCVTROWPS2PBF16LrriV:
+      Opc = X86::TCVTROWPS2PBF16Lrri;
+      break;
+    case X86::PTCVTROWPS2PHHrreV:
+      Opc = X86::TCVTROWPS2PHHrre;
+      break;
+    case X86::PTCVTROWPS2PHHrriV:
+      Opc = X86::TCVTROWPS2PHHrri;
+      break;
+    case X86::PTCVTROWPS2PHLrreV:
+      Opc = X86::TCVTROWPS2PHLrre;
+      break;
+    case X86::PTCVTROWPS2PHLrriV:
+      Opc = X86::TCVTROWPS2PHLrri;
+      break;
+    case X86::PTILEMOVROWrreV:
+      Opc = X86::TILEMOVROWrre;
+      break;
+    case X86::PTILEMOVROWrriV:
+      Opc = X86::TILEMOVROWrri;
+      break;
+    default:
+      llvm_unreachable("Impossible Opcode!");

phoebewang wrote:

Done.

https://github.com/llvm/llvm-project/pull/114070
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
@@ -0,0 +1,381 @@
+/*===- amxavx512intrin.h - AMXAVX512 ===
+ *
+ * Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+ * See https://llvm.org/LICENSE.txt for license information.
+ * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+ *
+ *======
+ */
+#ifndef __IMMINTRIN_H
+#error "Never use <amxavx512intrin.h> directly; include <immintrin.h> instead."
+#endif // __IMMINTRIN_H
+
+#ifndef __AMX_AVX512INTRIN_H
+#define __AMX_AVX512INTRIN_H
+#ifdef __x86_64__
+
+#define __DEFAULT_FN_ATTRS_AVX512 \
+  __attribute__((__always_inline__, __nodebug__, __target__("amx-avx512")))
+
+/// Moves a row from a tile register to a zmm destination register, converting
+///    the int32 source elements to fp32. The row of the tile is selected by an
+///    32b GPR.
+///
+/// \headerfile <immintrin.h>
+///
+/// \code
+/// __m512i _tile_cvtrowd2ps(__tile tsrc, unsigned int row);
+/// \endcode
+///
+/// \code{.operation}
+/// VL := 512
+/// VL_bytes := VL >> 3
+/// row_index := row & 0x
+/// row_chunk := ((row >> 16) & 0x) * VL_bytes
+/// FOR i := 0 TO (VL_bytes / 4) - 1
+///     IF i + row_chunk / 4 >= tsrc.colsb / 4
+///         dst.dword[i] := 0
+///     ELSE
+///         dst.f32[i] := CONVERT_INT32_TO_FP32(tsrc.row[row_index].dword[row_chunk/4+i], RNE)
+///     FI
+/// ENDFOR
+/// dst[MAX_VL-1:VL] := 0
+/// zero_tileconfig_start()
+/// \endcode
+///
+/// This intrinsic corresponds to the \c TCVTROWD2PS instruction.
+///
+/// \param tsrc
+///    The 1st source tile. Max size is 1024 Bytes.
+/// \param row
+///    The row of the source tile
+#define _tile_cvtrowd2ps(tsrc, row) __builtin_ia32_tcvtrowd2ps(tsrc, row)
+
+/// Moves a row from a tile register to a zmm destination register, converting
+///    the fp32 source elements to bf16. It places the resulting bf16 elements
+///    in the high 16 bits within each dword. The row of the tile is selected
+///    by an 32b GPR.

phoebewang wrote:

Done.

https://github.com/llvm/llvm-project/pull/114070
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
@@ -0,0 +1,381 @@
+/*===- amxavx512intrin.h - AMXAVX512 ===
+ *
+ * Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+ * See https://llvm.org/LICENSE.txt for license information.
+ * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+ *
+ *======
+ */
+#ifndef __IMMINTRIN_H
+#error "Never use <amxavx512intrin.h> directly; include <immintrin.h> instead."
+#endif // __IMMINTRIN_H
+
+#ifndef __AMX_AVX512INTRIN_H
+#define __AMX_AVX512INTRIN_H
+#ifdef __x86_64__
+
+#define __DEFAULT_FN_ATTRS_AVX512 \
+  __attribute__((__always_inline__, __nodebug__, __target__("amx-avx512")))
+
+/// Moves a row from a tile register to a zmm destination register, converting
+///    the int32 source elements to fp32. The row of the tile is selected by an
+///    32b GPR.
+///
+/// \headerfile <immintrin.h>
+///
+/// \code
+/// __m512i _tile_cvtrowd2ps(__tile tsrc, unsigned int row);
+/// \endcode
+///
+/// \code{.operation}
+/// VL := 512
+/// VL_bytes := VL >> 3
+/// row_index := row & 0x
+/// row_chunk := ((row >> 16) & 0x) * VL_bytes
+/// FOR i := 0 TO (VL_bytes / 4) - 1
+///     IF i + row_chunk / 4 >= tsrc.colsb / 4
+///         dst.dword[i] := 0
+///     ELSE
+///         dst.f32[i] := CONVERT_INT32_TO_FP32(tsrc.row[row_index].dword[row_chunk/4+i], RNE)
+///     FI
+/// ENDFOR
+/// dst[MAX_VL-1:VL] := 0
+/// zero_tileconfig_start()
+/// \endcode
+///
+/// This intrinsic corresponds to the \c TCVTROWD2PS instruction.
+///
+/// \param tsrc
+///    The 1st source tile. Max size is 1024 Bytes.

phoebewang wrote:

Done.

https://github.com/llvm/llvm-project/pull/114070
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
@@ -0,0 +1,381 @@
+/*===- amxavx512intrin.h - AMXAVX512 ===
+ *
+ * Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+ * See https://llvm.org/LICENSE.txt for license information.
+ * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+ *
+ *======
+ */
+#ifndef __IMMINTRIN_H
+#error "Never use <amxavx512intrin.h> directly; include <immintrin.h> instead."
+#endif // __IMMINTRIN_H
+
+#ifndef __AMX_AVX512INTRIN_H
+#define __AMX_AVX512INTRIN_H
+#ifdef __x86_64__
+
+#define __DEFAULT_FN_ATTRS_AVX512 \
+  __attribute__((__always_inline__, __nodebug__, __target__("amx-avx512")))
+
+/// Moves a row from a tile register to a zmm destination register, converting
+///    the int32 source elements to fp32. The row of the tile is selected by an

phoebewang wrote:

Done.

https://github.com/llvm/llvm-project/pull/114070
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
https://github.com/phoebewang updated https://github.com/llvm/llvm-project/pull/114070 >From 587d0105e7724db0f35fc5c8179519fa6319e5c8 Mon Sep 17 00:00:00 2001 From: "Wang, Phoebe" Date: Tue, 29 Oct 2024 22:29:25 +0800 Subject: [PATCH 1/2] [X86][AMX] Support AMX-AVX512
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
https://github.com/phoebewang updated https://github.com/llvm/llvm-project/pull/114070 >From 587d0105e7724db0f35fc5c8179519fa6319e5c8 Mon Sep 17 00:00:00 2001 From: "Wang, Phoebe" Date: Tue, 29 Oct 2024 22:29:25 +0800 Subject: [PATCH] [X86][AMX] Support AMX-AVX512
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
@@ -0,0 +1,381 @@
+/*===- amxavx512intrin.h - AMXAVX512 ===
+ *
+ * Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+ * See https://llvm.org/LICENSE.txt for license information.
+ * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+ *
+ *======
+ */
+#ifndef __IMMINTRIN_H
+#error "Never use <amxavx512intrin.h> directly; include <immintrin.h> instead."
+#endif // __IMMINTRIN_H
+
+#ifndef __AMX_AVX512INTRIN_H
+#define __AMX_AVX512INTRIN_H
+#ifdef __x86_64__
+
+#define __DEFAULT_FN_ATTRS_AVX512 \
+  __attribute__((__always_inline__, __nodebug__, __target__("amx-avx512")))
+
+/// Moves a row from a tile register to a zmm destination register, converting
+///    the int32 source elements to fp32. The row of the tile is selected by an

fzou1 wrote:

an -> a.

https://github.com/llvm/llvm-project/pull/114070
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
@@ -0,0 +1,381 @@
+/*===- amxavx512intrin.h - AMXAVX512 ===
+ *
+ * Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+ * See https://llvm.org/LICENSE.txt for license information.
+ * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+ *
+ *======
+ */
+#ifndef __IMMINTRIN_H
+#error "Never use <amxavx512intrin.h> directly; include <immintrin.h> instead."
+#endif // __IMMINTRIN_H
+
+#ifndef __AMX_AVX512INTRIN_H
+#define __AMX_AVX512INTRIN_H
+#ifdef __x86_64__
+
+#define __DEFAULT_FN_ATTRS_AVX512 \
+  __attribute__((__always_inline__, __nodebug__, __target__("amx-avx512")))
+
+/// Moves a row from a tile register to a zmm destination register, converting
+///    the int32 source elements to fp32. The row of the tile is selected by an
+///    32b GPR.
+///
+/// \headerfile <immintrin.h>
+///
+/// \code
+/// __m512i _tile_cvtrowd2ps(__tile tsrc, unsigned int row);
+/// \endcode
+///
+/// \code{.operation}
+/// VL := 512
+/// VL_bytes := VL >> 3
+/// row_index := row & 0x
+/// row_chunk := ((row >> 16) & 0x) * VL_bytes
+/// FOR i := 0 TO (VL_bytes / 4) - 1
+///     IF i + row_chunk / 4 >= tsrc.colsb / 4
+///         dst.dword[i] := 0
+///     ELSE
+///         dst.f32[i] := CONVERT_INT32_TO_FP32(tsrc.row[row_index].dword[row_chunk/4+i], RNE)
+///     FI
+/// ENDFOR
+/// dst[MAX_VL-1:VL] := 0
+/// zero_tileconfig_start()
+/// \endcode
+///
+/// This intrinsic corresponds to the \c TCVTROWD2PS instruction.
+///
+/// \param tsrc
+///    The 1st source tile. Max size is 1024 Bytes.

fzou1 wrote:

Remove "1st", since there is only one source tile. The descriptions that follow should be updated the same way.

https://github.com/llvm/llvm-project/pull/114070
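As a reading aid for the `.operation` pseudocode quoted above, here is a rough scalar model of the row conversion in plain C. It is a sketch under stated assumptions, not the intrinsic itself: `model_tile` and `model_cvtrowd2ps` are hypothetical names, the tile is modelled as ordinary memory, and the `0xffff` masks are an assumption because the mask constants are truncated in the quoted text.

```c
#include <assert.h>
#include <stdint.h>

enum { VL_BYTES = 64, LANES = VL_BYTES / 4 }; /* VL = 512 bits */

/* Hypothetical plain-memory stand-in for a configured tile
 * (at most 16 rows of 64 bytes, i.e. 1024 bytes). */
typedef struct {
  unsigned rows;        /* configured number of rows */
  unsigned colsb;       /* configured bytes per row */
  int32_t data[16][16]; /* row-major dword storage */
} model_tile;

/* Scalar model of the quoted pseudocode: selects one row and one 64-byte
 * chunk of it, converts each int32 to float, and zero-fills the lanes that
 * lie beyond the configured row width. */
static void model_cvtrowd2ps(const model_tile *t, uint32_t row,
                             float dst[LANES]) {
  /* ASSUMPTION: 16-bit masks; the constants are cut off in the quote. */
  uint32_t row_index = row & 0xffffu;
  uint32_t row_chunk = ((row >> 16) & 0xffffu) * VL_BYTES;
  /* The model simply rejects out-of-range rows; this says nothing about
   * what the hardware does in that case. */
  assert(row_index < t->rows);
  for (uint32_t i = 0; i < LANES; ++i) {
    uint32_t col = row_chunk / 4 + i;
    if (col >= t->colsb / 4)
      dst[i] = 0.0f; /* dst.dword[i] := 0 */
    else
      dst[i] = (float)t->data[row_index][col]; /* default rounding is RNE */
  }
}
```

In this model the low bits of `row` pick the tile row and the upper bits pick which 64-byte chunk of that row is read, so a 32-byte-wide row yields eight converted lanes followed by eight zero lanes.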
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
@@ -0,0 +1,381 @@ +/*===- amxavx512intrin.h - AMXAVX512 === + * + * Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. + * See https://llvm.org/LICENSE.txt for license information. + * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception + * + *====== + */ +#ifndef __IMMINTRIN_H +#error "Never use directly; include instead." +#endif // __IMMINTRIN_H + +#ifndef __AMX_AVX512INTRIN_H +#define __AMX_AVX512INTRIN_H +#ifdef __x86_64__ + +#define __DEFAULT_FN_ATTRS_AVX512 \ + __attribute__((__always_inline__, __nodebug__, __target__("amx-avx512"))) + +/// Moves a row from a tile register to a zmm destination register, converting +///the int32 source elements to fp32. The row of the tile is selected by an +///32b GPR. +/// +/// \headerfile +/// +/// \code +/// __m512i _tile_cvtrowd2ps(__tile tsrc, unsigned int row); +/// \endcode +/// +/// \code{.operation} +/// VL := 512 +/// VL_bytes := VL >> 3 +/// row_index := row & 0x +/// row_chunk := ((row >> 16) & 0x) * VL_bytes +/// FOR i := 0 TO (VL_bytes / 4) - 1 +/// IF i + row_chunk / 4 >= tsrc.colsb / 4 +/// dst.dword[i] := 0 +/// ELSE +/// dst.f32[i] := CONVERT_INT32_TO_FP32(tsrc.row[row_index].dword[row_chunk/4+i], RNE) +/// FI +/// ENDFOR +/// dst[MAX_VL-1:VL] := 0 +/// zero_tileconfig_start() +/// \endcode +/// +/// This intrinsic corresponds to the \c TCVTROWD2PS instruction. +/// +/// \param tsrc +///The 1st source tile. Max size is 1024 Bytes. +/// \param row +///The row of the source tile +#define _tile_cvtrowd2ps(tsrc, row) __builtin_ia32_tcvtrowd2ps(tsrc, row) + +/// Moves a row from a tile register to a zmm destination register, converting +///the fp32 source elements to bf16. It places the resulting bf16 elements +///in the high 16 bits within each dword. The row of the tile is selected +///by an 32b GPR. fzou1 wrote: an -> a. The following "an 32b" should be updated to "a 32b" too. 
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
@@ -559,12 +559,68 @@ bool X86ExpandPseudo::expandMI(MachineBasicBlock &MBB,
     return true;
   }
   case X86::PTILELOADDV:
-  case X86::PTILELOADDT1V: {
+  case X86::PTILELOADDT1V:
+  case X86::PTCVTROWD2PSrreV:
+  case X86::PTCVTROWD2PSrriV:
+  case X86::PTCVTROWPS2PBF16HrreV:
+  case X86::PTCVTROWPS2PBF16HrriV:
+  case X86::PTCVTROWPS2PBF16LrreV:
+  case X86::PTCVTROWPS2PBF16LrriV:
+  case X86::PTCVTROWPS2PHHrreV:
+  case X86::PTCVTROWPS2PHHrriV:
+  case X86::PTCVTROWPS2PHLrreV:
+  case X86::PTCVTROWPS2PHLrriV:
+  case X86::PTILEMOVROWrreV:
+  case X86::PTILEMOVROWrriV: {
     for (unsigned i = 2; i > 0; --i)
       MI.removeOperand(i);
-    unsigned Opc = Opcode == X86::PTILELOADDV
-                       ? GET_EGPR_IF_ENABLED(X86::TILELOADD)
-                       : GET_EGPR_IF_ENABLED(X86::TILELOADDT1);
+    unsigned Opc;
+    switch (Opcode) {
+    case X86::PTILELOADDV:
+      Opc = GET_EGPR_IF_ENABLED(X86::TILELOADD);
+      break;
+    case X86::PTILELOADDT1V:
+      Opc = GET_EGPR_IF_ENABLED(X86::TILELOADDT1);
+      break;
+    case X86::PTCVTROWD2PSrreV:
+      Opc = X86::TCVTROWD2PSrre;
+      break;
+    case X86::PTCVTROWD2PSrriV:
+      Opc = X86::TCVTROWD2PSrri;
+      break;
+    case X86::PTCVTROWPS2PBF16HrreV:
+      Opc = X86::TCVTROWPS2PBF16Hrre;
+      break;
+    case X86::PTCVTROWPS2PBF16HrriV:
+      Opc = X86::TCVTROWPS2PBF16Hrri;
+      break;
+    case X86::PTCVTROWPS2PBF16LrreV:
+      Opc = X86::TCVTROWPS2PBF16Lrre;
+      break;
+    case X86::PTCVTROWPS2PBF16LrriV:
+      Opc = X86::TCVTROWPS2PBF16Lrri;
+      break;
+    case X86::PTCVTROWPS2PHHrreV:
+      Opc = X86::TCVTROWPS2PHHrre;
+      break;
+    case X86::PTCVTROWPS2PHHrriV:
+      Opc = X86::TCVTROWPS2PHHrri;
+      break;
+    case X86::PTCVTROWPS2PHLrreV:
+      Opc = X86::TCVTROWPS2PHLrre;
+      break;
+    case X86::PTCVTROWPS2PHLrriV:
+      Opc = X86::TCVTROWPS2PHLrri;
+      break;
+    case X86::PTILEMOVROWrreV:
+      Opc = X86::TILEMOVROWrre;
+      break;
+    case X86::PTILEMOVROWrriV:
+      Opc = X86::TILEMOVROWrri;
+      break;
+    default:
+      llvm_unreachable("Impossible Opcode!");

fzou1 wrote: Better to change "Impossible" to "Invalid".
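The hunk above replaces a two-way conditional with a switch that maps each pseudo opcode to its real counterpart and treats anything else as a programming error. A stripped-down C sketch of that pattern is shown below. The enumerators are hypothetical stand-ins for a few of the `X86::` opcodes, `assert` stands in for `llvm_unreachable`, and the message uses the "Invalid" wording suggested in the review; none of this is the actual LLVM code.

```c
#include <assert.h>

/* Hypothetical stand-ins for a handful of X86:: opcode enumerators. */
typedef enum {
  PTCVTROWD2PSrreV,
  PTCVTROWD2PSrriV,
  PTILEMOVROWrreV,
  PTILEMOVROWrriV,
  TCVTROWD2PSrre,
  TCVTROWD2PSrri,
  TILEMOVROWrre,
  TILEMOVROWrri,
  INVALID_OPCODE
} opcode;

/* Map a pseudo opcode to the real instruction it expands to. */
static opcode real_opcode_for(opcode pseudo) {
  switch (pseudo) {
  case PTCVTROWD2PSrreV:
    return TCVTROWD2PSrre;
  case PTCVTROWD2PSrriV:
    return TCVTROWD2PSrri;
  case PTILEMOVROWrreV:
    return TILEMOVROWrre;
  case PTILEMOVROWrriV:
    return TILEMOVROWrri;
  default:
    /* Callers only reach this function for the cases above. */
    assert(0 && "Invalid opcode!");
    return INVALID_OPCODE;
  }
}
```

Keeping the mapping in one switch means a newly added pseudo that is listed in the outer case labels but forgotten here fails loudly in debug builds instead of silently picking the wrong instruction.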
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
llvmbot wrote:
@llvm/pr-subscribers-clang-driver
Author: Phoebe Wang (phoebewang)
Changes
---
Patch is 81.89 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/114070.diff

31 Files Affected:
- (modified) clang/docs/ReleaseNotes.rst (+2)
- (modified) clang/include/clang/Basic/BuiltinsX86_64.def (+13)
- (modified) clang/include/clang/Driver/Options.td (+2)
- (modified) clang/lib/Basic/Targets/X86.cpp (+6)
- (modified) clang/lib/Basic/Targets/X86.h (+1)
- (modified) clang/lib/Headers/CMakeLists.txt (+1)
- (added) clang/lib/Headers/amxavx512intrin.h (+381)
- (modified) clang/lib/Headers/immintrin.h (+4)
- (modified) clang/lib/Sema/SemaX86.cpp (+6)
- (added) clang/test/CodeGen/X86/amx_avx512_api.c (+52)
- (added) clang/test/CodeGen/X86/amxavx512-builtins.c (+41)
- (modified) clang/test/CodeGen/attr-target-x86.c (+4-4)
- (modified) clang/test/Driver/x86-target-features.c (+7)
- (modified) clang/test/Preprocessor/x86_target_features.c (+7)
- (modified) llvm/include/llvm/IR/IntrinsicsX86.td (+50)
- (modified) llvm/include/llvm/TargetParser/X86TargetParser.def (+1)
- (modified) llvm/lib/Target/X86/X86.td (+4)
- (modified) llvm/lib/Target/X86/X86ExpandPseudo.cpp (+60-4)
- (modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+76)
- (modified) llvm/lib/Target/X86/X86InstrAMX.td (+147)
- (modified) llvm/lib/Target/X86/X86InstrPredicates.td (+1)
- (modified) llvm/lib/Target/X86/X86LowerAMXType.cpp (+11)
- (modified) llvm/lib/Target/X86/X86PreTileConfig.cpp (+17-2)
- (modified) llvm/lib/TargetParser/Host.cpp (+4)
- (modified) llvm/lib/TargetParser/X86TargetParser.cpp (+2)
- (added) llvm/test/CodeGen/X86/amx-across-func-tilemovrow.ll (+171)
- (added) llvm/test/CodeGen/X86/amx-avx512-intrinsics.ll (+116)
- (added) llvm/test/CodeGen/X86/amx-tile-avx512-internals.ll (+61)
- (added) llvm/test/MC/Disassembler/X86/amx-avx512.txt (+106)
- (added) llvm/test/MC/X86/amx-avx512-att.s (+105)
- (added) llvm/test/MC/X86/amx-avx512-intel.s (+105)

``diff
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index ce046a305c89b6..d45bd1240dd173 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -611,6 +611,8 @@ X86 Support
 * Supported MINMAX intrinsics of ``*_(mask(z)))_minmax(ne)_p[s|d|h|bh]`` and
   ``*_(mask(z)))_minmax_s[s|d|h]``.

+- Support ISA of ``AMX-AVX512``.
+
 - All intrinsics in adcintrin.h can now be used in constant expressions.

 - All intrinsics in adxintrin.h can now be used in constant expressions.
diff --git a/clang/include/clang/Basic/BuiltinsX86_64.def b/clang/include/clang/Basic/BuiltinsX86_64.def
index 2c591edb2835cd..70644f3f6b6054 100644
--- a/clang/include/clang/Basic/BuiltinsX86_64.def
+++ b/clang/include/clang/Basic/BuiltinsX86_64.def
@@ -128,6 +128,12 @@ TARGET_BUILTIN(__builtin_ia32_tdpbf16ps_internal, "V256iUsUsUsV256iV256iV256i",
 TARGET_BUILTIN(__builtin_ia32_tdpfp16ps_internal, "V256iUsUsUsV256iV256iV256i", "n", "amx-fp16")
 TARGET_BUILTIN(__builtin_ia32_tcmmimfp16ps_internal, "V256iUsUsUsV256iV256iV256i", "n", "amx-complex")
 TARGET_BUILTIN(__builtin_ia32_tcmmrlfp16ps_internal, "V256iUsUsUsV256iV256iV256i", "n", "amx-complex")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowd2ps_internal, "V16fUsUsV256iUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2pbf16h_internal, "V32yUsUsV256iUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2pbf16l_internal, "V32yUsUsV256iUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2phh_internal, "V32xUsUsV256iUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2phl_internal, "V32xUsUsV256iUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tilemovrow_internal, "V16iUsUsV256iUi", "n", "amx-avx512")
 // AMX
 TARGET_BUILTIN(__builtin_ia32_tile_loadconfig, "vvC*", "n", "amx-tile")
 TARGET_BUILTIN(__builtin_ia32_tile_storeconfig, "vvC*", "n", "amx-tile")
@@ -148,6 +154,13 @@ TARGET_BUILTIN(__builtin_ia32_ptwrite64, "vUOi", "n", "ptwrite")
 TARGET_BUILTIN(__builtin_ia32_tcmmimfp16ps, "vIUcIUcIUc", "n", "amx-complex")
 TARGET_BUILTIN(__builtin_ia32_tcmmrlfp16ps, "vIUcIUcIUc", "n", "amx-complex")

+TARGET_BUILTIN(__builtin_ia32_tcvtrowd2ps, "V16fIUcUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2pbf16h, "V32yIUcUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2pbf16l, "V32yIUcUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2phh, "V32xIUcUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2phl, "V32xIUcUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tilemovrow, "V16iIUcUi", "n", "amx-avx512")
+
 TARGET_BUILTIN(__builtin_ia32_prefetchi, "vvC*Ui", "nc", "prefetchi")
 TARGET_BUILTIN(__builtin_ia32_cmpccxadd32, "Siv*SiSiIi", "n", "cmpccxadd")
 TARGET_BUILTIN(__builtin_ia32_cmpccxadd64, "SLLiv*SLLiSLLiIi", "n", "cmpccxadd")
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
ind
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
llvmbot wrote:
@llvm/pr-subscribers-backend-x86
Author: Phoebe Wang (phoebewang)
Changes
---
Patch is 81.89 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/114070.diff

31 Files Affected:
- (modified) clang/docs/ReleaseNotes.rst (+2)
- (modified) clang/include/clang/Basic/BuiltinsX86_64.def (+13)
- (modified) clang/include/clang/Driver/Options.td (+2)
- (modified) clang/lib/Basic/Targets/X86.cpp (+6)
- (modified) clang/lib/Basic/Targets/X86.h (+1)
- (modified) clang/lib/Headers/CMakeLists.txt (+1)
- (added) clang/lib/Headers/amxavx512intrin.h (+381)
- (modified) clang/lib/Headers/immintrin.h (+4)
- (modified) clang/lib/Sema/SemaX86.cpp (+6)
- (added) clang/test/CodeGen/X86/amx_avx512_api.c (+52)
- (added) clang/test/CodeGen/X86/amxavx512-builtins.c (+41)
- (modified) clang/test/CodeGen/attr-target-x86.c (+4-4)
- (modified) clang/test/Driver/x86-target-features.c (+7)
- (modified) clang/test/Preprocessor/x86_target_features.c (+7)
- (modified) llvm/include/llvm/IR/IntrinsicsX86.td (+50)
- (modified) llvm/include/llvm/TargetParser/X86TargetParser.def (+1)
- (modified) llvm/lib/Target/X86/X86.td (+4)
- (modified) llvm/lib/Target/X86/X86ExpandPseudo.cpp (+60-4)
- (modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+76)
- (modified) llvm/lib/Target/X86/X86InstrAMX.td (+147)
- (modified) llvm/lib/Target/X86/X86InstrPredicates.td (+1)
- (modified) llvm/lib/Target/X86/X86LowerAMXType.cpp (+11)
- (modified) llvm/lib/Target/X86/X86PreTileConfig.cpp (+17-2)
- (modified) llvm/lib/TargetParser/Host.cpp (+4)
- (modified) llvm/lib/TargetParser/X86TargetParser.cpp (+2)
- (added) llvm/test/CodeGen/X86/amx-across-func-tilemovrow.ll (+171)
- (added) llvm/test/CodeGen/X86/amx-avx512-intrinsics.ll (+116)
- (added) llvm/test/CodeGen/X86/amx-tile-avx512-internals.ll (+61)
- (added) llvm/test/MC/Disassembler/X86/amx-avx512.txt (+106)
- (added) llvm/test/MC/X86/amx-avx512-att.s (+105)
- (added) llvm/test/MC/X86/amx-avx512-intel.s (+105)

``diff
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index ce046a305c89b6..d45bd1240dd173 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -611,6 +611,8 @@ X86 Support
 * Supported MINMAX intrinsics of ``*_(mask(z)))_minmax(ne)_p[s|d|h|bh]`` and
   ``*_(mask(z)))_minmax_s[s|d|h]``.

+- Support ISA of ``AMX-AVX512``.
+
 - All intrinsics in adcintrin.h can now be used in constant expressions.

 - All intrinsics in adxintrin.h can now be used in constant expressions.
diff --git a/clang/include/clang/Basic/BuiltinsX86_64.def b/clang/include/clang/Basic/BuiltinsX86_64.def
index 2c591edb2835cd..70644f3f6b6054 100644
--- a/clang/include/clang/Basic/BuiltinsX86_64.def
+++ b/clang/include/clang/Basic/BuiltinsX86_64.def
@@ -128,6 +128,12 @@ TARGET_BUILTIN(__builtin_ia32_tdpbf16ps_internal, "V256iUsUsUsV256iV256iV256i",
 TARGET_BUILTIN(__builtin_ia32_tdpfp16ps_internal, "V256iUsUsUsV256iV256iV256i", "n", "amx-fp16")
 TARGET_BUILTIN(__builtin_ia32_tcmmimfp16ps_internal, "V256iUsUsUsV256iV256iV256i", "n", "amx-complex")
 TARGET_BUILTIN(__builtin_ia32_tcmmrlfp16ps_internal, "V256iUsUsUsV256iV256iV256i", "n", "amx-complex")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowd2ps_internal, "V16fUsUsV256iUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2pbf16h_internal, "V32yUsUsV256iUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2pbf16l_internal, "V32yUsUsV256iUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2phh_internal, "V32xUsUsV256iUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2phl_internal, "V32xUsUsV256iUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tilemovrow_internal, "V16iUsUsV256iUi", "n", "amx-avx512")
 // AMX
 TARGET_BUILTIN(__builtin_ia32_tile_loadconfig, "vvC*", "n", "amx-tile")
 TARGET_BUILTIN(__builtin_ia32_tile_storeconfig, "vvC*", "n", "amx-tile")
@@ -148,6 +154,13 @@ TARGET_BUILTIN(__builtin_ia32_ptwrite64, "vUOi", "n", "ptwrite")
 TARGET_BUILTIN(__builtin_ia32_tcmmimfp16ps, "vIUcIUcIUc", "n", "amx-complex")
 TARGET_BUILTIN(__builtin_ia32_tcmmrlfp16ps, "vIUcIUcIUc", "n", "amx-complex")

+TARGET_BUILTIN(__builtin_ia32_tcvtrowd2ps, "V16fIUcUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2pbf16h, "V32yIUcUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2pbf16l, "V32yIUcUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2phh, "V32xIUcUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2phl, "V32xIUcUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tilemovrow, "V16iIUcUi", "n", "amx-avx512")
+
 TARGET_BUILTIN(__builtin_ia32_prefetchi, "vvC*Ui", "nc", "prefetchi")
 TARGET_BUILTIN(__builtin_ia32_cmpccxadd32, "Siv*SiSiIi", "n", "cmpccxadd")
 TARGET_BUILTIN(__builtin_ia32_cmpccxadd64, "SLLiv*SLLiSLLiIi", "n", "cmpccxadd")
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
inde
[clang] [llvm] [X86][AMX] Support AMX-AVX512 (PR #114070)
https://github.com/phoebewang created https://github.com/llvm/llvm-project/pull/114070

None

>From 587d0105e7724db0f35fc5c8179519fa6319e5c8 Mon Sep 17 00:00:00 2001
From: "Wang, Phoebe"
Date: Tue, 29 Oct 2024 22:29:25 +0800
Subject: [PATCH] [X86][AMX] Support AMX-AVX512

---
 clang/docs/ReleaseNotes.rst | 2 +
 clang/include/clang/Basic/BuiltinsX86_64.def | 13 +
 clang/include/clang/Driver/Options.td | 2 +
 clang/lib/Basic/Targets/X86.cpp | 6 +
 clang/lib/Basic/Targets/X86.h | 1 +
 clang/lib/Headers/CMakeLists.txt | 1 +
 clang/lib/Headers/amxavx512intrin.h | 381 ++
 clang/lib/Headers/immintrin.h | 4 +
 clang/lib/Sema/SemaX86.cpp | 6 +
 clang/test/CodeGen/X86/amx_avx512_api.c | 52 +++
 clang/test/CodeGen/X86/amxavx512-builtins.c | 41 ++
 clang/test/CodeGen/attr-target-x86.c | 8 +-
 clang/test/Driver/x86-target-features.c | 7 +
 clang/test/Preprocessor/x86_target_features.c | 7 +
 llvm/include/llvm/IR/IntrinsicsX86.td | 50 +++
 .../llvm/TargetParser/X86TargetParser.def | 1 +
 llvm/lib/Target/X86/X86.td | 4 +
 llvm/lib/Target/X86/X86ExpandPseudo.cpp | 64 ++-
 llvm/lib/Target/X86/X86ISelLowering.cpp | 76
 llvm/lib/Target/X86/X86InstrAMX.td | 147 +++
 llvm/lib/Target/X86/X86InstrPredicates.td | 1 +
 llvm/lib/Target/X86/X86LowerAMXType.cpp | 11 +
 llvm/lib/Target/X86/X86PreTileConfig.cpp | 19 +-
 llvm/lib/TargetParser/Host.cpp | 4 +
 llvm/lib/TargetParser/X86TargetParser.cpp | 2 +
 .../CodeGen/X86/amx-across-func-tilemovrow.ll | 171
 .../test/CodeGen/X86/amx-avx512-intrinsics.ll | 116 ++
 .../CodeGen/X86/amx-tile-avx512-internals.ll | 61 +++
 llvm/test/MC/Disassembler/X86/amx-avx512.txt | 106 +
 llvm/test/MC/X86/amx-avx512-att.s | 105 +
 llvm/test/MC/X86/amx-avx512-intel.s | 105 +
 31 files changed, 1564 insertions(+), 10 deletions(-)
 create mode 100644 clang/lib/Headers/amxavx512intrin.h
 create mode 100644 clang/test/CodeGen/X86/amx_avx512_api.c
 create mode 100644 clang/test/CodeGen/X86/amxavx512-builtins.c
 create mode 100644 llvm/test/CodeGen/X86/amx-across-func-tilemovrow.ll
 create mode 100644 llvm/test/CodeGen/X86/amx-avx512-intrinsics.ll
 create mode 100644 llvm/test/CodeGen/X86/amx-tile-avx512-internals.ll
 create mode 100644 llvm/test/MC/Disassembler/X86/amx-avx512.txt
 create mode 100644 llvm/test/MC/X86/amx-avx512-att.s
 create mode 100644 llvm/test/MC/X86/amx-avx512-intel.s

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index ce046a305c89b6..d45bd1240dd173 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -611,6 +611,8 @@ X86 Support
 * Supported MINMAX intrinsics of ``*_(mask(z)))_minmax(ne)_p[s|d|h|bh]`` and
   ``*_(mask(z)))_minmax_s[s|d|h]``.

+- Support ISA of ``AMX-AVX512``.
+
 - All intrinsics in adcintrin.h can now be used in constant expressions.

 - All intrinsics in adxintrin.h can now be used in constant expressions.
diff --git a/clang/include/clang/Basic/BuiltinsX86_64.def b/clang/include/clang/Basic/BuiltinsX86_64.def
index 2c591edb2835cd..70644f3f6b6054 100644
--- a/clang/include/clang/Basic/BuiltinsX86_64.def
+++ b/clang/include/clang/Basic/BuiltinsX86_64.def
@@ -128,6 +128,12 @@ TARGET_BUILTIN(__builtin_ia32_tdpbf16ps_internal, "V256iUsUsUsV256iV256iV256i",
 TARGET_BUILTIN(__builtin_ia32_tdpfp16ps_internal, "V256iUsUsUsV256iV256iV256i", "n", "amx-fp16")
 TARGET_BUILTIN(__builtin_ia32_tcmmimfp16ps_internal, "V256iUsUsUsV256iV256iV256i", "n", "amx-complex")
 TARGET_BUILTIN(__builtin_ia32_tcmmrlfp16ps_internal, "V256iUsUsUsV256iV256iV256i", "n", "amx-complex")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowd2ps_internal, "V16fUsUsV256iUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2pbf16h_internal, "V32yUsUsV256iUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2pbf16l_internal, "V32yUsUsV256iUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2phh_internal, "V32xUsUsV256iUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2phl_internal, "V32xUsUsV256iUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tilemovrow_internal, "V16iUsUsV256iUi", "n", "amx-avx512")
 // AMX
 TARGET_BUILTIN(__builtin_ia32_tile_loadconfig, "vvC*", "n", "amx-tile")
 TARGET_BUILTIN(__builtin_ia32_tile_storeconfig, "vvC*", "n", "amx-tile")
@@ -148,6 +154,13 @@ TARGET_BUILTIN(__builtin_ia32_ptwrite64, "vUOi", "n", "ptwrite")
 TARGET_BUILTIN(__builtin_ia32_tcmmimfp16ps, "vIUcIUcIUc", "n", "amx-complex")
 TARGET_BUILTIN(__builtin_ia32_tcmmrlfp16ps, "vIUcIUcIUc", "n", "amx-complex")

+TARGET_BUILTIN(__builtin_ia32_tcvtrowd2ps, "V16fIUcUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin_ia32_tcvtrowps2pbf16h, "V32yIUcUi", "n", "amx-avx512")
+TARGET_BUILTIN(__builtin