[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
jhuber6 wrote:

> The `openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp` file requires
> the `HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY` symbol.
>
> This symbol is expected to be provided by
> `openmp/libomptarget/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h`, not
> by third-party external `/opt/rocm/include/hsa/hsa_ext_amd.h`.

This symbol was introduced in ROCm-5.3, see https://github.com/ROCm/ROCR-Runtime/blob/rocm-5.3.x/src/inc/hsa_ext_amd.h#L333. The `dynamic_hsa/` version is a copy of this header for use when the system version is not provided. If the build fails to find HSA on the system, it will use the dynamic version. The problem here is that you _have_ HSA, but it's too old. I don't know how much backward compatibility we really provide here. Unfortunately, the HSA headers really don't give you much versioning to work with, so we can't do `ifdef` on this stuff.

https://github.com/llvm/llvm-project/pull/95484
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
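One possible reading of the `ifdef` remark above can be sketched in a few lines: the attribute is an enumerator rather than a macro, so the preprocessor cannot see it. The enum and its names below are hypothetical stand-ins, not the real HSA declarations; only the value `0xA016` is taken from this thread.

```cpp
// Hypothetical stand-in for an HSA-style header: the attribute is declared
// as an enumerator, not as a macro. The names are invented for illustration.
enum hypothetical_agent_info_t {
  HYPOTHETICAL_AGENT_INFO_TIMESTAMP_FREQUENCY = 0xA016
};

// The preprocessor runs before enumerators exist, so this test is false even
// though the enumerator is declared just above.
#ifdef HYPOTHETICAL_AGENT_INFO_TIMESTAMP_FREQUENCY
constexpr bool SeenByPreprocessor = true;
#else
constexpr bool SeenByPreprocessor = false;
#endif

static_assert(!SeenByPreprocessor,
              "enumerators are invisible to #ifdef, so it cannot detect them");
```

Without a version macro in the header, a build would presumably need a configure-time compile check to detect such a symbol.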
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
illwieckz wrote:

I can reproduce the bug with both `release/18.x` and `release/17.x`.

I cannot reproduce the bug with `release/16.x`.

I cannot test `release/15.x` because of other, unrelated errors (such as `getenv` not being defined).

https://github.com/llvm/llvm-project/pull/95484
[llvm-branch-commits] [llvm] [Support] Integrate SipHash.cpp into libSupport. (PR #94394)
https://github.com/kbeyls approved this pull request.

https://github.com/llvm/llvm-project/pull/94394
[llvm-branch-commits] [llvm] [Support] Integrate SipHash.cpp into libSupport. (PR #94394)
kbeyls wrote:

> [37c84b9](https://github.com/llvm/llvm-project/pull/94394/commits/37c84b9dce70f40db8a7c27b7de8232c4d10f78f)
> shows what I had in mind, let me know what you all think. I added:
>
> ```
> void getSipHash_2_4_64(ArrayRef<uint8_t> In, const uint8_t (&K)[16],
>                        uint8_t (&Out)[8]);
>
> void getSipHash_2_4_128(ArrayRef<uint8_t> In, const uint8_t (&K)[16],
>                         uint8_t (&Out)[16]);
> ```
>
> as the core interfaces, and mimicked the ref. test harness to reuse the same
> test vectors. If this seems reasonable to y'all I'm happy to extract the
> vectors.h file from the ref. implementation into the "Import original
> sources" PR – that's why I kept it open ;)

Thanks, that looks good to me.

https://github.com/llvm/llvm-project/pull/94394
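To make the idea of reusing the reference test vectors concrete, here is a minimal standalone sketch of SipHash-2-4 with a 64-bit result. It is not LLVM's `SipHash.cpp` and does not use `ArrayRef`; the names `sipHash24` and `sipHashOfCountingBytes` are invented for illustration, and the input pattern (key `00 01 .. 0f`, message `00 01 02 ..`) is assumed to match the one the reference harness uses.

```cpp
#include <cstddef>
#include <cstdint>

// Rotate a 64-bit value left by Bits (0 < Bits < 64).
static inline uint64_t rotl64(uint64_t X, unsigned Bits) {
  return (X << Bits) | (X >> (64 - Bits));
}

// Read 8 bytes as a little-endian 64-bit word.
static inline uint64_t readLE64(const uint8_t *P) {
  uint64_t V = 0;
  for (int I = 7; I >= 0; --I)
    V = (V << 8) | P[I];
  return V;
}

// Standalone SipHash-2-4 with a 64-bit result (illustrative, not LLVM's code).
uint64_t sipHash24(const uint8_t *In, size_t Len, const uint8_t (&K)[16]) {
  const uint64_t K0 = readLE64(K), K1 = readLE64(K + 8);
  uint64_t V0 = K0 ^ 0x736f6d6570736575ULL;
  uint64_t V1 = K1 ^ 0x646f72616e646f6dULL;
  uint64_t V2 = K0 ^ 0x6c7967656e657261ULL;
  uint64_t V3 = K1 ^ 0x7465646279746573ULL;

  auto Round = [&] {
    V0 += V1; V1 = rotl64(V1, 13); V1 ^= V0; V0 = rotl64(V0, 32);
    V2 += V3; V3 = rotl64(V3, 16); V3 ^= V2;
    V0 += V3; V3 = rotl64(V3, 21); V3 ^= V0;
    V2 += V1; V1 = rotl64(V1, 17); V1 ^= V2; V2 = rotl64(V2, 32);
  };

  // Compression: absorb each full 8-byte word with 2 rounds.
  const size_t Full = Len & ~size_t(7);
  for (size_t I = 0; I < Full; I += 8) {
    const uint64_t M = readLE64(In + I);
    V3 ^= M; Round(); Round(); V0 ^= M;
  }

  // Last word: the leftover bytes plus the length in the top byte.
  uint64_t B = uint64_t(Len) << 56;
  for (size_t I = 0; I < (Len & 7); ++I)
    B |= uint64_t(In[Full + I]) << (8 * I);
  V3 ^= B; Round(); Round(); V0 ^= B;

  // Finalization: 4 rounds.
  V2 ^= 0xff;
  Round(); Round(); Round(); Round();
  return V0 ^ V1 ^ V2 ^ V3;
}

// Hash the first Len bytes (Len <= 64) of 00 01 02 ... under the key
// 00 01 ... 0f.
uint64_t sipHashOfCountingBytes(size_t Len) {
  uint8_t Key[16], Msg[64];
  for (int I = 0; I < 16; ++I) Key[I] = uint8_t(I);
  for (int I = 0; I < 64; ++I) Msg[I] = uint8_t(I);
  return sipHash24(Msg, Len, Key);
}
```

A vector-driven test would call `sipHashOfCountingBytes` for each length and compare the little-endian bytes of the result with the matching table entry.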
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
illwieckz wrote:

Here is a script to reproduce the bug:

```bash
#! /usr/bin/env bash

set -x -u -e -o pipefail

version="${1:-18}"

CMAKE_BUILD_PARALLEL_LEVEL="$(nproc)"
export CMAKE_BUILD_PARALLEL_LEVEL="${CMAKE_BUILD_PARALLEL_LEVEL:-4}"

workspace="llvm-bug95484-${version}"

rm -rf "${workspace}"
mkdir "${workspace}"
cd "${workspace}"

git clone --depth 1 \
    --branch "release/${version}.x" \
    'https://github.com/llvm/llvm-project.git' \
    'llvm-project'

git clone --depth 1 \
    'https://github.com/KhronosGroup/SPIRV-Headers.git' \
    'llvm-project/llvm/projects/SPIRV-Headers'

git clone --depth 1 \
    --branch "llvm_release_${version}0" \
    'https://github.com/KhronosGroup/SPIRV-LLVM-Translator.git' \
    'llvm-project/llvm/projects/SPIRV-LLVM-Translator'

cmake \
    -S'llvm-project/llvm' \
    -B'build' \
    -G'Ninja' \
    -D'CMAKE_INSTALL_PREFIX'='install' \
    -D'CMAKE_BUILD_TYPE'='Release' \
    -D'BUILD_SHARED_LIBS'='ON' \
    -D'LLVM_ENABLE_PROJECTS'='clang;openmp' \
    -D'LLVM_TARGETS_TO_BUILD'='Native' \
    -D'LLVM_EXPERIMENTAL_TARGETS_TO_BUILD'='SPIRV' \
    -D'LLVM_ENABLE_ASSERTIONS'='OFF' \
    -D'LLVM_ENABLE_RTTI'='ON' \
    -D'LLVM_BUILD_TESTS'='OFF' \
    -D'LLVM_BUILD_TOOLS'='ON' \
    -D'LLVM_SPIRV_INCLUDE_TESTS'='OFF' \
    -D'LLVM_EXTERNAL_PROJECTS'='SPIRV-Headers'

cmake --build 'build'

cmake --install 'build'
```

To use it, save it as `llvm-bug95484` and run either:

- `./llvm-bug95484` to fetch and attempt a clean build of `release/18.x` in a way that reproduces the bug,
- `./llvm-bug95484 17` to fetch and reproduce the bug with `release/17.x`.

It will fail this way:

```
llvm-bug95484-18/llvm-project/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:1902:37: error: ‘HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY’ was not declared in this scope; did you mean ‘HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY’?
 1902 |     if (auto Err = getDeviceAttrRaw(HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY,
      |                                     ^~
      |                                     HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY
```

https://github.com/llvm/llvm-project/pull/95484
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
illwieckz wrote:

```
$ rg HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY
libc/utils/gpu/loader/amdgpu/Loader.cpp
521:                 HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY),

openmp/libomptarget/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h
74:  HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY = 0xA016,

openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
1892:    if (auto Err = getDeviceAttrRaw(HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY,
```

The `openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp` file requires the `HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY` symbol.

This symbol is expected to be provided by `openmp/libomptarget/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h`, not by third-party external `/opt/rocm/include/hsa/hsa_ext_amd.h`.

The code in `release/17.x` and `release/18.x` explicitly looks for ROCm's `hsa/hsa_ext_amd.h` and never looks for LLVM's `dynamic_hsa/hsa_ext_amd.h` by that path. It does try the LLVM-provided `hsa_ext_amd.h` as a fallback, but because of a mistake in `CMakeLists.txt` the fallback does not always work: `dynamic_hsa` is not added to the include directories in all cases.

https://github.com/llvm/llvm-project/pull/95484
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
illwieckz wrote:

> We made a change recently that made the dynamic_hsa version the default. The
> error you're seeing is from an old HSA, so if you're overriding the default
> to use an old library that's probably not worth working around.

The error I see comes from the fact that there is no old HSA around to work around an LLVM bug.

There is no `hsa/hsa.h` in the tree, and the default `dynamic_hsa` is not used.

The `hsa/hsa.h` file is from ROCm, not from LLVM.

Without this patch, LLVM requires ROCm to be installed and configured to be in the default include paths for `src/rtl.cpp` to build if `hsa.cpp` is not built.

This patch makes LLVM use `dynamic_hsa` for building `src/rtl.cpp`, because it is the default.

This patch is needed to build both `release/17.x` and `release/18.x`. The `main` branch changed the code layout, so the patch will not work there.

I assume a full LLVM build does not trigger the problem because something else adds `dynamic_hsa` to the include directories, which makes it findable by `src/rtl.cpp` by luck. But when building a partial LLVM, with just what some applications need, `dynamic_hsa` is not added to the include directories even though `src/rtl.cpp` requires it.

https://github.com/llvm/llvm-project/pull/95484
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/95484
[llvm-branch-commits] [libc] 4076c30 - [libc] more fix
Author: Schrodinger ZHU Yifan
Date: 2024-06-13T20:22:21-07:00
New Revision: 4076c3004f09e95d1fcd299452843f99235ff422

URL: https://github.com/llvm/llvm-project/commit/4076c3004f09e95d1fcd299452843f99235ff422
DIFF: https://github.com/llvm/llvm-project/commit/4076c3004f09e95d1fcd299452843f99235ff422.diff

LOG: [libc] more fix

Added:

Modified:
    libc/cmake/modules/LLVMLibCTestRules.cmake
    libc/test/IntegrationTest/CMakeLists.txt
    libc/test/IntegrationTest/test.cpp
    libc/test/UnitTest/CMakeLists.txt
    libc/test/UnitTest/HermeticTestUtils.cpp

Removed:

diff --git a/libc/cmake/modules/LLVMLibCTestRules.cmake b/libc/cmake/modules/LLVMLibCTestRules.cmake
index eb6be91b55e26..c8d7c8a2b1c7c 100644
--- a/libc/cmake/modules/LLVMLibCTestRules.cmake
+++ b/libc/cmake/modules/LLVMLibCTestRules.cmake
@@ -686,6 +686,15 @@ function(add_libc_hermetic_test test_name)
                         LibcTest.hermetic
                         libc.test.UnitTest.ErrnoSetterMatcher
                         ${fq_deps_list})
+  # TODO: currently the dependency chain is broken such that getauxval cannot properly
+  # propagate to hermetic tests. This is a temporary workaround.
+  if (LIBC_TARGET_ARCHITECTURE_IS_AARCH64)
+    target_link_libraries(
+      ${fq_build_target_name}
+      PRIVATE
+        libc.src.sys.auxv.getauxval
+    )
+  endif()
 
   # Tests on the GPU require an external loader utility to launch the kernel.
   if(TARGET libc.utils.gpu.loader)
diff --git a/libc/test/IntegrationTest/CMakeLists.txt b/libc/test/IntegrationTest/CMakeLists.txt
index 4f31f10b29f0b..4a999407d48d7 100644
--- a/libc/test/IntegrationTest/CMakeLists.txt
+++ b/libc/test/IntegrationTest/CMakeLists.txt
@@ -1,3 +1,7 @@
+set(arch_specific_deps)
+if(LIBC_TARGET_ARCHITECTURE_IS_AARCH64)
+  set(arch_specific_deps libc.src.sys.auxv.getauxval)
+endif()
 add_object_library(
   test
   SRCS
@@ -8,4 +12,5 @@ add_object_library(
     test.h
   DEPENDS
     libc.src.__support.OSUtil.osutil
+    ${arch_specific_deps}
 )
diff --git a/libc/test/IntegrationTest/test.cpp b/libc/test/IntegrationTest/test.cpp
index 27e7f29efa0f1..a8b2f2911fd8e 100644
--- a/libc/test/IntegrationTest/test.cpp
+++ b/libc/test/IntegrationTest/test.cpp
@@ -6,6 +6,8 @@
 //
 //===--===//
 
+#include "src/__support/common.h"
+#include "src/sys/auxv/getauxval.h"
 #include
 #include
 
@@ -80,9 +82,11 @@ void *realloc(void *ptr, size_t s) {
 // __dso_handle when -nostdlib is used.
 void *__dso_handle = nullptr;
 
-// On some platform (aarch64 fedora tested) full build integration test
-// objects need to link against libgcc, which may expect a __getauxval
-// function. For now, it is fine to provide a weak definition that always
-// returns false.
-[[gnu::weak]] bool __getauxval(uint64_t, uint64_t *) { return false; }
+#ifdef LIBC_TARGET_ARCH_IS_AARCH64
+// Due to historical reasons, libgcc on aarch64 may expect __getauxval to be
+// defined. See also https://gcc.gnu.org/pipermail/gcc-cvs/2020-June/300635.html
+unsigned long __getauxval(unsigned long id) {
+  return LIBC_NAMESPACE::getauxval(id);
+}
+#endif
 } // extern "C"
diff --git a/libc/test/UnitTest/CMakeLists.txt b/libc/test/UnitTest/CMakeLists.txt
index 302af3044ca3d..4adc2f5c725f7 100644
--- a/libc/test/UnitTest/CMakeLists.txt
+++ b/libc/test/UnitTest/CMakeLists.txt
@@ -41,7 +41,7 @@ function(add_unittest_framework_library name)
   target_compile_options(${name}.hermetic PRIVATE ${compile_options})
 
   if(TEST_LIB_DEPENDS)
-    foreach(dep IN LISTS ${TEST_LIB_DEPENDS})
+    foreach(dep IN ITEMS ${TEST_LIB_DEPENDS})
       if(TARGET ${dep}.unit)
         add_dependencies(${name}.unit ${dep}.unit)
       else()
diff --git a/libc/test/UnitTest/HermeticTestUtils.cpp b/libc/test/UnitTest/HermeticTestUtils.cpp
index 349c182ff2379..6e815e6c8aab0 100644
--- a/libc/test/UnitTest/HermeticTestUtils.cpp
+++ b/libc/test/UnitTest/HermeticTestUtils.cpp
@@ -6,6 +6,8 @@
 //
 //===--===//
 
+#include "src/__support/common.h"
+#include "src/sys/auxv/getauxval.h"
 #include
 #include
 
@@ -19,6 +21,12 @@ void *memmove(void *dst, const void *src, size_t count);
 void *memset(void *ptr, int value, size_t count);
 int atexit(void (*func)(void));
 
+// TODO: It seems that some old test frameworks does not use
+// add_libc_hermetic_test properly. Such that they won't get correct linkage
+// against the object containing this function. We create a dummy function that
+// always returns 0 to indicate a failure.
+[[gnu::weak]] unsigned long getauxval(unsigned long id) { return 0; }
+
 } // namespace LIBC_NAMESPACE
 
 namespace {
@@ -102,6 +110,14 @@ void __cxa_pure_virtual() {
 // __dso_handle when -nostdlib is used.
 void *__dso_handle = nullptr;
 
+#ifdef LIBC_TARGET_ARCH_IS_AARCH64
+// Due to historical reasons,
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
https://github.com/jhuber6 commented:

We made a change recently that made the dynamic_hsa version the default. The error you're seeing is from an old HSA, so if you're overriding the default to use an old library that's probably not worth working around.

https://github.com/llvm/llvm-project/pull/95484
[llvm-branch-commits] [clang] [clang] Define ptrauth_sign_constant builtin. (PR #93904)
@@ -354,6 +354,23 @@ Given that ``signedPointer`` matches the layout for signed pointers signed with
 the given key, extract the raw pointer from it.  This operation does not trap
 and cannot fail, even if the pointer is not validly signed.
 
+``ptrauth_sign_constant``
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_sign_constant(pointer, key, discriminator)
+
+Return a signed pointer for a constant address in a manner which guarantees
+a non-attackable sequence.
+
+``pointer`` must be a constant expression of pointer type which evaluates to
+a non-null pointer.  The result will have the same type as ``discriminator``.
+
+Calls to this are constant expressions if the discriminator is a null-pointer
+constant expression or an integer constant expression. Implementations may
+allow other pointer expressions as well.

ahmedbougacha wrote:

Yeah, I agree today this could simply be "it's always a constant expression"; I'll rewrite it (cc @rjmccall if this looks like anything to you)

https://github.com/llvm/llvm-project/pull/93904
[llvm-branch-commits] [clang] [clang] Define ptrauth_sign_constant builtin. (PR #93904)
@@ -354,6 +354,23 @@ Given that ``signedPointer`` matches the layout for signed pointers signed with
 the given key, extract the raw pointer from it.  This operation does not trap
 and cannot fail, even if the pointer is not validly signed.
 
+``ptrauth_sign_constant``
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+  ptrauth_sign_constant(pointer, key, discriminator)
+
+Return a signed pointer for a constant address in a manner which guarantees
+a non-attackable sequence.

ahmedbougacha wrote:

Later additions to this document describe that in depth, you can look for
> [clang][docs] Document the ptrauth security model.

on my branch

https://github.com/llvm/llvm-project/pull/93904
[llvm-branch-commits] [clang] [clang] Define ptrauth_sign_constant builtin. (PR #93904)
@@ -58,6 +58,35 @@ void test_string_discriminator(const char *str) {
 }
 
 
+void test_sign_constant(int *dp, int (*fp)(int)) {
+  __builtin_ptrauth_sign_constant(, VALID_DATA_KEY); // expected-error {{too few arguments}}
+  __builtin_ptrauth_sign_constant(, VALID_DATA_KEY, , ); // expected-error {{too many arguments}}
+
+  __builtin_ptrauth_sign_constant(mismatched_type, VALID_DATA_KEY, 0); // expected-error {{signed value must have pointer type; type here is 'struct A'}}
+  __builtin_ptrauth_sign_constant(, mismatched_type, 0); // expected-error {{passing 'struct A' to parameter of incompatible type 'int'}}
+  __builtin_ptrauth_sign_constant(, VALID_DATA_KEY, mismatched_type); // expected-error {{extra discriminator must have pointer or integer type; type here is 'struct A'}}
+
+  (void) __builtin_ptrauth_sign_constant(NULL, VALID_DATA_KEY, ); // expected-error {{argument to ptrauth_sign_constant must refer to a global variable or function}}

ahmedbougacha wrote:

We could special-case null pointers, but they're already covered by the diagnostic, which asks for global variables or functions – which NULL isn't. For auth/sign, we don't have that sort of constraint on the pointer: it really is NULL and NULL alone that's special.

Now, the more interesting question is whether we should allow null pointers at all here. Since defining these original builtins we have taught the qualifier to have a mode that signs/authenticates null, for some specific use-cases where replacing a signed value with NULL (which is otherwise never signed or authenticated) would bypass signing in a problematic way. We haven't had the chance or need to revisit the builtins to allow sign/auth of NULL, but it's reasonable to add that support in the future. We'd have to consider how to expose that in the builtins, because it's probably still something that's almost always a mistake; more builtins would be an easy solution but maybe not a sophisticated one.

https://github.com/llvm/llvm-project/pull/93904
[llvm-branch-commits] [clang] [clang] Define ptrauth_sign_constant builtin. (PR #93904)
@@ -2061,6 +2071,58 @@ ConstantLValueEmitter::VisitCallExpr(const CallExpr *E) {
   }
 }
 
+ConstantLValue
+ConstantLValueEmitter::emitPointerAuthSignConstant(const CallExpr *E) {
+  llvm::Constant *UnsignedPointer = emitPointerAuthPointer(E->getArg(0));
+  unsigned Key = emitPointerAuthKey(E->getArg(1));
+  llvm::Constant *StorageAddress;
+  llvm::Constant *OtherDiscriminator;
+  std::tie(StorageAddress, OtherDiscriminator) =

ahmedbougacha wrote:

Yeah, this simply predates structured bindings; we can indeed use them now.

https://github.com/llvm/llvm-project/pull/93904
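For readers less familiar with the two spellings mentioned in this review comment, here is a minimal self-contained comparison. The helper `splitDiscriminator` and both wrapper functions are hypothetical; they only stand in for a call that returns a pair, and have nothing to do with the actual clang code.

```cpp
#include <tuple>
#include <utility>

// Hypothetical helper that returns two values at once, standing in for a
// call that yields a (storage address, discriminator) pair.
std::pair<int, int> splitDiscriminator(int Packed) {
  return {Packed / 100, Packed % 100};
}

// Pre-C++17 style: declare both variables first, then assign via std::tie.
int combineWithTie(int Packed) {
  int StorageAddress;
  int OtherDiscriminator;
  std::tie(StorageAddress, OtherDiscriminator) = splitDiscriminator(Packed);
  return StorageAddress * 1000 + OtherDiscriminator;
}

// C++17 style: a structured binding declares and initializes both names.
int combineWithBindings(int Packed) {
  auto [StorageAddress, OtherDiscriminator] = splitDiscriminator(Packed);
  return StorageAddress * 1000 + OtherDiscriminator;
}
```

Both functions compute the same result; the structured binding removes the two uninitialized declarations.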
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
illwieckz wrote:

@pranav-sivaraman try this patch:

```diff
diff --git a/openmp/libomptarget/plugins/amdgpu/CMakeLists.txt b/openmp/libomptarget/plugins/amdgpu/CMakeLists.txt
index 92523c23f68b..92bcd94edb7a 100644
--- a/openmp/libomptarget/plugins/amdgpu/CMakeLists.txt
+++ b/openmp/libomptarget/plugins/amdgpu/CMakeLists.txt
@@ -56,13 +56,14 @@ include_directories(
 set(LIBOMPTARGET_DLOPEN_LIBHSA OFF)
 option(LIBOMPTARGET_FORCE_DLOPEN_LIBHSA "Build with dlopened libhsa" ${LIBOMPTARGET_DLOPEN_LIBHSA})
 
+include_directories(dynamic_hsa)
+
 if (${hsa-runtime64_FOUND} AND NOT LIBOMPTARGET_FORCE_DLOPEN_LIBHSA)
   libomptarget_say("Building AMDGPU plugin linked against libhsa")
   set(LIBOMPTARGET_EXTRA_SOURCE)
   set(LIBOMPTARGET_DEP_LIBRARIES hsa-runtime64::hsa-runtime64)
 else()
   libomptarget_say("Building AMDGPU plugin for dlopened libhsa")
-  include_directories(dynamic_hsa)
   set(LIBOMPTARGET_EXTRA_SOURCE dynamic_hsa/hsa.cpp)
   set(LIBOMPTARGET_DEP_LIBRARIES)
 endif()
```

I haven't tested it, but maybe the mistake is similar.

https://github.com/llvm/llvm-project/pull/95484
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
illwieckz wrote:

The 14 branch seems to be very old. In particular, the file you link is in the `plugins/` directory, while the files I modify are in the `plugins-nextgen/` directory, and the `plugins/` directory does not exist anymore.

So I strongly doubt the patch is useful for LLVM 14, but your problem probably needs a different but similar solution.

https://github.com/llvm/llvm-project/pull/95484
[llvm-branch-commits] Using matched block counts to measure discrepancy (PR #95486)
llvmbot wrote:

@llvm/pr-subscribers-llvm-transforms

Author: shaw young (shawbyoung)

Changes

Test Plan: tbd

---
Full diff: https://github.com/llvm/llvm-project/pull/95486.diff

2 Files Affected:

- (modified) bolt/lib/Profile/StaleProfileMatching.cpp (+29-8)
- (modified) llvm/include/llvm/Transforms/Utils/SampleProfileInference.h (-2)

```diff
diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp b/bolt/lib/Profile/StaleProfileMatching.cpp
index 6588cf2c0ce66..cbd98f4d4769f 100644
--- a/bolt/lib/Profile/StaleProfileMatching.cpp
+++ b/bolt/lib/Profile/StaleProfileMatching.cpp
@@ -53,9 +53,9 @@ cl::opt
 
 cl::opt MatchedProfileThreshold(
     "matched-profile-threshold",
-    cl::desc("Percentage threshold of matched execution counts at which stale "
+    cl::desc("Percentage threshold of matched basic blocks at which stale "
              "profile inference is executed."),
-    cl::init(5), cl::Hidden, cl::cat(BoltOptCategory));
+    cl::init(50), cl::Hidden, cl::cat(BoltOptCategory));
 
 cl::opt StaleMatchingMaxFuncSize(
     "stale-matching-max-func-size",
@@ -186,6 +186,17 @@ struct BlendedBlockHash {
   uint8_t SuccHash{0};
 };
 
+/// A data object containing function matching information.
+struct FunctionMatchingData {
+public:
+  /// The number of blocks matched exactly.
+  uint64_t MatchedExactBlocks{0};
+  /// The number of blocks matched loosely.
+  uint64_t MatchedLooseBlocks{0};
+  /// The number of execution counts matched.
+  uint64_t MatchedExecCounts{0};
+};
+
 /// The object is used to identify and match basic blocks in a BinaryFunction
 /// given their hashes computed on a binary built from several revisions behind
 /// release.
@@ -400,7 +411,8 @@ createFlowFunction(const BinaryFunction::BasicBlockOrderType ) {
 void matchWeightsByHashes(BinaryContext ,
                           const BinaryFunction::BasicBlockOrderType ,
                           const yaml::bolt::BinaryFunctionProfile ,
-                          FlowFunction ) {
+                          FlowFunction ,
+                          FunctionMatchingData ) {
   assert(Func.Blocks.size() == BlockOrder.size() + 1);
 
   std::vector Blocks;
@@ -440,9 +452,11 @@ void matchWeightsByHashes(BinaryContext ,
     if (Matcher.isHighConfidenceMatch(BinHash, YamlHash)) {
       ++BC.Stats.NumMatchedBlocks;
       BC.Stats.MatchedSampleCount += YamlBB.ExecCount;
-      Func.MatchedExecCount += YamlBB.ExecCount;
+      FuncMatchingData.MatchedExecCounts += YamlBB.ExecCount;
+      FuncMatchingData.MatchedExactBlocks += 1;
       LLVM_DEBUG(dbgs() << " exact match\n");
     } else {
+      FuncMatchingData.MatchedLooseBlocks += 1;
       LLVM_DEBUG(dbgs() << " loose match\n");
     }
     if (YamlBB.NumInstructions == BB->size())
@@ -582,11 +596,14 @@ void preprocessUnreachableBlocks(FlowFunction ) {
 /// Decide if stale profile matching can be applied for a given function.
 /// Currently we skip inference for (very) large instances and for instances
 /// having "unexpected" control flow (e.g., having no sink basic blocks).
-bool canApplyInference(const FlowFunction , const yaml::bolt::BinaryFunctionProfile ) {
+bool canApplyInference(const FlowFunction ,
+                       const yaml::bolt::BinaryFunctionProfile ,
+                       const FunctionMatchingData ) {
   if (Func.Blocks.size() > opts::StaleMatchingMaxFuncSize)
     return false;
 
-  if (Func.MatchedExecCount / YamlBF.ExecCount >= opts::MatchedProfileThreshold)
+  if ((double)FuncMatchingData.MatchedExactBlocks / YamlBF.Blocks.size() >=
+      opts::MatchedProfileThreshold / 100.0)
     return false;
 
   bool HasExitBlocks = llvm::any_of(
@@ -735,18 +752,22 @@ bool YAMLProfileReader::inferStaleProfile(
   const BinaryFunction::BasicBlockOrderType BlockOrder(
       BF.getLayout().block_begin(), BF.getLayout().block_end());
 
+  // Create a containter for function matching data.
+  FunctionMatchingData FuncMatchingData;
+
   // Create a wrapper flow function to use with the profile inference algorithm.
   FlowFunction Func = createFlowFunction(BlockOrder);
 
   // Match as many block/jump counts from the stale profile as possible
-  matchWeightsByHashes(BF.getBinaryContext(), BlockOrder, YamlBF, Func);
+  matchWeightsByHashes(BF.getBinaryContext(), BlockOrder, YamlBF, Func,
+                       FuncMatchingData);
 
   // Adjust the flow function by marking unreachable blocks Unlikely so that
   // they don't get any counts assigned.
   preprocessUnreachableBlocks(Func);
 
   // Check if profile inference can be applied for the instance.
-  if (!canApplyInference(Func, YamlBF))
+  if (!canApplyInference(Func, YamlBF, FuncMatchingData))
     return false;
 
   // Apply the profile inference algorithm.
diff --git a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
index e7971ca1cb428..b4ea1ad840f9d 100644
---
```
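The threshold check changed in this diff moves from a quotient of two counts to a fraction compared against a scaled percentage. The sketch below contrasts the two shapes, assuming the counts involved are unsigned integers (the diff does not show the type of the removed `MatchedExecCount` field). The function names and parameters are hypothetical, not BOLT's.

```cpp
#include <cstdint>

// Integer-division shape, similar to the removed check. With integer
// operands and Matched <= Total, the quotient can only be 0 or 1, so it
// cannot reach a percentage such as 5. Total must be nonzero.
bool reachesThresholdIntegerRatio(uint64_t Matched, uint64_t Total,
                                  unsigned ThresholdPercent) {
  return Matched / Total >= ThresholdPercent;
}

// Floating-point shape, similar to the added check: the matched fraction is
// compared against the percentage scaled into [0, 1]. Total must be nonzero.
bool reachesThresholdFraction(uint64_t Matched, uint64_t Total,
                              unsigned ThresholdPercent) {
  return static_cast<double>(Matched) / Total >= ThresholdPercent / 100.0;
}
```

For example, 90 matches out of 100 with a threshold of 5 does not reach the threshold in the integer form, while the fractional form treats 90 out of 100 as above a threshold of 50.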
[llvm-branch-commits] Using matched block counts to measure discrepancy (PR #95486)
https://github.com/shawbyoung closed https://github.com/llvm/llvm-project/pull/95486
[llvm-branch-commits] Using matched block counts to measure discrepancy (PR #95486)
https://github.com/shawbyoung created https://github.com/llvm/llvm-project/pull/95486

Test Plan: tbd
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
pranav-sivaraman wrote:

This is different from this [file](https://github.com/llvm/llvm-project/blob/release/14.x/openmp/libomptarget/plugins/amdgpu/impl/hsa_api.h), right? I'm trying to fix an issue when building LLVM 14 with newer ROCm releases, where the build fails to find the newer `hsa/hsa.h` headers. Not sure if I need to extend the patch to include these changes as well.

https://github.com/llvm/llvm-project/pull/95484
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
illwieckz wrote:

I first noticed the issue when building the chipStar fork of LLVM 17: https://github.com/CHIP-SPV/llvm-project (branch `chipStar-llvm-17`). Since the code is the same in LLVM 18, it is expected to fail in LLVM 18 too.

The whole folder disappeared in `main`, so I made this patch to target the most recent release branch that still has those files: LLVM 18.

It would be good to backport it to LLVM 17 too.

I haven't yet checked whether versions older than LLVM 17 are affected.

https://github.com/llvm/llvm-project/pull/95484
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
llvmbot wrote:

@llvm/pr-subscribers-backend-amdgpu

Author: Thomas Debesse (illwieckz)

Changes

The `dynamic_hsa/` include directory is required by both optional `dynamic_hsa/hsa.cpp` and non-optional `src/rtl.cpp`. It should then always be included or the build will fail if only `src/rtl.cpp` is built.

This also simplifies the way header files from `dynamic_hsa/` are included in `src/rtl.cpp`.

Fixes:

```
error: ‘HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY’ was not declared in this scope
```

---
Full diff: https://github.com/llvm/llvm-project/pull/95484.diff

2 Files Affected:

- (modified) openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt (+3-1)
- (modified) openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp (-10)

```diff
diff --git a/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt b/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt
index 68ce63467a6c8..42cc560c79112 100644
--- a/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt
+++ b/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt
@@ -38,13 +38,15 @@ add_definitions(-DDEBUG_PREFIX="TARGET AMDGPU RTL")
 set(LIBOMPTARGET_DLOPEN_LIBHSA OFF)
 option(LIBOMPTARGET_FORCE_DLOPEN_LIBHSA "Build with dlopened libhsa" ${LIBOMPTARGET_DLOPEN_LIBHSA})
 
+# Required by both optional dynamic_hsa/hsa.cpp and non-optional src/rtl.cpp.
+include_directories(dynamic_hsa)
+
 if (${hsa-runtime64_FOUND} AND NOT LIBOMPTARGET_FORCE_DLOPEN_LIBHSA)
   libomptarget_say("Building AMDGPU NextGen plugin linked against libhsa")
   set(LIBOMPTARGET_EXTRA_SOURCE)
   set(LIBOMPTARGET_DEP_LIBRARIES hsa-runtime64::hsa-runtime64)
 else()
   libomptarget_say("Building AMDGPU NextGen plugin for dlopened libhsa")
-  include_directories(dynamic_hsa)
   set(LIBOMPTARGET_EXTRA_SOURCE dynamic_hsa/hsa.cpp)
   set(LIBOMPTARGET_DEP_LIBRARIES)
 endif()
diff --git a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
index 81634ae1edc49..8cedc72d5f63c 100644
--- a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -56,18 +56,8 @@
 #define BIGENDIAN_CPU
 #endif
 
-#if defined(__has_include)
-#if __has_include("hsa/hsa.h")
-#include "hsa/hsa.h"
-#include "hsa/hsa_ext_amd.h"
-#elif __has_include("hsa.h")
 #include "hsa.h"
 #include "hsa_ext_amd.h"
-#endif
-#else
-#include "hsa/hsa.h"
-#include "hsa/hsa_ext_amd.h"
-#endif
 
 namespace llvm {
 namespace omp {
```

https://github.com/llvm/llvm-project/pull/95484
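The `__has_include` chain removed from `src/rtl.cpp` picks the first header spelling that resolves on the include path. A minimal sketch of that probing mechanism follows, using stand-in headers: `<cstdint>` plays a header that is present, and `"hypothetical_missing/hsa.h"` plays one that is absent. `HEADER_CHOICE` and `chosenHeader` are illustrative names only.

```cpp
// Probe two header spellings in order, as the removed chain did.
#if defined(__has_include)
#if __has_include("hypothetical_missing/hsa.h")
#define HEADER_CHOICE 1 // first spelling resolved
#elif __has_include(<cstdint>)
#define HEADER_CHOICE 2 // fell through to the second spelling
#else
#define HEADER_CHOICE 0 // neither resolved, so nothing would be included
#endif
#else
#define HEADER_CHOICE -1 // no probing available; a spelling is picked blindly
#endif

// Report which branch the preprocessor took.
int chosenHeader() { return HEADER_CHOICE; }
```

Note that the inner chain in the removed code has no `#else` branch, so when neither spelling resolves it includes nothing and the failure only surfaces later as undeclared symbols. Adding the directory unconditionally and including `"hsa.h"` directly, as the patch does, turns that case into an immediate missing-header error.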
[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)
https://github.com/illwieckz created https://github.com/llvm/llvm-project/pull/95484

The `dynamic_hsa/` include directory is required by both optional `dynamic_hsa/hsa.cpp` and non-optional `src/rtl.cpp`. It should then always be included or the build will fail if only `src/rtl.cpp` is built.

This also simplifies the way header files from `dynamic_hsa/` are included in `src/rtl.cpp`.

Fixes:

```
error: ‘HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY’ was not declared in this scope
```

>From e84e8bdef6d902d51a72eb93f7ca9812f0467c72 Mon Sep 17 00:00:00 2001
From: Thomas Debesse
Date: Fri, 14 Jun 2024 00:38:25 +0200
Subject: [PATCH] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The dynamic_hsa/ include directory is required by both
optional dynamic_hsa/hsa.cpp and non-optional src/rtl.cpp.
It should then always be included or the build will fail
if only src/rtl.cpp is built.

This also simplifies the way header files from dynamic_hsa/
are included in src/rtl.cpp.

Fixes:

  error: ‘HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY’ was not declared in this scope
---
 .../libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt |  4 +++-
 openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp | 10 ----------
 2 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt b/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt
index 68ce63467a6c8..42cc560c79112 100644
--- a/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt
+++ b/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt
@@ -38,13 +38,15 @@ add_definitions(-DDEBUG_PREFIX="TARGET AMDGPU RTL")
 set(LIBOMPTARGET_DLOPEN_LIBHSA OFF)
 option(LIBOMPTARGET_FORCE_DLOPEN_LIBHSA "Build with dlopened libhsa" ${LIBOMPTARGET_DLOPEN_LIBHSA})
 
+# Required by both optional dynamic_hsa/hsa.cpp and non-optional src/rtl.cpp.
+include_directories(dynamic_hsa)
+
 if (${hsa-runtime64_FOUND} AND NOT LIBOMPTARGET_FORCE_DLOPEN_LIBHSA)
   libomptarget_say("Building AMDGPU NextGen plugin linked against libhsa")
   set(LIBOMPTARGET_EXTRA_SOURCE)
   set(LIBOMPTARGET_DEP_LIBRARIES hsa-runtime64::hsa-runtime64)
 else()
   libomptarget_say("Building AMDGPU NextGen plugin for dlopened libhsa")
-  include_directories(dynamic_hsa)
   set(LIBOMPTARGET_EXTRA_SOURCE dynamic_hsa/hsa.cpp)
   set(LIBOMPTARGET_DEP_LIBRARIES)
 endif()
diff --git a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
index 81634ae1edc49..8cedc72d5f63c 100644
--- a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -56,18 +56,8 @@
 #define BIGENDIAN_CPU
 #endif
 
-#if defined(__has_include)
-#if __has_include("hsa/hsa.h")
-#include "hsa/hsa.h"
-#include "hsa/hsa_ext_amd.h"
-#elif __has_include("hsa.h")
 #include "hsa.h"
 #include "hsa_ext_amd.h"
-#endif
-#else
-#include "hsa/hsa.h"
-#include "hsa/hsa_ext_amd.h"
-#endif
 
 namespace llvm {
 namespace omp {
[llvm-branch-commits] [llvm] [Support] Integrate SipHash.cpp into libSupport. (PR #94394)
https://github.com/ahmedbougacha updated https://github.com/llvm/llvm-project/pull/94394 >From 1e9a3fde97d907c3cd6be33db91d1c18c7236ffb Mon Sep 17 00:00:00 2001 From: Ahmed Bougacha Date: Tue, 4 Jun 2024 12:41:47 -0700 Subject: [PATCH 1/7] [Support] Reformat SipHash.cpp to match libSupport. While there, give it our usual file header and an acknowledgement, and remove the imported README.md.SipHash. --- llvm/lib/Support/README.md.SipHash | 126 -- llvm/lib/Support/SipHash.cpp | 264 ++--- 2 files changed, 129 insertions(+), 261 deletions(-) delete mode 100644 llvm/lib/Support/README.md.SipHash diff --git a/llvm/lib/Support/README.md.SipHash b/llvm/lib/Support/README.md.SipHash deleted file mode 100644 index 4de3cd1854681..0 --- a/llvm/lib/Support/README.md.SipHash +++ /dev/null @@ -1,126 +0,0 @@ -# SipHash - -[![License: -CC0-1.0](https://licensebuttons.net/l/zero/1.0/80x15.png)](http://creativecommons.org/publicdomain/zero/1.0/) - -[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) - - -SipHash is a family of pseudorandom functions (PRFs) optimized for speed on short messages. -This is the reference C code of SipHash: portable, simple, optimized for clarity and debugging. - -SipHash was designed in 2012 by [Jean-Philippe Aumasson](https://aumasson.jp) -and [Daniel J. Bernstein](https://cr.yp.to) as a defense against [hash-flooding -DoS attacks](https://aumasson.jp/siphash/siphashdos_29c3_slides.pdf). - -SipHash is: - -* *Simpler and faster* on short messages than previous cryptographic -algorithms, such as MACs based on universal hashing. - -* *Competitive in performance* with insecure non-cryptographic algorithms, such as [fhhash](https://github.com/cbreeden/fxhash). - -* *Cryptographically secure*, with no sign of weakness despite multiple [cryptanalysis](https://eprint.iacr.org/2019/865) [projects](https://eprint.iacr.org/2019/865) by leading cryptographers. 
- -* *Battle-tested*, with successful integration in OSs (Linux kernel, OpenBSD, -FreeBSD, FreeRTOS), languages (Perl, Python, Ruby, etc.), libraries (OpenSSL libcrypto, -Sodium, etc.) and applications (Wireguard, Redis, etc.). - -As a secure pseudorandom function (a.k.a. keyed hash function), SipHash can also be used as a secure message authentication code (MAC). -But SipHash is *not a hash* in the sense of general-purpose key-less hash function such as BLAKE3 or SHA-3. -SipHash should therefore always be used with a secret key in order to be secure. - - -## Variants - -The default SipHash is *SipHash-2-4*: it takes a 128-bit key, does 2 compression -rounds, 4 finalization rounds, and returns a 64-bit tag. - -Variants can use a different number of rounds. For example, we proposed *SipHash-4-8* as a conservative version. - -The following versions are not described in the paper but were designed and analyzed to fulfill applications' needs: - -* *SipHash-128* returns a 128-bit tag instead of 64-bit. Versions with specified number of rounds are SipHash-2-4-128, SipHash4-8-128, and so on. - -* *HalfSipHash* works with 32-bit words instead of 64-bit, takes a 64-bit key, -and returns 32-bit or 64-bit tags. For example, HalfSipHash-2-4-32 has 2 -compression rounds, 4 finalization rounds, and returns a 32-bit tag. - - -## Security - -(Half)SipHash-*c*-*d* with *c* ≥ 2 and *d* ≥ 4 is expected to provide the maximum PRF -security for any function with the same key and output size. - -The standard PRF security goal allow the attacker access to the output of SipHash on messages chosen adaptively by the attacker. - -Security is limited by the key size (128 bits for SipHash), such that -attackers searching 2*s* keys have chance 2*s*−128 of finding -the SipHash key. -Security is also limited by the output size. In particular, when -SipHash is used as a MAC, an attacker who blindly tries 2*s* tags will -succeed with probability 2*s*-*t*, if *t* is that tag's bit size. 
- - -## Research - -* [Research paper](https://www.aumasson.jp/siphash/siphash.pdf) "SipHash: a fast short-input PRF" (accepted at INDOCRYPT 2012) -* [Slides](https://cr.yp.to/talks/2012.12.12/slides.pdf) of the presentation of SipHash at INDOCRYPT 2012 (Bernstein) -* [Slides](https://www.aumasson.jp/siphash/siphash_slides.pdf) of the presentation of SipHash at the DIAC workshop (Aumasson) - - -## Usage - -Running - -```sh - make -``` - -will build tests for - -* SipHash-2-4-64 -* SipHash-2-4-128 -* HalfSipHash-2-4-32 -* HalfSipHash-2-4-64 - - -```C - ./test -``` - -verifies 64 test vectors, and - -```C - ./debug -``` - -does the same and prints intermediate values. - -The code can be adapted to implement SipHash-*c*-*d*, the version of SipHash -with *c* compression rounds and *d* finalization rounds, by defining `cROUNDS` -or `dROUNDS` when compiling. This can be done with `-D` command line arguments -to many compilers such as below. - -```sh -gcc -Wall
[llvm-branch-commits] [llvm] [Support] Integrate SipHash.cpp into libSupport. (PR #94394)
ahmedbougacha wrote: [37c84b9](https://github.com/llvm/llvm-project/pull/94394/commits/37c84b9dce70f40db8a7c27b7de8232c4d10f78f) shows what I had in mind, let me know what you all think. I added:
```
void getSipHash_2_4_64(const uint8_t *In, uint64_t InLen, const uint8_t (&K)[16], uint8_t (&Out)[8]);
void getSipHash_2_4_128(const uint8_t *In, uint64_t InLen, const uint8_t (&K)[16], uint8_t (&Out)[16]);
```
as the core interfaces, and mimicked the ref. test harness to reuse the same test vectors. If this seems reasonable to yall I'm happy to extract the vectors.h file from the ref. implementation into the "Import original sources" PR – that's why I kept it open ;) https://github.com/llvm/llvm-project/pull/94394
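For readers unfamiliar with what these entry points compute, here is a minimal, self-contained sketch of SipHash-2-4 with a 64-bit tag, following the public reference algorithm. It is not the code from this PR: `sipHash24`, `sipRound`, `loadLE64`, `rotl64`, and `matchesPaperVector` are hypothetical names, and the function returns the tag as an integer instead of writing into an output array. The self-check uses the test vector from the SipHash paper (key bytes 00..0f, message bytes 00..0e).

```cpp
#include <cstddef>
#include <cstdint>

// Rotate a 64-bit word left by B bits (0 < B < 64).
static inline uint64_t rotl64(uint64_t X, unsigned B) {
  return (X << B) | (X >> (64 - B));
}

// Read 8 bytes as a little-endian 64-bit word, independent of host endianness.
static inline uint64_t loadLE64(const uint8_t *P) {
  uint64_t V = 0;
  for (int I = 0; I < 8; ++I)
    V |= static_cast<uint64_t>(P[I]) << (8 * I);
  return V;
}

// One SipRound over the four-word internal state.
static inline void sipRound(uint64_t &V0, uint64_t &V1, uint64_t &V2,
                            uint64_t &V3) {
  V0 += V1; V1 = rotl64(V1, 13); V1 ^= V0; V0 = rotl64(V0, 32);
  V2 += V3; V3 = rotl64(V3, 16); V3 ^= V2;
  V0 += V3; V3 = rotl64(V3, 21); V3 ^= V0;
  V2 += V1; V1 = rotl64(V1, 17); V1 ^= V2; V2 = rotl64(V2, 32);
}

// Hypothetical helper: SipHash-2-4 (2 compression rounds, 4 finalization
// rounds) with a 128-bit key and a 64-bit result.
uint64_t sipHash24(const uint8_t *In, size_t InLen, const uint8_t (&Key)[16]) {
  const uint64_t K0 = loadLE64(Key), K1 = loadLE64(Key + 8);
  uint64_t V0 = 0x736f6d6570736575ULL ^ K0;
  uint64_t V1 = 0x646f72616e646f6dULL ^ K1;
  uint64_t V2 = 0x6c7967656e657261ULL ^ K0;
  uint64_t V3 = 0x7465646279746573ULL ^ K1;

  // Compress all full 8-byte words.
  const uint8_t *End = In + (InLen - InLen % 8);
  for (; In != End; In += 8) {
    const uint64_t M = loadLE64(In);
    V3 ^= M;
    sipRound(V0, V1, V2, V3);
    sipRound(V0, V1, V2, V3);
    V0 ^= M;
  }

  // Last word: remaining bytes plus the input length in the top byte.
  uint64_t B = static_cast<uint64_t>(InLen) << 56;
  for (size_t I = 0; I < InLen % 8; ++I)
    B |= static_cast<uint64_t>(End[I]) << (8 * I);
  V3 ^= B;
  sipRound(V0, V1, V2, V3);
  sipRound(V0, V1, V2, V3);
  V0 ^= B;

  // Finalization.
  V2 ^= 0xff;
  for (int I = 0; I < 4; ++I)
    sipRound(V0, V1, V2, V3);
  return V0 ^ V1 ^ V2 ^ V3;
}

// Checks the vector from the SipHash paper: key 00..0f, message 00..0e.
bool matchesPaperVector() {
  uint8_t Key[16];
  uint8_t Msg[15];
  for (unsigned I = 0; I < 16; ++I)
    Key[I] = static_cast<uint8_t>(I);
  for (unsigned I = 0; I < 15; ++I)
    Msg[I] = static_cast<uint8_t>(I);
  return sipHash24(Msg, sizeof(Msg), Key) == 0xa129ca6149be45e5ULL;
}
```

Reusing the reference vectors, as proposed above, amounts to running checks like `matchesPaperVector()` for every input length in the reference table.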
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
krzysz00 wrote: On the other hand, it's a lot easier to handle ugly types down in instruction selection, where you get to play much more fast and loose with types. And there are buffer uses that don't fit into the fat pointer use case where we'd still want them to work. For example, both `struct.ptr.buffer.load.v6f16` and `struct.ptr.buffer.load.v3f32` should be a `buffer_load_dwordx3`, but I'm pretty sure 6 x half isn't a register type. The load and store intrinsics are already overloaded to handle various {8, 16, ..., 128}-bit types, and it seems much cleaner to let them support any type of those lengths. It's just a load/store with somewhat weird indexing semantics, is all. And then, since we're there ... `load i256, ptr addrspace(1) %p` legalizes to multiple instructions, and `{raw,struct}.ptr.buffer.load(ptr addrspace(8) %p, i32 %offset, ...)` should too. It's just a load, after all. https://github.com/llvm/llvm-project/pull/95379
[llvm-branch-commits] [flang] [flang] Lower REDUCE intrinsic for reduction op with args by value (PR #95353)
@@ -5745,6 +5745,14 @@ IntrinsicLibrary::genReduce(mlir::Type resultType, int rank = arrayTmp.rank(); assert(rank >= 1); + // Arguements to the reduction operation are passed by reference or value? + bool argByRef = true; + if (auto embox = + mlir::dyn_cast_or_null(operation.getDefiningOp())) { clementval wrote: > Does REDUCE works with dummy procedure and procedure pointers? If so it would > be good to add tests for those cases to ensure the pattern matching here > works with them. I'll check if this is supported and add proper test if it is. https://github.com/llvm/llvm-project/pull/95353 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [lld] [llvm] release/18.x: [lld] Fix -ObjC load behavior with LTO (#92162) (PR #92478)
https://github.com/AtariDreams reopened https://github.com/llvm/llvm-project/pull/92478
[llvm-branch-commits] [llvm] release/18.x: [SystemZ] Bugfix in getDemandedSrcElements(). (#88623) (PR #95463)
llvmbot wrote: @uweigand What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/95463
[llvm-branch-commits] [llvm] release/18.x: [SystemZ] Bugfix in getDemandedSrcElements(). (#88623) (PR #95463)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/95463
[llvm-branch-commits] [llvm] release/18.x: [SystemZ] Bugfix in getDemandedSrcElements(). (#88623) (PR #95463)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/95463 Backport 7e4c6e98fa05f5c3bf14f96365ae74a8d12c6257 Requested by: @nikic >From 016c200faf4bcf1a531dabd4411a2ec4d0a23068 Mon Sep 17 00:00:00 2001 From: Jonas Paulsson Date: Mon, 15 Apr 2024 16:32:14 +0200 Subject: [PATCH] [SystemZ] Bugfix in getDemandedSrcElements(). (#88623) For the intrinsic s390_vperm, all of the elements are demanded, so use an APInt with the value of '-1' for them (not '1'). Fixes https://github.com/llvm/llvm-project/issues/88397 (cherry picked from commit 7e4c6e98fa05f5c3bf14f96365ae74a8d12c6257) --- .../Target/SystemZ/SystemZISelLowering.cpp| 2 +- .../SystemZ/knownbits-intrinsics-binop.ll | 19 +++ 2 files changed, 20 insertions(+), 1 deletion(-) diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp index 5e0b0594b0a42..3a297238c2088 100644 --- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp +++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp @@ -7774,7 +7774,7 @@ static APInt getDemandedSrcElements(SDValue Op, const APInt , break; } case Intrinsic::s390_vperm: - SrcDemE = APInt(NumElts, 1); + SrcDemE = APInt(NumElts, -1); break; default: llvm_unreachable("Unhandled intrinsic."); diff --git a/llvm/test/CodeGen/SystemZ/knownbits-intrinsics-binop.ll b/llvm/test/CodeGen/SystemZ/knownbits-intrinsics-binop.ll index 3bcbbb45581f9..b855d01934782 100644 --- a/llvm/test/CodeGen/SystemZ/knownbits-intrinsics-binop.ll +++ b/llvm/test/CodeGen/SystemZ/knownbits-intrinsics-binop.ll @@ -458,3 +458,22 @@ define <16 x i8> @f30() { i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1> ret <16 x i8> %res } + +; Test VPERM with various constant operands. 
+define i32 @f31() { +; CHECK-LABEL: f31: +; CHECK-LABEL: # %bb.0: +; CHECK-NEXT: larl %r1, .LCPI31_0 +; CHECK-NEXT: vl %v0, 0(%r1), 3 +; CHECK-NEXT: larl %r1, .LCPI31_1 +; CHECK-NEXT: vl %v1, 0(%r1), 3 +; CHECK-NEXT: vperm %v0, %v1, %v1, %v0 +; CHECK-NEXT: vlgvb %r2, %v0, 0 +; CHECK-NEXT: nilf %r2, 7 +; CHECK-NEXT: # kill: def $r2l killed $r2l killed $r2d +; CHECK-NEXT: br %r14 + %P = tail call <16 x i8> @llvm.s390.vperm(<16 x i8> , <16 x i8> , <16 x i8> ) + %E = extractelement <16 x i8> %P, i64 0 + %res = zext i8 %E to i32 + ret i32 %res +} ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
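The one-character fix is easy to misread, so here is a small sketch of why the two constants differ. It models an element mask with a plain `uint64_t` rather than `llvm::APInt`; `makeMask`, `countDemanded`, `demandedWithOne`, and `demandedWithMinusOne` are hypothetical helpers, and the truncate-to-width behavior is an assumption about how the value is interpreted for masks of up to 64 elements.

```cpp
#include <cstdint>

// Keep only the low NumElts bits of Val (NumElts in [1, 64]).
uint64_t makeMask(unsigned NumElts, uint64_t Val) {
  return NumElts >= 64 ? Val : Val & ((uint64_t{1} << NumElts) - 1);
}

// Count the set bits, i.e. the number of elements marked as demanded.
unsigned countDemanded(uint64_t Mask) {
  unsigned N = 0;
  for (; Mask; Mask &= Mask - 1)
    ++N;
  return N;
}

// Mask built from the value 1: only bit 0 is set.
unsigned demandedWithOne(unsigned NumElts) {
  return countDemanded(makeMask(NumElts, 1));
}

// Mask built from the value -1: every bit within the width is set.
unsigned demandedWithMinusOne(unsigned NumElts) {
  return countDemanded(makeMask(NumElts, static_cast<uint64_t>(-1)));
}
```

For a 16-element vector, the value 1 marks only element 0 as demanded, while -1 marks all 16, which is what the commit message says `s390_vperm` needs.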
[llvm-branch-commits] [llvm] Bump version to 18.1.8 (PR #95458)
llvmbot wrote: @llvm/pr-subscribers-testing-tools Author: Tom Stellard (tstellar) Changes --- Full diff: https://github.com/llvm/llvm-project/pull/95458.diff 2 Files Affected: - (modified) llvm/CMakeLists.txt (+1-1) - (modified) llvm/utils/lit/lit/__init__.py (+1-1)
```diff
diff --git a/llvm/CMakeLists.txt b/llvm/CMakeLists.txt
index 51278943847aa..909a965cd86c8 100644
--- a/llvm/CMakeLists.txt
+++ b/llvm/CMakeLists.txt
@@ -22,7 +22,7 @@ if(NOT DEFINED LLVM_VERSION_MINOR)
   set(LLVM_VERSION_MINOR 1)
 endif()
 if(NOT DEFINED LLVM_VERSION_PATCH)
-  set(LLVM_VERSION_PATCH 7)
+  set(LLVM_VERSION_PATCH 8)
 endif()
 if(NOT DEFINED LLVM_VERSION_SUFFIX)
   set(LLVM_VERSION_SUFFIX)
diff --git a/llvm/utils/lit/lit/__init__.py b/llvm/utils/lit/lit/__init__.py
index 5003d78ce5218..800d59492d8ff 100644
--- a/llvm/utils/lit/lit/__init__.py
+++ b/llvm/utils/lit/lit/__init__.py
@@ -2,7 +2,7 @@
 
 __author__ = "Daniel Dunbar"
 __email__ = "dan...@minormatter.com"
-__versioninfo__ = (18, 1, 7)
+__versioninfo__ = (18, 1, 8)
 __version__ = ".".join(str(v) for v in __versioninfo__) + "dev"
 
 __all__ = []
```
https://github.com/llvm/llvm-project/pull/95458
[llvm-branch-commits] [llvm] Bump version to 18.1.8 (PR #95458)
https://github.com/tstellar created https://github.com/llvm/llvm-project/pull/95458 None
>From 2edf6218b7e74cc76035e4e1efa8166b1c22312d Mon Sep 17 00:00:00 2001
From: Tom Stellard
Date: Thu, 13 Jun 2024 12:33:39 -0700
Subject: [PATCH] Bump version to 18.1.8
---
 llvm/CMakeLists.txt            | 2 +-
 llvm/utils/lit/lit/__init__.py | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/llvm/CMakeLists.txt b/llvm/CMakeLists.txt
index 51278943847aa..909a965cd86c8 100644
--- a/llvm/CMakeLists.txt
+++ b/llvm/CMakeLists.txt
@@ -22,7 +22,7 @@ if(NOT DEFINED LLVM_VERSION_MINOR)
   set(LLVM_VERSION_MINOR 1)
 endif()
 if(NOT DEFINED LLVM_VERSION_PATCH)
-  set(LLVM_VERSION_PATCH 7)
+  set(LLVM_VERSION_PATCH 8)
 endif()
 if(NOT DEFINED LLVM_VERSION_SUFFIX)
   set(LLVM_VERSION_SUFFIX)
diff --git a/llvm/utils/lit/lit/__init__.py b/llvm/utils/lit/lit/__init__.py
index 5003d78ce5218..800d59492d8ff 100644
--- a/llvm/utils/lit/lit/__init__.py
+++ b/llvm/utils/lit/lit/__init__.py
@@ -2,7 +2,7 @@
 
 __author__ = "Daniel Dunbar"
 __email__ = "dan...@minormatter.com"
-__versioninfo__ = (18, 1, 7)
+__versioninfo__ = (18, 1, 8)
 __version__ = ".".join(str(v) for v in __versioninfo__) + "dev"
 
 __all__ = []
[llvm-branch-commits] [compiler-rt] [TySan] Fixed false positive when accessing offset member variables (PR #95387)
https://github.com/gbMattN converted_to_draft https://github.com/llvm/llvm-project/pull/95387
[llvm-branch-commits] [llvm] [Support] Integrate SipHash.cpp into libSupport. (PR #94394)
kbeyls wrote: > Yes, this doesn't have tests by itself because there's no exposed interface. > It's certainly trivial to add one (which would allow using the reference test > vectors). > > I don't have strong arguments either way, but I figured the conservative > option is to force hypothetical users to consider their use more seriously. > One might argue that's not how we usually treat libSupport, so I'm happy to > expose the raw function here. I see some value in being able to test with the reference test vectors to be fully sure that the implementation really implements SipHash. But as I said above, I'm happy with merging this as is. https://github.com/llvm/llvm-project/pull/94394 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [compiler-rt] 7fe862d - Revert "[hwasan] Add fixed_shadow_base flag (#73980)"
Author: Florian Mayer Date: 2024-06-13T09:55:29-07:00 New Revision: 7fe862d0a1f6dfa67c236f5af32ad15546797404 URL: https://github.com/llvm/llvm-project/commit/7fe862d0a1f6dfa67c236f5af32ad15546797404 DIFF: https://github.com/llvm/llvm-project/commit/7fe862d0a1f6dfa67c236f5af32ad15546797404.diff LOG: Revert "[hwasan] Add fixed_shadow_base flag (#73980)" This reverts commit ea991a11b2a3d2bfa545adbefb71cd17e8970a43. Added: Modified: compiler-rt/lib/hwasan/hwasan_flags.inc compiler-rt/lib/hwasan/hwasan_linux.cpp Removed: compiler-rt/test/hwasan/TestCases/Linux/fixed-shadow.c diff --git a/compiler-rt/lib/hwasan/hwasan_flags.inc b/compiler-rt/lib/hwasan/hwasan_flags.inc index 058a0457b9e7f..978fa46b705cb 100644 --- a/compiler-rt/lib/hwasan/hwasan_flags.inc +++ b/compiler-rt/lib/hwasan/hwasan_flags.inc @@ -84,10 +84,3 @@ HWASAN_FLAG(bool, malloc_bisect_dump, false, // are untagged before the call. HWASAN_FLAG(bool, fail_without_syscall_abi, true, "Exit if fail to request relaxed syscall ABI.") - -HWASAN_FLAG( -uptr, fixed_shadow_base, -1, -"If not -1, HWASan will attempt to allocate the shadow at this address, " -"instead of choosing one dynamically." 
-"Tip: this can be combined with the compiler option, " -"-hwasan-mapping-offset, to optimize the instrumentation.") diff --git a/compiler-rt/lib/hwasan/hwasan_linux.cpp b/compiler-rt/lib/hwasan/hwasan_linux.cpp index e6aa60b324fa7..c254670ee2d48 100644 --- a/compiler-rt/lib/hwasan/hwasan_linux.cpp +++ b/compiler-rt/lib/hwasan/hwasan_linux.cpp @@ -106,12 +106,8 @@ static uptr GetHighMemEnd() { } static void InitializeShadowBaseAddress(uptr shadow_size_bytes) { - if (flags()->fixed_shadow_base != (uptr)-1) { -__hwasan_shadow_memory_dynamic_address = flags()->fixed_shadow_base; - } else { -__hwasan_shadow_memory_dynamic_address = -FindDynamicShadowStart(shadow_size_bytes); - } + __hwasan_shadow_memory_dynamic_address = + FindDynamicShadowStart(shadow_size_bytes); } static void MaybeDieIfNoTaggingAbi(const char *message) { diff --git a/compiler-rt/test/hwasan/TestCases/Linux/fixed-shadow.c b/compiler-rt/test/hwasan/TestCases/Linux/fixed-shadow.c deleted file mode 100644 index 4ff1d3e64c1d0..0 --- a/compiler-rt/test/hwasan/TestCases/Linux/fixed-shadow.c +++ /dev/null @@ -1,76 +0,0 @@ -// Test fixed shadow base functionality. -// -// Default compiler instrumentation works with any shadow base (dynamic or fixed). -// RUN: %clang_hwasan %s -o %t && %run %t -// RUN: %clang_hwasan %s -o %t && HWASAN_OPTIONS=fixed_shadow_base=263878495698944 %run %t -// RUN: %clang_hwasan %s -o %t && HWASAN_OPTIONS=fixed_shadow_base=4398046511104 %run %t -// -// If -hwasan-mapping-offset is set, then the fixed_shadow_base needs to match. 
-// RUN: %clang_hwasan %s -mllvm -hwasan-mapping-offset=263878495698944 -o %t && HWASAN_OPTIONS=fixed_shadow_base=263878495698944 %run %t -// RUN: %clang_hwasan %s -mllvm -hwasan-mapping-offset=4398046511104 -o %t && HWASAN_OPTIONS=fixed_shadow_base=4398046511104 %run %t -// RUN: %clang_hwasan %s -mllvm -hwasan-mapping-offset=263878495698944 -o %t && HWASAN_OPTIONS=fixed_shadow_base=4398046511104 not %run %t -// RUN: %clang_hwasan %s -mllvm -hwasan-mapping-offset=4398046511104 -o %t && HWASAN_OPTIONS=fixed_shadow_base=263878495698944 not %run %t -// -// Note: if fixed_shadow_base is not set, compiler-rt will dynamically choose a -// shadow base, which has a tiny but non-zero probability of matching the -// compiler instrumentation. To avoid test flake, we do not test this case. -// -// Assume 48-bit VMA -// REQUIRES: aarch64-target-arch -// -// REQUIRES: Clang -// -// UNSUPPORTED: android - -#include -#include -#include -#include -#include -#include - -int main() { - __hwasan_enable_allocator_tagging(); - - // We test that the compiler instrumentation is able to access shadow memory - // for many different addresses. If we only test a small number of addresses, - // it might work by chance even if the shadow base does not match between the - // compiler instrumentation and compiler-rt. - void **mmaps[256]; - // 48-bit VMA - for (int i = 0; i < 256; i++) { -unsigned long long addr = (i * (1ULL << 40)); - -void *p = mmap((void *)addr, 4096, PROT_READ | PROT_WRITE, - MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); -// We don't use MAP_FIXED, to avoid overwriting critical memory. -// However, if we don't get allocated the requested address, it -// isn't a useful test. -if ((unsigned long long)p != addr) { - munmap(p, 4096); - mmaps[i] = MAP_FAILED; -} else { - mmaps[i] = p; -} - } - - int failures = 0; - for (int i = 0; i < 256; i++) { -if (mmaps[i] == MAP_FAILED) { - failures++; -} else { - printf("%d %p\n", i, mmaps[i]); - munmap(mmaps[i], 4096); -} - } - -
[llvm-branch-commits] [flang] [flang] Lower REDUCE intrinsic for reduction op with args by value (PR #95353)
https://github.com/jeanPerier approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/95353
[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)
@@ -592,10 +599,15 @@ void preprocessUnreachableBlocks(FlowFunction ) { /// Decide if stale profile matching can be applied for a given function. /// Currently we skip inference for (very) large instances and for instances /// having "unexpected" control flow (e.g., having no sink basic blocks). -bool canApplyInference(const FlowFunction ) { +bool canApplyInference(const FlowFunction , + const yaml::bolt::BinaryFunctionProfile ) { if (Func.Blocks.size() > opts::StaleMatchingMaxFuncSize) return false; + if ((double)Func.MatchedExecCount / YamlBF.ExecCount >= + opts::MatchedProfileThreshold / 100.0) +return false; shawbyoung wrote: I’m leaning towards the block count heuristic now. I think the 1M and 4x1K exec count block case is likely pretty common – I imagine functions with loops would look a lot like this. Having some blocks matched exactly would suggest to me that there would likely be a reasonable amount of similarity between the profiled function and existing function relationally, which block coldness likely doesn’t have an outsized bearing on. I think having a reasonably high threshold for matched blocks would conservatively allow us to drop functions in high discrepancy – I’ll test this on a production binary. https://github.com/llvm/llvm-project/pull/95156 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
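To make the comparison concrete, here is a small sketch of the two ratios under discussion. It assumes the "1M and 4x1K" case means a function with one block executed about 1,000,000 times and four blocks executed about 1,000 times each, and that only the four colder blocks were matched. `BlockInfo`, `matchedExecRatio`, `matchedBlockRatio`, and `hotLoopExample` are hypothetical names, not BOLT APIs.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-block record: profile execution count and whether the
// block was matched between the stale profile and the current binary.
struct BlockInfo {
  uint64_t ExecCount;
  bool Matched;
};

// Fraction of the total execution count that sits in matched blocks.
double matchedExecRatio(const std::vector<BlockInfo> &Blocks) {
  uint64_t Total = 0, Matched = 0;
  for (const BlockInfo &B : Blocks) {
    Total += B.ExecCount;
    if (B.Matched)
      Matched += B.ExecCount;
  }
  return Total == 0 ? 0.0 : static_cast<double>(Matched) / Total;
}

// Fraction of blocks that were matched, ignoring how hot each block is.
double matchedBlockRatio(const std::vector<BlockInfo> &Blocks) {
  if (Blocks.empty())
    return 0.0;
  uint64_t Matched = 0;
  for (const BlockInfo &B : Blocks)
    if (B.Matched)
      ++Matched;
  return static_cast<double>(Matched) / Blocks.size();
}

// Assumed example: one hot unmatched block and four colder matched blocks.
std::vector<BlockInfo> hotLoopExample() {
  return {{1000000, false}, {1000, true}, {1000, true}, {1000, true},
          {1000, true}};
}
```

In this example the execution-count ratio is about 0.4% while the block-count ratio is 80%, which illustrates how a single hot unmatched block can dominate the first metric.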
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
krzysz00 wrote: Yeah, makes sense. ... what prevents a match-bitwidth operator from existing? Context from where I'm standing is that you should be able to `raw.buffer.load/store` any (non-aggregate, let's say, since that could be better handled in `addrspace(7)` handling) type you could `load` or `store`. That is, `raw.ptr.buffer.load.i15` should work (as an i16 load that truncates) as should `raw.ptr.buffer.store.v8f32` (or `raw.ptr.buffer.store.i256`). Sure, the latter are two instructions long, but regular loads can regularize to multiple instructions just fine. My thoughts on how to implement that second behavior were to split the type into legal chunks and add in the offsets, and then merge/bitcast the values back. https://github.com/llvm/llvm-project/pull/95379 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
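The splitting idea in the last paragraph can be sketched as follows. This is only an illustration of the arithmetic, not LLVM legalization code: `Chunk`, `splitIntoChunks`, and the list of legal widths (128, 96, 64, 32, 16, and 8 bits) are assumptions made for the example, and any leftover bits below one byte are ignored, so a caller would first round the width up to a multiple of 8.

```cpp
#include <vector>

// Hypothetical description of one piece of a wide access.
struct Chunk {
  unsigned OffsetBytes; // byte offset added to the original offset
  unsigned WidthBits;   // width of this piece
};

// Greedily cover TotalBits with the widest assumed-legal pieces first.
std::vector<Chunk> splitIntoChunks(unsigned TotalBits) {
  static const unsigned LegalWidths[] = {128, 96, 64, 32, 16, 8};
  std::vector<Chunk> Chunks;
  unsigned OffsetBytes = 0;
  while (TotalBits >= 8) {
    for (unsigned W : LegalWidths) {
      if (W <= TotalBits) {
        Chunks.push_back({OffsetBytes, W});
        OffsetBytes += W / 8;
        TotalBits -= W;
        break;
      }
    }
  }
  return Chunks;
}
```

For a 256-bit value this yields two 128-bit chunks at byte offsets 0 and 16, in line with the "two instructions long" remark; merging the loaded pieces back into one value is left out.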
[llvm-branch-commits] [flang] [flang] Lower REDUCE intrinsic for reduction op with args by value (PR #95353)
@@ -5745,6 +5745,14 @@ IntrinsicLibrary::genReduce(mlir::Type resultType, int rank = arrayTmp.rank(); assert(rank >= 1); + // Arguements to the reduction operation are passed by reference or value? + bool argByRef = true; + if (auto embox = + mlir::dyn_cast_or_null(operation.getDefiningOp())) { jeanPerier wrote: Does REDUCE works with dummy procedure and procedure pointers? If so it would be good to add tests for those cases to ensure the pattern matching here works with them. https://github.com/llvm/llvm-project/pull/95353 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
arsenm wrote: I don't think we should be trying to handle the unreasonable illegal types in the intrinsics themselves. Theoretically the intrinsic should correspond to direct support. We would handle the ugly types in the fat pointer lowering in terms of the intrinsics. https://github.com/llvm/llvm-project/pull/95379
[llvm-branch-commits] [compiler-rt] [TySan] Fixed false positive when accessing offset member variables (PR #95387)
gbMattN wrote: This may be a side effect of a different bug tracking global variables. I think fixing that bug first, and then applying this change if the problem persists is a better idea. Because of this, I'm switching this to a draft for now. Discourse link is https://discourse.llvm.org/t/reviving-typesanitizer-a-sanitizer-to-catch-type-based-aliasing-violations/66092/23 https://github.com/llvm/llvm-project/pull/95387
[llvm-branch-commits] [libc] 93e7f14 - Revert "[libc] fix aarch64 linux full build (#95358)"
Author: Schrodinger ZHU Yifan
Date: 2024-06-13T07:54:57-07:00
New Revision: 93e7f145bc38c7c47d797e652d891695eb44fcfa

URL: https://github.com/llvm/llvm-project/commit/93e7f145bc38c7c47d797e652d891695eb44fcfa
DIFF: https://github.com/llvm/llvm-project/commit/93e7f145bc38c7c47d797e652d891695eb44fcfa.diff

LOG: Revert "[libc] fix aarch64 linux full build (#95358)"

This reverts commit ca05204f9aa258c5324d5675c7987c7e570168a0.

Added:

Modified:
    libc/config/linux/aarch64/entrypoints.txt
    libc/src/__support/threads/linux/CMakeLists.txt
    libc/test/IntegrationTest/test.cpp

Removed:

diff --git a/libc/config/linux/aarch64/entrypoints.txt b/libc/config/linux/aarch64/entrypoints.txt
index 7ce088689b925..db96a80051a8d 100644
--- a/libc/config/linux/aarch64/entrypoints.txt
+++ b/libc/config/linux/aarch64/entrypoints.txt
@@ -643,12 +643,6 @@ if(LLVM_LIBC_FULL_BUILD)
     libc.src.pthread.pthread_mutexattr_setrobust
     libc.src.pthread.pthread_mutexattr_settype
     libc.src.pthread.pthread_once
-    libc.src.pthread.pthread_rwlockattr_destroy
-    libc.src.pthread.pthread_rwlockattr_getkind_np
-    libc.src.pthread.pthread_rwlockattr_getpshared
-    libc.src.pthread.pthread_rwlockattr_init
-    libc.src.pthread.pthread_rwlockattr_setkind_np
-    libc.src.pthread.pthread_rwlockattr_setpshared
     libc.src.pthread.pthread_setspecific
 
     # sched.h entrypoints
@@ -759,7 +753,6 @@ if(LLVM_LIBC_FULL_BUILD)
     libc.src.unistd._exit
     libc.src.unistd.environ
     libc.src.unistd.execv
-    libc.src.unistd.fork
     libc.src.unistd.getopt
     libc.src.unistd.optarg
     libc.src.unistd.optind
diff --git a/libc/src/__support/threads/linux/CMakeLists.txt b/libc/src/__support/threads/linux/CMakeLists.txt
index 8e6cd7227b2c8..9bf88ccc84557 100644
--- a/libc/src/__support/threads/linux/CMakeLists.txt
+++ b/libc/src/__support/threads/linux/CMakeLists.txt
@@ -64,7 +64,6 @@ add_object_library(
     .futex_utils
     libc.config.linux.app_h
     libc.include.sys_syscall
-    libc.include.fcntl
     libc.src.errno.errno
     libc.src.__support.CPP.atomic
     libc.src.__support.CPP.stringstream
diff --git a/libc/test/IntegrationTest/test.cpp b/libc/test/IntegrationTest/test.cpp
index 27e7f29efa0f1..3bdbe89a3fb62 100644
--- a/libc/test/IntegrationTest/test.cpp
+++ b/libc/test/IntegrationTest/test.cpp
@@ -79,10 +79,4 @@ void *realloc(void *ptr, size_t s) {
 // Integration tests are linked with -nostdlib. BFD linker expects
 // __dso_handle when -nostdlib is used.
 void *__dso_handle = nullptr;
-
-// On some platform (aarch64 fedora tested) full build integration test
-// objects need to link against libgcc, which may expect a __getauxval
-// function. For now, it is fine to provide a weak definition that always
-// returns false.
-[[gnu::weak]] bool __getauxval(uint64_t, uint64_t *) { return false; }
 } // extern "C"
[llvm-branch-commits] [libc] 91323a6 - Revert "Revert "[libc] fix aarch64 linux full build (#95358)" (#95419)"
Author: Schrodinger ZHU Yifan
Date: 2024-06-13T08:38:05-07:00
New Revision: 91323a6ea8f32a9fe2cec7051e8a99b87157133e

URL: https://github.com/llvm/llvm-project/commit/91323a6ea8f32a9fe2cec7051e8a99b87157133e
DIFF: https://github.com/llvm/llvm-project/commit/91323a6ea8f32a9fe2cec7051e8a99b87157133e.diff

LOG: Revert "Revert "[libc] fix aarch64 linux full build (#95358)" (#95419)"

This reverts commit 9e5428e6b02c77fb18c4bdf688a216c957fd7a53.

Added:

Modified:
    libc/config/linux/aarch64/entrypoints.txt
    libc/src/__support/threads/linux/CMakeLists.txt
    libc/test/IntegrationTest/test.cpp

Removed:

diff --git a/libc/config/linux/aarch64/entrypoints.txt b/libc/config/linux/aarch64/entrypoints.txt
index db96a80051a8d..7ce088689b925 100644
--- a/libc/config/linux/aarch64/entrypoints.txt
+++ b/libc/config/linux/aarch64/entrypoints.txt
@@ -643,6 +643,12 @@ if(LLVM_LIBC_FULL_BUILD)
     libc.src.pthread.pthread_mutexattr_setrobust
     libc.src.pthread.pthread_mutexattr_settype
     libc.src.pthread.pthread_once
+    libc.src.pthread.pthread_rwlockattr_destroy
+    libc.src.pthread.pthread_rwlockattr_getkind_np
+    libc.src.pthread.pthread_rwlockattr_getpshared
+    libc.src.pthread.pthread_rwlockattr_init
+    libc.src.pthread.pthread_rwlockattr_setkind_np
+    libc.src.pthread.pthread_rwlockattr_setpshared
     libc.src.pthread.pthread_setspecific
 
     # sched.h entrypoints
@@ -753,6 +759,7 @@ if(LLVM_LIBC_FULL_BUILD)
     libc.src.unistd._exit
     libc.src.unistd.environ
     libc.src.unistd.execv
+    libc.src.unistd.fork
     libc.src.unistd.getopt
     libc.src.unistd.optarg
     libc.src.unistd.optind
diff --git a/libc/src/__support/threads/linux/CMakeLists.txt b/libc/src/__support/threads/linux/CMakeLists.txt
index 9bf88ccc84557..8e6cd7227b2c8 100644
--- a/libc/src/__support/threads/linux/CMakeLists.txt
+++ b/libc/src/__support/threads/linux/CMakeLists.txt
@@ -64,6 +64,7 @@ add_object_library(
     .futex_utils
     libc.config.linux.app_h
     libc.include.sys_syscall
+    libc.include.fcntl
     libc.src.errno.errno
     libc.src.__support.CPP.atomic
     libc.src.__support.CPP.stringstream
diff --git a/libc/test/IntegrationTest/test.cpp b/libc/test/IntegrationTest/test.cpp
index 3bdbe89a3fb62..27e7f29efa0f1 100644
--- a/libc/test/IntegrationTest/test.cpp
+++ b/libc/test/IntegrationTest/test.cpp
@@ -79,4 +79,10 @@ void *realloc(void *ptr, size_t s) {
 // Integration tests are linked with -nostdlib. BFD linker expects
 // __dso_handle when -nostdlib is used.
 void *__dso_handle = nullptr;
+
+// On some platform (aarch64 fedora tested) full build integration test
+// objects need to link against libgcc, which may expect a __getauxval
+// function. For now, it is fine to provide a weak definition that always
+// returns false.
+[[gnu::weak]] bool __getauxval(uint64_t, uint64_t *) { return false; }
 } // extern "C"
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
arsenm wrote: That's what we've traditionally done and I think we should stop. We currently skip inserting the casts if the type is legal. Casting introduces extra bitcasts, which have a cost and increase pattern match complexity. We have a bunch of patterns that don't bother to look through the casts for a load/store. https://github.com/llvm/llvm-project/pull/95379
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
krzysz00 wrote: So, general question on this patch series: Wouldn't it be more reasonable to, instead of having separate handling for all the possible register types, always do loads as `i8`, `i16`, `i32`, `<2 x i32>`, `<3 x i32>`, or `<4 x i32>` and then `bitcast`/`merge_values`/... the results back to their type? Or at least to have that fallback path: if we don't know what a type is, load/store it as its bits? (Then we wouldn't need to, for example, go back and add a `<16 x i8>` case if someone realizes they want that.) https://github.com/llvm/llvm-project/pull/95379
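The "load/store it as its bits" fallback suggested above can be sketched outside of LLVM in plain C++. This is only an illustration of the idea under stated assumptions, not SelectionDAG or GlobalISel code: `bitsToValue` and `valueToBits` are hypothetical helper names, and the 32-bit word array stands in for whatever an `i32` / `<N x i32>` load would return.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper (not an LLVM API): reinterpret the N raw 32-bit words
// that a generic "<N x i32>" load would return as a value of type T.
template <typename T, std::size_t N>
T bitsToValue(const std::uint32_t (&words)[N]) {
  static_assert(sizeof(T) == N * sizeof(std::uint32_t),
                "T must occupy exactly N 32-bit words");
  static_assert(std::is_trivially_copyable<T>::value,
                "only plain bit patterns can be reinterpreted");
  T value;
  std::memcpy(&value, words, sizeof(T)); // plays the role of the bitcast
  return value;
}

// Inverse direction, as a store would need: split a value into 32-bit words.
template <typename T, std::size_t N>
void valueToBits(const T &value, std::uint32_t (&words)[N]) {
  static_assert(sizeof(T) == N * sizeof(std::uint32_t),
                "T must occupy exactly N 32-bit words");
  static_assert(std::is_trivially_copyable<T>::value,
                "only plain bit patterns can be split into words");
  std::memcpy(words, &value, sizeof(T));
}
```

With helpers like these, supporting a new type needs no per-type case, only a size that is a multiple of 32 bits; the trade-off raised in the reply is that every such cast is an extra operation that later pattern matching has to look through.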
[llvm-branch-commits] [clang] [llvm] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins (PR #95395)
@@ -117,13 +117,44 @@ void test_update_dpp(global int* out, int arg1, int arg2)
 }
 
 // CHECK-LABEL: @test_ds_fadd
-// CHECK: {{.*}}call{{.*}} float @llvm.amdgcn.ds.fadd.f32(ptr addrspace(3) %out, float %src, i32 0, i32 0, i1 false)
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src monotonic, align 4{{$}}
+// CHECK: atomicrmw volatile fadd ptr addrspace(3) %out, float %src monotonic, align 4{{$}}
+
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src acquire, align 4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src acquire, align 4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src release, align 4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src acq_rel, align 4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src seq_cst, align 4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src seq_cst, align 4{{$}}
+
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src syncscope("agent") monotonic, align 4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src syncscope("workgroup") monotonic, align 4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src syncscope("wavefront") monotonic, align 4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src syncscope("singlethread") monotonic, align 4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src monotonic, align 4{{$}}
 #if !defined(__SPIRV__)
 void test_ds_faddf(local float *out, float src) {
 #else
-void test_ds_faddf(__attribute__((address_space(3))) float *out, float src) {
+  void test_ds_faddf(__attribute__((address_space(3))) float *out, float src) {
 #endif
+  *out = __builtin_amdgcn_ds_faddf(out, src, 0, 0, false);
+  *out = __builtin_amdgcn_ds_faddf(out, src, 0, 0, true);
+
+  // Test all orders.
+  *out = __builtin_amdgcn_ds_faddf(out, src, 1, 0, false);

yxsamliu wrote:

better use predefined macros

```
  // Define macros for the C11 / C++11 memory orderings
  Builder.defineMacro("__ATOMIC_RELAXED", "0");
  Builder.defineMacro("__ATOMIC_CONSUME", "1");
  Builder.defineMacro("__ATOMIC_ACQUIRE", "2");
  Builder.defineMacro("__ATOMIC_RELEASE", "3");
  Builder.defineMacro("__ATOMIC_ACQ_REL", "4");
  Builder.defineMacro("__ATOMIC_SEQ_CST", "5");

  // Define macros for the clang atomic scopes.
  Builder.defineMacro("__MEMORY_SCOPE_SYSTEM", "0");
  Builder.defineMacro("__MEMORY_SCOPE_DEVICE", "1");
  Builder.defineMacro("__MEMORY_SCOPE_WRKGRP", "2");
  Builder.defineMacro("__MEMORY_SCOPE_WVFRNT", "3");
  Builder.defineMacro("__MEMORY_SCOPE_SINGLE", "4");
```

https://github.com/llvm/llvm-project/pull/95395
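To illustrate the suggestion to use the predefined macros instead of bare integers, here is a small host-side C++ sketch. It assumes a GCC- or Clang-compatible compiler, which predefines the `__ATOMIC_*` ordering macros with the values quoted above; the `__MEMORY_SCOPE_*` macros are specific to newer Clang releases and are deliberately left out. `fetchAddWithOrder` is a hypothetical wrapper around the generic `__atomic_fetch_add` builtin, not the AMDGPU `__builtin_amdgcn_ds_faddf` builtin from the test.

```cpp
// The ordering macros carry the same values as the bare integers in the test.
static_assert(__ATOMIC_RELAXED == 0, "relaxed ordering is 0");
static_assert(__ATOMIC_CONSUME == 1, "consume ordering is 1");
static_assert(__ATOMIC_ACQUIRE == 2, "acquire ordering is 2");
static_assert(__ATOMIC_RELEASE == 3, "release ordering is 3");
static_assert(__ATOMIC_ACQ_REL == 4, "acq_rel ordering is 4");
static_assert(__ATOMIC_SEQ_CST == 5, "seq_cst ordering is 5");

// Hypothetical wrapper: atomically add `value` to `*ptr` with the given
// ordering and return the previous contents of `*ptr`.
inline int fetchAddWithOrder(int *ptr, int value, int order) {
  return __atomic_fetch_add(ptr, value, order);
}
```

A call such as `fetchAddWithOrder(&counter, 1, __ATOMIC_ACQUIRE)` states its intent at the call site, whereas the equivalent call with a literal `2` requires the reader to remember the numbering.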
[llvm-branch-commits] [compiler-rt] [TySan] Fixed false positive when accessing offset member variables (PR #95387)
gbMattN wrote: @fhahn https://github.com/llvm/llvm-project/pull/95387
[llvm-branch-commits] [clang] [llvm] AMDGPU: Remove ds atomic fadd intrinsics (PR #95396)
@@ -2331,40 +2337,74 @@ static Value *upgradeARMIntrinsicCall(StringRef Name, CallBase *CI, Function *F,
   llvm_unreachable("Unknown function for ARM CallBase upgrade.");
 }
+// These are expected to have have the arguments:

cdevadas wrote:

```suggestion
// These are expected to have the arguments:
```

https://github.com/llvm/llvm-project/pull/95396
[llvm-branch-commits] [compiler-rt] [TySan] Fixed false positive when accessing offset member variables (PR #95387)
https://github.com/gbMattN updated https://github.com/llvm/llvm-project/pull/95387

>From 432f994b1bc21e4db0778fff9cc1425f788f8168 Mon Sep 17 00:00:00 2001
From: Matthew Nagy
Date: Thu, 13 Jun 2024 09:54:04 +
Subject: [PATCH] [TySan] Fixed false positive when accessing offset member variables

---
 compiler-rt/lib/tysan/tysan.cpp         | 12 +-
 compiler-rt/test/tysan/struct-members.c | 31 +
 2 files changed, 42 insertions(+), 1 deletion(-)
 create mode 100644 compiler-rt/test/tysan/struct-members.c

diff --git a/compiler-rt/lib/tysan/tysan.cpp b/compiler-rt/lib/tysan/tysan.cpp
index f627851d049e6..747727e48a152 100644
--- a/compiler-rt/lib/tysan/tysan.cpp
+++ b/compiler-rt/lib/tysan/tysan.cpp
@@ -221,7 +221,17 @@ __tysan_check(void *addr, int size, tysan_type_descriptor *td, int flags) {
         OldTDPtr -= i;
         OldTD = *OldTDPtr;
 
-        if (!isAliasingLegal(td, OldTD))
+        tysan_type_descriptor *InternalMember = OldTD;
+        if (OldTD->Tag == TYSAN_STRUCT_TD) {
+          for (int j = 0; j < OldTD->Struct.MemberCount; j++) {
+            if (OldTD->Struct.Members[j].Offset == i) {
+              InternalMember = OldTD->Struct.Members[j].Type;
+              break;
+            }
+          }
+        }
+
+        if (!isAliasingLegal(td, InternalMember))
           reportError(addr, size, td, OldTD, AccessStr,
                       "accesses part of an existing object", -i, pc, bp,
                       sp);
diff --git a/compiler-rt/test/tysan/struct-members.c b/compiler-rt/test/tysan/struct-members.c
new file mode 100644
index 0..76ea3c431dd7b
--- /dev/null
+++ b/compiler-rt/test/tysan/struct-members.c
@@ -0,0 +1,31 @@
+// RUN: %clang_tysan -O0 %s -o %t && %run %t >%t.out 2>&1
+// RUN: FileCheck %s < %t.out
+
+#include
+
+struct X {
+  int a, b, c;
+} x;
+
+static struct X xArray[2];
+
+int main() {
+  x.a = 1;
+  x.b = 2;
+  x.c = 3;
+
+  printf("%d %d %d\n", x.a, x.b, x.c);
+  // CHECK-NOT: ERROR: TypeSanitizer: type-aliasing-violation
+
+  for (size_t i = 0; i < 2; i++) {
+    xArray[i].a = 1;
+    xArray[i].b = 1;
+    xArray[i].c = 1;
+  }
+
+  struct X *xPtr = (struct X *)&(xArray[0].c);
+  xPtr->a = 1;
+  // CHECK: ERROR: TypeSanitizer: type-aliasing-violation
+  // CHECK: WRITE of size 4 at {{.*}} with type int (in X at offset 0) accesses an existing object of type int (in X at offset 8)
+  // CHECK: {{#0 0x.* in main .*struct-members.c:}}[[@LINE-3]]
+}
[llvm-branch-commits] [llvm] AMDGPU: Cleanup selection patterns for buffer loads (PR #95378)
@@ -1421,27 +1421,21 @@ let OtherPredicates = [HasPackedD16VMem] in {
 defm : MUBUF_LoadIntrinsicPat;
 } // End HasPackedD16VMem.
 
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
+foreach vt = Reg32Types.types in {
+defm : MUBUF_LoadIntrinsicPat;
+}

arsenm wrote:

I'm not a big fan of omitting the braces, especially in tablegen. If we're going to delete the braces, the lines should at least be indented.

https://github.com/llvm/llvm-project/pull/95378
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/95379 >From 14695322d92821374dd6599d8f0f76d212e50169 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 12 Jun 2024 10:10:20 +0200 Subject: [PATCH] AMDGPU: Fix buffer load/store of pointers Make sure we test all the address spaces since this support isn't free in gisel. --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 31 +- .../AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll | 596 ++ .../llvm.amdgcn.raw.ptr.buffer.store.ll | 456 ++ 3 files changed, 1071 insertions(+), 12 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 81098201e9c0f..7a36c88b892c8 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -1112,29 +1112,33 @@ unsigned SITargetLowering::getVectorTypeBreakdownForCallingConv( Context, CC, VT, IntermediateVT, NumIntermediates, RegisterVT); } -static EVT memVTFromLoadIntrData(Type *Ty, unsigned MaxNumLanes) { +static EVT memVTFromLoadIntrData(const SITargetLowering , + const DataLayout , Type *Ty, + unsigned MaxNumLanes) { assert(MaxNumLanes != 0); + LLVMContext = Ty->getContext(); if (auto *VT = dyn_cast(Ty)) { unsigned NumElts = std::min(MaxNumLanes, VT->getNumElements()); -return EVT::getVectorVT(Ty->getContext(), -EVT::getEVT(VT->getElementType()), +return EVT::getVectorVT(Ctx, TLI.getValueType(DL, VT->getElementType()), NumElts); } - return EVT::getEVT(Ty); + return TLI.getValueType(DL, Ty); } // Peek through TFE struct returns to only use the data size. -static EVT memVTFromLoadIntrReturn(Type *Ty, unsigned MaxNumLanes) { +static EVT memVTFromLoadIntrReturn(const SITargetLowering , + const DataLayout , Type *Ty, + unsigned MaxNumLanes) { auto *ST = dyn_cast(Ty); if (!ST) -return memVTFromLoadIntrData(Ty, MaxNumLanes); +return memVTFromLoadIntrData(TLI, DL, Ty, MaxNumLanes); // TFE intrinsics return an aggregate type. 
assert(ST->getNumContainedTypes() == 2 && ST->getContainedType(1)->isIntegerTy(32)); - return memVTFromLoadIntrData(ST->getContainedType(0), MaxNumLanes); + return memVTFromLoadIntrData(TLI, DL, ST->getContainedType(0), MaxNumLanes); } /// Map address space 7 to MVT::v5i32 because that's its in-memory @@ -1219,10 +1223,12 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo , MaxNumLanes = DMask == 0 ? 1 : llvm::popcount(DMask); } -Info.memVT = memVTFromLoadIntrReturn(CI.getType(), MaxNumLanes); +Info.memVT = memVTFromLoadIntrReturn(*this, MF.getDataLayout(), + CI.getType(), MaxNumLanes); } else { -Info.memVT = memVTFromLoadIntrReturn( -CI.getType(), std::numeric_limits::max()); +Info.memVT = +memVTFromLoadIntrReturn(*this, MF.getDataLayout(), CI.getType(), +std::numeric_limits::max()); } // FIXME: What does alignment mean for an image? @@ -1235,9 +1241,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo , if (RsrcIntr->IsImage) { unsigned DMask = cast(CI.getArgOperand(1))->getZExtValue(); unsigned DMaskLanes = DMask == 0 ? 
1 : llvm::popcount(DMask); -Info.memVT = memVTFromLoadIntrData(DataTy, DMaskLanes); +Info.memVT = memVTFromLoadIntrData(*this, MF.getDataLayout(), DataTy, + DMaskLanes); } else -Info.memVT = EVT::getEVT(DataTy); +Info.memVT = getValueType(MF.getDataLayout(), DataTy); Info.flags |= MachineMemOperand::MOStore; } else { diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll index 3e3371091ef72..4d557c76dc4d0 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll @@ -1280,6 +1280,602 @@ define <2 x i64> @buffer_load_v2i64__voffset_add(ptr addrspace(8) inreg %rsrc, i ret <2 x i64> %data } +define ptr @buffer_load_p0__voffset_add(ptr addrspace(8) inreg %rsrc, i32 %voffset) { +; PREGFX10-LABEL: buffer_load_p0__voffset_add: +; PREGFX10: ; %bb.0: +; PREGFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; PREGFX10-NEXT:buffer_load_dwordx2 v[0:1], v0, s[4:7], 0 offen offset:60 +; PREGFX10-NEXT:s_waitcnt vmcnt(0) +; PREGFX10-NEXT:s_setpc_b64 s[30:31] +; +; GFX10-LABEL: buffer_load_p0__voffset_add: +; GFX10: ; %bb.0: +; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX10-NEXT:buffer_load_dwordx2 v[0:1], v0, s[4:7], 0 offen offset:60 +;
[llvm-branch-commits] [llvm] AMDGPU: Cleanup selection patterns for buffer loads (PR #95378)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/95378 >From 1dfcc0961e82bbe656faded0c38e694da0d76c9b Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sun, 9 Jun 2024 23:12:31 +0200 Subject: [PATCH] AMDGPU: Cleanup selection patterns for buffer loads We should just support these for all register types. --- llvm/lib/Target/AMDGPU/BUFInstructions.td | 72 ++- llvm/lib/Target/AMDGPU/SIRegisterInfo.td | 16 ++--- 2 files changed, 39 insertions(+), 49 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/BUFInstructions.td b/llvm/lib/Target/AMDGPU/BUFInstructions.td index 50e62788c5eac..978d261f5a662 100644 --- a/llvm/lib/Target/AMDGPU/BUFInstructions.td +++ b/llvm/lib/Target/AMDGPU/BUFInstructions.td @@ -1421,27 +1421,21 @@ let OtherPredicates = [HasPackedD16VMem] in { defm : MUBUF_LoadIntrinsicPat; } // End HasPackedD16VMem. -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; +foreach vt = Reg32Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg64Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg96Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg128Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} defm : MUBUF_LoadIntrinsicPat; defm : MUBUF_LoadIntrinsicPat; @@ -1532,27 +1526,21 @@ let OtherPredicates = [HasPackedD16VMem] in { defm : MUBUF_StoreIntrinsicPat; } // End 
HasPackedD16VMem. -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; +foreach vt = Reg32Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg64Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg96Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg128Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} defm : MUBUF_StoreIntrinsicPat; defm : MUBUF_StoreIntrinsicPat; diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td index caac7126068ef..a8efe2b2ba35e 100644 --- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td +++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td @@ -586,7 +586,9 @@ class RegisterTypes reg_types> { def Reg16Types : RegisterTypes<[i16, f16, bf16]>; def Reg32Types : RegisterTypes<[i32, f32, v2i16, v2f16, v2bf16, p2, p3, p5, p6]>; -def Reg64Types : RegisterTypes<[i64, f64, v2i32, v2f32, p0]>; +def Reg64Types : RegisterTypes<[i64, f64, v2i32, v2f32, p0, v4i16, v4f16, v4bf16]>; +def Reg96Types : RegisterTypes<[v3i32, v3f32]>; +def Reg128Types : RegisterTypes<[v4i32, v4f32, v2i64, v2f64, v8i16, v8f16, v8bf16]>; let HasVGPR = 1 in { // VOP3 and VINTERP can access 256 lo and 256 hi registers. 
@@ -744,7 +746,7 @@ def Pseudo_SReg_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, let BaseClassOrder = 1; } -def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU", [v4i32, v2i64, v2f64, v8i16, v8f16, v8bf16], 32, +def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU", Reg128Types.types, 32, (add PRIVATE_RSRC_REG)> { let isAllocatable = 0; let CopyCost = -1; @@ -815,7 +817,7 @@ def SRegOrLds_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, v let HasSGPR = 1; } -def SGPR_64 : SIRegisterClass<"AMDGPU", [v2i32, i64, v2f32, f64, v4i16, v4f16, v4bf16], 32, +def SGPR_64 : SIRegisterClass<"AMDGPU", Reg64Types.types, 32, (add SGPR_64Regs)> { let CopyCost = 1; let AllocationPriority = 1; @@ -905,8 +907,8 @@ multiclass SRegClass; -defm "" : SRegClass<4, [v4i32, v4f32, v2i64, v2f64, v8i16, v8f16, v8bf16], SGPR_128Regs, TTMP_128Regs>; +defm "" : SRegClass<3, Reg96Types.types, SGPR_96Regs, TTMP_96Regs>; +defm "" : SRegClass<4, Reg128Types.types, SGPR_128Regs, TTMP_128Regs>; defm "" : SRegClass<5,
[llvm-branch-commits] [clang] 0e8c9bc - Revert "[clang][NFC] Add a test for CWG2685 (#95206)"
Author: Younan Zhang
Date: 2024-06-13T18:53:46+08:00
New Revision: 0e8c9bca863137f14aea2cee0e05d4270b33e0e8

URL: https://github.com/llvm/llvm-project/commit/0e8c9bca863137f14aea2cee0e05d4270b33e0e8
DIFF: https://github.com/llvm/llvm-project/commit/0e8c9bca863137f14aea2cee0e05d4270b33e0e8.diff

LOG: Revert "[clang][NFC] Add a test for CWG2685 (#95206)"

This reverts commit 3475116e2c37a2c8a69658b36c02871c322da008.

Added:

Modified:
    clang/test/CXX/drs/cwg26xx.cpp
    clang/www/cxx_dr_status.html

Removed:

diff --git a/clang/test/CXX/drs/cwg26xx.cpp b/clang/test/CXX/drs/cwg26xx.cpp
index fee3ef16850bf..2b17c8101438d 100644
--- a/clang/test/CXX/drs/cwg26xx.cpp
+++ b/clang/test/CXX/drs/cwg26xx.cpp
@@ -225,15 +225,6 @@ void m() {
 }
 
 #if __cplusplus >= 202302L
-
-namespace cwg2685 { // cwg2685: 17
-template
-struct A {
-  T ar[4];
-};
-A a = { "foo" };
-}
-
 namespace cwg2687 { // cwg2687: 18
 struct S{
   void f(int);
diff --git a/clang/www/cxx_dr_status.html b/clang/www/cxx_dr_status.html
index 8c79708f23abd..5e2ab06701703 100755
--- a/clang/www/cxx_dr_status.html
+++ b/clang/www/cxx_dr_status.html
@@ -15918,7 +15918,7 @@ C++ defect report implementation status
 https://cplusplus.github.io/CWG/issues/2685.html;>2685
 C++23
 Aggregate CTAD, string, and brace elision
-Clang 17
+Unknown
 https://cplusplus.github.io/CWG/issues/2686.html;>2686
[llvm-branch-commits] [compiler-rt] [TySan] Fixed false positive when accessing offset member variables (PR #95387)
llvmbot wrote:

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: None (gbMattN)

Changes

This patch fixes a bug in the current TySan implementation. Currently, if you access a member variable other than the first, TySan reports an error, because it believes you are accessing the struct type itself at an offset equal to the offset of that member variable. With this patch, the type being accessed is amended to the type of the member variable at the accessed offset. This happens if and only if there is a member at that offset, so incorrect accesses are still caught. This is checked in the struct-members.c test.

---
Full diff: https://github.com/llvm/llvm-project/pull/95387.diff

2 Files Affected:

- (modified) compiler-rt/lib/tysan/tysan.cpp (+11-1)
- (added) compiler-rt/test/tysan/struct-members.c (+32)

```diff
diff --git a/compiler-rt/lib/tysan/tysan.cpp b/compiler-rt/lib/tysan/tysan.cpp
index f627851d049e6..747727e48a152 100644
--- a/compiler-rt/lib/tysan/tysan.cpp
+++ b/compiler-rt/lib/tysan/tysan.cpp
@@ -221,7 +221,17 @@ __tysan_check(void *addr, int size, tysan_type_descriptor *td, int flags) {
         OldTDPtr -= i;
         OldTD = *OldTDPtr;
 
-        if (!isAliasingLegal(td, OldTD))
+        tysan_type_descriptor *InternalMember = OldTD;
+        if (OldTD->Tag == TYSAN_STRUCT_TD) {
+          for (int j = 0; j < OldTD->Struct.MemberCount; j++) {
+            if (OldTD->Struct.Members[j].Offset == i) {
+              InternalMember = OldTD->Struct.Members[j].Type;
+              break;
+            }
+          }
+        }
+
+        if (!isAliasingLegal(td, InternalMember))
           reportError(addr, size, td, OldTD, AccessStr,
                       "accesses part of an existing object", -i, pc, bp,
                       sp);
diff --git a/compiler-rt/test/tysan/struct-members.c b/compiler-rt/test/tysan/struct-members.c
new file mode 100644
index 0..8cf6499f78ce6
--- /dev/null
+++ b/compiler-rt/test/tysan/struct-members.c
@@ -0,0 +1,32 @@
+// RUN: %clang_tysan -O0 %s -o %t && %run %t >%t.out 2>&1
+// RUN: FileCheck %s < %t.out
+
+#include
+
+struct X {
+  int a, b, c;
+} x;
+
+static struct X xArray[2];
+
+int main() {
+  x.a = 1;
+  x.b = 2;
+  x.c = 3;
+
+  printf("%d %d %d\n", x.a, x.b, x.c);
+  // CHECK-NOT: ERROR: TypeSanitizer: type-aliasing-violation
+
+  for (size_t i = 0; i < 2; i++) {
+    xArray[i].a = 1;
+    xArray[i].b = 1;
+    xArray[i].c = 1;
+  }
+  printf("Here\n");
+
+  struct X *xPtr = (struct X *)&(xArray[0].c);
+  xPtr->a = 1;
+  // CHECK: ERROR: TypeSanitizer: type-aliasing-violation
+  // CHECK: WRITE of size 4 at {{.*}} with type int (in X at offset 0) accesses an existing object of type int (in X at offset 8)
+  // CHECK: {{#0 0x.* in main .*struct-members.c:}}[[@LINE-3]]
+}
```

https://github.com/llvm/llvm-project/pull/95387
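The member lookup described in the patch summary above can be sketched in isolation. The types below are simplified, hypothetical stand-ins and not the real `tysan_type_descriptor` layout; `resolveAccessedType` is likewise an illustrative name. The sketch only shows the rule "use the member's type if a member starts exactly at the accessed offset, otherwise keep the enclosing struct type".

```cpp
#include <cstddef>
#include <vector>

struct TypeDesc; // forward declaration so Member can point at it

// Hypothetical, simplified member record: byte offset plus member type.
struct Member {
  std::size_t offset;
  const TypeDesc *type;
};

// Hypothetical, simplified type descriptor; scalars have no members.
struct TypeDesc {
  const char *name;
  std::vector<Member> members;
};

// Return the type of the member that starts exactly at `offset` inside `td`,
// or `td` itself when no member starts there (so the stricter whole-struct
// aliasing check still applies to misaligned or scalar accesses).
inline const TypeDesc *resolveAccessedType(const TypeDesc *td,
                                           std::size_t offset) {
  for (const Member &m : td->members)
    if (m.offset == offset)
      return m.type;
  return td;
}
```

For a struct like `X` from the test, with three `int` members at offsets 0, 4 and 8, an access at offset 8 resolves to `int`, while an access at offset 2 resolves to no member and falls back to the struct type, which is what keeps genuinely incorrect accesses detectable.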
[llvm-branch-commits] [compiler-rt] [TySan] Fixed false positive when accessing offset member variables (PR #95387)
github-actions[bot] wrote: Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using `@` followed by their GitHub username. If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the [LLVM GitHub User Guide](https://llvm.org/docs/GitHub.html). You can also ask questions in a comment on this PR, on the [LLVM Discord](https://discord.com/invite/xS7Z362) or on the [forums](https://discourse.llvm.org/). https://github.com/llvm/llvm-project/pull/95387 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer intrinsic store of bfloat (PR #95377)
https://github.com/jayfoad approved this pull request. https://github.com/llvm/llvm-project/pull/95377
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer intrinsic store of bfloat (PR #95377)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/95377
[llvm-branch-commits] [llvm] AMDGPU: Cleanup selection patterns for buffer loads (PR #95378)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Matt Arsenault (arsenm) Changes We should just support these for all register types. --- Full diff: https://github.com/llvm/llvm-project/pull/95378.diff 2 Files Affected: - (modified) llvm/lib/Target/AMDGPU/BUFInstructions.td (+30-42) - (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.td (+9-7) ``diff diff --git a/llvm/lib/Target/AMDGPU/BUFInstructions.td b/llvm/lib/Target/AMDGPU/BUFInstructions.td index 94dd45f1333b0..2f52edb7f917a 100644 --- a/llvm/lib/Target/AMDGPU/BUFInstructions.td +++ b/llvm/lib/Target/AMDGPU/BUFInstructions.td @@ -1421,27 +1421,21 @@ let OtherPredicates = [HasPackedD16VMem] in { defm : MUBUF_LoadIntrinsicPat; } // End HasPackedD16VMem. -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; +foreach vt = Reg32Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg64Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg96Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg128Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} defm : MUBUF_LoadIntrinsicPat; defm : MUBUF_LoadIntrinsicPat; @@ -1532,27 +1526,21 @@ let OtherPredicates = [HasPackedD16VMem] in { defm : MUBUF_StoreIntrinsicPat; } // End HasPackedD16VMem. 
-defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; +foreach vt = Reg32Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg64Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg96Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg128Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} defm : MUBUF_StoreIntrinsicPat; defm : MUBUF_StoreIntrinsicPat; diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td index caac7126068ef..a8efe2b2ba35e 100644 --- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td +++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td @@ -586,7 +586,9 @@ class RegisterTypes reg_types> { def Reg16Types : RegisterTypes<[i16, f16, bf16]>; def Reg32Types : RegisterTypes<[i32, f32, v2i16, v2f16, v2bf16, p2, p3, p5, p6]>; -def Reg64Types : RegisterTypes<[i64, f64, v2i32, v2f32, p0]>; +def Reg64Types : RegisterTypes<[i64, f64, v2i32, v2f32, p0, v4i16, v4f16, v4bf16]>; +def Reg96Types : RegisterTypes<[v3i32, v3f32]>; +def Reg128Types : RegisterTypes<[v4i32, v4f32, v2i64, v2f64, v8i16, v8f16, v8bf16]>; let HasVGPR = 1 in { // VOP3 and VINTERP can access 256 lo and 256 hi registers. 
@@ -744,7 +746,7 @@ def Pseudo_SReg_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, let BaseClassOrder = 1; } -def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU", [v4i32, v2i64, v2f64, v8i16, v8f16, v8bf16], 32, +def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU", Reg128Types.types, 32, (add PRIVATE_RSRC_REG)> { let isAllocatable = 0; let CopyCost = -1; @@ -815,7 +817,7 @@ def SRegOrLds_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, v let HasSGPR = 1; } -def SGPR_64 : SIRegisterClass<"AMDGPU", [v2i32, i64, v2f32, f64, v4i16, v4f16, v4bf16], 32, +def SGPR_64 : SIRegisterClass<"AMDGPU", Reg64Types.types, 32, (add SGPR_64Regs)> { let CopyCost = 1; let AllocationPriority = 1; @@ -905,8 +907,8 @@ multiclass SRegClass; -defm "" : SRegClass<4, [v4i32, v4f32, v2i64, v2f64, v8i16, v8f16, v8bf16], SGPR_128Regs, TTMP_128Regs>; +defm "" : SRegClass<3, Reg96Types.types, SGPR_96Regs, TTMP_96Regs>; +defm "" : SRegClass<4, Reg128Types.types, SGPR_128Regs, TTMP_128Regs>; defm "" : SRegClass<5, [v5i32, v5f32], SGPR_160Regs, TTMP_160Regs>; defm "" : SRegClass<6, [v6i32, v6f32, v3i64, v3f64], SGPR_192Regs, TTMP_192Regs>; defm "" :
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Matt Arsenault (arsenm) Changes Make sure we test all the address spaces since this support isn't free in gisel. --- Patch is 38.37 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/95379.diff 3 Files Affected: - (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+19-12) - (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll (+596) - (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.ll (+144) ``diff diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 81098201e9c0f..7a36c88b892c8 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -1112,29 +1112,33 @@ unsigned SITargetLowering::getVectorTypeBreakdownForCallingConv( Context, CC, VT, IntermediateVT, NumIntermediates, RegisterVT); } -static EVT memVTFromLoadIntrData(Type *Ty, unsigned MaxNumLanes) { +static EVT memVTFromLoadIntrData(const SITargetLowering , + const DataLayout , Type *Ty, + unsigned MaxNumLanes) { assert(MaxNumLanes != 0); + LLVMContext = Ty->getContext(); if (auto *VT = dyn_cast(Ty)) { unsigned NumElts = std::min(MaxNumLanes, VT->getNumElements()); -return EVT::getVectorVT(Ty->getContext(), -EVT::getEVT(VT->getElementType()), +return EVT::getVectorVT(Ctx, TLI.getValueType(DL, VT->getElementType()), NumElts); } - return EVT::getEVT(Ty); + return TLI.getValueType(DL, Ty); } // Peek through TFE struct returns to only use the data size. -static EVT memVTFromLoadIntrReturn(Type *Ty, unsigned MaxNumLanes) { +static EVT memVTFromLoadIntrReturn(const SITargetLowering , + const DataLayout , Type *Ty, + unsigned MaxNumLanes) { auto *ST = dyn_cast(Ty); if (!ST) -return memVTFromLoadIntrData(Ty, MaxNumLanes); +return memVTFromLoadIntrData(TLI, DL, Ty, MaxNumLanes); // TFE intrinsics return an aggregate type. 
assert(ST->getNumContainedTypes() == 2 && ST->getContainedType(1)->isIntegerTy(32)); - return memVTFromLoadIntrData(ST->getContainedType(0), MaxNumLanes); + return memVTFromLoadIntrData(TLI, DL, ST->getContainedType(0), MaxNumLanes); } /// Map address space 7 to MVT::v5i32 because that's its in-memory @@ -1219,10 +1223,12 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo , MaxNumLanes = DMask == 0 ? 1 : llvm::popcount(DMask); } -Info.memVT = memVTFromLoadIntrReturn(CI.getType(), MaxNumLanes); +Info.memVT = memVTFromLoadIntrReturn(*this, MF.getDataLayout(), + CI.getType(), MaxNumLanes); } else { -Info.memVT = memVTFromLoadIntrReturn( -CI.getType(), std::numeric_limits::max()); +Info.memVT = +memVTFromLoadIntrReturn(*this, MF.getDataLayout(), CI.getType(), +std::numeric_limits::max()); } // FIXME: What does alignment mean for an image? @@ -1235,9 +1241,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo , if (RsrcIntr->IsImage) { unsigned DMask = cast(CI.getArgOperand(1))->getZExtValue(); unsigned DMaskLanes = DMask == 0 ? 
1 : llvm::popcount(DMask); -Info.memVT = memVTFromLoadIntrData(DataTy, DMaskLanes); +Info.memVT = memVTFromLoadIntrData(*this, MF.getDataLayout(), DataTy, + DMaskLanes); } else -Info.memVT = EVT::getEVT(DataTy); +Info.memVT = getValueType(MF.getDataLayout(), DataTy); Info.flags |= MachineMemOperand::MOStore; } else { diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll index 3e3371091ef72..4d557c76dc4d0 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll @@ -1280,6 +1280,602 @@ define <2 x i64> @buffer_load_v2i64__voffset_add(ptr addrspace(8) inreg %rsrc, i ret <2 x i64> %data } +define ptr @buffer_load_p0__voffset_add(ptr addrspace(8) inreg %rsrc, i32 %voffset) { +; PREGFX10-LABEL: buffer_load_p0__voffset_add: +; PREGFX10: ; %bb.0: +; PREGFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; PREGFX10-NEXT:buffer_load_dwordx2 v[0:1], v0, s[4:7], 0 offen offset:60 +; PREGFX10-NEXT:s_waitcnt vmcnt(0) +; PREGFX10-NEXT:s_setpc_b64 s[30:31] +; +; GFX10-LABEL: buffer_load_p0__voffset_add: +; GFX10: ; %bb.0: +; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; GFX10-NEXT:buffer_load_dwordx2 v[0:1], v0, s[4:7], 0 offen offset:60 +; GFX10-NEXT:s_waitcnt vmcnt(0) +;
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
https://github.com/arsenm marked this pull request as ready for review. https://github.com/llvm/llvm-project/pull/95379
[llvm-branch-commits] [llvm] AMDGPU: Cleanup selection patterns for buffer loads (PR #95378)
https://github.com/arsenm marked this pull request as ready for review. https://github.com/llvm/llvm-project/pull/95378
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer intrinsic store of bfloat (PR #95377)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Matt Arsenault (arsenm) Changes --- Full diff: https://github.com/llvm/llvm-project/pull/95377.diff 2 Files Affected: - (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+2-2) - (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll (+32-5) ``diff diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 4946129c65a95..81098201e9c0f 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -874,7 +874,7 @@ SITargetLowering::SITargetLowering(const TargetMachine , {MVT::Other, MVT::v2i16, MVT::v2f16, MVT::v2bf16, MVT::v3i16, MVT::v3f16, MVT::v4f16, MVT::v4i16, MVT::v4bf16, MVT::v8i16, MVT::v8f16, MVT::v8bf16, - MVT::f16, MVT::i16, MVT::i8, MVT::i128}, + MVT::f16, MVT::i16, MVT::bf16, MVT::i8, MVT::i128}, Custom); setOperationAction(ISD::STACKSAVE, MVT::Other, Custom); @@ -9973,7 +9973,7 @@ SDValue SITargetLowering::handleByteShortBufferStores(SelectionDAG , EVT VDataType, SDLoc DL, SDValue Ops[], MemSDNode *M) const { - if (VDataType == MVT::f16) + if (VDataType == MVT::f16 || VDataType == MVT::bf16) Ops[1] = DAG.getNode(ISD::BITCAST, DL, MVT::i16, Ops[1]); SDValue BufferStoreExt = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i32, Ops[1]); diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll index f7f3742a90633..82dd35ab4c240 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll @@ -5,11 +5,38 @@ ; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 < %s | FileCheck --check-prefix=GFX10 %s ; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 < %s | FileCheck --check-prefixes=GFX11 %s -; FIXME -; define amdgpu_ps void @buffer_store_bf16(ptr addrspace(8) inreg %rsrc, bfloat %data, i32 %offset) { -; call void 
@llvm.amdgcn.raw.ptr.buffer.store.bf16(bfloat %data, ptr addrspace(8) %rsrc, i32 %offset, i32 0, i32 0) -; ret void -; } +define amdgpu_ps void @buffer_store_bf16(ptr addrspace(8) inreg %rsrc, bfloat %data, i32 %offset) { +; GFX7-LABEL: buffer_store_bf16: +; GFX7: ; %bb.0: +; GFX7-NEXT:v_mul_f32_e32 v0, 1.0, v0 +; GFX7-NEXT:v_lshrrev_b32_e32 v0, 16, v0 +; GFX7-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen +; GFX7-NEXT:s_endpgm +; +; GFX8-LABEL: buffer_store_bf16: +; GFX8: ; %bb.0: +; GFX8-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen +; GFX8-NEXT:s_endpgm +; +; GFX9-LABEL: buffer_store_bf16: +; GFX9: ; %bb.0: +; GFX9-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen +; GFX9-NEXT:s_endpgm +; +; GFX10-LABEL: buffer_store_bf16: +; GFX10: ; %bb.0: +; GFX10-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen +; GFX10-NEXT:s_endpgm +; +; GFX11-LABEL: buffer_store_bf16: +; GFX11: ; %bb.0: +; GFX11-NEXT:buffer_store_b16 v0, v1, s[0:3], 0 offen +; GFX11-NEXT:s_nop 0 +; GFX11-NEXT:s_sendmsg sendmsg(MSG_DEALLOC_VGPRS) +; GFX11-NEXT:s_endpgm + call void @llvm.amdgcn.raw.ptr.buffer.store.bf16(bfloat %data, ptr addrspace(8) %rsrc, i32 %offset, i32 0, i32 0) + ret void +} define amdgpu_ps void @buffer_store_v2bf16(ptr addrspace(8) inreg %rsrc, <2 x bfloat> %data, i32 %offset) { ; GFX7-LABEL: buffer_store_v2bf16: `` https://github.com/llvm/llvm-project/pull/95377 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
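The store fix above treats a scalar `bf16` the same way as `f16`: the value is bitcast to `i16` and then any-extended to `i32` before the 16-bit buffer store. The following is a minimal standalone sketch of that idea in plain C++, not the LLVM implementation. `BF16Bits`, `truncateToBF16`, `widenForShortStore` and `storeShort` are hypothetical names, the float-to-bf16 step uses truncation purely for illustration, and the widened high bits are zeroed here although an any-extend leaves them unspecified.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a bfloat value: only its 16-bit storage pattern.
struct BF16Bits {
  std::uint16_t bits;
};

// Illustrative float -> bf16 conversion by truncation: keep the upper 16 bits
// of the binary32 pattern. Real lowering may round; this is an assumption.
inline BF16Bits truncateToBF16(float f) {
  std::uint32_t u = 0;
  std::memcpy(&u, &f, sizeof(u));
  return BF16Bits{static_cast<std::uint16_t>(u >> 16)};
}

// Models the two steps in the patch: reinterpret the 16-bit float pattern as
// an integer (BITCAST to i16), then widen it to 32 bits (ANY_EXTEND to i32).
// ANY_EXTEND leaves the high bits unspecified; this sketch zeroes them.
inline std::uint32_t widenForShortStore(BF16Bits v) {
  std::uint16_t asI16 = v.bits;
  return static_cast<std::uint32_t>(asI16);
}

// Models a 16-bit store: only the low half of the 32-bit register is written.
inline void storeShort(std::uint8_t *dst, std::uint32_t reg) {
  assert(dst != nullptr);
  std::uint16_t low = static_cast<std::uint16_t>(reg & 0xFFFFu);
  std::memcpy(dst, &low, sizeof(low));
}
```

Because the store only writes the low 16 bits, whatever ends up in the upper half of the widened register does not reach memory, which is why an any-extend is sufficient in the patch.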
[llvm-branch-commits] [llvm] AMDGPU: Cleanup selection patterns for buffer loads (PR #95378)
arsenm wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/95378?utm_source=stack-comment-downstack-mergeability-warning > Learn more: https://graphite.dev/docs/merge-pull-requests * **#95379** * **#95378** * **#95377** * **#95376** * `main` This stack of pull requests is managed by Graphite. https://github.com/llvm/llvm-project/pull/95378
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer intrinsic store of bfloat (PR #95377)
arsenm wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/95377?utm_source=stack-comment-downstack-mergeability-warning > Learn more: https://graphite.dev/docs/merge-pull-requests * **#95379** * **#95378** * **#95377** * **#95376** * `main` This stack of pull requests is managed by Graphite. https://github.com/llvm/llvm-project/pull/95377
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
arsenm wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/95379?utm_source=stack-comment-downstack-mergeability-warning > Learn more: https://graphite.dev/docs/merge-pull-requests * **#95379** * **#95378** * **#95377** * **#95376** * `main` This stack of pull requests is managed by Graphite. https://github.com/llvm/llvm-project/pull/95379
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/95379 Make sure we test all the address spaces since this support isn't free in gisel. >From b05179ed684e289ce31f7aee8b57939c7bf2809c Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 12 Jun 2024 10:10:20 +0200 Subject: [PATCH] AMDGPU: Fix buffer load/store of pointers Make sure we test all the address spaces since this support isn't free in gisel. --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 31 +- .../AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll | 596 ++ .../llvm.amdgcn.raw.ptr.buffer.store.ll | 144 + 3 files changed, 759 insertions(+), 12 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 81098201e9c0f..7a36c88b892c8 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -1112,29 +1112,33 @@ unsigned SITargetLowering::getVectorTypeBreakdownForCallingConv( Context, CC, VT, IntermediateVT, NumIntermediates, RegisterVT); } -static EVT memVTFromLoadIntrData(Type *Ty, unsigned MaxNumLanes) { +static EVT memVTFromLoadIntrData(const SITargetLowering , + const DataLayout , Type *Ty, + unsigned MaxNumLanes) { assert(MaxNumLanes != 0); + LLVMContext = Ty->getContext(); if (auto *VT = dyn_cast(Ty)) { unsigned NumElts = std::min(MaxNumLanes, VT->getNumElements()); -return EVT::getVectorVT(Ty->getContext(), -EVT::getEVT(VT->getElementType()), +return EVT::getVectorVT(Ctx, TLI.getValueType(DL, VT->getElementType()), NumElts); } - return EVT::getEVT(Ty); + return TLI.getValueType(DL, Ty); } // Peek through TFE struct returns to only use the data size. 
-static EVT memVTFromLoadIntrReturn(Type *Ty, unsigned MaxNumLanes) { +static EVT memVTFromLoadIntrReturn(const SITargetLowering , + const DataLayout , Type *Ty, + unsigned MaxNumLanes) { auto *ST = dyn_cast(Ty); if (!ST) -return memVTFromLoadIntrData(Ty, MaxNumLanes); +return memVTFromLoadIntrData(TLI, DL, Ty, MaxNumLanes); // TFE intrinsics return an aggregate type. assert(ST->getNumContainedTypes() == 2 && ST->getContainedType(1)->isIntegerTy(32)); - return memVTFromLoadIntrData(ST->getContainedType(0), MaxNumLanes); + return memVTFromLoadIntrData(TLI, DL, ST->getContainedType(0), MaxNumLanes); } /// Map address space 7 to MVT::v5i32 because that's its in-memory @@ -1219,10 +1223,12 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo , MaxNumLanes = DMask == 0 ? 1 : llvm::popcount(DMask); } -Info.memVT = memVTFromLoadIntrReturn(CI.getType(), MaxNumLanes); +Info.memVT = memVTFromLoadIntrReturn(*this, MF.getDataLayout(), + CI.getType(), MaxNumLanes); } else { -Info.memVT = memVTFromLoadIntrReturn( -CI.getType(), std::numeric_limits::max()); +Info.memVT = +memVTFromLoadIntrReturn(*this, MF.getDataLayout(), CI.getType(), +std::numeric_limits::max()); } // FIXME: What does alignment mean for an image? @@ -1235,9 +1241,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo , if (RsrcIntr->IsImage) { unsigned DMask = cast(CI.getArgOperand(1))->getZExtValue(); unsigned DMaskLanes = DMask == 0 ? 
1 : llvm::popcount(DMask); -Info.memVT = memVTFromLoadIntrData(DataTy, DMaskLanes); +Info.memVT = memVTFromLoadIntrData(*this, MF.getDataLayout(), DataTy, + DMaskLanes); } else -Info.memVT = EVT::getEVT(DataTy); +Info.memVT = getValueType(MF.getDataLayout(), DataTy); Info.flags |= MachineMemOperand::MOStore; } else { diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll index 3e3371091ef72..4d557c76dc4d0 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll @@ -1280,6 +1280,602 @@ define <2 x i64> @buffer_load_v2i64__voffset_add(ptr addrspace(8) inreg %rsrc, i ret <2 x i64> %data } +define ptr @buffer_load_p0__voffset_add(ptr addrspace(8) inreg %rsrc, i32 %voffset) { +; PREGFX10-LABEL: buffer_load_p0__voffset_add: +; PREGFX10: ; %bb.0: +; PREGFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +; PREGFX10-NEXT:buffer_load_dwordx2 v[0:1], v0, s[4:7], 0 offen offset:60 +; PREGFX10-NEXT:s_waitcnt vmcnt(0) +; PREGFX10-NEXT:s_setpc_b64 s[30:31] +; +; GFX10-LABEL: buffer_load_p0__voffset_add: +; GFX10: ; %bb.0: +; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) +;
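The pointer fix above replaces a layout-free type lookup with one that goes through the target lowering and the data layout, so that pointer-typed buffer data gets a concrete memory type. The sketch below illustrates only that idea in standalone C++ and is not LLVM code. `SimpleType`, `pointerBits` and the other names are hypothetical, and the pointer widths are assumptions read off the register type lists in this patch stack (`p0` among the 64-bit types; `p2`, `p3`, `p5`, `p6` among the 32-bit types). The lane clamp mirrors the `std::min(MaxNumLanes, ...)` step visible in the diff.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, simplified type descriptor (not an LLVM class).
struct SimpleType {
  enum Kind { Integer, Float, Pointer };
  Kind kind;
  unsigned bits;      // meaningful for Integer and Float
  unsigned addrSpace; // meaningful for Pointer
};

// Stand-in for a data layout query: pointer width by address space.
// Only the address spaces named in the stack's type lists are covered;
// 0 means "unknown to this sketch".
inline unsigned pointerBits(unsigned addrSpace) {
  switch (addrSpace) {
  case 0:
    return 64;
  case 2:
  case 3:
  case 5:
  case 6:
    return 32;
  default:
    return 0;
  }
}

// Layout-free lookup: a pointer has no size it can report (returns 0).
inline unsigned scalarBitsWithoutLayout(const SimpleType &t) {
  return t.kind == SimpleType::Pointer ? 0 : t.bits;
}

// Layout-aware lookup: pointers are sized through the address-space table.
inline unsigned scalarBitsWithLayout(const SimpleType &t) {
  return t.kind == SimpleType::Pointer ? pointerBits(t.addrSpace) : t.bits;
}

// Memory size of a vector load whose lane count is clamped by a lane mask.
inline unsigned vectorMemBits(const SimpleType &elt, unsigned numElts,
                              unsigned maxNumLanes) {
  assert(maxNumLanes != 0);
  unsigned lanes = std::min(maxNumLanes, numElts);
  return lanes * scalarBitsWithLayout(elt);
}
```

In this model a `p0` load comes out as 64 bits, which is consistent with the `buffer_load_dwordx2` check lines in the new `buffer_load_p0__voffset_add` test.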
[llvm-branch-commits] [llvm] AMDGPU: Cleanup selection patterns for buffer loads (PR #95378)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/95378 We should just support these for all register types. >From 46c7f8b4529827204e5273472ea5b642ecb7266e Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sun, 9 Jun 2024 23:12:31 +0200 Subject: [PATCH] AMDGPU: Cleanup selection patterns for buffer loads We should just support these for all register types. --- llvm/lib/Target/AMDGPU/BUFInstructions.td | 72 ++- llvm/lib/Target/AMDGPU/SIRegisterInfo.td | 16 ++--- 2 files changed, 39 insertions(+), 49 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/BUFInstructions.td b/llvm/lib/Target/AMDGPU/BUFInstructions.td index 94dd45f1333b0..2f52edb7f917a 100644 --- a/llvm/lib/Target/AMDGPU/BUFInstructions.td +++ b/llvm/lib/Target/AMDGPU/BUFInstructions.td @@ -1421,27 +1421,21 @@ let OtherPredicates = [HasPackedD16VMem] in { defm : MUBUF_LoadIntrinsicPat; } // End HasPackedD16VMem. -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; -defm : MUBUF_LoadIntrinsicPat; +foreach vt = Reg32Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg64Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg96Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} + +foreach vt = Reg128Types.types in { +defm : MUBUF_LoadIntrinsicPat; +} defm : MUBUF_LoadIntrinsicPat; defm : MUBUF_LoadIntrinsicPat; @@ -1532,27 +1526,21 @@ let OtherPredicates = [HasPackedD16VMem] 
in { defm : MUBUF_StoreIntrinsicPat; } // End HasPackedD16VMem. -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; -defm : MUBUF_StoreIntrinsicPat; +foreach vt = Reg32Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg64Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg96Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} + +foreach vt = Reg128Types.types in { +defm : MUBUF_StoreIntrinsicPat; +} defm : MUBUF_StoreIntrinsicPat; defm : MUBUF_StoreIntrinsicPat; diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td index caac7126068ef..a8efe2b2ba35e 100644 --- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td +++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td @@ -586,7 +586,9 @@ class RegisterTypes reg_types> { def Reg16Types : RegisterTypes<[i16, f16, bf16]>; def Reg32Types : RegisterTypes<[i32, f32, v2i16, v2f16, v2bf16, p2, p3, p5, p6]>; -def Reg64Types : RegisterTypes<[i64, f64, v2i32, v2f32, p0]>; +def Reg64Types : RegisterTypes<[i64, f64, v2i32, v2f32, p0, v4i16, v4f16, v4bf16]>; +def Reg96Types : RegisterTypes<[v3i32, v3f32]>; +def Reg128Types : RegisterTypes<[v4i32, v4f32, v2i64, v2f64, v8i16, v8f16, v8bf16]>; let HasVGPR = 1 in { // VOP3 and VINTERP can access 256 lo and 256 hi registers. 
@@ -744,7 +746,7 @@ def Pseudo_SReg_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, let BaseClassOrder = 1; } -def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU", [v4i32, v2i64, v2f64, v8i16, v8f16, v8bf16], 32, +def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU", Reg128Types.types, 32, (add PRIVATE_RSRC_REG)> { let isAllocatable = 0; let CopyCost = -1; @@ -815,7 +817,7 @@ def SRegOrLds_32 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, v let HasSGPR = 1; } -def SGPR_64 : SIRegisterClass<"AMDGPU", [v2i32, i64, v2f32, f64, v4i16, v4f16, v4bf16], 32, +def SGPR_64 : SIRegisterClass<"AMDGPU", Reg64Types.types, 32, (add SGPR_64Regs)> { let CopyCost = 1; let AllocationPriority = 1; @@ -905,8 +907,8 @@ multiclass SRegClass; -defm "" : SRegClass<4, [v4i32, v4f32, v2i64, v2f64, v8i16, v8f16, v8bf16], SGPR_128Regs, TTMP_128Regs>; +defm "" : SRegClass<3, Reg96Types.types, SGPR_96Regs, TTMP_96Regs>; +defm "" : SRegClass<4, Reg128Types.types,
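The cleanup above replaces one hand-written `defm` per value type with `foreach` loops over shared per-width type lists. A rough standalone C++ analogue of that table-driven expansion is sketched below. The type lists are copied from the `Reg32Types`, `Reg64Types`, `Reg96Types` and `Reg128Types` definitions in the diff; the `RegisterTypes` struct, the `expandPatterns` helper and the `Name<type, bN>` output format are purely illustrative, since the real pattern arguments are not visible in this archive.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One list of value types per register width, as in the patch's RegNNTypes.
struct RegisterTypes {
  unsigned bits;
  std::vector<std::string> types;
};

inline std::vector<RegisterTypes> registerTypeLists() {
  return {
      {32, {"i32", "f32", "v2i16", "v2f16", "v2bf16", "p2", "p3", "p5", "p6"}},
      {64, {"i64", "f64", "v2i32", "v2f32", "p0", "v4i16", "v4f16", "v4bf16"}},
      {96, {"v3i32", "v3f32"}},
      {128,
       {"v4i32", "v4f32", "v2i64", "v2f64", "v8i16", "v8f16", "v8bf16"}},
  };
}

// Expands one entry per (type, width) pair, the way a TableGen foreach would
// instantiate one pattern per list element. The string format is made up.
inline std::vector<std::string> expandPatterns(const std::string &patName) {
  std::vector<std::string> out;
  for (const RegisterTypes &rt : registerTypeLists())
    for (const std::string &vt : rt.types)
      out.push_back(patName + "<" + vt + ", b" + std::to_string(rt.bits) + ">");
  return out;
}
```

The point of the design is that adding a type to one list (as the patch does for `v4i16`, `v4f16` and `v4bf16` in `Reg64Types`) updates every consumer of that list, including the register classes that now reference `Reg64Types.types` and `Reg128Types.types`.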
[llvm-branch-commits] [llvm] AMDGPU: Fix buffer intrinsic store of bfloat (PR #95377)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/95377 None >From 520d91d73339d8bea65f2e30e2a4d7fd0eb3d92b Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sun, 9 Jun 2024 22:54:35 +0200 Subject: [PATCH] AMDGPU: Fix buffer intrinsic store of bfloat --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 4 +- .../llvm.amdgcn.raw.ptr.buffer.store.bf16.ll | 37 --- 2 files changed, 34 insertions(+), 7 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 4946129c65a95..81098201e9c0f 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -874,7 +874,7 @@ SITargetLowering::SITargetLowering(const TargetMachine , {MVT::Other, MVT::v2i16, MVT::v2f16, MVT::v2bf16, MVT::v3i16, MVT::v3f16, MVT::v4f16, MVT::v4i16, MVT::v4bf16, MVT::v8i16, MVT::v8f16, MVT::v8bf16, - MVT::f16, MVT::i16, MVT::i8, MVT::i128}, + MVT::f16, MVT::i16, MVT::bf16, MVT::i8, MVT::i128}, Custom); setOperationAction(ISD::STACKSAVE, MVT::Other, Custom); @@ -9973,7 +9973,7 @@ SDValue SITargetLowering::handleByteShortBufferStores(SelectionDAG , EVT VDataType, SDLoc DL, SDValue Ops[], MemSDNode *M) const { - if (VDataType == MVT::f16) + if (VDataType == MVT::f16 || VDataType == MVT::bf16) Ops[1] = DAG.getNode(ISD::BITCAST, DL, MVT::i16, Ops[1]); SDValue BufferStoreExt = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i32, Ops[1]); diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll index f7f3742a90633..82dd35ab4c240 100644 --- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll +++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll @@ -5,11 +5,38 @@ ; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 < %s | FileCheck --check-prefix=GFX10 %s ; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 < %s | FileCheck --check-prefixes=GFX11 %s -; FIXME -; define 
amdgpu_ps void @buffer_store_bf16(ptr addrspace(8) inreg %rsrc, bfloat %data, i32 %offset) { -; call void @llvm.amdgcn.raw.ptr.buffer.store.bf16(bfloat %data, ptr addrspace(8) %rsrc, i32 %offset, i32 0, i32 0) -; ret void -; } +define amdgpu_ps void @buffer_store_bf16(ptr addrspace(8) inreg %rsrc, bfloat %data, i32 %offset) { +; GFX7-LABEL: buffer_store_bf16: +; GFX7: ; %bb.0: +; GFX7-NEXT:v_mul_f32_e32 v0, 1.0, v0 +; GFX7-NEXT:v_lshrrev_b32_e32 v0, 16, v0 +; GFX7-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen +; GFX7-NEXT:s_endpgm +; +; GFX8-LABEL: buffer_store_bf16: +; GFX8: ; %bb.0: +; GFX8-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen +; GFX8-NEXT:s_endpgm +; +; GFX9-LABEL: buffer_store_bf16: +; GFX9: ; %bb.0: +; GFX9-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen +; GFX9-NEXT:s_endpgm +; +; GFX10-LABEL: buffer_store_bf16: +; GFX10: ; %bb.0: +; GFX10-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen +; GFX10-NEXT:s_endpgm +; +; GFX11-LABEL: buffer_store_bf16: +; GFX11: ; %bb.0: +; GFX11-NEXT:buffer_store_b16 v0, v1, s[0:3], 0 offen +; GFX11-NEXT:s_nop 0 +; GFX11-NEXT:s_sendmsg sendmsg(MSG_DEALLOC_VGPRS) +; GFX11-NEXT:s_endpgm + call void @llvm.amdgcn.raw.ptr.buffer.store.bf16(bfloat %data, ptr addrspace(8) %rsrc, i32 %offset, i32 0, i32 0) + ret void +} define amdgpu_ps void @buffer_store_v2bf16(ptr addrspace(8) inreg %rsrc, <2 x bfloat> %data, i32 %offset) { ; GFX7-LABEL: buffer_store_v2bf16: ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang] Lower REDUCE intrinsic for reduction op with args by value (PR #95353)
llvmbot wrote: @llvm/pr-subscribers-flang-fir-hlfir Author: Valentin Clement (バレンタイン クレメン) (clementval) Changes #95297 Updates the runtime entry points to distinguish between reduction operation with arguments passed by value or by reference. Add lowering to support the arguments passed by value. --- Patch is 62.25 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/95353.diff 5 Files Affected: - (modified) flang/include/flang/Optimizer/Builder/Runtime/RTBuilder.h (+22) - (modified) flang/include/flang/Optimizer/Builder/Runtime/Reduction.h (+4-4) - (modified) flang/lib/Optimizer/Builder/IntrinsicCall.cpp (+12-4) - (modified) flang/lib/Optimizer/Builder/Runtime/Reduction.cpp (+413-55) - (modified) flang/test/Lower/Intrinsics/reduce.f90 (+223-12) ``diff diff --git a/flang/include/flang/Optimizer/Builder/Runtime/RTBuilder.h b/flang/include/flang/Optimizer/Builder/Runtime/RTBuilder.h index 809d5b8d569dc..845ba385918d0 100644 --- a/flang/include/flang/Optimizer/Builder/Runtime/RTBuilder.h +++ b/flang/include/flang/Optimizer/Builder/Runtime/RTBuilder.h @@ -64,6 +64,18 @@ using FuncTypeBuilderFunc = mlir::FunctionType (*)(mlir::MLIRContext *); }; \ } +#define REDUCTION_VALUE_OPERATION_MODEL(T) \ + template <> \ + constexpr TypeBuilderFunc \ + getModel>() { \ +return [](mlir::MLIRContext *context) -> mlir::Type { \ + TypeBuilderFunc f{getModel()}; \ + auto refTy = fir::ReferenceType::get(f(context)); \ + return mlir::FunctionType::get(context, {f(context), f(context)}, \ + refTy); \ +}; \ + } + #define REDUCTION_CHAR_OPERATION_MODEL(T) \ template <> \ constexpr TypeBuilderFunc \ @@ -481,17 +493,27 @@ constexpr TypeBuilderFunc getModel() { } REDUCTION_REF_OPERATION_MODEL(std::int8_t) +REDUCTION_VALUE_OPERATION_MODEL(std::int8_t) REDUCTION_REF_OPERATION_MODEL(std::int16_t) +REDUCTION_VALUE_OPERATION_MODEL(std::int16_t) REDUCTION_REF_OPERATION_MODEL(std::int32_t) +REDUCTION_VALUE_OPERATION_MODEL(std::int32_t) 
REDUCTION_REF_OPERATION_MODEL(std::int64_t) +REDUCTION_VALUE_OPERATION_MODEL(std::int64_t) REDUCTION_REF_OPERATION_MODEL(Fortran::common::int128_t) +REDUCTION_VALUE_OPERATION_MODEL(Fortran::common::int128_t) REDUCTION_REF_OPERATION_MODEL(float) +REDUCTION_VALUE_OPERATION_MODEL(float) REDUCTION_REF_OPERATION_MODEL(double) +REDUCTION_VALUE_OPERATION_MODEL(double) REDUCTION_REF_OPERATION_MODEL(long double) +REDUCTION_VALUE_OPERATION_MODEL(long double) REDUCTION_REF_OPERATION_MODEL(std::complex) +REDUCTION_VALUE_OPERATION_MODEL(std::complex) REDUCTION_REF_OPERATION_MODEL(std::complex) +REDUCTION_VALUE_OPERATION_MODEL(std::complex) REDUCTION_CHAR_OPERATION_MODEL(char) REDUCTION_CHAR_OPERATION_MODEL(char16_t) diff --git a/flang/include/flang/Optimizer/Builder/Runtime/Reduction.h b/flang/include/flang/Optimizer/Builder/Runtime/Reduction.h index fedf453a6dc8d..2a40cddc0cc2c 100644 --- a/flang/include/flang/Optimizer/Builder/Runtime/Reduction.h +++ b/flang/include/flang/Optimizer/Builder/Runtime/Reduction.h @@ -229,8 +229,8 @@ void genIParityDim(fir::FirOpBuilder , mlir::Location loc, /// result value. This is used for COMPLEX, CHARACTER and DERIVED TYPES. void genReduce(fir::FirOpBuilder , mlir::Location loc, mlir::Value arrayBox, mlir::Value operation, mlir::Value maskBox, - mlir::Value identity, mlir::Value ordered, - mlir::Value resultBox); + mlir::Value identity, mlir::Value ordered, mlir::Value resultBox, + bool argByRef); /// Generate call to `Reduce` intrinsic runtime routine. This is the version /// that does not take a dim argument and return a scalare result. This is used @@ -238,14 +238,14 @@ void genReduce(fir::FirOpBuilder , mlir::Location loc, mlir::Value genReduce(fir::FirOpBuilder , mlir::Location loc, mlir::Value arrayBox, mlir::Value operation, mlir::Value maskBox, mlir::Value identity, - mlir::Value ordered); + mlir::Value ordered, bool argByRef); /// Generate call to `Reduce` intrinsic runtime routine. 
This is the version /// that takes arrays of any rank with a dim argument specified. void genReduceDim(fir::FirOpBuilder , mlir::Location loc,
[llvm-branch-commits] [flang] [flang] Lower REDUCE intrinsic for reduction op with args by value (PR #95353)
https://github.com/clementval created https://github.com/llvm/llvm-project/pull/95353 #95297 Updates the runtime entry points to distinguish between reduction operation with arguments passed by value or by reference. Add lowering to support the arguments passed by value. >From defadc4f18b0b4b369a3657a0f6e4c9f79ffd793 Mon Sep 17 00:00:00 2001 From: Valentin Clement Date: Wed, 12 Jun 2024 15:28:31 -0700 Subject: [PATCH] [flang] Update lowering of REDUCE intrinsic for reduction operation with args by value --- .../Optimizer/Builder/Runtime/RTBuilder.h | 22 + .../Optimizer/Builder/Runtime/Reduction.h | 8 +- flang/lib/Optimizer/Builder/IntrinsicCall.cpp | 16 +- .../Optimizer/Builder/Runtime/Reduction.cpp | 468 -- flang/test/Lower/Intrinsics/reduce.f90| 235 - 5 files changed, 674 insertions(+), 75 deletions(-) diff --git a/flang/include/flang/Optimizer/Builder/Runtime/RTBuilder.h b/flang/include/flang/Optimizer/Builder/Runtime/RTBuilder.h index 809d5b8d569dc..845ba385918d0 100644 --- a/flang/include/flang/Optimizer/Builder/Runtime/RTBuilder.h +++ b/flang/include/flang/Optimizer/Builder/Runtime/RTBuilder.h @@ -64,6 +64,18 @@ using FuncTypeBuilderFunc = mlir::FunctionType (*)(mlir::MLIRContext *); }; \ } +#define REDUCTION_VALUE_OPERATION_MODEL(T) \ + template <> \ + constexpr TypeBuilderFunc \ + getModel>() { \ +return [](mlir::MLIRContext *context) -> mlir::Type { \ + TypeBuilderFunc f{getModel()}; \ + auto refTy = fir::ReferenceType::get(f(context)); \ + return mlir::FunctionType::get(context, {f(context), f(context)}, \ + refTy); \ +}; \ + } + #define REDUCTION_CHAR_OPERATION_MODEL(T) \ template <> \ constexpr TypeBuilderFunc \ @@ -481,17 +493,27 @@ constexpr TypeBuilderFunc getModel() { } REDUCTION_REF_OPERATION_MODEL(std::int8_t) +REDUCTION_VALUE_OPERATION_MODEL(std::int8_t) REDUCTION_REF_OPERATION_MODEL(std::int16_t) +REDUCTION_VALUE_OPERATION_MODEL(std::int16_t) REDUCTION_REF_OPERATION_MODEL(std::int32_t) +REDUCTION_VALUE_OPERATION_MODEL(std::int32_t) 
REDUCTION_REF_OPERATION_MODEL(std::int64_t) +REDUCTION_VALUE_OPERATION_MODEL(std::int64_t) REDUCTION_REF_OPERATION_MODEL(Fortran::common::int128_t) +REDUCTION_VALUE_OPERATION_MODEL(Fortran::common::int128_t) REDUCTION_REF_OPERATION_MODEL(float) +REDUCTION_VALUE_OPERATION_MODEL(float) REDUCTION_REF_OPERATION_MODEL(double) +REDUCTION_VALUE_OPERATION_MODEL(double) REDUCTION_REF_OPERATION_MODEL(long double) +REDUCTION_VALUE_OPERATION_MODEL(long double) REDUCTION_REF_OPERATION_MODEL(std::complex) +REDUCTION_VALUE_OPERATION_MODEL(std::complex) REDUCTION_REF_OPERATION_MODEL(std::complex) +REDUCTION_VALUE_OPERATION_MODEL(std::complex) REDUCTION_CHAR_OPERATION_MODEL(char) REDUCTION_CHAR_OPERATION_MODEL(char16_t) diff --git a/flang/include/flang/Optimizer/Builder/Runtime/Reduction.h b/flang/include/flang/Optimizer/Builder/Runtime/Reduction.h index fedf453a6dc8d..2a40cddc0cc2c 100644 --- a/flang/include/flang/Optimizer/Builder/Runtime/Reduction.h +++ b/flang/include/flang/Optimizer/Builder/Runtime/Reduction.h @@ -229,8 +229,8 @@ void genIParityDim(fir::FirOpBuilder , mlir::Location loc, /// result value. This is used for COMPLEX, CHARACTER and DERIVED TYPES. void genReduce(fir::FirOpBuilder , mlir::Location loc, mlir::Value arrayBox, mlir::Value operation, mlir::Value maskBox, - mlir::Value identity, mlir::Value ordered, - mlir::Value resultBox); + mlir::Value identity, mlir::Value ordered, mlir::Value resultBox, + bool argByRef); /// Generate call to `Reduce` intrinsic runtime routine. This is the version /// that does not take a dim argument and return a scalare result. This is used @@ -238,14 +238,14 @@ void genReduce(fir::FirOpBuilder , mlir::Location loc, mlir::Value genReduce(fir::FirOpBuilder , mlir::Location loc, mlir::Value arrayBox, mlir::Value operation, mlir::Value maskBox, mlir::Value identity, - mlir::Value ordered); + mlir::Value ordered, bool argByRef); /// Generate call to `Reduce` intrinsic runtime routine. 
This is the version /// that takes arrays of any rank with a dim argument specified. void
[llvm-branch-commits] [clang] [clang] Implement function pointer signing and authenticated function calls (PR #93906)
https://github.com/ahatanak updated https://github.com/llvm/llvm-project/pull/93906 >From 0e85001f6d53e63beca77a76eaba1875ec84000d Mon Sep 17 00:00:00 2001 From: Akira Hatanaka Date: Fri, 24 May 2024 20:23:36 -0700 Subject: [PATCH 1/4] [clang] Implement function pointer signing. Co-Authored-By: John McCall --- clang/include/clang/Basic/CodeGenOptions.h| 4 + .../clang/Basic/DiagnosticDriverKinds.td | 3 + clang/include/clang/Basic/LangOptions.h | 2 + .../include/clang/Basic/PointerAuthOptions.h | 136 ++ .../clang/Frontend/CompilerInvocation.h | 10 ++ clang/lib/CodeGen/CGBuiltin.cpp | 3 +- clang/lib/CodeGen/CGCall.cpp | 3 + clang/lib/CodeGen/CGCall.h| 28 +++- clang/lib/CodeGen/CGExpr.cpp | 17 +-- clang/lib/CodeGen/CGExprConstant.cpp | 19 ++- clang/lib/CodeGen/CGPointerAuth.cpp | 51 +++ clang/lib/CodeGen/CGPointerAuthInfo.h | 96 + clang/lib/CodeGen/CodeGenFunction.cpp | 58 clang/lib/CodeGen/CodeGenFunction.h | 10 ++ clang/lib/CodeGen/CodeGenModule.h | 34 + clang/lib/Frontend/CompilerInvocation.cpp | 36 + clang/lib/Headers/ptrauth.h | 34 + .../CodeGen/ptrauth-function-attributes.c | 13 ++ .../test/CodeGen/ptrauth-function-init-fail.c | 5 + clang/test/CodeGen/ptrauth-function-init.c| 31 .../CodeGen/ptrauth-function-lvalue-cast.c| 23 +++ clang/test/CodeGen/ptrauth-weak_import.c | 10 ++ clang/test/CodeGenCXX/ptrauth.cpp | 24 23 files changed, 633 insertions(+), 17 deletions(-) create mode 100644 clang/lib/CodeGen/CGPointerAuthInfo.h create mode 100644 clang/test/CodeGen/ptrauth-function-attributes.c create mode 100644 clang/test/CodeGen/ptrauth-function-init-fail.c create mode 100644 clang/test/CodeGen/ptrauth-function-init.c create mode 100644 clang/test/CodeGen/ptrauth-function-lvalue-cast.c create mode 100644 clang/test/CodeGen/ptrauth-weak_import.c create mode 100644 clang/test/CodeGenCXX/ptrauth.cpp diff --git a/clang/include/clang/Basic/CodeGenOptions.h b/clang/include/clang/Basic/CodeGenOptions.h index 9469a424045bb..502722a6ec4eb 100644 --- 
a/clang/include/clang/Basic/CodeGenOptions.h +++ b/clang/include/clang/Basic/CodeGenOptions.h @@ -13,6 +13,7 @@ #ifndef LLVM_CLANG_BASIC_CODEGENOPTIONS_H #define LLVM_CLANG_BASIC_CODEGENOPTIONS_H +#include "clang/Basic/PointerAuthOptions.h" #include "clang/Basic/Sanitizers.h" #include "clang/Basic/XRayInstr.h" #include "llvm/ADT/FloatingPointMode.h" @@ -388,6 +389,9 @@ class CodeGenOptions : public CodeGenOptionsBase { std::vector Reciprocals; + /// Configuration for pointer-signing. + PointerAuthOptions PointerAuth; + /// The preferred width for auto-vectorization transforms. This is intended to /// override default transforms based on the width of the architected vector /// registers. diff --git a/clang/include/clang/Basic/DiagnosticDriverKinds.td b/clang/include/clang/Basic/DiagnosticDriverKinds.td index 773b234cd68fe..6cbb0c8401c15 100644 --- a/clang/include/clang/Basic/DiagnosticDriverKinds.td +++ b/clang/include/clang/Basic/DiagnosticDriverKinds.td @@ -351,6 +351,9 @@ def err_drv_omp_host_ir_file_not_found : Error< "target regions but cannot be found">; def err_drv_omp_host_target_not_supported : Error< "target '%0' is not a supported OpenMP host target">; +def err_drv_ptrauth_not_supported : Error< + "target '%0' does not support native pointer authentication">; + def err_drv_expecting_fopenmp_with_fopenmp_targets : Error< "'-fopenmp-targets' must be used in conjunction with a '-fopenmp' option " "compatible with offloading; e.g., '-fopenmp=libomp' or '-fopenmp=libiomp5'">; diff --git a/clang/include/clang/Basic/LangOptions.h b/clang/include/clang/Basic/LangOptions.h index 75e88afbd9705..5216822e45b1b 100644 --- a/clang/include/clang/Basic/LangOptions.h +++ b/clang/include/clang/Basic/LangOptions.h @@ -346,6 +346,8 @@ class LangOptionsBase { BKey }; + using PointerAuthenticationMode = ::clang::PointerAuthenticationMode; + enum class ThreadModelKind { /// POSIX Threads. 
POSIX, diff --git a/clang/include/clang/Basic/PointerAuthOptions.h b/clang/include/clang/Basic/PointerAuthOptions.h index e5cdcc31ebfb7..32b179e3f9460 100644 --- a/clang/include/clang/Basic/PointerAuthOptions.h +++ b/clang/include/clang/Basic/PointerAuthOptions.h @@ -14,10 +14,146 @@ #ifndef LLVM_CLANG_BASIC_POINTERAUTHOPTIONS_H #define LLVM_CLANG_BASIC_POINTERAUTHOPTIONS_H +#include "clang/Basic/LLVM.h" +#include "clang/Basic/LangOptions.h" +#include "llvm/Support/ErrorHandling.h" +#include "llvm/Target/TargetOptions.h" +#include +#include +#include +#include + namespace clang { constexpr unsigned PointerAuthKeyNone = -1; +class PointerAuthSchema { +public: + enum class Kind : unsigned { +
[llvm-branch-commits] [mlir] 8944c8d - Revert "[MLIR][Arith] add fastMathAttr on arith::extf and arith::truncf (#93443)"
Author: Ivy Zhang Date: 2024-06-13T11:12:39+08:00 New Revision: 8944c8df45f8e4da860bf04118106d9a950cbf75 URL: https://github.com/llvm/llvm-project/commit/8944c8df45f8e4da860bf04118106d9a950cbf75 DIFF: https://github.com/llvm/llvm-project/commit/8944c8df45f8e4da860bf04118106d9a950cbf75.diff LOG: Revert "[MLIR][Arith] add fastMathAttr on arith::extf and arith::truncf (#93443)" This reverts commit 6784bf764207d267b781b4f515a2fafdcb345509. Added: Modified: mlir/include/mlir/Dialect/Arith/IR/ArithOps.td mlir/lib/Dialect/Arith/IR/ArithOps.cpp mlir/lib/Dialect/Arith/Transforms/EmulateUnsupportedFloats.cpp mlir/lib/Dialect/Math/Transforms/LegalizeToF32.cpp mlir/test/Conversion/ArithToLLVM/arith-to-llvm.mlir mlir/test/Dialect/Arith/canonicalize.mlir mlir/test/Dialect/Arith/emulate-unsupported-floats.mlir Removed: diff --git a/mlir/include/mlir/Dialect/Arith/IR/ArithOps.td b/mlir/include/mlir/Dialect/Arith/IR/ArithOps.td index c4471f9bc5af2..06fbdb7f2c4cb 100644 --- a/mlir/include/mlir/Dialect/Arith/IR/ArithOps.td +++ b/mlir/include/mlir/Dialect/Arith/IR/ArithOps.td @@ -1199,7 +1199,7 @@ def Arith_ExtSIOp : Arith_IToICastOp<"extsi"> { // ExtFOp //===--===// -def Arith_ExtFOp : Arith_FToFCastOp<"extf", [DeclareOpInterfaceMethods]> { +def Arith_ExtFOp : Arith_FToFCastOp<"extf"> { let summary = "cast from floating-point to wider floating-point"; let description = [{ Cast a floating-point value to a larger floating-point-typed value. @@ -1208,13 +1208,6 @@ def Arith_ExtFOp : Arith_FToFCastOp<"extf", [DeclareOpInterfaceMethods:$fastmath); - let results = (outs FloatLike:$out); - - let assemblyFormat = [{ $in (`fastmath` `` $fastmath^)? 
- attr-dict `:` type($in) `to` type($out) }]; } //===--===// @@ -1253,11 +1246,8 @@ def Arith_TruncFOp : Arith_Op<"truncf", [Pure, SameOperandsAndResultShape, SameInputOutputTensorDims, DeclareOpInterfaceMethods, - DeclareOpInterfaceMethods, DeclareOpInterfaceMethods]>, Arguments<(ins FloatLike:$in, - DefaultValuedAttr< - Arith_FastMathAttr, "::mlir::arith::FastMathFlags::none">:$fastmath, OptionalAttr:$roundingmode)>, Results<(outs FloatLike:$out)> { let summary = "cast from floating-point to narrower floating-point"; @@ -1277,9 +1267,7 @@ def Arith_TruncFOp : let hasFolder = 1; let hasVerifier = 1; - let assemblyFormat = [{ $in ($roundingmode^)? - (`fastmath` `` $fastmath^)? - attr-dict `:` type($in) `to` type($out) }]; + let assemblyFormat = "$in ($roundingmode^)? attr-dict `:` type($in) `to` type($out)"; } //===--===// diff --git a/mlir/lib/Dialect/Arith/IR/ArithOps.cpp b/mlir/lib/Dialect/Arith/IR/ArithOps.cpp index 291f6e5424ba5..2f6647a2a27b1 100644 --- a/mlir/lib/Dialect/Arith/IR/ArithOps.cpp +++ b/mlir/lib/Dialect/Arith/IR/ArithOps.cpp @@ -1390,20 +1390,6 @@ LogicalResult arith::ExtSIOp::verify() { /// Fold extension of float constants when there is no information loss due the /// diff erence in fp semantics. 
OpFoldResult arith::ExtFOp::fold(FoldAdaptor adaptor) { - if (auto truncFOp = getOperand().getDefiningOp()) { -if (truncFOp.getOperand().getType() == getType()) { - arith::FastMathFlags truncFMF = truncFOp.getFastmath(); - bool isTruncContract = - bitEnumContainsAll(truncFMF, arith::FastMathFlags::contract); - arith::FastMathFlags extFMF = getFastmath(); - bool isExtContract = - bitEnumContainsAll(extFMF, arith::FastMathFlags::contract); - if (isTruncContract && isExtContract) { -return truncFOp.getOperand(); - } -} - } - auto resElemType = cast(getElementTypeOrSelf(getType())); const llvm::fltSemantics = resElemType.getFloatSemantics(); return constFoldCastOp( diff --git a/mlir/lib/Dialect/Arith/Transforms/EmulateUnsupportedFloats.cpp b/mlir/lib/Dialect/Arith/Transforms/EmulateUnsupportedFloats.cpp index 8e1cb474feee7..4a50da3513f99 100644 --- a/mlir/lib/Dialect/Arith/Transforms/EmulateUnsupportedFloats.cpp +++ b/mlir/lib/Dialect/Arith/Transforms/EmulateUnsupportedFloats.cpp @@ -94,11 +94,8 @@ void EmulateFloatPattern::rewrite(Operation *op, ArrayRef operands, SmallVector newResults(expandedOp->getResults()); for (auto [res, oldType, newType] : llvm::zip_equal( MutableArrayRef{newResults}, op->getResultTypes(), resultTypes)) { -if (oldType != newType) { - auto truncFOp = rewriter.create(loc, oldType, res); - truncFOp.setFastmath(arith::FastMathFlags::contract); - res = truncFOp.getResult(); -} +
[llvm-branch-commits] [llvm] [Support] Integrate SipHash.cpp into libSupport. (PR #94394)
asl wrote: @kbeyls There are (some) tests in the follow-up commit https://github.com/llvm/llvm-project/pull/93902/files#diff-8df159460fc7a128734566054df883f3192b1b261dc8eac667933b4042e9af5f https://github.com/llvm/llvm-project/pull/94394
[llvm-branch-commits] [clang] [clang] Implement function pointer signing and authenticated function calls (PR #93906)
asl wrote: @ahatanak Looks like there are some conflicts that should be resolved https://github.com/llvm/llvm-project/pull/93906
[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)
@@ -592,10 +599,15 @@ void preprocessUnreachableBlocks(FlowFunction &Func) {
 /// Decide if stale profile matching can be applied for a given function.
 /// Currently we skip inference for (very) large instances and for instances
 /// having "unexpected" control flow (e.g., having no sink basic blocks).
-bool canApplyInference(const FlowFunction &Func) {
+bool canApplyInference(const FlowFunction &Func,
+                       const yaml::bolt::BinaryFunctionProfile &YamlBF) {
   if (Func.Blocks.size() > opts::StaleMatchingMaxFuncSize)
     return false;

+  if ((double)Func.MatchedExecCount / YamlBF.ExecCount >=
+      opts::MatchedProfileThreshold / 100.0)
+    return false;

WenleiHe wrote:

> For block-based matching, the threshold should be higher than 5%, perhaps
> closer to a half?

Yes. The threshold of course needs to be tuned based on the heuristic chosen. I just feel that a block-count-based threshold could be a better proxy for how confident we are about the graph match and whether stale profile matching should proceed. But I don't have a very strong opinion.

https://github.com/llvm/llvm-project/pull/95156
[llvm-branch-commits] [clang] fc671bb - Revert "Bump the DWARF version number to 5 on Darwin. (#95164)"
Author: Florian Mayer Date: 2024-06-12T15:50:03-07:00 New Revision: fc671bbb1ceb94f8aac63bc0e4963e5894bc660e URL: https://github.com/llvm/llvm-project/commit/fc671bbb1ceb94f8aac63bc0e4963e5894bc660e DIFF: https://github.com/llvm/llvm-project/commit/fc671bbb1ceb94f8aac63bc0e4963e5894bc660e.diff LOG: Revert "Bump the DWARF version number to 5 on Darwin. (#95164)" This reverts commit 8f6acd973a38da6dce45faa676cbb51da37f72e5. Added: Modified: clang/lib/Driver/ToolChains/Darwin.cpp clang/test/Driver/debug-options.c Removed: diff --git a/clang/lib/Driver/ToolChains/Darwin.cpp b/clang/lib/Driver/ToolChains/Darwin.cpp index ca75a622b061e..ed5737915aa96 100644 --- a/clang/lib/Driver/ToolChains/Darwin.cpp +++ b/clang/lib/Driver/ToolChains/Darwin.cpp @@ -1257,17 +1257,7 @@ unsigned DarwinClang::GetDefaultDwarfVersion() const { if ((isTargetMacOSBased() && isMacosxVersionLT(10, 11)) || (isTargetIOSBased() && isIPhoneOSVersionLT(9))) return 2; - // Default to use DWARF 4 on OS X 10.11 - macOS 14 / iOS 9 - iOS 17. - if ((isTargetMacOSBased() && isMacosxVersionLT(15)) || - (isTargetIOSBased() && isIPhoneOSVersionLT(18)) || - (isTargetWatchOSBased() && TargetVersion < llvm::VersionTuple(11)) || - (isTargetXROS() && TargetVersion < llvm::VersionTuple(2)) || - (isTargetDriverKit() && TargetVersion < llvm::VersionTuple(24)) || - (isTargetMacOSBased() && - TargetVersion.empty()) || // apple-darwin, no version. 
- (TargetPlatform == llvm::Triple::BridgeOS)) -return 4; - return 5; + return 4; } void MachO::AddLinkRuntimeLib(const ArgList , ArgStringList , diff --git a/clang/test/Driver/debug-options.c b/clang/test/Driver/debug-options.c index 0a665f7017d63..07f6ca9e3902f 100644 --- a/clang/test/Driver/debug-options.c +++ b/clang/test/Driver/debug-options.c @@ -68,32 +68,7 @@ // RUN: %clang -### -c -g %s -target x86_64-apple-driverkit19.0 2>&1 \ // RUN: | FileCheck -check-prefix=G_STANDALONE \ // RUN: -check-prefix=G_DWARF4 %s -// RUN: %clang -### -c -g %s -target x86_64-apple-macosx15 2>&1 \ -// RUN: | FileCheck -check-prefix=G_STANDALONE \ -// RUN: -check-prefix=G_DWARF5 %s -// RUN: %clang -### -c -g %s -target arm64-apple-ios17.0 2>&1 \ -// RUN: | FileCheck -check-prefix=G_STANDALONE \ -// RUN: -check-prefix=G_DWARF4 %s -// RUN: %clang -### -c -g %s -target arm64-apple-ios18.0 2>&1 \ -// RUN: | FileCheck -check-prefix=G_STANDALONE \ -// RUN: -check-prefix=G_DWARF5 %s -// RUN: %clang -### -c -g %s -target arm64_32-apple-watchos11 2>&1 \ -// RUN: | FileCheck -check-prefix=G_STANDALONE \ -// RUN: -check-prefix=G_DWARF5 %s -// RUN: %clang -### -c -g %s -target arm64-apple-tvos18.0 2>&1 \ -// RUN: | FileCheck -check-prefix=G_STANDALONE \ -// RUN: -check-prefix=G_DWARF5 %s -// RUN: %clang -### -c -g %s -target x86_64-apple-driverkit24.0 2>&1 \ -// RUN: | FileCheck -check-prefix=G_STANDALONE \ -// RUN: -check-prefix=G_DWARF5 %s -// RUN: %clang -### -c -g %s -target arm64-apple-xros1 2>&1 \ -// RUN: | FileCheck -check-prefix=G_STANDALONE \ -// RUN: -check-prefix=G_DWARF4 %s -// RUN: %clang -### -c -g %s -target arm64-apple-xros2 2>&1 \ -// RUN: | FileCheck -check-prefix=G_STANDALONE \ -// RUN: -check-prefix=G_DWARF5 %s -// -// RUN: %clang -### -c -fsave-optimization-record %s\ +// RUN: %clang -### -c -fsave-optimization-record %s \ // RUN:-target x86_64-apple-darwin 2>&1 \ // RUN: | FileCheck -check-prefix=GLTO_ONLY %s // RUN: %clang -### -c -g -fsave-optimization-record %s \ ___ 
llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [msan] Handle blendv intrinsics (PR #94882)
https://github.com/vitalybuka closed https://github.com/llvm/llvm-project/pull/94882
[llvm-branch-commits] [msan] Handle blendv intrinsics (PR #94882)
https://github.com/fmayer approved this pull request. https://github.com/llvm/llvm-project/pull/94882
[llvm-branch-commits] [msan] Handle blendv intrinsics (PR #94882)
https://github.com/vitalybuka edited https://github.com/llvm/llvm-project/pull/94882
[llvm-branch-commits] [msan] Handle blendv intrinsics (PR #94882)
@@ -3356,6 +3356,37 @@ struct MemorySanitizerVisitor : public InstVisitor<MemorySanitizerVisitor> {
     setOriginForNaryOp(I);
   }

+  Value *convertBlendvToSelectMask(IRBuilder<> &IRB, Value *C) {
+    C = CreateAppToShadowCast(IRB, C);
+    FixedVectorType *FVT = cast<FixedVectorType>(C->getType());
+    unsigned ElSize = FVT->getElementType()->getPrimitiveSizeInBits();
+    C = IRB.CreateAShr(C, ElSize - 1);
+    FVT = FixedVectorType::get(IRB.getInt1Ty(), FVT->getNumElements());
+    return IRB.CreateTrunc(C, FVT);
+  }
+
+  // `blendv(f, t, c)` is effectively `select(c[top_bit], t, f)`.
+  void handleBlendvIntrinsic(IntrinsicInst &I) {
+    Value *C = I.getOperand(2);
+    Value *T = I.getOperand(1);
+    Value *F = I.getOperand(0);
+
+    Value *Sc = getShadow(&I, 2);
+    Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
+
+    {
+      IRBuilder<> IRB(&I);

vitalybuka wrote:

I think it's unimportant. The builder has nothing interesting in its destructor. The `{}` is just to show that we don't need to care about conflicts between builders.

https://github.com/llvm/llvm-project/pull/94882
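The comment in the quoted patch says that `blendv(f, t, c)` is effectively `select(c[top_bit], t, f)`. A minimal scalar sketch of that lane-wise rule is shown here, assuming 8-bit lanes. The function name `blendvModel` is hypothetical and this is not MSan or LLVM code; the patch itself builds the mask with an arithmetic shift followed by a truncation to `i1`, whereas this sketch reads the top bit with an unsigned shift, which yields the same per-lane condition.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of an 8-bit-lane blendv (illustration only).
// Each result lane takes t[i] when the top bit of c[i] is set, else f[i].
template <std::size_t N>
std::array<std::uint8_t, N> blendvModel(const std::array<std::uint8_t, N> &f,
                                        const std::array<std::uint8_t, N> &t,
                                        const std::array<std::uint8_t, N> &c) {
  std::array<std::uint8_t, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    // Shifting right by (element size - 1) leaves only the top bit,
    // which acts as the per-lane select condition.
    const bool takeT = ((c[i] >> 7) & 1u) != 0;
    out[i] = takeT ? t[i] : f[i];
  }
  return out;
}
```

Only the top bit of each condition lane matters in this model, so a lane with `c[i] == 0x7F` still selects `f[i]`.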
[llvm-branch-commits] [msan] Handle blendv intrinsics (PR #94882)
@@ -3356,6 +3356,37 @@ struct MemorySanitizerVisitor : public InstVisitor<MemorySanitizerVisitor> {
     setOriginForNaryOp(I);
   }

+  Value *convertBlendvToSelectMask(IRBuilder<> &IRB, Value *C) {
+    C = CreateAppToShadowCast(IRB, C);
+    FixedVectorType *FVT = cast<FixedVectorType>(C->getType());
+    unsigned ElSize = FVT->getElementType()->getPrimitiveSizeInBits();
+    C = IRB.CreateAShr(C, ElSize - 1);
+    FVT = FixedVectorType::get(IRB.getInt1Ty(), FVT->getNumElements());
+    return IRB.CreateTrunc(C, FVT);
+  }
+
+  // `blendv(f, t, c)` is effectively `select(c[top_bit], t, f)`.
+  void handleBlendvIntrinsic(IntrinsicInst &I) {
+    Value *C = I.getOperand(2);
+    Value *T = I.getOperand(1);
+    Value *F = I.getOperand(0);
+
+    Value *Sc = getShadow(&I, 2);
+    Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
+
+    {
+      IRBuilder<> IRB(&I);

fmayer wrote:

Why does it matter that this doesn't outlive `handleSelectLikeInst`? Because that also creates an IRBuilder? How does that work? That creates it from `&I` as well, which means these instructions get inserted before the ones here, right?

https://github.com/llvm/llvm-project/pull/94882
[llvm-branch-commits] [msan] Handle blendv intrinsics (PR #94882)
vitalybuka wrote: ping https://github.com/llvm/llvm-project/pull/94882
[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)
@@ -592,10 +599,15 @@ void preprocessUnreachableBlocks(FlowFunction &Func) {
 /// Decide if stale profile matching can be applied for a given function.
 /// Currently we skip inference for (very) large instances and for instances
 /// having "unexpected" control flow (e.g., having no sink basic blocks).
-bool canApplyInference(const FlowFunction &Func) {
+bool canApplyInference(const FlowFunction &Func,
+                       const yaml::bolt::BinaryFunctionProfile &YamlBF) {
   if (Func.Blocks.size() > opts::StaleMatchingMaxFuncSize)
     return false;

+  if ((double)Func.MatchedExecCount / YamlBF.ExecCount >=
+      opts::MatchedProfileThreshold / 100.0)
+    return false;

aaupov wrote:

It's a tricky question how to define the cutoff in terms of sufficient matching. I first thought of defining a block-count-based cutoff (if we matched >5% of blocks, proceed with matching), but then what if these are cold blocks covering <1% of exec count? In this case we'd end up guessing/propagating most samples. For block-based matching, the threshold should be higher than 5%, perhaps closer to a half? For exec-count-based matching, I'd feel comfortable with 5% as the threshold.

https://github.com/llvm/llvm-project/pull/95156
[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)
WenleiHe wrote: cc @wlei-llvm https://github.com/llvm/llvm-project/pull/95156
[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)
@@ -592,10 +599,15 @@ void preprocessUnreachableBlocks(FlowFunction &Func) {
 /// Decide if stale profile matching can be applied for a given function.
 /// Currently we skip inference for (very) large instances and for instances
 /// having "unexpected" control flow (e.g., having no sink basic blocks).
-bool canApplyInference(const FlowFunction &Func) {
+bool canApplyInference(const FlowFunction &Func,
+                       const yaml::bolt::BinaryFunctionProfile &YamlBF) {
   if (Func.Blocks.size() > opts::StaleMatchingMaxFuncSize)
     return false;

+  if ((double)Func.MatchedExecCount / YamlBF.ExecCount >=
+      opts::MatchedProfileThreshold / 100.0)
+    return false;

WenleiHe wrote:

Trying to understand the rationale behind using dynamic counts to determine whether profile inference is safe. The way I see it, we have two graphs that we try to match; if many nodes in the graph have an exact match, chances are higher that we can infer the correct match for the rest of the nodes. With that, we care more about how many nodes we can match statically. Say we have 5 blocks with a count distribution of 1M, 1K, 1K, 1K, 1K: if we have an exact match for the four 1K nodes (80% exact match), we should feel reasonably confident about inferring the remaining node, even though, if we look at counts, we have an exact match for only <1%. WDYT?

https://github.com/llvm/llvm-project/pull/95156
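To make the 1M/1K example above concrete, the following sketch computes both candidate metrics for the same function: the fraction of blocks matched and the fraction of execution count covered by matched blocks. The `BlockSample` struct and both function names are hypothetical and are not part of BOLT; they only illustrate how far the two ratios can diverge.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-block record; names are illustrative, not BOLT's.
struct BlockSample {
  std::uint64_t execCount; // profile execution count of the block
  bool matched;            // whether the block was matched exactly
};

// Fraction of blocks that were matched, ignoring how hot they are.
double matchedBlockFraction(const std::vector<BlockSample> &blocks) {
  if (blocks.empty())
    return 0.0;
  std::uint64_t matched = 0;
  for (const BlockSample &b : blocks)
    matched += b.matched ? 1 : 0;
  return static_cast<double>(matched) / static_cast<double>(blocks.size());
}

// Fraction of total execution count that falls in matched blocks.
double matchedCountFraction(const std::vector<BlockSample> &blocks) {
  std::uint64_t total = 0;
  std::uint64_t matched = 0;
  for (const BlockSample &b : blocks) {
    total += b.execCount;
    if (b.matched)
      matched += b.execCount;
  }
  if (total == 0)
    return 0.0;
  return static_cast<double>(matched) / static_cast<double>(total);
}
```

For one unmatched block with count 1,000,000 and four matched blocks with count 1,000 each, the block-based fraction is 4/5 while the count-based fraction is 4,000/1,004,000, which is below 1%.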
[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)
@@ -59,6 +59,8 @@ struct FlowFunction {
   /// The index of the entry block.
   uint64_t Entry{0};
   uint64_t Sink{UINT64_MAX};
+  // Matched execution count for the function.
+  uint64_t MatchedExecCount{0};

WenleiHe wrote:

nit: I'd be careful about adding this to `FlowFunction`. Strictly speaking, it doesn't belong in the flow function, which just describes the CFG; if we add function-level "attributes" to flow functions, we'd have a lot more here.

https://github.com/llvm/llvm-project/pull/95156
[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)
@@ -592,10 +599,15 @@ void preprocessUnreachableBlocks(FlowFunction &Func) {
 /// Decide if stale profile matching can be applied for a given function.
 /// Currently we skip inference for (very) large instances and for instances
 /// having "unexpected" control flow (e.g., having no sink basic blocks).
-bool canApplyInference(const FlowFunction &Func) {
+bool canApplyInference(const FlowFunction &Func,

WenleiHe wrote:

Header comment needs update.

https://github.com/llvm/llvm-project/pull/95156
[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)
@@ -614,6 +614,17 @@
 - `--lite-threshold-pct=`
 
+  Threshold (in percent) of matched profile at which stale profile inference is
+  applied to functions. Argument corresponds to the sum of matched execution
+  counts of function blocks divided by the sum of execution counts of function
+  blocks. E.g if the sum of a function blocks' execution counts is 100, the sum
+  of the function blocks' matched execution counts is 10, and the argument is 15
+  (15%), profile inference will not be applied to that function. A higher
+  threshold will correlate with fewer functions to process in cases of stale
+  profile. Default set to %5.

WenleiHe wrote:

nit: this description is too verbose. As you can see, it's longer than most of the other descriptions. :)

https://github.com/llvm/llvm-project/pull/95156
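The arithmetic in the quoted description can be sketched as follows. This only illustrates the documented semantics and is not the code from the patch: `meetsMatchedThreshold` is a hypothetical helper, and both treating a ratio exactly equal to the threshold as passing and treating a zero total as failing are assumptions.

```cpp
#include <cstdint>

// Hypothetical helper, not a BOLT API. Returns true when the matched share of
// the execution count, in percent, reaches ThresholdPct.
// Assumptions: a ratio equal to the threshold passes; a zero total fails.
// The products below can overflow for counts above roughly 1.8e17.
bool meetsMatchedThreshold(uint64_t MatchedCount, uint64_t TotalCount,
                           uint64_t ThresholdPct) {
  if (TotalCount == 0)
    return false;
  // Compare Matched / Total >= Threshold / 100 without floating point.
  return MatchedCount * 100 >= ThresholdPct * TotalCount;
}
```

For the example in the description, `meetsMatchedThreshold(10, 100, 15)` is false (10% is below 15%), so inference would be skipped for that function; with a threshold of 5 the same function passes.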
[llvm-branch-commits] [libcxx] [libc++] Implement std::move_only_function (P0288R9) (PR #94670)
https://github.com/EricWF edited https://github.com/llvm/llvm-project/pull/94670
[llvm-branch-commits] [llvm] fe30a73 - Revert "[CLANG][DWARF] Handle DIE offset collision in DW_IDX_parent (#95039)"
Author: Florian Mayer
Date: 2024-06-12T13:25:52-07:00
New Revision: fe30a734628b3028c086ce016b6f80440172f34f

URL: https://github.com/llvm/llvm-project/commit/fe30a734628b3028c086ce016b6f80440172f34f
DIFF: https://github.com/llvm/llvm-project/commit/fe30a734628b3028c086ce016b6f80440172f34f.diff

LOG: Revert "[CLANG][DWARF] Handle DIE offset collision in DW_IDX_parent (#95039)"

This reverts commit f59d9d538c7b580a93bee4afba0f098f7ddf09d9.

Added:

Modified:
    llvm/include/llvm/CodeGen/AccelTable.h
    llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
    llvm/lib/DWARFLinker/Classic/DWARFLinker.cpp
    llvm/lib/DWARFLinker/Parallel/DWARFLinkerImpl.cpp

Removed:
    llvm/test/DebugInfo/X86/debug-names-types-die-offset-collision.ll

diff --git a/llvm/include/llvm/CodeGen/AccelTable.h b/llvm/include/llvm/CodeGen/AccelTable.h
index 622fcf019aad6..cff8fcbaf2cd7 100644
--- a/llvm/include/llvm/CodeGen/AccelTable.h
+++ b/llvm/include/llvm/CodeGen/AccelTable.h
@@ -257,38 +257,18 @@ class AppleAccelTableData : public AccelTableData {
 
 /// Helper class to identify an entry in DWARF5AccelTable based on their DIE
 /// offset and UnitID.
-struct OffsetAndUnitID {
-  uint64_t Offset = 0;
-  uint32_t UnitID = 0;
-  bool IsTU = false;
-  OffsetAndUnitID() = default;
-  OffsetAndUnitID(uint64_t Offset, uint32_t UnitID, bool IsTU)
-      : Offset(Offset), UnitID(UnitID), IsTU(IsTU) {}
-  uint64_t offset() const { return Offset; };
-  uint32_t unitID() const { return UnitID; };
-  bool isTU() const { return IsTU; }
-};
+struct OffsetAndUnitID : std::pair<uint64_t, uint32_t> {
+  using Base = std::pair<uint64_t, uint32_t>;
+  OffsetAndUnitID(Base B) : Base(B) {}
 
-template <> struct DenseMapInfo<OffsetAndUnitID> {
-  static inline OffsetAndUnitID getEmptyKey() {
-    OffsetAndUnitID Entry;
-    Entry.Offset = uint64_t(-1);
-    return Entry;
-  }
-  static inline OffsetAndUnitID getTombstoneKey() {
-    OffsetAndUnitID Entry;
-    Entry.Offset = uint64_t(-2);
-    return Entry;
-  }
-  static unsigned getHashValue(const OffsetAndUnitID &Val) {
-    return (unsigned)llvm::hash_combine(Val.offset(), Val.unitID(), Val.IsTU);
-  }
-  static bool isEqual(const OffsetAndUnitID &LHS, const OffsetAndUnitID &RHS) {
-    return LHS.offset() == RHS.offset() && LHS.unitID() == RHS.unitID() &&
-           LHS.IsTU == RHS.isTU();
-  }
+  OffsetAndUnitID(uint64_t Offset, uint32_t UnitID) : Base(Offset, UnitID) {}
+  uint64_t offset() const { return first; };
+  uint32_t unitID() const { return second; };
 };
 
+template <>
+struct DenseMapInfo<OffsetAndUnitID> : DenseMapInfo<OffsetAndUnitID::Base> {};
+
 /// The Data class implementation for DWARF v5 accelerator table. Unlike the
 /// Apple Data classes, this class is just a DIE wrapper, and does not know to
 /// serialize itself. The complete serialization logic is in the
@@ -297,11 +277,12 @@ class DWARF5AccelTableData : public AccelTableData {
 public:
   static uint32_t hash(StringRef Name) { return caseFoldingDjbHash(Name); }
 
-  DWARF5AccelTableData(const DIE &Die, const uint32_t UnitID, const bool IsTU);
+  DWARF5AccelTableData(const DIE &Die, const uint32_t UnitID,
+                       const bool IsTU = false);
   DWARF5AccelTableData(const uint64_t DieOffset,
                        const std::optional<uint64_t> DefiningParentOffset,
                        const unsigned DieTag, const unsigned UnitID,
-                       const bool IsTU)
+                       const bool IsTU = false)
       : OffsetVal(DieOffset), ParentOffset(DefiningParentOffset),
         DieTag(DieTag), AbbrevNumber(0), IsTU(IsTU), UnitID(UnitID) {}
 
@@ -315,7 +296,7 @@ class DWARF5AccelTableData : public AccelTableData {
   }
 
   OffsetAndUnitID getDieOffsetAndUnitID() const {
-    return {getDieOffset(), getUnitID(), isTU()};
+    return {getDieOffset(), UnitID};
   }
 
   unsigned getDieTag() const { return DieTag; }
@@ -341,7 +322,7 @@ class DWARF5AccelTableData : public AccelTableData {
     assert(isNormalized() && "Accessing DIE Offset before normalizing.");
     if (!ParentOffset)
       return std::nullopt;
-    return OffsetAndUnitID(*ParentOffset, getUnitID(), isTU());
+    return OffsetAndUnitID(*ParentOffset, getUnitID());
   }
 
   /// Sets AbbrevIndex for an Entry.
@@ -435,7 +416,7 @@ class DWARF5AccelTable : public AccelTable<DWARF5AccelTableData> {
       for (auto *Data : Entry.second.getValues()) {
         addName(Entry.second.Name, Data->getDieOffset(),
                 Data->getParentDieOffset(), Data->getDieTag(),
-                Data->getUnitID(), Data->isTU());
+                Data->getUnitID(), true);
       }
     }
   }
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
index 7de9432325d8a..b9c02aed848cc 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
@@ -3592,8 +3592,7 @@ void DwarfDebug::addAccelNameImpl(
          "Kind is TU but CU is being processed.");
   // The type unit can be discarded, so need to
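The key change in this revert can be illustrated outside LLVM. The sketch below uses `std::set` in place of `llvm::DenseMap` and a hypothetical `Entry` record; it only shows that a key of (offset, unit ID) cannot distinguish a compile-unit entry from a type-unit entry that share both values, whereas a key that also carries `IsTU` can. The function names and the sample values are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical record for illustration; not an LLVM type.
struct Entry {
  uint64_t Offset; // DIE offset within its unit
  uint32_t UnitID; // index of the unit
  bool IsTU;       // true for a type unit, false for a compile unit
};

// Number of distinct keys when the key is (Offset, UnitID) only.
std::size_t distinctKeysWithoutIsTU(const std::vector<Entry> &Entries) {
  std::set<std::pair<uint64_t, uint32_t>> Keys;
  for (const Entry &E : Entries)
    Keys.emplace(E.Offset, E.UnitID);
  return Keys.size();
}

// Number of distinct keys when the key is (Offset, UnitID, IsTU).
std::size_t distinctKeysWithIsTU(const std::vector<Entry> &Entries) {
  std::set<std::tuple<uint64_t, uint32_t, bool>> Keys;
  for (const Entry &E : Entries)
    Keys.emplace(E.Offset, E.UnitID, E.IsTU);
  return Keys.size();
}

// A compile-unit DIE and a type-unit DIE with the same offset and unit index.
std::vector<Entry> collidingEntries() {
  return {{0x23, 0, false}, {0x23, 0, true}};
}
```

For `collidingEntries()`, the two-field key yields one distinct key and the three-field key yields two.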
[llvm-branch-commits] [workflows] Fix version-check.yml to work with the new minor release bump (PR #95296)
https://github.com/vitalybuka closed https://github.com/llvm/llvm-project/pull/95296
[llvm-branch-commits] Use rc version suffix (PR #95295)
https://github.com/vitalybuka closed https://github.com/llvm/llvm-project/pull/95295
[llvm-branch-commits] [workflows] Fix version-check.yml to work with the new minor release bump (PR #95296)
https://github.com/vitalybuka created https://github.com/llvm/llvm-project/pull/95296 (cherry picked from commit d5e69147b9d261bd53b4dd027f17131677be8613)
[llvm-branch-commits] Use rc version suffix (PR #95295)
https://github.com/vitalybuka created https://github.com/llvm/llvm-project/pull/95295
[llvm-branch-commits] [clang] [clang] Implement function pointer signing and authenticated function calls (PR #93906)
https://github.com/ahatanak updated https://github.com/llvm/llvm-project/pull/93906

>From 0e85001f6d53e63beca77a76eaba1875ec84000d Mon Sep 17 00:00:00 2001
From: Akira Hatanaka
Date: Fri, 24 May 2024 20:23:36 -0700
Subject: [PATCH 1/4] [clang] Implement function pointer signing.

Co-Authored-By: John McCall
---
 clang/include/clang/Basic/CodeGenOptions.h    |   4 +
 .../clang/Basic/DiagnosticDriverKinds.td      |   3 +
 clang/include/clang/Basic/LangOptions.h       |   2 +
 .../include/clang/Basic/PointerAuthOptions.h  | 136 ++
 .../clang/Frontend/CompilerInvocation.h       |  10 ++
 clang/lib/CodeGen/CGBuiltin.cpp               |   3 +-
 clang/lib/CodeGen/CGCall.cpp                  |   3 +
 clang/lib/CodeGen/CGCall.h                    |  28 +++-
 clang/lib/CodeGen/CGExpr.cpp                  |  17 +--
 clang/lib/CodeGen/CGExprConstant.cpp          |  19 ++-
 clang/lib/CodeGen/CGPointerAuth.cpp           |  51 +++
 clang/lib/CodeGen/CGPointerAuthInfo.h         |  96 +
 clang/lib/CodeGen/CodeGenFunction.cpp         |  58
 clang/lib/CodeGen/CodeGenFunction.h           |  10 ++
 clang/lib/CodeGen/CodeGenModule.h             |  34 +
 clang/lib/Frontend/CompilerInvocation.cpp     |  36 +
 clang/lib/Headers/ptrauth.h                   |  34 +
 .../CodeGen/ptrauth-function-attributes.c     |  13 ++
 .../test/CodeGen/ptrauth-function-init-fail.c |   5 +
 clang/test/CodeGen/ptrauth-function-init.c    |  31
 .../CodeGen/ptrauth-function-lvalue-cast.c    |  23 +++
 clang/test/CodeGen/ptrauth-weak_import.c      |  10 ++
 clang/test/CodeGenCXX/ptrauth.cpp             |  24
 23 files changed, 633 insertions(+), 17 deletions(-)
 create mode 100644 clang/lib/CodeGen/CGPointerAuthInfo.h
 create mode 100644 clang/test/CodeGen/ptrauth-function-attributes.c
 create mode 100644 clang/test/CodeGen/ptrauth-function-init-fail.c
 create mode 100644 clang/test/CodeGen/ptrauth-function-init.c
 create mode 100644 clang/test/CodeGen/ptrauth-function-lvalue-cast.c
 create mode 100644 clang/test/CodeGen/ptrauth-weak_import.c
 create mode 100644 clang/test/CodeGenCXX/ptrauth.cpp

diff --git a/clang/include/clang/Basic/CodeGenOptions.h b/clang/include/clang/Basic/CodeGenOptions.h
index 9469a424045bb..502722a6ec4eb 100644
--- a/clang/include/clang/Basic/CodeGenOptions.h
+++ b/clang/include/clang/Basic/CodeGenOptions.h
@@ -13,6 +13,7 @@
 #ifndef LLVM_CLANG_BASIC_CODEGENOPTIONS_H
 #define LLVM_CLANG_BASIC_CODEGENOPTIONS_H
 
+#include "clang/Basic/PointerAuthOptions.h"
 #include "clang/Basic/Sanitizers.h"
 #include "clang/Basic/XRayInstr.h"
 #include "llvm/ADT/FloatingPointMode.h"
@@ -388,6 +389,9 @@ class CodeGenOptions : public CodeGenOptionsBase {
 
   std::vector<std::string> Reciprocals;
 
+  /// Configuration for pointer-signing.
+  PointerAuthOptions PointerAuth;
+
   /// The preferred width for auto-vectorization transforms. This is intended to
   /// override default transforms based on the width of the architected vector
   /// registers.
diff --git a/clang/include/clang/Basic/DiagnosticDriverKinds.td b/clang/include/clang/Basic/DiagnosticDriverKinds.td
index 773b234cd68fe..6cbb0c8401c15 100644
--- a/clang/include/clang/Basic/DiagnosticDriverKinds.td
+++ b/clang/include/clang/Basic/DiagnosticDriverKinds.td
@@ -351,6 +351,9 @@ def err_drv_omp_host_ir_file_not_found : Error<
   "target regions but cannot be found">;
 def err_drv_omp_host_target_not_supported : Error<
   "target '%0' is not a supported OpenMP host target">;
+def err_drv_ptrauth_not_supported : Error<
+  "target '%0' does not support native pointer authentication">;
+
 def err_drv_expecting_fopenmp_with_fopenmp_targets : Error<
   "'-fopenmp-targets' must be used in conjunction with a '-fopenmp' option "
   "compatible with offloading; e.g., '-fopenmp=libomp' or '-fopenmp=libiomp5'">;
diff --git a/clang/include/clang/Basic/LangOptions.h b/clang/include/clang/Basic/LangOptions.h
index 75e88afbd9705..5216822e45b1b 100644
--- a/clang/include/clang/Basic/LangOptions.h
+++ b/clang/include/clang/Basic/LangOptions.h
@@ -346,6 +346,8 @@ class LangOptionsBase {
     BKey
   };
 
+  using PointerAuthenticationMode = ::clang::PointerAuthenticationMode;
+
   enum class ThreadModelKind {
     /// POSIX Threads.
     POSIX,
diff --git a/clang/include/clang/Basic/PointerAuthOptions.h b/clang/include/clang/Basic/PointerAuthOptions.h
index e5cdcc31ebfb7..32b179e3f9460 100644
--- a/clang/include/clang/Basic/PointerAuthOptions.h
+++ b/clang/include/clang/Basic/PointerAuthOptions.h
@@ -14,10 +14,146 @@
 #ifndef LLVM_CLANG_BASIC_POINTERAUTHOPTIONS_H
 #define LLVM_CLANG_BASIC_POINTERAUTHOPTIONS_H
 
+#include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
+#include "llvm/Support/ErrorHandling.h"
+#include "llvm/Target/TargetOptions.h"
+#include
+#include
+#include
+#include
+
 namespace clang {
 
 constexpr unsigned PointerAuthKeyNone = -1;
 
+class PointerAuthSchema {
+public:
+  enum class Kind : unsigned {
+
[llvm-branch-commits] [libcxx] Mark test as long_tests (PR #95266)
vitalybuka wrote: Thanks! Abandoning. https://github.com/llvm/llvm-project/pull/95266
[llvm-branch-commits] [libcxx] Mark test as long_tests (PR #95266)
https://github.com/vitalybuka closed https://github.com/llvm/llvm-project/pull/95266