[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)

2024-06-14 Thread Joseph Huber via llvm-branch-commits

jhuber6 wrote:

> The `openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp` file requires 
> the `HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY` symbol.
> 
> This symbol is expected to be provided by 
> `openmp/libomptarget/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h`, not 
> by third-party external `/opt/rocm/include/hsa/hsa_ext_amd.h`.

This was introduced in ROCm-5.3, see 
https://github.com/ROCm/ROCR-Runtime/blob/rocm-5.3.x/src/inc/hsa_ext_amd.h#L333.
The `dynamic_hsa/` version is a copy of this header for use when the system 
version is not provided. If the build fails to find a system HSA, it will use the 
dynamic version. The problem here is that you _have_ HSA, but it's too old. I 
don't know how much backward compatibility we really provide here; unfortunately 
the HSA headers don't give you much versioning to work with, so we can't `ifdef` 
around this.
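
For reference, a minimal sketch of the declaration at stake; only the enumerator 
name and its value are grounded in the in-tree `dynamic_hsa/hsa_ext_amd.h` copy 
(see the `rg` output later in this thread), while the surrounding typedef name is 
an assumption about the ROCm header layout:

```cpp
// Sketch only: pre-ROCm-5.3 system headers lack this enumerator, which is why
// an old /opt/rocm install cannot compile src/rtl.cpp without the in-tree copy.
typedef enum hsa_amd_agent_info_s {
  HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY = 0xA016,
} hsa_amd_agent_info_t;
```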

https://github.com/llvm/llvm-project/pull/95484
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)

2024-06-14 Thread Thomas Debesse via llvm-branch-commits

illwieckz wrote:

I reproduce the bug with both `release/18.x` and `release/17.x`.

I don't reproduce the bug with `release/16.x`.

I cannot test `release/15.x` because of other, unrelated errors (such as 
`getenv` not being defined).

https://github.com/llvm/llvm-project/pull/95484
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [Support] Integrate SipHash.cpp into libSupport. (PR #94394)

2024-06-14 Thread Kristof Beyls via llvm-branch-commits

https://github.com/kbeyls approved this pull request.


https://github.com/llvm/llvm-project/pull/94394
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [Support] Integrate SipHash.cpp into libSupport. (PR #94394)

2024-06-14 Thread Kristof Beyls via llvm-branch-commits

kbeyls wrote:

> [37c84b9](https://github.com/llvm/llvm-project/pull/94394/commits/37c84b9dce70f40db8a7c27b7de8232c4d10f78f)
>  shows what I had in mind, let me know what you all think. I added:
> 
> ```
> void getSipHash_2_4_64(ArrayRef In, const uint8_t ()[16],
>uint8_t ()[8]);
> 
> void getSipHash_2_4_128(ArrayRef In, const uint8_t ()[16],
> uint8_t ()[16]);
> ```
> 
> as the core interfaces, and mimicked the ref. test harness to reuse the same 
> test vectors. If this seems reasonable to yall I'm happy to extract the 
> vectors.h file from the ref. implementation into the "Import original 
> sources" PR – that's why I kept it open ;)

Thanks, that looks good to me.
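
For anyone skimming the thread, here is a minimal caller-side sketch, assuming the 
interface quoted above lands as-is in `llvm/Support/SipHash.h` (the key bytes below 
are placeholders, not the reference test key):

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/Support/SipHash.h"

#include <cstdint>

// Hash a message with SipHash-2-4-64 and fold the 8 output bytes into a
// little-endian uint64_t. The all-zero key is only a placeholder; real users
// must supply a secret 128-bit key.
uint64_t hashWithPlaceholderKey(llvm::ArrayRef<uint8_t> Msg) {
  const uint8_t Key[16] = {};
  uint8_t Tag[8];
  llvm::getSipHash_2_4_64(Msg, Key, Tag);
  uint64_t Result = 0;
  for (unsigned I = 0; I != 8; ++I)
    Result |= static_cast<uint64_t>(Tag[I]) << (8 * I);
  return Result;
}
```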

https://github.com/llvm/llvm-project/pull/94394
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)

2024-06-14 Thread Thomas Debesse via llvm-branch-commits

illwieckz wrote:

Here is a script to reproduce the bug:

```bash
#! /usr/bin/env bash

set -x -u -e -o pipefail

version="${1:-18}"

CMAKE_BUILD_PARALLEL_LEVEL="$(nproc)"
export CMAKE_BUILD_PARALLEL_LEVEL="${CMAKE_BUILD_PARALLEL_LEVEL:-4}"

workspace="llvm-bug95484-${version}"

rm -rf "${workspace}"
mkdir "${workspace}"
cd "${workspace}"

git clone --depth 1 \
--branch "release/${version}.x" \
'https://github.com/llvm/llvm-project.git' \
'llvm-project'

git clone --depth 1 \
'https://github.com/KhronosGroup/SPIRV-Headers.git' \
'llvm-project/llvm/projects/SPIRV-Headers'

git clone --depth 1 \
--branch "llvm_release_${version}0" \
'https://github.com/KhronosGroup/SPIRV-LLVM-Translator.git' \
'llvm-project/llvm/projects/SPIRV-LLVM-Translator'

cmake \
-S'llvm-project/llvm' \
-B'build' \
-G'Ninja' \
-D'CMAKE_INSTALL_PREFIX'='install' \
-D'CMAKE_BUILD_TYPE'='Release' \
-D'BUILD_SHARED_LIBS'='ON' \
-D'LLVM_ENABLE_PROJECTS'='clang;openmp' \
-D'LLVM_TARGETS_TO_BUILD'='Native' \
-D'LLVM_EXPERIMENTAL_TARGETS_TO_BUILD'='SPIRV' \
-D'LLVM_ENABLE_ASSERTIONS'='OFF' \
-D'LLVM_ENABLE_RTTI'='ON' \
-D'LLVM_BUILD_TESTS'='OFF' \
-D'LLVM_BUILD_TOOLS'='ON' \
-D'LLVM_SPIRV_INCLUDE_TESTS'='OFF' \
-D'LLVM_EXTERNAL_PROJECTS'='SPIRV-Headers'

cmake --build 'build'

cmake --install 'build'
```

It can be used by saving it as `llvm-bug95484` and running either:

- `./llvm-bug95484`
  to fetch `release/18.x` and attempt a clean build in a way that reproduces the bug,
- `./llvm-bug95484 17`
  to fetch and reproduce the bug with `release/17.x`.

It will fail this way:

```
llvm-bug95484-18/llvm-project/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:1902:37: error: ‘HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY’ was not declared in this scope; did you mean ‘HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY’?
 1902 |     if (auto Err = getDeviceAttrRaw(HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY,
      |                                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                     HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY
```

https://github.com/llvm/llvm-project/pull/95484
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)

2024-06-13 Thread Thomas Debesse via llvm-branch-commits

illwieckz wrote:

```
$ rg HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY

libc/utils/gpu/loader/amdgpu/Loader.cpp
521:   HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY),

openmp/libomptarget/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h
74:  HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY = 0xA016,

openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
1892:if (auto Err = getDeviceAttrRaw(HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY,
```

The `openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp` file requires the 
`HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY` symbol.

This symbol is expected to be provided by 
`openmp/libomptarget/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h`, not by the 
third-party external `/opt/rocm/include/hsa/hsa_ext_amd.h`.

The code in `release/17.x` and `release/18.x` explicitly looks for ROCm's 
`hsa/hsa_ext_amd.h` first and only falls back to the LLVM-provided 
`dynamic_hsa/hsa_ext_amd.h`. Because of a mistake in `CMakeLists.txt`, that 
fallback does not always work: `dynamic_hsa` is not added to the include 
directories in all cases.

https://github.com/llvm/llvm-project/pull/95484
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)

2024-06-13 Thread Thomas Debesse via llvm-branch-commits

illwieckz wrote:

> We made a change recently that made the dynamic_hsa version the default. The 
> error you're seeing is from an old HSA, so if you're overriding the default 
> to use an old library that's probably not worth working around.

The error I see comes from the fact that there is no old HSA around at all that 
could work around what is actually an LLVM bug.

There is no `hsa/hsa.h` in the tree, yet the default `dynamic_hsa` is not used.

The `hsa/hsa.h` file is from ROCm, not from LLVM.

Without this patch, LLVM requires ROCm to be installed and its headers to be on 
the default include path for `src/rtl.cpp` to build whenever `hsa.cpp` is not built.

This patch makes LLVM use `dynamic_hsa` when building `src/rtl.cpp`, since that 
is the default.

This patch is needed to build both `release/17.x` and `release/18.x`; the 
`main` branch changed the code layout, so the patch does not apply there.

I assume a full LLVM build does not trigger the problem because something else 
adds `dynamic_hsa` to the include path and makes it findable by `src/rtl.cpp` by 
luck. But when building only part of LLVM (just what some applications need), 
`dynamic_hsa` is not added to the include directories even though it is 
required by `src/rtl.cpp`.

https://github.com/llvm/llvm-project/pull/95484
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)

2024-06-13 Thread Joseph Huber via llvm-branch-commits

https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/95484
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [libc] 4076c30 - [libc] more fix

2024-06-13 Thread Schrodinger ZHU Yifan via llvm-branch-commits

Author: Schrodinger ZHU Yifan
Date: 2024-06-13T20:22:21-07:00
New Revision: 4076c3004f09e95d1fcd299452843f99235ff422

URL: 
https://github.com/llvm/llvm-project/commit/4076c3004f09e95d1fcd299452843f99235ff422
DIFF: 
https://github.com/llvm/llvm-project/commit/4076c3004f09e95d1fcd299452843f99235ff422.diff

LOG: [libc] more fix

Added: 


Modified: 
libc/cmake/modules/LLVMLibCTestRules.cmake
libc/test/IntegrationTest/CMakeLists.txt
libc/test/IntegrationTest/test.cpp
libc/test/UnitTest/CMakeLists.txt
libc/test/UnitTest/HermeticTestUtils.cpp

Removed: 




diff  --git a/libc/cmake/modules/LLVMLibCTestRules.cmake 
b/libc/cmake/modules/LLVMLibCTestRules.cmake
index eb6be91b55e26..c8d7c8a2b1c7c 100644
--- a/libc/cmake/modules/LLVMLibCTestRules.cmake
+++ b/libc/cmake/modules/LLVMLibCTestRules.cmake
@@ -686,6 +686,15 @@ function(add_libc_hermetic_test test_name)
LibcTest.hermetic
libc.test.UnitTest.ErrnoSetterMatcher
${fq_deps_list})
+  # TODO: currently the dependency chain is broken such that getauxval cannot 
properly
+  # propagate to hermetic tests. This is a temporary workaround.
+  if (LIBC_TARGET_ARCHITECTURE_IS_AARCH64)
+target_link_libraries(
+  ${fq_build_target_name}
+  PRIVATE
+libc.src.sys.auxv.getauxval
+)
+  endif()
 
   # Tests on the GPU require an external loader utility to launch the kernel.
   if(TARGET libc.utils.gpu.loader)

diff  --git a/libc/test/IntegrationTest/CMakeLists.txt 
b/libc/test/IntegrationTest/CMakeLists.txt
index 4f31f10b29f0b..4a999407d48d7 100644
--- a/libc/test/IntegrationTest/CMakeLists.txt
+++ b/libc/test/IntegrationTest/CMakeLists.txt
@@ -1,3 +1,7 @@
+set(arch_specific_deps)
+if(LIBC_TARGET_ARCHITECTURE_IS_AARCH64)
+  set(arch_specific_deps libc.src.sys.auxv.getauxval)
+endif()
 add_object_library(
   test
   SRCS
@@ -8,4 +12,5 @@ add_object_library(
 test.h
   DEPENDS
 libc.src.__support.OSUtil.osutil
+${arch_specific_deps}
 )

diff  --git a/libc/test/IntegrationTest/test.cpp 
b/libc/test/IntegrationTest/test.cpp
index 27e7f29efa0f1..a8b2f2911fd8e 100644
--- a/libc/test/IntegrationTest/test.cpp
+++ b/libc/test/IntegrationTest/test.cpp
@@ -6,6 +6,8 @@
 //
 
//===--===//
 
+#include "src/__support/common.h"
+#include "src/sys/auxv/getauxval.h"
 #include 
 #include 
 
@@ -80,9 +82,11 @@ void *realloc(void *ptr, size_t s) {
 // __dso_handle when -nostdlib is used.
 void *__dso_handle = nullptr;
 
-// On some platform (aarch64 fedora tested) full build integration test
-// objects need to link against libgcc, which may expect a __getauxval
-// function. For now, it is fine to provide a weak definition that always
-// returns false.
-[[gnu::weak]] bool __getauxval(uint64_t, uint64_t *) { return false; }
+#ifdef LIBC_TARGET_ARCH_IS_AARCH64
+// Due to historical reasons, libgcc on aarch64 may expect __getauxval to be
+// defined. See also 
https://gcc.gnu.org/pipermail/gcc-cvs/2020-June/300635.html
+unsigned long __getauxval(unsigned long id) {
+  return LIBC_NAMESPACE::getauxval(id);
+}
+#endif
 } // extern "C"

diff  --git a/libc/test/UnitTest/CMakeLists.txt 
b/libc/test/UnitTest/CMakeLists.txt
index 302af3044ca3d..4adc2f5c725f7 100644
--- a/libc/test/UnitTest/CMakeLists.txt
+++ b/libc/test/UnitTest/CMakeLists.txt
@@ -41,7 +41,7 @@ function(add_unittest_framework_library name)
   target_compile_options(${name}.hermetic PRIVATE ${compile_options})
 
   if(TEST_LIB_DEPENDS)
-foreach(dep IN LISTS ${TEST_LIB_DEPENDS})
+foreach(dep IN ITEMS ${TEST_LIB_DEPENDS})
   if(TARGET ${dep}.unit)
 add_dependencies(${name}.unit ${dep}.unit)
   else()

diff  --git a/libc/test/UnitTest/HermeticTestUtils.cpp 
b/libc/test/UnitTest/HermeticTestUtils.cpp
index 349c182ff2379..6e815e6c8aab0 100644
--- a/libc/test/UnitTest/HermeticTestUtils.cpp
+++ b/libc/test/UnitTest/HermeticTestUtils.cpp
@@ -6,6 +6,8 @@
 //
 
//===--===//
 
+#include "src/__support/common.h"
+#include "src/sys/auxv/getauxval.h"
 #include 
 #include 
 
@@ -19,6 +21,12 @@ void *memmove(void *dst, const void *src, size_t count);
 void *memset(void *ptr, int value, size_t count);
 int atexit(void (*func)(void));
 
+// TODO: It seems that some old test frameworks does not use
+// add_libc_hermetic_test properly. Such that they won't get correct linkage
+// against the object containing this function. We create a dummy function that
+// always returns 0 to indicate a failure.
+[[gnu::weak]] unsigned long getauxval(unsigned long id) { return 0; }
+
 } // namespace LIBC_NAMESPACE
 
 namespace {
@@ -102,6 +110,14 @@ void __cxa_pure_virtual() {
 // __dso_handle when -nostdlib is used.
 void *__dso_handle = nullptr;
 
+#ifdef LIBC_TARGET_ARCH_IS_AARCH64
+// Due to historical reasons, 

[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)

2024-06-13 Thread Joseph Huber via llvm-branch-commits

https://github.com/jhuber6 commented:

We made a change recently that made the dynamic_hsa version the default. The 
error you're seeing is from an old HSA, so if you're overriding the default to 
use an old library that's probably not worth working around.

https://github.com/llvm/llvm-project/pull/95484
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [clang] Define ptrauth_sign_constant builtin. (PR #93904)

2024-06-13 Thread Ahmed Bougacha via llvm-branch-commits


@@ -354,6 +354,23 @@ Given that ``signedPointer`` matches the layout for signed 
pointers signed with
 the given key, extract the raw pointer from it.  This operation does not trap
 and cannot fail, even if the pointer is not validly signed.
 
+``ptrauth_sign_constant``
+^
+
+.. code-block:: c
+
+  ptrauth_sign_constant(pointer, key, discriminator)
+
+Return a signed pointer for a constant address in a manner which guarantees
+a non-attackable sequence.
+
+``pointer`` must be a constant expression of pointer type which evaluates to
+a non-null pointer.  The result will have the same type as ``discriminator``.
+
+Calls to this are constant expressions if the discriminator is a null-pointer
+constant expression or an integer constant expression. Implementations may
+allow other pointer expressions as well.

ahmedbougacha wrote:

Yeah, I agree today this could simply be "it's always a constant expression";  
I'll rewrite it (cc @rjmccall if this looks like anything to you)
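
For readers skimming the review, here is a minimal usage sketch of the documented 
form quoted above; the key name and the discriminator value are illustrative 
assumptions (not taken from this PR), and it only compiles for a target with 
pointer-authentication support:

```cpp
// Illustrative sketch: ptrauth_key_function_pointer and the discriminator 42
// are assumed example values, not mandated by the documentation above.
#include <ptrauth.h>

extern "C" int handler(int);

// Per the quoted documentation, the call is a constant expression when the
// discriminator is an integer constant expression, so it can initialize a
// global without any runtime signing code.
int (*const SignedHandler)(int) =
    ptrauth_sign_constant(&handler, ptrauth_key_function_pointer, 42);
```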

https://github.com/llvm/llvm-project/pull/93904
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [clang] Define ptrauth_sign_constant builtin. (PR #93904)

2024-06-13 Thread Ahmed Bougacha via llvm-branch-commits


@@ -354,6 +354,23 @@ Given that ``signedPointer`` matches the layout for signed 
pointers signed with
 the given key, extract the raw pointer from it.  This operation does not trap
 and cannot fail, even if the pointer is not validly signed.
 
+``ptrauth_sign_constant``
+^
+
+.. code-block:: c
+
+  ptrauth_sign_constant(pointer, key, discriminator)
+
+Return a signed pointer for a constant address in a manner which guarantees
+a non-attackable sequence.

ahmedbougacha wrote:

Later additions to this document describe that in depth; you can look for
> [clang][docs] Document the ptrauth security model.

on my branch

https://github.com/llvm/llvm-project/pull/93904
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [clang] Define ptrauth_sign_constant builtin. (PR #93904)

2024-06-13 Thread Ahmed Bougacha via llvm-branch-commits


@@ -58,6 +58,35 @@ void test_string_discriminator(const char *str) {
 }
 
 
+void test_sign_constant(int *dp, int (*fp)(int)) {
+  __builtin_ptrauth_sign_constant(, VALID_DATA_KEY); // expected-error 
{{too few arguments}}
+  __builtin_ptrauth_sign_constant(, VALID_DATA_KEY, , ); // 
expected-error {{too many arguments}}
+
+  __builtin_ptrauth_sign_constant(mismatched_type, VALID_DATA_KEY, 0); // 
expected-error {{signed value must have pointer type; type here is 'struct A'}}
+  __builtin_ptrauth_sign_constant(, mismatched_type, 0); // expected-error 
{{passing 'struct A' to parameter of incompatible type 'int'}}
+  __builtin_ptrauth_sign_constant(, VALID_DATA_KEY, mismatched_type); // 
expected-error {{extra discriminator must have pointer or integer type; type 
here is 'struct A'}}
+
+  (void) __builtin_ptrauth_sign_constant(NULL, VALID_DATA_KEY, ); // 
expected-error {{argument to ptrauth_sign_constant must refer to a global 
variable or function}}

ahmedbougacha wrote:

We could special-case null pointers, but they're already covered by the 
diagnostic, which asks for global variables or functions – which NULL isn't.  
For auth/sign, we don't have that sort of constraint on the pointer: it really 
is NULL and NULL alone that's special.

Now, the more interesting question is whether we should allow null pointers at 
all here.  Since defining these original builtins we have taught the qualifier 
to have a mode that signs/authenticates null, for some specific use-cases where 
replacing a signed value with NULL (which is otherwise never signed or 
authenticated) would bypass signing in a problematic way.
We haven't had the chance or need to revisit the builtins to allow sign/auth of 
NULL, but it's reasonable to add that support in the future.  We'd have to 
consider how to expose that in the builtins, because it's probably still 
something that's almost always a mistake;  more builtins would be an easy 
solution but maybe not a sophisticated one.

https://github.com/llvm/llvm-project/pull/93904
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [clang] Define ptrauth_sign_constant builtin. (PR #93904)

2024-06-13 Thread Ahmed Bougacha via llvm-branch-commits


@@ -2061,6 +2071,58 @@ ConstantLValueEmitter::VisitCallExpr(const CallExpr *E) {
   }
 }
 
+ConstantLValue
+ConstantLValueEmitter::emitPointerAuthSignConstant(const CallExpr *E) {
+  llvm::Constant *UnsignedPointer = emitPointerAuthPointer(E->getArg(0));
+  unsigned Key = emitPointerAuthKey(E->getArg(1));
+  llvm::Constant *StorageAddress;
+  llvm::Constant *OtherDiscriminator;
+  std::tie(StorageAddress, OtherDiscriminator) =

ahmedbougacha wrote:

Yeah, this simply predates structured bindings;  we can indeed use them now.
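
For context, a generic sketch of the two forms with placeholder types and a 
hypothetical helper (the quoted hunk cuts off before the actual right-hand side):

```cpp
#include <tuple>
#include <utility>

// Hypothetical helper standing in for the call whose result gets unpacked.
std::pair<int, int> computeStorageAndDiscriminator();

void beforeCxx17() {
  // Pre-C++17 pattern from the quoted hunk: declare first, then std::tie.
  int StorageAddress, OtherDiscriminator;
  std::tie(StorageAddress, OtherDiscriminator) = computeStorageAndDiscriminator();
}

void withStructuredBindings() {
  // C++17 structured bindings: declare and unpack in one step.
  auto [StorageAddress, OtherDiscriminator] = computeStorageAndDiscriminator();
}
```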

https://github.com/llvm/llvm-project/pull/93904
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)

2024-06-13 Thread Thomas Debesse via llvm-branch-commits

illwieckz wrote:

@pranav-sivaraman try this patch:

```diff
diff --git a/openmp/libomptarget/plugins/amdgpu/CMakeLists.txt 
b/openmp/libomptarget/plugins/amdgpu/CMakeLists.txt
index 92523c23f68b..92bcd94edb7a 100644
--- a/openmp/libomptarget/plugins/amdgpu/CMakeLists.txt
+++ b/openmp/libomptarget/plugins/amdgpu/CMakeLists.txt
@@ -56,13 +56,14 @@ include_directories(
 set(LIBOMPTARGET_DLOPEN_LIBHSA OFF)
 option(LIBOMPTARGET_FORCE_DLOPEN_LIBHSA "Build with dlopened libhsa" 
${LIBOMPTARGET_DLOPEN_LIBHSA})
 
+include_directories(dynamic_hsa)
+
 if (${hsa-runtime64_FOUND} AND NOT LIBOMPTARGET_FORCE_DLOPEN_LIBHSA)
   libomptarget_say("Building AMDGPU plugin linked against libhsa")
   set(LIBOMPTARGET_EXTRA_SOURCE)
   set(LIBOMPTARGET_DEP_LIBRARIES hsa-runtime64::hsa-runtime64)
 else()
   libomptarget_say("Building AMDGPU plugin for dlopened libhsa")
-  include_directories(dynamic_hsa)
   set(LIBOMPTARGET_EXTRA_SOURCE dynamic_hsa/hsa.cpp)
   set(LIBOMPTARGET_DEP_LIBRARIES)
 endif()
```

I haven't tested it, but maybe the mistake is similar.

https://github.com/llvm/llvm-project/pull/95484
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)

2024-06-13 Thread Thomas Debesse via llvm-branch-commits

illwieckz wrote:

The 14 branch seems to be very old; notably, the file you link is in the 
`plugins/` directory, while the files I modify are in the `plugins-nextgen/` 
directory, with the `plugins/` directory not existing anymore. So I strongly 
doubt this patch is useful for LLVM 14, but your problem probably needs a 
different but similar solution.

https://github.com/llvm/llvm-project/pull/95484
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] Using matched block counts to measure discrepancy (PR #95486)

2024-06-13 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-transforms

Author: shaw young (shawbyoung)


Changes



Test Plan: tbd


---
Full diff: https://github.com/llvm/llvm-project/pull/95486.diff


2 Files Affected:

- (modified) bolt/lib/Profile/StaleProfileMatching.cpp (+29-8) 
- (modified) llvm/include/llvm/Transforms/Utils/SampleProfileInference.h (-2) 


``diff
diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp 
b/bolt/lib/Profile/StaleProfileMatching.cpp
index 6588cf2c0ce66..cbd98f4d4769f 100644
--- a/bolt/lib/Profile/StaleProfileMatching.cpp
+++ b/bolt/lib/Profile/StaleProfileMatching.cpp
@@ -53,9 +53,9 @@ cl::opt
 
 cl::opt MatchedProfileThreshold(
 "matched-profile-threshold",
-cl::desc("Percentage threshold of matched execution counts at which stale "
+cl::desc("Percentage threshold of matched basic blocks at which stale "
  "profile inference is executed."),
-cl::init(5), cl::Hidden, cl::cat(BoltOptCategory));
+cl::init(50), cl::Hidden, cl::cat(BoltOptCategory));
 
 cl::opt StaleMatchingMaxFuncSize(
 "stale-matching-max-func-size",
@@ -186,6 +186,17 @@ struct BlendedBlockHash {
   uint8_t SuccHash{0};
 };
 
+/// A data object containing function matching information.
+struct FunctionMatchingData {
+public:
+  /// The number of blocks matched exactly.
+  uint64_t MatchedExactBlocks{0};
+  /// The number of blocks matched loosely.
+  uint64_t MatchedLooseBlocks{0};
+  /// The number of execution counts matched.
+  uint64_t MatchedExecCounts{0};
+};
+
 /// The object is used to identify and match basic blocks in a BinaryFunction
 /// given their hashes computed on a binary built from several revisions behind
 /// release.
@@ -400,7 +411,8 @@ createFlowFunction(const 
BinaryFunction::BasicBlockOrderType ) {
 void matchWeightsByHashes(BinaryContext ,
   const BinaryFunction::BasicBlockOrderType 
,
   const yaml::bolt::BinaryFunctionProfile ,
-  FlowFunction ) {
+  FlowFunction ,
+  FunctionMatchingData ) {
   assert(Func.Blocks.size() == BlockOrder.size() + 1);
 
   std::vector Blocks;
@@ -440,9 +452,11 @@ void matchWeightsByHashes(BinaryContext ,
   if (Matcher.isHighConfidenceMatch(BinHash, YamlHash)) {
 ++BC.Stats.NumMatchedBlocks;
 BC.Stats.MatchedSampleCount += YamlBB.ExecCount;
-Func.MatchedExecCount += YamlBB.ExecCount;
+FuncMatchingData.MatchedExecCounts += YamlBB.ExecCount;
+FuncMatchingData.MatchedExactBlocks += 1;
 LLVM_DEBUG(dbgs() << "  exact match\n");
   } else {
+FuncMatchingData.MatchedLooseBlocks += 1;
 LLVM_DEBUG(dbgs() << "  loose match\n");
   }
   if (YamlBB.NumInstructions == BB->size())
@@ -582,11 +596,14 @@ void preprocessUnreachableBlocks(FlowFunction ) {
 /// Decide if stale profile matching can be applied for a given function.
 /// Currently we skip inference for (very) large instances and for instances
 /// having "unexpected" control flow (e.g., having no sink basic blocks).
-bool canApplyInference(const FlowFunction , const 
yaml::bolt::BinaryFunctionProfile ) {
+bool canApplyInference(const FlowFunction ,
+   const yaml::bolt::BinaryFunctionProfile ,
+   const FunctionMatchingData ) {
   if (Func.Blocks.size() > opts::StaleMatchingMaxFuncSize)
 return false;
 
-  if (Func.MatchedExecCount / YamlBF.ExecCount >= 
opts::MatchedProfileThreshold)
+  if ((double)FuncMatchingData.MatchedExactBlocks / YamlBF.Blocks.size() >=
+  opts::MatchedProfileThreshold / 100.0)
 return false;
 
   bool HasExitBlocks = llvm::any_of(
@@ -735,18 +752,22 @@ bool YAMLProfileReader::inferStaleProfile(
   const BinaryFunction::BasicBlockOrderType BlockOrder(
   BF.getLayout().block_begin(), BF.getLayout().block_end());
 
+  // Create a containter for function matching data.
+  FunctionMatchingData FuncMatchingData;
+
   // Create a wrapper flow function to use with the profile inference 
algorithm.
   FlowFunction Func = createFlowFunction(BlockOrder);
 
   // Match as many block/jump counts from the stale profile as possible
-  matchWeightsByHashes(BF.getBinaryContext(), BlockOrder, YamlBF, Func);
+  matchWeightsByHashes(BF.getBinaryContext(), BlockOrder, YamlBF, Func,
+   FuncMatchingData);
 
   // Adjust the flow function by marking unreachable blocks Unlikely so that
   // they don't get any counts assigned.
   preprocessUnreachableBlocks(Func);
 
   // Check if profile inference can be applied for the instance.
-  if (!canApplyInference(Func, YamlBF))
+  if (!canApplyInference(Func, YamlBF, FuncMatchingData))
 return false;
 
   // Apply the profile inference algorithm.
diff --git a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h 
b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
index e7971ca1cb428..b4ea1ad840f9d 100644
--- 

[llvm-branch-commits] Using matched block counts to measure discrepancy (PR #95486)

2024-06-13 Thread shaw young via llvm-branch-commits

https://github.com/shawbyoung closed 
https://github.com/llvm/llvm-project/pull/95486
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] Using matched block counts to measure discrepancy (PR #95486)

2024-06-13 Thread shaw young via llvm-branch-commits

https://github.com/shawbyoung created 
https://github.com/llvm/llvm-project/pull/95486



Test Plan: tbd



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)

2024-06-13 Thread Pranav Sivaraman via llvm-branch-commits

pranav-sivaraman wrote:

This is different from this 
[file](https://github.com/llvm/llvm-project/blob/release/14.x/openmp/libomptarget/plugins/amdgpu/impl/hsa_api.h), 
right? I'm trying to fix an issue when building LLVM 14 with newer ROCm 
releases, where the build fails to find the newer `hsa/hsa.h` headers. Not sure 
if I need to extend the patch to include these changes as well.

https://github.com/llvm/llvm-project/pull/95484
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)

2024-06-13 Thread Thomas Debesse via llvm-branch-commits

illwieckz wrote:

I first noticed the issue when building the chipStar fork of LLVM 17: 
https://github.com/CHIP-SPV/llvm-project (branch `chipStar-llvm-17`), but since 
the code is the same in LLVM 18, it is expected to fail there too.

The whole folder disappeared in `main`, so I made this patch target the most 
recent release branch that still has those files: LLVM 18.

It would be good to backport it to LLVM 17 too.

I haven't checked yet whether versions older than LLVM 17 are affected.

https://github.com/llvm/llvm-project/pull/95484
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)

2024-06-13 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Thomas Debesse (illwieckz)


Changes

The `dynamic_hsa/` include directory is required by both optional 
`dynamic_hsa/hsa.cpp` and non-optional `src/rtl.cpp`.

It should therefore always be included, or the build will fail if only 
`src/rtl.cpp` is built.

This also simplifies the way header files from `dynamic_hsa/` are included in 
`src/rtl.cpp`.

Fixes:

```
error: ‘HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY’ was not declared in this scope
```

---
Full diff: https://github.com/llvm/llvm-project/pull/95484.diff


2 Files Affected:

- (modified) openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt (+3-1) 
- (modified) openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp (-10) 


``diff
diff --git a/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt 
b/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt
index 68ce63467a6c8..42cc560c79112 100644
--- a/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt
+++ b/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt
@@ -38,13 +38,15 @@ add_definitions(-DDEBUG_PREFIX="TARGET AMDGPU RTL")
 set(LIBOMPTARGET_DLOPEN_LIBHSA OFF)
 option(LIBOMPTARGET_FORCE_DLOPEN_LIBHSA "Build with dlopened libhsa" 
${LIBOMPTARGET_DLOPEN_LIBHSA})
 
+# Required by both optional dynamic_hsa/hsa.cpp and non-optional src/rtl.cpp.
+include_directories(dynamic_hsa)
+
 if (${hsa-runtime64_FOUND} AND NOT LIBOMPTARGET_FORCE_DLOPEN_LIBHSA)
   libomptarget_say("Building AMDGPU NextGen plugin linked against libhsa")
   set(LIBOMPTARGET_EXTRA_SOURCE)
   set(LIBOMPTARGET_DEP_LIBRARIES hsa-runtime64::hsa-runtime64)
 else()
   libomptarget_say("Building AMDGPU NextGen plugin for dlopened libhsa")
-  include_directories(dynamic_hsa)
   set(LIBOMPTARGET_EXTRA_SOURCE dynamic_hsa/hsa.cpp)
   set(LIBOMPTARGET_DEP_LIBRARIES)
 endif()
diff --git a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp 
b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
index 81634ae1edc49..8cedc72d5f63c 100644
--- a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -56,18 +56,8 @@
 #define BIGENDIAN_CPU
 #endif
 
-#if defined(__has_include)
-#if __has_include("hsa/hsa.h")
-#include "hsa/hsa.h"
-#include "hsa/hsa_ext_amd.h"
-#elif __has_include("hsa.h")
 #include "hsa.h"
 #include "hsa_ext_amd.h"
-#endif
-#else
-#include "hsa/hsa.h"
-#include "hsa/hsa_ext_amd.h"
-#endif
 
 namespace llvm {
 namespace omp {

``




https://github.com/llvm/llvm-project/pull/95484
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [openmp] release/18.x: [OpenMP][OMPT] Fix hsa include when building amdgpu/src/rtl.cpp (PR #95484)

2024-06-13 Thread Thomas Debesse via llvm-branch-commits

https://github.com/illwieckz created 
https://github.com/llvm/llvm-project/pull/95484

The `dynamic_hsa/` include directory is required by both optional 
`dynamic_hsa/hsa.cpp` and non-optional `src/rtl.cpp`.

It should therefore always be included, or the build will fail if only 
`src/rtl.cpp` is built.

This also simplifies the way header files from `dynamic_hsa/` are included in 
`src/rtl.cpp`.

Fixes:

```
error: ‘HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY’ was not declared in this scope
```

>From e84e8bdef6d902d51a72eb93f7ca9812f0467c72 Mon Sep 17 00:00:00 2001
From: Thomas Debesse 
Date: Fri, 14 Jun 2024 00:38:25 +0200
Subject: [PATCH] release/18.x: [OpenMP][OMPT] Fix hsa include when building
 amdgpu/src/rtl.cpp
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The dynamic_hsa/ include directory is required by both
optional dynamic_hsa/hsa.cpp and non-optional src/rtl.cpp.
It should then always be included or the build will fail
if only src/rtl.cpp is built.

This also simplifies the way header files from dynamic_hsa/
are included in src/rtl.cpp.

Fixes:

  error: ‘HSA_AMD_AGENT_INFO_TIMESTAMP_FREQUENCY’ was not declared in this scope
---
 .../libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt |  4 +++-
 openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp | 10 --
 2 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt 
b/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt
index 68ce63467a6c8..42cc560c79112 100644
--- a/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt
+++ b/openmp/libomptarget/plugins-nextgen/amdgpu/CMakeLists.txt
@@ -38,13 +38,15 @@ add_definitions(-DDEBUG_PREFIX="TARGET AMDGPU RTL")
 set(LIBOMPTARGET_DLOPEN_LIBHSA OFF)
 option(LIBOMPTARGET_FORCE_DLOPEN_LIBHSA "Build with dlopened libhsa" 
${LIBOMPTARGET_DLOPEN_LIBHSA})
 
+# Required by both optional dynamic_hsa/hsa.cpp and non-optional src/rtl.cpp.
+include_directories(dynamic_hsa)
+
 if (${hsa-runtime64_FOUND} AND NOT LIBOMPTARGET_FORCE_DLOPEN_LIBHSA)
   libomptarget_say("Building AMDGPU NextGen plugin linked against libhsa")
   set(LIBOMPTARGET_EXTRA_SOURCE)
   set(LIBOMPTARGET_DEP_LIBRARIES hsa-runtime64::hsa-runtime64)
 else()
   libomptarget_say("Building AMDGPU NextGen plugin for dlopened libhsa")
-  include_directories(dynamic_hsa)
   set(LIBOMPTARGET_EXTRA_SOURCE dynamic_hsa/hsa.cpp)
   set(LIBOMPTARGET_DEP_LIBRARIES)
 endif()
diff --git a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp 
b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
index 81634ae1edc49..8cedc72d5f63c 100644
--- a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -56,18 +56,8 @@
 #define BIGENDIAN_CPU
 #endif
 
-#if defined(__has_include)
-#if __has_include("hsa/hsa.h")
-#include "hsa/hsa.h"
-#include "hsa/hsa_ext_amd.h"
-#elif __has_include("hsa.h")
 #include "hsa.h"
 #include "hsa_ext_amd.h"
-#endif
-#else
-#include "hsa/hsa.h"
-#include "hsa/hsa_ext_amd.h"
-#endif
 
 namespace llvm {
 namespace omp {

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [Support] Integrate SipHash.cpp into libSupport. (PR #94394)

2024-06-13 Thread Ahmed Bougacha via llvm-branch-commits

https://github.com/ahmedbougacha updated 
https://github.com/llvm/llvm-project/pull/94394

>From 1e9a3fde97d907c3cd6be33db91d1c18c7236ffb Mon Sep 17 00:00:00 2001
From: Ahmed Bougacha 
Date: Tue, 4 Jun 2024 12:41:47 -0700
Subject: [PATCH 1/7] [Support] Reformat SipHash.cpp to match libSupport.

While there, give it our usual file header and an acknowledgement,
and remove the imported README.md.SipHash.
---
 llvm/lib/Support/README.md.SipHash | 126 --
 llvm/lib/Support/SipHash.cpp   | 264 ++---
 2 files changed, 129 insertions(+), 261 deletions(-)
 delete mode 100644 llvm/lib/Support/README.md.SipHash

diff --git a/llvm/lib/Support/README.md.SipHash 
b/llvm/lib/Support/README.md.SipHash
deleted file mode 100644
index 4de3cd1854681..0
--- a/llvm/lib/Support/README.md.SipHash
+++ /dev/null
@@ -1,126 +0,0 @@
-# SipHash
-
-[![License:
-CC0-1.0](https://licensebuttons.net/l/zero/1.0/80x15.png)](http://creativecommons.org/publicdomain/zero/1.0/)
-
-[![License: 
MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-
-
-SipHash is a family of pseudorandom functions (PRFs) optimized for speed on 
short messages.
-This is the reference C code of SipHash: portable, simple, optimized for 
clarity and debugging.
-
-SipHash was designed in 2012 by [Jean-Philippe Aumasson](https://aumasson.jp)
-and [Daniel J. Bernstein](https://cr.yp.to) as a defense against [hash-flooding
-DoS attacks](https://aumasson.jp/siphash/siphashdos_29c3_slides.pdf).
-
-SipHash is:
-
-* *Simpler and faster* on short messages than previous cryptographic
-algorithms, such as MACs based on universal hashing.
-
-* *Competitive in performance* with insecure non-cryptographic algorithms, 
such as [fhhash](https://github.com/cbreeden/fxhash).
-
-* *Cryptographically secure*, with no sign of weakness despite multiple 
[cryptanalysis](https://eprint.iacr.org/2019/865) 
[projects](https://eprint.iacr.org/2019/865) by leading cryptographers.
-
-* *Battle-tested*, with successful integration in OSs (Linux kernel, OpenBSD,
-FreeBSD, FreeRTOS), languages (Perl, Python, Ruby, etc.), libraries (OpenSSL 
libcrypto,
-Sodium, etc.) and applications (Wireguard, Redis, etc.).
-
-As a secure pseudorandom function (a.k.a. keyed hash function), SipHash can 
also be used as a secure message authentication code (MAC).
-But SipHash is *not a hash* in the sense of general-purpose key-less hash 
function such as BLAKE3 or SHA-3.
-SipHash should therefore always be used with a secret key in order to be 
secure.
-
-
-## Variants
-
-The default SipHash is *SipHash-2-4*: it takes a 128-bit key, does 2 
compression
-rounds, 4 finalization rounds, and returns a 64-bit tag.
-
-Variants can use a different number of rounds. For example, we proposed 
*SipHash-4-8* as a conservative version.
-
-The following versions are not described in the paper but were designed and 
analyzed to fulfill applications' needs:
-
-* *SipHash-128* returns a 128-bit tag instead of 64-bit. Versions with 
specified number of rounds are SipHash-2-4-128, SipHash4-8-128, and so on.
-
-* *HalfSipHash* works with 32-bit words instead of 64-bit, takes a 64-bit key,
-and returns 32-bit or 64-bit tags. For example, HalfSipHash-2-4-32 has 2
-compression rounds, 4 finalization rounds, and returns a 32-bit tag.
-
-
-## Security
-
-(Half)SipHash-*c*-*d* with *c* ≥ 2 and *d* ≥ 4 is expected to provide the 
maximum PRF
-security for any function with the same key and output size.
-
-The standard PRF security goal allow the attacker access to the output of 
SipHash on messages chosen adaptively by the attacker.
-
-Security is limited by the key size (128 bits for SipHash), such that
-attackers searching 2*s* keys have chance 2*s*−128 of 
finding
-the SipHash key. 
-Security is also limited by the output size. In particular, when
-SipHash is used as a MAC, an attacker who blindly tries 2*s* tags 
will
-succeed with probability 2*s*-*t*, if *t* is that tag's bit size.
-
-
-## Research
-
-* [Research paper](https://www.aumasson.jp/siphash/siphash.pdf) "SipHash: a 
fast short-input PRF" (accepted at INDOCRYPT 2012)
-* [Slides](https://cr.yp.to/talks/2012.12.12/slides.pdf) of the presentation 
of SipHash at INDOCRYPT 2012 (Bernstein)
-* [Slides](https://www.aumasson.jp/siphash/siphash_slides.pdf) of the 
presentation of SipHash at the DIAC workshop (Aumasson)
-
-
-## Usage
-
-Running
-
-```sh
-  make
-```
-
-will build tests for 
-
-* SipHash-2-4-64
-* SipHash-2-4-128
-* HalfSipHash-2-4-32
-* HalfSipHash-2-4-64
-
-
-```C
-  ./test
-```
-
-verifies 64 test vectors, and
-
-```C
-  ./debug
-```
-
-does the same and prints intermediate values.
-
-The code can be adapted to implement SipHash-*c*-*d*, the version of SipHash
-with *c* compression rounds and *d* finalization rounds, by defining `cROUNDS`
-or `dROUNDS` when compiling.  This can be done with `-D` command line arguments
-to many compilers such as below.
-
-```sh
-gcc -Wall 

[llvm-branch-commits] [llvm] [Support] Integrate SipHash.cpp into libSupport. (PR #94394)

2024-06-13 Thread Ahmed Bougacha via llvm-branch-commits

ahmedbougacha wrote:

[37c84b9](https://github.com/llvm/llvm-project/pull/94394/commits/37c84b9dce70f40db8a7c27b7de8232c4d10f78f)
 shows what I had in mind, let me know what you all think.  I added:
```
void getSipHash_2_4_64(const uint8_t *In, uint64_t InLen,
   const uint8_t ()[16], uint8_t ()[8]);

void getSipHash_2_4_128(const uint8_t *In, uint64_t InLen,
const uint8_t ()[16], uint8_t ()[16]);
```
as the core interfaces, and mimicked the ref. test harness to reuse the same 
test vectors.  If this seems reasonable to yall I'm happy to extract the 
vectors.h file from the ref. implementation into the "Import original sources" 
PR – that's why I kept it open ;)

https://github.com/llvm/llvm-project/pull/94394
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [Support] Integrate SipHash.cpp into libSupport. (PR #94394)

2024-06-13 Thread Ahmed Bougacha via llvm-branch-commits

https://github.com/ahmedbougacha updated 
https://github.com/llvm/llvm-project/pull/94394

>From 1e9a3fde97d907c3cd6be33db91d1c18c7236ffb Mon Sep 17 00:00:00 2001
From: Ahmed Bougacha 
Date: Tue, 4 Jun 2024 12:41:47 -0700
Subject: [PATCH 1/6] [Support] Reformat SipHash.cpp to match libSupport.

While there, give it our usual file header and an acknowledgement,
and remove the imported README.md.SipHash.
---
 llvm/lib/Support/README.md.SipHash | 126 --
 llvm/lib/Support/SipHash.cpp   | 264 ++---
 2 files changed, 129 insertions(+), 261 deletions(-)
 delete mode 100644 llvm/lib/Support/README.md.SipHash

diff --git a/llvm/lib/Support/README.md.SipHash 
b/llvm/lib/Support/README.md.SipHash
deleted file mode 100644
index 4de3cd1854681..0
--- a/llvm/lib/Support/README.md.SipHash
+++ /dev/null
@@ -1,126 +0,0 @@
-# SipHash
-
-[![License:
-CC0-1.0](https://licensebuttons.net/l/zero/1.0/80x15.png)](http://creativecommons.org/publicdomain/zero/1.0/)
-
-[![License: 
MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-
-
-SipHash is a family of pseudorandom functions (PRFs) optimized for speed on 
short messages.
-This is the reference C code of SipHash: portable, simple, optimized for 
clarity and debugging.
-
-SipHash was designed in 2012 by [Jean-Philippe Aumasson](https://aumasson.jp)
-and [Daniel J. Bernstein](https://cr.yp.to) as a defense against [hash-flooding
-DoS attacks](https://aumasson.jp/siphash/siphashdos_29c3_slides.pdf).
-
-SipHash is:
-
-* *Simpler and faster* on short messages than previous cryptographic
-algorithms, such as MACs based on universal hashing.
-
-* *Competitive in performance* with insecure non-cryptographic algorithms, 
such as [fhhash](https://github.com/cbreeden/fxhash).
-
-* *Cryptographically secure*, with no sign of weakness despite multiple 
[cryptanalysis](https://eprint.iacr.org/2019/865) 
[projects](https://eprint.iacr.org/2019/865) by leading cryptographers.
-
-* *Battle-tested*, with successful integration in OSs (Linux kernel, OpenBSD,
-FreeBSD, FreeRTOS), languages (Perl, Python, Ruby, etc.), libraries (OpenSSL 
libcrypto,
-Sodium, etc.) and applications (Wireguard, Redis, etc.).
-
-As a secure pseudorandom function (a.k.a. keyed hash function), SipHash can 
also be used as a secure message authentication code (MAC).
-But SipHash is *not a hash* in the sense of general-purpose key-less hash 
function such as BLAKE3 or SHA-3.
-SipHash should therefore always be used with a secret key in order to be 
secure.
-
-
-## Variants
-
-The default SipHash is *SipHash-2-4*: it takes a 128-bit key, does 2 
compression
-rounds, 4 finalization rounds, and returns a 64-bit tag.
-
-Variants can use a different number of rounds. For example, we proposed 
*SipHash-4-8* as a conservative version.
-
-The following versions are not described in the paper but were designed and 
analyzed to fulfill applications' needs:
-
-* *SipHash-128* returns a 128-bit tag instead of 64-bit. Versions with 
specified number of rounds are SipHash-2-4-128, SipHash4-8-128, and so on.
-
-* *HalfSipHash* works with 32-bit words instead of 64-bit, takes a 64-bit key,
-and returns 32-bit or 64-bit tags. For example, HalfSipHash-2-4-32 has 2
-compression rounds, 4 finalization rounds, and returns a 32-bit tag.
-
-
-## Security
-
-(Half)SipHash-*c*-*d* with *c* ≥ 2 and *d* ≥ 4 is expected to provide the 
maximum PRF
-security for any function with the same key and output size.
-
-The standard PRF security goal allow the attacker access to the output of 
SipHash on messages chosen adaptively by the attacker.
-
-Security is limited by the key size (128 bits for SipHash), such that
-attackers searching 2*s* keys have chance 2*s*−128 of 
finding
-the SipHash key. 
-Security is also limited by the output size. In particular, when
-SipHash is used as a MAC, an attacker who blindly tries 2*s* tags 
will
-succeed with probability 2*s*-*t*, if *t* is that tag's bit size.
-
-
-## Research
-
-* [Research paper](https://www.aumasson.jp/siphash/siphash.pdf) "SipHash: a 
fast short-input PRF" (accepted at INDOCRYPT 2012)
-* [Slides](https://cr.yp.to/talks/2012.12.12/slides.pdf) of the presentation 
of SipHash at INDOCRYPT 2012 (Bernstein)
-* [Slides](https://www.aumasson.jp/siphash/siphash_slides.pdf) of the 
presentation of SipHash at the DIAC workshop (Aumasson)
-
-
-## Usage
-
-Running
-
-```sh
-  make
-```
-
-will build tests for 
-
-* SipHash-2-4-64
-* SipHash-2-4-128
-* HalfSipHash-2-4-32
-* HalfSipHash-2-4-64
-
-
-```C
-  ./test
-```
-
-verifies 64 test vectors, and
-
-```C
-  ./debug
-```
-
-does the same and prints intermediate values.
-
-The code can be adapted to implement SipHash-*c*-*d*, the version of SipHash
-with *c* compression rounds and *d* finalization rounds, by defining `cROUNDS`
-or `dROUNDS` when compiling.  This can be done with `-D` command line arguments
-to many compilers such as below.
-
-```sh
-gcc -Wall 

[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)

2024-06-13 Thread Krzysztof Drewniak via llvm-branch-commits

krzysz00 wrote:

On the other hand, it's a lot easier to handle ugly types down in instruction 
selection, where you get to play much more fast and loose with types.

And there are buffer uses that don't fit into the fat pointer use case 
where we'd still want them to work.
For example, both `struct.ptr.buffer.load.v6f16` and `struct.ptr.buffer.load.v3f32` should be a 
`buffer_load_dwordx3`, but I'm pretty sure 6 x half isn't a register type.

The load and store intrinsics are already overloaded to handle various {8, 16, 
..., 128}-bit types, and it seems much cleaner to let it support any type of 
those lengths. It's just a load/store with somewhat weird indexing semantics, 
is all.

And then, since we're there ... `load i256, ptr addrspace(1) %p` legalizes to 
multiple instructions, and `{raw,struct}.ptr.buffer.load(ptr addrspace(8) %p, 
i32 %offset, ...)` should too. It's just a load, after all.

https://github.com/llvm/llvm-project/pull/95379
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [flang] Lower REDUCE intrinsic for reduction op with args by value (PR #95353)

2024-06-13 Thread Valentin Clement バレンタイン クレメン via llvm-branch-commits


@@ -5745,6 +5745,14 @@ IntrinsicLibrary::genReduce(mlir::Type resultType,
   int rank = arrayTmp.rank();
   assert(rank >= 1);
 
+  // Arguements to the reduction operation are passed by reference or value?
+  bool argByRef = true;
+  if (auto embox =
+  mlir::dyn_cast_or_null(operation.getDefiningOp())) 
{

clementval wrote:

> Does REDUCE works with dummy procedure and procedure pointers? If so it would 
> be good to add tests for those cases to ensure the pattern matching here 
> works with them.

I'll check if this is supported and add proper test if it is. 

https://github.com/llvm/llvm-project/pull/95353
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [lld] [llvm] release/18.x: [lld] Fix -ObjC load behavior with LTO (#92162) (PR #92478)

2024-06-13 Thread via llvm-branch-commits

https://github.com/AtariDreams reopened 
https://github.com/llvm/llvm-project/pull/92478
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/18.x: [SystemZ] Bugfix in getDemandedSrcElements(). (#88623) (PR #95463)

2024-06-13 Thread via llvm-branch-commits

llvmbot wrote:

@uweigand What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/95463
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/18.x: [SystemZ] Bugfix in getDemandedSrcElements(). (#88623) (PR #95463)

2024-06-13 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/95463
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/18.x: [SystemZ] Bugfix in getDemandedSrcElements(). (#88623) (PR #95463)

2024-06-13 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/95463

Backport 7e4c6e98fa05f5c3bf14f96365ae74a8d12c6257

Requested by: @nikic

>From 016c200faf4bcf1a531dabd4411a2ec4d0a23068 Mon Sep 17 00:00:00 2001
From: Jonas Paulsson 
Date: Mon, 15 Apr 2024 16:32:14 +0200
Subject: [PATCH] [SystemZ] Bugfix in getDemandedSrcElements(). (#88623)

For the intrinsic s390_vperm, all of the elements are demanded, so use
an APInt with the value of '-1' for them (not '1').

Fixes https://github.com/llvm/llvm-project/issues/88397

(cherry picked from commit 7e4c6e98fa05f5c3bf14f96365ae74a8d12c6257)
---
 .../Target/SystemZ/SystemZISelLowering.cpp|  2 +-
 .../SystemZ/knownbits-intrinsics-binop.ll | 19 +++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp 
b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
index 5e0b0594b0a42..3a297238c2088 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -7774,7 +7774,7 @@ static APInt getDemandedSrcElements(SDValue Op, const 
APInt ,
   break;
 }
 case Intrinsic::s390_vperm:
-  SrcDemE = APInt(NumElts, 1);
+  SrcDemE = APInt(NumElts, -1);
   break;
 default:
   llvm_unreachable("Unhandled intrinsic.");
diff --git a/llvm/test/CodeGen/SystemZ/knownbits-intrinsics-binop.ll 
b/llvm/test/CodeGen/SystemZ/knownbits-intrinsics-binop.ll
index 3bcbbb45581f9..b855d01934782 100644
--- a/llvm/test/CodeGen/SystemZ/knownbits-intrinsics-binop.ll
+++ b/llvm/test/CodeGen/SystemZ/knownbits-intrinsics-binop.ll
@@ -458,3 +458,22 @@ define <16 x i8> @f30() {
i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
   ret <16 x i8> %res
 }
+
+; Test VPERM with various constant operands.
+define i32 @f31() {
+; CHECK-LABEL: f31:
+; CHECK-LABEL: # %bb.0:
+; CHECK-NEXT: larl %r1, .LCPI31_0
+; CHECK-NEXT: vl %v0, 0(%r1), 3
+; CHECK-NEXT: larl %r1, .LCPI31_1
+; CHECK-NEXT: vl %v1, 0(%r1), 3
+; CHECK-NEXT: vperm %v0, %v1, %v1, %v0
+; CHECK-NEXT: vlgvb %r2, %v0, 0
+; CHECK-NEXT: nilf %r2, 7
+; CHECK-NEXT:   # kill: def $r2l killed $r2l killed 
$r2d
+; CHECK-NEXT: br %r14
+  %P = tail call <16 x i8> @llvm.s390.vperm(<16 x i8> , <16 x 
i8> , <16 x i8> )
+  %E = extractelement <16 x i8> %P, i64 0
+  %res = zext i8 %E to i32
+  ret i32 %res
+}

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] Bump version to 18.1.8 (PR #95458)

2024-06-13 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-testing-tools

Author: Tom Stellard (tstellar)


Changes



---
Full diff: https://github.com/llvm/llvm-project/pull/95458.diff


2 Files Affected:

- (modified) llvm/CMakeLists.txt (+1-1) 
- (modified) llvm/utils/lit/lit/__init__.py (+1-1) 


``diff
diff --git a/llvm/CMakeLists.txt b/llvm/CMakeLists.txt
index 51278943847aa..909a965cd86c8 100644
--- a/llvm/CMakeLists.txt
+++ b/llvm/CMakeLists.txt
@@ -22,7 +22,7 @@ if(NOT DEFINED LLVM_VERSION_MINOR)
   set(LLVM_VERSION_MINOR 1)
 endif()
 if(NOT DEFINED LLVM_VERSION_PATCH)
-  set(LLVM_VERSION_PATCH 7)
+  set(LLVM_VERSION_PATCH 8)
 endif()
 if(NOT DEFINED LLVM_VERSION_SUFFIX)
   set(LLVM_VERSION_SUFFIX)
diff --git a/llvm/utils/lit/lit/__init__.py b/llvm/utils/lit/lit/__init__.py
index 5003d78ce5218..800d59492d8ff 100644
--- a/llvm/utils/lit/lit/__init__.py
+++ b/llvm/utils/lit/lit/__init__.py
@@ -2,7 +2,7 @@
 
 __author__ = "Daniel Dunbar"
 __email__ = "dan...@minormatter.com"
-__versioninfo__ = (18, 1, 7)
+__versioninfo__ = (18, 1, 8)
 __version__ = ".".join(str(v) for v in __versioninfo__) + "dev"
 
 __all__ = []

``




https://github.com/llvm/llvm-project/pull/95458
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] Bump version to 18.1.8 (PR #95458)

2024-06-13 Thread Tom Stellard via llvm-branch-commits

https://github.com/tstellar created 
https://github.com/llvm/llvm-project/pull/95458

None

>From 2edf6218b7e74cc76035e4e1efa8166b1c22312d Mon Sep 17 00:00:00 2001
From: Tom Stellard 
Date: Thu, 13 Jun 2024 12:33:39 -0700
Subject: [PATCH] Bump version to 18.1.8

---
 llvm/CMakeLists.txt| 2 +-
 llvm/utils/lit/lit/__init__.py | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/llvm/CMakeLists.txt b/llvm/CMakeLists.txt
index 51278943847aa..909a965cd86c8 100644
--- a/llvm/CMakeLists.txt
+++ b/llvm/CMakeLists.txt
@@ -22,7 +22,7 @@ if(NOT DEFINED LLVM_VERSION_MINOR)
   set(LLVM_VERSION_MINOR 1)
 endif()
 if(NOT DEFINED LLVM_VERSION_PATCH)
-  set(LLVM_VERSION_PATCH 7)
+  set(LLVM_VERSION_PATCH 8)
 endif()
 if(NOT DEFINED LLVM_VERSION_SUFFIX)
   set(LLVM_VERSION_SUFFIX)
diff --git a/llvm/utils/lit/lit/__init__.py b/llvm/utils/lit/lit/__init__.py
index 5003d78ce5218..800d59492d8ff 100644
--- a/llvm/utils/lit/lit/__init__.py
+++ b/llvm/utils/lit/lit/__init__.py
@@ -2,7 +2,7 @@
 
 __author__ = "Daniel Dunbar"
 __email__ = "dan...@minormatter.com"
-__versioninfo__ = (18, 1, 7)
+__versioninfo__ = (18, 1, 8)
 __version__ = ".".join(str(v) for v in __versioninfo__) + "dev"
 
 __all__ = []

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [compiler-rt] [TySan] Fixed false positive when accessing offset member variables (PR #95387)

2024-06-13 Thread via llvm-branch-commits

https://github.com/gbMattN converted_to_draft 
https://github.com/llvm/llvm-project/pull/95387
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [Support] Integrate SipHash.cpp into libSupport. (PR #94394)

2024-06-13 Thread Kristof Beyls via llvm-branch-commits

kbeyls wrote:

> Yes, this doesn't have tests by itself because there's no exposed interface. 
> It's certainly trivial to add one (which would allow using the reference test 
> vectors).
> 
> I don't have strong arguments either way, but I figured the conservative 
> option is to force hypothetical users to consider their use more seriously. 
> One might argue that's not how we usually treat libSupport, so I'm happy to 
> expose the raw function here.

I see some value in being able to test with the reference test vectors to be 
fully sure that the implementation really implements SipHash. But as I said 
above, I'm happy with merging this as is.

https://github.com/llvm/llvm-project/pull/94394
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [compiler-rt] 7fe862d - Revert "[hwasan] Add fixed_shadow_base flag (#73980)"

2024-06-13 Thread via llvm-branch-commits

Author: Florian Mayer
Date: 2024-06-13T09:55:29-07:00
New Revision: 7fe862d0a1f6dfa67c236f5af32ad15546797404

URL: 
https://github.com/llvm/llvm-project/commit/7fe862d0a1f6dfa67c236f5af32ad15546797404
DIFF: 
https://github.com/llvm/llvm-project/commit/7fe862d0a1f6dfa67c236f5af32ad15546797404.diff

LOG: Revert "[hwasan] Add fixed_shadow_base flag (#73980)"

This reverts commit ea991a11b2a3d2bfa545adbefb71cd17e8970a43.

Added: 


Modified: 
compiler-rt/lib/hwasan/hwasan_flags.inc
compiler-rt/lib/hwasan/hwasan_linux.cpp

Removed: 
compiler-rt/test/hwasan/TestCases/Linux/fixed-shadow.c



diff  --git a/compiler-rt/lib/hwasan/hwasan_flags.inc 
b/compiler-rt/lib/hwasan/hwasan_flags.inc
index 058a0457b9e7f..978fa46b705cb 100644
--- a/compiler-rt/lib/hwasan/hwasan_flags.inc
+++ b/compiler-rt/lib/hwasan/hwasan_flags.inc
@@ -84,10 +84,3 @@ HWASAN_FLAG(bool, malloc_bisect_dump, false,
 // are untagged before the call.
 HWASAN_FLAG(bool, fail_without_syscall_abi, true,
 "Exit if fail to request relaxed syscall ABI.")
-
-HWASAN_FLAG(
-uptr, fixed_shadow_base, -1,
-"If not -1, HWASan will attempt to allocate the shadow at this address, "
-"instead of choosing one dynamically."
-"Tip: this can be combined with the compiler option, "
-"-hwasan-mapping-offset, to optimize the instrumentation.")

diff  --git a/compiler-rt/lib/hwasan/hwasan_linux.cpp 
b/compiler-rt/lib/hwasan/hwasan_linux.cpp
index e6aa60b324fa7..c254670ee2d48 100644
--- a/compiler-rt/lib/hwasan/hwasan_linux.cpp
+++ b/compiler-rt/lib/hwasan/hwasan_linux.cpp
@@ -106,12 +106,8 @@ static uptr GetHighMemEnd() {
 }
 
 static void InitializeShadowBaseAddress(uptr shadow_size_bytes) {
-  if (flags()->fixed_shadow_base != (uptr)-1) {
-__hwasan_shadow_memory_dynamic_address = flags()->fixed_shadow_base;
-  } else {
-__hwasan_shadow_memory_dynamic_address =
-FindDynamicShadowStart(shadow_size_bytes);
-  }
+  __hwasan_shadow_memory_dynamic_address =
+  FindDynamicShadowStart(shadow_size_bytes);
 }
 
 static void MaybeDieIfNoTaggingAbi(const char *message) {

diff  --git a/compiler-rt/test/hwasan/TestCases/Linux/fixed-shadow.c 
b/compiler-rt/test/hwasan/TestCases/Linux/fixed-shadow.c
deleted file mode 100644
index 4ff1d3e64c1d0..0
--- a/compiler-rt/test/hwasan/TestCases/Linux/fixed-shadow.c
+++ /dev/null
@@ -1,76 +0,0 @@
-// Test fixed shadow base functionality.
-//
-// Default compiler instrumentation works with any shadow base (dynamic or 
fixed).
-// RUN: %clang_hwasan %s -o %t && %run %t
-// RUN: %clang_hwasan %s -o %t && 
HWASAN_OPTIONS=fixed_shadow_base=263878495698944 %run %t
-// RUN: %clang_hwasan %s -o %t && 
HWASAN_OPTIONS=fixed_shadow_base=4398046511104 %run %t
-//
-// If -hwasan-mapping-offset is set, then the fixed_shadow_base needs to match.
-// RUN: %clang_hwasan %s -mllvm -hwasan-mapping-offset=263878495698944 -o %t 
&& HWASAN_OPTIONS=fixed_shadow_base=263878495698944 %run %t
-// RUN: %clang_hwasan %s -mllvm -hwasan-mapping-offset=4398046511104 -o %t && 
HWASAN_OPTIONS=fixed_shadow_base=4398046511104 %run %t
-// RUN: %clang_hwasan %s -mllvm -hwasan-mapping-offset=263878495698944 -o %t 
&& HWASAN_OPTIONS=fixed_shadow_base=4398046511104 not %run %t
-// RUN: %clang_hwasan %s -mllvm -hwasan-mapping-offset=4398046511104 -o %t && 
HWASAN_OPTIONS=fixed_shadow_base=263878495698944 not %run %t
-//
-// Note: if fixed_shadow_base is not set, compiler-rt will dynamically choose a
-// shadow base, which has a tiny but non-zero probability of matching the
-// compiler instrumentation. To avoid test flake, we do not test this case.
-//
-// Assume 48-bit VMA
-// REQUIRES: aarch64-target-arch
-//
-// REQUIRES: Clang
-//
-// UNSUPPORTED: android
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-int main() {
-  __hwasan_enable_allocator_tagging();
-
-  // We test that the compiler instrumentation is able to access shadow memory
-  // for many 
diff erent addresses. If we only test a small number of addresses,
-  // it might work by chance even if the shadow base does not match between the
-  // compiler instrumentation and compiler-rt.
-  void **mmaps[256];
-  // 48-bit VMA
-  for (int i = 0; i < 256; i++) {
-unsigned long long addr = (i * (1ULL << 40));
-
-void *p = mmap((void *)addr, 4096, PROT_READ | PROT_WRITE,
-   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
-// We don't use MAP_FIXED, to avoid overwriting critical memory.
-// However, if we don't get allocated the requested address, it
-// isn't a useful test.
-if ((unsigned long long)p != addr) {
-  munmap(p, 4096);
-  mmaps[i] = MAP_FAILED;
-} else {
-  mmaps[i] = p;
-}
-  }
-
-  int failures = 0;
-  for (int i = 0; i < 256; i++) {
-if (mmaps[i] == MAP_FAILED) {
-  failures++;
-} else {
-  printf("%d %p\n", i, mmaps[i]);
-  munmap(mmaps[i], 4096);
-}
-  }
-
-  

[llvm-branch-commits] [flang] [flang] Lower REDUCE intrinsic for reduction op with args by value (PR #95353)

2024-06-13 Thread via llvm-branch-commits

https://github.com/jeanPerier approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/95353
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-13 Thread shaw young via llvm-branch-commits


@@ -592,10 +599,15 @@ void preprocessUnreachableBlocks(FlowFunction ) {
 /// Decide if stale profile matching can be applied for a given function.
 /// Currently we skip inference for (very) large instances and for instances
 /// having "unexpected" control flow (e.g., having no sink basic blocks).
-bool canApplyInference(const FlowFunction ) {
+bool canApplyInference(const FlowFunction ,
+   const yaml::bolt::BinaryFunctionProfile ) {
   if (Func.Blocks.size() > opts::StaleMatchingMaxFuncSize)
 return false;
 
+  if ((double)Func.MatchedExecCount / YamlBF.ExecCount >=
+  opts::MatchedProfileThreshold / 100.0)
+return false;

shawbyoung wrote:

I’m leaning towards the block count heuristic now. I think the 1M and 4x1K exec 
count block case is likely pretty common – I imagine functions with loops would 
look a lot like this. Having some blocks matched exactly suggests there is 
likely a reasonable amount of structural similarity between the profiled 
function and the existing function, and block coldness doesn’t have an outsized 
bearing on that. I think a reasonably high threshold on the fraction of matched 
blocks would conservatively allow us to drop functions with high discrepancy – 
I’ll test this on a production binary. 
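
To make the block-count idea concrete, here is a rough sketch of what such a
check could look like (a sketch only – the struct, field names, and the 50%
default threshold are assumptions for illustration, not the actual BOLT code):

```cpp
#include <cstddef>

// Hypothetical per-function matching statistics, standing in for whatever
// the stale-matching code actually tracks.
struct MatchStats {
  size_t NumBlocks = 0;        // basic blocks in the binary function
  size_t NumMatchedBlocks = 0; // blocks matched exactly against the profile
};

// Skip inference when too small a fraction of blocks matched exactly,
// independent of how execution counts split between hot and cold blocks.
bool hasEnoughMatchedBlocks(const MatchStats &Stats,
                            double MatchedBlockThreshold = 0.5) {
  if (Stats.NumBlocks == 0)
    return false;
  double Ratio =
      static_cast<double>(Stats.NumMatchedBlocks) / Stats.NumBlocks;
  return Ratio >= MatchedBlockThreshold;
}
```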

https://github.com/llvm/llvm-project/pull/95156
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)

2024-06-13 Thread Krzysztof Drewniak via llvm-branch-commits

krzysz00 wrote:

Yeah, makes sense.

... what prevents a match-bitwidth operator from existing?

Context, from where I'm standing, is that you should be able to 
`raw.buffer.load/store` any type you could `load` or `store` (non-aggregate, 
let's say, since aggregates could be better handled as part of the 
`addrspace(7)` handling).

That is, `raw.ptr.buffer.load.i15` should work (as an i16 load that truncates), 
as should `raw.ptr.buffer.store.v8f32` (or `raw.ptr.buffer.store.i256`). Sure, 
the latter are two instructions long, but regular loads can legalize to 
multiple instructions just fine. 

My thoughts on how to implement that second behavior were to split the type 
into legal chunks and add in the offsets, and then merge/bitcast the values 
back.
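
As a rough illustration of that splitting idea, here is a sketch in plain C++
over a byte buffer rather than the actual SelectionDAG/GlobalISel lowering
(all names, and the restriction to dword-multiple chunks, are assumptions):

```cpp
#include <cstdint>
#include <functional>

// Break an arbitrarily wide store into the widest "legal" buffer-store
// chunks (dwordx4/x3/x2/dword), adding each chunk's byte offset; any
// sub-dword tail would go through a truncating short/byte store.
void storeWideValue(
    const uint8_t *Data, unsigned SizeInBytes,
    const std::function<void(const uint8_t *, unsigned, unsigned)> &StoreChunk) {
  static const unsigned LegalChunkBytes[] = {16, 12, 8, 4};
  unsigned Offset = 0;
  while (SizeInBytes >= 4) {
    for (unsigned Chunk : LegalChunkBytes) {
      if (SizeInBytes >= Chunk) {
        StoreChunk(Data + Offset, Chunk, Offset);
        Offset += Chunk;
        SizeInBytes -= Chunk;
        break;
      }
    }
  }
  if (SizeInBytes != 0) // e.g. the sub-dword tail of an i15 store
    StoreChunk(Data + Offset, SizeInBytes, Offset);
}
```

The loads would be the mirror image: load the chunks, then merge/bitcast the
pieces back to the requested type.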

https://github.com/llvm/llvm-project/pull/95379
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [flang] Lower REDUCE intrinsic for reduction op with args by value (PR #95353)

2024-06-13 Thread via llvm-branch-commits


@@ -5745,6 +5745,14 @@ IntrinsicLibrary::genReduce(mlir::Type resultType,
   int rank = arrayTmp.rank();
   assert(rank >= 1);
 
+  // Arguements to the reduction operation are passed by reference or value?
+  bool argByRef = true;
+  if (auto embox =
+  mlir::dyn_cast_or_null(operation.getDefiningOp())) 
{

jeanPerier wrote:

Does REDUCE works with dummy procedure and procedure pointers? If so it would 
be good to add tests for those cases to ensure the pattern matching here works 
with them.

https://github.com/llvm/llvm-project/pull/95353
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)

2024-06-13 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

I don't think we should be trying to handle the unreasonable illegal types in 
the intrinsics themselves. Theoretically the intrinsic should correspond to 
direct support.

We would handle the ugly types in the fat pointer lowering in terms of the 
intrinsics. 

https://github.com/llvm/llvm-project/pull/95379
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [compiler-rt] [TySan] Fixed false positive when accessing offset member variables (PR #95387)

2024-06-13 Thread via llvm-branch-commits

gbMattN wrote:

This may be a side effect of a different bug in how global variables are 
tracked. I think it is better to fix that bug first and then apply this change 
if the problem persists, so I'm switching this to a draft for now. 
Discourse link is 
https://discourse.llvm.org/t/reviving-typesanitizer-a-sanitizer-to-catch-type-based-aliasing-violations/66092/23

https://github.com/llvm/llvm-project/pull/95387
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [libc] 93e7f14 - Revert "[libc] fix aarch64 linux full build (#95358)"

2024-06-13 Thread via llvm-branch-commits

Author: Schrodinger ZHU Yifan
Date: 2024-06-13T07:54:57-07:00
New Revision: 93e7f145bc38c7c47d797e652d891695eb44fcfa

URL: 
https://github.com/llvm/llvm-project/commit/93e7f145bc38c7c47d797e652d891695eb44fcfa
DIFF: 
https://github.com/llvm/llvm-project/commit/93e7f145bc38c7c47d797e652d891695eb44fcfa.diff

LOG: Revert "[libc] fix aarch64 linux full build (#95358)"

This reverts commit ca05204f9aa258c5324d5675c7987c7e570168a0.

Added: 


Modified: 
libc/config/linux/aarch64/entrypoints.txt
libc/src/__support/threads/linux/CMakeLists.txt
libc/test/IntegrationTest/test.cpp

Removed: 




diff  --git a/libc/config/linux/aarch64/entrypoints.txt 
b/libc/config/linux/aarch64/entrypoints.txt
index 7ce088689b925..db96a80051a8d 100644
--- a/libc/config/linux/aarch64/entrypoints.txt
+++ b/libc/config/linux/aarch64/entrypoints.txt
@@ -643,12 +643,6 @@ if(LLVM_LIBC_FULL_BUILD)
 libc.src.pthread.pthread_mutexattr_setrobust
 libc.src.pthread.pthread_mutexattr_settype
 libc.src.pthread.pthread_once
-libc.src.pthread.pthread_rwlockattr_destroy
-libc.src.pthread.pthread_rwlockattr_getkind_np
-libc.src.pthread.pthread_rwlockattr_getpshared
-libc.src.pthread.pthread_rwlockattr_init
-libc.src.pthread.pthread_rwlockattr_setkind_np
-libc.src.pthread.pthread_rwlockattr_setpshared
 libc.src.pthread.pthread_setspecific
 
 # sched.h entrypoints
@@ -759,7 +753,6 @@ if(LLVM_LIBC_FULL_BUILD)
 libc.src.unistd._exit
 libc.src.unistd.environ
 libc.src.unistd.execv
-libc.src.unistd.fork
 libc.src.unistd.getopt
 libc.src.unistd.optarg
 libc.src.unistd.optind

diff  --git a/libc/src/__support/threads/linux/CMakeLists.txt 
b/libc/src/__support/threads/linux/CMakeLists.txt
index 8e6cd7227b2c8..9bf88ccc84557 100644
--- a/libc/src/__support/threads/linux/CMakeLists.txt
+++ b/libc/src/__support/threads/linux/CMakeLists.txt
@@ -64,7 +64,6 @@ add_object_library(
 .futex_utils
 libc.config.linux.app_h
 libc.include.sys_syscall
-libc.include.fcntl
 libc.src.errno.errno
 libc.src.__support.CPP.atomic
 libc.src.__support.CPP.stringstream

diff  --git a/libc/test/IntegrationTest/test.cpp 
b/libc/test/IntegrationTest/test.cpp
index 27e7f29efa0f1..3bdbe89a3fb62 100644
--- a/libc/test/IntegrationTest/test.cpp
+++ b/libc/test/IntegrationTest/test.cpp
@@ -79,10 +79,4 @@ void *realloc(void *ptr, size_t s) {
 // Integration tests are linked with -nostdlib. BFD linker expects
 // __dso_handle when -nostdlib is used.
 void *__dso_handle = nullptr;
-
-// On some platform (aarch64 fedora tested) full build integration test
-// objects need to link against libgcc, which may expect a __getauxval
-// function. For now, it is fine to provide a weak definition that always
-// returns false.
-[[gnu::weak]] bool __getauxval(uint64_t, uint64_t *) { return false; }
 } // extern "C"



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [libc] 91323a6 - Revert "Revert "[libc] fix aarch64 linux full build (#95358)" (#95419)"

2024-06-13 Thread via llvm-branch-commits

Author: Schrodinger ZHU Yifan
Date: 2024-06-13T08:38:05-07:00
New Revision: 91323a6ea8f32a9fe2cec7051e8a99b87157133e

URL: 
https://github.com/llvm/llvm-project/commit/91323a6ea8f32a9fe2cec7051e8a99b87157133e
DIFF: 
https://github.com/llvm/llvm-project/commit/91323a6ea8f32a9fe2cec7051e8a99b87157133e.diff

LOG: Revert "Revert "[libc] fix aarch64 linux full build (#95358)" (#95419)"

This reverts commit 9e5428e6b02c77fb18c4bdf688a216c957fd7a53.

Added: 


Modified: 
libc/config/linux/aarch64/entrypoints.txt
libc/src/__support/threads/linux/CMakeLists.txt
libc/test/IntegrationTest/test.cpp

Removed: 




diff  --git a/libc/config/linux/aarch64/entrypoints.txt 
b/libc/config/linux/aarch64/entrypoints.txt
index db96a80051a8d..7ce088689b925 100644
--- a/libc/config/linux/aarch64/entrypoints.txt
+++ b/libc/config/linux/aarch64/entrypoints.txt
@@ -643,6 +643,12 @@ if(LLVM_LIBC_FULL_BUILD)
 libc.src.pthread.pthread_mutexattr_setrobust
 libc.src.pthread.pthread_mutexattr_settype
 libc.src.pthread.pthread_once
+libc.src.pthread.pthread_rwlockattr_destroy
+libc.src.pthread.pthread_rwlockattr_getkind_np
+libc.src.pthread.pthread_rwlockattr_getpshared
+libc.src.pthread.pthread_rwlockattr_init
+libc.src.pthread.pthread_rwlockattr_setkind_np
+libc.src.pthread.pthread_rwlockattr_setpshared
 libc.src.pthread.pthread_setspecific
 
 # sched.h entrypoints
@@ -753,6 +759,7 @@ if(LLVM_LIBC_FULL_BUILD)
 libc.src.unistd._exit
 libc.src.unistd.environ
 libc.src.unistd.execv
+libc.src.unistd.fork
 libc.src.unistd.getopt
 libc.src.unistd.optarg
 libc.src.unistd.optind

diff  --git a/libc/src/__support/threads/linux/CMakeLists.txt 
b/libc/src/__support/threads/linux/CMakeLists.txt
index 9bf88ccc84557..8e6cd7227b2c8 100644
--- a/libc/src/__support/threads/linux/CMakeLists.txt
+++ b/libc/src/__support/threads/linux/CMakeLists.txt
@@ -64,6 +64,7 @@ add_object_library(
 .futex_utils
 libc.config.linux.app_h
 libc.include.sys_syscall
+libc.include.fcntl
 libc.src.errno.errno
 libc.src.__support.CPP.atomic
 libc.src.__support.CPP.stringstream

diff  --git a/libc/test/IntegrationTest/test.cpp 
b/libc/test/IntegrationTest/test.cpp
index 3bdbe89a3fb62..27e7f29efa0f1 100644
--- a/libc/test/IntegrationTest/test.cpp
+++ b/libc/test/IntegrationTest/test.cpp
@@ -79,4 +79,10 @@ void *realloc(void *ptr, size_t s) {
 // Integration tests are linked with -nostdlib. BFD linker expects
 // __dso_handle when -nostdlib is used.
 void *__dso_handle = nullptr;
+
+// On some platform (aarch64 fedora tested) full build integration test
+// objects need to link against libgcc, which may expect a __getauxval
+// function. For now, it is fine to provide a weak definition that always
+// returns false.
+[[gnu::weak]] bool __getauxval(uint64_t, uint64_t *) { return false; }
 } // extern "C"



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)

2024-06-13 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

That's what we've traditionally done and I think we should stop. We currently 
skip inserting the casts if the type is legal. 

It introduces extra bitcasts, which have a cost and increase pattern match 
complexity. We have a bunch of patterns that don't bother to look through the 
casts for a load/store. 

https://github.com/llvm/llvm-project/pull/95379
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)

2024-06-13 Thread Krzysztof Drewniak via llvm-branch-commits

krzysz00 wrote:

So, general question on this patch series:

Wouldn't it be more reasonable to, instead of having separate handling for all 
the possible register types, always do loads as `i8`, `i16`, `i32`, `<2 x i32>`, 
`<3 x i32>`, or `<4 x i32>` and then `bitcast`/`merge_values`/... the results 
back to their type?

Or at least to have that fallback path - if we don't know what a type is, 
load/store it as its bits?

(Then we wouldn't need to, for example, go back and add a `<16 x i8>` case if 
someone realizes they want that)

https://github.com/llvm/llvm-project/pull/95379
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins (PR #95395)

2024-06-13 Thread Yaxun Liu via llvm-branch-commits


@@ -117,13 +117,44 @@ void test_update_dpp(global int* out, int arg1, int arg2)
 }
 
 // CHECK-LABEL: @test_ds_fadd
-// CHECK: {{.*}}call{{.*}} float @llvm.amdgcn.ds.fadd.f32(ptr addrspace(3) 
%out, float %src, i32 0, i32 0, i1 false)
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src monotonic, align 
4{{$}}
+// CHECK: atomicrmw volatile fadd ptr addrspace(3) %out, float %src monotonic, 
align 4{{$}}
+
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src acquire, align 
4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src acquire, align 
4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src release, align 
4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src acq_rel, align 
4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src seq_cst, align 
4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src seq_cst, align 
4{{$}}
+
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src syncscope("agent") 
monotonic, align 4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src 
syncscope("workgroup") monotonic, align 4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src 
syncscope("wavefront") monotonic, align 4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src 
syncscope("singlethread") monotonic, align 4{{$}}
+// CHECK: atomicrmw fadd ptr addrspace(3) %out, float %src monotonic, align 
4{{$}}
 #if !defined(__SPIRV__)
 void test_ds_faddf(local float *out, float src) {
 #else
-void test_ds_faddf(__attribute__((address_space(3))) float *out, float src) {
+  void test_ds_faddf(__attribute__((address_space(3))) float *out, float src) {
 #endif
+
   *out = __builtin_amdgcn_ds_faddf(out, src, 0, 0, false);
+  *out = __builtin_amdgcn_ds_faddf(out, src, 0, 0, true);
+
+  // Test all orders.
+  *out = __builtin_amdgcn_ds_faddf(out, src, 1, 0, false);

yxsamliu wrote:

better use predefined macros
```
  // Define macros for the C11 / C++11 memory orderings
  Builder.defineMacro("__ATOMIC_RELAXED", "0");
  Builder.defineMacro("__ATOMIC_CONSUME", "1");
  Builder.defineMacro("__ATOMIC_ACQUIRE", "2");
  Builder.defineMacro("__ATOMIC_RELEASE", "3");
  Builder.defineMacro("__ATOMIC_ACQ_REL", "4");
  Builder.defineMacro("__ATOMIC_SEQ_CST", "5");

  // Define macros for the clang atomic scopes.
  Builder.defineMacro("__MEMORY_SCOPE_SYSTEM", "0");
  Builder.defineMacro("__MEMORY_SCOPE_DEVICE", "1");
  Builder.defineMacro("__MEMORY_SCOPE_WRKGRP", "2");
  Builder.defineMacro("__MEMORY_SCOPE_WVFRNT", "3");
  Builder.defineMacro("__MEMORY_SCOPE_SINGLE", "4");

```
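
For instance (a sketch only – the argument positions just follow the existing
test above, and the macro spellings are the ones listed here; this mirrors,
rather than defines, the builtin's contract):

```cpp
// How the test's calls could spell orders and scopes with the macros
// instead of raw integer literals.
void test_ds_faddf_with_macros(__attribute__((address_space(3))) float *out,
                               float src) {
  *out = __builtin_amdgcn_ds_faddf(out, src, __ATOMIC_ACQUIRE,
                                   __MEMORY_SCOPE_SYSTEM, false);
  *out = __builtin_amdgcn_ds_faddf(out, src, __ATOMIC_SEQ_CST,
                                   __MEMORY_SCOPE_WRKGRP, false);
}
```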

https://github.com/llvm/llvm-project/pull/95395
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [compiler-rt] [TySan] Fixed false positive when accessing offset member variables (PR #95387)

2024-06-13 Thread via llvm-branch-commits

gbMattN wrote:

@fhahn 

https://github.com/llvm/llvm-project/pull/95387
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] AMDGPU: Remove ds atomic fadd intrinsics (PR #95396)

2024-06-13 Thread Christudasan Devadasan via llvm-branch-commits


@@ -2331,40 +2337,74 @@ static Value *upgradeARMIntrinsicCall(StringRef Name, 
CallBase *CI, Function *F,
   llvm_unreachable("Unknown function for ARM CallBase upgrade.");
 }
 
+// These are expected to have have the arguments:

cdevadas wrote:

```suggestion
// These are expected to have the arguments:
```

https://github.com/llvm/llvm-project/pull/95396
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [compiler-rt] [TySan] Fixed false positive when accessing offset member variables (PR #95387)

2024-06-13 Thread via llvm-branch-commits

https://github.com/gbMattN updated 
https://github.com/llvm/llvm-project/pull/95387

>From 432f994b1bc21e4db0778fff9cc1425f788f8168 Mon Sep 17 00:00:00 2001
From: Matthew Nagy 
Date: Thu, 13 Jun 2024 09:54:04 +
Subject: [PATCH] [TySan] Fixed false positive when accessing offset member
 variables

---
 compiler-rt/lib/tysan/tysan.cpp | 12 +-
 compiler-rt/test/tysan/struct-members.c | 31 +
 2 files changed, 42 insertions(+), 1 deletion(-)
 create mode 100644 compiler-rt/test/tysan/struct-members.c

diff --git a/compiler-rt/lib/tysan/tysan.cpp b/compiler-rt/lib/tysan/tysan.cpp
index f627851d049e6..747727e48a152 100644
--- a/compiler-rt/lib/tysan/tysan.cpp
+++ b/compiler-rt/lib/tysan/tysan.cpp
@@ -221,7 +221,17 @@ __tysan_check(void *addr, int size, tysan_type_descriptor 
*td, int flags) {
 OldTDPtr -= i;
 OldTD = *OldTDPtr;
 
-if (!isAliasingLegal(td, OldTD))
+tysan_type_descriptor *InternalMember = OldTD;
+if (OldTD->Tag == TYSAN_STRUCT_TD) {
+  for (int j = 0; j < OldTD->Struct.MemberCount; j++) {
+if (OldTD->Struct.Members[j].Offset == i) {
+  InternalMember = OldTD->Struct.Members[j].Type;
+  break;
+}
+  }
+}
+
+if (!isAliasingLegal(td, InternalMember))
   reportError(addr, size, td, OldTD, AccessStr,
   "accesses part of an existing object", -i, pc, bp, sp);
 
diff --git a/compiler-rt/test/tysan/struct-members.c 
b/compiler-rt/test/tysan/struct-members.c
new file mode 100644
index 0..76ea3c431dd7b
--- /dev/null
+++ b/compiler-rt/test/tysan/struct-members.c
@@ -0,0 +1,31 @@
+// RUN: %clang_tysan -O0 %s -o %t && %run %t >%t.out 2>&1
+// RUN: FileCheck %s < %t.out
+
+#include 
+
+struct X {
+  int a, b, c;
+} x;
+
+static struct X xArray[2];
+
+int main() {
+  x.a = 1;
+  x.b = 2;
+  x.c = 3;
+
+  printf("%d %d %d\n", x.a, x.b, x.c);
+  // CHECK-NOT: ERROR: TypeSanitizer: type-aliasing-violation
+
+  for (size_t i = 0; i < 2; i++) {
+xArray[i].a = 1;
+xArray[i].b = 1;
+xArray[i].c = 1;
+  }
+
+  struct X *xPtr = (struct X *)&(xArray[0].c);
+  xPtr->a = 1;
+  // CHECK: ERROR: TypeSanitizer: type-aliasing-violation
+  // CHECK: WRITE of size 4 at {{.*}} with type int (in X at offset 0) 
accesses an existing object of type int (in X at offset 8)
+  // CHECK: {{#0 0x.* in main .*struct-members.c:}}[[@LINE-3]]
+}

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Cleanup selection patterns for buffer loads (PR #95378)

2024-06-13 Thread Matt Arsenault via llvm-branch-commits


@@ -1421,27 +1421,21 @@ let OtherPredicates = [HasPackedD16VMem] in {
   defm : MUBUF_LoadIntrinsicPat;
 } // End HasPackedD16VMem.
 
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
+foreach vt = Reg32Types.types in {
+defm : MUBUF_LoadIntrinsicPat;
+}

arsenm wrote:

I'm not a big fan of omitting the braces, especially in tablegen. If we're 
going to delete the braces, the lines should at least be indented. 

https://github.com/llvm/llvm-project/pull/95378
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)

2024-06-13 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/95379

>From 14695322d92821374dd6599d8f0f76d212e50169 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 12 Jun 2024 10:10:20 +0200
Subject: [PATCH] AMDGPU: Fix buffer load/store of pointers

Make sure we test all the address spaces since this support isn't
free in gisel.
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp |  31 +-
 .../AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll | 596 ++
 .../llvm.amdgcn.raw.ptr.buffer.store.ll   | 456 ++
 3 files changed, 1071 insertions(+), 12 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 81098201e9c0f..7a36c88b892c8 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1112,29 +1112,33 @@ unsigned 
SITargetLowering::getVectorTypeBreakdownForCallingConv(
 Context, CC, VT, IntermediateVT, NumIntermediates, RegisterVT);
 }
 
-static EVT memVTFromLoadIntrData(Type *Ty, unsigned MaxNumLanes) {
+static EVT memVTFromLoadIntrData(const SITargetLowering ,
+ const DataLayout , Type *Ty,
+ unsigned MaxNumLanes) {
   assert(MaxNumLanes != 0);
 
+  LLVMContext  = Ty->getContext();
   if (auto *VT = dyn_cast(Ty)) {
 unsigned NumElts = std::min(MaxNumLanes, VT->getNumElements());
-return EVT::getVectorVT(Ty->getContext(),
-EVT::getEVT(VT->getElementType()),
+return EVT::getVectorVT(Ctx, TLI.getValueType(DL, VT->getElementType()),
 NumElts);
   }
 
-  return EVT::getEVT(Ty);
+  return TLI.getValueType(DL, Ty);
 }
 
 // Peek through TFE struct returns to only use the data size.
-static EVT memVTFromLoadIntrReturn(Type *Ty, unsigned MaxNumLanes) {
+static EVT memVTFromLoadIntrReturn(const SITargetLowering ,
+   const DataLayout , Type *Ty,
+   unsigned MaxNumLanes) {
   auto *ST = dyn_cast(Ty);
   if (!ST)
-return memVTFromLoadIntrData(Ty, MaxNumLanes);
+return memVTFromLoadIntrData(TLI, DL, Ty, MaxNumLanes);
 
   // TFE intrinsics return an aggregate type.
   assert(ST->getNumContainedTypes() == 2 &&
  ST->getContainedType(1)->isIntegerTy(32));
-  return memVTFromLoadIntrData(ST->getContainedType(0), MaxNumLanes);
+  return memVTFromLoadIntrData(TLI, DL, ST->getContainedType(0), MaxNumLanes);
 }
 
 /// Map address space 7 to MVT::v5i32 because that's its in-memory
@@ -1219,10 +1223,12 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo 
,
   MaxNumLanes = DMask == 0 ? 1 : llvm::popcount(DMask);
 }
 
-Info.memVT = memVTFromLoadIntrReturn(CI.getType(), MaxNumLanes);
+Info.memVT = memVTFromLoadIntrReturn(*this, MF.getDataLayout(),
+ CI.getType(), MaxNumLanes);
   } else {
-Info.memVT = memVTFromLoadIntrReturn(
-CI.getType(), std::numeric_limits::max());
+Info.memVT =
+memVTFromLoadIntrReturn(*this, MF.getDataLayout(), CI.getType(),
+std::numeric_limits::max());
   }
 
   // FIXME: What does alignment mean for an image?
@@ -1235,9 +1241,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo 
,
   if (RsrcIntr->IsImage) {
 unsigned DMask = 
cast(CI.getArgOperand(1))->getZExtValue();
 unsigned DMaskLanes = DMask == 0 ? 1 : llvm::popcount(DMask);
-Info.memVT = memVTFromLoadIntrData(DataTy, DMaskLanes);
+Info.memVT = memVTFromLoadIntrData(*this, MF.getDataLayout(), DataTy,
+   DMaskLanes);
   } else
-Info.memVT = EVT::getEVT(DataTy);
+Info.memVT = getValueType(MF.getDataLayout(), DataTy);
 
   Info.flags |= MachineMemOperand::MOStore;
 } else {
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll
index 3e3371091ef72..4d557c76dc4d0 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll
@@ -1280,6 +1280,602 @@ define <2 x i64> @buffer_load_v2i64__voffset_add(ptr 
addrspace(8) inreg %rsrc, i
   ret <2 x i64> %data
 }
 
+define ptr @buffer_load_p0__voffset_add(ptr addrspace(8) inreg %rsrc, i32 
%voffset) {
+; PREGFX10-LABEL: buffer_load_p0__voffset_add:
+; PREGFX10:   ; %bb.0:
+; PREGFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; PREGFX10-NEXT:buffer_load_dwordx2 v[0:1], v0, s[4:7], 0 offen offset:60
+; PREGFX10-NEXT:s_waitcnt vmcnt(0)
+; PREGFX10-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX10-LABEL: buffer_load_p0__voffset_add:
+; GFX10:   ; %bb.0:
+; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-NEXT:buffer_load_dwordx2 v[0:1], v0, s[4:7], 0 offen offset:60
+; 

[llvm-branch-commits] [llvm] AMDGPU: Cleanup selection patterns for buffer loads (PR #95378)

2024-06-13 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/95378

>From 1dfcc0961e82bbe656faded0c38e694da0d76c9b Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Sun, 9 Jun 2024 23:12:31 +0200
Subject: [PATCH] AMDGPU: Cleanup selection patterns for buffer loads

We should just support these for all register types.
---
 llvm/lib/Target/AMDGPU/BUFInstructions.td | 72 ++-
 llvm/lib/Target/AMDGPU/SIRegisterInfo.td  | 16 ++---
 2 files changed, 39 insertions(+), 49 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/BUFInstructions.td 
b/llvm/lib/Target/AMDGPU/BUFInstructions.td
index 50e62788c5eac..978d261f5a662 100644
--- a/llvm/lib/Target/AMDGPU/BUFInstructions.td
+++ b/llvm/lib/Target/AMDGPU/BUFInstructions.td
@@ -1421,27 +1421,21 @@ let OtherPredicates = [HasPackedD16VMem] in {
   defm : MUBUF_LoadIntrinsicPat;
 } // End HasPackedD16VMem.
 
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
+foreach vt = Reg32Types.types in {
+defm : MUBUF_LoadIntrinsicPat;
+}
+
+foreach vt = Reg64Types.types in {
+defm : MUBUF_LoadIntrinsicPat;
+}
+
+foreach vt = Reg96Types.types in {
+defm : MUBUF_LoadIntrinsicPat;
+}
+
+foreach vt = Reg128Types.types in {
+defm : MUBUF_LoadIntrinsicPat;
+}
 
 defm : MUBUF_LoadIntrinsicPat;
 defm : MUBUF_LoadIntrinsicPat;
@@ -1532,27 +1526,21 @@ let OtherPredicates = [HasPackedD16VMem] in {
   defm : MUBUF_StoreIntrinsicPat;
 } // End HasPackedD16VMem.
 
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
+foreach vt = Reg32Types.types in {
+defm : MUBUF_StoreIntrinsicPat;
+}
+
+foreach vt = Reg64Types.types in {
+defm : MUBUF_StoreIntrinsicPat;
+}
+
+foreach vt = Reg96Types.types in {
+defm : MUBUF_StoreIntrinsicPat;
+}
+
+foreach vt = Reg128Types.types in {
+defm : MUBUF_StoreIntrinsicPat;
+}
 
 defm : MUBUF_StoreIntrinsicPat;
 defm : MUBUF_StoreIntrinsicPat;
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td 
b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
index caac7126068ef..a8efe2b2ba35e 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
@@ -586,7 +586,9 @@ class RegisterTypes reg_types> {
 
 def Reg16Types : RegisterTypes<[i16, f16, bf16]>;
 def Reg32Types : RegisterTypes<[i32, f32, v2i16, v2f16, v2bf16, p2, p3, p5, 
p6]>;
-def Reg64Types : RegisterTypes<[i64, f64, v2i32, v2f32, p0]>;
+def Reg64Types : RegisterTypes<[i64, f64, v2i32, v2f32, p0, v4i16, v4f16, 
v4bf16]>;
+def Reg96Types : RegisterTypes<[v3i32, v3f32]>;
+def Reg128Types : RegisterTypes<[v4i32, v4f32, v2i64, v2f64, v8i16, v8f16, 
v8bf16]>;
 
 let HasVGPR = 1 in {
 // VOP3 and VINTERP can access 256 lo and 256 hi registers.
@@ -744,7 +746,7 @@ def Pseudo_SReg_32 : SIRegisterClass<"AMDGPU", [i32, f32, 
i16, f16, bf16, v2i16,
   let BaseClassOrder = 1;
 }
 
-def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU", [v4i32, v2i64, v2f64, v8i16, 
v8f16, v8bf16], 32,
+def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU", Reg128Types.types, 32,
   (add PRIVATE_RSRC_REG)> {
   let isAllocatable = 0;
   let CopyCost = -1;
@@ -815,7 +817,7 @@ def SRegOrLds_32 : SIRegisterClass<"AMDGPU", [i32, f32, 
i16, f16, bf16, v2i16, v
   let HasSGPR = 1;
 }
 
-def SGPR_64 : SIRegisterClass<"AMDGPU", [v2i32, i64, v2f32, f64, v4i16, v4f16, 
v4bf16], 32,
+def SGPR_64 : SIRegisterClass<"AMDGPU", Reg64Types.types, 32,
 (add SGPR_64Regs)> {
   let CopyCost = 1;
   let AllocationPriority = 1;
@@ -905,8 +907,8 @@ multiclass SRegClass;
-defm "" : SRegClass<4, [v4i32, v4f32, v2i64, v2f64, v8i16, v8f16, v8bf16], 
SGPR_128Regs, TTMP_128Regs>;
+defm "" : SRegClass<3, Reg96Types.types, SGPR_96Regs, TTMP_96Regs>;
+defm "" : SRegClass<4, Reg128Types.types, SGPR_128Regs, TTMP_128Regs>;
 defm "" : SRegClass<5, 

[llvm-branch-commits] [clang] 0e8c9bc - Revert "[clang][NFC] Add a test for CWG2685 (#95206)"

2024-06-13 Thread via llvm-branch-commits

Author: Younan Zhang
Date: 2024-06-13T18:53:46+08:00
New Revision: 0e8c9bca863137f14aea2cee0e05d4270b33e0e8

URL: 
https://github.com/llvm/llvm-project/commit/0e8c9bca863137f14aea2cee0e05d4270b33e0e8
DIFF: 
https://github.com/llvm/llvm-project/commit/0e8c9bca863137f14aea2cee0e05d4270b33e0e8.diff

LOG: Revert "[clang][NFC] Add a test for CWG2685 (#95206)"

This reverts commit 3475116e2c37a2c8a69658b36c02871c322da008.

Added: 


Modified: 
clang/test/CXX/drs/cwg26xx.cpp
clang/www/cxx_dr_status.html

Removed: 




diff  --git a/clang/test/CXX/drs/cwg26xx.cpp b/clang/test/CXX/drs/cwg26xx.cpp
index fee3ef16850bf..2b17c8101438d 100644
--- a/clang/test/CXX/drs/cwg26xx.cpp
+++ b/clang/test/CXX/drs/cwg26xx.cpp
@@ -225,15 +225,6 @@ void m() {
 }
 
 #if __cplusplus >= 202302L
-
-namespace cwg2685 { // cwg2685: 17
-template 
-struct A {
-  T ar[4];
-};
-A a = { "foo" };
-}
-
 namespace cwg2687 { // cwg2687: 18
 struct S{
 void f(int);

diff  --git a/clang/www/cxx_dr_status.html b/clang/www/cxx_dr_status.html
index 8c79708f23abd..5e2ab06701703 100755
--- a/clang/www/cxx_dr_status.html
+++ b/clang/www/cxx_dr_status.html
@@ -15918,7 +15918,7 @@ C++ defect report implementation 
status
 https://cplusplus.github.io/CWG/issues/2685.html;>2685
 C++23
 Aggregate CTAD, string, and brace elision
-Clang 17
+Unknown
   
   
 https://cplusplus.github.io/CWG/issues/2686.html;>2686



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [compiler-rt] [TySan] Fixed false positive when accessing offset member variables (PR #95387)

2024-06-13 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-compiler-rt-sanitizer

Author: None (gbMattN)


Changes

This patch fixes a bug in the current TySan implementation. Currently, if you 
access a member variable other than the first, TySan reports an error: it 
believes you are accessing the struct type itself at an offset equal to the 
offset of the member variable you are actually accessing.
With this patch, the accessed type is resolved to the type of the member 
variable at that offset. This is done only if a member exists at that offset, 
so any incorrect accesses are still caught; the struct-members.c test checks 
this.

---
Full diff: https://github.com/llvm/llvm-project/pull/95387.diff


2 Files Affected:

- (modified) compiler-rt/lib/tysan/tysan.cpp (+11-1) 
- (added) compiler-rt/test/tysan/struct-members.c (+32) 


```diff
diff --git a/compiler-rt/lib/tysan/tysan.cpp b/compiler-rt/lib/tysan/tysan.cpp
index f627851d049e6..747727e48a152 100644
--- a/compiler-rt/lib/tysan/tysan.cpp
+++ b/compiler-rt/lib/tysan/tysan.cpp
@@ -221,7 +221,17 @@ __tysan_check(void *addr, int size, tysan_type_descriptor 
*td, int flags) {
 OldTDPtr -= i;
 OldTD = *OldTDPtr;
 
-if (!isAliasingLegal(td, OldTD))
+tysan_type_descriptor *InternalMember = OldTD;
+if (OldTD->Tag == TYSAN_STRUCT_TD) {
+  for (int j = 0; j < OldTD->Struct.MemberCount; j++) {
+if (OldTD->Struct.Members[j].Offset == i) {
+  InternalMember = OldTD->Struct.Members[j].Type;
+  break;
+}
+  }
+}
+
+if (!isAliasingLegal(td, InternalMember))
   reportError(addr, size, td, OldTD, AccessStr,
   "accesses part of an existing object", -i, pc, bp, sp);
 
diff --git a/compiler-rt/test/tysan/struct-members.c 
b/compiler-rt/test/tysan/struct-members.c
new file mode 100644
index 0..8cf6499f78ce6
--- /dev/null
+++ b/compiler-rt/test/tysan/struct-members.c
@@ -0,0 +1,32 @@
+// RUN: %clang_tysan -O0 %s -o %t && %run %t >%t.out 2>&1
+// RUN: FileCheck %s < %t.out
+
+#include 
+
+struct X {
+  int a, b, c;
+} x;
+
+static struct X xArray[2];
+
+int main() {
+  x.a = 1;
+  x.b = 2;
+  x.c = 3;
+
+  printf("%d %d %d\n", x.a, x.b, x.c);
+  // CHECK-NOT: ERROR: TypeSanitizer: type-aliasing-violation
+
+  for (size_t i = 0; i < 2; i++) {
+xArray[i].a = 1;
+xArray[i].b = 1;
+xArray[i].c = 1;
+  }
+  printf("Here\n");
+
+  struct X *xPtr = (struct X *)&(xArray[0].c);
+  xPtr->a = 1;
+  // CHECK: ERROR: TypeSanitizer: type-aliasing-violation
+  // CHECK: WRITE of size 4 at {{.*}} with type int (in X at offset 0) 
accesses an existing object of type int (in X at offset 8)
+  // CHECK: {{#0 0x.* in main .*struct-members.c:}}[[@LINE-3]]
+}

```




https://github.com/llvm/llvm-project/pull/95387
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [compiler-rt] [TySan] Fixed false positive when accessing offset member variables (PR #95387)

2024-06-13 Thread via llvm-branch-commits

github-actions[bot] wrote:



Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this 
page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using `@` followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from 
other developers.

If you have further questions, they may be answered by the [LLVM GitHub User 
Guide](https://llvm.org/docs/GitHub.html).

You can also ask questions in a comment on this PR, on the [LLVM 
Discord](https://discord.com/invite/xS7Z362) or on the 
[forums](https://discourse.llvm.org/).

https://github.com/llvm/llvm-project/pull/95387
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Fix buffer intrinsic store of bfloat (PR #95377)

2024-06-13 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request.


https://github.com/llvm/llvm-project/pull/95377
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Fix buffer intrinsic store of bfloat (PR #95377)

2024-06-13 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/95377
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Cleanup selection patterns for buffer loads (PR #95378)

2024-06-13 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes

We should just support these for all register types.

---
Full diff: https://github.com/llvm/llvm-project/pull/95378.diff


2 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/BUFInstructions.td (+30-42) 
- (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.td (+9-7) 


```diff
diff --git a/llvm/lib/Target/AMDGPU/BUFInstructions.td 
b/llvm/lib/Target/AMDGPU/BUFInstructions.td
index 94dd45f1333b0..2f52edb7f917a 100644
--- a/llvm/lib/Target/AMDGPU/BUFInstructions.td
+++ b/llvm/lib/Target/AMDGPU/BUFInstructions.td
@@ -1421,27 +1421,21 @@ let OtherPredicates = [HasPackedD16VMem] in {
   defm : MUBUF_LoadIntrinsicPat;
 } // End HasPackedD16VMem.
 
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
+foreach vt = Reg32Types.types in {
+defm : MUBUF_LoadIntrinsicPat;
+}
+
+foreach vt = Reg64Types.types in {
+defm : MUBUF_LoadIntrinsicPat;
+}
+
+foreach vt = Reg96Types.types in {
+defm : MUBUF_LoadIntrinsicPat;
+}
+
+foreach vt = Reg128Types.types in {
+defm : MUBUF_LoadIntrinsicPat;
+}
 
 defm : MUBUF_LoadIntrinsicPat;
 defm : MUBUF_LoadIntrinsicPat;
@@ -1532,27 +1526,21 @@ let OtherPredicates = [HasPackedD16VMem] in {
   defm : MUBUF_StoreIntrinsicPat;
 } // End HasPackedD16VMem.
 
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
+foreach vt = Reg32Types.types in {
+defm : MUBUF_StoreIntrinsicPat;
+}
+
+foreach vt = Reg64Types.types in {
+defm : MUBUF_StoreIntrinsicPat;
+}
+
+foreach vt = Reg96Types.types in {
+defm : MUBUF_StoreIntrinsicPat;
+}
+
+foreach vt = Reg128Types.types in {
+defm : MUBUF_StoreIntrinsicPat;
+}
 
 defm : MUBUF_StoreIntrinsicPat;
 defm : MUBUF_StoreIntrinsicPat;
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td 
b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
index caac7126068ef..a8efe2b2ba35e 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
@@ -586,7 +586,9 @@ class RegisterTypes reg_types> {
 
 def Reg16Types : RegisterTypes<[i16, f16, bf16]>;
 def Reg32Types : RegisterTypes<[i32, f32, v2i16, v2f16, v2bf16, p2, p3, p5, 
p6]>;
-def Reg64Types : RegisterTypes<[i64, f64, v2i32, v2f32, p0]>;
+def Reg64Types : RegisterTypes<[i64, f64, v2i32, v2f32, p0, v4i16, v4f16, 
v4bf16]>;
+def Reg96Types : RegisterTypes<[v3i32, v3f32]>;
+def Reg128Types : RegisterTypes<[v4i32, v4f32, v2i64, v2f64, v8i16, v8f16, 
v8bf16]>;
 
 let HasVGPR = 1 in {
 // VOP3 and VINTERP can access 256 lo and 256 hi registers.
@@ -744,7 +746,7 @@ def Pseudo_SReg_32 : SIRegisterClass<"AMDGPU", [i32, f32, 
i16, f16, bf16, v2i16,
   let BaseClassOrder = 1;
 }
 
-def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU", [v4i32, v2i64, v2f64, v8i16, 
v8f16, v8bf16], 32,
+def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU", Reg128Types.types, 32,
   (add PRIVATE_RSRC_REG)> {
   let isAllocatable = 0;
   let CopyCost = -1;
@@ -815,7 +817,7 @@ def SRegOrLds_32 : SIRegisterClass<"AMDGPU", [i32, f32, 
i16, f16, bf16, v2i16, v
   let HasSGPR = 1;
 }
 
-def SGPR_64 : SIRegisterClass<"AMDGPU", [v2i32, i64, v2f32, f64, v4i16, v4f16, 
v4bf16], 32,
+def SGPR_64 : SIRegisterClass<"AMDGPU", Reg64Types.types, 32,
 (add SGPR_64Regs)> {
   let CopyCost = 1;
   let AllocationPriority = 1;
@@ -905,8 +907,8 @@ multiclass SRegClass;
-defm "" : SRegClass<4, [v4i32, v4f32, v2i64, v2f64, v8i16, v8f16, v8bf16], 
SGPR_128Regs, TTMP_128Regs>;
+defm "" : SRegClass<3, Reg96Types.types, SGPR_96Regs, TTMP_96Regs>;
+defm "" : SRegClass<4, Reg128Types.types, SGPR_128Regs, TTMP_128Regs>;
 defm "" : SRegClass<5, [v5i32, v5f32], SGPR_160Regs, TTMP_160Regs>;
 defm "" : SRegClass<6, [v6i32, v6f32, v3i64, v3f64], SGPR_192Regs, 
TTMP_192Regs>;
 defm "" : 

[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)

2024-06-13 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes

Make sure we test all the address spaces since this support isn't
free in gisel.

---

Patch is 38.37 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/95379.diff


3 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+19-12) 
- (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll (+596) 
- (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.ll 
(+144) 


```diff
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 81098201e9c0f..7a36c88b892c8 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1112,29 +1112,33 @@ unsigned 
SITargetLowering::getVectorTypeBreakdownForCallingConv(
 Context, CC, VT, IntermediateVT, NumIntermediates, RegisterVT);
 }
 
-static EVT memVTFromLoadIntrData(Type *Ty, unsigned MaxNumLanes) {
+static EVT memVTFromLoadIntrData(const SITargetLowering ,
+ const DataLayout , Type *Ty,
+ unsigned MaxNumLanes) {
   assert(MaxNumLanes != 0);
 
+  LLVMContext  = Ty->getContext();
   if (auto *VT = dyn_cast(Ty)) {
 unsigned NumElts = std::min(MaxNumLanes, VT->getNumElements());
-return EVT::getVectorVT(Ty->getContext(),
-EVT::getEVT(VT->getElementType()),
+return EVT::getVectorVT(Ctx, TLI.getValueType(DL, VT->getElementType()),
 NumElts);
   }
 
-  return EVT::getEVT(Ty);
+  return TLI.getValueType(DL, Ty);
 }
 
 // Peek through TFE struct returns to only use the data size.
-static EVT memVTFromLoadIntrReturn(Type *Ty, unsigned MaxNumLanes) {
+static EVT memVTFromLoadIntrReturn(const SITargetLowering ,
+   const DataLayout , Type *Ty,
+   unsigned MaxNumLanes) {
   auto *ST = dyn_cast(Ty);
   if (!ST)
-return memVTFromLoadIntrData(Ty, MaxNumLanes);
+return memVTFromLoadIntrData(TLI, DL, Ty, MaxNumLanes);
 
   // TFE intrinsics return an aggregate type.
   assert(ST->getNumContainedTypes() == 2 &&
  ST->getContainedType(1)->isIntegerTy(32));
-  return memVTFromLoadIntrData(ST->getContainedType(0), MaxNumLanes);
+  return memVTFromLoadIntrData(TLI, DL, ST->getContainedType(0), MaxNumLanes);
 }
 
 /// Map address space 7 to MVT::v5i32 because that's its in-memory
@@ -1219,10 +1223,12 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo 
,
   MaxNumLanes = DMask == 0 ? 1 : llvm::popcount(DMask);
 }
 
-Info.memVT = memVTFromLoadIntrReturn(CI.getType(), MaxNumLanes);
+Info.memVT = memVTFromLoadIntrReturn(*this, MF.getDataLayout(),
+ CI.getType(), MaxNumLanes);
   } else {
-Info.memVT = memVTFromLoadIntrReturn(
-CI.getType(), std::numeric_limits::max());
+Info.memVT =
+memVTFromLoadIntrReturn(*this, MF.getDataLayout(), CI.getType(),
+std::numeric_limits::max());
   }
 
   // FIXME: What does alignment mean for an image?
@@ -1235,9 +1241,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo 
,
   if (RsrcIntr->IsImage) {
 unsigned DMask = 
cast(CI.getArgOperand(1))->getZExtValue();
 unsigned DMaskLanes = DMask == 0 ? 1 : llvm::popcount(DMask);
-Info.memVT = memVTFromLoadIntrData(DataTy, DMaskLanes);
+Info.memVT = memVTFromLoadIntrData(*this, MF.getDataLayout(), DataTy,
+   DMaskLanes);
   } else
-Info.memVT = EVT::getEVT(DataTy);
+Info.memVT = getValueType(MF.getDataLayout(), DataTy);
 
   Info.flags |= MachineMemOperand::MOStore;
 } else {
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll
index 3e3371091ef72..4d557c76dc4d0 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll
@@ -1280,6 +1280,602 @@ define <2 x i64> @buffer_load_v2i64__voffset_add(ptr 
addrspace(8) inreg %rsrc, i
   ret <2 x i64> %data
 }
 
+define ptr @buffer_load_p0__voffset_add(ptr addrspace(8) inreg %rsrc, i32 
%voffset) {
+; PREGFX10-LABEL: buffer_load_p0__voffset_add:
+; PREGFX10:   ; %bb.0:
+; PREGFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; PREGFX10-NEXT:buffer_load_dwordx2 v[0:1], v0, s[4:7], 0 offen offset:60
+; PREGFX10-NEXT:s_waitcnt vmcnt(0)
+; PREGFX10-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX10-LABEL: buffer_load_p0__voffset_add:
+; GFX10:   ; %bb.0:
+; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-NEXT:buffer_load_dwordx2 v[0:1], v0, s[4:7], 0 offen offset:60
+; GFX10-NEXT:s_waitcnt vmcnt(0)
+; 

[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)

2024-06-13 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/95379
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Cleanup selection patterns for buffer loads (PR #95378)

2024-06-13 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/95378
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Fix buffer intrinsic store of bfloat (PR #95377)

2024-06-13 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes



---
Full diff: https://github.com/llvm/llvm-project/pull/95377.diff


2 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+2-2) 
- (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll 
(+32-5) 


```diff
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 4946129c65a95..81098201e9c0f 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -874,7 +874,7 @@ SITargetLowering::SITargetLowering(const TargetMachine ,
  {MVT::Other, MVT::v2i16, MVT::v2f16, MVT::v2bf16,
   MVT::v3i16, MVT::v3f16, MVT::v4f16, MVT::v4i16,
   MVT::v4bf16, MVT::v8i16, MVT::v8f16, MVT::v8bf16,
-  MVT::f16, MVT::i16, MVT::i8, MVT::i128},
+  MVT::f16, MVT::i16, MVT::bf16, MVT::i8, MVT::i128},
  Custom);
 
   setOperationAction(ISD::STACKSAVE, MVT::Other, Custom);
@@ -9973,7 +9973,7 @@ SDValue 
SITargetLowering::handleByteShortBufferStores(SelectionDAG ,
   EVT VDataType, SDLoc DL,
   SDValue Ops[],
   MemSDNode *M) const {
-  if (VDataType == MVT::f16)
+  if (VDataType == MVT::f16 || VDataType == MVT::bf16)
 Ops[1] = DAG.getNode(ISD::BITCAST, DL, MVT::i16, Ops[1]);
 
   SDValue BufferStoreExt = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i32, Ops[1]);
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll
index f7f3742a90633..82dd35ab4c240 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll
@@ -5,11 +5,38 @@
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 < %s | FileCheck --check-prefix=GFX10 
%s
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 < %s | 
FileCheck --check-prefixes=GFX11 %s
 
-; FIXME
-; define amdgpu_ps void @buffer_store_bf16(ptr addrspace(8) inreg %rsrc, 
bfloat %data, i32 %offset) {
-;   call void @llvm.amdgcn.raw.ptr.buffer.store.bf16(bfloat %data, ptr 
addrspace(8) %rsrc, i32 %offset, i32 0, i32 0)
-;   ret void
-; }
+define amdgpu_ps void @buffer_store_bf16(ptr addrspace(8) inreg %rsrc, bfloat 
%data, i32 %offset) {
+; GFX7-LABEL: buffer_store_bf16:
+; GFX7:   ; %bb.0:
+; GFX7-NEXT:v_mul_f32_e32 v0, 1.0, v0
+; GFX7-NEXT:v_lshrrev_b32_e32 v0, 16, v0
+; GFX7-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen
+; GFX7-NEXT:s_endpgm
+;
+; GFX8-LABEL: buffer_store_bf16:
+; GFX8:   ; %bb.0:
+; GFX8-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen
+; GFX8-NEXT:s_endpgm
+;
+; GFX9-LABEL: buffer_store_bf16:
+; GFX9:   ; %bb.0:
+; GFX9-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen
+; GFX9-NEXT:s_endpgm
+;
+; GFX10-LABEL: buffer_store_bf16:
+; GFX10:   ; %bb.0:
+; GFX10-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen
+; GFX10-NEXT:s_endpgm
+;
+; GFX11-LABEL: buffer_store_bf16:
+; GFX11:   ; %bb.0:
+; GFX11-NEXT:buffer_store_b16 v0, v1, s[0:3], 0 offen
+; GFX11-NEXT:s_nop 0
+; GFX11-NEXT:s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX11-NEXT:s_endpgm
+  call void @llvm.amdgcn.raw.ptr.buffer.store.bf16(bfloat %data, ptr 
addrspace(8) %rsrc, i32 %offset, i32 0, i32 0)
+  ret void
+}
 
 define amdgpu_ps void @buffer_store_v2bf16(ptr addrspace(8) inreg %rsrc, <2 x 
bfloat> %data, i32 %offset) {
 ; GFX7-LABEL: buffer_store_v2bf16:

```




https://github.com/llvm/llvm-project/pull/95377
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Cleanup selection patterns for buffer loads (PR #95378)

2024-06-13 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/95378).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#95379**
* **#95378** (this PR)
* **#95377**
* **#95376**
* `main`

This stack of pull requests is managed by Graphite. Learn more about
stacking: https://stacking.dev/



https://github.com/llvm/llvm-project/pull/95378
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Fix buffer intrinsic store of bfloat (PR #95377)

2024-06-13 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/95377).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#95379**
* **#95378**
* **#95377** (this PR)
* **#95376**
* `main`

This stack of pull requests is managed by Graphite. Learn more about
stacking: https://stacking.dev/



https://github.com/llvm/llvm-project/pull/95377
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)

2024-06-13 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/95379).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#95379** (this PR)
* **#95378**
* **#95377**
* **#95376**
* `main`

This stack of pull requests is managed by Graphite. Learn more about
stacking: https://stacking.dev/



https://github.com/llvm/llvm-project/pull/95379
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Fix buffer load/store of pointers (PR #95379)

2024-06-13 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/95379

Make sure we test all the address spaces since this support isn't
free in gisel.

>From b05179ed684e289ce31f7aee8b57939c7bf2809c Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 12 Jun 2024 10:10:20 +0200
Subject: [PATCH] AMDGPU: Fix buffer load/store of pointers

Make sure we test all the address spaces since this support isn't
free in gisel.
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp |  31 +-
 .../AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll | 596 ++
 .../llvm.amdgcn.raw.ptr.buffer.store.ll   | 144 +
 3 files changed, 759 insertions(+), 12 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 81098201e9c0f..7a36c88b892c8 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1112,29 +1112,33 @@ unsigned 
SITargetLowering::getVectorTypeBreakdownForCallingConv(
 Context, CC, VT, IntermediateVT, NumIntermediates, RegisterVT);
 }
 
-static EVT memVTFromLoadIntrData(Type *Ty, unsigned MaxNumLanes) {
+static EVT memVTFromLoadIntrData(const SITargetLowering ,
+ const DataLayout , Type *Ty,
+ unsigned MaxNumLanes) {
   assert(MaxNumLanes != 0);
 
+  LLVMContext  = Ty->getContext();
   if (auto *VT = dyn_cast(Ty)) {
 unsigned NumElts = std::min(MaxNumLanes, VT->getNumElements());
-return EVT::getVectorVT(Ty->getContext(),
-EVT::getEVT(VT->getElementType()),
+return EVT::getVectorVT(Ctx, TLI.getValueType(DL, VT->getElementType()),
 NumElts);
   }
 
-  return EVT::getEVT(Ty);
+  return TLI.getValueType(DL, Ty);
 }
 
 // Peek through TFE struct returns to only use the data size.
-static EVT memVTFromLoadIntrReturn(Type *Ty, unsigned MaxNumLanes) {
+static EVT memVTFromLoadIntrReturn(const SITargetLowering ,
+   const DataLayout , Type *Ty,
+   unsigned MaxNumLanes) {
   auto *ST = dyn_cast(Ty);
   if (!ST)
-return memVTFromLoadIntrData(Ty, MaxNumLanes);
+return memVTFromLoadIntrData(TLI, DL, Ty, MaxNumLanes);
 
   // TFE intrinsics return an aggregate type.
   assert(ST->getNumContainedTypes() == 2 &&
  ST->getContainedType(1)->isIntegerTy(32));
-  return memVTFromLoadIntrData(ST->getContainedType(0), MaxNumLanes);
+  return memVTFromLoadIntrData(TLI, DL, ST->getContainedType(0), MaxNumLanes);
 }
 
 /// Map address space 7 to MVT::v5i32 because that's its in-memory
@@ -1219,10 +1223,12 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo 
,
   MaxNumLanes = DMask == 0 ? 1 : llvm::popcount(DMask);
 }
 
-Info.memVT = memVTFromLoadIntrReturn(CI.getType(), MaxNumLanes);
+Info.memVT = memVTFromLoadIntrReturn(*this, MF.getDataLayout(),
+ CI.getType(), MaxNumLanes);
   } else {
-Info.memVT = memVTFromLoadIntrReturn(
-CI.getType(), std::numeric_limits::max());
+Info.memVT =
+memVTFromLoadIntrReturn(*this, MF.getDataLayout(), CI.getType(),
+std::numeric_limits::max());
   }
 
   // FIXME: What does alignment mean for an image?
@@ -1235,9 +1241,10 @@ bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo 
,
   if (RsrcIntr->IsImage) {
 unsigned DMask = 
cast(CI.getArgOperand(1))->getZExtValue();
 unsigned DMaskLanes = DMask == 0 ? 1 : llvm::popcount(DMask);
-Info.memVT = memVTFromLoadIntrData(DataTy, DMaskLanes);
+Info.memVT = memVTFromLoadIntrData(*this, MF.getDataLayout(), DataTy,
+   DMaskLanes);
   } else
-Info.memVT = EVT::getEVT(DataTy);
+Info.memVT = getValueType(MF.getDataLayout(), DataTy);
 
   Info.flags |= MachineMemOperand::MOStore;
 } else {
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll
index 3e3371091ef72..4d557c76dc4d0 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.load.ll
@@ -1280,6 +1280,602 @@ define <2 x i64> @buffer_load_v2i64__voffset_add(ptr 
addrspace(8) inreg %rsrc, i
   ret <2 x i64> %data
 }
 
+define ptr @buffer_load_p0__voffset_add(ptr addrspace(8) inreg %rsrc, i32 
%voffset) {
+; PREGFX10-LABEL: buffer_load_p0__voffset_add:
+; PREGFX10:   ; %bb.0:
+; PREGFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; PREGFX10-NEXT:buffer_load_dwordx2 v[0:1], v0, s[4:7], 0 offen offset:60
+; PREGFX10-NEXT:s_waitcnt vmcnt(0)
+; PREGFX10-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX10-LABEL: buffer_load_p0__voffset_add:
+; GFX10:   ; %bb.0:
+; GFX10-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; 

[llvm-branch-commits] [llvm] AMDGPU: Cleanup selection patterns for buffer loads (PR #95378)

2024-06-13 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/95378

We should just support these for all register types.

>From 46c7f8b4529827204e5273472ea5b642ecb7266e Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Sun, 9 Jun 2024 23:12:31 +0200
Subject: [PATCH] AMDGPU: Cleanup selection patterns for buffer loads

We should just support these for all register types.
---
 llvm/lib/Target/AMDGPU/BUFInstructions.td | 72 ++-
 llvm/lib/Target/AMDGPU/SIRegisterInfo.td  | 16 ++---
 2 files changed, 39 insertions(+), 49 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/BUFInstructions.td 
b/llvm/lib/Target/AMDGPU/BUFInstructions.td
index 94dd45f1333b0..2f52edb7f917a 100644
--- a/llvm/lib/Target/AMDGPU/BUFInstructions.td
+++ b/llvm/lib/Target/AMDGPU/BUFInstructions.td
@@ -1421,27 +1421,21 @@ let OtherPredicates = [HasPackedD16VMem] in {
   defm : MUBUF_LoadIntrinsicPat;
 } // End HasPackedD16VMem.
 
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
-defm : MUBUF_LoadIntrinsicPat;
+foreach vt = Reg32Types.types in {
+defm : MUBUF_LoadIntrinsicPat;
+}
+
+foreach vt = Reg64Types.types in {
+defm : MUBUF_LoadIntrinsicPat;
+}
+
+foreach vt = Reg96Types.types in {
+defm : MUBUF_LoadIntrinsicPat;
+}
+
+foreach vt = Reg128Types.types in {
+defm : MUBUF_LoadIntrinsicPat;
+}
 
 defm : MUBUF_LoadIntrinsicPat;
 defm : MUBUF_LoadIntrinsicPat;
@@ -1532,27 +1526,21 @@ let OtherPredicates = [HasPackedD16VMem] in {
   defm : MUBUF_StoreIntrinsicPat;
 } // End HasPackedD16VMem.
 
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
-defm : MUBUF_StoreIntrinsicPat;
+foreach vt = Reg32Types.types in {
+defm : MUBUF_StoreIntrinsicPat;
+}
+
+foreach vt = Reg64Types.types in {
+defm : MUBUF_StoreIntrinsicPat;
+}
+
+foreach vt = Reg96Types.types in {
+defm : MUBUF_StoreIntrinsicPat;
+}
+
+foreach vt = Reg128Types.types in {
+defm : MUBUF_StoreIntrinsicPat;
+}
 
 defm : MUBUF_StoreIntrinsicPat;
 defm : MUBUF_StoreIntrinsicPat;
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td 
b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
index caac7126068ef..a8efe2b2ba35e 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
@@ -586,7 +586,9 @@ class RegisterTypes reg_types> {
 
 def Reg16Types : RegisterTypes<[i16, f16, bf16]>;
 def Reg32Types : RegisterTypes<[i32, f32, v2i16, v2f16, v2bf16, p2, p3, p5, 
p6]>;
-def Reg64Types : RegisterTypes<[i64, f64, v2i32, v2f32, p0]>;
+def Reg64Types : RegisterTypes<[i64, f64, v2i32, v2f32, p0, v4i16, v4f16, 
v4bf16]>;
+def Reg96Types : RegisterTypes<[v3i32, v3f32]>;
+def Reg128Types : RegisterTypes<[v4i32, v4f32, v2i64, v2f64, v8i16, v8f16, 
v8bf16]>;
 
 let HasVGPR = 1 in {
 // VOP3 and VINTERP can access 256 lo and 256 hi registers.
@@ -744,7 +746,7 @@ def Pseudo_SReg_32 : SIRegisterClass<"AMDGPU", [i32, f32, 
i16, f16, bf16, v2i16,
   let BaseClassOrder = 1;
 }
 
-def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU", [v4i32, v2i64, v2f64, v8i16, 
v8f16, v8bf16], 32,
+def Pseudo_SReg_128 : SIRegisterClass<"AMDGPU", Reg128Types.types, 32,
   (add PRIVATE_RSRC_REG)> {
   let isAllocatable = 0;
   let CopyCost = -1;
@@ -815,7 +817,7 @@ def SRegOrLds_32 : SIRegisterClass<"AMDGPU", [i32, f32, 
i16, f16, bf16, v2i16, v
   let HasSGPR = 1;
 }
 
-def SGPR_64 : SIRegisterClass<"AMDGPU", [v2i32, i64, v2f32, f64, v4i16, v4f16, 
v4bf16], 32,
+def SGPR_64 : SIRegisterClass<"AMDGPU", Reg64Types.types, 32,
 (add SGPR_64Regs)> {
   let CopyCost = 1;
   let AllocationPriority = 1;
@@ -905,8 +907,8 @@ multiclass SRegClass;
-defm "" : SRegClass<4, [v4i32, v4f32, v2i64, v2f64, v8i16, v8f16, v8bf16], 
SGPR_128Regs, TTMP_128Regs>;
+defm "" : SRegClass<3, Reg96Types.types, SGPR_96Regs, TTMP_96Regs>;
+defm "" : SRegClass<4, Reg128Types.types, 

[llvm-branch-commits] [llvm] AMDGPU: Fix buffer intrinsic store of bfloat (PR #95377)

2024-06-13 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/95377

None

>From 520d91d73339d8bea65f2e30e2a4d7fd0eb3d92b Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Sun, 9 Jun 2024 22:54:35 +0200
Subject: [PATCH] AMDGPU: Fix buffer intrinsic store of bfloat

---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp |  4 +-
 .../llvm.amdgcn.raw.ptr.buffer.store.bf16.ll  | 37 ---
 2 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 4946129c65a95..81098201e9c0f 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -874,7 +874,7 @@ SITargetLowering::SITargetLowering(const TargetMachine ,
  {MVT::Other, MVT::v2i16, MVT::v2f16, MVT::v2bf16,
   MVT::v3i16, MVT::v3f16, MVT::v4f16, MVT::v4i16,
   MVT::v4bf16, MVT::v8i16, MVT::v8f16, MVT::v8bf16,
-  MVT::f16, MVT::i16, MVT::i8, MVT::i128},
+  MVT::f16, MVT::i16, MVT::bf16, MVT::i8, MVT::i128},
  Custom);
 
   setOperationAction(ISD::STACKSAVE, MVT::Other, Custom);
@@ -9973,7 +9973,7 @@ SDValue 
SITargetLowering::handleByteShortBufferStores(SelectionDAG ,
   EVT VDataType, SDLoc DL,
   SDValue Ops[],
   MemSDNode *M) const {
-  if (VDataType == MVT::f16)
+  if (VDataType == MVT::f16 || VDataType == MVT::bf16)
 Ops[1] = DAG.getNode(ISD::BITCAST, DL, MVT::i16, Ops[1]);
 
   SDValue BufferStoreExt = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i32, Ops[1]);
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll
index f7f3742a90633..82dd35ab4c240 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll
@@ -5,11 +5,38 @@
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 < %s | FileCheck --check-prefix=GFX10 
%s
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 < %s | 
FileCheck --check-prefixes=GFX11 %s
 
-; FIXME
-; define amdgpu_ps void @buffer_store_bf16(ptr addrspace(8) inreg %rsrc, 
bfloat %data, i32 %offset) {
-;   call void @llvm.amdgcn.raw.ptr.buffer.store.bf16(bfloat %data, ptr 
addrspace(8) %rsrc, i32 %offset, i32 0, i32 0)
-;   ret void
-; }
+define amdgpu_ps void @buffer_store_bf16(ptr addrspace(8) inreg %rsrc, bfloat 
%data, i32 %offset) {
+; GFX7-LABEL: buffer_store_bf16:
+; GFX7:   ; %bb.0:
+; GFX7-NEXT:v_mul_f32_e32 v0, 1.0, v0
+; GFX7-NEXT:v_lshrrev_b32_e32 v0, 16, v0
+; GFX7-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen
+; GFX7-NEXT:s_endpgm
+;
+; GFX8-LABEL: buffer_store_bf16:
+; GFX8:   ; %bb.0:
+; GFX8-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen
+; GFX8-NEXT:s_endpgm
+;
+; GFX9-LABEL: buffer_store_bf16:
+; GFX9:   ; %bb.0:
+; GFX9-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen
+; GFX9-NEXT:s_endpgm
+;
+; GFX10-LABEL: buffer_store_bf16:
+; GFX10:   ; %bb.0:
+; GFX10-NEXT:buffer_store_short v0, v1, s[0:3], 0 offen
+; GFX10-NEXT:s_endpgm
+;
+; GFX11-LABEL: buffer_store_bf16:
+; GFX11:   ; %bb.0:
+; GFX11-NEXT:buffer_store_b16 v0, v1, s[0:3], 0 offen
+; GFX11-NEXT:s_nop 0
+; GFX11-NEXT:s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX11-NEXT:s_endpgm
+  call void @llvm.amdgcn.raw.ptr.buffer.store.bf16(bfloat %data, ptr 
addrspace(8) %rsrc, i32 %offset, i32 0, i32 0)
+  ret void
+}
 
 define amdgpu_ps void @buffer_store_v2bf16(ptr addrspace(8) inreg %rsrc, <2 x 
bfloat> %data, i32 %offset) {
 ; GFX7-LABEL: buffer_store_v2bf16:

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [flang] Lower REDUCE intrinsic for reduction op with args by value (PR #95353)

2024-06-12 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-flang-fir-hlfir

Author: Valentin Clement (バレンタイン クレメン) (clementval)


Changes

#95297 updates the runtime entry points to distinguish between reduction
operations with arguments passed by value or by reference. This patch adds
lowering to support arguments passed by value.

---

Patch is 62.25 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/95353.diff


5 Files Affected:

- (modified) flang/include/flang/Optimizer/Builder/Runtime/RTBuilder.h (+22) 
- (modified) flang/include/flang/Optimizer/Builder/Runtime/Reduction.h (+4-4) 
- (modified) flang/lib/Optimizer/Builder/IntrinsicCall.cpp (+12-4) 
- (modified) flang/lib/Optimizer/Builder/Runtime/Reduction.cpp (+413-55) 
- (modified) flang/test/Lower/Intrinsics/reduce.f90 (+223-12) 


``diff
diff --git a/flang/include/flang/Optimizer/Builder/Runtime/RTBuilder.h 
b/flang/include/flang/Optimizer/Builder/Runtime/RTBuilder.h
index 809d5b8d569dc..845ba385918d0 100644
--- a/flang/include/flang/Optimizer/Builder/Runtime/RTBuilder.h
+++ b/flang/include/flang/Optimizer/Builder/Runtime/RTBuilder.h
@@ -64,6 +64,18 @@ using FuncTypeBuilderFunc = mlir::FunctionType 
(*)(mlir::MLIRContext *);
 }; 
\
   }
 
+#define REDUCTION_VALUE_OPERATION_MODEL(T) 
\
+  template <>  
\
+  constexpr TypeBuilderFunc
\
+  getModel>() {   
\
+return [](mlir::MLIRContext *context) -> mlir::Type {  
\
+  TypeBuilderFunc f{getModel()};
\
+  auto refTy = fir::ReferenceType::get(f(context));
\
+  return mlir::FunctionType::get(context, {f(context), f(context)},
\
+ refTy);   
\
+}; 
\
+  }
+
 #define REDUCTION_CHAR_OPERATION_MODEL(T)  
\
   template <>  
\
   constexpr TypeBuilderFunc
\
@@ -481,17 +493,27 @@ constexpr TypeBuilderFunc getModel() {
 }
 
 REDUCTION_REF_OPERATION_MODEL(std::int8_t)
+REDUCTION_VALUE_OPERATION_MODEL(std::int8_t)
 REDUCTION_REF_OPERATION_MODEL(std::int16_t)
+REDUCTION_VALUE_OPERATION_MODEL(std::int16_t)
 REDUCTION_REF_OPERATION_MODEL(std::int32_t)
+REDUCTION_VALUE_OPERATION_MODEL(std::int32_t)
 REDUCTION_REF_OPERATION_MODEL(std::int64_t)
+REDUCTION_VALUE_OPERATION_MODEL(std::int64_t)
 REDUCTION_REF_OPERATION_MODEL(Fortran::common::int128_t)
+REDUCTION_VALUE_OPERATION_MODEL(Fortran::common::int128_t)
 
 REDUCTION_REF_OPERATION_MODEL(float)
+REDUCTION_VALUE_OPERATION_MODEL(float)
 REDUCTION_REF_OPERATION_MODEL(double)
+REDUCTION_VALUE_OPERATION_MODEL(double)
 REDUCTION_REF_OPERATION_MODEL(long double)
+REDUCTION_VALUE_OPERATION_MODEL(long double)
 
 REDUCTION_REF_OPERATION_MODEL(std::complex)
+REDUCTION_VALUE_OPERATION_MODEL(std::complex)
 REDUCTION_REF_OPERATION_MODEL(std::complex)
+REDUCTION_VALUE_OPERATION_MODEL(std::complex)
 
 REDUCTION_CHAR_OPERATION_MODEL(char)
 REDUCTION_CHAR_OPERATION_MODEL(char16_t)
diff --git a/flang/include/flang/Optimizer/Builder/Runtime/Reduction.h 
b/flang/include/flang/Optimizer/Builder/Runtime/Reduction.h
index fedf453a6dc8d..2a40cddc0cc2c 100644
--- a/flang/include/flang/Optimizer/Builder/Runtime/Reduction.h
+++ b/flang/include/flang/Optimizer/Builder/Runtime/Reduction.h
@@ -229,8 +229,8 @@ void genIParityDim(fir::FirOpBuilder , 
mlir::Location loc,
 /// result value. This is used for COMPLEX, CHARACTER and DERIVED TYPES.
 void genReduce(fir::FirOpBuilder , mlir::Location loc,
mlir::Value arrayBox, mlir::Value operation, mlir::Value 
maskBox,
-   mlir::Value identity, mlir::Value ordered,
-   mlir::Value resultBox);
+   mlir::Value identity, mlir::Value ordered, mlir::Value 
resultBox,
+   bool argByRef);
 
 /// Generate call to `Reduce` intrinsic runtime routine. This is the version
 /// that does not take a dim argument and return a scalare result. This is used
@@ -238,14 +238,14 @@ void genReduce(fir::FirOpBuilder , mlir::Location 
loc,
 mlir::Value genReduce(fir::FirOpBuilder , mlir::Location loc,
   mlir::Value arrayBox, mlir::Value operation,
   mlir::Value maskBox, mlir::Value identity,
-  mlir::Value ordered);
+  mlir::Value ordered, bool argByRef);
 
 /// Generate call to `Reduce` intrinsic runtime routine. This is the version
 /// that takes arrays of any rank with a dim argument specified.
 void genReduceDim(fir::FirOpBuilder , mlir::Location loc,
   

[llvm-branch-commits] [flang] [flang] Lower REDUCE intrinsic for reduction op with args by value (PR #95353)

2024-06-12 Thread Valentin Clement バレンタイン クレメン via llvm-branch-commits

https://github.com/clementval created 
https://github.com/llvm/llvm-project/pull/95353

#95297 updates the runtime entry points to distinguish between reduction
operations with arguments passed by value or by reference. This patch adds
lowering to support arguments passed by value.

>From defadc4f18b0b4b369a3657a0f6e4c9f79ffd793 Mon Sep 17 00:00:00 2001
From: Valentin Clement 
Date: Wed, 12 Jun 2024 15:28:31 -0700
Subject: [PATCH] [flang] Update lowering of REDUCE intrinsic for reduction
 operation with args by value

---
 .../Optimizer/Builder/Runtime/RTBuilder.h |  22 +
 .../Optimizer/Builder/Runtime/Reduction.h |   8 +-
 flang/lib/Optimizer/Builder/IntrinsicCall.cpp |  16 +-
 .../Optimizer/Builder/Runtime/Reduction.cpp   | 468 --
 flang/test/Lower/Intrinsics/reduce.f90| 235 -
 5 files changed, 674 insertions(+), 75 deletions(-)

diff --git a/flang/include/flang/Optimizer/Builder/Runtime/RTBuilder.h 
b/flang/include/flang/Optimizer/Builder/Runtime/RTBuilder.h
index 809d5b8d569dc..845ba385918d0 100644
--- a/flang/include/flang/Optimizer/Builder/Runtime/RTBuilder.h
+++ b/flang/include/flang/Optimizer/Builder/Runtime/RTBuilder.h
@@ -64,6 +64,18 @@ using FuncTypeBuilderFunc = mlir::FunctionType 
(*)(mlir::MLIRContext *);
 }; 
\
   }
 
+#define REDUCTION_VALUE_OPERATION_MODEL(T) 
\
+  template <>  
\
+  constexpr TypeBuilderFunc
\
+  getModel>() {   
\
+return [](mlir::MLIRContext *context) -> mlir::Type {  
\
+  TypeBuilderFunc f{getModel()};
\
+  auto refTy = fir::ReferenceType::get(f(context));
\
+  return mlir::FunctionType::get(context, {f(context), f(context)},
\
+ refTy);   
\
+}; 
\
+  }
+
 #define REDUCTION_CHAR_OPERATION_MODEL(T)  
\
   template <>  
\
   constexpr TypeBuilderFunc
\
@@ -481,17 +493,27 @@ constexpr TypeBuilderFunc getModel() {
 }
 
 REDUCTION_REF_OPERATION_MODEL(std::int8_t)
+REDUCTION_VALUE_OPERATION_MODEL(std::int8_t)
 REDUCTION_REF_OPERATION_MODEL(std::int16_t)
+REDUCTION_VALUE_OPERATION_MODEL(std::int16_t)
 REDUCTION_REF_OPERATION_MODEL(std::int32_t)
+REDUCTION_VALUE_OPERATION_MODEL(std::int32_t)
 REDUCTION_REF_OPERATION_MODEL(std::int64_t)
+REDUCTION_VALUE_OPERATION_MODEL(std::int64_t)
 REDUCTION_REF_OPERATION_MODEL(Fortran::common::int128_t)
+REDUCTION_VALUE_OPERATION_MODEL(Fortran::common::int128_t)
 
 REDUCTION_REF_OPERATION_MODEL(float)
+REDUCTION_VALUE_OPERATION_MODEL(float)
 REDUCTION_REF_OPERATION_MODEL(double)
+REDUCTION_VALUE_OPERATION_MODEL(double)
 REDUCTION_REF_OPERATION_MODEL(long double)
+REDUCTION_VALUE_OPERATION_MODEL(long double)
 
 REDUCTION_REF_OPERATION_MODEL(std::complex)
+REDUCTION_VALUE_OPERATION_MODEL(std::complex)
 REDUCTION_REF_OPERATION_MODEL(std::complex)
+REDUCTION_VALUE_OPERATION_MODEL(std::complex)
 
 REDUCTION_CHAR_OPERATION_MODEL(char)
 REDUCTION_CHAR_OPERATION_MODEL(char16_t)
diff --git a/flang/include/flang/Optimizer/Builder/Runtime/Reduction.h 
b/flang/include/flang/Optimizer/Builder/Runtime/Reduction.h
index fedf453a6dc8d..2a40cddc0cc2c 100644
--- a/flang/include/flang/Optimizer/Builder/Runtime/Reduction.h
+++ b/flang/include/flang/Optimizer/Builder/Runtime/Reduction.h
@@ -229,8 +229,8 @@ void genIParityDim(fir::FirOpBuilder , 
mlir::Location loc,
 /// result value. This is used for COMPLEX, CHARACTER and DERIVED TYPES.
 void genReduce(fir::FirOpBuilder , mlir::Location loc,
mlir::Value arrayBox, mlir::Value operation, mlir::Value 
maskBox,
-   mlir::Value identity, mlir::Value ordered,
-   mlir::Value resultBox);
+   mlir::Value identity, mlir::Value ordered, mlir::Value 
resultBox,
+   bool argByRef);
 
 /// Generate call to `Reduce` intrinsic runtime routine. This is the version
 /// that does not take a dim argument and return a scalare result. This is used
@@ -238,14 +238,14 @@ void genReduce(fir::FirOpBuilder , mlir::Location 
loc,
 mlir::Value genReduce(fir::FirOpBuilder , mlir::Location loc,
   mlir::Value arrayBox, mlir::Value operation,
   mlir::Value maskBox, mlir::Value identity,
-  mlir::Value ordered);
+  mlir::Value ordered, bool argByRef);
 
 /// Generate call to `Reduce` intrinsic runtime routine. This is the version
 /// that takes arrays of any rank with a dim argument specified.
 void 

[llvm-branch-commits] [clang] [clang] Implement function pointer signing and authenticated function calls (PR #93906)

2024-06-12 Thread Akira Hatanaka via llvm-branch-commits

https://github.com/ahatanak updated 
https://github.com/llvm/llvm-project/pull/93906

>From 0e85001f6d53e63beca77a76eaba1875ec84000d Mon Sep 17 00:00:00 2001
From: Akira Hatanaka 
Date: Fri, 24 May 2024 20:23:36 -0700
Subject: [PATCH 1/4] [clang] Implement function pointer signing.

Co-Authored-By: John McCall 
---
 clang/include/clang/Basic/CodeGenOptions.h|   4 +
 .../clang/Basic/DiagnosticDriverKinds.td  |   3 +
 clang/include/clang/Basic/LangOptions.h   |   2 +
 .../include/clang/Basic/PointerAuthOptions.h  | 136 ++
 .../clang/Frontend/CompilerInvocation.h   |  10 ++
 clang/lib/CodeGen/CGBuiltin.cpp   |   3 +-
 clang/lib/CodeGen/CGCall.cpp  |   3 +
 clang/lib/CodeGen/CGCall.h|  28 +++-
 clang/lib/CodeGen/CGExpr.cpp  |  17 +--
 clang/lib/CodeGen/CGExprConstant.cpp  |  19 ++-
 clang/lib/CodeGen/CGPointerAuth.cpp   |  51 +++
 clang/lib/CodeGen/CGPointerAuthInfo.h |  96 +
 clang/lib/CodeGen/CodeGenFunction.cpp |  58 
 clang/lib/CodeGen/CodeGenFunction.h   |  10 ++
 clang/lib/CodeGen/CodeGenModule.h |  34 +
 clang/lib/Frontend/CompilerInvocation.cpp |  36 +
 clang/lib/Headers/ptrauth.h   |  34 +
 .../CodeGen/ptrauth-function-attributes.c |  13 ++
 .../test/CodeGen/ptrauth-function-init-fail.c |   5 +
 clang/test/CodeGen/ptrauth-function-init.c|  31 
 .../CodeGen/ptrauth-function-lvalue-cast.c|  23 +++
 clang/test/CodeGen/ptrauth-weak_import.c  |  10 ++
 clang/test/CodeGenCXX/ptrauth.cpp |  24 
 23 files changed, 633 insertions(+), 17 deletions(-)
 create mode 100644 clang/lib/CodeGen/CGPointerAuthInfo.h
 create mode 100644 clang/test/CodeGen/ptrauth-function-attributes.c
 create mode 100644 clang/test/CodeGen/ptrauth-function-init-fail.c
 create mode 100644 clang/test/CodeGen/ptrauth-function-init.c
 create mode 100644 clang/test/CodeGen/ptrauth-function-lvalue-cast.c
 create mode 100644 clang/test/CodeGen/ptrauth-weak_import.c
 create mode 100644 clang/test/CodeGenCXX/ptrauth.cpp

diff --git a/clang/include/clang/Basic/CodeGenOptions.h 
b/clang/include/clang/Basic/CodeGenOptions.h
index 9469a424045bb..502722a6ec4eb 100644
--- a/clang/include/clang/Basic/CodeGenOptions.h
+++ b/clang/include/clang/Basic/CodeGenOptions.h
@@ -13,6 +13,7 @@
 #ifndef LLVM_CLANG_BASIC_CODEGENOPTIONS_H
 #define LLVM_CLANG_BASIC_CODEGENOPTIONS_H
 
+#include "clang/Basic/PointerAuthOptions.h"
 #include "clang/Basic/Sanitizers.h"
 #include "clang/Basic/XRayInstr.h"
 #include "llvm/ADT/FloatingPointMode.h"
@@ -388,6 +389,9 @@ class CodeGenOptions : public CodeGenOptionsBase {
 
   std::vector Reciprocals;
 
+  /// Configuration for pointer-signing.
+  PointerAuthOptions PointerAuth;
+
   /// The preferred width for auto-vectorization transforms. This is intended 
to
   /// override default transforms based on the width of the architected vector
   /// registers.
diff --git a/clang/include/clang/Basic/DiagnosticDriverKinds.td 
b/clang/include/clang/Basic/DiagnosticDriverKinds.td
index 773b234cd68fe..6cbb0c8401c15 100644
--- a/clang/include/clang/Basic/DiagnosticDriverKinds.td
+++ b/clang/include/clang/Basic/DiagnosticDriverKinds.td
@@ -351,6 +351,9 @@ def err_drv_omp_host_ir_file_not_found : Error<
   "target regions but cannot be found">;
 def err_drv_omp_host_target_not_supported : Error<
   "target '%0' is not a supported OpenMP host target">;
+def err_drv_ptrauth_not_supported : Error<
+  "target '%0' does not support native pointer authentication">;
+
 def err_drv_expecting_fopenmp_with_fopenmp_targets : Error<
   "'-fopenmp-targets' must be used in conjunction with a '-fopenmp' option "
   "compatible with offloading; e.g., '-fopenmp=libomp' or 
'-fopenmp=libiomp5'">;
diff --git a/clang/include/clang/Basic/LangOptions.h 
b/clang/include/clang/Basic/LangOptions.h
index 75e88afbd9705..5216822e45b1b 100644
--- a/clang/include/clang/Basic/LangOptions.h
+++ b/clang/include/clang/Basic/LangOptions.h
@@ -346,6 +346,8 @@ class LangOptionsBase {
 BKey
   };
 
+  using PointerAuthenticationMode = ::clang::PointerAuthenticationMode;
+
   enum class ThreadModelKind {
 /// POSIX Threads.
 POSIX,
diff --git a/clang/include/clang/Basic/PointerAuthOptions.h 
b/clang/include/clang/Basic/PointerAuthOptions.h
index e5cdcc31ebfb7..32b179e3f9460 100644
--- a/clang/include/clang/Basic/PointerAuthOptions.h
+++ b/clang/include/clang/Basic/PointerAuthOptions.h
@@ -14,10 +14,146 @@
 #ifndef LLVM_CLANG_BASIC_POINTERAUTHOPTIONS_H
 #define LLVM_CLANG_BASIC_POINTERAUTHOPTIONS_H
 
+#include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
+#include "llvm/Support/ErrorHandling.h"
+#include "llvm/Target/TargetOptions.h"
+#include 
+#include 
+#include 
+#include 
+
 namespace clang {
 
 constexpr unsigned PointerAuthKeyNone = -1;
 
+class PointerAuthSchema {
+public:
+  enum class Kind : unsigned {
+

[llvm-branch-commits] [clang] [clang] Implement function pointer signing and authenticated function calls (PR #93906)

2024-06-12 Thread Akira Hatanaka via llvm-branch-commits

https://github.com/ahatanak updated 
https://github.com/llvm/llvm-project/pull/93906

>From 0e85001f6d53e63beca77a76eaba1875ec84000d Mon Sep 17 00:00:00 2001
From: Akira Hatanaka 
Date: Fri, 24 May 2024 20:23:36 -0700
Subject: [PATCH 1/4] [clang] Implement function pointer signing.

Co-Authored-By: John McCall 
---
 clang/include/clang/Basic/CodeGenOptions.h|   4 +
 .../clang/Basic/DiagnosticDriverKinds.td  |   3 +
 clang/include/clang/Basic/LangOptions.h   |   2 +
 .../include/clang/Basic/PointerAuthOptions.h  | 136 ++
 .../clang/Frontend/CompilerInvocation.h   |  10 ++
 clang/lib/CodeGen/CGBuiltin.cpp   |   3 +-
 clang/lib/CodeGen/CGCall.cpp  |   3 +
 clang/lib/CodeGen/CGCall.h|  28 +++-
 clang/lib/CodeGen/CGExpr.cpp  |  17 +--
 clang/lib/CodeGen/CGExprConstant.cpp  |  19 ++-
 clang/lib/CodeGen/CGPointerAuth.cpp   |  51 +++
 clang/lib/CodeGen/CGPointerAuthInfo.h |  96 +
 clang/lib/CodeGen/CodeGenFunction.cpp |  58 
 clang/lib/CodeGen/CodeGenFunction.h   |  10 ++
 clang/lib/CodeGen/CodeGenModule.h |  34 +
 clang/lib/Frontend/CompilerInvocation.cpp |  36 +
 clang/lib/Headers/ptrauth.h   |  34 +
 .../CodeGen/ptrauth-function-attributes.c |  13 ++
 .../test/CodeGen/ptrauth-function-init-fail.c |   5 +
 clang/test/CodeGen/ptrauth-function-init.c|  31 
 .../CodeGen/ptrauth-function-lvalue-cast.c|  23 +++
 clang/test/CodeGen/ptrauth-weak_import.c  |  10 ++
 clang/test/CodeGenCXX/ptrauth.cpp |  24 
 23 files changed, 633 insertions(+), 17 deletions(-)
 create mode 100644 clang/lib/CodeGen/CGPointerAuthInfo.h
 create mode 100644 clang/test/CodeGen/ptrauth-function-attributes.c
 create mode 100644 clang/test/CodeGen/ptrauth-function-init-fail.c
 create mode 100644 clang/test/CodeGen/ptrauth-function-init.c
 create mode 100644 clang/test/CodeGen/ptrauth-function-lvalue-cast.c
 create mode 100644 clang/test/CodeGen/ptrauth-weak_import.c
 create mode 100644 clang/test/CodeGenCXX/ptrauth.cpp

diff --git a/clang/include/clang/Basic/CodeGenOptions.h 
b/clang/include/clang/Basic/CodeGenOptions.h
index 9469a424045bb..502722a6ec4eb 100644
--- a/clang/include/clang/Basic/CodeGenOptions.h
+++ b/clang/include/clang/Basic/CodeGenOptions.h
@@ -13,6 +13,7 @@
 #ifndef LLVM_CLANG_BASIC_CODEGENOPTIONS_H
 #define LLVM_CLANG_BASIC_CODEGENOPTIONS_H
 
+#include "clang/Basic/PointerAuthOptions.h"
 #include "clang/Basic/Sanitizers.h"
 #include "clang/Basic/XRayInstr.h"
 #include "llvm/ADT/FloatingPointMode.h"
@@ -388,6 +389,9 @@ class CodeGenOptions : public CodeGenOptionsBase {
 
   std::vector Reciprocals;
 
+  /// Configuration for pointer-signing.
+  PointerAuthOptions PointerAuth;
+
   /// The preferred width for auto-vectorization transforms. This is intended 
to
   /// override default transforms based on the width of the architected vector
   /// registers.
diff --git a/clang/include/clang/Basic/DiagnosticDriverKinds.td 
b/clang/include/clang/Basic/DiagnosticDriverKinds.td
index 773b234cd68fe..6cbb0c8401c15 100644
--- a/clang/include/clang/Basic/DiagnosticDriverKinds.td
+++ b/clang/include/clang/Basic/DiagnosticDriverKinds.td
@@ -351,6 +351,9 @@ def err_drv_omp_host_ir_file_not_found : Error<
   "target regions but cannot be found">;
 def err_drv_omp_host_target_not_supported : Error<
   "target '%0' is not a supported OpenMP host target">;
+def err_drv_ptrauth_not_supported : Error<
+  "target '%0' does not support native pointer authentication">;
+
 def err_drv_expecting_fopenmp_with_fopenmp_targets : Error<
   "'-fopenmp-targets' must be used in conjunction with a '-fopenmp' option "
   "compatible with offloading; e.g., '-fopenmp=libomp' or 
'-fopenmp=libiomp5'">;
diff --git a/clang/include/clang/Basic/LangOptions.h 
b/clang/include/clang/Basic/LangOptions.h
index 75e88afbd9705..5216822e45b1b 100644
--- a/clang/include/clang/Basic/LangOptions.h
+++ b/clang/include/clang/Basic/LangOptions.h
@@ -346,6 +346,8 @@ class LangOptionsBase {
 BKey
   };
 
+  using PointerAuthenticationMode = ::clang::PointerAuthenticationMode;
+
   enum class ThreadModelKind {
 /// POSIX Threads.
 POSIX,
diff --git a/clang/include/clang/Basic/PointerAuthOptions.h 
b/clang/include/clang/Basic/PointerAuthOptions.h
index e5cdcc31ebfb7..32b179e3f9460 100644
--- a/clang/include/clang/Basic/PointerAuthOptions.h
+++ b/clang/include/clang/Basic/PointerAuthOptions.h
@@ -14,10 +14,146 @@
 #ifndef LLVM_CLANG_BASIC_POINTERAUTHOPTIONS_H
 #define LLVM_CLANG_BASIC_POINTERAUTHOPTIONS_H
 
+#include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
+#include "llvm/Support/ErrorHandling.h"
+#include "llvm/Target/TargetOptions.h"
+#include 
+#include 
+#include 
+#include 
+
 namespace clang {
 
 constexpr unsigned PointerAuthKeyNone = -1;
 
+class PointerAuthSchema {
+public:
+  enum class Kind : unsigned {
+

[llvm-branch-commits] [mlir] 8944c8d - Revert "[MLIR][Arith] add fastMathAttr on arith::extf and arith::truncf (#93443)"

2024-06-12 Thread via llvm-branch-commits

Author: Ivy Zhang
Date: 2024-06-13T11:12:39+08:00
New Revision: 8944c8df45f8e4da860bf04118106d9a950cbf75

URL: 
https://github.com/llvm/llvm-project/commit/8944c8df45f8e4da860bf04118106d9a950cbf75
DIFF: 
https://github.com/llvm/llvm-project/commit/8944c8df45f8e4da860bf04118106d9a950cbf75.diff

LOG: Revert "[MLIR][Arith] add fastMathAttr on arith::extf and arith::truncf 
(#93443)"

This reverts commit 6784bf764207d267b781b4f515a2fafdcb345509.

Added: 


Modified: 
mlir/include/mlir/Dialect/Arith/IR/ArithOps.td
mlir/lib/Dialect/Arith/IR/ArithOps.cpp
mlir/lib/Dialect/Arith/Transforms/EmulateUnsupportedFloats.cpp
mlir/lib/Dialect/Math/Transforms/LegalizeToF32.cpp
mlir/test/Conversion/ArithToLLVM/arith-to-llvm.mlir
mlir/test/Dialect/Arith/canonicalize.mlir
mlir/test/Dialect/Arith/emulate-unsupported-floats.mlir

Removed: 




diff  --git a/mlir/include/mlir/Dialect/Arith/IR/ArithOps.td 
b/mlir/include/mlir/Dialect/Arith/IR/ArithOps.td
index c4471f9bc5af2..06fbdb7f2c4cb 100644
--- a/mlir/include/mlir/Dialect/Arith/IR/ArithOps.td
+++ b/mlir/include/mlir/Dialect/Arith/IR/ArithOps.td
@@ -1199,7 +1199,7 @@ def Arith_ExtSIOp : Arith_IToICastOp<"extsi"> {
 // ExtFOp
 
//===--===//
 
-def Arith_ExtFOp : Arith_FToFCastOp<"extf", 
[DeclareOpInterfaceMethods]> {
+def Arith_ExtFOp : Arith_FToFCastOp<"extf"> {
   let summary = "cast from floating-point to wider floating-point";
   let description = [{
 Cast a floating-point value to a larger floating-point-typed value.
@@ -1208,13 +1208,6 @@ def Arith_ExtFOp : Arith_FToFCastOp<"extf", 
[DeclareOpInterfaceMethods:$fastmath);
-  let results = (outs FloatLike:$out);
-
-  let assemblyFormat = [{ $in (`fastmath` `` $fastmath^)?
-  attr-dict `:` type($in) `to` type($out) }];
 }
 
 
//===--===//
@@ -1253,11 +1246,8 @@ def Arith_TruncFOp :
 Arith_Op<"truncf",
   [Pure, SameOperandsAndResultShape, SameInputOutputTensorDims,
DeclareOpInterfaceMethods,
-   DeclareOpInterfaceMethods,
DeclareOpInterfaceMethods]>,
 Arguments<(ins FloatLike:$in,
-   DefaultValuedAttr<
-  Arith_FastMathAttr, 
"::mlir::arith::FastMathFlags::none">:$fastmath,
OptionalAttr:$roundingmode)>,
 Results<(outs FloatLike:$out)> {
   let summary = "cast from floating-point to narrower floating-point";
@@ -1277,9 +1267,7 @@ def Arith_TruncFOp :
 
   let hasFolder = 1;
   let hasVerifier = 1;
-  let assemblyFormat = [{ $in ($roundingmode^)?
-  (`fastmath` `` $fastmath^)?
-  attr-dict `:` type($in) `to` type($out) }];
+  let assemblyFormat = "$in ($roundingmode^)? attr-dict `:` type($in) `to` 
type($out)";
 }
 
 
//===--===//

diff  --git a/mlir/lib/Dialect/Arith/IR/ArithOps.cpp 
b/mlir/lib/Dialect/Arith/IR/ArithOps.cpp
index 291f6e5424ba5..2f6647a2a27b1 100644
--- a/mlir/lib/Dialect/Arith/IR/ArithOps.cpp
+++ b/mlir/lib/Dialect/Arith/IR/ArithOps.cpp
@@ -1390,20 +1390,6 @@ LogicalResult arith::ExtSIOp::verify() {
 /// Fold extension of float constants when there is no information loss due the
 /// 
diff erence in fp semantics.
 OpFoldResult arith::ExtFOp::fold(FoldAdaptor adaptor) {
-  if (auto truncFOp = getOperand().getDefiningOp()) {
-if (truncFOp.getOperand().getType() == getType()) {
-  arith::FastMathFlags truncFMF = truncFOp.getFastmath();
-  bool isTruncContract =
-  bitEnumContainsAll(truncFMF, arith::FastMathFlags::contract);
-  arith::FastMathFlags extFMF = getFastmath();
-  bool isExtContract =
-  bitEnumContainsAll(extFMF, arith::FastMathFlags::contract);
-  if (isTruncContract && isExtContract) {
-return truncFOp.getOperand();
-  }
-}
-  }
-
   auto resElemType = cast(getElementTypeOrSelf(getType()));
   const llvm::fltSemantics  = resElemType.getFloatSemantics();
   return constFoldCastOp(

diff  --git a/mlir/lib/Dialect/Arith/Transforms/EmulateUnsupportedFloats.cpp 
b/mlir/lib/Dialect/Arith/Transforms/EmulateUnsupportedFloats.cpp
index 8e1cb474feee7..4a50da3513f99 100644
--- a/mlir/lib/Dialect/Arith/Transforms/EmulateUnsupportedFloats.cpp
+++ b/mlir/lib/Dialect/Arith/Transforms/EmulateUnsupportedFloats.cpp
@@ -94,11 +94,8 @@ void EmulateFloatPattern::rewrite(Operation *op, 
ArrayRef operands,
   SmallVector newResults(expandedOp->getResults());
   for (auto [res, oldType, newType] : llvm::zip_equal(
MutableArrayRef{newResults}, op->getResultTypes(), resultTypes)) {
-if (oldType != newType) {
-  auto truncFOp = rewriter.create(loc, oldType, res);
-  truncFOp.setFastmath(arith::FastMathFlags::contract);
-  res = truncFOp.getResult();
-}
+   

[llvm-branch-commits] [llvm] [Support] Integrate SipHash.cpp into libSupport. (PR #94394)

2024-06-12 Thread Anton Korobeynikov via llvm-branch-commits

asl wrote:

@kbeyls There are (some) tests in the follow-up commit 
https://github.com/llvm/llvm-project/pull/93902/files#diff-8df159460fc7a128734566054df883f3192b1b261dc8eac667933b4042e9af5f

https://github.com/llvm/llvm-project/pull/94394
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [clang] Implement function pointer signing and authenticated function calls (PR #93906)

2024-06-12 Thread Anton Korobeynikov via llvm-branch-commits

asl wrote:

@ahatanak Looks like there are some conflicts that should be resolved

https://github.com/llvm/llvm-project/pull/93906
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-12 Thread via llvm-branch-commits


@@ -592,10 +599,15 @@ void preprocessUnreachableBlocks(FlowFunction ) {
 /// Decide if stale profile matching can be applied for a given function.
 /// Currently we skip inference for (very) large instances and for instances
 /// having "unexpected" control flow (e.g., having no sink basic blocks).
-bool canApplyInference(const FlowFunction ) {
+bool canApplyInference(const FlowFunction ,
+   const yaml::bolt::BinaryFunctionProfile ) {
   if (Func.Blocks.size() > opts::StaleMatchingMaxFuncSize)
 return false;
 
+  if ((double)Func.MatchedExecCount / YamlBF.ExecCount >=
+  opts::MatchedProfileThreshold / 100.0)
+return false;

WenleiHe wrote:

> For block-based matching, the threshold should be higher than 5%, perhaps 
> closer to a half?

Yes. The threshold of course needs to be tuned based on the heuristic chosen. I
just feel that a block-count-based threshold could be a better proxy for how
confident we are about the graph match and whether stale profile matching
should proceed. But I don't have a very strong opinion.

https://github.com/llvm/llvm-project/pull/95156
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] fc671bb - Revert "Bump the DWARF version number to 5 on Darwin. (#95164)"

2024-06-12 Thread via llvm-branch-commits

Author: Florian Mayer
Date: 2024-06-12T15:50:03-07:00
New Revision: fc671bbb1ceb94f8aac63bc0e4963e5894bc660e

URL: 
https://github.com/llvm/llvm-project/commit/fc671bbb1ceb94f8aac63bc0e4963e5894bc660e
DIFF: 
https://github.com/llvm/llvm-project/commit/fc671bbb1ceb94f8aac63bc0e4963e5894bc660e.diff

LOG: Revert "Bump the DWARF version number to 5 on Darwin. (#95164)"

This reverts commit 8f6acd973a38da6dce45faa676cbb51da37f72e5.

Added: 


Modified: 
clang/lib/Driver/ToolChains/Darwin.cpp
clang/test/Driver/debug-options.c

Removed: 




diff  --git a/clang/lib/Driver/ToolChains/Darwin.cpp 
b/clang/lib/Driver/ToolChains/Darwin.cpp
index ca75a622b061e..ed5737915aa96 100644
--- a/clang/lib/Driver/ToolChains/Darwin.cpp
+++ b/clang/lib/Driver/ToolChains/Darwin.cpp
@@ -1257,17 +1257,7 @@ unsigned DarwinClang::GetDefaultDwarfVersion() const {
   if ((isTargetMacOSBased() && isMacosxVersionLT(10, 11)) ||
   (isTargetIOSBased() && isIPhoneOSVersionLT(9)))
 return 2;
-  // Default to use DWARF 4 on OS X 10.11 - macOS 14 / iOS 9 - iOS 17.
-  if ((isTargetMacOSBased() && isMacosxVersionLT(15)) ||
-  (isTargetIOSBased() && isIPhoneOSVersionLT(18)) ||
-  (isTargetWatchOSBased() && TargetVersion < llvm::VersionTuple(11)) ||
-  (isTargetXROS() && TargetVersion < llvm::VersionTuple(2)) ||
-  (isTargetDriverKit() && TargetVersion < llvm::VersionTuple(24)) ||
-  (isTargetMacOSBased() &&
-   TargetVersion.empty()) || // apple-darwin, no version.
-  (TargetPlatform == llvm::Triple::BridgeOS))
-return 4;
-  return 5;
+  return 4;
 }
 
 void MachO::AddLinkRuntimeLib(const ArgList , ArgStringList ,

diff  --git a/clang/test/Driver/debug-options.c 
b/clang/test/Driver/debug-options.c
index 0a665f7017d63..07f6ca9e3902f 100644
--- a/clang/test/Driver/debug-options.c
+++ b/clang/test/Driver/debug-options.c
@@ -68,32 +68,7 @@
 // RUN: %clang -### -c -g %s -target x86_64-apple-driverkit19.0 2>&1 \
 // RUN: | FileCheck -check-prefix=G_STANDALONE \
 // RUN: -check-prefix=G_DWARF4 %s
-// RUN: %clang -### -c -g %s -target x86_64-apple-macosx15 2>&1 \
-// RUN: | FileCheck -check-prefix=G_STANDALONE \
-// RUN: -check-prefix=G_DWARF5 %s
-// RUN: %clang -### -c -g %s -target arm64-apple-ios17.0 2>&1 \
-// RUN: | FileCheck -check-prefix=G_STANDALONE \
-// RUN: -check-prefix=G_DWARF4 %s
-// RUN: %clang -### -c -g %s -target arm64-apple-ios18.0 2>&1 \
-// RUN: | FileCheck -check-prefix=G_STANDALONE \
-// RUN: -check-prefix=G_DWARF5 %s
-// RUN: %clang -### -c -g %s -target arm64_32-apple-watchos11 2>&1 \
-// RUN: | FileCheck -check-prefix=G_STANDALONE \
-// RUN: -check-prefix=G_DWARF5 %s
-// RUN: %clang -### -c -g %s -target arm64-apple-tvos18.0 2>&1 \
-// RUN: | FileCheck -check-prefix=G_STANDALONE \
-// RUN: -check-prefix=G_DWARF5 %s
-// RUN: %clang -### -c -g %s -target x86_64-apple-driverkit24.0 2>&1 \
-// RUN: | FileCheck -check-prefix=G_STANDALONE \
-// RUN: -check-prefix=G_DWARF5 %s
-// RUN: %clang -### -c -g %s -target arm64-apple-xros1 2>&1 \
-// RUN: | FileCheck -check-prefix=G_STANDALONE \
-// RUN: -check-prefix=G_DWARF4 %s
-// RUN: %clang -### -c -g %s -target arm64-apple-xros2 2>&1 \
-// RUN: | FileCheck -check-prefix=G_STANDALONE \
-// RUN: -check-prefix=G_DWARF5 %s
-//
-// RUN: %clang -### -c -fsave-optimization-record %s\
+// RUN: %clang -### -c -fsave-optimization-record %s \
 // RUN:-target x86_64-apple-darwin 2>&1 \
 // RUN: | FileCheck -check-prefix=GLTO_ONLY %s
 // RUN: %clang -### -c -g -fsave-optimization-record %s \



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [msan] Handle blendv intrinsics (PR #94882)

2024-06-12 Thread Vitaly Buka via llvm-branch-commits

https://github.com/vitalybuka closed 
https://github.com/llvm/llvm-project/pull/94882
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [msan] Handle blendv intrinsics (PR #94882)

2024-06-12 Thread Florian Mayer via llvm-branch-commits

https://github.com/fmayer approved this pull request.


https://github.com/llvm/llvm-project/pull/94882
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [msan] Handle blendv intrinsics (PR #94882)

2024-06-12 Thread Vitaly Buka via llvm-branch-commits

https://github.com/vitalybuka edited 
https://github.com/llvm/llvm-project/pull/94882
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [msan] Handle blendv intrinsics (PR #94882)

2024-06-12 Thread Vitaly Buka via llvm-branch-commits


@@ -3356,6 +3356,37 @@ struct MemorySanitizerVisitor : public 
InstVisitor {
 setOriginForNaryOp(I);
   }
 
+  Value *convertBlendvToSelectMask(IRBuilder<> , Value *C) {
+C = CreateAppToShadowCast(IRB, C);
+FixedVectorType *FVT = cast(C->getType());
+unsigned ElSize = FVT->getElementType()->getPrimitiveSizeInBits();
+C = IRB.CreateAShr(C, ElSize - 1);
+FVT = FixedVectorType::get(IRB.getInt1Ty(), FVT->getNumElements());
+return IRB.CreateTrunc(C, FVT);
+  }
+
+  // `blendv(f, t, c)` is effectively `select(c[top_bit], t, f)`.
+  void handleBlendvIntrinsic(IntrinsicInst ) {
+Value *C = I.getOperand(2);
+Value *T = I.getOperand(1);
+Value *F = I.getOperand(0);
+
+Value *Sc = getShadow(, 2);
+Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
+
+{
+  IRBuilder<> IRB();

vitalybuka wrote:

I think it's unimportant. The builder has nothing interesting in its destructor.

The `{}` is rather just to show that we don't need to care about the builders
conflicting.
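
As a side note on the quoted hunk: below is a minimal scalar sketch of the
equivalence stated in its comment -- `blendv(f, t, c)` picks `t` when the top
(sign) bit of the control element is set, which is what the `AShr` by
`ElSize - 1` plus the truncation to `i1` model. This is an illustration only,
not the MSan code, and the helper name is made up.

```cpp
#include <cassert>
#include <cstdint>

// Per-element semantics of blendv(f, t, c): select t when the top bit of the
// control element is set, otherwise f (i.e. select(c[top_bit], t, f)).
static int32_t blendvScalar(int32_t F, int32_t T, int32_t C) {
  bool TopBit = (static_cast<uint32_t>(C) >> 31) != 0; // isolate the sign bit
  return TopBit ? T : F;
}

int main() {
  assert(blendvScalar(1, 2, -1) == 2);         // sign bit set -> second arg
  assert(blendvScalar(1, 2, 0x7fffffff) == 1); // sign bit clear -> first arg
  return 0;
}
```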

https://github.com/llvm/llvm-project/pull/94882
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [msan] Handle blendv intrinsics (PR #94882)

2024-06-12 Thread Florian Mayer via llvm-branch-commits


@@ -3356,6 +3356,37 @@ struct MemorySanitizerVisitor : public 
InstVisitor {
 setOriginForNaryOp(I);
   }
 
+  Value *convertBlendvToSelectMask(IRBuilder<> , Value *C) {
+C = CreateAppToShadowCast(IRB, C);
+FixedVectorType *FVT = cast(C->getType());
+unsigned ElSize = FVT->getElementType()->getPrimitiveSizeInBits();
+C = IRB.CreateAShr(C, ElSize - 1);
+FVT = FixedVectorType::get(IRB.getInt1Ty(), FVT->getNumElements());
+return IRB.CreateTrunc(C, FVT);
+  }
+
+  // `blendv(f, t, c)` is effectively `select(c[top_bit], t, f)`.
+  void handleBlendvIntrinsic(IntrinsicInst ) {
+Value *C = I.getOperand(2);
+Value *T = I.getOperand(1);
+Value *F = I.getOperand(0);
+
+Value *Sc = getShadow(, 2);
+Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
+
+{
+  IRBuilder<> IRB();

fmayer wrote:

Why does it matter that this doesn't outlive `handleSelectLikeInst`? Because 
that also creates an IRBuilder? How does that work? That creates it from `` 
as well, which means these instructions get inserted before the ones here, 
right?

https://github.com/llvm/llvm-project/pull/94882
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [msan] Handle blendv intrinsics (PR #94882)

2024-06-12 Thread Vitaly Buka via llvm-branch-commits

vitalybuka wrote:

ping

https://github.com/llvm/llvm-project/pull/94882
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-12 Thread Amir Ayupov via llvm-branch-commits


@@ -592,10 +599,15 @@ void preprocessUnreachableBlocks(FlowFunction ) {
 /// Decide if stale profile matching can be applied for a given function.
 /// Currently we skip inference for (very) large instances and for instances
 /// having "unexpected" control flow (e.g., having no sink basic blocks).
-bool canApplyInference(const FlowFunction ) {
+bool canApplyInference(const FlowFunction ,
+   const yaml::bolt::BinaryFunctionProfile ) {
   if (Func.Blocks.size() > opts::StaleMatchingMaxFuncSize)
 return false;
 
+  if ((double)Func.MatchedExecCount / YamlBF.ExecCount >=
+  opts::MatchedProfileThreshold / 100.0)
+return false;

aaupov wrote:

It's a tricky question how to define the cutoff in terms of sufficient matching.
I first thought of defining a block count based cutoff (if we matched >5% of 
blocks, proceed with matching), but then what if these are cold blocks covering 
<1% of exec count? In this case we'd end up guessing/propagating most samples.

For block-based matching, the threshold should be higher than 5%, perhaps 
closer to a half? For exec count based matching, I'd feel comfortable with 5% 
as threshold.

https://github.com/llvm/llvm-project/pull/95156
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-12 Thread via llvm-branch-commits

WenleiHe wrote:

cc @wlei-llvm 

https://github.com/llvm/llvm-project/pull/95156
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-12 Thread via llvm-branch-commits


@@ -592,10 +599,15 @@ void preprocessUnreachableBlocks(FlowFunction ) {
 /// Decide if stale profile matching can be applied for a given function.
 /// Currently we skip inference for (very) large instances and for instances
 /// having "unexpected" control flow (e.g., having no sink basic blocks).
-bool canApplyInference(const FlowFunction ) {
+bool canApplyInference(const FlowFunction ,
+   const yaml::bolt::BinaryFunctionProfile ) {
   if (Func.Blocks.size() > opts::StaleMatchingMaxFuncSize)
 return false;
 
+  if ((double)Func.MatchedExecCount / YamlBF.ExecCount >=
+  opts::MatchedProfileThreshold / 100.0)
+return false;

WenleiHe wrote:

Trying to understand the rationale behind using dynamic counts to determine 
whether profile inference is safe. 

The way I see it, we have two graphs that we try to match; if many nodes in the
graph have an exact match, chances are higher that we can infer the correct
match for the rest of the nodes. With that, we care more about how many nodes
we can match statically.

Say we have 5 blocks with a count distribution of 1M, 1K, 1K, 1K, 1K. If we
have an exact match for the four 1K nodes (80% exact match), we should feel
reasonably confident about inferring the remaining node, even though if we look
at counts, we have an exact match for only <1%.
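
A quick sketch of that arithmetic (illustrative only, not BOLT code):

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
  struct Block { uint64_t ExecCount; bool MatchedExactly; };
  // Count distribution 1M, 1K, 1K, 1K, 1K; the four 1K blocks match exactly.
  std::vector<Block> Blocks = {{1000000, false},
                               {1000, true}, {1000, true},
                               {1000, true}, {1000, true}};
  uint64_t MatchedBlocks = 0, MatchedCount = 0, TotalCount = 0;
  for (const Block &B : Blocks) {
    TotalCount += B.ExecCount;
    if (B.MatchedExactly) {
      ++MatchedBlocks;
      MatchedCount += B.ExecCount;
    }
  }
  // Prints roughly: block match 80.0%, count match 0.40%.
  std::printf("block match %.1f%%, count match %.2f%%\n",
              100.0 * MatchedBlocks / Blocks.size(),
              100.0 * MatchedCount / TotalCount);
  return 0;
}
```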

WDYT?

https://github.com/llvm/llvm-project/pull/95156
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-12 Thread via llvm-branch-commits


@@ -59,6 +59,8 @@ struct FlowFunction {
   /// The index of the entry block.
   uint64_t Entry{0};
   uint64_t Sink{UINT64_MAX};
+  // Matched execution count for the function.
+  uint64_t MatchedExecCount{0};

WenleiHe wrote:

nit: I'd be careful about adding this to `FlowFunction` -- strictly speaking
this doesn't belong to the flow function, which just describes the CFG, and if
we add function-level "attributes" to flow functions, we'd end up with a lot
more here.
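
One possible shape of that alternative, as a sketch (the types below are
simplified stand-ins, not the actual BOLT definitions): keep `FlowFunction` a
pure CFG description and pass the function-level matching statistics alongside
it rather than inside it.

```cpp
#include <cstdint>
#include <vector>

// Simplified stand-ins for illustration; the real definitions live in BOLT.
struct FlowBlock { uint64_t Weight; };
struct FlowFunction { std::vector<FlowBlock> Blocks; }; // CFG description only

// Function-level matching statistics kept out of FlowFunction.
struct StaleMatchingStats {
  uint64_t MatchedExecCount; // sum of counts of exactly-matched blocks
};

// The stats travel next to the FlowFunction instead of being a member of it.
static bool hasAnyMatch(const FlowFunction &Func, const StaleMatchingStats &S) {
  return !Func.Blocks.empty() && S.MatchedExecCount > 0;
}

int main() {
  FlowFunction Func;
  Func.Blocks = {{100}, {1}};
  StaleMatchingStats Stats{1};
  return hasAnyMatch(Func, Stats) ? 0 : 1;
}
```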

https://github.com/llvm/llvm-project/pull/95156
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-12 Thread via llvm-branch-commits


@@ -592,10 +599,15 @@ void preprocessUnreachableBlocks(FlowFunction ) {
 /// Decide if stale profile matching can be applied for a given function.
 /// Currently we skip inference for (very) large instances and for instances
 /// having "unexpected" control flow (e.g., having no sink basic blocks).
-bool canApplyInference(const FlowFunction ) {
+bool canApplyInference(const FlowFunction ,

WenleiHe wrote:

Header comment needs update.

https://github.com/llvm/llvm-project/pull/95156
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Drop high discrepancy profiles in matching (PR #95156)

2024-06-12 Thread via llvm-branch-commits


@@ -614,6 +614,17 @@
 
 - `--lite-threshold-pct=`
 
+  Threshold (in percent) of matched profile at which stale profile inference is
+  applied to functions. Argument corresponds to the sum of matched execution
+  counts of function blocks divided by the sum of execution counts of function
+  blocks. E.g if the sum of a function blocks' execution counts is 100, the sum
+  of the function blocks' matched execution counts is 10, and the argument is 15
+  (15%), profile inference will not be applied to that function. A higher
+  threshold will correlate with fewer functions to process in cases of stale
+  profile. Default set to %5.

WenleiHe wrote:

nit: this is too verbose a description. As you can see, it's longer than most 
of the other descriptions. :) 

https://github.com/llvm/llvm-project/pull/95156
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [libcxx] [libc++] Implement std::move_only_function (P0288R9) (PR #94670)

2024-06-12 Thread via llvm-branch-commits

https://github.com/EricWF edited https://github.com/llvm/llvm-project/pull/94670
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] fe30a73 - Revert "[CLANG][DWARF] Handle DIE offset collision in DW_IDX_parent (#95039)"

2024-06-12 Thread via llvm-branch-commits

Author: Florian Mayer
Date: 2024-06-12T13:25:52-07:00
New Revision: fe30a734628b3028c086ce016b6f80440172f34f

URL: https://github.com/llvm/llvm-project/commit/fe30a734628b3028c086ce016b6f80440172f34f
DIFF: https://github.com/llvm/llvm-project/commit/fe30a734628b3028c086ce016b6f80440172f34f.diff

LOG: Revert "[CLANG][DWARF] Handle DIE offset collision in DW_IDX_parent (#95039)"

This reverts commit f59d9d538c7b580a93bee4afba0f098f7ddf09d9.

Added: 


Modified: 
llvm/include/llvm/CodeGen/AccelTable.h
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
llvm/lib/DWARFLinker/Classic/DWARFLinker.cpp
llvm/lib/DWARFLinker/Parallel/DWARFLinkerImpl.cpp

Removed: 
llvm/test/DebugInfo/X86/debug-names-types-die-offset-collision.ll



diff --git a/llvm/include/llvm/CodeGen/AccelTable.h b/llvm/include/llvm/CodeGen/AccelTable.h
index 622fcf019aad6..cff8fcbaf2cd7 100644
--- a/llvm/include/llvm/CodeGen/AccelTable.h
+++ b/llvm/include/llvm/CodeGen/AccelTable.h
@@ -257,38 +257,18 @@ class AppleAccelTableData : public AccelTableData {
 
 /// Helper class to identify an entry in DWARF5AccelTable based on their DIE
 /// offset and UnitID.
-struct OffsetAndUnitID {
-  uint64_t Offset = 0;
-  uint32_t UnitID = 0;
-  bool IsTU = false;
-  OffsetAndUnitID() = default;
-  OffsetAndUnitID(uint64_t Offset, uint32_t UnitID, bool IsTU)
-  : Offset(Offset), UnitID(UnitID), IsTU(IsTU) {}
-  uint64_t offset() const { return Offset; };
-  uint32_t unitID() const { return UnitID; };
-  bool isTU() const { return IsTU; }
-};
+struct OffsetAndUnitID : std::pair<uint64_t, uint32_t> {
+  using Base = std::pair<uint64_t, uint32_t>;
+  OffsetAndUnitID(Base B) : Base(B) {}
 
-template <> struct DenseMapInfo<OffsetAndUnitID> {
-  static inline OffsetAndUnitID getEmptyKey() {
-OffsetAndUnitID Entry;
-Entry.Offset = uint64_t(-1);
-return Entry;
-  }
-  static inline OffsetAndUnitID getTombstoneKey() {
-OffsetAndUnitID Entry;
-Entry.Offset = uint64_t(-2);
-return Entry;
-  }
-  static unsigned getHashValue(const OffsetAndUnitID &Val) {
-return (unsigned)llvm::hash_combine(Val.offset(), Val.unitID(), Val.IsTU);
-  }
-  static bool isEqual(const OffsetAndUnitID &LHS, const OffsetAndUnitID &RHS) {
-return LHS.offset() == RHS.offset() && LHS.unitID() == RHS.unitID() &&
-   LHS.IsTU == RHS.isTU();
-  }
+  OffsetAndUnitID(uint64_t Offset, uint32_t UnitID) : Base(Offset, UnitID) {}
+  uint64_t offset() const { return first; };
+  uint32_t unitID() const { return second; };
 };
 
+template <>
+struct DenseMapInfo<OffsetAndUnitID> : DenseMapInfo<OffsetAndUnitID::Base> {};
+
 /// The Data class implementation for DWARF v5 accelerator table. Unlike the
 /// Apple Data classes, this class is just a DIE wrapper, and does not know to
 /// serialize itself. The complete serialization logic is in the
@@ -297,11 +277,12 @@ class DWARF5AccelTableData : public AccelTableData {
 public:
   static uint32_t hash(StringRef Name) { return caseFoldingDjbHash(Name); }
 
-  DWARF5AccelTableData(const DIE &Die, const uint32_t UnitID, const bool IsTU);
+  DWARF5AccelTableData(const DIE &Die, const uint32_t UnitID,
+                       const bool IsTU = false);
   DWARF5AccelTableData(const uint64_t DieOffset,
                        const std::optional<uint64_t> DefiningParentOffset,
                        const unsigned DieTag, const unsigned UnitID,
-                       const bool IsTU)
+                       const bool IsTU = false)
   : OffsetVal(DieOffset), ParentOffset(DefiningParentOffset),
 DieTag(DieTag), AbbrevNumber(0), IsTU(IsTU), UnitID(UnitID) {}
 
@@ -315,7 +296,7 @@ class DWARF5AccelTableData : public AccelTableData {
   }
 
   OffsetAndUnitID getDieOffsetAndUnitID() const {
-return {getDieOffset(), getUnitID(), isTU()};
+return {getDieOffset(), UnitID};
   }
 
   unsigned getDieTag() const { return DieTag; }
@@ -341,7 +322,7 @@ class DWARF5AccelTableData : public AccelTableData {
 assert(isNormalized() && "Accessing DIE Offset before normalizing.");
 if (!ParentOffset)
   return std::nullopt;
-return OffsetAndUnitID(*ParentOffset, getUnitID(), isTU());
+return OffsetAndUnitID(*ParentOffset, getUnitID());
   }
 
   /// Sets AbbrevIndex for an Entry.
@@ -435,7 +416,7 @@ class DWARF5AccelTable : public AccelTable<DWARF5AccelTableData> {
   for (auto *Data : Entry.second.getValues()) {
 addName(Entry.second.Name, Data->getDieOffset(),
 Data->getParentDieOffset(), Data->getDieTag(),
-Data->getUnitID(), Data->isTU());
+Data->getUnitID(), true);
   }
 }
   }

diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
index 7de9432325d8a..b9c02aed848cc 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
@@ -3592,8 +3592,7 @@ void DwarfDebug::addAccelNameImpl(
"Kind is TU but CU is being processed.");
 // The type unit can be discarded, so need to 

[llvm-branch-commits] [workflows] Fix version-check.yml to work with the new minor release bump (PR #95296)

2024-06-12 Thread Vitaly Buka via llvm-branch-commits

https://github.com/vitalybuka closed 
https://github.com/llvm/llvm-project/pull/95296
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] Use rc version suffix (PR #95295)

2024-06-12 Thread Vitaly Buka via llvm-branch-commits

https://github.com/vitalybuka closed 
https://github.com/llvm/llvm-project/pull/95295
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [workflows] Fix version-check.yml to work with the new minor release bump (PR #95296)

2024-06-12 Thread Vitaly Buka via llvm-branch-commits

https://github.com/vitalybuka created 
https://github.com/llvm/llvm-project/pull/95296

(cherry picked from commit d5e69147b9d261bd53b4dd027f17131677be8613)



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] Use rc version suffix (PR #95295)

2024-06-12 Thread Vitaly Buka via llvm-branch-commits

https://github.com/vitalybuka created 
https://github.com/llvm/llvm-project/pull/95295

None


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [clang] Implement function pointer signing and authenticated function calls (PR #93906)

2024-06-12 Thread Akira Hatanaka via llvm-branch-commits

https://github.com/ahatanak updated 
https://github.com/llvm/llvm-project/pull/93906

>From 0e85001f6d53e63beca77a76eaba1875ec84000d Mon Sep 17 00:00:00 2001
From: Akira Hatanaka 
Date: Fri, 24 May 2024 20:23:36 -0700
Subject: [PATCH 1/4] [clang] Implement function pointer signing.

Co-Authored-By: John McCall 
---
 clang/include/clang/Basic/CodeGenOptions.h|   4 +
 .../clang/Basic/DiagnosticDriverKinds.td  |   3 +
 clang/include/clang/Basic/LangOptions.h   |   2 +
 .../include/clang/Basic/PointerAuthOptions.h  | 136 ++
 .../clang/Frontend/CompilerInvocation.h   |  10 ++
 clang/lib/CodeGen/CGBuiltin.cpp   |   3 +-
 clang/lib/CodeGen/CGCall.cpp  |   3 +
 clang/lib/CodeGen/CGCall.h|  28 +++-
 clang/lib/CodeGen/CGExpr.cpp  |  17 +--
 clang/lib/CodeGen/CGExprConstant.cpp  |  19 ++-
 clang/lib/CodeGen/CGPointerAuth.cpp   |  51 +++
 clang/lib/CodeGen/CGPointerAuthInfo.h |  96 +
 clang/lib/CodeGen/CodeGenFunction.cpp |  58 
 clang/lib/CodeGen/CodeGenFunction.h   |  10 ++
 clang/lib/CodeGen/CodeGenModule.h |  34 +
 clang/lib/Frontend/CompilerInvocation.cpp |  36 +
 clang/lib/Headers/ptrauth.h   |  34 +
 .../CodeGen/ptrauth-function-attributes.c |  13 ++
 .../test/CodeGen/ptrauth-function-init-fail.c |   5 +
 clang/test/CodeGen/ptrauth-function-init.c|  31 
 .../CodeGen/ptrauth-function-lvalue-cast.c|  23 +++
 clang/test/CodeGen/ptrauth-weak_import.c  |  10 ++
 clang/test/CodeGenCXX/ptrauth.cpp |  24 
 23 files changed, 633 insertions(+), 17 deletions(-)
 create mode 100644 clang/lib/CodeGen/CGPointerAuthInfo.h
 create mode 100644 clang/test/CodeGen/ptrauth-function-attributes.c
 create mode 100644 clang/test/CodeGen/ptrauth-function-init-fail.c
 create mode 100644 clang/test/CodeGen/ptrauth-function-init.c
 create mode 100644 clang/test/CodeGen/ptrauth-function-lvalue-cast.c
 create mode 100644 clang/test/CodeGen/ptrauth-weak_import.c
 create mode 100644 clang/test/CodeGenCXX/ptrauth.cpp

diff --git a/clang/include/clang/Basic/CodeGenOptions.h b/clang/include/clang/Basic/CodeGenOptions.h
index 9469a424045bb..502722a6ec4eb 100644
--- a/clang/include/clang/Basic/CodeGenOptions.h
+++ b/clang/include/clang/Basic/CodeGenOptions.h
@@ -13,6 +13,7 @@
 #ifndef LLVM_CLANG_BASIC_CODEGENOPTIONS_H
 #define LLVM_CLANG_BASIC_CODEGENOPTIONS_H
 
+#include "clang/Basic/PointerAuthOptions.h"
 #include "clang/Basic/Sanitizers.h"
 #include "clang/Basic/XRayInstr.h"
 #include "llvm/ADT/FloatingPointMode.h"
@@ -388,6 +389,9 @@ class CodeGenOptions : public CodeGenOptionsBase {
 
   std::vector<std::string> Reciprocals;
 
+  /// Configuration for pointer-signing.
+  PointerAuthOptions PointerAuth;
+
   /// The preferred width for auto-vectorization transforms. This is intended to
   /// override default transforms based on the width of the architected vector
   /// registers.
diff --git a/clang/include/clang/Basic/DiagnosticDriverKinds.td b/clang/include/clang/Basic/DiagnosticDriverKinds.td
index 773b234cd68fe..6cbb0c8401c15 100644
--- a/clang/include/clang/Basic/DiagnosticDriverKinds.td
+++ b/clang/include/clang/Basic/DiagnosticDriverKinds.td
@@ -351,6 +351,9 @@ def err_drv_omp_host_ir_file_not_found : Error<
   "target regions but cannot be found">;
 def err_drv_omp_host_target_not_supported : Error<
   "target '%0' is not a supported OpenMP host target">;
+def err_drv_ptrauth_not_supported : Error<
+  "target '%0' does not support native pointer authentication">;
+
 def err_drv_expecting_fopenmp_with_fopenmp_targets : Error<
   "'-fopenmp-targets' must be used in conjunction with a '-fopenmp' option "
   "compatible with offloading; e.g., '-fopenmp=libomp' or 
'-fopenmp=libiomp5'">;
diff --git a/clang/include/clang/Basic/LangOptions.h b/clang/include/clang/Basic/LangOptions.h
index 75e88afbd9705..5216822e45b1b 100644
--- a/clang/include/clang/Basic/LangOptions.h
+++ b/clang/include/clang/Basic/LangOptions.h
@@ -346,6 +346,8 @@ class LangOptionsBase {
 BKey
   };
 
+  using PointerAuthenticationMode = ::clang::PointerAuthenticationMode;
+
   enum class ThreadModelKind {
 /// POSIX Threads.
 POSIX,
diff --git a/clang/include/clang/Basic/PointerAuthOptions.h b/clang/include/clang/Basic/PointerAuthOptions.h
index e5cdcc31ebfb7..32b179e3f9460 100644
--- a/clang/include/clang/Basic/PointerAuthOptions.h
+++ b/clang/include/clang/Basic/PointerAuthOptions.h
@@ -14,10 +14,146 @@
 #ifndef LLVM_CLANG_BASIC_POINTERAUTHOPTIONS_H
 #define LLVM_CLANG_BASIC_POINTERAUTHOPTIONS_H
 
+#include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
+#include "llvm/Support/ErrorHandling.h"
+#include "llvm/Target/TargetOptions.h"
+#include 
+#include 
+#include 
+#include 
+
 namespace clang {
 
 constexpr unsigned PointerAuthKeyNone = -1;
 
+class PointerAuthSchema {
+public:
+  enum class Kind : unsigned {
+

[llvm-branch-commits] [libcxx] Mark test as long_tests (PR #95266)

2024-06-12 Thread Vitaly Buka via llvm-branch-commits

vitalybuka wrote:

Thanks! Abandoning.

https://github.com/llvm/llvm-project/pull/95266
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [libcxx] Mark test as long_tests (PR #95266)

2024-06-12 Thread Vitaly Buka via llvm-branch-commits

https://github.com/vitalybuka closed 
https://github.com/llvm/llvm-project/pull/95266
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


  1   2   3   4   5   6   7   8   9   10   >