Bug#1070446: rocm-hipamd: arm64 FTBFS with glibc 2.38
Tags: patch

Hi Graham,

On 2024-05-05 07:31, Graham Inggs wrote:

As can be seen in reproducible builds [1], rocm-hipamd FTBFS on arm64 with glibc 2.38. I've copied what I hope is the relevant part of the log below. A bug was filed against glibc [2], but it seems glibc upstream do not consider it a bug in glibc.

There is nothing that can be done in rocm-hipamd to address this bug, aside from removing arm64 from the rocm-hipamd architecture list. The incompatibility is not with the HIP runtime, but with the HIP language. This is a disagreement that glibc and llvm will need to resolve between themselves.

[1] https://tests.reproducible-builds.org/debian/rb-pkg/rocm-hipamd.html
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=30909

In file included from /tmp/hip_pch.724714/hip_pch.h:1:
In file included from /build/reproducible-path/rocm-hipamd-5.7.1/hip/include/hip/hip_runtime.h:62:
In file included from /build/reproducible-path/rocm-hipamd-5.7.1/hipamd/include/hip/amd_detail/amd_hip_runtime.h:76:
In file included from /usr/lib/gcc/aarch64-linux-gnu/13/../../../../include/c++/13/cmath:47:
In file included from /usr/include/math.h:40:
/usr/include/aarch64-linux-gnu/bits/math-vector.h:40:9: error: unknown type name '__SVFloat32_t'
   40 | typedef __SVFloat32_t __sv_f32_t;
      |         ^
/usr/include/aarch64-linux-gnu/bits/math-vector.h:41:9: error: unknown type name '__SVFloat64_t'
   41 | typedef __SVFloat64_t __sv_f64_t;
      |         ^
/usr/include/aarch64-linux-gnu/bits/math-vector.h:42:9: error: unknown type name '__SVBool_t'
   42 | typedef __SVBool_t __sv_bool_t;
      |         ^

This compilation error occurs when building device code when the host architecture is aarch64. LLVM only defines __SVFloat32_t, __SVFloat64_t and __SVBool_t when building host code, but not when building device code. To me this seems reasonable because GPUs do not support SVE instructions. However, the math.h header (on aarch64 at least) is not aware of the distinction between host code and device code. As such, it fails when compiling device code.

The glibc argument is that GCC always supports these types, but I'm not convinced. I'm curious how GCC handles the math headers for OpenMP GPU offloading [3].

In any case, I've attached a patch for glibc that would fix this bug. Perhaps my suggestion would be more palatable to upstream than the previously rejected patch. If not, it's up to glibc or LLVM to find a solution. If they cannot, then we will have to drop arm64 support for the HIP language.

Sincerely,
Cory Bloor

[3]: https://gcc.gnu.org/wiki/Offloading

From: Cordell Bloor
Date: Wed, 10 Apr 2024 16:49:24 -0600
Subject: [PATCH] arm64/math-vec.h: drop SVE vector types in device code

These headers get included when building HIP libraries on the aarch64 platform. The headers are used when building both CPU code and GPU code, but the SVE vector types are not supported on the GPU. The clang compiler sets __HIP_DEVICE_COMPILE__ when it is building code for the GPU, so disable __SVE_VEC_MATH_SUPPORTED when that macro is detected.

Bug-Debian: https://bugs.debian.org/1070446
Bug-Ubuntu: https://bugs.launchpad.net/glibc/+bug/2032624
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30909
Forwarded: No

Index: glibc/sysdeps/aarch64/fpu/bits/math-vector.h
===
--- glibc.orig/sysdeps/aarch64/fpu/bits/math-vector.h
+++ glibc/sysdeps/aarch64/fpu/bits/math-vector.h
@@ -101,7 +101,8 @@ typedef __attribute__ ((__neon_vector_ty
 typedef __attribute__ ((__neon_vector_type__ (2))) double __f64x2_t;
 #endif
 
-#if __GNUC_PREREQ(10, 0) || __glibc_clang_prereq(11, 0)
+#if (__GNUC_PREREQ(10, 0) || __glibc_clang_prereq(11, 0)) \
+    && !defined(__HIP_DEVICE_COMPILE__)
 # define __SVE_VEC_MATH_SUPPORTED
 typedef __SVFloat32_t __sv_f32_t;
 typedef __SVFloat64_t __sv_f64_t;
Bug#1064730: stdgpu: FTBFS: type_traits.h:736:1: error: expected type-specifier before ‘template’
Hi Timo,

On Sat, 2 Mar 2024 09:21:43 +0100 Timo Röhling wrote:
> > On Sun, 25 Feb 2024 20:28:53 +0100 Lucas Nussbaum wrote:
> > > /usr/include/thrust/detail/type_traits.h:736:1: error: expected
> > > type-specifier before ‘template’
>
> This bug is caused by a #ifdef cascade for different
> THRUST_DEVICE_SYSTEM values, which sadly no longer works with
> THRUST_DEVICE_SYSTEM_OMP. This makes it effectively impossible to
> build the HIP backend and the OpenMP backend from the same source.

Am I understanding correctly that this was broken in a rocthrust update? Should this be treated as a rocthrust bug? [1]

Sincerely,
Cory Bloor

[1]: https://bugs.debian.org/1064730
Bug#1067956: rocalution: FTBFS on armhf (test failure with memory allocation)
Control: severity 1067956 important

The rocalution package has never successfully built for armhf, so I don't think this qualifies as release-critical.

It's great to see that the rocalution package gets all the way into the tests before failing, though. The upstream project only officially supports amd64, so that's better than I was expecting.

The tests should probably skip anything requiring more than ~2 GB of memory when running on 32-bit architectures. Patches are welcome.

Sincerely,
Cory Bloor
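
The skip rule suggested above can be sketched as follows. This is only an illustration of the decision logic in Python, not a patch for rocalution: the real test suite is C++, and every name here (`MAX_TEST_BYTES_32BIT`, `pointer_bits`, `should_skip`, `LargeAllocationTest`) is hypothetical. The 2 GiB threshold is an assumption taken from the "~2 GB" figure in the message.

```python
import struct
import unittest

# Assumed threshold, based on the "~2 GB" figure mentioned in the message.
MAX_TEST_BYTES_32BIT = 2 * 1024**3


def pointer_bits():
    """Return the pointer width of the running interpreter in bits."""
    return struct.calcsize("P") * 8


def should_skip(required_bytes, bits=None):
    """Decide whether a test needing `required_bytes` should be skipped.

    `bits` defaults to the pointer width of the current process; it can be
    passed explicitly to check the rule for another architecture.
    """
    if bits is None:
        bits = pointer_bits()
    return bits == 32 and required_bytes > MAX_TEST_BYTES_32BIT


class LargeAllocationTest(unittest.TestCase):
    # Hypothetical memory requirement for one large test case.
    REQUIRED_BYTES = 3 * 1024**3

    def test_large_matrix(self):
        if should_skip(self.REQUIRED_BYTES):
            self.skipTest("needs more than ~2 GB; skipped on 32-bit")
        # The actual test body would go here.
```

In a GoogleTest-based suite the equivalent would be a pointer-size check followed by `GTEST_SKIP()`; whether that fits the way the rocalution tests are organised would need to be checked against the upstream sources.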
Bug#1067356: hipsolver: FTBFS: make[1]: *** [debian/rules:17: override_dh_auto_configure-arch] Error 2
Control: reassign 1067356 libamdhip64-dev 5.7.1-1
Control: affects 1067356 hipsolver
Control: fixed 1067356 5.7.1-2

On 2024-03-20 15:00, Lucas Nussbaum wrote:

During a rebuild of all packages in sid, your package failed to build on amd64.

Relevant part (hopefully):

make[1]: Entering directory '/<>'
dh_auto_configure -- -DCMAKE_BUILD_TYPE=Release -DROCM_SYMLINK_LIBS=OFF -DBUILD_FILE_REORG_BACKWARD_COMPATIBILITY=OFF -DBUILD_CLIENTS_TESTS=ON
cd obj-x86_64-linux-gnu && DEB_PYTHON_INSTALL_LAYOUT=deb cmake -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_BUILD_TYPE=None -DCMAKE_INSTALL_SYSCONFDIR=/etc -DCMAKE_INSTALL_LOCALSTATEDIR=/var -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON -DCMAKE_FIND_USE_PACKAGE_REGISTRY=OFF -DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY=ON -DFETCHCONTENT_FULLY_DISCONNECTED=ON -DCMAKE_INSTALL_RUNSTATEDIR=/run -DCMAKE_SKIP_INSTALL_ALL_DEPENDENCY=ON "-GUnix Makefiles" -DCMAKE_VERBOSE_MAKEFILE=ON -DCMAKE_INSTALL_LIBDIR=lib/x86_64-linux-gnu -DCMAKE_BUILD_TYPE=Release -DROCM_SYMLINK_LIBS=OFF -DBUILD_FILE_REORG_BACKWARD_COMPATIBILITY=OFF -DBUILD_CLIENTS_TESTS=ON ..
Re-run cmake no build system arguments
-- The CXX compiler identification is GNU 13.2.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- The Fortran compiler identification is GNU 13.2.0
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /usr/bin/gfortran - skipped
CMake Error at /usr/lib/x86_64-linux-gnu/cmake/hip/hip-config.cmake:170 (message):
  Unexpected HIP_PLATFORM:
Call Stack (most recent call first):
  CMakeLists.txt:145 (find_package)

Thanks Lucas. I uploaded a fix for this yesterday, but it was too late for this build.

Sincerely,
Cory Bloor
Bug#1042036: rocblas: FTBFS: AttributeError: 'KernelWriterAssembly' object has no attribute 'language'
Thanks Lucas,

On 2023-07-25 14:56, Lucas Nussbaum wrote:

# Writing Kernels...
Generating kernels: Launching 8 threads...
Traceback (most recent call last):
  File "/<>/tensile/Tensile/Parallel.py", line 54, in apply_print_exception
    return func(*args)
           ^^^
  File "/<>/tensile/Tensile/TensileCreateLibrary.py", line 67, in processKernelSource
    header = kernelWriter.getHeaderFileString(kernel)
  File "/<>/tensile/Tensile/KernelWriter.py", line 5065, in getHeaderFileString
    if self.language == "HIP" or self.language == "OCL":
       ^
AttributeError: 'KernelWriterAssembly' object has no attribute 'language'
Custom kernel filename /<>/obj-x86_64-linux-gnu/library/src/build_tmp/TENSILE/assembly/DGEMM_Aldebaran_NN_MT128x128x16_MI16x16x4x1_GRVW2_SU4_SUS128_WGM4.s
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
           ^
  File "/<>/tensile/Tensile/Parallel.py", line 54, in apply_print_exception
    return func(*args)
           ^^^
  File "/<>/tensile/Tensile/TensileCreateLibrary.py", line 67, in processKernelSource
    header = kernelWriter.getHeaderFileString(kernel)
  File "/<>/tensile/Tensile/KernelWriter.py", line 5065, in getHeaderFileString
    if self.language == "HIP" or self.language == "OCL":
       ^
AttributeError: 'KernelWriterAssembly' object has no attribute 'language'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/<>/tensile/Tensile/bin/TensileCreateLibrary", line 43, in
    TensileCreateLibrary()
  File "/<>/tensile/Tensile/TensileCreateLibrary.py", line 1303, in TensileCreateLibrary
    codeObjectFiles = writeSolutionsAndKernels(outputPath, CxxCompiler, None, solutions,
                      ^^
  File "/<>/tensile/Tensile/TensileCreateLibrary.py", line 482, in writeSolutionsAndKernels
    results = Common.ParallelMap(processKernelSource, kIter, "Generating kernels", method=lambda x: x.starmap, maxTasksPerChild=1)
  File "/<>/tensile/Tensile/Parallel.py", line 134, in ParallelMap
    rv = mapFunc(function, objects)
         ^^
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 375, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
           ^
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 774, in get
    raise self._value
AttributeError: 'KernelWriterAssembly' object has no attribute 'language'
make[3]: *** [library/src/CMakeFiles/TENSILE_LIBRARY_TARGET.dir/build.make:92: Tensile/library/TensileLibrary.dat] Error 1

This build failure is non-deterministic. I've seen it before, but I had thought it only occurred when specifying the AMDGPU_TARGETS property in the rocBLAS build. It seems it can occur even without that. It may be that specifying a reduced set of AMDGPU_TARGETS merely increases the probability of failure.

The missing language attribute is an indication that the KernelWriterAssembly object was not initialized before it was used. I have never seen this when building the upstream project, so I suspect that this is related to the removal of the replacement kernels that were excluded on DFSG grounds during Debian packaging.

I am suspicious that this build failure is just one symptom and that the test failures that we see on gfx900 and gfx906 architectures may also be caused by incorrectly generated assembly related to the replacement kernels.

We could run a test build with the replacement kernels restored to verify if this is the case. Even if the replacement kernels cannot be packaged in Debian, a local build with them restored may help us to confirm or falsify my theory as to the cause of this failure. We can also take a look at the YAML specification that drives the generation of DGEMM_Aldebaran_NN_MT128x128x16_MI16x16x4x1_GRVW2_SU4_SUS128_WGM4.s.
A scorched-earth approach to dealing with this issue would be to delete the YAML of problematic assembly kernels until the rocBLAS build and tests stop failing. That may have a serious adverse effect on performance, but it could restore correctness as the library would fall back to using source kernels. We should avoid doing that if pos
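
For readers unfamiliar with this class of error, the sketch below reproduces the same kind of AttributeError in miniature: Python raises it when an instance attribute is read before any code has assigned it. The class and function names (`KernelWriterSketch`, `configure`, `header_or_error`) are hypothetical stand-ins and do not reflect how Tensile's KernelWriterAssembly is actually structured; only the `language` check is modelled on the line shown in the traceback.

```python
class KernelWriterSketch:
    """Hypothetical stand-in; not Tensile's actual KernelWriterAssembly."""

    def __init__(self):
        # Deliberately does not assign self.language. The attribute only
        # exists once configure() has been called.
        self.configured = False

    def configure(self, language):
        self.language = language
        self.configured = True

    def get_header_file_string(self):
        # Same shape as the check on the failing line in the traceback.
        if self.language == "HIP" or self.language == "OCL":
            return "#pragma once\n"
        return ""


def header_or_error(writer):
    """Return the header string, or a description of the AttributeError."""
    try:
        return writer.get_header_file_string()
    except AttributeError as exc:
        return f"AttributeError: {exc}"
```

Calling `header_or_error(KernelWriterSketch())` without a prior `configure()` call reports the missing `language` attribute, while the same call after `configure("HIP")` returns the header string. This says nothing about why the real object would be used before initialization; it only shows what the error message implies about object state.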
Bug#1031252: hipsparse: FTBFS (c++: error: -E or -x required when input is from standard input)
On 2023-02-13 17:22, Santiago Vila wrote:

[ 8%] Linking CXX shared library libhipsparse.so
cd /<>/obj-x86_64-linux-gnu/library && /usr/bin/cmake -E cmake_link_script CMakeFiles/hipsparse.dir/link.txt --verbose=1
/usr/bin/c++ -fPIC -g -O2 -ffile-prefix-map=/<>=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -O3 -DNDEBUG -Wl,-z,relro -shared -Wl,-soname,libhipsparse.so.0 -o libhipsparse.so.0.1 CMakeFiles/hipsparse.dir/src/hcc_detail/hipsparse.cpp.o /usr/lib/x86_64-linux-gnu/librocsparse.so.0.1 /usr/lib/x86_64-linux-gnu/libamdhip64.so.5.2.21153- -lCLANGRT_BUILTINS-NOTFOUND
c++: error: -E or -x required when input is from standard input
make[3]: *** [library/CMakeFiles/hipsparse.dir/build.make:102: library/libhipsparse.so.0.1] Error 1

This is a bug in libamdhip64-5: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1021643

It was fixed in rocm-hipamd 5.2.3-2, but no versions since rocm-hipamd 5.2.3-1 have migrated to bookworm. This problem will affect all libraries and executables that link against libamdhip64-5 using the GCC toolchain.

If this is really a bug in one of the build-depends, please use reassign and affects, so that this is still visible in the BTS web page for this package.

I'm still learning how to use the Debian bug reporting tools. Perhaps another maintainer could help set these properties. Apologies for the incomplete handling, but I hope that this information is helpful.

Sincerely,
Cory Bloor