Bug#1059511: [Pkg-opencl-devel] Bug#1059511: pocl: FTBFS on riscv64: testsuite fails: can't link double-float modules with soft-float modules
Hi,

On Thu, Jan 18, 2024 at 5:41 PM Andreas Beckmann wrote:
>
> On 18/01/2024 08.41, Bo YU wrote:
> > Obviously these tests failed due to vectors missing from riscv64
> > now. This is expected, I think.
> >
> > The result matches what pocl upstream reports for riscv64:
> >
> > ```
> > In this release we improved support for RISC-V CPUs. We tested PoCL on a
> > Starfive VisionFive 2 using a Ubuntu 23.10 preinstalled image. With LLVM 17
> > and GCC 13.2, 98% tests pass (only 4 tests fail out of 253).
> > ```
>
> If these tests are expected to fail, they should be annotated as such
> for riscv64 (see e.g. handling of kernel/test_printf_vectors in
> tests/kernel/CMakeLists.txt). Is there an upstream bug about these tests
> failing on risc-v?

I have reported the issue upstream and got their confirmation:
https://github.com/pocl/pocl/issues/1394

>
> printf is a mess on most architectures (even or especially also x86)
> since the ABI for passing the variadic parameters is not well defined,
> especially when it comes to OpenCL vector types.
>
> > So I suspect `LLC_HOST_CPU` was overridden by `GENERIC`, and as a
> > result the ABI was different. I will look into this more deeply.
>
> GENERIC is a Debian specific patch to build for the LLVM default target
> (i.e. without specifying -march=$whatever), s.t. the generated code
> targets the riscv64 baseline (like all code in Debian binary packages).
> (Assuming llvm targets the same baseline cpu architecture as gcc, which
> unfortunately is not true for all Debian architectures.)
> (By default upstream pocl wants to build for the native CPU which is a
> no-go for distributing binaries. In order to fully utilize capabilities
> for more modern CPUs (e.g. wider SIMD registers), pocl has a distro mode
> where the bitcode libraries are provided for multiple targets and the
> best one for the local CPU is selected at runtime. But first we should
> get a baseline build for risc-v before we look into optimizing it (i.e.
> supporting separately both CPUs without and with vector unit).)

Thanks. So we need to enable selecting the best one for the local CPU at
runtime for riscv64 here. I looked at the patch:

https://salsa.debian.org/opencl-team/pocl/-/blob/main/debian/patches/generic-cpu.patch?ref_type=heads

But I am not sure how to enable this. The baseline of the riscv64 port is
described here:

```
The Debian port uses RV64GC as the hardware baseline and the lp64d ABI
(the default ABI for RV64G systems)
```
https://wiki.debian.org/RISC-V#Hardware_baseline_and_ABI_choice

Is this enough information? Please let me know of any issues, or I can
do some experiments on this.

BR,
Bo

>
> Andreas
>
> PS: I'll look into switching pocl to llvm-16 now
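To connect the lp64d baseline with the linker message in the bug title, here is a minimal sketch of how RISC-V ABI names relate to the float ABI classes ("soft-float", "double-float") that the linker complains about. The helper name `float_class` is made up for illustration; it is not part of pocl or any toolchain.

```shell
# float_class: hypothetical helper mapping a RISC-V -mabi name to the
# float ABI class named in linker messages such as
# "can't link double-float modules with soft-float modules".
float_class() {
  case $1 in
    ilp32|lp64)   echo soft-float ;;    # no FP registers used for arguments
    ilp32f|lp64f) echo single-float ;;  # F registers used
    ilp32d|lp64d) echo double-float ;;  # D registers used
    ilp32q|lp64q) echo quad-float ;;    # Q registers used
    *) echo "unknown ABI: $1" >&2; return 1 ;;
  esac
}

float_class lp64d   # Debian riscv64 baseline ABI
float_class lp64    # ABI of an object built without FP argument passing
```

Read this way, an object built for lp64 cannot be linked with the rest of a Debian riscv64 build, which uses lp64d throughout.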
Bug#1059511: [Pkg-opencl-devel] Bug#1059511: pocl: FTBFS on riscv64: testsuite fails: can't link double-float modules with soft-float modules
On 18/01/2024 08.41, Bo YU wrote:
> Obviously these tests failed due to vectors missing from riscv64
> now. This is expected, I think.
>
> The result matches what pocl upstream reports for riscv64:
>
> ```
> In this release we improved support for RISC-V CPUs. We tested PoCL on a
> Starfive VisionFive 2 using a Ubuntu 23.10 preinstalled image. With LLVM 17
> and GCC 13.2, 98% tests pass (only 4 tests fail out of 253).
> ```

If these tests are expected to fail, they should be annotated as such
for riscv64 (see e.g. handling of kernel/test_printf_vectors in
tests/kernel/CMakeLists.txt). Is there an upstream bug about these tests
failing on risc-v?

printf is a mess on most architectures (even or especially also x86)
since the ABI for passing the variadic parameters is not well defined,
especially when it comes to OpenCL vector types.

> So I suspect `LLC_HOST_CPU` was overridden by `GENERIC`, and as a
> result the ABI was different. I will look into this more deeply.

GENERIC is a Debian specific patch to build for the LLVM default target
(i.e. without specifying -march=$whatever), s.t. the generated code
targets the riscv64 baseline (like all code in Debian binary packages).
(Assuming llvm targets the same baseline cpu architecture as gcc, which
unfortunately is not true for all Debian architectures.)
(By default upstream pocl wants to build for the native CPU which is a
no-go for distributing binaries. In order to fully utilize capabilities
for more modern CPUs (e.g. wider SIMD registers), pocl has a distro mode
where the bitcode libraries are provided for multiple targets and the
best one for the local CPU is selected at runtime. But first we should
get a baseline build for risc-v before we look into optimizing it (i.e.
supporting separately both CPUs without and with vector unit).)

Andreas

PS: I'll look into switching pocl to llvm-16 now
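The idea of choosing the best prebuilt variant for the local CPU at runtime can be pictured with a small sketch. This is not how pocl implements its distro mode. The helper `pick_variant`, the variant names `rv64gc` and `rv64gcv`, and the use of an ISA string in the style of the `isa` field of /proc/cpuinfo are all assumptions made for illustration.

```shell
# pick_variant: hypothetical helper. Given an ISA string such as
# "rv64imafdc" or "rv64imafdcv_zicsr_zifencei", print the name of the
# (made-up) bitcode variant to load: "rv64gcv" if the single-letter
# V extension is present, otherwise the baseline "rv64gc".
pick_variant() {
  isa=$1
  base=${isa#rv64}     # drop the prefix so the "v" in "rv64" is not matched
  base=${base%%_*}     # keep only the single-letter extensions
  case $base in
    *v*) echo rv64gcv ;;
    *)   echo rv64gc ;;
  esac
}

pick_variant rv64imafdc                  # prints rv64gc
pick_variant rv64imafdcv_zicsr_zifencei  # prints rv64gcv
```

A baseline-only build corresponds to always taking the second branch, which is the first step proposed in the message above.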
Bug#1059511: pocl: FTBFS on riscv64: testsuite fails: can't link double-float modules with soft-float modules
Source: pocl
Version: 5.0-1
Followup-For: Bug #1059511

I spent some time trying to fix the issue; progress is below.

Inspired by this upstream issue:
https://github.com/pocl/pocl/issues/1088#issue-1340373438

If I build pocl manually as follows:

```
cmake .. -DCMAKE_INSTALL_PREFIX='/usr' -DCMAKE_BUILD_TYPE='Debug' -DPOCL_DEBUG_MESSAGES=ON -DLLC_HOST_CPU='sifive-u54'
```

then four tests fail:

```
98% tests passed, 4 tests failed out of 263

Label Time Summary:
EinsteinToolkit =  190.88 sec*proc (2 tests)
cuda            =  279.87 sec*proc (42 tests)
dlopen          =    0.52 sec*proc (3 tests)
hsa             =   35.00 sec*proc (4 tests)
hsa-native      = 1127.40 sec*proc (82 tests)
internal        = 3317.43 sec*proc (256 tests)
kernel          = 1719.95 sec*proc (76 tests)
level0          = 1446.25 sec*proc (124 tests)
matrix          =   40.41 sec*proc (4 tests)
poclbin         =   38.95 sec*proc (4 tests)
proxy           =  327.09 sec*proc (36 tests)
regression      =  933.17 sec*proc (95 tests)
runtime         =  179.38 sec*proc (31 tests)
tce             =   69.97 sec*proc (9 tests)
vulkan          =  176.65 sec*proc (26 tests)
workgroup       =  472.82 sec*proc (31 tests)

Total Test time (real) = 3791.39 sec

The following tests did not run:
     62 - kernel/test_shuffle_half_loopvec (Skipped)
     63 - kernel/test_shuffle_half_cbs (Skipped)
    190 - runtime/clGetKernelArgInfo (Disabled)
    199 - runtime/test_buffer_migration (Skipped)
    200 - runtime/test_buffer_ping_pong (Skipped)

The following tests FAILED:
     76 - kernel/test_printf_vectors_loopvec (Failed)
     77 - kernel/test_printf_vectors_cbs (Failed)
     78 - kernel/test_printf_vectors_ulongn_loopvec (Failed)
     79 - kernel/test_printf_vectors_ulongn_cbs (Failed)
Errors while running CTest
```

Obviously these tests failed due to vectors missing from riscv64
now. This is expected, I think.

The result matches what pocl upstream reports for riscv64:

```
In this release we improved support for RISC-V CPUs. We tested PoCL on a
Starfive VisionFive 2 using a Ubuntu 23.10 preinstalled image. With LLVM 17
and GCC 13.2, 98% tests pass (only 4 tests fail out of 253).
```
http://portablecl.org/docs/html/notes_5_0.html#risc-v-cpu-support-improved

Looking at the buildd log for riscv64:

```
-- udivmodti4 compiles without extra flags
-- Checking if LLVM is a DEBUG build
-- DEBUG build
-- Find out LLC target triple (for host riscv64-unknown-linux-gnu)
-- Find out LLC host CPU with /usr/bin/llc-15
-- Autodetected CPU sifive-u74 overridden by user to GENERIC
-- Checking clang -march vs. -mcpu flag
-- Using -None=
-- Running LLVM link test
-- LLVM link test OK
```
https://buildd.debian.org/status/fetch.php?pkg=pocl&arch=riscv64&ver=5.0-1&stamp=1704805199&raw=0

So I suspect `LLC_HOST_CPU` was overridden by `GENERIC`, and as a
result the ABI was different. I will look into this more deeply.

--
Regards,
--
Bo YU
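One way to test the ABI suspicion is to look at the ELF header flags of the objects that fail to link: on RISC-V, `readelf -h` names the float ABI in its "Flags:" line. Below is a minimal sketch of that check. The helper name `elf_float_abi` is made up, and the two header lines are canned samples written for this example, not output taken from the failing build.

```shell
# elf_float_abi: hypothetical helper. Reads `readelf -h` output on stdin
# and prints the float ABI named in the ELF header "Flags:" line
# (prints nothing if no float ABI is mentioned).
elf_float_abi() {
  grep 'Flags:' | grep -o -E '(soft|single|double|quad)-float' | head -n 1
}

# Canned sample lines standing in for real `readelf -h <object>` output.
printf '  Flags:  0x5, RVC, double-float ABI\n' | elf_float_abi
printf '  Flags:  0x1, RVC, soft-float ABI\n'   | elf_float_abi
```

On a real system the call would be `readelf -h some-object.o | elf_float_abi`, run once on a kernel object produced by pocl and once on an object from the system toolchain; differing answers would support the idea that the `GENERIC` build ends up with a different ABI.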