Re: armhf SIGILL, Illegal Instruction
On 29/09/2021 23:39, Jeffrey Walton wrote:
> On Wed, Sep 29, 2021 at 5:05 PM peter green wrote:
>> As I understand it, there are two variants of "VFPv3": a version with 32 double registers (d0 to d31) and a version with only 16 double registers (d0 to d15). The former is referred to by gcc as "vfpv3", while the latter is referred to by gcc as "vfpv3-d16".
>>
>> Debian is supposed to support vfpv3-d16, but because there is relatively little hardware out there that doesn't support the extra registers, bugs may take a while to get noticed.
>>
>> So IMO this is a bug in the compiler that is generating that code. What I'm not so sure about is whether selecting the correct compilation settings is the responsibility of the frontend (ldc) or the backend (llvm).
>
> Shouldn't that show up in the build logs?

It will only show up in build logs if the build process is overriding the built-in defaults of the compiler. Normal practice in Debian is that, when invoked without specific architecture flags, compilers should generate code that will run on the baseline CPU of the port. If they don't, then that is a bug in the compiler.
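To make the register-count point concrete: in the ARMv7 encoding of a double-precision VLDR, the destination register number is split into a high bit D (bit 22) and a 4-bit Vd field (bits 15-12). Any VLDR with D set therefore names one of d16-d31, which do not exist on a VFPv3-D16 FPU, so the CPU treats it as undefined and the kernel delivers SIGILL. This matches the "vldr d16" and "vldr d18" instructions in Ash's report. The sketch below checks that bit. The field layout is my reading of the ARMv7 A1/T1 encodings, the helper names are hypothetical, and the hex words are assembled by hand from that layout for illustration, not copied from the faulting binaries.

```python
"""Check whether a VFP double-precision VLDR names one of d16-d31.

Assumed field layout (ARMv7 VLDR, 64-bit register, A1/T1 encodings):
  bits 27-24 = 0b1101, bit 23 = U (add offset), bit 22 = D (register bit 4),
  bits 21-20 = 0b01, bits 19-16 = Rn, bits 15-12 = Vd,
  bits 11-8 = 0b1011, bits 7-0 = imm8 (offset in words).
For Thumb-2 code, combine the halfwords as (first << 16) | second.
"""
from typing import NamedTuple, Optional


class Vldr(NamedTuple):
    dreg: int    # destination register number: 0..31 means d0..d31
    rn: int      # base register number (13 = sp, 15 = pc)
    offset: int  # signed byte offset added to the base register


def decode_vldr_f64(word: int) -> Optional[Vldr]:
    """Decode a 32-bit word as a double-precision VLDR, or return None."""
    if ((word >> 28) & 0xF) == 0xF:  # unconditional space: a different instruction
        return None
    if ((word >> 24) & 0xF) != 0b1101 or ((word >> 20) & 0b11) != 0b01:
        return None  # not a VLDR (VSTR has bit 20 clear, VLDM sets bit 21)
    if ((word >> 8) & 0xF) != 0b1011:  # 0b1010 would be the single-precision form
        return None
    d_bit = (word >> 22) & 1
    vd = (word >> 12) & 0xF
    rn = (word >> 16) & 0xF
    imm = (word & 0xFF) << 2
    add = (word >> 23) & 1
    return Vldr(dreg=(d_bit << 4) | vd, rn=rn, offset=imm if add else -imm)


def needs_d32(word: int) -> bool:
    """True if the word is a VLDR of d16-d31, registers absent on VFPv3-D16."""
    insn = decode_vldr_f64(word)
    return insn is not None and insn.dreg >= 16


if __name__ == "__main__":
    # The first three mirror the operands quoted in this thread
    # (d18,[pc,#216]; d16,[r6,#80]; d16,[sp,#8]); the last is vldr d0, [r0].
    for word in (0xEDDF2B36, 0xEDD60B14, 0xEDDD0B02, 0xED900B00):
        print(hex(word), decode_vldr_f64(word), needs_d32(word))
```

Running it on a disassembled binary would flag exactly the instructions that a D16-only CPU cannot execute.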
Re: armhf SIGILL, Illegal Instruction
On Wed, Sep 29, 2021 at 5:05 PM peter green wrote:
>
> As I understand it, there are two variants of "VFPv3": a version with 32 double registers (d0 to d31) and a version with only 16 double registers (d0 to d15). The former is referred to by gcc as "vfpv3", while the latter is referred to by gcc as "vfpv3-d16".
>
> Debian is supposed to support vfpv3-d16, but because there is relatively little hardware out there that doesn't support the extra registers, bugs may take a while to get noticed.
>
> So IMO this is a bug in the compiler that is generating that code. What I'm not so sure about is whether selecting the correct compilation settings is the responsibility of the frontend (ldc) or the backend (llvm).

Shouldn't that show up in the build logs? You should see 'gcc -march=armv7-a -mfpu=vfpv3-d16 ...'? Also see https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html.

I'm used to building with -mfpu=neon, so I'm not too familiar with an FPU that does not do NEON. But I seem to recall we needed something similar for early Android devices. (I also have never used ldc, so my [limited] knowledge must really be old...)

Jeff
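If a build does override the compiler defaults, a quick scan of the logged compiler command lines can surface it. The helper below is hypothetical, not an existing Debian tool, and its list of D16-only `-mfpu` values is deliberately non-exhaustive; anything outside the list (including `-mfpu=auto`) is simply reported for a human to check. As peter's reply points out, a compiler whose built-in default is wrong leaves nothing in the log to find, so an empty result proves little.

```python
import shlex
from typing import List

# Non-exhaustive: GCC -mfpu values that limit code to the 16 double
# registers d0-d15 (the Debian armhf baseline is vfpv3-d16).
D16_FPUS = {"vfp", "vfpv2", "vfpv3-d16", "vfpv3-d16-fp16", "vfpv4-d16"}

# Options that take the instruction set from whatever machine runs the build.
HOST_TUNED = {"-march=native", "-mcpu=native"}


def suspicious_flags(cmdline: str) -> List[str]:
    """Return options in a logged compiler command that may exceed the baseline."""
    found = []
    for arg in shlex.split(cmdline):
        if arg.startswith("-mfpu=") and arg.split("=", 1)[1] not in D16_FPUS:
            found.append(arg)  # e.g. -mfpu=neon or -mfpu=vfpv3 assume d0-d31
        elif arg in HOST_TUNED:
            found.append(arg)
    return found


if __name__ == "__main__":
    print(suspicious_flags("gcc -march=armv7-a -mfpu=vfpv3-d16 -c foo.c"))
    print(suspicious_flags("gcc -O2 -mfpu=neon -c foo.c"))
```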
Re: armhf SIGILL, Illegal Instruction
As I understand it, there are two variants of "VFPv3": a version with 32 double registers (d0 to d31) and a version with only 16 double registers (d0 to d15). The former is referred to by gcc as "vfpv3", while the latter is referred to by gcc as "vfpv3-d16".

Debian is supposed to support vfpv3-d16, but because there is relatively little hardware out there that doesn't support the extra registers, bugs may take a while to get noticed.

So IMO this is a bug in the compiler that is generating that code. What I'm not so sure about is whether selecting the correct compilation settings is the responsibility of the frontend (ldc) or the backend (llvm).

On 29/09/2021 21:06, Ash Hughes wrote:
> Hi,
>
> I've been getting some programs terminated with SIGILL today, and I'm trying to find out if this is a package issue or if Debian (Bullseye) is no longer compatible with my ARM machine. I first got an error with onedrive, with gdb output:
>
> #0 0xb6948ca8 in gc.impl.conservative.gc.Gcx.fullcollect(bool) () from /usr/lib/arm-linux-gnueabihf/libdruntime-ldc-shared.so.94
>
> which is "vldr d18, [pc, #216] ;".
>
> I then tried to run ldc2, and I got something similar:
>
> Core was generated by `ldc2 -c --output-o -conf= -w -mattr=-neon -O3 -release -relocation-model=pic -d'.
> Program terminated with signal SIGILL, Illegal instruction.
> #0 0x0089e15c in dmd.parse.Parser!(dmd.astcodegen.ASTCodegen).Parser.parsePrimaryExp() ()
>
> which is also a vldr instruction ("vldr d16, [r6, #80] ; 0x50")
>
> Finally, I tried to compile ldc2 myself and running it I got:
>
> #0 0xb4a6eabc in ?? () from /usr/lib/arm-linux-gnueabihf/libLLVM-11.so.1
>
> also vldr ("vldr d16, [sp, #8]")
>
> It looks like the vldr instruction is being used in several LLVM packages, in a way my CPU doesn't like. Here's my cpuinfo:
>
> processor : 0
> model name : ARMv7 Processor rev 1 (v7l)
> BogoMIPS : 37.39
> Features : half thumb fastmult vfp edsp thumbee vfpv3 vfpv3d16 tls idivt
> CPU implementer : 0x56
> CPU architecture: 7
> CPU variant : 0x1
> CPU part : 0x581
> CPU revision : 1
>
> Hardware : Marvell Armada 370/XP (Device Tree)
> Revision :
> Serial :
>
> I don't have neon, although I think armhf doesn't require it, unless this has changed for Bullseye? If neon isn't required for Debian armhf, does this mean some LLVM-related packages could be built differently to improve compatibility?
>
> Thanks,
> Ash
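The Features line is the quickest way to tell which VFP variant a machine has. The sketch below assumes the hwcap names used by the 32-bit ARM Linux kernel: `vfpd32` when the FPU has 32 double registers, `vfpv3d16` when it has only 16, and `neon` for Advanced SIMD. Ash's list has `vfpv3d16` and neither `vfpd32` nor `neon`, which is consistent with a D16-only FPU. The function names are hypothetical.

```python
from typing import Dict, Set


def cpu_features(cpuinfo_text: str) -> Set[str]:
    """Collect the words from the 'Features' line(s) of /proc/cpuinfo text."""
    features: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            features.update(value.split())
    return features


def fpu_summary(features: Set[str]) -> Dict[str, bool]:
    """Interpret FP-related hwcap names (assumed 32-bit ARM Linux naming)."""
    return {
        "vfpv3": "vfpv3" in features,
        "d32": "vfpd32" in features,          # d16-d31 are present
        "d16_only": "vfpv3d16" in features,   # only d0-d15 are present
        "neon": "neon" in features,
    }


# Excerpt of the cpuinfo quoted above (real files separate keys with tabs).
ASH_CPUINFO = """\
processor\t: 0
model name\t: ARMv7 Processor rev 1 (v7l)
Features\t: half thumb fastmult vfp edsp thumbee vfpv3 vfpv3d16 tls idivt
CPU architecture: 7
"""

if __name__ == "__main__":
    # On a live system, pass open("/proc/cpuinfo").read() instead of the sample.
    print(fpu_summary(cpu_features(ASH_CPUINFO)))
    # {'vfpv3': True, 'd32': False, 'd16_only': True, 'neon': False}
```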
Re: armhf SIGILL, Illegal Instruction
Hi Jeffrey!

On 9/29/21 22:28, Jeffrey Walton wrote:
> I think John Paul Adrian Glaubitz (with the help of others) on the PowerPC mailing list determined that Autotools is the problem. Autotools is using an M4 macro that is selecting the wrong platform or features. It is new behavior.
>
> Also see Bug #995223: libffi: SIGILL on powerpc and ppc64 systems since libffi8, https://lists.debian.org/debian-powerpc/2021/09/msg00051.html. In particular, from a followup at https://lists.debian.org/debian-powerpc/2021/09/msg00077.html:

It looks like a different bug, as the SIGILL faults that Ash is seeing are not occurring inside libffi.so.8. I think it's more likely an issue with LLVM in this case, as can be seen from the backtrace. But I would have to look into the details to figure out who the culprit is.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
Re: armhf SIGILL, Illegal Instruction
On Wed, Sep 29, 2021 at 4:06 PM Ash Hughes wrote:
>
> Hi,
>
> I've been getting some programs terminated with SIGILL today, and I'm trying to find out if this is a package issue or if Debian (Bullseye) is no longer compatible with my ARM machine. I first got an error with onedrive, with gdb output:
>
> #0 0xb6948ca8 in gc.impl.conservative.gc.Gcx.fullcollect(bool) () from /usr/lib/arm-linux-gnueabihf/libdruntime-ldc-shared.so.94
>
> which is "vldr d18, [pc, #216] ;".
>
> I then tried to run ldc2, and I got something similar:
>
> Core was generated by `ldc2 -c --output-o -conf= -w -mattr=-neon -O3 -release -relocation-model=pic -d'.
> Program terminated with signal SIGILL, Illegal instruction.
> #0 0x0089e15c in dmd.parse.Parser!(dmd.astcodegen.ASTCodegen).Parser.parsePrimaryExp() ()
>
> which is also a vldr instruction ("vldr d16, [r6, #80] ; 0x50")
>
> Finally, I tried to compile ldc2 myself and running it I got:
>
> #0 0xb4a6eabc in ?? () from /usr/lib/arm-linux-gnueabihf/libLLVM-11.so.1
>
> also vldr ("vldr d16, [sp, #8]")
>
> It looks like the vldr instruction is being used in several LLVM packages, in a way my CPU doesn't like. Here's my cpuinfo:
>
> processor : 0
> model name : ARMv7 Processor rev 1 (v7l)
> BogoMIPS : 37.39
> Features : half thumb fastmult vfp edsp thumbee vfpv3 vfpv3d16 tls idivt
> CPU implementer : 0x56
> CPU architecture: 7
> CPU variant : 0x1
> CPU part : 0x581
> CPU revision : 1
>
> Hardware : Marvell Armada 370/XP (Device Tree)
> Revision :
> Serial :
>
> I don't have neon, although I think armhf doesn't require it, unless this has changed for Bullseye? If neon isn't required for Debian armhf, does this mean some LLVM-related packages could be built differently to improve compatibility?

I think John Paul Adrian Glaubitz (with the help of others) on the PowerPC mailing list determined that Autotools is the problem. Autotools is using an M4 macro that is selecting the wrong platform or features. It is new behavior.
Also see Bug #995223: libffi: SIGILL on powerpc and ppc64 systems since libffi8, https://lists.debian.org/debian-powerpc/2021/09/msg00051.html. In particular, from a followup at https://lists.debian.org/debian-powerpc/2021/09/msg00077.html:

> It turns out that m4/ax_gcc_archflag.m4 contains code to detect the baseline of the host system and sets the GCC architecture accordingly. Thus, a libffi compiled on a POWER8 machine will not work on a POWER5 machine, as the compiler is emitting POWER8 instructions in this case.
>
> Since the m4 script contains such host environment detection for aarch64 as well [1], this bug can potentially affect arm64, which is a release architecture.
>
> We should therefore pass "--enable-portable-binary" in debian/rules.
>
> [1] https://github.com/libffi/libffi/blob/master/m4/ax_gcc_archflag.m4#L209

This is also of interest: https://lists.debian.org/debian-powerpc/2021/09/msg00048.html. There's a lot of back-and-forth, but it is where the problem is revealed.

I could be mistaken, so take it with a grain of salt.

Jeff
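The libffi problem Adrian describes can be modelled roughly as below. This is a simplified Python rendering of the flag-selection loop in ax_gcc_archflag.m4 as I understand it: host CPU detection is left out, details may differ between versions of the macro, and the function names and the fake compiler check are hypothetical. In non-portable mode the macro tries `-march=`, then `-mcpu=`, then `-m<arch>`, keeping the first one the compiler accepts, so on a PowerPC compiler that rejects `-march=` it lands on `-mcpu=<build host CPU>` and the output is tied to the buildd's CPU. In portable mode (which `--enable-portable-binary` selects, as far as I can tell via the companion AX_CC_MAXOPT macro) it tries `-mtune=` first, which only affects scheduling.

```python
from typing import Callable, List, Optional


def candidate_flags(arch: str, portable: bool, host_is_x86: bool = False) -> List[str]:
    """Flags to try, in order, for a detected CPU name (simplified model)."""
    if portable:
        # -mtune only changes instruction scheduling; the ISA stays the default.
        flags = [f"-mtune={arch}"]
        if host_is_x86:
            # On x86, -mcpu is a tuning option too, so it is still portable there.
            flags += [f"-mcpu={arch}", f"-m{arch}"]
    else:
        # -march (or -mcpu on targets such as PowerPC) selects the instruction set.
        flags = [f"-march={arch}", f"-mcpu={arch}", f"-m{arch}"]
    return flags


def choose_archflag(arch: str, portable: bool,
                    compiler_accepts: Callable[[str], bool],
                    host_is_x86: bool = False) -> Optional[str]:
    """Return the first candidate flag the compiler accepts, or None."""
    for flag in candidate_flags(arch, portable, host_is_x86):
        if compiler_accepts(flag):
            return flag
    return None


if __name__ == "__main__":
    # Hypothetical PowerPC gcc: rejects -march=, accepts -mcpu= and -mtune=.
    def ppc_gcc(flag: str) -> bool:
        return flag.startswith(("-mcpu=", "-mtune="))

    print(choose_archflag("power8", False, ppc_gcc))  # -mcpu=power8
    print(choose_archflag("power8", True, ppc_gcc))   # -mtune=power8
```

The same logic explains why a build on a newer buildd can silently raise the instruction-set baseline without any unusual flag appearing in debian/rules.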