Re: armhf SIGILL, Illegal Instruction

2021-09-29 Thread peter green

On 29/09/2021 23:39, Jeffrey Walton wrote:

On Wed, Sep 29, 2021 at 5:05 PM peter green  wrote:


As I understand it, there are two variants of "VFPv3", a version with 32 double 
registers (d0 to d31) and a version with only 16 double registers (d0 to d15).
The former is referred to by gcc as "vfpv3" while the latter is referred to by 
gcc as "vfpv3-d16".

Debian is supposed to support vfpv3-d16, but because there is relatively little 
hardware out there that doesn't support the extra registers, bugs may take a 
while to get noticed.

So IMO this is a bug in the compiler that is generating that code. What I'm not 
so sure about is whether selecting the correct compilation settings is the
responsibility of the frontend (ldc) or the backend (llvm).


Shouldn't that show up in the build logs? 


It will only show up in build logs if the build process is overriding the 
built-in defaults of the compiler.

Normal practice in Debian is that, when invoked without specific architecture 
flags, compilers should generate code that will run on the baseline CPU of the 
port. If they don't, then that is a bug in the compiler.
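
A rough way to check a build log for explicit overrides is sketched below. The 
flag list is an assumption: it covers the usual GCC-style options plus the 
-mattr option that appears in the ldc2 command line quoted in this thread, and 
it is not exhaustive. The function name is made up for illustration. An empty 
result only means the compiler's built-in defaults were in effect, which is 
exactly the case where the log tells you nothing.

```python
import re

# Options that override a compiler's built-in target defaults.
# Assumed list: GCC-style -march/-mcpu/-mfpu/-mfloat-abi and LLVM/ldc-style -mattr.
ARCH_FLAG = re.compile(r"(?<!\S)-{1,2}(?:march|mcpu|mfpu|mfloat-abi|mattr)=\S+")


def explicit_arch_flags(log_text):
    """Return every explicit architecture flag found in the log, in order."""
    return ARCH_FLAG.findall(log_text)


if __name__ == "__main__":
    # Command line taken from the core dump message in this thread.
    sample = "ldc2 -c --output-o -conf= -w -mattr=-neon -O3 -release -relocation-model=pic"
    print(explicit_arch_flags(sample))
```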



Re: armhf SIGILL, Illegal Instruction

2021-09-29 Thread Jeffrey Walton
On Wed, Sep 29, 2021 at 5:05 PM peter green  wrote:
>
> As I understand it, there are two variants of "VFPv3", a version with 32 
> double registers (d0 to d31) and a version with only 16 double registers (d0 
> to d15).
> The former is referred to by gcc as "vfpv3" while the latter is referred to 
> by gcc as "vfpv3-d16".
>
> Debian is supposed to support vfpv3-d16, but because there is relatively 
> little hardware out there that doesn't support the extra registers, bugs may 
> take a while to get noticed.
>
> So IMO this is a bug in the compiler that is generating that code. What I'm 
> not so sure about is whether selecting the correct compilation settings is the
> responsibility of the frontend (ldc) or the backend (llvm).

Shouldn't that show up in the build logs? You should see 'gcc
-march=armv7 -mfpu=vfpv3-d16 ...'? Also see
https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html .

I'm used to building with -mfpu=neon, so I'm not too familiar with an
FPU that does not do NEON. But I seem to recall we needed something
similar for early Android devices.

( I also have never used ldc, so my [limited] knowledge must really be old...).

Jeff



Re: armhf SIGILL, Illegal Instruction

2021-09-29 Thread peter green

As I understand it, there are two variants of "VFPv3", a version with 32 double 
registers (d0 to d31) and a version with only 16 double registers (d0 to d15).
The former is referred to by gcc as "vfpv3" while the latter is referred to by 
gcc as "vfpv3-d16".

Debian is supposed to support vfpv3-d16, but because there is relatively little 
hardware out there that doesn't support the extra registers, bugs may take a 
while to get noticed.

So IMO this is a bug in the compiler that is generating that code. What I'm not 
so sure about is whether selecting the correct compilation settings is the
responsibility of the frontend (ldc) or the backend (llvm).
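
To make the register point concrete, here is a minimal sketch that flags 
disassembly lines touching the upper half of the register bank. It assumes the 
input is just the mnemonic and operands as printed by gdb or objdump (no 
address or raw opcode columns), and the function name is invented for 
illustration. d16 to d31 are the registers missing on a 16-register FPU; q8 to 
q15 are included because they alias those same registers.

```python
import re

# d16..d31 exist only in the 32-register bank; q8..q15 alias d16..d31.
UPPER_REG = re.compile(r"\b(?:d(?:1[6-9]|2[0-9]|3[01])|q(?:[89]|1[0-5]))\b")


def uses_upper_registers(disasm_line):
    """True if the instruction text names a register a d16-only FPU lacks."""
    # Drop the trailing ';' comment so values printed there are not matched.
    operands = disasm_line.split(";", 1)[0]
    return UPPER_REG.search(operands) is not None


if __name__ == "__main__":
    # The three faulting instructions reported in this thread.
    for line in ("vldr    d18, [pc, #216] ;",
                 "vldr    d16, [r6, #80]  ; 0x50",
                 "vldr    d16, [sp, #8]"):
        print(line, "->", uses_upper_registers(line))
```

All three faulting instructions in the report use d16 or d18, which fits the 
explanation above.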

On 29/09/2021 21:06, Ash Hughes wrote:

Hi,

I've been getting some programs terminated with SIGILL today, and I'm trying to 
find out if this is a package issue or if Debian (Bullseye) is no longer 
compatible with my ARM machine. I first got an error with onedrive, with gdb 
output:

#0  0xb6948ca8 in gc.impl.conservative.gc.Gcx.fullcollect(bool) ()
    from /usr/lib/arm-linux-gnueabihf/libdruntime-ldc-shared.so.94

which is "vldr    d18, [pc, #216] ;".

I then tried to run ldc2, and I got something similar:

Core was generated by `ldc2 -c --output-o -conf= -w -mattr=-neon -O3 -release 
-relocation-model=pic -d'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x0089e15c in 
dmd.parse.Parser!(dmd.astcodegen.ASTCodegen).Parser.parsePrimaryExp() ()

which is also a vldr instruction ("vldr    d16, [r6, #80]  ; 0x50")

Finally, I tried to compile ldc2 myself and running it I got:

#0  0xb4a6eabc in ?? () from /usr/lib/arm-linux-gnueabihf/libLLVM-11.so.1

also vldr ("vldr    d16, [sp, #8]")

It looks like the vldr instruction is being used in several LLVM packages, in a 
way my CPU doesn't like. Here's my cpuinfo:

processor   : 0
model name  : ARMv7 Processor rev 1 (v7l)
BogoMIPS    : 37.39
Features    : half thumb fastmult vfp edsp thumbee vfpv3 vfpv3d16 tls idivt
CPU implementer : 0x56
CPU architecture: 7
CPU variant : 0x1
CPU part    : 0x581
CPU revision    : 1

Hardware    : Marvell Armada 370/XP (Device Tree)
Revision    : 
Serial  : 

I don't have neon, although I think armhf doesn't require it, unless this has 
changed for Bullseye? If neon isn't required for Debian armhf, does this mean 
some LLVM related packages could be built differently to improve compatibility?

Thanks,

Ash
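
For anyone wanting to check their own machine, the Features line can be read 
mechanically. The sketch below is hedged: it assumes the kernel lists "neon" 
or "vfpd32" when d16 to d31 are present and "vfpv3d16" when only 16 double 
registers exist. Older kernels may not report "vfpd32" at all, so the 
"unknown" result is deliberately left inconclusive. The function names are 
made up for illustration.

```python
def parse_features(cpuinfo_text):
    """Return the set of flags from the first 'Features' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            return set(value.split())
    return set()


def fpu_register_bank(features):
    """Classify the double-register bank as 'd32', 'd16' or 'unknown'."""
    # Assumption: NEON implies 32 double registers; "vfpd32" states it directly.
    if "neon" in features or "vfpd32" in features:
        return "d32"
    if "vfpv3d16" in features:
        return "d16"
    return "unknown"


if __name__ == "__main__":
    # Features line copied from the cpuinfo in the report above.
    sample = "Features    : half thumb fastmult vfp edsp thumbee vfpv3 vfpv3d16 tls idivt"
    print(fpu_register_bank(parse_features(sample)))
```

On a real system the text would come from reading /proc/cpuinfo instead of 
the sample string.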




Re: armhf SIGILL, Illegal Instruction

2021-09-29 Thread John Paul Adrian Glaubitz
Hi Jeffrey!

On 9/29/21 22:28, Jeffrey Walton wrote:
> I think John Paul Adrian Glaubitz (with the help of others) on the
> PowerPC mailing list determined that Autotools is the problem. Autotools
> is using an M4 macro that is selecting the wrong platform or features.
> It is new behavior.
> 
> Also see Bug #995223: libffi: SIGILL on powerpc and ppc64 systems
> since libffi8, https://lists.debian.org/debian-powerpc/2021/09/msg00051.html.
> In particular, from a followup at
> https://lists.debian.org/debian-powerpc/2021/09/msg00077.html:

It looks like a different bug as the SIGILL faults that Ash is seeing are not
occurring inside libffi.so.8. I think it's more likely an issue with LLVM in
this case as could be seen from the backtrace.

But I would have to look into the details to figure out who the culprit is.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



Re: armhf SIGILL, Illegal Instruction

2021-09-29 Thread Jeffrey Walton
On Wed, Sep 29, 2021 at 4:06 PM Ash Hughes  wrote:
>
> Hi,
>
> I've been getting some programs terminated with SIGILL today, and I'm
> trying to find out if this is a package issue or if Debian (Bullseye) is
> no longer compatible with my ARM machine. I first got an error with
> onedrive, with gdb output:
>
> #0  0xb6948ca8 in gc.impl.conservative.gc.Gcx.fullcollect(bool) ()
> from /usr/lib/arm-linux-gnueabihf/libdruntime-ldc-shared.so.94
>
> which is "vldr    d18, [pc, #216] ;".
>
> I then tried to run ldc2, and I got something similar:
>
> Core was generated by `ldc2 -c --output-o -conf= -w -mattr=-neon -O3
> -release -relocation-model=pic -d'.
> Program terminated with signal SIGILL, Illegal instruction.
> #0  0x0089e15c in
> dmd.parse.Parser!(dmd.astcodegen.ASTCodegen).Parser.parsePrimaryExp() ()
>
> which is also a vldr instruction ("vldr    d16, [r6, #80]  ; 0x50")
>
> Finally, I tried to compile ldc2 myself and running it I got:
>
> #0  0xb4a6eabc in ?? () from /usr/lib/arm-linux-gnueabihf/libLLVM-11.so.1
>
> also vldr ("vldr    d16, [sp, #8]")
>
> It looks like the vldr instruction is being used in several LLVM
> packages, in a way my CPU doesn't like. Here's my cpuinfo:
>
> processor   : 0
> model name  : ARMv7 Processor rev 1 (v7l)
> BogoMIPS    : 37.39
> Features    : half thumb fastmult vfp edsp thumbee vfpv3 vfpv3d16 tls idivt
> CPU implementer : 0x56
> CPU architecture: 7
> CPU variant : 0x1
> CPU part    : 0x581
> CPU revision    : 1
>
> Hardware    : Marvell Armada 370/XP (Device Tree)
> Revision    : 
> Serial  : 
>
> I don't have neon, although I think armhf doesn't require it, unless
> this has changed for Bullseye? If neon isn't required for Debian armhf,
> does this mean some LLVM related packages could be built differently to
> improve compatibility?

I think John Paul Adrian Glaubitz (with the help of others) on the
PowerPC mailing list determined that Autotools is the problem. Autotools
is using an M4 macro that is selecting the wrong platform or features.
It is new behavior.

Also see Bug #995223: libffi: SIGILL on powerpc and ppc64 systems
since libffi8, https://lists.debian.org/debian-powerpc/2021/09/msg00051.html.
In particular, from a followup at
https://lists.debian.org/debian-powerpc/2021/09/msg00077.html:


It turns out that m4/ax_gcc_archflag.m4 contains code to detect the
baseline of the host system and sets the GCC architecture accordingly.

Thus, a libffi compiled on a POWER8 machine will not work on a POWER5
machine as the compiler is emitting POWER8 instructions in this case.

Since the m4 script contains such a host environment detection for aarch64
as well [1], this bug can potentially affect arm64 which is a release
architecture.

We should therefore pass "--enable-portable-binary" in debian/rules.

[1] https://github.com/libffi/libffi/blob/master/m4/ax_gcc_archflag.m4#L209


This is also of interest:
https://lists.debian.org/debian-powerpc/2021/09/msg00048.html. There's
a lot of back-and-forth, but it is where the problem is revealed.

I could be mistaken, so take it with a grain of salt.

Jeff