Bug#1068350: [musl] Re: Bug#1068350: musl: miscompiles (runtime problems) on riscv64 and s390x with static-pie → seems to be a toolchain bug after all, it does too hit glibc

2024-04-06 Thread Thorsten Glaser
Rich Felker dixit:

>Is there anything weird about how these objects were declared that
>might have caused ld not to resolve them statically like it should? It
>seems odd that these data symbols, but not any other ones, would be
>left as symbolic relocations.

I don’t think so?

In  I already
posted the short version; the actual source is (mirrored):

The initcoms array is here:
https://github.com/MirBSD/mksh/blob/b0219da8e6dfc7b16e923e220dc6933c5ed9b326/main.c#L77

Tdr is defined at:
https://github.com/MirBSD/mksh/blob/b0219da8e6dfc7b16e923e220dc6933c5ed9b326/sh.h#L3055

The u_ops array is declared a few lines above that and defined at:
https://github.com/MirBSD/mksh/blob/b0219da8e6dfc7b16e923e220dc6933c5ed9b326/funcs.c#L160

initvsn is defined at…
https://github.com/MirBSD/mksh/blob/b0219da8e6dfc7b16e923e220dc6933c5ed9b326/sh.h#L713
… with the EXTERN and E_INIT macros from…
https://github.com/MirBSD/mksh/blob/b0219da8e6dfc7b16e923e220dc6933c5ed9b326/sh.h#L657
where main.c defines EXTERN, so the string is embedded into the file using it.
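
The EXTERN/E_INIT arrangement described above can be pictured roughly as follows. This is only a sketch of the general pattern, not the actual text of sh.h; the names initvsn_demo and initvsn_demo_len and the string "KSH_VERSION=demo" are made up for illustration. The relevant point is that the string ends up as a named array object with its own symbol, so a pointer to it stored in another array needs a relocation.

```c
#include <string.h>

/* The one translation unit that owns the definitions does this first. */
#define EXTERN

/* Header part (hypothetical reconstruction of the pattern). */
#ifdef EXTERN
#define E_INIT(i) = i      /* defining file: keep the initialiser */
#else
#define EXTERN extern      /* every other file: declaration only */
#define E_INIT(i)
#endif

/* Expands to a full definition here, to an extern declaration elsewhere. */
EXTERN const char initvsn_demo[] E_INIT("KSH_VERSION=demo");

/* Small accessor so the definition can be exercised. */
size_t initvsn_demo_len(void)
{
	return strlen(initvsn_demo);
}
```

A file that does not define EXTERN before including the header would see only `extern const char initvsn_demo[];` and link against the single definition.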

Is there perhaps a misunderstanding with the gcc/binutils/glibc developers
as to what static-pie is meant to be?

bye,
//mirabilos
-- 
 cool, an Ada Lovelace Google Doodle. but for the 197th birthday? Couldn't
they have waited another 3 years?  by then google won't exist any
more  yes, one might think so. the announced end of the world from the
Maya calendar is probably the global shutdown of google ☺ and that is why
they have to push the doodles out beforehand



Bug#1068350: [musl] Re: Bug#1068350: musl: miscompiles (runtime problems) on riscv64 and s390x with static-pie → seems to be a toolchain bug after all, it does too hit glibc

2024-04-05 Thread Rich Felker
On Fri, Apr 05, 2024 at 05:04:37AM +, Thorsten Glaser wrote:
> Markus Wichmann dixit:
> 
> >can check with readelf -r what the relocation types are. If they are not
> >relative, they will not be processed.
> 
> Gotcha! They are all R_390_RELATIVE except for:
> 
> 00045ff0  00110016 R_390_64  00042c58 u_ops + 70
> 00045ff8  00110016 R_390_64  00042c58 u_ops + 0
> 00047020  00110016 R_390_64  00042c58 u_ops + 80
> 00047088  00110016 R_390_64  00042c58 u_ops + 80
> 000470a8  00110016 R_390_64  00042c58 u_ops + b8
> 00047220  00110016 R_390_64  00042c58 u_ops + 80
> 00046900  00260016 R_390_64  00015af8 c_command + 0
> 00046940  00070016 R_390_64  00017238 c_exec + 0
> 00046ab0  00200016 R_390_64  00016a80 c_trap + 0
> 00047090  00250016 R_390_64  000430ac initvsn + 0
> 00047278  00550016 R_390_64  00047438 null_string + 2
> 
> That’s our missing strings.

Is there anything weird about how these objects were declared that
might have caused ld not to resolve them statically like it should? It
seems odd that these data symbols, but not any other ones, would be
left as symbolic relocations.

Rich



Bug#1068350: [musl] Re: Bug#1068350: musl: miscompiles (runtime problems) on riscv64 and s390x with static-pie → seems to be a toolchain bug after all, it does too hit glibc

2024-04-05 Thread Szabolcs Nagy
* Thorsten Glaser  [2024-04-05 05:04:37 +]:
> Markus Wichmann dixit:
> 
> >can check with readelf -r what the relocation types are. If they are not
> >relative, they will not be processed.
> 
> Gotcha! They are all R_390_RELATIVE except for:
> 
> 00045ff0  00110016 R_390_64  00042c58 u_ops + 70
> 00045ff8  00110016 R_390_64  00042c58 u_ops + 0
> 00047020  00110016 R_390_64  00042c58 u_ops + 80
> 00047088  00110016 R_390_64  00042c58 u_ops + 80
> 000470a8  00110016 R_390_64  00042c58 u_ops + b8
> 00047220  00110016 R_390_64  00042c58 u_ops + 80
> 00046900  00260016 R_390_64  00015af8 c_command + 0
> 00046940  00070016 R_390_64  00017238 c_exec + 0
> 00046ab0  00200016 R_390_64  00016a80 c_trap + 0
> 00047090  00250016 R_390_64  000430ac initvsn + 0
> 00047278  00550016 R_390_64  00047438 null_string + 2
> 
> That’s our missing strings.


this is not correct static pie.

glibc handles symbolic relocs, but there should not be
any non-local symbol in a static exe. you may want to
check the symbol table.

so s390 does not support static pie.
(arguably the elf is correct, if you expect a full
dynlinker in a static pie, but even then it's bad
quality linker output)



Bug#1068350: [musl] Re: Bug#1068350: musl: miscompiles (runtime problems) on riscv64 and s390x with static-pie → seems to be a toolchain bug after all, it does too hit glibc

2024-04-05 Thread Markus Wichmann
On Fri, Apr 05, 2024 at 05:58:15AM +, Thorsten Glaser wrote:
> Markus Wichmann dixit:
> >In any case, the emission of non-relative relocations is the issue here,
> >and it is coming from the linker.
>
> They are present in the glibc static-pie binary as well, though.
> And tbh they look to me like “just plug the absolute address of
> the symbol here, please”, which is perfectly fine for things like
> an array of strings when the actual string has already its own symbol.
>
> (Disclaimer: I know… barely anything about Unix relocation types,
> a bit more about those on DOS and even TOS.)
>

Then glibc's static-pie startup code also processes symbolic
relocations. musl's doesn't. It only processes relative relocations. And
changing this would require some massive reworking. We'd somehow have to
put stage 2 of the dynamic linker into rcrt1.o.

A symbolic lookup doesn't really make sense for a static executable
outside of FDPIC. The only difference in address space possible is a
relative offset. In order to do a symbolic relocation, you also need the
symbol lookup stuff, which - granted - for a static PIE is probably
very simple because there can be only one symbol table, but still.

I thought the whole point of static-PIE support was to only leave
relative relocations around.
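
To make the arithmetic concrete: for a symbol defined inside the same static image, a symbolic relocation carries no more information than a relative one whose addend is the symbol value plus the original addend. The sketch below uses numbers from the readelf output quoted in this thread; the load base 0x3fff7f80000 is an assumption derived from the musl gdb session (0x3fff7fc14a4 minus 0x414a4), and the function names are invented for illustration.

```c
#include <stdint.h>

/* Link-time address a symbolic relocation refers to: S + A. */
uint64_t link_time_target(uint64_t sym_value, int64_t addend)
{
	return sym_value + (uint64_t)addend;
}

/*
 * Run-time address once the image is loaded at `base`: B + S + A.
 * A relative relocation with addend S + A yields the same result
 * without any symbol table lookup.
 */
uint64_t run_time_target(uint64_t base, uint64_t sym_value, int64_t addend)
{
	return base + link_time_target(sym_value, addend);
}
```

For the row "u_ops + 80" with symbol value 0x42c58 this gives a link-time target of 0x42cd8, which is the addend a relative relocation for that slot would need.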

Ciao,
Markus



Bug#1068350: [musl] Re: Bug#1068350: musl: miscompiles (runtime problems) on riscv64 and s390x with static-pie → seems to be a toolchain bug after all, it does too hit glibc

2024-04-05 Thread Thorsten Glaser
Markus Wichmann dixit:

>I may not really know what I am talking about, so take this with a grain
>of salt, but isn't this missing a -Bsymbolic somewhere? Ironically, that
>switch causes ld to not emit symbolic relocations. I seem to remember
>reading long ago in Rich's initial -static-pie proposal that that was
>one of the switches added to the linker command line.

When searching for which architectures support static PIE in the first
place (sadly, there doesn’t seem to be a consistent list), I found one saying
it’s no longer necessary after some point, so I didn’t check it.

>In any case, the emission of non-relative relocations is the issue here,
>and it is coming from the linker.

They are present in the glibc static-pie binary as well, though.
And tbh they look to me like “just plug the absolute address of
the symbol here, please”, which is perfectly fine for things like
an array of strings when the actual string has already its own symbol.

(Disclaimer: I know… barely anything about Unix relocation types,
a bit more about those on DOS and even TOS.)

bye,
//mirabilos
-- 
When he found out that the m68k port was in a pretty bad shape, he did
not, like many before him, shrug and move on; instead, he took it upon
himself to start compiling things, just so he could compile his shell.
How's that for dedication. -- Wouter, about my Debian/m68k revival



Bug#1068350: [musl] Re: Bug#1068350: musl: miscompiles (runtime problems) on riscv64 and s390x with static-pie → seems to be a toolchain bug after all, it does too hit glibc

2024-04-04 Thread Markus Wichmann
On Fri, Apr 05, 2024 at 05:04:37AM +, Thorsten Glaser wrote:
> Should be correct:
>
>  /usr/libexec/gcc/s390x-linux-gnu/13/collect2 -fno-lto -dynamic-linker 
> /lib/ld-musl-s390x.so.1 -nostdlib -static -static -pie --no-dynamic-linker -o 
> mksh /usr/lib/s390x-linux-musl/rcrt1.o /usr/lib/s390x-linux-musl/crti.o 
> /usr/lib/gcc/s390x-linux-gnu/13/crtbeginS.o -L/usr/lib/s390x-linux-musl -L 
> /usr/lib/gcc/s390x-linux-gnu/13/. -z relro -z now --as-needed -z text 
> --eh-frame-hdr lalloc.o edit.o eval.o exec.o expr.o funcs.o histrap.o jobs.o 
> lex.o main.o misc.o shf.o syn.o tree.o var.o ulimit.o --start-group 
> /usr/lib/gcc/s390x-linux-gnu/13/libgcc.a 
> /usr/lib/gcc/s390x-linux-gnu/13/libgcc_eh.a -lc --end-group 
> /usr/lib/gcc/s390x-linux-gnu/13/crtendS.o /usr/lib/s390x-linux-musl/crtn.o
>
> HTH & HAND,
> //mirabilos

I may not really know what I am talking about, so take this with a grain
of salt, but isn't this missing a -Bsymbolic somewhere? Ironically, that
switch causes ld to not emit symbolic relocations. I seem to remember
reading long ago in Rich's initial -static-pie proposal that that was
one of the switches added to the linker command line.

In any case, the emission of non-relative relocations is the issue here,
and it is coming from the linker.

Ciao,
Markus



Bug#1068350: [musl] Re: Bug#1068350: musl: miscompiles (runtime problems) on riscv64 and s390x with static-pie → seems to be a toolchain bug after all, it does too hit glibc

2024-04-04 Thread Thorsten Glaser
Markus Wichmann dixit:

>can check with readelf -r what the relocation types are. If they are not
>relative, they will not be processed.

Gotcha! They are all R_390_RELATIVE except for:

00045ff0  00110016 R_390_64  00042c58 u_ops + 70
00045ff8  00110016 R_390_64  00042c58 u_ops + 0
00047020  00110016 R_390_64  00042c58 u_ops + 80
00047088  00110016 R_390_64  00042c58 u_ops + 80
000470a8  00110016 R_390_64  00042c58 u_ops + b8
00047220  00110016 R_390_64  00042c58 u_ops + 80
00046900  00260016 R_390_64  00015af8 c_command + 0
00046940  00070016 R_390_64  00017238 c_exec + 0
00046ab0  00200016 R_390_64  00016a80 c_trap + 0
00047090  00250016 R_390_64  000430ac initvsn + 0
00047278  00550016 R_390_64  00047438 null_string + 2

That’s our missing strings.

>Is it possible you are linking in the wrong start file? gcc -v should
>output the command line it feeds to the linker.

Should be correct:

 /usr/libexec/gcc/s390x-linux-gnu/13/collect2 -fno-lto \
    -dynamic-linker /lib/ld-musl-s390x.so.1 -nostdlib -static -static -pie \
    --no-dynamic-linker -o mksh \
    /usr/lib/s390x-linux-musl/rcrt1.o /usr/lib/s390x-linux-musl/crti.o \
    /usr/lib/gcc/s390x-linux-gnu/13/crtbeginS.o \
    -L/usr/lib/s390x-linux-musl -L /usr/lib/gcc/s390x-linux-gnu/13/. \
    -z relro -z now --as-needed -z text --eh-frame-hdr \
    lalloc.o edit.o eval.o exec.o expr.o funcs.o histrap.o jobs.o \
    lex.o main.o misc.o shf.o syn.o tree.o var.o ulimit.o \
    --start-group /usr/lib/gcc/s390x-linux-gnu/13/libgcc.a \
    /usr/lib/gcc/s390x-linux-gnu/13/libgcc_eh.a -lc --end-group \
    /usr/lib/gcc/s390x-linux-gnu/13/crtendS.o /usr/lib/s390x-linux-musl/crtn.o

HTH & HAND,
//mirabilos
-- 
„Cool, /usr/share/doc/mksh/examples/uhr.gz ist ja ein Grund,
mksh auf jedem System zu installieren.“
-- XTaran auf der OpenRheinRuhr, ganz begeistert
(EN: “[…]uhr.gz is a reason to install mksh on every system.”)



Bug#1068350: [musl] Re: Bug#1068350: musl: miscompiles (runtime problems) on riscv64 and s390x with static-pie → seems to be a toolchain bug after all, it does too hit glibc

2024-04-04 Thread Markus Wichmann
Hi,

in static-pie, relocations get processed in _start, before main() is
called. In musl, this is done by linking with rcrt1.o as start file
instead of crt1.o. And that file processes all relative relocations. You
can check with readelf -r what the relocation types are. If they are not
relative, they will not be processed.

What you are seeing seems indicative of missing relocation processing.
Is it possible you are linking in the wrong start file? gcc -v should
output the command line it feeds to the linker.
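
As a rough illustration of the behaviour described above (not musl's actual rcrt1 code), the following sketch applies only base-relative relocations to an in-memory image and skips everything else. The struct and function names are invented for this example; the type numbers are the s390x values (R_390_64 is 0x16).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Relocation type numbers from the s390x ELF ABI. */
#define R_390_RELATIVE 12u
#define R_390_64       22u  /* 0x16 */

/* Same layout as Elf64_Rela, redeclared so the sketch needs no <elf.h>. */
struct rela {
	uint64_t r_offset;  /* where to patch, relative to the load base */
	uint64_t r_info;    /* symbol index in the high 32 bits, type in the low 32 */
	int64_t  r_addend;  /* link-time value to add to the load base */
};

/*
 * Apply only base-relative relocations to an image loaded at `base`
 * and return how many entries were skipped. A skipped slot keeps
 * whatever the linker wrote there at link time.
 */
size_t apply_relative_only(unsigned char *image, size_t image_size,
    uint64_t base, const struct rela *rel, size_t nrel)
{
	size_t skipped = 0;

	assert(image != NULL && rel != NULL);
	for (size_t i = 0; i < nrel; i++) {
		uint32_t type = (uint32_t)(rel[i].r_info & 0xffffffffu);
		uint64_t value;

		if (type != R_390_RELATIVE) {
			skipped++;          /* symbolic relocation: left alone */
			continue;
		}
		assert(rel[i].r_offset + sizeof(value) <= image_size);
		value = base + (uint64_t)rel[i].r_addend;
		memcpy(image + rel[i].r_offset, &value, sizeof(value));
	}
	return skipped;
}
```

With one relative and one R_390_64 entry, the first slot receives base plus addend while the second slot is left unchanged, so a slot that was zero at link time stays a null pointer at run time.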

Ciao,
Markus



Bug#1068350: musl: miscompiles (runtime problems) on riscv64 and s390x with static-pie → seems to be a toolchain bug after all, it does too hit glibc

2024-04-04 Thread Thorsten Glaser
Dixi quod…

>Now I (or someone) is going to have to reduce that to a testcase, so

No success with that, unfortunately.

>But this does seem to be a toolchain bug: adding -static-pie to the
>glibc dynamic-pie link command and…
>
>(gdb) print initcoms
>$1 = {0xda494 "typeset", 0x0, 0x0, 0x0, 0xda494 "typeset", 0x0, 0xd942c 
>"HOME", 0xda7d8 "PATH",

Wait, what?

(gdb) b main
Breakpoint 1 at 0xd820: file ../../main.c, line 785.
(gdb) print initcoms
$1 = {0xda494 "typeset", 0x0, 0x0, 0x0, 0xda494 "typeset", 0x0, 0xd942c "HOME", 
0xda7d8 "PATH",
[…]
(gdb) r
Starting program: /home/tg/mksh-59c/builddir/full/mksh

Breakpoint 1, main (argc=1, argv=0x3ffa4d8) at ../../main.c:785
785 {
(gdb) print initcoms
$2 = {0x3fff7eda494 "typeset", 0x3fff7ee4548  "-r",
  0x3fff7ee4ae0  "KSH_VERSION=@(#)MIRBSD KSH R59 2024/02/01 +Debian", 
0x0, 0x3fff7eda494 "typeset",
[…]

While in musl:

(gdb) print initcoms
$1 = {0x414a4 "typeset", 0x0, 0x0, 0x0, 0x414a4 "typeset", 0x0, 0x40478 "HOME", 
0x41d42 "PATH",
[…]
(gdb) r
Starting program: /home/tg/mksh-59c/builddir/static-musl/mksh

Breakpoint 1, main (argc=1, argv=0x3ffa498) at ../../main.c:785
785 {
(gdb) print initcoms
$2 = {0x3fff7fc14a4 "typeset", 0x0, 0x0, 0x0, 0x3fff7fc14a4 "typeset", 0x0, 
0x3fff7fc0478 "HOME",
[…]

So the pointers that were already non-null did get relocated, but the
null pointers stayed null.

Apparently, it *is* supported on glibc on s390x, mjt (qemu maintainer)
also said so in 2023.

bye,
//mirabilos
-- 
22:20⎜ The crazy that persists in his craziness becomes a master
22:21⎜ And the distance between the craziness and geniality is
only measured by the success 18:35⎜ "Psychotics are consistently
inconsistent. The essence of sanity is to be inconsistently inconsistent



Bug#1068350: musl: miscompiles (runtime problems) on riscv64 and s390x with static-pie → seems to be a toolchain bug after all, it does too hit glibc

2024-04-04 Thread Thorsten Glaser
Dixi quod…

>Hmm, actually… I could… test whether that one fixes static-pie
>on zelenka. Or at least the same approach. I’ll get back with
>report from that.

Having looked at the spec file, the only extra things the stock
specs do that the overriding specs don’t are:

*link:
[…] %{!static|static-pie:--eh-frame-hdr} […] %{static-pie:-static -pie 
--no-dynamic-linker -z text} […]

instead of:

[…] %{static-pie:-static -pie --no-dynamic-linker} […]

The -Wl,-z,text makes TEXTRELs an error. Granted.
The -Wl,--eh-frame-hdr is added for anything that’s not a normal
static executable; however, adding that to a musl build doesn’t
fix the problem either.

A bit of gdb-ing shows the problem, though: the source code has…

#define Ttypeset "typeset"
#define Tdr "-r"
//… (a variant of this is used for string sharing on ancient Unix)

static const char *initcoms[] = {
    Ttypeset, Tdr, initvsn, NULL,
    Ttypeset, Tdx, "HOME", TPATH, TSHELL, NULL,
    […]
};

It then iterates over these commands with:

for (wp = initcoms; *wp != NULL; wp++) {
    c_builtin(wp);
    while (*wp != NULL)
        wp++;
}

This is where the extra output happens:

(gdb) print initcoms
$3 = {0x3fff7fc14a4 "typeset", 0x0, 0x0, 0x0, 0x3fff7fc14a4 "typeset", 0x0, 
0x3fff7fc0478 "HOME", 
[…]

Notice the nullptrs there where string pointers are expected.
It shows the same output when just loading the executable, i.e. this
isn’t a runtime issue.
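
The effect of those null slots on the loop can be reproduced in isolation. The sketch below walks a table the same way the loop does, but counts commands instead of calling c_builtin(). The tables are shortened, and "-x" for Tdx as well as the version text are assumptions made for this example. With the broken table the first command degenerates to a bare "typeset" and the walk stops right after it, which is presumably where the extra output comes from.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Walk a NULL-separated, double-NULL-terminated command table the same
 * way as the loop above, but instead of calling c_builtin() count the
 * commands and record the argument count of the first one.
 */
size_t count_commands(const char *const *wp, size_t *first_argc)
{
	size_t ncmd = 0;

	assert(wp != NULL);
	for (; *wp != NULL; wp++) {
		size_t argc = 0;

		while (wp[argc] != NULL)
			argc++;
		if (ncmd++ == 0 && first_argc != NULL)
			*first_argc = argc;
		wp += argc;   /* same effect as: while (*wp != NULL) wp++; */
	}
	return ncmd;
}

/* Shortened tables; "-x" for Tdx and the version text are assumptions. */
const char *const initcoms_ok[] = {
	"typeset", "-r", "KSH_VERSION=demo", NULL,
	"typeset", "-x", "HOME", "PATH", NULL,
	NULL
};
const char *const initcoms_broken[] = {
	"typeset", NULL, NULL, NULL,
	"typeset", NULL, "HOME", "PATH", NULL,
	NULL
};
```

Calling count_commands() on initcoms_ok sees two commands, the first with three words; on initcoms_broken it sees a single one-word command, because the second slot already looks like the end-of-table marker.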

Linking the exact same .o files with the exact same command minus
-static-pie gives:

(gdb) print initcoms
$1 = {0x103cb34 "typeset", 0x103e368  "-r", 
  0x103e73c  "KSH_VERSION=@(#)MIRBSD KSH R59 2024/02/01 +Debian", 0x0, 
0x103cb34 "typeset", 

But this does seem to be a toolchain bug: adding -static-pie to the
glibc dynamic-pie link command and…

(gdb) print initcoms
$1 = {0xda494 "typeset", 0x0, 0x0, 0x0, 0xda494 "typeset", 0x0, 0xd942c "HOME", 
0xda7d8 "PATH",

Now I (or someone) is going to have to reduce that to a testcase, so
we can detect static-pie viability before it’s committed to being used…
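
One possible shape for such a check, short of a reduced testcase, is to scan the text output of readelf -r for a test binary and flag any relocation that is not a relative one. The sketch below is an assumption about how that could look, not an existing tool; the function names are invented, and it only pattern-matches on the type column.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if `line` looks like a relocation row from `readelf -r`
 * whose type does not end in "_RELATIVE", 0 otherwise (including for
 * headers and blank lines, which contain no " R_" token).
 */
int is_non_relative_row(const char *line)
{
	static const char suffix[] = "_RELATIVE";
	const size_t slen = sizeof(suffix) - 1;
	const char *type;
	size_t len;

	assert(line != NULL);
	type = strstr(line, " R_");
	if (type == NULL)
		return 0;
	type++;                         /* skip the leading space */
	len = strcspn(type, " \t\n");   /* length of the type token */
	if (len >= slen && strncmp(type + len - slen, suffix, slen) == 0)
		return 0;
	return 1;
}

/* Count such rows in a stream, e.g. the piped output of readelf -r. */
size_t count_non_relative(FILE *in)
{
	char line[512];
	size_t bad = 0;

	assert(in != NULL);
	while (fgets(line, sizeof(line), in) != NULL)
		bad += (size_t)is_non_relative_row(line);
	return bad;
}
```

A build script could run the linker on a small probe program, pipe readelf -r into count_non_relative(stdin), and treat any non-zero count as a sign that static-pie output on that target cannot be handled by a relative-only startup file.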

bye,
//mirabilos
-- 
As long as you don't do any dirty tricks, and I mean *really*
dirty tricks, like XORing both pointers of a doubly linked list
and storing them in a single word, Boehm works
splendidly.   -- Andreas Bogk about boehm-gc in d.a.s.r