Re: linux clone issue
On Tue, Oct 05, 2021 at 03:57:14PM +0100, Robert Swindells wrote:
>
> Manuel Bouyer wrote:
> >I'm trying to run a binary-only linux program under NetBSD 9.2.
> >From what I found, the binary was built on Ubuntu 16.04
> >
> >The program dies at a specific point and it seems to be a bug in our
> >emulation:
>
>   8992   8992 mylinuxprog CALL  set_robust_list(0x7f7ff7ef5a20,0x18)
>   8992   8992 mylinuxprog RET   set_robust_list 0
>
> This is doing futex stuff which isn't in -9, it doesn't work in -current
> either but thorpej@ has an improved version on a branch.

Hum, so after the ptrace issue this is going to be the next challenge :)

--
Manuel Bouyer
     NetBSD: 26 ans d'experience feront toujours la difference
linux ptrace issue [Re: linux clone issue]
On Tue, Oct 05, 2021 at 01:08:52PM +0200, Manuel Bouyer wrote:
> On Tue, Oct 05, 2021 at 12:42:33AM -0400, Eric Hawicz wrote:
> >
> > On 10/4/2021 10:33 AM, Manuel Bouyer wrote:
> > > Hello
> > > I'm trying to run a binary-only linux program under NetBSD 9.2.
> > > From what I found, the binary was built on Ubuntu 16.04
> > > [...]
> > >
> > > As you can see above (ktrace -si output), the read on fd 3 in 26751
> > > returns with an error as soon as the child does its execve(), just
> > > as if CLOEXEC was set in the child. But the dup2(4,1) should keep
> > > the write side open without CLOEXEC. The program does a similar
> > > sequence just before (also forking a shell to execute some command)
> > > and it works.
> > > Later when sh tries to write to stdout it gets a SIGPIPE.
> > >
> > > I couldn't reproduce this with a simple program.
> > > But it seems that I can't reproduce this clone call. It seems that we are
> > > called with flags 0x1200011, which would translate to
> > > CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD,
> > > and a NULL stack pointer.
> > > But when run on linux, this clone syscall straces to
> > > CLONE_VM|CLONE_VFORK|SIGCHLD
> >
> > I think that combination of flags is actually a "fork()" call, which glibc
> > implements using clone. I found that through
> > https://eli.thegreenplace.net/2018/launching-linux-threads-and-processes-with-clone/,
> > which mentions that glibc has an ARCH_FORK macro, though it seems that the
> > more recent code uses an arch_fork inline function:
> > https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/arch-fork.h;h=b846da08f98839aef336868de24850626428509c;hb=HEAD
>
> Yes, I think it's a form of fork() or vfork(). But when I compile a
> test program on linux (RHEL7 or Ubuntu 20), fork() and vfork() appear
> as fork and vfork in NetBSD's ktrace, not clone.

I missed a point in the trace output: the parent is killed, and read()
returns not because the other end is closed but because of the signal.

This seems to come from a ptrace difference between linux and our emulation.
Actually this binary linux program does a fork() and the child does the work,
the parent just waits. But what happens is:

the parent:
    p = fork()
    wait()
    ptrace(PTRACE_CONT, p, NULL, SIG_0)
    exit(0)

the child does:
    ptrace(PTRACE_TRACEME, 0, NULL, NULL)
    exit(0)

On linux, ptrace(PTRACE_TRACEME) returns EPERM, the wait in the parent
waits until the child exits, and ptrace(PTRACE_CONT) gets ESRCH.

On NetBSD, ptrace(PTRACE_TRACEME) succeeds, wait() returns at some point
before the child exits, the parent does ptrace(PTRACE_CONT) on the child,
and the child gets killed (not by the parent, I can't see a kill() in the
trace).

On linux, ptrace(PTRACE_TRACEME) receiving EPERM may be because the process
is running under strace. Running strace without -f (so that only the parent
gets traced), I see the wait() returning, the parent getting a SIGCHLD, and
ptrace(PTRACE_CONT) succeeding. But on linux, it doesn't seem that an
orphaned child process gets killed.

Could our linux ptrace emulation be fixed in any way? Especially to avoid the
    pid XXX was killed: orphaned traced process

--
Manuel Bouyer
     NetBSD: 26 ans d'experience feront toujours la difference
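
The parent/child sequence described in this message can be sketched as a small C function for experimenting. This is a reconstruction from the description in the thread, not the code of the binary in question: the names `run_traceme_sequence` and `struct seq_result` are hypothetical, and NULL is assumed as the "no signal" data argument of PTRACE_CONT. It uses the Linux request names, so it is meant to be built on a linux system and then run there or under the emulation.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result record, so a caller can inspect what happened. */
struct seq_result {
    pid_t child;      /* pid returned by fork(), -1 on failure */
    pid_t waited;     /* return value of waitpid() in the parent */
    int wait_status;  /* status reported by waitpid() */
    long cont_ret;    /* return value of ptrace(PTRACE_CONT) */
    int cont_errno;   /* errno after PTRACE_CONT, 0 if it succeeded */
};

/* Run the sequence from the thread: the child asks to be traced and
 * exits, the parent waits and then tries to continue the child. */
struct seq_result run_traceme_sequence(void)
{
    struct seq_result r = { -1, -1, 0, -1, 0 };

    r.child = fork();
    if (r.child == -1)
        return r;

    if (r.child == 0) {
        /* Child: request tracing by the parent, ignore the result, exit. */
        (void)ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        _exit(0);
    }

    /* Parent: wait for a status change in the child. Whether this
     * reports a stop or the exit is the platform difference at issue. */
    r.waited = waitpid(r.child, &r.wait_status, 0);

    /* Parent: try to continue the child without delivering a signal. */
    errno = 0;
    r.cont_ret = ptrace(PTRACE_CONT, r.child, NULL, NULL);
    r.cont_errno = (r.cont_ret == -1) ? errno : 0;
    return r;
}
```

Checking `WIFEXITED(r.wait_status)` against `WIFSTOPPED(r.wait_status)` and looking at `cont_errno` shows which of the two behaviours described above a given system exhibits. The outcome depends on the platform and on whether the program itself is being traced.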
Re: linux clone issue
Manuel Bouyer wrote:
>I'm trying to run a binary-only linux program under NetBSD 9.2.
>From what I found, the binary was built on Ubuntu 16.04
>
>The program dies at a specific point and it seems to be a bug in our
>emulation:

  8992   8992 mylinuxprog CALL  set_robust_list(0x7f7ff7ef5a20,0x18)
  8992   8992 mylinuxprog RET   set_robust_list 0

This is doing futex stuff which isn't in -9, it doesn't work in -current
either but thorpej@ has an improved version on a branch.
Re: linux clone issue
On Tue, Oct 05, 2021 at 12:42:33AM -0400, Eric Hawicz wrote:
>
> On 10/4/2021 10:33 AM, Manuel Bouyer wrote:
> > Hello
> > I'm trying to run a binary-only linux program under NetBSD 9.2.
> > From what I found, the binary was built on Ubuntu 16.04
> > [...]
> >
> > As you can see above (ktrace -si output), the read on fd 3 in 26751 returns
> > with an error as soon as the child does its execve(), just as if CLOEXEC
> > was set in the child. But the dup2(4,1) should keep the write side open
> > without CLOEXEC. The program does a similar sequence just before
> > (also forking a shell to execute some command) and it works.
> > Later when sh tries to write to stdout it gets a SIGPIPE.
> >
> > I couldn't reproduce this with a simple program.
> > But it seems that I can't reproduce this clone call. It seems that we are
> > called with flags 0x1200011, which would translate to
> > CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD,
> > and a NULL stack pointer.
> > But when run on linux, this clone syscall straces to
> > CLONE_VM|CLONE_VFORK|SIGCHLD
>
> I think that combination of flags is actually a "fork()" call, which glibc
> implements using clone. I found that through
> https://eli.thegreenplace.net/2018/launching-linux-threads-and-processes-with-clone/,
> which mentions that glibc has an ARCH_FORK macro, though it seems that the
> more recent code uses an arch_fork inline function:
> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/arch-fork.h;h=b846da08f98839aef336868de24850626428509c;hb=HEAD

Yes, I think it's a form of fork() or vfork(). But when I compile a
test program on linux (RHEL7 or Ubuntu 20), fork() and vfork() appear
as fork and vfork in NetBSD's ktrace, not clone.

> >
> > I tried writing a program using fork(), vfork() or clone() but
> > none of them would use the clone() syscall as does my linux binary.
> > Any idea what could cause clone() to be used this way ?
>
> Is your binary statically linked? Maybe it has a different glibc
> implementation from the .so that's on your system.

Yes, the linux emulation on NetBSD uses suse's glibc, while my linux test
systems are RHEL7 and Ubuntu 20

--
Manuel Bouyer
     NetBSD: 26 ans d'experience feront toujours la difference
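
To make the flag values discussed in this thread easier to check, here is a minimal sketch of a decoder for the flags argument of the Linux clone syscall. The `LX_CSIGNAL` macro, the `lx_clone_flags` table and the `decode_clone_flags` function are hypothetical names introduced for illustration. The numeric values are the Linux ones, defined locally so the code also builds on a system without the Linux headers. Only a subset of the flags is listed; any other set bit is printed in hex.

```c
#include <stdio.h>
#include <string.h>

/* Low byte of the clone flags: signal sent to the parent on exit. */
#define LX_CSIGNAL 0x000000ffUL

struct flag_name {
    unsigned long bit;
    const char *name;
};

/* Subset of the Linux clone flag values (hypothetical local table). */
static const struct flag_name lx_clone_flags[] = {
    { 0x00000100UL, "CLONE_VM" },
    { 0x00000200UL, "CLONE_FS" },
    { 0x00000400UL, "CLONE_FILES" },
    { 0x00000800UL, "CLONE_SIGHAND" },
    { 0x00004000UL, "CLONE_VFORK" },
    { 0x00010000UL, "CLONE_THREAD" },
    { 0x00080000UL, "CLONE_SETTLS" },
    { 0x00100000UL, "CLONE_PARENT_SETTID" },
    { 0x00200000UL, "CLONE_CHILD_CLEARTID" },
    { 0x01000000UL, "CLONE_CHILD_SETTID" },
};

/* Render flags as "NAME|NAME|sig=N" into buf and return buf.
 * Output is truncated if buf is too small. */
char *decode_clone_flags(unsigned long flags, char *buf, size_t len)
{
    const size_t nflags = sizeof(lx_clone_flags) / sizeof(lx_clone_flags[0]);
    unsigned long rest = flags & ~LX_CSIGNAL; /* bits not yet named */
    size_t used = 0;
    size_t i;
    int n;

    if (buf == NULL || len == 0)
        return buf;
    buf[0] = '\0';

    for (i = 0; i < nflags; i++) {
        if ((flags & lx_clone_flags[i].bit) == 0)
            continue;
        rest &= ~lx_clone_flags[i].bit;
        n = snprintf(buf + used, len - used, "%s%s",
                     used ? "|" : "", lx_clone_flags[i].name);
        if (n < 0 || (size_t)n >= len - used)
            return buf; /* out of room: keep what fits */
        used += (size_t)n;
    }

    /* Bits missing from the table are shown in hex. */
    if (rest != 0) {
        n = snprintf(buf + used, len - used, "%s0x%lx",
                     used ? "|" : "", rest);
        if (n < 0 || (size_t)n >= len - used)
            return buf;
        used += (size_t)n;
    }

    snprintf(buf + used, len - used, "%ssig=%lu",
             used ? "|" : "", flags & LX_CSIGNAL);
    return buf;
}
```

With a 128-byte buffer, `decode_clone_flags(0x1200011UL, buf, sizeof buf)` yields `CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|sig=17`, which matches the translation given above (17 is SIGCHLD on Linux x86 and amd64). The flags reported by strace, CLONE_VM|CLONE_VFORK|SIGCHLD, correspond to 0x4111.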
Re: linux clone issue
On 10/4/2021 10:33 AM, Manuel Bouyer wrote:
> Hello
> I'm trying to run a binary-only linux program under NetBSD 9.2.
> From what I found, the binary was built on Ubuntu 16.04
> [...]
>
> As you can see above (ktrace -si output), the read on fd 3 in 26751 returns
> with an error as soon as the child does its execve(), just as if CLOEXEC
> was set in the child. But the dup2(4,1) should keep the write side open
> without CLOEXEC. The program does a similar sequence just before
> (also forking a shell to execute some command) and it works.
> Later when sh tries to write to stdout it gets a SIGPIPE.
>
> I couldn't reproduce this with a simple program.
> But it seems that I can't reproduce this clone call. It seems that we are
> called with flags 0x1200011, which would translate to
> CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD,
> and a NULL stack pointer.
> But when run on linux, this clone syscall straces to
> CLONE_VM|CLONE_VFORK|SIGCHLD

I think that combination of flags is actually a "fork()" call, which glibc
implements using clone. I found that through
https://eli.thegreenplace.net/2018/launching-linux-threads-and-processes-with-clone/,
which mentions that glibc has an ARCH_FORK macro, though it seems that the
more recent code uses an arch_fork inline function:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/arch-fork.h;h=b846da08f98839aef336868de24850626428509c;hb=HEAD

> I tried writing a program using fork(), vfork() or clone() but
> none of them would use the clone() syscall as does my linux binary.
> Any idea what could cause clone() to be used this way ?

Is your binary statically linked? Maybe it has a different glibc
implementation from the .so that's on your system.

Eric
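
The expectation quoted above, that dup2(4,1) leaves the new descriptor without close-on-exec, is what POSIX specifies: dup2() clears FD_CLOEXEC on the descriptor it creates. A minimal sketch to check this on a given system follows. The helper names `fd_is_cloexec` and `cloexec_after_dup2` are hypothetical, and the target descriptor is a parameter so that a test does not have to overwrite stdout as the program in the thread does.

```c
#include <fcntl.h>
#include <unistd.h>

/* Return 1 if fd has FD_CLOEXEC set, 0 if not, -1 on error. */
int fd_is_cloexec(int fd)
{
    int fl = fcntl(fd, F_GETFD);

    if (fl == -1)
        return -1;
    return (fl & FD_CLOEXEC) ? 1 : 0;
}

/* Create a pipe, mark both ends close-on-exec, dup2() the write end
 * onto target_fd and report the close-on-exec state of target_fd
 * (1 set, 0 clear, -1 error). target_fd is closed before returning,
 * so do not pass a descriptor that is still needed. */
int cloexec_after_dup2(int target_fd)
{
    int p[2];
    int result = -1;

    if (pipe(p) == -1)
        return -1;

    if (fcntl(p[0], F_SETFD, FD_CLOEXEC) != -1 &&
        fcntl(p[1], F_SETFD, FD_CLOEXEC) != -1 &&
        target_fd != p[1] &&
        dup2(p[1], target_fd) != -1) {
        result = fd_is_cloexec(target_fd);
        close(target_fd);
    }

    close(p[0]);
    close(p[1]);
    return result;
}
```

`cloexec_after_dup2(100)` should return 0 on a conforming system, even though the source descriptor had the flag set. This only checks the descriptor flag. It does not reproduce the read() error seen in the ktrace output.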