-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
On 01/13/2014 12:21 AM, Richard Weinberger wrote:
> On Sat, Jan 11, 2014 at 11:47 AM, Toralf Förster <[email protected]>
> wrote:
> I do fuzz testing with trinity (latest git version) in a stable 32 bit Gentoo
> Linux user mode linux image.
> The host is a stable 32 bit vanilla 3.12.7 kernel, the guest runs latest git
> tree + 2 patches (attached).
>
> The trinity call in the UML guest is :
> $> trinity -q -l off -N 10000 -C 2 -x move_pages -x mremap -v /mnt/ramdisk
>
> After a while no further progress is seen on the command line at the host
> system; the trinity process seems to just hang or idle. When this occurs I
> can no longer ssh into the system. The system nevertheless keeps running. In
> another terminal I still see the output of this command:
>
>> Does it consume 100% CPU?
>
No.
It just doesn't allow new ssh connections. Existing ssh connections are still
working.
> $> ssh root@trinity "tail -f /var/log/messages"
>
> That's why I do know that the system does not hang completely. The output of
> top at the host system gives me the pid of the linux exe. A gdb call gives
> for that pid :
>
> $ date; sudo gdb /home/tfoerste/devel/linux/linux 25224 -n -batch -ex 'bt
> full'
> Sat Jan 11 11:36:47 CET 2014
>
> warning: Could not load shared library symbols for linux-gate.so.1.
> Do you need "set solib-search-path" or "set sysroot"?
> 0xb7800424 in __kernel_vsyscall ()
> #0 0xb7800424 in __kernel_vsyscall ()
> No symbol table info available.
> #1 0x083d63ff in __nanosleep_nocancel ()
> No symbol table info available.
> #2 0x0807266c in idle_sleep (nsecs=602496380195307520) at
> arch/um/os-Linux/time.c:183
> ts = {tv_sec = 0, tv_nsec = 8436602}
> #3 0x0805fc0f in arch_cpu_idle () at arch/um/kernel/process.c:208
> No locals.
> #4 0x080a8971 in cpu_idle_loop () at kernel/cpu/idle.c:98
> No locals.
> #5 cpu_startup_entry (state=CPUHP_ONLINE) at kernel/cpu/idle.c:140
> No locals.
> #6 0x084215e9 in rest_init () at init/main.c:402
> pid = -516
> __func__ = "rest_init"
> #7 0x080487e1 in start_kernel () at init/main.c:656
> command_line = 0x85b8400 <command_line> "earlyprintk
> ubda=/home/tfoerste/virtual/uml/trinity ubdb=/mnt/ramdisk/trinity_swap
> eth0=tuntap,tap0,72:ef:3d:9f:c3:5a mem=1025M con0=fd:0,fd:1 con=pts
> rootfstype=ext4 root=98:0"
> #8 0x08049e42 in start_kernel_proc (unused=0x0) at
> arch/um/kernel/skas/process.c:48
> pid = -516
> __func__ = "start_kernel_proc"
> #9 0x0805f7cb in new_thread_handler () at arch/um/kernel/process.c:129
> fn = 0x0
> #10 0x00000000 in ?? ()
> No symbol table info available.
>
>
>
> Please note that BUG_ON was not triggered. For completeness here are the gdb
> traces from all linux processes currently running at the host:
>
>> So let's forget the 516 issue for now.
>> What we know for now is that you managed to trigger a lockup within UML.
>
Agreed, especially b/c I added this patch too :
$ cat ~/devel/priv/uml/pid516_2.patch
- --- init/main.c_orig 2014-01-12 16:43:48.585439158 +0100
+++ init/main.c 2014-01-12 16:44:01.706438453 +0100
@@ -389,6 +389,7 @@
BUG_ON(pid == -516);
rcu_read_lock();
kthreadd_task = find_task_by_pid_ns(pid, &init_pid_ns);
+ BUG_ON(pid == -516);
rcu_read_unlock();
complete(&kthreadd_done);
and this wasn't triggered (/me wonders if the -516 is somehow garbage).
But I can narrow down the problem. In a still open ssh session I ran:
$ lsof | grep t3
bash 6129 tfoerste cwd DIR 98,0 4096 734
/home/tfoerste/t3
logger 6135 tfoerste cwd DIR 98,0 4096 734
/home/tfoerste/t3
(t3 is the ~/t3 directory that I cd into before I run trinity.)
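The lsof check above can also be approximated without lsof by reading the cwd
links under /proc. This is only a minimal sketch, assuming a Linux system with
/proc mounted; the function name procs_with_cwd is made up for illustration,
and processes owned by other users are only visible when it is run as root:

```shell
# Print "PID COMMAND" for every process whose current working
# directory is the given directory (hypothetical helper, Linux /proc only).
procs_with_cwd() {
    # Resolve symlinks so the path matches what /proc reports.
    target=$(readlink -f "$1") || return 1
    for link in /proc/[0-9]*/cwd; do
        # readlink fails for processes we are not allowed to inspect; skip them.
        cwd=$(readlink "$link" 2>/dev/null) || continue
        if [ "$cwd" = "$target" ]; then
            pid=${link#/proc/}
            pid=${pid%/cwd}
            printf '%s %s\n' "$pid" "$(cat "/proc/$pid/comm" 2>/dev/null)"
        fi
    done
}
```

Called as "procs_with_cwd ~/t3", it should list the same bash and logger
processes that the lsof output shows.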
And after killing the logger command the trinity batch continues :
$ ps xf -eo pid,start_time,command | grep trinity
6412 20:48 | \_ grep --colour=auto trinity
6129 19:17 \_ bash -c cd ~; sudo su -c 'if [[ -d ./t3 ]]; then sudo
chmod -R a+rwx ./t3; sudo rm -rf ./t3; fi'; mkdir ./t3; cd ./t3; logger "17#-1,
M=/mnt/ramdisk"; if [[ -n /mnt/ramdisk ]]; then if [[ -d
/mnt/ramdisk/victims/v1 ]]; then sudo chmod -R a+rwx /mnt/ramdisk/victims/v1;
sudo rm -rf /mnt/ramdisk/victims/v1; fi; mkdir -p /mnt/ramdisk/victims/v1/v2;
for i in $(seq -w 0 99); do touch /mnt/ramdisk/victims/v1/v2/f$i 2>/dev/null;
mkdir /mnt/ramdisk/victims/v1/v2/d$i 2>/dev/null; done; fi; trinity -q -N
10000 -C 2 -x move_pages -x mremap -V /mnt/ramdisk/victims/v1/v2
6390 20:46 \_ trinity -q -N 10000 -C 2 -x move_pages -x mremap -V
/mnt/ramdisk/victims/v1/v2
6391 20:46 \_ trinity -q -N 10000 -C 2 -x move_pages -x
mremap -V /mnt/ramdisk/victims/v1/v2
6392 20:46 \_ trinity -q -N 10000 -C 2 -x move_pages -x
mremap -V /mnt/ramdisk/victims/v1/v2
6408 20:47 \_ trinity -q -N 10000 -C 2 -x move_pages -x
mremap -V /mnt/ramdisk/victims/v1/v2
6410 20:48 \_ trinity -q -N 10000 -C 2 -x move_pages -x
mremap -V /mnt/ramdisk/victims/v1/v2
FWIW, ssh into the UML guest is still no longer possible. So I'm pretty sure
that trinity really damages something there, but I'd expect such damage to
show up somewhere in the logs, or?
And finally: now the trinity batch command hangs again, and this time not even
killing logger helps.
And a shutdown ("sudo halt; exit") hangs too.
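To repeat the backtrace collection for a later hang, a loop like the following
might be handy. It is only a sketch: the function name bt_all and the GDB
override variable are assumptions made here for illustration, not part of the
original setup. It attaches with "gdb -p", so a failed attach is not followed
by gdb trying to open the PID as a core file.

```shell
# Print a backtrace for each given PID (hypothetical helper).
# GDB may be overridden, e.g. for a dry run; the default is "sudo gdb".
bt_all() {
    for pid in "$@"; do
        echo "=== pid $pid ==="
        # -n: skip init files, -batch: exit after running the commands,
        # -p: attach to the running process with this PID
        ${GDB:-sudo gdb} -n -batch -ex 'bt' -p "$pid" 2>&1
    done
}
```

A call such as "bt_all $(pgrep -x linux)" would then cover all UML processes
on the host; processes already traced by another gdb will still refuse the
attach.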
>
>
> $ pgrep linux | xargs -n1 -I {} sudo gdb /home/tfoerste/devel/linux/linux {}
> -n -batch -ex 'bt'
> warning: process 1613 is already traced by process 25224
> ptrace: Operation not permitted.
> /home/tfoerste/1613: No such file or directory.
> No stack.
> warning: process 21849 is already traced by process 25224
> ptrace: Operation not permitted.
> /home/tfoerste/21849: No such file or directory.
> No stack.
>
> warning: Could not load shared library symbols for linux-gate.so.1.
> Do you need "set solib-search-path" or "set sysroot"?
> 0xb7800424 in __kernel_vsyscall ()
> #0 0xb7800424 in __kernel_vsyscall ()
> #1 0x083d63ff in __nanosleep_nocancel ()
> #2 0x0807266c in idle_sleep (nsecs=602496380205307520) at
> arch/um/os-Linux/time.c:183
> #3 0x0805fc0f in arch_cpu_idle () at arch/um/kernel/process.c:208
> #4 0x080a8971 in cpu_idle_loop () at kernel/cpu/idle.c:98
> #5 cpu_startup_entry (state=CPUHP_ONLINE) at kernel/cpu/idle.c:140
> #6 0x084215e9 in rest_init () at init/main.c:402
> #7 0x080487e1 in start_kernel () at init/main.c:656
> #8 0x08049e42 in start_kernel_proc (unused=0x0) at
> arch/um/kernel/skas/process.c:48
> #9 0x0805f7cb in new_thread_handler () at arch/um/kernel/process.c:129
> #10 0x00000000 in ?? ()
>
> warning: process 25231 is a cloned process
>
> warning: Could not load shared library symbols for linux-gate.so.1.
> Do you need "set solib-search-path" or "set sysroot"?
> 0xb7800424 in __kernel_vsyscall ()
> #0 0xb7800424 in __kernel_vsyscall ()
> #1 0x083da446 in syscall ()
> #2 0x0806e861 in io_getevents (events=<optimized out>, ctx_id=<optimized
> out>, min_nr=<optimized out>, nr=<optimized out>, timeout=<optimized out>) at
> arch/um/os-Linux/aio.c:49
> #3 aio_thread (arg=0x0) at arch/um/os-Linux/aio.c:109
> #4 0x083db56e in clone ()
>
> warning: process 25232 is a cloned process
>
> warning: Could not load shared library symbols for linux-gate.so.1.
> Do you need "set solib-search-path" or "set sysroot"?
> 0xb7800424 in __kernel_vsyscall ()
> #0 0xb7800424 in __kernel_vsyscall ()
> #1 0x083d82c2 in __read_nocancel ()
> #2 0x0806f3ff in read (__nbytes=<optimized out>, __buf=<optimized out>,
> __fd=<optimized out>) at /usr/include/bits/unistd.h:44
> #3 os_read_file (fd=-512, buf=0xfffffe00, len=-512) at
> arch/um/os-Linux/file.c:253
> #4 0x0806bafc in io_thread (arg=0x0) at arch/um/drivers/ubd_kern.c:1482
> #5 0x083db56e in clone ()
>
> warning: process 25233 is a cloned process
>
> warning: Could not load shared library symbols for linux-gate.so.1.
> Do you need "set solib-search-path" or "set sysroot"?
> 0xb7800424 in __kernel_vsyscall ()
> #0 0xb7800424 in __kernel_vsyscall ()
> #1 0x083d9132 in __poll_nocancel ()
> #2 0x08071114 in poll (__timeout=<optimized out>, __nfds=<optimized out>,
> __fds=<optimized out>) at /usr/include/bits/poll2.h:46
> #3 write_sigio_thread (unused=0x0) at arch/um/os-Linux/sigio.c:61
> #4 0x083db56e in clone ()
> warning: process 25234 is a zombie - the process has already terminated
> ptrace: Operation not permitted.
> /home/tfoerste/25234: No such file or directory.
> No stack.
> ...
>
>
> Please Cc: me I'm not subscribed.
>
>> Wouldn't it make sense to subscribe?
>> You post very often on this list. :)
>
done ;)
>
>
>>
>> ------------------------------------------------------------------------------
>> CenturyLink Cloud: The Leader in Enterprise Cloud Services.
>> Learn Why More Businesses Are Choosing CenturyLink Cloud For
>> Critical Workloads, Development Environments & Everything In Between.
>> Get a Quote or Start a Free Trial Today.
>> http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk
>> _______________________________________________
>> User-mode-linux-devel mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
>>
>
>
>
- --
MfG/Sincerely
Toralf Förster
pgp finger print:1A37 6F99 4A9D 026F 13E2 4DCF C4EA CDDE 0076 E94E
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
iF4EAREIAAYFAlLURGIACgkQxOrN3gB26U44RQD+KUqGBeP6/nJk1K/1Wx6nz7ij
/JXcjNN+ZBt8PsMWrV4A/jx7w7Xrl0RPWcwXVFYm+Ixo0dSbtr+zvh/2pdcCNU2c
=uGid
-----END PGP SIGNATURE-----