-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 01/13/2014 08:54 PM, Toralf Förster wrote:
> On 01/13/2014 12:21 AM, Richard Weinberger wrote:
>> On Sat, Jan 11, 2014 at 11:47 AM, Toralf Förster <toralf.foers...@gmx.de>
>> wrote:
>> I do fuzz testing with trinity (latest git version) on a stable 32 bit
>> Gentoo Linux user mode linux image.
>> The host is a stable 32 bit vanilla 3.12.7 kernel, the guest runs latest git
>> tree + 2 patches (attached).
>
>> The trinity call in the UML guest is :
>> $> trinity -q -l off -N 10000 -C 2 -x move_pages -x mremap -v /mnt/ramdisk
>
>> After a while there's no progress on the command line seen at the host
>> system - the trinity process seems to just hang or idle. When this does
>> occur I can no longer ssh into the system. The system, however, keeps
>> running. In another terminal I still see the output of this command:
>
>>> Does it consume 100% CPU?
>
> No.
> It just doesn't allow new ssh connections. Existing ssh connections are still
> working.
>
>> $> ssh root@trinity "tail -f /var/log/messages"
>
>> That's why I do know that the system does not hang completely. The output of
>> top at the host system gives me the pid of the linux exe. A gdb call gives
>> for that pid :
>
>> $ date; sudo gdb /home/tfoerste/devel/linux/linux 25224 -n -batch -ex 'bt full'
>> Sat Jan 11 11:36:47 CET 2014
>
>> warning: Could not load shared library symbols for linux-gate.so.1.
>> Do you need "set solib-search-path" or "set sysroot"?
>> 0xb7800424 in __kernel_vsyscall ()
>> #0  0xb7800424 in __kernel_vsyscall ()
>> No symbol table info available.
>> #1  0x083d63ff in __nanosleep_nocancel ()
>> No symbol table info available.
>> #2  0x0807266c in idle_sleep (nsecs=602496380195307520) at arch/um/os-Linux/time.c:183
>>         ts = {tv_sec = 0, tv_nsec = 8436602}
>> #3  0x0805fc0f in arch_cpu_idle () at arch/um/kernel/process.c:208
>> No locals.
>> #4  0x080a8971 in cpu_idle_loop () at kernel/cpu/idle.c:98
>> No locals.
>> #5  cpu_startup_entry (state=CPUHP_ONLINE) at kernel/cpu/idle.c:140
>> No locals.
>> #6  0x084215e9 in rest_init () at init/main.c:402
>>         pid = -516
>>         __func__ = "rest_init"
>> #7  0x080487e1 in start_kernel () at init/main.c:656
>>         command_line = 0x85b8400 <command_line> "earlyprintk
>> ubda=/home/tfoerste/virtual/uml/trinity ubdb=/mnt/ramdisk/trinity_swap
>> eth0=tuntap,tap0,72:ef:3d:9f:c3:5a mem=1025M con0=fd:0,fd:1 con=pts
>> rootfstype=ext4 root=98:0"
>> #8  0x08049e42 in start_kernel_proc (unused=0x0) at arch/um/kernel/skas/process.c:48
>>         pid = -516
>>         __func__ = "start_kernel_proc"
>> #9  0x0805f7cb in new_thread_handler () at arch/um/kernel/process.c:129
>>         fn = 0x0
>> #10 0x00000000 in ?? ()
>> No symbol table info available.
>
>
>
>> Please note that BUG_ON was not triggered. For completeness here are the gdb
>> traces from all linux processes currently running at the host:
>
>>> So let's forget the 516 issue for now.
>>> What we know for now is that you manage to trigger a lockup within UML.
>
> Agreed, especially b/c I added this patch too :
> $ cat ~/devel/priv/uml/pid516_2.patch
> --- init/main.c_orig    2014-01-12 16:43:48.585439158 +0100
> +++ init/main.c 2014-01-12 16:44:01.706438453 +0100
> @@ -389,6 +389,7 @@
>         BUG_ON(pid == -516);
>         rcu_read_lock();
>         kthreadd_task = find_task_by_pid_ns(pid, &init_pid_ns);
> +       BUG_ON(pid == -516);
>         rcu_read_unlock();
>         complete(&kthreadd_done);
>
> and this wasn't triggered (/me wonders if the -516 is somehow garbage).
>
> But I can narrow down the problem. In a still-open ssh session I ran :
>
> $ lsof | grep t3
> bash       6129   tfoerste  cwd       DIR       98,0     4096        734 /home/tfoerste/t3
> logger     6135   tfoerste  cwd       DIR       98,0     4096        734 /home/tfoerste/t3
>
> (t3 is the ~/t3 directory that I cd into before I run trinity.)
>
> And after killing the logger command the trinity batch continues :
>
> $ ps xf -eo pid,start_time,command | grep trinity
> 6412 20:48 | \_ grep --colour=auto trinity
> 6129 19:17 \_ bash -c cd ~; sudo su -c 'if [[ -d ./t3 ]]; then sudo
> chmod -R a+rwx ./t3; sudo rm -rf ./t3; fi'; mkdir ./t3; cd ./t3; logger
> "17#-1, M=/mnt/ramdisk"; if [[ -n /mnt/ramdisk ]]; then if [[ -d
> /mnt/ramdisk/victims/v1 ]]; then sudo chmod -R a+rwx /mnt/ramdisk/victims/v1;
> sudo rm -rf /mnt/ramdisk/victims/v1; fi; mkdir -p /mnt/ramdisk/victims/v1/v2;
> for i in $(seq -w 0 99); do touch /mnt/ramdisk/victims/v1/v2/f$i 2>/dev/null;
> mkdir /mnt/ramdisk/victims/v1/v2/d$i 2>/dev/null; done; fi; trinity -q -N
> 10000 -C 2 -x move_pages -x mremap -V /mnt/ramdisk/victims/v1/v2
> 6390 20:46 \_ trinity -q -N 10000 -C 2 -x move_pages -x mremap -V /mnt/ramdisk/victims/v1/v2
> 6391 20:46 \_ trinity -q -N 10000 -C 2 -x move_pages -x mremap -V /mnt/ramdisk/victims/v1/v2
> 6392 20:46 \_ trinity -q -N 10000 -C 2 -x move_pages -x mremap -V /mnt/ramdisk/victims/v1/v2
> 6408 20:47 \_ trinity -q -N 10000 -C 2 -x move_pages -x mremap -V /mnt/ramdisk/victims/v1/v2
> 6410 20:48 \_ trinity -q -N 10000 -C 2 -x move_pages -x mremap -V /mnt/ramdisk/victims/v1/v2
>
>
> FWIW, ssh into the UML guest is still not possible. So I'm pretty sure
> that trinity really damages something there, but I'd expect that such
> damage should show up somewhere in the logs, shouldn't it?
>
> And finally - now the batch trinity command hangs again and now not even
> killing logger helps.
> And a shutdown ("sudo halt; exit") hangs too.
>
>
>
>> $ pgrep linux | xargs -n1 -I {} sudo gdb /home/tfoerste/devel/linux/linux {} -n -batch -ex 'bt'
>> warning: process 1613 is already traced by process 25224
>> ptrace: Operation not permitted.
>> /home/tfoerste/21849: No such file or directory.
>> No stack.
>
>> warning: Could not load shared library symbols for linux-gate.so.1.
>> Do you need "set solib-search-path" or "set sysroot"?
>> 0xb7800424 in __kernel_vsyscall ()
>> #0  0xb7800424 in __kernel_vsyscall ()
>> #1  0x083d63ff in __nanosleep_nocancel ()
>> #2  0x0807266c in idle_sleep (nsecs=602496380205307520) at arch/um/os-Linux/time.c:183
>> #3  0x0805fc0f in arch_cpu_idle () at arch/um/kernel/process.c:208
>> #4  0x080a8971 in cpu_idle_loop () at kernel/cpu/idle.c:98
>> #5  cpu_startup_entry (state=CPUHP_ONLINE) at kernel/cpu/idle.c:140
>> #6  0x084215e9 in rest_init () at init/main.c:402
>> #7  0x080487e1 in start_kernel () at init/main.c:656
>> #8  0x08049e42 in start_kernel_proc (unused=0x0) at arch/um/kernel/skas/process.c:48
>> #9  0x0805f7cb in new_thread_handler () at arch/um/kernel/process.c:129
>> #10 0x00000000 in ?? ()
>
>> warning: process 25231 is a cloned process
>
>> warning: Could not load shared library symbols for linux-gate.so.1.
>> Do you need "set solib-search-path" or "set sysroot"?
>> 0xb7800424 in __kernel_vsyscall ()
>> #0  0xb7800424 in __kernel_vsyscall ()
>> #1  0x083da446 in syscall ()
>> #2  0x0806e861 in io_getevents (events=<optimized out>, ctx_id=<optimized out>, min_nr=<optimized out>, nr=<optimized out>, timeout=<optimized out>) at arch/um/os-Linux/aio.c:49
>> #3  aio_thread (arg=0x0) at arch/um/os-Linux/aio.c:109
>> #4  0x083db56e in clone ()
>
>> warning: process 25232 is a cloned process
>
>> warning: Could not load shared library symbols for linux-gate.so.1.
>> Do you need "set solib-search-path" or "set sysroot"?
>> 0xb7800424 in __kernel_vsyscall ()
>> #0  0xb7800424 in __kernel_vsyscall ()
>> #1  0x083d82c2 in __read_nocancel ()
>> #2  0x0806f3ff in read (__nbytes=<optimized out>, __buf=<optimized out>, __fd=<optimized out>) at /usr/include/bits/unistd.h:44
>> #3  os_read_file (fd=-512, buf=0xfffffe00, len=-512) at arch/um/os-Linux/file.c:253
>> #4  0x0806bafc in io_thread (arg=0x0) at arch/um/drivers/ubd_kern.c:1482
>> #5  0x083db56e in clone ()
>
>> warning: process 25233 is a cloned process
>
>> warning: Could not load shared library symbols for linux-gate.so.1.
>> Do you need "set solib-search-path" or "set sysroot"?
>> 0xb7800424 in __kernel_vsyscall ()
>> #0  0xb7800424 in __kernel_vsyscall ()
>> #1  0x083d9132 in __poll_nocancel ()
>> #2  0x08071114 in poll (__timeout=<optimized out>, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
>> #3  write_sigio_thread (unused=0x0) at arch/um/os-Linux/sigio.c:61
>> #4  0x083db56e in clone ()
>> warning: process 25234 is a zombie - the process has already terminated
>> ptrace: Operation not permitted.
>> /home/tfoerste/25234: No such file or directory.
>> No stack.
>> ...
>
>
>> Please Cc: me I'm not subscribed.
>
>>> Wouldn't it make sense to subscribe?
>>> You post very often on this list. :)
>
> done ;)
>
>
>
>>> _______________________________________________
>>> User-mode-linux-devel mailing list
>>> User-mode-linux-devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
>
A funny variant of the problem is that with the latest Linus tree I now get
pid = 0; everything else is unchanged:

$ date; sudo gdb /home/tfoerste/devel/linux/linux 18483 -n -batch -ex 'bt full'
Sat Feb 15 16:41:42 CET 2014

warning: Could not load shared library symbols for linux-gate.so.1.
Do you need "set solib-search-path" or "set sysroot"?
0xb7792424 in __kernel_vsyscall ()
#0  0xb7792424 in __kernel_vsyscall ()
No symbol table info available.
#1  0x083ded0f in __nanosleep_nocancel ()
No symbol table info available.
#2  0x0807269c in idle_sleep (nsecs=602637203593008768) at arch/um/os-Linux/time.c:183
        ts = {tv_sec = 0, tv_nsec = 10000000}
#3  0x0805fc2f in arch_cpu_idle () at arch/um/kernel/process.c:208
No locals.
#4  0x080a99c1 in cpu_idle_loop () at kernel/cpu/idle.c:98
No locals.
#5  cpu_startup_entry (state=CPUHP_ONLINE) at kernel/cpu/idle.c:143
No locals.
#6  0x08429ec2 in rest_init () at init/main.c:397
        pid = 0
#7  0x080487e9 in start_kernel () at init/main.c:652
        command_line = 0x85c2420 <command_line> "earlyprintk ubda=/home/tfoerste/virtual/uml/trinity ubdb=/mnt/ramdisk/trinity_swap eth0=tuntap,tap0,72:ef:3d:9f:c3:5a mem=1025M con0=fd:0,fd:1 con=pts rootfstype=ext4 root=98:0"
#8  0x08049e19 in start_kernel_proc (unused=0x0) at arch/um/kernel/skas/process.c:46
        pid = 0
#9  0x0805f7eb in new_thread_handler () at arch/um/kernel/process.c:129
        fn = 0x0
#10 0x00000000 in ?? ()
No symbol table info available.
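For anyone who wants to compare such 'bt full' dumps mechanically rather than by eye, here is a rough sketch in Python. It is only an illustration: the names parse_bt_full and diff_locals are made up for this example, the regular expressions cover just the frame and local-variable lines that appear in the traces of this thread, and the two sample strings are short excerpts of the Jan 11 and Feb 15 output quoted here.

```python
import re

# Frame header lines as printed by gdb in this thread, for example
#   "#6  0x08429ec2 in rest_init () at init/main.c:397"
# or, without an address,
#   "#5  cpu_startup_entry (state=CPUHP_ONLINE) at kernel/cpu/idle.c:143"
FRAME_RE = re.compile(
    r"^#(?P<num>\d+)\s+(?:0x[0-9a-f]+ in )?(?P<func>\S+) \((?P<args>.*?)\)"
    r"(?: at (?P<loc>\S+))?$"
)
# Local variable lines of 'bt full', for example "        pid = -516"
LOCAL_RE = re.compile(r"^\s*(?P<name>\w+) = (?P<value>.+)$")


def parse_bt_full(text):
    """Return a list of frames (dicts with num, func, loc, locals)."""
    frames = []
    for line in text.splitlines():
        line = line.rstrip()
        m = FRAME_RE.match(line)
        if m:
            frames.append({"num": int(m["num"]), "func": m["func"],
                           "loc": m["loc"], "locals": {}})
            continue
        m = LOCAL_RE.match(line)
        if m and frames:
            # Attach the local to the most recently seen frame.
            frames[-1]["locals"][m["name"]] = m["value"]
    return frames


def diff_locals(old, new):
    """Yield (func, name, old_value, new_value) for locals that changed."""
    old_map = {(f["func"], n): v for f in old for n, v in f["locals"].items()}
    for f in new:
        for n, v in f["locals"].items():
            before = old_map.get((f["func"], n))
            if before is not None and before != v:
                yield f["func"], n, before, v


# Excerpts of frame #6 from the two traces in this thread.
JAN_11 = """\
#6  0x084215e9 in rest_init () at init/main.c:402
        pid = -516
        __func__ = "rest_init"
"""
FEB_15 = """\
#6  0x08429ec2 in rest_init () at init/main.c:397
        pid = 0
"""

for func, name, before, after in diff_locals(parse_bt_full(JAN_11),
                                             parse_bt_full(FEB_15)):
    print(f"{func}: {name} {before} -> {after}")
```

Fed with the complete dumps instead of the excerpts, the same loop would list every local whose value differs between the two kernels, keyed by function name, so shifted line numbers do not matter.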
- --
MfG/Sincerely
Toralf Förster
pgp finger print:1A37 6F99 4A9D 026F 13E2 4DCF C4EA CDDE 0076 E94E
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iF4EAREIAAYFAlL/i1MACgkQxOrN3gB26U6HCwD/WRTDhGO38eNIMaZla2RPLCcW
AVbaR7p7PLtFHP/I7AsA/Rzz9ASZyvxpx+TufWWl/3xKkv7fFs/Z6/laEseKhVpM
=v42x
-----END PGP SIGNATURE-----