-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 01/13/2014 08:54 PM, Toralf Förster wrote:
> On 01/13/2014 12:21 AM, Richard Weinberger wrote:
>> On Sat, Jan 11, 2014 at 11:47 AM, Toralf Förster <toralf.foers...@gmx.de> 
>> wrote:
>> I do fuzz testing with trinity (latest git version) on a stable 32 bit Gentoo 
>> Linux user mode linux image.
>> The host is a stable 32 bit vanilla 3.12.7 kernel, the guest runs latest git 
>> tree + 2 patches (attached).
> 
>> The trinity call in the UML guest is :
>> $> trinity -q -l off -N 10000 -C 2 -x move_pages -x mremap -v /mnt/ramdisk
> 
>> After a while no more progress is seen on the command line at the host 
>> system - the trinity process seems to just hang/idle. When this occurs 
>> I can no longer ssh into the system. The system however keeps running. 
>> In another terminal I still see the output of this command:
> 
>>> Does it consume 100% CPU?
> 
> No.
> It just doesn't allow new ssh connections. Existing ssh connections are still 
> working.
> 
>> $> ssh root@trinity "tail -f /var/log/messages"
> 
>> That's why I do know that the system does not hang completely. The output of 
>> top at the host system gives me the pid of the linux exe. A gdb call gives 
>> for that pid :
> 
>> $ date; sudo gdb /home/tfoerste/devel/linux/linux 25224 -n -batch -ex 'bt 
>> full'
>> Sat Jan 11 11:36:47 CET 2014
> 
>> warning: Could not load shared library symbols for linux-gate.so.1.
>> Do you need "set solib-search-path" or "set sysroot"?
>> 0xb7800424 in __kernel_vsyscall ()
>> #0  0xb7800424 in __kernel_vsyscall ()
>> No symbol table info available.
>> #1  0x083d63ff in __nanosleep_nocancel ()
>> No symbol table info available.
>> #2  0x0807266c in idle_sleep (nsecs=602496380195307520) at 
>> arch/um/os-Linux/time.c:183
>>         ts = {tv_sec = 0, tv_nsec = 8436602}
>> #3  0x0805fc0f in arch_cpu_idle () at arch/um/kernel/process.c:208
>> No locals.
>> #4  0x080a8971 in cpu_idle_loop () at kernel/cpu/idle.c:98
>> No locals.
>> #5  cpu_startup_entry (state=CPUHP_ONLINE) at kernel/cpu/idle.c:140
>> No locals.
>> #6  0x084215e9 in rest_init () at init/main.c:402
>>         pid = -516
>>         __func__ = "rest_init"
>> #7  0x080487e1 in start_kernel () at init/main.c:656
>>         command_line = 0x85b8400 <command_line> "earlyprintk 
>> ubda=/home/tfoerste/virtual/uml/trinity ubdb=/mnt/ramdisk/trinity_swap 
>> eth0=tuntap,tap0,72:ef:3d:9f:c3:5a mem=1025M con0=fd:0,fd:1 con=pts 
>> rootfstype=ext4  root=98:0"
>> #8  0x08049e42 in start_kernel_proc (unused=0x0) at 
>> arch/um/kernel/skas/process.c:48
>>         pid = -516
>>         __func__ = "start_kernel_proc"
>> #9  0x0805f7cb in new_thread_handler () at arch/um/kernel/process.c:129
>>         fn = 0x0
>> #10 0x00000000 in ?? ()
>> No symbol table info available.
> 
> 
> 
>> Please note that BUG_ON was not triggered. For completeness here are the gdb 
>> traces from all linux processes currently running at the host:
> 
>>> So let's forget the 516 issue for now.
>>> What we know for now is that you manage to trigger a lockup within UML.
> 
> Agreed, especially b/c I added this patch too :
> $ cat ~/devel/priv/uml/pid516_2.patch
> --- init/main.c_orig    2014-01-12 16:43:48.585439158 +0100
> +++ init/main.c 2014-01-12 16:44:01.706438453 +0100
> @@ -389,6 +389,7 @@
>         BUG_ON(pid == -516);
>         rcu_read_lock();
>         kthreadd_task = find_task_by_pid_ns(pid, &init_pid_ns);
> +       BUG_ON(pid == -516);
>         rcu_read_unlock();
>         complete(&kthreadd_done);
> 
> and this wasn't triggered (/me wonders if the -516 is somehow garbage).
> 
> But I can narrow down the problem. In a still open ssh session I ran :
> 
> $ lsof | grep t3
> bash      6129      tfoerste  cwd       DIR       98,0     4096    734 
> /home/tfoerste/t3
> logger    6135      tfoerste  cwd       DIR       98,0     4096    734 
> /home/tfoerste/t3
> 
> (t3 is the ~/t3 directory that I cd into before I run trinity.)
> 
> And after killing the logger command the trinity batch continues :
> 
> $ ps xf -eo pid,start_time,command | grep trinity
>  6412 20:48  |           \_ grep --colour=auto trinity
>  6129 19:17          \_ bash -c cd ~; sudo su -c 'if [[ -d ./t3 ]]; then sudo 
> chmod -R a+rwx ./t3; sudo rm -rf ./t3; fi'; mkdir ./t3; cd ./t3; logger 
> "17#-1, M=/mnt/ramdisk"; if [[ -n /mnt/ramdisk ]]; then if [[ -d 
> /mnt/ramdisk/victims/v1 ]]; then sudo chmod -R a+rwx /mnt/ramdisk/victims/v1; 
> sudo rm -rf /mnt/ramdisk/victims/v1; fi; mkdir -p /mnt/ramdisk/victims/v1/v2; 
> for i in $(seq -w 0 99); do touch /mnt/ramdisk/victims/v1/v2/f$i 2>/dev/null; 
> mkdir /mnt/ramdisk/victims/v1/v2/d$i 2>/dev/null; done; fi;  trinity -q -N 
> 10000 -C 2 -x move_pages -x mremap -V /mnt/ramdisk/victims/v1/v2
>  6390 20:46              \_ trinity -q -N 10000 -C 2 -x move_pages -x mremap 
> -V /mnt/ramdisk/victims/v1/v2
>  6391 20:46                  \_ trinity -q -N 10000 -C 2 -x move_pages -x 
> mremap -V /mnt/ramdisk/victims/v1/v2
>  6392 20:46                  \_ trinity -q -N 10000 -C 2 -x move_pages -x 
> mremap -V /mnt/ramdisk/victims/v1/v2
>  6408 20:47                      \_ trinity -q -N 10000 -C 2 -x move_pages -x 
> mremap -V /mnt/ramdisk/victims/v1/v2
>  6410 20:48                      \_ trinity -q -N 10000 -C 2 -x move_pages -x 
> mremap -V /mnt/ramdisk/victims/v1/v2
> 
> 
> FWIW ssh into the UML guest is however still no longer possible. So I'm 
> pretty sure that trinity really damages something there, but I'd expect that 
> such damage should be seen somewhere in the logs, or ?
> 
> And finally - now the batch trinity command hangs again and now not even 
> killing logger helps.
> And a shutdown ("sudo halt; exit") hangs too.
> 
> 
> 
>> $ pgrep linux | xargs -n1 -I {} sudo gdb /home/tfoerste/devel/linux/linux {} 
>> -n -batch -ex 'bt'
>> warning: process 1613 is already traced by process 25224
>> ptrace: Operation not permitted.
>> /home/tfoerste/1613: No such file or directory.
>> No stack.
>> warning: process 21849 is already traced by process 25224
>> ptrace: Operation not permitted.
>> /home/tfoerste/21849: No such file or directory.
>> No stack.
> 
>> warning: Could not load shared library symbols for linux-gate.so.1.
>> Do you need "set solib-search-path" or "set sysroot"?
>> 0xb7800424 in __kernel_vsyscall ()
>> #0  0xb7800424 in __kernel_vsyscall ()
>> #1  0x083d63ff in __nanosleep_nocancel ()
>> #2  0x0807266c in idle_sleep (nsecs=602496380205307520) at 
>> arch/um/os-Linux/time.c:183
>> #3  0x0805fc0f in arch_cpu_idle () at arch/um/kernel/process.c:208
>> #4  0x080a8971 in cpu_idle_loop () at kernel/cpu/idle.c:98
>> #5  cpu_startup_entry (state=CPUHP_ONLINE) at kernel/cpu/idle.c:140
>> #6  0x084215e9 in rest_init () at init/main.c:402
>> #7  0x080487e1 in start_kernel () at init/main.c:656
>> #8  0x08049e42 in start_kernel_proc (unused=0x0) at 
>> arch/um/kernel/skas/process.c:48
>> #9  0x0805f7cb in new_thread_handler () at arch/um/kernel/process.c:129
>> #10 0x00000000 in ?? ()
> 
>> warning: process 25231 is a cloned process
> 
>> warning: Could not load shared library symbols for linux-gate.so.1.
>> Do you need "set solib-search-path" or "set sysroot"?
>> 0xb7800424 in __kernel_vsyscall ()
>> #0  0xb7800424 in __kernel_vsyscall ()
>> #1  0x083da446 in syscall ()
>> #2  0x0806e861 in io_getevents (events=<optimized out>, ctx_id=<optimized 
>> out>, min_nr=<optimized out>, nr=<optimized out>, timeout=<optimized out>) 
>> at arch/um/os-Linux/aio.c:49
>> #3  aio_thread (arg=0x0) at arch/um/os-Linux/aio.c:109
>> #4  0x083db56e in clone ()
> 
>> warning: process 25232 is a cloned process
> 
>> warning: Could not load shared library symbols for linux-gate.so.1.
>> Do you need "set solib-search-path" or "set sysroot"?
>> 0xb7800424 in __kernel_vsyscall ()
>> #0  0xb7800424 in __kernel_vsyscall ()
>> #1  0x083d82c2 in __read_nocancel ()
>> #2  0x0806f3ff in read (__nbytes=<optimized out>, __buf=<optimized out>, 
>> __fd=<optimized out>) at /usr/include/bits/unistd.h:44
>> #3  os_read_file (fd=-512, buf=0xfffffe00, len=-512) at 
>> arch/um/os-Linux/file.c:253
>> #4  0x0806bafc in io_thread (arg=0x0) at arch/um/drivers/ubd_kern.c:1482
>> #5  0x083db56e in clone ()
> 
>> warning: process 25233 is a cloned process
> 
>> warning: Could not load shared library symbols for linux-gate.so.1.
>> Do you need "set solib-search-path" or "set sysroot"?
>> 0xb7800424 in __kernel_vsyscall ()
>> #0  0xb7800424 in __kernel_vsyscall ()
>> #1  0x083d9132 in __poll_nocancel ()
>> #2  0x08071114 in poll (__timeout=<optimized out>, __nfds=<optimized out>, 
>> __fds=<optimized out>) at /usr/include/bits/poll2.h:46
>> #3  write_sigio_thread (unused=0x0) at arch/um/os-Linux/sigio.c:61
>> #4  0x083db56e in clone ()
>> warning: process 25234 is a zombie - the process has already terminated
>> ptrace: Operation not permitted.
>> /home/tfoerste/25234: No such file or directory.
>> No stack.
>> ...
> 
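
Side note on the "already traced by process 25224" warnings above: as far as
I know gdb takes that number from the TracerPid field of /proc/<pid>/status.
Below is a minimal sketch, assuming a Linux /proc and POSIX sh plus awk, of
how such pids could be skipped before attaching. The helper names tracer_pid
and is_untraced are made up for illustration.

```shell
# Print the TracerPid value from a /proc/<pid>/status style file.
# Prints 0 when no tracer is attached, nothing if the field is missing.
tracer_pid() {
    awk '$1 == "TracerPid:" { print $2; exit }' "$1"
}

# Hypothetical helper: succeed only if no other process traces the given pid.
is_untraced() {
    [ "$(tracer_pid "/proc/$1/status" 2>/dev/null)" = "0" ]
}

# Usage sketch (commented out, it needs root and a running UML instance):
# for pid in $(pgrep linux); do
#     is_untraced "$pid" &&
#         sudo gdb /home/tfoerste/devel/linux/linux "$pid" -n -batch -ex 'bt'
# done
```

This only filters the pids that gdb cannot attach to anyway; it does not
explain the hang itself.
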
> 
>> Please Cc: me I'm not subscribed.
> 
>>> Wouldn't it make sense to subscribe?
>>> You post very often on this list. :)
> 
> done ;)
> 
> 
> 
>>>
>>> _______________________________________________
>>> User-mode-linux-devel mailing list
>>> User-mode-linux-devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
>>>
> 
> 
> 
> 
> 


A funny variant of the problem is that with the latest Linus' tree I now get 
pid = 0 - all others are unchanged :

$ date; sudo gdb /home/tfoerste/devel/linux/linux 18483 -n -batch -ex 'bt full'
Sat Feb 15 16:41:42 CET 2014

warning: Could not load shared library symbols for linux-gate.so.1.
Do you need "set solib-search-path" or "set sysroot"?
0xb7792424 in __kernel_vsyscall ()
#0  0xb7792424 in __kernel_vsyscall ()
No symbol table info available.
#1  0x083ded0f in __nanosleep_nocancel ()
No symbol table info available.
#2  0x0807269c in idle_sleep (nsecs=602637203593008768) at 
arch/um/os-Linux/time.c:183
        ts = {tv_sec = 0, tv_nsec = 10000000}
#3  0x0805fc2f in arch_cpu_idle () at arch/um/kernel/process.c:208
No locals.
#4  0x080a99c1 in cpu_idle_loop () at kernel/cpu/idle.c:98
No locals.
#5  cpu_startup_entry (state=CPUHP_ONLINE) at kernel/cpu/idle.c:143
No locals.
#6  0x08429ec2 in rest_init () at init/main.c:397
        pid = 0
#7  0x080487e9 in start_kernel () at init/main.c:652
        command_line = 0x85c2420 <command_line> "earlyprintk 
ubda=/home/tfoerste/virtual/uml/trinity ubdb=/mnt/ramdisk/trinity_swap 
eth0=tuntap,tap0,72:ef:3d:9f:c3:5a mem=1025M con0=fd:0,fd:1 con=pts 
rootfstype=ext4  root=98:0"
#8  0x08049e19 in start_kernel_proc (unused=0x0) at 
arch/um/kernel/skas/process.c:46
        pid = 0
#9  0x0805f7eb in new_thread_handler () at arch/um/kernel/process.c:129
        fn = 0x0
#10 0x00000000 in ?? ()
No symbol table info available.
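
A side note on the pid = -516 from the earlier trace, which is now 0: the
magnitudes 512..516 coincide with the kernel-internal codes defined in
include/linux/errno.h, so -516 would read as -ERESTART_RESTARTBLOCK. The
fd=-512 / buf=0xfffffe00 shown for os_read_file in the io_thread trace fits
the same pattern (0xfffffe00 is -512 as a signed 32 bit value). Whether gdb
is simply showing leftover register contents there is an assumption, not
something verified here. A small lookup sketch in POSIX sh follows; the
function name is made up:

```shell
# Map a value seen in a backtrace to the kernel-internal code it coincides
# with. The table mirrors include/linux/errno.h; the helper name is
# hypothetical and not part of any tool.
kernel_internal_errno_name() {
    # Strip an optional leading minus sign so -516 and 516 both match.
    case "${1#-}" in
        512) echo ERESTARTSYS ;;
        513) echo ERESTARTNOINTR ;;
        514) echo ERESTARTNOHAND ;;
        515) echo ENOIOCTLCMD ;;
        516) echo ERESTART_RESTARTBLOCK ;;
        *)   echo unknown ;;
    esac
}

kernel_internal_errno_name -516   # prints ERESTART_RESTARTBLOCK
kernel_internal_errno_name -512   # prints ERESTARTSYS
```
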


- -- 
MfG/Sincerely
Toralf Förster
pgp finger print:1A37 6F99 4A9D 026F 13E2 4DCF C4EA CDDE 0076 E94E
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iF4EAREIAAYFAlL/i1MACgkQxOrN3gB26U6HCwD/WRTDhGO38eNIMaZla2RPLCcW
AVbaR7p7PLtFHP/I7AsA/Rzz9ASZyvxpx+TufWWl/3xKkv7fFs/Z6/laEseKhVpM
=v42x
-----END PGP SIGNATURE-----

