On Wed, Oct 2, 2013 at 8:30 PM, Toralf Förster <toralf.foers...@gmx.de> wrote:
> Running trinity (1 process, no victim files, just "$>trinity -C1) for a 
> longer time
> within a 32 bit user mode linux image with a recent git kernel (host: 3.11.3 
> guest 3.12-rc3-g...)
> yields into this konsole message :
>
>  * Starting local
> net.core.warnings = 0                                                         
>                 [ ok ]
> BUG: soft lockup - CPU#0 stuck for 23s! [trinity-child0:2031]
>
>
> and at the host t1 of the "linux"-processes eats all CPU cycles at 1 CPU core.
> 2 subsequent made back traces made with
>
> $> sudo gdb /home/tfoerste/devel/linux/linux 28144 -n -batch -ex bt
>
> shows nearly a similar position around __get_user_pages() - both are attached.
>
> I'm not surprised that trinity harms a systems - I'm just wondering whether 
> this particular picture is
> expected or if it points to an issue.
>
>
> FWIW the last lines of trinity log were :
>
>
> [2031] [94] setsid() = 2031
> [2031] [95] setresgid(rgid=0xffff33e3, egid=0xffffff93, sgid=0x22000040) = -1 
> (Operation not permitted)
> [2031] [96] vmsplice(fd=5, iov=0x85501e0, nr_segs=300, flags=9) = 0x3000
> [2031] [97] setresuid(ruid=0x80549193, euid=0xc61041e0, suid=0xff19b6fa) = -1 
> (Operation not permitted)
> [2031] [98] setpriority(which=0xff010000, who=0xf3737373, niceval=0x8088960c) 
> = -1 (Invalid argument)
> [2031] [99] socketcall(call=1, args=0x8550200) = -1 (Address family not 
> supported by protocol)
> [2031] [100] access(filename="�", mode=2017) = -1 (Invalid argument)
> [2031] [101] getgroups(gidsetsize=0, grouplist=0x80d0000[page_rand]) = 3
> [2031] [102] msync(start=0xc0100220, len=0, flags=3) = -1 (Invalid argument)
> [2031] [103] sigpending(set=0x40025000) = 0
> [2031] [104] signalfd4(ufd=383, user_mask=1, sizemask=0xa4200000, 
> flags=0x80800) = -1 (Invalid argument)
> [2031] [105] sendfile(out_fd=383, in_fd=382, offset=0, count=4096) = -1 
> (Invalid argument)
> [2031] [106] fanotify_mark(fanotify_fd=382, flags=5, mask=0x8000023, dfd=382, 
> pathname="/proc/1092/task/1092/fdinfo/68") = -1 (Invalid argument)
> [2031] [107] wait4(upid=1, stat_addr=4, options=0xd761979b, ru=8) = -1 
> (Invalid argument)
> [2031] [108] sigpending(set=0x80ca000[page_zeros]) = 0
> [2031] [109] setresuid(ruid=0xefffd6fc, euid=0x1bf4c92f, suid=0xffff2e33) = 
> -1 (Operation not permitted)
> [2031] [110] munlock(addr=0x40025000, len=34) = 0
> [2031] [111] timer_delete(timer_id=0xffffffdc) = -1 (Invalid argument)
> [2031] [112] sched_get_priority_max(policy=0x10000040) = -1 (Invalid argument)
> [2031] [113] syslog(type=0xc1000000, buf=1, len=0x82a5) = -1 (Operation not 
> permitted)
> [2031] [114] setpriority(which=0xc4c806c6, who=0xffffff01, 
> niceval=0xffff0682) = -1 (Invalid argument)
> [2031] [115] getgroups16(gidsetsize=0xfffe, grouplist=1) = -1 (Bad address)
> [2031] [116] rename(oldname=4, newname=8) = -1 (Bad address)
> [2031] [117] inotify_init() = 654
> [2031] [118] getgid() = 100
> [2031] [119] fstatat64(dfd=382, 
> filename="/sys/devices/virtual/net/sit0/duplex", statbuf=0, flag=0xb545d727) 
> = -1 (Invalid argument)
> [2031] [120] unlinkat(dfd=382, 
> pathname="/proc/sys/net/ipv4/neigh/default/retrans_time", flag=0xc00ef76) = 
> -1 (Invalid argument)
> [2031] [121] timerfd_create(clockid=0, flags=0) = 655
> [2031] [122] munlock(addr=4, len=0x3fff) = -1 (Cannot allocate memory)
> [2031] [123] fremovexattr(fd=382, name=0) = -1 (Bad address)
> [2031] [124] sched_get_priority_min(policy=0xff58bfef) = -1 (Invalid argument)
> [2031] [125] mq_timedreceive(mqdes=397, u_msg_ptr=4, msg_len=5245, 
> u_msg_prio=0xc0100220, u_abs_timeout=0xc0100220) = -1 (Bad address)
> [2031] [126] chdir(filename="/proc/116/net/ptype") = -1 (Not a directory)
> [2031] [127] ssetmask(newmask=0x88000092) = 0
> [2031] [128] statfs(pathname="/proc/6/mounts", buf=0) = -1 (Bad address)
> [2031] [129] fchown16(fd=397, user=104, group=0x94100000) = -1 (Operation not 
> permitted)
> [2031] [130] fchdir(fd=397) = -1 (Not a directory)
> [2031] [131] mkdir(pathname="/proc/1092/task/1092/fdinfo/316", mode=525) = -1 
> (File exists)
> [2031] [132] fsetxattr(fd=386, name=0x856f158, value=0x8571160, size=0, 
> flags=0) = -1 (Numerical result out of range)
> [2031] [133] io_setup(nr_events=4095, ctxp=0x40266000) ^CKilled by signal 2.

Reading your gdb backtraces show that schedule_timeout() got called
with a negative value.
Looks like an integer overflow.
The soft-lockup might also origin from that (very big integer which
did not overflow jet)

-- 
Thanks,
//richard

------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

Reply via email to