Hi Sluggers.

I have an intermittent problem with the process "events/0" (usually 
PID 11 or thereabouts): it hangs, pinning one of the CPUs at 100% 
usage while waiting for something (hence my need to force a core dump).

I have tested the system running Linux without the applications I 
normally want running (kaffeine-dtv, myth backend via 4 dtv capture 
ports, etc.), just running continuous kernel builds with the "-j4" 
option and concurrent database builds/rebuilds. Over 48 hours this 
showed not even the slightest problem anywhere. The test exercises 
swapping, all cores, all available memory and disk I/O (I have ext3 & 
XFS partitions), and wget downloads of DVDs helped push the system 
workload to its limit.

2.6.22.17-0.1-default #1 SMP x86_64 GNU/Linux OpenSuSE-10.3
AMD Phenom CPU, 4GB memory, 3ware-9650SE-8LPML(224MB) RAID5 - 1.8TB
Rather than bore people - system details available upon request.

I have set up ulimit correctly and have tried everything I know 
(which ain't much, people ;-). The usual "kill # pid" doesn't affect 
anything at all; I suspect the offending task is 100% in a wait loop 
and it ain't letting any signal in at all.

I have been able to renice it to a lower priority, and I have been 
through procfs looking for whatever may help point to the problem.

Nothing obvious. From "top", the following stats:
Pid=11 user=root PR=17 NI=2 Virt=Res=Shr (memory) = 0
%CPU=100 CPU=0 (1 of four) %mem = 0
WCHAN=stext, Command = "events/0"

<< from "cat /proc/11/status" >>
Name:   events/0
State:  R (running)
SleepAVG:       98%
Tgid:   11
Pid:    11
PPid:   2
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 64
Groups:
Threads:        1
SigQ:   1/71680
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: ffffffffffffffff
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 00000000ffffffff
CapEff: 00000000fffffeff
Cpus_allowed:   00000000,00000000,00000000,00000001
Mems_allowed:   00000000,00000001
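
For anyone wanting to read the mask fields in the listing above, here is a minimal Python sketch. It assumes the usual procfs layout, where SigIgn is a hex bitmask with bit 0 standing for signal 1, and Cpus_allowed is a list of comma-separated 32-bit hex words with the most significant word first. The helper names (parse_status, signals_in_mask, cpus_in_mask) are made up for illustration, and the sample text is just a few lines copied from the dump.

```python
# Sketch only: decode hex mask fields from a /proc/<pid>/status dump.
# Helper names are hypothetical; SAMPLE is copied from the listing above.

SAMPLE = """\
State:  R (running)
SigBlk: 0000000000000000
SigIgn: ffffffffffffffff
SigCgt: 0000000000000000
Cpus_allowed:   00000000,00000000,00000000,00000001
"""


def parse_status(text):
    """Return a dict mapping each 'Key:' to its stripped value string."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def signals_in_mask(hex_mask):
    """Signal numbers whose bit is set (bit 0 corresponds to signal 1)."""
    mask = int(hex_mask, 16)
    return [n + 1 for n in range(mask.bit_length()) if (mask >> n) & 1]


def cpus_in_mask(hex_mask):
    """CPU numbers whose bit is set (commas separate 32-bit words)."""
    mask = int(hex_mask.replace(",", ""), 16)
    return [n for n in range(mask.bit_length()) if (mask >> n) & 1]


fields = parse_status(SAMPLE)
ignored = signals_in_mask(fields["SigIgn"])
print("SIGTERM (15) bit set in SigIgn:", 15 in ignored)
print("SIGKILL (9) bit set in SigIgn:", 9 in ignored)
print("allowed CPUs:", cpus_in_mask(fields["Cpus_allowed"]))
```

Read this way, every signal bit is set in SigIgn and the task is bound to CPU 0 only, which would be consistent with "kill" having no visible effect and with a single core sitting at 100%.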


I can't attach to the offending PID with gdb, nor can I force a core 
dump, although I have collected and saved lots of task data.

So, apart from embedding some of my own source code into the kernel 
code that handles events, does anyone have any further suggestions?

Thanks.
Grahame
-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html