Hi Sluggers. I have an intermittent problem: the process "events/0" (usually PID 11 or thereabouts) hangs, putting one of the CPUs at 100% usage while it waits for something (hence my need to force a core dump).
I have tested the system running Linux without the applications I normally want running (kaffeine-dtv, myth backend via 4 DTV capture ports, etc.). Running continuous kernel builds with the "-j4" directive and concurrent database builds/rebuilds over 48 hours showed not even the slightest problem anywhere. Under this load I exercised swapping, all cores, all available memory and disk I/O (I have ext3 and XFS partitions), and wget downloads of DVDs helped push the system workload to its limit.

System:
  2.6.22.17-0.1-default #1 SMP x86_64 GNU/Linux
  OpenSuSE 10.3
  AMD Phenom CPU, 4GB memory
  3ware 9650SE-8LPML (224MB), RAID5, 1.8TB

Rather than bore people, further system details are available upon request.

I have set up ulimit correctly and tried everything I know (which ain't much, people ;-). The usual "kill # pid" doesn't affect anything at all; I suspect the offending task is 100% in a wait loop and isn't letting any signal in. I have been able to renice it to a lower priority, and I have been through procfs looking for whatever may help point to the problem. Nothing obvious.

From "top", the following stats:
  Pid=11 user=root PR=17 NI=2
  Virt=Res=Shr (memory) = 0
  %CPU=100 CPU=0 (1 of four) %mem=0
  WCHAN=stext, Command="events/0"

<< from "cat /proc/11/status" >>
  Name: events/0
  State: R (running)
  SleepAVG: 98%
  Tgid: 11
  Pid: 11
  PPid: 2
  TracerPid: 0
  Uid: 0 0 0 0
  Gid: 0 0 0 0
  FDSize: 64
  Groups:
  Threads: 1
  SigQ: 1/71680
  SigPnd: 0000000000000000
  ShdPnd: 0000000000000000
  SigBlk: 0000000000000000
  SigIgn: ffffffffffffffff
  SigCgt: 0000000000000000
  CapInh: 0000000000000000
  CapPrm: 00000000ffffffff
  CapEff: 00000000fffffeff
  Cpus_allowed: 00000000,00000000,00000000,00000001
  Mems_allowed: 00000000,00000001

I can't attach to the offending pid with gdb, nor can I force a core dump, although lots of task data has been collected and saved. So, apart from embedding some of my own source code into the part of the kernel that handles events, does anyone have any further suggestions? Thanks.
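For anyone wanting to read the masks in the status dump above, here is a minimal Python sketch that parses "Key: value" lines of that shape and decodes the SigIgn and Cpus_allowed fields. The function names and the SAMPLE text are purely illustrative, and the kernel-thread check is only a heuristic (no VmSize line, parent PID 0 or 2), not an official test. If the heuristic holds, it would be consistent with the task ignoring every signal and having no user-space memory for gdb or a core dump to work with.

```python
import signal

# Illustrative sample: a subset of the status fields quoted in the post.
SAMPLE = """\
Name: events/0
State: R (running)
Pid: 11
PPid: 2
SigBlk: 0000000000000000
SigIgn: ffffffffffffffff
SigCgt: 0000000000000000
Cpus_allowed: 00000000,00000000,00000000,00000001
"""


def parse_status(text):
    """Parse 'Key: value' lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def decode_sigmask(hex_mask):
    """Return signal numbers set in a hex mask (bit n-1 means signal n)."""
    mask = int(hex_mask, 16)
    return [n for n in range(1, 65) if mask & (1 << (n - 1))]


def decode_cpus_allowed(mask_text):
    """Return CPU indices set in a comma-separated hex bitmap.

    The words are listed most significant first, so joining them
    gives one big hex number whose bit i stands for CPU i.
    """
    mask = int(mask_text.replace(",", ""), 16)
    return [i for i in range(mask.bit_length()) if mask & (1 << i)]


def looks_like_kernel_thread(fields):
    """Heuristic only (an assumption): no VmSize line, parent PID 0 or 2."""
    return "VmSize" not in fields and fields.get("PPid") in ("0", "2")


if __name__ == "__main__":
    fields = parse_status(SAMPLE)
    ignored = decode_sigmask(fields["SigIgn"])
    print("task:", fields["Name"])
    print("ignored signals:", len(ignored))
    print("SIGKILL bit set in SigIgn:", signal.SIGKILL in ignored)
    print("allowed CPUs:", decode_cpus_allowed(fields["Cpus_allowed"]))
    print("kernel thread (heuristic):", looks_like_kernel_thread(fields))
```

On a live system the same functions could be fed the contents of /proc/<pid>/status read from disk instead of the SAMPLE string.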
Grahame

--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
