We have a similar situation: our system hangs several times a day, and I still can't figure out exactly what is going wrong. On one node of our system (the one running Apache plus a web service written in Ruby (Mongrel/Camping)), the system load keeps rising until the node stops responding to anything. At that time the process list also shows a lot of processes in D state. The weird thing is that we just discovered that rebooting *another* node (we have 4 in total) fixes the situation: the system load on the node that had the problem suddenly returns to a normal level, and the processes that were in D state return to their normal states. Any idea why rebooting another node fixes this? And what might be the cause?
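In case it helps anyone capture which processes are stuck while the load is climbing, here is a minimal sketch in Python that lists processes in D (uninterruptible sleep) state by reading `/proc/<pid>/stat`. The function names are my own, not from any tool, and it assumes a Linux `/proc` layout; it only reports the state and says nothing about which lock or I/O the process is waiting on.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    # The state letter is the first field after the closing parenthesis.
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """Return a list of (pid, comm) for processes currently in D state."""
    found = []
    if not os.path.isdir(proc_root):
        return found
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the file could not be parsed
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

Running this from cron or a loop on the affected node while the load rises would at least show whether the same processes are stuck each time.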
We are running:

Linux test01 2.6.22-14-server #1 SMP Thu Jan 31 23:57:25 UTC 2008 x86_64 GNU/Linux

[ 77.688875] OCFS2 Node Manager 1.3.3
[ 77.703166] OCFS2 DLM 1.3.3
[ 77.710731] OCFS2 DLMFS 1.3.3
[ 77.710816] OCFS2 User DLM kernel interface loaded
[ 85.870956] OCFS2 1.3.3

Kind regards,
Erik.

> Hello,
>
> yes.. when this situation happens there is always a process spinning
> (running at 100% CPU). We can't kill it even with kill -9
