I have two OpenVZ servers, both of which tend to hang 'in the morning'. I've seen the problem with both the SUSE kernel vmlinux-2.6.16.21-2.2-smp (yesterday) and the stable vmlinux-2.6.9-023stab040.1 (today).

This time I had a few top sessions open, which reported a load of over 80. I can open an SSH connection to the system, but both local and remote logins hang. Interestingly, the VZs running on the machine still work: I can run commands in them and they report no uptime.

I run the VZs on reiserfs/ext3 partitions mounted over AoE. I have a feeling the kernel might actually be hanging on NFS (I use NFS to share configuration and administrative files for OpenVZ, but not for the VZs themselves; running VZs on NFS mounts didn't work), but restarting the NFS server doesn't help. I rebooted one of the hanging servers, and it could access NFS just fine afterwards, so the NFS server itself seems to be up.

syslog still worked, and I grabbed the following call stacks using SysRq. I noticed a lot of cron processes hanging with this trace:

Feb 23 11:31:36 web2 kernel: cron S 0000807940a0 000001011ae0c050 0 3018 6456 3022 3019 3014 (NOTLB)
Feb 23 11:31:36 web2 kernel: 0000010119c23df8 0000000000000006 000001013f674f00 ffffffffa012706b
Feb 23 11:31:36 web2 kernel: 0000000000000000 ffffffff8017c62b ffffffff8054bc80 0000000000000000
Feb 23 11:31:36 web2 kernel:  000001011ae0c050 0000807940a0edd0
Feb 23 11:31:36 web2 kernel: Call Trace: [<ffffffffa012706b>] :simfs:sim_systemcall+0x6b/0x280
Feb 23 11:31:36 web2 kernel:  [<ffffffff8017c62b>] do_wp_page+0x44b/0x4c0
Feb 23 11:31:36 web2 kernel:  [<ffffffff8019ceb0>] pipe_wait+0xa0/0xf0
Feb 23 11:31:36 web2 kernel: [<ffffffff8013b8a0>] autoremove_wake_function+0x0/0x30

....

None of the VZs should be running cron as far as I know, so this should be the cron of the underlying system. I'm not sure whether it should even be in a simfs function.

I think these are the cron jobs that invoke vpsnetclean and vpsreboot (which also appear many times in the process list), so this probably explains the load of over 80.
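On Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones, so processes blocked on an unresponsive NFS mount push the load up even when the CPU is idle. A minimal sketch, assuming a Linux /proc filesystem, that tallies which commands are currently in state D; it only reads /proc/<pid>/stat, so it should not itself block on NFS. The function names are hypothetical.

```python
import os
from collections import Counter
from typing import Tuple


def parse_stat(line: str) -> Tuple[str, str]:
    """Return (comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the *last* closing parenthesis.
    """
    open_idx = line.index("(")
    close_idx = line.rindex(")")
    comm = line[open_idx + 1 : close_idx]
    state = line[close_idx + 2]  # the field after ") " is the one-letter state
    return comm, state


def count_uninterruptible(proc_root: str = "/proc") -> Counter:
    """Count processes in state 'D' (uninterruptible sleep), keyed by comm."""
    counts: Counter = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            # The process exited between listdir() and open(), or the
            # file had unexpected content.
            continue
        if state == "D":
            counts[comm] += 1
    return counts


if __name__ == "__main__":
    for comm, n in count_uninterruptible().most_common():
        print(f"{n:5d}  {comm}")
```

If the output is dominated by vpsreboot and vpsnetclean, that would fit the NFS theory; the cron processes in the trace above are in state S, so they should not be contributing to the load themselves.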

The stack trace of vpsreboot:
Feb 23 11:31:44 web2 kernel: vpsreboot D 00008ad76e6a 0000010117e6e3d0 0 4316 4315 (NOTLB)
Feb 23 11:31:44 web2 kernel: 0000010117db9928 0000000000000006 0000000000000003 ffffffff8016f624
Feb 23 11:31:44 web2 kernel: 000001000000f380 0000000000000202 ffffffff8054bc80 0000000000000000
Feb 23 11:31:44 web2 kernel:  0000010117e6e3d0 00008ad76e6abd1c
Feb 23 11:31:44 web2 kernel: Call Trace: [<ffffffff8016f624>] __alloc_collect_stats+0x54/0xc0
Feb 23 11:31:44 web2 kernel: [<ffffffffa00b2ec1>] :sunrpc:rpc_sleep_on+0x41/0x70
Feb 23 11:31:44 web2 kernel: [<ffffffffa00b3bd0>] :sunrpc:__rpc_execute+0x1f0/0x3c0
Feb 23 11:31:44 web2 kernel: [<ffffffff8013b8a0>] autoremove_wake_function+0x0/0x30
Feb 23 11:31:44 web2 kernel: [<ffffffffa00b36c7>] :sunrpc:rpc_init_task+0x157/0x1f0
Feb 23 11:31:44 web2 kernel: [<ffffffff8013b8a0>] autoremove_wake_function+0x0/0x30
Feb 23 11:31:44 web2 kernel: [<ffffffffa00ae8d2>] :sunrpc:rpc_call_sync+0x82/0xc0
Feb 23 11:31:44 web2 kernel: [<ffffffffa00fa41e>] :nfs:nfs3_rpc_wrapper+0x2e/0x90
Feb 23 11:31:44 web2 kernel: [<ffffffffa00fabe9>] :nfs:nfs3_proc_access+0x109/0x180

and vpsnetclean:
Feb 23 11:31:44 web2 kernel: vpsnetclean D 00008ad76e6a 0000010117e5ccf0 0 4318 4317 (NOTLB)
Feb 23 11:31:44 web2 kernel: 0000010117ed5928 0000000000000006 00000101312e67a8 ffffffff8016f624
Feb 23 11:31:44 web2 kernel: 000002000000f380 0000000000000001 ffffffff8054bc80 0000000000000000
Feb 23 11:31:44 web2 kernel:  0000010117e5ccf0 00008ad76e6a901c
Feb 23 11:31:44 web2 kernel: Call Trace: [<ffffffff8016f624>] __alloc_collect_stats+0x54/0xc0
Feb 23 11:31:44 web2 kernel: [<ffffffffa00b2ec1>] :sunrpc:rpc_sleep_on+0x41/0x70
Feb 23 11:31:44 web2 kernel: [<ffffffffa00b3bd0>] :sunrpc:__rpc_execute+0x1f0/0x3c0
Feb 23 11:31:44 web2 kernel: [<ffffffff8013b8a0>] autoremove_wake_function+0x0/0x30
Feb 23 11:31:44 web2 kernel: [<ffffffffa00b36c7>] :sunrpc:rpc_init_task+0x157/0x1f0
Feb 23 11:31:44 web2 kernel: [<ffffffff8013b8a0>] autoremove_wake_function+0x0/0x30
Feb 23 11:31:44 web2 kernel: [<ffffffffa00ae8d2>] :sunrpc:rpc_call_sync+0x82/0xc0
Feb 23 11:31:44 web2 kernel: [<ffffffffa00fa41e>] :nfs:nfs3_rpc_wrapper+0x2e/0x90
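With dozens of hung tasks, reading a full SysRq-T dump by hand is tedious. A rough sketch, assuming the syslog layout shown in the traces above (a task header line of the form "comm state hex hex n pid ...", followed by lines containing "[<address>] symbol+off/len" frames), that groups frames per task and reports which kernel modules each stack passes through. The header regex is an assumption based only on these 2.6-era logs; other kernels print the header differently. All names here are hypothetical.

```python
import re
import sys
from dataclasses import dataclass, field
from typing import Iterable, List, Set

# Assumed task header, e.g.
#   "vpsreboot D 00008ad76e6a 0000010117e6e3d0 0 4316 4315 (NOTLB)"
# Groups: command name, one-letter state, PID (the sixth field).
HEADER_RE = re.compile(r"^(\S+)\s+([RSDTZX])\s+[0-9a-f]+\s+[0-9a-f]+\s+\d+\s+(\d+)\b")
# One call-trace frame, e.g. "[<ffffffffa00b2ec1>] :sunrpc:rpc_sleep_on+0x41/0x70"
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\S+)")


@dataclass
class Task:
    comm: str
    state: str
    pid: int
    frames: List[str] = field(default_factory=list)

    def modules(self) -> Set[str]:
        """Module names taken from frames like ':sunrpc:rpc_sleep_on+0x41/0x70'."""
        return {f.split(":")[1] for f in self.frames if f.startswith(":")}


def parse_dump(lines: Iterable[str]) -> List[Task]:
    """Group call-trace frames under the task header that precedes them."""
    tasks: List[Task] = []
    for raw in lines:
        # Drop the "Feb 23 11:31:44 web2 kernel: " syslog prefix, if present.
        text = raw.split("kernel: ", 1)[-1].rstrip()
        m = HEADER_RE.match(text)
        if m:
            tasks.append(Task(m.group(1), m.group(2), int(m.group(3))))
        elif tasks:
            tasks[-1].frames.extend(FRAME_RE.findall(text))
    return tasks


if __name__ == "__main__":
    # Print tasks that are in state D or whose stack touches NFS/RPC code.
    for t in parse_dump(sys.stdin):
        mods = t.modules()
        if t.state == "D" or mods & {"nfs", "sunrpc"}:
            print(t.pid, t.state, t.comm, ",".join(sorted(mods)) or "-")
```

Feeding it the syslog file on stdin would list, for example, vpsreboot and vpsnetclean with "nfs,sunrpc", which makes it easy to see whether every blocked task is waiting on an NFS RPC.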

Any ideas on how I can investigate this further? Could putting /etc/vz and /etc/sysconfig/vz-scripts on NFS be the source of the problems?


_______________________________________________
Users mailing list
[email protected]
https://openvz.org/mailman/listinfo/users
