Hi all,

I managed to crash one hostnode of our testing cluster while restoring a CT.

Hostnodes:
* 3 hostnodes running Debian 7 amd64 with the OpenVZ kernel
* Kernel: 042stab085.20
* VE_ROOT / VE_PRIVATE are on an NFS mount shared by the nodes

Test-CT:
* Debian 7 from a self-made template
* amd64
* ploop
* runs MySQL server and apache2 web server
* runs scripts to generate load on the MySQL and web servers

For testing purposes, I was online-migrating the test CT between nodes once every 30 seconds. This went fine for a while, but after a few cycles (~20) one of the hostnodes crashed while trying to restore the CT.

This issue is most likely not specific to this kernel version. I got similar crashes with older versions as well, but was too lazy to report them.

I'm aware that migrating a CT every 30 seconds might be considered extreme. However, we experienced similar crashes on production systems during online migration, and those we migrate every few weeks at most. Before using online migration in production again, I'd like to verify that the most extreme situation I can think of is handled gracefully by the kernel.

Here is the part of the syslog I was able to capture at the time of the crash; let me know if further information is needed:

Mar 26 16:17:05 virtuetest3 kernel: [ 1000.279251] ploop46524: p1
Mar 26 16:17:05 virtuetest3 kernel: [ 1000.289409] ploop46524: p1
Mar 26 16:17:05 virtuetest3 kernel: [ 1000.313031] EXT4-fs (ploop46524p1): mounted filesystem with ordered data mode. Opts:
Mar 26 16:17:05 virtuetest3 kernel: [ 1000.314840] EXT4-fs (ploop46524p1): loaded balloon from 12 (0 blocks)
Mar 26 16:17:05 virtuetest3 kernel: [ 1000.383837] lo: Dropping TSO features since no CSUM feature.
Mar 26 16:17:05 virtuetest3 kernel: [ 1000.384787] CT: 54: started
Mar 26 16:17:05 virtuetest3 kernel: [ 1000.399195] device veth54.0 entered promiscuous mode
Mar 26 16:17:05 virtuetest3 kernel: [ 1000.399286] br_206: port 2(veth54.0) entering forwarding state
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660051] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660232] IP: [<ffffffff814adcfe>] inet_csk_reqsk_queue_prune+0x29e/0x2c0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660372] PGD 0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660419] Oops: 0000 [#1] SMP
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660498] last sysfs file: /sys/devices/virtual/block/ploop46524/removable
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660616] CPU 0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660657] Modules linked in: vzethdev vznetdev pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vzrst nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 vzcpt nf_conntrack vziolimit vzmon xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_multiport xt_limit xt_dscp ipt_REJECT ip_tables nfs fscache vzdquota vzdev vzevent ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse nfsd nfs_acl auth_rpcgss lockd sunrpc ipv6 bridge 8021q garp stp llc snd_pcsp radeon iTCO_wdt iTCO_vendor_support snd_pcm ttm snd_page_alloc drm_kms_helper snd_timer lpc_ich i5000_edac drm ioatdma mfd_core edac_core snd i2c_algo_bit i5k_amb i2c_core soundcore serio_raw dca shpchp ext4 jbd2 mbcache sg sd_mod crc_t10dif ata_generic pata_acpi mptsas mptscsih bnx2 ata_piix mptbase scsi_transport_sas [last unloaded: ploop]
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Pid: 0, comm: swapper veid: 0 Not tainted 2.6.32-openvz-042stab085.20-amd64 #1 042stab085_20 IBM IBM eServer BladeCenter HS21 -[7995L3G]-/Server Blade
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] RIP: 0010:[<ffffffff814adcfe>] [<ffffffff814adcfe>] inet_csk_reqsk_queue_prune+0x29e/0x2c0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] RSP: 0018:ffff880028203d50 EFLAGS: 00010202
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] RAX: 0000000000000000 RBX: 00000001000ab4dc RCX: 0000000000000000
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] RDX: ffff88034c05a500 RSI: ffff880362581c80 RDI: ffff880366f2b080
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] RBP: ffff880028203dc0 R08: ffff88002821c320 R09: 0000000000000000
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880366f2b080
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] R13: 000000000001d4c0 R14: ffff880366f2b3c0 R15: ffff880362581c80
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] FS: 0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] CR2: 0000000000000018 CR3: 0000000349579000 CR4: 00000000000007f0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Process swapper (pid: 0, veid: 0, threadinfo ffffffff81a00000, task ffffffff81a8d020)
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Stack:
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] 0000840bffff23fa ffff88034c05a500 00000000000000c8 0000000181c0f7a8
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] <d> 0000009d0000002f 00000000000003e8 ffff88034c05a000 000000058146c918
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] <d> ffffc900035d9000 ffff880366f2b080 ffff880366f2b0c8 ffffffff81aaa180
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Call Trace:
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] <IRQ>
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff814c2757>] tcp_keepalive_timer+0x187/0x2e0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff81089b7c>] run_timer_softirq+0x1bc/0x380
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff814c25d0>] ? tcp_keepalive_timer+0x0/0x2e0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff8107f3c3>] __do_softirq+0x103/0x260
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff8100c44c>] call_softirq+0x1c/0x30
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff81010195>] do_softirq+0x65/0xa0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff8107f1ed>] irq_exit+0xcd/0xd0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff81539515>] do_IRQ+0x75/0xf0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff8100ba93>] ret_from_intr+0x0/0x11
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] <EOI>
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff81016ce7>] ? mwait_idle+0x77/0xd0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff81535a9a>] ? atomic_notifier_call_chain+0x1a/0x20
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff8100a013>] cpu_idle+0xb3/0x110
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff81514db5>] rest_init+0x85/0x90
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff81c31f80>] start_kernel+0x412/0x41e
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff81c3133a>] x86_64_start_reservations+0x125/0x129
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff81c31453>] x86_64_start_kernel+0x115/0x124
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Code: ff 0f 1f 40 00 4c 89 fa e9 50 fe ff ff 41 f6 47 49 10 74 09 31 c9 e9 6f ff ff ff 66 90 49 8b 47 20 4c 89 fe 48 89 55 98 4c 89 e7 <ff> 50 18 85 c0 48 8b 55 98 0f 84 68 fe ff ff 41 f6 47 49 10 0f
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] RIP [<ffffffff814adcfe>] inet_csk_reqsk_queue_prune+0x29e/0x2c0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] RSP <ffff880028203d50>
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] CR2: 0000000000000018
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Tainting kernel with flag 0x7
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Pid: 0, comm: swapper veid: 0 Not tainted 2.6.32-openvz-042stab085.20-amd64 #1
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Call Trace:
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] <IRQ> [<ffffffff81075e65>] ? add_taint+0x35/0x70
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff815339b4>] ? oops_end+0x54/0x100
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff8104af5b>] ? no_context+0xfb/0x260
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff8104b1d5>] ? __bad_area_nosemaphore+0x115/0x1e0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffffa034fff8>] ? br_nf_pre_routing_finish+0x238/0x350 [bridge]
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff8104b2b3>] ? bad_area_nosemaphore+0x13/0x20
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff8104ba02>] ? __do_page_fault+0x322/0x490
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffffa03505ba>] ? br_nf_pre_routing+0x4aa/0x7e0 [bridge]
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff81497ab9>] ? nf_iterate+0x69/0xb0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffffa0349e00>] ? br_handle_frame_finish+0x0/0x320 [bridge]
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff81497c76>] ? nf_hook_slow+0x76/0x120
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffffa0349e00>] ? br_handle_frame_finish+0x0/0x320 [bridge]
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff8153597e>] ? do_page_fault+0x3e/0xa0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff81532d05>] ? page_fault+0x25/0x30
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff814adcfe>] ? inet_csk_reqsk_queue_prune+0x29e/0x2c0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff814c2757>] ? tcp_keepalive_timer+0x187/0x2e0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff81089b7c>] ? run_timer_softirq+0x1bc/0x380
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff814c25d0>] ? tcp_keepalive_tim

_______________________________________________
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users