Hello Roman,

Could you file a bug at bugzilla.openvz.org and assign it to me?
Thanks.

On Wed, Mar 26, 2014 at 05:35:55PM +0100, Roman Haefeli wrote:
> Hi all
>
> I happened to be able to crash one hostnode of our testing cluster when
> restoring a CT.
>
> Hostnodes:
> * 3 hostnodes running Debian 7 amd64 with OpenVZ kernel
> * Kernel: 042stab085.20
> * VE_ROOT / VE_PRIVATE is on an NFS mount shared by nodes
>
> Test-CT:
> * Debian 7 from self-made template
> * amd64
> * ploop
> * runs mysql server and apache2 web server
> * runs scripts to cause load on mysql and web server
>
> For testing purposes, I was online-migrating the test-CT between nodes
> once every 30 seconds. This went fine for a while, but after a few
> cycles (~20) one of the hostnodes crashed when trying to restore the CT.
>
> This issue is most likely not specific to the kernel version. I got
> similar crashes with older versions as well, but was too lazy to report
> them.
>
> I'm aware that migrating a CT every 30 seconds might be considered
> extreme, though we experienced similar crashes on production systems at
> the time of online migration and on those we migrate every few weeks at
> most. Before using online migration on production again, I'd like to
> verify that the most extreme situation I can think of is handled
> gracefully by the kernel.
>
> Here is the part of the syslog I was able to catch at the time of the
> crash, let me know if further information is needed:
>
> Mar 26 16:17:05 virtuetest3 kernel: [ 1000.279251] ploop46524: p1
> Mar 26 16:17:05 virtuetest3 kernel: [ 1000.289409] ploop46524: p1
> Mar 26 16:17:05 virtuetest3 kernel: [ 1000.313031] EXT4-fs (ploop46524p1):
> mounted filesystem with ordered data mode. Opts:
> Mar 26 16:17:05 virtuetest3 kernel: [ 1000.314840] EXT4-fs (ploop46524p1):
> loaded balloon from 12 (0 blocks)
> Mar 26 16:17:05 virtuetest3 kernel: [ 1000.383837] lo: Dropping TSO features
> since no CSUM feature.
> Mar 26 16:17:05 virtuetest3 kernel: [ 1000.384787] CT: 54: started
> Mar 26 16:17:05 virtuetest3 kernel: [ 1000.399195] device veth54.0 entered
> promiscuous mode
> Mar 26 16:17:05 virtuetest3 kernel: [ 1000.399286] br_206: port 2(veth54.0)
> entering forwarding state
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660051] BUG: unable to handle
> kernel NULL pointer dereference at 0000000000000018
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660232] IP: [<ffffffff814adcfe>]
> inet_csk_reqsk_queue_prune+0x29e/0x2c0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660372] PGD 0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660419] Oops: 0000 [#1] SMP
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660498] last sysfs file:
> /sys/devices/virtual/block/ploop46524/removable
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660616] CPU 0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660657] Modules linked in:
> vzethdev vznetdev pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vzrst
> nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 vzcpt nf_conntrack vziolimit vzmon
> xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter
> xt_multiport xt_limit xt_dscp ipt_REJECT ip_tables nfs fscache vzdquota vzdev
> vzevent ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp
> libiscsi_tcp libiscsi scsi_transport_iscsi fuse nfsd nfs_acl auth_rpcgss
> lockd sunrpc ipv6 bridge 8021q garp stp llc snd_pcsp radeon iTCO_wdt
> iTCO_vendor_support snd_pcm ttm snd_page_alloc drm_kms_helper snd_timer
> lpc_ich i5000_edac drm ioatdma mfd_core edac_core snd i2c_algo_bit i5k_amb
> i2c_core soundcore serio_raw dca shpchp ext4 jbd2 mbcache sg sd_mod
> crc_t10dif ata_generic pata_acpi mptsas mptscsih bnx2 ata_piix mptbase
> scsi_transport_sas [last unloaded: ploop]
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Pid: 0, comm: swapper
> veid: 0 Not tainted 2.6.32-openvz-042stab085.20-amd64 #1 042stab085_20 IBM
> IBM eServer BladeCenter HS21 -[7995L3G]-/Server Blade
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] RIP:
> 0010:[<ffffffff814adcfe>] [<ffffffff814adcfe>]
> inet_csk_reqsk_queue_prune+0x29e/0x2c0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] RSP: 0018:ffff880028203d50
> EFLAGS: 00010202
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] RAX: 0000000000000000 RBX:
> 00000001000ab4dc RCX: 0000000000000000
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] RDX: ffff88034c05a500 RSI:
> ffff880362581c80 RDI: ffff880366f2b080
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] RBP: ffff880028203dc0 R08:
> ffff88002821c320 R09: 0000000000000000
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] R10: 0000000000000001 R11:
> 0000000000000000 R12: ffff880366f2b080
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] R13: 000000000001d4c0 R14:
> ffff880366f2b3c0 R15: ffff880362581c80
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] FS:
> 0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] CS: 0010 DS: 0018 ES:
> 0018 CR0: 000000008005003b
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] CR2: 0000000000000018 CR3:
> 0000000349579000 CR4: 00000000000007f0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] DR0: 0000000000000000 DR1:
> 0000000000000000 DR2: 0000000000000000
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] DR3: 0000000000000000 DR6:
> 00000000ffff0ff0 DR7: 0000000000000400
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Process swapper (pid: 0,
> veid: 0, threadinfo ffffffff81a00000, task ffffffff81a8d020)
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Stack:
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] 0000840bffff23fa
> ffff88034c05a500 00000000000000c8 0000000181c0f7a8
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] <d> 0000009d0000002f
> 00000000000003e8 ffff88034c05a000 000000058146c918
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] <d> ffffc900035d9000
> ffff880366f2b080 ffff880366f2b0c8 ffffffff81aaa180
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Call Trace:
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] <IRQ>
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff814c2757>]
> tcp_keepalive_timer+0x187/0x2e0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff81089b7c>]
> run_timer_softirq+0x1bc/0x380
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff814c25d0>] ?
> tcp_keepalive_timer+0x0/0x2e0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff8107f3c3>]
> __do_softirq+0x103/0x260
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff8100c44c>]
> call_softirq+0x1c/0x30
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff81010195>]
> do_softirq+0x65/0xa0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff8107f1ed>]
> irq_exit+0xcd/0xd0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff81539515>]
> do_IRQ+0x75/0xf0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff8100ba93>]
> ret_from_intr+0x0/0x11
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] <EOI>
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff81016ce7>] ?
> mwait_idle+0x77/0xd0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff81535a9a>] ?
> atomic_notifier_call_chain+0x1a/0x20
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff8100a013>]
> cpu_idle+0xb3/0x110
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff81514db5>]
> rest_init+0x85/0x90
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff81c31f80>]
> start_kernel+0x412/0x41e
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff81c3133a>]
> x86_64_start_reservations+0x125/0x129
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff81c31453>]
> x86_64_start_kernel+0x115/0x124
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Code: ff 0f 1f 40 00 4c 89
> fa e9 50 fe ff ff 41 f6 47 49 10 74 09 31 c9 e9 6f ff ff ff 66 90 49 8b 47 20
> 4c 89 fe 48 89 55 98 4c 89 e7 <ff> 50 18 85 c0 48 8b 55 98 0f 84 68 fe ff ff
> 41 f6 47 49 10 0f
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] RIP [<ffffffff814adcfe>]
> inet_csk_reqsk_queue_prune+0x29e/0x2c0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] RSP <ffff880028203d50>
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] CR2: 0000000000000018
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Tainting kernel with flag
> 0x7
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Pid: 0, comm: swapper
> veid: 0 Not tainted 2.6.32-openvz-042stab085.20-amd64 #1
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Call Trace:
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] <IRQ>
> [<ffffffff81075e65>] ? add_taint+0x35/0x70
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff815339b4>] ?
> oops_end+0x54/0x100
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff8104af5b>] ?
> no_context+0xfb/0x260
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff8104b1d5>] ?
> __bad_area_nosemaphore+0x115/0x1e0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffffa034fff8>] ?
> br_nf_pre_routing_finish+0x238/0x350 [bridge]
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff8104b2b3>] ?
> bad_area_nosemaphore+0x13/0x20
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff8104ba02>] ?
> __do_page_fault+0x322/0x490
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffffa03505ba>] ?
> br_nf_pre_routing+0x4aa/0x7e0 [bridge]
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff81497ab9>] ?
> nf_iterate+0x69/0xb0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffffa0349e00>] ?
> br_handle_frame_finish+0x0/0x320 [bridge]
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff81497c76>] ?
> nf_hook_slow+0x76/0x120
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffffa0349e00>] ?
> br_handle_frame_finish+0x0/0x320 [bridge]
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff8153597e>] ?
> do_page_fault+0x3e/0xa0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff81532d05>] ?
> page_fault+0x25/0x30
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff814adcfe>] ?
> inet_csk_reqsk_queue_prune+0x29e/0x2c0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff814c2757>] ?
> tcp_keepalive_timer+0x187/0x2e0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff81089b7c>] ?
> run_timer_softirq+0x1bc/0x380
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] [<ffffffff814c25d0>] ?
> tcp_keepalive_tim
>
>
> _______________________________________________
> Users mailing list
> Users@openvz.org
> https://lists.openvz.org/mailman/listinfo/users