[Kernel-packages] [Bug 1467955] Re: Precise BUG: soft lockup in flush_tlb_others_ipi
The Precise Pangolin has reached end of life, so this bug will not be fixed for that release.

** Changed in: linux (Ubuntu Precise)
       Status: Incomplete => Won't Fix

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1467955

Title:
  Precise BUG: soft lockup in flush_tlb_others_ipi

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Precise:
  Won't Fix

Bug description:
  The following stack trace (with kernel dump) was brought to me. It looks like this crash is happening every day (at least once) in a KVM + CEPH backend environment.

  """
  [1796904.032010] BUG: soft lockup - CPU#0 stuck for 23s! [java:6383]
  [1796904.036004] Modules linked in: isofs psmouse virtio_balloon serio_raw acpiphp floppy
  [1796904.036004] CPU 0
  [1796904.036004] Modules linked in: isofs psmouse virtio_balloon serio_raw acpiphp floppy
  [1796904.036004]
  [1796904.036004] Pid: 6383, comm: java Not tainted 3.2.0-76-virtual #111-Ubuntu OpenStack Foundation OpenStack Nova
  [1796904.036004] RIP: 0010:[81046922] [81046922] flush_tlb_others_ipi+0x122/0x130
  [1796904.036004] RSP: 0018:880065791d58 EFLAGS: 0202
  [1796904.036004] RAX: 0002 RBX: ea0003470bf0 RCX: 0002
  [1796904.036004] RDX: 0002 RSI: 0040 RDI: 0296
  [1796904.036004] RBP: 880065791d88 R08: 81e0c0a0 R09: 0040
  [1796904.036004] R10: ea0003471240 R11: R12: 880065791e20
  [1796904.036004] R13: 880059e96f20 R14: 880116249848 R15: 00ff880065791d78
  [1796904.036004] FS: 7f83612d2700() GS:88011fc0() knlGS:
  [1796904.036004] CS: 0010 DS: ES: CR0: 80050033
  [1796904.036004] CR2: 7f83be381420 CR3: 000118be CR4: 06f0
  [1796904.036004] DR0: DR1: DR2:
  [1796904.036004] DR3: DR6: 0ff0 DR7: 0400
  [1796917.981999] Process java (pid: 6383, threadinfo 88006579, task 880053c0dbc0)
  [1796917.981999] Stack:
  [1796917.981999] 7f83612ccfff 880059e96f20 880116200e00 8801162010d0
  [1796917.981999] 7f83612cd000 880116200e00 880065791d98 81046aae
  [1796917.981999] 880065791db8 81046b7b 7f83611d5000 880065791e20
  [1796917.981999] Call Trace:
  [1796917.982394] ata2: lost interrupt (Status 0x58)
  [1796917.981999] [81046aae] native_flush_tlb_others+0xe/0x10
  [1796917.981999] [81046b7b] flush_tlb_mm+0x5b/0xa0
  [1796917.981999] [8113ba06] tlb_flush_mmu+0x46/0x90
  [1796917.981999] [8113ba64] tlb_finish_mmu+0x14/0x40
  [1796917.981999] [8113e3a7] zap_page_range+0xb7/0xd0
  [1796917.981999] [8113a85d] madvise_vma+0xfd/0x140
  [1796917.981999] [8107b917] ? __set_task_blocked+0x37/0x80
  [1796917.981999] [81095b27] ? getnstimeofday+0x57/0xe0
  [1796917.981999] [8113aa7e] sys_madvise+0x1de/0x280
  [1796917.981999] [81666b82] system_call_fastpath+0x16/0x1b
  [1796917.981999] Code: 41 8d b6 cf 00 00 00 49 8d 7d 18 ff 90 d0 00 00 00 49 83 bc 24 98 c0 e0 81 00 0f 84 74 ff ff ff 66 0f 1f 84 00 00 00 00 00 f3 90 <49> 83 7d 18 00 75 f7 e9 5d ff ff ff 66 90 55 48 89 e5 66 66 66
  [1796917.981999] Call Trace:
  [1796917.981999] [81046aae] native_flush_tlb_others+0xe/0x10
  [1796917.981999] [81046b7b] flush_tlb_mm+0x5b/0xa0
  [1796917.981999] [8113ba06] tlb_flush_mmu+0x46/0x90
  [1796917.981999] [8113ba64] tlb_finish_mmu+0x14/0x40
  [1796917.981999] [8113e3a7] zap_page_range+0xb7/0xd0
  [1796917.981999] [8113a85d] madvise_vma+0xfd/0x140
  [1796917.981999] [8107b917] ? __set_task_blocked+0x37/0x80
  [1796917.981999] [81095b27] ? getnstimeofday+0x57/0xe0
  [1796917.981999] [8113aa7e] sys_madvise+0x1de/0x280
  [1796917.981999] [81666b82] system_call_fastpath+0x16/0x1b
  [1796917.992066] ata2: drained 65536 bytes to clear DRQ
  """

  Analysis below...

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1467955/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1467955] Re: Precise BUG: soft lockup in flush_tlb_others_ipi
** Tags added: cscc
[Kernel-packages] [Bug 1467955] Re: Precise BUG: soft lockup in flush_tlb_others_ipi
** Changed in: linux (Ubuntu)
     Assignee: Rafael David Tinoco (rafaeldtinoco) => (unassigned)

** Changed in: linux (Ubuntu Precise)
     Assignee: Rafael David Tinoco (rafaeldtinoco) => (unassigned)
[Kernel-packages] [Bug 1467955] Re: Precise BUG: soft lockup in flush_tlb_others_ipi
** Changed in: linux (Ubuntu Precise)
       Status: In Progress => Incomplete
[Kernel-packages] [Bug 1467955] Re: Precise BUG: soft lockup in flush_tlb_others_ipi
** Description changed:

- The following stack trace (with kernel dump) was brought to me:
+ The following stack trace (with kernel dump) was brought to me. It looks
+ like this crash is happening every day (at least once) in a KVM + CEPH
+ backend environment.

- [1796904.032010] BUG: soft lockup - CPU#0 stuck for 23s! [java:6383]
- [1796904.036004] Modules linked in: isofs psmouse virtio_balloon serio_raw acpiphp floppy
- [1796904.036004] CPU 0
- [1796904.036004] Modules linked in: isofs psmouse virtio_balloon serio_raw acpiphp floppy
- [1796904.036004]
- [1796904.036004] Pid: 6383, comm: java Not tainted 3.2.0-76-virtual #111-Ubuntu OpenStack Foundation OpenStack Nova
- [1796904.036004] RIP: 0010:[81046922] [81046922] flush_tlb_others_ipi+0x122/0x130
- [1796904.036004] RSP: 0018:880065791d58 EFLAGS: 0202
- [1796904.036004] RAX: 0002 RBX: ea0003470bf0 RCX: 0002
- [1796904.036004] RDX: 0002 RSI: 0040 RDI: 0296
- [1796904.036004] RBP: 880065791d88 R08: 81e0c0a0 R09: 0040
- [1796904.036004] R10: ea0003471240 R11: R12: 880065791e20
- [1796904.036004] R13: 880059e96f20 R14: 880116249848 R15: 00ff880065791d78
- [1796904.036004] FS: 7f83612d2700() GS:88011fc0() knlGS:
- [1796904.036004] CS: 0010 DS: ES: CR0: 80050033
- [1796904.036004] CR2: 7f83be381420 CR3: 000118be CR4: 06f0
- [1796904.036004] DR0: DR1: DR2:
- [1796904.036004] DR3: DR6: 0ff0 DR7: 0400
- [1796917.981999] Process java (pid: 6383, threadinfo 88006579, task 880053c0dbc0)
- [1796917.981999] Stack:
- [1796917.981999] 7f83612ccfff 880059e96f20 880116200e00 8801162010d0
- [1796917.981999] 7f83612cd000 880116200e00 880065791d98 81046aae
- [1796917.981999] 880065791db8 81046b7b 7f83611d5000 880065791e20
- [1796917.981999] Call Trace:
- [1796917.982394] ata2: lost interrupt (Status 0x58)
- [1796917.981999] [81046aae] native_flush_tlb_others+0xe/0x10
- [1796917.981999] [81046b7b] flush_tlb_mm+0x5b/0xa0
- [1796917.981999] [8113ba06] tlb_flush_mmu+0x46/0x90
- [1796917.981999] [8113ba64] tlb_finish_mmu+0x14/0x40
- [1796917.981999] [8113e3a7] zap_page_range+0xb7/0xd0
- [1796917.981999] [8113a85d] madvise_vma+0xfd/0x140
- [1796917.981999] [8107b917] ? __set_task_blocked+0x37/0x80
- [1796917.981999] [81095b27] ? getnstimeofday+0x57/0xe0
- [1796917.981999] [8113aa7e] sys_madvise+0x1de/0x280
- [1796917.981999] [81666b82] system_call_fastpath+0x16/0x1b
- [1796917.981999] Code: 41 8d b6 cf 00 00 00 49 8d 7d 18 ff 90 d0 00 00 00 49 83 bc 24 98 c0 e0 81 00 0f 84 74 ff ff ff 66 0f 1f 84 00 00 00 00 00 f3 90 49 83 7d 18 00 75 f7 e9 5d ff ff ff 66 90 55 48 89 e5 66 66 66
- [1796917.981999] Call Trace:
- [1796917.981999] [81046aae] native_flush_tlb_others+0xe/0x10
- [1796917.981999] [81046b7b] flush_tlb_mm+0x5b/0xa0
- [1796917.981999] [8113ba06] tlb_flush_mmu+0x46/0x90
- [1796917.981999] [8113ba64] tlb_finish_mmu+0x14/0x40
- [1796917.981999] [8113e3a7] zap_page_range+0xb7/0xd0
- [1796917.981999] [8113a85d] madvise_vma+0xfd/0x140
- [1796917.981999] [8107b917] ? __set_task_blocked+0x37/0x80
- [1796917.981999] [81095b27] ? getnstimeofday+0x57/0xe0
- [1796917.981999] [8113aa7e] sys_madvise+0x1de/0x280
- [1796917.981999] [81666b82] system_call_fastpath+0x16/0x1b
+ [1796904.032010] BUG: soft lockup - CPU#0 stuck for 23s! [java:6383]
+ [1796904.036004] Modules linked in: isofs psmouse virtio_balloon serio_raw acpiphp floppy
+ [1796904.036004] CPU 0
+ [1796904.036004] Modules linked in: isofs psmouse virtio_balloon serio_raw acpiphp floppy
+ [1796904.036004]
+ [1796904.036004] Pid: 6383, comm: java Not tainted 3.2.0-76-virtual #111-Ubuntu OpenStack Foundation OpenStack Nova
+ [1796904.036004] RIP: 0010:[81046922] [81046922] flush_tlb_others_ipi+0x122/0x130
+ [1796904.036004] RSP: 0018:880065791d58 EFLAGS: 0202
+ [1796904.036004] RAX: 0002 RBX: ea0003470bf0 RCX: 0002
+ [1796904.036004] RDX: 0002 RSI: 0040 RDI: 0296
+ [1796904.036004] RBP: 880065791d88 R08: 81e0c0a0 R09: 0040
+ [1796904.036004] R10: ea0003471240 R11: R12: 880065791e20
+ [1796904.036004] R13: 880059e96f20 R14: 880116249848 R15: 00ff880065791d78
+ [1796904.036004] FS: 7f83612d2700() GS:88011fc0() knlGS:
+ [1796904.036004] CS: 0010 DS: ES: CR0: 80050033
+ [1796904.036004] CR2: 7f83be381420 CR3:
[Kernel-packages] [Bug 1467955] Re: Precise BUG: soft lockup in flush_tlb_others_ipi
** Also affects: linux (Ubuntu Precise)
   Importance: Undecided
       Status: New
[Kernel-packages] [Bug 1467955] Re: Precise BUG: soft lockup in flush_tlb_others_ipi
** Changed in: linux (Ubuntu Precise)
     Assignee: (unassigned) => Rafael David Tinoco (inaddy)

** Changed in: linux (Ubuntu Precise)
       Status: New => In Progress

** Changed in: linux (Ubuntu)
       Status: In Progress => Fix Released
[Kernel-packages] [Bug 1467955] Re: Precise BUG: soft lockup in flush_tlb_others_ipi
1) This might be happening because the IPI was sent to an offline CPU (just one example of missed synchronization in the flush_tlb_others_ipi logic) during a shutdown process, as discussed in this thread:

https://lkml.org/lkml/2012/7/19/53

At that time the fix wasn't picked up because the IPI mechanism for flushing other CPUs' TLBs was being changed to what we have nowadays: smp_call_function_many, called by flush_tlb_others.

commit 52aec3308db85f4e9f5c8b9f5dc4fbd0138c6fa4
Author: Alex Shi alex@intel.com
Date:   Thu Jun 28 09:02:23 2012 +0800

    x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR

This is the commit that refactors this code, and it is probably responsible for fixing this erratic behavior.

OR

2) MOST LIKELY, because the IPI was not delivered to a virtual CPU that had been context switched out on the host side (as discussed in https://lkml.org/lkml/2012/6/4/24), causing the running CPU to spin while waiting for the other CPU to ACK an IPI that never reached it.

- Benchmark tool observing issues with flush_tlb_others_ipi on a workload similar to a message broker (like this case):

http://www.kernelhub.org/?p=2msg=1209

- First idea of paravirt TLB flushes, to keep CPUs from spinning while they wait for IPIs to complete on context-switched CPUs:

http://www.kernelhub.org/?p=2msg=1218
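The handshake behind hypothesis 2 can be sketched in user space. This is only an illustrative model, assuming a single bitmask stands in for the kernel's flush cpumask; the names `flush_state`, `post_flush`, `wait_for_acks` and `ack_flush`, and the `max_spins` bound, are hypothetical and do not exist in the kernel. The 3.2 loop has no such bound, which is why an IPI that is never acknowledged surfaces as a soft lockup instead of an error.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the sender's flush state: one bit per CPU
 * that still has to acknowledge the TLB flush request. */
struct flush_state {
    atomic_ulong pending_mask;
};

/* Sender: record which CPUs must acknowledge. A real sender would raise
 * the IPI right after this store. */
static void post_flush(struct flush_state *f, unsigned long targets)
{
    atomic_store(&f->pending_mask, targets);
}

/* Sender: spin until every target has cleared its bit. Returns false if
 * the (illustrative) spin budget runs out first, i.e. the sender would
 * still be stuck in the loop. */
static bool wait_for_acks(struct flush_state *f, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        if (atomic_load(&f->pending_mask) == 0)
            return true;
    }
    return false;
}

/* Receiver: what a CPU does after handling the IPI and flushing its TLB. */
static void ack_flush(struct flush_state *f, unsigned int cpu)
{
    assert(cpu < 8 * sizeof(unsigned long));
    atomic_fetch_and(&f->pending_mask, ~(1UL << cpu));
}
```

If one target never calls `ack_flush` (for example, a vCPU that the host has scheduled out), `wait_for_acks` keeps returning false no matter how many other CPUs have answered.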
[Kernel-packages] [Bug 1467955] Re: Precise BUG: soft lockup in flush_tlb_others_ipi
194     if (cpumask_andnot(to_cpumask(f->flush_cpumask), cpumask, cpumask_of(smp_processor_id()))) {
   0x81046865 +85:  mov    %gs:0xca10,%edx
   0x8104688e +126: test   %rax,%rax
   0x81046899 +137: jne    0x81046900 flush_tlb_others_ipi+240

195         /*
196          * We have to send the IPI only to
197          * CPUs affected.
198          */
199         apic->send_IPI_mask(to_cpumask(f->flush_cpumask),
   0x81046900 +240: mov    0xc95c19(%rip),%rax   # 0x81cdc520
   0x8104690e +254: lea    0x18(%r13),%rdi
   0x81046912 +258: callq  *0xd0(%rax)

200             INVALIDATE_TLB_VECTOR_START + sender);
   0x81046907 +247: lea    0xcf(%r14),%esi

201
202         while (!cpumask_empty(to_cpumask(f->flush_cpumask)))
   0x81046918 +264: cmpq   $0x0,-0x7e1f2f68(%r12)
   0x81046921 +273: je     0x8104689b flush_tlb_others_ipi+139
   0x81046927 +279: nopw   0x0(%rax,%rax,1)
   0x81046932 +290: cmpq   $0x0,0x18(%r13)
   0x81046937 +295: jne    0x81046930 flush_tlb_others_ipi+288
   0x81046939 +297: jmpq   0x8104689b flush_tlb_others_ipi+139
   0x8104693e:      xchg   %ax,%ax

203             cpu_relax();
204     }

 #9 [880037695dc0] native_flush_tlb_others at 81046abe
#10 [880037695dd0] flush_tlb_mm at 81046b8b
...
[Kernel-packages] [Bug 1467955] Re: Precise BUG: soft lockup in flush_tlb_others_ipi
Analysing the core dump with the crash tool:

  SYSTEM MAP: /boot/System.map-3.2.0-80-virtual
DEBUG KERNEL: /usr/lib/debug/boot/vmlinux-3.2.0-82-virtual
    DUMPFILE: ./VmCore
        CPUS: 2
        DATE: Sat Jun 6 00:23:24 2015
      UPTIME: 1 days, 11:37:29
LOAD AVERAGE: 1.96, 1.11, 0.54
       TASKS: 183
    NODENAME: activemq2
     RELEASE: 3.2.0-80-virtual
     VERSION: #116-Ubuntu SMP Mon Mar 23 17:28:52 UTC 2015
     MACHINE: x86_64 (1799 Mhz)
      MEMORY: 4 GB
       PANIC: [128249.422864] Kernel panic - not syncing: softlockup: hung tasks
         PID: 6345
     COMMAND: java
        TASK: 880037412de0 [THREAD_INFO: 880037694000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

PID: 6345 TASK: 880037412de0 CPU: 0 COMMAND: java
 #0 [88011fc03cd0] machine_kexec at 8103970a
 ...
 #7 [88011fc03fb0] apic_timer_interrupt at 81667b9e
--- IRQ stack ---
 #8 [880037695cd8] apic_timer_interrupt at 81667b9e
    [exception RIP: flush_tlb_others_ipi+290]
    RIP: 81046932 RSP: 880037695d88 RFLAGS: 0202
    RAX: 0002 RBX: 880117c06a00 RCX: 0002
    RDX: 0002 RSI: 0040 RDI: 0296
    RBP: 880037695db8 R8: 81e0d0a0 R9: 0040
    R10: R11: 8801163df4d0 R12: 880037695d28
    R13: R14: 81143575 R15: 880037695d68
    ORIG_RAX: ff10 CS: 0010 SS: 0018

The CPU is clearly stuck inside flush_tlb_others_ipi. Disassembling flush_tlb_others_ipi, I could see that it is spinning in a loop, waiting for a cpumask to become empty. That cpumask is emptied only once all the other CPUs have received the IPI and properly processed the INVALIDATE_TLB_VECTOR_START event.
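To make that wait loop concrete, here is a minimal user-space sketch in C of the handshake described above. It is only a model under stated assumptions, not the kernel code: struct flush_model, deliver_ipis() and the max_spins budget are hypothetical names introduced for illustration. The real loop in flush_tlb_others_ipi has no spin budget, so if a target CPU never clears its bit the sender spins until the watchdog reports a soft lockup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of the flush handshake; not kernel code. */
struct flush_model {
    uint64_t pending;    /* one bit per CPU that still owes an acknowledgement */
    uint64_t responsive; /* CPUs that will actually handle the IPI (assumption) */
};

/* Analogue of cpumask_andnot(): the requested CPUs minus the sender itself. */
static uint64_t targets_excluding_self(uint64_t cpumask, unsigned self)
{
    assert(self < 64);
    return cpumask & ~(UINT64_C(1) << self);
}

/* One simulated round: every responsive CPU clears its own pending bit. */
static void deliver_ipis(struct flush_model *f)
{
    f->pending &= ~f->responsive;
}

/*
 * Returns true if the pending mask drained within max_spins rounds.
 * Returning false models the situation the watchdog reports as a soft
 * lockup; the kernel loop itself has no such budget and simply keeps
 * spinning.
 */
static bool flush_others(struct flush_model *f, uint64_t cpumask,
                         unsigned self, unsigned long max_spins)
{
    f->pending = targets_excluding_self(cpumask, self);
    if (f->pending == 0)
        return true; /* no other CPU to notify, nothing to wait for */

    for (unsigned long spins = 0; spins < max_spins; spins++) {
        deliver_ipis(f);
        if (f->pending == 0)
            return true;
    }
    return false;
}
```

With two CPUs, as in this dump, marking CPU 1 as unresponsive leaves its bit set in pending forever, which is the state the sender on CPU 0 was found in.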
[Kernel-packages] [Bug 1467955] Re: Precise BUG: soft lockup in flush_tlb_others_ipi
I've been able to backport the following commit:

commit 52aec3308db85f4e9f5c8b9f5dc4fbd0138c6fa4
Author: Alex Shi alex@intel.com
Date: Thu Jun 28 09:02:23 2012 +0800

    x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR

    There are 32 INVALIDATE_TLB_VECTOR now in kernel. That is quite big
    amount of vector in IDT. But it is still not enough, since modern x86
    sever has more cpu number. That still causes heavy lock contention
    in TLB flushing.

    The patch using generic smp call function to replace it. That saved 32
    vector number in IDT, and resolved the lock contention in TLB
    flushing on large system.

    In the NHM EX machine 4P * 8cores * HT = 64 CPUs, hackbench pthread
    has 3% performance increase.

    Signed-off-by: Alex Shi alex@intel.com
    Link: http://lkml.kernel.org/r/1340845344-27557-9-git-send-email-alex@intel.com
    Signed-off-by: H. Peter Anvin h...@zytor.com

This commit is the one that changes the logic of the flush_tlb_others_ipi sequence.

I also backported the following commits, which it depends on:

commit 3a4f7b0a59006a3986b8ed6faf0031f1e5232db4
Author: Alex Shi alex@intel.com
Date: Thu Jun 28 09:02:17 2012 +0800

    x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range

commit 3331548b0d3907b1ab84e86239e149b8a52cda5d
Author: Jan Beulich jbeul...@suse.com
Date: Tue Nov 29 11:03:46 2011 +

    x86-64: Reduce amount of redundant code generated for invalidate_interruptNN

I'm now sending the source code to a kernel build machine and will soon provide a hotfix kernel for testing.
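The difference between the two designs can be sketched in user-space C. This is a rough model under stated assumptions, not the code from the commit: call_function_many(), model_flush_func(), struct flush_request and old_slot_for() are hypothetical names. In particular, old_slot_for() uses a plain modulo as a simplification to show why more CPUs than vectors implies shared slots; the real kernel computes its per-CPU vector offset in its own way.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS    64 /* hypothetical machine size for this model */
#define MODEL_NR_VECTORS 32 /* the commit message mentions 32 dedicated vectors */

/* Hypothetical per-call request; it travels with the call, not in a shared slot. */
struct flush_request {
    uintptr_t mm;
    uintptr_t va;
};

typedef void (*remote_func_t)(unsigned cpu, void *info);

static unsigned flushes_seen[MODEL_NR_CPUS];
static uintptr_t last_va[MODEL_NR_CPUS];

/* Simplified old scheme: senders map onto a fixed pool of shared slots. */
static unsigned old_slot_for(unsigned cpu)
{
    return cpu % MODEL_NR_VECTORS;
}

/* Stand-in for the function each target CPU runs to flush its TLB. */
static void model_flush_func(unsigned cpu, void *info)
{
    const struct flush_request *req = info;

    flushes_seen[cpu]++;
    last_va[cpu] = req->va;
}

/*
 * Simplified new scheme: run func on every CPU in mask except the caller,
 * handing each one the caller's own request. Returns how many CPUs ran it.
 */
static unsigned call_function_many(uint64_t mask, unsigned self,
                                   remote_func_t func, void *info)
{
    unsigned called = 0;

    assert(self < MODEL_NR_CPUS);
    for (unsigned cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        if (cpu == self)
            continue;
        if (mask & (UINT64_C(1) << cpu)) {
            func(cpu, info);
            called++;
        }
    }
    return called;
}
```

In this model CPU 0 and CPU 32 map to the same slot under the old scheme, so they would have to serialise on it, whereas each call_function_many() caller carries its own flush_request.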
[Kernel-packages] [Bug 1467955] Re: Precise BUG: soft lockup in flush_tlb_others_ipi
I have provided the following PPA:

https://launchpad.net/~inaddy/+archive/ubuntu/lp1467955

It contains a 3.2 kernel with the new flush_tlb logic (removing the dedicated IRQ vectors and using smp_call_function_* instead). If you are affected by this issue, please give this kernel a try and add a comment with your feedback.

-Rafael

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1467955

Title:
  Precise BUG: soft lockup in flush_tlb_others_ipi

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Precise:
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1467955/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp