Re: Multitude of dst obsolescence race conditions
> On Wed, May 14, 2014, at 2:57, dormando wrote:
> > Given a machine with frequently changing routes (ie; a router with an
> > active internet BGP table and multiple interfaces), there're at least
> > several places where obsolete dst's are handled improperly. If I pause
> > the route changes, the crashes appear to stop. This first one has a
> > crash utility we've made, so I was able to more quickly find a patch
> > and test it. The others take time to reproduce.
> >
> > I'm testing against 3.10.39, but I think if these were fixed they'd be
> > backported to stable? I've also had recent 3.12's running that have
> > crashed in the same spots. Anyway correct me if I'm wrong...
>
> Just a hunch:
> You use macvlan? Could you somehow try without?
> Maybe... some ref overflow? (You could add some testing code in dst_hold
> with atomic_inc_return and WARN_ON.)
>
> dst_release already contains such a check, so I am not sure at all if
> that could happen.
>
> Bye,
>
> Hannes

We've seen the crashes with macvlan removed. Don't think I've explicitly
removed it recently or for the udp crash, but I'm sorta doubting that'd
make a difference. And yeah, pretty weird right? It's like the RCU isn't
working...

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Multitude of dst obsolescence race conditions
> On Wed, 2014-05-14 at 02:57 -0700, dormando wrote:
> > Hi,
> >
> > Given a machine with frequently changing routes (ie; a router with an
> > active internet BGP table and multiple interfaces), there're at least
> > several places where obsolete dst's are handled improperly. If I pause the
> > route changes, the crashes appear to stop. This first one has a crash
> > utility we've made, so I was able to more quickly find a patch and test
> > it. The others take time to reproduce.
> >
> > I'm testing against 3.10.39, but I think if these were fixed they'd be
> > backported to stable? I've also had recent 3.12's running that have
> > crashed in the same spots. Anyway correct me if I'm wrong...
>
> Is this a vanilla kernel? I never had any issues like that.
>
> I wonder if you have some RCU issues.
>
> static inline struct dst_entry *
> sk_dst_get(struct sock *sk)
> {
> 	struct dst_entry *dst;
>
> 	rcu_read_lock();
> 	dst = rcu_dereference(sk->sk_dst_cache);
> 	if (dst)
> 		dst_hold(dst);
> 	rcu_read_unlock();
> 	return dst;
> }
>
> static inline void
> __sk_dst_set(struct sock *sk, struct dst_entry *dst)
> {
> 	struct dst_entry *old_dst;
>
> 	sk_tx_queue_clear(sk);
> 	/*
> 	 * This can be called while sk is owned by the caller only,
> 	 * with no state that can be checked in a rcu_dereference_check() cond
> 	 */
> 	old_dst = rcu_dereference_raw(sk->sk_dst_cache);
> 	rcu_assign_pointer(sk->sk_dst_cache, dst);
> 	dst_release(old_dst);
> }
>
> static inline void
> sk_dst_set(struct sock *sk, struct dst_entry *dst)
> {
> 	spin_lock(&sk->sk_dst_lock);
> 	__sk_dst_set(sk, dst);
> 	spin_unlock(&sk->sk_dst_lock);
> }

We have some minor patches, but I've removed them before and they still
happen. I'd crashed a vanilla 3.12 + just the stable patches recently, I
think.
Multitude of dst obsolescence race conditions
] process_backlog+0x9b/0x170
<4>[7359073.004729] [815afa49] net_rx_action+0x119/0x220
<4>[7359073.004794] [81080f0b] ? check_preempt_wakeup+0x14b/0x230
<4>[7359073.004860] [81051970] __do_softirq+0xd0/0x270
<4>[7359073.004921] [81051c25] irq_exit+0x55/0x60
<4>[7359073.004983] [8107a5b5] scheduler_ipi+0x35/0x40
<4>[7359073.005049] [81023bda] smp_reschedule_interrupt+0x2a/0x30
<4>[7359073.005115] [816cd5da] reschedule_interrupt+0x6a/0x70
<4>[7359073.005176] <EOI>
<4>[7359073.005217] [816c41f5] ? _raw_spin_lock+0x25/0x30
<4>[7359073.005370] [81098629] futex_wait_setup+0x69/0xf0
<4>[7359073.005433] [81098836] futex_wait+0x186/0x2c0
<4>[7359073.005495] [810508c6] ? current_fs_time+0x16/0x60
<4>[7359073.005559] [81159123] ? pipe_write+0x2f3/0x590
<4>[7359073.005625] [8118e8c2] ? fsnotify+0x1d2/0x2b0
<4>[7359073.005687] [81099e04] do_futex+0x334/0xb20
<4>[7359073.005751] [8115021a] ? do_sync_write+0x7a/0xb0
<4>[7359073.005813] [8118e8c2] ? fsnotify+0x1d2/0x2b0
<4>[7359073.005875] [8109a732] SyS_futex+0x142/0x1a0
<4>[7359073.005939] [8115148b] ? SyS_write+0x6b/0xa0
<4>[7359073.006001] [816cc702] system_call_fastpath+0x16/0x1b
<4>[7359073.006063] Code: 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 0f 84 e7 00 00 00 48 85 c0 0f 84 de 00 00 00 49 63 44 24 20 48 8d 4a 01 49 8b 3c 24 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 b5 49
<1>[7359073.008543] RIP [811421e7] kmem_cache_alloc+0x57/0x150
<4>[7359073.008642] RSP 88c07fc638f8
<4>[7359073.008700] CR2: 0001
<4>[7359073.008767] ---[ end trace 83220393c4cb24ad ]---
<0>[7359073.072455] Kernel panic - not syncing: Fatal exception in interrupt

(apologies for the mangling, it's getting late and I'm getting progressively lazier)

The path for this one appears to shift a bit, but it is always dying from
the kmem_cache_alloc() call within dst_alloc(). I've also seen:

<4>[14723139.584187] Call Trace:
<4>[14723139.584241] <IRQ>
<4>[14723139.584282] [815b672a] dst_alloc+0x5a/0x180
<4>[14723139.584433] [8161c880] ? tcp_write_timer_handler+0x1d0/0x1d0
<4>[14723139.584497] [815f78bc] rt_dst_alloc+0x4c/0x50
<4>[14723139.584558] [815f8861] __ip_route_output_key+0x281/0x860
<4>[14723139.584622] [8161c880] ? tcp_write_timer_handler+0x1d0/0x1d0
<4>[14723139.584685] [815f8e67] ip_route_output_flow+0x27/0x70
<4>[14723139.584747] [816329f7] inet_sk_rebuild_header+0x137/0x310
<4>[14723139.584810] [8161c880] ? tcp_write_timer_handler+0x1d0/0x1d0
<4>[14723139.584874] [81619c28] __tcp_retransmit_skb+0x78/0x5a0
<4>[14723139.584938] [816557f1] ? bictcp_state+0xa1/0x100
<4>[14723139.584999] [8161c880] ? tcp_write_timer_handler+0x1d0/0x1d0
<4>[14723139.585062] [8161a354] tcp_retransmit_skb+0x24/0x100
<4>[14723139.585124] [8161c251] tcp_retransmit_timer+0x271/0x6d0
<4>[14723139.585187] [8161c880] ? tcp_write_timer_handler+0x1d0/0x1d0
<4>[14723139.585250] [8161c750] tcp_write_timer_handler+0xa0/0x1d0
<4>[14723139.585314] [8161c880] ? tcp_write_timer_handler+0x1d0/0x1d0
<4>[14723139.585378] [8161c8e0] tcp_write_timer+0x60/0x70
<4>[14723139.585443] [81057ccb] call_timer_fn+0x3b/0x150
<4>[14723139.585507] [816cdfc3] ? do_IRQ+0x63/0xe0
<4>[14723139.585568] [8161c880] ? tcp_write_timer_handler+0x1d0/0x1d0
<4>[14723139.585630] [81059223] run_timer_softirq+0x243/0x290
<4>[14723139.585690] [81051970] __do_softirq+0xd0/0x270
<4>[14723139.585749] [81051c25] irq_exit+0x55/0x60
<4>[14723139.585807] [816ce0ae] smp_apic_timer_interrupt+0x6e/0x99
<4>[14723139.585868] [816cd2ca] apic_timer_interrupt+0x6a/0x70

This one's the most problematic: it's the least frequent and the most
difficult to reproduce. Given how the other issues all centralize around
dst's being mishandled during route updates, my wild guess would be that
it's somewhere within there. It's probably worth auditing how dst caches
are handled in all places, but it is 3am and I have to stop for now.
Anyway, this sucks, please help! thanks!

-Dormando
Re: ipv4_dst_destroy panic regression after 3.10.15
On Wed, 22 Jan 2014, Alexei Starovoitov wrote:
> On Tue, Jan 21, 2014 at 10:02 PM, Alexei Starovoitov wrote:
> > On Tue, Jan 21, 2014 at 8:10 PM, dormando wrote:
> >>
> >> On Tue, 21 Jan 2014, Alexei Starovoitov wrote:
> >>
> >>> On Tue, Jan 21, 2014 at 5:39 PM, dormando wrote:
> >>> >
> >>> > > On Fri, Jan 17, 2014 at 11:16 PM, dormando wrote:
> >>> > > >> On Fri, 2014-01-17 at 22:49 -0800, Eric Dumazet wrote:
> >>> > > >> > On Fri, 2014-01-17 at 17:25 -0800, dormando wrote:
> >>> > > >> > > Hi,
> >>> > > >> > >
> >>> > > >> > > Upgraded a few kernels to the latest 3.10 stable tree while tracking down
> >>> > > >> > > a rare kernel panic, seems to have introduced a much more frequent kernel
> >>> > > >> > > panic. Takes anywhere from 4 hours to 2 days to trigger:
> >>> > > >> > >
> >>> > > >> > > <4>[196727.311203] general protection fault: [#1] SMP
> >>> > > >> > > <4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio
> >>> > > >> > > <4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1
> >>> > > >> > > <4>[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
> >>> > > >> > > <4>[196727.311364] task: 885e6f069700 ti: 885e6f072000 task.ti: 885e6f072000
> >>> > > >> > > <4>[196727.311377] RIP: 0010:[815f8c7f] [815f8c7f] ipv4_dst_destroy+0x4f/0x80
> >>> > > >> > > <4>[196727.311399] RSP: 0018:885effd23a70 EFLAGS: 00010282
> >>> > > >> > > <4>[196727.311409] RAX: dead00200200 RBX: 8854c398ecc0 RCX: 0040
> >>> > > >> > > <4>[196727.311423] RDX: dead00100100 RSI: dead00100100 RDI: dead00200200
> >>> > > >> > > <4>[196727.311437] RBP: 885effd23a80 R08: 815fd9e0 R09: 885d5a590800
> >>> > > >> > > <4>[196727.311451] R10: R11: R12:
> >>> > > >> > > <4>[196727.311464] R13: 81c8c280 R14: R15: 880e85ee16ce
> >>> > > >> > > <4>[196727.311510] FS: () GS:885effd2() knlGS:
> >>> > > >> > > <4>[196727.311554] CS: 0010 DS: ES: CR0: 80050033
> >>> > > >> > > <4>[196727.311581] CR2: 7a46751eb000 CR3: 005e65688000 CR4: 000407e0
> >>> > > >> > > <4>[196727.311625] DR0: DR1: DR2:
> >>> > > >> > > <4>[196727.311669] DR3: DR6: 0ff0 DR7: 0400
> >>> > > >> > > <4>[196727.311713] Stack:
> >>> > > >> > > <4>[196727.311733] 8854c398ecc0 8854c398ecc0 885effd23ab0 815b7f42
> >>> > > >> > > <4>[196727.311784] 88be6595bc00 8854c398ecc0 &
Re: kmem_cache_alloc panic in 3.10+
On Fri, 31 Jan 2014, David Rientjes wrote:
> On Fri, 31 Jan 2014, dormando wrote:
>
> > > CONFIG_SLUB_DEBUG_ON will definitely be slower but can help to identify
> > > any possible corruption issues.
> > >
> > > I'm wondering if you have CONFIG_MEMCG enabled and are actually allocating
> > > slab in a non-root memcg? What does /proc/self/cgroup say?
> >
> > /proc/self/cgroup is empty on these hosts. CONFIG_MEMCG is enabled though.
>
> It _looks_ like the cmpxchg_double(), so seeing if there is anything else
> funny going on with CONFIG_SLUB_DEBUG_ON would definitely be helpful;
> otherwise, try using CONFIG_SLAB as Eric suggested and see if the
> problem goes away.

cmpxchg_double()? That's not related to the 62713c4b fix, right? I'll see
what I can do... it's going to take a long time to iterate on this though.
Thanks for the suggestions!
Re: kmem_cache_alloc panic in 3.10+
> On Thu, Jan 30, 2014 at 6:16 PM, Eric Dumazet wrote:
> > On Wed, 2014-01-29 at 23:05 -0800, dormando wrote:
> >
> >> We hit the routing code fairly hard. Any hints for what to look at or how
> >> to instrument it? Or if it's fixed already? It's a real pain to iterate
> >> since it takes ~30 days to crash, usually. Sometimes.
>
> Sounds like adding mdelay() didn't help to crash it sooner. Then I don't
> see how my dst fix was causing it to crash more often. Something odd.
> FYI, just to check it more thoroughly I've been running with mdelay()
> and CONFIG_SLUB_DEBUG_ON for a week without issues.

Sorry, I'm actually trying to deal with two separate crashes at once :/
One is this 3.10.15 one, and one was the regression in 3.10.23 - I haven't
had time to attempt the mdelay test yet. The two crashes have fairly
distinct traces. For what it's worth, though, the machines I have with
that one patch reverted are still running fine.

> > I really wonder... it looks like a possible in SLUB. (might be already
> > fixed)
> >
> > Could you try using SLAB instead ?
>
> Try CONFIG_SLUB_DEBUG_ON=y? It should catch double free and other things.

Any slowdowns/issues with that?
Re: kmem_cache_alloc panic in 3.10+
> > On Sat, 2014-01-18 at 00:44 -0800, dormando wrote:
> > > Hello again!
> > >
> > > We've had a rare crash that's existed between 3.10.0 and 3.10.15 at least
> > > (trying newer stables now, but I can't tell if it was fixed, and it takes
> > > weeks to reproduce).
> > >
> > > Unfortunately I can only get 8k back from pstore. The panic looks a bit
> > > longer than what is caught in the log, but the bottom part is almost
> > > always this same trace as this one:
> > >
> > > Panic#6 Part1
> > > <4>[1197485.199166] [81611e8c] tcp_push+0x6c/0x90
> > > <4>[1197485.199171] [816160a9] tcp_sendmsg+0x109/0xd40
> > > <4>[1197485.199179] [81114b65] ? put_page+0x35/0x40
> > > <4>[1197485.199185] [8163bf75] inet_sendmsg+0x45/0xb0
> > > <4>[1197485.199191] [8159da7e] sock_aio_write+0x11e/0x130
> > > <4>[1197485.199196] [8163b83f] ? inet_recvmsg+0x4f/0x80
> > > <4>[1197485.199203] [811558ad] do_sync_readv_writev+0x6d/0xa0
> > > <4>[1197485.199209] [8115722b] do_readv_writev+0xfb/0x2f0
> > > <4>[1197485.199215] [8110fda5] ? __free_pages+0x35/0x40
> > > <4>[1197485.199220] [8110fe56] ? free_pages+0x46/0x50
> > > <4>[1197485.199226] [8112f9e2] ? SyS_mincore+0x152/0x690
> > > <4>[1197485.199231] [81157468] vfs_writev+0x48/0x60
> > > <4>[1197485.199236] [811575af] SyS_writev+0x5f/0xd0
> > > <4>[1197485.199243] [816cf942] system_call_fastpath+0x16/0x1b
> > > <4>[1197485.199247] Code: 65 4c 03 04 25 c8 cb 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 0f 84 84 00 00 00 48 85 c0 74 7f 49 63 44 24 20 49 8b 3c 24 <49> 8b 5c 05 00 48 8d 4a 01 4c 89 e8 65 48 0f c7 0f 0f 94 c0 3c
> > > <1>[1197485.199290] RIP [811476da] kmem_cache_alloc+0x5a/0x130
> > > <4>[1197485.199296] RSP 883171211868
> > > <4>[1197485.199299] CR2: 0001
> > > <4>[1197485.199343] ---[ end trace 90fee06aa40b7304 ]---
> > > <1>[1197485.263911] BUG: unable to handle kernel paging request at 0001
> > > <1>[1197485.263923] IP: [811476da] kmem_cache_alloc+0x5a/0x130
> > > <4>[1197485.263932] PGD 3f43e5c067 PUD 0
> > > <4>[1197485.263937] Oops: [#5] SMP
> > > <4>[1197485.263941] Modules linked in: ntfs vfat msdos fat macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode sb_edac edac_core lpc_ich mfd_core ixgbe igb i2c_algo_bit mdio ptp pps_core
> > > <4>[1197485.263966] CPU: 0 PID: 233846 Comm: cache-worker Tainted: G D 3.10.15 #1
> > > <4>[1197485.263972] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 2.0a 03/07/2013
> > > <4>[1197485.263976] task: 883427f9dc00 ti: 8830d4312000 task.ti: 8830d4312000
> > > <4>[1197485.263982] RIP: 0010:[811476da] [811476da] kmem_cache_alloc+0x5a/0x130
> > > <4>[1197485.263990] RSP: 0018:881fffc038c8 EFLAGS: 00010286
> > > <4>[1197485.263994] RAX: RBX: 81c8c740 RCX:
> > > <4>[1197485.263999] RDX: 29273024 RSI: 0020 RDI: 00015680
> > > <4>[1197485.264004] RBP: 881fffc03908 R08: 881fffc15680 R09: 815bdd4b
> > > <4>[1197485.264009] R10: 881c65d21800 R11: R12: 881fff803800
> > > <4>[1197485.264014] R13: 0001 R14: R15:
> > > <4>[1197485.264019] FS: 7f8d855eb700() GS:881fffc0() knlGS:
> > > <4>[1197485.264024] CS: 0010 DS: ES: CR0: 80050033
> > > <4>[1197485.264028] CR2: 0001 CR3: 00308f258000 CR4: 000407f0
> > > <4>[1197485.264032] DR0: DR1: DR2:
> > > <4>[1197485.264037] DR3: DR6: 0ff0 DR7: 0400
> > > <4>[1197485.264041] Stack:
> > > <4>[1197485.264044] 881fffc03928 0020815d0d95 881fffc03938 81c8c740
> > > <4>[1197485.264050] 881fce21 0001
> > > <4>[1197485.264056] 881fffc03958 815bdd4b 881fffc039a8
> > > <4>[1197485.264063] Call Trace:
> > > <4>[1197485.264066] <IRQ>
> > > <4>[1197485.264069] [815bdd4b] dst_alloc+0x5b/0x190
> > > <4>[1197485.264080] [8160068c] rt_dst_alloc+0x4c/0x50
> > > <4>[1197485.264085] [81602a30] __ip_route_output_key+0x270/0x880
> > > <4>[1197485.264092] [8107ee7e] ? try_to_wake_up+0x23e/0x2b0
> > > <4>[1197485.264097] [81603067] ip_route_output_flow+0x27/0x60
> > > <4>[1197485.264102] [8160ab8a] ip_queue_xmit+0x36a/0x390
> > > <4>[1197485.264108] [816207c5] tcp_transmit_skb+0x485/0x890
> > > <4>[1197485.264113] [81621aa1] tcp_send_ack+0xf1/0x130
> > > <4>[1197485.264118] [81618d7e] __tcp_ack_snd_check+0x5e/0xa0
> > > <4>[1197485.264123] [8161f2c2] tcp_rcv_state_process+0x8b2/0xb20
> > > <4>[1197485.264128] [81627e61] tcp_v4_do_rcv+0x191/0x4f0
> > > <4>[1197485.264133] [8162984c] tcp_v4_rcv+0x5fc/0x750
> > > <4>[1197485.264138] [81604c80] ? ip_rcv+0x350/0x350
> > > <4>[1197485.264143] [815e45cd] ? nf_hook_slow+0x7d/0x160
> > > <4>[1197485.264147
Re: ipv4_dst_destroy panic regression after 3.10.15
On Tue, 21 Jan 2014, Alexei Starovoitov wrote: > On Tue, Jan 21, 2014 at 5:39 PM, dormando wrote: > > > > > On Fri, Jan 17, 2014 at 11:16 PM, dormando wrote: > > > >> On Fri, 2014-01-17 at 22:49 -0800, Eric Dumazet wrote: > > > >> > On Fri, 2014-01-17 at 17:25 -0800, dormando wrote: > > > >> > > Hi, > > > >> > > > > > >> > > Upgraded a few kernels to the latest 3.10 stable tree while > > > >> > > tracking down > > > >> > > a rare kernel panic, seems to have introduced a much more frequent > > > >> > > kernel > > > >> > > panic. Takes anywhere from 4 hours to 2 days to trigger: > > > >> > > > > > >> > > <4>[196727.311203] general protection fault: [#1] SMP > > > >> > > <4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP > > > >> > > macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich > > > >> > > microcode > ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm > tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp > pps_core mdio > > > >> > > <4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted > > > >> > > 3.10.26 #1 > > > >> > > <4>[196727.311344] Hardware name: Supermicro > > > >> > > X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013 > > > >> > > <4>[196727.311364] task: 885e6f069700 ti: 885e6f072000 > > > >> > > task.ti: 885e6f072000 > > > >> > > <4>[196727.311377] RIP: 0010:[] > > > >> > > [] ipv4_dst_destroy+0x4f/0x80 > > > >> > > <4>[196727.311399] RSP: 0018:885effd23a70 EFLAGS: 00010282 > > > >> > > <4>[196727.311409] RAX: dead00200200 RBX: 8854c398ecc0 > > > >> > > RCX: 0040 > > > >> > > <4>[196727.311423] RDX: dead00100100 RSI: dead00100100 > > > >> > > RDI: dead00200200 > > > >> > > <4>[196727.311437] RBP: 885effd23a80 R08: 815fd9e0 > > > >> > > R09: 885d5a590800 > > > >> > > <4>[196727.311451] R10: R11: > > > >> > > R12: > > > >> > > <4>[196727.311464] R13: 81c8c280 R14: > > > >> > > R15: 880e85ee16ce > > > >> > > <4>[196727.311510] FS: () > > > >> > > GS:885effd2() 
knlGS: > > > >> > > <4>[196727.311554] CS: 0010 DS: ES: CR0: > > > >> > > 80050033 > > > >> > > <4>[196727.311581] CR2: 7a46751eb000 CR3: 005e65688000 > > > >> > > CR4: 000407e0 > > > >> > > <4>[196727.311625] DR0: DR1: > > > >> > > DR2: > > > >> > > <4>[196727.311669] DR3: DR6: 0ff0 > > > >> > > DR7: 0400 > > > >> > > <4>[196727.311713] Stack: > > > >> > > <4>[196727.311733] 8854c398ecc0 8854c398ecc0 > > > >> > > 885effd23ab0 815b7f42 > > > >> > > <4>[196727.311784] 88be6595bc00 8854c398ecc0 > > > >> > > 8854c398ecc0 > > > >> > > <4>[196727.311834] 885effd23ad0 815b86c6 > > > >> > > 885d5a590800 8816827821c0 > > > >> > > <4>[196727.311885] Call Trace: > > > >> > > <4>[196727.311907] > > > >> > > <4>[196727.311912] [] dst_destroy+0x32/0xe0 > > > >> > > <4>[196727.311959] [] dst_release+0x56/0x80 > > > >> > > <4>[196727.311986] [] tcp_v4_do_rcv+0x2a5/0x4a0 > > > >> > > <4>[196727.312013] [] tcp_v4_rcv+0x7da/0x820 > > > >> > > <4>[196727.312041] [] ? > > > >> > > ip_rcv_finish+0x360/0x360 > > > >> > > <4>[196727.312070] [] ? nf_hook_slow+0x7d/0x150 > > > >> > > <4>[196727.312097] [] ? &g
Re: ipv4_dst_destroy panic regression after 3.10.15
> On Fri, Jan 17, 2014 at 11:16 PM, dormando wrote: > >> On Fri, 2014-01-17 at 22:49 -0800, Eric Dumazet wrote: > >> > On Fri, 2014-01-17 at 17:25 -0800, dormando wrote: > >> > > Hi, > >> > > > >> > > Upgraded a few kernels to the latest 3.10 stable tree while tracking > >> > > down > >> > > a rare kernel panic, seems to have introduced a much more frequent > >> > > kernel > >> > > panic. Takes anywhere from 4 hours to 2 days to trigger: > >> > > > >> > > <4>[196727.311203] general protection fault: [#1] SMP > >> > > <4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan > >> > > bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode > >> > > ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis > >> > > tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit > >> > > ixgbe ptp pps_core mdio > >> > > <4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 > >> > > #1 > >> > > <4>[196727.311344] Hardware name: Supermicro > >> > > X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013 > >> > > <4>[196727.311364] task: 885e6f069700 ti: 885e6f072000 > >> > > task.ti: 885e6f072000 > >> > > <4>[196727.311377] RIP: 0010:[] > >> > > [] ipv4_dst_destroy+0x4f/0x80 > >> > > <4>[196727.311399] RSP: 0018:885effd23a70 EFLAGS: 00010282 > >> > > <4>[196727.311409] RAX: dead00200200 RBX: 8854c398ecc0 RCX: > >> > > 0040 > >> > > <4>[196727.311423] RDX: dead00100100 RSI: dead00100100 RDI: > >> > > dead00200200 > >> > > <4>[196727.311437] RBP: 885effd23a80 R08: 815fd9e0 R09: > >> > > 885d5a590800 > >> > > <4>[196727.311451] R10: R11: R12: > >> > > > >> > > <4>[196727.311464] R13: 81c8c280 R14: R15: > >> > > 880e85ee16ce > >> > > <4>[196727.311510] FS: () > >> > > GS:885effd2() knlGS: > >> > > <4>[196727.311554] CS: 0010 DS: ES: CR0: 80050033 > >> > > <4>[196727.311581] CR2: 7a46751eb000 CR3: 005e65688000 CR4: > >> > > 000407e0 > >> > > <4>[196727.311625] DR0: DR1: DR2: > >> > > > >> > > <4>[196727.311669] 
DR3: DR6: 0ff0 DR7: > >> > > 0400 > >> > > <4>[196727.311713] Stack: > >> > > <4>[196727.311733] 8854c398ecc0 8854c398ecc0 885effd23ab0 > >> > > 815b7f42 > >> > > <4>[196727.311784] 88be6595bc00 8854c398ecc0 > >> > > 8854c398ecc0 > >> > > <4>[196727.311834] 885effd23ad0 815b86c6 885d5a590800 > >> > > 8816827821c0 > >> > > <4>[196727.311885] Call Trace: > >> > > <4>[196727.311907] > >> > > <4>[196727.311912] [] dst_destroy+0x32/0xe0 > >> > > <4>[196727.311959] [] dst_release+0x56/0x80 > >> > > <4>[196727.311986] [] tcp_v4_do_rcv+0x2a5/0x4a0 > >> > > <4>[196727.312013] [] tcp_v4_rcv+0x7da/0x820 > >> > > <4>[196727.312041] [] ? ip_rcv_finish+0x360/0x360 > >> > > <4>[196727.312070] [] ? nf_hook_slow+0x7d/0x150 > >> > > <4>[196727.312097] [] ? ip_rcv_finish+0x360/0x360 > >> > > <4>[196727.312125] [] > >> > > ip_local_deliver_finish+0xb2/0x230 > >> > > <4>[196727.312154] [] ip_local_deliver+0x4a/0x90 > >> > > <4>[196727.312183] [] ip_rcv_finish+0x119/0x360 > >> > > <4>[196727.312212] [] ip_rcv+0x22b/0x340 > >> > > <4>[196727.312242] [] ? > >> > > macvlan_broadcast+0x160/0x160 [macvlan] > >> > > <4>[196727.312275] [] > >> > > __netif_receive_skb_core+0x512/0x640 > >> > > <4>[196727.312308] [] ? kmem_cache_alloc+0x13b/0x150 > >> > > <4>[196727.312338]
Re: ipv4_dst_destroy panic regression after 3.10.15
On Fri, Jan 17, 2014 at 11:16 PM, dormando dorma...@rydia.net wrote: On Fri, 2014-01-17 at 22:49 -0800, Eric Dumazet wrote: On Fri, 2014-01-17 at 17:25 -0800, dormando wrote: Hi, Upgraded a few kernels to the latest 3.10 stable tree while tracking down a rare kernel panic, seems to have introduced a much more frequent kernel panic. Takes anywhere from 4 hours to 2 days to trigger: 4[196727.311203] general protection fault: [#1] SMP 4[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio 4[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1 4[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013 4[196727.311364] task: 885e6f069700 ti: 885e6f072000 task.ti: 885e6f072000 4[196727.311377] RIP: 0010:[815f8c7f] [815f8c7f] ipv4_dst_destroy+0x4f/0x80 4[196727.311399] RSP: 0018:885effd23a70 EFLAGS: 00010282 4[196727.311409] RAX: dead00200200 RBX: 8854c398ecc0 RCX: 0040 4[196727.311423] RDX: dead00100100 RSI: dead00100100 RDI: dead00200200 4[196727.311437] RBP: 885effd23a80 R08: 815fd9e0 R09: 885d5a590800 4[196727.311451] R10: R11: R12: 4[196727.311464] R13: 81c8c280 R14: R15: 880e85ee16ce 4[196727.311510] FS: () GS:885effd2() knlGS: 4[196727.311554] CS: 0010 DS: ES: CR0: 80050033 4[196727.311581] CR2: 7a46751eb000 CR3: 005e65688000 CR4: 000407e0 4[196727.311625] DR0: DR1: DR2: 4[196727.311669] DR3: DR6: 0ff0 DR7: 0400 4[196727.311713] Stack: 4[196727.311733] 8854c398ecc0 8854c398ecc0 885effd23ab0 815b7f42 4[196727.311784] 88be6595bc00 8854c398ecc0 8854c398ecc0 4[196727.311834] 885effd23ad0 815b86c6 885d5a590800 8816827821c0 4[196727.311885] Call Trace: 4[196727.311907] IRQ 4[196727.311912] [815b7f42] dst_destroy+0x32/0xe0 4[196727.311959] [815b86c6] dst_release+0x56/0x80 
4[196727.311986] [81620bd5] tcp_v4_do_rcv+0x2a5/0x4a0 4[196727.312013] [81622b5a] tcp_v4_rcv+0x7da/0x820 4[196727.312041] [815fd9e0] ? ip_rcv_finish+0x360/0x360 4[196727.312070] [815de02d] ? nf_hook_slow+0x7d/0x150 4[196727.312097] [815fd9e0] ? ip_rcv_finish+0x360/0x360 4[196727.312125] [815fda92] ip_local_deliver_finish+0xb2/0x230 4[196727.312154] [815fdd9a] ip_local_deliver+0x4a/0x90 4[196727.312183] [815fd799] ip_rcv_finish+0x119/0x360 4[196727.312212] [815fe00b] ip_rcv+0x22b/0x340 4[196727.312242] [a0339680] ? macvlan_broadcast+0x160/0x160 [macvlan] 4[196727.312275] [815b0c62] __netif_receive_skb_core+0x512/0x640 4[196727.312308] [811427fb] ? kmem_cache_alloc+0x13b/0x150 4[196727.312338] [815b0db1] __netif_receive_skb+0x21/0x70 4[196727.312368] [815b0fa1] netif_receive_skb+0x31/0xa0 4[196727.312397] [815b1ae8] napi_gro_receive+0xe8/0x140 4[196727.312433] [a00274f1] ixgbe_poll+0x551/0x11f0 [ixgbe] 4[196727.312463] [815fe00b] ? ip_rcv+0x22b/0x340 4[196727.312491] [815b1691] net_rx_action+0x111/0x210 4[196727.312521] [815b0db1] ? __netif_receive_skb+0x21/0x70 4[196727.312552] [810519d0] __do_softirq+0xd0/0x270 4[196727.312583] [816cef3c] call_softirq+0x1c/0x30 4[196727.312613] [81004205] do_softirq+0x55/0x90 4[196727.312640] [81051c85] irq_exit+0x55/0x60 4[196727.312668] [816cf5c3] do_IRQ+0x63/0xe0 4[196727.312696] [816c5aaa] common_interrupt+0x6a/0x6a 4[196727.312722] EOI 4[196727.312727] [8100a150] ? default_idle+0x20/0xe0 4[196727.312775] [8100a8ff] arch_cpu_idle+0xf/0x20 4[196727.312803] [8108d330] cpu_startup_entry+0xc0/0x270 4[196727.312833] [816b276e] start_secondary+0x1f9/0x200 4[196727.312860] Code: 4a 9f e9 81 e8 13 cb 0c 00 48 8b 93 b0 00 00 00 48 bf 00 02 20 00 00 00 ad de 48 8b 83 b8 00 00 00 48 be 00 01 10 00 00 00 ad de 48 89 42 08 48 89 10 48 89 bb b8 00 00 00 48 c7 c7 4a 9f e9 81 1[196727.313071] RIP [815f8c7f] ipv4_dst_destroy+0x4f
Re: ipv4_dst_destroy panic regression after 3.10.15
On Tue, 21 Jan 2014, Alexei Starovoitov wrote: On Tue, Jan 21, 2014 at 5:39 PM, dormando dorma...@rydia.net wrote: On Fri, Jan 17, 2014 at 11:16 PM, dormando dorma...@rydia.net wrote: On Fri, 2014-01-17 at 22:49 -0800, Eric Dumazet wrote: On Fri, 2014-01-17 at 17:25 -0800, dormando wrote: Hi, Upgraded a few kernels to the latest 3.10 stable tree while tracking down a rare kernel panic, seems to have introduced a much more frequent kernel panic. Takes anywhere from 4 hours to 2 days to trigger: 4[196727.311203] general protection fault: [#1] SMP 4[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio 4[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1 4[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013 4[196727.311364] task: 885e6f069700 ti: 885e6f072000 task.ti: 885e6f072000 4[196727.311377] RIP: 0010:[815f8c7f] [815f8c7f] ipv4_dst_destroy+0x4f/0x80 4[196727.311399] RSP: 0018:885effd23a70 EFLAGS: 00010282 4[196727.311409] RAX: dead00200200 RBX: 8854c398ecc0 RCX: 0040 4[196727.311423] RDX: dead00100100 RSI: dead00100100 RDI: dead00200200 4[196727.311437] RBP: 885effd23a80 R08: 815fd9e0 R09: 885d5a590800 4[196727.311451] R10: R11: R12: 4[196727.311464] R13: 81c8c280 R14: R15: 880e85ee16ce 4[196727.311510] FS: () GS:885effd2() knlGS: 4[196727.311554] CS: 0010 DS: ES: CR0: 80050033 4[196727.311581] CR2: 7a46751eb000 CR3: 005e65688000 CR4: 000407e0 4[196727.311625] DR0: DR1: DR2: 4[196727.311669] DR3: DR6: 0ff0 DR7: 0400 4[196727.311713] Stack: 4[196727.311733] 8854c398ecc0 8854c398ecc0 885effd23ab0 815b7f42 4[196727.311784] 88be6595bc00 8854c398ecc0 8854c398ecc0 4[196727.311834] 885effd23ad0 815b86c6 885d5a590800 8816827821c0 4[196727.311885] Call Trace: 
4[196727.311907] IRQ 4[196727.311912] [815b7f42] dst_destroy+0x32/0xe0 4[196727.311959] [815b86c6] dst_release+0x56/0x80 4[196727.311986] [81620bd5] tcp_v4_do_rcv+0x2a5/0x4a0 4[196727.312013] [81622b5a] tcp_v4_rcv+0x7da/0x820 4[196727.312041] [815fd9e0] ? ip_rcv_finish+0x360/0x360 4[196727.312070] [815de02d] ? nf_hook_slow+0x7d/0x150 4[196727.312097] [815fd9e0] ? ip_rcv_finish+0x360/0x360 4[196727.312125] [815fda92] ip_local_deliver_finish+0xb2/0x230 4[196727.312154] [815fdd9a] ip_local_deliver+0x4a/0x90 4[196727.312183] [815fd799] ip_rcv_finish+0x119/0x360 4[196727.312212] [815fe00b] ip_rcv+0x22b/0x340 4[196727.312242] [a0339680] ? macvlan_broadcast+0x160/0x160 [macvlan] 4[196727.312275] [815b0c62] __netif_receive_skb_core+0x512/0x640 4[196727.312308] [811427fb] ? kmem_cache_alloc+0x13b/0x150 4[196727.312338] [815b0db1] __netif_receive_skb+0x21/0x70 4[196727.312368] [815b0fa1] netif_receive_skb+0x31/0xa0 4[196727.312397] [815b1ae8] napi_gro_receive+0xe8/0x140 4[196727.312433] [a00274f1] ixgbe_poll+0x551/0x11f0 [ixgbe] 4[196727.312463] [815fe00b] ? ip_rcv+0x22b/0x340 4[196727.312491] [815b1691] net_rx_action+0x111/0x210 4[196727.312521] [815b0db1] ? __netif_receive_skb+0x21/0x70 4[196727.312552] [810519d0] __do_softirq+0xd0/0x270 4[196727.312583] [816cef3c] call_softirq+0x1c/0x30 4[196727.312613] [81004205] do_softirq+0x55/0x90 4[196727.312640] [81051c85] irq_exit+0x55/0x60 4[196727.312668] [816cf5c3] do_IRQ+0x63/0xe0 4[196727.312696] [816c5aaa] common_interrupt+0x6a/0x6a 4[196727.312722] EOI 4[196727.312727] [8100a150] ? default_idle+0x20/0xe0 4[196727.312775] [8100a8ff] arch_cpu_idle+0xf/0x20 4[196727.312803] [8108d330] cpu_startup_entry+0xc0/0x270 4[196727.312833
Re: ipv4_dst_destroy panic regression after 3.10.15
> On Fri, 2014-01-17 at 17:25 -0800, dormando wrote: > > Hi, > > > > Upgraded a few kernels to the latest 3.10 stable tree while tracking down > > a rare kernel panic, seems to have introduced a much more frequent kernel > > panic. Takes anywhere from 4 hours to 2 days to trigger: > > > > <4>[196727.311203] general protection fault: [#1] SMP > > <4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge > > coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog > > ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios > > ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio > > <4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1 > > <4>[196727.311344] Hardware name: Supermicro > > X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013 > > <4>[196727.311364] task: 885e6f069700 ti: 885e6f072000 task.ti: > > 885e6f072000 > > <4>[196727.311377] RIP: 0010:[] [] > > ipv4_dst_destroy+0x4f/0x80 > > <4>[196727.311399] RSP: 0018:885effd23a70 EFLAGS: 00010282 > > <4>[196727.311409] RAX: dead00200200 RBX: 8854c398ecc0 RCX: > > 0040 > > <4>[196727.311423] RDX: dead00100100 RSI: dead00100100 RDI: > > dead00200200 > > <4>[196727.311437] RBP: 885effd23a80 R08: 815fd9e0 R09: > > 885d5a590800 > > <4>[196727.311451] R10: R11: R12: > > > > <4>[196727.311464] R13: 81c8c280 R14: R15: > > 880e85ee16ce > > <4>[196727.311510] FS: () GS:885effd2() > > knlGS: > > <4>[196727.311554] CS: 0010 DS: ES: CR0: 80050033 > > <4>[196727.311581] CR2: 7a46751eb000 CR3: 005e65688000 CR4: > > 000407e0 > > <4>[196727.311625] DR0: DR1: DR2: > > > > <4>[196727.311669] DR3: DR6: 0ff0 DR7: > > 0400 > > <4>[196727.311713] Stack: > > <4>[196727.311733] 8854c398ecc0 8854c398ecc0 885effd23ab0 > > 815b7f42 > > <4>[196727.311784] 88be6595bc00 8854c398ecc0 > > 8854c398ecc0 > > <4>[196727.311834] 885effd23ad0 815b86c6 885d5a590800 > > 8816827821c0 > > <4>[196727.311885] Call Trace: > > <4>[196727.311907] > > 
<4>[196727.311912] [] dst_destroy+0x32/0xe0 > > <4>[196727.311959] [] dst_release+0x56/0x80 > > <4>[196727.311986] [] tcp_v4_do_rcv+0x2a5/0x4a0 > > <4>[196727.312013] [] tcp_v4_rcv+0x7da/0x820 > > <4>[196727.312041] [] ? ip_rcv_finish+0x360/0x360 > > <4>[196727.312070] [] ? nf_hook_slow+0x7d/0x150 > > <4>[196727.312097] [] ? ip_rcv_finish+0x360/0x360 > > <4>[196727.312125] [] ip_local_deliver_finish+0xb2/0x230 > > <4>[196727.312154] [] ip_local_deliver+0x4a/0x90 > > <4>[196727.312183] [] ip_rcv_finish+0x119/0x360 > > <4>[196727.312212] [] ip_rcv+0x22b/0x340 > > <4>[196727.312242] [] ? macvlan_broadcast+0x160/0x160 > > [macvlan] > > <4>[196727.312275] [] > > __netif_receive_skb_core+0x512/0x640 > > <4>[196727.312308] [] ? kmem_cache_alloc+0x13b/0x150 > > <4>[196727.312338] [] __netif_receive_skb+0x21/0x70 > > <4>[196727.312368] [] netif_receive_skb+0x31/0xa0 > > <4>[196727.312397] [] napi_gro_receive+0xe8/0x140 > > <4>[196727.312433] [] ixgbe_poll+0x551/0x11f0 [ixgbe] > > <4>[196727.312463] [] ? ip_rcv+0x22b/0x340 > > <4>[196727.312491] [] net_rx_action+0x111/0x210 > > <4>[196727.312521] [] ? __netif_receive_skb+0x21/0x70 > > <4>[196727.312552] [] __do_softirq+0xd0/0x270 > > <4>[196727.312583] [] call_softirq+0x1c/0x30 > > <4>[196727.312613] [] do_softirq+0x55/0x90 > > <4>[196727.312640] [] irq_exit+0x55/0x60 > > <4>[196727.312668] [] do_IRQ+0x63/0xe0 > > <4>[196727.312696] [] common_interrupt+0x6a/0x6a > > <4>[196727.312722] > > <4>[196727.312727] [] ? default_idle+0x20/0xe0 > > <4>[196727.312775] [] arch_cpu_idle+0xf/0x20 > > <4>[196727.312803] [] cpu_startup_entry+0xc0/0x270 > > <4>[196727.312833] [] start_secondary+0x1f9/0x200 > > <4>[196727.312860] Code: 4a 9f e9 81 e8 13 cb 0c
Re: ipv4_dst_destroy panic regression after 3.10.15
On Fri, 2014-01-17 at 17:25 -0800, dormando wrote: Hi, Upgraded a few kernels to the latest 3.10 stable tree while tracking down a rare kernel panic, seems to have introduced a much more frequent kernel panic. Takes anywhere from 4 hours to 2 days to trigger: 4[196727.311203] general protection fault: [#1] SMP 4[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio 4[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1 4[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013 4[196727.311364] task: 885e6f069700 ti: 885e6f072000 task.ti: 885e6f072000 4[196727.311377] RIP: 0010:[815f8c7f] [815f8c7f] ipv4_dst_destroy+0x4f/0x80 4[196727.311399] RSP: 0018:885effd23a70 EFLAGS: 00010282 4[196727.311409] RAX: dead00200200 RBX: 8854c398ecc0 RCX: 0040 4[196727.311423] RDX: dead00100100 RSI: dead00100100 RDI: dead00200200 4[196727.311437] RBP: 885effd23a80 R08: 815fd9e0 R09: 885d5a590800 4[196727.311451] R10: R11: R12: 4[196727.311464] R13: 81c8c280 R14: R15: 880e85ee16ce 4[196727.311510] FS: () GS:885effd2() knlGS: 4[196727.311554] CS: 0010 DS: ES: CR0: 80050033 4[196727.311581] CR2: 7a46751eb000 CR3: 005e65688000 CR4: 000407e0 4[196727.311625] DR0: DR1: DR2: 4[196727.311669] DR3: DR6: 0ff0 DR7: 0400 4[196727.311713] Stack: 4[196727.311733] 8854c398ecc0 8854c398ecc0 885effd23ab0 815b7f42 4[196727.311784] 88be6595bc00 8854c398ecc0 8854c398ecc0 4[196727.311834] 885effd23ad0 815b86c6 885d5a590800 8816827821c0 4[196727.311885] Call Trace: 4[196727.311907] IRQ 4[196727.311912] [815b7f42] dst_destroy+0x32/0xe0 4[196727.311959] [815b86c6] dst_release+0x56/0x80 4[196727.311986] [81620bd5] tcp_v4_do_rcv+0x2a5/0x4a0 4[196727.312013] [81622b5a] tcp_v4_rcv+0x7da/0x820 4[196727.312041] 
[815fd9e0] ? ip_rcv_finish+0x360/0x360 4[196727.312070] [815de02d] ? nf_hook_slow+0x7d/0x150 4[196727.312097] [815fd9e0] ? ip_rcv_finish+0x360/0x360 4[196727.312125] [815fda92] ip_local_deliver_finish+0xb2/0x230 4[196727.312154] [815fdd9a] ip_local_deliver+0x4a/0x90 4[196727.312183] [815fd799] ip_rcv_finish+0x119/0x360 4[196727.312212] [815fe00b] ip_rcv+0x22b/0x340 4[196727.312242] [a0339680] ? macvlan_broadcast+0x160/0x160 [macvlan] 4[196727.312275] [815b0c62] __netif_receive_skb_core+0x512/0x640 4[196727.312308] [811427fb] ? kmem_cache_alloc+0x13b/0x150 4[196727.312338] [815b0db1] __netif_receive_skb+0x21/0x70 4[196727.312368] [815b0fa1] netif_receive_skb+0x31/0xa0 4[196727.312397] [815b1ae8] napi_gro_receive+0xe8/0x140 4[196727.312433] [a00274f1] ixgbe_poll+0x551/0x11f0 [ixgbe] 4[196727.312463] [815fe00b] ? ip_rcv+0x22b/0x340 4[196727.312491] [815b1691] net_rx_action+0x111/0x210 4[196727.312521] [815b0db1] ? __netif_receive_skb+0x21/0x70 4[196727.312552] [810519d0] __do_softirq+0xd0/0x270 4[196727.312583] [816cef3c] call_softirq+0x1c/0x30 4[196727.312613] [81004205] do_softirq+0x55/0x90 4[196727.312640] [81051c85] irq_exit+0x55/0x60 4[196727.312668] [816cf5c3] do_IRQ+0x63/0xe0 4[196727.312696] [816c5aaa] common_interrupt+0x6a/0x6a 4[196727.312722] EOI 4[196727.312727] [8100a150] ? default_idle+0x20/0xe0 4[196727.312775] [8100a8ff] arch_cpu_idle+0xf/0x20 4[196727.312803] [8108d330] cpu_startup_entry+0xc0/0x270 4[196727.312833] [816b276e] start_secondary+0x1f9/0x200 4[196727.312860] Code: 4a 9f e9 81 e8 13 cb 0c 00 48 8b 93 b0 00 00 00 48 bf 00 02 20 00 00 00 ad de 48 8b 83 b8 00 00 00 48 be 00 01 10 00 00 00 ad de 48 89 42 08 48 89 10 48 89 bb b8 00 00 00 48 c7 c7 4a 9f e9 81 1[196727.313071] RIP [815f8c7f] ipv4_dst_destroy+0x4f/0x80 4[196727.313100] RSP 885effd23a70 4[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]--- 0[196727.380908] Kernel panic - not syncing: Fatal exception in interrupt ... bisecting it's going to be a pain... 
I tried eyeballing the diffs and am trying a revert or two. We've hit it in .25, .26 so far. I have .27
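Since reverts are being tried by hand, the hunt between stable points can be automated with git bisect run; on the real tree the start line would be something like `git bisect start v3.10.25 v3.10.15` (first-known-bad, last-known-good, per the versions mentioned above, with each step gated on days of uptime). A self-contained demo of the workflow in a throwaway repo where "commit 4" plants the bug:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email bisect@example.invalid
git config user.name bisect
# five commits; pretend the regression lands in commit 4
for i in 1 2 3 4 5; do
    echo "$i" > state
    git add state
    git commit -qm "commit $i"
done
git bisect start HEAD HEAD~4 >/dev/null            # bad tip, good base
# exit 0 = good, nonzero = bad; "bad" whenever state >= 4
git bisect run sh -c 'test "$(cat state)" -lt 4' >/dev/null
git log -1 --format=%s refs/bisect/bad             # prints: commit 4
```

With a crash that takes days to trigger, the `run` script would instead wait out a soak period and report good on survival, which is slow but at least unattended.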
Re: kmem_cache_alloc panic in 3.10+
> On Sat, 2014-01-18 at 00:44 -0800, dormando wrote: > > Hello again! > > > > We've had a rare crash that's existed between 3.10.0 and 3.10.15 at least > > (trying newer stables now, but I can't tell if it was fixed, and it takes > > weeks to reproduce). > > > > Unfortunately I can only get 8k back from pstore. The panic looks a bit > > longer than that is caught in the log, but the bottom part is almost > > always this same trace as this one: > > > > Panic#6 Part1 > > <4>[1197485.199166] [] tcp_push+0x6c/0x90 > > <4>[1197485.199171] [] tcp_sendmsg+0x109/0xd40 > > <4>[1197485.199179] [] ? put_page+0x35/0x40 > > <4>[1197485.199185] [] inet_sendmsg+0x45/0xb0 > > <4>[1197485.199191] [] sock_aio_write+0x11e/0x130 > > <4>[1197485.199196] [] ? inet_recvmsg+0x4f/0x80 > > <4>[1197485.199203] [] do_sync_readv_writev+0x6d/0xa0 > > <4>[1197485.199209] [] do_readv_writev+0xfb/0x2f0 > > <4>[1197485.199215] [] ? __free_pages+0x35/0x40 > > <4>[1197485.199220] [] ? free_pages+0x46/0x50 > > <4>[1197485.199226] [] ? 
SyS_mincore+0x152/0x690 > > <4>[1197485.199231] [] vfs_writev+0x48/0x60 > > <4>[1197485.199236] [] SyS_writev+0x5f/0xd0 > > <4>[1197485.199243] [] system_call_fastpath+0x16/0x1b > > <4>[1197485.199247] Code: 65 4c 03 04 25 c8 cb 00 00 49 8b 50 08 4d 8b 28 > > 49 8b 40 10 4d 85 ed 0f 84 84 00 00 00 48 85 c0 74 7f 49 63 44 24 20 49 8b > > 3c 24 <49> 8b 5c 05 00 48 8d 4a 01 4c 89 e8 65 48 0f c7 0f 0f 94 c0 3c > > <1>[1197485.199290] RIP [] kmem_cache_alloc+0x5a/0x130 > > <4>[1197485.199296] RSP > > <4>[1197485.199299] CR2: 0001 > > <4>[1197485.199343] ---[ end trace 90fee06aa40b7304 ]--- > > <1>[1197485.263911] BUG: unable to handle kernel paging request at > > 0001 > > <1>[1197485.263923] IP: [] kmem_cache_alloc+0x5a/0x130 > > <4>[1197485.263932] PGD 3f43e5c067 PUD 0 > > <4>[1197485.263937] Oops: [#5] SMP > > <4>[1197485.263941] Modules linked in: ntfs vfat msdos fat macvlan bridge > > coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode sb_edac > > edac_core lpc_ich mfd_core ixgbe igb i2c_algo_bit mdio ptp pps_core > > <4>[1197485.263966] CPU: 0 PID: 233846 Comm: cache-worker Tainted: G D > > 3.10.15 #1 > > <4>[1197485.263972] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 2.0a > > 03/07/2013 > > <4>[1197485.263976] task: 883427f9dc00 ti: 8830d4312000 task.ti: > > 8830d4312000 > > <4>[1197485.263982] RIP: 0010:[] [] > > kmem_cache_alloc+0x5a/0x130 > > <4>[1197485.263990] RSP: 0018:881fffc038c8 EFLAGS: 00010286 > > <4>[1197485.263994] RAX: RBX: 81c8c740 RCX: > > > > <4>[1197485.263999] RDX: 29273024 RSI: 0020 RDI: > > 00015680 > > <4>[1197485.264004] RBP: 881fffc03908 R08: 881fffc15680 R09: > > 815bdd4b > > <4>[1197485.264009] R10: 881c65d21800 R11: R12: > > 881fff803800 > > <4>[1197485.264014] R13: 0001 R14: R15: > > > > <4>[1197485.264019] FS: 7f8d855eb700() GS:881fffc0() > > knlGS: > > <4>[1197485.264024] CS: 0010 DS: ES: CR0: 80050033 > > <4>[1197485.264028] CR2: 0001 CR3: 00308f258000 CR4: > > 000407f0 > > <4>[1197485.264032] DR0: DR1: DR2: > > > > 
<4>[1197485.264037] DR3: DR6: 0ff0 DR7: > > 0400 > > <4>[1197485.264041] Stack: > > <4>[1197485.264044] 881fffc03928 0020815d0d95 881fffc03938 > > 81c8c740 > > <4>[1197485.264050] 881fce21 0001 > > > > <4>[1197485.264056] 881fffc03958 815bdd4b 881fffc039a8 > > > > <4>[1197485.264063] Call Trace: > > <4>[1197485.264066] > > <4>[1197485.264069] [] dst_alloc+0x5b/0x190 > > <4>[1197485.264080] [] rt_dst_alloc+0x4c/0x50 > > <4>[1197485.264085] [] __ip_route_output_key+0x270/0x880 > > <4>[1197485.264092] [] ? try_to_wake_up+0x23e/0x2b0 > > <4>[1197485.264097] [] ip_route_output_flow+0x27/0x60 > > <4>[1197485.264102] [] ip_queue_x
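The message above notes only 8k comes back from pstore, so the top of the panic is lost. If the box can spare a chunk of RAM, ramoops is not limited to the firmware store's record size; a hypothetical setup with a larger record size (the addresses below are placeholders, not from this thread, and must match a region actually reserved on the machine):

```shell
# Placeholder values: mem_address/mem_size must point at RAM reserved
# (e.g. via memmap=) and surviving a warm reboot on this hardware.
modprobe ramoops mem_address=0x8000000 mem_size=0x100000 \
                 record_size=0x20000 dump_oops=1
# or equivalently on the kernel command line:
#   ramoops.mem_address=0x8000000 ramoops.mem_size=0x100000 \
#   ramoops.record_size=0x20000
```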
kmem_cache_alloc panic in 3.10+
] __netif_receive_skb_core+0x477/0x600 4[1197485.264175] [815b8ba7] __netif_receive_skb+0x27/0x70 4[1197485.264180] [815b8ce4] process_backlog+0xf4/0x1e0 4[1197485.264184] [815b94e5] net_rx_action+0xf5/0x250 4[1197485.264190] [81053b7f] __do_softirq+0xef/0x270 4[1197485.264196] [816d0b7c] call_softirq+0x1c/0x30 4[1197485.264199] EOI 4[1197485.264201] [81004495] do_softirq+0x55/0x90 4[1197485.264209] [81053a84] local_bh_enable+0x94/0xa0 4[1197485.264215] [8165567a] ipt_do_table+0x22a/0x680 4[1197485.264221] [815d39c1] ? skb_clone_tx_timestamp+0x31/0x110 4[1197485.264231] [a00ae840] ? ixgbe_xmit_frame_ring+0x4c0/0xd40 [ixgbe] 4[1197485.264239] [a00af103] ? ixgbe_xmit_frame+0x43/0x90 [ixgbe] 4[1197485.264245] [81657a23] iptable_raw_hook+0x33/0x70 4[1197485.264252] [815e43a7] nf_iterate+0x87/0xb0 4[1197485.264256] [81607e20] ? ip_options_echo+0x420/0x420 4[1197485.264261] [815e45cd] nf_hook_slow+0x7d/0x160 4[1197485.264266] [81607e20] ? ip_options_echo+0x420/0x420 4[1197485.264270] [8160a430] __ip_local_out+0xa0/0xb0 4[1197485.264275] [8160a456] ip_local_out+0x16/0x30 4[1197485.264280] [8160a97a] ip_queue_xmit+0x15a/0x390 4[1197485.264286] [81625e73] ? tcp_v4_md5_lookup+0x13/0x20 4[1197485.264290] [816207c5] tcp_transmit_skb+0x485/0x890 4[1197485.264295] [81622e08] tcp_write_xmit+0x1b8/0xa50 4[1197485.264300] [815a7e28] ? __alloc_skb+0xa8/0x1f0 4[1197485.264304] [816236d0] tcp_push_one+0x30/0x40 4[1197485.264309] [81616b84] tcp_sendmsg+0xbe4/0xd40 4[1197485.264315] [81114b65] ? put_page+0x35/0x40 4[1197485.264321] [8163bf75] inet_sendmsg+0x45/0xb0 4[1197485.264326] [8159da7e] sock_aio_write+0x11e/0x130 4[1197485.264331] [8163b83f] ? inet_recvmsg+0x4f/0x80 4[1197485.264337] [811558ad] do_sync_readv_writev+0x6d/0xa0 4[1197485.264343] [8115722b] do_readv_writev+0xfb/0x2f0 4[1197485.264347] [8110fda5] ? __free_pages+0x35/0x40 4[1197485.264352] [8110fe56] ? free_pages+0x46/0x50 4[1197485.264357] [8112f9e2] ? 
SyS_mincore+0x152/0x690 4[1197485.264363] [81157468] vfs_writev+0x48/0x60 4[1197485.264367] [811575af] SyS_writev+0x5f/0xd0 4[1197485.264373] [816cf942] system_call_fastpath+0x16/0x1b 4[1197485.264377] Code: 65 4c 03 04 25 c8 cb 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 0f 84 84 00 00 00 48 85 c0 74 7f 49 63 44 24 20 49 8b 3c 24 49 8b 5c 05 00 48 8d 4a 01 4c 89 e8 65 48 0f c7 0f 0f 94 c0 3c 1[1197485.264417] RIP [811476da] kmem_cache_alloc+0x5a/0x130 4[1197485.264424] RSP 881fffc038c8 4[1197485.264427] CR2: 0001 4[1197485.264431] ---[ end trace 90fee06aa40b7305 ]--- 0[1197485.325141] Kernel panic - not syncing: Fatal exception in interrupt ... way down in the tcp code. Any help would be appreciated :) I'll do what I can to help, but iterating this particular crash is very hard due to the amount of time it takes to reproduce. Since we have a large number of machines they're always crashing here and there, but once they do it's not going to happen again for a while. Thanks! -Dormando -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kmem_cache_alloc panic in 3.10+
On Sat, 2014-01-18 at 00:44 -0800, dormando wrote: Hello again! We've had a rare crash that's existed between 3.10.0 and 3.10.15 at least (trying newer stables now, but I can't tell if it was fixed, and it takes weeks to reproduce). Unfortunately I can only get 8k back from pstore. The panic looks a bit longer than that is caught in the log, but the bottom part is almost always this same trace as this one: Panic#6 Part1 4[1197485.199166] [81611e8c] tcp_push+0x6c/0x90 4[1197485.199171] [816160a9] tcp_sendmsg+0x109/0xd40 4[1197485.199179] [81114b65] ? put_page+0x35/0x40 4[1197485.199185] [8163bf75] inet_sendmsg+0x45/0xb0 4[1197485.199191] [8159da7e] sock_aio_write+0x11e/0x130 4[1197485.199196] [8163b83f] ? inet_recvmsg+0x4f/0x80 4[1197485.199203] [811558ad] do_sync_readv_writev+0x6d/0xa0 4[1197485.199209] [8115722b] do_readv_writev+0xfb/0x2f0 4[1197485.199215] [8110fda5] ? __free_pages+0x35/0x40 4[1197485.199220] [8110fe56] ? free_pages+0x46/0x50 4[1197485.199226] [8112f9e2] ? SyS_mincore+0x152/0x690 4[1197485.199231] [81157468] vfs_writev+0x48/0x60 4[1197485.199236] [811575af] SyS_writev+0x5f/0xd0 4[1197485.199243] [816cf942] system_call_fastpath+0x16/0x1b 4[1197485.199247] Code: 65 4c 03 04 25 c8 cb 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 0f 84 84 00 00 00 48 85 c0 74 7f 49 63 44 24 20 49 8b 3c 24 49 8b 5c 05 00 48 8d 4a 01 4c 89 e8 65 48 0f c7 0f 0f 94 c0 3c 1[1197485.199290] RIP [811476da] kmem_cache_alloc+0x5a/0x130 4[1197485.199296] RSP 883171211868 4[1197485.199299] CR2: 0001 4[1197485.199343] ---[ end trace 90fee06aa40b7304 ]--- 1[1197485.263911] BUG: unable to handle kernel paging request at 0001 1[1197485.263923] IP: [811476da] kmem_cache_alloc+0x5a/0x130 4[1197485.263932] PGD 3f43e5c067 PUD 0 4[1197485.263937] Oops: [#5] SMP 4[1197485.263941] Modules linked in: ntfs vfat msdos fat macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode sb_edac edac_core lpc_ich mfd_core ixgbe igb i2c_algo_bit mdio ptp pps_core 4[1197485.263966] 
CPU: 0 PID: 233846 Comm: cache-worker Tainted: G D 3.10.15 #1 4[1197485.263972] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 2.0a 03/07/2013 4[1197485.263976] task: 883427f9dc00 ti: 8830d4312000 task.ti: 8830d4312000 4[1197485.263982] RIP: 0010:[811476da] [811476da] kmem_cache_alloc+0x5a/0x130 4[1197485.263990] RSP: 0018:881fffc038c8 EFLAGS: 00010286 4[1197485.263994] RAX: RBX: 81c8c740 RCX: 4[1197485.263999] RDX: 29273024 RSI: 0020 RDI: 00015680 4[1197485.264004] RBP: 881fffc03908 R08: 881fffc15680 R09: 815bdd4b 4[1197485.264009] R10: 881c65d21800 R11: R12: 881fff803800 4[1197485.264014] R13: 0001 R14: R15: 4[1197485.264019] FS: 7f8d855eb700() GS:881fffc0() knlGS: 4[1197485.264024] CS: 0010 DS: ES: CR0: 80050033 4[1197485.264028] CR2: 0001 CR3: 00308f258000 CR4: 000407f0 4[1197485.264032] DR0: DR1: DR2: 4[1197485.264037] DR3: DR6: 0ff0 DR7: 0400 4[1197485.264041] Stack: 4[1197485.264044] 881fffc03928 0020815d0d95 881fffc03938 81c8c740 4[1197485.264050] 881fce21 0001 4[1197485.264056] 881fffc03958 815bdd4b 881fffc039a8 4[1197485.264063] Call Trace: 4[1197485.264066] IRQ 4[1197485.264069] [815bdd4b] dst_alloc+0x5b/0x190 4[1197485.264080] [8160068c] rt_dst_alloc+0x4c/0x50 4[1197485.264085] [81602a30] __ip_route_output_key+0x270/0x880 4[1197485.264092] [8107ee7e] ? try_to_wake_up+0x23e/0x2b0 4[1197485.264097] [81603067] ip_route_output_flow+0x27/0x60 4[1197485.264102] [8160ab8a] ip_queue_xmit+0x36a/0x390 4[1197485.264108] [816207c5] tcp_transmit_skb+0x485/0x890 4[1197485.264113] [81621aa1] tcp_send_ack+0xf1/0x130 4[1197485.264118] [81618d7e] __tcp_ack_snd_check+0x5e/0xa0 4[1197485.264123] [8161f2c2] tcp_rcv_state_process+0x8b2/0xb20 4[1197485.264128] [81627e61] tcp_v4_do_rcv+0x191/0x4f0 4[1197485.264133] [8162984c] tcp_v4_rcv+0x5fc/0x750 4[1197485.264138] [81604c80] ? ip_rcv+0x350/0x350 4[1197485.264143] [815e45cd] ? nf_hook_slow+0x7d/0x160 4[1197485.264147] [81604c80] ? ip_rcv+0x350/0x350 4[1197485.264152] [81604d4e] ip_local_deliver_finish+0xce/0x250
Re: ipv4_dst_destroy panic regression after 3.10.15
On Fri, Jan 17, 2014 at 11:16 PM, dormando dorma...@rydia.net wrote: On Fri, 2014-01-17 at 22:49 -0800, Eric Dumazet wrote: On Fri, 2014-01-17 at 17:25 -0800, dormando wrote: Hi, Upgraded a few kernels to the latest 3.10 stable tree while tracking down a rare kernel panic, seems to have introduced a much more frequent kernel panic. Takes anywhere from 4 hours to 2 days to trigger: 4[196727.311203] general protection fault: [#1] SMP 4[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio 4[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1 4[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013 4[196727.311364] task: 885e6f069700 ti: 885e6f072000 task.ti: 885e6f072000 4[196727.311377] RIP: 0010:[815f8c7f] [815f8c7f] ipv4_dst_destroy+0x4f/0x80 4[196727.311399] RSP: 0018:885effd23a70 EFLAGS: 00010282 4[196727.311409] RAX: dead00200200 RBX: 8854c398ecc0 RCX: 0040 4[196727.311423] RDX: dead00100100 RSI: dead00100100 RDI: dead00200200 4[196727.311437] RBP: 885effd23a80 R08: 815fd9e0 R09: 885d5a590800 4[196727.311451] R10: R11: R12: 4[196727.311464] R13: 81c8c280 R14: R15: 880e85ee16ce 4[196727.311510] FS: () GS:885effd2() knlGS: 4[196727.311554] CS: 0010 DS: ES: CR0: 80050033 4[196727.311581] CR2: 7a46751eb000 CR3: 005e65688000 CR4: 000407e0 4[196727.311625] DR0: DR1: DR2: 4[196727.311669] DR3: DR6: 0ff0 DR7: 0400 4[196727.311713] Stack: 4[196727.311733] 8854c398ecc0 8854c398ecc0 885effd23ab0 815b7f42 4[196727.311784] 88be6595bc00 8854c398ecc0 8854c398ecc0 4[196727.311834] 885effd23ad0 815b86c6 885d5a590800 8816827821c0 4[196727.311885] Call Trace: 4[196727.311907] IRQ 4[196727.311912] [815b7f42] dst_destroy+0x32/0xe0 4[196727.311959] [815b86c6] dst_release+0x56/0x80 
4[196727.311986] [81620bd5] tcp_v4_do_rcv+0x2a5/0x4a0 4[196727.312013] [81622b5a] tcp_v4_rcv+0x7da/0x820 4[196727.312041] [815fd9e0] ? ip_rcv_finish+0x360/0x360 4[196727.312070] [815de02d] ? nf_hook_slow+0x7d/0x150 4[196727.312097] [815fd9e0] ? ip_rcv_finish+0x360/0x360 4[196727.312125] [815fda92] ip_local_deliver_finish+0xb2/0x230 4[196727.312154] [815fdd9a] ip_local_deliver+0x4a/0x90 4[196727.312183] [815fd799] ip_rcv_finish+0x119/0x360 4[196727.312212] [815fe00b] ip_rcv+0x22b/0x340 4[196727.312242] [a0339680] ? macvlan_broadcast+0x160/0x160 [macvlan] 4[196727.312275] [815b0c62] __netif_receive_skb_core+0x512/0x640 4[196727.312308] [811427fb] ? kmem_cache_alloc+0x13b/0x150 4[196727.312338] [815b0db1] __netif_receive_skb+0x21/0x70 4[196727.312368] [815b0fa1] netif_receive_skb+0x31/0xa0 4[196727.312397] [815b1ae8] napi_gro_receive+0xe8/0x140 4[196727.312433] [a00274f1] ixgbe_poll+0x551/0x11f0 [ixgbe] 4[196727.312463] [815fe00b] ? ip_rcv+0x22b/0x340 4[196727.312491] [815b1691] net_rx_action+0x111/0x210 4[196727.312521] [815b0db1] ? __netif_receive_skb+0x21/0x70 4[196727.312552] [810519d0] __do_softirq+0xd0/0x270 4[196727.312583] [816cef3c] call_softirq+0x1c/0x30 4[196727.312613] [81004205] do_softirq+0x55/0x90 4[196727.312640] [81051c85] irq_exit+0x55/0x60 4[196727.312668] [816cf5c3] do_IRQ+0x63/0xe0 4[196727.312696] [816c5aaa] common_interrupt+0x6a/0x6a 4[196727.312722] EOI 4[196727.312727] [8100a150] ? default_idle+0x20/0xe0 4[196727.312775] [8100a8ff] arch_cpu_idle+0xf/0x20 4[196727.312803] [8108d330] cpu_startup_entry+0xc0/0x270 4[196727.312833] [816b276e] start_secondary+0x1f9/0x200 4[196727.312860] Code: 4a 9f e9 81 e8 13 cb 0c 00 48 8b 93 b0 00 00 00 48 bf 00 02 20 00 00 00 ad de 48 8b 83 b8 00 00 00 48 be 00 01 10 00 00 00 ad de 48 89 42 08 48 89 10 48 89 bb b8 00 00 00 48 c7 c7 4a 9f e9 81 1[196727.313071] RIP [815f8c7f] ipv4_dst_destroy+0x4f
ipv4_dst_destroy panic regression after 3.10.15
Hi, Upgraded a few kernels to the latest 3.10 stable tree while tracking down a rare kernel panic, seems to have introduced a much more frequent kernel panic. Takes anywhere from 4 hours to 2 days to trigger: <4>[196727.311203] general protection fault: [#1] SMP <4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio <4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1 <4>[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013 <4>[196727.311364] task: 885e6f069700 ti: 885e6f072000 task.ti: 885e6f072000 <4>[196727.311377] RIP: 0010:[] [] ipv4_dst_destroy+0x4f/0x80 <4>[196727.311399] RSP: 0018:885effd23a70 EFLAGS: 00010282 <4>[196727.311409] RAX: dead00200200 RBX: 8854c398ecc0 RCX: 0040 <4>[196727.311423] RDX: dead00100100 RSI: dead00100100 RDI: dead00200200 <4>[196727.311437] RBP: 885effd23a80 R08: 815fd9e0 R09: 885d5a590800 <4>[196727.311451] R10: R11: R12: <4>[196727.311464] R13: 81c8c280 R14: R15: 880e85ee16ce <4>[196727.311510] FS: () GS:885effd2() knlGS: <4>[196727.311554] CS: 0010 DS: ES: CR0: 80050033 <4>[196727.311581] CR2: 7a46751eb000 CR3: 005e65688000 CR4: 000407e0 <4>[196727.311625] DR0: DR1: DR2: <4>[196727.311669] DR3: DR6: 0ff0 DR7: 0400 <4>[196727.311713] Stack: <4>[196727.311733] 8854c398ecc0 8854c398ecc0 885effd23ab0 815b7f42 <4>[196727.311784] 88be6595bc00 8854c398ecc0 8854c398ecc0 <4>[196727.311834] 885effd23ad0 815b86c6 885d5a590800 8816827821c0 <4>[196727.311885] Call Trace: <4>[196727.311907] <4>[196727.311912] [] dst_destroy+0x32/0xe0 <4>[196727.311959] [] dst_release+0x56/0x80 <4>[196727.311986] [] tcp_v4_do_rcv+0x2a5/0x4a0 <4>[196727.312013] [] tcp_v4_rcv+0x7da/0x820 <4>[196727.312041] [] ? 
ip_rcv_finish+0x360/0x360 <4>[196727.312070] [] ? nf_hook_slow+0x7d/0x150 <4>[196727.312097] [] ? ip_rcv_finish+0x360/0x360 <4>[196727.312125] [] ip_local_deliver_finish+0xb2/0x230 <4>[196727.312154] [] ip_local_deliver+0x4a/0x90 <4>[196727.312183] [] ip_rcv_finish+0x119/0x360 <4>[196727.312212] [] ip_rcv+0x22b/0x340 <4>[196727.312242] [] ? macvlan_broadcast+0x160/0x160 [macvlan] <4>[196727.312275] [] __netif_receive_skb_core+0x512/0x640 <4>[196727.312308] [] ? kmem_cache_alloc+0x13b/0x150 <4>[196727.312338] [] __netif_receive_skb+0x21/0x70 <4>[196727.312368] [] netif_receive_skb+0x31/0xa0 <4>[196727.312397] [] napi_gro_receive+0xe8/0x140 <4>[196727.312433] [] ixgbe_poll+0x551/0x11f0 [ixgbe] <4>[196727.312463] [] ? ip_rcv+0x22b/0x340 <4>[196727.312491] [] net_rx_action+0x111/0x210 <4>[196727.312521] [] ? __netif_receive_skb+0x21/0x70 <4>[196727.312552] [] __do_softirq+0xd0/0x270 <4>[196727.312583] [] call_softirq+0x1c/0x30 <4>[196727.312613] [] do_softirq+0x55/0x90 <4>[196727.312640] [] irq_exit+0x55/0x60 <4>[196727.312668] [] do_IRQ+0x63/0xe0 <4>[196727.312696] [] common_interrupt+0x6a/0x6a <4>[196727.312722] <4>[196727.312727] [] ? default_idle+0x20/0xe0 <4>[196727.312775] [] arch_cpu_idle+0xf/0x20 <4>[196727.312803] [] cpu_startup_entry+0xc0/0x270 <4>[196727.312833] [] start_secondary+0x1f9/0x200 <4>[196727.312860] Code: 4a 9f e9 81 e8 13 cb 0c 00 48 8b 93 b0 00 00 00 48 bf 00 02 20 00 00 00 ad de 48 8b 83 b8 00 00 00 48 be 00 01 10 00 00 00 ad de <48> 89 42 08 48 89 10 48 89 bb b8 00 00 00 48 c7 c7 4a 9f e9 81 <1>[196727.313071] RIP [] ipv4_dst_destroy+0x4f/0x80 <4>[196727.313100] RSP <4>[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]--- <0>[196727.380908] Kernel panic - not syncing: Fatal exception in interrupt ... bisecting it's going to be a pain... I tried eyeballing the diffs and am trying a revert or two. We've hit it in .25, .26 so far. I have .27 running but not sure if it crashed, so the change exists between .15 and .25. 
Thanks, -Dormando -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: ipv4_dst_destroy panic regression after 3.10.15
On Fri, 2014-01-17 at 22:49 -0800, Eric Dumazet wrote: On Fri, 2014-01-17 at 17:25 -0800, dormando wrote: Hi, Upgraded a few kernels to the latest 3.10 stable tree while tracking down a rare kernel panic, seems to have introduced a much more frequent kernel panic. Takes anywhere from 4 hours to 2 days to trigger: 4[196727.311203] general protection fault: [#1] SMP 4[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio 4[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1 4[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013 4[196727.311364] task: 885e6f069700 ti: 885e6f072000 task.ti: 885e6f072000 4[196727.311377] RIP: 0010:[815f8c7f] [815f8c7f] ipv4_dst_destroy+0x4f/0x80 4[196727.311399] RSP: 0018:885effd23a70 EFLAGS: 00010282 4[196727.311409] RAX: dead00200200 RBX: 8854c398ecc0 RCX: 0040 4[196727.311423] RDX: dead00100100 RSI: dead00100100 RDI: dead00200200 4[196727.311437] RBP: 885effd23a80 R08: 815fd9e0 R09: 885d5a590800 4[196727.311451] R10: R11: R12: 4[196727.311464] R13: 81c8c280 R14: R15: 880e85ee16ce 4[196727.311510] FS: () GS:885effd2() knlGS: 4[196727.311554] CS: 0010 DS: ES: CR0: 80050033 4[196727.311581] CR2: 7a46751eb000 CR3: 005e65688000 CR4: 000407e0 4[196727.311625] DR0: DR1: DR2: 4[196727.311669] DR3: DR6: 0ff0 DR7: 0400 4[196727.311713] Stack: 4[196727.311733] 8854c398ecc0 8854c398ecc0 885effd23ab0 815b7f42 4[196727.311784] 88be6595bc00 8854c398ecc0 8854c398ecc0 4[196727.311834] 885effd23ad0 815b86c6 885d5a590800 8816827821c0 4[196727.311885] Call Trace: 4[196727.311907] IRQ 4[196727.311912] [815b7f42] dst_destroy+0x32/0xe0 4[196727.311959] [815b86c6] dst_release+0x56/0x80 4[196727.311986] [81620bd5] tcp_v4_do_rcv+0x2a5/0x4a0 
4[196727.312013] [81622b5a] tcp_v4_rcv+0x7da/0x820 4[196727.312041] [815fd9e0] ? ip_rcv_finish+0x360/0x360 4[196727.312070] [815de02d] ? nf_hook_slow+0x7d/0x150 4[196727.312097] [815fd9e0] ? ip_rcv_finish+0x360/0x360 4[196727.312125] [815fda92] ip_local_deliver_finish+0xb2/0x230 4[196727.312154] [815fdd9a] ip_local_deliver+0x4a/0x90 4[196727.312183] [815fd799] ip_rcv_finish+0x119/0x360 4[196727.312212] [815fe00b] ip_rcv+0x22b/0x340 4[196727.312242] [a0339680] ? macvlan_broadcast+0x160/0x160 [macvlan] 4[196727.312275] [815b0c62] __netif_receive_skb_core+0x512/0x640 4[196727.312308] [811427fb] ? kmem_cache_alloc+0x13b/0x150 4[196727.312338] [815b0db1] __netif_receive_skb+0x21/0x70 4[196727.312368] [815b0fa1] netif_receive_skb+0x31/0xa0 4[196727.312397] [815b1ae8] napi_gro_receive+0xe8/0x140 4[196727.312433] [a00274f1] ixgbe_poll+0x551/0x11f0 [ixgbe] 4[196727.312463] [815fe00b] ? ip_rcv+0x22b/0x340 4[196727.312491] [815b1691] net_rx_action+0x111/0x210 4[196727.312521] [815b0db1] ? __netif_receive_skb+0x21/0x70 4[196727.312552] [810519d0] __do_softirq+0xd0/0x270 4[196727.312583] [816cef3c] call_softirq+0x1c/0x30 4[196727.312613] [81004205] do_softirq+0x55/0x90 4[196727.312640] [81051c85] irq_exit+0x55/0x60 4[196727.312668] [816cf5c3] do_IRQ+0x63/0xe0 4[196727.312696] [816c5aaa] common_interrupt+0x6a/0x6a 4[196727.312722] EOI 4[196727.312727] [8100a150] ? default_idle+0x20/0xe0 4[196727.312775] [8100a8ff] arch_cpu_idle+0xf/0x20 4[196727.312803] [8108d330] cpu_startup_entry+0xc0/0x270 4[196727.312833] [816b276e] start_secondary+0x1f9/0x200 4[196727.312860] Code: 4a 9f e9 81 e8 13 cb 0c 00 48 8b 93 b0 00 00 00 48 bf 00 02 20 00 00 00 ad de 48 8b 83 b8 00 00 00 48 be 00 01 10 00 00 00 ad de 48 89 42 08 48 89 10 48 89 bb b8 00 00 00 48 c7 c7 4a 9f e9 81 1[196727.313071] RIP [815f8c7f] ipv4_dst_destroy+0x4f/0x80 4[196727.313100] RSP 885effd23a70 4[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]--- 0[196727.380908] Kernel panic - not syncing: Fatal exception in interrupt
Re: ipv4: warnings on sk_wmem_queued
> I noticed these warnings on stock 3.10.9 running stress tests on > cmogstored.git (git://bogomips.org/cmogstored.git) doing standard > HTTP server stuff between lo and tmpfs: > [...] > I was going to reboot into 3.10.10 before I looked at dmesg. These > warnings happened after ~8 hours of stress tests, and those stress tests > are still running. I had a kernel panic this morning on a production machine, also running 3.10.9. I only got a small part of the end of the trace, but it matches: > Aug 30 06:03:54 localhost kernel: [] > ip_queue_xmit+0x153/0x3c0 > Aug 30 06:03:54 localhost kernel: [] > tcp_transmit_skb+0x3c5/0x820 > Aug 30 06:03:54 localhost kernel: [] > tcp_write_xmit+0x191/0xaa0 > Aug 30 06:03:54 localhost kernel: [] ? > __kmalloc_reserve.isra.49+0x3c/0xa0 > Aug 30 06:03:54 localhost kernel: [] > __tcp_push_pending_frames+0x32/0xa0 > Aug 30 06:03:54 localhost kernel: [] tcp_send_fin+0x6f/0x190 > Aug 30 06:03:54 localhost kernel: [] tcp_close+0x378/0x410 > Aug 30 06:03:54 localhost kernel: [] inet_release+0x5a/0xa0 > Aug 30 06:03:54 localhost kernel: [] sock_release+0x28/0x90 > Aug 30 06:03:54 localhost kernel: [] sock_close+0x12/0x20 > Aug 30 06:03:54 localhost kernel: [] __fput+0xaf/0x240 > Aug 30 06:03:54 localhost kernel: [] fput+0xe/0x10 > Aug 30 06:03:54 localhost kernel: [] task_work_run+0xa7/0xe0 > Aug 30 06:03:54 localhost kernel: [] > do_notify_resume+0x9c/0xb0 > Aug 30 06:03:54 localhost kernel: [] int_signal+0x12/0x17 ... from there to here... Then: RIP [ kmem_cache_alloc+0x5a/0x130 RSP ---[ end trace 6ab931f3db28b31e ]--- Kernel panic - not syncing: Fatal exception in interrupt Machine was running for a few days before panic'ing. I don't see anything in 3.10.10 that would've affected this. Thanks! (also: hi Eric!) 
-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/10] Reduce system disruption due to kswapd V2
> On Tue, Apr 09, 2013 at 05:27:18PM +, Christoph Lameter wrote: > > One additional measure that may be useful is to make kswapd prefer one > > specific processor on a socket. Two benefits arise from that: > > > > 1. Better use of cpu caches and therefore higher speed, less > > serialization. > > > > Considering the volume of pages that kswapd can scan when it's active > I would expect that it trashes its cache anyway. The L1 cache would be > flushed after scanning struct pages for just a few MB of memory. > > > 2. Reduction of the disturbances to one processor. > > > > I've never checked it but I would have expected kswapd to stay on the > same processor for significant periods of time. Have you experienced > problems where kswapd bounces around on CPUs within a node causing > workload disruption? When kswapd shares the same CPU as our main process it causes a measurable drop in response time (graphs show tiny spikes at the same time memory is freed). Would be nice to be able to ensure it runs on a different core than our latency sensitive processes at least. We can pin processes to subsets of cores but I don't think there's a way to keep kswapd from waking up on any of them? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: BUG: IPv4: Attempt to release TCP socket in state 1
> On Sat, 2013-03-16 at 10:36 -0700, Eric Dumazet wrote: > > On Fri, 2013-03-15 at 00:19 +0100, Eric Dumazet wrote: > > > > > Thanks thats really useful, we might miss to increment socket refcount > > > in a timer setup. > > > > > > > Hmm, please add following debugging patch as well > > > > diff --git a/include/net/sock.h b/include/net/sock.h > > index 14f6e9d..fe7c8a6 100644 > > --- a/include/net/sock.h > > +++ b/include/net/sock.h > > @@ -530,7 +530,9 @@ static inline void sock_hold(struct sock *sk) > > */ > > static inline void __sock_put(struct sock *sk) > > { > > - atomic_dec(&sk->sk_refcnt); > > + int newref = atomic_dec_return(&sk->sk_refcnt); > > + > > + BUG_ON(newref <= 0); > > } > > > > static inline bool sk_del_node_init(struct sock *sk) > > diff --git a/net/ipv4/inet_connection_sock.c > > b/net/ipv4/inet_connection_sock.c > > index 786d97a..a445e15 100644 > > --- a/net/ipv4/inet_connection_sock.c > > +++ b/net/ipv4/inet_connection_sock.c > > @@ -739,7 +739,7 @@ void inet_csk_prepare_forced_close(struct sock *sk) > > { > > /* sk_clone_lock locked the socket and set refcnt to 2 */ > > bh_unlock_sock(sk); > > - sock_put(sk); > > + __sock_put(sk); > > > > /* The below has to be done to allow calling inet_csk_destroy_sock */ > > sock_set_flag(sk, SOCK_DEAD); > > @@ -835,13 +835,13 @@ void inet_csk_listen_stop(struct sock *sk) > > * tcp_v4_destroy_sock(). > > */ > > tcp_sk(child)->fastopen_rsk = NULL; > > - sock_put(sk); > > + __sock_put(sk); > > } > > inet_csk_destroy_sock(child); > > > > bh_unlock_sock(child); > > local_bh_enable(); > > - sock_put(child); > > + __sock_put(child); > > > > Please don't include the last line : this should stay as > > sock_put(child); Hope you don't mind a screenshot: http://www.dormando.me/p/3.8.2-trace-crash.jpg (I put the patches on 3.8.2). box is on another continent so screenshot via IPMI is what I get. If this isn't enough or isn't right I'll try harder to get the trace logged, I guess? Thanks! 
Re: BUG: IPv4: Attempt to release TCP socket in state 1
> On Sat, 2013-03-16 at 10:36 -0700, Eric Dumazet wrote: > > On Fri, 2013-03-15 at 00:19 +0100, Eric Dumazet wrote: > > > > > Thanks thats really useful, we might miss to increment socket refcount > > > in a timer setup. > > > > > > > Hmm, please add following debugging patch as well > > > > diff --git a/include/net/sock.h b/include/net/sock.h > > index 14f6e9d..fe7c8a6 100644 > > --- a/include/net/sock.h > > +++ b/include/net/sock.h > > @@ -530,7 +530,9 @@ static inline void sock_hold(struct sock *sk) > > */ > > static inline void __sock_put(struct sock *sk) > > { > > - atomic_dec(&sk->sk_refcnt); > > + int newref = atomic_dec_return(&sk->sk_refcnt); > > + > > + BUG_ON(newref <= 0); > > } > > > > static inline bool sk_del_node_init(struct sock *sk) > > diff --git a/net/ipv4/inet_connection_sock.c > > b/net/ipv4/inet_connection_sock.c > > index 786d97a..a445e15 100644 > > --- a/net/ipv4/inet_connection_sock.c > > +++ b/net/ipv4/inet_connection_sock.c > > @@ -739,7 +739,7 @@ void inet_csk_prepare_forced_close(struct sock *sk) > > { > > /* sk_clone_lock locked the socket and set refcnt to 2 */ > > bh_unlock_sock(sk); > > - sock_put(sk); > > + __sock_put(sk); > > > > /* The below has to be done to allow calling inet_csk_destroy_sock */ > > sock_set_flag(sk, SOCK_DEAD); > > @@ -835,13 +835,13 @@ void inet_csk_listen_stop(struct sock *sk) > > * tcp_v4_destroy_sock(). > > */ > > tcp_sk(child)->fastopen_rsk = NULL; > > - sock_put(sk); > > + __sock_put(sk); > > } > > inet_csk_destroy_sock(child); > > > > bh_unlock_sock(child); > > local_bh_enable(); > > - sock_put(child); > > + __sock_put(child); > > > > Please don't include the last line : this should stay as > > sock_put(child); > thanks! Will take at least 24 hours to get the trace. 
Re: BUG: IPv4: Attempt to release TCP socket in state 1
> On Thu, 2013-03-14 at 14:21 -0700, dormando wrote: > > > > > > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c > > > index 68f6a94..1d4d97e 100644 > > > --- a/net/ipv4/af_inet.c > > > +++ b/net/ipv4/af_inet.c > > > @@ -141,8 +141,9 @@ void inet_sock_destruct(struct sock *sk) > > > sk_mem_reclaim(sk); > > > > > > if (sk->sk_type == SOCK_STREAM && sk->sk_state != TCP_CLOSE) { > > > - pr_err("Attempt to release TCP socket in state %d %p\n", > > > -sk->sk_state, sk); > > > + pr_err("Attempt to release TCP socket family %d in state %d > > > %p\n", > > > +sk->sk_family, sk->sk_state, sk); > > > + WARN_ON_ONCE(1); > > > return; > > > } > > > if (!sock_flag(sk, SOCK_DEAD)) { > > > > [58377.436522] IPv4: Attempt to release TCP socket family 2 in state 1 > > 8813fbad9500 > > There is no stack information on the WARN_ON_ONCE(1) ? *sigh*. it's been a long month, sorry: [58377.436522] IPv4: Attempt to release TCP socket family 2 in state 1 8813fbad9500 [58377.436539] [ cut here ] [58377.436545] WARNING: at net/ipv4/af_inet.c:146 inet_sock_destruct+0x176/0x200() [58377.436546] Hardware name: X9DR3-F [58377.436547] Modules linked in: bridge coretemp ghash_clmulni_intel ipmi_watchdog ipmi_devintf gpio_ich microcode ixgbe sb_edac edac_core mei lpc_ich mfd_core mdio ipmi_si ipmi_msghandler iptable_nat nf_nat_ipv4 nf_nat isci libsas igb ptp pps_core [58377.436563] Pid: 0, comm: swapper/0 Not tainted 3.8.2 #3 [58377.436564] Call Trace: [58377.436566][] warn_slowpath_common+0x7f/0xc0 [58377.436572] [] warn_slowpath_null+0x1a/0x20 [58377.436574] [] inet_sock_destruct+0x176/0x200 [58377.436578] [] ? tcp_write_timer_handler+0x1b0/0x1b0 [58377.436581] [] __sk_free+0x1d/0x140 [58377.436583] [] ? tcp_write_timer_handler+0x1b0/0x1b0 [58377.436585] [] sk_free+0x25/0x30 [58377.436586] [] tcp_write_timer+0x49/0x70 [58377.436590] [] call_timer_fn+0x49/0x130 [58377.436593] [] ? scheduler_tick+0x15f/0x190 [58377.436596] [] run_timer_softirq+0x224/0x290 [58377.436598] [] ? 
update_process_times+0x76/0x90 [58377.436600] [] ? tcp_write_timer_handler+0x1b0/0x1b0 [58377.436602] [] ? ktime_get+0x54/0xe0 [58377.436604] [] __do_softirq+0xc7/0x230 [58377.436608] [] call_softirq+0x1c/0x30 [58377.436611] [] do_softirq+0x55/0x90 [58377.436613] [] irq_exit+0x85/0xa0 [58377.436616] [] smp_apic_timer_interrupt+0x6e/0x99 [58377.436618] [] apic_timer_interrupt+0x6a/0x70 [58377.436619][] ? __schedule+0x3ac/0x750 [58377.436625] [] ? mwait_idle+0xad/0x1f0 [58377.436627] [] cpu_idle+0xb3/0x100 [58377.436629] [] rest_init+0x72/0x80 [58377.436633] [] start_kernel+0x3ac/0x3b9 [58377.436635] [] ? repair_env_string+0x5b/0x5b [58377.436636] [] x86_64_start_reservations+0x131/0x136 [58377.436638] [] x86_64_start_kernel+0xed/0xf4 [58377.436639] ---[ end trace 9e57364162374433 ]--- ^ pretty sure that's the WARN_ON_ONCE(1) Then a short while later the usual: [58394.689801] [ cut here ] [58394.689817] WARNING: at net/sched/sch_generic.c:254 dev_watchdog+0x258/0x270() [58394.689820] Hardware name: X9DR3-F [58394.689836] NETDEV WATCHDOG: eth2 (ixgbe): transmit queue 14 timed out [58394.689837] Modules linked in: bridge coretemp ghash_clmulni_intel ipmi_watchdog ipmi_devintf gpio_ich microcode ixgbe sb_edac edac_core mei lpc_ich mfd_core mdio ipmi_si ipmi_msghandler iptable_nat nf_nat_ipv4 nf_nat isci libsas igb ptp pps_core [58394.689853] Pid: 0, comm: swapper/0 Tainted: GW 3.8.2 #3 [58394.689854] Call Trace: [58394.689856][] warn_slowpath_common+0x7f/0xc0 [58394.689863] [] warn_slowpath_fmt+0x46/0x50 [58394.689865] [] dev_watchdog+0x258/0x270 [58394.689868] [] ? __netdev_watchdog_up+0x80/0x80 [58394.689872] [] call_timer_fn+0x49/0x130 [58394.689875] [] ? scheduler_tick+0x15f/0x190 [58394.689877] [] run_timer_softirq+0x224/0x290 [58394.689880] [] ? update_process_times+0x76/0x90 [58394.689882] [] ? __netdev_watchdog_up+0x80/0x80 [58394.689884] [] ? 
ktime_get+0x54/0xe0 [58394.689886] [] __do_softirq+0xc7/0x230 [58394.689890] [] call_softirq+0x1c/0x30 [58394.689894] [] do_softirq+0x55/0x90 [58394.689895] [] irq_exit+0x85/0xa0 [58394.689898] [] smp_apic_timer_interrupt+0x6e/0x99 [58394.689900] [] apic_timer_interrupt+0x6a/0x70 [58394.689901][] ? __schedule+0x3ac/0x750 [58394.689907] [] ? mwait_idle+0xad/0x1f0 [58394.689909] [] cpu_idle+0xb3/0x100 [58394.689911] [] rest_init+0x72/0x80 [58394.689915] [] start_kernel+0x3ac/0x3b9 [58394.689917] [] ? repair_env_string+0x5b/0x5b [58394.689918] [] x86_64_start_reservations+0x131/0x136 [58394.689920] [] x86_64_start_kernel+0xed/0xf4 [58394.689922] ---[ end trace 9e57364162374434 ]--- [5839
Re: BUG: IPv4: Attempt to release TCP socket in state 1
> On Wed, 2013-03-06 at 16:41 -0800, dormando wrote: > > > Ok... bridge module is loaded but nothing seems to be using it. No > > bond/tunnels/anything enabled. I couldn't quickly figure out what was > > causing it to load. > > > > We removed the need for macvlan, started machines with a fresh boot, and > > they still crashed without it, after a few hours. > > > > Unfortunately I just saw a machine crash in the same way on 3.6.6 and > > 3.6.9. I'm working on getting a completely pristine 3.6.6 and 3.6.9 > > tested. Our patches are minor but there were a few, so I'm backing it all > > out just to be sure. > > > > Is there anything in particular which is most interesting? I can post lots > > and lots and lots of information. Sadly bridge/macvlan weren't part of the > > problem. .config, sysctls are easiest I guess? When this "hang" happens > > the machine is still up somewhat, but we lose access to it. Syslog is > > still writing entries to disk occasionally, so it's possible we could set > > something up to dump more information. > > > > It takes a day or two to cycle this, so it might take a while to get > > information and test crashes. > > Thanks ! > > Please add a stack trace, it might help : > > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c > index 68f6a94..1d4d97e 100644 > --- a/net/ipv4/af_inet.c > +++ b/net/ipv4/af_inet.c > @@ -141,8 +141,9 @@ void inet_sock_destruct(struct sock *sk) > sk_mem_reclaim(sk); > > if (sk->sk_type == SOCK_STREAM && sk->sk_state != TCP_CLOSE) { > - pr_err("Attempt to release TCP socket in state %d %p\n", > -sk->sk_state, sk); > + pr_err("Attempt to release TCP socket family %d in state %d > %p\n", > +sk->sk_family, sk->sk_state, sk); > + WARN_ON_ONCE(1); > return; > } > if (!sock_flag(sk, SOCK_DEAD)) { Ok. I have a pristine 3.6.6 up and testing now... It definitely looks like we've been having this crash for quite a while, but much more rarely. Recent changes in traffic have made it worse. I'll try your patch soon. 
It'll take a few days to reproduce. I'll be back (ho ho ho). Please ping with any ideas you folks might have in the meantime :( 
Re: BUG: IPv4: Attempt to release TCP socket in state 1
> On Mon, 2013-03-04 at 21:44 -0800, dormando wrote: > > > No 3rd party modules. There's a tiny patch for controlling initcwnd from > > userspace and another one for the extra_free_kbytes tunable that I brought > > up in another thread. We've had the initcwnd patch in for a long time > > without trouble. The extra_free_kbytes tunable isn't even being used yet, > > so all that's doing is adding a 0 somewhere. > > > > Only two iptables rules loaded: global NOTRACK rules for PREROUTING/OUTPUT > > in raw. > > > > Kernel's as close to pristine as I can make it. We had the 10g patch in > > but I've dropped it. > > -- > > Hmm, I spent time on this bug report but found nothing. > > Please post as much information as you can on your setup. > > I see you use macvlan, bridge, so maybe there is a configuration issue > (and a kernel bug of course) Ok... bridge module is loaded but nothing seems to be using it. No bond/tunnels/anything enabled. I couldn't quickly figure out what was causing it to load. We removed the need for macvlan, started machines with a fresh boot, and they still crashed without it, after a few hours. Unfortunately I just saw a machine crash in the same way on 3.6.6 and 3.6.9. I'm working on getting a completely pristine 3.6.6 and 3.6.9 tested. Our patches are minor but there were a few, so I'm backing it all out just to be sure. Is there anything in particular which is most interesting? I can post lots and lots and lots of information. Sadly bridge/macvlan weren't part of the problem. .config, sysctls are easiest I guess? When this "hang" happens the machine is still up somewhat, but we lose access to it. Syslog is still writing entries to disk occasionally, so it's possible we could set something up to dump more information. It takes a day or two to cycle this, so it might take a while to get information and test crashes. 
thanks, -Dormando 
Re: BUG: IPv4: Attempt to release TCP socket in state 1
On Mon, 4 Mar 2013, Eric Dumazet wrote: > On Tue, 2013-03-05 at 11:47 +0800, Cong Wang wrote: > > (Cc'ing the right netdev mailing list...) > > > > On 03/05/2013 08:01 AM, dormando wrote: > > > Hi! > > > > > > I have a (core lockup?) with 3.7.6+ and 3.8.2 which appears to be under > > > ixgbe. The machine appears to still be up but network stays in a severely > > > hobbled state. Either lagging or not responding to the network at all. > > > > > > On a new box the hang happens within 8-24 hours of giving it production > > > network traffic. On an older machine (6 cores instead of 8, etc) it can > > > run for a week or more before hanging. > > > > > > The hang from 3.7 might be slightly different than 3.8. They seem to be > > > mostly the same aside from 3.8 hanging in the GRO path. Don't see anything > > > obvious in 3.9-rc1 that would fix it, and haven't tried 3.9-rc1. > > > > > > I've not yet figured out how to reproduce outside of production (as > > > always, sigh). This doesn't seem to happen with 3.6.6, but we have > > > different and less frequent kernel panics there. > > > > > Dormando, do you use any kind of special setup, external modules, > or netfilter ? (iptables-save output would help) > > Is it a pristine kernel, or a modified one ? > (Sigh. sorry for the misfire, thanks for fixing cc). No 3rd party modules. There's a tiny patch for controlling initcwnd from userspace and another one for the extra_free_kbytes tunable that I brought up in another thread. We've had the initcwnd patch in for a long time without trouble. The extra_free_kbytes tunable isn't even being used yet, so all that's doing is adding a 0 somewhere. Only two iptables rules loaded: global NOTRACK rules for PREROUTING/OUTPUT in raw. Kernel's as close to pristine as I can make it. We had the 10g patch in but I've dropped it. 
BUG: IPv4: Attempt to release TCP socket in state 1
5] [] hrtimer_interrupt+0xf6/0x230 [37335.739871] [] smp_apic_timer_interrupt+0x69/0x99 [37335.739874] [] apic_timer_interrupt+0x6a/0x70 [37335.739878] [] ? __inet_lookup_established+0xcf/0x2d0 [37335.739880] [] ? inet_del_protocol+0x40/0x40 [37335.739884] [] tcp_v4_early_demux+0xac/0x170 [37335.739886] [] ip_rcv_finish+0x14d/0x360 [37335.739888] [] ip_rcv+0x226/0x310 [37335.739892] [] __netif_receive_skb+0x492/0x640 [37335.739895] [] netif_receive_skb+0x2d/0x90 [37335.739897] [] ? tcp4_gro_receive+0xb0/0x130 [37335.739899] [] napi_gro_complete+0x95/0xe0 [37335.739901] [] dev_gro_receive+0x2b6/0x3b0 [37335.739903] [] napi_gro_receive+0x5b/0x130 [37335.739911] [] ixgbe_poll+0x54a/0x1180 [ixgbe] [37335.739915] [] ? enqueue_task+0x6a/0x80 [37335.739917] [] net_rx_action+0xf5/0x260 [37335.739919] [] __do_softirq+0xc7/0x230 [37335.739922] [] call_softirq+0x1c/0x30 [37335.739927] [] do_softirq+0x55/0x90 [37335.739928] [] irq_exit+0x85/0xa0 [37335.739931] [] do_IRQ+0x66/0xe0 [37335.739937] [] common_interrupt+0x6a/0x6a [37335.739938][] ? __schedule+0x3ac/0x750 [37335.739943] [] ? mwait_idle+0xad/0x1f0 [37335.739945] [] cpu_idle+0xb3/0x100 [37335.739948] [] start_secondary+0x1d7/0x1de [37515.727179] INFO: rcu_sched self-detected stall on CPU { 24} (t=1005087 jiffies g=1985385 c=1985384 q=2087) [37515.727246] Pid: 0, comm: swapper/24 Tainted: GW3.8.2 #2 [37515.727249] Call Trace: [37515.727251][] rcu_check_callbacks+0x21e/0x7c0 [37515.727265] [] ? account_system_time+0xe8/0x1e0 [37515.727271] [] update_process_times+0x48/0x90 [37515.727275] [] tick_sched_timer+0x56/0x130 [37515.727279] [] __run_hrtimer+0x7d/0x1c0 [37515.727281] [] ? tick_setup_sched_timer+0x110/0x110 [37515.727283] [] hrtimer_interrupt+0xf6/0x230 [37515.727289] [] smp_apic_timer_interrupt+0x69/0x99 [37515.727292] [] apic_timer_interrupt+0x6a/0x70 [37515.727296] [] ? __inet_lookup_established+0xcb/0x2d0 [37515.727298] [] ? 
inet_del_protocol+0x40/0x40 [37515.727302] [] tcp_v4_early_demux+0xac/0x170 [37515.727304] [] ip_rcv_finish+0x14d/0x360 [37515.727306] [] ip_rcv+0x226/0x310 [37515.727310] [] __netif_receive_skb+0x492/0x640 [37515.727312] [] netif_receive_skb+0x2d/0x90 [37515.727315] [] ? tcp4_gro_receive+0xb0/0x130 [37515.727317] [] napi_gro_complete+0x95/0xe0 [37515.727319] [] dev_gro_receive+0x2b6/0x3b0 [37515.727322] [] napi_gro_receive+0x5b/0x130 [37515.727330] [] ixgbe_poll+0x54a/0x1180 [ixgbe] [37515.727334] [] ? enqueue_task+0x6a/0x80 [37515.727336] [] net_rx_action+0xf5/0x260 [37515.727338] [] __do_softirq+0xc7/0x230 [37515.727341] [] call_softirq+0x1c/0x30 [37515.727345] [] do_softirq+0x55/0x90 [37515.727346] [] irq_exit+0x85/0xa0 [37515.727349] [] do_IRQ+0x66/0xe0 [37515.727354] [] common_interrupt+0x6a/0x6a [37515.727355][] ? __schedule+0x3ac/0x750 [37515.727360] [] ? mwait_idle+0xad/0x1f0 [37515.727362] [] cpu_idle+0xb3/0x100 [37515.727365] [] start_secondary+0x1d7/0x1de ... then swapper just does this until someone reboots the box. Apologies for the ugly paste. Thanks, -Dormando 
Re: BUG: IPv4: Attempt to release TCP socket in state 1
On Mon, 4 Mar 2013, Eric Dumazet wrote:

> On Tue, 2013-03-05 at 11:47 +0800, Cong Wang wrote:
> > (Cc'ing the right netdev mailing list...)
> >
> > On 03/05/2013 08:01 AM, dormando wrote:
> > > Hi! I have a (core lockup?) with 3.7.6+ and 3.8.2 which appears to be
> > > under ixgbe. The machine appears to still be up but network stays in a
> > > severely hobbled state. Either lagging or not responding to the network
> > > at all. On a new box the hang happens within 8-24 hours of giving it
> > > production network traffic. On an older machine (6 cores instead of 8,
> > > etc) it can run for a week or more before hanging.
> > >
> > > The hang from 3.7 might be slightly different than 3.8. They seem to be
> > > mostly the same aside from 3.8 hanging in the GRO path. Don't see
> > > anything obvious in 3.9-rc1 that would fix it, and haven't tried
> > > 3.9-rc1. I've not yet figured out how to reproduce outside of
> > > production (as always, sigh). This doesn't seem to happen with 3.6.6,
> > > but we have different and less frequent kernel panics there.
>
> Dormando, do you use any kind of special setup, external modules, or
> netfilter ? (iptables-save output would help)
>
> Is it a pristine kernel, or a modified one ?

(Sigh. sorry for the misfire, thanks for fixing cc).

No 3rd party modules. There's a tiny patch for controlling initcwnd from userspace and another one for the extra_free_kbytes tunable that I brought up in another thread. We've had the initcwnd patch in for a long time without trouble. The extra_free_kbytes tunable isn't even being used yet, so all that's doing is adding a 0 somewhere.

Only two iptables rules loaded: global NOTRACK rules for PREROUTING/OUTPUT in raw. Kernel's as close to pristine as I can make it. We had the 10g patch in but I've dropped it.
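For reference, the "two iptables rules" described above would look something like the following in iptables-save form. This is a sketch only; the actual production ruleset wasn't posted, so the unconditional match (no interface or protocol filters) is an assumption:

```text
# Sketch of a global-NOTRACK ruleset (iptables-save format); match criteria assumed.
*raw
-A PREROUTING -j NOTRACK
-A OUTPUT -j NOTRACK
COMMIT
```

The effect is that conntrack is bypassed entirely for both inbound and locally generated packets, which rules out the connection-tracking tables as a source of the lockups.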
Re: [PATCH] add extra free kbytes tunable
> > The problem is that adding this tunable will constrain future VM > implementations. We will forever need to at least retain the > pseudo-file. We will also need to make some effort to retain its > behaviour. > > It would of course be better to fix things so you don't need to tweak > VM internals to get acceptable behaviour. I sympathize with this. It's presently all that keeps us afloat though. I'll whine about it again later if nothing else pans out. > You said: > > : We have a server workload wherein machines with 100G+ of "free" memory > : (used by page cache), scattered but frequent random io reads from 12+ > : SSD's, and 5gbps+ of internet traffic, will frequently hit direct reclaim > : in a few different ways. > : > : 1) It'll run into small amounts of reclaim randomly (a few hundred > : thousand). > : > : 2) A burst of reads or traffic can cause extra pressure, which kswapd > : occasionally responds to by freeing up 40g+ of the pagecache all at once > : (!) while pausing the system (Argh). > : > : 3) A blip in an upstream provider or failover from a peer causes the > : kernel to allocate massive amounts of memory for retransmission > : queues/etc, potentially along with buffered IO reads and (some, but not > : often a ton) of new allocations from an application. This paired with 2) > : can cause the box to stall for 15+ seconds. > > Can we prioritise these? 2) looks just awful - kswapd shouldn't just > go off and free 40G of pagecache. Do you know what's actually in that > pagecache? Large number of small files or small number of (very) large > files? We have a handful of huge files (6-12ish 200g+) that are mmap'ed and accessed via address. occasionally madvise (WILLNEED) applied to the address ranges before attempting to use them. There're a mix of other files but nothing significant. The mmap's are READONLY and writes are done via pwrite-ish functions. I could use some guidance on inspecting/tracing the problem. 
I've been trying to reproduce it in a lab, and with respect to issue 2) I've found:

- The amount of memory freed back up is either a percentage of total memory or a percentage of free memory. (a machine with 48G of ram will "only" free up an extra 4-7g)

- It's most likely to happen after a fresh boot, or if "3 > drop_caches" is applied with the application down. As it fills it seems to get itself into trouble, but becomes more stable after that. Unfortunately 1) and 3) still apply to a stable instance.

- Protecting the DMA32 zone with something like "1 1 32" into lowmem_reserve_ratio makes the mass-reclaiming less likely to happen.

- While watching "sar -B 1" I'll see kswapd wake up, and scan up to a few hundred thousand pages before finding anything it actually wants to reclaim (low vmeff). I've only been able to reproduce this from a clean start. It can take up to 3 seconds before kswapd starts actually reclaiming pages.

- So far as I can tell we're almost exclusively using 0 order allocations. THP is disabled.

There's not much dirty memory involved. It's not flushing out writes while reclaiming, it just kills off a massive amount of cached memory. We're not running the machines particularly hard... Often less than 30% CPU usage at peak.
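For anyone replaying this in a lab, the knobs mentioned above translate to roughly the following procfs writes (a sketch using the values quoted in this thread; these all need root, and the drop_caches step should be done with the application stopped):

```shell
# Drop the page cache from a "clean start" ("3 > drop_caches" above):
echo 3 > /proc/sys/vm/drop_caches

# Protect the DMA32 zone from mass reclaim ("1 1 32" into lowmem_reserve_ratio):
echo "1 1 32" > /proc/sys/vm/lowmem_reserve_ratio

# Watch kswapd scan vs. steal rates; a low %vmeff column means kswapd is
# scanning many pages without actually reclaiming them:
sar -B 1
```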
Re: extra free kbytes tunable
On Fri, 15 Feb 2013, Rik van Riel wrote: > On 02/15/2013 05:21 PM, Seiji Aguchi wrote: > > Rik, Satoru, > > > > Do you have any comments? > > IIRC at the time the patch was rejected as too inelegant. > > However, nobody else seems to have come up with a better plan, and > there are users in need of a fix for this problem. > > I would still like to see a fix for the problem merged upstream. I merged in the cleanups to your original patch, rebased it off of linus' master from a day or two ago and re-sent (not sure how to preserve authorship in that case? Apologies for goofing it). I'm willing to argue it, or investigate better options. I'm going to be stuck maintaining this patch since we can't really afford to have production hang, or waste 12g+ of RAM per box. > > > -Original Message- > > > From: linux-kernel-ow...@vger.kernel.org > > > [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of dormando > > > Sent: Monday, February 11, 2013 9:01 PM > > > To: Rik van Riel > > > Cc: Randy Dunlap; Satoru Moriya; linux-kernel@vger.kernel.org; > > > linux...@kvack.org; lwood...@redhat.com; Seiji Aguchi; > > > a...@linux-foundation.org; hu...@google.com > > > Subject: extra free kbytes tunable > > > > > > Hi, > > > > > > As discussed in this thread: > > > http://marc.info/?l=linux-mm&m=131490523222031&w=2 > > > (with this cleanup as well: https://lkml.org/lkml/2011/9/2/225) > > > > > > A tunable was proposed to allow specifying the distance between pages_min > > > and the low watermark before kswapd is kicked in to > > > free up pages. I'd like to re-open this thread since the patch did not > > > appear to go anywhere. > > > > > > We have a server workload wherein machines with 100G+ of "free" memory > > > (used by page cache), scattered but frequent random io > > > reads from 12+ SSD's, and 5gbps+ of internet traffic, will frequently hit > > > direct reclaim in a few different ways. 
> > > > > > 1) It'll run into small amounts of reclaim randomly (a few hundred > > > thousand). > > > > > > 2) A burst of reads or traffic can cause extra pressure, which kswapd > > > occasionally responds to by freeing up 40g+ of the pagecache all > > > at once > > > (!) while pausing the system (Argh). > > > > > > 3) A blip in an upstream provider or failover from a peer causes the > > > kernel to allocate massive amounts of memory for retransmission > > > queues/etc, potentially along with buffered IO reads and (some, but not > > > often a ton) of new allocations from an application. This > > > paired with 2) can cause the box to stall for 15+ seconds. > > > > > > We're seeing this more in 3.4/3.5/3.6, saw it less in 2.6.38. Mass > > > reclaims are more common in newer kernels, but reclaims still happen > > > in all kernels without raising min_free_kbytes dramatically. > > > > > > I've found that setting "lowmem_reserve_ratio" to something like "1 1 32" > > > (thus protecting the DMA32 zone) causes 2) to happen less often, and is > > > generally less violent with 1). > > > > > > Setting min_free_kbytes to 15G or more, paired with the above, has been > > > the best at mitigating the issue. This is simply trying to raise > > > the distance between the min and low watermarks. With min_free_kbytes set > > > to 1500, that gives us a whopping 1.8G (!!!) of > > > leeway before slamming into direct reclaim. > > > > > > So, this patch is unfortunate but wonderful at letting us reclaim 10G+ of > > > otherwise lost memory. Could we please revisit it? > > > > > > I saw a lot of discussion on doing this automatically, or making kswapd > > > more efficient to it, and I'd love to do that. Beyond making > > > kswapd psychic I haven't seen any better options yet. 
> > > The issue is more complex than simply having an application warn of an
> > > impending allocation, since this can happen via read load on disk or
> > > from kernel page allocations for the network, or a combination of the
> > > two (or three, if you add the app back in).
> > >
> > > It's going to get worse as we push machines with faster SSD's and
> > > bigger networks. I'm open to any ideas on how to make kswapd more
> > > efficient in our case, or really anything at all that works.
> > >
> > > I have more details, but cut it down as much as I could for this mail.
> > >
> > > Thanks,
> > > -Dormando
>
> --
> All rights reversed
[PATCH] add extra free kbytes tunable
From: Rik van Riel

Add a userspace visible knob to tell the VM to keep an extra amount
of memory free, by increasing the gap between each zone's min and
low watermarks.

This is useful for realtime applications that call system
calls and have a bound on the number of allocations that happen in
any short time period.  In this application, extra_free_kbytes
would be left at an amount equal to or larger than the maximum
number of allocations that happen in any burst.

It may also be useful to reduce the memory use of virtual
machines (temporarily?), in a way that does not cause memory
fragmentation like ballooning does.

---
 Documentation/sysctl/vm.txt |   16 ++++
 include/linux/mmzone.h      |    2 +-
 include/linux/swap.h        |    2 ++
 kernel/sysctl.c             |   11 +--
 mm/page_alloc.c             |   39 +--
 5 files changed, 57 insertions(+), 13 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 078701f..5d12bbd 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -28,6 +28,7 @@ Currently, these files are in /proc/sys/vm:
 - dirty_writeback_centisecs
 - drop_caches
 - extfrag_threshold
+- extra_free_kbytes
 - hugepages_treat_as_movable
 - hugetlb_shm_group
 - laptop_mode
@@ -167,6 +168,21 @@ fragmentation index is <= extfrag_threshold. The default value is 500.
 ==

+extra_free_kbytes
+
+This parameter tells the VM to keep extra free memory between the threshold
+where background reclaim (kswapd) kicks in, and the threshold where direct
+reclaim (by allocating processes) kicks in.
+
+This is useful for workloads that require low latency memory allocations
+and have a bounded burstiness in memory allocations, for example a
+realtime application that receives and transmits network traffic
+(causing in-kernel memory allocations) with a maximum total message burst
+size of 200MB may need 200MB of extra free memory to avoid direct reclaim
+related latencies.
+
+==
+
 hugepages_treat_as_movable

 This parameter is only useful when kernelcore= is specified at boot time to
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 73b64a3..7f8f883 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -881,7 +881,7 @@ static inline int is_dma(struct zone *zone)
 /* These two functions are used to setup the per zone pages min values */
 struct ctl_table;
-int min_free_kbytes_sysctl_handler(struct ctl_table *, int,
+int free_kbytes_sysctl_handler(struct ctl_table *, int,
 					void __user *, size_t *, loff_t *);
 extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1];
 int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *, int,
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 68df9c1..66a12c4 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -215,6 +215,8 @@ struct swap_list_t {
 /* linux/mm/page_alloc.c */
 extern unsigned long totalram_pages;
 extern unsigned long totalreserve_pages;
+extern int min_free_kbytes;
+extern int extra_free_kbytes;
 extern unsigned long dirty_balance_reserve;
 extern unsigned int nr_free_buffer_pages(void);
 extern unsigned int nr_free_pagecache_pages(void);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c88878d..102e9a1 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -104,7 +104,6 @@ extern char core_pattern[];
 extern unsigned int core_pipe_limit;
 #endif
 extern int pid_max;
-extern int min_free_kbytes;
 extern int pid_max_min, pid_max_max;
 extern int sysctl_drop_caches;
 extern int percpu_pagelist_fraction;
@@ -1246,10 +1245,18 @@ static struct ctl_table vm_table[] = {
 		.data		= &min_free_kbytes,
 		.maxlen		= sizeof(min_free_kbytes),
 		.mode		= 0644,
-		.proc_handler	= min_free_kbytes_sysctl_handler,
+		.proc_handler	= free_kbytes_sysctl_handler,
 		.extra1		= &zero,
 	},
 	{
+		.procname	= "extra_free_kbytes",
+		.data		= &extra_free_kbytes,
+		.maxlen		= sizeof(extra_free_kbytes),
+		.mode		= 0644,
+		.proc_handler	= free_kbytes_sysctl_handler,
+		.extra1		= &zero,
+	},
+	{
 		.procname	= "percpu_pagelist_fraction",
 		.data		= &percpu_pagelist_fraction,
 		.maxlen		= sizeof(percpu_pagelist_fraction),
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9673d96..5380d84 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -194,8 +194,21 @@ static char * const zone_names[MAX_NR_ZONES] = {
 	 "Movable",
 };

+/*
+ * Try to keep at least this much lowmem free. Do not allow normal
+ * allocations below this point, only high priority ones. Automatically
extra free kbytes tunable
Hi, As discussed in this thread: http://marc.info/?l=linux-mm&m=131490523222031&w=2 (with this cleanup as well: https://lkml.org/lkml/2011/9/2/225) A tunable was proposed to allow specifying the distance between pages_min and the low watermark before kswapd is kicked in to free up pages. I'd like to re-open this thread since the patch did not appear to go anywhere. We have a server workload wherein machines with 100G+ of "free" memory (used by page cache), scattered but frequent random io reads from 12+ SSD's, and 5gbps+ of internet traffic, will frequently hit direct reclaim in a few different ways. 1) It'll run into small amounts of reclaim randomly (a few hundred thousand). 2) A burst of reads or traffic can cause extra pressure, which kswapd occasionally responds to by freeing up 40g+ of the pagecache all at once (!) while pausing the system (Argh). 3) A blip in an upstream provider or failover from a peer causes the kernel to allocate massive amounts of memory for retransmission queues/etc, potentially along with buffered IO reads and (some, but not often a ton) of new allocations from an application. This paired with 2) can cause the box to stall for 15+ seconds. We're seeing this more in 3.4/3.5/3.6, saw it less in 2.6.38. Mass reclaims are more common in newer kernels, but reclaims still happen in all kernels without raising min_free_kbytes dramatically. I've found that setting "lowmem_reserve_ratio" to something like "1 1 32" (thus protecting the DMA32 zone) causes 2) to happen less often, and is generally less violent with 1). Setting min_free_kbytes to 15G or more, paired with the above, has been the best at mitigating the issue. This is simply trying to raise the distance between the min and low watermarks. With min_free_kbytes set to 1500, that gives us a whopping 1.8G (!!!) of leeway before slamming into direct reclaim. So, this patch is unfortunate but wonderful at letting us reclaim 10G+ of otherwise lost memory. Could we please revisit it? 
I saw a lot of discussion on doing this automatically, or making kswapd more efficient to it, and I'd love to do that. Beyond making kswapd psychic I haven't seen any better options yet. The issue is more complex than simply having an application warn of an impending allocation, since this can happen via read load on disk or from kernel page allocations for the network, or a combination of the two (or three, if you add the app back in). It's going to get worse as we push machines with faster SSD's and bigger networks. I'm open to any ideas on how to make kswapd more efficient in our case, or really anything at all that works. I have more details, but cut it down as much as I could for this mail. Thanks, -Dormando
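With the proposed patch applied, the min-to-low gap would be widened through the new sysctl rather than by inflating min_free_kbytes. A hypothetical usage sketch (the 10G value mirrors the "reclaim 10G+" figure above and is illustrative, not a posted configuration):

```text
# Widen the min->low watermark gap by ~10G without raising the min watermark:
sysctl -w vm.extra_free_kbytes=10485760

# or persistently, in /etc/sysctl.conf:
#   vm.extra_free_kbytes = 10485760
```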