from:"dormando"

Re: Multitude of dst obsolescense race conditions

2014-05-14 Thread dormando

> On Wed, May 14, 2014, at 2:57, dormando wrote:
> > Given a machine with frequently changing routes (ie; a router with an
> > active internet BGP table and multiple interfaces), there're at least
> > several places where obsolete dst's are handled improperly. If I pause
> > the route changes, the crashes appear to stop. This first one has a crash
> > utility we've made, so I was able to more quickly find a patch and test
> > it. The others take time to reproduce.
> >
> > I'm testing against 3.10.39, but I think if these were fixed they'd be
> > backported to stable? I've also had recent 3.12's running that have
> > crashed in the same spots. Anyway correct me if I'm wrong...
>
> Just a hunch:
> You use macvlan? Could you somehow try without?
> Maybe... some ref overflow? (You could add some testing code in dst_hold
> with atomic_inc_return and WARN_ON).
>
> dst_release already contains such a check, so I am not sure at all if
> that could happen.
>
> Bye,
>
>   Hannes
>

We've seen the crashes with macvlan removed. I don't think I've explicitly
removed it recently, or for the UDP crash, but I doubt it'd make a
difference.

And yeah, pretty weird, right? It's like RCU isn't working.
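
For reference, the kind of check Hannes describes (test the value seen while taking a reference, and warn if it looks wrong) can be sketched in userspace with C11 atomics. This is only an illustration under assumptions: `struct toy_dst` and `toy_dst_hold_checked()` are hypothetical names, `atomic_fetch_add()` stands in for the kernel's `atomic_inc_return()`, and the `fprintf()` stands in for `WARN_ON()`.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-in for struct dst_entry; only the refcount matters. */
struct toy_dst {
	atomic_int refcnt;
};

/*
 * Take a reference and report whether the counter looked sane.
 * atomic_fetch_add() returns the value *before* the increment, so:
 *   old < 0        -> the count had already gone negative (over-release)
 *   old == INT_MAX -> this increment wrapped the counter (overflow)
 * Returns 1 if the hold looks fine, 0 if it is suspicious.
 */
static int toy_dst_hold_checked(struct toy_dst *dst)
{
	int old = atomic_fetch_add(&dst->refcnt, 1);

	if (old < 0 || old == INT_MAX) {
		/* Userspace analogue of a WARN_ON() in the hold path. */
		fprintf(stderr, "suspicious refcnt before hold: %d\n", old);
		return 0;
	}
	return 1;
}
```

A check like this only reports that the counter is already wrong at the time of the hold; it does not say which earlier release or hold corrupted it.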
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Multitude of dst obsolescense race conditions

2014-05-14 Thread dormando


> On Wed, 2014-05-14 at 02:57 -0700, dormando wrote:
> > Hi,
> >
> > Given a machine with frequently changing routes (ie; a router with an
> > active internet BGP table and multiple interfaces), there're at least
> > several places where obsolete dst's are handled improperly. If I pause the
> > route changes, the crashes appear to stop. This first one has a crash
> > utility we've made, so I was able to more quickly find a patch and test
> > it. The others take time to reproduce.
> >
> > I'm testing against 3.10.39, but I think if these were fixed they'd be
> > backported to stable? I've also had recent 3.12's running that have
> > crashed in the same spots. Anyway correct me if I'm wrong...
>
> Is this a vanilla kernel ? I never had any issues like that.
>
> I wonder if you have some RCU issues.
>
> static inline struct dst_entry *
> sk_dst_get(struct sock *sk)
> {
> 	struct dst_entry *dst;
>
> 	rcu_read_lock();
> 	dst = rcu_dereference(sk->sk_dst_cache);
> 	if (dst)
> 		dst_hold(dst);
> 	rcu_read_unlock();
> 	return dst;
> }
>
> static inline void
> __sk_dst_set(struct sock *sk, struct dst_entry *dst)
> {
> 	struct dst_entry *old_dst;
>
> 	sk_tx_queue_clear(sk);
> 	/*
> 	 * This can be called while sk is owned by the caller only,
> 	 * with no state that can be checked in a rcu_dereference_check() cond
> 	 */
> 	old_dst = rcu_dereference_raw(sk->sk_dst_cache);
> 	rcu_assign_pointer(sk->sk_dst_cache, dst);
> 	dst_release(old_dst);
> }
>
> static inline void
> sk_dst_set(struct sock *sk, struct dst_entry *dst)
> {
> 	spin_lock(&sk->sk_dst_lock);
> 	__sk_dst_set(sk, dst);
> 	spin_unlock(&sk->sk_dst_lock);
> }
>
>
>
>

We have some minor patches, but I've removed them before and the crashes
still happen. I think I recently crashed a vanilla 3.12 with just the
stable patches.
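
One general hazard with a get path like the quoted sk_dst_get() is a reader taking a reference on an object whose count has already dropped to zero. A minimal userspace sketch of the "take a reference only if the count is still non-zero" pattern follows, similar in spirit to the kernel's `atomic_inc_not_zero()`. All names here (`struct toy_dst`, `toy_hold_unless_zero()`, `toy_cache_get()`) are hypothetical, RCU grace periods are not modelled, and nothing in this thread establishes that this is the cause of, or the fix for, the crashes discussed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dst_entry; only the refcount matters. */
struct toy_dst {
	atomic_int refcnt;
};

/*
 * Increment the refcount unless it is already zero.
 * Returns true if a reference was taken.
 */
static bool toy_hold_unless_zero(struct toy_dst *dst)
{
	int old = atomic_load(&dst->refcnt);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&dst->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Read the cached pointer from 'slot' and try to pin it.
 * Returns NULL if the slot is empty or the entry is already dying.
 * Real kernel code would additionally rely on RCU to keep the memory
 * valid between the load and the refcount update; that is not modelled.
 */
static struct toy_dst *toy_cache_get(_Atomic(struct toy_dst *) *slot)
{
	struct toy_dst *dst = atomic_load(slot);

	if (dst && !toy_hold_unless_zero(dst))
		dst = NULL;
	return dst;
}
```

A caller that gets NULL back would treat the cache as empty and do a fresh route lookup instead of using the dying entry.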

Multitude of dst obsolescense race conditions

2014-05-14 Thread dormando

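<4>[7359073.004729]  [] net_rx_action+0x119/0x220
<4>[7359073.004794]  [] ? check_preempt_wakeup+0x14b/0x230
<4>[7359073.004860]  [] __do_softirq+0xd0/0x270
<4>[7359073.004921]  [] irq_exit+0x55/0x60
<4>[7359073.004983]  [] scheduler_ipi+0x35/0x40
<4>[7359073.005049]  [] smp_reschedule_interrupt+0x2a/0x30
<4>[7359073.005115]  [] reschedule_interrupt+0x6a/0x70
<4>[7359073.005176]  <EOI>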
<4>[7359073.005217]  [] ? _raw_spin_lock+0x25/0x30
<4>[7359073.005370]  [] futex_wait_setup+0x69/0xf0
<4>[7359073.005433]  [] futex_wait+0x186/0x2c0
<4>[7359073.005495]  [] ? current_fs_time+0x16/0x60
<4>[7359073.005559]  [] ? pipe_write+0x2f3/0x590
<4>[7359073.005625]  [] ? fsnotify+0x1d2/0x2b0
<4>[7359073.005687]  [] do_futex+0x334/0xb20
<4>[7359073.005751]  [] ? do_sync_write+0x7a/0xb0
<4>[7359073.005813]  [] ? fsnotify+0x1d2/0x2b0
<4>[7359073.005875]  [] SyS_futex+0x142/0x1a0
<4>[7359073.005939]  [] ? SyS_write+0x6b/0xa0
<4>[7359073.006001]  [] system_call_fastpath+0x16/0x1b
<4>[7359073.006063] Code: 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 0f
84 e7 00 00 00 48 85 c0 0f 84 de 00 00 00 49 63 44 24 20 48 8d 4a 01 49 8b
3c 24 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 b5 49
<1>[7359073.008543] RIP  [] kmem_cache_alloc+0x57/0x150
<4>[7359073.008642]  RSP <88c07fc638f8>
<4>[7359073.008700] CR2: 0001
<4>[7359073.008767] ---[ end trace 83220393c4cb24ad ]---
<0>[7359073.072455] Kernel panic - not syncing: Fatal exception in
interrupt

(apologies for the mangling, it's getting late and I'm getting
progressively lazier)

The path for this one appears to shift a bit, but it always dies in the
kmem_cache_alloc() call within dst_alloc().
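
As background on why an allocation call can be where the fault shows up: SLUB chains the free objects of a slab through a pointer stored inside each free object, so a write to an object after it has been freed can overwrite that pointer, and a later allocation from the same cache then follows a garbage address. The toy allocator below sketches that mechanism in userspace. Everything in it (`struct toy_cache`, `toy_alloc()`, the range check) is hypothetical illustration, not kernel code, and unlike the real fast path it validates the pointer instead of dereferencing it blindly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_OBJ_SIZE 64
#define TOY_NR_OBJS  8

/* One fixed "slab" of equally sized objects plus a freelist head. */
struct toy_cache {
	_Alignas(void *) unsigned char slab[TOY_OBJ_SIZE * TOY_NR_OBJS];
	void *freelist;	/* first free object, or NULL when empty */
};

/* True if p points at the start of an object inside the slab. */
static int toy_ptr_valid(const struct toy_cache *c, const void *p)
{
	uintptr_t base = (uintptr_t)c->slab;
	uintptr_t addr = (uintptr_t)p;

	return addr >= base && addr < base + sizeof(c->slab) &&
	       (addr - base) % TOY_OBJ_SIZE == 0;
}

/* Thread every object onto the freelist; the link lives in the object. */
static void toy_cache_init(struct toy_cache *c)
{
	c->freelist = NULL;
	for (int i = TOY_NR_OBJS - 1; i >= 0; i--) {
		void *obj = c->slab + (size_t)i * TOY_OBJ_SIZE;

		memcpy(obj, &c->freelist, sizeof(void *));
		c->freelist = obj;
	}
}

/* Push obj back: its first bytes now hold the old freelist head. */
static void toy_free(struct toy_cache *c, void *obj)
{
	memcpy(obj, &c->freelist, sizeof(void *));
	c->freelist = obj;
}

/*
 * Pop the first free object. Sets *corrupt and returns NULL if the
 * freelist head or the link stored in the object is not a valid object
 * address, i.e. if something wrote to a free object.
 */
static void *toy_alloc(struct toy_cache *c, int *corrupt)
{
	void *obj = c->freelist;
	void *next;

	*corrupt = 0;
	if (!obj)
		return NULL;
	if (!toy_ptr_valid(c, obj)) {
		*corrupt = 1;
		return NULL;
	}
	memcpy(&next, obj, sizeof(next));
	if (next && !toy_ptr_valid(c, next)) {
		*corrupt = 1;
		return NULL;
	}
	c->freelist = next;
	return obj;
}
```

In this model the code that faults (or here, reports corruption) is an innocent allocation; the buggy write happened earlier through a stale pointer held by some other path.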

I've also seen:

<4>[14723139.584187] Call Trace:
<4>[14723139.584241]  <IRQ>
<4>[14723139.584282]  [] dst_alloc+0x5a/0x180
<4>[14723139.584433]  [] ?
tcp_write_timer_handler+0x1d0/0x1d0
<4>[14723139.584497]  [] rt_dst_alloc+0x4c/0x50
<4>[14723139.584558]  []
__ip_route_output_key+0x281/0x860
<4>[14723139.584622]  [] ?
tcp_write_timer_handler+0x1d0/0x1d0
<4>[14723139.584685]  [] ip_route_output_flow+0x27/0x70
<4>[14723139.584747]  []
inet_sk_rebuild_header+0x137/0x310
<4>[14723139.584810]  [] ?
tcp_write_timer_handler+0x1d0/0x1d0
<4>[14723139.584874]  [] __tcp_retransmit_skb+0x78/0x5a0
<4>[14723139.584938]  [] ? bictcp_state+0xa1/0x100
<4>[14723139.584999]  [] ?
tcp_write_timer_handler+0x1d0/0x1d0
<4>[14723139.585062]  [] tcp_retransmit_skb+0x24/0x100
<4>[14723139.585124]  []
tcp_retransmit_timer+0x271/0x6d0
<4>[14723139.585187]  [] ?
tcp_write_timer_handler+0x1d0/0x1d0
<4>[14723139.585250]  []
tcp_write_timer_handler+0xa0/0x1d0
<4>[14723139.585314]  [] ?
tcp_write_timer_handler+0x1d0/0x1d0
<4>[14723139.585378]  [] tcp_write_timer+0x60/0x70
<4>[14723139.585443]  [] call_timer_fn+0x3b/0x150
<4>[14723139.585507]  [] ? do_IRQ+0x63/0xe0
<4>[14723139.585568]  [] ?
tcp_write_timer_handler+0x1d0/0x1d0
<4>[14723139.585630]  [] run_timer_softirq+0x243/0x290
<4>[14723139.585690]  [] __do_softirq+0xd0/0x270
<4>[14723139.585749]  [] irq_exit+0x55/0x60
<4>[14723139.585807]  []
smp_apic_timer_interrupt+0x6e/0x99
<4>[14723139.585868]  [] apic_timer_interrupt+0x6a/0x70

This one's the most problematic. It's the least frequent and the most
difficult to reproduce. Given how the other issues all center on dst's being
mishandled during route updates, my wild guess would be that it's somewhere
in there.

It's probably worth auditing how dst caches are handled in all places, but
it is 3am and I have to stop for now. Anyway this sucks, please help!

thanks!
-Dormando

Re: ipv4_dst_destroy panic regression after 3.10.15

2014-05-13 Thread dormando

On Wed, 22 Jan 2014, Alexei Starovoitov wrote:

> On Tue, Jan 21, 2014 at 10:02 PM, Alexei Starovoitov
>  wrote:
> > On Tue, Jan 21, 2014 at 8:10 PM, dormando  wrote:
> >>
> >>
> >> On Tue, 21 Jan 2014, Alexei Starovoitov wrote:
> >>
> >>> On Tue, Jan 21, 2014 at 5:39 PM, dormando  wrote:
> >>> >
> >>> > > On Fri, Jan 17, 2014 at 11:16 PM, dormando  wrote:
> >>> > > >> On Fri, 2014-01-17 at 22:49 -0800, Eric Dumazet wrote:
> >>> > > >> > On Fri, 2014-01-17 at 17:25 -0800, dormando wrote:
> >>> > > >> > > Hi,
> >>> > > >> > >
> >>> > > >> > > Upgraded a few kernels to the latest 3.10 stable tree while 
> >>> > > >> > > tracking down
> >>> > > >> > > a rare kernel panic, seems to have introduced a much more 
> >>> > > >> > > frequent kernel
> >>> > > >> > > panic. Takes anywhere from 4 hours to 2 days to trigger:
> >>> > > >> > >
> >>> > > >> > > <4>[196727.311203] general protection fault:  [#1] SMP
> >>> > > >> > > <4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP 
> >>> > > >> > > macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel 
> >>> > > >> > > gpio_ich microcode
> >>> ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm 
> >>> tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp
> >>> pps_core mdio
> >>> > > >> > > <4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 
> >>> > > >> > > 3.10.26 #1
> >>> > > >> > > <4>[196727.311344] Hardware name: Supermicro 
> >>> > > >> > > X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 
> >>> > > >> > > 07/05/2013
> >>> > > >> > > <4>[196727.311364] task: 885e6f069700 ti: 885e6f072000 
> >>> > > >> > > task.ti: 885e6f072000
> >>> > > >> > > <4>[196727.311377] RIP: 0010:[]  
> >>> > > >> > > [] ipv4_dst_destroy+0x4f/0x80
> >>> > > >> > > <4>[196727.311399] RSP: 0018:885effd23a70  EFLAGS: 00010282
> >>> > > >> > > <4>[196727.311409] RAX: dead00200200 RBX: 8854c398ecc0 
> >>> > > >> > > RCX: 0040
> >>> > > >> > > <4>[196727.311423] RDX: dead00100100 RSI: dead00100100 
> >>> > > >> > > RDI: dead00200200
> >>> > > >> > > <4>[196727.311437] RBP: 885effd23a80 R08: 815fd9e0 
> >>> > > >> > > R09: 885d5a590800
> >>> > > >> > > <4>[196727.311451] R10:  R11:  
> >>> > > >> > > R12: 
> >>> > > >> > > <4>[196727.311464] R13: 81c8c280 R14:  
> >>> > > >> > > R15: 880e85ee16ce
> >>> > > >> > > <4>[196727.311510] FS:  () 
> >>> > > >> > > GS:885effd2() knlGS:
> >>> > > >> > > <4>[196727.311554] CS:  0010 DS:  ES:  CR0: 
> >>> > > >> > > 80050033
> >>> > > >> > > <4>[196727.311581] CR2: 7a46751eb000 CR3: 005e65688000 
> >>> > > >> > > CR4: 000407e0
> >>> > > >> > > <4>[196727.311625] DR0:  DR1:  
> >>> > > >> > > DR2: 
> >>> > > >> > > <4>[196727.311669] DR3:  DR6: 0ff0 
> >>> > > >> > > DR7: 0400
> >>> > > >> > > <4>[196727.311713] Stack:
> >>> > > >> > > <4>[196727.311733]  8854c398ecc0 8854c398ecc0 
> >>> > > >> > > 885effd23ab0 815b7f42
> >>> > > >> > > <4>[196727.311784]  88be6595bc00 8854c398ecc0 
> >>> > > >> > >  8854c398ecc0
> >>> > > >> > > <4>[196727.311834]  885effd23ad0 815b86c6 
> >>> > > >> > > 885d5a590800 8816827821c0
> >>> > > >> > > <4>[196727.311885] Call Trace:
> >>> > > >> > > <4>[196727.311907]  <IRQ>
> >>> > > >> > > <4>[196727.311912]  [815b7f42] dst_destroy+0x32/0xe0
> >>> > > >> > > <4>[196727.311959]  [815b86c6] dst_release+0x56/0x80
> >>> > > >> > > <4>[196727.311986]  [81620bd5] 
> >>> > > >> > > tcp_v4_do_rcv+0x2a5/0x4a0
> >>> > > >> > > <4>[196727.312013]  [81622b5a] tcp_v4_rcv+0x7da/0x820
> >>> > > >> > > <4>[196727.312041]  [815fd9e0] ? 
> >>> > > >> > > ip_rcv_finish+0x360/0x360
> >>> > > >> > > <4>[196727.312070]  [815de02d] ? 
> >>> > > >> > > nf_hook_slow+0x7d/0x150
> >>> > > >> > > <4>[196727.312097]  [815fd9e0] ? 
> >>> > > >> > > ip_rcv_finish+0x360/0x360
> >>> > > >> > > <4>[196727.312125]  [815fda92] 
> >>> > > >> > > ip_local_deliver_finish+0xb2/0x230
> >>> > > >> > > <4>[196727.312154]  [815fdd9a] 
> >>> > > >> > > ip_local_deliver+0x4a/0x90
> >>> > > >> > > <4>[196727.312183]  [815fd799] 
> >>> > > >> > > ip_rcv_finish+0x119/0x360
> >>> > > >> > > <4>[196727.312212]  [815fe00b] ip_rcv+0x22b/0x340
> >>> > > >> > > <4>[196727.312242]  [a0339680] ? 
> >>> > > >> > > macvlan_broadcast+0x160/0x160 [macvlan]
> >>> > > >> > > <4>[196727.312275]  [815b0c62] 
> >>> > > >> > > __netif_receive_skb_core+0x512/0x640
> >>> > > >> > > <4>[196727.312308]  [811427fb] ? 
> >>> > > >> > > kmem_cache_alloc+0x13b/0x150
> >>> > > >> > > <4>[196727.312338]  [815b0db1] 
> >>> > > >> > > __netif_receive_skb+0x21/0x70
> >>> > > >> > > <4>[196727.312368]  [815b0fa1] 
> >>> > > >> > > netif_receive_skb+0x31/0xa0
> >>> > > >> > > <4>[196727.312397]  [815b1ae8] 
> >>> > > >> > > napi_gro_receive+0xe8/0x140
> >>> > > >> > > <4>[196727.312433]  [a00274f1] 
> >>> > > >> > > ixgbe_poll+0x551/0x11f0 [ixgbe]
> >>> > > >> > > <4>[196727.312463]  [815fe00b] ? ip_rcv+0x22b/0x340
> >>> > > >> > > <4>[196727.312491]  [815b1691] 
> >>> > > >> > > net_rx_action+0x111/0x210
> >>> > > >> > > <4>[196727.312521]  [815b0db1] ? 
> >>> > > >> > > __netif_receive_skb+0x21/0x70
> >>> > > >> > > <4>[196727.312552]  [810519d0] 
> >>> > > >> > > __do_softirq+0xd0/0x270
> >>> > > >> > > <4>[196727.312583]  [816cef3c] call_softirq+0x1c/0x30
> >>> > > >> > > <4>[196727.312613]  [81004205] do_softirq+0x55/0x90
> >>> > > >> > > <4>[196727.312640]  [81051c85] irq_exit+0x55/0x60
> >>> > > >> > > <4>[196727.312668
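
A note on the register dump in the quoted oops: the values shown as dead00100100 and dead00200200 resemble the kernel's list poison constants (LIST_POISON1 and LIST_POISON2 in include/linux/poison.h, which are 0x00100100 and 0x00200200 plus an architecture-specific offset), assuming the archive has elided runs of zeros from the addresses. list_del() writes those values into an entry's next/prev pointers, so seeing them as operands would be consistent with a list entry being unlinked a second time. The userspace sketch below shows that mechanism only; the `toy_` names and the explicit check are hypothetical, and the plain kernel list_del() of that era does not perform such a check.

```c
#include <assert.h>
#include <stdint.h>

struct toy_list {
	struct toy_list *next, *prev;
};

/* Poison markers, modelled on the low bits of the kernel's constants. */
#define TOY_POISON1 ((struct toy_list *)(uintptr_t)0x100100)
#define TOY_POISON2 ((struct toy_list *)(uintptr_t)0x200200)

static void toy_list_init(struct toy_list *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert n right after head. */
static void toy_list_add(struct toy_list *n, struct toy_list *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/*
 * Unlink e and poison its pointers.
 * Returns 0 on success, -1 if e already carries the poison values,
 * i.e. it was unlinked before. Without this check the second call
 * would dereference the poison address and fault.
 */
static int toy_list_del(struct toy_list *e)
{
	if (e->next == TOY_POISON1 || e->prev == TOY_POISON2)
		return -1;
	e->next->prev = e->prev;
	e->prev->next = e->next;
	e->next = TOY_POISON1;
	e->prev = TOY_POISON2;
	return 0;
}
```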

Re: kmem_cache_alloc panic in 3.10+

2014-01-31 Thread dormando

On Fri, 31 Jan 2014, David Rientjes wrote:

> On Fri, 31 Jan 2014, dormando wrote:
>
> > > CONFIG_SLUB_DEBUG_ON will definitely be slower but can help to identify
> > > any possible corruption issues.
> > >
> > > I'm wondering if you have CONFIG_MEMCG enabled and are actually allocating
> > > slab in a non-root memcg?  What does /proc/self/cgroup say?
> > >
> >
> > /proc/self/cgroup is empty on these hosts. CONFIG_MEMCG is enabled though.
> >
>
> It _looks_ like the cmpxchg_double() so seeing if there is anything else
> funny going on with CONFIG_SLUB_DEBUG_ON would definitely be helpful;
> otherwise, try using CONFIG_SLAB as Eric suggested and seeing if the
> problem goes away.
>

cmpxchg_double()? That's not related to the 62713c4b fix, right?

I'll see what I can do.. it's going to take a long time to iterate on this
though.

Thanks for the suggestions!
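
For context on what the SLUB debug option buys: with poisoning enabled, a freed object is filled with a known byte pattern (0x6b, with 0xa5 as the last byte), and the pattern is verified when the object is handed out again, so a write after free is reported instead of silently corrupting state. The extra fill and scan per object is also a reason it runs slower. A rough userspace sketch of the idea, with hypothetical `toy_` helper names that are not kernel APIs:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_POISON_FREE 0x6b	/* fill byte for a free object */
#define TOY_POISON_END  0xa5	/* marker in the last byte */

/* Fill a freed object with the poison pattern. */
static void toy_poison_object(unsigned char *obj, size_t size)
{
	if (size == 0)
		return;
	memset(obj, TOY_POISON_FREE, size - 1);
	obj[size - 1] = TOY_POISON_END;
}

/*
 * Verify the pattern before reusing the object.
 * Returns the offset of the first byte that was modified while the
 * object was free, or -1 if the pattern is intact.
 */
static long toy_check_poison(const unsigned char *obj, size_t size)
{
	for (size_t i = 0; i < size; i++) {
		unsigned char want =
			(i == size - 1) ? TOY_POISON_END : TOY_POISON_FREE;

		if (obj[i] != want)
			return (long)i;
	}
	return -1;
}
```

The reported offset can then be matched against the layout of the structure stored in that cache to narrow down which field the stale writer touched.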

Re: kmem_cache_alloc panic in 3.10+

2014-01-30 Thread dormando

> On Thu, Jan 30, 2014 at 6:16 PM, Eric Dumazet  wrote:
> > On Wed, 2014-01-29 at 23:05 -0800, dormando wrote:
> >
> >> We hit the routing code fairly hard. Any hints for what to look at or how
> >> to instrument it? Or if it's fixed already? It's a real pain to iterate
> >> since it takes ~30 days to crash, usually. Sometimes.
>
> sounds like adding mdelay() didn't help to crash it sooner. Then I don't
> see how my dst fix was causing it to crash more often. Something odd.
> fyi just to check it more thoroughly I've been running with mdelay()
> and config_slub_debug_on for a week without issues.

Sorry, I'm actually trying to deal with two separate crashes at once :/
One is this 3.10.15 one, and one was the regression in 3.10.23 - I haven't
had time to attempt the mdelay test yet. The two crashes have fairly
distinct traces.

For what it's worth though the machines I have with that one patch
reverted are still running fine.

> > I really wonder... it looks like a possible bug in SLUB. (might be already
> > fixed)
> >
> > Could you try using SLAB instead ?
>
> try config_slub_debug_on=y ? it should catch double free and other things.
>

Any slowdowns/issues with that?

Re: kmem_cache_alloc panic in 3.10+

2014-01-29 Thread dormando

> > On Sat, 2014-01-18 at 00:44 -0800, dormando wrote:
> > > Hello again!
> > >
> > > We've had a rare crash that's existed between 3.10.0 and 3.10.15 at least
> > > (trying newer stables now, but I can't tell if it was fixed, and it takes
> > > weeks to reproduce).
> > >
> > > Unfortunately I can only get 8k back from pstore. The panic looks a bit
> > > longer than that is caught in the log, but the bottom part is almost
> > > always this same trace as this one:
> > >
> > > Panic#6 Part1
> > > <4>[1197485.199166]  [] tcp_push+0x6c/0x90
> > > <4>[1197485.199171]  [] tcp_sendmsg+0x109/0xd40
> > > <4>[1197485.199179]  [] ? put_page+0x35/0x40
> > > <4>[1197485.199185]  [] inet_sendmsg+0x45/0xb0
> > > <4>[1197485.199191]  [] sock_aio_write+0x11e/0x130
> > > <4>[1197485.199196]  [] ? inet_recvmsg+0x4f/0x80
> > > <4>[1197485.199203]  [] do_sync_readv_writev+0x6d/0xa0
> > > <4>[1197485.199209]  [] do_readv_writev+0xfb/0x2f0
> > > <4>[1197485.199215]  [] ? __free_pages+0x35/0x40
> > > <4>[1197485.199220]  [] ? free_pages+0x46/0x50
> > > <4>[1197485.199226]  [] ? SyS_mincore+0x152/0x690
> > > <4>[1197485.199231]  [] vfs_writev+0x48/0x60
> > > <4>[1197485.199236]  [] SyS_writev+0x5f/0xd0
> > > <4>[1197485.199243]  [] system_call_fastpath+0x16/0x1b
> > > <4>[1197485.199247] Code: 65 4c 03 04 25 c8 cb 00 00 49 8b 50 08 4d 8b 28 
> > > 49 8b 40 10 4d 85 ed 0f 84 84 00 00 00 48 85 c0 74 7f 49 63 44 24 20 49 
> > > 8b 3c 24 <49> 8b 5c 05 00 48 8d 4a 01 4c 89 e8 65 48 0f c7 0f 0f 94 c0 3c
> > > <1>[1197485.199290] RIP  [] kmem_cache_alloc+0x5a/0x130
> > > <4>[1197485.199296]  RSP 
> > > <4>[1197485.199299] CR2: 0001
> > > <4>[1197485.199343] ---[ end trace 90fee06aa40b7304 ]---
> > > <1>[1197485.263911] BUG: unable to handle kernel paging request at 
> > > 0001
> > > <1>[1197485.263923] IP: [] kmem_cache_alloc+0x5a/0x130
> > > <4>[1197485.263932] PGD 3f43e5c067 PUD 0
> > > <4>[1197485.263937] Oops:  [#5] SMP
> > > <4>[1197485.263941] Modules linked in: ntfs vfat msdos fat macvlan bridge 
> > > coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode sb_edac 
> > > edac_core lpc_ich mfd_core ixgbe igb i2c_algo_bit mdio ptp pps_core
> > > <4>[1197485.263966] CPU: 0 PID: 233846 Comm: cache-worker Tainted: G  
> > > D  3.10.15 #1
> > > <4>[1197485.263972] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 2.0a 
> > > 03/07/2013
> > > <4>[1197485.263976] task: 883427f9dc00 ti: 8830d4312000 task.ti: 
> > > 8830d4312000
> > > <4>[1197485.263982] RIP: 0010:[]  [] 
> > > kmem_cache_alloc+0x5a/0x130
> > > <4>[1197485.263990] RSP: 0018:881fffc038c8  EFLAGS: 00010286
> > > <4>[1197485.263994] RAX:  RBX: 81c8c740 RCX: 
> > > 
> > > <4>[1197485.263999] RDX: 29273024 RSI: 0020 RDI: 
> > > 00015680
> > > <4>[1197485.264004] RBP: 881fffc03908 R08: 881fffc15680 R09: 
> > > 815bdd4b
> > > <4>[1197485.264009] R10: 881c65d21800 R11:  R12: 
> > > 881fff803800
> > > <4>[1197485.264014] R13: 0001 R14:  R15: 
> > > 
> > > <4>[1197485.264019] FS:  7f8d855eb700() GS:881fffc0() 
> > > knlGS:
> > > <4>[1197485.264024] CS:  0010 DS:  ES:  CR0: 80050033
> > > <4>[1197485.264028] CR2: 0001 CR3: 00308f258000 CR4: 
> > > 000407f0
> > > <4>[1197485.264032] DR0:  DR1:  DR2: 
> > > 
> > > <4>[1197485.264037] DR3:  DR6: 0ff0 DR7: 
> > > 0400
> > > <4>[1197485.264041] Stack:
> > > <4>[1197485.264044]  881fffc03928 0020815d0d95 881fffc03938 
> > > 81c8c740
> > > <4>[1197485.264050]  881fce21 0001  
> > > 
> > > <4>[1197485.264056]  881fffc03958 815bdd4b 881fffc039a8 
> > > 
> > > <4>[1197485.264063] Call Trace:
> > > <4>[1

Re: kmem_cache_alloc panic in 3.10+

2014-01-29 Thread dormando

> > On Sat, 2014-01-18 at 00:44 -0800, dormando wrote:
> > > Hello again!
> > >
> > > We've had a rare crash that's existed between 3.10.0 and 3.10.15 at least
> > > (trying newer stables now, but I can't tell if it was fixed, and it takes
> > > weeks to reproduce).
> > >
> > > Unfortunately I can only get 8k back from pstore. The panic looks a bit
> > > longer than that is caught in the log, but the bottom part is almost
> > > always this same trace as this one:
> > >
> > > Panic#6 Part1
> > > <4>[1197485.199166]  [81611e8c] tcp_push+0x6c/0x90
> > > <4>[1197485.199171]  [816160a9] tcp_sendmsg+0x109/0xd40
> > > <4>[1197485.199179]  [81114b65] ? put_page+0x35/0x40
> > > <4>[1197485.199185]  [8163bf75] inet_sendmsg+0x45/0xb0
> > > <4>[1197485.199191]  [8159da7e] sock_aio_write+0x11e/0x130
> > > <4>[1197485.199196]  [8163b83f] ? inet_recvmsg+0x4f/0x80
> > > <4>[1197485.199203]  [811558ad] do_sync_readv_writev+0x6d/0xa0
> > > <4>[1197485.199209]  [8115722b] do_readv_writev+0xfb/0x2f0
> > > <4>[1197485.199215]  [8110fda5] ? __free_pages+0x35/0x40
> > > <4>[1197485.199220]  [8110fe56] ? free_pages+0x46/0x50
> > > <4>[1197485.199226]  [8112f9e2] ? SyS_mincore+0x152/0x690
> > > <4>[1197485.199231]  [81157468] vfs_writev+0x48/0x60
> > > <4>[1197485.199236]  [811575af] SyS_writev+0x5f/0xd0
> > > <4>[1197485.199243]  [816cf942] system_call_fastpath+0x16/0x1b
> > > <4>[1197485.199247] Code: 65 4c 03 04 25 c8 cb 00 00 49 8b 50 08 4d 8b 28
> > > 49 8b 40 10 4d 85 ed 0f 84 84 00 00 00 48 85 c0 74 7f 49 63 44 24 20 49
> > > 8b 3c 24 <49> 8b 5c 05 00 48 8d 4a 01 4c 89 e8 65 48 0f c7 0f 0f 94 c0 3c
> > > <1>[1197485.199290] RIP  [811476da] kmem_cache_alloc+0x5a/0x130
> > > <4>[1197485.199296]  RSP 883171211868
> > > <4>[1197485.199299] CR2: 0001
> > > <4>[1197485.199343] ---[ end trace 90fee06aa40b7304 ]---
> > > <1>[1197485.263911] BUG: unable to handle kernel paging request at
> > > 0001
> > > <1>[1197485.263923] IP: [811476da] kmem_cache_alloc+0x5a/0x130
> > > <4>[1197485.263932] PGD 3f43e5c067 PUD 0
> > > <4>[1197485.263937] Oops:  [#5] SMP
> > > <4>[1197485.263941] Modules linked in: ntfs vfat msdos fat macvlan bridge
> > > coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode sb_edac
> > > edac_core lpc_ich mfd_core ixgbe igb i2c_algo_bit mdio ptp pps_core
> > > <4>[1197485.263966] CPU: 0 PID: 233846 Comm: cache-worker Tainted: G
> > > D  3.10.15 #1
> > > <4>[1197485.263972] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 2.0a
> > > 03/07/2013
> > > <4>[1197485.263976] task: 883427f9dc00 ti: 8830d4312000 task.ti:
> > > 8830d4312000
> > > <4>[1197485.263982] RIP: 0010:[811476da]  [811476da]
> > > kmem_cache_alloc+0x5a/0x130
> > > <4>[1197485.263990] RSP: 0018:881fffc038c8  EFLAGS: 00010286
> > > <4>[1197485.263994] RAX:  RBX: 81c8c740 RCX:
> > >
> > > <4>[1197485.263999] RDX: 29273024 RSI: 0020 RDI:
> > > 00015680
> > > <4>[1197485.264004] RBP: 881fffc03908 R08: 881fffc15680 R09:
> > > 815bdd4b
> > > <4>[1197485.264009] R10: 881c65d21800 R11:  R12:
> > > 881fff803800
> > > <4>[1197485.264014] R13: 0001 R14:  R15:
> > >
> > > <4>[1197485.264019] FS:  7f8d855eb700() GS:881fffc0()
> > > knlGS:
> > > <4>[1197485.264024] CS:  0010 DS:  ES:  CR0: 80050033
> > > <4>[1197485.264028] CR2: 0001 CR3: 00308f258000 CR4:
> > > 000407f0
> > > <4>[1197485.264032] DR0:  DR1:  DR2:
> > >
> > > <4>[1197485.264037] DR3:  DR6: 0ff0 DR7:
> > > 0400
> > > <4>[1197485.264041] Stack:
> > > <4>[1197485.264044]  881fffc03928 0020815d0d95 881fffc03938
> > > 81c8c740
> > > <4>[1197485.264050]  881fce21 0001
> > >
> > > <4>[1197485.264056]  881fffc03958 815bdd4b 881fffc039a8
> > >
> > > <4>[1197485.264063] Call Trace:
> > > <4>[1197485.264066]  <IRQ>
> > > <4>[1197485.264069]  [815bdd4b] dst_alloc+0x5b/0x190
> > > <4>[1197485.264080]  [8160068c] rt_dst_alloc+0x4c/0x50
> > > <4>[1197485.264085]  [81602a30]
> > > __ip_route_output_key+0x270/0x880
> > > <4>[1197485.264092]  [8107ee7e] ? try_to_wake_up+0x23e/0x2b0
> > > <4>[1197485.264097]  [81603067] ip_route_output_flow+0x27/0x60
> > > <4>[1197485.264102]  [8160ab8a] ip_queue_xmit+0x36a/0x390
> > > <4>[1197485.264108]  [816207c5] tcp_transmit_skb+0x485/0x890
> > > <4>[1197485.264113]  [81621aa1] tcp_send_ack+0xf1/0x130
> > > <4>[1197485.264118]  [81618d7e] __tcp_ack_snd_check+0x5e/0xa0
> > > <4>[1197485.264123]  [8161f2c2]
> > > tcp_rcv_state_process+0x8b2/0xb20
> > > <4>[1197485.264128]  [81627e61] tcp_v4_do_rcv+0x191/0x4f0
> > > <4>[1197485.264133]  [8162984c] tcp_v4_rcv+0x5fc/0x750
> > > <4>[1197485.264138]  [81604c80] ? ip_rcv+0x350/0x350
> > > <4>[1197485.264143]  [815e45cd] ? nf_hook_slow+0x7d/0x160
> > > <4>[1197485.264147

Re: ipv4_dst_destroy panic regression after 3.10.15

2014-01-21 Thread dormando



On Tue, 21 Jan 2014, Alexei Starovoitov wrote:

> On Tue, Jan 21, 2014 at 5:39 PM, dormando  wrote:
> >
> > > On Fri, Jan 17, 2014 at 11:16 PM, dormando  wrote:
> > > >> On Fri, 2014-01-17 at 22:49 -0800, Eric Dumazet wrote:
> > > >> > On Fri, 2014-01-17 at 17:25 -0800, dormando wrote:
> > > >> > > Hi,
> > > >> > >
> > > >> > > Upgraded a few kernels to the latest 3.10 stable tree while 
> > > >> > > tracking down
> > > >> > > a rare kernel panic, seems to have introduced a much more frequent 
> > > >> > > kernel
> > > >> > > panic. Takes anywhere from 4 hours to 2 days to trigger:
> > > >> > >
> > > >> > > <4>[196727.311203] general protection fault:  [#1] SMP
> > > >> > > <4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP 
> > > >> > > macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich 
> > > >> > > microcode
> ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm 
> tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp
> pps_core mdio
> > > >> > > <4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 
> > > >> > > 3.10.26 #1
> > > >> > > <4>[196727.311344] Hardware name: Supermicro 
> > > >> > > X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
> > > >> > > <4>[196727.311364] task: 885e6f069700 ti: 885e6f072000 
> > > >> > > task.ti: 885e6f072000
> > > >> > > <4>[196727.311377] RIP: 0010:[]  
> > > >> > > [] ipv4_dst_destroy+0x4f/0x80
> > > >> > > <4>[196727.311399] RSP: 0018:885effd23a70  EFLAGS: 00010282
> > > >> > > <4>[196727.311409] RAX: dead00200200 RBX: 8854c398ecc0 
> > > >> > > RCX: 0040
> > > >> > > <4>[196727.311423] RDX: dead00100100 RSI: dead00100100 
> > > >> > > RDI: dead00200200
> > > >> > > <4>[196727.311437] RBP: 885effd23a80 R08: 815fd9e0 
> > > >> > > R09: 885d5a590800
> > > >> > > <4>[196727.311451] R10:  R11:  
> > > >> > > R12: 
> > > >> > > <4>[196727.311464] R13: 81c8c280 R14:  
> > > >> > > R15: 880e85ee16ce
> > > >> > > <4>[196727.311510] FS:  () 
> > > >> > > GS:885effd2() knlGS:
> > > >> > > <4>[196727.311554] CS:  0010 DS:  ES:  CR0: 
> > > >> > > 80050033
> > > >> > > <4>[196727.311581] CR2: 7a46751eb000 CR3: 005e65688000 
> > > >> > > CR4: 000407e0
> > > >> > > <4>[196727.311625] DR0:  DR1:  
> > > >> > > DR2: 
> > > >> > > <4>[196727.311669] DR3:  DR6: 0ff0 
> > > >> > > DR7: 0400
> > > >> > > <4>[196727.311713] Stack:
> > > >> > > <4>[196727.311733]  8854c398ecc0 8854c398ecc0 
> > > >> > > 885effd23ab0 815b7f42
> > > >> > > <4>[196727.311784]  88be6595bc00 8854c398ecc0 
> > > >> > >  8854c398ecc0
> > > >> > > <4>[196727.311834]  885effd23ad0 815b86c6 
> > > >> > > 885d5a590800 8816827821c0
> > > >> > > <4>[196727.311885] Call Trace:
> > > >> > > <4>[196727.311907]  
> > > >> > > <4>[196727.311912]  [] dst_destroy+0x32/0xe0
> > > >> > > <4>[196727.311959]  [] dst_release+0x56/0x80
> > > >> > > <4>[196727.311986]  [] tcp_v4_do_rcv+0x2a5/0x4a0
> > > >> > > <4>[196727.312013]  [] tcp_v4_rcv+0x7da/0x820
> > > >> > > <4>[196727.312041]  [] ? 
> > > >> > > ip_rcv_finish+0x360/0x360
> > > >> > > <4>[196727.312070]  [] ? nf_hook_slow+0x7d/0x150
> > > >> > > <4>[196727.312097]  [] ? 

Re: ipv4_dst_destroy panic regression after 3.10.15

2014-01-21 Thread dormando

> On Fri, Jan 17, 2014 at 11:16 PM, dormando  wrote:
> >> On Fri, 2014-01-17 at 22:49 -0800, Eric Dumazet wrote:
> >> > On Fri, 2014-01-17 at 17:25 -0800, dormando wrote:
> >> > > Hi,
> >> > >
> >> > > Upgraded a few kernels to the latest 3.10 stable tree while tracking 
> >> > > down
> >> > > a rare kernel panic, seems to have introduced a much more frequent 
> >> > > kernel
> >> > > panic. Takes anywhere from 4 hours to 2 days to trigger:
> >> > >
> >> > > <4>[196727.311203] general protection fault:  [#1] SMP
> >> > > <4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan 
> >> > > bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode 
> >> > > ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis 
> >> > > tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit 
> >> > > ixgbe ptp pps_core mdio
> >> > > <4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 
> >> > > #1
> >> > > <4>[196727.311344] Hardware name: Supermicro 
> >> > > X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
> >> > > <4>[196727.311364] task: 885e6f069700 ti: 885e6f072000 
> >> > > task.ti: 885e6f072000
> >> > > <4>[196727.311377] RIP: 0010:[]  
> >> > > [] ipv4_dst_destroy+0x4f/0x80
> >> > > <4>[196727.311399] RSP: 0018:885effd23a70  EFLAGS: 00010282
> >> > > <4>[196727.311409] RAX: dead00200200 RBX: 8854c398ecc0 RCX: 
> >> > > 0040
> >> > > <4>[196727.311423] RDX: dead00100100 RSI: dead00100100 RDI: 
> >> > > dead00200200
> >> > > <4>[196727.311437] RBP: 885effd23a80 R08: 815fd9e0 R09: 
> >> > > 885d5a590800
> >> > > <4>[196727.311451] R10:  R11:  R12: 
> >> > > 
> >> > > <4>[196727.311464] R13: 81c8c280 R14:  R15: 
> >> > > 880e85ee16ce
> >> > > <4>[196727.311510] FS:  () 
> >> > > GS:885effd2() knlGS:
> >> > > <4>[196727.311554] CS:  0010 DS:  ES:  CR0: 80050033
> >> > > <4>[196727.311581] CR2: 7a46751eb000 CR3: 005e65688000 CR4: 
> >> > > 000407e0
> >> > > <4>[196727.311625] DR0:  DR1:  DR2: 
> >> > > 
> >> > > <4>[196727.311669] DR3:  DR6: 0ff0 DR7: 
> >> > > 0400
> >> > > <4>[196727.311713] Stack:
> >> > > <4>[196727.311733]  8854c398ecc0 8854c398ecc0 885effd23ab0 
> >> > > 815b7f42
> >> > > <4>[196727.311784]  88be6595bc00 8854c398ecc0  
> >> > > 8854c398ecc0
> >> > > <4>[196727.311834]  885effd23ad0 815b86c6 885d5a590800 
> >> > > 8816827821c0
> >> > > <4>[196727.311885] Call Trace:
> >> > > <4>[196727.311907]  
> >> > > <4>[196727.311912]  [] dst_destroy+0x32/0xe0
> >> > > <4>[196727.311959]  [] dst_release+0x56/0x80
> >> > > <4>[196727.311986]  [] tcp_v4_do_rcv+0x2a5/0x4a0
> >> > > <4>[196727.312013]  [] tcp_v4_rcv+0x7da/0x820
> >> > > <4>[196727.312041]  [] ? ip_rcv_finish+0x360/0x360
> >> > > <4>[196727.312070]  [] ? nf_hook_slow+0x7d/0x150
> >> > > <4>[196727.312097]  [] ? ip_rcv_finish+0x360/0x360
> >> > > <4>[196727.312125]  [] 
> >> > > ip_local_deliver_finish+0xb2/0x230
> >> > > <4>[196727.312154]  [] ip_local_deliver+0x4a/0x90
> >> > > <4>[196727.312183]  [] ip_rcv_finish+0x119/0x360
> >> > > <4>[196727.312212]  [] ip_rcv+0x22b/0x340
> >> > > <4>[196727.312242]  [] ? 
> >> > > macvlan_broadcast+0x160/0x160 [macvlan]
> >> > > <4>[196727.312275]  [] 
> >> > > __netif_receive_skb_core+0x512/0x640
> >> > > <4>[196727.312308]  [] ? kmem_cache_alloc+0x13b/0x150
> >> > > <4>[196727.312338]

Re: ipv4_dst_destroy panic regression after 3.10.15

2014-01-21 Thread dormando

> On Fri, Jan 17, 2014 at 11:16 PM, dormando <dorma...@rydia.net> wrote:
> >> On Fri, 2014-01-17 at 22:49 -0800, Eric Dumazet wrote:
> >> > On Fri, 2014-01-17 at 17:25 -0800, dormando wrote:
> >> > > Hi,
> >> > >
> >> > > Upgraded a few kernels to the latest 3.10 stable tree while tracking
> >> > > down
> >> > > a rare kernel panic, seems to have introduced a much more frequent
> >> > > kernel
> >> > > panic. Takes anywhere from 4 hours to 2 days to trigger:
> >> > >
> >> > > <4>[196727.311203] general protection fault:  [#1] SMP
> >> > > <4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan
> >> > > bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode
> >> > > ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis
> >> > > tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit
> >> > > ixgbe ptp pps_core mdio
> >> > > <4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26
> >> > > #1
> >> > > <4>[196727.311344] Hardware name: Supermicro
> >> > > X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
> >> > > <4>[196727.311364] task: 885e6f069700 ti: 885e6f072000
> >> > > task.ti: 885e6f072000
> >> > > <4>[196727.311377] RIP: 0010:[815f8c7f]
> >> > > [815f8c7f] ipv4_dst_destroy+0x4f/0x80
> >> > > <4>[196727.311399] RSP: 0018:885effd23a70  EFLAGS: 00010282
> >> > > <4>[196727.311409] RAX: dead00200200 RBX: 8854c398ecc0 RCX:
> >> > > 0040
> >> > > <4>[196727.311423] RDX: dead00100100 RSI: dead00100100 RDI:
> >> > > dead00200200
> >> > > <4>[196727.311437] RBP: 885effd23a80 R08: 815fd9e0 R09:
> >> > > 885d5a590800
> >> > > <4>[196727.311451] R10:  R11:  R12:
> >> > >
> >> > > <4>[196727.311464] R13: 81c8c280 R14:  R15:
> >> > > 880e85ee16ce
> >> > > <4>[196727.311510] FS:  ()
> >> > > GS:885effd2() knlGS:
> >> > > <4>[196727.311554] CS:  0010 DS:  ES:  CR0: 80050033
> >> > > <4>[196727.311581] CR2: 7a46751eb000 CR3: 005e65688000 CR4:
> >> > > 000407e0
> >> > > <4>[196727.311625] DR0:  DR1:  DR2:
> >> > >
> >> > > <4>[196727.311669] DR3:  DR6: 0ff0 DR7:
> >> > > 0400
> >> > > <4>[196727.311713] Stack:
> >> > > <4>[196727.311733]  8854c398ecc0 8854c398ecc0 885effd23ab0
> >> > > 815b7f42
> >> > > <4>[196727.311784]  88be6595bc00 8854c398ecc0
> >> > > 8854c398ecc0
> >> > > <4>[196727.311834]  885effd23ad0 815b86c6 885d5a590800
> >> > > 8816827821c0
> >> > > <4>[196727.311885] Call Trace:
> >> > > <4>[196727.311907]  <IRQ>
> >> > > <4>[196727.311912]  [815b7f42] dst_destroy+0x32/0xe0
> >> > > <4>[196727.311959]  [815b86c6] dst_release+0x56/0x80
> >> > > <4>[196727.311986]  [81620bd5] tcp_v4_do_rcv+0x2a5/0x4a0
> >> > > <4>[196727.312013]  [81622b5a] tcp_v4_rcv+0x7da/0x820
> >> > > <4>[196727.312041]  [815fd9e0] ? ip_rcv_finish+0x360/0x360
> >> > > <4>[196727.312070]  [815de02d] ? nf_hook_slow+0x7d/0x150
> >> > > <4>[196727.312097]  [815fd9e0] ? ip_rcv_finish+0x360/0x360
> >> > > <4>[196727.312125]  [815fda92]
> >> > > ip_local_deliver_finish+0xb2/0x230
> >> > > <4>[196727.312154]  [815fdd9a] ip_local_deliver+0x4a/0x90
> >> > > <4>[196727.312183]  [815fd799] ip_rcv_finish+0x119/0x360
> >> > > <4>[196727.312212]  [815fe00b] ip_rcv+0x22b/0x340
> >> > > <4>[196727.312242]  [a0339680] ?
> >> > > macvlan_broadcast+0x160/0x160 [macvlan]
> >> > > <4>[196727.312275]  [815b0c62]
> >> > > __netif_receive_skb_core+0x512/0x640
> >> > > <4>[196727.312308]  [811427fb] ? kmem_cache_alloc+0x13b/0x150
> >> > > <4>[196727.312338]  [815b0db1] __netif_receive_skb+0x21/0x70
> >> > > <4>[196727.312368]  [815b0fa1] netif_receive_skb+0x31/0xa0
> >> > > <4>[196727.312397]  [815b1ae8] napi_gro_receive+0xe8/0x140
> >> > > <4>[196727.312433]  [a00274f1] ixgbe_poll+0x551/0x11f0
> >> > > [ixgbe]
> >> > > <4>[196727.312463]  [815fe00b] ? ip_rcv+0x22b/0x340
> >> > > <4>[196727.312491]  [815b1691] net_rx_action+0x111/0x210
> >> > > <4>[196727.312521]  [815b0db1] ?
> >> > > __netif_receive_skb+0x21/0x70
> >> > > <4>[196727.312552]  [810519d0] __do_softirq+0xd0/0x270
> >> > > <4>[196727.312583]  [816cef3c] call_softirq+0x1c/0x30
> >> > > <4>[196727.312613]  [81004205] do_softirq+0x55/0x90
> >> > > <4>[196727.312640]  [81051c85] irq_exit+0x55/0x60
> >> > > <4>[196727.312668]  [816cf5c3] do_IRQ+0x63/0xe0
> >> > > <4>[196727.312696]  [816c5aaa] common_interrupt+0x6a/0x6a
> >> > > <4>[196727.312722]  <EOI>
> >> > > <4>[196727.312727]  [8100a150] ? default_idle+0x20/0xe0
> >> > > <4>[196727.312775]  [8100a8ff] arch_cpu_idle+0xf/0x20
> >> > > <4>[196727.312803]  [8108d330] cpu_startup_entry+0xc0/0x270
> >> > > <4>[196727.312833]  [816b276e] start_secondary+0x1f9/0x200
> >> > > <4>[196727.312860] Code: 4a 9f e9 81 e8 13 cb 0c 00 48 8b 93 b0 00 00
> >> > > 00 48 bf 00 02 20 00 00 00 ad de 48 8b 83 b8 00 00 00 48 be 00 01 10
> >> > > 00 00 00 ad de 48 89 42 08 48 89 10 48 89 bb b8 00 00 00 48 c7 c7 4a
> >> > > 9f e9 81
> >> > > <1>[196727.313071] RIP  [815f8c7f] ipv4_dst_destroy+0x4f

Re: ipv4_dst_destroy panic regression after 3.10.15

2014-01-21 Thread dormando



On Tue, 21 Jan 2014, Alexei Starovoitov wrote:

> On Tue, Jan 21, 2014 at 5:39 PM, dormando <dorma...@rydia.net> wrote:
> >
> > > On Fri, Jan 17, 2014 at 11:16 PM, dormando <dorma...@rydia.net> wrote:
> > > >> On Fri, 2014-01-17 at 22:49 -0800, Eric Dumazet wrote:
> > > >> > On Fri, 2014-01-17 at 17:25 -0800, dormando wrote:
> > > >> > > Hi,
> > > >> > >
> > > >> > > Upgraded a few kernels to the latest 3.10 stable tree while
> > > >> > > tracking down
> > > >> > > a rare kernel panic, seems to have introduced a much more frequent
> > > >> > > kernel
> > > >> > > panic. Takes anywhere from 4 hours to 2 days to trigger:
> > > >> > >
> > > >> > > <4>[196727.311203] general protection fault:  [#1] SMP
> > > >> > > <4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP
> > > >> > > macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich
> > > >> > > microcode
> ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm
> tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp
> pps_core mdio
> > > >> > > <4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted
> > > >> > > 3.10.26 #1
> > > >> > > <4>[196727.311344] Hardware name: Supermicro
> > > >> > > X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
> > > >> > > <4>[196727.311364] task: 885e6f069700 ti: 885e6f072000
> > > >> > > task.ti: 885e6f072000
> > > >> > > <4>[196727.311377] RIP: 0010:[815f8c7f]
> > > >> > > [815f8c7f] ipv4_dst_destroy+0x4f/0x80
> > > >> > > <4>[196727.311399] RSP: 0018:885effd23a70  EFLAGS: 00010282
> > > >> > > <4>[196727.311409] RAX: dead00200200 RBX: 8854c398ecc0
> > > >> > > RCX: 0040
> > > >> > > <4>[196727.311423] RDX: dead00100100 RSI: dead00100100
> > > >> > > RDI: dead00200200
> > > >> > > <4>[196727.311437] RBP: 885effd23a80 R08: 815fd9e0
> > > >> > > R09: 885d5a590800
> > > >> > > <4>[196727.311451] R10:  R11:
> > > >> > > R12:
> > > >> > > <4>[196727.311464] R13: 81c8c280 R14:
> > > >> > > R15: 880e85ee16ce
> > > >> > > <4>[196727.311510] FS:  ()
> > > >> > > GS:885effd2() knlGS:
> > > >> > > <4>[196727.311554] CS:  0010 DS:  ES:  CR0:
> > > >> > > 80050033
> > > >> > > <4>[196727.311581] CR2: 7a46751eb000 CR3: 005e65688000
> > > >> > > CR4: 000407e0
> > > >> > > <4>[196727.311625] DR0:  DR1:
> > > >> > > DR2:
> > > >> > > <4>[196727.311669] DR3:  DR6: 0ff0
> > > >> > > DR7: 0400
> > > >> > > <4>[196727.311713] Stack:
> > > >> > > <4>[196727.311733]  8854c398ecc0 8854c398ecc0
> > > >> > > 885effd23ab0 815b7f42
> > > >> > > <4>[196727.311784]  88be6595bc00 8854c398ecc0
> > > >> > >  8854c398ecc0
> > > >> > > <4>[196727.311834]  885effd23ad0 815b86c6
> > > >> > > 885d5a590800 8816827821c0
> > > >> > > <4>[196727.311885] Call Trace:
> > > >> > > <4>[196727.311907]  <IRQ>
> > > >> > > <4>[196727.311912]  [815b7f42] dst_destroy+0x32/0xe0
> > > >> > > <4>[196727.311959]  [815b86c6] dst_release+0x56/0x80
> > > >> > > <4>[196727.311986]  [81620bd5] tcp_v4_do_rcv+0x2a5/0x4a0
> > > >> > > <4>[196727.312013]  [81622b5a] tcp_v4_rcv+0x7da/0x820
> > > >> > > <4>[196727.312041]  [815fd9e0] ?
> > > >> > > ip_rcv_finish+0x360/0x360
> > > >> > > <4>[196727.312070]  [815de02d] ? nf_hook_slow+0x7d/0x150
> > > >> > > <4>[196727.312097]  [815fd9e0] ?
> > > >> > > ip_rcv_finish+0x360/0x360
> > > >> > > <4>[196727.312125]  [815fda92]
> > > >> > > ip_local_deliver_finish+0xb2/0x230
> > > >> > > <4>[196727.312154]  [815fdd9a] ip_local_deliver+0x4a/0x90
> > > >> > > <4>[196727.312183]  [815fd799] ip_rcv_finish+0x119/0x360
> > > >> > > <4>[196727.312212]  [815fe00b] ip_rcv+0x22b/0x340
> > > >> > > <4>[196727.312242]  [a0339680] ?
> > > >> > > macvlan_broadcast+0x160/0x160 [macvlan]
> > > >> > > <4>[196727.312275]  [815b0c62]
> > > >> > > __netif_receive_skb_core+0x512/0x640
> > > >> > > <4>[196727.312308]  [811427fb] ?
> > > >> > > kmem_cache_alloc+0x13b/0x150
> > > >> > > <4>[196727.312338]  [815b0db1]
> > > >> > > __netif_receive_skb+0x21/0x70
> > > >> > > <4>[196727.312368]  [815b0fa1]
> > > >> > > netif_receive_skb+0x31/0xa0
> > > >> > > <4>[196727.312397]  [815b1ae8]
> > > >> > > napi_gro_receive+0xe8/0x140
> > > >> > > <4>[196727.312433]  [a00274f1] ixgbe_poll+0x551/0x11f0
> > > >> > > [ixgbe]
> > > >> > > <4>[196727.312463]  [815fe00b] ? ip_rcv+0x22b/0x340
> > > >> > > <4>[196727.312491]  [815b1691] net_rx_action+0x111/0x210
> > > >> > > <4>[196727.312521]  [815b0db1] ?
> > > >> > > __netif_receive_skb+0x21/0x70
> > > >> > > <4>[196727.312552]  [810519d0] __do_softirq+0xd0/0x270
> > > >> > > <4>[196727.312583]  [816cef3c] call_softirq+0x1c/0x30
> > > >> > > <4>[196727.312613]  [81004205] do_softirq+0x55/0x90
> > > >> > > <4>[196727.312640]  [81051c85] irq_exit+0x55/0x60
> > > >> > > <4>[196727.312668]  [816cf5c3] do_IRQ+0x63/0xe0
> > > >> > > <4>[196727.312696]  [816c5aaa] common_interrupt+0x6a/0x6a
> > > >> > > <4>[196727.312722]  <EOI>
> > > >> > > <4>[196727.312727]  [8100a150] ? default_idle+0x20/0xe0
> > > >> > > <4>[196727.312775]  [8100a8ff] arch_cpu_idle+0xf/0x20
> > > >> > > <4>[196727.312803]  [8108d330]
> > > >> > > cpu_startup_entry+0xc0/0x270
> > > >> > > <4>[196727.312833

Re: ipv4_dst_destroy panic regression after 3.10.15

2014-01-19 Thread dormando

> On Fri, 2014-01-17 at 17:25 -0800, dormando wrote:
> > Hi,
> >
> > Upgraded a few kernels to the latest 3.10 stable tree while tracking down
> > a rare kernel panic, seems to have introduced a much more frequent kernel
> > panic. Takes anywhere from 4 hours to 2 days to trigger:
> >
> > <4>[196727.311203] general protection fault:  [#1] SMP
> > <4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge 
> > coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog 
> > ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios 
> > ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio
> > <4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1
> > <4>[196727.311344] Hardware name: Supermicro 
> > X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
> > <4>[196727.311364] task: 885e6f069700 ti: 885e6f072000 task.ti: 
> > 885e6f072000
> > <4>[196727.311377] RIP: 0010:[]  [] 
> > ipv4_dst_destroy+0x4f/0x80
> > <4>[196727.311399] RSP: 0018:885effd23a70  EFLAGS: 00010282
> > <4>[196727.311409] RAX: dead00200200 RBX: 8854c398ecc0 RCX: 
> > 0040
> > <4>[196727.311423] RDX: dead00100100 RSI: dead00100100 RDI: 
> > dead00200200
> > <4>[196727.311437] RBP: 885effd23a80 R08: 815fd9e0 R09: 
> > 885d5a590800
> > <4>[196727.311451] R10:  R11:  R12: 
> > 
> > <4>[196727.311464] R13: 81c8c280 R14:  R15: 
> > 880e85ee16ce
> > <4>[196727.311510] FS:  () GS:885effd2() 
> > knlGS:
> > <4>[196727.311554] CS:  0010 DS:  ES:  CR0: 80050033
> > <4>[196727.311581] CR2: 7a46751eb000 CR3: 005e65688000 CR4: 
> > 000407e0
> > <4>[196727.311625] DR0:  DR1:  DR2: 
> > 
> > <4>[196727.311669] DR3:  DR6: 0ff0 DR7: 
> > 0400
> > <4>[196727.311713] Stack:
> > <4>[196727.311733]  8854c398ecc0 8854c398ecc0 885effd23ab0 
> > 815b7f42
> > <4>[196727.311784]  88be6595bc00 8854c398ecc0  
> > 8854c398ecc0
> > <4>[196727.311834]  885effd23ad0 815b86c6 885d5a590800 
> > 8816827821c0
> > <4>[196727.311885] Call Trace:
> > <4>[196727.311907]  
> > <4>[196727.311912]  [] dst_destroy+0x32/0xe0
> > <4>[196727.311959]  [] dst_release+0x56/0x80
> > <4>[196727.311986]  [] tcp_v4_do_rcv+0x2a5/0x4a0
> > <4>[196727.312013]  [] tcp_v4_rcv+0x7da/0x820
> > <4>[196727.312041]  [] ? ip_rcv_finish+0x360/0x360
> > <4>[196727.312070]  [] ? nf_hook_slow+0x7d/0x150
> > <4>[196727.312097]  [] ? ip_rcv_finish+0x360/0x360
> > <4>[196727.312125]  [] ip_local_deliver_finish+0xb2/0x230
> > <4>[196727.312154]  [] ip_local_deliver+0x4a/0x90
> > <4>[196727.312183]  [] ip_rcv_finish+0x119/0x360
> > <4>[196727.312212]  [] ip_rcv+0x22b/0x340
> > <4>[196727.312242]  [] ? macvlan_broadcast+0x160/0x160 
> > [macvlan]
> > <4>[196727.312275]  [] 
> > __netif_receive_skb_core+0x512/0x640
> > <4>[196727.312308]  [] ? kmem_cache_alloc+0x13b/0x150
> > <4>[196727.312338]  [] __netif_receive_skb+0x21/0x70
> > <4>[196727.312368]  [] netif_receive_skb+0x31/0xa0
> > <4>[196727.312397]  [] napi_gro_receive+0xe8/0x140
> > <4>[196727.312433]  [] ixgbe_poll+0x551/0x11f0 [ixgbe]
> > <4>[196727.312463]  [] ? ip_rcv+0x22b/0x340
> > <4>[196727.312491]  [] net_rx_action+0x111/0x210
> > <4>[196727.312521]  [] ? __netif_receive_skb+0x21/0x70
> > <4>[196727.312552]  [] __do_softirq+0xd0/0x270
> > <4>[196727.312583]  [] call_softirq+0x1c/0x30
> > <4>[196727.312613]  [] do_softirq+0x55/0x90
> > <4>[196727.312640]  [] irq_exit+0x55/0x60
> > <4>[196727.312668]  [] do_IRQ+0x63/0xe0
> > <4>[196727.312696]  [] common_interrupt+0x6a/0x6a
> > <4>[196727.312722]  
> > <4>[196727.312727]  [] ? default_idle+0x20/0xe0
> > <4>[196727.312775]  [] arch_cpu_idle+0xf/0x20
> > <4>[196727.312803]  [] cpu_startup_entry+0xc0/0x270
> > <4>[196727.312833]  [] start_secondary+0x1f9/0x200
> > <4>[196727.312860] Code: 4a 9f e9 81 e8 13 cb 0c

Re: ipv4_dst_destroy panic regression after 3.10.15

2014-01-19 Thread dormando

> On Fri, 2014-01-17 at 17:25 -0800, dormando wrote:
> > Hi,
> >
> > Upgraded a few kernels to the latest 3.10 stable tree while tracking down
> > a rare kernel panic, seems to have introduced a much more frequent kernel
> > panic. Takes anywhere from 4 hours to 2 days to trigger:
> >
> > <4>[196727.311203] general protection fault:  [#1] SMP
> > <4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge
> > coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog
> > ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios
> > ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio
> > <4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1
> > <4>[196727.311344] Hardware name: Supermicro
> > X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
> > <4>[196727.311364] task: 885e6f069700 ti: 885e6f072000 task.ti:
> > 885e6f072000
> > <4>[196727.311377] RIP: 0010:[815f8c7f]  [815f8c7f]
> > ipv4_dst_destroy+0x4f/0x80
> > <4>[196727.311399] RSP: 0018:885effd23a70  EFLAGS: 00010282
> > <4>[196727.311409] RAX: dead00200200 RBX: 8854c398ecc0 RCX:
> > 0040
> > <4>[196727.311423] RDX: dead00100100 RSI: dead00100100 RDI:
> > dead00200200
> > <4>[196727.311437] RBP: 885effd23a80 R08: 815fd9e0 R09:
> > 885d5a590800
> > <4>[196727.311451] R10:  R11:  R12:
> >
> > <4>[196727.311464] R13: 81c8c280 R14:  R15:
> > 880e85ee16ce
> > <4>[196727.311510] FS:  () GS:885effd2()
> > knlGS:
> > <4>[196727.311554] CS:  0010 DS:  ES:  CR0: 80050033
> > <4>[196727.311581] CR2: 7a46751eb000 CR3: 005e65688000 CR4:
> > 000407e0
> > <4>[196727.311625] DR0:  DR1:  DR2:
> >
> > <4>[196727.311669] DR3:  DR6: 0ff0 DR7:
> > 0400
> > <4>[196727.311713] Stack:
> > <4>[196727.311733]  8854c398ecc0 8854c398ecc0 885effd23ab0
> > 815b7f42
> > <4>[196727.311784]  88be6595bc00 8854c398ecc0
> > 8854c398ecc0
> > <4>[196727.311834]  885effd23ad0 815b86c6 885d5a590800
> > 8816827821c0
> > <4>[196727.311885] Call Trace:
> > <4>[196727.311907]  <IRQ>
> > <4>[196727.311912]  [815b7f42] dst_destroy+0x32/0xe0
> > <4>[196727.311959]  [815b86c6] dst_release+0x56/0x80
> > <4>[196727.311986]  [81620bd5] tcp_v4_do_rcv+0x2a5/0x4a0
> > <4>[196727.312013]  [81622b5a] tcp_v4_rcv+0x7da/0x820
> > <4>[196727.312041]  [815fd9e0] ? ip_rcv_finish+0x360/0x360
> > <4>[196727.312070]  [815de02d] ? nf_hook_slow+0x7d/0x150
> > <4>[196727.312097]  [815fd9e0] ? ip_rcv_finish+0x360/0x360
> > <4>[196727.312125]  [815fda92] ip_local_deliver_finish+0xb2/0x230
> > <4>[196727.312154]  [815fdd9a] ip_local_deliver+0x4a/0x90
> > <4>[196727.312183]  [815fd799] ip_rcv_finish+0x119/0x360
> > <4>[196727.312212]  [815fe00b] ip_rcv+0x22b/0x340
> > <4>[196727.312242]  [a0339680] ? macvlan_broadcast+0x160/0x160
> > [macvlan]
> > <4>[196727.312275]  [815b0c62]
> > __netif_receive_skb_core+0x512/0x640
> > <4>[196727.312308]  [811427fb] ? kmem_cache_alloc+0x13b/0x150
> > <4>[196727.312338]  [815b0db1] __netif_receive_skb+0x21/0x70
> > <4>[196727.312368]  [815b0fa1] netif_receive_skb+0x31/0xa0
> > <4>[196727.312397]  [815b1ae8] napi_gro_receive+0xe8/0x140
> > <4>[196727.312433]  [a00274f1] ixgbe_poll+0x551/0x11f0 [ixgbe]
> > <4>[196727.312463]  [815fe00b] ? ip_rcv+0x22b/0x340
> > <4>[196727.312491]  [815b1691] net_rx_action+0x111/0x210
> > <4>[196727.312521]  [815b0db1] ? __netif_receive_skb+0x21/0x70
> > <4>[196727.312552]  [810519d0] __do_softirq+0xd0/0x270
> > <4>[196727.312583]  [816cef3c] call_softirq+0x1c/0x30
> > <4>[196727.312613]  [81004205] do_softirq+0x55/0x90
> > <4>[196727.312640]  [81051c85] irq_exit+0x55/0x60
> > <4>[196727.312668]  [816cf5c3] do_IRQ+0x63/0xe0
> > <4>[196727.312696]  [816c5aaa] common_interrupt+0x6a/0x6a
> > <4>[196727.312722]  <EOI>
> > <4>[196727.312727]  [8100a150] ? default_idle+0x20/0xe0
> > <4>[196727.312775]  [8100a8ff] arch_cpu_idle+0xf/0x20
> > <4>[196727.312803]  [8108d330] cpu_startup_entry+0xc0/0x270
> > <4>[196727.312833]  [816b276e] start_secondary+0x1f9/0x200
> > <4>[196727.312860] Code: 4a 9f e9 81 e8 13 cb 0c 00 48 8b 93 b0 00 00 00 48
> > bf 00 02 20 00 00 00 ad de 48 8b 83 b8 00 00 00 48 be 00 01 10 00 00 00 ad
> > de 48 89 42 08 48 89 10 48 89 bb b8 00 00 00 48 c7 c7 4a 9f e9 81
> > <1>[196727.313071] RIP  [815f8c7f] ipv4_dst_destroy+0x4f/0x80
> > <4>[196727.313100]  RSP 885effd23a70
> > <4>[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]---
> > <0>[196727.380908] Kernel panic - not syncing: Fatal exception in interrupt
> >
> >
> > ... bisecting it's going to be a pain... I tried eyeballing the diffs and
> > am trying a revert or two.
> >
> > We've hit it in .25, .26 so far. I have .27

Re: ipv4_dst_destroy panic regression after 3.10.15

2014-01-18 Thread dormando

> On Fri, Jan 17, 2014 at 11:16 PM, dormando  wrote:
> >> On Fri, 2014-01-17 at 22:49 -0800, Eric Dumazet wrote:
> >> > On Fri, 2014-01-17 at 17:25 -0800, dormando wrote:
> >> > > Hi,
> >> > >
> >> > > Upgraded a few kernels to the latest 3.10 stable tree while tracking 
> >> > > down
> >> > > a rare kernel panic, seems to have introduced a much more frequent 
> >> > > kernel
> >> > > panic. Takes anywhere from 4 hours to 2 days to trigger:
> >> > >
> >> > > <4>[196727.311203] general protection fault:  [#1] SMP
> >> > > <4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan 
> >> > > bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode 
> >> > > ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis 
> >> > > tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit 
> >> > > ixgbe ptp pps_core mdio
> >> > > <4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 
> >> > > #1
> >> > > <4>[196727.311344] Hardware name: Supermicro 
> >> > > X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
> >> > > <4>[196727.311364] task: 885e6f069700 ti: 885e6f072000 
> >> > > task.ti: 885e6f072000
> >> > > <4>[196727.311377] RIP: 0010:[]  
> >> > > [] ipv4_dst_destroy+0x4f/0x80
> >> > > <4>[196727.311399] RSP: 0018:885effd23a70  EFLAGS: 00010282
> >> > > <4>[196727.311409] RAX: dead00200200 RBX: 8854c398ecc0 RCX: 
> >> > > 0040
> >> > > <4>[196727.311423] RDX: dead00100100 RSI: dead00100100 RDI: 
> >> > > dead00200200
> >> > > <4>[196727.311437] RBP: 885effd23a80 R08: 815fd9e0 R09: 
> >> > > 885d5a590800
> >> > > <4>[196727.311451] R10:  R11:  R12: 
> >> > > 
> >> > > <4>[196727.311464] R13: 81c8c280 R14:  R15: 
> >> > > 880e85ee16ce
> >> > > <4>[196727.311510] FS:  () 
> >> > > GS:885effd2() knlGS:
> >> > > <4>[196727.311554] CS:  0010 DS:  ES:  CR0: 80050033
> >> > > <4>[196727.311581] CR2: 7a46751eb000 CR3: 005e65688000 CR4: 
> >> > > 000407e0
> >> > > <4>[196727.311625] DR0:  DR1:  DR2: 
> >> > > 
> >> > > <4>[196727.311669] DR3:  DR6: 0ff0 DR7: 
> >> > > 0400
> >> > > <4>[196727.311713] Stack:
> >> > > <4>[196727.311733]  8854c398ecc0 8854c398ecc0 885effd23ab0 
> >> > > 815b7f42
> >> > > <4>[196727.311784]  88be6595bc00 8854c398ecc0  
> >> > > 8854c398ecc0
> >> > > <4>[196727.311834]  885effd23ad0 815b86c6 885d5a590800 
> >> > > 8816827821c0
> >> > > <4>[196727.311885] Call Trace:
> >> > > <4>[196727.311907]  
> >> > > <4>[196727.311912]  [] dst_destroy+0x32/0xe0
> >> > > <4>[196727.311959]  [] dst_release+0x56/0x80
> >> > > <4>[196727.311986]  [] tcp_v4_do_rcv+0x2a5/0x4a0
> >> > > <4>[196727.312013]  [] tcp_v4_rcv+0x7da/0x820
> >> > > <4>[196727.312041]  [] ? ip_rcv_finish+0x360/0x360
> >> > > <4>[196727.312070]  [] ? nf_hook_slow+0x7d/0x150
> >> > > <4>[196727.312097]  [] ? ip_rcv_finish+0x360/0x360
> >> > > <4>[196727.312125]  [] 
> >> > > ip_local_deliver_finish+0xb2/0x230
> >> > > <4>[196727.312154]  [] ip_local_deliver+0x4a/0x90
> >> > > <4>[196727.312183]  [] ip_rcv_finish+0x119/0x360
> >> > > <4>[196727.312212]  [] ip_rcv+0x22b/0x340
> >> > > <4>[196727.312242]  [] ? 
> >> > > macvlan_broadcast+0x160/0x160 [macvlan]
> >> > > <4>[196727.312275]  [] 
> >> > > __netif_receive_skb_core+0x512/0x640
> >> > > <4>[196727.312308]  [] ? kmem_cache_alloc+0x13b/0x150
> >> > > <4>[196727.312338]

Re: kmem_cache_alloc panic in 3.10+

2014-01-18 Thread dormando

> On Sat, 2014-01-18 at 00:44 -0800, dormando wrote:
> > Hello again!
> >
> > We've had a rare crash that's existed between 3.10.0 and 3.10.15 at least
> > (trying newer stables now, but I can't tell if it was fixed, and it takes
> > weeks to reproduce).
> >
> > Unfortunately I can only get 8k back from pstore. The panic looks a bit
> > longer than that is caught in the log, but the bottom part is almost
> > always this same trace as this one:
> >
> > Panic#6 Part1
> > <4>[1197485.199166]  [] tcp_push+0x6c/0x90
> > <4>[1197485.199171]  [] tcp_sendmsg+0x109/0xd40
> > <4>[1197485.199179]  [] ? put_page+0x35/0x40
> > <4>[1197485.199185]  [] inet_sendmsg+0x45/0xb0
> > <4>[1197485.199191]  [] sock_aio_write+0x11e/0x130
> > <4>[1197485.199196]  [] ? inet_recvmsg+0x4f/0x80
> > <4>[1197485.199203]  [] do_sync_readv_writev+0x6d/0xa0
> > <4>[1197485.199209]  [] do_readv_writev+0xfb/0x2f0
> > <4>[1197485.199215]  [] ? __free_pages+0x35/0x40
> > <4>[1197485.199220]  [] ? free_pages+0x46/0x50
> > <4>[1197485.199226]  [] ? SyS_mincore+0x152/0x690
> > <4>[1197485.199231]  [] vfs_writev+0x48/0x60
> > <4>[1197485.199236]  [] SyS_writev+0x5f/0xd0
> > <4>[1197485.199243]  [] system_call_fastpath+0x16/0x1b
> > <4>[1197485.199247] Code: 65 4c 03 04 25 c8 cb 00 00 49 8b 50 08 4d 8b 28 
> > 49 8b 40 10 4d 85 ed 0f 84 84 00 00 00 48 85 c0 74 7f 49 63 44 24 20 49 8b 
> > 3c 24 <49> 8b 5c 05 00 48 8d 4a 01 4c 89 e8 65 48 0f c7 0f 0f 94 c0 3c
> > <1>[1197485.199290] RIP  [] kmem_cache_alloc+0x5a/0x130
> > <4>[1197485.199296]  RSP 
> > <4>[1197485.199299] CR2: 0001
> > <4>[1197485.199343] ---[ end trace 90fee06aa40b7304 ]---
> > <1>[1197485.263911] BUG: unable to handle kernel paging request at 
> > 0001
> > <1>[1197485.263923] IP: [] kmem_cache_alloc+0x5a/0x130
> > <4>[1197485.263932] PGD 3f43e5c067 PUD 0
> > <4>[1197485.263937] Oops:  [#5] SMP
> > <4>[1197485.263941] Modules linked in: ntfs vfat msdos fat macvlan bridge 
> > coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode sb_edac 
> > edac_core lpc_ich mfd_core ixgbe igb i2c_algo_bit mdio ptp pps_core
> > <4>[1197485.263966] CPU: 0 PID: 233846 Comm: cache-worker Tainted: G  D 
> >  3.10.15 #1
> > <4>[1197485.263972] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 2.0a 
> > 03/07/2013
> > <4>[1197485.263976] task: 883427f9dc00 ti: 8830d4312000 task.ti: 
> > 8830d4312000
> > <4>[1197485.263982] RIP: 0010:[]  [] 
> > kmem_cache_alloc+0x5a/0x130
> > <4>[1197485.263990] RSP: 0018:881fffc038c8  EFLAGS: 00010286
> > <4>[1197485.263994] RAX:  RBX: 81c8c740 RCX: 
> > 
> > <4>[1197485.263999] RDX: 29273024 RSI: 0020 RDI: 
> > 00015680
> > <4>[1197485.264004] RBP: 881fffc03908 R08: 881fffc15680 R09: 
> > 815bdd4b
> > <4>[1197485.264009] R10: 881c65d21800 R11:  R12: 
> > 881fff803800
> > <4>[1197485.264014] R13: 0001 R14:  R15: 
> > 
> > <4>[1197485.264019] FS:  7f8d855eb700() GS:881fffc0() 
> > knlGS:
> > <4>[1197485.264024] CS:  0010 DS:  ES:  CR0: 80050033
> > <4>[1197485.264028] CR2: 0001 CR3: 00308f258000 CR4: 
> > 000407f0
> > <4>[1197485.264032] DR0:  DR1:  DR2: 
> > 
> > <4>[1197485.264037] DR3:  DR6: 0ff0 DR7: 
> > 0400
> > <4>[1197485.264041] Stack:
> > <4>[1197485.264044]  881fffc03928 0020815d0d95 881fffc03938 
> > 81c8c740
> > <4>[1197485.264050]  881fce21 0001  
> > 
> > <4>[1197485.264056]  881fffc03958 815bdd4b 881fffc039a8 
> > 
> > <4>[1197485.264063] Call Trace:
> > <4>[1197485.264066]  
> > <4>[1197485.264069]  [] dst_alloc+0x5b/0x190
> > <4>[1197485.264080]  [] rt_dst_alloc+0x4c/0x50
> > <4>[1197485.264085]  [] __ip_route_output_key+0x270/0x880
> > <4>[1197485.264092]  [] ? try_to_wake_up+0x23e/0x2b0
> > <4>[1197485.264097]  [] ip_route_output_flow+0x27/0x60
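> > <4>[1197485.264102]  [] ip_queue_xmit+0x36a/0x390
> > <4>[1197485.264108]  [] tcp_transmit_skb+0x485/0x890
> > <4>[1197485.264113]  [] tcp_send_ack+0xf1/0x130
> > <4>[1197485.264118]  [] __tcp_ack_snd_check+0x5e/0xa0
> > <4>[1197485.264123]  [] tcp_rcv_state_process+0x8b2/0xb20
> > <4>[1197485.264128]  [] tcp_v4_do_rcv+0x191/0x4f0
> > <4>[1197485.264133]  [] tcp_v4_rcv+0x5fc/0x750
> > <4>[1197485.264138]  [] ? ip_rcv+0x350/0x350
> > <4>[1197485.264143]  [] ? nf_hook_slow+0x7d/0x160
> > <4>[1197485.264147]  [] ? ip_rcv+0x350/0x350
> > <4>[1197485.264152]  [] ip_local_deliver_finish+0xce/0x250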

kmem_cache_alloc panic in 3.10+

2014-01-18 Thread dormando

<4>[1197485.264175]  [] __netif_receive_skb+0x27/0x70
<4>[1197485.264180]  [] process_backlog+0xf4/0x1e0
<4>[1197485.264184]  [] net_rx_action+0xf5/0x250
<4>[1197485.264190]  [] __do_softirq+0xef/0x270
<4>[1197485.264196]  [] call_softirq+0x1c/0x30
<4>[1197485.264199]  
<4>[1197485.264201]  [] do_softirq+0x55/0x90
<4>[1197485.264209]  [] local_bh_enable+0x94/0xa0
<4>[1197485.264215]  [] ipt_do_table+0x22a/0x680
<4>[1197485.264221]  [] ? skb_clone_tx_timestamp+0x31/0x110
<4>[1197485.264231]  [] ? ixgbe_xmit_frame_ring+0x4c0/0xd40 
[ixgbe]
<4>[1197485.264239]  [] ? ixgbe_xmit_frame+0x43/0x90 [ixgbe]
<4>[1197485.264245]  [] iptable_raw_hook+0x33/0x70
<4>[1197485.264252]  [] nf_iterate+0x87/0xb0
<4>[1197485.264256]  [] ? ip_options_echo+0x420/0x420
<4>[1197485.264261]  [] nf_hook_slow+0x7d/0x160
<4>[1197485.264266]  [] ? ip_options_echo+0x420/0x420
<4>[1197485.264270]  [] __ip_local_out+0xa0/0xb0
<4>[1197485.264275]  [] ip_local_out+0x16/0x30
<4>[1197485.264280]  [] ip_queue_xmit+0x15a/0x390
<4>[1197485.264286]  [] ? tcp_v4_md5_lookup+0x13/0x20
<4>[1197485.264290]  [] tcp_transmit_skb+0x485/0x890
<4>[1197485.264295]  [] tcp_write_xmit+0x1b8/0xa50
<4>[1197485.264300]  [] ? __alloc_skb+0xa8/0x1f0
<4>[1197485.264304]  [] tcp_push_one+0x30/0x40
<4>[1197485.264309]  [] tcp_sendmsg+0xbe4/0xd40
<4>[1197485.264315]  [] ? put_page+0x35/0x40
<4>[1197485.264321]  [] inet_sendmsg+0x45/0xb0
<4>[1197485.264326]  [] sock_aio_write+0x11e/0x130
<4>[1197485.264331]  [] ? inet_recvmsg+0x4f/0x80
<4>[1197485.264337]  [] do_sync_readv_writev+0x6d/0xa0
<4>[1197485.264343]  [] do_readv_writev+0xfb/0x2f0
<4>[1197485.264347]  [] ? __free_pages+0x35/0x40
<4>[1197485.264352]  [] ? free_pages+0x46/0x50
<4>[1197485.264357]  [] ? SyS_mincore+0x152/0x690
<4>[1197485.264363]  [] vfs_writev+0x48/0x60
<4>[1197485.264367]  [] SyS_writev+0x5f/0xd0
<4>[1197485.264373]  [] system_call_fastpath+0x16/0x1b
<4>[1197485.264377] Code: 65 4c 03 04 25 c8 cb 00 00 49 8b 50 08 4d 8b 28 49 8b 
40 10 4d 85 ed 0f 84 84 00 00 00 48 85 c0 74 7f 49 63 44 24 20 49 8b 3c 24 <49> 
8b 5c 05 00 48 8d 4a 01 4c 89 e8 65 48 0f c7 0f 0f 94 c0 3c
<1>[1197485.264417] RIP  [] kmem_cache_alloc+0x5a/0x130
<4>[1197485.264424]  RSP 
<4>[1197485.264427] CR2: 0001
<4>[1197485.264431] ---[ end trace 90fee06aa40b7305 ]---
<0>[1197485.325141] Kernel panic - not syncing: Fatal exception in interrupt

... way down in the TCP code.

Any help would be appreciated :) I'll do what I can to help, but iterating
on this particular crash is very hard because it takes so long to
reproduce. We have a large number of machines, so some of them are always
crashing here and there, but once one does, it's not going to happen again
for a while.

Thanks!
-Dormando

Re: ipv4_dst_destroy panic regression after 3.10.15

2014-01-18 Thread dormando

 On Fri, Jan 17, 2014 at 11:16 PM, dormando dorma...@rydia.net wrote:
  On Fri, 2014-01-17 at 22:49 -0800, Eric Dumazet wrote:
   On Fri, 2014-01-17 at 17:25 -0800, dormando wrote:
Hi,
   
Upgraded a few kernels to the latest 3.10 stable tree while tracking 
down
a rare kernel panic, seems to have introduced a much more frequent 
kernel
panic. Takes anywhere from 4 hours to 2 days to trigger:
   
4[196727.311203] general protection fault:  [#1] SMP
4[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan 
bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode 
ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis 
tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit 
ixgbe ptp pps_core mdio
4[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 
#1
4[196727.311344] Hardware name: Supermicro 
X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
4[196727.311364] task: 885e6f069700 ti: 885e6f072000 
task.ti: 885e6f072000
4[196727.311377] RIP: 0010:[815f8c7f]  
[815f8c7f] ipv4_dst_destroy+0x4f/0x80
4[196727.311399] RSP: 0018:885effd23a70  EFLAGS: 00010282
4[196727.311409] RAX: dead00200200 RBX: 8854c398ecc0 RCX: 
0040
4[196727.311423] RDX: dead00100100 RSI: dead00100100 RDI: 
dead00200200
4[196727.311437] RBP: 885effd23a80 R08: 815fd9e0 R09: 
885d5a590800
4[196727.311451] R10:  R11:  R12: 

4[196727.311464] R13: 81c8c280 R14:  R15: 
880e85ee16ce
4[196727.311510] FS:  () 
GS:885effd2() knlGS:
4[196727.311554] CS:  0010 DS:  ES:  CR0: 80050033
4[196727.311581] CR2: 7a46751eb000 CR3: 005e65688000 CR4: 
000407e0
4[196727.311625] DR0:  DR1:  DR2: 

4[196727.311669] DR3:  DR6: 0ff0 DR7: 
0400
4[196727.311713] Stack:
4[196727.311733]  8854c398ecc0 8854c398ecc0 885effd23ab0 
815b7f42
4[196727.311784]  88be6595bc00 8854c398ecc0  
8854c398ecc0
4[196727.311834]  885effd23ad0 815b86c6 885d5a590800 
8816827821c0
4[196727.311885] Call Trace:
4[196727.311907]  IRQ
4[196727.311912]  [815b7f42] dst_destroy+0x32/0xe0
4[196727.311959]  [815b86c6] dst_release+0x56/0x80
4[196727.311986]  [81620bd5] tcp_v4_do_rcv+0x2a5/0x4a0
4[196727.312013]  [81622b5a] tcp_v4_rcv+0x7da/0x820
4[196727.312041]  [815fd9e0] ? ip_rcv_finish+0x360/0x360
4[196727.312070]  [815de02d] ? nf_hook_slow+0x7d/0x150
4[196727.312097]  [815fd9e0] ? ip_rcv_finish+0x360/0x360
4[196727.312125]  [815fda92] 
ip_local_deliver_finish+0xb2/0x230
4[196727.312154]  [815fdd9a] ip_local_deliver+0x4a/0x90
4[196727.312183]  [815fd799] ip_rcv_finish+0x119/0x360
4[196727.312212]  [815fe00b] ip_rcv+0x22b/0x340
4[196727.312242]  [a0339680] ? 
macvlan_broadcast+0x160/0x160 [macvlan]
4[196727.312275]  [815b0c62] 
__netif_receive_skb_core+0x512/0x640
4[196727.312308]  [811427fb] ? kmem_cache_alloc+0x13b/0x150
4[196727.312338]  [815b0db1] __netif_receive_skb+0x21/0x70
4[196727.312368]  [815b0fa1] netif_receive_skb+0x31/0xa0
4[196727.312397]  [815b1ae8] napi_gro_receive+0xe8/0x140
4[196727.312433]  [a00274f1] ixgbe_poll+0x551/0x11f0 
[ixgbe]
4[196727.312463]  [815fe00b] ? ip_rcv+0x22b/0x340
4[196727.312491]  [815b1691] net_rx_action+0x111/0x210
4[196727.312521]  [815b0db1] ? 
__netif_receive_skb+0x21/0x70
4[196727.312552]  [810519d0] __do_softirq+0xd0/0x270
4[196727.312583]  [816cef3c] call_softirq+0x1c/0x30
4[196727.312613]  [81004205] do_softirq+0x55/0x90
4[196727.312640]  [81051c85] irq_exit+0x55/0x60
4[196727.312668]  [816cf5c3] do_IRQ+0x63/0xe0
4[196727.312696]  [816c5aaa] common_interrupt+0x6a/0x6a
4[196727.312722]  EOI
4[196727.312727]  [8100a150] ? default_idle+0x20/0xe0
4[196727.312775]  [8100a8ff] arch_cpu_idle+0xf/0x20
4[196727.312803]  [8108d330] cpu_startup_entry+0xc0/0x270
4[196727.312833]  [816b276e] start_secondary+0x1f9/0x200
4[196727.312860] Code: 4a 9f e9 81 e8 13 cb 0c 00 48 8b 93 b0 00 00 
00 48 bf 00 02 20 00 00 00 ad de 48 8b 83 b8 00 00 00 48 be 00 01 10 
00 00 00 ad de 48 89 42 08 48 89 10 48 89 bb b8 00 00 00 48 c7 c7 4a 
9f e9 81
1[196727.313071] RIP  [815f8c7f] ipv4_dst_destroy+0x4f

Re: ipv4_dst_destroy panic regression after 3.10.15

2014-01-17 Thread dormando

> On Fri, 2014-01-17 at 22:49 -0800, Eric Dumazet wrote:
> > On Fri, 2014-01-17 at 17:25 -0800, dormando wrote:
> > > Hi,
> > >
> > > Upgraded a few kernels to the latest 3.10 stable tree while tracking down
> > > a rare kernel panic, seems to have introduced a much more frequent kernel
> > > panic. Takes anywhere from 4 hours to 2 days to trigger:
> > >
> > > <4>[196727.311203] general protection fault:  [#1] SMP
> > > <4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan 
> > > bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode 
> > > ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm 
> > > tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp 
> > > pps_core mdio
> > > <4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1
> > > <4>[196727.311344] Hardware name: Supermicro 
> > > X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
> > > <4>[196727.311364] task: 885e6f069700 ti: 885e6f072000 task.ti: 
> > > 885e6f072000
> > > <4>[196727.311377] RIP: 0010:[]  [] 
> > > ipv4_dst_destroy+0x4f/0x80
> > > <4>[196727.311399] RSP: 0018:885effd23a70  EFLAGS: 00010282
> > > <4>[196727.311409] RAX: dead00200200 RBX: 8854c398ecc0 RCX: 
> > > 0040
> > > <4>[196727.311423] RDX: dead00100100 RSI: dead00100100 RDI: 
> > > dead00200200
> > > <4>[196727.311437] RBP: 885effd23a80 R08: 815fd9e0 R09: 
> > > 885d5a590800
> > > <4>[196727.311451] R10:  R11:  R12: 
> > > 
> > > <4>[196727.311464] R13: 81c8c280 R14:  R15: 
> > > 880e85ee16ce
> > > <4>[196727.311510] FS:  () GS:885effd2() 
> > > knlGS:
> > > <4>[196727.311554] CS:  0010 DS:  ES:  CR0: 80050033
> > > <4>[196727.311581] CR2: 7a46751eb000 CR3: 005e65688000 CR4: 
> > > 000407e0
> > > <4>[196727.311625] DR0:  DR1:  DR2: 
> > > 
> > > <4>[196727.311669] DR3:  DR6: 0ff0 DR7: 
> > > 0400
> > > <4>[196727.311713] Stack:
> > > <4>[196727.311733]  8854c398ecc0 8854c398ecc0 885effd23ab0 
> > > 815b7f42
> > > <4>[196727.311784]  88be6595bc00 8854c398ecc0  
> > > 8854c398ecc0
> > > <4>[196727.311834]  885effd23ad0 815b86c6 885d5a590800 
> > > 8816827821c0
> > > <4>[196727.311885] Call Trace:
> > > <4>[196727.311907]  
> > > <4>[196727.311912]  [] dst_destroy+0x32/0xe0
> > > <4>[196727.311959]  [] dst_release+0x56/0x80
> > > <4>[196727.311986]  [] tcp_v4_do_rcv+0x2a5/0x4a0
> > > <4>[196727.312013]  [] tcp_v4_rcv+0x7da/0x820
> > > <4>[196727.312041]  [] ? ip_rcv_finish+0x360/0x360
> > > <4>[196727.312070]  [] ? nf_hook_slow+0x7d/0x150
> > > <4>[196727.312097]  [] ? ip_rcv_finish+0x360/0x360
> > > <4>[196727.312125]  [] 
> > > ip_local_deliver_finish+0xb2/0x230
> > > <4>[196727.312154]  [] ip_local_deliver+0x4a/0x90
> > > <4>[196727.312183]  [] ip_rcv_finish+0x119/0x360
> > > <4>[196727.312212]  [] ip_rcv+0x22b/0x340
> > > <4>[196727.312242]  [] ? macvlan_broadcast+0x160/0x160 
> > > [macvlan]
> > > <4>[196727.312275]  [] 
> > > __netif_receive_skb_core+0x512/0x640
> > > <4>[196727.312308]  [] ? kmem_cache_alloc+0x13b/0x150
> > > <4>[196727.312338]  [] __netif_receive_skb+0x21/0x70
> > > <4>[196727.312368]  [] netif_receive_skb+0x31/0xa0
> > > <4>[196727.312397]  [] napi_gro_receive+0xe8/0x140
> > > <4>[196727.312433]  [] ixgbe_poll+0x551/0x11f0 [ixgbe]
> > > <4>[196727.312463]  [] ? ip_rcv+0x22b/0x340
> > > <4>[196727.312491]  [] net_rx_action+0x111/0x210
> > > <4>[196727.312521]  [] ? __netif_receive_skb+0x21/0x70
> > > <4>[196727.312552]  [] __do_softirq+0xd0/0x270
> > > <4>[196727.312583]  [] call_softirq+0x1c/0x30
> > > <4>[196727.312613]  [] do_softirq+0x55/0x90
> > > <4>[196727.312640]  [] irq_exit+0x55/0x60

ipv4_dst_destroy panic regression after 3.10.15

2014-01-17 Thread dormando

Hi,

Upgrading a few kernels to the latest 3.10 stable tree, while tracking down
a rare kernel panic, seems to have introduced a much more frequent kernel
panic. It takes anywhere from 4 hours to 2 days to trigger:

<4>[196727.311203] general protection fault:  [#1] SMP
<4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge 
coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog 
ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si 
ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio
<4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1
<4>[196727.311344] Hardware name: Supermicro 
X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
<4>[196727.311364] task: 885e6f069700 ti: 885e6f072000 task.ti: 
885e6f072000
<4>[196727.311377] RIP: 0010:[]  [] 
ipv4_dst_destroy+0x4f/0x80
<4>[196727.311399] RSP: 0018:885effd23a70  EFLAGS: 00010282
<4>[196727.311409] RAX: dead00200200 RBX: 8854c398ecc0 RCX: 
0040
<4>[196727.311423] RDX: dead00100100 RSI: dead00100100 RDI: 
dead00200200
<4>[196727.311437] RBP: 885effd23a80 R08: 815fd9e0 R09: 
885d5a590800
<4>[196727.311451] R10:  R11:  R12: 

<4>[196727.311464] R13: 81c8c280 R14:  R15: 
880e85ee16ce
<4>[196727.311510] FS:  () GS:885effd2() 
knlGS:
<4>[196727.311554] CS:  0010 DS:  ES:  CR0: 80050033
<4>[196727.311581] CR2: 7a46751eb000 CR3: 005e65688000 CR4: 
000407e0
<4>[196727.311625] DR0:  DR1:  DR2: 

<4>[196727.311669] DR3:  DR6: 0ff0 DR7: 
0400
<4>[196727.311713] Stack:
<4>[196727.311733]  8854c398ecc0 8854c398ecc0 885effd23ab0 
815b7f42
<4>[196727.311784]  88be6595bc00 8854c398ecc0  
8854c398ecc0
<4>[196727.311834]  885effd23ad0 815b86c6 885d5a590800 
8816827821c0
<4>[196727.311885] Call Trace:
<4>[196727.311907]  
<4>[196727.311912]  [] dst_destroy+0x32/0xe0
<4>[196727.311959]  [] dst_release+0x56/0x80
<4>[196727.311986]  [] tcp_v4_do_rcv+0x2a5/0x4a0
<4>[196727.312013]  [] tcp_v4_rcv+0x7da/0x820
<4>[196727.312041]  [] ? ip_rcv_finish+0x360/0x360
<4>[196727.312070]  [] ? nf_hook_slow+0x7d/0x150
<4>[196727.312097]  [] ? ip_rcv_finish+0x360/0x360
<4>[196727.312125]  [] ip_local_deliver_finish+0xb2/0x230
<4>[196727.312154]  [] ip_local_deliver+0x4a/0x90
<4>[196727.312183]  [] ip_rcv_finish+0x119/0x360
<4>[196727.312212]  [] ip_rcv+0x22b/0x340
<4>[196727.312242]  [] ? macvlan_broadcast+0x160/0x160 
[macvlan]
<4>[196727.312275]  [] __netif_receive_skb_core+0x512/0x640
<4>[196727.312308]  [] ? kmem_cache_alloc+0x13b/0x150
<4>[196727.312338]  [] __netif_receive_skb+0x21/0x70
<4>[196727.312368]  [] netif_receive_skb+0x31/0xa0
<4>[196727.312397]  [] napi_gro_receive+0xe8/0x140
<4>[196727.312433]  [] ixgbe_poll+0x551/0x11f0 [ixgbe]
<4>[196727.312463]  [] ? ip_rcv+0x22b/0x340
<4>[196727.312491]  [] net_rx_action+0x111/0x210
<4>[196727.312521]  [] ? __netif_receive_skb+0x21/0x70
<4>[196727.312552]  [] __do_softirq+0xd0/0x270
<4>[196727.312583]  [] call_softirq+0x1c/0x30
<4>[196727.312613]  [] do_softirq+0x55/0x90
<4>[196727.312640]  [] irq_exit+0x55/0x60
<4>[196727.312668]  [] do_IRQ+0x63/0xe0
<4>[196727.312696]  [] common_interrupt+0x6a/0x6a
<4>[196727.312722]  
<4>[196727.312727]  [] ? default_idle+0x20/0xe0
<4>[196727.312775]  [] arch_cpu_idle+0xf/0x20
<4>[196727.312803]  [] cpu_startup_entry+0xc0/0x270
<4>[196727.312833]  [] start_secondary+0x1f9/0x200
<4>[196727.312860] Code: 4a 9f e9 81 e8 13 cb 0c 00 48 8b 93 b0 00 00 00 48 bf 
00 02 20 00 00 00 ad de 48 8b 83 b8 00 00 00 48 be 00 01 10 00 00 00 ad de <48> 
89 42 08 48 89 10 48 89 bb b8 00 00 00 48 c7 c7 4a 9f e9 81
<1>[196727.313071] RIP  [] ipv4_dst_destroy+0x4f/0x80
<4>[196727.313100]  RSP 
<4>[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]---
<0>[196727.380908] Kernel panic - not syncing: Fatal exception in interrupt


Bisecting it is going to be a pain. I tried eyeballing the diffs and am
trying a revert or two.

We've hit it in .25 and .26 so far. I have .27 running but am not sure yet
whether it has crashed, so the change was introduced somewhere between .15
and .25.

Thanks,
-Dormando
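
A note on reading the registers in the trace above. The values in RDX/RSI
and RAX/RDI (printed in this archive as dead00100100 and dead00200200) look
like the kernel's list-poison constants, LIST_POISON1 and LIST_POISON2,
which list_del() writes into an entry's next/prev pointers after unlinking
it. The faulting bytes in the Code: line (<48> 89 42 08) are a store through
RDX, which fits the "next->prev = prev" step of list_del() running on an
entry whose next pointer is already poisoned. As far as I can tell, in
3.10-era sources ipv4_dst_destroy() unlinks the route from the uncached
route list with list_del(), so this would mean the same entry is being
unlinked twice. Below is a minimal Python model of that behaviour, not kernel
code. The full 64-bit constants assume x86_64 with the default poison
offset, and ListHead, list_add, list_del and double_delete_error are
made-up names for illustration only.

```python
# Toy model of the kernel's list_del() poisoning; this is not kernel code.
# Constant values are an assumption: x86_64, 3.10-era kernel.
LIST_POISON1 = 0xdead000000100100  # stored in entry.next after unlinking
LIST_POISON2 = 0xdead000000200200  # stored in entry.prev after unlinking


class ListHead:
    """Hypothetical stand-in for struct list_head."""

    def __init__(self):
        # An empty list head points at itself in both directions.
        self.next = self
        self.prev = self


def list_add(new, head):
    """Insert `new` right after `head`."""
    new.next = head.next
    new.prev = head
    head.next.prev = new
    head.next = new


def list_del(entry):
    """Unlink `entry`, then poison its pointers as the kernel does."""
    if entry.next == LIST_POISON1 or entry.prev == LIST_POISON2:
        # In the kernel there is no such check: the store `next->prev = prev`
        # goes through the poison address and faults. Here we raise instead.
        raise RuntimeError(
            f"list_del on already-unlinked entry (next={LIST_POISON1:#x})"
        )
    entry.next.prev = entry.prev
    entry.prev.next = entry.next
    entry.next = LIST_POISON1
    entry.prev = LIST_POISON2


def double_delete_error():
    """Unlink the same entry twice and return the error text, if any."""
    head, entry = ListHead(), ListHead()
    list_add(entry, head)
    list_del(entry)      # first unlink succeeds and poisons the entry
    try:
        list_del(entry)  # second unlink hits the poisoned pointers
    except RuntimeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(double_delete_error())
```

If this reading is correct, it points at a dst being destroyed twice, or at
two paths racing on the same list entry. That is only a guess from the
register dump, not a diagnosis.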

Re: ipv4_dst_destroy panic regression after 3.10.15

2014-01-17 Thread dormando

 On Fri, 2014-01-17 at 22:49 -0800, Eric Dumazet wrote:
  On Fri, 2014-01-17 at 17:25 -0800, dormando wrote:
   Hi,
  
   Upgraded a few kernels to the latest 3.10 stable tree while tracking down
   a rare kernel panic, seems to have introduced a much more frequent kernel
   panic. Takes anywhere from 4 hours to 2 days to trigger:
  
   4[196727.311203] general protection fault:  [#1] SMP
   4[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan 
   bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode 
   ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm 
   tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp 
   pps_core mdio
   4[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1
   4[196727.311344] Hardware name: Supermicro 
   X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
   4[196727.311364] task: 885e6f069700 ti: 885e6f072000 task.ti: 
   885e6f072000
   4[196727.311377] RIP: 0010:[815f8c7f]  [815f8c7f] 
   ipv4_dst_destroy+0x4f/0x80
   4[196727.311399] RSP: 0018:885effd23a70  EFLAGS: 00010282
   4[196727.311409] RAX: dead00200200 RBX: 8854c398ecc0 RCX: 
   0040
   4[196727.311423] RDX: dead00100100 RSI: dead00100100 RDI: 
   dead00200200
   4[196727.311437] RBP: 885effd23a80 R08: 815fd9e0 R09: 
   885d5a590800
   4[196727.311451] R10:  R11:  R12: 
   
   4[196727.311464] R13: 81c8c280 R14:  R15: 
   880e85ee16ce
   4[196727.311510] FS:  () GS:885effd2() 
   knlGS:
   4[196727.311554] CS:  0010 DS:  ES:  CR0: 80050033
   4[196727.311581] CR2: 7a46751eb000 CR3: 005e65688000 CR4: 
   000407e0
   4[196727.311625] DR0:  DR1:  DR2: 
   
   4[196727.311669] DR3:  DR6: 0ff0 DR7: 
   0400
   4[196727.311713] Stack:
   4[196727.311733]  8854c398ecc0 8854c398ecc0 885effd23ab0 
   815b7f42
   4[196727.311784]  88be6595bc00 8854c398ecc0  
   8854c398ecc0
   4[196727.311834]  885effd23ad0 815b86c6 885d5a590800 
   8816827821c0
   4[196727.311885] Call Trace:
   4[196727.311907]  IRQ
   4[196727.311912]  [815b7f42] dst_destroy+0x32/0xe0
   4[196727.311959]  [815b86c6] dst_release+0x56/0x80
   4[196727.311986]  [81620bd5] tcp_v4_do_rcv+0x2a5/0x4a0
   4[196727.312013]  [81622b5a] tcp_v4_rcv+0x7da/0x820
   4[196727.312041]  [815fd9e0] ? ip_rcv_finish+0x360/0x360
   4[196727.312070]  [815de02d] ? nf_hook_slow+0x7d/0x150
   4[196727.312097]  [815fd9e0] ? ip_rcv_finish+0x360/0x360
   4[196727.312125]  [815fda92] 
   ip_local_deliver_finish+0xb2/0x230
   4[196727.312154]  [815fdd9a] ip_local_deliver+0x4a/0x90
   4[196727.312183]  [815fd799] ip_rcv_finish+0x119/0x360
   4[196727.312212]  [815fe00b] ip_rcv+0x22b/0x340
   4[196727.312242]  [a0339680] ? macvlan_broadcast+0x160/0x160 
   [macvlan]
   4[196727.312275]  [815b0c62] 
   __netif_receive_skb_core+0x512/0x640
   4[196727.312308]  [811427fb] ? kmem_cache_alloc+0x13b/0x150
   4[196727.312338]  [815b0db1] __netif_receive_skb+0x21/0x70
   4[196727.312368]  [815b0fa1] netif_receive_skb+0x31/0xa0
   4[196727.312397]  [815b1ae8] napi_gro_receive+0xe8/0x140
   4[196727.312433]  [a00274f1] ixgbe_poll+0x551/0x11f0 [ixgbe]
   4[196727.312463]  [815fe00b] ? ip_rcv+0x22b/0x340
   4[196727.312491]  [815b1691] net_rx_action+0x111/0x210
   4[196727.312521]  [815b0db1] ? __netif_receive_skb+0x21/0x70
   4[196727.312552]  [810519d0] __do_softirq+0xd0/0x270
   4[196727.312583]  [816cef3c] call_softirq+0x1c/0x30
   4[196727.312613]  [81004205] do_softirq+0x55/0x90
   4[196727.312640]  [81051c85] irq_exit+0x55/0x60
   4[196727.312668]  [816cf5c3] do_IRQ+0x63/0xe0
   4[196727.312696]  [816c5aaa] common_interrupt+0x6a/0x6a
   4[196727.312722]  EOI
   4[196727.312727]  [8100a150] ? default_idle+0x20/0xe0
   4[196727.312775]  [8100a8ff] arch_cpu_idle+0xf/0x20
   4[196727.312803]  [8108d330] cpu_startup_entry+0xc0/0x270
   4[196727.312833]  [816b276e] start_secondary+0x1f9/0x200
   4[196727.312860] Code: 4a 9f e9 81 e8 13 cb 0c 00 48 8b 93 b0 00 00 00 
   48 bf 00 02 20 00 00 00 ad de 48 8b 83 b8 00 00 00 48 be 00 01 10 00 00 
   00 ad de 48 89 42 08 48 89 10 48 89 bb b8 00 00 00 48 c7 c7 4a 9f e9 81
   1[196727.313071] RIP  [815f8c7f] ipv4_dst_destroy+0x4f/0x80
   4[196727.313100]  RSP 885effd23a70
   4[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]---
   0[196727.380908] Kernel panic - not syncing: Fatal exception in 
   interrupt
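
A note on reading the Code: bytes in the oops above. The byte runs that follow "48 bf" and "48 be" are 64-bit immediates stored least-significant byte first. The sketch below is illustrative only: the helper and macro names are made up here, and the poison constants are what one would expect from include/linux/poison.h on x86_64 (an assumption, not something stated in this thread). It reassembles the immediates so they can be compared against those constants:

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble 8 bytes, stored least-significant first (x86 order), into a u64. */
static uint64_t le64_from_bytes(const uint8_t b[8])
{
    uint64_t v = 0;

    for (size_t i = 0; i < 8; i++)
        v |= (uint64_t)b[i] << (8 * i);
    return v;
}

/* Immediates copied from the Code: line (the bytes after "48 bf" / "48 be"). */
static const uint8_t imm_after_48bf[8] = {
    0x00, 0x02, 0x20, 0x00, 0x00, 0x00, 0xad, 0xde
};
static const uint8_t imm_after_48be[8] = {
    0x00, 0x01, 0x10, 0x00, 0x00, 0x00, 0xad, 0xde
};

/*
 * Assumed x86_64 list poison values (0x00100100 / 0x00200200 plus a
 * 0xdead000000000000 pointer delta). Check against your own tree.
 */
#define ASSUMED_LIST_POISON1 0xdead000000100100ULL
#define ASSUMED_LIST_POISON2 0xdead000000200200ULL
```

If the decoded values match the list poison constants, that would be consistent with ipv4_dst_destroy() faulting partway through a list_del(), though the trace alone does not prove it.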

Re: ipv4: warnings on sk_wmem_queued

2013-08-31 Thread dormando

> I noticed these warnings on stock 3.10.9 running stress tests on
> cmogstored.git (git://bogomips.org/cmogstored.git) doing standard
> HTTP server stuff between lo and tmpfs:
>
[...]
> I was going to reboot into 3.10.10 before I looked at dmesg.  These
> warnings happened after ~8 hours of stress tests, and those stress tests
> are still running.

I had a kernel panic this morning on a production machine, also running
3.10.9. I only got a small part of the end of the trace, but it matches:

> Aug 30 06:03:54 localhost kernel: [813c0073] ip_queue_xmit+0x153/0x3c0
> Aug 30 06:03:54 localhost kernel: [813d6c25] tcp_transmit_skb+0x3c5/0x820
> Aug 30 06:03:54 localhost kernel: [813d72c1] tcp_write_xmit+0x191/0xaa0
> Aug 30 06:03:54 localhost kernel: [8138434c] ? __kmalloc_reserve.isra.49+0x3c/0xa0
> Aug 30 06:03:54 localhost kernel: [813d7c42] __tcp_push_pending_frames+0x32/0xa0
> Aug 30 06:03:54 localhost kernel: [813d8a8f] tcp_send_fin+0x6f/0x190
> Aug 30 06:03:54 localhost kernel: [813cc508] tcp_close+0x378/0x410
> Aug 30 06:03:54 localhost kernel: [813efe5a] inet_release+0x5a/0xa0
> Aug 30 06:03:54 localhost kernel: [8137a218] sock_release+0x28/0x90
> Aug 30 06:03:54 localhost kernel: [8137a5c2] sock_close+0x12/0x20
> Aug 30 06:03:54 localhost kernel: [81123def] __fput+0xaf/0x240
> Aug 30 06:03:54 localhost kernel: [8112403e] fput+0xe/0x10
> Aug 30 06:03:54 localhost kernel: [81054d47] task_work_run+0xa7/0xe0
> Aug 30 06:03:54 localhost kernel: [8100209c] do_notify_resume+0x9c/0xb0
> Aug 30 06:03:54 localhost kernel: [81430788] int_signal+0x12/0x17

... from there to here...

Then:
RIP  [8113c42a] kmem_cache_alloc+0x5a/0x130
 RSP 881fffca3958
---[ end trace 6ab931f3db28b31e ]---
Kernel panic - not syncing: Fatal exception in interrupt

Machine was running for a few days before panic'ing. I don't see anything
in 3.10.10 that would've affected this.

Thanks!

(also: hi Eric!)

Re: [PATCH 0/10] Reduce system disruption due to kswapd V2

2013-04-10 Thread dormando

> On Tue, Apr 09, 2013 at 05:27:18PM +0000, Christoph Lameter wrote:
> > One additional measure that may be useful is to make kswapd prefer one
> > specific processor on a socket. Two benefits arise from that:
> >
> > 1. Better use of cpu caches and therefore higher speed, less
> > serialization.
> >
>
> Considering the volume of pages that kswapd can scan when it's active
> I would expect that it trashes its cache anyway. The L1 cache would be
> flushed after scanning struct pages for just a few MB of memory.
>
> > 2. Reduction of the disturbances to one processor.
> >
>
> I've never checked it but I would have expected kswapd to stay on the
> same processor for significant periods of time. Have you experienced
> problems where kswapd bounces around on CPUs within a node causing
> workload disruption?

When kswapd shares the same CPU as our main process it causes a measurable
drop in response time (graphs show tiny spikes at the same time memory is
freed). Would be nice to be able to ensure it runs on a different core
than our latency sensitive processes at least. We can pin processes to
subsets of cores but I don't think there's a way to keep kswapd from
waking up on any of them?
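
For reference, the userspace pinning mentioned above looks roughly like the sketch below. It is a minimal example, assuming Linux with glibc; the helper names pin_self_to_cpus() and self_allowed_on() are made up for illustration. It only restricts the calling thread and does nothing about where kswapd wakes up, which is the open question here.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>

/* Pin the calling thread to the given CPUs. Returns 0 on success, -1 on error. */
static int pin_self_to_cpus(const int *cpus, size_t ncpus)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    for (size_t i = 0; i < ncpus; i++)
        CPU_SET(cpus[i], &set);

    /* pid 0 means "the calling thread" */
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Returns 1 if the calling thread may run on `cpu`, 0 if not, -1 on error. */
static int self_allowed_on(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return CPU_ISSET(cpu, &set) ? 1 : 0;
}
```

A latency-sensitive process would call pin_self_to_cpus() early at startup with the cores reserved for it, then use self_allowed_on() to confirm the mask took effect.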

Re: BUG: IPv4: Attempt to release TCP socket in state 1

2013-03-17 Thread dormando

> On Sat, 2013-03-16 at 10:36 -0700, Eric Dumazet wrote:
> > On Fri, 2013-03-15 at 00:19 +0100, Eric Dumazet wrote:
> >
> > > Thanks thats really useful, we might miss to increment socket refcount
> > > in a timer setup.
> > >
> >
> > Hmm, please add following debugging patch as well
> >
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index 14f6e9d..fe7c8a6 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -530,7 +530,9 @@ static inline void sock_hold(struct sock *sk)
> >   */
> >  static inline void __sock_put(struct sock *sk)
> >  {
> > -   atomic_dec(&sk->sk_refcnt);
> > +   int newref = atomic_dec_return(&sk->sk_refcnt);
> > +
> > +   BUG_ON(newref <= 0);
> >  }
> >
> >  static inline bool sk_del_node_init(struct sock *sk)
> > diff --git a/net/ipv4/inet_connection_sock.c 
> > b/net/ipv4/inet_connection_sock.c
> > index 786d97a..a445e15 100644
> > --- a/net/ipv4/inet_connection_sock.c
> > +++ b/net/ipv4/inet_connection_sock.c
> > @@ -739,7 +739,7 @@ void inet_csk_prepare_forced_close(struct sock *sk)
> >  {
> > /* sk_clone_lock locked the socket and set refcnt to 2 */
> > bh_unlock_sock(sk);
> > -   sock_put(sk);
> > +   __sock_put(sk);
> >
> > /* The below has to be done to allow calling inet_csk_destroy_sock */
> > sock_set_flag(sk, SOCK_DEAD);
> > @@ -835,13 +835,13 @@ void inet_csk_listen_stop(struct sock *sk)
> >  * tcp_v4_destroy_sock().
> >  */
> > tcp_sk(child)->fastopen_rsk = NULL;
> > -   sock_put(sk);
> > +   __sock_put(sk);
> > }
> > inet_csk_destroy_sock(child);
> >
> > bh_unlock_sock(child);
> > local_bh_enable();
> > -   sock_put(child);
> > +   __sock_put(child);
> >
>
> Please don't include the last line : this should stay as
>
>  sock_put(child);
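
The idea behind the debugging patch quoted above can be modelled in userspace. This is a rough sketch with C11 atomics, not kernel code; struct obj and the helper names are hypothetical. A "put" that is only legal when it is not the final reference reports the new count, and a count of zero or less means someone dropped a reference they did not hold:

```c
#include <stdatomic.h>

/* Stand-in for an object with a reference count (hypothetical type). */
struct obj {
    atomic_int refcnt;
};

/* Take an extra reference. */
static void obj_hold(struct obj *o)
{
    atomic_fetch_add(&o->refcnt, 1);
}

/*
 * Drop a reference that the caller believes is NOT the last one and
 * return the new count, in the spirit of atomic_dec_return().
 */
static int obj_put_not_last(struct obj *o)
{
    return atomic_fetch_sub(&o->refcnt, 1) - 1;
}

/* The condition the debug patch turns into BUG_ON(): new count <= 0. */
static int put_was_bogus(int newref)
{
    return newref <= 0;
}
```

Starting from a count of 2, the first obj_put_not_last() leaves 1 and is fine; a second one reaches 0 and is exactly the case the BUG_ON() is meant to catch.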

Hope you don't mind a screenshot:
http://www.dormando.me/p/3.8.2-trace-crash.jpg

(I put the patches on 3.8.2). box is on another continent so screenshot
via IPMI is what I get. If this isn't enough or isn't right I'll try
harder to get the trace logged, I guess?

Thanks!

Re: BUG: IPv4: Attempt to release TCP socket in state 1

2013-03-16 Thread dormando

> On Sat, 2013-03-16 at 10:36 -0700, Eric Dumazet wrote:
> > On Fri, 2013-03-15 at 00:19 +0100, Eric Dumazet wrote:
> >
> > > Thanks thats really useful, we might miss to increment socket refcount
> > > in a timer setup.
> > >
> >
> > Hmm, please add following debugging patch as well
> >
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index 14f6e9d..fe7c8a6 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -530,7 +530,9 @@ static inline void sock_hold(struct sock *sk)
> >   */
> >  static inline void __sock_put(struct sock *sk)
> >  {
> > -   atomic_dec(&sk->sk_refcnt);
> > +   int newref = atomic_dec_return(&sk->sk_refcnt);
> > +
> > +   BUG_ON(newref <= 0);
> >  }
> >
> >  static inline bool sk_del_node_init(struct sock *sk)
> > diff --git a/net/ipv4/inet_connection_sock.c 
> > b/net/ipv4/inet_connection_sock.c
> > index 786d97a..a445e15 100644
> > --- a/net/ipv4/inet_connection_sock.c
> > +++ b/net/ipv4/inet_connection_sock.c
> > @@ -739,7 +739,7 @@ void inet_csk_prepare_forced_close(struct sock *sk)
> >  {
> > /* sk_clone_lock locked the socket and set refcnt to 2 */
> > bh_unlock_sock(sk);
> > -   sock_put(sk);
> > +   __sock_put(sk);
> >
> > /* The below has to be done to allow calling inet_csk_destroy_sock */
> > sock_set_flag(sk, SOCK_DEAD);
> > @@ -835,13 +835,13 @@ void inet_csk_listen_stop(struct sock *sk)
> >  * tcp_v4_destroy_sock().
> >  */
> > tcp_sk(child)->fastopen_rsk = NULL;
> > -   sock_put(sk);
> > +   __sock_put(sk);
> > }
> > inet_csk_destroy_sock(child);
> >
> > bh_unlock_sock(child);
> > local_bh_enable();
> > -   sock_put(child);
> > +   __sock_put(child);
> >
>
> Please don't include the last line : this should stay as
>
>  sock_put(child);
>

thanks! Will take at least 24 hours to get the trace.

Re: BUG: IPv4: Attempt to release TCP socket in state 1

2013-03-14 Thread dormando

> On Thu, 2013-03-14 at 14:21 -0700, dormando wrote:
> > >
> > > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> > > index 68f6a94..1d4d97e 100644
> > > --- a/net/ipv4/af_inet.c
> > > +++ b/net/ipv4/af_inet.c
> > > @@ -141,8 +141,9 @@ void inet_sock_destruct(struct sock *sk)
> > >   sk_mem_reclaim(sk);
> > >
> > >   if (sk->sk_type == SOCK_STREAM && sk->sk_state != TCP_CLOSE) {
> > > - pr_err("Attempt to release TCP socket in state %d %p\n",
> > > -sk->sk_state, sk);
> > > + pr_err("Attempt to release TCP socket family %d in state %d %p\n",
> > > +sk->sk_family, sk->sk_state, sk);
> > > + WARN_ON_ONCE(1);
> > >   return;
> > >   }
> > >   if (!sock_flag(sk, SOCK_DEAD)) {
> >
> > [58377.436522] IPv4: Attempt to release TCP socket family 2 in state 1
> > 8813fbad9500
>
> There is no stack information on the WARN_ON_ONCE(1) ?

*sigh*. it's been a long month, sorry:

[58377.436522] IPv4: Attempt to release TCP socket family 2 in state 1
8813fbad9500
[58377.436539] ------------[ cut here ]------------
[58377.436545] WARNING: at net/ipv4/af_inet.c:146
inet_sock_destruct+0x176/0x200()
[58377.436546] Hardware name: X9DR3-F
[58377.436547] Modules linked in: bridge coretemp ghash_clmulni_intel
ipmi_watchdog ipmi_devintf gpio_ich microcode ixgbe sb_edac edac_core mei
lpc_ich mfd_core mdio ipmi_si ipmi_msghandler iptable_nat nf_nat_ipv4
nf_nat isci libsas igb ptp pps_core
[58377.436563] Pid: 0, comm: swapper/0 Not tainted 3.8.2 #3
[58377.436564] Call Trace:
[58377.436566]  <IRQ>  [8104964f] warn_slowpath_common+0x7f/0xc0
[58377.436572]  [810496aa] warn_slowpath_null+0x1a/0x20
[58377.436574]  [816032e6] inet_sock_destruct+0x176/0x200
[58377.436578]  [815ec8e0] ? tcp_write_timer_handler+0x1b0/0x1b0
[58377.436581]  [8156ee8d] __sk_free+0x1d/0x140
[58377.436583]  [815ec8e0] ? tcp_write_timer_handler+0x1b0/0x1b0
[58377.436585]  [8156efd5] sk_free+0x25/0x30
[58377.436586]  [815ec929] tcp_write_timer+0x49/0x70
[58377.436590]  [81059259] call_timer_fn+0x49/0x130
[58377.436593]  [8107a07f] ? scheduler_tick+0x15f/0x190
[58377.436596]  [81059854] run_timer_softirq+0x224/0x290
[58377.436598]  [81058f76] ? update_process_times+0x76/0x90
[58377.436600]  [815ec8e0] ? tcp_write_timer_handler+0x1b0/0x1b0
[58377.436602]  [8108ebd4] ? ktime_get+0x54/0xe0
[58377.436604]  [810518a7] __do_softirq+0xc7/0x230
[58377.436608]  [8168fd4c] call_softirq+0x1c/0x30
[58377.436611]  [81004415] do_softirq+0x55/0x90
[58377.436613]  [810516a5] irq_exit+0x85/0xa0
[58377.436616]  [8169036e] smp_apic_timer_interrupt+0x6e/0x99
[58377.436618]  [8168f74a] apic_timer_interrupt+0x6a/0x70
[58377.436619]  <EOI>  [816855cc] ? __schedule+0x3ac/0x750
[58377.436625]  [8100b1fd] ? mwait_idle+0xad/0x1f0
[58377.436627]  [8100a743] cpu_idle+0xb3/0x100
[58377.436629]  [816736a2] rest_init+0x72/0x80
[58377.436633]  [81cc7d0e] start_kernel+0x3ac/0x3b9
[58377.436635]  [81cc7790] ? repair_env_string+0x5b/0x5b
[58377.436636]  [81cc732d] x86_64_start_reservations+0x131/0x136
[58377.436638]  [81cc741f] x86_64_start_kernel+0xed/0xf4
[58377.436639] ---[ end trace 9e57364162374433 ]---

^ pretty sure that's the WARN_ON_ONCE(1)
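
For anyone decoding the numbers in the "family 2 in state 1" line: the sketch below maps them back to names. It is a minimal, hypothetical helper; the state table is written out by hand on the assumption that it matches the kernel's TCP state numbering (1 = ESTABLISHED through 11 = CLOSING), and the function names are invented for this example.

```c
#include <stddef.h>
#include <sys/socket.h>

/* Map a numeric sk_state to a name; the table is an assumed copy of the
 * kernel's TCP state numbering, written out by hand. */
static const char *tcp_state_name(int state)
{
    static const char *const names[] = {
        [1]  = "TCP_ESTABLISHED",
        [2]  = "TCP_SYN_SENT",
        [3]  = "TCP_SYN_RECV",
        [4]  = "TCP_FIN_WAIT1",
        [5]  = "TCP_FIN_WAIT2",
        [6]  = "TCP_TIME_WAIT",
        [7]  = "TCP_CLOSE",
        [8]  = "TCP_CLOSE_WAIT",
        [9]  = "TCP_LAST_ACK",
        [10] = "TCP_LISTEN",
        [11] = "TCP_CLOSING",
    };

    if (state < 1 || (size_t)state >= sizeof(names) / sizeof(names[0]))
        return "unknown";
    return names[state];
}

/* Map a numeric sk_family to a name for the two IP families. */
static const char *family_name(int family)
{
    switch (family) {
    case AF_INET:
        return "AF_INET";
    case AF_INET6:
        return "AF_INET6";
    default:
        return "other";
    }
}
```

Read that way, the message describes an AF_INET socket being freed while still in TCP_ESTABLISHED, whereas the check in inet_sock_destruct() expects TCP_CLOSE.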

Then a short while later the usual:

[58394.689801] ------------[ cut here ]------------
[58394.689817] WARNING: at net/sched/sch_generic.c:254
dev_watchdog+0x258/0x270()
[58394.689820] Hardware name: X9DR3-F
[58394.689836] NETDEV WATCHDOG: eth2 (ixgbe): transmit queue 14 timed out
[58394.689837] Modules linked in: bridge coretemp ghash_clmulni_intel
ipmi_watchdog ipmi_devintf gpio_ich microcode ixgbe sb_edac edac_core mei
lpc_ich mfd_core mdio ipmi_si ipmi_msghandler iptable_nat nf_nat_ipv4
nf_nat isci libsas igb ptp pps_core
[58394.689853] Pid: 0, comm: swapper/0 Tainted: GW
3.8.2 #3
[58394.689854] Call Trace:
[58394.689856]  <IRQ>  [8104964f] warn_slowpath_common+0x7f/0xc0
[58394.689863]  [81049746] warn_slowpath_fmt+0x46/0x50
[58394.689865]  [815a1508] dev_watchdog+0x258/0x270
[58394.689868]  [815a12b0] ? __netdev_watchdog_up+0x80/0x80
[58394.689872]  [81059259] call_timer_fn+0x49/0x130
[58394.689875]  [8107a07f] ? scheduler_tick+0x15f/0x190
[58394.689877]  [81059854] run_timer_softirq+0x224/0x290
[58394.689880]  [81058f76] ? update_process_times+0x76/0x90
[58394.689882]  [815a12b0] ? __netdev_watchdog_up+0x80/0x80
[58394.689884]  [8108ebd4] ? ktime_get+0x54/0xe0
[58394.689886]  [810518a7] __do_softirq+0xc7/0x230
[58394.689890]  [8168fd4c] call_softirq+0x1c/0x30
[58394.689894]  [81004415] do_softirq+0x55/0x90
[58394.689895]  [810516a5] irq_exit+0x85/0xa0
[58394.689898]  [8169036e] smp_apic_timer_interrupt+0x6e/0x99
[58394.689900]  [8168f74a] apic_timer_interrupt+0x6a/0x70
[58394.689901]  <EOI>  [816855cc] ? __schedule+0x3ac/0x750
[58394.689907]  [8100b1fd] ? mwait_idle+0xad/0x1f0
[58394.689909]  [8100a743] cpu_idle+0xb3/0x100
[58394.689911]  [816736a2] rest_init+0x72/0x80
[58394.689915]  [81cc7d0e] start_kernel+0x3ac/0x3b9
[58394.689917]  [81cc7790] ? repair_env_string+0x5b/0x5b
[58394.689918]  [81cc732d] x86_64_start_reservations+0x131/0x136
[58394.689920]  [81cc741f] x86_64_start_kernel+0xed/0xf4
[58394.689922] ---[ end trace 9e57364162374434 ]---
[5839

Re: BUG: IPv4: Attempt to release TCP socket in state 1

2013-03-07 Thread dormando

> On Wed, 2013-03-06 at 16:41 -0800, dormando wrote:
>
> > Ok... bridge module is loaded but nothing seems to be using it. No
> > bond/tunnels/anything enabled. I couldn't quickly figure out what was
> > causing it to load.
> >
> > We removed the need for macvlan, started machines with a fresh boot, and
> > they still crashed without it, after a few hours.
> >
> > Unfortunately I just saw a machine crash in the same way on 3.6.6 and
> > 3.6.9. I'm working on getting a completely pristine 3.6.6 and 3.6.9
> > tested. Our patches are minor but there were a few, so I'm backing it all
> > out just to be sure.
> >
> > Is there anything in particular which is most interesting? I can post lots
> > and lots and lots of information. Sadly bridge/macvlan weren't part of the
> > problem. .config, sysctls are easiest I guess? When this "hang" happens
> > the machine is still up somewhat, but we lose access to it. Syslog is
> > still writing entries to disk occasionally, so it's possible we could set
> > something up to dump more information.
> >
> > It takes a day or two to cycle this, so it might take a while to get
> > information and test crashes.
>
> Thanks !
>
> Please add a stack trace, it might help :
>
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 68f6a94..1d4d97e 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -141,8 +141,9 @@ void inet_sock_destruct(struct sock *sk)
>   sk_mem_reclaim(sk);
>
>   if (sk->sk_type == SOCK_STREAM && sk->sk_state != TCP_CLOSE) {
> - pr_err("Attempt to release TCP socket in state %d %p\n",
> -sk->sk_state, sk);
> + pr_err("Attempt to release TCP socket family %d in state %d %p\n",
> +sk->sk_family, sk->sk_state, sk);
> + WARN_ON_ONCE(1);
>   return;
>   }
>   if (!sock_flag(sk, SOCK_DEAD)) {

Ok. I have a pristine 3.6.6 up and testing now... It definitely looks like
we've been having this crash for quite a while, but much more rarely.
Recent changes in traffic have made it worse. I'll try your patch soon.

It'll take a few days to reproduce. I'll be back (ho ho ho). Please ping
with any ideas you folks might have in the meantime :(

Re: BUG: IPv4: Attempt to release TCP socket in state 1

2013-03-06 Thread dormando

> On Mon, 2013-03-04 at 21:44 -0800, dormando wrote:
>
> > No 3rd party modules. There's a tiny patch for controlling initcwnd from
> > userspace and another one for the extra_free_kbytes tunable that I brought
> > up in another thread. We've had the initcwnd patch in for a long time
> > without trouble. The extra_free_kbytes tunable isn't even being used yet,
> > so all that's doing is adding a 0 somewhere.
> >
> > Only two iptables rules loaded: global NOTRACK rules for PREROUTING/OUTPUT
> > in raw.
> >
> > Kernel's as close to pristine as I can make it. We had the 10g patch in
> > but I've dropped it.
> > --
>
> Hmm, I spent time on this bug report but found nothing.
>
> Please post as much information as you can on your setup.
>
> I see you use macvlan, bridge, so maybe there is a configuration issue
> (and a kernel bug of course)

Ok... bridge module is loaded but nothing seems to be using it. No
bond/tunnels/anything enabled. I couldn't quickly figure out what was
causing it to load.

We removed the need for macvlan, started machines with a fresh boot, and
they still crashed without it, after a few hours.

Unfortunately I just saw a machine crash in the same way on 3.6.6 and
3.6.9. I'm working on getting a completely pristine 3.6.6 and 3.6.9
tested. Our patches are minor but there were a few, so I'm backing it all
out just to be sure.

Is there anything in particular which is most interesting? I can post lots
and lots and lots of information. Sadly bridge/macvlan weren't part of the
problem. .config, sysctls are easiest I guess? When this "hang" happens
the machine is still up somewhat, but we lose access to it. Syslog is
still writing entries to disk occasionally, so it's possible we could set
something up to dump more information.

It takes a day or two to cycle this, so it might take a while to get
information and test crashes.

thanks,
-Dormando

Re: BUG: IPv4: Attempt to release TCP socket in state 1

2013-03-04 Thread dormando



On Mon, 4 Mar 2013, Eric Dumazet wrote:

> On Tue, 2013-03-05 at 11:47 +0800, Cong Wang wrote:
> > (Cc'ing the right netdev mailing list...)
> >
> > On 03/05/2013 08:01 AM, dormando wrote:
> > > Hi!
> > >
> > > I have a (core lockup?) with 3.7.6+ and 3.8.2 which appears to be under
> > > ixgbe. The machine appears to still be up but network stays in a severely
> > > hobbled state. Either lagging or not responding to the network at all.
> > >
> > > On a new box the hang happens within 8-24 hours of giving it production
> > > network traffic. On an older machine (6 cores instead of 8, etc) it can
> > > run for a week or more before hanging.
> > >
> > > The hang from 3.7 might be slightly different than 3.8. They seem to be
> > > mostly the same aside from 3.8 hanging in the GRO path. Don't see anything
> > > obvious in 3.9-rc1 that would fix it, and haven't tried 3.9-rc1.
> > >
> > > I've not yet figured out how to reproduce outside of production (as
> > > always, sigh). This doesn't seem to happen with 3.6.6, but we have
> > > different and less frequent kernel panics there.
> > >
>
> Dornando, do you use any kind of special setup, external modules,
> or netfilter ? (iptables-save output would help)
>
> Is it a pristine kernel, or a modified one ?
>

(Sigh. sorry for the misfire, thanks for fixing cc).

No 3rd party modules. There's a tiny patch for controlling initcwnd from
userspace and another one for the extra_free_kbytes tunable that I brought
up in another thread. We've had the initcwnd patch in for a long time
without trouble. The extra_free_kbytes tunable isn't even being used yet,
so all that's doing is adding a 0 somewhere.

Only two iptables rules loaded: global NOTRACK rules for PREROUTING/OUTPUT
in raw.

Kernel's as close to pristine as I can make it. We had the 10g patch in
but I've dropped it.

BUG: IPv4: Attempt to release TCP socket in state 1

2013-03-04 Thread dormando

5]  [] hrtimer_interrupt+0xf6/0x230
[37335.739871]  [] smp_apic_timer_interrupt+0x69/0x99
[37335.739874]  [] apic_timer_interrupt+0x6a/0x70
[37335.739878]  [] ? __inet_lookup_established+0xcf/0x2d0
[37335.739880]  [] ? inet_del_protocol+0x40/0x40
[37335.739884]  [] tcp_v4_early_demux+0xac/0x170
[37335.739886]  [] ip_rcv_finish+0x14d/0x360
[37335.739888]  [] ip_rcv+0x226/0x310
[37335.739892]  [] __netif_receive_skb+0x492/0x640
[37335.739895]  [] netif_receive_skb+0x2d/0x90
[37335.739897]  [] ? tcp4_gro_receive+0xb0/0x130
[37335.739899]  [] napi_gro_complete+0x95/0xe0
[37335.739901]  [] dev_gro_receive+0x2b6/0x3b0
[37335.739903]  [] napi_gro_receive+0x5b/0x130
[37335.739911]  [] ixgbe_poll+0x54a/0x1180 [ixgbe]
[37335.739915]  [] ? enqueue_task+0x6a/0x80
[37335.739917]  [] net_rx_action+0xf5/0x260
[37335.739919]  [] __do_softirq+0xc7/0x230
[37335.739922]  [] call_softirq+0x1c/0x30
[37335.739927]  [] do_softirq+0x55/0x90
[37335.739928]  [] irq_exit+0x85/0xa0
[37335.739931]  [] do_IRQ+0x66/0xe0
[37335.739937]  [] common_interrupt+0x6a/0x6a
[37335.739938]  <EOI>  [] ? __schedule+0x3ac/0x750
[37335.739943]  [] ? mwait_idle+0xad/0x1f0
[37335.739945]  [] cpu_idle+0xb3/0x100
[37335.739948]  [] start_secondary+0x1d7/0x1de
[37515.727179] INFO: rcu_sched self-detected stall on CPU { 24} (t=1005087 jiffies g=1985385 c=1985384 q=2087)
[37515.727246] Pid: 0, comm: swapper/24 Tainted: G        W    3.8.2 #2
[37515.727249] Call Trace:
[37515.727251]  <IRQ>  [] rcu_check_callbacks+0x21e/0x7c0
[37515.727265]  [] ? account_system_time+0xe8/0x1e0
[37515.727271]  [] update_process_times+0x48/0x90
[37515.727275]  [] tick_sched_timer+0x56/0x130
[37515.727279]  [] __run_hrtimer+0x7d/0x1c0
[37515.727281]  [] ? tick_setup_sched_timer+0x110/0x110
[37515.727283]  [] hrtimer_interrupt+0xf6/0x230
[37515.727289]  [] smp_apic_timer_interrupt+0x69/0x99
[37515.727292]  [] apic_timer_interrupt+0x6a/0x70
[37515.727296]  [] ? __inet_lookup_established+0xcb/0x2d0
[37515.727298]  [] ? inet_del_protocol+0x40/0x40
[37515.727302]  [] tcp_v4_early_demux+0xac/0x170
[37515.727304]  [] ip_rcv_finish+0x14d/0x360
[37515.727306]  [] ip_rcv+0x226/0x310
[37515.727310]  [] __netif_receive_skb+0x492/0x640
[37515.727312]  [] netif_receive_skb+0x2d/0x90
[37515.727315]  [] ? tcp4_gro_receive+0xb0/0x130
[37515.727317]  [] napi_gro_complete+0x95/0xe0
[37515.727319]  [] dev_gro_receive+0x2b6/0x3b0
[37515.727322]  [] napi_gro_receive+0x5b/0x130
[37515.727330]  [] ixgbe_poll+0x54a/0x1180 [ixgbe]
[37515.727334]  [] ? enqueue_task+0x6a/0x80
[37515.727336]  [] net_rx_action+0xf5/0x260
[37515.727338]  [] __do_softirq+0xc7/0x230
[37515.727341]  [] call_softirq+0x1c/0x30
[37515.727345]  [] do_softirq+0x55/0x90
[37515.727346]  [] irq_exit+0x85/0xa0
[37515.727349]  [] do_IRQ+0x66/0xe0
[37515.727354]  [] common_interrupt+0x6a/0x6a
[37515.727355]  <EOI>  [] ? __schedule+0x3ac/0x750
[37515.727360]  [] ? mwait_idle+0xad/0x1f0
[37515.727362]  [] cpu_idle+0xb3/0x100
[37515.727365]  [] start_secondary+0x1d7/0x1de

... then swapper just does this until someone reboots the box.

Apologies for the ugly paste.

Thanks,
-Dormando

Re: [PATCH] add extra free kbytes tunable

2013-02-19 Thread dormando

>
> The problem is that adding this tunable will constrain future VM
> implementations.  We will forever need to at least retain the
> pseudo-file.  We will also need to make some effort to retain its
> behaviour.
>
> It would of course be better to fix things so you don't need to tweak
> VM internals to get acceptable behaviour.

I sympathize with this. It's presently all that keeps us afloat though.
I'll whine about it again later if nothing else pans out.

> You said:
>
> : We have a server workload wherein machines with 100G+ of "free" memory
> : (used by page cache), scattered but frequent random io reads from 12+
> : SSD's, and 5gbps+ of internet traffic, will frequently hit direct reclaim
> : in a few different ways.
> :
> : 1) It'll run into small amounts of reclaim randomly (a few hundred
> : thousand).
> :
> : 2) A burst of reads or traffic can cause extra pressure, which kswapd
> : occasionally responds to by freeing up 40g+ of the pagecache all at once
> : (!) while pausing the system (Argh).
> :
> : 3) A blip in an upstream provider or failover from a peer causes the
> : kernel to allocate massive amounts of memory for retransmission
> : queues/etc, potentially along with buffered IO reads and (some, but not
> : often a ton) of new allocations from an application. This paired with 2)
> : can cause the box to stall for 15+ seconds.
>
> Can we prioritise these?  2) looks just awful - kswapd shouldn't just
> go off and free 40G of pagecache.  Do you know what's actually in that
> pagecache?  Large number of small files or small number of (very) large
> files?

We have a handful of huge files (6-12ish, 200g+ each) that are mmap'ed and
accessed via address. Occasionally madvise (WILLNEED) is applied to the
address ranges before attempting to use them. There's a mix of other
files but nothing significant. The mmap's are READONLY and writes are done
via pwrite-ish functions.
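
For anyone trying to reproduce the access pattern, here is a minimal userspace sketch of it: a read-only shared mapping, an madvise(MADV_WILLNEED) hint on a range before it is touched, and writes going through the file descriptor with pwrite() rather than through the mapping. The helper names (map_readonly, prefetch_range, write_at) are made up for illustration and are not taken from our application; error handling is reduced to return codes.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>      /* open() flags for callers */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map the first `len` bytes of `fd` read-only. */
static const unsigned char *map_readonly(int fd, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

    return p == MAP_FAILED ? NULL : (const unsigned char *)p;
}

/*
 * Hypothetical helper: hint that [off, off + len) is about to be read.
 * madvise() wants a page-aligned start address, so round the offset
 * down to a page boundary and grow the length by the same amount.
 */
static int prefetch_range(const unsigned char *base, size_t off, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t start = off & ~(page - 1);

    assert(base != NULL);
    return madvise((void *)(base + start), len + (off - start),
                   MADV_WILLNEED);
}

/* Hypothetical helper: write via the fd, never via the mapping. */
static int write_at(int fd, const void *buf, size_t len, off_t off)
{
    return pwrite(fd, buf, len, off) == (ssize_t)len ? 0 : -1;
}
```

A caller would open the file O_RDWR, call map_readonly() once, then call prefetch_range() on whatever range it is about to read. Because the mapping is MAP_SHARED, data written with write_at() shows up through the mapping via the page cache.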

I could use some guidance on inspecting/tracing the problem. I've been
trying to reproduce it in a lab, and with respect to issue 2) I've found:

- The amount of memory freed back up is either a percentage of total
memory or a percentage of free memory. (a machine with 48G of ram will
"only" free up an extra 4-7g)

- It's most likely to happen after a fresh boot, or if "3 > drop_caches"
is applied with the application down. As it fills it seems to get itself
into trouble, but becomes more stable after that. Unfortunately 1) and 3)
still apply to a stable instance.

- Protecting the DMA32 zone with something like "1 1 32" into
lowmem_reserve_ratio makes the mass-reclaiming less likely to happen.

- While watching "sar -B 1" I'll see kswapd wake up, and scan up to a few
hundred thousand pages before finding anything it actually wants to
reclaim (low vmeff). I've only been able to reproduce this from a clean
start. It can take up to 3 seconds before kswapd starts actually
reclaiming pages.

- So far as I can tell we're almost exclusively using 0 order allocations.
THP is disabled.
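
To make "low vmeff" concrete: as far as I understand the sar -B output, %vmeff is pages stolen divided by pages scanned (kswapd plus direct scans), as a percentage. A small sketch of that arithmetic follows; the function name and the idea of feeding it per-interval counters are assumptions for illustration, not sysstat's actual code.

```c
#include <assert.h>

/*
 * Reclaim efficiency in percent: pages actually reclaimed ("stolen")
 * per page scanned during the same interval.  Returns 0 when nothing
 * was scanned, so an idle interval does not divide by zero.
 */
static double vmeff_percent(unsigned long scan_kswapd,
                            unsigned long scan_direct,
                            unsigned long stolen)
{
    unsigned long scanned = scan_kswapd + scan_direct;

    assert(scanned >= scan_kswapd);   /* guard against wraparound */
    if (scanned == 0)
        return 0.0;
    return (double)stolen * 100.0 / (double)scanned;
}
```

With kswapd scanning 300000 pages and reclaiming only 3000 of them, this gives 1%, which is the kind of interval described above where kswapd burns time scanning before it finds anything to free.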

There's not much dirty memory involved. It's not flushing out writes while
reclaiming, it just kills off massive amounts of cached memory.

We're not running the machines particularly hard... Often less than 30%
CPU usage at peak.

Re: extra free kbytes tunable

2013-02-17 Thread dormando



On Fri, 15 Feb 2013, Rik van Riel wrote:

> On 02/15/2013 05:21 PM, Seiji Aguchi wrote:
> > Rik, Satoru,
> >
> > Do you have any comments?
>
> IIRC at the time the patch was rejected as too inelegant.
>
> However, nobody else seems to have come up with a better plan, and
> there are users in need of a fix for this problem.
>
> I would still like to see a fix for the problem merged upstream.

I merged in the cleanups to your original patch, rebased it off of Linus'
master from a day or two ago and re-sent (not sure how to preserve
authorship in that case? Apologies for goofing it).

I'm willing to argue it, or investigate better options. I'm going to be
stuck maintaining this patch since we can't really afford to have
production hang, or waste 12g+ of RAM per box.
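
A rough model of why the tunable saves memory, under loudly stated assumptions: a single zone holding all of RAM, and the 3.x-era rule that the low watermark sits a quarter of min above min and the high watermark half of min above min. The extra term reflects my reading of what the proposed patch does (it is not upstream behaviour), and the struct and function names are invented for this sketch.

```c
#include <assert.h>

/* Watermarks of one zone, in pages. */
struct wmarks {
    unsigned long min;
    unsigned long low;
    unsigned long high;
};

/*
 * Simplified single-zone model.  Without the tunable, the only way to
 * widen the min..low gap (the room kswapd has before allocations fall
 * into direct reclaim) is to raise min itself, and the gap is only
 * min / 4.  With the tunable, extra_free_kbytes is added on top of
 * low and high directly.
 */
static struct wmarks zone_wmarks(unsigned long min_free_kbytes,
                                 unsigned long extra_free_kbytes,
                                 unsigned long page_kb)
{
    struct wmarks w;
    unsigned long extra;

    assert(page_kb > 0);
    extra = extra_free_kbytes / page_kb;
    w.min = min_free_kbytes / page_kb;
    w.low = w.min + extra + w.min / 4;
    w.high = w.min + extra + w.min / 2;
    return w;
}
```

In this model, getting a given gap G between min and low costs about 4 * G of min_free_kbytes when min is the only knob, versus G plus a small min when extra_free_kbytes is available. That difference is the RAM we would otherwise have to give up per box.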

> > > -Original Message-
> > > From: linux-kernel-ow...@vger.kernel.org
> > > [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of dormando
> > > Sent: Monday, February 11, 2013 9:01 PM
> > > To: Rik van Riel
> > > Cc: Randy Dunlap; Satoru Moriya; linux-kernel@vger.kernel.org;
> > > linux...@kvack.org; lwood...@redhat.com; Seiji Aguchi;
> > > a...@linux-foundation.org; hu...@google.com
> > > Subject: extra free kbytes tunable
> > >
> > > Hi,
> > >
> > > As discussed in this thread:
> > > http://marc.info/?l=linux-mm&m=131490523222031&w=2
> > > (with this cleanup as well: https://lkml.org/lkml/2011/9/2/225)
> > >
> > > A tunable was proposed to allow specifying the distance between pages_min
> > > and the low watermark before kswapd is kicked in to
> > > free up pages. I'd like to re-open this thread since the patch did not
> > > appear to go anywhere.
> > >
> > > We have a server workload wherein machines with 100G+ of "free" memory
> > > (used by page cache), scattered but frequent random io
> > > reads from 12+ SSD's, and 5gbps+ of internet traffic, will frequently hit
> > > direct reclaim in a few different ways.
> > >
> > > 1) It'll run into small amounts of reclaim randomly (a few hundred
> > > thousand).
> > >
> > > 2) A burst of reads or traffic can cause extra pressure, which kswapd
> > > occasionally responds to by freeing up 40g+ of the pagecache all
> > > at once
> > > (!) while pausing the system (Argh).
> > >
> > > 3) A blip in an upstream provider or failover from a peer causes the
> > > kernel to allocate massive amounts of memory for retransmission
> > > queues/etc, potentially along with buffered IO reads and (some, but not
> > > often a ton) of new allocations from an application. This
> > > paired with 2) can cause the box to stall for 15+ seconds.
> > >
> > > We're seeing this more in 3.4/3.5/3.6, saw it less in 2.6.38. Mass
> > > reclaims are more common in newer kernels, but reclaims still happen
> > > in all kernels without raising min_free_kbytes dramatically.
> > >
> > > I've found that setting "lowmem_reserve_ratio" to something like "1 1 32"
> > > (thus protecting the DMA32 zone) causes 2) to happen less often, and is
> > > generally less violent with 1).
> > >
> > > Setting min_free_kbytes to 15G or more, paired with the above, has been
> > > the best at mitigating the issue. This is simply trying to raise
> > > the distance between the min and low watermarks. With min_free_kbytes set
> > > to 1500, that gives us a whopping 1.8G (!!!) of
> > > leeway before slamming into direct reclaim.
> > >
> > > So, this patch is unfortunate but wonderful at letting us reclaim 10G+ of
> > > otherwise lost memory. Could we please revisit it?
> > >
> > > I saw a lot of discussion on doing this automatically, or making kswapd
> > > more efficient to it, and I'd love to do that. Beyond making
> > > kswapd psychic I haven't seen any better options yet.
> > >
> > > The issue is more complex than simply having an application warn of an
> > > impending allocation, since this can happen via read load on
> > > disk or from kernel page allocations for the network, or a combination of
> > > the two (or three, if you add the app back in).
> > >
> > > It's going to get worse as we push machines with faster SSD's and bigger
> > > networks. I'm open to any ideas on how to make kswapd
> > > more efficient in our case, or really anything at all that works.
> > >
> > > I have more details, but cut it down as much as I could for this mail.
> > >
> > > Thanks,
> > > -Dormando
>
>
> --
> All rights reversed
>

[PATCH] add extra free kbytes tunable

2013-02-17 Thread dormando

From: Rik van Riel 

Add a userspace visible knob to tell the VM to keep an extra amount
of memory free, by increasing the gap between each zone's min and
low watermarks.

This is useful for realtime applications that call system
calls and have a bound on the number of allocations that happen
in any short time period.  In this application, extra_free_kbytes
would be left at an amount equal to or larger than the
maximum number of allocations that happen in any burst.

It may also be useful to reduce the memory use of virtual
machines (temporarily?), in a way that does not cause memory
fragmentation like ballooning does.
---
 Documentation/sysctl/vm.txt |   16 ++++++++++++++++
 include/linux/mmzone.h      |    2 +-
 include/linux/swap.h        |    2 ++
 kernel/sysctl.c             |   11 +++++++++--
 mm/page_alloc.c             |   39 +++++++++++++++++++++++++++++----------
 5 files changed, 57 insertions(+), 13 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 078701f..5d12bbd 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -28,6 +28,7 @@ Currently, these files are in /proc/sys/vm:
 - dirty_writeback_centisecs
 - drop_caches
 - extfrag_threshold
+- extra_free_kbytes
 - hugepages_treat_as_movable
 - hugetlb_shm_group
 - laptop_mode
@@ -167,6 +168,21 @@ fragmentation index is <= extfrag_threshold. The default value is 500.

 ==

+extra_free_kbytes
+
+This parameter tells the VM to keep extra free memory between the threshold
+where background reclaim (kswapd) kicks in, and the threshold where direct
+reclaim (by allocating processes) kicks in.
+
+This is useful for workloads that require low latency memory allocations
+and have a bounded burstiness in memory allocations, for example a
+realtime application that receives and transmits network traffic
+(causing in-kernel memory allocations) with a maximum total message burst
+size of 200MB may need 200MB of extra free memory to avoid direct reclaim
+related latencies.
+
+==
+
 hugepages_treat_as_movable

 This parameter is only useful when kernelcore= is specified at boot time to
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 73b64a3..7f8f883 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -881,7 +881,7 @@ static inline int is_dma(struct zone *zone)

 /* These two functions are used to setup the per zone pages min values */
 struct ctl_table;
-int min_free_kbytes_sysctl_handler(struct ctl_table *, int,
+int free_kbytes_sysctl_handler(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
 extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1];
 int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *, int,
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 68df9c1..66a12c4 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -215,6 +215,8 @@ struct swap_list_t {
 /* linux/mm/page_alloc.c */
 extern unsigned long totalram_pages;
 extern unsigned long totalreserve_pages;
+extern int min_free_kbytes;
+extern int extra_free_kbytes;
 extern unsigned long dirty_balance_reserve;
 extern unsigned int nr_free_buffer_pages(void);
 extern unsigned int nr_free_pagecache_pages(void);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c88878d..102e9a1 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -104,7 +104,6 @@ extern char core_pattern[];
 extern unsigned int core_pipe_limit;
 #endif
 extern int pid_max;
-extern int min_free_kbytes;
 extern int pid_max_min, pid_max_max;
 extern int sysctl_drop_caches;
 extern int percpu_pagelist_fraction;
@@ -1246,10 +1245,18 @@ static struct ctl_table vm_table[] = {
.data   = &min_free_kbytes,
.maxlen = sizeof(min_free_kbytes),
.mode   = 0644,
-   .proc_handler   = min_free_kbytes_sysctl_handler,
+   .proc_handler   = free_kbytes_sysctl_handler,
.extra1 = &zero,
},
{
+   .procname   = "extra_free_kbytes",
+   .data   = &extra_free_kbytes,
+   .maxlen = sizeof(extra_free_kbytes),
+   .mode   = 0644,
+   .proc_handler   = free_kbytes_sysctl_handler,
+   .extra1 = &zero,
+   },
+   {
.procname   = "percpu_pagelist_fraction",
.data   = &percpu_pagelist_fraction,
.maxlen = sizeof(percpu_pagelist_fraction),
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9673d96..5380d84 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -194,8 +194,21 @@ static char * const zone_names[MAX_NR_ZONES] = {
 "Movable",
 };

+/*
+ * Try to keep at least this much lowmem free.  Do not allow normal
+ * allocations below this point, only high priority ones. Automatically

[PATCH] add extra free kbytes tunable

2013-02-17 Thread dormando

From: Rik van Riel r...@redhat.com

Add a userspace visible knob to tell the VM to keep an extra amount
of memory free, by increasing the gap between each zone's min and
low watermarks.

This is useful for realtime applications that call system
calls and have a bound on the number of allocations that happen
in any short time period.  In this application, extra_free_kbytes
would be left at an amount equal to or larger than the
maximum number of allocations that happen in any burst.

It may also be useful to reduce the memory use of virtual
machines (temporarily?), in a way that does not cause memory
fragmentation like ballooning does.
---
 Documentation/sysctl/vm.txt |   16 ++++++++++++++++
 include/linux/mmzone.h      |    2 +-
 include/linux/swap.h        |    2 ++
 kernel/sysctl.c             |   11 +++++++++--
 mm/page_alloc.c             |   39 +++++++++++++++++++++++++++++----------
 5 files changed, 57 insertions(+), 13 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 078701f..5d12bbd 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -28,6 +28,7 @@ Currently, these files are in /proc/sys/vm:
 - dirty_writeback_centisecs
 - drop_caches
 - extfrag_threshold
+- extra_free_kbytes
 - hugepages_treat_as_movable
 - hugetlb_shm_group
 - laptop_mode
@@ -167,6 +168,21 @@ fragmentation index is <= extfrag_threshold. The default value is 500.

 ==

+extra_free_kbytes
+
+This parameter tells the VM to keep extra free memory between the threshold
+where background reclaim (kswapd) kicks in, and the threshold where direct
+reclaim (by allocating processes) kicks in.
+
+This is useful for workloads that require low latency memory allocations
+and have a bounded burstiness in memory allocations, for example a
+realtime application that receives and transmits network traffic
+(causing in-kernel memory allocations) with a maximum total message burst
+size of 200MB may need 200MB of extra free memory to avoid direct reclaim
+related latencies.
+
+==============================================================
+
 hugepages_treat_as_movable

 This parameter is only useful when kernelcore= is specified at boot time to
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 73b64a3..7f8f883 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -881,7 +881,7 @@ static inline int is_dma(struct zone *zone)

 /* These two functions are used to setup the per zone pages min values */
 struct ctl_table;
-int min_free_kbytes_sysctl_handler(struct ctl_table *, int,
+int free_kbytes_sysctl_handler(struct ctl_table *, int,
 					void __user *, size_t *, loff_t *);
 extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1];
 int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *, int,
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 68df9c1..66a12c4 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -215,6 +215,8 @@ struct swap_list_t {
 /* linux/mm/page_alloc.c */
 extern unsigned long totalram_pages;
 extern unsigned long totalreserve_pages;
+extern int min_free_kbytes;
+extern int extra_free_kbytes;
 extern unsigned long dirty_balance_reserve;
 extern unsigned int nr_free_buffer_pages(void);
 extern unsigned int nr_free_pagecache_pages(void);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c88878d..102e9a1 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -104,7 +104,6 @@ extern char core_pattern[];
 extern unsigned int core_pipe_limit;
 #endif
 extern int pid_max;
-extern int min_free_kbytes;
 extern int pid_max_min, pid_max_max;
 extern int sysctl_drop_caches;
 extern int percpu_pagelist_fraction;
@@ -1246,10 +1245,18 @@ static struct ctl_table vm_table[] = {
 		.data		= &min_free_kbytes,
 		.maxlen		= sizeof(min_free_kbytes),
 		.mode		= 0644,
-		.proc_handler	= min_free_kbytes_sysctl_handler,
+		.proc_handler	= free_kbytes_sysctl_handler,
 		.extra1		= &zero,
 	},
 	{
+		.procname	= "extra_free_kbytes",
+		.data		= &extra_free_kbytes,
+		.maxlen		= sizeof(extra_free_kbytes),
+		.mode		= 0644,
+		.proc_handler	= free_kbytes_sysctl_handler,
+		.extra1		= &zero,
+	},
+	{
 		.procname	= "percpu_pagelist_fraction",
 		.data		= &percpu_pagelist_fraction,
 		.maxlen		= sizeof(percpu_pagelist_fraction),
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9673d96..5380d84 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -194,8 +194,21 @@ static char * const zone_names[MAX_NR_ZONES] = {
 "Movable",
 };

+/*
+ * Try to keep at least this much lowmem free.  Do not allow normal
+ * allocations below this point, only high

Re: extra free kbytes tunable

2013-02-17 Thread dormando



On Fri, 15 Feb 2013, Rik van Riel wrote:

> On 02/15/2013 05:21 PM, Seiji Aguchi wrote:
> > Rik, Satoru,
> >
> > Do you have any comments?
>
> IIRC at the time the patch was rejected as too inelegant.
>
> However, nobody else seems to have come up with a better plan, and
> there are users in need of a fix for this problem.
>
> I would still like to see a fix for the problem merged upstream.

I merged in the cleanups to your original patch, rebased it off of linus'
master from a day or two ago and re-sent (not sure how to preserve
authorship in that case? Apologies for goofing it).

I'm willing to argue it, or investigate better options. I'm going to be
stuck maintaining this patch since we can't really afford to have
production hang, or waste 12g+ of RAM per box.

> > > -----Original Message-----
> > > From: linux-kernel-ow...@vger.kernel.org
> > > [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of dormando
> > > Sent: Monday, February 11, 2013 9:01 PM
> > > To: Rik van Riel
> > > Cc: Randy Dunlap; Satoru Moriya; linux-kernel@vger.kernel.org;
> > > linux...@kvack.org; lwood...@redhat.com; Seiji Aguchi;
> > > a...@linux-foundation.org; hu...@google.com
> > > Subject: extra free kbytes tunable
> > >
> > > Hi,
> > >
> > > As discussed in this thread:
> > > http://marc.info/?l=linux-mm&m=131490523222031&w=2
> > > (with this cleanup as well: https://lkml.org/lkml/2011/9/2/225)
> > >
> > > A tunable was proposed to allow specifying the distance between pages_min
> > > and the low watermark before kswapd is kicked in to
> > > free up pages. I'd like to re-open this thread since the patch did not
> > > appear to go anywhere.
> > >
> > > We have a server workload wherein machines with 100G+ of "free" memory
> > > (used by page cache), scattered but frequent random io
> > > reads from 12+ SSD's, and 5gbps+ of internet traffic, will frequently hit
> > > direct reclaim in a few different ways.
> > >
> > > 1) It'll run into small amounts of reclaim randomly (a few hundred
> > > thousand).
> > >
> > > 2) A burst of reads or traffic can cause extra pressure, which kswapd
> > > occasionally responds to by freeing up 40g+ of the pagecache all
> > > at once (!) while pausing the system (Argh).
> > >
> > > 3) A blip in an upstream provider or failover from a peer causes the
> > > kernel to allocate massive amounts of memory for retransmission
> > > queues/etc, potentially along with buffered IO reads and (some, but not
> > > often a ton) of new allocations from an application. This
> > > paired with 2) can cause the box to stall for 15+ seconds.
> > >
> > > We're seeing this more in 3.4/3.5/3.6, saw it less in 2.6.38. Mass
> > > reclaims are more common in newer kernels, but reclaims still happen
> > > in all kernels without raising min_free_kbytes dramatically.
> > >
> > > I've found that setting "lowmem_reserve_ratio" to something like "1 1 32"
> > > (thus protecting the DMA32 zone) causes 2) to happen less often, and is
> > > generally less violent with 1).
> > >
> > > Setting min_free_kbytes to 15G or more, paired with the above, has been
> > > the best at mitigating the issue. This is simply trying to raise
> > > the distance between the min and low watermarks. With min_free_kbytes set
> > > to 1500, that gives us a whopping 1.8G (!!!) of
> > > leeway before slamming into direct reclaim.
> > >
> > > So, this patch is unfortunate but wonderful at letting us reclaim 10G+ of
> > > otherwise lost memory. Could we please revisit it?
> > >
> > > I saw a lot of discussion on doing this automatically, or making kswapd
> > > more efficient to it, and I'd love to do that. Beyond making
> > > kswapd psychic I haven't seen any better options yet.
> > >
> > > The issue is more complex than simply having an application warn of an
> > > impending allocation, since this can happen via read load on
> > > disk or from kernel page allocations for the network, or a combination of
> > > the two (or three, if you add the app back in).
> > >
> > > It's going to get worse as we push machines with faster SSD's and bigger
> > > networks. I'm open to any ideas on how to make kswapd
> > > more efficient in our case, or really anything at all that works.
> > >
> > > I have more details, but cut it down as much as I could for this mail.
> > >
> > > Thanks,
> > > -Dormando
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > > the body of a message to majord...@vger.kernel.org More
> > > majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > Please read the FAQ at  http://www.tux.org/lkml/
> >
>
> --
> All rights reversed

extra free kbytes tunable

2013-02-11 Thread dormando

Hi,

As discussed in this thread:
http://marc.info/?l=linux-mm&m=131490523222031&w=2
(with this cleanup as well: https://lkml.org/lkml/2011/9/2/225)

A tunable was proposed to allow specifying the distance between pages_min
and the low watermark before kswapd is kicked in to free up pages. I'd
like to re-open this thread since the patch did not appear to go anywhere.

We have a server workload wherein machines with 100G+ of "free" memory
(used by page cache), scattered but frequent random io reads from 12+
SSD's, and 5gbps+ of internet traffic, will frequently hit direct reclaim
in a few different ways.

1) It'll run into small amounts of reclaim randomly (a few hundred
thousand).

2) A burst of reads or traffic can cause extra pressure, which kswapd
occasionally responds to by freeing up 40g+ of the pagecache all at once
(!) while pausing the system (Argh).

3) A blip in an upstream provider or failover from a peer causes the
kernel to allocate massive amounts of memory for retransmission
queues/etc, potentially along with buffered IO reads and (some, but not
often a ton) of new allocations from an application. This paired with 2)
can cause the box to stall for 15+ seconds.

We're seeing this more in 3.4/3.5/3.6, saw it less in 2.6.38. Mass
reclaims are more common in newer kernels, but reclaims still happen in
all kernels without raising min_free_kbytes dramatically.

I've found that setting "lowmem_reserve_ratio" to something like "1 1 32"
(thus protecting the DMA32 zone) causes 2) to happen less often, and is
generally less violent with 1).

Setting min_free_kbytes to 15G or more, paired with the above, has been
the best at mitigating the issue. This is simply trying to raise the
distance between the min and low watermarks. With min_free_kbytes set to
1500, that gives us a whopping 1.8G (!!!) of leeway before slamming
into direct reclaim.

So, this patch is unfortunate but wonderful at letting us reclaim 10G+ of
otherwise lost memory. Could we please revisit it?

I saw a lot of discussion on doing this automatically, or making kswapd
more efficient to it, and I'd love to do that. Beyond making kswapd
psychic I haven't seen any better options yet.

The issue is more complex than simply having an application warn of an
impending allocation, since this can happen via read load on disk or from
kernel page allocations for the network, or a combination of the two (or
three, if you add the app back in).

It's going to get worse as we push machines with faster SSD's and bigger
networks. I'm open to any ideas on how to make kswapd more efficient in
our case, or really anything at all that works.

I have more details, but cut it down as much as I could for this mail.

Thanks,
-Dormando
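
The min/low watermark distance discussed above can be illustrated with a small C sketch. It is a simplified model, not the kernel code: it treats the whole machine as a single zone, assumes 4 KiB pages, and assumes the 3.x-era rule that the low watermark sits at min + min/4 and the high watermark at min + min/2, with the proposed extra_free_kbytes added on top of both. The names `compute_wmarks`, `reclaim_headroom_kbytes` and `PAGE_SHIFT_ASSUMED` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_ASSUMED 12	/* assumption: 4 KiB pages */

/* Watermarks in pages for one (hypothetical) zone covering all memory. */
struct wmarks {
	uint64_t min;	/* below this, only high-priority allocations succeed */
	uint64_t low;	/* kswapd is woken when free pages drop below this */
	uint64_t high;	/* kswapd goes back to sleep once free pages exceed this */
};

static uint64_t kbytes_to_pages(uint64_t kbytes)
{
	return kbytes >> (PAGE_SHIFT_ASSUMED - 10);
}

/*
 * Simplified model of the watermark setup: the real kernel splits
 * min_free_kbytes (and, with the patch, extra_free_kbytes) across zones
 * in proportion to their size; that split is ignored here.
 */
struct wmarks compute_wmarks(uint64_t min_free_kbytes,
			     uint64_t extra_free_kbytes)
{
	uint64_t min = kbytes_to_pages(min_free_kbytes);
	uint64_t extra = kbytes_to_pages(extra_free_kbytes);
	struct wmarks w;

	w.min = min;
	w.low = min + extra + (min >> 2);
	w.high = min + extra + (min >> 1);
	return w;
}

/* Memory (in kbytes) kswapd has to work with before direct reclaim starts. */
uint64_t reclaim_headroom_kbytes(struct wmarks w)
{
	assert(w.low >= w.min);
	return (w.low - w.min) << (PAGE_SHIFT_ASSUMED - 10);
}
```

Under these assumptions, the headroom without the patch is always a quarter of min_free_kbytes, so the only way to widen it is to raise min_free_kbytes itself and give up the memory below the min watermark. With extra_free_kbytes the gap grows by the requested amount while the min watermark stays where it is.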
