Re: [PATCH] tcp: Modify the condition for the first skb to collapse
On Tue, 2013-06-18 at 05:52 -0400, Jun Chen wrote: > > > There are many warning for tcp_recvmsg before this crash. I can't find > other memory warning in the logs, but I'm not sure whether there are > memory issues because of the length limitation of saved logs. I think > this logs will give you more information. > > <4>[ 7736.343742] [ cut here ] > > <4>[ 7736.343759] WARNING: > at /data/buildbot/workdir/jb/kernel/net/ipv4/tcp.c:1496 tcp_recvmsg > +0x3bf/0x910() > > <4>[ 7736.343775] recvmsg bug: copied AB57C870 seq AB57CD95 rcvnxt > AB57F19F fl 0 > > <4>[ 7736.343845] Call Trace: > > <4>[ 7736.343865] [] warn_slowpath_common+0x72/0xa0 > > <4>[ 7736.343888] [] ? tcp_recvmsg+0x3bf/0x910 > > <4>[ 7736.343902] [] ? tcp_recvmsg+0x3bf/0x910 > > <4>[ 7736.343922] [] warn_slowpath_fmt+0x33/0x40 > > <4>[ 7736.343944] [] tcp_recvmsg+0x3bf/0x910 > > <4>[ 7736.343968] [] inet_recvmsg+0x85/0xa0 > > <4>[ 7736.343992] [] sock_aio_read+0x140/0x160 > > <4>[ 7736.344016] [] ? set_next_entity+0xc1/0xf0 > > <4>[ 7736.344039] [] do_sync_read+0xb7/0xf0 > > <4>[ 7736.344064] [] ? rw_verify_area+0x6c/0x120 > > <4>[ 7736.344077] [] ? sys_epoll_wait+0x68/0x360 > > <4>[ 7736.344098] [] vfs_read+0x149/0x160 > > <4>[ 7736.344120] [] ? fget_light+0x58/0xd0 > > <4>[ 7736.344142] [] sys_read+0x3d/0x70 > > <4>[ 7736.344164] [] syscall_call+0x7/0xb > > <4>[ 7736.344187] [] ? perf_cpu_notify+0x45/0x89 > > <4>[ 7736.344205] ---[ end trace b3c5b245ce7ff5b5 ]--- > Thats exactly the interesting stuff ;) This was fixed, or should be fixed if still happening on more recent kernels. Basically, once we are in this state, there is nothing we can do to prevent a crash. Please try to reproduce the issue using 3.9 or David trees (net-next or net ) Thanks -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] tcp: Modify the condition for the first skb to collapse
On Mon, 2013-06-17 at 06:21 -0700, Eric Dumazet wrote: > On Mon, 2013-06-17 at 14:52 -0400, Jun Chen wrote: > > On Mon, 2013-06-17 at 03:29 -0700, Eric Dumazet wrote: > > > On Mon, 2013-06-17 at 13:29 -0400, Jun Chen wrote: > > > > > > > > > hi, > > > > When the condition of tcp_win_from_space(skb->truesize) > skb->len is > > > > true but the before(start, TCP_SKB_CB(skb)->seq) is also true, the final > > > > condition will be true. The follow line: > > > > int offset = start - TCP_SKB_CB(skb)->seq; > > > > BUG_ON(offset < 0); > > > > this BUG_ON will be triggered. > > > > > > > > > > Really this should never happen, we must track what's happening here. > > It's very very rare, but the logic of codes have such a little hole. > > > > > > Are you using a pristine kernel, without any patches ? > > The based kernel version is 3.4. > > > > > > Are you able to reproduce this bug in a short amount of time ? > > I can't reproduce it in short time, this log had just been found once > > for long long time tests on many devices . > > > > > > What kind of driver is in use ? (your stack trace was truncated) > > > > I attach the whole stack traces for you. > > > Any other suspect messages before this, a memory allocation error for > example ? > > I believe we have a bug in tcp_collapse() if one alloc_skb() returns > NULL while we were in the middle of collapsing a big GRO packet. > > gro_skb needed 3 skb to be rebuilt, and only two skbs could be allocated > > skb1: seq=X end_seq=X+4000 > skb2: seq=X+4000 end_seq=X+8000 > > grp_skb: seq=X end_seq=X+16000 > > Next time we call tcp_collapse(), we might split again the GRO packet > and get following incorrect queue : > > skb1: seq=X end_seq=X+4000 > skb2: seq=X+4000 end_seq=X+8000 > skb3: seq=X end_seq=X+4000 > skb4: seq=X+4000 end_seq=X+8000 > skb5: seq=X+8000 end_seq=X+12000 > skb6: seq=X+12000 end_seq=X+16000 > > > I would use the following patch instead, to narrow the problem > > If we really find in the ofo queue a skb with a lower seq than the > previous one, we should complain instead of lowering @start, since > this is going to crash later. > > receive_queue / ofo_queue should contain monotonically increasing > skb->seq. > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > index 46271cdc..5507a09 100644 > --- a/net/ipv4/tcp_input.c > +++ b/net/ipv4/tcp_input.c > @@ -4513,8 +4513,10 @@ static void tcp_collapse_ofo_queue(struct sock *sk) > start = TCP_SKB_CB(skb)->seq; > end = TCP_SKB_CB(skb)->end_seq; > } else { > - if (before(TCP_SKB_CB(skb)->seq, start)) > - start = TCP_SKB_CB(skb)->seq; > + if (before(TCP_SKB_CB(skb)->seq, start)) { > + pr_err_once("tcp_collapse_ofo_queue() : seq > %08x before start %08X\n", > + TCP_SKB_CB(skb)->seq, start); > + } > if (after(TCP_SKB_CB(skb)->end_seq, end)) > end = TCP_SKB_CB(skb)->end_seq; > } > > There are many warning for tcp_recvmsg before this crash. I can't find other memory warning in the logs, but I'm not sure whether there are memory issues because of the length limitation of saved logs. I think this logs will give you more information. <4>[ 7736.343742] [ cut here ] <4>[ 7736.343759] WARNING: at /data/buildbot/workdir/jb/kernel/net/ipv4/tcp.c:1496 tcp_recvmsg +0x3bf/0x910() <4>[ 7736.343775] recvmsg bug: copied AB57C870 seq AB57CD95 rcvnxt AB57F19F fl 0 <4>[ 7736.343845] Call Trace: <4>[ 7736.343865] [] warn_slowpath_common+0x72/0xa0 <4>[ 7736.343888] [] ? tcp_recvmsg+0x3bf/0x910 <4>[ 7736.343902] [] ? tcp_recvmsg+0x3bf/0x910 <4>[ 7736.343922] [] warn_slowpath_fmt+0x33/0x40 <4>[ 7736.343944] [] tcp_recvmsg+0x3bf/0x910 <4>[ 7736.343968] [] inet_recvmsg+0x85/0xa0 <4>[ 7736.343992] [] sock_aio_read+0x140/0x160 <4>[ 7736.344016] [] ? set_next_entity+0xc1/0xf0 <4>[ 7736.344039] [] do_sync_read+0xb7/0xf0 <4>[ 7736.344064] [] ? rw_verify_area+0x6c/0x120 <4>[ 7736.344077] [] ? sys_epoll_wait+0x68/0x360 <4>[ 7736.344098] [] vfs_read+0x149/0x160 <4>[ 7736.344120] [] ? fget_light+0x58/0xd0 <4>[ 7736.344142] [] sys_read+0x3d/0x70 <4>[ 7736.344164] [] syscall_call+0x7/0xb <4>[ 7736.344187] [] ? perf_cpu_notify+0x45/0x89 <4>[ 7736.344205] ---[ end trace b3c5b245ce7ff5b5 ]--- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] tcp: Modify the condition for the first skb to collapse
On Mon, 2013-06-17 at 14:52 -0400, Jun Chen wrote: > On Mon, 2013-06-17 at 03:29 -0700, Eric Dumazet wrote: > > On Mon, 2013-06-17 at 13:29 -0400, Jun Chen wrote: > > > > > > > hi, > > > When the condition of tcp_win_from_space(skb->truesize) > skb->len is > > > true but the before(start, TCP_SKB_CB(skb)->seq) is also true, the final > > > condition will be true. The follow line: > > > int offset = start - TCP_SKB_CB(skb)->seq; > > > BUG_ON(offset < 0); > > > this BUG_ON will be triggered. > > > > > > > Really this should never happen, we must track what's happening here. > It's very very rare, but the logic of codes have such a little hole. > > > > Are you using a pristine kernel, without any patches ? > The based kernel version is 3.4. > > > > Are you able to reproduce this bug in a short amount of time ? > I can't reproduce it in short time, this log had just been found once > for long long time tests on many devices . > > > > What kind of driver is in use ? (your stack trace was truncated) > > I attach the whole stack traces for you. > > <0>[ 7736.348788] Call Trace: > > <4>[ 7736.348861] [] tcp_prune_queue+0x120/0x2f0 > > <4>[ 7736.348984] [] tcp_data_queue+0x777/0xf00 > > <4>[ 7736.349055] [] ? ipt_do_table+0x1f8/0x480 > > <4>[ 7736.349126] [] ? ipt_do_table+0x1f8/0x480 > > <4>[ 7736.349196] [] tcp_rcv_established+0x114/0x680 > > <4>[ 7736.349269] [] tcp_v4_do_rcv+0x164/0x350 > > <4>[ 7736.349396] [] ? nf_nat_fn+0xb1/0x1d0 > > <4>[ 7736.349470] [] tcp_v4_rcv+0x6f1/0x7a0 > > <4>[ 7736.349599] [] ? nf_hook_slow+0x10d/0x150 > > <4>[ 7736.349673] [] ip_local_deliver_finish+0x8b/0x200 > > <4>[ 7736.349796] [] ip_local_deliver+0x8f/0xa0 > > <4>[ 7736.349867] [] ? ip_rcv_finish+0x300/0x300 > > <4>[ 7736.349937] [] ip_rcv_finish+0xdf/0x300 > > <4>[ 7736.350062] [] ip_rcv+0x258/0x330 > > <4>[ 7736.350132] [] ? inet_del_protocol+0x30/0x30 > > <4>[ 7736.350258] [] __netif_receive_skb+0x325/0x410 > > <4>[ 7736.350331] [] process_backlog+0x96/0x150 > > <4>[ 7736.350455] [] net_rx_action+0x115/0x210 > > <4>[ 7736.350525] [] ? tcp_out_of_resources+0xb0/0xb0 > > <4>[ 7736.350652] [] __do_softirq+0x9b/0x220 > > <4>[ 7736.350723] [] ? local_bh_enable_ip+0xd0/0xd0 > Any other suspect messages before this, a memory allocation error for example ? I believe we have a bug in tcp_collapse() if one alloc_skb() returns NULL while we were in the middle of collapsing a big GRO packet. gro_skb needed 3 skb to be rebuilt, and only two skbs could be allocated skb1: seq=X end_seq=X+4000 skb2: seq=X+4000 end_seq=X+8000 grp_skb: seq=X end_seq=X+16000 Next time we call tcp_collapse(), we might split again the GRO packet and get following incorrect queue : skb1: seq=X end_seq=X+4000 skb2: seq=X+4000 end_seq=X+8000 skb3: seq=X end_seq=X+4000 skb4: seq=X+4000 end_seq=X+8000 skb5: seq=X+8000 end_seq=X+12000 skb6: seq=X+12000 end_seq=X+16000 I would use the following patch instead, to narrow the problem If we really find in the ofo queue a skb with a lower seq than the previous one, we should complain instead of lowering @start, since this is going to crash later. receive_queue / ofo_queue should contain monotonically increasing skb->seq. diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 46271cdc..5507a09 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4513,8 +4513,10 @@ static void tcp_collapse_ofo_queue(struct sock *sk) start = TCP_SKB_CB(skb)->seq; end = TCP_SKB_CB(skb)->end_seq; } else { - if (before(TCP_SKB_CB(skb)->seq, start)) - start = TCP_SKB_CB(skb)->seq; + if (before(TCP_SKB_CB(skb)->seq, start)) { + pr_err_once("tcp_collapse_ofo_queue() : seq %08x before start %08X\n", + TCP_SKB_CB(skb)->seq, start); + } if (after(TCP_SKB_CB(skb)->end_seq, end)) end = TCP_SKB_CB(skb)->end_seq; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] tcp: Modify the condition for the first skb to collapse
On Mon, 2013-06-17 at 03:29 -0700, Eric Dumazet wrote: > On Mon, 2013-06-17 at 13:29 -0400, Jun Chen wrote: > > > > > hi, > > When the condition of tcp_win_from_space(skb->truesize) > skb->len is > > true but the before(start, TCP_SKB_CB(skb)->seq) is also true, the final > > condition will be true. The follow line: > > int offset = start - TCP_SKB_CB(skb)->seq; > > BUG_ON(offset < 0); > > this BUG_ON will be triggered. > > > > Really this should never happen, we must track what's happening here. It's very very rare, but the logic of codes have such a little hole. > > Are you using a pristine kernel, without any patches ? The based kernel version is 3.4. > > Are you able to reproduce this bug in a short amount of time ? I can't reproduce it in short time, this log had just been found once for long long time tests on many devices . > > What kind of driver is in use ? (your stack trace was truncated) I attach the whole stack traces for you. <0>[ 7736.348788] Call Trace: <4>[ 7736.348861] [] tcp_prune_queue+0x120/0x2f0 <4>[ 7736.348984] [] tcp_data_queue+0x777/0xf00 <4>[ 7736.349055] [] ? ipt_do_table+0x1f8/0x480 <4>[ 7736.349126] [] ? ipt_do_table+0x1f8/0x480 <4>[ 7736.349196] [] tcp_rcv_established+0x114/0x680 <4>[ 7736.349269] [] tcp_v4_do_rcv+0x164/0x350 <4>[ 7736.349396] [] ? nf_nat_fn+0xb1/0x1d0 <4>[ 7736.349470] [] tcp_v4_rcv+0x6f1/0x7a0 <4>[ 7736.349599] [] ? nf_hook_slow+0x10d/0x150 <4>[ 7736.349673] [] ip_local_deliver_finish+0x8b/0x200 <4>[ 7736.349796] [] ip_local_deliver+0x8f/0xa0 <4>[ 7736.349867] [] ? ip_rcv_finish+0x300/0x300 <4>[ 7736.349937] [] ip_rcv_finish+0xdf/0x300 <4>[ 7736.350062] [] ip_rcv+0x258/0x330 <4>[ 7736.350132] [] ? inet_del_protocol+0x30/0x30 <4>[ 7736.350258] [] __netif_receive_skb+0x325/0x410 <4>[ 7736.350331] [] process_backlog+0x96/0x150 <4>[ 7736.350455] [] net_rx_action+0x115/0x210 <4>[ 7736.350525] [] ? tcp_out_of_resources+0xb0/0xb0 <4>[ 7736.350652] [] __do_softirq+0x9b/0x220 <4>[ 7736.350723] [] ? local_bh_enable_ip+0xd0/0xd0 > > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] tcp: Modify the condition for the first skb to collapse
On Mon, 2013-06-17 at 13:29 -0400, Jun Chen wrote: > > > hi, > When the condition of tcp_win_from_space(skb->truesize) > skb->len is > true but the before(start, TCP_SKB_CB(skb)->seq) is also true, the final > condition will be true. The follow line: > int offset = start - TCP_SKB_CB(skb)->seq; > BUG_ON(offset < 0); > this BUG_ON will be triggered. > Really this should never happen, we must track what's happening here. Are you using a pristine kernel, without any patches ? Are you able to reproduce this bug in a short amount of time ? What kind of driver is in use ? (your stack trace was truncated) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] tcp: Modify the condition for the first skb to collapse
On Mon, 2013-06-17 at 01:15 -0700, Eric Dumazet wrote: > On Mon, 2013-06-17 at 10:18 -0400, Jun Chen wrote: > > When search the first skb to collapse,the condition of overlap to the next > > one have been > > reached,but the start is less than TCP_SKB_CB(skb)->seq at this time, then > > followed process > > will trigger the BUG_ON of the offset(start - TCP_SKB_CB(skb)->seq). > > So this patch add one check (! before(start,TCP_SKB_CB(skb)->seq)) to avoid > > this ipanic. > > > > Signed-off-by: Chen Jun > > --- > > net/ipv4/tcp_input.c |3 ++- > > 1 files changed, 2 insertions(+), 1 deletions(-) > > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > > index 9c62257..4c745c5 100644 > > --- a/net/ipv4/tcp_input.c > > +++ b/net/ipv4/tcp_input.c > > @@ -4465,7 +4465,8 @@ restart: > > * overlaps to the next one. > > */ > > if (!tcp_hdr(skb)->syn && !tcp_hdr(skb)->fin && > > - (tcp_win_from_space(skb->truesize) > skb->len || > > + ((tcp_win_from_space(skb->truesize) > skb->len && > > + !before(start, TCP_SKB_CB(skb)->seq)) || > > before(TCP_SKB_CB(skb)->seq, start))) { > > end_of_skbs = false; > > break; > > Hmm... I must say I do not understand this patch. > > If we find a skb with before(TCP_SKB_CB(skb)->seq, start), then the > final condition will be true. > > Let's rewrite your code to equivalent one : > > if (!tcp_hdr(skb)->syn && !tcp_hdr(skb)->fin && > (before(TCP_SKB_CB(skb)->seq, start) || > tcp_win_from_space(skb->truesize) > skb->len)) { > > So it seems your patch would not solve the problem for all > possible skbs (aka not bloated) ? > > Please tell us how you trigger this bug, and send the stack trace. > > Thanks > > hi, When the condition of tcp_win_from_space(skb->truesize) > skb->len is true but the before(start, TCP_SKB_CB(skb)->seq) is also true, the final condition will be true. The follow line: int offset = start - TCP_SKB_CB(skb)->seq; BUG_ON(offset < 0); this BUG_ON will be triggered. Follow line is my error logs: <2>[ 7736.344508] kernel BUG at /data/buildbot/workdir/jb/kernel/net/ipv4/tcp_input.c:4845! <4>[ 7736.344578] invalid opcode: [#1] PREEMPT SMP <4>[ 7736.344883] Modules linked in: atomisp lm3559 ov9724 imx1x5 bcm4335(O) cfg80211 bcm_bt_lpm videobuf_vmalloc videobuf_core matrix(C) <4>[ 7736.345681] <4>[ 7736.345748] Pid: 5189, comm: TimedEventQueue Tainted: GWC O 3.4.43-186445-g3ada675 #1 Intel Corporation Merrifield/SALT BAY <4>[ 7736.346059] EIP: 0060:[] EFLAGS: 00010297 CPU: 1 <4>[ 7736.346183] EIP is at tcp_collapse+0x3bd/0x3d0 <4>[ 7736.346250] EAX: ab57d2bb EBX: df428c00 ECX: c97dcd00 EDX: 10c0 <4>[ 7736.346372] ESI: df4289c0 EDI: fadb EBP: edca1d88 ESP: edca1d60 <4>[ 7736.346441] DS: 007b ES: 007b FS: 00d8 GS: 003b SS: 0068 <4>[ 7736.346560] CR0: 8005003b CR2: 41d310bc CR3: 2d30 CR4: 001007d0 <4>[ 7736.346629] DR0: DR1: DR2: DR3: <4>[ 7736.346749] DR6: 0ff0 DR7: 0400 <0>[ 7736.346816] Process TimedEventQueue (pid: 5189, ti=edca task=dc30b660 task.ti=c9a6e000) <0>[ 7736.346936] Stack: <4>[ 7736.347002] fadb c97dcd5c 0001 c97dcd00 0e32 c97dcd00 <4>[ 7736.347615] c97dcd00 df428180 edca1db0 c18addd0 ab57c870 ab57f19f c97dcd00 <4>[ 7736.348175] c97dd198 80c0 c97dcd00 df428180 edca1df0 c18aea27 c18dc8f8 <0>[ 7736.348788] Call Trace: <4>[ 7736.348861] [] tcp_prune_queue+0x120/0x2f0 <4>[ 7736.348984] [] tcp_data_queue+0x777/0xf00 <4>[ 7736.349055] [] ? ipt_do_table+0x1f8/0x480 <4>[ 7736.349126] [] ? ipt_do_table+0x1f8/0x480 <4>[ 7736.349196] [] tcp_rcv_established+0x114/0x680 <4>[ 7736.349269] [] tcp_v4_do_rcv+0x164/0x350 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] tcp: Modify the condition for the first skb to collapse
On Mon, 2013-06-17 at 10:18 -0400, Jun Chen wrote: > When search the first skb to collapse,the condition of overlap to the next > one have been > reached,but the start is less than TCP_SKB_CB(skb)->seq at this time, then > followed process > will trigger the BUG_ON of the offset(start - TCP_SKB_CB(skb)->seq). > So this patch add one check (! before(start,TCP_SKB_CB(skb)->seq)) to avoid > this ipanic. > > Signed-off-by: Chen Jun > --- > net/ipv4/tcp_input.c |3 ++- > 1 files changed, 2 insertions(+), 1 deletions(-) > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > index 9c62257..4c745c5 100644 > --- a/net/ipv4/tcp_input.c > +++ b/net/ipv4/tcp_input.c > @@ -4465,7 +4465,8 @@ restart: >* overlaps to the next one. >*/ > if (!tcp_hdr(skb)->syn && !tcp_hdr(skb)->fin && > - (tcp_win_from_space(skb->truesize) > skb->len || > + ((tcp_win_from_space(skb->truesize) > skb->len && > + !before(start, TCP_SKB_CB(skb)->seq)) || >before(TCP_SKB_CB(skb)->seq, start))) { > end_of_skbs = false; > break; Hmm... I must say I do not understand this patch. If we find a skb with before(TCP_SKB_CB(skb)->seq, start), then the final condition will be true. Let's rewrite your code to equivalent one : if (!tcp_hdr(skb)->syn && !tcp_hdr(skb)->fin && (before(TCP_SKB_CB(skb)->seq, start) || tcp_win_from_space(skb->truesize) > skb->len)) { So it seems your patch would not solve the problem for all possible skbs (aka not bloated) ? Please tell us how you trigger this bug, and send the stack trace. Thanks -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] tcp: Modify the condition for the first skb to collapse
When search the first skb to collapse,the condition of overlap to the next one have been reached,but the start is less than TCP_SKB_CB(skb)->seq at this time, then followed process will trigger the BUG_ON of the offset(start - TCP_SKB_CB(skb)->seq). So this patch add one check (! before(start,TCP_SKB_CB(skb)->seq)) to avoid this ipanic. Signed-off-by: Chen Jun --- net/ipv4/tcp_input.c |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 9c62257..4c745c5 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4465,7 +4465,8 @@ restart: * overlaps to the next one. */ if (!tcp_hdr(skb)->syn && !tcp_hdr(skb)->fin && - (tcp_win_from_space(skb->truesize) > skb->len || + ((tcp_win_from_space(skb->truesize) > skb->len && + !before(start, TCP_SKB_CB(skb)->seq)) || before(TCP_SKB_CB(skb)->seq, start))) { end_of_skbs = false; break; -- 1.7.4.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] tcp: Modify the condition for the first skb to collapse
When search the first skb to collapse,the condition of overlap to the next one have been reached,but the start is less than TCP_SKB_CB(skb)-seq at this time, then followed process will trigger the BUG_ON of the offset(start - TCP_SKB_CB(skb)-seq). So this patch add one check (! before(start,TCP_SKB_CB(skb)-seq)) to avoid this ipanic. Signed-off-by: Chen Jun jun.d.c...@intel.com --- net/ipv4/tcp_input.c |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 9c62257..4c745c5 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4465,7 +4465,8 @@ restart: * overlaps to the next one. */ if (!tcp_hdr(skb)-syn !tcp_hdr(skb)-fin - (tcp_win_from_space(skb-truesize) skb-len || + ((tcp_win_from_space(skb-truesize) skb-len + !before(start, TCP_SKB_CB(skb)-seq)) || before(TCP_SKB_CB(skb)-seq, start))) { end_of_skbs = false; break; -- 1.7.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] tcp: Modify the condition for the first skb to collapse
On Mon, 2013-06-17 at 10:18 -0400, Jun Chen wrote: When search the first skb to collapse,the condition of overlap to the next one have been reached,but the start is less than TCP_SKB_CB(skb)-seq at this time, then followed process will trigger the BUG_ON of the offset(start - TCP_SKB_CB(skb)-seq). So this patch add one check (! before(start,TCP_SKB_CB(skb)-seq)) to avoid this ipanic. Signed-off-by: Chen Jun jun.d.c...@intel.com --- net/ipv4/tcp_input.c |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 9c62257..4c745c5 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4465,7 +4465,8 @@ restart: * overlaps to the next one. */ if (!tcp_hdr(skb)-syn !tcp_hdr(skb)-fin - (tcp_win_from_space(skb-truesize) skb-len || + ((tcp_win_from_space(skb-truesize) skb-len + !before(start, TCP_SKB_CB(skb)-seq)) || before(TCP_SKB_CB(skb)-seq, start))) { end_of_skbs = false; break; Hmm... I must say I do not understand this patch. If we find a skb with before(TCP_SKB_CB(skb)-seq, start), then the final condition will be true. Let's rewrite your code to equivalent one : if (!tcp_hdr(skb)-syn !tcp_hdr(skb)-fin (before(TCP_SKB_CB(skb)-seq, start) || tcp_win_from_space(skb-truesize) skb-len)) { So it seems your patch would not solve the problem for all possible skbs (aka not bloated) ? Please tell us how you trigger this bug, and send the stack trace. Thanks -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] tcp: Modify the condition for the first skb to collapse
On Mon, 2013-06-17 at 01:15 -0700, Eric Dumazet wrote: On Mon, 2013-06-17 at 10:18 -0400, Jun Chen wrote: When search the first skb to collapse,the condition of overlap to the next one have been reached,but the start is less than TCP_SKB_CB(skb)-seq at this time, then followed process will trigger the BUG_ON of the offset(start - TCP_SKB_CB(skb)-seq). So this patch add one check (! before(start,TCP_SKB_CB(skb)-seq)) to avoid this ipanic. Signed-off-by: Chen Jun jun.d.c...@intel.com --- net/ipv4/tcp_input.c |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 9c62257..4c745c5 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4465,7 +4465,8 @@ restart: * overlaps to the next one. */ if (!tcp_hdr(skb)-syn !tcp_hdr(skb)-fin - (tcp_win_from_space(skb-truesize) skb-len || + ((tcp_win_from_space(skb-truesize) skb-len + !before(start, TCP_SKB_CB(skb)-seq)) || before(TCP_SKB_CB(skb)-seq, start))) { end_of_skbs = false; break; Hmm... I must say I do not understand this patch. If we find a skb with before(TCP_SKB_CB(skb)-seq, start), then the final condition will be true. Let's rewrite your code to equivalent one : if (!tcp_hdr(skb)-syn !tcp_hdr(skb)-fin (before(TCP_SKB_CB(skb)-seq, start) || tcp_win_from_space(skb-truesize) skb-len)) { So it seems your patch would not solve the problem for all possible skbs (aka not bloated) ? Please tell us how you trigger this bug, and send the stack trace. Thanks hi, When the condition of tcp_win_from_space(skb-truesize) skb-len is true but the before(start, TCP_SKB_CB(skb)-seq) is also true, the final condition will be true. The follow line: int offset = start - TCP_SKB_CB(skb)-seq; BUG_ON(offset 0); this BUG_ON will be triggered. Follow line is my error logs: 2[ 7736.344508] kernel BUG at /data/buildbot/workdir/jb/kernel/net/ipv4/tcp_input.c:4845! 4[ 7736.344578] invalid opcode: [#1] PREEMPT SMP 4[ 7736.344883] Modules linked in: atomisp lm3559 ov9724 imx1x5 bcm4335(O) cfg80211 bcm_bt_lpm videobuf_vmalloc videobuf_core matrix(C) 4[ 7736.345681] 4[ 7736.345748] Pid: 5189, comm: TimedEventQueue Tainted: GWC O 3.4.43-186445-g3ada675 #1 Intel Corporation Merrifield/SALT BAY 4[ 7736.346059] EIP: 0060:[c18ad61d] EFLAGS: 00010297 CPU: 1 4[ 7736.346183] EIP is at tcp_collapse+0x3bd/0x3d0 4[ 7736.346250] EAX: ab57d2bb EBX: df428c00 ECX: c97dcd00 EDX: 10c0 4[ 7736.346372] ESI: df4289c0 EDI: fadb EBP: edca1d88 ESP: edca1d60 4[ 7736.346441] DS: 007b ES: 007b FS: 00d8 GS: 003b SS: 0068 4[ 7736.346560] CR0: 8005003b CR2: 41d310bc CR3: 2d30 CR4: 001007d0 4[ 7736.346629] DR0: DR1: DR2: DR3: 4[ 7736.346749] DR6: 0ff0 DR7: 0400 0[ 7736.346816] Process TimedEventQueue (pid: 5189, ti=edca task=dc30b660 task.ti=c9a6e000) 0[ 7736.346936] Stack: 4[ 7736.347002] fadb c97dcd5c 0001 c97dcd00 0e32 c97dcd00 4[ 7736.347615] c97dcd00 df428180 edca1db0 c18addd0 ab57c870 ab57f19f c97dcd00 4[ 7736.348175] c97dd198 80c0 c97dcd00 df428180 edca1df0 c18aea27 c18dc8f8 0[ 7736.348788] Call Trace: 4[ 7736.348861] [c18addd0] tcp_prune_queue+0x120/0x2f0 4[ 7736.348984] [c18aea27] tcp_data_queue+0x777/0xf00 4[ 7736.349055] [c18dc8f8] ? ipt_do_table+0x1f8/0x480 4[ 7736.349126] [c18dc8f8] ? ipt_do_table+0x1f8/0x480 4[ 7736.349196] [c18b2e84] tcp_rcv_established+0x114/0x680 4[ 7736.349269] [c18bb034] tcp_v4_do_rcv+0x164/0x350 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] tcp: Modify the condition for the first skb to collapse
On Mon, 2013-06-17 at 13:29 -0400, Jun Chen wrote: hi, When the condition of tcp_win_from_space(skb-truesize) skb-len is true but the before(start, TCP_SKB_CB(skb)-seq) is also true, the final condition will be true. The follow line: int offset = start - TCP_SKB_CB(skb)-seq; BUG_ON(offset 0); this BUG_ON will be triggered. Really this should never happen, we must track what's happening here. Are you using a pristine kernel, without any patches ? Are you able to reproduce this bug in a short amount of time ? What kind of driver is in use ? (your stack trace was truncated) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] tcp: Modify the condition for the first skb to collapse
On Mon, 2013-06-17 at 03:29 -0700, Eric Dumazet wrote: On Mon, 2013-06-17 at 13:29 -0400, Jun Chen wrote: hi, When the condition of tcp_win_from_space(skb-truesize) skb-len is true but the before(start, TCP_SKB_CB(skb)-seq) is also true, the final condition will be true. The follow line: int offset = start - TCP_SKB_CB(skb)-seq; BUG_ON(offset 0); this BUG_ON will be triggered. Really this should never happen, we must track what's happening here. It's very very rare, but the logic of codes have such a little hole. Are you using a pristine kernel, without any patches ? The based kernel version is 3.4. Are you able to reproduce this bug in a short amount of time ? I can't reproduce it in short time, this log had just been found once for long long time tests on many devices . What kind of driver is in use ? (your stack trace was truncated) I attach the whole stack traces for you. 0[ 7736.348788] Call Trace: 4[ 7736.348861] [c18addd0] tcp_prune_queue+0x120/0x2f0 4[ 7736.348984] [c18aea27] tcp_data_queue+0x777/0xf00 4[ 7736.349055] [c18dc8f8] ? ipt_do_table+0x1f8/0x480 4[ 7736.349126] [c18dc8f8] ? ipt_do_table+0x1f8/0x480 4[ 7736.349196] [c18b2e84] tcp_rcv_established+0x114/0x680 4[ 7736.349269] [c18bb034] tcp_v4_do_rcv+0x164/0x350 4[ 7736.349396] [c18de301] ? nf_nat_fn+0xb1/0x1d0 4[ 7736.349470] [c18bc0c1] tcp_v4_rcv+0x6f1/0x7a0 4[ 7736.349599] [c1881dad] ? nf_hook_slow+0x10d/0x150 4[ 7736.349673] [c189b30b] ip_local_deliver_finish+0x8b/0x200 4[ 7736.349796] [c189b60f] ip_local_deliver+0x8f/0xa0 4[ 7736.349867] [c189b280] ? ip_rcv_finish+0x300/0x300 4[ 7736.349937] [c189b05f] ip_rcv_finish+0xdf/0x300 4[ 7736.350062] [c189b878] ip_rcv+0x258/0x330 4[ 7736.350132] [c189af80] ? inet_del_protocol+0x30/0x30 4[ 7736.350258] [c1864175] __netif_receive_skb+0x325/0x410 4[ 7736.350331] [c1864956] process_backlog+0x96/0x150 4[ 7736.350455] [c1864ba5] net_rx_action+0x115/0x210 4[ 7736.350525] [c18b7680] ? tcp_out_of_resources+0xb0/0xb0 4[ 7736.350652] [c123dc0b] __do_softirq+0x9b/0x220 4[ 7736.350723] [c123db70] ? local_bh_enable_ip+0xd0/0xd0 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] tcp: Modify the condition for the first skb to collapse
On Mon, 2013-06-17 at 14:52 -0400, Jun Chen wrote: On Mon, 2013-06-17 at 03:29 -0700, Eric Dumazet wrote: On Mon, 2013-06-17 at 13:29 -0400, Jun Chen wrote: hi, When the condition of tcp_win_from_space(skb-truesize) skb-len is true but the before(start, TCP_SKB_CB(skb)-seq) is also true, the final condition will be true. The follow line: int offset = start - TCP_SKB_CB(skb)-seq; BUG_ON(offset 0); this BUG_ON will be triggered. Really this should never happen, we must track what's happening here. It's very very rare, but the logic of codes have such a little hole. Are you using a pristine kernel, without any patches ? The based kernel version is 3.4. Are you able to reproduce this bug in a short amount of time ? I can't reproduce it in short time, this log had just been found once for long long time tests on many devices . What kind of driver is in use ? (your stack trace was truncated) I attach the whole stack traces for you. 0[ 7736.348788] Call Trace: 4[ 7736.348861] [c18addd0] tcp_prune_queue+0x120/0x2f0 4[ 7736.348984] [c18aea27] tcp_data_queue+0x777/0xf00 4[ 7736.349055] [c18dc8f8] ? ipt_do_table+0x1f8/0x480 4[ 7736.349126] [c18dc8f8] ? ipt_do_table+0x1f8/0x480 4[ 7736.349196] [c18b2e84] tcp_rcv_established+0x114/0x680 4[ 7736.349269] [c18bb034] tcp_v4_do_rcv+0x164/0x350 4[ 7736.349396] [c18de301] ? nf_nat_fn+0xb1/0x1d0 4[ 7736.349470] [c18bc0c1] tcp_v4_rcv+0x6f1/0x7a0 4[ 7736.349599] [c1881dad] ? nf_hook_slow+0x10d/0x150 4[ 7736.349673] [c189b30b] ip_local_deliver_finish+0x8b/0x200 4[ 7736.349796] [c189b60f] ip_local_deliver+0x8f/0xa0 4[ 7736.349867] [c189b280] ? ip_rcv_finish+0x300/0x300 4[ 7736.349937] [c189b05f] ip_rcv_finish+0xdf/0x300 4[ 7736.350062] [c189b878] ip_rcv+0x258/0x330 4[ 7736.350132] [c189af80] ? inet_del_protocol+0x30/0x30 4[ 7736.350258] [c1864175] __netif_receive_skb+0x325/0x410 4[ 7736.350331] [c1864956] process_backlog+0x96/0x150 4[ 7736.350455] [c1864ba5] net_rx_action+0x115/0x210 4[ 7736.350525] [c18b7680] ? tcp_out_of_resources+0xb0/0xb0 4[ 7736.350652] [c123dc0b] __do_softirq+0x9b/0x220 4[ 7736.350723] [c123db70] ? local_bh_enable_ip+0xd0/0xd0 Any other suspect messages before this, a memory allocation error for example ? I believe we have a bug in tcp_collapse() if one alloc_skb() returns NULL while we were in the middle of collapsing a big GRO packet. gro_skb needed 3 skb to be rebuilt, and only two skbs could be allocated skb1: seq=X end_seq=X+4000 skb2: seq=X+4000 end_seq=X+8000 missing grp_skb: seq=X end_seq=X+16000 Next time we call tcp_collapse(), we might split again the GRO packet and get following incorrect queue : skb1: seq=X end_seq=X+4000 skb2: seq=X+4000 end_seq=X+8000 skb3: seq=X end_seq=X+4000 skb4: seq=X+4000 end_seq=X+8000 skb5: seq=X+8000 end_seq=X+12000 skb6: seq=X+12000 end_seq=X+16000 I would use the following patch instead, to narrow the problem If we really find in the ofo queue a skb with a lower seq than the previous one, we should complain instead of lowering @start, since this is going to crash later. receive_queue / ofo_queue should contain monotonically increasing skb-seq. diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 46271cdc..5507a09 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4513,8 +4513,10 @@ static void tcp_collapse_ofo_queue(struct sock *sk) start = TCP_SKB_CB(skb)-seq; end = TCP_SKB_CB(skb)-end_seq; } else { - if (before(TCP_SKB_CB(skb)-seq, start)) - start = TCP_SKB_CB(skb)-seq; + if (before(TCP_SKB_CB(skb)-seq, start)) { + pr_err_once(tcp_collapse_ofo_queue() : seq %08x before start %08X\n, + TCP_SKB_CB(skb)-seq, start); + } if (after(TCP_SKB_CB(skb)-end_seq, end)) end = TCP_SKB_CB(skb)-end_seq; } -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] tcp: Modify the condition for the first skb to collapse
On Mon, 2013-06-17 at 06:21 -0700, Eric Dumazet wrote: On Mon, 2013-06-17 at 14:52 -0400, Jun Chen wrote: On Mon, 2013-06-17 at 03:29 -0700, Eric Dumazet wrote: On Mon, 2013-06-17 at 13:29 -0400, Jun Chen wrote: hi, When the condition of tcp_win_from_space(skb-truesize) skb-len is true but the before(start, TCP_SKB_CB(skb)-seq) is also true, the final condition will be true. The follow line: int offset = start - TCP_SKB_CB(skb)-seq; BUG_ON(offset 0); this BUG_ON will be triggered. Really this should never happen, we must track what's happening here. It's very very rare, but the logic of codes have such a little hole. Are you using a pristine kernel, without any patches ? The based kernel version is 3.4. Are you able to reproduce this bug in a short amount of time ? I can't reproduce it in short time, this log had just been found once for long long time tests on many devices . What kind of driver is in use ? (your stack trace was truncated) I attach the whole stack traces for you. Any other suspect messages before this, a memory allocation error for example ? I believe we have a bug in tcp_collapse() if one alloc_skb() returns NULL while we were in the middle of collapsing a big GRO packet. gro_skb needed 3 skb to be rebuilt, and only two skbs could be allocated skb1: seq=X end_seq=X+4000 skb2: seq=X+4000 end_seq=X+8000 missing grp_skb: seq=X end_seq=X+16000 Next time we call tcp_collapse(), we might split again the GRO packet and get following incorrect queue : skb1: seq=X end_seq=X+4000 skb2: seq=X+4000 end_seq=X+8000 skb3: seq=X end_seq=X+4000 skb4: seq=X+4000 end_seq=X+8000 skb5: seq=X+8000 end_seq=X+12000 skb6: seq=X+12000 end_seq=X+16000 I would use the following patch instead, to narrow the problem If we really find in the ofo queue a skb with a lower seq than the previous one, we should complain instead of lowering @start, since this is going to crash later. receive_queue / ofo_queue should contain monotonically increasing skb-seq. diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 46271cdc..5507a09 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4513,8 +4513,10 @@ static void tcp_collapse_ofo_queue(struct sock *sk) start = TCP_SKB_CB(skb)-seq; end = TCP_SKB_CB(skb)-end_seq; } else { - if (before(TCP_SKB_CB(skb)-seq, start)) - start = TCP_SKB_CB(skb)-seq; + if (before(TCP_SKB_CB(skb)-seq, start)) { + pr_err_once(tcp_collapse_ofo_queue() : seq %08x before start %08X\n, + TCP_SKB_CB(skb)-seq, start); + } if (after(TCP_SKB_CB(skb)-end_seq, end)) end = TCP_SKB_CB(skb)-end_seq; } There are many warning for tcp_recvmsg before this crash. I can't find other memory warning in the logs, but I'm not sure whether there are memory issues because of the length limitation of saved logs. I think this logs will give you more information. 4[ 7736.343742] [ cut here ] 4[ 7736.343759] WARNING: at /data/buildbot/workdir/jb/kernel/net/ipv4/tcp.c:1496 tcp_recvmsg +0x3bf/0x910() 4[ 7736.343775] recvmsg bug: copied AB57C870 seq AB57CD95 rcvnxt AB57F19F fl 0 4[ 7736.343845] Call Trace: 4[ 7736.343865] [c1237032] warn_slowpath_common+0x72/0xa0 4[ 7736.343888] [c18a955f] ? tcp_recvmsg+0x3bf/0x910 4[ 7736.343902] [c18a955f] ? tcp_recvmsg+0x3bf/0x910 4[ 7736.343922] [c1237103] warn_slowpath_fmt+0x33/0x40 4[ 7736.343944] [c18a955f] tcp_recvmsg+0x3bf/0x910 4[ 7736.343968] [c18c9bb5] inet_recvmsg+0x85/0xa0 4[ 7736.343992] [c1852030] sock_aio_read+0x140/0x160 4[ 7736.344016] [c126b221] ? set_next_entity+0xc1/0xf0 4[ 7736.344039] [c130d627] do_sync_read+0xb7/0xf0 4[ 7736.344064] [c130dc6c] ? rw_verify_area+0x6c/0x120 4[ 7736.344077] [c1349aa8] ? sys_epoll_wait+0x68/0x360 4[ 7736.344098] [c130e1e9] vfs_read+0x149/0x160 4[ 7736.344120] [c130f518] ? fget_light+0x58/0xd0 4[ 7736.344142] [c130e23d] sys_read+0x3d/0x70 4[ 7736.344164] [c198c361] syscall_call+0x7/0xb 4[ 7736.344187] [c198] ? perf_cpu_notify+0x45/0x89 4[ 7736.344205] ---[ end trace b3c5b245ce7ff5b5 ]--- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] tcp: Modify the condition for the first skb to collapse
On Tue, 2013-06-18 at 05:52 -0400, Jun Chen wrote: There are many warning for tcp_recvmsg before this crash. I can't find other memory warning in the logs, but I'm not sure whether there are memory issues because of the length limitation of saved logs. I think this logs will give you more information. 4[ 7736.343742] [ cut here ] 4[ 7736.343759] WARNING: at /data/buildbot/workdir/jb/kernel/net/ipv4/tcp.c:1496 tcp_recvmsg +0x3bf/0x910() 4[ 7736.343775] recvmsg bug: copied AB57C870 seq AB57CD95 rcvnxt AB57F19F fl 0 4[ 7736.343845] Call Trace: 4[ 7736.343865] [c1237032] warn_slowpath_common+0x72/0xa0 4[ 7736.343888] [c18a955f] ? tcp_recvmsg+0x3bf/0x910 4[ 7736.343902] [c18a955f] ? tcp_recvmsg+0x3bf/0x910 4[ 7736.343922] [c1237103] warn_slowpath_fmt+0x33/0x40 4[ 7736.343944] [c18a955f] tcp_recvmsg+0x3bf/0x910 4[ 7736.343968] [c18c9bb5] inet_recvmsg+0x85/0xa0 4[ 7736.343992] [c1852030] sock_aio_read+0x140/0x160 4[ 7736.344016] [c126b221] ? set_next_entity+0xc1/0xf0 4[ 7736.344039] [c130d627] do_sync_read+0xb7/0xf0 4[ 7736.344064] [c130dc6c] ? rw_verify_area+0x6c/0x120 4[ 7736.344077] [c1349aa8] ? sys_epoll_wait+0x68/0x360 4[ 7736.344098] [c130e1e9] vfs_read+0x149/0x160 4[ 7736.344120] [c130f518] ? fget_light+0x58/0xd0 4[ 7736.344142] [c130e23d] sys_read+0x3d/0x70 4[ 7736.344164] [c198c361] syscall_call+0x7/0xb 4[ 7736.344187] [c198] ? perf_cpu_notify+0x45/0x89 4[ 7736.344205] ---[ end trace b3c5b245ce7ff5b5 ]--- Thats exactly the interesting stuff ;) This was fixed, or should be fixed if still happening on more recent kernels. Basically, once we are in this state, there is nothing we can do to prevent a crash. Please try to reproduce the issue using 3.9 or David trees (net-next or net ) Thanks -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/