Re: [PATCH] tcp: Modify the condition for the first skb to collapse

2013-06-17 Thread Eric Dumazet
On Tue, 2013-06-18 at 05:52 -0400, Jun Chen wrote:
> > 
> There are many warning for tcp_recvmsg before this crash. I can't find
> other memory warning in the logs, but I'm not sure whether there are
> memory issues because of the length limitation of saved logs. I think
> this logs will give you more information.
> 
> <4>[ 7736.343742] [ cut here ]
> 
> <4>[ 7736.343759] WARNING:
> at /data/buildbot/workdir/jb/kernel/net/ipv4/tcp.c:1496 tcp_recvmsg
> +0x3bf/0x910()
> 
> <4>[ 7736.343775] recvmsg bug: copied AB57C870 seq AB57CD95 rcvnxt
> AB57F19F fl 0
> 
> <4>[ 7736.343845] Call Trace:
> 
> <4>[ 7736.343865]  [] warn_slowpath_common+0x72/0xa0
> 
> <4>[ 7736.343888]  [] ? tcp_recvmsg+0x3bf/0x910
> 
> <4>[ 7736.343902]  [] ? tcp_recvmsg+0x3bf/0x910
> 
> <4>[ 7736.343922]  [] warn_slowpath_fmt+0x33/0x40
> 
> <4>[ 7736.343944]  [] tcp_recvmsg+0x3bf/0x910
> 
> <4>[ 7736.343968]  [] inet_recvmsg+0x85/0xa0
> 
> <4>[ 7736.343992]  [] sock_aio_read+0x140/0x160
> 
> <4>[ 7736.344016]  [] ? set_next_entity+0xc1/0xf0
> 
> <4>[ 7736.344039]  [] do_sync_read+0xb7/0xf0
> 
> <4>[ 7736.344064]  [] ? rw_verify_area+0x6c/0x120
> 
> <4>[ 7736.344077]  [] ? sys_epoll_wait+0x68/0x360
> 
> <4>[ 7736.344098]  [] vfs_read+0x149/0x160
> 
> <4>[ 7736.344120]  [] ? fget_light+0x58/0xd0
> 
> <4>[ 7736.344142]  [] sys_read+0x3d/0x70
> 
> <4>[ 7736.344164]  [] syscall_call+0x7/0xb
> 
> <4>[ 7736.344187]  [] ? perf_cpu_notify+0x45/0x89
> 
> <4>[ 7736.344205] ---[ end trace b3c5b245ce7ff5b5 ]---
> 

Thats exactly the interesting stuff ;)

This was fixed, or should be fixed if still happening on more recent
kernels.

Basically, once we are in this state, there is nothing we can do to
prevent a crash.

Please try to reproduce the issue using 3.9 or David trees (net-next or
net )

Thanks


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] tcp: Modify the condition for the first skb to collapse

2013-06-17 Thread Jun Chen
On Mon, 2013-06-17 at 06:21 -0700, Eric Dumazet wrote:
> On Mon, 2013-06-17 at 14:52 -0400, Jun Chen wrote:
> > On Mon, 2013-06-17 at 03:29 -0700, Eric Dumazet wrote:
> > > On Mon, 2013-06-17 at 13:29 -0400, Jun Chen wrote:
> > > > > 
> > > > hi,
> > > > When the condition of tcp_win_from_space(skb->truesize) > skb->len is
> > > > true but the before(start, TCP_SKB_CB(skb)->seq) is also true, the final
> > > > condition will be true. The follow line:
> > > > int offset = start - TCP_SKB_CB(skb)->seq;
> > > > BUG_ON(offset < 0);
> > > > this BUG_ON will be triggered.
> > > > 
> > > 
> > > Really this should never happen, we must track what's happening here.
> > It's very very rare, but the logic of codes have such a little hole.
> > > 
> > > Are you using a pristine kernel, without any patches ?
> > The based kernel version is 3.4.  
> > > 
> > > Are you able to reproduce this bug in a short amount of time ?
> > I can't reproduce it in short time, this log had just been found once
> > for long long time tests on many devices . 
> > > 
> > > What kind of driver is in use ? (your stack trace was truncated)
> > 
> > I attach the whole stack traces for you.
> > 

> Any other suspect messages before this, a memory allocation error for
> example ?
> 
> I believe we have a bug in tcp_collapse() if one alloc_skb() returns
> NULL while we were in the middle of collapsing a big GRO packet.
> 
> gro_skb needed 3 skb to be rebuilt, and only two skbs could be allocated
> 
> skb1: seq=X  end_seq=X+4000
> skb2: seq=X+4000 end_seq=X+8000
> 
> grp_skb: seq=X end_seq=X+16000
> 
> Next time we call tcp_collapse(), we might split again the GRO packet
> and get following incorrect queue :
> 
> skb1: seq=X  end_seq=X+4000
> skb2: seq=X+4000 end_seq=X+8000
> skb3: seq=X  end_seq=X+4000
> skb4: seq=X+4000 end_seq=X+8000
> skb5: seq=X+8000 end_seq=X+12000
> skb6: seq=X+12000 end_seq=X+16000
> 
> 
> I would use the following patch instead, to narrow the problem
> 
> If we really find in the ofo queue a skb with a lower seq than the
> previous one, we should complain instead of lowering @start, since
> this is going to crash later.
> 
> receive_queue / ofo_queue should contain monotonically increasing
> skb->seq.
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 46271cdc..5507a09 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -4513,8 +4513,10 @@ static void tcp_collapse_ofo_queue(struct sock *sk)
>   start = TCP_SKB_CB(skb)->seq;
>   end = TCP_SKB_CB(skb)->end_seq;
>   } else {
> - if (before(TCP_SKB_CB(skb)->seq, start))
> - start = TCP_SKB_CB(skb)->seq;
> + if (before(TCP_SKB_CB(skb)->seq, start)) {
> + pr_err_once("tcp_collapse_ofo_queue() : seq 
> %08x before start %08X\n",
> + TCP_SKB_CB(skb)->seq, start);
> + }
>   if (after(TCP_SKB_CB(skb)->end_seq, end))
>   end = TCP_SKB_CB(skb)->end_seq;
>   }
> 
> 
There are many warning for tcp_recvmsg before this crash. I can't find
other memory warning in the logs, but I'm not sure whether there are
memory issues because of the length limitation of saved logs. I think
this logs will give you more information.

<4>[ 7736.343742] [ cut here ]

<4>[ 7736.343759] WARNING:
at /data/buildbot/workdir/jb/kernel/net/ipv4/tcp.c:1496 tcp_recvmsg
+0x3bf/0x910()

<4>[ 7736.343775] recvmsg bug: copied AB57C870 seq AB57CD95 rcvnxt
AB57F19F fl 0

<4>[ 7736.343845] Call Trace:

<4>[ 7736.343865]  [] warn_slowpath_common+0x72/0xa0

<4>[ 7736.343888]  [] ? tcp_recvmsg+0x3bf/0x910

<4>[ 7736.343902]  [] ? tcp_recvmsg+0x3bf/0x910

<4>[ 7736.343922]  [] warn_slowpath_fmt+0x33/0x40

<4>[ 7736.343944]  [] tcp_recvmsg+0x3bf/0x910

<4>[ 7736.343968]  [] inet_recvmsg+0x85/0xa0

<4>[ 7736.343992]  [] sock_aio_read+0x140/0x160

<4>[ 7736.344016]  [] ? set_next_entity+0xc1/0xf0

<4>[ 7736.344039]  [] do_sync_read+0xb7/0xf0

<4>[ 7736.344064]  [] ? rw_verify_area+0x6c/0x120

<4>[ 7736.344077]  [] ? sys_epoll_wait+0x68/0x360

<4>[ 7736.344098]  [] vfs_read+0x149/0x160

<4>[ 7736.344120]  [] ? fget_light+0x58/0xd0

<4>[ 7736.344142]  [] sys_read+0x3d/0x70

<4>[ 7736.344164]  [] syscall_call+0x7/0xb

<4>[ 7736.344187]  [] ? perf_cpu_notify+0x45/0x89

<4>[ 7736.344205] ---[ end trace b3c5b245ce7ff5b5 ]---



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] tcp: Modify the condition for the first skb to collapse

2013-06-17 Thread Eric Dumazet
On Mon, 2013-06-17 at 14:52 -0400, Jun Chen wrote:
> On Mon, 2013-06-17 at 03:29 -0700, Eric Dumazet wrote:
> > On Mon, 2013-06-17 at 13:29 -0400, Jun Chen wrote:
> > > > 
> > > hi,
> > > When the condition of tcp_win_from_space(skb->truesize) > skb->len is
> > > true but the before(start, TCP_SKB_CB(skb)->seq) is also true, the final
> > > condition will be true. The follow line:
> > > int offset = start - TCP_SKB_CB(skb)->seq;
> > > BUG_ON(offset < 0);
> > > this BUG_ON will be triggered.
> > > 
> > 
> > Really this should never happen, we must track what's happening here.
> It's very very rare, but the logic of codes have such a little hole.
> > 
> > Are you using a pristine kernel, without any patches ?
> The based kernel version is 3.4.  
> > 
> > Are you able to reproduce this bug in a short amount of time ?
> I can't reproduce it in short time, this log had just been found once
> for long long time tests on many devices . 
> > 
> > What kind of driver is in use ? (your stack trace was truncated)
> 
> I attach the whole stack traces for you.
> 
> <0>[ 7736.348788] Call Trace:
> 
> <4>[ 7736.348861]  [] tcp_prune_queue+0x120/0x2f0
> 
> <4>[ 7736.348984]  [] tcp_data_queue+0x777/0xf00
> 
> <4>[ 7736.349055]  [] ? ipt_do_table+0x1f8/0x480
> 
> <4>[ 7736.349126]  [] ? ipt_do_table+0x1f8/0x480
> 
> <4>[ 7736.349196]  [] tcp_rcv_established+0x114/0x680
> 
> <4>[ 7736.349269]  [] tcp_v4_do_rcv+0x164/0x350
> 
> <4>[ 7736.349396]  [] ? nf_nat_fn+0xb1/0x1d0
> 
> <4>[ 7736.349470]  [] tcp_v4_rcv+0x6f1/0x7a0
> 
> <4>[ 7736.349599]  [] ? nf_hook_slow+0x10d/0x150
> 
> <4>[ 7736.349673]  [] ip_local_deliver_finish+0x8b/0x200
> 
> <4>[ 7736.349796]  [] ip_local_deliver+0x8f/0xa0
> 
> <4>[ 7736.349867]  [] ? ip_rcv_finish+0x300/0x300
> 
> <4>[ 7736.349937]  [] ip_rcv_finish+0xdf/0x300
> 
> <4>[ 7736.350062]  [] ip_rcv+0x258/0x330
> 
> <4>[ 7736.350132]  [] ? inet_del_protocol+0x30/0x30
> 
> <4>[ 7736.350258]  [] __netif_receive_skb+0x325/0x410
> 
> <4>[ 7736.350331]  [] process_backlog+0x96/0x150
> 
> <4>[ 7736.350455]  [] net_rx_action+0x115/0x210
> 
> <4>[ 7736.350525]  [] ? tcp_out_of_resources+0xb0/0xb0
> 
> <4>[ 7736.350652]  [] __do_softirq+0x9b/0x220
> 
> <4>[ 7736.350723]  [] ? local_bh_enable_ip+0xd0/0xd0
> 

Any other suspect messages before this, a memory allocation error for
example ?

I believe we have a bug in tcp_collapse() if one alloc_skb() returns
NULL while we were in the middle of collapsing a big GRO packet.

gro_skb needed 3 skb to be rebuilt, and only two skbs could be allocated

skb1: seq=X  end_seq=X+4000
skb2: seq=X+4000 end_seq=X+8000

grp_skb: seq=X end_seq=X+16000

Next time we call tcp_collapse(), we might split again the GRO packet
and get following incorrect queue :

skb1: seq=X  end_seq=X+4000
skb2: seq=X+4000 end_seq=X+8000
skb3: seq=X  end_seq=X+4000
skb4: seq=X+4000 end_seq=X+8000
skb5: seq=X+8000 end_seq=X+12000
skb6: seq=X+12000 end_seq=X+16000


I would use the following patch instead, to narrow the problem

If we really find in the ofo queue a skb with a lower seq than the
previous one, we should complain instead of lowering @start, since
this is going to crash later.

receive_queue / ofo_queue should contain monotonically increasing
skb->seq.

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 46271cdc..5507a09 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4513,8 +4513,10 @@ static void tcp_collapse_ofo_queue(struct sock *sk)
start = TCP_SKB_CB(skb)->seq;
end = TCP_SKB_CB(skb)->end_seq;
} else {
-   if (before(TCP_SKB_CB(skb)->seq, start))
-   start = TCP_SKB_CB(skb)->seq;
+   if (before(TCP_SKB_CB(skb)->seq, start)) {
+   pr_err_once("tcp_collapse_ofo_queue() : seq 
%08x before start %08X\n",
+   TCP_SKB_CB(skb)->seq, start);
+   }
if (after(TCP_SKB_CB(skb)->end_seq, end))
end = TCP_SKB_CB(skb)->end_seq;
}


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] tcp: Modify the condition for the first skb to collapse

2013-06-17 Thread Jun Chen
On Mon, 2013-06-17 at 03:29 -0700, Eric Dumazet wrote:
> On Mon, 2013-06-17 at 13:29 -0400, Jun Chen wrote:
> > > 
> > hi,
> > When the condition of tcp_win_from_space(skb->truesize) > skb->len is
> > true but the before(start, TCP_SKB_CB(skb)->seq) is also true, the final
> > condition will be true. The follow line:
> > int offset = start - TCP_SKB_CB(skb)->seq;
> > BUG_ON(offset < 0);
> > this BUG_ON will be triggered.
> > 
> 
> Really this should never happen, we must track what's happening here.
It's very very rare, but the logic of codes have such a little hole.
> 
> Are you using a pristine kernel, without any patches ?
The based kernel version is 3.4.  
> 
> Are you able to reproduce this bug in a short amount of time ?
I can't reproduce it in short time, this log had just been found once
for long long time tests on many devices . 
> 
> What kind of driver is in use ? (your stack trace was truncated)

I attach the whole stack traces for you.

<0>[ 7736.348788] Call Trace:

<4>[ 7736.348861]  [] tcp_prune_queue+0x120/0x2f0

<4>[ 7736.348984]  [] tcp_data_queue+0x777/0xf00

<4>[ 7736.349055]  [] ? ipt_do_table+0x1f8/0x480

<4>[ 7736.349126]  [] ? ipt_do_table+0x1f8/0x480

<4>[ 7736.349196]  [] tcp_rcv_established+0x114/0x680

<4>[ 7736.349269]  [] tcp_v4_do_rcv+0x164/0x350

<4>[ 7736.349396]  [] ? nf_nat_fn+0xb1/0x1d0

<4>[ 7736.349470]  [] tcp_v4_rcv+0x6f1/0x7a0

<4>[ 7736.349599]  [] ? nf_hook_slow+0x10d/0x150

<4>[ 7736.349673]  [] ip_local_deliver_finish+0x8b/0x200

<4>[ 7736.349796]  [] ip_local_deliver+0x8f/0xa0

<4>[ 7736.349867]  [] ? ip_rcv_finish+0x300/0x300

<4>[ 7736.349937]  [] ip_rcv_finish+0xdf/0x300

<4>[ 7736.350062]  [] ip_rcv+0x258/0x330

<4>[ 7736.350132]  [] ? inet_del_protocol+0x30/0x30

<4>[ 7736.350258]  [] __netif_receive_skb+0x325/0x410

<4>[ 7736.350331]  [] process_backlog+0x96/0x150

<4>[ 7736.350455]  [] net_rx_action+0x115/0x210

<4>[ 7736.350525]  [] ? tcp_out_of_resources+0xb0/0xb0

<4>[ 7736.350652]  [] __do_softirq+0x9b/0x220

<4>[ 7736.350723]  [] ? local_bh_enable_ip+0xd0/0xd0

> 
> 
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] tcp: Modify the condition for the first skb to collapse

2013-06-17 Thread Eric Dumazet
On Mon, 2013-06-17 at 13:29 -0400, Jun Chen wrote:
> > 
> hi,
> When the condition of tcp_win_from_space(skb->truesize) > skb->len is
> true but the before(start, TCP_SKB_CB(skb)->seq) is also true, the final
> condition will be true. The follow line:
> int offset = start - TCP_SKB_CB(skb)->seq;
> BUG_ON(offset < 0);
> this BUG_ON will be triggered.
> 

Really this should never happen, we must track what's happening here.

Are you using a pristine kernel, without any patches ?

Are you able to reproduce this bug in a short amount of time ?

What kind of driver is in use ? (your stack trace was truncated)



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] tcp: Modify the condition for the first skb to collapse

2013-06-17 Thread Jun Chen
On Mon, 2013-06-17 at 01:15 -0700, Eric Dumazet wrote:
> On Mon, 2013-06-17 at 10:18 -0400, Jun Chen wrote:
> > When search the first skb to collapse,the condition of overlap to the next 
> > one have been
> > reached,but the start is less than TCP_SKB_CB(skb)->seq at this time, then 
> > followed process
> > will trigger the BUG_ON of the offset(start - TCP_SKB_CB(skb)->seq).
> > So this patch add one check (! before(start,TCP_SKB_CB(skb)->seq)) to avoid 
> > this ipanic.
> > 
> > Signed-off-by: Chen Jun 
> > ---
> >  net/ipv4/tcp_input.c |3 ++-
> >  1 files changed, 2 insertions(+), 1 deletions(-)
> > 
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index 9c62257..4c745c5 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -4465,7 +4465,8 @@ restart:
> >  *   overlaps to the next one.
> >  */
> > if (!tcp_hdr(skb)->syn && !tcp_hdr(skb)->fin &&
> > -   (tcp_win_from_space(skb->truesize) > skb->len ||
> > +   ((tcp_win_from_space(skb->truesize) > skb->len &&
> > +   !before(start, TCP_SKB_CB(skb)->seq)) ||
> >  before(TCP_SKB_CB(skb)->seq, start))) {
> > end_of_skbs = false;
> > break;
> 
> Hmm... I must say I do not understand this patch.
> 
> If we find a skb with before(TCP_SKB_CB(skb)->seq, start), then the
> final condition will be true.
> 
> Let's rewrite your code to equivalent one :
> 
>  if (!tcp_hdr(skb)->syn && !tcp_hdr(skb)->fin &&
>  (before(TCP_SKB_CB(skb)->seq, start) ||
>   tcp_win_from_space(skb->truesize) > skb->len)) {
> 
> So it seems your patch would not solve the problem for all
> possible skbs (aka not bloated) ?
> 
> Please tell us how you trigger this bug, and send the stack trace.
> 
> Thanks
> 
> 
hi,
When the condition of tcp_win_from_space(skb->truesize) > skb->len is
true but the before(start, TCP_SKB_CB(skb)->seq) is also true, the final
condition will be true. The follow line:
int offset = start - TCP_SKB_CB(skb)->seq;
BUG_ON(offset < 0);
this BUG_ON will be triggered.


Follow line is my error logs:

<2>[ 7736.344508] kernel BUG
at /data/buildbot/workdir/jb/kernel/net/ipv4/tcp_input.c:4845!

<4>[ 7736.344578] invalid opcode:  [#1] PREEMPT SMP 

<4>[ 7736.344883] Modules linked in: atomisp lm3559 ov9724 imx1x5
bcm4335(O) cfg80211 bcm_bt_lpm videobuf_vmalloc videobuf_core matrix(C)

<4>[ 7736.345681] 

<4>[ 7736.345748] Pid: 5189, comm: TimedEventQueue Tainted: GWC
O 3.4.43-186445-g3ada675 #1 Intel Corporation Merrifield/SALT BAY

<4>[ 7736.346059] EIP: 0060:[] EFLAGS: 00010297 CPU: 1

<4>[ 7736.346183] EIP is at tcp_collapse+0x3bd/0x3d0

<4>[ 7736.346250] EAX: ab57d2bb EBX: df428c00 ECX: c97dcd00 EDX:
10c0

<4>[ 7736.346372] ESI: df4289c0 EDI: fadb EBP: edca1d88 ESP:
edca1d60

<4>[ 7736.346441]  DS: 007b ES: 007b FS: 00d8 GS: 003b SS: 0068

<4>[ 7736.346560] CR0: 8005003b CR2: 41d310bc CR3: 2d30 CR4:
001007d0

<4>[ 7736.346629] DR0:  DR1:  DR2:  DR3:


<4>[ 7736.346749] DR6: 0ff0 DR7: 0400

<0>[ 7736.346816] Process TimedEventQueue (pid: 5189, ti=edca
task=dc30b660 task.ti=c9a6e000)

<0>[ 7736.346936] Stack:

<4>[ 7736.347002]    fadb c97dcd5c 0001 c97dcd00
0e32 c97dcd00

<4>[ 7736.347615]  c97dcd00 df428180 edca1db0 c18addd0  ab57c870
ab57f19f c97dcd00

<4>[ 7736.348175]  c97dd198 80c0 c97dcd00 df428180 edca1df0 c18aea27
 c18dc8f8

<0>[ 7736.348788] Call Trace:

<4>[ 7736.348861]  [] tcp_prune_queue+0x120/0x2f0

<4>[ 7736.348984]  [] tcp_data_queue+0x777/0xf00

<4>[ 7736.349055]  [] ? ipt_do_table+0x1f8/0x480

<4>[ 7736.349126]  [] ? ipt_do_table+0x1f8/0x480

<4>[ 7736.349196]  [] tcp_rcv_established+0x114/0x680

<4>[ 7736.349269]  [] tcp_v4_do_rcv+0x164/0x350



 
 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] tcp: Modify the condition for the first skb to collapse

2013-06-17 Thread Eric Dumazet
On Mon, 2013-06-17 at 10:18 -0400, Jun Chen wrote:
> When search the first skb to collapse,the condition of overlap to the next 
> one have been
> reached,but the start is less than TCP_SKB_CB(skb)->seq at this time, then 
> followed process
> will trigger the BUG_ON of the offset(start - TCP_SKB_CB(skb)->seq).
> So this patch add one check (! before(start,TCP_SKB_CB(skb)->seq)) to avoid 
> this ipanic.
> 
> Signed-off-by: Chen Jun 
> ---
>  net/ipv4/tcp_input.c |3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 9c62257..4c745c5 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -4465,7 +4465,8 @@ restart:
>*   overlaps to the next one.
>*/
>   if (!tcp_hdr(skb)->syn && !tcp_hdr(skb)->fin &&
> - (tcp_win_from_space(skb->truesize) > skb->len ||
> + ((tcp_win_from_space(skb->truesize) > skb->len &&
> + !before(start, TCP_SKB_CB(skb)->seq)) ||
>before(TCP_SKB_CB(skb)->seq, start))) {
>   end_of_skbs = false;
>   break;

Hmm... I must say I do not understand this patch.

If we find a skb with before(TCP_SKB_CB(skb)->seq, start), then the
final condition will be true.

Let's rewrite your code to equivalent one :

 if (!tcp_hdr(skb)->syn && !tcp_hdr(skb)->fin &&
 (before(TCP_SKB_CB(skb)->seq, start) ||
  tcp_win_from_space(skb->truesize) > skb->len)) {

So it seems your patch would not solve the problem for all
possible skbs (aka not bloated) ?

Please tell us how you trigger this bug, and send the stack trace.

Thanks


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] tcp: Modify the condition for the first skb to collapse

2013-06-17 Thread Jun Chen

When search the first skb to collapse,the condition of overlap to the next one 
have been
reached,but the start is less than TCP_SKB_CB(skb)->seq at this time, then 
followed process
will trigger the BUG_ON of the offset(start - TCP_SKB_CB(skb)->seq).
So this patch add one check (! before(start,TCP_SKB_CB(skb)->seq)) to avoid 
this ipanic.

Signed-off-by: Chen Jun 
---
 net/ipv4/tcp_input.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9c62257..4c745c5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4465,7 +4465,8 @@ restart:
 *   overlaps to the next one.
 */
if (!tcp_hdr(skb)->syn && !tcp_hdr(skb)->fin &&
-   (tcp_win_from_space(skb->truesize) > skb->len ||
+   ((tcp_win_from_space(skb->truesize) > skb->len &&
+   !before(start, TCP_SKB_CB(skb)->seq)) ||
 before(TCP_SKB_CB(skb)->seq, start))) {
end_of_skbs = false;
break;
-- 
1.7.4.1



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] tcp: Modify the condition for the first skb to collapse

2013-06-17 Thread Jun Chen

When search the first skb to collapse,the condition of overlap to the next one 
have been
reached,but the start is less than TCP_SKB_CB(skb)-seq at this time, then 
followed process
will trigger the BUG_ON of the offset(start - TCP_SKB_CB(skb)-seq).
So this patch add one check (! before(start,TCP_SKB_CB(skb)-seq)) to avoid 
this ipanic.

Signed-off-by: Chen Jun jun.d.c...@intel.com
---
 net/ipv4/tcp_input.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9c62257..4c745c5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4465,7 +4465,8 @@ restart:
 *   overlaps to the next one.
 */
if (!tcp_hdr(skb)-syn  !tcp_hdr(skb)-fin 
-   (tcp_win_from_space(skb-truesize)  skb-len ||
+   ((tcp_win_from_space(skb-truesize)  skb-len 
+   !before(start, TCP_SKB_CB(skb)-seq)) ||
 before(TCP_SKB_CB(skb)-seq, start))) {
end_of_skbs = false;
break;
-- 
1.7.4.1



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] tcp: Modify the condition for the first skb to collapse

2013-06-17 Thread Eric Dumazet
On Mon, 2013-06-17 at 10:18 -0400, Jun Chen wrote:
 When search the first skb to collapse,the condition of overlap to the next 
 one have been
 reached,but the start is less than TCP_SKB_CB(skb)-seq at this time, then 
 followed process
 will trigger the BUG_ON of the offset(start - TCP_SKB_CB(skb)-seq).
 So this patch add one check (! before(start,TCP_SKB_CB(skb)-seq)) to avoid 
 this ipanic.
 
 Signed-off-by: Chen Jun jun.d.c...@intel.com
 ---
  net/ipv4/tcp_input.c |3 ++-
  1 files changed, 2 insertions(+), 1 deletions(-)
 
 diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
 index 9c62257..4c745c5 100644
 --- a/net/ipv4/tcp_input.c
 +++ b/net/ipv4/tcp_input.c
 @@ -4465,7 +4465,8 @@ restart:
*   overlaps to the next one.
*/
   if (!tcp_hdr(skb)-syn  !tcp_hdr(skb)-fin 
 - (tcp_win_from_space(skb-truesize)  skb-len ||
 + ((tcp_win_from_space(skb-truesize)  skb-len 
 + !before(start, TCP_SKB_CB(skb)-seq)) ||
before(TCP_SKB_CB(skb)-seq, start))) {
   end_of_skbs = false;
   break;

Hmm... I must say I do not understand this patch.

If we find a skb with before(TCP_SKB_CB(skb)-seq, start), then the
final condition will be true.

Let's rewrite your code to equivalent one :

 if (!tcp_hdr(skb)-syn  !tcp_hdr(skb)-fin 
 (before(TCP_SKB_CB(skb)-seq, start) ||
  tcp_win_from_space(skb-truesize)  skb-len)) {

So it seems your patch would not solve the problem for all
possible skbs (aka not bloated) ?

Please tell us how you trigger this bug, and send the stack trace.

Thanks


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] tcp: Modify the condition for the first skb to collapse

2013-06-17 Thread Jun Chen
On Mon, 2013-06-17 at 01:15 -0700, Eric Dumazet wrote:
 On Mon, 2013-06-17 at 10:18 -0400, Jun Chen wrote:
  When search the first skb to collapse,the condition of overlap to the next 
  one have been
  reached,but the start is less than TCP_SKB_CB(skb)-seq at this time, then 
  followed process
  will trigger the BUG_ON of the offset(start - TCP_SKB_CB(skb)-seq).
  So this patch add one check (! before(start,TCP_SKB_CB(skb)-seq)) to avoid 
  this ipanic.
  
  Signed-off-by: Chen Jun jun.d.c...@intel.com
  ---
   net/ipv4/tcp_input.c |3 ++-
   1 files changed, 2 insertions(+), 1 deletions(-)
  
  diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
  index 9c62257..4c745c5 100644
  --- a/net/ipv4/tcp_input.c
  +++ b/net/ipv4/tcp_input.c
  @@ -4465,7 +4465,8 @@ restart:
   *   overlaps to the next one.
   */
  if (!tcp_hdr(skb)-syn  !tcp_hdr(skb)-fin 
  -   (tcp_win_from_space(skb-truesize)  skb-len ||
  +   ((tcp_win_from_space(skb-truesize)  skb-len 
  +   !before(start, TCP_SKB_CB(skb)-seq)) ||
   before(TCP_SKB_CB(skb)-seq, start))) {
  end_of_skbs = false;
  break;
 
 Hmm... I must say I do not understand this patch.
 
 If we find a skb with before(TCP_SKB_CB(skb)-seq, start), then the
 final condition will be true.
 
 Let's rewrite your code to equivalent one :
 
  if (!tcp_hdr(skb)-syn  !tcp_hdr(skb)-fin 
  (before(TCP_SKB_CB(skb)-seq, start) ||
   tcp_win_from_space(skb-truesize)  skb-len)) {
 
 So it seems your patch would not solve the problem for all
 possible skbs (aka not bloated) ?
 
 Please tell us how you trigger this bug, and send the stack trace.
 
 Thanks
 
 
hi,
When the condition of tcp_win_from_space(skb-truesize)  skb-len is
true but the before(start, TCP_SKB_CB(skb)-seq) is also true, the final
condition will be true. The follow line:
int offset = start - TCP_SKB_CB(skb)-seq;
BUG_ON(offset  0);
this BUG_ON will be triggered.


Follow line is my error logs:

2[ 7736.344508] kernel BUG
at /data/buildbot/workdir/jb/kernel/net/ipv4/tcp_input.c:4845!

4[ 7736.344578] invalid opcode:  [#1] PREEMPT SMP 

4[ 7736.344883] Modules linked in: atomisp lm3559 ov9724 imx1x5
bcm4335(O) cfg80211 bcm_bt_lpm videobuf_vmalloc videobuf_core matrix(C)

4[ 7736.345681] 

4[ 7736.345748] Pid: 5189, comm: TimedEventQueue Tainted: GWC
O 3.4.43-186445-g3ada675 #1 Intel Corporation Merrifield/SALT BAY

4[ 7736.346059] EIP: 0060:[c18ad61d] EFLAGS: 00010297 CPU: 1

4[ 7736.346183] EIP is at tcp_collapse+0x3bd/0x3d0

4[ 7736.346250] EAX: ab57d2bb EBX: df428c00 ECX: c97dcd00 EDX:
10c0

4[ 7736.346372] ESI: df4289c0 EDI: fadb EBP: edca1d88 ESP:
edca1d60

4[ 7736.346441]  DS: 007b ES: 007b FS: 00d8 GS: 003b SS: 0068

4[ 7736.346560] CR0: 8005003b CR2: 41d310bc CR3: 2d30 CR4:
001007d0

4[ 7736.346629] DR0:  DR1:  DR2:  DR3:


4[ 7736.346749] DR6: 0ff0 DR7: 0400

0[ 7736.346816] Process TimedEventQueue (pid: 5189, ti=edca
task=dc30b660 task.ti=c9a6e000)

0[ 7736.346936] Stack:

4[ 7736.347002]    fadb c97dcd5c 0001 c97dcd00
0e32 c97dcd00

4[ 7736.347615]  c97dcd00 df428180 edca1db0 c18addd0  ab57c870
ab57f19f c97dcd00

4[ 7736.348175]  c97dd198 80c0 c97dcd00 df428180 edca1df0 c18aea27
 c18dc8f8

0[ 7736.348788] Call Trace:

4[ 7736.348861]  [c18addd0] tcp_prune_queue+0x120/0x2f0

4[ 7736.348984]  [c18aea27] tcp_data_queue+0x777/0xf00

4[ 7736.349055]  [c18dc8f8] ? ipt_do_table+0x1f8/0x480

4[ 7736.349126]  [c18dc8f8] ? ipt_do_table+0x1f8/0x480

4[ 7736.349196]  [c18b2e84] tcp_rcv_established+0x114/0x680

4[ 7736.349269]  [c18bb034] tcp_v4_do_rcv+0x164/0x350



 
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] tcp: Modify the condition for the first skb to collapse

2013-06-17 Thread Eric Dumazet
On Mon, 2013-06-17 at 13:29 -0400, Jun Chen wrote:
  
 hi,
 When the condition of tcp_win_from_space(skb-truesize)  skb-len is
 true but the before(start, TCP_SKB_CB(skb)-seq) is also true, the final
 condition will be true. The follow line:
 int offset = start - TCP_SKB_CB(skb)-seq;
 BUG_ON(offset  0);
 this BUG_ON will be triggered.
 

Really this should never happen, we must track what's happening here.

Are you using a pristine kernel, without any patches ?

Are you able to reproduce this bug in a short amount of time ?

What kind of driver is in use ? (your stack trace was truncated)



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] tcp: Modify the condition for the first skb to collapse

2013-06-17 Thread Jun Chen
On Mon, 2013-06-17 at 03:29 -0700, Eric Dumazet wrote:
 On Mon, 2013-06-17 at 13:29 -0400, Jun Chen wrote:
   
  hi,
  When the condition of tcp_win_from_space(skb-truesize)  skb-len is
  true but the before(start, TCP_SKB_CB(skb)-seq) is also true, the final
  condition will be true. The follow line:
  int offset = start - TCP_SKB_CB(skb)-seq;
  BUG_ON(offset  0);
  this BUG_ON will be triggered.
  
 
 Really this should never happen, we must track what's happening here.
It's very very rare, but the logic of codes have such a little hole.
 
 Are you using a pristine kernel, without any patches ?
The based kernel version is 3.4.  
 
 Are you able to reproduce this bug in a short amount of time ?
I can't reproduce it in short time, this log had just been found once
for long long time tests on many devices . 
 
 What kind of driver is in use ? (your stack trace was truncated)

I attach the whole stack traces for you.

0[ 7736.348788] Call Trace:

4[ 7736.348861]  [c18addd0] tcp_prune_queue+0x120/0x2f0

4[ 7736.348984]  [c18aea27] tcp_data_queue+0x777/0xf00

4[ 7736.349055]  [c18dc8f8] ? ipt_do_table+0x1f8/0x480

4[ 7736.349126]  [c18dc8f8] ? ipt_do_table+0x1f8/0x480

4[ 7736.349196]  [c18b2e84] tcp_rcv_established+0x114/0x680

4[ 7736.349269]  [c18bb034] tcp_v4_do_rcv+0x164/0x350

4[ 7736.349396]  [c18de301] ? nf_nat_fn+0xb1/0x1d0

4[ 7736.349470]  [c18bc0c1] tcp_v4_rcv+0x6f1/0x7a0

4[ 7736.349599]  [c1881dad] ? nf_hook_slow+0x10d/0x150

4[ 7736.349673]  [c189b30b] ip_local_deliver_finish+0x8b/0x200

4[ 7736.349796]  [c189b60f] ip_local_deliver+0x8f/0xa0

4[ 7736.349867]  [c189b280] ? ip_rcv_finish+0x300/0x300

4[ 7736.349937]  [c189b05f] ip_rcv_finish+0xdf/0x300

4[ 7736.350062]  [c189b878] ip_rcv+0x258/0x330

4[ 7736.350132]  [c189af80] ? inet_del_protocol+0x30/0x30

4[ 7736.350258]  [c1864175] __netif_receive_skb+0x325/0x410

4[ 7736.350331]  [c1864956] process_backlog+0x96/0x150

4[ 7736.350455]  [c1864ba5] net_rx_action+0x115/0x210

4[ 7736.350525]  [c18b7680] ? tcp_out_of_resources+0xb0/0xb0

4[ 7736.350652]  [c123dc0b] __do_softirq+0x9b/0x220

4[ 7736.350723]  [c123db70] ? local_bh_enable_ip+0xd0/0xd0

 
 
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] tcp: Modify the condition for the first skb to collapse

2013-06-17 Thread Eric Dumazet
On Mon, 2013-06-17 at 14:52 -0400, Jun Chen wrote:
 On Mon, 2013-06-17 at 03:29 -0700, Eric Dumazet wrote:
  On Mon, 2013-06-17 at 13:29 -0400, Jun Chen wrote:

   hi,
   When the condition of tcp_win_from_space(skb-truesize)  skb-len is
   true but the before(start, TCP_SKB_CB(skb)-seq) is also true, the final
   condition will be true. The follow line:
   int offset = start - TCP_SKB_CB(skb)-seq;
   BUG_ON(offset  0);
   this BUG_ON will be triggered.
   
  
  Really this should never happen, we must track what's happening here.
 It's very very rare, but the logic of codes have such a little hole.
  
  Are you using a pristine kernel, without any patches ?
 The based kernel version is 3.4.  
  
  Are you able to reproduce this bug in a short amount of time ?
 I can't reproduce it in short time, this log had just been found once
 for long long time tests on many devices . 
  
  What kind of driver is in use ? (your stack trace was truncated)
 
 I attach the whole stack traces for you.
 
 0[ 7736.348788] Call Trace:
 
 4[ 7736.348861]  [c18addd0] tcp_prune_queue+0x120/0x2f0
 
 4[ 7736.348984]  [c18aea27] tcp_data_queue+0x777/0xf00
 
 4[ 7736.349055]  [c18dc8f8] ? ipt_do_table+0x1f8/0x480
 
 4[ 7736.349126]  [c18dc8f8] ? ipt_do_table+0x1f8/0x480
 
 4[ 7736.349196]  [c18b2e84] tcp_rcv_established+0x114/0x680
 
 4[ 7736.349269]  [c18bb034] tcp_v4_do_rcv+0x164/0x350
 
 4[ 7736.349396]  [c18de301] ? nf_nat_fn+0xb1/0x1d0
 
 4[ 7736.349470]  [c18bc0c1] tcp_v4_rcv+0x6f1/0x7a0
 
 4[ 7736.349599]  [c1881dad] ? nf_hook_slow+0x10d/0x150
 
 4[ 7736.349673]  [c189b30b] ip_local_deliver_finish+0x8b/0x200
 
 4[ 7736.349796]  [c189b60f] ip_local_deliver+0x8f/0xa0
 
 4[ 7736.349867]  [c189b280] ? ip_rcv_finish+0x300/0x300
 
 4[ 7736.349937]  [c189b05f] ip_rcv_finish+0xdf/0x300
 
 4[ 7736.350062]  [c189b878] ip_rcv+0x258/0x330
 
 4[ 7736.350132]  [c189af80] ? inet_del_protocol+0x30/0x30
 
 4[ 7736.350258]  [c1864175] __netif_receive_skb+0x325/0x410
 
 4[ 7736.350331]  [c1864956] process_backlog+0x96/0x150
 
 4[ 7736.350455]  [c1864ba5] net_rx_action+0x115/0x210
 
 4[ 7736.350525]  [c18b7680] ? tcp_out_of_resources+0xb0/0xb0
 
 4[ 7736.350652]  [c123dc0b] __do_softirq+0x9b/0x220
 
 4[ 7736.350723]  [c123db70] ? local_bh_enable_ip+0xd0/0xd0
 

Any other suspect messages before this, a memory allocation error for
example ?

I believe we have a bug in tcp_collapse() if one alloc_skb() returns
NULL while we were in the middle of collapsing a big GRO packet.

gro_skb needed 3 skb to be rebuilt, and only two skbs could be allocated

skb1: seq=X  end_seq=X+4000
skb2: seq=X+4000 end_seq=X+8000
missing
grp_skb: seq=X end_seq=X+16000

Next time we call tcp_collapse(), we might split again the GRO packet
and get following incorrect queue :

skb1: seq=X  end_seq=X+4000
skb2: seq=X+4000 end_seq=X+8000
skb3: seq=X  end_seq=X+4000
skb4: seq=X+4000 end_seq=X+8000
skb5: seq=X+8000 end_seq=X+12000
skb6: seq=X+12000 end_seq=X+16000


I would use the following patch instead, to narrow the problem

If we really find in the ofo queue a skb with a lower seq than the
previous one, we should complain instead of lowering @start, since
this is going to crash later.

receive_queue / ofo_queue should contain monotonically increasing
skb-seq.

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 46271cdc..5507a09 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4513,8 +4513,10 @@ static void tcp_collapse_ofo_queue(struct sock *sk)
start = TCP_SKB_CB(skb)-seq;
end = TCP_SKB_CB(skb)-end_seq;
} else {
-   if (before(TCP_SKB_CB(skb)-seq, start))
-   start = TCP_SKB_CB(skb)-seq;
+   if (before(TCP_SKB_CB(skb)-seq, start)) {
+   pr_err_once(tcp_collapse_ofo_queue() : seq 
%08x before start %08X\n,
+   TCP_SKB_CB(skb)-seq, start);
+   }
if (after(TCP_SKB_CB(skb)-end_seq, end))
end = TCP_SKB_CB(skb)-end_seq;
}


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] tcp: Modify the condition for the first skb to collapse

2013-06-17 Thread Jun Chen
On Mon, 2013-06-17 at 06:21 -0700, Eric Dumazet wrote:
 On Mon, 2013-06-17 at 14:52 -0400, Jun Chen wrote:
  On Mon, 2013-06-17 at 03:29 -0700, Eric Dumazet wrote:
   On Mon, 2013-06-17 at 13:29 -0400, Jun Chen wrote:
 
hi,
When the condition of tcp_win_from_space(skb-truesize)  skb-len is
true but the before(start, TCP_SKB_CB(skb)-seq) is also true, the final
condition will be true. The follow line:
int offset = start - TCP_SKB_CB(skb)-seq;
BUG_ON(offset  0);
this BUG_ON will be triggered.

   
   Really this should never happen, we must track what's happening here.
  It's very very rare, but the logic of codes have such a little hole.
   
   Are you using a pristine kernel, without any patches ?
  The based kernel version is 3.4.  
   
   Are you able to reproduce this bug in a short amount of time ?
  I can't reproduce it in short time, this log had just been found once
  for long long time tests on many devices . 
   
   What kind of driver is in use ? (your stack trace was truncated)
  
  I attach the whole stack traces for you.
  

 Any other suspect messages before this, a memory allocation error for
 example ?
 
 I believe we have a bug in tcp_collapse() if one alloc_skb() returns
 NULL while we were in the middle of collapsing a big GRO packet.
 
 gro_skb needed 3 skb to be rebuilt, and only two skbs could be allocated
 
 skb1: seq=X  end_seq=X+4000
 skb2: seq=X+4000 end_seq=X+8000
 missing
 grp_skb: seq=X end_seq=X+16000
 
 Next time we call tcp_collapse(), we might split again the GRO packet
 and get following incorrect queue :
 
 skb1: seq=X  end_seq=X+4000
 skb2: seq=X+4000 end_seq=X+8000
 skb3: seq=X  end_seq=X+4000
 skb4: seq=X+4000 end_seq=X+8000
 skb5: seq=X+8000 end_seq=X+12000
 skb6: seq=X+12000 end_seq=X+16000
 
 
 I would use the following patch instead, to narrow the problem
 
 If we really find in the ofo queue a skb with a lower seq than the
 previous one, we should complain instead of lowering @start, since
 this is going to crash later.
 
 receive_queue / ofo_queue should contain monotonically increasing
 skb-seq.
 
 diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
 index 46271cdc..5507a09 100644
 --- a/net/ipv4/tcp_input.c
 +++ b/net/ipv4/tcp_input.c
 @@ -4513,8 +4513,10 @@ static void tcp_collapse_ofo_queue(struct sock *sk)
   start = TCP_SKB_CB(skb)-seq;
   end = TCP_SKB_CB(skb)-end_seq;
   } else {
 - if (before(TCP_SKB_CB(skb)-seq, start))
 - start = TCP_SKB_CB(skb)-seq;
 + if (before(TCP_SKB_CB(skb)-seq, start)) {
 + pr_err_once(tcp_collapse_ofo_queue() : seq 
 %08x before start %08X\n,
 + TCP_SKB_CB(skb)-seq, start);
 + }
   if (after(TCP_SKB_CB(skb)-end_seq, end))
   end = TCP_SKB_CB(skb)-end_seq;
   }
 
 
There are many warning for tcp_recvmsg before this crash. I can't find
other memory warning in the logs, but I'm not sure whether there are
memory issues because of the length limitation of saved logs. I think
this logs will give you more information.

4[ 7736.343742] [ cut here ]

4[ 7736.343759] WARNING:
at /data/buildbot/workdir/jb/kernel/net/ipv4/tcp.c:1496 tcp_recvmsg
+0x3bf/0x910()

4[ 7736.343775] recvmsg bug: copied AB57C870 seq AB57CD95 rcvnxt
AB57F19F fl 0

4[ 7736.343845] Call Trace:

4[ 7736.343865]  [c1237032] warn_slowpath_common+0x72/0xa0

4[ 7736.343888]  [c18a955f] ? tcp_recvmsg+0x3bf/0x910

4[ 7736.343902]  [c18a955f] ? tcp_recvmsg+0x3bf/0x910

4[ 7736.343922]  [c1237103] warn_slowpath_fmt+0x33/0x40

4[ 7736.343944]  [c18a955f] tcp_recvmsg+0x3bf/0x910

4[ 7736.343968]  [c18c9bb5] inet_recvmsg+0x85/0xa0

4[ 7736.343992]  [c1852030] sock_aio_read+0x140/0x160

4[ 7736.344016]  [c126b221] ? set_next_entity+0xc1/0xf0

4[ 7736.344039]  [c130d627] do_sync_read+0xb7/0xf0

4[ 7736.344064]  [c130dc6c] ? rw_verify_area+0x6c/0x120

4[ 7736.344077]  [c1349aa8] ? sys_epoll_wait+0x68/0x360

4[ 7736.344098]  [c130e1e9] vfs_read+0x149/0x160

4[ 7736.344120]  [c130f518] ? fget_light+0x58/0xd0

4[ 7736.344142]  [c130e23d] sys_read+0x3d/0x70

4[ 7736.344164]  [c198c361] syscall_call+0x7/0xb

4[ 7736.344187]  [c198] ? perf_cpu_notify+0x45/0x89

4[ 7736.344205] ---[ end trace b3c5b245ce7ff5b5 ]---



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] tcp: Modify the condition for the first skb to collapse

2013-06-17 Thread Eric Dumazet
On Tue, 2013-06-18 at 05:52 -0400, Jun Chen wrote:
  
 There are many warning for tcp_recvmsg before this crash. I can't find
 other memory warning in the logs, but I'm not sure whether there are
 memory issues because of the length limitation of saved logs. I think
 this logs will give you more information.
 
 4[ 7736.343742] [ cut here ]
 
 4[ 7736.343759] WARNING:
 at /data/buildbot/workdir/jb/kernel/net/ipv4/tcp.c:1496 tcp_recvmsg
 +0x3bf/0x910()
 
 4[ 7736.343775] recvmsg bug: copied AB57C870 seq AB57CD95 rcvnxt
 AB57F19F fl 0
 
 4[ 7736.343845] Call Trace:
 
 4[ 7736.343865]  [c1237032] warn_slowpath_common+0x72/0xa0
 
 4[ 7736.343888]  [c18a955f] ? tcp_recvmsg+0x3bf/0x910
 
 4[ 7736.343902]  [c18a955f] ? tcp_recvmsg+0x3bf/0x910
 
 4[ 7736.343922]  [c1237103] warn_slowpath_fmt+0x33/0x40
 
 4[ 7736.343944]  [c18a955f] tcp_recvmsg+0x3bf/0x910
 
 4[ 7736.343968]  [c18c9bb5] inet_recvmsg+0x85/0xa0
 
 4[ 7736.343992]  [c1852030] sock_aio_read+0x140/0x160
 
 4[ 7736.344016]  [c126b221] ? set_next_entity+0xc1/0xf0
 
 4[ 7736.344039]  [c130d627] do_sync_read+0xb7/0xf0
 
 4[ 7736.344064]  [c130dc6c] ? rw_verify_area+0x6c/0x120
 
 4[ 7736.344077]  [c1349aa8] ? sys_epoll_wait+0x68/0x360
 
 4[ 7736.344098]  [c130e1e9] vfs_read+0x149/0x160
 
 4[ 7736.344120]  [c130f518] ? fget_light+0x58/0xd0
 
 4[ 7736.344142]  [c130e23d] sys_read+0x3d/0x70
 
 4[ 7736.344164]  [c198c361] syscall_call+0x7/0xb
 
 4[ 7736.344187]  [c198] ? perf_cpu_notify+0x45/0x89
 
 4[ 7736.344205] ---[ end trace b3c5b245ce7ff5b5 ]---
 

Thats exactly the interesting stuff ;)

This was fixed, or should be fixed if still happening on more recent
kernels.

Basically, once we are in this state, there is nothing we can do to
prevent a crash.

Please try to reproduce the issue using 3.9 or David trees (net-next or
net )

Thanks


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/