Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-27 Thread Larry Finger

On 08/27/2012 01:21 PM, Eric Dumazet wrote:

On Mon, 2012-08-27 at 12:55 -0500, Larry Finger wrote:


I have prepared a patch to fix all the unchecked allocations.

Over the weekend I made some progress. To test the latest vendor driver, I
installed a 32-bit system. Their driver is not compatible with a 64-bit system.
I found that not only did the vendor driver work with secure sites, but so did
the in-kernel version. I now have tcpdump output for the 32-bit case that works,
and the 64-bit case that fails. It seems likely that I missed some 32/64 bit
incompatibility when I did the conversion.

Thanks for all your help in trying to resolve this issue.


Interesting, so these 32/64 bits problems did not happen in prior
kernels ? (< 3.4 )


Prior kernels are OK.

Larry

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-27 Thread Eric Dumazet
On Mon, 2012-08-27 at 12:55 -0500, Larry Finger wrote:

> I have prepared a patch to fix all the unchecked allocations.
> 
> Over the weekend I made some progress. To test the latest vendor driver, I 
> installed a 32-bit system. Their driver is not compatible with a 64-bit 
> system. 
> I found that not only did the vendor driver work with secure sites, but so 
> did 
> the in-kernel version. I now have tcpdump output for the 32-bit case that 
> works, 
> and the 64-bit case that fails. It seems likely that I missed some 32/64 bit 
> incompatibility when I did the conversion.
> 
> Thanks for all your help in trying to resolve this issue.

Interesting, so these 32/64 bits problems did not happen in prior
kernels ? (< 3.4 )


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-27 Thread Larry Finger

On 08/24/2012 12:47 PM, Eric Dumazet wrote:


dev_alloc_skb() can return NULL

-> crash

skb_clone() can also return NULL

-> crash


I have prepared a patch to fix all the unchecked allocations.

Over the weekend I made some progress. To test the latest vendor driver, I 
installed a 32-bit system. Their driver is not compatible with a 64-bit system. 
I found that not only did the vendor driver work with secure sites, but so did 
the in-kernel version. I now have tcpdump output for the 32-bit case that works, 
and the 64-bit case that fails. It seems likely that I missed some 32/64 bit 
incompatibility when I did the conversion.


Thanks for all your help in trying to resolve this issue.

Larry


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-27 Thread Larry Finger

On 08/24/2012 12:47 PM, Eric Dumazet wrote:


dev_alloc_skb() can return NULL

- crash

skb_clone() can also return NULL

- crash


I have prepared a patch to fix all the unchecked allocations.

Over the weekend I made some progress. To test the latest vendor driver, I 
installed a 32-bit system. Their driver is not compatible with a 64-bit system. 
I found that not only did the vendor driver work with secure sites, but so did 
the in-kernel version. I now have tcpdump output for the 32-bit case that works, 
and the 64-bit case that fails. It seems likely that I missed some 32/64 bit 
incompatibility when I did the conversion.


Thanks for all your help in trying to resolve this issue.

Larry


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-27 Thread Eric Dumazet
On Mon, 2012-08-27 at 12:55 -0500, Larry Finger wrote:

 I have prepared a patch to fix all the unchecked allocations.
 
 Over the weekend I made some progress. To test the latest vendor driver, I 
 installed a 32-bit system. Their driver is not compatible with a 64-bit 
 system. 
 I found that not only did the vendor driver work with secure sites, but so 
 did 
 the in-kernel version. I now have tcpdump output for the 32-bit case that 
 works, 
 and the 64-bit case that fails. It seems likely that I missed some 32/64 bit 
 incompatibility when I did the conversion.
 
 Thanks for all your help in trying to resolve this issue.

Interesting, so these 32/64 bits problems did not happen in prior
kernels ? ( 3.4 )


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-27 Thread Larry Finger

On 08/27/2012 01:21 PM, Eric Dumazet wrote:

On Mon, 2012-08-27 at 12:55 -0500, Larry Finger wrote:


I have prepared a patch to fix all the unchecked allocations.

Over the weekend I made some progress. To test the latest vendor driver, I
installed a 32-bit system. Their driver is not compatible with a 64-bit system.
I found that not only did the vendor driver work with secure sites, but so did
the in-kernel version. I now have tcpdump output for the 32-bit case that works,
and the 64-bit case that fails. It seems likely that I missed some 32/64 bit
incompatibility when I did the conversion.

Thanks for all your help in trying to resolve this issue.


Interesting, so these 32/64 bits problems did not happen in prior
kernels ? ( 3.4 )


Prior kernels are OK.

Larry

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-24 Thread Eric Dumazet
On Fri, 2012-08-24 at 11:58 -0500, Larry Finger wrote:
> On 08/24/2012 11:23 AM, Eric Dumazet wrote:
> > On Fri, 2012-08-24 at 18:18 +0200, Eric Dumazet wrote:
> >> On Fri, 2012-08-24 at 10:58 -0500, Larry Finger wrote:
> >>> On 08/24/2012 10:19 AM, David Miller wrote:
> 
>  This looks like full-on data corruption to me.
> >>>
> >>> I agree. The question is why does it happen with r8712u, and only after 
> >>> the
> >>> commit in the subject. Drivers for other devices that I have are OK. Thus 
> >>> far, I
> >>> have tested b43, rtl8187, ath9k_htc, and rtl8192cu. To my knowledge, 
> >>> there are
> >>> no reports posted for this bug with any other device.
> >>
> >> bugs can sit unnoticed, and one change somewhere can uncover them.
> >>
> >> Really this driver must have a bug, if not half a dozen of bugs.
> >>
> >> For example this sequence of code is a clear bug :
> >>
> >> sub_skb = dev_alloc_skb(nSubframe_Length + 12);
> >> skb_reserve(sub_skb, 12);
> >>
> >>
> >> Also the free_recv_skb_queue looks really suspect to me
> >>
> >> What the hell is doing recv_tasklet() I really wonder.
> >>
> >> This code, combined with the skb_clone() in recvbuf2recvframe()
> >> can clearly reuse an skb passed to upper stacks.
> >>
> >>
> >> queueing one skb in free_recv_skb_queue should be done
> >> only if no clone of this skb exist somewhere.
> >>
> >> Please someone fix this buggy driver.
> >>
> >
> > Try the following patch for a start
> >
> > diff --git a/drivers/staging/rtl8712/rtl8712_recv.c 
> > b/drivers/staging/rtl8712/rtl8712_recv.c
> > index 8e82ce2..88e3ca6 100644
> > --- a/drivers/staging/rtl8712/rtl8712_recv.c
> > +++ b/drivers/staging/rtl8712/rtl8712_recv.c
> > @@ -1127,6 +1127,9 @@ static void recv_tasklet(void *priv)
> > recvbuf2recvframe(padapter, pskb);
> > skb_reset_tail_pointer(pskb);
> > pskb->len = 0;
> > -   skb_queue_tail(>free_recv_skb_queue, pskb);
> > +   if (!skb_cloned(pskb))
> > +   skb_queue_tail(>free_recv_skb_queue, pskb);
> > +   else
> > +   consume_skb(pskb);
> > }
> >   }
> 
> This one did not help. There is no doubt it is needed for the case where 
> memory 
> is tight, an allocation fails, and the driver clones the skb. In the present 
> case, debug statements have shown that the skb_clone() call was not made.
> 
> In the long term, this driver will be replaced with one that uses mac80211, 
> but 
> in the short term, I am trying to fix it.
> 
> As I said earlier, my skb skills are minimal. Could you explain what is wrong 
> with the following sequence?
> 
>   sub_skb = dev_alloc_skb(nSubframe_Length + 12);
>   skb_reserve(sub_skb, 12);

dev_alloc_skb() can return NULL

-> crash

skb_clone() can also return NULL 

-> crash



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-24 Thread Larry Finger

On 08/24/2012 11:23 AM, Eric Dumazet wrote:

On Fri, 2012-08-24 at 18:18 +0200, Eric Dumazet wrote:

On Fri, 2012-08-24 at 10:58 -0500, Larry Finger wrote:

On 08/24/2012 10:19 AM, David Miller wrote:


This looks like full-on data corruption to me.


I agree. The question is why does it happen with r8712u, and only after the
commit in the subject. Drivers for other devices that I have are OK. Thus far, I
have tested b43, rtl8187, ath9k_htc, and rtl8192cu. To my knowledge, there are
no reports posted for this bug with any other device.


bugs can sit unnoticed, and one change somewhere can uncover them.

Really this driver must have a bug, if not half a dozen of bugs.

For example this sequence of code is a clear bug :

sub_skb = dev_alloc_skb(nSubframe_Length + 12);
skb_reserve(sub_skb, 12);


Also the free_recv_skb_queue looks really suspect to me

What the hell is doing recv_tasklet() I really wonder.

This code, combined with the skb_clone() in recvbuf2recvframe()
can clearly reuse an skb passed to upper stacks.


queueing one skb in free_recv_skb_queue should be done
only if no clone of this skb exist somewhere.

Please someone fix this buggy driver.



Try the following patch for a start

diff --git a/drivers/staging/rtl8712/rtl8712_recv.c 
b/drivers/staging/rtl8712/rtl8712_recv.c
index 8e82ce2..88e3ca6 100644
--- a/drivers/staging/rtl8712/rtl8712_recv.c
+++ b/drivers/staging/rtl8712/rtl8712_recv.c
@@ -1127,6 +1127,9 @@ static void recv_tasklet(void *priv)
recvbuf2recvframe(padapter, pskb);
skb_reset_tail_pointer(pskb);
pskb->len = 0;
-   skb_queue_tail(>free_recv_skb_queue, pskb);
+   if (!skb_cloned(pskb))
+   skb_queue_tail(>free_recv_skb_queue, pskb);
+   else
+   consume_skb(pskb);
}
  }


This one did not help. There is no doubt it is needed for the case where memory 
is tight, an allocation fails, and the driver clones the skb. In the present 
case, debug statements have shown that the skb_clone() call was not made.


In the long term, this driver will be replaced with one that uses mac80211, but 
in the short term, I am trying to fix it.


As I said earlier, my skb skills are minimal. Could you explain what is wrong 
with the following sequence?


 sub_skb = dev_alloc_skb(nSubframe_Length + 12);
 skb_reserve(sub_skb, 12);

Thanks,

Larry


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-24 Thread Larry Finger

On 08/24/2012 11:01 AM, Eric Dumazet wrote:

On Fri, 2012-08-24 at 10:49 -0500, Larry Finger wrote:


There is nothing in kernel log when it happens. The file STRACE is attached.



So there is indeed a corruption. What was the driver you used in this
case ?


The only driver that shows this problem is r8712u. One thought that I just had 
is that this driver does not use mac80211, which is why it is in staging. 
Instead it uses a Realtek soft MAC layer derived from the obsolete softmac 
driver that was in the kernel in the 2.6.20 time frame. That means that none of 
the skbs have ever been cloned by any part of the wireless code.


Larry


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-24 Thread Eric Dumazet
On Fri, 2012-08-24 at 18:18 +0200, Eric Dumazet wrote:
> On Fri, 2012-08-24 at 10:58 -0500, Larry Finger wrote:
> > On 08/24/2012 10:19 AM, David Miller wrote:
> > >
> > > This looks like full-on data corruption to me.
> > 
> > I agree. The question is why does it happen with r8712u, and only after the 
> > commit in the subject. Drivers for other devices that I have are OK. Thus 
> > far, I 
> > have tested b43, rtl8187, ath9k_htc, and rtl8192cu. To my knowledge, there 
> > are 
> > no reports posted for this bug with any other device.
> 
> bugs can sit unnoticed, and one change somewhere can uncover them.
> 
> Really this driver must have a bug, if not half a dozen of bugs.
> 
> For example this sequence of code is a clear bug :
> 
> sub_skb = dev_alloc_skb(nSubframe_Length + 12); 
> skb_reserve(sub_skb, 12);
> 
> 
> Also the free_recv_skb_queue looks really suspect to me
> 
> What the hell is doing recv_tasklet() I really wonder.
> 
> This code, combined with the skb_clone() in recvbuf2recvframe()
> can clearly reuse an skb passed to upper stacks.
> 
> 
> queueing one skb in free_recv_skb_queue should be done
> only if no clone of this skb exist somewhere.
> 
> Please someone fix this buggy driver.
> 

Try the following patch for a start

diff --git a/drivers/staging/rtl8712/rtl8712_recv.c 
b/drivers/staging/rtl8712/rtl8712_recv.c
index 8e82ce2..88e3ca6 100644
--- a/drivers/staging/rtl8712/rtl8712_recv.c
+++ b/drivers/staging/rtl8712/rtl8712_recv.c
@@ -1127,6 +1127,9 @@ static void recv_tasklet(void *priv)
recvbuf2recvframe(padapter, pskb);
skb_reset_tail_pointer(pskb);
pskb->len = 0;
-   skb_queue_tail(>free_recv_skb_queue, pskb);
+   if (!skb_cloned(pskb))
+   skb_queue_tail(>free_recv_skb_queue, pskb);
+   else
+   consume_skb(pskb);
}
 }


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-24 Thread Eric Dumazet
On Fri, 2012-08-24 at 10:58 -0500, Larry Finger wrote:
> On 08/24/2012 10:19 AM, David Miller wrote:
> >
> > This looks like full-on data corruption to me.
> 
> I agree. The question is why does it happen with r8712u, and only after the 
> commit in the subject. Drivers for other devices that I have are OK. Thus 
> far, I 
> have tested b43, rtl8187, ath9k_htc, and rtl8192cu. To my knowledge, there 
> are 
> no reports posted for this bug with any other device.

bugs can sit unnoticed, and one change somewhere can uncover them.

Really this driver must have a bug, if not half a dozen of bugs.

For example this sequence of code is a clear bug :

sub_skb = dev_alloc_skb(nSubframe_Length + 12); 
skb_reserve(sub_skb, 12);


Also the free_recv_skb_queue looks really suspect to me

What the hell is doing recv_tasklet() I really wonder.

This code, combined with the skb_clone() in recvbuf2recvframe()
can clearly reuse an skb passed to upper stacks.


queueing one skb in free_recv_skb_queue should be done
only if no clone of this skb exist somewhere.

Please someone fix this buggy driver.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-24 Thread Eric Dumazet
On Fri, 2012-08-24 at 10:49 -0500, Larry Finger wrote:

> There is nothing in kernel log when it happens. The file STRACE is attached.
> 

So there is indeed a corruption. What was the driver you used in this
case ?


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-24 Thread Larry Finger

On 08/24/2012 10:19 AM, David Miller wrote:


This looks like full-on data corruption to me.


I agree. The question is why does it happen with r8712u, and only after the 
commit in the subject. Drivers for other devices that I have are OK. Thus far, I 
have tested b43, rtl8187, ath9k_htc, and rtl8192cu. To my knowledge, there are 
no reports posted for this bug with any other device.


Larry




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-24 Thread David Miller
From: Larry Finger 
Date: Fri, 24 Aug 2012 09:09:57 -0500

> I usually get failures on the first or second try. A kernel built from
> "git checkout c862815" gets the following wget output:
> 
> 
> --2012-08-23 21:50:32--
> --https://bugzilla.redhat.com/show_bug.cgi?id=847525
> Resolving bugzilla.redhat.com (bugzilla.redhat.com)... 209.132.183.69
> Connecting to bugzilla.redhat.com
> (bugzilla.redhat.com)|209.132.183.69|:443... connected.
> OpenSSL: error:0D0680A8:asn1 encoding routines:ASN1_CHECK_TLEN:wrong
> tag
> OpenSSL: error:0D07803A:asn1 encoding routines:ASN1_ITEM_EX_D2I:nested
> asn1 error
> OpenSSL: error:0D08303A:asn1 encoding
> routines:ASN1_TEMPLATE_NOEXP_D2I:nested asn1 error
> OpenSSL: error:0D08303A:asn1 encoding
> routines:ASN1_TEMPLATE_NOEXP_D2I:nested asn1 error
> OpenSSL: error:1409000D:SSL routines:SSL3_GET_SERVER_CERTIFICATE:ASN1
> lib
> Unable to establish SSL connection.
> 

This looks like full-on data corruption to me.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-24 Thread Eric Dumazet
Le vendredi 24 août 2012 à 09:09 -0500, Larry Finger a écrit :

> With kernel 3.6-rc2, the error changes to the following:
> 
> 
> --2012-08-24 08:26:42--  https://bugzilla.redhat.com/show_bug.cgi?id=847525
> Resolving bugzilla.redhat.com (bugzilla.redhat.com)... 209.132.183.69
> Connecting to bugzilla.redhat.com 
> (bugzilla.redhat.com)|209.132.183.69|:443... 
> connected.
> HTTP request sent, awaiting response... 200 OK
> Length: unspecified [text/html]
> Saving to: `show_bug.cgi?id=847525.3'
> 
>  [ <=>  ] 0   --.-K/s   in 0s
> 
> 2012-08-24 08:26:45 (0.00 B/s) - Read error at byte 0 (error:1408F119:SSL 
> routines:SSL3_GET_RECORD:decryption failed or bad record mac).Retrying.
> 
> 

> Anything else to try?

When this happens, nothing in kernel log ?

Could you do

strace -o STRACE -s 8000 wget https://bugzilla.redhat.com/show_bug.cgi?id=847525

and send the STRACE file when you get a failure ?

Thanks



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-24 Thread Larry Finger

On 08/23/2012 04:26 PM, Eric Dumazet wrote:


skb->truesize is adjusted when a frag is added to one skb, or when
skb->head is re-allocated.

Are you sure you dont have another problem, because as I said commit
c8628155ece3  had a bug, so a bisect is not very useful.

How many reloads are needed to trigger the bug, do you have a script to
reproduce it ?


I was doing the tests with a browser; however, it seems that a wget of that same 
page is much more effective at triggering the problem. Now I have a script that 
tries 60 passes of


wget https://bugzilla.redhat.com/show_bug.cgi?id=847525

I usually get failures on the first or second try. A kernel built from "git 
checkout c862815" gets the following wget output:



--2012-08-23 21:50:32--  https://bugzilla.redhat.com/show_bug.cgi?id=847525
Resolving bugzilla.redhat.com (bugzilla.redhat.com)... 209.132.183.69
Connecting to bugzilla.redhat.com (bugzilla.redhat.com)|209.132.183.69|:443... 
connected.

OpenSSL: error:0D0680A8:asn1 encoding routines:ASN1_CHECK_TLEN:wrong tag
OpenSSL: error:0D07803A:asn1 encoding routines:ASN1_ITEM_EX_D2I:nested asn1 
error
OpenSSL: error:0D08303A:asn1 encoding routines:ASN1_TEMPLATE_NOEXP_D2I:nested 
asn1 error
OpenSSL: error:0D08303A:asn1 encoding routines:ASN1_TEMPLATE_NOEXP_D2I:nested 
asn1 error

OpenSSL: error:1409000D:SSL routines:SSL3_GET_SERVER_CERTIFICATE:ASN1 lib
Unable to establish SSL connection.


With kernel 3.6-rc2, the error changes to the following:


--2012-08-24 08:26:42--  https://bugzilla.redhat.com/show_bug.cgi?id=847525
Resolving bugzilla.redhat.com (bugzilla.redhat.com)... 209.132.183.69
Connecting to bugzilla.redhat.com (bugzilla.redhat.com)|209.132.183.69|:443... 
connected.

HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `show_bug.cgi?id=847525.3'

[ <=>  ] 0   --.-K/s   in 0s

2012-08-24 08:26:45 (0.00 B/s) - Read error at byte 0 (error:1408F119:SSL 
routines:SSL3_GET_RECORD:decryption failed or bad record mac).Retrying.




Could it be a PMTU problem ? (check
http://git.kernel.org/?p=linux/kernel/git/davem/net.git;a=commit;h=9b04f350057863d1fad1ba071e09362a1da3503e
 )


I added the PMTU patch to kernel 3.6-rc2 - nothing changed.

Anything else to try?

Thanks,

Larry

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-24 Thread Larry Finger

On 08/23/2012 04:26 PM, Eric Dumazet wrote:


skb-truesize is adjusted when a frag is added to one skb, or when
skb-head is re-allocated.

Are you sure you dont have another problem, because as I said commit
c8628155ece3  had a bug, so a bisect is not very useful.

How many reloads are needed to trigger the bug, do you have a script to
reproduce it ?


I was doing the tests with a browser; however, it seems that a wget of that same 
page is much more effective at triggering the problem. Now I have a script that 
tries 60 passes of


wget https://bugzilla.redhat.com/show_bug.cgi?id=847525

I usually get failures on the first or second try. A kernel built from git 
checkout c862815 gets the following wget output:



--2012-08-23 21:50:32--  https://bugzilla.redhat.com/show_bug.cgi?id=847525
Resolving bugzilla.redhat.com (bugzilla.redhat.com)... 209.132.183.69
Connecting to bugzilla.redhat.com (bugzilla.redhat.com)|209.132.183.69|:443... 
connected.

OpenSSL: error:0D0680A8:asn1 encoding routines:ASN1_CHECK_TLEN:wrong tag
OpenSSL: error:0D07803A:asn1 encoding routines:ASN1_ITEM_EX_D2I:nested asn1 
error
OpenSSL: error:0D08303A:asn1 encoding routines:ASN1_TEMPLATE_NOEXP_D2I:nested 
asn1 error
OpenSSL: error:0D08303A:asn1 encoding routines:ASN1_TEMPLATE_NOEXP_D2I:nested 
asn1 error

OpenSSL: error:1409000D:SSL routines:SSL3_GET_SERVER_CERTIFICATE:ASN1 lib
Unable to establish SSL connection.


With kernel 3.6-rc2, the error changes to the following:


--2012-08-24 08:26:42--  https://bugzilla.redhat.com/show_bug.cgi?id=847525
Resolving bugzilla.redhat.com (bugzilla.redhat.com)... 209.132.183.69
Connecting to bugzilla.redhat.com (bugzilla.redhat.com)|209.132.183.69|:443... 
connected.

HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `show_bug.cgi?id=847525.3'

[ =  ] 0   --.-K/s   in 0s

2012-08-24 08:26:45 (0.00 B/s) - Read error at byte 0 (error:1408F119:SSL 
routines:SSL3_GET_RECORD:decryption failed or bad record mac).Retrying.




Could it be a PMTU problem ? (check
http://git.kernel.org/?p=linux/kernel/git/davem/net.git;a=commit;h=9b04f350057863d1fad1ba071e09362a1da3503e
 )


I added the PMTU patch to kernel 3.6-rc2 - nothing changed.

Anything else to try?

Thanks,

Larry

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-24 Thread Eric Dumazet
Le vendredi 24 août 2012 à 09:09 -0500, Larry Finger a écrit :

 With kernel 3.6-rc2, the error changes to the following:
 
 
 --2012-08-24 08:26:42--  https://bugzilla.redhat.com/show_bug.cgi?id=847525
 Resolving bugzilla.redhat.com (bugzilla.redhat.com)... 209.132.183.69
 Connecting to bugzilla.redhat.com 
 (bugzilla.redhat.com)|209.132.183.69|:443... 
 connected.
 HTTP request sent, awaiting response... 200 OK
 Length: unspecified [text/html]
 Saving to: `show_bug.cgi?id=847525.3'
 
  [ =  ] 0   --.-K/s   in 0s
 
 2012-08-24 08:26:45 (0.00 B/s) - Read error at byte 0 (error:1408F119:SSL 
 routines:SSL3_GET_RECORD:decryption failed or bad record mac).Retrying.
 
 

 Anything else to try?

When this happens, nothing in kernel log ?

Could you do

strace -o STRACE -s 8000 wget https://bugzilla.redhat.com/show_bug.cgi?id=847525

and send the STRACE file when you get a failure ?

Thanks



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-24 Thread David Miller
From: Larry Finger larry.fin...@lwfinger.net
Date: Fri, 24 Aug 2012 09:09:57 -0500

 I usually get failures on the first or second try. A kernel built from
 git checkout c862815 gets the following wget output:
 
 
 --2012-08-23 21:50:32--
 --https://bugzilla.redhat.com/show_bug.cgi?id=847525
 Resolving bugzilla.redhat.com (bugzilla.redhat.com)... 209.132.183.69
 Connecting to bugzilla.redhat.com
 (bugzilla.redhat.com)|209.132.183.69|:443... connected.
 OpenSSL: error:0D0680A8:asn1 encoding routines:ASN1_CHECK_TLEN:wrong
 tag
 OpenSSL: error:0D07803A:asn1 encoding routines:ASN1_ITEM_EX_D2I:nested
 asn1 error
 OpenSSL: error:0D08303A:asn1 encoding
 routines:ASN1_TEMPLATE_NOEXP_D2I:nested asn1 error
 OpenSSL: error:0D08303A:asn1 encoding
 routines:ASN1_TEMPLATE_NOEXP_D2I:nested asn1 error
 OpenSSL: error:1409000D:SSL routines:SSL3_GET_SERVER_CERTIFICATE:ASN1
 lib
 Unable to establish SSL connection.
 

This looks like full-on data corruption to me.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-24 Thread Larry Finger

On 08/24/2012 10:19 AM, David Miller wrote:


This looks like full-on data corruption to me.


I agree. The question is why does it happen with r8712u, and only after the 
commit in the subject. Drivers for other devices that I have are OK. Thus far, I 
have tested b43, rtl8187, ath9k_htc, and rtl8192cu. To my knowledge, there are 
no reports posted for this bug with any other device.


Larry




--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-24 Thread Eric Dumazet
On Fri, 2012-08-24 at 10:49 -0500, Larry Finger wrote:

 There is nothing in kernel log when it happens. The file STRACE is attached.
 

So there is indeed a corruption. What was the driver you used in this
case ?


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-24 Thread Eric Dumazet
On Fri, 2012-08-24 at 10:58 -0500, Larry Finger wrote:
 On 08/24/2012 10:19 AM, David Miller wrote:
 
  This looks like full-on data corruption to me.
 
 I agree. The question is why does it happen with r8712u, and only after the 
 commit in the subject. Drivers for other devices that I have are OK. Thus 
 far, I 
 have tested b43, rtl8187, ath9k_htc, and rtl8192cu. To my knowledge, there 
 are 
 no reports posted for this bug with any other device.

bugs can sit unnoticed, and one change somewhere can uncover them.

Really this driver must have a bug, if not half a dozen of bugs.

For example this sequence of code is a clear bug :

sub_skb = dev_alloc_skb(nSubframe_Length + 12); 
skb_reserve(sub_skb, 12);


Also the free_recv_skb_queue looks really suspect to me

What the hell is doing recv_tasklet() I really wonder.

This code, combined with the skb_clone() in recvbuf2recvframe()
can clearly reuse an skb passed to upper stacks.


queueing one skb in free_recv_skb_queue should be done
only if no clone of this skb exist somewhere.

Please someone fix this buggy driver.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-24 Thread Eric Dumazet
On Fri, 2012-08-24 at 18:18 +0200, Eric Dumazet wrote:
 On Fri, 2012-08-24 at 10:58 -0500, Larry Finger wrote:
  On 08/24/2012 10:19 AM, David Miller wrote:
  
   This looks like full-on data corruption to me.
  
  I agree. The question is why does it happen with r8712u, and only after the 
  commit in the subject. Drivers for other devices that I have are OK. Thus 
  far, I 
  have tested b43, rtl8187, ath9k_htc, and rtl8192cu. To my knowledge, there 
  are 
  no reports posted for this bug with any other device.
 
 bugs can sit unnoticed, and one change somewhere can uncover them.
 
 Really this driver must have a bug, if not half a dozen of bugs.
 
 For example this sequence of code is a clear bug :
 
 sub_skb = dev_alloc_skb(nSubframe_Length + 12); 
 skb_reserve(sub_skb, 12);
 
 
 Also the free_recv_skb_queue looks really suspect to me
 
 What the hell is doing recv_tasklet() I really wonder.
 
 This code, combined with the skb_clone() in recvbuf2recvframe()
 can clearly reuse an skb passed to upper stacks.
 
 
 queueing one skb in free_recv_skb_queue should be done
 only if no clone of this skb exist somewhere.
 
 Please someone fix this buggy driver.
 

Try the following patch for a start

diff --git a/drivers/staging/rtl8712/rtl8712_recv.c 
b/drivers/staging/rtl8712/rtl8712_recv.c
index 8e82ce2..88e3ca6 100644
--- a/drivers/staging/rtl8712/rtl8712_recv.c
+++ b/drivers/staging/rtl8712/rtl8712_recv.c
@@ -1127,6 +1127,9 @@ static void recv_tasklet(void *priv)
recvbuf2recvframe(padapter, pskb);
skb_reset_tail_pointer(pskb);
pskb-len = 0;
-   skb_queue_tail(precvpriv-free_recv_skb_queue, pskb);
+   if (!skb_cloned(pskb))
+   skb_queue_tail(precvpriv-free_recv_skb_queue, pskb);
+   else
+   consume_skb(pskb);
}
 }


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-24 Thread Larry Finger

On 08/24/2012 11:01 AM, Eric Dumazet wrote:

On Fri, 2012-08-24 at 10:49 -0500, Larry Finger wrote:


There is nothing in kernel log when it happens. The file STRACE is attached.



So there is indeed a corruption. What was the driver you used in this
case ?


The only driver that shows this problem is r8712u. One thought that I just had 
is that this driver does not use mac80211, which is why it is in staging. 
Instead it uses a Realtek soft MAC layer derived from the obsolete softmac 
driver that was in the kernel in the 2.6.20 time frame. That means that none of 
the skbs have ever been cloned by any part of the wireless code.


Larry


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-24 Thread Larry Finger

On 08/24/2012 11:23 AM, Eric Dumazet wrote:

On Fri, 2012-08-24 at 18:18 +0200, Eric Dumazet wrote:

On Fri, 2012-08-24 at 10:58 -0500, Larry Finger wrote:

On 08/24/2012 10:19 AM, David Miller wrote:


This looks like full-on data corruption to me.


I agree. The question is why does it happen with r8712u, and only after the
commit in the subject. Drivers for other devices that I have are OK. Thus far, I
have tested b43, rtl8187, ath9k_htc, and rtl8192cu. To my knowledge, there are
no reports posted for this bug with any other device.


bugs can sit unnoticed, and one change somewhere can uncover them.

Really this driver must have a bug, if not half a dozen of bugs.

For example this sequence of code is a clear bug :

sub_skb = dev_alloc_skb(nSubframe_Length + 12);
skb_reserve(sub_skb, 12);


Also the free_recv_skb_queue looks really suspect to me

What the hell is doing recv_tasklet() I really wonder.

This code, combined with the skb_clone() in recvbuf2recvframe()
can clearly reuse an skb passed to upper stacks.


queueing one skb in free_recv_skb_queue should be done
only if no clone of this skb exist somewhere.

Please someone fix this buggy driver.



Try the following patch for a start

diff --git a/drivers/staging/rtl8712/rtl8712_recv.c 
b/drivers/staging/rtl8712/rtl8712_recv.c
index 8e82ce2..88e3ca6 100644
--- a/drivers/staging/rtl8712/rtl8712_recv.c
+++ b/drivers/staging/rtl8712/rtl8712_recv.c
@@ -1127,6 +1127,9 @@ static void recv_tasklet(void *priv)
recvbuf2recvframe(padapter, pskb);
skb_reset_tail_pointer(pskb);
pskb-len = 0;
-   skb_queue_tail(precvpriv-free_recv_skb_queue, pskb);
+   if (!skb_cloned(pskb))
+   skb_queue_tail(precvpriv-free_recv_skb_queue, pskb);
+   else
+   consume_skb(pskb);
}
  }


This one did not help. There is no doubt it is needed for the case where memory 
is tight, an allocation fails, and the driver clones the skb. In the present 
case, debug statements have shown that the skb_clone() call was not made.


In the long term, this driver will be replaced with one that uses mac80211, but 
in the short term, I am trying to fix it.


As I said earlier, my skb skills are minimal. Could you explain what is wrong 
with the following sequence?


 sub_skb = dev_alloc_skb(nSubframe_Length + 12);
 skb_reserve(sub_skb, 12);

Thanks,

Larry


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-24 Thread Eric Dumazet
On Fri, 2012-08-24 at 11:58 -0500, Larry Finger wrote:
 On 08/24/2012 11:23 AM, Eric Dumazet wrote:
  On Fri, 2012-08-24 at 18:18 +0200, Eric Dumazet wrote:
  On Fri, 2012-08-24 at 10:58 -0500, Larry Finger wrote:
  On 08/24/2012 10:19 AM, David Miller wrote:
 
  This looks like full-on data corruption to me.
 
  I agree. The question is why does it happen with r8712u, and only after 
  the
  commit in the subject. Drivers for other devices that I have are OK. Thus 
  far, I
  have tested b43, rtl8187, ath9k_htc, and rtl8192cu. To my knowledge, 
  there are
  no reports posted for this bug with any other device.
 
  bugs can sit unnoticed, and one change somewhere can uncover them.
 
  Really this driver must have a bug, if not half a dozen of bugs.
 
  For example this sequence of code is a clear bug :
 
  sub_skb = dev_alloc_skb(nSubframe_Length + 12);
  skb_reserve(sub_skb, 12);
 
 
  Also the free_recv_skb_queue looks really suspect to me
 
  What the hell is doing recv_tasklet() I really wonder.
 
  This code, combined with the skb_clone() in recvbuf2recvframe()
  can clearly reuse an skb passed to upper stacks.
 
 
  queueing one skb in free_recv_skb_queue should be done
  only if no clone of this skb exist somewhere.
 
  Please someone fix this buggy driver.
 
 
  Try the following patch for a start
 
  diff --git a/drivers/staging/rtl8712/rtl8712_recv.c 
  b/drivers/staging/rtl8712/rtl8712_recv.c
  index 8e82ce2..88e3ca6 100644
  --- a/drivers/staging/rtl8712/rtl8712_recv.c
  +++ b/drivers/staging/rtl8712/rtl8712_recv.c
  @@ -1127,6 +1127,9 @@ static void recv_tasklet(void *priv)
  recvbuf2recvframe(padapter, pskb);
  skb_reset_tail_pointer(pskb);
  pskb-len = 0;
  -   skb_queue_tail(precvpriv-free_recv_skb_queue, pskb);
  +   if (!skb_cloned(pskb))
  +   skb_queue_tail(precvpriv-free_recv_skb_queue, pskb);
  +   else
  +   consume_skb(pskb);
  }
}
 
 This one did not help. There is no doubt it is needed for the case where 
 memory 
 is tight, an allocation fails, and the driver clones the skb. In the present 
 case, debug statements have shown that the skb_clone() call was not made.
 
 In the long term, this driver will be replaced with one that uses mac80211, 
 but 
 in the short term, I am trying to fix it.
 
 As I said earlier, my skb skills are minimal. Could you explain what is wrong 
 with the following sequence?
 
   sub_skb = dev_alloc_skb(nSubframe_Length + 12);
   skb_reserve(sub_skb, 12);

dev_alloc_skb() can return NULL

- crash

skb_clone() can also return NULL 

- crash



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-23 Thread Eric Dumazet
On Thu, 2012-08-23 at 15:57 -0500, Larry Finger wrote:
> On 08/22/2012 11:03 PM, Eric Dumazet wrote:
> >
> > Changing the allocation size removes the problem ? thats really strange.
> >
> > If you try different sizes in the 9100-30720 range, can you pinpoint the
> > failure threshold ?
> 
> The allocation size change did not fix the problem. It turned out that 10 
> tries 
> from a secure web page were not enough to trigger this intermittent problem 
> that 
> particular test.
> 
> Based on DaveM's comment that skb->truesize could be wrong, I tried setting 
> truesize after every netdev_alloc_skb() call. Of course, that had no effect. 
> I 
> then found https://lkml.org/lkml/2010/11/19/505I, which clearly states why 
> this 
> need not be done.
> 
> What skb modifications require that truesize be adjusted? The driver never 
> resets skb->len or skb->data_len for any buffers, other than setting skb->len 
> to 
> zero.

skb->truesize is adjusted when a frag is added to one skb, or when
skb->head is re-allocated.

Are you sure you dont have another problem, because as I said commit
c8628155ece3  had a bug, so a bisect is not very useful.

How many reloads are needed to trigger the bug, do you have a script to
reproduce it ?

Could it be a PMTU problem ? (check
http://git.kernel.org/?p=linux/kernel/git/davem/net.git;a=commit;h=9b04f350057863d1fad1ba071e09362a1da3503e
 )


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-23 Thread Larry Finger

On 08/22/2012 11:03 PM, Eric Dumazet wrote:


Changing the allocation size removes the problem ? thats really strange.

If you try different sizes in the 9100-30720 range, can you pinpoint the
failure threshold ?


The allocation size change did not fix the problem. It turned out that 10 tries 
from a secure web page were not enough to trigger this intermittent problem that 
particular test.


Based on DaveM's comment that skb->truesize could be wrong, I tried setting 
truesize after every netdev_alloc_skb() call. Of course, that had no effect. I 
then found https://lkml.org/lkml/2010/11/19/505I, which clearly states why this 
need not be done.


What skb modifications require that truesize be adjusted? The driver never 
resets skb->len or skb->data_len for any buffers, other than setting skb->len to 
zero.


Thanks,

Larry

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-23 Thread Larry Finger

On 08/22/2012 11:03 PM, Eric Dumazet wrote:


Changing the allocation size removes the problem ? thats really strange.

If you try different sizes in the 9100-30720 range, can you pinpoint the
failure threshold ?


The allocation size change did not fix the problem. It turned out that 10 tries 
from a secure web page were not enough to trigger this intermittent problem that 
particular test.


Based on DaveM's comment that skb-truesize could be wrong, I tried setting 
truesize after every netdev_alloc_skb() call. Of course, that had no effect. I 
then found https://lkml.org/lkml/2010/11/19/505I, which clearly states why this 
need not be done.


What skb modifications require that truesize be adjusted? The driver never 
resets skb-len or skb-data_len for any buffers, other than setting skb-len to 
zero.


Thanks,

Larry

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-23 Thread Eric Dumazet
On Thu, 2012-08-23 at 15:57 -0500, Larry Finger wrote:
 On 08/22/2012 11:03 PM, Eric Dumazet wrote:
 
  Changing the allocation size removes the problem ? thats really strange.
 
  If you try different sizes in the 9100-30720 range, can you pinpoint the
  failure threshold ?
 
 The allocation size change did not fix the problem. It turned out that 10 
 tries 
 from a secure web page were not enough to trigger this intermittent problem 
 that 
 particular test.
 
 Based on DaveM's comment that skb-truesize could be wrong, I tried setting 
 truesize after every netdev_alloc_skb() call. Of course, that had no effect. 
 I 
 then found https://lkml.org/lkml/2010/11/19/505I, which clearly states why 
 this 
 need not be done.
 
 What skb modifications require that truesize be adjusted? The driver never 
 resets skb-len or skb-data_len for any buffers, other than setting skb-len 
 to 
 zero.

skb-truesize is adjusted when a frag is added to one skb, or when
skb-head is re-allocated.

Are you sure you dont have another problem, because as I said commit
c8628155ece3  had a bug, so a bisect is not very useful.

How many reloads are needed to trigger the bug, do you have a script to
reproduce it ?

Could it be a PMTU problem ? (check
http://git.kernel.org/?p=linux/kernel/git/davem/net.git;a=commit;h=9b04f350057863d1fad1ba071e09362a1da3503e
 )


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-22 Thread Eric Dumazet
On Wed, 2012-08-22 at 16:33 -0500, Larry Finger wrote:
> On 08/22/2012 12:15 AM, Eric Dumazet wrote:
> >
> > This particular commit is the start of a patches batch that ended in the
> > generic TCP coalescing mechanism.
> >
> > It is known to have problem on drivers doing skb_clone() in their rx
> > path.
> >
> > Current kernels should be ok, because coalescing doesnt happen if the
> > destination skb is cloned (skb_cloned(to) in skb_try_coalesce())
> 
> The skb_clone() call is not the source of the problem for r8712u, as it is 
> only 
> used when a memory allocation fails, which is not happening. The suggestion 
> did 
> lead to another patch that I had been preparing. The initial allocation of RX 
> buffers used a size of 30720 bytes, while 9100 is sufficient to allow 
> aggregation. Upon reducing the buffer size, the driver now works for me. I am 
> now awaiting tests by the OP on the Red Hat bugzilla before sending the patch 
> upstream.
> 
> So far, no ideas for the problem in connecting to WPA1 networks.
> 

Changing the allocation size removes the problem ? thats really strange.

If you try different sizes in the 9100-30720 range, can you pinpoint the
failure threshold ?



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-22 Thread Larry Finger

On 08/22/2012 12:15 AM, Eric Dumazet wrote:


This particular commit is the start of a patches batch that ended in the
generic TCP coalescing mechanism.

It is known to have problem on drivers doing skb_clone() in their rx
path.

Current kernels should be ok, because coalescing doesnt happen if the
destination skb is cloned (skb_cloned(to) in skb_try_coalesce())


The skb_clone() call is not the source of the problem for r8712u, as it is only 
used when a memory allocation fails, which is not happening. The suggestion did 
lead to another patch that I had been preparing. The initial allocation of RX 
buffers used a size of 30720 bytes, while 9100 is sufficient to allow 
aggregation. Upon reducing the buffer size, the driver now works for me. I am 
now awaiting tests by the OP on the Red Hat bugzilla before sending the patch 
upstream.


So far, no ideas for the problem in connecting to WPA1 networks.

Larry


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-22 Thread Larry Finger

On 08/22/2012 12:15 AM, Eric Dumazet wrote:


This particular commit is the start of a patches batch that ended in the
generic TCP coalescing mechanism.

It is known to have problem on drivers doing skb_clone() in their rx
path.

Current kernels should be ok, because coalescing doesnt happen if the
destination skb is cloned (skb_cloned(to) in skb_try_coalesce())

For 3.4 kernel, I guess I need to backport this skb_cloned(to) check fo
stable 3.4 kernel

But these skb_clone() in various USB drivers should be killed for good,
they really can kill the box because of skb->truesize lies.


Applying your patch to 3.4 does not fix the problem. In addition, all of the 
problems I noted happen with 3.6-rc2 - current kernels are not OK.


Of the drivers I reported, only r8712u in staging/rtl8712/rtl8712_recv.c has a 
direct call to skb_clone(). As its symptoms are different, the cloning is likely 
the source of the problems there. In addition, the problem is intermittent. The 
driver normally allocates a new buffer and copies to it, and the clone only 
occurs when that skb allocation fails. I will try to fix that. My skb skills are 
minimal and I will have to do some studying.


Thanks,

Larry


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-22 Thread Larry Finger

On 08/22/2012 12:15 AM, Eric Dumazet wrote:


This particular commit is the start of a patches batch that ended in the
generic TCP coalescing mechanism.

It is known to have problem on drivers doing skb_clone() in their rx
path.

Current kernels should be ok, because coalescing doesnt happen if the
destination skb is cloned (skb_cloned(to) in skb_try_coalesce())

For 3.4 kernel, I guess I need to backport this skb_cloned(to) check fo
stable 3.4 kernel

But these skb_clone() in various USB drivers should be killed for good,
they really can kill the box because of skb-truesize lies.


Applying your patch to 3.4 does not fix the problem. In addition, all of the 
problems I noted happen with 3.6-rc2 - current kernels are not OK.


Of the drivers I reported, only r8712u in staging/rtl8712/rtl8712_recv.c has a 
direct call to skb_clone(). As its symptoms are different, the cloning is likely 
the source of the problems there. In addition, the problem is intermittent. The 
driver normally allocates a new buffer and copies to it, and the clone only 
occurs when that skb allocation fails. I will try to fix that. My skb skills are 
minimal and I will have to do some studying.


Thanks,

Larry


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-22 Thread Larry Finger

On 08/22/2012 12:15 AM, Eric Dumazet wrote:


This particular commit is the start of a patches batch that ended in the
generic TCP coalescing mechanism.

It is known to have problem on drivers doing skb_clone() in their rx
path.

Current kernels should be ok, because coalescing doesnt happen if the
destination skb is cloned (skb_cloned(to) in skb_try_coalesce())


The skb_clone() call is not the source of the problem for r8712u, as it is only 
used when a memory allocation fails, which is not happening. The suggestion did 
lead to another patch that I had been preparing. The initial allocation of RX 
buffers used a size of 30720 bytes, while 9100 is sufficient to allow 
aggregation. Upon reducing the buffer size, the driver now works for me. I am 
now awaiting tests by the OP on the Red Hat bugzilla before sending the patch 
upstream.


So far, no ideas for the problem in connecting to WPA1 networks.

Larry


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-22 Thread Eric Dumazet
On Wed, 2012-08-22 at 16:33 -0500, Larry Finger wrote:
 On 08/22/2012 12:15 AM, Eric Dumazet wrote:
 
  This particular commit is the start of a patches batch that ended in the
  generic TCP coalescing mechanism.
 
  It is known to have problem on drivers doing skb_clone() in their rx
  path.
 
  Current kernels should be ok, because coalescing doesnt happen if the
  destination skb is cloned (skb_cloned(to) in skb_try_coalesce())
 
 The skb_clone() call is not the source of the problem for r8712u, as it is 
 only 
 used when a memory allocation fails, which is not happening. The suggestion 
 did 
 lead to another patch that I had been preparing. The initial allocation of RX 
 buffers used a size of 30720 bytes, while 9100 is sufficient to allow 
 aggregation. Upon reducing the buffer size, the driver now works for me. I am 
 now awaiting tests by the OP on the Red Hat bugzilla before sending the patch 
 upstream.
 
 So far, no ideas for the problem in connecting to WPA1 networks.
 

Changing the allocation size removes the problem ? thats really strange.

If you try different sizes in the 9100-30720 range, can you pinpoint the
failure threshold ?



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-21 Thread Eric Dumazet
On Tue, 2012-08-21 at 23:07 -0500, Larry Finger wrote:
> Hi,
> 
> The commit entitled "tcp: reduce out_of_order memory use" turns out to cause 
> problems with a number of USB drivers.
> 
> The first one called to my attention was for staging/r8712u. For this driver, 
> there are problems with SSL communications as reported in 
> https://bugzilla.redhat.com/show_bug.cgi?id=847525.
> 
> Other drivers including rtl8187, rtl8192cu and rt2800usb cannot connect to a 
> WPA1- or WEP-encrypted AP, but work fine with WPA2. The rtl8192cu problem is 
> reported in https://bugzilla.kernel.org/show_bug.cgi?id=46171.
> 
> I find it hard to understand why this patch should cause these effects; 
> however, 
> I have verified that a kernel generated from "git checkout e86b2919" works 
> fine. 
> This commit is immediately before the patch in question.
> 
> Note, this patch was applied during the merge between 3.3 and 3.4. The 
> regression has been undiscovered for some time.
> 
> Any help on this problem will be appreciated.
> 
> Larry

This particular commit is the start of a patches batch that ended in the
generic TCP coalescing mechanism.

It is known to have problem on drivers doing skb_clone() in their rx
path.

Current kernels should be ok, because coalescing doesnt happen if the
destination skb is cloned (skb_cloned(to) in skb_try_coalesce())

For 3.4 kernel, I guess I need to backport this skb_cloned(to) check fo
stable 3.4 kernel

But these skb_clone() in various USB drivers should be killed for good,
they really can kill the box because of skb->truesize lies.

Please test following patch (for 3.4 kernels)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 257b617..c45ac2d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4496,7 +4496,9 @@ static void tcp_data_queue_ofo(struct sock *sk, struct 
sk_buff *skb)
 * to avoid future tcp_collapse_ofo_queue(),
 * probably the most expensive function in tcp stack.
 */
-   if (skb->len <= skb_tailroom(skb1) && !tcp_hdr(skb)->fin) {
+   if (skb->len <= skb_tailroom(skb1) &&
+   !skb_cloned(skb1) &&
+   !tcp_hdr(skb)->fin) {
NET_INC_STATS_BH(sock_net(sk),
 LINUX_MIB_TCPRCVCOALESCE);
BUG_ON(skb_copy_bits(skb, 0,


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-21 Thread David Miller
From: Larry Finger 
Date: Tue, 21 Aug 2012 23:07:46 -0500

> I find it hard to understand why this patch should cause these
> effects; however, I have verified that a kernel generated from "git
> checkout e86b2919" works fine. This commit is immediately before the
> patch in question.

I can guarentee it's improper skb->truesize setting in these drivers,
or use of too huge buffer sizes compared to the amount of data
contained within the packet.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-21 Thread Larry Finger

Hi,

The commit entitled "tcp: reduce out_of_order memory use" turns out to cause 
problems with a number of USB drivers.


The first one called to my attention was for staging/r8712u. For this driver, 
there are problems with SSL communications as reported in 
https://bugzilla.redhat.com/show_bug.cgi?id=847525.


Other drivers including rtl8187, rtl8192cu and rt2800usb cannot connect to a 
WPA1- or WEP-encrypted AP, but work fine with WPA2. The rtl8192cu problem is 
reported in https://bugzilla.kernel.org/show_bug.cgi?id=46171.


I find it hard to understand why this patch should cause these effects; however, 
I have verified that a kernel generated from "git checkout e86b2919" works fine. 
This commit is immediately before the patch in question.


Note, this patch was applied during the merge between 3.3 and 3.4. The 
regression has been undiscovered for some time.


Any help on this problem will be appreciated.

Larry
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-21 Thread Larry Finger

Hi,

The commit entitled tcp: reduce out_of_order memory use turns out to cause 
problems with a number of USB drivers.


The first one called to my attention was for staging/r8712u. For this driver, 
there are problems with SSL communications as reported in 
https://bugzilla.redhat.com/show_bug.cgi?id=847525.


Other drivers including rtl8187, rtl8192cu and rt2800usb cannot connect to a 
WPA1- or WEP-encrypted AP, but work fine with WPA2. The rtl8192cu problem is 
reported in https://bugzilla.kernel.org/show_bug.cgi?id=46171.


I find it hard to understand why this patch should cause these effects; however, 
I have verified that a kernel generated from git checkout e86b2919 works fine. 
This commit is immediately before the patch in question.


Note, this patch was applied during the merge between 3.3 and 3.4. The 
regression has been undiscovered for some time.


Any help on this problem will be appreciated.

Larry
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-21 Thread David Miller
From: Larry Finger larry.fin...@lwfinger.net
Date: Tue, 21 Aug 2012 23:07:46 -0500

 I find it hard to understand why this patch should cause these
 effects; however, I have verified that a kernel generated from git
 checkout e86b2919 works fine. This commit is immediately before the
 patch in question.

I can guarentee it's improper skb-truesize setting in these drivers,
or use of too huge buffer sizes compared to the amount of data
contained within the packet.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use

2012-08-21 Thread Eric Dumazet
On Tue, 2012-08-21 at 23:07 -0500, Larry Finger wrote:
 Hi,
 
 The commit entitled tcp: reduce out_of_order memory use turns out to cause 
 problems with a number of USB drivers.
 
 The first one called to my attention was for staging/r8712u. For this driver, 
 there are problems with SSL communications as reported in 
 https://bugzilla.redhat.com/show_bug.cgi?id=847525.
 
 Other drivers including rtl8187, rtl8192cu and rt2800usb cannot connect to a 
 WPA1- or WEP-encrypted AP, but work fine with WPA2. The rtl8192cu problem is 
 reported in https://bugzilla.kernel.org/show_bug.cgi?id=46171.
 
 I find it hard to understand why this patch should cause these effects; 
 however, 
 I have verified that a kernel generated from git checkout e86b2919 works 
 fine. 
 This commit is immediately before the patch in question.
 
 Note, this patch was applied during the merge between 3.3 and 3.4. The 
 regression has been undiscovered for some time.
 
 Any help on this problem will be appreciated.
 
 Larry

This particular commit is the start of a patches batch that ended in the
generic TCP coalescing mechanism.

It is known to have problem on drivers doing skb_clone() in their rx
path.

Current kernels should be ok, because coalescing doesnt happen if the
destination skb is cloned (skb_cloned(to) in skb_try_coalesce())

For 3.4 kernel, I guess I need to backport this skb_cloned(to) check fo
stable 3.4 kernel

But these skb_clone() in various USB drivers should be killed for good,
they really can kill the box because of skb-truesize lies.

Please test following patch (for 3.4 kernels)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 257b617..c45ac2d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4496,7 +4496,9 @@ static void tcp_data_queue_ofo(struct sock *sk, struct 
sk_buff *skb)
 * to avoid future tcp_collapse_ofo_queue(),
 * probably the most expensive function in tcp stack.
 */
-   if (skb-len = skb_tailroom(skb1)  !tcp_hdr(skb)-fin) {
+   if (skb-len = skb_tailroom(skb1) 
+   !skb_cloned(skb1) 
+   !tcp_hdr(skb)-fin) {
NET_INC_STATS_BH(sock_net(sk),
 LINUX_MIB_TCPRCVCOALESCE);
BUG_ON(skb_copy_bits(skb, 0,


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/