Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"
On 08/27/2012 01:21 PM, Eric Dumazet wrote: On Mon, 2012-08-27 at 12:55 -0500, Larry Finger wrote: I have prepared a patch to fix all the unchecked allocations. Over the weekend I made some progress. To test the latest vendor driver, I installed a 32-bit system. Their driver is not compatible with a 64-bit system. I found that not only did the vendor driver work with secure sites, but so did the in-kernel version. I now have tcpdump output for the 32-bit case that works, and the 64-bit case that fails. It seems likely that I missed some 32/64 bit incompatibility when I did the conversion. Thanks for all your help in trying to resolve this issue. Interesting, so these 32/64 bits problems did not happen in prior kernels ? (< 3.4 ) Prior kernels are OK. Larry -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"
On Mon, 2012-08-27 at 12:55 -0500, Larry Finger wrote: > I have prepared a patch to fix all the unchecked allocations. > > Over the weekend I made some progress. To test the latest vendor driver, I > installed a 32-bit system. Their driver is not compatible with a 64-bit > system. > I found that not only did the vendor driver work with secure sites, but so > did > the in-kernel version. I now have tcpdump output for the 32-bit case that > works, > and the 64-bit case that fails. It seems likely that I missed some 32/64 bit > incompatibility when I did the conversion. > > Thanks for all your help in trying to resolve this issue. Interesting, so these 32/64 bits problems did not happen in prior kernels ? (< 3.4 ) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"
On 08/24/2012 12:47 PM, Eric Dumazet wrote: dev_alloc_skb() can return NULL -> crash skb_clone() can also return NULL -> crash I have prepared a patch to fix all the unchecked allocations. Over the weekend I made some progress. To test the latest vendor driver, I installed a 32-bit system. Their driver is not compatible with a 64-bit system. I found that not only did the vendor driver work with secure sites, but so did the in-kernel version. I now have tcpdump output for the 32-bit case that works, and the 64-bit case that fails. It seems likely that I missed some 32/64 bit incompatibility when I did the conversion. Thanks for all your help in trying to resolve this issue. Larry -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use
On 08/24/2012 12:47 PM, Eric Dumazet wrote: dev_alloc_skb() can return NULL - crash skb_clone() can also return NULL - crash I have prepared a patch to fix all the unchecked allocations. Over the weekend I made some progress. To test the latest vendor driver, I installed a 32-bit system. Their driver is not compatible with a 64-bit system. I found that not only did the vendor driver work with secure sites, but so did the in-kernel version. I now have tcpdump output for the 32-bit case that works, and the 64-bit case that fails. It seems likely that I missed some 32/64 bit incompatibility when I did the conversion. Thanks for all your help in trying to resolve this issue. Larry -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use
On Mon, 2012-08-27 at 12:55 -0500, Larry Finger wrote: I have prepared a patch to fix all the unchecked allocations. Over the weekend I made some progress. To test the latest vendor driver, I installed a 32-bit system. Their driver is not compatible with a 64-bit system. I found that not only did the vendor driver work with secure sites, but so did the in-kernel version. I now have tcpdump output for the 32-bit case that works, and the 64-bit case that fails. It seems likely that I missed some 32/64 bit incompatibility when I did the conversion. Thanks for all your help in trying to resolve this issue. Interesting, so these 32/64 bits problems did not happen in prior kernels ? ( 3.4 ) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use
On 08/27/2012 01:21 PM, Eric Dumazet wrote: On Mon, 2012-08-27 at 12:55 -0500, Larry Finger wrote: I have prepared a patch to fix all the unchecked allocations. Over the weekend I made some progress. To test the latest vendor driver, I installed a 32-bit system. Their driver is not compatible with a 64-bit system. I found that not only did the vendor driver work with secure sites, but so did the in-kernel version. I now have tcpdump output for the 32-bit case that works, and the 64-bit case that fails. It seems likely that I missed some 32/64 bit incompatibility when I did the conversion. Thanks for all your help in trying to resolve this issue. Interesting, so these 32/64 bits problems did not happen in prior kernels ? ( 3.4 ) Prior kernels are OK. Larry -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"
On Fri, 2012-08-24 at 11:58 -0500, Larry Finger wrote: > On 08/24/2012 11:23 AM, Eric Dumazet wrote: > > On Fri, 2012-08-24 at 18:18 +0200, Eric Dumazet wrote: > >> On Fri, 2012-08-24 at 10:58 -0500, Larry Finger wrote: > >>> On 08/24/2012 10:19 AM, David Miller wrote: > > This looks like full-on data corruption to me. > >>> > >>> I agree. The question is why does it happen with r8712u, and only after > >>> the > >>> commit in the subject. Drivers for other devices that I have are OK. Thus > >>> far, I > >>> have tested b43, rtl8187, ath9k_htc, and rtl8192cu. To my knowledge, > >>> there are > >>> no reports posted for this bug with any other device. > >> > >> bugs can sit unnoticed, and one change somewhere can uncover them. > >> > >> Really this driver must have a bug, if not half a dozen of bugs. > >> > >> For example this sequence of code is a clear bug : > >> > >> sub_skb = dev_alloc_skb(nSubframe_Length + 12); > >> skb_reserve(sub_skb, 12); > >> > >> > >> Also the free_recv_skb_queue looks really suspect to me > >> > >> What the hell is doing recv_tasklet() I really wonder. > >> > >> This code, combined with the skb_clone() in recvbuf2recvframe() > >> can clearly reuse an skb passed to upper stacks. > >> > >> > >> queueing one skb in free_recv_skb_queue should be done > >> only if no clone of this skb exist somewhere. > >> > >> Please someone fix this buggy driver. > >> > > > > Try the following patch for a start > > > > diff --git a/drivers/staging/rtl8712/rtl8712_recv.c > > b/drivers/staging/rtl8712/rtl8712_recv.c > > index 8e82ce2..88e3ca6 100644 > > --- a/drivers/staging/rtl8712/rtl8712_recv.c > > +++ b/drivers/staging/rtl8712/rtl8712_recv.c > > @@ -1127,6 +1127,9 @@ static void recv_tasklet(void *priv) > > recvbuf2recvframe(padapter, pskb); > > skb_reset_tail_pointer(pskb); > > pskb->len = 0; > > - skb_queue_tail(>free_recv_skb_queue, pskb); > > + if (!skb_cloned(pskb)) > > + skb_queue_tail(>free_recv_skb_queue, pskb); > > + else > > + consume_skb(pskb); > > } > > } > > This one did not help. There is no doubt it is needed for the case where > memory > is tight, an allocation fails, and the driver clones the skb. In the present > case, debug statements have shown that the skb_clone() call was not made. > > In the long term, this driver will be replaced with one that uses mac80211, > but > in the short term, I am trying to fix it. > > As I said earlier, my skb skills are minimal. Could you explain what is wrong > with the following sequence? > > sub_skb = dev_alloc_skb(nSubframe_Length + 12); > skb_reserve(sub_skb, 12); dev_alloc_skb() can return NULL -> crash skb_clone() can also return NULL -> crash -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"
On 08/24/2012 11:23 AM, Eric Dumazet wrote: On Fri, 2012-08-24 at 18:18 +0200, Eric Dumazet wrote: On Fri, 2012-08-24 at 10:58 -0500, Larry Finger wrote: On 08/24/2012 10:19 AM, David Miller wrote: This looks like full-on data corruption to me. I agree. The question is why does it happen with r8712u, and only after the commit in the subject. Drivers for other devices that I have are OK. Thus far, I have tested b43, rtl8187, ath9k_htc, and rtl8192cu. To my knowledge, there are no reports posted for this bug with any other device. bugs can sit unnoticed, and one change somewhere can uncover them. Really this driver must have a bug, if not half a dozen of bugs. For example this sequence of code is a clear bug : sub_skb = dev_alloc_skb(nSubframe_Length + 12); skb_reserve(sub_skb, 12); Also the free_recv_skb_queue looks really suspect to me What the hell is doing recv_tasklet() I really wonder. This code, combined with the skb_clone() in recvbuf2recvframe() can clearly reuse an skb passed to upper stacks. queueing one skb in free_recv_skb_queue should be done only if no clone of this skb exist somewhere. Please someone fix this buggy driver. Try the following patch for a start diff --git a/drivers/staging/rtl8712/rtl8712_recv.c b/drivers/staging/rtl8712/rtl8712_recv.c index 8e82ce2..88e3ca6 100644 --- a/drivers/staging/rtl8712/rtl8712_recv.c +++ b/drivers/staging/rtl8712/rtl8712_recv.c @@ -1127,6 +1127,9 @@ static void recv_tasklet(void *priv) recvbuf2recvframe(padapter, pskb); skb_reset_tail_pointer(pskb); pskb->len = 0; - skb_queue_tail(>free_recv_skb_queue, pskb); + if (!skb_cloned(pskb)) + skb_queue_tail(>free_recv_skb_queue, pskb); + else + consume_skb(pskb); } } This one did not help. There is no doubt it is needed for the case where memory is tight, an allocation fails, and the driver clones the skb. In the present case, debug statements have shown that the skb_clone() call was not made. In the long term, this driver will be replaced with one that uses mac80211, but in the short term, I am trying to fix it. As I said earlier, my skb skills are minimal. Could you explain what is wrong with the following sequence? sub_skb = dev_alloc_skb(nSubframe_Length + 12); skb_reserve(sub_skb, 12); Thanks, Larry -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"
On 08/24/2012 11:01 AM, Eric Dumazet wrote: On Fri, 2012-08-24 at 10:49 -0500, Larry Finger wrote: There is nothing in kernel log when it happens. The file STRACE is attached. So there is indeed a corruption. What was the driver you used in this case ? The only driver that shows this problem is r8712u. One thought that I just had is that this driver does not use mac80211, which is why it is in staging. Instead it uses a Realtek soft MAC layer derived from the obsolete softmac driver that was in the kernel in the 2.6.20 time frame. That means that none of the skbs have ever been cloned by any part of the wireless code. Larry -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"
On Fri, 2012-08-24 at 18:18 +0200, Eric Dumazet wrote: > On Fri, 2012-08-24 at 10:58 -0500, Larry Finger wrote: > > On 08/24/2012 10:19 AM, David Miller wrote: > > > > > > This looks like full-on data corruption to me. > > > > I agree. The question is why does it happen with r8712u, and only after the > > commit in the subject. Drivers for other devices that I have are OK. Thus > > far, I > > have tested b43, rtl8187, ath9k_htc, and rtl8192cu. To my knowledge, there > > are > > no reports posted for this bug with any other device. > > bugs can sit unnoticed, and one change somewhere can uncover them. > > Really this driver must have a bug, if not half a dozen of bugs. > > For example this sequence of code is a clear bug : > > sub_skb = dev_alloc_skb(nSubframe_Length + 12); > skb_reserve(sub_skb, 12); > > > Also the free_recv_skb_queue looks really suspect to me > > What the hell is doing recv_tasklet() I really wonder. > > This code, combined with the skb_clone() in recvbuf2recvframe() > can clearly reuse an skb passed to upper stacks. > > > queueing one skb in free_recv_skb_queue should be done > only if no clone of this skb exist somewhere. > > Please someone fix this buggy driver. > Try the following patch for a start diff --git a/drivers/staging/rtl8712/rtl8712_recv.c b/drivers/staging/rtl8712/rtl8712_recv.c index 8e82ce2..88e3ca6 100644 --- a/drivers/staging/rtl8712/rtl8712_recv.c +++ b/drivers/staging/rtl8712/rtl8712_recv.c @@ -1127,6 +1127,9 @@ static void recv_tasklet(void *priv) recvbuf2recvframe(padapter, pskb); skb_reset_tail_pointer(pskb); pskb->len = 0; - skb_queue_tail(>free_recv_skb_queue, pskb); + if (!skb_cloned(pskb)) + skb_queue_tail(>free_recv_skb_queue, pskb); + else + consume_skb(pskb); } } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"
On Fri, 2012-08-24 at 10:58 -0500, Larry Finger wrote: > On 08/24/2012 10:19 AM, David Miller wrote: > > > > This looks like full-on data corruption to me. > > I agree. The question is why does it happen with r8712u, and only after the > commit in the subject. Drivers for other devices that I have are OK. Thus > far, I > have tested b43, rtl8187, ath9k_htc, and rtl8192cu. To my knowledge, there > are > no reports posted for this bug with any other device. bugs can sit unnoticed, and one change somewhere can uncover them. Really this driver must have a bug, if not half a dozen of bugs. For example this sequence of code is a clear bug : sub_skb = dev_alloc_skb(nSubframe_Length + 12); skb_reserve(sub_skb, 12); Also the free_recv_skb_queue looks really suspect to me What the hell is doing recv_tasklet() I really wonder. This code, combined with the skb_clone() in recvbuf2recvframe() can clearly reuse an skb passed to upper stacks. queueing one skb in free_recv_skb_queue should be done only if no clone of this skb exist somewhere. Please someone fix this buggy driver. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"
On Fri, 2012-08-24 at 10:49 -0500, Larry Finger wrote: > There is nothing in kernel log when it happens. The file STRACE is attached. > So there is indeed a corruption. What was the driver you used in this case ? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"
On 08/24/2012 10:19 AM, David Miller wrote: This looks like full-on data corruption to me. I agree. The question is why does it happen with r8712u, and only after the commit in the subject. Drivers for other devices that I have are OK. Thus far, I have tested b43, rtl8187, ath9k_htc, and rtl8192cu. To my knowledge, there are no reports posted for this bug with any other device. Larry -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"
From: Larry Finger Date: Fri, 24 Aug 2012 09:09:57 -0500 > I usually get failures on the first or second try. A kernel built from > "git checkout c862815" gets the following wget output: > > > --2012-08-23 21:50:32-- > --https://bugzilla.redhat.com/show_bug.cgi?id=847525 > Resolving bugzilla.redhat.com (bugzilla.redhat.com)... 209.132.183.69 > Connecting to bugzilla.redhat.com > (bugzilla.redhat.com)|209.132.183.69|:443... connected. > OpenSSL: error:0D0680A8:asn1 encoding routines:ASN1_CHECK_TLEN:wrong > tag > OpenSSL: error:0D07803A:asn1 encoding routines:ASN1_ITEM_EX_D2I:nested > asn1 error > OpenSSL: error:0D08303A:asn1 encoding > routines:ASN1_TEMPLATE_NOEXP_D2I:nested asn1 error > OpenSSL: error:0D08303A:asn1 encoding > routines:ASN1_TEMPLATE_NOEXP_D2I:nested asn1 error > OpenSSL: error:1409000D:SSL routines:SSL3_GET_SERVER_CERTIFICATE:ASN1 > lib > Unable to establish SSL connection. > This looks like full-on data corruption to me. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"
Le vendredi 24 août 2012 à 09:09 -0500, Larry Finger a écrit : > With kernel 3.6-rc2, the error changes to the following: > > > --2012-08-24 08:26:42-- https://bugzilla.redhat.com/show_bug.cgi?id=847525 > Resolving bugzilla.redhat.com (bugzilla.redhat.com)... 209.132.183.69 > Connecting to bugzilla.redhat.com > (bugzilla.redhat.com)|209.132.183.69|:443... > connected. > HTTP request sent, awaiting response... 200 OK > Length: unspecified [text/html] > Saving to: `show_bug.cgi?id=847525.3' > > [ <=> ] 0 --.-K/s in 0s > > 2012-08-24 08:26:45 (0.00 B/s) - Read error at byte 0 (error:1408F119:SSL > routines:SSL3_GET_RECORD:decryption failed or bad record mac).Retrying. > > > Anything else to try? When this happens, nothing in kernel log ? Could you do strace -o STRACE -s 8000 wget https://bugzilla.redhat.com/show_bug.cgi?id=847525 and send the STRACE file when you get a failure ? Thanks -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"
On 08/23/2012 04:26 PM, Eric Dumazet wrote: skb->truesize is adjusted when a frag is added to one skb, or when skb->head is re-allocated. Are you sure you dont have another problem, because as I said commit c8628155ece3 had a bug, so a bisect is not very useful. How many reloads are needed to trigger the bug, do you have a script to reproduce it ? I was doing the tests with a browser; however, it seems that a wget of that same page is much more effective at triggering the problem. Now I have a script that tries 60 passes of wget https://bugzilla.redhat.com/show_bug.cgi?id=847525 I usually get failures on the first or second try. A kernel built from "git checkout c862815" gets the following wget output: --2012-08-23 21:50:32-- https://bugzilla.redhat.com/show_bug.cgi?id=847525 Resolving bugzilla.redhat.com (bugzilla.redhat.com)... 209.132.183.69 Connecting to bugzilla.redhat.com (bugzilla.redhat.com)|209.132.183.69|:443... connected. OpenSSL: error:0D0680A8:asn1 encoding routines:ASN1_CHECK_TLEN:wrong tag OpenSSL: error:0D07803A:asn1 encoding routines:ASN1_ITEM_EX_D2I:nested asn1 error OpenSSL: error:0D08303A:asn1 encoding routines:ASN1_TEMPLATE_NOEXP_D2I:nested asn1 error OpenSSL: error:0D08303A:asn1 encoding routines:ASN1_TEMPLATE_NOEXP_D2I:nested asn1 error OpenSSL: error:1409000D:SSL routines:SSL3_GET_SERVER_CERTIFICATE:ASN1 lib Unable to establish SSL connection. With kernel 3.6-rc2, the error changes to the following: --2012-08-24 08:26:42-- https://bugzilla.redhat.com/show_bug.cgi?id=847525 Resolving bugzilla.redhat.com (bugzilla.redhat.com)... 209.132.183.69 Connecting to bugzilla.redhat.com (bugzilla.redhat.com)|209.132.183.69|:443... connected. HTTP request sent, awaiting response... 200 OK Length: unspecified [text/html] Saving to: `show_bug.cgi?id=847525.3' [ <=> ] 0 --.-K/s in 0s 2012-08-24 08:26:45 (0.00 B/s) - Read error at byte 0 (error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record mac).Retrying. Could it be a PMTU problem ? (check http://git.kernel.org/?p=linux/kernel/git/davem/net.git;a=commit;h=9b04f350057863d1fad1ba071e09362a1da3503e ) I added the PMTU patch to kernel 3.6-rc2 - nothing changed. Anything else to try? Thanks, Larry -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use
On 08/23/2012 04:26 PM, Eric Dumazet wrote: skb-truesize is adjusted when a frag is added to one skb, or when skb-head is re-allocated. Are you sure you dont have another problem, because as I said commit c8628155ece3 had a bug, so a bisect is not very useful. How many reloads are needed to trigger the bug, do you have a script to reproduce it ? I was doing the tests with a browser; however, it seems that a wget of that same page is much more effective at triggering the problem. Now I have a script that tries 60 passes of wget https://bugzilla.redhat.com/show_bug.cgi?id=847525 I usually get failures on the first or second try. A kernel built from git checkout c862815 gets the following wget output: --2012-08-23 21:50:32-- https://bugzilla.redhat.com/show_bug.cgi?id=847525 Resolving bugzilla.redhat.com (bugzilla.redhat.com)... 209.132.183.69 Connecting to bugzilla.redhat.com (bugzilla.redhat.com)|209.132.183.69|:443... connected. OpenSSL: error:0D0680A8:asn1 encoding routines:ASN1_CHECK_TLEN:wrong tag OpenSSL: error:0D07803A:asn1 encoding routines:ASN1_ITEM_EX_D2I:nested asn1 error OpenSSL: error:0D08303A:asn1 encoding routines:ASN1_TEMPLATE_NOEXP_D2I:nested asn1 error OpenSSL: error:0D08303A:asn1 encoding routines:ASN1_TEMPLATE_NOEXP_D2I:nested asn1 error OpenSSL: error:1409000D:SSL routines:SSL3_GET_SERVER_CERTIFICATE:ASN1 lib Unable to establish SSL connection. With kernel 3.6-rc2, the error changes to the following: --2012-08-24 08:26:42-- https://bugzilla.redhat.com/show_bug.cgi?id=847525 Resolving bugzilla.redhat.com (bugzilla.redhat.com)... 209.132.183.69 Connecting to bugzilla.redhat.com (bugzilla.redhat.com)|209.132.183.69|:443... connected. HTTP request sent, awaiting response... 200 OK Length: unspecified [text/html] Saving to: `show_bug.cgi?id=847525.3' [ = ] 0 --.-K/s in 0s 2012-08-24 08:26:45 (0.00 B/s) - Read error at byte 0 (error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record mac).Retrying. Could it be a PMTU problem ? (check http://git.kernel.org/?p=linux/kernel/git/davem/net.git;a=commit;h=9b04f350057863d1fad1ba071e09362a1da3503e ) I added the PMTU patch to kernel 3.6-rc2 - nothing changed. Anything else to try? Thanks, Larry -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use
Le vendredi 24 août 2012 à 09:09 -0500, Larry Finger a écrit : With kernel 3.6-rc2, the error changes to the following: --2012-08-24 08:26:42-- https://bugzilla.redhat.com/show_bug.cgi?id=847525 Resolving bugzilla.redhat.com (bugzilla.redhat.com)... 209.132.183.69 Connecting to bugzilla.redhat.com (bugzilla.redhat.com)|209.132.183.69|:443... connected. HTTP request sent, awaiting response... 200 OK Length: unspecified [text/html] Saving to: `show_bug.cgi?id=847525.3' [ = ] 0 --.-K/s in 0s 2012-08-24 08:26:45 (0.00 B/s) - Read error at byte 0 (error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record mac).Retrying. Anything else to try? When this happens, nothing in kernel log ? Could you do strace -o STRACE -s 8000 wget https://bugzilla.redhat.com/show_bug.cgi?id=847525 and send the STRACE file when you get a failure ? Thanks -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use
From: Larry Finger larry.fin...@lwfinger.net Date: Fri, 24 Aug 2012 09:09:57 -0500 I usually get failures on the first or second try. A kernel built from git checkout c862815 gets the following wget output: --2012-08-23 21:50:32-- --https://bugzilla.redhat.com/show_bug.cgi?id=847525 Resolving bugzilla.redhat.com (bugzilla.redhat.com)... 209.132.183.69 Connecting to bugzilla.redhat.com (bugzilla.redhat.com)|209.132.183.69|:443... connected. OpenSSL: error:0D0680A8:asn1 encoding routines:ASN1_CHECK_TLEN:wrong tag OpenSSL: error:0D07803A:asn1 encoding routines:ASN1_ITEM_EX_D2I:nested asn1 error OpenSSL: error:0D08303A:asn1 encoding routines:ASN1_TEMPLATE_NOEXP_D2I:nested asn1 error OpenSSL: error:0D08303A:asn1 encoding routines:ASN1_TEMPLATE_NOEXP_D2I:nested asn1 error OpenSSL: error:1409000D:SSL routines:SSL3_GET_SERVER_CERTIFICATE:ASN1 lib Unable to establish SSL connection. This looks like full-on data corruption to me. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use
On 08/24/2012 10:19 AM, David Miller wrote: This looks like full-on data corruption to me. I agree. The question is why does it happen with r8712u, and only after the commit in the subject. Drivers for other devices that I have are OK. Thus far, I have tested b43, rtl8187, ath9k_htc, and rtl8192cu. To my knowledge, there are no reports posted for this bug with any other device. Larry -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use
On Fri, 2012-08-24 at 10:49 -0500, Larry Finger wrote: There is nothing in kernel log when it happens. The file STRACE is attached. So there is indeed a corruption. What was the driver you used in this case ? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use
On Fri, 2012-08-24 at 10:58 -0500, Larry Finger wrote: On 08/24/2012 10:19 AM, David Miller wrote: This looks like full-on data corruption to me. I agree. The question is why does it happen with r8712u, and only after the commit in the subject. Drivers for other devices that I have are OK. Thus far, I have tested b43, rtl8187, ath9k_htc, and rtl8192cu. To my knowledge, there are no reports posted for this bug with any other device. bugs can sit unnoticed, and one change somewhere can uncover them. Really this driver must have a bug, if not half a dozen of bugs. For example this sequence of code is a clear bug : sub_skb = dev_alloc_skb(nSubframe_Length + 12); skb_reserve(sub_skb, 12); Also the free_recv_skb_queue looks really suspect to me What the hell is doing recv_tasklet() I really wonder. This code, combined with the skb_clone() in recvbuf2recvframe() can clearly reuse an skb passed to upper stacks. queueing one skb in free_recv_skb_queue should be done only if no clone of this skb exist somewhere. Please someone fix this buggy driver. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use
On Fri, 2012-08-24 at 18:18 +0200, Eric Dumazet wrote: On Fri, 2012-08-24 at 10:58 -0500, Larry Finger wrote: On 08/24/2012 10:19 AM, David Miller wrote: This looks like full-on data corruption to me. I agree. The question is why does it happen with r8712u, and only after the commit in the subject. Drivers for other devices that I have are OK. Thus far, I have tested b43, rtl8187, ath9k_htc, and rtl8192cu. To my knowledge, there are no reports posted for this bug with any other device. bugs can sit unnoticed, and one change somewhere can uncover them. Really this driver must have a bug, if not half a dozen of bugs. For example this sequence of code is a clear bug : sub_skb = dev_alloc_skb(nSubframe_Length + 12); skb_reserve(sub_skb, 12); Also the free_recv_skb_queue looks really suspect to me What the hell is doing recv_tasklet() I really wonder. This code, combined with the skb_clone() in recvbuf2recvframe() can clearly reuse an skb passed to upper stacks. queueing one skb in free_recv_skb_queue should be done only if no clone of this skb exist somewhere. Please someone fix this buggy driver. Try the following patch for a start diff --git a/drivers/staging/rtl8712/rtl8712_recv.c b/drivers/staging/rtl8712/rtl8712_recv.c index 8e82ce2..88e3ca6 100644 --- a/drivers/staging/rtl8712/rtl8712_recv.c +++ b/drivers/staging/rtl8712/rtl8712_recv.c @@ -1127,6 +1127,9 @@ static void recv_tasklet(void *priv) recvbuf2recvframe(padapter, pskb); skb_reset_tail_pointer(pskb); pskb-len = 0; - skb_queue_tail(precvpriv-free_recv_skb_queue, pskb); + if (!skb_cloned(pskb)) + skb_queue_tail(precvpriv-free_recv_skb_queue, pskb); + else + consume_skb(pskb); } } -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use
On 08/24/2012 11:01 AM, Eric Dumazet wrote: On Fri, 2012-08-24 at 10:49 -0500, Larry Finger wrote: There is nothing in kernel log when it happens. The file STRACE is attached. So there is indeed a corruption. What was the driver you used in this case ? The only driver that shows this problem is r8712u. One thought that I just had is that this driver does not use mac80211, which is why it is in staging. Instead it uses a Realtek soft MAC layer derived from the obsolete softmac driver that was in the kernel in the 2.6.20 time frame. That means that none of the skbs have ever been cloned by any part of the wireless code. Larry -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use
On 08/24/2012 11:23 AM, Eric Dumazet wrote: On Fri, 2012-08-24 at 18:18 +0200, Eric Dumazet wrote: On Fri, 2012-08-24 at 10:58 -0500, Larry Finger wrote: On 08/24/2012 10:19 AM, David Miller wrote: This looks like full-on data corruption to me. I agree. The question is why does it happen with r8712u, and only after the commit in the subject. Drivers for other devices that I have are OK. Thus far, I have tested b43, rtl8187, ath9k_htc, and rtl8192cu. To my knowledge, there are no reports posted for this bug with any other device. bugs can sit unnoticed, and one change somewhere can uncover them. Really this driver must have a bug, if not half a dozen of bugs. For example this sequence of code is a clear bug : sub_skb = dev_alloc_skb(nSubframe_Length + 12); skb_reserve(sub_skb, 12); Also the free_recv_skb_queue looks really suspect to me What the hell is doing recv_tasklet() I really wonder. This code, combined with the skb_clone() in recvbuf2recvframe() can clearly reuse an skb passed to upper stacks. queueing one skb in free_recv_skb_queue should be done only if no clone of this skb exist somewhere. Please someone fix this buggy driver. Try the following patch for a start diff --git a/drivers/staging/rtl8712/rtl8712_recv.c b/drivers/staging/rtl8712/rtl8712_recv.c index 8e82ce2..88e3ca6 100644 --- a/drivers/staging/rtl8712/rtl8712_recv.c +++ b/drivers/staging/rtl8712/rtl8712_recv.c @@ -1127,6 +1127,9 @@ static void recv_tasklet(void *priv) recvbuf2recvframe(padapter, pskb); skb_reset_tail_pointer(pskb); pskb-len = 0; - skb_queue_tail(precvpriv-free_recv_skb_queue, pskb); + if (!skb_cloned(pskb)) + skb_queue_tail(precvpriv-free_recv_skb_queue, pskb); + else + consume_skb(pskb); } } This one did not help. There is no doubt it is needed for the case where memory is tight, an allocation fails, and the driver clones the skb. In the present case, debug statements have shown that the skb_clone() call was not made. In the long term, this driver will be replaced with one that uses mac80211, but in the short term, I am trying to fix it. As I said earlier, my skb skills are minimal. Could you explain what is wrong with the following sequence? sub_skb = dev_alloc_skb(nSubframe_Length + 12); skb_reserve(sub_skb, 12); Thanks, Larry -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use
On Fri, 2012-08-24 at 11:58 -0500, Larry Finger wrote: On 08/24/2012 11:23 AM, Eric Dumazet wrote: On Fri, 2012-08-24 at 18:18 +0200, Eric Dumazet wrote: On Fri, 2012-08-24 at 10:58 -0500, Larry Finger wrote: On 08/24/2012 10:19 AM, David Miller wrote: This looks like full-on data corruption to me. I agree. The question is why does it happen with r8712u, and only after the commit in the subject. Drivers for other devices that I have are OK. Thus far, I have tested b43, rtl8187, ath9k_htc, and rtl8192cu. To my knowledge, there are no reports posted for this bug with any other device. bugs can sit unnoticed, and one change somewhere can uncover them. Really this driver must have a bug, if not half a dozen of bugs. For example this sequence of code is a clear bug : sub_skb = dev_alloc_skb(nSubframe_Length + 12); skb_reserve(sub_skb, 12); Also the free_recv_skb_queue looks really suspect to me What the hell is doing recv_tasklet() I really wonder. This code, combined with the skb_clone() in recvbuf2recvframe() can clearly reuse an skb passed to upper stacks. queueing one skb in free_recv_skb_queue should be done only if no clone of this skb exist somewhere. Please someone fix this buggy driver. Try the following patch for a start diff --git a/drivers/staging/rtl8712/rtl8712_recv.c b/drivers/staging/rtl8712/rtl8712_recv.c index 8e82ce2..88e3ca6 100644 --- a/drivers/staging/rtl8712/rtl8712_recv.c +++ b/drivers/staging/rtl8712/rtl8712_recv.c @@ -1127,6 +1127,9 @@ static void recv_tasklet(void *priv) recvbuf2recvframe(padapter, pskb); skb_reset_tail_pointer(pskb); pskb-len = 0; - skb_queue_tail(precvpriv-free_recv_skb_queue, pskb); + if (!skb_cloned(pskb)) + skb_queue_tail(precvpriv-free_recv_skb_queue, pskb); + else + consume_skb(pskb); } } This one did not help. There is no doubt it is needed for the case where memory is tight, an allocation fails, and the driver clones the skb. In the present case, debug statements have shown that the skb_clone() call was not made. In the long term, this driver will be replaced with one that uses mac80211, but in the short term, I am trying to fix it. As I said earlier, my skb skills are minimal. Could you explain what is wrong with the following sequence? sub_skb = dev_alloc_skb(nSubframe_Length + 12); skb_reserve(sub_skb, 12); dev_alloc_skb() can return NULL - crash skb_clone() can also return NULL - crash -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"
On Thu, 2012-08-23 at 15:57 -0500, Larry Finger wrote: > On 08/22/2012 11:03 PM, Eric Dumazet wrote: > > > > Changing the allocation size removes the problem ? thats really strange. > > > > If you try different sizes in the 9100-30720 range, can you pinpoint the > > failure threshold ? > > The allocation size change did not fix the problem. It turned out that 10 > tries > from a secure web page were not enough to trigger this intermittent problem > that > particular test. > > Based on DaveM's comment that skb->truesize could be wrong, I tried setting > truesize after every netdev_alloc_skb() call. Of course, that had no effect. > I > then found https://lkml.org/lkml/2010/11/19/505I, which clearly states why > this > need not be done. > > What skb modifications require that truesize be adjusted? The driver never > resets skb->len or skb->data_len for any buffers, other than setting skb->len > to > zero. skb->truesize is adjusted when a frag is added to one skb, or when skb->head is re-allocated. Are you sure you dont have another problem, because as I said commit c8628155ece3 had a bug, so a bisect is not very useful. How many reloads are needed to trigger the bug, do you have a script to reproduce it ? Could it be a PMTU problem ? (check http://git.kernel.org/?p=linux/kernel/git/davem/net.git;a=commit;h=9b04f350057863d1fad1ba071e09362a1da3503e ) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"
On 08/22/2012 11:03 PM, Eric Dumazet wrote: Changing the allocation size removes the problem ? thats really strange. If you try different sizes in the 9100-30720 range, can you pinpoint the failure threshold ? The allocation size change did not fix the problem. It turned out that 10 tries from a secure web page were not enough to trigger this intermittent problem that particular test. Based on DaveM's comment that skb->truesize could be wrong, I tried setting truesize after every netdev_alloc_skb() call. Of course, that had no effect. I then found https://lkml.org/lkml/2010/11/19/505I, which clearly states why this need not be done. What skb modifications require that truesize be adjusted? The driver never resets skb->len or skb->data_len for any buffers, other than setting skb->len to zero. Thanks, Larry -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use
On 08/22/2012 11:03 PM, Eric Dumazet wrote: Changing the allocation size removes the problem ? thats really strange. If you try different sizes in the 9100-30720 range, can you pinpoint the failure threshold ? The allocation size change did not fix the problem. It turned out that 10 tries from a secure web page were not enough to trigger this intermittent problem that particular test. Based on DaveM's comment that skb-truesize could be wrong, I tried setting truesize after every netdev_alloc_skb() call. Of course, that had no effect. I then found https://lkml.org/lkml/2010/11/19/505I, which clearly states why this need not be done. What skb modifications require that truesize be adjusted? The driver never resets skb-len or skb-data_len for any buffers, other than setting skb-len to zero. Thanks, Larry -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use
On Thu, 2012-08-23 at 15:57 -0500, Larry Finger wrote: On 08/22/2012 11:03 PM, Eric Dumazet wrote: Changing the allocation size removes the problem ? thats really strange. If you try different sizes in the 9100-30720 range, can you pinpoint the failure threshold ? The allocation size change did not fix the problem. It turned out that 10 tries from a secure web page were not enough to trigger this intermittent problem that particular test. Based on DaveM's comment that skb-truesize could be wrong, I tried setting truesize after every netdev_alloc_skb() call. Of course, that had no effect. I then found https://lkml.org/lkml/2010/11/19/505I, which clearly states why this need not be done. What skb modifications require that truesize be adjusted? The driver never resets skb-len or skb-data_len for any buffers, other than setting skb-len to zero. skb-truesize is adjusted when a frag is added to one skb, or when skb-head is re-allocated. Are you sure you dont have another problem, because as I said commit c8628155ece3 had a bug, so a bisect is not very useful. How many reloads are needed to trigger the bug, do you have a script to reproduce it ? Could it be a PMTU problem ? (check http://git.kernel.org/?p=linux/kernel/git/davem/net.git;a=commit;h=9b04f350057863d1fad1ba071e09362a1da3503e ) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"
On Wed, 2012-08-22 at 16:33 -0500, Larry Finger wrote: > On 08/22/2012 12:15 AM, Eric Dumazet wrote: > > > > This particular commit is the start of a patches batch that ended in the > > generic TCP coalescing mechanism. > > > > It is known to have problem on drivers doing skb_clone() in their rx > > path. > > > > Current kernels should be ok, because coalescing doesnt happen if the > > destination skb is cloned (skb_cloned(to) in skb_try_coalesce()) > > The skb_clone() call is not the source of the problem for r8712u, as it is > only > used when a memory allocation fails, which is not happening. The suggestion > did > lead to another patch that I had been preparing. The initial allocation of RX > buffers used a size of 30720 bytes, while 9100 is sufficient to allow > aggregation. Upon reducing the buffer size, the driver now works for me. I am > now awaiting tests by the OP on the Red Hat bugzilla before sending the patch > upstream. > > So far, no ideas for the problem in connecting to WPA1 networks. > Changing the allocation size removes the problem ? thats really strange. If you try different sizes in the 9100-30720 range, can you pinpoint the failure threshold ? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"
On 08/22/2012 12:15 AM, Eric Dumazet wrote: This particular commit is the start of a patches batch that ended in the generic TCP coalescing mechanism. It is known to have problem on drivers doing skb_clone() in their rx path. Current kernels should be ok, because coalescing doesnt happen if the destination skb is cloned (skb_cloned(to) in skb_try_coalesce()) The skb_clone() call is not the source of the problem for r8712u, as it is only used when a memory allocation fails, which is not happening. The suggestion did lead to another patch that I had been preparing. The initial allocation of RX buffers used a size of 30720 bytes, while 9100 is sufficient to allow aggregation. Upon reducing the buffer size, the driver now works for me. I am now awaiting tests by the OP on the Red Hat bugzilla before sending the patch upstream. So far, no ideas for the problem in connecting to WPA1 networks. Larry -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"
On 08/22/2012 12:15 AM, Eric Dumazet wrote: This particular commit is the start of a patches batch that ended in the generic TCP coalescing mechanism. It is known to have problem on drivers doing skb_clone() in their rx path. Current kernels should be ok, because coalescing doesnt happen if the destination skb is cloned (skb_cloned(to) in skb_try_coalesce()) For 3.4 kernel, I guess I need to backport this skb_cloned(to) check fo stable 3.4 kernel But these skb_clone() in various USB drivers should be killed for good, they really can kill the box because of skb->truesize lies. Applying your patch to 3.4 does not fix the problem. In addition, all of the problems I noted happen with 3.6-rc2 - current kernels are not OK. Of the drivers I reported, only r8712u in staging/rtl8712/rtl8712_recv.c has a direct call to skb_clone(). As its symptoms are different, the cloning is likely the source of the problems there. In addition, the problem is intermittent. The driver normally allocates a new buffer and copies to it, and the clone only occurs when that skb allocation fails. I will try to fix that. My skb skills are minimal and I will have to do some studying. Thanks, Larry -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use
On 08/22/2012 12:15 AM, Eric Dumazet wrote: This particular commit is the start of a patches batch that ended in the generic TCP coalescing mechanism. It is known to have problem on drivers doing skb_clone() in their rx path. Current kernels should be ok, because coalescing doesnt happen if the destination skb is cloned (skb_cloned(to) in skb_try_coalesce()) For 3.4 kernel, I guess I need to backport this skb_cloned(to) check fo stable 3.4 kernel But these skb_clone() in various USB drivers should be killed for good, they really can kill the box because of skb-truesize lies. Applying your patch to 3.4 does not fix the problem. In addition, all of the problems I noted happen with 3.6-rc2 - current kernels are not OK. Of the drivers I reported, only r8712u in staging/rtl8712/rtl8712_recv.c has a direct call to skb_clone(). As its symptoms are different, the cloning is likely the source of the problems there. In addition, the problem is intermittent. The driver normally allocates a new buffer and copies to it, and the clone only occurs when that skb allocation fails. I will try to fix that. My skb skills are minimal and I will have to do some studying. Thanks, Larry -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use
On 08/22/2012 12:15 AM, Eric Dumazet wrote: This particular commit is the start of a patches batch that ended in the generic TCP coalescing mechanism. It is known to have problem on drivers doing skb_clone() in their rx path. Current kernels should be ok, because coalescing doesnt happen if the destination skb is cloned (skb_cloned(to) in skb_try_coalesce()) The skb_clone() call is not the source of the problem for r8712u, as it is only used when a memory allocation fails, which is not happening. The suggestion did lead to another patch that I had been preparing. The initial allocation of RX buffers used a size of 30720 bytes, while 9100 is sufficient to allow aggregation. Upon reducing the buffer size, the driver now works for me. I am now awaiting tests by the OP on the Red Hat bugzilla before sending the patch upstream. So far, no ideas for the problem in connecting to WPA1 networks. Larry -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use
On Wed, 2012-08-22 at 16:33 -0500, Larry Finger wrote: On 08/22/2012 12:15 AM, Eric Dumazet wrote: This particular commit is the start of a patches batch that ended in the generic TCP coalescing mechanism. It is known to have problem on drivers doing skb_clone() in their rx path. Current kernels should be ok, because coalescing doesnt happen if the destination skb is cloned (skb_cloned(to) in skb_try_coalesce()) The skb_clone() call is not the source of the problem for r8712u, as it is only used when a memory allocation fails, which is not happening. The suggestion did lead to another patch that I had been preparing. The initial allocation of RX buffers used a size of 30720 bytes, while 9100 is sufficient to allow aggregation. Upon reducing the buffer size, the driver now works for me. I am now awaiting tests by the OP on the Red Hat bugzilla before sending the patch upstream. So far, no ideas for the problem in connecting to WPA1 networks. Changing the allocation size removes the problem ? thats really strange. If you try different sizes in the 9100-30720 range, can you pinpoint the failure threshold ? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"
On Tue, 2012-08-21 at 23:07 -0500, Larry Finger wrote: > Hi, > > The commit entitled "tcp: reduce out_of_order memory use" turns out to cause > problems with a number of USB drivers. > > The first one called to my attention was for staging/r8712u. For this driver, > there are problems with SSL communications as reported in > https://bugzilla.redhat.com/show_bug.cgi?id=847525. > > Other drivers including rtl8187, rtl8192cu and rt2800usb cannot connect to a > WPA1- or WEP-encrypted AP, but work fine with WPA2. The rtl8192cu problem is > reported in https://bugzilla.kernel.org/show_bug.cgi?id=46171. > > I find it hard to understand why this patch should cause these effects; > however, > I have verified that a kernel generated from "git checkout e86b2919" works > fine. > This commit is immediately before the patch in question. > > Note, this patch was applied during the merge between 3.3 and 3.4. The > regression has been undiscovered for some time. > > Any help on this problem will be appreciated. > > Larry This particular commit is the start of a patches batch that ended in the generic TCP coalescing mechanism. It is known to have problem on drivers doing skb_clone() in their rx path. Current kernels should be ok, because coalescing doesnt happen if the destination skb is cloned (skb_cloned(to) in skb_try_coalesce()) For 3.4 kernel, I guess I need to backport this skb_cloned(to) check fo stable 3.4 kernel But these skb_clone() in various USB drivers should be killed for good, they really can kill the box because of skb->truesize lies. Please test following patch (for 3.4 kernels) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 257b617..c45ac2d 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4496,7 +4496,9 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb) * to avoid future tcp_collapse_ofo_queue(), * probably the most expensive function in tcp stack. */ - if (skb->len <= skb_tailroom(skb1) && !tcp_hdr(skb)->fin) { + if (skb->len <= skb_tailroom(skb1) && + !skb_cloned(skb1) && + !tcp_hdr(skb)->fin) { NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPRCVCOALESCE); BUG_ON(skb_copy_bits(skb, 0, -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"
From: Larry Finger Date: Tue, 21 Aug 2012 23:07:46 -0500 > I find it hard to understand why this patch should cause these > effects; however, I have verified that a kernel generated from "git > checkout e86b2919" works fine. This commit is immediately before the > patch in question. I can guarentee it's improper skb->truesize setting in these drivers, or use of too huge buffer sizes compared to the amount of data contained within the packet. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"
Hi, The commit entitled "tcp: reduce out_of_order memory use" turns out to cause problems with a number of USB drivers. The first one called to my attention was for staging/r8712u. For this driver, there are problems with SSL communications as reported in https://bugzilla.redhat.com/show_bug.cgi?id=847525. Other drivers including rtl8187, rtl8192cu and rt2800usb cannot connect to a WPA1- or WEP-encrypted AP, but work fine with WPA2. The rtl8192cu problem is reported in https://bugzilla.kernel.org/show_bug.cgi?id=46171. I find it hard to understand why this patch should cause these effects; however, I have verified that a kernel generated from "git checkout e86b2919" works fine. This commit is immediately before the patch in question. Note, this patch was applied during the merge between 3.3 and 3.4. The regression has been undiscovered for some time. Any help on this problem will be appreciated. Larry -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use
Hi, The commit entitled tcp: reduce out_of_order memory use turns out to cause problems with a number of USB drivers. The first one called to my attention was for staging/r8712u. For this driver, there are problems with SSL communications as reported in https://bugzilla.redhat.com/show_bug.cgi?id=847525. Other drivers including rtl8187, rtl8192cu and rt2800usb cannot connect to a WPA1- or WEP-encrypted AP, but work fine with WPA2. The rtl8192cu problem is reported in https://bugzilla.kernel.org/show_bug.cgi?id=46171. I find it hard to understand why this patch should cause these effects; however, I have verified that a kernel generated from git checkout e86b2919 works fine. This commit is immediately before the patch in question. Note, this patch was applied during the merge between 3.3 and 3.4. The regression has been undiscovered for some time. Any help on this problem will be appreciated. Larry -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use
From: Larry Finger larry.fin...@lwfinger.net Date: Tue, 21 Aug 2012 23:07:46 -0500 I find it hard to understand why this patch should cause these effects; however, I have verified that a kernel generated from git checkout e86b2919 works fine. This commit is immediately before the patch in question. I can guarentee it's improper skb-truesize setting in these drivers, or use of too huge buffer sizes compared to the amount of data contained within the packet. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression associated with commit c8628155ece3 - tcp: reduce out_of_order memory use
On Tue, 2012-08-21 at 23:07 -0500, Larry Finger wrote: Hi, The commit entitled tcp: reduce out_of_order memory use turns out to cause problems with a number of USB drivers. The first one called to my attention was for staging/r8712u. For this driver, there are problems with SSL communications as reported in https://bugzilla.redhat.com/show_bug.cgi?id=847525. Other drivers including rtl8187, rtl8192cu and rt2800usb cannot connect to a WPA1- or WEP-encrypted AP, but work fine with WPA2. The rtl8192cu problem is reported in https://bugzilla.kernel.org/show_bug.cgi?id=46171. I find it hard to understand why this patch should cause these effects; however, I have verified that a kernel generated from git checkout e86b2919 works fine. This commit is immediately before the patch in question. Note, this patch was applied during the merge between 3.3 and 3.4. The regression has been undiscovered for some time. Any help on this problem will be appreciated. Larry This particular commit is the start of a patches batch that ended in the generic TCP coalescing mechanism. It is known to have problem on drivers doing skb_clone() in their rx path. Current kernels should be ok, because coalescing doesnt happen if the destination skb is cloned (skb_cloned(to) in skb_try_coalesce()) For 3.4 kernel, I guess I need to backport this skb_cloned(to) check fo stable 3.4 kernel But these skb_clone() in various USB drivers should be killed for good, they really can kill the box because of skb-truesize lies. Please test following patch (for 3.4 kernels) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 257b617..c45ac2d 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4496,7 +4496,9 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb) * to avoid future tcp_collapse_ofo_queue(), * probably the most expensive function in tcp stack. */ - if (skb-len = skb_tailroom(skb1) !tcp_hdr(skb)-fin) { + if (skb-len = skb_tailroom(skb1) + !skb_cloned(skb1) + !tcp_hdr(skb)-fin) { NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPRCVCOALESCE); BUG_ON(skb_copy_bits(skb, 0, -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/