Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)
Hi Ben, On 8/31/17 12:04 PM, Ben RUBSON wrote: >> On 28 Aug 2017, at 11:27, Julien Charbonwrote: >> >> On 8/28/17 10:25 AM, Ben RUBSON wrote: On 16 Aug 2017, at 11:02, Ben RUBSON wrote: > On 15 Aug 2017, at 23:33, Julien Charbon wrote: > > On 8/11/17 11:32 AM, Ben RUBSON wrote: >>> On 08 Aug 2017, at 13:33, Julien Charbon wrote: >>> >>> On 8/8/17 10:31 AM, Hans Petter Selasky wrote: Suggested fix attached. >>> >>> I agree we your conclusion. Just for the record, more precisely this >>> regression seems to have been introduced with: >>> (...) >>> Thus good catch, and your patch looks good. I am going to just verify >>> the other in_pcbrele_wlocked() calls in TCP stack. >> >> Julien, do you plan to make this fix reach 11.0-p12 ? > > I am checking if your issue is another flavor of the issue fixed by: > > https://svnweb.freebsd.org/base?view=revision=307551 > https://reviews.freebsd.org/D8211 > > This fix in not in 11.0 but in 11.1. Currently I did not found how an > inp in INP_TIMEWAIT state can have been INP_FREED without having its tw > set to NULL already except the issue fixed by r307551. > > Thus could you try to apply this patch: > > https://github.com/freebsd/freebsd/commit/acb5bfda99b753d9ead3529d04f20087c5f7d0a0.patch > > and see if you can still reproduce this issue? Thank you for your answer Julien. Unfortunately, I'm not sure at all how to reproduce the issue. I have other servers which are 100% identical to this one, same workload, same some-months uptime, but they did not trigger the bug yet. If other network stack experts (I'm not) agree with your analysis, we could then certainly go further with D8211 / r307551. One thing that perhaps might help : # netstat -an | grep TIME_WAIT$ | wc -l 468 Note that due to this running bug, sendmail has lots of difficulties to send outgoing mails. As soon as I run the above netstat command, I receive a lot of stacked mails (more than 20 this time). As if netstat was able to somehow help... Number of TIME_WAIT connections however does not decrease, but increases. > And in the spirit of r307551 fix and based on Hans patch I will also > propose to add a kernel log describing the issue instead of starting an > infinite loop when INVARIANT is not set. Which should then never be triggered :) Good idea I think ! >>> >>> What about : >>> D8211/r307551 >>> + Hans' patch >>> + Julien's idea of a kernel log (sort of "We should not be here but we are") >> >> I did this change and I am testing it > > Good news ! > >> on your side did you try this patch applied on 11.0? >> >> https://github.com/freebsd/freebsd/commit/acb5bfda99b753d9ead3529d04f20087c5f7d0a0.patch > > Yes, patch applied and running correctly, > however hard to say whether or not it solves this issue, > as there is no easy way to reproduce it. No problem, it is just a matter of not seeing the issue anymore during a long enough period. I created a review that includes Hans's patch and uses the same log(LOG_ERR) logic than r307551: https://reviews.freebsd.org/D12267 On my side, TCP smoke tests are ok. And I am going to launch our TCP QA on it while receiving review comments. > Mail sent to FreeBSD Security Team ! > > Many thanks, let's stay tuned ! Thanks to you and Hans for reporting that issue. And in summary: - Applying r307551 on top of 11.0 should prevent this case to happen - D12267 will prevent the tcp_tw_2msl_scan() infinite loop while reporting the error, in case a regression defeating r307551 is introduced Thanks. -- Julien ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)
> On 28 Aug 2017, at 11:27, Julien Charbonwrote: > > On 8/28/17 10:25 AM, Ben RUBSON wrote: >>> On 16 Aug 2017, at 11:02, Ben RUBSON wrote: >>> On 15 Aug 2017, at 23:33, Julien Charbon wrote: On 8/11/17 11:32 AM, Ben RUBSON wrote: >> On 08 Aug 2017, at 13:33, Julien Charbon wrote: >> >> On 8/8/17 10:31 AM, Hans Petter Selasky wrote: >>> >>> Suggested fix attached. >> >> I agree we your conclusion. Just for the record, more precisely this >> regression seems to have been introduced with: >> (...) >> Thus good catch, and your patch looks good. I am going to just verify >> the other in_pcbrele_wlocked() calls in TCP stack. > > Julien, do you plan to make this fix reach 11.0-p12 ? I am checking if your issue is another flavor of the issue fixed by: https://svnweb.freebsd.org/base?view=revision=307551 https://reviews.freebsd.org/D8211 This fix in not in 11.0 but in 11.1. Currently I did not found how an inp in INP_TIMEWAIT state can have been INP_FREED without having its tw set to NULL already except the issue fixed by r307551. Thus could you try to apply this patch: https://github.com/freebsd/freebsd/commit/acb5bfda99b753d9ead3529d04f20087c5f7d0a0.patch and see if you can still reproduce this issue? >>> >>> Thank you for your answer Julien. >>> Unfortunately, I'm not sure at all how to reproduce the issue. >>> I have other servers which are 100% identical to this one, same workload, >>> same some-months uptime, but they did not trigger the bug yet. >>> >>> If other network stack experts (I'm not) agree with your analysis, >>> we could then certainly go further with D8211 / r307551. >>> >>> One thing that perhaps might help : >>> # netstat -an | grep TIME_WAIT$ | wc -l >>> 468 >>> >>> Note that due to this running bug, sendmail has lots of difficulties to >>> send outgoing mails. >>> As soon as I run the above netstat command, I receive a lot of stacked >>> mails (more than 20 this time). >>> As if netstat was able to somehow help... >>> >>> Number of TIME_WAIT connections however does not decrease, but increases. >>> And in the spirit of r307551 fix and based on Hans patch I will also propose to add a kernel log describing the issue instead of starting an infinite loop when INVARIANT is not set. >>> >>> Which should then never be triggered :) >>> Good idea I think ! >> >> What about : >> D8211/r307551 >> + Hans' patch >> + Julien's idea of a kernel log (sort of "We should not be here but we are") > > I did this change and I am testing it Good news ! > on your side did you try this patch applied on 11.0? > > https://github.com/freebsd/freebsd/commit/acb5bfda99b753d9ead3529d04f20087c5f7d0a0.patch Yes, patch applied and running correctly, however hard to say whether or not it solves this issue, as there is no easy way to reproduce it. >> And backporting all this to 11.0 (and so to 11.1 too) ? >> >> As this bug can impact every FreeBSD machine / server, >> leading to an unavailable / unreachable system (this is how mine ended), >> sounds like it could inevitably be a good thing, for production stability >> purpose. > > The main fix for your issue is (I believe): > > Fix a double-free when an inp transitions to INP_TIMEWAIT state > after having been dropped. > https://svnweb.freebsd.org/base?view=revision=307551 > > This fix has been MFC-ed on both stable/11, stable/10 and is already > included in 11.1 and will be in 10.4. To push in 11.0 release directly, > I guess you have to promote this change to an Errata (never did that > myself): > > https://www.freebsd.org/security/notices.html > https://www.freebsd.org/security/security.html#reporting Mail sent to FreeBSD Security Team ! Many thanks, let's stay tuned ! Ben ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)
Hi Ben, On 8/28/17 10:25 AM, Ben RUBSON wrote: >> On 16 Aug 2017, at 11:02, Ben RUBSONwrote: >> >>> On 15 Aug 2017, at 23:33, Julien Charbon wrote: >>> >>> On 8/11/17 11:32 AM, Ben RUBSON wrote: > On 08 Aug 2017, at 13:33, Julien Charbon wrote: > > On 8/8/17 10:31 AM, Hans Petter Selasky wrote: >> >> Suggested fix attached. > > I agree we your conclusion. Just for the record, more precisely this > regression seems to have been introduced with: > (...) > Thus good catch, and your patch looks good. I am going to just verify > the other in_pcbrele_wlocked() calls in TCP stack. Julien, do you plan to make this fix reach 11.0-p12 ? >>> >>> I am checking if your issue is another flavor of the issue fixed by: >>> >>> https://svnweb.freebsd.org/base?view=revision=307551 >>> https://reviews.freebsd.org/D8211 >>> >>> This fix in not in 11.0 but in 11.1. Currently I did not found how an >>> inp in INP_TIMEWAIT state can have been INP_FREED without having its tw >>> set to NULL already except the issue fixed by r307551. >>> >>> Thus could you try to apply this patch: >>> >>> https://github.com/freebsd/freebsd/commit/acb5bfda99b753d9ead3529d04f20087c5f7d0a0.patch >>> >>> and see if you can still reproduce this issue? >> >> Thank you for your answer Julien. >> Unfortunately, I'm not sure at all how to reproduce the issue. >> I have other servers which are 100% identical to this one, same workload, >> same some-months uptime, but they did not trigger the bug yet. >> >> If other network stack experts (I'm not) agree with your analysis, >> we could then certainly go further with D8211 / r307551. >> >> One thing that perhaps might help : >> # netstat -an | grep TIME_WAIT$ | wc -l >> 468 >> >> Note that due to this running bug, sendmail has lots of difficulties to send >> outgoing mails. >> As soon as I run the above netstat command, I receive a lot of stacked mails >> (more than 20 this time). >> As if netstat was able to somehow help... >> >> Number of TIME_WAIT connections however does not decrease, but increases. >> >>> And in the spirit of r307551 fix and based on Hans patch I will also >>> propose to add a kernel log describing the issue instead of starting an >>> infinite loop when INVARIANT is not set. >> >> Which should then never be triggered :) >> Good idea I think ! > > What about : > D8211/r307551 > + Hans' patch > + Julien's idea of a kernel log (sort of "We should not be here but we are") I did this change and I am testing it, on your side did you try this patch applied on 11.0? https://github.com/freebsd/freebsd/commit/acb5bfda99b753d9ead3529d04f20087c5f7d0a0.patch > And backporting all this to 11.0 (and so to 11.1 too) ? > > As this bug can impact every FreeBSD machine / server, > leading to an unavailable / unreachable system (this is how mine ended), > sounds like it could inevitably be a good thing, for production stability > purpose. The main fix for your issue is (I believe): Fix a double-free when an inp transitions to INP_TIMEWAIT state after having been dropped. https://svnweb.freebsd.org/base?view=revision=307551 This fix has been MFC-ed on both stable/11, stable/10 and is already included in 11.1 and will be in 10.4. To push in 11.0 release directly, I guess you have to promote this change to an Errata (never did that myself): https://www.freebsd.org/security/notices.html https://www.freebsd.org/security/security.html#reporting While waiting for this errata to be accepted better using the patch. My 2 cents. -- Julien ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)
> On 16 Aug 2017, at 11:02, Ben RUBSONwrote: > >> On 15 Aug 2017, at 23:33, Julien Charbon wrote: >> >> On 8/11/17 11:32 AM, Ben RUBSON wrote: On 08 Aug 2017, at 13:33, Julien Charbon wrote: On 8/8/17 10:31 AM, Hans Petter Selasky wrote: > > Suggested fix attached. I agree we your conclusion. Just for the record, more precisely this regression seems to have been introduced with: (...) Thus good catch, and your patch looks good. I am going to just verify the other in_pcbrele_wlocked() calls in TCP stack. >>> >>> Julien, do you plan to make this fix reach 11.0-p12 ? >> >> I am checking if your issue is another flavor of the issue fixed by: >> >> https://svnweb.freebsd.org/base?view=revision=307551 >> https://reviews.freebsd.org/D8211 >> >> This fix in not in 11.0 but in 11.1. Currently I did not found how an >> inp in INP_TIMEWAIT state can have been INP_FREED without having its tw >> set to NULL already except the issue fixed by r307551. >> >> Thus could you try to apply this patch: >> >> https://github.com/freebsd/freebsd/commit/acb5bfda99b753d9ead3529d04f20087c5f7d0a0.patch >> >> and see if you can still reproduce this issue? > > Thank you for your answer Julien. > Unfortunately, I'm not sure at all how to reproduce the issue. > I have other servers which are 100% identical to this one, same workload, > same some-months uptime, but they did not trigger the bug yet. > > If other network stack experts (I'm not) agree with your analysis, > we could then certainly go further with D8211 / r307551. > > One thing that perhaps might help : > # netstat -an | grep TIME_WAIT$ | wc -l > 468 > > Note that due to this running bug, sendmail has lots of difficulties to send > outgoing mails. > As soon as I run the above netstat command, I receive a lot of stacked mails > (more than 20 this time). > As if netstat was able to somehow help... > > Number of TIME_WAIT connections however does not decrease, but increases. > >> And in the spirit of r307551 fix and based on Hans patch I will also >> propose to add a kernel log describing the issue instead of starting an >> infinite loop when INVARIANT is not set. > > Which should then never be triggered :) > Good idea I think ! What about : D8211/r307551 + Hans' patch + Julien's idea of a kernel log (sort of "We should not be here but we are") And backporting all this to 11.0 (and so to 11.1 too) ? As this bug can impact every FreeBSD machine / server, leading to an unavailable / unreachable system (this is how mine ended), sounds like it could inevitably be a good thing, for production stability purpose. Thank you very much ! Ben ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)
> On 15 Aug 2017, at 23:33, Julien Charbonwrote: > > On 8/11/17 11:32 AM, Ben RUBSON wrote: >>> On 08 Aug 2017, at 13:33, Julien Charbon wrote: >>> >>> On 8/8/17 10:31 AM, Hans Petter Selasky wrote: Suggested fix attached. >>> >>> I agree we your conclusion. Just for the record, more precisely this >>> regression seems to have been introduced with: >>> (...) >>> Thus good catch, and your patch looks good. I am going to just verify >>> the other in_pcbrele_wlocked() calls in TCP stack. >> >> Julien, do you plan to make this fix reach 11.0-p12 ? > > I am checking if your issue is another flavor of the issue fixed by: > > https://svnweb.freebsd.org/base?view=revision=307551 > https://reviews.freebsd.org/D8211 > > This fix in not in 11.0 but in 11.1. Currently I did not found how an > inp in INP_TIMEWAIT state can have been INP_FREED without having its tw > set to NULL already except the issue fixed by r307551. > > Thus could you try to apply this patch: > > https://github.com/freebsd/freebsd/commit/acb5bfda99b753d9ead3529d04f20087c5f7d0a0.patch > > and see if you can still reproduce this issue? Thank you for your answer Julien. Unfortunately, I'm not sure at all how to reproduce the issue. I have other servers which are 100% identical to this one, same workload, same some-months uptime, but they did not trigger the bug yet. If other network stack experts (I'm not) agree with your analysis, we could then certainly go further with D8211 / r307551. One thing that perhaps might help : # netstat -an | grep TIME_WAIT$ | wc -l 468 Note that due to this running bug, sendmail has lots of difficulties to send outgoing mails. As soon as I run the above netstat command, I receive a lot of stacked mails (more than 20 this time). As if netstat was able to somehow help... Number of TIME_WAIT connections however does not decrease, but increases. > And in the spirit of r307551 fix and based on Hans patch I will also > propose to add a kernel log describing the issue instead of starting an > infinite loop when INVARIANT is not set. Which should then never be triggered :) Good idea I think ! Thank you again ! Ben ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)
Hi Ben, On 8/11/17 11:32 AM, Ben RUBSON wrote: >> On 08 Aug 2017, at 13:33, Julien Charbonwrote: >> >> On 8/8/17 10:31 AM, Hans Petter Selasky wrote: >>> >>> Suggested fix attached. >> >> I agree we your conclusion. Just for the record, more precisely this >> regression seems to have been introduced with: >> (...) >> Thus good catch, and your patch looks good. I am going to just verify >> the other in_pcbrele_wlocked() calls in TCP stack. > > Julien, do you plan to make this fix reach 11.0-p12 ? I am checking if your issue is another flavor of the issue fixed by: https://svnweb.freebsd.org/base?view=revision=307551 https://reviews.freebsd.org/D8211 This fix in not in 11.0 but in 11.1. Currently I did not found how an inp in INP_TIMEWAIT state can have been INP_FREED without having its tw set to NULL already except the issue fixed by r307551. Thus could you try to apply this patch: https://github.com/freebsd/freebsd/commit/acb5bfda99b753d9ead3529d04f20087c5f7d0a0.patch and see if you can still reproduce this issue? And in the spirit of r307551 fix and based on Hans patch I will also propose to add a kernel log describing the issue instead of starting an infinite loop when INVARIANT is not set. -- Julien ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)
> On 08 Aug 2017, at 13:33, Julien Charbonwrote: > > Hi, > > On 8/8/17 10:31 AM, Hans Petter Selasky wrote: >> >> >> Suggested fix attached. > > I agree we your conclusion. Just for the record, more precisely this > regression seems to have been introduced with: > (...) > Thus good catch, and your patch looks good. I am going to just verify > the other in_pcbrele_wlocked() calls in TCP stack. Julien, do you plan to make this fix reach 11.0-p12 ? Many thx ! Ben ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)
On 08/08/17 13:56, Slawa Olhovchenkov wrote: On Tue, Aug 08, 2017 at 01:49:08PM +0200, Hans Petter Selasky wrote: On 08/08/17 13:33, Slawa Olhovchenkov wrote: TW_RUNLOCK(V_tw_lock); and if (INP_INFO_TRY_WLOCK(_tcbinfo)) { `inp` can be invalidated, freed and this pointer may be invalid? If you look one line up there is a pcbref ?? Yes. Can different thread take this inp and freed it? May be timer thread? No, it cannot be freed while there is a ref. Some lines down the ref is dropped once the inp pointer is no longer needed. --HPS ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)
Hi, On 8/8/17 10:31 AM, Hans Petter Selasky wrote: > On 08/08/17 10:06, Ben RUBSON wrote: >>> On 08 Aug 2017, at 10:02, Hans Petter Selaskywrote: >>> >>> On 08/08/17 10:00, Ben RUBSON wrote: kgdb) print *twq_2msl.tqh_first $2 = { tw_inpcb = 0xf8031c570740, >>> >>> print *twq_2msl.tqh_first->tw_inpcb >> > > Here is the conclusion: > > The following code is going in an infinite loop: > > >> for (;;) { >> TW_RLOCK(V_tw_lock); >> tw = TAILQ_FIRST(_twq_2msl); >> if (tw == NULL || (!reuse && (tw->tw_time - ticks) > >> 0)) { >> TW_RUNLOCK(V_tw_lock); >> break; >> } >> KASSERT(tw->tw_inpcb != NULL, ("%s: tw->tw_inpcb == >> NULL", >> __func__)); >> >> inp = tw->tw_inpcb; >> in_pcbref(inp); >> TW_RUNLOCK(V_tw_lock); >> >> if (INP_INFO_TRY_RLOCK(_tcbinfo)) { >> >> INP_WLOCK(inp); >> tw = intotw(inp); >> if (in_pcbrele_wlocked(inp)) { > > in_pcbrele_wlocked() returns (1) because INP_FREED (16) is set in > inp->inp_flags2. I guess you have invariants disabled, because the > KASSERT() below should have caused a panic. > >> KASSERT(tw == NULL, ("%s: held last inp " >> "reference but tw not NULL", >> __func__)); >> INP_INFO_RUNLOCK(_tcbinfo); >> continue; >> } > > This is a regression issue after: > >> commit 5630210a7f1dbbd903b77b2aef939cd47c63da58 >> Author: jch >> Date: Thu Oct 30 08:53:56 2014 + >> >> Fix a race condition in TCP timewait between tcp_tw_2msl_reuse() and >> tcp_tw_2msl_scan(). This race condition drives unplanned timewait >> timeout cancellation. Also simplify implementation by holding inpcb >> reference and removing tcptw reference counting. > > Suggested fix attached. I agree we your conclusion. Just for the record, more precisely this regression seems to have been introduced with: commit b02d40ddcda08b51a49e5667e6808f5dc5ec0472 Author: fabient Date: Wed Nov 25 14:45:43 2015 + The r241129 description was wrong that the scenario is possible only for read locks on pcbs. The same race can happen with write lock semantics as well. The race scenario: - Two threads (1 and 2) locate pcb with writer semantics (INPLOOKUP_WLOCKPCB) and do in_pcbref() on it. - 1 and 2 both drop the inp hash lock. - Another thread (3) grabs the inp hash lock. Then it runs in_pcbfree(), which wlocks the pcb. They must happen faster than 1 or 2 come INP_WLOCK()! - 1 and 2 congest in INP_WLOCK(). - 3 does in_pcbremlists(), drops hash lock, and runs in_pcbrele_wlocked(), which doesn't free the pcb due to two references on it. Then it unlocks the pcb. - 1 (or 2) gets wlock on the pcb, runs in_pcbrele_wlocked(), which doesn't report inp as freed, due to 2 (or 1) still helding extra reference on it. The thread tries to do smth with a disconnected pcb and crashes. Submitted by: emeric.pou...@stormshield.eu Reviewed by:gleb@ MFC after: 1 week Sponsored by: Stormshield Tested by: Cassiano Peixoto, Stormshield Notes: svn path=/head/; revision=291301 Before this change in_pcbrele_wlocked() returned 1 only on the last reference which is what tcp_tw_2msl_scan() currently expects. Thus good catch, and your patch looks good. I am going to just verify the other in_pcbrele_wlocked() calls in TCP stack. Thanks. -- Julien ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)
On Tue, Aug 08, 2017 at 01:49:08PM +0200, Hans Petter Selasky wrote: > On 08/08/17 13:33, Slawa Olhovchenkov wrote: > > TW_RUNLOCK(V_tw_lock); > > and > > if (INP_INFO_TRY_WLOCK(_tcbinfo)) { > > > > `inp` can be invalidated, freed and this pointer may be invalid? > > If you look one line up there is a pcbref ?? Yes. Can different thread take this inp and freed it? May be timer thread? ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)
On 08/08/17 13:33, Slawa Olhovchenkov wrote: TW_RUNLOCK(V_tw_lock); and if (INP_INFO_TRY_WLOCK(_tcbinfo)) { `inp` can be invalidated, freed and this pointer may be invalid? If you look one line up there is a pcbref ?? --HPS ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)
On Tue, Aug 08, 2017 at 10:31:33AM +0200, Hans Petter Selasky wrote: > Here is the conclusion: > > The following code is going in an infinite loop: > > > > for (;;) { > > TW_RLOCK(V_tw_lock); > > tw = TAILQ_FIRST(_twq_2msl); > > if (tw == NULL || (!reuse && (tw->tw_time - ticks) > 0)) { > > TW_RUNLOCK(V_tw_lock); > > break; > > } > > KASSERT(tw->tw_inpcb != NULL, ("%s: tw->tw_inpcb == NULL", > > __func__)); > > > > inp = tw->tw_inpcb; > > in_pcbref(inp); > > TW_RUNLOCK(V_tw_lock); > > > > if (INP_INFO_TRY_RLOCK(_tcbinfo)) { > > > > INP_WLOCK(inp); > > tw = intotw(inp); > > if (in_pcbrele_wlocked(inp)) { > > in_pcbrele_wlocked() returns (1) because INP_FREED (16) is set in > inp->inp_flags2. I guess you have invariants disabled, because the > KASSERT() below should have caused a panic. > > > KASSERT(tw == NULL, ("%s: held last inp " > > "reference but tw not NULL", __func__)); > > INP_INFO_RUNLOCK(_tcbinfo); > > continue; > > } > > This is a regression issue after: > > > commit 5630210a7f1dbbd903b77b2aef939cd47c63da58 > > Author: jch> > Date: Thu Oct 30 08:53:56 2014 + > > > > Fix a race condition in TCP timewait between tcp_tw_2msl_reuse() and > > tcp_tw_2msl_scan(). This race condition drives unplanned timewait > > timeout cancellation. Also simplify implementation by holding inpcb > > reference and removing tcptw reference counting. > > Suggested fix attached. Hmm, I am not sure, IMHO between TW_RUNLOCK(V_tw_lock); and if (INP_INFO_TRY_WLOCK(_tcbinfo)) { `inp` can be invalidated, freed and this pointer may be invalid? > Index: sys/netinet/tcp_timewait.c > === > --- sys/netinet/tcp_timewait.c(revision 321981) > +++ sys/netinet/tcp_timewait.c(working copy) > @@ -709,10 +709,11 @@ > INP_WLOCK(inp); > tw = intotw(inp); > if (in_pcbrele_wlocked(inp)) { > - KASSERT(tw == NULL, ("%s: held last inp " > - "reference but tw not NULL", __func__)); > INP_INFO_RUNLOCK(_tcbinfo); > - continue; > + if (tw == NULL) > + continue; > + else > + break; /* INP_FREED flag is set */ > } > > if (tw == NULL) { > ___ > freebsd-...@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org" ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)
> On 08 Aug 2017, at 10:31, Hans Petter Selaskywrote: > > On 08/08/17 10:06, Ben RUBSON wrote: >>> On 08 Aug 2017, at 10:02, Hans Petter Selasky wrote: >>> >>> On 08/08/17 10:00, Ben RUBSON wrote: kgdb) print *twq_2msl.tqh_first $2 = { tw_inpcb = 0xf8031c570740, >>> >>> print *twq_2msl.tqh_first->tw_inpcb >> (kgdb) print *twq_2msl.tqh_first->tw_inpcb >> $3 = { >> inp_hash = { >> le_next = 0x0, >> le_prev = 0xfe000f78adb8 >> }, >> inp_pcbgrouphash = { >> le_next = 0x0, >> le_prev = 0x0 >> }, >> inp_list = { >> le_next = 0xf80c2a07f570, >> le_prev = 0x81e15e20 >> }, >> inp_ppcb = 0xf80d1bf12210, >> inp_pcbinfo = 0x81e15e28, >> inp_pcbgroup = 0x0, >> inp_pcbgroup_wild = { >> le_next = 0x0, >> le_prev = 0x0 >> }, >> inp_socket = 0x0, >> inp_cred = 0xf804ae6ca400, >> inp_flow = 0, >> inp_flags = 92274688, >> inp_flags2 = 16, >> inp_vflag = 0 '\0', >> inp_ip_ttl = 64 '@', >> inp_ip_p = 0 '\0', >> inp_ip_minttl = 0 '\0', >> inp_flowid = 946611505, >> inp_refcount = 2, >> inp_pspare = 0xf8031c5707c0, >> inp_flowtype = 191, >> inp_rss_listen_bucket = 0, >> inp_ispare = 0xf8031c5707f0, >> inp_inc = { >> inc_flags = 0 '\0', >> inc_len = 0 '\0', >> inc_fibnum = 0, >> inc_ie = { >> ie_fport = 53987, >> ie_lport = 47873, >> ie_dependfaddr = { >> ie46_foreign = { >> ia46_pad32 = 0xf8031c570808, >> ia46_addr4 = { >> s_addr = 3011802202 >> } >> }, >> ie6_foreign = { >> __u6_addr = { >> __u6_addr8 = 0xf8031c570808 "", >> __u6_addr16 = 0xf8031c570808, >> __u6_addr32 = 0xf8031c570808 >> } >> } >> }, >> ie_dependladdr = { >> ie46_local = { >> ia46_pad32 = 0xf8031c570818, >> ia46_addr4 = { >> s_addr = 4068705883 >> } >> }, >> ie6_local = { >> __u6_addr = { >> __u6_addr8 = 0xf8031c570818 "", >> __u6_addr16 = 0xf8031c570818, >> __u6_addr32 = 0xf8031c570818 >> } >> } >> }, >> ie6_zoneid = 0 >> } >> }, >> inp_label = 0x0, >> inp_sp = 0x0, >> inp_depend4 = { >> inp4_ip_tos = 0 '\0', >> inp4_options = 0x0, >> inp4_moptions = 0x0 >> }, >> inp_depend6 = { >> inp6_options = 0x0, >> inp6_outputopts = 0x0, >> inp6_moptions = 0x0, >> inp6_icmp6filt = 0x0, >> inp6_cksum = 0, >> inp6_hops = 0 >> }, >> inp_portlist = { >> le_next = 0xf80274298ae0, >> le_prev = 0xf800454999b0 >> }, >> inp_phd = 0xf800454999a0, >> inp_gencnt = 2119756, >> inp_lle = 0x0, >> inp_lock = { >> lock_object = { >> lo_name = 0x814e6940 "tcpinp", >> lo_flags = 90898432, >> lo_data = 0, >> lo_witness = 0x0 >> }, >> rw_lock = 18446735277871559936 >> }, >> inp_rt_cookie = 10, >> inp_rtu = { >> inpu_route = { >> ro_rt = 0x0, >> ro_lle = 0x0, >> ro_prepend = 0x0, >> ro_plen = 0, >> ro_flags = 384, >> ro_mtu = 0, >> spare = 0, >> ro_dst = { >> sa_len = 16 '\020', >> sa_family = 2 '\002', >> sa_data = 0xf8031c5708f2 "" >> } >> }, >> inpu_route6 = { >> ro_rt = 0x0, >> ro_lle = 0x0, >> ro_prepend = 0x0, >> ro_plen = 0, >> ro_flags = 384, >> ro_mtu = 0, >> spare = 0, >> ro_dst = { >> sin6_len = 16 '\020', >> sin6_family = 2 '\002', >> sin6_port = 0, >> sin6_flowinfo = 3011802202, >> sin6_addr = { >> __u6_addr = { >> __u6_addr8 = 0xf8031c5708f8 "", >> __u6_addr16 = 0xf8031c5708f8, >> __u6_addr32 = 0xf8031c5708f8 >> } >> }, >> sin6_scope_id = 0 >> } >> } >> } >> } >> (kgdb) > > Hi, > > Here is the conclusion: > > The following code is going in an infinite loop: > > >>for (;;) { >>TW_RLOCK(V_tw_lock); >>tw = TAILQ_FIRST(_twq_2msl); >>if (tw == NULL || (!reuse && (tw->tw_time - ticks) > 0)) { >>TW_RUNLOCK(V_tw_lock); >>break; >>} >>KASSERT(tw->tw_inpcb != NULL, ("%s: tw->tw_inpcb == NULL", >>__func__)); >>inp = tw->tw_inpcb; >>in_pcbref(inp); >>TW_RUNLOCK(V_tw_lock); >>if (INP_INFO_TRY_RLOCK(_tcbinfo)) { >>INP_WLOCK(inp); >>tw = intotw(inp); >>if (in_pcbrele_wlocked(inp)) { > > in_pcbrele_wlocked() returns (1) because INP_FREED (16) is set in >
Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)
On 08/08/17 10:06, Ben RUBSON wrote: On 08 Aug 2017, at 10:02, Hans Petter Selaskywrote: On 08/08/17 10:00, Ben RUBSON wrote: kgdb) print *twq_2msl.tqh_first $2 = { tw_inpcb = 0xf8031c570740, print *twq_2msl.tqh_first->tw_inpcb (kgdb) print *twq_2msl.tqh_first->tw_inpcb $3 = { inp_hash = { le_next = 0x0, le_prev = 0xfe000f78adb8 }, inp_pcbgrouphash = { le_next = 0x0, le_prev = 0x0 }, inp_list = { le_next = 0xf80c2a07f570, le_prev = 0x81e15e20 }, inp_ppcb = 0xf80d1bf12210, inp_pcbinfo = 0x81e15e28, inp_pcbgroup = 0x0, inp_pcbgroup_wild = { le_next = 0x0, le_prev = 0x0 }, inp_socket = 0x0, inp_cred = 0xf804ae6ca400, inp_flow = 0, inp_flags = 92274688, inp_flags2 = 16, inp_vflag = 0 '\0', inp_ip_ttl = 64 '@', inp_ip_p = 0 '\0', inp_ip_minttl = 0 '\0', inp_flowid = 946611505, inp_refcount = 2, inp_pspare = 0xf8031c5707c0, inp_flowtype = 191, inp_rss_listen_bucket = 0, inp_ispare = 0xf8031c5707f0, inp_inc = { inc_flags = 0 '\0', inc_len = 0 '\0', inc_fibnum = 0, inc_ie = { ie_fport = 53987, ie_lport = 47873, ie_dependfaddr = { ie46_foreign = { ia46_pad32 = 0xf8031c570808, ia46_addr4 = { s_addr = 3011802202 } }, ie6_foreign = { __u6_addr = { __u6_addr8 = 0xf8031c570808 "", __u6_addr16 = 0xf8031c570808, __u6_addr32 = 0xf8031c570808 } } }, ie_dependladdr = { ie46_local = { ia46_pad32 = 0xf8031c570818, ia46_addr4 = { s_addr = 4068705883 } }, ie6_local = { __u6_addr = { __u6_addr8 = 0xf8031c570818 "", __u6_addr16 = 0xf8031c570818, __u6_addr32 = 0xf8031c570818 } } }, ie6_zoneid = 0 } }, inp_label = 0x0, inp_sp = 0x0, inp_depend4 = { inp4_ip_tos = 0 '\0', inp4_options = 0x0, inp4_moptions = 0x0 }, inp_depend6 = { inp6_options = 0x0, inp6_outputopts = 0x0, inp6_moptions = 0x0, inp6_icmp6filt = 0x0, inp6_cksum = 0, inp6_hops = 0 }, inp_portlist = { le_next = 0xf80274298ae0, le_prev = 0xf800454999b0 }, inp_phd = 0xf800454999a0, inp_gencnt = 2119756, inp_lle = 0x0, inp_lock = { lock_object = { lo_name = 0x814e6940 "tcpinp", lo_flags = 90898432, lo_data = 0, lo_witness = 0x0 }, rw_lock = 18446735277871559936 }, inp_rt_cookie = 10, inp_rtu = { inpu_route = { ro_rt = 0x0, ro_lle = 0x0, ro_prepend = 0x0, ro_plen = 0, ro_flags = 384, ro_mtu = 0, spare = 0, ro_dst = { sa_len = 16 '\020', sa_family = 2 '\002', sa_data = 0xf8031c5708f2 "" } }, inpu_route6 = { ro_rt = 0x0, ro_lle = 0x0, ro_prepend = 0x0, ro_plen = 0, ro_flags = 384, ro_mtu = 0, spare = 0, ro_dst = { sin6_len = 16 '\020', sin6_family = 2 '\002', sin6_port = 0, sin6_flowinfo = 3011802202, sin6_addr = { __u6_addr = { __u6_addr8 = 0xf8031c5708f8 "", __u6_addr16 = 0xf8031c5708f8, __u6_addr32 = 0xf8031c5708f8 } }, sin6_scope_id = 0 } } } } (kgdb) Hi, Here is the conclusion: The following code is going in an infinite loop: for (;;) { TW_RLOCK(V_tw_lock); tw = TAILQ_FIRST(_twq_2msl); if (tw == NULL || (!reuse && (tw->tw_time - ticks) > 0)) { TW_RUNLOCK(V_tw_lock); break; } KASSERT(tw->tw_inpcb != NULL, ("%s: tw->tw_inpcb == NULL", __func__)); inp = tw->tw_inpcb; in_pcbref(inp); TW_RUNLOCK(V_tw_lock); if (INP_INFO_TRY_RLOCK(_tcbinfo)) { INP_WLOCK(inp); tw = intotw(inp); if (in_pcbrele_wlocked(inp)) { in_pcbrele_wlocked() returns (1) because INP_FREED (16) is set in inp->inp_flags2. I guess you have invariants disabled, because the KASSERT() below should have caused a panic. KASSERT(tw == NULL, ("%s: held last inp " "reference but tw not NULL", __func__)); INP_INFO_RUNLOCK(_tcbinfo); continue; } This is a regression issue after: commit