Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-09-07 Thread Julien Charbon

 Hi Ben,

On 8/31/17 12:04 PM, Ben RUBSON wrote:
>> On 28 Aug 2017, at 11:27, Julien Charbon  wrote:
>>
>> On 8/28/17 10:25 AM, Ben RUBSON wrote:
 On 16 Aug 2017, at 11:02, Ben RUBSON  wrote:

> On 15 Aug 2017, at 23:33, Julien Charbon  wrote:
>
> On 8/11/17 11:32 AM, Ben RUBSON wrote:
>>> On 08 Aug 2017, at 13:33, Julien Charbon  wrote:
>>>
>>> On 8/8/17 10:31 AM, Hans Petter Selasky wrote:

 Suggested fix attached.
>>>
>>> I agree we your conclusion.  Just for the record, more precisely this
>>> regression seems to have been introduced with:
>>> (...)
>>> Thus good catch, and your patch looks good.  I am going to just verify
>>> the other in_pcbrele_wlocked() calls in TCP stack.
>>
>> Julien, do you plan to make this fix reach 11.0-p12 ?
>
> I am checking if your issue is another flavor of the issue fixed by:
>
> https://svnweb.freebsd.org/base?view=revision=307551
> https://reviews.freebsd.org/D8211
>
> This fix in not in 11.0 but in 11.1.  Currently I did not found how an
> inp in INP_TIMEWAIT state can have been INP_FREED without having its tw
> set to NULL already except the issue fixed by r307551.
>
> Thus could you try to apply this patch:
>
> https://github.com/freebsd/freebsd/commit/acb5bfda99b753d9ead3529d04f20087c5f7d0a0.patch
>
> and see if you can still reproduce this issue?

 Thank you for your answer Julien.
 Unfortunately, I'm not sure at all how to reproduce the issue.
 I have other servers which are 100% identical to this one, same workload,
 same some-months uptime, but they did not trigger the bug yet.

 If other network stack experts (I'm not) agree with your analysis,
 we could then certainly go further with D8211 / r307551.

 One thing that perhaps might help :
 # netstat -an | grep TIME_WAIT$ | wc -l
 468

 Note that due to this running bug, sendmail has lots of difficulties to 
 send outgoing mails.
 As soon as I run the above netstat command, I receive a lot of stacked 
 mails (more than 20 this time).
 As if netstat was able to somehow help...

 Number of TIME_WAIT connections however does not decrease, but increases.

> And in the spirit of r307551 fix and based on Hans patch I will also
> propose to add a kernel log describing the issue instead of starting an
> infinite loop when INVARIANT is not set.

 Which should then never be triggered :)
 Good idea I think !
>>>
>>> What about :
>>> D8211/r307551
>>> + Hans' patch
>>> + Julien's idea of a kernel log (sort of "We should not be here but we are")
>>
>> I did this change and I am testing it
> 
> Good news !
> 
>> on your side did you try this patch applied on 11.0?
>>
>> https://github.com/freebsd/freebsd/commit/acb5bfda99b753d9ead3529d04f20087c5f7d0a0.patch
> 
> Yes, patch applied and running correctly,
> however hard to say whether or not it solves this issue,
> as there is no easy way to reproduce it.

 No problem, it is just a matter of not seeing the issue anymore during
a long enough period.

I created a review that includes Hans's patch and uses the same
log(LOG_ERR) logic than r307551:
https://reviews.freebsd.org/D12267

 On my side, TCP smoke tests are ok.  And I am going to launch our TCP
QA on it while receiving review comments.

> Mail sent to FreeBSD Security Team !
> 
> Many thanks, let's stay tuned !

 Thanks to you and Hans for reporting that issue.  And in summary:

 - Applying r307551 on top of 11.0 should prevent this case to happen
 - D12267 will prevent the tcp_tw_2msl_scan() infinite loop while
reporting the error, in case a regression defeating r307551 is introduced

 Thanks.

--
Julien
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-31 Thread Ben RUBSON
> On 28 Aug 2017, at 11:27, Julien Charbon  wrote:
> 
> On 8/28/17 10:25 AM, Ben RUBSON wrote:
>>> On 16 Aug 2017, at 11:02, Ben RUBSON  wrote:
>>> 
 On 15 Aug 2017, at 23:33, Julien Charbon  wrote:
 
 On 8/11/17 11:32 AM, Ben RUBSON wrote:
>> On 08 Aug 2017, at 13:33, Julien Charbon  wrote:
>> 
>> On 8/8/17 10:31 AM, Hans Petter Selasky wrote:
>>> 
>>> Suggested fix attached.
>> 
>> I agree we your conclusion.  Just for the record, more precisely this
>> regression seems to have been introduced with:
>> (...)
>> Thus good catch, and your patch looks good.  I am going to just verify
>> the other in_pcbrele_wlocked() calls in TCP stack.
> 
> Julien, do you plan to make this fix reach 11.0-p12 ?
 
 I am checking if your issue is another flavor of the issue fixed by:
 
 https://svnweb.freebsd.org/base?view=revision=307551
 https://reviews.freebsd.org/D8211
 
 This fix in not in 11.0 but in 11.1.  Currently I did not found how an
 inp in INP_TIMEWAIT state can have been INP_FREED without having its tw
 set to NULL already except the issue fixed by r307551.
 
 Thus could you try to apply this patch:
 
 https://github.com/freebsd/freebsd/commit/acb5bfda99b753d9ead3529d04f20087c5f7d0a0.patch
 
 and see if you can still reproduce this issue?
>>> 
>>> Thank you for your answer Julien.
>>> Unfortunately, I'm not sure at all how to reproduce the issue.
>>> I have other servers which are 100% identical to this one, same workload,
>>> same some-months uptime, but they did not trigger the bug yet.
>>> 
>>> If other network stack experts (I'm not) agree with your analysis,
>>> we could then certainly go further with D8211 / r307551.
>>> 
>>> One thing that perhaps might help :
>>> # netstat -an | grep TIME_WAIT$ | wc -l
>>> 468
>>> 
>>> Note that due to this running bug, sendmail has lots of difficulties to 
>>> send outgoing mails.
>>> As soon as I run the above netstat command, I receive a lot of stacked 
>>> mails (more than 20 this time).
>>> As if netstat was able to somehow help...
>>> 
>>> Number of TIME_WAIT connections however does not decrease, but increases.
>>> 
 And in the spirit of r307551 fix and based on Hans patch I will also
 propose to add a kernel log describing the issue instead of starting an
 infinite loop when INVARIANT is not set.
>>> 
>>> Which should then never be triggered :)
>>> Good idea I think !
>> 
>> What about :
>> D8211/r307551
>> + Hans' patch
>> + Julien's idea of a kernel log (sort of "We should not be here but we are")
> 
> I did this change and I am testing it

Good news !

> on your side did you try this patch applied on 11.0?
> 
> https://github.com/freebsd/freebsd/commit/acb5bfda99b753d9ead3529d04f20087c5f7d0a0.patch

Yes, patch applied and running correctly,
however hard to say whether or not it solves this issue,
as there is no easy way to reproduce it.

>> And backporting all this to 11.0 (and so to 11.1 too) ?
>> 
>> As this bug can impact every FreeBSD machine / server,
>> leading to an unavailable / unreachable system (this is how mine ended),
>> sounds like it could inevitably be a good thing, for production stability 
>> purpose.
> 
> The main fix for your issue is (I believe):
> 
> Fix a double-free when an inp transitions to INP_TIMEWAIT state
> after having been dropped.
> https://svnweb.freebsd.org/base?view=revision=307551
> 
> This fix has been MFC-ed on both stable/11, stable/10 and is already
> included in 11.1 and will be in 10.4.  To push in 11.0 release directly,
> I guess you have to promote this change to an Errata (never did that
> myself):
> 
> https://www.freebsd.org/security/notices.html
> https://www.freebsd.org/security/security.html#reporting

Mail sent to FreeBSD Security Team !

Many thanks, let's stay tuned !

Ben

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-28 Thread Julien Charbon

 Hi Ben,

On 8/28/17 10:25 AM, Ben RUBSON wrote:
>> On 16 Aug 2017, at 11:02, Ben RUBSON  wrote:
>>
>>> On 15 Aug 2017, at 23:33, Julien Charbon  wrote:
>>>
>>> On 8/11/17 11:32 AM, Ben RUBSON wrote:
> On 08 Aug 2017, at 13:33, Julien Charbon  wrote:
>
> On 8/8/17 10:31 AM, Hans Petter Selasky wrote:
>>
>> Suggested fix attached.
>
> I agree we your conclusion.  Just for the record, more precisely this
> regression seems to have been introduced with:
> (...)
> Thus good catch, and your patch looks good.  I am going to just verify
> the other in_pcbrele_wlocked() calls in TCP stack.

 Julien, do you plan to make this fix reach 11.0-p12 ?
>>>
>>> I am checking if your issue is another flavor of the issue fixed by:
>>>
>>> https://svnweb.freebsd.org/base?view=revision=307551
>>> https://reviews.freebsd.org/D8211
>>>
>>> This fix in not in 11.0 but in 11.1.  Currently I did not found how an
>>> inp in INP_TIMEWAIT state can have been INP_FREED without having its tw
>>> set to NULL already except the issue fixed by r307551.
>>>
>>> Thus could you try to apply this patch:
>>>
>>> https://github.com/freebsd/freebsd/commit/acb5bfda99b753d9ead3529d04f20087c5f7d0a0.patch
>>>
>>> and see if you can still reproduce this issue?
>>
>> Thank you for your answer Julien.
>> Unfortunately, I'm not sure at all how to reproduce the issue.
>> I have other servers which are 100% identical to this one, same workload,
>> same some-months uptime, but they did not trigger the bug yet.
>>
>> If other network stack experts (I'm not) agree with your analysis,
>> we could then certainly go further with D8211 / r307551.
>>
>> One thing that perhaps might help :
>> # netstat -an | grep TIME_WAIT$ | wc -l
>> 468
>>
>> Note that due to this running bug, sendmail has lots of difficulties to send 
>> outgoing mails.
>> As soon as I run the above netstat command, I receive a lot of stacked mails 
>> (more than 20 this time).
>> As if netstat was able to somehow help...
>>
>> Number of TIME_WAIT connections however does not decrease, but increases.
>>
>>> And in the spirit of r307551 fix and based on Hans patch I will also
>>> propose to add a kernel log describing the issue instead of starting an
>>> infinite loop when INVARIANT is not set.
>>
>> Which should then never be triggered :)
>> Good idea I think !
> 
> What about :
> D8211/r307551
> + Hans' patch
> + Julien's idea of a kernel log (sort of "We should not be here but we are")

 I did this change and I am testing it, on your side did you try this
patch applied on 11.0?

https://github.com/freebsd/freebsd/commit/acb5bfda99b753d9ead3529d04f20087c5f7d0a0.patch

> And backporting all this to 11.0 (and so to 11.1 too) ?
>
> As this bug can impact every FreeBSD machine / server,
> leading to an unavailable / unreachable system (this is how mine ended),
> sounds like it could inevitably be a good thing, for production stability 
> purpose.

 The main fix for your issue is (I believe):

Fix a double-free when an inp transitions to INP_TIMEWAIT state
after having been dropped.
https://svnweb.freebsd.org/base?view=revision=307551

 This fix has been MFC-ed on both stable/11, stable/10 and is already
included in 11.1 and will be in 10.4.  To push in 11.0 release directly,
I guess you have to promote this change to an Errata (never did that
myself):

https://www.freebsd.org/security/notices.html
https://www.freebsd.org/security/security.html#reporting

 While waiting for this errata to be accepted better using the patch.

 My 2 cents.

--
Julien
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-28 Thread Ben RUBSON
> On 16 Aug 2017, at 11:02, Ben RUBSON  wrote:
> 
>> On 15 Aug 2017, at 23:33, Julien Charbon  wrote:
>> 
>> On 8/11/17 11:32 AM, Ben RUBSON wrote:
 On 08 Aug 2017, at 13:33, Julien Charbon  wrote:
 
 On 8/8/17 10:31 AM, Hans Petter Selasky wrote:
> 
> Suggested fix attached.
 
 I agree we your conclusion.  Just for the record, more precisely this
 regression seems to have been introduced with:
 (...)
 Thus good catch, and your patch looks good.  I am going to just verify
 the other in_pcbrele_wlocked() calls in TCP stack.
>>> 
>>> Julien, do you plan to make this fix reach 11.0-p12 ?
>> 
>> I am checking if your issue is another flavor of the issue fixed by:
>> 
>> https://svnweb.freebsd.org/base?view=revision=307551
>> https://reviews.freebsd.org/D8211
>> 
>> This fix in not in 11.0 but in 11.1.  Currently I did not found how an
>> inp in INP_TIMEWAIT state can have been INP_FREED without having its tw
>> set to NULL already except the issue fixed by r307551.
>> 
>> Thus could you try to apply this patch:
>> 
>> https://github.com/freebsd/freebsd/commit/acb5bfda99b753d9ead3529d04f20087c5f7d0a0.patch
>> 
>> and see if you can still reproduce this issue?
> 
> Thank you for your answer Julien.
> Unfortunately, I'm not sure at all how to reproduce the issue.
> I have other servers which are 100% identical to this one, same workload,
> same some-months uptime, but they did not trigger the bug yet.
> 
> If other network stack experts (I'm not) agree with your analysis,
> we could then certainly go further with D8211 / r307551.
> 
> One thing that perhaps might help :
> # netstat -an | grep TIME_WAIT$ | wc -l
> 468
> 
> Note that due to this running bug, sendmail has lots of difficulties to send 
> outgoing mails.
> As soon as I run the above netstat command, I receive a lot of stacked mails 
> (more than 20 this time).
> As if netstat was able to somehow help...
> 
> Number of TIME_WAIT connections however does not decrease, but increases.
> 
>> And in the spirit of r307551 fix and based on Hans patch I will also
>> propose to add a kernel log describing the issue instead of starting an
>> infinite loop when INVARIANT is not set.
> 
> Which should then never be triggered :)
> Good idea I think !

What about :
D8211/r307551
+ Hans' patch
+ Julien's idea of a kernel log (sort of "We should not be here but we are")

And backporting all this to 11.0 (and so to 11.1 too) ?

As this bug can impact every FreeBSD machine / server,
leading to an unavailable / unreachable system (this is how mine ended),
sounds like it could inevitably be a good thing, for production stability 
purpose.

Thank you very much !

Ben

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-16 Thread Ben RUBSON
> On 15 Aug 2017, at 23:33, Julien Charbon  wrote:
> 
> On 8/11/17 11:32 AM, Ben RUBSON wrote:
>>> On 08 Aug 2017, at 13:33, Julien Charbon  wrote:
>>> 
>>> On 8/8/17 10:31 AM, Hans Petter Selasky wrote:
 
 Suggested fix attached.
>>> 
>>> I agree we your conclusion.  Just for the record, more precisely this
>>> regression seems to have been introduced with:
>>> (...)
>>> Thus good catch, and your patch looks good.  I am going to just verify
>>> the other in_pcbrele_wlocked() calls in TCP stack.
>> 
>> Julien, do you plan to make this fix reach 11.0-p12 ?
> 
> I am checking if your issue is another flavor of the issue fixed by:
> 
> https://svnweb.freebsd.org/base?view=revision=307551
> https://reviews.freebsd.org/D8211
> 
> This fix in not in 11.0 but in 11.1.  Currently I did not found how an
> inp in INP_TIMEWAIT state can have been INP_FREED without having its tw
> set to NULL already except the issue fixed by r307551.
> 
> Thus could you try to apply this patch:
> 
> https://github.com/freebsd/freebsd/commit/acb5bfda99b753d9ead3529d04f20087c5f7d0a0.patch
> 
> and see if you can still reproduce this issue?

Thank you for your answer Julien.
Unfortunately, I'm not sure at all how to reproduce the issue.
I have other servers which are 100% identical to this one, same workload,
same some-months uptime, but they did not trigger the bug yet.

If other network stack experts (I'm not) agree with your analysis,
we could then certainly go further with D8211 / r307551.

One thing that perhaps might help :
# netstat -an | grep TIME_WAIT$ | wc -l
468

Note that due to this running bug, sendmail has lots of difficulties to send 
outgoing mails.
As soon as I run the above netstat command, I receive a lot of stacked mails 
(more than 20 this time).
As if netstat was able to somehow help...

Number of TIME_WAIT connections however does not decrease, but increases.

> And in the spirit of r307551 fix and based on Hans patch I will also
> propose to add a kernel log describing the issue instead of starting an
> infinite loop when INVARIANT is not set.

Which should then never be triggered :)
Good idea I think !

Thank you again !

Ben

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-15 Thread Julien Charbon

 Hi Ben,

On 8/11/17 11:32 AM, Ben RUBSON wrote:
>> On 08 Aug 2017, at 13:33, Julien Charbon  wrote:
>>
>> On 8/8/17 10:31 AM, Hans Petter Selasky wrote:
>>>
>>> Suggested fix attached.
>>
>> I agree we your conclusion.  Just for the record, more precisely this
>> regression seems to have been introduced with:
>> (...)
>> Thus good catch, and your patch looks good.  I am going to just verify
>> the other in_pcbrele_wlocked() calls in TCP stack.
> 
> Julien, do you plan to make this fix reach 11.0-p12 ?

 I am checking if your issue is another flavor of the issue fixed by:

https://svnweb.freebsd.org/base?view=revision=307551
https://reviews.freebsd.org/D8211

 This fix in not in 11.0 but in 11.1.  Currently I did not found how an
inp in INP_TIMEWAIT state can have been INP_FREED without having its tw
set to NULL already except the issue fixed by r307551.

 Thus could you try to apply this patch:

https://github.com/freebsd/freebsd/commit/acb5bfda99b753d9ead3529d04f20087c5f7d0a0.patch

 and see if you can still reproduce this issue?

 And in the spirit of r307551 fix and based on Hans patch I will also
propose to add a kernel log describing the issue instead of starting an
infinite loop when INVARIANT is not set.

--
Julien
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-11 Thread Ben RUBSON

> On 08 Aug 2017, at 13:33, Julien Charbon  wrote:
> 
> Hi,
> 
> On 8/8/17 10:31 AM, Hans Petter Selasky wrote:
>> 
>> 
>> Suggested fix attached.
> 
> I agree we your conclusion.  Just for the record, more precisely this
> regression seems to have been introduced with:
> (...)
> Thus good catch, and your patch looks good.  I am going to just verify
> the other in_pcbrele_wlocked() calls in TCP stack.

Julien, do you plan to make this fix reach 11.0-p12 ?

Many thx !

Ben

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Hans Petter Selasky

On 08/08/17 13:56, Slawa Olhovchenkov wrote:

On Tue, Aug 08, 2017 at 01:49:08PM +0200, Hans Petter Selasky wrote:


On 08/08/17 13:33, Slawa Olhovchenkov wrote:

TW_RUNLOCK(V_tw_lock);
and
if (INP_INFO_TRY_WLOCK(_tcbinfo)) {

`inp` can be invalidated, freed and this pointer may be invalid?


If you look one line up there is a pcbref ??


Yes.
Can different thread take this inp and freed it?
May be timer thread?


No, it cannot be freed while there is a ref.

Some lines down the ref is dropped once the inp pointer is no longer needed.

--HPS
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Julien Charbon

 Hi,

On 8/8/17 10:31 AM, Hans Petter Selasky wrote:
> On 08/08/17 10:06, Ben RUBSON wrote:
>>> On 08 Aug 2017, at 10:02, Hans Petter Selasky  wrote:
>>>
>>> On 08/08/17 10:00, Ben RUBSON wrote:
 kgdb) print *twq_2msl.tqh_first
 $2 = {
tw_inpcb = 0xf8031c570740,
>>>
>>> print *twq_2msl.tqh_first->tw_inpcb
>>
> 
> Here is the conclusion:
> 
> The following code is going in an infinite loop:
> 
> 
>> for (;;) {
>> TW_RLOCK(V_tw_lock);
>> tw = TAILQ_FIRST(_twq_2msl);
>> if (tw == NULL || (!reuse && (tw->tw_time - ticks) >
>> 0)) {
>> TW_RUNLOCK(V_tw_lock);
>> break;
>> }
>> KASSERT(tw->tw_inpcb != NULL, ("%s: tw->tw_inpcb ==
>> NULL",
>> __func__));
>>
>> inp = tw->tw_inpcb;
>> in_pcbref(inp);
>> TW_RUNLOCK(V_tw_lock);
>>
>> if (INP_INFO_TRY_RLOCK(_tcbinfo)) {
>>
>> INP_WLOCK(inp);
>> tw = intotw(inp);
>> if (in_pcbrele_wlocked(inp)) {
> 
> in_pcbrele_wlocked() returns (1) because INP_FREED (16) is set in
> inp->inp_flags2. I guess you have invariants disabled, because the
> KASSERT() below should have caused a panic.
> 
>> KASSERT(tw == NULL, ("%s: held last inp "
>> "reference but tw not NULL",
>> __func__));
>> INP_INFO_RUNLOCK(_tcbinfo);
>> continue;
>> }
> 
> This is a regression issue after:
> 
>> commit 5630210a7f1dbbd903b77b2aef939cd47c63da58
>> Author: jch 
>> Date:   Thu Oct 30 08:53:56 2014 +
>>
>> Fix a race condition in TCP timewait between tcp_tw_2msl_reuse() and
>> tcp_tw_2msl_scan().  This race condition drives unplanned timewait
>> timeout cancellation.  Also simplify implementation by holding inpcb
>> reference and removing tcptw reference counting.
> 
> Suggested fix attached.

 I agree we your conclusion.  Just for the record, more precisely this
regression seems to have been introduced with:

commit b02d40ddcda08b51a49e5667e6808f5dc5ec0472
Author: fabient 
Date:   Wed Nov 25 14:45:43 2015 +

The r241129 description was wrong that the scenario is possible
only for read locks on pcbs. The same race can happen with write
lock semantics as well.

The race scenario:

- Two threads (1 and 2) locate pcb with writer semantics
(INPLOOKUP_WLOCKPCB)
 and do in_pcbref() on it.
- 1 and 2 both drop the inp hash lock.
- Another thread (3) grabs the inp hash lock. Then it runs in_pcbfree(),
 which wlocks the pcb. They must happen faster than 1 or 2 come
INP_WLOCK()!
- 1 and 2 congest in INP_WLOCK().
- 3 does in_pcbremlists(), drops hash lock, and runs
in_pcbrele_wlocked(),
 which doesn't free the pcb due to two references on it.
 Then it unlocks the pcb.
- 1 (or 2) gets wlock on the pcb, runs in_pcbrele_wlocked(), which
doesn't
 report inp as freed, due to 2 (or 1) still helding extra reference
on it.
 The thread tries to do smth with a disconnected pcb and crashes.

Submitted by:   emeric.pou...@stormshield.eu
Reviewed by:gleb@
MFC after:  1 week
Sponsored by: Stormshield
Tested by: Cassiano Peixoto, Stormshield

Notes:
svn path=/head/; revision=291301

 Before this change in_pcbrele_wlocked() returned 1 only on the last
reference which is what tcp_tw_2msl_scan() currently expects.

 Thus good catch, and your patch looks good.  I am going to just verify
the other in_pcbrele_wlocked() calls in TCP stack.

 Thanks.

--
Julien
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Slawa Olhovchenkov
On Tue, Aug 08, 2017 at 01:49:08PM +0200, Hans Petter Selasky wrote:

> On 08/08/17 13:33, Slawa Olhovchenkov wrote:
> > TW_RUNLOCK(V_tw_lock);
> > and
> > if (INP_INFO_TRY_WLOCK(_tcbinfo)) {
> > 
> > `inp` can be invalidated, freed and this pointer may be invalid?
> 
> If you look one line up there is a pcbref ??

Yes.
Can different thread take this inp and freed it?
May be timer thread?
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Hans Petter Selasky

On 08/08/17 13:33, Slawa Olhovchenkov wrote:

TW_RUNLOCK(V_tw_lock);
and
if (INP_INFO_TRY_WLOCK(_tcbinfo)) {

`inp` can be invalidated, freed and this pointer may be invalid?


If you look one line up there is a pcbref ??

--HPS
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Slawa Olhovchenkov
On Tue, Aug 08, 2017 at 10:31:33AM +0200, Hans Petter Selasky wrote:

> Here is the conclusion:
> 
> The following code is going in an infinite loop:
> 
> 
> > for (;;) {
> > TW_RLOCK(V_tw_lock);
> > tw = TAILQ_FIRST(_twq_2msl);
> > if (tw == NULL || (!reuse && (tw->tw_time - ticks) > 0)) {
> > TW_RUNLOCK(V_tw_lock);
> > break;
> > }
> > KASSERT(tw->tw_inpcb != NULL, ("%s: tw->tw_inpcb == NULL",
> > __func__));
> > 
> > inp = tw->tw_inpcb;
> > in_pcbref(inp);
> > TW_RUNLOCK(V_tw_lock);
> > 
> > if (INP_INFO_TRY_RLOCK(_tcbinfo)) {
> > 
> > INP_WLOCK(inp);
> > tw = intotw(inp);
> > if (in_pcbrele_wlocked(inp)) {
> 
> in_pcbrele_wlocked() returns (1) because INP_FREED (16) is set in 
> inp->inp_flags2. I guess you have invariants disabled, because the 
> KASSERT() below should have caused a panic.
> 
> > KASSERT(tw == NULL, ("%s: held last inp "
> > "reference but tw not NULL", __func__));
> > INP_INFO_RUNLOCK(_tcbinfo);
> > continue;
> > }
> 
> This is a regression issue after:
> 
> > commit 5630210a7f1dbbd903b77b2aef939cd47c63da58
> > Author: jch 
> > Date:   Thu Oct 30 08:53:56 2014 +
> > 
> > Fix a race condition in TCP timewait between tcp_tw_2msl_reuse() and
> > tcp_tw_2msl_scan().  This race condition drives unplanned timewait
> > timeout cancellation.  Also simplify implementation by holding inpcb
> > reference and removing tcptw reference counting.
> 
> Suggested fix attached.

Hmm, I am not sure, IMHO between

TW_RUNLOCK(V_tw_lock);
and
if (INP_INFO_TRY_WLOCK(_tcbinfo)) {

`inp` can be invalidated, freed and this pointer may be invalid?


> Index: sys/netinet/tcp_timewait.c
> ===
> --- sys/netinet/tcp_timewait.c(revision 321981)
> +++ sys/netinet/tcp_timewait.c(working copy)
> @@ -709,10 +709,11 @@
>   INP_WLOCK(inp);
>   tw = intotw(inp);
>   if (in_pcbrele_wlocked(inp)) {
> - KASSERT(tw == NULL, ("%s: held last inp "
> - "reference but tw not NULL", __func__));
>   INP_INFO_RUNLOCK(_tcbinfo);
> - continue;
> + if (tw == NULL)
> + continue;
> + else
> + break;  /* INP_FREED flag is set */
>   }
>  
>   if (tw == NULL) {

> ___
> freebsd-...@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Ben RUBSON

> On 08 Aug 2017, at 10:31, Hans Petter Selasky  wrote:
> 
> On 08/08/17 10:06, Ben RUBSON wrote:
>>> On 08 Aug 2017, at 10:02, Hans Petter Selasky  wrote:
>>> 
>>> On 08/08/17 10:00, Ben RUBSON wrote:
 kgdb) print *twq_2msl.tqh_first
 $2 = {
   tw_inpcb = 0xf8031c570740,
>>> 
>>> print *twq_2msl.tqh_first->tw_inpcb
>> (kgdb) print *twq_2msl.tqh_first->tw_inpcb
>> $3 = {
>>   inp_hash = {
>> le_next = 0x0,
>> le_prev = 0xfe000f78adb8
>>   },
>>   inp_pcbgrouphash = {
>> le_next = 0x0,
>> le_prev = 0x0
>>   },
>>   inp_list = {
>> le_next = 0xf80c2a07f570,
>> le_prev = 0x81e15e20
>>   },
>>   inp_ppcb = 0xf80d1bf12210,
>>   inp_pcbinfo = 0x81e15e28,
>>   inp_pcbgroup = 0x0,
>>   inp_pcbgroup_wild = {
>> le_next = 0x0,
>> le_prev = 0x0
>>   },
>>   inp_socket = 0x0,
>>   inp_cred = 0xf804ae6ca400,
>>   inp_flow = 0,
>>   inp_flags = 92274688,
>>   inp_flags2 = 16,
>>   inp_vflag = 0 '\0',
>>   inp_ip_ttl = 64 '@',
>>   inp_ip_p = 0 '\0',
>>   inp_ip_minttl = 0 '\0',
>>   inp_flowid = 946611505,
>>   inp_refcount = 2,
>>   inp_pspare = 0xf8031c5707c0,
>>   inp_flowtype = 191,
>>   inp_rss_listen_bucket = 0,
>>   inp_ispare = 0xf8031c5707f0,
>>   inp_inc = {
>> inc_flags = 0 '\0',
>> inc_len = 0 '\0',
>> inc_fibnum = 0,
>> inc_ie = {
>>   ie_fport = 53987,
>>   ie_lport = 47873,
>>   ie_dependfaddr = {
>> ie46_foreign = {
>>   ia46_pad32 = 0xf8031c570808,
>>   ia46_addr4 = {
>> s_addr = 3011802202
>>   }
>> },
>> ie6_foreign = {
>>   __u6_addr = {
>> __u6_addr8 = 0xf8031c570808 "",
>> __u6_addr16 = 0xf8031c570808,
>> __u6_addr32 = 0xf8031c570808
>>   }
>> }
>>   },
>>   ie_dependladdr = {
>> ie46_local = {
>>   ia46_pad32 = 0xf8031c570818,
>>   ia46_addr4 = {
>> s_addr = 4068705883
>>   }
>> },
>> ie6_local = {
>>   __u6_addr = {
>> __u6_addr8 = 0xf8031c570818 "",
>> __u6_addr16 = 0xf8031c570818,
>> __u6_addr32 = 0xf8031c570818
>>   }
>> }
>>   },
>>   ie6_zoneid = 0
>> }
>>   },
>>   inp_label = 0x0,
>>   inp_sp = 0x0,
>>   inp_depend4 = {
>> inp4_ip_tos = 0 '\0',
>> inp4_options = 0x0,
>> inp4_moptions = 0x0
>>   },
>>   inp_depend6 = {
>> inp6_options = 0x0,
>> inp6_outputopts = 0x0,
>> inp6_moptions = 0x0,
>> inp6_icmp6filt = 0x0,
>> inp6_cksum = 0,
>> inp6_hops = 0
>>   },
>>   inp_portlist = {
>> le_next = 0xf80274298ae0,
>> le_prev = 0xf800454999b0
>>   },
>>   inp_phd = 0xf800454999a0,
>>   inp_gencnt = 2119756,
>>   inp_lle = 0x0,
>>   inp_lock = {
>> lock_object = {
>>   lo_name = 0x814e6940 "tcpinp",
>>   lo_flags = 90898432,
>>   lo_data = 0,
>>   lo_witness = 0x0
>> },
>> rw_lock = 18446735277871559936
>>   },
>>   inp_rt_cookie = 10,
>>   inp_rtu = {
>> inpu_route = {
>>   ro_rt = 0x0,
>>   ro_lle = 0x0,
>>   ro_prepend = 0x0,
>>   ro_plen = 0,
>>   ro_flags = 384,
>>   ro_mtu = 0,
>>   spare = 0,
>>   ro_dst = {
>> sa_len = 16 '\020',
>> sa_family = 2 '\002',
>> sa_data = 0xf8031c5708f2 ""
>>   }
>> },
>> inpu_route6 = {
>>   ro_rt = 0x0,
>>   ro_lle = 0x0,
>>   ro_prepend = 0x0,
>>   ro_plen = 0,
>>   ro_flags = 384,
>>   ro_mtu = 0,
>>   spare = 0,
>>   ro_dst = {
>> sin6_len = 16 '\020',
>> sin6_family = 2 '\002',
>> sin6_port = 0,
>> sin6_flowinfo = 3011802202,
>> sin6_addr = {
>>   __u6_addr = {
>> __u6_addr8 = 0xf8031c5708f8 "",
>> __u6_addr16 = 0xf8031c5708f8,
>> __u6_addr32 = 0xf8031c5708f8
>>   }
>> },
>> sin6_scope_id = 0
>>   }
>> }
>>   }
>> }
>> (kgdb)
> 
> Hi,
> 
> Here is the conclusion:
> 
> The following code is going in an infinite loop:
> 
> 
>>for (;;) {
>>TW_RLOCK(V_tw_lock);
>>tw = TAILQ_FIRST(_twq_2msl);
>>if (tw == NULL || (!reuse && (tw->tw_time - ticks) > 0)) {
>>TW_RUNLOCK(V_tw_lock);
>>break;
>>}
>>KASSERT(tw->tw_inpcb != NULL, ("%s: tw->tw_inpcb == NULL",
>>__func__));
>>inp = tw->tw_inpcb;
>>in_pcbref(inp);
>>TW_RUNLOCK(V_tw_lock);
>>if (INP_INFO_TRY_RLOCK(_tcbinfo)) {
>>INP_WLOCK(inp);
>>tw = intotw(inp);
>>if (in_pcbrele_wlocked(inp)) {
> 
> in_pcbrele_wlocked() returns (1) because INP_FREED (16) is set in 
> 

Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)

2017-08-08 Thread Hans Petter Selasky

On 08/08/17 10:06, Ben RUBSON wrote:

On 08 Aug 2017, at 10:02, Hans Petter Selasky  wrote:

On 08/08/17 10:00, Ben RUBSON wrote:

kgdb) print *twq_2msl.tqh_first
$2 = {
   tw_inpcb = 0xf8031c570740,


print *twq_2msl.tqh_first->tw_inpcb


(kgdb) print *twq_2msl.tqh_first->tw_inpcb
$3 = {
   inp_hash = {
 le_next = 0x0,
 le_prev = 0xfe000f78adb8
   },
   inp_pcbgrouphash = {
 le_next = 0x0,
 le_prev = 0x0
   },
   inp_list = {
 le_next = 0xf80c2a07f570,
 le_prev = 0x81e15e20
   },
   inp_ppcb = 0xf80d1bf12210,
   inp_pcbinfo = 0x81e15e28,
   inp_pcbgroup = 0x0,
   inp_pcbgroup_wild = {
 le_next = 0x0,
 le_prev = 0x0
   },
   inp_socket = 0x0,
   inp_cred = 0xf804ae6ca400,
   inp_flow = 0,
   inp_flags = 92274688,
   inp_flags2 = 16,
   inp_vflag = 0 '\0',
   inp_ip_ttl = 64 '@',
   inp_ip_p = 0 '\0',
   inp_ip_minttl = 0 '\0',
   inp_flowid = 946611505,
   inp_refcount = 2,
   inp_pspare = 0xf8031c5707c0,
   inp_flowtype = 191,
   inp_rss_listen_bucket = 0,
   inp_ispare = 0xf8031c5707f0,
   inp_inc = {
 inc_flags = 0 '\0',
 inc_len = 0 '\0',
 inc_fibnum = 0,
 inc_ie = {
   ie_fport = 53987,
   ie_lport = 47873,
   ie_dependfaddr = {
 ie46_foreign = {
   ia46_pad32 = 0xf8031c570808,
   ia46_addr4 = {
 s_addr = 3011802202
   }
 },
 ie6_foreign = {
   __u6_addr = {
 __u6_addr8 = 0xf8031c570808 "",
 __u6_addr16 = 0xf8031c570808,
 __u6_addr32 = 0xf8031c570808
   }
 }
   },
   ie_dependladdr = {
 ie46_local = {
   ia46_pad32 = 0xf8031c570818,
   ia46_addr4 = {
 s_addr = 4068705883
   }
 },
 ie6_local = {
   __u6_addr = {
 __u6_addr8 = 0xf8031c570818 "",
 __u6_addr16 = 0xf8031c570818,
 __u6_addr32 = 0xf8031c570818
   }
 }
   },
   ie6_zoneid = 0
 }
   },
   inp_label = 0x0,
   inp_sp = 0x0,
   inp_depend4 = {
 inp4_ip_tos = 0 '\0',
 inp4_options = 0x0,
 inp4_moptions = 0x0
   },
   inp_depend6 = {
 inp6_options = 0x0,
 inp6_outputopts = 0x0,
 inp6_moptions = 0x0,
 inp6_icmp6filt = 0x0,
 inp6_cksum = 0,
 inp6_hops = 0
   },
   inp_portlist = {
 le_next = 0xf80274298ae0,
 le_prev = 0xf800454999b0
   },
   inp_phd = 0xf800454999a0,
   inp_gencnt = 2119756,
   inp_lle = 0x0,
   inp_lock = {
 lock_object = {
   lo_name = 0x814e6940 "tcpinp",
   lo_flags = 90898432,
   lo_data = 0,
   lo_witness = 0x0
 },
 rw_lock = 18446735277871559936
   },
   inp_rt_cookie = 10,
   inp_rtu = {
 inpu_route = {
   ro_rt = 0x0,
   ro_lle = 0x0,
   ro_prepend = 0x0,
   ro_plen = 0,
   ro_flags = 384,
   ro_mtu = 0,
   spare = 0,
   ro_dst = {
 sa_len = 16 '\020',
 sa_family = 2 '\002',
 sa_data = 0xf8031c5708f2 ""
   }
 },
 inpu_route6 = {
   ro_rt = 0x0,
   ro_lle = 0x0,
   ro_prepend = 0x0,
   ro_plen = 0,
   ro_flags = 384,
   ro_mtu = 0,
   spare = 0,
   ro_dst = {
 sin6_len = 16 '\020',
 sin6_family = 2 '\002',
 sin6_port = 0,
 sin6_flowinfo = 3011802202,
 sin6_addr = {
   __u6_addr = {
 __u6_addr8 = 0xf8031c5708f8 "",
 __u6_addr16 = 0xf8031c5708f8,
 __u6_addr32 = 0xf8031c5708f8
   }
 },
 sin6_scope_id = 0
   }
 }
   }
}
(kgdb)



Hi,

Here is the conclusion:

The following code is going in an infinite loop:



for (;;) {
TW_RLOCK(V_tw_lock);
tw = TAILQ_FIRST(_twq_2msl);
if (tw == NULL || (!reuse && (tw->tw_time - ticks) > 0)) {
TW_RUNLOCK(V_tw_lock);
break;
}
KASSERT(tw->tw_inpcb != NULL, ("%s: tw->tw_inpcb == NULL",
__func__));

inp = tw->tw_inpcb;
in_pcbref(inp);
TW_RUNLOCK(V_tw_lock);

if (INP_INFO_TRY_RLOCK(_tcbinfo)) {

INP_WLOCK(inp);
tw = intotw(inp);
if (in_pcbrele_wlocked(inp)) {


in_pcbrele_wlocked() returns (1) because INP_FREED (16) is set in 
inp->inp_flags2. I guess you have invariants disabled, because the 
KASSERT() below should have caused a panic.



KASSERT(tw == NULL, ("%s: held last inp "
"reference but tw not NULL", __func__));
INP_INFO_RUNLOCK(_tcbinfo);
continue;
}


This is a regression issue after:


commit