Re: svn commit: r303037 - head/sys/kern

2016-07-20 Thread Hans Petter Selasky

On 07/20/16 18:35, Gleb Smirnoff wrote:

  Randall,

  I have just tested and r303037 brings the TCP panic back. I
got two crashes during 2.5 hours.


Hi Gleb,

There is more than one way to fix the TCP panic. You don't _need_ to 
change the return values of callout_stop(). You _can_ make a new 
function for this, which doesn't cause problems in case anyone out there 
is depending on the current callout_stop() behaviour.



In your email [1] you are right, there is regression that (-1)
return value is lost. This problem was worked on in the PR 210884,
and we were very close to commiting the fix.

The whole 11.0-RELEASE cycle strongly depends on this change. We
don't want to release with TCP panic, and of course we want the
regression described in 210884 to be fixed.

Your backout mixed with extra code really made things messy. Since
I don't want to go with commit war, on behalf of RE we are asking
for explicit agreement to back out r303037. Then we will proceed with
latest patch from 210884. Is that okay?


I think Randall is right trying to preserve the callout API. My wish 
Gleb, is that you add a new function to handle the new behaviour.




[1] https://lists.freebsd.org/pipermail/svn-src-head/2016-July/089313.html



--HPS
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r303037 - head/sys/kern

2016-07-20 Thread Randall Stewart via svn-src-all
You are most welcome to backout anything you like.. as far as
I am concerned you own the code..

R
> On Jul 20, 2016, at 6:35 PM, Gleb Smirnoff  wrote:
> 
>  Randall,
> 
>  I have just tested and r303037 brings the TCP panic back. I
> got two crashes during 2.5 hours.
> 
> In your email [1] you are right, there is regression that (-1)
> return value is lost. This problem was worked on in the PR 210884,
> and we were very close to commiting the fix.
> 
> The whole 11.0-RELEASE cycle strongly depends on this change. We
> don't want to release with TCP panic, and of course we want the
> regression described in 210884 to be fixed.
> 
> Your backout mixed with extra code really made things messy. Since
> I don't want to go with commit war, on behalf of RE we are asking
> for explicit agreement to back out r303037. Then we will proceed with
> latest patch from 210884. Is that okay?
> 
> [1] https://lists.freebsd.org/pipermail/svn-src-head/2016-July/089313.html
> 
> On Wed, Jul 20, 2016 at 03:33:37PM +0200, Randall Stewart wrote:
> R> Gleb
> R> 
> R> I wish you would have responded earlier.. I am more than glad to hand
> R> off all kern_timeout.c to you… please take it commit what you want to
> R> it and have it. I hate the code and I dislike having to touch it.
> R> 
> R> Its yours.. I can assure you I will not touch it again.
> R> 
> R> R
> R> > On Jul 20, 2016, at 8:53 AM, Gleb Smirnoff  wrote:
> R> > 
> R> > On Tue, Jul 19, 2016 at 06:31:19PM +, Randall Stewart wrote:
> R> > R> Author: rrs
> R> > R> Date: Tue Jul 19 18:31:19 2016
> R> > R> New Revision: 303037
> R> > R> URL: https://svnweb.freebsd.org/changeset/base/303037
> R> > R> 
> R> > R> Log:
> R> > R>   This reverts out Gleb's changes and adds three small
> R> > R>   fixes that I think closes up the races Gleb was
> R> > R>   looking for. This is running quite nicely in Netflix and
> R> > R>   now no longer causes TCP-tcb leaks.
> R> > R>   
> R> > R>   Differential Revision:  7135
> R> > 
> R> > Just to notice that I am completely pissed of by this commit
> R> > war, that you started.
> R> > 
> R> > I've been testing my changes properly, I gave people time to
> R> > review my changes. You didn't.
> R> > 
> R> > From your explanation in other emails I see that you've been
> R> > testing your changes with a version of FreeBSD that is a heavily
> R> > modified FreeBSD 10, not 11.
> R> > 
> R> > The new code you mixed with revert of mine, doesn't fix the
> R> > problem observed. It fixes another problem that you imagined,
> R> > which might exist, but isn't observed. We already discussed that
> R> > and you didn't prove it wrong.
> R> > 
> R> > Your change doesn't even revert my change completely.
> R> > 
> R> > -- 
> R> > Totus tuus, Glebius.
> R> 
> R> 
> R> Randall Stewart
> R> r...@netflix.com
> R> 803-317-4952
> R> 
> R> 
> R> 
> R> 
> R> 
> R> 
> 
> -- 
> Totus tuus, Glebius.


Randall Stewart
r...@netflix.com
803-317-4952





___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"

Re: svn commit: r303037 - head/sys/kern

2016-07-20 Thread Gleb Smirnoff
  Randall,

  I have just tested and r303037 brings the TCP panic back. I
got two crashes during 2.5 hours.

In your email [1] you are right, there is regression that (-1)
return value is lost. This problem was worked on in the PR 210884,
and we were very close to commiting the fix.

The whole 11.0-RELEASE cycle strongly depends on this change. We
don't want to release with TCP panic, and of course we want the
regression described in 210884 to be fixed.

Your backout mixed with extra code really made things messy. Since
I don't want to go with commit war, on behalf of RE we are asking
for explicit agreement to back out r303037. Then we will proceed with
latest patch from 210884. Is that okay?

[1] https://lists.freebsd.org/pipermail/svn-src-head/2016-July/089313.html

On Wed, Jul 20, 2016 at 03:33:37PM +0200, Randall Stewart wrote:
R> Gleb
R> 
R> I wish you would have responded earlier.. I am more than glad to hand
R> off all kern_timeout.c to you… please take it commit what you want to
R> it and have it. I hate the code and I dislike having to touch it.
R> 
R> Its yours.. I can assure you I will not touch it again.
R> 
R> R
R> > On Jul 20, 2016, at 8:53 AM, Gleb Smirnoff  wrote:
R> > 
R> > On Tue, Jul 19, 2016 at 06:31:19PM +, Randall Stewart wrote:
R> > R> Author: rrs
R> > R> Date: Tue Jul 19 18:31:19 2016
R> > R> New Revision: 303037
R> > R> URL: https://svnweb.freebsd.org/changeset/base/303037
R> > R> 
R> > R> Log:
R> > R>   This reverts out Gleb's changes and adds three small
R> > R>   fixes that I think closes up the races Gleb was
R> > R>   looking for. This is running quite nicely in Netflix and
R> > R>   now no longer causes TCP-tcb leaks.
R> > R>   
R> > R>   Differential Revision:7135
R> > 
R> > Just to notice that I am completely pissed of by this commit
R> > war, that you started.
R> > 
R> > I've been testing my changes properly, I gave people time to
R> > review my changes. You didn't.
R> > 
R> > From your explanation in other emails I see that you've been
R> > testing your changes with a version of FreeBSD that is a heavily
R> > modified FreeBSD 10, not 11.
R> > 
R> > The new code you mixed with revert of mine, doesn't fix the
R> > problem observed. It fixes another problem that you imagined,
R> > which might exist, but isn't observed. We already discussed that
R> > and you didn't prove it wrong.
R> > 
R> > Your change doesn't even revert my change completely.
R> > 
R> > -- 
R> > Totus tuus, Glebius.
R> 
R> 
R> Randall Stewart
R> r...@netflix.com
R> 803-317-4952
R> 
R> 
R> 
R> 
R> 
R> 

-- 
Totus tuus, Glebius.
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"

Re: svn commit: r303037 - head/sys/kern

2016-07-20 Thread Randall Stewart via svn-src-all
Gleb

I wish you would have responded earlier.. I am more than glad to hand
off all kern_timeout.c to you… please take it commit what you want to
it and have it. I hate the code and I dislike having to touch it.

Its yours.. I can assure you I will not touch it again.

R
> On Jul 20, 2016, at 8:53 AM, Gleb Smirnoff  wrote:
> 
> On Tue, Jul 19, 2016 at 06:31:19PM +, Randall Stewart wrote:
> R> Author: rrs
> R> Date: Tue Jul 19 18:31:19 2016
> R> New Revision: 303037
> R> URL: https://svnweb.freebsd.org/changeset/base/303037
> R> 
> R> Log:
> R>   This reverts out Gleb's changes and adds three small
> R>   fixes that I think closes up the races Gleb was
> R>   looking for. This is running quite nicely in Netflix and
> R>   now no longer causes TCP-tcb leaks.
> R>   
> R>   Differential Revision:   7135
> 
> Just to notice that I am completely pissed of by this commit
> war, that you started.
> 
> I've been testing my changes properly, I gave people time to
> review my changes. You didn't.
> 
> From your explanation in other emails I see that you've been
> testing your changes with a version of FreeBSD that is a heavily
> modified FreeBSD 10, not 11.
> 
> The new code you mixed with revert of mine, doesn't fix the
> problem observed. It fixes another problem that you imagined,
> which might exist, but isn't observed. We already discussed that
> and you didn't prove it wrong.
> 
> Your change doesn't even revert my change completely.
> 
> -- 
> Totus tuus, Glebius.


Randall Stewart
r...@netflix.com
803-317-4952





___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"

Re: svn commit: r303037 - head/sys/kern

2016-07-20 Thread Gleb Smirnoff
On Tue, Jul 19, 2016 at 06:31:19PM +, Randall Stewart wrote:
R> Author: rrs
R> Date: Tue Jul 19 18:31:19 2016
R> New Revision: 303037
R> URL: https://svnweb.freebsd.org/changeset/base/303037
R> 
R> Log:
R>   This reverts out Gleb's changes and adds three small
R>   fixes that I think closes up the races Gleb was
R>   looking for. This is running quite nicely in Netflix and
R>   now no longer causes TCP-tcb leaks.
R>   
R>   Differential Revision: 7135

Just to notice that I am completely pissed of by this commit
war, that you started.

I've been testing my changes properly, I gave people time to
review my changes. You didn't.

>From your explanation in other emails I see that you've been
testing your changes with a version of FreeBSD that is a heavily
modified FreeBSD 10, not 11.

The new code you mixed with revert of mine, doesn't fix the
problem observed. It fixes another problem that you imagined,
which might exist, but isn't observed. We already discussed that
and you didn't prove it wrong.

Your change doesn't even revert my change completely.

-- 
Totus tuus, Glebius.
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r303037 - head/sys/kern

2016-07-19 Thread Ngie Cooper (yaneurabeya)

> On Jul 19, 2016, at 11:31, Randall Stewart  wrote:
> 
> Author: rrs
> Date: Tue Jul 19 18:31:19 2016
> New Revision: 303037
> URL: https://svnweb.freebsd.org/changeset/base/303037
> 
> Log:
>  This reverts out Gleb's changes and adds three small
>  fixes that I think closes up the races Gleb was
>  looking for. This is running quite nicely in Netflix and
>  now no longer causes TCP-tcb leaks.
> 
>  Differential Revision:   7135

This needs to be MFCed to ^/stable/11.
Thanks,
-Ngie


signature.asc
Description: Message signed with OpenPGP using GPGMail


svn commit: r303037 - head/sys/kern

2016-07-19 Thread Randall Stewart
Author: rrs
Date: Tue Jul 19 18:31:19 2016
New Revision: 303037
URL: https://svnweb.freebsd.org/changeset/base/303037

Log:
  This reverts out Gleb's changes and adds three small
  fixes that I think closes up the races Gleb was
  looking for. This is running quite nicely in Netflix and
  now no longer causes TCP-tcb leaks.
  
  Differential Revision:7135

Modified:
  head/sys/kern/kern_timeout.c

Modified: head/sys/kern/kern_timeout.c
==
--- head/sys/kern/kern_timeout.cTue Jul 19 18:15:22 2016
(r303036)
+++ head/sys/kern/kern_timeout.cTue Jul 19 18:31:19 2016
(r303037)
@@ -1050,7 +1050,7 @@ callout_reset_sbt_on(struct callout *c, 
 */
if (c->c_lock != NULL && !cc_exec_cancel(cc, direct))
cancelled = cc_exec_cancel(cc, direct) = true;
-   if (cc_exec_waiting(cc, direct)) {
+   if (cc_exec_waiting(cc, direct) || cc_exec_drain(cc, direct)) {
/*
 * Someone has called callout_drain to kill this
 * callout.  Don't reschedule.
@@ -1166,7 +1166,7 @@ _callout_stop_safe(struct callout *c, in
struct callout_cpu *cc, *old_cc;
struct lock_class *class;
int direct, sq_locked, use_lock;
-   int cancelled, not_on_a_list;
+   int not_on_a_list;
 
if ((flags & CS_DRAIN) != 0)
WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, c->c_lock,
@@ -1234,17 +1234,47 @@ again:
panic("migration should not happen");
 #endif
}
-
+   if ((drain != NULL) && (c->c_iflags & CALLOUT_PENDING) &&
+   (cc_exec_curr(cc, direct) != c)) {
+   /* 
+* This callout is executing and we are draining.
+* The only way this can happen is if its also
+* been rescheduled to run on one thread *and* asked to drain
+* on this thread (at the same time it is waiting to execute).
+*/
+   if ((c->c_iflags & CALLOUT_PROCESSED) == 0) {
+   if (cc_exec_next(cc) == c)
+   cc_exec_next(cc) = LIST_NEXT(c, c_links.le);
+   LIST_REMOVE(c, c_links.le);
+   } else {
+   TAILQ_REMOVE(>cc_expireq, c, c_links.tqe);
+   }
+   c->c_iflags &= ~CALLOUT_PENDING;
+   c->c_flags &= ~CALLOUT_ACTIVE;
+   }
/*
-* If the callout is running, try to stop it or drain it.
+* If the callout isn't pending, it's not on the queue, so
+* don't attempt to remove it from the queue.  We can try to
+* stop it by other means however.
 */
-   if (cc_exec_curr(cc, direct) == c) {
+   if (!(c->c_iflags & CALLOUT_PENDING)) {
/*
-* Succeed we to stop it or not, we must clear the
-* active flag - this is what API users expect.
+* If it wasn't on the queue and it isn't the current
+* callout, then we can't stop it, so just bail.
+* It probably has already been run (if locking
+* is properly done). You could get here if the caller
+* calls stop twice in a row for example. The second
+* call would fall here without CALLOUT_ACTIVE set.
 */
c->c_flags &= ~CALLOUT_ACTIVE;
-
+   if (cc_exec_curr(cc, direct) != c) {
+   CTR3(KTR_CALLOUT, "failed to stop %p func %p arg %p",
+   c, c->c_func, c->c_arg);
+   CC_UNLOCK(cc);
+   if (sq_locked)
+   sleepq_release(_exec_waiting(cc, direct));
+   return (-1);
+   }
if ((flags & CS_DRAIN) != 0) {
/*
 * The current callout is running (or just
@@ -1278,7 +1308,6 @@ again:
old_cc = cc;
goto again;
}
-
/*
 * Migration could be cancelled here, but
 * as long as it is still not sure when it
@@ -1362,6 +1391,8 @@ again:
cc_exec_drain(cc, direct) = drain;
}
CC_UNLOCK(cc);
+   if (drain)
+   return(0);
return ((flags & CS_EXECUTING) != 0);
}
CTR3(KTR_CALLOUT, "failed to stop %p func %p arg %p",
@@ -1369,20 +1400,12 @@ again:
if (drain) {
cc_exec_drain(cc, direct) = drain;
}
-   KASSERT(!sq_locked,