Re: panic with tcp timers

2016-07-21 Thread Hans Petter Selasky
On 07/21/16 09:54, Julien Charbon wrote: Hi, On 7/14/16 11:02 PM, Larry Rosenman wrote: On 2016-07-14 12:01, Julien Charbon wrote: On 6/20/16 11:55 AM, Julien Charbon wrote: On 6/20/16 9:39 AM, Gleb Smirnoff wrote: On Fri, Jun 17, 2016 at 11:27:39AM +0200, Julien Charbon wrote: J> >

Re: panic with tcp timers

2016-07-21 Thread Julien Charbon
Hi, On 7/14/16 11:02 PM, Larry Rosenman wrote: > On 2016-07-14 12:01, Julien Charbon wrote: >> On 6/20/16 11:55 AM, Julien Charbon wrote: >>> On 6/20/16 9:39 AM, Gleb Smirnoff wrote: On Fri, Jun 17, 2016 at 11:27:39AM +0200, Julien Charbon wrote: J> > Comparing stable/10 and head, I

Re: panic with tcp timers

2016-07-14 Thread Larry Rosenman
On 2016-07-14 12:01, Julien Charbon wrote: Hi, On 6/20/16 11:55 AM, Julien Charbon wrote: On 6/20/16 9:39 AM, Gleb Smirnoff wrote: On Fri, Jun 17, 2016 at 11:27:39AM +0200, Julien Charbon wrote: J> > Comparing stable/10 and head, I see two changes that could J> > affect that: J> > J> > -

Re: panic with tcp timers

2016-07-14 Thread Julien Charbon
Hi, On 6/20/16 11:55 AM, Julien Charbon wrote: > On 6/20/16 9:39 AM, Gleb Smirnoff wrote: >> On Fri, Jun 17, 2016 at 11:27:39AM +0200, Julien Charbon wrote: >> J> > Comparing stable/10 and head, I see two changes that could >> J> > affect that: >> J> > >> J> > - callout_async_drain >> J> > -

Re: EBR fix for life cycle races was Re: panic with tcp timers

2016-06-30 Thread Matthew Macy
On Tue, 28 Jun 2016 23:19:45 -0700 Matthew Macy wrote > > > > On Tue, 28 Jun 2016 15:51:57 -0700 K. Macy wrote > > > On Tue, Jun 28, 2016 at 10:51 AM, Matthew Macy wrote: > > > > You guys

EBR fix for life cycle races was Re: panic with tcp timers

2016-06-29 Thread Matthew Macy
On Tue, 28 Jun 2016 15:51:57 -0700 K. Macy wrote > On Tue, Jun 28, 2016 at 10:51 AM, Matthew Macy wrote: > > You guys should really look at Samy Bahra's epoch based reclamation. I > > solved a similar problem in drm/linuxkpi using it. >

Re: panic with tcp timers

2016-06-28 Thread K. Macy
On Tue, Jun 28, 2016 at 10:51 AM, Matthew Macy wrote: > You guys should really look at Samy Bahra's epoch based reclamation. I solved > a similar problem in drm/linuxkpi using it. The point being that this is a bug in the TCP life cycle handling _not_ in callouts. Churning

Re: panic with tcp timers

2016-06-28 Thread Matthew Macy
You guys should really look at Samy Bahra's epoch based reclamation. I solved a similar problem in drm/linuxkpi using it. -M On Tue, 28 Jun 2016 02:58:56 -0700 Julien Charbon wrote > > Hi Randall, > > On 6/25/16 4:41 PM, Randall Stewart via freebsd-net

Re: panic with tcp timers

2016-06-28 Thread Julien Charbon
Hi Randall, On 6/25/16 4:41 PM, Randall Stewart via freebsd-net wrote: > Ok > > Lets try this again with my source changed to my @freebsd.net :-) > > Now I am also attaching a patch for you Gleb, this will take some poking to > get in to your NF-head since it incorporates some changes we made

Re: panic with tcp timers

2016-06-25 Thread Randall Stewart
Ok. Let's try this again with my source changed to my @freebsd.net :-) Now I am also attaching a patch for you Gleb; this will take some poking to get into your NF-head since it incorporates some changes we made earlier. I think this will fix the problem, i.e. dealing with two locks in the

Re: panic with tcp timers

2016-06-25 Thread Randall Stewart
So, all of our timers in TCP do something like:

  INFO-LOCK
  INP_WLOCK
  if (inp needs to be dropped) {
      drop-it
  }
  do other work
  UNLOCK-INP
  UNLOCK-INFO

And generally the path “inp needs to be dropped” is rarely taken. So why don’t we change the procedure
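The lock ordering described in this message can be sketched in user space with POSIX mutexes. This is only an illustration of the pattern as stated (global info lock first, then the per-connection lock, with a rarely taken drop path); `struct conn`, `conn_init`, `timer_fire`, and `info_lock` are hypothetical names, not kernel code, and skipping the "other work" after a drop is an assumption of this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the global pcbinfo lock ("INFO-LOCK"). */
static pthread_mutex_t info_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for a connection protected by its own lock. */
struct conn {
    pthread_mutex_t inp_lock;   /* plays the role of INP_WLOCK */
    bool needs_drop;            /* "inp needs to be dropped" */
    bool dropped;
    int work_done;
};

void conn_init(struct conn *c, bool needs_drop)
{
    pthread_mutex_init(&c->inp_lock, NULL);
    c->needs_drop = needs_drop;
    c->dropped = false;
    c->work_done = 0;
}

/* Timer body following the described order; returns true if it dropped. */
bool timer_fire(struct conn *c)
{
    bool dropped = false;

    pthread_mutex_lock(&info_lock);      /* INFO-LOCK */
    pthread_mutex_lock(&c->inp_lock);    /* INP_WLOCK */

    if (c->needs_drop) {
        /* Rare path: the only one that needs the global lock. */
        c->dropped = true;
        dropped = true;
    } else {
        /* Common path: only per-connection state is touched
         * (assumption of this sketch). */
        c->work_done++;
    }

    pthread_mutex_unlock(&c->inp_lock);  /* UNLOCK-INP */
    pthread_mutex_unlock(&info_lock);    /* UNLOCK-INFO */
    return dropped;
}
```

In this sketch every timer invocation pays for the global lock even though only the drop branch uses it, which is the cost the message points at.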

Re: panic with tcp timers

2016-06-20 Thread Julien Charbon
Hi, On 6/20/16 11:58 AM, Gleb Smirnoff wrote: > On Mon, Jun 20, 2016 at 11:55:55AM +0200, Julien Charbon wrote: > J> > On Fri, Jun 17, 2016 at 11:27:39AM +0200, Julien Charbon wrote: > J> > J> > Comparing stable/10 and head, I see two changes that could > J> > J> > affect that: > J> > J> > >

Re: panic with tcp timers

2016-06-20 Thread Julien Charbon
Hi, On 6/20/16 12:30 PM, Gleb Smirnoff wrote: > On Mon, Jun 20, 2016 at 12:14:18PM +0200, Hans Petter Selasky wrote: > H> On 06/20/16 11:58, Gleb Smirnoff wrote: > H> > The fix I am working on now is doing exactly that. callout_reset must > H> > return 0 if the callout is currently running. >

Re: panic with tcp timers

2016-06-20 Thread Adrian Chadd
There are implications for RSS with how the callout system currently works. If you don't know the RSS bucket ID of a connection in advance, you'll create callouts on the wrong CPUs and then they're not migrated. The initial work here did convert things over, but didn't place the callouts in the

Re: panic with tcp timers

2016-06-20 Thread Hans Petter Selasky
On 06/20/16 12:30, Gleb Smirnoff wrote: What does prevent us from converting TCP timeouts to locked? To my understanding it is the lock order of taking pcbinfo after pcb lock. I started this work: https://reviews.freebsd.org/D1563 --HPS

Re: panic with tcp timers

2016-06-20 Thread Hans Petter Selasky
On 06/20/16 12:30, Gleb Smirnoff wrote: Exactly! I am convinced that all callouts should be locked, and non-locked one should simply go away, as well as async drain. I agree about that, except you still need the async drain, because it will prevent freeing the lock protecting the

Re: panic with tcp timers

2016-06-20 Thread Konstantin Belousov
On Mon, Jun 20, 2016 at 11:55:55AM +0200, Julien Charbon wrote: > > Hi, > > On 6/20/16 9:39 AM, Gleb Smirnoff wrote: > > On Fri, Jun 17, 2016 at 11:27:39AM +0200, Julien Charbon wrote: > > J> > Comparing stable/10 and head, I see two changes that could > > J> > affect that: > > J> > > > J> > -

Re: panic with tcp timers

2016-06-20 Thread Hans Petter Selasky
On 06/20/16 11:58, Gleb Smirnoff wrote: J> callout_stop() should return 0 when the callout is currently being J> serviced and indeed unstoppable J> https://reviews.freebsd.org/differential/changeset/?ref=62513=ignore-most What are the old paths impacted? Hi Gleb, Digging through my
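The rule quoted in this message (a stop request returning 0 when the callout is already being serviced) can be illustrated with a toy, single-threaded state machine. Everything here is hypothetical: `toy_callout`, `toy_callout_stop`, `toy_safe_to_free`, and the choice of -1 for an idle callout are assumptions of this sketch, not the kernel's callout implementation. It only shows why a caller that sees 0 must not free the object yet.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy states a timer can be in (hypothetical, not kernel flags). */
enum toy_state { TOY_IDLE, TOY_PENDING, TOY_RUNNING };

struct toy_callout {
    enum toy_state state;
};

/*
 * Try to stop the toy callout.
 *   1  -> it was pending and has been cancelled
 *   0  -> its handler is currently running and cannot be stopped
 *  -1  -> it was not scheduled at all (a choice made for this toy)
 */
int toy_callout_stop(struct toy_callout *c)
{
    switch (c->state) {
    case TOY_PENDING:
        c->state = TOY_IDLE;
        return 1;
    case TOY_RUNNING:
        return 0;
    default:
        return -1;
    }
}

/*
 * The owner may release the memory the handler uses only when the
 * handler is guaranteed not to be executing, i.e. when stop did not
 * return 0.
 */
bool toy_safe_to_free(struct toy_callout *c)
{
    return toy_callout_stop(c) != 0;
}
```

If the stop routine reported success while the handler was still running, the owner could free the structure underneath it, which is the kind of life cycle race the thread is discussing.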

Re: panic with tcp timers

2016-06-20 Thread Gleb Smirnoff
On Mon, Jun 20, 2016 at 12:14:18PM +0200, Hans Petter Selasky wrote: H> On 06/20/16 11:58, Gleb Smirnoff wrote: H> > The fix I am working on now is doing exactly that. callout_reset must H> > return 0 if the callout is currently running. H> > H> > What are the old paths impacted? H> H> Hi, H> H>

Re: panic with tcp timers

2016-06-20 Thread Hans Petter Selasky
On 06/20/16 11:58, Gleb Smirnoff wrote: The fix I am working on now is doing exactly that. callout_reset must return 0 if the callout is currently running. What are the old paths impacted? Hi, I'll dig into the matter as well and give some comments. Thanks for the analysis, Gleb. FYI: This

Re: panic with tcp timers

2016-06-20 Thread Gleb Smirnoff
On Mon, Jun 20, 2016 at 11:55:55AM +0200, Julien Charbon wrote: J> > On Fri, Jun 17, 2016 at 11:27:39AM +0200, Julien Charbon wrote: J> > J> > Comparing stable/10 and head, I see two changes that could J> > J> > affect that: J> > J> > J> > J> > - callout_async_drain J> > J> > - switch to READ

Re: panic with tcp timers

2016-06-20 Thread Julien Charbon
Hi, On 6/20/16 9:39 AM, Gleb Smirnoff wrote: > On Fri, Jun 17, 2016 at 11:27:39AM +0200, Julien Charbon wrote: > J> > Comparing stable/10 and head, I see two changes that could > J> > affect that: > J> > > J> > - callout_async_drain > J> > - switch to READ lock for inp info in tcp timers > J>

Re: panic with tcp timers

2016-06-20 Thread Gleb Smirnoff
Hi! On Fri, Jun 17, 2016 at 11:27:39AM +0200, Julien Charbon wrote: J> > Comparing stable/10 and head, I see two changes that could J> > affect that: J> > J> > - callout_async_drain J> > - switch to READ lock for inp info in tcp timers J> > J> > That's why you are in To, Julien and Hans :) J>

Re: panic with tcp timers

2016-06-17 Thread Julien Charbon
Hi Gleb, On 6/17/16 6:53 AM, Gleb Smirnoff wrote: > At Netflix we are observing a race in TCP timers with head. > The problem is a regression, that doesn't happen on stable/10. > The panic usually happens after several hours at 55 Gbit/s of > traffic. > > What happens is that tcp_timer_keep

Re: panic with tcp timers

2016-06-17 Thread Bjoern A. Zeeb
On 17 Jun 2016, at 4:53, Gleb Smirnoff wrote: Hi! At Netflix we are observing a race in TCP timers with head. The problem is a regression, that doesn't happen on stable/10. The panic usually happens after several hours at 55 Gbit/s of traffic. What happens is that tcp_timer_keep finds

Re: panic with tcp timers

2016-06-17 Thread Hans Petter Selasky
On 06/17/16 06:53, Gleb Smirnoff wrote: Hi! At Netflix we are observing a race in TCP timers with head. The problem is a regression, that doesn't happen on stable/10. The panic usually happens after several hours at 55 Gbit/s of traffic. What happens is that tcp_timer_keep finds t_tcpcb

panic with tcp timers

2016-06-16 Thread Gleb Smirnoff
Hi! At Netflix we are observing a race in TCP timers with head. The problem is a regression, that doesn't happen on stable/10. The panic usually happens after several hours at 55 Gbit/s of traffic. What happens is that tcp_timer_keep finds t_tcpcb being NULL. Some coredumps have tcpcb