Re: svn commit: r357051 - head/sys/dev/bge
On Sun, Jan 26, 2020 at 09:48:42AM +, John Baldwin wrote: J> > We also are concerned about that theoretically. Haven't yet seen effect J> > in practice, but our sessions are mostly longer living. First we have the J> > tunable to limit batching. Second, there are some ideas on how to improve J> > the garbage collector performance if it becomes an issue. J> J> There are other workloads than Netflix. ;) Verisign has incredibly short-lived J> connections with very high turnover. I think though that they have already J> abandoned the in-tree network stack for a userland stack built on netmap. Still, J> I think that there are probably other FreeBSD users that are probably somewhere J> in the middle that shouldn't be ignored. I understand that. First, my change doesn't create extra work for the garbage collector, but it potentially may affect its burstiness. If a machine has very high connection turnover it already may exhibit some problems with the PCB garbage collector on 12.0-RELEASE. Have we observed this yet? Note that PCBs are very different to other things protected by the network epoch: address lists, interface lists, firewall rules, etc. They are short lived. We got several ideas on how to deal with this potential problem: 1) The current PCB destructor does lots of things like freeing and unrefcounting associated multicast options, credentials, etc. This is because it was converted to epoch with a principle of minimal diff in r335015. However, since all operations with a PCB happen under its individual lock, we can free it differently. We can mark it INP_FREED and do all the destruction immediately, leaving only actual free to the garbage collector. Once this is achieved, the garbage collection can be batched at level of UMA. Jeff is working on that. 2) Use separate epoch for PCBs, leaving the network epoch for long lived structures only. Don't hold the PCB epoch long. 3) Just don't use epoch for PCBs. Decompose the hash lock into per slot hash lock. -- Gleb Smirnoff ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r357051 - head/sys/dev/bge
On Sun, 26 Jan 2020, John Baldwin wrote: On 1/23/20 7:32 PM, Gleb Smirnoff wrote: On Thu, Jan 23, 2020 at 05:09:14PM -1000, Jeff Roberson wrote: J> While we don't have a policy strictly requiring reviews it is the norm to J> have substantial changes socialized and reviewed. I appreciate the work J> that you are doing but it likely should've been discussed somewhere J> more publicly. I apologized if I missed it but I don't see reference to J> anything. That was https://reviews.freebsd.org/D23242 A review alone isn't sufficient for large, sweeping changes in my mind. For major changes, a thread on arch@ or net@ or the like is probably more appropriate. You can include a link to a review or git branch, etc. in that e-mail, but phabricator aren't as well suited to higher-level design-review type discussion, more for implementation-review. J> Architecturally I am more concerned with the coarseness of net_epoch and J> the duration of hold becoming a resource utilization problem in high J> turn-over workloads. Like short connection tcp. Has anyone done J> substantial testing here? epoch as it is today will hold every free J> callback for a minimum of several clock ticks and a maximum of 2x the J> duration of the longest epoch section time. With preemption, etc. this J> could be 100s of ms of PCBs held. We also are concerned about that theoretically. Haven't yet seen effect in practice, but our sessions are mostly longer living. First we have the tunable to limit batching. Second, there are some ideas on how to improve the garbage collector performance if it becomes an issue. There are other workloads than Netflix. ;) Verisign has incredibly short-lived connections with very high turnover. I think though that they have already abandoned the in-tree network stack for a userland stack built on netmap. Still, I think that there are probably other FreeBSD users that are probably somewhere in the middle that shouldn't be ignored. Packet batching would not be impossible by simply using m_nextpkt chains in mbufs passed up to ether_input and having ether_input pass them in a loop to the next higher loop (as a first step). That would reduce unlock/lock operations in drivers (for those still using locks on receive) as well as permitting ether_input to process batches under a single epoch invocation. This is actually the approach that I took for nokia. You could prefetch m->m_nextpkt at the top of the loop iteration. It was very effective there. Seeing how many drivers and unexpected entry points had to have the NET_EPOCH added I would want to review again once it's stable and see if there is a way to simplify through API changes. It seems like more than expected slipped through the cracks and I worry about long-term maintenance. Thanks, Jeff -- John Baldwin ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r357051 - head/sys/dev/bge
On 1/23/20 7:32 PM, Gleb Smirnoff wrote: > On Thu, Jan 23, 2020 at 05:09:14PM -1000, Jeff Roberson wrote: > J> While we don't have a policy strictly requiring reviews it is the norm to > J> have substantial changes socialized and reviewed. I appreciate the work > J> that you are doing but it likely should've been discussed somewhere > J> more publicly. I apologized if I missed it but I don't see reference to > J> anything. > > That was https://reviews.freebsd.org/D23242 A review alone isn't sufficient for large, sweeping changes in my mind. For major changes, a thread on arch@ or net@ or the like is probably more appropriate. You can include a link to a review or git branch, etc. in that e-mail, but phabricator aren't as well suited to higher-level design-review type discussion, more for implementation-review. > J> Architecturally I am more concerned with the coarseness of net_epoch and > J> the duration of hold becoming a resource utilization problem in high > J> turn-over workloads. Like short connection tcp. Has anyone done > J> substantial testing here? epoch as it is today will hold every free > J> callback for a minimum of several clock ticks and a maximum of 2x the > J> duration of the longest epoch section time. With preemption, etc. this > J> could be 100s of ms of PCBs held. > > We also are concerned about that theoretically. Haven't yet seen effect > in practice, but our sessions are mostly longer living. First we have the > tunable to limit batching. Second, there are some ideas on how to improve > the garbage collector performance if it becomes an issue. There are other workloads than Netflix. ;) Verisign has incredibly short-lived connections with very high turnover. I think though that they have already abandoned the in-tree network stack for a userland stack built on netmap. Still, I think that there are probably other FreeBSD users that are probably somewhere in the middle that shouldn't be ignored. Packet batching would not be impossible by simply using m_nextpkt chains in mbufs passed up to ether_input and having ether_input pass them in a loop to the next higher loop (as a first step). That would reduce unlock/lock operations in drivers (for those still using locks on receive) as well as permitting ether_input to process batches under a single epoch invocation. -- John Baldwin ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r357051 - head/sys/dev/bge
On 1/23/20 9:15 PM, Gleb Smirnoff wrote: > On Thu, Jan 23, 2020 at 06:18:15PM -1000, Jeff Roberson wrote: > J> > That was https://reviews.freebsd.org/D23242 > J> > J> Ok thank you. Can you tag commits so people can see the discussion? Was > J> it in one I missed? When I'm committing a long patch series I include the > J> link in all of them. > > I probably now on will also include in every patch in a serie. I just > dislike that first one (usually a preparation change) closes the review. You can also add additional commits to a review via 'Edit related revisions' in the web ui if you forget to tag a commit. -- John Baldwin ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r357051 - head/sys/dev/bge
On Thu, Jan 23, 2020 at 06:18:15PM -1000, Jeff Roberson wrote: J> > That was https://reviews.freebsd.org/D23242 J> J> Ok thank you. Can you tag commits so people can see the discussion? Was J> it in one I missed? When I'm committing a long patch series I include the J> link in all of them. I probably now on will also include in every patch in a serie. I just dislike that first one (usually a preparation change) closes the review. -- Gleb Smirnoff ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r357051 - head/sys/dev/bge
On Thu, 23 Jan 2020, Gleb Smirnoff wrote: On Thu, Jan 23, 2020 at 05:09:14PM -1000, Jeff Roberson wrote: J> While we don't have a policy strictly requiring reviews it is the norm to J> have substantial changes socialized and reviewed. I appreciate the work J> that you are doing but it likely should've been discussed somewhere J> more publicly. I apologized if I missed it but I don't see reference to J> anything. That was https://reviews.freebsd.org/D23242 Ok thank you. Can you tag commits so people can see the discussion? Was it in one I missed? When I'm committing a long patch series I include the link in all of them. Ryan, are you subscribed to @networking? J> Architecturally I am more concerned with the coarseness of net_epoch and J> the duration of hold becoming a resource utilization problem in high J> turn-over workloads. Like short connection tcp. Has anyone done J> substantial testing here? epoch as it is today will hold every free J> callback for a minimum of several clock ticks and a maximum of 2x the J> duration of the longest epoch section time. With preemption, etc. this J> could be 100s of ms of PCBs held. We also are concerned about that theoretically. Haven't yet seen effect in practice, but our sessions are mostly longer living. First we have the tunable to limit batching. Second, there are some ideas on how to improve the garbage collector performance if it becomes an issue. I am often surprised at how much cpu time I see on linux spent in RCU free processing. Many of these things are written in such a way that everything is cache cold by the time you swing back around. So the callout walk is a giant cold linked list. Thanks, Jeff -- Gleb Smirnoff ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r357051 - head/sys/dev/bge
On Thu, Jan 23, 2020 at 05:09:14PM -1000, Jeff Roberson wrote: J> While we don't have a policy strictly requiring reviews it is the norm to J> have substantial changes socialized and reviewed. I appreciate the work J> that you are doing but it likely should've been discussed somewhere J> more publicly. I apologized if I missed it but I don't see reference to J> anything. That was https://reviews.freebsd.org/D23242 J> Architecturally I am more concerned with the coarseness of net_epoch and J> the duration of hold becoming a resource utilization problem in high J> turn-over workloads. Like short connection tcp. Has anyone done J> substantial testing here? epoch as it is today will hold every free J> callback for a minimum of several clock ticks and a maximum of 2x the J> duration of the longest epoch section time. With preemption, etc. this J> could be 100s of ms of PCBs held. We also are concerned about that theoretically. Haven't yet seen effect in practice, but our sessions are mostly longer living. First we have the tunable to limit batching. Second, there are some ideas on how to improve the garbage collector performance if it becomes an issue. -- Gleb Smirnoff ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r357051 - head/sys/dev/bge
On Thu, 23 Jan 2020, Gleb Smirnoff wrote: On Thu, Jan 23, 2020 at 08:33:58PM -0500, Ryan Stone wrote: R> > Because at interrupt level we can batch multiple packets in a single epoch. R> > This speeds up unfiltered packet forwarding performance by 5%. R> > R> > With driver level pfil hooks I would claim even more improvement, because before R> > the change we needed to enter epoch twice - once for filtering, than for ether_input. R> > R> > Epoch isn't a layer, it is a synchronisation primitive, so I disagree about R> > statement on layering violation. R> R> Epoch is a synchronization primitive, but the net_epoch is absolutely R> a part of the networking layer. If we need better batching then the R> correct solution is to introduce a batched interface for drivers to R> push packets up the stack, not to mess around at the interrupt layer. Such interface of course would be valuable but will not cover case when an interrupt arrives during processing of previous one. So its batching possiblities are limited compared to interrupt level batching. And I already noted that ether_input() isn't the only way to enter the network stack. R> Note that the only reason why this works for mlx4/mlx5 is that R> linuxkpi *always* requests a INTR_TYPE_NET no matter what driver is R> running. This means that all drm graphics driver interrupts are now R> running under the net_epoch: R> R> https://svnweb.freebsd.org/base/head/sys/compat/linuxkpi/common/include/linux/interrupt.h?revision=352205&view=markup#l103 The historical reason is that linuxkpi was originally made to support ofed and there was no real way to get this information from the driver. Well, it is not my fault that a video driver requests INTR_TYPE_NET interrupt. I mean, you can't put this as a rationale against using network epoch for all interrrupts that declare theirselves as network interrupts. However, this is harmless. While we don't have a policy strictly requiring reviews it is the norm to have substantial changes socialized and reviewed. I appreciate the work that you are doing but it likely should've been discussed somewhere more publicly. I apologized if I missed it but I don't see reference to anything. Architecturally I am more concerned with the coarseness of net_epoch and the duration of hold becoming a resource utilization problem in high turn-over workloads. Like short connection tcp. Has anyone done substantial testing here? epoch as it is today will hold every free callback for a minimum of several clock ticks and a maximum of 2x the duration of the longest epoch section time. With preemption, etc. this could be 100s of ms of PCBs held. Thanks, Jeff -- Gleb Smirnoff ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r357051 - head/sys/dev/bge
On Thu, Jan 23, 2020 at 08:33:58PM -0500, Ryan Stone wrote: R> > Because at interrupt level we can batch multiple packets in a single epoch. R> > This speeds up unfiltered packet forwarding performance by 5%. R> > R> > With driver level pfil hooks I would claim even more improvement, because before R> > the change we needed to enter epoch twice - once for filtering, than for ether_input. R> > R> > Epoch isn't a layer, it is a synchronisation primitive, so I disagree about R> > statement on layering violation. R> R> Epoch is a synchronization primitive, but the net_epoch is absolutely R> a part of the networking layer. If we need better batching then the R> correct solution is to introduce a batched interface for drivers to R> push packets up the stack, not to mess around at the interrupt layer. Such interface of course would be valuable but will not cover case when an interrupt arrives during processing of previous one. So its batching possiblities are limited compared to interrupt level batching. And I already noted that ether_input() isn't the only way to enter the network stack. R> Note that the only reason why this works for mlx4/mlx5 is that R> linuxkpi *always* requests a INTR_TYPE_NET no matter what driver is R> running. This means that all drm graphics driver interrupts are now R> running under the net_epoch: R> R> https://svnweb.freebsd.org/base/head/sys/compat/linuxkpi/common/include/linux/interrupt.h?revision=352205&view=markup#l103 Well, it is not my fault that a video driver requests INTR_TYPE_NET interrupt. I mean, you can't put this as a rationale against using network epoch for all interrrupts that declare theirselves as network interrupts. However, this is harmless. -- Gleb Smirnoff ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r357051 - head/sys/dev/bge
On Thu, Jan 23, 2020 at 8:25 PM Gleb Smirnoff wrote: > Because at interrupt level we can batch multiple packets in a single epoch. > This speeds up unfiltered packet forwarding performance by 5%. > > With driver level pfil hooks I would claim even more improvement, because > before > the change we needed to enter epoch twice - once for filtering, than for > ether_input. > > Epoch isn't a layer, it is a synchronisation primitive, so I disagree about > statement on layering violation. Epoch is a synchronization primitive, but the net_epoch is absolutely a part of the networking layer. If we need better batching then the correct solution is to introduce a batched interface for drivers to push packets up the stack, not to mess around at the interrupt layer. Note that the only reason why this works for mlx4/mlx5 is that linuxkpi *always* requests a INTR_TYPE_NET no matter what driver is running. This means that all drm graphics driver interrupts are now running under the net_epoch: https://svnweb.freebsd.org/base/head/sys/compat/linuxkpi/common/include/linux/interrupt.h?revision=352205&view=markup#l103 ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r357051 - head/sys/dev/bge
On Thu, Jan 23, 2020 at 08:17:46PM -0500, Ryan Stone wrote: R> On Thu, Jan 23, 2020 at 6:05 PM Gleb Smirnoff wrote: R> > R> > On Thu, Jan 23, 2020 at 02:17:33PM -0500, Ryan Stone wrote: R> > R> What is a driver's responsibility now for entering/leaving the net epoch now? R> > R> > For drivers that are 'special', entering the net epoch is necessary. Special R> > usually means running if_input outside network interrupt context. R> > R> > However, there is plan to generalize entering/exiting epoch for taskqueues R> > and callouts. R> R> Why on earth is it done that way rather than putting the network epoch R> enter/exit in ether_input? I'm with Ian; this sounds like a huge R> layering violation. Another thing I should have mentioned. Doing this at interrupt level makes much easier to maintain alternative network stacks, for example TOE - which now has tons of hacks. Same stands for netgraph. If you hook ng_ether on, than ether_input is skipped and packet goes to netgraph, then it can be injected back into network stack from a different type of node. -- Gleb Smirnoff ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r357051 - head/sys/dev/bge
On Thu, Jan 23, 2020 at 08:17:46PM -0500, Ryan Stone wrote: R> On Thu, Jan 23, 2020 at 6:05 PM Gleb Smirnoff wrote: R> > R> > On Thu, Jan 23, 2020 at 02:17:33PM -0500, Ryan Stone wrote: R> > R> What is a driver's responsibility now for entering/leaving the net epoch now? R> > R> > For drivers that are 'special', entering the net epoch is necessary. Special R> > usually means running if_input outside network interrupt context. R> > R> > However, there is plan to generalize entering/exiting epoch for taskqueues R> > and callouts. R> R> Why on earth is it done that way rather than putting the network epoch R> enter/exit in ether_input? I'm with Ian; this sounds like a huge R> layering violation. Because at interrupt level we can batch multiple packets in a single epoch. This speeds up unfiltered packet forwarding performance by 5%. With driver level pfil hooks I would claim even more improvement, because before the change we needed to enter epoch twice - once for filtering, than for ether_input. Epoch isn't a layer, it is a synchronisation primitive, so I disagree about statement on layering violation. -- Gleb Smirnoff ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r357051 - head/sys/dev/bge
On Thu, Jan 23, 2020 at 6:05 PM Gleb Smirnoff wrote: > > On Thu, Jan 23, 2020 at 02:17:33PM -0500, Ryan Stone wrote: > R> What is a driver's responsibility now for entering/leaving the net epoch > now? > > For drivers that are 'special', entering the net epoch is necessary. Special > usually means running if_input outside network interrupt context. > > However, there is plan to generalize entering/exiting epoch for taskqueues > and callouts. Why on earth is it done that way rather than putting the network epoch enter/exit in ether_input? I'm with Ian; this sounds like a huge layering violation. ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r357051 - head/sys/dev/bge
On Thu, Jan 23, 2020 at 04:08:32PM -0700, Ian Lepore wrote: I> On Thu, 2020-01-23 at 15:05 -0800, Gleb Smirnoff wrote: I> > On Thu, Jan 23, 2020 at 02:17:33PM -0500, Ryan Stone wrote: I> > R> What is a driver's responsibility now for entering/leaving the net epoch now? I> > I> > For drivers that are 'special', entering the net epoch is necessary. Special I> > usually means running if_input outside network interrupt context. I> > I> > However, there is plan to generalize entering/exiting epoch for taskqueues I> > and callouts. I> > I> I> That sounds every bit as horrible and out-of-place as the recent hack I> (and it does feel very much like a horrible hack) that put network- I> specific code into the guts of interrupt dispatching. It isn't really a network code. You just include which declares the epoch KPI and a few globally recognized epochs. For now the only one is the network epoch. If you want to have for a example a disk epoch also supported by interrupt code, that would add one extra declaration to epoch.h, and again nothing really disk related. -- Gleb Smirnoff ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r357051 - head/sys/dev/bge
On Thu, 2020-01-23 at 15:05 -0800, Gleb Smirnoff wrote: > On Thu, Jan 23, 2020 at 02:17:33PM -0500, Ryan Stone wrote: > R> What is a driver's responsibility now for entering/leaving the net epoch > now? > > For drivers that are 'special', entering the net epoch is necessary. Special > usually means running if_input outside network interrupt context. > > However, there is plan to generalize entering/exiting epoch for taskqueues > and callouts. > That sounds every bit as horrible and out-of-place as the recent hack (and it does feel very much like a horrible hack) that put network- specific code into the guts of interrupt dispatching. -- Ian ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r357051 - head/sys/dev/bge
On Thu, Jan 23, 2020 at 02:17:33PM -0500, Ryan Stone wrote: R> What is a driver's responsibility now for entering/leaving the net epoch now? For drivers that are 'special', entering the net epoch is necessary. Special usually means running if_input outside network interrupt context. However, there is plan to generalize entering/exiting epoch for taskqueues and callouts. -- Gleb Smirnoff ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r357051 - head/sys/dev/bge
What is a driver's responsibility now for entering/leaving the net epoch now? ___ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"