Re: Request for Testing: TCP RACK

2024-03-21 Thread Drew Gallatin
The entire point is to *NOT* go through the overhead of scheduling something 
asynchronously, but to take advantage of the fact that a user/kernel transition 
is going to trash the cache anyway.

In the common case of a system which has fewer than the threshold number of 
connections, we access the tcp_hpts_softclock function pointer, make one 
function call, and access hpts_that_need_softclock, and then return.  So that's 
two variables and a function call.

I think it would be preferable to avoid that call, and to move the declarations 
of tcp_hpts_softclock and hpts_that_need_softclock so that they share a 
cacheline.  Then we'd be hitting just a single line in the common case.  (I've 
made comments on the review to that effect.)
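
For illustration, a minimal sketch of that colocation idea; the struct and its 
name are hypothetical, not the actual tcp_hpts layout:

#include <sys/param.h>    /* CACHE_LINE_SIZE, __aligned() */

/*
 * Illustrative only: keep the hot-path function pointer and the
 * "anything pending?" counter adjacent and cacheline-aligned, so the
 * common no-work case on return to userspace touches a single line.
 */
struct hpts_fastpath {
        void        (*softclock)(void);   /* tcp_hpts_softclock entry point */
        uint32_t    need_softclock;       /* hpts_that_need_softclock count */
} __aligned(CACHE_LINE_SIZE);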

Also, I wonder if the threshold could be raised by default, so that hpts is 
never called in this context unless we're at the point where we're scheduling 
thousands of runs of the hpts thread (and taking all those clock interrupts).

Drew

On Wed, Mar 20, 2024, at 8:17 PM, Konstantin Belousov wrote:
> On Tue, Mar 19, 2024 at 06:19:52AM -0400, rrs wrote:
> > Ok I have created
> > 
> > https://reviews.freebsd.org/D44420
> > 
> > 
> > To address the issue.  I also attach a short version of the patch that Nuno
> > can try and validate it works.  Drew, you may want to try this and validate
> > that the optimization does kick in, since right now I can only test that it
> > does not on my local box :)
> The patch still causes access to all CPUs' cachelines on each userret.
> It would be much better to increment/check the threshold and only schedule
> the call when it is exceeded.  Then the call can occur in some dedicated
> context, like a per-CPU thread, instead of userret.
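
A rough sketch of that suggestion, with hypothetical names (hpts_userret_count, 
hpts_wakeup_task, hpts_userret_threshold) standing in for whatever a real patch 
would use:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/pcpu.h>
#include <sys/proc.h>
#include <sys/taskqueue.h>

/* Per-CPU counter: the common case never touches another CPU's cacheline. */
DPCPU_DEFINE_STATIC(u_int, hpts_userret_count);
static struct task hpts_wakeup_task;            /* TASK_INIT()ed at load time */
static u_int hpts_userret_threshold = 1000;     /* made-up default */

static __inline void
hpts_userret_hook(void)
{
        u_int *cnt;

        critical_enter();
        cnt = DPCPU_PTR(hpts_userret_count);
        if (++*cnt < hpts_userret_threshold) {
                critical_exit();
                return;
        }
        *cnt = 0;
        critical_exit();
        /* Hand the work off to a dedicated context instead of userret. */
        taskqueue_enqueue(taskqueue_swi, &hpts_wakeup_task);
}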
> 
> > 
> > 
> > R
> > 
> > 
> > 
> > On 3/18/24 3:42 PM, Drew Gallatin wrote:
> > > No.  The goal is to run on every return to userspace for every thread.
> > > 
> > > Drew
> > > 
> > > On Mon, Mar 18, 2024, at 3:41 PM, Konstantin Belousov wrote:
> > > > On Mon, Mar 18, 2024 at 03:13:11PM -0400, Drew Gallatin wrote:
> > > > > I got the idea from
> > > > > https://people.mpi-sws.org/~druschel/publications/soft-timers-tocs.pdf
> > > > > The gist is that the TCP pacing stuff needs to run frequently, and
> > > > > rather than run it out of a clock interrupt, it's more efficient to run
> > > > > it out of a system call context at just the point where we return to
> > > > > userspace and the cache is trashed anyway. The current implementation
> > > > > is fine for our workload, but probably not ideal for a generic system.
> > > > > Especially one where something is banging on system calls.
> > > > >
> > > > > ASTs could be the right tool for this, but I'm super unfamiliar with
> > > > > them, and I can't find any docs on them.
> > > > >
> > > > > Would ast_register(0, ASTR_UNCOND, 0, func) be roughly equivalent to
> > > > > what's happening here?
> > > > This call would need some AST number added, and then it registers the
> > > > AST to run on the next return to userspace, for the current thread.
> > > > 
> > > > Is it enough?
> > > > >
> > > > > Drew
> > > > 
> > > > >
> > > > > On Mon, Mar 18, 2024, at 2:33 PM, Konstantin Belousov wrote:
> > > > > > On Mon, Mar 18, 2024 at 07:26:10AM -0500, Mike Karels wrote:
> > > > > > > On 18 Mar 2024, at 7:04, tue...@freebsd.org wrote:
> > > > > > >
> > > > > > > >> On 18. Mar 2024, at 12:42, Nuno Teixeira  wrote:
> > > > > > > >>
> > > > > > > >> Hello all!
> > > > > > > >>
> > > > > > > >> It works just fine!
> > > > > > > >> System performance is OK.
> > > > > > > >> Using patch on main-n268841-b0aaf8beb126(-dirty).
> > > > > > > >>
> > > > > > > >> ---
> > > > > > > >> net.inet.tcp.functions_available:
> > > > > > > >> Stack   D Alias   PCB count
> > > > > > > >> freebsd freebsd  0
> > > > > > > >> rack*   rack 38
> > > > > > > >> ---

Re: Request for Testing: TCP RACK

2024-03-18 Thread Drew Gallatin
No.  The goal is to run on every return to userspace for every thread.

Drew

On Mon, Mar 18, 2024, at 3:41 PM, Konstantin Belousov wrote:
> On Mon, Mar 18, 2024 at 03:13:11PM -0400, Drew Gallatin wrote:
> > I got the idea from
> > https://people.mpi-sws.org/~druschel/publications/soft-timers-tocs.pdf
> > The gist is that the TCP pacing stuff needs to run frequently, and
> > rather than run it out of a clock interrupt, it's more efficient to run
> > it out of a system call context at just the point where we return to
> > userspace and the cache is trashed anyway. The current implementation
> > is fine for our workload, but probably not ideal for a generic system.
> > Especially one where something is banging on system calls.
> >
> > ASTs could be the right tool for this, but I'm super unfamiliar with
> > them, and I can't find any docs on them.
> >
> > Would ast_register(0, ASTR_UNCOND, 0, func) be roughly equivalent to
> > what's happening here?
> This call would need some AST number added, and then it registers the
> AST to run on the next return to userspace, for the current thread.
> 
> Is it enough?
> >
> > Drew
> 
> > 
> > On Mon, Mar 18, 2024, at 2:33 PM, Konstantin Belousov wrote:
> > > On Mon, Mar 18, 2024 at 07:26:10AM -0500, Mike Karels wrote:
> > > > On 18 Mar 2024, at 7:04, tue...@freebsd.org wrote:
> > > > 
> > > > >> On 18. Mar 2024, at 12:42, Nuno Teixeira  wrote:
> > > > >>
> > > > >> Hello all!
> > > > >>
> > > > >> It works just fine!
> > > > >> System performance is OK.
> > > > >> Using patch on main-n268841-b0aaf8beb126(-dirty).
> > > > >>
> > > > >> ---
> > > > >> net.inet.tcp.functions_available:
> > > > >> Stack   D Alias   PCB count
> > > > >> freebsd   freebsd  0
> > > > >> rack* rack 38
> > > > >> ---
> > > > >>
> > > > >> It would be so nice if we could have a sysctl tunable for this patch,
> > > > >> so we could do more tests without recompiling the kernel.
> > > > > Thanks for testing!
> > > > >
> > > > > @gallatin: can you come up with a patch that is acceptable for Netflix
> > > > > and allows mitigating the performance regression?
> > > > 
> > > > Ideally, tcphpts could enable this automatically when it starts to be
> > > > used (enough?), but a sysctl could select auto/on/off.
> > > There is already a well-known mechanism to request execution of a
> > > specific function on return to userspace, namely AST.  The difference
> > > from the current hack is that the execution is requested for one callback
> > > in the context of a specific thread.
> > > 
> > > Still, it might be worth a try to use it; what is the reason to hit a
> > > thread that does not do networking, with TCP processing?
> > > 
> > > > 
> > > > Mike
> > > > 
> > > > > Best regards
> > > > > Michael
> > > > >>
> > > > >> Thanks all!
> > > > >> Really happy here :)
> > > > >>
> > > > >> Cheers,
> > > > >>
> > > > >> Nuno Teixeira  wrote (Sunday, 17/03/2024
> > > > >> at 20:26):
> > > > >>>
> > > > >>> Hello,
> > > > >>>
> > > > >>>> I don't have the full context, but it seems like the complaint is 
> > > > >>>> a performance regression in bonnie++ and perhaps other things when 
> > > > >>>> tcp_hpts is loaded, even when it is not used.  Is that correct?
> > > > >>>>
> > > > >>>> If so, I suspect it's because we drive the tcp_hpts_softclock() 
> > > > >>>> routine from userret(), in order to avoid tons of timer interrupts 
> > > > >>>> and context switches.  To test this theory, you could apply a 
> > > > >>>> patch like:
> > > > >>>
> > > > >>> It's affecting overall system performance, bonnie was just a way to
> > > > >>> get some numbers to compare.
> > > > >>>
> > > > >>> Tomorrow I will test patch.
> > > > >>>
> > > > >>> Thanks!
> > > > >>>
> > > > >>> --
> > > > >>> Nuno Teixeira
> > > > >>> FreeBSD Committer (ports)
> > > > >>
> > > > >>
> > > > >>
> > > > >> -- 
> > > > >> Nuno Teixeira
> > > > >> FreeBSD Committer (ports)
> > > > 
> > > 
> 


Re: Request for Testing: TCP RACK

2024-03-18 Thread Drew Gallatin
I got the idea from 
https://people.mpi-sws.org/~druschel/publications/soft-timers-tocs.pdf.  The 
gist is that the TCP pacing stuff needs to run frequently, and rather than run 
it out of a clock interrupt, it's more efficient to run it out of a system call 
context at just the point where we return to userspace and the cache is trashed 
anyway.  The current implementation is fine for our workload, but probably not 
ideal for a generic system, especially one where something is banging on system 
calls.

ASTs could be the right tool for this, but I'm super unfamiliar with them, and 
I can't find any docs on them.

Would ast_register(0, ASTR_UNCOND, 0, func) be roughly equivalent to what's 
happening here?
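
A hedged sketch of what the AST route might look like, built around the 
ast_register() call quoted above; TDA_TCPHPTS is a hypothetical AST slot that 
would have to be added, and the callback signature is assumed from ast(9):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/proc.h>
#include <netinet/tcp_hpts.h>   /* assumed home of tcp_hpts_softclock() */

/* TDA_TCPHPTS would need to be added to the AST slots in sys/proc.h. */
static void
tcp_hpts_ast(struct thread *td, int tda)
{
        tcp_hpts_softclock();   /* run the pacing work on return to userspace */
}

static void
tcp_hpts_ast_init(void *arg __unused)
{
        /* ASTR_UNCOND: run on every return to userspace once registered. */
        ast_register(TDA_TCPHPTS, ASTR_UNCOND, 0, tcp_hpts_ast);
}
SYSINIT(tcp_hpts_ast, SI_SUB_SOFTINTR, SI_ORDER_ANY, tcp_hpts_ast_init, NULL);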

Drew

On Mon, Mar 18, 2024, at 2:33 PM, Konstantin Belousov wrote:
> On Mon, Mar 18, 2024 at 07:26:10AM -0500, Mike Karels wrote:
> > On 18 Mar 2024, at 7:04, tue...@freebsd.org wrote:
> > 
> > >> On 18. Mar 2024, at 12:42, Nuno Teixeira  wrote:
> > >>
> > >> Hello all!
> > >>
> > >> It works just fine!
> > >> System performance is OK.
> > >> Using patch on main-n268841-b0aaf8beb126(-dirty).
> > >>
> > >> ---
> > >> net.inet.tcp.functions_available:
> > >> Stack   D Alias   PCB count
> > >> freebsd   freebsd  0
> > >> rack* rack 38
> > >> ---
> > >>
> > >> It would be so nice if we could have a sysctl tunable for this patch,
> > >> so we could do more tests without recompiling the kernel.
> > > Thanks for testing!
> > >
> > > @gallatin: can you come up with a patch that is acceptable for Netflix
> > > and allows mitigating the performance regression?
> > 
> > Ideally, tcphpts could enable this automatically when it starts to be
> > used (enough?), but a sysctl could select auto/on/off.
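
For illustration, a hedged sketch of what such an auto/on/off knob could look 
like; the sysctl name, the parent node, and the variable are assumptions, not 
from an actual patch:

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/sysctl.h>

SYSCTL_DECL(_net_inet_tcp_hpts);        /* assumes the existing hpts node */

#define HPTS_USERRET_AUTO       0
#define HPTS_USERRET_ON         1
#define HPTS_USERRET_OFF        2

static int tcp_hpts_userret_mode = HPTS_USERRET_AUTO;
SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, userret_mode, CTLFLAG_RWTUN,
    &tcp_hpts_userret_mode, 0,
    "Run hpts softclock from userret: 0=auto, 1=on, 2=off");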
> There is already a well-known mechanism to request execution of a
> specific function on return to userspace, namely AST.  The difference
> from the current hack is that the execution is requested for one callback
> in the context of a specific thread.
> 
> Still, it might be worth a try to use it; what is the reason to hit a thread
> that does not do networking, with TCP processing?
> 
> > 
> > Mike
> > 
> > > Best regards
> > > Michael
> > >>
> > >> Thanks all!
> > >> Really happy here :)
> > >>
> > >> Cheers,
> > >>
> > >> Nuno Teixeira  wrote (Sunday, 17/03/2024 at
> > >> 20:26):
> > >>>
> > >>> Hello,
> > >>>
> >  I don't have the full context, but it seems like the complaint is a 
> >  performance regression in bonnie++ and perhaps other things when 
> >  tcp_hpts is loaded, even when it is not used.  Is that correct?
> > 
> >  If so, I suspect it's because we drive the tcp_hpts_softclock() routine 
> >  from userret(), in order to avoid tons of timer interrupts and context 
> >  switches.  To test this theory, you could apply a patch like:
> > >>>
> > >>> It's affecting overall system performance, bonnie was just a way to
> > >>> get some numbers to compare.
> > >>>
> > >>> Tomorrow I will test patch.
> > >>>
> > >>> Thanks!
> > >>>
> > >>> --
> > >>> Nuno Teixeira
> > >>> FreeBSD Committer (ports)
> > >>
> > >>
> > >>
> > >> -- 
> > >> Nuno Teixeira
> > >> FreeBSD Committer (ports)
> > 
> 


Re: Request for Testing: TCP RACK

2024-03-17 Thread Drew Gallatin
Resending with the patch as an attachment.

Drew

On Sun, Mar 17, 2024, at 11:39 AM, Drew Gallatin wrote:
> I don't have the full context, but it seems like the complaint is a 
> performance regression in bonnie++ and perhaps other things when tcp_hpts is 
> loaded, even when it is not used.  Is that correct?
> 
> If so, I suspect it's because we drive the tcp_hpts_softclock() routine from 
> userret(), in order to avoid tons of timer interrupts and context switches.  
> To test this theory, you could apply a patch like:
> 
> diff --git a/sys/kern/subr_trap.c b/sys/kern/subr_trap.c
> index e9a16cd0b36e..54b540c97123 100644
> --- a/sys/kern/subr_trap.c
> +++ b/sys/kern/subr_trap.c
> @@ -138,7 +138,7 @@ userret(struct thread *td, struct trapframe *frame)
>  * Software Timer Support for Network Processing"
>  * by Mohit Aron and Peter Druschel.
>  */
> -   tcp_hpts_softclock();
> +   /*tcp_hpts_softclock();*/
> /*
>  * Let the scheduler adjust our priority etc.
>  */
> 
> 
> If that fixes it, I suspect we should either make this hook optional for 
> casual users of tcp_hpts(), or add some kind of "last called" timestamp to 
> prevent it from being called over and over on workloads that are
> syscall-heavy.
> 
> Note that for non-casual users of hpts (like Netflix, with hundreds of 
> thousands of TCP connections managed by hpts), this call is a huge win, so I 
> think we'd prefer that it remain in some form.
> 
> Drew
> 

nerf_tcp_hpts_softclock.diff
Description: Binary data


Re: Request for Testing: TCP RACK

2024-03-17 Thread Drew Gallatin
I don't have the full context, but it seems like the complaint is a performance 
regression in bonnie++ and perhaps other things when tcp_hpts is loaded, even 
when it is not used.  Is that correct?

If so, I suspect it's because we drive the tcp_hpts_softclock() routine from 
userret(), in order to avoid tons of timer interrupts and context switches.  To 
test this theory, you could apply a patch like:

diff --git a/sys/kern/subr_trap.c b/sys/kern/subr_trap.c
index e9a16cd0b36e..54b540c97123 100644
--- a/sys/kern/subr_trap.c
+++ b/sys/kern/subr_trap.c
@@ -138,7 +138,7 @@ userret(struct thread *td, struct trapframe *frame)
 * Software Timer Support for Network Processing"
 * by Mohit Aron and Peter Druschel.
 */
-   tcp_hpts_softclock();
+   /*tcp_hpts_softclock();*/
/*
 * Let the scheduler adjust our priority etc.
 */


If that fixes it, I suspect we should either make this hook optional for casual 
users of tcp_hpts(), or add some kind of "last called" timestamp to prevent it 
from being called over and over on workloads that are syscall-heavy.
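
A rough sketch of the "last called" idea; the names and the 1 ms interval are 
made up, and the unlocked timestamp is tolerable here since an occasional extra 
or skipped call is harmless:

#include <sys/param.h>
#include <sys/time.h>
#include <netinet/tcp_hpts.h>   /* assumed home of tcp_hpts_softclock() */

static sbintime_t tcp_hpts_last_softclock;      /* hypothetical */

static __inline void
tcp_hpts_softclock_maybe(void)
{
        sbintime_t now;

        /* Skip the hook if it already ran within the last millisecond. */
        now = getsbinuptime();
        if (now - tcp_hpts_last_softclock < SBT_1MS)
                return;
        tcp_hpts_last_softclock = now;
        tcp_hpts_softclock();
}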

Note that for non-casual users of hpts (like Netflix, with hundreds of 
thousands of TCP connections managed by hpts), this call is a huge win, so I 
think we'd prefer that it remain in some form.

Drew


Re: IFF_KNOWSEPOCH -> IFF_NEEDSEPOCH

2023-04-07 Thread Drew Gallatin
This sounds like a good plan to me.

On Thu, Apr 6, 2023, at 2:34 PM, Gleb Smirnoff wrote:
>   Hi,
> 
> recently we had several drivers marked with IFF_KNOWSEPOCH
> which reminded me that this flag was supposed to be temporary.
> 
> Here is the change that introduced it: e87c4940156.  It was
> caused by several drivers sending packets from non-interrupt
> context and thus triggering NET_EPOCH_ASSERT().  About 10-20
> drivers had this problem initially, reduced down to a few after
> 4426b2e64bd.  We had a pretty heated discussion back then, and
> I apologize for that.
> 
> May I suggest that, before entering the FreeBSD 14.0-RELEASE cycle,
> we go back to what was there before e87c4940156?
> 
> To avoid the driver fallout that we used to have back in early 2020,
> here is the plan.  In ether_input(), where we now enter/exit the
> epoch conditionally on the IFF_KNOWSEPOCH flag, with INVARIANTS we
> will also conditionally enter/exit in case we are supposed to be in
> the epoch according to the flag but are not.  We will also print a
> warning once, like "interface foo0 called if_input without epoch".
> This handling will be converted to a normal assertion after a couple
> of months.
> 
> If everybody is fine with this suggestion I will post
> a review. Otherwise please express your concerns.
> 
> -- 
> Gleb Smirnoff
>
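
For reference, a rough sketch of the ether_input() handling described above; 
the helper name is hypothetical, and a real change would also track 
per-interface state so the warning prints only once:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>
#include <sys/epoch.h>
#include <net/if.h>
#include <net/if_var.h>

static void
ether_input_epoch_check(struct ifnet *ifp, struct mbuf *m)
{
        struct epoch_tracker et;
        bool entered = false;

        if (!in_epoch(net_epoch_preempt)) {
                /* Warn (once per interface in a real patch) and cover for the driver. */
                if_printf(ifp, "called if_input without epoch\n");
                NET_EPOCH_ENTER(et);
                entered = true;
        }
        /* ... existing ether_input() processing of m ... */
        if (entered)
                NET_EPOCH_EXIT(et);
}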