Re: [rfc] bind per-cpu timeout threads to each CPU

2014-05-09 Thread John Baldwin
On Thursday, May 08, 2014 11:43:39 pm Adrian Chadd wrote:
 Hi,
 
 I'd like to revisit this now.
 
 I'd like to commit this stuff as-is and then take some time to revisit
 the catch-all softclock from cpu0 swi. It's more complicated than it
 needs to be as it just assumes timeout_cpu == cpuid of cpu 0. So
 there's no easy way to slide in a new catch-all softclock.
 
 Once that's done I'd like to then experiment with turning on the pcpu
 tcp timer stuff and gluing that into the RSS CPU ID / netisr ID stuff.
 
 Thanks,

To be clear, are you going to commit the change to bind all but CPU 0
to their CPU but let the default swi float for now?  I think that is
fine to commit, but I wouldn't want to bind the default swi for now.

 -a
 
 
 On 20 February 2014 13:48, Adrian Chadd adr...@freebsd.org wrote:
  On 20 February 2014 11:17, John Baldwin j...@freebsd.org wrote:
 
  (A further variant of this would be to divorce cpu0's swi from the
  catch-all softclock and let the catch-all softclock float, but bind
  all the per-cpu swis)
 
  I like this idea. If something (eg per-CPU TCP timers, if it's turned
  on) makes a very specific decision about the CPU then it should be
  fixed. Otherwise a lot of the underlying assumptions for things like
  RSS just aren't guaranteed to hold.
 
  It could also perhaps extend to some abstract pool of CPUs later, if
  we wanted to do things like one flowing swi per socket or whatnot when
  we start booting on 1024 core boxes...
 
  -a
 

-- 
John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [rfc] bind per-cpu timeout threads to each CPU

2014-05-09 Thread Adrian Chadd
On 9 May 2014 10:49, John Baldwin j...@freebsd.org wrote:
 On Thursday, May 08, 2014 11:43:39 pm Adrian Chadd wrote:
 Hi,

 I'd like to revisit this now.

 I'd like to commit this stuff as-is and then take some time to revisit
 the catch-all softclock from cpu0 swi. It's more complicated than it
 needs to be as it just assumes timeout_cpu == cpuid of cpu 0. So
 there's no easy way to slide in a new catch-all softclock.

 Once that's done I'd like to then experiment with turning on the pcpu
 tcp timer stuff and gluing that into the RSS CPU ID / netisr ID stuff.

 Thanks,

 To be clear, are you going to commit the change to bind all but CPU 0
 to their CPU but let the default swi float for now?  I think that is
 fine to commit, but I wouldn't want to bind the default swi for now.

I'd like to do it in the other order and bind everything, so things
like the per-CPU TCP timer thing can be flipped on for RSS and
actually be useful.

I'm looking into what it'd take to create a separate default swi as
well as a cpu-0 swi but as I said, it's pretty hairy there.

How about I instead do this compromise:

* I'll pin all the other swis
* the default swi isn't pinned by default, but one can flip a sysctl at
boot time to pin it

How's that sound?
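For concreteness, here's a rough sketch of what that compromise could look
like in kern_timeout.c. The tunable name kern.pin_default_swi and the exact
placement are illustrative assumptions, not committed code:

```c
/*
 * Hypothetical sketch: per-CPU swis are always bound; the default swi
 * is bound only if a loader tunable asks for it.  Names are
 * illustrative only.
 */
static int pin_default_swi = 0;		/* off by default */
TUNABLE_INT("kern.pin_default_swi", &pin_default_swi);
SYSCTL_INT(_kern, OID_AUTO, pin_default_swi, CTLFLAG_RDTUN,
    &pin_default_swi, 0, "Pin the default timeout swi to CPU 0 at boot");

/* ... in softclock swi setup ... */

	/* Default swi: bind only when the tunable was set at boot. */
	if (pin_default_swi &&
	    intr_event_bind(clk_intr_event, timeout_cpu) != 0)
		printf("%s: default swi couldn't be pinned to cpu %d\n",
		    __func__, timeout_cpu);

	/* Per-CPU swis: always bind under this scheme. */
	if (intr_event_bind(ie, cpu) != 0)
		printf("%s: per-cpu swi couldn't be pinned to cpu %d\n",
		    __func__, cpu);
```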


-a


Re: [rfc] bind per-cpu timeout threads to each CPU

2014-05-09 Thread Peter Grehan

 How about I instead do the compromise:

* i'll pin all other swi's
* default swi isn't pinned by default, but one can flip on a sysctl at
boot time to pin it

How's that sound?


 And also, please add a sysctl that disables any swi pinning.

 It is sometimes useful to change the default cpuset, for instance to 
allocate a subset of CPUs to some particular applications and not 
FreeBSD. Having kernel threads pinned prevents this from happening since 
they are in the default set.


 (Note that some network drivers are also culprits here, though 
disabling MSI-x in them is a workaround).


later,

Peter.



Re: [rfc] bind per-cpu timeout threads to each CPU

2014-05-09 Thread Adrian Chadd
On 9 May 2014 12:50, Peter Grehan gre...@freebsd.org wrote:
 How about I instead do the compromise:

 * i'll pin all other swi's
 * default swi isn't pinned by default, but one can flip on a sysctl at
 boot time to pin it

 How's that sound?


  And also please a sysctl that disables any swi pinning.

  It is sometimes useful to change the default cpuset, for instance to
 allocate a subset of CPUs to some particular applications and not FreeBSD.
 Having kernel threads pinned prevents this from happening since they are in
 the default set.

  (Note that some network drivers are also culprits here, though disabling
 MSI-x in them is a workaround).

Yup. I've just done that.

http://people.freebsd.org/~adrian/norse/20140509-swi-pin-1.diff

Which workloads are you thinking about? Maybe we could introduce some
higher-level description at boot time of which CPU(s) to dedicate to
FreeBSD stuff, and then avoid starting things like pcpu swis and NIC
threads on the remaining CPUs.

Can you think of situations where we'd want to have per-cpu swi's even
_running_ for CPUs that you want to dedicate to other things? There's
nothing stopping you from scheduling a callout on a different target
CPU.

(It'd require some work, but it likely should be done in some form as
part of overall RSS framework hacking.)
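As a caller-side sketch of "scheduling a callout on a different target CPU":
callout(9) already lets code pick the target CPU explicitly. The flow
structure and its RSS CPU field below are hypothetical, purely for
illustration:

```c
/*
 * Hypothetical sketch: arming a flow's timer on an explicitly chosen
 * CPU via callout(9).  struct flow and f_rss_cpu are illustrative.
 */
struct flow {
	struct callout	f_timer;
	int		f_rss_cpu;	/* CPU chosen by RSS for this flow */
};

static void
flow_timer_arm(struct flow *f, int ticks, void (*fn)(void *))
{
	/*
	 * callout_reset_on() queues the callout on f_rss_cpu's per-CPU
	 * callout wheel, so the handler runs in that CPU's timeout swi
	 * regardless of where the caller is running.
	 */
	callout_reset_on(&f->f_timer, ticks, fn, f, f->f_rss_cpu);
}
```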



-a


Re: [rfc] bind per-cpu timeout threads to each CPU

2014-05-09 Thread John Baldwin
On Friday, May 09, 2014 3:50:28 pm Peter Grehan wrote:
  How about I instead do the compromise:
 
  * i'll pin all other swi's
  * default swi isn't pinned by default, but one can flip on a sysctl at
  boot time to pin it
 
  How's that sound?
 
   And also please a sysctl that disables any swi pinning.
 
   It is sometimes useful to change the default cpuset, for instance to 
 allocate a subset of CPUs to some particular applications and not 
 FreeBSD. Having kernel threads pinned prevents this from happening since 
 they are in the default set.
 
   (Note that some network drivers are also culprits here, though 
 disabling MSI-x in them is a workaround).

I'd actually like a way to exempt certain kernel threads that are inherently
per-CPU (such as queues for NIC drivers or per-CPU swi threads) from the
default cpuset so that they don't break 'cpuset -l 0 -s 1'.  Providing some
sort of way to disable the pinning for now should be good, but I think I'd
eventually prefer the former suggestion.

-- 
John Baldwin


Re: [rfc] bind per-cpu timeout threads to each CPU

2014-05-09 Thread Peter Grehan

Yup. I've just done that.

http://people.freebsd.org/~adrian/norse/20140509-swi-pin-1.diff


 Thanks, that'll work.


Which workloads are you thinking about? Maybe we could introduce some
higher level description of which CPU(s) at boot time to do freebsd
stuff on, and then don't start things like pcpu swi's and NIC threads
on those CPUs.


 A classic case is partitioning cores into control and data plane 
groups. I'm sure there are lots more. What's nice about cpuset is that 
the choice and change can be dynamic, so long as there aren't pinned 
threads in the default set.


 An option to restrict FreeBSD pCPU threads to a subset could be useful.


Can you think of situations where we'd want to have per-cpu swi's even
_running_ for CPUs that you want to dedicate to other things? There's
nothing stopping you from scheduling a callout on a different target
CPU.


 At least for the uses I know, it's complete isolation from other 
processing, kernel threads included. The 'freebsd stuff' info you 
mentioned should be sufficient.


later,

Peter.


Re: [rfc] bind per-cpu timeout threads to each CPU

2014-05-09 Thread Adrian Chadd
On 9 May 2014 16:49, Peter Grehan gre...@freebsd.org wrote:
 Yup. I've just done that.

 http://people.freebsd.org/~adrian/norse/20140509-swi-pin-1.diff


  Thanks, that'll work.


 Which workloads are you thinking about? Maybe we could introduce some
 higher level description of which CPU(s) at boot time to do freebsd
 stuff on, and then don't start things like pcpu swi's and NIC threads
 on those CPUs.


  A classic case is partitioning cores into control and data plane groups.
 I'm sure there are lots more. What's nice about cpuset is that the choice
 and change can be dynamic, so long as there aren't pinned threads in the
 default set.

  An option to restrict FreeBSD pCPU threads to a subset could be useful.

Cool.

 Can you think of situations where we'd want to have per-cpu swi's even
 _running_ for CPUs that you want to dedicate to other things? There's
 nothing stopping you from scheduling a callout on a different target
 CPU.

  At least for the uses I know, it's complete isolation from other
 processing, kernel threads included. The 'freebsd stuff' info you mentioned
 should be sufficient.

Cool.

I'll look at committing this stuff in the next few hours. It can
always easily be undone.

I'll revisit the "limit pcpu threads to cpuids x" idea later.


-a


Re: [rfc] bind per-cpu timeout threads to each CPU

2014-05-09 Thread Adrian Chadd
OK, I've committed it, but I've left the default at "don't pin".

That way the existing behaviour hasn't changed and it's easy to flip
on to play with.

Thanks for the feedback John / Peter!


-a


Re: [rfc] bind per-cpu timeout threads to each CPU

2014-05-08 Thread Adrian Chadd
Hi,

I'd like to revisit this now.

I'd like to commit this stuff as-is and then take some time to revisit
the catch-all softclock from cpu0 swi. It's more complicated than it
needs to be as it just assumes timeout_cpu == cpuid of cpu 0. So
there's no easy way to slide in a new catch-all softclock.

Once that's done I'd like to then experiment with turning on the pcpu
tcp timer stuff and gluing that into the RSS CPU ID / netisr ID stuff.

Thanks,

-a


On 20 February 2014 13:48, Adrian Chadd adr...@freebsd.org wrote:
 On 20 February 2014 11:17, John Baldwin j...@freebsd.org wrote:

 (A further variant of this would be to divorce cpu0's swi from the
 catch-all softclock and let the catch-all softclock float, but bind
 all the per-cpu swis)

 I like this idea. If something (eg per-CPU TCP timers, if it's turned
 on) makes a very specific decision about the CPU then it should be
 fixed. Otherwise a lot of the underlying assumptions for things like
 RSS just aren't guaranteed to hold.

 It could also perhaps extend to some abstract pool of CPUs later, if
 we wanted to do things like one flowing swi per socket or whatnot when
 we start booting on 1024 core boxes...

 -a


Re: [rfc] bind per-cpu timeout threads to each CPU

2014-02-20 Thread John Baldwin
On Wednesday, February 19, 2014 4:02:54 pm John Baldwin wrote:
 On Wednesday, February 19, 2014 3:04:51 pm Adrian Chadd wrote:
  On 19 February 2014 11:59, Alexander Motin m...@freebsd.org wrote:
  
   So if we're moving towards supporting (among others) a pcbgroup / RSS
   hash style work load distribution across CPUs to minimise
   per-connection lock contention, we really don't want the scheduler to
   decide it can schedule things on other CPUs under enough pressure.
   That'll just make things worse.
  
   True, though it is also not obvious that putting second thread on CPU run
   queue is better then executing it right now on another core.
  
  Well, it depends if you're trying to optimise for run all runnable
  tasks as quickly as possible or run all runnable tasks in contexts
  that minimise lock contention.
  
  The former sounds great as long as there's no real lock contention
  going on. But as you add more chances for contention (something like
  100,000 concurrent TCP flows) then you may end up having your TCP
  timer firing stuff interfere with more TXing or RXing on the same
  connection.
  
  Chasing this stuff down is a pain, because it only really shows up
  when you're doing lots of concurrency.
  
  I'm happy to make this a boot-time option and leave it off for the
  time being. How's that?
 
 I think having it be a tunable would be good.  OTOH, I could also
 see another option which would be to pin all clock threads except
 for the default one by default and only have the option control
 whether or not the default thread is pinned to CPU 0 as callers
 who use callout_on() are explicitly asking to run the callout on a
 specific CPU.

(A further variant of this would be to divorce cpu0's swi from the
catch-all softclock and let the catch-all softclock float, but bind
all the per-cpu swis)

-- 
John Baldwin


Re: [rfc] bind per-cpu timeout threads to each CPU

2014-02-20 Thread Adrian Chadd
On 20 February 2014 11:17, John Baldwin j...@freebsd.org wrote:

 (A further variant of this would be to divorce cpu0's swi from the
 catch-all softclock and let the catch-all softclock float, but bind
 all the per-cpu swis)

I like this idea. If something (eg per-CPU TCP timers, if it's turned
on) makes a very specific decision about the CPU then it should be
fixed. Otherwise a lot of the underlying assumptions for things like
RSS just aren't guaranteed to hold.

It could also perhaps extend to some abstract pool of CPUs later, if
we wanted to do things like one flowing swi per socket or whatnot when
we start booting on 1024 core boxes...

-a


Re: [rfc] bind per-cpu timeout threads to each CPU

2014-02-19 Thread Alexander Motin

Hi.

Clock interrupt threads, like other threads, are only softly bound to 
specific CPUs: the scheduler prefers to run them on the CPUs where they 
were last scheduled. So far that has been enough to balance load while 
still allowing threads to migrate if needed. Is that too flexible for 
some use case?


--
Alexander Motin


Re: [rfc] bind per-cpu timeout threads to each CPU

2014-02-19 Thread Adrian Chadd
On 19 February 2014 11:40, Alexander Motin m...@freebsd.org wrote:
 Hi.

 Clock interrupt threads, same as other ones are only softly bound to
 specific CPUs by scheduler preferring to run them on CPUs where they are
 scheduled. So far that was enough to balance load, but allowed threads to
 migrate, if needed. Is it too flexible for some use case?

I saw it migrate under enough CPU load / pressure, right smack bang in
the middle of doing TCP processing.

So if we're moving towards supporting (among others) a pcbgroup / RSS
hash style work load distribution across CPUs to minimise
per-connection lock contention, we really don't want the scheduler to
decide it can schedule things on other CPUs under enough pressure.
That'll just make things worse.


-a


Re: [rfc] bind per-cpu timeout threads to each CPU

2014-02-19 Thread Alexander Motin

On 19.02.2014 21:51, Adrian Chadd wrote:

On 19 February 2014 11:40, Alexander Motin m...@freebsd.org wrote:

Clock interrupt threads, same as other ones are only softly bound to
specific CPUs by scheduler preferring to run them on CPUs where they are
scheduled. So far that was enough to balance load, but allowed threads to
migrate, if needed. Is it too flexible for some use case?


I saw it migrate under enough CPU load / pressure, right smack bang in
the middle of doing TCP processing.

So if we're moving towards supporting (among others) a pcbgroup / RSS
hash style work load distribution across CPUs to minimise
per-connection lock contention, we really don't want the scheduler to
decide it can schedule things on other CPUs under enough pressure.
That'll just make things worse.


True, though it is also not obvious that putting a second thread on a 
CPU's run queue is better than executing it right now on another core.


--
Alexander Motin


Re: [rfc] bind per-cpu timeout threads to each CPU

2014-02-19 Thread Adrian Chadd
On 19 February 2014 11:59, Alexander Motin m...@freebsd.org wrote:

 So if we're moving towards supporting (among others) a pcbgroup / RSS
 hash style work load distribution across CPUs to minimise
 per-connection lock contention, we really don't want the scheduler to
 decide it can schedule things on other CPUs under enough pressure.
 That'll just make things worse.

 True, though it is also not obvious that putting second thread on CPU run
 queue is better then executing it right now on another core.

Well, it depends on whether you're trying to optimise for "run all
runnable tasks as quickly as possible" or "run all runnable tasks in
contexts that minimise lock contention".

The former sounds great as long as there's no real lock contention
going on. But as you add more chances for contention (something like
100,000 concurrent TCP flows) then you may end up having your TCP
timer firing stuff interfere with more TXing or RXing on the same
connection.

Chasing this stuff down is a pain, because it only really shows up
when you're doing lots of concurrency.

I'm happy to make this a boot-time option and leave it off for the
time being. How's that?



-a


Re: [rfc] bind per-cpu timeout threads to each CPU

2014-02-19 Thread Alfred Perlstein


On 2/19/14, 12:04 PM, Adrian Chadd wrote:

On 19 February 2014 11:59, Alexander Motin m...@freebsd.org wrote:


So if we're moving towards supporting (among others) a pcbgroup / RSS
hash style work load distribution across CPUs to minimise
per-connection lock contention, we really don't want the scheduler to
decide it can schedule things on other CPUs under enough pressure.
That'll just make things worse.

True, though it is also not obvious that putting second thread on CPU run
queue is better then executing it right now on another core.

Well, it depends if you're trying to optimise for run all runnable
tasks as quickly as possible or run all runnable tasks in contexts
that minimise lock contention.

The former sounds great as long as there's no real lock contention
going on. But as you add more chances for contention (something like
100,000 concurrent TCP flows) then you may end up having your TCP
timer firing stuff interfere with more TXing or RXing on the same
connection.

Chasing this stuff down is a pain, because it only really shows up
when you're doing lots of concurrency.

I'm happy to make this a boot-time option and leave it off for the
time being. How's that?


options THROUGHPUT

Yes, this looks like a latency vs. throughput issue.  One giant switch 
might be a starting point, so that it doesn't become a death of 1000 
switches to get throughput- or latency-sensitive work done.






-a


Re: [rfc] bind per-cpu timeout threads to each CPU

2014-02-19 Thread John Baldwin
On Wednesday, February 19, 2014 3:04:51 pm Adrian Chadd wrote:
 On 19 February 2014 11:59, Alexander Motin m...@freebsd.org wrote:
 
  So if we're moving towards supporting (among others) a pcbgroup / RSS
  hash style work load distribution across CPUs to minimise
  per-connection lock contention, we really don't want the scheduler to
  decide it can schedule things on other CPUs under enough pressure.
  That'll just make things worse.
 
  True, though it is also not obvious that putting second thread on CPU run
  queue is better then executing it right now on another core.
 
 Well, it depends if you're trying to optimise for run all runnable
 tasks as quickly as possible or run all runnable tasks in contexts
 that minimise lock contention.
 
 The former sounds great as long as there's no real lock contention
 going on. But as you add more chances for contention (something like
 100,000 concurrent TCP flows) then you may end up having your TCP
 timer firing stuff interfere with more TXing or RXing on the same
 connection.
 
 Chasing this stuff down is a pain, because it only really shows up
 when you're doing lots of concurrency.
 
 I'm happy to make this a boot-time option and leave it off for the
 time being. How's that?

I think having it be a tunable would be good.  OTOH, I could also
see another option which would be to pin all clock threads except
for the default one by default and only have the option control
whether or not the default thread is pinned to CPU 0 as callers
who use callout_on() are explicitly asking to run the callout on a
specific CPU.

-- 
John Baldwin


Re: [rfc] bind per-cpu timeout threads to each CPU

2014-02-19 Thread Alexander Motin

On 19.02.2014 22:04, Adrian Chadd wrote:

On 19 February 2014 11:59, Alexander Motin m...@freebsd.org wrote:


So if we're moving towards supporting (among others) a pcbgroup / RSS
hash style work load distribution across CPUs to minimise
per-connection lock contention, we really don't want the scheduler to
decide it can schedule things on other CPUs under enough pressure.
That'll just make things worse.



True, though it is also not obvious that putting second thread on CPU run
queue is better then executing it right now on another core.


Well, it depends if you're trying to optimise for run all runnable
tasks as quickly as possible or run all runnable tasks in contexts
that minimise lock contention.

The former sounds great as long as there's no real lock contention
going on. But as you add more chances for contention (something like
100,000 concurrent TCP flows) then you may end up having your TCP
timer firing stuff interfere with more TXing or RXing on the same
connection.


100K TCP flows probably means 100K locks. That means the chance of a 
lock collision on each of them is effectively zero. More realistically, 
we could talk about cache coherency traffic, etc., but I still think the 
number of expired timeouts should be much lower than the number of 
other flow data accesses.



Chasing this stuff down is a pain, because it only really shows up
when you're doing lots of concurrency.

I'm happy to make this a boot-time option and leave it off for the
time being. How's that?


I generally hate tunables like that. There are too few people who would 
even try to make a grounded decision on that question. If you think it 
is right -- just do it; otherwise -- don't. I am not really objecting, 
more like voicing concerns.


--
Alexander Motin


Re: [rfc] bind per-cpu timeout threads to each CPU

2014-02-19 Thread Slawa Olhovchenkov
On Wed, Feb 19, 2014 at 11:04:49PM +0200, Alexander Motin wrote:

 On 19.02.2014 22:04, Adrian Chadd wrote:
  On 19 February 2014 11:59, Alexander Motin m...@freebsd.org wrote:
 
  So if we're moving towards supporting (among others) a pcbgroup / RSS
  hash style work load distribution across CPUs to minimise
  per-connection lock contention, we really don't want the scheduler to
  decide it can schedule things on other CPUs under enough pressure.
  That'll just make things worse.
 
  True, though it is also not obvious that putting second thread on CPU run
  queue is better then executing it right now on another core.
 
  Well, it depends if you're trying to optimise for run all runnable
  tasks as quickly as possible or run all runnable tasks in contexts
  that minimise lock contention.
 
  The former sounds great as long as there's no real lock contention
  going on. But as you add more chances for contention (something like
  100,000 concurrent TCP flows) then you may end up having your TCP
  timer firing stuff interfere with more TXing or RXing on the same
  connection.
 
 100K TCP flows probably means 100K locks. That means that chance of lock 
 collision on each of them is effectively zero. More realistic it could 

What about 100K/N_cpu*PPS timer queue lock acquisitions for
removing/inserting TCP timeout callbacks?


Re: [rfc] bind per-cpu timeout threads to each CPU

2014-02-19 Thread Alexander Motin

On 19.02.2014 23:44, Slawa Olhovchenkov wrote:

On Wed, Feb 19, 2014 at 11:04:49PM +0200, Alexander Motin wrote:


On 19.02.2014 22:04, Adrian Chadd wrote:

On 19 February 2014 11:59, Alexander Motin m...@freebsd.org wrote:


So if we're moving towards supporting (among others) a pcbgroup / RSS
hash style work load distribution across CPUs to minimise
per-connection lock contention, we really don't want the scheduler to
decide it can schedule things on other CPUs under enough pressure.
That'll just make things worse.



True, though it is also not obvious that putting second thread on CPU run
queue is better then executing it right now on another core.


Well, it depends if you're trying to optimise for run all runnable
tasks as quickly as possible or run all runnable tasks in contexts
that minimise lock contention.

The former sounds great as long as there's no real lock contention
going on. But as you add more chances for contention (something like
100,000 concurrent TCP flows) then you may end up having your TCP
timer firing stuff interfere with more TXing or RXing on the same
connection.


100K TCP flows probably means 100K locks. That means that chance of lock
collision on each of them is effectively zero. More realistic it could


What about 100K/N_cpu*PPS timer's queue locks for remove/insert TCP
timeouts callbacks?


I am not sure what this formula means, but yes, per-CPU callout locks 
are much more likely to be congested. They are only per-CPU, not per-flow.


--
Alexander Motin


Re: [rfc] bind per-cpu timeout threads to each CPU

2014-02-19 Thread Adrian Chadd
On 19 February 2014 14:09, Alexander Motin m...@freebsd.org wrote:
 On 19.02.2014 23:44, Slawa Olhovchenkov wrote:

 On Wed, Feb 19, 2014 at 11:04:49PM +0200, Alexander Motin wrote:

 On 19.02.2014 22:04, Adrian Chadd wrote:

 On 19 February 2014 11:59, Alexander Motin m...@freebsd.org wrote:

 So if we're moving towards supporting (among others) a pcbgroup / RSS
 hash style work load distribution across CPUs to minimise
 per-connection lock contention, we really don't want the scheduler to
 decide it can schedule things on other CPUs under enough pressure.
 That'll just make things worse.


 True, though it is also not obvious that putting second thread on CPU
 run
 queue is better then executing it right now on another core.


 Well, it depends if you're trying to optimise for run all runnable
 tasks as quickly as possible or run all runnable tasks in contexts
 that minimise lock contention.

 The former sounds great as long as there's no real lock contention
 going on. But as you add more chances for contention (something like
 100,000 concurrent TCP flows) then you may end up having your TCP
 timer firing stuff interfere with more TXing or RXing on the same
 connection.


 100K TCP flows probably means 100K locks. That means that chance of lock
 collision on each of them is effectively zero. More realistic it could


 What about 100K/N_cpu*PPS timer's queue locks for remove/insert TCP
 timeouts callbacks?


 I am not sure what this formula means, but yes, per-CPU callout locks can
 much more likely be congested. They are only per-CPU, not per-flow.

It's not just that, but also TX versus RX ACK processing and further
TX being done on different threads.



-a


Re: [rfc] bind per-cpu timeout threads to each CPU

2014-02-19 Thread Slawa Olhovchenkov
On Thu, Feb 20, 2014 at 12:09:04AM +0200, Alexander Motin wrote:

 On 19.02.2014 23:44, Slawa Olhovchenkov wrote:
  On Wed, Feb 19, 2014 at 11:04:49PM +0200, Alexander Motin wrote:
 
  On 19.02.2014 22:04, Adrian Chadd wrote:
  On 19 February 2014 11:59, Alexander Motin m...@freebsd.org wrote:
 
  So if we're moving towards supporting (among others) a pcbgroup / RSS
  hash style work load distribution across CPUs to minimise
  per-connection lock contention, we really don't want the scheduler to
  decide it can schedule things on other CPUs under enough pressure.
  That'll just make things worse.
 
  True, though it is also not obvious that putting second thread on CPU run
  queue is better then executing it right now on another core.
 
  Well, it depends if you're trying to optimise for run all runnable
  tasks as quickly as possible or run all runnable tasks in contexts
  that minimise lock contention.
 
  The former sounds great as long as there's no real lock contention
  going on. But as you add more chances for contention (something like
  100,000 concurrent TCP flows) then you may end up having your TCP
  timer firing stuff interfere with more TXing or RXing on the same
  connection.
 
  100K TCP flows probably means 100K locks. That means that chance of lock
  collision on each of them is effectively zero. More realistic it could
 
  What about 100K/N_cpu*PPS timer's queue locks for remove/insert TCP
  timeouts callbacks?
 
 I am not sure what this formula means, but yes, per-CPU callout locks 
 can much more likely be congested. They are only per-CPU, not per-flow.

100K TCP flows distributed among the CPUs (100K/N_cpu);
every TCP flow touches its callout several times per second (*PPS).


[rfc] bind per-cpu timeout threads to each CPU

2014-02-18 Thread Adrian Chadd
Hi,

This patch binds the per-CPU timeout swi threads to the CPU they're on.

The scheduler may decide to migrate a preempted kernel thread that's
still runnable to another CPU, and this can happen during things like
per-CPU TCP timers firing.

Thanks,

Index: sys/kern/kern_timeout.c
===================================================================
--- sys/kern/kern_timeout.c	(revision 261910)
+++ sys/kern/kern_timeout.c	(working copy)
@@ -355,6 +355,7 @@
 	char name[MAXCOMLEN];
 #ifdef SMP
 	int cpu;
+	struct intr_event *ie;
 #endif
 
 	cc = CC_CPU(timeout_cpu);
@@ -362,6 +363,11 @@
 	if (swi_add(&clk_intr_event, name, softclock, cc, SWI_CLOCK,
 	    INTR_MPSAFE, &cc->cc_cookie))
 		panic("died while creating standard software ithreads");
+	if (intr_event_bind(clk_intr_event, timeout_cpu) != 0) {
+		printf("%s: timeout clock couldn't be pinned to cpu %d\n",
+		    __func__,
+		    timeout_cpu);
+	}
 #ifdef SMP
 	CPU_FOREACH(cpu) {
 		if (cpu == timeout_cpu)
@@ -370,9 +376,15 @@
 		cc->cc_callout = NULL;	/* Only cpu0 handles timeout(9). */
 		callout_cpu_init(cc);
 		snprintf(name, sizeof(name), "clock (%d)", cpu);
-		if (swi_add(NULL, name, softclock, cc, SWI_CLOCK,
+		ie = NULL;
+		if (swi_add(&ie, name, softclock, cc, SWI_CLOCK,
 		    INTR_MPSAFE, &cc->cc_cookie))
 			panic("died while creating standard software ithreads");
+		if (intr_event_bind(ie, cpu) != 0) {
+			printf("%s: per-cpu clock couldn't be pinned to cpu %d\n",
+			    __func__,
+			    cpu);
+		}
 	}
 #endif
 }


-a