DP performance

2005-11-28 Thread Danial Thom
It seems most of the banter for the past few
months is userland related. What is the state of
the kernel in terms of DP/MP kernel performance?
Has any work been done or is DFLY still in the
cleaning up stages? I'm still desperately seeking
a good reason to move to dual-core processors.

DT






Re: DP performance

2005-11-28 Thread Matthew Dillon

:It seems most of the banter for the past few
:months is userland related. What is the state of
:the kernel in terms of DP/MP kernel performance?
:Has any work been done or is DFLY still in the
:cleaning up stages? I'm still desperately seeking
:a good reason to move to dual-core processors.
:
:DT

It's getting better but there is still a lot of work to do.  After
this upcoming release (middle of December) I intend to bring in Jeff's
parallel routing code and make the TCP and UDP protocol threads MP safe.

Even in its current state (and, in fact, even using the old FreeBSD 4.x
kernels), you will reap major benefits on a dual-core cpu.  I have
been very impressed with AMD's Athlon X2 in these Shuttle XPC boxes,
despite having to buy a PCI GigE ethernet card due to motherboard
issues.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]


Re: DP performance

2005-11-28 Thread Danial Thom


--- Matthew Dillon [EMAIL PROTECTED] wrote:

 :It seems most of the banter for the past few
 :months is userland related. What is the state of
 :the kernel in terms of DP/MP kernel performance?
 :Has any work been done or is DFLY still in the
 :cleaning up stages? I'm still desperately seeking
 :a good reason to move to dual-core processors.
 :
 :DT

 It's getting better but there is still a lot of work to do.  After
 this upcoming release (middle of December) I intend to bring in Jeff's
 parallel routing code and make the TCP and UDP protocol threads MP safe.

 Even in its current state (and, in fact, even using the old FreeBSD 4.x
 kernels), you will reap major benefits on a dual-core cpu.  I have
 been very impressed with AMD's Athlon X2 in these Shuttle XPC boxes,
 despite having to buy a PCI GigE ethernet card due to motherboard
 issues.

What kind of benefits would be realized for
systems being used primarily as a router/bridge,
given that it's almost 100% kernel usage?

DT





Re: DP performance

2005-11-28 Thread Matthew Dillon
:What kind of benefits would be realized for
:systems being used primarily as a router/bridge,
:given that it's almost 100% kernel usage?
:
:DT

Routing packets doesn't take much cpu unless you are running a gigabit
of actual bandwidth (or more).  If you aren't doing anything else with 
the machine then the cheapest AMD XP will do the job.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]


Re: DP performance

2005-11-28 Thread Steve Shorter
On Mon, Nov 28, 2005 at 10:15:55AM -0800, Matthew Dillon wrote:
 :What kind of benefits would be realized for
 :systems being used primarily as a router/bridge,
 :given that it's almost 100% kernel usage?
 :
 :DT
 
 Routing packets doesn't take much cpu unless you are running a gigabit
 of actual bandwidth (or more).  If you aren't doing anything else with 
 the machine then the cheapest AMD XP will do the job.
 

We've found the bottleneck for routers is the CPU cycles necessary to
process NIC hardware interrupts, at least on OpenBSD. Interrupt mitigation, and
I suppose POLLING on DragonFly, may help, but it isn't supported on all
hardware AFAIK.

What kind of parallelism, as far as processing separate NIC hardware
interrupts on separate CPUs, can DragonFly currently support?

-steve



Re: DP performance

2005-11-28 Thread Danial Thom


--- Steve Shorter [EMAIL PROTECTED] wrote:

 On Mon, Nov 28, 2005 at 10:15:55AM -0800, Matthew Dillon wrote:
  :What kind of benefits would be realized for
  :systems being used primarily as a router/bridge,
  :given that it's almost 100% kernel usage?
  :
  :DT

  Routing packets doesn't take much cpu unless you are running a gigabit
  of actual bandwidth (or more).  If you aren't doing anything else with
  the machine then the cheapest AMD XP will do the job.

 We've found the bottleneck for routers is the CPU cycles necessary to
 process NIC hardware interrupts, at least on OpenBSD. Interrupt mitigation, and
 I suppose POLLING on DragonFly, may help, but it isn't supported on all
 hardware AFAIK.

Polling is pretty dumb with modern NICs, as most
have built-in interrupt moderation that does the
work of polling without all of the overhead (by
generating interrupts with user-definable forced
separation). At least Intels do; if others don't
then that's enough reason not to use them. Doing
500K pps with a 10K interrupts/second setting is
better than you could ever do with polling, and
the results are quite good.
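
As a rough illustration of the arithmetic behind those two figures (a sketch only, not a measurement; the function name is made up for this example), the moderation setting determines how many packets each interrupt has to cover and how far apart interrupts are forced:

```python
def moderation_stats(packets_per_sec, interrupts_per_sec):
    """Return (packets handled per interrupt, microseconds between interrupts).

    Hypothetical helper for illustration; it assumes the NIC is busy enough
    that it actually fires at the configured interrupt rate.
    """
    packets_per_interrupt = packets_per_sec / interrupts_per_sec
    interval_us = 1_000_000 / interrupts_per_sec
    return packets_per_interrupt, interval_us


batch, gap_us = moderation_stats(500_000, 10_000)
print(f"{batch:.0f} packets per interrupt, one interrupt every {gap_us:.0f} us")
```

With the numbers above, each interrupt covers about 50 packets and interrupts arrive 100 microseconds apart.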

Dealing with the NICs (processing packets, I/Os,
etc.) and the stack is what uses the cycles (a
bridge machine can do twice as many packets as a
router, for example). For network processing,
true separation of transmit and receive is
probably the only way to realize networking gains
for non-TCP/UDP operations. Slicing up the stack
will only slow things down compared to UP (think
of a full-speed relay race against a guy who
doesn't get tired: the guy without the hand-offs
will always win). The best you can probably do
is match the UP performance, but ideally have a
bunch of cpu power left over. So maybe you'd have
a UP machine that can do 800K pps and be on the
edge of livelock, and a DP machine that can do
750K pps but is still usable at the user level.

Danial







Re: DP performance

2005-11-28 Thread Matthew Dillon
If we are talking about maxing out a machine in the packet routing
role, then there are two major issues that have to be considered:

* Bus bandwidth.  e.g. PCI, PCI-X, PCIe, etc.  A standard PCI
  bus is limited to ~120 MBytes/sec, not enough for even a single GigE
  link going full duplex at full speed.  More recent buses can do better.

* Workload separation.  So e.g. if one has four interfaces and two cpus,
  each cpu could handle two interfaces.
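
The bus bandwidth point can be checked with back-of-the-envelope numbers. This is a sketch under simplifying assumptions: it uses the ~120 MBytes/sec PCI figure quoted above, takes GigE at its nominal 1 Gbit/sec per direction, and ignores framing and bus protocol overhead. The names are invented for the example.

```python
PCI_USABLE_MBYTES_PER_SEC = 120       # figure quoted above for a standard PCI bus
GIGE_BITS_PER_SEC = 1_000_000_000     # nominal GigE line rate, one direction


def link_mbytes_per_sec(bits_per_sec, full_duplex=True):
    """Raw payload rate of a link in MBytes/sec (illustrative helper).

    Ignores Ethernet framing and bus protocol overhead (an assumption).
    """
    one_way = bits_per_sec / 8 / 1_000_000
    return one_way * 2 if full_duplex else one_way


needed = link_mbytes_per_sec(GIGE_BITS_PER_SEC)
print(f"GigE full duplex needs {needed:.0f} MBytes/sec, "
      f"PCI offers about {PCI_USABLE_MBYTES_PER_SEC}")
print("fits on PCI:", needed <= PCI_USABLE_MBYTES_PER_SEC)
```

A single direction at line rate (125 MBytes/sec) already exceeds the quoted PCI figure, and full duplex needs roughly twice that.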

An MP system would not reap any real gains over UP until one had three
or more network interfaces, since two interfaces is no different from one
interface from the point of view of trying to route packets.
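
To make the workload separation idea concrete, here is a minimal sketch of a static interface-to-cpu assignment. It is only an illustration of the "four interfaces, two cpus" example, not how DragonFly distributes the work; the interface names and the function are hypothetical.

```python
def assign_interfaces(interfaces, ncpus):
    """Spread interfaces across cpus round-robin (illustrative only)."""
    assignment = {cpu: [] for cpu in range(ncpus)}
    for index, ifname in enumerate(interfaces):
        assignment[index % ncpus].append(ifname)
    return assignment


# Hypothetical interface names for a four-port router with two cpus.
layout = assign_interfaces(["em0", "em1", "em2", "em3"], 2)
for cpu, names in layout.items():
    print(f"cpu{cpu}: {', '.join(names)}")
```

With only two interfaces the same function still yields one interface per cpu, but every routed packet has to touch both interfaces, which is why the separation does not pay off until there are three or more.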

Main memory bandwidth used to be an issue but isn't so much any more.

Insofar as DragonFly goes, we can almost handle the workload separation
case now, but not quite.  We will be able to handle it with the work
going in after the release.  Even so, it will probably only matter if 
the majority of packets being routed are tiny.  Bigger packets eat far
less cpu for the amount of data transferred.
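
The packet size effect shows up in the packet rate a gigabit link can carry. The sketch below assumes standard Ethernet framing, where each frame costs an extra 20 bytes on the wire (8 bytes of preamble and start delimiter plus a 12 byte inter-frame gap); the function name is made up for the example. Since per-packet work dominates the cpu cost, fewer and larger packets mean far less work for the same bandwidth.

```python
WIRE_OVERHEAD_BYTES = 20  # 8 bytes preamble/SFD + 12 bytes inter-frame gap


def max_pps(link_bits_per_sec, frame_bytes):
    """Upper bound on frames per second, one direction, at line rate."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bits_per_sec // bits_per_frame


for size in (64, 512, 1518):
    print(f"{size:5d}-byte frames: {max_pps(1_000_000_000, size):>9,d} pps")
```

At line rate, minimum-size 64-byte frames work out to about 1.49 million packets per second, while full-size 1518-byte frames come to roughly 81 thousand.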

-Matt