Re: [pfSense-discussion] throughput - cpu, bus
Chun Wong wrote:

> snip
> Now somewhere in there is the culprit for slowing things down. I have
> been using ftp get on large files to do the measuring: Is there a better
> method? Thanks

Chun,

Yes, try iperf to do performance testing. Even better, run several transfers (think multiple threads) in parallel. Even multiple wgets on large files will perform much better. Some of the things that will limit you in the current setup are:

1- Nail up the interface speeds and duplex. Always a good idea with a commodity switch (the SMC), or any switch for that matter.

2- The SMC switch. I presume it's a store-and-forward switch (each frame is received in full, stored in memory, then forwarded). Higher-performance switches are cut-through: the header is read, the destination port is determined, and the packet is started out the second port before the tail end of the packet has finished arriving at the input port.

3- ftp. ftp is likely the worst test out there. If you could graph it in real time, you'd see a variety of issues, from the TCP sawtooth effect (when packets start to drop, TCP, depending on the OS involved, will cut the rate by 33-50% and try to ramp back up to maximum speed) to the less-than-efficient control channel (in proper ftp, not passive) on a separate TCP port, which doesn't actually help. Lots of places in the code where things slow down. Contrast that to http clients, which are multithreaded and a lot more efficient (ftp was designed for off-hours bulk transfer). Contrast that again to a true streaming protocol, which uses UDP for transport and a separate TCP channel for control. Much faster.

4- vmware. Well, it's a fake interface emulated on decent hardware. 'nuff said? Not terrible, but 3-4x the code and no direct access to the network hardware or stack itself. My quick test (same application code, same machine, same destination) showed performance drop a lot going through vmware.

5- Via chipset. OK, not super, but workable. With similar hardware (Intel Pro/100 NIC cards) I used to get a typical 8-10MBps through a cheapo store-and-forward switch (similar Broadcom chipset to the SMC).

6- TCP stack settings. TCP window sizes, etc. Most defaults are 10Mbps-friendly (that's bits/sec, not bytes), especially in Microsoft Windows, and need some help for either higher-latency links (think cable/dsl) or higher-speed networks. Performance can sometimes double with a properly set up client and server.

Hmmm, hope some of that helps. :-)

cheers, andy
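The window-size limit in point 6 follows from the bandwidth-delay product: TCP can never have more than one receive window of data in flight per round trip, so a small default window caps throughput no matter how fast the link is. A back-of-the-envelope sketch (the figures are illustrative, not measurements from this thread):

```python
def tcp_throughput_cap_bps(window_bytes: int, rtt_s: float) -> float:
    """TCP can deliver at most one receive window per round trip,
    so throughput is capped at window/RTT regardless of link speed."""
    return window_bytes * 8 / rtt_s

def window_needed_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: the window needed to keep a link full."""
    return link_bps * rtt_s / 8

# A 64 KiB window over a 2 ms round trip caps out near 262 Mbit/s,
# well below gigabit wire speed:
cap = tcp_throughput_cap_bps(64 * 1024, 0.002)

# Keeping a full gigabit link busy at the same RTT needs a ~250 KB window:
needed = window_needed_bytes(1_000_000_000, 0.002)
```

This is why tuning window sizes on both client and server can roughly double throughput, as the post says: the default window, not the wire, is often the bottleneck.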
Re: [pfSense-discussion] throughput - cpu, bus
note small edit for clarity below:

Andrew C Burnette wrote:

> Chun Wong wrote:
> > snip
> > Now somewhere in there is the culprit for slowing things down. I have
> > been using ftp get on large files to do the measuring: Is there a
> > better method? Thanks
>
> snip
>
> 5- Via chipset. OK, not super, but workable. With similar hardware
> (Intel Pro/100 NIC cards) I used to get a typical 8-10MBps through a
> cheapo store-and-forward switch (similar Broadcom chipset to the SMC).

This wasn't pfSense, just general traffic on a repeated basis to/from an NFS server through a small Broadcom-chipset switch (most commodity switches these days have the same or a similar Broadcom chipset in them, despite whatever brand they might carry). Both client and server were running Linux with little other network traffic present.
RE: [pfSense-discussion] throughput - cpu, bus
> HP DL380G3 w/ Broadcom and Intel NICs. I also ran an iperf test, but ran
> out of physical boxes to generate and receive the load at around 900Mbit

That's around the same figure I managed to generate with iperf here while testing 12 months ago.

> (I did determine the maximum xmit/receive rate of a Sun v120 running
> Solaris 8 though ;) )
> During the iperf tests, the cpu load was closer to 25%, but iperf
> generates larger packets, so that's no huge surprise and why Avalanche is
> a much closer to real life test.

Quite. Rather hard to fill a state table with iperf.

> > Putting in a DL-385 for the same client, on 6.x/PF with 4 * em to
> > firewall off a large network backup environment. I should have some
> > pretty symon pictures soon.
>
> Very interested in results from a high throughput environment.

I can pass on the symon graphic goodness for my handrolled 6.x/pf build on a DL-385 if you're interested; should have some meaningful stats soon. Shame the 802.3ad/LACP code from NetBSD hasn't been ported over yet, I could make use of it in this design.

> We're a large company and pfSense doesn't meet our internal audit
> requirements just yet - that's on my todo list (multi-user, change logs,
> etc).

Give it time :-), it's all good.

greg
RE: [pfSense-discussion] throughput - cpu, bus
guys,

> 2.2MBs, 2.2 megabytes per second (120)
> 7MBs, 7 megabytes per second (athlon)

Are the Athlon figures on a Via chipset motherboard? Some of the early Via Athlon chipsets had pretty lousy PCI performance. You could try tweaking the PCI latency timers in the BIOS to give the em card more time on the bus. This may improve throughput slightly.

On a bge plugged into an nforce2 board, I can iperf ~800 read / ~600 write through it.

Greg
RE: [pfSense-discussion] throughput - cpu, bus
Chipset? I'm not sure tbh, it's an Abit board I purchased 4-5 years ago.

The source is an HP Netserver LH3000 (2 x P3 866MHz, 2.25GB RAM) with dual 64-bit PCI buses and 3 x Intel Pro/1000 MT gig NICs (64-bit). The disk subsystem is 2 x MegaRAID scsi/sata controllers with scsi3 and sata RAID 5 arrays; I doubt the bottleneck is there. Although it is running vmware 2.5.1 at the moment, with Windows XP SP2 as the guest OS. I guess I need to see what happens when I run straight Linux on the box.

The firewall is currently on an Abit mb; I don't know which chipset till I down the fw and take a look. This has Intel Pro/1000 MT gig NICs (64-bit) too, although only 32 bits are being used.

The destination machine is an nforce2 mb with an Athlon XP1700, 1GB RAM and an ATA133 Seagate 7200rpm drive running XP SP2. Here there is a 3Com 996B.

Now somewhere in there is the culprit for slowing things down. I have been using ftp get on large files to do the measuring: Is there a better method? Thanks

-----Original Message-----
From: Greg Hennessy [mailto:[EMAIL PROTECTED]
Sent: 15 March 2006 10:45
To: discussion@pfsense.com
Subject: RE: [pfSense-discussion] throughput - cpu, bus

guys,

2.2MBs, 2.2 megabytes per second (120)
7MBs, 7 megabytes per second (athlon)

Are the Athlon figures on a Via chipset motherboard? Some of the early Via Athlon chipsets had pretty lousy PCI performance. You could try tweaking the PCI latency timers in the BIOS to give the em card more time on the bus. This may improve throughput slightly.

On a bge plugged into an nforce2 board, I can iperf ~800 read / ~600 write through it.

Greg
RE: [pfSense-discussion] throughput - cpu, bus
That version of VMware is prehistoric, and probably only emulates a 10 Mbit AMD PCNet NIC. Try testing from the host OS on your source machine. The best method for testing bulk transfer is iperf, or this Avalanche thing, which is more real-world.

-----Original Message-----
From: Chun Wong [mailto:[EMAIL PROTECTED]
Sent: Thursday, 16 March 2006 12:47 a.m.
To: discussion@pfsense.com
Subject: RE: [pfSense-discussion] throughput - cpu, bus

Chipset? I'm not sure tbh, it's an Abit board I purchased 4-5 years ago.

The source is an HP Netserver LH3000 (2 x P3 866MHz, 2.25GB RAM) with dual 64-bit PCI buses and 3 x Intel Pro/1000 MT gig NICs (64-bit). The disk subsystem is 2 x MegaRAID scsi/sata controllers with scsi3 and sata RAID 5 arrays; I doubt the bottleneck is there. Although it is running vmware 2.5.1 at the moment, with Windows XP SP2 as the guest OS. I guess I need to see what happens when I run straight Linux on the box.

The firewall is currently on an Abit mb; I don't know which chipset till I down the fw and take a look. This has Intel Pro/1000 MT gig NICs (64-bit) too, although only 32 bits are being used.

The destination machine is an nforce2 mb with an Athlon XP1700, 1GB RAM and an ATA133 Seagate 7200rpm drive running XP SP2. Here there is a 3Com 996B.

Now somewhere in there is the culprit for slowing things down. I have been using ftp get on large files to do the measuring: Is there a better method? Thanks

-----Original Message-----
From: Greg Hennessy [mailto:[EMAIL PROTECTED]
Sent: 15 March 2006 10:45
To: discussion@pfsense.com
Subject: RE: [pfSense-discussion] throughput - cpu, bus

guys,

2.2MBs, 2.2 megabytes per second (120)
7MBs, 7 megabytes per second (athlon)

Are the Athlon figures on a Via chipset motherboard? Some of the early Via Athlon chipsets had pretty lousy PCI performance. You could try tweaking the PCI latency timers in the BIOS to give the em card more time on the bus. This may improve throughput slightly.

On a bge plugged into an nforce2 board, I can iperf ~800 read / ~600 write through it.

Greg
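The reason iperf keeps being recommended over ftp is that it times a memory-to-memory stream, with no disk I/O or file-transfer protocol in the timed path, and it only stops the clock once the receiver has everything. A toy loopback illustration of that idea in Python (this is not iperf itself; the function name and defaults are mine):

```python
import socket
import threading
import time

def measure_stream_bps(total_bytes: int = 50_000_000, chunk: int = 65536) -> float:
    """Time a memory-to-memory TCP stream over loopback, iperf-style:
    no disk I/O in the timed path, and the clock stops only after the
    receiver has drained every byte (avoiding the ftp-'put' pitfall)."""
    server = socket.create_server(("127.0.0.1", 0))
    port = server.getsockname()[1]

    def sink() -> None:
        conn, _ = server.accept()
        with conn:
            while conn.recv(1 << 20):  # read until sender closes
                pass

    receiver = threading.Thread(target=sink)
    receiver.start()

    payload = b"\x00" * chunk
    sent = (total_bytes // chunk) * chunk  # send a whole number of chunks
    start = time.perf_counter()
    with socket.create_connection(("127.0.0.1", port)) as client:
        for _ in range(sent // chunk):
            client.sendall(payload)
    receiver.join()  # wait for the last byte to be *received*, not just queued
    elapsed = time.perf_counter() - start
    server.close()
    return sent * 8 / elapsed  # bits per second
```

Running several of these in parallel threads approximates the "multiple streams" advice from earlier in the thread.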
RE: [pfSense-discussion] throughput - cpu, bus - VMware
Ooops, sorry - I thought you meant VMware Workstation, not VMware ESX Server. However, I still suggest testing from the host OS; it just makes things tidier.

-----Original Message-----
From: Chun Wong [mailto:[EMAIL PROTECTED]
Sent: Thursday, 16 March 2006 11:45 a.m.
To: discussion@pfsense.com
Subject: RE: [pfSense-discussion] throughput - cpu, bus - VMware

Hi Craig,

vmware ESX 2.5.1 is current; 3.0 is in beta at the moment. It definitely emulates FE or better - I am getting a sustained 75mbs, I was just hoping for more. But you are absolutely right, I should be testing in native mode.

Regards

--- Original Message ---
From: Craig FALCONER [EMAIL PROTECTED]
To: discussion@pfsense.com
Subject: RE: [pfSense-discussion] throughput - cpu, bus
Date: Thu, 16 Mar 2006 10:40:13 +1300

That version of VMware is prehistoric, and probably only emulates a 10 Mbit AMD PCNet NIC. Try testing from the host OS on your source machine. The best method for testing bulk transfer is iperf, or this Avalanche thing, which is more real-world.
Re: [pfSense-discussion] throughput - cpu, bus
On 3/15/06, Chun Wong [EMAIL PROTECTED] wrote:

> Chipset? I'm not sure tbh, it's an Abit board I purchased 4-5 years ago.
> The source is an HP Netserver LH3000 (2 x P3 866MHz, 2.25GB RAM) with
> dual 64-bit PCI buses and 3 x Intel Pro/1000 MT gig NICs (64-bit). The
> disk subsystem is 2 x MegaRAID scsi/sata controllers, with scsi3 and sata
> RAID 5 arrays. I doubt the bottleneck is there. Although it is running
> vmware 2.5.1 at the moment. The guest OS is Windows XP SP2. I guess I
> need to see what happens when I run straight Linux on the box.

VMware performance, regardless of whether this is ESX or not (I'm assuming ESX, not Workstation or GSX), sucks. Use a physical box for this type of testing.

--Bill
Re: [pfSense-discussion] throughput - cpu, bus
Chun Wong wrote:

> Hi, I have two fw platforms, mono 1.21 running on a Nokia120 and pfsense
> 1.0beta2 running on an AMD athlon 900. I can get 2.2MBs on the 120
> platform, at 96% cpu usage. On the athlon, 32bit, 33Mhz pci, I can get
> 7MBs using Intel PRO 1000MT 64 bit PCI cards. My question is what
> speed/type cpu do I need to use to improve on this with a PCI-X bus?
> (64bit, 33Mhz or maybe 66Mhz) I would like to get 15-20MBs, but without
> spending too much. I am looking at a 2nd hand Supermicro FPGA370 dual
> Pentium mb, with PCI-X bus. All my NICs are Intelpro MT1000, 64bit.
> Thanks

Something else is wrong. Either of these platforms should be able to forward at something close to 100Mbps, if not higher.
Re: [pfSense-discussion] throughput - cpu, bus
On 3/14/06, Jim Thompson [EMAIL PROTECTED] wrote:

> Chun Wong wrote:
> > Hi, I have two fw platforms, mono 1.21 running on a Nokia120 and
> > pfsense 1.0beta2 running on an AMD athlon 900. I can get 2.2MBs on the
> > 120 platform, at 96% cpu usage. On the athlon, 32bit, 33Mhz pci, I can
> > get 7MBs using Intel PRO 1000MT 64 bit PCI cards. My question is what
> > speed/type cpu do I need to use to improve on this with a PCI-X bus?
> > (64bit, 33Mhz or maybe 66Mhz) I would like to get 15-20MBs, but without
> > spending too much. I am looking at a 2nd hand Supermicro FPGA370 dual
> > Pentium mb, with PCI-X bus. All my NICs are Intelpro MT1000, 64bit.
> > Thanks
>
> Something else is wrong. Either of these platforms should be able to
> forward at something close to 100Mbps, if not higher.

Agreed... unless those MT1000s are plugged into 100Mbit ports (but I guess that would fall under "something else is wrong") :) Then 70Mbit wouldn't be entirely out of line (depending on the test software). 500Mbit of throughput is about all you'll practically get on a 32-bit, 33MHz slot, and in practice it'll be somewhat slower (closer to 300-400Mbit). A 64-bit/66MHz slot will make that ceiling much higher.

--Bill
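Bill's ~500 Mbit figure for a 32-bit/33 MHz slot is easy to sanity-check: the raw bus rate is width times clock, and a firewall's forwarded traffic crosses the shared bus twice (in from one NIC to memory, then out to the other NIC). The factor-of-two reasoning is my reconstruction of the arithmetic, not something stated in the thread:

```python
def pci_raw_bps(width_bits: int, clock_hz: int) -> int:
    """Theoretical peak transfer rate of a parallel PCI bus:
    width_bits transferred on every clock cycle."""
    return width_bits * clock_hz

raw = pci_raw_bps(32, 33_000_000)  # 1_056_000_000, i.e. ~1.06 Gbit/s (132 MB/s)

# Forwarded packets cross the shared bus twice (NIC -> RAM -> NIC),
# halving the usable rate for routing/firewalling:
forwarding_cap = raw // 2          # 528_000_000, i.e. ~528 Mbit/s

# Bus arbitration, interrupts, and other devices sharing the bus push
# the practical figure down toward the 300-400 Mbit range Bill mentions.
```

The same arithmetic shows why a 64-bit/66 MHz slot (raw ~4.2 Gbit/s) lifts the ceiling well clear of gigabit Ethernet.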
Re: [pfSense-discussion] throughput - cpu, bus
mmhhh, not sure if I understood well what you are measuring. I'm doing some tests on an EPIA PD6000 (so I think much lower-end than your hardware): http://www.via.com.tw/en/products/mainboards/mini_itx/epia_pd/

I can reach an average throughput of about 92-97 MB/s with unencrypted traffic. I am testing it with iperf. I'm about to test the same config, but using two boxes and an IPsec VPN between them (I want to test the accelerated crypto hw). Anybody already tested it?

Tom

On 3/14/06, Chun Wong [EMAIL PROTECTED] wrote:

> Hi, I have two fw platforms, mono 1.21 running on a Nokia120 and pfsense
> 1.0beta2 running on an AMD athlon 900. I can get 2.2MBs on the 120
> platform, at 96% cpu usage. On the athlon, 32bit, 33Mhz pci, I can get
> 7MBs using Intel PRO 1000MT 64 bit PCI cards. My question is what
> speed/type cpu do I need to use to improve on this with a PCI-X bus?
> (64bit, 33Mhz or maybe 66Mhz) I would like to get 15-20MBs, but without
> spending too much. I am looking at a 2nd hand Supermicro FPGA370 dual
> Pentium mb, with PCI-X bus. All my NICs are Intelpro MT1000, 64bit.
> Thanks
Re: [pfSense-discussion] throughput - cpu, bus
On 14.03.2006 at 20:52, Greg Hennessy wrote:

> I'd love to get the chance to throw an Avalanche at a decent system
> running PF to see what it really can stand up to.

Andre Oppermann is working on that: http://people.freebsd.org/~andre/

But the results won't show up until 7.0 is released, which looks to be sometime in 2007: http://www.freebsd.org/releng/index.html

Rainer
Re: [pfSense-discussion] throughput - cpu, bus
Chun Wong wrote:

> guys, 2.2MBs, 2.2 megabytes per second (120) 7MBs, 7 megabytes per
> second (athlon)

Yes. These are, respectively, 17.6Mbps and 56Mbps (your values * 8 to translate to 'megabits per second').

> thats from smart ftp transfering 3GB size files. On the fw traffic
> graph, I see 30 megabits per second on the 120 (95% cpu) and 75 megabits
> peak on the athlon platform (45% cpu)

Note how the graph is indicating some 33% to 70% more traffic than your application. Something is 'wrong' (could be as simple as you mis-reading the graph, but probably not).

Are you measuring ftp 'get' or ftp 'put'? If you're using 'put', please stop and use 'get', or move to an application like 'iperf' to measure throughput.

Reason: when you use ftp's 'put', the client stops measuring the transfer time immediately after all of the data are written through the socket interface and the socket is closed. In practice, the socket writes only transfer data from the application to the socket buffers maintained in the kernel. The TCP protocol is then responsible for transmitting the data from the socket buffers to the remote machine. Details of acknowledgments, packet losses, and retransmissions are handled by TCP and hidden from the application. As a result, the interval between the first socket write and the final write-and-close only indicates the time to transfer all of the application data into the socket buffer; it does not measure the total time that elapses before the last byte of data actually reaches the remote machine. In the extreme case, a socket buffer full of data (normally 8 KBytes, but it could be much larger) could still be awaiting transmission by TCP. With an ftp 'get', by contrast, the clock only stops when the last byte of the file is read.

Normally, the socket implementation guarantees that the socket buffer is empty when the socket close call succeeds, but only if SO_LINGER was set on the socket at creation or via setsockopt() (or if you call shutdown() on the socket, but now I'm getting into the weeds). If SO_LINGER is not set (and shutdown() is not called), then you can have data in flight, and in the kernel buffers, that is not yet ACKed or delivered.

> to be honest I was expecting a lot more. I am using an 8 port SMC
> gigabit switch that supports jumbo frames - how do I increase the
> ethernet frame size on the firewall interface ?

Are you sure your cards support it?

> I'll see if I can rig up an extra long crossover cable to bypass the
> switch. If I am supposed to see 400 megabits, then I presume this is
> split between the incoming nic and outgoing nic, so 200 megabits per
> second ??

No, I'm sure Bill was referring to a single flow (in one direction, modulo the ACKs and other protocol overhead) measured in terms of data delivery, not the marketing-speak of multiplying by 2 because your card supports full duplex.

> Any ideas where I should be checking ?

full-duplex vs. half-duplex mismatch; mtu mismatch.

> Thanks !

--- Original Message ---
From: Bill Marquette [EMAIL PROTECTED]
To: discussion@pfsense.com
Subject: Re: [pfSense-discussion] throughput - cpu, bus
Date: Tue, 14 Mar 2006 13:41:15 -0600

On 3/14/06, Jim Thompson [EMAIL PROTECTED] wrote:

> Chun Wong wrote:
> > Hi, I have two fw platforms, mono 1.21 running on a Nokia120 and
> > pfsense 1.0beta2 running on an AMD athlon 900. I can get 2.2MBs on the
> > 120 platform, at 96% cpu usage. On the athlon, 32bit, 33Mhz pci, I can
> > get 7MBs using Intel PRO 1000MT 64 bit PCI cards. My question is what
> > speed/type cpu do I need to use to improve on this with a PCI-X bus?
> > (64bit, 33Mhz or maybe 66Mhz) I would like to get 15-20MBs, but
> > without spending too much. I am looking at a 2nd hand Supermicro
> > FPGA370 dual Pentium mb, with PCI-X bus. All my NICs are Intelpro
> > MT1000, 64bit. Thanks
>
> Something else is wrong. Either of these platforms should be able to
> forward at something close to 100Mbps, if not higher.

Agreed... unless those MT1000s are plugged into 100Mbit ports (but I guess that would fall under "something else is wrong") :) Then 70Mbit wouldn't be entirely out of line (depending on the test software). 500Mbit throughput is about all you'll practically get on a 33MHz 32-bit slot, and in practice it'll be somewhat slower (closer to 300-400Mbit). A 64-bit/66MHz slot will make that a much higher ceiling.

--Bill
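Jim's SO_LINGER point can be made concrete: with lingering enabled, close() blocks until the kernel's unsent data has been delivered and ACKed (or the timeout expires), so a timed transfer actually includes the tail of the stream. A minimal Python sketch (the helper name is mine; the packed layout matches the conventional two-int struct linger):

```python
import socket
import struct

def close_with_linger(sock: socket.socket, timeout_s: int = 10) -> None:
    """Make close() block until the kernel's send buffer has drained and
    been acknowledged by the peer, or until timeout_s elapses.
    struct linger: l_onoff=1 enables lingering, l_linger is the timeout."""
    linger = struct.pack("ii", 1, timeout_s)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, linger)
    sock.close()

# Without this (or an explicit shutdown()), close() can return while data
# still sits in the socket buffer - exactly why timing an ftp 'put' by the
# last write understates the true transfer time.
```

An application that stops its stopwatch only after close_with_linger() returns gets a figure much closer to when the last byte actually reached the far end.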
Re: [pfSense-discussion] throughput - cpu, bus
On 3/14/06, Chun Wong [EMAIL PROTECTED] wrote:

> On the fw traffic graph, I see 30 megabits per second on the 120 (95%
> cpu) and 75 megabits peak on the athlon platform (45% cpu).

This certainly suggests that CPU on the athlon is not your limiting factor.

> to be honest I was expecting a lot more. I am using an 8 port SMC
> gigabit switch that supports jumbo frames - how do I increase the
> ethernet frame size on the firewall interface ?

I believe there is a hidden option to change the MTU - I'll leave it to someone else to provide that option.

> I'll see if I can rig up an extra long crossover cable to bypass the
> switch. If I am supposed to see 400 megabits, then I presume this is
> split between the incoming nic and outgoing nic, so 200 megabits per
> second ??

No, that's 400Mbit of throughput :) A 32bit@33MHz bus is roughly around a 1Gbit transfer rate, so 500Mbit would be the absolute max.

> Any ideas where I should be checking ?

Run netstat -ni from the shell and see if you're taking any interface errors on all the machines involved in the test.

--Bill
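For context on how much the jumbo frames Chun asks about can buy: on the wire the gain is modest, because Ethernet's fixed per-frame cost (header, FCS, preamble, inter-frame gap - about 38 bytes) is already small next to a 1500-byte payload. The bigger win is that the hosts and the firewall handle far fewer frames. A quick sketch using standard Ethernet overhead figures:

```python
# Fixed cost per Ethernet frame on the wire:
# 14-byte header + 4-byte FCS + 8-byte preamble + 12-byte inter-frame gap.
PER_FRAME_OVERHEAD = 14 + 4 + 8 + 12  # 38 bytes

def wire_efficiency(mtu: int) -> float:
    """Fraction of wire time spent carrying payload for a given MTU."""
    return mtu / (mtu + PER_FRAME_OVERHEAD)

def frames_per_gigabyte(mtu: int) -> int:
    """Frames the host must process to move 1 GB of payload."""
    return (1_000_000_000 + mtu - 1) // mtu

std, jumbo = wire_efficiency(1500), wire_efficiency(9000)  # ~0.975 vs ~0.996

# The frame count drops ~6x with 9000-byte jumbo frames - that is where
# the real per-packet CPU and interrupt savings come from:
ratio = frames_per_gigabyte(1500) / frames_per_gigabyte(9000)
```

So jumbo frames mostly help a CPU-bound forwarder (like the 95%-loaded IP-120 here), not a bus-bound one.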
Re: [pfSense-discussion] throughput - cpu, bus
On 3/14/06, Rainer Duffner [EMAIL PROTECTED] wrote:

> On 14.03.2006 at 20:52, Greg Hennessy wrote:
> > I'd love to get the chance to throw an Avalanche at a decent system
> > running PF to see what it really can stand up to.

Quite a bit. I ran out of Avalanche/Reflector capacity at 750Mbit, but the OpenBSD box I pointed the firehose at was only hitting about 30% CPU load at the time. I expect I'd see better performance out of FreeBSD (with or without Andre's work). I plan on running the same tests against pfSense 1.0 when released.

--Bill
RE: [pfSense-discussion] throughput - cpu, bus
> Quite a bit. I ran out of Avalanche/Reflector capacity at 750Mbit, but
> the OpenBSD box I pointed the firehose at was only hitting about 30% CPU
> load at the time.

Interesting, what NICs were in the box?

> I expect I'd see better performance out of FreeBSD (w/ or w/out Andre's
> work). I plan on running the same tests against pfSense 1.0 when
> released.

Looking forward to it. Putting in a DL-385 for the same client, on 6.x/PF with 4 * em to firewall off a large network backup environment. I should have some pretty symon pictures soon.

Greg
Re: [pfSense-discussion] throughput - cpu, bus
On 3/14/06, Greg Hennessy [EMAIL PROTECTED] wrote:

> > Quite a bit. I ran out of Avalanche/Reflector capacity at 750Mbit, but
> > the OpenBSD box I pointed the firehose at was only hitting about 30%
> > CPU load at the time.
>
> Interesting, what nics were in the box ?

HP DL380G3 w/ Broadcom and Intel NICs. I also ran an iperf test, but ran out of physical boxes to generate and receive the load at around 900Mbit (I did determine the maximum xmit/receive rate of a Sun v120 running Solaris 8 though ;) ). During the iperf tests, the cpu load was closer to 25%, but iperf generates larger packets, so that's no huge surprise, and it's why Avalanche is a much closer-to-real-life test.

I've got some interestingly crappy test results from working on the shaper before Beta 2 on a 1GHz Via CPU here: http://www.pfsense.com/~billm/spirent/1/ And I do mean crappy. I wasn't trying too hard to get a good working test, just tossing traffic to see what's blowing up and why.

> > I expect I'd see better performance out of FreeBSD (w/ or w/out
> > Andre's work). I plan on running the same tests against pfSense 1.0
> > when released.
>
> Looking forward to it. Putting in a DL-385 for the same client, on
> 6.x/PF with 4 * em to firewall off a large network backup environment. I
> should have some pretty symon pictures soon.

Very interested in results from a high throughput environment. I'm probably a good year or so away from deploying pfSense anywhere near our high-throughput (high-dollar) production environment, but I'm interested in others' results in the meantime. For now, that environment is staying on OpenBSD (pf's native OS). We're a large company and pfSense doesn't meet our internal audit requirements just yet - that's on my todo list (multi-user, change logs, etc).

--Bill
Re: [pfSense-discussion] throughput - cpu, bus
Greg Hennessy wrote:

> That's ~20 megabits/sec, not bad for an IP-120 given its horsepower

Not for m0n0wall/FreeBSD 4.x. That box should be about the same speed as a Soekris 4801 or WRAP, either of which will hit ~40-45 Mbps. If this were pfSense/FreeBSD 6.x, I would say ~20 Mbps is low, but acceptable. Neither FreeBSD 5 nor 6 will even boot on the Nokia IP1xx's, though (kernel panics). That said, I have heard one person complain of poor performance (~25 Mbps tops, IIRC) on an IP110 with m0n0wall, so there may be something odd with that hardware that makes it slower than its specs suggest it should be.