Re: Re[4]: serious networking (em) performance (ggate and NFS) problem
On Sun, 21 Nov 2004, Sean McNeil wrote:

> I have to disagree. Packet loss is likely according to some of my
> tests. With the re driver, no change except moving a 100BT setup with
> no packet loss to a gigE setup (both Linksys switches) will cause
> serious packet loss at 20Mbps data rates. I have discovered that the
> only way to get good performance with no packet loss was to 1) remove
> interrupt moderation, 2) defrag each mbuf that comes into the driver.

Sounds like you're bumping into a queue limit that is made worse by
interrupting less frequently, resulting in bursts of packets that are
relatively large, rather than a trickle of packets at a higher rate.
Perhaps a limit on the number of outstanding descriptors in the driver
or hardware, and/or a limit in the netisr/ifqueue queue depth. You might
try changing the default IFQ_MAXLEN from 50 to 128 to increase the size
of the ifnet and netisr queues. You could also try setting
net.isr.enable=1 to enable direct dispatch, which in the inbound
direction would reduce the number of context switches and queueing.

It sounds like the device driver has a limit of 256 receive and transmit
descriptors, which one supposes is probably derived from the hardware
limit, but I have no documentation on hand so can't confirm that.

It would be interesting on the send and receive sides to inspect the
counters for drops at various points in the network stack; i.e., are we
dropping packets at the ifq handoff because we're overfilling the
descriptors in the driver, are packets dropped on the inbound path going
into the netisr due to over-filling before the netisr is scheduled, etc.
And it's probably interesting to look at stats on filling the socket
buffers for the same reason: if bursts of packets come up the stack, the
socket buffers could well be over-filled before the user thread can run.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]             Principal Research Scientist, McAfee Research
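The ifq handoff and the descriptor limit mentioned above sit in each
driver's if_start path. A minimal sketch of the classic 4.4BSD-style
pattern (illustrative only; the xx_ prefix, softc layout, and xx_encap
helper are stand-ins, not the actual if_re or if_em code):

    /*
     * Classic transmit handoff, as in FreeBSD 5.x-era drivers.
     * Packets can be lost at two points: the IF_ENQUEUE()/IF_HANDOFF()
     * into if_snd drops (and counts in ifq_drops) once the queue
     * exceeds IFQ_MAXLEN, and if_start stalls when the hardware
     * descriptor ring fills.
     */
    static void
    xx_start(struct ifnet *ifp)
    {
            struct xx_softc *sc = ifp->if_softc;    /* hypothetical softc */
            struct mbuf *m;

            while (ifp->if_snd.ifq_head != NULL) {
                    IF_DEQUEUE(&ifp->if_snd, m);
                    if (m == NULL)
                            break;
                    if (xx_encap(sc, m) != 0) {
                            /* Out of descriptors: requeue and stall. */
                            IF_PREPEND(&ifp->if_snd, m);
                            ifp->if_flags |= IFF_OACTIVE;
                            break;
                    }
            }
    }

Drops at the first limit show up in the queue's ifq_drops counter; drops
at the second simply stall the queue until the transmit-complete
interrupt frees descriptors and restarts it.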
Re: Re[4]: serious networking (em) performance (ggate and NFS) problem
On Mon, 2004-11-22 at 11:34 +0000, Robert Watson wrote:
> On Sun, 21 Nov 2004, Sean McNeil wrote:
>
> > I have to disagree. Packet loss is likely according to some of my
> > tests. [...]
>
> Sounds like you're bumping into a queue limit that is made worse by
> interrupting less frequently, resulting in bursts of packets that are
> relatively large, rather than a trickle of packets at a higher rate.
> Perhaps a limit on the number of outstanding descriptors in the driver
> or hardware, and/or a limit in the netisr/ifqueue queue depth. You
> might try changing the default IFQ_MAXLEN from 50 to 128 to increase
> the size of the ifnet and netisr queues. You could also try setting
> net.isr.enable=1 to enable direct dispatch, which in the inbound
> direction would reduce the number of context switches and queueing.
>
> It sounds like the device driver has a limit of 256 receive and
> transmit descriptors, which one supposes is probably derived from the
> hardware limit, but I have no documentation on hand so can't confirm
> that.

I've tried bumping IFQ_MAXLEN and it made no difference. I could rerun
this test to be 100% certain, I suppose; it was done a while back. I
haven't tried net.isr.enable=1, but the packet loss is in the transmit
direction. The device driver has been modified to have 1024 transmit and
1024 receive descriptors, as that is the hardware limit. That didn't
matter either: with 1024 descriptors I still lost packets without the
m_defrag.

The most difficult thing for me to understand is: if this is some sort
of resource limitation, why does it work perfectly with a slower phy
layer and not with the gigE? The only thing I could think of was that
the old driver was doing m_defrag calls when it filled the transmit
descriptor queues up to a certain point. Understanding the effects of
m_defrag would be helpful in figuring this out, I suppose.

> It would be interesting on the send and receive sides to inspect the
> counters for drops at various points in the network stack; i.e., are
> we dropping packets at the ifq handoff because we're overfilling the
> descriptors in the driver, are packets dropped on the inbound path
> going into the netisr due to over-filling before the netisr is
> scheduled, etc. And it's probably interesting to look at stats on
> filling the socket buffers for the same reason: if bursts of packets
> come up the stack, the socket buffers could well be over-filled before
> the user thread can run.

Yes, this would be very interesting and should point out the problem. I
would do such a thing if I had enough knowledge of the network pathways.
Alas, I am very green in this area. The receive side has no issues,
though, so I would focus on transmit counters (with assistance).
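For reference, the m_defrag() change described above is typically wired
into the driver's encap routine. A sketch, under the assumption that the
DMA load fails with EFBIG when the chain needs more segments than the
hardware allows (xx_dma_load is a stand-in for the driver's
bus_dmamap_load_mbuf call, not a real function):

    /*
     * Illustrative m_defrag(9) fallback in a transmit encap path:
     * if the outgoing mbuf chain has too many segments, collapse it
     * into the smallest possible chain of clusters and retry once.
     */
    static int
    xx_encap(struct xx_softc *sc, struct mbuf **m_head)
    {
            struct mbuf *m;
            int error;

            error = xx_dma_load(sc, *m_head);   /* stand-in for
                                                   bus_dmamap_load_mbuf() */
            if (error == EFBIG) {
                    m = m_defrag(*m_head, M_DONTWAIT);
                    if (m == NULL)
                            return (ENOBUFS);   /* can't compact; drop */
                    *m_head = m;
                    error = xx_dma_load(sc, *m_head);
            }
            return (error);
    }

The side effect is that each packet ends up occupying far fewer
descriptors, which is one plausible reason it changes the loss behavior
even when the ring is enlarged.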
Re: Re[4]: serious networking (em) performance (ggate and NFS) problem
Sean McNeil wrote this message on Mon, Nov 22, 2004 at 12:14 -0800:
> On Mon, 2004-11-22 at 11:34 +0000, Robert Watson wrote:
> > On Sun, 21 Nov 2004, Sean McNeil wrote:
> > > I have to disagree. Packet loss is likely according to some of my
> > > tests. [...]
> >
> > Sounds like you're bumping into a queue limit that is made worse by
> > interrupting less frequently [...] It sounds like the device driver
> > has a limit of 256 receive and transmit descriptors, which one
> > supposes is probably derived from the hardware limit, but I have no
> > documentation on hand so can't confirm that.
>
> I've tried bumping IFQ_MAXLEN and it made no difference. I could rerun

And the default for if_re is RL_IFQ_MAXLEN, which is already 512... As
is mentioned below, the card can do 64 segments (which usually means 32
packets, since each packet usually has a header + payload in separate
mbufs)...

> this test to be 100% certain, I suppose; it was done a while back. I
> haven't tried net.isr.enable=1, but the packet loss is in the transmit
> direction. The device driver has been modified to have 1024 transmit
> and 1024 receive descriptors, as that is the hardware limit. That
> didn't matter either: with 1024 descriptors I still lost packets
> without the m_defrag.

hmmm... you know, I wonder if this is a problem with if_re not pulling
enough data from memory before starting the transmit... Though we
currently have it set for unlimited... so that doesn't seem like it
would be it...

> The most difficult thing for me to understand is: if this is some sort
> of resource limitation, why does it work perfectly with a slower phy
> layer and not with the gigE? The only thing I could think of was that
> the old driver was doing m_defrag calls when it filled the transmit
> descriptor queues up to a certain point. Understanding the effects of
> m_defrag would be helpful in figuring this out, I suppose.

maybe the chip just can't keep the transmit fifo loaded at the higher
speeds... is it possible vls is doing a writev for a multisegmented UDP
packet? I'll have to look at this again...

> > It would be interesting on the send and receive sides to inspect the
> > counters for drops at various points in the network stack [...]
>
> Yes, this would be very interesting and should point out the problem.
> I would do such a thing if I had enough knowledge of the network
> pathways. Alas, I am very green in this area. The receive side has no
> issues, though, so I would focus on transmit counters (with
> assistance).

-- 
John-Mark Gurney                                Voice: +1 415 225 5579
  All that I will do, has been done, All that I have, has not.
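The 64-segments-means-roughly-32-packets arithmetic above assumes one
DMA segment per mbuf in the chain. Counting them is nothing more than a
walk down m_next (a sketch; it ignores segments split across page
boundaries, which only make things worse):

    /*
     * Count the DMA segments an mbuf chain will need, assuming one
     * segment per non-empty mbuf.  A UDP packet built as a header mbuf
     * plus a payload cluster needs 2, so a 64-segment ring holds about
     * 32 such packets before the driver must stall.
     */
    static int
    xx_count_segs(struct mbuf *m)
    {
            int nsegs = 0;

            for (; m != NULL; m = m->m_next)
                    if (m->m_len > 0)
                            nsegs++;
            return (nsegs);
    }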
Re: Re[4]: serious networking (em) performance (ggate and NFS) problem
Hi John-Mark,

On Mon, 2004-11-22 at 13:31 -0800, John-Mark Gurney wrote:
> Sean McNeil wrote this message on Mon, Nov 22, 2004 at 12:14 -0800:
> > I've tried bumping IFQ_MAXLEN and it made no difference. [...]
>
> And the default for if_re is RL_IFQ_MAXLEN, which is already 512... As
> is mentioned below, the card can do 64 segments (which usually means
> 32 packets, since each packet usually has a header + payload in
> separate mbufs)...

It sounds like you believe this is an if_re-only problem. I had the
feeling that the if_em driver performance problems were related in some
way. I noticed that if_em does not do anything with m_defrag and thought
it might be a little more than coincidence.

> hmmm... you know, I wonder if this is a problem with if_re not pulling
> enough data from memory before starting the transmit... Though we
> currently have it set for unlimited... so that doesn't seem like it
> would be it...

Right. Plus it now has 1024 descriptors on my machine and, like I said,
that made little difference.

> maybe the chip just can't keep the transmit fifo loaded at the higher
> speeds... is it possible vls is doing a writev for a multisegmented
> UDP packet? I'll have to look at this again...

I suppose. As I understand it, though, it should be sending out
1316-byte data packets at a metered pace. Also, wouldn't it behave the
same for 100BT vs. gigE? Shouldn't I see packet loss with 100BT if this
is the case?

> > > It would be interesting on the send and receive sides to inspect
> > > the counters for drops at various points in the network stack [...]
> >
> > Yes, this would be very interesting and should point out the
> > problem. [...]
Re: Re[4]: serious networking (em) performance (ggate and NFS) problem
:Increasing the interrupt moderation frequency worked on the re driver,
:but it only made it marginally better. Even without moderation,
:however, I could lose packets without m_defrag. I suspect that there is
:something in the higher-level layers that is causing the packet loss. I
:have no explanation why m_defrag makes such a big difference for me,
:but it does. I also have no idea why a 20Mbps UDP stream can lose data
:over a gigE phy and not lose anything over 100BT... without the
:above-mentioned changes, that is.

    It kinda sounds like the receiver's UDP buffer is not large enough
    to handle the burst traffic.  100BT is a much slower transport and
    the receiver (userland process) was likely able to drain its buffer
    before new packets arrived.

    Use netstat -s to observe the drop statistics for UDP on both the
    sender and receiver sides.  You may also be able to get some useful
    information looking at the IP stats on both sides too.  Try bumping
    up net.inet.udp.recvspace and see if that helps.

    In any case, you should be able to figure out where the drops are
    occurring by observing the netstat -s output.

                                        -Matt
                                        Matthew Dillon
                                        [EMAIL PROTECTED]
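net.inet.udp.recvspace only sets the default buffer size; the receiving
application can also ask for a bigger one itself, capped by
kern.ipc.maxsockbuf. A minimal sketch (the 1 MB figure is an arbitrary
example, not a recommendation):

    /*
     * Enlarge a UDP socket's receive buffer with SO_RCVBUF.  The
     * kernel caps the request at kern.ipc.maxsockbuf;
     * net.inet.udp.recvspace is just the default a socket starts with.
     */
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <stdio.h>

    int
    main(void)
    {
            int s, bufsize = 1024 * 1024;       /* 1 MB, for example */

            s = socket(AF_INET, SOCK_DGRAM, 0);
            if (s == -1 || setsockopt(s, SOL_SOCKET, SO_RCVBUF,
                &bufsize, sizeof(bufsize)) == -1) {
                    perror("SO_RCVBUF");
                    return (1);
            }
            printf("receive buffer set to %d bytes\n", bufsize);
            return (0);
    }

If netstat -s shows "dropped due to full socket buffers" climbing on the
receiver, this is the knob that matters.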
Re[4]: serious networking (em) performance (ggate and NFS) problem
Thank you, Matt.

> Very interesting, but the only reason you get lower results is simply
> because the TCP window is not big enough. That's it.

Yes, I knew that adjusting the TCP window size is important to fill up a
link. However, I wanted to show that adjusting the parameters of
Interrupt Moderation affects network performance.

And I think the packet loss was caused by enabled Interrupt Moderation.
The mechanism of the packet loss in this case is not clear, but I think
an inappropriate TCP window size is not the only reason.

I found that the TCP throughput improvement with Interrupt Moderation
disabled is related to the congestion avoidance phase of TCP, because
the standard deviations decrease when Interrupt Moderation is disabled.

The following two results are outputs of `iperf -P 10', also without TCP
window size adjustment. I think the spread between the per-stream
throughputs within the same measurement shows that congestion avoidance
kicked in.

o with default setting of Interrupt Moderation.

[ ID] Interval       Transfer     Bandwidth
[ 13] 0.0-10.0 sec   80.1 MBytes  67.2 Mbits/sec
[ 11] 0.0-10.0 sec    121 MBytes   102 Mbits/sec
[ 12] 0.0-10.0 sec   98.9 MBytes  83.0 Mbits/sec
[  4] 0.0-10.0 sec   91.8 MBytes  76.9 Mbits/sec
[  7] 0.0-10.0 sec    127 MBytes   106 Mbits/sec
[  5] 0.0-10.0 sec    106 MBytes  88.8 Mbits/sec
[  6] 0.0-10.0 sec    113 MBytes  94.4 Mbits/sec
[ 10] 0.0-10.0 sec    117 MBytes  98.2 Mbits/sec
[  9] 0.0-10.0 sec    113 MBytes  95.0 Mbits/sec
[  8] 0.0-10.0 sec   93.0 MBytes  78.0 Mbits/sec
[SUM] 0.0-10.0 sec   1.04 GBytes   889 Mbits/sec

o with disabled Interrupt Moderation.

[ ID] Interval       Transfer     Bandwidth
[  7] 0.0-10.0 sec    106 MBytes  88.9 Mbits/sec
[ 10] 0.0-10.0 sec    107 MBytes  89.7 Mbits/sec
[  8] 0.0-10.0 sec    107 MBytes  89.4 Mbits/sec
[  9] 0.0-10.0 sec    107 MBytes  90.0 Mbits/sec
[ 11] 0.0-10.0 sec    106 MBytes  89.2 Mbits/sec
[ 12] 0.0-10.0 sec    104 MBytes  87.6 Mbits/sec
[  4] 0.0-10.0 sec    106 MBytes  88.7 Mbits/sec
[ 13] 0.0-10.0 sec    106 MBytes  88.9 Mbits/sec
[  5] 0.0-10.0 sec    106 MBytes  88.9 Mbits/sec
[  6] 0.0-10.0 sec    107 MBytes  89.9 Mbits/sec
[SUM] 0.0-10.0 sec   1.04 GBytes   891 Mbits/sec

But by decreasing the TCP window size, it could be avoided.

o with default setting of Interrupt Moderation and iperf -P 10 -w 28.3k

[ ID] Interval       Transfer     Bandwidth
[ 12] 0.0-10.0 sec    111 MBytes  93.0 Mbits/sec
[  4] 0.0-10.0 sec    106 MBytes  88.8 Mbits/sec
[ 11] 0.0-10.0 sec    107 MBytes  89.9 Mbits/sec
[  9] 0.0-10.0 sec    109 MBytes  91.6 Mbits/sec
[  5] 0.0-10.0 sec    109 MBytes  91.5 Mbits/sec
[ 13] 0.0-10.0 sec    108 MBytes  90.8 Mbits/sec
[ 10] 0.0-10.0 sec    107 MBytes  89.7 Mbits/sec
[  8] 0.0-10.0 sec    110 MBytes  92.3 Mbits/sec
[  6] 0.0-10.0 sec    111 MBytes  93.2 Mbits/sec
[  7] 0.0-10.0 sec    108 MBytes  90.6 Mbits/sec
[SUM] 0.0-10.0 sec   1.06 GBytes   911 Mbits/sec

Measuring TCP throughput was not an appropriate way to show the effect
of Interrupt Moderation clearly. My mistake. TCP is too complicated. :)

-- 
Shunsuke SHINOMIYA [EMAIL PROTECTED]
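The window-size ceiling referred to above is the bandwidth-delay
product: a TCP stream can move at most one window of data per round
trip, so throughput is capped at window/RTT. A sketch of the arithmetic
(the RTT value here is an assumed figure for illustration, not a
measurement from this thread):

    /* TCP throughput ceiling: at most one window per round trip. */
    #include <stdio.h>

    int
    main(void)
    {
            double window = 28.3 * 1024;  /* bytes, as in iperf -w 28.3k */
            double rtt = 0.0025;          /* seconds; assumed 2.5 ms RTT */

            /* 28.3 KB / 2.5 ms ~= 93 Mbits/sec, roughly one stream's
               share of the ~911 Mbits/sec aggregate above. */
            printf("cap = %.1f Mbits/sec\n", window * 8.0 / rtt / 1e6);
            return (0);
    }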
Re: Re[4]: serious networking (em) performance (ggate and NFS) problem
: Yes, I knew that adjusting the TCP window size is important to fill up
: a link. However, I wanted to show that adjusting the parameters of
: Interrupt Moderation affects network performance.
:
: And I think the packet loss was caused by enabled Interrupt
: Moderation. The mechanism of the packet loss in this case is not
: clear, but I think an inappropriate TCP window size is not the only
: reason.

    Packet loss is not likely, at least not for the contrived tests we
    are doing, because GiGE links have hardware flow control (I'm fairly
    sure).  One could calculate the worst case small-packet buildup in
    the receive ring.  I'm not sure what the minimum pad for GiGE is,
    but let's say it's 64 bytes.  Then the packet rate would be around
    1.9M pps, or 244 packets per interrupt at a moderation frequency of
    8000 hz.  The ring is 256 packets.  But don't forget the hardware
    flow control!  The switch has some buffering too.  hmm... methinks I
    now understand why 8000 was chosen as the default :-)

    I would say that this means packet loss due to the interrupt
    moderation is highly unlikely, at least in theory, but if one were
    paranoid one might want to use a higher moderation frequency, say
    16000 hz, to be sure.

: I found that the TCP throughput improvement with Interrupt Moderation
: disabled is related to the congestion avoidance phase of TCP, because
: the standard deviations decrease when Interrupt Moderation is
: disabled.
:
: The following two results are outputs of `iperf -P 10', also without
: TCP window size adjustment. I think the spread between the per-stream
: throughputs within the same measurement shows that congestion
: avoidance kicked in.
:
: o with default setting of Interrupt Moderation.
:
: [ ID] Interval       Transfer     Bandwidth
: [ 13] 0.0-10.0 sec   80.1 MBytes  67.2 Mbits/sec
: [ 11] 0.0-10.0 sec    121 MBytes   102 Mbits/sec
: [ 12] 0.0-10.0 sec   98.9 MBytes  83.0 Mbits/sec
: [  4] 0.0-10.0 sec   91.8 MBytes  76.9 Mbits/sec
: [  7] 0.0-10.0 sec    127 MBytes   106 Mbits/sec
: [  5] 0.0-10.0 sec    106 MBytes  88.8 Mbits/sec
: [  6] 0.0-10.0 sec    113 MBytes  94.4 Mbits/sec
: [ 10] 0.0-10.0 sec    117 MBytes  98.2 Mbits/sec
: [  9] 0.0-10.0 sec    113 MBytes  95.0 Mbits/sec
: [  8] 0.0-10.0 sec   93.0 MBytes  78.0 Mbits/sec
: [SUM] 0.0-10.0 sec   1.04 GBytes   889 Mbits/sec

    Certainly overall send/response latency will be affected by up to
    1/freq, e.g. 1/8000 = 125 uS (x2 hosts == 250 uS worst case), which
    is readily observable by running ping:

    [intrate]  [set on both boxes]
    max:     64 bytes from 216.240.41.62: icmp_seq=2 ttl=64 time=0.057 ms
    100000:  64 bytes from 216.240.41.62: icmp_seq=8 ttl=64 time=0.061 ms
    30000:   64 bytes from 216.240.41.62: icmp_seq=5 ttl=64 time=0.078 ms
    8000:    64 bytes from 216.240.41.62: icmp_seq=3 ttl=64 time=0.176 ms

    (large stddev too, e.g. 0.188, 0.166, etc).  But this is only
    relevant for applications that require that sort of response time ==
    not very many applications.  Note that a large packet will turn the
    best case 57 uS round trip into a 140 uS round trip with the EM
    card.

    It might be interesting to see how interrupt moderation affects a
    buildworld over NFS, as that certainly results in a huge amount of
    synchronous transactional traffic.

: Measuring TCP throughput was not an appropriate way to show the effect
: of Interrupt Moderation clearly. My mistake. TCP is too complicated. :)
:
: --
: Shunsuke SHINOMIYA [EMAIL PROTECTED]

    It really just comes down to how sensitive a production system is to
    round trip times within the range of effect of the moderation
    frequency.  Usually the answer is: not very.  That is, the benefit
    is not sufficient to warrant the additional interrupt load that
    turning moderation off would create.  And even if low latency is
    desired it is not actually necessary to turn off moderation.  It
    could be set fairly high, e.g. 20000, to reap most of the benefit.

    Processing overheads are also important.  If the network is loaded
    down you will wind up eating a significant chunk of cpu with
    moderation turned off.  This is readily observable by running vmstat
    during an iperf test.  The iperf test reported ~700 MBits/sec for
    all tested moderation frequencies, using iperf -w 63.5K on
    DragonFly.  I would be interested in knowing how FreeBSD fares,
    though SMP might skew the reality too much to be meaningful.

    moderation      cpu
    frequency       %idle
    100000          2% idle
    30000           7% idle
    20000           35% idle
    10000           60% idle
    8000            66% idle

    In other words, if you are doing more than just shoving bits around
    the network, for example if you need to read or write the disk or do
    some sort of computation or other activity that requires cpu,
    turning off moderation could wind up being a very, very bad idea.
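Matt's worst-case figure is easy to reproduce: minimum-size gigabit
frames at line rate, divided by the moderation frequency, compared
against the 256-entry ring (this ignores preamble and inter-frame gap,
which only lower the real rate):

    /* Worst-case receive-ring buildup per interrupt at line rate. */
    #include <stdio.h>

    int
    main(void)
    {
            double pps = 1e9 / (64 * 8);  /* 64-byte frames: ~1.95M pps */
            int freqs[] = { 8000, 16000, 30000 };

            for (int i = 0; i < 3; i++)   /* 8000 hz -> ~244 of 256 slots */
                    printf("%6d hz: %4.0f packets per interrupt\n",
                        freqs[i], pps / freqs[i]);
            return (0);
    }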
Re: Re[4]: serious networking (em) performance (ggate and NFS) problem
On Sun, 2004-11-21 at 20:42 -0800, Matthew Dillon wrote:
> : Yes, I knew that adjusting the TCP window size is important to fill
> : up a link. However, I wanted to show that adjusting the parameters
> : of Interrupt Moderation affects network performance.
> :
> : And I think the packet loss was caused by enabled Interrupt
> : Moderation. The mechanism of the packet loss in this case is not
> : clear, but I think an inappropriate TCP window size is not the only
> : reason.
>
>     Packet loss is not likely, at least not for the contrived tests we
>     are doing, because GiGE links have hardware flow control (I'm
>     fairly sure).

I have to disagree. Packet loss is likely according to some of my tests.
With the re driver, no change except moving a 100BT setup with no packet
loss to a gigE setup (both Linksys switches) will cause serious packet
loss at 20Mbps data rates. I have discovered that the only way to get
good performance with no packet loss was to 1) remove interrupt
moderation, 2) defrag each mbuf that comes into the driver.

Doing both of these, I get excellent performance without any packet
loss. All my testing has been with UDP packets, however, and nothing was
checked for TCP.

>     One could calculate the worst case small-packet buildup in the
>     receive ring.  I'm not sure what the minimum pad for GiGE is, but
>     let's say it's 64 bytes.  Then the packet rate would be around
>     1.9M pps, or 244 packets per interrupt at a moderation frequency
>     of 8000 hz.  The ring is 256 packets.  But don't forget the
>     hardware flow control!  The switch has some buffering too.
>     hmm... methinks I now understand why 8000 was chosen as the
>     default :-)
>
>     I would say that this means packet loss due to the interrupt
>     moderation is highly unlikely, at least in theory, but if one were
>     paranoid one might want to use a higher moderation frequency, say
>     16000 hz, to be sure.

Your calculations are based on the mbufs being a particular size, no?
What happens if they are seriously fragmented? Is this what you mean by
small-packet? Are you assuming the mbufs are as small as they get? How
small can they go? 1 byte? 1 MTU?

Increasing the interrupt moderation frequency worked on the re driver,
but it only made it marginally better. Even without moderation, however,
I could lose packets without m_defrag. I suspect that there is something
in the higher-level layers that is causing the packet loss. I have no
explanation why m_defrag makes such a big difference for me, but it
does. I also have no idea why a 20Mbps UDP stream can lose data over a
gigE phy and not lose anything over 100BT... without the above-mentioned
changes, that is.
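To the question of how small mbufs can get: the sizes are compile-time
constants, not arbitrary. A plain mbuf carries at most MLEN bytes of
data (MHLEN when it holds a packet header), and anything larger is
placed in an MCLBYTES external cluster; that is why a 1316-byte UDP
datagram typically arrives at the driver as a small header mbuf chained
to one cluster, i.e. two segments. A sketch that prints the constants
(exact values vary by release and architecture; on 5.x-era i386, MSIZE
is 256 and MCLBYTES is 2048):

    #include <sys/param.h>      /* MSIZE, MCLBYTES */
    #include <sys/mbuf.h>       /* MLEN, MHLEN */
    #include <stdio.h>

    int
    main(void)
    {
            printf("MSIZE    = %d\n", (int)MSIZE);    /* whole mbuf */
            printf("MLEN     = %d\n", (int)MLEN);     /* data, plain mbuf */
            printf("MHLEN    = %d\n", (int)MHLEN);    /* data, pkthdr mbuf */
            printf("MCLBYTES = %d\n", (int)MCLBYTES); /* cluster size */
            return (0);
    }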