Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic
Jan-Bernd Themann wrote: On Monday 30 July 2007 22:32, Andrew Gallatin wrote: Second, you still need to set skb->ip_summed = CHECKSUM_UNNECESSARY when modified packets are flushed, else the stack will see bad checksums for packets from CHECKSUM_COMPLETE drivers using the skb interface. Fixed in the attached patch. I thought about it... As we do update the TCP checksum for aggregated packets we could add a second ip_summed field in the net_lro_mgr struct used for aggregated packets to support HW that does not have any checksum helper functionality. These drivers could set this ip_summed field to CHECKSUM_NONE, and thus leave the checksum check to the stack. I'm not sure if these old devices benefit a lot from LRO. So what do you think? This might be handy, and it would also fix the problem with CHECKSUM_PARTIAL drivers using the skb interface by allowing them to set it to CHECKSUM_UNNECESSARY. Fourth, I did some traffic sniffing to try to figure out what's going on above, and saw tcpdump complain about bad checksums. Have you tried running tcpdump -s 65535 -vvv? Have you also seen bad checksums? I seem to see this for both page- and skb-based versions of the driver. Hmmm, can't confirm that. For our skb-based version I see correct checksums for aggregated packets and for the page-based version as well. I used: (tcpdump -i ethX -s 0 -w dump.bin) in combination with ethereal. Don't see problems as well with your tcpdump command. I'm still trying to get a handle on this. It happens both with page based and skb based receive for me.. I would not be surprised if I was doing something wrong in myri10ge. But I don't see it without LRO, or with my LRO. I'll let you know when I figure it out.. In the meantime, in case you have any insight, I've left a capture of a small "netcat" transfer of a 64KB file full of zeros at http://www.myri.com/staff/gallatin/lro/netcat_dump.bz2 Drew - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic
Hi, Thanks for finding these bugs! I'll post an updated version soon (2 patches with no separate Kconfig patches, one LRO and one eHEA patch). See comments below. Thanks, Jan-Bernd On Monday 30 July 2007 22:32, Andrew Gallatin wrote: > I was working on testing the myri10ge patch, and I ran into a few > problems. I've attached a patch to inet_lro.c to fix some of them, > and a patch to myri10ge.c to show how to use the page based > interface. Both patches are signed off by Andrew Gallatin > <[EMAIL PROTECTED]> > > First, the LRO_MAX_PG_HLEN is still a problem. Minimally sized 60 > byte frames still cause problems in lro_gen_skb due to skb->len > going negative. Fixed in the attached patch. It may be simpler > to just drop LRO_MAX_PG_HLEN to ETH_ZLEN, but I'm not sure if > that is enough. Are there "smart" NICs which might chop off padding > themselves? I'd tend to stick to an explicit check as implemented in your patch for now > > Second, you still need to set skb->ip_summed = CHECKSUM_UNNECESSARY > when modified packets are flushed, else the stack will see bad > checksums for packets from CHECKSUM_COMPLETE drivers using the > skb interface. Fixed in the attached patch. I thought about it... As we do update the TCP checksum for aggregated packets we could add a second ip_summed field in the net_lro_mgr struct used for aggregated packets to support HW that does not have any checksum helper functionality. These drivers could set this ip_summed field to CHECKSUM_NONE, and thus leave the checksum check to the stack. I'm not sure if these old devices benefit a lot from LRO. So what do you think? > > Fourth, I did some traffic sniffing to try to figure out what's going > on above, and saw tcpdump complain about bad checksums. Have you tried > running tcpdump -s 65535 -vvv? Have you also seen bad checksums? > I seem to see this for both page- and skb-based versions of the driver. > Hmmm, can't confirm that. For our skb-based version I see correct checksums for aggregated packets and for the page-based version as well. I used: (tcpdump -i ethX -s 0 -w dump.bin) in combination with ethereal. Don't see problems as well with your tcpdump command. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic
> -Original Message- > From: [EMAIL PROTECTED] [mailto:netdev- > [EMAIL PROTECTED] On Behalf Of Andrew Gallatin > Sent: Monday, July 30, 2007 10:43 AM > To: Linas Vepstas > Cc: Jan-Bernd Themann; netdev; Thomas Klein; Jeff Garzik; Jan-Bernd > Themann; linux-kernel; linux-ppc; Christoph Raisch; Marcus Eder; Stefan > Roscher; David Miller > Subject: Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for > TCP traffic > > > Here is a quick reply before something more official can > be written up: > > Linas Vepstas wrote: > > > -- what is LRO? > > Large Receive Offload > > > -- Basic principles of operation? > > LRO is analogous to a receive side version of TSO. The NIC (or > driver) merges several consecutive segments from the same connection, > fixing up checksums, etc. Rather than up to 45 separate 1500 byte > frames (meaning up to 45 trips through the network stack), the driver > merges them into one 65212 byte frame. It currently works only > with TCP over IPv4. > > LRO was, AFAIK, first though of by Neterion. They had a paper about > it at OLS2005. > http://www.linuxinsight.com/files/ols2005/grossman-reprint.pdf > > > -- Can I use it in my driver? > > Yes, it can be used in any driver. > > > -- Does my hardware have to have some special feature before I can > use it? > > No. Ditto. LRO hw assists (or full LRO offload, meaning that the merge happens in the ASIC but the merge criteria are set by the host) are quite beneficial, especially as the number of LRO connections gets very large (Neterion has IP around fw/hw LRO and will release full hw LRO offload in the next ASIC), but as Andrew indicated sw-only LRO implementation can be done for any NIC and provides good results - especially for non-jumbo workloads. > > > -- What sort of performance improvement does it provide? Throughput? > >Latency? CPU usage? How does it affect DMA allocation? Does it > >improve only a certain type of traffic (large/small packets, > etc.) > > The benefit is directly proportional to the packet rate. > > See my reply to the previous RFC for performance information. The > executive summary is that for the myri10ge 10GbE driver on low end > hardware with 1500b frames, I've seen it increase throughput by a > factor of nearly 2.5x, while at the same time reducing CPU utilization > by 17%. The affect for jumbo frames is less dramatic, but still > impressive (1.10x, 14% CPU reduction) > > You can achieve better speedups if your driver receives into > high-order pages. > > > -- Example code? What's the API? How should my driver use it? > > The 3/4 in this patch showed an example of converting a driver > to use LRO for skb based receive buffers. I'm working on > a patch for myri10ge that shows how you would use it in a driver > which receives into pages. > > Cheers, > > Drew > - > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic
David Miller wrote: From: Jeff Garzik <[EMAIL PROTECTED]> Date: Mon, 30 Jul 2007 12:17:58 -0400 David, thoughts on merging? I'm not We could stick this into your tree or mine. Whether yours or mine, I would like to keep the driver and net-core patches together in the same git tree. No objections and I'll stick it into my net-2.6.24 tree once I cut that, I'll have the NAPI stuff in there too so I'll keep these two merge nightmares out of your hair :-) Works for me. Thanks, Jeff - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic
From: Jeff Garzik <[EMAIL PROTECTED]> Date: Mon, 30 Jul 2007 12:17:58 -0400 > David, thoughts on merging? I'm not We could stick this into your tree > or mine. Whether yours or mine, I would like to keep the driver and > net-core patches together in the same git tree. No objections and I'll stick it into my net-2.6.24 tree once I cut that, I'll have the NAPI stuff in there too so I'll keep these two merge nightmares out of your hair :-) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic
Jan-Bernd Themann wrote: Hi, this patch set contains the latest generic LRO code, a Kconfig / Makefile and an eHEA patch demonstrating how the "aggregate SKB" interface has to to be used. Drew, could you provide a patch for the myri10ge driver to show how the "receive in page" interface works? Hi, Thank you for the many fixes, and especially for the counters! I was working on testing the myri10ge patch, and I ran into a few problems. I've attached a patch to inet_lro.c to fix some of them, and a patch to myri10ge.c to show how to use the page based interface. Both patches are signed off by Andrew Gallatin <[EMAIL PROTECTED]> First, the LRO_MAX_PG_HLEN is still a problem. Minimally sized 60 byte frames still cause problems in lro_gen_skb due to skb->len going negative. Fixed in the attached patch. It may be simpler to just drop LRO_MAX_PG_HLEN to ETH_ZLEN, but I'm not sure if that is enough. Are there "smart" NICs which might chop off padding themselves? Second, you still need to set skb->ip_summed = CHECKSUM_UNNECESSARY when modified packets are flushed, else the stack will see bad checksums for packets from CHECKSUM_COMPLETE drivers using the skb interface. Fixed in the attached patch. Third, for some reason I have yet to figure out, this version of the patch takes a while to "ramp up", but only when the receiver MTU is 9000 and the sender MTU is 1500. If the receiver MTU is also 1500, even a 10 second netperf easily shows line rate. I don't see this with our LRO, and I'm so far at a loss to explain it. Here is the first 3 seconds of output from a simple program which diffs the interface counters once / sec. Ipkts IBytesOpkts Obytes rx MTU=9000: 0000 7299092 14 1102 105 1589707 420 750324 113598708417804 1068240 814427 123304247818529 740 <> rx MTU=1500 0000 44134466818016813132 788182 815938 123532865618716 1122960 817698 123799477218628 1117680 <.> Once it has ramped up, the bandwidth is fine, and there doesn't seem to be much difference between setting the receiver MTU to 1500 or 9000. But it takes a very long netperf run to overcome the ramp up period. Fourth, I did some traffic sniffing to try to figure out what's going on above, and saw tcpdump complain about bad checksums. Have you tried running tcpdump -s 65535 -vvv? Have you also seen bad checksums? I seem to see this for both page- and skb-based versions of the driver. Last, the pointer to the TCP header in __lro_proc_segment() in the unaccelerated vlan hdr case needs to also be offset by vlan_hdr_len from skb->data. I've fixed this in the attached patch. Thanks, Drew --- linux-2.6.23-rc1.orig/net/ipv4/inet_lro.c 2007-07-30 13:10:49.0 -0400 +++ linux-2.6.23-rc1/net/ipv4/inet_lro.c2007-07-30 15:17:51.0 -0400 @@ -126,6 +126,7 @@ static void lro_update_tcp_ip_header(str htons(lro_desc->ip_tot_len) - IP_HDR_LEN(iph), IPPROTO_TCP, lro_desc->data_csum); + lro_desc->parent->ip_summed = CHECKSUM_UNNECESSARY; } static __wsum lro_tcp_data_csum(struct iphdr *iph, struct tcphdr *tcph, int len) @@ -396,6 +397,9 @@ static struct sk_buff *lro_gen_skb(struc if (!skb) return NULL; + if (hlen > len) + hlen = len; + skb->len = len; skb->data_len = len - hlen; skb->truesize += true_size; diff -urp a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c --- a/drivers/net/myri10ge/myri10ge.c 2007-07-24 15:57:12.0 -0400 +++ b/drivers/net/myri10ge/myri10ge.c 2007-07-30 16:12:06.0 -0400 @@ -48,6 +48,7 @@ #include #include #include +#include #include #include #include @@ -62,6 +63,8 @@ #include #include #include +#include +#include #include #include #include @@ -89,6 +92,7 @@ MODULE_LICENSE("Dual BSD/GPL"); #define MYRI10GE_EEPROM_STRINGS_SIZE 256 #define MYRI10GE_MAX_SEND_DESC_TSO ((65536 / 2048) * 2) +#define MYRI10GE_MAX_LRO_DESCRIPTORS 8 #define MYRI10GE_NO_CONFIRM_DATA htonl(0x) #define MYRI10GE_NO_RESPONSE_RESULT 0x @@ -151,6 +155,8 @@ struct myri10ge_rx_done { dma_addr_t bus; int cnt; int idx; + struct net_lro_mgr lro_mgr; + struct net_lro_desc lro_desc[MYRI10GE_MAX_LRO_DESCRIPTORS]; }; struct myri10ge_priv { @@ -276,6 +282,14 @@ static int myri10ge_debug = -1;/* defau module_param(myri10ge_debug, int, 0); MODULE_PARM_DESC(myri10ge_debug, "Debug level (0=none,...,16=all)"); +static int myri10ge_lro = 1; +module_param(myri10ge_lro, int, S_IRUGO); +MODULE_PARM_DESC(myri1
Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic
Could someone add this to: http://linux-net.osdl.org Wiki? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic
Here is a quick reply before something more official can be written up: Linas Vepstas wrote: > -- what is LRO? Large Receive Offload > -- Basic principles of operation? LRO is analogous to a receive side version of TSO. The NIC (or driver) merges several consecutive segments from the same connection, fixing up checksums, etc. Rather than up to 45 separate 1500 byte frames (meaning up to 45 trips through the network stack), the driver merges them into one 65212 byte frame. It currently works only with TCP over IPv4. LRO was, AFAIK, first though of by Neterion. They had a paper about it at OLS2005. http://www.linuxinsight.com/files/ols2005/grossman-reprint.pdf > -- Can I use it in my driver? Yes, it can be used in any driver. > -- Does my hardware have to have some special feature before I can use it? No. > -- What sort of performance improvement does it provide? Throughput? >Latency? CPU usage? How does it affect DMA allocation? Does it >improve only a certain type of traffic (large/small packets, etc.) The benefit is directly proportional to the packet rate. See my reply to the previous RFC for performance information. The executive summary is that for the myri10ge 10GbE driver on low end hardware with 1500b frames, I've seen it increase throughput by a factor of nearly 2.5x, while at the same time reducing CPU utilization by 17%. The affect for jumbo frames is less dramatic, but still impressive (1.10x, 14% CPU reduction) You can achieve better speedups if your driver receives into high-order pages. > -- Example code? What's the API? How should my driver use it? The 3/4 in this patch showed an example of converting a driver to use LRO for skb based receive buffers. I'm working on a patch for myri10ge that shows how you would use it in a driver which receives into pages. Cheers, Drew - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic
On Mon, Jul 30, 2007 at 05:24:33PM +0200, Jan-Bernd Themann wrote: > > Changes to http://www.spinics.net/lists/netdev/msg36912.html > > 1) A new field called "features" has been added to the net_lro_mgr struct. >It is set by the driver to indicate: >- LRO_F_NAPI:Use NAPI / netif_rx to pass packets to stack > >- LRO_F_EXTRACT_VLAN_ID: Set by driver if HW extracts VLAN IDs for VLAN > packets but does not modify ETH protocol (ETH_P_8021Q) > > 2) Padded frames are not aggregated for now. Bug fixed > > 3) Correct header length now used. No minimal header length for aggregated >packets used anymore. > > 4) Statistic counters were introduced. They are stored in a new struct in >the net_lro_mgr. This has the advantage that no locking is required in >cases where the driver uses multiple lro_mgrs for different receive queues. >Thus we get the following statistics per lro_mgr / eth device: >- Number of aggregated packets >- Number of flushed packets >- Number of times we run out of lro_desc. > >The ratio of "aggregated packets" and "flushed packets" give you an >idea how well LRO is working. I'd like to see an edited form of this, together with an introduction to LRO, written up in the Documentation subdirectory. As someone with some driver experience, but not on te bleeding edge, some basc newbie questions pop into mind: -- what is LRO? -- Basic principles of operation? -- Can I use it in my driver? -- Does my hardware have to have some special feature before I can use it? -- What sort of performance improvement does it provide? Throughput? Latency? CPU usage? How does it affect DMA allocation? Does it improve only a certain type of traffic (large/small packets, etc.) -- Example code? What's the API? How should my driver use it? Right now, I can maybe find answers by doing lots of googling. I'd like to have some quick way of getting a grip on this. --linas - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic
Seems pretty good to me, save for one minor detail: patches #1/#2 should be combined together for greater git-bisect happiness. Ditto for patches #3/#4. Largely harmless in this case, but keeps the git history pollution to a minimum. Caveat reviewer: I'm not an expert of net/ipv4/* code, so I reviewed largely from the driver API perspective. David, thoughts on merging? I'm not We could stick this into your tree or mine. Whether yours or mine, I would like to keep the driver and net-core patches together in the same git tree. Jeff - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/