Re: [PATCH 1/1] lro: Generic Large Receive Offload for TCP traffic

2007-08-06 Thread Jan-Bernd Themann
Hi Jörn

On Friday 03 August 2007 15:41, Jörn Engel wrote:
> On Fri, 3 August 2007 14:41:19 +0200, Jan-Bernd Themann wrote:
> >
> > This patch provides generic Large Receive Offload (LRO) functionality
> > for IPv4/TCP traffic.
> >
> > LRO combines received TCP packets into a single larger TCP packet and
> > then passes it to the network stack in order to increase performance
> > (throughput). The interface supports two modes: Drivers can either pass
> > SKBs or fragment lists to the LRO engine.
>
> Maybe this is a stupid question, but why is LRO done at the device
> driver level?
>
> If it is a universal performance benefit, I would have expected it to be
> done generically, i.e. have all packets moved into the network layer pass
> through LRO instead.

The driver seems to be the right place:
-  There is the page mode interface that accepts fragment lists instead of
   SKBs and generates SKBs only at the end (see Andrew Gallatin's
   mails where he described the advantages of this approach)

-  Some drivers (in particular for 10G NICs, which actually could benefit
   from LRO) have multiple HW receive queues that do some sort of sorting,
   thus using one lro_mgr per queue increases the likelihood of being able
   to do efficient LRO.
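
To illustrate the second point, here is a minimal user-space sketch. It is
not code from the patch: flow_key, lro_desc_sketch, lro_mgr_sketch and the
lro_sketch_* functions are hypothetical stand-ins, and out-of-order handling
is left out. It only shows why a small descriptor array per receive queue
can be enough when the HW already sorts flows into queues.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_DESC 8

/* Hypothetical flow key; a real engine would compare IP/TCP header fields. */
struct flow_key {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Simplified stand-in for one LRO descriptor (one TCP session). */
struct lro_desc_sketch {
	struct flow_key key;
	uint32_t next_seq;	/* sequence number expected next */
	int pkt_aggr_cnt;	/* packets merged so far */
	int active;
};

/* Simplified stand-in for one LRO manager, meant to be used per RX queue. */
struct lro_mgr_sketch {
	int max_desc;
	struct lro_desc_sketch desc[SKETCH_MAX_DESC];
	unsigned long aggregated, no_desc;	/* loosely mirrors net_lro_stats */
};

static int flow_equal(const struct flow_key *a, const struct flow_key *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport;
}

void lro_sketch_init(struct lro_mgr_sketch *mgr, int max_desc)
{
	assert(max_desc > 0 && max_desc <= SKETCH_MAX_DESC);
	memset(mgr, 0, sizeof(*mgr));
	mgr->max_desc = max_desc;
}

/*
 * Returns 1 if the segment was merged into an existing session,
 * 0 if it opened a new session, -1 if no descriptor was free.
 */
int lro_sketch_receive(struct lro_mgr_sketch *mgr, const struct flow_key *key,
		       uint32_t seq, uint32_t len)
{
	struct lro_desc_sketch *free_desc = NULL;
	int i;

	for (i = 0; i < mgr->max_desc; i++) {
		struct lro_desc_sketch *d = &mgr->desc[i];

		if (!d->active) {
			if (!free_desc)
				free_desc = d;
			continue;
		}
		if (flow_equal(&d->key, key) && d->next_seq == seq) {
			d->next_seq += len;
			d->pkt_aggr_cnt++;
			mgr->aggregated++;
			return 1;
		}
	}
	if (!free_desc) {
		mgr->no_desc++;	/* segment would go to the stack unmerged */
		return -1;
	}
	free_desc->key = *key;
	free_desc->next_seq = seq + len;
	free_desc->pkt_aggr_cnt = 1;
	free_desc->active = 1;
	return 0;
}
```

With one shared manager that has a single descriptor, a second flow finds
no descriptor. With one manager per queue and the flows sorted by the HW,
both flows keep aggregating even though each manager is just as small.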
   

> > +void lro_flush_pkt(struct net_lro_mgr *lro_mgr,
> > +                   struct iphdr *iph, struct tcphdr *tcph);
>
> In particular this bit looks like it should be driven by a timeout,
> which would be settable via /proc/sys/net/core/lro_timeout or similar.

No, this function is needed for page mode, as some HW provides
extra handling for small packets: these packets are not stored in
preallocated pages but in extra queues. Thus the driver needs a way to flush
old sessions for this connection and handle these packets in a different way
(for example, create an SKB and copy the data there).
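
A rough user-space sketch of that driver-side pattern follows. All names
in it (session_sketch, stack_log, flush_flow, rx_page, rx_small) are
hypothetical and only model the idea; flush_flow plays the role that
lro_flush_pkt() has in the patch. The assumption made here is that the
held session is flushed first so that the stack sees the data of a
connection in arrival order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_LOG_MAX   8
#define SKETCH_SMALL_MAX 256

/* One held-back LRO session (hypothetical, a single slot for brevity). */
struct session_sketch {
	uint32_t flow_id;
	uint32_t held_len;	/* bytes aggregated but not yet delivered */
	int active;
};

/* Records what the "network stack" received, in order. */
struct stack_log {
	uint32_t flow_id[SKETCH_LOG_MAX];
	uint32_t len[SKETCH_LOG_MAX];
	int n;
};

static void stack_deliver(struct stack_log *log, uint32_t flow_id, uint32_t len)
{
	assert(log->n < SKETCH_LOG_MAX);
	log->flow_id[log->n] = flow_id;
	log->len[log->n] = len;
	log->n++;
}

/* Analogue of lro_flush_pkt(): push out whatever is held for this flow. */
void flush_flow(struct session_sketch *s, uint32_t flow_id, struct stack_log *log)
{
	if (s->active && s->flow_id == flow_id) {
		stack_deliver(log, flow_id, s->held_len);
		s->active = 0;
		s->held_len = 0;
	}
}

/* Page-mode path: data sits in a preallocated page and gets aggregated. */
void rx_page(struct session_sketch *s, uint32_t flow_id, uint32_t len,
	     struct stack_log *log)
{
	if (s->active && s->flow_id != flow_id)
		flush_flow(s, s->flow_id, log);	/* slot is needed for a new flow */
	s->flow_id = flow_id;
	s->held_len += len;
	s->active = 1;
}

/* Small-packet path: the HW put the packet into an extra queue. */
void rx_small(struct session_sketch *s, uint32_t flow_id, const void *data,
	      uint32_t len, struct stack_log *log)
{
	unsigned char copy[SKETCH_SMALL_MAX];

	assert(len <= sizeof(copy));
	flush_flow(s, flow_id, log);	/* older data of this connection goes first */
	memcpy(copy, data, len);	/* stands in for "create an SKB and copy" */
	stack_deliver(log, flow_id, len);
}
```

In this model, two page-mode segments followed by one small packet of the
same flow reach the stack as one aggregated packet and then the small one.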

Timeouts are not used at all. Experiments showed that flushing at the end 
of a NAPI poll round seems to be sufficient (see Andrew's test results)
and does not affect the latency too badly.
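
The flush-at-end-of-poll idea can be sketched in user space as follows.
This is only an illustration under simplifying assumptions, not the code
from the patch: segment, session, poll_ctx, receive_segment and poll_round
are hypothetical names, and "delivery" just bumps counters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_SESSIONS 4

struct segment {
	uint32_t flow_id, seq, len;
};

struct session {
	uint32_t flow_id, next_seq, total_len;
	int active;
};

struct poll_ctx {
	struct session sess[SKETCH_MAX_SESSIONS];
	unsigned int delivered;		/* packets handed to the stack */
	uint32_t delivered_bytes;
};

static void deliver(struct poll_ctx *ctx, uint32_t len)
{
	ctx->delivered++;
	ctx->delivered_bytes += len;
}

static void flush_session(struct poll_ctx *ctx, struct session *s)
{
	if (!s->active)
		return;
	deliver(ctx, s->total_len);
	s->active = 0;
}

static void receive_segment(struct poll_ctx *ctx, const struct segment *seg)
{
	struct session *slot = NULL;
	int i;

	for (i = 0; i < SKETCH_MAX_SESSIONS; i++) {
		struct session *s = &ctx->sess[i];

		if (s->active && s->flow_id == seg->flow_id) {
			if (s->next_seq == seg->seq) {
				/* In order: merge into the held packet. */
				s->next_seq += seg->len;
				s->total_len += seg->len;
				return;
			}
			flush_session(ctx, s);	/* out of order: stop merging */
			slot = s;
			break;
		}
		if (!s->active && !slot)
			slot = s;
	}
	if (!slot) {
		deliver(ctx, seg->len);	/* no free session: pass through */
		return;
	}
	slot->flow_id = seg->flow_id;
	slot->next_seq = seg->seq + seg->len;
	slot->total_len = seg->len;
	slot->active = 1;
}

/* One poll round: handle up to budget segments, then flush everything. */
int poll_round(struct poll_ctx *ctx, const struct segment *rx, size_t n,
	       int budget)
{
	int done, i;

	for (done = 0; done < budget && (size_t)done < n; done++)
		receive_segment(ctx, &rx[done]);

	/* No timer: nothing stays held back once the round is over. */
	for (i = 0; i < SKETCH_MAX_SESSIONS; i++)
		flush_session(ctx, &ctx->sess[i]);
	return done;
}
```

Since every session is flushed before the poll function returns, a held
packet waits at most for the rest of one poll round, which is the reason
a separate timeout is not needed in this model.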

Regards,
Jan-Bernd
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/1] lro: Generic Large Receive Offload for TCP traffic

2007-08-03 Thread Jan-Bernd Themann
This patch provides generic Large Receive Offload (LRO) functionality
for IPv4/TCP traffic.

LRO combines received TCP packets into a single larger TCP packet and
then passes it to the network stack in order to increase performance
(throughput). The interface supports two modes: Drivers can either pass
SKBs or fragment lists to the LRO engine.

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]


---
 include/linux/inet_lro.h |  177 ++
 net/ipv4/Kconfig         |    8 +
 net/ipv4/Makefile        |    1 +
 net/ipv4/inet_lro.c      |  600 ++
 4 files changed, 786 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/inet_lro.h
 create mode 100644 net/ipv4/inet_lro.c

diff --git a/include/linux/inet_lro.h b/include/linux/inet_lro.h
new file mode 100644
index 000..e1fc1d1
--- /dev/null
+++ b/include/linux/inet_lro.h
@@ -0,0 +1,177 @@
+/*
+ *  linux/include/linux/inet_lro.h
+ *
+ *  Large Receive Offload (ipv4 / tcp)
+ *
+ *  (C) Copyright IBM Corp. 2007
+ *
+ *  Authors:
+ *   Jan-Bernd Themann [EMAIL PROTECTED]
+ *   Christoph Raisch [EMAIL PROTECTED]
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef __INET_LRO_H_
+#define __INET_LRO_H_
+
+#include <net/ip.h>
+#include <net/tcp.h>
+
+/*
+ * LRO statistics
+ */
+
+struct net_lro_stats {
+   unsigned long aggregated;
+   unsigned long flushed;
+   unsigned long no_desc;
+};
+
+/*
+ * LRO descriptor for a tcp session
+ */
+struct net_lro_desc {
+   struct sk_buff *parent;
+   struct sk_buff *last_skb;
+   struct skb_frag_struct *next_frag;
+   struct iphdr *iph;
+   struct tcphdr *tcph;
+   struct vlan_group *vgrp;
+   __wsum  data_csum;
+   u32 tcp_rcv_tsecr;
+   u32 tcp_rcv_tsval;
+   u32 tcp_ack;
+   u32 tcp_next_seq;
+   u32 skb_tot_frags_len;
+   u16 ip_tot_len;
+   u16 tcp_saw_tstamp; /* timestamps enabled */
+   u16 tcp_window;
+   u16 vlan_tag;
+   int pkt_aggr_cnt;   /* counts aggregated packets */
+   int vlan_packet;
+   int mss;
+   int active;
+};
+
+/*
+ * Large Receive Offload (LRO) Manager
+ *
+ * Fields must be set by driver
+ */
+
+struct net_lro_mgr {
+   struct net_device *dev;
+   struct net_lro_stats stats;
+
+   /* LRO features */
+   unsigned long features;
+#define LRO_F_NAPI            1  /* Pass packets to stack via NAPI */
+#define LRO_F_EXTRACT_VLAN_ID 2  /* Set flag if VLAN IDs are extracted
+   from received packets and eth protocol
+   is still ETH_P_8021Q */
+
+   u32 ip_summed;  /* Set in non generated SKBs in page mode */
+   u32 ip_summed_aggr; /* Set in aggregated SKBs: CHECKSUM_UNNECESSARY
+* or CHECKSUM_NONE */
+
+   int max_desc; /* Max number of LRO descriptors  */
+   int max_aggr; /* Max number of LRO packets to be aggregated */
+
+   struct net_lro_desc *lro_arr; /* Array of LRO descriptors */
+
+   /*
+* Optimized driver functions
+*
+* get_skb_header: returns tcp and ip header for packet in SKB
+*/
+   int (*get_skb_header)(struct sk_buff *skb, void **ip_hdr,
+ void **tcpudp_hdr, u64 *hdr_flags, void *priv);
+
+   /* hdr_flags: */
+#define LRO_IPV4 1 /* ip_hdr is IPv4 header */
+#define LRO_TCP  2 /* tcpudp_hdr is TCP header */
+
+   /*
+* get_frag_header: returns mac, tcp and ip header for packet in SKB
+*
+* @hdr_flags: Indicate what kind of LRO has to be done
+* (IPv4/IPv6/TCP/UDP)
+*/
+   int (*get_frag_header)(struct skb_frag_struct *frag, void **mac_hdr,
+  void **ip_hdr, void **tcpudp_hdr, u64 *hdr_flags,
+  void *priv);
+};
+
+/*
+ * Processes a SKB
+ *
+ * @lro_mgr: LRO manager to use
+ * @skb: SKB to aggregate
+ * @priv: Private data that may be used by driver functions
+ *(for example get_tcp_ip_hdr)
+ */
+
+void lro_receive_skb(struct net_lro_mgr *lro_mgr,
+struct sk_buff *skb,
+void *priv);
+
+/*
+ * Processes a SKB with VLAN HW acceleration support
+ */
+
+void lro_vlan_hwaccel_receive_skb(struct 

Re: [PATCH 1/1] lro: Generic Large Receive Offload for TCP traffic

2007-08-03 Thread Jörn Engel
On Fri, 3 August 2007 14:41:19 +0200, Jan-Bernd Themann wrote:
 
> This patch provides generic Large Receive Offload (LRO) functionality
> for IPv4/TCP traffic.
>
> LRO combines received TCP packets into a single larger TCP packet and
> then passes it to the network stack in order to increase performance
> (throughput). The interface supports two modes: Drivers can either pass
> SKBs or fragment lists to the LRO engine.

Maybe this is a stupid question, but why is LRO done at the device
driver level?

If it is a universal performance benefit, I would have expected it to be
done generically, i.e. have all packets moved into the network layer pass
through LRO instead.

> +void lro_flush_pkt(struct net_lro_mgr *lro_mgr,
> +                   struct iphdr *iph, struct tcphdr *tcph);

In particular this bit looks like it should be driven by a timeout,
which would be settable via /proc/sys/net/core/lro_timeout or similar.

Jörn

-- 
Rules of Optimization:
Rule 1: Don't do it.
Rule 2 (for experts only): Don't do it yet.
-- M.A. Jackson