[1/5] [NET]: Merge TSO/UFO fields in sk_buff

2006-06-22 Thread Herbert Xu
Hi:

[NET]: Merge TSO/UFO fields in sk_buff

Having separate fields in sk_buff for TSO/UFO (tso_size/ufo_size) is not
going to scale if we add any more segmentation methods (e.g., DCCP).  So
let's merge them.

They were used to tell the protocol of a packet.  This function has been
subsumed by the new gso_type field.  This is essentially a set of netdev
feature bits (shifted by 16 bits) that are required to process a specific
skb.  As such it's easy to tell whether a given device can process a GSO
skb: you just have to AND the gso_type field with the netdev's features
field.
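
As a rough illustration of that check, here is a minimal user-space sketch.
The constants mirror the values discussed in this thread (shift of 16, one
bit per GSO type), but they are redefined locally rather than taken from the
kernel headers, and dev_can_segment() is a hypothetical helper name, not a
function from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative copies of the values discussed in the thread. */
enum {
	SKB_GSO_TCPV4 = 1 << 0,
	SKB_GSO_UDPV4 = 1 << 1,
};

#define NETIF_F_GSO_SHIFT	16
#define NETIF_F_TSO		(SKB_GSO_TCPV4 << NETIF_F_GSO_SHIFT)
#define NETIF_F_UFO		(SKB_GSO_UDPV4 << NETIF_F_GSO_SHIFT)

/*
 * Hypothetical helper: shift gso_type into feature-bit position and
 * AND it with the device features.  The device can process the skb
 * only if every required bit survives the AND.
 */
static bool dev_can_segment(uint32_t dev_features, uint32_t gso_type)
{
	uint32_t needed = gso_type << NETIF_F_GSO_SHIFT;

	return (dev_features & needed) == needed;
}
```

A device advertising only NETIF_F_TSO passes the check for an SKB_GSO_TCPV4
skb and fails it for an SKB_GSO_UDPV4 one.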

I've made gso_type a conjunction.  The idea is that you have a base type
(e.g., SKB_GSO_TCPV4) that can be modified further to support new features.
For example, if we add a hardware TSO type that supports ECN, its driver would
declare NETIF_F_TSO | NETIF_F_TSO_ECN.  All TSO packets with CWR set would
have a gso_type of SKB_GSO_TCPV4 | SKB_GSO_TCPV4_ECN while all other TSO
packets would be SKB_GSO_TCPV4.  This means that only the CWR packets need
to be emulated in software.
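
To make the base-type-plus-modifier idea concrete, here is a small sketch
under stated assumptions: the bit values are local copies (SKB_GSO_TCPV4_ECN
as bit 2 is an assumption for illustration), and both tso_gso_type() and
needs_software_gso() are hypothetical helper names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values; the ECN bit position is an assumption. */
enum {
	SKB_GSO_TCPV4     = 1 << 0,
	SKB_GSO_UDPV4     = 1 << 1,
	SKB_GSO_TCPV4_ECN = 1 << 2,
};

#define NETIF_F_GSO_SHIFT	16
#define NETIF_F_TSO		(SKB_GSO_TCPV4 << NETIF_F_GSO_SHIFT)
#define NETIF_F_TSO_ECN		(SKB_GSO_TCPV4_ECN << NETIF_F_GSO_SHIFT)

/* Hypothetical: only packets carrying CWR get the ECN modifier bit. */
static uint32_t tso_gso_type(bool cwr_set)
{
	uint32_t type = SKB_GSO_TCPV4;

	if (cwr_set)
		type |= SKB_GSO_TCPV4_ECN;
	return type;
}

/* Hypothetical: true when the device lacks any bit the skb requires. */
static bool needs_software_gso(uint32_t dev_features, uint32_t gso_type)
{
	uint32_t needed = gso_type << NETIF_F_GSO_SHIFT;

	return (dev_features & needed) != needed;
}
```

With a device that declares only NETIF_F_TSO, an ordinary TSO packet goes to
hardware and only the CWR packet falls back to software; a device that also
declares NETIF_F_TSO_ECN handles both.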

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
diff --git a/drivers/net/8139cp.c b/drivers/net/8139cp.c
--- a/drivers/net/8139cp.c
+++ b/drivers/net/8139cp.c
@@ -792,7 +792,7 @@ static int cp_start_xmit (struct sk_buff
 	entry = cp->tx_head;
 	eor = (entry == (CP_TX_RING_SIZE - 1)) ? RingEnd : 0;
 	if (dev->features & NETIF_F_TSO)
-		mss = skb_shinfo(skb)->tso_size;
+		mss = skb_shinfo(skb)->gso_size;
 
 	if (skb_shinfo(skb)->nr_frags == 0) {
 		struct cp_desc *txd = &cp->tx_ring[entry];
diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -1640,7 +1640,7 @@ bnx2_tx_int(struct bnx2 *bp)
 		skb = tx_buf->skb;
 #ifdef BCM_TSO
 		/* partial BD completions possible with TSO packets */
-		if (skb_shinfo(skb)->tso_size) {
+		if (skb_shinfo(skb)->gso_size) {
 			u16 last_idx, last_ring_idx;
 
 			last_idx = sw_cons +
@@ -4428,7 +4428,7 @@ bnx2_start_xmit(struct sk_buff *skb, str
 			(TX_BD_FLAGS_VLAN_TAG | (vlan_tx_tag_get(skb) << 16));
 	}
 #ifdef BCM_TSO
-	if ((mss = skb_shinfo(skb)->tso_size) &&
+	if ((mss = skb_shinfo(skb)->gso_size) &&
 		(skb->len > (bp->dev->mtu + ETH_HLEN))) {
 		u32 tcp_opt_len, ip_tcp_len;
 
diff --git a/drivers/net/chelsio/sge.c b/drivers/net/chelsio/sge.c
--- a/drivers/net/chelsio/sge.c
+++ b/drivers/net/chelsio/sge.c
@@ -1418,7 +1418,7 @@ int t1_start_xmit(struct sk_buff *skb, s
 	struct cpl_tx_pkt *cpl;
 
 #ifdef NETIF_F_TSO
-	if (skb_shinfo(skb)->tso_size) {
+	if (skb_shinfo(skb)->gso_size) {
 		int eth_type;
 		struct cpl_tx_pkt_lso *hdr;
 
@@ -1433,7 +1433,7 @@ int t1_start_xmit(struct sk_buff *skb, s
 		hdr->ip_hdr_words = skb->nh.iph->ihl;
 		hdr->tcp_hdr_words = skb->h.th->doff;
 		hdr->eth_type_mss = htons(MK_ETH_TYPE_MSS(eth_type,
-						skb_shinfo(skb)->tso_size));
+						skb_shinfo(skb)->gso_size));
 		hdr->len = htonl(skb->len - sizeof(*hdr));
 		cpl = (struct cpl_tx_pkt *)hdr;
 		sge->stats.tx_lso_pkts++;
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -2394,7 +2394,7 @@ e1000_tso(struct e1000_adapter *adapter,
 	uint8_t ipcss, ipcso, tucss, tucso, hdr_len;
 	int err;
 
-	if (skb_shinfo(skb)->tso_size) {
+	if (skb_shinfo(skb)->gso_size) {
 		if (skb_header_cloned(skb)) {
 			err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
 			if (err)
@@ -2402,7 +2402,7 @@ e1000_tso(struct e1000_adapter *adapter,
 		}
 
 		hdr_len = ((skb->h.raw - skb->data) + (skb->h.th->doff << 2));
-		mss = skb_shinfo(skb)->tso_size;
+		mss = skb_shinfo(skb)->gso_size;
 		if (skb->protocol == htons(ETH_P_IP)) {
 			skb->nh.iph->tot_len = 0;
 			skb->nh.iph->check = 0;
@@ -2519,7 +2519,7 @@ e1000_tx_map(struct e1000_adapter *adapt
 		 * tso gets written back prematurely before the data is fully
 		 * DMA'd to the controller */
 		if (!skb->data_len && tx_ring->last_tx_tso &&
-		    !skb_shinfo(skb)->tso_size) {
+		    !skb_shinfo(skb)->gso_size) {
 			tx_ring->last_tx_tso = 0;
 			size -= 4;
 		}

Re: [1/5] [NET]: Merge TSO/UFO fields in sk_buff

2006-06-21 Thread David Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Thu, 22 Jun 2006 11:09:25 +1000

> ECE just needs to be replicated so it would seem to be a safe bet unless
> Dave knows some really broken hardware out there? If not I'd say that
> we should just assume that it works and add a new bit if said broken
> stuff does turn up.

ECE simply needs to persist while the ECE condition is true.
If it is true when we build the TSO frame, it would have
thus been true during the time in which we had built each
individual sub-frame.

I don't anticipate any problems if you just mirror the ECE
bit in each chopped up frame.

> Thanks a lot for looking into this!

Yes, indeed, thanks Michael.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [1/5] [NET]: Merge TSO/UFO fields in sk_buff

2006-06-21 Thread Herbert Xu
On Wed, Jun 21, 2006 at 05:46:24PM -0700, Michael Chan wrote:
>
> OK, if time permits, I'll cook up some patches to support generic TSO
> ECN with or without hardware support.  Without hardware ECN, it will use
> GSO to split up the packet with CWR.  Can we assume that all hardware
> will handle ECE properly?

ECE just needs to be replicated so it would seem to be a safe bet unless
Dave knows some really broken hardware out there? If not I'd say that
we should just assume that it works and add a new bit if said broken
stuff does turn up.

Thanks a lot for looking into this!

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [1/5] [NET]: Merge TSO/UFO fields in sk_buff

2006-06-21 Thread Michael Chan
On Thu, 2006-06-22 at 09:27 +1000, Herbert Xu wrote:
> Hi Michael:
> 
> On Wed, Jun 21, 2006 at 02:48:15PM -0700, Michael Chan wrote:
> > 
> > We have some hardware that supports TSO and ECN.  Is something like the
> > patch below what you had in mind to support NETIF_F_TSO_ECN?  Or are you
> > thinking about something more generic that works with or without
> > hardware support?
> 
> Yeah I was thinking of something more generic because packets with CWR
> set should be rare.
> 
OK, if time permits, I'll cook up some patches to support generic TSO
ECN with or without hardware support.  Without hardware ECN, it will use
GSO to split up the packet with CWR.  Can we assume that all hardware
will handle ECE properly?



Re: [1/5] [NET]: Merge TSO/UFO fields in sk_buff

2006-06-21 Thread Herbert Xu
Hi Michael:

On Wed, Jun 21, 2006 at 02:48:15PM -0700, Michael Chan wrote:
> 
> We have some hardware that supports TSO and ECN.  Is something like the
> patch below what you had in mind to support NETIF_F_TSO_ECN?  Or are you
> thinking about something more generic that works with or without
> hardware support?

Yeah I was thinking of something more generic because packets with CWR
set should be rare.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [1/5] [NET]: Merge TSO/UFO fields in sk_buff

2006-06-21 Thread Michael Chan
On Tue, 2006-06-20 at 19:10 +1000, Herbert Xu wrote:

> I've made gso_type a conjunction.  The idea is that you have a base type
> (e.g., SKB_GSO_TCPV4) that can be modified further to support new features.
> For example, if we add a hardware TSO type that supports ECN, its driver would
> declare NETIF_F_TSO | NETIF_F_TSO_ECN.

Hi Herbert,

We have some hardware that supports TSO and ECN.  Is something like the
patch below what you had in mind to support NETIF_F_TSO_ECN?  Or are you
thinking about something more generic that works with or without
hardware support?

[NET]: Add hardware TSO support for ECN

In the current TSO implementation, NETIF_F_TSO and ECN cannot be
turned on together in a TCP connection.  This patch adds a new
feature NETIF_F_TSO_ECN for hardware that supports TSO and ECN.

To support NETIF_F_TSO_ECN, hardware has to set the ECE flag in the
TCP flags for all segments if the first TSO segment has the ECE flag set.
If the CWR flag is set in the first TSO segment, hardware has to set
CWR in the first segment only and clear it in all subsequent segments.
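
The per-segment flag rule described above can be sketched as follows.  This
is only a model of the required behaviour, not driver or firmware code;
struct seg_flags and tso_ecn_segment_flags() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-segment view of the two ECN-related TCP flags. */
struct seg_flags {
	bool ece;
	bool cwr;
};

/*
 * Model of the rule: ECE from the TSO frame is replicated into every
 * resulting segment, while CWR is kept in the first segment only and
 * cleared in all subsequent ones.
 */
static void tso_ecn_segment_flags(bool ece, bool cwr,
				  struct seg_flags *out, size_t nsegs)
{
	size_t i;

	for (i = 0; i < nsegs; i++) {
		out[i].ece = ece;
		out[i].cwr = cwr && i == 0;
	}
}
```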

Signed-off-by: Michael Chan <[EMAIL PROTECTED]>


diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a3af961..825b66d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -316,6 +316,7 @@ struct net_device
 #define NETIF_F_GSO_SHIFT	16
 #define NETIF_F_TSO		(SKB_GSO_TCPV4 << NETIF_F_GSO_SHIFT)
 #define NETIF_F_UFO		(SKB_GSO_UDPV4 << NETIF_F_GSO_SHIFT)
+#define NETIF_F_TSO_ECN		(SKB_GSO_TCPV4_ECN << NETIF_F_GSO_SHIFT)
 
 #define NETIF_F_GEN_CSUM	(NETIF_F_NO_CSUM | NETIF_F_HW_CSUM)
 #define NETIF_F_ALL_CSUM	(NETIF_F_IP_CSUM | NETIF_F_GEN_CSUM)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 679feab..818f478 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -173,6 +173,7 @@ enum {
 enum {
 	SKB_GSO_TCPV4 = 1 << 0,
 	SKB_GSO_UDPV4 = 1 << 1,
+	SKB_GSO_TCPV4_ECN = 1 << 2,
 };
 
 /**
diff --git a/include/net/sock.h b/include/net/sock.h
index 6aac245..7c1ac0c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1034,7 +1034,8 @@ static inline void sk_setup_caps(struct
 	if (sk->sk_route_caps & NETIF_F_GSO)
 		sk->sk_route_caps |= NETIF_F_TSO;
 	if (sk->sk_route_caps & NETIF_F_TSO) {
-		if (sock_flag(sk, SOCK_NO_LARGESEND) || dst->header_len)
+		if ((sock_flag(sk, SOCK_NO_LARGESEND) &&
+		    !(sk->sk_route_caps & NETIF_F_TSO_ECN)) || dst->header_len)
 			sk->sk_route_caps &= ~NETIF_F_TSO;
 		else
 			sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index c6b8439..c8a3b48 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -31,7 +31,8 @@ static inline void TCP_ECN_send_syn(stru
 				    struct sk_buff *skb)
 {
 	tp->ecn_flags = 0;
-	if (sysctl_tcp_ecn && !(sk->sk_route_caps & NETIF_F_TSO)) {
+	if (sysctl_tcp_ecn && (!(sk->sk_route_caps & NETIF_F_TSO) ||
+	    (sk->sk_route_caps & NETIF_F_TSO_ECN))) {
 		TCP_SKB_CB(skb)->flags |= TCPCB_FLAG_ECE|TCPCB_FLAG_CWR;
 		tp->ecn_flags = TCP_ECN_OK;
 		sock_set_flag(sk, SOCK_NO_LARGESEND);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index bdd71db..a65fe56 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2044,7 +2044,8 @@ struct sk_buff * tcp_make_synack(struct
 	memset(th, 0, sizeof(struct tcphdr));
 	th->syn = 1;
 	th->ack = 1;
-	if (dst->dev->features&NETIF_F_TSO)
+	if ((dst->dev->features&NETIF_F_TSO) &&
+	    !(dst->dev->features&NETIF_F_TSO_ECN))
 		ireq->ecn_ok = 0;
 	TCP_ECN_make_synack(req, th);
 	th->source = inet_sk(sk)->sport;




[1/5] [NET]: Merge TSO/UFO fields in sk_buff

2006-06-20 Thread Herbert Xu
Hi:

[NET]: Merge TSO/UFO fields in sk_buff

Having separate fields in sk_buff for TSO/UFO (tso_size/ufo_size) is not
going to scale if we add any more segmentation methods (e.g., DCCP).  So
let's merge them.

They were used to tell the protocol of a packet.  This function has been
subsumed by the new gso_type field.  This is essentially a set of netdev
feature bits (shifted by 16 bits) that are required to process a specific
skb.  As such it's easy to tell whether a given device can process a GSO
skb: you just have to AND the gso_type field with the netdev's features
field.

I've made gso_type a conjunction.  The idea is that you have a base type
(e.g., SKB_GSO_TCPV4) that can be modified further to support new features.
For example, if we add a hardware TSO type that supports ECN, its driver would
declare NETIF_F_TSO | NETIF_F_TSO_ECN.  All TSO packets with CWR set would
have a gso_type of SKB_GSO_TCPV4 | SKB_GSO_TCPV4_ECN while all other TSO
packets would be SKB_GSO_TCPV4.  This means that only the CWR packets need
to be emulated in software.  The emulation could even chop it up into one
CWR fragment and another super-packet to be further segmented by the NIC.
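
One possible shape for that split, sketched purely on payload lengths: the
first MSS-sized piece keeps CWR and the remainder becomes a CWR-free
super-packet that plain TSO hardware could segment further.  This is an
assumption about how such emulation might look, not code from this patch
series; struct gso_piece and split_cwr_packet() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one piece produced by the split. */
struct gso_piece {
	uint32_t payload_len;
	bool cwr;
};

/*
 * Split a CWR-carrying TSO payload into at most two pieces:
 *   out[0]: the first mss bytes, still carrying CWR;
 *   out[1]: the rest, without CWR, left for the NIC to segment.
 * Returns the number of pieces written (1 when no split is needed).
 */
static int split_cwr_packet(uint32_t payload_len, uint32_t mss,
			    struct gso_piece out[2])
{
	if (mss == 0 || payload_len <= mss) {
		out[0] = (struct gso_piece){ payload_len, true };
		return 1;
	}

	out[0] = (struct gso_piece){ mss, true };
	out[1] = (struct gso_piece){ payload_len - mss, false };
	return 2;
}
```

For a 4000-byte payload with an MSS of 1460 this yields a 1460-byte CWR
fragment and a 2540-byte super-packet.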

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
diff --git a/drivers/net/8139cp.c b/drivers/net/8139cp.c
--- a/drivers/net/8139cp.c
+++ b/drivers/net/8139cp.c
@@ -792,7 +792,7 @@ static int cp_start_xmit (struct sk_buff
 	entry = cp->tx_head;
 	eor = (entry == (CP_TX_RING_SIZE - 1)) ? RingEnd : 0;
 	if (dev->features & NETIF_F_TSO)
-		mss = skb_shinfo(skb)->tso_size;
+		mss = skb_shinfo(skb)->gso_size;
 
 	if (skb_shinfo(skb)->nr_frags == 0) {
 		struct cp_desc *txd = &cp->tx_ring[entry];
diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -1640,7 +1640,7 @@ bnx2_tx_int(struct bnx2 *bp)
 		skb = tx_buf->skb;
 #ifdef BCM_TSO
 		/* partial BD completions possible with TSO packets */
-		if (skb_shinfo(skb)->tso_size) {
+		if (skb_shinfo(skb)->gso_size) {
 			u16 last_idx, last_ring_idx;
 
 			last_idx = sw_cons +
@@ -4428,7 +4428,7 @@ bnx2_start_xmit(struct sk_buff *skb, str
 			(TX_BD_FLAGS_VLAN_TAG | (vlan_tx_tag_get(skb) << 16));
 	}
 #ifdef BCM_TSO
-	if ((mss = skb_shinfo(skb)->tso_size) &&
+	if ((mss = skb_shinfo(skb)->gso_size) &&
 		(skb->len > (bp->dev->mtu + ETH_HLEN))) {
 		u32 tcp_opt_len, ip_tcp_len;
 
diff --git a/drivers/net/chelsio/sge.c b/drivers/net/chelsio/sge.c
--- a/drivers/net/chelsio/sge.c
+++ b/drivers/net/chelsio/sge.c
@@ -1418,7 +1418,7 @@ int t1_start_xmit(struct sk_buff *skb, s
 	struct cpl_tx_pkt *cpl;
 
 #ifdef NETIF_F_TSO
-	if (skb_shinfo(skb)->tso_size) {
+	if (skb_shinfo(skb)->gso_size) {
 		int eth_type;
 		struct cpl_tx_pkt_lso *hdr;
 
@@ -1433,7 +1433,7 @@ int t1_start_xmit(struct sk_buff *skb, s
 		hdr->ip_hdr_words = skb->nh.iph->ihl;
 		hdr->tcp_hdr_words = skb->h.th->doff;
 		hdr->eth_type_mss = htons(MK_ETH_TYPE_MSS(eth_type,
-						skb_shinfo(skb)->tso_size));
+						skb_shinfo(skb)->gso_size));
 		hdr->len = htonl(skb->len - sizeof(*hdr));
 		cpl = (struct cpl_tx_pkt *)hdr;
 		sge->stats.tx_lso_pkts++;
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -2394,7 +2394,7 @@ e1000_tso(struct e1000_adapter *adapter,
 	uint8_t ipcss, ipcso, tucss, tucso, hdr_len;
 	int err;
 
-	if (skb_shinfo(skb)->tso_size) {
+	if (skb_shinfo(skb)->gso_size) {
 		if (skb_header_cloned(skb)) {
 			err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
 			if (err)
@@ -2402,7 +2402,7 @@ e1000_tso(struct e1000_adapter *adapter,
 		}
 
 		hdr_len = ((skb->h.raw - skb->data) + (skb->h.th->doff << 2));
-		mss = skb_shinfo(skb)->tso_size;
+		mss = skb_shinfo(skb)->gso_size;
 		if (skb->protocol == htons(ETH_P_IP)) {
 			skb->nh.iph->tot_len = 0;
 			skb->nh.iph->check = 0;
@@ -2519,7 +2519,7 @@ e1000_tx_map(struct e1000_adapter *adapt
 		 * tso gets written back prematurely before the data is fully
 		 * DMA'd to the controller */
 		if (!skb->data_len && tx_ring->last_tx_tso &&
-		    !skb_shinfo(skb)->tso_size) {
+   !skb_shinfo(s