Please queue up the following networking fixes for 3.2.x-stable

Thanks!
From 0758483247371934d490bd9a3d059c3cb01c7885 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <[email protected]>
Date: Wed, 15 Feb 2012 20:43:11 +0000
Subject: [PATCH 01/13] atl1c: dont use highprio tx queue

[ Upstream commit 11aad99af6ef629ff3b05d1c9f0936589b204316 ]

This driver attempts to use two TX rings but lacks proper support:

1) IRQ handler only takes care of TX completion on first TX ring
2) the stop/start logic uses the legacy functions (for non multiqueue
drivers)

This means all packets with skb mark set to 1 are sent through the high
queue but are never cleaned, so the queue eventually fills and blocks the
device, triggering the infamous "NETDEV WATCHDOG" message.

Let's use a single TX ring to fix the problem; this driver is not a real
multiqueue one yet.

Minimal fix for stable kernels.

Reported-by: Thomas Meyer <[email protected]>
Tested-by: Thomas Meyer <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Jay Cliburn <[email protected]>
Cc: Chris Snook <[email protected]>
---
 drivers/net/ethernet/atheros/atl1c/atl1c_main.c |    4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
index 02c7ed8..eccdcff 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
@@ -2241,10 +2241,6 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff *skb,
                        dev_info(&adapter->pdev->dev, "tx locked\n");
                return NETDEV_TX_LOCKED;
        }
-       if (skb->mark == 0x01)
-               type = atl1c_trans_high;
-       else
-               type = atl1c_trans_normal;
 
        if (atl1c_tpd_avail(adapter, type) < tpd_req) {
                /* no enough descriptor, just stop queue */
-- 
1.7.6.5


From 204179f43e2f26e5e2ff459f3b089f60d755d08d Mon Sep 17 00:00:00 2001
From: Michel Machado <[email protected]>
Date: Tue, 21 Feb 2012 11:04:13 +0000
Subject: [PATCH 02/13] neighbour: Fixed race condition at tbl->nht

[ Upstream commit 84338a6c9dbb6ff3de4749864020f8f25d86fc81 ]

The race condition fixed here happens as follows:

1. While function neigh_periodic_work scans the neighbor hash table
pointed to by field tbl->nht, it unlocks and locks tbl->lock between
buckets in order to call cond_resched.

2. Assume that function neigh_periodic_work calls cond_resched, that is,
the lock tbl->lock is available, and function neigh_hash_grow runs.

3. Once function neigh_hash_grow finishes, and RCU calls
neigh_hash_free_rcu, the original struct neigh_hash_table that function
neigh_periodic_work was using doesn't exist anymore.

4. Once back at neigh_periodic_work, whenever the old struct
neigh_hash_table is accessed, things can go badly.

Signed-off-by: Michel Machado <[email protected]>
CC: "David S. Miller" <[email protected]>
CC: Eric Dumazet <[email protected]>
---
 net/core/neighbour.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)
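
Note, not part of the upstream commit: a minimal userspace sketch of the
pattern the fix applies, assuming a single thread in which a helper stands
in for "neigh_hash_grow ran while the lock was dropped". The names
hash_tbl, table, table_grow_once, table_scan and scan_demo are
hypothetical, and the old table is freed immediately instead of after an
RCU grace period.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for struct neigh_hash_table and
 * struct neigh_table; none of these names exist in the kernel.
 */
struct hash_tbl {
	unsigned int nbuckets;
	int *buckets;
};

struct table {
	struct hash_tbl *nht;	/* current hash table, may be replaced */
	int grown;		/* set once table_grow_once() has run */
};

static struct hash_tbl *hash_tbl_alloc(unsigned int nbuckets)
{
	struct hash_tbl *h = malloc(sizeof(*h));

	if (!h)
		return NULL;
	h->buckets = calloc(nbuckets, sizeof(*h->buckets));
	if (!h->buckets) {
		free(h);
		return NULL;
	}
	h->nbuckets = nbuckets;
	return h;
}

/* Plays the role of a grow operation running while the lock is dropped:
 * installs a table twice as large and frees the old one right away.
 */
static void table_grow_once(struct table *t)
{
	struct hash_tbl *old = t->nht, *bigger;

	if (t->grown)
		return;
	bigger = hash_tbl_alloc(old->nbuckets * 2);
	if (!bigger)
		return;
	t->nht = bigger;
	t->grown = 1;
	free(old->buckets);
	free(old);
}

/* Walk all buckets, "dropping the lock" after each one.
 * Returns the number of buckets visited.
 */
static unsigned int table_scan(struct table *t)
{
	struct hash_tbl *nht = t->nht;
	unsigned int i, visited = 0;

	for (i = 0; i < nht->nbuckets; i++) {
		nht->buckets[i]++;	/* per-bucket work */
		visited++;

		/* unlock; reschedule; lock again */
		table_grow_once(t);

		/* The fix: reload the pointer, the old table may be gone. */
		nht = t->nht;
	}
	return visited;
}

/* Build a 4-bucket table, scan it, and report how many buckets were
 * visited (0 on allocation failure).
 */
static unsigned int scan_demo(void)
{
	struct table t = { hash_tbl_alloc(4), 0 };
	unsigned int visited;

	if (!t.nht)
		return 0;
	visited = table_scan(&t);
	free(t.nht->buckets);
	free(t.nht);
	return visited;
}
```

Without the reload inside the loop, the second iteration would touch the
freed table, which is the use-after-free described in step 4 above.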

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 5ac07d3..7aafaed 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -802,6 +802,8 @@ next_elt:
                write_unlock_bh(&tbl->lock);
                cond_resched();
                write_lock_bh(&tbl->lock);
+               nht = rcu_dereference_protected(tbl->nht,
+                                               lockdep_is_held(&tbl->lock));
        }
        /* Cycle through all hash buckets every base_reachable_time/2 ticks.
         * ARP entry timeouts range from 1/2 base_reachable_time to 3/2
-- 
1.7.6.5


From 24993865a1421d820d8399fd0681c8f15980cd8b Mon Sep 17 00:00:00 2001
From: Eric Dumazet <[email protected]>
Date: Thu, 23 Feb 2012 10:55:02 +0000
Subject: [PATCH 03/13] ipsec: be careful of non existing mac headers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

[ Upstream commit 03606895cd98c0a628b17324fd7b5ff15db7e3cd ]

Niccolo Belli reported ipsec crashes when we handle a frame without a
mac header (ATM in his case).

Before copying mac header, better make sure it is present.

Bugzilla reference:  https://bugzilla.kernel.org/show_bug.cgi?id=42809

Reported-by: Niccolò Belli <[email protected]>
Tested-by: Niccolò Belli <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
---
 include/linux/skbuff.h       |   10 ++++++++++
 net/ipv4/xfrm4_mode_beet.c   |    5 +----
 net/ipv4/xfrm4_mode_tunnel.c |    6 ++----
 net/ipv6/xfrm6_mode_beet.c   |    6 +-----
 net/ipv6/xfrm6_mode_tunnel.c |    6 ++----
 5 files changed, 16 insertions(+), 17 deletions(-)
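
Note, not part of the upstream commit: a userspace sketch of what the new
helper does, assuming a heavily simplified packet structure. struct pkt,
PKT_MAC_UNSET, pkt_mac_header_rebuild and rebuild_demo are hypothetical
names used only for illustration; the real helper operates on
struct sk_buff.

```c
#include <stddef.h>
#include <string.h>

#define PKT_MAC_UNSET ((size_t)-1)	/* hypothetical "no mac header" marker */

/* Hypothetical, heavily simplified stand-in for struct sk_buff. */
struct pkt {
	unsigned char buf[64];
	size_t data;	/* offset of the current payload start */
	size_t mac;	/* offset of the mac header, or PKT_MAC_UNSET */
	size_t mac_len;	/* length of the mac header in bytes */
};

/* Same shape as skb_mac_header_rebuild(): move the mac header so that it
 * ends right where the payload starts, but only if there is one.
 */
static void pkt_mac_header_rebuild(struct pkt *p)
{
	if (p->mac != PKT_MAC_UNSET) {
		size_t old_mac = p->mac;

		p->mac = p->data - p->mac_len;
		/* Source and destination may overlap, hence memmove(). */
		memmove(p->buf + p->mac, p->buf + old_mac, p->mac_len);
	}
}

/* Decapsulation demo: a 14-byte mac header at offset 0, a 20-byte outer
 * header that has been pulled, payload now at offset 34.  Returns the new
 * mac header offset, or PKT_MAC_UNSET if the packet never had one.
 */
static size_t rebuild_demo(int has_mac)
{
	struct pkt p;

	memset(&p, 0, sizeof(p));
	memset(p.buf, 0xaa, 14);	/* recognisable mac header bytes */
	p.data = 34;
	p.mac = has_mac ? 0 : PKT_MAC_UNSET;
	p.mac_len = has_mac ? 14 : 0;

	pkt_mac_header_rebuild(&p);

	if (has_mac && (p.buf[20] != 0xaa || p.buf[33] != 0xaa))
		return PKT_MAC_UNSET;	/* copy went wrong */
	return p.mac;
}
```

The guard is the whole point: the replaced open-coded sequences did the
move unconditionally, including for frames that never had a mac header.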

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index fe86488..6cf8b53 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1453,6 +1453,16 @@ static inline void skb_set_mac_header(struct sk_buff *skb, const int offset)
 }
 #endif /* NET_SKBUFF_DATA_USES_OFFSET */
 
+static inline void skb_mac_header_rebuild(struct sk_buff *skb)
+{
+       if (skb_mac_header_was_set(skb)) {
+               const unsigned char *old_mac = skb_mac_header(skb);
+
+               skb_set_mac_header(skb, -skb->mac_len);
+               memmove(skb_mac_header(skb), old_mac, skb->mac_len);
+       }
+}
+
 static inline int skb_checksum_start_offset(const struct sk_buff *skb)
 {
        return skb->csum_start - skb_headroom(skb);
diff --git a/net/ipv4/xfrm4_mode_beet.c b/net/ipv4/xfrm4_mode_beet.c
index 6341818..e3db3f9 100644
--- a/net/ipv4/xfrm4_mode_beet.c
+++ b/net/ipv4/xfrm4_mode_beet.c
@@ -110,10 +110,7 @@ static int xfrm4_beet_input(struct xfrm_state *x, struct sk_buff *skb)
 
        skb_push(skb, sizeof(*iph));
        skb_reset_network_header(skb);
-
-       memmove(skb->data - skb->mac_len, skb_mac_header(skb),
-               skb->mac_len);
-       skb_set_mac_header(skb, -skb->mac_len);
+       skb_mac_header_rebuild(skb);
 
        xfrm4_beet_make_header(skb);
 
diff --git a/net/ipv4/xfrm4_mode_tunnel.c b/net/ipv4/xfrm4_mode_tunnel.c
index 534972e..ed4bf11 100644
--- a/net/ipv4/xfrm4_mode_tunnel.c
+++ b/net/ipv4/xfrm4_mode_tunnel.c
@@ -66,7 +66,6 @@ static int xfrm4_mode_tunnel_output(struct xfrm_state *x, struct sk_buff *skb)
 
 static int xfrm4_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
 {
-       const unsigned char *old_mac;
        int err = -EINVAL;
 
        if (XFRM_MODE_SKB_CB(skb)->protocol != IPPROTO_IPIP)
@@ -84,10 +83,9 @@ static int xfrm4_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
        if (!(x->props.flags & XFRM_STATE_NOECN))
                ipip_ecn_decapsulate(skb);
 
-       old_mac = skb_mac_header(skb);
-       skb_set_mac_header(skb, -skb->mac_len);
-       memmove(skb_mac_header(skb), old_mac, skb->mac_len);
        skb_reset_network_header(skb);
+       skb_mac_header_rebuild(skb);
+
        err = 0;
 
 out:
diff --git a/net/ipv6/xfrm6_mode_beet.c b/net/ipv6/xfrm6_mode_beet.c
index 3437d7d..f37cba9 100644
--- a/net/ipv6/xfrm6_mode_beet.c
+++ b/net/ipv6/xfrm6_mode_beet.c
@@ -80,7 +80,6 @@ static int xfrm6_beet_output(struct xfrm_state *x, struct sk_buff *skb)
 static int xfrm6_beet_input(struct xfrm_state *x, struct sk_buff *skb)
 {
        struct ipv6hdr *ip6h;
-       const unsigned char *old_mac;
        int size = sizeof(struct ipv6hdr);
        int err;
 
@@ -90,10 +89,7 @@ static int xfrm6_beet_input(struct xfrm_state *x, struct sk_buff *skb)
 
        __skb_push(skb, size);
        skb_reset_network_header(skb);
-
-       old_mac = skb_mac_header(skb);
-       skb_set_mac_header(skb, -skb->mac_len);
-       memmove(skb_mac_header(skb), old_mac, skb->mac_len);
+       skb_mac_header_rebuild(skb);
 
        xfrm6_beet_make_header(skb);
 
diff --git a/net/ipv6/xfrm6_mode_tunnel.c b/net/ipv6/xfrm6_mode_tunnel.c
index 4d6edff..23ecd68 100644
--- a/net/ipv6/xfrm6_mode_tunnel.c
+++ b/net/ipv6/xfrm6_mode_tunnel.c
@@ -63,7 +63,6 @@ static int xfrm6_mode_tunnel_output(struct xfrm_state *x, struct sk_buff *skb)
 static int xfrm6_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
 {
        int err = -EINVAL;
-       const unsigned char *old_mac;
 
        if (XFRM_MODE_SKB_CB(skb)->protocol != IPPROTO_IPV6)
                goto out;
@@ -80,10 +79,9 @@ static int xfrm6_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
        if (!(x->props.flags & XFRM_STATE_NOECN))
                ipip6_ecn_decapsulate(skb);
 
-       old_mac = skb_mac_header(skb);
-       skb_set_mac_header(skb, -skb->mac_len);
-       memmove(skb_mac_header(skb), old_mac, skb->mac_len);
        skb_reset_network_header(skb);
+       skb_mac_header_rebuild(skb);
+
        err = 0;
 
 out:
-- 
1.7.6.5


From 4bf8a7ec2262ec2fae6dadc5c0b3adb7b518c1ca Mon Sep 17 00:00:00 2001
From: Ben McKeegan <[email protected]>
Date: Fri, 24 Feb 2012 06:33:56 +0000
Subject: [PATCH 04/13] ppp: fix 'ppp_mp_reconstruct bad seq' errors

[ Upstream commit 8a49ad6e89feb5015e77ce6efeb2678947117e20 ]

This patch fixes a (mostly cosmetic) bug introduced by the patch
'ppp: Use SKB queue abstraction interfaces in fragment processing'
found here: http://www.spinics.net/lists/netdev/msg153312.html

The above patch rewrote and moved the code responsible for cleaning
up discarded fragments but the new code does not catch every case
where this is necessary.  This results in some discarded fragments
remaining in the queue, and triggering a 'bad seq' error on the
subsequent call to ppp_mp_reconstruct.  Fragments are discarded
whenever other fragments of the same frame have been lost.
This can generate a lot of unwanted and misleading log messages.

This patch also adds additional detail to the debug logging to
make it clearer which fragments were lost and which other fragments
were discarded as a result of losses. (Run pppd with 'kdebug 1'
option to enable debug logging.)

Signed-off-by: Ben McKeegan <[email protected]>
---
 drivers/net/ppp/ppp_generic.c |   23 +++++++++++++++++++++++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index edfa15d..486b404 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -2024,14 +2024,22 @@ ppp_mp_reconstruct(struct ppp *ppp)
                        continue;
                }
                if (PPP_MP_CB(p)->sequence != seq) {
+                       u32 oldseq;
                        /* Fragment `seq' is missing.  If it is after
                           minseq, it might arrive later, so stop here. */
                        if (seq_after(seq, minseq))
                                break;
                        /* Fragment `seq' is lost, keep going. */
                        lost = 1;
+                       oldseq = seq;
                        seq = seq_before(minseq, PPP_MP_CB(p)->sequence)?
                                minseq + 1: PPP_MP_CB(p)->sequence;
+
+                       if (ppp->debug & 1)
+                               netdev_printk(KERN_DEBUG, ppp->dev,
+                                             "lost frag %u..%u\n",
+                                             oldseq, seq-1);
+
                        goto again;
                }
 
@@ -2076,6 +2084,10 @@ ppp_mp_reconstruct(struct ppp *ppp)
                        struct sk_buff *tmp2;
 
                        skb_queue_reverse_walk_from_safe(list, p, tmp2) {
+                               if (ppp->debug & 1)
+                                       netdev_printk(KERN_DEBUG, ppp->dev,
+                                                     "discarding frag %u\n",
+                                                     PPP_MP_CB(p)->sequence);
                                __skb_unlink(p, list);
                                kfree_skb(p);
                        }
@@ -2091,6 +2103,17 @@ ppp_mp_reconstruct(struct ppp *ppp)
                /* If we have discarded any fragments,
                   signal a receive error. */
                if (PPP_MP_CB(head)->sequence != ppp->nextseq) {
+                       skb_queue_walk_safe(list, p, tmp) {
+                               if (p == head)
+                                       break;
+                               if (ppp->debug & 1)
+                                       netdev_printk(KERN_DEBUG, ppp->dev,
+                                                     "discarding frag %u\n",
+                                                     PPP_MP_CB(p)->sequence);
+                               __skb_unlink(p, list);
+                               kfree_skb(p);
+                       }
+
                        if (ppp->debug & 1)
                                netdev_printk(KERN_DEBUG, ppp->dev,
                                              "  missed pkts %u..%u\n",
-- 
1.7.6.5


From f63d8be193626abd6c21892cfcce30d707c67d43 Mon Sep 17 00:00:00 2001
From: Ben Hutchings <[email protected]>
Date: Fri, 24 Feb 2012 15:12:34 +0000
Subject: [PATCH 05/13] sfc: Fix assignment of ip_summed for pre-allocated
 skbs

[ Upstream commit ff3bc1e7527504a93710535611b2f812f3bb89bf ]

When pre-allocating skbs for received packets, we set ip_summed =
CHECKSUM_UNNECESSARY.  We used to change it back to CHECKSUM_NONE when
the received packet had an incorrect checksum or unhandled protocol.

Commit bc8acf2c8c3e43fcc192762a9f964b3e9a17748b ('drivers/net: avoid
some skb->ip_summed initializations') mistakenly replaced the latter
assignment with a DEBUG-only assertion that ip_summed ==
CHECKSUM_NONE.  This assertion is always false, but it seems no-one
has exercised this code path in a DEBUG build.

Fix this by moving our assignment of CHECKSUM_UNNECESSARY into
efx_rx_packet_gro().

Signed-off-by: Ben Hutchings <[email protected]>
---
 drivers/net/ethernet/sfc/rx.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index 752d521..5ef4cc0 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -156,11 +156,10 @@ static int efx_init_rx_buffers_skb(struct efx_rx_queue *rx_queue)
                if (unlikely(!skb))
                        return -ENOMEM;
 
-               /* Adjust the SKB for padding and checksum */
+               /* Adjust the SKB for padding */
                skb_reserve(skb, NET_IP_ALIGN);
                rx_buf->len = skb_len - NET_IP_ALIGN;
                rx_buf->is_page = false;
-               skb->ip_summed = CHECKSUM_UNNECESSARY;
 
                rx_buf->dma_addr = pci_map_single(efx->pci_dev,
                                                  skb->data, rx_buf->len,
@@ -499,6 +498,7 @@ static void efx_rx_packet_gro(struct efx_channel *channel,
 
                EFX_BUG_ON_PARANOID(!checksummed);
                rx_buf->u.skb = NULL;
+               skb->ip_summed = CHECKSUM_UNNECESSARY;
 
                gro_result = napi_gro_receive(napi, skb);
        }
-- 
1.7.6.5


From 55434881da04db44d265eb832063f4c45f1de461 Mon Sep 17 00:00:00 2001
From: Neal Cardwell <[email protected]>
Date: Sun, 26 Feb 2012 10:06:19 +0000
Subject: [PATCH 06/13] tcp: fix false reordering signal in tcp_shifted_skb

[ Upstream commit 4c90d3b30334833450ccbb02f452d4972a3c3c3f ]

When tcp_shifted_skb() shifts bytes from the skb that is currently
pointed to by 'highest_sack' then the increment of
TCP_SKB_CB(skb)->seq implicitly advances tcp_highest_sack_seq(). This
implicit advancement, combined with the recent fix to pass the correct
SACKed range into tcp_sacktag_one(), caused tcp_sacktag_one() to think
that the newly SACKed range was before the tcp_highest_sack_seq(),
leading to a call to tcp_update_reordering() with a degree of
reordering matching the size of the newly SACKed range (typically just
1 packet, which is a NOP, but potentially larger).

This commit fixes this by simply calling tcp_sacktag_one() before the
TCP_SKB_CB(skb)->seq advancement that can advance our notion of the
highest SACKed sequence.

Correspondingly, we can simplify the code a little now that
tcp_shifted_skb() should update the lost_cnt_hint in all cases where
skb == tp->lost_skb_hint.

Signed-off-by: Neal Cardwell <[email protected]>
Acked-by: Yuchung Cheng <[email protected]>
---
 net/ipv4/tcp_input.c |   18 ++++++++++--------
 1 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 53113b9..9e32fca 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1406,8 +1406,16 @@ static int tcp_shifted_skb(struct sock *sk, struct sk_buff *skb,
 
        BUG_ON(!pcount);
 
-       /* Adjust hint for FACK. Non-FACK is handled in tcp_sacktag_one(). */
-       if (tcp_is_fack(tp) && (skb == tp->lost_skb_hint))
+       /* Adjust counters and hints for the newly sacked sequence
+        * range but discard the return value since prev is already
+        * marked. We must tag the range first because the seq
+        * advancement below implicitly advances
+        * tcp_highest_sack_seq() when skb is highest_sack.
+        */
+       tcp_sacktag_one(sk, state, TCP_SKB_CB(skb)->sacked,
+                       start_seq, end_seq, dup_sack, pcount);
+
+       if (skb == tp->lost_skb_hint)
                tp->lost_cnt_hint += pcount;
 
        TCP_SKB_CB(prev)->end_seq += shifted;
@@ -1433,12 +1441,6 @@ static int tcp_shifted_skb(struct sock *sk, struct sk_buff *skb,
                skb_shinfo(skb)->gso_type = 0;
        }
 
-       /* Adjust counters and hints for the newly sacked sequence range but
-        * discard the return value since prev is already marked.
-        */
-       tcp_sacktag_one(sk, state, TCP_SKB_CB(skb)->sacked,
-                       start_seq, end_seq, dup_sack, pcount);
-
        /* Difference in this won't matter, both ACKed by the same cumul. ACK */
        TCP_SKB_CB(prev)->sacked |= (TCP_SKB_CB(skb)->sacked & TCPCB_EVER_RETRANS);
 
-- 
1.7.6.5


From 2dd9e727f451f8aa0a9e360dbdc5559c18f17f93 Mon Sep 17 00:00:00 2001
From: Shreyas Bhatewara <[email protected]>
Date: Tue, 28 Feb 2012 22:17:38 +0000
Subject: [PATCH 07/13] vmxnet3: Fix transport header size

[ Upstream commit efead8710aad9e384730ecf25eae0287878840d7 ]

Fix the transport header size for UDP packets: use sizeof(struct udphdr)
instead of sizeof(struct tcphdr).

Signed-off-by: Shreyas N Bhatewara <[email protected]>
---
 drivers/net/vmxnet3/vmxnet3_drv.c |    7 +------
 drivers/net/vmxnet3/vmxnet3_int.h |    4 ++--
 2 files changed, 3 insertions(+), 8 deletions(-)
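
Note, not part of the upstream commit: a small sketch of the arithmetic
behind the two changes below, assuming the usual fixed UDP header of
8 bytes and an option-less TCP header of 20 bytes. pack_version and the
two enum constants are hypothetical names; the driver itself uses
sizeof(struct udphdr) and hard-coded version macros.

```c
#include <stdint.h>

/* Pack "a.b.c.d" into one 32-bit value, one byte per component, which is
 * the layout the VMXNET3_DRIVER_VERSION_NUM comment describes.
 */
static uint32_t pack_version(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}

enum {
	UDP_HDR_LEN = 8,	/* fixed UDP header size */
	TCP_MIN_HDR_LEN = 20,	/* TCP header without options */
};
```

With this layout "1.1.29.0" packs to 0x01011D00 (29 is 0x1D) and the old
"1.1.18.0" to 0x01011200, matching the macro values in the diff. Using the
TCP size for UDP overstated the transport header by 12 bytes.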

diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index d96bfb1..d426261 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -830,13 +830,8 @@ vmxnet3_parse_and_copy_hdr(struct sk_buff *skb, struct vmxnet3_tx_queue *tq,
                                        ctx->l4_hdr_size = ((struct tcphdr *)
                                           skb_transport_header(skb))->doff * 4;
                                else if (iph->protocol == IPPROTO_UDP)
-                                       /*
-                                        * Use tcp header size so that bytes to
-                                        * be copied are more than required by
-                                        * the device.
-                                        */
                                        ctx->l4_hdr_size =
-                                                       sizeof(struct tcphdr);
+                                                       sizeof(struct udphdr);
                                else
                                        ctx->l4_hdr_size = 0;
                        } else {
diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h
index b18eac1..8df921b 100644
--- a/drivers/net/vmxnet3/vmxnet3_int.h
+++ b/drivers/net/vmxnet3/vmxnet3_int.h
@@ -70,10 +70,10 @@
 /*
  * Version numbers
  */
-#define VMXNET3_DRIVER_VERSION_STRING   "1.1.18.0-k"
+#define VMXNET3_DRIVER_VERSION_STRING   "1.1.29.0-k"
 
 /* a 32-bit int, each byte encode a verion number in VMXNET3_DRIVER_VERSION */
-#define VMXNET3_DRIVER_VERSION_NUM      0x01011200
+#define VMXNET3_DRIVER_VERSION_NUM      0x01011D00
 
 #if defined(CONFIG_PCI_MSI)
        /* RSS only makes sense if MSI-X is supported. */
-- 
1.7.6.5


From da6962a341ebf847646a06092b680979b660c9b1 Mon Sep 17 00:00:00 2001
From: stephen hemminger <[email protected]>
Date: Fri, 2 Mar 2012 13:38:56 +0000
Subject: [PATCH 08/13] packetengines: fix config default

[ Upstream commit 3f2010b2ad3d66d5291497c9b274315e7b807ecd ]

As part of the big network driver reorg, each vendor directory defaults to
yes, so that older configs can migrate correctly. Looks like this one
got missed.

Signed-off-by: Stephen Hemminger <[email protected]>
---
 drivers/net/ethernet/packetengines/Kconfig |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/packetengines/Kconfig b/drivers/net/ethernet/packetengines/Kconfig
index b97132d..8f29feb 100644
--- a/drivers/net/ethernet/packetengines/Kconfig
+++ b/drivers/net/ethernet/packetengines/Kconfig
@@ -4,6 +4,7 @@
 
 config NET_PACKET_ENGINE
        bool "Packet Engine devices"
+       default y
        depends on PCI
        ---help---
          If you have a network (Ethernet) card belonging to this class, say Y
-- 
1.7.6.5


From c8d7aa2ab97ce7536ff433f565dc3d7094ba35f3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?fran=C3=A7ois=20romieu?= <[email protected]>
Date: Fri, 2 Mar 2012 04:43:14 +0000
Subject: [PATCH 09/13] r8169: corrupted IP fragments fix for large mtu.

[ Upstream commit 9c5028e9da1255dd2b99762d8627b88b29f68cce ]

Noticed with the 8168d (-vb-gr, aka RTL_GIGA_MAC_VER_26).

ConfigX registers should only be written while the Config9346 lock
is held.

Signed-off-by: Francois Romieu <[email protected]>
Reported-by: Nick Bowler <[email protected]>
Cc: Hayes Wang <[email protected]>
---
 drivers/net/ethernet/realtek/r8169.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index c8f47f1..0cf2351 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -3781,12 +3781,20 @@ static void rtl8169_init_ring_indexes(struct rtl8169_private *tp)
 
 static void rtl_hw_jumbo_enable(struct rtl8169_private *tp)
 {
+       void __iomem *ioaddr = tp->mmio_addr;
+
+       RTL_W8(Cfg9346, Cfg9346_Unlock);
        rtl_generic_op(tp, tp->jumbo_ops.enable);
+       RTL_W8(Cfg9346, Cfg9346_Lock);
 }
 
 static void rtl_hw_jumbo_disable(struct rtl8169_private *tp)
 {
+       void __iomem *ioaddr = tp->mmio_addr;
+
+       RTL_W8(Cfg9346, Cfg9346_Unlock);
        rtl_generic_op(tp, tp->jumbo_ops.disable);
+       RTL_W8(Cfg9346, Cfg9346_Lock);
 }
 
 static void r8168c_hw_jumbo_enable(struct rtl8169_private *tp)
-- 
1.7.6.5


From 197dd635524b96cc1c5bff549a27496dae3613e5 Mon Sep 17 00:00:00 2001
From: Neal Cardwell <[email protected]>
Date: Fri, 2 Mar 2012 21:36:51 +0000
Subject: [PATCH 10/13] tcp: don't fragment SACKed skbs in
 tcp_mark_head_lost()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

[ Upstream commit c0638c247f559e1a16ee79e54df14bca2cb679ea ]

In tcp_mark_head_lost() we should not attempt to fragment a SACKed skb
to mark the first portion as lost. This is for two primary reasons:

(1) tcp_shifted_skb() coalesces adjacent regions of SACKed skbs. When
doing this, it preserves the sum of their packet counts in order to
reflect the real-world dynamics on the wire. But given that skbs can
have remainders that do not align to MSS boundaries, this packet count
preservation means that for SACKed skbs there is not necessarily a
direct linear relationship between tcp_skb_pcount(skb) and
skb->len. Thus tcp_mark_head_lost()'s previous attempts to fragment
off and mark as lost a prefix of length (packets - oldcnt)*mss from
SACKed skbs were leading to occasional failures of the WARN_ON(len >
skb->len) in tcp_fragment() (which used to be a BUG_ON(); see the
recent "crash in tcp_fragment" thread on netdev).

(2) there is no real point in fragmenting off part of a SACKed skb and
calling tcp_skb_mark_lost() on it, since tcp_skb_mark_lost() is a NOP
for SACKed skbs.

Signed-off-by: Neal Cardwell <[email protected]>
Acked-by: Ilpo Järvinen <[email protected]>
Acked-by: Yuchung Cheng <[email protected]>
Acked-by: Nandita Dukkipati <[email protected]>
---
 net/ipv4/tcp_input.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
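
Note, not part of the upstream commit: a worked example of reason (1),
assuming a freshly segmented skb gets a packet count of len/mss rounded
up and that coalescing adds the packet counts. The helper names
pcount_for_len and frag_len_exceeds_skb are hypothetical, as are the
numbers.

```c
/* Packet count of a freshly segmented skb: len / mss, rounded up. */
static unsigned int pcount_for_len(unsigned int len, unsigned int mss)
{
	return (len + mss - 1) / mss;
}

/* Would fragmenting off npkts packets' worth of bytes ask for more
 * bytes than the skb holds?  This is the condition the WARN_ON in
 * tcp_fragment() checks.
 */
static int frag_len_exceeds_skb(unsigned int npkts, unsigned int mss,
				unsigned int skb_len)
{
	return npkts * mss > skb_len;
}
```

Take mss = 1000 and two SACKed skbs of 1200 bytes each. Each counts as
2 packets, so the coalesced skb carries 2400 bytes with a preserved count
of 4, while 2400 bytes on their own would count as 3. Asking for a
3-packet prefix then means 3000 bytes from a 2400-byte skb.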

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9e32fca..1c774af 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2571,6 +2571,7 @@ static void tcp_mark_head_lost(struct sock *sk, int packets, int mark_head)
 
                if (cnt > packets) {
                        if ((tcp_is_sack(tp) && !tcp_is_fack(tp)) ||
+                           (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) ||
                            (oldcnt >= packets))
                                break;
 
-- 
1.7.6.5


From 9f7257bc8234cb34c0075131e993cbe1f2339e7c Mon Sep 17 00:00:00 2001
From: Ulrich Weber <[email protected]>
Date: Mon, 5 Mar 2012 04:52:44 +0000
Subject: [PATCH 11/13] bridge: check return value of ipv6_dev_get_saddr()

[ Upstream commit d1d81d4c3dd886d5fa25a2c4fa1e39cb89613712 ]

Otherwise the source IPv6 address of the ICMPV6_MGM_QUERY packet
might be random junk if IPv6 is disabled on the interface or the
link-local address is not yet ready (DAD).

Signed-off-by: Ulrich Weber <[email protected]>
---
 net/bridge/br_multicast.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index a5f4e57..8eb6b15 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -446,8 +446,11 @@ static struct sk_buff *br_ip6_multicast_alloc_query(struct net_bridge *br,
        ip6h->nexthdr = IPPROTO_HOPOPTS;
        ip6h->hop_limit = 1;
        ipv6_addr_set(&ip6h->daddr, htonl(0xff020000), 0, 0, htonl(1));
-       ipv6_dev_get_saddr(dev_net(br->dev), br->dev, &ip6h->daddr, 0,
-                          &ip6h->saddr);
+       if (ipv6_dev_get_saddr(dev_net(br->dev), br->dev, &ip6h->daddr, 0,
+                              &ip6h->saddr)) {
+               kfree_skb(skb);
+               return NULL;
+       }
        ipv6_eth_mc_map(&ip6h->daddr, eth->h_dest);
 
        hopopt = (u8 *)(ip6h + 1);
-- 
1.7.6.5


From 259dd54f2324e8e3b03467c32c27ffd84e708a67 Mon Sep 17 00:00:00 2001
From: Neal Cardwell <[email protected]>
Date: Mon, 5 Mar 2012 19:35:04 +0000
Subject: [PATCH 12/13] tcp: fix tcp_shift_skb_data() to not shift SACKed data
 below snd_una

[ Upstream commit 4648dc97af9d496218a05353b0e442b3dfa6aaab ]

This commit fixes tcp_shift_skb_data() so that it does not shift
SACKed data below snd_una.

This fixes an issue whose symptoms exactly match reports showing
tp->sacked_out going negative since 3.3.0-rc4 (see "WARNING: at
net/ipv4/tcp_input.c:3418" thread on netdev).

Since 2008 (832d11c5cd076abc0aa1eaf7be96c81d1a59ce41)
tcp_shift_skb_data() had been shifting SACKed ranges that were below
snd_una. It checked that the *end* of the skb it was about to shift
from was above snd_una, but did not check that the end of the actual
shifted range was above snd_una; this commit adds that check.

Shifting SACKed ranges below snd_una is problematic because for such
ranges tcp_sacktag_one() short-circuits: it does not declare anything
as SACKed and does not increase sacked_out.

Before the fixes in commits cc9a672ee522d4805495b98680f4a3db5d0a0af9
and daef52bab1fd26e24e8e9578f8fb33ba1d0cb412, shifting SACKed ranges
below snd_una happened to work because tcp_shifted_skb() was always
(incorrectly) passing in to tcp_sacktag_one() an skb whose end_seq
tcp_shift_skb_data() had already guaranteed was beyond snd_una. Hence
tcp_sacktag_one() never short-circuited and always increased
tp->sacked_out in this case.

After those two fixes, my testing has verified that shifting SACKed
ranges below snd_una could cause tp->sacked_out to go negative with
the following sequence of events:

(1) tcp_shift_skb_data() sees an skb whose end_seq is beyond snd_una,
    then shifts a prefix of that skb that is below snd_una

(2) tcp_shifted_skb() increments the packet count of the
    already-SACKed prev sk_buff

(3) tcp_sacktag_one() sees the end of the new SACKed range is below
    snd_una, so it short-circuits and doesn't increase tp->sacked_out

(4) tcp_clean_rtx_queue() sees the SACKed skb has been ACKed,
    decrements tp->sacked_out by this "inflated" pcount that was
    missing a matching increase in tp->sacked_out, and hence
    tp->sacked_out underflows to a u32 like 0xFFFFFFFF, which cast
    to s32 is negative.

(5) this leads to the warnings seen in the recent "WARNING: at
    net/ipv4/tcp_input.c:3418" thread on the netdev list; e.g.:
    tcp_input.c:3418  WARN_ON((int)tp->sacked_out < 0);

More generally, I think this bug can be tickled in some cases where
two or more ACKs from the receiver are lost and then a DSACK arrives
that is immediately above an existing SACKed skb in the write queue.

This fix changes tcp_shift_skb_data() to abort this sequence at step
(1) in the scenario above by noticing that the bytes are below snd_una
and not shifting them.

Signed-off-by: Neal Cardwell <[email protected]>
---
 net/ipv4/tcp_input.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)
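
Note, not part of the upstream commit: a userspace sketch of the new
check and of the underflow it prevents. seq_after mirrors the idea of the
kernel's after() macro; range_below_snd_una and sacked_out_after_dec are
hypothetical helpers. The conversions to int32_t rely on two's complement
behaviour, which common compilers provide.

```c
#include <stdint.h>

/* Wraparound-safe "seq1 is after seq2" for 32-bit sequence numbers. */
static int seq_after(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq2 - seq1) < 0;
}

/* The condition added to tcp_shift_skb_data(): the end of the range to
 * shift (seq + len) is not above snd_una, so do not shift it.
 */
static int range_below_snd_una(uint32_t seq, uint32_t len, uint32_t snd_una)
{
	return !seq_after(seq + len, snd_una);
}

/* A decrement of an unsigned counter with no matching increment, viewed
 * the way the WARN_ON views tp->sacked_out, as a signed value.
 */
static int32_t sacked_out_after_dec(uint32_t sacked_out, uint32_t pcount)
{
	sacked_out -= pcount;
	return (int32_t)sacked_out;
}
```

For example, with snd_una = 2000 a range starting at 1000 with len 500
ends at 1500 and is rejected, while len 1500 ends at 2500 and is shifted
as before. Decrementing a sacked_out of 0 by 1 yields 0xFFFFFFFF, which
reads as -1 once cast.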

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 1c774af..e4d1e4a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1588,6 +1588,10 @@ static struct sk_buff *tcp_shift_skb_data(struct sock *sk, struct sk_buff *skb,
                }
        }
 
+       /* tcp_sacktag_one() won't SACK-tag ranges below snd_una */
+       if (!after(TCP_SKB_CB(skb)->seq + len, tp->snd_una))
+               goto fallback;
+
        if (!skb_shift(prev, skb, len))
                goto fallback;
        if (!tcp_shifted_skb(sk, skb, state, pcount, len, mss, dup_sack))
-- 
1.7.6.5


From 1f8356cc3f041f2599f53750e46eabcfef63cffb Mon Sep 17 00:00:00 2001
From: Li Wei <[email protected]>
Date: Mon, 5 Mar 2012 14:45:17 +0000
Subject: [PATCH 13/13] IPv6: Fix not join all-router mcast group when
 forwarding set.

[ Upstream commit d6ddef9e641d1229d4ec841dc75ae703171c3e92 ]

When forwarding is set and a new net device is registered,
we need to add this device to the all-router mcast group.

Signed-off-by: Li Wei <[email protected]>
---
 net/ipv6/addrconf.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 836c4ea..a5521c5 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -434,6 +434,10 @@ static struct inet6_dev * ipv6_add_dev(struct net_device *dev)
        /* Join all-node multicast group */
        ipv6_dev_mc_inc(dev, &in6addr_linklocal_allnodes);
 
+       /* Join all-router multicast group if forwarding is set */
+       if (ndev->cnf.forwarding && dev && (dev->flags & IFF_MULTICAST))
+               ipv6_dev_mc_inc(dev, &in6addr_linklocal_allrouters);
+
        return ndev;
 }
 
-- 
1.7.6.5
