[stable] [PATCHES] Networking fixes for 2.6.39-stable

David Miller Fri, 27 May 2011 13:23:57 -0700

Please queue up the following networking bug fixes for 2.6.39-stable.

Thanks!

>From 44e7464f931d31f74815c59520f830fc7ee79fbb Mon Sep 17 00:00:00 2001
From: Patrick McHardy <[email protected]>
Date: Mon, 16 May 2011 14:42:26 +0200
Subject: [PATCH 01/17] netfilter: nf_ct_sip: validate Content-Length in TCP
 SIP messages


[ Upstream commit 274ea0e2a4cdf18110e5931b8ecbfef6353e5293 ]

Verify that the message length of a single SIP message, which is calculated
based on the Content-Length field contained in the SIP message, does not
exceed the packet boundaries.

Signed-off-by: Patrick McHardy <[email protected]>
---
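Note (not part of the commit): the bounds check described above can be
sketched in userspace as follows. This is a minimal illustration, not the
kernel code; check_sip_msglen() and the verdict enum are hypothetical
names, and the comparison is written as a subtraction so that an
oversized Content-Length cannot wrap the arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verdicts; the kernel code uses NF_DROP / NF_ACCEPT. */
enum verdict { VERDICT_DROP = 0, VERDICT_ACCEPT = 1 };

/*
 * hdr_len: bytes from the start of the message up to and including
 *          the blank line that ends the headers
 * clen:    body length claimed by the Content-Length header
 * datalen: bytes actually present in the packet payload
 */
static enum verdict check_sip_msglen(size_t hdr_len, size_t clen,
				     size_t datalen)
{
	assert(hdr_len <= datalen);	/* headers came from this payload */

	/* Subtract instead of add so a huge clen cannot overflow. */
	if (clen > datalen - hdr_len)
		return VERDICT_DROP;
	return VERDICT_ACCEPT;
}
```

A claimed body that ends exactly at the payload boundary is accepted;
one byte more is rejected.
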
 net/netfilter/nf_conntrack_sip.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/netfilter/nf_conntrack_sip.c b/net/netfilter/nf_conntrack_sip.c
index 237cc19..3fed15e 100644
--- a/net/netfilter/nf_conntrack_sip.c
+++ b/net/netfilter/nf_conntrack_sip.c
@@ -1461,6 +1461,8 @@ static int sip_help_tcp(struct sk_buff *skb, unsigned int protoff,
                end += strlen("\r\n\r\n") + clen;
 
                msglen = origlen = end - dptr;
+               if (msglen > datalen)
+                       return NF_DROP;
 
                ret = process_sip_msg(skb, ct, dataoff, &dptr, &msglen);
                if (ret != NF_ACCEPT)
-- 
1.7.4.4


>From 8fe63da58ac1dda2fafb306767799a5e8b2e73ad Mon Sep 17 00:00:00 2001
From: Patrick McHardy <[email protected]>
Date: Mon, 16 May 2011 14:45:39 +0200
Subject: [PATCH 02/17] netfilter: nf_ct_sip: fix SDP parsing in TCP SIP
 messages for some Cisco phones

[ Upstream commit e6e4d9ed11fb1fab8b3256a3dc14d71b5e984ac4 ]

Some Cisco phones do not place the Content-Length field at the end of the
SIP message. This is valid; however, due to a misunderstanding of the
specification, the parser expects the SDP body to start directly after the
Content-Length field. Fix the parser to scan for \r\n\r\n to locate the
beginning of the SDP body.

Reported-by: Teresa Kang <[email protected]>
Signed-off-by: Patrick McHardy <[email protected]>
---
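Note (not part of the commit): the scan for the header terminator can be
sketched in userspace as below. find_header_end() is a hypothetical
helper, not a kernel function; it only illustrates a bounded search that
never reads past the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the offset of the first "\r\n\r\n" at or after `start`, or -1
 * if no complete terminator is contained in buf[0..len).
 */
static long find_header_end(const char *buf, size_t len, size_t start)
{
	static const char term[] = "\r\n\r\n";
	const size_t tlen = sizeof(term) - 1;
	size_t i;

	assert(buf != NULL);

	/* Stop while a full terminator still fits before the end. */
	for (i = start; i + tlen <= len; i++) {
		if (memcmp(buf + i, term, tlen) == 0)
			return (long)i;
	}
	return -1;
}
```

The body then starts at the returned offset plus four, regardless of
which header happened to come last.
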
 net/netfilter/nf_conntrack_sip.c |   14 ++++++++++----
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/nf_conntrack_sip.c b/net/netfilter/nf_conntrack_sip.c
index 3fed15e..cb5a285 100644
--- a/net/netfilter/nf_conntrack_sip.c
+++ b/net/netfilter/nf_conntrack_sip.c
@@ -1419,6 +1419,7 @@ static int sip_help_tcp(struct sk_buff *skb, unsigned int protoff,
        const char *dptr, *end;
        s16 diff, tdiff = 0;
        int ret = NF_ACCEPT;
+       bool term;
        typeof(nf_nat_sip_seq_adjust_hook) nf_nat_sip_seq_adjust;
 
        if (ctinfo != IP_CT_ESTABLISHED &&
@@ -1453,10 +1454,15 @@ static int sip_help_tcp(struct sk_buff *skb, unsigned int protoff,
                if (dptr + matchoff == end)
                        break;
 
-               if (end + strlen("\r\n\r\n") > dptr + datalen)
-                       break;
-               if (end[0] != '\r' || end[1] != '\n' ||
-                   end[2] != '\r' || end[3] != '\n')
+               term = false;
+               for (; end + strlen("\r\n\r\n") <= dptr + datalen; end++) {
+                       if (end[0] == '\r' && end[1] == '\n' &&
+                           end[2] == '\r' && end[3] == '\n') {
+                               term = true;
+                               break;
+                       }
+               }
+               if (!term)
                        break;
                end += strlen("\r\n\r\n") + clen;
 
-- 
1.7.4.4


>From a586e48b616d1b2298e7399d66058dfe14c96eab Mon Sep 17 00:00:00 2001
From: Eric Dumazet <[email protected]>
Date: Tue, 17 May 2011 13:56:59 -0400
Subject: [PATCH 03/17] net: use hlist_del_rcu() in dev_change_name()

[ Upstream commit 372b2312010bece1e36f577d6c99a6193ec54cbd ]

Using plain hlist_del() in dev_change_name() is wrong since a
concurrent reader can crash trying to dereference LIST_POISON1.

Bug introduced in commit 72c9528bab94 (net: Introduce
dev_get_by_name_rcu())

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
---
 net/core/dev.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index b624fe4..30a4078 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1007,7 +1007,7 @@ rollback:
        }
 
        write_lock_bh(&dev_base_lock);
-       hlist_del(&dev->name_hlist);
+       hlist_del_rcu(&dev->name_hlist);
        write_unlock_bh(&dev_base_lock);
 
        synchronize_rcu();
-- 
1.7.4.4


>From d028e3eda03270969958935831496f2a542ad082 Mon Sep 17 00:00:00 2001
From: Anton Blanchard <[email protected]>
Date: Tue, 17 May 2011 15:38:57 -0400
Subject: [PATCH 04/17] net: recvmmsg: Strip MSG_WAITFORONE when calling
 recvmsg

[ Upstream commit b9eb8b8752804cecbacdb4d24b52e823cf07f107 ]

recvmmsg fails on a raw socket with EINVAL. The reason for this is that
packet_recvmsg checks the incoming flags:

        err = -EINVAL;
        if (flags &
            ~(MSG_PEEK|MSG_DONTWAIT|MSG_TRUNC|MSG_CMSG_COMPAT|MSG_ERRQUEUE))
                goto out;

This patch strips out MSG_WAITFORONE when recvmmsg calls recvmsg, which
fixes the issue.

Signed-off-by: Anton Blanchard <[email protected]>
Cc: [email protected] [2.6.34+]
Signed-off-by: David S. Miller <[email protected]>
---
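Note (not part of the commit): the effect of masking the flag can be
sketched as below. The X_MSG_* values and both functions are illustrative
stand-ins defined only for this example; they are not taken from the
kernel or libc headers.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative flag bits, named after but not copied from real headers. */
#define X_MSG_PEEK		0x0002u
#define X_MSG_DONTWAIT		0x0040u
#define X_MSG_WAITFORONE	0x10000u

/* A receive handler that rejects every flag it does not know about. */
static int strict_recv(unsigned int flags)
{
	if (flags & ~(X_MSG_PEEK | X_MSG_DONTWAIT))
		return -EINVAL;
	return 0;
}

/* Batch wrapper: consume the batch-only flag here, forward the rest. */
static int batch_recv(unsigned int flags)
{
	unsigned int fwd = flags & ~X_MSG_WAITFORONE;

	assert((fwd & X_MSG_WAITFORONE) == 0);
	return strict_recv(fwd);
}
```

Only the batch-only bit is removed, so any other unknown flag still
reaches the strict handler and is rejected there.
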
 net/socket.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index 310d16b..65b2310 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2122,14 +2122,16 @@ int __sys_recvmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
                 */
                if (MSG_CMSG_COMPAT & flags) {
                        err = __sys_recvmsg(sock, (struct msghdr __user *)compat_entry,
-                                           &msg_sys, flags, datagrams);
+                                           &msg_sys, flags & ~MSG_WAITFORONE,
+                                           datagrams);
                        if (err < 0)
                                break;
                        err = __put_user(err, &compat_entry->msg_len);
                        ++compat_entry;
                } else {
                        err = __sys_recvmsg(sock, (struct msghdr __user *)entry,
-                                           &msg_sys, flags, datagrams);
+                                           &msg_sys, flags & ~MSG_WAITFORONE,
+                                           datagrams);
                        if (err < 0)
                                break;
                        err = put_user(err, &entry->msg_len);
-- 
1.7.4.4


>From 2dff2a5efe86b57aca6de548678edd3b0b018207 Mon Sep 17 00:00:00 2001
From: Michael S. Tsirkin <[email protected]>
Date: Mon, 16 May 2011 10:37:39 +0000
Subject: [PATCH 05/17] net: Change netdev_fix_features messages loglevel

[ Upstream commit 604ae14ffb6d75d6eef4757859226b758d6bf9e3 ]

Cool, how about we make 'Features changed' debug as well?
This way userspace can't fill up the log just by tweaking tun features
with an ioctl.

Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
---
 net/core/dev.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 30a4078..acd7423 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5258,7 +5258,7 @@ void netdev_update_features(struct net_device *dev)
        if (dev->features == features)
                return;
 
-       netdev_info(dev, "Features changed: 0x%08x -> 0x%08x\n",
+       netdev_dbg(dev, "Features changed: 0x%08x -> 0x%08x\n",
                dev->features, features);
 
        if (dev->netdev_ops->ndo_set_features)
-- 
1.7.4.4


>From 627fb90fdc01aefdbe902561b6a3fd9eecfa30dc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micha=C5=82=20Miros=C5=82aw?= <[email protected]>
Date: Tue, 17 May 2011 16:50:02 -0400
Subject: [PATCH 06/17] net: ethtool: fix IPV6 checksum feature name string
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

[ Upstream commit 7cc31a9ae1477abc79d5992b3afe889f25c50c99 ]

Signed-off-by: Michał Mirosław <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
---
 net/core/ethtool.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 74ead9e..f337525 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -330,7 +330,7 @@ static const char netdev_features_strings[ETHTOOL_DEV_FEATURE_WORDS * 32][ETH_GS
        /* NETIF_F_IP_CSUM */         "tx-checksum-ipv4",
        /* NETIF_F_NO_CSUM */         "tx-checksum-unneeded",
        /* NETIF_F_HW_CSUM */         "tx-checksum-ip-generic",
-       /* NETIF_F_IPV6_CSUM */       "tx_checksum-ipv6",
+       /* NETIF_F_IPV6_CSUM */       "tx-checksum-ipv6",
        /* NETIF_F_HIGHDMA */         "highdma",
        /* NETIF_F_FRAGLIST */        "tx-scatter-gather-fraglist",
        /* NETIF_F_HW_VLAN_TX */      "tx-vlan-hw-insert",
-- 
1.7.4.4


>From f8f463fa0378747ca4cda029a452b4c50fcc5e95 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <[email protected]>
Date: Wed, 18 May 2011 02:21:31 -0400
Subject: [PATCH 07/17] net: add skb_dst_force() in sock_queue_err_skb()

[ Upstream commit abb57ea48fd9431fa320a5c55f73e6b5a44c2efb ]

Commit 7fee226ad239 (add a noref bit on skb dst) forgot to use
skb_dst_force() on packets queued in sk_error_queue.

This triggers the following warning for applications that use
IP_CMSG_PKTINFO and receive an error status:

------------[ cut here ]------------
WARNING: at include/linux/skbuff.h:457 ip_cmsg_recv_pktinfo+0xa6/0xb0()
Hardware name: 2669UYD
Modules linked in: isofs vboxnetadp vboxnetflt nfsd ebtable_nat ebtables
lib80211_crypt_ccmp uinput xcbc hdaps tp_smapi thinkpad_ec radeonfb fb_ddc
radeon ttm drm_kms_helper drm ipw2200 intel_agp intel_gtt libipw i2c_algo_bit
i2c_i801 agpgart rng_core cfbfillrect cfbcopyarea cfbimgblt video raid10 raid1
raid0 linear md_mod vboxdrv
Pid: 4697, comm: miredo Not tainted 2.6.39-rc6-00569-g5895198-dirty #22
Call Trace:
 [<c17746b6>] ? printk+0x1d/0x1f
 [<c1058302>] warn_slowpath_common+0x72/0xa0
 [<c15bbca6>] ? ip_cmsg_recv_pktinfo+0xa6/0xb0
 [<c15bbca6>] ? ip_cmsg_recv_pktinfo+0xa6/0xb0
 [<c1058350>] warn_slowpath_null+0x20/0x30
 [<c15bbca6>] ip_cmsg_recv_pktinfo+0xa6/0xb0
 [<c15bbdd7>] ip_cmsg_recv+0x127/0x260
 [<c154f82d>] ? skb_dequeue+0x4d/0x70
 [<c1555523>] ? skb_copy_datagram_iovec+0x53/0x300
 [<c178e834>] ? sub_preempt_count+0x24/0x50
 [<c15bdd2d>] ip_recv_error+0x23d/0x270
 [<c15de554>] udp_recvmsg+0x264/0x2b0
 [<c15ea659>] inet_recvmsg+0xd9/0x130
 [<c1547752>] sock_recvmsg+0xf2/0x120
 [<c11179cb>] ? might_fault+0x4b/0xa0
 [<c15546bc>] ? verify_iovec+0x4c/0xc0
 [<c1547660>] ? sock_recvmsg_nosec+0x100/0x100
 [<c1548294>] __sys_recvmsg+0x114/0x1e0
 [<c1093895>] ? __lock_acquire+0x365/0x780
 [<c1148b66>] ? fget_light+0xa6/0x3e0
 [<c1148b7f>] ? fget_light+0xbf/0x3e0
 [<c1148aee>] ? fget_light+0x2e/0x3e0
 [<c1549f29>] sys_recvmsg+0x39/0x60

Close bug https://bugzilla.kernel.org/show_bug.cgi?id=34622

Reported-by: Witold Baryluk <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
CC: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
---
 net/core/skbuff.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7ebeed0..3e934fe 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2993,6 +2993,9 @@ int sock_queue_err_skb(struct sock *sk, struct sk_buff *skb)
        skb->destructor = sock_rmem_free;
        atomic_add(skb->truesize, &sk->sk_rmem_alloc);
 
+       /* before exiting rcu section, make sure dst is refcounted */
+       skb_dst_force(skb);
+
        skb_queue_tail(&sk->sk_error_queue, skb);
        if (!sock_flag(sk, SOCK_DEAD))
                sk->sk_data_ready(sk, skb->len);
-- 
1.7.4.4


>From e5d37b08f230fd29fc7d22d2367af18e737c1aaa Mon Sep 17 00:00:00 2001
From: Eric Dumazet <[email protected]>
Date: Fri, 20 May 2011 14:59:23 -0400
Subject: [PATCH 08/17] macvlan: fix panic if lowerdev in a bond

[ Upstream commit d93515611bbc70c2fe4db232e5feb448ed8e4cc9 ]

commit a35e2c1b6d905 (macvlan: use rx_handler_data pointer to store
macvlan_port pointer V2) added a bug in macvlan_port_create()

Steps to reproduce the bug:

# ifenslave bond0 eth0 eth1

# ip link add link eth0 up name eth0#1 type macvlan
->error EBUSY

# ip link add link eth0 up name eth0#1 type macvlan
->panic

Fix: Don't set IFF_MACVLAN_PORT in the error case.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
---
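Note (not part of the commit): the shape of the fix, setting a state flag
only when registration succeeded, can be sketched as below. All names
(x_dev, x_port, X_FLAG_PORT, x_register_handler) are hypothetical
stand-ins. My reading of the panic, which the commit message does not
spell out, is that a later caller trusts the flag and treats another
owner's handler data as its own port.

```c
#include <assert.h>
#include <stdlib.h>

#define X_FLAG_PORT 0x1u	/* stands in for IFF_MACVLAN_PORT */

struct x_port {
	int unused;
};

struct x_dev {
	unsigned int priv_flags;
	void *handler_data;	/* non-NULL once a handler is registered */
};

/* Fails if the device already has a handler (e.g. it is enslaved). */
static int x_register_handler(struct x_dev *dev, void *data)
{
	if (dev->handler_data)
		return -1;
	dev->handler_data = data;
	return 0;
}

static int x_port_create(struct x_dev *dev)
{
	struct x_port *port = malloc(sizeof(*port));
	int err;

	if (!port)
		return -1;

	err = x_register_handler(dev, port);
	if (err)
		free(port);			/* nothing was published */
	else
		dev->priv_flags |= X_FLAG_PORT;	/* flag mirrors success */

	assert(err || dev->handler_data == port);
	return err;
}
```

With the flag tied to the success path, a failed attempt leaves the
device exactly as it was found.
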
 drivers/net/macvlan.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 78e34e9..6d357d6 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -598,8 +598,8 @@ static int macvlan_port_create(struct net_device *dev)
        err = netdev_rx_handler_register(dev, macvlan_handle_frame, port);
        if (err)
                kfree(port);
-
-       dev->priv_flags |= IFF_MACVLAN_PORT;
+       else
+               dev->priv_flags |= IFF_MACVLAN_PORT;
        return err;
 }
 
-- 
1.7.4.4


>From bc161cb50ea713af8a1c67f8d0291c93169970b5 Mon Sep 17 00:00:00 2001
From: Jacek Luczak <[email protected]>
Date: Thu, 19 May 2011 09:55:13 +0000
Subject: [PATCH 09/17] SCTP: fix race between sctp_bind_addr_free() and
 sctp_bind_addr_conflict()

[ Upstream commit c182f90bc1f22ce5039b8722e45621d5f96862c2 ]

During the sctp_close() call, we do not use RCU primitives to
destroy the address list attached to the endpoint.  At the same
time, we remove addresses from this list before attempting to
remove the socket from the port hash.

As a result, it is possible for another process to find a socket
in the port hash that is in the process of being closed.  It then
proceeds to traverse the address list to find the conflict, only
to have that address list suddenly disappear, because the removal
is not protected by an RCU critical section.

Fix the issue by removing the address list entries with RCU
primitives (list_del_rcu() and call_rcu()).

Race can result in a kernel crash with general protection fault or
kernel NULL pointer dereference:

kernel: general protection fault: 0000 [#1] SMP
kernel: RIP: 0010:[<ffffffffa02f3dde>]  [<ffffffffa02f3dde>] sctp_bind_addr_conflict+0x64/0x82 [sctp]
kernel: Call Trace:
kernel:  [<ffffffffa02f415f>] ? sctp_get_port_local+0x17b/0x2a3 [sctp]
kernel:  [<ffffffffa02f3d45>] ? sctp_bind_addr_match+0x33/0x68 [sctp]
kernel:  [<ffffffffa02f4416>] ? sctp_do_bind+0xd3/0x141 [sctp]
kernel:  [<ffffffffa02f5030>] ? sctp_bindx_add+0x4d/0x8e [sctp]
kernel:  [<ffffffffa02f5183>] ? sctp_setsockopt_bindx+0x112/0x4a4 [sctp]
kernel:  [<ffffffff81089e82>] ? generic_file_aio_write+0x7f/0x9b
kernel:  [<ffffffffa02f763e>] ? sctp_setsockopt+0x14f/0xfee [sctp]
kernel:  [<ffffffff810c11fb>] ? do_sync_write+0xab/0xeb
kernel:  [<ffffffff810e82ab>] ? fsnotify+0x239/0x282
kernel:  [<ffffffff810c2462>] ? alloc_file+0x18/0xb1
kernel:  [<ffffffff8134a0b1>] ? compat_sys_setsockopt+0x1a5/0x1d9
kernel:  [<ffffffff8134aaf1>] ? compat_sys_socketcall+0x143/0x1a4
kernel:  [<ffffffff810467dc>] ? sysenter_dispatch+0x7/0x32

Signed-off-by: Jacek Luczak <[email protected]>
Acked-by: Vlad Yasevich <[email protected]>
CC: Eric Dumazet <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
---
 net/sctp/bind_addr.c |   10 ++++------
 1 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
index faf71d1..6150ac5 100644
--- a/net/sctp/bind_addr.c
+++ b/net/sctp/bind_addr.c
@@ -140,14 +140,12 @@ void sctp_bind_addr_init(struct sctp_bind_addr *bp, __u16 port)
 /* Dispose of the address list. */
 static void sctp_bind_addr_clean(struct sctp_bind_addr *bp)
 {
-       struct sctp_sockaddr_entry *addr;
-       struct list_head *pos, *temp;
+       struct sctp_sockaddr_entry *addr, *temp;
 
        /* Empty the bind address list. */
-       list_for_each_safe(pos, temp, &bp->address_list) {
-               addr = list_entry(pos, struct sctp_sockaddr_entry, list);
-               list_del(pos);
-               kfree(addr);
+       list_for_each_entry_safe(addr, temp, &bp->address_list, list) {
+               list_del_rcu(&addr->list);
+               call_rcu(&addr->rcu, sctp_local_addr_free);
                SCTP_DBG_OBJCNT_DEC(addr);
        }
 }
-- 
1.7.4.4


>From cd934c1d517bca86a3104c935a7cf7cc13a6845b Mon Sep 17 00:00:00 2001
From: Veaceslav Falico <[email protected]>
Date: Mon, 23 May 2011 23:15:05 +0000
Subject: [PATCH 10/17] igmp: call ip_mc_clear_src() only when we have no
 users of ip_mc_list

[ Upstream commit 24cf3af3fed5edcf90bc2a0ed181e6ce1513d2dc ]

In igmp_group_dropped() we call ip_mc_clear_src(), which resets the number
of source filters per multicast. However, igmp_group_dropped() is also
called on NETDEV_DOWN, NETDEV_PRE_TYPE_CHANGE and NETDEV_UNREGISTER, which
means that the group might get added back on NETDEV_UP, NETDEV_REGISTER and
NETDEV_POST_TYPE_CHANGE respectively, leaving us with broken source
filters.

To fix that, we must clear the source filters only when there are no users
in the ip_mc_list, i.e. in ip_mc_dec_group() and on device destroy.

Acked-by: David L Stevens <[email protected]>
Signed-off-by: Veaceslav Falico <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
---
 net/ipv4/igmp.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 1fd3d9c..57ca93a 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -1169,20 +1169,18 @@ static void igmp_group_dropped(struct ip_mc_list *im)
 
        if (!in_dev->dead) {
                if (IGMP_V1_SEEN(in_dev))
-                       goto done;
+                       return;
                if (IGMP_V2_SEEN(in_dev)) {
                        if (reporter)
                                igmp_send_report(in_dev, im, IGMP_HOST_LEAVE_MESSAGE);
-                       goto done;
+                       return;
                }
                /* IGMPv3 */
                igmpv3_add_delrec(in_dev, im);
 
                igmp_ifc_event(in_dev);
        }
-done:
 #endif
-       ip_mc_clear_src(im);
 }
 
 static void igmp_group_added(struct ip_mc_list *im)
@@ -1319,6 +1317,7 @@ void ip_mc_dec_group(struct in_device *in_dev, __be32 addr)
                                *ip = i->next_rcu;
                                in_dev->mc_count--;
                                igmp_group_dropped(i);
+                               ip_mc_clear_src(i);
 
                                if (!in_dev->dead)
                                        ip_rt_multicast_event(in_dev);
@@ -1428,7 +1427,8 @@ void ip_mc_destroy_dev(struct in_device *in_dev)
                in_dev->mc_list = i->next_rcu;
                in_dev->mc_count--;
 
-               igmp_group_dropped(i);
+               /* We've dropped the groups in ip_mc_down already */
+               ip_mc_clear_src(i);
                ip_ma_put(i);
        }
 }
-- 
1.7.4.4


>From ddce98be0fe80702591c637593740872a2835d48 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <[email protected]>
Date: Tue, 24 May 2011 13:32:18 -0400
Subject: [PATCH 11/17] bridge: initialize fake_rtable metrics

[ Upstream commit 33eb9873a283a2076f2b5628813d5365ca420ea9 ]

bridge netfilter code uses a fake_rtable, and we must init its _metric
field or risk NULL dereference later.

Ref: https://bugzilla.kernel.org/show_bug.cgi?id=35672

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
---
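Note (not part of the commit): the patch below replaces a single metric
store with a complete default table built from a designated initializer.
A small userspace sketch of that idiom follows; the X_RTAX_* IDs and
x_metric() are hypothetical and only mimic the 1-based numbering implied
by the "[RTAX_MTU - 1]" index in the diff.

```c
#include <assert.h>

/* Hypothetical 1-based metric IDs. */
enum { X_RTAX_LOCK = 1, X_RTAX_MTU = 2, X_RTAX_WINDOW = 3, X_RTAX_COUNT };
#define X_RTAX_MAX (X_RTAX_COUNT - 1)

/* Slot i holds metric ID i + 1; slots not named here are zero. */
static const unsigned int x_default_metrics[X_RTAX_MAX] = {
	[X_RTAX_MTU - 1] = 1500,
};

/* Read one metric from a table that is guaranteed to exist. */
static unsigned int x_metric(const unsigned int *metrics, int id)
{
	assert(metrics != 0);
	assert(id >= 1 && id <= X_RTAX_MAX);
	return metrics[id - 1];
}
```

Because the whole table exists up front, readers never have to handle a
missing metrics pointer.
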
 net/bridge/br_netfilter.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 74ef4d4..5f9c091 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -117,6 +117,10 @@ static struct dst_ops fake_dst_ops = {
  * ipt_REJECT needs it.  Future netfilter modules might
  * require us to fill additional fields.
  */
+static const u32 br_dst_default_metrics[RTAX_MAX] = {
+       [RTAX_MTU - 1] = 1500,
+};
+
 void br_netfilter_rtable_init(struct net_bridge *br)
 {
        struct rtable *rt = &br->fake_rtable;
@@ -124,7 +128,7 @@ void br_netfilter_rtable_init(struct net_bridge *br)
        atomic_set(&rt->dst.__refcnt, 1);
        rt->dst.dev = br->dev;
        rt->dst.path = &rt->dst;
-       dst_metric_set(&rt->dst, RTAX_MTU, 1500);
+       dst_init_metrics(&rt->dst, br_dst_default_metrics, true);
        rt->dst.flags   = DST_NOXFRM;
        rt->dst.ops = &fake_dst_ops;
 }
-- 
1.7.4.4


>From c10a10f37acde34b6e8e888d4e76a1c92ef200c3 Mon Sep 17 00:00:00 2001
From: Wei Yongjun <[email protected]>
Date: Tue, 24 May 2011 21:48:02 +0000
Subject: [PATCH 12/17] sctp: fix memory leak of the ASCONF queue when free
 asoc

[ Upstream commit 8b4472cc13136d04727e399c6fdadf58d2218b0a ]

If an ASCONF chunk is outstanding, then the following ASCONF
chunk will be queued for later transmission. But when we free
the asoc, we forget to free the ASCONF queue at the same time,
which causes a memory leak.

Signed-off-by: Wei Yongjun <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
---
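Note (not part of the commit): the leak and its fix follow the usual
"drain the pending queue on teardown" pattern. A minimal userspace sketch
with a singly linked list is below; x_assoc, x_chunk and both helpers are
hypothetical names, and the kernel code uses its own list helpers
instead.

```c
#include <assert.h>
#include <stdlib.h>

struct x_chunk {
	struct x_chunk *next;
};

struct x_assoc {
	struct x_chunk *pending;	/* chunks queued for later sending */
};

/* Queue one chunk; returns 0 on success, -1 if allocation failed. */
static int x_queue_chunk(struct x_assoc *asoc)
{
	struct x_chunk *c = malloc(sizeof(*c));

	if (!c)
		return -1;
	c->next = asoc->pending;
	asoc->pending = c;
	return 0;
}

/* Drain the queue on teardown; returns the number of chunks released. */
static unsigned int x_free_pending(struct x_assoc *asoc)
{
	unsigned int freed = 0;
	struct x_chunk *c;

	assert(asoc != NULL);

	c = asoc->pending;
	while (c) {
		struct x_chunk *next = c->next;	/* read before freeing c */

		free(c);
		c = next;
		freed++;
	}
	asoc->pending = NULL;
	return freed;
}
```

The next pointer is saved before each free, which is the same reason the
patch uses the _safe list iterator.
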
 net/sctp/associola.c |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 1a21c57..525f97c 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -64,6 +64,7 @@
 /* Forward declarations for internal functions. */
 static void sctp_assoc_bh_rcv(struct work_struct *work);
 static void sctp_assoc_free_asconf_acks(struct sctp_association *asoc);
+static void sctp_assoc_free_asconf_queue(struct sctp_association *asoc);
 
 /* Keep track of the new idr low so that we don't re-use association id
  * numbers too fast.  It is protected by they idr spin lock is in the
@@ -446,6 +447,9 @@ void sctp_association_free(struct sctp_association *asoc)
        /* Free any cached ASCONF_ACK chunk. */
        sctp_assoc_free_asconf_acks(asoc);
 
+       /* Free the ASCONF queue. */
+       sctp_assoc_free_asconf_queue(asoc);
+
        /* Free any cached ASCONF chunk. */
        if (asoc->addip_last_asconf)
                sctp_chunk_free(asoc->addip_last_asconf);
@@ -1578,6 +1582,18 @@ retry:
        return error;
 }
 
+/* Free the ASCONF queue */
+static void sctp_assoc_free_asconf_queue(struct sctp_association *asoc)
+{
+       struct sctp_chunk *asconf;
+       struct sctp_chunk *tmp;
+
+       list_for_each_entry_safe(asconf, tmp, &asoc->addip_chunk_list, list) {
+               list_del_init(&asconf->list);
+               sctp_chunk_free(asconf);
+       }
+}
+
 /* Free asconf_ack cache */
 static void sctp_assoc_free_asconf_acks(struct sctp_association *asoc)
 {
-- 
1.7.4.4


>From 5bfe906050eda2fdb229d552089442759b6fd18a Mon Sep 17 00:00:00 2001
From: Eric Dumazet <[email protected]>
Date: Wed, 25 May 2011 04:40:11 +0000
Subject: [PATCH 13/17] sch_sfq: fix peek() implementation

[ Upstream commit 07bd8df5df4369487812bf85a237322ff3569b77 ]

Since commit eeaeb068f139 (sch_sfq: allow big packets and be fair),
sfq_peek() can return a different skb than the one that would normally
be dequeued by sfq_dequeue() [ if current slot->allot is negative ].

Use generic qdisc_peek_dequeued() instead of custom implementation, to
get a consistent result.

Signed-off-by: Eric Dumazet <[email protected]>
CC: Jarek Poplawski <[email protected]>
CC: Patrick McHardy <[email protected]>
CC: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
---
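Note (not part of the commit): one way to keep peek and dequeue
consistent is to implement peek as a real dequeue whose result is
stashed until the next dequeue. The sketch below illustrates that general
idea in userspace; x_queue and its functions are hypothetical, and how
the kernel helper is implemented is an assumption, since this patch only
shows it being wired in.

```c
#include <assert.h>
#include <stddef.h>

struct x_queue {
	int items[8];
	size_t head, tail;	/* valid items are items[head..tail) */
	int stash;		/* item taken out early by peek */
	int has_stash;
};

/* The real scheduling decision; imagine it being arbitrarily complex. */
static int x_raw_dequeue(struct x_queue *q, int *out)
{
	if (q->head == q->tail)
		return 0;
	*out = q->items[q->head++];
	return 1;
}

/* Peek by dequeuing for real and remembering the result. */
static int x_peek(struct x_queue *q, int *out)
{
	assert(q != NULL && out != NULL);

	if (!q->has_stash) {
		if (!x_raw_dequeue(q, &q->stash))
			return 0;
		q->has_stash = 1;
	}
	*out = q->stash;
	return 1;
}

/* Dequeue returns the stashed item first, so it always matches peek. */
static int x_dequeue(struct x_queue *q, int *out)
{
	assert(q != NULL && out != NULL);

	if (q->has_stash) {
		q->has_stash = 0;
		*out = q->stash;
		return 1;
	}
	return x_raw_dequeue(q, out);
}
```

The design choice is that there is only one scheduling decision to
maintain, so peek cannot drift away from dequeue when that decision
changes.
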
 net/sched/sch_sfq.c |   14 +-------------
 1 files changed, 1 insertions(+), 13 deletions(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index c2e628d..5b8ae41 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -410,18 +410,6 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 }
 
 static struct sk_buff *
-sfq_peek(struct Qdisc *sch)
-{
-       struct sfq_sched_data *q = qdisc_priv(sch);
-
-       /* No active slots */
-       if (q->tail == NULL)
-               return NULL;
-
-       return q->slots[q->tail->next].skblist_next;
-}
-
-static struct sk_buff *
 sfq_dequeue(struct Qdisc *sch)
 {
        struct sfq_sched_data *q = qdisc_priv(sch);
@@ -702,7 +690,7 @@ static struct Qdisc_ops sfq_qdisc_ops __read_mostly = {
        .priv_size      =       sizeof(struct sfq_sched_data),
        .enqueue        =       sfq_enqueue,
        .dequeue        =       sfq_dequeue,
-       .peek           =       sfq_peek,
+       .peek           =       qdisc_peek_dequeued,
        .drop           =       sfq_drop,
        .init           =       sfq_init,
        .reset          =       sfq_reset,
-- 
1.7.4.4


>From 2d17cce7a2d33f0aaf2bc6bcbc9aa9d12036223a Mon Sep 17 00:00:00 2001
From: Neil Horman <[email protected]>
Date: Wed, 25 May 2011 08:13:01 +0000
Subject: [PATCH 14/17] bonding: prevent deadlock on slave store with alb mode
 (v3)

[ Upstream commit 9fe0617d9b6d21f700ee9e658e1c9fe3be2fb402 ]

This soft lockup was recently reported:

[root@dell-per715-01 ~]# echo +bond5 > /sys/class/net/bonding_masters
[root@dell-per715-01 ~]# echo +eth1 > /sys/class/net/bond5/bonding/slaves
bonding: bond5: doing slave updates when interface is down.
bonding bond5: master_dev is not up in bond_enslave
[root@dell-per715-01 ~]# echo -eth1 > /sys/class/net/bond5/bonding/slaves
bonding: bond5: doing slave updates when interface is down.

BUG: soft lockup - CPU#12 stuck for 60s! [bash:6444]
CPU 12:
Modules linked in: bonding autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc
be2d
Pid: 6444, comm: bash Not tainted 2.6.18-262.el5 #1
RIP: 0010:[<ffffffff80064bf0>]  [<ffffffff80064bf0>]
.text.lock.spinlock+0x26/00
RSP: 0018:ffff810113167da8  EFLAGS: 00000286
RAX: ffff810113167fd8 RBX: ffff810123a47800 RCX: 0000000000ff1025
RDX: 0000000000000000 RSI: ffff810123a47800 RDI: ffff81021b57f6f8
RBP: ffff81021b57f500 R08: 0000000000000000 R09: 000000000000000c
R10: 00000000ffffffff R11: ffff81011d41c000 R12: ffff81021b57f000
R13: 0000000000000000 R14: 0000000000000282 R15: 0000000000000282
FS:  00002b3b41ef3f50(0000) GS:ffff810123b27940(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b3b456dd000 CR3: 000000031fc60000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff80064af9>] _spin_lock_bh+0x9/0x14
 [<ffffffff886937d7>] :bonding:tlb_clear_slave+0x22/0xa1
 [<ffffffff8869423c>] :bonding:bond_alb_deinit_slave+0xba/0xf0
 [<ffffffff8868dda6>] :bonding:bond_release+0x1b4/0x450
 [<ffffffff8006457b>] __down_write_nested+0x12/0x92
 [<ffffffff88696ae4>] :bonding:bonding_store_slaves+0x25c/0x2f7
 [<ffffffff801106f7>] sysfs_write_file+0xb9/0xe8
 [<ffffffff80016b87>] vfs_write+0xce/0x174
 [<ffffffff80017450>] sys_write+0x45/0x6e
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

It occurs because we are able to change the slave configuration of a bond while
the bond interface is down.  The bonding driver initializes some data structures
only after its ndo_open routine is called.  Among them is the initialization of
the alb tx and rx hash locks.  So if we add or remove a slave without first
opening the bond master device, we run the risk of trying to lock/unlock a
spinlock that has garbage for data in it, which results in the soft lockup above.

Note that sometimes this works, because in many cases an unlocked spinlock has
the raw_lock parameter initialized to zero (meaning that the kzalloc of the
net_device private data is equivalent to calling spin_lock_init), but that's not
true in all cases, and we aren't guaranteed that condition, so we need to pass
the relevant spinlocks through the spin_lock_init function.

Fix it by moving the spin_lock_init calls for the tx and rx hashtable locks to
the ndo_init path, so they are ready for use by the bond_store_slaves path.

Change notes:
v2) Based on conversation with Jay and Nicolas it seems that the ability to
enslave devices while the bond master is down should be safe to do.  As such
this is an outlier bug, and so instead we'll just initialize the errant spinlocks
in the init path rather than the open path, solving the problem.  We'll also
remove the warnings about the bond being down during enslave operations, since
it should be safe.

v3) Fix spelling error

Signed-off-by: Neil Horman <[email protected]>
Reported-by: [email protected]
CC: Jay Vosburgh <[email protected]>
CC: Andy Gospodarek <[email protected]>
CC: [email protected]
CC: "David S. Miller" <[email protected]>
Signed-off-by: Jay Vosburgh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
---
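Note (not part of the commit): the ordering problem can be sketched in
userspace as below. x_lock is a hypothetical lock with a magic value so
that use before initialization fails deterministically; a real spinlock
gives no such guarantee, which is the point made in the message above.

```c
#include <assert.h>
#include <string.h>

#define X_LOCK_MAGIC 0x10cdu

struct x_lock {
	unsigned int magic;
	int held;
};

static void x_lock_init(struct x_lock *l)
{
	l->magic = X_LOCK_MAGIC;
	l->held = 0;
}

/* Returns 0 on success, -1 if the lock was never initialized. */
static int x_lock_acquire(struct x_lock *l)
{
	if (l->magic != X_LOCK_MAGIC)
		return -1;
	assert(!l->held);
	l->held = 1;
	return 0;
}

static void x_lock_release(struct x_lock *l)
{
	l->held = 0;
}

struct x_bond {
	struct x_lock tx_lock;
	int is_open;
};

/* Runs once when the device is created: the right home for lock setup. */
static void x_bond_init(struct x_bond *b)
{
	memset(b, 0, sizeof(*b));
	x_lock_init(&b->tx_lock);
}

/* May run before the device has ever been opened. */
static int x_bond_release_slave(struct x_bond *b)
{
	if (x_lock_acquire(&b->tx_lock))
		return -1;
	/* ... per-slave cleanup would go here ... */
	x_lock_release(&b->tx_lock);
	return 0;
}
```

Anything a pre-open code path may touch has to be set up at creation
time, not at open time.
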
 drivers/net/bonding/bond_alb.c   |    4 ----
 drivers/net/bonding/bond_main.c  |   16 ++++++++++------
 drivers/net/bonding/bond_sysfs.c |    6 ------
 3 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index ba71582..a20bfef 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -163,8 +163,6 @@ static int tlb_initialize(struct bonding *bond)
        struct tlb_client_info *new_hashtbl;
        int i;
 
-       spin_lock_init(&(bond_info->tx_hashtbl_lock));
-
        new_hashtbl = kzalloc(size, GFP_KERNEL);
        if (!new_hashtbl) {
                pr_err("%s: Error: Failed to allocate TLB hash table\n",
@@ -764,8 +762,6 @@ static int rlb_initialize(struct bonding *bond)
        int size = RLB_HASH_TABLE_SIZE * sizeof(struct rlb_client_info);
        int i;
 
-       spin_lock_init(&(bond_info->rx_hashtbl_lock));
-
        new_hashtbl = kmalloc(size, GFP_KERNEL);
        if (!new_hashtbl) {
                pr_err("%s: Error: Failed to allocate RLB hash table\n",
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 16d6fe9..ffb0fde 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1535,12 +1535,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
                           bond_dev->name, slave_dev->name);
        }
 
-       /* bond must be initialized by bond_open() before enslaving */
-       if (!(bond_dev->flags & IFF_UP)) {
-               pr_warning("%s: master_dev is not up in bond_enslave\n",
-                          bond_dev->name);
-       }
-
        /* already enslaved */
        if (slave_dev->flags & IFF_SLAVE) {
                pr_debug("Error, Device was already enslaved\n");
@@ -4975,9 +4969,19 @@ static int bond_init(struct net_device *bond_dev)
 {
        struct bonding *bond = netdev_priv(bond_dev);
        struct bond_net *bn = net_generic(dev_net(bond_dev), bond_net_id);
+       struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
 
        pr_debug("Begin bond_init for %s\n", bond_dev->name);
 
+       /*
+        * Initialize locks that may be required during
+        * en/deslave operations.  All of the bond_open work
+        * (of which this is part) should really be moved to
+        * a phase prior to dev_open
+        */
+       spin_lock_init(&(bond_info->tx_hashtbl_lock));
+       spin_lock_init(&(bond_info->rx_hashtbl_lock));
+
        bond->wq = create_singlethread_workqueue(bond_dev->name);
        if (!bond->wq)
                return -ENOMEM;
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index de87aea..8a2717e 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -227,12 +227,6 @@ static ssize_t bonding_store_slaves(struct device *d,
        struct net_device *dev;
        struct bonding *bond = to_bond(d);
 
-       /* Quick sanity check -- is the bond interface up? */
-       if (!(bond->dev->flags & IFF_UP)) {
-               pr_warning("%s: doing slave updates when interface is down.\n",
-                          bond->dev->name);
-       }
-
        if (!rtnl_trylock())
                return restart_syscall();
 
-- 
1.7.4.4


From 07317133d5fc38c21af6adb6c6672cab37eb3548 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <[email protected]>
Date: Mon, 23 May 2011 11:02:42 +0000
Subject: [PATCH 15/17] sch_sfq: avoid giving spurious NET_XMIT_CN signals

[ Upstream commit 8efa885406359af300d46910642b50ca82c0fe47 ]

While chasing a possible net_sched bug, I found that IP fragments have
little chance of passing a congested SFQ qdisc:

- Say the SFQ qdisc is full because one flow is unresponsive.
- ip_fragment() wants to send two fragments belonging to an idle flow.
- sfq_enqueue() queues the first packet, but sees the queue limit reached.
- sfq_enqueue() drops one packet from the 'big consumer' and returns
NET_XMIT_CN.
- ip_fragment() cancels the remaining fragments.

This patch restores fairness, making sure we return NET_XMIT_CN only if
we dropped a packet from the same flow.
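
The rule above can be illustrated with a small user-space model. This is
only a sketch under stated assumptions, not the kernel code: the toy_*
names, the fixed flow count, the return values and the "drop from the
longest flow" policy are all hypothetical stand-ins for sfq_enqueue() and
sfq_drop().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for NET_XMIT_SUCCESS / NET_XMIT_CN. */
enum { TOY_XMIT_SUCCESS = 0, TOY_XMIT_CN = 2 };

#define TOY_FLOWS 4

struct toy_sfq {
	unsigned int qlen[TOY_FLOWS];	/* per-flow queue length */
	unsigned int total;		/* packets queued overall */
	unsigned int limit;		/* overall queue limit */
};

/* Drop one packet from the longest flow (assumed drop policy). */
static void toy_drop(struct toy_sfq *q)
{
	size_t i, victim = 0;

	for (i = 1; i < TOY_FLOWS; i++)
		if (q->qlen[i] > q->qlen[victim])
			victim = i;
	assert(q->qlen[victim] > 0);
	q->qlen[victim]--;
	q->total--;
}

/* Enqueue one packet on 'flow'; signal congestion only if the
 * packet that had to be dropped belonged to that same flow.
 */
static int toy_enqueue(struct toy_sfq *q, size_t flow)
{
	unsigned int qlen;

	assert(flow < TOY_FLOWS);
	q->qlen[flow]++;
	if (++q->total <= q->limit)
		return TOY_XMIT_SUCCESS;

	qlen = q->qlen[flow];		/* length before the drop */
	toy_drop(q);
	return (qlen != q->qlen[flow]) ? TOY_XMIT_CN : TOY_XMIT_SUCCESS;
}
```

In this model an idle flow enqueueing into a full qdisc sees success
while the heavy flow loses a packet, so a caller such as ip_fragment()
would not abandon its remaining fragments.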

Signed-off-by: Eric Dumazet <[email protected]>
CC: Patrick McHardy <[email protected]>
CC: Jarek Poplawski <[email protected]>
CC: Jamal Hadi Salim <[email protected]>
CC: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
---
 net/sched/sch_sfq.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 5b8ae41..6d96275 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -361,7 +361,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
        struct sfq_sched_data *q = qdisc_priv(sch);
        unsigned int hash;
-       sfq_index x;
+       sfq_index x, qlen;
        struct sfq_slot *slot;
        int uninitialized_var(ret);
 
@@ -405,8 +405,12 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
        if (++sch->q.qlen <= q->limit)
                return NET_XMIT_SUCCESS;
 
+       qlen = slot->qlen;
        sfq_drop(sch);
-       return NET_XMIT_CN;
+       /* Return Congestion Notification only if we dropped a packet
+        * from this flow.
+        */
+       return (qlen != slot->qlen) ? NET_XMIT_CN : NET_XMIT_SUCCESS;
 }
 
 static struct sk_buff *
-- 
1.7.4.4


From 8c378c2862ae68a60bc5feb778ced790b0a19242 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <[email protected]>
Date: Tue, 24 May 2011 13:29:50 -0400
Subject: [PATCH 16/17] net: fix __dst_destroy_metrics_generic()

[ Upstream commit b30c516f875004f025f4d10147bde28c5e98466b ]

dst_default_metrics is read-only; we don't want to kfree() it later.
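
As a rough illustration of why the flag matters, here is a user-space
sketch of a pointer that carries a read-only tag in its low bit. It
assumes that the release path tests the tag before freeing; all toy_*
names are hypothetical and this is not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_READ_ONLY 0x1UL	/* hypothetical tag, kept in bit 0 */

/* Shared, statically allocated defaults: must never be freed. */
static const uint32_t toy_default_metrics[4];

/* Return the defaults pointer with the read-only tag set. */
static uintptr_t toy_tag_default(void)
{
	uintptr_t p = (uintptr_t)toy_default_metrics;

	assert((p & TOY_READ_ONLY) == 0);	/* bit 0 must be free */
	return p | TOY_READ_ONLY;
}

static int toy_is_read_only(uintptr_t v)
{
	return (v & TOY_READ_ONLY) != 0;
}

/* Strip the tag to recover the real pointer. */
static uint32_t *toy_ptr(uintptr_t v)
{
	return (uint32_t *)(v & ~TOY_READ_ONLY);
}

/* Free only untagged (heap-allocated) metrics.
 * Returns 1 if memory was freed, 0 if it was left alone.
 */
static int toy_release(uintptr_t v)
{
	if (toy_is_read_only(v))
		return 0;
	free(toy_ptr(v));
	return 1;
}
```

If the defaults were stored without the tag, toy_release() would hand a
static array to free(), which is the kind of mistake the one-line change
in this patch guards against.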

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
---
 net/core/dst.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/core/dst.c b/net/core/dst.c
index 91104d3..b71b7a3 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -314,7 +314,7 @@ void __dst_destroy_metrics_generic(struct dst_entry *dst, unsigned long old)
 {
        unsigned long prev, new;
 
-       new = (unsigned long) dst_default_metrics;
+       new = ((unsigned long) dst_default_metrics) | DST_METRICS_READ_ONLY;
        prev = cmpxchg(&dst->_metrics, old, new);
        if (prev == old)
                kfree(__DST_METRICS_PTR(old));
-- 
1.7.4.4


From a6662b9379dca82b365b4ea61b5d202ca1ed5975 Mon Sep 17 00:00:00 2001
From: Stephen Hemminger <[email protected]>
Date: Tue, 24 May 2011 13:50:52 -0400
Subject: [PATCH 17/17] dst: catch uninitialized metrics

[ Upstream commit 1f37070d3ff325827c6213e51b57f21fd5ac9d05 ]

Catch cases where dst_metric_set() and other functions are called
but _metrics is NULL.

Signed-off-by: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
---
 include/net/dst.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 75b95df..b3ad020 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -120,6 +120,8 @@ static inline u32 *dst_metrics_write_ptr(struct dst_entry *dst)
 {
        unsigned long p = dst->_metrics;
 
+       BUG_ON(!p);
+
        if (p & DST_METRICS_READ_ONLY)
                return dst->ops->cow_metrics(dst, p);
        return __DST_METRICS_PTR(p);
-- 
1.7.4.4

