Re: [PATCH net] bonding: fix bond dev flags after convert to arphrd_ether

2015-08-28 Thread Nikolay Aleksandrov

 On Aug 27, 2015, at 8:39 PM, Jay Vosburgh jay.vosbu...@canonical.com wrote:
 
 Nikolay Aleksandrov niko...@cumulusnetworks.com wrote:
 [...]
 Restarting this thread because there’s actually a bug here, what you 
 described with
 the bonding destruction is true when the slaves are all destroyed but it 
 isn’t true if they’re
 just released, if you take a look at bond_slave_netdev_event() the bond 
 destruction happens
 only on NETDEV_UNREGISTER and I just hit this bug by enslaving a 
 non-ARPHRD_ETHER
 device, releasing it and enslaving a ARPHRD_ETHER device so ether_setup() 
 path in bond_enslave
 is hit and IFF_MASTER gets dropped:
 17: bond0: BROADCAST,MULTICAST,MASTER,UP,LOWER_UP mtu 1500 qdisc noqueue 
 state UP mode DEFAULT group default qlen 1000
   link/fddi 9a:33:c5:30:ff:a6 brd ff:ff:ff:ff:ff:ff
 (release non-ARPHRD_ETHER slave)
 (enslave ARPHRD_ETHER device)
 17: bond0: BROADCAST,MULTICAST,LOWER_UP mtu 1500 qdisc noqueue state UP 
 mode DEFAULT group default qlen 1000
   link/ether 08:00:27:3c:13:57 brd ff:ff:ff:ff:ff:ff
 
 Notice the master flag is gone and of course on unload we get:
 [57981.545547] [ cut here ]
 [57981.545567] WARNING: CPU: 0 PID: 13792 at fs/proc/generic.c:575 
 remove_proc_entry+0x17e/0x190()
 [57981.545572] remove_proc_entry: removing non-empty directory 
 'net/bonding', leaking at least 'bond0'
 [...]
 We need to convert it back to ARPHRD_ETHER if releasing the last slave, 
 because
 we can’t destroy it (in some paths bond-dev is used after bond_release()).
 Basically we should make the case that if the bonding doesn’t have any 
 slaves then it’s
 always an ARPHRD_ETHER device.
 
 Thoughts ?
 
   I agree that it would be cleaner for bond_dev-type to switch
 back on release of last slave.  The options code (caller of
 bond_option_slaves_set) and bond_uninit() both reference the bond or dev
 after calling bond_release(), and would need changing if any release
 could destroy the bond itself.
 
   However, for the type change, there's the potentially tricky
 case of a nested non-ARPHRD_ETHER bond, e.g., bond0 - bond1 - ib0.
 This isn't a typical use case that I'm aware of, but I believe it's
 supported by the code.
 
   If ib0, the last slave, is released, bond1 will want to change
 to ARPHRD_ETHER, but bond0 is ARPHRD_INFINIBAND.  I suspect bonding will
 have to notice the NETDEV_PRE_TYPE_CHANGE and _POST_ notifiers and take
 appropriate action (i.e., cascade the type change upwards).
 
   There might be similar issues with other devices stacked on top
 of the IB - Ether type-changing bond; I'm not sure how many of those
 there may be, though, since many things won't stack over IB devices (or
 an IB-flavor bond).
 
Ugh right, this would be a problem. I’ll see if it can be handled well.

   If the type change works, then I don't think we would still need
 the release and destroy logic.
 
Right, that was my intention.

   -J
 
 ---
   -Jay Vosburgh, jay.vosbu...@canonical.com

I’ll look into this some more and if it works out I’ll post the patch.

Thanks,
 Nik--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net] bonding: fix bond dev flags after convert to arphrd_ether

2015-08-27 Thread Nikolay Aleksandrov

 On Jul 15, 2015, at 1:32 PM, Jay Vosburgh jay.vosbu...@canonical.com wrote:
 
 Nikolay Aleksandrov ra...@blackwall.org wrote:
 
 From: Nikolay Aleksandrov niko...@cumulusnetworks.com
 
 If a bonding device enslaves devices != arphrd_ether it'll change types
 and if later these devices are released, it can enslave an arphrd_ether
 device and switch back calling ether_setup() which resets dev-flags to
 IFF_BROADCAST|IFF_MULTICAST and clears IFF_MASTER which then could lead
 to many different bugs. This bug seems to have been there since the
 introduction of ether_setup() in bond_enslave().
 
   I thought the hack-around for the non-ethernet device problem
 was that, for non-ARPHRD_ETHER devices, the bonding master device is
 automatically destroyed when the last slave is released.
 
   This (well, this sort of thing) originally came up with IPoIB
 devices needing different ops that would disappear if the ipoib module
 was unloaded, so the bond master was deliberately unregistered when the
 last slave was released.
 
   Looking at the code, at first glance this appears to still be
 the case: bond_slave_netdev_event calls bond_release_and_destroy for !=
 ARPHRD_ETHER, which in turn should unregister the bond itself if there
 are no slaves left.  Is this no longer working as intended?
 
   -J
 
Restarting this thread because there’s actually a bug here, what you described 
with
the bonding destruction is true when the slaves are all destroyed but it isn’t 
true if they’re
just released, if you take a look at bond_slave_netdev_event() the bond 
destruction happens
only on NETDEV_UNREGISTER and I just hit this bug by enslaving a 
non-ARPHRD_ETHER
device, releasing it and enslaving a ARPHRD_ETHER device so ether_setup() path 
in bond_enslave
is hit and IFF_MASTER gets dropped:
17: bond0: BROADCAST,MULTICAST,MASTER,UP,LOWER_UP mtu 1500 qdisc noqueue 
state UP mode DEFAULT group default qlen 1000
link/fddi 9a:33:c5:30:ff:a6 brd ff:ff:ff:ff:ff:ff
(release non-ARPHRD_ETHER slave)
(enslave ARPHRD_ETHER device)
17: bond0: BROADCAST,MULTICAST,LOWER_UP mtu 1500 qdisc noqueue state UP mode 
DEFAULT group default qlen 1000
link/ether 08:00:27:3c:13:57 brd ff:ff:ff:ff:ff:ff

Notice the master flag is gone and of course on unload we get:
[57981.545547] [ cut here ]
[57981.545567] WARNING: CPU: 0 PID: 13792 at fs/proc/generic.c:575 
remove_proc_entry+0x17e/0x190()
[57981.545572] remove_proc_entry: removing non-empty directory 'net/bonding', 
leaking at least 'bond0'
[57981.545576] Modules linked in: bonding(-) bridge(OE) stp llc rpcsec_gss_krb5 
nfsv4 dns_resolver nfs fscache macvlan netconsole ppdev joydev serio_raw 
parport_pc parport i2c_piix4 video acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd 
grace sunrpc virtio_net pcnet32 mii virtio_pci virtio_ring virtio e1000 
ata_generic pata_acpi [last unloaded: bonding]
[57981.545614] CPU: 0 PID: 13792 Comm: rmmod Tainted: G   OE   
4.2.0-rc7+ #56
[57981.545618] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS 
VirtualBox 12/01/2006
[57981.545623]   b15c0564 88001ec43ca8 
8183f105
[57981.545629]   88001ec43d00 88001ec43ce8 
810a9496
[57981.545635]   88002ef6a800 88002ef6a838 
88002e24ed00
[57981.545641] Call Trace:
[57981.545649]  [8183f105] dump_stack+0x4c/0x65
[57981.545656]  [810a9496] warn_slowpath_common+0x86/0xc0
[57981.545662]  [810a9525] warn_slowpath_fmt+0x55/0x70
[57981.545669]  [812dbcde] remove_proc_entry+0x17e/0x190
[57981.545675]  [812e79d5] ? kernfs_remove_by_name_ns+0x55/0xa0
[57981.545687]  [a032522e] bond_destroy_proc_dir+0x2e/0x3e [bonding]
[57981.545694]  [a03126fe] bond_net_exit+0x13e/0x200 [bonding]
[57981.545700]  [a03125c5] ? bond_net_exit+0x5/0x200 [bonding]
[57981.545707]  [816bd7c8] ops_exit_list.isra.4+0x38/0x60
[57981.545713]  [816bdd08] unregister_pernet_operations+0x78/0xd0
[57981.545718]  [816bdd87] unregister_pernet_subsys+0x27/0x40
[57981.545726]  [a032546b] bonding_exit+0x26/0xbbb [bonding]
[57981.545732]  [8114a9f7] SyS_delete_module+0x1b7/0x210
[57981.545739]  [81003017] ? trace_hardirqs_on_thunk+0x17/0x19
[57981.545745]  [8184936e] entry_SYSCALL_64_fastpath+0x12/0x76
[57981.545749] ---[ end trace 46a4798bb28254d0 ]—

We need to convert it back to ARPHRD_ETHER if releasing the last slave, because
we can’t destroy it (in some paths bond-dev is used after bond_release()).
Basically we should make the case that if the bonding doesn’t have any slaves 
then it’s
always an ARPHRD_ETHER device.

Thoughts ?

 
 Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
 Fixes: e36b9d16c6a6 (bonding: clean muticast addresses when device changes 
 type)
 ---
 drivers/net/bonding/bond_main.c | 1 +
 1 file changed, 1 insertion(+)
 
 diff --git a/drivers/net/bonding/bond_main.c 

Re: [PATCH net] bonding: fix bond dev flags after convert to arphrd_ether

2015-08-27 Thread Jay Vosburgh
Nikolay Aleksandrov niko...@cumulusnetworks.com wrote:
[...]
Restarting this thread because there’s actually a bug here, what you described 
with
the bonding destruction is true when the slaves are all destroyed but it isn’t 
true if they’re
just released, if you take a look at bond_slave_netdev_event() the bond 
destruction happens
only on NETDEV_UNREGISTER and I just hit this bug by enslaving a 
non-ARPHRD_ETHER
device, releasing it and enslaving a ARPHRD_ETHER device so ether_setup() path 
in bond_enslave
is hit and IFF_MASTER gets dropped:
17: bond0: BROADCAST,MULTICAST,MASTER,UP,LOWER_UP mtu 1500 qdisc noqueue 
state UP mode DEFAULT group default qlen 1000
link/fddi 9a:33:c5:30:ff:a6 brd ff:ff:ff:ff:ff:ff
(release non-ARPHRD_ETHER slave)
(enslave ARPHRD_ETHER device)
17: bond0: BROADCAST,MULTICAST,LOWER_UP mtu 1500 qdisc noqueue state UP mode 
DEFAULT group default qlen 1000
link/ether 08:00:27:3c:13:57 brd ff:ff:ff:ff:ff:ff

Notice the master flag is gone and of course on unload we get:
[57981.545547] [ cut here ]
[57981.545567] WARNING: CPU: 0 PID: 13792 at fs/proc/generic.c:575 
remove_proc_entry+0x17e/0x190()
[57981.545572] remove_proc_entry: removing non-empty directory 'net/bonding', 
leaking at least 'bond0'
[...]
We need to convert it back to ARPHRD_ETHER if releasing the last slave, because
we can’t destroy it (in some paths bond-dev is used after bond_release()).
Basically we should make the case that if the bonding doesn’t have any slaves 
then it’s
always an ARPHRD_ETHER device.

Thoughts ?

I agree that it would be cleaner for bond_dev-type to switch
back on release of last slave.  The options code (caller of
bond_option_slaves_set) and bond_uninit() both reference the bond or dev
after calling bond_release(), and would need changing if any release
could destroy the bond itself.

However, for the type change, there's the potentially tricky
case of a nested non-ARPHRD_ETHER bond, e.g., bond0 - bond1 - ib0.
This isn't a typical use case that I'm aware of, but I believe it's
supported by the code.

If ib0, the last slave, is released, bond1 will want to change
to ARPHRD_ETHER, but bond0 is ARPHRD_INFINIBAND.  I suspect bonding will
have to notice the NETDEV_PRE_TYPE_CHANGE and _POST_ notifiers and take
appropriate action (i.e., cascade the type change upwards).

There might be similar issues with other devices stacked on top
of the IB - Ether type-changing bond; I'm not sure how many of those
there may be, though, since many things won't stack over IB devices (or
an IB-flavor bond).

If the type change works, then I don't think we would still need
the release and destroy logic.

-J

---
-Jay Vosburgh, jay.vosbu...@canonical.com
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net] bonding: fix bond dev flags after convert to arphrd_ether

2015-07-20 Thread David Miller
From: Nikolay Aleksandrov niko...@cumulusnetworks.com
Date: Wed, 15 Jul 2015 22:55:55 +0200

 Dave please ignore this patch, I'll send a better fix.

OK
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net] bonding: fix bond dev flags after convert to arphrd_ether

2015-07-15 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov niko...@cumulusnetworks.com

If a bonding device enslaves devices != arphrd_ether it'll change types
and if later these devices are released, it can enslave an arphrd_ether
device and switch back calling ether_setup() which resets dev-flags to
IFF_BROADCAST|IFF_MULTICAST and clears IFF_MASTER which then could lead
to many different bugs. This bug seems to have been there since the
introduction of ether_setup() in bond_enslave().

Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
Fixes: e36b9d16c6a6 (bonding: clean muticast addresses when device changes 
type)
---
 drivers/net/bonding/bond_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 317a49480475..8ba119896e55 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1368,6 +1368,7 @@ int bond_enslave(struct net_device *bond_dev, struct 
net_device *slave_dev)
bond_setup_by_slave(bond_dev, slave_dev);
else {
ether_setup(bond_dev);
+   bond_dev-flags |= IFF_MASTER;
bond_dev-priv_flags = ~IFF_TX_SKB_SHARING;
}
 
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net] bonding: fix bond dev flags after convert to arphrd_ether

2015-07-15 Thread Nikolay Aleksandrov
On 07/15/2015 10:32 PM, Jay Vosburgh wrote:
 Nikolay Aleksandrov ra...@blackwall.org wrote:
 
 From: Nikolay Aleksandrov niko...@cumulusnetworks.com

 If a bonding device enslaves devices != arphrd_ether it'll change types
 and if later these devices are released, it can enslave an arphrd_ether
 device and switch back calling ether_setup() which resets dev-flags to
 IFF_BROADCAST|IFF_MULTICAST and clears IFF_MASTER which then could lead
 to many different bugs. This bug seems to have been there since the
 introduction of ether_setup() in bond_enslave().
 
   I thought the hack-around for the non-ethernet device problem
 was that, for non-ARPHRD_ETHER devices, the bonding master device is
 automatically destroyed when the last slave is released.
 
   This (well, this sort of thing) originally came up with IPoIB
 devices needing different ops that would disappear if the ipoib module
 was unloaded, so the bond master was deliberately unregistered when the
 last slave was released.
 
   Looking at the code, at first glance this appears to still be
 the case: bond_slave_netdev_event calls bond_release_and_destroy for !=
 ARPHRD_ETHER, which in turn should unregister the bond itself if there
 are no slaves left.  Is this no longer working as intended?
 
   -J
 
 
 Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
 Fixes: e36b9d16c6a6 (bonding: clean muticast addresses when device changes 
 type)
 ---
 drivers/net/bonding/bond_main.c | 1 +
 1 file changed, 1 insertion(+)

 diff --git a/drivers/net/bonding/bond_main.c 
 b/drivers/net/bonding/bond_main.c
 index 317a49480475..8ba119896e55 100644
 --- a/drivers/net/bonding/bond_main.c
 +++ b/drivers/net/bonding/bond_main.c
 @@ -1368,6 +1368,7 @@ int bond_enslave(struct net_device *bond_dev, struct 
 net_device *slave_dev)
  bond_setup_by_slave(bond_dev, slave_dev);
  else {
  ether_setup(bond_dev);
 +bond_dev-flags |= IFF_MASTER;
  bond_dev-priv_flags = ~IFF_TX_SKB_SHARING;
  }

 -- 
 1.9.3

 
 ---
   -Jay Vosburgh, jay.vosbu...@canonical.com
 

Ah yes, the bug is actually that the bond doesn't switch back its type
on failure after the bond_setup_by_slave().
I actually had prepared that fix next. :-)
But you're right, this doesn't need to be fixed if the enslave failure
path is updated to switch back the type on fail.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net] bonding: fix bond dev flags after convert to arphrd_ether

2015-07-15 Thread Jay Vosburgh
Nikolay Aleksandrov ra...@blackwall.org wrote:

From: Nikolay Aleksandrov niko...@cumulusnetworks.com

If a bonding device enslaves devices != arphrd_ether it'll change types
and if later these devices are released, it can enslave an arphrd_ether
device and switch back calling ether_setup() which resets dev-flags to
IFF_BROADCAST|IFF_MULTICAST and clears IFF_MASTER which then could lead
to many different bugs. This bug seems to have been there since the
introduction of ether_setup() in bond_enslave().

I thought the hack-around for the non-ethernet device problem
was that, for non-ARPHRD_ETHER devices, the bonding master device is
automatically destroyed when the last slave is released.

This (well, this sort of thing) originally came up with IPoIB
devices needing different ops that would disappear if the ipoib module
was unloaded, so the bond master was deliberately unregistered when the
last slave was released.

Looking at the code, at first glance this appears to still be
the case: bond_slave_netdev_event calls bond_release_and_destroy for !=
ARPHRD_ETHER, which in turn should unregister the bond itself if there
are no slaves left.  Is this no longer working as intended?

-J


Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
Fixes: e36b9d16c6a6 (bonding: clean muticast addresses when device changes 
type)
---
 drivers/net/bonding/bond_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 317a49480475..8ba119896e55 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1368,6 +1368,7 @@ int bond_enslave(struct net_device *bond_dev, struct 
net_device *slave_dev)
   bond_setup_by_slave(bond_dev, slave_dev);
   else {
   ether_setup(bond_dev);
+  bond_dev-flags |= IFF_MASTER;
   bond_dev-priv_flags = ~IFF_TX_SKB_SHARING;
   }
 
-- 
1.9.3


---
-Jay Vosburgh, jay.vosbu...@canonical.com
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net] bonding: fix bond dev flags after convert to arphrd_ether

2015-07-15 Thread Nikolay Aleksandrov
On 07/15/2015 10:09 PM, Nikolay Aleksandrov wrote:
 From: Nikolay Aleksandrov niko...@cumulusnetworks.com
 
 If a bonding device enslaves devices != arphrd_ether it'll change types
 and if later these devices are released, it can enslave an arphrd_ether
 device and switch back calling ether_setup() which resets dev-flags to
 IFF_BROADCAST|IFF_MULTICAST and clears IFF_MASTER which then could lead
 to many different bugs. This bug seems to have been there since the
 introduction of ether_setup() in bond_enslave().
 
 Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
 Fixes: e36b9d16c6a6 (bonding: clean muticast addresses when device changes 
 type)
 ---
  drivers/net/bonding/bond_main.c | 1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
 index 317a49480475..8ba119896e55 100644
 --- a/drivers/net/bonding/bond_main.c
 +++ b/drivers/net/bonding/bond_main.c
 @@ -1368,6 +1368,7 @@ int bond_enslave(struct net_device *bond_dev, struct 
 net_device *slave_dev)
   bond_setup_by_slave(bond_dev, slave_dev);
   else {
   ether_setup(bond_dev);
 + bond_dev-flags |= IFF_MASTER;
   bond_dev-priv_flags = ~IFF_TX_SKB_SHARING;
   }
  
 

Dave please ignore this patch, I'll send a better fix.

Thanks,
 Nik
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html