from:"Jarod Wilson"

Re: [PATCH net] wireless/nl80211: fix wdev_id may be used uninitialized

2021-03-15 Thread Jarod Wilson

On Fri, Mar 12, 2021 at 4:04 PM Kalle Valo  wrote:
>
> Jarod Wilson  writes:
>
> > Build currently fails with -Werror=maybe-uninitialized set:
> >
> > net/wireless/nl80211.c: In function '__cfg80211_wdev_from_attrs':
> > net/wireless/nl80211.c:124:44: error: 'wdev_id' may be used
> > uninitialized in this function [-Werror=maybe-uninitialized]
>
> Really, build fails? Is -Werror enabled by default now? I hope not.

Don't think so. But we (Red Hat) build all our kernels with a fair
amount of extra error-checking enabled.

-- 
Jarod Wilson
ja...@redhat.com

[PATCH net] wireless/nl80211: fix wdev_id may be used uninitialized

2021-03-12 Thread Jarod Wilson

Build currently fails with -Werror=maybe-uninitialized set:

net/wireless/nl80211.c: In function '__cfg80211_wdev_from_attrs':
net/wireless/nl80211.c:124:44: error: 'wdev_id' may be used
uninitialized in this function [-Werror=maybe-uninitialized]

The easy fix is to just initialize wdev_id to 0, since its value is only
ever used when have_wdev_id is true.
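For illustration, here is a minimal userspace sketch of the pattern GCC flags. It uses a simplified stand-in for the parsed attributes; struct attrs and lookup_wdev() are hypothetical names, not nl80211 code. The value is written and read only under the same flag. -Wmaybe-uninitialized can't always prove that correlation, so initializing the variable to 0 quiets the warning without changing behavior.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the parsed netlink attributes. */
struct attrs {
	bool has_wdev;      /* was a wdev attribute supplied? */
	uint64_t wdev_attr; /* its value, meaningful only if has_wdev */
};

/* Returns the wdev id if one was supplied, otherwise fallback. */
uint64_t lookup_wdev(const struct attrs *a, uint64_t fallback)
{
	bool have_wdev_id = a->has_wdev;
	/* Initialized purely to quiet -Wmaybe-uninitialized: the value
	 * is never read unless have_wdev_id is true. */
	uint64_t wdev_id = 0;

	if (have_wdev_id)
		wdev_id = a->wdev_attr;

	/* (other code may sit between the assignment and the use) */

	if (have_wdev_id)
		return wdev_id;
	return fallback;
}
```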

Fixes: a05829a7222e ("cfg80211: avoid holding the RTNL when calling the driver")
CC: Johannes Berg 
CC: "David S. Miller" 
CC: Jakub Kicinski 
CC: linux-wirel...@vger.kernel.org
CC: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 net/wireless/nl80211.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 521d36bb0803..a157783760c7 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -70,7 +70,7 @@ __cfg80211_wdev_from_attrs(struct cfg80211_registered_device *rdev,
struct wireless_dev *result = NULL;
bool have_ifidx = attrs[NL80211_ATTR_IFINDEX];
bool have_wdev_id = attrs[NL80211_ATTR_WDEV];
-   u64 wdev_id;
+   u64 wdev_id = 0;
int wiphy_idx = -1;
int ifidx = -1;
 
-- 
2.29.2

[PATCH net-next v4] bonding: add a vlan+srcmac tx hashing option

2021-01-18 Thread Jarod Wilson

This comes from an end-user request, where they're running multiple VMs on
hosts with bonded interfaces connected to some interesting switch topologies,
where 802.3ad isn't an option. They're currently running a proprietary
solution that effectively achieves load-balancing of VMs and bandwidth
utilization improvements with a similar form of transmission algorithm.

Basically, each VM has its own vlan, so it always sends its traffic out
the same interface, unless that interface fails. Traffic gets split
between the interfaces, maintaining a consistent path, with failover still
available if an interface goes down.
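Here is a rough sketch of how per-vlan hashing gives a stable path with failover. It assumes the bond picks a link by reducing the hash modulo the number of usable links, which is my understanding of balance-xor mode. pick_link() and MAX_LINKS are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LINKS 8

/* Hypothetical helper: choose an outgoing link for a given hash.
 * Only links that are up are eligible; returns -1 if none are up. */
int pick_link(uint32_t hash, const bool up[], size_t n_links)
{
	size_t usable[MAX_LINKS];
	size_t count = 0;

	/* Collect the indices of links that are currently up. */
	for (size_t i = 0; i < n_links && i < MAX_LINKS; i++)
		if (up[i])
			usable[count++] = i;

	if (count == 0)
		return -1;
	/* The same hash always lands on the same usable link. */
	return (int)usable[hash % count];
}
```

With two links, hash 10 maps to link 0 and hash 11 maps to link 1. If link 0 goes down, both hashes map to link 1. When link 0 comes back, they return to their original links. With more than two links, a failure can also move flows that were on surviving links, because the modulus changes.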

Unlike bond_eth_hash(), this hash function uses the full source MAC
address instead of just the last byte, as there are so few components to
the hash, and in the no-vlan case, we would be returning just the last
byte of the source MAC as the hash value. It's entirely possible to have
two NICs in a bond with the same last byte of their MAC, but not the same
MAC, so this adjustment should guarantee distinct hashes in all cases.
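To make the tradeoff concrete, here is a userspace model of both variants. It assumes the source MAC arrives as a plain 6-byte array and the vlan tag as a separate has_vlan/vlan pair, instead of being read from an sk_buff. hash_last_byte() mirrors the v2 approach and hash_full_mac() mirrors the v3/v4 one. The no-vlan branch of the latter (vendor ^ dev) is an assumption, since the patch text in this archive is cut off before that point.

```c
#include <stdbool.h>
#include <stdint.h>

/* v2-style: only the last byte of the source MAC participates. */
uint32_t hash_last_byte(const uint8_t mac[6], bool has_vlan, uint16_t vlan)
{
	uint32_t srcmac = mac[5];

	return has_vlan ? (srcmac ^ vlan) : srcmac;
}

/* v3/v4-style: split the 48-bit MAC into two 24-bit halves. */
uint32_t hash_full_mac(const uint8_t mac[6], bool has_vlan, uint16_t vlan)
{
	uint32_t vendor = 0, dev = 0;
	int i;

	for (i = 0; i < 3; i++)
		vendor = (vendor << 8) | mac[i]; /* OUI: bytes 0..2 */
	for (i = 3; i < 6; i++)
		dev = (dev << 8) | mac[i];       /* device part: bytes 3..5 */

	return has_vlan ? (vlan ^ vendor ^ dev) : (vendor ^ dev);
}
```

With no vlan tag, 00:11:22:33:44:01 and 00:11:22:55:66:01 collide under the last-byte hash but not under the full-MAC one. The vendor half is XORed with the device half, so MACs that share an OUI always hash differently. MACs from different vendors can still collide in principle.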

This has been rudimentarily tested to provide similar results to the
proprietary solution it is aiming to replace. A patch for iproute2 is also
posted, to properly support the new mode there as well.

Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
v2: verified netlink interfaces working, added Documentation, changed
tx hash mode name to vlan+mac for consistency and clarity.
v3: drop inline from hash function, use full source MAC, not just the
last byte, expand explanation in patch description, extend hash name to
vlan+srcmac.
v4: fix documentation issues pointed out by David Ahern

 Documentation/networking/bonding.rst | 13 +++
 drivers/net/bonding/bond_main.c  | 34 ++--
 drivers/net/bonding/bond_options.c   | 13 ++-
 include/linux/netdevice.h|  1 +
 include/uapi/linux/if_bonding.h  |  1 +
 5 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
index adc314639085..36562dcd3e1e 100644
--- a/Documentation/networking/bonding.rst
+++ b/Documentation/networking/bonding.rst
@@ -951,6 +951,19 @@ xmit_hash_policy
packets will be distributed according to the encapsulated
flows.
 
+   vlan+srcmac
+
+   This policy uses a very rudimentary vlan ID and source mac
+   hash to load-balance traffic per-vlan, with failover
+   should one leg fail. The intended use case is for a bond
+   shared by multiple virtual machines, all configured to
+   use their own vlan, to give lacp-like functionality
+   without requiring lacp-capable switching hardware.
+
+   The formula for the hash is simply
+
+   hash = (vlan ID) XOR (source MAC vendor) XOR (source MAC dev)
+
The default value is layer2.  This option was added in bonding
version 2.6.3.  In earlier versions of bonding, this parameter
does not exist, and the layer2 policy is the only policy.  The
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 5fe5232cc3f3..d4bc4d4e953b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -164,7 +164,7 @@ module_param(xmit_hash_policy, charp, 0);
 MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3ad hashing method; "
   "0 for layer 2 (default), 1 for layer 3+4, "
   "2 for layer 2+3, 3 for encap layer 2+3, "
-  "4 for encap layer 3+4");
+  "4 for encap layer 3+4, 5 for vlan+srcmac");
 module_param(arp_interval, int, 0);
 MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
 module_param_array(arp_ip_target, charp, NULL, 0);
@@ -1434,6 +1434,8 @@ static enum netdev_lag_hash bond_lag_hash_type(struct bonding *bond,
return NETDEV_LAG_HASH_E23;
case BOND_XMIT_POLICY_ENCAP34:
return NETDEV_LAG_HASH_E34;
+   case BOND_XMIT_POLICY_VLAN_SRCMAC:
+   return NETDEV_LAG_HASH_VLAN_SRCMAC;
default:
return NETDEV_LAG_HASH_UNKNOWN;
}
@@ -3494,6 +3496,27 @@ static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
return true;
 }
 
+static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
+{
+   struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
+   u32 srcmac_vendor = 0, srcmac_dev = 0;
+   u16 vlan;
+   int i;
+
+   for (i = 0; i < 3; i++)
+   srcmac_vendor = (srcmac_vendor << 8) | mac_hdr->h_source[i];
+

Re: [PATCH net-next v3] bonding: add a vlan+srcmac tx hashing option

2021-01-18 Thread Jarod Wilson

On Mon, Jan 18, 2021 at 04:10:38PM -0700, David Ahern wrote:
> On 1/15/21 12:21 PM, Jarod Wilson wrote:
> > diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
> > index adc314639085..36562dcd3e1e 100644
> > --- a/Documentation/networking/bonding.rst
> > +++ b/Documentation/networking/bonding.rst
> > @@ -951,6 +951,19 @@ xmit_hash_policy
> > packets will be distributed according to the encapsulated
> > flows.
> >  
> > +   vlan+srcmac
> > +
> > +   This policy uses a very rudimentary vland ID and source mac
> 
> s/vland/vlan/
> 
> > +   ID hash to load-balance traffic per-vlan, with failover
> 
> drop ID on this line; just 'source mac'.

Bah. Crap. Didn't test documentation, clearly. Or proof-read it. Will fix
in v4. Hopefully, nothing else to change though...

-- 
Jarod Wilson
ja...@redhat.com

[PATCH net-next v3] bonding: add a vlan+srcmac tx hashing option

2021-01-15 Thread Jarod Wilson

This comes from an end-user request, where they're running multiple VMs on
hosts with bonded interfaces connected to some interesting switch topologies,
where 802.3ad isn't an option. They're currently running a proprietary
solution that effectively achieves load-balancing of VMs and bandwidth
utilization improvements with a similar form of transmission algorithm.

Basically, each VM has its own vlan, so it always sends its traffic out
the same interface, unless that interface fails. Traffic gets split
between the interfaces, maintaining a consistent path, with failover still
available if an interface goes down.

Unlike bond_eth_hash(), this hash function uses the full source MAC
address instead of just the last byte, as there are so few components to
the hash, and in the no-vlan case, we would be returning just the last
byte of the source MAC as the hash value. It's entirely possible to have
two NICs in a bond with the same last byte of their MAC, but not the same
MAC, so this adjustment should guarantee distinct hashes in all cases.

This has been rudimentarily tested to provide similar results to the
proprietary solution it is aiming to replace. A patch for iproute2 is also
posted, to properly support the new mode there as well.

Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
v2: verified netlink interfaces working, added Documentation, changed
tx hash mode name to vlan+mac for consistency and clarity.
v3: drop inline from hash function, use full source MAC, not just the
last byte, expand explanation in patch description, extend hash name to
vlan+srcmac.

 Documentation/networking/bonding.rst | 13 +++
 drivers/net/bonding/bond_main.c  | 34 ++--
 drivers/net/bonding/bond_options.c   | 13 ++-
 include/linux/netdevice.h|  1 +
 include/uapi/linux/if_bonding.h  |  1 +
 5 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
index adc314639085..36562dcd3e1e 100644
--- a/Documentation/networking/bonding.rst
+++ b/Documentation/networking/bonding.rst
@@ -951,6 +951,19 @@ xmit_hash_policy
packets will be distributed according to the encapsulated
flows.
 
+   vlan+srcmac
+
+   This policy uses a very rudimentary vland ID and source mac
+   ID hash to load-balance traffic per-vlan, with failover
+   should one leg fail. The intended use case is for a bond
+   shared by multiple virtual machines, all configured to
+   use their own vlan, to give lacp-like functionality
+   without requiring lacp-capable switching hardware.
+
+   The formula for the hash is simply
+
+   hash = (vlan ID) XOR (source MAC vendor) XOR (source MAC dev)
+
The default value is layer2.  This option was added in bonding
version 2.6.3.  In earlier versions of bonding, this parameter
does not exist, and the layer2 policy is the only policy.  The
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 5fe5232cc3f3..d4bc4d4e953b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -164,7 +164,7 @@ module_param(xmit_hash_policy, charp, 0);
 MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3ad hashing method; "
   "0 for layer 2 (default), 1 for layer 3+4, "
   "2 for layer 2+3, 3 for encap layer 2+3, "
-  "4 for encap layer 3+4");
+  "4 for encap layer 3+4, 5 for vlan+srcmac");
 module_param(arp_interval, int, 0);
 MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
 module_param_array(arp_ip_target, charp, NULL, 0);
@@ -1434,6 +1434,8 @@ static enum netdev_lag_hash bond_lag_hash_type(struct bonding *bond,
return NETDEV_LAG_HASH_E23;
case BOND_XMIT_POLICY_ENCAP34:
return NETDEV_LAG_HASH_E34;
+   case BOND_XMIT_POLICY_VLAN_SRCMAC:
+   return NETDEV_LAG_HASH_VLAN_SRCMAC;
default:
return NETDEV_LAG_HASH_UNKNOWN;
}
@@ -3494,6 +3496,27 @@ static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
return true;
 }
 
+static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
+{
+   struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
+   u32 srcmac_vendor = 0, srcmac_dev = 0;
+   u16 vlan;
+   int i;
+
+   for (i = 0; i < 3; i++)
+   srcmac_vendor = (srcmac_vendor << 8) | mac_hdr->h_source[i];
+
+   for (i = 3; i < ETH_ALEN; i++)

Re: [PATCH net-next v2] bonding: add a vlan+mac tx hashing option

2021-01-15 Thread Jarod Wilson

On Thu, Jan 14, 2021 at 01:54:31PM -0800, Jay Vosburgh wrote:
> Jarod Wilson  wrote:
> 
> >On Wed, Jan 13, 2021 at 05:58:18PM -0800, Jakub Kicinski wrote:
> >> On Wed, 13 Jan 2021 17:35:48 -0500 Jarod Wilson wrote:
> >> > This comes from an end-user request, where they're running multiple VMs on
> >> > hosts with bonded interfaces connected to some interesting switch topologies,
> >> > where 802.3ad isn't an option. They're currently running a proprietary
> >> > solution that effectively achieves load-balancing of VMs and bandwidth
> >> > utilization improvements with a similar form of transmission algorithm.
> >> > 
> >> > Basically, each VM has its own vlan, so it always sends its traffic out
> >> > the same interface, unless that interface fails. Traffic gets split
> >> > between the interfaces, maintaining a consistent path, with failover still
> >> > available if an interface goes down.
> >> > 
> >> > This has been rudimentarily tested to provide similar results, suitable for
> >> > them to use to move off their current proprietary solution. A patch for
> >> > iproute2 is forthcoming as well, to properly support the new mode there as
> >> > well.
> >> 
> >> > Signed-off-by: Jarod Wilson 
> >> > ---
> >> > v2: verified netlink interfaces working, added Documentation, changed
> >> > tx hash mode name to vlan+mac for consistency and clarity.
> >> > 
> >> >  Documentation/networking/bonding.rst | 13 +
> >> >  drivers/net/bonding/bond_main.c  | 27 +--
> >> >  drivers/net/bonding/bond_options.c   |  1 +
> >> >  include/linux/netdevice.h|  1 +
> >> >  include/uapi/linux/if_bonding.h  |  1 +
> >> >  5 files changed, 41 insertions(+), 2 deletions(-)
> >> > 
> >> > diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
> >> > index adc314639085..c78ceb7630a0 100644
> >> > --- a/Documentation/networking/bonding.rst
> >> > +++ b/Documentation/networking/bonding.rst
> >> > @@ -951,6 +951,19 @@ xmit_hash_policy
> >> >  packets will be distributed according to the encapsulated
> >> >  flows.
> >> >  
> >> > +vlan+mac
> 
>   I notice that the code calls it "VLAN_SRCMAC" but the
> user-facing nomenclature is "vlan+mac"; I tend to lean towards having
> the user visible name also be "vlan+srcmac".  Both for consistency, and
> just in case someone someday wants "vlan+dstmac".  And you did ask for
> preference on this in a separate email.

That's valid. I was trying to keep it short, but it does muddy the waters
a bit by not including src. I'll adjust accordingly and resend the
userspace bit too.

...
>   Yah, the existing L2 hash is pretty weak.  It might be possible
> to squeeze this into the existing bond_xmit_hash a bit better, if the
> hash is two u32s.  The first being the first 32 bits of the MAC, and the
> second being the last 16 bits of the MAC combined with the 16 bit VLAN
> tag.
> 
>   There's already logic at the end of bond_xmit_hash to reduce a
> u32 into the final hash that perhaps could be leveraged.  
> 
>   Thinking about it, though, all the ways to combine that data
>   together end up being pretty vile ("*(u32 *)&ethhdr->h_source[0]" sorts
> of things).

Yeah, I'd worry that bond_xmit_hash() is already getting a bit complicated
to follow and understand, and that would make it even more so.
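For comparison, Jay's two-word idea can be written without a "*(u32 *)" cast by assembling each word byte by byte. This also avoids alignment and endianness concerns. This is a hypothetical sketch: pack_two_words() and fold32() are made-up names. The shift-XOR fold at the end is only illustrative and is not copied from bond_xmit_hash().

```c
#include <stdint.h>

/* Hypothetical: split MAC + vlan into two 32-bit words, no pointer casts. */
void pack_two_words(const uint8_t mac[6], uint16_t vlan,
		    uint32_t *w0, uint32_t *w1)
{
	/* First 32 bits of the MAC. */
	*w0 = ((uint32_t)mac[0] << 24) | ((uint32_t)mac[1] << 16) |
	      ((uint32_t)mac[2] << 8) | mac[3];
	/* Last 16 bits of the MAC in the top half, the 16-bit vlan tag below. */
	*w1 = ((uint32_t)mac[4] << 24) | ((uint32_t)mac[5] << 16) | vlan;
}

/* Illustrative reduction of the two words into a single hash value. */
uint32_t fold32(uint32_t w0, uint32_t w1)
{
	uint32_t h = w0 ^ w1;

	h ^= h >> 16;
	h ^= h >> 8;
	return h;
}
```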

> >Something like this instead maybe:
> >
> >static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
> >{
> >	struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
> >	u32 srcmac = 0;
> >	u16 vlan;
> >	int i;
> >
> >	for (i = 0; i < ETH_ALEN; i++)
> >		srcmac = (srcmac << 8) | mac_hdr->h_source[i];
> 
>   I think this will shift h_source[0] and [1] into oblivion.

Argh, yep, 48 bits don't fit into a u32. Okay, so I'll replace that with a
u32 srcmac_vendor and u32 srcmac_dev, but they'll only have 24 bits of data
in them, then return vlan ^ srcmac_vendor ^ srcmac_dev, I think.
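Here is a quick userspace check of Jay's point, using naive_fold() and split_fold() as hypothetical names. Accumulating all six bytes into a u32 keeps only the last four, so MACs that differ only in bytes 0 and 1 collide. Splitting into two 24-bit halves keeps every byte.

```c
#include <stdint.h>

/* Buggy: accumulating 48 bits into a u32 shifts mac[0] and mac[1] out. */
uint32_t naive_fold(const uint8_t mac[6])
{
	uint32_t srcmac = 0;

	for (int i = 0; i < 6; i++)
		srcmac = (srcmac << 8) | mac[i]; /* upper bytes fall off the top */
	return srcmac;
}

/* Fixed: vendor (bytes 0..2) and device (bytes 3..5) halves, XORed. */
uint32_t split_fold(const uint8_t mac[6])
{
	uint32_t vendor = ((uint32_t)mac[0] << 16) | ((uint32_t)mac[1] << 8) | mac[2];
	uint32_t dev = ((uint32_t)mac[3] << 16) | ((uint32_t)mac[4] << 8) | mac[5];

	return vendor ^ dev;
}
```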

-- 
Jarod Wilson
ja...@redhat.com

Re: [PATCH net-next v2] bonding: add a vlan+mac tx hashing option

2021-01-14 Thread Jarod Wilson

On Thu, Jan 14, 2021 at 01:23:14PM -0800, Jakub Kicinski wrote:
> On Thu, 14 Jan 2021 16:11:41 -0500 Jarod Wilson wrote:
> > In truth, this code started out as a copy of bond_eth_hash(), which also
> > only uses the last byte, though of both source and destination macs. In
> > the typical use case for the requesting user, the bond is formed from two
> > onboard NICs, which typically have adjacent mac addresses, e.g.,
> > AA:BB:CC:DD:EE:01 and AA:BB:CC:DD:EE:02, so only the last byte is really
> > relevant to hash differently, but in thinking about it, a replacement NIC
> > installed because an onboard one died could have the same last byte, and
> > maybe we ought to just go full source mac right from the get-go here.
> > 
> > Something like this instead maybe:
> > 
> > static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
> > {
> > 	struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
> > 	u32 srcmac = 0;
> > 	u16 vlan;
> > 	int i;
> > 
> > 	for (i = 0; i < ETH_ALEN; i++)
> > 		srcmac = (srcmac << 8) | mac_hdr->h_source[i];
> > 
> > 	if (!skb_vlan_tag_present(skb))
> > 		return srcmac;
> > 
> > 	vlan = skb_vlan_tag_get(skb);
> > 
> > 	return vlan ^ srcmac;
> > }
> > 
> > Then the documentation is spot-on, and we're future-proof, though
> > marginally less performant in calculating the hash, which may have been a
> > consideration when the original function was written, but is probably
> > basically irrelevant w/modern systems...
> 
> No preference, especially if bond_eth_hash() already uses the last byte.
> Just make sure the choice is explained in the commit message.

I've sold myself on using the full MAC: if there's no vlan tag present,
the MAC is the only input to the hash, so using just its last byte raises
the chances of two different interfaces getting the same hash, which won't
happen if we've got the full MAC. Of course, I'm not sure why someone would
be using this xmit hash outside of the very particular use-case that
includes VLANs, but people do strange things...

-- 
Jarod Wilson
ja...@redhat.com

Re: [PATCH net-next v2] bonding: add a vlan+mac tx hashing option

2021-01-14 Thread Jarod Wilson

On Wed, Jan 13, 2021 at 05:58:18PM -0800, Jakub Kicinski wrote:
> On Wed, 13 Jan 2021 17:35:48 -0500 Jarod Wilson wrote:
> > This comes from an end-user request, where they're running multiple VMs on
> > hosts with bonded interfaces connected to some interesting switch topologies,
> > where 802.3ad isn't an option. They're currently running a proprietary
> > solution that effectively achieves load-balancing of VMs and bandwidth
> > utilization improvements with a similar form of transmission algorithm.
> > 
> > Basically, each VM has its own vlan, so it always sends its traffic out
> > the same interface, unless that interface fails. Traffic gets split
> > between the interfaces, maintaining a consistent path, with failover still
> > available if an interface goes down.
> > 
> > This has been rudimentarily tested to provide similar results, suitable for
> > them to use to move off their current proprietary solution. A patch for
> > iproute2 is forthcoming as well, to properly support the new mode there as
> > well.
> 
> > Signed-off-by: Jarod Wilson 
> > ---
> > v2: verified netlink interfaces working, added Documentation, changed
> > tx hash mode name to vlan+mac for consistency and clarity.
> > 
> >  Documentation/networking/bonding.rst | 13 +
> >  drivers/net/bonding/bond_main.c  | 27 +--
> >  drivers/net/bonding/bond_options.c   |  1 +
> >  include/linux/netdevice.h|  1 +
> >  include/uapi/linux/if_bonding.h  |  1 +
> >  5 files changed, 41 insertions(+), 2 deletions(-)
> > 
> > diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
> > index adc314639085..c78ceb7630a0 100644
> > --- a/Documentation/networking/bonding.rst
> > +++ b/Documentation/networking/bonding.rst
> > @@ -951,6 +951,19 @@ xmit_hash_policy
> > packets will be distributed according to the encapsulated
> > flows.
> >  
> > +   vlan+mac
> > +
> > +   This policy uses a very rudimentary vland ID and source mac
> > +   ID hash to load-balance traffic per-vlan, with failover
> > +   should one leg fail. The intended use case is for a bond
> > +   shared by multiple virtual machines, all configured to
> > +   use their own vlan, to give lacp-like functionality
> > +   without requiring lacp-capable switching hardware.
> > +
> > +   The formula for the hash is simply
> > +
> > +   hash = (vlan ID) XOR (source MAC)
> 
> But in the code it's only using one byte of the MAC, currently.
> 
> I think that's fine for the particular use case but should we call out
> explicitly in the commit message why it's considered sufficient?
> 
> Someone can change it later, if needed, but best if we spell out the
> current motivation.

In truth, this code started out as a copy of bond_eth_hash(), which also
only uses the last byte, though of both source and destination macs. In
the typical use case for the requesting user, the bond is formed from two
onboard NICs, which typically have adjacent mac addresses, e.g.,
AA:BB:CC:DD:EE:01 and AA:BB:CC:DD:EE:02, so only the last byte is really
relevant to hash differently, but in thinking about it, a replacement NIC
installed because an onboard one died could have the same last byte, and
maybe we ought to just go full source mac right from the get-go here.

Something like this instead maybe:

static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
{
	struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
	u32 srcmac = 0;
	u16 vlan;
	int i;

	for (i = 0; i < ETH_ALEN; i++)
		srcmac = (srcmac << 8) | mac_hdr->h_source[i];

	if (!skb_vlan_tag_present(skb))
		return srcmac;

	vlan = skb_vlan_tag_get(skb);

	return vlan ^ srcmac;
}

Then the documentation is spot-on, and we're future-proof, though
marginally less performant in calculating the hash, which may have been a
consideration when the original function was written, but is probably
basically irrelevant w/modern systems...

> > The default value is layer2.  This option was added in bonding
> > version 2.6.3.  In earlier versions of bonding, this parameter
> > does not exist, and the layer2 policy is the only policy.  The
> 
> > +static inline u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
> 
> Can we drop the inline? It's a static function called once.

Works for me. That was also inherited by copying bond_eth_hash(). :)

> > +{
> > +   struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_hea

[PATCH net-next v2] bonding: add a vlan+mac tx hashing option

2021-01-13 Thread Jarod Wilson

This comes from an end-user request, where they're running multiple VMs on
hosts with bonded interfaces connected to some interesting switch topologies,
where 802.3ad isn't an option. They're currently running a proprietary
solution that effectively achieves load-balancing of VMs and bandwidth
utilization improvements with a similar form of transmission algorithm.

Basically, each VM has its own vlan, so it always sends its traffic out
the same interface, unless that interface fails. Traffic gets split
between the interfaces, maintaining a consistent path, with failover still
available if an interface goes down.

This has been rudimentarily tested to provide similar results, suitable for
them to use to move off their current proprietary solution. A patch for
iproute2 is forthcoming as well, to properly support the new mode there as
well.

Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
v2: verified netlink interfaces working, added Documentation, changed
tx hash mode name to vlan+mac for consistency and clarity.

 Documentation/networking/bonding.rst | 13 +
 drivers/net/bonding/bond_main.c  | 27 +--
 drivers/net/bonding/bond_options.c   |  1 +
 include/linux/netdevice.h|  1 +
 include/uapi/linux/if_bonding.h  |  1 +
 5 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
index adc314639085..c78ceb7630a0 100644
--- a/Documentation/networking/bonding.rst
+++ b/Documentation/networking/bonding.rst
@@ -951,6 +951,19 @@ xmit_hash_policy
packets will be distributed according to the encapsulated
flows.
 
+   vlan+mac
+
+   This policy uses a very rudimentary vland ID and source mac
+   ID hash to load-balance traffic per-vlan, with failover
+   should one leg fail. The intended use case is for a bond
+   shared by multiple virtual machines, all configured to
+   use their own vlan, to give lacp-like functionality
+   without requiring lacp-capable switching hardware.
+
+   The formula for the hash is simply
+
+   hash = (vlan ID) XOR (source MAC)
+
The default value is layer2.  This option was added in bonding
version 2.6.3.  In earlier versions of bonding, this parameter
does not exist, and the layer2 policy is the only policy.  The
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 5fe5232cc3f3..766c09a553c1 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -164,7 +164,7 @@ module_param(xmit_hash_policy, charp, 0);
 MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3ad hashing method; "
   "0 for layer 2 (default), 1 for layer 3+4, "
   "2 for layer 2+3, 3 for encap layer 2+3, "
-  "4 for encap layer 3+4");
+  "4 for encap layer 3+4, 5 for vlan+mac");
 module_param(arp_interval, int, 0);
 MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
 module_param_array(arp_ip_target, charp, NULL, 0);
@@ -1434,6 +1434,8 @@ static enum netdev_lag_hash bond_lag_hash_type(struct bonding *bond,
return NETDEV_LAG_HASH_E23;
case BOND_XMIT_POLICY_ENCAP34:
return NETDEV_LAG_HASH_E34;
+   case BOND_XMIT_POLICY_VLAN_SRCMAC:
+   return NETDEV_LAG_HASH_VLAN_SRCMAC;
default:
return NETDEV_LAG_HASH_UNKNOWN;
}
@@ -3494,6 +3496,20 @@ static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
return true;
 }
 
+static inline u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
+{
+   struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
+   u32 srcmac = mac_hdr->h_source[5];
+   u16 vlan;
+
+   if (!skb_vlan_tag_present(skb))
+   return srcmac;
+
+   vlan = skb_vlan_tag_get(skb);
+
+   return srcmac ^ vlan;
+}
+
 /* Extract the appropriate headers based on bond's xmit policy */
 static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
  struct flow_keys *fk)
@@ -3501,10 +3517,14 @@ static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
bool l34 = bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34;
int noff, proto = -1;
 
-   if (bond->params.xmit_policy > BOND_XMIT_POLICY_LAYER23) {
+   switch (bond->params.xmit_policy) {
+   case BOND_XMIT_POLICY_ENCAP23:
+   case BOND_XMIT_POLICY_ENCAP34:

Re: [RFC PATCH net-next] bonding: add a vlan+srcmac tx hashing option

2021-01-12 Thread Jarod Wilson

On Tue, Jan 12, 2021 at 01:39:10PM -0800, Jay Vosburgh wrote:
> Jarod Wilson  wrote:
> 
> >On Thu, Jan 07, 2021 at 07:03:40PM -0500, Jarod Wilson wrote:
> >> On Fri, Dec 18, 2020 at 04:18:59PM -0800, Jay Vosburgh wrote:
> >> > Jarod Wilson  wrote:
> >> > 
> >> > >This comes from an end-user request, where they're running multiple VMs on
> >> > >hosts with bonded interfaces connected to some interesting switch topologies,
> >> > >where 802.3ad isn't an option. They're currently running a proprietary
> >> > >solution that effectively achieves load-balancing of VMs and bandwidth
> >> > >utilization improvements with a similar form of transmission algorithm.
> >> > >
> >> > >Basically, each VM has its own vlan, so it always sends its traffic out
> >> > >the same interface, unless that interface fails. Traffic gets split
> >> > >between the interfaces, maintaining a consistent path, with failover still
> >> > >available if an interface goes down.
> >> > >
> >> > >This has been rudimentarily tested to provide similar results, suitable for
> >> > >them to use to move off their current proprietary solution.
> >> > >
> >> > >Still on the TODO list, if this even looks sane to begin with, is
> >> > >fleshing out Documentation/networking/bonding.rst.
> >> > 
> >> >  I'm sure you're aware, but any final submission will also need
> >> > to include netlink and iproute2 support.
> >> 
> >> I believe everything for netlink support is already included, but I'll
> >> double-check that before submitting something for inclusion consideration.
> >
> >I'm not certain if what you actually meant was that I'd have to patch
> >iproute2 as well, which I've definitely stumbled onto today, but it's a
> >2-line patch, and everything seems to be working fine with it:
> 
>   Yes, that's what I meant.
> 
> >$ sudo ip link set bond0 type bond xmit_hash_policy 5
> 
>   Does the above work with the text label (presumably "vlansrc")
> as well as the number, and does "ip link add test type bond help" print
> the correct text for XMIT_HASH_POLICY?

All of the above looks correct to me; output below. Before submitting, I
could rename it from vlansrc to vlan+srcmac or some variation thereof, if
that's desired. I tried to keep it relatively short, but it's perhaps a bit
less clear the way I have it now, and other modes include a +.

$ sudo modprobe bonding mode=2 max_bonds=1 xmit_hash_policy=0
$ sudo ip link set bond0 type bond xmit_hash_policy vlansrc
$ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v4.18.0-272.el8.vstx.x86_64

Bonding Mode: load balancing (xor)
Transmit Hash Policy: vlansrc (5)
MII Status: down
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

$ sudo ip link add test type bond help
Usage: ... bond [ mode BONDMODE ] [ active_slave SLAVE_DEV ]
[ clear_active_slave ] [ miimon MIIMON ]
[ updelay UPDELAY ] [ downdelay DOWNDELAY ]
[ peer_notify_delay DELAY ]
[ use_carrier USE_CARRIER ]
[ arp_interval ARP_INTERVAL ]
[ arp_validate ARP_VALIDATE ]
[ arp_all_targets ARP_ALL_TARGETS ]
[ arp_ip_target [ ARP_IP_TARGET, ... ] ]
[ primary SLAVE_DEV ]
[ primary_reselect PRIMARY_RESELECT ]
[ fail_over_mac FAIL_OVER_MAC ]
[ xmit_hash_policy XMIT_HASH_POLICY ]
[ resend_igmp RESEND_IGMP ]
[ num_grat_arp|num_unsol_na NUM_GRAT_ARP|NUM_UNSOL_NA ]
[ all_slaves_active ALL_SLAVES_ACTIVE ]
[ min_links MIN_LINKS ]
[ lp_interval LP_INTERVAL ]
[ packets_per_slave PACKETS_PER_SLAVE ]
[ tlb_dynamic_lb TLB_DYNAMIC_LB ]
[ lacp_rate LACP_RATE ]
[ ad_select AD_SELECT ]
[ ad_user_port_key PORTKEY ]
[ ad_actor_sys_prio SYSPRIO ]
[ ad_actor_system LLADDR ]

BONDMODE := balance-rr|active-backup|balance-xor|broadcast|802.3ad|balance-tlb|balance-alb
ARP_VALIDATE := none|active|backup|all
ARP_ALL_TARGETS := any|all
PRIMARY_RESELECT := always|better|failure
FAIL_OVER_MAC := none|active|follow
XMIT_HASH_POLICY := layer2|layer2+3|layer3+4|encap2+3|encap3+4|vlansrc
LACP_RATE := slow|fast
AD_SELECT := stable|bandwidth|count


-- 
Jarod Wilson
ja...@redhat.com

Re: [RFC PATCH net-next] bonding: add a vlan+srcmac tx hashing option

2021-01-12 Thread Jarod Wilson

On Thu, Jan 07, 2021 at 07:03:40PM -0500, Jarod Wilson wrote:
> On Fri, Dec 18, 2020 at 04:18:59PM -0800, Jay Vosburgh wrote:
> > Jarod Wilson  wrote:
> > 
> > >This comes from an end-user request, where they're running multiple VMs on
> > >hosts with bonded interfaces connected to some interesting switch topologies,
> > >where 802.3ad isn't an option. They're currently running a proprietary
> > >solution that effectively achieves load-balancing of VMs and bandwidth
> > >utilization improvements with a similar form of transmission algorithm.
> > >
> > >Basically, each VM has its own vlan, so it always sends its traffic out
> > >the same interface, unless that interface fails. Traffic gets split
> > >between the interfaces, maintaining a consistent path, with failover still
> > >available if an interface goes down.
> > >
> > >This has been rudimentarily tested to provide similar results, suitable for
> > >them to use to move off their current proprietary solution.
> > >
> > >Still on the TODO list, if this even looks sane to begin with, is
> > >fleshing out Documentation/networking/bonding.rst.
> > 
> > I'm sure you're aware, but any final submission will also need
> > to include netlink and iproute2 support.
> 
> I believe everything for netlink support is already included, but I'll
> double-check that before submitting something for inclusion consideration.

I'm not certain if what you actually meant was that I'd have to patch
iproute2 as well, which I've definitely stumbled onto today, but it's a
2-line patch, and everything seems to be working fine with it:

$ sudo ip link set bond0 type bond xmit_hash_policy 5
$ ip -d link show bond0
11: bond0:  mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ce:85:5e:24:ce:90 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
    bond mode balance-xor miimon 0 updelay 0 downdelay 0 peer_notify_delay 0
    use_carrier 1 arp_interval 0 arp_validate none arp_all_targets any
    primary_reselect always fail_over_mac none xmit_hash_policy vlansrc
    resend_igmp 1 num_grat_arp 1 all_slaves_active 0 min_links 0 lp_interval 1
    packets_per_slave 1 lacp_rate slow ad_select stable tlb_dynamic_lb 1
    addrgenmode eui64 numtxqueues 16 numrxqueues 16 gso_max_size 65536
    gso_max_segs 65535
$ grep Hash /proc/net/bonding/bond0
Transmit Hash Policy: vlansrc (5)

Nothing bad seems to happen on an older kernel if one tries to set the new
hash, you just get told that it's an invalid argument.

I *think* this is all ready for submission then, so I'll get both the kernel
and iproute2 patches out soon.

-- 
Jarod Wilson
ja...@redhat.com

Re: [RFC PATCH net-next] bonding: add a vlan+srcmac tx hashing option

2021-01-08 Thread Jarod Wilson

On Fri, Jan 08, 2021 at 02:12:56PM +0100, Jiri Pirko wrote:
> Fri, Jan 08, 2021 at 12:58:13AM CET, ja...@redhat.com wrote:
> >On Mon, Dec 28, 2020 at 11:11:45AM +0100, Jiri Pirko wrote:
> >> Fri, Dec 18, 2020 at 08:30:33PM CET, ja...@redhat.com wrote:
> >> >This comes from an end-user request, where they're running multiple VMs on
> >> >hosts with bonded interfaces connected to some interesting switch topologies,
> >> >where 802.3ad isn't an option. They're currently running a proprietary
> >> >solution that effectively achieves load-balancing of VMs and bandwidth
> >> >utilization improvements with a similar form of transmission algorithm.
> >> >
> >> >Basically, each VM has its own vlan, so it always sends its traffic out
> >> >the same interface, unless that interface fails. Traffic gets split
> >> >between the interfaces, maintaining a consistent path, with failover still
> >> >available if an interface goes down.
> >> >
> >> >This has been rudimentarily tested to provide similar results, suitable for
> >> >them to use to move off their current proprietary solution.
> >> >
> >> >Still on the TODO list, if this even looks sane to begin with, is
> >> >fleshing out Documentation/networking/bonding.rst.
> >> 
> >> Jarod, did you consider using team driver instead ? :)
> >
> >That's actually one of the things that was suggested, since team I believe
> >already has support for this, but the user really wants to use bonding.
> >We're finding that a lot of users really still prefer bonding over team.
> 
> Do you know the reason, other than "nostalgia"?

I've heard a few different reasons that come to mind:

1) nostalgia is definitely one -- "we know bonding here"
2) support -- "the things I'm running say I need bonding to properly
support failover in their environment". How accurate this is, I don't
actually know.
3) monitoring -- "my monitoring solution knows about bonding, but not
about team". This is probably easily fixed, but may or may not be in the
user's direct control.
4) footprint -- "bonding does the job w/o team's userspace footprint".
I think this one is kind of hard for team to do anything about; bonding
really does have a smaller userspace footprint, which is a plus for
embedded type applications and high-security environments looking to keep
things as minimal as possible.

I think I've heard a few "we tried team years ago and it didn't work" as
well, which of course is ridiculous as a reason not to try something again,
since a lot can change in a few years in this world.

-- 
Jarod Wilson
ja...@redhat.com

Re: [RFC PATCH net-next] bonding: add a vlan+srcmac tx hashing option

2021-01-07 Thread Jarod Wilson

On Fri, Dec 18, 2020 at 04:18:59PM -0800, Jay Vosburgh wrote:
> Jarod Wilson  wrote:
> 
> >This comes from an end-user request, where they're running multiple VMs on
> >hosts with bonded interfaces connected to some interesting switch topologies,
> >where 802.3ad isn't an option. They're currently running a proprietary
> >solution that effectively achieves load-balancing of VMs and bandwidth
> >utilization improvements with a similar form of transmission algorithm.
> >
> >Basically, each VM has its own vlan, so it always sends its traffic out
> >the same interface, unless that interface fails. Traffic gets split
> >between the interfaces, maintaining a consistent path, with failover still
> >available if an interface goes down.
> >
> >This has been rudimentarily tested to provide similar results, suitable for
> >them to use to move off their current proprietary solution.
> >
> >Still on the TODO list, if this even looks sane to begin with, is
> >fleshing out Documentation/networking/bonding.rst.
> 
>   I'm sure you're aware, but any final submission will also need
> to include netlink and iproute2 support.

I believe everything for netlink support is already included, but I'll
double-check that before submitting something for inclusion consideration.

> >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> >index 5fe5232cc3f3..151ce8c7a56f 100644
> >--- a/drivers/net/bonding/bond_main.c
> >+++ b/drivers/net/bonding/bond_main.c
> >@@ -164,7 +164,7 @@ module_param(xmit_hash_policy, charp, 0);
> > MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3ad hashing method; "
> >"0 for layer 2 (default), 1 for layer 3+4, "
> >"2 for layer 2+3, 3 for encap layer 2+3, "
> >-   "4 for encap layer 3+4");
> >+   "4 for encap layer 3+4, 5 for vlan+srcmac");
> > module_param(arp_interval, int, 0);
> > MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
> > module_param_array(arp_ip_target, charp, NULL, 0);
> >@@ -1434,6 +1434,8 @@ static enum netdev_lag_hash bond_lag_hash_type(struct bonding *bond,
> > return NETDEV_LAG_HASH_E23;
> > case BOND_XMIT_POLICY_ENCAP34:
> > return NETDEV_LAG_HASH_E34;
> >+case BOND_XMIT_POLICY_VLAN_SRCMAC:
> >+return NETDEV_LAG_HASH_VLAN_SRCMAC;
> > default:
> > return NETDEV_LAG_HASH_UNKNOWN;
> > }
> >@@ -3494,6 +3496,20 @@ static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
> > return true;
> > }
> > 
> >+static inline u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
> >+{
> >+struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
> >+u32 srcmac = mac_hdr->h_source[5];
> >+u16 vlan;
> >+
> >+if (!skb_vlan_tag_present(skb))
> >+return srcmac;
> >+
> >+vlan = skb_vlan_tag_get(skb);
> >+
> >+return srcmac ^ vlan;
> 
>   For the common configuration wherein multiple VLANs are
> configured atop a single interface (and thus by default end up with the
> same MAC address), this seems like a fairly weak hash.  The TCI is 16
> bits (12 of which are the VID), but only 8 are used from the MAC, which
> will be a constant.
> 
>   Is this algorithm copying the proprietary solution you mention?

I've not actually seen the code in question, so I can't be 100% certain,
but this is exactly how it was described to me, and testing seems to bear
out that it behaves at least similarly enough for the user. They like
simplicity, and the very basic hash suits their needs, which are basically
just getting some load-balancing with failover w/o having to have lacp,
running on some older switches that can't do lacp.

-- 
Jarod Wilson
ja...@redhat.com

Re: [RFC PATCH net-next] bonding: add a vlan+srcmac tx hashing option

2021-01-07 Thread Jarod Wilson

On Mon, Dec 28, 2020 at 11:11:45AM +0100, Jiri Pirko wrote:
> Fri, Dec 18, 2020 at 08:30:33PM CET, ja...@redhat.com wrote:
> >This comes from an end-user request, where they're running multiple VMs on
> >hosts with bonded interfaces connected to some interesting switch topologies,
> >where 802.3ad isn't an option. They're currently running a proprietary
> >solution that effectively achieves load-balancing of VMs and bandwidth
> >utilization improvements with a similar form of transmission algorithm.
> >
> >Basically, each VM has its own vlan, so it always sends its traffic out
> >the same interface, unless that interface fails. Traffic gets split
> >between the interfaces, maintaining a consistent path, with failover still
> >available if an interface goes down.
> >
> >This has been rudimentarily tested to provide similar results, suitable for
> >them to use to move off their current proprietary solution.
> >
> >Still on the TODO list, if this even looks sane to begin with, is
> >fleshing out Documentation/networking/bonding.rst.
> 
> Jarod, did you consider using team driver instead ? :)

That's actually one of the things that was suggested, since team I believe
already has support for this, but the user really wants to use bonding.
We're finding that a lot of users really still prefer bonding over team.

-- 
Jarod Wilson
ja...@redhat.com

Re: [PATCH net] bonding: reduce rtnl lock contention in mii monitor thread

2020-12-18 Thread Jarod Wilson

On Tue, Dec 8, 2020 at 3:35 PM Jay Vosburgh  wrote:
>
> Jarod Wilson  wrote:
...
> >The addition of a case BOND_LINK_BACK in bond_miimon_commit() is somewhat
> >separate from the fix for the actual hang, but it eliminates a constant
> >"invalid new link 3 on slave" message seen related to this issue, and it's
> >not actually an invalid state here, so we shouldn't be reporting it as an
> >error.
...
> In principle, bond_miimon_commit should not see _BACK or _FAIL
> state as a new link state, because those states should be managed at the
> bond_miimon_inspect level (as they are the result of updelay and
> downdelay).  These states should not be "committed" in the sense of
> causing notifications or doing actions that require RTNL.
>
> My recollection is that the "invalid new link" messages were the
> result of a bug in de77ecd4ef02, which was fixed in 1899bb325149
> ("bonding: fix state transition issue in link monitoring"), but maybe
> the RTNL problem here induces that in some other fashion.
>
> Either way, I believe this message is correct as-is.

For reference, with 5.10.1 and this script:

#!/bin/sh

slave1=ens4f0
slave2=ens4f1

modprobe -rv bonding
modprobe -v bonding mode=2 miimon=100 updelay=200
ip link set bond0 up
ifenslave bond0 $slave1 $slave2
sleep 5

while :
do
ip link set $slave1 down
sleep 1
ip link set $slave1 up
sleep 1
done

I get this repeating log output:

[ 9488.262291] sfc :05:00.0 ens4f0: link up at 1Mbps full-duplex (MTU 1500)
[ 9488.339508] bond0: (slave ens4f0): link status up, enabling it in 200 ms
[ 9488.339511] bond0: (slave ens4f0): invalid new link 3 on slave
[ 9488.547643] bond0: (slave ens4f0): link status definitely up, 1 Mbps full duplex
[ 9489.276614] bond0: (slave ens4f0): link status definitely down, disabling slave
[ 9490.273830] sfc :05:00.0 ens4f0: link up at 1Mbps full-duplex (MTU 1500)
[ 9490.315540] bond0: (slave ens4f0): link status up, enabling it in 200 ms
[ 9490.315543] bond0: (slave ens4f0): invalid new link 3 on slave
[ 9490.523641] bond0: (slave ens4f0): link status definitely up, 1 Mbps full duplex
[ 9491.356526] bond0: (slave ens4f0): link status definitely down, disabling slave
[ 9492.285249] sfc :05:00.0 ens4f0: link up at 1Mbps full-duplex (MTU 1500)
[ 9492.291522] bond0: (slave ens4f0): link status up, enabling it in 200 ms
[ 9492.291523] bond0: (slave ens4f0): invalid new link 3 on slave
[ 9492.499604] bond0: (slave ens4f0): link status definitely up, 1 Mbps full duplex
[ 9493.331594] bond0: (slave ens4f0): link status definitely down, disabling slave

"invalid new link 3 on slave" is there every single time.

Side note: I'm not actually able to reproduce the repeating "link
status up, enabling it in 200 ms" and never recovering from a downed
link on this host, no clue why it's so reproducible w/another system.

-- 
Jarod Wilson
ja...@redhat.com

[RFC PATCH net-next] bonding: add a vlan+srcmac tx hashing option

2020-12-18 Thread Jarod Wilson

This comes from an end-user request, where they're running multiple VMs on
hosts with bonded interfaces connected to some interesting switch topologies,
where 802.3ad isn't an option. They're currently running a proprietary
solution that effectively achieves load-balancing of VMs and bandwidth
utilization improvements with a similar form of transmission algorithm.

Basically, each VM has its own vlan, so it always sends its traffic out
the same interface, unless that interface fails. Traffic gets split
between the interfaces, maintaining a consistent path, with failover still
available if an interface goes down.

This has been rudimentarily tested to provide similar results, suitable for
them to use to move off their current proprietary solution.

Still on the TODO list, if this even looks sane to begin with, is
fleshing out Documentation/networking/bonding.rst.

Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/bonding/bond_main.c| 27 +--
 drivers/net/bonding/bond_options.c |  1 +
 include/linux/netdevice.h  |  1 +
 include/uapi/linux/if_bonding.h|  1 +
 4 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 5fe5232cc3f3..151ce8c7a56f 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -164,7 +164,7 @@ module_param(xmit_hash_policy, charp, 0);
 MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3ad hashing method; "
   "0 for layer 2 (default), 1 for layer 3+4, "
   "2 for layer 2+3, 3 for encap layer 2+3, "
-  "4 for encap layer 3+4");
+  "4 for encap layer 3+4, 5 for vlan+srcmac");
 module_param(arp_interval, int, 0);
 MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
 module_param_array(arp_ip_target, charp, NULL, 0);
@@ -1434,6 +1434,8 @@ static enum netdev_lag_hash bond_lag_hash_type(struct bonding *bond,
return NETDEV_LAG_HASH_E23;
case BOND_XMIT_POLICY_ENCAP34:
return NETDEV_LAG_HASH_E34;
+   case BOND_XMIT_POLICY_VLAN_SRCMAC:
+   return NETDEV_LAG_HASH_VLAN_SRCMAC;
default:
return NETDEV_LAG_HASH_UNKNOWN;
}
@@ -3494,6 +3496,20 @@ static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
return true;
 }
 
+static inline u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
+{
+   struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
+   u32 srcmac = mac_hdr->h_source[5];
+   u16 vlan;
+
+   if (!skb_vlan_tag_present(skb))
+   return srcmac;
+
+   vlan = skb_vlan_tag_get(skb);
+
+   return srcmac ^ vlan;
+}
+
 /* Extract the appropriate headers based on bond's xmit policy */
 static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
  struct flow_keys *fk)
@@ -3501,10 +3517,14 @@ static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
bool l34 = bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34;
int noff, proto = -1;
 
-   if (bond->params.xmit_policy > BOND_XMIT_POLICY_LAYER23) {
+   switch (bond->params.xmit_policy) {
+   case BOND_XMIT_POLICY_ENCAP23:
+   case BOND_XMIT_POLICY_ENCAP34:
memset(fk, 0, sizeof(*fk));
return __skb_flow_dissect(NULL, skb, &flow_keys_bonding,
  fk, NULL, 0, 0, 0, 0);
+   default:
+   break;
}
 
fk->ports.ports = 0;
@@ -3556,6 +3576,9 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
skb->l4_hash)
return skb->hash;
 
+   if (bond->params.xmit_policy == BOND_XMIT_POLICY_VLAN_SRCMAC)
+   return bond_vlan_srcmac_hash(skb);
+
if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
!bond_flow_dissect(bond, skb, &flow))
return bond_eth_hash(skb);
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index a4e4e15f574d..9826fe46fca1 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -101,6 +101,7 @@ static const struct bond_opt_value bond_xmit_hashtype_tbl[] = {
{ "layer2+3", BOND_XMIT_POLICY_LAYER23, 0},
{ "encap2+3", BOND_XMIT_POLICY_ENCAP23, 0},
{ "encap3+4", BOND_XMIT_POLICY_ENCAP34, 0},
+   { "vlansrc",  BOND_XMIT_POLICY_VLAN_SRCMAC,  0},
{ NULL,   -1,   0},
 };
 
diff --g

Re: [PATCH net] bonding: reduce rtnl lock contention in mii monitor thread

2020-12-09 Thread Jarod Wilson

On Tue, Dec 8, 2020 at 3:35 PM Jay Vosburgh  wrote:
>
> Jarod Wilson  wrote:
>
> >I'm seeing a system get stuck unable to bring a downed interface back up
> >when it's got an updelay value set, behavior which ceased when logging
> >spew was removed from bond_miimon_inspect(). I'm monitoring logs on this
> >system over another network connection, and it seems that the act of
> >spewing logs at all there increases rtnl lock contention, because
> >instrumented code showed bond_mii_monitor() never able to succeed in its
> >attempts to call rtnl_trylock() to actually commit link state changes,
> >leaving the downed link stuck in BOND_LINK_DOWN. The system in question
> >appears to be fine with the log spew being moved to
> >bond_commit_link_state(), which is called after the successful
> >rtnl_trylock(). I'm actually wondering if perhaps we ultimately need/want
> >some bond-specific lock here to prevent racing with bond_close() instead
> >of using rtnl, but this shift of the output appears to work. I believe
> >this started happening when de77ecd4ef02 ("bonding: improve link-status
> >update in mii-monitoring") went in, but I'm not 100% on that.
>
> We use RTNL not to avoid deadlock with bonding itself, but
> because the "commit" side undertakes actions which require RTNL, e.g.,
> various events will eventually call netdev_lower_state_changed.
>
> However, the RTNL acquisition is a trylock to avoid the deadlock
> with bond_close.  Moving that out of line here (e.g., putting the commit
> into another work queue event or the like) has the same problem, in that
> bond_close needs to wait for all of the work queue events to finish, and
> it holds RTNL.

Ah, okay, it wasn't clear to me that we actually do need RTNL here,
I'd thought it was just for the deadlock avoidance with bond_close,
based on the comments in the source.

> Also, a dim memory says that the various notification messages
> were mostly placed in the "inspect" phase and not the "commit" phase to
> avoid doing printk-like activities with RTNL held.  As a general
> principle, I don't think we want to add more verbiage under RTNL.

Yeah, that's fair.

> >The addition of a case BOND_LINK_BACK in bond_miimon_inspect() is somewhat
> >separate from the fix for the actual hang, but it eliminates a constant
> >"invalid new link 3 on slave" message seen related to this issue, and it's
> >not actually an invalid state here, so we shouldn't be reporting it as an
> >error.
>
> Do you mean bond_miimon_commit here and not bond_miimon_inspect
> (which already has a case for BOND_LINK_BACK)?

Whoops, yes.

> In principle, bond_miimon_commit should not see _BACK or _FAIL
> state as a new link state, because those states should be managed at the
> bond_miimon_inspect level (as they are the result of updelay and
> downdelay).  These states should not be "committed" in the sense of
> causing notifications or doing actions that require RTNL.
>
> My recollection is that the "invalid new link" messages were the
> result of a bug in de77ecd4ef02, which was fixed in 1899bb325149
> ("bonding: fix state transition issue in link monitoring"), but maybe
> the RTNL problem here induces that in some other fashion.
>
> Either way, I believe this message is correct as-is.

Hm, okay, definitely seeing this message pop up regularly when a link
recovers, using a fairly simple reproducer:

slave1=p6p1
slave2=p6p2

modprobe -rv bonding
modprobe -v bonding mode=2 miimon=100 updelay=200
ip link set bond0 up
ifenslave bond0 $slave1 $slave2
sleep 10

while :
do
  ip link set $slave1 down
  sleep 1
  ip link set $slave1 up
  sleep 1
done

I wasn't actually seeing the problem until I was running a 'watch -n 1
"dmesg | tail -n 50"' or similar in a remote ssh session on the host.

I should add the caveat that this was also initially seen on an older
kernel, but with a fairly up-to-date bonding driver, which does
include both de77ecd4ef02 and 1899bb325149. I'm going to keep prodding
w/a more recent upstreamier kernel, and see if I can get a better idea
of what's actually going on.

-- 
Jarod Wilson
ja...@redhat.com

Re: [PATCH net] bonding: reduce rtnl lock contention in mii monitor thread

2020-12-09 Thread Jarod Wilson

On Tue, Dec 8, 2020 at 2:38 PM Jakub Kicinski  wrote:
>
> On Sat,  5 Dec 2020 18:43:54 -0500 Jarod Wilson wrote:
> > I'm seeing a system get stuck unable to bring a downed interface back up
> > when it's got an updelay value set, behavior which ceased when logging
> > spew was removed from bond_miimon_inspect(). I'm monitoring logs on this
> > system over another network connection, and it seems that the act of
> > spewing logs at all there increases rtnl lock contention, because
> > instrumented code showed bond_mii_monitor() never able to succeed in its
> > attempts to call rtnl_trylock() to actually commit link state changes,
> > leaving the downed link stuck in BOND_LINK_DOWN. The system in question
> > appears to be fine with the log spew being moved to
> > bond_commit_link_state(), which is called after the successful
> > rtnl_trylock().
>
> But it's not called under rtnl_lock AFAICT. So something else is also
> spewing messages?
>
> While bond_commit_link_state() _is_ called under the lock. So you're
> increasing the retry rate, by putting the slow operation under the
> lock, is that right?

Partially, yes. I probably should have tagged this with RFC instead of
PATCH, tbh. My theory was that the log spew, being sent out *other*
network interfaces when monitoring the system or remote syslog or ssh
was potentially causing some rtnl_lock() calls, so not spewing until
after actually being able to grab the lock would lessen the problem
w/actually acquiring the lock, but I ... don't know offhand how to
verify that theory.


> Also isn't bond_commit_link_state() called from many more places?
> So we're adding new prints, effectively?

Ah. Crap. Yes. bond_set_slave_link_state() is called quite a few
places, and that in turn calls bond_commit_link_state().


> > I'm actually wondering if perhaps we ultimately need/want
> > some bond-specific lock here to prevent racing with bond_close() instead
> > of using rtnl, but this shift of the output appears to work. I believe
> > this started happening when de77ecd4ef02 ("bonding: improve link-status
> > update in mii-monitoring") went in, but I'm not 100% on that.
> >
> > The addition of a case BOND_LINK_BACK in bond_miimon_inspect() is somewhat
> > separate from the fix for the actual hang, but it eliminates a constant
> > "invalid new link 3 on slave" message seen related to this issue, and it's
> > not actually an invalid state here, so we shouldn't be reporting it as an
> > error.
>
> Let's make it a separate patch, then.

Sounds like Jay is confident that bit is valid, and I shouldn't be
ending up in that state, unless something else is going wrong.

-- 
Jarod Wilson
ja...@redhat.com

[PATCH net] bonding: reduce rtnl lock contention in mii monitor thread

2020-12-05 Thread Jarod Wilson

I'm seeing a system get stuck unable to bring a downed interface back up
when it's got an updelay value set, behavior which ceased when logging
spew was removed from bond_miimon_inspect(). I'm monitoring logs on this
system over another network connection, and it seems that the act of
spewing logs at all there increases rtnl lock contention, because
instrumented code showed bond_mii_monitor() never able to succeed in its
attempts to call rtnl_trylock() to actually commit link state changes,
leaving the downed link stuck in BOND_LINK_DOWN. The system in question
appears to be fine with the log spew being moved to
bond_commit_link_state(), which is called after the successful
rtnl_trylock(). I'm actually wondering if perhaps we ultimately need/want
some bond-specific lock here to prevent racing with bond_close() instead
of using rtnl, but this shift of the output appears to work. I believe
this started happening when de77ecd4ef02 ("bonding: improve link-status
update in mii-monitoring") went in, but I'm not 100% on that.

The addition of a case BOND_LINK_BACK in bond_miimon_inspect() is somewhat
separate from the fix for the actual hang, but it eliminates a constant
"invalid new link 3 on slave" message seen related to this issue, and it's
not actually an invalid state here, so we shouldn't be reporting it as an
error.

CC: Mahesh Bandewar 
CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: Jakub Kicinski 
CC: Thomas Davis 
CC: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/bonding/bond_main.c | 26 ++
 include/net/bonding.h   | 38 +
 2 files changed, 44 insertions(+), 20 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 47afc5938c26..cdb6c64f16b6 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2292,23 +2292,13 @@ static int bond_miimon_inspect(struct bonding *bond)
bond_propose_link_state(slave, BOND_LINK_FAIL);
commit++;
slave->delay = bond->params.downdelay;
-   if (slave->delay) {
-   slave_info(bond->dev, slave->dev, "link status down for %sinterface, disabling it in %d ms\n",
-  (BOND_MODE(bond) ==
-   BOND_MODE_ACTIVEBACKUP) ?
-   (bond_is_active_slave(slave) ?
-"active " : "backup ") : "",
-  bond->params.downdelay * bond->params.miimon);
-   }
+
fallthrough;
case BOND_LINK_FAIL:
if (link_state) {
/* recovered before downdelay expired */
bond_propose_link_state(slave, BOND_LINK_UP);
slave->last_link_up = jiffies;
-   slave_info(bond->dev, slave->dev, "link status up again after %d ms\n",
-  (bond->params.downdelay - slave->delay) *
-  bond->params.miimon);
commit++;
continue;
}
@@ -2330,19 +2320,10 @@ static int bond_miimon_inspect(struct bonding *bond)
commit++;
slave->delay = bond->params.updelay;
 
-   if (slave->delay) {
-   slave_info(bond->dev, slave->dev, "link status up, enabling it in %d ms\n",
-  ignore_updelay ? 0 :
-  bond->params.updelay *
-  bond->params.miimon);
-   }
fallthrough;
case BOND_LINK_BACK:
if (!link_state) {
bond_propose_link_state(slave, BOND_LINK_DOWN);
-   slave_info(bond->dev, slave->dev, "link status down again after %d ms\n",
-  (bond->params.updelay - slave->delay) *
-  bond->params.miimon);
commit++;
continue;
}
@@ -2456,6 +2437,11 @@ static void bond_miimon_commit(struct bonding *bond)
 
continue;
 
+   case BOND_LINK_BACK:
+   bond_propose_link_state(slave, BOND_LINK_NOCHANGE);
+

Re: [PATCH net v3] bonding: fix feature flag setting at init time

2020-12-05 Thread Jarod Wilson

On Thu, Dec 3, 2020 at 11:45 AM Jakub Kicinski  wrote:
...
> nit: let's narrow down the ifdef-enery
>
> no need for the ifdef here, if the helper looks like this:
>
> +static void bond_set_xfrm_features(struct net_device *bond_dev, u64 mode)
> +{
> +#ifdef CONFIG_XFRM_OFFLOAD
> +   if (mode == BOND_MODE_ACTIVEBACKUP)
> +   bond_dev->wanted_features |= BOND_XFRM_FEATURES;
> +   else
> +   bond_dev->wanted_features &= ~BOND_XFRM_FEATURES;
> +
> +   netdev_update_features(bond_dev);
> +#endif /* CONFIG_XFRM_OFFLOAD */
> +}
>
> Even better:
>
> +static void bond_set_xfrm_features(struct net_device *bond_dev, u64 mode)
> +{
> +   if (!IS_ENABLED(CONFIG_XFRM_OFFLOAD))
> +   return;
> +
> +   if (mode == BOND_MODE_ACTIVEBACKUP)
> +   bond_dev->wanted_features |= BOND_XFRM_FEATURES;
> +   else
> +   bond_dev->wanted_features &= ~BOND_XFRM_FEATURES;
> +
> +   netdev_update_features(bond_dev);
> +}
>
> (Assuming BOND_XFRM_FEATURES doesn't itself hide under an ifdef.)

It is, but doesn't need to be. I can mix these changes in as well.

-- 
Jarod Wilson
ja...@redhat.com

[PATCH net v4] bonding: fix feature flag setting at init time

2020-12-05 Thread Jarod Wilson

Don't try to adjust XFRM support flags if the bond device isn't yet
registered. Bad things can currently happen when netdev_change_features()
is called without having wanted_features fully filled in yet. This code
runs both on post-module-load mode changes, as well as at module init
time, and when run at module init time, it is before register_netdevice()
has been called and filled in wanted_features. The empty wanted_features
led to features also getting emptied out, which was definitely not the
intended behavior, so prevent that from happening.

Originally, I'd hoped to stop adjusting wanted_features at all in the
bonding driver, as it's documented as being something only the network
core should touch, but we actually do need to do this to properly update
both the features and wanted_features fields when changing the bond type,
or we get to a situation where ethtool sees:

esp-hw-offload: off [requested on]

I do think we should be using netdev_update_features instead of
netdev_change_features here though, so we only send notifiers when the
features actually changed.

Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load")
Reported-by: Ivan Vecera 
Suggested-by: Ivan Vecera 
Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
v2: rework based on further testing and suggestions from ivecera
v3: add helper function, remove goto
v4: drop hunk not directly related to fix, clean up ifdeffery

 drivers/net/bonding/bond_options.c | 22 +++---
 include/net/bonding.h  |  2 --
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 9abfaae1c6f7..a4e4e15f574d 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -745,6 +745,19 @@ const struct bond_option *bond_opt_get(unsigned int option)
return &bond_opts[option];
 }
 
+static void bond_set_xfrm_features(struct net_device *bond_dev, u64 mode)
+{
+   if (!IS_ENABLED(CONFIG_XFRM_OFFLOAD))
+   return;
+
+   if (mode == BOND_MODE_ACTIVEBACKUP)
+   bond_dev->wanted_features |= BOND_XFRM_FEATURES;
+   else
+   bond_dev->wanted_features &= ~BOND_XFRM_FEATURES;
+
+   netdev_update_features(bond_dev);
+}
+
 static int bond_option_mode_set(struct bonding *bond,
const struct bond_opt_value *newval)
 {
@@ -767,13 +780,8 @@ static int bond_option_mode_set(struct bonding *bond,
if (newval->value == BOND_MODE_ALB)
bond->params.tlb_dynamic_lb = 1;
 
-#ifdef CONFIG_XFRM_OFFLOAD
-   if (newval->value == BOND_MODE_ACTIVEBACKUP)
-   bond->dev->wanted_features |= BOND_XFRM_FEATURES;
-   else
-   bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
-   netdev_change_features(bond->dev);
-#endif /* CONFIG_XFRM_OFFLOAD */
+   if (bond->dev->reg_state == NETREG_REGISTERED)
+   bond_set_xfrm_features(bond->dev, newval->value);
 
/* don't cache arp_validate between modes */
bond->params.arp_validate = BOND_ARP_VALIDATE_NONE;
diff --git a/include/net/bonding.h b/include/net/bonding.h
index d9d0ff3b0ad3..adc3da776970 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -86,10 +86,8 @@
 #define bond_for_each_slave_rcu(bond, pos, iter) \
netdev_for_each_lower_private_rcu((bond)->dev, pos, iter)
 
-#ifdef CONFIG_XFRM_OFFLOAD
 #define BOND_XFRM_FEATURES (NETIF_F_HW_ESP | NETIF_F_HW_ESP_TX_CSUM | \
NETIF_F_GSO_ESP)
-#endif /* CONFIG_XFRM_OFFLOAD */
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
 extern atomic_t netpoll_block_tx;
-- 
2.28.0
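Replacing the #ifdef with IS_ENABLED() is also why v4 drops the #ifdef around BOND_XFRM_FEATURES in bonding.h. Code behind an IS_ENABLED() test is still parsed and type-checked even when the option is off, so every name it uses must exist in all configurations. Below is a minimal userspace sketch of the pattern. XFRM_OFFLOAD_ENABLED is a plain 0/1 macro standing in for IS_ENABLED(CONFIG_XFRM_OFFLOAD), and the feature bits and helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Plain 0/1 stand-in for IS_ENABLED(CONFIG_XFRM_OFFLOAD); the kernel's
 * real macro lives in include/linux/kconfig.h. */
#ifndef XFRM_OFFLOAD_ENABLED
#define XFRM_OFFLOAD_ENABLED 1
#endif

typedef uint64_t features_t;

/* Defined unconditionally: the body below must compile in every config.
 * Stand-in bit values, not the real NETIF_F_HW_ESP and friends. */
#define XFRM_FEATURES_STANDIN ((features_t)0x1c0)

enum { MODE_ROUNDROBIN = 0, MODE_ACTIVEBACKUP = 1 };

static int update_calls;  /* counts calls to the stand-in update hook */

static void fake_update_features(void)
{
	update_calls++;
}

/* Hypothetical analogue of the v4 bond_set_xfrm_features() helper. */
static void set_xfrm_features(features_t *wanted, int mode)
{
	/* Compile-time constant: at 1 this test is dead code, at 0 the
	 * rest of the function is, but both paths are still type-checked. */
	if (!XFRM_OFFLOAD_ENABLED)
		return;

	if (mode == MODE_ACTIVEBACKUP)
		*wanted |= XFRM_FEATURES_STANDIN;
	else
		*wanted &= ~XFRM_FEATURES_STANDIN;

	fake_update_features();
}
```

With the macro set to 0, the early return makes everything after it dead code, yet that code still has to compile, so the constants it references cannot hide behind an #ifdef.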

[PATCH net-next] bonding: set xfrm feature flags more sanely

2020-12-05 Thread Jarod Wilson

We can remove one of the ifdef blocks here, and instead of setting the
xfrm flags in both hw_features and features and then unsetting them from
features if not in AB mode, wait to set the features flags until we know
we're actually in AB mode.

Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/bonding/bond_main.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index e0880a3840d7..5fe5232cc3f3 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4746,15 +4746,13 @@ void bond_setup(struct net_device *bond_dev)
NETIF_F_HW_VLAN_CTAG_FILTER;
 
bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL;
-#ifdef CONFIG_XFRM_OFFLOAD
-   bond_dev->hw_features |= BOND_XFRM_FEATURES;
-#endif /* CONFIG_XFRM_OFFLOAD */
bond_dev->features |= bond_dev->hw_features;
bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
 #ifdef CONFIG_XFRM_OFFLOAD
-   /* Disable XFRM features if this isn't an active-backup config */
-   if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
-   bond_dev->features &= ~BOND_XFRM_FEATURES;
+   bond_dev->hw_features |= BOND_XFRM_FEATURES;
+   /* Only enable XFRM features if this is an active-backup config */
+   if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
+   bond_dev->features |= BOND_XFRM_FEATURES;
 #endif /* CONFIG_XFRM_OFFLOAD */
 }
 
-- 
2.28.0

Re: [PATCH net v3] bonding: fix feature flag setting at init time

2020-12-03 Thread Jarod Wilson

On Thu, Dec 3, 2020 at 11:50 AM Jakub Kicinski  wrote:
>
> On Wed,  2 Dec 2020 19:43:57 -0500 Jarod Wilson wrote:
> >   bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL | NETIF_F_GSO_UDP_L4;
> > -#ifdef CONFIG_XFRM_OFFLOAD
> > - bond_dev->hw_features |= BOND_XFRM_FEATURES;
> > -#endif /* CONFIG_XFRM_OFFLOAD */
> >   bond_dev->features |= bond_dev->hw_features;
> >   bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
> >  #ifdef CONFIG_XFRM_OFFLOAD
> > - /* Disable XFRM features if this isn't an active-backup config */
> > - if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
> > - bond_dev->features &= ~BOND_XFRM_FEATURES;
> > + bond_dev->hw_features |= BOND_XFRM_FEATURES;
> > + /* Only enable XFRM features if this is an active-backup config */
> > + if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
> > + bond_dev->features |= BOND_XFRM_FEATURES;
> >  #endif /* CONFIG_XFRM_OFFLOAD */
>
> This makes no functional change, or am I reading it wrong?

You are correct, there's ultimately no functional change there. It
primarily just condenses the code down to a single #ifdef block and,
instead of adding BOND_XFRM_FEATURES to bond_dev->features and then
removing it again, omits it initially and only adds it when in AB mode.
I'd poked at the code in that area while trying to get to the bottom of
this and thought it made things more understandable, so I left it in,
but ultimately it's not necessary to fix the problem here.
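To see why the reordering is behavior-neutral, here is a small userspace model of the two orderings in bond_setup(). It is a sketch under stated assumptions: the flag values are made-up stand-ins rather than the real NETIF_F_* bits, the names setup_old(), setup_new() and struct dev_feats are hypothetical, and it assumes features holds no XFRM bits before this point.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t features_t;

/* Hypothetical stand-in bits; the real NETIF_F_* values differ. */
#define F_BASE    ((features_t)0x0f)   /* non-XFRM hw_features */
#define F_VLAN_TX ((features_t)0x30)   /* CTAG/STAG TX */
#define F_XFRM    ((features_t)0x1c0)  /* BOND_XFRM_FEATURES */

enum { MODE_ROUNDROBIN = 0, MODE_ACTIVEBACKUP = 1 };

struct dev_feats {
	features_t hw_features;
	features_t features;
};

/* Old ordering: add XFRM to hw_features, copy everything into features,
 * then strip XFRM back out of features unless in active-backup mode. */
static struct dev_feats setup_old(int mode)
{
	struct dev_feats d = { F_BASE, 0 };

	d.hw_features |= F_XFRM;
	d.features |= d.hw_features;
	d.features |= F_VLAN_TX;
	if (mode != MODE_ACTIVEBACKUP)
		d.features &= ~F_XFRM;
	return d;
}

/* New ordering: copy the base set first, then add XFRM to hw_features
 * and only add it to features when in active-backup mode. */
static struct dev_feats setup_new(int mode)
{
	struct dev_feats d = { F_BASE, 0 };

	d.features |= d.hw_features;
	d.features |= F_VLAN_TX;
	d.hw_features |= F_XFRM;
	if (mode == MODE_ACTIVEBACKUP)
		d.features |= F_XFRM;
	return d;
}
```

Both orderings yield identical hw_features and features in either mode; the only difference is that the new version never sets and then clears the XFRM bits.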

-- 
Jarod Wilson
ja...@redhat.com

[PATCH net v3] bonding: fix feature flag setting at init time

2020-12-02 Thread Jarod Wilson

Don't try to adjust XFRM support flags if the bond device isn't yet
registered. Problems currently arise when netdev_change_features() is
called before wanted_features has been fully filled in. This code runs on
post-module-load mode changes, as well as at module init time and new
bond creation time, and in the latter two scenarios it runs before
register_netdevice() has been called to fill in wanted_features. The
empty wanted_features led to features also getting emptied out, which
was definitely not the intended behavior, so prevent that from happening.
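The ordering problem can be reproduced with a rough userspace model. This is a sketch, not kernel code: struct fake_netdev, fake_setup(), fake_register() and mode_set() are hypothetical names, the bit values are stand-ins, and the recompute step only mirrors the (features & ~hw_features) | wanted_features rule, skipping the driver fix-up hooks the real core runs. It also assumes register_netdevice() seeds wanted_features as features & hw_features.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t features_t;

/* Stand-in bits, NOT the kernel's real NETIF_F_* values. */
#define F_OFFLOADS ((features_t)0x0f)  /* toggleable offloads (in hw_features) */
#define F_FIXED    ((features_t)0x30)  /* always-on bits (not in hw_features) */
#define F_XFRM     ((features_t)0x1c0) /* stand-in for BOND_XFRM_FEATURES */

enum { MODE_ROUNDROBIN = 0, MODE_ACTIVEBACKUP = 1 };

/* Hypothetical, heavily reduced stand-in for struct net_device. */
struct fake_netdev {
	bool registered;          /* models reg_state == NETREG_REGISTERED */
	features_t hw_features;
	features_t features;
	features_t wanted_features;
	int notifications;        /* feature-change notifications sent */
};

/* Keep non-toggleable bits from features; take toggleable ones from
 * wanted_features (the rule behind netdev_get_wanted_features()). */
static features_t compute_features(const struct fake_netdev *d)
{
	return (d->features & ~d->hw_features) | d->wanted_features;
}

/* Update-style recompute: notify only when something changed. */
static void update_features(struct fake_netdev *d)
{
	features_t f = compute_features(d);

	if (f != d->features) {
		d->features = f;
		d->notifications++;
	}
}

/* Assumed model of register_netdevice() seeding wanted_features. */
static void fake_register(struct fake_netdev *d)
{
	d->wanted_features = d->features & d->hw_features;
	d->registered = true;
}

/* Mode change; guard_registered models the v3 reg_state check. */
static void mode_set(struct fake_netdev *d, int mode, bool guard_registered)
{
	if (guard_registered && !d->registered)
		return;

	if (mode == MODE_ACTIVEBACKUP)
		d->wanted_features |= F_XFRM;
	else
		d->wanted_features &= ~F_XFRM;
	update_features(d);
}

/* Roughly what bond_setup() leaves behind before registration. */
static struct fake_netdev fake_setup(int mode)
{
	struct fake_netdev d = {0};

	d.hw_features = F_OFFLOADS | F_XFRM;
	d.features = F_OFFLOADS | F_FIXED;
	if (mode == MODE_ACTIVEBACKUP)
		d.features |= F_XFRM;
	return d;
}
```

Calling mode_set() before registration with the guard disabled strips the toggleable offloads, while the guarded version leaves them intact and only acts on later mode changes.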

Originally, I'd hoped to stop adjusting wanted_features at all in the
bonding driver, as it's documented as being something only the network
core should touch, but we actually do need to do this to properly update
both the features and wanted_features fields when changing the bond type,
or we get to a situation where ethtool sees:

esp-hw-offload: off [requested on]

I do think we should be using netdev_update_features instead of
netdev_change_features here though, so we only send notifiers when the
features actually changed.
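The difference between the two calls comes down to when a notification is sent. As I understand net/core/dev.c, netdev_change_features() recomputes the features and always calls netdev_features_change(), whereas netdev_update_features() only does so when the recomputation reports a change. The model below is a simplified sketch of that behavior: the struct and function names are hypothetical stand-ins, and the recompute step is reduced to copying a precomputed value.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t features_t;

/* Hypothetical stand-in: 'computed' is what the recompute step would
 * produce from wanted_features and the driver hooks. */
struct fake_netdev {
	features_t features;
	features_t computed;
	int notifications;   /* models NETDEV_FEAT_CHANGE notifier calls */
};

/* Like __netdev_update_features(): apply the new set, report a change. */
static int recompute(struct fake_netdev *d)
{
	if (d->features == d->computed)
		return 0;
	d->features = d->computed;
	return 1;
}

/* Like netdev_change_features(): always notify. */
static void change_features(struct fake_netdev *d)
{
	recompute(d);
	d->notifications++;
}

/* Like netdev_update_features(): notify only when something changed. */
static void update_features(struct fake_netdev *d)
{
	if (recompute(d))
		d->notifications++;
}
```

In this model, repeating a mode change that leaves the features untouched costs a notifier call with change_features() but nothing with update_features().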

v2: rework based on further testing and suggestions from ivecera
v3: add helper function, remove goto, fix problem description

Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load")
Reported-by: Ivan Vecera 
Suggested-by: Ivan Vecera 
Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/bonding/bond_main.c| 10 --
 drivers/net/bonding/bond_options.c | 19 ++-
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 47afc5938c26..7905534a763b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4747,15 +4747,13 @@ void bond_setup(struct net_device *bond_dev)
NETIF_F_HW_VLAN_CTAG_FILTER;
 
bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL | NETIF_F_GSO_UDP_L4;
-#ifdef CONFIG_XFRM_OFFLOAD
-   bond_dev->hw_features |= BOND_XFRM_FEATURES;
-#endif /* CONFIG_XFRM_OFFLOAD */
bond_dev->features |= bond_dev->hw_features;
bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
 #ifdef CONFIG_XFRM_OFFLOAD
-   /* Disable XFRM features if this isn't an active-backup config */
-   if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
-   bond_dev->features &= ~BOND_XFRM_FEATURES;
+   bond_dev->hw_features |= BOND_XFRM_FEATURES;
+   /* Only enable XFRM features if this is an active-backup config */
+   if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
+   bond_dev->features |= BOND_XFRM_FEATURES;
 #endif /* CONFIG_XFRM_OFFLOAD */
 }
 
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 9abfaae1c6f7..1ae0e5ab8c67 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -745,6 +745,18 @@ const struct bond_option *bond_opt_get(unsigned int option)
return &bond_opts[option];
 }
 
+#ifdef CONFIG_XFRM_OFFLOAD
+static void bond_set_xfrm_features(struct net_device *bond_dev, u64 mode)
+{
+   if (mode == BOND_MODE_ACTIVEBACKUP)
+   bond_dev->wanted_features |= BOND_XFRM_FEATURES;
+   else
+   bond_dev->wanted_features &= ~BOND_XFRM_FEATURES;
+
+   netdev_update_features(bond_dev);
+}
+#endif /* CONFIG_XFRM_OFFLOAD */
+
 static int bond_option_mode_set(struct bonding *bond,
const struct bond_opt_value *newval)
 {
@@ -768,11 +780,8 @@ static int bond_option_mode_set(struct bonding *bond,
bond->params.tlb_dynamic_lb = 1;
 
 #ifdef CONFIG_XFRM_OFFLOAD
-   if (newval->value == BOND_MODE_ACTIVEBACKUP)
-   bond->dev->wanted_features |= BOND_XFRM_FEATURES;
-   else
-   bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
-   netdev_change_features(bond->dev);
+   if (bond->dev->reg_state == NETREG_REGISTERED)
+   bond_set_xfrm_features(bond->dev, newval->value);
 #endif /* CONFIG_XFRM_OFFLOAD */
 
/* don't cache arp_validate between modes */
-- 
2.28.0

Re: [PATCH net v2] bonding: fix feature flag setting at init time

2020-12-02 Thread Jarod Wilson

On Wed, Dec 2, 2020 at 3:17 PM Jay Vosburgh  wrote:
>
> Jarod Wilson  wrote:
>
> >On Wed, Dec 2, 2020 at 12:55 PM Jay Vosburgh  wrote:
> >>
> >> Jarod Wilson  wrote:
> >>
> >> >Don't try to adjust XFRM support flags if the bond device isn't yet
> >> >registered. Bad things can currently happen when netdev_change_features()
> >> >is called without having wanted_features fully filled in yet. Basically,
> >> >this code was racing against register_netdevice() filling in
> >> >wanted_features, and when it got there first, the empty wanted_features
> >> >led to features also getting emptied out, which was definitely not the
> >> >intended behavior, so prevent that from happening.
> >>
> >> Is this an actual race?  Reading Ivan's prior message, it sounds
> >> like it's an ordering problem (in that bond_newlink calls
> >> register_netdevice after bond_changelink).
> >
> >Sorry, yeah, this is not actually a race condition, just an ordering
> >issue: bond_check_params() gets called at init time, which leads to
> >bond_option_mode_set() being called prior to bond_create() running,
> >which is where we actually call register_netdevice().
>
> So this only happens if there's a "mode" module parameter?  That
> doesn't sound like the call path that Ivan described (coming in via
> bond_newlink).

Ah. I think there are actually two different pathways that can trigger
this. The first is for bonds created at module load time, which is what I
was describing; the second is for a new bond created via bond_newlink()
after the bonding module is already loaded, as described by Ivan. Both
have bond_option_mode_set() running prior to register_netdevice(). Of
course, that would suggest every bond currently comes up with
unintentionally neutered feature flags, which I neglected to catch in
earlier testing and development.

-- 
Jarod Wilson
ja...@redhat.com

Re: [PATCH net v2] bonding: fix feature flag setting at init time

2020-12-02 Thread Jarod Wilson

On Wed, Dec 2, 2020 at 2:23 PM Jakub Kicinski  wrote:
>
> On Wed, 2 Dec 2020 14:03:53 -0500 Jarod Wilson wrote:
> > On Wed, Dec 2, 2020 at 12:53 PM Jakub Kicinski  wrote:
> > >
> > > On Wed,  2 Dec 2020 12:30:53 -0500 Jarod Wilson wrote:
> > > > + if (bond->dev->reg_state != NETREG_REGISTERED)
> > > > + goto noreg;
> > > > +
> > > >   if (newval->value == BOND_MODE_ACTIVEBACKUP)
> > > >   bond->dev->wanted_features |= BOND_XFRM_FEATURES;
> > > >   else
> > > >   bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
> > > > - netdev_change_features(bond->dev);
> > > > + netdev_update_features(bond->dev);
> > > > +noreg:
> > >
> > > Why the goto?
> >
> > Seemed cleaner to prevent an extra level of indentation of the code
> > following the goto and before the label, but I'm not that attached to
> > it if it's not wanted for coding style reasons.
>
> Yes, please don't use gotos where a normal if statement is sufficient.
> If you must avoid the indentation move the code to a helper.
>
> Also - this patch did not apply to net, please make sure you're
> developing on the correct base.

Argh, I must have been working in net-next instead of net, apologies.
Okay, I'll clarify the description per what Jay pointed out and adjust
the code to not include a goto, then make it on the right branch.

-- 
Jarod Wilson
ja...@redhat.com

Re: [PATCH net v2] bonding: fix feature flag setting at init time

2020-12-02 Thread Jarod Wilson

On Wed, Dec 2, 2020 at 12:55 PM Jay Vosburgh  wrote:
>
> Jarod Wilson  wrote:
>
> >Don't try to adjust XFRM support flags if the bond device isn't yet
> >registered. Bad things can currently happen when netdev_change_features()
> >is called without having wanted_features fully filled in yet. Basically,
> >this code was racing against register_netdevice() filling in
> >wanted_features, and when it got there first, the empty wanted_features
> >led to features also getting emptied out, which was definitely not the
> >intended behavior, so prevent that from happening.
>
> Is this an actual race?  Reading Ivan's prior message, it sounds
> like it's an ordering problem (in that bond_newlink calls
> register_netdevice after bond_changelink).

Sorry, yeah, this is not actually a race condition, just an ordering
issue: bond_check_params() gets called at init time, which leads to
bond_option_mode_set() being called prior to bond_create() running,
which is where we actually call register_netdevice().

> The change to bond_option_mode_set tests against reg_state, so
> presumably it wants to skip the first(?) time through, before the
> register_netdevice call; is that right?

Correct. Later on, when the bonding driver is already loaded and
parameter changes are made, bond_option_mode_set() gets called, and if
the mode changes to or from active-backup, we do need/want this code
to run to properly update the wanted_features and features flags.

-- 
Jarod Wilson
ja...@redhat.com

Re: [PATCH net v2] bonding: fix feature flag setting at init time

2020-12-02 Thread Jarod Wilson

On Wed, Dec 2, 2020 at 12:53 PM Jakub Kicinski  wrote:
>
> On Wed,  2 Dec 2020 12:30:53 -0500 Jarod Wilson wrote:
> > + if (bond->dev->reg_state != NETREG_REGISTERED)
> > + goto noreg;
> > +
> >   if (newval->value == BOND_MODE_ACTIVEBACKUP)
> >   bond->dev->wanted_features |= BOND_XFRM_FEATURES;
> >   else
> >   bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
> > - netdev_change_features(bond->dev);
> > + netdev_update_features(bond->dev);
> > +noreg:
>
> Why the goto?

Seemed cleaner to prevent an extra level of indentation of the code
following the goto and before the label, but I'm not that attached to
it if it's not wanted for coding style reasons.

-- 
Jarod Wilson
ja...@redhat.com

[PATCH net v2] bonding: fix feature flag setting at init time

2020-12-02 Thread Jarod Wilson

Don't try to adjust XFRM support flags if the bond device isn't yet
registered. Bad things can currently happen when netdev_change_features()
is called without having wanted_features fully filled in yet. Basically,
this code was racing against register_netdevice() filling in
wanted_features, and when it got there first, the empty wanted_features
led to features also getting emptied out, which was definitely not the
intended behavior, so prevent that from happening.

Originally, I'd hoped to stop adjusting wanted_features at all in the
bonding driver, as it's documented as being something only the network
core should touch, but we actually do need to do this to properly update
both the features and wanted_features fields when changing the bond type,
or we get to a situation where ethtool sees:

esp-hw-offload: off [requested on]

I do think we should be using netdev_update_features instead of
netdev_change_features here though, so we only send notifiers when the
features actually changed.

v2: rework based on further testing and suggestions from ivecera

Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load")
Reported-by: Ivan Vecera 
Suggested-by: Ivan Vecera 
Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/bonding/bond_main.c| 10 --
 drivers/net/bonding/bond_options.c |  6 +-
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index e0880a3840d7..5fe5232cc3f3 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4746,15 +4746,13 @@ void bond_setup(struct net_device *bond_dev)
NETIF_F_HW_VLAN_CTAG_FILTER;
 
bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL;
-#ifdef CONFIG_XFRM_OFFLOAD
-   bond_dev->hw_features |= BOND_XFRM_FEATURES;
-#endif /* CONFIG_XFRM_OFFLOAD */
bond_dev->features |= bond_dev->hw_features;
bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
 #ifdef CONFIG_XFRM_OFFLOAD
-   /* Disable XFRM features if this isn't an active-backup config */
-   if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
-   bond_dev->features &= ~BOND_XFRM_FEATURES;
+   bond_dev->hw_features |= BOND_XFRM_FEATURES;
+   /* Only enable XFRM features if this is an active-backup config */
+   if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
+   bond_dev->features |= BOND_XFRM_FEATURES;
 #endif /* CONFIG_XFRM_OFFLOAD */
 }
 
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 9abfaae1c6f7..19205cfac751 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -768,11 +768,15 @@ static int bond_option_mode_set(struct bonding *bond,
bond->params.tlb_dynamic_lb = 1;
 
 #ifdef CONFIG_XFRM_OFFLOAD
+   if (bond->dev->reg_state != NETREG_REGISTERED)
+   goto noreg;
+
if (newval->value == BOND_MODE_ACTIVEBACKUP)
bond->dev->wanted_features |= BOND_XFRM_FEATURES;
else
bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
-   netdev_change_features(bond->dev);
+   netdev_update_features(bond->dev);
+noreg:
 #endif /* CONFIG_XFRM_OFFLOAD */
 
/* don't cache arp_validate between modes */
-- 
2.28.0

Re: [PATCH net-next v4 0/5] bonding: rename bond components

2020-11-22 Thread Jarod Wilson

On Wed, Nov 11, 2020 at 5:04 PM Jakub Kicinski  wrote:
>
> On Wed, 11 Nov 2020 12:13:56 -0800 Jay Vosburgh wrote:
> > Jarod Wilson  wrote:
> >
> > >The bonding driver's use of master and slave, while largely understood
> > >in technical circles, poses a barrier for inclusion to some potential
> > >members of the development and user community, due to the historical
> > >context of masters and slaves, particularly in the United States. This
> > >is a first full pass at replacing those phrases with more socially
> > >inclusive ones, opting for bond to replace master and port to
> > >replace slave, which is congruent with the bridge and team drivers.
> > >
> > >There are a few problems with this change. First up, "port" is used in
> > >the bonding 802.3ad code, so the first step here is to rename port to
> > >ad_port, so we can reuse port. Second, we have the issue of not wanting
> > >to break any existing userspace, which I believe this patchset
> > >accomplishes, preserving all existing sysfs and procfs interfaces, and
> > >adding module parameter aliases where necessary.
> > >
> > >Third, we do still have the issue of ease of backporting fixes to
> > >-stable trees. I've not had a huge amount of time to spend on it, but
> > >brief forays into coccinelle didn't really pay off (since it's meant to
> > >operate on code, not patches), and the best solution I can come up with
> > >is providing a shell script someone could run over git-format-patch
> > >output before git-am'ing the result to a -stable tree, though scripting
> > >these changes in the first place turned out to be not the best thing to
> > >do anyway, due to subtle cases where use of master or slave can NOT yet
> > >be replaced, so a large amount of work was done by hand, inspection,
> > >trial and error, which is why this set is a lot longer in coming than
> > >I'd originally hoped. I don't expect -stable backports to be horrible to
> > >figure out one way or another though, and I don't believe that a bit of
> > >inconvenience on that front is enough to warrant not making these
> > >changes.
> >
> >   I think this undersells the impact a bit; this will most likely
> > break the majority of cherry-picks for the bonding driver to stable
> > going forward should this patch set be committed.  Yes, the volume of
> > patches to bonding is relatively low, and the manual backports are not
> > likely to be technically difficult.  Nevertheless, I expect that most
> > bonding backports to stable that cross this patch set will require
> > manual intervention.
> >
> >   As such, I'd still like to see explicit direction from the
> > kernel development community leadership that change sets of this nature
> > (not technically driven, with long term maintenance implications) are
> > changes that should be undertaken rather than are merely permitted.
>
> Yeah, IDK. I think it's up to you as the maintainer of this code to
> make a call based on the specific circumstances. All we have AFAIK
> is the coding style statement which discourages new uses:
>
>   For symbol names and documentation, avoid introducing new usage of
>   'master / slave' (or 'slave' independent of 'master') and 'blacklist /
>   whitelist'.
>
>   Recommended replacements for 'master / slave' are:
> '{primary,main} / {secondary,replica,subordinate}'
> '{initiator,requester} / {target,responder}'
> '{controller,host} / {device,worker,proxy}'
> 'leader / follower'
> 'director / performer'
>
>   Recommended replacements for 'blacklist/whitelist' are:
> 'denylist / allowlist'
> 'blocklist / passlist'
>
>   Exceptions for introducing new usage is to maintain a userspace ABI/API,
>   or when updating code for an existing (as of 2020) hardware or protocol
>   specification that mandates those terms. For new specifications
>   translate specification usage of the terminology to the kernel coding
>   standard where possible.

Haven't been able to put much time into this the past few weeks, too
many other things going on leading into the holidays... But I think
Red Hat's general stance on this is that leaving things the way they
are is akin to condoning them. For change to happen, change needs to
happen. I know this starts to get political in a hurry though. I'd
like to see the changes made, even if there's a bit of pain involved
(clearly, since I've dumped this much time and effort into it so far).
:)

I'll try to address issues raised with this version, including the
checkpatch bits, but it may not be until after the first of the year
at this point, with assorted projects trying to get wrapped up before
the holidays, then the holidays themselves, etc.

-- 
Jarod Wilson
ja...@redhat.com

[PATCH net] bonding: fix feature flag setting at init time

2020-11-22 Thread Jarod Wilson

I've run into a case where bond_option_mode_set() gets called before
hw_features has been filled in, and very bad things happen when
netdev_change_features() then gets called, because the empty hw_features
wipes out almost all features. Further reading of the netdev feature flag
documentation suggests drivers aren't supposed to touch wanted_features,
so this changes bond_option_mode_set() to use netdev_increment_features()
and &= ~BOND_XFRM_FEATURES on mode changes, and to call
netdev_features_change() only if the features actually changed. This
specifically fixes bonding on top of mlxsw interfaces, and has been
regression-tested with ixgbe interfaces. This change also simplifies the
xfrm-specific code in bond_setup() a little.

Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load")
Reported-by: Ivan Vecera 
Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/bonding/bond_main.c| 10 --
 drivers/net/bonding/bond_options.c | 14 +++---
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 71c9677d135f..b8e0cb4f9480 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4721,15 +4721,13 @@ void bond_setup(struct net_device *bond_dev)
NETIF_F_HW_VLAN_CTAG_FILTER;
 
bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL;
-#ifdef CONFIG_XFRM_OFFLOAD
-   bond_dev->hw_features |= BOND_XFRM_FEATURES;
-#endif /* CONFIG_XFRM_OFFLOAD */
bond_dev->features |= bond_dev->hw_features;
bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
 #ifdef CONFIG_XFRM_OFFLOAD
-   /* Disable XFRM features if this isn't an active-backup config */
-   if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
-   bond_dev->features &= ~BOND_XFRM_FEATURES;
+   bond_dev->hw_features |= BOND_XFRM_FEATURES;
+   /* Only enable XFRM features if this is an active-backup config */
+   if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
+   bond_dev->features |= BOND_XFRM_FEATURES;
 #endif /* CONFIG_XFRM_OFFLOAD */
 }
 
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 9abfaae1c6f7..bce34648d97d 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -748,6 +748,9 @@ const struct bond_option *bond_opt_get(unsigned int option)
 static int bond_option_mode_set(struct bonding *bond,
const struct bond_opt_value *newval)
 {
+   netdev_features_t features = bond->dev->features;
+   netdev_features_t mask = features & BOND_XFRM_FEATURES;
+
if (!bond_mode_uses_arp(newval->value)) {
if (bond->params.arp_interval) {
netdev_dbg(bond->dev, "%s mode is incompatible with arp monitoring, start mii monitoring\n",
@@ -769,10 +772,15 @@ static int bond_option_mode_set(struct bonding *bond,
 
 #ifdef CONFIG_XFRM_OFFLOAD
if (newval->value == BOND_MODE_ACTIVEBACKUP)
-   bond->dev->wanted_features |= BOND_XFRM_FEATURES;
+   features = netdev_increment_features(features,
+BOND_XFRM_FEATURES, mask);
else
-   bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
-   netdev_change_features(bond->dev);
+   features &= ~BOND_XFRM_FEATURES;
+
+   if (bond->dev->features != features) {
+   bond->dev->features = features;
+   netdev_features_change(bond->dev);
+   }
 #endif /* CONFIG_XFRM_OFFLOAD */
 
/* don't cache arp_validate between modes */
-- 
2.28.0

Re: [PATCH net-next v4 0/5] bonding: rename bond components

2020-11-09 Thread Jarod Wilson

On Fri, Nov 6, 2020 at 9:44 PM Jakub Kicinski  wrote:
>
> On Fri,  6 Nov 2020 15:04:31 -0500 Jarod Wilson wrote:
> > The bonding driver's use of master and slave, while largely understood
> > in technical circles, poses a barrier for inclusion to some potential
> > members of the development and user community, due to the historical
> > context of masters and slaves, particularly in the United States. This
> > is a first full pass at replacing those phrases with more socially
> > inclusive ones, opting for bond to replace master and port to
> > replace slave, which is congruent with the bridge and team drivers.
>
> If we decide to go ahead with this, we should probably also use it as
> an opportunity to clean up the more egregious checkpatch warnings, WDYT?
>
> Plan minimum - don't add new ones ;)

Hm. I hadn't actually looked at checkpatch output until now. It's...
noisy here. But I'm pretty sure the vast majority of that is from
existing issues, simply reported now due to all the renaming. I can
certainly take a crack at cleanups, but I'd be worried about missing
another merge window trying to sort all of these, when they're not
directly related.

-- 
Jarod Wilson
ja...@redhat.com

[PATCH 1/5] bonding: rename 802.3ad's struct port to ad_port

2020-11-06 Thread Jarod Wilson

The intention is to reuse "port" in place of "slave" in the bonding driver
after making this change, as port is consistent with the bridge and team
drivers, and allows us to remove socially problematic language from the
bonding driver.

Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/bonding/bond_3ad.c | 1307 
 drivers/net/bonding/bond_main.c|4 +-
 drivers/net/bonding/bond_netlink.c |6 +-
 drivers/net/bonding/bond_procfs.c  |   36 +-
 drivers/net/bonding/bond_sysfs_slave.c |   10 +-
 include/net/bond_3ad.h |   14 +-
 6 files changed, 688 insertions(+), 689 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index aa001b16765a..0eb717b0bfc6 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -89,60 +89,60 @@ static const u8 lacpdu_mcast_addr[ETH_ALEN + 2] __long_aligned =
MULTICAST_LACPDU_ADDR;
 
 /* = main 802.3ad protocol functions == */
-static int ad_lacpdu_send(struct port *port);
-static int ad_marker_send(struct port *port, struct bond_marker *marker);
-static void ad_mux_machine(struct port *port, bool *update_slave_arr);
-static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port);
-static void ad_tx_machine(struct port *port);
-static void ad_periodic_machine(struct port *port);
-static void ad_port_selection_logic(struct port *port, bool *update_slave_arr);
+static int ad_lacpdu_send(struct ad_port *ad_port);
+static int ad_marker_send(struct ad_port *ad_port, struct bond_marker *marker);
+static void ad_mux_machine(struct ad_port *ad_port, bool *update_slave_arr);
+static void ad_rx_machine(struct lacpdu *lacpdu, struct ad_port *ad_port);
+static void ad_tx_machine(struct ad_port *ad_port);
+static void ad_periodic_machine(struct ad_port *ad_port);
+static void ad_port_selection_logic(struct ad_port *ad_port, bool *update_slave_arr);
 static void ad_agg_selection_logic(struct aggregator *aggregator,
   bool *update_slave_arr);
 static void ad_clear_agg(struct aggregator *aggregator);
 static void ad_initialize_agg(struct aggregator *aggregator);
-static void ad_initialize_port(struct port *port, int lacp_fast);
-static void ad_enable_collecting_distributing(struct port *port,
+static void ad_initialize_port(struct ad_port *ad_port, int lacp_fast);
+static void ad_enable_collecting_distributing(struct ad_port *ad_port,
  bool *update_slave_arr);
-static void ad_disable_collecting_distributing(struct port *port,
+static void ad_disable_collecting_distributing(struct ad_port *ad_port,
   bool *update_slave_arr);
 static void ad_marker_info_received(struct bond_marker *marker_info,
-   struct port *port);
+   struct ad_port *ad_port);
 static void ad_marker_response_received(struct bond_marker *marker,
-   struct port *port);
-static void ad_update_actor_keys(struct port *port, bool reset);
+   struct ad_port *ad_port);
+static void ad_update_actor_keys(struct ad_port *ad_port, bool reset);
 
 
 /* = api to bonding and kernel code == */
 
 /**
- * __get_bond_by_port - get the port's bonding struct
- * @port: the port we're looking at
+ * __get_bond_by_ad_port - get the ad_port's bonding struct
+ * @ad_port: the ad_port we're looking at
  *
- * Return @port's bonding struct, or %NULL if it can't be found.
+ * Return @ad_port's bonding struct, or %NULL if it can't be found.
  */
-static inline struct bonding *__get_bond_by_port(struct port *port)
+static inline struct bonding *__get_bond_by_ad_port(struct ad_port *ad_port)
 {
-   if (port->slave == NULL)
+   if (ad_port->slave == NULL)
return NULL;
 
-   return bond_get_bond_by_slave(port->slave);
+   return bond_get_bond_by_slave(ad_port->slave);
 }
 
 /**
  * __get_first_agg - get the first aggregator in the bond
- * @port: the port we're looking at
+ * @ad_port: the ad_port we're looking at
  *
  * Return the aggregator of the first slave in @bond, or %NULL if it can't be
  * found.
  * The caller must hold RCU or RTNL lock.
  */
-static inline struct aggregator *__get_first_agg(struct port *port)
+static inline struct aggregator *__get_first_agg(struct ad_port *ad_port)
 {
-   struct bonding *bond = __get_bond_by_port(port);
+   struct bonding *bond = __get_bond_by_ad_port(ad_port);
struct slave *first_slave;
struct aggregator *agg;
 
-   /* If there's no bond for thi

[PATCH 4/5] bonding: rename bonding_sysfs_slave.c to _port.c

2020-11-06 Thread Jarod Wilson

Now that use of "slave" has been replaced by "port", rename this file too.

Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/bonding/Makefile  | 2 +-
 drivers/net/bonding/{bond_sysfs_slave.c => bond_sysfs_port.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename drivers/net/bonding/{bond_sysfs_slave.c => bond_sysfs_port.c} (100%)

diff --git a/drivers/net/bonding/Makefile b/drivers/net/bonding/Makefile
index 30e8ae3da2da..2ed0083514a6 100644
--- a/drivers/net/bonding/Makefile
+++ b/drivers/net/bonding/Makefile
@@ -5,7 +5,7 @@
 
 obj-$(CONFIG_BONDING) += bonding.o
 
-bonding-objs := bond_main.o bond_3ad.o bond_alb.o bond_sysfs.o bond_sysfs_slave.o bond_debugfs.o bond_netlink.o bond_options.o
+bonding-objs := bond_main.o bond_3ad.o bond_alb.o bond_sysfs.o bond_sysfs_port.o bond_debugfs.o bond_netlink.o bond_options.o
 
 proc-$(CONFIG_PROC_FS) += bond_procfs.o
 bonding-objs += $(proc-y)
diff --git a/drivers/net/bonding/bond_sysfs_slave.c b/drivers/net/bonding/bond_sysfs_port.c
similarity index 100%
rename from drivers/net/bonding/bond_sysfs_slave.c
rename to drivers/net/bonding/bond_sysfs_port.c
-- 
2.28.0

[PATCH 5/5] bonding: update Documentation for port/bond terminology

2020-11-06 Thread Jarod Wilson

Swap in port/bond terminology where appropriate, leaving all legacy sysfs
and procfs interface mentions in place, but marked as deprecated.
Additionally, add more netlink/iproute2 documentation, and note that this
is the preferred method of interfacing with the bonding driver. While
we're at it, also make some mention of NetworkManager's existence.

Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 Documentation/networking/bonding.rst | 581 ---
 1 file changed, 348 insertions(+), 233 deletions(-)

diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
index adc314639085..9641add9fb32 100644
--- a/Documentation/networking/bonding.rst
+++ b/Documentation/networking/bonding.rst
@@ -21,6 +21,15 @@ Added Sysfs information: 2006/04/24
 
   - Mitch Williams 
 
+Major terminology rework done late 2020, to start to remove the use of
+the socially problematic terms "master" and "slave" from the code. The
+"master" device is now referred to as simply the "bond" device and the
+"slave" devices as "ports", but all sysfs, procfs and module options
+have been retained as-is for userspace compatibility. The sysfs and
+procfs interfaces are deprecated though, in favor of userspace making
+use of netlink and iproute for any and all bonding configuration and
+information-gathering work.
+
 Introduction
 
 
@@ -53,10 +62,11 @@ who to ask for help, please follow the links at the end of this file.
3.2.2   Configuring Multiple Bonds with Initscripts
3.3 Configuring Bonding Manually with Ifenslave
3.3.1   Configuring Multiple Bonds Manually
-   3.4 Configuring Bonding Manually via Sysfs
-   3.5 Configuration with Interfaces Support
-   3.6 Overriding Configuration for Special Cases
-   3.7 Configuring LACP for 802.3ad mode in a more secure way
+   3.4 Configuring Bonding Manually via netlink
+   3.5 Configuring Bonding Manually via Sysfs
+   3.6 Configuration with Interfaces Support
+   3.7 Overriding Configuration for Special Cases
+   3.8 Configuring LACP for 802.3ad mode in a more secure way
 
4. Querying Bonding Configuration
4.1 Bonding Configuration
@@ -134,8 +144,8 @@ Build and install the new kernel and modules.
 1.2 Bonding Control Utility
 ---
 
-It is recommended to configure bonding via iproute2 (netlink)
-or sysfs, the old ifenslave control utility is obsolete.
+It is recommended to configure bonding via iproute2 (netlink).
+The sysfs interfaces and the old ifenslave control utility are obsolete.
 
 2. Bonding Driver Options
 =
@@ -167,22 +177,23 @@ or, for backwards compatibility, the option value.  E.g.,
 
 The parameters are as follows:
 
-active_slave
+active_port
+active_slave (DEPRECATED)
 
-   Specifies the new active slave for modes that support it
+   Specifies the new active port for modes that support it
(active-backup, balance-alb and balance-tlb).  Possible values
-   are the name of any currently enslaved interface, or an empty
-   string.  If a name is given, the slave and its link must be up in order
-   to be selected as the new active slave.  If an empty string is
-   specified, the current active slave is cleared, and a new active
-   slave is selected automatically.
+   are the name of any currently aggregated interface, or an empty
+   string.  If a name is given, the port and its link must be up in order
+   to be selected as the new active port.  If an empty string is
+   specified, the current active port is cleared, and a new active
+   port is selected automatically.
 
Note that this is only available through the sysfs interface. No module
parameter by this name exists.
 
The normal value of this option is the name of the currently
-   active slave, or the empty string if there is no active slave or
-   the current mode does not use an active slave.
+   active port, or the empty string if there is no active port or
+   the current mode does not use an active port.
 
 ad_actor_sys_prio
 
@@ -199,8 +210,8 @@ ad_actor_system
protocol packet exchanges (LACPDUs). The value cannot be NULL or
multicast. It is preferred to have the local-admin bit set for this
mac but driver does not enforce it. If the value is not given then
-   system defaults to using the masters' mac address as actors' system
-   address.
+   system defaults to using the bond's mac address as actors'
+   system address.
 
This parameter has effect only in 802.3ad mode and is available through
SysFs interface.
@@ -216,8 +227,8 @@ ad_select
bandwidth.
 
Resel

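The `active_port` (formerly `active_slave`) rules described above can be modeled as a small selection function. This is a hypothetical sketch: `struct port_state` and `select_active_port()` are illustrative names, a return value of -1 stands in for the driver rejecting the request and keeping its current active port, and the automatic choice for an empty string is reduced to "first port that is up with link", whereas the real driver's reselection also weighs options such as `primary`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one bond port; not the kernel's struct slave. */
struct port_state {
	const char *name;
	bool up;       /* port device is administratively up */
	bool link_up;  /* port has carrier */
};

/* Returns the index of the port that becomes active, or -1 if the
 * request is rejected or no usable port exists. */
static int select_active_port(const struct port_state *ports, size_t n,
			      const char *requested)
{
	size_t i;

	assert(ports != NULL || n == 0);
	assert(requested != NULL);

	if (requested[0] == '\0') {
		/* Empty string: clear the choice and pick one automatically. */
		for (i = 0; i < n; i++)
			if (ports[i].up && ports[i].link_up)
				return (int)i;
		return -1;
	}

	for (i = 0; i < n; i++) {
		if (strcmp(ports[i].name, requested) != 0)
			continue;
		/* A named port must be up and have link to be selected. */
		return (ports[i].up && ports[i].link_up) ? (int)i : -1;
	}
	return -1; /* not a port of this bond */
}
```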
[PATCH net-next v4 0/5] bonding: rename bond components

2020-11-06 Thread Jarod Wilson

The bonding driver's use of master and slave, while largely understood
in technical circles, poses a barrier for inclusion to some potential
members of the development and user community, due to the historical
context of masters and slaves, particularly in the United States. This
is a first full pass at replacing those phrases with more socially
inclusive ones, opting for bond to replace master and port to
replace slave, which is congruent with the bridge and team drivers.

There are a few problems with this change. First up, "port" is used in
the bonding 802.3ad code, so the first step here is to rename port to
ad_port, so we can reuse port. Second, we have the issue of not wanting
to break any existing userspace, which I believe this patchset
accomplishes, preserving all existing sysfs and procfs interfaces, and
adding module parameter aliases where necessary.

Third, we do still have the issue of ease of backporting fixes to
-stable trees. I've not had a huge amount of time to spend on it, but
brief forays into coccinelle didn't really pay off (since it's meant to
operate on code, not patches), and the best solution I can come up with
is providing a shell script someone could run over git-format-patch
output before git-am'ing the result to a -stable tree, though scripting
these changes in the first place turned out to be not the best thing to
do anyway, due to subtle cases where use of master or slave can NOT yet
be replaced, so a large amount of work was done by hand, inspection,
trial and error, which is why this set is a lot longer in coming than
I'd originally hoped. I don't expect -stable backports to be horrible to
figure out one way or another though, and I don't believe that a bit of
inconvenience on that front is enough to warrant not making these
changes.
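The backport helper suggested above would rewrite new names back to old ones in git-format-patch output. Because some uses of master/slave cannot yet be replaced, a blanket text substitution is unsafe; one option is a fixed table of exact renames applied only at whole-identifier boundaries. The C sketch below shows that core step. It is not the author's script, and the `renames` table and `unrename_line()` are hypothetical and deliberately incomplete, built only from renames visible in this series.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct rename { const char *new_name; const char *old_name; };

/* Hypothetical, incomplete table: new identifier -> pre-series identifier. */
static const struct rename renames[] = {
	{ "netif_is_bond_dev",     "netif_is_bond_master" },
	{ "__get_bond_by_ad_port", "__get_bond_by_port" },
	{ "bond_sysfs_port",       "bond_sysfs_slave" },
	{ "ad_port",               "port" },
};

static bool is_ident_char(char c)
{
	return isalnum((unsigned char)c) || c == '_';
}

/* Copy one patch line from 'in' to 'out' (capacity 'cap'), replacing
 * whole identifiers found in the table. Returns false if 'out' is too small. */
static bool unrename_line(const char *in, char *out, size_t cap)
{
	size_t o = 0;

	assert(in != NULL && out != NULL && cap > 0);
	while (*in != '\0') {
		const char *tok = in;
		size_t len = 1;

		if (is_ident_char(*in)) {
			size_t i;

			/* Consume the whole identifier so partial matches never happen. */
			while (is_ident_char(*in))
				in++;
			len = (size_t)(in - tok);
			for (i = 0; i < sizeof(renames) / sizeof(renames[0]); i++) {
				if (strlen(renames[i].new_name) == len &&
				    strncmp(tok, renames[i].new_name, len) == 0) {
					tok = renames[i].old_name;
					len = strlen(tok);
					break;
				}
			}
		} else {
			in++; /* copy punctuation and whitespace as-is */
		}
		if (o + len >= cap)
			return false; /* keep room for the terminating NUL */
		memcpy(out + o, tok, len);
		o += len;
	}
	out[o] = '\0';
	return true;
}
```

Matching whole identifiers is what makes the `ad_port` entry safe: `struct ad_port *ad_port` becomes `struct port *port`, while `ad_port_selection_logic()`, which predates the series, is left untouched.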

See here for further details on Red Hat's commitment to this work:
https://www.redhat.com/en/blog/making-open-source-more-inclusive-eradicating-problematic-language

As far as testing goes, I've manually operated on various bonds while
working on this code, and have run it through multiple lnst test runs,
which exercises the existing sysfs interfaces fairly extensively. As far
as I can tell through testing and inspection, there is no breakage of
any existing interfaces with this set.

v2: legacy module parameters are retained this time, and we're trying
out bond/port instead of aggregator/link in place of master/slave. The
procfs interface legacy output is also duplicated or dropped, depending
on Kconfig, rather than being replaced.

v3: remove Kconfig knob, leave sysfs and procfs interfaces entirely
untouched, but update documentation to reference their deprecated
nature, explain the name changes, add references to NetworkManager,
include more netlink/iproute2 examples and make note of netlink
being the preferred interface for userspace interaction with bonds.

v4: documentation table of contents fixes

Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org

Jarod Wilson (5):
  bonding: rename 802.3ad's struct port to ad_port
  bonding: replace use of the term master where possible
  bonding: rename slave to port where possible
  bonding: rename bonding_sysfs_slave.c to _port.c
  bonding: update Documentation for port/bond terminology

 .clang-format |4 +-
 Documentation/networking/bonding.rst  |  581 ++--
 drivers/infiniband/core/cma.c |2 +-
 drivers/infiniband/core/lag.c |2 +-
 drivers/infiniband/core/roce_gid_mgmt.c   |   10 +-
 drivers/infiniband/hw/mlx4/main.c |2 +-
 drivers/net/bonding/Makefile  |2 +-
 drivers/net/bonding/bond_3ad.c| 1701 ++--
 drivers/net/bonding/bond_alb.c|  689 ++---
 drivers/net/bonding/bond_debugfs.c|2 +-
 drivers/net/bonding/bond_main.c   | 2341 +
 drivers/net/bonding/bond_netlink.c|  114 +-
 drivers/net/bonding/bond_options.c|  258 +-
 drivers/net/bonding/bond_procfs.c |   86 +-
 drivers/net/bonding/bond_sysfs.c  |   78 +-
 drivers/net/bonding/bond_sysfs_port.c |  185 ++
 drivers/net/bonding/bond_sysfs_slave.c|  176 --
 .../ethernet/chelsio/cxgb3/cxgb3_offload.c|2 +-
 .../net/ethernet/mellanox/mlx4/en_netdev.c|   14 +-
 .../ethernet/mellanox/mlx5/core/en/rep/bond.c |4 +-
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   |2 +-
 .../ethernet/netronome/nfp/flower/lag_conf.c  |2 +-
 .../ethernet/qlogic/netxen/netxen_nic_main.c  |   12 +-
 include/linux/netdevice.h |   22 +-
 include/net/bond_3ad.h|   42 +-
 include/net/bond_alb.h|   74 +-
 include/net/bond_options.h

[PATCH 2/5] bonding: replace use of the term master where possible

2020-11-06 Thread Jarod Wilson

Simply refer to what was the bonding "master" as the "bond" or bonding
device, depending on context. However, do retain compat code for the
bonding_masters sysfs interface to avoid breaking userspace.

Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/infiniband/core/cma.c |  2 +-
 drivers/infiniband/core/lag.c |  2 +-
 drivers/infiniband/core/roce_gid_mgmt.c   |  6 +-
 drivers/net/bonding/bond_3ad.c|  2 +-
 drivers/net/bonding/bond_main.c   | 58 +--
 drivers/net/bonding/bond_procfs.c |  4 +-
 drivers/net/bonding/bond_sysfs.c  |  8 +--
 .../net/ethernet/mellanox/mlx4/en_netdev.c| 10 ++--
 .../ethernet/netronome/nfp/flower/lag_conf.c  |  2 +-
 .../ethernet/qlogic/netxen/netxen_nic_main.c  |  8 +--
 include/linux/netdevice.h |  8 +--
 include/net/bonding.h |  4 +-
 12 files changed, 58 insertions(+), 56 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index a77750b8954d..3a1679d16e19 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -4753,7 +4753,7 @@ static int cma_netdev_callback(struct notifier_block *self, unsigned long event,
if (event != NETDEV_BONDING_FAILOVER)
return NOTIFY_DONE;
 
-   if (!netif_is_bond_master(ndev))
+   if (!netif_is_bond_dev(ndev))
return NOTIFY_DONE;
 
mutex_lock(&lock);
diff --git a/drivers/infiniband/core/lag.c b/drivers/infiniband/core/lag.c
index 7063e41eaf26..2afaca2f9d0b 100644
--- a/drivers/infiniband/core/lag.c
+++ b/drivers/infiniband/core/lag.c
@@ -128,7 +128,7 @@ struct net_device *rdma_lag_get_ah_roce_slave(struct ib_device *device,
dev_hold(master);
rcu_read_unlock();
 
-   if (!netif_is_bond_master(master))
+   if (!netif_is_bond_dev(master))
goto put;
 
slave = rdma_get_xmit_slave_udp(device, master, ah_attr, flags);
diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
index 6b8364bb032d..e06cf51f1773 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -129,7 +129,7 @@ enum bonding_slave_state {
 static enum bonding_slave_state is_eth_active_slave_of_bonding_rcu(struct net_device *dev,
   struct net_device *upper)
 {
-   if (upper && netif_is_bond_master(upper)) {
+   if (upper && netif_is_bond_dev(upper)) {
struct net_device *pdev =
bond_option_active_slave_get_rcu(netdev_priv(upper));
 
@@ -216,7 +216,7 @@ is_ndev_for_default_gid_filter(struct ib_device *ib_dev, u8 port,
 * make sure that it the upper netdevice of rdma netdevice.
 */
res = ((cookie_ndev == rdma_ndev && !netif_is_bond_slave(rdma_ndev)) ||
-  (netif_is_bond_master(cookie_ndev) &&
+  (netif_is_bond_dev(cookie_ndev) &&
rdma_is_upper_dev_rcu(rdma_ndev, cookie_ndev)));
 
rcu_read_unlock();
@@ -271,7 +271,7 @@ is_upper_ndev_bond_master_filter(struct ib_device *ib_dev, u8 port,
return false;
 
rcu_read_lock();
-   if (netif_is_bond_master(cookie_ndev) &&
+   if (netif_is_bond_dev(cookie_ndev) &&
rdma_is_upper_dev_rcu(rdma_ndev, cookie_ndev))
match = true;
rcu_read_unlock();
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 0eb717b0bfc6..852b9c4f6a47 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2550,7 +2550,7 @@ void bond_3ad_handle_link_change(struct slave *slave, char link)
 }
 
 /**
- * bond_3ad_set_carrier - set link state for bonding master
+ * bond_3ad_set_carrier - set link state for bonding device
  * @bond: bonding structure
  *
  * if we have an active aggregator, we're up, if not, we're down.
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index d79643f6b01e..e9cc7d68f3b9 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -469,8 +469,8 @@ static const struct xfrmdev_ops bond_xfrmdev_ops = {
 
 /*--- Link status ---*/
 
-/* Set the carrier state for the master according to the state of its
- * slaves.  If any slaves are up, the master is up.  In 802.3ad mode,
+/* Set the carrier state for the bond according to the state of its
+ * slaves.  If any slaves are up, the bond is up.  In 802.3ad mode,
  * do special 802.3ad magic.
  *
  * Returns zero if carrier state does not change, nonzero i

Re: [PATCH net-next v3 5/5] bonding: update Documentation for port/bond terminology

2020-10-07 Thread Jarod Wilson

On Wed, Oct 7, 2020 at 2:14 PM Jarod Wilson  wrote:
>
> Swap in port/bond terminology where appropriate, leaving all legacy sysfs
> and procfs interface mentions in place, but marked as deprecated.
> Additionally, add more netlink/iproute2 documentation, and note that this
> is the preferred method of interfacing with the bonding driver. While
> we're at it, also make some mention of NetworkManager's existence.

I neglected to update the ToC and put in some leading #'s that I
should have, so there will be at least one more revision of this set,
but would appreciate feedback on this version of the Documentation
update with respect to preservation of existing interface names,
explaining the changes, and deprecation.

-- 
Jarod Wilson
ja...@redhat.com

[PATCH net-next v3 2/5] bonding: replace use of the term master where possible

2020-10-07 Thread Jarod Wilson

Simply refer to what was the bonding "master" as the "bond" or bonding
device, depending on context. However, do retain compat code for the
bonding_masters sysfs interface to avoid breaking userspace.

Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/infiniband/core/cma.c |  2 +-
 drivers/infiniband/core/lag.c |  2 +-
 drivers/infiniband/core/roce_gid_mgmt.c   |  6 +-
 drivers/net/bonding/bond_3ad.c|  2 +-
 drivers/net/bonding/bond_main.c   | 58 +--
 drivers/net/bonding/bond_procfs.c |  4 +-
 drivers/net/bonding/bond_sysfs.c  |  8 +--
 .../net/ethernet/mellanox/mlx4/en_netdev.c| 10 ++--
 .../ethernet/netronome/nfp/flower/lag_conf.c  |  2 +-
 .../ethernet/qlogic/netxen/netxen_nic_main.c  |  8 +--
 include/linux/netdevice.h |  8 +--
 include/net/bonding.h |  4 +-
 12 files changed, 58 insertions(+), 56 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 7f0e91e92968..fd5ad5139106 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -4687,7 +4687,7 @@ static int cma_netdev_callback(struct notifier_block *self, unsigned long event,
if (event != NETDEV_BONDING_FAILOVER)
return NOTIFY_DONE;
 
-   if (!netif_is_bond_master(ndev))
+   if (!netif_is_bond_dev(ndev))
return NOTIFY_DONE;
 
mutex_lock(&lock);
diff --git a/drivers/infiniband/core/lag.c b/drivers/infiniband/core/lag.c
index 7063e41eaf26..2afaca2f9d0b 100644
--- a/drivers/infiniband/core/lag.c
+++ b/drivers/infiniband/core/lag.c
@@ -128,7 +128,7 @@ struct net_device *rdma_lag_get_ah_roce_slave(struct ib_device *device,
dev_hold(master);
rcu_read_unlock();
 
-   if (!netif_is_bond_master(master))
+   if (!netif_is_bond_dev(master))
goto put;
 
slave = rdma_get_xmit_slave_udp(device, master, ah_attr, flags);
diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
index 2860def84f4d..85c48977be6c 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -129,7 +129,7 @@ enum bonding_slave_state {
 static enum bonding_slave_state is_eth_active_slave_of_bonding_rcu(struct net_device *dev,
   struct net_device *upper)
 {
-   if (upper && netif_is_bond_master(upper)) {
+   if (upper && netif_is_bond_dev(upper)) {
struct net_device *pdev =
bond_option_active_slave_get_rcu(netdev_priv(upper));
 
@@ -216,7 +216,7 @@ is_ndev_for_default_gid_filter(struct ib_device *ib_dev, u8 port,
 * make sure that it the upper netdevice of rdma netdevice.
 */
res = ((cookie_ndev == rdma_ndev && !netif_is_bond_slave(rdma_ndev)) ||
-  (netif_is_bond_master(cookie_ndev) &&
+  (netif_is_bond_dev(cookie_ndev) &&
rdma_is_upper_dev_rcu(rdma_ndev, cookie_ndev)));
 
rcu_read_unlock();
@@ -271,7 +271,7 @@ is_upper_ndev_bond_master_filter(struct ib_device *ib_dev, u8 port,
return false;
 
rcu_read_lock();
-   if (netif_is_bond_master(cookie_ndev) &&
+   if (netif_is_bond_dev(cookie_ndev) &&
rdma_is_upper_dev_rcu(rdma_ndev, cookie_ndev))
match = true;
rcu_read_unlock();
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 0eb717b0bfc6..852b9c4f6a47 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2550,7 +2550,7 @@ void bond_3ad_handle_link_change(struct slave *slave, char link)
 }
 
 /**
- * bond_3ad_set_carrier - set link state for bonding master
+ * bond_3ad_set_carrier - set link state for bonding device
  * @bond: bonding structure
  *
  * if we have an active aggregator, we're up, if not, we're down.
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 28c04a7a5105..405d230b8ea3 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -469,8 +469,8 @@ static const struct xfrmdev_ops bond_xfrmdev_ops = {
 
 /*--- Link status ---*/
 
-/* Set the carrier state for the master according to the state of its
- * slaves.  If any slaves are up, the master is up.  In 802.3ad mode,
+/* Set the carrier state for the bond according to the state of its
+ * slaves.  If any slaves are up, the bond is up.  In 802.3ad mode,
  * do special 802.3ad magic.
  *
  * Returns zero if carrier state does not change, nonzero i

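The two comments touched in this patch describe the carrier rule: outside 802.3ad mode the bond is up if any port is up, and in 802.3ad mode it is up only when there is an active aggregator, with a nonzero return signalling a change. A simplified, hypothetical model follows; `struct bond_model` and `set_bond_carrier()` are illustrative names, and the real `bond_3ad_set_carrier()` applies further conditions (such as the `min_links` option) that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum bond_mode_model { MODE_OTHER, MODE_8023AD };

/* Hypothetical, flattened view of the state the carrier decision needs. */
struct bond_model {
	enum bond_mode_model mode;
	bool carrier;                 /* current carrier state of the bond */
	bool have_active_aggregator;  /* only consulted in 802.3ad mode */
	const bool *port_link_up;     /* link state of each port */
	size_t num_ports;
};

/* Recompute the bond's carrier; returns nonzero if it changed. */
static int set_bond_carrier(struct bond_model *bond)
{
	bool new_carrier = false;
	size_t i;

	assert(bond != NULL);
	if (bond->mode == MODE_8023AD) {
		/* Active aggregator: up. No active aggregator: down. */
		new_carrier = bond->have_active_aggregator;
	} else {
		/* Any port with link keeps the bond up. */
		for (i = 0; i < bond->num_ports; i++) {
			if (bond->port_link_up[i]) {
				new_carrier = true;
				break;
			}
		}
	}

	if (new_carrier == bond->carrier)
		return 0;
	bond->carrier = new_carrier;
	return 1;
}
```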
[PATCH net-next v3 4/5] bonding: rename bonding_sysfs_slave.c to _port.c

2020-10-07 Thread Jarod Wilson

Now that use of "slave" has been replaced by "port", rename this file too.

Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/bonding/Makefile  | 2 +-
 drivers/net/bonding/{bond_sysfs_slave.c => bond_sysfs_port.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename drivers/net/bonding/{bond_sysfs_slave.c => bond_sysfs_port.c} (100%)

diff --git a/drivers/net/bonding/Makefile b/drivers/net/bonding/Makefile
index 30e8ae3da2da..2ed0083514a6 100644
--- a/drivers/net/bonding/Makefile
+++ b/drivers/net/bonding/Makefile
@@ -5,7 +5,7 @@
 
 obj-$(CONFIG_BONDING) += bonding.o
 
-bonding-objs := bond_main.o bond_3ad.o bond_alb.o bond_sysfs.o bond_sysfs_slave.o bond_debugfs.o bond_netlink.o bond_options.o
+bonding-objs := bond_main.o bond_3ad.o bond_alb.o bond_sysfs.o bond_sysfs_port.o bond_debugfs.o bond_netlink.o bond_options.o
 
 proc-$(CONFIG_PROC_FS) += bond_procfs.o
 bonding-objs += $(proc-y)
diff --git a/drivers/net/bonding/bond_sysfs_slave.c b/drivers/net/bonding/bond_sysfs_port.c
similarity index 100%
rename from drivers/net/bonding/bond_sysfs_slave.c
rename to drivers/net/bonding/bond_sysfs_port.c
-- 
2.27.0

[PATCH net-next v3 1/5] bonding: rename 802.3ad's struct port to ad_port

2020-10-07 Thread Jarod Wilson

The intention is to reuse "port" in place of "slave" in the bonding driver
after making this change, as port is consistent with the bridge and team
drivers, and allows us to remove socially problematic language from the
bonding driver.

Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/bonding/bond_3ad.c | 1307 
 drivers/net/bonding/bond_main.c|4 +-
 drivers/net/bonding/bond_netlink.c |6 +-
 drivers/net/bonding/bond_procfs.c  |   36 +-
 drivers/net/bonding/bond_sysfs_slave.c |   10 +-
 include/net/bond_3ad.h |   14 +-
 6 files changed, 688 insertions(+), 689 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index aa001b16765a..0eb717b0bfc6 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -89,60 +89,60 @@ static const u8 lacpdu_mcast_addr[ETH_ALEN + 2] __long_aligned =
MULTICAST_LACPDU_ADDR;
 
 /* = main 802.3ad protocol functions == */
-static int ad_lacpdu_send(struct port *port);
-static int ad_marker_send(struct port *port, struct bond_marker *marker);
-static void ad_mux_machine(struct port *port, bool *update_slave_arr);
-static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port);
-static void ad_tx_machine(struct port *port);
-static void ad_periodic_machine(struct port *port);
-static void ad_port_selection_logic(struct port *port, bool *update_slave_arr);
+static int ad_lacpdu_send(struct ad_port *ad_port);
+static int ad_marker_send(struct ad_port *ad_port, struct bond_marker *marker);
+static void ad_mux_machine(struct ad_port *ad_port, bool *update_slave_arr);
+static void ad_rx_machine(struct lacpdu *lacpdu, struct ad_port *ad_port);
+static void ad_tx_machine(struct ad_port *ad_port);
+static void ad_periodic_machine(struct ad_port *ad_port);
+static void ad_port_selection_logic(struct ad_port *ad_port, bool *update_slave_arr);
 static void ad_agg_selection_logic(struct aggregator *aggregator,
   bool *update_slave_arr);
 static void ad_clear_agg(struct aggregator *aggregator);
 static void ad_initialize_agg(struct aggregator *aggregator);
-static void ad_initialize_port(struct port *port, int lacp_fast);
-static void ad_enable_collecting_distributing(struct port *port,
+static void ad_initialize_port(struct ad_port *ad_port, int lacp_fast);
+static void ad_enable_collecting_distributing(struct ad_port *ad_port,
  bool *update_slave_arr);
-static void ad_disable_collecting_distributing(struct port *port,
+static void ad_disable_collecting_distributing(struct ad_port *ad_port,
   bool *update_slave_arr);
 static void ad_marker_info_received(struct bond_marker *marker_info,
-   struct port *port);
+   struct ad_port *ad_port);
 static void ad_marker_response_received(struct bond_marker *marker,
-   struct port *port);
-static void ad_update_actor_keys(struct port *port, bool reset);
+   struct ad_port *ad_port);
+static void ad_update_actor_keys(struct ad_port *ad_port, bool reset);
 
 
 /* = api to bonding and kernel code == */
 
 /**
- * __get_bond_by_port - get the port's bonding struct
- * @port: the port we're looking at
+ * __get_bond_by_ad_port - get the ad_port's bonding struct
+ * @ad_port: the ad_port we're looking at
  *
- * Return @port's bonding struct, or %NULL if it can't be found.
+ * Return @ad_port's bonding struct, or %NULL if it can't be found.
  */
-static inline struct bonding *__get_bond_by_port(struct port *port)
+static inline struct bonding *__get_bond_by_ad_port(struct ad_port *ad_port)
 {
-   if (port->slave == NULL)
+   if (ad_port->slave == NULL)
return NULL;
 
-   return bond_get_bond_by_slave(port->slave);
+   return bond_get_bond_by_slave(ad_port->slave);
 }
 
 /**
  * __get_first_agg - get the first aggregator in the bond
- * @port: the port we're looking at
+ * @ad_port: the ad_port we're looking at
  *
  * Return the aggregator of the first slave in @bond, or %NULL if it can't be
  * found.
  * The caller must hold RCU or RTNL lock.
  */
-static inline struct aggregator *__get_first_agg(struct port *port)
+static inline struct aggregator *__get_first_agg(struct ad_port *ad_port)
 {
-   struct bonding *bond = __get_bond_by_port(port);
+   struct bonding *bond = __get_bond_by_ad_port(ad_port);
struct slave *first_slave;
struct aggregator *agg;
 
-   /* If there's no bond for thi

[PATCH net-next v3 5/5] bonding: update Documentation for port/bond terminology

2020-10-07 Thread Jarod Wilson

Swap in port/bond terminology where appropriate, leaving all legacy sysfs
and procfs interface mentions in place, but marked as deprecated.
Additionally, add more netlink/iproute2 documentation, and note that this
is the preferred method of interfacing with the bonding driver. While
we're at it, also make some mention of NetworkManager's existence.

Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 Documentation/networking/bonding.rst | 578 ---
 1 file changed, 346 insertions(+), 232 deletions(-)

diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
index adc314639085..a1b3aace600b 100644
--- a/Documentation/networking/bonding.rst
+++ b/Documentation/networking/bonding.rst
@@ -21,6 +21,15 @@ Added Sysfs information: 2006/04/24
 
   - Mitch Williams 
 
+Major terminology rework done late 2020, to start to remove the use of
+the socially problematic terms "master" and "slave" from the code. The
+"master" device is now referred to as simply the "bond" device and the
+"slave" devices as "ports", but all sysfs, procfs and module options
+have been retained as-is for userspace compatibility. The sysfs and
+procfs interfaces are deprecated though, in favor of userspace making
+use of netlink and iproute for any and all bonding configuration and
+information-gathering work.
+
 Introduction
 
 
@@ -167,22 +176,23 @@ or, for backwards compatibility, the option value.  E.g.,
 
 The parameters are as follows:
 
-active_slave
+active_port
+active_slave (DEPRECATED)
 
-   Specifies the new active slave for modes that support it
+   Specifies the new active port for modes that support it
(active-backup, balance-alb and balance-tlb).  Possible values
-   are the name of any currently enslaved interface, or an empty
-   string.  If a name is given, the slave and its link must be up in order
-   to be selected as the new active slave.  If an empty string is
-   specified, the current active slave is cleared, and a new active
-   slave is selected automatically.
+   are the name of any currently aggregated interface, or an empty
+   string.  If a name is given, the port and its link must be up in order
+   to be selected as the new active port.  If an empty string is
+   specified, the current active port is cleared, and a new active
+   port is selected automatically.
 
Note that this is only available through the sysfs interface. No module
parameter by this name exists.
 
The normal value of this option is the name of the currently
-   active slave, or the empty string if there is no active slave or
-   the current mode does not use an active slave.
+   active port, or the empty string if there is no active port or
+   the current mode does not use an active port.
 
 ad_actor_sys_prio
 
@@ -199,8 +209,8 @@ ad_actor_system
protocol packet exchanges (LACPDUs). The value cannot be NULL or
multicast. It is preferred to have the local-admin bit set for this
mac but driver does not enforce it. If the value is not given then
-   system defaults to using the masters' mac address as actors' system
-   address.
+   system defaults to using the bond's mac address as actors'
+   system address.
 
This parameter has effect only in 802.3ad mode and is available through
SysFs interface.
@@ -216,8 +226,8 @@ ad_select
bandwidth.
 
Reselection of the active aggregator occurs only when all
-   slaves of the active aggregator are down or the active
-   aggregator has no slaves.
+   ports of the active aggregator are down or the active
+   aggregator has no ports.
 
This is the default value.
 
@@ -226,18 +236,18 @@ ad_select
The active aggregator is chosen by largest aggregate
bandwidth.  Reselection occurs if:
 
-   - A slave is added to or removed from the bond
+   - A port is added to or removed from the bond
 
-   - Any slave's link state changes
+   - Any port's link state changes
 
-   - Any slave's 802.3ad association state changes
+   - Any port's 802.3ad association state changes
 
- The bond's administrative state changes to up
 
count or 2
 
The active aggregator is chosen by the largest number of
-   ports (slaves).  Reselection occurs as described under the
+   ports.  Reselection occurs as described under the
"bandwidth" setting, above.
 
The bandwidth

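The `ad_select` policies documented above differ mainly in the metric used to rank aggregators: aggregate bandwidth for `stable` and `bandwidth`, number of ports for `count`. The sketch below models only that ranking step; `struct agg_model` and `pick_aggregator()` are hypothetical names, and it does not model when reselection is triggered (the point on which `stable` and `bandwidth` differ) or the additional tie-breaking the driver's aggregator selection logic performs.

```c
#include <assert.h>
#include <stddef.h>

/* Numeric values follow the documented option values "bandwidth or 1"
 * and "count or 2". */
enum ad_select_model { AD_SELECT_BANDWIDTH = 1, AD_SELECT_COUNT = 2 };

/* Hypothetical summary of one aggregator. */
struct agg_model {
	unsigned int num_ports;       /* active ports in the aggregator */
	unsigned int bandwidth_mbps;  /* sum of those ports' speeds */
};

static unsigned int agg_score(const struct agg_model *agg,
			      enum ad_select_model policy)
{
	return policy == AD_SELECT_COUNT ? agg->num_ports : agg->bandwidth_mbps;
}

/* Returns the index of the best aggregator, or -1 if none has ports.
 * On a tie the earlier aggregator is kept. */
static int pick_aggregator(const struct agg_model *aggs, size_t n,
			   enum ad_select_model policy)
{
	int best = -1;
	size_t i;

	assert(aggs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (aggs[i].num_ports == 0)
			continue;
		if (best < 0 ||
		    agg_score(&aggs[i], policy) > agg_score(&aggs[best], policy))
			best = (int)i;
	}
	return best;
}
```

With one aggregator of four 1 Gb/s ports and another of a single 10 Gb/s port, `bandwidth` picks the 10 Gb/s aggregator while `count` picks the four-port one.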
[PATCH net-next v3 0/5] bonding: rename bond components

2020-10-07 Thread Jarod Wilson

The bonding driver's use of master and slave, while largely understood
in technical circles, poses a barrier for inclusion to some potential
members of the development and user community, due to the historical
context of masters and slaves, particularly in the United States. This
is a first full pass at replacing those phrases with more socially
inclusive ones, opting for bond to replace master and port to
replace slave, which is congruent with the bridge and team drivers.

There are a few problems with this change. First up, "port" is used in
the bonding 802.3ad code, so the first step here is to rename port to
ad_port, so we can reuse port. Second, we have the issue of not wanting
to break any existing userspace, which I believe this patchset
accomplishes, preserving all existing sysfs and procfs interfaces, and
adding module parameter aliases where necessary.

Third, we do still have the issue of ease of backporting fixes to
-stable trees. I've not had a huge amount of time to spend on it, but
brief forays into coccinelle didn't really pay off (since it's meant to
operate on code, not patches), and the best solution I can come up with
is providing a shell script someone could run over git-format-patch
output before git-am'ing the result to a -stable tree, though scripting
these changes in the first place turned out to be not the best thing to
do anyway, due to subtle cases where use of master or slave can NOT yet
be replaced, so a large amount of work was done by hand, inspection,
trial and error, which is why this set is a lot longer in coming than
I'd originally hoped. I don't expect -stable backports to be horrible to
figure out one way or another though, and I don't believe that a bit of
inconvenience on that front is enough to warrant not making these
changes.

See here for further details on Red Hat's commitment to this work:
https://www.redhat.com/en/blog/making-open-source-more-inclusive-eradicating-problematic-language

As far as testing goes, I've manually operated on various bonds while
working on this code, and have run it through multiple lnst test runs,
which exercise the existing sysfs interfaces fairly extensively. As far
as I can tell through testing and inspection, there is no breakage of
any existing interfaces with this set.

v2: legacy module parameters are retained this time, and we're trying
out bond/port instead of aggregator/link in place of master/slave. The
procfs interface legacy output is also duplicated or dropped, depending
on Kconfig, rather than being replaced.

v3: remove Kconfig knob, leave sysfs and procfs interfaces entirely
untouched, but update documentation to reference their deprecated
nature, explain the name changes, add references to NetworkManager,
include more netlink/iproute2 examples and make note of netlink
being the preferred interface for userspace interaction with bonds.

Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org

Jarod Wilson (5):
  bonding: rename 802.3ad's struct port to ad_port
  bonding: replace use of the term master where possible
  bonding: rename slave to port where possible
  bonding: rename bonding_sysfs_slave.c to _port.c
  bonding: update Documentation for port/bond terminology

 .clang-format                                 |    4 +-
 Documentation/networking/bonding.rst          |  578 ++--
 drivers/infiniband/core/cma.c                 |    2 +-
 drivers/infiniband/core/lag.c                 |    2 +-
 drivers/infiniband/core/roce_gid_mgmt.c       |   10 +-
 drivers/infiniband/hw/mlx4/main.c             |    2 +-
 drivers/net/bonding/Makefile                  |    2 +-
 drivers/net/bonding/bond_3ad.c                | 1701 ++--
 drivers/net/bonding/bond_alb.c                |  689 ++---
 drivers/net/bonding/bond_debugfs.c            |    2 +-
 drivers/net/bonding/bond_main.c               | 2339 +
 drivers/net/bonding/bond_netlink.c            |  114 +-
 drivers/net/bonding/bond_options.c            |  258 +-
 drivers/net/bonding/bond_procfs.c             |   86 +-
 drivers/net/bonding/bond_sysfs.c              |   78 +-
 drivers/net/bonding/bond_sysfs_port.c         |  185 ++
 drivers/net/bonding/bond_sysfs_slave.c        |  176 --
 .../ethernet/chelsio/cxgb3/cxgb3_offload.c    |    2 +-
 .../net/ethernet/mellanox/mlx4/en_netdev.c    |   14 +-
 .../ethernet/mellanox/mlx5/core/en/rep/bond.c |    4 +-
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   |    2 +-
 .../ethernet/netronome/nfp/flower/lag_conf.c  |    2 +-
 .../ethernet/qlogic/netxen/netxen_nic_main.c  |   12 +-
 include/linux/netdevice.h                     |   22 +-
 include/net/bond_3ad.h                        |   42 +-
 include/net/bond_alb.h                        |   74 +-
 include/net/bond_options.h                    |   18 +-
 include/net/bonding.h

Re: [PATCH net-next v2 6/6] bonding: make Kconfig toggle to disable legacy interfaces

2020-10-03 Thread Jarod Wilson

On Fri, Oct 2, 2020 at 6:42 PM Stephen Hemminger
 wrote:
>
> On Fri, 2 Oct 2020 16:23:46 -0400
> Jarod Wilson  wrote:
>
> > On Fri, Oct 2, 2020 at 3:13 PM Stephen Hemminger
> >  wrote:
> > >
> > > On Fri,  2 Oct 2020 13:40:01 -0400
> > > Jarod Wilson  wrote:
> > >
> > > > By default, enable retaining all user-facing API that includes the use of
> > > > master and slave, but add a Kconfig knob that allows those that wish to
> > > > remove it entirely do so in one shot.
...
> > > This is problematic. You are printing both old and new values.
> > > Also every distribution will have to enable it.
> > >
> > > This looks like too much of change to users.
> >
> > I'd had a bit of feedback that people would rather see both, and be
> > able to toggle off the old ones, rather than only having one or the
> > other, depending on the toggle, so I thought I'd give this a try. I
> > kind of liked the one or the other route, but I see the problems with
> > that too.
> >
> > For simplicity, I'm kind of liking the idea of just not updating the
> > proc and sysfs interfaces, have a toggle entirely disable them, and
> > work on enhancing userspace to only use netlink, but ... it's going to
> > be a while before any such work makes its way to any already shipping
> > distros. I don't have a satisfying answer here.
> >
>
> I like the idea of having bonding proc and sysf apis optional.

I do too, but I'd see it more as something only userspace developers
would care about for a while, as an easy way to make absolutely
certain their code/distro is no longer reliant on them and only uses
netlink, not as something any normal user really has any reason to do.

-- 
Jarod Wilson
ja...@redhat.com

Re: [PATCH net-next v2 6/6] bonding: make Kconfig toggle to disable legacy interfaces

2020-10-03 Thread Jarod Wilson

On Fri, Oct 2, 2020 at 6:57 PM David Miller  wrote:
>
> From: Jarod Wilson 
> Date: Fri, 2 Oct 2020 16:23:46 -0400
>
> > I'd had a bit of feedback that people would rather see both, and be
> > able to toggle off the old ones, rather than only having one or the
> > other, depending on the toggle, so I thought I'd give this a try. I
> > kind of liked the one or the other route, but I see the problems with
> > that too.
>
> Please keep everything for the entire deprecation period, unconditionally.

Okay, so 100% drop the Kconfig flag patch, but duplicate sysfs
interface names are acceptable, correct? Then what about the procfs
file having duplicate Slave and Port lines? Just leave them all as
Slave?

-- 
Jarod Wilson
ja...@redhat.com

Re: [PATCH net-next v2 5/6] bonding: update Documentation for port/bond terminology

2020-10-03 Thread Jarod Wilson

On Fri, Oct 2, 2020 at 6:55 PM David Miller  wrote:
>
> From: Jarod Wilson 
> Date: Fri, 2 Oct 2020 16:12:49 -0400
>
> > The documentation was updated to point to the new names, but the old
> > ones still exist across the board, there should be no userspace
> > breakage here. (My lnst bonding tests actually fall flat currently
> > if the old names are gone).
>
> The documentation is the reference point for people reading code in
> userspace that manipulates bonding devices.
>
> So people will come across the deprecated names in userland code and
> therefore will try to learn what they do and what they mean.
>
> Which means that the documentation must reference the old names.
>
> You can mark them "(DEPRECATED)" or similar, but you must not remove
> them.

Okay, so it sounds like just a blurb near the top of the file
referencing the changes that have been made in the code might be the
way to go here. Tagging every occurrence of master or slave in the doc
inline as deprecated would get ... noisy.

-- 
Jarod Wilson
ja...@redhat.com

Re: [PATCH net-next v2 6/6] bonding: make Kconfig toggle to disable legacy interfaces

2020-10-02 Thread Jarod Wilson

On Fri, Oct 2, 2020 at 3:13 PM Stephen Hemminger
 wrote:
>
> On Fri,  2 Oct 2020 13:40:01 -0400
> Jarod Wilson  wrote:
>
> > By default, enable retaining all user-facing API that includes the use of
> > master and slave, but add a Kconfig knob that allows those that wish to
> > remove it entirely do so in one shot.
> >
> > Cc: Jay Vosburgh 
> > Cc: Veaceslav Falico 
> > Cc: Andy Gospodarek 
> > Cc: "David S. Miller" 
> > Cc: Jakub Kicinski 
> > Cc: Thomas Davis 
> > Cc: net...@vger.kernel.org
> > Signed-off-by: Jarod Wilson 
> > ---
> >  drivers/net/Kconfig   | 12 
> >  drivers/net/bonding/bond_main.c   |  4 ++--
> >  drivers/net/bonding/bond_options.c|  4 ++--
> >  drivers/net/bonding/bond_procfs.c |  8 
> >  drivers/net/bonding/bond_sysfs.c  | 14 ++
> >  drivers/net/bonding/bond_sysfs_port.c |  6 --
> >  6 files changed, 38 insertions(+), 10 deletions(-)
> >
>
> This is problematic. You are printing both old and new values.
> Also every distribution will have to enable it.
>
> This looks like too much of change to users.

I'd had a bit of feedback that people would rather see both, and be
able to toggle off the old ones, rather than only having one or the
other, depending on the toggle, so I thought I'd give this a try. I
kind of liked the one or the other route, but I see the problems with
that too.

For simplicity, I'm kind of liking the idea of just not updating the
proc and sysfs interfaces, have a toggle entirely disable them, and
work on enhancing userspace to only use netlink, but ... it's going to
be a while before any such work makes its way to any already shipping
distros. I don't have a satisfying answer here.

-- 
Jarod Wilson
ja...@redhat.com

Re: [PATCH net-next v2 5/6] bonding: update Documentation for port/bond terminology

2020-10-02 Thread Jarod Wilson

On Fri, Oct 2, 2020 at 2:09 PM Andrew Lunn  wrote:
>
> On Fri, Oct 02, 2020 at 01:40:00PM -0400, Jarod Wilson wrote:
> > Point users to the new interface names instead of the old ones, where
> > appropriate. Userspace bits referenced still include use of master/slave,
> > but those can't be altered until userspace changes too, ideally after
> > these changes propagate to the community at large.
> >
> > Cc: Jay Vosburgh 
> > Cc: Veaceslav Falico 
> > Cc: Andy Gospodarek 
> > Cc: "David S. Miller" 
> > Cc: Jakub Kicinski 
> > Cc: Thomas Davis 
> > Cc: net...@vger.kernel.org
> > Signed-off-by: Jarod Wilson 
> > ---
> >  Documentation/networking/bonding.rst | 440 +--
> >  1 file changed, 220 insertions(+), 220 deletions(-)
> >
> > diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
> > index adc314639085..f4c4f0fae83b 100644
> > --- a/Documentation/networking/bonding.rst
> > +++ b/Documentation/networking/bonding.rst
> > @@ -167,22 +167,22 @@ or, for backwards compatibility, the option value.  E.g.,
> >
> >  The parameters are as follows:
> >
> > -active_slave
> > +active_port
>
> Hi Jarod
>
> It is going to take quite a while before all distributions user space
> gets updated. So todays API is going to live on for a few
> years. People are going to be search the documentation using the terms
> their user space uses, which are going to be todays terms, not the new
> ones you are introducing here. For that to work, i think you are going
> to have to introduce a table listing todays names and the new names
> you are adding, so search engines have some chance of finding this
> document, and readers have some clue as to how to translate from what
> their user space is using to the terms used in the document.

Hm. Would a simple blurb at the top of bonding.rst describing when the
changes were made, and why, be sufficient? And then would the rest of
the doc remain as-is (old master/slave language), or be converted to
the new terminology?

-- 
Jarod Wilson
ja...@redhat.com

Re: [PATCH net-next v2 5/6] bonding: update Documentation for port/bond terminology

2020-10-02 Thread Jarod Wilson

On Fri, Oct 2, 2020 at 3:11 PM Stephen Hemminger
 wrote:
>
> On Fri,  2 Oct 2020 13:40:00 -0400
> Jarod Wilson  wrote:
>
> > @@ -265,7 +265,7 @@ ad_user_port_key
> >   This parameter has effect only in 802.3ad mode and is available through
> >   SysFs interface.
> >
> > -all_slaves_active
> > +all_ports_active
>
> You can change internal variable names, comments, and documentation all you want, thats great.
>
> But you can't change user API, that includes:
>* definitions in uapi header
>* module parameters
>* sysfs file names or outputs

All of those are retained by default in this set. There are zero
changes to the if_bonding.h uapi header, module parameters with 'port'
in them have duplicates using the old terminology, and all sysfs file
names are duplicated (or aliased) as well. The documentation was
updated to point to the new names, but the old ones still exist across
the board, so there should be no userspace breakage here. (My lnst
bonding tests actually fall flat currently if the old names are gone.)

-- 
Jarod Wilson
ja...@redhat.com

[PATCH net-next v2 5/6] bonding: update Documentation for port/bond terminology

2020-10-02 Thread Jarod Wilson

Point users to the new interface names instead of the old ones, where
appropriate. Userspace bits referenced still include use of master/slave,
but those can't be altered until userspace changes too, ideally after
these changes propagate to the community at large.

Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 Documentation/networking/bonding.rst | 440 +--
 1 file changed, 220 insertions(+), 220 deletions(-)

diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
index adc314639085..f4c4f0fae83b 100644
--- a/Documentation/networking/bonding.rst
+++ b/Documentation/networking/bonding.rst
@@ -167,22 +167,22 @@ or, for backwards compatibility, the option value.  E.g.,
 
 The parameters are as follows:
 
-active_slave
+active_port
 
-   Specifies the new active slave for modes that support it
+   Specifies the new active port for modes that support it
(active-backup, balance-alb and balance-tlb).  Possible values
-   are the name of any currently enslaved interface, or an empty
-   string.  If a name is given, the slave and its link must be up in order
-   to be selected as the new active slave.  If an empty string is
-   specified, the current active slave is cleared, and a new active
-   slave is selected automatically.
+   are the name of any currently aggregated interface, or an empty
+   string.  If a name is given, the port and its link must be up in order
+   to be selected as the new active port.  If an empty string is
+   specified, the current active port is cleared, and a new active
+   port is selected automatically.
 
Note that this is only available through the sysfs interface. No module
parameter by this name exists.
 
The normal value of this option is the name of the currently
-   active slave, or the empty string if there is no active slave or
-   the current mode does not use an active slave.
+   active port, or the empty string if there is no active port or
+   the current mode does not use an active port.
 
 ad_actor_sys_prio
 
@@ -199,8 +199,8 @@ ad_actor_system
protocol packet exchanges (LACPDUs). The value cannot be NULL or
multicast. It is preferred to have the local-admin bit set for this
mac but driver does not enforce it. If the value is not given then
-   system defaults to using the masters' mac address as actors' system
-   address.
+   system defaults to using the bond's mac address as the actor's
+   system address.
 
This parameter has effect only in 802.3ad mode and is available through
SysFs interface.
@@ -216,8 +216,8 @@ ad_select
bandwidth.
 
Reselection of the active aggregator occurs only when all
-   slaves of the active aggregator are down or the active
-   aggregator has no slaves.
+   ports of the active aggregator are down or the active
+   aggregator has no ports.
 
This is the default value.
 
@@ -226,18 +226,18 @@ ad_select
The active aggregator is chosen by largest aggregate
bandwidth.  Reselection occurs if:
 
-   - A slave is added to or removed from the bond
+   - A port is added to or removed from the bond
 
-   - Any slave's link state changes
+   - Any port's link state changes
 
-   - Any slave's 802.3ad association state changes
+   - Any port's 802.3ad association state changes
 
- The bond's administrative state changes to up
 
count or 2
 
The active aggregator is chosen by the largest number of
-   ports (slaves).  Reselection occurs as described under the
+   ports.  Reselection occurs as described under the
"bandwidth" setting, above.
 
The bandwidth and count selection policies permit failover of
@@ -265,7 +265,7 @@ ad_user_port_key
This parameter has effect only in 802.3ad mode and is available through
SysFs interface.
 
-all_slaves_active
+all_ports_active
 
Specifies that duplicate frames (received on inactive ports) should be
dropped (0) or delivered (1).
@@ -281,10 +281,10 @@ arp_interval
 
Specifies the ARP link monitoring frequency in milliseconds.
 
-   The ARP monitor works by periodically checking the slave
+   The ARP monitor works by periodically checking the port
devices to determine whether they have sent or received
traffic recently (the precise criteria depends upon the
-   bonding mode, and the state of the slave).  Regular traffic is
+   bonding mode, and the st

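The "bandwidth" and "count" ad_select policies documented in this patch can be illustrated with a simplified sketch. It assumes a hypothetical struct agg that holds only a port count and a summed bandwidth, ignores the "stable" policy and the reselection triggers, and is not the driver's ad_agg_selection_logic(), which weighs additional 802.3ad state. The enum values follow the documented numeric option values.

```c
#include <stddef.h>

/* Hypothetical, simplified view of an 802.3ad aggregator. */
struct agg {
	int id;
	unsigned int num_ports;      /* ports currently attached */
	unsigned int bandwidth_mbps; /* sum of the attached ports' speeds */
};

enum ad_select_policy { AD_SELECT_BANDWIDTH = 1, AD_SELECT_COUNT = 2 };

/* Return the index of the preferred aggregator, or -1 if none has ports. */
static int pick_aggregator(const struct agg *aggs, size_t n,
			   enum ad_select_policy policy)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (aggs[i].num_ports == 0)
			continue;	/* an empty aggregator cannot carry traffic */
		if (best < 0) {
			best = (int)i;
			continue;
		}
		unsigned int cand = policy == AD_SELECT_COUNT ?
			aggs[i].num_ports : aggs[i].bandwidth_mbps;
		unsigned int cur = policy == AD_SELECT_COUNT ?
			aggs[best].num_ports : aggs[best].bandwidth_mbps;
		if (cand > cur)
			best = (int)i;	/* ties keep the earlier aggregator */
	}
	return best;
}
```

With three 1G ports in one aggregator and a single 10G port in another, "bandwidth" prefers the 10G aggregator while "count" prefers the three-port one.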
[PATCH net-next v2 4/6] bonding: rename bonding_sysfs_slave.c to _port.c

2020-10-02 Thread Jarod Wilson

Now that use of "slave" has been replaced by "port", rename this file too.

Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/bonding/Makefile  | 2 +-
 drivers/net/bonding/{bond_sysfs_slave.c => bond_sysfs_port.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename drivers/net/bonding/{bond_sysfs_slave.c => bond_sysfs_port.c} (100%)

diff --git a/drivers/net/bonding/Makefile b/drivers/net/bonding/Makefile
index 30e8ae3da2da..2ed0083514a6 100644
--- a/drivers/net/bonding/Makefile
+++ b/drivers/net/bonding/Makefile
@@ -5,7 +5,7 @@
 
 obj-$(CONFIG_BONDING) += bonding.o
 
-bonding-objs := bond_main.o bond_3ad.o bond_alb.o bond_sysfs.o bond_sysfs_slave.o bond_debugfs.o bond_netlink.o bond_options.o
+bonding-objs := bond_main.o bond_3ad.o bond_alb.o bond_sysfs.o bond_sysfs_port.o bond_debugfs.o bond_netlink.o bond_options.o
 
 proc-$(CONFIG_PROC_FS) += bond_procfs.o
 bonding-objs += $(proc-y)
diff --git a/drivers/net/bonding/bond_sysfs_slave.c b/drivers/net/bonding/bond_sysfs_port.c
similarity index 100%
rename from drivers/net/bonding/bond_sysfs_slave.c
rename to drivers/net/bonding/bond_sysfs_port.c
-- 
2.27.0

[PATCH net-next v2 6/6] bonding: make Kconfig toggle to disable legacy interfaces

2020-10-02 Thread Jarod Wilson

By default, retain all user-facing API that includes the use of
master and slave, but add a Kconfig knob that allows those who wish to
remove it entirely to do so in one shot.

Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/Kconfig   | 12 
 drivers/net/bonding/bond_main.c   |  4 ++--
 drivers/net/bonding/bond_options.c|  4 ++--
 drivers/net/bonding/bond_procfs.c |  8 
 drivers/net/bonding/bond_sysfs.c  | 14 ++
 drivers/net/bonding/bond_sysfs_port.c |  6 --
 6 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index c3dbe64e628e..1a13894820cb 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -56,6 +56,18 @@ config BONDING
  To compile this driver as a module, choose M here: the module
  will be called bonding.
 
+config BONDING_LEGACY_INTERFACES
+   default y
+   bool "Maintain legacy bonding interface names"
+   help
+ The bonding driver historically made use of the terms "master" and
+ "slave" to describe its component members. This has since been
+ changed to "bond" and "port" as part of a broader effort to remove
+ the use of socially problematic language from the kernel. However,
+ removing all such cases requires breaking long-standing user-facing
+ interfaces in /proc and /sys, which will not be done, unless you
+ opt out of them here, by selecting 'N'.
+
 config DUMMY
tristate "Dummy net driver support"
help
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index b8a351d85da4..226d5fb76221 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -194,7 +194,7 @@ module_param(lp_interval, uint, 0);
 MODULE_PARM_DESC(lp_interval, "The number of seconds between instances where "
  "the bonding driver sends learning packets to "
  "each port's peer switch. The default is 1.");
-/* legacy compatability module parameters */
+#ifdef CONFIG_BONDING_LEGACY_INTERFACES
 module_param_named(all_slaves_active, apa, int, 0644);
 MODULE_PARM_DESC(all_slaves_active, "Keep all frames received on an interface "
 "by setting active flag for all slaves; "
@@ -205,7 +205,7 @@ MODULE_PARM_DESC(packets_per_slave, "Packets to send per slave in balance-rr "
"mode; 0 for a random slave, 1 packet per "
"slave (default), >1 packets per slave. "
   "(Legacy compat synonym for packets_per_port).");
-/* end legacy compatability module parameters */
+#endif
 
 /*- Global variables */
 
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 8e4050c2b08e..630079ba5452 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -434,7 +434,7 @@ static const struct bond_option bond_opts[BOND_OPT_LAST] = {
.values = bond_intmax_tbl,
.set = bond_option_peer_notif_delay_set
},
-/* legacy sysfs interfaces */
+#ifdef CONFIG_BONDING_LEGACY_INTERFACES
[BOND_OPT_PACKETS_PER_SLAVE] = {
.id = BOND_OPT_PACKETS_PER_SLAVE,
.name = "packets_per_slave",
@@ -467,7 +467,7 @@ static const struct bond_option bond_opts[BOND_OPT_LAST] = {
.flags = BOND_OPTFLAG_RAWVAL,
.set = bond_option_ports_set
},
-/* end legacy sysfs interfaces */
+#endif
 };
 
 /* Searches for an option by name */
diff --git a/drivers/net/bonding/bond_procfs.c b/drivers/net/bonding/bond_procfs.c
index 2e65472e3c58..8e4a03d86329 100644
--- a/drivers/net/bonding/bond_procfs.c
+++ b/drivers/net/bonding/bond_procfs.c
@@ -86,8 +86,10 @@ static void bond_info_show_bond_dev(struct seq_file *seq)
primary = rcu_dereference(bond->primary_port);
seq_printf(seq, "Primary Port: %s",
   primary ? primary->dev->name : "None");
+#ifdef CONFIG_BONDING_LEGACY_INTERFACES
seq_printf(seq, "Primary Slave: %s",
   primary ? primary->dev->name : "None");
+#endif
if (primary) {
optval = bond_opt_get_val(BOND_OPT_PRIMARY_RESELECT,
					  bond->params.primary_reselect);
@@ -97,8 +99,10 @@ static void bond_inf

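The compile-time switch in this patch keeps the legacy procfs line alongside the new one. A minimal userspace sketch of that pattern follows; the LEGACY_INTERFACES macro is a hypothetical stand-in for CONFIG_BONDING_LEGACY_INTERFACES, and show_primary() is a hypothetical helper that writes into a plain buffer rather than a seq_file.

```c
#include <stdio.h>

/* Stand-in for CONFIG_BONDING_LEGACY_INTERFACES; defaults to on here. */
#ifndef LEGACY_INTERFACES
#define LEGACY_INTERFACES 1
#endif

/*
 * Write the "Primary Port" line (and, if enabled, the legacy
 * "Primary Slave" line) into buf. Returns the snprintf-style length;
 * the legacy line is only appended if the first line fit.
 */
static int show_primary(char *buf, size_t len, const char *primary)
{
	const char *name = primary ? primary : "None";
	int n = snprintf(buf, len, "Primary Port: %s\n", name);

#if LEGACY_INTERFACES
	if (n >= 0 && (size_t)n < len)
		n += snprintf(buf + n, len - (size_t)n,
			      "Primary Slave: %s\n", name);
#endif
	return n;
}
```

Building with -DLEGACY_INTERFACES=0 drops the second line, mirroring the opt-out the Kconfig help text describes; each line is newline-terminated so the old and new fields stay on separate lines.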
[PATCH net-next v2 2/6] bonding: replace use of the term master where possible

2020-10-02 Thread Jarod Wilson

Simply refer to what was the bonding "master" as the "bond" or bonding
device, depending on context. However, do retain compat code for the
bonding_masters sysfs interface to avoid breaking userspace.

Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/infiniband/core/cma.c |   2 +-
 drivers/infiniband/core/lag.c |   2 +-
 drivers/infiniband/core/roce_gid_mgmt.c   |   6 +-
 drivers/net/bonding/bond_3ad.c|   2 +-
 drivers/net/bonding/bond_main.c   |  58 
 drivers/net/bonding/bond_procfs.c |   4 +-
 drivers/net/bonding/bond_sysfs.c  | 135 ++
 .../net/ethernet/mellanox/mlx4/en_netdev.c|  10 +-
 .../ethernet/netronome/nfp/flower/lag_conf.c  |   2 +-
 .../ethernet/qlogic/netxen/netxen_nic_main.c  |   8 +-
 include/linux/netdevice.h |   8 +-
 include/net/bonding.h |   4 +-
 12 files changed, 158 insertions(+), 83 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 7f0e91e92968..fd5ad5139106 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -4687,7 +4687,7 @@ static int cma_netdev_callback(struct notifier_block *self, unsigned long event,
if (event != NETDEV_BONDING_FAILOVER)
return NOTIFY_DONE;
 
-   if (!netif_is_bond_master(ndev))
+   if (!netif_is_bond_dev(ndev))
return NOTIFY_DONE;
 
mutex_lock(&lock);
diff --git a/drivers/infiniband/core/lag.c b/drivers/infiniband/core/lag.c
index 7063e41eaf26..2afaca2f9d0b 100644
--- a/drivers/infiniband/core/lag.c
+++ b/drivers/infiniband/core/lag.c
@@ -128,7 +128,7 @@ struct net_device *rdma_lag_get_ah_roce_slave(struct ib_device *device,
dev_hold(master);
rcu_read_unlock();
 
-   if (!netif_is_bond_master(master))
+   if (!netif_is_bond_dev(master))
goto put;
 
slave = rdma_get_xmit_slave_udp(device, master, ah_attr, flags);
diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
index 2860def84f4d..85c48977be6c 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -129,7 +129,7 @@ enum bonding_slave_state {
 static enum bonding_slave_state is_eth_active_slave_of_bonding_rcu(struct net_device *dev,
								   struct net_device *upper)
 {
-   if (upper && netif_is_bond_master(upper)) {
+   if (upper && netif_is_bond_dev(upper)) {
struct net_device *pdev =
bond_option_active_slave_get_rcu(netdev_priv(upper));
 
@@ -216,7 +216,7 @@ is_ndev_for_default_gid_filter(struct ib_device *ib_dev, u8 port,
 * make sure that it the upper netdevice of rdma netdevice.
 */
res = ((cookie_ndev == rdma_ndev && !netif_is_bond_slave(rdma_ndev)) ||
-  (netif_is_bond_master(cookie_ndev) &&
+  (netif_is_bond_dev(cookie_ndev) &&
rdma_is_upper_dev_rcu(rdma_ndev, cookie_ndev)));
 
rcu_read_unlock();
@@ -271,7 +271,7 @@ is_upper_ndev_bond_master_filter(struct ib_device *ib_dev, u8 port,
return false;
 
rcu_read_lock();
-   if (netif_is_bond_master(cookie_ndev) &&
+   if (netif_is_bond_dev(cookie_ndev) &&
rdma_is_upper_dev_rcu(rdma_ndev, cookie_ndev))
match = true;
rcu_read_unlock();
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 0eb717b0bfc6..852b9c4f6a47 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2550,7 +2550,7 @@ void bond_3ad_handle_link_change(struct slave *slave, char link)
 }
 
 /**
- * bond_3ad_set_carrier - set link state for bonding master
+ * bond_3ad_set_carrier - set link state for bonding device
  * @bond: bonding structure
  *
  * if we have an active aggregator, we're up, if not, we're down.
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 28c04a7a5105..405d230b8ea3 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -469,8 +469,8 @@ static const struct xfrmdev_ops bond_xfrmdev_ops = {
 
 /*--- Link status ---*/
 
-/* Set the carrier state for the master according to the state of its
- * slaves.  If any slaves are up, the master is up.  In 802.3ad mode,
+/* Set the carrier state for the bond according to the state of its
+ * slaves.  If any slaves are up, the bond is up.  In 802.3ad mode,
  * do special 802.3ad magic.
  *
  * Returns zero if carrier state does not change, nonzero i

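One way to rename a predicate such as netif_is_bond_master() without breaking other callers is to keep the old name as a thin wrapper around the new one. The sketch below is hypothetical: struct sk_net_device and the SK_IFF_* bits are made-up stand-ins, modelled on the IFF_MASTER/IFF_BONDING test that netif_is_bond_master() performs in include/linux/netdevice.h, and the patch itself may handle the old name differently.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this sketch only. */
#define SK_IFF_MASTER	0x1u	/* in flags: device has lower devices attached */
#define SK_IFF_SLAVE	0x2u	/* in flags: device is attached to an upper device */
#define SK_IFF_BONDING	0x4u	/* in priv_flags: device is part of a bonding setup */

struct sk_net_device {
	unsigned int flags;
	unsigned int priv_flags;
};

/* New name: true only for the bond device itself, not for its ports. */
static inline bool netif_is_bond_dev(const struct sk_net_device *dev)
{
	return (dev->flags & SK_IFF_MASTER) && (dev->priv_flags & SK_IFF_BONDING);
}

/* Old name kept as a wrapper so existing callers still compile. */
static inline bool netif_is_bond_master(const struct sk_net_device *dev)
{
	return netif_is_bond_dev(dev);
}
```

A port carries the bonding flag but not the upper-device flag, and a non-bonding upper device (a team, say) carries the reverse, so only the bond itself matches.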
[PATCH net-next v2 0/6] bonding: rename bond components

2020-10-02 Thread Jarod Wilson

The bonding driver's use of master and slave, while largely understood
in technical circles, poses a barrier for inclusion to some potential
members of the development and user community, due to the historical
context of masters and slaves, particularly in the United States. This
is a first full pass at replacing those phrases with more socially
inclusive ones, opting for bond to replace master and port to
replace slave, which is congruent with the bridge and team drivers.

There are a few problems with this change. First up, "port" is used in
the bonding 802.3ad code, so the first step here is to rename port to
ad_port, so we can reuse port. Second, we have the issue of not wanting
to break any existing userspace, which I believe this patchset
accomplishes, while also adding alternate interfaces using the new
terminology. This set also includes a Kconfig option that will let
people make the conscious decision to break userspace and no longer
expose the original master/slave interfaces, once their userspace is
able to cope with their removal.

Lastly, we do still have the issue of ease of backporting fixes to
-stable trees. I've not had a huge amount of time to spend on it, but
brief forays into coccinelle didn't really pay off (since it's meant to
operate on code, not patches). The best solution I can come up with is
providing a shell script someone could run over git-format-patch output
before git-am'ing the result to a -stable tree. Scripting these changes
in the first place turned out not to be the best approach anyway, due
to subtle cases where use of master or slave can NOT yet be replaced,
so a large amount of the work was done by hand, through inspection,
trial and error, which is why this set took a lot longer than I'd
originally hoped. I don't expect -stable backports to be horrible to
figure out one way or another, though, and I don't believe that a bit
of inconvenience on that front is enough to warrant not making these
changes.

See here for further details on Red Hat's commitment to this work:
https://www.redhat.com/en/blog/making-open-source-more-inclusive-eradicating-problematic-language

As far as testing goes, I've manually operated on various bonds while
working on this code, and have run it through multiple lnst test runs,
which exercise the existing sysfs interfaces fairly extensively. As far
as I can tell, there is no breakage of existing interfaces with this
set, unless the user consciously opts to do so via Kconfig.

v2: legacy module parameters are retained this time, and we're trying
out bond/port instead of aggregator/link in place of master/slave. The
procfs interface legacy output is also duplicated or dropped, depending
on Kconfig, rather than being replaced.

Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org

Jarod Wilson (6):
  bonding: rename 802.3ad's struct port to ad_port
  bonding: replace use of the term master where possible
  bonding: rename slave to port where possible
  bonding: rename bonding_sysfs_slave.c to _port.c
  bonding: update Documentation for port/bond terminology
  bonding: make Kconfig toggle to disable legacy interfaces

 .clang-format                                 |    4 +-
 Documentation/networking/bonding.rst          |  440 ++--
 drivers/infiniband/core/cma.c                 |    2 +-
 drivers/infiniband/core/lag.c                 |    2 +-
 drivers/infiniband/core/roce_gid_mgmt.c       |   10 +-
 drivers/infiniband/hw/mlx4/main.c             |    2 +-
 drivers/net/Kconfig                           |   12 +
 drivers/net/bonding/Makefile                  |    2 +-
 drivers/net/bonding/bond_3ad.c                | 1701 ++--
 drivers/net/bonding/bond_alb.c                |  689 ++---
 drivers/net/bonding/bond_debugfs.c            |    2 +-
 drivers/net/bonding/bond_main.c               | 2339 +
 drivers/net/bonding/bond_netlink.c            |  114 +-
 drivers/net/bonding/bond_options.c            |  258 +-
 drivers/net/bonding/bond_procfs.c             |  102 +-
 drivers/net/bonding/bond_sysfs.c              |  242 +-
 drivers/net/bonding/bond_sysfs_port.c         |  187 ++
 drivers/net/bonding/bond_sysfs_slave.c        |  176 --
 .../ethernet/chelsio/cxgb3/cxgb3_offload.c    |    2 +-
 .../net/ethernet/mellanox/mlx4/en_netdev.c    |   14 +-
 .../ethernet/mellanox/mlx5/core/en/rep/bond.c |    4 +-
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   |    2 +-
 .../ethernet/netronome/nfp/flower/lag_conf.c  |    2 +-
 .../ethernet/qlogic/netxen/netxen_nic_main.c  |   12 +-
 include/linux/netdevice.h                     |   22 +-
 include/net/bond_3ad.h                        |   42 +-
 include/net/bond_alb.h                        |   74 +-
 include/net/bond_options.h                    |   18 +-
 include/net/bonding.h

[PATCH net-next v2 1/6] bonding: rename 802.3ad's struct port to ad_port

2020-10-02 Thread Jarod Wilson

The intention is to reuse "port" in place of "slave" in the bonding driver
after making this change, as port is consistent with the bridge and team
drivers, and allows us to remove socially problematic language from the
bonding driver.

Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/bonding/bond_3ad.c | 1307 
 drivers/net/bonding/bond_main.c|4 +-
 drivers/net/bonding/bond_netlink.c |6 +-
 drivers/net/bonding/bond_procfs.c  |   36 +-
 drivers/net/bonding/bond_sysfs_slave.c |   10 +-
 include/net/bond_3ad.h |   14 +-
 6 files changed, 688 insertions(+), 689 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index aa001b16765a..0eb717b0bfc6 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -89,60 +89,60 @@ static const u8 lacpdu_mcast_addr[ETH_ALEN + 2] __long_aligned =
MULTICAST_LACPDU_ADDR;
 
 /* = main 802.3ad protocol functions == */
-static int ad_lacpdu_send(struct port *port);
-static int ad_marker_send(struct port *port, struct bond_marker *marker);
-static void ad_mux_machine(struct port *port, bool *update_slave_arr);
-static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port);
-static void ad_tx_machine(struct port *port);
-static void ad_periodic_machine(struct port *port);
-static void ad_port_selection_logic(struct port *port, bool *update_slave_arr);
+static int ad_lacpdu_send(struct ad_port *ad_port);
+static int ad_marker_send(struct ad_port *ad_port, struct bond_marker *marker);
+static void ad_mux_machine(struct ad_port *ad_port, bool *update_slave_arr);
+static void ad_rx_machine(struct lacpdu *lacpdu, struct ad_port *ad_port);
+static void ad_tx_machine(struct ad_port *ad_port);
+static void ad_periodic_machine(struct ad_port *ad_port);
+static void ad_port_selection_logic(struct ad_port *ad_port, bool *update_slave_arr);
 static void ad_agg_selection_logic(struct aggregator *aggregator,
   bool *update_slave_arr);
 static void ad_clear_agg(struct aggregator *aggregator);
 static void ad_initialize_agg(struct aggregator *aggregator);
-static void ad_initialize_port(struct port *port, int lacp_fast);
-static void ad_enable_collecting_distributing(struct port *port,
+static void ad_initialize_port(struct ad_port *ad_port, int lacp_fast);
+static void ad_enable_collecting_distributing(struct ad_port *ad_port,
  bool *update_slave_arr);
-static void ad_disable_collecting_distributing(struct port *port,
+static void ad_disable_collecting_distributing(struct ad_port *ad_port,
   bool *update_slave_arr);
 static void ad_marker_info_received(struct bond_marker *marker_info,
-   struct port *port);
+   struct ad_port *ad_port);
 static void ad_marker_response_received(struct bond_marker *marker,
-   struct port *port);
-static void ad_update_actor_keys(struct port *port, bool reset);
+   struct ad_port *ad_port);
+static void ad_update_actor_keys(struct ad_port *ad_port, bool reset);
 
 
 /* = api to bonding and kernel code == */
 
 /**
- * __get_bond_by_port - get the port's bonding struct
- * @port: the port we're looking at
+ * __get_bond_by_ad_port - get the ad_port's bonding struct
+ * @ad_port: the ad_port we're looking at
  *
- * Return @port's bonding struct, or %NULL if it can't be found.
+ * Return @ad_port's bonding struct, or %NULL if it can't be found.
  */
-static inline struct bonding *__get_bond_by_port(struct port *port)
+static inline struct bonding *__get_bond_by_ad_port(struct ad_port *ad_port)
 {
-   if (port->slave == NULL)
+   if (ad_port->slave == NULL)
return NULL;
 
-   return bond_get_bond_by_slave(port->slave);
+   return bond_get_bond_by_slave(ad_port->slave);
 }
 
 /**
  * __get_first_agg - get the first aggregator in the bond
- * @port: the port we're looking at
+ * @ad_port: the ad_port we're looking at
  *
  * Return the aggregator of the first slave in @bond, or %NULL if it can't be
  * found.
  * The caller must hold RCU or RTNL lock.
  */
-static inline struct aggregator *__get_first_agg(struct port *port)
+static inline struct aggregator *__get_first_agg(struct ad_port *ad_port)
 {
-   struct bonding *bond = __get_bond_by_port(port);
+   struct bonding *bond = __get_bond_by_ad_port(ad_port);
struct slave *first_slave;
struct aggregator *agg;
 
-   /* If there's no bond for thi

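The rename in this patch is purely mechanical: the helper still walks from the 802.3ad port to its slave, and from there to the bond. The userspace sketch below shows that lookup using simplified stand-ins for `struct bonding`, `struct slave` and `struct ad_port`. The kernel's `bond_get_bond_by_slave()` is mocked here as a plain back-pointer read, which is an assumption made for illustration.

```c
#include <stddef.h>

/* Simplified stand-ins; the real kernel structures carry far more state. */
struct bonding { const char *name; };
struct slave { struct bonding *bond; };
struct ad_port { struct slave *slave; };

/* Mock of the kernel helper: here the slave simply points at its bond. */
static struct bonding *bond_get_bond_by_slave(struct slave *slave)
{
	return slave->bond;
}

/* Same logic as the renamed helper: NULL when no slave is attached. */
static inline struct bonding *__get_bond_by_ad_port(struct ad_port *ad_port)
{
	if (ad_port->slave == NULL)
		return NULL;

	return bond_get_bond_by_slave(ad_port->slave);
}
```

Only the identifiers change. The NULL check and the lookup path are the same as in `__get_bond_by_port()`.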
Re: [PATCH net-next 0/5] bonding: rename bond components

2020-09-25 Thread Jarod Wilson

On Tue, Sep 22, 2020 at 6:19 PM Jay Vosburgh  wrote:
>
> Jarod Wilson  wrote:
>
> >The bonding driver's use of master and slave, while largely understood
> >in technical circles, poses a barrier for inclusion to some potential
> >members of the development and user community, due to the historical
> >context of masters and slaves, particularly in the United States. This
> >is a first full pass at replacing those phrases with more socially
> >inclusive ones, opting for aggregator to replace master and link to
> >replace slave, as the bonding driver itself is a link aggregation
> >driver.
>
> First, I think there should be some direction from the kernel
> development leadership as to whether or not this type of large-scale
> search and replace of socially sensitive terms of art or other
> terminology is a task that should be undertaken, given the noted issues
> it will cause in stable release maintenance going forwards.

Admittedly, part of the point of this patch is to help drive such
conversations. Having a concrete example of how these changes would
look makes them easier to discuss, I think. I understand the burden
here, though as you noted later, bonding doesn't really churn that
much, so in this specific case the maintenance load wouldn't be
terrible, and I think it's worth it from a social standpoint.
I know this can start to get political and personal real fast
though...

> Second, on the merits of the proposed changes (presuming for the
> moment that this goes forward), I would prefer different nomenclature
> that does not reuse existing names for different purposes, i.e., "link"
> and "aggregator."
>
> Both of those have specific meanings in the current code, and
> old kernels will retain that meaning.  Changing them to have new
> meanings going forward will lead to confusion, in my opinion for no good
> reason, as there are other names suited that do not conflict.
>
> For example, instead of "master" call everything a "bond," which
> matches common usage in discussion.  Changing "master" to "aggregator,"
> the replacement results in cumbersome descriptions like "the
> aggregator's active aggregator" in the context of LACP.
>
> A replacement term for "slave" is trickier; my first choice
> would be "port," but that may make more churn from a code change point
> of view, although struct slave could become struct bond_port, and leave
> the existing struct port for its current LACP use.

I did briefly have the idea of renaming 'port' in the LACP code to
'lacp_port', which would allow reuse of 'port', and would then be
consistent with the team driver (and bridge driver, iirc). I could
certainly pursue that option, or try going with "bond_port", but I'd
like something so widely used throughout the code to be shorter if
possible. It really is LACP that throws a wrench into most
sensible naming schemes I could think of. Simply renaming the current
"master" to "bond" does make a fair bit of sense, though; it just didn't
occur to me. But replacing "slave" is definitely the far more involved
and messy one.

> >There are a few problems with this change. First up, "link" is used for
> >link state already in the bonding driver, so the first step here is to
> >rename link to link_state. Second, aggregator is already used in the
> >802.3ad code, but I feel the usage is actually consistent with referring
> >to the bonding aggregation virtual device as the aggregator. Third, we
> >have the issue of not wanting to break any existing userspace, which I
> >believe this patchset accomplishes, while also adding alternative
> >interfaces using new terminology, and a Kconfig option that will let
> >people make the conscious decision to break userspace and no longer
> >expose the original master/slave interfaces, once their userspace is
> >able to cope with their removal.
>
> I'm opposed to the Kconfig option because it will lead to
> balkanization of the UAPI, which would be worse than a clean break
> (which I'm also opposed to).

I suspected this might be a point of contention. Easy enough to simply
omit that bit from the series, if that's the consensus.

> >Lastly, we do still have the issue of ease of backporting fixes to
> >-stable trees. I've not had a huge amount of time to spend on it, but
> >brief forays into coccinelle didn't really pay off (since it's meant to
> >operate on code, not patches), and the best solution I can come up with
> >is providing a shell script someone could run over git-fo

Re: [PATCH net-next 2/5] bonding: rename slave to link where possible

2020-09-25 Thread Jarod Wilson

On Tue, Sep 22, 2020 at 7:51 PM David Miller  wrote:
>
> From: Michal Kubecek 
> Date: Wed, 23 Sep 2020 01:23:17 +0200
>
> > Even if the module parameters are deprecated and extremely inconvenient
> > as a mean of bonding configuration, I would say changing their names
> > would still count as "breaking the userspace".
>
> I totally agree.
>
> Anything user facing has to be kept around for the deprecation period,
> and that includes module parameters.

Apologies, that was a definite oversight on my part. I can add them back
using the same mechanism that num_grat_arp and num_unsol_na use currently.
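In bond_main.c, `num_grat_arp` and `num_unsol_na` are both registered with `module_param_named()` against one backing variable, `num_peer_notif`, so setting either name updates the same value. The userspace sketch below models that aliasing with a name-to-pointer table. It uses `all_slaves_active` and `all_links_active` (the rename proposed in the documentation patch) as the legacy/new pair. The table and `param_set()` are hypothetical stand-ins for the kernel's module parameter machinery.

```c
#include <stdlib.h>
#include <string.h>

/* One backing variable, exposed under both the legacy and the new name. */
static int all_slaves_active;

struct param_alias {
	const char *name;
	int *value;
};

/* Hypothetical table; in the kernel this would be two module_param_named()
 * lines that point at the same variable. */
static const struct param_alias params[] = {
	{ "all_slaves_active", &all_slaves_active },	/* legacy name */
	{ "all_links_active",  &all_slaves_active },	/* new name */
};

/* Set a parameter by name. Returns 0 on success, -1 if the name is unknown. */
static int param_set(const char *name, const char *val)
{
	for (size_t i = 0; i < sizeof(params) / sizeof(params[0]); i++) {
		if (strcmp(params[i].name, name) == 0) {
			*params[i].value = atoi(val);
			return 0;
		}
	}
	return -1;
}
```

Because both names resolve to the same storage, existing boot-time configurations keep working unchanged, and the new name never diverges from the old one.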

-- 
Jarod Wilson
ja...@redhat.com

Re: [PATCH net-next 4/5] bonding: make Kconfig toggle to disable legacy interfaces

2020-09-23 Thread Jarod Wilson

On Tue, Sep 22, 2020 at 8:01 PM Stephen Hemminger wrote:
>
> On Tue, 22 Sep 2020 16:47:07 -0700
> Jay Vosburgh  wrote:
>
> > Stephen Hemminger  wrote:
> >
> > >On Tue, 22 Sep 2020 09:37:30 -0400
> > >Jarod Wilson  wrote:
> > >
> > >> By default, enable retaining all user-facing API that includes the use of
> > >> master and slave, but add a Kconfig knob that allows those that wish to
> > >> remove it entirely do so in one shot.
> > >>
> > >> Cc: Jay Vosburgh 
> > >> Cc: Veaceslav Falico 
> > >> Cc: Andy Gospodarek 
> > >> Cc: "David S. Miller" 
> > >> Cc: Jakub Kicinski 
> > >> Cc: Thomas Davis 
> > >> Cc: net...@vger.kernel.org
> > >> Signed-off-by: Jarod Wilson 
> > >
> > >Why not just have a config option to remove all the /proc and sysfs options
> > >in bonding (and bridging) and only use netlink? New tools should be only able
> > >to use netlink only.
> >
> >   I agree that new tooling should be netlink, but what value is
> > provided by such an option that distros are unlikely to enable, and
> > enabling will break the UAPI?

Do you mean the initially proposed option, or what Stephen is
suggesting? I think Red Hat actually will consider the former; the
latter is less likely in the immediate future, since so many people
still rely on the output of /proc/net/bonding/* for an overall view of
their bonds' health and status. I don't know how close we are to
having something comparable that could be spit out with a single
invocation of something like 'ip' that would only be using netlink.
It's entirely possible there's something akin to 'ip link bondX
overview' already that outputs something similar, and I'm just not
aware of it, but something like that would definitely need to exist
and be well-documented for Red Hat to remove the procfs bits, I think.

> > >Then you might convince maintainers to update documentation as well.
> > >Last I checked there were still references to ifenslave.
> >
> >   Distros still include ifenslave, but it's now a shell script
> > that uses sysfs.  I see it used in scripts from time to time.
>
> Some bleeding edge distros have already dropped ifenslave and even ifconfig.
> The Enterprise ones never will.
>
> The one motivation would be for the embedded folks which are always looking
> to trim out the fat. Although not sure if the minimal versions of commands
> in busybox are pure netlink yet.

Yeah, the bonding documentation is still filled with references to
ifenslave. I believe Red Hat still includes it, though it's
"deprecated" in documentation in favor of using ip. The same goes for
ifconfig. I could see them both getting dropped in a future major
release of Red Hat Enterprise Linux, but they're definitely still here
for at least the life of RHEL8.

-- 
Jarod Wilson
ja...@redhat.com

Re: [RFC PATCH] bonding: linkdesc can be static

2020-09-23 Thread Jarod Wilson

On Wed, Sep 23, 2020 at 12:15 AM kernel test robot  wrote:
>
> Signed-off-by: kernel test robot 
> ---
>  bond_procfs.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/bonding/bond_procfs.c b/drivers/net/bonding/bond_procfs.c
> index 91ece68607b23..9b1b37a682728 100644
> --- a/drivers/net/bonding/bond_procfs.c
> +++ b/drivers/net/bonding/bond_procfs.c
> @@ -8,7 +8,7 @@
>  #include "bonding_priv.h"
>
>  #ifdef CONFIG_BONDING_LEGACY_INTERFACES
> -const char *linkdesc = "Slave";
> +static const char *linkdesc = "Slave";
>  #else
>  const char *linkdesc = "Link";
>  #endif

Good attempt, robot, but you missed the #else. Will fold a full
version into my set.
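The kernel test robot's change gave only the `CONFIG_BONDING_LEGACY_INTERFACES` branch internal linkage. For the "can be static" warning to be fixed in every configuration, both branches need it. The sketch below shows that shape in userspace, with a small formatter modeled on the `"\n%s Interface: %s\n"` line from bond_procfs.c. The helper name `format_link_header()` is hypothetical.

```c
#include <stdio.h>

/* Both branches are static, so the symbol never leaks out of this file. */
#ifdef CONFIG_BONDING_LEGACY_INTERFACES
static const char *linkdesc = "Slave";
#else
static const char *linkdesc = "Link";
#endif

/* Hypothetical helper mirroring the per-link header line in procfs output. */
static int format_link_header(char *buf, size_t len, const char *ifname)
{
	return snprintf(buf, len, "\n%s Interface: %s\n", linkdesc, ifname);
}
```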

-- 
Jarod Wilson
ja...@redhat.com

Re: [PATCH net-next 4/5] bonding: make Kconfig toggle to disable legacy interfaces

2020-09-22 Thread Jarod Wilson

On Tue, Sep 22, 2020 at 9:38 AM Jarod Wilson  wrote:
>
> By default, enable retaining all user-facing API that includes the use of
> master and slave, but add a Kconfig knob that allows those that wish to
> remove it entirely do so in one shot.
> diff --git a/drivers/net/bonding/bond_procfs.c b/drivers/net/bonding/bond_procfs.c
> index abd265d6e975..91ece68607b2 100644
> --- a/drivers/net/bonding/bond_procfs.c
> +++ b/drivers/net/bonding/bond_procfs.c
> @@ -7,6 +7,12 @@
>
>  #include "bonding_priv.h"
>
> +#ifdef CONFIG_BONDING_LEGACY_INTERFACES
> +const char *linkdesc = "Slave";
> +#else
> +const char *linkdesc = "Link";
> +#endif

I've been asked if it would be okay to add extra lines to the
/proc/net/bonding/ output, so that, for example, both
"Slave Interface: " and "Link Interface: " appear in the default
output, with the Slave lines then suppressed when the Kconfig option
is unset. Currently, unsetting the Kconfig option instead swaps out
Slave for Link. It would bloat the output by a fair number of lines,
but all the same data would be there and parseable. I wasn't sure
about this one, so I wanted to check on it. If it would be acceptable,
I'll rework that bit of code.
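The "print both lines" alternative could look roughly like the userspace sketch below. The new `Link Interface:` line is always emitted, and the legacy `Slave Interface:` line is emitted only when `CONFIG_BONDING_LEGACY_INTERFACES` is set. `show_link_iface()` and its buffer-based interface are hypothetical. The real code writes through `seq_printf()`.

```c
#include <stdio.h>

/*
 * Hypothetical helper: always print the new "Link Interface:" line, and
 * print the legacy "Slave Interface:" line first only when the legacy
 * Kconfig option is enabled. Returns bytes written, or -1 on truncation.
 */
static int show_link_iface(char *buf, size_t len, const char *ifname)
{
	int n = 0, ret;

#ifdef CONFIG_BONDING_LEGACY_INTERFACES
	ret = snprintf(buf, len, "Slave Interface: %s\n", ifname);
	if (ret < 0 || (size_t)ret >= len)
		return -1;
	n = ret;
#endif
	ret = snprintf(buf + n, len - n, "Link Interface: %s\n", ifname);
	if (ret < 0 || (size_t)ret >= len - n)
		return -1;
	return n + ret;
}
```

One plausible reason to put the legacy line first is that a parser which starts a new per-link stanza at "Slave Interface:" would still see the stanzas it expects. The extra line would then appear to it as just another key inside the stanza.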

-- 
Jarod Wilson
ja...@redhat.com

[PATCH net-next 5/5] bonding: update Documentation for link/aggregator terminology

2020-09-22 Thread Jarod Wilson

Point users to the new interface names instead of the old ones, where
appropriate. Userspace bits referenced still include use of master/slave,
but those can't be altered until userspace changes too, ideally after
these changes propagate to the community at large.

Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 Documentation/networking/bonding.rst | 440 +--
 1 file changed, 220 insertions(+), 220 deletions(-)

diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
index adc314639085..ee233abcc58d 100644
--- a/Documentation/networking/bonding.rst
+++ b/Documentation/networking/bonding.rst
@@ -167,22 +167,22 @@ or, for backwards compatibility, the option value.  E.g.,
 
 The parameters are as follows:
 
-active_slave
+active_link
 
-   Specifies the new active slave for modes that support it
+   Specifies the new active link for modes that support it
(active-backup, balance-alb and balance-tlb).  Possible values
-   are the name of any currently enslaved interface, or an empty
-   string.  If a name is given, the slave and its link must be up in order
-   to be selected as the new active slave.  If an empty string is
-   specified, the current active slave is cleared, and a new active
-   slave is selected automatically.
+   are the name of any currently aggregated interface, or an empty
+   string.  If a name is given, the link and its connection must be up in
+   order to be selected as the new active link.  If an empty string is
+   specified, the current active link is cleared, and a new active
+   link is selected automatically.
 
Note that this is only available through the sysfs interface. No module
parameter by this name exists.
 
The normal value of this option is the name of the currently
-   active slave, or the empty string if there is no active slave or
-   the current mode does not use an active slave.
+   active link, or the empty string if there is no active link or
+   the current mode does not use an active link.
 
 ad_actor_sys_prio
 
@@ -199,8 +199,8 @@ ad_actor_system
protocol packet exchanges (LACPDUs). The value cannot be NULL or
multicast. It is preferred to have the local-admin bit set for this
mac but driver does not enforce it. If the value is not given then
-   system defaults to using the masters' mac address as actors' system
-   address.
+   system defaults to using the aggregators' mac address as actors'
+   system address.
 
This parameter has effect only in 802.3ad mode and is available through
SysFs interface.
@@ -216,8 +216,8 @@ ad_select
bandwidth.
 
Reselection of the active aggregator occurs only when all
-   slaves of the active aggregator are down or the active
-   aggregator has no slaves.
+   links of the active aggregator are down or the active
+   aggregator has no links.
 
This is the default value.
 
@@ -226,18 +226,18 @@ ad_select
The active aggregator is chosen by largest aggregate
bandwidth.  Reselection occurs if:
 
-   - A slave is added to or removed from the bond
+   - A link is added to or removed from the bond
 
-   - Any slave's link state changes
+   - Any link's link state changes
 
-   - Any slave's 802.3ad association state changes
+   - Any link's 802.3ad association state changes
 
- The bond's administrative state changes to up
 
count or 2
 
The active aggregator is chosen by the largest number of
-   ports (slaves).  Reselection occurs as described under the
+   ports (links).  Reselection occurs as described under the
"bandwidth" setting, above.
 
The bandwidth and count selection policies permit failover of
@@ -265,7 +265,7 @@ ad_user_port_key
This parameter has effect only in 802.3ad mode and is available through
SysFs interface.
 
-all_slaves_active
+all_links_active
 
Specifies that duplicate frames (received on inactive ports) should be
dropped (0) or delivered (1).
@@ -281,10 +281,10 @@ arp_interval
 
Specifies the ARP link monitoring frequency in milliseconds.
 
-   The ARP monitor works by periodically checking the slave
+   The ARP monitor works by periodically checking the link
devices to determine whether they have sent or received
traffic recently (the precise criteria depends upon the
-   bonding mode, and the state of the slave).  Regular traffic is
+   bonding m

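The two `ad_select` policies documented above differ only in what they maximize. "bandwidth" picks the aggregator with the largest aggregate bandwidth, and "count" picks the one with the most ports (links). The userspace sketch below makes that difference concrete. It is heavily simplified: it assumes every port in an aggregator runs at one speed, and it ignores partner state, individual (non-aggregable) ports and the reselection triggers listed above. The struct, enum and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical, simplified view of an 802.3ad aggregator. */
struct agg_info {
	unsigned int num_ports;		/* ports (links) in the aggregator */
	unsigned int port_speed_mbps;	/* assumed identical for all ports */
};

enum ad_select_policy { AD_SELECT_BANDWIDTH, AD_SELECT_COUNT };

/* Return the index of the aggregator the policy would pick, or -1 if none.
 * On a tie, the earlier aggregator is kept. */
static int pick_aggregator(const struct agg_info *aggs, size_t n,
			   enum ad_select_policy policy)
{
	int best = -1;
	unsigned long best_score = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned long score = (policy == AD_SELECT_COUNT)
			? aggs[i].num_ports
			: (unsigned long)aggs[i].num_ports * aggs[i].port_speed_mbps;

		if (aggs[i].num_ports == 0)
			continue;	/* an aggregator with no links is never chosen */
		if (best < 0 || score > best_score) {
			best = (int)i;
			best_score = score;
		}
	}
	return best;
}
```

For example, with two 10000 Mb/s ports in one aggregator and three 1000 Mb/s ports in another, "bandwidth" prefers the first (20000 vs 3000) and "count" prefers the second (3 vs 2).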
[PATCH net-next 4/5] bonding: make Kconfig toggle to disable legacy interfaces

2020-09-22 Thread Jarod Wilson

By default, retain all user-facing API that includes the use of
master and slave, but add a Kconfig knob that allows those who wish to
remove it entirely to do so in one shot.

Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/Kconfig   | 12 
 drivers/net/bonding/bond_options.c|  4 ++--
 drivers/net/bonding/bond_procfs.c | 14 ++
 drivers/net/bonding/bond_sysfs.c  | 15 ++-
 drivers/net/bonding/bond_sysfs_link.c | 12 
 include/net/bond_options.h|  4 ++--
 include/net/bonding.h |  2 ++
 7 files changed, 46 insertions(+), 17 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index c3dbe64e628e..3640694be34d 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -56,6 +56,18 @@ config BONDING
  To compile this driver as a module, choose M here: the module
  will be called bonding.
 
+config BONDING_LEGACY_INTERFACES
+   default y
+   bool "Maintain legacy interface names"
+   help
+ The bonding driver historically made use of the terms "master" and
+ "slave" to describe its component members. This has since been
+ changed to "aggregator" and "link" as part of a broader effort to
+ remove the use of socially problematic language from the kernel.
+ However, removing all such cases requires breaking long-standing
+ user-facing interfaces in /proc and /sys, which will not be done,
+ unless you opt out of them here, by selecting 'N'.
+
 config DUMMY
tristate "Dummy net driver support"
help
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 437df9a207a6..7bf1a13a3c17 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -434,7 +434,7 @@ static const struct bond_option bond_opts[BOND_OPT_LAST] = {
.values = bond_intmax_tbl,
.set = bond_option_peer_notif_delay_set
},
-/* legacy sysfs interface names */
+#ifdef CONFIG_BONDING_LEGACY_INTERFACES
[BOND_OPT_PACKETS_PER_SLAVE] = {
.id = BOND_OPT_PACKETS_PER_SLAVE,
.name = "packets_per_slave",
@@ -474,7 +474,7 @@ static const struct bond_option bond_opts[BOND_OPT_LAST] = {
.flags = BOND_OPTFLAG_RAWVAL,
.set = bond_option_links_set
},
-/* end legacy sysfs interface names */
+#endif
 };
 
 /* Searches for an option by name */
diff --git a/drivers/net/bonding/bond_procfs.c b/drivers/net/bonding/bond_procfs.c
index abd265d6e975..91ece68607b2 100644
--- a/drivers/net/bonding/bond_procfs.c
+++ b/drivers/net/bonding/bond_procfs.c
@@ -7,6 +7,12 @@
 
 #include "bonding_priv.h"
 
+#ifdef CONFIG_BONDING_LEGACY_INTERFACES
+const char *linkdesc = "Slave";
+#else
+const char *linkdesc = "Link";
+#endif
+
 static void *bond_info_seq_start(struct seq_file *seq, loff_t *pos)
__acquires(RCU)
 {
@@ -84,7 +90,7 @@ static void bond_info_show_aggregator(struct seq_file *seq)
 
if (bond_uses_primary(bond)) {
primary = rcu_dereference(bond->primary_link);
-   seq_printf(seq, "Primary Slave: %s",
+   seq_printf(seq, "Primary %s: %s", linkdesc,
   primary ? primary->dev->name : "None");
if (primary) {
optval = bond_opt_get_val(BOND_OPT_PRIMARY_RESELECT,
@@ -93,7 +99,7 @@ static void bond_info_show_aggregator(struct seq_file *seq)
   optval->string);
}
 
-   seq_printf(seq, "\nCurrently Active Slave: %s\n",
+   seq_printf(seq, "\nCurrently Active %s: %s\n", linkdesc,
   (curr) ? curr->dev->name : "None");
}
 
@@ -171,7 +177,7 @@ static void bond_info_show_link(struct seq_file *seq,
 {
struct bonding *bond = PDE_DATA(file_inode(seq->file));
 
-   seq_printf(seq, "\nSlave Interface: %s\n", link->dev->name);
+   seq_printf(seq, "\n%s Interface: %s\n", linkdesc, link->dev->name);
seq_printf(seq, "MII Status: %s\n",
   bond_link_status(link->link_state));
if (link->speed == SPEED_UNKNOWN)
@@ -189,7 +195,7 @@ static void bond_info_show_link(struct seq_file *seq,
 
seq_printf(seq, "Permanent HW addr: %*phC\n",
   link->dev->addr_len, link->perm_hwaddr);
-   seq_printf(seq, "Slave queue ID: %d\n", link->queue_id);
+   seq_printf(seq, "%s queue ID: %d\

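In the bond_options.c hunk, the legacy option entries are wrapped in `#ifdef CONFIG_BONDING_LEGACY_INTERFACES`, and the diffstat also touches include/net/bond_options.h. The userspace sketch below shows how gating both the enum value and the table entry lets the legacy name simply disappear from name lookups when the option is `n`. The `packets_per_link` name, the trimmed `struct opt` and `opt_get_by_name()` are hypothetical simplifications, loosely in the spirit of the "Searches for an option by name" helper.

```c
#include <stddef.h>
#include <string.h>

/* Gate the enum value and the table entry together, so that OPT_LAST
 * stays in sync with the table and no empty slots are left behind. */
enum {
	OPT_PACKETS_PER_LINK,
#ifdef CONFIG_BONDING_LEGACY_INTERFACES
	OPT_PACKETS_PER_SLAVE,
#endif
	OPT_LAST
};

struct opt { const char *name; };

static const struct opt opts[OPT_LAST] = {
	[OPT_PACKETS_PER_LINK] = { .name = "packets_per_link" },
#ifdef CONFIG_BONDING_LEGACY_INTERFACES
	/* legacy alias, compiled out when the Kconfig option is 'n' */
	[OPT_PACKETS_PER_SLAVE] = { .name = "packets_per_slave" },
#endif
};

/* Linear search by name; returns NULL if no option matches. */
static const struct opt *opt_get_by_name(const char *name)
{
	for (size_t i = 0; i < OPT_LAST; i++)
		if (opts[i].name && strcmp(opts[i].name, name) == 0)
			return &opts[i];
	return NULL;
}
```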
[PATCH net-next 0/5] bonding: rename bond components

2020-09-22 Thread Jarod Wilson

The bonding driver's use of master and slave, while largely understood
in technical circles, poses a barrier for inclusion to some potential
members of the development and user community, due to the historical
context of masters and slaves, particularly in the United States. This
is a first full pass at replacing those phrases with more socially
inclusive ones, opting for aggregator to replace master and link to
replace slave, as the bonding driver itself is a link aggregation
driver.

There are a few problems with this change. First up, "link" is already
used for link state in the bonding driver, so the first step here is to
rename link to link_state. Second, aggregator is already used in the
802.3ad code, but I feel the usage is actually consistent with referring
to the bonding aggregation virtual device as the aggregator. Third, we
don't want to break any existing userspace. I believe this patchset
achieves that, while also adding alternative interfaces using new
terminology, and a Kconfig option that will let people make the
conscious decision to break userspace and no longer expose the original
master/slave interfaces, once their userspace is able to cope with
their removal.

Lastly, we still have the issue of how easily fixes can be backported to
-stable trees. I've not had a huge amount of time to spend on it, but
brief forays into coccinelle didn't really pay off (since it's meant to
operate on code, not patches). The best solution I can come up with is
providing a shell script someone could run over git-format-patch output
before git-am'ing the result to a -stable tree. That said, scripting
these changes in the first place turned out not to be the best approach,
due to subtle cases where use of master or slave can NOT yet be
replaced. So a large amount of work was done by hand, through
inspection, trial and error, which is why this set took a lot longer
than I'd originally hoped. I don't expect -stable backports to be
horrible to figure out one way or another, though, and I don't believe
that a bit of inconvenience on that front is enough to warrant not
making these changes.
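At its core, a per-line backport translator is plain string substitution: it rewrites new names back to the names a -stable tree still uses. The sketch below shows that core in C. It assumes a single hypothetical mapping (`netif_is_bond_aggregator` back to `netif_is_bond_master`) and a caller-supplied output buffer. As the paragraph above notes, blind substitution is exactly what gets the subtle master/slave cases wrong, so the output would still need human review.

```c
#include <string.h>

/*
 * Copy src to dst, replacing every occurrence of `from` with `to`.
 * Returns 0 on success, or -1 if dst is too small or `from` is empty.
 */
static int replace_all(char *dst, size_t dstlen, const char *src,
		       const char *from, const char *to)
{
	size_t flen = strlen(from), tlen = strlen(to), used = 0;
	const char *p;

	if (flen == 0)
		return -1;
	while ((p = strstr(src, from)) != NULL) {
		size_t head = (size_t)(p - src);

		if (used + head + tlen >= dstlen)
			return -1;
		memcpy(dst + used, src, head);		/* text before the match */
		memcpy(dst + used + head, to, tlen);	/* replacement */
		used += head + tlen;
		src = p + flen;
	}
	if (used + strlen(src) >= dstlen)
		return -1;
	strcpy(dst + used, src);			/* remaining tail + NUL */
	return 0;
}
```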

See here for further details on Red Hat's commitment to this work:
https://www.redhat.com/en/blog/making-open-source-more-inclusive-eradicating-problematic-language

As far as testing goes, I've manually operated on various bonds while
working on this code, and have run it through multiple lnst test runs,
which exercise the existing sysfs interfaces fairly extensively. As far
as I can tell, this set does not break any existing interfaces, unless
the user consciously opts to do so via Kconfig.

Jarod Wilson (5):
  bonding: rename struct slave member link to link_state
  bonding: rename slave to link where possible
  bonding: rename master to aggregator where possible
  bonding: make Kconfig toggle to disable legacy interfaces
  bonding: update Documentation for link/aggregator terminology

Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org

 .clang-format |4 +-
 Documentation/networking/bonding.rst  |  440 ++--
 drivers/infiniband/core/cma.c |2 +-
 drivers/infiniband/core/lag.c |2 +-
 drivers/infiniband/core/roce_gid_mgmt.c   |   10 +-
 drivers/infiniband/hw/mlx4/main.c |2 +-
 drivers/net/Kconfig   |   12 +
 drivers/net/bonding/Makefile  |2 +-
 drivers/net/bonding/bond_3ad.c|  604 ++---
 drivers/net/bonding/bond_alb.c|  687 ++---
 drivers/net/bonding/bond_debugfs.c|2 +-
 drivers/net/bonding/bond_main.c   | 2336 +
 drivers/net/bonding/bond_netlink.c|  104 +-
 drivers/net/bonding/bond_options.c|  258 +-
 drivers/net/bonding/bond_procfs.c |   63 +-
 drivers/net/bonding/bond_sysfs.c  |  249 +-
 drivers/net/bonding/bond_sysfs_link.c |  193 ++
 drivers/net/bonding/bond_sysfs_slave.c|  176 --
 .../ethernet/chelsio/cxgb3/cxgb3_offload.c|2 +-
 .../net/ethernet/mellanox/mlx4/en_netdev.c|4 +-
 .../ethernet/mellanox/mlx5/core/en/rep/bond.c |2 +-
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   |2 +-
 .../ethernet/netronome/nfp/flower/lag_conf.c  |2 +-
 .../ethernet/qlogic/netxen/netxen_nic_main.c  |   12 +-
 include/linux/netdevice.h |   20 +-
 include/net/bond_3ad.h|   20 +-
 include/net/bond_alb.h|   31 +-
 include/net/bond_options.h|   19 +-
 include/net/bonding.h |  351 +--
 include/net/lag.h |2 +-
 30 files changed, 2902 insertions(+), 2711 deletions(-)
 create mode 100644 drivers

[PATCH net-next 3/5] bonding: rename master to aggregator where possible

2020-09-22 Thread Jarod Wilson

Getting rid of as much usage of "master" as we can here, without breaking
any user-facing API.

Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/infiniband/core/cma.c |   2 +-
 drivers/infiniband/core/lag.c |   2 +-
 drivers/infiniband/core/roce_gid_mgmt.c   |   6 +-
 drivers/net/bonding/bond_3ad.c|   2 +-
 drivers/net/bonding/bond_main.c   |  57 +++
 drivers/net/bonding/bond_procfs.c |   4 +-
 drivers/net/bonding/bond_sysfs.c  | 140 +-
 .../ethernet/netronome/nfp/flower/lag_conf.c  |   2 +-
 .../ethernet/qlogic/netxen/netxen_nic_main.c  |   8 +-
 include/linux/netdevice.h |   4 +-
 include/net/bonding.h |   1 +
 11 files changed, 153 insertions(+), 75 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 7f0e91e92968..9141a8402456 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -4687,7 +4687,7 @@ static int cma_netdev_callback(struct notifier_block *self, unsigned long event,
if (event != NETDEV_BONDING_FAILOVER)
return NOTIFY_DONE;
 
-   if (!netif_is_bond_master(ndev))
+   if (!netif_is_bond_aggregator(ndev))
return NOTIFY_DONE;
 
mutex_lock(&lock);
diff --git a/drivers/infiniband/core/lag.c b/drivers/infiniband/core/lag.c
index 7063e41eaf26..df20107aba88 100644
--- a/drivers/infiniband/core/lag.c
+++ b/drivers/infiniband/core/lag.c
@@ -128,7 +128,7 @@ struct net_device *rdma_lag_get_ah_roce_slave(struct ib_device *device,
dev_hold(master);
rcu_read_unlock();
 
-   if (!netif_is_bond_master(master))
+   if (!netif_is_bond_aggregator(master))
goto put;
 
slave = rdma_get_xmit_slave_udp(device, master, ah_attr, flags);
diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
index d0ada1756564..a748d85fbfa1 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -129,7 +129,7 @@ enum bonding_slave_state {
 static enum bonding_slave_state is_eth_active_slave_of_bonding_rcu(struct net_device *dev,
   struct net_device *upper)
 {
-   if (upper && netif_is_bond_master(upper)) {
+   if (upper && netif_is_bond_aggregator(upper)) {
struct net_device *pdev =
bond_option_active_link_get_rcu(netdev_priv(upper));
 
@@ -216,7 +216,7 @@ is_ndev_for_default_gid_filter(struct ib_device *ib_dev, u8 port,
 * make sure that it the upper netdevice of rdma netdevice.
 */
res = ((cookie_ndev == rdma_ndev && !netif_is_bond_link(rdma_ndev)) ||
-  (netif_is_bond_master(cookie_ndev) &&
+  (netif_is_bond_aggregator(cookie_ndev) &&
rdma_is_upper_dev_rcu(rdma_ndev, cookie_ndev)));
 
rcu_read_unlock();
@@ -271,7 +271,7 @@ is_upper_ndev_bond_master_filter(struct ib_device *ib_dev, u8 port,
return false;
 
rcu_read_lock();
-   if (netif_is_bond_master(cookie_ndev) &&
+   if (netif_is_bond_aggregator(cookie_ndev) &&
rdma_is_upper_dev_rcu(rdma_ndev, cookie_ndev))
match = true;
rcu_read_unlock();
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index aec4cd6918b9..6a7c285ae969 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2551,7 +2551,7 @@ void bond_3ad_handle_link_change(struct link *link, char link_state)
 }
 
 /**
- * bond_3ad_set_carrier - set link state for bonding master
+ * bond_3ad_set_carrier - set link state for bonding aggregator device
  * @bond: bonding structure
  *
  * if we have an active aggregator, we're up, if not, we're down.
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 8e2edebeb61a..f895f0c70017 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -469,8 +469,8 @@ static const struct xfrmdev_ops bond_xfrmdev_ops = {
 
 /*--- Link status ---*/
 
-/* Set the carrier state for the master according to the state of its
- * links.  If any links are up, the master is up.  In 802.3ad mode,
+/* Set the carrier state for the aggregator according to the state of its
+ * links.  If any links are up, the aggregator is up.  In 802.3ad mode,
  * do special 802.3ad magic.
  *
  * Returns zero if carrier state does not change, nonzero if it does.
@@ -1372,7 +1372,7 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)

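In mainline, `netif_is_bond_master()` checks the public `IFF_MASTER` flag together with the `IFF_BONDING` private flag. The hunks above only change its callers, and whether this patch keeps the old name around is not visible in this excerpt. The userspace sketch below shows the renamed predicate plus a hypothetical compatibility wrapper that would keep code still using the old name compiling. The `struct net_device` fields and the `MOCK_IFF_*` values are mocked assumptions, not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical flag values for this mock; the kernel defines its own. */
#define MOCK_IFF_MASTER  0x1u
#define MOCK_IFF_BONDING 0x2u

struct net_device {
	unsigned int flags;		/* public interface flags */
	unsigned int priv_flags;	/* driver-private flags */
};

/* A bonding device is an upper (master) device with the bonding priv flag. */
static inline bool netif_is_bond_aggregator(const struct net_device *dev)
{
	return (dev->flags & MOCK_IFF_MASTER) &&
	       (dev->priv_flags & MOCK_IFF_BONDING);
}

/* Hypothetical compat shim so callers still using the old name build. */
static inline bool netif_is_bond_master(const struct net_device *dev)
{
	return netif_is_bond_aggregator(dev);
}
```

A wrapper like this trades a little namespace clutter for fewer conflicts when moving fixes between trees. Dropping the old name entirely forces every caller to be updated at once.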
[PATCH net-next 1/5] bonding: rename struct slave member link to link_state

2020-09-22 Thread Jarod Wilson

Necessary prep work to recycle the name "link" as a replacement for
"slave" in bonding driver terminology.

Cc: Jay Vosburgh 
Cc: Veaceslav Falico 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: Thomas Davis 
Cc: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/bonding/bond_3ad.c | 12 ++--
 drivers/net/bonding/bond_alb.c |  7 ++-
 drivers/net/bonding/bond_main.c| 77 +-
 drivers/net/bonding/bond_netlink.c |  2 +-
 drivers/net/bonding/bond_options.c |  3 +-
 drivers/net/bonding/bond_procfs.c  |  3 +-
 drivers/net/bonding/bond_sysfs_slave.c |  2 +-
 include/net/bond_3ad.h |  2 +-
 include/net/bond_alb.h |  3 +-
 include/net/bonding.h  | 10 ++--
 10 files changed, 64 insertions(+), 57 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index aa001b16765a..e55b73aa3043 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -183,7 +183,7 @@ static inline void __enable_port(struct port *port)
 {
struct slave *slave = port->slave;
 
-   if ((slave->link == BOND_LINK_UP) && bond_slave_is_up(slave))
+   if ((slave->link_state == BOND_LINK_UP) && bond_slave_is_up(slave))
bond_set_slave_active_flags(slave, BOND_SLAVE_NOTIFY_LATER);
 }
 
@@ -256,7 +256,7 @@ static u16 __get_link_speed(struct port *port)
 * This is done in spite of the fact that the e100 driver reports 0
 * to be compatible with MVT in the future.
 */
-   if (slave->link != BOND_LINK_UP)
+   if (slave->link_state != BOND_LINK_UP)
speed = 0;
else {
switch (slave->speed) {
@@ -345,7 +345,7 @@ static u8 __get_duplex(struct port *port)
/* handling a special case: when the configuration starts with
 * link down, it sets the duplex to 0.
 */
-   if (slave->link == BOND_LINK_UP) {
+   if (slave->link_state == BOND_LINK_UP) {
switch (slave->duplex) {
case DUPLEX_FULL:
retval = 0x1;
@@ -2505,7 +2505,7 @@ void bond_3ad_adapter_speed_duplex_changed(struct slave *slave)
  *
  * Handle reselection of aggregator (if needed) for this port.
  */
-void bond_3ad_handle_link_change(struct slave *slave, char link)
+void bond_3ad_handle_link_change(struct slave *slave, char link_state)
 {
struct aggregator *agg;
struct port *port;
@@ -2527,7 +2527,7 @@ void bond_3ad_handle_link_change(struct slave *slave, char link)
 * on link up we are forcing recheck on the duplex and speed since
 * some of he adaptors(ce1000.lan) report.
 */
-   if (link == BOND_LINK_UP) {
+   if (link_state == BOND_LINK_UP) {
port->is_enabled = true;
ad_update_actor_keys(port, false);
} else {
@@ -2542,7 +2542,7 @@ void bond_3ad_handle_link_change(struct slave *slave, char link)
 
slave_dbg(slave->bond->dev, slave->dev, "Port %d changed link status to %s\n",
  port->actor_port_number,
- link == BOND_LINK_UP ? "UP" : "DOWN");
+ link_state == BOND_LINK_UP ? "UP" : "DOWN");
 
/* RTNL is held and mode_lock is released so it's safe
 * to update slave_array here.
diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 4e1b7deb724b..9e6f80d8ef8c 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -1664,15 +1664,16 @@ void bond_alb_deinit_slave(struct bonding *bond, struct slave *slave)
 
 }
 
-void bond_alb_handle_link_change(struct bonding *bond, struct slave *slave, char link)
+void bond_alb_handle_link_change(struct bonding *bond, struct slave *slave,
+char link_state)
 {
struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
 
-   if (link == BOND_LINK_DOWN) {
+   if (link_state == BOND_LINK_DOWN) {
tlb_clear_slave(bond, slave, 0);
if (bond->alb_info.rlb_enabled)
rlb_clear_slave(bond, slave);
-   } else if (link == BOND_LINK_UP) {
+   } else if (link_state == BOND_LINK_UP) {
/* order a rebalance ASAP */
bond_info->tx_rebalance_counter = BOND_TLB_REBALANCE_TICKS;
if (bond->alb_info.rlb_enabled) {
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 42ef25ec0af5..1f602bcf10bd 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -487,7 +487,7 @@ int bond_set_carrier(struct bonding *bond)
return bond_3ad_set_carrier(bond);
 
bond_for_each_slave(bond, slave, iter) {
-

[PATCH net v2] bonding: show saner speed for broadcast mode

2020-08-13 Thread Jarod Wilson

Broadcast mode bonds transmit a copy of all traffic simultaneously out of
all interfaces, so the "speed" of the bond isn't really the aggregate of
all interfaces, but rather, the speed of the slowest active interface.

Also, the type of the speed field is u32, not unsigned long, so adjust
that accordingly, as required for min() to work here without
complaining about mismatched types.

Fixes: bb5b052f751b ("bond: add support to read speed and duplex via ethtool")
CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: net...@vger.kernel.org
Acked-by: Jay Vosburgh 
Signed-off-by: Jarod Wilson 
---
v2: fix description to clarify speed == that of slowest active interface

 drivers/net/bonding/bond_main.c | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 5ad43aaf76e5..c853ca67058c 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4552,13 +4552,23 @@ static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
return ret;
 }
 
+static u32 bond_mode_bcast_speed(struct slave *slave, u32 speed)
+{
+   if (speed == 0 || speed == SPEED_UNKNOWN)
+   speed = slave->speed;
+   else
+   speed = min(speed, slave->speed);
+
+   return speed;
+}
+
 static int bond_ethtool_get_link_ksettings(struct net_device *bond_dev,
   struct ethtool_link_ksettings *cmd)
 {
struct bonding *bond = netdev_priv(bond_dev);
-   unsigned long speed = 0;
struct list_head *iter;
struct slave *slave;
+   u32 speed = 0;
 
cmd->base.duplex = DUPLEX_UNKNOWN;
cmd->base.port = PORT_OTHER;
@@ -4570,8 +4580,13 @@ static int bond_ethtool_get_link_ksettings(struct net_device *bond_dev,
 */
bond_for_each_slave(bond, slave, iter) {
if (bond_slave_can_tx(slave)) {
-   if (slave->speed != SPEED_UNKNOWN)
-   speed += slave->speed;
+   if (slave->speed != SPEED_UNKNOWN) {
+   if (BOND_MODE(bond) == BOND_MODE_BROADCAST)
+   speed = bond_mode_bcast_speed(slave,
+ speed);
+   else
+   speed += slave->speed;
+   }
if (cmd->base.duplex == DUPLEX_UNKNOWN &&
slave->duplex != DUPLEX_UNKNOWN)
cmd->base.duplex = slave->duplex;
-- 
2.20.1
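To make the difference between the two reporting rules concrete, here is a small userspace model of the loop in bond_ethtool_get_link_ksettings() after this patch. It is a sketch, not driver code: struct speed_slave, bcast_speed_step() and bond_speed_model() are hypothetical names, the can_tx flag stands in for bond_slave_can_tx(), and SPEED_UNKNOWN is assumed to mirror the kernel's -1 as seen through a u32 field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: mirrors the kernel's SPEED_UNKNOWN (-1) as stored in a u32. */
#define SPEED_UNKNOWN ((uint32_t)-1)

/* Hypothetical stand-in for the two slave properties the patch consults. */
struct speed_slave {
	uint32_t speed; /* Mb/s, or SPEED_UNKNOWN */
	bool can_tx;    /* plays the role of bond_slave_can_tx() */
};

/* Same rule as bond_mode_bcast_speed(): first known speed, then the minimum. */
static uint32_t bcast_speed_step(const struct speed_slave *slave, uint32_t speed)
{
	if (speed == 0 || speed == SPEED_UNKNOWN)
		return slave->speed;
	return speed < slave->speed ? speed : slave->speed;
}

/* Sum the usable slaves' speeds, or report the slowest one in broadcast mode. */
static uint32_t bond_speed_model(const struct speed_slave *slaves, size_t n,
				 bool broadcast)
{
	uint32_t speed = 0;

	assert(slaves != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* Skip ports that cannot transmit or report no speed. */
		if (!slaves[i].can_tx || slaves[i].speed == SPEED_UNKNOWN)
			continue;
		if (broadcast)
			speed = bcast_speed_step(&slaves[i], speed);
		else
			speed += slaves[i].speed;
	}
	return speed;
}
```

With ports at 10000 Mb/s, 1000 Mb/s and an unknown speed, plus a 100 Mb/s port that cannot transmit, the summing path yields 11000 while the broadcast path yields 1000, since every frame has to go out of the slowest usable port as well.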

Re: [PATCH net] bonding: show saner speed for broadcast mode

2020-08-13 Thread Jarod Wilson

On Thu, Aug 13, 2020 at 1:30 AM Jay Vosburgh  wrote:
>
> Jarod Wilson  wrote:
>
> >Broadcast mode bonds transmit a copy of all traffic simultaneously out of
> >all interfaces, so the "speed" of the bond isn't really the aggregate of
> >all interfaces, but rather, the speed of the lowest active interface.
>
> Did you mean "slowest" here?

I think I was thinking "lowest speed", but the way it's written does
seem a little ambiguous, and slowest would fit better. I'll repost
with slowest.

> >Also, the type of the speed field is u32, not unsigned long, so adjust
> >that accordingly, as required to make min() function here without
> >complaining about mismatching types.
> >
> >Fixes: bb5b052f751b ("bond: add support to read speed and duplex via ethtool")
> >CC: Jay Vosburgh 
> >CC: Veaceslav Falico 
> >CC: Andy Gospodarek 
> >CC: "David S. Miller" 
> >CC: net...@vger.kernel.org
> >Signed-off-by: Jarod Wilson 
>
> Did you notice this by inspection, or did it come up in use
> somewhere?  I can't recall ever hearing of anyone using broadcast mode,
> so I'm curious if there is a use for it, but this change seems
> reasonable enough regardless.

Someone working on our virt management tools was working on something
displaying bonding speeds in the UI, and reached out, thinking the
reporting for broadcast mode was wrong. My response was similar: I
don't think I've ever actually used broadcast mode or heard of anyone
using it, but for that one person who does, sure, we can probably make
that adjustment. :)

-- 
Jarod Wilson
ja...@redhat.com

[PATCH net] bonding: show saner speed for broadcast mode

2020-08-12 Thread Jarod Wilson

Broadcast mode bonds transmit a copy of all traffic simultaneously out of
all interfaces, so the "speed" of the bond isn't really the aggregate of
all interfaces, but rather, the speed of the lowest active interface.

Also, the type of the speed field is u32, not unsigned long, so adjust
that accordingly, as required to make min() function here without
complaining about mismatching types.

Fixes: bb5b052f751b ("bond: add support to read speed and duplex via ethtool")
CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/bonding/bond_main.c | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 5ad43aaf76e5..c853ca67058c 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4552,13 +4552,23 @@ static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
return ret;
 }
 
+static u32 bond_mode_bcast_speed(struct slave *slave, u32 speed)
+{
+   if (speed == 0 || speed == SPEED_UNKNOWN)
+   speed = slave->speed;
+   else
+   speed = min(speed, slave->speed);
+
+   return speed;
+}
+
 static int bond_ethtool_get_link_ksettings(struct net_device *bond_dev,
   struct ethtool_link_ksettings *cmd)
 {
struct bonding *bond = netdev_priv(bond_dev);
-   unsigned long speed = 0;
struct list_head *iter;
struct slave *slave;
+   u32 speed = 0;
 
cmd->base.duplex = DUPLEX_UNKNOWN;
cmd->base.port = PORT_OTHER;
@@ -4570,8 +4580,13 @@ static int bond_ethtool_get_link_ksettings(struct net_device *bond_dev,
 */
bond_for_each_slave(bond, slave, iter) {
if (bond_slave_can_tx(slave)) {
-   if (slave->speed != SPEED_UNKNOWN)
-   speed += slave->speed;
+   if (slave->speed != SPEED_UNKNOWN) {
+   if (BOND_MODE(bond) == BOND_MODE_BROADCAST)
+   speed = bond_mode_bcast_speed(slave,
+ speed);
+   else
+   speed += slave->speed;
+   }
if (cmd->base.duplex == DUPLEX_UNKNOWN &&
slave->duplex != DUPLEX_UNKNOWN)
cmd->base.duplex = slave->duplex;
-- 
2.20.1

[PATCH net-next] bonding: don't need RTNL for ipsec helpers

2020-07-08 Thread Jarod Wilson

The bond_ipsec_* helpers don't need RTNL, and can potentially get called
without it being held, so switch from rtnl_dereference() to
rcu_dereference() to access bond struct data.

Lightly tested with xfrm bonding, no problems found, should address the
syzkaller bug referenced below.

Reported-by: syzbot+582c98032903dcc04...@syzkaller.appspotmail.com
CC: Huy Nguyen 
CC: Saeed Mahameed 
CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: Jeff Kirsher 
CC: Jakub Kicinski 
CC: Steffen Klassert 
CC: Herbert Xu 
CC: net...@vger.kernel.org
CC: intel-wired-...@lists.osuosl.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/bonding/bond_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index f886d97c4359..e2d491c4378c 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -390,7 +390,7 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs)
return -EINVAL;
 
bond = netdev_priv(bond_dev);
-   slave = rtnl_dereference(bond->curr_active_slave);
+   slave = rcu_dereference(bond->curr_active_slave);
xs->xso.real_dev = slave->dev;
bond->xs = xs;
 
@@ -417,7 +417,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
return;
 
bond = netdev_priv(bond_dev);
-   slave = rtnl_dereference(bond->curr_active_slave);
+   slave = rcu_dereference(bond->curr_active_slave);
 
if (!slave)
return;
@@ -442,7 +442,7 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
 {
struct net_device *bond_dev = xs->xso.dev;
struct bonding *bond = netdev_priv(bond_dev);
-   struct slave *curr_active = rtnl_dereference(bond->curr_active_slave);
+   struct slave *curr_active = rcu_dereference(bond->curr_active_slave);
struct net_device *slave_dev = curr_active->dev;
 
if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
-- 
2.20.1

Re: WARNING: suspicious RCU usage in bond_ipsec_add_sa

2020-07-08 Thread Jarod Wilson

On Mon, Jul 6, 2020 at 11:44 AM syzbot
 wrote:
>
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:e44f65fd xen-netfront: remove redundant assignment to vari..
> git tree:   net-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=16148f8710
> kernel config:  https://syzkaller.appspot.com/x/.config?x=829871134ca5e230
> dashboard link: https://syzkaller.appspot.com/bug?extid=582c98032903dcc04816
> compiler:   gcc (GCC) 10.1.0-syz 20200507
>
> Unfortunately, I don't have any reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+582c98032903dcc04...@syzkaller.appspotmail.com
>
> =
> WARNING: suspicious RCU usage
> 5.8.0-rc2-syzkaller #0 Not tainted
> -
> drivers/net/bonding/bond_main.c:387 suspicious rcu_dereference_protected() usage!

Hm. Access to curr_active_slave in the bonding driver is kind of all
over the place, between rtnl_dereference, rcu_dereference,
rcu_access_pointer and just reading it without any protections. It
does look like this is a case where bond_ipsec_add_sa() gets called
without RTNL being held, so perhaps we should be using rcu_dereference
here, since we do need to dereference the acquired pointer, but
probably don't need to be holding RTNL here.
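For readers less familiar with the two accessors: rtnl_dereference() is a wrapper around rcu_dereference_protected() that asserts (under lockdep) that RTNL is held, which is why the splat quoted above names rcu_dereference_protected(). rcu_dereference() instead expects an RCU read-side critical section. The userspace sketch below is only an analogy, assuming a C11 acquire load as a stand-in for rcu_dereference(); the rcu_demo_* types and functions are hypothetical. It shows the reader side loading the published pointer once and checking it for NULL before following ->dev.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the bonding structures. */
struct rcu_demo_dev { int ifindex; };
struct rcu_demo_slave { struct rcu_demo_dev *dev; };
struct rcu_demo_bond { _Atomic(struct rcu_demo_slave *) curr_active_slave; };

/*
 * Analogy only: an acquire load stands in for rcu_dereference(). Real kernel
 * code must also be inside rcu_read_lock()/rcu_read_unlock() (or hold RTNL
 * when using rtnl_dereference()) so the slave cannot be freed while in use.
 */
static struct rcu_demo_slave *rcu_demo_active(struct rcu_demo_bond *bond)
{
	assert(bond != NULL);
	return atomic_load_explicit(&bond->curr_active_slave, memory_order_acquire);
}

/* Returns the active slave's ifindex, or -1 when there is no active slave. */
static int rcu_demo_active_ifindex(struct rcu_demo_bond *bond)
{
	struct rcu_demo_slave *slave = rcu_demo_active(bond);

	/* The published pointer may be NULL, so check it before following ->dev. */
	if (!slave)
		return -1;
	return slave->dev->ifindex;
}
```

The memory model here is deliberately simplified; kernel RCU also covers deferred freeing of the old object, which a plain atomic load does not.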


> other info that might help us debug this:
>
>
> rcu_scheduler_active = 2, debug_locks = 1
> 1 lock held by syz-executor.0/5186:
>  #0: 888089791a28 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{3:3}, at: xfrm_netlink_rcv+0x5c/0x90 net/xfrm/xfrm_user.c:2687
>
> stack backtrace:
> CPU: 1 PID: 5186 Comm: syz-executor.0 Not tainted 5.8.0-rc2-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x18f/0x20d lib/dump_stack.c:118
>  bond_ipsec_add_sa+0x1c8/0x220 drivers/net/bonding/bond_main.c:387
>  xfrm_dev_state_add+0x2da/0x7b0 net/xfrm/xfrm_device.c:268
>  xfrm_state_construct net/xfrm/xfrm_user.c:655 [inline]
>  xfrm_add_sa+0x2166/0x34f0 net/xfrm/xfrm_user.c:684
>  xfrm_user_rcv_msg+0x414/0x700 net/xfrm/xfrm_user.c:2680
>  netlink_rcv_skb+0x15a/0x430 net/netlink/af_netlink.c:2469
>  xfrm_netlink_rcv+0x6b/0x90 net/xfrm/xfrm_user.c:2688
>  netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
>  netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1329
>  netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1918
>  sock_sendmsg_nosec net/socket.c:652 [inline]
>  sock_sendmsg+0xcf/0x120 net/socket.c:672
>  sys_sendmsg+0x6e8/0x810 net/socket.c:2352
>  ___sys_sendmsg+0xf3/0x170 net/socket.c:2406
>  __sys_sendmsg+0xe5/0x1b0 net/socket.c:2439
>  do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x45cb29
> Code: Bad RIP value.
> RSP: 002b:7ff7e9a92c78 EFLAGS: 0246 ORIG_RAX: 002e
> RAX: ffda RBX: 005027e0 RCX: 0045cb29
> RDX:  RSI: 2180 RDI: 0003
> RBP: 0078bf00 R08:  R09: 
> R10:  R11: 0246 R12: 
> R13: 0a45 R14: 004cd2c9 R15: 7ff7e9a936d4
> bond0: (slave bond_slave_0): Slave does not support ipsec offload
>
>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkal...@googlegroups.com.
>
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>


-- 
Jarod Wilson
ja...@redhat.com

[PATCH net-next] bonding: deal with xfrm state in all modes and add more error-checking

2020-07-08 Thread Jarod Wilson

It's possible that device removal happens when the bond is in non-AB mode,
and addition happens in AB mode, so bond_ipsec_del_sa() never gets called,
which leaves security associations in an odd state if bond_ipsec_add_sa()
then gets called after switching the bond into AB. Just call add and
delete universally for all modes to keep things consistent.

However, it's also possible that this code gets called when the system is
shutting down, and the xfrm subsystem has already been disconnected from
the bond device, so we need to do some error-checking and bail, lest we
hit a null ptr deref.
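The ordering this patch settles on in bond_change_active_slave() can be modeled in a few lines of userspace C: drop the SA from the old active slave before switching, then program it on the new one afterwards, in every mode, and bail out early when the xfrm side is already detached. This is a hypothetical sketch (the fo_* names are invented, slaves are plain integer ids, and the xfrm_dev_state_flush() call the patch also makes is left out), not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: which slave (by id) currently holds the offloaded SA. */
struct fo_sa { int programmed_on; }; /* slave id, or -1 for none */

struct fo_bond {
	bool xfrm_attached; /* stands in for xs->xso.dev being non-NULL */
	int active;         /* active slave id, or -1 */
	struct fo_sa *xs;   /* stands in for bond->xs */
};

static void fo_del_sa(struct fo_bond *bond)
{
	if (!bond->xfrm_attached) /* e.g. during shutdown: bail out quietly */
		return;
	bond->xs->programmed_on = -1;
}

static int fo_add_sa(struct fo_bond *bond)
{
	if (!bond->xfrm_attached || bond->active < 0)
		return -1; /* stands in for -EINVAL */
	bond->xs->programmed_on = bond->active;
	return 0;
}

/* Same ordering as the patch: drop from the old slave, switch, add to the new. */
static void fo_change_active(struct fo_bond *bond, int new_active)
{
	assert(bond != NULL);
	if (bond->active == new_active)
		return;
	if (bond->active >= 0 && bond->xs)
		fo_del_sa(bond);
	bond->active = new_active;
	if (new_active >= 0 && bond->xs)
		(void)fo_add_sa(bond);
}
```

Because neither step looks at the bonding mode, an SA added while in one mode is still removed correctly after a switch to another, which is the inconsistency the commit message describes.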

Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load")
CC: Huy Nguyen 
CC: Saeed Mahameed 
CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: Jeff Kirsher 
CC: Jakub Kicinski 
CC: Steffen Klassert 
CC: Herbert Xu 
CC: net...@vger.kernel.org
CC: intel-wired-...@lists.osuosl.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/bonding/bond_main.c | 39 +
 1 file changed, 25 insertions(+), 14 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 2adf6ce20a38..f886d97c4359 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -383,9 +383,14 @@ static int bond_vlan_rx_kill_vid(struct net_device *bond_dev,
 static int bond_ipsec_add_sa(struct xfrm_state *xs)
 {
struct net_device *bond_dev = xs->xso.dev;
-   struct bonding *bond = netdev_priv(bond_dev);
-   struct slave *slave = rtnl_dereference(bond->curr_active_slave);
+   struct bonding *bond;
+   struct slave *slave;
 
+   if (!bond_dev)
+   return -EINVAL;
+
+   bond = netdev_priv(bond_dev);
+   slave = rtnl_dereference(bond->curr_active_slave);
xs->xso.real_dev = slave->dev;
bond->xs = xs;
 
@@ -405,8 +410,14 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs)
 static void bond_ipsec_del_sa(struct xfrm_state *xs)
 {
struct net_device *bond_dev = xs->xso.dev;
-   struct bonding *bond = netdev_priv(bond_dev);
-   struct slave *slave = rtnl_dereference(bond->curr_active_slave);
+   struct bonding *bond;
+   struct slave *slave;
+
+   if (!bond_dev)
+   return;
+
+   bond = netdev_priv(bond_dev);
+   slave = rtnl_dereference(bond->curr_active_slave);
 
if (!slave)
return;
@@ -960,12 +971,12 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
if (old_active == new_active)
return;
 
-   if (new_active) {
 #ifdef CONFIG_XFRM_OFFLOAD
-   if ((BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP) && bond->xs)
-   bond_ipsec_del_sa(bond->xs);
+   if (old_active && bond->xs)
+   bond_ipsec_del_sa(bond->xs);
 #endif /* CONFIG_XFRM_OFFLOAD */
 
+   if (new_active) {
new_active->last_link_up = jiffies;
 
if (new_active->link == BOND_LINK_BACK) {
@@ -1028,13 +1039,6 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
bond_should_notify_peers(bond);
}
 
-#ifdef CONFIG_XFRM_OFFLOAD
-   if (old_active && bond->xs) {
-   xfrm_dev_state_flush(dev_net(bond->dev), bond->dev, true);
-   bond_ipsec_add_sa(bond->xs);
-   }
-#endif /* CONFIG_XFRM_OFFLOAD */
-
call_netdevice_notifiers(NETDEV_BONDING_FAILOVER, bond->dev);
if (should_notify_peers) {
bond->send_peer_notif--;
@@ -1044,6 +1048,13 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
}
}
 
+#ifdef CONFIG_XFRM_OFFLOAD
+   if (new_active && bond->xs) {
+   xfrm_dev_state_flush(dev_net(bond->dev), bond->dev, true);
+   bond_ipsec_add_sa(bond->xs);
+   }
+#endif /* CONFIG_XFRM_OFFLOAD */
+
/* resend IGMP joins since active slave has changed or
 * all were sent on curr_active_slave.
 * resend only if bond is brought up with the affected
-- 
2.20.1

[PATCH net-next] bonding: allow xfrm offload setup post-module-load

2020-06-30 Thread Jarod Wilson

At the moment, bonding xfrm crypto offload can only be set up if the bonding
module is loaded with active-backup mode already set. We need to be able to
make this work with bonds set to AB after the bonding driver has already
been loaded.

So what's done here is:

1) move #define BOND_XFRM_FEATURES to net/bonding.h so it can be used
by both bond_main.c and bond_options.c
2) set BOND_XFRM_FEATURES in bond_dev->hw_features universally, rather than
only when loading in AB mode
3) wire up xfrmdev_ops universally too
4) disable BOND_XFRM_FEATURES in bond_dev->features if not AB
5) exit early (non-AB case) from bond_ipsec_offload_ok, to prevent a
performance hit from traversing into the underlying drivers
6) toggle BOND_XFRM_FEATURES in bond_dev->wanted_features and call
netdev_change_features() from bond_option_mode_set()

In my local testing, I can change bonding modes back and forth on the fly,
have hardware offload work when I'm in AB, and see no performance penalty
to non-AB software encryption, despite having xfrm bits all wired up for
all modes now.
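The feature-flag handling described above boils down to two masks: the XFRM bits are always advertised as toggleable hardware features, but enabled (at setup) or requested (on a mode change) only in active-backup. The sketch below models that with plain 64-bit masks; the XF_* bit values, enum xf_mode and both functions are hypothetical, and the real driver follows the wanted_features update with netdev_change_features().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values; the kernel's NETIF_F_* bits are defined elsewhere. */
#define XF_HW_ESP         (1ULL << 0)
#define XF_HW_ESP_TX_CSUM (1ULL << 1)
#define XF_GSO_ESP        (1ULL << 2)
#define XF_SG             (1ULL << 3)
#define XF_XFRM_FEATURES  (XF_HW_ESP | XF_HW_ESP_TX_CSUM | XF_GSO_ESP)

static_assert((XF_XFRM_FEATURES & XF_SG) == 0, "XFRM bits must not overlap others");

enum xf_mode { XF_MODE_ROUNDROBIN, XF_MODE_ACTIVEBACKUP, XF_MODE_XOR };

/* Like bond_setup(): XFRM bits are always in hw_features, but only enabled in AB. */
static uint64_t xf_setup_features(uint64_t hw_features, enum xf_mode mode)
{
	uint64_t features = hw_features | XF_XFRM_FEATURES;

	if (mode != XF_MODE_ACTIVEBACKUP)
		features &= ~XF_XFRM_FEATURES;
	return features;
}

/* Like bond_option_mode_set(): request XFRM bits only for active-backup. */
static uint64_t xf_mode_set(uint64_t wanted, enum xf_mode mode)
{
	if (mode == XF_MODE_ACTIVEBACKUP)
		wanted |= XF_XFRM_FEATURES;
	else
		wanted &= ~XF_XFRM_FEATURES;
	return wanted; /* the real driver then calls netdev_change_features() */
}
```

Keeping the bits in hw_features unconditionally is what makes them toggleable later; only the enabled or wanted set changes with the mode.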

Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
Reported-by: Huy Nguyen 
CC: Saeed Mahameed 
CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: Jeff Kirsher 
CC: Jakub Kicinski 
CC: Steffen Klassert 
CC: Herbert Xu 
CC: net...@vger.kernel.org
CC: intel-wired-...@lists.osuosl.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/bonding/bond_main.c| 19 ++-
 drivers/net/bonding/bond_options.c |  8 
 include/net/bonding.h  |  5 +
 3 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index b3479584cc16..2adf6ce20a38 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -434,6 +434,9 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
struct slave *curr_active = rtnl_dereference(bond->curr_active_slave);
struct net_device *slave_dev = curr_active->dev;
 
+   if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
+   return true;
+
if (!(slave_dev->xfrmdev_ops
  && slave_dev->xfrmdev_ops->xdo_dev_offload_ok)) {
slave_warn(bond_dev, slave_dev, "%s: no slave xdo_dev_offload_ok\n", __func__);
@@ -1218,11 +1221,6 @@ static netdev_features_t bond_fix_features(struct net_device *dev,
 #define BOND_ENC_FEATURES  (NETIF_F_HW_CSUM | NETIF_F_SG | \
 NETIF_F_RXCSUM | NETIF_F_ALL_TSO)
 
-#ifdef CONFIG_XFRM_OFFLOAD
-#define BOND_XFRM_FEATURES (NETIF_F_HW_ESP | NETIF_F_HW_ESP_TX_CSUM | \
-NETIF_F_GSO_ESP)
-#endif /* CONFIG_XFRM_OFFLOAD */
-
 #define BOND_MPLS_FEATURES (NETIF_F_HW_CSUM | NETIF_F_SG | \
 NETIF_F_ALL_TSO)
 
@@ -4654,8 +4652,7 @@ void bond_setup(struct net_device *bond_dev)
 
 #ifdef CONFIG_XFRM_OFFLOAD
/* set up xfrm device ops (only supported in active-backup right now) */
-   if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
-   bond_dev->xfrmdev_ops = &bond_xfrmdev_ops;
+   bond_dev->xfrmdev_ops = &bond_xfrmdev_ops;
bond->xs = NULL;
 #endif /* CONFIG_XFRM_OFFLOAD */
 
@@ -4678,11 +4675,15 @@ void bond_setup(struct net_device *bond_dev)
 
bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL | NETIF_F_GSO_UDP_L4;
 #ifdef CONFIG_XFRM_OFFLOAD
-   if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
-   bond_dev->hw_features |= BOND_XFRM_FEATURES;
+   bond_dev->hw_features |= BOND_XFRM_FEATURES;
 #endif /* CONFIG_XFRM_OFFLOAD */
bond_dev->features |= bond_dev->hw_features;
bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
+#ifdef CONFIG_XFRM_OFFLOAD
+   /* Disable XFRM features if this isn't an active-backup config */
+   if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
+   bond_dev->features &= ~BOND_XFRM_FEATURES;
+#endif /* CONFIG_XFRM_OFFLOAD */
 }
 
 /* Destroy a bonding device.
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index ddb3916d3506..9abfaae1c6f7 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -767,6 +767,14 @@ static int bond_option_mode_set(struct bonding *bond,
if (newval->value == BOND_MODE_ALB)
bond->params.tlb_dynamic_lb = 1;
 
+#ifdef CONFIG_XFRM_OFFLOAD
+   if (newval->value == BOND_MODE_ACTIVEBACKUP)
+   bond->dev->wanted_features |= BOND_XFRM_FEATURES;
+   else
+   bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
+   netdev_change_features(bond->dev);
+#endif /* CONFIG_XFRM_OFFLOAD */
+
/* don't cache arp_validate between mode

[PATCH net-next] bonding/xfrm: use real_dev instead of slave_dev

2020-06-23 Thread Jarod Wilson

Rather than requiring every hw crypto capable NIC driver to do a check for
slave_dev being set, set real_dev in the xfrm layer at xso init time, and
then override it in the bonding driver as needed. Then NIC drivers can
always use real_dev, and at the same time, we eliminate the use of a
variable name that probably shouldn't have been used in the first place,
particularly given recent current events.
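The division of labor proposed here can be sketched as three small steps: the xfrm core defaults real_dev to dev, an upper device such as a bond overrides it with the lower device it will use, and the NIC driver reads real_dev without any bonding-specific check. The model below is hypothetical (struct rd_netdev and struct rd_xso only loosely mirror xs->xso.dev and xs->xso.real_dev) and is meant to illustrate the idea rather than reproduce the kernel structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for net_device and the xso offload state. */
struct rd_netdev { const char *name; };
struct rd_xso {
	struct rd_netdev *dev;      /* device the SA was requested on */
	struct rd_netdev *real_dev; /* device that actually programs it */
};

/* xfrm core: default real_dev to dev at offload init time. */
static void rd_xso_init(struct rd_xso *xso, struct rd_netdev *dev)
{
	xso->dev = dev;
	xso->real_dev = dev;
}

/* Upper device (e.g. a bond): point real_dev at the lower device it uses. */
static void rd_bond_redirect(struct rd_xso *xso, struct rd_netdev *lower)
{
	assert(lower != NULL);
	xso->real_dev = lower;
}

/* NIC driver: always uses real_dev, with no special case for bonding. */
static const char *rd_nic_target(const struct rd_xso *xso)
{
	return xso->real_dev->name;
}
```

A bare NIC and a NIC under a bond then take the same code path in the driver; only the bonding layer knows a redirect happened.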

CC: Boris Pismenny 
CC: Saeed Mahameed 
CC: Leon Romanovsky 
CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: Jeff Kirsher 
CC: Jakub Kicinski 
CC: Steffen Klassert 
CC: Herbert Xu 
CC: net...@vger.kernel.org
Suggested-by: Saeed Mahameed 
Signed-off-by: Jarod Wilson 
---
 drivers/net/bonding/bond_main.c   |  6 +--
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c| 47 +--
 .../mellanox/mlx5/core/en_accel/ipsec.c   | 10 +---
 include/net/xfrm.h|  2 +-
 net/xfrm/xfrm_device.c|  5 +-
 5 files changed, 21 insertions(+), 49 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 90939ccf2a94..4ef99efc37f6 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -386,7 +386,7 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs)
struct bonding *bond = netdev_priv(bond_dev);
struct slave *slave = rtnl_dereference(bond->curr_active_slave);
 
-   xs->xso.slave_dev = slave->dev;
+   xs->xso.real_dev = slave->dev;
bond->xs = xs;
 
if (!(slave->dev->xfrmdev_ops
@@ -411,7 +411,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
if (!slave)
return;
 
-   xs->xso.slave_dev = slave->dev;
+   xs->xso.real_dev = slave->dev;
 
if (!(slave->dev->xfrmdev_ops
  && slave->dev->xfrmdev_ops->xdo_dev_state_delete)) {
@@ -440,7 +440,7 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
return false;
}
 
-   xs->xso.slave_dev = slave_dev;
+   xs->xso.real_dev = slave_dev;
return slave_dev->xfrmdev_ops->xdo_dev_offload_ok(skb, xs);
 }
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 26b0a58a064d..6516980965a2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -427,14 +427,11 @@ static struct xfrm_state *ixgbe_ipsec_find_rx_state(struct ixgbe_ipsec *ipsec,
 static int ixgbe_ipsec_parse_proto_keys(struct xfrm_state *xs,
u32 *mykey, u32 *mysalt)
 {
-   struct net_device *dev = xs->xso.dev;
+   struct net_device *dev = xs->xso.real_dev;
unsigned char *key_data;
char *alg_name = NULL;
int key_len;
 
-   if (xs->xso.slave_dev)
-   dev = xs->xso.slave_dev;
-
if (!xs->aead) {
netdev_err(dev, "Unsupported IPsec algorithm\n");
return -EINVAL;
@@ -480,9 +477,9 @@ static int ixgbe_ipsec_parse_proto_keys(struct xfrm_state *xs,
  **/
 static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs)
 {
-   struct net_device *dev = xs->xso.dev;
-   struct ixgbe_adapter *adapter;
-   struct ixgbe_hw *hw;
+   struct net_device *dev = xs->xso.real_dev;
+   struct ixgbe_adapter *adapter = netdev_priv(dev);
+   struct ixgbe_hw *hw = &adapter->hw;
u32 mfval, manc, reg;
int num_filters = 4;
bool manc_ipv4;
@@ -500,12 +497,6 @@ static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs)
 #define BMCIP_V6 0x3
 #define BMCIP_MASK   0x3
 
-   if (xs->xso.slave_dev)
-   dev = xs->xso.slave_dev;
-
-   adapter = netdev_priv(dev);
-   hw = &adapter->hw;
-
manc = IXGBE_READ_REG(hw, IXGBE_MANC);
manc_ipv4 = !!(manc & MANC_EN_IPV4_FILTER);
mfval = IXGBE_READ_REG(hw, IXGBE_MFVAL);
@@ -569,22 +560,15 @@ static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs)
  **/
 static int ixgbe_ipsec_add_sa(struct xfrm_state *xs)
 {
-   struct net_device *dev = xs->xso.dev;
-   struct ixgbe_adapter *adapter;
-   struct ixgbe_ipsec *ipsec;
-   struct ixgbe_hw *hw;
+   struct net_device *dev = xs->xso.real_dev;
+   struct ixgbe_adapter *adapter = netdev_priv(dev);
+   struct ixgbe_ipsec *ipsec = adapter->ipsec;
+   struct ixgbe_hw *hw = &adapter->hw;
int checked, match, first;
u16 sa_idx;
int ret;
int i;
 
-   if (xs->xso.slave_dev)
-   dev = xs->xso.slave_dev;
-
-   adapter = netdev_priv(dev);
-   ipsec = adapter->ipsec;
-   hw = &adapter->hw;
-
if (xs->id.proto != IPPROTO_ESP &

Re: [PATCH net-next v2 3/4] mlx5: become aware of when running as a bonding slave

2020-06-21 Thread Jarod Wilson

On Thu, Jun 11, 2020 at 5:51 PM Saeed Mahameed  wrote:
>
> On Wed, 2020-06-10 at 14:59 -0400, Jarod Wilson wrote:
> > I've been unable to get my hands on suitable supported hardware to
> > date,
> > but I believe this ought to be all that is needed to enable the mlx5
> > driver to also work with bonding active-backup crypto offload
> > passthru.
> >
> > CC: Boris Pismenny 
> > CC: Saeed Mahameed 
> > CC: Leon Romanovsky 
> > CC: Jay Vosburgh 
> > CC: Veaceslav Falico 
> > CC: Andy Gospodarek 
> > CC: "David S. Miller" 
> > CC: Jeff Kirsher 
> > CC: Jakub Kicinski 
> > CC: Steffen Klassert 
> > CC: Herbert Xu 
> > CC: net...@vger.kernel.org
> > Signed-off-by: Jarod Wilson 
> > ---
> >  drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 6 ++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
> > b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
> > index 92eb3bad4acd..72ad6664bd73 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
> > @@ -210,6 +210,9 @@ static inline int
> > mlx5e_xfrm_validate_state(struct xfrm_state *x)
> >   struct net_device *netdev = x->xso.dev;
> >   struct mlx5e_priv *priv;
> >
> > + if (x->xso.slave_dev)
> > + netdev = x->xso.slave_dev;
> > +
>
> Do we really need to repeat this per driver ?
> why not just setup xso.real_dev, in xfrm layer once and for all before
> calling device drivers ?
>
> Device drivers will use xso.real_dev blindly.
>
> Will be useful in the future when you add vlan support, etc..

Apologies, I didn't catch your reply until just recently. Yeah, that
sounds like a better approach, if I can work it out cleanly. We just
init xso.real_dev to the same thing as xso.dev, then overwrite it in
the upper layer drivers (bonding, vlan, etc), while device drivers
just always use xso.real_dev, if I'm understanding your suggestion.
I'll see what I can come up with.


-- 
Jarod Wilson
ja...@redhat.com

[PATCH net-next v3 3/4] mlx5: become aware of when running as a bonding slave

2020-06-19 Thread Jarod Wilson

I've been unable to get my hands on suitable supported hardware to date,
but I believe this ought to be all that is needed to enable the mlx5
driver to also work with bonding active-backup crypto offload passthru.

CC: Boris Pismenny 
CC: Saeed Mahameed 
CC: Leon Romanovsky 
CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: Jeff Kirsher 
CC: Jakub Kicinski 
CC: Steffen Klassert 
CC: Herbert Xu 
CC: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
index 92eb3bad4acd..72ad6664bd73 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
@@ -210,6 +210,9 @@ static inline int mlx5e_xfrm_validate_state(struct xfrm_state *x)
struct net_device *netdev = x->xso.dev;
struct mlx5e_priv *priv;
 
+   if (x->xso.slave_dev)
+   netdev = x->xso.slave_dev;
+
priv = netdev_priv(netdev);
 
if (x->props.aalgo != SADB_AALG_NONE) {
@@ -291,6 +294,9 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x)
unsigned int sa_handle;
int err;
 
+   if (x->xso.slave_dev)
+   netdev = x->xso.slave_dev;
+
priv = netdev_priv(netdev);
 
err = mlx5e_xfrm_validate_state(x);
-- 
2.20.1

[PATCH net-next v3 4/4] bonding: support hardware encryption offload to slaves

2020-06-19 Thread Jarod Wilson

Currently, this support is limited to active-backup mode, as I'm not sure
about the feasibility of mapping an xfrm_state's offload handle to
multiple hardware devices simultaneously, and we rely on being able to
pass some hints to both the xfrm and NIC driver about whether or not
they're operating on a slave device.

I've tested this atop an Intel x520 device (ixgbe) using libreswan in
transport mode, successfully achieving ~4.3Gbps throughput with netperf
(more or less identical to throughput on a bare NIC in this system),
as well as successful failover and recovery mid-netperf.

v2: just use CONFIG_XFRM_OFFLOAD for wrapping, isolate more code with it

CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: Jeff Kirsher 
CC: Jakub Kicinski 
CC: Steffen Klassert 
CC: Herbert Xu 
CC: net...@vger.kernel.org
CC: intel-wired-...@lists.osuosl.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/bonding/bond_main.c | 127 +++-
 include/net/bonding.h   |   3 +
 2 files changed, 128 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 004919aea5fb..90939ccf2a94 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -79,6 +79,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -278,8 +279,6 @@ const char *bond_mode_name(int mode)
return names[mode];
 }
 
-/*-- VLAN ---*/
-
 /**
  * bond_dev_queue_xmit - Prepare skb for xmit.
  *
@@ -302,6 +301,8 @@ netdev_tx_t bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb,
return dev_queue_xmit(skb);
 }
 
+/*-- VLAN ---*/
+
 /* In the following 2 functions, bond_vlan_rx_add_vid and bond_vlan_rx_kill_vid,
  * We don't protect the slave list iteration with a lock because:
  * a. This operation is performed in IOCTL context,
@@ -372,6 +373,84 @@ static int bond_vlan_rx_kill_vid(struct net_device *bond_dev,
return 0;
 }
 
+/*-- XFRM ---*/
+
+#ifdef CONFIG_XFRM_OFFLOAD
+/**
+ * bond_ipsec_add_sa - program device with a security association
+ * @xs: pointer to transformer state struct
+ **/
+static int bond_ipsec_add_sa(struct xfrm_state *xs)
+{
+   struct net_device *bond_dev = xs->xso.dev;
+   struct bonding *bond = netdev_priv(bond_dev);
+   struct slave *slave = rtnl_dereference(bond->curr_active_slave);
+
+   xs->xso.slave_dev = slave->dev;
+   bond->xs = xs;
+
+   if (!(slave->dev->xfrmdev_ops
+ && slave->dev->xfrmdev_ops->xdo_dev_state_add)) {
+   slave_warn(bond_dev, slave->dev, "Slave does not support ipsec offload\n");
+   return -EINVAL;
+   }
+
+   return slave->dev->xfrmdev_ops->xdo_dev_state_add(xs);
+}
+
+/**
+ * bond_ipsec_del_sa - clear out this specific SA
+ * @xs: pointer to transformer state struct
+ **/
+static void bond_ipsec_del_sa(struct xfrm_state *xs)
+{
+   struct net_device *bond_dev = xs->xso.dev;
+   struct bonding *bond = netdev_priv(bond_dev);
+   struct slave *slave = rtnl_dereference(bond->curr_active_slave);
+
+   if (!slave)
+   return;
+
+   xs->xso.slave_dev = slave->dev;
+
+   if (!(slave->dev->xfrmdev_ops
+ && slave->dev->xfrmdev_ops->xdo_dev_state_delete)) {
+   slave_warn(bond_dev, slave->dev, "%s: no slave xdo_dev_state_delete\n", __func__);
+   return;
+   }
+
+   slave->dev->xfrmdev_ops->xdo_dev_state_delete(xs);
+}
+
+/**
+ * bond_ipsec_offload_ok - can this packet use the xfrm hw offload
+ * @skb: current data packet
+ * @xs: pointer to transformer state struct
+ **/
+static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
+{
+   struct net_device *bond_dev = xs->xso.dev;
+   struct bonding *bond = netdev_priv(bond_dev);
+   struct slave *curr_active = rtnl_dereference(bond->curr_active_slave);
+   struct net_device *slave_dev = curr_active->dev;
+
+   if (!(slave_dev->xfrmdev_ops
+ && slave_dev->xfrmdev_ops->xdo_dev_offload_ok)) {
+   slave_warn(bond_dev, slave_dev, "%s: no slave xdo_dev_offload_ok\n", __func__);
+   return false;
+   }
+
+   xs->xso.slave_dev = slave_dev;
+   return slave_dev->xfrmdev_ops->xdo_dev_offload_ok(skb, xs);
+}
+
+static const struct xfrmdev_ops bond_xfrmdev_ops = {
+   .xdo_dev_state_add = bond_ipsec_add_sa,
+   .xdo_dev_state_delete = bond_ipsec_del_sa,
+   .xdo_dev_offload_ok = bond_ipsec_offload_ok,
+};
+#endif /* CONFIG_XFRM_OFFLOAD */
+
 /*

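The active-backup pass-through in bond_ipsec_add_sa() boils down to forwarding the SA to whichever slave is currently active. Below is a minimal userspace sketch of that dispatch, assuming cut-down stand-ins for the kernel structs (only the fields used here) and a hypothetical test-double driver op, fake_state_add(). Unlike the posted patch, the sketch returns -ENODEV when there is no active slave (mirroring the NULL check bond_ipsec_del_sa() already does) and only records slave_dev once the slave is known to support offload; both are the sketch's own choices, not part of the patch.

```c
#include <errno.h>
#include <stddef.h>

struct xfrm_state;

/* Cut-down stand-in for the kernel's struct xfrmdev_ops. */
struct xfrmdev_ops {
	int (*xdo_dev_state_add)(struct xfrm_state *xs);
};

/* Cut-down stand-ins: only the fields this sketch touches. */
struct net_device {
	const struct xfrmdev_ops *xfrmdev_ops;	/* NULL: no IPsec offload */
};

struct slave {
	struct net_device *dev;
};

struct xfrm_state {
	struct net_device *slave_dev;	/* plays the role of xs->xso.slave_dev */
};

/* Hypothetical test double for a NIC driver's xdo_dev_state_add(). */
static int fake_state_add(struct xfrm_state *xs)
{
	(void)xs;
	return 0;	/* pretend the SA was programmed into hardware */
}

static const struct xfrmdev_ops fake_ops = {
	.xdo_dev_state_add = fake_state_add,
};

/* Forward an SA from the bond to its current active slave. */
static int bond_add_sa_sketch(struct slave *active, struct xfrm_state *xs)
{
	const struct xfrmdev_ops *ops;

	if (!active)
		return -ENODEV;		/* no active slave to program */

	ops = active->dev->xfrmdev_ops;
	if (!(ops && ops->xdo_dev_state_add))
		return -EINVAL;		/* slave cannot offload IPsec */

	xs->slave_dev = active->dev;	/* tell the NIC driver it is a slave */
	return ops->xdo_dev_state_add(xs);
}
```

Keeping the bond itself free of any hardware state is what lets the same three callbacks work for any slave driver that implements xfrmdev_ops.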
[PATCH net-next v3 1/4] xfrm: bail early on slave pass over skb

2020-06-19 Thread Jarod Wilson

This is prep work for initial bonding hardware encryption pass-through
support. The bonding driver will fill in the slave_dev
pointer, and we use that to know not to skb_push() again on a given
skb that was already processed on the bond device.

CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: Jeff Kirsher 
CC: Jakub Kicinski 
CC: Steffen Klassert 
CC: Herbert Xu 
CC: net...@vger.kernel.org
CC: intel-wired-...@lists.osuosl.org
Signed-off-by: Jarod Wilson 
---
 include/net/xfrm.h |  1 +
 net/xfrm/xfrm_device.c | 34 +-
 2 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 094fe682f5d7..e20b2b27ec48 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -127,6 +127,7 @@ struct xfrm_state_walk {
 
 struct xfrm_state_offload {
struct net_device   *dev;
+   struct net_device   *slave_dev;
unsigned long   offload_handle;
unsigned intnum_exthdrs;
u8  flags;
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index f50d1f97cf8e..b8918fc5248b 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -106,6 +106,7 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur
struct sk_buff *skb2, *nskb, *pskb = NULL;
netdev_features_t esp_features = features;
struct xfrm_offload *xo = xfrm_offload(skb);
+   struct net_device *dev = skb->dev;
struct sec_path *sp;
 
if (!xo)
@@ -119,6 +120,10 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur
if (xo->flags & XFRM_GRO || x->xso.flags & XFRM_OFFLOAD_INBOUND)
return skb;
 
+   /* This skb was already validated on the master dev */
+   if ((x->xso.dev != dev) && (x->xso.slave_dev == dev))
+   return skb;
+
local_irq_save(flags);
sd = this_cpu_ptr(&softnet_data);
err = !skb_queue_empty(&sd->xfrm_backlog);
@@ -129,25 +134,20 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur
return skb;
}
 
-   if (skb_is_gso(skb)) {
-   struct net_device *dev = skb->dev;
-
-   if (unlikely(x->xso.dev != dev)) {
-   struct sk_buff *segs;
+   if (skb_is_gso(skb) && unlikely(x->xso.dev != dev)) {
+   struct sk_buff *segs;
 
-   /* Packet got rerouted, fixup features and segment it. */
-   esp_features = esp_features & ~(NETIF_F_HW_ESP
-   | NETIF_F_GSO_ESP);
+   /* Packet got rerouted, fixup features and segment it. */
+   esp_features = esp_features & ~(NETIF_F_HW_ESP | NETIF_F_GSO_ESP);
 
-   segs = skb_gso_segment(skb, esp_features);
-   if (IS_ERR(segs)) {
-   kfree_skb(skb);
-   atomic_long_inc(&dev->tx_dropped);
-   return NULL;
-   } else {
-   consume_skb(skb);
-   skb = segs;
-   }
+   segs = skb_gso_segment(skb, esp_features);
+   if (IS_ERR(segs)) {
+   kfree_skb(skb);
+   atomic_long_inc(&dev->tx_dropped);
+   return NULL;
+   } else {
+   consume_skb(skb);
+   skb = segs;
}
}
 
-- 
2.20.1
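The new early return in validate_xmit_xfrm() and the flattened GSO test can be read as two small predicates. The sketch below restates them over cut-down stand-in structs; the function names are hypothetical, not kernel API, skb_dev stands for skb->dev, and is_gso stands for skb_is_gso(skb). The kernel's unlikely() is only a branch-prediction hint, so it is dropped here. Because the early return runs first in the patched function, an skb on its second pass (through the slave) never reaches the segmentation branch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel structs. */
struct net_device {
	int ifindex;
};

struct xfrm_state_offload {
	struct net_device *dev;		/* device the SA was offloaded to */
	struct net_device *slave_dev;	/* set by bonding, else NULL */
};

/* True when the bond already validated this skb and it is now being
 * transmitted on the slave that actually holds the SA. */
static bool xfrm_already_validated(const struct xfrm_state_offload *xso,
				   const struct net_device *skb_dev)
{
	return xso->dev != skb_dev && xso->slave_dev == skb_dev;
}

/* True when a GSO skb was rerouted away from the offload device and
 * must be segmented in software with the ESP offload bits cleared. */
static bool xfrm_needs_sw_segment(bool is_gso,
				  const struct xfrm_state_offload *xso,
				  const struct net_device *skb_dev)
{
	return is_gso && xso->dev != skb_dev;
}
```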

[PATCH net-next v3 0/4] bonding: initial support for hardware crypto offload

2020-06-19 Thread Jarod Wilson

This is an initial functional implementation for doing pass-through of
hardware encryption from bonding device to capable slaves, in active-backup
bond setups. This was developed and tested using ixgbe-driven Intel x520
interfaces with libreswan and a transport mode connection, primarily using
netperf, with assorted connection failures forced during transmission. The
failover works quite well in my testing, and overall performance is right
on par with offload when running on a bare interface, no bond involved.

Caveats: this is ONLY enabled for active-backup, because I'm not sure
how one would manage multiple offload handles for different devices all
running at the same time in the same xfrm. It also relies on some minor
changes to both the xfrm code and slave device driver code to get things
to behave. I don't have immediate access to any other hardware that
could function similarly, but the NIC driver changes are minimal and
straightforward enough that I've included what I think ought to be
enough for mlx5 devices too.

v2: reordered patches, switched (back) to using CONFIG_XFRM_OFFLOAD
to wrap the code additions and wrapped overlooked additions.
v3: rebase w/net-next open, add proper cc list to cover letter

Jarod Wilson (4):
  xfrm: bail early on slave pass over skb
  ixgbe_ipsec: become aware of when running as a bonding slave
  mlx5: become aware of when running as a bonding slave
  bonding: support hardware encryption offload to slaves

CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: Jeff Kirsher 
CC: Jakub Kicinski 
CC: Steffen Klassert 
CC: Herbert Xu 
CC: net...@vger.kernel.org
CC: intel-wired-...@lists.osuosl.org
Signed-off-by: Jarod Wilson 

 drivers/net/bonding/bond_main.c   | 127 +-
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c|  39 --
 .../mellanox/mlx5/core/en_accel/ipsec.c   |   6 +
 include/net/bonding.h |   3 +
 include/net/xfrm.h|   1 +
 net/xfrm/xfrm_device.c|  34 ++---
 6 files changed, 183 insertions(+), 27 deletions(-)

-- 
2.20.1
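The single-handle caveat follows from the struct layout: struct xfrm_state_offload carries one dev, one slave_dev and one offload_handle, so an SA can be owned by only one NIC at a time. The sketch below illustrates that constraint under an assumption: that failover moves the SA by deleting it from the old active slave and re-adding it on the new one. The patch keeps the state in bond->xs, but the failover code itself is not part of this excerpt, and struct nic, struct sa_offload, the fake handle numbering and failover_sa() are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for a NIC with an on-chip SA table. */
struct nic {
	const char *name;
	int sa_count;	/* SAs currently programmed on this NIC */
};

/* Mirrors the single-owner layout of struct xfrm_state_offload. */
struct sa_offload {
	struct nic *slave_dev;		/* the one NIC holding this SA */
	unsigned long offload_handle;	/* only meaningful on slave_dev */
};

static void nic_del_sa(struct nic *n, struct sa_offload *sa)
{
	n->sa_count--;
	sa->slave_dev = NULL;
	sa->offload_handle = 0;
}

static void nic_add_sa(struct nic *n, struct sa_offload *sa)
{
	n->sa_count++;
	sa->slave_dev = n;
	sa->offload_handle = (unsigned long)n->sa_count;	/* fake handle */
}

/* Assumed failover order: release the old owner, then program the new one. */
static void failover_sa(struct sa_offload *sa, struct nic *new_active)
{
	if (sa->slave_dev)
		nic_del_sa(sa->slave_dev, sa);
	nic_add_sa(new_active, sa);
}
```

With more than one slave transmitting at once (balance-rr, 802.3ad and so on), the single slave_dev/offload_handle pair would have no obvious owner, which is the open question the cover letter raises.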

[PATCH net-next v3 2/4] ixgbe_ipsec: become aware of when running as a bonding slave

2020-06-19 Thread Jarod Wilson

Slave devices in a bond doing hardware encryption also need to be aware
that they're slaves, so we operate on the slave instead of the bonding
master to do the actual hardware encryption offload bits.

CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: Jeff Kirsher 
CC: Jakub Kicinski 
CC: Steffen Klassert 
CC: Herbert Xu 
CC: net...@vger.kernel.org
CC: intel-wired-...@lists.osuosl.org
Acked-by: Jeff Kirsher 
Signed-off-by: Jarod Wilson 
---
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c| 39 +++
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 113f6087c7c9..26b0a58a064d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -432,6 +432,9 @@ static int ixgbe_ipsec_parse_proto_keys(struct xfrm_state *xs,
char *alg_name = NULL;
int key_len;
 
+   if (xs->xso.slave_dev)
+   dev = xs->xso.slave_dev;
+
if (!xs->aead) {
netdev_err(dev, "Unsupported IPsec algorithm\n");
return -EINVAL;
@@ -478,8 +481,8 @@ static int ixgbe_ipsec_parse_proto_keys(struct xfrm_state *xs,
 static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs)
 {
struct net_device *dev = xs->xso.dev;
-   struct ixgbe_adapter *adapter = netdev_priv(dev);
-   struct ixgbe_hw *hw = &adapter->hw;
+   struct ixgbe_adapter *adapter;
+   struct ixgbe_hw *hw;
u32 mfval, manc, reg;
int num_filters = 4;
bool manc_ipv4;
@@ -497,6 +500,12 @@ static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs)
 #define BMCIP_V6 0x3
 #define BMCIP_MASK   0x3
 
+   if (xs->xso.slave_dev)
+   dev = xs->xso.slave_dev;
+
+   adapter = netdev_priv(dev);
+   hw = &adapter->hw;
+
manc = IXGBE_READ_REG(hw, IXGBE_MANC);
manc_ipv4 = !!(manc & MANC_EN_IPV4_FILTER);
mfval = IXGBE_READ_REG(hw, IXGBE_MFVAL);
@@ -561,14 +570,21 @@ static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs)
 static int ixgbe_ipsec_add_sa(struct xfrm_state *xs)
 {
struct net_device *dev = xs->xso.dev;
-   struct ixgbe_adapter *adapter = netdev_priv(dev);
-   struct ixgbe_ipsec *ipsec = adapter->ipsec;
-   struct ixgbe_hw *hw = &adapter->hw;
+   struct ixgbe_adapter *adapter;
+   struct ixgbe_ipsec *ipsec;
+   struct ixgbe_hw *hw;
int checked, match, first;
u16 sa_idx;
int ret;
int i;
 
+   if (xs->xso.slave_dev)
+   dev = xs->xso.slave_dev;
+
+   adapter = netdev_priv(dev);
+   ipsec = adapter->ipsec;
+   hw = &adapter->hw;
+
if (xs->id.proto != IPPROTO_ESP && xs->id.proto != IPPROTO_AH) {
netdev_err(dev, "Unsupported protocol 0x%04x for ipsec offload\n",
   xs->id.proto);
@@ -746,12 +762,19 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs)
 static void ixgbe_ipsec_del_sa(struct xfrm_state *xs)
 {
struct net_device *dev = xs->xso.dev;
-   struct ixgbe_adapter *adapter = netdev_priv(dev);
-   struct ixgbe_ipsec *ipsec = adapter->ipsec;
-   struct ixgbe_hw *hw = &adapter->hw;
+   struct ixgbe_adapter *adapter;
+   struct ixgbe_ipsec *ipsec;
+   struct ixgbe_hw *hw;
u32 zerobuf[4] = {0, 0, 0, 0};
u16 sa_idx;
 
+   if (xs->xso.slave_dev)
+   dev = xs->xso.slave_dev;
+
+   adapter = netdev_priv(dev);
+   ipsec = adapter->ipsec;
+   hw = &adapter->hw;
+
if (xs->xso.flags & XFRM_OFFLOAD_INBOUND) {
struct rx_sa *rsa;
u8 ipi;
-- 
2.20.1
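The ixgbe hunks move the netdev_priv() calls out of the declarations because of ordering: the adapter has to be derived from the device after slave_dev has been substituted, otherwise netdev_priv() would return the bond's private area rather than an ixgbe adapter. A minimal sketch of that ordering, assuming a mock in place of netdev_priv() and cut-down stand-in structs (mock_adapter, netdev_priv_mock() and resolve_adapter() are hypothetical names):

```c
#include <stddef.h>

/* Hypothetical stand-in for struct ixgbe_adapter. */
struct mock_adapter {
	int hw_id;
};

struct net_device {
	void *priv;	/* what netdev_priv() would hand back */
};

struct xfrm_state_offload {
	struct net_device *dev;		/* the bond when bonded */
	struct net_device *slave_dev;	/* the real NIC, or NULL */
};

/* Mock of netdev_priv(): returns the driver-private area of @dev. */
static void *netdev_priv_mock(const struct net_device *dev)
{
	return dev->priv;
}

/* Choose the device first, then derive driver state from it. */
static struct mock_adapter *resolve_adapter(const struct xfrm_state_offload *xso)
{
	struct net_device *dev = xso->dev;

	if (xso->slave_dev)
		dev = xso->slave_dev;

	return netdev_priv_mock(dev);
}
```

The mlx5 hunks follow the same rule: netdev is swapped for slave_dev before priv = netdev_priv(netdev) is computed.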

Re: [PATCH net-next v2 0/4] bonding: initial support for hardware crypto offload

2020-06-19 Thread Jarod Wilson

On Fri, Jun 19, 2020 at 6:26 AM Jeff Kirsher wrote:
>
> On Wed, Jun 10, 2020 at 1:18 PM Jarod Wilson  wrote:
> >
> > This is an initial functional implementation for doing pass-through of
> > hardware encryption from bonding device to capable slaves, in active-backup
> > bond setups. This was developed and tested using ixgbe-driven Intel x520
> > interfaces with libreswan and a transport mode connection, primarily using
> > netperf, with assorted connection failures forced during transmission. The
> > failover works quite well in my testing, and overall performance is right
> > on par with offload when running on a bare interface, no bond involved.
> >
> > Caveats: this is ONLY enabled for active-backup, because I'm not sure
> > how one would manage multiple offload handles for different devices all
> > running at the same time in the same xfrm, and it relies on some minor
> > changes to both the xfrm code and slave device driver code to get things
> > to behave, and I don't have immediate access to any other hardware that
> > could function similarly, but the NIC driver changes are minimal and
> > straight-forward enough that I've included what I think ought to be
> > enough for mlx5 devices too.
> >
> > v2: reordered patches, switched (back) to using CONFIG_XFRM_OFFLOAD
> > to wrap the code additions and wrapped overlooked additions.
> >
> > Jarod Wilson (4):
> >   xfrm: bail early on slave pass over skb
> >   ixgbe_ipsec: become aware of when running as a bonding slave
> >   mlx5: become aware of when running as a bonding slave
> >   bonding: support hardware encryption offload to slaves
...
> Was this ever sent to netdev (the more appropriate ML)?

I believe so, but I'd neglected to notice net-next was closed at the
time, so I was holding on to it to resubmit once net-next opened
back up.

-- 
Jarod Wilson
ja...@redhat.com

[PATCH net-next v2 1/4] xfrm: bail early on slave pass over skb

2020-06-10 Thread Jarod Wilson

This is prep work for initial bonding hardware encryption pass-through
support. The bonding driver will fill in the slave_dev
pointer, and we use that to know not to skb_push() again on a given
skb that was already processed on the bond device.

CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: Jeff Kirsher 
CC: Jakub Kicinski 
CC: Steffen Klassert 
CC: Herbert Xu 
CC: net...@vger.kernel.org
CC: intel-wired-...@lists.osuosl.org
Signed-off-by: Jarod Wilson 
---
 include/net/xfrm.h |  1 +
 net/xfrm/xfrm_device.c | 34 +-
 2 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 094fe682f5d7..e20b2b27ec48 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -127,6 +127,7 @@ struct xfrm_state_walk {
 
 struct xfrm_state_offload {
struct net_device   *dev;
+   struct net_device   *slave_dev;
unsigned long   offload_handle;
unsigned intnum_exthdrs;
u8  flags;
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index f50d1f97cf8e..b8918fc5248b 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -106,6 +106,7 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur
struct sk_buff *skb2, *nskb, *pskb = NULL;
netdev_features_t esp_features = features;
struct xfrm_offload *xo = xfrm_offload(skb);
+   struct net_device *dev = skb->dev;
struct sec_path *sp;
 
if (!xo)
@@ -119,6 +120,10 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur
if (xo->flags & XFRM_GRO || x->xso.flags & XFRM_OFFLOAD_INBOUND)
return skb;
 
+   /* This skb was already validated on the master dev */
+   if ((x->xso.dev != dev) && (x->xso.slave_dev == dev))
+   return skb;
+
local_irq_save(flags);
sd = this_cpu_ptr(&softnet_data);
err = !skb_queue_empty(&sd->xfrm_backlog);
@@ -129,25 +134,20 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur
return skb;
}
 
-   if (skb_is_gso(skb)) {
-   struct net_device *dev = skb->dev;
-
-   if (unlikely(x->xso.dev != dev)) {
-   struct sk_buff *segs;
+   if (skb_is_gso(skb) && unlikely(x->xso.dev != dev)) {
+   struct sk_buff *segs;
 
-   /* Packet got rerouted, fixup features and segment it. */
-   esp_features = esp_features & ~(NETIF_F_HW_ESP
-   | NETIF_F_GSO_ESP);
+   /* Packet got rerouted, fixup features and segment it. */
+   esp_features = esp_features & ~(NETIF_F_HW_ESP | NETIF_F_GSO_ESP);
 
-   segs = skb_gso_segment(skb, esp_features);
-   if (IS_ERR(segs)) {
-   kfree_skb(skb);
-   atomic_long_inc(&dev->tx_dropped);
-   return NULL;
-   } else {
-   consume_skb(skb);
-   skb = segs;
-   }
+   segs = skb_gso_segment(skb, esp_features);
+   if (IS_ERR(segs)) {
+   kfree_skb(skb);
+   atomic_long_inc(&dev->tx_dropped);
+   return NULL;
+   } else {
+   consume_skb(skb);
+   skb = segs;
}
}
 
-- 
2.20.1

[PATCH net-next v2 3/4] mlx5: become aware of when running as a bonding slave

2020-06-10 Thread Jarod Wilson

I've been unable to get my hands on suitable supported hardware to date,
but I believe this ought to be all that is needed to enable the mlx5
driver to also work with bonding active-backup crypto offload passthru.

CC: Boris Pismenny 
CC: Saeed Mahameed 
CC: Leon Romanovsky 
CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: Jeff Kirsher 
CC: Jakub Kicinski 
CC: Steffen Klassert 
CC: Herbert Xu 
CC: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
index 92eb3bad4acd..72ad6664bd73 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
@@ -210,6 +210,9 @@ static inline int mlx5e_xfrm_validate_state(struct xfrm_state *x)
struct net_device *netdev = x->xso.dev;
struct mlx5e_priv *priv;
 
+   if (x->xso.slave_dev)
+   netdev = x->xso.slave_dev;
+
priv = netdev_priv(netdev);
 
if (x->props.aalgo != SADB_AALG_NONE) {
@@ -291,6 +294,9 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x)
unsigned int sa_handle;
int err;
 
+   if (x->xso.slave_dev)
+   netdev = x->xso.slave_dev;
+
priv = netdev_priv(netdev);
 
err = mlx5e_xfrm_validate_state(x);
-- 
2.20.1

[PATCH net-next v2 2/4] ixgbe_ipsec: become aware of when running as a bonding slave

2020-06-10 Thread Jarod Wilson

Slave devices in a bond doing hardware encryption also need to be aware
that they're slaves, so we operate on the slave instead of the bonding
master to do the actual hardware encryption offload bits.

CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: Jeff Kirsher 
CC: Jakub Kicinski 
CC: Steffen Klassert 
CC: Herbert Xu 
CC: net...@vger.kernel.org
CC: intel-wired-...@lists.osuosl.org
Acked-by: Jeff Kirsher 
Signed-off-by: Jarod Wilson 
---
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c| 39 +++
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 113f6087c7c9..26b0a58a064d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -432,6 +432,9 @@ static int ixgbe_ipsec_parse_proto_keys(struct xfrm_state *xs,
char *alg_name = NULL;
int key_len;
 
+   if (xs->xso.slave_dev)
+   dev = xs->xso.slave_dev;
+
if (!xs->aead) {
netdev_err(dev, "Unsupported IPsec algorithm\n");
return -EINVAL;
@@ -478,8 +481,8 @@ static int ixgbe_ipsec_parse_proto_keys(struct xfrm_state *xs,
 static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs)
 {
struct net_device *dev = xs->xso.dev;
-   struct ixgbe_adapter *adapter = netdev_priv(dev);
-   struct ixgbe_hw *hw = &adapter->hw;
+   struct ixgbe_adapter *adapter;
+   struct ixgbe_hw *hw;
u32 mfval, manc, reg;
int num_filters = 4;
bool manc_ipv4;
@@ -497,6 +500,12 @@ static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs)
 #define BMCIP_V6 0x3
 #define BMCIP_MASK   0x3
 
+   if (xs->xso.slave_dev)
+   dev = xs->xso.slave_dev;
+
+   adapter = netdev_priv(dev);
+   hw = &adapter->hw;
+
manc = IXGBE_READ_REG(hw, IXGBE_MANC);
manc_ipv4 = !!(manc & MANC_EN_IPV4_FILTER);
mfval = IXGBE_READ_REG(hw, IXGBE_MFVAL);
@@ -561,14 +570,21 @@ static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs)
 static int ixgbe_ipsec_add_sa(struct xfrm_state *xs)
 {
struct net_device *dev = xs->xso.dev;
-   struct ixgbe_adapter *adapter = netdev_priv(dev);
-   struct ixgbe_ipsec *ipsec = adapter->ipsec;
-   struct ixgbe_hw *hw = &adapter->hw;
+   struct ixgbe_adapter *adapter;
+   struct ixgbe_ipsec *ipsec;
+   struct ixgbe_hw *hw;
int checked, match, first;
u16 sa_idx;
int ret;
int i;
 
+   if (xs->xso.slave_dev)
+   dev = xs->xso.slave_dev;
+
+   adapter = netdev_priv(dev);
+   ipsec = adapter->ipsec;
+   hw = &adapter->hw;
+
if (xs->id.proto != IPPROTO_ESP && xs->id.proto != IPPROTO_AH) {
netdev_err(dev, "Unsupported protocol 0x%04x for ipsec offload\n",
   xs->id.proto);
@@ -746,12 +762,19 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs)
 static void ixgbe_ipsec_del_sa(struct xfrm_state *xs)
 {
struct net_device *dev = xs->xso.dev;
-   struct ixgbe_adapter *adapter = netdev_priv(dev);
-   struct ixgbe_ipsec *ipsec = adapter->ipsec;
-   struct ixgbe_hw *hw = &adapter->hw;
+   struct ixgbe_adapter *adapter;
+   struct ixgbe_ipsec *ipsec;
+   struct ixgbe_hw *hw;
u32 zerobuf[4] = {0, 0, 0, 0};
u16 sa_idx;
 
+   if (xs->xso.slave_dev)
+   dev = xs->xso.slave_dev;
+
+   adapter = netdev_priv(dev);
+   ipsec = adapter->ipsec;
+   hw = &adapter->hw;
+
if (xs->xso.flags & XFRM_OFFLOAD_INBOUND) {
struct rx_sa *rsa;
u8 ipi;
-- 
2.20.1

[PATCH net-next v2 4/4] bonding: support hardware encryption offload to slaves

2020-06-10 Thread Jarod Wilson

Currently, this support is limited to active-backup mode, as I'm not sure
about the feasibility of mapping an xfrm_state's offload handle to
multiple hardware devices simultaneously, and we rely on being able to
pass some hints to both the xfrm and NIC driver about whether or not
they're operating on a slave device.

I've tested this atop an Intel x520 device (ixgbe) using libreswan in
transport mode, successfully achieving ~4.3Gbps throughput with netperf
(more or less identical to throughput on a bare NIC in this system),
as well as successful failover and recovery mid-netperf.

v2: just use CONFIG_XFRM_OFFLOAD for wrapping, isolate more code with it

CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: Jeff Kirsher 
CC: Jakub Kicinski 
CC: Steffen Klassert 
CC: Herbert Xu 
CC: net...@vger.kernel.org
CC: intel-wired-...@lists.osuosl.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/bonding/bond_main.c | 127 +++-
 include/net/bonding.h   |   3 +
 2 files changed, 128 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index a25c65d4af71..882b57328308 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -79,6 +79,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -278,8 +279,6 @@ const char *bond_mode_name(int mode)
return names[mode];
 }
 
-/*-- VLAN ---*/
-
 /**
  * bond_dev_queue_xmit - Prepare skb for xmit.
  *
@@ -302,6 +301,8 @@ netdev_tx_t bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb,
return dev_queue_xmit(skb);
 }
 
+/*-- VLAN ---*/
+
 /* In the following 2 functions, bond_vlan_rx_add_vid and bond_vlan_rx_kill_vid,
  * We don't protect the slave list iteration with a lock because:
  * a. This operation is performed in IOCTL context,
@@ -372,6 +373,84 @@ static int bond_vlan_rx_kill_vid(struct net_device *bond_dev,
return 0;
 }
 
+/*-- XFRM ---*/
+
+#ifdef CONFIG_XFRM_OFFLOAD
+/**
+ * bond_ipsec_add_sa - program device with a security association
+ * @xs: pointer to transformer state struct
+ **/
+static int bond_ipsec_add_sa(struct xfrm_state *xs)
+{
+   struct net_device *bond_dev = xs->xso.dev;
+   struct bonding *bond = netdev_priv(bond_dev);
+   struct slave *slave = rtnl_dereference(bond->curr_active_slave);
+
+   xs->xso.slave_dev = slave->dev;
+   bond->xs = xs;
+
+   if (!(slave->dev->xfrmdev_ops
+ && slave->dev->xfrmdev_ops->xdo_dev_state_add)) {
+   slave_warn(bond_dev, slave->dev, "Slave does not support ipsec offload\n");
+   return -EINVAL;
+   }
+
+   return slave->dev->xfrmdev_ops->xdo_dev_state_add(xs);
+}
+
+/**
+ * bond_ipsec_del_sa - clear out this specific SA
+ * @xs: pointer to transformer state struct
+ **/
+static void bond_ipsec_del_sa(struct xfrm_state *xs)
+{
+   struct net_device *bond_dev = xs->xso.dev;
+   struct bonding *bond = netdev_priv(bond_dev);
+   struct slave *slave = rtnl_dereference(bond->curr_active_slave);
+
+   if (!slave)
+   return;
+
+   xs->xso.slave_dev = slave->dev;
+
+   if (!(slave->dev->xfrmdev_ops
+ && slave->dev->xfrmdev_ops->xdo_dev_state_delete)) {
+   slave_warn(bond_dev, slave->dev, "%s: no slave xdo_dev_state_delete\n", __func__);
+   return;
+   }
+
+   slave->dev->xfrmdev_ops->xdo_dev_state_delete(xs);
+}
+
+/**
+ * bond_ipsec_offload_ok - can this packet use the xfrm hw offload
+ * @skb: current data packet
+ * @xs: pointer to transformer state struct
+ **/
+static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
+{
+   struct net_device *bond_dev = xs->xso.dev;
+   struct bonding *bond = netdev_priv(bond_dev);
+   struct slave *curr_active = rtnl_dereference(bond->curr_active_slave);
+   struct net_device *slave_dev = curr_active->dev;
+
+   if (!(slave_dev->xfrmdev_ops
+ && slave_dev->xfrmdev_ops->xdo_dev_offload_ok)) {
+   slave_warn(bond_dev, slave_dev, "%s: no slave xdo_dev_offload_ok\n", __func__);
+   return false;
+   }
+
+   xs->xso.slave_dev = slave_dev;
+   return slave_dev->xfrmdev_ops->xdo_dev_offload_ok(skb, xs);
+}
+
+static const struct xfrmdev_ops bond_xfrmdev_ops = {
+   .xdo_dev_state_add = bond_ipsec_add_sa,
+   .xdo_dev_state_delete = bond_ipsec_del_sa,
+   .xdo_dev_offload_ok = bond_ipsec_offload_ok,
+};
+#endif /* CONFIG_XFRM_OFFLOAD */
+
 /*

[PATCH net-next v2 0/4] bonding: initial support for hardware crypto offload

2020-06-10 Thread Jarod Wilson

This is an initial functional implementation for doing pass-through of
hardware encryption from bonding device to capable slaves, in active-backup
bond setups. This was developed and tested using ixgbe-driven Intel x520
interfaces with libreswan and a transport mode connection, primarily using
netperf, with assorted connection failures forced during transmission. The
failover works quite well in my testing, and overall performance is right
on par with offload when running on a bare interface, no bond involved.

Caveats: this is ONLY enabled for active-backup, because I'm not sure
how one would manage multiple offload handles for different devices all
running at the same time in the same xfrm. It also relies on some minor
changes to both the xfrm code and slave device driver code to get things
to behave. I don't have immediate access to any other hardware that
could function similarly, but the NIC driver changes are minimal and
straightforward enough that I've included what I think ought to be
enough for mlx5 devices too.

v2: reordered patches, switched (back) to using CONFIG_XFRM_OFFLOAD
to wrap the code additions and wrapped overlooked additions.

Jarod Wilson (4):
  xfrm: bail early on slave pass over skb
  ixgbe_ipsec: become aware of when running as a bonding slave
  mlx5: become aware of when running as a bonding slave
  bonding: support hardware encryption offload to slaves

 drivers/net/bonding/bond_main.c   | 127 +-
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c|  39 --
 .../mellanox/mlx5/core/en_accel/ipsec.c   |   6 +
 include/net/bonding.h |   3 +
 include/net/xfrm.h|   1 +
 net/xfrm/xfrm_device.c|  34 ++---
 6 files changed, 183 insertions(+), 27 deletions(-)

-- 
2.20.1

Re: [PATCH net-next 3/4] bonding: support hardware encryption offload to slaves

2020-06-08 Thread Jarod Wilson

On Mon, Jun 8, 2020 at 7:48 PM Jay Vosburgh  wrote:
>
> Jarod Wilson  wrote:
>
> >Currently, this support is limited to active-backup mode, as I'm not sure
> >about the feasibility of mapping an xfrm_state's offload handle to
> >multiple hardware devices simultaneously, and we rely on being able to
> >pass some hints to both the xfrm and NIC driver about whether or not
> >they're operating on a slave device.
> >
> >I've tested this atop an Intel x520 device (ixgbe) using libreswan in
> >transport mode, successfully achieving ~4.3Gbps throughput with netperf
> >(more or less identical to throughput on a bare NIC in this system),
> >as well as successful failover and recovery mid-netperf.
> >
> >v2: rebase on latest net-next and wrap with #ifdef CONFIG_XFRM_OFFLOAD
> >v3: add new CONFIG_BOND_XFRM_OFFLOAD option and fix shutdown path
> >
> >CC: Jay Vosburgh 
> >CC: Veaceslav Falico 
> >CC: Andy Gospodarek 
> >CC: "David S. Miller" 
> >CC: Jeff Kirsher 
> >CC: Jakub Kicinski 
> >CC: Steffen Klassert 
> >CC: Herbert Xu 
> >CC: net...@vger.kernel.org
> >CC: intel-wired-...@lists.osuosl.org
> >Signed-off-by: Jarod Wilson 
> >
> >Signed-off-by: Jarod Wilson 
> >---
> > drivers/net/Kconfig |  11 
> > drivers/net/bonding/bond_main.c | 111 +++-
> > include/net/bonding.h   |   3 +
> > 3 files changed, 122 insertions(+), 3 deletions(-)
> >
> >diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> >index c7d310ef1c83..938c4dd9bfb9 100644
> >--- a/drivers/net/Kconfig
> >+++ b/drivers/net/Kconfig
> >@@ -56,6 +56,17 @@ config BONDING
> > To compile this driver as a module, choose M here: the module
> > will be called bonding.
> >
> >+config BONDING_XFRM_OFFLOAD
> >+  bool "Bonding driver IPSec XFRM cryptography-offload pass-through support"
> >+  depends on BONDING
> >+  depends on XFRM_OFFLOAD
> >+  default y
> >+  select XFRM_ALGO
> >+  ---help---
> >+Enable support for IPSec offload pass-through in the bonding driver.
> >+Currently limited to active-backup mode only, and requires slave
> >+devices that support hardware crypto offload.
> >+
>
> Why is this a separate Kconfig option?  Is it reasonable to
> expect users to enable XFRM_OFFLOAD but not BONDING_XFRM_OFFLOAD?

I'd originally just wrapped it with XFRM_OFFLOAD, but in an
overabundance of caution, thought maybe gating it behind its own flag
was better. I didn't get any feedback on the initial posting, so I've
been sort of winging it. :)

> >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> >index a25c65d4af71..01b80cef492a 100644
> >--- a/drivers/net/bonding/bond_main.c
> >+++ b/drivers/net/bonding/bond_main.c
...
> >@@ -4560,6 +4663,8 @@ void bond_setup(struct net_device *bond_dev)
> >   NETIF_F_HW_VLAN_CTAG_FILTER;
> >
> >   bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL | NETIF_F_GSO_UDP_L4;
> >+  if ((BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP))
> >+  bond_dev->hw_features |= BOND_ENC_FEATURES;
>
> Why is adding the ESP features to hw_features (here, and added
> to BOND_ENC_FEATURES, above) not behind CONFIG_BONDING_XFRM_OFFLOAD?
>
> If adding these features makes sense regardless of the
> XFRM_OFFLOAD configuration, then shouldn't this change to feature
> handling be a separate patch?  The feature handling is complex, and is
> worth its own patch so it stands out in the log.

No, that would be an oversight by me. The build bot yelled at me on v1
about builds with XFRM_OFFLOAD not enabled, and I neglected to wrap
that bit too.

I'll do that in the next revision. I'm also fine with dropping the
extra kconfig and just using XFRM_OFFLOAD for all of it, if that's
sufficient.
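For illustration, the gating under discussion can be modeled in plain userspace C. This is a minimal sketch, not the bonding driver's code: the `CONFIG_BONDING_XFRM_OFFLOAD` macro is defined by hand, and the `FEAT_*` bits, `enum bond_mode`, and `bond_hw_features()` are hypothetical stand-ins for the real `NETIF_F_*` flags, bond modes, and `bond_setup()` logic. It shows the ESP feature bits being advertised only when the option is compiled in and the bond is in active-backup mode.

```c
#include <assert.h>
#include <stdint.h>

/* Hand-defined stand-in for the Kconfig symbol; in a kernel build it would
 * come from the generated config header. Remove this line to model a build
 * with the option disabled. */
#define CONFIG_BONDING_XFRM_OFFLOAD 1

/* Hypothetical feature bits, not the real NETIF_F_* values. */
#define FEAT_GSO_ENCAP_ALL (UINT32_C(1) << 0)
#define FEAT_HW_ESP        (UINT32_C(1) << 1)
#define FEAT_GSO_ESP       (UINT32_C(1) << 2)
#define FEAT_ESP_ALL       (FEAT_HW_ESP | FEAT_GSO_ESP)

static_assert((FEAT_ESP_ALL & FEAT_GSO_ENCAP_ALL) == 0,
	      "ESP bits must not overlap the base feature bits");

enum bond_mode { MODE_ROUNDROBIN = 0, MODE_ACTIVEBACKUP = 1 };

/* Base features are always set; ESP offload bits are added only when the
 * option is compiled in and the bond runs in active-backup mode. */
static uint32_t bond_hw_features(enum bond_mode mode)
{
	uint32_t features = FEAT_GSO_ENCAP_ALL;

#ifdef CONFIG_BONDING_XFRM_OFFLOAD
	if (mode == MODE_ACTIVEBACKUP)
		features |= FEAT_ESP_ALL;
#else
	(void)mode; /* unused when the option is compiled out */
#endif
	return features;
}
```

With everything behind one `#ifdef`, a build without the option never advertises ESP offload, which is the oversight the build bot flagged.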

-- 
Jarod Wilson
ja...@redhat.com

[PATCH net-next 1/4] xfrm: bail early on slave pass over skb

2020-06-08 Thread Jarod Wilson

This is prep work for initial bonding hardware encryption pass-through
support. The bonding driver will fill in the slave_dev
pointer, and we use that to know not to skb_push() again on a given
skb that was already processed on the bond device.
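The new early return in validate_xmit_xfrm() reduces to a single predicate on the device the skb is currently being sent on. Here is a minimal userspace sketch, assuming simplified stand-ins for struct net_device and struct xfrm_state_offload that keep only the fields involved. already_validated_on_master() is a hypothetical name for the condition, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; only the fields this check needs are modeled. */
struct net_device {
	const char *name;
};

struct xfrm_state_offload {
	struct net_device *dev;       /* device the SA was offloaded to (the bond) */
	struct net_device *slave_dev; /* lower device filled in by bonding */
};

/* True when the skb is being sent on a device other than xso.dev, and that
 * device is the slave bonding recorded: the master already validated it. */
static bool already_validated_on_master(const struct xfrm_state_offload *xso,
					const struct net_device *dev)
{
	assert(xso != NULL && dev != NULL);
	return xso->dev != dev && xso->slave_dev == dev;
}
```

A packet rerouted to some unrelated device still fails the check and falls through to the existing GSO fixup path.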

CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: Jeff Kirsher 
CC: Jakub Kicinski 
CC: Steffen Klassert 
CC: Herbert Xu 
CC: net...@vger.kernel.org
CC: intel-wired-...@lists.osuosl.org
Signed-off-by: Jarod Wilson 
---
 include/net/xfrm.h |  1 +
 net/xfrm/xfrm_device.c | 34 +-
 2 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 094fe682f5d7..e20b2b27ec48 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -127,6 +127,7 @@ struct xfrm_state_walk {
 
 struct xfrm_state_offload {
struct net_device   *dev;
+   struct net_device   *slave_dev;
unsigned long   offload_handle;
unsigned intnum_exthdrs;
u8  flags;
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index f50d1f97cf8e..b8918fc5248b 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -106,6 +106,7 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur
struct sk_buff *skb2, *nskb, *pskb = NULL;
netdev_features_t esp_features = features;
struct xfrm_offload *xo = xfrm_offload(skb);
+   struct net_device *dev = skb->dev;
struct sec_path *sp;
 
if (!xo)
@@ -119,6 +120,10 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur
if (xo->flags & XFRM_GRO || x->xso.flags & XFRM_OFFLOAD_INBOUND)
return skb;
 
+   /* This skb was already validated on the master dev */
+   if ((x->xso.dev != dev) && (x->xso.slave_dev == dev))
+   return skb;
+
local_irq_save(flags);
sd = this_cpu_ptr(&softnet_data);
err = !skb_queue_empty(&sd->xfrm_backlog);
@@ -129,25 +134,20 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur
return skb;
}
 
-   if (skb_is_gso(skb)) {
-   struct net_device *dev = skb->dev;
-
-   if (unlikely(x->xso.dev != dev)) {
-   struct sk_buff *segs;
+   if (skb_is_gso(skb) && unlikely(x->xso.dev != dev)) {
+   struct sk_buff *segs;
 
-   /* Packet got rerouted, fixup features and segment it. */
-   esp_features = esp_features & ~(NETIF_F_HW_ESP
-   | NETIF_F_GSO_ESP);
+   /* Packet got rerouted, fixup features and segment it. */
+   esp_features = esp_features & ~(NETIF_F_HW_ESP | NETIF_F_GSO_ESP);
 
-   segs = skb_gso_segment(skb, esp_features);
-   if (IS_ERR(segs)) {
-   kfree_skb(skb);
-   atomic_long_inc(&dev->tx_dropped);
-   return NULL;
-   } else {
-   consume_skb(skb);
-   skb = segs;
-   }
+   segs = skb_gso_segment(skb, esp_features);
+   if (IS_ERR(segs)) {
+   kfree_skb(skb);
+   atomic_long_inc(&dev->tx_dropped);
+   return NULL;
+   } else {
+   consume_skb(skb);
+   skb = segs;
}
}
 
-- 
2.20.1

[PATCH net-next 2/4] ixgbe_ipsec: become aware of when running as a bonding slave

2020-06-08 Thread Jarod Wilson

Slave devices in a bond doing hardware encryption also need to be aware
that they're slaves, so we operate on the slave instead of the bonding
master to do the actual hardware encryption offload bits.
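Each ixgbe callback below repeats the same two-line substitution before calling netdev_priv(). A small helper could centralize that choice. This is a minimal sketch with stand-in structs; xso_hw_dev() is a hypothetical name that does not appear in the patch or the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins with only the fields the helper reads. */
struct net_device {
	const char *name;
};

struct xfrm_state_offload {
	struct net_device *dev;       /* upper device (the bond, or the NIC itself) */
	struct net_device *slave_dev; /* set by bonding when running as a slave */
};

/* Return the device whose hardware should be programmed: the slave when
 * bonding filled one in, otherwise the device the SA was offloaded to. */
static struct net_device *xso_hw_dev(const struct xfrm_state_offload *xso)
{
	assert(xso != NULL);
	return xso->slave_dev ? xso->slave_dev : xso->dev;
}
```

Each callback would then start from `dev = xso_hw_dev(&xs->xso);`, so the choice of device cannot drift between add, delete, and the management-IP check.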

CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: Jeff Kirsher 
CC: Jakub Kicinski 
CC: Steffen Klassert 
CC: Herbert Xu 
CC: net...@vger.kernel.org
CC: intel-wired-...@lists.osuosl.org
Signed-off-by: Jarod Wilson 
---
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c| 39 +++
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 113f6087c7c9..26b0a58a064d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -432,6 +432,9 @@ static int ixgbe_ipsec_parse_proto_keys(struct xfrm_state *xs,
char *alg_name = NULL;
int key_len;
 
+   if (xs->xso.slave_dev)
+   dev = xs->xso.slave_dev;
+
if (!xs->aead) {
netdev_err(dev, "Unsupported IPsec algorithm\n");
return -EINVAL;
@@ -478,8 +481,8 @@ static int ixgbe_ipsec_parse_proto_keys(struct xfrm_state *xs,
 static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs)
 {
struct net_device *dev = xs->xso.dev;
-   struct ixgbe_adapter *adapter = netdev_priv(dev);
-   struct ixgbe_hw *hw = &adapter->hw;
+   struct ixgbe_adapter *adapter;
+   struct ixgbe_hw *hw;
u32 mfval, manc, reg;
int num_filters = 4;
bool manc_ipv4;
@@ -497,6 +500,12 @@ static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs)
 #define BMCIP_V6 0x3
 #define BMCIP_MASK   0x3
 
+   if (xs->xso.slave_dev)
+   dev = xs->xso.slave_dev;
+
+   adapter = netdev_priv(dev);
+   hw = &adapter->hw;
+
manc = IXGBE_READ_REG(hw, IXGBE_MANC);
manc_ipv4 = !!(manc & MANC_EN_IPV4_FILTER);
mfval = IXGBE_READ_REG(hw, IXGBE_MFVAL);
@@ -561,14 +570,21 @@ static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs)
 static int ixgbe_ipsec_add_sa(struct xfrm_state *xs)
 {
struct net_device *dev = xs->xso.dev;
-   struct ixgbe_adapter *adapter = netdev_priv(dev);
-   struct ixgbe_ipsec *ipsec = adapter->ipsec;
-   struct ixgbe_hw *hw = &adapter->hw;
+   struct ixgbe_adapter *adapter;
+   struct ixgbe_ipsec *ipsec;
+   struct ixgbe_hw *hw;
int checked, match, first;
u16 sa_idx;
int ret;
int i;
 
+   if (xs->xso.slave_dev)
+   dev = xs->xso.slave_dev;
+
+   adapter = netdev_priv(dev);
+   ipsec = adapter->ipsec;
+   hw = &adapter->hw;
+
if (xs->id.proto != IPPROTO_ESP && xs->id.proto != IPPROTO_AH) {
netdev_err(dev, "Unsupported protocol 0x%04x for ipsec offload\n",
   xs->id.proto);
@@ -746,12 +762,19 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs)
 static void ixgbe_ipsec_del_sa(struct xfrm_state *xs)
 {
struct net_device *dev = xs->xso.dev;
-   struct ixgbe_adapter *adapter = netdev_priv(dev);
-   struct ixgbe_ipsec *ipsec = adapter->ipsec;
-   struct ixgbe_hw *hw = &adapter->hw;
+   struct ixgbe_adapter *adapter;
+   struct ixgbe_ipsec *ipsec;
+   struct ixgbe_hw *hw;
u32 zerobuf[4] = {0, 0, 0, 0};
u16 sa_idx;
 
+   if (xs->xso.slave_dev)
+   dev = xs->xso.slave_dev;
+
+   adapter = netdev_priv(dev);
+   ipsec = adapter->ipsec;
+   hw = &adapter->hw;
+
if (xs->xso.flags & XFRM_OFFLOAD_INBOUND) {
struct rx_sa *rsa;
u8 ipi;
-- 
2.20.1

[PATCH net-next 4/4] mlx5: become aware of when running as a bonding slave

2020-06-08 Thread Jarod Wilson

I've been unable to get my hands on suitable supported hardware to date,
but I believe this ought to be all that is needed to enable the mlx5
driver to also work with bonding active-backup crypto offload pass-through.

CC: Boris Pismenny 
CC: Saeed Mahameed 
CC: Leon Romanovsky 
CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: Jeff Kirsher 
CC: Jakub Kicinski 
CC: Steffen Klassert 
CC: Herbert Xu 
CC: net...@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
index 92eb3bad4acd..72ad6664bd73 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
@@ -210,6 +210,9 @@ static inline int mlx5e_xfrm_validate_state(struct xfrm_state *x)
struct net_device *netdev = x->xso.dev;
struct mlx5e_priv *priv;
 
+   if (x->xso.slave_dev)
+   netdev = x->xso.slave_dev;
+
priv = netdev_priv(netdev);
 
if (x->props.aalgo != SADB_AALG_NONE) {
@@ -291,6 +294,9 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x)
unsigned int sa_handle;
int err;
 
+   if (x->xso.slave_dev)
+   netdev = x->xso.slave_dev;
+
priv = netdev_priv(netdev);
 
err = mlx5e_xfrm_validate_state(x);
-- 
2.20.1

[PATCH net-next 3/4] bonding: support hardware encryption offload to slaves

2020-06-08 Thread Jarod Wilson

Currently, this support is limited to active-backup mode, as I'm not sure
about the feasibility of mapping an xfrm_state's offload handle to
multiple hardware devices simultaneously, and we rely on being able to
pass some hints to both the xfrm and NIC driver about whether or not
they're operating on a slave device.

I've tested this atop an Intel x520 device (ixgbe) using libreswan in
transport mode, successfully achieving ~4.3Gbps throughput with netperf
(more or less identical to throughput on a bare NIC in this system),
as well as successful failover and recovery mid-netperf.
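In the add path, the SA is forwarded to whichever slave is currently active. The sketch below models that in userspace with cut-down, hypothetical versions of struct bonding, struct slave, struct net_device, and struct xfrmdev_ops. bond_add_sa_sketch() and demo_slave_state_add() are illustrative names. It differs from the posted bond_ipsec_add_sa() in two deliberate ways. First, it returns an error when there is no active slave instead of dereferencing a NULL pointer, mirroring the check bond_ipsec_del_sa() already has (-ENODEV is an assumed choice). Second, it records slave_dev only after the slave's ops have been checked.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct xfrm_state;

/* Stand-in for the subset of struct xfrmdev_ops used here. */
struct xfrmdev_ops {
	int (*xdo_dev_state_add)(struct xfrm_state *xs);
};

struct net_device {
	const char *name;
	const struct xfrmdev_ops *xfrmdev_ops;
};

struct slave {
	struct net_device *dev;
};

struct bonding {
	struct slave *curr_active_slave; /* read via rtnl_dereference() in the kernel */
};

struct xfrm_state {
	struct {
		struct net_device *dev;       /* the bond device */
		struct net_device *slave_dev; /* filled in on success */
	} xso;
};

/* Hypothetical slave driver callback; a real driver would program its
 * hardware through xs->xso.slave_dev here. */
static int demo_slave_state_add(struct xfrm_state *xs)
{
	return xs->xso.slave_dev ? 0 : -EINVAL;
}

/* Forward SA programming to the active slave, if it supports offload. */
static int bond_add_sa_sketch(struct bonding *bond, struct xfrm_state *xs)
{
	struct slave *slave;
	const struct xfrmdev_ops *ops;

	assert(bond != NULL && xs != NULL);
	slave = bond->curr_active_slave;
	if (!slave)
		return -ENODEV;

	ops = slave->dev->xfrmdev_ops;
	if (!ops || !ops->xdo_dev_state_add)
		return -EINVAL;

	xs->xso.slave_dev = slave->dev;
	return ops->xdo_dev_state_add(xs);
}
```

Only one slave_dev can be recorded per SA, which is also why this approach fits active-backup and not modes that transmit on several slaves at once.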

v2: rebase on latest net-next and wrap with #ifdef CONFIG_XFRM_OFFLOAD
v3: add new CONFIG_BONDING_XFRM_OFFLOAD option and fix shutdown path

CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: Jeff Kirsher 
CC: Jakub Kicinski 
CC: Steffen Klassert 
CC: Herbert Xu 
CC: net...@vger.kernel.org
CC: intel-wired-...@lists.osuosl.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/Kconfig |  11 
 drivers/net/bonding/bond_main.c | 111 +++-
 include/net/bonding.h   |   3 +
 3 files changed, 122 insertions(+), 3 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index c7d310ef1c83..938c4dd9bfb9 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -56,6 +56,17 @@ config BONDING
  To compile this driver as a module, choose M here: the module
  will be called bonding.
 
+config BONDING_XFRM_OFFLOAD
+   bool "Bonding driver IPSec XFRM cryptography-offload pass-through support"
+   depends on BONDING
+   depends on XFRM_OFFLOAD
+   default y
+   select XFRM_ALGO
+   ---help---
+ Enable support for IPSec offload pass-through in the bonding driver.
+ Currently limited to active-backup mode only, and requires slave
+ devices that support hardware crypto offload.
+
 config DUMMY
tristate "Dummy net driver support"
---help---
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index a25c65d4af71..01b80cef492a 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -79,6 +79,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -278,8 +279,6 @@ const char *bond_mode_name(int mode)
return names[mode];
 }
 
-/*-- VLAN ---*/
-
 /**
  * bond_dev_queue_xmit - Prepare skb for xmit.
  *
@@ -302,6 +301,8 @@ netdev_tx_t bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb,
return dev_queue_xmit(skb);
 }
 
+/*-- VLAN ---*/
+
 /* In the following 2 functions, bond_vlan_rx_add_vid and bond_vlan_rx_kill_vid,
  * We don't protect the slave list iteration with a lock because:
  * a. This operation is performed in IOCTL context,
@@ -372,6 +373,84 @@ static int bond_vlan_rx_kill_vid(struct net_device *bond_dev,
return 0;
 }
 
+/*-- XFRM ---*/
+
+#ifdef CONFIG_BONDING_XFRM_OFFLOAD
+/**
+ * bond_ipsec_add_sa - program device with a security association
+ * @xs: pointer to transformer state struct
+ **/
+static int bond_ipsec_add_sa(struct xfrm_state *xs)
+{
+   struct net_device *bond_dev = xs->xso.dev;
+   struct bonding *bond = netdev_priv(bond_dev);
+   struct slave *slave = rtnl_dereference(bond->curr_active_slave);
+
+   xs->xso.slave_dev = slave->dev;
+   bond->xs = xs;
+
+   if (!(slave->dev->xfrmdev_ops
+ && slave->dev->xfrmdev_ops->xdo_dev_state_add)) {
+   slave_warn(bond_dev, slave->dev, "Slave does not support ipsec offload\n");
+   return -EINVAL;
+   }
+
+   return slave->dev->xfrmdev_ops->xdo_dev_state_add(xs);
+}
+
+/**
+ * bond_ipsec_del_sa - clear out this specific SA
+ * @xs: pointer to transformer state struct
+ **/
+static void bond_ipsec_del_sa(struct xfrm_state *xs)
+{
+   struct net_device *bond_dev = xs->xso.dev;
+   struct bonding *bond = netdev_priv(bond_dev);
+   struct slave *slave = rtnl_dereference(bond->curr_active_slave);
+
+   if (!slave)
+   return;
+
+   xs->xso.slave_dev = slave->dev;
+
+   if (!(slave->dev->xfrmdev_ops
+ && slave->dev->xfrmdev_ops->xdo_dev_state_delete)) {
+   slave_warn(bond_dev, slave->dev, "%s: no slave xdo_dev_state_delete\n", __func__);
+   return;
+   }
+
+   slave->dev->xfrmdev_ops->xdo_dev_state_delete(xs);
+}
+
+/**
+ * bond_ipsec_offload_ok - can this packet use the xfrm hw offload
+ * @skb: current data packet
+ * @xs: pointer to transformer state st

[PATCH net-next 0/4] bonding: initial support for hardware crypto offload

2020-06-08 Thread Jarod Wilson

This is an initial functional implementation for doing pass-through of
hardware encryption from bonding device to capable slaves, in active-backup
bond setups. This was developed and tested using ixgbe-driven Intel x520
interfaces with libreswan and a transport mode connection, primarily using
netperf, with assorted connection failures forced during transmission. The
failover works quite well in my testing, and overall performance is right
on par with offload when running on a bare interface, no bond involved.

Caveats: this is ONLY enabled for active-backup, because I'm not sure
how one would manage multiple offload handles for different devices all
running at the same time in the same xfrm. It also relies on some minor
changes to both the xfrm code and the slave device driver code to get
things to behave. I don't have immediate access to any other hardware
that could function similarly, but the NIC driver changes are minimal and
straightforward enough that I've included what I think ought to be
enough for mlx5 devices too.

Earlier RFC submissions of this set didn't get any feedback, other than
from the build bot, so I'm hoping silence means nobody hated it...

Jarod Wilson (4):
  xfrm: bail early on slave pass over skb
  ixgbe_ipsec: become aware of when running as a bonding slave
  bonding: support hardware encryption offload to slaves
  mlx5: support crypto offload as a bonding slave

 drivers/net/Kconfig   |  11 ++
 drivers/net/bonding/bond_main.c   | 111 +-
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c|  39 --
 .../mellanox/mlx5/core/en_accel/ipsec.c   |   6 +
 include/net/bonding.h |   3 +
 include/net/xfrm.h|   1 +
 net/xfrm/xfrm_device.c|  34 +++---
 7 files changed, 177 insertions(+), 28 deletions(-)

-- 
2.20.1

[RFC PATCH net-next v2 1/3] xfrm: bail early on slave pass over skb

2020-05-05 Thread Jarod Wilson

This is prep work for initial bonding hardware encryption pass-through
support. The bonding driver will fill in the slave_dev
pointer, and we use that to know not to skb_push() again on a given
skb that was already processed on the bond device.

CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: Jeff Kirsher 
CC: Jakub Kicinski 
CC: Steffen Klassert 
CC: Herbert Xu 
CC: net...@vger.kernel.org
CC: intel-wired-...@lists.osuosl.org
Signed-off-by: Jarod Wilson 
---
 include/net/xfrm.h |  1 +
 net/xfrm/xfrm_device.c | 34 +-
 2 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 8f71c111e65a..a6ec341cd9f0 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -127,6 +127,7 @@ struct xfrm_state_walk {
 
 struct xfrm_state_offload {
struct net_device   *dev;
+   struct net_device   *slave_dev;
unsigned long   offload_handle;
unsigned intnum_exthdrs;
u8  flags;
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 6cc7f7f1dd68..1cd31dcf59da 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -108,6 +108,7 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur
struct sk_buff *skb2, *nskb, *pskb = NULL;
netdev_features_t esp_features = features;
struct xfrm_offload *xo = xfrm_offload(skb);
+   struct net_device *dev = skb->dev;
struct sec_path *sp;
 
if (!xo)
@@ -121,6 +122,10 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur
if (xo->flags & XFRM_GRO || x->xso.flags & XFRM_OFFLOAD_INBOUND)
return skb;
 
+   /* This skb was already validated on the master dev */
+   if ((x->xso.dev != dev) && (x->xso.slave_dev == dev))
+   return skb;
+
local_irq_save(flags);
sd = this_cpu_ptr(&softnet_data);
err = !skb_queue_empty(&sd->xfrm_backlog);
@@ -131,25 +136,20 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur
return skb;
}
 
-   if (skb_is_gso(skb)) {
-   struct net_device *dev = skb->dev;
-
-   if (unlikely(x->xso.dev != dev)) {
-   struct sk_buff *segs;
+   if (skb_is_gso(skb) && unlikely(x->xso.dev != dev)) {
+   struct sk_buff *segs;
 
-   /* Packet got rerouted, fixup features and segment it. */
-   esp_features = esp_features & ~(NETIF_F_HW_ESP
-   | NETIF_F_GSO_ESP);
+   /* Packet got rerouted, fixup features and segment it. */
+   esp_features = esp_features & ~(NETIF_F_HW_ESP | NETIF_F_GSO_ESP);
 
-   segs = skb_gso_segment(skb, esp_features);
-   if (IS_ERR(segs)) {
-   kfree_skb(skb);
-   atomic_long_inc(&dev->tx_dropped);
-   return NULL;
-   } else {
-   consume_skb(skb);
-   skb = segs;
-   }
+   segs = skb_gso_segment(skb, esp_features);
+   if (IS_ERR(segs)) {
+   kfree_skb(skb);
+   atomic_long_inc(&dev->tx_dropped);
+   return NULL;
+   } else {
+   consume_skb(skb);
+   skb = segs;
}
}
 
-- 
2.20.1

[RFC PATCH net-next v2 2/3] ixgbe_ipsec: become aware of when running as a bonding slave

2020-05-05 Thread Jarod Wilson

Slave devices in a bond doing hardware encryption also need to be aware
that they're slaves, so we operate on the slave instead of the bonding
master to do the actual hardware encryption offload bits.

CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: Jeff Kirsher 
CC: Jakub Kicinski 
CC: Steffen Klassert 
CC: Herbert Xu 
CC: net...@vger.kernel.org
CC: intel-wired-...@lists.osuosl.org
Signed-off-by: Jarod Wilson 
---
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c| 39 +++
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 113f6087c7c9..26b0a58a064d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -432,6 +432,9 @@ static int ixgbe_ipsec_parse_proto_keys(struct xfrm_state *xs,
char *alg_name = NULL;
int key_len;
 
+   if (xs->xso.slave_dev)
+   dev = xs->xso.slave_dev;
+
if (!xs->aead) {
netdev_err(dev, "Unsupported IPsec algorithm\n");
return -EINVAL;
@@ -478,8 +481,8 @@ static int ixgbe_ipsec_parse_proto_keys(struct xfrm_state *xs,
 static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs)
 {
struct net_device *dev = xs->xso.dev;
-   struct ixgbe_adapter *adapter = netdev_priv(dev);
-   struct ixgbe_hw *hw = &adapter->hw;
+   struct ixgbe_adapter *adapter;
+   struct ixgbe_hw *hw;
u32 mfval, manc, reg;
int num_filters = 4;
bool manc_ipv4;
@@ -497,6 +500,12 @@ static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs)
 #define BMCIP_V6 0x3
 #define BMCIP_MASK   0x3
 
+   if (xs->xso.slave_dev)
+   dev = xs->xso.slave_dev;
+
+   adapter = netdev_priv(dev);
+   hw = &adapter->hw;
+
manc = IXGBE_READ_REG(hw, IXGBE_MANC);
manc_ipv4 = !!(manc & MANC_EN_IPV4_FILTER);
mfval = IXGBE_READ_REG(hw, IXGBE_MFVAL);
@@ -561,14 +570,21 @@ static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs)
 static int ixgbe_ipsec_add_sa(struct xfrm_state *xs)
 {
struct net_device *dev = xs->xso.dev;
-   struct ixgbe_adapter *adapter = netdev_priv(dev);
-   struct ixgbe_ipsec *ipsec = adapter->ipsec;
-   struct ixgbe_hw *hw = &adapter->hw;
+   struct ixgbe_adapter *adapter;
+   struct ixgbe_ipsec *ipsec;
+   struct ixgbe_hw *hw;
int checked, match, first;
u16 sa_idx;
int ret;
int i;
 
+   if (xs->xso.slave_dev)
+   dev = xs->xso.slave_dev;
+
+   adapter = netdev_priv(dev);
+   ipsec = adapter->ipsec;
+   hw = &adapter->hw;
+
if (xs->id.proto != IPPROTO_ESP && xs->id.proto != IPPROTO_AH) {
netdev_err(dev, "Unsupported protocol 0x%04x for ipsec offload\n",
   xs->id.proto);
@@ -746,12 +762,19 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs)
 static void ixgbe_ipsec_del_sa(struct xfrm_state *xs)
 {
struct net_device *dev = xs->xso.dev;
-   struct ixgbe_adapter *adapter = netdev_priv(dev);
-   struct ixgbe_ipsec *ipsec = adapter->ipsec;
-   struct ixgbe_hw *hw = &adapter->hw;
+   struct ixgbe_adapter *adapter;
+   struct ixgbe_ipsec *ipsec;
+   struct ixgbe_hw *hw;
u32 zerobuf[4] = {0, 0, 0, 0};
u16 sa_idx;
 
+   if (xs->xso.slave_dev)
+   dev = xs->xso.slave_dev;
+
+   adapter = netdev_priv(dev);
+   ipsec = adapter->ipsec;
+   hw = &adapter->hw;
+
if (xs->xso.flags & XFRM_OFFLOAD_INBOUND) {
struct rx_sa *rsa;
u8 ipi;
-- 
2.20.1

[RFC PATCH net-next v2 3/3] bonding: support hardware encryption offload to slaves

2020-05-05 Thread Jarod Wilson

Currently, this support is limited to active-backup mode, as I'm not sure
about the feasibility of mapping an xfrm_state's offload handle to
multiple hardware devices simultaneously, and we rely on being able to
pass some hints to both the xfrm and NIC driver about whether or not
they're operating on a slave device.

I've tested this atop an Intel x520 device (ixgbe) using libreswan in
transport mode, successfully achieving ~4.3Gbps throughput with netperf
(more or less identical to throughput on a bare NIC in this system),
as well as successful failover and recovery mid-netperf.

v2: rebase on latest net-next and wrap with #ifdef CONFIG_XFRM_OFFLOAD

CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: Jeff Kirsher 
CC: Jakub Kicinski 
CC: Steffen Klassert 
CC: Herbert Xu 
CC: net...@vger.kernel.org
CC: intel-wired-...@lists.osuosl.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/bonding/bond_main.c | 111 +++-
 include/net/bonding.h   |   3 +
 2 files changed, 111 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index baa93191dfdd..b90a86029df5 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -79,6 +79,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -278,8 +279,6 @@ const char *bond_mode_name(int mode)
return names[mode];
 }
 
-/*-- VLAN ---*/
-
 /**
  * bond_dev_queue_xmit - Prepare skb for xmit.
  *
@@ -302,6 +301,8 @@ void bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb,
dev_queue_xmit(skb);
 }
 
+/*-- VLAN ---*/
+
 /* In the following 2 functions, bond_vlan_rx_add_vid and bond_vlan_rx_kill_vid,
  * We don't protect the slave list iteration with a lock because:
  * a. This operation is performed in IOCTL context,
@@ -372,6 +373,84 @@ static int bond_vlan_rx_kill_vid(struct net_device *bond_dev,
return 0;
 }
 
+/*-- XFRM ---*/
+
+#ifdef CONFIG_XFRM_OFFLOAD
+/**
+ * bond_ipsec_add_sa - program device with a security association
+ * @xs: pointer to transformer state struct
+ **/
+static int bond_ipsec_add_sa(struct xfrm_state *xs)
+{
+   struct net_device *bond_dev = xs->xso.dev;
+   struct bonding *bond = netdev_priv(bond_dev);
+   struct slave *slave = rtnl_dereference(bond->curr_active_slave);
+
+   xs->xso.slave_dev = slave->dev;
+   bond->xs = xs;
+
+   if (!(slave->dev->xfrmdev_ops
+ && slave->dev->xfrmdev_ops->xdo_dev_state_add)) {
+   slave_warn(bond_dev, slave->dev, "Slave does not support ipsec offload\n");
+   return -EINVAL;
+   }
+
+   return slave->dev->xfrmdev_ops->xdo_dev_state_add(xs);
+}
+
+/**
+ * bond_ipsec_del_sa - clear out this specific SA
+ * @xs: pointer to transformer state struct
+ **/
+static void bond_ipsec_del_sa(struct xfrm_state *xs)
+{
+   struct net_device *bond_dev = xs->xso.dev;
+   struct bonding *bond = netdev_priv(bond_dev);
+   struct slave *slave = rtnl_dereference(bond->curr_active_slave);
+
+   if (!slave)
+   return;
+
+   xs->xso.slave_dev = slave->dev;
+
+   if (!(slave->dev->xfrmdev_ops
+ && slave->dev->xfrmdev_ops->xdo_dev_state_delete)) {
+   slave_warn(bond_dev, slave->dev, "%s: no slave xdo_dev_state_delete\n", __func__);
+   return;
+   }
+
+   slave->dev->xfrmdev_ops->xdo_dev_state_delete(xs);
+}
+
+/**
+ * bond_ipsec_offload_ok - can this packet use the xfrm hw offload
+ * @skb: current data packet
+ * @xs: pointer to transformer state struct
+ **/
+static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
+{
+   struct net_device *bond_dev = xs->xso.dev;
+   struct bonding *bond = netdev_priv(bond_dev);
+   struct slave *curr_active = rtnl_dereference(bond->curr_active_slave);
+   struct net_device *slave_dev = curr_active->dev;
+
+   if (!(slave_dev->xfrmdev_ops
+ && slave_dev->xfrmdev_ops->xdo_dev_offload_ok)) {
+   slave_warn(bond_dev, slave_dev, "%s: no slave xdo_dev_offload_ok\n", __func__);
+   return false;
+   }
+
+   xs->xso.slave_dev = slave_dev;
+   return slave_dev->xfrmdev_ops->xdo_dev_offload_ok(skb, xs);
+}
+
+static const struct xfrmdev_ops bond_xfrmdev_ops = {
+   .xdo_dev_state_add = bond_ipsec_add_sa,
+   .xdo_dev_state_delete = bond_ipsec_del_sa,
+   .xdo_dev_offload_ok = bond_ipsec_offload_ok,
+};
+#endif /* CONFIG_

[RFC PATCH net-next v2 0/3] bonding: support hardware crypto offload

2020-05-05 Thread Jarod Wilson

This is an initial "proof of concept" functional implementation for doing
pass-through of hardware encryption from bonding device to capable slaves.
This was tested using an ixgbe-driven Intel x520 NIC with libreswan and a
transport mode connection, on top of an active-backup bond, using netperf
while downing an interface mid-run. Failover takes a moment, but does work,
and overall performance is right on par with offload when running on a
bare interface.

Caveats: this is ONLY enabled for active-backup, because I'm not sure
how one would manage multiple offload handles for different devices all
running at the same time in the same xfrm. It also relies on some minor
changes to both the xfrm code and the slave device driver code to get
things to behave, and I don't have immediate access to any other hardware
that could function similarly to update driver code accordingly.

I'm hoping folks with more of an idea about xfrm have some thoughts on
ways to make this cleaner, and possibly support more bonding modes, but
I'm reasonably happy I've made it this far. :)

v2: fix build with CONFIG_XFRM_OFFLOAD disabled and rebase on latest
net-next tree bonding changes

CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: Jeff Kirsher 
CC: Jakub Kicinski 
CC: Steffen Klassert 
CC: Herbert Xu 
CC: net...@vger.kernel.org
CC: intel-wired-...@lists.osuosl.org

Jarod Wilson (3):
  xfrm: bail early on slave pass over skb
  ixgbe_ipsec: become aware of when running as a bonding slave
  bonding: support hardware encryption offload to slaves

 drivers/net/bonding/bond_main.c   | 111 +-
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c|  39 --
 include/net/bonding.h |   3 +
 include/net/xfrm.h|   1 +
 net/xfrm/xfrm_device.c|  34 +++---
 5 files changed, 160 insertions(+), 28 deletions(-)

-- 
2.20.1

[RFC PATCH net-next 3/3] bonding: support hardware encryption offload to slaves

2020-05-04 Thread Jarod Wilson

Currently, this support is limited to active-backup mode, as I'm not sure
about the feasibility of mapping an xfrm_state's offload handle to
multiple hardware devices simultaneously, and we rely on being able to
pass some hints to both the xfrm and NIC driver about whether or not
they're operating on a slave device.

I've tested this atop an Intel x520 device (ixgbe) using libreswan in
transport mode, successfully achieving ~4.3Gbps throughput with netperf
(more or less identical to throughput on a bare NIC in this system),
as well as successful failover and recovery mid-netperf.

CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: Jeff Kirsher 
CC: Jakub Kicinski 
CC: Steffen Klassert 
CC: Herbert Xu 
CC: net...@vger.kernel.org
CC: intel-wired-...@lists.osuosl.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/bonding/bond_main.c | 103 +++-
 include/net/bonding.h   |   1 +
 2 files changed, 101 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 2e70e43c5df5..781da5beb484 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -79,6 +79,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -278,8 +279,6 @@ const char *bond_mode_name(int mode)
return names[mode];
 }
 
-/*-- VLAN ---*/
-
 /**
  * bond_dev_queue_xmit - Prepare skb for xmit.
  *
@@ -302,6 +301,8 @@ void bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb,
dev_queue_xmit(skb);
 }
 
+/*-- VLAN ---*/
+
 /* In the following 2 functions, bond_vlan_rx_add_vid and bond_vlan_rx_kill_vid,
  * We don't protect the slave list iteration with a lock because:
  * a. This operation is performed in IOCTL context,
@@ -372,6 +373,82 @@ static int bond_vlan_rx_kill_vid(struct net_device *bond_dev,
return 0;
 }
 
+/*-- XFRM ---*/
+
+/**
+ * bond_ipsec_add_sa - program device with a security association
+ * @xs: pointer to transformer state struct
+ **/
+static int bond_ipsec_add_sa(struct xfrm_state *xs)
+{
+   struct net_device *bond_dev = xs->xso.dev;
+   struct bonding *bond = netdev_priv(bond_dev);
+   struct slave *slave = rtnl_dereference(bond->curr_active_slave);
+
+   xs->xso.slave_dev = slave->dev;
+   bond->xs = xs;
+
+   if (!(slave->dev->xfrmdev_ops
+ && slave->dev->xfrmdev_ops->xdo_dev_state_add)) {
+   slave_warn(bond_dev, slave->dev, "Slave does not support ipsec offload\n");
+   return -EINVAL;
+   }
+
+   return slave->dev->xfrmdev_ops->xdo_dev_state_add(xs);
+}
+
+/**
+ * bond_ipsec_del_sa - clear out this specific SA
+ * @xs: pointer to transformer state struct
+ **/
+static void bond_ipsec_del_sa(struct xfrm_state *xs)
+{
+   struct net_device *bond_dev = xs->xso.dev;
+   struct bonding *bond = netdev_priv(bond_dev);
+   struct slave *slave = rtnl_dereference(bond->curr_active_slave);
+
+   if (!slave)
+   return;
+
+   xs->xso.slave_dev = slave->dev;
+
+   if (!(slave->dev->xfrmdev_ops
+ && slave->dev->xfrmdev_ops->xdo_dev_state_delete)) {
+   slave_warn(bond_dev, slave->dev, "%s: no slave xdo_dev_state_delete\n", __func__);
+   return;
+   }
+
+   slave->dev->xfrmdev_ops->xdo_dev_state_delete(xs);
+}
+
+/**
+ * bond_ipsec_offload_ok - can this packet use the xfrm hw offload
+ * @skb: current data packet
+ * @xs: pointer to transformer state struct
+ **/
+static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
+{
+   struct net_device *bond_dev = xs->xso.dev;
+   struct bonding *bond = netdev_priv(bond_dev);
+   struct slave *curr_active = rtnl_dereference(bond->curr_active_slave);
+   struct net_device *slave_dev = curr_active->dev;
+
+   if (!(slave_dev->xfrmdev_ops
+ && slave_dev->xfrmdev_ops->xdo_dev_offload_ok)) {
+   slave_warn(bond_dev, slave_dev, "%s: no slave xdo_dev_offload_ok\n", __func__);
+   return false;
+   }
+
+   xs->xso.slave_dev = slave_dev;
+   return slave_dev->xfrmdev_ops->xdo_dev_offload_ok(skb, xs);
+}
+
+static const struct xfrmdev_ops bond_xfrmdev_ops = {
+   .xdo_dev_state_add = bond_ipsec_add_sa,
+   .xdo_dev_state_delete = bond_ipsec_del_sa,
+   .xdo_dev_offload_ok = bond_ipsec_offload_ok,
+};
+
 /*--- Link status ---*/
 
 /* Set the carrier state for the master according to the state

[RFC PATCH net-next 2/3] ixgbe_ipsec: become aware of when running as a bonding slave

2020-05-04 Thread Jarod Wilson

Slave devices in a bond doing hardware encryption also need to be aware
that they're slaves, so that the actual hardware encryption offload
operations are performed on the slave device rather than on the bonding
master.

CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: Jeff Kirsher 
CC: Jakub Kicinski 
CC: Steffen Klassert 
CC: Herbert Xu 
CC: net...@vger.kernel.org
CC: intel-wired-...@lists.osuosl.org
Signed-off-by: Jarod Wilson 
---
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c| 39 +++
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 113f6087c7c9..26b0a58a064d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -432,6 +432,9 @@ static int ixgbe_ipsec_parse_proto_keys(struct xfrm_state *xs,
char *alg_name = NULL;
int key_len;
 
+   if (xs->xso.slave_dev)
+   dev = xs->xso.slave_dev;
+
if (!xs->aead) {
netdev_err(dev, "Unsupported IPsec algorithm\n");
return -EINVAL;
@@ -478,8 +481,8 @@ static int ixgbe_ipsec_parse_proto_keys(struct xfrm_state *xs,
 static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs)
 {
struct net_device *dev = xs->xso.dev;
-   struct ixgbe_adapter *adapter = netdev_priv(dev);
-   struct ixgbe_hw *hw = &adapter->hw;
+   struct ixgbe_adapter *adapter;
+   struct ixgbe_hw *hw;
u32 mfval, manc, reg;
int num_filters = 4;
bool manc_ipv4;
@@ -497,6 +500,12 @@ static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs)
 #define BMCIP_V6 0x3
 #define BMCIP_MASK   0x3
 
+   if (xs->xso.slave_dev)
+   dev = xs->xso.slave_dev;
+
+   adapter = netdev_priv(dev);
+   hw = &adapter->hw;
+
manc = IXGBE_READ_REG(hw, IXGBE_MANC);
manc_ipv4 = !!(manc & MANC_EN_IPV4_FILTER);
mfval = IXGBE_READ_REG(hw, IXGBE_MFVAL);
@@ -561,14 +570,21 @@ static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs)
 static int ixgbe_ipsec_add_sa(struct xfrm_state *xs)
 {
struct net_device *dev = xs->xso.dev;
-   struct ixgbe_adapter *adapter = netdev_priv(dev);
-   struct ixgbe_ipsec *ipsec = adapter->ipsec;
-   struct ixgbe_hw *hw = &adapter->hw;
+   struct ixgbe_adapter *adapter;
+   struct ixgbe_ipsec *ipsec;
+   struct ixgbe_hw *hw;
int checked, match, first;
u16 sa_idx;
int ret;
int i;
 
+   if (xs->xso.slave_dev)
+   dev = xs->xso.slave_dev;
+
+   adapter = netdev_priv(dev);
+   ipsec = adapter->ipsec;
+   hw = &adapter->hw;
+
if (xs->id.proto != IPPROTO_ESP && xs->id.proto != IPPROTO_AH) {
 netdev_err(dev, "Unsupported protocol 0x%04x for ipsec offload\n",
   xs->id.proto);
@@ -746,12 +762,19 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs)
 static void ixgbe_ipsec_del_sa(struct xfrm_state *xs)
 {
struct net_device *dev = xs->xso.dev;
-   struct ixgbe_adapter *adapter = netdev_priv(dev);
-   struct ixgbe_ipsec *ipsec = adapter->ipsec;
-   struct ixgbe_hw *hw = &adapter->hw;
+   struct ixgbe_adapter *adapter;
+   struct ixgbe_ipsec *ipsec;
+   struct ixgbe_hw *hw;
u32 zerobuf[4] = {0, 0, 0, 0};
u16 sa_idx;
 
+   if (xs->xso.slave_dev)
+   dev = xs->xso.slave_dev;
+
+   adapter = netdev_priv(dev);
+   ipsec = adapter->ipsec;
+   hw = &adapter->hw;
+
if (xs->xso.flags & XFRM_OFFLOAD_INBOUND) {
struct rx_sa *rsa;
u8 ipi;
-- 
2.20.1
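Every hunk in the ixgbe patch above applies the same rule: if bonding has filled in xs->xso.slave_dev, use that device (and therefore its private ixgbe_adapter) instead of xs->xso.dev, which points at the bond. A minimal user-space sketch of that selection, factored into one helper, might look like the following; xso_hw_dev() and the fake_* types are hypothetical names for illustration, not part of the ixgbe driver or the kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins carrying only the fields this pattern needs. */
struct fake_netdev {
	const char *name;
	void *priv; /* what netdev_priv() would return for this device */
};

struct fake_xso {
	struct fake_netdev *dev;       /* device the SA was added on */
	struct fake_netdev *slave_dev; /* set by bonding, otherwise NULL */
};

/* Return the device whose driver-private state should handle this SA. */
static struct fake_netdev *xso_hw_dev(const struct fake_xso *xso)
{
	assert(xso->dev != NULL);
	return xso->slave_dev ? xso->slave_dev : xso->dev;
}
```

A helper like this would replace the two-line check repeated in parse_proto_keys, check_mgmt_ip, add_sa and del_sa, so that adapter, ipsec and hw are always derived from the same device.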

[RFC PATCH net-next 0/3] bonding: support hardware crypto offload

2020-05-04 Thread Jarod Wilson

This is an initial "proof of concept" implementation of hardware
encryption offload pass-through from a bonding device to capable slaves.
This was tested using an ixgbe-driven Intel X520 NIC with libreswan and a
transport mode connection, on top of an active-backup bond, running netperf
and taking an interface down during the run. Failover takes a moment, but
does work, and overall performance is right on par with offload on a bare
interface.

Caveats: this is ONLY enabled for active-backup, because I'm not sure
how one would manage multiple offload handles for different devices all
running at the same time in the same xfrm. It also relies on some minor
changes to both the xfrm code and the slave device driver code to get
things to behave, and I don't have immediate access to any other hardware
that could function similarly, so I haven't updated other drivers
accordingly.

I'm hoping folks with more of an idea about xfrm have some thoughts on
ways to make this cleaner, and possibly support more bonding modes, but
I'm reasonably happy I've made it this far. :)

Jarod Wilson (3):
  xfrm: bail early on slave pass over skb
  ixgbe_ipsec: become aware of when running as a bonding slave
  bonding: support hardware encryption offload to slaves

CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: Jeff Kirsher 
CC: Jakub Kicinski 
CC: Steffen Klassert 
CC: Herbert Xu 
CC: net...@vger.kernel.org
CC: intel-wired-...@lists.osuosl.org

 drivers/net/bonding/bond_main.c   | 103 +-
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c|  39 +--
 include/net/bonding.h |   1 +
 include/net/xfrm.h|   1 +
 net/xfrm/xfrm_device.c|  34 +++---
 5 files changed, 150 insertions(+), 28 deletions(-)

-- 
2.20.1

[RFC PATCH net-next 1/3] xfrm: bail early on slave pass over skb

2020-05-04 Thread Jarod Wilson

This is prep work for initial bonding hardware encryption pass-through
support. The bonding driver will fill in the slave_dev pointer, and we
use that to know not to skb_push() again on a given skb that was already
processed on the bond device.

CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: "David S. Miller" 
CC: Jeff Kirsher 
CC: Jakub Kicinski 
CC: Steffen Klassert 
CC: Herbert Xu 
CC: net...@vger.kernel.org
CC: intel-wired-...@lists.osuosl.org
Signed-off-by: Jarod Wilson 
---
 include/net/xfrm.h |  1 +
 net/xfrm/xfrm_device.c | 34 +-
 2 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 8f71c111e65a..a6ec341cd9f0 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -127,6 +127,7 @@ struct xfrm_state_walk {
 
 struct xfrm_state_offload {
struct net_device   *dev;
+   struct net_device   *slave_dev;
unsigned long   offload_handle;
 unsigned int    num_exthdrs;
u8  flags;
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 6cc7f7f1dd68..1cd31dcf59da 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -108,6 +108,7 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur
struct sk_buff *skb2, *nskb, *pskb = NULL;
netdev_features_t esp_features = features;
struct xfrm_offload *xo = xfrm_offload(skb);
+   struct net_device *dev = skb->dev;
struct sec_path *sp;
 
if (!xo)
@@ -121,6 +122,10 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur
if (xo->flags & XFRM_GRO || x->xso.flags & XFRM_OFFLOAD_INBOUND)
return skb;
 
+   /* This skb was already validated on the master dev */
+   if ((x->xso.dev != dev) && (x->xso.slave_dev == dev))
+   return skb;
+
local_irq_save(flags);
sd = this_cpu_ptr(&softnet_data);
err = !skb_queue_empty(&sd->xfrm_backlog);
@@ -131,25 +136,20 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur
return skb;
}
 
-   if (skb_is_gso(skb)) {
-   struct net_device *dev = skb->dev;
-
-   if (unlikely(x->xso.dev != dev)) {
-   struct sk_buff *segs;
+   if (skb_is_gso(skb) && unlikely(x->xso.dev != dev)) {
+   struct sk_buff *segs;
 
-   /* Packet got rerouted, fixup features and segment it. */
-   esp_features = esp_features & ~(NETIF_F_HW_ESP
-   | NETIF_F_GSO_ESP);
+   /* Packet got rerouted, fixup features and segment it. */
+   esp_features = esp_features & ~(NETIF_F_HW_ESP | NETIF_F_GSO_ESP);
 
-   segs = skb_gso_segment(skb, esp_features);
-   if (IS_ERR(segs)) {
-   kfree_skb(skb);
-   atomic_long_inc(&dev->tx_dropped);
-   return NULL;
-   } else {
-   consume_skb(skb);
-   skb = segs;
-   }
+   segs = skb_gso_segment(skb, esp_features);
+   if (IS_ERR(segs)) {
+   kfree_skb(skb);
+   atomic_long_inc(&dev->tx_dropped);
+   return NULL;
+   } else {
+   consume_skb(skb);
+   skb = segs;
}
}
 
-- 
2.20.1
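The two device checks this patch touches in validate_xmit_xfrm() can be modeled as a small decision function. The sketch below is a user-space approximation under stated assumptions: it ignores the XFRM_GRO, inbound and backlog checks, the FEAT_* bit values are hypothetical (the real NETIF_F_* values differ), and classify_xmit(), strip_esp_features() and the fake_* types are illustrative names, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bits; the real NETIF_F_* values differ. */
typedef unsigned long long feat_t;
#define FEAT_SG      (1ULL << 0)
#define FEAT_HW_ESP  (1ULL << 1)
#define FEAT_GSO_ESP (1ULL << 2)

struct fake_netdev {
	const char *name;
};

struct fake_xso {
	const struct fake_netdev *dev;       /* device the SA belongs to */
	const struct fake_netdev *slave_dev; /* filled in by bonding */
};

enum xmit_action {
	XMIT_PASS_THROUGH, /* already validated on the bond: return skb as-is */
	XMIT_SW_SEGMENT,   /* rerouted GSO skb: segment without ESP offload */
	XMIT_CONTINUE,     /* carry on with the rest of validate_xmit_xfrm() */
};

/* Models only the two device checks touched by the patch. */
static enum xmit_action classify_xmit(const struct fake_netdev *skb_dev,
				      const struct fake_xso *xso, bool is_gso)
{
	if (xso->dev != skb_dev && xso->slave_dev == skb_dev)
		return XMIT_PASS_THROUGH;
	if (is_gso && xso->dev != skb_dev)
		return XMIT_SW_SEGMENT;
	return XMIT_CONTINUE;
}

/* Mirror of the esp_features fixup applied before software segmentation. */
static feat_t strip_esp_features(feat_t features)
{
	return features & ~(FEAT_HW_ESP | FEAT_GSO_ESP);
}
```

With an SA owned by bond0 and delegated to eth0, the skb is processed normally when it passes through bond0 and is returned untouched when it reaches eth0. A GSO skb that ends up on some other device still falls back to software segmentation, as before the patch.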

Re: [PATCH net] bonding: make debugging output more succinct

2019-06-07 Thread Jarod Wilson


On 6/7/19 10:59 AM, Jarod Wilson wrote:

> Seeing bonding debug log data along the lines of "event: 5" is a bit spartan,
> and often requires a lookup table if you don't remember what every event is.
> Make use of netdev_cmd_to_name for an improved debugging experience, so for
> the prior example, you'll see: "bond_netdev_event received NETDEV_REGISTER"
> instead (both are prefixed with the device to which the event pertains).
>
> There are also quite a few places where the netdev_dbg output could stand to
> mention exactly which slave the message pertains to (it gets messy to tell
> them apart when multiple slaves are all spewing at once).


Argh. Please drop this one: detritus in my git tree when I ran git
send-email caused this earlier iteration of patch 1 of the set it's
threaded with to go out.


--
Jarod Wilson
ja...@redhat.com
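This posting was withdrawn, but the quoted description explains the idea: turn a bare event number such as 5 into its symbolic name before logging it. The sketch below mirrors the first values of the kernel's enum netdev_cmd (NETDEV_UP is 1, so NETDEV_REGISTER is 5) and uses a preprocessor-generated switch. demo_cmd_to_name() and demo_format_event() are hypothetical helpers modeled loosely on netdev_cmd_to_name(), not the kernel's code, and the "dev: " prefix is a simplification of what netdev_dbg() actually prints.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* First values of the kernel's enum netdev_cmd, redeclared for a
 * user-space demo (NETDEV_UP starts at 1, so NETDEV_REGISTER is 5). */
enum demo_netdev_cmd {
	NETDEV_UP = 1,
	NETDEV_DOWN,
	NETDEV_REBOOT,
	NETDEV_CHANGE,
	NETDEV_REGISTER,
	NETDEV_UNREGISTER,
};

/* One case per command, with the name string generated from the
 * enumerator by the preprocessor; unknown values get a fallback. */
static const char *demo_cmd_to_name(int cmd)
{
#define N(val) case NETDEV_##val: return "NETDEV_" #val;
	switch (cmd) {
	N(UP) N(DOWN) N(REBOOT) N(CHANGE) N(REGISTER) N(UNREGISTER)
	}
#undef N
	return "UNKNOWN_NETDEV_EVENT";
}

/* Build a debug line in the style described in the quoted message. */
static void demo_format_event(char *buf, size_t len, const char *dev, int cmd)
{
	assert(buf != NULL && len > 0);
	snprintf(buf, len, "%s: bond_netdev_event received %s",
		 dev, demo_cmd_to_name(cmd));
}
```

Generating each case from a single macro keeps the enumerator and its printed name from drifting apart when new commands are added.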
