Re: [BUG] bonded interfaces drop bpdu (stp) frames

2018-07-14 Thread Michal Soltys

On 07/13/2018 02:15 AM, Mahesh Bandewar (महेश बंडेवार) wrote:

On Thu, Jul 12, 2018 at 4:14 PM, Michal Soltys  wrote:

On 2018-07-13 00:03, Jay Vosburgh wrote:

Mahesh Bandewar (महेश बंडेवार) wrote:


On Thu, Jul 12, 2018 at 11:03 AM, Jay Vosburgh
 wrote:

Michal Soltys  wrote:


On 07/12/2018 04:51 PM, Jay Vosburgh wrote:

Mahesh Bandewar (महेश बंडेवार) wrote:


On Wed, Jul 11, 2018 at 3:23 PM, Michal Soltys  wrote:


Hi,

As weird as that sounds, this is what I observed today after bumping
the kernel version. I have a setup where 2 bonds are attached to a Linux
bridge and are physically connected to two switches doing MSTP (and
the Linux bridge is just passing them).

Initially I suspected some changes related to the bridge code - but a
quick peek at the code showed nothing suspicious - and the part of it
that explicitly passes STP frames if STP is not enabled has seen few
changes (e.g. per-port group_fwd_mask added recently). Furthermore - if
regular non-bonded interfaces are attached, everything works fine.

Just to be sure I detached the bond (802.3ad mode) and checked it with
a simple tcpdump (ether proto \\stp) - and indeed no hello packets were
there (while they were present just fine on the active enslaved interface,
or on the bond device in earlier kernels).

If time permits I'll bisect tomorrow to pinpoint the commit, but from
today's quick test - 4.9.x is working fine, while 4.16.16 (tested on
Debian) and 4.17.3 (tested on Arch Linux) are failing.

Unless this is already a known issue (or you have any suggestions what
could be responsible).
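For readers reproducing the capture: assuming libpcap's usual handling of LLC SAP names, `ether proto \stp` does not match an EtherType; it selects 802.3 frames (length/type field of 1500 or less, i.e. LLC-encapsulated) whose LLC DSAP is 0x42, the 802.1D spanning tree SAP. A minimal C sketch of roughly that test on a raw, untagged frame buffer follows; the function name is hypothetical and VLAN-tagged frames are not handled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN      14   /* dst MAC (6) + src MAC (6) + length/type (2) */
#define ETH_MAX_8023_LEN 1500 /* larger values are EtherTypes, not lengths */
#define LLC_SAP_8021D    0x42 /* LLC SAP used by 802.1D spanning tree BPDUs */

/*
 * Hypothetical helper approximating "ether proto \stp": an untagged
 * 802.3 frame (length field <= 1500, so an LLC header follows) whose
 * first LLC byte, the DSAP, is 0x42.
 */
static bool looks_like_stp_frame(const uint8_t *frame, size_t len)
{
    /* Need the Ethernet header plus LLC DSAP, SSAP and control bytes. */
    if (frame == NULL || len < ETH_HDR_LEN + 3)
        return false;

    /* Bytes 12-13 hold the big-endian length/type field. */
    unsigned int len_or_type = ((unsigned int)frame[12] << 8) | frame[13];
    if (len_or_type > ETH_MAX_8023_LEN)
        return false; /* Ethernet II frame (EtherType), no LLC header */

    return frame[ETH_HDR_LEN] == LLC_SAP_8021D;
}
```

For example, a configuration BPDU sent to 01:80:C2:00:00:00 with length field 0x0026 and LLC bytes 42 42 03 is accepted, while any Ethernet II frame (such as IPv4, type 0x0800) is rejected regardless of its payload.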


I believe these are link-local multicast messages and some time back a
change went in to not pass those frames to the bonding master. This
could be the side effect of that.


  Mahesh, I suspect you're thinking of:

commit b89f04c61efe3b7756434d693b9203cc0cce002e
Author: Chonggang Li 
Date:   Sun Apr 16 12:02:18 2017 -0700

  bonding: deliver link-local packets with skb->dev set to link that 
packets arrived on

  Michal, are you able to revert this patch and test?

  -J

---
  -Jay Vosburgh, jay.vosbu...@canonical.com




Just tested - yes, reverting that patch solves the issues.


 Chonggang,

 Reading the changelog in your commit referenced above, I'm not
entirely sure what actual problem it is fixing.  Could you elaborate?

 As the patch appears to cause a regression, it needs to be
either fixed or reverted.

 Mahesh, you signed-off on it as well, perhaps you also have some
context?



I think the original idea behind it was to pass the LLDPDUs to the
stack on the interface that they came on since this is considered to
be link-local traffic and passing to bond-master would lose its
"linklocal-ness". This is true for LLDP, and if you change the skb->dev
of the packet, then you don't know which slave link it came in on
(from LLDP consumer's perspective).

I don't know much about STP but trunking two links and aggregating
this link info through bond-master seems wrong. Just like LLDP, you
are losing info specific to a link and the decision derived from that
info could be wrong.

Having said that, we determine "linklocal-ness" by looking at L2 and
bondmaster shares this with its slaves. So it does seem fair to pass
those frames to the bonding-master but at the same time link-local
traffic is supposed to be limited to the physical link (LLDP/STP/LACP
etc). Your thoughts?


   I agree the whole thing sounds kind of weird, but I'm curious as
to what Michal's actual use case is; he presumably has some practical
use for this, since he noticed that the behavior changed.



The whole "link-local" term is a bit I don't know - at this point it
feels like too many things were thrown into a single bag and it got
somewhat confusing (bpdu, lldp, pause frames, lacp, pae, qinq multicast
that afaik has its own address) - I added some examples in another reply
I did at the same time as you were typing this one =)


   Michal, you mentioned MSTP and using 802.3ad (LACP) mode; how
does that combination work rationally given that the bond might send and
receive traffic across multiple slaves?  Or does the switch side bundle
the ports together into a single logical interface for MSTP purposes?
On the TX side, I think the bond will likely balance all STP frames to
just one slave.



The basic concept - two "main" switches with "important" machines
connected to those. One switch dies and everything keeps working. With
no unused ports and so on.

In more details:

Originally I was trying the MSTP daemon (on the "important" machines), which
seems to be quite well and completely coded, but cannot really work correctly
- as afaik you can't put a port (in the Linux bridge context) in a different
forwarding/blocking/etc. state per-region - IOW per group of VLANs (or
mstpd didn't know how to do that, or it wasn't implemented - I didn't
look too deep back then, though my interest resurfaced in recent days).
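To make the per-region limitation concrete: in MSTP each VLAN is mapped to a spanning tree instance (MSTI), and a port carries a separate state per instance, so one port can forward the VLANs of one instance while discarding those of another. The data model below is a hypothetical illustration (not the Linux bridge's or mstpd's structures, and the array sizes are illustrative); a port with only one state for all VLANs, which is the limitation described above, cannot express this.

```c
#include <stdbool.h>
#include <stdint.h>

#define VLAN_ID_COUNT 4096
#define MAX_MSTI      65 /* illustrative: CIST (0) plus up to 64 MSTIs */

enum port_state { PORT_DISCARDING, PORT_LEARNING, PORT_FORWARDING };

/* Hypothetical MSTP view of a port: one state per spanning tree instance. */
struct mstp_port {
    enum port_state msti_state[MAX_MSTI];
};

/* Hypothetical region configuration: the instance each VLAN belongs to. */
struct mstp_region {
    uint8_t vlan_to_msti[VLAN_ID_COUNT];
};

/*
 * A frame on VLAN `vid` may be forwarded only if the port is forwarding
 * in the instance that VLAN is mapped to.
 */
static bool mstp_port_forwards(const struct mstp_region *region,
                               const struct mstp_port *port, uint16_t vid)
{
    if (vid >= VLAN_ID_COUNT)
        return false;
    uint8_t msti = region->vlan_to_msti[vid];
    if (msti >= MAX_MSTI)
        return false;
    return port->msti_state[msti] == PORT_FORWARDING;
}
```

With VLAN 10 mapped to instance 1 (discarding) and VLAN 20 mapped to instance 2 (forwarding), the same port blocks VLAN 10 and forwards VLAN 20.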

So that option was out of the question. But any switch, real or

Re: [BUG] bonded interfaces drop bpdu (stp) frames

2018-07-12 Thread महेश बंडेवार
On Thu, Jul 12, 2018 at 4:14 PM, Michal Soltys  wrote:
> On 2018-07-13 00:03, Jay Vosburgh wrote:
>> Mahesh Bandewar (महेश बंडेवार) wrote:
>>
>>>On Thu, Jul 12, 2018 at 11:03 AM, Jay Vosburgh
>>> wrote:
 Michal Soltys  wrote:

>On 07/12/2018 04:51 PM, Jay Vosburgh wrote:
>> Mahesh Bandewar (महेश बंडेवार) wrote:
>>
>>> On Wed, Jul 11, 2018 at 3:23 PM, Michal Soltys  wrote:

 Hi,

 As weird as that sounds, this is what I observed today after bumping
 the kernel version. I have a setup where 2 bonds are attached to a Linux
 bridge and are physically connected to two switches doing MSTP (and
 the Linux bridge is just passing them).

 Initially I suspected some changes related to the bridge code - but a
 quick peek at the code showed nothing suspicious - and the part of it
 that explicitly passes STP frames if STP is not enabled has seen few
 changes (e.g. per-port group_fwd_mask added recently). Furthermore - if
 regular non-bonded interfaces are attached, everything works fine.

 Just to be sure I detached the bond (802.3ad mode) and checked it with
 a simple tcpdump (ether proto \\stp) - and indeed no hello packets were
 there (while they were present just fine on the active enslaved interface,
 or on the bond device in earlier kernels).

 If time permits I'll bisect tomorrow to pinpoint the commit, but from
 today's quick test - 4.9.x is working fine, while 4.16.16 (tested on
 Debian) and 4.17.3 (tested on Arch Linux) are failing.

 Unless this is already a known issue (or you have any suggestions what
 could be responsible).

>>> I believe these are link-local multicast messages and some time back a
>>> change went in to not pass those frames to the bonding master. This
>>> could be the side effect of that.
>>
>>  Mahesh, I suspect you're thinking of:
>>
>> commit b89f04c61efe3b7756434d693b9203cc0cce002e
>> Author: Chonggang Li 
>> Date:   Sun Apr 16 12:02:18 2017 -0700
>>
>>  bonding: deliver link-local packets with skb->dev set to link that 
>> packets arrived on
>>
>>  Michal, are you able to revert this patch and test?
>>
>>  -J
>>
>> ---
>>  -Jay Vosburgh, jay.vosbu...@canonical.com
>>
>
>
>Just tested - yes, reverting that patch solves the issues.

 Chonggang,

 Reading the changelog in your commit referenced above, I'm not
 entirely sure what actual problem it is fixing.  Could you elaborate?

 As the patch appears to cause a regression, it needs to be
 either fixed or reverted.

 Mahesh, you signed-off on it as well, perhaps you also have some
 context?

>>>
>>>I think the original idea behind it was to pass the LLDPDUs to the
>>>stack on the interface that they came on since this is considered to
>>>be link-local traffic and passing to bond-master would lose its
>>>"linklocal-ness". This is true for LLDP, and if you change the skb->dev
>>>of the packet, then you don't know which slave link it came in on
>>>(from LLDP consumer's perspective).
>>>
>>>I don't know much about STP but trunking two links and aggregating
>>>this link info through bond-master seems wrong. Just like LLDP, you
>>>are losing info specific to a link and the decision derived from that
>>>info could be wrong.
>>>
>>>Having said that, we determine "linklocal-ness" by looking at L2 and
>>>bondmaster shares this with its slaves. So it does seem fair to pass
>>>those frames to the bonding-master but at the same time link-local
>>>traffic is supposed to be limited to the physical link (LLDP/STP/LACP
>>>etc). Your thoughts?
>>
>>   I agree the whole thing sounds kind of weird, but I'm curious as
>> to what Michal's actual use case is; he presumably has some practical
>> use for this, since he noticed that the behavior changed.
>>
>
> The whole "link-local" term is a bit I don't know - at this point it
> feels like too many things were thrown into a single bag and it got
> somewhat confusing (bpdu, lldp, pause frames, lacp, pae, qinq multicast
> that afaik has its own address) - I added some examples in another reply
> I did at the same time as you were typing this one =)
>
>>   Michal, you mentioned MSTP and using 802.3ad (LACP) mode; how
>> does that combination work rationally given that the bond might send and
>> receive traffic across multiple slaves?  Or does the switch side bundle
>> the ports together into a single logical interface for MSTP purposes?
>> On the TX side, I think the bond will likely balance all STP frames to
>> just one slave.
>>
>
> The basic concept - two "main" switches with "important" machines
> connected to those. One switch dies and everything keeps working. With
> no unused ports and so on.
>
> In more details:

Re: [BUG] bonded interfaces drop bpdu (stp) frames

2018-07-12 Thread Michal Soltys
On 2018-07-13 00:03, Jay Vosburgh wrote:
> Mahesh Bandewar (महेश बंडेवार) wrote:
> 
>>On Thu, Jul 12, 2018 at 11:03 AM, Jay Vosburgh
>> wrote:
>>> Michal Soltys  wrote:
>>>
On 07/12/2018 04:51 PM, Jay Vosburgh wrote:
> Mahesh Bandewar (महेश बंडेवार) wrote:
>
>> On Wed, Jul 11, 2018 at 3:23 PM, Michal Soltys  wrote:
>>>
>>> Hi,
>>>
>>> As weird as that sounds, this is what I observed today after bumping
>>> the kernel version. I have a setup where 2 bonds are attached to a Linux
>>> bridge and are physically connected to two switches doing MSTP (and
>>> the Linux bridge is just passing them).
>>>
>>> Initially I suspected some changes related to the bridge code - but a
>>> quick peek at the code showed nothing suspicious - and the part of it
>>> that explicitly passes STP frames if STP is not enabled has seen few
>>> changes (e.g. per-port group_fwd_mask added recently). Furthermore - if
>>> regular non-bonded interfaces are attached, everything works fine.
>>>
>>> Just to be sure I detached the bond (802.3ad mode) and checked it with
>>> a simple tcpdump (ether proto \\stp) - and indeed no hello packets were
>>> there (while they were present just fine on the active enslaved interface,
>>> or on the bond device in earlier kernels).
>>>
>>> If time permits I'll bisect tomorrow to pinpoint the commit, but from
>>> today's quick test - 4.9.x is working fine, while 4.16.16 (tested on
>>> Debian) and 4.17.3 (tested on Arch Linux) are failing.
>>>
>>> Unless this is already a known issue (or you have any suggestions what
>>> could be responsible).
>>>
>> I believe these are link-local multicast messages and some time back a
>> change went in to not pass those frames to the bonding master. This
>> could be the side effect of that.
>
>  Mahesh, I suspect you're thinking of:
>
> commit b89f04c61efe3b7756434d693b9203cc0cce002e
> Author: Chonggang Li 
> Date:   Sun Apr 16 12:02:18 2017 -0700
>
>  bonding: deliver link-local packets with skb->dev set to link that 
> packets arrived on
>
>  Michal, are you able to revert this patch and test?
>
>  -J
>
> ---
>  -Jay Vosburgh, jay.vosbu...@canonical.com
>


Just tested - yes, reverting that patch solves the issues.
>>>
>>> Chonggang,
>>>
>>> Reading the changelog in your commit referenced above, I'm not
>>> entirely sure what actual problem it is fixing.  Could you elaborate?
>>>
>>> As the patch appears to cause a regression, it needs to be
>>> either fixed or reverted.
>>>
>>> Mahesh, you signed-off on it as well, perhaps you also have some
>>> context?
>>>
>>
>>I think the original idea behind it was to pass the LLDPDUs to the
>>stack on the interface that they came on since this is considered to
>>be link-local traffic and passing to bond-master would lose its
>>"linklocal-ness". This is true for LLDP, and if you change the skb->dev
>>of the packet, then you don't know which slave link it came in on
>>(from LLDP consumer's perspective).
>>
>>I don't know much about STP but trunking two links and aggregating
>>this link info through bond-master seems wrong. Just like LLDP, you
>>are losing info specific to a link and the decision derived from that
>>info could be wrong.
>>
>>Having said that, we determine "linklocal-ness" by looking at L2 and
>>bondmaster shares this with its slaves. So it does seem fair to pass
>>those frames to the bonding-master but at the same time link-local
>>traffic is supposed to be limited to the physical link (LLDP/STP/LACP
>>etc). Your thoughts?
> 
>   I agree the whole thing sounds kind of weird, but I'm curious as
> to what Michal's actual use case is; he presumably has some practical
> use for this, since he noticed that the behavior changed.
> 

The whole "link-local" term is a bit I don't know - at this point it
feels like too many things were thrown into a single bag and it got
somewhat confusing (bpdu, lldp, pause frames, lacp, pae, qinq multicast
that afaik has its own address) - I added some examples in another reply
I did at the same time as you were typing this one =)

>   Michal, you mentioned MSTP and using 802.3ad (LACP) mode; how
> does that combination work rationally given that the bond might send and
> receive traffic across multiple slaves?  Or does the switch side bundle
> the ports together into a single logical interface for MSTP purposes?
> On the TX side, I think the bond will likely balance all STP frames to
> just one slave.
> 

The basic concept - two "main" switches with "important" machines
connected to those. One switch dies and everything keeps working. With
no unused ports and so on.

In more details:

Originally I was trying MSTP daemon (on "important" machines) which
seems quite well and completely coded, but cannot really work correctly
- as afaik you can't put

Re: [BUG] bonded interfaces drop bpdu (stp) frames

2018-07-12 Thread Michal Soltys
On 2018-07-12 23:26, Mahesh Bandewar (महेश बंडेवार) wrote:
> On Thu, Jul 12, 2018 at 11:03 AM, Jay Vosburgh
>  wrote:
>> Michal Soltys  wrote:
>>
>>>On 07/12/2018 04:51 PM, Jay Vosburgh wrote:
 Mahesh Bandewar (महेश बंडेवार) wrote:

> On Wed, Jul 11, 2018 at 3:23 PM, Michal Soltys  wrote:
>>
>> Hi,
>>
>> As weird as that sounds, this is what I observed today after bumping
>> the kernel version. I have a setup where 2 bonds are attached to a Linux
>> bridge and are physically connected to two switches doing MSTP (and
>> the Linux bridge is just passing them).
>>
>> Initially I suspected some changes related to the bridge code - but a
>> quick peek at the code showed nothing suspicious - and the part of it
>> that explicitly passes STP frames if STP is not enabled has seen few
>> changes (e.g. per-port group_fwd_mask added recently). Furthermore - if
>> regular non-bonded interfaces are attached, everything works fine.
>>
>> Just to be sure I detached the bond (802.3ad mode) and checked it with
>> a simple tcpdump (ether proto \\stp) - and indeed no hello packets were
>> there (while they were present just fine on the active enslaved interface,
>> or on the bond device in earlier kernels).
>>
>> If time permits I'll bisect tomorrow to pinpoint the commit, but from
>> today's quick test - 4.9.x is working fine, while 4.16.16 (tested on
>> Debian) and 4.17.3 (tested on Arch Linux) are failing.
>>
>> Unless this is already a known issue (or you have any suggestions what
>> could be responsible).
>>
> I believe these are link-local multicast messages and some time back a
> change went in to not pass those frames to the bonding master. This
> could be the side effect of that.

  Mahesh, I suspect you're thinking of:

 commit b89f04c61efe3b7756434d693b9203cc0cce002e
 Author: Chonggang Li 
 Date:   Sun Apr 16 12:02:18 2017 -0700

  bonding: deliver link-local packets with skb->dev set to link that 
 packets arrived on

  Michal, are you able to revert this patch and test?

  -J

 ---
  -Jay Vosburgh, jay.vosbu...@canonical.com

>>>
>>>
>>>Just tested - yes, reverting that patch solves the issues.
>>
>> Chonggang,
>>
>> Reading the changelog in your commit referenced above, I'm not
>> entirely sure what actual problem it is fixing.  Could you elaborate?
>>
>> As the patch appears to cause a regression, it needs to be
>> either fixed or reverted.
>>
>> Mahesh, you signed-off on it as well, perhaps you also have some
>> context?
>>
> 
> I think the original idea behind it was to pass the LLDPDUs to the
> stack on the interface that they came on since this is considered to
> be link-local traffic and passing to bond-master would lose its
> "linklocal-ness". This is true for LLDP, and if you change the skb->dev
> of the packet, then you don't know which slave link it came in on
> (from LLDP consumer's perspective).
> 
> I don't know much about STP but trunking two links and aggregating
> this link info through bond-master seems wrong. Just like LLDP, you
> are losing info specific to a link and the decision derived from that
> info could be wrong.
> 
> Having said that, we determine "linklocal-ness" by looking at L2 and
> bondmaster shares this with its slaves. So it does seem fair to pass
> those frames to the bonding-master but at the same time link-local
> traffic is supposed to be limited to the physical link (LLDP/STP/LACP
> etc). Your thoughts?
> 

But isn't the bond de facto considered the "physical link"? Not directly
of course, but say an LLDP daemon would likely be more interested in
getting LLDP data from a bond device (or a bridge device, if the bond is
attached to one) than from its enslaved interfaces (and enslaved
interfaces can be changed, not to mention the potentially complex setup
itself, even if usually it's just lacp&go).

IOW, blocking link-local multicasts at the bond level (among those: bpdu,
pae, lldp) is a bit like the interface itself hiding LACP from the bond code.

A few other examples:

- putting bonds in a bridge is a pretty normal thing - and whether the
bridge interprets the spanning tree data itself (via in-kernel classic
STP or a userspace daemon for e.g. RSTP) or passes the traffic, it must see
the BPDU frames. Otherwise it becomes blind to the whole spanning tree
protocol - and implicitly to the other switches around, real or virtual.
It's literally an instant loop disaster. br_input.c specifically takes care
to pass those frames if the bridge has STP turned off.

- "group_fwd_mask" (again in bridge context) has been added to the bridge
code - and recently as a per-port knob as well - to specifically allow
the control of what kind of "link-local" stuff is passed or not. LLDP
and 802.1X PAE were, afaik, the main reasons for that sysfs variable.
The per-port setting is even more rel
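A simplified model of the bridge-side decision described in the two points above, assuming frames addressed to 01:80:C2:00:00:0X and keyed on X. It is an illustration only, not the code in br_input.c: the function and enum names are hypothetical and the real code handles more cases and restrictions. STP BPDUs are passed on when STP is off, pause frames are never forwarded, and other addresses are forwarded only when bit X of group_fwd_mask is set.

```c
#include <stdbool.h>
#include <stdint.h>

enum ll_action { LL_FORWARD, LL_LOCAL, LL_DROP };

/*
 * Hypothetical model of a bridge port's decision for a frame sent to
 * 01:80:C2:00:00:0X, where `last_byte` is X (0x00-0x0F). Bit X of
 * `group_fwd_mask` permits forwarding that group address.
 */
static enum ll_action bridge_link_local_action(uint8_t last_byte,
                                               bool stp_enabled,
                                               uint16_t group_fwd_mask)
{
    bool mask_allows = (group_fwd_mask >> (last_byte & 0x0F)) & 1u;

    switch (last_byte) {
    case 0x00: /* bridge group address: STP BPDUs */
        /* With STP off, BPDUs must be passed through so that the
         * other bridges can still detect loops. */
        if (!stp_enabled || mask_allows)
            return LL_FORWARD;
        return LL_LOCAL; /* consumed by the local STP implementation */
    case 0x01: /* MAC control (pause): never forwarded */
        return LL_DROP;
    default:   /* e.g. 0x0E (LLDP), 0x03 (802.1X PAE) */
        return mask_allows ? LL_FORWARD : LL_LOCAL;
    }
}
```

In a setup like Michal's, where the Linux bridge has STP off, `bridge_link_local_action(0x00, false, 0)` returns `LL_FORWARD`: the bridge relays BPDUs between the two bonds, but only if the bond hands those frames to the bridge in the first place.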

Re: [BUG] bonded interfaces drop bpdu (stp) frames

2018-07-12 Thread Jay Vosburgh
Mahesh Bandewar (महेश बंडेवार) wrote:

>On Thu, Jul 12, 2018 at 11:03 AM, Jay Vosburgh
> wrote:
>> Michal Soltys  wrote:
>>
>>>On 07/12/2018 04:51 PM, Jay Vosburgh wrote:
 Mahesh Bandewar (महेश बंडेवार) wrote:

> On Wed, Jul 11, 2018 at 3:23 PM, Michal Soltys  wrote:
>>
>> Hi,
>>
>> As weird as that sounds, this is what I observed today after bumping
>> the kernel version. I have a setup where 2 bonds are attached to a Linux
>> bridge and are physically connected to two switches doing MSTP (and
>> the Linux bridge is just passing them).
>>
>> Initially I suspected some changes related to the bridge code - but a
>> quick peek at the code showed nothing suspicious - and the part of it
>> that explicitly passes STP frames if STP is not enabled has seen few
>> changes (e.g. per-port group_fwd_mask added recently). Furthermore - if
>> regular non-bonded interfaces are attached, everything works fine.
>>
>> Just to be sure I detached the bond (802.3ad mode) and checked it with
>> a simple tcpdump (ether proto \\stp) - and indeed no hello packets were
>> there (while they were present just fine on the active enslaved interface,
>> or on the bond device in earlier kernels).
>>
>> If time permits I'll bisect tomorrow to pinpoint the commit, but from
>> today's quick test - 4.9.x is working fine, while 4.16.16 (tested on
>> Debian) and 4.17.3 (tested on Arch Linux) are failing.
>>
>> Unless this is already a known issue (or you have any suggestions what
>> could be responsible).
>>
> I believe these are link-local multicast messages and some time back a
> change went in to not pass those frames to the bonding master. This
> could be the side effect of that.

  Mahesh, I suspect you're thinking of:

 commit b89f04c61efe3b7756434d693b9203cc0cce002e
 Author: Chonggang Li 
 Date:   Sun Apr 16 12:02:18 2017 -0700

  bonding: deliver link-local packets with skb->dev set to link that 
 packets arrived on

  Michal, are you able to revert this patch and test?

  -J

 ---
  -Jay Vosburgh, jay.vosbu...@canonical.com

>>>
>>>
>>>Just tested - yes, reverting that patch solves the issues.
>>
>> Chonggang,
>>
>> Reading the changelog in your commit referenced above, I'm not
>> entirely sure what actual problem it is fixing.  Could you elaborate?
>>
>> As the patch appears to cause a regression, it needs to be
>> either fixed or reverted.
>>
>> Mahesh, you signed-off on it as well, perhaps you also have some
>> context?
>>
>
>I think the original idea behind it was to pass the LLDPDUs to the
>stack on the interface that they came on since this is considered to
>be link-local traffic and passing to bond-master would lose its
>"linklocal-ness". This is true for LLDP, and if you change the skb->dev
>of the packet, then you don't know which slave link it came in on
>(from LLDP consumer's perspective).
>
>I don't know much about STP but trunking two links and aggregating
>this link info through bond-master seems wrong. Just like LLDP, you
>are losing info specific to a link and the decision derived from that
>info could be wrong.
>
>Having said that, we determine "linklocal-ness" by looking at L2 and
>bondmaster shares this with its slaves. So it does seem fair to pass
>those frames to the bonding-master but at the same time link-local
>traffic is supposed to be limited to the physical link (LLDP/STP/LACP
>etc). Your thoughts?

I agree the whole thing sounds kind of weird, but I'm curious as
to what Michal's actual use case is; he presumably has some practical
use for this, since he noticed that the behavior changed.

Michal, you mentioned MSTP and using 802.3ad (LACP) mode; how
does that combination work rationally given that the bond might send and
receive traffic across multiple slaves?  Or does the switch side bundle
the ports together into a single logical interface for MSTP purposes?
On the TX side, I think the bond will likely balance all STP frames to
just one slave.
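The TX-side expectation above follows from the layer2 transmit hash policy, which the bonding documentation describes as source MAC XOR destination MAC XOR packet type ID, modulo the slave count. The sketch below assumes that policy and a simplified form of the formula (only the last byte of each MAC is used, and the extra folding the driver applies is omitted); the function name is hypothetical. Whatever the exact folding, identical header fields always select the same slave.

```c
#include <stdint.h>

/*
 * Hypothetical, simplified layer2-style transmit hash:
 * (last byte of src MAC XOR last byte of dst MAC XOR type/length field)
 * modulo the number of slaves.
 */
static unsigned int layer2_slave_index(const uint8_t dst[6],
                                       const uint8_t src[6],
                                       uint16_t type_or_len,
                                       unsigned int slave_count)
{
    if (slave_count == 0)
        return 0; /* no slaves: avoid division by zero */
    unsigned int hash = (unsigned int)(dst[5] ^ src[5]) ^ type_or_len;
    return hash % slave_count;
}
```

Every BPDU the bridge transmits through the bond carries the same source and destination MAC (and, for a fixed BPDU format, the same length field), so under this policy they all leave through a single slave.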

As for a resolution, presuming that Michal has some reasonable
use case, I'm thinking along the lines of reverting the new (leave frame
attached to slave) behavior for the general case and adding a special
case for LLDP and friends to get the new behavior.  I'd like to avoid
adding any new options to bonding.
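A minimal sketch of the policy outlined above, assuming "LLDP and friends" is recognized by EtherType (0x88CC is the LLDP EtherType) on a link-local destination; everything else, BPDUs included, goes back to being delivered to the bond master. The names are hypothetical and this is not bonding driver code; which other protocols count as "friends" is left open, as in the proposal.

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_TYPE_LLDP 0x88CC /* IEEE 802.1AB LLDP EtherType */

enum rx_target { RX_TO_BOND_MASTER, RX_STAY_ON_SLAVE };

/*
 * Hypothetical policy: deliver every frame to the bond master (the old
 * behaviour), except link-local LLDP frames, which stay on the slave
 * they arrived on. `type_or_len` is the field at offset 12, already in
 * host byte order.
 */
static enum rx_target bond_rx_target(const uint8_t dst[6], uint16_t type_or_len)
{
    /* 01:80:C2:00:00:00 through 01:80:C2:00:00:0F */
    bool link_local = dst[0] == 0x01 && dst[1] == 0x80 && dst[2] == 0xC2 &&
                      dst[3] == 0x00 && dst[4] == 0x00 &&
                      (dst[5] & 0xF0) == 0x00;

    if (link_local && type_or_len == ETH_TYPE_LLDP)
        return RX_STAY_ON_SLAVE;
    return RX_TO_BOND_MASTER;
}
```

Under this sketch an STP BPDU (destination 01:80:C2:00:00:00, 802.3 length field) reaches the bond master and therefore the bridge, while LLDP sent to 01:80:C2:00:00:0E remains attributable to the physical link it arrived on.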

-J

---
-Jay Vosburgh, jay.vosbu...@canonical.com


Re: [BUG] bonded interfaces drop bpdu (stp) frames

2018-07-12 Thread महेश बंडेवार
On Thu, Jul 12, 2018 at 11:03 AM, Jay Vosburgh
 wrote:
> Michal Soltys  wrote:
>
>>On 07/12/2018 04:51 PM, Jay Vosburgh wrote:
>>> Mahesh Bandewar (महेश बंडेवार) wrote:
>>>
 On Wed, Jul 11, 2018 at 3:23 PM, Michal Soltys  wrote:
>
> Hi,
>
> As weird as that sounds, this is what I observed today after bumping
> the kernel version. I have a setup where 2 bonds are attached to a Linux
> bridge and are physically connected to two switches doing MSTP (and
> the Linux bridge is just passing them).
>
> Initially I suspected some changes related to the bridge code - but a
> quick peek at the code showed nothing suspicious - and the part of it
> that explicitly passes STP frames if STP is not enabled has seen few
> changes (e.g. per-port group_fwd_mask added recently). Furthermore - if
> regular non-bonded interfaces are attached, everything works fine.
>
> Just to be sure I detached the bond (802.3ad mode) and checked it with
> a simple tcpdump (ether proto \\stp) - and indeed no hello packets were
> there (while they were present just fine on the active enslaved interface,
> or on the bond device in earlier kernels).
>
> If time permits I'll bisect tomorrow to pinpoint the commit, but from
> today's quick test - 4.9.x is working fine, while 4.16.16 (tested on
> Debian) and 4.17.3 (tested on Arch Linux) are failing.
>
> Unless this is already a known issue (or you have any suggestions what
> could be responsible).
>
 I believe these are link-local multicast messages and some time back a
 change went in to not pass those frames to the bonding master. This
 could be the side effect of that.
>>>
>>>  Mahesh, I suspect you're thinking of:
>>>
>>> commit b89f04c61efe3b7756434d693b9203cc0cce002e
>>> Author: Chonggang Li 
>>> Date:   Sun Apr 16 12:02:18 2017 -0700
>>>
>>>  bonding: deliver link-local packets with skb->dev set to link that 
>>> packets arrived on
>>>
>>>  Michal, are you able to revert this patch and test?
>>>
>>>  -J
>>>
>>> ---
>>>  -Jay Vosburgh, jay.vosbu...@canonical.com
>>>
>>
>>
>>Just tested - yes, reverting that patch solves the issues.
>
> Chonggang,
>
> Reading the changelog in your commit referenced above, I'm not
> entirely sure what actual problem it is fixing.  Could you elaborate?
>
> As the patch appears to cause a regression, it needs to be
> either fixed or reverted.
>
> Mahesh, you signed-off on it as well, perhaps you also have some
> context?
>

I think the original idea behind it was to pass the LLDPDUs to the
stack on the interface that they came on since this is considered to
be link-local traffic and passing to bond-master would lose its
"linklocal-ness". This is true for LLDP, and if you change the skb->dev
of the packet, then you don't know which slave link it came in on
(from LLDP consumer's perspective).

I don't know much about STP but trunking two links and aggregating
this link info through bond-master seems wrong. Just like LLDP, you
are losing info specific to a link and the decision derived from that
info could be wrong.

Having said that, we determine "linklocal-ness" by looking at L2 and
bondmaster shares this with its slaves. So it does seem fair to pass
those frames to the bonding-master but at the same time link-local
traffic is supposed to be limited to the physical link (LLDP/STP/LACP
etc). Your thoughts?
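"Link-local" here is decided purely from the destination MAC: the IEEE 802.1 reserved range 01:80:C2:00:00:00 through 01:80:C2:00:00:0F, which the kernel tests with is_link_local_ether_addr(). The toy model below (hypothetical types, not kernel code, ignoring inactive-slave and other special cases) shows that check and the two delivery behaviours discussed in this thread: in earlier kernels the frame's device became the bond master; after commit b89f04c6, link-local frames keep the slave as their device, which is why the bridge on top of the bond stops seeing BPDUs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a net device. */
struct toy_dev {
    const char *name;
};

/* Hypothetical stand-in for a received frame; `dev` models skb->dev. */
struct toy_frame {
    uint8_t dst[6];
    const struct toy_dev *dev;
};

/* Same condition as the kernel's is_link_local_ether_addr():
 * destination within 01:80:C2:00:00:00 .. 01:80:C2:00:00:0F. */
static bool toy_is_link_local(const uint8_t a[6])
{
    return a[0] == 0x01 && a[1] == 0x80 && a[2] == 0xC2 &&
           a[3] == 0x00 && a[4] == 0x00 && (a[5] & 0xF0) == 0x00;
}

/*
 * Which device the upper layers (bridge, tcpdump, LLDP daemon) see
 * after the bonding receive hook runs on a frame from a slave.
 * keep_link_local_on_slave == false: earlier kernels, the frame is
 *   moved to the bond master.
 * keep_link_local_on_slave == true: behaviour after commit b89f04c6,
 *   link-local frames keep the slave as their device.
 */
static void toy_bond_rx(struct toy_frame *f, const struct toy_dev *bond,
                        bool keep_link_local_on_slave)
{
    if (keep_link_local_on_slave && toy_is_link_local(f->dst))
        return; /* f->dev still points at the slave */
    f->dev = bond;
}
```

Ordinary unicast traffic reaches the bond master under both settings; only frames in the reserved range (STP, LACP, LLDP, PAE and the like) change behaviour, which matches what the tcpdump test on the bond device showed.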


> -J
>
> ---
> -Jay Vosburgh, jay.vosbu...@canonical.com


Re: [BUG] bonded interfaces drop bpdu (stp) frames

2018-07-12 Thread Jay Vosburgh
Michal Soltys  wrote:

>On 07/12/2018 04:51 PM, Jay Vosburgh wrote:
>> Mahesh Bandewar (महेश बंडेवार) wrote:
>>
>>> On Wed, Jul 11, 2018 at 3:23 PM, Michal Soltys  wrote:

 Hi,

 As weird as that sounds, this is what I observed today after bumping
 the kernel version. I have a setup where 2 bonds are attached to a Linux
 bridge and are physically connected to two switches doing MSTP (and
 the Linux bridge is just passing them).

 Initially I suspected some changes related to the bridge code - but a
 quick peek at the code showed nothing suspicious - and the part of it
 that explicitly passes STP frames if STP is not enabled has seen few
 changes (e.g. per-port group_fwd_mask added recently). Furthermore - if
 regular non-bonded interfaces are attached, everything works fine.

 Just to be sure I detached the bond (802.3ad mode) and checked it with
 a simple tcpdump (ether proto \\stp) - and indeed no hello packets were
 there (while they were present just fine on the active enslaved interface,
 or on the bond device in earlier kernels).

 If time permits I'll bisect tomorrow to pinpoint the commit, but from
 today's quick test - 4.9.x is working fine, while 4.16.16 (tested on
 Debian) and 4.17.3 (tested on Arch Linux) are failing.

 Unless this is already a known issue (or you have any suggestions what
 could be responsible).

>>> I believe these are link-local multicast messages and some time back a
>>> change went in to not pass those frames to the bonding master. This
>>> could be the side effect of that.
>>
>>  Mahesh, I suspect you're thinking of:
>>
>> commit b89f04c61efe3b7756434d693b9203cc0cce002e
>> Author: Chonggang Li 
>> Date:   Sun Apr 16 12:02:18 2017 -0700
>>
>>  bonding: deliver link-local packets with skb->dev set to link that 
>> packets arrived on
>>
>>  Michal, are you able to revert this patch and test?
>>
>>  -J
>>
>> ---
>>  -Jay Vosburgh, jay.vosbu...@canonical.com
>>
>
>
>Just tested - yes, reverting that patch solves the issues.

Chonggang,

Reading the changelog in your commit referenced above, I'm not
entirely sure what actual problem it is fixing.  Could you elaborate?

As the patch appears to cause a regression, it needs to be
either fixed or reverted.

Mahesh, you signed-off on it as well, perhaps you also have some
context?

-J

---
-Jay Vosburgh, jay.vosbu...@canonical.com


Re: [BUG] bonded interfaces drop bpdu (stp) frames

2018-07-12 Thread Michal Soltys

On 07/12/2018 04:51 PM, Jay Vosburgh wrote:

Mahesh Bandewar (महेश बंडेवार) wrote:


On Wed, Jul 11, 2018 at 3:23 PM, Michal Soltys  wrote:


Hi,

As weird as that sounds, this is what I observed today after bumping
the kernel version. I have a setup where 2 bonds are attached to a Linux
bridge and are physically connected to two switches doing MSTP (and
the Linux bridge is just passing them).

Initially I suspected some changes related to the bridge code - but a
quick peek at the code showed nothing suspicious - and the part of it
that explicitly passes STP frames if STP is not enabled has seen few
changes (e.g. per-port group_fwd_mask added recently). Furthermore - if
regular non-bonded interfaces are attached, everything works fine.

Just to be sure I detached the bond (802.3ad mode) and checked it with
a simple tcpdump (ether proto \\stp) - and indeed no hello packets were
there (while they were present just fine on the active enslaved interface,
or on the bond device in earlier kernels).

If time permits I'll bisect tomorrow to pinpoint the commit, but from
today's quick test - 4.9.x is working fine, while 4.16.16 (tested on
Debian) and 4.17.3 (tested on Arch Linux) are failing.

Unless this is already a known issue (or you have any suggestions what
could be responsible).


I believe these are link-local multicast messages and some time back a
change went in to not pass those frames to the bonding master. This
could be the side effect of that.


Mahesh, I suspect you're thinking of:

commit b89f04c61efe3b7756434d693b9203cc0cce002e
Author: Chonggang Li 
Date:   Sun Apr 16 12:02:18 2017 -0700

 bonding: deliver link-local packets with skb->dev set to link that packets 
arrived on

Michal, are you able to revert this patch and test?

-J

---
-Jay Vosburgh, jay.vosbu...@canonical.com




Just tested - yes, reverting that patch solves the issues.


Re: [BUG] bonded interfaces drop bpdu (stp) frames

2018-07-12 Thread Jay Vosburgh
Mahesh Bandewar (महेश बंडेवार) wrote:

>On Wed, Jul 11, 2018 at 3:23 PM, Michal Soltys  wrote:
>>
>> Hi,
>>
>> As weird as that sounds, this is what I observed today after bumping
>> the kernel version. I have a setup where 2 bonds are attached to a Linux
>> bridge and are physically connected to two switches doing MSTP (and
>> the Linux bridge is just passing them).
>>
>> Initially I suspected some changes related to the bridge code - but a
>> quick peek at the code showed nothing suspicious - and the part of it
>> that explicitly passes STP frames if STP is not enabled has seen few
>> changes (e.g. per-port group_fwd_mask added recently). Furthermore - if
>> regular non-bonded interfaces are attached, everything works fine.
>>
>> Just to be sure I detached the bond (802.3ad mode) and checked it with
>> a simple tcpdump (ether proto \\stp) - and indeed no hello packets were
>> there (while they were present just fine on the active enslaved interface,
>> or on the bond device in earlier kernels).
>>
>> If time permits I'll bisect tomorrow to pinpoint the commit, but from
>> today's quick test - 4.9.x is working fine, while 4.16.16 (tested on
>> Debian) and 4.17.3 (tested on Arch Linux) are failing.
>>
>> Unless this is already a known issue (or you have any suggestions what
>> could be responsible).
>>
>I believe these are link-local multicast messages, and some time back a
>change went in to not pass those frames to the bonding master. This
>could be a side effect of that.

Mahesh, I suspect you're thinking of:

commit b89f04c61efe3b7756434d693b9203cc0cce002e
Author: Chonggang Li 
Date:   Sun Apr 16 12:02:18 2017 -0700

    bonding: deliver link-local packets with skb->dev set to link that packets arrived on

Michal, are you able to revert this patch and test?

-J

---
-Jay Vosburgh, jay.vosbu...@canonical.com


Re: [BUG] bonded interfaces drop bpdu (stp) frames

2018-07-11 Thread महेश बंडेवार
On Wed, Jul 11, 2018 at 3:23 PM, Michal Soltys  wrote:
>
> Hi,
>
> As weird as that sounds, this is what I observed today after bumping
> the kernel version. I have a setup where 2 bonds are attached to a Linux
> bridge and are physically connected to two switches doing MSTP (and the
> Linux bridge is just passing the frames through).
>
> Initially I suspected some change in the bridge code - but a quick
> peek at it showed nothing suspicious - and the part of it that
> explicitly passes STP frames when STP is not enabled has seen few
> changes (e.g. the recently added per-port group_fwd_mask). Furthermore, if
> regular non-bonded interfaces are attached, everything works fine.
>
> Just to be sure I detached the bond (802.3ad mode) and checked it with
> a simple tcpdump (ether proto \\stp) - and indeed no hello packets were
> there (while they were present just fine on the active enslaved interface,
> or on the bond device in earlier kernels).
>
> If time permits I'll bisect tomorrow to pinpoint the commit, but from
> a quick test today: 4.9.x works fine, while 4.16.16 (tested on
> Debian) and 4.17.3 (tested on Arch Linux) fail.
>
> Is this already a known issue, or do you have any suggestions as to what
> could be responsible?
>
I believe these are link-local multicast messages, and some time back a
change went in to not pass those frames to the bonding master. This
could be a side effect of that.


[BUG] bonded interfaces drop bpdu (stp) frames

2018-07-11 Thread Michal Soltys
Hi,

As weird as that sounds, this is what I observed today after bumping
the kernel version. I have a setup where 2 bonds are attached to a Linux
bridge and are physically connected to two switches doing MSTP (and the
Linux bridge is just passing the frames through).

Initially I suspected some change in the bridge code - but a quick
peek at it showed nothing suspicious - and the part of it that
explicitly passes STP frames when STP is not enabled has seen few
changes (e.g. the recently added per-port group_fwd_mask). Furthermore, if
regular non-bonded interfaces are attached, everything works fine.

Just to be sure I detached the bond (802.3ad mode) and checked it with
a simple tcpdump (ether proto \\stp) - and indeed no hello packets were
there (while they were present just fine on the active enslaved interface,
or on the bond device in earlier kernels).

If time permits I'll bisect tomorrow to pinpoint the commit, but from
a quick test today: 4.9.x works fine, while 4.16.16 (tested on
Debian) and 4.17.3 (tested on Arch Linux) fail.

Is this already a known issue, or do you have any suggestions as to what
could be responsible?