Re: [PATCH v2 rfc 0/8] IGMP snooping for local traffic

2017-09-06 Thread Andrew Lunn
> At the very least we should probably move the skb->offload_fwd_mark
> setting down into the individual taggers since they should be in a
> better position to set it or not based on the switch device they are
> driving, this should address, on a per-switch basis whether 2) or 3)
> applies to a given switch.

Yes, that could be done.

> That being said, I have a feeling that the Marvell switches behave a
> tiny bit differently than others in that they do not flood broadcast by
> default in a given L2 domain.

It seems like the switch can be configured to flood broadcast. Just at
the moment, it is not.

> On b53/bcm_sf2 there is the ability to disable the reception of
> broadcast frames on the management/CPU port, and while there is the
> ability to configure which ports should be flooded in case of
> unicast/multicast lookup failures, I don't see anything for Broadcast,
> so I am assuming this will get forwarded by default. Will test with your
> patch set later on time permitting.

Please let me know the results of the tests.

If Marvell is being different, it also means that all other switches
today are duplicating broadcasts. We should fix that. In fact, i think
we need to fix this first, before these patches for IGMP snooping on
br0 goes in.

Andrew


Re: [PATCH v2 rfc 0/8] IGMP snooping for local traffic

2017-09-06 Thread Andrew Lunn
On Wed, Sep 06, 2017 at 11:44:47PM +, woojung@microchip.com wrote:
> > That being said, I have a feeling that the Marvell switches behave a
> > tiny bit differently than others in that they do not flood broadcast by
> > default in a given L2 domain.
> Florian,
> 
> Because some DSA switches from Marvell & Microchip can do IGMP snooping, 
> can we propose switch layer another flag what to do when HW support it?
 
Hi Woojung

I expect all the current DSA devices should be able to do IGMP
snooping, with some modifications.

Two things are required:

1) The .port_mdb_prepare, .port_mdb_add and .port_mdb_del ops, so that
mdb entries can be added. As you said, only Marvell and Microchip
support these, but i expect the other switch can do this, it just
needs implementing.

2) The switch needs to identify and forward IGMP packets to the host,
even when they would normally be blocked.

And for the implementation, i don't think it actually matters.  For
switches which don't implement the port_mdb operations, IGMP packets
will get forwarded to the software bridge. It will attempt to put in
an mdb, but the request will come back with EOPNOTSUPP. The switch
should continue to flood multicast out all ports. No harm done.

   Andrew




RE: [PATCH v2 rfc 0/8] IGMP snooping for local traffic

2017-09-06 Thread Woojung.Huh
> That being said, I have a feeling that the Marvell switches behave a
> tiny bit differently than others in that they do not flood broadcast by
> default in a given L2 domain.
Florian,

Because some DSA switches from Marvell & Microchip can do IGMP snooping, 
can we propose switch layer another flag what to do when HW support it?

- Woojung


Re: [PATCH v2 rfc 0/8] IGMP snooping for local traffic

2017-09-06 Thread Florian Fainelli
On 09/06/2017 09:42 AM, Andrew Lunn wrote:
>>> On the switch asics we work with, the driver has information if the
>>> packet was
>>> forwarded in hardware. This is per packet reason code telling why the
>>> CPU is seeing the packet.
>>> The driver can use this information to reset skb->offload_fwd_mark to
>>> allow software forward.
> 
>> I am not positive this is universally available across different
>> switch vendors.
> 
> It is not universally available. We cannot rely on it being available
> with switches supported by DSA.
> 
> We have a few choices:
> 
> 1) We assume anything the switch forwards to the CPU has also been
>sent out whatever ports of the switch it needs to. Set
>offload_fwd_mark.
> 
> 2) We assume anything the switch forwards to the CPU has not gone
>anywhere else, and the bridge needs to send it out whatever ports
>it thinks. Don't set offload_fwd_mark.
> 
> 3) We define some rules about what packets the switch should handle,
>and then do some deep packet inspection to decide if
>offload_fwd_mark should be set or not.
> 
> I don't see 3) being possible. We are dealing with a fixed silicon
> data path, not something which is fully programmable.
> 
> So it is down to 1) or 2). I've been assuming 1), but maybe we need to
> discuss that as well.

At the very least we should probably move the skb->offload_fwd_mark
setting down into the individual taggers since they should be in a
better position to set it or not based on the switch device they are
driving, this should address, on a per-switch basis whether 2) or 3)
applies to a given switch.

That being said, I have a feeling that the Marvell switches behave a
tiny bit differently than others in that they do not flood broadcast by
default in a given L2 domain.

On b53/bcm_sf2 there is the ability to disable the reception of
broadcast frames on the management/CPU port, and while there is the
ability to configure which ports should be flooded in case of
unicast/multicast lookup failures, I don't see anything for Broadcast,
so I am assuming this will get forwarded by default. Will test with your
patch set later on time permitting.
-- 
Florian


Re: [PATCH v2 rfc 0/8] IGMP snooping for local traffic

2017-09-06 Thread Vivien Didelot
Hi Andrew,

Andrew Lunn  writes:

>> I don't understand why SWITCHDEV_OBJ_ID_PORT_MDB cannot be used. Isn't
>> setting the obj->orig_dev to the bridge device itself enough to
>> distinguish br0 from a switch port?
>
> I did consider this. But conceptually, it seems wrong.
> SWITCHDEV_OBJ_ID_PORT_MDB has always been about egress. I don't like
> adding a special case for ingress. Adding a new switchdev object
> avoids this special case.

SWITCHDEV_OBJ_ID_PORT_MDB means manipulating an MDB entry on a port.
When one want to add an MDB entry to br0, SWITCHDEV_OBJ_ID_PORT_MDB is
used. If the device is br->dev, the lower devices (bridged ports) get
recursively notified and switchdev users can program themselves
accordingly. In the case of DSA, a slave port will program its
associated CPU port (port->cpu_dp) if orig_dev is a bridge.

This seems totally correct to me. I don't see any reason for adding and
maintaining a new switchdev object. What do switchdev guys think?

>> The main problem is that we will soon want support for multiple CPU
>> ports.
>
> We keep saying that, but it never actually happens. We should solve
> multiple CPU problems when we actually add support for multiple CPUs.
> Probably at that point, we need to push SWITCHDEV_OBJ_ID_PORT_HOST_MDB
> right down to a port, so that port can setup what needs setting up so
> its frames goes to its CPU port.

This is not correct. I am currently making this easier, i.e. the
dsa_master patchset I sent before net-next closed.

I do understand your point however and even if this multi-CPU feature
takes time to arrive, we can always find a proper design which makes it
easy. Assuming that each port has its dedicated CPU port is the correct
concept to follow.

>> So adding the MDB entry to all CPU ports as you did in a patch 5/8 in
>> dsa_switch_host_mdb_*() is not correct because you can have CPU port
>> sw0p0 being a member of br0 and CPU port sw0p10 being a member of br1.
>
> Be careful, you are making assumptions about mapping cpu ports to
> bridges. That is not been defined yet.

I am not making any assumptions, but you did because you assume that all
CPU ports will be part of the same bridge.

We need to handle things at the DSA core level the following way:

If one programs the bridge device itself, a switchdev object is
notified, and each DSA slave devices member of the bridge will call the
correct dsa_port_* function (agnostic to the port type) with its cpu_dp.

This covers cleanly all cases. You also don't need any new DSA notifier.
You only need your 6/8 patch. To acheive that, there is two ways:

1) either we keep SWITCHDEV_OBJ_ID_PORT_MDB for the bridge device itself
and the slave devices are notified accordingly with a correct orig_dev:

static int dsa_slave_port_obj_add(struct net_device *dev,
  const struct switchdev_obj *obj,
  struct switchdev_trans *trans)
{
struct dsa_slave_priv *p = netdev_priv(dev);
struct dsa_port *port = p->dp;

/* Program the CPU port if the target is the bridge itself */
if (obj->orig_dev == port->br)
port = port->cpu_dp;

switch (obj->id) {
case SWITCHDEV_OBJ_ID_PORT_MDB:
return dsa_port_mdb_add(port, obj, trans);
...
}
}

2) or you introduce a switchdev object which, by definition, targets the
CPU ports of our bridge members:

static int dsa_slave_port_obj_add(struct net_device *dev,
  const struct switchdev_obj *obj,
  struct switchdev_trans *trans)
{
struct dsa_slave_priv *p = netdev_priv(dev);
struct dsa_port *port = p->dp;

switch (obj->id) {
case SWITCHDEV_OBJ_ID_PORT_HOST_MDB:
port = port->cpu_dp;
/* fall through... */
case SWITCHDEV_OBJ_ID_PORT_MDB:
return dsa_port_mdb_add(port, obj, trans);
...
}
}

You basically just need that and your 6/8 patch for the DSA part.


Thanks,

Vivien


Re: [PATCH v2 rfc 0/8] IGMP snooping for local traffic

2017-09-06 Thread Andrew Lunn
On Wed, Sep 06, 2017 at 11:25:26AM -0400, Vivien Didelot wrote:
> Hi Andrew, Nikolay,
> 
> Andrew Lunn  writes:
> 
> > Then starts the work passing down to the hardware that the host has
> > joined/left a group. The existing switchdev mdb object cannot be used,
> > since the semantics are different. The existing
> > SWITCHDEV_OBJ_ID_PORT_MDB is used to indicate a specific multicast
> > group should be forwarded out that port of the switch. However here we
> > require the exact opposite. We want multicast frames for the group
> > received on the port to the forwarded to the host. Hence add a new
> > object SWITCHDEV_OBJ_ID_HOST_MDB, a multicast database entry to
> > forward to the host. This new object is then propagated through the
> > DSA layers. No DSA driver changes should be needed, this should just
> > work...
> 
> I'm not sure if you already explained that, if so, sorry in advance.
> 
> I don't understand why SWITCHDEV_OBJ_ID_PORT_MDB cannot be used. Isn't
> setting the obj->orig_dev to the bridge device itself enough to
> distinguish br0 from a switch port?

Hi Vivien

I did consider this. But conceptually, it seems wrong.
SWITCHDEV_OBJ_ID_PORT_MDB has always been about egress. I don't like
adding a special case for ingress. Adding a new switchdev object
avoids this special case.

> static int dsa_slave_port_obj_add(struct net_device *dev,
> const struct switchdev_obj *obj,
> struct switchdev_trans *trans)
> {
> struct dsa_slave_priv *p = netdev_priv(dev);
> struct dsa_port *port = p->dp;
> 
> /* Is the target port the bridge device itself? */
> if (obj->orig_dev == port->br)
> port = port->cpu_dp;
> 
> return dsa_port_mdb_add(port, obj, trans);
> }
> 
> The main problem is that we will soon want support for multiple CPU
> ports.

We keep saying that, but it never actually happens. We should solve
multiple CPU problems when we actually add support for multiple CPUs.
Probably at that point, we need to push SWITCHDEV_OBJ_ID_PORT_HOST_MDB
right down to a port, so that port can setup what needs setting up so
its frames goes to its CPU port.

> So adding the MDB entry to all CPU ports as you did in a patch 5/8 in
> dsa_switch_host_mdb_*() is not correct because you can have CPU port
> sw0p0 being a member of br0 and CPU port sw0p10 being a member of br1.

Be careful, you are making assumptions about mapping cpu ports to
bridges. That is not been defined yet.

 Andrew


Re: [PATCH v2 rfc 0/8] IGMP snooping for local traffic

2017-09-06 Thread Andrew Lunn
> >On the switch asics we work with, the driver has information if the
> >packet was
> >forwarded in hardware. This is per packet reason code telling why the
> >CPU is seeing the packet.
> >The driver can use this information to reset skb->offload_fwd_mark to
> >allow software forward.

> I am not positive this is universally available across different
> switch vendors.

It is not universally available. We cannot rely on it being available
with switches supported by DSA.

We have a few choices:

1) We assume anything the switch forwards to the CPU has also been
   sent out whatever ports of the switch it needs to. Set
   offload_fwd_mark.

2) We assume anything the switch forwards to the CPU has not gone
   anywhere else, and the bridge needs to send it out whatever ports
   it thinks. Don't set offload_fwd_mark.

3) We define some rules about what packets the switch should handle,
   and then do some deep packet inspection to decide if
   offload_fwd_mark should be set or not.

I don't see 3) being possible. We are dealing with a fixed silicon
data path, not something which is fully programmable.

So it is down to 1) or 2). I've been assuming 1), but maybe we need to
discuss that as well.

Andrew


Re: [PATCH v2 rfc 0/8] IGMP snooping for local traffic

2017-09-06 Thread Florian Fainelli
On September 6, 2017 8:54:33 AM PDT, Roopa Prabhu  
wrote:
>On Tue, Sep 5, 2017 at 5:47 PM, Andrew Lunn  wrote:
>>> The third and last issue will be explained in a followup email.
>>
>> Hi DSA hackers
>>
>> So there is the third issue. It affects just DSA, but it possible
>> affects all DSA drivers.
>>
>> This patchset broken broadcast with the Marvell drivers. It could
>> break broadcast on others drivers as well.
>>
>> What i found is that the Marvell chips don't flood broadcast frames
>> between bridged ports. What appears to happen is there is a fdb miss,
>> so it gets forwarded to the CPU port for the host to deal with. The
>> software bridge when floods it out all ports of the bridge.
>>
>> But the set offload_fwd_mark patch changes this. The software bridge
>> now assumes the hardware has already flooded broadcast out all ports
>> of the switch as needed. So it does not do any flooding itself. As a
>> result, on Marvell devices, broadcast packets don't get flooded at
>> all.
>>
>> The issue can be fixed. I just need to add an mdb entry for the
>> broadcast address to each port of the bridge in the switch, and the
>> CPU port.  But i don't know at what level to do this.
>>
>> Should this be done at the DSA level, or at the driver level?  Do any
>> chips do broadcast flooding in hardware already? Hence they currently
>> see broadcast duplication? If i add a broadcast mdb at the DSA level,
>> and the chip is already hard wired to flooding broadcast, is it going
>> to double flood?
>>
>
>On the switch asics we work with, the driver has information if the
>packet was
>forwarded in hardware. This is per packet reason code telling why the
>CPU is seeing the packet.
>The driver can use this information to reset skb->offload_fwd_mark to
>allow software forward.

I am not positive this is universally available across different switch 
vendors. In Broadcom tag (net/dsa/tag_brcm.c) the reason code definitely tells 
you that but it also largely depends on whether you have configured SW managed 
forwarding or not and that translates in having either the HW do all the 
address learning itself or having SW do it which is less desirable since you 
end-up with a possibility huge FDB of 4k entries to manage in SW.


-- 
Florian


Re: [PATCH v2 rfc 0/8] IGMP snooping for local traffic

2017-09-06 Thread Roopa Prabhu
On Tue, Sep 5, 2017 at 5:47 PM, Andrew Lunn  wrote:
>> The third and last issue will be explained in a followup email.
>
> Hi DSA hackers
>
> So there is the third issue. It affects just DSA, but it possible
> affects all DSA drivers.
>
> This patchset broken broadcast with the Marvell drivers. It could
> break broadcast on others drivers as well.
>
> What i found is that the Marvell chips don't flood broadcast frames
> between bridged ports. What appears to happen is there is a fdb miss,
> so it gets forwarded to the CPU port for the host to deal with. The
> software bridge when floods it out all ports of the bridge.
>
> But the set offload_fwd_mark patch changes this. The software bridge
> now assumes the hardware has already flooded broadcast out all ports
> of the switch as needed. So it does not do any flooding itself. As a
> result, on Marvell devices, broadcast packets don't get flooded at
> all.
>
> The issue can be fixed. I just need to add an mdb entry for the
> broadcast address to each port of the bridge in the switch, and the
> CPU port.  But i don't know at what level to do this.
>
> Should this be done at the DSA level, or at the driver level?  Do any
> chips do broadcast flooding in hardware already? Hence they currently
> see broadcast duplication? If i add a broadcast mdb at the DSA level,
> and the chip is already hard wired to flooding broadcast, is it going
> to double flood?
>

On the switch asics we work with, the driver has information if the packet was
forwarded in hardware. This is per packet reason code telling why the
CPU is seeing the packet.
The driver can use this information to reset skb->offload_fwd_mark to
allow software forward.


RE: [PATCH v2 rfc 0/8] IGMP snooping for local traffic

2017-09-06 Thread Woojung.Huh
Andrew,

> What i found is that the Marvell chips don't flood broadcast frames
> between bridged ports. What appears to happen is there is a fdb miss,
> so it gets forwarded to the CPU port for the host to deal with. The
> software bridge when floods it out all ports of the bridge.
Is this IGMP snooping enabled mode in Marvell chip?




Re: [PATCH v2 rfc 0/8] IGMP snooping for local traffic

2017-09-06 Thread Vivien Didelot
Hi Andrew, Nikolay,

Andrew Lunn  writes:

> Then starts the work passing down to the hardware that the host has
> joined/left a group. The existing switchdev mdb object cannot be used,
> since the semantics are different. The existing
> SWITCHDEV_OBJ_ID_PORT_MDB is used to indicate a specific multicast
> group should be forwarded out that port of the switch. However here we
> require the exact opposite. We want multicast frames for the group
> received on the port to the forwarded to the host. Hence add a new
> object SWITCHDEV_OBJ_ID_HOST_MDB, a multicast database entry to
> forward to the host. This new object is then propagated through the
> DSA layers. No DSA driver changes should be needed, this should just
> work...

I'm not sure if you already explained that, if so, sorry in advance.

I don't understand why SWITCHDEV_OBJ_ID_PORT_MDB cannot be used. Isn't
setting the obj->orig_dev to the bridge device itself enough to
distinguish br0 from a switch port?

This way, the only change necessary in net/dsa/slave.c is something
like (abbreviated):

static int dsa_slave_port_obj_add(struct net_device *dev,
const struct switchdev_obj *obj,
struct switchdev_trans *trans)
{
struct dsa_slave_priv *p = netdev_priv(dev);
struct dsa_port *port = p->dp;

/* Is the target port the bridge device itself? */
if (obj->orig_dev == port->br)
port = port->cpu_dp;

return dsa_port_mdb_add(port, obj, trans);
}

The main problem is that we will soon want support for multiple CPU
ports. This means that each DSA slave will have its dedicated CPU port,
instead of having only one for the whole switch tree.

So adding the MDB entry to all CPU ports as you did in a patch 5/8 in
dsa_switch_host_mdb_*() is not correct because you can have CPU port
sw0p0 being a member of br0 and CPU port sw0p10 being a member of br1.

So is it correct to simply notify SWITCHDEV_OBJ_ID_PORT_MDB with
orig_dev = br->dev so that its members get recursively notified?

Even if SWITCHDEV_OBJ_ID_HOST_MDB is necessary, we need to handle it the
way I described, otherwise we don't have a correct mapping between a
slave port and its CPU port that we need to configure.


Thanks,

Vivien


Re: [PATCH v2 rfc 0/8] IGMP snooping for local traffic

2017-09-06 Thread Stephen Hemminger
On Wed, 6 Sep 2017 02:47:03 +0200
Andrew Lunn  wrote:

> > The third and last issue will be explained in a followup email.  
> 
> Hi DSA hackers
> 
> So there is the third issue. It affects just DSA, but it possible
> affects all DSA drivers.
> 
> This patchset broken broadcast with the Marvell drivers. It could
> break broadcast on others drivers as well.
> 
> What i found is that the Marvell chips don't flood broadcast frames
> between bridged ports. What appears to happen is there is a fdb miss,
> so it gets forwarded to the CPU port for the host to deal with. The
> software bridge when floods it out all ports of the bridge.
> 

That sounds like a good feature. There environments where you want
to disable broadcast between certain ports. It lets the bridge get
at the traffic for firewall filtering.


Re: [PATCH v2 rfc 0/8] IGMP snooping for local traffic

2017-09-06 Thread Andrew Lunn
On Wed, Sep 06, 2017 at 04:46:51PM +0200, Matthias May wrote:
> 
> Hi Andrew
> We are using the 88E6321.
> In our setup we are using openvswitch and not a bridge, however the problem 
> you describe seems to be the same.
> 
> We had to configure the switch to flood unknown multicast (Egress Floods = 
> 0x3, bits 3:2, offset 0x4 in port control)
> and
> unset FloodBC (FloodBC = 0x0, bit 12, offset 0x5 in global 2) which defines 
> if a broadcast should be considered as
> multicast for the above config.

Hi Matthias

I might look at this.

But architecturally it seems better to add an mdb entry for the
broadcast address. I hope that work across all switches, where as what
you suggests only works for Marvell devices.

Andrew


Re: [PATCH v2 rfc 0/8] IGMP snooping for local traffic

2017-09-06 Thread Matthias May
On 06/09/17 02:47, Andrew Lunn wrote:
>> The third and last issue will be explained in a followup email.
> 
> Hi DSA hackers
> 
> So there is the third issue. It affects just DSA, but it possible
> affects all DSA drivers.
> 
> This patchset broken broadcast with the Marvell drivers. It could
> break broadcast on others drivers as well.
> 
> What i found is that the Marvell chips don't flood broadcast frames
> between bridged ports. What appears to happen is there is a fdb miss,
> so it gets forwarded to the CPU port for the host to deal with. The
> software bridge when floods it out all ports of the bridge.
> 
> But the set offload_fwd_mark patch changes this. The software bridge
> now assumes the hardware has already flooded broadcast out all ports
> of the switch as needed. So it does not do any flooding itself. As a
> result, on Marvell devices, broadcast packets don't get flooded at
> all.
> 
> The issue can be fixed. I just need to add an mdb entry for the
> broadcast address to each port of the bridge in the switch, and the
> CPU port.  But i don't know at what level to do this.
> 
> Should this be done at the DSA level, or at the driver level?  Do any
> chips do broadcast flooding in hardware already? Hence they currently
> see broadcast duplication? If i add a broadcast mdb at the DSA level,
> and the chip is already hard wired to flooding broadcast, is it going
> to double flood?
> 
>   Andrew
> 

Hi Andrew
We are using the 88E6321.
In our setup we are using openvswitch and not a bridge, however the problem you 
describe seems to be the same.

We had to configure the switch to flood unknown multicast (Egress Floods = 0x3, 
bits 3:2, offset 0x4 in port control)
and
unset FloodBC (FloodBC = 0x0, bit 12, offset 0x5 in global 2) which defines if 
a broadcast should be considered as
multicast for the above config.

Regarding the looping problem:
Since OVS isn't aware of the fdb of the switch, we configure the ports 
representing the switch ports (in ovs) as
"protected" which prevents looping them back between (even when flooding) see 
[1].
I'm not sure if the bridge has a similar feature.

BR
Matthias

[1]http://openvswitch.org/support/dist-docs/ovs-vswitchd.conf.db.5.txt ctrl-f: 
"protected: boolean"


Re: [PATCH v2 rfc 0/8] IGMP snooping for local traffic

2017-09-06 Thread Vivien Didelot
Hi Andrew,

Andrew Lunn  writes:

> Getting the frames to the bridge as requested turned up an issue or
> three. The offload_fwd_mark is not being set by DSA, so the bridge
> floods the received frames back to the switch ports, resulting in
> duplication since the hardware has already flooded the packet. Fixing
> that turned up an issue with the meaning of
> SWITCHDEV_ATTR_ID_PORT_PARENT_ID in DSA. A DSA fabric of three
> switches needs to look to the software bridge as a single
> switch. Otherwise the offload_fwd_mark does not work, and we get
> duplication on the non-ingress switch. But each switch returned a
> different value. And they were not unique.

[...]

> Is the SWITCHDEV_ATTR_ID_PORT_PARENT_ID change acceptable?

So here we need to return the switch fabric (tree) ID instead of a
single switch chip ID. I think we are not supposed to change it, but
there's definitly something wrong with them and we must fix it.

The same issue happens with the physical port IDs, where the net devices
of two disjoint trees on a system will have the same phys_*_id
attributes, because we do not include the tree ID in them.

(not sure if this affects your patchset though.)


Thanks,

Vivien


Re: [PATCH v2 rfc 0/8] IGMP snooping for local traffic

2017-09-06 Thread John Crispin



On 06/09/17 02:47, Andrew Lunn wrote:

Should this be done at the DSA level, or at the driver level?  Do any
chips do broadcast flooding in hardware already? Hence they currently
see broadcast duplication? If i add a broadcast mdb at the DSA level,
and the chip is already hard wired to flooding broadcast, is it going
to double flood?


Hi Andrew,

MT7530 and QCA8K only have a  "unknown mac fwd to port X" feature. both 
use the same HW table for FDB and MDB tables. so this should ideally be 
fixed on DSA level rather than fixing up those 2 drivers.


John


Re: [PATCH v2 rfc 0/8] IGMP snooping for local traffic

2017-09-06 Thread Andrew Lunn
> > What i found is that the Marvell chips don't flood broadcast frames
> > between bridged ports. What appears to happen is there is a fdb miss,
> > so it gets forwarded to the CPU port for the host to deal with. The
> > software bridge when floods it out all ports of the bridge.
> 
> Do you have this issue on a single switch?

Yes. I have two boards with a single switch in my test setup. They
fail the broadcast tests in the same way as boards with multiple
switches.

> It should be done at the DSA level. DSA core must be the single entry
> point to handle all the switch logic, calling into (dumb) DSA drivers.
> If it is buried into a specific driver, we'll likely lose track of the
> problem and make it harder to maintain.

I think we need to wait and see what driver writers report their chips
are doing. If this is just a Marvell issue, we probably should fix it
in the Marvell driver only.

If i remember correctly, Woojung said he was seeing duplication of
broadcasts. So it could be the Microchip hardware is forwarding
broadcast in the hardware.

  Andrew


Re: [PATCH v2 rfc 0/8] IGMP snooping for local traffic

2017-09-06 Thread Vivien Didelot
Hi Andrew,

Andrew Lunn  writes:

> So there is the third issue. It affects just DSA, but it possible
> affects all DSA drivers.
>
> This patchset broken broadcast with the Marvell drivers. It could
> break broadcast on others drivers as well.
>
> What i found is that the Marvell chips don't flood broadcast frames
> between bridged ports. What appears to happen is there is a fdb miss,
> so it gets forwarded to the CPU port for the host to deal with. The
> software bridge when floods it out all ports of the bridge.

Do you have this issue on a single switch?

I do expect FDB (not MDB) miss on a multi-chip fabric, not on a single
chip though.

> But the set offload_fwd_mark patch changes this. The software bridge
> now assumes the hardware has already flooded broadcast out all ports
> of the switch as needed. So it does not do any flooding itself. As a
> result, on Marvell devices, broadcast packets don't get flooded at
> all.
>
> The issue can be fixed. I just need to add an mdb entry for the
> broadcast address to each port of the bridge in the switch, and the
> CPU port.  But i don't know at what level to do this.
>
> Should this be done at the DSA level, or at the driver level?  Do any
> chips do broadcast flooding in hardware already? Hence they currently
> see broadcast duplication? If i add a broadcast mdb at the DSA level,
> and the chip is already hard wired to flooding broadcast, is it going
> to double flood?

It should be done at the DSA level. DSA core must be the single entry
point to handle all the switch logic, calling into (dumb) DSA drivers.
If it is buried into a specific driver, we'll likely lose track of the
problem and make it harder to maintain.


Thanks!

Vivien


Re: [PATCH v2 rfc 0/8] IGMP snooping for local traffic

2017-09-06 Thread Nikolay Aleksandrov
On 06/09/17 03:11, Stephen Hemminger wrote:
> On Wed,  6 Sep 2017 01:35:02 +0200
> Andrew Lunn  wrote:
> 
>> After the very useful feedback from Nikolay, i threw away what i had,
>> and started again. To recap:
>>
>> The linux bridge supports IGMP snooping. It will listen to IGMP
>> reports on bridge ports and keep track of which groups have been
>> joined on an interface. It will then forward multicast based on this
>> group membership.
>>
>> When the bridge adds or removed groups from an interface, it uses
>> switchdev to request the hardware add an mdb to a port, so the
>> hardware can perform the selective forwarding between ports.
>>
>> What is not covered by the current bridge code, is IGMP joins/leaves
>> from the host on the brX interface. These are not reported via
>> switchdev so that hardware knows the local host is interested in the
>> multicast frames.
>>
>> Luckily, the bridge does track joins/leaves on the brX interface. The
>> code is obfusticated, which is why i missed it with my first attempt.
>> So the first patch tries to remove this obfustication. Currently,
>> there is no notifications sent when the bridge interface joins a
>> group. The second patch adds them. bridge monitor then shows
>> joins/leaves in the same way as for other ports of the bridge.
>>
>> Then starts the work passing down to the hardware that the host has
>> joined/left a group. The existing switchdev mdb object cannot be used,
>> since the semantics are different. The existing
>> SWITCHDEV_OBJ_ID_PORT_MDB is used to indicate a specific multicast
>> group should be forwarded out that port of the switch. However here we
>> require the exact opposite. We want multicast frames for the group
>> received on the port to the forwarded to the host. Hence add a new
>> object SWITCHDEV_OBJ_ID_HOST_MDB, a multicast database entry to
>> forward to the host. This new object is then propagated through the
>> DSA layers. No DSA driver changes should be needed, this should just
>> work...
>>
>> Getting the frames to the bridge as requested turned up an issue or
>> three. The offload_fwd_mark is not being set by DSA, so the bridge
>> floods the received frames back to the switch ports, resulting in
>> duplication since the hardware has already flooded the packet. Fixing
>> that turned up an issue with the meaning of
>> SWITCHDEV_ATTR_ID_PORT_PARENT_ID in DSA. A DSA fabric of three
>> switches needs to look to the software bridge as a single
>> switch. Otherwise the offload_fwd_mark does not work, and we get
>> duplication on the non-ingress switch. But each switch returned a
>> different value. And they were not unique.
>>
>> The third and last issue will be explained in a followup email.
>>
>> Open questions:
>>
>> Is sending notifications going to break userspace?
>> Is this new switchdev object O.K. for the few non-DSA switches that exist?
>> Is the SWITCHDEV_ATTR_ID_PORT_PARENT_ID change acceptable?
>>
>>Andrew
>>
>> Andrew Lunn (8):
>>   net: bridge: Rename mglist to host_joined
>>   net: bridge: Send notification when host join/leaves a group
>>   net: bridge: Add/del switchdev object on host join/leave
>>   net: dsa: slave: Handle switchdev host mdb add/del
>>   net: dsa: switch: handle host mdb add/remove
>>   net: dsa: switch: Don't add CPU port to an mdb by default
>>   net: dsa: set offload_fwd_mark on received packets
>>   net: dsa: Fix SWITCHDEV_ATTR_ID_PORT_PARENT_ID
>>
>>  include/net/switchdev.h   |  1 +
>>  net/bridge/br_input.c |  2 +-
>>  net/bridge/br_mdb.c   | 50 +---
>>  net/bridge/br_multicast.c | 18 +++-
>>  net/bridge/br_private.h   |  2 +-
>>  net/dsa/dsa.c |  1 +
>>  net/dsa/dsa_priv.h|  7 +
>>  net/dsa/port.c| 26 +
>>  net/dsa/slave.c   | 16 ---
>>  net/dsa/switch.c  | 72 
>> +++
>>  net/switchdev/switchdev.c |  2 ++
>>  11 files changed, 168 insertions(+), 29 deletions(-)
>>
> 
> This looks much cleaner. I don't have DSA hardware or infrastructure to look 
> deeper.
> 

+1

This version looks great!



Re: [PATCH v2 rfc 0/8] IGMP snooping for local traffic

2017-09-05 Thread Andrew Lunn
> The third and last issue will be explained in a followup email.

Hi DSA hackers

So there is the third issue. It affects just DSA, but it possible
affects all DSA drivers.

This patchset broken broadcast with the Marvell drivers. It could
break broadcast on others drivers as well.

What i found is that the Marvell chips don't flood broadcast frames
between bridged ports. What appears to happen is there is a fdb miss,
so it gets forwarded to the CPU port for the host to deal with. The
software bridge when floods it out all ports of the bridge.

But the set offload_fwd_mark patch changes this. The software bridge
now assumes the hardware has already flooded broadcast out all ports
of the switch as needed. So it does not do any flooding itself. As a
result, on Marvell devices, broadcast packets don't get flooded at
all.

The issue can be fixed. I just need to add an mdb entry for the
broadcast address to each port of the bridge in the switch, and the
CPU port.  But i don't know at what level to do this.

Should this be done at the DSA level, or at the driver level?  Do any
chips do broadcast flooding in hardware already? Hence they currently
see broadcast duplication? If i add a broadcast mdb at the DSA level,
and the chip is already hard wired to flooding broadcast, is it going
to double flood?

Andrew



Re: [PATCH v2 rfc 0/8] IGMP snooping for local traffic

2017-09-05 Thread Stephen Hemminger
On Wed,  6 Sep 2017 01:35:02 +0200
Andrew Lunn  wrote:

> After the very useful feedback from Nikolay, i threw away what i had,
> and started again. To recap:
> 
> The linux bridge supports IGMP snooping. It will listen to IGMP
> reports on bridge ports and keep track of which groups have been
> joined on an interface. It will then forward multicast based on this
> group membership.
> 
> When the bridge adds or removed groups from an interface, it uses
> switchdev to request the hardware add an mdb to a port, so the
> hardware can perform the selective forwarding between ports.
> 
> What is not covered by the current bridge code, is IGMP joins/leaves
> from the host on the brX interface. These are not reported via
> switchdev so that hardware knows the local host is interested in the
> multicast frames.
> 
> Luckily, the bridge does track joins/leaves on the brX interface. The
> code is obfusticated, which is why i missed it with my first attempt.
> So the first patch tries to remove this obfustication. Currently,
> there is no notifications sent when the bridge interface joins a
> group. The second patch adds them. bridge monitor then shows
> joins/leaves in the same way as for other ports of the bridge.
> 
> Then starts the work passing down to the hardware that the host has
> joined/left a group. The existing switchdev mdb object cannot be used,
> since the semantics are different. The existing
> SWITCHDEV_OBJ_ID_PORT_MDB is used to indicate a specific multicast
> group should be forwarded out that port of the switch. However here we
> require the exact opposite. We want multicast frames for the group
> received on the port to the forwarded to the host. Hence add a new
> object SWITCHDEV_OBJ_ID_HOST_MDB, a multicast database entry to
> forward to the host. This new object is then propagated through the
> DSA layers. No DSA driver changes should be needed, this should just
> work...
> 
> Getting the frames to the bridge as requested turned up an issue or
> three. The offload_fwd_mark is not being set by DSA, so the bridge
> floods the received frames back to the switch ports, resulting in
> duplication since the hardware has already flooded the packet. Fixing
> that turned up an issue with the meaning of
> SWITCHDEV_ATTR_ID_PORT_PARENT_ID in DSA. A DSA fabric of three
> switches needs to look to the software bridge as a single
> switch. Otherwise the offload_fwd_mark does not work, and we get
> duplication on the non-ingress switch. But each switch returned a
> different value. And they were not unique.
> 
> The third and last issue will be explained in a followup email.
> 
> Open questions:
> 
> Is sending notifications going to break userspace?
> Is this new switchdev object O.K. for the few non-DSA switches that exist?
> Is the SWITCHDEV_ATTR_ID_PORT_PARENT_ID change acceptable?
> 
>Andrew
> 
> Andrew Lunn (8):
>   net: bridge: Rename mglist to host_joined
>   net: bridge: Send notification when host join/leaves a group
>   net: bridge: Add/del switchdev object on host join/leave
>   net: dsa: slave: Handle switchdev host mdb add/del
>   net: dsa: switch: handle host mdb add/remove
>   net: dsa: switch: Don't add CPU port to an mdb by default
>   net: dsa: set offload_fwd_mark on received packets
>   net: dsa: Fix SWITCHDEV_ATTR_ID_PORT_PARENT_ID
> 
>  include/net/switchdev.h   |  1 +
>  net/bridge/br_input.c |  2 +-
>  net/bridge/br_mdb.c   | 50 +---
>  net/bridge/br_multicast.c | 18 +++-
>  net/bridge/br_private.h   |  2 +-
>  net/dsa/dsa.c |  1 +
>  net/dsa/dsa_priv.h|  7 +
>  net/dsa/port.c| 26 +
>  net/dsa/slave.c   | 16 ---
>  net/dsa/switch.c  | 72 
> +++
>  net/switchdev/switchdev.c |  2 ++
>  11 files changed, 168 insertions(+), 29 deletions(-)
> 

This looks much cleaner. I don't have DSA hardware or infrastructure to look 
deeper.


[PATCH v2 rfc 0/8] IGMP snooping for local traffic

2017-09-05 Thread Andrew Lunn
After the very useful feedback from Nikolay, i threw away what i had,
and started again. To recap:

The linux bridge supports IGMP snooping. It will listen to IGMP
reports on bridge ports and keep track of which groups have been
joined on an interface. It will then forward multicast based on this
group membership.

When the bridge adds or removed groups from an interface, it uses
switchdev to request the hardware add an mdb to a port, so the
hardware can perform the selective forwarding between ports.

What is not covered by the current bridge code, is IGMP joins/leaves
from the host on the brX interface. These are not reported via
switchdev so that hardware knows the local host is interested in the
multicast frames.

Luckily, the bridge does track joins/leaves on the brX interface. The
code is obfusticated, which is why i missed it with my first attempt.
So the first patch tries to remove this obfustication. Currently,
there is no notifications sent when the bridge interface joins a
group. The second patch adds them. bridge monitor then shows
joins/leaves in the same way as for other ports of the bridge.

Then starts the work passing down to the hardware that the host has
joined/left a group. The existing switchdev mdb object cannot be used,
since the semantics are different. The existing
SWITCHDEV_OBJ_ID_PORT_MDB is used to indicate a specific multicast
group should be forwarded out that port of the switch. However here we
require the exact opposite. We want multicast frames for the group
received on the port to the forwarded to the host. Hence add a new
object SWITCHDEV_OBJ_ID_HOST_MDB, a multicast database entry to
forward to the host. This new object is then propagated through the
DSA layers. No DSA driver changes should be needed, this should just
work...

Getting the frames to the bridge as requested turned up an issue or
three. The offload_fwd_mark is not being set by DSA, so the bridge
floods the received frames back to the switch ports, resulting in
duplication since the hardware has already flooded the packet. Fixing
that turned up an issue with the meaning of
SWITCHDEV_ATTR_ID_PORT_PARENT_ID in DSA. A DSA fabric of three
switches needs to look to the software bridge as a single
switch. Otherwise the offload_fwd_mark does not work, and we get
duplication on the non-ingress switch. But each switch returned a
different value. And they were not unique.

The third and last issue will be explained in a followup email.

Open questions:

Is sending notifications going to break userspace?
Is this new switchdev object O.K. for the few non-DSA switches that exist?
Is the SWITCHDEV_ATTR_ID_PORT_PARENT_ID change acceptable?

   Andrew

Andrew Lunn (8):
  net: bridge: Rename mglist to host_joined
  net: bridge: Send notification when host join/leaves a group
  net: bridge: Add/del switchdev object on host join/leave
  net: dsa: slave: Handle switchdev host mdb add/del
  net: dsa: switch: handle host mdb add/remove
  net: dsa: switch: Don't add CPU port to an mdb by default
  net: dsa: set offload_fwd_mark on received packets
  net: dsa: Fix SWITCHDEV_ATTR_ID_PORT_PARENT_ID

 include/net/switchdev.h   |  1 +
 net/bridge/br_input.c |  2 +-
 net/bridge/br_mdb.c   | 50 +---
 net/bridge/br_multicast.c | 18 +++-
 net/bridge/br_private.h   |  2 +-
 net/dsa/dsa.c |  1 +
 net/dsa/dsa_priv.h|  7 +
 net/dsa/port.c| 26 +
 net/dsa/slave.c   | 16 ---
 net/dsa/switch.c  | 72 +++
 net/switchdev/switchdev.c |  2 ++
 11 files changed, 168 insertions(+), 29 deletions(-)

-- 
2.14.1