Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms

2019-09-13 Thread Robert Beckett
On Thu, 2019-09-12 at 10:41 -0700, Florian Fainelli wrote:
> [...]

Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms

2019-09-12 Thread Florian Fainelli
On 9/12/19 9:46 AM, Robert Beckett wrote:
> [...]

Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms

2019-09-12 Thread Robert Beckett
On Thu, 2019-09-12 at 09:25 -0700, Florian Fainelli wrote:
> [...]

Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms

2019-09-12 Thread Florian Fainelli
On 9/12/19 2:03 AM, Ido Schimmel wrote:
> On Wed, Sep 11, 2019 at 12:49:03PM +0100, Robert Beckett wrote:
>> On Wed, 2019-09-11 at 11:21 +0000, Ido Schimmel wrote:
>>> On Tue, Sep 10, 2019 at 09:49:46AM -0700, Florian Fainelli wrote:
>>>> +Ido, Jiri,
>>>>
>>>> On 9/10/19 8:41 AM, Robert Beckett wrote:
>>>>> This patch-set adds support for some features of the Marvell switch
>>>>> chips that can be used to handle packet storms.
>>>>>
>>>>> The rationale for this was a setup that requires the ability to
>>>>> receive traffic from one port, while a packet storm is occurring on
>>>>> another port (via an external switch with a deliberate loop). This
>>>>> is needed to ensure vital data delivery from a specific port, while
>>>>> mitigating any loops or DoS that a user may introduce on another
>>>>> port (can't guarantee sensible users).
>>>>
>>>> The use case is reasonable, but the implementation is not really. You
>>>> are using Device Tree, which is meant to describe hardware, as a
>>>> policy holder for setting up queue priorities and likewise for queue
>>>> scheduling.
>>>>
>>>> The tool that should be used for that purpose is tc and possibly an
>>>> appropriately offloaded queue scheduler in order to map the desired
>>>> scheduling class to what the hardware supports.
>>>>
>>>> Jiri, Ido, how do you guys support this with mlxsw?
>>>
>>> Hi Florian,
>>>
>>> Are you referring to policing traffic towards the CPU using a policer
>>> on the egress of the CPU port? At least that's what I understand from
>>> the description of patch 6 below.
>>>
>>> If so, mlxsw sets policers for different traffic types during its
>>> initialization sequence. These policers are not exposed to the user
>>> nor configurable. While the default settings are good for most users,
>>> we do want to allow users to change these and expose current settings.
>>>
>>> I agree that tc seems like the right choice, but the question is
>>> where are we going to install the filters?
>>>
>>
>> Before I go too far down the rabbit hole of tc traffic shaping, maybe
>> it would be good to explain in more detail the problem I am trying to
>> solve.
>>
>> We have a setup as follows:
>>
>> Marvell 88E6240 switch chip, accepting traffic from 4 ports. Port 1
>> (P1) is critical priority, no dropped packets allowed; all others can
>> be best effort.
>>
>> The CPU port of the switch chip is connected via PHY to the PHY of an
>> Intel i210 (igb driver).
>>
>> The i210 is connected via a PCIe switch to the imx6.
>>
>> When too many small packets were delivered to the CPU port (e.g.
>> during a broadcast flood) we saw dropped packets.
>>
>> The packets were being received by the i210 into its rx descriptor
>> buffer fine, but the CPU could not keep up with the load. We saw
>> rx_fifo_errors increasing rapidly and ksoftirqd at ~100% CPU.
>>
>> With this in mind, I am wondering whether any amount of tc traffic
>> shaping would help. Would tc shaping require that packet reception
>> manages to keep up before it can enact its policies? Does the
>> infrastructure have accelerator offload hooks to be able to apply it
>> via HW? I don't see how it would be able to inspect the packets to
>> apply filtering if they were dropped due to rx descriptor exhaustion.
>> (Please bear with me with the basic questions; I am not familiar with
>> this part of the stack.)
>>
>> Assuming that tc is still the way to go, after a brief look into the
>> man pages and the documentation at lartc.org, it seems like it would
>> need to use the ingress qdisc, with some sort of system to segregate
>> and prioritise based on ingress port. Is this possible?
> 
> Hi Robert,
> 
> As I see it, you have two problems here:
> 
> 1. Classification: Based on ingress port in your case
> 
> 2. Scheduling: How to schedule between the different transmission queues
> 
> Where the port from which the packets should egress is the CPU port,
> before they cross the PCI towards the imx6.
> 
> Both of these issues can be solved by tc. The main problem is that today
> we do not have a netdev to represent the CPU port and therefore can't
> use existing infra like tc. I believe we need to create one. Besides
> scheduling, we can also use it to permit/deny certain traffic from
> reaching the CPU and perform policing.

We do not necessarily have to create a CPU netdev; we can overlay netdev
operations onto the DSA master interface (fec in that case), and
whenever you configure the DSA master interface, we also call back into
the switch side for the CPU port. This is not necessarily the cleanest
way to do things, but that is how we support ethtool operations (and
some netdev operations incidentally), and it works.
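
For illustration only (interface name assumed: eth0 as the DSA master,
the i210 in this setup), that overlay is already visible in ethtool
statistics, where the switch CPU port's counters are appended to the
master's own, prefixed with the CPU port number:

    # Assumed name: eth0 is the DSA master. CPU port counters show up
    # with a "pNN_" prefix next to the master's own statistics.
    ethtool -S eth0 | grep '^ *p[0-9][0-9]_'
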
-- 
Florian


Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms

2019-09-12 Thread Andrew Lunn
> 2. Scheduling: How to schedule between the different transmission queues
> 
> Where the port from which the packets should egress is the CPU port,
> before they cross the PCI towards the imx6.

Hi Ido

This is DSA, so the switch is connected via Ethernet to the IMX6, not
PCI. Minor detail, but that really is the core of what makes DSA DSA.
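
A small illustration of that point (port names assumed): the user
ports hang off the Ethernet master in the interface topology, rather
than appearing as PCI devices:

    # DSA user ports are netdevs of kind "dsa", shown slaved to the
    # Ethernet master, e.g. "lan1@eth0" (names assumed).
    ip -br link show type dsa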

 Andrew


Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms

2019-09-12 Thread Ido Schimmel
On Thu, Sep 12, 2019 at 12:58:41AM +0200, Andrew Lunn wrote:
> So think about how you can model the Marvell switch capabilities
> using TC, and implement offload support for it.

+1 :)


Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms

2019-09-12 Thread Ido Schimmel
On Wed, Sep 11, 2019 at 12:49:03PM +0100, Robert Beckett wrote:
> On Wed, 2019-09-11 at 11:21 +0000, Ido Schimmel wrote:
> > On Tue, Sep 10, 2019 at 09:49:46AM -0700, Florian Fainelli wrote:
> > > +Ido, Jiri,
> > > 
> > > On 9/10/19 8:41 AM, Robert Beckett wrote:
> > > > This patch-set adds support for some features of the Marvell
> > > > switch chips that can be used to handle packet storms.
> > > > 
> > > > The rationale for this was a setup that requires the ability to
> > > > receive traffic from one port, while a packet storm is occurring
> > > > on another port (via an external switch with a deliberate loop).
> > > > This is needed to ensure vital data delivery from a specific
> > > > port, while mitigating any loops or DoS that a user may introduce
> > > > on another port (can't guarantee sensible users).
> > > 
> > > The use case is reasonable, but the implementation is not really.
> > > You are using Device Tree, which is meant to describe hardware, as
> > > a policy holder for setting up queue priorities and likewise for
> > > queue scheduling.
> > > 
> > > The tool that should be used for that purpose is tc and possibly an
> > > appropriately offloaded queue scheduler in order to map the desired
> > > scheduling class to what the hardware supports.
> > > 
> > > Jiri, Ido, how do you guys support this with mlxsw?
> > 
> > Hi Florian,
> > 
> > Are you referring to policing traffic towards the CPU using a policer
> > on the egress of the CPU port? At least that's what I understand from
> > the description of patch 6 below.
> > 
> > If so, mlxsw sets policers for different traffic types during its
> > initialization sequence. These policers are not exposed to the user
> > nor configurable. While the default settings are good for most users,
> > we do want to allow users to change these and expose current settings.
> > 
> > I agree that tc seems like the right choice, but the question is
> > where are we going to install the filters?
> > 
> 
> Before I go too far down the rabbit hole of tc traffic shaping, maybe
> it would be good to explain in more detail the problem I am trying to
> solve.
> 
> We have a setup as follows:
> 
> Marvell 88E6240 switch chip, accepting traffic from 4 ports. Port 1
> (P1) is critical priority, no dropped packets allowed; all others can
> be best effort.
> 
> The CPU port of the switch chip is connected via PHY to the PHY of an
> Intel i210 (igb driver).
> 
> The i210 is connected via a PCIe switch to the imx6.
> 
> When too many small packets were delivered to the CPU port (e.g.
> during a broadcast flood) we saw dropped packets.
> 
> The packets were being received by the i210 into its rx descriptor buffer
> fine, but the CPU could not keep up with the load. We saw
> rx_fifo_errors increasing rapidly and ksoftirqd at ~100% CPU.
> 
> 
> With this in mind, I am wondering whether any amount of tc traffic
> shaping would help. Would tc shaping require that packet reception
> manages to keep up before it can enact its policies? Does the
> infrastructure have accelerator offload hooks to be able to apply it
> via HW? I don't see how it would be able to inspect the packets to
> apply filtering if they were dropped due to rx descriptor exhaustion.
> (Please bear with me with the basic questions; I am not familiar with
> this part of the stack.)
> 
> Assuming that tc is still the way to go, after a brief look into the
> man pages and the documentation at lartc.org, it seems like it would
> need to use the ingress qdisc, with some sort of system to segregate
> and prioritise based on ingress port. Is this possible?

Hi Robert,

As I see it, you have two problems here:

1. Classification: Based on ingress port in your case

2. Scheduling: How to schedule between the different transmission queues

Where the port from which the packets should egress is the CPU port,
before they cross the PCI towards the imx6.

Both of these issues can be solved by tc. The main problem is that today
we do not have a netdev to represent the CPU port and therefore can't
use existing infra like tc. I believe we need to create one. Besides
scheduling, we can also use it to permit/deny certain traffic from
reaching the CPU and perform policing.

Drivers can run the received packets via taps using
dev_queue_xmit_nit(), so that users will see all the traffic directed at
the host when running tcpdump on this netdev.
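
A rough sketch of how that could look, assuming such a netdev existed
(all names here are hypothetical: cpu0 for the CPU port netdev, lan1
for the critical user port):

    # Hypothetical cpu0 netdev: two-band strict-priority scheduling of
    # host-bound traffic; the priomap defaults everything to the
    # low-priority band 2.
    tc qdisc add dev cpu0 root handle 1: prio bands 2 \
        priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    # Classify on ingress port: traffic that entered via lan1 goes to
    # the high-priority band 1.
    tc filter add dev cpu0 parent 1: flower indev lan1 classid 1:1
    # And with a real netdev plus taps, host-bound traffic is visible:
    tcpdump -ni cpu0
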

> 
> 
> 
> > > 
> > > > 
> > > > [patch 1/7] configures auto-negotiation for CPU ports connected
> > > > with PHYs to enable pause frame propagation.
> > > > 
> > > > [patch 2/7] allows setting of port's default output queue
> > > > priority for
> > > > any ingressing packets on that port.
> > > > 
> > > > [patch 3/7] dt-bindings for patch 2.
> > > > 
> > > > [patch 4/7] allows setting of a port's queue scheduling so that
> > > > it can
> > > > prioritise egress of traffic routed from high priority ports.
> > > > 
> > > > [patch 5/7] dt

Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms

2019-09-11 Thread Andrew Lunn
> > Feature series targeting netdev must be prefixed "PATCH net-next". As
> 
> Thanks for the info. Out of curiosity, where should I have gleaned this
> info from? This is my first contribution to netdev, so I wasn't familiar
> with the etiquette.

It is also a good idea to 'lurk' on a mailing list for a while,
reading the emails flying around and getting to know how things work.
The subject of "PATCH net-next" comes up maybe once a week. The idea
of offloads gets discussed once every couple of weeks, etc.

 Andrew


Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms

2019-09-11 Thread Andrew Lunn
> We have a setup as follows:
> 
> Marvell 88E6240 switch chip, accepting traffic from 4 ports. Port 1
> (P1) is critical priority, no dropped packets allowed; all others can
> be best effort.
> 
> The CPU port of the switch chip is connected via PHY to the PHY of an
> Intel i210 (igb driver).
> 
> The i210 is connected via a PCIe switch to the imx6.
> 
> When too many small packets were delivered to the CPU port (e.g.
> during a broadcast flood) we saw dropped packets.
> 
> The packets were being received by the i210 into its rx descriptor
> buffer fine, but the CPU could not keep up with the load. We saw
> rx_fifo_errors increasing rapidly and ksoftirqd at ~100% CPU.
> 
> 
> With this in mind, I am wondering whether any amount of tc traffic
> shaping would help?

Hi Robert

The model in Linux is that you start with a software TC filter, and
then offload it to the hardware. So the user configures TC just as
normal, and then that is used to program the hardware to do the same
thing as what would happen in software. This is exactly the same as we
do with bridging. You create a software bridge and add interfaces to
the bridge. This then gets offloaded to the hardware and it does the
bridging for you.

So think about how you can model the Marvell switch capabilities
using TC, and implement offload support for it.
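
A hedged example of that model (interface name assumed: lan1 as a
switch user port). With skip_sw, the filter must be accepted by the
hardware or the command fails, which makes the offload explicit:

    tc qdisc add dev lan1 clsact
    # skip_sw: install in hardware only; the command fails if the
    # driver cannot offload this match/action combination.
    tc filter add dev lan1 ingress protocol ip flower skip_sw \
        dst_ip 192.0.2.1 action drop
    # Successfully offloaded filters are reported with the in_hw flag.
    tc filter show dev lan1 ingress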

Andrew


Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms

2019-09-11 Thread Vivien Didelot
Hi Robert,

On Wed, 11 Sep 2019 10:46:05 +0100, Robert Beckett wrote:
> > Feature series targeting netdev must be prefixed "PATCH net-next". As
> 
> Thanks for the info. Out of curiosity, where should I have gleaned this
> info from? This is my first contribution to netdev, so I wasn't familiar
> with the etiquette.
> 
> > this approach was a PoC, sending it as "RFC net-next" would be even
> > more
> > appropriate.

Netdev, being a huge subsystem, has specific rules for things like
subject prefixes and merge windows, which are described in
Documentation/networking/netdev-FAQ.rst.


Thank you,

Vivien


Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms

2019-09-11 Thread Robert Beckett
On Wed, 2019-09-11 at 11:21 +0000, Ido Schimmel wrote:
> On Tue, Sep 10, 2019 at 09:49:46AM -0700, Florian Fainelli wrote:
> > +Ido, Jiri,
> > 
> > On 9/10/19 8:41 AM, Robert Beckett wrote:
> > > This patch-set adds support for some features of the Marvell
> > > switch
> > > chips that can be used to handle packet storms.
> > > 
> > > The rationale for this was a setup that requires the ability to
> > > receive traffic from one port, while a packet storm is occurring on
> > > another port (via an external switch with a deliberate loop). This
> > > is needed to ensure vital data delivery from a specific port, while
> > > mitigating any loops or DoS that a user may introduce on another
> > > port (can't guarantee sensible users).
> > 
> > The use case is reasonable, but the implementation is not really.
> > You are using Device Tree, which is meant to describe hardware, as
> > a policy holder for setting up queue priorities and likewise for
> > queue scheduling.
> > 
> > The tool that should be used for that purpose is tc and possibly an
> > appropriately offloaded queue scheduler in order to map the desired
> > scheduling class to what the hardware supports.
> > 
> > Jiri, Ido, how do you guys support this with mlxsw?
> 
> Hi Florian,
> 
> Are you referring to policing traffic towards the CPU using a policer
> on the egress of the CPU port? At least that's what I understand from
> the description of patch 6 below.
> 
> If so, mlxsw sets policers for different traffic types during its
> initialization sequence. These policers are not exposed to the user
> nor configurable. While the default settings are good for most users,
> we do want to allow users to change these and expose current settings.
> 
> I agree that tc seems like the right choice, but the question is
> where are we going to install the filters?
> 

Before I go too far down the rabbit hole of tc traffic shaping, maybe
it would be good to explain in more detail the problem I am trying to
solve.

We have a setup as follows:

Marvell 88E6240 switch chip, accepting traffic from 4 ports. Port 1
(P1) is critical priority, no dropped packets allowed; all others can
be best effort.

The CPU port of the switch chip is connected via PHY to the PHY of an
Intel i210 (igb driver).

The i210 is connected via a PCIe switch to the imx6.

When too many small packets were delivered to the CPU port (e.g.
during a broadcast flood) we saw dropped packets.

The packets were being received by the i210 into its rx descriptor
buffer fine, but the CPU could not keep up with the load. We saw
rx_fifo_errors increasing rapidly and ksoftirqd at ~100% CPU.

With this in mind, I am wondering whether any amount of tc traffic
shaping would help. Would tc shaping require that packet reception
manages to keep up before it can enact its policies? Does the
infrastructure have accelerator offload hooks to be able to apply it
via HW? I don't see how it would be able to inspect the packets to
apply filtering if they were dropped due to rx descriptor exhaustion.
(Please bear with me with the basic questions; I am not familiar with
this part of the stack.)

Assuming that tc is still the way to go, after a brief look into the
man pages and the documentation at lartc.org, it seems like it would
need to use the ingress qdisc, with some sort of system to segregate
and prioritise based on ingress port. Is this possible?
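
For reference, a sketch of how the symptoms above can be observed
(assuming eth0 stands in for the i210):

    # rx_fifo_errors climbs when the host cannot drain the rx ring.
    ethtool -S eth0 | grep -i fifo
    # ksoftirqd saturating a core points at softirq overload.
    ps -eo pcpu,comm --sort=-pcpu | grep -m1 ksoftirqd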



> > 
> > > 
> > > [patch 1/7] configures auto-negotiation for CPU ports connected
> > > with PHYs to enable pause frame propagation.
> > > 
> > > [patch 2/7] allows setting of port's default output queue
> > > priority for
> > > any ingressing packets on that port.
> > > 
> > > [patch 3/7] dt-bindings for patch 2.
> > > 
> > > [patch 4/7] allows setting of a port's queue scheduling so that
> > > it can
> > > prioritise egress of traffic routed from high priority ports.
> > > 
> > > [patch 5/7] dt-bindings for patch 4.
> > > 
> > > [patch 6/7] allows ports to rate limit their egress. This can be
> > > used to stop the host CPU from becoming swamped by packet delivery
> > > and exhausting descriptors.
> > > 
> > > [patch 7/7] dt-bindings for patch 6.
> > > 
> > > 
> > > Robert Beckett (7):
> > >   net/dsa: configure autoneg for CPU port
> > >   net: dsa: mv88e6xxx: add ability to set default queue
> > > priorities per
> > > port
> > >   dt-bindings: mv88e6xxx: add ability to set default queue
> > > priorities
> > > per port
> > >   net: dsa: mv88e6xxx: add ability to set queue scheduling
> > >   dt-bindings: mv88e6xxx: add ability to set queue scheduling
> > >   net: dsa: mv88e6xxx: add egress rate limiting
> > >   dt-bindings: mv88e6xxx: add egress rate limiting
> > > 
> > >  .../devicetree/bindings/net/dsa/marvell.txt   |  38 +
> > >  drivers/net/dsa/mv88e6xxx/chip.c  | 122 ---
> > >  drivers/net/dsa/mv88e6xxx/chip.h  |   5 +-
> > >  drivers/net/dsa/mv88e6xxx/port.c  | 140

Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms

2019-09-11 Thread Ido Schimmel
On Tue, Sep 10, 2019 at 09:49:46AM -0700, Florian Fainelli wrote:
> +Ido, Jiri,
> 
> On 9/10/19 8:41 AM, Robert Beckett wrote:
> > This patch-set adds support for some features of the Marvell switch
> > chips that can be used to handle packet storms.
> > 
> > The rationale for this was a setup that requires the ability to receive
> > traffic from one port, while a packet storm is occurring on another port
> > (via an external switch with a deliberate loop). This is needed to
> > ensure vital data delivery from a specific port, while mitigating any
> > loops or DoS that a user may introduce on another port (can't guarantee
> > sensible users).
> 
> The use case is reasonable, but the implementation is not really. You
> are using Device Tree, which is meant to describe hardware, as a policy
> holder for setting up queue priorities and likewise for queue scheduling.
> 
> The tool that should be used for that purpose is tc and possibly an
> appropriately offloaded queue scheduler in order to map the desired
> scheduling class to what the hardware supports.
> 
> Jiri, Ido, how do you guys support this with mlxsw?

Hi Florian,

Are you referring to policing traffic towards the CPU using a policer on
the egress of the CPU port? At least that's what I understand from the
description of patch 6 below.

If so, mlxsw sets policers for different traffic types during its
initialization sequence. These policers are not exposed to the user nor
configurable. While the default settings are good for most users, we do
want to allow users to change these and expose current settings.

I agree that tc seems like the right choice, but the question is where
are we going to install the filters?
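
One hedged possibility, if the CPU port were exposed as a netdev
(hypothetical name cpu0): policers of the kind mlxsw programs
internally would become ordinary tc filters on its egress:

    tc qdisc add dev cpu0 clsact
    # Hypothetical cpu0 netdev: police host-bound UDP to 10 Mbit/s,
    # dropping the excess before it reaches the host.
    tc filter add dev cpu0 egress protocol ip flower ip_proto udp \
        action police rate 10mbit burst 64k drop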

> 
> > 
> > [patch 1/7] configures auto-negotiation for CPU ports connected with
> > PHYs to enable pause frame propagation.
> > 
> > [patch 2/7] allows setting of port's default output queue priority for
> > any ingressing packets on that port.
> > 
> > [patch 3/7] dt-bindings for patch 2.
> > 
> > [patch 4/7] allows setting of a port's queue scheduling so that it can
> > prioritise egress of traffic routed from high priority ports.
> > 
> > [patch 5/7] dt-bindings for patch 4.
> > 
> > [patch 6/7] allows ports to rate limit their egress. This can be used to
> > stop the host CPU from becoming swamped by packet delivery and exhausting
> > descriptors.
> > 
> > [patch 7/7] dt-bindings for patch 6.
> > 
> > 
> > Robert Beckett (7):
> >   net/dsa: configure autoneg for CPU port
> >   net: dsa: mv88e6xxx: add ability to set default queue priorities per
> > port
> >   dt-bindings: mv88e6xxx: add ability to set default queue priorities
> > per port
> >   net: dsa: mv88e6xxx: add ability to set queue scheduling
> >   dt-bindings: mv88e6xxx: add ability to set queue scheduling
> >   net: dsa: mv88e6xxx: add egress rate limiting
> >   dt-bindings: mv88e6xxx: add egress rate limiting
> > 
> >  .../devicetree/bindings/net/dsa/marvell.txt   |  38 +
> >  drivers/net/dsa/mv88e6xxx/chip.c  | 122 ---
> >  drivers/net/dsa/mv88e6xxx/chip.h  |   5 +-
> >  drivers/net/dsa/mv88e6xxx/port.c  | 140 +-
> >  drivers/net/dsa/mv88e6xxx/port.h  |  24 ++-
> >  include/dt-bindings/net/dsa-mv88e6xxx.h   |  22 +++
> >  net/dsa/port.c|  10 ++
> >  7 files changed, 327 insertions(+), 34 deletions(-)
> >  create mode 100644 include/dt-bindings/net/dsa-mv88e6xxx.h
> > 
> 
> 
> -- 
> Florian


Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms

2019-09-11 Thread Robert Beckett
On Tue, 2019-09-10 at 13:19 -0400, Vivien Didelot wrote:
> Hi Robert,
> 
> On Tue, 10 Sep 2019 16:41:46 +0100, Robert Beckett <
> bob.beck...@collabora.com> wrote:
> > This patch-set adds support for some features of the Marvell switch
> > chips that can be used to handle packet storms.
> > 
> > The rationale for this was a setup that requires the ability to
> > receive traffic from one port, while a packet storm is occurring on
> > another port (via an external switch with a deliberate loop). This is
> > needed to ensure vital data delivery from a specific port, while
> > mitigating any loops or DoS that a user may introduce on another port
> > (can't guarantee sensible users).
> > 
> > [patch 1/7] configures auto-negotiation for CPU ports connected
> > with PHYs to enable pause frame propagation.
> > 
> > [patch 2/7] allows setting of port's default output queue priority
> > for
> > any ingressing packets on that port.
> > 
> > [patch 3/7] dt-bindings for patch 2.
> > 
> > [patch 4/7] allows setting of a port's queue scheduling so that it
> > can
> > prioritise egress of traffic routed from high priority ports.
> > 
> > [patch 5/7] dt-bindings for patch 4.
> > 
> > [patch 6/7] allows ports to rate limit their egress. This can be
> > used to stop the host CPU from becoming swamped by packet delivery
> > and exhausting descriptors.
> > 
> > [patch 7/7] dt-bindings for patch 6.
> > 
> > 
> > Robert Beckett (7):
> >   net/dsa: configure autoneg for CPU port
> >   net: dsa: mv88e6xxx: add ability to set default queue priorities
> > per
> > port
> >   dt-bindings: mv88e6xxx: add ability to set default queue
> > priorities
> > per port
> >   net: dsa: mv88e6xxx: add ability to set queue scheduling
> >   dt-bindings: mv88e6xxx: add ability to set queue scheduling
> >   net: dsa: mv88e6xxx: add egress rate limiting
> >   dt-bindings: mv88e6xxx: add egress rate limiting
> > 
> >  .../devicetree/bindings/net/dsa/marvell.txt   |  38 +
> >  drivers/net/dsa/mv88e6xxx/chip.c  | 122 ---
> >  drivers/net/dsa/mv88e6xxx/chip.h  |   5 +-
> >  drivers/net/dsa/mv88e6xxx/port.c  | 140 +-
> >  drivers/net/dsa/mv88e6xxx/port.h  |  24 ++-
> >  include/dt-bindings/net/dsa-mv88e6xxx.h   |  22 +++
> >  net/dsa/port.c|  10 ++
> >  7 files changed, 327 insertions(+), 34 deletions(-)
> >  create mode 100644 include/dt-bindings/net/dsa-mv88e6xxx.h
> 
> Feature series targeting netdev must be prefixed "PATCH net-next". As

Thanks for the info. Out of curiosity, where should I have gleaned this
info from? This is my first contribution to netdev, so I wasn't familiar
with the etiquette.

> this approach was a PoC, sending it as "RFC net-next" would be even
> more appropriate.
> 
> 
> Thank you,
> 
>   Vivien



Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms

2019-09-11 Thread Robert Beckett
On Tue, 2019-09-10 at 09:49 -0700, Florian Fainelli wrote:
> +Ido, Jiri,
> 
> On 9/10/19 8:41 AM, Robert Beckett wrote:
> > This patch-set adds support for some features of the Marvell switch
> > chips that can be used to handle packet storms.
> > 
> > The rationale for this was a setup that requires the ability to
> > receive traffic from one port, while a packet storm is occurring on
> > another port (via an external switch with a deliberate loop). This is
> > needed to ensure vital data delivery from a specific port, while
> > mitigating any loops or DoS that a user may introduce on another port
> > (can't guarantee sensible users).
> 
> The use case is reasonable, but the implementation is not really. You
> are using Device Tree, which is meant to describe hardware, as a policy
> holder for setting up queue priorities and likewise for queue
> scheduling.
> 
> The tool that should be used for that purpose is tc and possibly an
> appropriately offloaded queue scheduler in order to map the desired
> scheduling class to what the hardware supports.

Thanks for the review and tip about tc. I'm currently not familiar with
that tool. I'll investigate it as an alternative approach.

> 
> Jiri, Ido, how do you guys support this with mlxsw?
> 
> > 
> > [patch 1/7] configures auto-negotiation for CPU ports connected
> > with PHYs to enable pause frame propagation.
> > 
> > [patch 2/7] allows setting of port's default output queue priority
> > for
> > any ingressing packets on that port.
> > 
> > [patch 3/7] dt-bindings for patch 2.
> > 
> > [patch 4/7] allows setting of a port's queue scheduling so that it
> > can
> > prioritise egress of traffic routed from high priority ports.
> > 
> > [patch 5/7] dt-bindings for patch 4.
> > 
> > [patch 6/7] allows ports to rate limit their egress. This can be
> > used to stop the host CPU from becoming swamped by packet delivery
> > and exhausting descriptors.
> > 
> > [patch 7/7] dt-bindings for patch 6.
> > 
> > 
> > Robert Beckett (7):
> >   net/dsa: configure autoneg for CPU port
> >   net: dsa: mv88e6xxx: add ability to set default queue priorities
> > per
> > port
> >   dt-bindings: mv88e6xxx: add ability to set default queue
> > priorities
> > per port
> >   net: dsa: mv88e6xxx: add ability to set queue scheduling
> >   dt-bindings: mv88e6xxx: add ability to set queue scheduling
> >   net: dsa: mv88e6xxx: add egress rate limiting
> >   dt-bindings: mv88e6xxx: add egress rate limiting
> > 
> >  .../devicetree/bindings/net/dsa/marvell.txt   |  38 +
> >  drivers/net/dsa/mv88e6xxx/chip.c  | 122 ---
> >  drivers/net/dsa/mv88e6xxx/chip.h  |   5 +-
> >  drivers/net/dsa/mv88e6xxx/port.c  | 140 +-
> >  drivers/net/dsa/mv88e6xxx/port.h  |  24 ++-
> >  include/dt-bindings/net/dsa-mv88e6xxx.h   |  22 +++
> >  net/dsa/port.c|  10 ++
> >  7 files changed, 327 insertions(+), 34 deletions(-)
> >  create mode 100644 include/dt-bindings/net/dsa-mv88e6xxx.h
> > 
> 
> 



Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms

2019-09-10 Thread Vivien Didelot
Hi Robert,

On Tue, 10 Sep 2019 16:41:46 +0100, Robert Beckett wrote:
> This patch-set adds support for some features of the Marvell switch
> chips that can be used to handle packet storms.
> 
> The rationale for this was a setup that requires the ability to receive
> traffic from one port, while a packet storm is occurring on another port
> (via an external switch with a deliberate loop). This is needed to
> ensure vital data delivery from a specific port, while mitigating any
> loops or DoS that a user may introduce on another port (can't guarantee
> sensible users).
> 
> [patch 1/7] configures auto-negotiation for CPU ports connected with
> PHYs to enable pause frame propagation.
> 
> [patch 2/7] allows setting of port's default output queue priority for
> any ingressing packets on that port.
> 
> [patch 3/7] dt-bindings for patch 2.
> 
> [patch 4/7] allows setting of a port's queue scheduling so that it can
> prioritise egress of traffic routed from high priority ports.
> 
> [patch 5/7] dt-bindings for patch 4.
> 
> [patch 6/7] allows ports to rate limit their egress. This can be used to
> stop the host CPU from becoming swamped by packet delivery and exhausting
> descriptors.
> 
> [patch 7/7] dt-bindings for patch 6.
> 
> 
> Robert Beckett (7):
>   net/dsa: configure autoneg for CPU port
>   net: dsa: mv88e6xxx: add ability to set default queue priorities per
> port
>   dt-bindings: mv88e6xxx: add ability to set default queue priorities
> per port
>   net: dsa: mv88e6xxx: add ability to set queue scheduling
>   dt-bindings: mv88e6xxx: add ability to set queue scheduling
>   net: dsa: mv88e6xxx: add egress rate limiting
>   dt-bindings: mv88e6xxx: add egress rate limiting
> 
>  .../devicetree/bindings/net/dsa/marvell.txt   |  38 +
>  drivers/net/dsa/mv88e6xxx/chip.c  | 122 ---
>  drivers/net/dsa/mv88e6xxx/chip.h  |   5 +-
>  drivers/net/dsa/mv88e6xxx/port.c  | 140 +-
>  drivers/net/dsa/mv88e6xxx/port.h  |  24 ++-
>  include/dt-bindings/net/dsa-mv88e6xxx.h   |  22 +++
>  net/dsa/port.c|  10 ++
>  7 files changed, 327 insertions(+), 34 deletions(-)
>  create mode 100644 include/dt-bindings/net/dsa-mv88e6xxx.h

Feature series targeting netdev must be prefixed "PATCH net-next". As
this approach was a PoC, sending it as "RFC net-next" would be even more
appropriate.


Thank you,

Vivien


Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms

2019-09-10 Thread Florian Fainelli
+Ido, Jiri,

On 9/10/19 8:41 AM, Robert Beckett wrote:
> This patch-set adds support for some features of the Marvell switch
> chips that can be used to handle packet storms.
> 
> The rationale for this was a setup that requires the ability to receive
> traffic from one port, while a packet storm is occurring on another port
> (via an external switch with a deliberate loop). This is needed to
> ensure vital data delivery from a specific port, while mitigating any
> loops or DoS that a user may introduce on another port (can't guarantee
> sensible users).

The use case is reasonable, but the implementation is not really. You
are using Device Tree, which is meant to describe hardware, as a policy
holder for setting up queue priorities and likewise for queue scheduling.

The tool that should be used for that purpose is tc and possibly an
appropriately offloaded queue scheduler in order to map the desired
scheduling class to what the hardware supports.
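
As a software-only sketch of that split (port names assumed: lan1
critical, lan2 best effort; an offloaded version would program the
switch's scheduler and shapers instead of queueing on the host):

    # Raise the priority of everything ingressing the critical port.
    tc qdisc add dev lan1 clsact
    tc filter add dev lan1 ingress matchall action skbedit priority 7
    # Shape a best-effort port's egress in software.
    tc qdisc add dev lan2 root tbf rate 10mbit burst 32kb latency 100ms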

Jiri, Ido, how do you guys support this with mlxsw?

> 
> [patch 1/7] configures auto-negotiation for CPU ports connected with
> PHYs to enable pause frame propagation.
> 
> [patch 2/7] allows setting of port's default output queue priority for
> any ingressing packets on that port.
> 
> [patch 3/7] dt-bindings for patch 2.
> 
> [patch 4/7] allows setting of a port's queue scheduling so that it can
> prioritise egress of traffic routed from high priority ports.
> 
> [patch 5/7] dt-bindings for patch 4.
> 
> [patch 6/7] allows ports to rate limit their egress. This can be used to
> stop the host CPU from becoming swamped by packet delivery and exhausting
> descriptors.
> 
> [patch 7/7] dt-bindings for patch 6.
> 
> 
> Robert Beckett (7):
>   net/dsa: configure autoneg for CPU port
>   net: dsa: mv88e6xxx: add ability to set default queue priorities per
> port
>   dt-bindings: mv88e6xxx: add ability to set default queue priorities
> per port
>   net: dsa: mv88e6xxx: add ability to set queue scheduling
>   dt-bindings: mv88e6xxx: add ability to set queue scheduling
>   net: dsa: mv88e6xxx: add egress rate limiting
>   dt-bindings: mv88e6xxx: add egress rate limiting
> 
>  .../devicetree/bindings/net/dsa/marvell.txt   |  38 +
>  drivers/net/dsa/mv88e6xxx/chip.c  | 122 ---
>  drivers/net/dsa/mv88e6xxx/chip.h  |   5 +-
>  drivers/net/dsa/mv88e6xxx/port.c  | 140 +-
>  drivers/net/dsa/mv88e6xxx/port.h  |  24 ++-
>  include/dt-bindings/net/dsa-mv88e6xxx.h   |  22 +++
>  net/dsa/port.c|  10 ++
>  7 files changed, 327 insertions(+), 34 deletions(-)
>  create mode 100644 include/dt-bindings/net/dsa-mv88e6xxx.h
> 


-- 
Florian