date:20171012

Re: [patch net-next 00/34] net: sched: allow qdiscs to share filter block instances

2017-10-12 Thread David Miller

From: Jiri Pirko 
Date: Fri, 13 Oct 2017 08:21:01 +0200

> Thu, Oct 12, 2017 at 07:21:48PM CEST, da...@davemloft.net wrote:
>>
>>Jiri I'm not looking at a 34 patch set, it's too large.
>>
>>Break this up into groups of a dozen or so patches each, no
>>more.  Submit them one at a time and wait for each series
>>to be fully reviewed and integrated before moving onto the
>>next one.
> 
> Yeah. As I stated in the beginning of the cover letter, I did not find a
> way to do it. I could split into 2 or 3 patchsets, problem is that I
> would introduce interfaces in first patchset that would be only used in
> patchset 2 or 3. I believe that is not ok. Do you think that I can do it
> like this this time?

Jiri, please try harder.

Thank you.

Re: [patch net-next 06/34] net: core: use dev->ingress_queue instead of tp->q

2017-10-12 Thread Jiri Pirko

Thu, Oct 12, 2017 at 11:45:43PM CEST, dan...@iogearbox.net wrote:
>On 10/12/2017 07:17 PM, Jiri Pirko wrote:
>> From: Jiri Pirko 
>> 
>> In sch_handle_egress and sch_handle_ingress, don't use tp->q and use
>> dev->ingress_queue which stores the same pointer instead.
>> 
>> Signed-off-by: Jiri Pirko 
>> ---
>>   net/core/dev.c | 21 +++--
>>   1 file changed, 15 insertions(+), 6 deletions(-)
>> 
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index fcddccb..cb9e5e5 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -3273,14 +3273,18 @@ EXPORT_SYMBOL(dev_loopback_xmit);
>>   static struct sk_buff *
>>   sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
>>   {
>> +struct netdev_queue *netdev_queue =
>> +rcu_dereference_bh(dev->ingress_queue);
>>  struct tcf_proto *cl = rcu_dereference_bh(dev->egress_cl_list);
>>  struct tcf_result cl_res;
>> +struct Qdisc *q;
>> 
>> -if (!cl)
>> +if (!cl || !netdev_queue)
>>  return skb;
>> +q = netdev_queue->qdisc;
>
>NAK, no additional overhead in the software fast-path of
>sch_handle_{ingress,egress}() like this. There are users out there
>that use tc in software only, so performance is critical here.

Okay, how else do you suggest I can avoid the need to use tp->q?
I was thinking about storing q directly to net_device, which would save
one dereference, resulting in the same amount as current cl->q.

Thanks.

Re: [patch net-next 00/34] net: sched: allow qdiscs to share filter block instances

2017-10-12 Thread Jiri Pirko

Thu, Oct 12, 2017 at 11:37:30PM CEST, dsah...@gmail.com wrote:
>On 10/12/17 11:17 AM, Jiri Pirko wrote:
>> So back to the example. First, we create 2 qdiscs. Both will share
>> block number 22. "22" is just an identification. If we don't pass any
>> block number, a new one will be generated by kernel:
>> 
>> $ tc qdisc add dev ens7 ingress block 22
>> 
>> $ tc qdisc add dev ens8 ingress block 22
>> 
>> 
>> Now if we list the qdiscs, we will see the block index in the output:
>> 
>> $ tc qdisc
>> qdisc ingress : dev ens7 parent :fff1 block 22 
>> qdisc ingress : dev ens8 parent :fff1 block 22 
>> 
>> Now we can add filter to any of qdiscs sharing the same block:
>> 
>> $ tc filter add dev ens7 parent : protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
>> 
>> 
>> We will see the same output if we list filters for ens7 and ens8, including stats:
>> 
>> $ tc -s filter show dev ens7 ingress
>> filter protocol ip pref 25 flower chain 0 
>> filter protocol ip pref 25 flower chain 0 handle 0x1 
>>   eth_type ipv4
>>   dst_ip 192.168.0.0/16
>>   not_in_hw
>> action order 1: gact action drop
>>  random type none pass val 0
>>  index 1 ref 1 bind 1 installed 39 sec used 2 sec
>> Action statistics:
>> Sent 3108 bytes 37 pkt (dropped 37, overlimits 0 requeues 0) 
>> backlog 0b 0p requeues 0 
>> 
>> $ tc -s filter show dev ens8 ingress
>> filter protocol ip pref 25 flower chain 0 
>> filter protocol ip pref 25 flower chain 0 handle 0x1 
>>   eth_type ipv4
>>   dst_ip 192.168.0.0/16
>>   not_in_hw
>> action order 1: gact action drop
>>  random type none pass val 0
>>  index 1 ref 1 bind 1 installed 40 sec used 3 sec
>> Action statistics:
>> Sent 3108 bytes 37 pkt (dropped 37, overlimits 0 requeues 0) 
>> backlog 0b 0p requeues 0
>
>This seems like really odd semantics to me ... a filter added to one
>device shows up on another.

Why is it odd? They share the same block, so it is natural that rule
added to one shows in list of rules for all devices that share the same
block.


>
>Why not make the shared block a standalone object that is configured
>through its own set of commands and then referenced by both devices?

I was thinking about that for a long time. That would require an entirely
new set of netlink API and internal kernel handling just for this. Lots
of duplication. The reason is that the current API is strictly built around
ifindex. But the new API would not solve anything. As a user, I still
want to see shared rules in the individual device listing, because they
would get processed for the device. So I believe that the proposed
behaviour is correct.
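
The sharing semantics argued for above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: `struct filter_block`, `struct ingress_qdisc`, `filter_add()` and `filter_count()` are hypothetical names made up for illustration and are not the structures or functions from the patchset.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FILTERS 8

/* Hypothetical model of a shared filter block. */
struct filter_block {
	unsigned int index;		/* block number, e.g. 22 */
	int prefs[MAX_FILTERS];		/* preference of each filter */
	size_t nfilters;
};

/* Hypothetical model of an ingress qdisc: it refers to a block
 * but does not own it, so several qdiscs can point at one block.
 */
struct ingress_qdisc {
	const char *dev;
	struct filter_block *block;
};

/* Adding a filter through any qdisc stores it in the shared block. */
int filter_add(struct ingress_qdisc *q, int pref)
{
	struct filter_block *b = q->block;

	assert(b != NULL);
	if (b->nfilters == MAX_FILTERS)
		return -1;
	b->prefs[b->nfilters++] = pref;
	return 0;
}

/* Listing through a qdisc reports whatever the shared block holds. */
size_t filter_count(const struct ingress_qdisc *q)
{
	return q->block->nfilters;
}
```

With two qdiscs pointing at the same block, a filter added through the first one is counted when listing through the second, which matches the identical `tc -s filter show` output quoted for ens7 and ens8.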

Re: [patch net-next 00/34] net: sched: allow qdiscs to share filter block instances

2017-10-12 Thread Jiri Pirko

Thu, Oct 12, 2017 at 07:21:48PM CEST, da...@davemloft.net wrote:
>
>Jiri I'm not looking at a 34 patch set, it's too large.
>
>Break this up into groups of a dozen or so patches each, no
>more.  Submit them one at a time and wait for each series
>to be fully reviewed and integrated before moving onto the
>next one.

Yeah. As I stated in the beginning of the cover letter, I did not find a
way to do it. I could split into 2 or 3 patchsets, problem is that I
would introduce interfaces in first patchset that would be only used in
patchset 2 or 3. I believe that is not ok. Do you think that I can do it
like this this time?

Thanks

Re: [PATCH net-next 1/1] net/smc: add SMC rendezvous protocol

2017-10-12 Thread David Miller

From: Florian Westphal 
Date: Thu, 12 Oct 2017 13:14:29 +0200

> Ursula Braun  wrote:
>> On 10/11/2017 11:06 PM, David Miller wrote:
>> > From: Ursula Braun 
>> > Date: Tue, 10 Oct 2017 16:14:19 +0200
>> > 
>> >> The goal of this patch is to leave common TCP code unmodified. Thus,
>> >> it uses netfilter hooks to intercept TCP SYN and SYN/ACK
>> >> packets. For outgoing packets originating from SMC sockets, the
>> >> experimental option is added. For inbound packets destined for SMC
>> >> sockets, the experimental option is checked.
>> > 
>> > I think this really isn't going to pass.
>> > 
>> > It's a user experience nightmare when the kernel inserts and
>> > deletes filtering rules outside of what the user configures
>> > on their system.
> 
> It depends if the hook is passive or not (i.e. mangles
> payload/metadata or returns verdict other than NF_ACCEPT).
> 
> OUTPUT hook added here is not passive as it mangles tcp options.
> 
>> > This approach was also considered for ipv6 ILA, and the same
>> > pushback was given.
> 
> ahem.
> net/ipv6/ila/ila_xlat.c:   err = nf_register_net_hooks(net, ila_nf_hook_ops,

My bad, I thought we had decided against that.

Oh well.

Re: [PATCH net-next] selftests: rtnetlink: add a small macsec test case

2017-10-12 Thread David Miller

From: Florian Westphal 
Date: Thu, 12 Oct 2017 11:11:22 +0200

> Signed-off-by: Florian Westphal 

Applied.

Re: [PATCH net-next 0/8] cxgb4: add support to get hardware debug logs via ethtool

2017-10-12 Thread David Miller

From: Rahul Lakkireddy 
Date: Thu, 12 Oct 2017 13:54:37 +0530

> This series of patches add support to collect hardware debug logs
> via ethtool --get-dump facility.

There is a lot of global namespace pollution added by these
changes.

A lot of the global symbols you add in this new code have very
poorly namespaced names like "collect_mem_info()"

If the driver is built statically into the kernel this will pollute
the global namespace and conflict with any symbols elsewhere in the
kernel that have the same name.

So please use a proper "cxgb4_" or similar prefix for any non-static
symbols in the driver.

Thank you.
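
As an illustration of the request above, here is a minimal sketch of the two options for a helper: keep it file-local with `static`, or give it a driver prefix when it has to be visible to other files. The names `sum_regions()` and `cxgb4_collect_mem_info()` and their signatures are made up for this example and are not taken from the series.

```c
#include <assert.h>
#include <stddef.h>

/* File-local helper: "static" keeps the symbol out of the global
 * namespace, so even a generic name cannot clash with other code.
 */
static size_t sum_regions(const size_t *sizes, size_t n)
{
	size_t total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += sizes[i];
	return total;
}

/* Helper shared between several files of the driver: it must be
 * non-static, so it carries the driver prefix (hypothetical name).
 */
size_t cxgb4_collect_mem_info(const size_t *sizes, size_t n)
{
	assert(sizes != NULL || n == 0);
	return sum_regions(sizes, n);
}
```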

Re: [PATCH] ravb: Consolidate clock handling

2017-10-12 Thread David Miller

From: Geert Uytterhoeven 
Date: Thu, 12 Oct 2017 10:24:53 +0200

> The module clock is used for two purposes:
>   - Wake-on-LAN (WoL), which is optional,
>   - gPTP Timer Increment (GTI) configuration, which is mandatory.
> 
> As the clock is needed for GTI configuration anyway, WoL is always
> available.  Hence remove duplication and repeated obtaining of the clock
> by making GTI use the stored clock for WoL use.
> 
> Signed-off-by: Geert Uytterhoeven 

Applied.

Re: [PATCH 0/2] net: support bgmac with B50212E B1 PHY

2017-10-12 Thread David Miller

From: Rafał Miłecki 
Date: Thu, 12 Oct 2017 10:21:24 +0200

> From: Rafał Miłecki 
> 
> I got a report that a board with BCM47189 SoC and B50212E B1 PHY doesn't
> work well with some devices as there is massive ping loss. After analyzing
> PHY state it has appeared that it runs in slave mode and doesn't auto
> switch to master properly when needed.
> 
> This patchset fixes this by:
> 1) Adding new flag support to the PHY driver for setting master mode
> 2) Modifying bgmac to request master mode for reported hardware

Series applied to net-next, thanks.

Re: [PATCH] ip: update policy routing config help

2017-10-12 Thread David Miller

From: Stephen Hemminger 
Date: Wed, 11 Oct 2017 20:10:31 -0700

> The kernel config help for policy routing was still pointing at
> an ancient document from 2000 that refers to Linux 2.1. Update it
> to point to something that is at least occasionally updated.
> 
> Signed-off-by: Stephen Hemminger 

Applied.

Re: [PATCH next] ipvlan: always use the current L2 addr of the master

2017-10-12 Thread David Miller

From: Mahesh Bandewar 
Date: Wed, 11 Oct 2017 17:16:26 -0700

> From: Mahesh Bandewar 
> 
> If the underlying master ever changes its L2 (e.g. bonding device),
> then make sure that the IPvlan slaves always emit packets with the
> current L2 of the master instead of the stale mac addr which was
> copied during the device creation. The problem can be seen with
> following script -
> 
>   #!/bin/bash
>   # Create a vEth pair
>   ip link add dev veth0 type veth peer name veth1
>   ip link set veth0 up
>   ip link set veth1 up
>   ip link show veth0
>   ip link show veth1
>   # Create an IPvlan device on one end of this vEth pair.
>   ip link add link veth0 dev ipvl0 type ipvlan mode l2
>   ip link show ipvl0
>   # Change the mac-address of the vEth master.
>   ip link set veth0 address 02:11:22:33:44:55
> 
> Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.")
> Signed-off-by: Mahesh Bandewar 

Applied.

Re: [PATCH net-next 0/4] tc-testing: Test suite updates

2017-10-12 Thread David Miller

From: Lucas Bates 
Date: Wed, 11 Oct 2017 17:16:50 -0400

> This patch series is a roundup of changes to the tc-testing
> suite:
> 
>  - Add test cases for police and mirred modules and some coverage
>    in already-submitted test categories
>  - Break the test case files down into more user-friendly sizes
>  - Bug fix to the tdc.py script's handling of the -l argument

Some of the newly added files lack final newlines, and as you can
see in the git patch output a warning is generated.

Please fix this up and resubmit.

Thanks.

Re: [PATCH net-next 0/3] sched: act: ife: UAPI checks and performance tweaks

2017-10-12 Thread David Miller

From: Alexander Aring 
Date: Wed, 11 Oct 2017 17:16:05 -0400

> this patch series contains at first a patch which adds a check for
> IFE_ENCODE and IFE_DECODE when an ife act gets created or updated, and adds
> handling of these cases only inside the act callback.
> 
> The second patch uses per-cpu counters and moves the spinlock around so that
> the spinlock is held less in the act callback.
> 
> The last patch uses rcu for updating parameters and also moves the spinlock for
> the same purpose as in patch 2.
> 
> Notes:
>  - There is still a spinlock around for protecting the metalist and a
>    rw-lock for another list. Should be migrated to a rcu list, if
>    possible.
> 
>  - I still use dereference in the dump callback, so I think what I didn't
>    get was what happens when rcu_assign_pointer is called while the rcu read
>    lock is held. I suppose the pointer will be updated, then we don't
>    have any issue here.

Series applied.

Re: [PATCH net-next 0/2] Fix IFE meta modules loading

2017-10-12 Thread David Miller

From: Roman Mashak 
Date: Thu, 12 Oct 2017 16:37:39 -0400

> David Miller  writes:
> 
>> From: Roman Mashak 
>> Date: Wed, 11 Oct 2017 10:50:28 -0400
>>
>>> Adjust module alias names of IFE meta modules and fix the bug that
>>> prevented auto-loading IFE modules in run-time.
>>
>> Anyone using the existing aliases will be broken by these changes
>> no?
> 
> Actually aliases never worked, the bug existed since the day act_meta_*
> modules have been introduced. I suspect everyone compiles them in-kernel
> rather than as modules.

Fair enough, series applied, thanks for explaining.

[PATCH v3] net: ftgmac100: Request clock and set speed

2017-10-12 Thread Joel Stanley

According to the ASPEED datasheet, gigabit speeds require a clock of
100MHz or higher. Other speeds require 25MHz or higher. This patch
configures a 100MHz clock if the system has a direct-attached
PHY, or 25MHz if the system is running NC-SI, which is limited to 100Mbit.

There appear to be no other upstream users of the FTGMAC100 driver, so it is
hard to know the clocking requirements of other platforms. Therefore a
conservative approach was taken with enabling clocks. If the platform is
not ASPEED, both requesting the clock and configuring the speed are
skipped.

Signed-off-by: Joel Stanley 
---
Andrew, can you please give this one a spin on hardware?

v3:
 - Fix errors from v2
v2:
 - only touch the clocks on Aspeed platforms
 - unconditionally call clk_disable_unprepare

 drivers/net/ethernet/faraday/ftgmac100.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 9ed8e4b81530..78db8e62a83f 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -21,6 +21,7 @@
 
 #define pr_fmt(fmt)KBUILD_MODNAME ": " fmt
 
+#include 
 #include 
 #include 
 #include 
@@ -59,6 +60,9 @@
 /* Min number of tx ring entries before stopping queue */
 #define TX_THRESHOLD   (MAX_SKB_FRAGS + 1)
 
+#define FTGMAC_100MHZ  100000000
+#define FTGMAC_25MHZ   25000000
+
 struct ftgmac100 {
/* Registers */
struct resource *res;
@@ -96,6 +100,7 @@ struct ftgmac100 {
struct napi_struct napi;
struct work_struct reset_task;
struct mii_bus *mii_bus;
+   struct clk *clk;
 
/* Link management */
int cur_speed;
@@ -1734,6 +1739,22 @@ static void ftgmac100_ncsi_handler(struct ncsi_dev *nd)
nd->link_up ? "up" : "down");
 }
 
+static void ftgmac100_setup_clk(struct ftgmac100 *priv)
+{
+   priv->clk = devm_clk_get(priv->dev, NULL);
+   if (IS_ERR(priv->clk))
+   return;
+
+   clk_prepare_enable(priv->clk);
+
+   /* Aspeed specifies a 100MHz clock is required for up to
+* 1000Mbit link speeds. As NCSI is limited to 100Mbit, 25MHz
+* is sufficient
+*/
+   clk_set_rate(priv->clk, priv->use_ncsi ? FTGMAC_25MHZ :
+   FTGMAC_100MHZ);
+}
+
 static int ftgmac100_probe(struct platform_device *pdev)
 {
struct resource *res;
@@ -1830,6 +1851,9 @@ static int ftgmac100_probe(struct platform_device *pdev)
goto err_setup_mdio;
}
 
+   if (priv->is_aspeed)
+   ftgmac100_setup_clk(priv);
+
/* Default ring sizes */
priv->rx_q_entries = priv->new_rx_q_entries = DEF_RX_QUEUE_ENTRIES;
priv->tx_q_entries = priv->new_tx_q_entries = DEF_TX_QUEUE_ENTRIES;
@@ -1883,6 +1907,8 @@ static int ftgmac100_remove(struct platform_device *pdev)
 
unregister_netdev(netdev);
 
+   clk_disable_unprepare(priv->clk);
+
/* There's a small chance the reset task will have been re-queued,
 * during stop, make sure it's gone before we free the structure.
 */
-- 
2.14.1

Re: [PATCH] nfp: Explicitly include linux/bug.h

2017-10-12 Thread Jakub Kicinski

On Fri, 13 Oct 2017 03:50:35 +0100, Mark Brown wrote:
> Today's -next build encountered an error due to a missing definition of
> WARN_ON(), caused by some header reorganization removing an implicit
> inclusion of linux/bug.h.  Fix this with an explicit inclusion.
> 
> Signed-off-by: Mark Brown 

Acked-by: Jakub Kicinski 

Thank you!

[PATCH] nfp: Explicitly include linux/bug.h

2017-10-12 Thread Mark Brown

Today's -next build encountered an error due to a missing definition of
WARN_ON(), caused by some header reorganization removing an implicit
inclusion of linux/bug.h.  Fix this with an explicit inclusion.

Signed-off-by: Mark Brown 
---
 drivers/net/ethernet/netronome/nfp/nfp_app.c | 1 +
 drivers/net/ethernet/netronome/nfp/nfp_asm.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.c b/drivers/net/ethernet/netronome/nfp/nfp_app.c
index 82c290763529..5d9e2eba5b49 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_app.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_app.c
@@ -31,6 +31,7 @@
  * SOFTWARE.
  */
 
+#include 
 #include 
 #include 
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_asm.h b/drivers/net/ethernet/netronome/nfp/nfp_asm.h
index c4c18dd5630a..aa397bf308e4 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_asm.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_asm.h
@@ -35,6 +35,7 @@
 #define __NFP_ASM_H__ 1
 
 #include 
+#include 
 #include 
 
 #define REG_NONE   0
-- 
2.14.1

Re: [next-queue PATCH v7 4/6] net/sched: Introduce Credit Based Shaper (CBS) qdisc

2017-10-12 Thread Eric Dumazet

On Thu, 2017-10-12 at 17:40 -0700, Vinicius Costa Gomes wrote:
> This queueing discipline implements the shaper algorithm defined by
> the 802.1Q-2014 Section 8.6.8.2 and detailed in Annex L.
> 
> Its primary usage is to apply some bandwidth reservation to user
> defined traffic classes, which are mapped to different queues via the
> mqprio qdisc.
> 
> Only a simple software implementation is added for now.
> 
> Signed-off-by: Vinicius Costa Gomes 
> Signed-off-by: Jesus Sanchez-Palencia 
> ---

> +/* timediff is in ns, slope is in kbps */
> +static s64 timediff_to_credits(s64 timediff, s32 slope)
> +{
> + s64 credits = timediff * slope * BYTES_PER_KBIT;
> +
> + do_div(credits, NSEC_PER_SEC);
> +
> + return credits;
> +}
> +
> +static s64 delay_from_credits(s64 credits, s32 slope)
> +{
> + s64 rate = slope * BYTES_PER_KBIT;
> + s64 delay;
> +
> + if (unlikely(rate == 0))
> + return S64_MAX;
> +
> + delay = -credits * NSEC_PER_SEC;
> + do_div(delay, rate);
> +
> + return delay;
> +}
> +
> +static s64 credits_from_len(unsigned int len, s32 slope, s64 port_rate)
> +{
> + /* As do_div() only works on unsigned quantities, convert
> +  * slope to a positive number here, and credits to a negative
> +  * number before returning.
> +  */
> + s64 rate = -slope * BYTES_PER_KBIT;
> + s64 credits;
> +
> + if (unlikely(port_rate == 0))
> + return S64_MAX;
> +
> + credits = len * rate;
> + do_div(credits, port_rate);
> +
> + return -credits;
> +}
> +


Your mixing of s64 and u64 is disturbing.

do_div() handles u64, not s64.

div64_s64() might be needed in place of do_div()
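
The concern can be sketched in user space. The helper below stands in for what `div64_s64()` provides in the kernel (a signed 64-by-64 division), written in plain C so it can be tried outside the kernel. `sdiv64()` is a made-up name, and the `BYTES_PER_KBIT` value of 125 is an assumption for illustration rather than a quote from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC	1000000000LL
#define BYTES_PER_KBIT	(1000LL / 8)	/* assumed: 125 bytes per kbit */

/* User-space stand-in for a signed 64-bit division helper: both
 * operands are signed, and the result truncates toward zero.
 */
static int64_t sdiv64(int64_t dividend, int64_t divisor)
{
	assert(divisor != 0);
	return dividend / divisor;
}

/* timediff is in ns, slope is in kbit/s; the result is in bytes and
 * keeps the sign of slope, with no unsigned conversion in between.
 */
int64_t timediff_to_credits(int64_t timediff, int32_t slope)
{
	int64_t credits = timediff * slope * BYTES_PER_KBIT;

	return sdiv64(credits, NSEC_PER_SEC);
}

/* Delay in ns until negative credits have grown back to zero. */
int64_t delay_from_credits(int64_t credits, int32_t slope)
{
	int64_t rate = slope * BYTES_PER_KBIT;

	if (rate == 0)
		return INT64_MAX;

	return sdiv64(-credits * NSEC_PER_SEC, rate);
}
```

Because every intermediate value stays signed, a negative slope simply yields negative credits, and no sign juggling around an unsigned-only division macro is needed.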

linux-next: build failure after merge of the akpm-current tree

2017-10-12 Thread Mark Brown

Hi Andrew,

After merging the akpm-current tree, today's linux-next build
(x86 allmodconfig) failed like this:

  CC [M]  drivers/net/ethernet/netronome/nfp/nfp_app.o
In file included from /home/broonie/tmpfs/next/drivers/net/ethernet/netronome/nfp/nfp_asm.c:40:0:
/home/broonie/tmpfs/next/drivers/net/ethernet/netronome/nfp/nfp_asm.h: In function '__enc_swreg_lm':
/home/broonie/tmpfs/next/drivers/net/ethernet/netronome/nfp/nfp_asm.h:301:2: error: implicit declaration of function 'WARN_ON' [-Werror=implicit-function-declaration]
  WARN_ON(id > 3 || (off && mode != NN_LM_MOD_NONE));
  ^
cc1: some warnings being treated as errors

Caused by some reliance on an implicit include being exposed by a header
reorganization in your tree.  I'll add a patch for this which I'll post,
probably tomorrow morning.



Re: [PATCH net-next v2 1/1] bridge: return error code when deleting Vlan

2017-10-12 Thread Nikolay Aleksandrov

On 13.10.2017 05:03, Jamal Hadi Salim wrote:
> On 17-10-12 02:12 PM, Nikolay Aleksandrov wrote:
>> On 12/10/17 21:07, Roman Mashak wrote:
> 
>>> For example, if you attempt to delete a non-existing vlan on a port,
>>> the current code succeeds and also sends event :
>>>
>>> rtnetlink_rcv_msg
>>>  rtnl_bridge_dellink
>>>     br_dellink
>>>    br_afspec
>>>   br_vlan_info
>>>
>>> int br_dellink(..)
>>> {
>>>    ...
>>>    err = br_afspec()
>>>    if (err == 0)
>>>    br_ifinfo_notify(RTM_NEWLINK, p);
>>> }
>>>
>>> This is misleading, so a proper errcode has to be produced.
>>>
>>
> 
> 
> 
>> True, but you also change the expected behaviour because now a user can
>> clear all vlans with one request (1 - 4094), and after the change that
>> will fail with a partial delete if some vlan was missing.
>>
> 
> The issue is more subtle (per Roman above):
> Try to delete a vlan (that doesn't exist).
> 1) It says "success".
> 2) Worse: Another process listening (bridge monitor?) gets an _event_
>  that  the vlan has been deleted (when it never existed in the first
>  place).
> 
>> This has been the behaviour forever and some script might depend on it.
>> Also IMO, and as David also mentioned, doing a partial delete is not
>> good.
>>
> 
> I think this is a bug (especially the event part).
> 
> cheers,
> jamal
> 

Fair enough, but after the patch you get the opposite effect too - you
delete a couple of vlans but you don't generate an event because of an
error in the middle. That at least can be taken care of.

I do agree it's a bug, but there might be scripts that rely on it and
don't check the return value when clearing vlans. They will end up with
a partial clear and wrongly assumed state, so maybe leave the
opportunistic delete but count if anything was actually deleted and send
an event only then ?
That should make everyone happy :-)
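
The suggestion above can be sketched as a small user-space model. This is not bridge code: `vlan_present[]`, `delete_range()` and `notify()` are hypothetical names, and the `events_sent` counter merely stands in for the notification that `br_ifinfo_notify()` would send.

```c
#include <assert.h>
#include <stdbool.h>

#define VLAN_N_VID 4096

static bool vlan_present[VLAN_N_VID];	/* model of the port's vlan set */
static int events_sent;			/* number of notifications sent */

static void notify(void)
{
	events_sent++;
}

/* Opportunistic delete: vlans missing from the range are skipped
 * instead of being treated as errors, and a single event is sent
 * only if at least one vlan was really removed.
 * Returns the number of vlans deleted.
 */
int delete_range(int first, int last)
{
	int deleted = 0;
	int vid;

	assert(first >= 1 && last < VLAN_N_VID && first <= last);

	for (vid = first; vid <= last; vid++) {
		if (!vlan_present[vid])
			continue;
		vlan_present[vid] = false;
		deleted++;
	}

	if (deleted)
		notify();

	return deleted;
}
```

Clearing 1 - 4094 on a port keeps working for existing scripts, while a request that deletes nothing no longer produces an event for listeners.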

Re: [PATCH net-next 1/2] mqprio: Add a new hardware offload type in mqprio

2017-10-12 Thread Yunsheng Lin

Hi, Yuval

On 2017/10/13 4:10, Yuval Mintz wrote:
>> When a driver supports both dcb and hardware offloaded mqprio, and
>> user is running mqprio and dcb tool concurrently, the configuration
>> set by each tool may be conflicted with each other because the dcb
> (for second 'each') s/each/the
> 

Will do, Thanks

>> and mqprio may be using the same hardwere offload component and share
> s/hardwere/hardware

Will do, Thanks

> 
>> the tc system in the network stack.
>>
>> This patch adds a new offload type to indicate that the underlying
>> driver offload prio mapping as part of DCB. If the driver would be
> 'should' offload

Will do, Thanks

> 
>> incapable of that it would refuse the offload. User would then have
>> to explicitly request that qdisc offload.
> 
> 
>

Re: [PATCH net-next 0/2] Add mqprio hardware offload support in hns3 driver

2017-10-12 Thread Yunsheng Lin

Hi, Yuval

On 2017/10/13 4:21, Yuval Mintz wrote:
>> This patchset adds a new hardware offload type in mqprio before adding
>> mqprio hardware offload support in hns3 driver.
> 
> I think one of the biggest issues in tying this to DCB configuration is the
> non-immediate [and possibly non persistent] configuration.
> 
> Scenario #1:
> User is configuring mqprio offloaded with 3 TCs while device is in willing mode.
> Would you expect the driver to immediately respond with a success or instead
> delay the return until the DCBx negotiation is complete and the operational
> num of TCs is actually 3?

Well, when the user requests mqprio offload on hardware shared with DCB, I
expect the user is not using the dcb tool.
If the user is still using the dcb tool, then the result is undefined.

The scenario you mention can perhaps be handled by setting willing to zero
when the user requests the mqprio offload, and restoring the willing bit when
the mqprio offload is removed.
But I think the real issue is that dcb and mqprio share the tc system in the
stack, so the problem may be better fixed in the stack rather than in the
driver, as you suggested in the DCB patchset. What do you think?

> 
> Scenario #2:
> Assume user explicitly offloaded mqprio with 3 TCs, but now DCB configuration
> has changed on the peer side and 4 TCs is the new negotiated operational value.
> Your current driver logic would change the number of TCs underneath the user
> configuration [and it would actually probably work due to mqprio being a crappy
> qdisc]. But was that the user actual intention?
> [I think the likely answer in this scenario is 'yes' since the alternative is no better.
> But I still thought it was worth mentioning]

You are right, the problem also has something to do with mqprio and dcb sharing
the tc in the stack.

During testing, when the user explicitly offloaded mqprio with 3 TCs, all
queues had a default pfifo mqprio attached; after DCB changed the tc num to 4,
tc qdisc showed that some queues did not have a default pfifo mqprio attached.

Maybe we can add a callback to notify mqprio that the configuration has changed.

Thanks
Yunsheng Lin

> 
> Cheers,
> Yuval
> 
>>
>> Yunsheng Lin (2):
>>   mqprio: Add a new hardware offload type in mqprio
>>   net: hns3: Add mqprio hardware offload support in hns3 driver
>>
>>  drivers/net/ethernet/hisilicon/hns3/hnae3.h|  1 +
>>  .../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c | 23 +++
>>  .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 46 ++-
>> ---
>>  include/uapi/linux/pkt_sched.h |  1 +
>>  4 files changed, 55 insertions(+), 16 deletions(-)
>>
>> --
>> 1.9.1
> 
> 
>

Re: [PATCH net-next v2 1/1] bridge: return error code when deleting Vlan

2017-10-12 Thread Jamal Hadi Salim


On 17-10-12 02:12 PM, Nikolay Aleksandrov wrote:
> On 12/10/17 21:07, Roman Mashak wrote:

>> For example, if you attempt to delete a non-existing vlan on a port,
>> the current code succeeds and also sends event :
>>
>> rtnetlink_rcv_msg
>>   rtnl_bridge_dellink
>>     br_dellink
>>       br_afspec
>>         br_vlan_info
>>
>> int br_dellink(..)
>> {
>>    ...
>>    err = br_afspec()
>>    if (err == 0)
>>        br_ifinfo_notify(RTM_NEWLINK, p);
>> }
>>
>> This is misleading, so a proper errcode has to be produced.

> True, but you also change the expected behaviour because now a user can
> clear all vlans with one request (1 - 4094), and after the change that
> will fail with a partial delete if some vlan was missing.

The issue is more subtle (per Roman above):
Try to delete a vlan (that doesn't exist).
1) It says "success".
2) Worse: Another process listening (bridge monitor?) gets an _event_
 that the vlan has been deleted (when it never existed in the first
 place).

> This has been the behaviour forever and some script might depend on it.
> Also IMO, and as David also mentioned, doing a partial delete is not good.

I think this is a bug (especially the event part).

cheers,
jamal

Re: [PATCH] tracing: bpf: Hide bpf trace events when they are not used

2017-10-12 Thread Steven Rostedt

On Thu, 12 Oct 2017 18:38:36 -0700
Alexei Starovoitov  wrote:

> actually just noticed that xdp tracepoints are not covered by ifdef.
> They depend on bpf_syscall too. So probably makes sense to wrap
> them together.
> bpf tracepoints are not being actively worked on whereas xdp tracepoints
> keep evolving quickly, so the best is probably to go via net-next
> if you don't mind.

Hmm, they didn't trigger a warning, with the exception of
trace_xdp_redirect_map. I have code to check if tracepoints are used or
not, and it appears that the xdp can be used without BPF_SYSCALL.

I don't think they should be wrapped together until we know why they
are used. I can still take this patch and just not touch the xdp ones.

Note, my kernel was using trace_xdp_redirect_map_err,
trace_xdp_redirect_err, trace_xdp_redirect and trace_xdp_exception.

As they did appear.

-- Steve

[PATCH net-next v2 0/4] net: dsa: remove .set_addr

2017-10-12 Thread Vivien Didelot

An Ethernet switch may support having a MAC address, which can be used
as the switch's source address in transmitted full-duplex Pause frames.

If a DSA switch supports the related .set_addr operation, the DSA core
sets the master's MAC address on the switch.

This won't make sense anymore in a multi-CPU ports system, because there
won't be a unique master device assigned to a switch tree.

Moreover this operation is confusing because it makes the user think
that it could be used to program the switch with the MAC address of the
CPU/management port such that MAC address learning can be disabled on
said port, but in fact, that's not how it is currently used.

To fix this, assign a random MAC address at setup time in the mv88e6060
and mv88e6xxx drivers before removing .set_addr completely from DSA.

Changes in v2:
  - remove .set_addr implementation from drivers and use a random MAC.

Vivien Didelot (4):
  net: dsa: mv88e6xxx: setup random mac address
  net: dsa: mv88e6060: setup random mac address
  net: dsa: dsa_loop: remove .set_addr
  net: dsa: remove .set_addr

 drivers/net/dsa/dsa_loop.c   |  8 
 drivers/net/dsa/mv88e6060.c  | 30 +++---
 drivers/net/dsa/mv88e6xxx/chip.c | 33 +
 include/net/dsa.h|  1 -
 net/dsa/dsa2.c   |  6 --
 net/dsa/legacy.c |  6 --
 6 files changed, 36 insertions(+), 48 deletions(-)

-- 
2.14.2

[PATCH net-next v2 3/4] net: dsa: dsa_loop: remove .set_addr

2017-10-12 Thread Vivien Didelot

The .set_addr function does nothing, remove the dsa_loop implementation
before getting rid of it completely in DSA.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/dsa_loop.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/drivers/net/dsa/dsa_loop.c b/drivers/net/dsa/dsa_loop.c
index d55051abf4ed..3a3f4f7ba364 100644
--- a/drivers/net/dsa/dsa_loop.c
+++ b/drivers/net/dsa/dsa_loop.c
@@ -110,13 +110,6 @@ static void dsa_loop_get_ethtool_stats(struct dsa_switch *ds, int port,
data[i] = ps->ports[port].mib[i].val;
 }
 
-static int dsa_loop_set_addr(struct dsa_switch *ds, u8 *addr)
-{
-   dev_dbg(ds->dev, "%s\n", __func__);
-
-   return 0;
-}
-
 static int dsa_loop_phy_read(struct dsa_switch *ds, int port, int regnum)
 {
struct dsa_loop_priv *ps = ds->priv;
@@ -263,7 +256,6 @@ static const struct dsa_switch_ops dsa_loop_driver = {
.get_strings= dsa_loop_get_strings,
.get_ethtool_stats  = dsa_loop_get_ethtool_stats,
.get_sset_count = dsa_loop_get_sset_count,
-   .set_addr   = dsa_loop_set_addr,
.phy_read   = dsa_loop_phy_read,
.phy_write  = dsa_loop_phy_write,
.port_bridge_join   = dsa_loop_port_bridge_join,
-- 
2.14.2

[PATCH net-next v2 2/4] net: dsa: mv88e6060: setup random mac address

2017-10-12 Thread Vivien Didelot

As for mv88e6xxx, setup the switch from within the mv88e6060 driver with
a random MAC address, and remove the .set_addr implementation.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6060.c | 30 +++---
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/net/dsa/mv88e6060.c b/drivers/net/dsa/mv88e6060.c
index 621cdc46ad81..2f9d5e6a0f97 100644
--- a/drivers/net/dsa/mv88e6060.c
+++ b/drivers/net/dsa/mv88e6060.c
@@ -9,6 +9,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -188,6 +189,20 @@ static int mv88e6060_setup_port(struct dsa_switch *ds, int p)
return 0;
 }
 
+static int mv88e6060_setup_addr(struct dsa_switch *ds)
+{
+   u8 addr[ETH_ALEN];
+
+   eth_random_addr(addr);
+
+   /* Use the same MAC Address as FD Pause frames for all ports */
+   REG_WRITE(REG_GLOBAL, GLOBAL_MAC_01, (addr[0] << 9) | addr[1]);
+   REG_WRITE(REG_GLOBAL, GLOBAL_MAC_23, (addr[2] << 8) | addr[3]);
+   REG_WRITE(REG_GLOBAL, GLOBAL_MAC_45, (addr[4] << 8) | addr[5]);
+
+   return 0;
+}
+
 static int mv88e6060_setup(struct dsa_switch *ds)
 {
int ret;
@@ -203,6 +218,10 @@ static int mv88e6060_setup(struct dsa_switch *ds)
if (ret < 0)
return ret;
 
+   ret = mv88e6060_setup_addr(ds);
+   if (ret < 0)
+   return ret;
+
for (i = 0; i < MV88E6060_PORTS; i++) {
ret = mv88e6060_setup_port(ds, i);
if (ret < 0)
@@ -212,16 +231,6 @@ static int mv88e6060_setup(struct dsa_switch *ds)
return 0;
 }
 
-static int mv88e6060_set_addr(struct dsa_switch *ds, u8 *addr)
-{
-   /* Use the same MAC Address as FD Pause frames for all ports */
-   REG_WRITE(REG_GLOBAL, GLOBAL_MAC_01, (addr[0] << 9) | addr[1]);
-   REG_WRITE(REG_GLOBAL, GLOBAL_MAC_23, (addr[2] << 8) | addr[3]);
-   REG_WRITE(REG_GLOBAL, GLOBAL_MAC_45, (addr[4] << 8) | addr[5]);
-
-   return 0;
-}
-
 static int mv88e6060_port_to_phy_addr(int port)
 {
if (port >= 0 && port < MV88E6060_PORTS)
@@ -256,7 +265,6 @@ static const struct dsa_switch_ops mv88e6060_switch_ops = {
.get_tag_protocol = mv88e6060_get_tag_protocol,
.probe  = mv88e6060_drv_probe,
.setup  = mv88e6060_setup,
-   .set_addr   = mv88e6060_set_addr,
.phy_read   = mv88e6060_phy_read,
.phy_write  = mv88e6060_phy_write,
 };
-- 
2.14.2
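
For readers unfamiliar with what a "random MAC address" means here, the following is a minimal userspace sketch, not the driver code. It assumes the documented behaviour of the kernel's eth_random_addr() (random bytes, multicast bit cleared, locally administered bit set). The names random_mac(), pack_mac_pair() and MAC_LEN are hypothetical, and rand() merely stands in for the kernel's random source.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAC_LEN 6 /* hypothetical name; same value as the kernel's ETH_ALEN */

/* Fill addr with a random, locally administered, unicast MAC address.
 * Assumption: this mirrors what eth_random_addr() does in the kernel;
 * rand() is only a stand-in for the kernel's random byte source. */
static void random_mac(uint8_t addr[MAC_LEN])
{
	for (int i = 0; i < MAC_LEN; i++)
		addr[i] = (uint8_t)(rand() & 0xff);

	addr[0] &= 0xfe; /* clear the multicast (group) bit */
	addr[0] |= 0x02; /* set the locally administered bit */
}

/* Pack two address bytes into one 16-bit register value, high byte first. */
static uint16_t pack_mac_pair(uint8_t hi, uint8_t lo)
{
	return (uint16_t)((hi << 8) | lo);
}
```

Note that the patch shifts the first byte by 9 rather than 8 when writing GLOBAL_MAC_01; the sketch does not model that register-specific detail and only shows the plain byte-pair packing used for GLOBAL_MAC_23 and GLOBAL_MAC_45.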

[PATCH net-next v2 4/4] net: dsa: remove .set_addr

2017-10-12 Thread Vivien Didelot

Now that there is no user for the .set_addr function, remove it from
DSA. If a switch supports this feature (like mv88e6xxx), the
implementation can be done in the driver setup.

Signed-off-by: Vivien Didelot 
---
 include/net/dsa.h | 1 -
 net/dsa/dsa2.c| 6 --
 net/dsa/legacy.c  | 6 --
 3 files changed, 13 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index ce1d622734d7..2746741f74cf 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -291,7 +291,6 @@ struct dsa_switch_ops {
enum dsa_tag_protocol (*get_tag_protocol)(struct dsa_switch *ds);
 
int (*setup)(struct dsa_switch *ds);
-   int (*set_addr)(struct dsa_switch *ds, u8 *addr);
u32 (*get_phy_flags)(struct dsa_switch *ds, int port);
 
/*
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 54ed054777bd..6ac9e11d385c 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -336,12 +336,6 @@ static int dsa_ds_apply(struct dsa_switch_tree *dst, struct dsa_switch *ds)
if (err)
return err;
 
-   if (ds->ops->set_addr) {
-   err = ds->ops->set_addr(ds, dst->cpu_dp->netdev->dev_addr);
-   if (err < 0)
-   return err;
-   }
-
if (!ds->slave_mii_bus && ds->ops->phy_read) {
ds->slave_mii_bus = devm_mdiobus_alloc(ds->dev);
if (!ds->slave_mii_bus)
diff --git a/net/dsa/legacy.c b/net/dsa/legacy.c
index 19ff6e0a21dc..b0fefbffe082 100644
--- a/net/dsa/legacy.c
+++ b/net/dsa/legacy.c
@@ -172,12 +172,6 @@ static int dsa_switch_setup_one(struct dsa_switch *ds,
if (ret)
return ret;
 
-   if (ops->set_addr) {
-   ret = ops->set_addr(ds, master->dev_addr);
-   if (ret < 0)
-   return ret;
-   }
-
if (!ds->slave_mii_bus && ops->phy_read) {
ds->slave_mii_bus = devm_mdiobus_alloc(ds->dev);
if (!ds->slave_mii_bus)
-- 
2.14.2

[PATCH net-next v2 1/4] net: dsa: mv88e6xxx: setup random mac address

2017-10-12 Thread Vivien Didelot

An Ethernet switch may support having a MAC address, which can be used
as the switch's source address in transmitted full-duplex Pause frames.

If a DSA switch supports the related .set_addr operation, the DSA core
sets the master's MAC address on the switch. This won't make sense
anymore in a multi-CPU ports system, because there won't be a unique
master device assigned to a switch tree.

Instead, setup the switch from within the Marvell driver with a random
MAC address, and remove the .set_addr implementation.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx/chip.c | 33 +
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index d74c7335c512..76cf383dcf90 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -932,6 +932,19 @@ static int mv88e6xxx_irl_setup(struct mv88e6xxx_chip *chip)
return 0;
 }
 
+static int mv88e6xxx_mac_setup(struct mv88e6xxx_chip *chip)
+{
+   if (chip->info->ops->set_switch_mac) {
+   u8 addr[ETH_ALEN];
+
+   eth_random_addr(addr);
+
+   return chip->info->ops->set_switch_mac(chip, addr);
+   }
+
+   return 0;
+}
+
 static int mv88e6xxx_pvt_map(struct mv88e6xxx_chip *chip, int dev, int port)
 {
u16 pvlan = 0;
@@ -2013,6 +2026,10 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
if (err)
goto unlock;
 
+   err = mv88e6xxx_mac_setup(chip);
+   if (err)
+   goto unlock;
+
err = mv88e6xxx_phy_setup(chip);
if (err)
goto unlock;
@@ -2043,21 +2060,6 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
return err;
 }
 
-static int mv88e6xxx_set_addr(struct dsa_switch *ds, u8 *addr)
-{
-   struct mv88e6xxx_chip *chip = ds->priv;
-   int err;
-
-   if (!chip->info->ops->set_switch_mac)
-   return -EOPNOTSUPP;
-
-   mutex_lock(&chip->reg_lock);
-   err = chip->info->ops->set_switch_mac(chip, addr);
-   mutex_unlock(&chip->reg_lock);
-
-   return err;
-}
-
 static int mv88e6xxx_mdio_read(struct mii_bus *bus, int phy, int reg)
 {
struct mv88e6xxx_mdio_bus *mdio_bus = bus->priv;
@@ -3785,7 +3787,6 @@ static const struct dsa_switch_ops mv88e6xxx_switch_ops = {
.probe  = mv88e6xxx_drv_probe,
.get_tag_protocol   = mv88e6xxx_get_tag_protocol,
.setup  = mv88e6xxx_setup,
-   .set_addr   = mv88e6xxx_set_addr,
.adjust_link= mv88e6xxx_adjust_link,
.get_strings= mv88e6xxx_get_strings,
.get_ethtool_stats  = mv88e6xxx_get_ethtool_stats,
-- 
2.14.2

Re: [PATCH net-next 3/3] sched: act: ife: update parameters via rcu handling

2017-10-12 Thread Jamal Hadi Salim


On 17-10-11 05:16 PM, Alexander Aring wrote:

This patch changes the parameter updating to go via RCU instead of being
protected by a spinlock. This reduces the time that the spinlock is held.

Signed-off-by: Alexander Aring 


Acked-by: Jamal Hadi Salim 

cheers,
jamal

Re: [PATCH net-next 2/3] sched: act: ife: migrate to use per-cpu counters

2017-10-12 Thread Jamal Hadi Salim


On 17-10-11 05:16 PM, Alexander Aring wrote:

This patch migrates the current counter handling, which is protected by a
spinlock, to per-cpu counter handling. This reduces the time during which
the spinlock is held.

Signed-off-by: Alexander Aring 


Acked-by: Jamal Hadi Salim 

cheers,
jamal

Re: [PATCH] tracing: bpf: Hide bpf trace events when they are not used

2017-10-12 Thread Alexei Starovoitov

On Thu, Oct 12, 2017 at 09:35:01PM -0400, Steven Rostedt wrote:
> On Thu, 12 Oct 2017 18:14:52 -0700
> Alexei Starovoitov  wrote:
> 
> > On Thu, Oct 12, 2017 at 06:40:02PM -0400, Steven Rostedt wrote:
> > > From: Steven Rostedt (VMware) 
> > > 
> > > All the trace events defined in include/trace/events/bpf.h are only
> > > used when CONFIG_BPF_SYSCALL is defined. But this file gets included by
> > > include/linux/bpf_trace.h which is included by the networking code with
> > > CREATE_TRACE_POINTS defined.
> > > 
> > > If a trace event is created but not used it still has data structures
> > > and functions created for its use, even though nothing is using them.
> > > To not waste space, do not define the BPF trace events in bpf.h unless
> > > CONFIG_BPF_SYSCALL is defined.
> > > 
> > > Signed-off-by: Steven Rostedt (VMware)   
> > 
> > Looks fine.
> > Acked-by: Alexei Starovoitov 
> > 
> > I'm assuming you want to take it through tracing tree along
> > with all other cleanups?
> 
> Either way is fine. I have a few other ones. I believe Paul is taking
> the RCU patch. There's no dependency.
> 
> I'll take it if it is easier for you. I just need the ack.

Actually, I just noticed that the xdp tracepoints are not covered by the
ifdef. They depend on bpf_syscall too, so it probably makes sense to wrap
them together.
bpf tracepoints are not being actively worked on whereas xdp tracepoints
keep evolving quickly, so the best is probably to go via net-next
if you don't mind.

Re: [PATCH net-next 1/3] sched: act: ife: move encode/decode check to init

2017-10-12 Thread Jamal Hadi Salim


On 17-10-11 05:16 PM, Alexander Aring wrote:

This patch adds a check for the two possible ife handlings, encode
and decode, to the init callback. The decode value exists for usability
and is used in userspace code only. The current code offers only encode
or else decode; this patch rejects any other option.

Signed-off-by: Alexander Aring 


Acked-by: Jamal Hadi Salim 

cheers,
jamal

Re: [PATCH] tracing: bpf: Hide bpf trace events when they are not used

2017-10-12 Thread Steven Rostedt

On Thu, 12 Oct 2017 18:14:52 -0700
Alexei Starovoitov  wrote:

> On Thu, Oct 12, 2017 at 06:40:02PM -0400, Steven Rostedt wrote:
> > From: Steven Rostedt (VMware) 
> > 
> > All the trace events defined in include/trace/events/bpf.h are only
> > used when CONFIG_BPF_SYSCALL is defined. But this file gets included by
> > include/linux/bpf_trace.h which is included by the networking code with
> > CREATE_TRACE_POINTS defined.
> > 
> > If a trace event is created but not used it still has data structures
> > and functions created for its use, even though nothing is using them.
> > To not waste space, do not define the BPF trace events in bpf.h unless
> > CONFIG_BPF_SYSCALL is defined.
> > 
> > Signed-off-by: Steven Rostedt (VMware)   
> 
> Looks fine.
> Acked-by: Alexei Starovoitov 
> 
> I'm assuming you want to take it through tracing tree along
> with all other cleanups?

Either way is fine. I have a few other ones. I believe Paul is taking
the RCU patch. There's no dependency.

I'll take it if it is easier for you. I just need the ack.

-- Steve

Re: [PATCH 2/6] ath9k: add a quirk to set use_msi automatically

2017-10-12 Thread Daniel Drake

On Fri, Oct 13, 2017 at 9:12 AM, AceLan Kao  wrote:
> Hi Daniel,
>
> After applying the 2 commits you mentioned in the email, ath9k works.
>
> https://marc.info/?l=linux-wireless&m=150631274108016&w=2
> https://github.com/endlessm/linux/commit/739c7a924db8f4434a9617657

Thanks for testing. However the approach was basically rejected in this thread:
  [PATCH] PCI MSI: allow alignment restrictions on vector allocation
  https://marc.info/?t=15063128321&r=1&w=2

So we still need an upstream solution.

I am curious what Qualcomm have to say about their hardware corrupting
the MSI Message Data value. Is there any news on them submitting the
MSI support patch?

Separately we have the option of seeing if Intel can help us unblock
the legacy interrupt (assuming it was simply blocked by the BIOS), or
adding an interrupt-polling fallback path to ath9k.

Daniel

Re: [PATCH] tracing: bpf: Hide bpf trace events when they are not used

2017-10-12 Thread Alexei Starovoitov

On Thu, Oct 12, 2017 at 06:40:02PM -0400, Steven Rostedt wrote:
> From: Steven Rostedt (VMware) 
> 
> All the trace events defined in include/trace/events/bpf.h are only
> used when CONFIG_BPF_SYSCALL is defined. But this file gets included by
> include/linux/bpf_trace.h which is included by the networking code with
> CREATE_TRACE_POINTS defined.
> 
> If a trace event is created but not used it still has data structures
> and functions created for its use, even though nothing is using them.
> To not waste space, do not define the BPF trace events in bpf.h unless
> CONFIG_BPF_SYSCALL is defined.
> 
> Signed-off-by: Steven Rostedt (VMware) 

Looks fine.
Acked-by: Alexei Starovoitov 

I'm assuming you want to take it through tracing tree along
with all other cleanups?

Re: [PATCH] Add -target to clang switch while cross compiling.

2017-10-12 Thread Alexei Starovoitov

On Thu, Oct 12, 2017 at 04:58:43PM -0700, Abhijit Ayarekar wrote:
> Update to llvm excludes assembly instructions.
> llvm git revision is below
> 
> commit 2865ab6996164e7854d55c9e21c065fad7c26569
> Author: Yonghong Song 
> Date:   Mon Sep 18 23:29:36 2017 +
> 
> bpf: add inline-asm support
> 
> Signed-off-by: Yonghong Song 
> Acked-by: Alexei Starovoitov 
> 
> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@313593 91177308-0d34-0410-b5e6-96231b3b80d8

The correct way to reference a commit is
commit 81eb8447daae ("ipv6: take care of rt6_stats")
Also I think the git-svn-id link is broken. I'm not sure why llvm keeps
adding it to commits. The kernel's git history doesn't need them.
So just mention the llvm commit in the kernel's canonical way and
mention that it will be part of llvm release 6.0.

Re: [PATCH 2/6] ath9k: add a quirk to set use_msi automatically

2017-10-12 Thread AceLan Kao

Hi Daniel,

After applying the 2 commits you mentioned in the email, ath9k works.

https://marc.info/?l=linux-wireless&m=150631274108016&w=2
https://github.com/endlessm/linux/commit/739c7a924db8f4434a9617657

Best regards,
AceLan Kao.

2017-10-05 14:39 GMT+08:00 AceLan Kao :
> Hi all,
>
> Please drop my patches, Qualcomm is working internally and will submit
> the MSI patch by themselves.
> Thanks.
>
> Hi Daniel,
>
> I'll try your patches tomorrow.
>
> Best regards,
> AceLan Kao.
>
> 2017-10-02 12:21 GMT+08:00 Daniel Drake :
>> Hi AceLan,
>>
>> On Thu, Sep 28, 2017 at 4:28 PM, AceLan Kao  wrote:
>>> Hi Daniel,
>>>
>>> I've tried your patch, but it doesn't work for me.
>>> Wifi can scan AP, but can't get connected.
>>
>> Can you please clarify which patch(es) you have tried?
>>
>> This is the base patch which adds the infrastructure to request
>> specific MSI IRQ vectors:
>> https://marc.info/?l=linux-wireless&m=150631274108016&w=2
>>
>> This is the ath9k MSI patch which makes use of that:
>> https://github.com/endlessm/linux/commit/739c7a924db8f4434a9617657
>>
>> If you were already able to use ath9k MSI interrupts without specific
>> consideration for which MSI vector numbers were used, these are the
>> possible explanations that spring to mind:
>>
>> 1. You got lucky and it picked a vector number that is 4-aligned. You
>> can check this in the "lspci -vvv" output. You'll see something like:
>> Capabilities: [50] MSI: Enable+ Count=1/4 Maskable+ 64bit+
>> Address: fee0300c  Data: 4142
>> The lower number is the vector number. In my example here 0x42 (66) is
>> not 4-aligned so the failure condition will be hit.
>>
>> 2. You are using interrupt remapping, which I suspect may provide a
>> high likelihood of MSI interrupt vectors being 4-aligned. See if
>> /proc/interrupts shows the IRQ type as IR-PCI-MSI
>> Unfortunately interrupt remapping is not available here,
>> https://lists.linuxfoundation.org/pipermail/iommu/2017-August/023717.html
>>
>> 3. My assumption that all ath9k hardware corrupts the MSI vector
>> number could be wrong. However we've seen this on different wifi modules
>> in laptops produced by different OEMs and ODMs, so it seems to be a
>> somewhat widespread problem at least.
>>
>> 4. My assumption that ath9k hardware is corrupting the MSI vector
>> number could be wrong; maybe another component is to blame, could it
>> be a BIOS issue? Admittedly I don't really know how I can debug the
>> layers in between seeing the MSI Message Data value disagree with the
>> vector number being handled inside do_IRQ().
>>
>> Daniel
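
The alignment check described in point 1 of the quoted mail can be sketched in a few lines of C. This is only an illustration under the assumption stated in the mail, namely that the low byte of the MSI Message Data is the vector number; msi_vector() and msi_vector_is_aligned() are hypothetical helper names, not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption from the mail: the low 8 bits of the MSI Message Data
 * shown by "lspci -vvv" are the interrupt vector number. */
static uint8_t msi_vector(uint16_t msi_data)
{
	return (uint8_t)(msi_data & 0xff);
}

/* True when the vector is a multiple of align (4 in the ath9k case). */
static bool msi_vector_is_aligned(uint8_t vector, unsigned int align)
{
	return (vector % align) == 0;
}
```

With the "Data: 4142" value from the example, msi_vector(0x4142) gives 0x42 (66), and 66 is not a multiple of 4, which matches the failure condition described above.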

Re: [patch net-next 27/34] nfp: bpf: Convert ndo_setup_tc offloads to block callbacks

2017-10-12 Thread Jakub Kicinski

On Thu, 12 Oct 2017 19:18:16 +0200, Jiri Pirko wrote:
> diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
> index a88bb5b..9e9af88 100644
> --- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
> +++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
> @@ -246,6 +246,10 @@ int nfp_net_bpf_offload(struct nfp_net *nn, struct tc_cls_bpf_offload *cls_bpf)
>   void *code;
>   int err;
>  
> + if (cls_bpf->common.protocol != htons(ETH_P_ALL) ||
> + cls_bpf->common.chain_index)
> + return -EOPNOTSUPP;
> +
>   max_instr = nn_readw(nn, NFP_NET_CFG_BPF_MAX_LEN);
>  
>   switch (cls_bpf->command) {

It is certainly very ugly but I send a fake struct tc_cls_bpf_offload
here for XDP.  Refactoring this mess is pretty high on my priority list
but one way or the other this function will be called from XDP so TC
checks must stay in the TC handler... :(

Re: [RFC] Support for UNARP (RFC 1868)

2017-10-12 Thread महेश बंडेवार

On Thu, Oct 12, 2017 at 4:06 PM, Girish Moodalbail
 wrote:
> Hello Eric,
>
> The basic idea is to mark the ARP entry either FAILED or STALE as soon as we
> can so that the subsequent packets that depend on that ARP entry will take
> the slow path (neigh_resolve_output()).
>
> Say, if base_reachable_time is 30 seconds, then an ARP entry will be in
> reachable state somewhere between 15 to 45 seconds. Assuming the worst case,
> the ARP entry will be in REACHABLE state for 45 seconds and the packets
> continue to traverse the network towards the source machine and get dropped
> there since the VM has moved to the destination machine.
>
> Instead, based on the received UNARP packet if we mark the ARP entry
>
> (a) FAILED
>- we move to INCOMPLETE state and start the address resolution by sending
>  out ARP packets (up to allowed maximum number) until we get ARP response
>  back at which point we move the ARP entry state to reachable.
>
> (b) STALE
>- we move to DELAY state and set the next timer to DELAY_PROBE_TIME
>  (1 second) and continue to send queued packets in arp_queue.
>- After 1 sec we move to PROBE state and start the address resolution like
>  in the case (a) above.
>
> I was leaning towards (a).
One could arbitrarily increase the stale timeout (by changing the number
of probes), so the sender will continue sending traffic to something that
has already gone away. STALE doesn't mean bad, but here the sender is
clearly indicating it's going away, so FAILED seems to be the only
logical option.

> Please see in-line..
>
> 
>
>>
>> Hi Girish
>>
>> Your description (or patch title) is misleading. You apparently
>> implement the receive side of the RFC.
>
>
> You are right, it implements only the receive side of the RFC. If this RFC
> is accepted, then we can change arping(8) to generate UNARP requests. We
> could also add an option to ip-address(8) delete subcommand to generate
> UNARP whenever an address is deleted from the interface.
>
>> And the RFC had Proxy ARP in mind.
>>
>> What about security implications ?
>
>
> Yes, this feature will extend the attack surface for L2 networks. However,
> the attack vectors for this feature should be same as that of the gratuitous
> ARP, right? The same attack mitigation techniques for gratuitous ARPs are
> equally applicable here.
>
>> Will TCP flows be terminated, instead
>> of being smoothly migrated (TCP_REPAIR)
>
>
> The TCP flows will not be terminated. Upon receiving UNARP packet, the ARP
> entry will be marked FAILED. The subsequent TCP packets from the client
> (towards that IP) will be queued (the first 3 packets in arp_queue and then
> other packets get dropped) until the IP address is resolved again through
> the slow path neigh_resolve_output().
>
> The slow path marks the entry as INCOMPLETE and will start sending several
> ARP requests (ucast_solicit + app_solicit + mcast_solicit) to resolve the
> IP. If the resolution is successful, then the TCP packets will be sent out.
> If not, we will invalidate the ARP entry and call arp_error_report() on the
> queued packets (which will end up sending ICMP_HOST_UNREACH error). This
> behavior is same as what will occur if TCP server disappears in the middle
> of a connection.
>
>>
>> What about IPv6 ? Or maybe more abruptly, do we still need to add
>> features to IPv4 in 2017,  22 years after this RFC came ? ;)
>
>
> Legit question :). Well one thing I have seen in Networking is that an old
> idea circles back around later and turns out to be useful in new contexts
> and use cases. Like I enumerated in my initial email there are certain use
> cases in Cloud that might benefit from UNARP.
>
It doesn't make sense to have this implemented only for IPv4. At this time, if
an equivalent IPv6 feature is missing, I don't see it being useful / acceptable.

> regards,
> ~Girish
>
>>
>> Thanks.
>>
>>
>
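
To make the two options (a) and (b) discussed in this thread easier to compare, here is a small sketch of only the state transitions named in the quoted mail. It is not the kernel's neighbour code: the ST_* and EV_* names are hypothetical (the kernel uses NUD_* flags), and transitions the mail does not mention are deliberately left out.

```c
/* Hypothetical names; the kernel's own states are the NUD_* flags. */
enum neigh_state {
	ST_INCOMPLETE,
	ST_REACHABLE,
	ST_STALE,
	ST_DELAY,
	ST_PROBE,
	ST_FAILED,
};

enum neigh_event {
	EV_PACKET_TO_SEND, /* a packet needs this neighbour entry */
	EV_DELAY_TIMER,    /* DELAY_PROBE_TIME (1 second) expired */
	EV_ARP_REPLY,      /* an ARP response came back */
};

/* Next state for the transitions described in the mail; any other
 * (state, event) pair leaves the state unchanged in this sketch. */
static enum neigh_state neigh_next(enum neigh_state s, enum neigh_event e)
{
	switch (s) {
	case ST_FAILED:     /* option (a): restart resolution */
		return e == EV_PACKET_TO_SEND ? ST_INCOMPLETE : s;
	case ST_INCOMPLETE:
		return e == EV_ARP_REPLY ? ST_REACHABLE : s;
	case ST_STALE:      /* option (b): keep sending, then probe */
		return e == EV_PACKET_TO_SEND ? ST_DELAY : s;
	case ST_DELAY:
		return e == EV_DELAY_TIMER ? ST_PROBE : s;
	case ST_PROBE:
		return e == EV_ARP_REPLY ? ST_REACHABLE : s;
	default:
		return s;
	}
}

/* Bounds of the reachable window for a given base time, in seconds,
 * as used in the "15 to 45 seconds" example for a base of 30. */
static unsigned int reachable_min(unsigned int base) { return base / 2; }
static unsigned int reachable_max(unsigned int base) { return base + base / 2; }
```

In this sketch, option (a) reaches REACHABLE after two events (packet, reply), while option (b) needs three (packet, delay timer, reply), which is the extra second of stale traffic the thread is arguing about.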

[next-queue PATCH v7 4/6] net/sched: Introduce Credit Based Shaper (CBS) qdisc

2017-10-12 Thread Vinicius Costa Gomes

This queueing discipline implements the shaper algorithm defined by
the 802.1Q-2014 Section 8.6.8.2 and detailed in Annex L.

Its primary usage is to apply some bandwidth reservation to user
defined traffic classes, which are mapped to different queues via the
mqprio qdisc.

Only a simple software implementation is added for now.

Signed-off-by: Vinicius Costa Gomes 
Signed-off-by: Jesus Sanchez-Palencia 
---
 include/uapi/linux/pkt_sched.h |  18 +++
 net/sched/Kconfig  |  11 ++
 net/sched/Makefile |   1 +
 net/sched/sch_cbs.c| 314 +
 4 files changed, 344 insertions(+)
 create mode 100644 net/sched/sch_cbs.c

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 099bf5528fed..41e349df4bf4 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -871,4 +871,22 @@ struct tc_pie_xstats {
__u32 maxq; /* maximum queue size */
__u32 ecn_mark; /* packets marked with ecn*/
 };
+
+/* CBS */
+struct tc_cbs_qopt {
+   __u8 offload;
+   __s32 hicredit;
+   __s32 locredit;
+   __s32 idleslope;
+   __s32 sendslope;
+};
+
+enum {
+   TCA_CBS_UNSPEC,
+   TCA_CBS_PARMS,
+   __TCA_CBS_MAX,
+};
+
+#define TCA_CBS_MAX (__TCA_CBS_MAX - 1)
+
 #endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index e70ed26485a2..c03d86a7775e 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -172,6 +172,17 @@ config NET_SCH_TBF
  To compile this code as a module, choose M here: the
  module will be called sch_tbf.
 
+config NET_SCH_CBS
+   tristate "Credit Based Shaper (CBS)"
+   ---help---
+ Say Y here if you want to use the Credit Based Shaper (CBS) packet
+ scheduling algorithm.
+
+ See the top of  for more details.
+
+ To compile this code as a module, choose M here: the
+ module will be called sch_cbs.
+
 config NET_SCH_GRED
tristate "Generic Random Early Detection (GRED)"
---help---
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 7b915d226de7..80c8f92d162d 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -52,6 +52,7 @@ obj-$(CONFIG_NET_SCH_FQ_CODEL)+= sch_fq_codel.o
 obj-$(CONFIG_NET_SCH_FQ)   += sch_fq.o
 obj-$(CONFIG_NET_SCH_HHF)  += sch_hhf.o
 obj-$(CONFIG_NET_SCH_PIE)  += sch_pie.o
+obj-$(CONFIG_NET_SCH_CBS)  += sch_cbs.o
 
 obj-$(CONFIG_NET_CLS_U32)  += cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)   += cls_route.o
diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
new file mode 100644
index ..0643587e6dc8
--- /dev/null
+++ b/net/sched/sch_cbs.c
@@ -0,0 +1,314 @@
+/*
+ * net/sched/sch_cbs.c Credit Based Shaper
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors:Vinicius Costa Gomes 
+ *
+ */
+
+/* Credit Based Shaper (CBS)
+ * =
+ *
+ * This is a simple rate-limiting shaper aimed at TSN applications on
+ * systems with known traffic workloads.
+ *
+ * Its algorithm is defined by the IEEE 802.1Q-2014 Specification,
+ * Section 8.6.8.2, and explained in more detail in the Annex L of the
+ * same specification.
+ *
+ * There are four tunables to be considered:
+ *
+ * 'idleslope': Idleslope is the rate of credits that is
+ * accumulated (in kilobits per second) when there is at least
+ * one packet waiting for transmission. Packets are transmitted
+ * when the current value of credits is equal or greater than
+ * zero. When there is no packet to be transmitted the amount of
+ * credits is set to zero. This is the main tunable of the CBS
+ * algorithm.
+ *
+ * 'sendslope':
+ * Sendslope is the rate of credits that is depleted (it should be a
+ * negative number of kilobits per second) when a transmission is
+ * occurring. It can be calculated as follows, (IEEE 802.1Q-2014 Section
+ * 8.6.8.2 item g):
+ *
+ * sendslope = idleslope - port_transmit_rate
+ *
+ * 'hicredit': Hicredit defines the maximum amount of credits (in
+ * bytes) that can be accumulated. Hicredit depends on the
+ * characteristics of interfering traffic,
+ * 'max_interference_size' is the maximum size of any burst of
+ * traffic that can delay the transmission of a frame that is
+ * available for transmission for this traffic class, (IEEE
+ * 802.1Q-2014 Annex L, Equation L-3):
+ *
+ * hicredit = max_interference_size * (idleslope / port_transmit_rate)
+ *
+ * 'locredit': Locredit is the minimum amount of credits that can
+ * be reached. It is a function of the traffic flowing through
+ * this qdisc (IEEE 802.1Q-2014 Annex L, Equation
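
The two formulas quoted in full in the sch_cbs.c header comment above (sendslope and hicredit) can be checked with a small userspace sketch. This is an illustration only, not part of the patch: cbs_sendslope() and cbs_hicredit() are hypothetical helper names, rates are assumed to be in kilobits per second and sizes in bytes as the comment states, and the locredit formula is left out because the quoted comment is cut off before it.

```c
#include <assert.h>
#include <stdint.h>

/* sendslope = idleslope - port_transmit_rate (both in kbit/s).
 * The result is negative whenever idleslope is below the port rate. */
static int64_t cbs_sendslope(int64_t idleslope, int64_t port_rate)
{
	return idleslope - port_rate;
}

/* hicredit = max_interference_size * (idleslope / port_transmit_rate),
 * in bytes. The multiplication is done before the division so that
 * integer arithmetic does not truncate the ratio to zero. */
static int64_t cbs_hicredit(int64_t max_interference_size,
			    int64_t idleslope, int64_t port_rate)
{
	assert(port_rate > 0);
	return max_interference_size * idleslope / port_rate;
}
```

As a worked example under assumed numbers: on a 1 Gbit/s port (1000000 kbit/s) with an idleslope of 20000 kbit/s, the sendslope is -980000 kbit/s, and with a maximum interference size of 1500 bytes the hicredit is 1500 * 20000 / 1000000 = 30 bytes.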

[next-queue PATCH v7 3/6] net/sched: Add select_queue() class_ops for mqprio

2017-10-12 Thread Vinicius Costa Gomes

From: Jesus Sanchez-Palencia 

When replacing a child qdisc from mqprio, tc_modify_qdisc() must fetch
the netdev_queue pointer that the current child qdisc is associated
with before creating the new qdisc.

Currently, when using mqprio as root qdisc, the kernel will end up
getting the queue #0 pointer from the mqprio (root qdisc), which leaves
any new child qdisc with a possibly wrong netdev_queue pointer.

Implementing the Qdisc_class_ops select_queue() on mqprio fixes this
issue and avoids an inconsistent state when child qdiscs are replaced.

Signed-off-by: Jesus Sanchez-Palencia 
---
 net/sched/sch_mqprio.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 6bcdfe6e7b63..8c042ae323e3 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -396,6 +396,12 @@ static void mqprio_walk(struct Qdisc *sch, struct qdisc_walker *arg)
}
 }
 
+static struct netdev_queue *mqprio_select_queue(struct Qdisc *sch,
+   struct tcmsg *tcm)
+{
+   return mqprio_queue_get(sch, TC_H_MIN(tcm->tcm_parent));
+}
+
 static const struct Qdisc_class_ops mqprio_class_ops = {
.graft  = mqprio_graft,
.leaf   = mqprio_leaf,
@@ -403,6 +409,7 @@ static const struct Qdisc_class_ops mqprio_class_ops = {
.walk   = mqprio_walk,
.dump   = mqprio_dump_class,
.dump_stats = mqprio_dump_class_stats,
+   .select_queue   = mqprio_select_queue,
 };
 
 static struct Qdisc_ops mqprio_qdisc_ops __read_mostly = {
-- 
2.14.2
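
The new mqprio_select_queue() relies on TC_H_MIN() to take the minor part of the parent handle. As a rough illustration of that handle split, here is a standalone sketch. The masks are the same ones used by TC_H_MAJ() and TC_H_MIN() in <linux/pkt_sched.h>, but they are redefined here under hypothetical names (HANDLE_MAJ, HANDLE_MIN, handle_major, handle_minor) so the example builds without kernel headers. How mqprio_queue_get() maps the minor number to a queue is not modelled.

```c
#include <stdint.h>

/* Hypothetical stand-ins for TC_H_MAJ()/TC_H_MIN(): a 32-bit tc handle
 * carries the major number in the top 16 bits and the minor number in
 * the bottom 16 bits. */
#define HANDLE_MAJ(h) ((h) & 0xFFFF0000U)
#define HANDLE_MIN(h) ((h) & 0x0000FFFFU)

/* Minor number of a handle, e.g. 2 for the handle written "100:2". */
static uint32_t handle_minor(uint32_t handle)
{
	return HANDLE_MIN(handle);
}

/* Major number of a handle, shifted down for readability. */
static uint32_t handle_major(uint32_t handle)
{
	return HANDLE_MAJ(handle) >> 16;
}
```

For a parent handle written as "100:2" in tc (both parts are hexadecimal), the 32-bit value is 0x01000002, so handle_major() yields 0x100 and handle_minor() yields 2.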

[next-queue PATCH v7 1/6] net/sched: Check for null dev_queue on create flow

2017-10-12 Thread Vinicius Costa Gomes

From: Jesus Sanchez-Palencia 

In qdisc_alloc() the dev_queue pointer was used without any checks
being performed. If qdisc_create() gets a null dev_queue pointer, it
just passes it along to qdisc_alloc(), leading to a crash. That
happens if a root qdisc implements select_queue() and returns a null
dev_queue pointer for an "invalid handle", for example, or if the
dev_queue associated with the parent qdisc is null.

This patch is in preparation for the next in this series, where
select_queue() is added to mqprio and may return a null
dev_queue.

Signed-off-by: Jesus Sanchez-Palencia 
---
 net/sched/sch_generic.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index a0a198768aad..de2408f1ccd3 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -603,8 +603,14 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
struct Qdisc *sch;
unsigned int size = QDISC_ALIGN(sizeof(*sch)) + ops->priv_size;
int err = -ENOBUFS;
-   struct net_device *dev = dev_queue->dev;
+   struct net_device *dev;
+
+   if (!dev_queue) {
+   err = -EINVAL;
+   goto errout;
+   }
 
+   dev = dev_queue->dev;
p = kzalloc_node(size, GFP_KERNEL,
 netdev_queue_numa_node_read(dev_queue));
 
-- 
2.14.2
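
The shape of the fix above (validate the pointer first, only then dereference it, and leave through a single error label) can be shown outside the kernel. The sketch below is a userspace analogue under stated assumptions: struct queue, struct sched and sched_alloc() are hypothetical, calloc() stands in for kzalloc_node(), and the error is returned through an out parameter rather than the kernel's ERR_PTR() convention.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct netdev_queue and struct Qdisc. */
struct queue {
	int numa_node;
};

struct sched {
	struct queue *q;
};

/* Allocate a sched bound to q. On failure, return NULL and store a
 * negative errno value in *err; on success, *err is set to 0. */
static struct sched *sched_alloc(struct queue *q, int *err)
{
	struct sched *s;

	/* Check before any use of q, as the patch does for dev_queue. */
	if (!q) {
		*err = -EINVAL;
		goto errout;
	}

	s = calloc(1, sizeof(*s));
	if (!s) {
		*err = -ENOBUFS;
		goto errout;
	}

	s->q = q;
	*err = 0;
	return s;

errout:
	return NULL;
}
```

The point of the ordering is that the dereference moves from the declaration (where it ran unconditionally) to after the check, so a null queue produces -EINVAL instead of a crash.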

[next-queue PATCH v7 5/6] net/sched: Add support for HW offloading for CBS

2017-10-12 Thread Vinicius Costa Gomes

This adds support for offloading the CBS algorithm to the controller,
if supported. Drivers wanting to support CBS offload must implement
the .ndo_setup_tc callback and handle the TC_SETUP_CBS (introduced
here) type.

Signed-off-by: Vinicius Costa Gomes 
---
 include/linux/netdevice.h |   1 +
 include/net/pkt_sched.h   |   9 
 net/sched/sch_cbs.c   | 104 --
 3 files changed, 102 insertions(+), 12 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 31bb3010c69b..1f6c44ef5b21 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -775,6 +775,7 @@ enum tc_setup_type {
TC_SETUP_CLSFLOWER,
TC_SETUP_CLSMATCHALL,
TC_SETUP_CLSBPF,
+   TC_SETUP_CBS,
 };
 
 /* These structures hold the attributes of xdp state that are being passed
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 259bc191ba59..7c597b050b36 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -146,4 +146,13 @@ static inline bool is_classid_clsact_egress(u32 classid)
   TC_H_MIN(classid) == TC_H_MIN(TC_H_MIN_EGRESS);
 }
 
+struct tc_cbs_qopt_offload {
+   u8 enable;
+   s32 queue;
+   s32 hicredit;
+   s32 locredit;
+   s32 idleslope;
+   s32 sendslope;
+};
+
 #endif
diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
index 0643587e6dc8..7d2100c5b8aa 100644
--- a/net/sched/sch_cbs.c
+++ b/net/sched/sch_cbs.c
@@ -68,6 +68,8 @@
 #define BYTES_PER_KBIT (1000 / 8)
 
 struct cbs_sched_data {
+   bool offload;
+   int queue;
s64 port_rate; /* in bytes/s */
s64 last; /* timestamp in ns */
s64 credits; /* in bytes */
@@ -80,6 +82,11 @@ struct cbs_sched_data {
struct sk_buff *(*dequeue)(struct Qdisc *sch);
 };
 
+static int cbs_enqueue_offload(struct sk_buff *skb, struct Qdisc *sch)
+{
+   return qdisc_enqueue_tail(skb, sch);
+}
+
 static int cbs_enqueue_soft(struct sk_buff *skb, struct Qdisc *sch)
 {
struct cbs_sched_data *q = qdisc_priv(sch);
@@ -190,6 +197,11 @@ static struct sk_buff *cbs_dequeue_soft(struct Qdisc *sch)
return skb;
 }
 
+static struct sk_buff *cbs_dequeue_offload(struct Qdisc *sch)
+{
+   return qdisc_dequeue_head(sch);
+}
+
 static struct sk_buff *cbs_dequeue(struct Qdisc *sch)
 {
struct cbs_sched_data *q = qdisc_priv(sch);
@@ -201,14 +213,66 @@ static const struct nla_policy cbs_policy[TCA_CBS_MAX + 1] = {
[TCA_CBS_PARMS] = { .len = sizeof(struct tc_cbs_qopt) },
 };
 
+static void cbs_disable_offload(struct net_device *dev,
+   struct cbs_sched_data *q)
+{
+   struct tc_cbs_qopt_offload cbs = { };
+   const struct net_device_ops *ops;
+   int err;
+
+   if (!q->offload)
+   return;
+
+   q->enqueue = cbs_enqueue_soft;
+   q->dequeue = cbs_dequeue_soft;
+
+   ops = dev->netdev_ops;
+   if (!ops->ndo_setup_tc)
+   return;
+
+   cbs.queue = q->queue;
+   cbs.enable = 0;
+
+   err = ops->ndo_setup_tc(dev, TC_SETUP_CBS, &cbs);
+   if (err < 0)
+   pr_warn("Couldn't disable CBS offload for queue %d\n",
+   cbs.queue);
+}
+
+static int cbs_enable_offload(struct net_device *dev, struct cbs_sched_data *q,
+ const struct tc_cbs_qopt *opt)
+{
+   const struct net_device_ops *ops = dev->netdev_ops;
+   struct tc_cbs_qopt_offload cbs = { };
+   int err;
+
+   if (!ops->ndo_setup_tc)
+   return -EOPNOTSUPP;
+
+   cbs.queue = q->queue;
+
+   cbs.enable = 1;
+   cbs.hicredit = opt->hicredit;
+   cbs.locredit = opt->locredit;
+   cbs.idleslope = opt->idleslope;
+   cbs.sendslope = opt->sendslope;
+
+   err = ops->ndo_setup_tc(dev, TC_SETUP_CBS, &cbs);
+   if (err < 0)
+   return err;
+
+   q->enqueue = cbs_enqueue_offload;
+   q->dequeue = cbs_dequeue_offload;
+
+   return 0;
+}
+
 static int cbs_change(struct Qdisc *sch, struct nlattr *opt)
 {
struct cbs_sched_data *q = qdisc_priv(sch);
struct net_device *dev = qdisc_dev(sch);
struct nlattr *tb[TCA_CBS_MAX + 1];
-   struct ethtool_link_ksettings ecmd;
struct tc_cbs_qopt *qopt;
-   s64 link_speed;
int err;
 
err = nla_parse_nested(tb, TCA_CBS_MAX, opt, cbs_policy, NULL);
@@ -220,23 +284,30 @@ static int cbs_change(struct Qdisc *sch, struct nlattr *opt)
 
qopt = nla_data(tb[TCA_CBS_PARMS]);
 
-   if (qopt->offload)
-   return -EOPNOTSUPP;
+   if (!qopt->offload) {
+   struct ethtool_link_ksettings ecmd;
+   s64 link_speed;
 
-   if (!__ethtool_get_link_ksettings(dev, &ecmd))
-   link_speed = ecmd.base.speed;
-   else
-   link_speed = SPEED_1000;
+   if (!__ethtool_get_link_ksettings(dev, &ecmd))
+   link_speed =

[next-queue PATCH v7 6/6] igb: Add support for CBS offload

2017-10-12 Thread Vinicius Costa Gomes

From: Andre Guedes 

This patch adds support for Credit-Based Shaper (CBS) qdisc offload
from the Traffic Control system. This support enables us to leverage the
Forwarding and Queuing for Time-Sensitive Streams (FQTSS) features
of the Intel i210 Ethernet Controller. FQTSS is the former 802.1Qav
standard, which was merged into 802.1Q in 2014. It enables traffic
prioritization and bandwidth reservation via the Credit-Based Shaper,
which is implemented in hardware by the i210 controller.

The patch introduces the igb_setup_tc() function which implements the
support for CBS qdisc hardware offload in the IGB driver. CBS offload
is the only traffic control offload supported by the driver at the
moment.

The FQTSS transmission mode of the i210 controller is automatically
enabled by the IGB driver when CBS is enabled for the first hardware
queue. Likewise, FQTSS mode is automatically disabled when CBS is
disabled for the last hardware queue. Changing the FQTSS mode requires
a NIC reset.
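
As a rough illustration of the first-queue/last-queue rule described
above, the following Python sketch models the bookkeeping only. It is not
the igb driver code: the class name, the method names and the reset
counter are hypothetical, chosen just to show when a mode change (and
therefore a NIC reset) would be triggered.

```python
class FqtssModel:
    """Toy model of the FQTSS on/off rule; all names here are hypothetical."""

    def __init__(self):
        self.cbs_queues = set()   # hardware queues with CBS currently enabled
        self.fqtss = False        # models the adapter-wide FQTSS mode
        self.resets = 0           # number of NIC resets the model has requested

    def _set_mode(self, enabled):
        # A reset is only needed when the mode actually changes.
        if self.fqtss != enabled:
            self.fqtss = enabled
            self.resets += 1

    def set_cbs(self, queue, enable):
        if enable:
            self.cbs_queues.add(queue)
        else:
            self.cbs_queues.discard(queue)
        # FQTSS is on exactly while at least one queue has CBS enabled.
        self._set_mode(bool(self.cbs_queues))


nic = FqtssModel()
nic.set_cbs(0, True)    # first CBS queue: FQTSS turns on, first reset
nic.set_cbs(1, True)    # second CBS queue: no mode change
nic.set_cbs(0, False)   # one CBS queue left: FQTSS stays on
nic.set_cbs(1, False)   # last CBS queue gone: FQTSS turns off, second reset
```

In this model, enabling or disabling CBS on an additional queue while
another queue still uses it costs no reset; only the two transitions at
the edges do.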

The FQTSS feature is supported by the i210 controller only.

Signed-off-by: Andre Guedes 
---
 drivers/net/ethernet/intel/igb/e1000_defines.h |  23 ++
 drivers/net/ethernet/intel/igb/e1000_regs.h|   8 +
 drivers/net/ethernet/intel/igb/igb.h   |   6 +
 drivers/net/ethernet/intel/igb/igb_main.c  | 347 +
 4 files changed, 384 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h 
b/drivers/net/ethernet/intel/igb/e1000_defines.h
index 1de82f247312..83cabff1e0ab 100644
--- a/drivers/net/ethernet/intel/igb/e1000_defines.h
+++ b/drivers/net/ethernet/intel/igb/e1000_defines.h
@@ -353,7 +353,18 @@
 #define E1000_RXPBS_CFG_TS_EN   0x8000
 
 #define I210_RXPBSIZE_DEFAULT  0x00A2 /* RXPBSIZE default */
+#define I210_RXPBSIZE_MASK 0x003F
+#define I210_RXPBSIZE_PB_32KB  0x0020
 #define I210_TXPBSIZE_DEFAULT  0x0414 /* TXPBSIZE default */
+#define I210_TXPBSIZE_MASK 0xC0FF
+#define I210_TXPBSIZE_PB0_8KB  (8 << 0)
+#define I210_TXPBSIZE_PB1_8KB  (8 << 6)
+#define I210_TXPBSIZE_PB2_4KB  (4 << 12)
+#define I210_TXPBSIZE_PB3_4KB  (4 << 18)
+
+#define I210_DTXMXPKTSZ_DEFAULT0x0098
+
+#define I210_SR_QUEUES_NUM 2
 
 /* SerDes Control */
 #define E1000_SCTL_DISABLE_SERDES_LOOPBACK 0x0400
@@ -1051,4 +1062,16 @@
 #define E1000_VLAPQF_P_VALID(_n)   (0x1 << (3 + (_n) * 4))
 #define E1000_VLAPQF_QUEUE_MASK0x03
 
+/* TX Qav Control fields */
+#define E1000_TQAVCTRL_XMIT_MODE   BIT(0)
+#define E1000_TQAVCTRL_DATAFETCHARBBIT(4)
+#define E1000_TQAVCTRL_DATATRANARB BIT(8)
+
+/* TX Qav Credit Control fields */
+#define E1000_TQAVCC_IDLESLOPE_MASK0x
+#define E1000_TQAVCC_QUEUEMODE BIT(31)
+
+/* Transmit Descriptor Control fields */
+#define E1000_TXDCTL_PRIORITY  BIT(27)
+
 #endif
diff --git a/drivers/net/ethernet/intel/igb/e1000_regs.h 
b/drivers/net/ethernet/intel/igb/e1000_regs.h
index 58adbf234e07..8eee081d395f 100644
--- a/drivers/net/ethernet/intel/igb/e1000_regs.h
+++ b/drivers/net/ethernet/intel/igb/e1000_regs.h
@@ -421,6 +421,14 @@ do { \
 
 #define E1000_I210_FLA 0x1201C
 
+#define E1000_I210_DTXMXPKTSZ  0x355C
+
+#define E1000_I210_TXDCTL(_n)  (0x0E028 + ((_n) * 0x40))
+
+#define E1000_I210_TQAVCTRL0x3570
+#define E1000_I210_TQAVCC(_n)  (0x3004 + ((_n) * 0x40))
+#define E1000_I210_TQAVHC(_n)  (0x300C + ((_n) * 0x40))
+
 #define E1000_INVM_DATA_REG(_n)(0x12120 + 4*(_n))
 #define E1000_INVM_SIZE64 /* Number of INVM Data Registers */
 
diff --git a/drivers/net/ethernet/intel/igb/igb.h 
b/drivers/net/ethernet/intel/igb/igb.h
index 06ffb2bc713e..92845692087a 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -281,6 +281,11 @@ struct igb_ring {
u16 count;  /* number of desc. in the ring */
u8 queue_index; /* logical index of the ring*/
u8 reg_idx; /* physical index of the ring */
+   bool cbs_enable;/* indicates if CBS is enabled */
+   s32 idleslope;  /* idleSlope in kbps */
+   s32 sendslope;  /* sendSlope in kbps */
+   s32 hicredit;   /* hiCredit in bytes */
+   s32 locredit;   /* loCredit in bytes */
 
/* everything past this point are written often */
u16 next_to_clean;
@@ -621,6 +626,7 @@ struct igb_adapter {
 #define IGB_FLAG_EEE   BIT(14)
 #define IGB_FLAG_VLAN_PROMISC  BIT(15)
 #define IGB_FLAG_RX_LEGACY BIT(16)
+#define IGB_FLAG_FQTSS BIT(17)
 
 /* Media Auto Sense */
 #define IGB_MAS_ENABLE_0   0X0001
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 837d9b46a390..be2cf263efa9 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/

[next-queue PATCH v7 0/6] TSN: Add qdisc based config interface for CBS

2017-10-12 Thread Vinicius Costa Gomes

Hi,

Changes since v6:
 - Fixed compilation for 32bit arches;
 - Aligned the behaviour of .select_queue() of the mq qdisc to be the
   same as mqprio;

Changes since v5:
 - Fixed comments from Jiri Pirko;

Changes since v4:
 - Added a software implementation of the CBS algorithm;

Changes since v3:
 - None, only a clean patchset without old patches;

Changes since v2:
 - squashed the patch introducing the userspace API into the patch
   implementing CBS;

Changes since v1:
 - Solved the mqprio dependency;
 - Fixed a mqprio bug, that caused the inner qdisc to have a wrong
   dev_queue associated with it;

Changes from the RFC:
 - Fixed comments from Henrik Austad;
 - Simplified the Qdisc, using the generic implementation of callbacks
   where possible;
 - Small refactor on the driver (igb) code;

This patchset is a proposal of how the Traffic Control subsystem can
be used to offload the configuration of the Credit Based Shaper
(defined in the IEEE 802.1Q-2014 Section 8.6.8.2) into supported
network devices.

As part of this work, we've assessed previous public discussions
related to TSN enabling: patches from Henrik Austad (Cisco), the
presentation from Eric Mann at Linux Plumbers 2012, patches from
Gangfeng Huang (National Instruments) and the current state of the
OpenAVNU project (https://github.com/AVnu/OpenAvnu/).

Overview


Time-sensitive Networking (TSN) is a set of standards that aim to
address resource availability in order to provide bandwidth reservation
and bounded latency on Ethernet based LANs. The proposal described here
aims to cover mainly what is needed to enable the following standards:
802.1Qat and 802.1Qav.

The initial target of this work is the Intel i210 NIC, but other
controllers' datasheets were also taken into account, like the Renesas
RZ/A1H RZ/A1M group and the Synopsys DesignWare Ethernet QoS
controller.


Proposal


Feature-wise, what is covered here is the configuration interfaces for
HW implementations of the Credit-Based shaper (CBS, 802.1Qav). CBS is
a per-queue shaper. Given that this feature is related to traffic
shaping, and that the traffic control subsystem already provides a
queueing discipline that offloads config into the device driver (i.e.
mqprio), designing a new qdisc for the specific purpose of offloading
the config for the CBS shaper seemed like a good fit.

For steering traffic into the correct queues, we use the socket option
SO_PRIORITY and then a mechanism to map priority to traffic classes /
Tx queues. The qdisc mqprio is currently used in our tests.
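
To make the priority to traffic class to Tx queue steering concrete, here
is a small Python sketch of the lookup. It is only a model of the mapping,
not kernel code, and the function name is made up for illustration. The
map and the count@offset queue ranges are the ones used in the mqprio
command in the testing section of this letter.

```python
def mqprio_queues(prio_map, queues, priority):
    """Return the Tx queue indices that a given priority is steered to.

    prio_map: list indexed by priority, giving the traffic class.
    queues:   list indexed by traffic class, giving (count, offset) pairs,
              i.e. the "count@offset" notation of the mqprio command line.
    """
    tc = prio_map[priority]
    count, offset = queues[tc]
    return list(range(offset, offset + count))


# "map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2"
PRIO_MAP = [2, 2, 1, 0] + [2] * 12
# "queues 1@0 1@1 2@2"
QUEUES = [(1, 0), (1, 1), (2, 2)]

for prio in (3, 2, 0):
    print(prio, mqprio_queues(PRIO_MAP, QUEUES, prio))
```

With these inputs, priority 3 lands on queue 0, priority 2 on queue 1,
and every other priority on queues 2 and 3.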

As for the CBS config interface, this patchset is proposing a new
qdisc called 'cbs'. Its 'tc' cmd line is:

$ tc qdisc add dev IFACE parent ID cbs locredit N hicredit M sendslope S \
 idleslope I

   Note that the parameters for this qdisc are the ones defined by the
   802.1Q-2014 spec, so no hardware specific functionality is exposed here.
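
For readers unfamiliar with how the four parameters relate to each other,
the sketch below derives them for a single class. It is a hedged
illustration, not the script attached to this letter: the function name
is hypothetical, the relations are the credit-based shaper ones as
commonly stated for 802.1Q (sendSlope = idleSlope - portTransmitRate,
credits scaled by frame size), and the 1 Gbit/s port rate and 1500-byte
maximum interference size are assumptions. Slopes are in kbit/s and
credits in bytes.

```python
def cbs_params(idleslope_kbps, port_rate_kbps, max_frame_bytes,
               max_interference_bytes):
    """Derive CBS parameters for one traffic class (illustrative sketch)."""
    # sendSlope is the rate at which credit drains while the class transmits.
    sendslope = idleslope_kbps - port_rate_kbps
    # hiCredit: credit accumulated while waiting behind the largest
    # interfering transmission (rounded up, integer arithmetic).
    hicredit = -((-max_interference_bytes * idleslope_kbps) // port_rate_kbps)
    # loCredit: credit consumed by sending one maximum-size frame
    # (rounded down, so it stays negative).
    locredit = (max_frame_bytes * sendslope) // port_rate_kbps
    return {
        "idleslope": idleslope_kbps,
        "sendslope": sendslope,
        "hicredit": hicredit,
        "locredit": locredit,
    }


# Assumed scenario: 20 Mbit/s reserved on a 1 Gbit/s link, 1500-byte frames.
class_a = cbs_params(20_000, 1_000_000, 1500, 1500)
print(class_a)
```

Under these assumptions the sketch yields locredit -1470 and hicredit 30,
the same credit values as the class A example in the testing section.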

Per-stream shaping, as defined by IEEE 802.1Q-2014 Section 34.6.1, is
not yet covered by this proposal.


Testing this RFC


Attached to this cover letter are:
 - calculate_cbs_params.py: A Python script to calculate the
   parameters to the CBS queueing discipline;
 - tsn-talker.c: A sample C implementation of the talker side of a stream;
 - tsn-listener.c: A sample C implementation of the listener side of a
   stream;

For testing the patches of this series, you may want to use the
attached samples to this cover letter and use the 'mqprio' qdisc to
setup the priorities to Tx queues mapping, together with the 'cbs'
qdisc to configure the HW shaper of the i210 controller:

1) Setup priorities to traffic classes to hardware queues mapping
$ tc qdisc replace dev ens4 handle 100: parent root mqprio num_tc 3 \
 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0

For a more detailed explanation, see mqprio(8). In short, this command
maps traffic with priority 3 to hardware queue 0, traffic with
priority 2 to hardware queue 1, and all other traffic to
hardware queues 2 and 3.

2) Check scheme. You want to get the inner qdiscs ID from the bottom up
$ tc -g class show dev ens4

Ex.:
+---(100:3) mqprio
|+---(100:6) mqprio
|+---(100:7) mqprio
|
+---(100:2) mqprio
|+---(100:5) mqprio
|
+---(100:1) mqprio
 +---(100:4) mqprio

* Here '100:4' is Tx Queue #0 and '100:5' is Tx Queue #1.

3) Calculate CBS parameters for classes A and B, e.g. BW for A is 20Mbps and
   for B is 10Mbps:
$ calc_cbs_params.py -A 2 -a 1500 -B 1 -b 1500

4) Configure CBS for traffic class A (priority 3) as provided by the script:
$ tc qdisc replace dev ens4 parent 100:4 cbs locredit -1470 \
 hicredit 30 sendslope -98 idleslope 2

5) Configure CBS for traffic class B (priority 2):
$ tc qdisc replace dev ens4 parent 100:5 cbs \
 locredit -1485 hicredit 31 sendslope -99 idleslope 1

6) Run Listener:
$ ./tsn-listener -d 01:AA:AA:AA:AA:AA -i ens4 -s 1500

7) Run Talker for class A (prio 3 here), compiled from

[next-queue PATCH v7 2/6] net/sched: Change behavior of mq select_queue()

2017-10-12 Thread Vinicius Costa Gomes

From: Jesus Sanchez-Palencia 

Currently, the class_ops select_queue() implementation on sch_mq
returns a pointer to netdev_queue #0 when it receives an invalid
qdisc id. That can be misleading since all of mq's inner qdiscs are
attached to a valid netdev_queue.

Here we fix that by returning NULL when a qdisc id is invalid. This is
aligned with how select_queue() is implemented for sch_mqprio in the
next patch in this series, keeping a consistent behavior between these
two qdiscs.

Signed-off-by: Jesus Sanchez-Palencia 
---
 net/sched/sch_mq.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index f3a3e507422b..213b586a06a0 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -130,15 +130,7 @@ static struct netdev_queue *mq_queue_get(struct Qdisc 
*sch, unsigned long cl)
 static struct netdev_queue *mq_select_queue(struct Qdisc *sch,
struct tcmsg *tcm)
 {
-   unsigned int ntx = TC_H_MIN(tcm->tcm_parent);
-   struct netdev_queue *dev_queue = mq_queue_get(sch, ntx);
-
-   if (!dev_queue) {
-   struct net_device *dev = qdisc_dev(sch);
-
-   return netdev_get_tx_queue(dev, 0);
-   }
-   return dev_queue;
+   return mq_queue_get(sch, TC_H_MIN(tcm->tcm_parent));
 }
 
 static int mq_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new,
-- 
2.14.2
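
The behavioral change in the patch above can be modeled in a few lines of
Python. This is only a sketch for comparison, not kernel code: the list of
strings stands in for the device's Tx queues, the function names mirror
the C ones for readability, and the assumption that class ids are
1-based indices into the queue array is taken from how mq numbers its
classes.

```python
def tc_h_min(handle):
    """Minor part of a tc handle (the low 16 bits)."""
    return handle & 0xFFFF


def mq_queue_get(tx_queues, cl):
    """Return the queue for class id cl, or None if cl is out of range."""
    ntx = cl - 1  # assumed: class ids start at 1
    if ntx < 0 or ntx >= len(tx_queues):
        return None
    return tx_queues[ntx]


def select_queue_old(tx_queues, parent):
    # Old behavior: silently fall back to queue #0 on an invalid id.
    queue = mq_queue_get(tx_queues, tc_h_min(parent))
    return queue if queue is not None else tx_queues[0]


def select_queue_new(tx_queues, parent):
    # New behavior: report the invalid id to the caller.
    return mq_queue_get(tx_queues, tc_h_min(parent))


TX_QUEUES = ["txq0", "txq1"]
VALID = 0x01000002    # handle 100:2
INVALID = 0x01000009  # handle 100:9, beyond the two queues

print(select_queue_old(TX_QUEUES, INVALID), select_queue_new(TX_QUEUES, INVALID))
```

Both versions agree for valid ids; they differ only in whether an invalid
id is masked by queue #0 or surfaced as a missing queue.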

[PATCH net-next v2 0/2] add UniPhier AVE ethernet support

2017-10-12 Thread Kunihiko Hayashi

This series adds support for Socionext AVE ethernet controller implemented
on UniPhier SoCs. This driver supports RGMII/RMII modes.

v1: http://www.spinics.net/lists/netdev/msg454292.html

The PHY patch included in v1 has already been split out into:
http://www.spinics.net/lists/netdev/msg454595.html

Changes since v1:
- add/remove devicetree properties and sub-node
  - remove "internal-phy-interrupt" and "desc-bits" property
  - add SoC data structures based on compatible strings
  - add node operation to apply "mdio" sub-node
- add support for features
  - add support for {get,set}_pauseparam and pause frame operations
  - add support for ndo_get_stats64 instead of ndo_get_stats
- replace with desirable functions
  - replace check for valid phy_mode with phy_interface{_mode}_is_rgmii()
  - replace phy attach message with phy_attached_info()
  - replace 32bit operation with {upper,lower}_32_bits() on ave_wdesc_addr()
  - replace nway_reset and get_link with generic functions
- move operations to proper functions
  - move phy_start_aneg() to ndo_open,
and remove unnecessary PHY interrupt operations
See http://www.spinics.net/lists/netdev/msg454590.html
  - move irq initialization and descriptor memory allocation to ndo_open
  - move initialization of reset and clock and mdiobus to ndo_init
- fix skbuffer operations
  - fix skb alignment operations and add Rx buffer adjustment for descriptor
See http://www.spinics.net/lists/netdev/msg456014.html
  - add error returns when dma_map_single() fails
- clean up code structures
  - clean up wait-loop and wake-queue conditions
  - add ave_wdesc_addr() and offset definitions
  - add ave_macaddr_init() to clean up mac-address operation
  - fix the check for whether there are enough Tx entries
  - fix supported features of phydev
  - add necessary free/disable operations
  - add phydev check on ave_{get,set}_wol()
  - remove netif_carrier functions, phydev initializer, and Tx budget check
- change obsolete code
  - replace ndev->{base_addr,irq} with the members of ave_private
- rename goto labels and mask definitions, and remove unused codes

Kunihiko Hayashi (2):
  dt-bindings: net: add DT bindings for Socionext UniPhier AVE
  net: ethernet: socionext: add AVE ethernet driver

 .../bindings/net/socionext,uniphier-ave4.txt   |   53 +
 drivers/net/ethernet/Kconfig   |1 +
 drivers/net/ethernet/Makefile  |1 +
 drivers/net/ethernet/socionext/Kconfig |   22 +
 drivers/net/ethernet/socionext/Makefile|4 +
 drivers/net/ethernet/socionext/sni_ave.c   | 1773 
 6 files changed, 1854 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
 create mode 100644 drivers/net/ethernet/socionext/Kconfig
 create mode 100644 drivers/net/ethernet/socionext/Makefile
 create mode 100644 drivers/net/ethernet/socionext/sni_ave.c

-- 
2.7.4

[PATCH net-next v2 1/2] dt-bindings: net: add DT bindings for Socionext UniPhier AVE

2017-10-12 Thread Kunihiko Hayashi

DT bindings for the AVE ethernet controller found on Socionext's
UniPhier platforms.

Signed-off-by: Kunihiko Hayashi 
Signed-off-by: Jassi Brar 
---
 .../bindings/net/socionext,uniphier-ave4.txt   | 53 ++
 1 file changed, 53 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt

diff --git a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt 
b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
new file mode 100644
index 000..25f4d92
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
@@ -0,0 +1,53 @@
+* Socionext AVE ethernet controller
+
+This describes the devicetree bindings for AVE ethernet controller
+implemented on Socionext UniPhier SoCs.
+
+Required properties:
+ - compatible: Should be
+   - "socionext,uniphier-pro4-ave4" : for Pro4 SoC
+   - "socionext,uniphier-pxs2-ave4" : for PXs2 SoC
+   - "socionext,uniphier-ld20-ave4" : for LD20 SoC
+   - "socionext,uniphier-ld11-ave4" : for LD11 SoC
+ - reg: Address where registers are mapped and size of region.
+ - interrupts: Should contain the MAC interrupt.
+ - phy-mode: See ethernet.txt in the same directory. Allow to choose
+   "rgmii", "rmii", or "mii" according to the PHY.
+ - pinctrl-names: List of assigned state names, see pinctrl
+   binding documentation.
+ - pinctrl-0: List of phandles to configure the GPIO pin,
+   see pinctrl binding documentation. Choose this appropriately
+   according to phy-mode.
+   - <&pinctrl_ether_rgmii> if phy-mode is "rgmii".
+   - <&pinctrl_ether_rmii> if phy-mode is "rmii".
+   - <&pinctrl_ether_mii> if phy-mode is "mii".
+ - phy-handle: Should point to the external phy device.
+   See ethernet.txt file in the same directory.
+ - mdio subnode: Should be device tree subnode with the following required
+   properties:
+   - #address-cells: Must be <1>.
+   - #size-cells: Must be <0>.
+   - reg: phy ID number, usually a small integer.
+
+Optional properties:
+ - local-mac-address: See ethernet.txt in the same directory.
+
+Example:
+
+   ether: ethernet@6500 {
+   compatible = "socionext,uniphier-ld20-ave4";
+   reg = <0x6500 0x8500>;
+   interrupts = <0 66 4>;
+   pinctrl-names = "default";
+   pinctrl-0 = <&pinctrl_ether_rgmii>;
+   phy-mode = "rgmii";
+   phy-handle = <&ethphy>;
+   local-mac-address = [00 00 00 00 00 00];
+   mdio {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   ethphy: ethphy@1 {
+   reg = <1>;
+   };
+   };
+   };
-- 
2.7.4

[PATCH net-next v2 2/2] net: ethernet: socionext: add AVE ethernet driver

2017-10-12 Thread Kunihiko Hayashi

The UniPhier platform from Socionext provides the AVE ethernet
controller, which includes a MAC and an MDIO bus and supports
RGMII/RMII modes.

Signed-off-by: Kunihiko Hayashi 
Signed-off-by: Jassi Brar 
---
 drivers/net/ethernet/Kconfig |1 +
 drivers/net/ethernet/Makefile|1 +
 drivers/net/ethernet/socionext/Kconfig   |   22 +
 drivers/net/ethernet/socionext/Makefile  |4 +
 drivers/net/ethernet/socionext/sni_ave.c | 1773 ++
 5 files changed, 1801 insertions(+)
 create mode 100644 drivers/net/ethernet/socionext/Kconfig
 create mode 100644 drivers/net/ethernet/socionext/Makefile
 create mode 100644 drivers/net/ethernet/socionext/sni_ave.c

diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index c604213..d50519e 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -170,6 +170,7 @@ source "drivers/net/ethernet/sis/Kconfig"
 source "drivers/net/ethernet/sfc/Kconfig"
 source "drivers/net/ethernet/sgi/Kconfig"
 source "drivers/net/ethernet/smsc/Kconfig"
+source "drivers/net/ethernet/socionext/Kconfig"
 source "drivers/net/ethernet/stmicro/Kconfig"
 source "drivers/net/ethernet/sun/Kconfig"
 source "drivers/net/ethernet/tehuti/Kconfig"
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index a0a03d4..9f55b36 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -81,6 +81,7 @@ obj-$(CONFIG_SFC) += sfc/
 obj-$(CONFIG_SFC_FALCON) += sfc/falcon/
 obj-$(CONFIG_NET_VENDOR_SGI) += sgi/
 obj-$(CONFIG_NET_VENDOR_SMSC) += smsc/
+obj-$(CONFIG_NET_VENDOR_SOCIONEXT) += socionext/
 obj-$(CONFIG_NET_VENDOR_STMICRO) += stmicro/
 obj-$(CONFIG_NET_VENDOR_SUN) += sun/
 obj-$(CONFIG_NET_VENDOR_TEHUTI) += tehuti/
diff --git a/drivers/net/ethernet/socionext/Kconfig 
b/drivers/net/ethernet/socionext/Kconfig
new file mode 100644
index 000..3a1829e
--- /dev/null
+++ b/drivers/net/ethernet/socionext/Kconfig
@@ -0,0 +1,22 @@
+config NET_VENDOR_SOCIONEXT
+   bool "Socionext ethernet drivers"
+   default y
+   ---help---
+ Option to select ethernet drivers for Socionext platforms.
+
+ Note that the answer to this question doesn't directly affect the
+ kernel: saying N will just cause the configurator to skip all
+ the questions about Socionext devices. If you say Y, you will be asked
+ for your specific card in the following questions.
+
+if NET_VENDOR_SOCIONEXT
+
+config SNI_AVE
+   tristate "Socionext AVE ethernet support"
+   depends on (ARCH_UNIPHIER || COMPILE_TEST) && OF
+   select PHYLIB
+   ---help---
+ Driver for gigabit ethernet MACs, called AVE, in the
+ Socionext UniPhier family.
+
+endif #NET_VENDOR_SOCIONEXT
diff --git a/drivers/net/ethernet/socionext/Makefile 
b/drivers/net/ethernet/socionext/Makefile
new file mode 100644
index 000..0356341
--- /dev/null
+++ b/drivers/net/ethernet/socionext/Makefile
@@ -0,0 +1,4 @@
+#
+# Makefile for all ethernet ip drivers on Socionext platforms
+#
+obj-$(CONFIG_SNI_AVE) += sni_ave.o
diff --git a/drivers/net/ethernet/socionext/sni_ave.c 
b/drivers/net/ethernet/socionext/sni_ave.c
new file mode 100644
index 000..7e399fc
--- /dev/null
+++ b/drivers/net/ethernet/socionext/sni_ave.c
@@ -0,0 +1,1773 @@
+/**
+ * sni_ave.c - Socionext UniPhier AVE ethernet driver
+ *
+ * Copyright 2014 Panasonic Corporation
+ * Copyright 2015-2017 Socionext Inc.
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2  of
+ * the License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* General Register Group */
+#define AVE_IDR0x0 /* ID */
+#define AVE_VR 0x4 /* Version */
+#define AVE_GRR0x8 /* Global Reset */
+#define AVE_CFGR   0xc /* Configuration */
+
+/* Interrupt Register Group */
+#define AVE_GIMR   0x100   /* Global Interrupt Mask */
+#define AVE_GISR   0x104   /* Global Interrupt Status */
+
+/* MAC Register Group */
+#define AVE_TXCR   0x200   /* TX Setup */
+#define AVE_RXCR   0x204   /* RX Setup */
+#define AVE_RXMAC1R0x208   /* MAC address (lower) */
+#define AVE_RXMAC2R0x20c   /* MAC address (upper) */
+#define AVE_MDIOCTR0x214   /* MDIO Control */
+#define AVE_MDIOAR 0x218   /* MDIO Address */
+#define AVE_MDIOWDR0x21c   /* MDIO Data */
+#define AVE_MDIOSR 0x220   /* MDIO Status */
+#define AVE_MDIORDR0x224   /* MDIO

[PATCH] Add -target to clang switch while cross compiling.

2017-10-12 Thread Abhijit Ayarekar

Update to llvm excludes assembly instructions.
llvm git revision is below

commit 2865ab6996164e7854d55c9e21c065fad7c26569
Author: Yonghong Song 
Date:   Mon Sep 18 23:29:36 2017 +

bpf: add inline-asm support

Signed-off-by: Yonghong Song 
Acked-by: Alexei Starovoitov 

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@313593 
91177308-0d34-0410-b5e6-96231b3b80d8

__ASM_SYSREG_H define is not required for native compile.
-target switch includes appropriate target specific files
while cross compiling

Tested on x86 and arm64.

Signed-off-by: Abhijit Ayarekar 
---
 samples/bpf/Makefile | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index ebc2ad6..81f9fcd 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -180,6 +180,7 @@ CLANG ?= clang
 # Detect that we're cross compiling and use the cross compiler
 ifdef CROSS_COMPILE
 HOSTCC = $(CROSS_COMPILE)gcc
+CLANG_ARCH_ARGS = -target $(ARCH)
 endif
 
 # Trick to allow make to be run from this directory
@@ -229,9 +230,9 @@ $(obj)/tracex5_kern.o: $(obj)/syscall_nrs.h
 $(obj)/%.o: $(src)/%.c
$(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) -I$(obj) \
-I$(srctree)/tools/testing/selftests/bpf/ \
-   -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
+   -D__KERNEL__ -Wno-unused-value -Wno-pointer-sign \
-D__TARGET_ARCH_$(ARCH) -Wno-compare-distinct-pointer-types \
-Wno-gnu-variable-sized-type-not-at-end \
-Wno-address-of-packed-member -Wno-tautological-compare \
-   -Wno-unknown-warning-option \
+   -Wno-unknown-warning-option $(CLANG_ARCH_ARGS) \
-O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
-- 
2.7.4

[ANNOUNCE] nftables 0.8 release

2017-10-12 Thread Pablo Neira Ayuso

Hi!

The Netfilter project proudly presents:

nftables 0.8

This release contains new features available up to the (upcoming)
Linux 4.14 kernel release:

* Support for stateful objects, these objects are uniquely identified by
  a user-defined name, you can refer to them from rules, and there is a
  well established interface to operate with them, eg.

 # nft add counter filter test

  This creates a counter object whose name is 'test'.

 # nft list counters
 table ip filter {
        counter test {
                packets 0 bytes 0
        }
 }

  You can then refer to these objects from maps:

 # nft add table filter
 # nft add chain filter input { type filter hook input priority 0\; }
 # nft add map filter badguys { type ipv4_addr : counter \; }
 # nft add rule filter input counter name ip saddr map @badguys
 # nft add counter filter badguy1
 # nft add counter filter badguy2
 # nft add element filter badguys { 192.168.2.3 : "badguy1" }
 # nft add element filter badguys { 192.168.2.4 : "badguy2" }

  Implicit map definitions are supported too:

 table ip filter {
        counter http-traffic {
                packets 8 bytes 672
        }

        chain input {
                type filter hook input priority 0; policy accept;
                counter name tcp dport map { 80 : "http-traffic", 443 : "http-traffic"}
        }
 }

  You can atomically dump and reset these objects:

 # nft reset counter ip filter badguy1
 counter test {
packets 1024 bytes 10
 }
 # nft reset counter ip filter badguy1
 counter test {
packets 0 bytes 0
 }

  Currently: counters, quota and limit are supported. Note: limit is
  available starting 4.14-rc.

* Sort set elements when listing them, from lowest to highest, eg.

 # nft add table x
 # nft add set x y { type ipv4_addr\; }
 # nft add element x y { 192.168.1.2, 192.168.1.1, 192.168.1.4, 192.168.1.3 }
 # nft list ruleset
 table ip x {
set y {
type ipv4_addr
elements = { 192.168.1.1, 192.168.1.2,
 192.168.1.3, 192.168.1.4 }
}
 }

  When listing very large sets, nft takes almost the same time as
  before, so impact of this new feature is negligible.

* TCP option matching and mangling support. This includes TCP maximum
  segment size mangling, eg.

 # nft add rule mangle forward tcp flags syn tcp option maxseg size set rt mtu

  People who own routers with ppp interfaces: you have no excuse not to
  migrate to nftables, this is your replacement for the TCPMSS target ;-)

* Add new `-s' option for listings without stateful information:

 # nft -s list ruleset
 table ip filter {
chain output {
type filter hook output priority 0; policy accept;
tcp dport https counter
tcp dport https quota 25 mbytes
}
 }

* Add new -c/--check option for nft, to test whether your ruleset loads
  fine into the kernel. This is a dry-run mode, eg.

 # nft -c -f ruleset.nft

  You can also use it in incremental rule update scenarios:

 # nft -c add rule x y counter

* Connection tracking helper support, eg.

 table ip filter {
 ct helper ftp-standard {
type "ftp" protocol tcp
 }

 chain y {
tcp dport ftp ct helper set "ftp-standard"
 }
 }

  Note for iptables users: In nftables, you have to explicitly specify
  which helper you want to enable and then set it from rules, because the
  former automatic helper assignment approach is deprecated. For more
  info, see: https://home.regit.org/netfilter-en/secure-use-of-helpers/

* Add --echo option, to print the handle that the kernel allocates to
  uniquely identify rules, eg.

 # nft --echo --handle add rule ip t c tcp dport {22, 80} accept
 add rule ip t c tcp dport { ssh, http } accept # handle 2

* Conntrack zone support, eg.

 table raw {
chain pre {
   type filter hook prerouting priority -300;
   iif eth3 ct zone set 23
}
chain out {
   type filter hook output priority -300;
   oif eth3 ct zone set 23
}
 }

* Symmetric hash support, eg.

 # nft add rule ip nat prerouting ct mark set symhash mod 2

* Add support for including directories from nft native scripts; files are
  loaded in alphanumerical order, eg.

 include "/foo/*.nft"

  Assuming the following content in that folder:

/foo
/foo/02_rules.nft
/foo/01_rules.nft

  "01_rules.nft" is loaded before "02_rules.nft".

* Allow checking whether an IPv6 extension header or TCP option exists or
  is missing, eg.

 # nft add rule ip6 x y exthdr frag exists drop
 # nft add rule inet x y tcp optio

[PATCH] wanxl: use m68k-linux-gnu-as if available

2017-10-12 Thread Adam Borowski

This fixes build failure on Debian based systems: GNU as is the only m68k
assembler available in the archive (package binutils-m68k-linux-gnu).

Signed-off-by: Adam Borowski 
---
I have no relevant hardware, thus I can't check whether the built firmware
actually works.  Some opcodes are translated differently, thus it might be
possible that some extra options are required.  Or possibly the assembler
has long since changed -- the prebuilt copy hasn't been updated anywhere
within recorded history.

In any case, I admit I don't really care about this driver -- it's merely
a notorious cause of randconfig failures.

 drivers/net/wan/Makefile | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/wan/Makefile b/drivers/net/wan/Makefile
index 73c2326603fc..fa17b18765b0 100644
--- a/drivers/net/wan/Makefile
+++ b/drivers/net/wan/Makefile
@@ -38,14 +38,20 @@ obj-$(CONFIG_SLIC_DS26522)  += slic_ds26522.o
 clean-files := wanxlfw.inc
 $(obj)/wanxl.o:$(obj)/wanxlfw.inc
 
+HAVE_GNU_M68K_AS := $(shell command -v m68k-linux-gnu-as 2> /dev/null)
 ifeq ($(CONFIG_WANXL_BUILD_FIRMWARE),y)
 ifeq ($(ARCH),m68k)
   AS68K = $(AS)
   LD68K = $(LD)
+else
+ifdef HAVE_GNU_M68K_AS
+  AS68K = m68k-linux-gnu-as
+  LD68K = m68k-linux-gnu-ld
 else
   AS68K = as68k
   LD68K = ld68k
 endif
+endif
 
 quiet_cmd_build_wanxlfw = BLD FW  $@
   cmd_build_wanxlfw = \
-- 
2.15.0.rc0

Re: [PATCH net-next 4/5] net: dsa: add slave get master helper

2017-10-12 Thread Vivien Didelot

Hi Florian,

Florian Fainelli  writes:

> On 10/12/2017 03:51 PM, Vivien Didelot wrote:
>> Many parts of the DSA slave code need to get the master device
>> assigned to a slave device. Remove dsa_master_netdev() in favor of a
>> dsa_slave_get_master() helper which does that.
>> 
>> Signed-off-by: Vivien Didelot 
>> ---
>
>> +static inline struct net_device *
>> +dsa_slave_get_master(const struct net_device *dev)
>> +{
>> +struct dsa_port *dp = dsa_slave_to_port(dev);
>> +
>> +return dp->cpu_dp->netdev;
>> +}
>
> Nit: _get may convey the idea that a reference count may be incremented
> when the function is called (balanced with a _put), so maybe name it
> dsa_slave_to_master() which is more in line with dsa_slave_to_port() as
> well? Other than that:
>
> Reviewed-by: Florian Fainelli 

Perfect, I was hesitating between the two myself, but I didn't have a
good argument for either of them. Now I have one! I'll change that for
dsa_slave_to_master and add your tag.
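
For illustration, a minimal userspace sketch of the "_to_" naming convention
discussed above. The structures are simplified stand-ins, not the kernel
definitions, and the helpers take the private data directly instead of going
through netdev_priv(); both are assumptions made to keep the example
self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real kernel structures have many more members. */
struct net_device {
	int ifindex;
};

struct dsa_port {
	struct dsa_port *cpu_dp;    /* CPU port this port is attached to */
	struct net_device *netdev;  /* master netdev when this is a CPU port */
};

struct dsa_slave_priv {
	struct dsa_port *dp;
};

/* Hypothetical variant: takes the private data rather than a net_device. */
static inline struct dsa_port *slave_to_port(const struct dsa_slave_priv *p)
{
	assert(p != NULL);
	return p->dp;
}

/*
 * A pure pointer walk: no reference count is taken, so there is nothing
 * to balance with a "_put". That is why "_to_" fits better than "_get".
 */
static inline struct net_device *slave_to_master(const struct dsa_slave_priv *p)
{
	return slave_to_port(p)->cpu_dp->netdev;
}
```

The two helpers compose the same way dsa_slave_to_port() and the proposed
dsa_slave_to_master() do in the quoted patch.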


Thanks,

Vivien

Re: [PATCH net-next 5/5] net: dsa: split dsa_port's netdev member

2017-10-12 Thread Vivien Didelot

Hi Florian,

Florian Fainelli  writes:

> On 10/12/2017 03:51 PM, Vivien Didelot wrote:
>> The dsa_port structure has a "netdev" member, which can be used for
>> either the master device, or the slave device, depending on its type.
>> 
> >> It is true that today, CPU ports are not exposed to userspace, thus the
>> port's netdev member can be used to point to its master interface.
>> 
>> But it is still slightly confusing, so split it into more explicit
>> "master" and "slave" members.
>
> I do see some value in doing that, although I also see value in having
> structure members be named after what they are, rather than their use
> (oh well, it's all debatable anyway), see below for a suggestion on how
> to reconcile the two:
>
>>  struct dsa_port {
>> +/* Master device, physically connected if this is a CPU port */
>> +struct net_device *master;
>> +
>> +/* Slave device, if this port is exposed to userspace */
>> +struct net_device *slave;
>> +
>
> How about using:
>
>   union {
>   struct net_device *master;
>   struct net_device *slave;
>   } netdev;
>
> Such that this serves both purposes of clearly communicating what the
> structure member is, and it can be either one of the two, but not both
> at the same time?

I love that! It makes clear that master is not available for a non-CPU
port. Using this union is correct for the moment because DSA and CPU
ports don't have a slave device attached to them. If that changes
one day (unlikely), we'll remove the union.
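
For illustration, a minimal userspace sketch of the union idea. The "struct
port" type, the port_type tag, and the accessor names are hypothetical and
only serve to show that the two pointers share storage and that only one is
meaningful for a given port.

```c
#include <assert.h>
#include <stddef.h>

struct net_device {
	const char *name;
};

/* Hypothetical tag telling which union member is valid. */
enum port_type {
	PORT_TYPE_CPU,
	PORT_TYPE_USER,
};

struct port {
	enum port_type type;
	union {
		struct net_device *master; /* valid when type == PORT_TYPE_CPU */
		struct net_device *slave;  /* valid when type == PORT_TYPE_USER */
	} netdev;
};

/* Return the master device, or NULL if this is not a CPU port. */
static inline struct net_device *port_master(const struct port *p)
{
	assert(p != NULL);
	return p->type == PORT_TYPE_CPU ? p->netdev.master : NULL;
}

/* Return the slave device, or NULL if this port has no slave. */
static inline struct net_device *port_slave(const struct port *p)
{
	assert(p != NULL);
	return p->type == PORT_TYPE_USER ? p->netdev.slave : NULL;
}
```

Because both members occupy the same storage, the union costs one pointer,
and reading the wrong member is prevented only by convention (here, the
accessors).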


Thanks,

Vivien

Re: [Patch net-next v2] tcp: add a tracepoint for tcp_retransmit_skb()

2017-10-12 Thread Alexei Starovoitov

On Thu, Oct 12, 2017 at 03:48:07PM -0700, Cong Wang wrote:
> We need a real-time notification for tcp retransmission
> for monitoring.
> 
> Of course we could use ftrace to dynamically instrument this
> kernel function too, however we can't retrieve the connection
> information at the same time, for example perf-tools [1] reads
> /proc/net/tcp for socket details, which is slow when we have
> a lot of connections.
> 
> Therefore, this patch adds a tracepoint for tcp_retransmit_skb()
> and exposes src/dst IP addresses and ports of the connection.
> This also makes it easier to integrate into perf.
> 
> Note, I expose both IPv4 and IPv6 addresses at the same time:
> for an IPv4 socket, the v4-mapped address is used as the IPv6 address,
> for an IPv6 socket, LOOPBACK4_IPV6 is already filled in by the kernel.
> Also, add sk and skb pointers as they are useful for BPF.
> 
> 1. https://github.com/brendangregg/perf-tools/blob/master/net/tcpretrans
> 
> Cc: Eric Dumazet 
> Cc: Alexei Starovoitov 
> Cc: Hannes Frederic Sowa 
> Cc: Brendan Gregg 
> Cc: Neal Cardwell 
> Signed-off-by: Cong Wang 
> ---
>  include/trace/events/tcp.h | 68 ++
>  net/core/net-traces.c  |  1 +
>  net/ipv4/tcp_output.c  |  3 ++
>  3 files changed, 72 insertions(+)
>  create mode 100644 include/trace/events/tcp.h
> 
> diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
> new file mode 100644
> index ..749f93c542ab
> --- /dev/null
> +++ b/include/trace/events/tcp.h
> @@ -0,0 +1,68 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM tcp
> +
> +#if !defined(_TRACE_TCP_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_TCP_H
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +TRACE_EVENT(tcp_retransmit_skb,
> +
> + TP_PROTO(struct sock *sk, struct sk_buff *skb, int segs),
> +
> + TP_ARGS(sk, skb, segs),
> +
> + TP_STRUCT__entry(
> + __field(void *, skbaddr)
> + __field(void *, skaddr)
> + __field(__u16, sport)
> + __field(__u16, dport)
> + __array(__u8, saddr, 4)
> + __array(__u8, daddr, 4)
> + __array(__u8, saddr_v6, 16)
> + __array(__u8, daddr_v6, 16)
> + ),
...
>   if (likely(!err)) {
>   TCP_SKB_CB(skb)->sacked |= TCPCB_EVER_RETRANS;
> + trace_tcp_retransmit_skb(sk, skb, segs);

looks great to me, but why is 'segs' there?
It's unused.
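
For reference, a minimal userspace sketch of the v4-mapped form mentioned in
the commit message (::ffff:a.b.c.d). The function name is hypothetical and
this is not the code from the patch; it only shows the byte layout that ends
up in a 16-byte address array such as saddr_v6.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Fill a 16-byte IPv6 address with the IPv4-mapped form of v4:
 * 80 zero bits, 16 one bits, then the 4 IPv4 bytes in network order.
 */
static void ipv4_to_mapped_v6(const uint8_t v4[4], uint8_t v6[16])
{
	assert(v4 != NULL && v6 != NULL);
	memset(v6, 0, 10);      /* bytes 0..9: zero */
	v6[10] = 0xff;          /* bytes 10..11: 0xffff marker */
	v6[11] = 0xff;
	memcpy(&v6[12], v4, 4); /* bytes 12..15: the IPv4 address */
}
```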

Re: [PATCH] Add -target to clang switches while cross compiling.

2017-10-12 Thread Alexei Starovoitov

On Thu, Oct 12, 2017 at 03:43:22PM -0700, Abhijit Ayarekar wrote:
> On Thu, Oct 12, 2017 at 03:23:04PM -0700, Alexei Starovoitov wrote:
> > On Thu, Oct 12, 2017 at 01:45:57PM -0700, Abhijit Ayarekar wrote:
> > > Latest llvm update excludes assembly instructions.
> > > As a result __ASM_SYSREGS_H define is not required.
> > > -target switch includes appropriate target specific files.
> > > 
> > > Tested on x86 and arm64 with llvm with git revision
> > > commit df6ca162269f9d756f8742bf4b658dcf690e3eb5
> > > Author: Yonghong Song 
> > > Date:   Thu Sep 28 02:46:11 2017 +
> > > 
> > > bpf: add new insns for bswap_to_le and negation
> > > 
> > > Signed-off-by: Abhijit Ayarekar 
> > > ---
> > >  samples/bpf/Makefile | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> > > index ebc2ad6..81f9fcd 100644
> > > --- a/samples/bpf/Makefile
> > > +++ b/samples/bpf/Makefile
> > > @@ -180,6 +180,7 @@ CLANG ?= clang
> > >  # Detect that we're cross compiling and use the cross compiler
> > >  ifdef CROSS_COMPILE
> > >  HOSTCC = $(CROSS_COMPILE)gcc
> > > +CLANG_ARCH_ARGS = -target $(ARCH)
> > 
> > this is only needed because you're crosscompiling, right?
> Yes
> 
> > In native compilation it's unnecessary flag.
> > Only dropping -D__ASM_SYSREG_H is enough, correct?
> > 
> Yes. That is correct.

please update commit log then and reference proper llvm commit
that added asm support instead of 'add new insns for bswap_to_le'
which is unrelated.

Re: [RFC] Support for UNARP (RFC 1868)

2017-10-12 Thread Girish Moodalbail


Hello Eric,

The basic idea is to mark the ARP entry either FAILED or STALE as soon as we can 
so that the subsequent packets that depend on that ARP entry will take the slow 
path (neigh_resolve_output()).


Say, if base_reachable_time is 30 seconds, then an ARP entry will be in the
REACHABLE state for somewhere between 15 and 45 seconds. Assuming the worst case, the
ARP entry will be in the REACHABLE state for 45 seconds, and packets continue to
traverse the network towards the source machine and get dropped there since the
VM has moved to destination machine.
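
The arithmetic above can be sketched as follows, assuming the reachable time
is picked between half and one-and-a-half times the base value, which matches
the 15 to 45 second figures quoted for a 30 second base. The type and function
names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Bounds of the window in which an entry may stay REACHABLE. */
struct reach_window {
	unsigned int min_ms;
	unsigned int max_ms;
};

/* Compute [0.5 * base, 1.5 * base] in milliseconds. */
static struct reach_window reachable_window(unsigned int base_ms)
{
	struct reach_window w;

	/* Keep base + base / 2 from overflowing. */
	assert(base_ms <= UINT_MAX / 2);
	w.min_ms = base_ms / 2;
	w.max_ms = base_ms + base_ms / 2;
	return w;
}
```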


Instead, based on the received UNARP packet if we mark the ARP entry

(a) FAILED
   - we move to INCOMPLETE state and start the address resolution by sending
 out ARP packets (up to allowed maximum number) until we get ARP response
 back at which point we move the ARP entry state to reachable.

(b) STALE
   - we move to DELAY state and set the next timer to DELAY_PROBE_TIME
 (1 second) and continue to send queued packets in arp_queue.
   - After 1 sec we move to PROBE state and start the address resolution like
 in the case(a) above.

I was leaning towards (a). Please see in-line..
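
The two options can be sketched as a small state machine. This is a
simplified userspace model of the transitions described above, not the
kernel's neighbour code; the enum values and function names are hypothetical.

```c
#include <assert.h>

/* Simplified neighbour states, named after the ones used above. */
enum nud_state {
	ST_INCOMPLETE,
	ST_REACHABLE,
	ST_STALE,
	ST_DELAY,
	ST_PROBE,
	ST_FAILED,
};

/* Which of the two proposals is applied when an UNARP arrives. */
enum unarp_policy {
	UNARP_MARK_FAILED, /* option (a) */
	UNARP_MARK_STALE,  /* option (b) */
};

/* State of the entry right after an UNARP packet is received. */
static enum nud_state on_unarp(enum unarp_policy policy)
{
	assert(policy == UNARP_MARK_FAILED || policy == UNARP_MARK_STALE);
	return policy == UNARP_MARK_FAILED ? ST_FAILED : ST_STALE;
}

/* State after the next packet is sent towards that neighbour. */
static enum nud_state on_output(enum nud_state cur)
{
	switch (cur) {
	case ST_FAILED:
		return ST_INCOMPLETE; /* (a): resolution restarts at once */
	case ST_STALE:
		return ST_DELAY;      /* (b): keep sending, arm the delay timer */
	default:
		return cur;
	}
}

/* State after the delay timer expires without confirmation. */
static enum nud_state on_timer(enum nud_state cur)
{
	return cur == ST_DELAY ? ST_PROBE : cur;
}

/* State after an ARP reply is received. */
static enum nud_state on_reply(enum nud_state cur)
{
	return (cur == ST_INCOMPLETE || cur == ST_PROBE) ? ST_REACHABLE : cur;
}
```

In this model option (a) reaches resolution one step earlier than option (b),
which first waits out the delay timer.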





> Hi Girish
>
> Your description (or patch title) is misleading. You apparently
> implement the receive side of the RFC.


You are right, it implements only the receive side of the RFC. If this RFC is 
accepted, then we can change arping(8) to generate UNARP requests. We could also 
add an option to ip-address(8) delete subcommand to generate UNARP whenever an 
address is deleted from the interface.



> And the RFC had Proxy ARP in mind.
>
> What about security implications?

Yes, this feature will extend the attack surface for L2 networks. However, the
attack vectors for this feature should be the same as those of gratuitous ARP,
right? The same attack mitigation techniques used for gratuitous ARP are equally
applicable here.



> Will TCP flows be terminated, instead
> of being smoothly migrated (TCP_REPAIR)

The TCP flows will not be terminated. Upon receiving an UNARP packet, the ARP entry
will be marked FAILED. The subsequent TCP packets from the client (towards that
IP) will be queued (the first 3 packets in arp_queue; further packets get
dropped) until the IP address is resolved again through the slow path
neigh_resolve_output().

The slow path marks the entry as INCOMPLETE and will start sending several ARP
requests (ucast_solicit + app_solicit + mcast_solicit) to resolve the IP. If the
resolution is successful, then the TCP packets will be sent out. If not, we will
invalidate the ARP entry and call arp_error_report() on the queued packets
(which will end up sending an ICMP_HOST_UNREACH error). This behavior is the same as
what will occur if the TCP server disappears in the middle of a connection.




> What about IPv6 ? Or maybe more abruptly, do we still need to add
> features to IPv4 in 2017,  22 years after this RFC came ? ;)


Legit question :). Well one thing I have seen in Networking is that an old idea 
circles back around later and turns out to be useful in new contexts and use 
cases. Like I enumerated in my initial email there are certain use cases in 
Cloud that might benefit from UNARP.


regards,
~Girish



> Thanks.

Re: [PATCH net-next 5/5] net: dsa: split dsa_port's netdev member

2017-10-12 Thread Florian Fainelli

On 10/12/2017 03:51 PM, Vivien Didelot wrote:
> The dsa_port structure has a "netdev" member, which can be used for
> either the master device, or the slave device, depending on its type.
> 
> It is true that today, CPU ports are not exposed to userspace, thus the
> port's netdev member can be used to point to its master interface.
> 
> But it is still slightly confusing, so split it into more explicit
> "master" and "slave" members.

I do see some value in doing that, although I also see value in having
structure members be named after what they are, rather than their use
(oh well, it's all debatable anyway), see below for a suggestion on how
to reconcile the two:

>  struct dsa_port {
> + /* Master device, physically connected if this is a CPU port */
> + struct net_device *master;
> +
> + /* Slave device, if this port is exposed to userspace */
> + struct net_device *slave;
> +

How about using:

union {
struct net_device *master;
struct net_device *slave;
} netdev;

Such that this serves both purposes of clearly communicating what the
structure member is, and it can be either one of the two, but not both
at the same time?

-- 
Florian

Re: [PATCH net-next 4/5] net: dsa: add slave get master helper

2017-10-12 Thread Florian Fainelli

On 10/12/2017 03:51 PM, Vivien Didelot wrote:
> Many parts of the DSA slave code need to get the master device
> assigned to a slave device. Remove dsa_master_netdev() in favor of a
> dsa_slave_get_master() helper which does that.
> 
> Signed-off-by: Vivien Didelot 
> ---

> +static inline struct net_device *
> +dsa_slave_get_master(const struct net_device *dev)
> +{
> + struct dsa_port *dp = dsa_slave_to_port(dev);
> +
> + return dp->cpu_dp->netdev;
> +}

Nit: _get may convey the idea that a reference count may be incremented
when the function is called (balanced with a _put), so maybe name it
dsa_slave_to_master() which is more in line with dsa_slave_to_port() as
well? Other than that:

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH net-next 3/5] net: dsa: add slave to port helper

2017-10-12 Thread Florian Fainelli

On 10/12/2017 03:51 PM, Vivien Didelot wrote:
> Many portions of DSA core code need to get the dsa_port structure
> corresponding to a slave net_device. For this purpose, introduce a
> dsa_slave_to_port() helper.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Florian Fainelli 
-- 
Florian

[PATCH] netfilter: nf_conntrack_h323: Remove typedef struct

2017-10-12 Thread Harsha Sharma

Remove the typedef from the struct, as the Linux kernel coding style tends
to avoid typedefs.
Done using the following Coccinelle semantic patch:

@r1@
type T;
@@

typedef struct { ... } T;

@script:python c1@
T2;
T << r1.T;
@@
if T[-2:] == "_t" or T[-2:] == "_T":
        coccinelle.T2 = T[:-2];
else:
        coccinelle.T2 = T;

print T, coccinelle.T2

@r2@
type r1.T;
identifier c1.T2;
@@
-typedef
struct
+ T2
{ ... }
-T
;

@r3@
type r1.T;
identifier c1.T2;
@@
-T
+struct T2
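
For illustration, the renaming rule applied by the python part of the semantic
patch (drop a trailing "_t" or "_T" from the typedef name) can be sketched in
C. The function name and the buffer-based interface are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy name into out, dropping a trailing "_t" or "_T" if present.
 * Returns 0 on success, -1 if out is too small for the result.
 */
static int strip_typedef_suffix(const char *name, char *out, size_t out_len)
{
	size_t len;

	assert(name != NULL && out != NULL);
	len = strlen(name);
	if (len >= 2 && name[len - 2] == '_' &&
	    (name[len - 1] == 't' || name[len - 1] == 'T'))
		len -= 2;
	if (len + 1 > out_len)
		return -1;
	memcpy(out, name, len);
	out[len] = '\0';
	return 0;
}
```

With this rule, bitstr_t becomes bitstr, which is the struct name used in the
diff below.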

Signed-off-by: Harsha Sharma 
---
 net/netfilter/nf_conntrack_h323_asn1.c | 80 +-
 1 file changed, 40 insertions(+), 40 deletions(-)

diff --git a/net/netfilter/nf_conntrack_h323_asn1.c b/net/netfilter/nf_conntrack_h323_asn1.c
index 89b2e46925c4..7831aa1effc9 100644
--- a/net/netfilter/nf_conntrack_h323_asn1.c
+++ b/net/netfilter/nf_conntrack_h323_asn1.c
@@ -91,41 +91,41 @@ typedef struct field_t {
 } field_t;
 
 /* Bit Stream */
-typedef struct {
+struct bitstr {
unsigned char *buf;
unsigned char *beg;
unsigned char *end;
unsigned char *cur;
unsigned int bit;
-} bitstr_t;
+};
 
 /* Tool Functions */
 #define INC_BIT(bs) if((++(bs)->bit)>7){(bs)->cur++;(bs)->bit=0;}
 #define INC_BITS(bs,b) if(((bs)->bit+=(b))>7){(bs)->cur+=(bs)->bit>>3;(bs)->bit&=7;}
 #define BYTE_ALIGN(bs) if((bs)->bit){(bs)->cur++;(bs)->bit=0;}
 #define CHECK_BOUND(bs,n) if((bs)->cur+(n)>(bs)->end)return(H323_ERROR_BOUND)
-static unsigned int get_len(bitstr_t *bs);
-static unsigned int get_bit(bitstr_t *bs);
-static unsigned int get_bits(bitstr_t *bs, unsigned int b);
-static unsigned int get_bitmap(bitstr_t *bs, unsigned int b);
-static unsigned int get_uint(bitstr_t *bs, int b);
+static unsigned int get_len(struct bitstr *bs);
+static unsigned int get_bit(struct bitstr *bs);
+static unsigned int get_bits(struct bitstr *bs, unsigned int b);
+static unsigned int get_bitmap(struct bitstr *bs, unsigned int b);
+static unsigned int get_uint(struct bitstr *bs, int b);
 
 /* Decoder Functions */
-static int decode_nul(bitstr_t *bs, const struct field_t *f, char *base, int level);
-static int decode_bool(bitstr_t *bs, const struct field_t *f, char *base, int level);
-static int decode_oid(bitstr_t *bs, const struct field_t *f, char *base, int level);
-static int decode_int(bitstr_t *bs, const struct field_t *f, char *base, int level);
-static int decode_enum(bitstr_t *bs, const struct field_t *f, char *base, int level);
-static int decode_bitstr(bitstr_t *bs, const struct field_t *f, char *base, int level);
-static int decode_numstr(bitstr_t *bs, const struct field_t *f, char *base, int level);
-static int decode_octstr(bitstr_t *bs, const struct field_t *f, char *base, int level);
-static int decode_bmpstr(bitstr_t *bs, const struct field_t *f, char *base, int level);
-static int decode_seq(bitstr_t *bs, const struct field_t *f, char *base, int level);
-static int decode_seqof(bitstr_t *bs, const struct field_t *f, char *base, int level);
-static int decode_choice(bitstr_t *bs, const struct field_t *f, char *base, int level);
+static int decode_nul(struct bitstr *bs, const struct field_t *f, char *base, int level);
+static int decode_bool(struct bitstr *bs, const struct field_t *f, char *base, int level);
+static int decode_oid(struct bitstr *bs, const struct field_t *f, char *base, int level);
+static int decode_int(struct bitstr *bs, const struct field_t *f, char *base, int level);
+static int decode_enum(struct bitstr *bs, const struct field_t *f, char *base, int level);
+static int decode_bitstr(struct bitstr *bs, const struct field_t *f, char *base, int level);
+static int decode_numstr(struct bitstr *bs, const struct field_t *f, char *base, int level);
+static int decode_octstr(struct bitstr *bs, const struct field_t *f, char *base, int level);
+static int decode_bmpstr(struct bitstr *bs, const struct field_t *f, char *base, int level);
+static int decode_seq(struct bitstr *bs, const struct field_t *f, char *base, int level);
+static int decode_seqof(struct bitstr *bs, const struct field_t *f, char *base, int level);
+static int decode_choice(struct bitstr *bs, const struct field_t *f, char *base, int level);
 
 /* Decoder Functions Vector */
-typedef int (*decoder_t)(bitstr_t *, const struct field_t *, char *, int);
+typedef int (*decoder_t)(struct bitstr *, const struct field_t *, char *, int);
 static const decoder_t Decoders[] = {
decode_nul,
decode_bool,
@@ -150,7 +150,7 @@ static const decoder_t Decoders[] = {
  * Functions
  /
 /* Assume bs is aligned && v < 16384 */
-static unsigned int get_len(bitstr_t *bs)
+static unsigned int get_len(struct bitstr *bs)
 {
unsigned int v;
 
@@ -166,7 +166,7 @@ static unsigned int get_len(bitstr_t *bs)
 }
 
 //
-static unsigned int get_bit(bitstr_t *bs)
+static unsigne

Re: [PATCH net-next 2/5] net: dsa: add slave notify helper

2017-10-12 Thread Florian Fainelli

On 10/12/2017 03:51 PM, Vivien Didelot wrote:
> Both DSA slave create and destroy functions call call_dsa_notifiers with
> respectively DSA_PORT_REGISTER and DSA_PORT_UNREGISTER and the same
> dsa_notifier_register_info structure.
> 
> Wrap this in a dsa_slave_notify helper to prevent cluttering these
> functions.
> 
> Signed-off-by: Vivien Didelot 

Much nicer, thanks!

Reviewed-by: Florian Fainelli 
-- 
Florian

[PATCH net-next 0/5] net: dsa: master and slave helpers

2017-10-12 Thread Vivien Didelot

This patch series adds a few helpers to DSA core for clarity and
readability but brings no functional changes.

A dsa_slave_notify helper calls the DSA notifiers when (un)registering a
slave device.

Most of the DSA slave code only needs to access the dsa_port structure,
not the dsa_slave_priv (which only contains a few PHY-specific members).
Thus a dsa_slave_to_port helper returns a dsa_port structure of a slave
device.

A dsa_slave_get_master helper returns the master device of a slave device.

After that the netdev member of the dsa_port structure is split into two
explicit master and slave members to avoid confusion, even though it is
not planned to create a slave for DSA or CPU ports for the moment.

Vivien Didelot (5):
  net: dsa: use port's cpu_dp when creating a slave
  net: dsa: add slave notify helper
  net: dsa: add slave to port helper
  net: dsa: add slave get master helper
  net: dsa: split dsa_port's netdev member

 drivers/net/dsa/bcm_sf2.c|   6 +-
 drivers/net/dsa/mt7530.c |   2 +-
 drivers/net/dsa/mv88e6xxx/chip.c |   2 +-
 include/net/dsa.h|   7 +-
 net/dsa/dsa.c|   6 +-
 net/dsa/dsa2.c   |  22 ++--
 net/dsa/dsa_priv.h   |  22 ++--
 net/dsa/legacy.c |  20 ++--
 net/dsa/slave.c  | 227 +++
 net/dsa/tag_brcm.c   |   9 +-
 net/dsa/tag_dsa.c|  10 +-
 net/dsa/tag_edsa.c   |  10 +-
 net/dsa/tag_ksz.c|   4 +-
 net/dsa/tag_lan9303.c|   4 +-
 net/dsa/tag_mtk.c|   4 +-
 net/dsa/tag_qca.c|   5 +-
 net/dsa/tag_trailer.c|   4 +-
 17 files changed, 183 insertions(+), 181 deletions(-)

-- 
2.14.2

[PATCH net-next 5/5] net: dsa: split dsa_port's netdev member

2017-10-12 Thread Vivien Didelot

The dsa_port structure has a "netdev" member, which can be used for
either the master device, or the slave device, depending on its type.

It is true that today, CPU ports are not exposed to userspace, thus the
port's netdev member can be used to point to its master interface.

But it is still slightly confusing, so split it into more explicit
"master" and "slave" members.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/bcm_sf2.c|  6 +++---
 drivers/net/dsa/mt7530.c |  2 +-
 drivers/net/dsa/mv88e6xxx/chip.c |  2 +-
 include/net/dsa.h|  7 ++-
 net/dsa/dsa.c|  6 +++---
 net/dsa/dsa2.c   | 22 +++---
 net/dsa/dsa_priv.h   |  4 ++--
 net/dsa/legacy.c | 14 +++---
 net/dsa/slave.c  |  6 +++---
 9 files changed, 37 insertions(+), 32 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 32025b990437..b43c063b9634 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -601,7 +601,7 @@ static void bcm_sf2_sw_fixed_link_update(struct dsa_switch *ds, int port,
 * state machine and make it go in PHY_FORCING state instead.
 */
if (!status->link)
-   netif_carrier_off(ds->ports[port].netdev);
+   netif_carrier_off(ds->ports[port].slave);
status->duplex = 1;
} else {
status->link = 1;
@@ -690,7 +690,7 @@ static int bcm_sf2_sw_resume(struct dsa_switch *ds)
 static void bcm_sf2_sw_get_wol(struct dsa_switch *ds, int port,
   struct ethtool_wolinfo *wol)
 {
-   struct net_device *p = ds->ports[port].cpu_dp->netdev;
+   struct net_device *p = ds->ports[port].cpu_dp->master;
struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
struct ethtool_wolinfo pwol;
 
@@ -713,7 +713,7 @@ static void bcm_sf2_sw_get_wol(struct dsa_switch *ds, int port,
 static int bcm_sf2_sw_set_wol(struct dsa_switch *ds, int port,
  struct ethtool_wolinfo *wol)
 {
-   struct net_device *p = ds->ports[port].cpu_dp->netdev;
+   struct net_device *p = ds->ports[port].cpu_dp->master;
struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
s8 cpu_port = ds->ports[port].cpu_dp->index;
struct ethtool_wolinfo pwol;
diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
index 034241696ce2..fea2e665d0cb 100644
--- a/drivers/net/dsa/mt7530.c
+++ b/drivers/net/dsa/mt7530.c
@@ -933,7 +933,7 @@ mt7530_setup(struct dsa_switch *ds)
 * controller also is the container for two GMACs nodes representing
 * as two netdev instances.
 */
-   dn = ds->ports[MT7530_CPU_PORT].netdev->dev.of_node->parent;
+   dn = ds->ports[MT7530_CPU_PORT].master->dev.of_node->parent;
priv->ethernet = syscon_node_to_regmap(dn);
if (IS_ERR(priv->ethernet))
return PTR_ERR(priv->ethernet);
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index d74c7335c512..955f4e214191 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1124,7 +1124,7 @@ static int mv88e6xxx_port_check_hw_vlan(struct dsa_switch *ds, int port,
if (dsa_is_dsa_port(ds, i) || dsa_is_cpu_port(ds, i))
continue;
 
-   if (!ds->ports[port].netdev)
+   if (!ds->ports[port].slave)
continue;
 
if (vlan.member[i] ==
diff --git a/include/net/dsa.h b/include/net/dsa.h
index ce1d622734d7..4c769bc8a8b5 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -164,6 +164,12 @@ struct dsa_mall_tc_entry {
 
 
 struct dsa_port {
+   /* Master device, physically connected if this is a CPU port */
+   struct net_device *master;
+
+   /* Slave device, if this port is exposed to userspace */
+   struct net_device *slave;
+
/* CPU port tagging operations used by master or slave devices */
const struct dsa_device_ops *tag_ops;
 
@@ -176,7 +182,6 @@ struct dsa_port {
unsigned intindex;
const char  *name;
struct dsa_port *cpu_dp;
-   struct net_device   *netdev;
struct device_node  *dn;
unsigned intageing_time;
u8  stp_state;
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 832c659ff993..a3abf7a7b9a2 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -201,7 +201,7 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
 #ifdef CONFIG_PM_SLEEP
 static bool dsa_is_port_initialized(struct dsa_switch *ds, int p)
 {
-   return ds->enabled_port_mask & (1 << p) && ds->ports[p].netdev;
+   return ds->enabled_port_mask & (1 << p) && ds->ports[p].slave;
 }
 
 int dsa_switch_suspend(s

[PATCH net-next 3/5] net: dsa: add slave to port helper

2017-10-12 Thread Vivien Didelot

Many portions of DSA core code need to get the dsa_port structure
corresponding to a slave net_device. For this purpose, introduce a
dsa_slave_to_port() helper.

Signed-off-by: Vivien Didelot 
---
 net/dsa/dsa_priv.h|   7 +++
 net/dsa/legacy.c  |   6 +-
 net/dsa/slave.c   | 167 +-
 net/dsa/tag_brcm.c|   9 ++-
 net/dsa/tag_dsa.c |  10 +--
 net/dsa/tag_edsa.c|  10 +--
 net/dsa/tag_ksz.c |   4 +-
 net/dsa/tag_lan9303.c |   4 +-
 net/dsa/tag_mtk.c |   4 +-
 net/dsa/tag_qca.c |   5 +-
 net/dsa/tag_trailer.c |   4 +-
 11 files changed, 115 insertions(+), 115 deletions(-)

diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 2850077cc9cc..569a4929b4c9 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -169,6 +169,13 @@ int dsa_slave_resume(struct net_device *slave_dev);
 int dsa_slave_register_notifier(void);
 void dsa_slave_unregister_notifier(void);
 
+static inline struct dsa_port *dsa_slave_to_port(const struct net_device *dev)
+{
+   struct dsa_slave_priv *p = netdev_priv(dev);
+
+   return p->dp;
+}
+
 /* switch.c */
 int dsa_switch_register_notifier(struct dsa_switch *ds);
 void dsa_switch_unregister_notifier(struct dsa_switch *ds);
diff --git a/net/dsa/legacy.c b/net/dsa/legacy.c
index 19ff6e0a21dc..6f2254753859 100644
--- a/net/dsa/legacy.c
+++ b/net/dsa/legacy.c
@@ -746,8 +746,7 @@ int dsa_legacy_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
   const unsigned char *addr, u16 vid,
   u16 flags)
 {
-   struct dsa_slave_priv *p = netdev_priv(dev);
-   struct dsa_port *dp = p->dp;
+   struct dsa_port *dp = dsa_slave_to_port(dev);
 
return dsa_port_fdb_add(dp, addr, vid);
 }
@@ -756,8 +755,7 @@ int dsa_legacy_fdb_del(struct ndmsg *ndm, struct nlattr *tb[],
   struct net_device *dev,
   const unsigned char *addr, u16 vid)
 {
-   struct dsa_slave_priv *p = netdev_priv(dev);
-   struct dsa_port *dp = p->dp;
+   struct dsa_port *dp = dsa_slave_to_port(dev);
 
return dsa_port_fdb_del(dp, addr, vid);
 }
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index f31737256f69..894602c88b09 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -72,8 +72,8 @@ static int dsa_slave_get_iflink(const struct net_device *dev)
 static int dsa_slave_open(struct net_device *dev)
 {
struct dsa_slave_priv *p = netdev_priv(dev);
-   struct dsa_port *dp = p->dp;
struct net_device *master = dsa_master_netdev(p);
+   struct dsa_port *dp = dsa_slave_to_port(dev);
int err;
 
if (!(master->flags & IFF_UP))
@@ -122,7 +122,7 @@ static int dsa_slave_close(struct net_device *dev)
 {
struct dsa_slave_priv *p = netdev_priv(dev);
struct net_device *master = dsa_master_netdev(p);
-   struct dsa_port *dp = p->dp;
+   struct dsa_port *dp = dsa_slave_to_port(dev);
 
if (dev->phydev)
phy_stop(dev->phydev);
@@ -246,14 +246,13 @@ dsa_slave_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb,
   struct net_device *dev, struct net_device *filter_dev,
   int *idx)
 {
+   struct dsa_port *dp = dsa_slave_to_port(dev);
struct dsa_slave_dump_ctx dump = {
.dev = dev,
.skb = skb,
.cb = cb,
.idx = *idx,
};
-   struct dsa_slave_priv *p = netdev_priv(dev);
-   struct dsa_port *dp = p->dp;
int err;
 
err = dsa_port_fdb_dump(dp, dsa_slave_port_fdb_do_dump, &dump);
@@ -274,8 +273,7 @@ static int dsa_slave_port_attr_set(struct net_device *dev,
   const struct switchdev_attr *attr,
   struct switchdev_trans *trans)
 {
-   struct dsa_slave_priv *p = netdev_priv(dev);
-   struct dsa_port *dp = p->dp;
+   struct dsa_port *dp = dsa_slave_to_port(dev);
int ret;
 
switch (attr->id) {
@@ -301,8 +299,7 @@ static int dsa_slave_port_obj_add(struct net_device *dev,
  const struct switchdev_obj *obj,
  struct switchdev_trans *trans)
 {
-   struct dsa_slave_priv *p = netdev_priv(dev);
-   struct dsa_port *dp = p->dp;
+   struct dsa_port *dp = dsa_slave_to_port(dev);
int err;
 
/* For the prepare phase, ensure the full set of changes is feasable in
@@ -329,8 +326,7 @@ static int dsa_slave_port_obj_add(struct net_device *dev,
 static int dsa_slave_port_obj_del(struct net_device *dev,
  const struct switchdev_obj *obj)
 {
-   struct dsa_slave_priv *p = netdev_priv(dev);
-   struct dsa_port *dp = p->dp;
+   struct dsa_port *dp = dsa_slave_to_port(dev);
int err;
 
switch (obj->id) {
@@ -351,8 +347,8 @@ static int dsa_slave_port_obj_del(struct net_device *dev,
 static int dsa_slave_port_attr

Re: [PATCH net-next 1/5] net: dsa: use port's cpu_dp when creating a slave

2017-10-12 Thread Florian Fainelli

On 10/12/2017 03:51 PM, Vivien Didelot wrote:
> When dsa_slave_create is called, the related port already has a CPU port
> assigned to it, available in its cpu_dp member. Use it instead of the
> unique tree cpu_dp.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Florian Fainelli 
-- 
Florian

[PATCH net-next 1/5] net: dsa: use port's cpu_dp when creating a slave

2017-10-12 Thread Vivien Didelot

When dsa_slave_create is called, the related port already has a CPU port
assigned to it, available in its cpu_dp member. Use it instead of the
unique tree cpu_dp.

Signed-off-by: Vivien Didelot 
---
 net/dsa/slave.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 45f4ea845c07..c6f4829645bf 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1117,16 +1117,13 @@ int dsa_slave_resume(struct net_device *slave_dev)
 int dsa_slave_create(struct dsa_port *port, const char *name)
 {
struct dsa_notifier_register_info rinfo = { };
+   struct dsa_port *cpu_dp = port->cpu_dp;
+   struct net_device *master = cpu_dp->netdev;
struct dsa_switch *ds = port->ds;
-   struct net_device *master;
struct net_device *slave_dev;
struct dsa_slave_priv *p;
-   struct dsa_port *cpu_dp;
int ret;
 
-   cpu_dp = ds->dst->cpu_dp;
-   master = cpu_dp->netdev;
-
if (!ds->num_tx_queues)
ds->num_tx_queues = 1;
 
-- 
2.14.2

[PATCH net-next 4/5] net: dsa: add slave get master helper

2017-10-12 Thread Vivien Didelot

Many parts of the DSA slave code need to get the master device
assigned to a slave device. Remove dsa_master_netdev() in favor of a
dsa_slave_get_master() helper which does that.

Signed-off-by: Vivien Didelot 
---
 net/dsa/dsa_priv.h | 13 -
 net/dsa/slave.c| 26 +-
 2 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 569a4929b4c9..f1dc5a856fda 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -176,6 +176,14 @@ static inline struct dsa_port *dsa_slave_to_port(const struct net_device *dev)
return p->dp;
 }
 
+static inline struct net_device *
+dsa_slave_get_master(const struct net_device *dev)
+{
+   struct dsa_port *dp = dsa_slave_to_port(dev);
+
+   return dp->cpu_dp->netdev;
+}
+
 /* switch.c */
 int dsa_switch_register_notifier(struct dsa_switch *ds);
 void dsa_switch_unregister_notifier(struct dsa_switch *ds);
@@ -204,9 +212,4 @@ extern const struct dsa_device_ops qca_netdev_ops;
 /* tag_trailer.c */
 extern const struct dsa_device_ops trailer_netdev_ops;
 
-static inline struct net_device *dsa_master_netdev(struct dsa_slave_priv *p)
-{
-   return p->dp->cpu_dp->netdev;
-}
-
 #endif
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 894602c88b09..d2c780f13d78 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -64,15 +64,12 @@ void dsa_slave_mii_bus_init(struct dsa_switch *ds)
 /* slave device handling /
 static int dsa_slave_get_iflink(const struct net_device *dev)
 {
-   struct dsa_slave_priv *p = netdev_priv(dev);
-
-   return dsa_master_netdev(p)->ifindex;
+   return dsa_slave_get_master(dev)->ifindex;
 }
 
 static int dsa_slave_open(struct net_device *dev)
 {
-   struct dsa_slave_priv *p = netdev_priv(dev);
-   struct net_device *master = dsa_master_netdev(p);
+   struct net_device *master = dsa_slave_get_master(dev);
struct dsa_port *dp = dsa_slave_to_port(dev);
int err;
 
@@ -120,8 +117,7 @@ static int dsa_slave_open(struct net_device *dev)
 
 static int dsa_slave_close(struct net_device *dev)
 {
-   struct dsa_slave_priv *p = netdev_priv(dev);
-   struct net_device *master = dsa_master_netdev(p);
+   struct net_device *master = dsa_slave_get_master(dev);
struct dsa_port *dp = dsa_slave_to_port(dev);
 
if (dev->phydev)
@@ -144,8 +140,7 @@ static int dsa_slave_close(struct net_device *dev)
 
 static void dsa_slave_change_rx_flags(struct net_device *dev, int change)
 {
-   struct dsa_slave_priv *p = netdev_priv(dev);
-   struct net_device *master = dsa_master_netdev(p);
+   struct net_device *master = dsa_slave_get_master(dev);
 
if (change & IFF_ALLMULTI)
dev_set_allmulti(master, dev->flags & IFF_ALLMULTI ? 1 : -1);
@@ -155,8 +150,7 @@ static void dsa_slave_change_rx_flags(struct net_device *dev, int change)
 
 static void dsa_slave_set_rx_mode(struct net_device *dev)
 {
-   struct dsa_slave_priv *p = netdev_priv(dev);
-   struct net_device *master = dsa_master_netdev(p);
+   struct net_device *master = dsa_slave_get_master(dev);
 
dev_mc_sync(master, dev);
dev_uc_sync(master, dev);
@@ -164,8 +158,7 @@ static void dsa_slave_set_rx_mode(struct net_device *dev)
 
 static int dsa_slave_set_mac_address(struct net_device *dev, void *a)
 {
-   struct dsa_slave_priv *p = netdev_priv(dev);
-   struct net_device *master = dsa_master_netdev(p);
+   struct net_device *master = dsa_slave_get_master(dev);
struct sockaddr *addr = a;
int err;
 
@@ -409,7 +402,7 @@ static netdev_tx_t dsa_slave_xmit(struct sk_buff *skb, struct net_device *dev)
/* Queue the SKB for transmission on the parent interface, but
 * do not modify its EtherType
 */
-   nskb->dev = dsa_master_netdev(p);
+   nskb->dev = dsa_slave_get_master(dev);
dev_queue_xmit(nskb);
 
return NETDEV_TX_OK;
@@ -632,8 +625,8 @@ static int dsa_slave_get_eee(struct net_device *dev, struct ethtool_eee *e)
 static int dsa_slave_netpoll_setup(struct net_device *dev,
   struct netpoll_info *ni)
 {
+   struct net_device *master = dsa_slave_get_master(dev);
struct dsa_slave_priv *p = netdev_priv(dev);
-   struct net_device *master = dsa_master_netdev(p);
struct netpoll *netpoll;
int err = 0;
 
@@ -1115,8 +1108,7 @@ int dsa_slave_resume(struct net_device *slave_dev)
 
 static void dsa_slave_notify(struct net_device *dev, unsigned long val)
 {
-   struct dsa_slave_priv *p = netdev_priv(dev);
-   struct net_device *master = dsa_master_netdev(p);
+   struct net_device *master = dsa_slave_get_master(dev);
struct dsa_port *dp = dsa_slave_to_port(dev);
struct dsa_notifier_register_info rinfo = {
.switch_number = dp->ds->index,
-- 
2.14.2

[PATCH net-next 2/5] net: dsa: add slave notify helper

2017-10-12 Thread Vivien Didelot

Both DSA slave create and destroy functions call call_dsa_notifiers with
respectively DSA_PORT_REGISTER and DSA_PORT_UNREGISTER and the same
dsa_notifier_register_info structure.

Wrap this in a dsa_slave_notify helper to prevent cluttering these
functions.

Signed-off-by: Vivien Didelot 
---
 net/dsa/slave.c | 29 +
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index c6f4829645bf..f31737256f69 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1114,9 +1114,23 @@ int dsa_slave_resume(struct net_device *slave_dev)
return 0;
 }
 
+static void dsa_slave_notify(struct net_device *dev, unsigned long val)
+{
+   struct dsa_slave_priv *p = netdev_priv(dev);
+   struct net_device *master = dsa_master_netdev(p);
+   struct dsa_port *dp = p->dp;
+   struct dsa_notifier_register_info rinfo = {
+   .switch_number = dp->ds->index,
+   .port_number = dp->index,
+   .master = master,
+   .info.dev = dev,
+   };
+
+   call_dsa_notifiers(val, dev, &rinfo.info);
+}
+
 int dsa_slave_create(struct dsa_port *port, const char *name)
 {
-   struct dsa_notifier_register_info rinfo = { };
struct dsa_port *cpu_dp = port->cpu_dp;
struct net_device *master = cpu_dp->netdev;
struct dsa_switch *ds = port->ds;
@@ -1175,11 +1189,7 @@ int dsa_slave_create(struct dsa_port *port, const char *name)
goto out_free;
}
 
-   rinfo.info.dev = slave_dev;
-   rinfo.master = master;
-   rinfo.port_number = p->dp->index;
-   rinfo.switch_number = p->dp->ds->index;
-   call_dsa_notifiers(DSA_PORT_REGISTER, slave_dev, &rinfo.info);
+   dsa_slave_notify(slave_dev, DSA_PORT_REGISTER);
 
ret = register_netdev(slave_dev);
if (ret) {
@@ -1204,7 +1214,6 @@ int dsa_slave_create(struct dsa_port *port, const char *name)
 void dsa_slave_destroy(struct net_device *slave_dev)
 {
struct dsa_slave_priv *p = netdev_priv(slave_dev);
-   struct dsa_notifier_register_info rinfo = { };
struct device_node *port_dn;
 
port_dn = p->dp->dn;
@@ -1216,11 +1225,7 @@ void dsa_slave_destroy(struct net_device *slave_dev)
if (of_phy_is_fixed_link(port_dn))
of_phy_deregister_fixed_link(port_dn);
}
-   rinfo.info.dev = slave_dev;
-   rinfo.master = p->dp->cpu_dp->netdev;
-   rinfo.port_number = p->dp->index;
-   rinfo.switch_number = p->dp->ds->index;
-   call_dsa_notifiers(DSA_PORT_UNREGISTER, slave_dev, &rinfo.info);
+   dsa_slave_notify(slave_dev, DSA_PORT_UNREGISTER);
unregister_netdev(slave_dev);
free_percpu(p->stats64);
free_netdev(slave_dev);
-- 
2.14.2

[Patch net-next v2] tcp: add a tracepoint for tcp_retransmit_skb()

2017-10-12 Thread Cong Wang

We need a real-time notification for tcp retransmission
for monitoring.

Of course we could use ftrace to dynamically instrument this
kernel function too; however, we can't retrieve the connection
information at the same time. For example, perf-tools [1] reads
/proc/net/tcp for socket details, which is slow when we have
a lot of connections.

Therefore, this patch adds a tracepoint for tcp_retransmit_skb()
and exposes src/dst IP addresses and ports of the connection.
This also makes it easier to integrate into perf.

Note, I expose both IPv4 and IPv6 addresses at the same time:
for an IPv4 socket, the v4-mapped address is used as the IPv6 address;
for an IPv6 socket, LOOPBACK4_IPV6 is already filled in by the kernel.
Also, add sk and skb pointers as they are useful for BPF.

1. https://github.com/brendangregg/perf-tools/blob/master/net/tcpretrans

Cc: Eric Dumazet 
Cc: Alexei Starovoitov 
Cc: Hannes Frederic Sowa 
Cc: Brendan Gregg 
Cc: Neal Cardwell 
Signed-off-by: Cong Wang 
---
 include/trace/events/tcp.h | 68 ++
 net/core/net-traces.c  |  1 +
 net/ipv4/tcp_output.c  |  3 ++
 3 files changed, 72 insertions(+)
 create mode 100644 include/trace/events/tcp.h

diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
new file mode 100644
index ..749f93c542ab
--- /dev/null
+++ b/include/trace/events/tcp.h
@@ -0,0 +1,68 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM tcp
+
+#if !defined(_TRACE_TCP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_TCP_H
+
+#include 
+#include 
+#include 
+#include 
+
+TRACE_EVENT(tcp_retransmit_skb,
+
+   TP_PROTO(struct sock *sk, struct sk_buff *skb, int segs),
+
+   TP_ARGS(sk, skb, segs),
+
+   TP_STRUCT__entry(
+   __field(void *, skbaddr)
+   __field(void *, skaddr)
+   __field(__u16, sport)
+   __field(__u16, dport)
+   __array(__u8, saddr, 4)
+   __array(__u8, daddr, 4)
+   __array(__u8, saddr_v6, 16)
+   __array(__u8, daddr_v6, 16)
+   ),
+
+   TP_fast_assign(
+   struct ipv6_pinfo *np = inet6_sk(sk);
+   struct inet_sock *inet = inet_sk(sk);
+   struct in6_addr *pin6;
+   __be32 *p32;
+
+   __entry->skbaddr = skb;
+   __entry->skaddr = sk;
+
+   __entry->sport = ntohs(inet->inet_sport);
+   __entry->dport = ntohs(inet->inet_dport);
+
+   p32 = (__be32 *) __entry->saddr;
+   *p32 = inet->inet_saddr;
+
+   p32 = (__be32 *) __entry->daddr;
+   *p32 =  inet->inet_daddr;
+
+   if (np) {
+   pin6 = (struct in6_addr *)__entry->saddr_v6;
+   *pin6 = np->saddr;
+   pin6 = (struct in6_addr *)__entry->daddr_v6;
+   *pin6 = *(np->daddr_cache);
+   } else {
+   pin6 = (struct in6_addr *)__entry->saddr_v6;
+   ipv6_addr_set_v4mapped(inet->inet_saddr, pin6);
+   pin6 = (struct in6_addr *)__entry->daddr_v6;
+   ipv6_addr_set_v4mapped(inet->inet_daddr, pin6);
+   }
+   ),
+
+   TP_printk("sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6 daddrv6=%pI6",
+ __entry->sport, __entry->dport, __entry->saddr, __entry->daddr,
+ __entry->saddr_v6, __entry->daddr_v6)
+);
+
+#endif /* _TRACE_TCP_H */
+
+/* This part must be outside protection */
+#include 
diff --git a/net/core/net-traces.c b/net/core/net-traces.c
index 1132820c8e62..f4e4fa2db505 100644
--- a/net/core/net-traces.c
+++ b/net/core/net-traces.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #if IS_ENABLED(CONFIG_IPV6)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 696b0a168f16..e1e7410a5b60 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -42,6 +42,8 @@
 #include 
 #include 
 
+#include 
+
 /* People can turn this off for buggy TCP's found in printers etc. */
 int sysctl_tcp_retrans_collapse __read_mostly = 1;
 
@@ -2875,6 +2877,7 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
 
if (likely(!err)) {
TCP_SKB_CB(skb)->sacked |= TCPCB_EVER_RETRANS;
+   trace_tcp_retransmit_skb(sk, skb, segs);
} else if (err != -EBUSY) {
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPRETRANSFAIL);
}
-- 
2.13.0
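
For readers unfamiliar with the v4-mapped convention mentioned in the
commit message above, here is a minimal userspace sketch. It is not kernel
code and not part of the patch; the helper name v4_to_v4mapped() is
hypothetical. It fills the same ::ffff:a.b.c.d layout that
ipv6_addr_set_v4mapped() produces for the tracepoint's saddr_v6/daddr_v6
fields.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/*
 * Hypothetical userspace helper (not from the patch): build the
 * IPv4-mapped IPv6 address ::ffff:a.b.c.d from an IPv4 address that
 * is already in network byte order.
 */
static void v4_to_v4mapped(struct in_addr v4, struct in6_addr *v6)
{
	assert(v6 != NULL);

	/* Bytes 0..9 are zero, bytes 10..11 are 0xff. */
	memset(v6, 0, sizeof(*v6));
	v6->s6_addr[10] = 0xff;
	v6->s6_addr[11] = 0xff;

	/* Bytes 12..15 carry the IPv4 address unchanged. */
	memcpy(&v6->s6_addr[12], &v4.s_addr, sizeof(v4.s_addr));
}
```

A consumer of the trace event could then treat every connection as IPv6
and use a check such as IN6_IS_ADDR_V4MAPPED() to tell the two cases
apart, assuming it only needs the address and not the socket family.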

Re: [PATCH] Add -target to clang switches while cross compiling.

2017-10-12 Thread Abhijit Ayarekar

On Thu, Oct 12, 2017 at 03:23:04PM -0700, Alexei Starovoitov wrote:
> On Thu, Oct 12, 2017 at 01:45:57PM -0700, Abhijit Ayarekar wrote:
> > Latest llvm update excludes assembly instructions.
> > As a result __ASM_SYSREG_H define is not required.
> > -target switch includes appropriate target specific files.
> > 
> > Tested on x86 and arm64 with llvm with git revision
> > commit df6ca162269f9d756f8742bf4b658dcf690e3eb5
> > Author: Yonghong Song 
> > Date:   Thu Sep 28 02:46:11 2017 +
> > 
> > bpf: add new insns for bswap_to_le and negation
> > 
> > Signed-off-by: Abhijit Ayarekar 
> > ---
> >  samples/bpf/Makefile | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> > index ebc2ad6..81f9fcd 100644
> > --- a/samples/bpf/Makefile
> > +++ b/samples/bpf/Makefile
> > @@ -180,6 +180,7 @@ CLANG ?= clang
> >  # Detect that we're cross compiling and use the cross compiler
> >  ifdef CROSS_COMPILE
> >  HOSTCC = $(CROSS_COMPILE)gcc
> > +CLANG_ARCH_ARGS = -target $(ARCH)
> 
> this is only needed because you're cross compiling, right?
Yes

> In native compilation it's an unnecessary flag.
> Only dropping -D__ASM_SYSREG_H is enough, correct?
> 
Yes. That is correct.

> >  endif
> >  
> >  # Trick to allow make to be run from this directory
> > @@ -229,9 +230,9 @@ $(obj)/tracex5_kern.o: $(obj)/syscall_nrs.h
> >  $(obj)/%.o: $(src)/%.c
> > $(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) -I$(obj) \
> > -I$(srctree)/tools/testing/selftests/bpf/ \
> > -   -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
> > +   -D__KERNEL__ -Wno-unused-value -Wno-pointer-sign \
> > -D__TARGET_ARCH_$(ARCH) -Wno-compare-distinct-pointer-types \
> > -Wno-gnu-variable-sized-type-not-at-end \
> > -Wno-address-of-packed-member -Wno-tautological-compare \
> > -   -Wno-unknown-warning-option \
> > +   -Wno-unknown-warning-option $(CLANG_ARCH_ARGS) \
> > -O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
> > -- 
> > 2.7.4
> >

Re: [PATCH net-next] net: dsa: set random switch address

2017-10-12 Thread Florian Fainelli

On 10/12/2017 03:35 PM, Vivien Didelot wrote:
> Hi Florian,
> 
> Florian Fainelli  writes:
> 
>> On 10/12/2017 03:10 PM, Vivien Didelot wrote:
>>> An Ethernet switch may support having a MAC address, which can be used
>>> as the switch's source address in transmitted full-duplex Pause frames.
>>>
>>> If a DSA switch supports the related .set_addr operation, the DSA core
>>> sets the master's MAC address on the switch.
>>>
>>> This won't make sense anymore in a multi-CPU ports system, because there
>>> won't be a unique master device assigned to a switch tree.
>>
>> Thus far, everything you have said is true, but why we should do it,
>> that is: what if we don't, needs to be explained. Does that create a
>> problem with the generation of pause frames throughout the switch fabric?
>>
>>>
>>> To fix this, assign a random MAC address to the switch chip instead.
>>
>> Maybe this is something that should be removed entirely from the DSA
>> core and pushed into the individual switch drivers instead. dsa_loop
>> implements it for code coverage, but that does not do anything.
>>
>> set_addr is confusing in that you may think it could be used to program
>> the switch with the MAC address of the CPU/management port such that you
>> can disable MAC address learning on said port, but in fact, that's not
>> how it is used.
> 
> You are correct. So what I can do is assign a random MAC address in the
> Marvell driver, remove the .set_addr implementation of mv88e6xxx and
> dsa_loop, and finally remove this code from DSA core completely.

Works for me, thanks!
-- 
Florian

[PATCH] tracing: bpf: Hide bpf trace events when they are not used

2017-10-12 Thread Steven Rostedt

From: Steven Rostedt (VMware) 

All the trace events defined in include/trace/events/bpf.h are only
used when CONFIG_BPF_SYSCALL is defined. But this file gets included by
include/linux/bpf_trace.h which is included by the networking code with
CREATE_TRACE_POINTS defined.

If a trace event is created but not used it still has data structures
and functions created for its use, even though nothing is using them.
To not waste space, do not define the BPF trace events in bpf.h unless
CONFIG_BPF_SYSCALL is defined.

Signed-off-by: Steven Rostedt (VMware) 
---
Index: linux-trace.git/include/trace/events/bpf.h
===
--- linux-trace.git.orig/include/trace/events/bpf.h
+++ linux-trace.git/include/trace/events/bpf.h
@@ -4,6 +4,9 @@
 #if !defined(_TRACE_BPF_H) || defined(TRACE_HEADER_MULTI_READ)
 #define _TRACE_BPF_H
 
+/* These are only used within the BPF_SYSCALL code */
+#ifdef CONFIG_BPF_SYSCALL
+
 #include 
 #include 
 #include 
@@ -345,7 +348,7 @@ TRACE_EVENT(bpf_map_next_key,
  __print_hex(__get_dynamic_array(nxt), __entry->key_len),
  __entry->key_trunc ? " ..." : "")
 );
-
+#endif /* CONFIG_BPF_SYSCALL */
 #endif /* _TRACE_BPF_H */
 
 #include 
Index: linux-trace.git/kernel/bpf/core.c
===
--- linux-trace.git.orig/kernel/bpf/core.c
+++ linux-trace.git/kernel/bpf/core.c
@@ -1498,5 +1498,8 @@ int __weak skb_copy_bits(const struct sk
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_exception);
 
+/* These are only used within the BPF_SYSCALL code */
+#ifdef CONFIG_BPF_SYSCALL
 EXPORT_TRACEPOINT_SYMBOL_GPL(bpf_prog_get_type);
 EXPORT_TRACEPOINT_SYMBOL_GPL(bpf_prog_put_rcu);
+#endif

[net-next RFC 0/4] Openvswitch meter action

2017-10-12 Thread Andy Zhou

This patch series is the first attempt to add openvswitch
meter support. We have previously experimented with adding
metering support in nftables. However 1) It was not clear
how to expose a named nftables object cleanly, and 2)
the logic that implements metering is quite small, < 100 lines
of code.

With those two observations, it seems cleaner to add meter
support in the openvswitch module directly.


Andy Zhou (4):
  openvswitch: Add meter netlink definitions
  openvswitch: export get_dp() API.
  openvswitch: Add meter infrastructure
  openvswitch: Add meter action support

 include/uapi/linux/openvswitch.h |  52 
 net/openvswitch/Makefile |   1 +
 net/openvswitch/actions.c|  12 +
 net/openvswitch/datapath.c   |  43 +--
 net/openvswitch/datapath.h   |  35 +++
 net/openvswitch/flow_netlink.c   |   6 +
 net/openvswitch/meter.c  | 611 +++
 net/openvswitch/meter.h  |  54 
 8 files changed, 783 insertions(+), 31 deletions(-)
 create mode 100644 net/openvswitch/meter.c
 create mode 100644 net/openvswitch/meter.h

-- 
1.8.3.1

[net-next RFC 2/4] openvswitch: export get_dp() API.

2017-10-12 Thread Andy Zhou

Later patches will invoke get_dp() outside of datapath.c. Export it.

Signed-off-by: Andy Zhou 
---
 net/openvswitch/datapath.c | 29 -
 net/openvswitch/datapath.h | 31 +++
 2 files changed, 31 insertions(+), 29 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index c3aec6227c91..ac7154018676 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -142,35 +142,6 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *,
  const struct dp_upcall_info *,
  uint32_t cutlen);
 
-/* Must be called with rcu_read_lock. */
-static struct datapath *get_dp_rcu(struct net *net, int dp_ifindex)
-{
-   struct net_device *dev = dev_get_by_index_rcu(net, dp_ifindex);
-
-   if (dev) {
-   struct vport *vport = ovs_internal_dev_get_vport(dev);
-   if (vport)
-   return vport->dp;
-   }
-
-   return NULL;
-}
-
-/* The caller must hold either ovs_mutex or rcu_read_lock to keep the
- * returned dp pointer valid.
- */
-static inline struct datapath *get_dp(struct net *net, int dp_ifindex)
-{
-   struct datapath *dp;
-
-   WARN_ON_ONCE(!rcu_read_lock_held() && !lockdep_ovsl_is_held());
-   rcu_read_lock();
-   dp = get_dp_rcu(net, dp_ifindex);
-   rcu_read_unlock();
-
-   return dp;
-}
-
 /* Must be called with rcu_read_lock or ovs_mutex. */
 const char *ovs_dp_name(const struct datapath *dp)
 {
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 480600649d0b..ad14b571219d 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -30,6 +30,7 @@
 #include "conntrack.h"
 #include "flow.h"
 #include "flow_table.h"
+#include "vport-internal_dev.h"
 
 #define DP_MAX_PORTS   USHRT_MAX
 #define DP_VPORT_HASH_BUCKETS  1024
@@ -190,6 +191,36 @@ static inline struct vport *ovs_vport_ovsl(const struct datapath *dp, int port_n
return ovs_lookup_vport(dp, port_no);
 }
 
+/* Must be called with rcu_read_lock. */
+static inline struct datapath *get_dp_rcu(struct net *net, int dp_ifindex)
+{
+   struct net_device *dev = dev_get_by_index_rcu(net, dp_ifindex);
+
+   if (dev) {
+   struct vport *vport = ovs_internal_dev_get_vport(dev);
+
+   if (vport)
+   return vport->dp;
+   }
+
+   return NULL;
+}
+
+/* The caller must hold either ovs_mutex or rcu_read_lock to keep the
+ * returned dp pointer valid.
+ */
+static inline struct datapath *get_dp(struct net *net, int dp_ifindex)
+{
+   struct datapath *dp;
+
+   WARN_ON_ONCE(!rcu_read_lock_held() && !lockdep_ovsl_is_held());
+   rcu_read_lock();
+   dp = get_dp_rcu(net, dp_ifindex);
+   rcu_read_unlock();
+
+   return dp;
+}
+
 extern struct notifier_block ovs_dp_device_notifier;
 extern struct genl_family dp_vport_genl_family;
 
-- 
1.8.3.1

[net-next RFC 3/4] openvswitch: Add meter infrastructure

2017-10-12 Thread Andy Zhou

The OVS kernel datapath so far does not support the OpenFlow meter action.
This is the first stab at adding kernel datapath meter support.
This implementation supports only drop band type.

Signed-off-by: Andy Zhou 
---
 net/openvswitch/Makefile   |   1 +
 net/openvswitch/datapath.c |  14 +-
 net/openvswitch/datapath.h |   3 +
 net/openvswitch/meter.c| 611 +
 net/openvswitch/meter.h|  54 
 5 files changed, 681 insertions(+), 2 deletions(-)
 create mode 100644 net/openvswitch/meter.c
 create mode 100644 net/openvswitch/meter.h

diff --git a/net/openvswitch/Makefile b/net/openvswitch/Makefile
index 60f809085b92..658383fbdf53 100644
--- a/net/openvswitch/Makefile
+++ b/net/openvswitch/Makefile
@@ -11,6 +11,7 @@ openvswitch-y := \
flow.o \
flow_netlink.o \
flow_table.o \
+   meter.o \
vport.o \
vport-internal_dev.o \
vport-netdev.o
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index ac7154018676..eef8d3ea3aae 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -55,6 +55,7 @@
 #include "flow.h"
 #include "flow_table.h"
 #include "flow_netlink.h"
+#include "meter.h"
 #include "vport-internal_dev.h"
 #include "vport-netdev.h"
 
@@ -174,6 +175,7 @@ static void destroy_dp_rcu(struct rcu_head *rcu)
ovs_flow_tbl_destroy(&dp->table);
free_percpu(dp->stats_percpu);
kfree(dp->ports);
+   ovs_meters_exit(dp);
kfree(dp);
 }
 
@@ -1572,6 +1574,10 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
for (i = 0; i < DP_VPORT_HASH_BUCKETS; i++)
INIT_HLIST_HEAD(&dp->ports[i]);
 
+   err = ovs_meters_init(dp);
+   if (err)
+   goto err_destroy_ports_array;
+
/* Set up our datapath device. */
parms.name = nla_data(a[OVS_DP_ATTR_NAME]);
parms.type = OVS_VPORT_TYPE_INTERNAL;
@@ -1600,7 +1606,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
ovs_dp_reset_user_features(skb, info);
}
 
-   goto err_destroy_ports_array;
+   goto err_destroy_meters;
}
 
err = ovs_dp_cmd_fill_info(dp, reply, info->snd_portid,
@@ -1615,8 +1621,10 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
ovs_notify(&dp_datapath_genl_family, reply, info);
return 0;
 
-err_destroy_ports_array:
+err_destroy_meters:
ovs_unlock();
+   ovs_meters_exit(dp);
+err_destroy_ports_array:
kfree(dp->ports);
 err_destroy_percpu:
free_percpu(dp->stats_percpu);
@@ -2244,6 +2252,7 @@ struct genl_family dp_vport_genl_family __ro_after_init = {
&dp_vport_genl_family,
&dp_flow_genl_family,
&dp_packet_genl_family,
+   &dp_meter_genl_family,
 };
 
 static void dp_unregister_genl(int n_families)
@@ -2424,3 +2433,4 @@ static void dp_cleanup(void)
 MODULE_ALIAS_GENL_FAMILY(OVS_VPORT_FAMILY);
 MODULE_ALIAS_GENL_FAMILY(OVS_FLOW_FAMILY);
 MODULE_ALIAS_GENL_FAMILY(OVS_PACKET_FAMILY);
+MODULE_ALIAS_GENL_FAMILY(OVS_METER_FAMILY);
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index ad14b571219d..d1ffa1d9fe57 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -92,6 +92,9 @@ struct datapath {
u32 user_features;
 
u32 max_headroom;
+
+   /* Switch meters. */
+   struct hlist_head *meters;
 };
 
 /**
diff --git a/net/openvswitch/meter.c b/net/openvswitch/meter.c
new file mode 100644
index ..f24ebb5f7af4
--- /dev/null
+++ b/net/openvswitch/meter.c
@@ -0,0 +1,611 @@
+/*
+ * Copyright (c) 2017 Nicira, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include "datapath.h"
+#include "meter.h"
+
+#define METER_HASH_BUCKETS 1024
+
+static const struct nla_policy meter_policy[OVS_METER_ATTR_MAX + 1] = {
+   [OVS_METER_ATTR_ID] = { .type = NLA_U32, },
+   [OVS_METER_ATTR_KBPS] = { .type = NLA_FLAG },
+   [OVS_METER_ATTR_STATS] = { .len = sizeof(struct ovs_flow_stats) },
+   [OVS_METER_ATTR_BANDS] = { .type = NLA_NESTED },
+   [OVS_METER_ATTR_USED] = { .type = NLA_U64 },
+   [OVS_METER_ATTR_CLEAR] = { .type = NLA_FLAG },
+   [OVS_METER_ATTR_MAX_METERS] = { .type = NLA_U32 },
+   [OVS_METER_ATTR_MAX_BANDS] = { .type = NLA_U32 },
+};
+
+static const struct nla_policy band_policy[OVS_BAND_ATTR_MAX + 1] = {
+   [OVS_BAND_ATTR_TYPE] = { .type = NLA_U32, },
+   [OVS_BAND_ATTR_RATE] = { .type = NLA_U32, },
+   [OVS_BAND_ATTR_BURST] = { .type = NLA_U32, },
+   [OVS_BAND_ATTR_STATS] = { .len = sizeof(struct

[net-next RFC 1/4] openvswitch: Add meter netlink definitions

2017-10-12 Thread Andy Zhou

Meter has its own netlink family. Define netlink messages and attributes
for communicating with the user space programs.

Signed-off-by: Andy Zhou 
---
 include/uapi/linux/openvswitch.h | 51 
 1 file changed, 51 insertions(+)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 156ee4cab82e..325049a129e4 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -848,4 +848,55 @@ enum ovs_action_attr {
 
 #define OVS_ACTION_ATTR_MAX (__OVS_ACTION_ATTR_MAX - 1)
 
+/* Meters. */
+#define OVS_METER_FAMILY  "ovs_meter"
+#define OVS_METER_MCGROUP "ovs_meter"
+#define OVS_METER_VERSION 0x1
+
+enum ovs_meter_cmd {
+   OVS_METER_CMD_UNSPEC,
+   OVS_METER_CMD_FEATURES, /* Get features supported by the datapath. */
+   OVS_METER_CMD_SET,  /* Add or modify a meter. */
+   OVS_METER_CMD_DEL,  /* Delete a meter. */
+   OVS_METER_CMD_GET   /* Get meter stats. */
+};
+
+enum ovs_meter_attr {
+   OVS_METER_ATTR_UNSPEC,
+   OVS_METER_ATTR_ID,  /* u32 meter ID within datapath. */
+   OVS_METER_ATTR_KBPS,/* No argument. If set, units in kilobits
+* per second. Otherwise, units in
+* packets per second.
+*/
+   OVS_METER_ATTR_STATS,   /* struct ovs_flow_stats for the meter. */
+   OVS_METER_ATTR_BANDS,   /* Nested attributes for meter bands. */
+   OVS_METER_ATTR_USED,/* u64 msecs last used in monotonic time. */
+   OVS_METER_ATTR_CLEAR,   /* Flag to clear stats, used. */
+   OVS_METER_ATTR_MAX_METERS, /* u32 number of meters supported. */
+   OVS_METER_ATTR_MAX_BANDS,  /* u32 max number of bands per meter. */
+   OVS_METER_ATTR_PAD,
+   __OVS_METER_ATTR_MAX
+};
+
+#define OVS_METER_ATTR_MAX (__OVS_METER_ATTR_MAX - 1)
+
+enum ovs_band_attr {
+   OVS_BAND_ATTR_UNSPEC,
+   OVS_BAND_ATTR_TYPE, /* u32 OVS_METER_BAND_TYPE_* constant. */
+   OVS_BAND_ATTR_RATE, /* u32 band rate in meter units (see above). */
+   OVS_BAND_ATTR_BURST,/* u32 burst size in meter units. */
+   OVS_BAND_ATTR_STATS,/* struct ovs_flow_stats for the band. */
+   __OVS_BAND_ATTR_MAX
+};
+
+#define OVS_BAND_ATTR_MAX (__OVS_BAND_ATTR_MAX - 1)
+
+enum ovs_meter_band_type {
+   OVS_METER_BAND_TYPE_UNSPEC,
+   OVS_METER_BAND_TYPE_DROP,   /* Drop exceeding packets. */
+   __OVS_METER_BAND_TYPE_MAX
+};
+
+#define OVS_METER_BAND_TYPE_MAX (__OVS_METER_BAND_TYPE_MAX - 1)
+
 #endif /* _LINUX_OPENVSWITCH_H */
-- 
1.8.3.1
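
The band attributes defined above (OVS_BAND_ATTR_RATE and
OVS_BAND_ATTR_BURST, in kilobits or packets depending on
OVS_METER_ATTR_KBPS) read naturally as token bucket parameters. Since
meter.c is not shown in full in this thread, the following is only a
userspace sketch of how a single drop band could be evaluated. The names
struct drop_band, band_init() and band_exceeded() are hypothetical, the
choice of "bucket capacity equals burst" is an assumption, and none of
this is claimed to be the algorithm used by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical drop band; field names are illustrative only. */
struct drop_band {
	uint32_t rate;         /* tokens added per second */
	uint32_t burst;        /* bucket capacity in tokens (assumption) */
	uint64_t tokens_milli; /* current fill, in 1/1000 of a token */
	uint64_t last_ms;      /* time of the last refill, in ms */
};

static void band_init(struct drop_band *b, uint32_t rate, uint32_t burst,
		      uint64_t now_ms)
{
	b->rate = rate;
	b->burst = burst;
	b->tokens_milli = (uint64_t)burst * 1000; /* start with a full bucket */
	b->last_ms = now_ms;
}

/*
 * Returns true when the packet exceeds the band and should be dropped.
 * cost is 1 in packets-per-second mode, or the packet size in kilobits
 * in kbps mode.
 */
static bool band_exceeded(struct drop_band *b, uint32_t cost, uint64_t now_ms)
{
	uint64_t max = (uint64_t)b->burst * 1000;
	uint64_t need = (uint64_t)cost * 1000;
	uint64_t elapsed, refill;

	assert(now_ms >= b->last_ms);

	/* Clamp the idle time so elapsed * rate cannot overflow 64 bits. */
	elapsed = now_ms - b->last_ms;
	if (elapsed > UINT32_MAX)
		elapsed = UINT32_MAX;

	/* rate tokens/s * elapsed ms == milli-tokens to add. */
	refill = elapsed * b->rate;
	if (refill >= max - b->tokens_milli)
		b->tokens_milli = max;
	else
		b->tokens_milli += refill;
	b->last_ms = now_ms;

	if (b->tokens_milli < need)
		return true;

	b->tokens_milli -= need;
	return false;
}
```

With rate = 1000 and burst = 2 in packet mode, two back-to-back packets
pass, the third is dropped, and one more packet is admitted for each
millisecond that elapses.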

[net-next RFC 4/4] openvswitch: Add meter action support

2017-10-12 Thread Andy Zhou

Implements OVS kernel meter action support.

Signed-off-by: Andy Zhou 
---
 include/uapi/linux/openvswitch.h |  1 +
 net/openvswitch/actions.c| 12 
 net/openvswitch/datapath.h   |  1 +
 net/openvswitch/flow_netlink.c   |  6 ++
 4 files changed, 20 insertions(+)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 325049a129e4..11fe1a06cdd6 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -835,6 +835,7 @@ enum ovs_action_attr {
OVS_ACTION_ATTR_TRUNC,/* u32 struct ovs_action_trunc. */
OVS_ACTION_ATTR_PUSH_ETH, /* struct ovs_action_push_eth. */
OVS_ACTION_ATTR_POP_ETH,  /* No argument. */
+   OVS_ACTION_ATTR_METER,/* u32 meter ID. */
 
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index a54a556fcdb5..4eb160ac5a27 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -1210,6 +1210,12 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
case OVS_ACTION_ATTR_POP_ETH:
err = pop_eth(skb, key);
break;
+
+   case OVS_ACTION_ATTR_METER:
+   if (ovs_meter_execute(dp, skb, key, nla_get_u32(a))) {
+   consume_skb(skb);
+   return 0;
+   }
}
 
if (unlikely(err)) {
@@ -1341,6 +1347,12 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
err = do_execute_actions(dp, skb, key,
 acts->actions, acts->actions_len);
 
+   /* OVS action has dropped the packet, do not expose it
+* to the user.
+*/
+   if (err == -ENODATA)
+   err = 0;
+
if (level == 1)
process_deferred_actions(dp);
 
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index d1ffa1d9fe57..cda40c6af40a 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -30,6 +30,7 @@
 #include "conntrack.h"
 #include "flow.h"
 #include "flow_table.h"
+#include "meter.h"
 #include "vport-internal_dev.h"
 
 #define DP_MAX_PORTS   USHRT_MAX
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index e8eb427ce6d1..39b548431f68 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -85,6 +85,7 @@ static bool actions_may_change_flow(const struct nlattr *actions)
case OVS_ACTION_ATTR_SAMPLE:
case OVS_ACTION_ATTR_SET:
case OVS_ACTION_ATTR_SET_MASKED:
+   case OVS_ACTION_ATTR_METER:
default:
return true;
}
@@ -2482,6 +2483,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
[OVS_ACTION_ATTR_TRUNC] = sizeof(struct ovs_action_trunc),
[OVS_ACTION_ATTR_PUSH_ETH] = sizeof(struct ovs_action_push_eth),
[OVS_ACTION_ATTR_POP_ETH] = 0,
+   [OVS_ACTION_ATTR_METER] = sizeof(u32),
};
const struct ovs_action_push_vlan *vlan;
int type = nla_type(a);
@@ -2636,6 +2638,10 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
mac_proto = MAC_PROTO_ETHERNET;
break;
 
+   case OVS_ACTION_ATTR_METER:
+   /* Non-existent meters are simply ignored.  */
+   break;
+
default:
OVS_NLERR(log, "Unknown Action type %d", type);
return -EINVAL;
-- 
1.8.3.1

Re: [PATCH net-next] net: dsa: set random switch address

2017-10-12 Thread Vivien Didelot

Hi Florian,

Florian Fainelli  writes:

> On 10/12/2017 03:10 PM, Vivien Didelot wrote:
>> An Ethernet switch may support having a MAC address, which can be used
>> as the switch's source address in transmitted full-duplex Pause frames.
>> 
>> If a DSA switch supports the related .set_addr operation, the DSA core
>> sets the master's MAC address on the switch.
>> 
>> This won't make sense anymore in a multi-CPU ports system, because there
>> won't be a unique master device assigned to a switch tree.
>
> Thus far, everything you have said is true, but why we should do it,
> that is: what if we don't, needs to be explained. Does that create a
> problem with the generation of pause frames throughout the switch fabric?
>
>> 
>> To fix this, assign a random MAC address to the switch chip instead.
>
> Maybe this is something that should be removed entirely from the DSA
> core and pushed into the individual switch drivers instead. dsa_loop
> implements it for code coverage, but that does not do anything.
>
> set_addr is confusing in that you may think it could be used to program
> the switch with the MAC address of the CPU/management port such that you
> can disable MAC address learning on said port, but in fact, that's not
> how it is used.

You are correct. So what I can do is assign a random MAC address in the
Marvell driver, remove the .set_addr implementation of mv88e6xxx and
dsa_loop, and finally remove this code from DSA core completely.


Thanks,

Vivien
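
To make the "random MAC address" idea above concrete, here is a minimal
userspace sketch. It is not the planned mv88e6xxx change; the helper name
random_switch_addr() is hypothetical and rand() stands in for a proper
random source. The two bit operations follow the usual convention for a
randomly generated address: unicast (I/G bit clear) and locally
administered (U/L bit set), which, as far as I know, is also what the
kernel's eth_random_addr() does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAC_LEN 6

/*
 * Hypothetical helper (not from the patch): fill addr with a random,
 * locally administered, unicast MAC address. rand() is used for
 * illustration only and is not a suitable random source for real use.
 */
static void random_switch_addr(uint8_t addr[MAC_LEN])
{
	assert(addr != NULL);

	for (int i = 0; i < MAC_LEN; i++)
		addr[i] = (uint8_t)(rand() & 0xff);

	addr[0] &= 0xfe; /* clear the multicast (I/G) bit */
	addr[0] |= 0x02; /* set the locally administered (U/L) bit */
}
```

An address built this way never collides with a vendor-assigned one,
which is presumably what matters when it is only used as the source of
Pause frames.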

Re: [PATCH] Add -target to clang switches while cross compiling.

2017-10-12 Thread Alexei Starovoitov

On Thu, Oct 12, 2017 at 01:45:57PM -0700, Abhijit Ayarekar wrote:
> Latest llvm update excludes assembly instructions.
> As a result __ASM_SYSREG_H define is not required.
> -target switch includes appropriate target specific files.
> 
> Tested on x86 and arm64 with llvm with git revision
> commit df6ca162269f9d756f8742bf4b658dcf690e3eb5
> Author: Yonghong Song 
> Date:   Thu Sep 28 02:46:11 2017 +
> 
> bpf: add new insns for bswap_to_le and negation
> 
> Signed-off-by: Abhijit Ayarekar 
> ---
>  samples/bpf/Makefile | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> index ebc2ad6..81f9fcd 100644
> --- a/samples/bpf/Makefile
> +++ b/samples/bpf/Makefile
> @@ -180,6 +180,7 @@ CLANG ?= clang
>  # Detect that we're cross compiling and use the cross compiler
>  ifdef CROSS_COMPILE
>  HOSTCC = $(CROSS_COMPILE)gcc
> +CLANG_ARCH_ARGS = -target $(ARCH)

this is only needed because you're cross compiling, right?
In native compilation it's an unnecessary flag.
Only dropping -D__ASM_SYSREG_H is enough, correct?

>  endif
>  
>  # Trick to allow make to be run from this directory
> @@ -229,9 +230,9 @@ $(obj)/tracex5_kern.o: $(obj)/syscall_nrs.h
>  $(obj)/%.o: $(src)/%.c
>   $(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) -I$(obj) \
>   -I$(srctree)/tools/testing/selftests/bpf/ \
> - -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
> + -D__KERNEL__ -Wno-unused-value -Wno-pointer-sign \
>   -D__TARGET_ARCH_$(ARCH) -Wno-compare-distinct-pointer-types \
>   -Wno-gnu-variable-sized-type-not-at-end \
>   -Wno-address-of-packed-member -Wno-tautological-compare \
> - -Wno-unknown-warning-option \
> + -Wno-unknown-warning-option $(CLANG_ARCH_ARGS) \
>   -O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
> -- 
> 2.7.4
>

Re: [PATCH net-next] net: dsa: set random switch address

2017-10-12 Thread Florian Fainelli

On 10/12/2017 03:10 PM, Vivien Didelot wrote:
> An Ethernet switch may support having a MAC address, which can be used
> as the switch's source address in transmitted full-duplex Pause frames.
> 
> If a DSA switch supports the related .set_addr operation, the DSA core
> sets the master's MAC address on the switch.
> 
> This won't make sense anymore in a multi-CPU ports system, because there
> won't be a unique master device assigned to a switch tree.

Everything you have said so far is true, but the commit message needs to
explain why we should do this, that is, what happens if we don't. Does that
create a problem with the generation of pause frames throughout the switch
fabric?

> 
> To fix this, assign a random MAC address to the switch chip instead.

Maybe this is something that should be removed entirely from the DSA
core and pushed into the individual switch drivers instead. dsa_loop
implements it for code coverage, but that does not do anything.

set_addr is confusing in that you may think it could be used to program
the switch with the MAC address of the CPU/management port such that you
can disable MAC address learning on said port, but in fact, that's not
how it is used.

> 
> Signed-off-by: Vivien Didelot 
> ---
>  net/dsa/dsa2.c |  8 +++-
>  net/dsa/dsa_priv.h |  1 +
>  net/dsa/legacy.c   |  8 +++-
>  net/dsa/switch.c   | 17 +
>  4 files changed, 24 insertions(+), 10 deletions(-)
> 
> diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
> index 54ed054777bd..8e5780ddd7f9 100644
> --- a/net/dsa/dsa2.c
> +++ b/net/dsa/dsa2.c
> @@ -336,11 +336,9 @@ static int dsa_ds_apply(struct dsa_switch_tree *dst, struct dsa_switch *ds)
>   if (err)
>   return err;
>  
> - if (ds->ops->set_addr) {
> - err = ds->ops->set_addr(ds, dst->cpu_dp->netdev->dev_addr);
> - if (err < 0)
> - return err;
> - }
> + err = dsa_switch_set_addr(ds);
> + if (err)
> + return err;
>  
>   if (!ds->slave_mii_bus && ds->ops->phy_read) {
>   ds->slave_mii_bus = devm_mdiobus_alloc(ds->dev);
> diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
> index 2850077cc9cc..9c4c17a4bd6b 100644
> --- a/net/dsa/dsa_priv.h
> +++ b/net/dsa/dsa_priv.h
> @@ -172,6 +172,7 @@ void dsa_slave_unregister_notifier(void);
>  /* switch.c */
>  int dsa_switch_register_notifier(struct dsa_switch *ds);
>  void dsa_switch_unregister_notifier(struct dsa_switch *ds);
> +int dsa_switch_set_addr(struct dsa_switch *ds);
>  
>  /* tag_brcm.c */
>  extern const struct dsa_device_ops brcm_netdev_ops;
> diff --git a/net/dsa/legacy.c b/net/dsa/legacy.c
> index 19ff6e0a21dc..340ca7997271 100644
> --- a/net/dsa/legacy.c
> +++ b/net/dsa/legacy.c
> @@ -172,11 +172,9 @@ static int dsa_switch_setup_one(struct dsa_switch *ds,
>   if (ret)
>   return ret;
>  
> - if (ops->set_addr) {
> - ret = ops->set_addr(ds, master->dev_addr);
> - if (ret < 0)
> - return ret;
> - }
> + ret = dsa_switch_set_addr(ds);
> + if (ret)
> + return ret;
>  
>   if (!ds->slave_mii_bus && ops->phy_read) {
>   ds->slave_mii_bus = devm_mdiobus_alloc(ds->dev);
> diff --git a/net/dsa/switch.c b/net/dsa/switch.c
> index e6c06aa349a6..b45a26b006af 100644
> --- a/net/dsa/switch.c
> +++ b/net/dsa/switch.c
> @@ -10,6 +10,7 @@
>   * (at your option) any later version.
>   */
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -267,3 +268,19 @@ void dsa_switch_unregister_notifier(struct dsa_switch *ds)
>   if (err)
>   dev_err(ds->dev, "failed to unregister notifier (%d)\n", err);
>  }
> +
> +int dsa_switch_set_addr(struct dsa_switch *ds)
> +{
> + u8 addr[6];
> + int err;
> +
> + if (ds->ops->set_addr) {
> + eth_random_addr(addr);
> +
> + err = ds->ops->set_addr(ds, addr);
> + if (err)
> + return err;
> + }
> +
> + return 0;
> +}
> 


-- 
Florian

[PATCH net-next] net: dsa: set random switch address

2017-10-12 Thread Vivien Didelot

An Ethernet switch may support having a MAC address, which can be used
as the switch's source address in transmitted full-duplex Pause frames.

If a DSA switch supports the related .set_addr operation, the DSA core
sets the master's MAC address on the switch.

This won't make sense anymore in a multi-CPU ports system, because there
won't be a unique master device assigned to a switch tree.

To fix this, assign a random MAC address to the switch chip instead.

Signed-off-by: Vivien Didelot 
---
 net/dsa/dsa2.c |  8 +++-
 net/dsa/dsa_priv.h |  1 +
 net/dsa/legacy.c   |  8 +++-
 net/dsa/switch.c   | 17 +
 4 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 54ed054777bd..8e5780ddd7f9 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -336,11 +336,9 @@ static int dsa_ds_apply(struct dsa_switch_tree *dst, struct dsa_switch *ds)
if (err)
return err;
 
-   if (ds->ops->set_addr) {
-   err = ds->ops->set_addr(ds, dst->cpu_dp->netdev->dev_addr);
-   if (err < 0)
-   return err;
-   }
+   err = dsa_switch_set_addr(ds);
+   if (err)
+   return err;
 
if (!ds->slave_mii_bus && ds->ops->phy_read) {
ds->slave_mii_bus = devm_mdiobus_alloc(ds->dev);
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 2850077cc9cc..9c4c17a4bd6b 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -172,6 +172,7 @@ void dsa_slave_unregister_notifier(void);
 /* switch.c */
 int dsa_switch_register_notifier(struct dsa_switch *ds);
 void dsa_switch_unregister_notifier(struct dsa_switch *ds);
+int dsa_switch_set_addr(struct dsa_switch *ds);
 
 /* tag_brcm.c */
 extern const struct dsa_device_ops brcm_netdev_ops;
diff --git a/net/dsa/legacy.c b/net/dsa/legacy.c
index 19ff6e0a21dc..340ca7997271 100644
--- a/net/dsa/legacy.c
+++ b/net/dsa/legacy.c
@@ -172,11 +172,9 @@ static int dsa_switch_setup_one(struct dsa_switch *ds,
if (ret)
return ret;
 
-   if (ops->set_addr) {
-   ret = ops->set_addr(ds, master->dev_addr);
-   if (ret < 0)
-   return ret;
-   }
+   ret = dsa_switch_set_addr(ds);
+   if (ret)
+   return ret;
 
if (!ds->slave_mii_bus && ops->phy_read) {
ds->slave_mii_bus = devm_mdiobus_alloc(ds->dev);
diff --git a/net/dsa/switch.c b/net/dsa/switch.c
index e6c06aa349a6..b45a26b006af 100644
--- a/net/dsa/switch.c
+++ b/net/dsa/switch.c
@@ -10,6 +10,7 @@
  * (at your option) any later version.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -267,3 +268,19 @@ void dsa_switch_unregister_notifier(struct dsa_switch *ds)
if (err)
dev_err(ds->dev, "failed to unregister notifier (%d)\n", err);
 }
+
+int dsa_switch_set_addr(struct dsa_switch *ds)
+{
+   u8 addr[6];
+   int err;
+
+   if (ds->ops->set_addr) {
+   eth_random_addr(addr);
+
+   err = ds->ops->set_addr(ds, addr);
+   if (err)
+   return err;
+   }
+
+   return 0;
+}
-- 
2.14.2
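
For readers following the patch above, this is roughly what a "random MAC
address" means in practice. It is a minimal userspace sketch, not the kernel
code: the function name random_ether_addr_sketch() and the use of rand() are
assumptions made for illustration. The two bit operations reflect the usual
rule for such addresses: the group (multicast) bit is cleared and the
locally-administered bit is set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ETH_ALEN 6

/* Hypothetical helper: fill addr with a random unicast,
 * locally administered MAC address.
 */
static void random_ether_addr_sketch(uint8_t addr[ETH_ALEN])
{
	size_t i;

	assert(addr != NULL);

	/* rand() is only a placeholder source of randomness here. */
	for (i = 0; i < ETH_ALEN; i++)
		addr[i] = (uint8_t)(rand() & 0xff);

	addr[0] &= 0xfe;	/* clear the group bit: unicast */
	addr[0] |= 0x02;	/* set the local bit: locally administered */
}

static int is_unicast(const uint8_t addr[ETH_ALEN])
{
	return !(addr[0] & 0x01);
}

static int is_locally_administered(const uint8_t addr[ETH_ALEN])
{
	return !!(addr[0] & 0x02);
}
```

Whatever the random bytes are, the result never collides with a vendor-assigned
(globally unique) address and is never a multicast address, which is what makes
it usable as a source address in Pause frames.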

Re: [net-next PATCH] mqprio: Reserve last 32 classid values for HW traffic classes and misc IDs

2017-10-12 Thread Jesus Sanchez-Palencia

Hi Alex,


On 10/12/2017 11:38 AM, Alexander Duyck wrote:
> From: Alexander Duyck 
> 
> This patch makes a slight tweak to mqprio in order to bring the
> classid values used back in line with what is used for mq. The general idea
> is to reserve values :ffe0 - :ffef to identify hardware traffic classes
> normally reported via dev->num_tc. By doing this we can maintain a
> consistent behavior with mq for classid where :1 - :ffdf will represent a
> physical qdisc mapped onto a Tx queue represented by classid - 1, and the
> traffic classes will be mapped onto a known subset of classid values
> reserved for our virtual qdiscs.
> 
> Note I reserved the range from :fff0 - : since this way we might be
> able to reuse these classid values with clsact and ingress which would mean
> that for mq, mqprio, ingress, and clsact we should be able to maintain a
> similar classid layout.
> 
> Signed-off-by: Alexander Duyck 
> ---
> 
> So I thought I would put this out here as a first step towards trying to
> address some of Jiri's concerns about wanting to have a consistent
> userspace API.
> 
> The plan is to follow this up with patches to ingress and clsact to look at
> exposing a set of virtual qdiscs similar to what we already have for the HW
> traffic classes in mqprio, although I won't bother with the ability to dump
> class stats since they don't actually enqueue anything.
> 
>  include/uapi/linux/pkt_sched.h |1 +
>  net/sched/sch_mqprio.c |   79 +++-
>  2 files changed, 47 insertions(+), 33 deletions(-)
> 
> diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
> index 099bf5528fed..174f1cf7e7f9 100644
> --- a/include/uapi/linux/pkt_sched.h
> +++ b/include/uapi/linux/pkt_sched.h
> @@ -74,6 +74,7 @@ struct tc_estimator {
>  #define TC_H_INGRESS(0xFFF1U)
>  #define TC_H_CLSACT  TC_H_INGRESS
>  
> +#define TC_H_MIN_PRIORITY 0xFFE0U
>  #define TC_H_MIN_INGRESS 0xFFF2U
>  #define TC_H_MIN_EGRESS  0xFFF3U
>  
> diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
> index 6bcdfe6e7b63..a61ef119a556 100644
> --- a/net/sched/sch_mqprio.c
> +++ b/net/sched/sch_mqprio.c
> @@ -115,6 +115,10 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt)
>   if (!netif_is_multiqueue(dev))
>   return -EOPNOTSUPP;
>  
> + /* make certain can allocate enough classids to handle queues */
> + if (dev->num_tx_queues >= TC_H_MIN_PRIORITY)
> + return -ENOMEM;
> +
>   if (!opt || nla_len(opt) < sizeof(*qopt))
>   return -EINVAL;
>  
> @@ -193,7 +197,7 @@ static struct netdev_queue *mqprio_queue_get(struct Qdisc *sch,
>unsigned long cl)
>  {
>   struct net_device *dev = qdisc_dev(sch);
> - unsigned long ntx = cl - 1 - netdev_get_num_tc(dev);
> + unsigned long ntx = cl - 1;
>  
>   if (ntx >= dev->num_tx_queues)
>   return NULL;
> @@ -282,38 +286,35 @@ static unsigned long mqprio_find(struct Qdisc *sch, u32 classid)
>   struct net_device *dev = qdisc_dev(sch);
>   unsigned int ntx = TC_H_MIN(classid);
>  
> - if (ntx > dev->num_tx_queues + netdev_get_num_tc(dev))
> - return 0;
> - return ntx;
> + /* There are essentially two regions here that have valid classid
> +  * values. The first region will have a classid value of 1 through
> +  * num_tx_queues. All of these are backed by actual Qdiscs.
> +  */
> + if (ntx < TC_H_MIN_PRIORITY)
> + return (ntx <= dev->num_tx_queues) ? ntx : 0;
> +
> + /* The second region represents the hardware traffic classes. These
> +  * are represented by classid values of TC_H_MIN_PRIORITY through
> +  * TC_H_MIN_PRIORITY + netdev_get_num_tc - 1
> +  */
> + return ((ntx - TC_H_MIN_PRIORITY) < netdev_get_num_tc(dev)) ? ntx : 0;
>  }
>  
>  static int mqprio_dump_class(struct Qdisc *sch, unsigned long cl,
>struct sk_buff *skb, struct tcmsg *tcm)
>  {
> - struct net_device *dev = qdisc_dev(sch);
> + if (cl < TC_H_MIN_PRIORITY) {
> + struct netdev_queue *dev_queue = mqprio_queue_get(sch, cl);
> + struct net_device *dev = qdisc_dev(sch);
> + int tc = netdev_txq_to_tc(dev, cl - 1);
>  
> - if (cl <= netdev_get_num_tc(dev)) {
> + tcm->tcm_parent = (tc < 0) ? 0 :
> + TC_H_MAKE(TC_H_MAJ(sch->handle),
> +   TC_H_MIN(tc + TC_H_MIN_PRIORITY));
> + tcm->tcm_info = dev_queue->qdisc_sleeping->handle;
> + } else {
>   tcm->tcm_parent = TC_H_ROOT;
>   tcm->tcm_info = 0;
> - } else {
> - int i;
> - struct netdev_queue *dev_queue;
> -
> - dev_queue = mqprio_queue_get(sch, cl);
> - tcm->tcm_parent = 0;
> - for (i = 0; i < netdev_get_num_tc(dev); i++) {
> - st
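
The classid layout described in the commit message above can be sketched in
userspace C as follows. This is only an illustration modelled on the
mqprio_find() hunk in the quoted patch: classify_classid(), struct
classid_info and the enum are hypothetical names, while TC_H_MIN() and the
0xFFE0 value for TC_H_MIN_PRIORITY are taken from the patch and the existing
pkt_sched.h macros.

```c
#include <assert.h>
#include <stdint.h>

#define TC_H_MIN_MASK		0x0000FFFFU
#define TC_H_MIN(h)		((h) & TC_H_MIN_MASK)
#define TC_H_MIN_PRIORITY	0xFFE0U	/* value proposed in the patch */

/* Hypothetical result type for this sketch. */
enum classid_kind { CLASSID_INVALID, CLASSID_QUEUE, CLASSID_TC };

struct classid_info {
	enum classid_kind kind;
	unsigned int index;	/* Tx queue index or traffic class number */
};

static struct classid_info classify_classid(uint32_t classid,
					     unsigned int num_tx_queues,
					     unsigned int num_tc)
{
	struct classid_info info = { CLASSID_INVALID, 0 };
	unsigned int minor = TC_H_MIN(classid);

	if (minor == 0)
		return info;

	/* Region 1: minor 1..num_tx_queues maps to Tx queue minor - 1. */
	if (minor < TC_H_MIN_PRIORITY) {
		if (minor <= num_tx_queues) {
			info.kind = CLASSID_QUEUE;
			info.index = minor - 1;
		}
		return info;
	}

	/* Region 2: TC_H_MIN_PRIORITY + tc identifies a HW traffic class. */
	if (minor - TC_H_MIN_PRIORITY < num_tc) {
		info.kind = CLASSID_TC;
		info.index = minor - TC_H_MIN_PRIORITY;
	}
	return info;
}
```

With 8 Tx queues and 4 traffic classes, minor 0x0001 would be queue 0 and
minor 0xffe3 would be traffic class 3, while 0x0009 and 0xffe4 fall outside
both regions.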

Re: Ethtool question

2017-10-12 Thread Roopa Prabhu

On Thu, Oct 12, 2017 at 2:45 PM, Ben Greear  wrote:
> On 10/11/2017 01:49 PM, David Miller wrote:
>>
>> From: "John W. Linville" 
>> Date: Wed, 11 Oct 2017 16:44:07 -0400
>>
>>> On Wed, Oct 11, 2017 at 09:51:56AM -0700, Ben Greear wrote:

>>>> I noticed today that setting some ethtool settings to the same value
>>>> returns an error code.  I would think this should silently return
>>>> success instead?  Makes it easier to call it from scripts this way:
>>>>
>>>> [root@lf0313-6477 lanforge]# ethtool -L eth3 combined 1
>>>> combined unmodified, ignoring
>>>> no channel parameters changed, aborting
>>>> current values: tx 0 rx 0 other 1 combined 1
>>>> [root@lf0313-6477 lanforge]# echo $?
>>>> 1
>>>
>>>
>>> I just had this discussion a couple of months ago with someone. My
>>> initial feeling was like you, a no-op is not a failure. But someone
>>> convinced me otherwise...I will now endeavour to remember who that
>>> was and how they convinced me...
>>>
>>> Anyone else have input here?
>>
>>
>> I guess this usually happens when drivers don't support changing the
>> settings at all.  So they just make their ethtool operation for the
>> 'set' always return an error.
>>
>> We could have a generic ethtool helper that does "get" and then if the
>> "set" request is identical just return zero.
>>
>> But from another perspective, the error returned from the "set" in this
>> situation also indicates to the user that the driver does not support
>> the "set" operation which has value and meaning in and of itself.  And
>> we'd lose that with the given suggestion.
>
>
> In my case, the driver (igb) does support the set, my program just made the
> same ethtool call several times and it fails after the initial change (that
> actually changes something), as best as I can figure.


This error is returned by the ethtool user-space tool. It does a get, checks
the result against the request, and only does the set if the user has
requested changes.
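
The get/check/set flow described here can be sketched as below. This is not
ethtool's code: struct fake_dev, fake_get(), fake_set() and
set_channels_idempotent() are hypothetical stand-ins, and the sketch treats
the "nothing changed" case as success, which is the behaviour Ben asks for
rather than what the tool does today (it exits with 1).

```c
#include <assert.h>
#include <stddef.h>

struct channel_counts {
	unsigned int rx, tx, other, combined;
};

/* Stand-in for the driver; a real tool would issue ioctls instead. */
struct fake_dev {
	struct channel_counts cur;
	unsigned int set_calls;	/* how many times the "set" was issued */
};

static void fake_get(const struct fake_dev *dev, struct channel_counts *out)
{
	*out = dev->cur;
}

static int fake_set(struct fake_dev *dev, const struct channel_counts *want)
{
	dev->cur = *want;
	dev->set_calls++;
	return 0;
}

static int same_counts(const struct channel_counts *a,
		       const struct channel_counts *b)
{
	return a->rx == b->rx && a->tx == b->tx &&
	       a->other == b->other && a->combined == b->combined;
}

/* Returns 0 on success, including the case where nothing had to change. */
static int set_channels_idempotent(struct fake_dev *dev,
				   const struct channel_counts *want)
{
	struct channel_counts cur;

	assert(dev != NULL && want != NULL);

	fake_get(dev, &cur);
	if (same_counts(&cur, want))
		return 0;	/* no-op: skip the set, report success */

	return fake_set(dev, want);
}
```

A script could then repeat the same request without special-casing the exit
code, at the cost David Miller points out: a no-op no longer tells the caller
whether the driver supports the set at all.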

Re: [RFC 0/3] Adding config get/set to devlink

2017-10-12 Thread Roopa Prabhu

On Thu, Oct 12, 2017 at 12:20 PM, Florian Fainelli  wrote:
> On 10/12/2017 12:06 PM, David Miller wrote:
>> From: Florian Fainelli 
>> Date: Thu, 12 Oct 2017 08:43:59 -0700
>>
>>> Once we move ethtool (or however we name its successor) over to
>>> netlink there is an opportunity for accessing objects that do and do
>>> not have a netdevice representor today (e.g: management ports on
>>> switches) with the same interface, and devlink could be used for
>>> that.
>>
>> That is an interesting angle for including this in devlink.
>>
>> I'm not so sure what to do about this.
>>
>> One suggestion is that devlink is used for getting ethtool stats for
>> objects lacking netdev representor's, and a new genetlink family is
>> used for netdev based ethtool.
>
> Right, I was also thinking along those lines that we we would have a new
> generic netlink family for ethtool to support ethtool over netlink.

A new API is fine by me. The reason for suggesting devlink was that some of
the devlink port_* ops are close to ethtool ops that can operate on a
port/netdev. For example, split_port could be a netdev operation, unless you
want to split before the netdev is created.

There are some ops in devlink which are global hw parameters and not
specific to a port; those fit perfectly with devlink's original goal.


>
>>
>> I think it's important that we don't expand the scope of devlink
>> beyond what it was originally designed for.
>
> It seems to me like devlink is well defined in what it is not for: it is
> not meant to be used for any object that is/has a net_device, but it is
> not well defined for what it can offer to these non network devices. For
> instance, we have a tremendous amount of operations that are extremely
> specific to its single user(s) such as mlx5 and mlxsw.
>
> For instance, I am not sure how the buffer reservation scheme can be
> generalized, and this is always the tricky part with a single user
> facility in that you try to generalize the best you can based on the HW
> you know. This is not a criticism or meant to be anything negative, this
> just happens to be the case, and we did not have anything better.
>
> So maybe the first thing is to clarify what devlink operations can and
> should be and what they are absolutely not allowed to cover. We should
> also clarify whether a generic set/get that Steven is proposing is
> something that we tolerate, or whether there should be specific function
> pointers implemented for each attribute, which would be more in line
> with what has been done thus far.
> --
> Florian

Re: [patch net-next 06/34] net: core: use dev->ingress_queue instead of tp->q

2017-10-12 Thread Daniel Borkmann


On 10/12/2017 07:17 PM, Jiri Pirko wrote:

> From: Jiri Pirko
>
> In sch_handle_egress and sch_handle_ingress, don't use tp->q and use
> dev->ingress_queue which stores the same pointer instead.
>
> Signed-off-by: Jiri Pirko
> ---
>   net/core/dev.c | 21 +++--
>   1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index fcddccb..cb9e5e5 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3273,14 +3273,18 @@ EXPORT_SYMBOL(dev_loopback_xmit);
>   static struct sk_buff *
>   sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
>   {
> +   struct netdev_queue *netdev_queue =
> +   rcu_dereference_bh(dev->ingress_queue);
> struct tcf_proto *cl = rcu_dereference_bh(dev->egress_cl_list);
> struct tcf_result cl_res;
> +   struct Qdisc *q;
>
> -   if (!cl)
> +   if (!cl || !netdev_queue)
> return skb;
> +   q = netdev_queue->qdisc;

NAK, no additional overhead in the software fast-path of
sch_handle_{ingress,egress}() like this. There are users out there
that use tc in software only, so performance is critical here.

> /* qdisc_skb_cb(skb)->pkt_len was already set by the caller. */
> -   qdisc_bstats_cpu_update(cl->q, skb);
> +   qdisc_bstats_cpu_update(q, skb);
>
> switch (tcf_classify(skb, cl, &cl_res, false)) {
> case TC_ACT_OK:
> @@ -3288,7 +3292,7 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
> skb->tc_index = TC_H_MIN(cl_res.classid);
> break;
> case TC_ACT_SHOT:
> -   qdisc_qstats_cpu_drop(cl->q);
> +   qdisc_qstats_cpu_drop(q);
> *ret = NET_XMIT_DROP;
> kfree_skb(skb);
> return NULL;
> @@ -4188,16 +4192,21 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
>    struct net_device *orig_dev)
>   {
>   #ifdef CONFIG_NET_CLS_ACT
> +   struct netdev_queue *netdev_queue =
> +   rcu_dereference_bh(skb->dev->ingress_queue);
> struct tcf_proto *cl = rcu_dereference_bh(skb->dev->ingress_cl_list);
> struct tcf_result cl_res;
> +   struct Qdisc *q;
>
> /* If there's at least one ingress present somewhere (so
>  * we get here via enabled static key), remaining devices
>  * that are not configured with an ingress qdisc will bail
>  * out here.
>  */
> -   if (!cl)
> +   if (!cl || !netdev_queue)
> return skb;
> +   q = netdev_queue->qdisc;
> +
> if (*pt_prev) {
> *ret = deliver_skb(skb, *pt_prev, orig_dev);
> *pt_prev = NULL;
> @@ -4205,7 +4214,7 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
>
> qdisc_skb_cb(skb)->pkt_len = skb->len;
> skb->tc_at_ingress = 1;
> -   qdisc_bstats_cpu_update(cl->q, skb);
> +   qdisc_bstats_cpu_update(q, skb);
>
> switch (tcf_classify(skb, cl, &cl_res, false)) {
> case TC_ACT_OK:
> @@ -4213,7 +4222,7 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
> skb->tc_index = TC_H_MIN(cl_res.classid);
> break;
> case TC_ACT_SHOT:
> -   qdisc_qstats_cpu_drop(cl->q);
> +   qdisc_qstats_cpu_drop(q);
> kfree_skb(skb);
> return NULL;
> case TC_ACT_STOLEN:

Re: [PATCH net-next 01/12] bpf: verifier: set reg_type on context accesses in second pass

2017-10-12 Thread Daniel Borkmann


On 10/12/2017 11:39 PM, Jakub Kicinski wrote:

> On Thu, 12 Oct 2017 23:33:21 +0200, Daniel Borkmann wrote:
>> On 10/12/2017 10:56 PM, Jakub Kicinski wrote:
>>> On Thu, 12 Oct 2017 22:43:10 +0200, Daniel Borkmann wrote:
>> [...]
>>>> It would be nice to keep the reg_type setting in one place, meaning
>>>> the callbacks themselves, so we wouldn't need to maintain this in
>>>> multiple places.
>>>
>>> Hm.. I though this was the smallest and simplest change.  I could
>>> translate the offsets but that seems wobbly.  Or try to consolidate the
>>> call into the same if () branch?  Not sure..
>>
>> Different callbacks for post-verification would be good at min as it
>> would allow to keep all the context access info in one place for a
>> given type at least.
>
> Sorry to be clear - you're suggesting adding a new callback to struct
> bpf_verifier_ops, or swapping the struct bpf_verifier_ops for a
> special post-verification one?

Either way is fine by me.

>>> As a bonus info I discovered there is a bug in -net with how things are
>>> converted.  We allow arithmetic on context pointers but then only
>>> look at the insn.off in the converter...  I'm working on a fix.
>>
>> Ohh well, good catch, indeed! :( Can you also add coverage to the
>> bpf selftests for this?
>
> Will do!


Thanks,
Daniel

Re: Ethtool question

2017-10-12 Thread Ben Greear

On 10/11/2017 01:49 PM, David Miller wrote:

> From: "John W. Linville"
> Date: Wed, 11 Oct 2017 16:44:07 -0400
>
>> On Wed, Oct 11, 2017 at 09:51:56AM -0700, Ben Greear wrote:
>>> I noticed today that setting some ethtool settings to the same value
>>> returns an error code.  I would think this should silently return
>>> success instead?  Makes it easier to call it from scripts this way:
>>>
>>> [root@lf0313-6477 lanforge]# ethtool -L eth3 combined 1
>>> combined unmodified, ignoring
>>> no channel parameters changed, aborting
>>> current values: tx 0 rx 0 other 1 combined 1
>>> [root@lf0313-6477 lanforge]# echo $?
>>> 1
>>
>> I just had this discussion a couple of months ago with someone. My
>> initial feeling was like you, a no-op is not a failure. But someone
>> convinced me otherwise...I will now endeavour to remember who that
>> was and how they convinced me...
>>
>> Anyone else have input here?
>
> I guess this usually happens when drivers don't support changing the
> settings at all.  So they just make their ethtool operation for the
> 'set' always return an error.
>
> We could have a generic ethtool helper that does "get" and then if the
> "set" request is identical just return zero.
>
> But from another perspective, the error returned from the "set" in this
> situation also indicates to the user that the driver does not support
> the "set" operation which has value and meaning in and of itself.  And
> we'd lose that with the given suggestion.

In my case, the driver (igb) does support the set, my program just made the same
ethtool call several times and it fails after the initial change (that actually
changes something), as best as I can figure.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com

Re: [PATCH net-next 01/12] bpf: verifier: set reg_type on context accesses in second pass

2017-10-12 Thread Jakub Kicinski

On Thu, 12 Oct 2017 23:33:21 +0200, Daniel Borkmann wrote:
> On 10/12/2017 10:56 PM, Jakub Kicinski wrote:
> > On Thu, 12 Oct 2017 22:43:10 +0200, Daniel Borkmann wrote:  
> [...]
> >> It would be nice to keep the reg_type setting in one place, meaning
> >> the callbacks themselves, so we wouldn't need to maintain this in
> >> multiple places.  
> >
> > Hm.. I though this was the smallest and simplest change.  I could
> > translate the offsets but that seems wobbly.  Or try to consolidate the
> > call into the same if () branch?  Not sure..  
> 
> Different callbacks for post-verification would be good at min as it
> would allow to keep all the context access info in one place for a
> given type at least.

Sorry, to be clear: are you suggesting adding a new callback to struct
bpf_verifier_ops, or swapping the struct bpf_verifier_ops for a
special post-verification one?

> > As a bonus info I discovered there is a bug in -net with how things are
> > converted.  We allow arithmetic on context pointers but then only
> > look at the insn.off in the converter...  I'm working on a fix.  
> 
> Ohh well, good catch, indeed! :( Can you also add coverage to the
> bpf selftests for this?

Will do!

Re: [patch net-next 00/34] net: sched: allow qdiscs to share filter block instances

2017-10-12 Thread David Ahern

On 10/12/17 11:17 AM, Jiri Pirko wrote:
> So back to the example. First, we create 2 qdiscs. Both will share
> block number 22. "22" is just an identification. If we don't pass any
> block number, a new one will be generated by kernel:
> 
> $ tc qdisc add dev ens7 ingress block 22
> 
> $ tc qdisc add dev ens8 ingress block 22
> 
> 
> Now if we list the qdiscs, we will see the block index in the output:
> 
> $ tc qdisc
> qdisc ingress : dev ens7 parent :fff1 block 22 
> qdisc ingress : dev ens8 parent :fff1 block 22 
> 
> Now we can add filter to any of qdiscs sharing the same block:
> 
> $ tc filter add dev ens7 parent : protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
> 
> 
> We will see the same output if we list filters for ens7 and ens8, including stats:
> 
> $ tc -s filter show dev ens7 ingress
> filter protocol ip pref 25 flower chain 0 
> filter protocol ip pref 25 flower chain 0 handle 0x1 
>   eth_type ipv4
>   dst_ip 192.168.0.0/16
>   not_in_hw
> action order 1: gact action drop
>  random type none pass val 0
>  index 1 ref 1 bind 1 installed 39 sec used 2 sec
> Action statistics:
> Sent 3108 bytes 37 pkt (dropped 37, overlimits 0 requeues 0) 
> backlog 0b 0p requeues 0 
> 
> $ tc -s filter show dev ens8 ingress
> filter protocol ip pref 25 flower chain 0 
> filter protocol ip pref 25 flower chain 0 handle 0x1 
>   eth_type ipv4
>   dst_ip 192.168.0.0/16
>   not_in_hw
> action order 1: gact action drop
>  random type none pass val 0
>  index 1 ref 1 bind 1 installed 40 sec used 3 sec
> Action statistics:
> Sent 3108 bytes 37 pkt (dropped 37, overlimits 0 requeues 0) 
> backlog 0b 0p requeues 0

This seems like really odd semantics to me ... a filter added to one
device shows up on another.

Why not make the shared block a standalone object that is configured
through its own set of commands and then referenced by both devices?
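
To make the semantics under discussion concrete, here is a small model of two
qdiscs referencing one filter block. It is a hypothetical sketch only: struct
filter_block, struct qdisc_sketch and the helpers are invented names and do
not correspond to the kernel structures in Jiri's series. It shows why the
counters in the quoted output are identical on ens7 and ens8: both lookups go
through the same object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared object: one filter list plus its statistics. */
struct filter_block {
	unsigned int index;	/* e.g. 22 in the quoted example */
	unsigned int refcnt;	/* number of qdiscs bound to the block */
	unsigned long dropped;	/* packets dropped by the block's filters */
};

struct qdisc_sketch {
	const char *dev;
	struct filter_block *block;
};

static void qdisc_attach(struct qdisc_sketch *q, const char *dev,
			 struct filter_block *block)
{
	assert(q != NULL && block != NULL);
	q->dev = dev;
	q->block = block;
	block->refcnt++;
}

/* A packet arriving on q's device hits a drop filter in the block. */
static void qdisc_drop_packet(struct qdisc_sketch *q)
{
	q->block->dropped++;
}

/* What a per-device "show" would report: the block's counters. */
static unsigned long qdisc_dropped(const struct qdisc_sketch *q)
{
	return q->block->dropped;
}
```

In this model a drop counted via ens7 is visible via ens8 as well. Whether the
block is then configured through either device or through its own handle, as
suggested above, is purely a question of user interface.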

Re: [RFC 0/3] Adding config get/set to devlink

2017-10-12 Thread Michal Kubecek

On Thu, Oct 12, 2017 at 12:20:07PM -0700, Florian Fainelli wrote:
> On 10/12/2017 12:06 PM, David Miller wrote:
> > 
> > One suggestion is that devlink is used for getting ethtool stats for
> > objects lacking netdev representor's, and a new genetlink family is
> > used for netdev based ethtool.
> 
> Right, I was also thinking along those lines that we we would have a new
> generic netlink family for ethtool to support ethtool over netlink.

This is what I plan to work on during the next SUSE Hackweek in November. But
I'm, of course, open to suggestions and I don't insist on this approach.

Michal Kubecek

Re: [PATCH net-next 01/12] bpf: verifier: set reg_type on context accesses in second pass

2017-10-12 Thread Daniel Borkmann


On 10/12/2017 10:56 PM, Jakub Kicinski wrote:

> On Thu, 12 Oct 2017 22:43:10 +0200, Daniel Borkmann wrote:
[...]
>> It would be nice to keep the reg_type setting in one place, meaning
>> the callbacks themselves, so we wouldn't need to maintain this in
>> multiple places.
>
> Hm.. I though this was the smallest and simplest change.  I could
> translate the offsets but that seems wobbly.  Or try to consolidate the
> call into the same if () branch?  Not sure..

Different callbacks for post-verification would be good at min as it
would allow to keep all the context access info in one place for a
given type at least.

> As a bonus info I discovered there is a bug in -net with how things are
> converted.  We allow arithmetic on context pointers but then only
> look at the insn.off in the converter...  I'm working on a fix.

Ohh well, good catch, indeed! :( Can you also add coverage to the
bpf selftests for this?

Thanks,
Daniel

[PATCH] atm: fore200e: mark expected switch fall-throughs

2017-10-12 Thread Gustavo A. R. Silva

In preparation for enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/atm/fore200e.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/atm/fore200e.c b/drivers/atm/fore200e.c
index f8b7e86..126855e 100644
--- a/drivers/atm/fore200e.c
+++ b/drivers/atm/fore200e.c
@@ -358,26 +358,33 @@ fore200e_shutdown(struct fore200e* fore200e)
 case FORE200E_STATE_COMPLETE:
kfree(fore200e->stats);
 
+   /* fall through */
 case FORE200E_STATE_IRQ:
free_irq(fore200e->irq, fore200e->atm_dev);
 
+   /* fall through */
 case FORE200E_STATE_ALLOC_BUF:
fore200e_free_rx_buf(fore200e);
 
+   /* fall through */
 case FORE200E_STATE_INIT_BSQ:
fore200e_uninit_bs_queue(fore200e);
 
+   /* fall through */
 case FORE200E_STATE_INIT_RXQ:
fore200e->bus->dma_chunk_free(fore200e, &fore200e->host_rxq.status);
fore200e->bus->dma_chunk_free(fore200e, &fore200e->host_rxq.rpd);
 
+   /* fall through */
 case FORE200E_STATE_INIT_TXQ:
fore200e->bus->dma_chunk_free(fore200e, &fore200e->host_txq.status);
fore200e->bus->dma_chunk_free(fore200e, &fore200e->host_txq.tpd);
 
+   /* fall through */
 case FORE200E_STATE_INIT_CMDQ:
fore200e->bus->dma_chunk_free(fore200e, &fore200e->host_cmdq.status);
 
+   /* fall through */
 case FORE200E_STATE_INITIALIZE:
/* nothing to do for that state */
 
@@ -390,6 +397,7 @@ fore200e_shutdown(struct fore200e* fore200e)
 case FORE200E_STATE_MAP:
fore200e->bus->unmap(fore200e);
 
+   /* fall through */
 case FORE200E_STATE_CONFIGURE:
/* nothing to do for that state */
 
-- 
2.7.4
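
For context, the pattern being annotated is a staged teardown: the switch is
entered at the state the device reached, and each case undoes one step before
deliberately falling into the next. A minimal standalone sketch follows; the
states and struct dev_sketch are hypothetical and much smaller than the
fore200e ones.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical init states, in the order they are reached. */
enum init_state {
	STATE_BLANK,
	STATE_MAP,
	STATE_IRQ,
	STATE_BUFFERS,
	STATE_COMPLETE
};

struct dev_sketch {
	int mapped;
	int irq_requested;
	int buffers_allocated;
};

/* Undo everything that was set up on the way to 'state'. */
static void dev_shutdown(struct dev_sketch *dev, enum init_state state)
{
	assert(dev != NULL);

	switch (state) {
	case STATE_COMPLETE:
		/* nothing extra to undo for the final state */
		/* fall through */
	case STATE_BUFFERS:
		dev->buffers_allocated = 0;
		/* fall through */
	case STATE_IRQ:
		dev->irq_requested = 0;
		/* fall through */
	case STATE_MAP:
		dev->mapped = 0;
		/* fall through */
	case STATE_BLANK:
		break;
	}
}
```

The comments change nothing at run time. They document that the missing
break is intentional, so that a compiler warning about implicit fall-through
only fires for the accidental cases.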

Re: [net-next V7 PATCH 3/5] bpf: cpumap xdp_buff to skb conversion and allocation

2017-10-12 Thread Edward Cree

On 12/10/17 13:26, Jesper Dangaard Brouer wrote:
> This patch makes cpumap functional, by adding SKB allocation and
> invoking the network stack on the dequeuing CPU.
>
> For constructing the SKB on the remote CPU, the xdp_buff in converted
> into a struct xdp_pkt, and it mapped into the top headroom of the
> packet, to avoid allocating separate mem.  For now, struct xdp_pkt is
> just a cpumap internal data structure, with info carried between
> enqueue to dequeue.



> +struct sk_buff *cpu_map_build_skb(struct bpf_cpu_map_entry *rcpu,
> +   struct xdp_pkt *xdp_pkt)
> +{
> + unsigned int frame_size;
> + void *pkt_data_start;
> + struct sk_buff *skb;
> +
> + /* build_skb need to place skb_shared_info after SKB end, and
> +  * also want to know the memory "truesize".  Thus, need to
> +  * know the memory frame size backing xdp_buff.
> +  *
> +  * XDP was designed to have PAGE_SIZE frames, but this
> +  * assumption is not longer true with ixgbe and i40e.  It
> +  * would be preferred to set frame_size to 2048 or 4096
> +  * depending on the driver.
> +  *   frame_size = 2048;
> +  *   frame_len  = frame_size - sizeof(*xdp_pkt);
> +  *
> +  * Instead, with info avail, skb_shared_info in placed after
> +  * packet len.  This, unfortunately fakes the truesize.
> +  * Another disadvantage of this approach, the skb_shared_info
> +  * is not at a fixed memory location, with mixed length
> +  * packets, which is bad for cache-line hotness.
> +  */
> + frame_size = SKB_DATA_ALIGN(xdp_pkt->len) + xdp_pkt->headroom +
> + SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> +
> + pkt_data_start = xdp_pkt->data - xdp_pkt->headroom;
> + skb = build_skb(pkt_data_start, frame_size);
> + if (!skb)
> + return NULL;
> +
> + skb_reserve(skb, xdp_pkt->headroom);
> + __skb_put(skb, xdp_pkt->len);
> + if (xdp_pkt->metasize)
> + skb_metadata_set(skb, xdp_pkt->metasize);
> +
> + /* Essential SKB info: protocol and skb->dev */
> + skb->protocol = eth_type_trans(skb, xdp_pkt->dev_rx);
> +
> + /* Optional SKB info, currently missing:
> +  * - HW checksum info   (skb->ip_summed)
> +  * - HW RX hash (skb_set_hash)
> +  * - RX ring dev queue index(skb_record_rx_queue)
> +  */
One possibility for dealing with these and related issues — also things
 like the proper way to free an xdp_buff if SKB creation fails, which
 might not be page_frag_free() for some drivers with unusual recycle ring
 implementations — is to have a new ndo for 'receiving' an xdp_pkt from a
 cpumap redirect.
Since you're always receiving from the same driver that enqueued it, even
 the structure of the metadata stored in the top of the packet page
 doesn't have to be standardised; instead, each driver can put there just
 whatever happens to be needed for its ndo_xdp_rx routine.  (Though there
 would probably be standard enqueue and dequeue functions that the
 'common-case' drivers could use.)
In some cases, the driver could even just leave in the page the packet
 prefix it got from the NIC, rather than reading it and then writing an
 interpreted version back, thus minimising the number of packet-page
 cachelines the 'bottom half' RX function has to touch (it would still
 need to write in anything it got from the RX event, of course).
It shouldn't be much work as many driver RX routines are already
 structured this way — sfc, for instance, has a split into efx_rx_packet()
 and __efx_rx_packet(), as a software pipeline for prefetching.

-Ed
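
As a worked example of the frame_size computation in the quoted
cpu_map_build_skb(), here is a userspace sketch of the arithmetic. The two
constants are assumptions chosen for illustration: 64 stands in for
SMP_CACHE_BYTES and 320 for sizeof(struct skb_shared_info), both of which
depend on the architecture and kernel configuration. data_align() and
cpumap_frame_size() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES	64U	/* assumption: SMP_CACHE_BYTES */
#define SHARED_INFO_BYTES	320U	/* assumption: sizeof(struct skb_shared_info) */

/* Round n up to a multiple of the cache line size. */
static size_t data_align(size_t n)
{
	/* The mask trick below needs a power-of-two alignment. */
	assert((CACHE_LINE_BYTES & (CACHE_LINE_BYTES - 1)) == 0);

	return (n + CACHE_LINE_BYTES - 1) & ~((size_t)CACHE_LINE_BYTES - 1);
}

/* Mirrors: ALIGN(len) + headroom + ALIGN(sizeof(shared info)). */
static size_t cpumap_frame_size(size_t pkt_len, size_t headroom)
{
	return data_align(pkt_len) + headroom + data_align(SHARED_INFO_BYTES);
}
```

Under these assumptions a 60-byte packet with 256 bytes of headroom gives
64 + 256 + 320 = 640, and a 1500-byte packet gives 1536 + 256 + 320 = 2112.
The result tracks the packet length rather than the size of the backing
buffer, which is the "fakes the truesize" caveat in the quoted comment.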
