Re: [patch net-next v2 07/12] mlxsw: spectrum: Add the multicast routing offloading logic

2017-09-24 Thread Yotam Gigi
On 09/25/2017 04:48 AM, Yunsheng Lin wrote:
> Hi, Jiri
>
> On 2017/9/25 1:22, Jiri Pirko wrote:
>> From: Yotam Gigi 
>>
>> Add the multicast router offloading logic, which is in charge of handling
>> the VIF and MFC notifications and translating them into the hardware logic API.
>>
>> The offloading logic has to overcome several obstacles in order to safely
>> comply with the kernel multicast router user API:
>>  - It must keep track of the mapping between VIFs and netdevices. The user
>>can add an MFC cache entry pointing to a VIF, delete the VIF and then
>>re-add it with a different netdevice. The offloading logic has to handle
>>this in order to be compatible with the kernel logic.
>>  - It must keep track of the mapping between netdevices and Spectrum RIFs,
>>as the current hardware implementation assumes having a RIF for every
>>port in a multicast router.
>>  - It must trap routes pointing to the pimreg device to the kernel, as
>>those packets should be delivered to userspace.
>>  - It must handle routes pointing to tunnel VIFs. The current implementation
>>does not support multicast forwarding to tunnels, thus routes that point
>>to a tunnel should be trapped to the kernel.
>>  - It must be aware of proxy multicast routes, which include both (*,*)
>>routes and duplicate routes. Currently proxy routes are not offloaded
>>and trigger the abort mechanism: removal of all routes from hardware and
>>triggering the traffic to go through the kernel.
>>
>> The multicast routing offloading logic also updates the counters of the
>> offloaded MFC routes from a periodic work item.
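The first obstacle above can be pictured with a small userspace sketch (hypothetical illustration, not the mlxsw code): the offload layer keeps its own VIF-index-to-netdevice map, so a route resolved through VIF 1 follows whatever device currently backs VIF 1, even across a delete/re-add cycle.

```python
# Hypothetical sketch of VIF bookkeeping in an offloading layer: the map is
# keyed by VIF index, and a re-added VIF silently rebinds to its new device.
class VifMap:
    def __init__(self):
        self.vifs = {}                # vif_index -> netdevice name

    def vif_add(self, vif_index, dev):
        self.vifs[vif_index] = dev    # re-add overwrites the old binding

    def vif_del(self, vif_index):
        self.vifs.pop(vif_index, None)

    def resolve(self, vif_index):
        """Device an MFC route's VIF currently points at, or None."""
        return self.vifs.get(vif_index)
```

A route offloaded against a VIF index must be re-resolved through such a map after each VIF notification, which is exactly the bookkeeping the commit message describes.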
>>
>> Signed-off-by: Yotam Gigi 
>> Reviewed-by: Ido Schimmel 
>> Signed-off-by: Jiri Pirko 
>> ---
>> v1->v2:
>>  - Update the lastuse MFC entry field too, in addition to packets and bytes.
>> ---
>>  drivers/net/ethernet/mellanox/mlxsw/Makefile  |3 +-
>>  drivers/net/ethernet/mellanox/mlxsw/spectrum.h|1 +
>>  drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c | 1014 
>> +
>>  drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.h |  133 +++
>>  4 files changed, 1150 insertions(+), 1 deletion(-)
>>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c
>>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.h
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlxsw/Makefile 
>> b/drivers/net/ethernet/mellanox/mlxsw/Makefile
>> index 4b88158..9b29764 100644
>> --- a/drivers/net/ethernet/mellanox/mlxsw/Makefile
>> +++ b/drivers/net/ethernet/mellanox/mlxsw/Makefile
>> @@ -17,7 +17,8 @@ mlxsw_spectrum-objs:= spectrum.o 
>> spectrum_buffers.o \
>> spectrum_kvdl.o spectrum_acl_tcam.o \
>> spectrum_acl.o spectrum_flower.o \
>> spectrum_cnt.o spectrum_fid.o \
>> -   spectrum_ipip.o spectrum_acl_flex_actions.o
>> +   spectrum_ipip.o spectrum_acl_flex_actions.o \
>> +   spectrum_mr.o
>>  mlxsw_spectrum-$(CONFIG_MLXSW_SPECTRUM_DCB) += spectrum_dcb.o
>>  mlxsw_spectrum-$(CONFIG_NET_DEVLINK) += spectrum_dpipe.o
>>  obj-$(CONFIG_MLXSW_MINIMAL) += mlxsw_minimal.o
>> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h 
>> b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
>> index e907ec4..51d8b9f 100644
>> --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
>> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
>> @@ -153,6 +153,7 @@ struct mlxsw_sp {
>>  struct mlxsw_sp_sb *sb;
>>  struct mlxsw_sp_bridge *bridge;
>>  struct mlxsw_sp_router *router;
>> +struct mlxsw_sp_mr *mr;
>>  struct mlxsw_afa *afa;
>>  struct mlxsw_sp_acl *acl;
>>  struct mlxsw_sp_fid_core *fid_core;
>> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c 
>> b/drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c
>> new file mode 100644
>> index 000..89b2e60
>> --- /dev/null
>> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c
>> @@ -0,0 +1,1014 @@
>> +/*
>> + * drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c
>> + * Copyright (c) 2017 Mellanox Technologies. All rights reserved.
>> + * Copyright (c) 2017 Yotam Gigi 
>> + *
>> + * Redistribution and use in source and binary forms, with or without
>> + * modification, are permitted provided that the following conditions are 
>> met:
>> + *
>> + * 1. Redistributions of source code must retain the above copyright
>> + *notice, this list of conditions and the following disclaimer.
>> + * 2. Redistributions in binary form must reproduce the above copyright
>> + *notice, this list of conditions and the following disclaimer in the
>> + *documentation and/or other materials provided with the distribution.
>> + * 3. Neither the names of the copyright holders nor the names of its
>> + *

Re: [patch net-next v2 06/12] net: mroute: Check if rule is a default rule

2017-09-24 Thread Yotam Gigi
On 09/25/2017 04:28 AM, Yunsheng Lin wrote:
> Hi, Jiri
>
> On 2017/9/25 1:22, Jiri Pirko wrote:
>> From: Yotam Gigi 
>>
>> When the ipmr starts, it adds one default FIB rule that matches all packets
>> and sends them to the DEFAULT (multicast) FIB table. A more complex rule
>> can be added by the user to specify that for a specific interface, a packet
>> should be looked up in either an arbitrary table or according to the l3mdev
>> of the interface.
>>
>> For drivers willing to offload the ipmr logic into hardware, but which don't
>> want to offload all of the FIB rules functionality, provide a function that
>> can indicate whether a FIB rule is the default multicast rule, and thus only
>> one routing table is needed.
>>
>> This way, a driver can register to the FIB notification chain, get
>> notifications about FIB rules being added and trigger some kind of internal
>> abort mechanism when a non-default rule is added by the user.
>>
>> Signed-off-by: Yotam Gigi 
>> Reviewed-by: Ido Schimmel 
>> Signed-off-by: Jiri Pirko 
>> ---
>>  include/linux/mroute.h |  7 +++
>>  net/ipv4/ipmr.c| 10 ++
>>  2 files changed, 17 insertions(+)
>>
>> diff --git a/include/linux/mroute.h b/include/linux/mroute.h
>> index 5566580..b072a84 100644
>> --- a/include/linux/mroute.h
>> +++ b/include/linux/mroute.h
>> @@ -5,6 +5,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include <net/fib_rules.h>
>>  #include 
>>  #include 
>>  
>> @@ -19,6 +20,7 @@ int ip_mroute_getsockopt(struct sock *, int, char __user 
>> *, int __user *);
>>  int ipmr_ioctl(struct sock *sk, int cmd, void __user *arg);
>>  int ipmr_compat_ioctl(struct sock *sk, unsigned int cmd, void __user *arg);
>>  int ip_mr_init(void);
>> +bool ipmr_rule_default(const struct fib_rule *rule);
>>  #else
>>  static inline int ip_mroute_setsockopt(struct sock *sock, int optname,
>> char __user *optval, unsigned int optlen)
>> @@ -46,6 +48,11 @@ static inline int ip_mroute_opt(int opt)
>>  {
>>  return 0;
>>  }
>> +
>> +static inline bool ipmr_rule_default(const struct fib_rule *rule)
>> +{
>> +return true;
>> +}
>>  #endif
>>  
>>  struct vif_device {
>> diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
>> index 2a795d2..a714f55 100644
>> --- a/net/ipv4/ipmr.c
>> +++ b/net/ipv4/ipmr.c
>> @@ -320,6 +320,16 @@ static unsigned int ipmr_rules_seq_read(struct net *net)
>>  }
>>  #endif
>>  
>> +bool ipmr_rule_default(const struct fib_rule *rule)
>> +{
>> +#if IS_ENABLED(CONFIG_FIB_RULES)
>> +return fib_rule_matchall(rule) && rule->table == RT_TABLE_DEFAULT;
>> +#else
>> +return true;
>> +#endif
> In patch 02, You have the following, can you do the same for the above?
> +#ifdef CONFIG_IP_MROUTE
> +void ipmr_cache_free(struct mfc_cache *mfc_cache);
> +#else
> +static inline void ipmr_cache_free(struct mfc_cache *mfc_cache)
> +{
> +}
> +#endif

OK.

>> +}
>> +EXPORT_SYMBOL(ipmr_rule_default);
>> +
>>  static inline int ipmr_hash_cmp(struct rhashtable_compare_arg *arg,
>>  const void *ptr)
>>  {
>>
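The driver-side usage described in the commit message can be sketched as follows (hypothetical Python illustration, not mlxsw code; `RT_TABLE_DEFAULT` is 253 as in rtnetlink): offload proceeds only while every multicast FIB rule is the default matchall rule, and any other rule flips the abort flag.

```python
# Hypothetical sketch of a driver consuming FIB rule notifications: a
# non-default rule triggers the abort mechanism (flush hardware routes,
# trap traffic to the kernel). Rules are modeled as dicts for illustration.
RT_TABLE_DEFAULT = 253

def ipmr_rule_default(rule):
    """Mirror of the kernel predicate: matchall rule on the default table."""
    return rule.get("matchall", False) and rule.get("table") == RT_TABLE_DEFAULT

class Offload:
    def __init__(self):
        self.aborted = False

    def on_fib_rule_add(self, rule):
        if not ipmr_rule_default(rule):
            self.aborted = True   # stop offloading; kernel handles traffic
```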



Re: [patch net-next v2 03/12] ipmr: Add FIB notification access functions

2017-09-24 Thread Yotam Gigi
On 09/25/2017 04:19 AM, Yunsheng Lin wrote:
> Hi, Jiri
>
> On 2017/9/25 1:22, Jiri Pirko wrote:
>> From: Yotam Gigi 
>>
>> Make the ipmr module register as a FIB notifier. To do that, implement both
>> the ipmr_seq_read and ipmr_dump ops.
>>
>> The ipmr_seq_read op returns a sequence counter that is incremented on
>> every notification-related operation done by the ipmr. To implement that,
>> add a sequence counter in the netns_ipv4 struct and increment it whenever a
>> new MFC route or VIF is added or deleted. The sequence operations are
>> protected by the RTNL lock.
>>
>> The ipmr_dump op iterates the list of MFC routes and the list of VIF entries
>> and sends notifications about them. The MFC entries dump is done under RCU,
>> while the VIF dump takes the mrt_lock too, as the vif->dev field can change
>> under RCU.
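A consumer of such a counter typically reads it before and after a dump and retries when it moved (hypothetical sketch, not the kernel code): a stable counter proves no VIF or MFC change raced with the dump.

```python
# Hypothetical sketch of the sequence-counter pattern: every mutating
# operation bumps seq, and a dump is only trusted if seq did not move.
class MrouteState:
    def __init__(self):
        self.seq = 0
        self.routes = []

    def add_route(self, route):       # every notification-related op bumps seq
        self.routes.append(route)
        self.seq += 1

    def seq_read(self):
        return self.seq

    def dump(self):
        return list(self.routes)

def consistent_dump(state, max_retries=5):
    for _ in range(max_retries):
        seq = state.seq_read()
        snapshot = state.dump()
        if state.seq_read() == seq:   # nothing changed underneath us
            return snapshot
    raise RuntimeError("could not get a consistent dump")
```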
>>
>> Signed-off-by: Yotam Gigi 
>> Reviewed-by: Ido Schimmel 
>> Signed-off-by: Jiri Pirko 
>> ---
>> v1->v2:
>>  - Take the mrt_lock when dumping VIF entries.
>> ---
>>  include/linux/mroute.h   |  15 ++
>>  include/net/netns/ipv4.h |   3 ++
>>  net/ipv4/ipmr.c  | 137 
>> ++-
>>  3 files changed, 153 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/mroute.h b/include/linux/mroute.h
>> index 10028f2..54c5cb8 100644
>> --- a/include/linux/mroute.h
>> +++ b/include/linux/mroute.h
>> @@ -5,6 +5,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include <net/fib_notifier.h>
>>  #include 
>>  
>>  #ifdef CONFIG_IP_MROUTE
>> @@ -58,6 +59,14 @@ struct vif_device {
>>  int link;   /* Physical interface index 
>> */
>>  };
>>  
>> +struct vif_entry_notifier_info {
>> +struct fib_notifier_info info;
>> +struct net_device *dev;
>> +vifi_t vif_index;
>> +unsigned short vif_flags;
>> +u32 tb_id;
>> +};
>> +
>>  #define VIFF_STATIC 0x8000
>>  
>>  #define VIF_EXISTS(_mrt, _idx) ((_mrt)->vif_table[_idx].dev != NULL)
>> @@ -146,6 +155,12 @@ struct mfc_cache {
>>  struct rcu_head rcu;
>>  };
>>  
>> +struct mfc_entry_notifier_info {
>> +struct fib_notifier_info info;
>> +struct mfc_cache *mfc;
>> +u32 tb_id;
>> +};
>> +
>>  struct rtmsg;
>>  int ipmr_get_route(struct net *net, struct sk_buff *skb,
>> __be32 saddr, __be32 daddr,
>> diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
>> index 8387f09..abc84d9 100644
>> --- a/include/net/netns/ipv4.h
>> +++ b/include/net/netns/ipv4.h
>> @@ -163,6 +163,9 @@ struct netns_ipv4 {
>>  struct fib_notifier_ops *notifier_ops;
>>  unsigned intfib_seq;/* protected by rtnl_mutex */
>>  
>> +struct fib_notifier_ops *ipmr_notifier_ops;
> Can we add a const here?

It cannot be const, as it gets initialized in ipmr_notifier_init.

>
>> +unsigned intipmr_seq;   /* protected by rtnl_mutex */
>> +
>>  atomic_trt_genid;
>>  };
>>  #endif
>> diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
>> index 86dc5f9..49879c3 100644
>> --- a/net/ipv4/ipmr.c
>> +++ b/net/ipv4/ipmr.c
>> @@ -264,6 +264,16 @@ static void __net_exit ipmr_rules_exit(struct net *net)
>>  fib_rules_unregister(net->ipv4.mr_rules_ops);
>>  rtnl_unlock();
>>  }
>> +
>> +static int ipmr_rules_dump(struct net *net, struct notifier_block *nb)
>> +{
>> +return fib_rules_dump(net, nb, RTNL_FAMILY_IPMR);
>> +}
>> +
>> +static unsigned int ipmr_rules_seq_read(struct net *net)
>> +{
>> +return fib_rules_seq_read(net, RTNL_FAMILY_IPMR);
>> +}
>>  #else
>>  #define ipmr_for_each_table(mrt, net) \
>>  for (mrt = net->ipv4.mrt; mrt; mrt = NULL)
>> @@ -298,6 +308,16 @@ static void __net_exit ipmr_rules_exit(struct net *net)
>>  net->ipv4.mrt = NULL;
>>  rtnl_unlock();
>>  }
>> +
>> +static int ipmr_rules_dump(struct net *net, struct notifier_block *nb)
>> +{
>> +return 0;
>> +}
>> +
>> +static unsigned int ipmr_rules_seq_read(struct net *net)
>> +{
>> +return 0;
>> +}
>>  #endif
>>  
>>  static inline int ipmr_hash_cmp(struct rhashtable_compare_arg *arg,
>> @@ -587,6 +607,43 @@ static struct net_device *ipmr_reg_vif(struct net *net, 
>> struct mr_table *mrt)
>>  }
>>  #endif
>>  
>> +static int call_ipmr_vif_entry_notifier(struct notifier_block *nb,
>> +struct net *net,
>> +enum fib_event_type event_type,
>> +struct vif_device *vif,
>> +vifi_t vif_index, u32 tb_id)
>> +{
>> +struct vif_entry_notifier_info info = {
>> +.info = {
>> +.family = RTNL_FAMILY_IPMR,
>> +.net = net,
>> +},
>> +.dev = vif->dev,
>> +.vif_index = vif_index,
>> +.vif_flags = vif->flags,
>> +.tb_id = tb_id,
>> +};
> We only use info.info which is fib_notifier_info, the
> vif_entry_notifier_info 

Re: [PATCH net-next] sch_netem: faster rb tree removal

2017-09-24 Thread Eric Dumazet
On Sun, 2017-09-24 at 20:05 -0600, David Ahern wrote:
> On 9/24/17 7:57 PM, David Ahern wrote:

> > Hi Eric:
> > 
> > I'm guessing the cost is in the rb_first and rb_next computations. Did
> > you consider something like this:
> > 
> > struct rb_root *root
> > struct rb_node **p = &root->rb_node;
> > 
> > while (*p != NULL) {
> > struct foobar *fb;
> > 
> > fb = container_of(*p, struct foobar, rb_node);
> > // fb processing
> rb_erase(&fb->rb_node, root);
> 
> > p = &root->rb_node;
> > }
> > 
> 
> Oops, dropped the rb_erase in my consolidating the code to this snippet.

Hi David

This gives about the same numbers as method_1.

I tried with 10^7 skbs in the tree:

Your suggestion takes 66ns per skb, while the one I chose takes 37ns per
skb.

Thanks.





Re: [PATCH] mac80211: aead api to reduce redundancy

2017-09-24 Thread Johannes Berg
On Mon, 2017-09-25 at 12:56 +0800, Herbert Xu wrote:
> On Sun, Sep 24, 2017 at 07:42:46PM +0200, Johannes Berg wrote:
> > 
> > Unrelated to this, I'm not sure whose tree this should go through -
> > probably Herbert's (or DaveM's with his ACK? not sure if there's a
> > crypto tree?) or so?
> 
> Since you're just rearranging code invoking the crypto API, rather
> than touching actual crypto API code, I think you should handle it
> as you do with any other wireless patch.

The code moves to crypto/ though, and I'm not even sure I can vouch for
the Makefile choice there.

johannes


Re: [PATCH] mac80211: aead api to reduce redundancy

2017-09-24 Thread Herbert Xu
On Sun, Sep 24, 2017 at 07:42:46PM +0200, Johannes Berg wrote:
>
> Unrelated to this, I'm not sure whose tree this should go through -
> probably Herbert's (or DaveM's with his ACK? not sure if there's a
> crypto tree?) or so?

Since you're just rearranging code invoking the crypto API, rather
than touching actual crypto API code, I think you should handle it
as you do with any other wireless patch.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v4 1/9] brcmsmac: make some local variables 'static const' to reduce stack size

2017-09-24 Thread Kalle Valo
Arnd Bergmann  writes:

> With KASAN and a couple of other patches applied, this driver is one
> of the few remaining ones that actually use more than 2048 bytes of
> kernel stack:
>
> broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
> 'wlc_phy_workarounds_nphy_gainctrl':
> broadcom/brcm80211/brcmsmac/phy/phy_n.c:16065:1: warning: the frame size of 
> 3264 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
> 'wlc_phy_workarounds_nphy':
> broadcom/brcm80211/brcmsmac/phy/phy_n.c:17138:1: warning: the frame size of 
> 2864 bytes is larger than 2048 bytes [-Wframe-larger-than=]
>
> Here, I'm reducing the stack size by marking as many local variables as
> 'static const' as I can without changing the actual code.
>
> This is the first of three patches to improve the stack usage in this
> driver. It would be good to have this backported to stable kernels
> to get all drivers in 'allmodconfig' below the 2048 byte limit so
> we can turn on the frame warning again globally, but I realize that
> the patch is larger than the normal limit for stable backports.
>
> The other two patches do not need to be backported.
>
> Acked-by: Arend van Spriel 
> Signed-off-by: Arnd Bergmann 

I'll queue this and the two following brcmsmac patches for 4.14.

Also I'll add (only for this patch):

Cc: 

-- 
Kalle Valo


Re: usb/wireless/rsi_91x: use-after-free write in __run_timers

2017-09-24 Thread Kalle Valo
Andrey Konovalov  writes:

> I've got the following report while fuzzing the kernel with syzkaller.
>
> On commit 6e80ecdddf4ea6f3cd84e83720f3d852e6624a68 (Sep 21).
>
> ==
> BUG: KASAN: use-after-free in __run_timers+0xc0e/0xd40
> Write of size 8 at addr 880069f701b8 by task swapper/0/0
>
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc1-42311-g6e80ecdddf4e #234
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011

[...]

> Allocated by task 1845:
>  save_stack_trace+0x1b/0x20 arch/x86/kernel/stacktrace.c:59
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>  set_track mm/kasan/kasan.c:459
>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
>  kmem_cache_alloc_trace+0x11e/0x2d0 mm/slub.c:2772
>  kmalloc ./include/linux/slab.h:493
>  kzalloc ./include/linux/slab.h:666
>  rsi_91x_init+0x98/0x510 drivers/net/wireless/rsi/rsi_91x_main.c:203
>  rsi_probe+0xb6/0x13b0 drivers/net/wireless/rsi/rsi_91x_usb.c:665
>  usb_probe_interface+0x35d/0x8e0 drivers/usb/core/driver.c:361

I'm curious about your setup. Apparently you are running syzkaller on
QEMU but what I don't understand is how the rsi device comes into the
picture. Did you have a rsi usb device connected to the virtual machine
or what? Or does syzkaller do some kind of magic here?

-- 
Kalle Valo


Re: [PATCH] brcm80211: make const array ucode_ofdm_rates static, reduces object code size

2017-09-24 Thread Kalle Valo
Arend van Spriel  writes:

> Please use 'brcmsmac:' as prefix instead of 'brcm80211:'.

I can fix that.

-- 
Kalle Valo


[PATCH v3 net-next 10/12] gtp: Experimental encapsulation of IPv6 packets

2017-09-24 Thread Tom Herbert
Allow IPv6 mobile subscriber packets. This entails adding an IPv6 mobile
subscriber address to the pdp context and IPv6-specific variants to find pdp
contexts by address.

Note that this is experimental support of IPv6; more work is
necessary to make this compliant with the 3GPP standard.
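The per-family lookup added by this patch can be pictured with a small sketch (hypothetical illustration, not the driver code; Python's `hash()` stands in for the kernel's jhash): one bucket array per address family, each keyed by a hash of the mobile subscriber address.

```python
# Hypothetical sketch of the addr4_hash/addr6_hash split: PDP contexts are
# found by hashing the mobile-subscriber address into the per-family table.
class PdpTable:
    def __init__(self, hash_size=16):
        self.hash_size = hash_size
        self.addr4 = [[] for _ in range(hash_size)]
        self.addr6 = [[] for _ in range(hash_size)]

    def _bucket(self, table, addr):
        return table[hash(addr) % self.hash_size]

    def add(self, family, ms_addr, pctx):
        table = self.addr4 if family == "inet" else self.addr6
        self._bucket(table, ms_addr).append((ms_addr, pctx))

    def find(self, family, ms_addr):
        table = self.addr4 if family == "inet" else self.addr6
        for addr, pctx in self._bucket(table, ms_addr):
            if addr == ms_addr:       # same-bucket collision check
                return pctx
        return None
```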

Signed-off-by: Tom Herbert 
---
 drivers/net/Kconfig  |  12 +-
 drivers/net/gtp.c| 324 +++
 include/uapi/linux/gtp.h |   1 +
 3 files changed, 280 insertions(+), 57 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index aba0d652095b..8e55367ab6d4 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -225,7 +225,17 @@ config GTP
  3GPP TS 29.060 standards.
 
  To compile this drivers as a module, choose M here: the module
- wil be called gtp.
+ will be called gtp.
+
+config GTP_IPV6_EXPERIMENTAL
+   bool "GTP IPv6 datapath (EXPERIMENTAL)"
+   default n
+   depends on GTP
+   ---help---
+ This is an experimental implementation that allows encapsulating
+ IPv6 over GTP and using GTP over IPv6 for testing and development
+ purposes. This is not a standards-conformant implementation for
+ IPv6 and GTP. More work is needed to reach that level.
 
 config MACSEC
tristate "IEEE 802.1AE MAC-level encryption (MACsec)"
diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 44844eba8df2..919ec6e14973 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -36,6 +36,8 @@
 #include 
 #include 
 
+#define GTP_IPV6 IS_ENABLED(CONFIG_GTP_IPV6_EXPERIMENTAL)
+
 /* An active session for the subscriber. */
 struct pdp_ctx {
struct hlist_node   hlist_tid;
@@ -55,9 +57,17 @@ struct pdp_ctx {
u8  gtp_version;
u8  hlen;
__be16  gtp_port;
-   u16 af;
 
-   struct in_addr  ms_addr_ip4;
+   u16 ms_af;
+#if GTP_IPV6
+   union {
+   struct in_addr  ms_addr_ip4;
+   struct in6_addr ms_addr_ip6;
+   };
+#else
+   struct in_addr  ms_addr_ip4;
+#endif
+
struct in_addr  peer_addr_ip4;
 
struct sock *sk;
@@ -81,7 +91,11 @@ struct gtp_dev {
unsigned introle;
unsigned inthash_size;
struct hlist_head   *tid_hash;
-   struct hlist_head   *addr_hash;
+
+   struct hlist_head   *addr4_hash;
+#if GTP_IPV6
+   struct hlist_head   *addr6_hash;
+#endif
 
struct gro_cellsgro_cells;
 };
@@ -99,6 +113,7 @@ static void pdp_context_delete(struct pdp_ctx *pctx);
 static inline u32 gtp0_hashfn(u64 tid)
 {
u32 *tid32 = (u32 *) &tid;
+
return jhash_2words(tid32[0], tid32[1], gtp_h_initval);
 }
 
@@ -107,11 +122,6 @@ static inline u32 gtp1u_hashfn(u32 tid)
return jhash_1word(tid, gtp_h_initval);
 }
 
-static inline u32 ipv4_hashfn(__be32 ip)
-{
-   return jhash_1word((__force u32)ip, gtp_h_initval);
-}
-
 /* Resolve a PDP context structure based on the 64bit TID. */
 static struct pdp_ctx *gtp0_pdp_find(struct gtp_dev *gtp, u64 tid)
 {
@@ -144,16 +154,21 @@ static struct pdp_ctx *gtp1_pdp_find(struct gtp_dev *gtp, 
u32 tid)
return NULL;
 }
 
+static inline u32 gtp_ipv4_hashfn(__be32 ip)
+{
+   return jhash_1word((__force u32)ip, gtp_h_initval);
+}
+
 /* Resolve a PDP context based on IPv4 address of MS. */
 static struct pdp_ctx *ipv4_pdp_find(struct gtp_dev *gtp, __be32 ms_addr)
 {
struct hlist_head *head;
struct pdp_ctx *pdp;
 
-   head = &gtp->addr_hash[ipv4_hashfn(ms_addr) % gtp->hash_size];
+   head = &gtp->addr4_hash[gtp_ipv4_hashfn(ms_addr) % gtp->hash_size];
 
hlist_for_each_entry_rcu(pdp, head, hlist_addr) {
-   if (pdp->af == AF_INET &&
+   if (pdp->ms_af == AF_INET &&
pdp->ms_addr_ip4.s_addr == ms_addr)
return pdp;
}
@@ -177,33 +192,109 @@ static bool gtp_check_ms_ipv4(struct sk_buff *skb, 
struct pdp_ctx *pctx,
return iph->saddr == pctx->ms_addr_ip4.s_addr;
 }
 
+#if GTP_IPV6
+
+static inline u32 gtp_ipv6_hashfn(const struct in6_addr *a)
+{
+   return __ipv6_addr_jhash(a, gtp_h_initval);
+}
+
+/* Resolve a PDP context based on IPv6 address of MS. */
+static struct pdp_ctx *ipv6_pdp_find(struct gtp_dev *gtp,
+const struct in6_addr *ms_addr)
+{
+   struct hlist_head *head;
+   struct pdp_ctx *pdp;
+
+   head = &gtp->addr6_hash[gtp_ipv6_hashfn(ms_addr) % gtp->hash_size];
+
+   hlist_for_each_entry_rcu(pdp, head, hlist_addr) {
+   if (pdp->ms_af == AF_INET6 &&
+   ipv6_addr_equal(&pdp->ms_addr_ip6, ms_addr))
+   return pdp;
+   }
+
+   return NULL;
+}
+
+static bool gtp_check_ms_ipv6(struct sk_buff *skb, struct pdp_ctx *pctx,
+ 

[PATCH v3 net-next 09/12] gtp: Eliminate pktinfo and add port configuration

2017-09-24 Thread Tom Herbert
The gtp pktinfo structure is unnecessary and needs a lot of code to
manage. Remove it. Also, add per-pdp port configuration for transmit.

Signed-off-by: Tom Herbert 
---
 drivers/net/gtp.c| 177 +--
 include/uapi/linux/gtp.h |   1 +
 2 files changed, 80 insertions(+), 98 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index bbb08f8849d3..44844eba8df2 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -54,6 +54,7 @@ struct pdp_ctx {
} u;
u8  gtp_version;
u8  hlen;
+   __be16  gtp_port;
u16 af;
 
struct in_addr  ms_addr_ip4;
@@ -420,73 +421,36 @@ static inline void gtp1_push_header(struct sk_buff *skb, 
struct pdp_ctx *pctx)
 */
 }
 
-struct gtp_pktinfo {
-   struct sock *sk;
-   struct iphdr*iph;
-   struct flowi4   fl4;
-   struct rtable   *rt;
-   struct pdp_ctx  *pctx;
-   struct net_device   *dev;
-   __be16  gtph_port;
-};
-
-static void gtp_push_header(struct sk_buff *skb, struct gtp_pktinfo *pktinfo)
+static void gtp_push_header(struct sk_buff *skb, struct pdp_ctx *pctx)
 {
-   switch (pktinfo->pctx->gtp_version) {
+   switch (pctx->gtp_version) {
case GTP_V0:
-   pktinfo->gtph_port = htons(GTP0_PORT);
-   gtp0_push_header(skb, pktinfo->pctx);
+   gtp0_push_header(skb, pctx);
break;
case GTP_V1:
-   pktinfo->gtph_port = htons(GTP1U_PORT);
-   gtp1_push_header(skb, pktinfo->pctx);
+   gtp1_push_header(skb, pctx);
break;
}
 }
 
-static inline void gtp_set_pktinfo_ipv4(struct gtp_pktinfo *pktinfo,
-   struct sock *sk, struct iphdr *iph,
-   struct pdp_ctx *pctx, struct rtable *rt,
-   struct flowi4 *fl4,
-   struct net_device *dev)
-{
-   pktinfo->sk = sk;
-   pktinfo->iph= iph;
-   pktinfo->pctx   = pctx;
-   pktinfo->rt = rt;
-   pktinfo->fl4= *fl4;
-   pktinfo->dev= dev;
-}
-
-static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev,
-struct gtp_pktinfo *pktinfo)
+static int gtp_xmit(struct sk_buff *skb, struct net_device *dev,
+   struct pdp_ctx *pctx)
 {
-   struct gtp_dev *gtp = netdev_priv(dev);
-   struct pdp_ctx *pctx;
+   struct iphdr *inner_iph = NULL;
+   struct sock *sk = pctx->sk;
+   __be32 saddr = inet_sk(sk)->inet_saddr;
struct rtable *rt;
-   struct flowi4 fl4;
-   struct iphdr *iph;
-   struct sock *sk;
-   __be32 saddr;
+   int err = 0;
 
-   /* Read the IP destination address and resolve the PDP context.
-* Prepend PDP header with TEI/TID from PDP ctx.
-*/
-   iph = ip_hdr(skb);
-   if (gtp->role == GTP_ROLE_SGSN)
-   pctx = ipv4_pdp_find(gtp, iph->saddr);
-   else
-   pctx = ipv4_pdp_find(gtp, iph->daddr);
+   if (skb->protocol == ETH_P_IP)
+   inner_iph = ip_hdr(skb);
 
-   if (!pctx) {
-   netdev_dbg(dev, "no PDP ctx found for %pI4, skip\n",
-  &iph->daddr);
-   return -ENOENT;
-   }
-   netdev_dbg(dev, "found PDP context %p\n", pctx);
+   /* Ensure there is sufficient headroom. */
+   err = skb_cow_head(skb, dev->needed_headroom);
+   if (unlikely(err))
+   goto out_err;
 
-   sk = pctx->sk;
-   saddr = inet_sk(sk)->inet_saddr;
+   skb_reset_inner_headers(skb);
 
/* Source address returned by route lookup is ignored since
 * we get the address from a socket.
@@ -494,81 +458,89 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct 
net_device *dev,
rt = ip_tunnel_get_route(dev, skb, sk->sk_protocol,
 sk->sk_bound_dev_if, RT_CONN_FLAGS(sk),
pctx->peer_addr_ip4.s_addr, &saddr,
-pktinfo->gtph_port, pktinfo->gtph_port,
+pctx->gtp_port, pctx->gtp_port,
&pctx->dst_cache, NULL);
 
if (IS_ERR(rt)) {
-   if (rt == ERR_PTR(-ELOOP)) {
-   netdev_dbg(dev, "circular route to SSGN %pI4\n",
-  &pctx->peer_addr_ip4.s_addr);
-   dev->stats.collisions++;
-   goto err_rt;
-   } else {
-   netdev_dbg(dev, "no route to SSGN %pI4\n",
-  &pctx->peer_addr_ip4.s_addr);
-   dev->stats.tx_carrier_errors++;
-   goto err;

[PATCH v3 net-next 07/12] gtp: udp recv clean up

2017-09-24 Thread Tom Herbert
Create separate UDP receive functions for GTP version 0 and version 1.
Set encap_rcv appropriately when configuring a socket.

Signed-off-by: Tom Herbert 
---
 drivers/net/gtp.c | 100 ++
 1 file changed, 49 insertions(+), 51 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 00e5ea5cb935..a6e2e0a1f424 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -225,14 +225,20 @@ static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff 
*skb,
return 0;
 }
 
-/* 1 means pass up to the stack, -1 means drop and 0 means decapsulated. */
-static int gtp0_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
+/* UDP encapsulation receive handler for GTPv0-U. See net/ipv4/udp.c.
+ * Return codes: 0: success, <0: error, >0: pass up to userspace UDP socket.
+ */
+static int gtp0_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 {
+   struct gtp_dev *gtp = rcu_dereference_sk_user_data(sk);
unsigned int hdrlen = sizeof(struct udphdr) +
  sizeof(struct gtp0_header);
struct gtp0_header *gtp0;
struct pdp_ctx *pctx;
 
+   if (!gtp)
+   goto pass;
+
if (!pskb_may_pull(skb, hdrlen))
goto drop;
 
@@ -244,26 +250,41 @@ static int gtp0_udp_encap_recv(struct gtp_dev *gtp, 
struct sk_buff *skb)
if (gtp0->type != GTP_TPDU)
goto pass;
 
+   netdev_dbg(gtp->dev, "received GTP0 packet\n");
+
pctx = gtp0_pdp_find(gtp, be64_to_cpu(gtp0->tid));
if (!pctx) {
netdev_dbg(gtp->dev, "No PDP ctx to decap skb=%p\n", skb);
goto pass;
}
 
-   return gtp_rx(pctx, skb, hdrlen, gtp->role);
+   if (!gtp_rx(pctx, skb, hdrlen, gtp->role)) {
+   /* Successfully received */
+   return 0;
+   }
+
 drop:
-   return -1;
+   kfree_skb(skb);
+   return 0;
+
 pass:
return 1;
 }
 
-static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
+/* UDP encapsulation receive handler for GTPv1-U. See net/ipv4/udp.c.
+ * Return codes: 0: success, <0: error, >0: pass up to userspace UDP socket.
+ */
+static int gtp1u_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 {
+   struct gtp_dev *gtp = rcu_dereference_sk_user_data(sk);
unsigned int hdrlen = sizeof(struct udphdr) +
  sizeof(struct gtp1_header);
struct gtp1_header *gtp1;
struct pdp_ctx *pctx;
 
+   if (!gtp)
+   goto pass;
+
if (!pskb_may_pull(skb, hdrlen))
goto drop;
 
@@ -275,6 +296,8 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct 
sk_buff *skb)
if (gtp1->type != GTP_TPDU)
goto pass;
 
+   netdev_dbg(gtp->dev, "received GTP1 packet\n");
+
/* From 29.060: "This field shall be present if and only if any one or
 * more of the S, PN and E flags are set.".
 *
@@ -296,9 +319,15 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, 
struct sk_buff *skb)
goto drop;
}
 
-   return gtp_rx(pctx, skb, hdrlen, gtp->role);
+   if (!gtp_rx(pctx, skb, hdrlen, gtp->role)) {
+   /* Successfully received */
+   return 0;
+   }
+
 drop:
-   return -1;
+   kfree_skb(skb);
+   return 0;
+
 pass:
return 1;
 }
@@ -329,49 +358,6 @@ static void gtp_encap_disable(struct gtp_dev *gtp)
gtp_encap_disable_sock(gtp->sk1u);
 }
 
-/* UDP encapsulation receive handler. See net/ipv4/udp.c.
- * Return codes: 0: success, <0: error, >0: pass up to userspace UDP socket.
- */
-static int gtp_encap_recv(struct sock *sk, struct sk_buff *skb)
-{
-   struct gtp_dev *gtp;
-   int ret = 0;
-
-   gtp = rcu_dereference_sk_user_data(sk);
-   if (!gtp)
-   return 1;
-
-   netdev_dbg(gtp->dev, "encap_recv sk=%p\n", sk);
-
-   switch (udp_sk(sk)->encap_type) {
-   case UDP_ENCAP_GTP0:
-   netdev_dbg(gtp->dev, "received GTP0 packet\n");
-   ret = gtp0_udp_encap_recv(gtp, skb);
-   break;
-   case UDP_ENCAP_GTP1U:
-   netdev_dbg(gtp->dev, "received GTP1U packet\n");
-   ret = gtp1u_udp_encap_recv(gtp, skb);
-   break;
-   default:
-   ret = -1; /* Shouldn't happen. */
-   }
-
-   switch (ret) {
-   case 1:
-   netdev_dbg(gtp->dev, "pass up to the process\n");
-   break;
-   case 0:
-   break;
-   case -1:
-   netdev_dbg(gtp->dev, "GTP packet has been dropped\n");
-   kfree_skb(skb);
-   ret = 0;
-   break;
-   }
-
-   return ret;
-}
-
 static int gtp_dev_init(struct net_device *dev)
 {
struct gtp_dev *gtp = netdev_priv(dev);
@@ -824,9 +810,21 @@ static struct sock *gtp_encap_enable_socket(int fd, int 
type,
  

[PATCH v3 net-next 11/12] gtp: Experimental support for encapsulating over IPv6

2017-09-24 Thread Tom Herbert
Allows using the GTP datapath over IPv6. Remote peers are indicated by IPv6 addresses.

Note this is experimental; more work is needed to make this
compliant with the 3GPP standard.

Signed-off-by: Tom Herbert 
---
 drivers/net/gtp.c| 248 ++-
 include/uapi/linux/gtp.h |   1 +
 include/uapi/linux/if_link.h |   3 +
 3 files changed, 200 insertions(+), 52 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 919ec6e14973..1c580df4cfc5 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -59,16 +60,22 @@ struct pdp_ctx {
__be16  gtp_port;
 
u16 ms_af;
+   u16 peer_af;
 #if GTP_IPV6
union {
struct in_addr  ms_addr_ip4;
struct in6_addr ms_addr_ip6;
};
+
+   union {
+   struct in_addr  peer_addr_ip4;
+   struct in6_addr peer_addr_ip6;
+   };
 #else
struct in_addr  ms_addr_ip4;
+   struct in_addr  peer_addr_ip4;
 #endif
 
-   struct in_addr  peer_addr_ip4;
 
struct sock *sk;
struct net_device   *dev;
@@ -93,8 +100,11 @@ struct gtp_dev {
struct hlist_head   *tid_hash;
 
struct hlist_head   *addr4_hash;
+
 #if GTP_IPV6
struct hlist_head   *addr6_hash;
+
+   unsigned intis_ipv6:1;
 #endif
 
struct gro_cellsgro_cells;
@@ -534,8 +544,6 @@ static int gtp_xmit(struct sk_buff *skb, struct net_device 
*dev,
 {
struct iphdr *inner_iph = NULL;
struct sock *sk = pctx->sk;
-   __be32 saddr = inet_sk(sk)->inet_saddr;
-   struct rtable *rt;
int err = 0;
 
if (skb->protocol == ETH_P_IP)
@@ -548,38 +556,84 @@ static int gtp_xmit(struct sk_buff *skb, struct 
net_device *dev,
 
skb_reset_inner_headers(skb);
 
-   /* Source address returned by route lookup is ignored since
-* we get the address from a socket.
-*/
-   rt = ip_tunnel_get_route(dev, skb, sk->sk_protocol,
-sk->sk_bound_dev_if, RT_CONN_FLAGS(sk),
-pctx->peer_addr_ip4.s_addr, &saddr,
-pctx->gtp_port, pctx->gtp_port,
-&pctx->dst_cache, NULL);
-
-   if (IS_ERR(rt)) {
-   err = PTR_ERR(rt);
-   goto out_err;
-   }
+   if (pctx->peer_af == AF_INET) {
+   __be32 saddr = inet_sk(sk)->inet_saddr;
+   struct rtable *rt;
+
+   /* Source address returned by route lookup is ignored since
+* we get the address from a socket.
+*/
+   rt = ip_tunnel_get_route(dev, skb, sk->sk_protocol,
+sk->sk_bound_dev_if, RT_CONN_FLAGS(sk),
+pctx->peer_addr_ip4.s_addr, &saddr,
+pctx->gtp_port, pctx->gtp_port,
+&pctx->dst_cache, NULL);
+
+   if (IS_ERR(rt)) {
+   err = PTR_ERR(rt);
+   goto out_err;
+   }
+
+   skb_dst_drop(skb);
 
-   skb_dst_drop(skb);
+   gtp_push_header(skb, pctx);
 
-   gtp_push_header(skb, pctx);
+   if (inner_iph)
+   __iptunnel_update_pmtu(dev, skb, &rt->dst,
+  !!inner_iph->frag_off,
+  inner_iph, pctx->hlen,
+  pctx->peer_addr_ip4.s_addr);
 
-   if (inner_iph)
-   __iptunnel_update_pmtu(dev, skb, &rt->dst,
-  !!inner_iph->frag_off,
-  inner_iph, pctx->hlen,
-  pctx->peer_addr_ip4.s_addr);
+   udp_tunnel_xmit_skb(rt, sk, skb, saddr,
+   pctx->peer_addr_ip4.s_addr,
+   0, ip4_dst_hoplimit(&rt->dst), 0,
+   pctx->gtp_port, pctx->gtp_port,
+   false, false);
 
-   udp_tunnel_xmit_skb(rt, sk, skb, saddr,
-   pctx->peer_addr_ip4.s_addr,
-   0, ip4_dst_hoplimit(&rt->dst), 0,
-   pctx->gtp_port, pctx->gtp_port,
-   false, false);
+   netdev_dbg(dev, "gtp -> IP src: %pI4 dst: %pI4\n",
+  &saddr, &pctx->peer_addr_ip4.s_addr);
 
-   netdev_dbg(dev, "gtp -> IP src: %pI4 dst: %pI4\n",
-  &saddr, &pctx->peer_addr_ip4.s_addr);
+#if GTP_IPV6
+#if IS_ENABLED(CONFIG_IPV6)
+   } else if (pctx->peer_af == AF_INET6) {
+   struct in6_addr saddr = 
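The pdp_ctx changes above turn the peer address into a tagged union selected by peer_af. A minimal userspace sketch of that pattern — all names and constants below are illustrative stand-ins, not the kernel's types:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for AF_* and struct in_addr / in6_addr. */
enum { PEER_AF_INET = 2, PEER_AF_INET6 = 10 };
struct addr4 { unsigned char s_addr[4]; };
struct addr6 { unsigned char s6_addr[16]; };

struct peer {
	unsigned short peer_af;		/* selects the live union member */
	union {
		struct addr4 peer_addr_ip4;
		struct addr6 peer_addr_ip6;
	};
};

/* Any code touching the union must dispatch on peer_af first, just as
 * gtp_xmit() branches on pctx->peer_af before reading an address. */
static int peer_addr_len(const struct peer *p)
{
	switch (p->peer_af) {
	case PEER_AF_INET:
		return sizeof(p->peer_addr_ip4);
	case PEER_AF_INET6:
		return sizeof(p->peer_addr_ip6);
	default:
		return -1;	/* unknown family: caller must reject */
	}
}
```

The same dispatch shape applies to the ms_addr union guarded by ms_af.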

[PATCH v3 net-next 12/12] gtp: Allow configuring GTP interface as standalone

2017-09-24 Thread Tom Herbert
Add a new configuration mode for GTP interfaces that allows specifying a
port to listen on (as opposed to having to get sockets from a userspace
control plane). This allows GTP interfaces to be configured, and the data
path tested, without requiring a GTP-C daemon.

Signed-off-by: Tom Herbert 
---
 drivers/net/gtp.c| 215 ---
 include/uapi/linux/gtp.h |   5 ++
 2 files changed, 169 insertions(+), 51 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 1c580df4cfc5..dc1fcd3034af 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -93,6 +93,9 @@ struct gtp_dev {
struct sock *sk0;
struct sock *sk1u;
 
+   struct socket   *sock0;
+   struct socket   *sock1u;
+
struct net_device   *dev;
 
unsigned int    role;
@@ -451,26 +454,33 @@ static void gtp_encap_destroy(struct sock *sk)
}
 }
 
-static void gtp_encap_disable_sock(struct sock *sk)
+static void gtp_encap_release(struct gtp_dev *gtp)
 {
-   if (!sk)
-   return;
+   if (gtp->sk0) {
+   if (gtp->sock0) {
+   udp_tunnel_sock_release(gtp->sock0);
+   gtp->sock0 = NULL;
+   } else {
+   gtp_encap_destroy(gtp->sk0);
+   }
 
-   gtp_encap_destroy(sk);
-}
+   gtp->sk0 = NULL;
+   }
 
-static void gtp_encap_disable(struct gtp_dev *gtp)
-{
-   gtp_encap_disable_sock(gtp->sk0);
-   gtp_encap_disable_sock(gtp->sk1u);
+   if (gtp->sk1u) {
+   if (gtp->sock1u) {
+   udp_tunnel_sock_release(gtp->sock1u);
+   gtp->sock1u = NULL;
+   } else {
+   gtp_encap_destroy(gtp->sk1u);
+   }
+
+   gtp->sk1u = NULL;
+   }
 }
 
 static int gtp_dev_init(struct net_device *dev)
 {
-   struct gtp_dev *gtp = netdev_priv(dev);
-
-   gtp->dev = dev;
-
dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
if (!dev->tstats)
return -ENOMEM;
@@ -482,7 +492,8 @@ static void gtp_dev_uninit(struct net_device *dev)
 {
struct gtp_dev *gtp = netdev_priv(dev);
 
-   gtp_encap_disable(gtp);
+   gtp_encap_release(gtp);
+
free_percpu(dev->tstats);
 }
 
@@ -751,6 +762,8 @@ static void gtp_link_setup(struct net_device *dev)
  sizeof(struct udphdr) +
  sizeof(struct gtp0_header);
 
+   gtp->dev = dev;
+
gro_cells_init(&gtp->gro_cells, dev);
 }
 
@@ -764,13 +777,19 @@ static int gtp_newlink(struct net *src_net, struct net_device *dev,
   struct netlink_ext_ack *extack)
 {
unsigned int role = GTP_ROLE_GGSN;
+   bool have_fd, have_ports;
bool is_ipv6 = false;
struct gtp_dev *gtp;
struct gtp_net *gn;
int hashsize, err;
 
-   if (!data[IFLA_GTP_FD0] && !data[IFLA_GTP_FD1])
+   have_fd = !!data[IFLA_GTP_FD0] || !!data[IFLA_GTP_FD1];
+   have_ports = !!data[IFLA_GTP_PORT0] || !!data[IFLA_GTP_PORT1];
+
+   if (!(have_fd ^ have_ports)) {
+   /* Either got fd(s) or port(s) */
return -EINVAL;
+   }
 
if (data[IFLA_GTP_ROLE]) {
role = nla_get_u32(data[IFLA_GTP_ROLE]);
@@ -831,7 +850,7 @@ static int gtp_newlink(struct net *src_net, struct net_device *dev,
 out_hashtable:
gtp_hashtable_free(gtp);
 out_encap:
-   gtp_encap_disable(gtp);
+   gtp_encap_release(gtp);
return err;
 }
 
@@ -840,7 +859,7 @@ static void gtp_dellink(struct net_device *dev, struct list_head *head)
struct gtp_dev *gtp = netdev_priv(dev);
 
gro_cells_destroy(&gtp->gro_cells);
-   gtp_encap_disable(gtp);
+   gtp_encap_release(gtp);
gtp_hashtable_free(gtp);
list_del_rcu(&gtp->list);
unregister_netdevice_queue(dev, head);
@@ -851,6 +870,8 @@ static const struct nla_policy gtp_policy[IFLA_GTP_MAX + 1] = {
[IFLA_GTP_FD1]  = { .type = NLA_U32 },
[IFLA_GTP_PDP_HASHSIZE] = { .type = NLA_U32 },
[IFLA_GTP_ROLE] = { .type = NLA_U32 },
+   [IFLA_GTP_PORT0]= { .type = NLA_U16 },
+   [IFLA_GTP_PORT1]= { .type = NLA_U16 },
 };
 
 static int gtp_validate(struct nlattr *tb[], struct nlattr *data[],
@@ -949,11 +970,35 @@ static void gtp_hashtable_free(struct gtp_dev *gtp)
kfree(gtp->tid_hash);
 }
 
-static struct sock *gtp_encap_enable_socket(int fd, int type,
-   struct gtp_dev *gtp,
-   bool is_ipv6)
+static int gtp_encap_enable_sock(struct socket *sock, int type,
+struct gtp_dev *gtp)
 {
struct udp_tunnel_sock_cfg tuncfg = {NULL};
+
+   switch (type) {
+
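The fd/port check added to gtp_newlink() above accepts exactly one of the two configuration styles: socket fds handed in by a control plane, or listener ports for standalone mode. A standalone sketch of that predicate (the helper name is invented for illustration):

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the `!(have_fd ^ have_ports)` rejection in gtp_newlink():
 * exactly one configuration style must be supplied; both or neither
 * gets -EINVAL in the driver. */
static bool gtp_config_valid(bool have_fd, bool have_ports)
{
	return have_fd ^ have_ports;
}
```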

[PATCH v3 net-next 05/12] gtp: Change to use gro_cells

2017-09-24 Thread Tom Herbert
Call gro_cells_receive instead of netif_rx.

Signed-off-by: Tom Herbert 
---
 drivers/net/gtp.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 6dabd605607c..f2aac5d01143 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -80,6 +80,8 @@ struct gtp_dev {
unsigned inthash_size;
struct hlist_head   *tid_hash;
struct hlist_head   *addr_hash;
+
+   struct gro_cells    gro_cells;
 };
 
 static unsigned int gtp_net_id __read_mostly;
@@ -189,6 +191,7 @@ static bool gtp_check_ms(struct sk_buff *skb, struct pdp_ctx *pctx,
 static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb,
unsigned int hdrlen, unsigned int role)
 {
+   struct gtp_dev *gtp = netdev_priv(pctx->dev);
struct pcpu_sw_netstats *stats;
 
if (!gtp_check_ms(skb, pctx, hdrlen, role)) {
@@ -217,7 +220,8 @@ static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb,
stats->rx_bytes += skb->len;
u64_stats_update_end(&stats->syncp);

-   netif_rx(skb);
+   gro_cells_receive(&gtp->gro_cells, skb);
+
return 0;
 }
 
@@ -611,6 +615,8 @@ static const struct net_device_ops gtp_netdev_ops = {
 
 static void gtp_link_setup(struct net_device *dev)
 {
+   struct gtp_dev *gtp = netdev_priv(dev);
+
dev->netdev_ops = &gtp_netdev_ops;
dev->needs_free_netdev  = true;
 
@@ -630,6 +636,8 @@ static void gtp_link_setup(struct net_device *dev)
  sizeof(struct iphdr) +
  sizeof(struct udphdr) +
  sizeof(struct gtp0_header);
+
+   gro_cells_init(&gtp->gro_cells, dev);
 }
 
 static int gtp_hashtable_new(struct gtp_dev *gtp, int hsize);
@@ -686,6 +694,7 @@ static void gtp_dellink(struct net_device *dev, struct list_head *head)
 {
struct gtp_dev *gtp = netdev_priv(dev);
 
+   gro_cells_destroy(&gtp->gro_cells);
gtp_encap_disable(gtp);
gtp_hashtable_free(gtp);
list_del_rcu(&gtp->list);
-- 
2.11.0



[PATCH v3 net-next 04/12] iptunnel: Generalize tunnel update pmtu

2017-09-24 Thread Tom Herbert
Add __iptunnel_update_pmtu exported function which does not take
an iptunnel argument but instead includes the fields from the
iptunnel structure as arguments which are needed in the function.

iptunnel_update_pmtu was modified to call __iptunnel_update_pmtu.

Signed-off-by: Tom Herbert 
---
 include/net/ip_tunnels.h |  4 
 net/ipv4/ip_tunnel.c | 30 --
 2 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 9650efff33d7..880c9ea5b08c 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -318,6 +318,10 @@ static inline struct rtable *ip_tunnel_get_route(struct net_device *dev,
 dst_cache, info, use_cache);
 }
 
+int __iptunnel_update_pmtu(struct net_device *dev, struct sk_buff *skb,
+  struct dst_entry *dst, __be16 df,
+  const struct iphdr *inner_iph, int hlen, u32 daddr);
+
 struct ip_tunnel_encap_ops {
size_t (*encap_hlen)(struct ip_tunnel_encap *e);
int (*build_header)(struct sk_buff *skb, struct ip_tunnel_encap *e,
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index ea8f8bc0aaf9..31d6dc9f6859 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -506,17 +506,16 @@ int ip_tunnel_encap_setup(struct ip_tunnel *t,
 }
 EXPORT_SYMBOL_GPL(ip_tunnel_encap_setup);
 
-static int tnl_update_pmtu(struct net_device *dev, struct sk_buff *skb,
-   struct rtable *rt, __be16 df,
-   const struct iphdr *inner_iph)
+int __iptunnel_update_pmtu(struct net_device *dev, struct sk_buff *skb,
+  struct dst_entry *dst, __be16 df,
+  const struct iphdr *inner_iph, int hlen, u32 daddr)
 {
-   struct ip_tunnel *tunnel = netdev_priv(dev);
-   int pkt_size = skb->len - tunnel->hlen - dev->hard_header_len;
+   int pkt_size = skb->len - hlen - dev->hard_header_len;
int mtu;
 
if (df)
-   mtu = dst_mtu(&rt->dst) - dev->hard_header_len
-   - sizeof(struct iphdr) - tunnel->hlen;
+   mtu = dst_mtu(dst) - dev->hard_header_len
+  - sizeof(struct iphdr) - hlen;
else
mtu = skb_dst(skb) ? dst_mtu(skb_dst(skb)) : dev->mtu;
 
@@ -538,8 +537,7 @@ static int tnl_update_pmtu(struct net_device *dev, struct sk_buff *skb,
 
if (rt6 && mtu < dst_mtu(skb_dst(skb)) &&
   mtu >= IPV6_MIN_MTU) {
-   if ((tunnel->parms.iph.daddr &&
-   !ipv4_is_multicast(tunnel->parms.iph.daddr)) ||
+   if ((daddr && !ipv4_is_multicast(daddr)) ||
rt6->rt6i_dst.plen == 128) {
rt6->rt6i_flags |= RTF_MODIFIED;
dst_metric_set(skb_dst(skb), RTAX_MTU, mtu);
@@ -555,6 +553,17 @@ static int tnl_update_pmtu(struct net_device *dev, struct sk_buff *skb,
 #endif
return 0;
 }
+EXPORT_SYMBOL(__iptunnel_update_pmtu);
+
+static int iptunnel_update_pmtu(struct net_device *dev, struct sk_buff *skb,
+   struct rtable *rt, __be16 df,
+   const struct iphdr *inner_iph)
+{
+   struct ip_tunnel *tunnel = netdev_priv(dev);
+
+   return __iptunnel_update_pmtu(dev, skb, &rt->dst, df, inner_iph,
+ tunnel->hlen, tunnel->parms.iph.daddr);
+}
 
 void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev, u8 proto)
 {
@@ -739,7 +748,8 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
goto tx_error;
}
 
-   if (tnl_update_pmtu(dev, skb, rt, tnl_params->frag_off, inner_iph)) {
+   if (iptunnel_update_pmtu(dev, skb, rt, tnl_params->frag_off,
+inner_iph)) {
ip_rt_put(rt);
goto tx_error;
}
-- 
2.11.0
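The MTU arithmetic factored into __iptunnel_update_pmtu() above can be exercised in isolation. A sketch under the assumption of a 20-byte outer IPv4 header; the function and parameter names are illustrative, not the kernel's:

```c
#include <assert.h>

#define IPV4_HDR_LEN 20	/* assumed on-wire sizeof(struct iphdr) */

/* Mirrors the mtu selection in __iptunnel_update_pmtu(): with DF set,
 * derive the path MTU from the route minus link-layer, outer-IP and
 * tunnel headers; without DF, use the existing dst (or device) MTU. */
static int tnl_pick_mtu(int df, int dst_mtu, int hard_header_len,
			int tunnel_hlen, int fallback_mtu)
{
	if (df)
		return dst_mtu - hard_header_len - IPV4_HDR_LEN - tunnel_hlen;
	return fallback_mtu;
}
```

For example, a 1500-byte route MTU over Ethernet (14-byte header) with a 16-byte UDP+GTPv1 tunnel header leaves 1450 bytes for the inner packet.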



[PATCH v3 net-next 08/12] gtp: Call function to update path mtu

2017-09-24 Thread Tom Herbert
Replace mtu handling with call to __iptunnel_update_pmtu.

Signed-off-by: Tom Herbert 
---
 drivers/net/gtp.c | 36 ++--
 1 file changed, 6 insertions(+), 30 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index a6e2e0a1f424..bbb08f8849d3 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -53,6 +53,7 @@ struct pdp_ctx {
} v1;
} u;
u8  gtp_version;
+   u8  hlen;
u16 af;
 
struct in_addr  ms_addr_ip4;
@@ -467,8 +468,6 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev,
struct iphdr *iph;
struct sock *sk;
__be32 saddr;
-   __be16 df;
-   int mtu;
 
/* Read the IP destination address and resolve the PDP context.
 * Prepend PDP header with TEI/TID from PDP ctx.
@@ -514,37 +513,12 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev,
 
skb_dst_drop(skb);
 
-   /* This is similar to tnl_update_pmtu(). */
-   df = iph->frag_off;
-   if (df) {
-   mtu = dst_mtu(&rt->dst) - dev->hard_header_len -
-   sizeof(struct iphdr) - sizeof(struct udphdr);
-   switch (pctx->gtp_version) {
-   case GTP_V0:
-   mtu -= sizeof(struct gtp0_header);
-   break;
-   case GTP_V1:
-   mtu -= sizeof(struct gtp1_header);
-   break;
-   }
-   } else {
-   mtu = dst_mtu(&rt->dst);
-   }
-
-   rt->dst.ops->update_pmtu(&rt->dst, NULL, skb, mtu);
-
-   if (!skb_is_gso(skb) && (iph->frag_off & htons(IP_DF)) &&
-   mtu < ntohs(iph->tot_len)) {
-   netdev_dbg(dev, "packet too big, fragmentation needed\n");
-   memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
-   icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
- htonl(mtu));
-   goto err_rt;
-   }
-
gtp_set_pktinfo_ipv4(pktinfo, sk, iph, pctx, rt, &fl4, dev);
gtp_push_header(skb, pktinfo);
 
+   __iptunnel_update_pmtu(dev, skb, &rt->dst, !!iph->frag_off, iph,
+  pctx->hlen, pctx->peer_addr_ip4.s_addr);
+
return 0;
 err_rt:
ip_rt_put(rt);
@@ -915,10 +889,12 @@ static void ipv4_pdp_fill(struct pdp_ctx *pctx, struct genl_info *info)
 */
pctx->u.v0.tid = nla_get_u64(info->attrs[GTPA_TID]);
pctx->u.v0.flow = nla_get_u16(info->attrs[GTPA_FLOW]);
+   pctx->hlen = sizeof(struct udphdr) + sizeof(struct gtp0_header);
break;
case GTP_V1:
pctx->u.v1.i_tei = nla_get_u32(info->attrs[GTPA_I_TEI]);
pctx->u.v1.o_tei = nla_get_u32(info->attrs[GTPA_O_TEI]);
+   pctx->hlen = sizeof(struct udphdr) + sizeof(struct gtp1_header);
break;
default:
break;
-- 
2.11.0
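The per-version pctx->hlen values set in ipv4_pdp_fill() above can be checked standalone. The header sizes below are this sketch's assumptions (8-byte UDP header, 20-byte GTPv0 header, 8-byte GTPv1 mandatory header without extensions), not values taken from the patch:

```c
#include <assert.h>

enum { GTP_V0, GTP_V1 };

#define UDP_HDR_LEN   8
#define GTP0_HDR_LEN 20	/* assumed sizeof(struct gtp0_header) */
#define GTP1_HDR_LEN  8	/* assumed sizeof(struct gtp1_header), no extensions */

/* Mirrors the per-version hlen assignment in ipv4_pdp_fill(); this is
 * the value later fed to __iptunnel_update_pmtu() as the tunnel hlen. */
static int gtp_hlen(int version)
{
	switch (version) {
	case GTP_V0:
		return UDP_HDR_LEN + GTP0_HDR_LEN;
	case GTP_V1:
		return UDP_HDR_LEN + GTP1_HDR_LEN;
	default:
		return -1;
	}
}
```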



[PATCH v3 net-next 06/12] gtp: Use goto for exceptions in gtp_udp_encap_recv funcs

2017-09-24 Thread Tom Herbert
Consolidate return logic to make it easier to extend.

Signed-off-by: Tom Herbert 
---
 drivers/net/gtp.c | 26 +-
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index f2aac5d01143..00e5ea5cb935 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -234,23 +234,27 @@ static int gtp0_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
struct pdp_ctx *pctx;
 
if (!pskb_may_pull(skb, hdrlen))
-   return -1;
+   goto drop;
 
gtp0 = (struct gtp0_header *)(skb->data + sizeof(struct udphdr));
 
if ((gtp0->flags >> 5) != GTP_V0)
-   return 1;
+   goto pass;
 
if (gtp0->type != GTP_TPDU)
-   return 1;
+   goto pass;
 
pctx = gtp0_pdp_find(gtp, be64_to_cpu(gtp0->tid));
if (!pctx) {
netdev_dbg(gtp->dev, "No PDP ctx to decap skb=%p\n", skb);
-   return 1;
+   goto pass;
}
 
return gtp_rx(pctx, skb, hdrlen, gtp->role);
+drop:
+   return -1;
+pass:
+   return 1;
 }
 
 static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
@@ -261,15 +265,15 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
struct pdp_ctx *pctx;
 
if (!pskb_may_pull(skb, hdrlen))
-   return -1;
+   goto drop;
 
gtp1 = (struct gtp1_header *)(skb->data + sizeof(struct udphdr));
 
if ((gtp1->flags >> 5) != GTP_V1)
-   return 1;
+   goto pass;
 
if (gtp1->type != GTP_TPDU)
-   return 1;
+   goto pass;
 
/* From 29.060: "This field shall be present if and only if any one or
 * more of the S, PN and E flags are set.".
@@ -282,17 +286,21 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
 
/* Make sure the header is larger enough, including extensions. */
if (!pskb_may_pull(skb, hdrlen))
-   return -1;
+   goto drop;
 
gtp1 = (struct gtp1_header *)(skb->data + sizeof(struct udphdr));
 
pctx = gtp1_pdp_find(gtp, ntohl(gtp1->tid));
if (!pctx) {
netdev_dbg(gtp->dev, "No PDP ctx to decap skb=%p\n", skb);
-   return 1;
+   goto drop;
}
 
return gtp_rx(pctx, skb, hdrlen, gtp->role);
+drop:
+   return -1;
+pass:
+   return 1;
 }
 
 static void gtp_encap_destroy(struct sock *sk)
-- 
2.11.0
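Both receive paths in this patch key off `flags >> 5` to read the GTP version from the top three bits of the first header byte, taking the `pass` exit (return 1) on a mismatch. A standalone sketch of that extraction; the example flag bytes are illustrative:

```c
#include <assert.h>

enum { GTP_V0 = 0, GTP_V1 = 1 };

/* GTP carries its version in the top three bits of the first header
 * byte; gtp0_udp_encap_recv() and gtp1u_udp_encap_recv() pass the
 * packet up the stack when the version does not match theirs. */
static int gtp_version(unsigned char flags)
{
	return flags >> 5;
}
```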



[PATCH v3 net-next 00/12] gtp: Additional feature support - Part I

2017-09-24 Thread Tom Herbert
This patch set builds upon the initial GTP implementation to make
support closer to that enjoyed by other encapsulation protocols.

The major items are:

  - Experimental IPv6 support
  - Configurable networking interfaces so that the GTP kernel module can be
used and tested without needing GSN network emulation (i.e. no user
space daemon needed).
  - Addition of a dst_cache in the GTP structure and other cleanup

Additionally, this patch set includes:

  - Common functions to get a route for an IP tunnel

For IPv6 support, the mobile subscriber may be assigned IPv6 addresses,
and the remote endpoint can be IPv6.

For configurable interfaces, configuration is added to allow an
alternate means to configure a GTP device. This follows the
typical UDP encapsulation model of specifying a listener port for
receive, and a remote address and port for transmit.

Configuration is performed by iproute2/ip. I will post that
in a subsequent patch set.

Tested:

Configured the matrix of IPv4/IPv6 mobile subscriber, IPv4/IPv6 remote
peer, and GTP version 0 and 1 (eight combinations). Observed
connectivity and functional netperf. Also, tested VXLAN for
regression.

Tested using OpenGGSN with the GGSN and kernel module on one side and an
emulated SGSN on the other. Observed connectivity and
functional netperf.

v2:
  - Split the original patch to post in parts in order to make
review more manageable
  - Make IPv6 support experimental with a configuration option for it
  - Prepend hash functions with gtp
  - Generalize iptunnel update path MTU function and call it from gtp
instead of using custom code
  - Split original patch cleaning up udp_recv into several for easier
review
v3: Properly include netdev on cc

Tom Herbert (12):
  iptunnel: Add common functions to get a tunnel route
  vxlan: Call common functions to get tunnel routes
  gtp: Call common functions to get tunnel routes and add dst_cache
  iptunnel: Generalize tunnel update pmtu
  gtp: Change to use gro_cells
  gtp: Use goto for exceptions in gtp_udp_encap_recv funcs
  gtp: udp recv clean up
  gtp: Call function to update path mtu
  gtp: Eliminate pktinfo and add port configuration
  gtp: Experimental encapsulation of IPv6 packets
  gtp: Experimental support for encapsulating over IPv6
  gtp: Allow configuring GTP interface as standalone

 drivers/net/Kconfig  |   12 +-
 drivers/net/gtp.c| 1043 ++
 drivers/net/vxlan.c  |   84 +---
 include/net/ip6_tunnel.h |   35 ++
 include/net/ip_tunnels.h |   37 ++
 include/uapi/linux/gtp.h |8 +
 include/uapi/linux/if_link.h |3 +
 net/ipv4/ip_tunnel.c |   71 ++-
 net/ipv6/ip6_tunnel.c|   43 ++
 9 files changed, 949 insertions(+), 387 deletions(-)

-- 
2.11.0



[PATCH v3 net-next 02/12] vxlan: Call common functions to get tunnel routes

2017-09-24 Thread Tom Herbert
Call ip_tunnel_get_route and ip6_tnl_get_route to handle getting a route
and dealing with the dst_cache.

Signed-off-by: Tom Herbert 
---
 drivers/net/vxlan.c | 84 -
 1 file changed, 5 insertions(+), 79 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index d7c49cf1d5e9..810caa9adf37 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1867,47 +1867,11 @@ static struct rtable *vxlan_get_route(struct vxlan_dev *vxlan, struct net_device
  struct dst_cache *dst_cache,
  const struct ip_tunnel_info *info)
 {
-   bool use_cache = ip_tunnel_dst_cache_usable(skb, info);
-   struct rtable *rt = NULL;
-   struct flowi4 fl4;
-
if (!sock4)
return ERR_PTR(-EIO);
 
-   if (tos && !info)
-   use_cache = false;
-   if (use_cache) {
-   rt = dst_cache_get_ip4(dst_cache, saddr);
-   if (rt)
-   return rt;
-   }
-
-   memset(&fl4, 0, sizeof(fl4));
-   fl4.flowi4_oif = oif;
-   fl4.flowi4_tos = RT_TOS(tos);
-   fl4.flowi4_mark = skb->mark;
-   fl4.flowi4_proto = IPPROTO_UDP;
-   fl4.daddr = daddr;
-   fl4.saddr = *saddr;
-   fl4.fl4_dport = dport;
-   fl4.fl4_sport = sport;
-
-   rt = ip_route_output_key(vxlan->net, &fl4);
-   if (likely(!IS_ERR(rt))) {
-   if (rt->dst.dev == dev) {
-   netdev_dbg(dev, "circular route to %pI4\n", &daddr);
-   ip_rt_put(rt);
-   return ERR_PTR(-ELOOP);
-   }
-
-   *saddr = fl4.saddr;
-   if (use_cache)
-   dst_cache_set_ip4(dst_cache, &rt->dst, fl4.saddr);
-   } else {
-   netdev_dbg(dev, "no route to %pI4\n", &daddr);
-   return ERR_PTR(-ENETUNREACH);
-   }
-   return rt;
+   return ip_tunnel_get_route(dev, skb, IPPROTO_UDP, oif, tos, daddr,
+  saddr, dport, sport, dst_cache, info);
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
@@ -1922,50 +1886,12 @@ static struct dst_entry *vxlan6_get_route(struct vxlan_dev *vxlan,
  struct dst_cache *dst_cache,
  const struct ip_tunnel_info *info)
 {
-   bool use_cache = ip_tunnel_dst_cache_usable(skb, info);
-   struct dst_entry *ndst;
-   struct flowi6 fl6;
-   int err;
-
if (!sock6)
return ERR_PTR(-EIO);
 
-   if (tos && !info)
-   use_cache = false;
-   if (use_cache) {
-   ndst = dst_cache_get_ip6(dst_cache, saddr);
-   if (ndst)
-   return ndst;
-   }
-
-   memset(&fl6, 0, sizeof(fl6));
-   fl6.flowi6_oif = oif;
-   fl6.daddr = *daddr;
-   fl6.saddr = *saddr;
-   fl6.flowlabel = ip6_make_flowinfo(RT_TOS(tos), label);
-   fl6.flowi6_mark = skb->mark;
-   fl6.flowi6_proto = IPPROTO_UDP;
-   fl6.fl6_dport = dport;
-   fl6.fl6_sport = sport;
-
-   err = ipv6_stub->ipv6_dst_lookup(vxlan->net,
-sock6->sock->sk,
-&ndst, &fl6);
-   if (unlikely(err < 0)) {
-   netdev_dbg(dev, "no route to %pI6\n", daddr);
-   return ERR_PTR(-ENETUNREACH);
-   }
-
-   if (unlikely(ndst->dev == dev)) {
-   netdev_dbg(dev, "circular route to %pI6\n", daddr);
-   dst_release(ndst);
-   return ERR_PTR(-ELOOP);
-   }
-
-   *saddr = fl6.saddr;
-   if (use_cache)
-   dst_cache_set_ip6(dst_cache, ndst, saddr);
-   return ndst;
+   return ip6_tnl_get_route(dev, skb, sock6->sock->sk, IPPROTO_UDP, oif,
+  tos, label, daddr, saddr, dport, sport,
+  dst_cache, info);
 }
 #endif
 
-- 
2.11.0



[PATCH v3 net-next 01/12] iptunnel: Add common functions to get a tunnel route

2017-09-24 Thread Tom Herbert
ip_tunnel_get_route and ip6_tnl_get_route are created to return
routes for a tunnel. These functions are derived from the VXLAN
functions.

Signed-off-by: Tom Herbert 
---
 include/net/ip6_tunnel.h | 35 +++
 include/net/ip_tunnels.h | 33 +
 net/ipv4/ip_tunnel.c | 41 +
 net/ipv6/ip6_tunnel.c| 43 +++
 4 files changed, 152 insertions(+)

diff --git a/include/net/ip6_tunnel.h b/include/net/ip6_tunnel.h
index 08fbc7f7d8d7..5a67301b0416 100644
--- a/include/net/ip6_tunnel.h
+++ b/include/net/ip6_tunnel.h
@@ -142,6 +142,41 @@ __u32 ip6_tnl_get_cap(struct ip6_tnl *t, const struct in6_addr *laddr,
 struct net *ip6_tnl_get_link_net(const struct net_device *dev);
 int ip6_tnl_get_iflink(const struct net_device *dev);
 int ip6_tnl_change_mtu(struct net_device *dev, int new_mtu);
+struct dst_entry *__ip6_tnl_get_route(struct net_device *dev,
+ struct sk_buff *skb, struct sock *sk,
+ u8 proto, int oif, u8 tos, __be32 label,
+ const struct in6_addr *daddr,
+ struct in6_addr *saddr,
+ __be16 dport, __be16 sport,
+ struct dst_cache *dst_cache,
+ const struct ip_tunnel_info *info,
+ bool use_cache);
+
+static inline struct dst_entry *ip6_tnl_get_route(struct net_device *dev,
+   struct sk_buff *skb, struct sock *sk, u8 proto,
+   int oif, u8 tos, __be32 label,
+   const struct in6_addr *daddr,
+   struct in6_addr *saddr,
+   __be16 dport, __be16 sport,
+   struct dst_cache *dst_cache,
+   const struct ip_tunnel_info *info)
+{
+bool use_cache = (ip_tunnel_dst_cache_usable(skb, info) &&
+   (!tos || info));
+
+#if IS_ENABLED(CONFIG_IPV6)
+   if (use_cache) {
+   struct dst_entry *ndst = dst_cache_get_ip6(dst_cache, saddr);
+
+   if (ndst)
+   return ndst;
+   }
+#endif
+
+   return __ip6_tnl_get_route(dev, skb, sk, proto, oif, tos, label,
+  daddr, saddr, dport, sport, dst_cache,
+  info, use_cache);
+}
 
 static inline void ip6tunnel_xmit(struct sock *sk, struct sk_buff *skb,
  struct net_device *dev)
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index b41a1e057fce..9650efff33d7 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -285,6 +285,39 @@ int ip_tunnel_newlink(struct net_device *dev, struct nlattr *tb[],
  struct ip_tunnel_parm *p, __u32 fwmark);
 void ip_tunnel_setup(struct net_device *dev, unsigned int net_id);
 
+struct rtable *__ip_tunnel_get_route(struct net_device *dev,
+struct sk_buff *skb, u8 proto,
+int oif, u8 tos,
+__be32 daddr, __be32 *saddr,
+__be16 dport, __be16 sport,
+struct dst_cache *dst_cache,
+const struct ip_tunnel_info *info,
+bool use_cache);
+
+static inline struct rtable *ip_tunnel_get_route(struct net_device *dev,
+struct sk_buff *skb, u8 proto,
+int oif, u8 tos,
+__be32 daddr, __be32 *saddr,
+__be16 dport, __be16 sport,
+struct dst_cache *dst_cache,
+const struct ip_tunnel_info *info)
+{
+   bool use_cache = (ip_tunnel_dst_cache_usable(skb, info) &&
+   (!tos || info));
+
+   if (use_cache) {
+   struct rtable *rt;
+
+   rt = dst_cache_get_ip4(dst_cache, saddr);
+   if (rt)
+   return rt;
+   }
+
+   return __ip_tunnel_get_route(dev, skb, proto, oif, tos,
+daddr, saddr, dport, sport,
+dst_cache, info, use_cache);
+}
+
 struct ip_tunnel_encap_ops {
size_t (*encap_hlen)(struct ip_tunnel_encap *e);
int (*build_header)(struct sk_buff *skb, struct ip_tunnel_encap *e,
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index fe6fee728ce4..ea8f8bc0aaf9 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -935,6 +935,47 @@ int ip_tunnel_ioctl(struct net_device *dev, struct ip_tunnel_parm *p, int cmd)
 }
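The inline wrappers added above share one shape: consult the per-flow dst_cache first, and only fall back to the full route lookup (which then repopulates the cache) on a miss. A userspace sketch of that shape with invented names standing in for the kernel API:

```c
#include <assert.h>
#include <stddef.h>

struct route { int id; };
struct route_cache { struct route *rt; };

static struct route slow_route = { 42 };
static int slow_lookups;	/* counts fallbacks to the full lookup */

/* Stand-in for __ip_tunnel_get_route(): does the expensive lookup and,
 * when caching is allowed, stores the result for the next caller. */
static struct route *slow_lookup(struct route_cache *c, int use_cache)
{
	slow_lookups++;
	if (use_cache)
		c->rt = &slow_route;
	return &slow_route;
}

/* Stand-in for the inline ip_tunnel_get_route() wrapper: fast path on a
 * cache hit, slow path otherwise. */
static struct route *get_route(struct route_cache *c, int use_cache)
{
	if (use_cache && c->rt)
		return c->rt;	/* fast path: cached route */
	return slow_lookup(c, use_cache);
}
```

The `use_cache` flag plays the role of `ip_tunnel_dst_cache_usable() && (!tos || info)` in the real helpers: some flows must bypass the cache entirely.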
 

[PATCH v3 net-next 03/12] gtp: Call common functions to get tunnel routes and add dst_cache

2017-09-24 Thread Tom Herbert
Call ip_tunnel_get_route and add a dst_cache to the PDP context, which
should improve performance by obviating the need to perform a route
lookup on every packet.

Signed-off-by: Tom Herbert 
---
 drivers/net/gtp.c | 62 +++
 1 file changed, 35 insertions(+), 27 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index f38e32a7ec9c..6dabd605607c 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -63,6 +63,8 @@ struct pdp_ctx {
 
atomic_t    tx_seq;
struct rcu_head rcu_head;
+
+   struct dst_cache    dst_cache;
 };
 
 /* One instance of the GTP device. */
@@ -379,20 +381,6 @@ static void gtp_dev_uninit(struct net_device *dev)
free_percpu(dev->tstats);
 }
 
-static struct rtable *ip4_route_output_gtp(struct flowi4 *fl4,
-  const struct sock *sk,
-  __be32 daddr)
-{
-   memset(fl4, 0, sizeof(*fl4));
-   fl4->flowi4_oif = sk->sk_bound_dev_if;
-   fl4->daddr  = daddr;
-   fl4->saddr  = inet_sk(sk)->inet_saddr;
-   fl4->flowi4_tos = RT_CONN_FLAGS(sk);
-   fl4->flowi4_proto   = sk->sk_protocol;
-
-   return ip_route_output_key(sock_net(sk), fl4);
-}
-
 static inline void gtp0_push_header(struct sk_buff *skb, struct pdp_ctx *pctx)
 {
int payload_len = skb->len;
@@ -479,6 +467,8 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev,
struct rtable *rt;
struct flowi4 fl4;
struct iphdr *iph;
+   struct sock *sk;
+   __be32 saddr;
__be16 df;
int mtu;
 
@@ -498,19 +488,30 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev,
}
netdev_dbg(dev, "found PDP context %p\n", pctx);
 
-   rt = ip4_route_output_gtp(&fl4, pctx->sk, pctx->peer_addr_ip4.s_addr);
-   if (IS_ERR(rt)) {
-   netdev_dbg(dev, "no route to SSGN %pI4\n",
-  &pctx->peer_addr_ip4.s_addr);
-   dev->stats.tx_carrier_errors++;
-   goto err;
-   }
+   sk = pctx->sk;
+   saddr = inet_sk(sk)->inet_saddr;
 
-   if (rt->dst.dev == dev) {
-   netdev_dbg(dev, "circular route to SSGN %pI4\n",
-  &pctx->peer_addr_ip4.s_addr);
-   dev->stats.collisions++;
-   goto err_rt;
+   /* Source address returned by route lookup is ignored since
+* we get the address from a socket.
+*/
+   rt = ip_tunnel_get_route(dev, skb, sk->sk_protocol,
+sk->sk_bound_dev_if, RT_CONN_FLAGS(sk),
+pctx->peer_addr_ip4.s_addr, &saddr,
+pktinfo->gtph_port, pktinfo->gtph_port,
+&pctx->dst_cache, NULL);
+
+   if (IS_ERR(rt)) {
+   if (rt == ERR_PTR(-ELOOP)) {
+   netdev_dbg(dev, "circular route to SSGN %pI4\n",
+  &pctx->peer_addr_ip4.s_addr);
+   dev->stats.collisions++;
+   goto err_rt;
+   } else {
+   netdev_dbg(dev, "no route to SSGN %pI4\n",
+  &pctx->peer_addr_ip4.s_addr);
+   dev->stats.tx_carrier_errors++;
+   goto err;
+   }
}
 
skb_dst_drop(skb);
@@ -543,7 +544,7 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev,
goto err_rt;
}
 
-   gtp_set_pktinfo_ipv4(pktinfo, pctx->sk, iph, pctx, rt, &fl4, dev);
+   gtp_set_pktinfo_ipv4(pktinfo, sk, iph, pctx, rt, &fl4, dev);
gtp_push_header(skb, pktinfo);
 
return 0;
@@ -917,6 +918,7 @@ static int ipv4_pdp_add(struct gtp_dev *gtp, struct sock *sk,
struct pdp_ctx *pctx;
bool found = false;
__be32 ms_addr;
+   int err;
 
ms_addr = nla_get_be32(info->attrs[GTPA_MS_ADDRESS]);
hash_ms = ipv4_hashfn(ms_addr) % gtp->hash_size;
@@ -951,6 +953,12 @@ static int ipv4_pdp_add(struct gtp_dev *gtp, struct sock *sk,
if (pctx == NULL)
return -ENOMEM;
 
+   err = dst_cache_init(&pctx->dst_cache, GFP_KERNEL);
+   if (err) {
+   kfree(pctx);
+   return err;
+   }
+
sock_hold(sk);
pctx->sk = sk;
pctx->dev = gtp->dev;
-- 
2.11.0



Re: [PATCH net-next] sch_netem: faster rb tree removal

2017-09-24 Thread David Ahern
On 9/24/17 7:57 PM, David Ahern wrote:
> On 9/23/17 12:07 PM, Eric Dumazet wrote:
>> From: Eric Dumazet 
>>
>> While running TCP tests involving netem storing millions of packets,
>> I had the idea to speed up tfifo_reset() and did experiments.
>>
>> I tried the rbtree_postorder_for_each_entry_safe() method that is
>> used in skb_rbtree_purge() but discovered it was slower than the
>> current tfifo_reset() method.
>>
>> I measured time taken to release skbs with three occupation levels :
>> 10^4, 10^5 and 10^6 skbs with three methods :
>>
>> 1) (current 'naive' method)
>>
>>  while ((p = rb_first(&q->t_root))) {
>>  struct sk_buff *skb = netem_rb_to_skb(p);
>>  
>>  rb_erase(p, &q->t_root);
>>  rtnl_kfree_skbs(skb, skb);
>>  }
>>
>> 2) Use rb_next() instead of rb_first() in the loop :
>>
>>  p = rb_first(&q->t_root);
>>  while (p) {
>>  struct sk_buff *skb = netem_rb_to_skb(p);
>>
>>  p = rb_next(p);
>>  rb_erase(&skb->rbnode, &q->t_root);
>>  rtnl_kfree_skbs(skb, skb);
>>  }
>>
>> 3) "optimized" method using rbtree_postorder_for_each_entry_safe()
>>
>>  struct sk_buff *skb, *next;
>>
>>  rbtree_postorder_for_each_entry_safe(skb, next,
>>   &q->t_root, rbnode) {
>>rtnl_kfree_skbs(skb, skb);
>>  }
>>  q->t_root = RB_ROOT;
>>
>> Results :
>>
>> method_1:while (rb_first()) rb_erase() 1 skbs in 690378 ns (69 ns per skb)
>> method_2:rb_first; while (p) { p = rb_next(p); ...}  1 skbs in 541846 ns (54 ns per skb)
>> method_3:rbtree_postorder_for_each_entry_safe() 1 skbs in 868307 ns (86 ns per skb)
>>
>> method_1:while (rb_first()) rb_erase() 6 skbs in 7804021 ns (78 ns per skb)
>> method_2:rb_first; while (p) { p = rb_next(p); ...}  10 skbs in 5942456 ns (59 ns per skb)
>> method_3:rbtree_postorder_for_each_entry_safe() 10 skbs in 11584940 ns (115 ns per skb)
>>
>> method_1:while (rb_first()) rb_erase() 100 skbs in 108577838 ns (108 ns per skb)
>> method_2:rb_first; while (p) { p = rb_next(p); ...}  100 skbs in 82619635 ns (82 ns per skb)
>> method_3:rbtree_postorder_for_each_entry_safe() 100 skbs in 127328743 ns (127 ns per skb)
>>
>> Method 2) is simply faster, probably because it maintains a smaller
>> working size set.
>>
>> Note that this is the method we use in tcp_ofo_queue() already.
>>
>> I will also change skb_rbtree_purge() in a second patch.
>>
>> Signed-off-by: Eric Dumazet 
>> ---
>>  net/sched/sch_netem.c |7 ---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
>> index 063a4bdb9ee6f26b01387959e8f6ccd15ec16191..5a4f1008029068372019a965186e7a3c0a18aac3 100644
>> --- a/net/sched/sch_netem.c
>> +++ b/net/sched/sch_netem.c
>> @@ -361,12 +361,13 @@ static psched_time_t packet_len_2_sched_time(unsigned int len, struct netem_sche
>>  static void tfifo_reset(struct Qdisc *sch)
>>  {
>>  struct netem_sched_data *q = qdisc_priv(sch);
>> -struct rb_node *p;
>> +struct rb_node *p = rb_first(&q->t_root);
>>  
>> -while ((p = rb_first(&q->t_root))) {
>>  struct sk_buff *skb = netem_rb_to_skb(p);
>>  
>> -rb_erase(p, &q->t_root);
>> +p = rb_next(p);
>> +rb_erase(&skb->rbnode, &q->t_root);
>> +rb_erase(>rbnode, >t_root);
>>  rtnl_kfree_skbs(skb, skb);
>>  }
>>  }
>>
>>
> 
> Hi Eric:
> 
> I'm guessing the cost is in the rb_first and rb_next computations. Did
> you consider something like this:
> 
> struct rb_root *root;
> struct rb_node **p = &root->rb_node;
> 
> while (*p != NULL) {
> struct foobar *fb;
> 
> fb = container_of(*p, struct foobar, rb_node);
> // fb processing
  rb_erase(&fb->rb_node, root);

> p = &root->rb_node;
> }
> 

Oops, I dropped the rb_erase when consolidating the code into this snippet.


Re: [PATCH net-next RFC 2/5] vhost: introduce helper to prefetch desc index

2017-09-24 Thread Jason Wang



On 2017-09-22 17:02, Stefan Hajnoczi wrote:

On Fri, Sep 22, 2017 at 04:02:32PM +0800, Jason Wang wrote:

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index f87ec75..8424166d 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2437,6 +2437,61 @@ struct vhost_msg_node *vhost_dequeue_msg(struct 
vhost_dev *dev,
  }
  EXPORT_SYMBOL_GPL(vhost_dequeue_msg);
  
+int vhost_prefetch_desc_indices(struct vhost_virtqueue *vq,

+   struct vring_used_elem *heads,
+   u16 num, bool used_update)

Missing doc comment.


Will fix this.




+{
+   int ret, ret2;
+   u16 last_avail_idx, last_used_idx, total, copied;
+   __virtio16 avail_idx;
+   struct vring_used_elem __user *used;
+   int i;

The following variable names are a little confusing:

last_avail_idx vs vq->last_avail_idx.  last_avail_idx is a wrapped
avail->ring[] index, vq->last_avail_idx is a free-running counter.  The
same for last_used_idx vs vq->last_used_idx.

num argument vs vq->num.  The argument could be called nheads instead to
make it clear that this is heads[] and not the virtqueue size.

Not a bug but it took me a while to figure out what was going on.


I admit the name is confusing. Let me try better ones in V2.

Thanks


Re: [PATCH net-next RFC 1/5] vhost: split out ring head fetching logic

2017-09-24 Thread Jason Wang



On 2017-09-22 16:31, Stefan Hajnoczi wrote:

On Fri, Sep 22, 2017 at 04:02:31PM +0800, Jason Wang wrote:

+/* This looks in the virtqueue and for the first available buffer, and converts
+ * it to an iovec for convenient access.  Since descriptors consist of some
+ * number of output then some number of input descriptors, it's actually two
+ * iovecs, but we pack them into one and note how many of each there were.
+ *
+ * This function returns the descriptor number found, or vq->num (which is
+ * never a valid descriptor number) if none was found.  A negative code is
+ * returned on error. */
+int __vhost_get_vq_desc(struct vhost_virtqueue *vq,
+   struct iovec iov[], unsigned int iov_size,
+   unsigned int *out_num, unsigned int *in_num,
+   struct vhost_log *log, unsigned int *log_num,
+   __virtio16 head)

[...]

+int vhost_get_vq_desc(struct vhost_virtqueue *vq,
+ struct iovec iov[], unsigned int iov_size,
+ unsigned int *out_num, unsigned int *in_num,
+ struct vhost_log *log, unsigned int *log_num)

Please document vhost_get_vq_desc().

Please also explain the difference between __vhost_get_vq_desc() and
vhost_get_vq_desc() in the documentation.


Right, will document this in next version.

Thanks



Re: [PATCH net-next] sch_netem: faster rb tree removal

2017-09-24 Thread David Ahern
On 9/23/17 12:07 PM, Eric Dumazet wrote:
> From: Eric Dumazet 
> 
> While running TCP tests involving netem storing millions of packets,
> I had the idea to speed up tfifo_reset() and did experiments.
> 
> I tried the rbtree_postorder_for_each_entry_safe() method that is
> used in skb_rbtree_purge() but discovered it was slower than the
> current tfifo_reset() method.
> 
> I measured time taken to release skbs with three occupation levels :
> 10^4, 10^5 and 10^6 skbs with three methods :
> 
> 1) (current 'naive' method)
> 
>   while ((p = rb_first(&q->t_root))) {
>   struct sk_buff *skb = netem_rb_to_skb(p);
>  
>   rb_erase(p, &q->t_root);
>   rtnl_kfree_skbs(skb, skb);
>   }
> 
> 2) Use rb_next() instead of rb_first() in the loop :
> 
>   p = rb_first(&q->t_root);
>   while (p) {
>   struct sk_buff *skb = netem_rb_to_skb(p);
> 
>   p = rb_next(p);
>   rb_erase(&skb->rbnode, &q->t_root);
>   rtnl_kfree_skbs(skb, skb);
>   }
> 
> 3) "optimized" method using rbtree_postorder_for_each_entry_safe()
> 
>   struct sk_buff *skb, *next;
> 
>   rbtree_postorder_for_each_entry_safe(skb, next,
>        &q->t_root, rbnode) {
>   rtnl_kfree_skbs(skb, skb);
>   }
>   q->t_root = RB_ROOT;
> 
> Results :
> 
> method_1:while (rb_first()) rb_erase() 10000 skbs in 690378 ns (69 ns per skb)
> method_2:rb_first; while (p) { p = rb_next(p); ...}  10000 skbs in 541846 ns (54 ns per skb)
> method_3:rbtree_postorder_for_each_entry_safe() 10000 skbs in 868307 ns (86 ns per skb)
> 
> method_1:while (rb_first()) rb_erase() 100000 skbs in 7804021 ns (78 ns per skb)
> method_2:rb_first; while (p) { p = rb_next(p); ...}  100000 skbs in 5942456 ns (59 ns per skb)
> method_3:rbtree_postorder_for_each_entry_safe() 100000 skbs in 11584940 ns (115 ns per skb)
> 
> method_1:while (rb_first()) rb_erase() 1000000 skbs in 108577838 ns (108 ns per skb)
> method_2:rb_first; while (p) { p = rb_next(p); ...}  1000000 skbs in 82619635 ns (82 ns per skb)
> method_3:rbtree_postorder_for_each_entry_safe() 1000000 skbs in 127328743 ns (127 ns per skb)
> 
> Method 2) is simply faster, probably because it maintains a smaller
> working set.
> 
> Note that this is the method we use in tcp_ofo_queue() already.
> 
> I will also change skb_rbtree_purge() in a second patch.
> 
> Signed-off-by: Eric Dumazet 
> ---
>  net/sched/sch_netem.c |7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index 
> 063a4bdb9ee6f26b01387959e8f6ccd15ec16191..5a4f1008029068372019a965186e7a3c0a18aac3
>  100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -361,12 +361,13 @@ static psched_time_t packet_len_2_sched_time(unsigned 
> int len, struct netem_sche
>  static void tfifo_reset(struct Qdisc *sch)
>  {
>   struct netem_sched_data *q = qdisc_priv(sch);
> - struct rb_node *p;
> + struct rb_node *p = rb_first(&q->t_root);
>  
> - while ((p = rb_first(&q->t_root))) {
> + while (p) {
>   struct sk_buff *skb = netem_rb_to_skb(p);
>  
> - rb_erase(p, &q->t_root);
> + p = rb_next(p);
> + rb_erase(&skb->rbnode, &q->t_root);
>   rtnl_kfree_skbs(skb, skb);
>   }
>  }
> 
> 

Hi Eric:

I'm guessing the cost is in the rb_first and rb_next computations. Did
you consider something like this:

struct rb_root *root;
struct rb_node **p = &root->rb_node;

while (*p != NULL) {
struct foobar *fb;

fb = container_of(*p, struct foobar, rb_node);
// fb processing

p = &root->rb_node;
}


Re: [patch net-next v2 07/12] mlxsw: spectrum: Add the multicast routing offloading logic

2017-09-24 Thread Yunsheng Lin
Hi, Jiri

On 2017/9/25 1:22, Jiri Pirko wrote:
> From: Yotam Gigi 
> 
> Add the multicast router offloading logic, which is in charge of handling
> the VIF and MFC notifications and translating them to the hardware logic API.
> 
> The offloading logic has to overcome several obstacles in order to safely
> comply with the kernel multicast router user API:
>  - It must keep track of the mapping between VIFs and netdevices. The user
>    can add an MFC cache entry pointing to a VIF, delete the VIF and
>    re-add it with a different netdevice. The offloading logic has to handle
>    this in order to be compatible with the kernel logic.
>  - It must keep track of the mapping between netdevices and spectrum RIFs,
>    as the current hardware implementation assumes having a RIF for every
>    port in a multicast router.
>  - It must trap routes pointing to the pimreg device to the kernel, as the
>    packets should be delivered to userspace.
>  - It must trap routes pointing to tunnel VIFs to the kernel, as the current
>    implementation does not support multicast forwarding to tunnels.
>  - It must be aware of proxy multicast routes, which include both (*,*)
>    routes and duplicate routes. Currently proxy routes are not offloaded
>    and trigger the abort mechanism: removal of all routes from hardware,
>    letting the traffic go through the kernel.
> 
> The multicast routing offloading logic also updates the counters of the
> offloaded MFC routes in a periodic work.
> 
> Signed-off-by: Yotam Gigi 
> Reviewed-by: Ido Schimmel 
> Signed-off-by: Jiri Pirko 
> ---
> v1->v2:
>  - Update the lastuse MFC entry field too, in addition to packets and bytes.
> ---
>  drivers/net/ethernet/mellanox/mlxsw/Makefile  |3 +-
>  drivers/net/ethernet/mellanox/mlxsw/spectrum.h|1 +
>  drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c | 1014 
> +
>  drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.h |  133 +++
>  4 files changed, 1150 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c
>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.h
> 
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/Makefile 
> b/drivers/net/ethernet/mellanox/mlxsw/Makefile
> index 4b88158..9b29764 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/Makefile
> +++ b/drivers/net/ethernet/mellanox/mlxsw/Makefile
> @@ -17,7 +17,8 @@ mlxsw_spectrum-objs := spectrum.o 
> spectrum_buffers.o \
>  spectrum_kvdl.o spectrum_acl_tcam.o \
>  spectrum_acl.o spectrum_flower.o \
>  spectrum_cnt.o spectrum_fid.o \
> -spectrum_ipip.o spectrum_acl_flex_actions.o
> +spectrum_ipip.o spectrum_acl_flex_actions.o \
> +spectrum_mr.o
>  mlxsw_spectrum-$(CONFIG_MLXSW_SPECTRUM_DCB)  += spectrum_dcb.o
>  mlxsw_spectrum-$(CONFIG_NET_DEVLINK) += spectrum_dpipe.o
>  obj-$(CONFIG_MLXSW_MINIMAL)  += mlxsw_minimal.o
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h 
> b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
> index e907ec4..51d8b9f 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
> @@ -153,6 +153,7 @@ struct mlxsw_sp {
>   struct mlxsw_sp_sb *sb;
>   struct mlxsw_sp_bridge *bridge;
>   struct mlxsw_sp_router *router;
> + struct mlxsw_sp_mr *mr;
>   struct mlxsw_afa *afa;
>   struct mlxsw_sp_acl *acl;
>   struct mlxsw_sp_fid_core *fid_core;
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c 
> b/drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c
> new file mode 100644
> index 000..89b2e60
> --- /dev/null
> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c
> @@ -0,0 +1,1014 @@
> +/*
> + * drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c
> + * Copyright (c) 2017 Mellanox Technologies. All rights reserved.
> + * Copyright (c) 2017 Yotam Gigi 
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions are 
> met:
> + *
> + * 1. Redistributions of source code must retain the above copyright
> + *notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *notice, this list of conditions and the following disclaimer in the
> + *documentation and/or other materials provided with the distribution.
> + * 3. Neither the names of the copyright holders nor the names of its
> + *contributors may be used to endorse or promote products derived from
> + *this software without specific prior written 

Re: [patch net-next v2 06/12] net: mroute: Check if rule is a default rule

2017-09-24 Thread Yunsheng Lin
Hi, Jiri

On 2017/9/25 1:22, Jiri Pirko wrote:
> From: Yotam Gigi 
> 
> When the ipmr starts, it adds one default FIB rule that matches all packets
> and sends them to the DEFAULT (multicast) FIB table. A more complex rule
> can be added by the user to specify that, for a specific interface, a packet
> should be looked up in either an arbitrary table or according to the l3mdev
> of the interface.
> 
> For drivers willing to offload the ipmr logic into hardware but not wanting
> to offload all the FIB rules functionality, provide a function that
> can indicate whether the FIB rule is the default multicast rule, thus only
> one routing table is needed.
> 
> This way, a driver can register to the FIB notification chain, get
> notifications about FIB rules added and trigger some kind of an internal
> abort mechanism when a non default rule is added by the user.
> 
> Signed-off-by: Yotam Gigi 
> Reviewed-by: Ido Schimmel 
> Signed-off-by: Jiri Pirko 
> ---
>  include/linux/mroute.h |  7 +++
>  net/ipv4/ipmr.c| 10 ++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/include/linux/mroute.h b/include/linux/mroute.h
> index 5566580..b072a84 100644
> --- a/include/linux/mroute.h
> +++ b/include/linux/mroute.h
> @@ -5,6 +5,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  
> @@ -19,6 +20,7 @@ int ip_mroute_getsockopt(struct sock *, int, char __user *, 
> int __user *);
>  int ipmr_ioctl(struct sock *sk, int cmd, void __user *arg);
>  int ipmr_compat_ioctl(struct sock *sk, unsigned int cmd, void __user *arg);
>  int ip_mr_init(void);
> +bool ipmr_rule_default(const struct fib_rule *rule);
>  #else
>  static inline int ip_mroute_setsockopt(struct sock *sock, int optname,
>  char __user *optval, unsigned int optlen)
> @@ -46,6 +48,11 @@ static inline int ip_mroute_opt(int opt)
>  {
>   return 0;
>  }
> +
> +static inline bool ipmr_rule_default(const struct fib_rule *rule)
> +{
> + return true;
> +}
>  #endif
>  
>  struct vif_device {
> diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
> index 2a795d2..a714f55 100644
> --- a/net/ipv4/ipmr.c
> +++ b/net/ipv4/ipmr.c
> @@ -320,6 +320,16 @@ static unsigned int ipmr_rules_seq_read(struct net *net)
>  }
>  #endif
>  
> +bool ipmr_rule_default(const struct fib_rule *rule)
> +{
> +#if IS_ENABLED(CONFIG_FIB_RULES)
> + return fib_rule_matchall(rule) && rule->table == RT_TABLE_DEFAULT;
> +#else
> + return true;
> +#endif

In patch 02, you have the following; can you do the same for the above?
+#ifdef CONFIG_IP_MROUTE
+void ipmr_cache_free(struct mfc_cache *mfc_cache);
+#else
+static inline void ipmr_cache_free(struct mfc_cache *mfc_cache)
+{
+}
+#endif

> +}
> +EXPORT_SYMBOL(ipmr_rule_default);
> +
>  static inline int ipmr_hash_cmp(struct rhashtable_compare_arg *arg,
>   const void *ptr)
>  {
> 



Re: [patch net-next v2 03/12] ipmr: Add FIB notification access functions

2017-09-24 Thread Yunsheng Lin
Hi, Jiri

On 2017/9/25 1:22, Jiri Pirko wrote:
> From: Yotam Gigi 
> 
> Make the ipmr module register as a FIB notifier. To do that, implement both
> the ipmr_seq_read and ipmr_dump ops.
> 
> The ipmr_seq_read op returns a sequence counter that is incremented on
> every notification related operation done by the ipmr. To implement that,
> add a sequence counter in the netns_ipv4 struct and increment it whenever a
> new MFC route or VIF are added or deleted. The sequence operations are
> protected by the RTNL lock.
> 
> The ipmr_dump op iterates the list of MFC routes and the list of VIF entries
> and sends notifications about them. The entries dump is done under RCU,
> while the VIF dump also takes the mrt_lock, as the vif->dev field can change
> under RCU.
> 
> Signed-off-by: Yotam Gigi 
> Reviewed-by: Ido Schimmel 
> Signed-off-by: Jiri Pirko 
> ---
> v1->v2:
>  - Take the mrt_lock when dumping VIF entries.
> ---
>  include/linux/mroute.h   |  15 ++
>  include/net/netns/ipv4.h |   3 ++
>  net/ipv4/ipmr.c  | 137 
> ++-
>  3 files changed, 153 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mroute.h b/include/linux/mroute.h
> index 10028f2..54c5cb8 100644
> --- a/include/linux/mroute.h
> +++ b/include/linux/mroute.h
> @@ -5,6 +5,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #ifdef CONFIG_IP_MROUTE
> @@ -58,6 +59,14 @@ struct vif_device {
>   int link;   /* Physical interface index 
> */
>  };
>  
> +struct vif_entry_notifier_info {
> + struct fib_notifier_info info;
> + struct net_device *dev;
> + vifi_t vif_index;
> + unsigned short vif_flags;
> + u32 tb_id;
> +};
> +
>  #define VIFF_STATIC 0x8000
>  
>  #define VIF_EXISTS(_mrt, _idx) ((_mrt)->vif_table[_idx].dev != NULL)
> @@ -146,6 +155,12 @@ struct mfc_cache {
>   struct rcu_head rcu;
>  };
>  
> +struct mfc_entry_notifier_info {
> + struct fib_notifier_info info;
> + struct mfc_cache *mfc;
> + u32 tb_id;
> +};
> +
>  struct rtmsg;
>  int ipmr_get_route(struct net *net, struct sk_buff *skb,
>  __be32 saddr, __be32 daddr,
> diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
> index 8387f09..abc84d9 100644
> --- a/include/net/netns/ipv4.h
> +++ b/include/net/netns/ipv4.h
> @@ -163,6 +163,9 @@ struct netns_ipv4 {
>   struct fib_notifier_ops *notifier_ops;
>   unsigned intfib_seq;/* protected by rtnl_mutex */
>  
> + struct fib_notifier_ops *ipmr_notifier_ops;

Can we add a const here?

> + unsigned intipmr_seq;   /* protected by rtnl_mutex */
> +
>   atomic_trt_genid;
>  };
>  #endif
> diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
> index 86dc5f9..49879c3 100644
> --- a/net/ipv4/ipmr.c
> +++ b/net/ipv4/ipmr.c
> @@ -264,6 +264,16 @@ static void __net_exit ipmr_rules_exit(struct net *net)
>   fib_rules_unregister(net->ipv4.mr_rules_ops);
>   rtnl_unlock();
>  }
> +
> +static int ipmr_rules_dump(struct net *net, struct notifier_block *nb)
> +{
> + return fib_rules_dump(net, nb, RTNL_FAMILY_IPMR);
> +}
> +
> +static unsigned int ipmr_rules_seq_read(struct net *net)
> +{
> + return fib_rules_seq_read(net, RTNL_FAMILY_IPMR);
> +}
>  #else
>  #define ipmr_for_each_table(mrt, net) \
>   for (mrt = net->ipv4.mrt; mrt; mrt = NULL)
> @@ -298,6 +308,16 @@ static void __net_exit ipmr_rules_exit(struct net *net)
>   net->ipv4.mrt = NULL;
>   rtnl_unlock();
>  }
> +
> +static int ipmr_rules_dump(struct net *net, struct notifier_block *nb)
> +{
> + return 0;
> +}
> +
> +static unsigned int ipmr_rules_seq_read(struct net *net)
> +{
> + return 0;
> +}
>  #endif
>  
>  static inline int ipmr_hash_cmp(struct rhashtable_compare_arg *arg,
> @@ -587,6 +607,43 @@ static struct net_device *ipmr_reg_vif(struct net *net, 
> struct mr_table *mrt)
>  }
>  #endif
>  
> +static int call_ipmr_vif_entry_notifier(struct notifier_block *nb,
> + struct net *net,
> + enum fib_event_type event_type,
> + struct vif_device *vif,
> + vifi_t vif_index, u32 tb_id)
> +{
> + struct vif_entry_notifier_info info = {
> + .info = {
> + .family = RTNL_FAMILY_IPMR,
> + .net = net,
> + },
> + .dev = vif->dev,
> + .vif_index = vif_index,
> + .vif_flags = vif->flags,
> + .tb_id = tb_id,
> + };

We only use info.info, which is a fib_notifier_info, so the
vif_entry_notifier_info wrapper seems to be unneeded. Why not just
use fib_notifier_info?

> +
> + return call_fib_notifier(nb, net, event_type, &info.info);
> +}
> +
> +static int call_ipmr_mfc_entry_notifier(struct notifier_block *nb,
> + 

Re: [PATCH net-next 10/10] net: hns3: Add mqprio support when interacting with network stack

2017-09-24 Thread Yunsheng Lin
Hi, Jiri

On 2017/9/24 19:37, Jiri Pirko wrote:
> Sat, Sep 23, 2017 at 02:47:20AM CEST, linyunsh...@huawei.com wrote:
>> Hi, Jiri
>>
>> On 2017/9/23 0:03, Jiri Pirko wrote:
>>> Fri, Sep 22, 2017 at 04:11:51PM CEST, linyunsh...@huawei.com wrote:
 Hi, Jiri

>> - if (!tc) {
>> + if (if_running) {
>> + (void)hns3_nic_net_stop(netdev);
>> + msleep(100);
>> + }
>> +
>> + ret = (kinfo->dcb_ops && kinfo->dcb_ops->setup_tc) ?
>> + kinfo->dcb_ops->setup_tc(h, tc, prio_tc) : -EOPNOTSUPP;

> This is most odd. Why do you call dcb_ops from the ndo_setup_tc callback?
> Why are you mixing this together? prio->tc mapping can be done
> directly in dcbnl.

 Here is what we do in dcb_ops->setup_tc:
 Firstly, if the current tc num is different from the tc num
 that the user provides, then we set up the queues for each
 tc.

 Secondly, we tell the hardware the pri to tc mapping that
 the stack is using. In the rx direction, our hardware needs
 that mapping to put different packets into different tcs'
 queues according to the priority of the packet; then
 rss decides which specific queue in the tc the
 packet should go to.

 By mixing, I suppose you meant why we need the
 pri to tc information?
>>>
>>> by mixing, I mean what I wrote. You are calling dcb_ops callback from
>>> ndo_setup_tc callback. So you are mixing DCBNL subsystem and TC
>>> subsystem. Why? Why do you need sch_mqprio? Why DCBNL is not enough for
>>> all?
>>
>> When using lldptool, dcbnl is involved.
>>
>> But when using tc qdisc, dcbbl is not involved, below is the a few key
>> call graph in the kernel when tc qdisc cmd is executed.
>>
>> cmd:
>> tc qdisc add dev eth0 root handle 1:0 mqprio num_tc 4 map 1 2 3 3 1 3 1 1 hw 
>> 1
>>
>> call graph:
>> rtnetlink_rcv_msg -> tc_modify_qdisc -> qdisc_create -> mqprio_init ->
>> hns3_nic_setup_tc
>>
>> When hns3_nic_setup_tc is called, we need to know the tc num and the
>> prio_tc mapping from the tc_mqprio_qopt which is provided as a parameter
>> of the ndo_setup_tc function, and dcb_ops is our hardware-specific
>> method to set up the tc related parameters in the hardware, so this is why
>> we call the dcb_ops callback in the ndo_setup_tc callback.
>>
>> I hope this will answer your question, thanks for your time.
> 
> Okay. I understand that you have a usecase for mqprio mapping offload
> without lldptool being involved. Ok. I believe it is wrong to call dcb_ops
> from tc callback. You should have a generic layer inside the driver and
> call it from both dcb_ops and tc callbacks.

Actually, dcb_ops is our generic layer inside the driver.
Below is high level architecture:

   [ tc qdisc ]           [ lldpad ]
        |                      |
        |                      |
        |                      |
   [ hns3_enet ]         [ hns3_dcbnl ]
            \                /
             \              /
              \            /
             [ hclge_dcb ]
              /          \
             /            \
            /              \
  [ hclge_main ]       [ hclge_tm ]

hns3_enet.c implements the ndo_setup_tc callback.
hns3_dcbnl.c implements the dcbnl_rtnl_ops for stack's DCBNL system.
hclge_dcb implements the dcb_ops.
So we already have a generic layer that both tc and dcbnl call into.

> 
> Also, what happens If I run lldptool concurrently with mqprio? Who wins
> and is going to configure the mapping?

Both lldptool and the tc qdisc command use the rtnl interface provided by
the stack, so they are both protected by rtnl_lock and we do not have to do
the locking in the driver.

The locking is in rtnetlink_rcv_msg:

rtnl_lock();
handlers = rtnl_dereference(rtnl_msg_handlers[family]);
if (handlers) {
doit = READ_ONCE(handlers[type].doit);
if (doit)
err = doit(skb, nlh, extack);
}
rtnl_unlock();

Thanks.

> 
> 
>>
>>>
>>>
>>>
 I hope I did not misunderstand your question, thanks
 for your time reviewing.
>>>
>>> .
>>>
>>
> 
> .
> 



[PATCH net-next 0/6] BPF metadata for direct access

2017-09-24 Thread Daniel Borkmann
This work enables generic transfer of metadata from XDP into skb,
meaning the packet has a flexible and programmable room for meta
data, which can later be used by BPF to set various skb members
when passing up the stack. For details, please see second patch.
Support has been implemented and tested with two drivers, and
should be straightforward to add to other drivers that already
properly support head adjustment.

Thanks!

Daniel Borkmann (6):
  bpf: rename bpf_compute_data_end into bpf_compute_data_pointers
  bpf: add meta pointer for direct access
  bpf: update bpf.h uapi header for tools
  bpf: improve selftests and add tests for meta pointer
  bpf, nfp: add meta data support
  bpf, ixgbe: add meta data support

 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c  |   1 +
 drivers/net/ethernet/cavium/thunder/nicvf_main.c   |   1 +
 drivers/net/ethernet/intel/i40e/i40e_txrx.c|   1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |  29 ++-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|   1 +
 .../net/ethernet/netronome/nfp/nfp_net_common.c|  39 ++--
 drivers/net/ethernet/qlogic/qede/qede_fp.c |   1 +
 drivers/net/tun.c  |   1 +
 drivers/net/virtio_net.c   |   2 +
 include/linux/bpf.h|   1 +
 include/linux/filter.h |  30 ++-
 include/linux/skbuff.h |  68 +-
 include/uapi/linux/bpf.h   |  13 +-
 kernel/bpf/sockmap.c   |   4 +-
 kernel/bpf/verifier.c  | 114 +++---
 net/bpf/test_run.c |   3 +-
 net/core/dev.c |  31 ++-
 net/core/filter.c  |  91 +++-
 net/core/lwt_bpf.c |   2 +-
 net/core/skbuff.c  |   2 +
 net/sched/act_bpf.c|   4 +-
 net/sched/cls_bpf.c|   4 +-
 tools/include/uapi/linux/bpf.h |  45 ++--
 tools/testing/selftests/bpf/Makefile   |  21 +-
 tools/testing/selftests/bpf/bpf_helpers.h  |   2 +
 tools/testing/selftests/bpf/test_verifier.c| 247 +
 tools/testing/selftests/bpf/test_xdp_meta.c|  53 +
 tools/testing/selftests/bpf/test_xdp_meta.sh   |  51 +
 29 files changed, 759 insertions(+), 104 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_xdp_meta.c
 create mode 100755 tools/testing/selftests/bpf/test_xdp_meta.sh

-- 
1.9.3



[PATCH net-next 4/6] bpf: improve selftests and add tests for meta pointer

2017-09-24 Thread Daniel Borkmann
Add various test_verifier selftests, and a simple xdp/tc functional
test that is being attached to veths. Also let new versions of clang
use the recently added -mcpu=probe support [1] for the BPF target,
so that it can probe the underlying kernel for BPF insn set extensions.
We could also just set this option always, where older versions just
ignore it and give a note to the user that the -mcpu value is not
supported, but given that emitting the note cannot be turned off from the
clang side, let's not confuse users running selftests with it; thus fall back
to the default generic one when we see that clang doesn't support it.
Also allow CPU option to be overridden in the Makefile from command
line.

  [1] 
https://github.com/llvm-mirror/llvm/commit/d7276a40d87b89aed89978dec6457a5b8b3a0db5

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
Acked-by: John Fastabend 
---
 tools/testing/selftests/bpf/Makefile |  21 ++-
 tools/testing/selftests/bpf/bpf_helpers.h|   2 +
 tools/testing/selftests/bpf/test_verifier.c  | 247 +++
 tools/testing/selftests/bpf/test_xdp_meta.c  |  53 ++
 tools/testing/selftests/bpf/test_xdp_meta.sh |  51 ++
 5 files changed, 370 insertions(+), 4 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_xdp_meta.c
 create mode 100755 tools/testing/selftests/bpf/test_xdp_meta.sh

diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index f4b23d6..924af8d7 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -15,9 +15,10 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps 
test_lru_map test_lpm_map test
test_align
 
 TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o 
test_obj_id.o \
-   test_pkt_md_access.o test_xdp_redirect.o sockmap_parse_prog.o 
sockmap_verdict_prog.o
+   test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o 
sockmap_parse_prog.o \
+   sockmap_verdict_prog.o
 
-TEST_PROGS := test_kmod.sh test_xdp_redirect.sh
+TEST_PROGS := test_kmod.sh test_xdp_redirect.sh test_xdp_meta.sh
 
 include ../lib.mk
 
@@ -34,8 +35,20 @@ $(BPFOBJ): force
$(MAKE) -C $(BPFDIR) OUTPUT=$(OUTPUT)/
 
 CLANG ?= clang
+LLC   ?= llc
+
+PROBE := $(shell llc -march=bpf -mcpu=probe -filetype=null /dev/null 2>&1)
+
+# Let newer LLVM versions transparently probe the kernel for availability
+# of full BPF instruction set.
+ifeq ($(PROBE),)
+  CPU ?= probe
+else
+  CPU ?= generic
+endif
 
 %.o: %.c
$(CLANG) -I. -I./include/uapi -I../../../include/uapi \
-   -Wno-compare-distinct-pointer-types \
-   -O2 -target bpf -c $< -o $@
+-Wno-compare-distinct-pointer-types  \
+-O2 -target bpf -emit-llvm -c $< -o - |  \
+   $(LLC) -march=bpf -mcpu=$(CPU) -filetype=obj -o $@
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h 
b/tools/testing/selftests/bpf/bpf_helpers.h
index 4875395..a56053d 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -62,6 +62,8 @@ static unsigned long long (*bpf_get_prandom_u32)(void) =
(void *) BPF_FUNC_get_prandom_u32;
 static int (*bpf_xdp_adjust_head)(void *ctx, int offset) =
(void *) BPF_FUNC_xdp_adjust_head;
+static int (*bpf_xdp_adjust_meta)(void *ctx, int offset) =
+   (void *) BPF_FUNC_xdp_adjust_meta;
 static int (*bpf_setsockopt)(void *ctx, int level, int optname, void *optval,
 int optlen) =
(void *) BPF_FUNC_setsockopt;
diff --git a/tools/testing/selftests/bpf/test_verifier.c 
b/tools/testing/selftests/bpf/test_verifier.c
index 26f3250..a042614 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -6645,6 +6645,253 @@ struct test_val {
.errstr = "BPF_END uses reserved fields",
.result = REJECT,
},
+   {
+   "meta access, test1",
+   .insns = {
+   BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+   offsetof(struct xdp_md, data_meta)),
+   BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+   offsetof(struct xdp_md, data)),
+   BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+   BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 1),
+   BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_2, 0),
+   BPF_MOV64_IMM(BPF_REG_0, 0),
+   BPF_EXIT_INSN(),
+   },
+   .result = ACCEPT,
+   .prog_type = BPF_PROG_TYPE_XDP,
+   },
+   {
+   "meta access, test2",
+   .insns = {
+   BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+ 

[PATCH net-next 2/6] bpf: add meta pointer for direct access

2017-09-24 Thread Daniel Borkmann
This work enables generic transfer of metadata from XDP into skb. The
basic idea is that we can make use of the fact that the resulting skb
must be linear and already comes with a larger headroom for supporting
bpf_xdp_adjust_head(), which mangles xdp->data. Here, we base our work
on a similar principle and introduce a small helper bpf_xdp_adjust_meta()
for adjusting a new pointer called xdp->data_meta. Thus, the packet has
a flexible and programmable room for meta data, followed by the actual
packet data. struct xdp_buff is therefore laid out such that we first point
to data_hard_start, then data_meta directly prepended to data, followed
by data_end marking the end of the packet. bpf_xdp_adjust_head() takes into
account whether we have meta data already prepended and if so, memmove()s
this along with the given offset provided there's enough room.

xdp->data_meta is optional and programs are not required to use it. The
rationale is that when we process the packet in XDP (e.g. as DoS filter),
we can push further meta data along with it for the XDP_PASS case, and
give the guarantee that a clsact ingress BPF program on the same device
can pick this up for further post-processing. Since we work with skb
there, we can also set skb->mark, skb->priority or other skb meta data
out of BPF, thus having this scratch space generic and programmable
allows for more flexibility than defining a direct 1:1 transfer of
potentially new XDP members into skb (it's also more efficient as we
don't need to initialize/handle each of such new members). The facility
also works together with GRO aggregation. The scratch space at the head
of the packet can be a multiple of 4 bytes, up to 32 bytes large. Drivers not
yet supporting xdp->data_meta can simply be set up with xdp->data_meta
as xdp->data + 1 as bpf_xdp_adjust_meta() will detect this and bail out,
such that the subsequent match against xdp->data for later access is
guaranteed to fail.

The verifier treats xdp->data_meta/xdp->data the same way as we treat
xdp->data/xdp->data_end pointer comparisons. The requirement for doing
the compare against xdp->data is that it hasn't been modified from its
original address we got from ctx access. It may have a range marking
already from prior successful xdp->data/xdp->data_end pointer comparisons
though.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
Acked-by: John Fastabend 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c  |   1 +
 drivers/net/ethernet/cavium/thunder/nicvf_main.c   |   1 +
 drivers/net/ethernet/intel/i40e/i40e_txrx.c|   1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |   1 +
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|   1 +
 .../net/ethernet/netronome/nfp/nfp_net_common.c|   1 +
 drivers/net/ethernet/qlogic/qede/qede_fp.c |   1 +
 drivers/net/tun.c  |   1 +
 drivers/net/virtio_net.c   |   2 +
 include/linux/bpf.h|   1 +
 include/linux/filter.h |  21 +++-
 include/linux/skbuff.h |  68 +++-
 include/uapi/linux/bpf.h   |  13 ++-
 kernel/bpf/verifier.c  | 114 -
 net/bpf/test_run.c |   1 +
 net/core/dev.c |  31 +-
 net/core/filter.c  |  77 +-
 net/core/skbuff.c  |   2 +
 19 files changed, 297 insertions(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index d8f0c83..06ce63c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -94,6 +94,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info 
*rxr, u16 cons,
 
xdp.data_hard_start = *data_ptr - offset;
xdp.data = *data_ptr;
+   xdp_set_data_meta_invalid(&xdp);
xdp.data_end = *data_ptr + *len;
orig_data = xdp.data;
mapping = rx_buf->mapping - bp->rx_dma_offset;
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c 
b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index 49b80da..d68478a 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -523,6 +523,7 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct 
bpf_prog *prog,
 
xdp.data_hard_start = page_address(page);
xdp.data = (void *)cpu_addr;
+   xdp_set_data_meta_invalid(&xdp);
xdp.data_end = xdp.data + len;
orig_data = xdp.data;
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 1519dfb..f426762 100644
--- 

[PATCH net-next 5/6] bpf, nfp: add meta data support

2017-09-24 Thread Daniel Borkmann
Implement support for transferring XDP meta data into the skb for the
nfp driver; before calling into the program, xdp.data_meta points
to xdp.data, and on program return with a pass verdict, we call
into skb_metadata_set().

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
Acked-by: John Fastabend 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 40 --
 1 file changed, 15 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index e3a38be..d2f73fe 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1574,27 +1574,6 @@ static void nfp_net_rx_csum(struct nfp_net_dp *dp,
return true;
 }
 
-static int nfp_net_run_xdp(struct bpf_prog *prog, void *data, void *hard_start,
-  unsigned int *off, unsigned int *len)
-{
-   struct xdp_buff xdp;
-   void *orig_data;
-   int ret;
-
-   xdp.data_hard_start = hard_start;
-   xdp.data = data + *off;
-   xdp_set_data_meta_invalid(&xdp);
-   xdp.data_end = data + *off + *len;
-
-   orig_data = xdp.data;
-   ret = bpf_prog_run_xdp(prog, &xdp);
-
-   *len -= xdp.data - orig_data;
-   *off += xdp.data - orig_data;
-
-   return ret;
-}
-
 /**
  * nfp_net_rx() - receive up to @budget packets on @rx_ring
  * @rx_ring:   RX ring to receive from
@@ -1630,6 +1609,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, 
int budget)
struct nfp_meta_parsed meta;
struct net_device *netdev;
dma_addr_t new_dma_addr;
+   u32 meta_len_xdp = 0;
void *new_frag;
 
idx = D_IDX(rx_ring, rx_ring->rd_p);
@@ -1708,16 +1688,24 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, 
int budget)
 
if (xdp_prog && !(rxd->rxd.flags & PCIE_DESC_RX_BPF &&
  dp->bpf_offload_xdp) && !meta.portid) {
+   void *orig_data = rxbuf->frag + pkt_off;
unsigned int dma_off;
-   void *hard_start;
+   struct xdp_buff xdp;
int act;
 
-   hard_start = rxbuf->frag + NFP_NET_RX_BUF_HEADROOM;
+   xdp.data_hard_start = rxbuf->frag + 
NFP_NET_RX_BUF_HEADROOM;
+   xdp.data = orig_data;
+   xdp.data_meta = orig_data;
+   xdp.data_end = orig_data + pkt_len;
+
+   act = bpf_prog_run_xdp(xdp_prog, &xdp);
+
+   pkt_len -= xdp.data - orig_data;
+   pkt_off += xdp.data - orig_data;
 
-   act = nfp_net_run_xdp(xdp_prog, rxbuf->frag, hard_start,
- &pkt_off, &pkt_len);
switch (act) {
case XDP_PASS:
+   meta_len_xdp = xdp.data - xdp.data_meta;
break;
case XDP_TX:
dma_off = pkt_off - NFP_NET_RX_BUF_HEADROOM;
@@ -1785,6 +1773,8 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, 
int budget)
if (rxd->rxd.flags & PCIE_DESC_RX_VLAN)
__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q),
   le16_to_cpu(rxd->rxd.vlan));
+   if (meta_len_xdp)
+   skb_metadata_set(skb, meta_len_xdp);
 
napi_gro_receive(&rx_ring->r_vec->napi, skb);
}
-- 
1.9.3



[PATCH net-next 6/6] bpf, ixgbe: add meta data support

2017-09-24 Thread Daniel Borkmann
Implement support for transferring XDP meta data into the skb for the
ixgbe driver; before calling into the program, xdp.data_meta points
to xdp.data, and on program return with a pass verdict, we call
into skb_metadata_set().

We implement this for the default ixgbe_build_skb() variant. For
ixgbe_construct_skb(), which is used when legacy-rx buffer management
mode is turned on via ethtool, I found that XDP gets 0 headroom, so
neither xdp_adjust_head() nor xdp_adjust_meta() can be used with it.
Just add a comment explaining this operating mode.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
Acked-by: John Fastabend 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 30 +++
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 04bb03b..3942c62 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2133,6 +2133,21 @@ static struct sk_buff *ixgbe_construct_skb(struct 
ixgbe_ring *rx_ring,
 #if L1_CACHE_BYTES < 128
prefetch(xdp->data + L1_CACHE_BYTES);
 #endif
+   /* Note, we get here by enabling legacy-rx via:
+*
+*ethtool --set-priv-flags  legacy-rx on
+*
+* In this mode, we currently get 0 extra XDP headroom as
+* opposed to having legacy-rx off, where we process XDP
+* packets going to stack via ixgbe_build_skb(). The latter
+* provides us currently with 192 bytes of headroom.
+*
+* For ixgbe_construct_skb() mode it means that the
+* xdp->data_meta will always point to xdp->data, since
+* the helper cannot expand the head. Should this ever
+* change in future for legacy-rx mode on, then lets also
+* add xdp->data_meta handling here.
+*/
 
/* allocate a skb to store the frags */
skb = napi_alloc_skb(_ring->q_vector->napi, IXGBE_RX_HDR_SIZE);
@@ -2165,6 +2180,7 @@ static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring 
*rx_ring,
   struct xdp_buff *xdp,
   union ixgbe_adv_rx_desc *rx_desc)
 {
+   unsigned int metasize = xdp->data - xdp->data_meta;
 #if (PAGE_SIZE < 8192)
unsigned int truesize = ixgbe_rx_pg_size(rx_ring) / 2;
 #else
@@ -2174,10 +2190,14 @@ static struct sk_buff *ixgbe_build_skb(struct 
ixgbe_ring *rx_ring,
 #endif
struct sk_buff *skb;
 
-   /* prefetch first cache line of first page */
-   prefetch(xdp->data);
+   /* Prefetch first cache line of first page. If xdp->data_meta
+* is unused, this points exactly at xdp->data, otherwise we
+* likely have a consumer accessing first few bytes of meta
+* data, and then actual data.
+*/
+   prefetch(xdp->data_meta);
 #if L1_CACHE_BYTES < 128
-   prefetch(xdp->data + L1_CACHE_BYTES);
+   prefetch(xdp->data_meta + L1_CACHE_BYTES);
 #endif
 
	/* build an skb around the page buffer */
@@ -2188,6 +2208,8 @@ static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring 
*rx_ring,
/* update pointers within the skb to store the data */
skb_reserve(skb, xdp->data - xdp->data_hard_start);
__skb_put(skb, xdp->data_end - xdp->data);
+   if (metasize)
+   skb_metadata_set(skb, metasize);
 
/* record DMA address if this is the start of a chain of buffers */
if (!ixgbe_test_staterr(rx_desc, IXGBE_RXD_STAT_EOP))
@@ -2326,7 +2348,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector 
*q_vector,
if (!skb) {
xdp.data = page_address(rx_buffer->page) +
   rx_buffer->page_offset;
-   xdp_set_data_meta_invalid(&xdp);
+   xdp.data_meta = xdp.data;
xdp.data_hard_start = xdp.data -
  ixgbe_rx_offset(rx_ring);
xdp.data_end = xdp.data + size;
-- 
1.9.3



[PATCH net-next 3/6] bpf: update bpf.h uapi header for tools

2017-09-24 Thread Daniel Borkmann
It looks like a couple of updates were not carried over into
tools/include/uapi/, so copy the bpf.h header as usual to pull in the
latest updates.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
Acked-by: John Fastabend 
---
 tools/include/uapi/linux/bpf.h | 45 ++
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 461811e..e43491a 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -143,12 +143,6 @@ enum bpf_attach_type {
 
 #define MAX_BPF_ATTACH_TYPE __MAX_BPF_ATTACH_TYPE
 
-enum bpf_sockmap_flags {
-   BPF_SOCKMAP_UNSPEC,
-   BPF_SOCKMAP_STRPARSER,
-   __MAX_BPF_SOCKMAP_FLAG
-};
-
 /* If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
  * to the given target_fd cgroup the descendent cgroup will be able to
  * override effective bpf program that was inherited from this cgroup
@@ -368,9 +362,20 @@ enum bpf_sockmap_flags {
  * int bpf_redirect(ifindex, flags)
  * redirect to another netdev
  * @ifindex: ifindex of the net device
- * @flags: bit 0 - if set, redirect to ingress instead of egress
- * other bits - reserved
- * Return: TC_ACT_REDIRECT
+ * @flags:
+ *   cls_bpf:
+ *  bit 0 - if set, redirect to ingress instead of egress
+ *  other bits - reserved
+ *   xdp_bpf:
+ * all bits - reserved
+ * Return: cls_bpf: TC_ACT_REDIRECT on success or TC_ACT_SHOT on error
+ *xdp_bpf: XDP_REDIRECT on success or XDP_ABORT on error
+ * int bpf_redirect_map(map, key, flags)
+ * redirect to endpoint in map
+ * @map: pointer to dev map
+ * @key: index in map to lookup
+ * @flags: --
+ * Return: XDP_REDIRECT on success or XDP_ABORT on error
  *
  * u32 bpf_get_route_realm(skb)
  * retrieve a dst's tclassid
@@ -577,6 +582,12 @@ enum bpf_sockmap_flags {
  * @map: pointer to sockmap to update
  * @key: key to insert/update sock in map
  * @flags: same flags as map update elem
+ *
+ * int bpf_xdp_adjust_meta(xdp_md, delta)
+ * Adjust the xdp_md.data_meta by delta
+ * @xdp_md: pointer to xdp_md
+ * @delta: A positive/negative integer to be added to xdp_md.data_meta
+ * Return: 0 on success or negative on error
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -632,7 +643,8 @@ enum bpf_sockmap_flags {
FN(skb_adjust_room),\
FN(redirect_map),   \
FN(sk_redirect_map),\
-   FN(sock_map_update),
+   FN(sock_map_update),\
+   FN(xdp_adjust_meta),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -710,7 +722,7 @@ struct __sk_buff {
__u32 data_end;
__u32 napi_id;
 
-   /* accessed by BPF_PROG_TYPE_sk_skb types */
+   /* Accessed by BPF_PROG_TYPE_sk_skb types from here to ... */
__u32 family;
__u32 remote_ip4;   /* Stored in network byte order */
__u32 local_ip4;/* Stored in network byte order */
@@ -718,6 +730,9 @@ struct __sk_buff {
__u32 local_ip6[4]; /* Stored in network byte order */
__u32 remote_port;  /* Stored in network byte order */
__u32 local_port;   /* stored in host byte order */
+   /* ... here. */
+
+   __u32 data_meta;
 };
 
 struct bpf_tunnel_key {
@@ -753,20 +768,23 @@ struct bpf_sock {
__u32 family;
__u32 type;
__u32 protocol;
+   __u32 mark;
+   __u32 priority;
 };
 
 #define XDP_PACKET_HEADROOM 256
 
 /* User return codes for XDP prog type.
  * A valid XDP program must return one of these defined values. All other
- * return codes are reserved for future use. Unknown return codes will result
- * in packet drop.
+ * return codes are reserved for future use. Unknown return codes will
+ * result in packet drops and a warning via bpf_warn_invalid_xdp_action().
  */
 enum xdp_action {
XDP_ABORTED = 0,
XDP_DROP,
XDP_PASS,
XDP_TX,
+   XDP_REDIRECT,
 };
 
 /* user accessible metadata for XDP packet hook
@@ -775,6 +793,7 @@ enum xdp_action {
 struct xdp_md {
__u32 data;
__u32 data_end;
+   __u32 data_meta;
 };
 
 enum sk_action {
-- 
1.9.3



[PATCH net-next 1/6] bpf: rename bpf_compute_data_end into bpf_compute_data_pointers

2017-09-24 Thread Daniel Borkmann
Just do the rename into bpf_compute_data_pointers() as we'll add
one more pointer here to recompute.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
Acked-by: John Fastabend 
---
 include/linux/filter.h |  9 ++---
 kernel/bpf/sockmap.c   |  4 ++--
 net/bpf/test_run.c |  2 +-
 net/core/filter.c  | 14 +++---
 net/core/lwt_bpf.c |  2 +-
 net/sched/act_bpf.c|  4 ++--
 net/sched/cls_bpf.c|  4 ++--
 7 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index d29e58f..052bab3 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -496,10 +496,13 @@ struct xdp_buff {
void *data_hard_start;
 };
 
-/* compute the linear packet data range [data, data_end) which
- * will be accessed by cls_bpf, act_bpf and lwt programs
+/* Compute the linear packet data range [data, data_end) which
+ * will be accessed by various program types (cls_bpf, act_bpf,
+ * lwt, ...). Subsystems allowing direct data access must (!)
+ * ensure that cb[] area can be written to when BPF program is
+ * invoked (otherwise cb[] save/restore is necessary).
  */
-static inline void bpf_compute_data_end(struct sk_buff *skb)
+static inline void bpf_compute_data_pointers(struct sk_buff *skb)
 {
struct bpf_skb_data_end *cb = (struct bpf_skb_data_end *)skb->cb;
 
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 6424ce0..a298d66 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -102,7 +102,7 @@ static int smap_verdict_func(struct smap_psock *psock, 
struct sk_buff *skb)
 
skb_orphan(skb);
skb->sk = psock->sock;
-   bpf_compute_data_end(skb);
+   bpf_compute_data_pointers(skb);
rc = (*prog->bpf_func)(skb, prog->insnsi);
skb->sk = NULL;
 
@@ -369,7 +369,7 @@ static int smap_parse_func_strparser(struct strparser *strp,
 * any socket yet.
 */
skb->sk = psock->sock;
-   bpf_compute_data_end(skb);
+   bpf_compute_data_pointers(skb);
rc = (*prog->bpf_func)(skb, prog->insnsi);
skb->sk = NULL;
rcu_read_unlock();
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 6be41a4..df67251 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -133,7 +133,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const 
union bpf_attr *kattr,
if (is_l2)
__skb_push(skb, ETH_HLEN);
if (is_direct_pkt_access)
-   bpf_compute_data_end(skb);
+   bpf_compute_data_pointers(skb);
	retval = bpf_test_run(prog, skb, repeat, &duration);
if (!is_l2)
__skb_push(skb, ETH_HLEN);
diff --git a/net/core/filter.c b/net/core/filter.c
index 82edad5..c468e7c 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1402,7 +1402,7 @@ static inline int bpf_try_make_writable(struct sk_buff 
*skb,
 {
int err = __bpf_try_make_writable(skb, write_len);
 
-   bpf_compute_data_end(skb);
+   bpf_compute_data_pointers(skb);
return err;
 }
 
@@ -1962,7 +1962,7 @@ struct sock *do_sk_redirect_map(void)
ret = skb_vlan_push(skb, vlan_proto, vlan_tci);
bpf_pull_mac_rcsum(skb);
 
-   bpf_compute_data_end(skb);
+   bpf_compute_data_pointers(skb);
return ret;
 }
 
@@ -1984,7 +1984,7 @@ struct sock *do_sk_redirect_map(void)
ret = skb_vlan_pop(skb);
bpf_pull_mac_rcsum(skb);
 
-   bpf_compute_data_end(skb);
+   bpf_compute_data_pointers(skb);
return ret;
 }
 
@@ -2178,7 +2178,7 @@ static int bpf_skb_proto_xlat(struct sk_buff *skb, __be16 
to_proto)
 * need to be verified first.
 */
ret = bpf_skb_proto_xlat(skb, proto);
-   bpf_compute_data_end(skb);
+   bpf_compute_data_pointers(skb);
return ret;
 }
 
@@ -2303,7 +2303,7 @@ static int bpf_skb_adjust_net(struct sk_buff *skb, s32 
len_diff)
ret = shrink ? bpf_skb_net_shrink(skb, len_diff_abs) :
   bpf_skb_net_grow(skb, len_diff_abs);
 
-   bpf_compute_data_end(skb);
+   bpf_compute_data_pointers(skb);
return ret;
 }
 
@@ -2394,7 +2394,7 @@ static int bpf_skb_trim_rcsum(struct sk_buff *skb, 
unsigned int new_len)
skb_gso_reset(skb);
}
 
-   bpf_compute_data_end(skb);
+   bpf_compute_data_pointers(skb);
return ret;
 }
 
@@ -2434,7 +2434,7 @@ static int bpf_skb_trim_rcsum(struct sk_buff *skb, 
unsigned int new_len)
skb_reset_mac_header(skb);
}
 
-   bpf_compute_data_end(skb);
+   bpf_compute_data_pointers(skb);
return 0;
 }
 
diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
index 1307731..e7e626f 100644
--- a/net/core/lwt_bpf.c
+++ b/net/core/lwt_bpf.c
@@ -51,7 +51,7 @@ static int run_lwt_bpf(struct sk_buff *skb, struct 
bpf_lwt_prog *lwt,
 */
preempt_disable();
rcu_read_lock();
-   

Re: [PATCH net-next 2/2] net: dsa: lan9303: Add basic offloading of unicast traffic

2017-09-24 Thread Egil Hjelmeland

Den 23. sep. 2017 16:31, skrev Andrew Lunn:

The point is: Once both external ports are in "forwarding", I see no way
to prevent traffic flowing directly between the external ports.


Generally, there are port vectors. Port X can send frames only to Port
Y.

If you don't have that, there are possibilities with VLANs. Each port
is given a unique VLAN. All incoming untagged traffic is tagged with
the VLAN. You just need to keep the VLAN separated and add/remove the
VLAN tag in the dsa tag driver.

  Andrew


Thanks. The lan9303 has nothing like "port vectors". The port tagging
scheme is VLAN based, but it does not prevent direct forwarding between
the external ports.

In order not to break the strong port separation in the current driver,
I will stick to my solution, and only add caching of the STP state
register.

Egil


Re: [PATCH] vxge: Fix rts_mac_en config parameter check

2017-09-24 Thread Christos Gkekas
On 24/09/17 19:50:21 +0100, Christos Gkekas wrote:
> Current checks return VXGE_HW_BADCFG_RTS_MAC_EN if rts_mac_en is not
> equal to DISABLE *and* not equal to ENABLE. This condition is always
> false and the check should change to *or* to properly verify the value.
> 
> Signed-off-by: Christos Gkekas 
> ---
>  drivers/net/ethernet/neterion/vxge/vxge-config.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/neterion/vxge/vxge-config.c 
> b/drivers/net/ethernet/neterion/vxge/vxge-config.c
> index 6223930..c694f97 100644
> --- a/drivers/net/ethernet/neterion/vxge/vxge-config.c
> +++ b/drivers/net/ethernet/neterion/vxge/vxge-config.c
> @@ -1286,8 +1286,8 @@ __vxge_hw_device_config_check(struct 
> vxge_hw_device_config *new_config)
>   (new_config->intr_mode != VXGE_HW_INTR_MODE_DEF))
>   return VXGE_HW_BADCFG_INTR_MODE;
>  
> - if ((new_config->rts_mac_en != VXGE_HW_RTS_MAC_DISABLE) &&
> - (new_config->rts_mac_en != VXGE_HW_RTS_MAC_ENABLE))
> + if (new_config->rts_mac_en != VXGE_HW_RTS_MAC_DISABLE ||
> + new_config->rts_mac_en != VXGE_HW_RTS_MAC_ENABLE)
>   return VXGE_HW_BADCFG_RTS_MAC_EN;
>  
>   for (i = 0; i < VXGE_HW_MAX_VIRTUAL_PATHS; i++) {
> -- 
> 2.7.4
>

Please ignore the patch above, it was sent in error.

Thanks,
Chris



[PATCH] vxge: Fix rts_mac_en config parameter check

2017-09-24 Thread Christos Gkekas
Current checks return VXGE_HW_BADCFG_RTS_MAC_EN if rts_mac_en is not
equal to DISABLE *and* not equal to ENABLE. This condition is always
false and the check should change to *or* to properly verify the value.

Signed-off-by: Christos Gkekas 
---
 drivers/net/ethernet/neterion/vxge/vxge-config.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/neterion/vxge/vxge-config.c 
b/drivers/net/ethernet/neterion/vxge/vxge-config.c
index 6223930..c694f97 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-config.c
+++ b/drivers/net/ethernet/neterion/vxge/vxge-config.c
@@ -1286,8 +1286,8 @@ __vxge_hw_device_config_check(struct 
vxge_hw_device_config *new_config)
(new_config->intr_mode != VXGE_HW_INTR_MODE_DEF))
return VXGE_HW_BADCFG_INTR_MODE;
 
-   if ((new_config->rts_mac_en != VXGE_HW_RTS_MAC_DISABLE) &&
-   (new_config->rts_mac_en != VXGE_HW_RTS_MAC_ENABLE))
+   if (new_config->rts_mac_en != VXGE_HW_RTS_MAC_DISABLE ||
+   new_config->rts_mac_en != VXGE_HW_RTS_MAC_ENABLE)
return VXGE_HW_BADCFG_RTS_MAC_EN;
 
for (i = 0; i < VXGE_HW_MAX_VIRTUAL_PATHS; i++) {
-- 
2.7.4



Re: [PATCH] mac80211: aead api to reduce redundancy

2017-09-24 Thread Xiang Gao
2017-09-24 13:42 GMT-04:00 Johannes Berg :
> On Sun, 2017-09-24 at 13:21 -0400, Xiang Gao wrote:
>>
>> Do you mean to put more characters each line in the description
>>
> Huh, sorry, no - my bad. I was thinking of the code, not the
> description at all.

Oh yes, the indentation does look ugly. Thank you for pointing this
out. The tab width of my editor was set to 4; it should be 8. I will
fix these problems and resend the patch soon, maybe after receiving a
bit more feedback.

>
> For example here:
>
>> -int ieee80211_aes_gcm_encrypt(struct crypto_aead *tfm, u8 *j_0, u8 *aad,
>> - u8 *data, size_t data_len, u8 *mic)
>> +int aead_encrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad, size_t aad_len,
>> +u8 *data, size_t data_len, u8 *auth)
>>
>
> I think you should adjust the indentation to match - or did it just get
> mangled in my mail? It looks *further* indented now, when it should be
> less (to after the opening parenthesis). Similarly in various other
> places.
>
> And perhaps for long things like
>
>> +static inline struct crypto_aead *ieee80211_aes_key_setup_encrypt(
>> +   const u8 key[], size_t key_len,
>> size_t mic_len)
>
>> +struct crypto_aead *aead_key_setup_encrypt(const char *alg,
>> +   const u8 key[], size_t key_len, size_t authsize);
>
> it might be better to write
>
> static inline struct crypto_aead *
> ieee80211_aes_key_setup_encrypt(const u8 key[], ...)
>
> and
>
> struct crypto_aead *
> aead_key_setup_encrypt(const char *alg, ...)
>
>
> respectively, depending on how far you have to indent to break lines
> etc.
>
> Anyway, I'm nitpicking.
>
> Unrelated to this, I'm not sure whose tree this should go through -
> probably Herbert's (or DaveM's with his ACK? not sure if there's a
> crypto tree?) or so?

Yes, there is one at
https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git/
I'm not sure which tree this should go through either. I'm also not
sure about the beginning of the patch title: should it be "mac80211:"
or "crypto:"?

Options are:
1. This whole patch goes to either mac80211 tree or crypto tree. I
don't know which is better.
2. Make the higher level API only for internal usage in mac80211, i.e.
move aead_api.c and aead_api.h to net/mac80211 and do not export
the symbols. And of course, this will go to the mac80211 tree. I
personally don't want this to be the final solution because I happen
to be writing a loadable kernel module that uses this higher level
API.
3. Maybe split this patch, one for changes in crypto, which will go to
crypto tree, and the other for mac80211 part, which goes to the
mac80211 tree?

>
> johannes


Re: [PATCH] mac80211: aead api to reduce redundancy

2017-09-24 Thread Johannes Berg
On Sun, 2017-09-24 at 13:21 -0400, Xiang Gao wrote:
> 
> Do you mean to put more characters each line in the description
> 
Huh, sorry, no - my bad. I was thinking of the code, not the
description at all.

For example here:

> -int ieee80211_aes_gcm_encrypt(struct crypto_aead *tfm, u8 *j_0, u8 *aad,
> - u8 *data, size_t data_len, u8 *mic)
> +int aead_encrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad, size_t aad_len,
> +u8 *data, size_t data_len, u8 *auth)
> 

I think you should adjust the indentation to match - or did it just get
mangled in my mail? It looks *further* indented now, when it should be
less (to after the opening parenthesis). Similarly in various other
places.

And perhaps for long things like

> +static inline struct crypto_aead *ieee80211_aes_key_setup_encrypt(
> +   const u8 key[], size_t key_len,
> size_t mic_len)

> +struct crypto_aead *aead_key_setup_encrypt(const char *alg,
> +   const u8 key[], size_t key_len, size_t authsize);

it might be better to write

static inline struct crypto_aead *
ieee80211_aes_key_setup_encrypt(const u8 key[], ...)

and

struct crypto_aead *
aead_key_setup_encrypt(const char *alg, ...)


respectively, depending on how far you have to indent to break lines
etc.

Anyway, I'm nitpicking.

Unrelated to this, I'm not sure whose tree this should go through -
probably Herbert's (or DaveM's with his ACK? not sure if there's a
crypto tree?) or so?

johannes


[patch net-next v2 03/12] ipmr: Add FIB notification access functions

2017-09-24 Thread Jiri Pirko
From: Yotam Gigi 

Make the ipmr module register as a FIB notifier. To do that, implement both
the ipmr_seq_read and ipmr_dump ops.

The ipmr_seq_read op returns a sequence counter that is incremented on
every notification-related operation done by the ipmr. To implement that,
add a sequence counter in the netns_ipv4 struct and increment it whenever
an MFC route or VIF is added or deleted. The sequence operations are
protected by the RTNL lock.

The ipmr_dump op iterates the list of MFC routes and the list of VIF
entries and sends notifications about them. The entries are dumped under
RCU, and the VIF dump additionally takes the mrt_lock, as the vif->dev
field can change under RCU.

Signed-off-by: Yotam Gigi 
Reviewed-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
v1->v2:
 - Take the mrt_lock when dumping VIF entries.
---
 include/linux/mroute.h   |  15 ++
 include/net/netns/ipv4.h |   3 ++
 net/ipv4/ipmr.c  | 137 ++-
 3 files changed, 153 insertions(+), 2 deletions(-)

diff --git a/include/linux/mroute.h b/include/linux/mroute.h
index 10028f2..54c5cb8 100644
--- a/include/linux/mroute.h
+++ b/include/linux/mroute.h
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #ifdef CONFIG_IP_MROUTE
@@ -58,6 +59,14 @@ struct vif_device {
int link;   /* Physical interface index 
*/
 };
 
+struct vif_entry_notifier_info {
+   struct fib_notifier_info info;
+   struct net_device *dev;
+   vifi_t vif_index;
+   unsigned short vif_flags;
+   u32 tb_id;
+};
+
 #define VIFF_STATIC 0x8000
 
 #define VIF_EXISTS(_mrt, _idx) ((_mrt)->vif_table[_idx].dev != NULL)
@@ -146,6 +155,12 @@ struct mfc_cache {
struct rcu_head rcu;
 };
 
+struct mfc_entry_notifier_info {
+   struct fib_notifier_info info;
+   struct mfc_cache *mfc;
+   u32 tb_id;
+};
+
 struct rtmsg;
 int ipmr_get_route(struct net *net, struct sk_buff *skb,
   __be32 saddr, __be32 daddr,
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 8387f09..abc84d9 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -163,6 +163,9 @@ struct netns_ipv4 {
struct fib_notifier_ops *notifier_ops;
unsigned intfib_seq;/* protected by rtnl_mutex */
 
+   struct fib_notifier_ops *ipmr_notifier_ops;
+   unsigned intipmr_seq;   /* protected by rtnl_mutex */
+
atomic_trt_genid;
 };
 #endif
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 86dc5f9..49879c3 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -264,6 +264,16 @@ static void __net_exit ipmr_rules_exit(struct net *net)
fib_rules_unregister(net->ipv4.mr_rules_ops);
rtnl_unlock();
 }
+
+static int ipmr_rules_dump(struct net *net, struct notifier_block *nb)
+{
+   return fib_rules_dump(net, nb, RTNL_FAMILY_IPMR);
+}
+
+static unsigned int ipmr_rules_seq_read(struct net *net)
+{
+   return fib_rules_seq_read(net, RTNL_FAMILY_IPMR);
+}
 #else
 #define ipmr_for_each_table(mrt, net) \
for (mrt = net->ipv4.mrt; mrt; mrt = NULL)
@@ -298,6 +308,16 @@ static void __net_exit ipmr_rules_exit(struct net *net)
net->ipv4.mrt = NULL;
rtnl_unlock();
 }
+
+static int ipmr_rules_dump(struct net *net, struct notifier_block *nb)
+{
+   return 0;
+}
+
+static unsigned int ipmr_rules_seq_read(struct net *net)
+{
+   return 0;
+}
 #endif
 
 static inline int ipmr_hash_cmp(struct rhashtable_compare_arg *arg,
@@ -587,6 +607,43 @@ static struct net_device *ipmr_reg_vif(struct net *net, 
struct mr_table *mrt)
 }
 #endif
 
+static int call_ipmr_vif_entry_notifier(struct notifier_block *nb,
+   struct net *net,
+   enum fib_event_type event_type,
+   struct vif_device *vif,
+   vifi_t vif_index, u32 tb_id)
+{
+   struct vif_entry_notifier_info info = {
+   .info = {
+   .family = RTNL_FAMILY_IPMR,
+   .net = net,
+   },
+   .dev = vif->dev,
+   .vif_index = vif_index,
+   .vif_flags = vif->flags,
+   .tb_id = tb_id,
+   };
+
+   return call_fib_notifier(nb, net, event_type, &info.info);
+}
+
+static int call_ipmr_mfc_entry_notifier(struct notifier_block *nb,
+   struct net *net,
+   enum fib_event_type event_type,
+   struct mfc_cache *mfc, u32 tb_id)
+{
+   struct mfc_entry_notifier_info info = {
+   .info = {
+   .family = RTNL_FAMILY_IPMR,
+   .net = net,
+   },
+   .mfc = mfc,
+   .tb_id = tb_id
+   

Re: [PATCH] mac80211: aead api to reduce redundancy

2017-09-24 Thread Xiang Gao
2017-09-24 11:05 GMT-04:00 Johannes Berg :
> On Sun, 2017-09-24 at 01:40 -0400, Xiang Gao wrote:
>> Currently, the aes_ccm.c and aes_gcm.c are almost line by line
>> copy of each other. This patch reduce code redundancy by moving
>> the code in these two files to crypto/aead_api.c to make it a
>> higher level aead api. The aes_ccm.c and aes_gcm.c are removed
>> and all the functions are now implemented in their headers using
>> the newly added aead api.
>>
> No objection from me, though I'd ask you to respin with the indentation
> fixed up a bit.

Hi Johannes,

Thank you for your time and the suggestion. I'm not sure if I correctly
understand your point.
Do you mean to put more characters on each line in the description? Something like:

> Currently, the aes_ccm.c and aes_gcm.c are almost line by line copy of
> each other. This patch reduce code redundancy by moving the code in these
> two files to crypto/aead_api.c to make it a higher level aead api. The
> file aes_ccm.c and aes_gcm.c are removed and all the functions there are
> now implemented in their headers using the newly added aead api.

instead of

> Currently, the aes_ccm.c and aes_gcm.c are almost line by line
> copy of each other. This patch reduce code redundancy by moving
> the code in these two files to crypto/aead_api.c to make it a
> higher level aead api. The aes_ccm.c and aes_gcm.c are removed
> and all the functions are now implemented in their headers using
> the newly added aead api.

Xiang Gao

>
> johannes


[patch net-next v2 05/12] net: ipmr: Add MFC offload indication

2017-09-24 Thread Jiri Pirko
From: Yotam Gigi 

Allow drivers registered to the FIB notification chain to indicate
whether a multicast MFC route is offloaded or not, similarly to unicast
routes. The indication of whether a route is offloaded is done using the
mfc_flags field on an mfc_cache struct, and the information is sent to
userspace via the RTNetlink interface only.

Currently, MFC routes are either offloaded or not, thus there is no need to
add per-VIF offload indication.

Signed-off-by: Yotam Gigi 
Reviewed-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
v1->v2:
 - Add comment for the MFC_OFFLOAD flag
---
 include/linux/mroute.h | 2 ++
 net/ipv4/ipmr.c| 3 +++
 2 files changed, 5 insertions(+)

diff --git a/include/linux/mroute.h b/include/linux/mroute.h
index 54c5cb8..5566580 100644
--- a/include/linux/mroute.h
+++ b/include/linux/mroute.h
@@ -90,9 +90,11 @@ struct mr_table {
 
 /* mfc_flags:
  * MFC_STATIC - the entry was added statically (not by a routing daemon)
+ * MFC_OFFLOAD - the entry was offloaded to the hardware
  */
 enum {
MFC_STATIC = BIT(0),
+   MFC_OFFLOAD = BIT(1),
 };
 
 struct mfc_cache_cmp_arg {
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index ba71bc4..2a795d2 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -2268,6 +2268,9 @@ static int __ipmr_fill_mroute(struct mr_table *mrt, 
struct sk_buff *skb,
nla_put_u32(skb, RTA_IIF, 
mrt->vif_table[c->mfc_parent].dev->ifindex) < 0)
return -EMSGSIZE;
 
+   if (c->mfc_flags & MFC_OFFLOAD)
+   rtm->rtm_flags |= RTNH_F_OFFLOAD;
+
if (!(mp_attr = nla_nest_start(skb, RTA_MULTIPATH)))
return -EMSGSIZE;
 
-- 
2.9.5



[patch net-next v2 02/12] ipmr: Add reference count to MFC entries

2017-09-24 Thread Jiri Pirko
From: Yotam Gigi 

Next commits will introduce MFC notifications through the atomic
fib_notification chain, thus allowing modules to be aware of MFC entries.

Due to the fact that modules may need to hold a reference to an MFC entry,
add reference count to MFC entries to prevent them from being freed while
these modules use them.

The reference counting is done only on resolved MFC entries currently.

Signed-off-by: Yotam Gigi 
Reviewed-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
v1->v2:
 - Add comment for the mfc_cache.mfc_un.res.refcount field, similarly to
   all other fields in the struct
---
 include/linux/mroute.h | 21 +
 net/ipv4/ipmr.c|  8 +---
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/include/linux/mroute.h b/include/linux/mroute.h
index d7f6333..10028f2 100644
--- a/include/linux/mroute.h
+++ b/include/linux/mroute.h
@@ -109,6 +109,7 @@ struct mfc_cache_cmp_arg {
  * @wrong_if: number of wrong source interface hits
  * @lastuse: time of last use of the group (traffic or update)
  * @ttls: OIF TTL threshold array
+ * @refcount: reference count for this entry
  * @list: global entry list
  * @rcu: used for entry destruction
  */
@@ -138,6 +139,7 @@ struct mfc_cache {
unsigned long wrong_if;
unsigned long lastuse;
unsigned char ttls[MAXVIFS];
+   refcount_t refcount;
} res;
} mfc_un;
struct list_head list;
@@ -148,4 +150,23 @@ struct rtmsg;
 int ipmr_get_route(struct net *net, struct sk_buff *skb,
   __be32 saddr, __be32 daddr,
   struct rtmsg *rtm, u32 portid);
+
+#ifdef CONFIG_IP_MROUTE
+void ipmr_cache_free(struct mfc_cache *mfc_cache);
+#else
+static inline void ipmr_cache_free(struct mfc_cache *mfc_cache)
+{
+}
+#endif
+
+static inline void ipmr_cache_put(struct mfc_cache *c)
+{
+   if (refcount_dec_and_test(&c->mfc_un.res.refcount))
+   ipmr_cache_free(c);
+}
+static inline void ipmr_cache_hold(struct mfc_cache *c)
+{
+   refcount_inc(&c->mfc_un.res.refcount);
+}
+
 #endif
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index c9b3e6e..86dc5f9 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -652,10 +652,11 @@ static void ipmr_cache_free_rcu(struct rcu_head *head)
kmem_cache_free(mrt_cachep, c);
 }
 
-static inline void ipmr_cache_free(struct mfc_cache *c)
+void ipmr_cache_free(struct mfc_cache *c)
 {
 call_rcu(&c->rcu, ipmr_cache_free_rcu);
 }
+EXPORT_SYMBOL(ipmr_cache_free);
 
 /* Destroy an unresolved cache entry, killing queued skbs
  * and reporting error to netlink readers.
@@ -949,6 +950,7 @@ static struct mfc_cache *ipmr_cache_alloc(void)
if (c) {
c->mfc_un.res.last_assert = jiffies - MFC_ASSERT_THRESH - 1;
c->mfc_un.res.minvif = MAXVIFS;
+   refcount_set(&c->mfc_un.res.refcount, 1);
}
return c;
 }
@@ -1162,7 +1164,7 @@ static int ipmr_mfc_delete(struct mr_table *mrt, struct 
mfcctl *mfc, int parent)
 rhltable_remove(&mrt->mfc_hash, &c->mnode, ipmr_rht_params);
list_del_rcu(>list);
mroute_netlink_event(mrt, c, RTM_DELROUTE);
-   ipmr_cache_free(c);
+   ipmr_cache_put(c);
 
return 0;
 }
@@ -1264,7 +1266,7 @@ static void mroute_clean_tables(struct mr_table *mrt, 
bool all)
 rhltable_remove(&mrt->mfc_hash, &c->mnode, ipmr_rht_params);
list_del_rcu(>list);
mroute_netlink_event(mrt, c, RTM_DELROUTE);
-   ipmr_cache_free(c);
+   ipmr_cache_put(c);
}
 
 if (atomic_read(&mrt->cache_resolve_queue_len) != 0) {
-- 
2.9.5



[patch net-next v2 01/12] fib: notifier: Add VIF add and delete event types

2017-09-24 Thread Jiri Pirko
From: Yotam Gigi 

In order for an interface to forward packets according to the kernel
multicast routing table, it must be configured with a VIF index according
to the mroute user API. The VIF index is then used to refer to that
interface in the mroute user API, for example, to set the iif and oifs of
an MFC entry.

In order to allow drivers to offload multicast routes, they have to be aware
of the VIF add and delete notifications.

A specific VIF can be deleted and re-added pointing to another netdevice, and
the MFC routes that point to it will then forward the matching packets to the
new netdevice. Therefore, a driver willing to offload MFC cache entries must
be aware of the VIF add and delete events in addition to the MFC route
notifications.

Signed-off-by: Yotam Gigi 
Reviewed-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
 include/net/fib_notifier.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/net/fib_notifier.h b/include/net/fib_notifier.h
index 669b971..54cd6b8 100644
--- a/include/net/fib_notifier.h
+++ b/include/net/fib_notifier.h
@@ -20,6 +20,8 @@ enum fib_event_type {
FIB_EVENT_RULE_DEL,
FIB_EVENT_NH_ADD,
FIB_EVENT_NH_DEL,
+   FIB_EVENT_VIF_ADD,
+   FIB_EVENT_VIF_DEL,
 };
 
 struct fib_notifier_ops {
-- 
2.9.5



[patch net-next v2 06/12] net: mroute: Check if rule is a default rule

2017-09-24 Thread Jiri Pirko
From: Yotam Gigi 

When ipmr starts, it adds one default FIB rule that matches all packets
and sends them to the DEFAULT (multicast) FIB table. A more complex rule
can be added by the user to specify that, for a specific interface, packets
should be looked up in either an arbitrary table or according to the l3mdev
of the interface.

For drivers that wish to offload the ipmr logic into hardware but do not
want to offload all the FIB rules functionality, provide a function that
indicates whether a FIB rule is the default multicast rule, in which case
only one routing table is needed.

This way, a driver can register to the FIB notification chain, get
notifications about added FIB rules and trigger some kind of internal
abort mechanism when a non-default rule is added by the user.

Signed-off-by: Yotam Gigi 
Reviewed-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
 include/linux/mroute.h |  7 +++
 net/ipv4/ipmr.c| 10 ++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/mroute.h b/include/linux/mroute.h
index 5566580..b072a84 100644
--- a/include/linux/mroute.h
+++ b/include/linux/mroute.h
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -19,6 +20,7 @@ int ip_mroute_getsockopt(struct sock *, int, char __user *, 
int __user *);
 int ipmr_ioctl(struct sock *sk, int cmd, void __user *arg);
 int ipmr_compat_ioctl(struct sock *sk, unsigned int cmd, void __user *arg);
 int ip_mr_init(void);
+bool ipmr_rule_default(const struct fib_rule *rule);
 #else
 static inline int ip_mroute_setsockopt(struct sock *sock, int optname,
   char __user *optval, unsigned int optlen)
@@ -46,6 +48,11 @@ static inline int ip_mroute_opt(int opt)
 {
return 0;
 }
+
+static inline bool ipmr_rule_default(const struct fib_rule *rule)
+{
+   return true;
+}
 #endif
 
 struct vif_device {
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 2a795d2..a714f55 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -320,6 +320,16 @@ static unsigned int ipmr_rules_seq_read(struct net *net)
 }
 #endif
 
+bool ipmr_rule_default(const struct fib_rule *rule)
+{
+#if IS_ENABLED(CONFIG_FIB_RULES)
+   return fib_rule_matchall(rule) && rule->table == RT_TABLE_DEFAULT;
+#else
+   return true;
+#endif
+}
+EXPORT_SYMBOL(ipmr_rule_default);
+
 static inline int ipmr_hash_cmp(struct rhashtable_compare_arg *arg,
const void *ptr)
 {
-- 
2.9.5



[patch net-next v2 04/12] ipmr: Send FIB notifications on MFC and VIF entries

2017-09-24 Thread Jiri Pirko
From: Yotam Gigi 

Use the newly introduced notification chain to send events upon VIF and MFC
addition and deletion. The MFC notifications are sent only on resolved MFC
entries, as unresolved entries cannot be offloaded.

Signed-off-by: Yotam Gigi 
Reviewed-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
Reviewed-by: Nikolay Aleksandrov 
---
 net/ipv4/ipmr.c | 53 +
 1 file changed, 53 insertions(+)

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 49879c3..ba71bc4 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -627,6 +627,27 @@ static int call_ipmr_vif_entry_notifier(struct 
notifier_block *nb,
 return call_fib_notifier(nb, net, event_type, &info.info);
 }
 
+static int call_ipmr_vif_entry_notifiers(struct net *net,
+enum fib_event_type event_type,
+struct vif_device *vif,
+vifi_t vif_index, u32 tb_id)
+{
+   struct vif_entry_notifier_info info = {
+   .info = {
+   .family = RTNL_FAMILY_IPMR,
+   .net = net,
+   },
+   .dev = vif->dev,
+   .vif_index = vif_index,
+   .vif_flags = vif->flags,
+   .tb_id = tb_id,
+   };
+
+   ASSERT_RTNL();
+   net->ipv4.ipmr_seq++;
+   return call_fib_notifiers(net, event_type, &info.info);
+}
+
 static int call_ipmr_mfc_entry_notifier(struct notifier_block *nb,
struct net *net,
enum fib_event_type event_type,
@@ -644,6 +665,24 @@ static int call_ipmr_mfc_entry_notifier(struct 
notifier_block *nb,
 return call_fib_notifier(nb, net, event_type, &info.info);
 }
 
+static int call_ipmr_mfc_entry_notifiers(struct net *net,
+enum fib_event_type event_type,
+struct mfc_cache *mfc, u32 tb_id)
+{
+   struct mfc_entry_notifier_info info = {
+   .info = {
+   .family = RTNL_FAMILY_IPMR,
+   .net = net,
+   },
+   .mfc = mfc,
+   .tb_id = tb_id
+   };
+
+   ASSERT_RTNL();
+   net->ipv4.ipmr_seq++;
+   return call_fib_notifiers(net, event_type, &info.info);
+}
+
 /**
  * vif_delete - Delete a VIF entry
  * @notify: Set to 1, if the caller is a notifier_call
@@ -651,6 +690,7 @@ static int call_ipmr_mfc_entry_notifier(struct 
notifier_block *nb,
 static int vif_delete(struct mr_table *mrt, int vifi, int notify,
  struct list_head *head)
 {
+   struct net *net = read_pnet(&mrt->net);
struct vif_device *v;
struct net_device *dev;
struct in_device *in_dev;
@@ -660,6 +700,10 @@ static int vif_delete(struct mr_table *mrt, int vifi, int 
notify,
 
 v = &mrt->vif_table[vifi];
 
+   if (VIF_EXISTS(mrt, vifi))
+   call_ipmr_vif_entry_notifiers(net, FIB_EVENT_VIF_DEL, v, vifi,
+ mrt->id);
+
 write_lock_bh(&mrt_lock);
dev = v->dev;
v->dev = NULL;
@@ -909,6 +953,7 @@ static int vif_add(struct net *net, struct mr_table *mrt,
if (vifi+1 > mrt->maxvif)
mrt->maxvif = vifi+1;
 write_unlock_bh(&mrt_lock);
+   call_ipmr_vif_entry_notifiers(net, FIB_EVENT_VIF_ADD, v, vifi, mrt->id);
return 0;
 }
 
@@ -1209,6 +1254,7 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, 
vifi_t vifi,
 
 static int ipmr_mfc_delete(struct mr_table *mrt, struct mfcctl *mfc, int 
parent)
 {
+   struct net *net = read_pnet(&mrt->net);
struct mfc_cache *c;
 
/* The entries are added/deleted only under RTNL */
@@ -1220,6 +1266,7 @@ static int ipmr_mfc_delete(struct mr_table *mrt, struct 
mfcctl *mfc, int parent)
return -ENOENT;
 rhltable_remove(&mrt->mfc_hash, &c->mnode, ipmr_rht_params);
list_del_rcu(>list);
+   call_ipmr_mfc_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, c, mrt->id);
mroute_netlink_event(mrt, c, RTM_DELROUTE);
ipmr_cache_put(c);
 
@@ -1248,6 +1295,8 @@ static int ipmr_mfc_add(struct net *net, struct mr_table 
*mrt,
if (!mrtsock)
c->mfc_flags |= MFC_STATIC;
 write_unlock_bh(&mrt_lock);
+   call_ipmr_mfc_entry_notifiers(net, FIB_EVENT_ENTRY_REPLACE, c,
+ mrt->id);
mroute_netlink_event(mrt, c, RTM_NEWROUTE);
return 0;
}
@@ -1297,6 +1346,7 @@ static int ipmr_mfc_add(struct net *net, struct mr_table 
*mrt,
ipmr_cache_resolve(net, mrt, uc, c);
ipmr_cache_free(uc);
}
+   call_ipmr_mfc_entry_notifiers(net, FIB_EVENT_ENTRY_ADD, c, mrt->id);

[patch net-next v2 12/12] mlxsw: spectrum: router: Don't ignore IPMR notifications

2017-09-24 Thread Jiri Pirko
From: Yotam Gigi 

Make the Spectrum router logic not ignore the RTNL_FAMILY_IPMR FIB
notifications.

Past commits added the IPMR VIF and MFC add/del notifications via the
fib_notifier chain. In addition, code for handling these notifications was
added to the Spectrum router logic. Make the Spectrum router logic stop
ignoring these notifications and forward the requests to the Spectrum
multicast router offloading logic.

Signed-off-by: Yotam Gigi 
Reviewed-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index edc6462..16c041b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -5034,7 +5034,8 @@ static int mlxsw_sp_router_fib_event(struct 
notifier_block *nb,
struct mlxsw_sp_router *router;
 
 if (!net_eq(info->net, &init_net) ||
-   (info->family != AF_INET && info->family != AF_INET6))
+   (info->family != AF_INET && info->family != AF_INET6 &&
+info->family != RTNL_FAMILY_IPMR))
return NOTIFY_DONE;
 
fib_work = kzalloc(sizeof(*fib_work), GFP_ATOMIC);
-- 
2.9.5



[patch net-next v2 08/12] mlxsw: spectrum: Add the multicast routing hardware logic

2017-09-24 Thread Jiri Pirko
From: Yotam Gigi 

Implement the multicast routing hardware API introduced in the previous
patch for the Spectrum hardware.

The spectrum hardware multicast routes are written using the RMFT2 register
and point to an ACL flexible action set. The actions used for multicast
routes are:
 - Counter action, which allows counting bytes and packets on multicast
   routes.
 - Multicast route action, which provides the RPF check and does the actual
   packet duplication to a list of RIFs.
 - Trap action, used in case the route action specified by the caller is
   trap.

Signed-off-by: Yotam Gigi 
Reviewed-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |   1 +
 .../net/ethernet/mellanox/mlxsw/spectrum_mr_tcam.c | 828 +
 .../net/ethernet/mellanox/mlxsw/spectrum_mr_tcam.h |  43 ++
 4 files changed, 873 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr_tcam.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr_tcam.h

diff --git a/drivers/net/ethernet/mellanox/mlxsw/Makefile 
b/drivers/net/ethernet/mellanox/mlxsw/Makefile
index 9b29764..4816504 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/Makefile
+++ b/drivers/net/ethernet/mellanox/mlxsw/Makefile
@@ -18,7 +18,7 @@ mlxsw_spectrum-objs   := spectrum.o 
spectrum_buffers.o \
   spectrum_acl.o spectrum_flower.o \
   spectrum_cnt.o spectrum_fid.o \
   spectrum_ipip.o spectrum_acl_flex_actions.o \
-  spectrum_mr.o
+  spectrum_mr.o spectrum_mr_tcam.o
 mlxsw_spectrum-$(CONFIG_MLXSW_SPECTRUM_DCB)+= spectrum_dcb.o
 mlxsw_spectrum-$(CONFIG_NET_DEVLINK) += spectrum_dpipe.o
 obj-$(CONFIG_MLXSW_MINIMAL)+= mlxsw_minimal.o
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 51d8b9f..d06f7fe 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -139,6 +139,7 @@ struct mlxsw_sp_port_mall_tc_entry {
 struct mlxsw_sp_sb;
 struct mlxsw_sp_bridge;
 struct mlxsw_sp_router;
+struct mlxsw_sp_mr;
 struct mlxsw_sp_acl;
 struct mlxsw_sp_counter_pool;
 struct mlxsw_sp_fid_core;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_mr_tcam.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_mr_tcam.c
new file mode 100644
index 000..cda9e9a
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_mr_tcam.c
@@ -0,0 +1,828 @@
+/*
+ * drivers/net/ethernet/mellanox/mlxsw/spectrum_mr_tcam.c
+ * Copyright (c) 2017 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2017 Yotam Gigi 
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *contributors may be used to endorse or promote products derived from
+ *this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "reg.h"
+#include "spectrum.h"
+#include "core_acl_flex_actions.h"
+#include "spectrum_mr.h"
+
+struct mlxsw_sp_mr_tcam_region {
+   struct mlxsw_sp *mlxsw_sp;
+   enum mlxsw_reg_rtar_key_type rtar_key_type;
+   struct parman *parman;
+   struct 

[patch net-next v2 00/12] mlxsw: Add support for offloading IPv4 multicast routes

2017-09-24 Thread Jiri Pirko
From: Jiri Pirko 

This patch-set introduces offloading of the kernel IPv4 multicast router
logic in the Spectrum driver.

The first patch makes the Spectrum driver ignore FIB notifications that are
not of address family IPv4 or IPv6. This is needed in order to prevent
crashes while the next patches introduce the RTNL_FAMILY_IPMR FIB
notifications.

Patches 2-5 update ipmr to use the FIB notification chain for both MFC and
VIF notifications, and patches 8-12 update the Spectrum driver to register
to these notifications and offload the routes.

Similarly to IPv4 and IPv6, any failure will trigger the abort mechanism
which is updated in this patch-set to eject multicast route tables too.

At this stage, the following limitations apply:
 - A multicast MFC route will be offloaded by the driver if all the output
   interfaces are Spectrum router interfaces (RIFs). In any other case
   (which includes pimreg device, tunnel devices and management ports) the
   route will be trapped to the CPU and the packets will be forwarded by
   software.
 - ipmr proxy routes are not supported and will trigger the abort
   mechanism.
 - The MFC TTL values are currently treated as boolean: if the value is
   different from 255, the traffic is forwarded to the interface, and if the
   value is 255 it is not forwarded. Dropping packets based on their TTL isn't
   currently supported.

To allow users to have visibility on which of the routes are offloaded and
which are not, patch 6 introduces a per-route offload indication similar to
IPv4 and IPv6 routes which is sent to the user via the RTNetlink interface.

The Spectrum driver multicast router offloading support, which is
introduced in patches 8 and 9, is divided into two parts:
 - The hardware logic which abstracts the Spectrum hardware and provides a
   simple API for the upper levels.
 - The offloading logic which gets the MFC and VIF notifications from the
   kernel and updates the hardware using the hardware logic part.

Finally, the last patch makes the Spectrum router logic not ignore the
multicast FIB notifications and call the corresponding functions in the
multicast router offloading logic.

---
v1->v2:
 - Add comments for struct fields in mroute.h
 - Take the mrt_lock while dumping VIFs in the fib_notifier dump callback
 - Update the MFC lastuse field too

Yotam Gigi (12):
  fib: notifier: Add VIF add and delete event types
  ipmr: Add reference count to MFC entries
  ipmr: Add FIB notification access functions
  ipmr: Send FIB notifications on MFC and VIF entries
  net: ipmr: Add MFC offload indication
  net: mroute: Check if rule is a default rule
  mlxsw: spectrum: Add the multicast routing offloading logic
  mlxsw: spectrum: Add the multicast routing hardware logic
  mlxsw: spectrum: router: Squash the default route table to main
  mlxsw: spectrum_router: Add multicast routes notification handling
functionality
  mlxsw: spectrum: Notify multicast router on RIF MTU changes
  mlxsw: spectrum: router: Don't ignore IPMR notifications

 drivers/net/ethernet/mellanox/mlxsw/Makefile   |3 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |2 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c  | 1014 
 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.h  |  133 +++
 .../net/ethernet/mellanox/mlxsw/spectrum_mr_tcam.c |  828 
 .../net/ethernet/mellanox/mlxsw/spectrum_mr_tcam.h |   43 +
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  |  205 +++-
 include/linux/mroute.h |   45 +
 include/net/fib_notifier.h |2 +
 include/net/netns/ipv4.h   |3 +
 net/ipv4/ipmr.c|  211 +++-
 11 files changed, 2478 insertions(+), 11 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr_tcam.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr_tcam.h

-- 
2.9.5



[patch net-next v2 11/12] mlxsw: spectrum: Notify multicast router on RIF MTU changes

2017-09-24 Thread Jiri Pirko
From: Yotam Gigi 

Due to the fact that multicast routes hold the minimum MTU of all the
egress RIFs and trap packets that don't meet it, notify the multicast
router code on RIF MTU changes.

Signed-off-by: Yotam Gigi 
Reviewed-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index b36ec63..edc6462 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -5648,6 +5648,17 @@ int mlxsw_sp_netdevice_router_port_event(struct 
net_device *dev)
if (err)
goto err_rif_fdb_op;
 
+   if (rif->mtu != dev->mtu) {
+   struct mlxsw_sp_vr *vr;
+
+   /* The RIF is relevant only to its mr_table instance, as unlike
+* unicast routing, in multicast routing a RIF cannot be shared
+* between several multicast routing tables.
+*/
+   vr = &mlxsw_sp->router->vrs[rif->vr_id];
+   mlxsw_sp_mr_rif_mtu_update(vr->mr4_table, rif, dev->mtu);
+   }
+
ether_addr_copy(rif->addr, dev->dev_addr);
rif->mtu = dev->mtu;
 
-- 
2.9.5



[patch net-next v2 09/12] mlxsw: spectrum: router: Squash the default route table to main

2017-09-24 Thread Jiri Pirko
From: Yotam Gigi 

Currently, the mlxsw Spectrum driver offloads only either the RT_TABLE_MAIN
FIB table or the VRF tables, so the RT_TABLE_LOCAL table is squashed to the
RT_TABLE_MAIN table to allow local routes to be offloaded too.

By default, multicast MFC routes which are not assigned to any user
requested table are put in the RT_TABLE_DEFAULT table.

As support for offloading multicast MFC routes is going to be introduced
into the Spectrum router logic soon, squash the default table into MAIN
too.

Signed-off-by: Yotam Gigi 
Reviewed-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 0bd93dc..1e6122f 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -692,8 +692,8 @@ static int mlxsw_sp_vr_lpm_tree_unbind(struct mlxsw_sp 
*mlxsw_sp,
 
 static u32 mlxsw_sp_fix_tb_id(u32 tb_id)
 {
-   /* For our purpose, squash main and local table into one */
-   if (tb_id == RT_TABLE_LOCAL)
+   /* For our purpose, squash main, default and local tables into one */
+   if (tb_id == RT_TABLE_LOCAL || tb_id == RT_TABLE_DEFAULT)
tb_id = RT_TABLE_MAIN;
return tb_id;
 }
-- 
2.9.5



[patch net-next v2 07/12] mlxsw: spectrum: Add the multicast routing offloading logic

2017-09-24 Thread Jiri Pirko
From: Yotam Gigi 

Add the multicast router offloading logic, which is in charge of handling
the VIF and MFC notifications and translating them to the hardware logic API.

The offloading logic has to overcome several obstacles in order to safely
comply with the kernel multicast router user API:
 - It must keep track of the mapping between VIFs and netdevices. The user
   can add an MFC cache entry pointing to a VIF, delete the VIF and then
   re-add it with a different netdevice. The offloading logic has to handle
   this in order to be compatible with the kernel logic.
 - It must keep track of the mapping between netdevices and Spectrum RIFs,
   as the current hardware implementation assumes having a RIF for every
   port in a multicast router.
 - It must handle routes pointing to the pimreg device by trapping them to
   the kernel, as the packets should be delivered to userspace.
 - It must handle routes pointing to tunnel VIFs. The current implementation
   does not support multicast forwarding to tunnels, thus routes that point
   to a tunnel should be trapped to the kernel.
 - It must be aware of proxy multicast routes, which include both (*,*)
   routes and duplicate routes. Currently, proxy routes are not offloaded;
   they trigger the abort mechanism: removal of all routes from hardware and
   diversion of the traffic through the kernel.

The multicast routing offloading logic also updates the counters of the
offloaded MFC routes in a periodic work.

Signed-off-by: Yotam Gigi 
Reviewed-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
v1->v2:
 - Update the lastuse MFC entry field too, in addition to packets and bytes.
---
 drivers/net/ethernet/mellanox/mlxsw/Makefile  |3 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h|1 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c | 1014 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.h |  133 +++
 4 files changed, 1150 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.h

diff --git a/drivers/net/ethernet/mellanox/mlxsw/Makefile 
b/drivers/net/ethernet/mellanox/mlxsw/Makefile
index 4b88158..9b29764 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/Makefile
+++ b/drivers/net/ethernet/mellanox/mlxsw/Makefile
@@ -17,7 +17,8 @@ mlxsw_spectrum-objs   := spectrum.o 
spectrum_buffers.o \
   spectrum_kvdl.o spectrum_acl_tcam.o \
   spectrum_acl.o spectrum_flower.o \
   spectrum_cnt.o spectrum_fid.o \
-  spectrum_ipip.o spectrum_acl_flex_actions.o
+  spectrum_ipip.o spectrum_acl_flex_actions.o \
+  spectrum_mr.o
 mlxsw_spectrum-$(CONFIG_MLXSW_SPECTRUM_DCB)+= spectrum_dcb.o
 mlxsw_spectrum-$(CONFIG_NET_DEVLINK) += spectrum_dpipe.o
 obj-$(CONFIG_MLXSW_MINIMAL)+= mlxsw_minimal.o
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index e907ec4..51d8b9f 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -153,6 +153,7 @@ struct mlxsw_sp {
struct mlxsw_sp_sb *sb;
struct mlxsw_sp_bridge *bridge;
struct mlxsw_sp_router *router;
+   struct mlxsw_sp_mr *mr;
struct mlxsw_afa *afa;
struct mlxsw_sp_acl *acl;
struct mlxsw_sp_fid_core *fid_core;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c
new file mode 100644
index 000..89b2e60
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c
@@ -0,0 +1,1014 @@
+/*
+ * drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c
+ * Copyright (c) 2017 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2017 Yotam Gigi 
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *contributors may be used to endorse or promote products derived from
+ *this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * THIS SOFTWARE IS 

[patch net-next v2 10/12] mlxsw: spectrum_router: Add multicast routes notification handling functionality

2017-09-24 Thread Jiri Pirko
From: Yotam Gigi 

Add functionality for calling the multicast routing offloading logic upon
MFC and VIF add and delete notifications. In addition, call the multicast
routing upon RIF addition and deletion events.

As the multicast routing offload logic may sleep, the actual calls are done
in a deferred work. To ensure the MFC object is not freed in that interval,
a reference is held to it. In case of a failure, the abort mechanism is
used, which ejects all the routes from the hardware and triggers the
traffic to flow through the kernel.

Note: At that stage, the FIB notifications are still ignored, and will be
enabled in a further patch.

Signed-off-by: Yotam Gigi 
Reviewed-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 187 -
 1 file changed, 185 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 1e6122f..b36ec63 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -65,6 +65,8 @@
 #include "spectrum_cnt.h"
 #include "spectrum_dpipe.h"
 #include "spectrum_ipip.h"
+#include "spectrum_mr.h"
+#include "spectrum_mr_tcam.h"
 #include "spectrum_router.h"
 
 struct mlxsw_sp_vr;
@@ -458,6 +460,7 @@ struct mlxsw_sp_vr {
unsigned int rif_count;
struct mlxsw_sp_fib *fib4;
struct mlxsw_sp_fib *fib6;
+   struct mlxsw_sp_mr_table *mr4_table;
 };
 
 static const struct rhashtable_params mlxsw_sp_fib_ht_params;
@@ -652,7 +655,7 @@ static void mlxsw_sp_lpm_fini(struct mlxsw_sp *mlxsw_sp)
 
 static bool mlxsw_sp_vr_is_used(const struct mlxsw_sp_vr *vr)
 {
-   return !!vr->fib4 || !!vr->fib6;
+   return !!vr->fib4 || !!vr->fib6 || !!vr->mr4_table;
 }
 
 static struct mlxsw_sp_vr *mlxsw_sp_vr_find_unused(struct mlxsw_sp *mlxsw_sp)
@@ -743,9 +746,18 @@ static struct mlxsw_sp_vr *mlxsw_sp_vr_create(struct 
mlxsw_sp *mlxsw_sp,
err = PTR_ERR(vr->fib6);
goto err_fib6_create;
}
+   vr->mr4_table = mlxsw_sp_mr_table_create(mlxsw_sp, vr->id,
+MLXSW_SP_L3_PROTO_IPV4);
+   if (IS_ERR(vr->mr4_table)) {
+   err = PTR_ERR(vr->mr4_table);
+   goto err_mr_table_create;
+   }
vr->tb_id = tb_id;
return vr;
 
+err_mr_table_create:
+   mlxsw_sp_fib_destroy(vr->fib6);
+   vr->fib6 = NULL;
 err_fib6_create:
mlxsw_sp_fib_destroy(vr->fib4);
vr->fib4 = NULL;
@@ -754,6 +766,8 @@ static struct mlxsw_sp_vr *mlxsw_sp_vr_create(struct 
mlxsw_sp *mlxsw_sp,
 
 static void mlxsw_sp_vr_destroy(struct mlxsw_sp_vr *vr)
 {
+   mlxsw_sp_mr_table_destroy(vr->mr4_table);
+   vr->mr4_table = NULL;
mlxsw_sp_fib_destroy(vr->fib6);
vr->fib6 = NULL;
mlxsw_sp_fib_destroy(vr->fib4);
@@ -774,7 +788,8 @@ static struct mlxsw_sp_vr *mlxsw_sp_vr_get(struct mlxsw_sp 
*mlxsw_sp, u32 tb_id)
 static void mlxsw_sp_vr_put(struct mlxsw_sp_vr *vr)
 {
 if (!vr->rif_count && list_empty(&vr->fib4->node_list) &&
-   list_empty(&vr->fib6->node_list))
+   list_empty(&vr->fib6->node_list) &&
+   mlxsw_sp_mr_table_empty(vr->mr4_table))
mlxsw_sp_vr_destroy(vr);
 }
 
@@ -4606,6 +4621,75 @@ static int __mlxsw_sp_router_set_abort_trap(struct mlxsw_sp *mlxsw_sp,
return 0;
 }
 
+static int mlxsw_sp_router_fibmr_add(struct mlxsw_sp *mlxsw_sp,
+struct mfc_entry_notifier_info *men_info,
+bool replace)
+{
+   struct mlxsw_sp_vr *vr;
+
+   if (mlxsw_sp->router->aborted)
+   return 0;
+
+   vr = mlxsw_sp_vr_get(mlxsw_sp, men_info->tb_id);
+   if (IS_ERR(vr))
+   return PTR_ERR(vr);
+
+   return mlxsw_sp_mr_route4_add(vr->mr4_table, men_info->mfc, replace);
+}
+
+static void mlxsw_sp_router_fibmr_del(struct mlxsw_sp *mlxsw_sp,
+ struct mfc_entry_notifier_info *men_info)
+{
+   struct mlxsw_sp_vr *vr;
+
+   if (mlxsw_sp->router->aborted)
+   return;
+
+   vr = mlxsw_sp_vr_find(mlxsw_sp, men_info->tb_id);
+   if (WARN_ON(!vr))
+   return;
+
+   mlxsw_sp_mr_route4_del(vr->mr4_table, men_info->mfc);
+   mlxsw_sp_vr_put(vr);
+}
+
+static int
+mlxsw_sp_router_fibmr_vif_add(struct mlxsw_sp *mlxsw_sp,
+ struct vif_entry_notifier_info *ven_info)
+{
+   struct mlxsw_sp_rif *rif;
+   struct mlxsw_sp_vr *vr;
+
+   if (mlxsw_sp->router->aborted)
+   return 0;
+
+   vr = mlxsw_sp_vr_get(mlxsw_sp, ven_info->tb_id);
+   if (IS_ERR(vr))
+   return PTR_ERR(vr);
+
+   rif = mlxsw_sp_rif_find_by_dev(mlxsw_sp, ven_info->dev);
+   return 

Re: [PATCH net-next 09/14] gtp: Allow configuring GTP interface as standalone

2017-09-24 Thread Tom Herbert
> I'm not sure where a "vendor" is involved with the GTP patches so far.  I
> think we have to draw a distinction between what you expect from
> professional, corporate "vendors" with a commercial interest in mind
> (such as supporting their hardware) and what you can expect from people
> doing things in their spare time, out of enthusiasm to finally bring
> some Free Software into the closed world of telecommunications.
>
If it makes you feel any better I am not getting paid for this work either :-)

> The Telecom world should have implemented something like a GTP kernel
> module a decade to 15 years ago.  They could have saved significant
> investments in proprietary hardware by running open source GGSNs with an
> accelerated user plane in the kernel.  Nobody seemed to have an interest
> in that, until today - as you can see from Pablo and me working on this
> in our spare time, whenever we have a couple of spare cycles next to
> many other projects.  You can see from the osmo-gtp-kernel commit log it
> took years of being an ultra-low-priority on-and-off project to ever get
> to a point where we thought it was worth submitting it mainline.
> Andreas deserves the praise for finally pushing it ahead.
>
I completely agree, and your work is well appreciated! But I don't
believe it is too late to steer the ship away from proprietary
solutions. In fact, given the direction of the rest of the industry,
now is our best opportunity to try. That is a major reason
for these patches. We need to bring GTP into the limelight and get a
lot more people thinking about it. This might even be the world's most
important tunneling protocol. If nothing else, a discussion like this
is good if it inspires others in the community to start to look at it.

Tom


Re: [PATCH net-next 09/14] gtp: Allow configuring GTP interface as standalone

2017-09-24 Thread Harald Welte
Hi Tom,

On Sun, Sep 24, 2017 at 08:55:49AM -0700, Tom Herbert wrote:
> Do you believe that these patches are not at all on the right track,
> that they can't be built upon to get to a standards-compliant
> implementation, and that we are going to have to throw all of this and
> start from scratch to provide IPv6 support?

I believe I have pointed out where the problem areas are, several times
by now.  I see no reason why things would have to be started from
scratch.  However, the issues pointed out in the IPv6 support patch[es]
have to be resolved *before* any merge to mainline.

I don't mind merging "incomplete" code that doesn't cover all parts of a
spec but provides basic interoperability.  I also am not arguing that
code must be bug-free at the time it is merged (which is impossible
anyway).  But I am arguing that we cannot merge something that is
a wrong implementation as per the spec, and hence it must be brought
in-line with the spec before it can be merged.

> > There's no use in merging an IPv6 support patch if already by code
> > review it can be shown that it's impossible to create a spec-compliant
> > implementation using that patch.  To me, that would be "merging IPv6
> > support so we can check off a box on a management form or marketing
> > sheet", but not for any practical value.
> 
> To be clear, these patches are not done to be a bullet point
> on a marketing sheet. 

Great.

> IPv6 is becoming _the_ Internet protocol.

I'm all aware of that, and I've been a very early adopter, since the
1990s with 6bone.

My argument is not against IPv6 support.  My argument is against merging
something that introduces IPv6 in a way that's not in-line with the GTP
protocol specifications, as such a way is of no use to anyone (except
marketing sheets).

> We should be far past the days of vendors only providing IPv4 in the
> kernel support because "that's what our customers use" and they'll get
> to IPv6 support at their leisure. 

I'm not sure where a "vendor" is involved with the GTP patches so far.  I
think we have to draw a distinction between what you expect from
professional, corporate "vendors" with a commercial interest in mind
(such as supporting their hardware) and what you can expect from people
doing things in their spare time, out of enthusiasm to finally bring
some Free Software into the closed world of telecommunications.

The Telecom world should have implemented something like a GTP kernel
module a decade to 15 years ago.  They could have saved significant
investments in proprietary hardware by running open source GGSNs with an
accelerated user plane in the kernel.  Nobody seemed to have an interest
in that, until today - as you can see from Pablo and me working on this
in our spare time, whenever we have a couple of spare cycles next to
many other projects.  You can see from the osmo-gtp-kernel commit log it
took years of being an ultra-low-priority on-and-off project to ever get
to a point where we thought it was worth submitting it mainline.
Andreas deserves the praise for finally pushing it ahead.

I'm looking forward to reviewing the next version of the patch series.
-- 
- Harald Welte    http://laforge.gnumonks.org/

"Privacy in residential applications is a desirable marketing option."
  (ETSI EN 300 175-7 Ch. A6)


Re: [PATCH net-next 09/14] gtp: Allow configuring GTP interface as standalone

2017-09-24 Thread Tom Herbert
> It's not about "not liking".  I'm very happy about contributions,
> including (of course) yours.  It's about making sure that code we merge
> into the kernel GTP driver will actually be usable to create a
> standards-compliant GTP application or not.
>
Harald,

Do you believe that these patches are not at all on the right track,
that they can't be built upon to get to a standards-compliant
implementation, and that we are going to have to throw all of this and
start from scratch to provide IPv6 support?

> There's no use in merging an IPv6 support patch if already by code
> review it can be shown that it's impossible to create a spec-compliant
> implementation using that patch.  To me, that would be "merging IPv6
> support so we can check off a box on a management form or marketing
> sheet", but not for any practical value.
>

To be clear, these patches are not done to be a bullet point
on a marketing sheet. IPv6 is becoming _the_ Internet protocol. It
continues to exhibit exponential growth (~20% of Internet, per Google
stats), I believe at least two of the largest datacenter operators are
running everything over IPv6, and there are already proposals to start
official deprecation of IPv4. In the mobile space IPv6 is going to be
a critical enabler of IoT and security in technologies like 5G. If we
want Linux to be at the forefront of the next technology wave then we
need to focus on IPv6 now! We should be far past the days of vendors
only providing IPv4 support in the kernel because "that's what our
customers use" and they'll get to IPv6 support at their leisure. IMO,
davem has every right to unilaterally NAK patches that only support
IPv4 or only test IPv4 with not even a path or timeline for IPv6
support.

Thanks,
Tom


Re: [RFC] endianness issues in drivers/net/ethernet/qlogic/qed

2017-09-24 Thread Al Viro
On Sun, Sep 24, 2017 at 02:34:19PM +, Tayar, Tomer wrote:
> 
> > "qed: Utilize FW 8.10.3.0" has attempted some endianness annotations
> > in that driver; unfortunately, either annotations are BS or the driver is genuinely
> > broken on big-endian hosts.
> [...]
> > Is that driver intended to be used on big-endian hosts at all?
> 
> Thanks for taking the time to review our driver and pointing out these 
> mistakes.
> Support for BE machines is planned to be added but currently it is not 
> available.
> However, the structures which are used to abstract the HW carry endianness
> annotations.
> Obviously, there are some misses and some annotations were added when not 
> required.
> We will prepare a patch that fixes the issues you pointed out and similar 
> ones.

OK...  sparse is pretty good at spotting the problems; if you have any
questions - just ask.  A bit of random braindump concerning that kind of work:

	* bitfields and fixed-endian data do not mix.  It's much better to have just
__le32 (or __le64, etc.) in the structure and use GET_FIELD/SET_FIELD or similar for
accesses.  Another safe technique is something like
	if ((foo->bar & cpu_to_le32(BAR_MASK)) == cpu_to_le32(BAR_THIS << BAR_SHIFT))
instead of
	if (get_bar(foo) == BAR_THIS)
since that keeps shift and endianness conversion on the constant side.  The same goes
for
	if ((foo->bar ^ baz->bar) & cpu_to_le32(BAR_MASK))
instead of
	if (get_bar(foo) != get_bar(baz))
It would be nice if the compiler recognized that kind of stuff and transformed the
latter into the former on its own, but...

	* swab... is a Bloody Bad Idea(tm) in almost all situations.  Keeping track of
whether given data is little-endian or host-endian is much easier than keeping track of
how many times we have flipped it.

	* don't mix little-endian and host-endian in the same variable.  See the previous
point for the reasons - static typing is much safer and easier to reason about.  Code
doing
	n = cpu_to_le32(n);
is asking for trouble.  For local variables it's not even an optimization - compiler
is generally pretty good at spotting two local variables that are never live at the
same point and reusing memory.  And for anything non-local you are introducing a hidden
piece of state - "is that field in this structure little-endian or host-endian at the
moment?", making it very easy to screw up a few months down the road.  Brittle and
hell to debug...

	* one very common source of noise is cpu_to_le32() when le32_to_cpu() was
intended.  Sure, they do the same transformation on anything even remotely plausible
(something like V0 V2 V3 V1 is not a byte order likely to happen on any hardware),
but the choice documents what kind of values do you have before and after the
conversion.  Both the human readers and automated typechecking (sparse) have much
easier life if those are accurate.  Again, see the point re keeping track of the
number of flips vs. keeping track of what's host-endian and what's little-endian.
The latter is local, the former takes reasoning about control flow.

	* for situations like "use this le32 value as search key in binary tree",
where you are really OK with having differently-shaped trees on l-e and b-e hosts,
use something like
	if ((__force __u32) key > node->key)
preferably with a comment explaining why treating this value that way is OK.


[PATCH] isdn/eicon: do integrity check on cmd->adapter == a->controller early

2017-09-24 Thread Meng Xu
In my understanding, the reason to have the check
if (cmd->adapter != a->controller) {report error} is to prevent the case
where, after xdi_copy_from_user() in diva_xdi_write(), data->adapter
has changed from what was previously fetched in diva_xdi_open_adapter(),
which would lead to using a wrong adapter in interface.cmd_proc().

Although respective checks are in place in the three implementations of
cmd_proc(), i.e., diva_4bri_cmd_card_proc(), diva_bri_cmd_card_proc(),
and diva_pri_cmd_card_proc(), in my opinion, a better way might be to do
this integrity check right after the xdi_copy_from_user() in diva_xdi_write(),
which is what this patch does.

Signed-off-by: Meng Xu 
---
 drivers/isdn/hardware/eicon/diva.c| 10 +-
 drivers/isdn/hardware/eicon/os_4bri.c |  6 --
 drivers/isdn/hardware/eicon/os_bri.c  |  6 --
 drivers/isdn/hardware/eicon/os_pri.c  |  6 --
 4 files changed, 9 insertions(+), 19 deletions(-)

diff --git a/drivers/isdn/hardware/eicon/diva.c b/drivers/isdn/hardware/eicon/diva.c
index d91dd58..8ebd3c7 100644
--- a/drivers/isdn/hardware/eicon/diva.c
+++ b/drivers/isdn/hardware/eicon/diva.c
@@ -460,7 +460,15 @@ diva_xdi_write(void *adapter, void *os_handle, const void __user *src,
 
length = (*cp_fn) (os_handle, data, src, length);
if (length > 0) {
-   if ((*(a->interface.cmd_proc))
+   /* do the integrity check early */
+   if(((diva_xdi_um_cfg_cmd_t *)data)->adapter != a->controller){
+   DBG_ERR(("A: A(%d) write, invalid controller=%d != %d",
+   ((diva_xdi_um_cfg_cmd_t *)data)->adapter, a->controller));
+
+   length = -1;
+   }
+
+   else if ((*(a->interface.cmd_proc))
(a, (diva_xdi_um_cfg_cmd_t *) data, length)) {
length = -3;
}
diff --git a/drivers/isdn/hardware/eicon/os_4bri.c b/drivers/isdn/hardware/eicon/os_4bri.c
index 1891246..adbd852 100644
--- a/drivers/isdn/hardware/eicon/os_4bri.c
+++ b/drivers/isdn/hardware/eicon/os_4bri.c
@@ -629,12 +629,6 @@ diva_4bri_cmd_card_proc(struct _diva_os_xdi_adapter *a,
 {
int ret = -1;
 
-   if (cmd->adapter != a->controller) {
-   DBG_ERR(("A: 4bri_cmd, invalid controller=%d != %d",
-cmd->adapter, a->controller))
-   return (-1);
-   }
-
switch (cmd->command) {
case DIVA_XDI_UM_CMD_GET_CARD_ORDINAL:
a->xdi_mbox.data_length = sizeof(dword);
diff --git a/drivers/isdn/hardware/eicon/os_bri.c b/drivers/isdn/hardware/eicon/os_bri.c
index 20f2653..e3d398f 100644
--- a/drivers/isdn/hardware/eicon/os_bri.c
+++ b/drivers/isdn/hardware/eicon/os_bri.c
@@ -398,12 +398,6 @@ diva_bri_cmd_card_proc(struct _diva_os_xdi_adapter *a,
 {
int ret = -1;
 
-   if (cmd->adapter != a->controller) {
-   DBG_ERR(("A: pri_cmd, invalid controller=%d != %d",
-cmd->adapter, a->controller))
-   return (-1);
-   }
-
switch (cmd->command) {
case DIVA_XDI_UM_CMD_GET_CARD_ORDINAL:
a->xdi_mbox.data_length = sizeof(dword);
diff --git a/drivers/isdn/hardware/eicon/os_pri.c b/drivers/isdn/hardware/eicon/os_pri.c
index da4957a..93443aa 100644
--- a/drivers/isdn/hardware/eicon/os_pri.c
+++ b/drivers/isdn/hardware/eicon/os_pri.c
@@ -604,12 +604,6 @@ diva_pri_cmd_card_proc(struct _diva_os_xdi_adapter *a,
 {
int ret = -1;
 
-   if (cmd->adapter != a->controller) {
-   DBG_ERR(("A: pri_cmd, invalid controller=%d != %d",
-cmd->adapter, a->controller))
-   return (-1);
-   }
-
switch (cmd->command) {
case DIVA_XDI_UM_CMD_GET_CARD_ORDINAL:
a->xdi_mbox.data_length = sizeof(dword);
-- 
2.7.4



[PATCH] net/tls: move version check after second userspace fetch

2017-09-24 Thread Meng Xu
Even if the userspace buffer optval passed the version check
(i.e., tmp_crypto_info.version == TLS_1_2_VERSION) on the first fetch,
it can still be changed before the second copy_from_user() and hence
a version different from TLS_1_2_VERSION may be copied into crypto_info.
This patch moves the version check after the second userspace fetch.

Signed-off-by: Meng Xu 
---
 net/tls/tls_main.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 60aff60..d4a7bc6 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -354,12 +354,6 @@ static int do_tls_setsockopt_tx(struct sock *sk, char __user *optval,
goto out;
}
 
-   /* check version */
-   if (tmp_crypto_info.version != TLS_1_2_VERSION) {
-   rc = -ENOTSUPP;
-   goto out;
-   }
-
/* get user crypto info */
	crypto_info = &ctx->crypto_send;
 
@@ -382,6 +376,12 @@ static int do_tls_setsockopt_tx(struct sock *sk, char __user *optval,
rc = -EFAULT;
goto err_crypto_info;
}
+
+   /* check version */
+   if (crypto_info->version != TLS_1_2_VERSION) {
+   rc = -ENOTSUPP;
+   goto err_crypto_info;
+   }
break;
}
default:
-- 
2.7.4



Re: [PATCH] mac80211: aead api to reduce redundancy

2017-09-24 Thread Johannes Berg
On Sun, 2017-09-24 at 01:40 -0400, Xiang Gao wrote:
> Currently, the aes_ccm.c and aes_gcm.c are almost line by line
> copy of each other. This patch reduce code redundancy by moving
> the code in these two files to crypto/aead_api.c to make it a
> higher level aead api. The aes_ccm.c and aes_gcm.c are removed
> and all the functions are now implemented in their headers using
> the newly added aead api.
> 
No objection from me, though I'd ask you to respin with the indentation
fixed up a bit.

johannes


RE: [RFC] endianness issues in drivers/net/ethernet/qlogic/qed

2017-09-24 Thread Tayar, Tomer

>   "qed: Utilize FW 8.10.3.0" has attempted some endianness annotations
> in that driver; unfortunately, either annotations are BS or the driver is genuinely
> broken on big-endian hosts.
[...]
> Is that driver intended to be used on big-endian hosts at all?

Thanks for taking the time to review our driver and pointing out these mistakes.
Support for BE machines is planned to be added but currently it is not 
available.
However, the structures which are used to abstract the HW carry endianness
annotations.
Obviously, there are some misses and some annotations were added when not 
required.
We will prepare a patch that fixes the issues you pointed out and similar ones.


Re: [RESEND] Re: usb/net/p54: trying to register non-static key in p54_unregister_leds

2017-09-24 Thread Johannes Berg
On Sat, 2017-09-23 at 21:37 +0200, Christian Lamparter wrote:

> But this also begs the question: Is this really working then?
> From what I can tell, if CONFIG_LOCKDEP is not set then there's no
> BUG no WARN, no other splat or any other odd system behaviour. Does
> [cancel | flush]_[delayed_]work[_sync] really "just work" by
> *accident*, as long the delayed_work | work_struct is zeroed out? 

It looks like it does, but I'm not sure it's not more or less by
accident. Look at get_work_pool() for example, it might actually return
non-NULL in this case, and then in start_flush_work() you'll probably
fall into one of the few "already_gone" cases.

> And should it work in the future as well?

I guess it's not really guaranteed, the API doesn't state anything to
that effect. Not that I'm looking forward to a new workqueue rewrite ;)

johannes


[PATCH] net: bcm63xx_enet: Use setup_timer and mod_timer

2017-09-24 Thread Himanshu Jha
Use setup_timer and mod_timer API instead of structure assignments.

This is done using Coccinelle, and the semantic patch used
for this is as follows:

@@
expression x,y,z,a,b;
@@

-init_timer (&x);
+setup_timer (&x, y, z);
+mod_timer (&x, b);
-x.function = y;
-x.data = z;
-x.expires = b;
-add_timer(&x);

Signed-off-by: Himanshu Jha 
---
 drivers/net/ethernet/broadcom/bcm63xx_enet.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcm63xx_enet.c b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
index f8f..c6221f0 100644
--- a/drivers/net/ethernet/broadcom/bcm63xx_enet.c
+++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
@@ -2331,11 +2331,8 @@ static int bcm_enetsw_open(struct net_device *dev)
}
 
/* start phy polling timer */
-   init_timer(&priv->swphy_poll);
-   priv->swphy_poll.function = swphy_poll_timer;
-   priv->swphy_poll.data = (unsigned long)priv;
-   priv->swphy_poll.expires = jiffies;
-   add_timer(&priv->swphy_poll);
+   setup_timer(&priv->swphy_poll, swphy_poll_timer, (unsigned long)priv);
+   mod_timer(&priv->swphy_poll, jiffies);
return 0;
 
 out:
-- 
2.7.4



Re: [PATCH net-next 10/10] net: hns3: Add mqprio support when interacting with network stack

2017-09-24 Thread Jiri Pirko
Sat, Sep 23, 2017 at 02:47:20AM CEST, linyunsh...@huawei.com wrote:
>Hi, Jiri
>
>On 2017/9/23 0:03, Jiri Pirko wrote:
>> Fri, Sep 22, 2017 at 04:11:51PM CEST, linyunsh...@huawei.com wrote:
>>> Hi, Jiri
>>>
> - if (!tc) {
> + if (if_running) {
> + (void)hns3_nic_net_stop(netdev);
> + msleep(100);
> + }
> +
> + ret = (kinfo->dcb_ops && kinfo->dcb_ops->setup_tc) ?
> + kinfo->dcb_ops->setup_tc(h, tc, prio_tc) : -EOPNOTSUPP;
>>>
 This is most odd. Why do you call dcb_ops from ndo_setup_tc callback?
 Why are you mixing this together? prio->tc mapping can be done
 directly in dcbnl
>>>
>>> Here is what we do in dcb_ops->setup_tc:
>>> Firstly, if current tc num is different from the tc num
>>> that user provide, then we setup the queues for each
>>> tc.
>>>
>>> Secondly, we tell hardware the pri to tc mapping that
>>> the stack is using. In rx direction, our hardware need
>>> that mapping to put different packet into different tc'
>>> queues according to the priority of the packet, then
>>> rss decides which specific queue in the tc should the
>>> packet goto.
>>>
>>> By mixing, I suppose you meant why we need the
>>> pri to tc infomation?
>> 
>> by mixing, I mean what I wrote. You are calling dcb_ops callback from
>> ndo_setup_tc callback. So you are mixing DCBNL subsystem and TC
>> subsystem. Why? Why do you need sch_mqprio? Why DCBNL is not enough for
>> all?
>
>When using lldptool, dcbnl is involved.
>
>But when using tc qdisc, dcbnl is not involved. Below is a key
>call graph in the kernel when the tc qdisc cmd is executed.
>
>cmd:
>tc qdisc add dev eth0 root handle 1:0 mqprio num_tc 4 map 1 2 3 3 1 3 1 1 hw 1
>
>call graph:
>rtnetlink_rcv_msg -> tc_modify_qdisc -> qdisc_create -> mqprio_init ->
>hns3_nic_setup_tc
>
>When hns3_nic_setup_tc is called, we need to know the tc num and
>prio_tc mapping from the tc_mqprio_qopt which is provided as a parameter
>of the ndo_setup_tc function, and dcb_ops is our hardware-specific
>method to set up the tc-related parameters in the hardware, so this is why
>we call the dcb_ops callback in the ndo_setup_tc callback.
>
>I hope this will answer your question, thanks for your time.

Okay. I understand that you have a usecase for mqprio mapping offload
without lldptool being involved. Ok. I believe it is wrong to call dcb_ops
from tc callback. You should have a generic layer inside the driver and
call it from both dcb_ops and tc callbacks.

Also, what happens if I run lldptool concurrently with mqprio? Who wins
and is going to configure the mapping?


>
>> 
>> 
>> 
>>> I hope I did not misunderstand your question, thanks
>>> for your time reviewing.
>> 
>> .
>> 
>


Re: [PATCH v2 net-next 3/4] qed: Fix maximum number of CQs for iWARP

2017-09-24 Thread Leon Romanovsky
On Sun, Sep 24, 2017 at 12:09:44PM +0300, Michal Kalderon wrote:
> The maximum number of CQs supported is bound to the number
> of connections supported, which differs between RoCE and iWARP.
>
> This fixes a crash that occurred in iWARP when running 1000 sessions
> using perftest.
>
> Fixes: 67b40dccc45 ("qed: Implement iWARP initialization, teardown and qp operations")
>
> Signed-off-by: Michal Kalderon 
> Signed-off-by: Ariel Elior 
> ---
>  drivers/net/ethernet/qlogic/qed/qed_rdma.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>

Thanks,
Reviewed-by: Leon Romanovsky 




[PATCH v2 net-next 1/4] qed: Add iWARP enablement support

2017-09-24 Thread Michal Kalderon
This patch is the last of the initial iWARP patch series. It
adds the possibility to actually detect iWARP from the device and enable
it in the critical locations which basically make iWARP available.

It wasn't submitted until now as iWARP hadn't been accepted into
the rdma tree.

Signed-off-by: Michal Kalderon 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/qlogic/qed/qed_cxt.c |  6 ++
 drivers/net/ethernet/qlogic/qed/qed_mcp.c | 10 +-
 drivers/net/ethernet/qlogic/qed/qed_rdma.c|  5 -
 drivers/net/ethernet/qlogic/qed/qed_sp_commands.c |  1 +
 4 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_cxt.c b/drivers/net/ethernet/qlogic/qed/qed_cxt.c
index af106be..afd07ad 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_cxt.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_cxt.c
@@ -2069,6 +2069,12 @@ static void qed_rdma_set_pf_params(struct qed_hwfn *p_hwfn,
 
num_srqs = min_t(u32, 32 * 1024, p_params->num_srqs);
 
+   if (p_hwfn->mcp_info->func_info.protocol == QED_PCI_ETH_RDMA) {
+   DP_NOTICE(p_hwfn,
+ "Current day drivers don't support RoCE & iWARP simultaneously on the same PF. Default to RoCE-only\n");
+   p_hwfn->hw_info.personality = QED_PCI_ETH_ROCE;
+   }
+
switch (p_hwfn->hw_info.personality) {
case QED_PCI_ETH_IWARP:
/* Each QP requires one connection */
diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.c b/drivers/net/ethernet/qlogic/qed/qed_mcp.c
index 376485d..8b99c7d 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_mcp.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_mcp.c
@@ -1691,12 +1691,12 @@ int qed_mcp_get_media_type(struct qed_dev *cdev, u32 *p_media_type)
case FW_MB_PARAM_GET_PF_RDMA_ROCE:
*p_proto = QED_PCI_ETH_ROCE;
break;
-   case FW_MB_PARAM_GET_PF_RDMA_BOTH:
-   DP_NOTICE(p_hwfn,
- "Current day drivers don't support RoCE & iWARP. Default to RoCE-only\n");
-   *p_proto = QED_PCI_ETH_ROCE;
-   break;
case FW_MB_PARAM_GET_PF_RDMA_IWARP:
+   *p_proto = QED_PCI_ETH_IWARP;
+   break;
+   case FW_MB_PARAM_GET_PF_RDMA_BOTH:
+   *p_proto = QED_PCI_ETH_RDMA;
+   break;
default:
DP_NOTICE(p_hwfn,
  "MFW answers GET_PF_RDMA_PROTOCOL but param is %08x\n",
diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.c b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
index 6fb9951..06715f7 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_rdma.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
@@ -156,7 +156,10 @@ static int qed_rdma_alloc(struct qed_hwfn *p_hwfn,
return rc;
 
p_hwfn->p_rdma_info = p_rdma_info;
-   p_rdma_info->proto = PROTOCOLID_ROCE;
+   if (QED_IS_IWARP_PERSONALITY(p_hwfn))
+   p_rdma_info->proto = PROTOCOLID_IWARP;
+   else
+   p_rdma_info->proto = PROTOCOLID_ROCE;
 
num_cons = qed_cxt_get_proto_cid_count(p_hwfn, p_rdma_info->proto,
   NULL);
diff --git a/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c b/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c
index 46d0c3c..a1d33f3 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c
@@ -377,6 +377,7 @@ int qed_sp_pf_start(struct qed_hwfn *p_hwfn,
p_ramrod->personality = PERSONALITY_ISCSI;
break;
case QED_PCI_ETH_ROCE:
+   case QED_PCI_ETH_IWARP:
p_ramrod->personality = PERSONALITY_RDMA_AND_ETH;
break;
default:
-- 
1.8.3.1



[PATCH v2 net-next 0/4] qed: iWARP fixes and enhancements

2017-09-24 Thread Michal Kalderon
This patch series includes several fixes and enhancements
related to iWARP.

Patch #1 is actually the last of the initial iWARP submission.
It has been delayed until now as I wanted to make sure that qedr
supports iWARP prior to enabling iWARP device detection.

iWARP changes in RDMA tree have been accepted and targeted at
kernel 4.15, therefore, all iWARP fixes for this cycle are
submitted to net-next.

Changes from v1->v2 
  - Added "Fixes:" tag to commit message of patch #3

Signed-off-by: michal.kalde...@cavium.com
Signed-off-by: Ariel Elior 

Michal Kalderon (4):
  qed: Add iWARP enablement support
  qed: Add iWARP out of order support
  qed: Fix maximum number of CQs for iWARP
  qed: iWARP - Add check for errors on a SYN packet

 drivers/net/ethernet/qlogic/qed/qed_cxt.c |  6 +++
 drivers/net/ethernet/qlogic/qed/qed_iwarp.c   | 52 +++
 drivers/net/ethernet/qlogic/qed/qed_iwarp.h   | 11 -
 drivers/net/ethernet/qlogic/qed/qed_ll2.c |  1 +
 drivers/net/ethernet/qlogic/qed/qed_mcp.c | 10 ++---
 drivers/net/ethernet/qlogic/qed/qed_rdma.c| 24 +++
 drivers/net/ethernet/qlogic/qed/qed_sp_commands.c |  1 +
 include/linux/qed/qed_ll2_if.h|  1 +
 8 files changed, 91 insertions(+), 15 deletions(-)

-- 
1.8.3.1



[PATCH v2 net-next 2/4] qed: Add iWARP out of order support

2017-09-24 Thread Michal Kalderon
iWARP requires OOO support, which is already provided by the ll2
interface (until now it was used only for iSCSI offload).
The changes mostly include opening a dedicated ll2 connection for
OOO and notifying the FW about the handle id.

Signed-off-by: Michal Kalderon 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/qlogic/qed/qed_iwarp.c | 44 +
 drivers/net/ethernet/qlogic/qed/qed_iwarp.h | 11 +++-
 drivers/net/ethernet/qlogic/qed/qed_rdma.c  |  7 +++--
 3 files changed, 59 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
index 9d989c9..568e985 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
@@ -41,6 +41,7 @@
 #include "qed_rdma.h"
 #include "qed_reg_addr.h"
 #include "qed_sp.h"
+#include "qed_ooo.h"
 
 #define QED_IWARP_ORD_DEFAULT  32
 #define QED_IWARP_IRD_DEFAULT  32
@@ -119,6 +120,13 @@ static void qed_iwarp_cid_cleaned(struct qed_hwfn *p_hwfn, u32 cid)
	spin_unlock_bh(&p_hwfn->p_rdma_info->lock);
 }
 
+void qed_iwarp_init_fw_ramrod(struct qed_hwfn *p_hwfn,
+ struct iwarp_init_func_params *p_ramrod)
+{
+   p_ramrod->ll2_ooo_q_index = RESC_START(p_hwfn, QED_LL2_QUEUE) +
+   p_hwfn->p_rdma_info->iwarp.ll2_ooo_handle;
+}
+
 static int qed_iwarp_alloc_cid(struct qed_hwfn *p_hwfn, u32 *cid)
 {
int rc;
@@ -1876,6 +1884,16 @@ static int qed_iwarp_ll2_stop(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt)
iwarp_info->ll2_syn_handle = QED_IWARP_HANDLE_INVAL;
}
 
+   if (iwarp_info->ll2_ooo_handle != QED_IWARP_HANDLE_INVAL) {
+   rc = qed_ll2_terminate_connection(p_hwfn,
+ iwarp_info->ll2_ooo_handle);
+   if (rc)
+   DP_INFO(p_hwfn, "Failed to terminate ooo connection\n");
+
+   qed_ll2_release_connection(p_hwfn, iwarp_info->ll2_ooo_handle);
+   iwarp_info->ll2_ooo_handle = QED_IWARP_HANDLE_INVAL;
+   }
+
qed_llh_remove_mac_filter(p_hwfn,
  p_ptt, p_hwfn->p_rdma_info->iwarp.mac_addr);
return rc;
@@ -1927,10 +1945,12 @@ static int qed_iwarp_ll2_stop(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt)
struct qed_iwarp_info *iwarp_info;
struct qed_ll2_acquire_data data;
struct qed_ll2_cbs cbs;
+   u16 n_ooo_bufs;
int rc = 0;
 
	iwarp_info = &p_hwfn->p_rdma_info->iwarp;
iwarp_info->ll2_syn_handle = QED_IWARP_HANDLE_INVAL;
+   iwarp_info->ll2_ooo_handle = QED_IWARP_HANDLE_INVAL;
 
iwarp_info->max_mtu = params->max_mtu;
 
@@ -1978,6 +1998,29 @@ static int qed_iwarp_ll2_stop(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt)
if (rc)
goto err;
 
+   /* Start OOO connection */
+   data.input.conn_type = QED_LL2_TYPE_OOO;
+   data.input.mtu = params->max_mtu;
+
+   n_ooo_bufs = (QED_IWARP_MAX_OOO * QED_IWARP_RCV_WND_SIZE_DEF) /
+iwarp_info->max_mtu;
+   n_ooo_bufs = min_t(u32, n_ooo_bufs, QED_IWARP_LL2_OOO_MAX_RX_SIZE);
+
+   data.input.rx_num_desc = n_ooo_bufs;
+   data.input.rx_num_ooo_buffers = n_ooo_bufs;
+
+   data.input.tx_max_bds_per_packet = 1;   /* will never be fragmented */
+   data.input.tx_num_desc = QED_IWARP_LL2_OOO_DEF_TX_SIZE;
+   data.p_connection_handle = &iwarp_info->ll2_ooo_handle;
+
+   rc = qed_ll2_acquire_connection(p_hwfn, &data);
+   if (rc)
+   goto err;
+
+   rc = qed_ll2_establish_connection(p_hwfn, iwarp_info->ll2_ooo_handle);
+   if (rc)
+   goto err;
+
return rc;
 err:
qed_iwarp_ll2_stop(p_hwfn, p_ptt);
@@ -2014,6 +2057,7 @@ int qed_iwarp_setup(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt,
 
qed_spq_register_async_cb(p_hwfn, PROTOCOLID_IWARP,
  qed_iwarp_async_event);
+   qed_ooo_setup(p_hwfn);
 
return qed_iwarp_ll2_start(p_hwfn, params, p_ptt);
 }
diff --git a/drivers/net/ethernet/qlogic/qed/qed_iwarp.h b/drivers/net/ethernet/qlogic/qed/qed_iwarp.h
index 148ef3c..9e2bfde 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_iwarp.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_iwarp.h
@@ -47,7 +47,12 @@ enum qed_iwarp_qp_state {
 #define QED_IWARP_LL2_SYN_TX_SIZE   (128)
 #define QED_IWARP_LL2_SYN_RX_SIZE   (256)
 #define QED_IWARP_MAX_SYN_PKT_SIZE  (128)
-#define QED_IWARP_HANDLE_INVAL (0xff)
+
+#define QED_IWARP_LL2_OOO_DEF_TX_SIZE   (256)
+#define QED_IWARP_MAX_OOO  (16)
+#define QED_IWARP_LL2_OOO_MAX_RX_SIZE   (16384)
+
+#define QED_IWARP_HANDLE_INVAL (0xff)
 
 struct qed_iwarp_ll2_buff {
void *data;
@@ -67,6 +72,7 @@ struct qed_iwarp_info {
u8 crc_needed;
u8 tcp_flags;
 

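As a side note, the OOO rx ring sizing done in the hunk above can be sketched as a standalone computation. This is only an illustration: the 256 KB receive window default is an assumption here (the driver defines its own `QED_IWARP_RCV_WND_SIZE_DEF`), and the helper name is invented.

```c
#include <assert.h>

/* Values taken from the qed_iwarp.h hunk in this series. */
#define QED_IWARP_MAX_OOO             16
#define QED_IWARP_LL2_OOO_MAX_RX_SIZE 16384

/* Assumed default receive window size; the real driver defines
 * QED_IWARP_RCV_WND_SIZE_DEF elsewhere. */
#define RCV_WND_SIZE_DEF (256 * 1024)

/* Number of OOO rx buffers: enough MTU-sized buffers to cover
 * QED_IWARP_MAX_OOO receive windows, capped at the ring maximum. */
static unsigned int iwarp_ooo_rx_bufs(unsigned int mtu)
{
	unsigned int n = (QED_IWARP_MAX_OOO * RCV_WND_SIZE_DEF) / mtu;

	return n < QED_IWARP_LL2_OOO_MAX_RX_SIZE ?
	       n : QED_IWARP_LL2_OOO_MAX_RX_SIZE;
}
```

With a 1500-byte MTU this yields well under the 16384 cap, while very small MTUs hit the cap; the same buffer count is used for both `rx_num_desc` and `rx_num_ooo_buffers` in the patch.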
[PATCH v2 net-next 4/4] qed: iWARP - Add check for errors on a SYN packet

2017-09-24 Thread Michal Kalderon
A SYN packet which arrives with errors from FW should be dropped.
This required adding an additional field to the ll2
rx completion data.

Signed-off-by: Michal Kalderon 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/qlogic/qed/qed_iwarp.c | 8 
 drivers/net/ethernet/qlogic/qed/qed_ll2.c   | 1 +
 include/linux/qed/qed_ll2_if.h  | 1 +
 3 files changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
index 568e985..8fc9c811 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
@@ -1733,6 +1733,14 @@ int qed_iwarp_reject(void *rdma_cxt, struct qed_iwarp_reject_in *iparams)
 
	memset(&cm_info, 0, sizeof(cm_info));
ll2_syn_handle = p_hwfn->p_rdma_info->iwarp.ll2_syn_handle;
+
+   /* Check if packet was received with errors... */
+   if (data->err_flags) {
+   DP_NOTICE(p_hwfn, "Error received on SYN packet: 0x%x\n",
+ data->err_flags);
+   goto err;
+   }
+
if (GET_FIELD(data->parse_flags,
  PARSING_AND_ERR_FLAGS_L4CHKSMWASCALCULATED) &&
GET_FIELD(data->parse_flags, PARSING_AND_ERR_FLAGS_L4CHKSMERROR)) {
diff --git a/drivers/net/ethernet/qlogic/qed/qed_ll2.c b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
index c06ad4f..250afa5 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_ll2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
@@ -413,6 +413,7 @@ static void qed_ll2_rxq_parse_reg(struct qed_hwfn *p_hwfn,
  struct qed_ll2_comp_rx_data *data)
 {
data->parse_flags = le16_to_cpu(p_cqe->rx_cqe_fp.parse_flags.flags);
+   data->err_flags = le16_to_cpu(p_cqe->rx_cqe_fp.err_flags.flags);
data->length.packet_length =
le16_to_cpu(p_cqe->rx_cqe_fp.packet_length);
data->vlan = le16_to_cpu(p_cqe->rx_cqe_fp.vlan);
diff --git a/include/linux/qed/qed_ll2_if.h b/include/linux/qed/qed_ll2_if.h
index dd7a3b8..89fa0bb 100644
--- a/include/linux/qed/qed_ll2_if.h
+++ b/include/linux/qed/qed_ll2_if.h
@@ -101,6 +101,7 @@ struct qed_ll2_comp_rx_data {
void *cookie;
dma_addr_t rx_buf_addr;
u16 parse_flags;
+   u16 err_flags;
u16 vlan;
bool b_last_packet;
u8 connection_handle;
-- 
1.8.3.1
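The drop condition this patch introduces is simple enough to sketch in isolation. Names below are illustrative; the real code operates on `struct qed_ll2_comp_rx_data` inside the LL2 SYN completion path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative subset of the rx completion data: parse_flags existed
 * before this patch; err_flags is the field the patch adds. */
struct rx_comp {
	uint16_t parse_flags;
	uint16_t err_flags;
};

/* A SYN completion carrying any FW error flag is dropped before the
 * existing L4-checksum checks on parse_flags are even consulted. */
static bool syn_should_drop(const struct rx_comp *d)
{
	return d->err_flags != 0;
}
```

The ordering matters: checking `err_flags` first means a corrupted SYN never reaches the checksum-validation logic that follows it in `qed_iwarp.c`.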



[PATCH v2 net-next 3/4] qed: Fix maximum number of CQs for iWARP

2017-09-24 Thread Michal Kalderon
The maximum number of CQs supported is bound to the number
of connections supported, which differs between RoCE and iWARP.

This fixes a crash that occurred in iWARP when running 1000 sessions
using perftest.

Fixes: 67b40dccc45 ("qed: Implement iWARP initialization, teardown and qp operations")

Signed-off-by: Michal Kalderon 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/qlogic/qed/qed_rdma.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.c b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
index 4f46f28..c8c4b39 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_rdma.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
@@ -209,11 +209,11 @@ static int qed_rdma_alloc(struct qed_hwfn *p_hwfn,
goto free_pd_map;
}
 
-   /* Allocate bitmap for cq's. The maximum number of CQs is bounded to
-* twice the number of QPs.
+   /* Allocate bitmap for cq's. The maximum number of CQs is bound to
+* the number of connections we support. (num_qps in iWARP or
+* num_qps/2 in RoCE).
 */
-   rc = qed_rdma_bmap_alloc(p_hwfn, &p_rdma_info->cq_map,
-p_rdma_info->num_qps * 2, "CQ");
+   rc = qed_rdma_bmap_alloc(p_hwfn, &p_rdma_info->cq_map, num_cons, "CQ");
if (rc) {
DP_VERBOSE(p_hwfn, QED_MSG_RDMA,
   "Failed to allocate cq bitmap, rc = %d\n", rc);
@@ -222,10 +222,10 @@ static int qed_rdma_alloc(struct qed_hwfn *p_hwfn,
 
/* Allocate bitmap for toggle bit for cq icids
 * We toggle the bit every time we create or resize cq for a given icid.
-* The maximum number of CQs is bounded to  twice the number of QPs.
+* Size needs to equal the size of the cq bmap.
 */
	rc = qed_rdma_bmap_alloc(p_hwfn, &p_rdma_info->toggle_bits,
-p_rdma_info->num_qps * 2, "Toggle");
+num_cons, "Toggle");
if (rc) {
DP_VERBOSE(p_hwfn, QED_MSG_RDMA,
   "Failed to allocate toogle bits, rc = %d\n", rc);
-- 
1.8.3.1
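The sizing change in this patch can be sketched as follows. Helper names are invented for illustration; the driver obtains `num_cons` from its cid map rather than computing it like this.

```c
#include <assert.h>
#include <stdbool.h>

/* Connections per QP differ by protocol: an iWARP QP uses one
 * connection, while a RoCE QP uses two (requester + responder).
 * Hence num_qps == num_cons in iWARP and num_qps == num_cons / 2
 * in RoCE, matching the comment in the patch. */
static unsigned int rdma_num_cons(unsigned int num_qps, bool is_iwarp)
{
	return is_iwarp ? num_qps : num_qps * 2;
}

/* After the patch, the CQ (and toggle-bit) bitmaps are sized by the
 * connection count. The old flat num_qps * 2 over-sized them for
 * iWARP, allowing more CQs than the device has connections. */
static unsigned int cq_bmap_size(unsigned int num_qps, bool is_iwarp)
{
	return rdma_num_cons(num_qps, is_iwarp);
}
```

For RoCE the result is unchanged (`num_qps * 2` equals `num_cons`), which is why only iWARP, as in the 1000-session perftest run mentioned above, exposed the bug.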