Re: [PATCH net] ipv4: route: fix inet_rtm_getroute induced crash

2017-08-13 Thread Florian Westphal
David Ahern  wrote:
> On 8/13/17 4:52 PM, Florian Westphal wrote:
> > "ip route get $daddr iif eth0 from $saddr" causes:
> >  BUG: KASAN: use-after-free in ip_route_input_rcu+0x1535/0x1b50
> >  Call Trace:
> >   ip_route_input_rcu+0x1535/0x1b50
> >   ip_route_input_noref+0xf9/0x190
> >   tcp_v4_early_demux+0x1a4/0x2b0
> >   ip_rcv+0xbcb/0xc05
> >   __netif_receive_skb+0x9c/0xd0
> >   netif_receive_skb_internal+0x5a8/0x890
> > 
> > Problem is that inet_rtm_getroute calls either ip_route_input_rcu (if an
> > iif was provided) or ip_route_output_key_hash_rcu.
> > 
> > But ip_route_input_rcu, unlike ip_route_output_key_hash_rcu, already
> > associates the dst_entry with the skb.  This clears the SKB_DST_NOREF
> > bit (i.e. skb_dst_drop will release/free the entry while it should not).
> > 
> > Thus only set the dst if we called ip_route_output_key_hash_rcu().
> > 
> > I tested this patch by running:
> >  while true;do ip r get 10.0.1.2;done > /dev/null &
> >  while true;do ip r get 10.0.1.2 iif eth0  from 10.0.1.1;done > /dev/null &
> > ... and saw no crash or memory leak.
> > 
> > Cc: Roopa Prabhu 
> > Cc: David Ahern 
> > Fixes: ba52d61e0ff ("ipv4: route: restore skb_dst_set in inet_rtm_getroute")
> > Signed-off-by: Florian Westphal 
> 
> Have looked at the change in detail, but are you sure that is the
> correct Fixes?

I'm reasonably sure, yes:

if (iif) {
  ip_route_input_rcu // 1 might get NOREF dst
} else {
  ip_route_output_key_hash_rcu // 2 always takes dst ref
}
skb_dst_set /* 3 loses NOREF in case of 1) */

> Running these:
>   while true;do ip r get 10.1.1.3;done > /dev/null &
>   while true;do ip r get 10.1.1.3 iif eth0  from 192.16.1.1;done >
> /dev/null &
> 
> at various commits:
>   ffe95ecf3a2 - KASAN backtraces

Right, this is broken state (has both ba52d61e0ff and 3765d35ed8b9)

>   374d801522f - works fine

This is fine, it lacks 3765d35ed8b9:
both branches take a reference on dst so '3' above has no side effect.

>   ba52d61e0ff - negative refcnt messages

AFAICS this is before dst gc removal, I guess (but did not
check) that KASAN vs. refcount just comes from this.

>   a5e2ee5da47 - works fine

Should cause a memory leak when iif is not given (ref on dst is
taken but not released in case of 2), ba52d61e0ff cured this but
adds the problem described here).

Does that make it clearer?


Re: [patch v1 1/2] Allow Mellanox network vendor to be configured if only I2C bus is configured

2017-08-13 Thread Leon Romanovsky
On Sun, Aug 13, 2017 at 05:25:21PM -0700, David Miller wrote:
> From: Ohad Oz 
> Date: Sun, 13 Aug 2017 15:26:56 +
>
> >
> >
> >> -Original Message-
> >> From: Leon Romanovsky [mailto:l...@kernel.org]
> >> Sent: Saturday, August 12, 2017 5:37 PM
> >> To: Ohad Oz 
> >> Cc: da...@davemloft.net; netdev@vger.kernel.org; j...@resnulli.us; Saeed
> >> Mahameed ; Vadim Pasternak
> >> ; system-sw-low-level  >> le...@mellanox.com>
> >> Subject: Re: [patch v1 1/2] Allow Mellanox network vendor to be configured
> >> if only I2C bus is configured
> >>
> >> On Thu, Aug 10, 2017 at 05:11:51PM +, Ohad Oz wrote:
> >> > Patch allows Mellanox devices on system with no PCI, but with I2C only.
> >> >
> >>
> >> Did you test mlx5 device on such system? Did it work for you?
> >
> > Yes, I did. With PCI config set to disable mlx5 drivers are not built.
> > Only the following:
> > /build/drivers/net/Ethernet/mellanox/mlxsw/mlxsw_core.ko
> > /build/drivers/net/Ethernet/mellanox /mlxsw/mlxsw_i2c.ko
> > /build/drivers/net/Ethernet/mellanox /mlxsw/mlxsw_minimal.ko
> >
> > While with both options on all drivers are built inc mlx5.
>
> I'm not so sure he's interested if things build or not.
>
> He's asking you if you actually used a Mellanox device with this
> driver with PCI disabled and only I2C available.

Thanks David, you are absolutely right.

The commit message and more important, Kconfig entry, is misleading the 
potential
users by promising them that all devices under drivers/net/ethernet/mellanox
folder are working with i2c.

I have no technical objection to the idea of this patch, but it needs to be 
resubmitted after
Ohad ensures that mlx4/mlx5 are not visible in menuconfig for i2c
systems and he fixes Kconfig description together with commit log.

Thanks


signature.asc
Description: PGP signature


Re: [PATCH net] ipv4: route: fix inet_rtm_getroute induced crash

2017-08-13 Thread David Ahern
On 8/13/17 4:52 PM, Florian Westphal wrote:
> "ip route get $daddr iif eth0 from $saddr" causes:
>  BUG: KASAN: use-after-free in ip_route_input_rcu+0x1535/0x1b50
>  Call Trace:
>   ip_route_input_rcu+0x1535/0x1b50
>   ip_route_input_noref+0xf9/0x190
>   tcp_v4_early_demux+0x1a4/0x2b0
>   ip_rcv+0xbcb/0xc05
>   __netif_receive_skb+0x9c/0xd0
>   netif_receive_skb_internal+0x5a8/0x890
> 
> Problem is that inet_rtm_getroute calls either ip_route_input_rcu (if an
> iif was provided) or ip_route_output_key_hash_rcu.
> 
> But ip_route_input_rcu, unlike ip_route_output_key_hash_rcu, already
> associates the dst_entry with the skb.  This clears the SKB_DST_NOREF
> bit (i.e. skb_dst_drop will release/free the entry while it should not).
> 
> Thus only set the dst if we called ip_route_output_key_hash_rcu().
> 
> I tested this patch by running:
>  while true;do ip r get 10.0.1.2;done > /dev/null &
>  while true;do ip r get 10.0.1.2 iif eth0  from 10.0.1.1;done > /dev/null &
> ... and saw no crash or memory leak.
> 
> Cc: Roopa Prabhu 
> Cc: David Ahern 
> Fixes: ba52d61e0ff ("ipv4: route: restore skb_dst_set in inet_rtm_getroute")
> Signed-off-by: Florian Westphal 

Have looked at the change in detail, but are you sure that is the
correct Fixes?

Running these:
  while true;do ip r get 10.1.1.3;done > /dev/null &
  while true;do ip r get 10.1.1.3 iif eth0  from 192.16.1.1;done >
/dev/null &

at various commits:
  ffe95ecf3a2 - KASAN backtraces
  374d801522f - works fine
  ba52d61e0ff - negative refcnt messages
  a5e2ee5da47 - works fine


Re: [PATCH net-next v2 2/3] net/ncsi: Configure VLAN tag filter

2017-08-13 Thread Samuel Mendoza-Jonas
On Mon, 2017-08-14 at 11:39 +0930, Joel Stanley wrote:
> On Mon, Aug 14, 2017 at 10:59 AM, Samuel Mendoza-Jonas
>  wrote:
> > Make use of the ndo_vlan_rx_{add,kill}_vid callbacks to have the NCSI
> > stack process new VLAN tags and configure the channel VLAN filter
> > appropriately.
> > Several VLAN tags can be set and a "Set VLAN Filter" packet must be sent
> > for each one, meaning the ncsi_dev_state_config_svf state must be
> > repeated. An internal list of VLAN tags is maintained, and compared
> > against the current channel's ncsi_channel_filter in order to keep track
> > within the state. VLAN filters are removed in a similar manner, with the
> > introduction of the ncsi_dev_state_config_clear_vids state. The maximum
> > number of VLAN tag filters is determined by the "Get Capabilities"
> > response from the channel.
> > 
> > Signed-off-by: Samuel Mendoza-Jonas 
> 
> I've given this some testing, but there are a few things I saw below
> that we should sort out.
> 
> > --- a/net/ncsi/ncsi-manage.c
> > +++ b/net/ncsi/ncsi-manage.c
> > @@ -38,6 +38,22 @@ static inline int ncsi_filter_size(int table)
> > return sizes[table];
> >  }
> > 
> > +u32 *ncsi_get_filter(struct ncsi_channel *nc, int table, int index)
> > +{
> > +   struct ncsi_channel_filter *ncf;
> > +   int size;
> > +
> > +   ncf = nc->filters[table];
> > +   if (!ncf)
> > +   return NULL;
> > +
> > +   size = ncsi_filter_size(table);
> > +   if (size < 0)
> > +   return NULL;
> > +
> > +   return ncf->data + size * index;
> > +}
> > +
> >  int ncsi_find_filter(struct ncsi_channel *nc, int table, void *data)
> >  {
> > struct ncsi_channel_filter *ncf;
> > @@ -58,7 +74,7 @@ int ncsi_find_filter(struct ncsi_channel *nc, int table, 
> > void *data)
> > index = -1;
> > while ((index = find_next_bit(bitmap, ncf->total, index + 1))
> >< ncf->total) {
> > -   if (!memcmp(ncf->data + size * index, data, size)) {
> > +   if (!data || !memcmp(ncf->data + size * index, data, size)) 
> > {
> 
> Not clear why this check is required?

Right, this could use a comment. This is a small workaround to having a
way to finding an arbitrary active filter, below in clear_one_vid(). We
pass NULL as a way of saying "return any enabled filter".

> 
> > spin_unlock_irqrestore(>lock, flags);
> > return index;
> > }
> > @@ -639,6 +655,83 @@ static void ncsi_suspend_channel(struct ncsi_dev_priv 
> > *ndp)
> > nd->state = ncsi_dev_state_functional;
> >  }
> > 
> > +/* Check the VLAN filter bitmap for a set filter, and construct a
> > + * "Set VLAN Filter - Disable" packet if found.
> > + */
> > +static int clear_one_vid(struct ncsi_dev_priv *ndp, struct ncsi_channel 
> > *nc,
> > +struct ncsi_cmd_arg *nca)
> > +{
> > +   int index;
> > +   u16 vid;
> > +
> > +   index = ncsi_find_filter(nc, NCSI_FILTER_VLAN, NULL);
> > +   if (index < 0) {
> > +   /* Filter table empty */
> > +   return -1;
> > +   }
> > +
> > +   vid = *(u16 *)ncsi_get_filter(nc, NCSI_FILTER_VLAN, index);
> 
> You just added this function that returns a pointer to a u32. It's
> strange to see the only call site then throw half of it away.

The problem here is that the single ncsi_channel_filter struct is used to
represent several different filters, and the filter data is stored in a
u32 data[] type. We cast to u16 in clear_one_vid because we know it's a
VLAN tag (NCSI_FILTER_VLAN), although we probably should call
ncsi_filter_size() to be sure.
> 
> Also, ncsi_get_filter can return NULL.

That is indeed a problem though!

> 
> > +   netdev_printk(KERN_DEBUG, ndp->ndev.dev,
> > + "ncsi: removed vlan tag %u at index %d\n",
> > + vid, index + 1);
> > +   ncsi_remove_filter(nc, NCSI_FILTER_VLAN, index);
> > +
> > +   nca->type = NCSI_PKT_CMD_SVF;
> > +   nca->words[1] = vid;
> > +   /* HW filter index starts at 1 */
> > +   nca->bytes[6] = index + 1;
> > +   nca->bytes[7] = 0x00;
> > +   return 0;
> > +}
> > @@ -751,6 +876,26 @@ static void ncsi_configure_channel(struct 
> > ncsi_dev_priv *ndp)
> > break;
> > case ncsi_dev_state_config_done:
> > spin_lock_irqsave(>lock, flags);
> > +   if (nc->reconfigure_needed) {
> > +   /* This channel's configuration has been updated
> > +* part-way during the config state - start the
> > +* channel configuration over
> > +*/
> > +   nc->reconfigure_needed = false;
> > +   nc->state = NCSI_CHANNEL_INVISIBLE;
> > +   spin_unlock_irqrestore(>lock, flags);
> > +
> > +   

Re: [PATCH 0/4] constify net platform_device_id

2017-08-13 Thread David Miller
From: Arvind Yadav 
Date: Sun, 13 Aug 2017 16:41:44 +0530

> platform_device_id are not supposed to change at runtime. All functions
> working with platform_device_id provided by 
> work with const platform_device_id. So mark the non-const structs as const.

Series applied, thanks.


Re: [PATCH] tap: make struct tap_fops static

2017-08-13 Thread David Miller
From: Colin King 
Date: Sat, 12 Aug 2017 22:52:31 +0100

> From: Colin Ian King 
> 
> The structure tap_fops is local to the source and does not need to
> be in global scope, so make it static.
> 
> Cleans up sparse warning:
> symbol 'tap_fops' was not declared. Should it be static?
> 
> Signed-off-by: Colin Ian King 

Applied.


Re: [PATCH] virtio-net: make array guest_offloads static

2017-08-13 Thread David Miller
From: Colin King 
Date: Sat, 12 Aug 2017 22:45:53 +0100

> From: Colin Ian King 
> 
> The array guest_offloads is local to the source and does not need to
> be in global scope, so make it static. Also tweak formatting.
> 
> Cleans up sparse warnings:
> symbol 'guest_offloads' was not declared. Should it be static?
> 
> Signed-off-by: Colin Ian King 

Applied.


Re: [PATCH] of_mdio: merge branch tails in of_phy_register_fixed_link()

2017-08-13 Thread David Miller
From: Sergei Shtylyov 
Date: Sun, 13 Aug 2017 00:03:06 +0300

> Looks  like gcc isn't always able to figure  out that 3 *if* branches in
> of_phy_register_fixed_link() calling fixed_phy_register() at their ends
> are similar enough and thus can be merged. The "manual" merge saves 40
> bytes of the object code (AArch64 gcc 4.8.5), and still saves 12 bytes
> even  if gcc was able to merge the branch tails (ARM gcc 4.8.5)...
> 
> Signed-off-by: Sergei Shtylyov 

Applied, but if two instances of the "same" compiler just with
different targets changes the optimization, it could be because of a
tradeoff which is specific to parameters expressed in that target's
backend.

So in the future we should probably back away from trying to "help"
the compiler in this way.


Re: [PATCH net-next 0/2] net: vrf: Support for local traffic with sockets bound to enslaved devices

2017-08-13 Thread David Miller
From: David Ahern 
Date: Fri, 11 Aug 2017 17:11:13 -0700

> This set gets local traffic working for sockets bound to enslaved
> devices. The local rtable and rt6_info added in June 2016 to get
> local traffic in VRFs working is no longer needed and actually
> keeps local traffic for sockets bound to an enslaved device from
> working. Patch 1 removes them.
> 
> Patch 2 adds a fix up for IPv4 IP_PKTINFO to return rt_iif for
> packets sent over the VRF device. This is similar to the handling
> of loopback.

Series applied, thanks David.


Re: [PATCH net-next] net: ipv4: remove unnecessary check on orig_oif

2017-08-13 Thread David Miller
From: David Ahern 
Date: Fri, 11 Aug 2017 17:02:02 -0700

> rt_iif is going to be set to either 0 or orig_oif. If orig_oif
> is 0 it amounts to the same end result so remove the check.
> 
> Signed-off-by: David Ahern 

Applied, thanks David.


Re: [PATCH v2] bonding: ratelimit failed speed/duplex update warning

2017-08-13 Thread David Miller
From: Andreas Born 
Date: Sat, 12 Aug 2017 00:36:55 +0200

> bond_miimon_commit() handles the UP transition for each slave of a bond
> in the case of MII. It is triggered 10 times per second for the default
> MII Polling interval of 100ms. For device drivers that do not implement
> __ethtool_get_link_ksettings() the call to bond_update_speed_duplex()
> fails persistently while the MII status could remain UP. That is, in
> this and other cases where the speed/duplex update keeps failing over a
> longer period of time while the MII state is UP, a warning is printed
> every MII polling interval.
> 
> To address these excessive warnings net_ratelimit() should be used.
> Printing a warning once would not be sufficient since the call to
> bond_update_speed_duplex() could recover to succeed and fail again
> later. In that case there would be no new indication what went wrong.
> 
> Fixes: b5bf0f5b16b9c (bonding: correctly update link status during mii-commit 
> phase)
> Signed-off-by: Andreas Born 
> ---
> Changes in v2:
> * swapped pr_warn_ratelimited() for net_ratelimit()

Applied and you'll be happy to know I queued this up for -stable too :)


Re: [PATCH net-next v3] vxlan: change vxlan_[config_]validate() to use netlink_ext_ack for error reporting

2017-08-13 Thread David Miller
From: Girish Moodalbail 
Date: Fri, 11 Aug 2017 15:20:59 -0700

> The kernel log is not where users expect error messages for netlink
> requests; as we have extended acks now, we can replace pr_debug() with
> NL_SET_ERR_MSG_ATTR().
> 
> Signed-off-by: Matthias Schiffer 
> Signed-off-by: Girish Moodalbail 

Applied, thanks.


Re: [PATCH net-next V2 0/3] XDP support for tap

2017-08-13 Thread David Miller
From: Jason Wang 
Date: Fri, 11 Aug 2017 19:41:15 +0800

> Hi all:
> 
> This series tries to implement XDP support for tap. Two path were
> implemented:
> 
> - fast path: small & non-gso packet, For performance reason we do it
>   at page level and use build_skb() to create skb if necessary.
> - slow path: big or gso packet, we don't want to lose the capability
>   compared to generic XDP, so we export some generic xdp helpers and
>   do it after skb was created.
> 
> xdp1 shows about 41% improvement, xdp_redirect shows about 60%
> improvement.
> 
> Changes from V1:
> - fix the race between xdp set and free
> - don't hold extra refcount
> - add XDP_REDIRECT support
> 
> Please review.

Series applied, thanks Jason.


Re: [PATCHv2 net-next] gre: introduce native tunnel support for ERSPAN

2017-08-13 Thread David Miller
From: William Tu 
Date: Wed,  9 Aug 2017 13:53:05 -0700

> The patch adds ERSPAN type II tunnel support.  The implementation
> is based on the draft at [1].  One of the purposes is for Linux
> box to be able to receive ERSPAN monitoring traffic sent from
> the Cisco switch, by creating a ERSPAN tunnel device.
> In addition, the patch also adds ERSPAN TX, so traffic can
> also be encapsulated into ERSPAN and sent out.
> 
> The implementation reuses the key as ERSPAN session ID, and
> field 'erspan' as ERSPAN Index fields:
> ./ip link add dev ers11 type erspan seq key 100 erspan 123 \
>   local 172.16.1.200 remote 172.16.1.100
> 
> [1] https://tools.ietf.org/html/draft-foschiano-erspan-01
> [2] iproute patch: http://marc.info/?l=linux-netdev=150231090207544=2
> 
> Signed-off-by: William Tu 
> Signed-off-by: Meenakshi Vohra 
> Cc: Alexey Kuznetsov 
> Cc: Hideaki YOSHIFUJI 
> ---
> v1->v2: 
>  Add missing erspan.h header
> 
> ---
>  include/net/erspan.h   |  62 +++
>  include/net/ip_tunnels.h   |   3 +
>  include/uapi/linux/if_ether.h  |   1 +
>  include/uapi/linux/if_tunnel.h |   1 +
>  net/ipv4/ip_gre.c  | 248 
> +
>  5 files changed, 315 insertions(+)
>  create mode 100644 include/net/erspan.h
> 
> diff --git a/include/net/erspan.h b/include/net/erspan.h
> new file mode 100644
> index ..cafe387a2cae
> --- /dev/null
> +++ b/include/net/erspan.h
> @@ -0,0 +1,62 @@
> +#ifndef __LINUX_ERSPAN_H
> +#define __LINUX_ERSPAN_H
> +
> +/*
> + * GRE header for ERSPAN encapsulation (8 octets [34:41]) -- 8 bytes
> + *   0   1   2   3
> + *  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |0|0|0|1|0|0|0|0|Protocol Type for ERSPAN   |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |  Sequence Number (increments per packet per session)  |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + *
> + *  Note that in the above GRE header [RFC1701] out of the C, R, K, S,
> + *  s, Recur, Flags, Version fields only S (bit 03) is set to 1. The
> + *  other fields are set to zero, so only a sequence number follows.
> + *
> + *  ERSPAN Type II header (8 octets [42:49])
> + *  0   1   2   3
> + *  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |  Ver  |  VLAN | COS | En|T|Session ID |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |  Reserved |  Index|
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + *
> +  * GRE proto ERSPAN type II = 0x88BE, type III = 0x22EB
> + */
> +
> +#define ERSPAN_VERSION   0x1
> +
> +#define VER_MASK 0xf000
> +#define VLAN_MASK0x0fff
> +#define COS_MASK 0xe000
> +#define EN_MASK  0x1800
> +#define BOS_MASK 0x1800 //?
> +#define T_MASK   0x0400
> +#define ID_MASK  0x03ff
> +#define INDEX_MASK   0xf
> +
> +enum erspan_encap_type {
> + ERSPAN_ENCAP_NOVLAN = 0x0,  /* originally without VLAN tag */
> + ERSPAN_ENCAP_ISL = 0x1, /* originally ISL encapsulated */
> + ERSPAN_ENCAP_8021Q = 0x2,   /* originally 802.1Q encapsulated */
> + ERSPAN_ENCAP_INFRAME = 0x3, /* VLAN tag perserved in frame */
> +};
> +
> +struct erspan_metadata {
> + __be32 index;   /* type II */
> +};
> +
> +struct erspanhdr {
> +   __be16 ver_vlan;
> +#define VER_OFFSET  12
> +   __be16 session_id;
> +#define COS_OFFSET  13
> +#define EN_OFFSET   11
> +#define T_OFFSET10
> +   struct erspan_metadata md;
> +};
> +
> +#endif
> diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
> index 520809912f03..625c29329372 100644
> --- a/include/net/ip_tunnels.h
> +++ b/include/net/ip_tunnels.h
> @@ -115,6 +115,9 @@ struct ip_tunnel {
>   u32 o_seqno;/* The last output seqno */
>   int tun_hlen;   /* Precalculated header length */
>  
> + /* This field used only by ERSPAN */
> + u32 index;  /* ERSPAN type II index */
> +
>   struct dst_cache dst_cache;
>  
>   struct ip_tunnel_parm parms;
> diff --git a/include/uapi/linux/if_ether.h b/include/uapi/linux/if_ether.h
> index 5bc9bfd816b7..efeb1190c2ca 100644
> --- a/include/uapi/linux/if_ether.h
> +++ b/include/uapi/linux/if_ether.h
> @@ -66,6 +66,7 @@
>  #define ETH_P_ATALK  0x809B  /* Appletalk DDP*/
>  #define ETH_P_AARP   0x80F3  /* Appletalk AARP

Re: [PATCH net-next] rtnelink: Move link dump consistency check out of the loop

2017-08-13 Thread David Miller
From: Jakub Sitnicki 
Date: Wed,  9 Aug 2017 17:39:12 +0200

> Calls to rtnl_dump_ifinfo() are protected by RTNL lock. So are the
> {list,unlist}_netdevice() calls where we bump the net->dev_base_seq
> number.
> 
> For this reason net->dev_base_seq can't change under out feet while
> we're looping over links in rtnl_dump_ifinfo(). So move the check for
> net->dev_base_seq change (since the last time we were called) out of the
> loop.
> 
> This way we avoid giving a wrong impression that there are concurrent
> updates to the link list going on while we're iterating over them.
> 
> Signed-off-by: Jakub Sitnicki 

Applied, thanks.


Re: [PATCH net-next 8/8] liquidio: added support for ethtool --set-ring feature

2017-08-13 Thread David Miller
From: Felix Manlunas 
Date: Fri, 11 Aug 2017 18:29:30 -0700

> + if ((ering->rx_mini_pending) || (ering->rx_jumbo_pending))
> + return -EINVAL;

Lots of unnecessary parenthesis, please remove.

> + if ((rx_count == rx_count_old) && (tx_count == tx_count_old))
> + return 0;

Likewise.


Re: [PATCH net-next v2 3/3] ftgmac100: Support NCSI VLAN filtering when available

2017-08-13 Thread Joel Stanley
On Mon, Aug 14, 2017 at 10:59 AM, Samuel Mendoza-Jonas
 wrote:
> Register the ndo_vlan_rx_{add,kill}_vid callbacks and set the
> NETIF_F_HW_VLAN_CTAG_FILTER if NCSI is available.
> This allows the VLAN core to notify the NCSI driver when changes occur
> so that the remote NCSI channel can be properly configured to filter on
> the set VLAN tags.
>
> Signed-off-by: Samuel Mendoza-Jonas 

I'm not a vlan expert, so I asked why this was only being set when
doing NCSI. The answer was:

> There is no point setting it when not doing ncsi. The HW doesn't have a
> filter in that case (we do have HW vlan tag extraction and injection,
> which my driver supports, but that's different flags).

Reviewed-by: Joel Stanley 

> ---
> v2: Moved ftgmac100 change into same patch and reordered
>
>  drivers/net/ethernet/faraday/ftgmac100.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c 
> b/drivers/net/ethernet/faraday/ftgmac100.c
> index 34dae51effd4..05fe7123d5ae 100644
> --- a/drivers/net/ethernet/faraday/ftgmac100.c
> +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> @@ -1623,6 +1623,8 @@ static const struct net_device_ops ftgmac100_netdev_ops 
> = {
>  #ifdef CONFIG_NET_POLL_CONTROLLER
> .ndo_poll_controller= ftgmac100_poll_controller,
>  #endif
> +   .ndo_vlan_rx_add_vid= ncsi_vlan_rx_add_vid,
> +   .ndo_vlan_rx_kill_vid   = ncsi_vlan_rx_kill_vid,
>  };
>
>  static int ftgmac100_setup_mdio(struct net_device *netdev)
> @@ -1837,6 +1839,9 @@ static int ftgmac100_probe(struct platform_device *pdev)
> NETIF_F_GRO | NETIF_F_SG | NETIF_F_HW_VLAN_CTAG_RX |
> NETIF_F_HW_VLAN_CTAG_TX;
>
> +   if (priv->use_ncsi)
> +   netdev->hw_features |= NETIF_F_HW_VLAN_CTAG_FILTER;
> +
> /* AST2400  doesn't have working HW checksum generation */
> if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac")))
> netdev->hw_features &= ~NETIF_F_HW_CSUM;
> --
> 2.14.0
>


Re: [PATCH net-next v2 1/3] net/ncsi: Fix several packet definitions

2017-08-13 Thread Joel Stanley
On Mon, Aug 14, 2017 at 10:59 AM, Samuel Mendoza-Jonas
 wrote:

I asked Sam if these should be backported to stable and he said:

> These are straight up bugs except... without my changes we never call
> this code. As Ben says as time provides a lot of the current definitions
> need to be gone over, there's a few command/response code paths that are
> never triggered and could be broken in similar ways.

So we're okay here.

> Signed-off-by: Samuel Mendoza-Jonas 

Reviewed-by: Joel Stanley 

Cheers,

Joel

> ---
> v2: Rebased on latest net-next
>
>  net/ncsi/ncsi-cmd.c | 10 +-
>  net/ncsi/ncsi-pkt.h |  2 +-
>  net/ncsi/ncsi-rsp.c |  3 ++-
>  3 files changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/net/ncsi/ncsi-cmd.c b/net/ncsi/ncsi-cmd.c
> index 5e03ed190e18..7567ca63aae2 100644
> --- a/net/ncsi/ncsi-cmd.c
> +++ b/net/ncsi/ncsi-cmd.c
> @@ -139,9 +139,9 @@ static int ncsi_cmd_handler_svf(struct sk_buff *skb,
> struct ncsi_cmd_svf_pkt *cmd;
>
> cmd = skb_put_zero(skb, sizeof(*cmd));
> -   cmd->vlan = htons(nca->words[0]);
> -   cmd->index = nca->bytes[2];
> -   cmd->enable = nca->bytes[3];
> +   cmd->vlan = htons(nca->words[1]);
> +   cmd->index = nca->bytes[6];
> +   cmd->enable = nca->bytes[7];
> ncsi_cmd_build_header(>cmd.common, nca);
>
> return 0;
> @@ -153,7 +153,7 @@ static int ncsi_cmd_handler_ev(struct sk_buff *skb,
> struct ncsi_cmd_ev_pkt *cmd;
>
> cmd = skb_put_zero(skb, sizeof(*cmd));
> -   cmd->mode = nca->bytes[0];
> +   cmd->mode = nca->bytes[3];
> ncsi_cmd_build_header(>cmd.common, nca);
>
> return 0;
> @@ -228,7 +228,7 @@ static struct ncsi_cmd_handler {
> { NCSI_PKT_CMD_AE, 8, ncsi_cmd_handler_ae  },
> { NCSI_PKT_CMD_SL, 8, ncsi_cmd_handler_sl  },
> { NCSI_PKT_CMD_GLS,0, ncsi_cmd_handler_default },
> -   { NCSI_PKT_CMD_SVF,4, ncsi_cmd_handler_svf },
> +   { NCSI_PKT_CMD_SVF,8, ncsi_cmd_handler_svf },
> { NCSI_PKT_CMD_EV, 4, ncsi_cmd_handler_ev  },
> { NCSI_PKT_CMD_DV, 0, ncsi_cmd_handler_default },
> { NCSI_PKT_CMD_SMA,8, ncsi_cmd_handler_sma },
> diff --git a/net/ncsi/ncsi-pkt.h b/net/ncsi/ncsi-pkt.h
> index 3ea49ed0a935..91b4b66438df 100644
> --- a/net/ncsi/ncsi-pkt.h
> +++ b/net/ncsi/ncsi-pkt.h
> @@ -104,7 +104,7 @@ struct ncsi_cmd_svf_pkt {
> unsigned char   index; /* VLAN table index  */
> unsigned char   enable;/* Enable or disable */
> __be32  checksum;  /* Checksum  */
> -   unsigned char   pad[14];
> +   unsigned char   pad[18];
>  };
>
>  /* Enable VLAN */
> diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
> index 087db775b3dc..c1a191d790e2 100644
> --- a/net/ncsi/ncsi-rsp.c
> +++ b/net/ncsi/ncsi-rsp.c
> @@ -354,7 +354,8 @@ static int ncsi_rsp_handler_svf(struct ncsi_request *nr)
>
> /* Add or remove the VLAN filter */
> if (!(cmd->enable & 0x1)) {
> -   ret = ncsi_remove_filter(nc, NCSI_FILTER_VLAN, cmd->index);
> +   /* HW indexes from 1 */
> +   ret = ncsi_remove_filter(nc, NCSI_FILTER_VLAN, cmd->index - 
> 1);
> } else {
> vlan = ntohs(cmd->vlan);
> ret = ncsi_add_filter(nc, NCSI_FILTER_VLAN, );
> --
> 2.14.0
>


Re: [PATCH net-next v2 2/3] net/ncsi: Configure VLAN tag filter

2017-08-13 Thread Joel Stanley
On Mon, Aug 14, 2017 at 10:59 AM, Samuel Mendoza-Jonas
 wrote:
> Make use of the ndo_vlan_rx_{add,kill}_vid callbacks to have the NCSI
> stack process new VLAN tags and configure the channel VLAN filter
> appropriately.
> Several VLAN tags can be set and a "Set VLAN Filter" packet must be sent
> for each one, meaning the ncsi_dev_state_config_svf state must be
> repeated. An internal list of VLAN tags is maintained, and compared
> against the current channel's ncsi_channel_filter in order to keep track
> within the state. VLAN filters are removed in a similar manner, with the
> introduction of the ncsi_dev_state_config_clear_vids state. The maximum
> number of VLAN tag filters is determined by the "Get Capabilities"
> response from the channel.
>
> Signed-off-by: Samuel Mendoza-Jonas 

I've given this some testing, but there are a few things I saw below
that we should sort out.

> --- a/net/ncsi/ncsi-manage.c
> +++ b/net/ncsi/ncsi-manage.c
> @@ -38,6 +38,22 @@ static inline int ncsi_filter_size(int table)
> return sizes[table];
>  }
>
> +u32 *ncsi_get_filter(struct ncsi_channel *nc, int table, int index)
> +{
> +   struct ncsi_channel_filter *ncf;
> +   int size;
> +
> +   ncf = nc->filters[table];
> +   if (!ncf)
> +   return NULL;
> +
> +   size = ncsi_filter_size(table);
> +   if (size < 0)
> +   return NULL;
> +
> +   return ncf->data + size * index;
> +}
> +
>  int ncsi_find_filter(struct ncsi_channel *nc, int table, void *data)
>  {
> struct ncsi_channel_filter *ncf;
> @@ -58,7 +74,7 @@ int ncsi_find_filter(struct ncsi_channel *nc, int table, 
> void *data)
> index = -1;
> while ((index = find_next_bit(bitmap, ncf->total, index + 1))
>< ncf->total) {
> -   if (!memcmp(ncf->data + size * index, data, size)) {
> +   if (!data || !memcmp(ncf->data + size * index, data, size)) {

Not clear why this check is required?

> spin_unlock_irqrestore(>lock, flags);
> return index;
> }
> @@ -639,6 +655,83 @@ static void ncsi_suspend_channel(struct ncsi_dev_priv 
> *ndp)
> nd->state = ncsi_dev_state_functional;
>  }
>
> +/* Check the VLAN filter bitmap for a set filter, and construct a
> + * "Set VLAN Filter - Disable" packet if found.
> + */
> +static int clear_one_vid(struct ncsi_dev_priv *ndp, struct ncsi_channel *nc,
> +struct ncsi_cmd_arg *nca)
> +{
> +   int index;
> +   u16 vid;
> +
> +   index = ncsi_find_filter(nc, NCSI_FILTER_VLAN, NULL);
> +   if (index < 0) {
> +   /* Filter table empty */
> +   return -1;
> +   }
> +
> +   vid = *(u16 *)ncsi_get_filter(nc, NCSI_FILTER_VLAN, index);

You just added this function that returns a pointer to a u32. It's
strange to see the only call site then throw half of it away.

Also, ncsi_get_filter can return NULL.

> +   netdev_printk(KERN_DEBUG, ndp->ndev.dev,
> + "ncsi: removed vlan tag %u at index %d\n",
> + vid, index + 1);
> +   ncsi_remove_filter(nc, NCSI_FILTER_VLAN, index);
> +
> +   nca->type = NCSI_PKT_CMD_SVF;
> +   nca->words[1] = vid;
> +   /* HW filter index starts at 1 */
> +   nca->bytes[6] = index + 1;
> +   nca->bytes[7] = 0x00;
> +   return 0;
> +}

> @@ -751,6 +876,26 @@ static void ncsi_configure_channel(struct ncsi_dev_priv 
> *ndp)
> break;
> case ncsi_dev_state_config_done:
> spin_lock_irqsave(>lock, flags);
> +   if (nc->reconfigure_needed) {
> +   /* This channel's configuration has been updated
> +* part-way during the config state - start the
> +* channel configuration over
> +*/
> +   nc->reconfigure_needed = false;
> +   nc->state = NCSI_CHANNEL_INVISIBLE;
> +   spin_unlock_irqrestore(>lock, flags);
> +
> +   spin_lock_irqsave(>lock, flags);
> +   nc->state = NCSI_CHANNEL_INACTIVE;

This looks strange. What's nc->state up to? Does setting it to
NCSI_CHANNEL_INVISIBLE have any affect?

The locking is confusing too.

> +   list_add_tail_rcu(>link, >channel_queue);
> +   spin_unlock_irqrestore(>lock, flags);
> +
> +   netdev_printk(KERN_DEBUG, dev,
> + "Dirty NCSI channel state reset\n");
> +   ncsi_process_next_channel(ndp);
> +   break;
> +   }
> +
> if (nc->modes[NCSI_MODE_LINK].data[2] & 0x1) {
> hot_nc = nc;
> nc->state = NCSI_CHANNEL_ACTIVE;
> @@ -1191,6 +1336,149 @@ static struct notifier_block 

[PATCH net-next v2 3/3] ftgmac100: Support NCSI VLAN filtering when available

2017-08-13 Thread Samuel Mendoza-Jonas
Register the ndo_vlan_rx_{add,kill}_vid callbacks and set the
NETIF_F_HW_VLAN_CTAG_FILTER if NCSI is available.
This allows the VLAN core to notify the NCSI driver when changes occur
so that the remote NCSI channel can be properly configured to filter on
the set VLAN tags.

Signed-off-by: Samuel Mendoza-Jonas 
---
v2: Moved ftgmac100 change into same patch and reordered

 drivers/net/ethernet/faraday/ftgmac100.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c 
b/drivers/net/ethernet/faraday/ftgmac100.c
index 34dae51effd4..05fe7123d5ae 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1623,6 +1623,8 @@ static const struct net_device_ops ftgmac100_netdev_ops = 
{
 #ifdef CONFIG_NET_POLL_CONTROLLER
.ndo_poll_controller= ftgmac100_poll_controller,
 #endif
+   .ndo_vlan_rx_add_vid= ncsi_vlan_rx_add_vid,
+   .ndo_vlan_rx_kill_vid   = ncsi_vlan_rx_kill_vid,
 };
 
 static int ftgmac100_setup_mdio(struct net_device *netdev)
@@ -1837,6 +1839,9 @@ static int ftgmac100_probe(struct platform_device *pdev)
NETIF_F_GRO | NETIF_F_SG | NETIF_F_HW_VLAN_CTAG_RX |
NETIF_F_HW_VLAN_CTAG_TX;
 
+   if (priv->use_ncsi)
+   netdev->hw_features |= NETIF_F_HW_VLAN_CTAG_FILTER;
+
/* AST2400  doesn't have working HW checksum generation */
if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac")))
netdev->hw_features &= ~NETIF_F_HW_CSUM;
-- 
2.14.0



[PATCH net-next v2 2/3] net/ncsi: Configure VLAN tag filter

2017-08-13 Thread Samuel Mendoza-Jonas
Make use of the ndo_vlan_rx_{add,kill}_vid callbacks to have the NCSI
stack process new VLAN tags and configure the channel VLAN filter
appropriately.
Several VLAN tags can be set and a "Set VLAN Filter" packet must be sent
for each one, meaning the ncsi_dev_state_config_svf state must be
repeated. An internal list of VLAN tags is maintained, and compared
against the current channel's ncsi_channel_filter in order to keep track
within the state. VLAN filters are removed in a similar manner, with the
introduction of the ncsi_dev_state_config_clear_vids state. The maximum
number of VLAN tag filters is determined by the "Get Capabilities"
response from the channel.

Signed-off-by: Samuel Mendoza-Jonas 
---
 include/net/ncsi.h |   2 +
 net/ncsi/internal.h|  11 ++
 net/ncsi/ncsi-manage.c | 295 -
 net/ncsi/ncsi-rsp.c|   9 +-
 4 files changed, 313 insertions(+), 4 deletions(-)

diff --git a/include/net/ncsi.h b/include/net/ncsi.h
index 68680baac0fd..1f96af46df49 100644
--- a/include/net/ncsi.h
+++ b/include/net/ncsi.h
@@ -28,6 +28,8 @@ struct ncsi_dev {
 };
 
 #ifdef CONFIG_NET_NCSI
+int ncsi_vlan_rx_add_vid(struct net_device *dev, __be16 proto, u16 vid);
+int ncsi_vlan_rx_kill_vid(struct net_device *dev, __be16 proto, u16 vid);
 struct ncsi_dev *ncsi_register_dev(struct net_device *dev,
   void (*notifier)(struct ncsi_dev *nd));
 int ncsi_start_dev(struct ncsi_dev *nd);
diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h
index 1308a56f2591..af3d636534ef 100644
--- a/net/ncsi/internal.h
+++ b/net/ncsi/internal.h
@@ -180,6 +180,7 @@ struct ncsi_channel {
 #define NCSI_CHANNEL_INACTIVE  1
 #define NCSI_CHANNEL_ACTIVE2
 #define NCSI_CHANNEL_INVISIBLE 3
+   boolreconfigure_needed;
spinlock_t  lock;   /* Protect filters etc */
struct ncsi_package *package;
struct ncsi_channel_version version;
@@ -235,6 +236,9 @@ enum {
ncsi_dev_state_probe_dp,
ncsi_dev_state_config_sp= 0x0301,
ncsi_dev_state_config_cis,
+   ncsi_dev_state_config_clear_vids,
+   ncsi_dev_state_config_svf,
+   ncsi_dev_state_config_ev,
ncsi_dev_state_config_sma,
ncsi_dev_state_config_ebf,
 #if IS_ENABLED(CONFIG_IPV6)
@@ -253,6 +257,12 @@ enum {
ncsi_dev_state_suspend_done
 };
 
+struct vlan_vid {
+   struct list_head list;
+   __be16 proto;
+   u16 vid;
+};
+
 struct ncsi_dev_priv {
struct ncsi_dev ndev;/* Associated NCSI device */
unsigned intflags;   /* NCSI device flags  */
@@ -276,6 +286,7 @@ struct ncsi_dev_priv {
struct work_struct  work;/* For channel management */
struct packet_type  ptype;   /* NCSI packet Rx handler */
struct list_headnode;/* Form NCSI device list  */
+   struct list_headvlan_vids;   /* List of active VLAN IDs */
 };
 
 struct ncsi_cmd_arg {
diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c
index a3bd5fa8ad09..3cbd4328f142 100644
--- a/net/ncsi/ncsi-manage.c
+++ b/net/ncsi/ncsi-manage.c
@@ -38,6 +38,22 @@ static inline int ncsi_filter_size(int table)
return sizes[table];
 }
 
+u32 *ncsi_get_filter(struct ncsi_channel *nc, int table, int index)
+{
+   struct ncsi_channel_filter *ncf;
+   int size;
+
+   ncf = nc->filters[table];
+   if (!ncf)
+   return NULL;
+
+   size = ncsi_filter_size(table);
+   if (size < 0)
+   return NULL;
+
+   return ncf->data + size * index;
+}
+
 int ncsi_find_filter(struct ncsi_channel *nc, int table, void *data)
 {
struct ncsi_channel_filter *ncf;
@@ -58,7 +74,7 @@ int ncsi_find_filter(struct ncsi_channel *nc, int table, void 
*data)
index = -1;
while ((index = find_next_bit(bitmap, ncf->total, index + 1))
   < ncf->total) {
-   if (!memcmp(ncf->data + size * index, data, size)) {
+   if (!data || !memcmp(ncf->data + size * index, data, size)) {
spin_unlock_irqrestore(>lock, flags);
return index;
}
@@ -639,6 +655,83 @@ static void ncsi_suspend_channel(struct ncsi_dev_priv *ndp)
nd->state = ncsi_dev_state_functional;
 }
 
+/* Check the VLAN filter bitmap for a set filter, and construct a
+ * "Set VLAN Filter - Disable" packet if found.
+ */
+static int clear_one_vid(struct ncsi_dev_priv *ndp, struct ncsi_channel *nc,
+struct ncsi_cmd_arg *nca)
+{
+   int index;
+   u16 vid;
+
+   index = ncsi_find_filter(nc, NCSI_FILTER_VLAN, NULL);
+   if (index < 0) {
+   /* Filter table empty */
+   return -1;
+   }
+
+   vid = *(u16 *)ncsi_get_filter(nc, NCSI_FILTER_VLAN, index);
+   

[PATCH net-next v2 0/3] NCSI VLAN Filtering Support

2017-08-13 Thread Samuel Mendoza-Jonas
This series (mainly patch 2) adds VLAN filtering to the NCSI implementation.
A fair amount of code already exists in the NCSI stack for VLAN filtering but
none of it is actually hooked up. This goes the final mile and fixes a few
bugs in the existing code found along the way (patch 1).

Patch 3 adds the appropriate flag and callbacks to the ftgmac100 driver to
enable filtering as it's a large consumer of NCSI (and what I've been
testing on).

Samuel Mendoza-Jonas (3):
  net/ncsi: Fix several packet definitions
  net/ncsi: Configure VLAN tag filter
  ftgmac100: Support NCSI VLAN filtering when available

 drivers/net/ethernet/faraday/ftgmac100.c |   5 +
 include/net/ncsi.h   |   2 +
 net/ncsi/internal.h  |  11 ++
 net/ncsi/ncsi-cmd.c  |  10 +-
 net/ncsi/ncsi-manage.c   | 295 ++-
 net/ncsi/ncsi-pkt.h  |   2 +-
 net/ncsi/ncsi-rsp.c  |  12 +-
 7 files changed, 326 insertions(+), 11 deletions(-)

-- 
2.14.0



[PATCH net-next v2 1/3] net/ncsi: Fix several packet definitions

2017-08-13 Thread Samuel Mendoza-Jonas
Signed-off-by: Samuel Mendoza-Jonas 
---
v2: Rebased on latest net-next

 net/ncsi/ncsi-cmd.c | 10 +-
 net/ncsi/ncsi-pkt.h |  2 +-
 net/ncsi/ncsi-rsp.c |  3 ++-
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/net/ncsi/ncsi-cmd.c b/net/ncsi/ncsi-cmd.c
index 5e03ed190e18..7567ca63aae2 100644
--- a/net/ncsi/ncsi-cmd.c
+++ b/net/ncsi/ncsi-cmd.c
@@ -139,9 +139,9 @@ static int ncsi_cmd_handler_svf(struct sk_buff *skb,
struct ncsi_cmd_svf_pkt *cmd;
 
cmd = skb_put_zero(skb, sizeof(*cmd));
-   cmd->vlan = htons(nca->words[0]);
-   cmd->index = nca->bytes[2];
-   cmd->enable = nca->bytes[3];
+   cmd->vlan = htons(nca->words[1]);
+   cmd->index = nca->bytes[6];
+   cmd->enable = nca->bytes[7];
ncsi_cmd_build_header(>cmd.common, nca);
 
return 0;
@@ -153,7 +153,7 @@ static int ncsi_cmd_handler_ev(struct sk_buff *skb,
struct ncsi_cmd_ev_pkt *cmd;
 
cmd = skb_put_zero(skb, sizeof(*cmd));
-   cmd->mode = nca->bytes[0];
+   cmd->mode = nca->bytes[3];
ncsi_cmd_build_header(>cmd.common, nca);
 
return 0;
@@ -228,7 +228,7 @@ static struct ncsi_cmd_handler {
{ NCSI_PKT_CMD_AE, 8, ncsi_cmd_handler_ae  },
{ NCSI_PKT_CMD_SL, 8, ncsi_cmd_handler_sl  },
{ NCSI_PKT_CMD_GLS,0, ncsi_cmd_handler_default },
-   { NCSI_PKT_CMD_SVF,4, ncsi_cmd_handler_svf },
+   { NCSI_PKT_CMD_SVF,8, ncsi_cmd_handler_svf },
{ NCSI_PKT_CMD_EV, 4, ncsi_cmd_handler_ev  },
{ NCSI_PKT_CMD_DV, 0, ncsi_cmd_handler_default },
{ NCSI_PKT_CMD_SMA,8, ncsi_cmd_handler_sma },
diff --git a/net/ncsi/ncsi-pkt.h b/net/ncsi/ncsi-pkt.h
index 3ea49ed0a935..91b4b66438df 100644
--- a/net/ncsi/ncsi-pkt.h
+++ b/net/ncsi/ncsi-pkt.h
@@ -104,7 +104,7 @@ struct ncsi_cmd_svf_pkt {
unsigned char   index; /* VLAN table index  */
unsigned char   enable;/* Enable or disable */
__be32  checksum;  /* Checksum  */
-   unsigned char   pad[14];
+   unsigned char   pad[18];
 };
 
 /* Enable VLAN */
diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
index 087db775b3dc..c1a191d790e2 100644
--- a/net/ncsi/ncsi-rsp.c
+++ b/net/ncsi/ncsi-rsp.c
@@ -354,7 +354,8 @@ static int ncsi_rsp_handler_svf(struct ncsi_request *nr)
 
/* Add or remove the VLAN filter */
if (!(cmd->enable & 0x1)) {
-   ret = ncsi_remove_filter(nc, NCSI_FILTER_VLAN, cmd->index);
+   /* HW indexes from 1 */
+   ret = ncsi_remove_filter(nc, NCSI_FILTER_VLAN, cmd->index - 1);
} else {
vlan = ntohs(cmd->vlan);
ret = ncsi_add_filter(nc, NCSI_FILTER_VLAN, );
-- 
2.14.0



Re: [PATCH 0/3] NCSI VLAN Filtering Support

2017-08-13 Thread Samuel Mendoza-Jonas
On Fri, 2017-08-11 at 14:48 -0700, David Miller wrote:
> From: Samuel Mendoza-Jonas 
> Date: Fri, 11 Aug 2017 16:16:45 +1000
> 
> > This series (mainly patch 3) adds VLAN filtering to the NCSI implementation.
> > A fair amount of code already exists in the NCSI stack for VLAN filtering 
> > but
> > none of it is actually hooked up. This goes the final mile and fixes a few
> > bugs in the existing code found along the way (patch 2).
> > 
> > Patch 1 adds the appropriate flag to the ftgmac100 driver to enable 
> > filtering
> > as it's a large consumer of NCSI (and what I've been testing on).
> 
> This patch series does not apply cleanly to net-next, you need to respin.

Sorry about that, silly Friday mistake :)
I noticed I've also squashed an ftgmac100 change into the wrong patch,
I'll fix that up along with the respin.

Cheers,
Sam


Re: [patch v1 1/2] Allow Mellanox network vendor to be configured if only I2C bus is configured

2017-08-13 Thread David Miller
From: Ohad Oz 
Date: Sun, 13 Aug 2017 15:26:56 +

> 
> 
>> -Original Message-
>> From: Leon Romanovsky [mailto:l...@kernel.org]
>> Sent: Saturday, August 12, 2017 5:37 PM
>> To: Ohad Oz 
>> Cc: da...@davemloft.net; netdev@vger.kernel.org; j...@resnulli.us; Saeed
>> Mahameed ; Vadim Pasternak
>> ; system-sw-low-level > le...@mellanox.com>
>> Subject: Re: [patch v1 1/2] Allow Mellanox network vendor to be configured
>> if only I2C bus is configured
>> 
>> On Thu, Aug 10, 2017 at 05:11:51PM +, Ohad Oz wrote:
>> > Patch allows Mellanox devices on system with no PCI, but with I2C only.
>> >
>> 
>> Did you test mlx5 device on such system? Did it work for you?
> 
> Yes, I did. With PCI config set to disable mlx5 drivers are not built. 
> Only the following: 
> /build/drivers/net/Ethernet/mellanox/mlxsw/mlxsw_core.ko
> /build/drivers/net/Ethernet/mellanox /mlxsw/mlxsw_i2c.ko
> /build/drivers/net/Ethernet/mellanox /mlxsw/mlxsw_minimal.ko
> 
> While with both options on all drivers are built inc mlx5.

I'm not so sure he's interested if things build or not.

He's asking you if you actually used a Mellanox device with this
driver with PCI disabled and only I2C available.


Re: Kernel 4.13.0-rc4-next-20170811 - IP Routing / Forwarding performance vs Core/RSS number / HT on

2017-08-13 Thread Alexander Duyck
On Sat, Aug 12, 2017 at 10:27 AM, Paweł Staszewski
 wrote:
> Hi and thanks for reply
>
>
>
> W dniu 2017-08-12 o 14:23, Jesper Dangaard Brouer pisze:
>>
>> On Fri, 11 Aug 2017 19:51:10 +0200 Paweł Staszewski
>>  wrote:
>>
>>> Hi
>>>
>>> I made some tests for performance comparison.
>>
>> Thanks for doing this. Feel free to Cc me, if you do more of these
>> tests (so I don't miss them on the mailing list).
>>
>> I don't understand stand if you are reporting a potential problem?
>>
>> It would be good if you can provide a short summary section (of the
>> issue) in the _start_ of the email, and then provide all this nice data
>> afterwards, to back your case.
>>
>> My understanding is, you report:
>>
>> 1. VLANs on ixgbe show a 30-40% slowdown
>> 2. System stopped scaling after 7+ CPUs

So I had read through most of this before I realized what it was you
were reporting. As far as the behavior there are a few things going
on. I have some additional comments below but they are mostly based on
what I had read up to that point.

As far as possible issues for item 1. The VLAN adds 4 bytes of data of
the payload, when it is stripped it can result in a packet that is 56
bytes. These missing 8 bytes can cause issues as it forces the CPU to
do a read/modify/write every time the device writes to the 64B cache
line instead of just doing it as a single write. This can be very
expensive and hurt performance. In addition it adds 4 bytes on the
wire, so if you are sending the same 64B packets over the VLAN
interface it is bumping them up to 68B to make room for the VLAN tag.
I am suspecting you are encountering one of these type of issues. You
might try tweaking the packet sizes in increments of 4 to see if there
is a sweet spot that you might be falling out of or into.

Item 2 is a known issue with the NICs supported by ixgbe, at least for
anything 82599 and later. The issue here is that there isn't really an
Rx descriptor cache so to try and optimize performance the hardware
will try to write back as many descriptors it has ready for the ring
requesting writeback. The problem is as you add more rings it means
the writes get smaller as they are triggering more often. So what you
end up seeing is that for each additional ring you add the performance
starts dropping as soon as the rings are no longer being fully
saturated. You can tell this has happened when the CPUs in use
suddenly all stop reporting 100% softirq use. So for example to
perform at line rate with 64B packets you would need something like
XDP and to keep the ring count small, like maybe 2 rings. Any more
than that and the performance will start to drop as you hit PCIe
bottlenecks.

> This is not only problem/bug report  - but some kind of comparision plus
> some toughts about possible problems :)
> And can help somebody when searching the net for possible expectations :)
> Also - dono better list where are the smartest people that know what is
> going in kernel with networking :)
>
> Next time i will place summary on top - sorry :)
>
>>
>>> Tested HW (FORWARDING HOST):
>>>
>>> Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz
>>
>> Interesting, I've not heard about a Intel CPU called "Gold" before now,
>> but it does exist:
>>
>> https://ark.intel.com/products/123541/Intel-Xeon-Gold-6132-Processor-19_25M-Cache-2_60-GHz
>>
>>
>>> Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>>
>> This is one of my all time favorite NICs!
>
> Yes this is a good NIC - will have connectx-4 2x100G by monday so will also
> do some tests
>
>>
>>>
>>> Test diagram:
>>>
>>>
>>> TRAFFIC GENERATOR (ethX) -> (enp216s0f0 - RX Traffic) FORWARDING HOST
>>> (enp216s0f1(vlan1000) - TX Traffic) -> (ethY) SINK
>>>
>>> Forwarder traffic: UDP random ports from 9 to 19 with random hosts from
>>> 172.16.0.1 to 172.16.0.255
>>>
>>> TRAFFIC GENERATOR TX is stable 9.9Mpps (in kernel pktgen)
>>
>> What kind of traffic flow?  E.g. distribution, many/few source IPs...
>
>
> Traffic generator is pktgen so udp flows - better paste parameters from
> pktgen:
> UDP_MIN=9
> UDP_MAX=19
>
> pg_set $dev "dst_min 172.16.0.1"
> pg_set $dev "dst_max 172.16.0.100"
>
> # Setup random UDP port src range
> #pg_set $dev "flag UDPSRC_RND"
> pg_set $dev "flag UDPSRC_RND"
> pg_set $dev "udp_src_min $UDP_MIN"
> pg_set $dev "udp_src_max $UDP_MAX"
>
>
>>
>>
>>>
>>> Settings used for FORWARDING HOST (changed param. was only number of RSS
>>> combined queues + set affinity assignment for them to fit with first
>>> numa node where 2x10G port card is installed)
>>>
>>> ixgbe driver used from kernel (in-kernel build - not a module)
>>>
>> Nice with a script showing you setup, thanks. I would be good if it had
>> comments, telling why you think this is a needed setup adjustment.
>>
>>> #!/bin/sh
>>> ifc='enp216s0f0 enp216s0f1'
>>> for i in $ifc
>>>   do
>>>   ip link set up dev $i
>>>   ethtool -A $i autoneg 

Re: unregister_netdevice: waiting for eth0 to become free. Usage count = 1

2017-08-13 Thread David Ahern
On 8/13/17 2:56 PM, Wei Wang wrote:
>> Looking at my patch to move host routes from loopback to device with the
>> address, I have this:
>>
>> @@ -2789,7 +2808,8 @@ static int fib6_ifdown(struct rt6_info *rt, void *arg)
>> const struct arg_dev_net *adn = arg;
>> const struct net_device *dev = adn->dev;
>>
>> -   if ((rt->dst.dev == dev || !dev) &&
>> +   if ((rt->dst.dev == dev || !dev ||
>> +(netdev_unregistering(dev) && rt->rt6i_idev->dev == dev)) &&
>> rt != adn->net->ipv6.ip6_null_entry &&
>> (rt->rt6i_nsiblings == 0 ||
>>  (dev && netdev_unregistering(dev)) ||
> 
> As you explained earlier, after your patch, all entries in the fib6
> tree will have rt->dst.dev be the same as rt->rt6i_idev->dev except
> those ones created by p6_rt_cache_alloc() and ip6_rt_pcpu_alloc().
> Then the above newly added check is mainly to catch those cached dst
> entries (created by ip6_rt_cached_alloc()). right?
> And it is required because __ipv6_ifa_notify() -> ip6_del_rt() won't
> take care of those cached dst entries.
> 
> Then I think I should wait for your patches to get merged before
> submitting my patch?

no. your patch will need to go back to 4.12; my changes will not be
appropriate for that.


[PATCH net] ipv4: route: fix inet_rtm_getroute induced crash

2017-08-13 Thread Florian Westphal
"ip route get $daddr iif eth0 from $saddr" causes:
 BUG: KASAN: use-after-free in ip_route_input_rcu+0x1535/0x1b50
 Call Trace:
  ip_route_input_rcu+0x1535/0x1b50
  ip_route_input_noref+0xf9/0x190
  tcp_v4_early_demux+0x1a4/0x2b0
  ip_rcv+0xbcb/0xc05
  __netif_receive_skb+0x9c/0xd0
  netif_receive_skb_internal+0x5a8/0x890

Problem is that inet_rtm_getroute calls either ip_route_input_rcu (if an
iif was provided) or ip_route_output_key_hash_rcu.

But ip_route_input_rcu, unlike ip_route_output_key_hash_rcu, already
associates the dst_entry with the skb.  This clears the SKB_DST_NOREF
bit (i.e. skb_dst_drop will release/free the entry while it should not).

Thus only set the dst if we called ip_route_output_key_hash_rcu().

I tested this patch by running:
 while true;do ip r get 10.0.1.2;done > /dev/null &
 while true;do ip r get 10.0.1.2 iif eth0  from 10.0.1.1;done > /dev/null &
... and saw no crash or memory leak.

Cc: Roopa Prabhu 
Cc: David Ahern 
Fixes: ba52d61e0ff ("ipv4: route: restore skb_dst_set in inet_rtm_getroute")
Signed-off-by: Florian Westphal 
---
 net/ipv4/route.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 0383e66f59bc..7effa62beed3 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2750,12 +2750,13 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, 
struct nlmsghdr *nlh,
err = 0;
if (IS_ERR(rt))
err = PTR_ERR(rt);
+   else
+   skb_dst_set(skb, >dst);
}
 
if (err)
goto errout_free;
 
-   skb_dst_set(skb, >dst);
if (rtm->rtm_flags & RTM_F_NOTIFY)
rt->rt_flags |= RTCF_NOTIFY;
 
-- 
2.13.0



RE: [PATCH net-next v2] openvswitch: enable NSH support

2017-08-13 Thread Jan Scheurich
> From: Jiri Benc [mailto:jb...@redhat.com]
> Sent: Friday, 11 August, 2017 12:23
> 
> On Fri, 11 Aug 2017 10:09:36 +, Jan Scheurich wrote:
> > Unless someone can explain to me why the datapath should understand the
> > internal structure/format of metadata in push_nsh, I would strongly
> > vote to keep the metadata as variable length octet sequence in the
> > non-structured OVS_ACTION_ATTR_PUSH_NSH
> 
> Could be but it still needs to be in a different attribute and not in
> the ovs_action_push_nsh structure.
> 
> Separate attributes for MD1/MD2 has the advantage of easier validation:
> with a separate MD1 type attribute, the size check is easier. With an
> unstructured MD attribute, we'd need to look into the
> OVS_ACTION_ATTR_NSH_BASE_HEADER attribute for mdtype and then validate
> the unstructured MD attribute size manually. Not a big deal, though.
> I don't have strong opinion here.
> 
> But I do have strong opinion that MD needs to go into a separate
> attribute, whether there are separate attributes for MD1/2 or not.

Jiri, I am not too familiar with conventions on the OVS netlink interface 
regarding the handling of variable length fields. What is the benefit of 
structuring the push_nsh action into

OVS_ACTION_ATTR_PUSH_NSH
+-- OVS_ACTION_ATTR_NSH_BASE_HEADER
+-- OVS_ACTION_ATTR_NSH_MD

instead of grouping the base header fields and the variable length MD into a 
single monolithic attribute OVS_ACTION_ATTR_PUSH_NSH? Is the main concern a 
potential future need to add additional parameters to the push_nsh datapath 
action? Are there examples for such structured actions other than 
OVS_ACTION_ATTR_CT where the need is obvious because it is very polymorphic?

BR, Jan

BTW:  The name OVS_ACTION_ATTR_NSH_BASE_HEADER is misleading because in the NSH 
draft the term base header is used for the first 32-bit word, whereas here it 
includes also the 32-bit Service Path header.


Re: unregister_netdevice: waiting for eth0 to become free. Usage count = 1

2017-08-13 Thread Wei Wang
> Looking at my patch to move host routes from loopback to device with the
> address, I have this:
>
> @@ -2789,7 +2808,8 @@ static int fib6_ifdown(struct rt6_info *rt, void *arg)
> const struct arg_dev_net *adn = arg;
> const struct net_device *dev = adn->dev;
>
> -   if ((rt->dst.dev == dev || !dev) &&
> +   if ((rt->dst.dev == dev || !dev ||
> +(netdev_unregistering(dev) && rt->rt6i_idev->dev == dev)) &&
> rt != adn->net->ipv6.ip6_null_entry &&
> (rt->rt6i_nsiblings == 0 ||
>  (dev && netdev_unregistering(dev)) ||

As you explained earlier, after your patch, all entries in the fib6
tree will have rt->dst.dev be the same as rt->rt6i_idev->dev except
those ones created by p6_rt_cache_alloc() and ip6_rt_pcpu_alloc().
Then the above newly added check is mainly to catch those cached dst
entries (created by ip6_rt_cached_alloc()). right?
And it is required because __ipv6_ifa_notify() -> ip6_del_rt() won't
take care of those cached dst entries.

Then I think I should wait for your patches to get merged before
submitting my patch?

Thanks.
Wei


On Sun, Aug 13, 2017 at 9:24 AM, David Ahern  wrote:
> On 8/12/17 1:42 PM, Wei Wang wrote:
>> Hi Ido,
>>
 - if ((rt->dst.dev == dev || !dev) &&
 + if ((rt->dst.dev == dev || !dev ||
 +  rt->rt6i_idev->dev == dev) &&
>>>
>>> Can you please explain why this line is needed? While host routes aren't
>>> removed from the FIB by rt6_ifdown() (when dst.dev goes down), they are
>>> removed later on in addrconf_ifdown().
>>>
>>
>> Yes.. Agree. But one difference is that if the route is removed from
>> addrconf_ifdown(), dst_dev_put() won't be called to release the
>> devices before doing dst_release(). It is OK if dst_release() sees the
>> refcnt on dst already drops to 0 and directly destroys the dst. But I
>> think it will cause problem if at the time, the dst is still held by
>> some other users because then the refcnt on the device going down will
>> not get released.
>> That's why I think we should remove the dst with either dst->dev ==
>> going down dev or rt6->rt6i_idev->dev == going down dev from the fib6
>> tree always because there, we always call dst_dev_put() to release the
>> device.
>>
>>> With your patch, if I check the return value of ip6_del_rt() in
>>> __ipv6_ifa_notify() I see that -ENONET is returned. Because the host
>>> route was already removed by rt6_ifdown(). When the line in question is
>>> removed from the patch I don't get the error anymore.
>>>
>>
>> Right. That is expected as the route is already removed from the tree.
>>
>>> Is it possible that in John's case the host route was correctly removed
>>> from the FIB and that the unreleased reference was due to a wrong check
>>> in ip6_dst_ifdown() (which you patched correctly AFAICT)?
>>>
>>
>> Yes. possible. But as I explained earlier, I still think we should
>> also remove routes with rt6->rt6i_idev->dev == going down dev from the
>> tree.
>
> Looking at my patch to move host routes from loopback to device with the
> address, I have this:
>
> @@ -2789,7 +2808,8 @@ static int fib6_ifdown(struct rt6_info *rt, void *arg)
> const struct arg_dev_net *adn = arg;
> const struct net_device *dev = adn->dev;
>
> -   if ((rt->dst.dev == dev || !dev) &&
> +   if ((rt->dst.dev == dev || !dev ||
> +(netdev_unregistering(dev) && rt->rt6i_idev->dev == dev)) &&
> rt != adn->net->ipv6.ip6_null_entry &&
> (rt->rt6i_nsiblings == 0 ||
>  (dev && netdev_unregistering(dev)) ||
>
>


Re: Regression: Bug 196547 - Since 4.12 - bonding module not working with wireless drivers

2017-08-13 Thread Andreas Born
2017-08-12 21:30 GMT+02:00 James Feeney :
>
>
> Andreas patch failed to address the continuous, *10-times per second* warning
> which will "spam" the log file, sometimes the console, whenever the test 
> fails:
>  if (bond_update_speed_duplex(slave) && bond_needs_speed_duplex(bond)) {...}
> which then has:
>  netdev_warn(bond->dev, "failed to get link speed/duplex for%s\n",
> slave->dev->name);
>
> That is the sort of irresponsible code that "works fine" as long as there are 
> no
> errors, and it never actually runs!
>
> I'm guessing that the simple fix is to use "net_warn_ratelimited()" instead of
> "netdev_warn()", where net/core/utils.c says:
>
> /*
>  * All net warning printk()s should be guarded by this function.
>  */
> int net_ratelimit(void)
> {
> return __ratelimit(_ratelimit_state);
> }
>
> though Andreas has also suggested "pr_warn_ratelimited()", which instead uses
> "__ratelimit(&_rs)".
>
> Here's a link to the rate-limiting patch, after Andreas' patch is already
> applied, since you say that David has already applied the first patch:
>
> https://bugzilla.kernel.org/attachment.cgi?id=257903
>

The other day I also sent a patch for part 2 of your bug report on
this list. Better safe than sorry by doing one thing per patch. This
second patch received feedback and its v2 is currently awaiting
review. Sorry, for informing you only now. I should have cc'd you in
the first place. You can lookup the state of that patch on patchwork.
[1] It's basically the same as previous versions just changing the
original code even less.
.
I sort of expect that this second patch won't be queued for stable.
But let's see... Essentially it's a regression too and additionally
would've been fixed by a plain revert.

On a side note I would recommend some of my own reading to you about
patch submission in general [1] and on netdev specifically [2].
Putting your suggested changes in the right form makes everyone's life
easier especially your own saving you lots of back and forth by mail.
Seeing the amount of mail on this list during the last days was reason
enough to comprehend the necessity of those standards.

And, just wondering, who's going to eventually close that bugreport?
https://bugzilla.kernel.org/show_bug.cgi?id=196547

Cheers
Andreas

[1] https://patchwork.ozlabs.org/patch/800770/
[2] https://www.kernel.org/doc/html/latest/process/submitting-patches.html
[3] 
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/tree/Documentation/networking/netdev-FAQ.txt


Re: Kernel 4.13.0-rc4-next-20170811 - IP Routing / Forwarding performance vs Core/RSS number / HT on

2017-08-13 Thread Paweł Staszewski

To show some difference below comparision vlan/no-vlan traffic

10Mpps forwarded traffic vith no-vlan vs 6.9Mpps with vlan

(ixgbe in kernel driver kernel 4.13.0-rc4-next-20170811)

ethtool settings for both tests:

ethtool -K $ifc gro off tso off gso off sg on l2-fwd-offload off 
tx-nocache-copy off ntuple off


ethtool -L $ifc combined 16

ethtool -C $ifc rx-usecs 2

ethtool -G $ifc rx 4096 tx 1024

16 CORES / 16 RSS QUEUES


Tx traffic on vlan:

RX Interface:

enp216s0f0

TX Interface

vlan1000 added to enp216s0f1 interface (with vlan 1000 ip address assigned)

ID;CPU_CORES / RSS QUEUES;PKT_SIZE;PPS_RX;BPS_RX;PPS_TX;BPS_TX
0;16;64;6939008;416325120;6938696;402411192
1;16;64;6941952;416444160;6941745;402558918
2;16;64;6960576;417584640;6960707;403698718
3;16;64;6940736;416486400;6941820;402503876
4;16;64;6927680;415741440;6927420;401853870
5;16;64;6929792;415687680;6929917;401839196
6;16;64;6950400;416989440;6950661;403026166
7;16;64;6953664;417216000;6953454;403260544
8;16;64;6948480;416851200;6948800;403023266
9;16;64;6924160;415422720;6924092;401542468

100% load on all 16 Cores.


vs

RX interface from traffic generator:

enp216s0f0

TX interface to the sink:

enp216s0f1

No vlan used

ID;CPU_CORES / RSS QUEUES;PKT_SIZE;PPS_RX;BPS_RX;PPS_TX;BPS_TX

0;16;64;10280176;793608540;10298496;596796568
1;16;64;10046928;600978780;10046022;582527002
2;16;64;10032956;601827420;10026097;581515656
3;16;64;10051503;602252460;10067880;582420804
4;16;64;10016204;602725140;10017358;582644800
5;16;64;10035575;602437620;10059504;582067294
6;16;64;10041667;603069780;10057865;582477412
7;16;64;1008;600027420;10046526;581022018
8;16;64;10022436;601121100;10025946;581904314
9;16;64;10036231;602514960;10058724;582180684


So we have 10Mpps forwarded

- have problems with pktgen on my traffic generator to push more than 
10M but this low budget hardware so.. :)



And there are still free cpu cycles so probabbly can forward at line 10G 
rate 14Mpps


Average: CPU%usr   %nice%sys %iowait%irq   %soft %steal  
%guest  %gnice   %idle
Average: all0.000.000.000.000.00 20.91
0.000.000.00   79.09
Average:   00.000.000.000.000.00 0.090.00
0.000.00   99.91
Average:   10.030.000.030.000.00 0.000.00
0.000.00   99.94
Average:   20.000.000.000.000.00 0.000.00
0.000.00  100.00
Average:   30.000.000.000.000.00 0.000.00
0.000.00  100.00
Average:   40.000.000.000.000.00 0.000.00
0.000.00  100.00
Average:   50.000.000.180.000.00 0.000.00
0.000.00   99.82
Average:   60.000.000.000.000.00 0.000.00
0.000.00  100.00
Average:   70.000.000.000.000.00 0.000.00
0.000.00  100.00
Average:   80.000.000.000.000.00 0.000.00
0.000.00  100.00
Average:   90.000.000.000.000.00 0.000.00
0.000.00  100.00
Average:  100.000.000.030.240.00 0.000.00
0.000.00   99.74
Average:  110.000.000.000.000.00 0.000.00
0.000.00  100.00
Average:  120.000.000.000.000.00 0.000.00
0.000.00  100.00
Average:  130.000.000.000.000.00 0.000.00
0.000.00  100.00
Average:  140.000.000.000.000.00 92.38
0.000.000.007.62
Average:  150.000.000.000.000.00 85.88
0.000.000.00   14.12
Average:  160.000.000.000.000.00 64.91
0.000.000.00   35.09
Average:  170.000.000.000.000.00 66.76
0.000.000.00   33.24
Average:  180.000.000.000.000.00 65.57
0.000.000.00   34.43
Average:  190.000.000.000.000.00 66.38
0.000.000.00   33.62
Average:  200.000.000.000.000.00 72.97
0.000.000.00   27.03
Average:  210.000.000.000.000.00 70.80
0.000.000.00   29.20
Average:  220.000.000.000.000.00 66.44
0.000.000.00   33.56
Average:  230.000.000.000.000.00 66.12
0.000.000.00   33.88
Average:  240.000.000.000.000.00 68.35
0.000.000.00   31.65
Average:  250.000.000.000.000.00 71.79
0.000.000.00   28.21
Average:  260.000.000.000.000.00 70.24
0.000.000.00   29.76
Average:  270.000.000.000.000.00 73.24
0.000.000.00   26.76
Average:  280.000.000.000.000.00 0.000.00
0.000.00  100.00
Average:  290.000.000.000.000.00 0.00

Re: unregister_netdevice: waiting for eth0 to become free. Usage count = 1

2017-08-13 Thread David Ahern
On 8/12/17 1:42 PM, Wei Wang wrote:
> Hi Ido,
> 
>>> - if ((rt->dst.dev == dev || !dev) &&
>>> + if ((rt->dst.dev == dev || !dev ||
>>> +  rt->rt6i_idev->dev == dev) &&
>>
>> Can you please explain why this line is needed? While host routes aren't
>> removed from the FIB by rt6_ifdown() (when dst.dev goes down), they are
>> removed later on in addrconf_ifdown().
>>
> 
> Yes.. Agree. But one difference is that if the route is removed from
> addrconf_ifdown(), dst_dev_put() won't be called to release the
> devices before doing dst_release(). It is OK if dst_release() sees the
> refcnt on dst already drops to 0 and directly destroys the dst. But I
> think it will cause problem if at the time, the dst is still held by
> some other users because then the refcnt on the device going down will
> not get released.
> That's why I think we should remove the dst with either dst->dev ==
> going down dev or rt6->rt6i_idev->dev == going down dev from the fib6
> tree always because there, we always call dst_dev_put() to release the
> device.
> 
>> With your patch, if I check the return value of ip6_del_rt() in
>> __ipv6_ifa_notify() I see that -ENONET is returned. Because the host
>> route was already removed by rt6_ifdown(). When the line in question is
>> removed from the patch I don't get the error anymore.
>>
> 
> Right. That is expected as the route is already removed from the tree.
> 
>> Is it possible that in John's case the host route was correctly removed
>> from the FIB and that the unreleased reference was due to a wrong check
>> in ip6_dst_ifdown() (which you patched correctly AFAICT)?
>>
> 
> Yes. possible. But as I explained earlier, I still think we should
> also remove routes with rt6->rt6i_idev->dev == going down dev from the
> tree.

Looking at my patch to move host routes from loopback to device with the
address, I have this:

@@ -2789,7 +2808,8 @@ static int fib6_ifdown(struct rt6_info *rt, void *arg)
const struct arg_dev_net *adn = arg;
const struct net_device *dev = adn->dev;

-   if ((rt->dst.dev == dev || !dev) &&
+   if ((rt->dst.dev == dev || !dev ||
+(netdev_unregistering(dev) && rt->rt6i_idev->dev == dev)) &&
rt != adn->net->ipv6.ip6_null_entry &&
(rt->rt6i_nsiblings == 0 ||
 (dev && netdev_unregistering(dev)) ||




Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-08-13 Thread Koichiro Den
Sorry I mistakenly focused on NET case, please pass it over. I will do any
relevant suggestion in patch-based way. Thanks.

On Sun, 2017-08-13 at 23:11 +0900, Koichiro Den wrote:
> Thanks for your comments, Michael and Jason. And I'm sorry about late
> response.
> To be honest, I am on a summer vacation until next Tuesday.
> 
> I noticed that what I wrote was not sufficient. Regardless of caching
> mechanism
> existence, the "event" could legitimately be at any point out of the latest
> interval, which vhost_notify checks it against, meaning that if it's out of
> the
> interval we cannot distinguish whether or not it lags behind or has a
> lead. And
> it seems to conform to the spec. Thanks for leading me to the spec. The corner
> case I point out here is:
> (0) event idx feature turned on + TX napi turned off
> -> (1) guest-side TX traffic bursting occurs and delayed callback set
> -> (2) some interruption triggers skb_xmit_done
> -> (3) guest-side modest traffic makes the interval proceed to unbounded
> extent
> without updating "event" since NO_INTERRUPT continues to be set on its shadow
> flag.
> 
> IMHO, if you plan to make TX napi the only choice, doing this sort of
> optimisation beforehand seems likely to be in vain.
> 
> So, if the none-TX napi case continues to coexist, what I would like to
> suggest
> is not just the sort of my last email, but like making maximum staleness of
> "event" less than or equal to vq.num, and afterward caching suggestion.
> Otherwise, I guess I should not repost my last email since it would make
> matters
>  too complicated even though it will soon be removed when TX-napi becomes the
> only choice.
> 
> Thanks!
> 
> 
> On Wed, 2017-08-09 at 07:37 +0300, Michael S. Tsirkin wrote:
> > On Wed, Aug 09, 2017 at 10:38:10AM +0800, Jason Wang wrote:
> > > I think don't think current code can work well if vq.num is grater than
> > > 2^15. Since all cached idx is u16. This looks like a bug which needs to be
> > > fixed.
> > 
> > That's a limitation of virtio 1.0.
> > 
> > > > * else if the interval of vq.num is [2^15, 2^16):
> > > > the logic in the original patch (809ecb9bca6a9) suffices
> > > > * else (= less than 2^15) (optional):
> > > > checking only (vring_need_event(vq->last_used_event, new + vq->num, new)
> > > > would suffice.
> > > > 
> > > > Am I missing something, or is this irrelevant?
> > 
> > Could you pls repost the suggestion copying virtio-dev mailing list
> > (subscriber only, sorry about that, but host/guest ABI discussions
> > need to copy that list)?
> > 
> > > Looks not, I think this may work. Let me do some test.
> > > 
> > > Thanks
> > 
> > I think that at this point it's prudent to add a feature bit
> > as the virtio spec does not require to never move the event index back.
> > 



Re: [iproute PATCH 15/51] ipvrf: Fix error path of vrf_switch()

2017-08-13 Thread David Ahern
On 8/12/17 6:04 AM, Phil Sutter wrote:
> Apart from trying to close(-1), this also leaked memory.
> 
> Signed-off-by: Phil Sutter 
> ---
>  ip/ipvrf.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/ip/ipvrf.c b/ip/ipvrf.c
> index 92e2db98ca7d7..75cc026d072b8 100644
> --- a/ip/ipvrf.c
> +++ b/ip/ipvrf.c
> @@ -373,12 +373,12 @@ static int vrf_switch(const char *name)
>  
>   /* -1 on length to add '/' to the end */
>   if (ipvrf_get_netns(netns, sizeof(netns) - 1) < 0)
> - return -1;
> + goto out;
>  
>   if (vrf_path(vpath, sizeof(vpath)) < 0) {
>   fprintf(stderr, "Failed to get base cgroup path: %s\n",
>   strerror(errno));
> - return -1;
> + goto out;
>   }
>  
>   /* if path already ends in netns then don't add it again */
> @@ -429,13 +429,14 @@ static int vrf_switch(const char *name)
>   snprintf(pid, sizeof(pid), "%d", getpid());
>   if (write(fd, pid, strlen(pid)) < 0) {
>   fprintf(stderr, "Failed to join cgroup\n");
> - goto out;
> + goto out2;
>   }
>  
>   rc = 0;
> +out2:
> + close(fd);
>  out:
>   free(mnt);
> - close(fd);
>  
>   return rc;
>  }
> 

Acked-by: David Ahern 


Re: [iproute PATCH 06/51] iplink_vrf: Complain if main table is not found

2017-08-13 Thread David Ahern
On 8/12/17 6:04 AM, Phil Sutter wrote:
> Signed-off-by: Phil Sutter 
> ---
>  ip/iplink_vrf.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/ip/iplink_vrf.c b/ip/iplink_vrf.c
> index 917630e853375..809eda5de8f6e 100644
> --- a/ip/iplink_vrf.c
> +++ b/ip/iplink_vrf.c
> @@ -131,7 +131,10 @@ __u32 ipvrf_get_table(const char *name)
>  , sizeof(answer)) < 0) {
>   /* special case "default" vrf to be the main table */
>   if (errno == ENODEV && !strcmp(name, "default"))
> - rtnl_rttable_a2n(_id, "main");
> + if (rtnl_rttable_a2n(_id, "main"))
> + fprintf(stderr,
> + "BUG: RTTable \"main\" not found.\n");
> +
>  
>   return tb_id;
>   }
> 

Acked-by: David Ahern 


Re: [iproute PATCH 14/51] ipvrf: Don't try to close an invalid fd

2017-08-13 Thread David Ahern
On 8/12/17 6:04 AM, Phil Sutter wrote:
> Signed-off-by: Phil Sutter 
> ---
>  ip/ipvrf.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/ip/ipvrf.c b/ip/ipvrf.c
> index 0094cf8557cd7..92e2db98ca7d7 100644
> --- a/ip/ipvrf.c
> +++ b/ip/ipvrf.c
> @@ -268,7 +268,7 @@ static int vrf_configure_cgroup(const char *path, int 
> ifindex)
>   fprintf(stderr,
>   "Failed to open cgroup path: '%s'\n",
>   strerror(errno));
> - goto out;
> + return rc;
>   }
>  
>   /*
> @@ -290,13 +290,14 @@ static int vrf_configure_cgroup(const char *path, int 
> ifindex)
>   if (bpf_prog_attach_fd(prog_fd, cg_fd, BPF_CGROUP_INET_SOCK_CREATE)) {
>   fprintf(stderr, "Failed to attach prog to cgroup: '%s'\n",
>   strerror(errno));
> - goto out;
> + goto out2;
>   }
>  
>   rc = 0;
> +out2:
> + close(prog_fd);
>  out:
>   close(cg_fd);
> - close(prog_fd);
>  
>   return rc;
>  }
> 

Acked-by: David Ahern 


RE: [patch v1 1/2] Allow Mellanox network vendor to be configured if only I2C bus is configured

2017-08-13 Thread Ohad Oz


> -Original Message-
> From: Leon Romanovsky [mailto:l...@kernel.org]
> Sent: Saturday, August 12, 2017 5:37 PM
> To: Ohad Oz 
> Cc: da...@davemloft.net; netdev@vger.kernel.org; j...@resnulli.us; Saeed
> Mahameed ; Vadim Pasternak
> ; system-sw-low-level  le...@mellanox.com>
> Subject: Re: [patch v1 1/2] Allow Mellanox network vendor to be configured
> if only I2C bus is configured
> 
> On Thu, Aug 10, 2017 at 05:11:51PM +, Ohad Oz wrote:
> > Patch allows Mellanox devices on system with no PCI, but with I2C only.
> >
> 
> Did you test mlx5 device on such system? Did it work for you?

Yes, I did. With PCI config set to disable mlx5 drivers are not built. 
Only the following: 
/build/drivers/net/Ethernet/mellanox/mlxsw/mlxsw_core.ko
/build/drivers/net/Ethernet/mellanox /mlxsw/mlxsw_i2c.ko
/build/drivers/net/Ethernet/mellanox /mlxsw/mlxsw_minimal.ko

While with both options on all drivers are built inc mlx5.


> What is the changelog between v0 and v1 of these patches?

No changelog, I've started with v1.

> 
> > Signed-off-by: Ohad Oz 
> > ---
> >  drivers/net/ethernet/mellanox/Kconfig | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/Kconfig
> b/drivers/net/ethernet/mellanox/Kconfig
> > index 84a2007..0949741 100644
> > --- a/drivers/net/ethernet/mellanox/Kconfig
> > +++ b/drivers/net/ethernet/mellanox/Kconfig
> > @@ -5,7 +5,7 @@
> >  config NET_VENDOR_MELLANOX
> > bool "Mellanox devices"
> > default y
> > -   depends on PCI
> > +   depends on PCI || I2C
> > ---help---
> >   If you have a network (Ethernet) card belonging to this class, say Y.
> >
> > --
> > 2.8.0
> >


Re: [PATCH net] Revert "vhost: cache used event for better performance"

2017-08-13 Thread Koichiro Den
Thanks for your comments, Michael and Jason. And I'm sorry about late response.
To be honest, I am on a summer vacation until next Tuesday.

I noticed that what I wrote was not sufficient. Regardless of caching mechanism
existence, the "event" could legitimately be at any point out of the latest
interval, which vhost_notify checks it against, meaning that if it's out of the
interval we cannot distinguish whether or not it lags behind or has a lead. And
it seems to conform to the spec. Thanks for leading me to the spec. The corner
case I point out here is:
(0) event idx feature turned on + TX napi turned off
-> (1) guest-side TX traffic bursting occurs and delayed callback set
-> (2) some interruption triggers skb_xmit_done
-> (3) guest-side modest traffic makes the interval proceed to unbounded extent
without updating "event" since NO_INTERRUPT continues to be set on its shadow
flag.

IMHO, if you plan to make TX napi the only choice, doing this sort of
optimisation beforehand seems likely to be in vain.

So, if the none-TX napi case continues to coexist, what I would like to suggest
is not just the sort of my last email, but like making maximum staleness of
"event" less than or equal to vq.num, and afterward caching suggestion.
Otherwise, I guess I should not repost my last email since it would make matters
 too complicated even though it will soon be removed when TX-napi becomes the
only choice.

Thanks!


On Wed, 2017-08-09 at 07:37 +0300, Michael S. Tsirkin wrote:
> On Wed, Aug 09, 2017 at 10:38:10AM +0800, Jason Wang wrote:
> > I think don't think current code can work well if vq.num is grater than
> > 2^15. Since all cached idx is u16. This looks like a bug which needs to be
> > fixed.
> 
> That's a limitation of virtio 1.0.
> 
> > > * else if the interval of vq.num is [2^15, 2^16):
> > > the logic in the original patch (809ecb9bca6a9) suffices
> > > * else (= less than 2^15) (optional):
> > > checking only (vring_need_event(vq->last_used_event, new + vq->num, new)
> > > would suffice.
> > > 
> > > Am I missing something, or is this irrelevant?
> 
> Could you pls repost the suggestion copying virtio-dev mailing list
> (subscriber only, sorry about that, but host/guest ABI discussions
> need to copy that list)?
> 
> > Looks not, I think this may work. Let me do some test.
> > 
> > Thanks
> 
> I think that at this point it's prudent to add a feature bit
> as the virtio spec does not require to never move the event index back.
> 



[PATCH 4/4] net: ti: cpsw:: constify platform_device_id

2017-08-13 Thread Arvind Yadav
platform_device_id are not supposed to change at runtime. All functions
working with platform_device_id provided by 
work with const platform_device_id. So mark the non-const structs as
const.

Signed-off-by: Arvind Yadav 
---
 drivers/net/ethernet/ti/cpsw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 1850e34..2866964 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2827,7 +2827,7 @@ static int cpsw_probe_dual_emac(struct cpsw_priv *priv)
 
 #define CPSW_QUIRK_IRQ BIT(0)
 
-static struct platform_device_id cpsw_devtype[] = {
+static const struct platform_device_id cpsw_devtype[] = {
{
/* keep it for existing comaptibles */
.name = "cpsw",
-- 
2.7.4



[PATCH 0/4] constify net platform_device_id

2017-08-13 Thread Arvind Yadav
platform_device_id are not supposed to change at runtime. All functions
working with platform_device_id provided by 
work with const platform_device_id. So mark the non-const structs as const.

Arvind Yadav (4):
  [PATCH 1/4] can: constify platform_device_id
  [PATCH 2/4] net: dpaa_eth: constify platform_device_id
  [PATCH 3/4] net: sh_eth: constify platform_device_id
  [PATCH 4/4] net: ti: cpsw:: constify platform_device_id

 drivers/net/can/c_can/c_can_platform.c | 2 +-
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 2 +-
 drivers/net/ethernet/renesas/sh_eth.c  | 2 +-
 drivers/net/ethernet/ti/cpsw.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

-- 
2.7.4



[PATCH 2/4] net: dpaa_eth: constify platform_device_id

2017-08-13 Thread Arvind Yadav
platform_device_id are not supposed to change at runtime. All functions
working with platform_device_id provided by 
work with const platform_device_id. So mark the non-const structs as
const.

Signed-off-by: Arvind Yadav 
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 757b873..4521b33 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2829,7 +2829,7 @@ static int dpaa_remove(struct platform_device *pdev)
return err;
 }
 
-static struct platform_device_id dpaa_devtype[] = {
+static const struct platform_device_id dpaa_devtype[] = {
{
.name = "dpaa-ethernet",
.driver_data = 0,
-- 
2.7.4



[PATCH 3/4] net: sh_eth: constify platform_device_id

2017-08-13 Thread Arvind Yadav
platform_device_id are not supposed to change at runtime. All functions
working with platform_device_id provided by 
work with const platform_device_id. So mark the non-const structs as
const.

Signed-off-by: Arvind Yadav 
---
 drivers/net/ethernet/renesas/sh_eth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/renesas/sh_eth.c 
b/drivers/net/ethernet/renesas/sh_eth.c
index d2dc0a8..d2e88a3 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -3402,7 +3402,7 @@ static const struct dev_pm_ops sh_eth_dev_pm_ops = {
 #define SH_ETH_PM_OPS NULL
 #endif
 
-static struct platform_device_id sh_eth_id_table[] = {
+static const struct platform_device_id sh_eth_id_table[] = {
{ "sh7619-ether", (kernel_ulong_t)_data },
{ "sh771x-ether", (kernel_ulong_t)_data },
{ "sh7724-ether", (kernel_ulong_t)_data },
-- 
2.7.4



[PATCH 1/4] can: constify platform_device_id

2017-08-13 Thread Arvind Yadav
platform_device_id are not supposed to change at runtime. All functions
working with platform_device_id provided by 
work with const platform_device_id. So mark the non-const structs as
const.

Signed-off-by: Arvind Yadav 
---
 drivers/net/can/c_can/c_can_platform.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/can/c_can/c_can_platform.c 
b/drivers/net/can/c_can/c_can_platform.c
index e36d105..46a746e 100644
--- a/drivers/net/can/c_can/c_can_platform.c
+++ b/drivers/net/can/c_can/c_can_platform.c
@@ -220,7 +220,7 @@ static const struct c_can_driver_data am3352_dcan_drvdata = 
{
.raminit_bits = am3352_raminit_bits,
 };
 
-static struct platform_device_id c_can_id_table[] = {
+static const struct platform_device_id c_can_id_table[] = {
{
.name = KBUILD_MODNAME,
.driver_data = (kernel_ulong_t)_can_drvdata,
-- 
2.7.4



Re: [PATCH v5 09/11] ARM: dts: rk3228-evb: Enable the integrated PHY for gmac

2017-08-13 Thread kbuild test robot
Hi David,

[auto build test ERROR on net/master]
[also build test ERROR on v4.13-rc4 next-20170811]
[cannot apply to net-next/master]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/David-Wu/Add-the-integrated-PHY-support/20170813-163538
config: arm-omap2plus_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm 

All errors (new ones prefixed by >>):

>> ERROR: Input tree has errors, aborting (use -f to force output)

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


[PATCH net] openvswitch: fix skb_panic due to the incorrect actions attrlen

2017-08-13 Thread Liping Zhang
From: Liping Zhang 

For sw_flow_actions, the actions_len only represents the kernel part's
size, and when we dump the actions to the userspace, we will do the
convertions, so it's true size may become bigger than the actions_len.

But unfortunately, for OVS_PACKET_ATTR_ACTIONS, we use the actions_len
to alloc the skbuff, so the user_skb's size may become insufficient and
oops will happen like this:
  skbuff: skb_over_panic: text:8148fabf len:1749 put:157 head:
  881300f39000 data:881300f39000 tail:0x6d5 end:0x6c0 dev:
  [ cut here ]
  kernel BUG at net/core/skbuff.c:129!
  [...]
  Call Trace:
   
   [] skb_put+0x43/0x44
   [] skb_zerocopy+0x6c/0x1f4
   [] queue_userspace_packet+0x3a3/0x448 [openvswitch]
   [] ovs_dp_upcall+0x30/0x5c [openvswitch]
   [] output_userspace+0x132/0x158 [openvswitch]
   [] ? ip6_rcv_finish+0x74/0x77 [ipv6]
   [] do_execute_actions+0xcc1/0xdc8 [openvswitch]
   [] ovs_execute_actions+0x74/0x106 [openvswitch]
   [] ovs_dp_process_packet+0xe1/0xfd [openvswitch]
   [] ? key_extract+0x63c/0x8d5 [openvswitch]
   [] ovs_vport_receive+0xa1/0xc3 [openvswitch]
  [...]

Also we can find that the actions_len is much little than the orig_len:
  crash> struct sw_flow_actions 0x8812f539d000
  struct sw_flow_actions {
rcu = {
  next = 0x8812f5398800,
  func = 0xe3b00035db32
},
orig_len = 1384,
actions_len = 592,
actions = 0x8812f539d01c
  }

So as a quick fix, use the orig_len instead of the actions_len to alloc
the user_skb.

Last, this oops happened on our system running a relative old kernel, but
the same risk still exists on the mainline, since we use the wrong
actions_len from the beginning.

Fixes: ccea74457bbd ("openvswitch: include datapath actions with sampled-packet 
upcall to userspace")
Cc: Neil McKee 
Signed-off-by: Liping Zhang 
---
 net/openvswitch/actions.c  | 39 +--
 net/openvswitch/datapath.c |  2 +-
 net/openvswitch/datapath.h |  1 +
 3 files changed, 27 insertions(+), 15 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index e4610676299b..799a22dfb89e 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -48,6 +48,7 @@ struct deferred_action {
struct sk_buff *skb;
const struct nlattr *actions;
int actions_len;
+   int actions_attrlen;
 
/* Store pkt_key clone when creating deferred action. */
struct sw_flow_key pkt_key;
@@ -135,7 +136,8 @@ static struct deferred_action *action_fifo_put(struct 
action_fifo *fifo)
 static struct deferred_action *add_deferred_actions(struct sk_buff *skb,
const struct sw_flow_key *key,
const struct nlattr *actions,
-   const int actions_len)
+   const int actions_len,
+   const int actions_attrlen)
 {
struct action_fifo *fifo;
struct deferred_action *da;
@@ -146,6 +148,7 @@ static struct deferred_action *add_deferred_actions(struct 
sk_buff *skb,
da->skb = skb;
da->actions = actions;
da->actions_len = actions_len;
+   da->actions_attrlen = actions_attrlen;
da->pkt_key = *key;
}
 
@@ -166,6 +169,7 @@ static int clone_execute(struct datapath *dp, struct 
sk_buff *skb,
 struct sw_flow_key *key,
 u32 recirc_id,
 const struct nlattr *actions, int len,
+int actions_attrlen,
 bool last, bool clone_flow_key);
 
 static void update_ethertype(struct sk_buff *skb, struct ethhdr *hdr,
@@ -880,7 +884,7 @@ static void do_output(struct datapath *dp, struct sk_buff 
*skb, int out_port,
 static int output_userspace(struct datapath *dp, struct sk_buff *skb,
struct sw_flow_key *key, const struct nlattr *attr,
const struct nlattr *actions, int actions_len,
-   uint32_t cutlen)
+   int actions_attrlen, uint32_t cutlen)
 {
struct dp_upcall_info upcall;
const struct nlattr *a;
@@ -921,6 +925,7 @@ static int output_userspace(struct datapath *dp, struct 
sk_buff *skb,
/* Include actions. */
upcall.actions = actions;
upcall.actions_len = actions_len;
+   upcall.actions_attrlen = actions_attrlen;
break;
}
 
@@ -936,7 +941,7 @@ static int output_userspace(struct datapath *dp, struct 
sk_buff *skb,
  */
 static int sample(struct datapath *dp, struct sk_buff *skb,
  struct sw_flow_key *key, const struct nlattr *attr,
- bool last)
+  

Re: Driver profiles RFC

2017-08-13 Thread Jiri Pirko
Fri, Aug 11, 2017 at 11:57:38PM CEST, kubak...@wp.pl wrote:
>On Tue, 8 Aug 2017 16:15:41 +0300, Arkadi Sharshevsky wrote:
>> Driver <--> Devlink API
>> ===
>> Each driver will register his resources with default values at init in
>> a similar way to DPIPE table registration. In case those resources already
>> exist the default values are discarded. The user will be able to dump and
>> update the resources. In order for the changes to take place the user will
>> need to re-initiate the driver by a specific devlink knob.
>
>What seems missing from the examples is the ability to dump the
>different states - the "pending" configuration and the currently
>applied one.

Agreed, I miss this too.


>
>> The above described procedure will require extra reload of the driver.
>> This can be improved as a future optimization.
>
>I'm a bit lost, this says driver reload is required...

driver will provide "commit" callback. What he does is up to him. In
case of mlxsw, we have to reinstantiate the driver, because FW reset is
required.


>
>> UAPI
>> 
>> The user will be able to update the resources on a per resource basis:
>> 
>> $devlink dpipe resource set pci/:03:00.0 Mem_Linear 2M
>> 
>> For some resources the size is fixed, for example the size of the internal
>> memory cannot be changed. It is provided merely in order to reflect the
>> nested structure of the resource and to imply the user that Mem = Linear +
>> Hash, thus a set operation on it will fail.
>> 
>> The user can dump the current resource configuration:
>> 
>> #devlink dpipe resource dump tree pci/:03:00.0 Mem
>> 
>> The user can specify 'tree' in order to show all the nested resources under
>> the specified one. In case no 'resource name' is specified the TOP hierarchy
>> will be dumped.
>> 
>> After successful resource update the drivers hould be re-instantiated in
>> order for the changes to take place:
>> 
>> $devlink reload pci/:03:00.0
>
>... but this shows a devlink reload tigger, so no driver reload?  Were
>you describing two possible solutions?  One with persistent kernel
>database of configs (persistent across driver reloads) and one with no
>persistence and the driver is managing reinit internally when triggered
>via devlink?

This is misunderstanding. There is no driver reload (modprobe -r &&
modprobe)


>
>Another thing that comes to mind is - in case HW/FW reinit takes long
>would it make sense to incorporate some form of pre-population of those
>defaults somehow?  If user knows exactly the config they want upon
>boot, it would seem cleaner if the reconfig did not have to happen and
>devices started out in the right mode.

We were discussing it. There are couple of ways to achieve that, none of
that is very nice. So we decided to leave this for later, when/if it is
needed.