date:20170404

Re: [PATCH] net: netfilters: Remove unnecessary parenthesis

2017-04-04 Thread Simon Horman

On Tue, Mar 28, 2017 at 06:56:48PM +0530, Arushi Singhal wrote:
> Removed parentheses on the right hand side of assignment, as they are
> not required. The following coccinelle script was used to fix this
> issue:
> 
> @@
> local idexpression id;
> expression e;
> @@
> 
> id =
> -(
> e
> -)
> 
> Signed-off-by: Arushi Singhal 
> ---
>  net/netfilter/ipvs/ip_vs_proto_tcp.c   | 2 +-

grep seems to find some other instances of this problem for IPVS.
I would prefer if all of them were fixed.

$ grep -n "= *(.*);" net/netfilter/ipvs/*
After some manual filtering I see

net/netfilter/ipvs/ip_vs_ctl.c:113: nomem = (availmem < ipvs->sysctl_amemthresh);
net/netfilter/ipvs/ip_vs_ctl.c:279: ahash ^= ((size_t) ipvs >> 8);
net/netfilter/ipvs/ip_vs_proto_tcp.c:490:   int on = (flags & 1);  /* secure_tcp */
net/netfilter/ipvs/ip_vs_proto_tcp.c:498:   pd->tcp_state_table = (on ? tcp_states_dos : tcp_states);
net/netfilter/ipvs/ip_vs_sync.c:719:s->v4.type = (cp->af == AF_INET6 ? STYPE_F_INET6 : 0);
net/netfilter/ipvs/ip_vs_sync.c:948:cp->timeout = (3*60*HZ)
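
For anyone wondering why the parentheses on these lines can simply go: the assignment operators (=, ^=) bind more loosely than the comparison, bitwise, shift, cast and conditional operators used on the right-hand sides, so wrapping the whole right-hand side changes nothing. A small userspace check; the helper names are hypothetical and only mimic the shape of the quoted IPVS lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers shaped like the quoted IPVS assignments.
 * Each *_paren/*_bare pair differs only in the outer parentheses. */
static bool nomem_paren(long availmem, long thresh)
{
	bool nomem = (availmem < thresh);
	return nomem;
}

static bool nomem_bare(long availmem, long thresh)
{
	bool nomem = availmem < thresh;	/* < binds tighter than = */
	return nomem;
}

static size_t hash_paren(size_t ahash, size_t key)
{
	ahash ^= ((size_t) key >> 8);
	return ahash;
}

static size_t hash_bare(size_t ahash, size_t key)
{
	ahash ^= (size_t) key >> 8;	/* cast binds tighter than >>, >> tighter than ^= */
	return ahash;
}

static int type_paren(bool is_v6)
{
	int type = (is_v6 ? 1 : 0);
	return type;
}

static int type_bare(bool is_v6)
{
	int type = is_v6 ? 1 : 0;	/* ?: binds tighter than = */
	return type;
}
```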

As mentioned elsewhere: If you would like me to pick up patches for IPVS
then please post patches that only update IPVS files. I'm also happy for
Pablo to pick up patches that include both IPVS and non-IPVS Netfilter
updates of this nature.  Pablo can offer his own guidance here.

Re: [PATCH] net: netfilter: Replace explicit NULL comparison with ! operator

2017-04-04 Thread Simon Horman

On Wed, Mar 29, 2017 at 03:45:01PM +0530, Arushi Singhal wrote:
> Replace explicit NULL comparison with ! operator to simplify code.
> 
> Signed-off-by: Arushi Singhal 
> ---
>  net/netfilter/ipvs/ip_vs_ctl.c         |  8 ++---
>  net/netfilter/ipvs/ip_vs_proto.c       |  8 ++---

I count 18 instances of "!= NULL" in net/netfilter/ipvs/ip_vs_proto but this
patch only seems to update 8 of them. I would prefer to fix all or none of
them.

If you would like me to pick up patches for IPVS then please post patches
that only update IPVS files. I'm also happy for Pablo to pick up patches
that include both IPVS and non-IPVS Netfilter updates of this nature.

Pablo can offer his own guidance here.

>  net/netfilter/nf_conntrack_broadcast.c |  2 +-
>  net/netfilter/nf_conntrack_core.c      |  2 +-
>  net/netfilter/nf_conntrack_ecache.c    |  4 +--
>  net/netfilter/nf_conntrack_helper.c    |  4 +--
>  net/netfilter/nf_conntrack_proto.c     |  4 +--
>  net/netfilter/nf_log.c                 |  2 +-
>  net/netfilter/nf_nat_redirect.c        |  2 +-
>  net/netfilter/nf_tables_api.c          | 62 +-
>  net/netfilter/nfnetlink_log.c          |  6 ++--
>  net/netfilter/nfnetlink_queue.c        |  8 ++---
>  net/netfilter/nft_compat.c             |  4 +--
>  net/netfilter/nft_ct.c                 | 10 +++---
>  net/netfilter/nft_dynset.c             | 14
>  net/netfilter/nft_log.c                | 14
>  net/netfilter/nft_lookup.c             |  2 +-
>  net/netfilter/nft_payload.c            |  4 +--
>  net/netfilter/nft_set_hash.c           |  4 +--
>  net/netfilter/x_tables.c               |  8 ++---
>  net/netfilter/xt_TCPMSS.c              |  4 +--
>  net/netfilter/xt_addrtype.c            |  2 +-
>  net/netfilter/xt_connlimit.c           |  2 +-
>  net/netfilter/xt_conntrack.c           |  2 +-
>  net/netfilter/xt_hashlimit.c           |  4 +--
>  net/netfilter/xt_recent.c              |  6 ++--
>  26 files changed, 96 insertions(+), 96 deletions(-)
> 
> diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
> index 5aeb0dde6ccc..32daa0b3797e 100644
> --- a/net/netfilter/ipvs/ip_vs_ctl.c
> +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> @@ -983,7 +983,7 @@ ip_vs_add_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
>   dest = ip_vs_lookup_dest(svc, udest->af, , dport);
>   rcu_read_unlock();
>  
> - if (dest != NULL) {
> + if (dest) {
>   IP_VS_DBG(1, "%s(): dest already exists\n", __func__);
>   return -EEXIST;
>   }
> @@ -994,7 +994,7 @@ ip_vs_add_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
>*/
>   dest = ip_vs_trash_get_dest(svc, udest->af, , dport);
>  
> - if (dest != NULL) {
> + if (dest) {
>   IP_VS_DBG_BUF(3, "Get destination %s:%u from trash, "
> "dest->refcnt=%d, service %u/%s:%u\n",
> IP_VS_DBG_ADDR(udest->af, ), ntohs(dport),
> @@ -1299,7 +1299,7 @@ ip_vs_add_service(struct netns_ipvs *ipvs, struct ip_vs_service_user_kern *u,
>  
>  
>   out_err:
> - if (svc != NULL) {
> + if (svc) {
>   ip_vs_unbind_scheduler(svc, sched);
>   ip_vs_service_free(svc);
>   }
> @@ -2453,7 +2453,7 @@ do_ip_vs_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
>  
>   switch (cmd) {
>   case IP_VS_SO_SET_ADD:
> - if (svc != NULL)
> + if (svc)
>   ret = -EEXIST;
>   else
>   ret = ip_vs_add_service(ipvs, , );
> diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
> index 8ae480715cea..6ee7fec2ef47 100644
> --- a/net/netfilter/ipvs/ip_vs_proto.c
> +++ b/net/netfilter/ipvs/ip_vs_proto.c
> @@ -53,7 +53,7 @@ static int __used __init register_ip_vs_protocol(struct ip_vs_protocol *pp)
>   pp->next = ip_vs_proto_table[hash];
>   ip_vs_proto_table[hash] = pp;
>  
> - if (pp->init != NULL)
> + if (pp->init)
>   pp->init(pp);
>  
>   return 0;
> @@ -77,7 +77,7 @@ register_ip_vs_proto_netns(struct netns_ipvs *ipvs, struct ip_vs_protocol *pp)
>   ipvs->proto_data_table[hash] = pd;
>   atomic_set(>appcnt, 0); /* Init app counter */
>  
> - if (pp->init_netns != NULL) {
> + if (pp->init_netns) {
>   int ret = pp->init_netns(ipvs, pd);
>   if (ret) {
>   /* unlink an free proto data */
> @@ -102,7 +102,7 @@ static int unregister_ip_vs_protocol(struct ip_vs_protocol *pp)
>   for (; *pp_p; pp_p = &(*pp_p)->next) {
>   if (*pp_p == pp) {
>   *pp_p = pp->next;
> - if (pp->exit != NULL)
> + if (pp->exit)
>   pp->exit(pp);
>   return 0;
>   }
> @@ -124,7 +124,7 @@
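
The conversion in this patch relies on the C rule that a pointer used as an if condition is compared against a null pointer, so `if (p)` means exactly `if (p != NULL)`. A hypothetical userspace illustration; `struct proto` and `run()` are made up and only loosely shaped after the optional `pp->init` hook in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor with an optional hook, like pp->init above. */
struct proto {
	void (*init)(struct proto *pp);
	int inited;
};

static void proto_init(struct proto *pp)
{
	pp->inited = 1;
}

/* Old style: explicit comparison against NULL. */
static void call_init_explicit(struct proto *pp)
{
	if (pp->init != NULL)
		pp->init(pp);
}

/* New style: a null pointer tests false, any other pointer tests true. */
static void call_init_implicit(struct proto *pp)
{
	if (pp->init)
		pp->init(pp);
}

/* Returns 1 if the hook ran, 0 otherwise. */
static int run(int have_hook, int explicit_form)
{
	struct proto pp = { have_hook ? proto_init : NULL, 0 };

	if (explicit_form)
		call_init_explicit(&pp);
	else
		call_init_implicit(&pp);
	return pp.inited;
}
```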

Re: [PATCH 0/2] ARM: am335x-icev2: Add ethernet support

2017-04-04 Thread David Miller

From: Tony Lindgren 
Date: Tue, 4 Apr 2017 09:01:06 -0700

> You may need to resend the davinci_mdio.c patch alone
> for Dave as he usually won't pick individual patches I
> think.

Correct.

Re: [Intel-wired-lan] [PATCH] igb: Allow to remove administratively set MAC on VFs

2017-04-04 Thread Alexander Duyck

On Tue, Apr 4, 2017 at 10:16 AM, Duyck, Alexander H
 wrote:
>> -Original Message-
>> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
>> Behalf Of Corinna Vinschen
>> Sent: Tuesday, April 4, 2017 8:11 AM
>> To: intel-wired-...@lists.osuosl.org
>> Cc: netdev@vger.kernel.org
>> Subject: [Intel-wired-lan] [PATCH] igb: Allow to remove administratively set MAC on VFs
>>
>>   Before libvirt modifies the MAC address and vlan tag for an SRIOV VF
>>   for use by a virtual machine (either using vfio device assignment or
>>   macvtap passthru mode), it saves the current MAC address and vlan tag
>>   so that it can reset them to their original value when the guest is
>>   done.  Libvirt can't leave the VF MAC set to the value used by the
>>   now-defunct guest since it may be started again later using a
>>   different VF, but it certainly shouldn't just pick any random value,
>>   either. So it saves the state of everything prior to using the VF, and
>>   resets it to that.
>>
>>   The igb driver initializes the MAC addresses of all VFs to
>>   00:00:00:00:00:00, and reports that when asked (via an RTM_GETLINK
>>   netlink message, also visible in the list of VFs in the output of "ip
>>   link show"). But when libvirt attempts to restore the MAC address back
>>   to 00:00:00:00:00:00 (using an RTM_SETLINK netlink message) the kernel
>>   responds with "Invalid argument".
>>
>>   Forbidding a reset back to the original value leaves the VF MAC at the
>>   value set for the now-defunct virtual machine. Especially on a system
>>   with NetworkManager enabled, this has very bad consequences, since
>>   NetworkManager forces all interfaces to be IFF_UP all the time - if
>>   the same virtual machine is restarted using a different VF (or even on
>>   a different host), there will be multiple interfaces watching for
>>   traffic with the same MAC address.
>>
>>   To allow libvirt to revert to the original state, we need a way to
>>   remove the administratively set MAC on a VF, to allow normal host
>>   operation again, and to reset/overwrite the VF MAC via VF netdev.
>>
>>   This patch implements the outlined scenario by allowing to set the
>>   VF MAC to 00:00:00:00:00:00 via RTM_SETLINK on the PF.
>>   igb_ndo_set_vf_mac resets the IGB_VF_FLAG_PF_SET_MAC flag to 0,
>>   so it's possible to reset the VF MAC back to the original value via
>>   the VF netdev.
>>
>>   Note: Recent patches to libvirt allow for a workaround if the NIC
>>   isn't capable of resetting the administrative MAC back to all 0, but
>>   in theory the NIC should allow resetting the MAC in the fisr place.
>
> Minor typo here. I assume you mean "first place".
>
>> Signed-off-by: Corinna Vinschen 
>> ---
>>  drivers/net/ethernet/intel/igb/igb_main.c | 25 -
>>  1 file changed, 20 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
>> index 26a821f..e7a61b1 100644
>> --- a/drivers/net/ethernet/intel/igb/igb_main.c
>> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
>> @@ -8125,12 +8125,27 @@ static int igb_set_vf_mac(struct igb_adapter *adapter,
>>  static int igb_ndo_set_vf_mac(struct net_device *netdev, int vf, u8 *mac)
>>  {
>>   struct igb_adapter *adapter = netdev_priv(netdev);
>> - if (!is_valid_ether_addr(mac) || (vf >= adapter->vfs_allocated_count))
>> +
>> + if (vf >= adapter->vfs_allocated_count)
>> + return -EINVAL;
>
> I would add a blank line here just for readability.
>
>> + /* Setting the VF MAC to 0 reverts the IGB_VF_FLAG_PF_SET_MAC
>> +flag and allows to overwrite the MAC via VF netdev.  This
>> +is necessary to allow libvirt a way to restore the original
>> +MAC after unbinding vfio-pci and reloading igbvf after shutting
>> +down a VM. */
>
> Minor coding style issue here. The "*/" should be on a separate line.
>
>> + if (is_zero_ether_addr(mac)) {
>> + adapter->vf_data[vf].flags &= ~IGB_VF_FLAG_PF_SET_MAC;
>> + dev_info(>pdev->dev,
>> +  "remove administratively set MAC on VF %d\n",
>> +  vf);
>> + } else if (is_valid_ether_addr (mac)) {
>> + adapter->vf_data[vf].flags |= IGB_VF_FLAG_PF_SET_MAC;
>> + dev_info(>pdev->dev, "setting MAC %pM on VF %d\n",
>> +  mac, vf);
>> + dev_info(>pdev->dev,
>> +  "Reload the VF driver to make this change effective.");
>> + } else
>>   return -EINVAL;
>
> Minor coding style issue here. The else should also have "{}" wrapping the 
> statement. Generally if any one of the statements in an if/else series needs 
> the braces they should all have the braces.
>
>> - adapter->vf_data[vf].flags |= IGB_VF_FLAG_PF_SET_MAC;
>> - dev_info(>pdev->dev, "setting MAC %pM on VF %d\n",
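
The control flow the patch proposes for igb_ndo_set_vf_mac() can be sketched in userspace as follows, with the review's style points applied (blank line after the bounds check, braces on every branch). This is a sketch under assumptions: is_zero_ether_addr() and is_valid_ether_addr() are re-implemented to mirror the semantics of the kernel helpers of the same names (valid means neither multicast nor all-zero), `struct vf_data`, `VF_FLAG_PF_SET_MAC` and `set_vf_mac()` are hypothetical stand-ins, and programming the address into the hardware is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6
#define VF_FLAG_PF_SET_MAC 0x1u	/* hypothetical stand-in for IGB_VF_FLAG_PF_SET_MAC */

/* Userspace mirrors of the <linux/etherdevice.h> helpers' semantics. */
static bool is_zero_ether_addr(const uint8_t *a)
{
	static const uint8_t zero[ETH_ALEN];

	return memcmp(a, zero, ETH_ALEN) == 0;
}

static bool is_multicast_ether_addr(const uint8_t *a)
{
	return a[0] & 0x01;	/* I/G bit of the first octet */
}

static bool is_valid_ether_addr(const uint8_t *a)
{
	return !is_multicast_ether_addr(a) && !is_zero_ether_addr(a);
}

struct vf_data {
	unsigned int flags;
};

static int set_vf_mac(struct vf_data *vfs, int nr_vfs, int vf, const uint8_t *mac)
{
	if (vf >= nr_vfs)
		return -EINVAL;

	if (is_zero_ether_addr(mac)) {
		/* All-zero MAC: drop the PF-set flag so the VF may set its own. */
		vfs[vf].flags &= ~VF_FLAG_PF_SET_MAC;
	} else if (is_valid_ether_addr(mac)) {
		/* Administratively set MAC: the VF may no longer override it. */
		vfs[vf].flags |= VF_FLAG_PF_SET_MAC;
	} else {
		return -EINVAL;	/* multicast addresses are rejected */
	}

	return 0;
}
```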

Re: [PATCH] net: ethernet: ti: cpsw: fix race condition during open()

2017-04-04 Thread David Miller

From: Sekhar Nori 
Date: Mon, 3 Apr 2017 17:34:28 +0530

> TI's cpsw driver handles both OF and non-OF case for phy
> connect. Unfortunately of_phy_connect() returns NULL on
> error while phy_connect() returns ERR_PTR().
> 
> To handle this, cpsw_slave_open() overrides the return value
> from phy_connect() to make it NULL or error.
> 
> This leaves a small window, where cpsw_adjust_link() may be
> invoked for a slave while slave->phy pointer is temporarily
> set to -ENODEV (or some other error) before it is finally set
> to NULL.
> 
> _cpsw_adjust_link() only handles the NULL case, and an oops
> results when ERR_PTR() is seen by it.
> 
> Note that cpsw_adjust_link() checks PHY status for each
> slave whenever it is invoked. It can so happen that even
> though phy_connect() for a given slave returns error,
> _cpsw_adjust_link() is still called for that slave because
> the link status of another slave changed.
> 
> Fix this by using a temporary pointer to store return value
> of {of_}phy_connect() and do a one-time write to slave->phy.
> 
> Reviewed-by: Grygorii Strashko 
> Reported-by: Yan Liu 
> Signed-off-by: Sekhar Nori 

Applied, thank you.
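
The pattern the fix describes, normalising the connect result in a local variable and writing slave->phy exactly once, might look like this in a userspace sketch. ERR_PTR() and IS_ERR() follow the kernel's <linux/err.h> encoding; `fake_phy_connect()`, `adjust_link()`, `slave_open()` and the stripped-down structs are hypothetical. Being single-threaded, it only shows that the shared field never holds an error pointer; it does not model the concurrent caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Error pointers: small negative errno values live in the top page. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct phy_device { int link; };		/* hypothetical, stripped down */
struct slave { struct phy_device *phy; };	/* hypothetical, stripped down */

static struct phy_device the_phy = { 1 };

/* Stand-in for phy_connect(): returns ERR_PTR() on failure, never NULL. */
static struct phy_device *fake_phy_connect(int fail)
{
	return fail ? ERR_PTR(-ENODEV) : &the_phy;
}

/* Like _cpsw_adjust_link() in the description: only copes with NULL. */
static int adjust_link(const struct slave *s)
{
	if (!s->phy)
		return 0;
	return s->phy->link;	/* would dereference an ERR_PTR() if one leaked here */
}

/* Normalise into a local first, then publish with a single store. */
static void slave_open(struct slave *s, int fail)
{
	struct phy_device *phy = fake_phy_connect(fail);

	if (IS_ERR(phy))
		phy = NULL;
	s->phy = phy;
}
```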

Re: [PATCH 1/2] bpf: remove struct bpf_prog_type_list

2017-04-04 Thread Johannes Berg

Oops, I really meant to send these as RFC more than anything, because I
don't really understand why it's done that way :)

FWIW, the bloat-o-meter looks similar in both cases, like this:

add/remove: 0/11 grow/shrink: 9/1 up/down: 145/-365 (-220)
function                          old   new  delta
bpf_map_types                      16    96    +80
register_htab_map                  56    76    +20
register_array_map                 32    42    +10
bpf_register_map_type              35    45    +10
register_trie_map                  20    25     +5
register_stack_map                 20    25     +5
register_prog_array_map            20    25     +5
register_perf_event_array_map      20    25     +5
register_cgroup_array_map          20    25     +5
sys_bpf                          1140  1127    -13
trie_type                          32     -    -32
stack_map_type                     32     -    -32
prog_array_type                    32     -    -32
perf_event_array_type              32     -    -32
percpu_array_type                  32     -    -32
htab_type                          32     -    -32
htab_percpu_type                   32     -    -32
htab_lru_type                      32     -    -32
htab_lru_percpu_type               32     -    -32
cgroup_array_type                  32     -    -32
array_type                         32     -    -32

johannes

Re: [PATCH next] bonding: fix active-backup transition

2017-04-04 Thread Andy Gospodarek

On Mon, Apr 03, 2017 at 06:38:39PM -0700, Mahesh Bandewar wrote:
> From: Mahesh Bandewar 
> 
> Earlier patch c4adfc822bf5 ("bonding: make speed, duplex setting
> consistent with link state") made an attempt to keep slave state
> consistent with speed and duplex settings. Unfortunately link-state
> transition is used to change the active link especially when used
> in conjunction with mii-mon. The above mentioned patch broke that
> logic. Also when speed and duplex settings for a link are updated
> during a link-event, the link-status should not be changed to
> invoke correct transition logic.
> 
> This patch fixes this issue by moving the link-state update outside
> of the bond_update_speed_duplex() fn and to the places where this fn
> is called and update link-state selectively.
> 
> Fixes: c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state")
> Signed-off-by: Mahesh Bandewar 

Reviewed-by: Andy Gospodarek 

> ---
>  drivers/net/bonding/bond_main.c | 13 +
>  1 file changed, 5 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 27359dab78a1..535388b15cde 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -378,20 +378,15 @@ static int bond_update_speed_duplex(struct slave *slave)
>   slave->duplex = DUPLEX_UNKNOWN;
>  
>   res = __ethtool_get_link_ksettings(slave_dev, );
> - if (res < 0) {
> - slave->link = BOND_LINK_DOWN;
> + if (res < 0)
>   return 1;
> - }
> - if (ecmd.base.speed == 0 || ecmd.base.speed == ((__u32)-1)) {
> - slave->link = BOND_LINK_DOWN;
> + if (ecmd.base.speed == 0 || ecmd.base.speed == ((__u32)-1))
>   return 1;
> - }
>   switch (ecmd.base.duplex) {
>   case DUPLEX_FULL:
>   case DUPLEX_HALF:
>   break;
>   default:
> - slave->link = BOND_LINK_DOWN;
>   return 1;
>   }
>  
> @@ -1563,7 +1558,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
>   new_slave->delay = 0;
>   new_slave->link_failure_count = 0;
>  
> - bond_update_speed_duplex(new_slave);
> + if (bond_update_speed_duplex(new_slave))
> + new_slave->link = BOND_LINK_DOWN;
>  
>   new_slave->last_rx = jiffies -
>   (msecs_to_jiffies(bond->params.arp_interval) + 1);
> @@ -2126,6 +2122,7 @@ static void bond_miimon_commit(struct bonding *bond)
>  
>   case BOND_LINK_UP:
>   if (bond_update_speed_duplex(slave)) {
> + slave->link = BOND_LINK_DOWN;
>   netdev_warn(bond->dev,
>   "failed to get link speed/duplex for %s\n",
>   slave->dev->name);
> -- 
> 2.12.2.715.g7642488e1d-goog
>
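
A simplified model of the split this patch makes: the speed/duplex helper only reports failure, and each caller decides whether that failure should change the link state. All types and constants are local stand-ins (the numeric values are not the kernel's), the settings struct stands in for the result of __ethtool_get_link_ksettings(), and `link_event()` is a hypothetical stand-in for a netdev event handler that only refreshes speed/duplex.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins; numeric values do not match the kernel headers. */
enum bond_link { BOND_LINK_UP, BOND_LINK_DOWN };
enum bond_duplex { DUPLEX_HALF, DUPLEX_FULL, DUPLEX_UNKNOWN };
#define SPEED_UNKNOWN ((uint32_t)-1)

/* Models what __ethtool_get_link_ksettings() returned. */
struct ksettings { int res; uint32_t speed; enum bond_duplex duplex; };

struct slave { enum bond_link link; uint32_t speed; enum bond_duplex duplex; };

/* After the patch: report failure, never touch s->link. */
static int update_speed_duplex(struct slave *s, const struct ksettings *k)
{
	s->speed = SPEED_UNKNOWN;
	s->duplex = DUPLEX_UNKNOWN;

	if (k->res < 0)
		return 1;
	if (k->speed == 0 || k->speed == (uint32_t)-1)
		return 1;
	switch (k->duplex) {
	case DUPLEX_FULL:
	case DUPLEX_HALF:
		break;
	default:
		return 1;
	}

	s->speed = k->speed;
	s->duplex = k->duplex;
	return 0;
}

/* Enslave path: a failed query now demotes the link explicitly. */
static void enslave(struct slave *s, const struct ksettings *k)
{
	if (update_speed_duplex(s, k))
		s->link = BOND_LINK_DOWN;
}

/* Event path: refresh speed/duplex, leave the link state machine alone. */
static void link_event(struct slave *s, const struct ksettings *k)
{
	(void)update_speed_duplex(s, k);
}
```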

Re: pull-request: wireless-drivers-next 2017-04-03

2017-04-04 Thread David Miller

From: Kalle Valo 
Date: Mon, 03 Apr 2017 14:26:10 +0300

> here are a few really small fixes. I'm hoping this will be the last pull request
> for 4.11.
> 
> Please let me know if there are any problems.

Pulled, thanks.

But I will warn you, you say fixes, but your Subject line and
GIT tag says "-next" so I pulled it into net-next.

Re: [PATCH] bpf: use 'ctx' instead of 'skb' in debug message

2017-04-04 Thread Johannes Berg

On Tue, 2017-04-04 at 19:26 +0200, Daniel Borkmann wrote:

> >     if (regs[BPF_REG_6].type != PTR_TO_CTX) {
> > -   verbose("at the time of BPF_LD_ABS|IND R6 != pointer to skb\n");
> > +   verbose("at the time of BPF_LD_ABS|IND R6 != pointer to ctx\n");
> >     return -EINVAL;
> 
> Seems okay, the reason why we had 'skb' in the verbose message here
> is due to BPF_LD + BPF_ABS/BPF_IND operations being only specific to
> skbs and no other context (see __bpf_prog_run(), and in verifier
> may_access_skb() check before that verbose() message in
> check_ld_abs()). Reason for this is mostly historical due to the cBPF
> to eBPF migration so that these loads don't get slowed down when
> migrated to eBPF and can be handled by JIT optimizations (e.g.,
> caching skb->data), too. Anyway, just to provide some more background
> on this. I've no strong opinion if you want to change the verifier
> error message, so:

Ah. I really have no opinion on this either - it just seemed somewhat
inconsistent. I clearly neglected to read the comment in front of the
function though, that explains that we must have ctx == skb. I think
therefore it's probably better to drop this - thanks for the
explanation!

johannes

Re: net/sctp: list double add warning in sctp_endpoint_add_asoc

2017-04-04 Thread Xin Long

On Tue, Apr 4, 2017 at 9:28 PM, Andrey Konovalov  wrote:
> Hi,
>
> I've got the following error report while fuzzing the kernel with syzkaller.
>
> On commit a71c9a1c779f2499fb2afc0553e543f18aff6edf (4.11-rc5).
>
> A reproducer and .config are attached.
With the script it is pretty hard to reproduce the issue in my env.
But there seems to be a case that can cause a use-after-free when out of snd_buf.

the case is like:
---
one thread:                      another thread:
                                 sctp_rcv hold asoc (hold transport)
                                 enqueue the chunk to backlog queue
                                 [refcnt=2]

sctp_close free assoc
[refcnt=1]

sctp_sendmsg find asoc
but not hold it

out of snd_buf
hold asoc, schedule out
[refcnt = 2]

                                 process backlog and put asoc/transport
                                 [refcnt=1]

schedule in, put asoc
[refcnt=0] <--- destroyed

sctp_sendmsg continue
using asoc, panic

Maybe we should check whether the asoc is already dead when sctp_sendmsg
is scheduled back in after running out of snd_buf.
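
One way to picture the suggested check is a deterministic, single-threaded replay of the timeline above. Everything here is hypothetical and is not the real SCTP structures or locking: the `freed` flag replaces free() so a use after free can be counted, and -EPIPE is simply the error chosen for the sketch. The point it illustrates is that the dead check must happen while the waiter still holds its own reference, before the final put.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical association model; "freed" is recorded instead of free(). */
struct assoc {
	int refcnt;
	bool dead;		/* set once sctp_close has released it */
	bool freed;
	int uses_after_free;
};

static void assoc_hold(struct assoc *a) { a->refcnt++; }

static void assoc_put(struct assoc *a)
{
	if (--a->refcnt == 0)
		a->freed = true;
}

static void assoc_use(struct assoc *a)
{
	if (a->freed)
		a->uses_after_free++;
}

/* Whatever the other thread does while the sender is scheduled out. */
typedef void (*other_thread_fn)(struct assoc *a);

static int wait_for_sndbuf(struct assoc *a, other_thread_fn other, bool check_dead)
{
	int err = 0;

	assoc_hold(a);			/* out of snd_buf: hold asoc, schedule out */
	other(a);			/* the other thread runs meanwhile */
	if (check_dead && a->dead)	/* decide while our reference still pins it */
		err = -EPIPE;
	assoc_put(a);			/* schedule in, put asoc: may be the last ref */
	return err;
}

static int sendmsg_model(struct assoc *a, other_thread_fn other, bool check_dead)
{
	/* the asoc was found, but no reference was taken for this caller */
	int err = wait_for_sndbuf(a, other, check_dead);

	if (err)
		return err;
	assoc_use(a);			/* sctp_sendmsg continues using asoc */
	return 0;
}

static void backlog_put(struct assoc *a)
{
	assoc_put(a);			/* process backlog and put asoc/transport */
}

/* Replays the timeline; returns the number of uses after free. */
static int replay(bool check_dead, int *err)
{
	struct assoc a = { .refcnt = 1 };	/* the socket's reference */

	assoc_hold(&a);		/* sctp_rcv holds asoc, chunk goes to backlog */
	a.dead = true;		/* sctp_close ... */
	assoc_put(&a);		/* ... drops the socket's reference */

	*err = sendmsg_model(&a, backlog_put, check_dead);
	return a.uses_after_free;
}
```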

Re: [PATCH net] l2tp: fix PPP pseudo-wire auto-loading

2017-04-04 Thread David Miller

From: Guillaume Nault 
Date: Mon, 3 Apr 2017 13:23:15 +0200

> PPP pseudo-wire type is 7 (11 is L2TP_PWTYPE_IP).
> 
> Fixes: f1f39f911027 ("l2tp: auto load type modules")
> Signed-off-by: Guillaume Nault 

Applied and queued up for -stable.
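
For reference, the two pseudo-wire type values mentioned in the fix and the module alias the auto-loading would look up for each. The values match the L2TP_PWTYPE_* constants in include/uapi/linux/l2tp.h; the "net-l2tp-type-<N>" alias format and the `pwtype_alias()` helper are assumptions for illustration. With the wrong number advertised, a request for a PPP pseudo-wire would presumably not find the PPP module.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Pseudo-wire type values as defined in include/uapi/linux/l2tp.h. */
enum { L2TP_PWTYPE_PPP = 0x0007, L2TP_PWTYPE_IP = 0x000b };

/* Hypothetical helper: builds the alias assumed to be requested for a type. */
static void pwtype_alias(char *buf, size_t len, unsigned int pw_type)
{
	snprintf(buf, len, "net-l2tp-type-%u", pw_type);
}
```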

[PATCH 1/2] bpf: remove struct bpf_prog_type_list

2017-04-04 Thread Johannes Berg

From: Johannes Berg 

There's no need to have struct bpf_prog_type_list since
it just contains a list_head, the type, and the ops
pointer. Since the types are densely packed and not
actually dynamically registered, it's much easier and
smaller to have an array of type->ops pointers.

This doesn't really change the image size much, but in
the running image it saves a few hundred bytes because
the structs are removed and traded against __init code.

While at it, also mark bpf_register_prog_type() __init
since it's only called from code already marked so.
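
A minimal userspace sketch of the array-based registry described above, using hypothetical names (`prog_types`, `register_prog_type()`, `find_prog_type()`) and a trimmed-down enum whose trailing NUM_* entry sizes the table. Unlike the patch, which uses WARN_ON() on double registration, this sketch returns -EINVAL and also bounds-checks the registration side.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down enum; the sentinel sizes the table. */
enum prog_type {
	PROG_TYPE_UNSPEC,
	PROG_TYPE_SOCKET_FILTER,
	PROG_TYPE_KPROBE,
	NUM_PROG_TYPES,
};

struct verifier_ops { const char *name; };

static const struct verifier_ops kprobe_ops = { "kprobe" };

/* One slot per type; the kernel version marks this __ro_after_init. */
static const struct verifier_ops *prog_types[NUM_PROG_TYPES];

/* Init-time registration; refuses out-of-range or duplicate types. */
static int register_prog_type(enum prog_type type, const struct verifier_ops *ops)
{
	if ((unsigned int)type >= NUM_PROG_TYPES || prog_types[type])
		return -EINVAL;
	prog_types[type] = ops;
	return 0;
}

/* Lookup is a bounds check plus an index instead of a list walk. */
static const struct verifier_ops *find_prog_type(unsigned int type)
{
	if (type >= NUM_PROG_TYPES)
		return NULL;
	return prog_types[type];
}
```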

Signed-off-by: Johannes Berg 
---
So I'm not sure about this - I looked at this code since
I wanted to see if we could even register prog_types from
modules, but that seems to be impossible right now ...
---
 include/linux/bpf.h  | 12 +++--
 include/uapi/linux/bpf.h |  2 ++
 kernel/bpf/syscall.c | 26 ++--
 kernel/trace/bpf_trace.c | 21 +++-
 net/core/filter.c| 63 +++-
 5 files changed, 31 insertions(+), 93 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 909fc033173a..891a76aaccaa 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -169,12 +169,6 @@ struct bpf_verifier_ops {
  struct bpf_prog *prog);
 };
 
-struct bpf_prog_type_list {
-   struct list_head list_node;
-   const struct bpf_verifier_ops *ops;
-   enum bpf_prog_type type;
-};
-
 struct bpf_prog_aux {
atomic_t refcnt;
u32 used_map_cnt;
@@ -234,7 +228,8 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
 #ifdef CONFIG_BPF_SYSCALL
 DECLARE_PER_CPU(int, bpf_prog_active);
 
-void bpf_register_prog_type(struct bpf_prog_type_list *tl);
+void bpf_register_prog_type(enum bpf_prog_type type,
+   const struct bpf_verifier_ops *ops);
 void bpf_register_map_type(struct bpf_map_type_list *tl);
 
 struct bpf_prog *bpf_prog_get(u32 ufd);
@@ -295,7 +290,8 @@ static inline void bpf_long_memcpy(void *dst, const void *src, u32 size)
 /* verify correctness of eBPF program */
 int bpf_check(struct bpf_prog **fp, union bpf_attr *attr);
 #else
-static inline void bpf_register_prog_type(struct bpf_prog_type_list *tl)
+static inline void bpf_register_prog_type(enum bpf_prog_type type,
+ const struct bpf_verifier_ops *ops)
 {
 }
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0539a0ceef38..cc68f5bbf458 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -112,6 +112,8 @@ enum bpf_prog_type {
BPF_PROG_TYPE_LWT_IN,
BPF_PROG_TYPE_LWT_OUT,
BPF_PROG_TYPE_LWT_XMIT,
+
+   NUM_BPF_PROG_TYPES,
 };
 
 enum bpf_attach_type {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 7af0dcc5d755..1156eccf36a5 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -564,26 +564,26 @@ static int map_get_next_key(union bpf_attr *attr)
return err;
 }
 
-static LIST_HEAD(bpf_prog_types);
+static const struct bpf_verifier_ops *
+bpf_prog_types[NUM_BPF_PROG_TYPES] __ro_after_init;
 
 static int find_prog_type(enum bpf_prog_type type, struct bpf_prog *prog)
 {
-   struct bpf_prog_type_list *tl;
-
-   list_for_each_entry(tl, _prog_types, list_node) {
-   if (tl->type == type) {
-   prog->aux->ops = tl->ops;
-   prog->type = type;
-   return 0;
-   }
-   }
+   if (type >= NUM_BPF_PROG_TYPES || !bpf_prog_types[type])
+   return -EINVAL;
 
-   return -EINVAL;
+   prog->aux->ops = bpf_prog_types[type];
+   prog->type = type;
+   return 0;
 }
 
-void bpf_register_prog_type(struct bpf_prog_type_list *tl)
+void __init bpf_register_prog_type(enum bpf_prog_type type,
+  const struct bpf_verifier_ops *ops)
 {
-   list_add(>list_node, _prog_types);
+   if (WARN_ON(bpf_prog_types[type]))
+   return;
+
+   bpf_prog_types[type] = ops;
 }
 
 /* fixup insn->imm field of bpf_call instructions:
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index cee9802cf3e0..1e45e1cd0174 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -506,11 +506,6 @@ static const struct bpf_verifier_ops kprobe_prog_ops = {
.is_valid_access = kprobe_prog_is_valid_access,
 };
 
-static struct bpf_prog_type_list kprobe_tl __ro_after_init = {
-   .ops= _prog_ops,
-   .type   = BPF_PROG_TYPE_KPROBE,
-};
-
 BPF_CALL_5(bpf_perf_event_output_tp, void *, tp_buff, struct bpf_map *, map,
   u64, flags, void *, data, u64, size)
 {
@@ -589,11 +584,6 @@ static const struct bpf_verifier_ops tracepoint_prog_ops = {
.is_valid_access = tp_prog_is_valid_access,
 };
 
-static struct bpf_prog_type_list tracepoint_tl __ro_after_init

[PATCH 2/2] bpf: remove struct bpf_map_type_list

2017-04-04 Thread Johannes Berg

From: Johannes Berg 

There's no need to have struct bpf_map_type_list since
it just contains a list_head, the type, and the ops
pointer. Since the types are densely packed and not
actually dynamically registered, it's much easier and
smaller to have an array of type->ops pointers.

This doesn't really change the image size much, but in
the running image it saves a few hundred bytes because
the structs are removed and traded against __init code.

While at it, also mark bpf_register_map_type() __init
since it's only called from code already marked so.

Signed-off-by: Johannes Berg 
---
 include/linux/bpf.h  |  9 ++---
 include/uapi/linux/bpf.h |  2 ++
 kernel/bpf/arraymap.c| 36 ++--
 kernel/bpf/hashtab.c | 29 +
 kernel/bpf/lpm_trie.c|  7 +--
 kernel/bpf/stackmap.c|  7 +--
 kernel/bpf/syscall.c | 33 ++---
 7 files changed, 35 insertions(+), 88 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 891a76aaccaa..9e0a2dec789a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -51,12 +51,6 @@ struct bpf_map {
atomic_t usercnt;
 };
 
-struct bpf_map_type_list {
-   struct list_head list_node;
-   const struct bpf_map_ops *ops;
-   enum bpf_map_type type;
-};
-
 /* function argument constraints */
 enum bpf_arg_type {
ARG_DONTCARE = 0,   /* unused argument in helper function */
@@ -230,7 +224,8 @@ DECLARE_PER_CPU(int, bpf_prog_active);
 
 void bpf_register_prog_type(enum bpf_prog_type type,
const struct bpf_verifier_ops *ops);
-void bpf_register_map_type(struct bpf_map_type_list *tl);
+void bpf_register_map_type(enum bpf_map_type type,
+  const struct bpf_map_ops *ops);
 
 struct bpf_prog *bpf_prog_get(u32 ufd);
 struct bpf_prog *bpf_prog_get_type(u32 ufd, enum bpf_prog_type type);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index cc68f5bbf458..53adc7e93062 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -96,6 +96,8 @@ enum bpf_map_type {
BPF_MAP_TYPE_LRU_HASH,
BPF_MAP_TYPE_LRU_PERCPU_HASH,
BPF_MAP_TYPE_LPM_TRIE,
+
+   NUM_BPF_MAP_TYPES,
 };
 
 enum bpf_prog_type {
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 6b6f41f0b211..6a3f3aa681de 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -269,11 +269,6 @@ static const struct bpf_map_ops array_ops = {
.map_delete_elem = array_map_delete_elem,
 };
 
-static struct bpf_map_type_list array_type __ro_after_init = {
-   .ops = _ops,
-   .type = BPF_MAP_TYPE_ARRAY,
-};
-
 static const struct bpf_map_ops percpu_array_ops = {
.map_alloc = array_map_alloc,
.map_free = array_map_free,
@@ -283,15 +278,10 @@ static const struct bpf_map_ops percpu_array_ops = {
.map_delete_elem = array_map_delete_elem,
 };
 
-static struct bpf_map_type_list percpu_array_type __ro_after_init = {
-   .ops = _array_ops,
-   .type = BPF_MAP_TYPE_PERCPU_ARRAY,
-};
-
 static int __init register_array_map(void)
 {
-   bpf_register_map_type(_type);
-   bpf_register_map_type(_array_type);
+   bpf_register_map_type(BPF_MAP_TYPE_ARRAY, _ops);
+   bpf_register_map_type(BPF_MAP_TYPE_PERCPU_ARRAY, _array_ops);
return 0;
 }
 late_initcall(register_array_map);
@@ -409,14 +399,9 @@ static const struct bpf_map_ops prog_array_ops = {
.map_fd_put_ptr = prog_fd_array_put_ptr,
 };
 
-static struct bpf_map_type_list prog_array_type __ro_after_init = {
-   .ops = _array_ops,
-   .type = BPF_MAP_TYPE_PROG_ARRAY,
-};
-
 static int __init register_prog_array_map(void)
 {
-   bpf_register_map_type(_array_type);
+   bpf_register_map_type(BPF_MAP_TYPE_PROG_ARRAY, _array_ops);
return 0;
 }
 late_initcall(register_prog_array_map);
@@ -522,14 +507,10 @@ static const struct bpf_map_ops perf_event_array_ops = {
.map_release = perf_event_fd_array_release,
 };
 
-static struct bpf_map_type_list perf_event_array_type __ro_after_init = {
-   .ops = _event_array_ops,
-   .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
-};
-
 static int __init register_perf_event_array_map(void)
 {
-   bpf_register_map_type(_event_array_type);
+   bpf_register_map_type(BPF_MAP_TYPE_PERF_EVENT_ARRAY,
+ _event_array_ops);
return 0;
 }
 late_initcall(register_perf_event_array_map);
@@ -564,14 +545,9 @@ static const struct bpf_map_ops cgroup_array_ops = {
.map_fd_put_ptr = cgroup_fd_array_put_ptr,
 };
 
-static struct bpf_map_type_list cgroup_array_type __ro_after_init = {
-   .ops = _array_ops,
-   .type = BPF_MAP_TYPE_CGROUP_ARRAY,
-};
-
 static int __init register_cgroup_array_map(void)
 {
-   bpf_register_map_type(_array_type);
+   bpf_register_map_type(BPF_MAP_TYPE_CGROUP_ARRAY,

Re: [PATCH] bpf: use 'ctx' instead of 'skb' in debug message

2017-04-04 Thread Daniel Borkmann


On 04/04/2017 04:46 PM, Johannes Berg wrote:

From: Johannes Berg 

The error message here should mention 'ctx' since the context
is now more generic than just an skb.

Signed-off-by: Johannes Berg 
---
  kernel/bpf/verifier.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 796b68d00119..1b3c921d3798 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2329,7 +2329,7 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
return err;

if (regs[BPF_REG_6].type != PTR_TO_CTX) {
-   verbose("at the time of BPF_LD_ABS|IND R6 != pointer to skb\n");
+   verbose("at the time of BPF_LD_ABS|IND R6 != pointer to ctx\n");
return -EINVAL;


Seems okay, the reason why we had 'skb' in the verbose message here is
due to BPF_LD + BPF_ABS/BPF_IND operations being only specific to skbs
and no other context (see __bpf_prog_run(), and in verifier may_access_skb()
check before that verbose() message in check_ld_abs()). Reason for this
is mostly historical due to the cBPF to eBPF migration so that these loads
don't get slowed down when migrated to eBPF and can be handled by JIT
optimizations (e.g., caching skb->data), too. Anyway, just to provide
some more background on this. I've no strong opinion if you want to change
the verifier error message, so:

Acked-by: Daniel Borkmann

RE: [Intel-wired-lan] [PATCH] igb: Allow to remove administratively set MAC on VFs

2017-04-04 Thread Duyck, Alexander H

> -Original Message-
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Corinna Vinschen
> Sent: Tuesday, April 4, 2017 8:11 AM
> To: intel-wired-...@lists.osuosl.org
> Cc: netdev@vger.kernel.org
> Subject: [Intel-wired-lan] [PATCH] igb: Allow to remove administratively set MAC on VFs
> 
>   Before libvirt modifies the MAC address and vlan tag for an SRIOV VF
>   for use by a virtual machine (either using vfio device assignment or
>   macvtap passthru mode), it saves the current MAC address and vlan tag
>   so that it can reset them to their original value when the guest is
>   done.  Libvirt can't leave the VF MAC set to the value used by the
>   now-defunct guest since it may be started again later using a
>   different VF, but it certainly shouldn't just pick any random value,
>   either. So it saves the state of everything prior to using the VF, and
>   resets it to that.
> 
>   The igb driver initializes the MAC addresses of all VFs to
>   00:00:00:00:00:00, and reports that when asked (via an RTM_GETLINK
>   netlink message, also visible in the list of VFs in the output of "ip
>   link show"). But when libvirt attempts to restore the MAC address back
>   to 00:00:00:00:00:00 (using an RTM_SETLINK netlink message) the kernel
>   responds with "Invalid argument".
> 
>   Forbidding a reset back to the original value leaves the VF MAC at the
>   value set for the now-defunct virtual machine. Especially on a system
>   with NetworkManager enabled, this has very bad consequences, since
>   NetworkManager forces all interfaces to be IFF_UP all the time - if
>   the same virtual machine is restarted using a different VF (or even on
>   a different host), there will be multiple interfaces watching for
>   traffic with the same MAC address.
> 
>   To allow libvirt to revert to the original state, we need a way to
>   remove the administratively set MAC on a VF, to allow normal host
>   operation again, and to reset/overwrite the VF MAC via VF netdev.
> 
>   This patch implements the outlined scenario by allowing to set the
>   VF MAC to 00:00:00:00:00:00 via RTM_SETLINK on the PF.
>   igb_ndo_set_vf_mac resets the IGB_VF_FLAG_PF_SET_MAC flag to 0,
>   so it's possible to reset the VF MAC back to the original value via
>   the VF netdev.
> 
>   Note: Recent patches to libvirt allow for a workaround if the NIC
>   isn't capable of resetting the administrative MAC back to all 0, but
>   in theory the NIC should allow resetting the MAC in the fisr place.

Minor typo here. I assume you mean "first place".

> Signed-off-by: Corinna Vinschen 
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 25 -
>  1 file changed, 20 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index 26a821f..e7a61b1 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -8125,12 +8125,27 @@ static int igb_set_vf_mac(struct igb_adapter *adapter,
>  static int igb_ndo_set_vf_mac(struct net_device *netdev, int vf, u8 *mac)
>  {
>   struct igb_adapter *adapter = netdev_priv(netdev);
> - if (!is_valid_ether_addr(mac) || (vf >= adapter->vfs_allocated_count))
> +
> + if (vf >= adapter->vfs_allocated_count)
> + return -EINVAL;

I would add a blank line here just for readability.

> + /* Setting the VF MAC to 0 reverts the IGB_VF_FLAG_PF_SET_MAC
> +flag and allows to overwrite the MAC via VF netdev.  This
> +is necessary to allow libvirt a way to restore the original
> +MAC after unbinding vfio-pci and reloading igbvf after shutting
> +down a VM. */

Minor coding style issue here. The "*/" should be on a separate line.

> + if (is_zero_ether_addr(mac)) {
> + adapter->vf_data[vf].flags &= ~IGB_VF_FLAG_PF_SET_MAC;
> + dev_info(&adapter->pdev->dev,
> +  "remove administratively set MAC on VF %d\n",
> +  vf);
> + } else if (is_valid_ether_addr (mac)) {
> + adapter->vf_data[vf].flags |= IGB_VF_FLAG_PF_SET_MAC;
> + dev_info(&adapter->pdev->dev, "setting MAC %pM on VF %d\n",
> +  mac, vf);
> + dev_info(&adapter->pdev->dev,
> +  "Reload the VF driver to make this change effective.");
> + } else
>   return -EINVAL;

Minor coding style issue here. The else should also have "{}" wrapping the 
statement. Generally if any one of the statements in an if/else series needs 
the braces they should all have the braces.

> - adapter->vf_data[vf].flags |= IGB_VF_FLAG_PF_SET_MAC;
> - dev_info(&adapter->pdev->dev, "setting MAC %pM on VF %d\n", mac, vf);
> - dev_info(&adapter->pdev->dev,
> -  "Reload the VF driver to make this change effective.");
>   if (test_bit(__IGB_DOWN, &adapter->state)) {

You might need to change this to allow

[Netdev conf Announce]: Keynote by Shrijeet Mukherjee

2017-04-04 Thread Jamal Hadi Salim



The tech committee is pleased to announce a keynote by
Shrijeet Mukherjee titled
"Journey into Enterprise Networking with Linux"

Shrijeet wants to make Linux networking appealing to enterprise users
and to see it at the core of their critical networks.


Come one, come all

cheers,
jamal

Re: [PATCH] bnx2x: fix spelling mistake in macros HW_INTERRUT_ASSERT_SET_*

2017-04-04 Thread David Miller

From: Colin King 
Date: Mon,  3 Apr 2017 11:19:10 +0100

> From: Colin Ian King 
> 
> Trivial fix, rename HW_INTERRUT_ASSERT_SET_* to HW_INTERRUPT_ASSERT_SET_*
> 
> Signed-off-by: Colin Ian King 

Applied.

Re: [PATCH net] l2tp: take reference on sessions being dumped

2017-04-04 Thread David Miller

From: Guillaume Nault 
Date: Mon, 3 Apr 2017 12:03:13 +0200

> Take a reference on the sessions returned by l2tp_session_find_nth()
> (and rename it l2tp_session_get_nth() to reflect this change), so that
> caller is assured that the session isn't going to disappear while
> processing it.
> 
> For procfs and debugfs handlers, the session is held in the .start()
> callback and dropped in .show(). Given that pppol2tp_seq_session_show()
> dereferences the associated PPPoL2TP socket and that
> l2tp_dfs_seq_session_show() might call pppol2tp_show(), we also need to
> call the session's .ref() callback to prevent the socket from going
> away from under us.
> 
> Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
> Fixes: 0ad6614048cf ("l2tp: Add debugfs files for dumping l2tp debug info")
> Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP")
> Signed-off-by: Guillaume Nault 

Applied and queued up for -stable, thanks.
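A minimal user-space sketch of the pattern the l2tp patch describes: the lookup takes a reference while it still holds the lock protecting the session list, so the session cannot be freed between the lookup and its use, and the caller drops the reference when done (in the seq_file case, .start() takes it and .show() drops it). The `session` struct, `session_get_nth()` and `session_put()` are hypothetical names, and a pthread mutex stands in for the kernel's locking; this is not the l2tp code.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical session object with a simple reference count. */
struct session {
	int id;
	int refcnt;            /* protected by list_lock in this sketch */
	struct session *next;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct session *session_list;

/*
 * Return the nth session with a reference held, or NULL.
 * The reference is taken before the lock is dropped, so the caller
 * can keep using the session even if it is unlinked concurrently.
 */
static struct session *session_get_nth(int nth)
{
	struct session *s;

	pthread_mutex_lock(&list_lock);
	for (s = session_list; s; s = s->next) {
		if (nth-- == 0) {
			s->refcnt++;
			break;
		}
	}
	pthread_mutex_unlock(&list_lock);
	return s;
}

/* Drop a reference; returns 1 if it was the last one. */
static int session_put(struct session *s)
{
	int last;

	pthread_mutex_lock(&list_lock);
	last = (--s->refcnt == 0);
	pthread_mutex_unlock(&list_lock);
	return last;
}
```

Every successful `session_get_nth()` is paired with exactly one `session_put()`; a real implementation would also free the object when the last reference goes away.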

Re: [PATCH] i40e: limit client interface to X722 hardware

2017-04-04 Thread Or Gerlitz

On Tue, Apr 4, 2017 at 5:34 PM, Stefan Assmann  wrote:
> The client interface is meant for X722 iWARP support. Modprobing i40iw
> on systems with X710/XL710 NICs currently may crash the system.

Just curious: may crash, or does crash? And why?

RE: [PATCH] i40e: limit client interface to X722 hardware

2017-04-04 Thread Williams, Mitch A



> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On Behalf Of Stefan Assmann
> Sent: Tuesday, April 04, 2017 7:35 AM
> To: intel-wired-...@lists.osuosl.org
> Cc: netdev@vger.kernel.org; da...@davemloft.net; Kirsher, Jeffrey T
> ; sassm...@kpanic.de
> Subject: [PATCH] i40e: limit client interface to X722 hardware
> 
> The client interface is meant for X722 iWARP support. Modprobing i40iw
> on systems with X710/XL710 NICs currently may crash the system. Adding a
> check which limits client interface access to the appropriate hardware.
> 
> Signed-off-by: Stefan Assmann 
> ---
>  drivers/net/ethernet/intel/i40e/i40e_client.c | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_client.c b/drivers/net/ethernet/intel/i40e/i40e_client.c
> index 191028b..6f873449 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_client.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_client.c
> @@ -525,15 +525,17 @@ static void i40e_client_release(struct i40e_client *client)
>  static void i40e_client_prepare(struct i40e_client *client)
>  {
>   struct i40e_device *ldev;
> - struct i40e_pf *pf;
> 
>   mutex_lock(&i40e_device_mutex);
>   list_for_each_entry(ldev, &i40e_devices, list) {
> - pf = ldev->pf;
> - i40e_client_add_instance(pf);
> - /* Start the client subtask */
> - pf->flags |= I40E_FLAG_SERVICE_CLIENT_REQUESTED;
> - i40e_service_event_schedule(pf);
> + struct i40e_pf *pf = ldev->pf;
> +
> + if (pf->hw.mac.type == I40E_MAC_X722) {
> + i40e_client_add_instance(pf);
> + /* Start the client subtask */
> + pf->flags |= I40E_FLAG_SERVICE_CLIENT_REQUESTED;
> + i40e_service_event_schedule(pf);
> + }
>   }
>   mutex_unlock(&i40e_device_mutex);
>  }
> --
> 2.9.3

Thanks for pointing this out, Stefan. I think that's not exactly the right way 
to do this check. Instead, we need to look at the IWARP flag. I'll get a patch 
out today to take care of it.

So consider this a thankful NAK.

-Mitch
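Mitch's suggestion amounts to gating client setup on a capability flag instead of on the MAC type. The sketch below models that choice only; `struct pf_model`, `FLAG_IWARP_CAPABLE` and the helper names are hypothetical, and the exact flag used by the follow-up i40e patch is not shown in this thread.

```c
#include <stddef.h>

/* Hypothetical model of a PF; names are illustrative, not i40e's. */
enum mac_type { MAC_XL710, MAC_X722 };
#define FLAG_IWARP_CAPABLE 0x1u

struct pf_model {
	enum mac_type mac_type;
	unsigned int flags;
};

/*
 * Gate on the capability flag rather than the MAC type, so a part
 * that gains (or lacks) iWARP support needs no change here.
 */
static int client_supported(const struct pf_model *pf)
{
	return (pf->flags & FLAG_IWARP_CAPABLE) != 0;
}

/* Count how many PFs would get a client instance. */
static size_t count_client_pfs(const struct pf_model *pfs, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (client_supported(&pfs[i]))
			count++;
	return count;
}
```

In this model, a type check would still attach a client to an X722 entry whose capability flag is clear; the flag check skips it.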

[PATCH net-next] sfc: don't insert mc_list on low-latency firmware if it's too long

2017-04-04 Thread Edward Cree

If the mc_list is longer than 256 addresses, we enter mc_promisc mode.
If we're in mc_promisc mode and the firmware doesn't support cascaded
 multicast, normally we also insert our mc_list, to prevent stealing by
 another VI.  However, if the mc_list was too long, this isn't really
 helpful - the MC groups that didn't fit in the list can still get
 stolen, and having only some of them stealable will probably cause
 more confusing behaviour than having them all stealable.  Since
 inserting 256 multicast filters takes a long time and can lead to MCDI
 state machine timeouts, just skip the mc_list insert in this overflow
 condition.

Signed-off-by: Edward Cree 
---
 drivers/net/ethernet/sfc/ef10.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index c60c2d4..78efb28 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -119,6 +119,7 @@ struct efx_ef10_filter_table {
bool mc_promisc;
 /* Whether in multicast promiscuous mode when last changed */
bool mc_promisc_last;
+   bool mc_overflow; /* Too many MC addrs; should always imply mc_promisc */
bool vlan_filter;
struct list_head vlan_list;
 };
@@ -5058,6 +5059,7 @@ static void efx_ef10_filter_mc_addr_list(struct efx_nic *efx)
struct netdev_hw_addr *mc;
unsigned int i, addr_count;
 
+   table->mc_overflow = false;
table->mc_promisc = !!(net_dev->flags & (IFF_PROMISC | IFF_ALLMULTI));
 
addr_count = netdev_mc_count(net_dev);
@@ -5065,6 +5067,7 @@ static void efx_ef10_filter_mc_addr_list(struct efx_nic *efx)
netdev_for_each_mc_addr(mc, net_dev) {
if (i >= EFX_EF10_FILTER_DEV_MC_MAX) {
table->mc_promisc = true;
+   table->mc_overflow = true;
break;
}
ether_addr_copy(table->dev_mc_list[i].addr, mc->addr);
@@ -5469,12 +5472,15 @@ static void efx_ef10_filter_vlan_sync_rx_mode(struct efx_nic *efx,
}
} else {
/* If we failed to insert promiscuous filters, don't
-* rollback.  Regardless, also insert the mc_list
+* rollback.  Regardless, also insert the mc_list,
+* unless it's incomplete due to overflow
 */
efx_ef10_filter_insert_def(efx, vlan,
   EFX_ENCAP_TYPE_NONE,
   true, false);
-   efx_ef10_filter_insert_addr_list(efx, vlan, true, false);
+   if (!table->mc_overflow)
+   efx_ef10_filter_insert_addr_list(efx, vlan,
+true, false);
}
} else {
/* If any filters failed to insert, rollback and fall back to

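A user-space model of the overflow handling described in the sfc patch: more than 256 addresses switches the table to multicast promiscuous mode and sets an overflow flag, and the explicit list is then not inserted next to the promiscuous filter. `MC_MAX`, `struct filter_table_model` and both helpers are illustrative names, not the ef10 code, and the cascaded-multicast firmware check is left out.

```c
#include <stdbool.h>

#define MC_MAX 256 /* the 256-address limit from the description */

struct filter_table_model {
	unsigned int mc_count;   /* addresses actually stored */
	bool mc_promisc;
	bool mc_overflow;        /* too many addresses; implies mc_promisc */
};

/*
 * Store up to MC_MAX addresses; if more were requested, record the
 * overflow and fall back to multicast promiscuous mode.
 */
static void build_mc_list(struct filter_table_model *t,
			  unsigned int requested, bool allmulti)
{
	t->mc_overflow = false;
	t->mc_promisc = allmulti;
	t->mc_count = requested;
	if (requested > MC_MAX) {
		t->mc_count = MC_MAX;
		t->mc_promisc = true;
		t->mc_overflow = true;
	}
}

/*
 * In promiscuous mode without cascaded multicast, insert the explicit
 * list only if it is complete; a partial list protects only some groups.
 */
static bool insert_mc_list_in_promisc(const struct filter_table_model *t)
{
	return t->mc_promisc && !t->mc_overflow;
}
```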
Re: [PATCH 0/2] ARM: am335x-icev2: Add ethernet support

2017-04-04 Thread Tony Lindgren

* Roger Quadros  [170330 05:37]:
> Hi Tony & Dave,
> 
> On 13/03/17 15:42, Roger Quadros wrote:
> > Hi,
> > 
> > This series adds ethernet support to am335x-icev2 board.
> > 
> > The ethernet PHYs on the board need an explicit GPIO reset pulse
> > to ensure they bootstrap to the correct mode. Without the
> > GPIO reset they just don't work.
> > 
> > cheers,
> > -roger
> 
> Any comments on this series. Patch 1 is at version 2.

I think you meant patch 2/3 is at version2. I've picked
patches 2 and 3 for v4.12 into dt and defconfig branches.

You may need to resend the davinci_mdio.c patch alone
for Dave as he usually won't pick individual patches I
think.

Regards,

Tony

> > Roger Quadros (2):
> >   net: davinci_mdio: add GPIO reset logic
> >   ARM: dts: am335x-icev2: Add CPSW ethernet0 and ethernet1
> > 
> >  .../devicetree/bindings/net/davinci-mdio.txt   |   2 +
> >  arch/arm/boot/dts/am335x-icev2.dts | 113 
> > +
> >  drivers/net/ethernet/ti/davinci_mdio.c |  68 +++--
> >  3 files changed, 175 insertions(+), 8 deletions(-)
> > 
> 
> cheers,
> -roger

Re: [PATCH] can: rcar_can: Do not print virtual addresses

2017-04-04 Thread Marc Kleine-Budde

On 04/03/2017 12:11 PM, Geert Uytterhoeven wrote:
> During probe, the rcar_can driver prints:
> 
> rcar_can e6e8.can: device registered (regs @ e08bc000, IRQ76)
> 
> The "regs" value is a virtual address, exposing internal information,
> hence stop printing it.  The (useful) physical address is already
> printed as part of the device name.
> 
> Fixes: fd1159318e55e901 ("can: add Renesas R-Car CAN driver")
> Signed-off-by: Geert Uytterhoeven 

Thanks, applied and included in pull request.

Marc

-- 
Pengutronix e.K.  | Marc Kleine-Budde   |
Industrial Linux Solutions| Phone: +49-231-2826-924 |
Vertretung West/Dortmund  | Fax:   +49-5121-206917- |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |




[PATCH 1/2] can: ifi: use correct register to read rx status

2017-04-04 Thread Marc Kleine-Budde

From: Markus Marb 

The incorrect offset was used when trying to read the RXSTCMD register.

Signed-off-by: Markus Marb 
Cc: linux-stable 
Signed-off-by: Marc Kleine-Budde 
---
 drivers/net/can/ifi_canfd/ifi_canfd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/can/ifi_canfd/ifi_canfd.c b/drivers/net/can/ifi_canfd/ifi_canfd.c
index 138f5ae75c0b..4d1fe8d95042 100644
--- a/drivers/net/can/ifi_canfd/ifi_canfd.c
+++ b/drivers/net/can/ifi_canfd/ifi_canfd.c
@@ -557,7 +557,7 @@ static int ifi_canfd_poll(struct napi_struct *napi, int quota)
int work_done = 0;
 
u32 stcmd = readl(priv->base + IFI_CANFD_STCMD);
-   u32 rxstcmd = readl(priv->base + IFI_CANFD_STCMD);
+   u32 rxstcmd = readl(priv->base + IFI_CANFD_RXSTCMD);
u32 errctr = readl(priv->base + IFI_CANFD_ERROR_CTR);
 
/* Handle bus state changes */
-- 
2.11.0

pull-request: can 2017-04-04

2017-04-04 Thread Marc Kleine-Budde

Hello David,

this is a pull request of two patches for net/master.

The first patch by Markus Marb fixes a register read access in the ifi driver.
The second patch by Geert Uytterhoeven for the rcar driver removes the printing
of a kernel virtual address.

regards,
Marc

---

The following changes since commit 0b9aefea860063bb39e36bd7fe6c7087fed0ba87:

  tcp: minimize false-positives on TCP/GRO check (2017-04-03 18:43:41 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can.git 

for you to fetch changes up to ca257b9e2d807ab6cb2678ecc7b74aaf4651f597:

  can: rcar_can: Do not print virtual addresses (2017-04-04 17:49:59 +0200)


Geert Uytterhoeven (1):
  can: rcar_can: Do not print virtual addresses

Markus Marb (1):
  can: ifi: use correct register to read rx status

 drivers/net/can/ifi_canfd/ifi_canfd.c | 2 +-
 drivers/net/can/rcar/rcar_can.c   | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

[PATCH 2/2] can: rcar_can: Do not print virtual addresses

2017-04-04 Thread Marc Kleine-Budde

From: Geert Uytterhoeven 

During probe, the rcar_can driver prints:

rcar_can e6e8.can: device registered (regs @ e08bc000, IRQ76)

The "regs" value is a virtual address, exposing internal information,
hence stop printing it.  The (useful) physical address is already
printed as part of the device name.

Fixes: fd1159318e55e901 ("can: add Renesas R-Car CAN driver")
Signed-off-by: Geert Uytterhoeven 
Acked-by: Sergei Shtylyov 
Signed-off-by: Marc Kleine-Budde 
---
 drivers/net/can/rcar/rcar_can.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/can/rcar/rcar_can.c b/drivers/net/can/rcar/rcar_can.c
index caed4e6960f8..11662f479e76 100644
--- a/drivers/net/can/rcar/rcar_can.c
+++ b/drivers/net/can/rcar/rcar_can.c
@@ -826,8 +826,7 @@ static int rcar_can_probe(struct platform_device *pdev)
 
devm_can_led_init(ndev);
 
-   dev_info(&pdev->dev, "device registered (regs @ %p, IRQ%d)\n",
-priv->regs, ndev->irq);
+   dev_info(&pdev->dev, "device registered (IRQ%d)\n", ndev->irq);
 
return 0;
 fail_candev:
-- 
2.11.0

pull-request: can-next 2017-03-03

2017-04-04 Thread Marc Kleine-Budde

Hello David,

this is a pull request of 5 patches for net-next/master.

There are two patches by Yegor Yefremov which convert the ti_hecc
driver into a DT only driver, as there is no in-tree user of the old
platform driver interface anymore. The next patch by Mario Kicherer
adds network namespace support to the can subsystem. The last two
patches by Akshay Bhat add support for the holt_hi311x SPI CAN driver.

regards,
Marc

---

The following changes since commit 822f9bb104c9d1d2dea3669f1941558c6304cf92:

  soreuseport: use "unsigned int" in __reuseport_alloc() (2017-04-03 19:06:38 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git tags/linux-can-next-for-4.13-20170404

for you to fetch changes up to 57e83fb9b7468c75cb65cde1d23043553c346c6d:

  can: hi311x: Add Holt HI-311x CAN driver (2017-04-04 17:35:59 +0200)


linux-can-next-for-4.13-20170404


Akshay Bhat (2):
  can: holt_hi311x: document device tree bindings
  can: hi311x: Add Holt HI-311x CAN driver

Mario Kicherer (1):
  can: initial support for network namespaces

Yegor Yefremov (2):
  can: ti_hecc: Add TI HECC DT binding documentation
  can: ti_hecc: Convert TI HECC driver to DT only driver

 .../devicetree/bindings/net/can/holt_hi311x.txt|   24 +
 .../devicetree/bindings/net/can/ti_hecc.txt|   32 +
 drivers/net/can/spi/Kconfig|6 +
 drivers/net/can/spi/Makefile   |1 +
 drivers/net/can/spi/hi311x.c   | 1076 
 drivers/net/can/ti_hecc.c  |  170 ++--
 include/linux/can/core.h   |7 +-
 include/linux/can/platform/ti_hecc.h   |   44 -
 include/net/net_namespace.h|4 +
 include/net/netns/can.h|   31 +
 net/can/af_can.c   |  122 ++-
 net/can/af_can.h   |4 +-
 net/can/bcm.c  |   13 +-
 net/can/gw.c   |4 +-
 net/can/proc.c |  144 ++-
 net/can/raw.c  |   92 +-
 16 files changed, 1469 insertions(+), 305 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/can/holt_hi311x.txt
 create mode 100644 Documentation/devicetree/bindings/net/can/ti_hecc.txt
 create mode 100644 drivers/net/can/spi/hi311x.c
 delete mode 100644 include/linux/can/platform/ti_hecc.h
 create mode 100644 include/net/netns/can.h


Re: [PATCH v4 2/2] can: spi: hi311x: Add Holt HI-311x CAN driver

2017-04-04 Thread Marc Kleine-Budde

On 03/24/2017 06:20 PM, Akshay Bhat wrote:
> Hi Marc,
> 
> On 03/17/2017 05:10 PM, Akshay Bhat wrote:
>> This patch adds support for the Holt HI-311x CAN controller. The HI311x
>> CAN controller is capable of transmitting and receiving standard data
>> frames, extended data frames and remote frames. The HI311x interfaces
>> with the host over SPI.
>>
>> Datasheet: www.holtic.com/documents/371-hi-3110_v-rev-jpdf.do
>>
>> Signed-off-by: Akshay Bhat 
>> ---
>>
> 
> If there are no further review comments can this series be applied to
> can-next or does it need to wait for the next kernel release cycle (4.13)?

The driver doesn't check whether the workqueue allocation is successful;
I've squashed in this patch:

> diff --git a/drivers/net/can/spi/hi311x.c b/drivers/net/can/spi/hi311x.c
> index ff4bb40d855e..170e8e3971b2 100644
> --- a/drivers/net/can/spi/hi311x.c
> +++ b/drivers/net/can/spi/hi311x.c
> @@ -780,20 +780,24 @@ static int hi3110_open(struct net_device *net)
>  
> priv->wq = alloc_workqueue("hi3110_wq", WQ_FREEZABLE | WQ_MEM_RECLAIM,
>0);
> +   if (!priv->wq) {
> +   ret = -ENOMEM;
> +   goto out_free_irq;
> +   }
> INIT_WORK(&priv->tx_work, hi3110_tx_work_handler);
> INIT_WORK(&priv->restart_work, hi3110_restart_work_handler);
>  
> ret = hi3110_hw_reset(spi);
> if (ret)
> -   goto out_free_irq;
> +   goto out_free_wq;
>  
> ret = hi3110_setup(net);
> if (ret)
> -   goto out_free_irq;
> +   goto out_free_wq;
>  
> ret = hi3110_set_normal_mode(spi);
> if (ret)
> -   goto out_free_irq;
> +   goto out_free_wq;
>  
> can_led_event(net, CAN_LED_EVENT_OPEN);
> netif_wake_queue(net);
> @@ -801,11 +805,12 @@ static int hi3110_open(struct net_device *net)
>  
> return 0;
>  
> -out_free_irq:
> + out_free_wq:
> +   destroy_workqueue(priv->wq);
> + out_free_irq:
> free_irq(spi->irq, priv);
> hi3110_hw_sleep(spi);
> -
> -out_close:
> + out_close:
> hi3110_power_enable(priv->transceiver, 0);
> close_candev(net);
> mutex_unlock(&priv->hi3110_lock);

Marc
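The squashed fix follows the usual kernel error-unwinding convention: each failure jumps to the label that releases everything acquired so far, in reverse order, so the workqueue allocated after the IRQ gets its own `out_free_wq` label placed before `out_free_irq`. A minimal user-space sketch of that ladder; `struct dev_model`, `dev_open()`, `dev_close()` and the `fail_step` parameter are hypothetical, with `malloc()` standing in for `alloc_workqueue()`.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical device state; names are illustrative, not hi311x's. */
struct dev_model {
	int irq_held;
	void *wq;
};

/*
 * fail_step: 0 = nothing fails, 2 = the "workqueue" allocation fails,
 * 3 = a later hardware setup step fails.
 */
static int dev_open(struct dev_model *d, int fail_step)
{
	int ret;

	d->irq_held = 1;                        /* step 1: request the IRQ */

	d->wq = (fail_step == 2) ? NULL : malloc(16);   /* step 2 */
	if (!d->wq) {
		ret = -ENOMEM;
		goto out_free_irq;              /* only step 1 to undo */
	}

	if (fail_step == 3) {                   /* step 3: hardware setup */
		ret = -EIO;
		goto out_free_wq;               /* undo step 2, then step 1 */
	}
	return 0;

out_free_wq:
	free(d->wq);
	d->wq = NULL;
out_free_irq:
	d->irq_held = 0;
	return ret;
}

static void dev_close(struct dev_model *d)
{
	free(d->wq);
	d->wq = NULL;
	d->irq_held = 0;
}
```

Because the labels fall through into each other, adding a resource only means adding one label above the previous one and retargeting the later gotos, which is exactly the shape of the hunk above.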


Re: [Intel-wired-lan] [next-queue v6 PATCH 2/7] i40e: Introduce Port Representor netdevs and switchdev mode.

2017-04-04 Thread Alexander Duyck

On Tue, Apr 4, 2017 at 4:58 AM, Or Gerlitz  wrote:
> On Mon, Apr 3, 2017 at 9:41 PM, Samudrala, Sridhar
>  wrote:
>> On 3/30/2017 12:17 AM, Or Gerlitz wrote:
>>> On Thu, Mar 30, 2017, Sridhar Samudrala wrote:
>
>>>> Port Representor netdevs are created for each PF and VF if the switch
>>>> mode is set to 'switchdev'. These netdevs can be used to control and
>>>> configure VFs and PFs when they are moved to a different namespace.
>>>> They enable exposing statistics, configuring and monitoring link state,
>>>> MTU, filters, FDB/VLAN entries, etc.
>
>>>> In switchdev mode, broadcasts from VFs are received by the PF and passed
>>>> to the corresponding port representor netdev.
>
>>> What netdev represents the uplink (wire port) in your impl?
>
> combining your replies from the two emails:
>
>> We don't have a port netdev representing the uplink in this implementation
>> as we cannot control the frames going out the uplink via sw rules with the
>> current generation of hw/fw.
>
>> fwd to CPU as default rule is not possible with the current generation of
>> hw/fw. So we would like to enable switchdev to expose the port representors
>> and start adding offloads in an incremental way.
>
> I lost you even deeper
>
> I was asking on frames getting in from the uplink and not getting out
> the uplink.

Frames coming from the uplink will by default be routed to the PF. So
are you saying you want a representor for the uplink to handle the
packets that don't have any rules set up for them, correct?

I think we could set something like this up as we do have the concept
of a "default" entity that everything falls back into. It is just a
bit muddled since that currently exists as a part of the PF.

> This is about offloading to HW a switching model where the steering
> (matching and actions)
> comes into play on the port ingress. E.g
>
> VF NIC xmit ---> VF vport e-switch rep recv --> SW or HW steering

So this bit we can't really support very well with the i40e hardware.
The problem is that unless there is a rule that exists to route it to
another PF/VF there is a default rule in the hardware that would send
it out the uplink port. The only data we can really catch on the port
representors is broadcast/multicast because it does replication.

> other node xmit --> UPLINK vport e-switch rep recv --> SW or HW steering

This part I think we can do. The default behavior would be to send a
packet to the default entity which in this case is the PF.

> If your current HW can't let you have "send to CPU" as the default
> action on ingress for the VFs and uplink ports, I am not clear what
> use-cases you can do in slow path (only reps, no offloaded SW rules)
> and for fast path (reps + offloaded SW rules)...
>
> Can you please elaborate on such use-cases, so the bigger picture is
> more clear?

So the main goal with all of this is to support TC offloads so that we
can program filters to route packets from the default entity to the
VF. I agree that we are missing the uplink port. We probably
just need to add it as the "default" handler for packets that
originate with a source MAC address that is not the PF or one of the
VFs.

We can discuss this further at netdev/netconf.

- Alex
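To make the forwarding defaults discussed in this thread concrete, here is a hypothetical model: an exact destination-MAC rule wins, and otherwise the default depends on where the frame entered, with uplink traffic falling back to the PF ("default entity") and unmatched VF transmit traffic leaving through the uplink. All names and types are assumptions for illustration; this does not model the i40e firmware interface. It does show why, in this model, a VF representor never sees unmatched VF unicast traffic.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical destinations and ingress points. */
enum dest { DEST_PF, DEST_UPLINK, DEST_VF0, DEST_VF1 };
enum ingress { FROM_UPLINK, FROM_VF };

struct rule {
	uint8_t dst_mac[6];
	enum dest dest;
};

/*
 * Exact match on destination MAC; otherwise apply the per-ingress
 * default described in the thread.
 */
static enum dest steer(const struct rule *rules, size_t n,
		       const uint8_t dst_mac[6], enum ingress from)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (memcmp(rules[i].dst_mac, dst_mac, 6) == 0)
			return rules[i].dest;
	return from == FROM_UPLINK ? DEST_PF : DEST_UPLINK;
}
```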

Re: [oss-security] Linux kernel ping socket / AF_LLC connect() sin_family race

2017-04-04 Thread Marcus Meissner

Hi,

did anyone request a CVE yet?

Ciao, Marcus
On Sat, Mar 25, 2017 at 01:10:57AM +0100, Solar Designer wrote:
> On Fri, Mar 24, 2017 at 03:21:06PM -0700, Eric Dumazet wrote:
> > Looks easy enough to fix ?
> 
> Oh.  Probably.  Thanks.  Need to test, but I guess you already did?
> 
> > diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
> > index 2af6244b83e27ae384e96cf071c10c5a89674804..ccfbce13a6333a65dab64e4847dd510dfafb1b43 100644
> > --- a/net/ipv4/ping.c
> > +++ b/net/ipv4/ping.c
> > @@ -156,17 +156,18 @@ int ping_hash(struct sock *sk)
> >  void ping_unhash(struct sock *sk)
> >  {
> > struct inet_sock *isk = inet_sk(sk);
> > +
> > pr_debug("ping_unhash(isk=%p,isk->num=%u)\n", isk, isk->inet_num);
> > +   write_lock_bh(&ping_table.lock);
> > if (sk_hashed(sk)) {
> > -   write_lock_bh(&ping_table.lock);
> > hlist_nulls_del(&sk->sk_nulls_node);
> > sk_nulls_node_init(&sk->sk_nulls_node);
> > sock_put(sk);
> > isk->inet_num = 0;
> > isk->inet_sport = 0;
> > sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
> > -   write_unlock_bh(&ping_table.lock);
> > }
> > +   write_unlock_bh(&ping_table.lock);
> >  }
> >  EXPORT_SYMBOL_GPL(ping_unhash);
> 
> FWIW, in Pavel's original implementation for 2.4.32 (unused), this was:
> 
> static void ping_v4_unhash(struct sock *sk)
> {
>   DEBUG(("ping_v4_unhash(sk=%p,sk->num=%u)\n", sk, sk->num));
>   write_lock_bh(_hash_lock);
>   if (sk->pprev) {
>   if (sk->next)
>  sk->next->pprev = sk->pprev;
>   *sk->pprev = sk->next;
>   sk->pprev = NULL;
>   sk->num = 0;
>   sock_prot_dec_use(sk->prot);
>   __sock_put(sk);
>   }
>   write_unlock_bh(_hash_lock);
> }
> 
> Looks like the erroneous optimization (not expecting concurrent activity
> on the same socket?) was introduced during conversion to 2.6's hlists.
> 
> So far this cursed function had 3 bugs, two of them security (including
> this one) and one probably benign (or if not, then effectively a subset
> of this bug as it performed some unneeded / stale debugging work before
> acquiring the lock), with all 3 introduced in forward-porting.  Maybe
> the nature of forward-porting activity makes people relatively
> inattentive ("compiles with the new interfaces and still works? must be
> correct"), compared to when writing new code.
> 
> Anyhow, I share some responsibility for this mess, for having advocated
> this patch being forward-ported and merged back then.  I still like
> having this functionality and its userspace security benefits... but I
> don't like the kernel bugs.
> 
> Alexander
> 

-- 
Marcus Meissner,SUSE LINUX GmbH; Maxfeldstrasse 5; D-90409 Nuernberg; Zi. 
3.1-33,+49-911-740 53-432,,serv=loki,mail=wotan,type=real
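Eric's fix moves the `sk_hashed()` test under `ping_table.lock`. When the test runs outside the lock, two concurrent callers can both see the socket as hashed and both drop its reference. A minimal user-space sketch of the corrected check-under-lock pattern; `struct sock_model`, `table_lock`, `unhash()` and `unhash_thread()` are hypothetical, with `hashed`, `refcnt` and `num` standing in for `sk_hashed()`, the reference released by `sock_put()`, and `inet_num`.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical socket model; fields stand in for the real sock state. */
struct sock_model {
	bool hashed;
	int refcnt;
	unsigned int num;
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Test and clear "hashed" under the same lock, so only one of several
 * concurrent callers can observe hashed == true and drop the reference.
 */
static void unhash(struct sock_model *sk)
{
	pthread_mutex_lock(&table_lock);
	if (sk->hashed) {
		sk->hashed = false;
		sk->refcnt--;
		sk->num = 0;
	}
	pthread_mutex_unlock(&table_lock);
}

/* Thread entry point used to call unhash() concurrently. */
static void *unhash_thread(void *arg)
{
	unhash(arg);
	return NULL;
}
```

With the check inside the lock, racing callers leave the reference count decremented exactly once.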

[PATCH] igb: Allow to remove administratively set MAC on VFs

2017-04-04 Thread Corinna Vinschen

  Before libvirt modifies the MAC address and vlan tag for an SRIOV VF
  for use by a virtual machine (either using vfio device assignment or
  macvtap passthru mode), it saves the current MAC address and vlan tag
  so that it can reset them to their original value when the guest is
  done.  Libvirt can't leave the VF MAC set to the value used by the
  now-defunct guest since it may be started again later using a
  different VF, but it certainly shouldn't just pick any random value,
  either. So it saves the state of everything prior to using the VF, and
  resets it to that.

  The igb driver initializes the MAC addresses of all VFs to
  00:00:00:00:00:00, and reports that when asked (via an RTM_GETLINK
  netlink message, also visible in the list of VFs in the output of "ip
  link show"). But when libvirt attempts to restore the MAC address back
  to 00:00:00:00:00:00 (using an RTM_SETLINK netlink message) the kernel
  responds with "Invalid argument".

  Forbidding a reset back to the original value leaves the VF MAC at the
  value set for the now-defunct virtual machine. Especially on a system
  with NetworkManager enabled, this has very bad consequences, since
  NetworkManager forces all interfaces to be IFF_UP all the time - if
  the same virtual machine is restarted using a different VF (or even on
  a different host), there will be multiple interfaces watching for
  traffic with the same MAC address.

  To allow libvirt to revert to the original state, we need a way to
  remove the administrative set MAC on a VF, to allow normal host
  operation again, and to reset/overwrite the VF MAC via VF netdev.

  This patch implements the outlined scenario by allowing to set the
  VF MAC to 00:00:00:00:00:00 via RTM_SETLINK on the PF.
  igb_ndo_set_vf_mac resets the IGB_VF_FLAG_PF_SET_MAC flag to 0,
  so it's possible to reset the VF MAC back to the original value via
  the VF netdev.

  Note: Recent patches to libvirt allow for a workaround if the NIC
  isn't capable of resetting the administrative MAC back to all 0, but
  in theory the NIC should allow resetting the MAC in the fisr place.

Signed-off-by: Corinna Vinschen 
---
 drivers/net/ethernet/intel/igb/igb_main.c | 25 -
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 26a821f..e7a61b1 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -8125,12 +8125,27 @@ static int igb_set_vf_mac(struct igb_adapter *adapter,
 static int igb_ndo_set_vf_mac(struct net_device *netdev, int vf, u8 *mac)
 {
struct igb_adapter *adapter = netdev_priv(netdev);
-   if (!is_valid_ether_addr(mac) || (vf >= adapter->vfs_allocated_count))
+
+   if (vf >= adapter->vfs_allocated_count)
+   return -EINVAL;
+   /* Setting the VF MAC to 0 reverts the IGB_VF_FLAG_PF_SET_MAC
+  flag and allows to overwrite the MAC via VF netdev.  This
+  is necessary to allow libvirt a way to restore the original
+  MAC after unbinding vfio-pci and reloading igbvf after shutting
+  down a VM. */
+   if (is_zero_ether_addr(mac)) {
+   adapter->vf_data[vf].flags &= ~IGB_VF_FLAG_PF_SET_MAC;
+   dev_info(&adapter->pdev->dev,
+"remove administratively set MAC on VF %d\n",
+vf);
+   } else if (is_valid_ether_addr (mac)) {
+   adapter->vf_data[vf].flags |= IGB_VF_FLAG_PF_SET_MAC;
+   dev_info(&adapter->pdev->dev, "setting MAC %pM on VF %d\n",
+mac, vf);
+   dev_info(&adapter->pdev->dev,
+"Reload the VF driver to make this change effective.");
+   } else
return -EINVAL;
-   adapter->vf_data[vf].flags |= IGB_VF_FLAG_PF_SET_MAC;
-   dev_info(&adapter->pdev->dev, "setting MAC %pM on VF %d\n", mac, vf);
-   dev_info(&adapter->pdev->dev,
-"Reload the VF driver to make this change effective.");
if (test_bit(__IGB_DOWN, &adapter->state)) {
dev_warn(&adapter->pdev->dev,
 "The VF MAC address has been set, but the PF device is not up.\n");
-- 
2.9.3
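A user-space sketch of the decision the patched `igb_ndo_set_vf_mac()` makes: an all-zero address clears the PF-set flag, a valid unicast address sets it, and anything else (for example a multicast address) is rejected. `mac_is_zero()` and `mac_is_valid()` re-implement the semantics of the kernel's `is_zero_ether_addr()` and `is_valid_ether_addr()`; `set_vf_mac()` and `VF_FLAG_PF_SET_MAC` are hypothetical stand-ins, and the negative-index check is an addition of this sketch. It also applies the style points raised in review: braces on every branch of the if/else chain and the closing `*/` on its own line.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define VF_FLAG_PF_SET_MAC 0x1u /* stand-in for IGB_VF_FLAG_PF_SET_MAC */

static bool mac_is_zero(const uint8_t *mac)
{
	return !(mac[0] | mac[1] | mac[2] | mac[3] | mac[4] | mac[5]);
}

/* Valid means: not multicast (bit 0 of the first octet) and not zero. */
static bool mac_is_valid(const uint8_t *mac)
{
	return !(mac[0] & 0x01) && !mac_is_zero(mac);
}

/*
 * All-zero clears the PF-set flag so the VF may choose its own MAC again;
 * a valid unicast MAC sets it; anything else is rejected.
 */
static int set_vf_mac(unsigned int *vf_flags, int vf, int num_vfs,
		      const uint8_t *mac)
{
	if (vf < 0 || vf >= num_vfs)
		return -EINVAL;

	if (mac_is_zero(mac)) {
		*vf_flags &= ~VF_FLAG_PF_SET_MAC;
	} else if (mac_is_valid(mac)) {
		*vf_flags |= VF_FLAG_PF_SET_MAC;
	} else {
		return -EINVAL;
	}
	return 0;
}
```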

[iproute PATCH] man: ip-link: Specify min/max values for bridge slave priority and cost

2017-04-04 Thread Phil Sutter

The values are parsed as u16/u32, but the kernel limits the allowed values.

Signed-off-by: Phil Sutter 
---
 man/man8/ip-link.8.in | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 3f5d57c28885f..12ec330a9c38e 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -1485,10 +1485,10 @@ is a number representing the following states:
 .BR 4 " (blocking)."
 
 .BI priority " PRIO"
-- set port priority (a 16bit unsigned value).
+- set port priority (allowed values are between 0 and 63, inclusively).
 
 .BI cost " COST"
-- set port cost (a 32bit unsigned value).
+- set port cost (allowed values are between 1 and 65535, inclusively).
 
 .BR guard " { " on " | " off " }"
 - block incoming BPDU packets on this port.
-- 
2.11.0
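The ranges in the updated man page can be checked before a request is sent. A minimal sketch; the macro and function names are hypothetical, and returning `-ERANGE` is a choice made for this example rather than a statement about what the kernel or iproute2 returns.

```c
#include <errno.h>
#include <stdint.h>

/* Limits as documented in the patch: priority 0..63, cost 1..65535. */
#define PORT_PRIO_MAX  63u
#define PORT_COST_MIN  1u
#define PORT_COST_MAX  65535u

static int check_port_priority(uint16_t prio)
{
	return prio <= PORT_PRIO_MAX ? 0 : -ERANGE;
}

static int check_port_cost(uint32_t cost)
{
	return (cost >= PORT_COST_MIN && cost <= PORT_COST_MAX) ? 0 : -ERANGE;
}
```

Because `priority` is parsed as a u16 and `cost` as a u32, values such as 64 or 65536 parse without error and only fail these range checks.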

Re: [PATCH v2] selftests: add a generic testsuite for ethernet device

2017-04-04 Thread Andrew Lunn

On Tue, Apr 04, 2017 at 04:43:19PM +0200, Niklas Cassel wrote:
> On 04/04/2017 03:32 PM, Corentin Labbe wrote:
> > This patch adds a generic testsuite for testing ethernet network device
> > drivers.
> > 
> > Signed-off-by: Corentin Labbe 
> > ---
> > 
> > Changes since v1:
> > - Test for starting master interface
> > - Changed printing format to "RESULT: $netdev: line"
> > - Use "ip link" to get device list
> > 
> >  tools/testing/selftests/net/Makefile |   2 +-
> >  tools/testing/selftests/net/netdevice.sh | 200 
> > +++
> >  2 files changed, 201 insertions(+), 1 deletion(-)
> >  create mode 100755 tools/testing/selftests/net/netdevice.sh
> > 
> (snip)
> 
> Good work!
> 
> I suggest adding a test for setting MTU as well.

That one might be tricky. Some devices don't allow the MTU to be
changed when the device is up. Others might require the interface is
up...

Andrew

Re: [PATCH v2] selftests: add a generic testsuite for ethernet device

2017-04-04 Thread Corentin Labbe

On Tue, Apr 04, 2017 at 04:43:19PM +0200, Niklas Cassel wrote:
> On 04/04/2017 03:32 PM, Corentin Labbe wrote:
> > This patch adds a generic testsuite for testing ethernet network device
> > drivers.
> > 
> > Signed-off-by: Corentin Labbe 
> > ---
> > 
> > Changes since v1:
> > - Test for starting master interface
> > - Changed printing format to "RESULT: $netdev: line"
> > - Use "ip link" to get device list
> > 
> >  tools/testing/selftests/net/Makefile |   2 +-
> >  tools/testing/selftests/net/netdevice.sh | 200 
> > +++
> >  2 files changed, 201 insertions(+), 1 deletion(-)
> >  create mode 100755 tools/testing/selftests/net/netdevice.sh
> > 
> (snip)
> 
> Good work!
> 
> I suggest adding a test for setting MTU as well.
> It doesn't have to be added before merging, but it
> would be great if it could be added in the near future.
> 
> 
> Regards,
> Niklas

I already have it, but I prefer to add a few tests first and add the rest one by one.
But yes, I should add it as a TODO, like setting the IP address.

Regards

net/ipv4: use-after-free in ipv4_mtu

2017-04-04 Thread Andrey Konovalov

Hi,

I've got the following error report while fuzzing the kernel with syzkaller.

On commit a71c9a1c779f2499fb2afc0553e543f18aff6edf (4.11-rc5).

Unfortunately it's not reproducible.

==
BUG: KASAN: use-after-free in dst_metric_raw include/net/dst.h:176 [inline] at addr 88003d6a965c
BUG: KASAN: use-after-free in ipv4_mtu+0x3f2/0x4b0 net/ipv4/route.c:1270 at addr 88003d6a965c
Read of size 4 by task syz-executor3/20611
CPU: 3 PID: 20611 Comm: syz-executor3 Not tainted 4.11.0-rc5+ #199
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x292/0x398 lib/dump_stack.c:52
 kasan_object_err+0x1c/0x70 mm/kasan/report.c:164
 print_address_description mm/kasan/report.c:202 [inline]
 kasan_report_error mm/kasan/report.c:291 [inline]
 kasan_report+0x252/0x510 mm/kasan/report.c:347
 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:367
 dst_metric_raw include/net/dst.h:176 [inline]
 ipv4_mtu+0x3f2/0x4b0 net/ipv4/route.c:1270
 dst_mtu include/net/dst.h:221 [inline]
 do_ip_getsockopt+0x71d/0x2290 net/ipv4/ip_sockglue.c:1433
 ip_getsockopt+0x90/0x230 net/ipv4/ip_sockglue.c:1578
 tcp_getsockopt+0x82/0xd0 net/ipv4/tcp.c:3131
 sock_common_getsockopt+0x95/0xd0 net/core/sock.c:2709
 SYSC_getsockopt net/socket.c:1829 [inline]
 SyS_getsockopt+0x252/0x390 net/socket.c:1811
 entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x4458d9
RSP: 002b:7fe87f452b58 EFLAGS: 0286 ORIG_RAX: 0037
RAX: ffda RBX: 0005 RCX: 004458d9
RDX: 000e RSI:  RDI: 0005
RBP: 006e0020 R08: 20db6000 R09: 
R10: 207e8000 R11: 0286 R12: 00708150
R13: 20db8000 R14: 1000 R15: 0003
Object at 88003d6a9658, in cache kmalloc-64 size: 64
Allocated:
PID = 20110
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:513
 set_track mm/kasan/kasan.c:525 [inline]
 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:616
 kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2745
 kmalloc include/linux/slab.h:490 [inline]
 kzalloc include/linux/slab.h:663 [inline]
 fib_create_info+0x8e0/0x3a30 net/ipv4/fib_semantics.c:1040
 fib_table_insert+0x1a5/0x1550 net/ipv4/fib_trie.c:1221
 ip_rt_ioctl+0xddc/0x1590 net/ipv4/fib_frontend.c:597
 inet_ioctl+0xf2/0x1c0 net/ipv4/af_inet.c:882
sctp: [Deprecated]: syz-executor0 (pid 20638) Use of int in max_burst
socket option.
Use struct sctp_assoc_value instead
 sock_do_ioctl+0x65/0xb0 net/socket.c:906
 sock_ioctl+0x28f/0x440 net/socket.c:1004
 vfs_ioctl fs/ioctl.c:45 [inline]
 do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:685
 SYSC_ioctl fs/ioctl.c:700 [inline]
 SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
 entry_SYSCALL_64_fastpath+0x1f/0xc2
Freed:
PID = 4439
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:513
 set_track mm/kasan/kasan.c:525 [inline]
 kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:589
 slab_free_hook mm/slub.c:1357 [inline]
 slab_free_freelist_hook mm/slub.c:1379 [inline]
 slab_free mm/slub.c:2961 [inline]
 kfree+0xe8/0x2b0 mm/slub.c:3882
 free_fib_info_rcu+0x4ba/0x5e0 net/ipv4/fib_semantics.c:218
 __rcu_reclaim kernel/rcu/rcu.h:118 [inline]
 rcu_do_batch.isra.64+0x947/0xcc0 kernel/rcu/tree.c:2879
 invoke_rcu_callbacks kernel/rcu/tree.c:3142 [inline]
 __rcu_process_callbacks kernel/rcu/tree.c:3109 [inline]
 rcu_process_callbacks+0x2cc/0xb90 kernel/rcu/tree.c:3126
 __do_softirq+0x2fb/0xb7d kernel/softirq.c:284
Memory state around the buggy address:
 88003d6a9500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 88003d6a9580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>88003d6a9600: fc fc fc fc fc fc fc fc fc fc fc fb fb fb fb fb
^
 88003d6a9680: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
 88003d6a9700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==

Re: linux-next: manual merge of the net-next tree with the net tree

2017-04-04 Thread Simon Horman

On Tue, Apr 04, 2017 at 11:13:57AM +1000, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the net-next tree got a conflict in:
> 
>   net/core/flow_dissector.c
> 
> between commit:
> 
>   ac6a3722fed6 ("flow dissector: correct size of storage for ARP")
> 
> from the net tree and commit:
> 
>   9bf881ffc5c0 ("flow_dissector: Move ARP dissection into a separate function")
> 
> from the net-next tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.

Thanks Stephen, the fix looks correct to me.

[PATCH] bpf: use 'ctx' instead of 'skb' in debug message

2017-04-04 Thread Johannes Berg

From: Johannes Berg 

The error message here should mention 'ctx' since the context
is now more generic than just an skb.

Signed-off-by: Johannes Berg 
---
 kernel/bpf/verifier.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 796b68d00119..1b3c921d3798 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2329,7 +2329,7 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
return err;
 
if (regs[BPF_REG_6].type != PTR_TO_CTX) {
-   verbose("at the time of BPF_LD_ABS|IND R6 != pointer to skb\n");
+   verbose("at the time of BPF_LD_ABS|IND R6 != pointer to ctx\n");
return -EINVAL;
}
 
-- 
2.11.0

Re: [PATCH v2] selftests: add a generic testsuite for ethernet device

2017-04-04 Thread Niklas Cassel

On 04/04/2017 03:32 PM, Corentin Labbe wrote:
> This patch adds a generic testsuite for testing Ethernet network device drivers.
> 
> Signed-off-by: Corentin Labbe 
> ---
> 
> Changes since v1:
> - Test for starting master interface
> - Changed printing format to "RESULT: $netdev: line"
> - Use "ip link" to get device list
> 
>  tools/testing/selftests/net/Makefile |   2 +-
>  tools/testing/selftests/net/netdevice.sh | 200 +++
>  2 files changed, 201 insertions(+), 1 deletion(-)
>  create mode 100755 tools/testing/selftests/net/netdevice.sh
> 
(snip)

Good work!

I suggest adding a test for setting MTU as well.
It doesn't have to be added before merging, but it
would be great if it could be added in the near future.


Regards,
Niklas

[PATCH] i40e: limit client interface to X722 hardware

2017-04-04 Thread Stefan Assmann

The client interface is meant for X722 iWARP support. Modprobing i40iw
on systems with X710/XL710 NICs may currently crash the system. Add a
check that limits client interface access to the appropriate hardware.
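The effect of the check can be sketched in plain C. This is a simplified model, not driver code: `struct pf`, `enum mac_type` and `prepare_clients()` are hypothetical stand-ins for `struct i40e_pf`, the `pf->hw.mac.type` field compared against `I40E_MAC_X722`, and the loop in `i40e_client_prepare()`.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the i40e MAC type and PF structure. */
enum mac_type { MAC_XL710, MAC_X722 };

struct pf {
	enum mac_type mac;
	int client_requested; /* set when a client instance is added */
};

/*
 * Model of the patched loop: only PFs on X722 hardware get a client
 * instance; X710/XL710 PFs are skipped. Returns how many were prepared.
 */
static size_t prepare_clients(struct pf *pfs, size_t n)
{
	size_t prepared = 0;

	for (size_t i = 0; i < n; i++) {
		if (pfs[i].mac != MAC_X722)
			continue;
		pfs[i].client_requested = 1;
		prepared++;
	}
	return prepared;
}
```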

Signed-off-by: Stefan Assmann 
---
 drivers/net/ethernet/intel/i40e/i40e_client.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_client.c b/drivers/net/ethernet/intel/i40e/i40e_client.c
index 191028b..6f873449 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_client.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_client.c
@@ -525,15 +525,17 @@ static void i40e_client_release(struct i40e_client *client)
 static void i40e_client_prepare(struct i40e_client *client)
 {
struct i40e_device *ldev;
-   struct i40e_pf *pf;
 
mutex_lock(_device_mutex);
list_for_each_entry(ldev, _devices, list) {
-   pf = ldev->pf;
-   i40e_client_add_instance(pf);
-   /* Start the client subtask */
-   pf->flags |= I40E_FLAG_SERVICE_CLIENT_REQUESTED;
-   i40e_service_event_schedule(pf);
+   struct i40e_pf *pf = ldev->pf;
+
+   if (pf->hw.mac.type == I40E_MAC_X722) {
+   i40e_client_add_instance(pf);
+   /* Start the client subtask */
+   pf->flags |= I40E_FLAG_SERVICE_CLIENT_REQUESTED;
+   i40e_service_event_schedule(pf);
+   }
}
mutex_unlock(_device_mutex);
 }
-- 
2.9.3

Re: [PATCH v3] tracing/kprobes: expose maxactive for kretprobe in kprobe_events

2017-04-04 Thread Steven Rostedt

On Tue, 4 Apr 2017 20:24:59 +0900
Masami Hiramatsu  wrote:

> On Mon,  3 Apr 2017 12:36:22 +0200
> Alban Crequy  wrote:
> 
> > From: Alban Crequy 
> > 
> > When a kretprobe is installed on a kernel function, there is a maximum
> > limit of how many calls in parallel it can catch (aka "maxactive"). A
> > kernel module could call register_kretprobe() and initialize maxactive
> > (see example in samples/kprobes/kretprobe_example.c).
> > 
> > But that is not exposed to userspace and it is currently not possible to
> > choose maxactive when writing to /sys/kernel/debug/tracing/kprobe_events
> > 
> > The default maxactive can be as low as 1 on single-core with a
> > non-preemptive kernel. This is too low and we need to increase it not
> > only for recursive functions, but for functions that sleep or resched.
> > 
> > This patch updates the format of the command that can be written to
> > kprobe_events so that maxactive can be optionally specified.
> > 
> > I need this for a bpf program attached to the kretprobe of
> > inet_csk_accept, which can sleep for a long time.
> > 
> > This patch includes a basic selftest:
> >   
> > > # ./ftracetest -v  test.d/kprobe/
> > > === Ftrace unit tests ===
> > > [1] Kprobe dynamic event - adding and removing[PASS]
> > > [2] Kprobe dynamic event - busy event check   [PASS]
> > > [3] Kprobe dynamic event with arguments   [PASS]
> > > [4] Kprobes event arguments with types[PASS]
> > > [5] Kprobe dynamic event with function tracer [PASS]
> > > [6] Kretprobe dynamic event with arguments[PASS]
> > > [7] Kretprobe dynamic event with maxactive[PASS]
> > >
> > > # of passed:  7
> > > # of failed:  0
> > > # of unresolved:  0
> > > # of untested:  0
> > > # of unsupported:  0
> > > # of xfailed:  0
> > > # of undefined(test bug):  0  
> > 
> > BugLink: https://github.com/iovisor/bcc/issues/1072
> > Signed-off-by: Alban Crequy   
> 
> Looks good to me.
> 
> Acked-by: Masami Hiramatsu 
> 

Applied, thanks!

-- Steve
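For reference, the syntax this patch appears to add places MAXACTIVE directly after the `r`, so writing something like `r100:myprobe inet_csk_accept` to kprobe_events would ask for up to 100 simultaneously probed instances, while no number (or 0) keeps the default. The C sketch below shows one way to pull that number out of the first word of such a command. `parse_maxactive()` is hypothetical and is not the kernel's parser, which may differ in accepted bases and limits.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not the kernel's parser): extract MAXACTIVE from
 * the first word of a kprobe_events command such as "r100:myprobe".
 * Returns 0 and stores the value (0 = use the default) on success,
 * -1 if the word is not a return probe or the number is malformed.
 */
static int parse_maxactive(const char *word, unsigned long *maxactive)
{
	char *end;

	if (word[0] != 'r')
		return -1;
	if (!isdigit((unsigned char)word[1])) {
		*maxactive = 0; /* no MAXACTIVE given: kernel default */
		return 0;
	}
	*maxactive = strtoul(word + 1, &end, 10);
	/* The number must be followed by ':' (event name) or end the word. */
	if (*end != '\0' && *end != ':' && *end != ' ')
		return -1;
	return 0;
}
```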

Re: [PATCH v2] selftests: add a generic testsuite for ethernet device

2017-04-04 Thread Andrew Lunn

On Tue, Apr 04, 2017 at 03:32:47PM +0200, Corentin Labbe wrote:
> This patch adds a generic testsuite for testing Ethernet network device drivers.

# ip link show
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode DEFAUL0
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0:  mtu 1500 qdisc mq state UP mode DEFA2
link/ether 94:10:3e:80:bc:f3 brd ff:ff:ff:ff:ff:ff
3: eth1:  mtu 1500 qdisc mq state UP mode DEFA2
link/ether a2:e2:da:92:7e:8c brd ff:ff:ff:ff:ff:ff
4: sit0@NONE:  mtu 1480 qdisc noop state DOWN mode DEFAULT group default0
link/sit 0.0.0.0 brd 0.0.0.0
5: lan4@eth0:  mtu 1500 qdisc noqueue state 0
link/ether 94:10:3e:80:bc:f3 brd ff:ff:ff:ff:ff:ff
6: lan3@eth0:  mtu 1500 qdisc noop state DOWN mode DEFAULT0
link/ether 94:10:3e:80:bc:f3 brd ff:ff:ff:ff:ff:ff
7: lan2@eth0:  mtu 1500 qdisc noqueue state 0
link/ether 94:10:3e:80:bc:f3 brd ff:ff:ff:ff:ff:ff
8: lan1@eth0:  mtu 1500 qdisc noqueue state UP0
link/ether 94:10:3e:80:bc:f3 brd ff:ff:ff:ff:ff:ff
9: internet@eth0:  mtu 1500 qdisc noqueue st0
link/ether 94:10:3e:80:bc:f3 brd ff:ff:ff:ff:ff:ff

# /home/andrew/netdevice.sh 
[  151.417351] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
PASS: eth0: set interface up
PASS: eth0: set MAC address
SKIP: eth0: set IP address
PASS: eth0: ethtool list features
Cannot get register dump: Operation not supported
SKIP: eth0: ethtool dump not supported
PASS: eth0: ethtool stats
PASS: eth0: stop interface
SKIP: eth1: interface already up
PASS: eth1: ethtool list features
Cannot get register dump: Operation not supported
SKIP: eth1: ethtool dump not supported
PASS: eth1: ethtool stats
SKIP: eth1: interface kept up
PASS: eth0: set interface up
PASS: lan4: set MAC address
inet 192.168.13.2/24 brd 192.168.13.255 scope global lan4
SKIP: lan4: already have an IP
PASS: lan4: ethtool list features
PASS: lan4: ethtool dump
PASS: lan4: ethtool stats
PASS: eth0: stop interface
PASS: eth0: set interface up
PASS: lan3: set MAC address
SKIP: lan3: set IP address
PASS: lan3: ethtool list features
PASS: lan3: ethtool dump
PASS: lan3: ethtool stats
PASS: eth0: stop interface
PASS: eth0: set interface up
PASS: lan2: set MAC address
SKIP: lan2: set IP address
PASS: lan2: ethtool list features
PASS: lan2: ethtool dump
PASS: lan2: ethtool stats
PASS: eth0: stop interface
PASS: eth0: set interface up
PASS: lan1: set MAC address
inet 10.0.0.12/24 brd 10.0.0.255 scope global lan1
SKIP: lan1: already have an IP
PASS: lan1: ethtool list features
PASS: lan1: ethtool dump
PASS: lan1: ethtool stats
PASS: eth0: stop interface
PASS: eth0: set interface up
PASS: internet: set MAC address
inet 192.168.10.2/24 brd 192.168.10.255 scope global internet
SKIP: internet: already have an IP
PASS: internet: ethtool list features
PASS: internet: ethtool dump
PASS: internet: ethtool stats
PASS: eth0: stop interface

Cool

Tested-by: Andrew Lunn 

Andrew

net/dccp: BUG in tfrc_rx_hist_sample_rtt

2017-04-04 Thread Andrey Konovalov

Hi,

I've got the following error report while fuzzing the kernel with syzkaller.

On commit a71c9a1c779f2499fb2afc0553e543f18aff6edf (4.11-rc5).

I'm able to reproduce it by executing the attached syzkaller prog, but
there's no simple C reproducer. My .config is attached.

BUG: please report to d...@vger.kernel.org => prev = 0, last = 0 at net/dccp/ccids/lib/packet_history.c:427/tfrc_rx_hist_sample_rtt()
CPU: 1 PID: 4049 Comm: syz-executor Not tainted 4.11.0-rc5+ #199
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 
 __dump_stack lib/dump_stack.c:16
 dump_stack+0x292/0x398 lib/dump_stack.c:52
 tfrc_rx_hist_sample_rtt+0x407/0x4d0 net/dccp/ccids/lib/packet_history.c:424
 ccid3_hc_rx_packet_recv+0x6a3/0xfb0 net/dccp/ccids/ccid3.c:764
 ccid_hc_rx_packet_recv net/dccp/ccid.h:185
 dccp_deliver_input_to_ccids+0xd9/0x250 net/dccp/input.c:180
 dccp_rcv_established+0x88/0xb0 net/dccp/input.c:378
 dccp_v6_do_rcv+0x2af/0x350 net/dccp/ipv6.c:600
 sk_backlog_rcv ./include/net/sock.h:898
 __sk_receive_skb+0x368/0xcb0 net/core/sock.c:469
 dccp_v6_rcv+0xba2/0x1cf0 net/dccp/ipv6.c:744
 ip6_input_finish+0x468/0x17b0 net/ipv6/ip6_input.c:279
 NF_HOOK ./include/linux/netfilter.h:257
 ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
 dst_input ./include/net/dst.h:492
 ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
 NF_HOOK ./include/linux/netfilter.h:257
 ipv6_rcv+0x12f1/0x23a0 net/ipv6/ip6_input.c:203
 __netif_receive_skb_core+0x1ad1/0x3400 net/core/dev.c:4207
 __netif_receive_skb+0x2a/0x170 net/core/dev.c:4245
 process_backlog+0xe5/0x6c0 net/core/dev.c:4865
 napi_poll net/core/dev.c:5267
 net_rx_action+0xe70/0x1900 net/core/dev.c:5332
 __do_softirq+0x2fb/0xb7d kernel/softirq.c:284
 do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902
 
 do_softirq.part.17+0x1e8/0x230 kernel/softirq.c:328
 do_softirq kernel/softirq.c:176
 __local_bh_enable_ip+0x1f2/0x200 kernel/softirq.c:181
 local_bh_enable ./include/linux/bottom_half.h:31
 rcu_read_unlock_bh ./include/linux/rcupdate.h:931
 ip6_finish_output2+0xc4e/0x2560 net/ipv6/ip6_output.c:124
 ip6_finish_output+0x302/0x960 net/ipv6/ip6_output.c:149
 NF_HOOK_COND ./include/linux/netfilter.h:246
 ip6_output+0x1cb/0x8d0 net/ipv6/ip6_output.c:163
 ip6_xmit+0xd38/0x2210 ./include/net/dst.h:486
 inet6_csk_xmit+0x331/0x610 net/ipv6/inet6_connection_sock.c:139
 dccp_transmit_skb+0xb09/0x1120 net/dccp/output.c:142
 dccp_xmit_packet+0x215/0x760 net/dccp/output.c:281
 dccp_write_xmit+0x168/0x1d0 net/dccp/output.c:363
 dccp_sendmsg+0x79c/0xb10 net/dccp/proto.c:796
 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:762
 sock_sendmsg_nosec net/socket.c:633
 sock_sendmsg+0xca/0x110 net/socket.c:643
 SYSC_sendto+0x660/0x810 net/socket.c:1696
 SyS_sendto+0x40/0x50 net/socket.c:1664
 entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:204
RIP: 0033:0x4458d9
RSP: 002b:7f4a98d8ab58 EFLAGS: 0282 ORIG_RAX: 002c
RAX: ffda RBX: 0015 RCX: 004458d9
RDX: 007a RSI: 203c4000 RDI: 0015
RBP: 006e2fa0 R08: 20e6f000 R09: 000a
R10: 4000 R11: 0282 R12: 00708150
R13:  R14: 7f4a98d8b9c0 R15: 7f4a98d8b700
dccp_close: ABORT with 1 bytes unread


.config
Description: Binary data


dccp-history-bug-log
Description: Binary data

[PATCH v2] selftests: add a generic testsuite for ethernet device

2017-04-04 Thread Corentin Labbe

This patch adds a generic testsuite for testing Ethernet network device drivers.

Signed-off-by: Corentin Labbe 
---

Changes since v1:
- Test for starting master interface
- Changed printing format to "RESULT: $netdev: line"
- Use "ip link" to get device list

 tools/testing/selftests/net/Makefile |   2 +-
 tools/testing/selftests/net/netdevice.sh | 200 +++
 2 files changed, 201 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/net/netdevice.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index fbfe5d0..35cbb4c 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -5,7 +5,7 @@ CFLAGS += -I../../../../usr/include/
 
 reuseport_bpf_numa: LDFLAGS += -lnuma
 
-TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh
+TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh
 TEST_GEN_FILES =  socket
 TEST_GEN_FILES += psock_fanout psock_tpacket
 TEST_GEN_FILES += reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
diff --git a/tools/testing/selftests/net/netdevice.sh b/tools/testing/selftests/net/netdevice.sh
new file mode 100755
index 000..4e00568
--- /dev/null
+++ b/tools/testing/selftests/net/netdevice.sh
@@ -0,0 +1,200 @@
+#!/bin/sh
+#
+# This test is for checking network interfaces
+# For the moment it tests only Ethernet interfaces (but wifi could be easily added)
+#
+# We assume that all network drivers are loaded
+# If not, they probably failed earlier in the boot process and their logged errors will be caught by another test
+#
+
+# this function will try to bring the interface up
+# if it is already up, nothing is done
+# arg1: network interface name
+kci_net_start()
+{
+   netdev=$1
+
+   ip link show "$netdev" |grep -q UP
+   if [ $? -eq 0 ];then
+   echo "SKIP: $netdev: interface already up"
+   return 0
+   fi
+
+   ip link set "$netdev" up
+   if [ $? -ne 0 ];then
+   echo "FAIL: $netdev: Fail to up interface"
+   return 1
+   else
+   echo "PASS: $netdev: set interface up"
+   NETDEV_STARTED=1
+   fi
+   return 0
+}
+
+# this function will try to set up an IP and MAC address on a network interface
+# It does nothing if the interface was already up
+# arg1: network interface name
+kci_net_setup()
+{
+   netdev=$1
+
+   # do nothing if the interface was already up
+   if [ $NETDEV_STARTED -eq 0 ];then
+   return 0
+   fi
+
+   MACADDR='02:03:04:05:06:07'
+   ip link set dev $netdev address "$MACADDR"
+   if [ $? -ne 0 ];then
+   echo "FAIL: $netdev: Cannot set MAC address"
+   else
+   ip link show $netdev |grep -q "$MACADDR"
+   if [ $? -eq 0 ];then
+   echo "PASS: $netdev: set MAC address"
+   else
+   echo "FAIL: $netdev: Cannot set MAC address"
+   fi
+   fi
+
+   #check that the interface did not already have an IP
+   ip address show "$netdev" |grep '^[[:space:]]*inet'
+   if [ $? -eq 0 ];then
+   echo "SKIP: $netdev: already have an IP"
+   return 0
+   fi
+
+   # TODO what ipaddr to set ? DHCP ?
+   echo "SKIP: $netdev: set IP address"
+   return 0
+}
+
+# test an ethtool command
+# arg1: return code for not supported (see ethtool source code)
+# arg2: summary of the command
+# arg3: command to execute
+kci_netdev_ethtool_test()
+{
+   if [ $# -le 2 ];then
+   echo "SKIP: $netdev: ethtool: invalid number of arguments"
+   return 1
+   fi
+   $3 >/dev/null
+   ret=$?
+   if [ $ret -ne 0 ];then
+   if [ $ret -eq "$1" ];then
+   echo "SKIP: $netdev: ethtool $2 not supported"
+   else
+   echo "FAIL: $netdev: ethtool $2"
+   return 1
+   fi
+   else
+   echo "PASS: $netdev: ethtool $2"
+   fi
+   return 0
+}
+
+# test ethtool commands
+# arg1: network interface name
+kci_netdev_ethtool()
+{
+   netdev=$1
+
+   #check presence of ethtool
+   ethtool --version 2>/dev/null >/dev/null
+   if [ $? -ne 0 ];then
+   echo "SKIP: ethtool not present"
+   return 1
+   fi
+
+   TMP_ETHTOOL_FEATURES="$(mktemp)"
+   if [ ! -e "$TMP_ETHTOOL_FEATURES" ];then
+   echo "SKIP: Cannot create a tmp file"
+   return 1
+   fi
+
+   ethtool -k "$netdev" > "$TMP_ETHTOOL_FEATURES"
+   if [ $? -ne 0 ];then
+   echo "FAIL: $netdev: ethtool list features"
+   rm "$TMP_ETHTOOL_FEATURES"
+   return 1
+   fi
+   echo "PASS: $netdev: ethtool list features"
+   #TODO for each non fixed features, try to turn them on/off
+   rm "$TMP_ETHTOOL_FEATURES"
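The pass/skip/fail policy of kci_netdev_ethtool_test() can be restated as a small C function, which may make the handling of the "not supported" exit code easier to review. This is a transcription for illustration only; `classify_ethtool()` and `enum result` are hypothetical names.

```c
/* Possible outcomes of one ethtool sub-test, as printed by the script. */
enum result { RESULT_PASS, RESULT_SKIP, RESULT_FAIL };

/*
 * Same decision as kci_netdev_ethtool_test(): exit status 0 passes,
 * the caller-supplied "not supported" code is a skip, and any other
 * non-zero status is a failure.
 */
static enum result classify_ethtool(int ret, int not_supported)
{
	if (ret == 0)
		return RESULT_PASS;
	if (ret == not_supported)
		return RESULT_SKIP;
	return RESULT_FAIL;
}
```

Because the zero check comes first, a command that succeeds is always a PASS, whatever "not supported" code the caller passes in.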

RE: [PATCH] bnx2x: fix spelling mistake in macros HW_INTERRUT_ASSERT_SET_*

2017-04-04 Thread Mintz, Yuval

> Trivial fix, rename HW_INTERRUT_ASSERT_SET_* to
> HW_INTERRUPT_ASSERT_SET_*
> 
> Signed-off-by: Colin Ian King 

Thanks. Don't know if it's needed but still:

Acked-by: Yuval Mintz

[PATCH v3 iproute] ip: Add support for netdev events to monitor

2017-04-04 Thread Vladislav Yasevich

Add IFLA_EVENT handling so that event types can be viewed with the
'monitor' command.  This gives a little more information about why
a given message was received.
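A self-contained C sketch of the lookup policy used by print_dev_event() in this patch: values past the end of the table are printed numerically, 0 (IFLA_EVENT_UNSPEC) prints nothing, and anything else prints a name. It writes into a buffer instead of a FILE * so the behaviour is easy to check. `format_dev_event()` is a hypothetical name, the table simply copies the order of the IFLA_EVENT_* enum proposed here, and it prints the numeric case with %u because the value is unsigned.

```c
#include <stdio.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Names in the same order as the IFLA_EVENT_* enum proposed in this patch. */
static const char *const netdev_events[] = {
	"UNKNOWN", "REBOOT", "CHANGE_MTU", "CHANGE_ADDR", "CHANGE_NAME",
	"FEATURE_CHANGE", "BONDING_FAILOVER", "POST_TYPE_CHANGE",
	"NOTIFY_PEERS", "CHANGE_UPPER", "RESEND_IGMP", "PRE_CHANGE_MTU",
	"CHANGE_INFO_DATA", "PRE_CHANGE_UPPER", "CHANGE_LOWER_STATE",
	"UDP_TUNNEL_PUSH_INFO", "CHANGE_TXQUEUE_LEN",
};

/*
 * Same policy as print_dev_event(): out-of-range values are printed as
 * a number, 0 prints nothing, known values print their name.
 * Returns the number of characters written (snprintf semantics).
 */
static int format_dev_event(char *buf, size_t len, unsigned int event)
{
	if (event >= ARRAY_SIZE(netdev_events))
		return snprintf(buf, len, "event %u ", event);
	if (event == 0) {
		if (len > 0)
			buf[0] = '\0';
		return 0;
	}
	return snprintf(buf, len, "event %s ", netdev_events[event]);
}
```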

Signed-off-by: Vladislav Yasevich 
---
 include/linux/if_link.h | 21 +
 ip/ipaddress.c  | 31 +++
 2 files changed, 52 insertions(+)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index b0bdbd6..b6d211a 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -157,6 +157,7 @@ enum {
IFLA_GSO_MAX_SIZE,
IFLA_PAD,
IFLA_XDP,
+   IFLA_EVENT,
__IFLA_MAX
 };
 
@@ -890,4 +891,24 @@ enum {
 
 #define IFLA_XDP_MAX (__IFLA_XDP_MAX - 1)
 
+enum {
+   IFLA_EVENT_UNSPEC,
+   IFLA_EVENT_REBOOT,
+   IFLA_EVENT_CHANGE_MTU,
+   IFLA_EVENT_CHANGE_ADDR,
+   IFLA_EVENT_CHANGE_NAME,
+   IFLA_EVENT_FEAT_CHANGE,
+   IFLA_EVENT_BONDING_FAILOVER,
+   IFLA_EVENT_POST_TYPE_CHANGE,
+   IFLA_EVENT_NOTIFY_PEERS,
+   IFLA_EVENT_CHANGE_UPPER,
+   IFLA_EVENT_RESEND_IGMP,
+   IFLA_EVENT_PRE_CHANGE_MTU,
+   IFLA_EVENT_CHANGE_INFO_DATA,
+   IFLA_EVENT_PRE_CHANGE_UPPER,
+   IFLA_EVENT_CHANGE_LOWER_STATE,
+   IFLA_EVENT_UDP_TUNNEL_PUSH_INFO,
+   IFLA_EVENT_CHANGE_TX_QUEUE_LEN,
+};
+
 #endif /* _LINUX_IF_LINK_H */
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index b8d9c7d..ffcfa5e 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -753,6 +753,34 @@ int print_linkinfo_brief(const struct sockaddr_nl *who,
return 0;
 }
 
+static const char *netdev_events[] = {"UNKNOWN",
+ "REBOOT",
+ "CHANGE_MTU",
+ "CHANGE_ADDR",
+ "CHANGE_NAME",
+ "FEATURE_CHANGE",
+ "BONDING_FAILOVER",
+ "POST_TYPE_CHANGE",
+ "NOTIFY_PEERS",
+ "CHANGE_UPPER",
+ "RESEND_IGMP",
+ "PRE_CHANGE_MTU",
+ "CHANGE_INFO_DATA",
+ "PRE_CHANGE_UPPER",
+ "CHANGE_LOWER_STATE",
+ "UDP_TUNNEL_PUSH_INFO",
+ "CHANGE_TXQUEUE_LEN"};
+
+static void print_dev_event(FILE *f, __u32 event)
+{
+   if (event >= ARRAY_SIZE(netdev_events))
+   fprintf(f, "event %d ", event);
+   else {
+   if (event)
+   fprintf(f, "event %s ", netdev_events[event]);
+   }
+}
+
 int print_linkinfo(const struct sockaddr_nl *who,
   struct nlmsghdr *n, void *arg)
 {
@@ -858,6 +886,9 @@ int print_linkinfo(const struct sockaddr_nl *who,
if (filter.showqueue)
print_queuelen(fp, tb);
 
+   if (tb[IFLA_EVENT])
+   print_dev_event(fp, rta_getattr_u32(tb[IFLA_EVENT]));
+
if (!filter.family || filter.family == AF_PACKET || show_details) {
SPRINT_BUF(b1);
fprintf(fp, "%s", _SL_);
-- 
2.7.4

[PATCH v2 net-next 2/2] rtnl: Add support for netdev event to link messages

2017-04-04 Thread Vladislav Yasevich

When netdev events happen, the rtnetlink_event() handler will send
messages for every event in its white list.  These messages contain
current information about a particular device, but they do not include
the information about which event just happened.  The consumer of
the message has to try to infer this information.  In some cases
(ex: NETDEV_NOTIFY_PEERS), that is not possible.

This patch adds a new extension to the RTM_NEWLINK message called IFLA_EVENT
that would have an encoding of which event triggered this
message.  This would allow the message consumer to easily determine
if it is interested in a particular event or not.
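On the consumer side, the new attribute would let a listener filter cheaply. A minimal sketch, assuming the IFLA_EVENT_* numbering proposed in this patch (the EV_* constants below copy those positions by hand and are not taken from any released header): a hypothetical daemon interested only in peer notification and IGMP resend, the macvtap-over-bonding case from the cover letter, tests the received value against a bitmask. Extracting the u32 from the netlink message is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Positions copied from the IFLA_EVENT_* enum proposed in this patch. */
enum {
	EV_UNSPEC = 0,        /* IFLA_EVENT_UNSPEC */
	EV_CHANGE_MTU = 2,    /* IFLA_EVENT_CHANGE_MTU */
	EV_NOTIFY_PEERS = 8,  /* IFLA_EVENT_NOTIFY_PEERS */
	EV_RESEND_IGMP = 10,  /* IFLA_EVENT_RESEND_IGMP */
};

/*
 * Hypothetical filter: return true only for the events this listener
 * cares about. Values outside the 32-bit mask are ignored.
 */
static bool wants_event(uint32_t event)
{
	const uint32_t interesting = (1u << EV_NOTIFY_PEERS) |
				     (1u << EV_RESEND_IGMP);

	if (event >= 32)
		return false;
	return (interesting >> event) & 1u;
}
```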

Signed-off-by: Vladislav Yasevich 
---
V2: Added support for 2 missed event types.

 include/linux/rtnetlink.h|  3 +-
 include/uapi/linux/if_link.h | 21 ++
 net/core/dev.c   |  2 +-
 net/core/rtnetlink.c | 92 +++-
 4 files changed, 107 insertions(+), 11 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 57e5484..0459018 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -18,7 +18,8 @@ extern int rtnl_put_cacheinfo(struct sk_buff *skb, struct dst_entry *dst,
 
 void rtmsg_ifinfo(int type, struct net_device *dev, unsigned change, gfp_t flags);
 struct sk_buff *rtmsg_ifinfo_build_skb(int type, struct net_device *dev,
-  unsigned change, gfp_t flags);
+  unsigned change, unsigned long event,
+  gfp_t flags);
 void rtmsg_ifinfo_send(struct sk_buff *skb, struct net_device *dev,
   gfp_t flags);
 
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 320fc1e..e4a85cb 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -157,6 +157,7 @@ enum {
IFLA_GSO_MAX_SIZE,
IFLA_PAD,
IFLA_XDP,
+   IFLA_EVENT,
__IFLA_MAX
 };
 
@@ -892,4 +893,24 @@ enum {
 
 #define IFLA_XDP_MAX (__IFLA_XDP_MAX - 1)
 
+enum {
+   IFLA_EVENT_UNSPEC,
+   IFLA_EVENT_REBOOT,
+   IFLA_EVENT_CHANGE_MTU,
+   IFLA_EVENT_CHANGE_ADDR,
+   IFLA_EVENT_CHANGE_NAME,
+   IFLA_EVENT_FEAT_CHANGE,
+   IFLA_EVENT_BONDING_FAILOVER,
+   IFLA_EVENT_POST_TYPE_CHANGE,
+   IFLA_EVENT_NOTIFY_PEERS,
+   IFLA_EVENT_CHANGE_UPPER,
+   IFLA_EVENT_RESEND_IGMP,
+   IFLA_EVENT_PRE_CHANGE_MTU,
+   IFLA_EVENT_CHANGE_INFO_DATA,
+   IFLA_EVENT_PRE_CHANGE_UPPER,
+   IFLA_EVENT_CHANGE_LOWER_STATE,
+   IFLA_EVENT_UDP_TUNNEL_PUSH_INFO,
+   IFLA_EVENT_CHANGE_TX_QUEUE_LEN,
+};
+
 #endif /* _UAPI_LINUX_IF_LINK_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index ef9fe60e..7efb417 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6840,7 +6840,7 @@ static void rollback_registered_many(struct list_head *head)
 
if (!dev->rtnl_link_ops ||
dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
-   skb = rtmsg_ifinfo_build_skb(RTM_DELLINK, dev, ~0U,
+   skb = rtmsg_ifinfo_build_skb(RTM_DELLINK, dev, ~0U, 0,
 GFP_KERNEL);
 
/*
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 58419da..b2bd4c9 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -944,6 +944,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
   + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_SWITCH_ID */
   + nla_total_size(IFNAMSIZ) /* IFLA_PHYS_PORT_NAME */
   + rtnl_xdp_size(dev) /* IFLA_XDP */
+  + nla_total_size(4)  /* IFLA_EVENT */
   + nla_total_size(1); /* IFLA_PROTO_DOWN */
 
 }
@@ -1276,9 +1277,70 @@ static int rtnl_xdp_fill(struct sk_buff *skb, struct net_device *dev)
return err;
 }
 
+static int rtnl_fill_link_event(struct sk_buff *skb, unsigned long event)
+{
+   u32 rtnl_event;
+
+   switch (event) {
+   case NETDEV_REBOOT:
+   rtnl_event = IFLA_EVENT_REBOOT;
+   break;
+   case NETDEV_CHANGEMTU:
+   rtnl_event = IFLA_EVENT_CHANGE_MTU;
+   break;
+   case NETDEV_CHANGEADDR:
+   rtnl_event = IFLA_EVENT_CHANGE_ADDR;
+   break;
+   case NETDEV_CHANGENAME:
+   rtnl_event = IFLA_EVENT_CHANGE_NAME;
+   break;
+   case NETDEV_FEAT_CHANGE:
+   rtnl_event = IFLA_EVENT_FEAT_CHANGE;
+   break;
+   case NETDEV_BONDING_FAILOVER:
+   rtnl_event = IFLA_EVENT_BONDING_FAILOVER;
+   break;
+   case NETDEV_POST_TYPE_CHANGE:
+   rtnl_event = IFLA_EVENT_POST_TYPE_CHANGE;
+   break;
+   case NETDEV_NOTIFY_PEERS:
+   rtnl_event = IFLA_EVENT_NOTIFY_PEERS;
+   break;
+   case

[PATCH net-next 0/2] rtnetlink: Updates to rtnetlink_event()

2017-04-04 Thread Vladislav Yasevich

This series came out of the conversation that started as a result of
my first attempt to add netdevice event info to netlink messages.

This series converts event processing to a 'white list', where
we explicitly permit events to generate netlink messages.  This
is meant to make people take a closer look and determine whether
these events should really trigger netlink messages.

I am also adding a V2 of my patch to add event type to the netlink
message.  This version supports all events that we currently generate.

I will also update my patch to iproute that will show this data
through 'ip monitor'. 

I actually need the ability to trap the NETDEV_NOTIFY_PEERS event
(as well as possibly NETDEV_RESEND_IGMP) to support handling of
macvtap on top of bonding.  I hope others will also find this info useful.

Vladislav Yasevich (2):
  rtnetlink: Convert rtnetlink_event to white list
  rtnl: Add support for netdev event to link messages

 include/linux/rtnetlink.h|   3 +-
 include/uapi/linux/if_link.h |  19 
 net/core/dev.c   |   2 +-
 net/core/rtnetlink.c | 113 ++-
 4 files changed, 113 insertions(+), 24 deletions(-)

-- 
2.7.4

[PATCH v2 net-next 1/2] rtnetlink: Convert rtnetlink_event to white list

2017-04-04 Thread Vladislav Yasevich

The rtnetlink_event currently functions as a blacklist where
we block certain netdev events from being sent to user space.
As a result, events have been added to the system that userspace
probably doesn't care about.

This patch converts the implementation to a white list so that
new events would have to be specifically added to the list to
be sent to userspace.  This would force new event implementers to
consider whether a given event is useful to user space or if it's
just a kernel event.
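The difference between the two approaches can be sketched in C with a hypothetical, trimmed-down event enum (the EV_* names and the event sets are illustrative, not the real rtnetlink_event() lists). The property that matters is how each version treats an event it has never heard of.

```c
#include <stdbool.h>

/* Hypothetical subset of netdev notifier events, for illustration. */
enum netdev_event {
	EV_UP, EV_DOWN, EV_CHANGEMTU, EV_CHANGEADDR, EV_NOTIFY_PEERS,
	EV_JOIN, EV_SOME_FUTURE_EVENT,
};

/* Old style: everything not explicitly blocked is sent to userspace. */
static bool notify_blacklist(enum netdev_event ev)
{
	switch (ev) {
	case EV_UP:
	case EV_DOWN:
	case EV_JOIN:
		return false;
	default:
		return true;
	}
}

/* New style: only explicitly listed events are sent to userspace. */
static bool notify_whitelist(enum netdev_event ev)
{
	switch (ev) {
	case EV_CHANGEMTU:
	case EV_CHANGEADDR:
	case EV_NOTIFY_PEERS:
		return true;
	default:
		return false;
	}
}
```

With the blacklist, EV_SOME_FUTURE_EVENT is forwarded by default; with the whitelist it is dropped until someone adds it deliberately, which is the review pressure the patch is aiming for.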

Signed-off-by: Vladislav Yasevich 
---
V2: Added missed events (from David Ahern)

 net/core/rtnetlink.c | 31 +--
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 9c3947a..58419da 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -4116,22 +4116,25 @@ static int rtnetlink_event(struct notifier_block *this, unsigned long event, voi
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
 
switch (event) {
-   case NETDEV_UP:
-   case NETDEV_DOWN:
-   case NETDEV_PRE_UP:
-   case NETDEV_POST_INIT:
-   case NETDEV_REGISTER:
-   case NETDEV_CHANGE:
-   case NETDEV_PRE_TYPE_CHANGE:
-   case NETDEV_GOING_DOWN:
-   case NETDEV_UNREGISTER:
-   case NETDEV_UNREGISTER_FINAL:
-   case NETDEV_RELEASE:
-   case NETDEV_JOIN:
-   case NETDEV_BONDING_INFO:
+   case NETDEV_REBOOT:
+   case NETDEV_CHANGEMTU:
+   case NETDEV_CHANGEADDR:
+   case NETDEV_CHANGENAME:
+   case NETDEV_FEAT_CHANGE:
+   case NETDEV_BONDING_FAILOVER:
+   case NETDEV_POST_TYPE_CHANGE:
+   case NETDEV_NOTIFY_PEERS:
+   case NETDEV_CHANGEUPPER:
+   case NETDEV_RESEND_IGMP:
+   case NETDEV_PRECHANGEMTU:
+   case NETDEV_CHANGEINFODATA:
+   case NETDEV_PRECHANGEUPPER:
+   case NETDEV_CHANGELOWERSTATE:
+   case NETDEV_UDP_TUNNEL_PUSH_INFO:
+   case NETDEV_CHANGE_TX_QUEUE_LEN:
+   rtmsg_ifinfo(RTM_NEWLINK, dev, 0, GFP_KERNEL);
break;
default:
-   rtmsg_ifinfo(RTM_NEWLINK, dev, 0, GFP_KERNEL);
break;
}
return NOTIFY_DONE;
-- 
2.7.4

[PATCH v2 net-next 0/2] rtnetlink: Updates to rtnetlink_event()

2017-04-04 Thread Vladislav Yasevich

This series came out of the conversation that started as a result of
my first attempt to add netdevice event info to netlink messages.

This series converts event processing to a 'white list', where
we explicitly permit events to generate netlink messages.  This
is meant to make people take a closer look and determine whether
these events should really trigger netlink messages.

I am also adding a V2 of my patch to add event type to the netlink
message.  This version supports all events that we currently generate.

I will also update my patch to iproute that will show this data
through 'ip monitor'. 

I actually need the ability to trap the NETDEV_NOTIFY_PEERS event
(as well as possibly NETDEV_RESEND_IGMP) to support handling of
macvtap on top of bonding.  I hope others will also find this info useful.

V2: Added missed events (from David Ahern)

Vladislav Yasevich (2):
  rtnetlink: Convert rtnetlink_event to white list
  rtnl: Add support for netdev event to link messages

 include/linux/rtnetlink.h|   3 +-
 include/uapi/linux/if_link.h |  21 
 net/core/dev.c   |   2 +-
 net/core/rtnetlink.c | 121 +++
 4 files changed, 123 insertions(+), 24 deletions(-)

-- 
2.7.4

[PATCH net-next v2 1/1] net: tcp: Define the TCP_MAX_WSCALE instead of literal number 14

2017-04-04 Thread gfree . wind

From: Gao Feng 

Define a new macro, TCP_MAX_WSCALE, to replace the literal number 14,
and use U16_MAX instead of 65535 as the maximum TCP window value.
There is one other minor change: use rounddown(space, mss) instead of
(space / mss) * mss.

Signed-off-by: Gao Feng 
---
 v2: Also replace the literal 14 in the comment and log message, per Neal
 v1: initial version

 include/net/tcp.h |  3 +++
 net/ipv4/tcp.c|  2 +-
 net/ipv4/tcp_input.c  |  9 +
 net/ipv4/tcp_output.c | 12 +---
 4 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 582e377..cc6ae0a 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -78,6 +78,9 @@
 /* Maximal number of ACKs sent quickly to accelerate slow-start. */
 #define TCP_MAX_QUICKACKS  16U
 
+/* Maximal number of window scale according to RFC1323 */
+#define TCP_MAX_WSCALE 14U
+
 /* urg_data states */
 #define TCP_URG_VALID  0x0100
 #define TCP_URG_NOTYET 0x0200
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 1665948..94f0b5b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2393,7 +2393,7 @@ static int tcp_repair_options_est(struct tcp_sock *tp,
u16 snd_wscale = opt.opt_val & 0xFFFF;
u16 rcv_wscale = opt.opt_val >> 16;
 
-   if (snd_wscale > 14 || rcv_wscale > 14)
+   if (snd_wscale > TCP_MAX_WSCALE || rcv_wscale > TCP_MAX_WSCALE)
return -EFBIG;
 
tp->rx_opt.snd_wscale = snd_wscale;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a75c48f..ed6606c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3759,11 +3759,12 @@ void tcp_parse_options(const struct sk_buff *skb,
!estab && sysctl_tcp_window_scaling) {
__u8 snd_wscale = *(__u8 *)ptr;
opt_rx->wscale_ok = 1;
-   if (snd_wscale > 14) {
-   net_info_ratelimited("%s: Illegal window scaling value %d >14 received\n",
+   if (snd_wscale > TCP_MAX_WSCALE) {
+   net_info_ratelimited("%s: Illegal window scaling value %d > %u received\n",
 __func__,
-   snd_wscale);
-   snd_wscale = 14;
+   snd_wscale,
+   TCP_MAX_WSCALE);
+   snd_wscale = TCP_MAX_WSCALE;
}
opt_rx->snd_wscale = snd_wscale;
}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 1397194..0e807a8 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -212,12 +212,12 @@ void tcp_select_initial_window(int __space, __u32 mss,
 
/* If no clamp set the clamp to the max possible scaled window */
if (*window_clamp == 0)
-   (*window_clamp) = (65535 << 14);
+   (*window_clamp) = (U16_MAX << TCP_MAX_WSCALE);
space = min(*window_clamp, space);
 
/* Quantize space offering to a multiple of mss if possible. */
if (space > mss)
-   space = (space / mss) * mss;
+   space = rounddown(space, mss);
 
/* NOTE: offering an initial window larger than 32767
 * will break some buggy TCP stacks. If the admin tells us
@@ -234,13 +234,11 @@ void tcp_select_initial_window(int __space, __u32 mss,
 
(*rcv_wscale) = 0;
if (wscale_ok) {
-   /* Set window scaling on max possible window
-* See RFC1323 for an explanation of the limit to 14
-*/
+   /* Set window scaling on max possible window */
space = max_t(u32, space, sysctl_tcp_rmem[2]);
space = max_t(u32, space, sysctl_rmem_max);
space = min_t(u32, space, *window_clamp);
-   while (space > 65535 && (*rcv_wscale) < 14) {
+   while (space > U16_MAX && (*rcv_wscale) < TCP_MAX_WSCALE) {
space >>= 1;
(*rcv_wscale)++;
}
@@ -253,7 +251,7 @@ void tcp_select_initial_window(int __space, __u32 mss,
}
 
/* Set the clamp no higher than max representable value */
-   (*window_clamp) = min(65535U << (*rcv_wscale), *window_clamp);
+   (*window_clamp) = min_t(__u32, U16_MAX << (*rcv_wscale), *window_clamp);
 }
 EXPORT_SYMBOL(tcp_select_initial_window);
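The arithmetic this patch touches can be checked in user space. This sketch uses local DEMO_* copies of U16_MAX and TCP_MAX_WSCALE. It also defines a demo_rounddown() that gives the same x - (x % y) result as the kernel's rounddown(). It mirrors only the scaling loop and the final clamp in tcp_select_initial_window(), not the whole function.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_U16_MAX        0xFFFFu  /* mirrors the kernel's U16_MAX */
#define DEMO_TCP_MAX_WSCALE 14u      /* mirrors TCP_MAX_WSCALE from the patch */

/* Largest multiple of y that is <= x, i.e. (x / y) * y. */
static uint32_t demo_rounddown(uint32_t x, uint32_t y)
{
	return x - (x % y);
}

/* Shift space right until it fits the 16-bit window field, but never
 * beyond the RFC 1323 limit of 14, as in the while loop above. */
static unsigned int demo_rcv_wscale(uint32_t space)
{
	unsigned int wscale = 0;

	while (space > DEMO_U16_MAX && wscale < DEMO_TCP_MAX_WSCALE) {
		space >>= 1;
		wscale++;
	}
	return wscale;
}

/* Largest window representable with a given scale (the final clamp). */
static uint32_t demo_max_window(unsigned int wscale)
{
	return (uint32_t)DEMO_U16_MAX << wscale;
}
```

With the cap at 14, the largest window that can be advertised is 65535 << 14 = 1073725440 bytes, just under 1 GiB.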

Re: [PATCH rfc 5/6] block: Add rdma affinity based queue mapping helper

2017-04-04 Thread Christoph Hellwig

On Tue, Apr 04, 2017 at 10:46:54AM +0300, Max Gurtovoy wrote:
>> +if (set->nr_hw_queues > dev->num_comp_vectors)
>> +goto fallback;
>> +
>> +for (queue = 0; queue < set->nr_hw_queues; queue++) {
>> +mask = ib_get_vector_affinity(dev, first_vec + queue);
>> +if (!mask)
>> +goto fallback;
>
> Christoph,
> we can also use the fallback in blk-mq-pci.c in case pci_irq_get_affinity
> fails, right?

For PCI it shouldn't fail as the driver calling pci_irq_get_affinity
knows how it set up the interrupts.  So I don't think it's necessary there.

Re: [PATCH] selftests: add a generic testsuite for ethernet device

2017-04-04 Thread Andrew Lunn

On Tue, Apr 04, 2017 at 09:56:04AM +0200, Corentin Labbe wrote:
> On Mon, Apr 03, 2017 at 03:27:41PM +0200, Andrew Lunn wrote:
> > > By ifnum, you mean the order that "ip link" gives?
> > 
> > I've not checked if it remains in order as interfaces are hot
> > plugged/unplugged. But I guess you are running tests directly after
> > boot, and unplugs/replugs are unlikely?
> > 
> 
> Yes, the ideal run is just after boot, but the test needs to succeed in all
> cases.
> 
> For handling vlan, I think the best way is to detect the master interface by
> looking for "@master".

Hi Corentin

In my case, these are DSA interfaces. Documentation/networking/dsa/dsa.txt

> So when testing a vlan interface, the test would also bring up the master
> interface.

That will work for DSA. 

I know kernelci.org has at least two boards with Ethernet switches
that will have such interfaces, so it would be great to test them.

Thanks
 Andrew

[PATCH net-next] net: stmmac: allow changing the MTU while the interface is running

2017-04-04 Thread Niklas Cassel

From: Niklas Cassel 

Setting ethtool ops for stmmac is only allowed when the interface is up.
Setting MTU (a netdev op) for stmmac is only allowed when the interface
is down.

It seems that the only reason why MTU cannot be changed when running is
that we have not bothered to implement a nice way to dealloc/alloc the
descriptor rings.

To make it less confusing for the user, call ndo_stop() and ndo_open()
from ndo_change_mtu(). This is not a nice way to dealloc/alloc the
descriptor rings, since it will announce that the interface is being
brought down/up to user space, but there are several other drivers doing
it this way, and it is arguably better than just returning -EBUSY.

Signed-off-by: Niklas Cassel 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index c1c63197ff73..fd268dc0df02 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3109,17 +3109,15 @@ static void stmmac_set_rx_mode(struct net_device *dev)
  */
 static int stmmac_change_mtu(struct net_device *dev, int new_mtu)
 {
-   struct stmmac_priv *priv = netdev_priv(dev);
-
-   if (netif_running(dev)) {
-   netdev_err(priv->dev, "must be stopped to change its MTU\n");
-   return -EBUSY;
-   }
-
dev->mtu = new_mtu;
 
netdev_update_features(dev);
 
+   if (netif_running(dev)) {
+   stmmac_release(dev);
+   stmmac_open(dev);
+   }
+
return 0;
 }
 
-- 
2.11.0
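Here is a user-space model of the restart-on-MTU-change pattern. Everything in it is hypothetical: demo_nic, demo_open(), demo_release() and the illustrative MTU + 18 buffer size. It differs from the posted patch in one respect. The patch ignores the return value of stmmac_open(), whereas this sketch hands the reopen error back to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical NIC whose receive buffers are sized from the MTU. */
struct demo_nic {
	bool running;
	int mtu;
	int ring_buf_size;  /* stands in for the DMA descriptor rings */
	int open_fail;      /* test hook: make demo_open() fail */
};

static int demo_open(struct demo_nic *nic)
{
	if (nic->open_fail)
		return -ENOMEM;                    /* e.g. ring allocation failed */
	nic->ring_buf_size = nic->mtu + 18;    /* illustrative: header + FCS */
	nic->running = true;
	return 0;
}

static void demo_release(struct demo_nic *nic)
{
	nic->ring_buf_size = 0;
	nic->running = false;
}

/* Update the MTU; if the device is up, tear down and rebuild the rings. */
static int demo_change_mtu(struct demo_nic *nic, int new_mtu)
{
	nic->mtu = new_mtu;
	if (nic->running) {
		demo_release(nic);
		return demo_open(nic);  /* propagate a failed reopen */
	}
	return 0;
}
```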

Re: [next-queue v6 PATCH 2/7] i40e: Introduce Port Representor netdevs and switchdev mode.

2017-04-04 Thread Or Gerlitz

On Mon, Apr 3, 2017 at 9:41 PM, Samudrala, Sridhar
 wrote:
> On 3/30/2017 12:17 AM, Or Gerlitz wrote:
>> On Thu, Mar 30, 2017, Sridhar Samudrala wrote:

>>> Port Representor netdevs are created for each PF and VF if the switch
>>> mode is set to 'switchdev'. These netdevs can be used to control and
>>> configure VFs and PFs when they are moved to a different namespace.
>>> They enable exposing statistics, configuring and monitoring link state,
>>> mtu, filters, fdb/vlan entries etc.

>>> In switchdev mode, broadcasts from VFs are received by the PF and passed
>>> to corresponding port representor netdev.

>> What netdev represents the uplink (wire port) in your impl?

Combining your replies from the two emails:

> We don't have a port netdev representing the uplink in this implementation
> as we cannot control the frames going out the uplink via sw rules with the
> current generation of hw/fw.

> fwd to CPU as default rule is not possible with the current generation of hw/fw.
> So we would like to enable switchdev to expose the port representors and start
> adding offloads in an incremental way.

You lost me even more.

I was asking about frames coming in from the uplink, not frames going
out the uplink.

This is about offloading to HW a switching model where the steering
(matching and actions) comes into play on the port ingress. E.g.:

VF NIC xmit ---> VF vport e-switch rep recv --> SW or HW steering

other node xmit --> UPLINK vport e-switch rep recv --> SW or HW steering

If your current HW can't let you have "send to CPU" as the default
action on ingress for the VFs and uplink ports, I am not clear what
use-cases you can support in the slow path (only reps, no offloaded SW
rules) and in the fast path (reps + offloaded SW rules)...

Can you please elaborate on such use-cases, so the bigger picture is clearer?

Or.

Re: [PATCH] soreuseport: use "unsigned int" in __reuseport_alloc()

2017-04-04 Thread Alexey Dobriyan

On Mon, Apr 3, 2017 at 4:56 PM, Craig Gallek  wrote:
> On Sun, Apr 2, 2017 at 6:18 PM, Alexey Dobriyan  wrote:
>> Number of sockets is limited to 16 bits, so 64-bit allocation will never
>> happen.
>>
>> 16-bit ops are the worst code density-wise on x86_64 because of
>> an additional prefix (66).
> So this boils down to a compiled code density vs a
> readability/maintainability argument?  I'm not familiar with the 16
> bit problem you're referring to, but I'd argue that using the
> self-documenting u16 as an input parameter to define the range
> expectations is more useful than the micro optimization that this
> change may buy you in the assembly of one platform.  Especially given
> that this is a rare-use function.

It's not a "problem" in the sense of causing trouble.
16-bit operations are the worst on x86_64: they require an additional prefix,
and the compiler often has to extend values to 32 bits to do anything useful
(MOVZX = 1 cycle, 3 bytes) because of the cast-everything-to-int
behaviour (integer promotion) mandated by the language.
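The C rule behind the MOVZX point can be shown directly. This sketch does not show instruction encoding, which needs a disassembler. It only shows that 16-bit operands are promoted to `int` before arithmetic and have to be narrowed again when stored back. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Both operands are promoted to int before the addition, so the sum is
 * computed at int width: 65535 + 1 yields 65536, not 0. */
static int demo_sum_promoted(uint16_t a, uint16_t b)
{
	return a + b;
}

/* Storing the int result back into a 16-bit object truncates it again;
 * this widen-then-narrow round trip is what the compiler has to encode. */
static uint16_t demo_sum_u16(uint16_t a, uint16_t b)
{
	return (uint16_t)(a + b);
}
```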

Re: [PATCH v3] tracing/kprobes: expose maxactive for kretprobe in kprobe_events

2017-04-04 Thread Masami Hiramatsu

On Mon,  3 Apr 2017 12:36:22 +0200
Alban Crequy  wrote:

> From: Alban Crequy 
> 
> When a kretprobe is installed on a kernel function, there is a maximum
> limit of how many calls in parallel it can catch (aka "maxactive"). A
> kernel module could call register_kretprobe() and initialize maxactive
> (see example in samples/kprobes/kretprobe_example.c).
> 
> But that is not exposed to userspace and it is currently not possible to
> choose maxactive when writing to /sys/kernel/debug/tracing/kprobe_events.
> 
> The default maxactive can be as low as 1 on single-core with a
> non-preemptive kernel. This is too low and we need to increase it not
> only for recursive functions, but for functions that sleep or resched.
> 
> This patch updates the format of the command that can be written to
> kprobe_events so that maxactive can be optionally specified.
> 
> I need this for a bpf program attached to the kretprobe of
> inet_csk_accept, which can sleep for a long time.
> 
> This patch includes a basic selftest:
> 
> > # ./ftracetest -v  test.d/kprobe/
> > === Ftrace unit tests ===
> > [1] Kprobe dynamic event - adding and removing  [PASS]
> > [2] Kprobe dynamic event - busy event check [PASS]
> > [3] Kprobe dynamic event with arguments [PASS]
> > [4] Kprobes event arguments with types  [PASS]
> > [5] Kprobe dynamic event with function tracer   [PASS]
> > [6] Kretprobe dynamic event with arguments  [PASS]
> > [7] Kretprobe dynamic event with maxactive  [PASS]
> >
> > # of passed:  7
> > # of failed:  0
> > # of unresolved:  0
> > # of untested:  0
> > # of unsupported:  0
> > # of xfailed:  0
> > # of undefined(test bug):  0
> 
> BugLink: https://github.com/iovisor/bcc/issues/1072
> Signed-off-by: Alban Crequy 

Looks good to me.

Acked-by: Masami Hiramatsu 

Thanks!

> 
> ---
> 
> Changes since v2:
> - Explain the default maxactive value in the documentation.
>   (Review from Steven Rostedt)
> 
> Changes since v1:
> - Remove "(*)" from documentation. (Review from Masami Hiramatsu)
> - Fix support for "r100" without the event name (Review from Masami Hiramatsu)
> - Get rid of magic numbers within the code.  (Review from Steven Rostedt)
>   Note that I didn't use KRETPROBE_MAXACTIVE_ALLOC since that patch is not
>   merged.
> - Return -E2BIG when maxactive is too big.
> - Add basic selftest
> ---
>  Documentation/trace/kprobetrace.txt|  5 ++-
>  kernel/trace/trace_kprobe.c| 39 ++
>  .../ftrace/test.d/kprobe/kretprobe_maxactive.tc| 39 ++
>  3 files changed, 76 insertions(+), 7 deletions(-)
>  create mode 100644 tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_maxactive.tc
> 
> diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt
> index 41ef9d8..25f3960 100644
> --- a/Documentation/trace/kprobetrace.txt
> +++ b/Documentation/trace/kprobetrace.txt
> @@ -23,7 +23,7 @@ current_tracer. Instead of that, add probe points via
>  Synopsis of kprobe_events
>  -
>p[:[GRP/]EVENT] [MOD:]SYM[+offs]|MEMADDR [FETCHARGS]   : Set a probe
> -  r[:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS]  : Set a return probe
> +  r[MAXACTIVE][:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS]   : Set a return probe
>-:[GRP/]EVENT  : Clear a probe
>  
>   GRP : Group name. If omitted, use "kprobes" for it.
> @@ -32,6 +32,9 @@ Synopsis of kprobe_events
>   MOD : Module name which has given SYM.
>   SYM[+offs]  : Symbol+offset where the probe is inserted.
>   MEMADDR : Address where the probe is inserted.
> + MAXACTIVE   : Maximum number of instances of the specified function that
> +   can be probed simultaneously, or 0 for the default value
> +   as defined in Documentation/kprobes.txt section 1.3.1.
>  
>   FETCHARGS   : Arguments. Each probe can have up to 128 args.
>%REG   : Fetch register REG
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index c5089c7..ae81f3c 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -25,6 +25,7 @@
>  #include "trace_probe.h"
>  
>  #define KPROBE_EVENT_SYSTEM "kprobes"
> +#define KRETPROBE_MAXACTIVE_MAX 4096
>  
>  /**
>   * Kprobe event core functions
> @@ -282,6 +283,7 @@ static struct trace_kprobe *alloc_trace_kprobe(const char *group,
>void *addr,
>const char *symbol,
>unsigned long offs,
> +  int maxactive,
>int nargs, bool is_return)
>  {
>   struct trace_kprobe *tk;
> @@ -309,6 +311,8 @@ static struct trace_kprobe *alloc_trace_kprobe(const char *group,
>   else
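Here is a user-space sketch of parsing the new "r[MAXACTIVE]" prefix and enforcing the 4096 limit. It is not the code from trace_kprobe.c. demo_parse_probe_type() is hypothetical, and it makes three assumptions of its own: it parses decimal only, it rejects digits after "p", and it returns -EINVAL for malformed input.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

#define DEMO_MAXACTIVE_MAX 4096  /* same limit as KRETPROBE_MAXACTIVE_MAX */

/*
 * Parse the first token of a kprobe_events command, e.g. "p", "r",
 * "r100", "r100:myprobe" or "p:grp/ev". On success, store whether it is
 * a return probe and the requested maxactive (0 = default), return 0.
 */
static int demo_parse_probe_type(const char *tok, int *is_return,
				 unsigned long *maxactive)
{
	const char *p = tok + 1;
	char *end;

	*maxactive = 0;
	if (tok[0] == 'p')
		*is_return = 0;
	else if (tok[0] == 'r')
		*is_return = 1;
	else
		return -EINVAL;

	if (isdigit((unsigned char)*p)) {
		if (!*is_return)
			return -EINVAL;  /* maxactive only applies to 'r' */
		errno = 0;
		*maxactive = strtoul(p, &end, 10);
		if (errno || (*end != '\0' && *end != ':'))
			return -EINVAL;
		if (*maxactive > DEMO_MAXACTIVE_MAX)
			return -E2BIG;   /* same error the v2 changelog mentions */
		p = end;
	}
	/* Whatever follows must be empty or ":[GRP/]EVENT". */
	if (*p != '\0' && *p != ':')
		return -EINVAL;
	return 0;
}
```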

RE: [PATCH 16/16] drivers, net, intersil: convert request_context.refcount from atomic_t to refcount_t

2017-04-04 Thread Reshetova, Elena


> Elena Reshetova  writes:
> 
> > refcount_t type and corresponding API should be
> > used instead of atomic_t when the variable is used as
> > a reference counter. This allows to avoid accidental
> > refcounter overflows that might lead to use-after-free
> > situations.
> >
> > Signed-off-by: Elena Reshetova 
> > Signed-off-by: Hans Liljestrand 
> > Signed-off-by: Kees Cook 
> > Signed-off-by: David Windsor 
> > ---
> >  drivers/net/wireless/intersil/orinoco/orinoco_usb.c | 15 ---
> >  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> The prefix should be "orinoco_usb:", I'll fix that.

Thanks for both! Will you take the patches in?

Best Regards,
Elena.

> 
> --
> Kalle Valo
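This is a simplified user-space model of why a saturating counter prevents the overflow-to-use-after-free path the commit message describes. It is not lib/refcount.c. The names are hypothetical, and the real code also WARNs on saturation and underflow. Once the counter reaches UINT_MAX it stays pinned, so the object leaks instead of wrapping to 0 and being freed while still in use.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference unless the object is already dead (count 0). */
static bool demo_ref_inc_not_zero(atomic_uint *r)
{
	unsigned int val = atomic_load(r);

	do {
		if (val == 0)
			return false;   /* object is already being freed */
		if (val == UINT_MAX)
			return true;    /* saturated: stay pinned, never wrap */
	} while (!atomic_compare_exchange_weak(r, &val, val + 1));
	return true;
}

/* Drop a reference; true means the caller dropped the last one. */
static bool demo_ref_dec_and_test(atomic_uint *r)
{
	unsigned int val = atomic_load(r);

	do {
		if (val == UINT_MAX)
			return false;   /* saturated: never free */
		if (val == 0)
			return false;   /* underflow: refuse instead of wrapping */
	} while (!atomic_compare_exchange_weak(r, &val, val - 1));
	return val == 1;            /* val holds the pre-decrement value */
}
```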

[PATCH] net: thunderx: Switch to pci_alloc_irq_vectors

2017-04-04 Thread dev . srinivasulu

From: Thanneeru Srinivasulu 

Remove the deprecated pci_enable_msix API in favour of its
successor, pci_alloc_irq_vectors.

Signed-off-by: Thanneeru Srinivasulu 
Signed-off-by: Sunil Goutham 
---
 drivers/net/ethernet/cavium/thunder/nic.h|2 -
 drivers/net/ethernet/cavium/thunder/nic_main.c   |   64 +---
 drivers/net/ethernet/cavium/thunder/nicvf_main.c |   70 --
 3 files changed, 40 insertions(+), 96 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nic.h b/drivers/net/ethernet/cavium/thunder/nic.h
index 2269ff5..6fb4421 100644
--- a/drivers/net/ethernet/cavium/thunder/nic.h
+++ b/drivers/net/ethernet/cavium/thunder/nic.h
@@ -319,9 +319,7 @@ struct nicvf {
struct bgx_statsbgx_stats;
 
/* MSI-X  */
-   boolmsix_enabled;
u8  num_vec;
-   struct msix_entry   msix_entries[NIC_VF_MSIX_VECTORS];
charirq_name[NIC_VF_MSIX_VECTORS][IFNAMSIZ + 15];
boolirq_allocated[NIC_VF_MSIX_VECTORS];
cpumask_var_t   affinity_mask[NIC_VF_MSIX_VECTORS];
diff --git a/drivers/net/ethernet/cavium/thunder/nic_main.c b/drivers/net/ethernet/cavium/thunder/nic_main.c
index 767234e..fb770b0 100644
--- a/drivers/net/ethernet/cavium/thunder/nic_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nic_main.c
@@ -65,9 +65,7 @@ struct nicpf {
boolmbx_lock[MAX_NUM_VFS_SUPPORTED];
 
/* MSI-X */
-   boolmsix_enabled;
u8  num_vec;
-   struct msix_entry   *msix_entries;
boolirq_allocated[NIC_PF_MSIX_VECTORS];
charirq_name[NIC_PF_MSIX_VECTORS][20];
 };
@@ -1088,7 +1086,7 @@ static irqreturn_t nic_mbx_intr_handler(int irq, void *nic_irq)
u64 intr;
u8  vf, vf_per_mbx_reg = 64;
 
-   if (irq == nic->msix_entries[NIC_PF_INTR_ID_MBOX0].vector)
+   if (irq == pci_irq_vector(nic->pdev, NIC_PF_INTR_ID_MBOX0))
mbx = 0;
else
mbx = 1;
@@ -1107,51 +1105,13 @@ static irqreturn_t nic_mbx_intr_handler(int irq, void *nic_irq)
return IRQ_HANDLED;
 }
 
-static int nic_enable_msix(struct nicpf *nic)
-{
-   int i, ret;
-
-   nic->num_vec = pci_msix_vec_count(nic->pdev);
-
-   nic->msix_entries = kmalloc_array(nic->num_vec,
- sizeof(struct msix_entry),
- GFP_KERNEL);
-   if (!nic->msix_entries)
-   return -ENOMEM;
-
-   for (i = 0; i < nic->num_vec; i++)
-   nic->msix_entries[i].entry = i;
-
-   ret = pci_enable_msix(nic->pdev, nic->msix_entries, nic->num_vec);
-   if (ret) {
-   dev_err(&nic->pdev->dev,
-   "Request for #%d msix vectors failed, returned %d\n",
-  nic->num_vec, ret);
-   kfree(nic->msix_entries);
-   return ret;
-   }
-
-   nic->msix_enabled = 1;
-   return 0;
-}
-
-static void nic_disable_msix(struct nicpf *nic)
-{
-   if (nic->msix_enabled) {
-   pci_disable_msix(nic->pdev);
-   kfree(nic->msix_entries);
-   nic->msix_enabled = 0;
-   nic->num_vec = 0;
-   }
-}
-
 static void nic_free_all_interrupts(struct nicpf *nic)
 {
int irq;
 
for (irq = 0; irq < nic->num_vec; irq++) {
if (nic->irq_allocated[irq])
-   free_irq(nic->msix_entries[irq].vector, nic);
+   free_irq(pci_irq_vector(nic->pdev, irq), nic);
nic->irq_allocated[irq] = false;
}
 }
@@ -1159,18 +1119,24 @@ static void nic_free_all_interrupts(struct nicpf *nic)
 static int nic_register_interrupts(struct nicpf *nic)
 {
int i, ret;
+   nic->num_vec = pci_msix_vec_count(nic->pdev);
 
/* Enable MSI-X */
-   ret = nic_enable_msix(nic);
-   if (ret)
-   return ret;
+   ret = pci_alloc_irq_vectors(nic->pdev, nic->num_vec, nic->num_vec,
+   PCI_IRQ_MSIX);
+   if (ret < 0) {
+   dev_err(&nic->pdev->dev,
+   "Request for #%d msix vectors failed, returned %d\n",
+  nic->num_vec, ret);
+   return 1;
+   }
 
/* Register mailbox interrupt handler */
for (i = NIC_PF_INTR_ID_MBOX0; i < nic->num_vec; i++) {
sprintf(nic->irq_name[i],
"NICPF Mbox%d", (i - NIC_PF_INTR_ID_MBOX0));
 
-   ret = request_irq(nic->msix_entries[i].vector,
+   ret = request_irq(pci_irq_vector(nic->pdev, i),
  nic_mbx_intr_handler, 0,
  nic->irq_name[i], nic);
if (ret)
@@
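pci_alloc_irq_vectors() takes a minimum and a maximum vector count. It returns either the number of vectors allocated or a negative errno; its documentation lists -ENOSPC when fewer than min_vecs are available. The patch passes nic->num_vec as both bounds, so the request is all-or-nothing. The sketch below models that contract with a hypothetical stub, not the real PCI code. It also propagates the negative return value, whereas the posted hunk returns 1 on failure.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stub for the min/max contract: grant up to max_vecs of the
 * available vectors, or fail if fewer than min_vecs can be had. */
static int demo_alloc_irq_vectors(int available, int min_vecs, int max_vecs)
{
	if (available < min_vecs)
		return -ENOSPC;
	return available < max_vecs ? available : max_vecs;
}

/* Caller pattern with min == max: all-or-nothing, errno passed upward. */
static int demo_register_interrupts(int available, int num_vec)
{
	int ret = demo_alloc_irq_vectors(available, num_vec, num_vec);

	if (ret < 0)
		return ret;
	return 0;
}
```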

Re: [PATCH] selftests: add a generic testsuite for ethernet device

2017-04-04 Thread Corentin Labbe

On Mon, Apr 03, 2017 at 03:27:41PM +0200, Andrew Lunn wrote:
> > By ifnum, you mean the order that "ip link" gives?
> 
> I've not checked if it remains in order as interfaces are hot
> plugged/unplugged. But I guess you are running tests directly after
> boot, and unplugs/replugs are unlikely?
> 

Yes, the ideal run is just after boot, but the test needs to succeed in all cases.

For handling vlan, I think the best way is to detect the master interface by
looking for "@master".
So when testing a vlan interface, the test would also bring up the master
interface.

So I will drop using "ls /sys/class/net/" and directly use the output of ip.

Regards
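The plan above reads the interface list from `ip link` and uses the "@master" part. Here is a sketch of just the name-splitting step. It assumes the caller has already isolated the name field, for example "eth0.100@eth0:" from a line such as "5: eth0.100@eth0: <BROADCAST,...>". demo_split_ifname() is hypothetical, and the selftest itself is not necessarily written in C. Names shown as "@NONE" (sit0 is a common example) are treated as having no master. Other forms, such as the "@ifN" suffix ip link can print for a peer in another network namespace, are not handled specially here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split "name@master" (optionally followed by ':') into its two parts.
 * Returns 1 if a master was found, 0 if there is none, and -1 if an
 * output buffer is too small.
 */
static int demo_split_ifname(const char *field, char *name, size_t name_len,
			     char *master, size_t master_len)
{
	size_t len = strlen(field);
	const char *at;
	size_t n, m;

	if (len > 0 && field[len - 1] == ':')
		len--;                      /* drop the ':' after the name */
	at = memchr(field, '@', len);
	n = at ? (size_t)(at - field) : len;
	if (n >= name_len || master_len == 0)
		return -1;
	memcpy(name, field, n);
	name[n] = '\0';
	master[0] = '\0';
	if (!at)
		return 0;                   /* plain interface, nothing to bring up */
	m = len - n - 1;                    /* characters after the '@' */
	if (m == 0 || (m == 4 && memcmp(at + 1, "NONE", 4) == 0))
		return 0;                   /* no usable lower device */
	if (m >= master_len)
		return -1;
	memcpy(master, at + 1, m);
	master[m] = '\0';
	return 1;
}
```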

Re: [PATCH rfc 0/6] Automatic affinity settings for nvme over rdma

2017-04-04 Thread Max Gurtovoy



Any feedback is welcome.


Hi Sagi,

The patchset looks good, and of course we can add support for more
drivers in the future.

Have you run any performance tests with the nvmf initiator?




Sagi Grimberg (6):
  mlx5: convert to generic pci_alloc_irq_vectors
  mlx5: move affinity hints assignments to generic code
  RDMA/core: expose affinity mappings per completion vector
  mlx5: support ->get_vector_affinity
  block: Add rdma affinity based queue mapping helper
  nvme-rdma: use intelligent affinity based queue mappings

 block/Kconfig  |   5 +
 block/Makefile |   1 +
 block/blk-mq-rdma.c|  56 +++
 drivers/infiniband/hw/mlx5/main.c  |  10 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   5 +-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c   |   9 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/health.c   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 106 +++--
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|   1 -
 drivers/nvme/host/rdma.c   |  13 +++
 include/linux/blk-mq-rdma.h|  10 ++
 include/linux/mlx5/driver.h|   2 -
 include/rdma/ib_verbs.h|  24 +
 14 files changed, 138 insertions(+), 108 deletions(-)
 create mode 100644 block/blk-mq-rdma.c
 create mode 100644 include/linux/blk-mq-rdma.h

Re: [PATCH rfc 5/6] block: Add rdma affinity based queue mapping helper

2017-04-04 Thread Max Gurtovoy




diff --git a/block/blk-mq-rdma.c b/block/blk-mq-rdma.c
new file mode 100644
index ..d402f7c93528
--- /dev/null
+++ b/block/blk-mq-rdma.c
@@ -0,0 +1,56 @@
+/*
+ * Copyright (c) 2017 Sagi Grimberg.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */


Shouldn't you include   and  like in
commit 8ec2ef2b66ea2f that fixes blk-mq-pci.c?



+#include 
+#include 
+#include 
+#include 
+#include "blk-mq.h"


Is this include needed?



+
+/**
+ * blk_mq_rdma_map_queues - provide a default queue mapping for rdma device
+ * @set:   tagset to provide the mapping for
+ * @dev:   rdma device associated with @set.
+ * @first_vec: first interrupt vectors to use for queues (usually 0)
+ *
+ * This function assumes the rdma device @dev has at least as many available
+ * interrupt vetors as @set has queues.  It will then query it's affinity mask
+ * and built queue mapping that maps a queue to the CPUs that have irq affinity
+ * for the corresponding vector.
+ *
+ * In case either the driver passed a @dev with less vectors than
+ * @set->nr_hw_queues, or @dev does not provide an affinity mask for a
+ * vector, we fallback to the naive mapping.
+ */
+int blk_mq_rdma_map_queues(struct blk_mq_tag_set *set,
+   struct ib_device *dev, int first_vec)
+{
+   const struct cpumask *mask;
+   unsigned int queue, cpu;
+
+   if (set->nr_hw_queues > dev->num_comp_vectors)
+   goto fallback;
+
+   for (queue = 0; queue < set->nr_hw_queues; queue++) {
+   mask = ib_get_vector_affinity(dev, first_vec + queue);
+   if (!mask)
+   goto fallback;


Christoph,
we can also use the fallback in blk-mq-pci.c in case
pci_irq_get_affinity fails, right?



+
+   for_each_cpu(cpu, mask)
+   set->mq_map[cpu] = queue;
+   }
+
+   return 0;
+fallback:
+   return blk_mq_map_queues(set);
+}
+EXPORT_SYMBOL_GPL(blk_mq_rdma_map_queues);


Otherwise, looks good.

Reviewed-by: Max Gurtovoy
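This is a user-space model of what blk_mq_rdma_map_queues() computes, with the warning on fallback that was suggested in review. It rests on several assumptions:

- CPU masks are uint64_t bitmasks over a hypothetical 8-CPU system.
- A zero mask stands in for ib_get_vector_affinity() returning NULL.
- The fallback is plain round-robin, which is simpler than blk_mq_map_queues() because the latter also looks at CPU topology.
- Unlike the posted helper, the bounds check includes first_vec, so the model never reads past vec_mask.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_NR_CPUS 8

/* Simplified fallback: spread CPUs over queues round-robin. */
static void demo_naive_map(unsigned int *mq_map, unsigned int nr_queues)
{
	for (unsigned int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		mq_map[cpu] = cpu % nr_queues;
}

/*
 * vec_mask[v] is the set of CPUs with irq affinity for completion vector v
 * (0 = no mask). Queue q uses vector first_vec + q. Returns 0 if the
 * affinity-based map was built, 1 if it fell back to the naive map.
 */
static int demo_map_queues(unsigned int *mq_map, unsigned int nr_queues,
			   const uint64_t *vec_mask, unsigned int nr_vecs,
			   unsigned int first_vec)
{
	assert(nr_queues > 0);
	if (first_vec + nr_queues > nr_vecs)
		goto fallback;          /* fewer vectors than queues */

	for (unsigned int q = 0; q < nr_queues; q++) {
		uint64_t mask = vec_mask[first_vec + q];

		if (!mask)
			goto fallback;  /* device gave no affinity for this vector */
		for (unsigned int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
			if (mask & (UINT64_C(1) << cpu))
				mq_map[cpu] = q;
	}
	return 0;

fallback:
	fprintf(stderr, "demo: falling back to naive queue mapping\n");
	demo_naive_map(mq_map, nr_queues);
	return 1;
}
```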

Note

2017-04-04 Thread Bong Phang

favorable transaction of £ 12.5 million,which will benefit both of us.

Re: [PATCH] af_key: Add lock to key dump

2017-04-04 Thread Steffen Klassert

On Fri, Mar 31, 2017 at 03:10:20PM +0800, Yuejie Shi wrote:
> A dump may come in the middle of another dump, modifying its dump
> structure members. This race condition will result in a NULL pointer
> dereference in the kernel. So add a lock to prevent that race.
> 
> Fixes: 83321d6b9872 ("[AF_KEY]: Dump SA/SP entries non-atomically")
> Signed-off-by: Yuejie Shi 

Applied, thanks a lot!
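The commit message says only that a lock is added. This is a user-space model of the check-and-claim pattern such a lock protects. demo_sock is a hypothetical stand-in for the pfkey socket, and a C11 atomic_flag spinlock stands in for whatever kernel lock the patch actually uses. The point is that testing "is a dump running?" and installing the new dump happen under one lock, so a second dump cannot slip in between and overwrite the first one's state.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Minimal spinlock built on a C11 atomic_flag. */
struct demo_spinlock {
	atomic_flag locked;
};

static void demo_spin_lock(struct demo_spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
		;  /* busy-wait until the holder clears the flag */
}

static void demo_spin_unlock(struct demo_spinlock *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Hypothetical per-socket dump state. */
struct demo_sock {
	struct demo_spinlock dump_lock;
	int (*dump)(struct demo_sock *sk);  /* non-NULL while a dump runs */
};

#define DEMO_SOCK_INIT { .dump_lock = { ATOMIC_FLAG_INIT }, .dump = NULL }

static int demo_start_dump(struct demo_sock *sk, int (*fn)(struct demo_sock *))
{
	demo_spin_lock(&sk->dump_lock);
	if (sk->dump != NULL) {             /* another dump owns the state */
		demo_spin_unlock(&sk->dump_lock);
		return -EBUSY;
	}
	sk->dump = fn;                      /* claim it before unlocking */
	demo_spin_unlock(&sk->dump_lock);
	return 0;
}

static void demo_end_dump(struct demo_sock *sk)
{
	demo_spin_lock(&sk->dump_lock);
	sk->dump = NULL;
	demo_spin_unlock(&sk->dump_lock);
}

static int demo_dump_cb(struct demo_sock *sk)
{
	(void)sk;
	return 0;
}
```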

Re: [PATCH rfc 6/6] nvme-rdma: use intelligent affinity based queue mappings

2017-04-04 Thread Christoph Hellwig

On Sun, Apr 02, 2017 at 04:41:32PM +0300, Sagi Grimberg wrote:
> Use the geneic block layer affinity mapping helper. Also,

  generic

>   nr_io_queues = min(opts->nr_io_queues, num_online_cpus());
> + nr_io_queues = min_t(unsigned int, nr_io_queues,
> + ibdev->num_comp_vectors);
> +

Add a comment here?

Otherwise looks fine:

Reviewed-by: Christoph Hellwig
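Christoph asks for a comment on the new clamp. The rationale can be stated on a hypothetical user-space helper; demo_nr_io_queues() is not a driver function. Limiting the queue count to the device's completion vectors means that, with first_vec of 0, blk_mq_rdma_map_queues() never takes its "more queues than vectors" fallback.

```c
#include <assert.h>

static unsigned int demo_min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Use no more I/O queues than requested, than online CPUs, or than the
 * device has completion vectors, so that each queue can be given its own
 * vector and its affinity mask.
 */
static unsigned int demo_nr_io_queues(unsigned int requested,
				      unsigned int online_cpus,
				      unsigned int comp_vectors)
{
	return demo_min_u(demo_min_u(requested, online_cpus), comp_vectors);
}
```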

Re: [PATCH rfc 5/6] block: Add rdma affinity based queue mapping helper

2017-04-04 Thread Christoph Hellwig

On Sun, Apr 02, 2017 at 04:41:31PM +0300, Sagi Grimberg wrote:
> Like pci and virtio, we add an rdma helper for affinity
> spreading. This achieves optimal mq affinity assignments
> according to the underlying rdma device affinity maps.
> 
> Signed-off-by: Sagi Grimberg 
> ---
>  block/Kconfig   |  5 
>  block/Makefile  |  1 +
>  block/blk-mq-rdma.c | 56 +
>  include/linux/blk-mq-rdma.h | 10 
>  4 files changed, 72 insertions(+)
>  create mode 100644 block/blk-mq-rdma.c
>  create mode 100644 include/linux/blk-mq-rdma.h
> 
> diff --git a/block/Kconfig b/block/Kconfig
> index 89cd28f8d051..3ab42bbb06d5 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -206,4 +206,9 @@ config BLK_MQ_VIRTIO
>   depends on BLOCK && VIRTIO
>   default y
>  
> +config BLK_MQ_RDMA
> + bool
> + depends on BLOCK && INFINIBAND
> + default y
> +
>  source block/Kconfig.iosched
> diff --git a/block/Makefile b/block/Makefile
> index 081bb680789b..4498603dbc83 100644
> --- a/block/Makefile
> +++ b/block/Makefile
> @@ -26,6 +26,7 @@ obj-$(CONFIG_BLK_CMDLINE_PARSER)+= cmdline-parser.o
>  obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o blk-integrity.o t10-pi.o
>  obj-$(CONFIG_BLK_MQ_PCI) += blk-mq-pci.o
>  obj-$(CONFIG_BLK_MQ_VIRTIO)  += blk-mq-virtio.o
> +obj-$(CONFIG_BLK_MQ_RDMA)+= blk-mq-rdma.o
>  obj-$(CONFIG_BLK_DEV_ZONED)  += blk-zoned.o
>  obj-$(CONFIG_BLK_WBT)+= blk-wbt.o
>  obj-$(CONFIG_BLK_DEBUG_FS)   += blk-mq-debugfs.o
> diff --git a/block/blk-mq-rdma.c b/block/blk-mq-rdma.c
> new file mode 100644
> index ..d402f7c93528
> --- /dev/null
> +++ b/block/blk-mq-rdma.c
> @@ -0,0 +1,56 @@
> +/*
> + * Copyright (c) 2017 Sagi Grimberg.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include "blk-mq.h"
> +
> +/**
> + * blk_mq_rdma_map_queues - provide a default queue mapping for rdma device
> + * @set: tagset to provide the mapping for
> + * @dev: rdma device associated with @set.
> + * @first_vec:   first interrupt vectors to use for queues (usually 0)
> + *
> + * This function assumes the rdma device @dev has at least as many available
> + * interrupt vetors as @set has queues.  It will then query it's affinity mask
> + * and built queue mapping that maps a queue to the CPUs that have irq affinity
> + * for the corresponding vector.
> + *
> + * In case either the driver passed a @dev with less vectors than
> + * @set->nr_hw_queues, or @dev does not provide an affinity mask for a
> + * vector, we fallback to the naive mapping.
> + */
> +int blk_mq_rdma_map_queues(struct blk_mq_tag_set *set,
> + struct ib_device *dev, int first_vec)
> +{
> + const struct cpumask *mask;
> + unsigned int queue, cpu;
> +
> + if (set->nr_hw_queues > dev->num_comp_vectors)
> + goto fallback;

Maybe print a warning here?

Otherwise looks fine:

Reviewed-by: Christoph Hellwig

Re: [PATCH rfc 4/6] mlx5: support ->get_vector_affinity

2017-04-04 Thread Christoph Hellwig

Looks good,

Reviewed-by: Christoph Hellwig

Re: [PATCH rfc 3/6] RDMA/core: expose affinity mappings per completion vector

2017-04-04 Thread Christoph Hellwig

Looks good,

Reviewed-by: Christoph Hellwig

Re: [PATCH rfc 2/6] mlx5: move affinity hints assignments to generic code

2017-04-04 Thread Christoph Hellwig

> @@ -1375,7 +1375,8 @@ static void mlx5e_close_cq(struct mlx5e_cq *cq)
>  
>  static int mlx5e_get_cpu(struct mlx5e_priv *priv, int ix)
>  {
> - return cpumask_first(priv->mdev->priv.irq_info[ix].mask);
> + return cpumask_first(pci_irq_get_affinity(priv->mdev->pdev,
> + MLX5_EQ_VEC_COMP_BASE + ix));

This looks ok for now, but if we look at the callers we'd probably
want to make direct use of pci_irq_get_node and pci_irq_get_affinity for
the uses directly in mlx5e_open_channel as well as the stored away
->cpu field.  But maybe that should be left for another patch after
this one.

> + struct irq_affinity irqdesc = { .pre_vectors = MLX5_EQ_VEC_COMP_BASE, };

I usually move assignments inside structures onto a separate line to make
it more readable, e.g.

struct irq_affinity irqdesc = {
.pre_vectors = MLX5_EQ_VEC_COMP_BASE,
};

Otherwise this looks fine:

Reviewed-by: Christoph Hellwig

Re: [PATCH rfc 1/6] mlx5: convert to generic pci_alloc_irq_vectors

2017-04-04 Thread Christoph Hellwig

Looks good:

Reviewed-by: Christoph Hellwig

Re: [PATCH] [net-next] stmmac: use netif_set_real_num_{rx,tx}_queues

2017-04-04 Thread Giuseppe CAVALLARO


Hello Joao

On 4/3/2017 3:12 PM, Joao Pinto wrote:

Yes, older cores do not support multiple queues, and I tried to isolate the
features so as not to affect older versions.


OK, so we are in line ;-)



Do you think that functions such as "ndev = alloc_etherdev_mqs" have some sort of
influence?


I do not think so, as long as we are passing the right txqs and rxqs values.

Regards
Peppe



Thanks.

RE: [PATCH -next] qed: Add a missing error code

2017-04-04 Thread Tayar, Tomer

> We should be returning -ENOMEM if qed_mcp_cmd_add_elem() fails.  The
> current code returns success.
> 
> Fixes: 4ed1eea82a21 ("qed: Revise MFW command locking")
> Signed-off-by: Dan Carpenter 

Thanks

Acked-by: Tomer Tayar
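The bug class fixed here is an error path that jumps out with `rc` still 0. A sketch with a hypothetical demo_add_elem() standing in for qed_mcp_cmd_add_elem(); without the `rc = -ENOMEM;` line, demo_cmd(1) would report success even though nothing was queued.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical element type and allocator (fail != 0 simulates OOM). */
struct demo_elem {
	int seq;
};

static struct demo_elem *demo_add_elem(int fail)
{
	if (fail)
		return NULL;
	return malloc(sizeof(struct demo_elem));
}

static int demo_cmd(int fail_alloc)
{
	struct demo_elem *e;
	int rc = 0;

	e = demo_add_elem(fail_alloc);
	if (!e) {
		rc = -ENOMEM;  /* the assignment the buggy version lacked */
		goto out;
	}
	/* ... send the command, then release the element ... */
	free(e);
out:
	return rc;
}
```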
