date:20160920

Re: [net-next 02/15] i40e: Enable VF specific ethtool statistics via VF Port representor netdevs

2016-09-20 Thread Samudrala, Sridhar



On 9/20/2016 9:26 PM, Or Gerlitz wrote:

> On Wed, Sep 21, 2016 at 6:43 AM, Jeff Kirsher
>  wrote:
>
>> From: Sridhar Samudrala
>>
>> Sample script that shows ethtool stats on VF representor netdev
>> PF: enp5s0f0, VF0: enp5s2  VF_REP0: enp5s0f0-vf0
>>
>> # echo 2 > /sys/class/net/enp5s0f0/device/sriov_numvfs
>> # ip link set enp5s2 up
>> # ethtool -S enp5s0f0-vf0
>> NIC statistics:
>>   tx_bytes: 0
>>   tx_unicast: 0
>>   tx_multicast: 0
>>   tx_broadcast: 0
>>   tx_discards: 0
>>   tx_errors: 0
>>   rx_bytes: 140
>>   rx_unicast: 0
>>   rx_multicast: 2
>>   rx_broadcast: 0
>>   rx_discards: 0
>>   rx_unknown_protocol: 0
>
> Now, when the SW stats are finally upstream for 4.9 in net-next, the
> correct approach for the VF reps counters is to follow the architecture
> presented there [1] -- and this is for the netlink based standard
> counters. Once you do that, there's no need to expose the VF HW counters
> through ethtool of the VF rep.

Sure. Will look into it. However, I think we can keep ethtool support
as well, since the VFPR represents the switch port corresponding to the VF.

> Or.
>
> [1] offloaded stats commits
> a5ea31f Merge branch 'net-offloaded-stats'
> fc1bbb0 mlxsw: spectrum: Implement offload stats ndo and expose HW stats by default
> 69ae6ad net: core: Add offload stats to if_stats_msg
> 2c9d85d netdevice: Add offload statistics ndo

Re: [RFC] PCI: Allow sysfs control over totalvfs

2016-09-20 Thread Jiri Pirko

Tue, Sep 20, 2016 at 10:27:24PM CEST, yuval.mi...@cavium.com wrote:
>> >Some of the HW capable of SRIOV has resource limitations, where the
>> >PF and VFs resources are drawn from a common pool.
>> >In some cases, these limitations have to be considered early during
>> >chip initialization and can only be changed by tearing down the
>> >configuration and re-initializing.
>> >As a result, drivers for such HWs sometimes have to make unfavorable
>> >compromises where they reserve sufficient resources to accommodate
>> >the maximal number of VFs that can be created - at the expense of
>> >resources that could have been used by the PF.
>> >
>> >If users were able to provide 'hints' regarding the required number
>> >of VFs *prior* to driver attachment, then such compromises could be
>> >avoided. As we already have sysfs nodes that can be queried for the
>> >number of totalvfs, it makes sense to let the user reduce the number
>> >of said totalvfs using the same infrastructure.
>> >Then, we can have drivers supporting SRIOV take that value into account
>> >when deciding how many resources to reserve, allowing the PF to benefit
>> >from the difference between the configuration space value and the actual
>> >number needed by user.
>
>> One of the motivations for introducing devlink interface was to allow
>> user to pass some kind of well defined option parameters or as you call
>> it hints to driver module. That would allow to replace module options
>> and introduce similar possibility to pre-configure hardware on probe time.
>> We plan to use devlink to allow user to change resource allocation for
>> mlxsw devices.
>
>Is IOV configuration something you're going to explore in the near
>future for mlxsw devices? Or are you merely pointing out that

No, not sriov related directly.


>devlink could provide a superior configuration infrastructure and
>should be investigated as a better alternative?

Exactly. It is a general problem of how to pre-configure driver modules.


>
>> The plan is to allow to pre-create devlink instance before driver module
>> is loaded. Then the user will use this placeholder to do the options
>> setting. Once the driver module is loaded, it will fetch the options
>> from devlink core and process it accordingly.
>
>> I believe this is exactly what you need.
>
>While this sounds far-superior to anything we can do via pci sysfs,
>question is whether adding devlink support for a device is
>a reasonable cost for adding this specific configuration [given
>the existing sysfs nodes we already have].

Adding devlink support is trivial in most cases; I bet you can do it in
a couple of minutes for your driver.


>I'm not sufficiently familiar with the infrastructure there, and I
>wonder whether it will set the bar too high for this sort of
>configuration to be used.
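The "hint" idea in the RFC is easiest to see as arithmetic on a shared pool. The following is a minimal sketch assuming a hypothetical device where the PF and VFs draw queues from one common pool with a fixed per-VF reservation; `struct hw_pool`, `vfs_to_reserve()` and `pf_queues()` are invented for illustration and are not part of any driver, sysfs, or devlink API. A hint of 0 stands for "no hint given".

```c
#include <assert.h>

/* Hypothetical model: PF and VFs draw queues from one shared pool. */
struct hw_pool {
	unsigned int total_queues;  /* queues available on the device */
	unsigned int queues_per_vf; /* fixed reservation per VF */
	unsigned int totalvfs;      /* maximum VFs advertised by the device */
};

/* Without a hint the driver must assume the worst case (totalvfs).
 * A hint can only lower that number, never raise it. */
static unsigned int vfs_to_reserve(const struct hw_pool *p, unsigned int hint)
{
	if (hint == 0 || hint > p->totalvfs)
		return p->totalvfs;
	return hint;
}

/* Queues left over for the PF once the VF reservation is taken. */
static unsigned int pf_queues(const struct hw_pool *p, unsigned int hint)
{
	unsigned int reserved = vfs_to_reserve(p, hint) * p->queues_per_vf;

	assert(reserved <= p->total_queues);
	return p->total_queues - reserved;
}
```

With 512 queues, 4 per VF and 64 possible VFs, the PF keeps 256 queues in the worst case but 480 if the user says only 8 VFs will be created.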

[PATCH net-next] tcp: implement TSQ for retransmits

2016-09-20 Thread Eric Dumazet

From: Eric Dumazet 

We saw sch_fq drops caused by the per flow limit of 100 packets and TCP
when dealing with large cwnd and bursts of retransmits.

Even after increasing the limit to 1000, and even after commit
10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time"),
we can still have these drops.

Under certain conditions, TCP can spend a considerable amount of
time queuing thousands of skbs in a single tcp_xmit_retransmit_queue()
invocation, incurring latency spikes and stalls of other softirq
handlers.

This patch implements TSQ for retransmits, limiting the number of packets
and giving more chance for scheduling packets in both ways.
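The byte budget computed by tcp_small_queue_check() in the diff below can be reproduced in isolation. This is a user-space sketch of that arithmetic only; `tsq_limit()` and `tsq_throttled()` are hypothetical names, and the real function additionally sets TSQ_THROTTLED and re-reads sk_wmem_alloc after a memory barrier to close the race with TX completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Byte budget for data sitting in qdisc/device queues, following the
 * arithmetic of tcp_small_queue_check():
 *   - at least two skbs worth of truesize, or about 1 ms of data at the
 *     pacing rate (bytes/sec >> 10, roughly a division by 1000),
 *   - capped by the tcp_limit_output_bytes sysctl value,
 *   - shifted left by factor (0 for normal sends; 1 doubles the limit,
 *     matching the comment that limits are doubled for retransmits). */
static uint32_t tsq_limit(uint32_t truesize, uint32_t pacing_rate,
			  uint32_t limit_output_bytes, unsigned int factor)
{
	uint32_t limit = 2 * truesize;

	if ((pacing_rate >> 10) > limit)
		limit = pacing_rate >> 10;
	if (limit > limit_output_bytes)
		limit = limit_output_bytes;
	return limit << factor;
}

/* Queuing stops once more than 'limit' bytes are already in flight
 * below the socket. */
static bool tsq_throttled(uint32_t wmem_alloc, uint32_t limit)
{
	assert(limit > 0);
	return wmem_alloc > limit;
}
```

For example, at 10,000,000 bytes/sec the pacing term (9765 bytes) dominates two 2000-byte skbs, and the retransmit path gets twice that.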

Signed-off-by: Eric Dumazet 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Neal Cardwell 
---
 net/ipv4/tcp_output.c |   72 ++--
 1 file changed, 47 insertions(+), 25 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 7d025a7804b597465564f0980f2ac069d6c61d27..478dfc53917815d30838a21b1adc2ea7096425af 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -734,9 +734,16 @@ static void tcp_tsq_handler(struct sock *sk)
 {
 	if ((1 << sk->sk_state) &
 	    (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 | TCPF_CLOSING |
-	     TCPF_CLOSE_WAIT  | TCPF_LAST_ACK))
-		tcp_write_xmit(sk, tcp_current_mss(sk), tcp_sk(sk)->nonagle,
+	     TCPF_CLOSE_WAIT  | TCPF_LAST_ACK)) {
+		struct tcp_sock *tp = tcp_sk(sk);
+
+		if (tp->lost_out > tp->retrans_out &&
+		    tp->snd_cwnd > tcp_packets_in_flight(tp))
+			tcp_xmit_retransmit_queue(sk);
+
+		tcp_write_xmit(sk, tcp_current_mss(sk), tp->nonagle,
 			       0, GFP_ATOMIC);
+	}
 }
 /*
  * One tasklet per cpu tries to send more skbs.
@@ -2039,6 +2046,39 @@ static int tcp_mtu_probe(struct sock *sk)
return -1;
 }
 
+/* TCP Small Queues :
+ * Control number of packets in qdisc/devices to two packets / or ~1 ms.
+ * (These limits are doubled for retransmits)
+ * This allows for :
+ *  - better RTT estimation and ACK scheduling
+ *  - faster recovery
+ *  - high rates
+ * Alas, some drivers / subsystems require a fair amount
+ * of queued bytes to ensure line rate.
+ * One example is wifi aggregation (802.11 AMPDU)
+ */
+static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb,
+				  unsigned int factor)
+{
+	unsigned int limit;
+
+	limit = max(2 * skb->truesize, sk->sk_pacing_rate >> 10);
+	limit = min_t(u32, limit, sysctl_tcp_limit_output_bytes);
+	limit <<= factor;
+
+	if (atomic_read(&sk->sk_wmem_alloc) > limit) {
+		set_bit(TSQ_THROTTLED, &tcp_sk(sk)->tsq_flags);
+		/* It is possible TX completion already happened
+		 * before we set TSQ_THROTTLED, so we must
+		 * test again the condition.
+		 */
+		smp_mb__after_atomic();
+		if (atomic_read(&sk->sk_wmem_alloc) > limit)
+			return true;
+	}
+	return false;
+}
+
 /* This routine writes packets to the network.  It advances the
  * send_head.  This happens as incoming acks open up the remote
  * window for us.
@@ -2125,29 +2165,8 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 		    unlikely(tso_fragment(sk, skb, limit, mss_now, gfp)))
 			break;
 
-		/* TCP Small Queues :
-		 * Control number of packets in qdisc/devices to two packets / or ~1 ms.
-		 * This allows for :
-		 *  - better RTT estimation and ACK scheduling
-		 *  - faster recovery
-		 *  - high rates
-		 * Alas, some drivers / subsystems require a fair amount
-		 * of queued bytes to ensure line rate.
-		 * One example is wifi aggregation (802.11 AMPDU)
-		 */
-		limit = max(2 * skb->truesize, sk->sk_pacing_rate >> 10);
-		limit = min_t(u32, limit, sysctl_tcp_limit_output_bytes);
-
-		if (atomic_read(&sk->sk_wmem_alloc) > limit) {
-			set_bit(TSQ_THROTTLED, &tp->tsq_flags);
-			/* It is possible TX completion already happened
-			 * before we set TSQ_THROTTLED, so we must
-			 * test again the condition.
-			 */
-			smp_mb__after_atomic();
-			if (atomic_read(&sk->sk_wmem_alloc) > limit)
-				break;
-		}
+		if (tcp_small_queue_check(sk, skb, 0))
+			break;
 
 		if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp)))
 			break;
@@ -2847,6 +2866,9 @@ begin_fwd:
if (sacked &

Re: [net-next 01/15] i40e: Introduce VF port representor/control netdevs

2016-09-20 Thread Samudrala, Sridhar




On 9/20/2016 9:22 PM, Or Gerlitz wrote:

> On Wed, Sep 21, 2016 at 6:43 AM, Jeff Kirsher
>  wrote:
>
>> From: Sridhar Samudrala
>> This patch enables creation of a VF Port representor/Control netdev
>> associated with each VF. These netdevs can be used to control and configure
>> VFs from PFs namespace. They enable exposing VF statistics, configuring
>> link state, mtu, fdb/vlan entries etc.
>
> What happens if someone does a xmit on the VF representor, does the
> packet show up @ the VF? And what happens if the VF xmits and there's
> no HW steering rule that matches this, does the frame show up @ the VF
> rep on the host?

TX/RX are not yet supported via VFPR netdevs in this patch series.
Will be submitting this support in the next patchset.

> In other words, can these VF reps serve for setting up host SW based
> switching which you can later offload (through TC, bridge, netfilter,
> etc)?

Yes. These offloads will be possible via VFPRs.

> I am posing these questions because in a downstream patch you are adding
> devlink support to set/get the e-switch mode and you declare the default
> mode to be switchdev.
>
> When the switchdev mode was introduced in 4.8 these RX/TX
> characteristics were defined to be an essential (== requirement) part
> for a driver to support that mode.

The current patchset introduces the basic VFPR support, starting with
exposing VF stats and syncing link state between VFs and VFPRs.
We decided to declare the default mode to be switchdev so that the new
code paths will get exercised by default during normal testing.

> Or
>
>> # echo 2 > /sys/class/net/enp5s0f0/device/sriov_numvfs
>> # ip l show
>> 297: enp5s0f0:  mtu 1500 qdisc noop portid 6805ca2e7268 state DOWN mode DEFAULT group default qlen 1000
>> link/ether 68:05:ca:2e:72:68 brd ff:ff:ff:ff:ff:ff
>> vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
>> vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
>> 299: enp5s0f0-vf0:  mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>> link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
>> 300: enp5s0f0-vf1:  mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>> link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff

Re: [patch net-next 0/9] mlxsw: Replace Hw related const with resource query results

2016-09-20 Thread David Miller

From: Jiri Pirko 
Date: Tue, 20 Sep 2016 11:16:48 +0200

> Many of the ASIC's properties can be read from the HW with resources query.
> This patchset adds new resources to the resource query and implement
> using them, instead of the constants that we currently use.
> Those resources are lag, kvd and router related.

Series applied, thanks.

Re: [PATCH net v2] ipmr, ip6mr: return lastuse relative to now

2016-09-20 Thread David Miller

From: Nikolay Aleksandrov 
Date: Tue, 20 Sep 2016 16:17:22 +0200

> When I introduced the lastuse member I made a subtle error: it was
> returned as an absolute value, which is meaningless to user-space since
> it doesn't show exactly how old an entry is. Let's make it similar to
> how the bridge returns such values and make it relative to "now" (jiffies).
> This allows us to show the actual age of the entries and is much more
> useful (e.g. user-space daemons can age out entries, iproute2 can display
> the lastuse properly).
> 
> Fixes: 43b9e1274060 ("net: ipmr/ip6mr: add support for keeping an entry age")
> Reported-by: Satish Ashok 
> Signed-off-by: Nikolay Aleksandrov 
> ---
> v2: make sure lastuse is before or equal to jiffies as per David Laight's
> comment

Applied.
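Returning an age instead of an absolute timestamp comes down to a wrap-safe subtraction on a free-running counter. A minimal sketch, assuming an unsigned tick counter like jiffies: `ticks_after()` reimplements the comparison performed by the kernel's time_after() macro, and `entry_age()` is a hypothetical helper, not the ipmr/ip6mr code itself. The clamp to 0 mirrors the v2 note that lastuse must be before or equal to the current time.

```c
#include <assert.h>

/* Wrap-safe "a is later than b" for a free-running unsigned counter,
 * the same comparison the kernel's time_after(a, b) performs. */
static int ticks_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Age of an entry in ticks relative to now. If lastuse is not strictly
 * before now, report 0 instead of a huge wrapped-around value. */
static unsigned long entry_age(unsigned long now, unsigned long lastuse)
{
	return ticks_after(now, lastuse) ? now - lastuse : 0;
}
```

Because the subtraction is done in unsigned arithmetic, an entry last used just before the counter wrapped still gets a small, correct age.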

Re: [PATCH net 0/5] r8152: correct the flow of PHY

2016-09-20 Thread David Miller

From: Hayes Wang 
Date: Tue, 20 Sep 2016 16:22:04 +0800

> First, enable the PHY as early as possible, since some settings may fail
> if the PHY is powered down.
> 
> Move the other PHY settings to hw_phy_cfg() to make sure the order is correct.
> 
> Finally, disable ALDPS and EEE before updating the PHY for RTL8153.

Series applied, thanks.

Re: [PATCHv2 net] cxgb4/cxgb4vf: Allocate more queues for 25G and 100G adapter

2016-09-20 Thread David Miller

From: Hariprasad Shenai 
Date: Tue, 20 Sep 2016 12:00:52 +0530

> We were missing a check for 25G and 100G while checking port speed,
> which led to fewer queues being allocated for 25G & 100G adapters
> and to low throughput. Add the missing check for both the NIC and
> vNIC drivers.
> 
> Also fixes port advertisement for 25G and 100G in ethtool output.
> 
> Signed-off-by: Hariprasad Shenai 
> ---
> V2: Missed 25G in the first one

Applied.
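The bug class described here is a speed-capability mask that omits newer speeds, so faster ports fall into the low-queue branch. A minimal sketch with hypothetical capability bits and helper names; the actual cxgb4/cxgb4vf code uses its own firmware port-capability flags and queue-sizing logic, which are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical speed-capability bits (not the firmware's values). */
enum port_speed_cap {
	SPEED_CAP_1G   = 1u << 0,
	SPEED_CAP_10G  = 1u << 1,
	SPEED_CAP_25G  = 1u << 2,
	SPEED_CAP_40G  = 1u << 3,
	SPEED_CAP_100G = 1u << 4,
};

/* A port is "high speed" if it supports any of these speeds. Leaving
 * 25G/100G out of this mask is the kind of omission the patch fixes. */
static bool is_high_speed_port(unsigned int caps)
{
	const unsigned int fast = SPEED_CAP_10G | SPEED_CAP_25G |
				  SPEED_CAP_40G | SPEED_CAP_100G;

	return (caps & fast) != 0;
}

/* Hypothetical policy: high-speed ports get max_qsets, others get one. */
static unsigned int qsets_for_port(unsigned int caps, unsigned int max_qsets)
{
	assert(max_qsets > 0);
	return is_high_speed_port(caps) ? max_qsets : 1;
}
```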

Re: [PATCH net-next] mlxsw: spectrum: Make offloads stats functions static

2016-09-20 Thread David Miller

From: Or Gerlitz 
Date: Tue, 20 Sep 2016 08:14:08 +0300

> The offloads stats functions are local to this file, make them static.
> 
> Fixes: fc1bbb0f1831 ('mlxsw: spectrum: Implement offload stats ndo [..]')
> Signed-off-by: Or Gerlitz 

Applied.

Re: [PATCH net-next 2/3] net: ethernet: mediatek: add support for GMAC0 connecting with external PHY through TRGMII

2016-09-20 Thread David Miller

From: 
Date: Tue, 20 Sep 2016 15:59:19 +0800

> +/*TRGMII RXC control register*/
 ...
> +/*TRGMII RXC control register*/
 ...
> +/*TRGMII Interface mode register*/


Please put a space at the beginning and end of comment lines like this.

Thanks.

Re: [PATCH v4 net-next 00/16] tcp: BBR congestion control algorithm

2016-09-20 Thread David Miller

From: Neal Cardwell 
Date: Mon, 19 Sep 2016 23:39:07 -0400

> tcp: BBR congestion control algorithm

Series applied, thanks Neal.

Re: [net-next PATCH v2 1/2] e1000: add initial XDP support

2016-09-20 Thread John Fastabend

On 16-09-20 09:26 PM, zhuyj wrote:
>  +static int e1000_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
> +{
> +   struct e1000_adapter *adapter = netdev_priv(netdev);
> +   struct bpf_prog *old_prog;
> +
> +   old_prog = xchg(&adapter->prog, prog);
> +   if (old_prog) {
> +   synchronize_net();
> +   bpf_prog_put(old_prog);
> +   }
> +
> +   if (netif_running(netdev))
> +   e1000_reinit_locked(adapter);
> +   else
> +   e1000_reset(adapter);
> +   return 0;
> +}
> 
> For this function, would it be better to use "static void
> e1000_xdp_set(struct net_device *netdev, struct bpf_prog *prog)",
> since it always returns 0?
> 

In general try to avoid top posting.

Yes, making it void would be reasonable and probably a good idea. I'll
do it in v3.

[...]

Thanks,
John
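The body of e1000_xdp_set() follows a publish-then-release pattern: swap the new program pointer in atomically, then drop the old program once readers are done with it. Below is a single-threaded user-space model using C11 atomics, so the synchronize_net() grace period is elided; `struct prog`, `struct adapter`, `xdp_set()` and `xdp_attached()` are hypothetical stand-ins for the driver's types. Since nothing in the sequence can fail, the void return discussed above fits naturally.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct bpf_prog and the adapter. */
struct prog {
	int id;
};

struct adapter {
	_Atomic(struct prog *) prog;
};

/* Publish new_prog (NULL detaches) and release the previous program.
 * In the kernel, synchronize_net() would wait for in-flight readers
 * before bpf_prog_put(); this model has no concurrent readers, so the
 * old program is freed immediately. Takes ownership of new_prog. */
static void xdp_set(struct adapter *a, struct prog *new_prog)
{
	struct prog *old = atomic_exchange(&a->prog, new_prog);

	if (old)
		free(old);
}

/* Equivalent of the XDP_QUERY_PROG answer: is a program attached? */
static bool xdp_attached(struct adapter *a)
{
	return atomic_load(&a->prog) != NULL;
}
```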

Re: [net-next PATCH v2 1/2] e1000: add initial XDP support

2016-09-20 Thread zhuyj

 +static int e1000_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
+{
+   struct e1000_adapter *adapter = netdev_priv(netdev);
+   struct bpf_prog *old_prog;
+
+   old_prog = xchg(&adapter->prog, prog);
+   if (old_prog) {
+   synchronize_net();
+   bpf_prog_put(old_prog);
+   }
+
+   if (netif_running(netdev))
+   e1000_reinit_locked(adapter);
+   else
+   e1000_reset(adapter);
+   return 0;
+}

For this function, would it be better to use "static void
e1000_xdp_set(struct net_device *netdev, struct bpf_prog *prog)",
since it always returns 0?


On Sat, Sep 10, 2016 at 5:29 AM, John Fastabend
 wrote:
> From: Alexei Starovoitov 
>
> This patch adds initial support for XDP on the e1000 driver. Note the e1000
> driver does not support page recycling in general, which could be
> added as a further improvement. However, the XDP_DROP case will recycle.
> XDP_TX and XDP_PASS do not support recycling yet.
>
> I tested this patch running e1000 in a VM using KVM over a tap
> device.
>
> CC: William Tu 
> Signed-off-by: Alexei Starovoitov 
> Signed-off-by: John Fastabend 
> ---
>  drivers/net/ethernet/intel/e1000/e1000.h  |2
>  drivers/net/ethernet/intel/e1000/e1000_main.c |  171 +
>  2 files changed, 170 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000/e1000.h b/drivers/net/ethernet/intel/e1000/e1000.h
> index d7bdea7..5cf8a0a 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000.h
> +++ b/drivers/net/ethernet/intel/e1000/e1000.h
> @@ -150,6 +150,7 @@ struct e1000_adapter;
>   */
>  struct e1000_tx_buffer {
> struct sk_buff *skb;
> +   struct page *page;
> dma_addr_t dma;
> unsigned long time_stamp;
> u16 length;
> @@ -279,6 +280,7 @@ struct e1000_adapter {
>  struct e1000_rx_ring *rx_ring,
>  int cleaned_count);
> struct e1000_rx_ring *rx_ring;  /* One per active queue */
> +   struct bpf_prog *prog;
> struct napi_struct napi;
>
> int num_tx_queues;
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> index f42129d..91d5c87 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -32,6 +32,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  char e1000_driver_name[] = "e1000";
>  static char e1000_driver_string[] = "Intel(R) PRO/1000 Network Driver";
> @@ -842,6 +843,44 @@ static int e1000_set_features(struct net_device *netdev,
> return 0;
>  }
>
> +static int e1000_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
> +{
> +   struct e1000_adapter *adapter = netdev_priv(netdev);
> +   struct bpf_prog *old_prog;
> +
> +   old_prog = xchg(&adapter->prog, prog);
> +   if (old_prog) {
> +   synchronize_net();
> +   bpf_prog_put(old_prog);
> +   }
> +
> +   if (netif_running(netdev))
> +   e1000_reinit_locked(adapter);
> +   else
> +   e1000_reset(adapter);
> +   return 0;
> +}
> +
> +static bool e1000_xdp_attached(struct net_device *dev)
> +{
> +   struct e1000_adapter *priv = netdev_priv(dev);
> +
> +   return !!priv->prog;
> +}
> +
> +static int e1000_xdp(struct net_device *dev, struct netdev_xdp *xdp)
> +{
> +   switch (xdp->command) {
> +   case XDP_SETUP_PROG:
> +   return e1000_xdp_set(dev, xdp->prog);
> +   case XDP_QUERY_PROG:
> +   xdp->prog_attached = e1000_xdp_attached(dev);
> +   return 0;
> +   default:
> +   return -EINVAL;
> +   }
> +}
> +
>  static const struct net_device_ops e1000_netdev_ops = {
> .ndo_open   = e1000_open,
> .ndo_stop   = e1000_close,
> @@ -860,6 +899,7 @@ static const struct net_device_ops e1000_netdev_ops = {
>  #endif
> .ndo_fix_features   = e1000_fix_features,
> .ndo_set_features   = e1000_set_features,
> +   .ndo_xdp= e1000_xdp,
>  };
>
>  /**
> @@ -1276,6 +1316,9 @@ static void e1000_remove(struct pci_dev *pdev)
> e1000_down_and_stop(adapter);
> e1000_release_manageability(adapter);
>
> +   if (adapter->prog)
> +   bpf_prog_put(adapter->prog);
> +
> unregister_netdev(netdev);
>
> e1000_phy_hw_reset(hw);
> @@ -1859,7 +1902,7 @@ static void e1000_configure_rx(struct e1000_adapter *adapter)
> struct e1000_hw *hw = &adapter->hw;
> u32 rdlen, rctl, rxcsum;
>
> -   if (adapter->netdev->mtu > ETH_DATA_LEN) {
> +   if (adapter->netdev->mtu > ETH_DATA_LEN || adapter->prog) {
> rdlen = adapter->rx_ring[0].count *
> sizeof(struct e1000_rx_desc);
>

Re: [net-next 02/15] i40e: Enable VF specific ethtool statistics via VF Port representor netdevs

2016-09-20 Thread Or Gerlitz

On Wed, Sep 21, 2016 at 6:43 AM, Jeff Kirsher
 wrote:
> From: Sridhar Samudrala 
>
> Sample script that shows ethtool stats on VF representor netdev
> PF: enp5s0f0, VF0: enp5s2  VF_REP0: enp5s0f0-vf0
>
># echo 2 > /sys/class/net/enp5s0f0/device/sriov_numvfs
># ip link set enp5s2 up
># ethtool -S enp5s0f0-vf0
>NIC statistics:
>  tx_bytes: 0
>  tx_unicast: 0
>  tx_multicast: 0
>  tx_broadcast: 0
>  tx_discards: 0
>  tx_errors: 0
>  rx_bytes: 140
>  rx_unicast: 0
>  rx_multicast: 2
>  rx_broadcast: 0
>  rx_discards: 0
>  rx_unknown_protocol: 0

Now, when the SW stats are finally upstream for 4.9 in net-next, the
correct approach for the VF reps counters is to follow the architecture
presented there [1] -- and this is for the netlink based standard
counters. Once you do that, there's no need to expose the VF HW counters
through ethtool of the VF rep.

Or.


[1] offloaded stats commits
a5ea31f Merge branch 'net-offloaded-stats'
fc1bbb0 mlxsw: spectrum: Implement offload stats ndo and expose HW stats by default
69ae6ad net: core: Add offload stats to if_stats_msg
2c9d85d netdevice: Add offload statistics ndo

Re: [net-next 01/15] i40e: Introduce VF port representor/control netdevs

2016-09-20 Thread Or Gerlitz

On Wed, Sep 21, 2016 at 6:43 AM, Jeff Kirsher
 wrote:
> From: Sridhar Samudrala 

> This patch enables creation of a VF Port representor/Control netdev
> associated with each VF. These netdevs can be used to control and configure
> VFs from PFs namespace. They enable exposing VF statistics, configuring
> link state, mtu, fdb/vlan entries etc.

What happens if someone does a xmit on the VF representor, does the
packet show up @ the VF? And what happens if the VF xmits and there's
no HW steering rule that matches this, does the frame show up @ the VF
rep on the host?

In other words, can these VF reps serve for setting up host SW based
switching which you can later offload (through TC, bridge, netfilter,
etc)?

I am posing these questions because in a downstream patch you are adding
devlink support to set/get the e-switch mode and you declare the default
mode to be switchdev.

When the switchdev mode was introduced in 4.8 these RX/TX
characteristics were defined to be an essential (== requirement) part
for a driver to support that mode.

Or

> # echo 2 > /sys/class/net/enp5s0f0/device/sriov_numvfs
> # ip l show
> 297: enp5s0f0:  mtu 1500 qdisc noop portid 6805ca2e7268 state DOWN mode DEFAULT group default qlen 1000
> link/ether 68:05:ca:2e:72:68 brd ff:ff:ff:ff:ff:ff
> vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
> vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
> 299: enp5s0f0-vf0:  mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
> link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
> 300: enp5s0f0-vf1:  mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
> link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff

Re: [PATCH net-next] net: ethernet: mediatek: enhance with avoiding superfluous assignment inside mtk_get_ethtool_stats

2016-09-20 Thread David Miller

From: 
Date: Tue, 20 Sep 2016 11:26:48 +0800

> From: Sean Wang 
> 
> data_src is unchanged inside the loop, so this patch moves
> the assignment outside the loop to avoid unnecessary
> assignments.
> 
> Signed-off-by: Sean Wang 

Applied.
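The change is a loop-invariant hoist. A hypothetical illustration (the struct and function names are invented, not taken from the mediatek driver): the source pointer does not depend on the loop index, so it is assigned once before the loop rather than on every pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hardware counter block. */
struct hw_stats {
	uint64_t counters[4];
};

/* Copy n counters into the ethtool data array. data_src is loop-invariant,
 * so it is computed once here instead of inside the loop body. */
static void copy_stats(const struct hw_stats *stats, uint64_t *data, size_t n)
{
	const uint64_t *data_src = stats->counters;

	assert(n <= sizeof(stats->counters) / sizeof(stats->counters[0]));
	for (size_t i = 0; i < n; i++)
		data[i] = data_src[i];
}
```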

Re: [PATCH net-next] net: dsa: mv88e6xxx: handle multiple ports in ATU

2016-09-20 Thread David Miller

From: Vivien Didelot 
Date: Mon, 19 Sep 2016 19:56:11 -0400

> An address can be loaded in the ATU with multiple ports, for instance
> when adding multiple ports to a Multicast group with "bridge mdb".
> 
> The current code doesn't allow that. Add a helper to get a single entry
> from the ATU, then set or clear the requested port, before loading the
> entry back in the ATU.
> 
> Note that the required _mv88e6xxx_atu_getnext function is defined below
> mv88e6xxx_port_db_load_purge, so forward-declare it for the moment. The
> ATU code will be isolated in future patches.
> 
> Fixes: 83dabd1fa84c ("net: dsa: mv88e6xxx: make switchdev DB ops generic")
> Signed-off-by: Vivien Didelot 

Applied.
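The fix amounts to a read-modify-write of the entry's port vector instead of overwriting it. A minimal sketch, assuming a hypothetical entry layout with a 16-bit port bitmap; `struct atu_entry` and `atu_update_port()` are invented for illustration and do not reflect the mv88e6xxx register layout. Reporting "purge" once the vector becomes empty is one reasonable policy for the last port leaving a group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ATU entry: one MAC address mapped to a bitmap of ports. */
struct atu_entry {
	uint8_t mac[6];
	uint16_t portvec; /* bit n set => port n is a member */
};

/* Set or clear a single port without disturbing the other members.
 * Returns true when no ports remain, i.e. the entry should be purged
 * rather than loaded back. */
static bool atu_update_port(struct atu_entry *e, unsigned int port, bool add)
{
	assert(port < 16);
	if (add)
		e->portvec |= (uint16_t)(1u << port);
	else
		e->portvec &= (uint16_t)~(1u << port);
	return e->portvec == 0;
}
```

Adding ports 1 and 3 to the same multicast address yields a vector of 0x000a; removing port 1 leaves port 3 in place instead of dropping the whole entry.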

[net-next 15/15] i40evf: remove unnecessary error checking against i40e_shutdown_adminq

2016-09-20 Thread Jeff Kirsher

From: Lihong Yang 

The i40e_shutdown_adminq function never returns failure, so there is no need
to check for a non-zero return value. Clean up the unnecessary error check
and the associated warning.

Change-ID: Ibb616f09cfb93bd1a872ebf3241a15fb8354b31b
Signed-off-by: Lihong Yang 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 9906775..99833f3 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -1785,8 +1785,7 @@ continue_reset:
i40evf_free_all_tx_resources(adapter);
 
/* kill and reinit the admin queue */
-   if (i40evf_shutdown_adminq(hw))
-   dev_warn(&adapter->pdev->dev, "Failed to shut down adminq\n");
+   i40evf_shutdown_adminq(hw);
adapter->current_op = I40E_VIRTCHNL_OP_UNKNOWN;
err = i40evf_init_adminq(hw);
if (err)
-- 
2.7.4

[net-next 14/15] i40e: Limit TX descriptor count in cases where frag size is greater than 16K

2016-09-20 Thread Jeff Kirsher

From: Alexander Duyck 

The i40e driver was incorrectly assuming that we would always be pulling
no more than 1 descriptor from each fragment. It is in fact possible for
us to end up with the case where 2 descriptors worth of data may be pulled
when a frame is larger than one of the pieces generated when aligning the
payload to either 4K or pieces smaller than 16K.

To adjust for this, we just need to make certain to test all the way to
the end of the fragments. Since it is possible for us to span 2
descriptors in the block before us, we need to guarantee that even the
last 6 descriptors have enough data to fill a full frame.

Change-ID: Ic2ecb4d6b745f447d334e66c14002152f50e2f99
Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 7 ++-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 7 ++-
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index f8d6623..bf7bb7c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2621,9 +2621,7 @@ bool __i40e_chk_linearize(struct sk_buff *skb)
return false;
 
/* We need to walk through the list and validate that each group
-* of 6 fragments totals at least gso_size.  However we don't need
-* to perform such validation on the last 6 since the last 6 cannot
-* inherit any data from a descriptor after them.
+* of 6 fragments totals at least gso_size.
 */
nr_frags -= I40E_MAX_BUFFER_TXD - 2;
frag = &skb_shinfo(skb)->frags[0];
@@ -2654,8 +2652,7 @@ bool __i40e_chk_linearize(struct sk_buff *skb)
if (sum < 0)
return true;
 
-   /* use pre-decrement to avoid processing last fragment */
-   if (!--nr_frags)
+   if (!nr_frags--)
break;
 
sum -= skb_frag_size(stale++);
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 0130458..e3427eb 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -1832,9 +1832,7 @@ bool __i40evf_chk_linearize(struct sk_buff *skb)
return false;
 
/* We need to walk through the list and validate that each group
-* of 6 fragments totals at least gso_size.  However we don't need
-* to perform such validation on the last 6 since the last 6 cannot
-* inherit any data from a descriptor after them.
+* of 6 fragments totals at least gso_size.
 */
nr_frags -= I40E_MAX_BUFFER_TXD - 2;
frag = &skb_shinfo(skb)->frags[0];
@@ -1865,8 +1863,7 @@ bool __i40evf_chk_linearize(struct sk_buff *skb)
if (sum < 0)
return true;
 
-   /* use pre-decrement to avoid processing last fragment */
-   if (!--nr_frags)
+   if (!nr_frags--)
break;
 
sum -= skb_frag_size(stale++);
-- 
2.7.4

[net-next 10/15] i40e: Add support for switchdev API for Switch ID

2016-09-20 Thread Jeff Kirsher

From: Amritha Nambiar 

This patch adds support for switchdev ops on the VF Port representors
and the PF uplink. The only operation implemented is the port attribute
API to get the port parent ID, or switch ID. The switch ID is used to
identify the net_devices attached to the same HW switch.

The switch ID returned for the VF Port representors and the PF uplink
is the phys_port_id.

102: enp9s0f0:  mtu 1500 qdisc mq state DOWN group default qlen 1000
link/ether 68:05:ca:35:77:50 brd ff:ff:ff:ff:ff:ff promiscuity 0 numtxqueues 64 numrxqueues 64 portid 6805ca357750 switchid 6805ca357750
vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
103: enp9s0f1:  mtu 1500 qdisc mq state DOWN group default qlen 1000
link/ether 68:05:ca:35:77:51 brd ff:ff:ff:ff:ff:ff promiscuity 0 numtxqueues 64 numrxqueues 64 portid 6805ca357751 switchid 6805ca357751
104: enp9s0f0-vf0:  mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 numtxqueues 1 numrxqueues 1 switchid 6805ca357750
105: enp9s0f0-vf1:  mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 numtxqueues 1 numrxqueues 1 switchid 6805ca357750
inet6 fe80::200:ff:fe00:0/64 scope link tentative
   valid_lft forever preferred_lft forever
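The switchid values in the output above are the PF's port MAC address bytes printed as hex, shared by the PF and its VF representors. A minimal sketch of that copy-and-format step, assuming a 32-byte ID buffer; `struct port_parent_id`, `set_switch_id()` and `format_switch_id()` are hypothetical stand-ins, while the bounded copy mirrors the min_t()/memcpy() pair in __i40e_sw_attr_get().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PPID_MAX_LEN 32 /* assumption: size of the switch ID buffer */

/* Hypothetical stand-in for the port parent (switch) ID attribute. */
struct port_parent_id {
	unsigned char id[PPID_MAX_LEN];
	unsigned char id_len;
};

/* Copy the port MAC into the ID, bounded by the smaller buffer. */
static void set_switch_id(struct port_parent_id *ppid,
			  const unsigned char *mac, size_t mac_len)
{
	size_t n = mac_len < sizeof(ppid->id) ? mac_len : sizeof(ppid->id);

	memcpy(ppid->id, mac, n);
	ppid->id_len = (unsigned char)n;
}

/* Render the ID as lowercase hex, as in the "switchid" field above. */
static void format_switch_id(const struct port_parent_id *ppid, char *out,
			     size_t out_len)
{
	assert(out_len >= 2u * ppid->id_len + 1);
	for (size_t i = 0; i < ppid->id_len; i++)
		snprintf(out + 2 * i, 3, "%02x", ppid->id[i]);
	out[2 * ppid->id_len] = '\0';
}
```

Feeding in the enp9s0f0 MAC 68:05:ca:35:77:50 produces the string 6805ca357750.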

Signed-off-by: Amritha Nambiar 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e.h |  1 +
 drivers/net/ethernet/intel/i40e/i40e_main.c| 17 +++
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 57 ++
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h |  2 +
 4 files changed, 77 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index f531f91..22657ea 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -54,6 +54,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "i40e_type.h"
 #include "i40e_prototype.h"
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 3fdaf36..3bbf07f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9102,6 +9102,20 @@ static const struct net_device_ops i40e_netdev_ops = {
.ndo_bridge_setlink = i40e_ndo_bridge_setlink,
 };
 
+static int i40e_sw_attr_get(struct net_device *dev, struct switchdev_attr *attr)
+{
+   struct i40e_netdev_priv *np = netdev_priv(dev);
+   struct i40e_pf *pf = np->vsi->back;
+   int err = 0;
+
+   err = __i40e_sw_attr_get(pf, attr);
+   return err;
+}
+
+static const struct switchdev_ops i40e_switchdev_ops = {
+   .switchdev_port_attr_get= i40e_sw_attr_get,
+};
+
 /**
  * i40e_config_netdev - Setup the netdev flags
  * @vsi: the VSI being configured
@@ -9199,6 +9213,9 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
netdev->netdev_ops = &i40e_netdev_ops;
netdev->watchdog_timeo = 5 * HZ;
i40e_set_ethtool_ops(netdev);
+#ifdef CONFIG_NET_SWITCHDEV
+   netdev->switchdev_ops = &i40e_switchdev_ops;
+#endif
 #ifdef I40E_FCOE
i40e_fcoe_config_netdev(netdev, vsi);
 #endif
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index da68b00..b90abd3 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -1027,6 +1027,60 @@ static const struct net_device_ops i40e_vf_netdev_ops = {
 };
 
 /**
+ * __i40e_sw_attr_get
+ * @pf: pointer to the PF structure
+ * @attr: pointer to switchdev_attr structure
+ *
+ * Get switchdev port attributes
+ **/
+int __i40e_sw_attr_get(struct i40e_pf *pf, struct switchdev_attr *attr)
+{
+   struct i40e_hw *hw = &pf->hw;
+
+   if (!(pf->flags & I40E_FLAG_SRIOV_ENABLED))
+   return -EOPNOTSUPP;
+
+   switch (attr->id) {
+   case SWITCHDEV_ATTR_ID_PORT_PARENT_ID:
+   if (!(pf->flags & I40E_FLAG_PORT_ID_VALID))
+   return -EOPNOTSUPP;
+
+   attr->u.ppid.id_len = min_t(int, sizeof(hw->mac.port_addr),
+   sizeof(attr->u.ppid.id));
+   memcpy(&attr->u.ppid.id, hw->mac.port_addr,
+  attr->u.ppid.id_len);
+   break;
+   default:
+   return -EOPNOTSUPP;
+   }
+
+   return 0;
+}
+
+/**
+ * i40e_vf_netdev_sw_attr_get
+ * @dev: target device
+ * @attr: pointer to switchdev_attr structure
+ *
+ * Handler for switchdev API to get port attributes for VF
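
For reference: ports that report the same SWITCHDEV_ATTR_ID_PORT_PARENT_ID are treated as belonging to one physical switch, which is presumably why both the PF netdev handler and the VF representor handler go through the shared __i40e_sw_attr_get(). That helper copies the port MAC address into the ID buffer and clamps the length to the smaller of the two buffers; the PF MAC 68:05:ca:2e:72:68 in the sample output elsewhere in this series matches its portid 6805ca2e7268. The user-space sketch below shows only the clamped copy and the comparison; `struct phys_item_id_sketch`, `PHYS_ID_MAX` (32 here) and both helpers are hypothetical stand-ins, loosely modelled on the kernel's parent-ID structure rather than taken from it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ID buffer size, chosen for illustration. */
#define PHYS_ID_MAX 32

/* Hypothetical stand-in for a switch parent ID. */
struct phys_item_id_sketch {
	unsigned char id[PHYS_ID_MAX];
	unsigned char id_len;
};

/* Copy a port address into the ID, clamping to the smaller buffer. */
static void fill_parent_id(struct phys_item_id_sketch *ppid,
			   const unsigned char *port_addr, size_t addr_len)
{
	size_t len = addr_len < sizeof(ppid->id) ? addr_len : sizeof(ppid->id);

	assert(ppid != NULL && port_addr != NULL);
	memset(ppid, 0, sizeof(*ppid));
	memcpy(ppid->id, port_addr, len);
	ppid->id_len = (unsigned char)len;
}

/* Ports whose parent IDs match byte for byte belong to the same switch. */
static int same_switch(const struct phys_item_id_sketch *a,
		       const struct phys_item_id_sketch *b)
{
	return a->id_len == b->id_len && memcmp(a->id, b->id, a->id_len) == 0;
}
```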

[net-next 04/15] i40e: fix setting user defined RSS hash key

2016-09-20 Thread Jeff Kirsher

From: Alan Brady 

Previously, when using ethtool to change the RSS hash key, ethtool would
report back saying the old key was still being used and no error was
reported.  It was unclear whether it was being reported incorrectly or
being set incorrectly.  Debugging revealed 'i40e_set_rxfh()' returned
zero immediately instead of setting the key because a user defined
indirection table is not supplied when changing the hash key.

This fix instead changes it such that if an indirection table is not
supplied, then a default one is created and the hash key is now
correctly set.

Change-ID: Iddb621897ecf208650272b7ee46702cad7b69a71
Signed-off-by: Alan Brady 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e.h |  2 ++
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 12 +++-
 drivers/net/ethernet/intel/i40e/i40e_main.c|  6 ++
 3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 6e211f2..f531f91 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -704,6 +704,8 @@ void i40e_do_reset_safe(struct i40e_pf *pf, u32 reset_flags);
 void i40e_do_reset(struct i40e_pf *pf, u32 reset_flags);
 int i40e_config_rss(struct i40e_vsi *vsi, u8 *seed, u8 *lut, u16 lut_size);
 int i40e_get_rss(struct i40e_vsi *vsi, u8 *seed, u8 *lut, u16 lut_size);
+void i40e_fill_rss_lut(struct i40e_pf *pf, u8 *lut,
+  u16 rss_table_size, u16 rss_size);
 struct i40e_vsi *i40e_find_vsi_from_id(struct i40e_pf *pf, u16 id);
 void i40e_update_stats(struct i40e_vsi *vsi);
 void i40e_update_eth_stats(struct i40e_vsi *vsi);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 1f3bbb05..2b2b55e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -2922,15 +2922,13 @@ static int i40e_set_rxfh(struct net_device *netdev, const u32 *indir,
 {
struct i40e_netdev_priv *np = netdev_priv(netdev);
struct i40e_vsi *vsi = np->vsi;
+   struct i40e_pf *pf = vsi->back;
u8 *seed = NULL;
u16 i;
 
if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP)
return -EOPNOTSUPP;
 
-   if (!indir)
-   return 0;
-
if (key) {
if (!vsi->rss_hkey_user) {
vsi->rss_hkey_user = kzalloc(I40E_HKEY_ARRAY_SIZE,
@@ -2948,8 +2946,12 @@ static int i40e_set_rxfh(struct net_device *netdev, const u32 *indir,
}
 
/* Each 32 bits pointed by 'indir' is stored with a lut entry */
-   for (i = 0; i < I40E_HLUT_ARRAY_SIZE; i++)
-   vsi->rss_lut_user[i] = (u8)(indir[i]);
+   if (indir)
+   for (i = 0; i < I40E_HLUT_ARRAY_SIZE; i++)
+   vsi->rss_lut_user[i] = (u8)(indir[i]);
+   else
+   i40e_fill_rss_lut(pf, vsi->rss_lut_user, I40E_HLUT_ARRAY_SIZE,
+ vsi->rss_size);
 
return i40e_config_rss(vsi, seed, vsi->rss_lut_user,
   I40E_HLUT_ARRAY_SIZE);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index b3e9ce4..45b9c67 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -57,8 +57,6 @@ static int i40e_setup_pf_switch(struct i40e_pf *pf, bool reinit);
 static int i40e_setup_misc_vector(struct i40e_pf *pf);
 static void i40e_determine_queue_usage(struct i40e_pf *pf);
 static int i40e_setup_pf_filter_control(struct i40e_pf *pf);
-static void i40e_fill_rss_lut(struct i40e_pf *pf, u8 *lut,
- u16 rss_table_size, u16 rss_size);
 static void i40e_fdir_sb_setup(struct i40e_pf *pf);
 static int i40e_veb_get_bw_info(struct i40e_veb *veb);
 
@@ -8244,8 +8242,8 @@ int i40e_get_rss(struct i40e_vsi *vsi, u8 *seed, u8 *lut, u16 lut_size)
  * @rss_table_size: Lookup table size
  * @rss_size: Range of queue number for hashing
  */
-static void i40e_fill_rss_lut(struct i40e_pf *pf, u8 *lut,
- u16 rss_table_size, u16 rss_size)
+void i40e_fill_rss_lut(struct i40e_pf *pf, u8 *lut,
+  u16 rss_table_size, u16 rss_size)
 {
u16 i;
 
-- 
2.7.4
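
As a rough illustration of the fallback in i40e_set_rxfh(): when ethtool supplies a key but no indirection table, a default table is now generated instead of returning early. The sketch assumes the default spreads lookup-table entries round-robin over the first rss_size queues; `fill_default_lut()` and `build_lut()` are hypothetical user-space helpers, not the driver's i40e_fill_rss_lut().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: spread entries round-robin over rss_size queues. */
static void fill_default_lut(uint8_t *lut, uint16_t table_size,
			     uint16_t rss_size)
{
	assert(rss_size > 0);
	for (uint16_t i = 0; i < table_size; i++)
		lut[i] = (uint8_t)(i % rss_size);
}

/* Use the caller's indirection table if there is one, else a default. */
static void build_lut(uint8_t *lut, const uint32_t *indir,
		      uint16_t table_size, uint16_t rss_size)
{
	if (indir) {
		/* Each 32-bit indirection entry is narrowed to one LUT byte. */
		for (uint16_t i = 0; i < table_size; i++)
			lut[i] = (uint8_t)indir[i];
	} else {
		fill_default_lut(lut, table_size, rss_size);
	}
}
```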

[net-next 11/15] i40evf: Fix link state event handling

2016-09-20 Thread Jeff Kirsher

From: Sridhar Samudrala 

Currently, disabling the VF link state from the PF via
ip link set enp5s0f0 vf 0 state disable
doesn't turn off the carrier on the VF.

This patch updates the carrier and starts/stops the Tx queues based on
the link state notification from the PF.

  PF: enp5s0f0, VF: enp5s2
  #modprobe i40e
  #echo 2 > /sys/class/net/enp5s0f0/device/sriov_numvfs
  #ip link set enp5s2 up
  #ip -d link show enp5s2
  175: enp5s2:  mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
  link/ether ea:4d:60:bc:6f:85 brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64
  #ip link set enp5s0f0 vf 0 state disable
  #ip -d link show enp5s0f0
  171: enp5s0f0:  mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
  link/ether 68:05:ca:2e:72:68 brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64 numtxqueues 72 numrxqueues 72 portid 6805ca2e7268
  vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state disable, trust off
  vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
  #ip -d link show enp5s2
  175: enp5s2:  mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
  link/ether ea:4d:60:bc:6f:85 brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64 numtxqueues 16 numrxqueues 16

Signed-off-by: Sridhar Samudrala 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c |  4 
 drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c | 10 +++---
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index f751f7b..e0a8cd8 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -1037,6 +1037,7 @@ void i40evf_down(struct i40evf_adapter *adapter)
 
netif_carrier_off(netdev);
netif_tx_disable(netdev);
+   adapter->link_up = false;
i40evf_napi_disable_all(adapter);
i40evf_irq_disable(adapter);
 
@@ -1731,6 +1732,7 @@ static void i40evf_reset_task(struct work_struct *work)
set_bit(__I40E_DOWN, &adapter->vsi.state);
netif_carrier_off(netdev);
netif_tx_disable(netdev);
+   adapter->link_up = false;
i40evf_napi_disable_all(adapter);
i40evf_irq_disable(adapter);
i40evf_free_traffic_irqs(adapter);
@@ -1769,6 +1771,7 @@ continue_reset:
if (netif_running(adapter->netdev)) {
netif_carrier_off(netdev);
netif_tx_stop_all_queues(netdev);
+   adapter->link_up = false;
i40evf_napi_disable_all(adapter);
}
i40evf_irq_disable(adapter);
@@ -2457,6 +2460,7 @@ static void i40evf_init_task(struct work_struct *work)
goto err_sw_init;
 
netif_carrier_off(netdev);
+   adapter->link_up = false;
 
if (!adapter->netdev_registered) {
err = register_netdev(netdev);
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c b/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
index cc6cb30..ddf478d 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
@@ -898,8 +898,14 @@ void i40evf_virtchnl_completion(struct i40evf_adapter *adapter,
vpe->event_data.link_event.link_status) {
adapter->link_up =
vpe->event_data.link_event.link_status;
+   if (adapter->link_up) {
+   netif_tx_start_all_queues(netdev);
+   netif_carrier_on(netdev);
+   } else {
+   netif_tx_stop_all_queues(netdev);
+   netif_carrier_off(netdev);
+   }
i40evf_print_link_message(adapter);
-   netif_tx_stop_all_queues(netdev);
}
break;
case I40E_VIRTCHNL_EVENT_RESET_IMPENDING:
@@ -974,8 +980,6 @@ void i40evf_virtchnl_completion(struct i40evf_adapter *adapter,
case I40E_VIRTCHNL_OP_ENABLE_QUEUES:
/* enable transmits */
i40evf_irq_enable(adapter, true);
-   netif_tx_start_all_queues(adapter->netdev);
-   netif_carrier_on(adapter->netdev);
break;
case I40E_VIRTCHNL_OP_DISABLE_QUEUES:
i40evf_free_all_tx_resources(adapter);
-- 
2.7.4
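
The net effect of this patch is that the VF's carrier and Tx queues now follow the PF's link-state events instead of the ENABLE_QUEUES completion, and every local down/reset path clears adapter->link_up. A simplified model follows; `struct vf_model` and its handlers are hypothetical, booleans stand in for netif_carrier_on/off() and netif_tx_start/stop_all_queues(), and it assumes the event handler acts only when the reported status differs from the cached one (only the tail of that condition is visible in the hunk above).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the VF adapter state. */
struct vf_model {
	bool link_up;		/* last link status reported by the PF */
	bool carrier;		/* stands in for netif_carrier_on/off() */
	bool tx_running;	/* stands in for netif_tx_start/stop_all_queues() */
};

/* Link event from the PF: act only when the reported status changes. */
static void on_link_event(struct vf_model *vf, bool status)
{
	if (vf->link_up == status)
		return;
	vf->link_up = status;
	vf->tx_running = status;
	vf->carrier = status;
}

/* Local down/reset paths also forget the cached link state, so the next
 * "link up" event from the PF is seen as a change. */
static void on_down(struct vf_model *vf)
{
	vf->carrier = false;
	vf->tx_running = false;
	vf->link_up = false;
}
```

Clearing the cached state on down matters in this model: without it, a later "link up" event would look like no change and the carrier would stay off.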

[net-next 03/15] i40e: Introduce devlink interface

2016-09-20 Thread Jeff Kirsher

From: Sridhar Samudrala 

Add initial devlink support to set and get the mode of the SR-IOV switch.
By default the switch mode is 'switchdev', because VF Port representors
are created by default.
This patch allows the mode to be set to 'legacy', which disables creation
of the VF Port representor netdevs.

With smode support in the iproute2 'devlink' utility, the switch mode can
be set and queried with the following commands:

# devlink dev smode pci/:05:00.0
mode: switchdev
# devlink dev set pci/:05:00.0 smode legacy
# devlink dev smode pci/:05:00.0
mode: legacy

Signed-off-by: Sridhar Samudrala 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/Kconfig |  1 +
 drivers/net/ethernet/intel/i40e/i40e.h |  3 +
 drivers/net/ethernet/intel/i40e/i40e_main.c| 91 --
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |  6 +-
 4 files changed, 91 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
index c0e1743..2ede229 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -215,6 +215,7 @@ config I40E
tristate "Intel(R) Ethernet Controller XL710 Family support"
select PTP_1588_CLOCK
depends on PCI
+   depends on MAY_USE_DEVLINK
---help---
  This driver supports Intel(R) Ethernet Controller XL710 Family of
  devices.  For more information on how to identify your adapter, go
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 13b1f75..6e211f2 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -53,6 +53,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include "i40e_type.h"
 #include "i40e_prototype.h"
 #ifdef I40E_FCOE
@@ -442,6 +444,7 @@ struct i40e_pf {
u32 ioremap_len;
u32 fd_inv;
u16 phy_led_val;
+   enum devlink_eswitch_mode eswitch_mode;
 };
 
 enum i40e_filter_state {
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 61b0fc4..b3e9ce4 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -10721,6 +10721,68 @@ static void i40e_get_platform_mac_addr(struct pci_dev *pdev, struct i40e_pf *pf)
 }
 
 /**
+ * i40e_devlink_eswitch_mode_get
+ *
+ * @devlink: pointer to devlink struct
+ * @mode: sr-iov switch mode pointer
+ *
+ * Returns the switch mode of the associated PF in the @mode pointer.
+ */
+static int i40e_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode)
+{
+   struct i40e_pf *pf = devlink_priv(devlink);
+
+   *mode = pf->eswitch_mode;
+
+   return 0;
+}
+
+/**
+ * i40e_devlink_eswitch_mode_set
+ *
+ * @devlink: pointer to devlink struct
+ * @mode: sr-iov switch mode
+ *
+ * Set the switch mode of the associated PF.
+ * Returns 0 on success and -EOPNOTSUPP on error.
+ */
+static int i40e_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode)
+{
+   struct i40e_pf *pf = devlink_priv(devlink);
+   struct i40e_vf *vf;
+   int i, err = 0;
+
+   if (mode == pf->eswitch_mode)
+   goto done;
+
+   switch (mode) {
+   case DEVLINK_ESWITCH_MODE_LEGACY:
+   for (i = 0; i < pf->num_alloc_vfs; i++) {
+   vf = &(pf->vf[i]);
+   i40e_free_vf_netdev(vf);
+   }
+   pf->eswitch_mode = mode;
+   break;
+   case DEVLINK_ESWITCH_MODE_SWITCHDEV:
+   pf->eswitch_mode = mode;
+   for (i = 0; i < pf->num_alloc_vfs; i++) {
+   vf = &(pf->vf[i]);
+   i40e_alloc_vf_netdev(vf, i);
+   }
+   break;
+   default:
+   err = -EOPNOTSUPP;
+   }
+done:
+   return err;
+}
+
+static const struct devlink_ops i40e_devlink_ops = {
+   .eswitch_mode_get = i40e_devlink_eswitch_mode_get,
+   .eswitch_mode_set = i40e_devlink_eswitch_mode_set,
+};
+
+/**
  * i40e_probe - Device initialization routine
  * @pdev: PCI device information struct
  * @ent: entry in i40e_pci_tbl
@@ -10737,6 +10799,7 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
struct i40e_pf *pf;
struct i40e_hw *hw;
static u16 pfs_found;
+   struct devlink *devlink;
u16 wol_nvm_bits;
u16 link_status;
int err;
@@ -10770,20 +10833,28 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
pci_enable_pcie_error_reporting(pdev);
pci_set_master(pdev);
 
+   devlink = devlink_alloc(&i40e_devlink_ops, sizeof(*pf));
+   if (!devlink) {
+   dev_err(&pdev->dev, "devlink_alloc failed\n");
+   err = -ENOMEM;
+   goto

[net-next 13/15] i40evf: remove unnecessary error checking against i40evf_up_complete

2016-09-20 Thread Jeff Kirsher

From: Bimmy Pujari 

Function i40evf_up_complete() always returns success. Changed this to a
void type and removed the code that checks the return status and prints
an error message.

Change-ID: I8c400f174786b9c855f679e470f35af292fb50ad
Signed-off-by: Bimmy Pujari 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index e0a8cd8..9906775 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -1007,7 +1007,7 @@ static void i40evf_configure(struct i40evf_adapter *adapter)
  * i40evf_up_complete - Finish the last steps of bringing up a connection
  * @adapter: board private structure
  **/
-static int i40evf_up_complete(struct i40evf_adapter *adapter)
+static void i40evf_up_complete(struct i40evf_adapter *adapter)
 {
adapter->state = __I40EVF_RUNNING;
clear_bit(__I40E_DOWN, &adapter->vsi.state);
@@ -1016,7 +1016,6 @@ static int i40evf_up_complete(struct i40evf_adapter *adapter)
 
adapter->aq_required |= I40EVF_FLAG_AQ_ENABLE_QUEUES;
mod_timer_pending(&adapter->watchdog_timer, jiffies + 1);
-   return 0;
 }
 
 /**
@@ -1827,9 +1826,7 @@ continue_reset:
 
i40evf_configure(adapter);
 
-   err = i40evf_up_complete(adapter);
-   if (err)
-   goto reset_err;
+   i40evf_up_complete(adapter);
 
i40evf_irq_enable(adapter, true);
} else {
@@ -2059,9 +2056,7 @@ static int i40evf_open(struct net_device *netdev)
i40evf_add_filter(adapter, adapter->hw.mac.addr);
i40evf_configure(adapter);
 
-   err = i40evf_up_complete(adapter);
-   if (err)
-   goto err_req_irq;
+   i40evf_up_complete(adapter);
 
i40evf_irq_enable(adapter, true);
 
-- 
2.7.4

[net-next 06/15] i40e: return correct opcode to VF

2016-09-20 Thread Jeff Kirsher

From: Mitch Williams 

This conditional is backward, so the driver responds to the VF with
the wrong opcode. Swap the two branches of the conditional to fix this.

Change-ID: I384035b0fef8a3881c176de4b4672009b3400b25
Signed-off-by: Mitch Williams 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index f98aa84..9ecf8f8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -2308,8 +2308,8 @@ static int i40e_vc_iwarp_qvmap_msg(struct i40e_vf *vf, u8 *msg, u16 msglen,
 error_param:
/* send the response to the VF */
return i40e_vc_send_resp_to_vf(vf,
-  config ? I40E_VIRTCHNL_OP_RELEASE_IWARP_IRQ_MAP :
-  I40E_VIRTCHNL_OP_CONFIG_IWARP_IRQ_MAP,
+  config ? I40E_VIRTCHNL_OP_CONFIG_IWARP_IRQ_MAP :
+  I40E_VIRTCHNL_OP_RELEASE_IWARP_IRQ_MAP,
   aq_ret);
 }
 
-- 
2.7.4

[net-next 07/15] i40e: Fix to check for NULL

2016-09-20 Thread Jeff Kirsher

From: Carolyn Wyborny 

This patch fixes an issue in the virtchnl code where the return value of
i40e_find_vsi_from_id() was not checked for NULL where applicable.
Without this patch, there is a risk of a panic, and static analysis tools
complain. This patch fixes the problem by adding the NULL checks, plus
an additional check that the VSI ID supplied by the VF is valid.

Change-ID: I7e9be88eb7a3addb50eadc451c8336d9e06f5394
Signed-off-by: Carolyn Wyborny 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 9ecf8f8..da68b00 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -502,8 +502,16 @@ static int i40e_config_vsi_tx_queue(struct i40e_vf *vf, u16 vsi_id,
u32 qtx_ctl;
int ret = 0;
 
+   if (!i40e_vc_isvalid_vsi_id(vf, info->vsi_id)) {
+   ret = -ENOENT;
+   goto error_context;
+   }
pf_queue_id = i40e_vc_get_pf_queue_id(vf, vsi_id, vsi_queue_id);
vsi = i40e_find_vsi_from_id(pf, vsi_id);
+   if (!vsi) {
+   ret = -ENOENT;
+   goto error_context;
+   }
 
/* clear the context structure first */
memset(&tx_ctx, 0, sizeof(struct i40e_hmc_obj_txq));
@@ -1567,7 +1575,8 @@ static int i40e_vc_config_promiscuous_mode_msg(struct i40e_vf *vf,
 
vsi = i40e_find_vsi_from_id(pf, info->vsi_id);
if (!test_bit(I40E_VF_STAT_ACTIVE, &vf->vf_states) ||
-   !i40e_vc_isvalid_vsi_id(vf, info->vsi_id)) {
+   !i40e_vc_isvalid_vsi_id(vf, info->vsi_id) ||
+   !vsi) {
aq_ret = I40E_ERR_PARAM;
goto error_param;
}
-- 
2.7.4
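
Both hunks follow the same defensive pattern: validate the VSI ID supplied by the VF, then treat a NULL result from the lookup as an error rather than dereferencing it. A reduced sketch; `struct vsi_sketch`, `find_vsi()` and `config_queue()` are hypothetical stand-ins for the driver's VSI array, i40e_find_vsi_from_id() and the virtchnl handlers, and the simple range check merely stands in for i40e_vc_isvalid_vsi_id().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical VSI record. */
struct vsi_sketch {
	int id;
	int num_queues;
};

/* Hypothetical lookup: returns NULL when no VSI has the given ID. */
static struct vsi_sketch *find_vsi(struct vsi_sketch *tbl, size_t n, int id)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].id == id)
			return &tbl[i];
	return NULL;
}

/* Validate the ID, then check the lookup result before touching it. */
static int config_queue(struct vsi_sketch *tbl, size_t n, int id,
			int max_id, int *queues_out)
{
	struct vsi_sketch *vsi;

	if (id < 0 || id > max_id)
		return -ENOENT;	/* ID the caller may not use */
	vsi = find_vsi(tbl, n, id);
	if (!vsi)
		return -ENOENT;	/* plausible ID, but no such VSI */
	*queues_out = vsi->num_queues;
	return 0;
}
```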

[net-next 01/15] i40e: Introduce VF port representor/control netdevs

2016-09-20 Thread Jeff Kirsher

From: Sridhar Samudrala 

This patch enables creation of a VF Port representor/control netdev
associated with each VF. These netdevs can be used to control and
configure VFs from the PF's namespace. They make it possible to expose VF
statistics and to configure link state, MTU, FDB/VLAN entries, etc.

# echo 2 > /sys/class/net/enp5s0f0/device/sriov_numvfs
# ip l show
297: enp5s0f0:  mtu 1500 qdisc noop portid 6805ca2e7268 state DOWN mode DEFAULT group default qlen 1000
link/ether 68:05:ca:2e:72:68 brd ff:ff:ff:ff:ff:ff
vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
299: enp5s0f0-vf0:  mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
300: enp5s0f0-vf1:  mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff

Signed-off-by: Sridhar Samudrala 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 88 ++
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h | 14 
 2 files changed, 102 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index da34235..11f6970 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -1003,6 +1003,90 @@ complete_reset:
clear_bit(__I40E_VF_DISABLE, &pf->state);
 }
 
+static int i40e_vf_netdev_open(struct net_device *dev)
+{
+   return 0;
+}
+
+static int i40e_vf_netdev_stop(struct net_device *dev)
+{
+   return 0;
+}
+
+static const struct net_device_ops i40e_vf_netdev_ops = {
+   .ndo_open = i40e_vf_netdev_open,
+   .ndo_stop = i40e_vf_netdev_stop,
+};
+
+/**
+ * i40e_alloc_vf_netdev
+ * @vf: pointer to the VF structure
+ * @vf_num: VF number
+ *
+ * Create VF representor/control netdev
+ **/
+int i40e_alloc_vf_netdev(struct i40e_vf *vf, u16 vf_num)
+{
+   struct i40e_pf *pf = vf->pf;
+   struct i40e_vsi *vsi = pf->vsi[pf->lan_vsi];
+   struct i40e_vf_netdev_priv *priv;
+   char netdev_name[IFNAMSIZ];
+   struct net_device *netdev;
+   int err;
+
+   snprintf(netdev_name, IFNAMSIZ, "%s-vf%d", vsi->netdev->name, vf_num);
+   netdev = alloc_netdev(sizeof(struct i40e_vf_netdev_priv), netdev_name,
+ NET_NAME_UNKNOWN, ether_setup);
+   if (!netdev) {
+   dev_err(&pf->pdev->dev, "alloc_netdev failed for vf:%d\n",
+   vf_num);
+   return -ENOMEM;
+   }
+
+   pf->vf[vf_num].ctrl_netdev = netdev;
+
+   priv = netdev_priv(netdev);
+   priv->vf = &(pf->vf[vf_num]);
+
+   netdev->netdev_ops = &i40e_vf_netdev_ops;
+
+   netif_carrier_off(netdev);
+   netif_tx_disable(netdev);
+
+   err = register_netdev(netdev);
+   if (err) {
+   dev_err(&pf->pdev->dev, "register_netdev failed for vf: %s\n",
+   vf->ctrl_netdev->name);
+   free_netdev(netdev);
+   return err;
+   }
+
+   dev_info(&pf->pdev->dev, "VF representor(%s) created for VF %d\n",
+vf->ctrl_netdev->name, vf_num);
+
+   return 0;
+}
+
+/**
+ * i40e_free_vf_netdev
+ * @vf: pointer to the VF structure
+ *
+ * Free VF representor/control netdev
+ **/
+void i40e_free_vf_netdev(struct i40e_vf *vf)
+{
+   struct i40e_pf *pf = vf->pf;
+
+   if (!vf->ctrl_netdev)
+   return;
+
+   dev_info(&pf->pdev->dev, "Freeing VF representor(%s)\n",
+vf->ctrl_netdev->name);
+
+   unregister_netdev(vf->ctrl_netdev);
+   free_netdev(vf->ctrl_netdev);
+}
+
 /**
  * i40e_free_vfs
  * @pf: pointer to the PF structure
@@ -1045,6 +1129,8 @@ void i40e_free_vfs(struct i40e_pf *pf)
i40e_free_vf_res(&pf->vf[i]);
/* disable qp mappings */
i40e_disable_vf_mappings(&pf->vf[i]);
+
+   i40e_free_vf_netdev(&pf->vf[i]);
}
 
kfree(pf->vf);
@@ -1112,6 +1198,8 @@ int i40e_alloc_vfs(struct i40e_pf *pf, u16 num_alloc_vfs)
/* VF resources get allocated during reset */
i40e_reset_vf(&vfs[i], false);
 
+   i40e_alloc_vf_netdev(&vfs[i], i);
+
}
pf->num_alloc_vfs = num_alloc_vfs;
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
index 8751741..1d54b95 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
@@ -72,10 +72,21 @@ enum i40e_vf_capabilities {
I40E_VIRTCHNL_VF_CAP_IWARP,
 };
 
+/* VF Ctrl
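
Two details of the representor lifecycle in this patch are easy to get wrong. The name is built as "<pf>-vf<N>" in an IFNAMSIZ buffer, so a long PF name is silently truncated by snprintf(); and a freed representor pointer left in place could be freed a second time if a teardown path runs twice. The sketch below detects truncation and clears the slot after freeing; `IFNAMSIZ_SKETCH` (16, the kernel's IFNAMSIZ value), `make_rep_name()` and `struct rep_slot` are hypothetical, and clearing the pointer is a choice made in this sketch, not something the patch does.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define IFNAMSIZ_SKETCH 16	/* same value as the kernel's IFNAMSIZ */

/* Build "<pf>-vf<n>"; return 0 on success, -1 if the name was truncated. */
static int make_rep_name(char *buf, const char *pf_name, int vf_num)
{
	int n = snprintf(buf, IFNAMSIZ_SKETCH, "%s-vf%d", pf_name, vf_num);

	return (n < 0 || n >= IFNAMSIZ_SKETCH) ? -1 : 0;
}

/* Hypothetical per-VF slot holding the representor object. */
struct rep_slot {
	char *rep;
};

/* Free the representor and clear the slot so freeing twice is a no-op. */
static void free_rep(struct rep_slot *slot)
{
	if (!slot->rep)
		return;
	free(slot->rep);
	slot->rep = NULL;
}
```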

[net-next 09/15] i40e: avoid potential null pointer dereference when assigning len

2016-09-20 Thread Jeff Kirsher

From: Colin Ian King 

There is a sanity check for desc being NULL in the first line of
function i40evf_debug_aq. However, before that, aq_desc is cast from
desc, and aq_desc is dereferenced in the assignment of len, so this is
a potential NULL pointer dereference. Fix this by moving the
initialization of len into the code block where len is used, at which
point we know it is OK to dereference aq_desc.

Signed-off-by: Colin Ian King 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40evf/i40e_common.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40e_common.c b/drivers/net/ethernet/intel/i40evf/i40e_common.c
index 4db0c03..7953c13 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_common.c
@@ -302,7 +302,6 @@ void i40evf_debug_aq(struct i40e_hw *hw, enum i40e_debug_mask mask, void *desc,
   void *buffer, u16 buf_len)
 {
struct i40e_aq_desc *aq_desc = (struct i40e_aq_desc *)desc;
-   u16 len = le16_to_cpu(aq_desc->datalen);
u8 *buf = (u8 *)buffer;
u16 i = 0;
 
@@ -326,6 +325,8 @@ void i40evf_debug_aq(struct i40e_hw *hw, enum i40e_debug_mask mask, void *desc,
   le32_to_cpu(aq_desc->params.external.addr_low));
 
if ((buffer != NULL) && (aq_desc->datalen != 0)) {
+   u16 len = le16_to_cpu(aq_desc->datalen);
+
i40e_debug(hw, mask, "AQ CMD Buffer:\n");
if (buf_len < len)
len = buf_len;
-- 
2.7.4
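
The rule behind this fix is that nothing may read through desc before the NULL check that guards it. A reduced sketch; `struct aq_desc_sketch` and `bytes_to_dump()` are hypothetical, and the length is kept in host byte order instead of going through le16_to_cpu().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an admin-queue descriptor. */
struct aq_desc_sketch {
	uint16_t datalen;
};

/* Return how many buffer bytes would be dumped; 0 if nothing is dumped. */
static uint16_t bytes_to_dump(const void *desc, const void *buffer,
			      uint16_t buf_len)
{
	const struct aq_desc_sketch *aq_desc = desc;	/* cast only, no read */

	if (!desc)
		return 0;	/* the sanity check precedes every dereference */

	if (buffer != NULL && aq_desc->datalen != 0) {
		uint16_t len = aq_desc->datalen;	/* safe: desc is non-NULL */

		if (buf_len < len)
			len = buf_len;
		return len;
	}
	return 0;
}
```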

[net-next 05/15] i40e: fix "dump port" command when NPAR enabled

2016-09-20 Thread Jeff Kirsher

From: Alan Brady 

When debugfs is used to issue the "dump port" command with NPAR
enabled, the firmware reports an invalid argument error.

The issue occurs because pf->mac_seid was used to perform the query.
This works when NPAR is disabled, because the switch ID equals
pf->mac_seid, but not when NPAR is enabled. This fix instead obtains
the switch ID from the VSI, which gives the correct ID in either case.

Change-ID: I0cd67913a7f2c4a2962e06d39e32e7447cc55b6a
Signed-off-by: Alan Brady 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 05cf9a7..8555f04 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -1054,6 +1054,7 @@ static ssize_t i40e_dbg_command_write(struct file *filp,
struct i40e_dcbx_config *r_cfg =
&pf->hw.remote_dcbx_config;
int i, ret;
+   u32 switch_id;
 
bw_data = kzalloc(sizeof(
struct i40e_aqc_query_port_ets_config_resp),
@@ -1063,8 +1064,12 @@ static ssize_t i40e_dbg_command_write(struct file *filp,
goto command_write_done;
}
 
+   vsi = pf->vsi[pf->lan_vsi];
+   switch_id =
+   vsi->info.switch_id & I40E_AQ_VSI_SW_ID_MASK;
+
ret = i40e_aq_query_port_ets_config(&pf->hw,
-   pf->mac_seid,
+   switch_id,
bw_data, NULL);
if (ret) {
dev_info(&pf->pdev->dev,
-- 
2.7.4
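
A sketch of the masking step, assuming for illustration that the switch ID occupies the low 12 bits of the VSI's switch_id field and that the upper bits carry flags. The mask value below is an assumption, not copied from the i40e admin-queue headers, and `vsi_switch_id()` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for illustration: low 12 bits carry the switch ID and
 * the upper bits carry flags. Not taken from the i40e headers. */
#define SW_ID_MASK_SKETCH 0x0FFFu

/* Hypothetical helper: strip the flag bits from a VSI switch_id field. */
static uint32_t vsi_switch_id(uint16_t switch_id_field)
{
	return switch_id_field & SW_ID_MASK_SKETCH;
}
```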

[net-next 02/15] i40e: Enable VF specific ethtool statistics via VF Port representor netdevs

2016-09-20 Thread Jeff Kirsher

From: Sridhar Samudrala 

Sample script that shows ethtool stats on VF representor netdev
PF: enp5s0f0, VF0: enp5s2  VF_REP0: enp5s0f0-vf0

   # echo 2 > /sys/class/net/enp5s0f0/device/sriov_numvfs
   # ip link set enp5s2 up
   # ethtool -S enp5s0f0-vf0
   NIC statistics:
 tx_bytes: 0
 tx_unicast: 0
 tx_multicast: 0
 tx_broadcast: 0
 tx_discards: 0
 tx_errors: 0
 rx_bytes: 140
 rx_unicast: 0
 rx_multicast: 2
 rx_broadcast: 0
 rx_discards: 0
 rx_unknown_protocol: 0

Signed-off-by: Sridhar Samudrala 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e.h |  1 +
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 72 ++
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |  1 +
 3 files changed, 74 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 19103a6..13b1f75 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -866,4 +866,5 @@ i40e_status i40e_get_npar_bw_setting(struct i40e_pf *pf);
 i40e_status i40e_set_npar_bw_setting(struct i40e_pf *pf);
 i40e_status i40e_commit_npar_bw_setting(struct i40e_pf *pf);
 void i40e_print_link_message(struct i40e_vsi *vsi, bool isup);
+void i40e_set_vf_netdev_ethtool_ops(struct net_device *netdev);
 #endif /* _I40E_H_ */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 1835186..1f3bbb05 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -3116,3 +3116,75 @@ void i40e_set_ethtool_ops(struct net_device *netdev)
 {
netdev->ethtool_ops = &i40e_ethtool_ops;
 }
+
+/* As the VF Port representor(VFPR) represents the switch port corresponding
+ * to a VF, the tx_ and rx_ strings are swapped to indicate that the frames
+ * transmitted from VF are received on VFPR and the frames received on VF are
+ * transmitted from VFPR.
+ */
+static const char i40e_vf_netdev_ethtool_sset[][ETH_GSTRING_LEN] = {
+   "tx_bytes",
+   "tx_unicast",
+   "tx_multicast",
+   "tx_broadcast",
+   "tx_discards",
+   "tx_errors",
+   "rx_bytes",
+   "rx_unicast",
+   "rx_multicast",
+   "rx_broadcast",
+   "rx_discards",
+   "rx_unknown_protocol",
+};
+
+#define I40E_VF_NETDEV_ETHTOOL_STAT_COUNT \
+   ARRAY_SIZE(i40e_vf_netdev_ethtool_sset)
+
+static void i40e_vf_netdev_ethtool_get_strings(struct net_device *dev,
+  u32 stringset,
+  u8 *ethtool_strings)
+{
+   switch (stringset) {
+   case ETH_SS_STATS:
+   memcpy(ethtool_strings, &i40e_vf_netdev_ethtool_sset,
+  sizeof(i40e_vf_netdev_ethtool_sset));
+   break;
+   }
+}
+
+static int i40e_vf_netdev_ethtool_get_sset_count(struct net_device *dev,
+int stringset)
+{
+   switch (stringset) {
+   case ETH_SS_STATS:
+   return I40E_VF_NETDEV_ETHTOOL_STAT_COUNT;
+   default:
+   return -EOPNOTSUPP;
+   }
+}
+
+static void i40e_vf_netdev_ethtool_get_stats(struct net_device *dev,
+   struct ethtool_stats *target_ethtool_stats,
+   u64 *target_stat_values)
+{
+   struct i40e_vf_netdev_priv *priv = netdev_priv(dev);
+   struct i40e_vf *vf = priv->vf;
+   struct i40e_pf *pf = vf->pf;
+   struct i40e_vsi *vsi;
+
+   vsi = pf->vsi[vf->lan_vsi_idx];
+   i40e_update_stats(vsi);
+   memcpy(target_stat_values, &vsi->eth_stats,
+  I40E_VF_NETDEV_ETHTOOL_STAT_COUNT * 8);
+}
+
+static const struct ethtool_ops i40e_vf_netdev_ethtool_ops = {
+   .get_strings = i40e_vf_netdev_ethtool_get_strings,
+   .get_ethtool_stats  = i40e_vf_netdev_ethtool_get_stats,
+   .get_sset_count = i40e_vf_netdev_ethtool_get_sset_count,
+};
+
+void i40e_set_vf_netdev_ethtool_ops(struct net_device *netdev)
+{
+   netdev->ethtool_ops = &i40e_vf_netdev_ethtool_ops;
+}
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 11f6970..cacb797 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -1049,6 +1049,7 @@ int i40e_alloc_vf_netdev(struct i40e_vf *vf, u16 vf_num)
priv->vf = &(pf->vf[vf_num]);
 
netdev->netdev_ops = &i40e_vf_netdev_ops;
+   i40e_set_vf_netdev_ethtool_ops(netdev);
 
netif_carrier_off(netdev);
netif_tx_disable(netdev);
-- 
2.7.4
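
ethtool gathers these statistics with three callbacks that must agree: get_sset_count() returns N, get_strings() fills N names of ETH_GSTRING_LEN (32) bytes each, and get_ethtool_stats() fills N u64 values in the same order. The patch meets that contract by copying the VSI's eth_stats block wholesale, which works only if that structure is exactly N consecutive u64 counters in an order that lines up with the swapped string table. The user-space sketch below shows the same pattern with a compile-time check on that layout assumption; the struct, its four counters and the `rep_*` function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GSTRING_LEN 32	/* same value as the kernel's ETH_GSTRING_LEN */

/* Hypothetical VF-side counters: consecutive u64 fields, rx before tx. */
struct vf_stats_sketch {
	uint64_t rx_bytes;
	uint64_t rx_packets;
	uint64_t tx_bytes;
	uint64_t tx_packets;
};

/* Names as seen from the representor, so rx and tx are swapped. */
static const char rep_stat_names[][GSTRING_LEN] = {
	"tx_bytes", "tx_packets", "rx_bytes", "rx_packets",
};

#define REP_STAT_COUNT (sizeof(rep_stat_names) / sizeof(rep_stat_names[0]))

/* The bulk copy in rep_get_stats() is valid only if this holds. */
_Static_assert(sizeof(struct vf_stats_sketch) ==
	       REP_STAT_COUNT * sizeof(uint64_t),
	       "stats struct must match the string table");

static int rep_get_sset_count(void)
{
	return (int)REP_STAT_COUNT;
}

static void rep_get_strings(uint8_t *out)
{
	memcpy(out, rep_stat_names, sizeof(rep_stat_names));
}

static void rep_get_stats(const struct vf_stats_sketch *s, uint64_t *out)
{
	memcpy(out, s, REP_STAT_COUNT * sizeof(uint64_t));
}
```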

[net-next 08/15] i40e: Fix for extra byte swap in tunnel setup

2016-09-20 Thread Jeff Kirsher

From: Carolyn Wyborny 

This patch fixes an issue where the port parameter was byte swapped
before the call and then byte swapped again inside the called function.
The first swap is unnecessary, so remove it from the function calls.
Without this patch, the UDP-based tunnel configuration would not be
correct.

Change-ID: I788d83c5bd5732170f1a81dbfa0b1ac3ca8ea5b7
Signed-off-by: Carolyn Wyborny 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 45b9c67..3fdaf36 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -7154,9 +7154,9 @@ static void i40e_sync_udp_filters_subtask(struct i40e_pf *pf)
pf->pending_udp_bitmap &= ~BIT_ULL(i);
port = pf->udp_ports[i].index;
if (port)
-   ret = i40e_aq_add_udp_tunnel(hw, ntohs(port),
-pf->udp_ports[i].type,
-NULL, NULL);
+   ret = i40e_aq_add_udp_tunnel(hw, port,
+   pf->udp_ports[i].type,
+   NULL, NULL);
else
ret = i40e_aq_del_udp_tunnel(hw, i, NULL);
 
-- 
2.7.4
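
Whether a value needs swapping depends on which byte order it is already in, so each conversion should happen exactly once, in one place. An extra swap on a 16-bit port is not harmless: it turns the VXLAN port 4789 (0x12B5) into 46354 (0xB512), and only a second swap would undo it. On a little-endian host ntohs() performs exactly such a swap. The sketch uses a hypothetical `swap16()` that always swaps, so its result does not depend on the host's byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: always swaps the two bytes of a 16-bit value, so
 * results do not depend on host byte order (unlike ntohs()/htons()). */
static uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}
```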

[net-next 12/15] i40e: Sync link state between VFs and VF Port representors(VFPR)

2016-09-20 Thread Jeff Kirsher

From: Sridhar Samudrala 

This patch enables the following:
- The link state of a VFPR reflects the admin state of its VF, and the
  link state of a VF reflects the admin state of its VFPR.
- Bringing a VFPR up/down sends a notification to update the VF link state.
- Bringing a VF up/down updates the link state of its VFPR.
- Enabling/disabling the VF link state via ndo_set_vf_link_state updates
  the admin state of the associated VFPR.
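
A simplified model of the representor-to-VF direction listed above: bringing the VFPR up or down sets link_forced and link_up on the VF record and notifies the VF, which then reports the forced state instead of the physical link. The struct and helpers are hypothetical, the notification is reduced to a function that computes what the VF would show, and the sketch assumes the notify path reports vf->link_up whenever link_forced is set, as the ndo_open/ndo_stop hunks in this patch suggest.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical PF-side record for one VF. */
struct vf_rec {
	bool link_forced;	/* PF overrides the physical link state */
	bool link_up;		/* state reported to the VF when forced */
	bool vf_carrier;	/* what the VF ends up showing */
};

/* Stand-in for notifying the VF: it adopts the forced state if set. */
static void notify_vf_link_state(struct vf_rec *vf, bool phys_link_up)
{
	vf->vf_carrier = vf->link_forced ? vf->link_up : phys_link_up;
}

/* Representor open/stop: force the VF link up or down, then notify. */
static void rep_set_admin(struct vf_rec *vf, bool up, bool phys_link_up)
{
	vf->link_forced = true;
	vf->link_up = up;
	notify_vf_link_state(vf, phys_link_up);
}
```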

PF: enp5s0f0, VFs: enp5s2, enp5s2f1, VFPRs: enp5s0f0-vf0, enp5s0f0-vf1
# modprobe i40e
# echo 2 > /sys/class/net/enp5s0f0/device/sriov_numvfs

# ip link set enp5s2 up
# ip link set enp5s0f0-vf0 up
# ip link set enp5s0f0-vf1 up

# ip link show enp5s0f0-vf0
215: enp5s0f0-vf0:  mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff

/* enp5s0f0-vf0 UP -> enp5s2 CARRIER ON */
# ip link show enp5s2
218: enp5s2:  mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether ea:4d:60:bc:6f:85 brd ff:ff:ff:ff:ff:ff

/* enp5s2f1 DOWN -> enp5s0f0-vf1 NO-CARRIER */
# ip link show enp5s0f0-vf1
216: enp5s0f0-vf1:  mtu 1500 qdisc fq_codel 
state DOWN mode DEFAULT group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff

# ip link set enp5s0f0-vf0 down
# ip link set enp5s2f1 up

/* enp5s2f1 UP -> enp5s0f0-vf1 CARRIER ON */
# ip link show enp5s0f0-vf1
216: enp5s0f0-vf1:  mtu 1500 qdisc fq_codel 
state UP mode DEFAULT group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff

/* enp5s0f0-vf0 DOWN -> enp5s2 NO-CARRIER */
# ip link show enp5s2
218: enp5s2:  mtu 1500 qdisc mq state DOWN 
mode DEFAULT group default qlen 1000
link/ether ea:4d:60:bc:6f:85 brd ff:ff:ff:ff:ff:ff

# ip -d link show enp5s0f0
213: enp5s0f0:  mtu 1500 qdisc noop portid 6805ca2e7268 
state DOWN mode DEFAULT group default qlen 1000
link/ether 68:05:ca:2e:72:68 brd ff:ff:ff:ff:ff:ff promiscuity 0 
addrgenmode eui64
vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state disable
vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state enable

Signed-off-by: Sridhar Samudrala 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 33 ++
 1 file changed, 33 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c 
b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index b90abd3..795a294 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -1013,11 +1013,25 @@ complete_reset:
 
 static int i40e_vf_netdev_open(struct net_device *dev)
 {
+   struct i40e_vf_netdev_priv *priv = netdev_priv(dev);
+   struct i40e_vf *vf = priv->vf;
+
+   vf->link_forced = true;
+   vf->link_up = true;
+   i40e_vc_notify_vf_link_state(vf);
+
return 0;
 }
 
 static int i40e_vf_netdev_stop(struct net_device *dev)
 {
+   struct i40e_vf_netdev_priv *priv = netdev_priv(dev);
+   struct i40e_vf *vf = priv->vf;
+
+   vf->link_forced = true;
+   vf->link_up = false;
+   i40e_vc_notify_vf_link_state(vf);
+
return 0;
 }
 
@@ -1907,6 +1921,10 @@ static int i40e_vc_enable_queues_msg(struct i40e_vf *vf, 
u8 *msg, u16 msglen)
 
if (i40e_vsi_control_rings(pf->vsi[vf->lan_vsi_idx], true))
aq_ret = I40E_ERR_TIMEOUT;
+
+   if ((0 == aq_ret) && vf->ctrl_netdev)
+   netif_carrier_on(vf->ctrl_netdev);
+
 error_param:
/* send the response to the VF */
return i40e_vc_send_resp_to_vf(vf, I40E_VIRTCHNL_OP_ENABLE_QUEUES,
@@ -1947,6 +1965,9 @@ static int i40e_vc_disable_queues_msg(struct i40e_vf *vf, 
u8 *msg, u16 msglen)
if (i40e_vsi_control_rings(pf->vsi[vf->lan_vsi_idx], false))
aq_ret = I40E_ERR_TIMEOUT;
 
+   if ((0 == aq_ret) && vf->ctrl_netdev)
+   netif_carrier_off(vf->ctrl_netdev);
+
 error_param:
/* send the response to the VF */
return i40e_vc_send_resp_to_vf(vf, I40E_VIRTCHNL_OP_DISABLE_QUEUES,
@@ -3179,6 +3200,7 @@ int i40e_ndo_set_vf_link_state(struct net_device *netdev, 
int vf_id, int link)
struct i40e_virtchnl_pf_event pfe;
struct i40e_hw *hw = &pf->hw;
struct i40e_vf *vf;
+   struct net_device *vf_netdev;
int abs_vf_id;
int ret = 0;
 
@@ -3219,6 +3241,17 @@ int i40e_ndo_set_vf_link_state(struct net_device 
*netdev, int vf_id, int link)
ret = -EINVAL;
goto error_out;
}
+
+   vf_netdev = vf->ctrl_netdev;
+   if (vf_netdev) {
+   unsigned int flags =

[net-next 00/15][pull request] 40GbE Intel Wired LAN Driver Updates 2016-09-20

2016-09-20 Thread Jeff Kirsher

This series contains updates to i40e and i40evf only.

Sridhar enables creation of a VF port representor/control netdev
associated with each VF, which allows controlling and configuring VFs
from the PF's namespace.  He then enables the VF-specific ethtool
statistics via the VF port representor, adds initial devlink support to
set/get the mode of an SRIOV switch, and fixes link state event handling
by updating the carrier and starting/stopping the Tx queues based on the
link state notification from the PF.

Alan fixes an issue where a user-defined RSS hash key was not being
set when it was changed without also supplying a user-defined
indirection table; now, if no indirection table is supplied, a default
one is created and the hash key is set correctly.  He also fixes an
issue where, when NPAR was enabled, we were still using pf->mac_seid to
perform the dump port query; instead, we now go through the VSI to
determine the correct ID to use in either case.
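A minimal sketch of the RSS fallback described above, assuming a hypothetical table size and key length (the real i40e sizes and helpers may differ). When no indirection table is supplied, queues are assigned round-robin, the same policy as the kernel's ethtool_rxfh_indir_default() helper, and the user's key is still applied.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RSS_KEY_SIZE 52		/* hypothetical key length for this sketch */
#define RSS_LUT_SIZE 64		/* hypothetical table size for this sketch */

struct rss_cfg {
	uint8_t  key[RSS_KEY_SIZE];
	uint32_t lut[RSS_LUT_SIZE];	/* indirection table: entry -> queue */
};

/* Spread table entries round-robin over the active queues. */
static void fill_default_lut(uint32_t *lut, size_t n, uint32_t num_queues)
{
	for (size_t i = 0; i < n; i++)
		lut[i] = (uint32_t)(i % num_queues);
}

/* user_key and user_lut may each be NULL if the user did not supply them. */
static int apply_rss(struct rss_cfg *cfg, const uint8_t *user_key,
		     const uint32_t *user_lut, uint32_t num_queues)
{
	if (num_queues == 0)
		return -1;
	if (user_lut)
		memcpy(cfg->lut, user_lut, sizeof(cfg->lut));
	else
		fill_default_lut(cfg->lut, RSS_LUT_SIZE, num_queues);
	if (user_key)
		memcpy(cfg->key, user_key, sizeof(cfg->key));	/* key still applied */
	return 0;
}
```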

Mitch provides one fix where a conditional return code was reversed, so
he does a "switcheroo" to fix the issue.

Carolyn has two fixes: the first fixes an issue in the virt channel code
where a return value was not checked for NULL when applicable; the
second fixes an issue where we were byte swapping the port parameter,
then byte swapping it again inside the called function.
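Why the extra swap matters: on a little-endian host ntohs() is a byte swap, so swapping in the caller and again in the callee cancels out, and the callee ends up with the untranslated value. A tiny portable sketch (swap16() and the helper names are hypothetical; 4789 is the IANA VXLAN port, used only as sample input):

```c
#include <stdint.h>

/* Portable 16-bit byte swap: what ntohs()/htons() do on a little-endian host. */
static uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Correct: the port is converted exactly once (inside the callee). */
static uint16_t port_converted_once(uint16_t host_port)
{
	return swap16(host_port);
}

/* Buggy: the caller converts, then the callee converts again. */
static uint16_t port_converted_twice(uint16_t host_port)
{
	return swap16(swap16(host_port));
}
```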

Colin Ian King fixes a potential NULL pointer dereference.

Amritha adds support for switchdev ops on the VF port representors and
the PF uplink.

Bimmy changes i40evf_up_complete() to return void since it always returns
success anyway, which allows removing the code that checked its return
value.

Alex fixes an issue where the driver incorrectly assumed that we would
never pull more than 1 descriptor from each fragment.  Since we can
span 2 descriptors in the block before us, the fix tests all the way to
the end of the fragments, guaranteeing that even the last 6 descriptors
have enough data to fill a full frame.
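A simplified model of the check described above, under stated assumptions: each descriptor carries at most 16K bytes, and every window of 6 consecutive descriptors, including the last one, must carry at least one MSS. The constants and function names are hypothetical; this is not the driver's actual linearize check.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DESC_DATA 16384u	/* assumed data limit per descriptor (16K) */
#define DESC_WINDOW   6		/* assumed window that must hold a full MSS */
#define MAX_DESCS     64	/* scratch space for this sketch */

/* Split fragments into per-descriptor chunks: a fragment larger than
 * MAX_DESC_DATA occupies several descriptors. Returns 0 if out of room. */
static size_t frags_to_descs(const unsigned *frags, size_t nfrags,
			     unsigned *descs, size_t max)
{
	size_t n = 0;

	for (size_t i = 0; i < nfrags; i++) {
		unsigned left = frags[i];

		do {
			unsigned chunk = left > MAX_DESC_DATA ? MAX_DESC_DATA : left;

			if (n == max)
				return 0;
			descs[n++] = chunk;
			left -= chunk;
		} while (left);
	}
	return n;
}

/* True if some window of DESC_WINDOW consecutive descriptors, checked all
 * the way to the last descriptor, carries less than one MSS. */
static bool needs_linearize(const unsigned *frags, size_t nfrags, unsigned mss)
{
	unsigned descs[MAX_DESCS];
	unsigned long sum = 0;
	size_t n;

	if (nfrags == 0)
		return false;
	n = frags_to_descs(frags, nfrags, descs, MAX_DESCS);
	if (n == 0)
		return true;		/* too many descriptors: linearize */
	if (n <= DESC_WINDOW)
		return false;		/* short chains fit the assumed limit */

	for (size_t i = 0; i < n; i++) {
		sum += descs[i];
		if (i >= DESC_WINDOW)
			sum -= descs[i - DESC_WINDOW];	/* slide the window */
		if (i >= DESC_WINDOW - 1 && sum < mss)
			return true;
	}
	return false;
}
```

The third case in the usage below is the one a check that stops early would miss: a 40000-byte fragment spans 3 descriptors, and only the final window of small fragments falls short of an MSS.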

The following are changes since commit 5737f6c92681939e417579b421f81f035e57c582:
  mlx4: add missed recycle opportunity for XDP_TX on TX failure
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE

Alan Brady (2):
  i40e: fix setting user defined RSS hash key
  i40e: fix "dump port" command when NPAR enabled

Alexander Duyck (1):
  i40e: Limit TX descriptor count in cases where frag size is greater
than 16K

Amritha Nambiar (1):
  i40e: Add support for switchdev API for Switch ID

Bimmy Pujari (1):
  i40evf: remove unnecessary error checking against i40evf_up_complete

Carolyn Wyborny (2):
  i40e: Fix to check for NULL
  i40e: Fix for extra byte swap in tunnel setup

Colin Ian King (1):
  i40e: avoid potential null pointer dereference when assigning len

Lihong Yang (1):
  i40evf: remove unnecessary error checking against i40e_shutdown_adminq

Mitch Williams (1):
  i40e: return correct opcode to VF

Sridhar Samudrala (5):
  i40e: Introduce VF port representor/control netdevs
  i40e: Enable VF specific ethtool statistics via VF Port representor
netdevs
  i40e: Introduce devlink interface
  i40evf: Fix link state event handling
  i40e: Sync link state between VFs and VF Port representors(VFPR)

 drivers/net/ethernet/intel/Kconfig |   1 +
 drivers/net/ethernet/intel/i40e/i40e.h |   7 +
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c |   7 +-
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |  84 -
 drivers/net/ethernet/intel/i40e/i40e_main.c| 120 +++--
 drivers/net/ethernet/intel/i40e/i40e_txrx.c|   7 +-
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 196 -
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h |  16 ++
 drivers/net/ethernet/intel/i40evf/i40e_common.c|   3 +-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c  |   7 +-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c|  18 +-
 .../net/ethernet/intel/i40evf/i40evf_virtchnl.c|  10 +-
 12 files changed, 428 insertions(+), 48 deletions(-)

-- 
2.7.4

Re: [PATCH net-next 0/3] BPF direct packet access improvements

2016-09-20 Thread David Miller

From: Daniel Borkmann 
Date: Tue, 20 Sep 2016 00:26:11 +0200

> This set adds write support to the currently available read support
> for {cls,act}_bpf programs. First one is a fix for affected commit
> sitting in net-next and prerequisite for the second one, last patch
> adds a number of test cases against the verifier. For details, please
> see individual patches.

Series applied.

Re: [PATCH v5 net-next 1/1] net sched actions: fix GETing actions

2016-09-20 Thread David Miller

From: Jamal Hadi Salim 
Date: Mon, 19 Sep 2016 19:02:51 -0400

> From: Jamal Hadi Salim 
> 
> With the batch changes that translated transient actions into
> a temporary list, what got lost in the translation was the fact that
> tcf_action_destroy() will eventually delete the action from
> the permanent location if the refcount is zero.
> 
> Example of what broke:
> ...add a gact action to drop
> sudo $TC actions add action drop index 10
> ...now retrieve it, looks good
> sudo $TC actions get action gact index 10
> ...retrieve it again and find it is gone!
> sudo $TC actions get action gact index 10
> 
> Fixes: 22dc13c837c3 ("net_sched: convert tcf_exts from list to pointer array")
> Fixes: 824a7e8863b3 ("net_sched: remove an unnecessary list_del()")
> Fixes: f07fed82ad79 ("net_sched: remove the leftover cleanup_a()")
> 
> Acked-by: Cong Wang 
> Signed-off-by: Jamal Hadi Salim 

Applied.
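Jamal's example is the classic symptom of a refcount imbalance: a read-only GET path ends up dropping the reference that keeps the action in its table, so the first GET succeeds and the second finds nothing. A heavily simplified model (hypothetical names, not the net/sched code):

```c
#include <stdbool.h>
#include <stdlib.h>

#define TABLE_SIZE 16

/* Hypothetical, heavily simplified action kept in a per-index table. */
struct action {
	int index;
	int refcnt;	/* the table itself holds one reference */
};

static struct action *table[TABLE_SIZE];

static struct action *action_add(int index)
{
	struct action *a;

	if (index < 0 || index >= TABLE_SIZE || table[index])
		return NULL;
	a = malloc(sizeof(*a));
	if (!a)
		return NULL;
	a->index = index;
	a->refcnt = 1;
	table[index] = a;
	return a;
}

/* Drop a reference; the last one removes the action from the table. */
static void action_put(struct action *a)
{
	if (--a->refcnt == 0) {
		table[a->index] = NULL;
		free(a);
	}
}

static struct action *action_lookup(int index)
{
	return (index >= 0 && index < TABLE_SIZE) ? table[index] : NULL;
}

/* Broken GET: releases a reference it never took. */
static bool action_get_broken(int index)
{
	struct action *a = action_lookup(index);

	if (!a)
		return false;
	action_put(a);
	return true;
}

/* Fixed GET: hold a reference while dumping, then release only that one. */
static bool action_get_fixed(int index)
{
	struct action *a = action_lookup(index);

	if (!a)
		return false;
	a->refcnt++;
	/* ... dump the action to user space here ... */
	action_put(a);
	return true;
}
```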

[PATCH net-next] MAINTAINERS: Update b44 maintainer.

2016-09-20 Thread Michael Chan

Taking over as maintainer since Gary Zambrano is no longer working
for Broadcom.

Signed-off-by: Michael Chan 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index ce80b36..7626f7836 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2509,7 +2509,7 @@ S:Supported
 F: kernel/bpf/
 
 BROADCOM B44 10/100 ETHERNET DRIVER
-M: Gary Zambrano 
+M: Michael Chan 
 L: netdev@vger.kernel.org
 S: Supported
 F: drivers/net/ethernet/broadcom/b44.*
-- 
1.8.3.1

Re: [PATCH net-next 5/7] rhashtable: abstract out function to get hash

2016-09-20 Thread Tom Herbert

On Tue, Sep 20, 2016 at 7:46 PM, Herbert Xu  wrote:
> On Tue, Sep 20, 2016 at 07:58:03PM +0200, Thomas Graf wrote:
>>
>> I understand this particular patch as an effort not to duplicate
>> hash function selection such as jhash vs jhash2 based on key_len.
>
> If the rhashtable params stay non-const as is then this is going
> to produce some monstrous code which will be worse than using
> jhash unconditionally.
>
I will look at keeping params constant.

Tom

> If the rhashtable params are made const then you'll already know
> whether jhash or jhash2 is used.
>
> Cheers,
> --
> Email: Herbert Xu 
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH v4 net-next 16/16] tcp_bbr: add BBR congestion control

2016-09-20 Thread Neal Cardwell

On Tue, Sep 20, 2016 at 2:50 PM, Neal Cardwell  wrote:
> On Tue, Sep 20, 2016 at 2:48 PM, Stephen Hemminger
>  wrote:
>>
>> On Mon, 19 Sep 2016 23:39:23 -0400
>> Neal Cardwell  wrote:
>>
>> > +/* INET_DIAG_BBRINFO */
>> > +
>> > +struct tcp_bbr_info {
>> > + /* u64 bw: max-filtered BW (app throughput) estimate in Byte per sec: */
>> > + __u32   bbr_bw_lo;  /* lower 32 bits of bw */
>> > + __u32   bbr_bw_hi;  /* upper 32 bits of bw */
>> > + __u32   bbr_min_rtt;/* min-filtered RTT in uSec */
>> > + __u32   bbr_pacing_gain;/* pacing gain shifted left 8 bits */
>> > + __u32   bbr_cwnd_gain;  /* cwnd gain shifted left 8 bits */
>> > +};
>> > +
>>
>> I assume there is a change to iproute (ss) to dump this info?
>
> Yes, we have a patch for iproute2 (inet_diag.h and ss.c), which we've
> been using. We'll send that out ASAP.

Here are the patches with proposed iproute2 support to dump this info:

http://patchwork.ozlabs.org/patch/672538/
http://patchwork.ozlabs.org/patch/672539/
http://patchwork.ozlabs.org/patch/672540/

thanks,
neal

Re: [PATCH] tcp: fix wrong checksum calculation on MTU probing

2016-09-20 Thread David Miller


This patch is whitespace damaged by your email client.

Please fix this, email the patch to yourself, and only resubmit this
when you can successfully apply the patch you emailed to yourself.

Thanks.

Re: [PATCH next] ipvlan: Fix dependency issue

2016-09-20 Thread David Miller

From: Mahesh Bandewar 
Date: Mon, 19 Sep 2016 13:56:29 -0700

> From: Mahesh Bandewar 
> 
> kbuild-build-bot reported that if NETFILTER is not selected, the
> build fails pointing to netfilter symbols.
> 
> Fixes: 4fbae7d83c98 ("ipvlan: Introduce l3s mode")
> 
> Signed-off-by: Mahesh Bandewar 

Applied.

Re: [PATCH net-next 2/2] openvswitch: avoid resetting flow key while installing new flow.

2016-09-20 Thread David Miller

From: Pravin B Shelar 
Date: Mon, 19 Sep 2016 13:51:00 -0700

> Since commit db74a3335e0f6 ("openvswitch: use percpu
> flow stats"), flow alloc resets the flow key, so there is no need
> to reset the flow key again if OVS is using a newly allocated
> flow key.
> 
> Signed-off-by: Pravin B Shelar 

Applied.

Re: [PATCH net-next 1/2] openvswitch: Fix Frame-size larger than 1024 bytes warning.

2016-09-20 Thread David Miller

From: Pravin B Shelar 
Date: Mon, 19 Sep 2016 13:50:59 -0700

> There is no need to declare separate key on stack,
> we can just use sw_flow->key to store the key directly.
> 
> This commit fixes following warning:
> 
> net/openvswitch/datapath.c: In function ‘ovs_flow_cmd_new’:
> net/openvswitch/datapath.c:1080:1: warning: the frame size of 1040 bytes
> is larger than 1024 bytes [-Wframe-larger-than=]
> 
> Signed-off-by: Pravin B Shelar 

Applied.

Re: pull request: bluetooth-next 2016-09-19

2016-09-20 Thread David Miller

From: Johan Hedberg 
Date: Mon, 19 Sep 2016 22:37:42 +0300

> Here's the main bluetooth-next pull request for the 4.9 kernel.
> 
>  - Added new messages for monitor sockets for better mgmt tracing
>  - Added local name and appearance support in scan response
>  - Added new Qualcomm WCNSS SMD based HCI driver
>  - Minor fixes & cleanup to 802.15.4 code
>  - New USB ID to btusb driver
>  - Added Marvell support to HCI UART driver
>  - Add combined LED trigger for controller power
>  - Other minor fixes here and there
> 
> Please let me know if there are any issues pulling. Thanks.

Pulled, thanks Johan.

Re: [PATCH] 6pack: fix buffer length mishandling

2016-09-20 Thread David Miller

From: Alan 
Date: Mon, 19 Sep 2016 20:15:24 +0100

> Dmitry Vyukov wrote:
>> different runs). Looking at code, the following looks suspicious -- we
>> limit copy by 512 bytes, but use the original count which can be
>> larger than 512:
>>
>> static void sixpack_receive_buf(struct tty_struct *tty,
>> const unsigned char *cp, char *fp, int count)
>> {
>> unsigned char buf[512];
>> 
>> memcpy(buf, cp, count < sizeof(buf) ? count : sizeof(buf));
>> 
>> sixpack_decode(sp, buf, count1);
> 
> With the sane tty locking we now have, I believe the following is safe,
> as we consume the bytes and move them into the decoded buffer before
> returning.
> 
> Signed-off-by: Alan Cox 

Applied to net-next, thanks Alan.
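The bug Dmitry describes is a length mismatch: the memcpy() into the 512-byte stack buffer is clamped, but the decoder is still handed the full count. Alan's note suggests the fix decodes directly from the tty-provided buffer instead, relying on tty locking. A minimal sketch of that shape, assuming this reading, with a hypothetical decode_bytes() standing in for sixpack_decode():

```c
#include <stddef.h>

/* Hypothetical stand-in for sixpack_decode(): folds each byte into a
 * running sum so the example has something observable. */
static unsigned long rx_sum;

static void decode_bytes(const unsigned char *buf, size_t count)
{
	for (size_t i = 0; i < count; i++)
		rx_sum += buf[i];
}

/* No fixed-size bounce buffer: the length handed to the decoder always
 * matches the bytes actually available in cp. */
static void receive_buf(const unsigned char *cp, size_t count)
{
	if (!cp || count == 0)
		return;
	decode_bytes(cp, count);
}
```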

Re: pull-request: can 2016-09-19

2016-09-20 Thread David Miller

From: Marc Kleine-Budde 
Date: Mon, 19 Sep 2016 16:19:48 +0200

> this is a pull request of one patch for the upcoming linux-4.8 release.
> 
> The patch by Fabio Estevam fixes the pm handling in the flexcan driver.

Pulled, thanks.

Re: [iovisor-dev] XDP (eXpress Data Path) documentation

2016-09-20 Thread Alexei Starovoitov

On Tue, Sep 20, 2016 at 11:08:44AM +0200, Jesper Dangaard Brouer via iovisor-dev wrote:
> Hi all,
> 
> As promised, I've started documenting XDP (eXpress Data Path):
> 
>  [1] https://prototype-kernel.readthedocs.io/en/latest/networking/XDP/index.html
> 
> IMHO the documentation has reached a stage where it is useful for the
> XDP project, BUT I request collaboration on improving the documentation
> from all. (Native English speakers are encouraged to send grammar fixes ;-))
> 
> You wouldn't believe it: But this pretty looking documentation actually
> follows the new Kernel documentation format.  It is actually just
> ".rst" text files stored in my github repository under kernel/Documentation [2]
> 
>  [2] https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/Documentation

Thanks so much for doing it. This is great start!
Some minor editing is needed here and there.
To make it official documentation, would you mind preparing a patch for Jon's doc tree?
If you think the doc is too volatile and not suitable for kernel.org,
another alternative is to host it on https://github.com/iovisor
since it's an LF collaborative project, it won't disappear suddenly.
You can be a maintainer of that repo if you like.

Re: [PATCH net-next 5/7] rhashtable: abstract out function to get hash

2016-09-20 Thread Herbert Xu

On Tue, Sep 20, 2016 at 07:58:03PM +0200, Thomas Graf wrote:
> 
> I understand this particular patch as an effort not to duplicate
> hash function selection such as jhash vs jhash2 based on key_len.

If the rhashtable params stay non-const as is then this is going
to produce some monstrous code which will be worse than using
jhash unconditionally.

If the rhashtable params are made const then you'll already know
whether jhash or jhash2 is used.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
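Herbert's point, sketched in C: if the params are a compile-time constant and the hash helper is inlined, the key_len test below can fold away so that only one hash call remains. The two hash functions are FNV-1a-style stand-ins, not jhash/jhash2, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a stand-in for jhash(): hashes bytes. */
static uint32_t hash_bytes(const void *key, size_t len, uint32_t seed)
{
	const uint8_t *p = key;
	uint32_t h = 2166136261u ^ seed;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* FNV-1a-style stand-in for jhash2(): hashes 32-bit words. */
static uint32_t hash_words(const uint32_t *key, size_t nwords, uint32_t seed)
{
	uint32_t h = 2166136261u ^ seed;

	for (size_t i = 0; i < nwords; i++) {
		h ^= key[i];
		h *= 16777619u;
	}
	return h;
}

struct ht_params {
	size_t key_len;
};

/* When params is a compile-time constant, the key_len test can be folded
 * and only one hash call emitted. Word-path keys must be 4-byte aligned. */
static inline uint32_t ht_hash(const struct ht_params params,
			       const void *key, uint32_t seed)
{
	if (params.key_len % sizeof(uint32_t) == 0)
		return hash_words(key, params.key_len / sizeof(uint32_t), seed);
	return hash_bytes(key, params.key_len, seed);
}
```

With non-const params, every lookup pays for the runtime branch instead, which is the "monstrous code" concern above.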

[PATCH iproute2 3/3] ss: output TCP BBR diag information

2016-09-20 Thread Neal Cardwell

Dump useful TCP BBR state information from a struct tcp_bbr_info that
was grabbed using the inet_diag API.

We tolerate info that is shorter or longer than expected, in case the
kernel is older or newer than the ss binary. We simply print the
minimum of what is expected from the kernel and what is provided from
the kernel. We use the same trick as that used for struct tcp_info:
when the info from the kernel is shorter than we hoped, we pad the end
with zeroes, and don't print fields if they are zero.
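The tolerance trick described above (copy the smaller of the two sizes into a zeroed struct) can be sketched on its own. This mirrors the calloc()/memcpy() logic in the patch, using <stdint.h> types instead of __u32 so it builds outside the kernel headers; copy_bbr_info() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same layout as struct tcp_bbr_info in inet_diag.h, with uint32_t for __u32. */
struct tcp_bbr_info {
	uint32_t bbr_bw_lo;
	uint32_t bbr_bw_hi;
	uint32_t bbr_min_rtt;
	uint32_t bbr_pacing_gain;
	uint32_t bbr_cwnd_gain;
};

/* Copy min(payload_len, sizeof) bytes into a zeroed struct: fields an
 * older kernel did not send stay 0, extra bytes from a newer kernel are
 * ignored. Caller frees the result. */
static struct tcp_bbr_info *copy_bbr_info(const void *payload, size_t payload_len)
{
	struct tcp_bbr_info *info = calloc(1, sizeof(*info));
	size_t len = payload_len < sizeof(*info) ? payload_len : sizeof(*info);

	if (info && payload)
		memcpy(info, payload, len);
	return info;
}
```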

The BBR output looks like:
  bbr:(bw:1.2Mbps,mrtt:18.965,pacing_gain:2.88672,cwnd_gain:2.88672)

The motivation here is to be consistent with DCTCP, which looks like:
  dctcp:(ce_state:23,alpha:23,ab_ecn:23,ab_tot:23)
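How the fields of struct tcp_bbr_info map to the BBR output line: bw is rebuilt from two 32-bit halves and converted from bytes/s to bits/s, min_rtt goes from microseconds to milliseconds, and the gains are divided by 256 (they are shifted left 8 bits). A hypothetical formatter sketch; ss itself uses its own sprint_bw() helper, and format_bw() below only approximates it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct tcp_bbr_info {
	uint32_t bbr_bw_lo;		/* lower 32 bits of bw (bytes/s) */
	uint32_t bbr_bw_hi;		/* upper 32 bits of bw */
	uint32_t bbr_min_rtt;		/* usec */
	uint32_t bbr_pacing_gain;	/* gain << 8 */
	uint32_t bbr_cwnd_gain;		/* gain << 8 */
};

/* Rough stand-in for ss's sprint_bw(): pick a unit, print with %g. */
static void format_bw(char *buf, size_t size, double bps)
{
	if (bps >= 1e9)
		snprintf(buf, size, "%gGbps", bps / 1e9);
	else if (bps >= 1e6)
		snprintf(buf, size, "%gMbps", bps / 1e6);
	else if (bps >= 1e3)
		snprintf(buf, size, "%gKbps", bps / 1e3);
	else
		snprintf(buf, size, "%gbps", bps);
}

static void format_bbr(char *out, size_t size, const struct tcp_bbr_info *b)
{
	uint64_t bw = ((uint64_t)b->bbr_bw_hi << 32) | b->bbr_bw_lo;
	char bw_str[32];
	int n;

	format_bw(bw_str, sizeof(bw_str), (double)bw * 8.0);	/* bytes -> bits */
	n = snprintf(out, size, "bbr:(bw:%s,mrtt:%g", bw_str,
		     b->bbr_min_rtt / 1000.0);			/* usec -> msec */
	if (b->bbr_pacing_gain && n > 0 && (size_t)n < size)
		n += snprintf(out + n, size - n, ",pacing_gain:%g",
			      b->bbr_pacing_gain / 256.0);
	if (b->bbr_cwnd_gain && n > 0 && (size_t)n < size)
		n += snprintf(out + n, size - n, ",cwnd_gain:%g",
			      b->bbr_cwnd_gain / 256.0);
	if (n > 0 && (size_t)n < size)
		snprintf(out + n, size - n, ")");
}
```

With bw_lo = 150000, min_rtt = 18965 and both gains at 739 (739 / 256 = 2.88671875), this produces the sample BBR line shown in the commit message.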

Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Eric Dumazet 
Signed-off-by: Soheil Hassas Yeganeh 
---
 misc/ss.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/misc/ss.c b/misc/ss.c
index 9c456d4..14fff46 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -784,6 +784,7 @@ struct tcpstat {
bool has_fastopen_opt;
bool has_wscale_opt;
struct dctcpstat *dctcp;
+   struct tcp_bbr_info *bbr_info;
 };
 
 static void sock_state_print(struct sockstat *s, const char *sock_name)
@@ -1727,6 +1728,25 @@ static void tcp_stats_print(struct tcpstat *s)
printf(" dctcp:fallback_mode");
}
 
+   if (s->bbr_info) {
+   __u64 bw;
+
+   bw = s->bbr_info->bbr_bw_hi;
+   bw <<= 32;
+   bw |= s->bbr_info->bbr_bw_lo;
+
+   printf(" bbr:(bw:%sbps,mrtt:%g",
+  sprint_bw(b1, bw * 8.0),
+  (double)s->bbr_info->bbr_min_rtt / 1000.0);
+   if (s->bbr_info->bbr_pacing_gain)
+   printf(",pacing_gain:%g",
+  (double)s->bbr_info->bbr_pacing_gain / 256.0);
+   if (s->bbr_info->bbr_cwnd_gain)
+   printf(",cwnd_gain:%g",
+  (double)s->bbr_info->bbr_cwnd_gain / 256.0);
+   printf(")");
+   }
+
if (s->send_bps)
printf(" send %sbps", sprint_bw(b1, s->send_bps));
if (s->lastsnd)
@@ -2005,6 +2025,16 @@ static void tcp_show_info(const struct nlmsghdr *nlh, 
struct inet_diag_msg *r,
s.dctcp = dctcp;
}
 
+   if (tb[INET_DIAG_BBRINFO]) {
+   const void *bbr_info = RTA_DATA(tb[INET_DIAG_BBRINFO]);
+   int len = min(RTA_PAYLOAD(tb[INET_DIAG_BBRINFO]),
+ sizeof(*s.bbr_info));
+
+   s.bbr_info = calloc(1, sizeof(*s.bbr_info));
+   if (s.bbr_info && bbr_info)
+   memcpy(s.bbr_info, bbr_info, len);
+   }
+
if (rtt > 0 && info->tcpi_snd_mss && info->tcpi_snd_cwnd) {
s.send_bps = (double) info->tcpi_snd_cwnd *
(double)info->tcpi_snd_mss * 800. / rtt;
@@ -2027,6 +2057,7 @@ static void tcp_show_info(const struct nlmsghdr *nlh, 
struct inet_diag_msg *r,
s.min_rtt = (double) info->tcpi_min_rtt / 1000;
tcp_stats_print(&s);
free(s.dctcp);
+   free(s.bbr_info);
}
 }
 
-- 
2.8.0.rc3.226.g39d4020

[PATCH iproute2 1/3] Update inet_diag.h header to pick up INET_DIAG_MARK

2016-09-20 Thread Neal Cardwell

To ease the upcoming addition of BBR-related data to inet_diag.h, add
the declaration of INET_DIAG_MARK. That way the BBR-related patches
only contain BBR-related pieces.

Signed-off-by: Neal Cardwell 
---
 include/linux/inet_diag.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/inet_diag.h b/include/linux/inet_diag.h
index 07e486c..5dac049 100644
--- a/include/linux/inet_diag.h
+++ b/include/linux/inet_diag.h
@@ -116,6 +116,7 @@ enum {
INET_DIAG_LOCALS,
INET_DIAG_PEERS,
INET_DIAG_PAD,
+   INET_DIAG_MARK,
__INET_DIAG_MAX,
 };
 
-- 
2.8.0.rc3.226.g39d4020

[PATCH iproute2 2/3] Update inet_diag.h to include INET_DIAG_BBRINFO and related structs

2016-09-20 Thread Neal Cardwell

Update to include the inet_diag.h changes in:
  "tcp_bbr: add BBR congestion control"

Signed-off-by: Neal Cardwell 
---
 include/linux/inet_diag.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/include/linux/inet_diag.h b/include/linux/inet_diag.h
index 5dac049..529a5a2 100644
--- a/include/linux/inet_diag.h
+++ b/include/linux/inet_diag.h
@@ -117,6 +117,7 @@ enum {
INET_DIAG_PEERS,
INET_DIAG_PAD,
INET_DIAG_MARK,
+   INET_DIAG_BBRINFO,
__INET_DIAG_MAX,
 };
 
@@ -150,8 +151,20 @@ struct tcp_dctcp_info {
__u32   dctcp_ab_tot;
 };
 
+/* INET_DIAG_BBRINFO */
+
+struct tcp_bbr_info {
+   /* u64 bw: max-filtered BW (app throughput) estimate in Byte per sec: */
+   __u32   bbr_bw_lo;  /* lower 32 bits of bw */
+   __u32   bbr_bw_hi;  /* upper 32 bits of bw */
+   __u32   bbr_min_rtt;/* min-filtered RTT in uSec */
+   __u32   bbr_pacing_gain;/* pacing gain shifted left 8 bits */
+   __u32   bbr_cwnd_gain;  /* cwnd gain shifted left 8 bits */
+};
+
 union tcp_cc_info {
struct tcpvegas_info    vegas;
struct tcp_dctcp_info   dctcp;
+   struct tcp_bbr_info bbr;
 };
 #endif /* _INET_DIAG_H_ */
-- 
2.8.0.rc3.226.g39d4020

Re: [PATCH net-next 6/7] net/faraday: Fix phy link irq on Aspeed G5 SoCs

2016-09-20 Thread Joel Stanley

On Wed, Sep 21, 2016 at 12:59 AM, Andrew Lunn  wrote:
> On Tue, Sep 20, 2016 at 10:13:14PM +1000, Benjamin Herrenschmidt wrote:
>> On Tue, 2016-09-20 at 16:00 +0930, Joel Stanley wrote:
>> > On Aspeed SoC with a direct PHY connection (non-NSCI), we receive
>> > continual PHYSTS interrupts:
>> >
>> >  [   20.28] ftgmac100 1e66.ethernet eth0: [ISR] = 0x200: PHYSTS_CHG
>> >  [   20.28] ftgmac100 1e66.ethernet eth0: [ISR] = 0x200: PHYSTS_CHG
>> >  [   20.28] ftgmac100 1e66.ethernet eth0: [ISR] = 0x200: PHYSTS_CHG
>> >  [   20.30] ftgmac100 1e66.ethernet eth0: [ISR] = 0x200: PHYSTS_CHG
>> >
>> > This is because the driver was enabling low-level sensitive interrupt
>> > generation where the systems are wired for high-level. All CPU cycles
>> > are spent servicing this interrupt.
>>
>> If this is a system wiring issue, should it be represented by a DT
>> property ?
>
> Is there a device tree binding document somewhere?
>
> Is it possible just to put ACTIVE_HIGH in the right place in the
> binding?

I wrote "wired for high level" wrt the SoC internals. To be honest I
wondered the same thing but it's hard with only one (non-NSCI) system
to test on.

I had a look at the eval board schematic and it appears that the line
has pull down resistors on it, explaining why the IRQ fires when it's
configured to active low. Other machines re-use the pin as a GPIO.
So yes, I will change this to a dt property in v2. That will mean
dropping 4/7 "net/faraday: Avoid PHYSTS_CHG interrupt" as well.

Cheers,

Joel

[PATCH net] net: get rid of a signed integer overflow in ip_idents_reserve()

2016-09-20 Thread Eric Dumazet

From: Eric Dumazet 

Jiri Pirko reported a UBSAN warning happening in ip_idents_reserve()

[] UBSAN: Undefined behaviour in ./arch/x86/include/asm/atomic.h:156:11
[] signed integer overflow:
[] -2117905507 + -695755206 cannot be represented in type 'int'

Since we do not have uatomic_add_return() yet, use atomic_cmpxchg()
so that the arithmetic can be done using unsigned int.

Fixes: 04ca6973f7c1 ("ip: make IP identifiers less predictable")
Signed-off-by: Eric Dumazet 
Reported-by: Jiri Pirko 
---
David, Jiri, I removed the prandom_u32() stuff in favor of a traditional
loop to meet stable requirements. Thanks !

 net/ipv4/route.c |   10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index b52496fd51075821c39435f50ac62f813967aecc..654a9af201366887652a4e19a6f1261e5e747056 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -476,12 +476,18 @@ u32 ip_idents_reserve(u32 hash, int segs)
atomic_t *p_id = ip_idents + hash % IP_IDENTS_SZ;
u32 old = ACCESS_ONCE(*p_tstamp);
u32 now = (u32)jiffies;
-   u32 delta = 0;
+   u32 new, delta = 0;
 
if (old != now && cmpxchg(p_tstamp, old, now) == old)
delta = prandom_u32_max(now - old);
 
-   return atomic_add_return(segs + delta, p_id) - segs;
+   /* Do not use atomic_add_return() as it makes UBSAN unhappy */
+   do {
+   old = (u32)atomic_read(p_id);
+   new = old + delta + segs;
+   } while (atomic_cmpxchg(p_id, old, new) != old);
+
+   return new - segs;
 }
 EXPORT_SYMBOL(ip_idents_reserve);
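The same compare-and-swap pattern in userspace C11, as a minimal sketch (reserve_ids() is a hypothetical name): every addition is done on uint32_t, where wraparound is well defined, instead of on the signed int arithmetic behind atomic_add_return().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Reserve segs IDs after skipping delta; returns the first reserved ID. */
static uint32_t reserve_ids(_Atomic uint32_t *p_id, uint32_t delta, uint32_t segs)
{
	uint32_t old = atomic_load(p_id);
	uint32_t new_val;

	do {
		new_val = old + delta + segs;	/* unsigned: wraps, no UB */
		/* on failure, old is refreshed with the current value */
	} while (!atomic_compare_exchange_weak(p_id, &old, new_val));

	return new_val - segs;
}
```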

Re: [PATCHv3 net-next 1/2] net: dsa: mv88e6xxx: Add helper for accessing port registers

2016-09-20 Thread Vivien Didelot

Hi Andrew,

Andrew Lunn  writes:

> There is a device coming soon which places its port registers
> somewhere different to all other Marvell switches supported so far.
> Add helper functions for reading/writing port registers, making it
> easier to handle this new device.
>
> Signed-off-by: Andrew Lunn 

Reviewed-by: Vivien Didelot 

Thanks!

Vivien
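A minimal sketch of the idea behind such a helper, with hypothetical names and a fake register file: callers address registers by (port, reg), and only the helper knows where a given chip's port registers start, so a device that places them elsewhere needs only a different base.

```c
#include <stdint.h>

#define NUM_ADDRS 32	/* device addresses in this sketch */
#define NUM_REGS  32	/* registers per address */

/* Hypothetical chip description: only port_base_addr differs between families. */
struct chip {
	int port_base_addr;			/* address of port 0's registers */
	uint16_t regs[NUM_ADDRS][NUM_REGS];	/* fake register file */
};

static int port_addr(const struct chip *chip, int port, int reg)
{
	int addr = chip->port_base_addr + port;

	if (port < 0 || addr < 0 || addr >= NUM_ADDRS || reg < 0 || reg >= NUM_REGS)
		return -1;
	return addr;
}

static int port_read(const struct chip *chip, int port, int reg, uint16_t *val)
{
	int addr = port_addr(chip, port, reg);

	if (addr < 0)
		return -1;
	*val = chip->regs[addr][reg];
	return 0;
}

static int port_write(struct chip *chip, int port, int reg, uint16_t val)
{
	int addr = port_addr(chip, port, reg);

	if (addr < 0)
		return -1;
	chip->regs[addr][reg] = val;
	return 0;
}
```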

Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-20 Thread Alexei Starovoitov


On 9/20/16 4:59 PM, Tom Herbert wrote:

I am looking at using this for ILA router. The problem I am hitting is
that not all packets that we need to translate go through the XDP
path. Some would go through the kernel path, some through XDP path but
that would mean I need parallel lookup tables to be maintained for the
two paths which won't scale. ILA translation is so trivial and not
really something that we need to be user programmable, the fast path
is really for accelerating an existing kernel capability. If I can
reuse the kernel code already written and the existing kernel data
structures to make a fast path in XDP there is a lot of value in that
for me.


sounds like you want to add a hard coded ILA rewriter to the driver
instead of doing it as a BPF program?!
That is a 180 degree turn vs the whole protocol ossification tune
that I thought you strongly believe in.

What kernel data structures do you want to reuse?
The ILA rewriter needs a single hash lookup. Several different
types of hash maps already exist on the bpf side, and
even more are coming that will be usable by both the tc and xdp side.
csum adjustment? We have helpers for it in tc. Not for xdp yet,
but it's trivial to allow them on the xdp side too.
Maybe we should talk about the real motivation for the patches
and see what the best solution is.

Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-20 Thread Alexei Starovoitov

On 9/20/16 3:00 PM, Tom Herbert wrote:

+static inline int __xdp_hook_run(struct list_head *list_head,
+struct xdp_buff *xdp)
+{
+   struct xdp_hook_ops *elem;
+   int ret = XDP_PASS;
+
+   list_for_each_entry(elem, list_head, list) {
+   ret = elem->hook(elem->priv, xdp);
+   if (ret != XDP_PASS)
+   break;
+   }
+
+   return ret;
+}
+
+/* Run the XDP hooks for a napi device. Called from a driver's receive
+ * routine
+ */
+static inline int xdp_hook_run(struct napi_struct *napi, struct xdp_buff *xdp)
+{
+   struct net_device *dev = napi->dev;
+   int ret = XDP_PASS;
+
+   if (static_branch_unlikely(_hooks_needed)) {
+   /* Run hooks in napi first */
+   ret = __xdp_hook_run(&napi->xdp_hook_list, xdp);
+   if (ret != XDP_PASS)
+   return ret;
+
+   /* Now run device hooks */
+   ret = __xdp_hook_run(&dev->xdp_hook_list, xdp);
+   if (ret != XDP_PASS)
+   return ret;
+   }
+
+   return ret;
+}

it's an interesting idea to move the prog pointer into the napi struct,
but certainly not at such a huge cost.
Right now it's 1 load + 1 cmp + 1 indirect jump per packet
to invoke the program; with the above approach it becomes
6 loads + 3 cmp (just to get through the run_needed_check() check)
+ 6 loads + 3 cmp + 2 indirect jumps.
(I may be off by a few loads either way.)
That is a non-starter.
When we were optimizing the receive path of the tc clsact ingress hook,
we saw 1Mpps gained for every load+cmp+indirect jump removed.

We're working on inlining bpf_map_lookup to save one
indirect call per lookup, so we cannot just waste them here.

We need to save cycles instead, especially when it doesn't
really solve your goals. It seems the goals are:

>  - Allows alternative users of the XDP hooks other than the original
>    BPF

this should be achieved by their own hooks while reusing
return codes XDP_TX, XDP_PASS to keep driver side the same.
I'm not against other packet processing engines, but not
at the cost of lower performance.

>  - Allows a means to pipeline XDP programs together

this can be done already via bpf_tail_call. No changes needed.

>  - Reduces the amount of code and complexity needed in drivers to
>manage XDP

hmm:
534 insertions(+), 144 deletions(-)
looks like an increase in complexity instead.

>  - Provides a more structured environment that is extensible to new
>features while being mostly transparent to the drivers

don't see that in these patches either.
Things like packet size change (that we're working on) still
have to be implemented for every driver.
Existing XDP_TX, XDP_DROP have to be implemented per driver as well.

Also introduction of xdp.h breaks existing UAPI.
That's not acceptable either.

Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-20 Thread Tom Herbert

On Tue, Sep 20, 2016 at 4:43 PM, Thomas Graf  wrote:
> On 09/20/16 at 04:18pm, Tom Herbert wrote:
>> This allows other use cases than BPF inserting code into the data
>> path. This potentially gives XDP more utility and more users so that we
>> can motivate more driver implementations. For instance, I think it's
>> totally reasonable if the nftables guys want to insert some of their
>> rules to perform early DDOS drop to get the same performance that we
>> see in XDP.
>
> Reasonable point with nftables but are any of these users on the table
> already and ready to consume non-skbs? It would be a pity to add this
> complexity and cost if it is never used.
>
Well, we need to measure to ascertain the cost. As for complexity, this
actually reduces complexity needed for XDP in the drivers which is a
good thing because that's where most of the support and development
pain will be.

I am looking at using this for ILA router. The problem I am hitting is
that not all packets that we need to translate go through the XDP
path. Some would go through the kernel path, some through XDP path but
that would mean I need parallel lookup tables to be maintained for the
two paths which won't scale. ILA translation is so trivial and not
really something that we need to be user programmable, the fast path
is really for accelerating an existing kernel capability. If I can
reuse the kernel code already written and the existing kernel data
structures to make a fast path in XDP there is a lot of value in that
for me.

> I don't see how we can ensure performance if we have multiple
> subsystems register for the hook each adding their own parsers which
> need to be passed through sequentially. Maybe I'm missing something.

We can optimize for allowing only one hook, or maybe limit to only
allowing one hook to be set. In any case this obviously requires a lot
of performance evaluation; I am hoping to get feedback on the design
first. My question about using a linear list for this was a real one: do
you know a better method offhand to implement a call list?

Thanks,
Tom

Re: [PATCH v4 net-next 16/16] tcp_bbr: add BBR congestion control

2016-09-20 Thread Eric Dumazet

On Tue, Sep 20, 2016 at 4:42 PM, Neal Cardwell  wrote:
>
> On Tue, Sep 20, 2016 at 7:39 PM, Stephen Hemminger
>  wrote:
> >
> >> NOTE: BBR *must* be used with the fq qdisc ("man tc-fq") with pacing
> >> enabled, since pacing is integral to the BBR design and
> >> implementation. BBR without pacing would not function properly, and
> >> may incur unnecessary high packet loss rates.
> >
> > Does it work with fq_codel?
>
> Good question. Since fq_codel does not (currently) implement pacing,
> it would not be sufficient to get the required behavior.
>
> neal


fq_codel is stochastic (flows are hashed into a fixed number of buckets),
so it won't work very well on hosts with 1,000,000 flows or more...

fq_codel is aimed at routers, while sch_fq targets hosts,
implementing pacing at a minimal cost (one high resolution timer per
qdisc)
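Pacing here means releasing each flow's packets no faster than a target rate, rather than in line-rate bursts. A minimal sketch of the per-flow arithmetic, assuming a rate in bytes per second and timestamps in nanoseconds: each packet pushes the flow's next allowed departure back by len / rate. The names are hypothetical and this is not sch_fq's code; it leaves out the high-resolution timer, flow classification and everything else a real qdisc does.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000ULL

/* Hypothetical per-flow pacing state (not sch_fq's structure). */
struct demo_flow {
	uint64_t time_next_packet_ns; /* earliest departure of the next packet */
};

/* Return when a packet of len bytes may leave, given the current time and
 * a pacing rate in bytes per second, and push the flow's next slot back
 * by len / rate seconds. */
static uint64_t demo_pace(struct demo_flow *f, uint64_t now_ns,
			  uint32_t len, uint64_t rate_bytes_per_sec)
{
	uint64_t send_at;

	assert(rate_bytes_per_sec > 0);
	/* An idle flow may send now; a busy one waits for its slot. */
	send_at = now_ns > f->time_next_packet_ns ? now_ns : f->time_next_packet_ns;
	f->time_next_packet_ns = send_at +
		(uint64_t)len * DEMO_NSEC_PER_SEC / rate_bytes_per_sec;
	return send_at;
}
```

At 12,500,000 bytes/s (100 Mbit/s), back-to-back 1500-byte packets end up 120 µs apart, while a flow that has been idle past its slot is allowed to send immediately.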

Re: [PATCHv4 next 0/3] IPvlan introduce l3s mode

2016-09-20 Thread Stephen Hemminger

On Mon, 19 Sep 2016 01:25:53 -0400 (EDT)
David Miller  wrote:

> From: Mahesh Bandewar 
> Date: Fri, 16 Sep 2016 12:59:01 -0700
> 
> > Same old problem with new approach especially from suggestions from
> > earlier patch-series.
> > 
> > First thing is that this is introduced as a new mode rather than
> > modifying the old (L3) mode. So the behavior of the existing modes is
> > preserved as it is and the new L3s mode obeys iptables so that intended
> > conn-tracking can work. 
> > 
> > To do this, the code uses newly added l3mdev_rcv() handler and an
> > Iptables hook. l3mdev_rcv() to perform an inbound route lookup with the
> > correct (IPvlan slave) interface and then IPtable-hook at LOCAL_INPUT
> > to change the input device from master to the slave to complete the
> > formality.
> > 
> > Supporting stack changes are trivial changes to export symbol to get
> > IPv4 equivalent code exported for IPv6 and to allow netfilter hook
> > registration code to allow caller to hold RTNL. Please look into
> > individual patches for details.  
> 
> Series applied, thanks.

This fails to build with the following config. Looks like IPVLAN now has
a dependency on netfilter.

  CC [M]  drivers/net/ipvlan/ipvlan_core.o
In file included from drivers/net/ipvlan/ipvlan_core.c:10:0:
drivers/net/ipvlan/ipvlan.h:132:22: warning: ‘struct nf_hook_state’ declared inside parameter list will not be visible outside of this definition or declaration
 const struct nf_hook_state *state);
  ^
drivers/net/ipvlan/ipvlan_core.c:754:22: warning: ‘struct nf_hook_state’ declared inside parameter list will not be visible outside of this definition or declaration
 const struct nf_hook_state *state)
  ^
drivers/net/ipvlan/ipvlan_core.c:753:14: error: conflicting types for ‘ipvlan_nf_input’
 unsigned int ipvlan_nf_input(void *priv, struct sk_buff *skb,
  ^~~
In file included from drivers/net/ipvlan/ipvlan_core.c:10:0:
drivers/net/ipvlan/ipvlan.h:131:14: note: previous declaration of ‘ipvlan_nf_input’ was here
 unsigned int ipvlan_nf_input(void *priv, struct sk_buff *skb,
  ^~~

Not sure why the build bot did not catch this.

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 4.8.0-rc6 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=28
CONFIG_ARCH_MMAP_RND_BITS_MAX=32
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_HAVE_INTEL_TXT=y
CONFIG_X86_64_SMP=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_DEBUG_RODATA=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION="-net-next"
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
CONFIG_KERNEL_XZ=y
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
CONFIG_USELIB=y
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
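The warnings and the error in the build log above are two symptoms of one C scoping rule: a struct tag whose first appearance is inside a function's parameter list has prototype scope only, so the declaration in ipvlan.h and the definition in ipvlan_core.c each end up naming their own, incompatible struct nf_hook_state, and the compiler then reports conflicting types for ipvlan_nf_input. The userspace sketch below, with hypothetical names, shows the rule and the usual file-scope forward declaration. It only illustrates the compiler behavior; it is not the fix applied to ipvlan, where, as noted above, the underlying issue is the new dependency on netfilter.

```c
#include <assert.h>
#include <stddef.h>

/* File-scope forward declaration. Without this line, the first mention of
 * 'struct demo_hook_state' would be inside the parameter list below; that
 * tag would have prototype scope only, GCC would warn that it "will not be
 * visible outside of this definition or declaration", and the later
 * definition of demo_nf_input() would refer to a different, incompatible
 * type, giving "conflicting types". */
struct demo_hook_state;

/* Prototype as it might appear in a header (hypothetical names). */
unsigned int demo_nf_input(void *priv, const struct demo_hook_state *state);

/* The full definition can come later, e.g. from another header. */
struct demo_hook_state {
	int hook;
};

unsigned int demo_nf_input(void *priv, const struct demo_hook_state *state)
{
	(void)priv;
	assert(state != NULL);
	return (unsigned int)state->hook; /* needs the complete type */
}
```

Note that the forward declaration is enough for prototypes that only pass the pointer around; any code that dereferences it, like the body above, still needs the complete definition to be visible.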

Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-20 Thread Thomas Graf

On 09/20/16 at 04:18pm, Tom Herbert wrote:
> This allows other use cases than BPF inserting code into the data
> path. This gives XDP potential more utility and more users so that we
> can motivate more driver implementations. For instance, I thinks it's
> totally reasonable if the nftables guys want to insert some of their
> rules to perform early DDOS drop to get the same performance that we
> see in XDP.

Reasonable point with nftables but are any of these users on the table
already and ready to consume non-skbs? It would be a pity to add this
complexity and cost if it is never used.

I don't see how we can ensure performance if we have multiple
subsystems register for the hook each adding their own parsers which
need to be passed through sequentially. Maybe I'm missing something.

Re: [PATCH net-next] net: dsa: mv88e6xxx: handle multiple ports in ATU

2016-09-20 Thread Andrew Lunn

On Mon, Sep 19, 2016 at 09:07:16PM -0400, Vivien Didelot wrote:
> Hi Andrew,
> 
> Andrew Lunn  writes:
> 
> > On Mon, Sep 19, 2016 at 07:56:11PM -0400, Vivien Didelot wrote:
> >> An address can be loaded in the ATU with multiple ports, for instance
> >> when adding multiple ports to a Multicast group with "bridge mdb".
> >>
> >> The current code doesn't allow that. Add an helper to get a single entry
> >> from the ATU, then set or clear the requested port, before loading the
> >> entry back in the ATU.
> >>
> >> Note that the required _mv88e6xxx_atu_getnext function is defined below
> >> mv88e6xxx_port_db_load_purge, so forward-declare it for the moment. The
> >> ATU code will be isolated in future patches.
> >>
> >> Fixes: 83dabd1fa84c ("net: dsa: mv88e6xxx: make switchdev DB ops generic")
> >
> > Is this a real fixes? You don't make it clear what goes wrong. I
> > assume adding the same MAC address for a second time but for a
> > different port removes the first entry for the old port?
> 
> Yes, this is what happens, sorry for the bad message. Below is an
> example with the relevant hardware bits.
> 
> Here's the current behavior, without this patch:
> 
> # bridge mdb add dev br0 port lan0 grp 238.39.20.86
> 
> FID  MAC Addr           State      Trunk?  DPV/Trunk ID
> 0    01:00:5e:27:14:56  MC_STATIC  n       0 - - - - - -
> 
> # bridge mdb add dev br0 port lan2 grp 238.39.20.86
> 
> FID  MAC Addr           State      Trunk?  DPV/Trunk ID
> 0    01:00:5e:27:14:56  MC_STATIC  n       - - 2 - - - -
> 
> Here's the new behavior, with this patch:
> 
> # bridge mdb add dev br0 port lan0 grp 238.39.20.86
> 
> FID  MAC Addr           State      Trunk?  DPV/Trunk ID
> 0    01:00:5e:27:14:56  MC_STATIC  n       0 - - - - - -
> 
> # bridge mdb add dev br0 port lan2 grp 238.39.20.86
> 
> FID  MAC Addr           State      Trunk?  DPV/Trunk ID
> 0    01:00:5e:27:14:56  MC_STATIC  n       0 - 2 - - - -

Hi Vivien

it would be nice to update the commit message with this text. 

Otherwise

Reviewed-by: Andrew Lunn 

Andrew
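The read-modify-write described in the quoted commit message (fetch the existing ATU entry, set or clear the requested port, load the entry back) can be sketched on a plain port bitmask. The sketch also derives the MAC address shown in the dumps: under the standard IPv4 multicast mapping, 238.39.20.86 becomes 01:00:5e:27:14:56. The entry structure and helper names are hypothetical and far simpler than the driver's, and dropping the entry once its port vector is empty is an assumption of this sketch rather than something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* IPv4 multicast group -> Ethernet address: 01:00:5e followed by the
 * low 23 bits of the group address (RFC 1112). */
static void demo_ipv4_mcast_mac(uint32_t group, uint8_t mac[6])
{
	mac[0] = 0x01;
	mac[1] = 0x00;
	mac[2] = 0x5e;
	mac[3] = (group >> 16) & 0x7f;
	mac[4] = (group >> 8) & 0xff;
	mac[5] = group & 0xff;
}

/* Hypothetical, simplified ATU entry: only the destination port vector
 * (DPV) matters here, one bit per switch port. */
struct demo_atu_entry {
	uint16_t portvec;
	int in_use;
};

/* Start from the existing entry and set or clear only the requested port,
 * instead of overwriting the whole vector with a single port. */
static void demo_atu_update(struct demo_atu_entry *e, int port, int add)
{
	assert(port >= 0 && port < 16);
	if (!e->in_use)
		e->portvec = 0;
	if (add)
		e->portvec |= (uint16_t)(1u << port);
	else
		e->portvec &= (uint16_t)~(1u << port);
	e->in_use = e->portvec != 0; /* assumption: purge when empty */
}
```

Adding port 0 (lan0) and then port 2 (lan2) leaves the vector at 0x5, matching the `0 - 2 - - - -` row of the patched dump. The unpatched dump corresponds to assigning `1 << port` rather than OR-ing it in, which leaves only port 2.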

Re: [PATCH v4 net-next 16/16] tcp_bbr: add BBR congestion control

2016-09-20 Thread Neal Cardwell

On Tue, Sep 20, 2016 at 7:39 PM, Stephen Hemminger
 wrote:
>
>> NOTE: BBR *must* be used with the fq qdisc ("man tc-fq") with pacing
>> enabled, since pacing is integral to the BBR design and
>> implementation. BBR without pacing would not function properly, and
>> may incur unnecessary high packet loss rates.
>
> Does it work with fq_codel?

Good question. Since fq_codel does not (currently) implement pacing,
it would not be sufficient to get the required behavior.

neal

[PATCHv3 net-next 0/2] Preparation for mv88e6390

2016-09-20 Thread Andrew Lunn

These two patches are a couple of preparation steps for supporting the
MV88E6390 family of chips. This is a new generation from Marvell,
and will need more feature flags than are currently available in an
unsigned long. Expand to an unsigned long long. The MV88E6390 also
places its port registers somewhere else, so add a wrapper around port
register access.

v2:
 Rework wrappers to use mv88e6xxx_{read|write}
 Simplify some (err < 0) to (err)
 Add Reviewed-by tag.

v3:
 reg = reg & foo -> reg &= foo
 Fix over zealous s/ret/err

Andrew Lunn (2):
  net: dsa: mv88e6xxx: Add helper for accessing port registers
  net: dsa: mv88e6xxx: Convert flag bits to unsigned long long

 drivers/net/dsa/mv88e6xxx/chip.c  | 370 +-
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |  63 +++---
 2 files changed, 213 insertions(+), 220 deletions(-)

-- 
2.9.3

[PATCHv3 net-next 1/2] net: dsa: mv88e6xxx: Add helper for accessing port registers

2016-09-20 Thread Andrew Lunn

There is a device coming soon which places its port registers
somewhere different to all other Marvell switches supported so far.
Add helper functions for reading/writing port registers, making it
easier to handle this new device.

Signed-off-by: Andrew Lunn 
---
v2:
 Call mv88e6xxx_{read|write} in wrappers
 Change some (err < 0) to plain (err)
v3:
 reg = reg & foo -> reg &= foo
 Fix over zealous s/ret/err
---
 drivers/net/dsa/mv88e6xxx/chip.c  | 370 +-
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |   1 -
 2 files changed, 182 insertions(+), 189 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 70a812d159c9..f7014866e8ef 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -216,6 +216,22 @@ int mv88e6xxx_write(struct mv88e6xxx_chip *chip, int addr, int reg, u16 val)
return 0;
 }
 
+int mv88e6xxx_port_read(struct mv88e6xxx_chip *chip, int port, int reg,
+   u16 *val)
+{
+   int addr = chip->info->port_base_addr + port;
+
+   return mv88e6xxx_read(chip, addr, reg, val);
+}
+
+int mv88e6xxx_port_write(struct mv88e6xxx_chip *chip, int port, int reg,
+u16 val)
+{
+   int addr = chip->info->port_base_addr + port;
+
+   return mv88e6xxx_write(chip, addr, reg, val);
+}
+
 static int mv88e6xxx_phy_read(struct mv88e6xxx_chip *chip, int phy,
  int reg, u16 *val)
 {
@@ -585,23 +601,23 @@ static void mv88e6xxx_adjust_link(struct dsa_switch *ds, int port,
  struct phy_device *phydev)
 {
struct mv88e6xxx_chip *chip = ds->priv;
-   u32 reg;
-   int ret;
+   u16 reg;
+   int err;
 
if (!phy_is_pseudo_fixed_link(phydev))
return;
 
mutex_lock(&chip->reg_lock);
 
-   ret = _mv88e6xxx_reg_read(chip, REG_PORT(port), PORT_PCS_CTRL);
-   if (ret < 0)
+   err = mv88e6xxx_port_read(chip, port, PORT_PCS_CTRL, &reg);
+   if (err)
goto out;
 
-   reg = ret & ~(PORT_PCS_CTRL_LINK_UP |
- PORT_PCS_CTRL_FORCE_LINK |
- PORT_PCS_CTRL_DUPLEX_FULL |
- PORT_PCS_CTRL_FORCE_DUPLEX |
- PORT_PCS_CTRL_UNFORCED);
+   reg &= ~(PORT_PCS_CTRL_LINK_UP |
+PORT_PCS_CTRL_FORCE_LINK |
+PORT_PCS_CTRL_DUPLEX_FULL |
+PORT_PCS_CTRL_FORCE_DUPLEX |
+PORT_PCS_CTRL_UNFORCED);
 
reg |= PORT_PCS_CTRL_FORCE_LINK;
if (phydev->link)
@@ -639,7 +655,7 @@ static void mv88e6xxx_adjust_link(struct dsa_switch *ds, int port,
reg |= (PORT_PCS_CTRL_RGMII_DELAY_RXCLK |
PORT_PCS_CTRL_RGMII_DELAY_TXCLK);
}
-   _mv88e6xxx_reg_write(chip, REG_PORT(port), PORT_PCS_CTRL, reg);
+   mv88e6xxx_port_write(chip, port, PORT_PCS_CTRL, reg);
 
 out:
mutex_unlock(&chip->reg_lock);
@@ -799,22 +815,22 @@ static uint64_t _mv88e6xxx_get_ethtool_stat(struct mv88e6xxx_chip *chip,
 {
u32 low;
u32 high = 0;
-   int ret;
+   int err;
+   u16 reg;
u64 value;
 
switch (s->type) {
case PORT:
-   ret = _mv88e6xxx_reg_read(chip, REG_PORT(port), s->reg);
-   if (ret < 0)
+   err = mv88e6xxx_port_read(chip, port, s->reg, &reg);
+   if (err)
return UINT64_MAX;
 
-   low = ret;
+   low = reg;
if (s->sizeof_stat == 4) {
-   ret = _mv88e6xxx_reg_read(chip, REG_PORT(port),
- s->reg + 1);
-   if (ret < 0)
+   err = mv88e6xxx_port_read(chip, port, s->reg + 1, &reg);
+   if (err)
return UINT64_MAX;
-   high = ret;
+   high = reg;
}
break;
case BANK0:
@@ -893,6 +909,8 @@ static void mv88e6xxx_get_regs(struct dsa_switch *ds, int port,
   struct ethtool_regs *regs, void *_p)
 {
struct mv88e6xxx_chip *chip = ds->priv;
+   int err;
+   u16 reg;
u16 *p = _p;
int i;
 
@@ -903,11 +921,10 @@ static void mv88e6xxx_get_regs(struct dsa_switch *ds, int port,
mutex_lock(&chip->reg_lock);
 
for (i = 0; i < 32; i++) {
-   int ret;
 
-   ret = _mv88e6xxx_reg_read(chip, REG_PORT(port), i);
-   if (ret >= 0)
-   p[i] = ret;
+   err = mv88e6xxx_port_read(chip, port, i, &reg);
+   if (!err)
+   p[i] = reg;
}
 
mutex_unlock(&chip->reg_lock);
@@ -938,7 +955,7 @@ static int mv88e6xxx_get_eee(struct dsa_switch *ds, int port,
e->eee_enabled = !!(reg & 0x0200);
e->tx_lpi_enabled = !!(reg &

[PATCHv3 net-next 2/2] net: dsa: mv88e6xxx: Convert flag bits to unsigned long long

2016-09-20 Thread Andrew Lunn

We are soon going to run out of flag bits on 32bit systems. Convert to
unsigned long long.

Signed-off-by: Andrew Lunn 
Reviewed-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 62 +--
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index e349d0d64645..827988397fd8 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -452,36 +452,36 @@ enum mv88e6xxx_cap {
 };
 
 /* Bitmask of capabilities */
-#define MV88E6XXX_FLAG_EDSA BIT(MV88E6XXX_CAP_EDSA)
-#define MV88E6XXX_FLAG_EEE BIT(MV88E6XXX_CAP_EEE)
-
-#define MV88E6XXX_FLAG_SMI_CMD BIT(MV88E6XXX_CAP_SMI_CMD)
-#define MV88E6XXX_FLAG_SMI_DATA BIT(MV88E6XXX_CAP_SMI_DATA)
-
-#define MV88E6XXX_FLAG_PHY_PAGE BIT(MV88E6XXX_CAP_PHY_PAGE)
-
-#define MV88E6XXX_FLAG_SERDES  BIT(MV88E6XXX_CAP_SERDES)
-
-#define MV88E6XXX_FLAG_GLOBAL2 BIT(MV88E6XXX_CAP_GLOBAL2)
-#define MV88E6XXX_FLAG_G2_MGMT_EN_2X   BIT(MV88E6XXX_CAP_G2_MGMT_EN_2X)
-#define MV88E6XXX_FLAG_G2_MGMT_EN_0X   BIT(MV88E6XXX_CAP_G2_MGMT_EN_0X)
-#define MV88E6XXX_FLAG_G2_IRL_CMD  BIT(MV88E6XXX_CAP_G2_IRL_CMD)
-#define MV88E6XXX_FLAG_G2_IRL_DATA BIT(MV88E6XXX_CAP_G2_IRL_DATA)
-#define MV88E6XXX_FLAG_G2_PVT_ADDR BIT(MV88E6XXX_CAP_G2_PVT_ADDR)
-#define MV88E6XXX_FLAG_G2_PVT_DATA BIT(MV88E6XXX_CAP_G2_PVT_DATA)
-#define MV88E6XXX_FLAG_G2_SWITCH_MAC   BIT(MV88E6XXX_CAP_G2_SWITCH_MAC)
-#define MV88E6XXX_FLAG_G2_POT  BIT(MV88E6XXX_CAP_G2_POT)
-#define MV88E6XXX_FLAG_G2_EEPROM_CMD   BIT(MV88E6XXX_CAP_G2_EEPROM_CMD)
-#define MV88E6XXX_FLAG_G2_EEPROM_DATA  BIT(MV88E6XXX_CAP_G2_EEPROM_DATA)
-#define MV88E6XXX_FLAG_G2_SMI_PHY_CMD  BIT(MV88E6XXX_CAP_G2_SMI_PHY_CMD)
-#define MV88E6XXX_FLAG_G2_SMI_PHY_DATA BIT(MV88E6XXX_CAP_G2_SMI_PHY_DATA)
-
-#define MV88E6XXX_FLAG_PPU BIT(MV88E6XXX_CAP_PPU)
-#define MV88E6XXX_FLAG_PPU_ACTIVE  BIT(MV88E6XXX_CAP_PPU_ACTIVE)
-#define MV88E6XXX_FLAG_STU BIT(MV88E6XXX_CAP_STU)
-#define MV88E6XXX_FLAG_TEMP BIT(MV88E6XXX_CAP_TEMP)
-#define MV88E6XXX_FLAG_TEMP_LIMIT  BIT(MV88E6XXX_CAP_TEMP_LIMIT)
-#define MV88E6XXX_FLAG_VTU BIT(MV88E6XXX_CAP_VTU)
+#define MV88E6XXX_FLAG_EDSA BIT_ULL(MV88E6XXX_CAP_EDSA)
+#define MV88E6XXX_FLAG_EEE BIT_ULL(MV88E6XXX_CAP_EEE)
+
+#define MV88E6XXX_FLAG_SMI_CMD BIT_ULL(MV88E6XXX_CAP_SMI_CMD)
+#define MV88E6XXX_FLAG_SMI_DATA BIT_ULL(MV88E6XXX_CAP_SMI_DATA)
+
+#define MV88E6XXX_FLAG_PHY_PAGE BIT_ULL(MV88E6XXX_CAP_PHY_PAGE)
+
+#define MV88E6XXX_FLAG_SERDES  BIT_ULL(MV88E6XXX_CAP_SERDES)
+
+#define MV88E6XXX_FLAG_GLOBAL2 BIT_ULL(MV88E6XXX_CAP_GLOBAL2)
+#define MV88E6XXX_FLAG_G2_MGMT_EN_2X   BIT_ULL(MV88E6XXX_CAP_G2_MGMT_EN_2X)
+#define MV88E6XXX_FLAG_G2_MGMT_EN_0X   BIT_ULL(MV88E6XXX_CAP_G2_MGMT_EN_0X)
+#define MV88E6XXX_FLAG_G2_IRL_CMD  BIT_ULL(MV88E6XXX_CAP_G2_IRL_CMD)
+#define MV88E6XXX_FLAG_G2_IRL_DATA BIT_ULL(MV88E6XXX_CAP_G2_IRL_DATA)
+#define MV88E6XXX_FLAG_G2_PVT_ADDR BIT_ULL(MV88E6XXX_CAP_G2_PVT_ADDR)
+#define MV88E6XXX_FLAG_G2_PVT_DATA BIT_ULL(MV88E6XXX_CAP_G2_PVT_DATA)
+#define MV88E6XXX_FLAG_G2_SWITCH_MAC   BIT_ULL(MV88E6XXX_CAP_G2_SWITCH_MAC)
+#define MV88E6XXX_FLAG_G2_POT  BIT_ULL(MV88E6XXX_CAP_G2_POT)
+#define MV88E6XXX_FLAG_G2_EEPROM_CMD   BIT_ULL(MV88E6XXX_CAP_G2_EEPROM_CMD)
+#define MV88E6XXX_FLAG_G2_EEPROM_DATA  BIT_ULL(MV88E6XXX_CAP_G2_EEPROM_DATA)
+#define MV88E6XXX_FLAG_G2_SMI_PHY_CMD  BIT_ULL(MV88E6XXX_CAP_G2_SMI_PHY_CMD)
+#define MV88E6XXX_FLAG_G2_SMI_PHY_DATA BIT_ULL(MV88E6XXX_CAP_G2_SMI_PHY_DATA)
+
+#define MV88E6XXX_FLAG_PPU BIT_ULL(MV88E6XXX_CAP_PPU)
+#define MV88E6XXX_FLAG_PPU_ACTIVE  BIT_ULL(MV88E6XXX_CAP_PPU_ACTIVE)
+#define MV88E6XXX_FLAG_STU BIT_ULL(MV88E6XXX_CAP_STU)
+#define MV88E6XXX_FLAG_TEMP BIT_ULL(MV88E6XXX_CAP_TEMP)
+#define MV88E6XXX_FLAG_TEMP_LIMIT  BIT_ULL(MV88E6XXX_CAP_TEMP_LIMIT)
+#define MV88E6XXX_FLAG_VTU BIT_ULL(MV88E6XXX_CAP_VTU)
 
 /* EEPROM Programming via Global2 with 16-bit data */
 #define MV88E6XXX_FLAGS_EEPROM16   \
@@ -614,7 +614,7 @@ struct mv88e6xxx_info {
unsigned int num_ports;
unsigned int port_base_addr;
unsigned int age_time_coeff;
-   unsigned long flags;
+   unsigned long long flags;
 };
 
 struct mv88e6xxx_atu_entry {
-- 
2.9.3

Re: [PATCH v4 net-next 16/16] tcp_bbr: add BBR congestion control

2016-09-20 Thread Stephen Hemminger


> NOTE: BBR *must* be used with the fq qdisc ("man tc-fq") with pacing
> enabled, since pacing is integral to the BBR design and
> implementation. BBR without pacing would not function properly, and
> may incur unnecessary high packet loss rates.

Does it work with fq_codel?

[PATCH] ptp_clock: future-proofing drivers against PTP subsystem becoming optional

2016-09-20 Thread Nicolas Pitre


Drivers must be ready to accept NULL from ptp_clock_register() if the
PTP clock subsystem is configured out.

This patch documents that and ensures that all drivers cope well
with a NULL return.

Signed-off-by: Nicolas Pitre 
Reviewed-by: Eugenia Emantayev 

---

Let's have the basics merged now and work out the actual Kconfig issue 
separately. Richard, if you agree with this patch, I think this could go 
via the netdev tree.
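The driver-side contract this patch establishes has three outcomes: an error pointer means registration failed, NULL means the PTP subsystem is absent and the driver should carry on quietly, and anything else is a usable clock. This also explains the mlx5 hunk: IS_ERR_OR_NULL() would report the NULL case as a failure. A userspace sketch that re-creates the kernel's error-pointer encoding locally; the demo_* helpers are stand-ins for ERR_PTR(), PTR_ERR() and IS_ERR(), and demo_clock_register() is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Local re-creation of the error-pointer convention: the top MAX_ERRNO
 * addresses encode negative errno values. */
#define DEMO_MAX_ERRNO 4095

static void *demo_err_ptr(long error) { return (void *)error; }
static long demo_ptr_err(const void *ptr) { return (long)ptr; }
static int demo_is_err(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-DEMO_MAX_ERRNO;
}

struct demo_clock { int index; };

static struct demo_clock demo_the_clock;

/* Hypothetical register call with the three possible outcomes:
 * outcome < 0 -> error pointer, 0 -> NULL (configured out), > 0 -> clock. */
static struct demo_clock *demo_clock_register(int outcome)
{
	if (outcome < 0)
		return demo_err_ptr(outcome);
	if (outcome == 0)
		return NULL;
	return &demo_the_clock;
}

enum demo_ptp_state { DEMO_PTP_FAILED, DEMO_PTP_ABSENT, DEMO_PTP_REGISTERED };

/* Driver pattern from the patch: only an error pointer is a failure, and
 * "registered" is reported only for a non-NULL clock. */
static enum demo_ptp_state demo_ptp_init(struct demo_clock **slot, int outcome)
{
	struct demo_clock *clk = demo_clock_register(outcome);

	assert(slot != NULL);
	if (demo_is_err(clk)) {
		*slot = NULL;
		return DEMO_PTP_FAILED; /* a driver would log demo_ptr_err(clk) */
	} else if (clk) {
		*slot = clk;
		return DEMO_PTP_REGISTERED;
	}
	*slot = NULL; /* subsystem configured out: carry on without a PHC */
	return DEMO_PTP_ABSENT;
}
```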

diff --git a/drivers/net/ethernet/intel/e1000e/ptp.c b/drivers/net/ethernet/intel/e1000e/ptp.c
index 2e1b17ad52..ad03763e00 100644
--- a/drivers/net/ethernet/intel/e1000e/ptp.c
+++ b/drivers/net/ethernet/intel/e1000e/ptp.c
@@ -334,7 +334,7 @@ void e1000e_ptp_init(struct e1000_adapter *adapter)
if (IS_ERR(adapter->ptp_clock)) {
adapter->ptp_clock = NULL;
e_err("ptp_clock_register failed\n");
-   } else {
+   } else if (adapter->ptp_clock) {
e_info("registered PHC clock\n");
}
 }
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ptp.c b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
index ed39cbad24..f1feceab75 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ptp.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
@@ -669,7 +669,7 @@ void i40e_ptp_init(struct i40e_pf *pf)
pf->ptp_clock = NULL;
dev_err(&pf->pdev->dev, "%s: ptp_clock_register failed\n",
__func__);
-   } else {
+   } else if (pf->ptp_clock) {
struct timespec64 ts;
u32 regval;
 
diff --git a/drivers/net/ethernet/intel/igb/igb_ptp.c b/drivers/net/ethernet/intel/igb/igb_ptp.c
index 336c103ae3..7531892b08 100644
--- a/drivers/net/ethernet/intel/igb/igb_ptp.c
+++ b/drivers/net/ethernet/intel/igb/igb_ptp.c
@@ -1159,7 +1159,7 @@ void igb_ptp_init(struct igb_adapter *adapter)
if (IS_ERR(adapter->ptp_clock)) {
adapter->ptp_clock = NULL;
dev_err(&adapter->pdev->dev, "ptp_clock_register failed\n");
-   } else {
+   } else if (adapter->ptp_clock) {
dev_info(&adapter->pdev->dev, "added PHC on %s\n",
 adapter->netdev->name);
adapter->ptp_flags |= IGB_PTP_ENABLED;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
index e5431bfe33..a92277683a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
@@ -1254,7 +1254,7 @@ static long ixgbe_ptp_create_clock(struct ixgbe_adapter *adapter)
adapter->ptp_clock = NULL;
e_dev_err("ptp_clock_register failed\n");
return err;
-   } else
+   } else if (adapter->ptp_clock)
e_dev_info("registered PHC device on %s\n", netdev->name);
 
/* set default timestamp mode to disabled here. We do this in
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_clock.c b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
index 1494997c4f..08fc5fc56d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_clock.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
@@ -298,7 +298,7 @@ void mlx4_en_init_timestamp(struct mlx4_en_dev *mdev)
if (IS_ERR(mdev->ptp_clock)) {
mdev->ptp_clock = NULL;
mlx4_err(mdev, "ptp_clock_register failed\n");
-   } else {
+   } else if (mdev->ptp_clock) {
mlx4_info(mdev, "registered PHC clock\n");
}
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
index 847a8f3ac2..13dc388667 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
@@ -273,7 +273,7 @@ void mlx5e_timestamp_init(struct mlx5e_priv *priv)
 
tstamp->ptp = ptp_clock_register(&tstamp->ptp_info,
 &priv->mdev->pdev->dev);
-   if (IS_ERR_OR_NULL(tstamp->ptp)) {
+   if (IS_ERR(tstamp->ptp)) {
mlx5_core_warn(priv->mdev, "ptp_clock_register failed %ld\n",
   PTR_ERR(tstamp->ptp));
tstamp->ptp = NULL;
diff --git a/drivers/net/ethernet/sfc/ptp.c b/drivers/net/ethernet/sfc/ptp.c
index c771e0af4e..f105a170b4 100644
--- a/drivers/net/ethernet/sfc/ptp.c
+++ b/drivers/net/ethernet/sfc/ptp.c
@@ -1269,13 +1269,13 @@ int efx_ptp_probe(struct efx_nic *efx, struct efx_channel *channel)
if (IS_ERR(ptp->phc_clock)) {
rc = PTR_ERR(ptp->phc_clock);
goto fail3;
-   }
-
-   INIT_WORK(&ptp->pps_work, efx_ptp_pps_worker);
-   ptp->pps_workwq = create_singlethread_workqueue("sfc_pps");
-   if (!ptp->pps_workwq) {
-   rc = -ENOMEM;
-   goto fail4;
+   } else if (ptp->phc_clock) {
+   INIT_WORK(&ptp->pps_work,

Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-20 Thread Daniel Borkmann


On 09/21/2016 01:09 AM, Thomas Graf wrote:
> On 09/20/16 at 03:49pm, Tom Herbert wrote:
>> On Tue, Sep 20, 2016 at 3:44 PM, Thomas Graf  wrote:
>>> On 09/20/16 at 03:00pm, Tom Herbert wrote:
>>>> +static inline int __xdp_hook_run(struct list_head *list_head,
>>>> +  struct xdp_buff *xdp)
>>>> +{
>>>> + struct xdp_hook_ops *elem;
>>>> + int ret = XDP_PASS;
>>>> +
>>>> + list_for_each_entry(elem, list_head, list) {
>>>> + ret = elem->hook(elem->priv, xdp);
>>>> + if (ret != XDP_PASS)
>>>> + break;
>>>> + }
>>>
>>> Walking over a linear list? Really? :-) I thought this was supposed
>>> to be fast, no compromises made.
>>
>> Can you suggest an alternative?
>
> Single BPF program that encodes whatever logic is required. This is
> what BPF is for. If it absolutely has to run two programs in sequence
> then it can still do that even though I really don't see much of a
> point of doing that in a high performance environment.

Agreed, if there's one thing I would change in cls_bpf, then it's getting
rid of the (uapi unfortunately) list of classifiers and just make it a
single one, because that's all that is needed, and chaining/pipelining
can be done via tail calls for example. This whole list + callback likely
also makes things slower. Why not let more drivers come first to support
the current xdp model we have, and then we can move common parts to the
driver-independent core code?

> I'm not even sure yet I understand full purpose of this yet ;-)

Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-20 Thread Tom Herbert

On Tue, Sep 20, 2016 at 4:09 PM, Thomas Graf  wrote:
> On 09/20/16 at 03:49pm, Tom Herbert wrote:
>> On Tue, Sep 20, 2016 at 3:44 PM, Thomas Graf  wrote:
>> > On 09/20/16 at 03:00pm, Tom Herbert wrote:
>> >> +static inline int __xdp_hook_run(struct list_head *list_head,
>> >> +  struct xdp_buff *xdp)
>> >> +{
>> >> + struct xdp_hook_ops *elem;
>> >> + int ret = XDP_PASS;
>> >> +
>> >> + list_for_each_entry(elem, list_head, list) {
>> >> + ret = elem->hook(elem->priv, xdp);
>> >> + if (ret != XDP_PASS)
>> >> + break;
>> >> + }
>> >
>> > Walking over a linear list? Really? :-) I thought this was supposed
>> > to be fast, no compromises made.
>>
>> Can you suggest an alternative?
>
> Single BPF program that encodes whatever logic is required. This is
> what BPF is for. If it absolutely has to run two programs in sequence
> then it can still do that even though I really don't see much of a
> point of doing that in a high performance environment.
>
> I'm not even sure yet I understand full purpose of this yet ;-)

This allows use cases other than BPF to insert code into the data
path. This potentially gives XDP more utility and more users, so that we
can motivate more driver implementations. For instance, I think it's
totally reasonable if the nftables guys want to insert some of their
rules to perform early DDoS drop and get the same performance that we
see in XDP.

Tom

Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-20 Thread Thomas Graf

On 09/20/16 at 03:49pm, Tom Herbert wrote:
> On Tue, Sep 20, 2016 at 3:44 PM, Thomas Graf  wrote:
> > On 09/20/16 at 03:00pm, Tom Herbert wrote:
> >> +static inline int __xdp_hook_run(struct list_head *list_head,
> >> +  struct xdp_buff *xdp)
> >> +{
> >> + struct xdp_hook_ops *elem;
> >> + int ret = XDP_PASS;
> >> +
> >> + list_for_each_entry(elem, list_head, list) {
> >> + ret = elem->hook(elem->priv, xdp);
> >> + if (ret != XDP_PASS)
> >> + break;
> >> + }
> >
> > Walking over a linear list? Really? :-) I thought this was supposed
> > to be fast, no compromises made.
> 
> Can you suggest an alternative?

Single BPF program that encodes whatever logic is required. This is
what BPF is for. If it absolutely has to run two programs in sequence
then it can still do that even though I really don't see much of a
point of doing that in a high performance environment.

I'm not even sure yet I understand full purpose of this yet ;-)

Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-20 Thread Tom Herbert

On Tue, Sep 20, 2016 at 3:44 PM, Thomas Graf  wrote:
> On 09/20/16 at 03:00pm, Tom Herbert wrote:
>> +static inline int __xdp_hook_run(struct list_head *list_head,
>> +  struct xdp_buff *xdp)
>> +{
>> + struct xdp_hook_ops *elem;
>> + int ret = XDP_PASS;
>> +
>> + list_for_each_entry(elem, list_head, list) {
>> + ret = elem->hook(elem->priv, xdp);
>> + if (ret != XDP_PASS)
>> + break;
>> + }
>
> Walking over a linear list? Really? :-) I thought this was supposed
> to be fast, no compromises made.

Can you suggest an alternative?

Re: [PATCH v2 0/2] make POSIX timers optional

2016-09-20 Thread Nicolas Pitre

On Tue, 20 Sep 2016, Thomas Gleixner wrote:

> I think the whole approach is wrong because it makes the PTP split at the
> wrong level.
> 
> Currently we have:
> 
> DRIVER_X
> tristate "Driver X"
> select PTP
> 
> In order to make POSIX_CLOCK configurable we should have
> 
> PTP
> tristate "PTP"
> select POSIX_CLOCK
> 
> Now if you want to distangle PTP from a driver then you split it at the
> driver level and not at the PTP level:
> 
> DRIVER_X
> tristate "Driver X"
> 
> DRIVER_X_PTP
> bool "Enable PTP support"
> default y if !MAKE_IT_TINY
> depends on DRIVER_X
> select PTP
> 
> We have already drivers following that scheme. That way you make the PTP
> support in the driver conditional on DRIVER_X_PTP and have no hassle with
> modules and dependencies.

I beg to disagree.

There are way more drivers than subsystems. If you had to go around 
unselecting all NIC drivers before CONFIG_ETHERNET could be turned off, 
and only with CONFIG_ETHERNET=n could you finally turn networking off, 
that would be a nightmare.

IMHO it is much nicer for the poor user configuring the kernel to have a 
single configuration prompt for PTP support, and then have whatever 
driver that can provide a PTP clock just do it (or omit it) based on 
that single prompt.  Prompting for PTP support for each individual 
ethernet driver is silly.

Nicolas
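The single-prompt approach argued for here is commonly realized in C with a header that declares the real function when the subsystem is enabled and provides a static inline stub returning NULL otherwise, so the same driver source builds in both configurations. A sketch with hypothetical names; DEMO_CONFIG_PTP stands in for a Kconfig symbol and is not a real kernel option.

```c
#include <assert.h>
#include <stddef.h>

struct demo_phc;

/* Hypothetical subsystem header: one switch for the whole subsystem. */
#ifdef DEMO_CONFIG_PTP
struct demo_phc *demo_phc_register(int index);
#else
static inline struct demo_phc *demo_phc_register(int index)
{
	(void)index;
	return NULL; /* subsystem configured out: no clock, and not an error */
}
#endif

/* Hypothetical driver state; no per-driver PTP option is involved. */
struct demo_nic {
	struct demo_phc *clock;
	int has_phc;
};

static void demo_nic_init(struct demo_nic *nic)
{
	assert(nic != NULL);
	nic->clock = demo_phc_register(0);
	nic->has_phc = nic->clock != NULL;
}
```

Built without DEMO_CONFIG_PTP, demo_nic_init() simply records that no PHC is available; the driver code itself does not change.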

Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-20 Thread Thomas Graf

On 09/20/16 at 03:00pm, Tom Herbert wrote:
> +static inline int __xdp_hook_run(struct list_head *list_head,
> +  struct xdp_buff *xdp)
> +{
> + struct xdp_hook_ops *elem;
> + int ret = XDP_PASS;
> +
> + list_for_each_entry(elem, list_head, list) {
> + ret = elem->hook(elem->priv, xdp);
> + if (ret != XDP_PASS)
> + break;
> + }

Walking over a linear list? Really? :-) I thought this was supposed
to be fast, no compromises made.

Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-20 Thread Tom Herbert

On Tue, Sep 20, 2016 at 3:37 PM, Eric Dumazet  wrote:
> On Tue, 2016-09-20 at 15:00 -0700, Tom Herbert wrote:
>
>> diff --git a/net/core/xdp.c b/net/core/xdp.c
>> new file mode 100644
>> index 000..815ead8
>> --- /dev/null
>> +++ b/net/core/xdp.c
>> @@ -0,0 +1,211 @@
>> +/*
>> + * Kernel Connection Multiplexor
>> + *
>> + * Copyright (c) 2016 Tom Herbert 
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2
>> + * as published by the Free Software Foundation.
>> + */
>
>
> Too much copy/paste Tom :)
>
If this is your only complaint, Eric, I'll be quite impressed ;-)

>

Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-20 Thread Eric Dumazet

On Tue, 2016-09-20 at 15:00 -0700, Tom Herbert wrote:

> diff --git a/net/core/xdp.c b/net/core/xdp.c
> new file mode 100644
> index 000..815ead8
> --- /dev/null
> +++ b/net/core/xdp.c
> @@ -0,0 +1,211 @@
> +/*
> + * Kernel Connection Multiplexor
> + *
> + * Copyright (c) 2016 Tom Herbert 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2
> + * as published by the Free Software Foundation.
> + */


Too much copy/paste Tom :)

Re: [RFC PATCH] xdp: separate struct xdp_prog as container for bpf_prog

2016-09-20 Thread Tom Herbert

On Tue, Sep 20, 2016 at 12:55 PM, Jesper Dangaard Brouer
 wrote:
> Currently the XDP program is simply a bpf_prog pointer.  While it
> is good for simplicity, it is limiting extendability for upcoming
> features.
>
Hi Jesper,

Can you take a look at (or try) the RFC patches I just posted to
generalize XDP? I believe they should subsume most of what you're
doing here!

Thanks,
Tom
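Jesper's quoted RFC argues that the extra dereference introduced by a struct xdp_prog container is cheap because it happens once per poll invocation (up to 64 packets) rather than once per packet. That amortization can be sketched as follows; the container layout, type names and the budget constant are assumptions for illustration, not the RFC's actual structure.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NAPI_BUDGET 64 /* packets per poll call, as assumed in the RFC text */

/* Hypothetical program and container types. */
struct demo_bpf_prog {
	unsigned int (*run)(const void *pkt);
};

struct demo_xdp_prog {
	struct demo_bpf_prog *prog; /* the actual program */
	int port_index;             /* example of extra per-program metadata */
};

struct demo_ring {
	struct demo_xdp_prog *xdp_prog;
	unsigned long container_loads; /* how often the container was read */
};

/* One poll call: read the container once, then reuse the cached program
 * pointer for every packet handled in this call (at most budget). */
static int demo_poll(struct demo_ring *ring, const void *const *pkts,
		     int npkts, int budget)
{
	struct demo_bpf_prog *prog = NULL;
	int done;

	assert(ring != NULL && budget > 0);
	if (ring->xdp_prog)
		prog = ring->xdp_prog->prog;
	ring->container_loads++;

	for (done = 0; done < npkts && done < budget; done++) {
		if (prog)
			(void)prog->run(pkts[done]);
	}
	return done;
}

/* Example program that just counts invocations. */
static unsigned long demo_runs;

static unsigned int demo_count_run(const void *pkt)
{
	(void)pkt;
	demo_runs++;
	return 0;
}
```

Processing 128 packets takes two poll calls, so the container is read twice while the program runs 128 times.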

> Introducing a new struct xdp_prog, that can carry information
> related to the XDP program.  Notice this approach does not affect
> performance (tested and benchmarked), because the extra dereference
> for the eBPF program only happens once per 64 packets in the poll
> function.
>
> The features that need this is:
>
> * Multi-port TX:
>   Need to know own port index and port lookup table.
>
> * XDP program per RX queue:
>   Need setup info about program type, global or specific, due to
>   replace semantics.
>
> * Capabilities negotiation:
>   Need to store information about features program want to use,
>   in-order to validate this.
>
> I do realize this new struct xdp_prog features cannot go into the
> kernel before one of the three users of the struct is also implemented.
>
> Signed-off-by: Jesper Dangaard Brouer 
> ---
>  drivers/net/ethernet/mellanox/mlx4/en_netdev.c |   12 +++---
>  drivers/net/ethernet/mellanox/mlx4/en_rx.c |   10 +++--
>  drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |2 -
>  include/linux/filter.h |   14 +++
>  include/linux/netdevice.h  |2 -
>  net/core/dev.c |   15 +--
>  net/core/filter.c  |   50 
> 
>  7 files changed, 89 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index 62516f8369ba..f86f65b170f7 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -2622,11 +2622,11 @@ static int mlx4_en_set_tx_maxrate(struct net_device *dev, int queue_index, u32 m
> return err;
>  }
>
> -static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
> +static int mlx4_xdp_set(struct net_device *dev, struct xdp_prog *prog)
>  {
> struct mlx4_en_priv *priv = netdev_priv(dev);
> struct mlx4_en_dev *mdev = priv->mdev;
> -   struct bpf_prog *old_prog;
> +   struct xdp_prog *old_prog;
> int xdp_ring_num;
> int port_up = 0;
> int err;
> @@ -2639,7 +2639,7 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>  */
> if (priv->xdp_ring_num == xdp_ring_num) {
> if (prog) {
> -   prog = bpf_prog_add(prog, priv->rx_ring_num - 1);
> +   prog = xdp_prog_add(prog, priv->rx_ring_num);
> if (IS_ERR(prog))
> return PTR_ERR(prog);
> }
> @@ -2650,7 +2650,7 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
> lockdep_is_held(&mdev->state_lock));
> rcu_assign_pointer(priv->rx_ring[i]->xdp_prog, prog);
> if (old_prog)
> -   bpf_prog_put(old_prog);
> +   xdp_prog_put(old_prog);
> }
> mutex_unlock(&mdev->state_lock);
> return 0;
> @@ -2669,7 +2669,7 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
> }
>
> if (prog) {
> -   prog = bpf_prog_add(prog, priv->rx_ring_num - 1);
> +   prog = xdp_prog_add(prog, priv->rx_ring_num);
> if (IS_ERR(prog))
> return PTR_ERR(prog);
> }
> @@ -2690,7 +2690,7 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
> lockdep_is_held(&mdev->state_lock));
> rcu_assign_pointer(priv->rx_ring[i]->xdp_prog, prog);
> if (old_prog)
> -   bpf_prog_put(old_prog);
> +   xdp_prog_put(old_prog);
> }
>
> if (port_up) {
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index c46355bce613..e1182879ea6f 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -535,13 +535,13 @@ void mlx4_en_destroy_rx_ring(struct mlx4_en_priv *priv,
>  {
> struct mlx4_en_dev *mdev = priv->mdev;
> struct mlx4_en_rx_ring *ring = *pring;
> -   struct bpf_prog *old_prog;
> +   struct xdp_prog *old_prog;
>
> old_prog = rcu_dereference_protected(
> ring->xdp_prog,
>
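The quoted mlx4 hunks hand one program reference to each RX ring and drop the previous program's reference as each ring is switched over. A minimal single-threaded C sketch of that bookkeeping follows. It assumes the pre-patch convention visible in the "-" lines, where the caller already holds one reference and bpf_prog_add() takes rx_ring_num - 1 more. RCU is left out, and struct prog_ref, prog_ref_add(), prog_ref_put() and swap_ring_progs() are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted XDP program (no RCU, one thread). */
struct prog_ref {
	int refcnt;
	bool released;          /* set when the last reference is dropped */
};

struct rx_ring_sketch {
	struct prog_ref *xdp_prog;
};

static void prog_ref_add(struct prog_ref *p, int n)
{
	p->refcnt += n;
}

static void prog_ref_put(struct prog_ref *p)
{
	if (--p->refcnt == 0)
		p->released = true;
}

/* Install new_prog (or NULL to detach) on every ring. The caller already
 * holds one reference to new_prog, so only nrings - 1 more are taken,
 * leaving each ring with exactly one. */
static void swap_ring_progs(struct rx_ring_sketch *rings, int nrings,
			    struct prog_ref *new_prog)
{
	if (new_prog)
		prog_ref_add(new_prog, nrings - 1);

	for (int i = 0; i < nrings; i++) {
		struct prog_ref *old = rings[i].xdp_prog;

		rings[i].xdp_prog = new_prog;
		if (old)
			prog_ref_put(old);  /* drop this ring's old reference */
	}
}
```

Swapping program a for program b on four rings leaves b with four references and releases a once every ring has moved over.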

[PATCH RFC] brcmfmac: stop netif queue when waiting for packets transmission

2016-09-20 Thread Rafał Miłecki

From: Rafał Miłecki 

Sending a new key to the firmware should be done without any 802.1x
packets pending. Currently brcmfmac has very trivial code waiting for
that condition and it doesn't seem to be enough.

We should stop netif from sending any extra packets in order to:
1) Make sure new 802.1x packets won't be coming over and over
2) Avoid a race with netif providing a new packet right after our
   waiting code

Another solution would be to accept only non-802.1x packets. This would
require enqueuing all packets and hacking brcmf_fws_dequeue_worker to
dequeue only non-802.1x ones, but that would most likely result in overly
hacky code.
---
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
index 201a980..1791060 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
@@ -471,11 +471,14 @@ send_key_to_dongle(struct brcmf_if *ifp, struct brcmf_wsec_key *key)
 
convert_key_from_CPU(key, &key_le);
 
+   netif_stop_queue(ifp->ndev);
brcmf_netdev_wait_pend8021x(ifp);
 
err = brcmf_fil_bsscfg_data_set(ifp, "wsec_key", &key_le,
sizeof(key_le));
 
+   netif_start_queue(ifp->ndev);
+
if (err)
brcmf_err("wsec_key error (%d)\n", err);
return err;
-- 
2.9.3

[PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-20 Thread Tom Herbert

This patch creates an infrastructure for registering and running code at
XDP hooks in drivers. This is based on the original XDP/BPF and borrows
heavily from the techniques used by netfilter to make generic nfhooks.

An XDP hook is defined by the xdp_hook_ops structure, which contains the
ops of an XDP hook. A pointer to this structure is passed into the XDP
register function to set up a hook. The XDP register function mallocs
its own xdp_hook_ops structure and copies the values from the
xdp_hook_ops passed in. The register function also stores the pointer
value of the xdp_hook_ops argument; this pointer is used in subsequent
calls to XDP to identify the registered hook.

The interface is defined in net/xdp.h. This includes the definition of
xdp_hook_ops, functions to register and unregister hook ops on a device
or individual instances of napi, and xdp_hook_run that is called by
drivers to run the hooks.
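The registration scheme described above (copy the caller's ops into a privately allocated entry and keep the caller's pointer as the lookup key) can be sketched in user-space C. Everything here is hypothetical: hook_ops_sketch, hook_register() and hook_unregister() only stand in for the interface defined in net/xdp.h, and locking, RCU and per-napi lists are omitted.

```c
#include <assert.h>
#include <stdlib.h>

struct xdp_buff_sketch {
	void *data;
	void *data_end;
};

/* Caller-visible ops, loosely following the hook/priv/put_priv fields. */
struct hook_ops_sketch {
	unsigned int (*hook)(const void *priv, struct xdp_buff_sketch *xdp);
	void *priv;
	void (*put_priv)(void *priv);
};

/* Registered entry: a private copy plus the caller's pointer as the key. */
struct hook_entry {
	struct hook_ops_sketch ops;
	const struct hook_ops_sketch *key;
	struct hook_entry *next;
};

static int hook_register(struct hook_entry **list,
			 const struct hook_ops_sketch *reg)
{
	struct hook_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->ops = *reg;          /* copy the values passed in */
	e->key = reg;           /* remember the caller's pointer */
	e->next = *list;
	*list = e;
	return 0;
}

static int hook_unregister(struct hook_entry **list,
			   const struct hook_ops_sketch *reg)
{
	for (struct hook_entry **pp = list; *pp; pp = &(*pp)->next) {
		struct hook_entry *e = *pp;

		if (e->key != reg)
			continue;
		*pp = e->next;  /* unlink the entry found by key */
		if (e->ops.put_priv)
			e->ops.put_priv(e->ops.priv);
		free(e);
		return 0;
	}
	return -1;              /* reg was never registered */
}
```

Because the entry holds a copy, later changes to the caller's structure do not affect the registered hook; the caller's pointer only serves to find the entry again.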

Signed-off-by: Tom Herbert 
---
 include/linux/filter.h  |   6 +-
 include/linux/netdev_features.h |   3 +-
 include/linux/netdevice.h   |  11 ++
 include/net/xdp.h   | 218 
 include/uapi/linux/bpf.h|  20 
 include/uapi/linux/xdp.h|  24 +
 net/core/Makefile   |   2 +-
 net/core/dev.c  |   4 +
 net/core/xdp.c  | 211 ++
 9 files changed, 472 insertions(+), 27 deletions(-)
 create mode 100644 include/net/xdp.h
 create mode 100644 include/uapi/linux/xdp.h
 create mode 100644 net/core/xdp.c

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 1f09c52..2a26133 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -16,6 +16,7 @@
 #include 
 
 #include 
+#include 
 
 #include 
 
@@ -432,11 +433,6 @@ struct bpf_skb_data_end {
void *data_end;
 };
 
-struct xdp_buff {
-   void *data;
-   void *data_end;
-};
-
 /* compute the linear packet data range [data, data_end) which
  * will be accessed by cls_bpf and act_bpf programs
  */
diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 9c6c8ef..697fdea 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -72,8 +72,8 @@ enum {
NETIF_F_HW_VLAN_STAG_FILTER_BIT,/* Receive filtering on VLAN STAGs */
NETIF_F_HW_L2FW_DOFFLOAD_BIT,   /* Allow L2 Forwarding in Hardware */
NETIF_F_BUSY_POLL_BIT,  /* Busy poll */
-
NETIF_F_HW_TC_BIT,  /* Offload TC infrastructure */
+   NETIF_F_XDP_BIT,/* Support XDP interface */
 
/*
 * Add your fresh new feature above and remember to update
@@ -136,6 +136,7 @@ enum {
 #define NETIF_F_HW_L2FW_DOFFLOAD   __NETIF_F(HW_L2FW_DOFFLOAD)
 #define NETIF_F_BUSY_POLL  __NETIF_F(BUSY_POLL)
 #define NETIF_F_HW_TC  __NETIF_F(HW_TC)
+#define NETIF_F_XDP  __NETIF_F(XDP)
 
 #define for_each_netdev_feature(mask_addr, bit)\
for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a10d8d1..f2b7d1b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -324,6 +324,7 @@ struct napi_struct {
struct sk_buff  *skb;
struct hrtimer  timer;
struct list_head        dev_list;
+   struct list_head        xdp_hook_list;
struct hlist_node       napi_hash_node;
unsigned int            napi_id;
 };
@@ -819,6 +820,14 @@ enum xdp_netdev_command {
 * return true if a program is currently attached and running.
 */
XDP_QUERY_PROG,
+   /* Initialize XDP in the device. Called the first time an XDP hook
+* hook is being set on the device.
+*/
+   XDP_DEV_INIT,
+   /* XDP is finished on the device. Called after the last XDP hook
+* has been removed from a device.
+*/
+   XDP_DEV_FINISH,
 };
 
 struct netdev_xdp {
@@ -1663,6 +1672,8 @@ struct net_device {
struct list_head        close_list;
struct list_head        ptype_all;
struct list_head        ptype_specific;
+   struct list_head        xdp_hook_list;
+   unsigned int            xdp_hook_cnt;
 
struct {
struct list_head upper;
diff --git a/include/net/xdp.h b/include/net/xdp.h
new file mode 100644
index 000..c01a44e
--- /dev/null
+++ b/include/net/xdp.h
@@ -0,0 +1,218 @@
+/*
+ * eXpress Data Path (XDP)
+ *
+ * Copyright (c) 2016 Tom Herbert 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation.
+ */
+
+#ifndef __NET_XDP_H_
+#define __NET_XDP_H_
+
+#include 
+#include 
+#include 
+
+/* XDP data structure.
+ *
+ * Fields:
+ *   data - pointer to first byte of

[PATCH RFC 2/3] mlx4: Change XDP/BPF to use generic XDP infrastructure

2016-09-20 Thread Tom Herbert

This patch changes the XDP-BPF implementation to use the generic
XDP infrastructure. This includes corresponding changes to the
Mellanox XDP code.

Signed-off-by: Tom Herbert 
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 64 --
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 25 --
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |  1 -
 include/linux/filter.h | 13 --
 net/core/dev.c | 40 +---
 net/core/filter.c  |  7 +--
 net/core/rtnetlink.c   | 16 +++
 7 files changed, 63 insertions(+), 103 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 62516f8..47990b7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2622,39 +2622,15 @@ static int mlx4_en_set_tx_maxrate(struct net_device *dev, int queue_index, u32 m
return err;
 }
 
-static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
+static int mlx4_xdp_make_tx_rings(struct net_device *dev, int xdp_ring_num)
 {
struct mlx4_en_priv *priv = netdev_priv(dev);
struct mlx4_en_dev *mdev = priv->mdev;
-   struct bpf_prog *old_prog;
-   int xdp_ring_num;
int port_up = 0;
int err;
-   int i;
-
-   xdp_ring_num = prog ? ALIGN(priv->rx_ring_num, MLX4_EN_NUM_UP) : 0;
 
-   /* No need to reconfigure buffers when simply swapping the
-* program for a new one.
-*/
-   if (priv->xdp_ring_num == xdp_ring_num) {
-   if (prog) {
-   prog = bpf_prog_add(prog, priv->rx_ring_num - 1);
-   if (IS_ERR(prog))
-   return PTR_ERR(prog);
-   }
-   mutex_lock(&mdev->state_lock);
-   for (i = 0; i < priv->rx_ring_num; i++) {
-   old_prog = rcu_dereference_protected(
-   priv->rx_ring[i]->xdp_prog,
-   lockdep_is_held(&mdev->state_lock));
-   rcu_assign_pointer(priv->rx_ring[i]->xdp_prog, prog);
-   if (old_prog)
-   bpf_prog_put(old_prog);
-   }
-   mutex_unlock(&mdev->state_lock);
+   if (priv->xdp_ring_num == xdp_ring_num)
return 0;
-   }
 
if (priv->num_frags > 1) {
en_err(priv, "Cannot set XDP if MTU requires multiple frags\n");
@@ -2668,12 +2644,6 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
return -EINVAL;
}
 
-   if (prog) {
-   prog = bpf_prog_add(prog, priv->rx_ring_num - 1);
-   if (IS_ERR(prog))
-   return PTR_ERR(prog);
-   }
-
mutex_lock(&mdev->state_lock);
if (priv->port_up) {
port_up = 1;
@@ -2684,15 +2654,6 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
netif_set_real_num_tx_queues(dev, priv->tx_ring_num -
priv->xdp_ring_num);
 
-   for (i = 0; i < priv->rx_ring_num; i++) {
-   old_prog = rcu_dereference_protected(
-   priv->rx_ring[i]->xdp_prog,
-   lockdep_is_held(&mdev->state_lock));
-   rcu_assign_pointer(priv->rx_ring[i]->xdp_prog, prog);
-   if (old_prog)
-   bpf_prog_put(old_prog);
-   }
-
if (port_up) {
err = mlx4_en_start_port(dev);
if (err) {
@@ -2706,23 +2667,18 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
return 0;
 }
 
-static bool mlx4_xdp_attached(struct net_device *dev)
+static int mlx4_xdp(struct net_device *dev, struct netdev_xdp *xdp)
 {
struct mlx4_en_priv *priv = netdev_priv(dev);
 
-   return !!priv->xdp_ring_num;
-}
-
-static int mlx4_xdp(struct net_device *dev, struct netdev_xdp *xdp)
-{
switch (xdp->command) {
-   case XDP_SETUP_PROG:
-   return mlx4_xdp_set(dev, xdp->prog);
-   case XDP_QUERY_PROG:
-   xdp->prog_attached = mlx4_xdp_attached(dev);
-   return 0;
+   case XDP_DEV_INIT:
+   return mlx4_xdp_make_tx_rings(dev,
+   ALIGN(priv->rx_ring_num, MLX4_EN_NUM_UP));
+   case XDP_DEV_FINISH:
+   return mlx4_xdp_make_tx_rings(dev, 0);
default:
-   return -EINVAL;
+   return 0;
}
 }
 
@@ -3210,7 +3166,7 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 
dev->vlan_features = dev->hw_features;
 
-   dev->hw_features |= NETIF_F_RXCSUM | NETIF_F_RXHASH;
+   dev->hw_features |= NETIF_F_RXCSUM |

[PATCH RFC 0/3] xdp: Generalize XDP

2016-09-20 Thread Tom Herbert

This patch set generalizes XDP by making the hooks in drivers
generic, in the same manner as nfhooks. This has a number of
advantages:

  - Allows alternative users of the XDP hooks other than the original
BPF
  - Allows a means to pipeline XDP programs together
  - Reduces the amount of code and complexity needed in drivers to
manage XDP
  - Provides a more structured environment that is extensible to new
features while being mostly transparent to the drivers 

The generic XDP infrastructure is based on how nfhooks works. The new
xdp_hook_ops structure contains callback functions and private data
structure that can be populated by the user of XDP. The hook ops are
registered either on a netdev or a napi (both maintain a list of XDP
hook ops). Allowing per-netdev ops makes management of XDP a lot simpler
when the intent is for the hook to apply to the whole driver (as is the
case with XDP_BPF so far). The downside is that we may need per napi
data (such as counters of returned actions).

The xdp_hook_ops contains three fields of interest. The "hook" field is
the function that is run for the hook. This takes a private data field
and the xdp_buff as arguments. "priv" is private data and "put_priv"
is a function called when XDP is done with the private data. In XDP_BPF
terminology the hook field is bpf_prog_run_xdp, "priv" is the xdp_prog,
and "put_priv" is bpf_prog_put.
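As an illustration of the hook/priv/put_priv split, the sketch below runs a list of hooks over one buffer. The action values match the existing enum xdp_action in include/uapi/linux/bpf.h; the chaining rule (run hooks in order, stop at the first verdict other than XDP_PASS) is an assumption, since the text does not say how pipelined hooks combine. run_hooks(), count_hook() and drop_short_hook() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Same values as enum xdp_action in include/uapi/linux/bpf.h. */
enum xdp_action { XDP_ABORTED = 0, XDP_DROP, XDP_PASS, XDP_TX };

struct xdp_buff {
	void *data;
	void *data_end;
};

struct hook_ops_sketch {
	unsigned int (*hook)(void *priv, struct xdp_buff *xdp);
	void *priv;                     /* e.g. the BPF program for XDP_BPF */
	void (*put_priv)(void *priv);   /* e.g. bpf_prog_put for XDP_BPF */
};

/* Assumed pipeline rule: first verdict other than XDP_PASS wins. */
static unsigned int run_hooks(const struct hook_ops_sketch *hooks, size_t n,
			      struct xdp_buff *xdp)
{
	for (size_t i = 0; i < n; i++) {
		unsigned int act = hooks[i].hook(hooks[i].priv, xdp);

		if (act != XDP_PASS)
			return act;
	}
	return XDP_PASS;
}

/* Example hook: count every packet seen; priv points to the counter. */
static unsigned int count_hook(void *priv, struct xdp_buff *xdp)
{
	(void)xdp;
	++*(unsigned int *)priv;
	return XDP_PASS;
}

/* Example hook: drop frames shorter than *(size_t *)priv bytes. */
static unsigned int drop_short_hook(void *priv, struct xdp_buff *xdp)
{
	size_t min_len = *(size_t *)priv;
	size_t len = (size_t)((char *)xdp->data_end - (char *)xdp->data);

	return len < min_len ? XDP_DROP : XDP_PASS;
}
```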

The meaning of ndo_xdp is also changed. There are two commands for this
ndo: XDP_DEV_INIT and XDP_DEV_FINISH. XDP_DEV_INIT is called the first
time an XDP hook is set on a device; this is primarily intended to
allow the device to initialize XDP (allocate the XDP TX queues, for
instance). XDP_DEV_FINISH is called when the last XDP hook is
removed from a driver so that the driver can clean up when XDP is done.
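The INIT/FINISH protocol amounts to reference counting on the number of registered hooks, which is what the xdp_hook_cnt field added to struct net_device suggests. A user-space sketch under that assumption follows; netdev_sketch, ndo_xdp_sketch() and the fail_init test knob are hypothetical, and the command names simply mirror the ones this RFC adds.

```c
#include <assert.h>

/* Command names mirror the RFC's additions to enum xdp_netdev_command. */
enum xdp_netdev_command {
	XDP_DEV_INIT,
	XDP_DEV_FINISH,
};

struct netdev_sketch {
	unsigned int xdp_hook_cnt;      /* hooks currently registered */
	int init_calls;
	int finish_calls;
	int fail_init;                  /* hypothetical knob: make INIT fail */
};

/* Stand-in for a driver's ndo_xdp (e.g. allocating/freeing XDP TX rings). */
static int ndo_xdp_sketch(struct netdev_sketch *dev,
			  enum xdp_netdev_command cmd)
{
	switch (cmd) {
	case XDP_DEV_INIT:
		if (dev->fail_init)
			return -1;
		dev->init_calls++;
		return 0;
	case XDP_DEV_FINISH:
		dev->finish_calls++;
		return 0;
	}
	return 0;
}

/* Only the first hook on a device triggers XDP_DEV_INIT. */
static int xdp_hook_added(struct netdev_sketch *dev)
{
	if (dev->xdp_hook_cnt == 0) {
		int err = ndo_xdp_sketch(dev, XDP_DEV_INIT);

		if (err)
			return err;     /* count stays at zero on failure */
	}
	dev->xdp_hook_cnt++;
	return 0;
}

/* Only removing the last hook triggers XDP_DEV_FINISH. */
static void xdp_hook_removed(struct netdev_sketch *dev)
{
	if (--dev->xdp_hook_cnt == 0)
		ndo_xdp_sketch(dev, XDP_DEV_FINISH);
}
```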

A new net feature, NETIF_F_XDP, is added so that a driver can indicate
that it supports XDP.

The primary modification to a driver to support XDP is that it calls
xdp_hook_run in the receive path (equivalent to bpf_prog_run in
previous XDP-BPF). The driver must deal with the four XDP return
actions XDP_PASS, XDP_DROP, XDP_TX, and XDP_ABORTED.
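A driver's handling of the verdict can be sketched as a small dispatch function that also keeps the kind of per-NAPI action counters mentioned earlier. The names napi_sketch, rx_handle_verdict() and enum rx_disposition are hypothetical, and mapping unknown verdicts to XDP_ABORTED is a choice made for this sketch. In a driver such as mlx4, a drop typically recycles the RX buffer, XDP_TX queues the frame on an XDP TX ring, and XDP_PASS builds an skb.

```c
#include <assert.h>

/* Same values as enum xdp_action in include/uapi/linux/bpf.h. */
enum xdp_action { XDP_ABORTED = 0, XDP_DROP, XDP_PASS, XDP_TX };

/* What the (hypothetical) driver does with the frame next. */
enum rx_disposition { RX_TO_STACK, RX_RECYCLE, RX_XMIT };

struct napi_sketch {
	unsigned long act_count[XDP_TX + 1];   /* per-NAPI action counters */
};

static enum rx_disposition rx_handle_verdict(struct napi_sketch *napi,
					     unsigned int act)
{
	if (act > XDP_TX)
		act = XDP_ABORTED;      /* sketch's choice for unknown verdicts */
	napi->act_count[act]++;

	switch (act) {
	case XDP_PASS:
		return RX_TO_STACK;     /* build an skb and hand it up */
	case XDP_TX:
		return RX_XMIT;         /* send back out of the same port */
	case XDP_ABORTED:               /* program error: treated as a drop */
	case XDP_DROP:
	default:
		return RX_RECYCLE;      /* reuse the RX buffer */
	}
}
```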

xdp.h contains the interface to register and manage XDP hooks.

Tested:

Created a simple hook that does XDP_PASS and verified that it works. A lot
more testing is needed for this.

Tom Herbert (3):
  xdp: Infrastructure to generalize XDP
  mlx4: Change XDP/BPF to use generic XDP infrastructure
  netdevice: Remove obsolete xdp_netdev_command

 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |  64 ++--
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |  25 ++-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |   1 -
 include/linux/filter.h |  19 +--
 include/linux/netdev_features.h|   3 +-
 include/linux/netdevice.h  |  24 ++-
 include/net/xdp.h  | 218 +
 include/uapi/linux/bpf.h   |  20 ---
 include/uapi/linux/xdp.h   |  24 +++
 net/core/Makefile  |   2 +-
 net/core/dev.c |  44 -
 net/core/filter.c  |   7 +-
 net/core/rtnetlink.c   |  16 +-
 net/core/xdp.c | 211 
 14 files changed, 534 insertions(+), 144 deletions(-)
 create mode 100644 include/net/xdp.h
 create mode 100644 include/uapi/linux/xdp.h
 create mode 100644 net/core/xdp.c

-- 
2.8.0.rc2

[PATCH RFC 3/3] netdevice: Remove obsolete xdp_netdev_command

2016-09-20 Thread Tom Herbert

Remove XDP_SETUP_PROG and XDP_QUERY_PROG as they should no longer be
needed.

Signed-off-by: Tom Herbert 
---
 include/linux/netdevice.h | 17 +
 1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f2b7d1b..9a545ab 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -808,18 +808,6 @@ struct tc_to_netdev {
  * to the netdevice through the xdp op.
  */
 enum xdp_netdev_command {
-   /* Set or clear a bpf program used in the earliest stages of packet
-* rx. The prog will have been loaded as BPF_PROG_TYPE_XDP. The callee
-* is responsible for calling bpf_prog_put on any old progs that are
-* stored. In case of error, the callee need not release the new prog
-* reference, but on success it takes ownership and must bpf_prog_put
-* when it is no longer used.
-*/
-   XDP_SETUP_PROG,
-   /* Check if a bpf program is set on the device.  The callee should
-* return true if a program is currently attached and running.
-*/
-   XDP_QUERY_PROG,
/* Initialize XDP in the device. Called the first time an XDP hook
 * hook is being set on the device.
 */
@@ -833,10 +821,7 @@ enum xdp_netdev_command {
 struct netdev_xdp {
enum xdp_netdev_command command;
union {
-   /* XDP_SETUP_PROG */
-   struct bpf_prog *prog;
-   /* XDP_QUERY_PROG */
-   bool prog_attached;
+   /* Command parameters */
};
 };
 
-- 
2.8.0.rc2

Re: [RFC] PCI: Allow sysfs control over totalvfs

2016-09-20 Thread Alexander Duyck

On Tue, Sep 20, 2016 at 8:49 AM, Yuval Mintz  wrote:
> [Sorry in advance if this was already discussed in the past]
>
> Some of the HW capable of SRIOV has resource limitations, where the
> PF and VFs resources are drawn from a common pool.
> In some cases, these limitations have to be considered early during
> chip initialization and can only be changed by tearing down the
> configuration and re-initializing.
> As a result, drivers for such HWs sometimes have to make unfavorable
> compromises where they reserve sufficient resources to accommodate
> the maximal number of VFs that can be created - at the expense of
> resources that could have been used by the PF.
>
> If users were able to provide 'hints' regarding the required number
> of VFs *prior* to driver attachment, then such compromises could be
> avoided. As we already have sysfs nodes that can be queried for the
> number of totalvfs, it makes sense to let the user reduce the number
> of said totalvfs using the same infrastructure.
> Then, we can have drivers supporting SRIOV take that value into account
> when deciding how many resources to reserve, allowing the PF to benefit
> from the difference between the configuration space value and the actual
> number needed by the user.
>
> Signed-off-by: Yuval Mintz 
> ---
>  drivers/pci/pci-sysfs.c | 28 +++-
>  1 file changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index bcd10c7..c1546f8 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -449,6 +449,30 @@ static ssize_t sriov_totalvfs_show(struct device *dev,
> return sprintf(buf, "%u\n", pci_sriov_get_totalvfs(pdev));
>  }
>
> +static ssize_t sriov_totalvfs_store(struct device *dev,
> +   struct device_attribute *attr,
> +   const char *buf, size_t count)
> +{
> +   struct pci_dev *pdev = to_pci_dev(dev);
> +   u16 max_vfs;
> +   int ret;
> +
> +   ret = kstrtou16(buf, 0, &max_vfs);
> +   if (ret < 0)
> +   return ret;
> +
> +   if (pdev->driver) {
> +   dev_info(&pdev->dev,
> +"Can't change totalvfs while driver is attached\n");
> +   return -EUSERS;
> +   }
> +
> +   ret = pci_sriov_set_totalvfs(pdev, max_vfs);
> +   if (ret)
> +   return ret;
> +
> +   return count;
> +}
>
>  static ssize_t sriov_numvfs_show(struct device *dev,
>  struct device_attribute *attr,
> @@ -516,7 +540,9 @@ static ssize_t sriov_numvfs_store(struct device *dev,
> return count;
>  }
>
> -static struct device_attribute sriov_totalvfs_attr = __ATTR_RO(sriov_totalvfs);
> +static struct device_attribute sriov_totalvfs_attr =
> +   __ATTR(sriov_totalvfs, (S_IRUGO|S_IWUSR|S_IWGRP),
> +  sriov_totalvfs_show, sriov_totalvfs_store);
>  static struct device_attribute sriov_numvfs_attr =
> __ATTR(sriov_numvfs, (S_IRUGO|S_IWUSR|S_IWGRP),
>sriov_numvfs_show, sriov_numvfs_store);

It would be useful to have an interface where you could increase the
number after you have decreased it.  With the interface as you have it
written that isn't an option, since pci_sriov_set_totalvfs is really
only meant to strip VFs if they cannot be supported because of something
such as a bus limitation due to ARI not being supported.

I really think that if you need something like this you might be
better off using something like dev-link, or just figuring out a way
to make your driver flexible enough to allow you to move resources
into and/or out of your PF interface if VFs are added or removed.  I
know in the case of the Intel parts we have to bounce the link when
SR-IOV is enabled because we actually go through and tear out the
queues and interrupts from the PF and then reassign all of them
between the PF and VFs before we bring the PF back up.

- Alex
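To put numbers on the trade-off described in the RFC: if PF and VF resources come from one shared pool and the driver must reserve for every VF that could be created, the PF keeps only what is left. The sketch assumes a single resource type (queues) and a fixed per-VF allotment; both are simplifications, and pf_queues_left() is a hypothetical helper.

```c
#include <assert.h>

/* Queues left for the PF after reserving per_vf queues for each of
 * reserved_vfs VFs from a shared pool (hypothetical model). */
static unsigned int pf_queues_left(unsigned int pool, unsigned int per_vf,
				   unsigned int reserved_vfs)
{
	unsigned long long vf_total = (unsigned long long)per_vf * reserved_vfs;

	return vf_total >= pool ? 0 : pool - (unsigned int)vf_total;
}
```

With a 64-queue pool and 4 queues per VF, reserving for a TotalVFs of 8 leaves the PF 32 queues, while a user hint of 2 VFs would leave it 56.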

[PATCH next 0/2] Rename WORD_TRUNC/ROUND macros and use them

2016-09-20 Thread Marcelo Ricardo Leitner

This patchset aims to rename these macros to a less confusing name, as
reported by David Laight and David Miller, and to update all remaining
places to make use of them, which turned out to be a single spot.

v2:
- fixed 2nd patch summary

Details on the specific changelogs.

Thanks!

Marcelo Ricardo Leitner (2):
  sctp: rename WORD_TRUNC/ROUND macros
  sctp: make use of WORD_TRUNC macro

 include/net/sctp/sctp.h  | 10 +-
 net/netfilter/xt_sctp.c  |  2 +-
 net/sctp/associola.c |  2 +-
 net/sctp/chunk.c | 13 +++--
 net/sctp/input.c |  8 
 net/sctp/inqueue.c   |  2 +-
 net/sctp/output.c| 12 ++--
 net/sctp/sm_make_chunk.c | 26 +-
 net/sctp/sm_statefuns.c  |  6 +++---
 net/sctp/transport.c |  4 ++--
 net/sctp/ulpevent.c  |  4 ++--
 11 files changed, 45 insertions(+), 44 deletions(-)

-- 
2.7.4

[PATCH next v2 1/2] sctp: rename WORD_TRUNC/ROUND macros

2016-09-20 Thread Marcelo Ricardo Leitner

To something more meaningful these days, especially because these macros
work on packet headers or lengths, which are tied not to any CPU arch
but to the protocol itself.

So, WORD_TRUNC becomes SCTP_TRUNC4 and WORD_ROUND becomes SCTP_ALIGN4.
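Both macros are plain bit operations on the length value. Copied into a stand-alone file (sctp_pad_len() is a hypothetical helper, not part of the patch), they behave as follows; for example, a 17-byte chunk is padded to 20 bytes.

```c
#include <assert.h>

/* Round an int up to the next multiple of 4. */
#define SCTP_ALIGN4(s) (((s)+3)&~3)
/* Truncate to the previous multiple of 4. */
#define SCTP_TRUNC4(s) ((s)&~3)

/* Padding bytes needed after a chunk of the given length (hypothetical). */
static int sctp_pad_len(int len)
{
	return SCTP_ALIGN4(len) - len;
}
```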

Reported-by: David Laight 
Reported-by: David Miller 
Signed-off-by: Marcelo Ricardo Leitner 
---
 include/net/sctp/sctp.h  | 10 +-
 net/netfilter/xt_sctp.c  |  2 +-
 net/sctp/associola.c |  2 +-
 net/sctp/chunk.c |  6 +++---
 net/sctp/input.c |  8 
 net/sctp/inqueue.c   |  2 +-
 net/sctp/output.c| 12 ++--
 net/sctp/sm_make_chunk.c | 26 +-
 net/sctp/sm_statefuns.c  |  6 +++---
 net/sctp/transport.c |  4 ++--
 net/sctp/ulpevent.c  |  4 ++--
 11 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 632e205ca54bfe85124753e09445251056e19aa7..bfc97d28a857c338403bda81ee8ef6897257bf9f 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -83,9 +83,9 @@
 #endif
 
 /* Round an int up to the next multiple of 4.  */
-#define WORD_ROUND(s) (((s)+3)&~3)
+#define SCTP_ALIGN4(s) (((s)+3)&~3)
 /* Truncate to the previous multiple of 4.  */
-#define WORD_TRUNC(s) ((s)&~3)
+#define SCTP_TRUNC4(s) ((s)&~3)
 
 /*
  * Function declarations.
@@ -433,7 +433,7 @@ static inline int sctp_frag_point(const struct sctp_association *asoc, int pmtu)
if (asoc->user_frag)
frag = min_t(int, frag, asoc->user_frag);
 
-   frag = WORD_TRUNC(min_t(int, frag, SCTP_MAX_CHUNK_LEN));
+   frag = SCTP_TRUNC4(min_t(int, frag, SCTP_MAX_CHUNK_LEN));
 
return frag;
 }
@@ -462,7 +462,7 @@ _sctp_walk_params((pos), (chunk), ntohs((chunk)->chunk_hdr.length), member)
 for (pos.v = chunk->member;\
  pos.v <= (void *)chunk + end - ntohs(pos.p->length) &&\
  ntohs(pos.p->length) >= sizeof(sctp_paramhdr_t);\
- pos.v += WORD_ROUND(ntohs(pos.p->length)))
+ pos.v += SCTP_ALIGN4(ntohs(pos.p->length)))
 
 #define sctp_walk_errors(err, chunk_hdr)\
 _sctp_walk_errors((err), (chunk_hdr), ntohs((chunk_hdr)->length))
@@ -472,7 +472,7 @@ for (err = (sctp_errhdr_t *)((void *)chunk_hdr + \
sizeof(sctp_chunkhdr_t));\
  (void *)err <= (void *)chunk_hdr + end - ntohs(err->length) &&\
  ntohs(err->length) >= sizeof(sctp_errhdr_t); \
- err = (sctp_errhdr_t *)((void *)err + WORD_ROUND(ntohs(err->length
+ err = (sctp_errhdr_t *)((void *)err + SCTP_ALIGN4(ntohs(err->length
 
 #define sctp_walk_fwdtsn(pos, chunk)\
 _sctp_walk_fwdtsn((pos), (chunk), ntohs((chunk)->chunk_hdr->length) - sizeof(struct sctp_fwdtsn_chunk))
diff --git a/net/netfilter/xt_sctp.c b/net/netfilter/xt_sctp.c
index ef36a56a02c6881c58296b2bf45c4b99d3836456..3d23ee5e72a93157ab8bd84d41aa14aa5acf5c4c 100644
--- a/net/netfilter/xt_sctp.c
+++ b/net/netfilter/xt_sctp.c
@@ -68,7 +68,7 @@ match_packet(const struct sk_buff *skb,
 ++i, offset, sch->type, htons(sch->length),
 sch->flags);
 #endif
-   offset += WORD_ROUND(ntohs(sch->length));
+   offset += SCTP_ALIGN4(ntohs(sch->length));
 
pr_debug("skb->len: %d\toffset: %d\n", skb->len, offset);
 
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 1c23060c41a669f38f4e2d244cfb61c85a522be6..f10d3397f917986d25240fc42f6a33ae8049e7b7 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -1408,7 +1408,7 @@ void sctp_assoc_sync_pmtu(struct sock *sk, struct sctp_association *asoc)
transports) {
if (t->pmtu_pending && t->dst) {
sctp_transport_update_pmtu(sk, t,
-  WORD_TRUNC(dst_mtu(t->dst)));
+  SCTP_TRUNC4(dst_mtu(t->dst)));
t->pmtu_pending = 0;
}
if (!pmtu || (t->pathmtu < pmtu))
diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index af9cc8055465b18e9754a5542fc7bd43f9dad240..86be257c9881a9bf404a1be010fdcd1c55c8b89a 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -208,8 +208,8 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
struct sctp_hmac *hmac_desc = sctp_auth_asoc_get_hmac(asoc);
 
if (hmac_desc)
-   max_data -= WORD_ROUND(sizeof(sctp_auth_chunk_t) +
-   hmac_desc->hmac_len);
+   max_data -= SCTP_ALIGN4(sizeof(sctp_auth_chunk_t) +
+   hmac_desc->hmac_len);
}
 
/* Now, check if we need to reduce our max */
@@ -229,7 +229,7 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
asoc->outqueue.out_qlen == 0 &&

[PATCH next v2 2/2] sctp: make use of SCTP_TRUNC4 macro

2016-09-20 Thread Marcelo Ricardo Leitner

And avoid the usage of '&~3'. This is the last place still not using
the macro.
Also break the line to make it easier to read.

Signed-off-by: Marcelo Ricardo Leitner 
---
When I checked it the other day, I thought I had already applied this
patch, but I hadn't.

v2: updated patch summary

 net/sctp/chunk.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index 86be257c9881a9bf404a1be010fdcd1c55c8b89a..b0089989473eee65b257a11f29c353f39fe3c602 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -195,9 +195,10 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
/* This is the biggest possible DATA chunk that can fit into
 * the packet
 */
-   max_data = (asoc->pathmtu -
-   sctp_sk(asoc->base.sk)->pf->af->net_header_len -
-   sizeof(struct sctphdr) - sizeof(struct sctp_data_chunk)) & ~3;
+   max_data = asoc->pathmtu -
+  sctp_sk(asoc->base.sk)->pf->af->net_header_len -
+  sizeof(struct sctphdr) - sizeof(struct sctp_data_chunk);
+   max_data = SCTP_TRUNC4(max_data);
 
max = asoc->frag_point;
/* If the the peer requested that we authenticate DATA chunks
-- 
2.7.4
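As a worked example of the max_data computation in the patch above, the sketch assumes a 20-byte IPv4 header (40 bytes for IPv6), a 12-byte SCTP common header and a 16-byte DATA chunk header; sctp_max_data() is a hypothetical stand-alone helper. With a 1500-byte path MTU over IPv4 the result is 1452 bytes, already a multiple of 4, while an odd MTU such as 1497 gives 1449, which is truncated to 1448.

```c
#include <assert.h>

#define SCTP_TRUNC4(s) ((s)&~3)

/* Header sizes assumed for illustration. */
enum {
	IPV4_HDR_LEN = 20,              /* no IP options */
	IPV6_HDR_LEN = 40,              /* no extension headers */
	SCTP_COMMON_HDR_LEN = 12,       /* struct sctphdr */
	SCTP_DATA_CHUNK_HDR_LEN = 16,   /* struct sctp_data_chunk */
};

/* Largest DATA payload that fits in one packet, truncated to 4 bytes. */
static int sctp_max_data(int pathmtu, int net_header_len)
{
	int max_data = pathmtu - net_header_len - SCTP_COMMON_HDR_LEN -
		       SCTP_DATA_CHUNK_HDR_LEN;

	return SCTP_TRUNC4(max_data);
}
```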

Re: [PATCH net-next 1/3] net: ethernet: mediatek: add extension of phy-mode for TRGMII

2016-09-20 Thread Florian Fainelli

On 09/20/2016 12:59 AM, sean.w...@mediatek.com wrote:
> From: Sean Wang 
> 
> Add PHY mode "trgmii" as an extension of the PHY interface
> operation modes. Since TRGMII is compatible with RGMII, the
> extended mode has no real effect on the target MAC and PHY;
> it is used to indicate whether the current MAC is connected
> to an internal switch or to an external PHY, according to the
> board configuration, so that the corresponding setup can then
> be performed on the TRGMII hardware module.

Based on my googling, it seems like Turbo RGMII is a Mediatek-specific
thing for now, but this could become standard and used by other vendors
at some point, so I would be inclined to just extend the phy-mode
property to support trgmii as another interface type.

If you do so, do you also mind proposing an update to the Device Tree
specification:

https://www.devicetree.org/specifications/

Thanks!
-- 
Florian

Re: [PATCH next 2/2] sctp: make use of WORD_TRUNC macro

2016-09-20 Thread Marcelo Ricardo Leitner

On Tue, Sep 20, 2016 at 05:24:20PM -0300, Marcelo Ricardo Leitner wrote:
> + max_data = SCTP_TRUNC4(max_data);

Will post a v2 to fix the subject.

[PATCH next 1/2] sctp: fix the handling of SACK Gap Ack blocks

2016-09-20 Thread Marcelo Ricardo Leitner

sctp_acked() is using 32-bit arithmetic on 16-bit variables, via the
TSN_lte() macros, which is weird and confusing.

Once the offset to ctsn is calculated, all wrapping is already handled,
and thus to verify the Gap Ack blocks we can just use plain
less-than-or-equal and greater-than-or-equal checks.

Also, rename the gap variable to tsn_offset, which is more meaningful,
as it doesn't point to any gap at all.

Even so, I don't think this discrepancy resulted in any practical bug.

This patch is a preparation for the next one, which will introduce
typecheck() for TSN_lte() macros and would cause a compile error here.

Suggested-by: David Laight 
Reported-by: David Laight 
Signed-off-by: Marcelo Ricardo Leitner 
---
 net/sctp/outqueue.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 8c3f446d965c030376d0cfafd2c64b9a946d2cc1..3ec6da8bbb5360187007935838717885c97f9e91 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -1719,7 +1719,7 @@ static int sctp_acked(struct sctp_sackhdr *sack, __u32 tsn)
 {
int i;
sctp_sack_variable_t *frags;
-   __u16 gap;
+   __u16 tsn_offset, blocks;
__u32 ctsn = ntohl(sack->cum_tsn_ack);
 
if (TSN_lte(tsn, ctsn))
@@ -1738,10 +1738,11 @@ static int sctp_acked(struct sctp_sackhdr *sack, __u32 tsn)
 */
 
frags = sack->variable;
-   gap = tsn - ctsn;
-   for (i = 0; i < ntohs(sack->num_gap_ack_blocks); ++i) {
-   if (TSN_lte(ntohs(frags[i].gab.start), gap) &&
-   TSN_lte(gap, ntohs(frags[i].gab.end)))
+   blocks = ntohs(sack->num_gap_ack_blocks);
+   tsn_offset = tsn - ctsn;
+   for (i = 0; i < blocks; ++i) {
+   if (tsn_offset >= ntohs(frags[i].gab.start) &&
+   tsn_offset <= ntohs(frags[i].gab.end))
goto pass;
}
 
-- 
2.7.4
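The logic of the patched sctp_acked() can be modelled in user space as follows. Gap Ack Block start/end values are taken to be already in host byte order, and tsn_offset is kept in 32 bits here rather than the patch's __u16, so an offset beyond the 16-bit range simply fails to match. gap_block, tsn_lte() and sack_acks_tsn() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Gap Ack Block in host byte order: offsets relative to the cumulative
 * TSN ack. */
struct gap_block {
	uint16_t start;
	uint16_t end;
};

/* Serial-number comparison: s is at or before t, modulo 2^32. */
static bool tsn_lte(uint32_t s, uint32_t t)
{
	return (int32_t)(s - t) <= 0;
}

static bool sack_acks_tsn(uint32_t ctsn, const struct gap_block *blocks,
			  uint16_t nblocks, uint32_t tsn)
{
	uint32_t tsn_offset;

	if (tsn_lte(tsn, ctsn))
		return true;            /* covered by the cumulative ack */

	/* The unsigned subtraction already handles wrapping, so plain
	 * comparisons are enough from here on. */
	tsn_offset = tsn - ctsn;
	for (uint16_t i = 0; i < nblocks; ++i)
		if (tsn_offset >= blocks[i].start &&
		    tsn_offset <= blocks[i].end)
			return true;
	return false;
}
```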

[PATCH next 0/2] improvements to SSN and TSN handling

2016-09-20 Thread Marcelo Ricardo Leitner

The first patch fixes a potential issue, noticed by David Laight and made
visible by the second patch, and is a preparation for that patch.

The second patch changes how SSN, TSN and ASCONF serials are compared, so
that they can use typecheck() and behave more like the time_before() macro.

Marcelo Ricardo Leitner (2):
  sctp: fix the handling of SACK Gap Ack blocks
  sctp: improve how SSN, TSN and ASCONF serial are compared

 include/net/sctp/sm.h | 94 ++-
 net/sctp/outqueue.c   | 11 +++---
 2 files changed, 24 insertions(+), 81 deletions(-)

-- 
2.7.4

[PATCH next 2/2] sctp: improve how SSN, TSN and ASCONF serial are compared

2016-09-20 Thread Marcelo Ricardo Leitner

Make it similar to time_before() macros:
- easier to understand
- make use of typecheck() to avoid working on unexpected variable types
  (this made the issue fixed by the previous patch visible)
- for _[lg]te versions, slightly faster, as the compiler used to generate
  a sequence of cmp/je/cmp/js instructions and now it's sub/test/jle
  (for _lte):

Before, for sctp_outq_sack:
if (primary->cacc.changeover_active) {
1f01:   80 b9 84 02 00 00 00    cmpb   $0x0,0x284(%rcx)
1f08:   74 6e                   je     1f78
u8 clear_cycling = 0;

if (TSN_lte(primary->cacc.next_tsn_at_change, sack_ctsn)) {
1f0a:   8b 81 80 02 00 00       mov    0x280(%rcx),%eax
return ((s) - (t)) & TSN_SIGN_BIT;
}

static inline int TSN_lte(__u32 s, __u32 t)
{
return ((s) == (t)) || (((s) - (t)) & TSN_SIGN_BIT);
1f10:   8b 7d bc                mov    -0x44(%rbp),%edi
1f13:   39 c7                   cmp    %eax,%edi
1f15:   74 25                   je     1f3c
1f17:   39 f8                   cmp    %edi,%eax
1f19:   78 21                   js     1f3c
primary->cacc.changeover_active = 0;

After:
if (primary->cacc.changeover_active) {
1ee7:   80 b9 84 02 00 00 00    cmpb   $0x0,0x284(%rcx)
1eee:   74 73                   je     1f63
u8 clear_cycling = 0;

if (TSN_lte(primary->cacc.next_tsn_at_change, sack_ctsn)) {
1ef0:   8b 81 80 02 00 00       mov    0x280(%rcx),%eax
1ef6:   2b 45 b4                sub    -0x4c(%rbp),%eax
1ef9:   85 c0                   test   %eax,%eax
1efb:   7e 26                   jle    1f23
primary->cacc.changeover_active = 0;

*_lt() generated pretty much the same code.
Tested with gcc (GCC) 6.1.1 20160621.

This patch also removes SSN_lte, as it is not used, and cleans up some
comments.

Signed-off-by: Marcelo Ricardo Leitner 
---
 include/net/sctp/sm.h | 94 ++-
 1 file changed, 18 insertions(+), 76 deletions(-)

diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h
index bafe2a0ab9085f24e17038516c55c00cfddd02f4..ca6c971dd74aede829d4512ddf71006520c78f47 100644
--- a/include/net/sctp/sm.h
+++ b/include/net/sctp/sm.h
@@ -307,85 +307,27 @@ static inline __u16 sctp_data_size(struct sctp_chunk *chunk)
 }
 
 /* Compare two TSNs */
+#define TSN_lt(a,b)\
+   (typecheck(__u32, a) && \
+typecheck(__u32, b) && \
+((__s32)((a) - (b)) < 0))
 
-/* RFC 1982 - Serial Number Arithmetic
- *
- * 2. Comparison
- *  Then, s1 is said to be equal to s2 if and only if i1 is equal to i2,
- *  in all other cases, s1 is not equal to s2.
- *
- * s1 is said to be less than s2 if, and only if, s1 is not equal to s2,
- * and
- *
- *  (i1 < i2 and i2 - i1 < 2^(SERIAL_BITS - 1)) or
- *  (i1 > i2 and i1 - i2 > 2^(SERIAL_BITS - 1))
- *
- * s1 is said to be greater than s2 if, and only if, s1 is not equal to
- * s2, and
- *
- *  (i1 < i2 and i2 - i1 > 2^(SERIAL_BITS - 1)) or
- *  (i1 > i2 and i1 - i2 < 2^(SERIAL_BITS - 1))
- */
-
-/*
- * RFC 2960
- *  1.6 Serial Number Arithmetic
- *
- * Comparisons and arithmetic on TSNs in this document SHOULD use Serial
- * Number Arithmetic as defined in [RFC1982] where SERIAL_BITS = 32.
- */
-
-enum {
-   TSN_SIGN_BIT = (1<<31)
-};
-
-static inline int TSN_lt(__u32 s, __u32 t)
-{
-   return ((s) - (t)) & TSN_SIGN_BIT;
-}
-
-static inline int TSN_lte(__u32 s, __u32 t)
-{
-   return ((s) == (t)) || (((s) - (t)) & TSN_SIGN_BIT);
-}
+#define TSN_lte(a,b)   \
+   (typecheck(__u32, a) && \
+typecheck(__u32, b) && \
+((__s32)((a) - (b)) <= 0))
 
 /* Compare two SSNs */
-
-/*
- * RFC 2960
- *  1.6 Serial Number Arithmetic
- *
- * Comparisons and arithmetic on Stream Sequence Numbers in this document
- * SHOULD use Serial Number Arithmetic as defined in [RFC1982] where
- * SERIAL_BITS = 16.
- */
-enum {
-   SSN_SIGN_BIT = (1<<15)
-};
-
-static inline int SSN_lt(__u16 s, __u16 t)
-{
-   return ((s) - (t)) & SSN_SIGN_BIT;
-}
-
-static inline int SSN_lte(__u16 s, __u16 t)
-{
-   return ((s) == (t)) || (((s) - (t)) & SSN_SIGN_BIT);
-}
-
-/*
- * ADDIP 3.1.1
- * The valid range of Serial Number is from 0 to 4294967295 (2**32 - 1). Serial
- * Numbers wrap back to 0 after reaching 4294967295.
- */
-enum {
-   ADDIP_SERIAL_SIGN_BIT = (1<<31)
-};
-
-static inline int ADDIP_SERIAL_gte(__u32 s, __u32 t)
-{
-   return ((s) == (t)) || (((t) - (s)) & ADDIP_SERIAL_SIGN_BIT);
-}
+#define SSN_lt(a,b)\
+   (typecheck(__u16, a) && \
+typecheck(__u16, b) && \
+((__s16)((a) - (b)) < 0))
+
+/* ADDIP 3.1.1 */
+#define ADDIP_SERIAL_gte(a,b)  \
+   (typecheck(__u32, a) && \
+

Re: [PATCH v3] net: ip, diag -- Add diag interface for raw sockets

2016-09-20 Thread Cyrill Gorcunov

On Fri, Sep 16, 2016 at 11:07:22PM +0300, Cyrill Gorcunov wrote:
> > It may well be a ss bug / problem. As I mentioned I am always seeing 255 
> > for the protocol which
> 
> It is rather not addressed in ss. I mean, look: when we send out a diag packet,
> the kernel looks up a handler, which for the raw protocol we register as
> 
> static const struct inet_diag_handler raw_diag_handler = {
>   .dump            = raw_diag_dump,
>   .dump_one        = raw_diag_dump_one,
>   .idiag_get_info  = raw_diag_get_info,
>   .idiag_type      = IPPROTO_RAW,
>   .idiag_info_size = 0,
> #ifdef CONFIG_INET_DIAG_DESTROY
>   .destroy         = raw_diag_destroy,
> #endif
> };
> 
> so if we patch ss and ask for IPPROTO_ICMP in the netlink packet, the
> kernel simply won't find anything. Thus I think we need (well, I need)
> to extend the patch and register an IPPROTO_ICMP diag type, then
> extend ss as well. (Unless I missed something obvious.)
> 
> > is odd since ss does a dump and takes the matches and invokes the kill.
> > Thanks for taking the time to do the kill piece.

Sorry for the delay in replying (I unexpectedly came down with the flu). It
eventually turned out to be tricky to implement handling for SOCK_RAW,
because raw sockets are special. They are registered as IPPROTO_IP in
net/ipv4/af_inet.c, so they match any protocol specified in the socket()
call. The inet-diag module, in turn, handles predefined protocols only,
IPPROTO_RAW in our case. Thus, to fetch the real protocol of a socket
sitting in the raw socket hashes, we need some kind of additional argument
passed in the request. I guess we could use the @idiag_ext field for this?
Or require @idiag_ext to have the INET_DIAG_PROTOCOL bit set and then fetch
the real protocol from an additional attribute? Does that sound OK?

Cyrill

Re: [PATCH v2 0/2] make POSIX timers optional

2016-09-20 Thread Thomas Gleixner

On Tue, 20 Sep 2016, Nicolas Pitre wrote:
> On Tue, 20 Sep 2016, Richard Cochran wrote:
> 
> > On Tue, Sep 20, 2016 at 10:25:56PM +0200, Richard Cochran wrote:
> > > After this series, if I don't pay enough attention to dmesg, then I
> > > have lost functionality that I had in step #1.  That sucks, and it has
> > > nothing to do with the tinification option at all.  It will bite even
> > > if I have no knowledge of it.  That isn't acceptable to me.
> > 
> > Can't you leave all the "select PTP_1588_CLOCK" alone and simply add
> > 
> > #ifdef CONFIG_POSIX_TIMERS
> > // global declarations
> > #else
> > // static inlines
> > #endif

Eew. No! That's an even more blatant layering violation.

> > to ptp_clock_kernel.h, and then sandwich ptp_clock.c in
> > #ifdef CONFIG_POSIX_TIMERS ... #endif ?
> 
> Sure I could... but I'm sure I'll be flamed by others for making things 
> even more obscure and hackish than they are right now.

I think the whole approach is wrong because it makes the PTP split at the
wrong level.

Currently we have:

  DRIVER_X
  tristate "Driver X"
  select PTP

In order to make POSIX_CLOCK configurable we should have

  PTP
  tristate "PTP"
  select POSIX_CLOCK

Now if you want to disentangle PTP from a driver, then you split it at the
driver level and not at the PTP level:

  DRIVER_X
  tristate "Driver X"

  DRIVER_X_PTP
  bool "Enable PTP support"
  default y if !MAKE_IT_TINY
  depends on DRIVER_X
  select PTP

We already have drivers following that scheme. That way you make the PTP
support in the driver conditional on DRIVER_X_PTP and have no hassle with
modules and dependencies.

Your tiny config can simply disable all the PTP extra bits and then you can
disable PTP and finally POSIX_TIMERS.

Thanks,

tglx

Re: [PATCH net-next 7/8] net/mlx5e: XDP TX forwarding support

2016-09-20 Thread Jesper Dangaard Brouer

On Tue, 20 Sep 2016 09:45:28 -0700
Alexei Starovoitov  wrote:

> To your other question:
> > Please explain why an eBPF program error (div by zero) must be a silent
> > drop?
> 
> because 'div by zero' is an abnormal situation that shouldn't be exploited.
> Meaning: if an XDP program is doing DoS prevention and has a bug that an
> attacker can exploit by sending a crafted packet that causes 'div by zero',
> and the kernel warns, then the attack has succeeded.
> Therefore it has to be a silent drop.

Understood and documented:
 https://github.com/netoptimizer/prototype-kernel/commit/a4e60e2d7a894

Our current solution is not very optimal: it only results in a
WARN_ONCE(), see bpf_warn_invalid_xdp_action(). But it should not be
affected by the DoS attack scenario you described.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

Re: [PATCH v2 0/2] make POSIX timers optional

2016-09-20 Thread Nicolas Pitre

On Tue, 20 Sep 2016, Richard Cochran wrote:

> On Tue, Sep 20, 2016 at 10:25:56PM +0200, Richard Cochran wrote:
> > After this series, if I don't pay enough attention to dmesg, then I
> > have lost functionality that I had in step #1.  That sucks, and it has
> > nothing to do with the tinification option at all.  It will bite even
> > if I have no knowledge of it.  That isn't acceptable to me.
> 
> Can't you leave all the "select PTP_1588_CLOCK" alone and simply add
> 
> #ifdef CONFIG_POSIX_TIMERS
>   // global declarations
> #else
>   // static inlines
> #endif
> 
> to ptp_clock_kernel.h, and then sandwich ptp_clock.c in
> #ifdef CONFIG_POSIX_TIMERS ... #endif ?

Sure I could... but I'm sure I'll be flamed by others for making things 
even more obscure and hackish than they are right now.

Oh well...  Let's go fix the Kconfig parser then.


Nicolas

Re: [PATCH v2 0/2] make POSIX timers optional

2016-09-20 Thread Nicolas Pitre

On Tue, 20 Sep 2016, Richard Cochran wrote:

> On Tue, Sep 20, 2016 at 03:56:38PM -0400, Nicolas Pitre wrote:
> > - Add a warning for the case where PTP clock subsystem is modular and a
> >   driver providing a clock is built-in rather than silently ignoring it.
> >   Suggested by Jiri Benc.
> 
> So I am really not happy with this.  Here is a common embedded
> workflow, at least for me:
> 
> 1. take some given Kconfig and get it running on the target.
> 
> 2. for the given HW, change the modules into built-ins, and forget
>module loading
> 
> After this series, if I don't pay enough attention to dmesg, then I
> have lost functionality that I had in step #1.

Would that given config from #1 typically have CONFIG_EXPERT actually 
set?

Ultimately, do you know a way to restrict a tristate to y or n? A 
tristate can be limited to m or n with "depends on m" but it doesn't 
appear to be possible to exclude m with a promotion to y.

Nicolas

Re: [PATCH net-next 7/8] net/mlx5e: XDP TX forwarding support

2016-09-20 Thread Jesper Dangaard Brouer


On Tue, 20 Sep 2016 20:59:39 +0200 Jesper Dangaard Brouer  
wrote:
> On Tue, 20 Sep 2016 10:39:20 -0700  Eric Dumazet  
> wrote:
> 
[...]
> 
> > Many existing supervision infrastructures collect device snmp
> > counters, and run as unprivileged programs.   
> 
> A supervision infrastructure is a valid use case. It again indicates
> that such XDP stats need to be structured, not just a random driver
> specific ethtool counter, to make it easy for such collection daemons.
> 
> 
> > tracepoints might not fit the need here, compared to a mere
> > tx_ring->tx_drops++  
> 
> I do see your point.  I really liked the tracepoint idea, but now I'm
> uncertain again...

I've documented the need for troubleshooting and monitoring, so we don't
forget about it. See:

Commit:
 https://github.com/netoptimizer/prototype-kernel/commit/3925249089ae4

Online doc:
 https://prototype-kernel.readthedocs.io/en/latest/networking/XDP/implementation/userspace_api.html#troubleshooting-and-monitoring


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

Re: [PATCH v2 0/2] make POSIX timers optional

2016-09-20 Thread Richard Cochran

On Tue, Sep 20, 2016 at 10:25:56PM +0200, Richard Cochran wrote:
> After this series, if I don't pay enough attention to dmesg, then I
> have lost functionality that I had in step #1.  That sucks, and it has
> nothing to do with the tinification option at all.  It will bite even
> if I have no knowledge of it.  That isn't acceptable to me.

Can't you leave all the "select PTP_1588_CLOCK" alone and simply add

#ifdef CONFIG_POSIX_TIMERS
// global declarations
#else
// static inlines
#endif

to ptp_clock_kernel.h, and then sandwich ptp_clock.c in
#ifdef CONFIG_POSIX_TIMERS ... #endif ?

Thanks,
Richard

[PATCH 2/2] net: ethernet: hisilicon: hns: use new api ethtool_{get|set}_link_ksettings

2016-09-20 Thread Philippe Reynes

The ethtool API {get|set}_settings is deprecated.
Move this driver to the new API {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes 
---
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c |  105 --
 1 files changed, 58 insertions(+), 47 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
index 0e2c174..47e59bb 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
@@ -64,14 +64,14 @@ static u32 hns_nic_get_link(struct net_device *net_dev)
 }
 
 static void hns_get_mdix_mode(struct net_device *net_dev,
- struct ethtool_cmd *cmd)
+ struct ethtool_link_ksettings *cmd)
 {
int mdix_ctrl, mdix, retval, is_resolved;
struct phy_device *phy_dev = net_dev->phydev;
 
if (!phy_dev || !phy_dev->mdio.bus) {
-   cmd->eth_tp_mdix_ctrl = ETH_TP_MDI_INVALID;
-   cmd->eth_tp_mdix = ETH_TP_MDI_INVALID;
+   cmd->base.eth_tp_mdix_ctrl = ETH_TP_MDI_INVALID;
+   cmd->base.eth_tp_mdix = ETH_TP_MDI_INVALID;
return;
}
 
@@ -88,35 +88,35 @@ static void hns_get_mdix_mode(struct net_device *net_dev,
 
switch (mdix_ctrl) {
case 0x0:
-   cmd->eth_tp_mdix_ctrl = ETH_TP_MDI;
+   cmd->base.eth_tp_mdix_ctrl = ETH_TP_MDI;
break;
case 0x1:
-   cmd->eth_tp_mdix_ctrl = ETH_TP_MDI_X;
+   cmd->base.eth_tp_mdix_ctrl = ETH_TP_MDI_X;
break;
case 0x3:
-   cmd->eth_tp_mdix_ctrl = ETH_TP_MDI_AUTO;
+   cmd->base.eth_tp_mdix_ctrl = ETH_TP_MDI_AUTO;
break;
default:
-   cmd->eth_tp_mdix_ctrl = ETH_TP_MDI_INVALID;
+   cmd->base.eth_tp_mdix_ctrl = ETH_TP_MDI_INVALID;
break;
}
 
if (!is_resolved)
-   cmd->eth_tp_mdix = ETH_TP_MDI_INVALID;
+   cmd->base.eth_tp_mdix = ETH_TP_MDI_INVALID;
else if (mdix)
-   cmd->eth_tp_mdix = ETH_TP_MDI_X;
+   cmd->base.eth_tp_mdix = ETH_TP_MDI_X;
else
-   cmd->eth_tp_mdix = ETH_TP_MDI;
+   cmd->base.eth_tp_mdix = ETH_TP_MDI;
 }
 
 /**
- *hns_nic_get_settings - implement ethtool get settings
+ *hns_nic_get_link_ksettings - implement ethtool get link ksettings
  *@net_dev: net_device
- *@cmd: ethtool_cmd
+ *@cmd: ethtool_link_ksettings
  *retuen 0 - success , negative --fail
  */
-static int hns_nic_get_settings(struct net_device *net_dev,
-   struct ethtool_cmd *cmd)
+static int hns_nic_get_link_ksettings(struct net_device *net_dev,
+ struct ethtool_link_ksettings *cmd)
 {
struct hns_nic_priv *priv = netdev_priv(net_dev);
struct hnae_handle *h;
@@ -124,6 +124,7 @@ static int hns_nic_get_settings(struct net_device *net_dev,
int ret;
u8 duplex;
u16 speed;
+   u32 supported, advertising;
 
if (!priv || !priv->ae_handle)
return -ESRCH;
@@ -138,38 +139,43 @@ static int hns_nic_get_settings(struct net_device *net_dev,
return -EINVAL;
}
 
+   ethtool_convert_link_mode_to_legacy_u32(&supported,
+   cmd->link_modes.supported);
+   ethtool_convert_link_mode_to_legacy_u32(&advertising,
+   cmd->link_modes.advertising);
+
/* When there is no phy, autoneg is off. */
-   cmd->autoneg = false;
-   ethtool_cmd_speed_set(cmd, speed);
-   cmd->duplex = duplex;
+   cmd->base.autoneg = false;
+   cmd->base.speed = speed;
+   cmd->base.duplex = duplex;
 
if (net_dev->phydev)
-   (void)phy_ethtool_gset(net_dev->phydev, cmd);
+   (void)phy_ethtool_ksettings_get(net_dev->phydev, cmd);
 
link_stat = hns_nic_get_link(net_dev);
if (!link_stat) {
-   ethtool_cmd_speed_set(cmd, (u32)SPEED_UNKNOWN);
-   cmd->duplex = DUPLEX_UNKNOWN;
+   cmd->base.speed = (u32)SPEED_UNKNOWN;
+   cmd->base.duplex = DUPLEX_UNKNOWN;
}
 
-   if (cmd->autoneg)
-   cmd->advertising |= ADVERTISED_Autoneg;
+   if (cmd->base.autoneg)
+   advertising |= ADVERTISED_Autoneg;
 
-   cmd->supported |= h->if_support;
+   supported |= h->if_support;
if (h->phy_if == PHY_INTERFACE_MODE_SGMII) {
-   cmd->supported |= SUPPORTED_TP;
-   cmd->advertising |= ADVERTISED_1000baseT_Full;
+   supported |= SUPPORTED_TP;
+   advertising |= ADVERTISED_1000baseT_Full;
} else if (h->phy_if == PHY_INTERFACE_MODE_XGMII) {
-   cmd->supported |= SUPPORTED_FIBRE;
-   cmd->advertising |=

[PATCH 1/2] net: ethernet: hisilicon: hns: use phydev from struct net_device

2016-09-20 Thread Philippe Reynes

The private structure contains a pointer to phydev, but struct
net_device already contains such a pointer. So we can remove the phydev
pointer from the private structure and update the driver to use the
one contained in struct net_device.

Signed-off-by: Philippe Reynes 
---
 drivers/net/ethernet/hisilicon/hns/hns_enet.c|   23 ++-
 drivers/net/ethernet/hisilicon/hns/hns_enet.h|1 -
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c |   33 ++
 3 files changed, 24 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index d7e1f8c..059aaed 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -994,10 +994,10 @@ static void hns_nic_adjust_link(struct net_device *ndev)
struct hnae_handle *h = priv->ae_handle;
int state = 1;
 
-   if (priv->phy) {
+   if (ndev->phydev) {
h->dev->ops->adjust_link(h, ndev->phydev->speed,
 ndev->phydev->duplex);
-   state = priv->phy->link;
+   state = ndev->phydev->link;
}
state = state && h->dev->ops->get_status(h);
 
@@ -1022,7 +1022,6 @@ static void hns_nic_adjust_link(struct net_device *ndev)
  */
 int hns_nic_init_phy(struct net_device *ndev, struct hnae_handle *h)
 {
-   struct hns_nic_priv *priv = netdev_priv(ndev);
struct phy_device *phy_dev = h->phy_dev;
int ret;
 
@@ -1046,8 +1045,6 @@ int hns_nic_init_phy(struct net_device *ndev, struct hnae_handle *h)
if (h->phy_if == PHY_INTERFACE_MODE_XGMII)
phy_dev->autoneg = false;
 
-   priv->phy = phy_dev;
-
return 0;
 }
 
@@ -1224,8 +1221,8 @@ static int hns_nic_net_up(struct net_device *ndev)
if (ret)
goto out_start_err;
 
-   if (priv->phy)
-   phy_start(priv->phy);
+   if (ndev->phydev)
+   phy_start(ndev->phydev);
 
clear_bit(NIC_STATE_DOWN, &priv->state);
(void)mod_timer(&priv->service_timer, jiffies + SERVICE_TIMER_HZ);
@@ -1259,8 +1256,8 @@ static void hns_nic_net_down(struct net_device *ndev)
netif_tx_disable(ndev);
priv->link = 0;
 
-   if (priv->phy)
-   phy_stop(priv->phy);
+   if (ndev->phydev)
+   phy_stop(ndev->phydev);
 
ops = priv->ae_handle->dev->ops;
 
@@ -1359,8 +1356,7 @@ static void hns_nic_net_timeout(struct net_device *ndev)
 static int hns_nic_do_ioctl(struct net_device *netdev, struct ifreq *ifr,
int cmd)
 {
-   struct hns_nic_priv *priv = netdev_priv(netdev);
-   struct phy_device *phy_dev = priv->phy;
+   struct phy_device *phy_dev = netdev->phydev;
 
if (!netif_running(netdev))
return -EINVAL;
@@ -2017,9 +2013,8 @@ static int hns_nic_dev_remove(struct platform_device *pdev)
hns_nic_uninit_ring_data(priv);
priv->ring_data = NULL;
 
-   if (priv->phy)
-   phy_disconnect(priv->phy);
-   priv->phy = NULL;
+   if (ndev->phydev)
+   phy_disconnect(ndev->phydev);
 
if (!IS_ERR_OR_NULL(priv->ae_handle))
hnae_put_handle(priv->ae_handle);
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.h b/drivers/net/ethernet/hisilicon/hns/hns_enet.h
index 44bb301..5b412de 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.h
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.h
@@ -59,7 +59,6 @@ struct hns_nic_priv {
u32 port_id;
int phy_mode;
int phy_led_val;
-   struct phy_device *phy;
struct net_device *netdev;
struct device *dev;
struct hnae_handle *ae_handle;
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
index 5eb3245..0e2c174 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
@@ -48,9 +48,9 @@ static u32 hns_nic_get_link(struct net_device *net_dev)
 
h = priv->ae_handle;
 
-   if (priv->phy) {
-   if (!genphy_read_status(priv->phy))
-   link_stat = priv->phy->link;
+   if (net_dev->phydev) {
+   if (!genphy_read_status(net_dev->phydev))
+   link_stat = net_dev->phydev->link;
else
link_stat = 0;
}
@@ -67,8 +67,7 @@ static void hns_get_mdix_mode(struct net_device *net_dev,
  struct ethtool_cmd *cmd)
 {
int mdix_ctrl, mdix, retval, is_resolved;
-   struct hns_nic_priv *priv = netdev_priv(net_dev);
-   struct phy_device *phy_dev = priv->phy;
+   struct phy_device *phy_dev = net_dev->phydev;
 
if (!phy_dev || !phy_dev->mdio.bus) {
cmd->eth_tp_mdix_ctrl = ETH_TP_MDI_INVALID;
@@ -144,8 +143,8 @@

Re: [RFC] PCI: Allow sysfs control over totalvfs

2016-09-20 Thread Mintz, Yuval

> >Some of the HW capable of SRIOV has resource limitations, where the
> >PF and VFs resources are drawn from a common pool.
> >In some cases, these limitations have to be considered early during
> >chip initialization and can only be changed by tearing down the
> >configuration and re-initializing.
> >As a result, drivers for such HWs sometimes have to make unfavorable
> >compromises where they reserve sufficient resources to accommodate
> >the maximal number of VFs that can be created - at the expense of
> >resources that could have been used by the PF.
> >
> >If users were able to provide 'hints' regarding the required number
> >of VFs *prior* to driver attachment, then such compromises could be
> >avoided. As we already have sysfs nodes that can be queried for the
> >number of totalvfs, it makes sense to let the user reduce the number
> >of said totalvfs using the same infrastructure.
> >Then, we can have drivers supporting SRIOV take that value into account
> >when deciding how much resources to reserve, allowing the PF to benefit
> >from the difference between the configuration space value and the actual
> >number needed by user.

> One of the motivations for introducing devlink interface was to allow
> user to pass some kind of well defined option parameters or as you call
> it hints to driver module. That would allow to replace module options
> and introduce similar possibility to pre-configure hardware on probe time.
> We plan to use devlink to allow user to change resource allocation for
> mlxsw devices.

Is IOV configuration something you're going to explore in the near
future for mlxsw devices? Or are you merely pointing out that
devlink could provide a superior configuration infrastructure and
should be investigated as a better alternative?

> The plan is to allow to pre-create devlink instance before driver module
> is loaded. Then the user will use this placeholder to do the options
> setting. Once the driver module is loaded, it will fetch the options
> from devlink core and process it accordingly.

> I believe this is exactly what you need.

While this sounds far superior to anything we can do via PCI sysfs,
the question is whether adding devlink support to a device is
a reasonable cost for adding this specific configuration [given
the existing sysfs nodes we already have].
I'm not sufficiently familiar with the infrastructure there, and I
wonder whether it would set the bar too high for this sort of
configuration to be used.

Re: [PATCH v2 0/2] make POSIX timers optional

2016-09-20 Thread Richard Cochran

On Tue, Sep 20, 2016 at 03:56:38PM -0400, Nicolas Pitre wrote:
> - Add a warning for the case where PTP clock subsystem is modular and a
>   driver providing a clock is built-in rather than silently ignoring it.
>   Suggested by Jiri Benc.

So I am really not happy with this.  Here is a common embedded
workflow, at least for me:

1. take some given Kconfig and get it running on the target.

2. for the given HW, change the modules into built-ins, and forget
   module loading

After this series, if I don't pay enough attention to dmesg, then I
have lost functionality that I had in step #1.  That sucks, and it has
nothing to do with the tinification option at all.  It will bite even
if I have no knowledge of it.  That isn't acceptable to me.

Thanks,
Richard
