Re: [PATCH] b43: fix memory leak

2016-02-18 Thread Kalle Valo
Michael Büsch  writes:

> On Thu, 18 Feb 2016 18:04:36 +0530
> Sudip Mukherjee  wrote:
>
>> From: Sudip Mukherjee 
>> 
>> On error we jumped to the label bcma_out and returned the error code but
>> we missed freeing dev.
>> 
>> Signed-off-by: Sudip Mukherjee 
>> ---
>>  drivers/net/wireless/broadcom/b43/main.c | 1 +
>>  1 file changed, 1 insertion(+)
>> 
>> diff --git a/drivers/net/wireless/broadcom/b43/main.c 
>> b/drivers/net/wireless/broadcom/b43/main.c
>> index c279211..78f670a 100644
>> --- a/drivers/net/wireless/broadcom/b43/main.c
>> +++ b/drivers/net/wireless/broadcom/b43/main.c
>> @@ -5671,6 +5671,7 @@ static int b43_bcma_probe(struct bcma_device *core)
>>  wl = b43_wireless_init(dev);
>>  if (IS_ERR(wl)) {
>>  err = PTR_ERR(wl);
>> +kfree(dev);
>>  goto bcma_out;
>>  }
>>  
>
> We recently had a patch that fixes this, among more leaks. Subject:
> [PATCH v2 resend] b43: Fix memory leaks in b43_bus_dev_ssb_init and
> b43_bus_dev_bcma_init
>
> Please test that patch instead, so we can finally apply it.
>
> It needs to be tested on both ssb and bcma. Come on. This isn't too
> hard. :) Please somebody with any hardware test it. (I currently don't
> have any b43 hardware)

And the patch can be downloaded from patchwork:

https://patchwork.kernel.org/patch/8049041/

-- 
Kalle Valo


Re: [PATCH] net: bcmgenet: Fix internal PHY link state

2016-02-18 Thread Florian Fainelli
Le 18/02/2016 20:48, Jaedon Shin a écrit :
> The PHY link state is not changed in GENETv2. This is caused by commit
> 49f7a471e4d1 ("net: bcmgenet: Properly configure PHY to ignore
> interrupt"), which set PHY_IGNORE_INTERRUPT in bcmgenet_mii_probe().
> 
> The internal PHY should use phy_mac_interrupt() when it is not using
> PHY_POLL. The statement guarding phy_mac_interrupt() has two conditions.
> The first condition, checking GENET_HAS_MDIO_INTR, is not related to PHY
> link state, so this patch removes it.
> 
> Fixes: 49f7a471e4d1 ("net: bcmgenet: Properly configure PHY to ignore 
> interrupt")
> Signed-off-by: Jaedon Shin 

Acked-by: Florian Fainelli 

Thanks!
-- 
Florian


Re: [PATCH] net: bcmgenet: Add MDIO_INTR in GENETv2

2016-02-18 Thread Florian Fainelli
Hi Jaedon

Le 15/02/2016 19:12, Jaedon Shin a écrit :
> 
> As you said, the part in bcmgenet_irq_task() is a problem.
> 
> A bcmgenet device using the internal PHY should use phy_mac_interrupt()
> because it does not use PHY_POLL, and it depends on the Ethernet MAC ISR.
> 
> UMAC_IRQ_LINK_EVENT (LINK_UP and LINK_DOWN) was working correctly in
> GENETv2, but (priv->hw_params->flags & GENET_HAS_MDIO_INTR) was blocking
> the call to phy_mac_interrupt(). I couldn't find anything in the
> datasheet explaining why GENETv2 lacks MDIO_INTR, and I'm not sure about
> using MDIO_INTR.
> 
> Therefore, if MDIO_INTR is not valid in GENETv2, I will resend the patch
> to remove the first check, GENET_HAS_MDIO_INTR, after your confirmation.

MDIO interrupts are wired in GENETv2, so the second part of your patch
is correct, I see now that you have submitted a proper fix for the Link
UP/DOWN event condition, and thank you for doing that. FWIW, all GENET
versions have link UP/DOWN interrupts.
--
Florian


A net device refcnt leak problem

2016-02-18 Thread Yang Yingliang

Hi,


I got a tap device refcnt leak message while detaching the device, and
the kernel loops forever waiting for the usage count to drop to 0.

The log is:
unregister_netdevice: waiting for pae_tap0 to become free. Usage count = 1


Unfortunately, it has happened only once until now, and I cannot
reproduce it. My kernel version is 3.10 LTS.

The attachment contains the values of struct net_device, tun_struct and
tun_file captured while the leak was occurring.

Does anyone have any thoughts, or is there a patch that could fix this?


Thanks,
Yang
struct net_device {
  name = "pae_tap0\000\000\000\000\000\000\000", 
  name_hlist = {
next = 0x0, 
pprev = 0xdead00200200
  }, 
  ifalias = 0x0, 
  mem_end = 0, 
  mem_start = 0, 
  base_addr = 0, 
  irq = 0, 
  state = 6, 
  dev_list = {
next = 0x88061b96c050, 
prev = 0xdead00200200
  }, 
  napi_list = {
next = 0x88060bf2c060, 
prev = 0x88060bf2c060
  }, 
  unreg_list = {
next = 0x88060bf2c070, 
prev = 0x88060bf2c070
  }, 
  upper_dev_list = {
next = 0x88060bf2c080, 
prev = 0x88060bf2c080
  }, 
  features = 8589953089, 
  hw_features = 8591722569, 
  wanted_features = 8591722569, 
  vlan_features = 1769577, 
  hw_enc_features = 1, 
  mpls_features = 1, 
  ifindex = 13, 
  iflink = 13, 
  stats = {
rx_packets = 0, 
tx_packets = 7, 
rx_bytes = 0, 
tx_bytes = 570, 
rx_errors = 0, 
tx_errors = 0, 
rx_dropped = 0, 
tx_dropped = 0, 
multicast = 0, 
collisions = 0, 
rx_length_errors = 0, 
rx_over_errors = 0, 
rx_crc_errors = 0, 
rx_frame_errors = 0, 
rx_fifo_errors = 0, 
rx_missed_errors = 0, 
tx_aborted_errors = 0, 
tx_carrier_errors = 0, 
tx_fifo_errors = 0, 
tx_heartbeat_errors = 0, 
tx_window_errors = 0, 
rx_compressed = 0, 
tx_compressed = 0
  }, 
  rx_dropped = {
counter = 0
  }, 
  wireless_handlers = 0x0, 
  wireless_data = 0x0, 
  netdev_ops = 0xa04004c0 , 
  ethtool_ops = 0xa0400940 , 
  header_ops = 0x816b5480 , 
  flags = 4098, 
  priv_flags = 1049600, 
  gflags = 0, 
  padded = 0, 
  operstate = 2 '\002', 
  link_mode = 0 '\000', 
  if_port = 0 '\000', 
  dma = 0 '\000', 
  mtu = 1500, 
  type = 1, 
  hard_header_len = 14, 
  needed_headroom = 0, 
  needed_tailroom = 0, 
  perm_addr = 
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
 
  addr_assign_type = 1 '\001', 
  addr_len = 6 '\006', 
  neigh_priv_len = 0, 
  dev_id = 0, 
  dev_port = 0, 
  addr_list_lock = {
{
  rlock = {
raw_lock = {
  {
head_tail = 1441814, 
tickets = {
  head = 22, 
  tail = 22
}
  }
}
  }
}
  }, 
  uc = {
list = {
  next = 0x88060bf2c1f8, 
  prev = 0x88060bf2c1f8
}, 
count = 0
  }, 
  mc = {
list = {
  next = 0x88060bf2c210, 
  prev = 0x88060bf2c210
}, 
count = 0
  }, 
  dev_addrs = {
list = {
  next = 0x880619636a20, 
  prev = 0x880619636a20
}, 
count = 1
  }, 
  queues_kset = 0x8806196366c0, 
  uc_promisc = false, 
  promiscuity = 0, 
  allmulti = 0, 
  vlan_info = 0x0, 
  atalk_ptr = 0x0, 
  ip_ptr = 0x0, 
  dn_ptr = 0x0, 
  ip6_ptr = 0x0, 
  ax25_ptr = 0x0, 
  ieee80211_ptr = 0x0, 
  last_rx = 0, 
  dev_addr = 0x880619636a30 "\342\374\207[\342d", 
  _rx = 0x88061b8a1380, 
  num_rx_queues = 1, 
  real_num_rx_queues = 1, 
  rx_handler = 0x0, 
  rx_handler_data = 0x0, 
  ingress_queue = 0x0, 
  broadcast = 
"\377\377\377\377\377\377\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
 
  _tx = 0x88060de3f200, 
  num_tx_queues = 1, 
  real_num_tx_queues = 1, 
  qdisc = 0x819e7e40 , 
  tx_queue_len = 500, 
  tx_global_lock = {
{
  rlock = {
raw_lock = {
  {
head_tail = 131074, 
tickets = {
  head = 2, 
  tail = 2
}
  }
}
  }
}
  }, 
  xps_maps = 0x0, 
  rx_cpu_rmap = 0x0, 
  trans_start = 4294707515, 
  watchdog_timeo = 0, 
  watchdog_timer = {
entry = {
  next = 0x0, 
  prev = 0x0
}, 
expires = 0, 
base = 0x88061c4fc000, 
function = 0x815212b0 , 
data = 18446612158284480512, 
slack = -1, 
start_pid = -1, 
start_site = 0x0, 
start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
  }, 
  pcpu_refcnt = 0x60f9d98029a4, 
  todo_list = {
next = 0xdead00100100, 
prev = 0xdead00200200
  }, 
  index_hlist = {
next = 0x0, 
pprev = 0xdead00200200
  }, 
  link_watch_list = {
next = 0x88060bf2c3c0, 
prev = 0x88060bf2c3c0
  }, 
  reg_state = NETREG_UNREGISTERED, 
  dismantle = true, 
  rtnl_link_state = RTNL_LINK_INITIALIZED, 
  destructor = 0xa03fd220 , 
  

[PATCH] net: bcmgenet: Fix internal PHY link state

2016-02-18 Thread Jaedon Shin
The PHY link state is not changed in GENETv2. This is caused by commit
49f7a471e4d1 ("net: bcmgenet: Properly configure PHY to ignore
interrupt"), which set PHY_IGNORE_INTERRUPT in bcmgenet_mii_probe().

The internal PHY should use phy_mac_interrupt() when it is not using
PHY_POLL. The statement guarding phy_mac_interrupt() has two conditions.
The first condition, checking GENET_HAS_MDIO_INTR, is not related to PHY
link state, so this patch removes it.

Fixes: 49f7a471e4d1 ("net: bcmgenet: Properly configure PHY to ignore 
interrupt")
Signed-off-by: Jaedon Shin 
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c 
b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index b15a60d787c7..d7e01a74e927 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -2445,8 +2445,7 @@ static void bcmgenet_irq_task(struct work_struct *work)
}
 
/* Link UP/DOWN event */
-   if ((priv->hw_params->flags & GENET_HAS_MDIO_INTR) &&
-   (priv->irq0_stat & UMAC_IRQ_LINK_EVENT)) {
+   if (priv->irq0_stat & UMAC_IRQ_LINK_EVENT) {
phy_mac_interrupt(priv->phydev,
  !!(priv->irq0_stat & UMAC_IRQ_LINK_UP));
priv->irq0_stat &= ~UMAC_IRQ_LINK_EVENT;
-- 
2.7.1



Re: [net-next 00/16][pull request] 40GbE Intel Wired LAN Driver Updates 2016-02-18

2016-02-18 Thread David Miller
From: Jeff Kirsher 
Date: Thu, 18 Feb 2016 16:31:05 -0800

> This series contains updates to i40e and i40evf only.

Pulled, thanks Jeff.

> Sorry, no witty patch descriptions this time around,

Sad trombone...


Re: [PATCH net-next] bnx2x: Add missing HSI for big-endian machines

2016-02-18 Thread David Miller
From: Yuval Mintz 
Date: Wed, 17 Feb 2016 13:15:14 +0200

> Commit e5d3a51cefbb ("bnx2x: extend DCBx support") was missing HSI changes
> for big-endian machines, breaking compilation on such platforms.
> 
> Reported-by: kbuild test robot 
> Signed-off-by: Yuval Mintz 

Applied.


Re: [RFC][PATCH 00/10] Add trace event support to eBPF

2016-02-18 Thread Alexei Starovoitov
On Thu, Feb 18, 2016 at 03:27:18PM -0600, Tom Zanussi wrote:
> On Tue, 2016-02-16 at 20:51 -0800, Alexei Starovoitov wrote:
> > On Tue, Feb 16, 2016 at 04:35:27PM -0600, Tom Zanussi wrote:
> > > On Sun, 2016-02-14 at 01:02 +0100, Alexei Starovoitov wrote:
> > > > On Fri, Feb 12, 2016 at 10:11:18AM -0600, Tom Zanussi wrote:
> > 
> 
>   # ./funccount.py '*spin*'
> 
> Which on my machine resulted in a hard lockup on all CPUs.  I'm not set

thanks for the report. looks like something got broken. After:
# ./funccount.par '*spin*'
Tracing 12 functions for "*spin*"... Hit Ctrl-C to end.
^C
ADDR FUNC  COUNT
810aeb91 mutex_spin_on_owner 530
8177f241 _raw_spin_unlock_bh1325
810aebe1 mutex_optimistic_spin  1696
8177f581 _raw_spin_lock_bh  1985
8177f511 _raw_spin_trylock 55337
8177f3c1 _raw_spin_lock_irq   787875
8177f551 _raw_spin_lock  2211324
8177f331 _raw_spin_lock_irqsave  3556740
8177f1c1 __lock_text_start   3593983
[  275.175524] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 
11
it seems kprobe cleanup is racing with bpf cleanup...

> > > surrounding that even in the comments.  I guess I'd have to spend a few
> > > hours reading the BPF code and the verifier even, to understand that.
> > 
> > not sure what is your goal. Runtime lookup via field name is not acceptable
> > whether it's cached or not. There is no place for strcmp in the critical 
> > path.
> 
> Exactly - that's why I was asking about a 'begin probe', in order to do
> the lookup once, in an non-critical path.

It is the critical path. When a program is called a million times, a
for_each{strcmp} at the beginning of every invocation is unacceptable
overhead. In the crash above, Ctrl-C was pressed within a split second,
yet bpf had already processed 2.2M + 3.5M + 3.5M events and then hung
while unloading.
In the upcoming tracepoint+bpf patches the programs will have
direct access to tracepoint data without wasting time on strcmp.
The steps to do that were already outlined in the previous email.



Re: [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs

2016-02-18 Thread Simon Horman
On Thu, Feb 18, 2016 at 11:23:35AM +0200, Amir Vadai wrote:
> On Wed, Feb 17, 2016 at 03:07:23PM -0800, John Fastabend wrote:
> > [...]
> > 
> > >>
> > >>> +static void u32_replace_hw_hnode(struct tcf_proto *tp, struct
> > >>> tc_u_hnode *h)
> > >>> +{
> > >>> +struct net_device *dev = tp->q->dev_queue->dev;
> > >>> +struct tc_cls_u32_offload u32_offload = {0};
> > >>> +struct tc_to_netdev offload;
> > >>> +
> > >>> +offload.type = TC_SETUP_CLSU32;
> > >>> +offload.cls_u32 = &u32_offload;
> > >>> +
> > >>> +if (dev->netdev_ops->ndo_setup_tc) {
> > >>> +offload.cls_u32->command = TC_CLSU32_NEW_HNODE;
> > >>
> > >> TC_CLSU32_REPLACE_HNODE?
> > >>
> > > 
> > > Yep I made this change and will send out v4.
> > > 
> > > [...]
> > > 
> > >>
> > 
> > Actually thinking about this a bit more I wrote this thinking
> > that there existed some hardware that actually cared if it was
> > a new rule or an existing rule. For me it doesn't matter I do
> > the same thing in the new/replace cases I just write into the
> > slot on the hardware table and if it happens to have something
> > in it well its overwritten e.g. "replaced". This works because
> > the cls_u32 layer protects us from doing something unexpected.
> > 
> > I'm wondering (mostly asking the mlx folks) is there hardware
> > out there that cares to make this distinction between new and
> > replace? Otherwise I can just drop new and always use replace.
> > Or vice versa which is the case in its current form.
> I don't see a need for such a distinction in mlx hardware.

FWIW, I think it is unlikely such a distinction would
be needed for Netronome hardware.


Re: [PATCH] bpf: grab rcu read lock for bpf_percpu_hash_update

2016-02-18 Thread Alexei Starovoitov
On Thu, Feb 18, 2016 at 09:56:22PM -0500, Sasha Levin wrote:
> bpf_percpu_hash_update() expects rcu lock to be held and warns if it's not,
> which pointed out a missing rcu read lock.
> 
> Fixes: 15a07b338 ("bpf: add lookup/update support for per-cpu hash and array 
> maps")
> Signed-off-by: Sasha Levin 
> ---
>  kernel/bpf/syscall.c |2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index c95a753..94324b8 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -348,7 +348,9 @@ static int map_update_elem(union bpf_attr *attr)
>   goto free_value;
>  
>   if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH) {
> + rcu_read_lock();
>   err = bpf_percpu_hash_update(map, key, value, attr->flags);
> + rcu_read_unlock();

good catch. thanks, but could you add rcu_read_lock/unlock
inside bpf_percpu_hash_update() instead... to match what
bpf_percpu_hash_copy/bpf_percpu_array_update/bpf_percpu_array_copy are doing.
Otherwise it's inconsistent.



[PATCH] bpf: grab rcu read lock for bpf_percpu_hash_update

2016-02-18 Thread Sasha Levin
bpf_percpu_hash_update() expects rcu lock to be held and warns if it's not,
which pointed out a missing rcu read lock.

Fixes: 15a07b338 ("bpf: add lookup/update support for per-cpu hash and array 
maps")
Signed-off-by: Sasha Levin 
---
 kernel/bpf/syscall.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c95a753..94324b8 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -348,7 +348,9 @@ static int map_update_elem(union bpf_attr *attr)
goto free_value;
 
if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH) {
+   rcu_read_lock();
err = bpf_percpu_hash_update(map, key, value, attr->flags);
+   rcu_read_unlock();
} else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
err = bpf_percpu_array_update(map, key, value, attr->flags);
} else {
-- 
1.7.10.4



Re: [PATCH net-next V1 08/12] net/mlx5e: Move to checksum complete

2016-02-18 Thread Tom Herbert
On Thu, Feb 18, 2016 at 2:32 AM, Saeed Mahameed  wrote:
> From: Matthew Finlay 
>
> Use checksum complete for all IP packets, unless they are HW LRO,
> in which case, use checksum unnecessary.
>
Awesome! Thanks for this fix.


> Signed-off-by: Matt Finlay 
> Signed-off-by: Saeed Mahameed 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_rx.c |9 +
>  1 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> index dd959d9..519a07f 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> @@ -167,14 +167,15 @@ static inline bool is_first_ethertype_ip(struct sk_buff 
> *skb)
>  static inline void mlx5e_handle_csum(struct net_device *netdev,
>  struct mlx5_cqe64 *cqe,
>  struct mlx5e_rq *rq,
> -struct sk_buff *skb)
> +struct sk_buff *skb,
> +bool   lro)
>  {
> if (unlikely(!(netdev->features & NETIF_F_RXCSUM)))
> goto csum_none;
>
> -   if (likely(cqe->hds_ip_ext & CQE_L4_OK)) {
> +   if (lro) {
> skb->ip_summed = CHECKSUM_UNNECESSARY;
> -   } else if (is_first_ethertype_ip(skb)) {
> +   } else if (likely(is_first_ethertype_ip(skb))) {
> skb->ip_summed = CHECKSUM_COMPLETE;
> skb->csum = csum_unfold((__force __sum16)cqe->check_sum);
> rq->stats.csum_sw++;
> @@ -211,7 +212,7 @@ static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 
> *cqe,
> if (unlikely(mlx5e_rx_hw_stamp(tstamp)))
> mlx5e_fill_hwstamp(tstamp, get_cqe_ts(cqe), 
> skb_hwtstamps(skb));
>
> -   mlx5e_handle_csum(netdev, cqe, rq, skb);
> +   mlx5e_handle_csum(netdev, cqe, rq, skb, !!lro_num_seg);
>
> skb->protocol = eth_type_trans(skb, netdev);
>
> --
> 1.7.1
>


Re: [PATCH net-next] store complete hash type information in socket buffer...

2016-02-18 Thread Tom Herbert
On Wed, Feb 17, 2016 at 1:30 PM, Eric Dumazet  wrote:
> On mer., 2016-02-17 at 15:44 -0500, David Miller wrote:
>> From: Paul Durrant 
>> Date: Mon, 15 Feb 2016 08:32:08 +
>>
>> > ...rather than a boolean merely indicating a canonical L4 hash.
>> >
>> > skb_set_hash() takes a hash type (from enum pkt_hash_types) as an
>> > argument but information is lost since only a single bit in the skb
>> > stores whether that hash type is PKT_HASH_TYPE_L4 or not. By using
>> > two bits it's possible to store the complete hash type information.
>> >
>> > Signed-off-by: Paul Durrant 
>>
>> Tom and/or Eric, please have a look at this.
>
> I guess my question is simply 'why do we need this' ?
>
> Consuming a bit in our precious sk_buff is not something we want for
> some obscure feature.
>
Right. I think the reason Paul wants this is to be able to pass the hash
to a Windows guest. As I pointed out though, we'd also need an
indication that the hash is Toeplitz to be really correct with the
Windows interface. The Linux driver interface does allow indicating an
L2, L3, or L4 hash, with the assumption that the differentiation might
be useful some day, but so far only distinguishing L4 from the others
has shown any value. It would be interesting to know whether Windows
actually does anything useful in differentiating L2 and L3 hashes.

Tom

>
>


Re: [PATCH net v2 2/3] geneve: Relax MTU constraints

2016-02-18 Thread Tom Herbert
On Thu, Feb 18, 2016 at 8:54 AM, David Wragg  wrote:
> Tom Herbert  writes:
>> Please implement like in ip_tunnel_change_mtu (or better yet call it),
>> that is the precedent for tunnels.
>
> I've made geneve_change_mtu follow ip_tunnel_change_mtu in v2.
>
> If it were to call it instead, are you suggesting just passing in
> t_hlen?  Or restructuring geneve.c to re-use the whole ip_tunnel
> infrastructure?
>
I'll leave that to you to decide if that is feasible or makes sense,
but ip_tunnel does do some other interesting things. Support for
geneve could easily be implemented using ip_tunnel_encap facility. The
default MTU on the device is set based on the MTU of the outgoing
interface and tunnel overhead-- this should mitigate the possibility
of a lot of fragmentation happening within the tunnel. Also, the
output infrastructure caches the route for the tunnel which is a nice
performance win.

> Also, I'm not sure where the 0xFFF8 comes from in
> __ip_tunnel_change_mtu.  Any ideas why 0xFFF8 rather than 0xFFFF?  It
> goes all the way back to the initial import of the kernel into git.
>
Yes, that's pretty ugly. Feel free to replace that with a #define or
at least put a comment about it for the benefit of future generations.

Thanks,
Tom

> David


Re: [PATCH] net: fix bridge multicast packet checksum validation

2016-02-18 Thread Stephen Hemminger
On Thu, 18 Feb 2016 15:35:42 -0500 (EST)
David Miller  wrote:

> From: Linus Lüssing 
> Date: Mon, 15 Feb 2016 03:07:06 +0100
> 
> > @@ -4084,10 +4089,22 @@ struct sk_buff *skb_checksum_trimmed(struct sk_buff 
> > *skb,
> > if (!pskb_may_pull(skb_chk, offset))
> > goto err;
> >  
> > -   __skb_pull(skb_chk, offset);
> > +   ip_summed = skb->ip_summed;
> > +   csum_valid = skb->csum_valid;
> > +   csum_level = skb->csum_level;
> > +   csum_bad = skb->csum_bad;
> > +   csum = skb->csum;
> > +
> > +   skb_pull_rcsum(skb_chk, offset);
> > ret = skb_chkf(skb_chk);
> > __skb_push(skb_chk, offset);
> >  
> > +   skb->ip_summed = ip_summed;
> > +   skb->csum_valid = csum_valid;
> > +   skb->csum_level = csum_level;
> > +   skb->csum_bad = csum_bad;
> > +   skb->csum = csum;
> > +
> 
> There really has to be a better way to fix this :-/

Agreed, this is gross.


[PATCH] unix_diag: fix incorrect sign extension in unix_lookup_by_ino

2016-02-18 Thread Dmitry V. Levin
The value passed by unix_diag_get_exact to unix_lookup_by_ino has type
__u32, but unix_lookup_by_ino's argument ino has type int, which is not
a problem yet.
However, when ino is compared with the sock_i_ino return value of type
unsigned long, ino is sign-extended to signed long, and this results in
an incorrect comparison on 64-bit architectures for inode numbers
greater than INT_MAX.

This bug was found by strace test suite.

Signed-off-by: Dmitry V. Levin 
Cc: 
---
 net/unix/diag.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/unix/diag.c b/net/unix/diag.c
index c512f64..4d96797 100644
--- a/net/unix/diag.c
+++ b/net/unix/diag.c
@@ -220,23 +220,23 @@ done:
return skb->len;
 }
 
-static struct sock *unix_lookup_by_ino(int ino)
+static struct sock *unix_lookup_by_ino(unsigned int ino)
 {
int i;
struct sock *sk;
 
	spin_lock(&unix_table_lock);
	for (i = 0; i < ARRAY_SIZE(unix_socket_table); i++) {
		sk_for_each(sk, &unix_socket_table[i])
			if (ino == sock_i_ino(sk)) {
				sock_hold(sk);
				spin_unlock(&unix_table_lock);

				return sk;
			}
	}

	spin_unlock(&unix_table_lock);
return NULL;
 }
 
-- 
ldv


Re: [PATCH V2 net-next 2/3] lan78xx: add ethtool set & get pause functions

2016-02-18 Thread Ben Hutchings
On Fri, 2016-02-19 at 00:16 +, woojung@microchip.com wrote:
> > > Ben, thanks for comments.
> > > How about comment in include/uapi/linux/ethtool.h?
> > > It says
> > > ** struct ethtool_pauseparam - Ethernet pause (flow control) parameters
> > > ...
> > > * If @autoneg is non-zero, the MAC is configured to send and/or
> > > * receive pause frames according to the result of autonegotiation.
> > > 
> > > Doesn't this mean get_pauseparam() returns pause settings based on the
> > > result of autonegotiation, not the manual settings of rx_param & tx_param?
> > 
> > No, get_pauseparam should return the same settings that were passed to
> > the last set_pauseparam.
> > 
> > Ben.
> 
> I used the drivers/net/ethernet/intel/e1000e driver as a reference.
> Its implementation also returns the status updated after autonegotiation.
> Did I look at the wrong one?

Unfortunately the API has not always been clearly defined and there are
lots of bugs (or at least inconsistencies) in drivers.  The comments in
include/uapi/linux/ethtool.h are supposed to be definitive; if they are
not clear then please suggest additional or alternative wording.

Ben.

-- 
Ben Hutchings
Tomorrow will be cancelled due to lack of interest.

signature.asc
Description: This is a digitally signed message part


Re: [PATCH next v3 1/3] ipvlan: scrub skb before routing in L3 mode.

2016-02-18 Thread Mahesh Bandewar
On Thu, Feb 18, 2016 at 4:44 PM, Cong Wang  wrote:
> On Thu, Feb 18, 2016 at 4:39 PM, Mahesh Bandewar  wrote:
>> [snip]
 -   skb_dst_drop(skb);
 +   skb_scrub_packet(skb, true);
>>>
>>> At least this patch is still same with the previous version. Or am I
>>> missing anything?
>>
>>  xnet param is now set to 'true'.
>
> Oh, I was suggesting to set xnet based on the netns of both ipvlan
> device and physical device, not setting it to be true or false
> unconditionally.
>
Well, I thought about that, but I don't know of any use case / user that
runs ipvlan slave devices in the same netns as the master, hence I
decided to do it this way.

> Something like:
>
> xnet = !netns_eq(dev_net(ipvlan_dev), dev_net(phy_dev));
>
> (not real code, just to show my idea)
>
> Makes any sense?
Sure, it's not hard; I just did it that way for the reasons above.


Re: [PATCH v7 8/8] net: e1000e: Adds hardware supported cross timestamp on e1000e nic

2016-02-18 Thread Jeff Kirsher
On Fri, 2016-02-12 at 12:25 -0800, Christopher S. Hall wrote:
> Modern Intel systems support cross timestamping of the network device
> clock and the Always Running Timer (ART) in hardware.  This allows the
> device time and system time to be precisely correlated. The timestamp
> device time and system time to be precisely correlated. The timestamp
> pair is returned through e1000e_phc_get_syncdevicetime() used by
> get_system_device_crosststamp().  The hardware cross-timestamp result
> is made available to applications through the PTP_SYS_OFFSET_PRECISE
> ioctl which calls e1000e_phc_getcrosststamp().
> 
> Signed-off-by: Christopher S. Hall 
> [jstultz: Reworked to use new interface, commit message tweaks]
> Signed-off-by: John Stultz 
> ---
>  drivers/net/ethernet/intel/Kconfig  |  9 +++
>  drivers/net/ethernet/intel/e1000e/defines.h |  5 ++
>  drivers/net/ethernet/intel/e1000e/ptp.c | 85
> +
>  drivers/net/ethernet/intel/e1000e/regs.h    |  4 ++
>  4 files changed, 103 insertions(+)

Acked-by: Jeff Kirsher 

I am fine with Christopher's changes, so when the issues with the other
patches in the series get ironed out, you're good to apply this patch as
well, John.

signature.asc
Description: This is a digitally signed message part


Re: [PATCH next v3 1/3] ipvlan: scrub skb before routing in L3 mode.

2016-02-18 Thread Cong Wang
On Thu, Feb 18, 2016 at 4:39 PM, Mahesh Bandewar  wrote:
> [snip]
>>> -   skb_dst_drop(skb);
>>> +   skb_scrub_packet(skb, true);
>>
>> At least this patch is still same with the previous version. Or am I
>> missing anything?
>
>  xnet param is now set to 'true'.

Oh, I was suggesting to set xnet based on the netns of both ipvlan
device and physical device, not setting it to be true or false
unconditionally.

Something like:

xnet = !netns_eq(dev_net(ipvlan_dev), dev_net(phy_dev));

(not real code, just to show my idea)

Makes any sense?


Re: [PATCH next v3 1/3] ipvlan: scrub skb before routing in L3 mode.

2016-02-18 Thread Mahesh Bandewar
[snip]
>> -   skb_dst_drop(skb);
>> +   skb_scrub_packet(skb, true);
>
> At least this patch is still same with the previous version. Or am I
> missing anything?

 xnet param is now set to 'true'.


[net-next 11/16] i40e/i40evf: Enable support for SKB_GSO_UDP_TUNNEL_CSUM

2016-02-18 Thread Jeff Kirsher
From: Alexander Duyck 

The XL722 has support for providing the outer UDP tunnel checksum on
transmits.  Make use of this feature to support segmenting UDP tunnels with
outer checksums enabled.

Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 19 ++-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 19 ++-
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index ded73c0..1955c84 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2272,6 +2272,7 @@ static int i40e_tso(struct i40e_ring *tx_ring, struct 
sk_buff *skb,
} ip;
union {
struct tcphdr *tcp;
+   struct udphdr *udp;
unsigned char *hdr;
} l4;
u32 paylen, l4_offset;
@@ -2298,7 +2299,18 @@ static int i40e_tso(struct i40e_ring *tx_ring, struct 
sk_buff *skb,
ip.v6->payload_len = 0;
}
 
-   if (skb_shinfo(skb)->gso_type & (SKB_GSO_UDP_TUNNEL | SKB_GSO_GRE)) {
+   if (skb_shinfo(skb)->gso_type & (SKB_GSO_UDP_TUNNEL | SKB_GSO_GRE |
+SKB_GSO_UDP_TUNNEL_CSUM)) {
+   if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL_CSUM) {
+   /* determine offset of outer transport header */
+   l4_offset = l4.hdr - skb->data;
+
+   /* remove payload length from outer checksum */
+   paylen = (__force u16)l4.udp->check;
+   paylen += ntohs(1) * (u16)~(skb->len - l4_offset);
+   l4.udp->check = ~csum_fold((__force __wsum)paylen);
+   }
+
/* reset pointers to inner headers */
ip.hdr = skb_inner_network_header(skb);
l4.hdr = skb_inner_transport_header(skb);
@@ -2460,6 +2472,11 @@ static int i40e_tx_enable_csum(struct sk_buff *skb, u32 
*tx_flags,
tunnel |= ((ip.hdr - l4.hdr) / 2) <<
  I40E_TXD_CTX_QW0_NATLEN_SHIFT;
 
+   /* indicate if we need to offload outer UDP header */
+   if ((*tx_flags & I40E_TX_FLAGS_TSO) &&
+   (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL_CSUM))
+   tunnel |= I40E_TXD_CTX_QW0_L4T_CS_MASK;
+
/* record tunnel offload values */
*cd_tunneling |= tunnel;
 
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 3f40e0e..6d66fcd 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -1532,6 +1532,7 @@ static int i40e_tso(struct i40e_ring *tx_ring, struct 
sk_buff *skb,
} ip;
union {
struct tcphdr *tcp;
+   struct udphdr *udp;
unsigned char *hdr;
} l4;
u32 paylen, l4_offset;
@@ -1558,7 +1559,18 @@ static int i40e_tso(struct i40e_ring *tx_ring, struct 
sk_buff *skb,
ip.v6->payload_len = 0;
}
 
-   if (skb_shinfo(skb)->gso_type & (SKB_GSO_UDP_TUNNEL | SKB_GSO_GRE)) {
+   if (skb_shinfo(skb)->gso_type & (SKB_GSO_UDP_TUNNEL | SKB_GSO_GRE |
+SKB_GSO_UDP_TUNNEL_CSUM)) {
+   if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL_CSUM) {
+   /* determine offset of outer transport header */
+   l4_offset = l4.hdr - skb->data;
+
+   /* remove payload length from outer checksum */
+   paylen = (__force u16)l4.udp->check;
+   paylen += ntohs(1) * (u16)~(skb->len - l4_offset);
+   l4.udp->check = ~csum_fold((__force __wsum)paylen);
+   }
+
/* reset pointers to inner headers */
ip.hdr = skb_inner_network_header(skb);
l4.hdr = skb_inner_transport_header(skb);
@@ -1678,6 +1690,11 @@ static int i40e_tx_enable_csum(struct sk_buff *skb, u32 
*tx_flags,
tunnel |= ((ip.hdr - l4.hdr) / 2) <<
  I40E_TXD_CTX_QW0_NATLEN_SHIFT;
 
+   /* indicate if we need to offload outer UDP header */
+   if ((*tx_flags & I40E_TX_FLAGS_TSO) &&
+   (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL_CSUM))
+   tunnel |= I40E_TXD_CTX_QW0_L4T_CS_MASK;
+
/* record tunnel offload values */
*cd_tunneling |= tunnel;
 
-- 
2.5.0



[net-next 04/16] i40e/i40evf: Consolidate all header changes into TSO function

2016-02-18 Thread Jeff Kirsher
From: Alexander Duyck 

This patch goes through and pulls all of the spots where we were updating
either the TCP or IP checksums in the TSO and checksum path into the TSO
function.  The general idea here is that we should only be updating the
header after we verify we have completed a skb_cow_head check to verify the
head is writable.

One other advantage to doing this is that it makes things much more
obvious.  For example, in the case of IPv6 there was one spot where the
offset of the IPv4 header checksum was being updated which is obviously
incorrect.

Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 44 ---
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 44 ---
 2 files changed, 52 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index ce0234e..f47f9cb 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2268,8 +2268,11 @@ static int i40e_tso(struct i40e_ring *tx_ring, struct 
sk_buff *skb,
u8 *hdr_len, u64 *cd_type_cmd_tso_mss)
 {
u64 cd_cmd, cd_tso_len, cd_mss;
-   struct ipv6hdr *ipv6h;
-   struct iphdr *iph;
+   union {
+   struct iphdr *v4;
+   struct ipv6hdr *v6;
+   unsigned char *hdr;
+   } ip;
union {
struct tcphdr *tcp;
unsigned char *hdr;
@@ -2287,16 +2290,29 @@ static int i40e_tso(struct i40e_ring *tx_ring, struct 
sk_buff *skb,
if (err < 0)
return err;
 
-   iph = skb->encapsulation ? inner_ip_hdr(skb) : ip_hdr(skb);
-   ipv6h = skb->encapsulation ? inner_ipv6_hdr(skb) : ipv6_hdr(skb);
-   l4.hdr = skb->encapsulation ? skb_inner_transport_header(skb) :
- skb_transport_header(skb);
+   ip.hdr = skb_network_header(skb);
+   l4.hdr = skb_transport_header(skb);
 
-   if (iph->version == 4) {
-   iph->tot_len = 0;
-   iph->check = 0;
+   /* initialize outer IP header fields */
+   if (ip.v4->version == 4) {
+   ip.v4->tot_len = 0;
+   ip.v4->check = 0;
} else {
-   ipv6h->payload_len = 0;
+   ip.v6->payload_len = 0;
+   }
+
+   if (skb_shinfo(skb)->gso_type & (SKB_GSO_UDP_TUNNEL | SKB_GSO_GRE)) {
+   /* reset pointers to inner headers */
+   ip.hdr = skb_inner_network_header(skb);
+   l4.hdr = skb_inner_transport_header(skb);
+
+   /* initialize inner IP header fields */
+   if (ip.v4->version == 4) {
+   ip.v4->tot_len = 0;
+   ip.v4->check = 0;
+   } else {
+   ip.v6->payload_len = 0;
+   }
}
 
/* determine offset of inner transport header */
@@ -2381,15 +2397,11 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, 
u32 *tx_flags,
struct iphdr *this_ip_hdr;
u32 network_hdr_len;
u8 l4_hdr = 0;
-   struct udphdr *oudph = NULL;
-   struct iphdr *oiph = NULL;
u32 l4_tunnel = 0;
 
if (skb->encapsulation) {
switch (ip_hdr(skb)->protocol) {
case IPPROTO_UDP:
-   oudph = udp_hdr(skb);
-   oiph = ip_hdr(skb);
l4_tunnel = I40E_TXD_CTX_UDP_TUNNELING;
*tx_flags |= I40E_TX_FLAGS_UDP_TUNNEL;
break;
@@ -2407,15 +2419,12 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, 
u32 *tx_flags,
if (*tx_flags & I40E_TX_FLAGS_IPV4) {
if (*tx_flags & I40E_TX_FLAGS_TSO) {
*cd_tunneling |= I40E_TX_CTX_EXT_IP_IPV4;
-   ip_hdr(skb)->check = 0;
} else {
*cd_tunneling |=
 I40E_TX_CTX_EXT_IP_IPV4_NO_CSUM;
}
} else if (*tx_flags & I40E_TX_FLAGS_IPV6) {
*cd_tunneling |= I40E_TX_CTX_EXT_IP_IPV6;
-   if (*tx_flags & I40E_TX_FLAGS_TSO)
-   ip_hdr(skb)->check = 0;
}
 
/* Now set the ctx descriptor fields */
@@ -2444,7 +2453,6 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, u32 
*tx_flags,
 */
if (*tx_flags & I40E_TX_FLAGS_TSO) {
*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV4_CSUM;
-   this_ip_hdr->check = 0;
} else {
*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV4;
 

[net-next 03/16] i40e/i40evf: Factor out L4 header and checksum from L3 bits in TSO path

2016-02-18 Thread Jeff Kirsher
From: Alexander Duyck 

This patch makes it so that the L4 header offsets and such can be ignored
when dealing with the L3 checksum and length update.  This is done by
making use of two things.

First we can just use the offset from the L4 header to the start of the
packet to determine the L4 offset, and from that we can then make use of
the data offset to determine the full length of the headers.

As far as adjusting the checksum to remove the length we can simply add the
inverse of the length instead of having to recompute the entire
pseudo-header without the length.  In the case of an IPv6 header this
should be significantly cheaper since we can make use of a value we already
needed instead of having to read the source and destination address out of
the packet.

Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 31 ---
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 31 ---
 2 files changed, 36 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 5e82589..ce0234e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2269,9 +2269,12 @@ static int i40e_tso(struct i40e_ring *tx_ring, struct 
sk_buff *skb,
 {
u64 cd_cmd, cd_tso_len, cd_mss;
struct ipv6hdr *ipv6h;
-   struct tcphdr *tcph;
struct iphdr *iph;
-   u32 l4len;
+   union {
+   struct tcphdr *tcp;
+   unsigned char *hdr;
+   } l4;
+   u32 paylen, l4_offset;
int err;
 
if (skb->ip_summed != CHECKSUM_PARTIAL)
@@ -2286,24 +2289,26 @@ static int i40e_tso(struct i40e_ring *tx_ring, struct 
sk_buff *skb,
 
iph = skb->encapsulation ? inner_ip_hdr(skb) : ip_hdr(skb);
ipv6h = skb->encapsulation ? inner_ipv6_hdr(skb) : ipv6_hdr(skb);
+   l4.hdr = skb->encapsulation ? skb_inner_transport_header(skb) :
+ skb_transport_header(skb);
 
if (iph->version == 4) {
-   tcph = skb->encapsulation ? inner_tcp_hdr(skb) : tcp_hdr(skb);
iph->tot_len = 0;
iph->check = 0;
-   tcph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
-0, IPPROTO_TCP, 0);
-   } else if (ipv6h->version == 6) {
-   tcph = skb->encapsulation ? inner_tcp_hdr(skb) : tcp_hdr(skb);
+   } else {
ipv6h->payload_len = 0;
-   tcph->check = ~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr,
-  0, IPPROTO_TCP, 0);
}
 
-   l4len = skb->encapsulation ? inner_tcp_hdrlen(skb) : tcp_hdrlen(skb);
-   *hdr_len = (skb->encapsulation
-   ? (skb_inner_transport_header(skb) - skb->data)
-   : skb_transport_offset(skb)) + l4len;
+   /* determine offset of inner transport header */
+   l4_offset = l4.hdr - skb->data;
+
+   /* remove payload length from inner checksum */
+   paylen = (__force u16)l4.tcp->check;
+   paylen += ntohs(1) * (u16)~(skb->len - l4_offset);
+   l4.tcp->check = ~csum_fold((__force __wsum)paylen);
+
+   /* compute length of segmentation header */
+   *hdr_len = (l4.tcp->doff * 4) + l4_offset;
 
/* find the field values */
cd_cmd = I40E_TX_CTX_DESC_TSO;
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index c5f98cb..881d0ad 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -1529,9 +1529,12 @@ static int i40e_tso(struct i40e_ring *tx_ring, struct 
sk_buff *skb,
 {
u64 cd_cmd, cd_tso_len, cd_mss;
struct ipv6hdr *ipv6h;
-   struct tcphdr *tcph;
struct iphdr *iph;
-   u32 l4len;
+   union {
+   struct tcphdr *tcp;
+   unsigned char *hdr;
+   } l4;
+   u32 paylen, l4_offset;
int err;
 
if (skb->ip_summed != CHECKSUM_PARTIAL)
@@ -1546,24 +1549,26 @@ static int i40e_tso(struct i40e_ring *tx_ring, struct 
sk_buff *skb,
 
iph = skb->encapsulation ? inner_ip_hdr(skb) : ip_hdr(skb);
ipv6h = skb->encapsulation ? inner_ipv6_hdr(skb) : ipv6_hdr(skb);
+   l4.hdr = skb->encapsulation ? skb_inner_transport_header(skb) :
+ skb_transport_header(skb);
 
if (iph->version == 4) {
-   tcph = skb->encapsulation ? inner_tcp_hdr(skb) : tcp_hdr(skb);
iph->tot_len = 0;
iph->check = 0;
-   tcph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
-0, IPPROTO_TCP, 0);
-   } 

[net-next 13/16] i40e: Do not drop support for IPv6 VXLAN or GENEVE tunnels

2016-02-18 Thread Jeff Kirsher
From: Alexander Duyck 

None of the documentation in the datasheets for the XL710 calls out
any reason to exclude support for IPv6 based tunnels.  As such I am
dropping the code that was excluding these tunnel types from having their
port numbers recognized.  This way we can take advantage of things such as
checksum offload for inner headers over IPv6 based VXLAN or GENEVE
tunnels.

Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 0fa52ed..955dc71 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8626,9 +8626,6 @@ static void i40e_add_vxlan_port(struct net_device *netdev,
u8 next_idx;
u8 idx;
 
-   if (sa_family == AF_INET6)
-   return;
-
idx = i40e_get_udp_port_idx(pf, port);
 
/* Check if port already exists */
@@ -8668,9 +8665,6 @@ static void i40e_del_vxlan_port(struct net_device *netdev,
struct i40e_pf *pf = vsi->back;
u8 idx;
 
-   if (sa_family == AF_INET6)
-   return;
-
idx = i40e_get_udp_port_idx(pf, port);
 
/* Check if port already exists */
@@ -8707,9 +8701,6 @@ static void i40e_add_geneve_port(struct net_device 
*netdev,
if (!(pf->flags & I40E_FLAG_GENEVE_OFFLOAD_CAPABLE))
return;
 
-   if (sa_family == AF_INET6)
-   return;
-
idx = i40e_get_udp_port_idx(pf, port);
 
/* Check if port already exists */
@@ -8751,9 +8742,6 @@ static void i40e_del_geneve_port(struct net_device 
*netdev,
struct i40e_pf *pf = vsi->back;
u8 idx;
 
-   if (sa_family == AF_INET6)
-   return;
-
if (!(pf->flags & I40E_FLAG_GENEVE_OFFLOAD_CAPABLE))
return;
 
-- 
2.5.0



[net-next 12/16] i40e: Fix ATR in relation to tunnels

2016-02-18 Thread Jeff Kirsher
From: Alexander Duyck 

This patch contains a number of fixes to make certain that we are using
the correct protocols when parsing both the inner and outer headers of a
frame whose inner and outer headers mix IPv4 and IPv6.

Signed-off-by: Alexander Duyck 
Acked-by: Kiran Patil 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 28 +++-
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 1955c84..159fb6e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2030,10 +2030,9 @@ tx_only:
  * @tx_ring:  ring to add programming descriptor to
  * @skb:  send buffer
  * @tx_flags: send tx flags
- * @protocol: wire protocol
  **/
 static void i40e_atr(struct i40e_ring *tx_ring, struct sk_buff *skb,
-u32 tx_flags, __be16 protocol)
+u32 tx_flags)
 {
struct i40e_filter_program_desc *fdir_desc;
struct i40e_pf *pf = tx_ring->vsi->back;
@@ -2045,6 +2044,7 @@ static void i40e_atr(struct i40e_ring *tx_ring, struct 
sk_buff *skb,
struct tcphdr *th;
unsigned int hlen;
u32 flex_ptype, dtype_cmd;
+   u8 l4_proto;
u16 i;
 
/* make sure ATR is enabled */
@@ -2058,6 +2058,7 @@ static void i40e_atr(struct i40e_ring *tx_ring, struct 
sk_buff *skb,
if (!tx_ring->atr_sample_rate)
return;
 
+   /* Currently only IPv4/IPv6 with TCP is supported */
if (!(tx_flags & (I40E_TX_FLAGS_IPV4 | I40E_TX_FLAGS_IPV6)))
return;
 
@@ -2065,29 +2066,22 @@ static void i40e_atr(struct i40e_ring *tx_ring, struct 
sk_buff *skb,
/* snag network header to get L4 type and address */
hdr.network = skb_network_header(skb);
 
-   /* Currently only IPv4/IPv6 with TCP is supported
-* access ihl as u8 to avoid unaligned access on ia64
-*/
+   /* access ihl as u8 to avoid unaligned access on ia64 */
if (tx_flags & I40E_TX_FLAGS_IPV4)
hlen = (hdr.network[0] & 0x0F) << 2;
-   else if (protocol == htons(ETH_P_IPV6))
-   hlen = sizeof(struct ipv6hdr);
else
-   return;
+   hlen = sizeof(struct ipv6hdr);
} else {
hdr.network = skb_inner_network_header(skb);
hlen = skb_inner_network_header_len(skb);
}
 
-   /* Currently only IPv4/IPv6 with TCP is supported
-* Note: tx_flags gets modified to reflect inner protocols in
+   /* Note: tx_flags gets modified to reflect inner protocols in
 * tx_enable_csum function if encap is enabled.
 */
-   if ((tx_flags & I40E_TX_FLAGS_IPV4) &&
-   (hdr.ipv4->protocol != IPPROTO_TCP))
-   return;
-   else if ((tx_flags & I40E_TX_FLAGS_IPV6) &&
-(hdr.ipv6->nexthdr != IPPROTO_TCP))
+   l4_proto = (tx_flags & I40E_TX_FLAGS_IPV4) ? hdr.ipv4->protocol :
+hdr.ipv6->nexthdr;
+   if (l4_proto != IPPROTO_TCP)
return;
 
th = (struct tcphdr *)(hdr.network + hlen);
@@ -2124,7 +2118,7 @@ static void i40e_atr(struct i40e_ring *tx_ring, struct 
sk_buff *skb,
 
flex_ptype = (tx_ring->queue_index << I40E_TXD_FLTR_QW0_QINDEX_SHIFT) &
  I40E_TXD_FLTR_QW0_QINDEX_MASK;
-   flex_ptype |= (protocol == htons(ETH_P_IP)) ?
+   flex_ptype |= (tx_flags & I40E_TX_FLAGS_IPV4) ?
  (I40E_FILTER_PCTYPE_NONF_IPV4_TCP <<
   I40E_TXD_FLTR_QW0_PCTYPE_SHIFT) :
  (I40E_FILTER_PCTYPE_NONF_IPV6_TCP <<
@@ -2992,7 +2986,7 @@ static netdev_tx_t i40e_xmit_frame_ring(struct sk_buff 
*skb,
 *
 * NOTE: this must always be directly before the data descriptor.
 */
-   i40e_atr(tx_ring, skb, tx_flags, protocol);
+   i40e_atr(tx_ring, skb, tx_flags);
 
i40e_tx_map(tx_ring, skb, first, tx_flags, hdr_len,
td_cmd, td_offset);
-- 
2.5.0



[net-next 14/16] i40e: Update feature flags to reflect newly enabled features

2016-02-18 Thread Jeff Kirsher
From: Alexander Duyck 

Recent changes should have enabled support for IPv6 based tunnels and
support for TSO with outer UDP checksums.  As such we can update the
feature flags to reflect that.

In addition we can clean up the flags that aren't needed, such as SCTP and
RXCSUM since having the bits there doesn't add any value.

Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 955dc71..2f2b2d7 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9032,10 +9032,14 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
np = netdev_priv(netdev);
np->vsi = vsi;
 
-   netdev->hw_enc_features |= NETIF_F_IP_CSUM|
-  NETIF_F_GSO_UDP_TUNNEL |
-  NETIF_F_GSO_GRE|
-  NETIF_F_TSO|
+   netdev->hw_enc_features |= NETIF_F_IP_CSUM |
+  NETIF_F_IPV6_CSUM   |
+  NETIF_F_TSO |
+  NETIF_F_TSO6|
+  NETIF_F_TSO_ECN |
+  NETIF_F_GSO_GRE |
+  NETIF_F_GSO_UDP_TUNNEL  |
+  NETIF_F_GSO_UDP_TUNNEL_CSUM |
   0;
 
netdev->features = NETIF_F_SG  |
@@ -9057,6 +9061,8 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
 
if (!(pf->flags & I40E_FLAG_MFP_ENABLED))
netdev->features |= NETIF_F_NTUPLE;
+   if (pf->flags & I40E_FLAG_OUTER_UDP_CSUM_CAPABLE)
+   netdev->features |= NETIF_F_GSO_UDP_TUNNEL_CSUM;
 
/* copy netdev features into list of user selectable features */
netdev->hw_features |= netdev->features;
-- 
2.5.0



[net-next 15/16] i40evf: Update feature flags to reflect newly enabled features

2016-02-18 Thread Jeff Kirsher
From: Alexander Duyck 

Recent changes should have enabled support for IPv6 based tunnels and
support for TSO with outer UDP checksums.  As such we can update the
feature flags to reflect that.

In addition we can clean up the flags that aren't needed, such as SCTP and
RXCSUM since having the bits there doesn't add any value.

I also found one spot where we were setting the same flag twice.  It looks
like it was probably a git merge error that resulted in the line being
duplicated.  As such I have dropped it in this patch.

Signed-off-by: Alexander Duyck 
Acked-by: Anjali Singhai Jain 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c 
b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 41369a3..3396fe3 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -2337,9 +2337,24 @@ int i40evf_process_config(struct i40evf_adapter *adapter)
NETIF_F_IPV6_CSUM |
NETIF_F_TSO |
NETIF_F_TSO6 |
+   NETIF_F_TSO_ECN |
+   NETIF_F_GSO_GRE|
+   NETIF_F_GSO_UDP_TUNNEL |
NETIF_F_RXCSUM |
NETIF_F_GRO;
 
+   netdev->hw_enc_features |= NETIF_F_IP_CSUM |
+  NETIF_F_IPV6_CSUM   |
+  NETIF_F_TSO |
+  NETIF_F_TSO6|
+  NETIF_F_TSO_ECN |
+  NETIF_F_GSO_GRE |
+  NETIF_F_GSO_UDP_TUNNEL  |
+  NETIF_F_GSO_UDP_TUNNEL_CSUM;
+
+   if (adapter->flags & I40EVF_FLAG_OUTER_UDP_CSUM_CAPABLE)
+   netdev->features |= NETIF_F_GSO_UDP_TUNNEL_CSUM;
+
/* copy netdev features into list of user selectable features */
netdev->hw_features |= netdev->features;
netdev->hw_features &= ~NETIF_F_RXCSUM;
@@ -2478,6 +2493,10 @@ static void i40evf_init_task(struct work_struct *work)
default:
goto err_alloc;
}
+
+   if (hw->mac.type == I40E_MAC_X722_VF)
+   adapter->flags |= I40EVF_FLAG_OUTER_UDP_CSUM_CAPABLE;
+
if (i40evf_process_config(adapter))
goto err_alloc;
adapter->current_op = I40E_VIRTCHNL_OP_UNKNOWN;
@@ -2519,10 +2538,6 @@ static void i40evf_init_task(struct work_struct *work)
goto err_sw_init;
i40evf_map_rings_to_vectors(adapter);
if (adapter->vf_res->vf_offload_flags &
-   I40E_VIRTCHNL_VF_OFFLOAD_WB_ON_ITR)
-   adapter->flags |= I40EVF_FLAG_WB_ON_ITR_CAPABLE;
-
-   if (adapter->vf_res->vf_offload_flags &
I40E_VIRTCHNL_VF_OFFLOAD_WB_ON_ITR)
adapter->flags |= I40EVF_FLAG_WB_ON_ITR_CAPABLE;
 
-- 
2.5.0



[net-next 05/16] i40e/i40evf: Replace header pointers with unions of pointers in Tx checksum path

2016-02-18 Thread Jeff Kirsher
From: Alexander Duyck 

The Tx checksum path was maintaining a set of 3 pointers and two lengths in
order to prepare the packet for being checksummed.  The thing is we only
really needed 2 pointers, and the lengths that were being maintained can
easily be computed.

As such we can replace the IPv4 and IPv6 header pointers with one single
union that represents both, or a generic pointer to the start of the
network header.  For the L4 headers we can do the same with TCP and a
generic pointer to the start of the transport header.  The length of the
TCP header is obtained by simply multiplying doff by 4, and the network
header length can be obtained by subtracting the network header pointer
from the transport header pointer.

While I was at it I renamed l4_hdr to l4_proto to make it a bit more clear
and less likely to be confused with l4.hdr which is the transport header
pointer.

Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 51 +-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 52 +--
 2 files changed, 52 insertions(+), 51 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index f47f9cb..5cc7e71 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2392,12 +2392,21 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, 
u32 *tx_flags,
struct i40e_ring *tx_ring,
u32 *cd_tunneling)
 {
-   struct ipv6hdr *this_ipv6_hdr;
-   unsigned int this_tcp_hdrlen;
-   struct iphdr *this_ip_hdr;
-   u32 network_hdr_len;
-   u8 l4_hdr = 0;
+   union {
+   struct iphdr *v4;
+   struct ipv6hdr *v6;
+   unsigned char *hdr;
+   } ip;
+   union {
+   struct tcphdr *tcp;
+   struct udphdr *udp;
+   unsigned char *hdr;
+   } l4;
u32 l4_tunnel = 0;
+   u8 l4_proto = 0;
+
+   ip.hdr = skb_network_header(skb);
+   l4.hdr = skb_transport_header(skb);
 
if (skb->encapsulation) {
switch (ip_hdr(skb)->protocol) {
@@ -2411,10 +2420,10 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, 
u32 *tx_flags,
default:
return;
}
-   network_hdr_len = skb_inner_network_header_len(skb);
-   this_ip_hdr = inner_ip_hdr(skb);
-   this_ipv6_hdr = inner_ipv6_hdr(skb);
-   this_tcp_hdrlen = inner_tcp_hdrlen(skb);
+
+   /* switch L4 header pointer from outer to inner */
+   ip.hdr = skb_inner_network_header(skb);
+   l4.hdr = skb_inner_transport_header(skb);
 
if (*tx_flags & I40E_TX_FLAGS_IPV4) {
if (*tx_flags & I40E_TX_FLAGS_TSO) {
@@ -2434,20 +2443,15 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, 
u32 *tx_flags,
   ((skb_inner_network_offset(skb) -
skb_transport_offset(skb)) >> 1) <<
   I40E_TXD_CTX_QW0_NATLEN_SHIFT;
-   if (this_ip_hdr->version == 6) {
+   if (ip.v6->version == 6) {
*tx_flags &= ~I40E_TX_FLAGS_IPV4;
*tx_flags |= I40E_TX_FLAGS_IPV6;
}
-   } else {
-   network_hdr_len = skb_network_header_len(skb);
-   this_ip_hdr = ip_hdr(skb);
-   this_ipv6_hdr = ipv6_hdr(skb);
-   this_tcp_hdrlen = tcp_hdrlen(skb);
}
 
/* Enable IP checksum offloads */
if (*tx_flags & I40E_TX_FLAGS_IPV4) {
-   l4_hdr = this_ip_hdr->protocol;
+   l4_proto = ip.v4->protocol;
/* the stack computes the IP header already, the only time we
 * need the hardware to recompute it is in the case of TSO.
 */
@@ -2456,26 +2460,23 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, 
u32 *tx_flags,
} else {
*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV4;
}
-   /* Now set the td_offset for IP header length */
-   *td_offset = (network_hdr_len >> 2) <<
- I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
} else if (*tx_flags & I40E_TX_FLAGS_IPV6) {
-   l4_hdr = this_ipv6_hdr->nexthdr;
+   l4_proto = ip.v6->nexthdr;
*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV6;
-   /* Now set the td_offset for IP header length */
-   *td_offset = (network_hdr_len >> 2) <<
- I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
}
+
+   /* Now 

[net-next 02/16] i40e/i40evf: Use u64 values instead of casting them in TSO function

2016-02-18 Thread Jeff Kirsher
From: Alexander Duyck 

Instead of casting u32 values to u64 it makes more sense to just start out
with u64 values in the first place.  This way we don't need to create a
mess with all of the casts needed to populate a 64-bit value.

Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 9 -
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 9 -
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index d4364ec..5e82589 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2267,7 +2267,7 @@ out:
 static int i40e_tso(struct i40e_ring *tx_ring, struct sk_buff *skb,
u8 *hdr_len, u64 *cd_type_cmd_tso_mss)
 {
-   u32 cd_cmd, cd_tso_len, cd_mss;
+   u64 cd_cmd, cd_tso_len, cd_mss;
struct ipv6hdr *ipv6h;
struct tcphdr *tcph;
struct iphdr *iph;
@@ -2309,10 +2309,9 @@ static int i40e_tso(struct i40e_ring *tx_ring, struct 
sk_buff *skb,
cd_cmd = I40E_TX_CTX_DESC_TSO;
cd_tso_len = skb->len - *hdr_len;
cd_mss = skb_shinfo(skb)->gso_size;
-   *cd_type_cmd_tso_mss |= ((u64)cd_cmd << I40E_TXD_CTX_QW1_CMD_SHIFT) |
-   ((u64)cd_tso_len <<
-I40E_TXD_CTX_QW1_TSO_LEN_SHIFT) |
-   ((u64)cd_mss << I40E_TXD_CTX_QW1_MSS_SHIFT);
+   *cd_type_cmd_tso_mss |= (cd_cmd << I40E_TXD_CTX_QW1_CMD_SHIFT) |
+   (cd_tso_len << I40E_TXD_CTX_QW1_TSO_LEN_SHIFT) |
+   (cd_mss << I40E_TXD_CTX_QW1_MSS_SHIFT);
return 1;
 }
 
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 8b20ed3..c5f98cb 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -1527,7 +1527,7 @@ out:
 static int i40e_tso(struct i40e_ring *tx_ring, struct sk_buff *skb,
u8 *hdr_len, u64 *cd_type_cmd_tso_mss)
 {
-   u32 cd_cmd, cd_tso_len, cd_mss;
+   u64 cd_cmd, cd_tso_len, cd_mss;
struct ipv6hdr *ipv6h;
struct tcphdr *tcph;
struct iphdr *iph;
@@ -1569,10 +1569,9 @@ static int i40e_tso(struct i40e_ring *tx_ring, struct 
sk_buff *skb,
cd_cmd = I40E_TX_CTX_DESC_TSO;
cd_tso_len = skb->len - *hdr_len;
cd_mss = skb_shinfo(skb)->gso_size;
-   *cd_type_cmd_tso_mss |= ((u64)cd_cmd << I40E_TXD_CTX_QW1_CMD_SHIFT) |
-   ((u64)cd_tso_len <<
-I40E_TXD_CTX_QW1_TSO_LEN_SHIFT) |
-   ((u64)cd_mss << I40E_TXD_CTX_QW1_MSS_SHIFT);
+   *cd_type_cmd_tso_mss |= (cd_cmd << I40E_TXD_CTX_QW1_CMD_SHIFT) |
+   (cd_tso_len << I40E_TXD_CTX_QW1_TSO_LEN_SHIFT) |
+   (cd_mss << I40E_TXD_CTX_QW1_MSS_SHIFT);
return 1;
 }
 
-- 
2.5.0



[net-next 16/16] i40e: Add support for ATR w/ IPv6 extension headers

2016-02-18 Thread Jeff Kirsher
From: Alexander Duyck 

This patch updates the code for determining the L4 protocol and L3 header
length so that when IPv6 extension headers are being used we can determine
the offset and type of the L4 protocol.

Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 28 +---
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 159fb6e..1d3afa7 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2044,7 +2044,7 @@ static void i40e_atr(struct i40e_ring *tx_ring, struct 
sk_buff *skb,
struct tcphdr *th;
unsigned int hlen;
u32 flex_ptype, dtype_cmd;
-   u8 l4_proto;
+   int l4_proto;
u16 i;
 
/* make sure ATR is enabled */
@@ -2062,25 +2062,23 @@ static void i40e_atr(struct i40e_ring *tx_ring, struct 
sk_buff *skb,
if (!(tx_flags & (I40E_TX_FLAGS_IPV4 | I40E_TX_FLAGS_IPV6)))
return;
 
-   if (!(tx_flags & I40E_TX_FLAGS_UDP_TUNNEL)) {
-   /* snag network header to get L4 type and address */
-   hdr.network = skb_network_header(skb);
+   /* snag network header to get L4 type and address */
+   hdr.network = (tx_flags & I40E_TX_FLAGS_UDP_TUNNEL) ?
+ skb_inner_network_header(skb) : skb_network_header(skb);
 
+   /* Note: tx_flags gets modified to reflect inner protocols in
+* tx_enable_csum function if encap is enabled.
+*/
+   if (tx_flags & I40E_TX_FLAGS_IPV4) {
/* access ihl as u8 to avoid unaligned access on ia64 */
-   if (tx_flags & I40E_TX_FLAGS_IPV4)
-   hlen = (hdr.network[0] & 0x0F) << 2;
-   else
-   hlen = sizeof(struct ipv6hdr);
+   hlen = (hdr.network[0] & 0x0F) << 2;
+   l4_proto = hdr.ipv4->protocol;
} else {
-   hdr.network = skb_inner_network_header(skb);
-   hlen = skb_inner_network_header_len(skb);
+   hlen = hdr.network - skb->data;
+   l4_proto = ipv6_find_hdr(skb, &hlen, IPPROTO_TCP, NULL, NULL);
+   hlen -= hdr.network - skb->data;
}
 
-   /* Note: tx_flags gets modified to reflect inner protocols in
-* tx_enable_csum function if encap is enabled.
-*/
-   l4_proto = (tx_flags & I40E_TX_FLAGS_IPV4) ? hdr.ipv4->protocol :
-hdr.ipv6->nexthdr;
if (l4_proto != IPPROTO_TCP)
return;
 
-- 
2.5.0



[net-next 07/16] i40e/i40evf: Handle IPv6 extension headers in checksum offload

2016-02-18 Thread Jeff Kirsher
From: Alexander Duyck 

This patch adds support for IPv6 extension headers in setting up the Tx
checksum.  Without this patch extension headers would cause IPv6 traffic to
fail as the transport protocol could not be identified.

Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 14 +-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 14 +-
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 1404cae..e49fe8f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2402,7 +2402,9 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, u32 
*tx_flags,
struct udphdr *udp;
unsigned char *hdr;
} l4;
+   unsigned char *exthdr;
u32 l4_tunnel = 0;
+   __be16 frag_off;
u8 l4_proto = 0;
 
ip.hdr = skb_network_header(skb);
@@ -2419,7 +2421,12 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, u32 
*tx_flags,
l4_proto = ip.v4->protocol;
} else if (*tx_flags & I40E_TX_FLAGS_IPV6) {
*cd_tunneling |= I40E_TX_CTX_EXT_IP_IPV6;
+
+   exthdr = ip.hdr + sizeof(*ip.v6);
l4_proto = ip.v6->nexthdr;
+   if (l4.hdr != exthdr)
+   ipv6_skip_exthdr(skb, exthdr - skb->data,
+    &l4_proto, &frag_off);
}
 
/* define outer transport */
@@ -2469,8 +2476,13 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, u32 
*tx_flags,
*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV4;
}
} else if (*tx_flags & I40E_TX_FLAGS_IPV6) {
-   l4_proto = ip.v6->nexthdr;
*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV6;
+
+   exthdr = ip.hdr + sizeof(*ip.v6);
+   l4_proto = ip.v6->nexthdr;
+   if (l4.hdr != exthdr)
+   ipv6_skip_exthdr(skb, exthdr - skb->data,
+    &l4_proto, &frag_off);
}
 
/* Now set the td_offset for IP header length */
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 39d5f80..48ec763 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -1619,7 +1619,9 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, u32 
*tx_flags,
struct udphdr *udp;
unsigned char *hdr;
} l4;
+   unsigned char *exthdr;
u32 l4_tunnel = 0;
+   __be16 frag_off;
u8 l4_proto = 0;
 
ip.hdr = skb_network_header(skb);
@@ -1636,7 +1638,12 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, u32 
*tx_flags,
l4_proto = ip.v4->protocol;
} else if (*tx_flags & I40E_TX_FLAGS_IPV6) {
*cd_tunneling |= I40E_TX_CTX_EXT_IP_IPV6;
+
+   exthdr = ip.hdr + sizeof(*ip.v6);
l4_proto = ip.v6->nexthdr;
+   if (l4.hdr != exthdr)
+   ipv6_skip_exthdr(skb, exthdr - skb->data,
+    &l4_proto, &frag_off);
}
 
/* define outer transport */
@@ -1686,8 +1693,13 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, u32 
*tx_flags,
*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV4;
}
} else if (*tx_flags & I40E_TX_FLAGS_IPV6) {
-   l4_proto = ip.v6->nexthdr;
*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV6;
+
+   exthdr = ip.hdr + sizeof(*ip.v6);
+   l4_proto = ip.v6->nexthdr;
+   if (l4.hdr != exthdr)
+   ipv6_skip_exthdr(skb, exthdr - skb->data,
+    &l4_proto, &frag_off);
}
 
/* Now set the td_offset for IP header length */
-- 
2.5.0



[net-next 06/16] i40e/i40evf: Add support for IPv4 encapsulated in IPv6

2016-02-18 Thread Jeff Kirsher
From: Alexander Duyck 

This patch fixes two issues.  First was the fact that ip_hdr(skb)->protocol
was being used to test for the outer transport protocol.  This completely
breaks IPv6 support.  Second was the fact that we cleared the flag for v4
going to v6, but we didn't take care of tx_flags going the other way.  As
such we would have the v6 flag still set even if the inner header was v4.

Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 38 +++--
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 41 +--
 2 files changed, 49 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 5cc7e71..1404cae 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2409,13 +2409,28 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, 
u32 *tx_flags,
l4.hdr = skb_transport_header(skb);
 
if (skb->encapsulation) {
-   switch (ip_hdr(skb)->protocol) {
+   /* define outer network header type */
+   if (*tx_flags & I40E_TX_FLAGS_IPV4) {
+   if (*tx_flags & I40E_TX_FLAGS_TSO)
+   *cd_tunneling |= I40E_TX_CTX_EXT_IP_IPV4;
+   else
+   *cd_tunneling |=
+I40E_TX_CTX_EXT_IP_IPV4_NO_CSUM;
+   l4_proto = ip.v4->protocol;
+   } else if (*tx_flags & I40E_TX_FLAGS_IPV6) {
+   *cd_tunneling |= I40E_TX_CTX_EXT_IP_IPV6;
+   l4_proto = ip.v6->nexthdr;
+   }
+
+   /* define outer transport */
+   switch (l4_proto) {
case IPPROTO_UDP:
l4_tunnel = I40E_TXD_CTX_UDP_TUNNELING;
*tx_flags |= I40E_TX_FLAGS_UDP_TUNNEL;
break;
case IPPROTO_GRE:
l4_tunnel = I40E_TXD_CTX_GRE_TUNNELING;
+   *tx_flags |= I40E_TX_FLAGS_UDP_TUNNEL;
break;
default:
return;
@@ -2424,17 +2439,7 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, u32 
*tx_flags,
/* switch L4 header pointer from outer to inner */
ip.hdr = skb_inner_network_header(skb);
l4.hdr = skb_inner_transport_header(skb);
-
-   if (*tx_flags & I40E_TX_FLAGS_IPV4) {
-   if (*tx_flags & I40E_TX_FLAGS_TSO) {
-   *cd_tunneling |= I40E_TX_CTX_EXT_IP_IPV4;
-   } else {
-   *cd_tunneling |=
-I40E_TX_CTX_EXT_IP_IPV4_NO_CSUM;
-   }
-   } else if (*tx_flags & I40E_TX_FLAGS_IPV6) {
-   *cd_tunneling |= I40E_TX_CTX_EXT_IP_IPV6;
-   }
+   l4_proto = 0;
 
/* Now set the ctx descriptor fields */
*cd_tunneling |= (skb_network_header_len(skb) >> 2) <<
@@ -2443,10 +2448,13 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, 
u32 *tx_flags,
   ((skb_inner_network_offset(skb) -
skb_transport_offset(skb)) >> 1) <<
   I40E_TXD_CTX_QW0_NATLEN_SHIFT;
-   if (ip.v6->version == 6) {
-   *tx_flags &= ~I40E_TX_FLAGS_IPV4;
+
+   /* reset type as we transition from outer to inner headers */
+   *tx_flags &= ~(I40E_TX_FLAGS_IPV4 | I40E_TX_FLAGS_IPV6);
+   if (ip.v4->version == 4)
+   *tx_flags |= I40E_TX_FLAGS_IPV4;
+   if (ip.v6->version == 6)
*tx_flags |= I40E_TX_FLAGS_IPV6;
-   }
}
 
/* Enable IP checksum offloads */
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 29af3c9f..39d5f80 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -1626,11 +1626,29 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, 
u32 *tx_flags,
l4.hdr = skb_transport_header(skb);
 
if (skb->encapsulation) {
-   switch (ip_hdr(skb)->protocol) {
+   /* define outer network header type */
+   if (*tx_flags & I40E_TX_FLAGS_IPV4) {
+   if (*tx_flags & I40E_TX_FLAGS_TSO)
+   *cd_tunneling |= I40E_TX_CTX_EXT_IP_IPV4;
+   else
+   *cd_tunneling |=
+  

[net-next 01/16] i40e/i40evf: Drop outer checksum offload that was not requested

2016-02-18 Thread Jeff Kirsher
From: Alexander Duyck 

The i40e and i40evf drivers contained code for inserting an outer checksum
on UDP tunnels.  The issue however is that the upper levels of the stack
never requested such an offload and it results in possible errors.

In addition the same logic was being applied to the Rx side where it was
attempting to validate the outer checksum, but the logic there was
incorrect in that it was testing for the resultant sum to be equal to the
header checksum instead of being equal to 0.

Since this code is so massively flawed, and does things that we didn't ask
it to do, I am just dropping it, and will bring it back later to use as
an offload for SKB_GSO_UDP_TUNNEL_CSUM, which can make use of such a
feature.

As far as the Rx feature I am dropping it completely since it would need to
be massively expanded and applied to IPv4 and IPv6 checksums for all parts,
not just the one that supports Tx checksum offload for the outer.

Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c   |  2 --
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 47 +++
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   |  1 -
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 46 +++---
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h |  1 -
 5 files changed, 10 insertions(+), 87 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 16e5e0b..0fa52ed 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -7474,8 +7474,6 @@ static int i40e_alloc_rings(struct i40e_vsi *vsi)
tx_ring->dcb_tc = 0;
if (vsi->back->flags & I40E_FLAG_WB_ON_ITR_CAPABLE)
tx_ring->flags = I40E_TXR_FLAGS_WB_ON_ITR;
-   if (vsi->back->flags & I40E_FLAG_OUTER_UDP_CSUM_CAPABLE)
-   tx_ring->flags |= I40E_TXR_FLAGS_OUTER_UDP_CSUM;
vsi->tx_rings[i] = tx_ring;
 
rx_ring = &tx_ring[1];
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 65f2fd8..d4364ec 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1391,9 +1391,6 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
struct i40e_rx_ptype_decoded decoded = decode_rx_desc_ptype(rx_ptype);
bool ipv4 = false, ipv6 = false;
bool ipv4_tunnel, ipv6_tunnel;
-   __wsum rx_udp_csum;
-   struct iphdr *iph;
-   __sum16 csum;
 
ipv4_tunnel = (rx_ptype >= I40E_RX_PTYPE_GRENAT4_MAC_PAY3) &&
 (rx_ptype <= I40E_RX_PTYPE_GRENAT4_MACVLAN_IPV6_ICMP_PAY4);
@@ -1443,37 +1440,12 @@ static inline void i40e_rx_checksum(struct i40e_vsi 
*vsi,
if (rx_error & BIT(I40E_RX_DESC_ERROR_PPRS_SHIFT))
return;
 
-   /* If VXLAN/GENEVE traffic has an outer UDPv4 checksum we need to check
-* it in the driver, hardware does not do it for us.
-* Since L3L4P bit was set we assume a valid IHL value (>=5)
-* so the total length of IPv4 header is IHL*4 bytes
-* The UDP_0 bit *may* bet set if the *inner* header is UDP
+   /* The hardware supported by this driver does not validate outer
+* checksums for tunneled VXLAN or GENEVE frames.  I don't agree
+* with it but the specification states that you "MAY validate", it
+* doesn't make it a hard requirement so if we have validated the
+* inner checksum report CHECKSUM_UNNECESSARY.
 */
-   if (!(vsi->back->flags & I40E_FLAG_OUTER_UDP_CSUM_CAPABLE) &&
-   (ipv4_tunnel)) {
-   skb->transport_header = skb->mac_header +
-   sizeof(struct ethhdr) +
-   (ip_hdr(skb)->ihl * 4);
-
-   /* Add 4 bytes for VLAN tagged packets */
-   skb->transport_header += (skb->protocol == htons(ETH_P_8021Q) ||
- skb->protocol == htons(ETH_P_8021AD))
- ? VLAN_HLEN : 0;
-
-   if ((ip_hdr(skb)->protocol == IPPROTO_UDP) &&
-   (udp_hdr(skb)->check != 0)) {
-   rx_udp_csum = udp_csum(skb);
-   iph = ip_hdr(skb);
-   csum = csum_tcpudp_magic(
-   iph->saddr, iph->daddr,
-   (skb->len - skb_transport_offset(skb)),
-   IPPROTO_UDP, rx_udp_csum);
-
-   if (udp_hdr(skb)->check != csum)
-   goto checksum_fail;
-
-   } /* else its GRE and so no outer UDP 

[net-next 09/16] i40e/i40evf: Add exception handling for Tx checksum

2016-02-18 Thread Jeff Kirsher
From: Alexander Duyck 

Add exception handling to the Tx checksum path so that we can handle cases
of TSO where the frame is bad, or Tx checksum where we didn't recognize a
protocol.

Drop I40E_TX_FLAGS_CSUM as it is unused, move the CHECKSUM_PARTIAL check
into the function itself so that we can decrease indent.

Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 34 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   |  1 -
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 35 ++-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h |  1 -
 4 files changed, 45 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 5b591b8..6b08b0f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2387,10 +2387,10 @@ static int i40e_tsyn(struct i40e_ring *tx_ring, struct 
sk_buff *skb,
  * @tx_ring: Tx descriptor ring
  * @cd_tunneling: ptr to context desc bits
  **/
-static void i40e_tx_enable_csum(struct sk_buff *skb, u32 *tx_flags,
-   u32 *td_cmd, u32 *td_offset,
-   struct i40e_ring *tx_ring,
-   u32 *cd_tunneling)
+static int i40e_tx_enable_csum(struct sk_buff *skb, u32 *tx_flags,
+  u32 *td_cmd, u32 *td_offset,
+  struct i40e_ring *tx_ring,
+  u32 *cd_tunneling)
 {
union {
struct iphdr *v4;
@@ -2407,6 +2407,9 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, u32 
*tx_flags,
__be16 frag_off;
u8 l4_proto = 0;
 
+   if (skb->ip_summed != CHECKSUM_PARTIAL)
+   return 0;
+
ip.hdr = skb_network_header(skb);
l4.hdr = skb_transport_header(skb);
 
@@ -2449,7 +2452,11 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, u32 
*tx_flags,
*tx_flags |= I40E_TX_FLAGS_UDP_TUNNEL;
break;
default:
-   return;
+   if (*tx_flags & I40E_TX_FLAGS_TSO)
+   return -1;
+
+   skb_checksum_help(skb);
+   return 0;
}
 
/* compute tunnel header size */
@@ -2513,11 +2520,16 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, 
u32 *tx_flags,
  I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
break;
default:
-   break;
+   if (*tx_flags & I40E_TX_FLAGS_TSO)
+   return -1;
+   skb_checksum_help(skb);
+   return 0;
}
 
*td_cmd |= cmd;
*td_offset |= offset;
+
+   return 1;
 }
 
 /**
@@ -2954,12 +2966,10 @@ static netdev_tx_t i40e_xmit_frame_ring(struct sk_buff 
*skb,
td_cmd |= I40E_TX_DESC_CMD_ICRC;
 
/* Always offload the checksum, since it's in the data descriptor */
-   if (skb->ip_summed == CHECKSUM_PARTIAL) {
-   tx_flags |= I40E_TX_FLAGS_CSUM;
-
-   i40e_tx_enable_csum(skb, &tx_flags, &td_cmd, &td_offset,
-   tx_ring, &cd_tunneling);
-   }
+   tso = i40e_tx_enable_csum(skb, &tx_flags, &td_cmd, &td_offset,
+ tx_ring, &cd_tunneling);
+   if (tso < 0)
+   goto out_drop;
 
i40e_create_tx_ctx(tx_ring, cd_type_cmd_tso_mss,
   cd_tunneling, cd_l2tag2);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index fb065d4..fde5f42 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -153,7 +153,6 @@ enum i40e_dyn_idx_t {
 #define DESC_NEEDED (MAX_SKB_FRAGS + 4)
 #define I40E_MIN_DESC_PENDING  4
 
-#define I40E_TX_FLAGS_CSUM BIT(0)
 #define I40E_TX_FLAGS_HW_VLAN  BIT(1)
 #define I40E_TX_FLAGS_SW_VLAN  BIT(2)
 #define I40E_TX_FLAGS_TSO  BIT(3)
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 0ee13f6..143c570 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -1602,12 +1602,13 @@ static int i40e_tso(struct i40e_ring *tx_ring, struct 
sk_buff *skb,
  * @tx_flags: pointer to Tx flags currently set
  * @td_cmd: Tx descriptor command bits to set
  * @td_offset: Tx descriptor header offsets to set
+ * @tx_ring: Tx descriptor ring
  * @cd_tunneling: ptr to context desc bits
  **/
-static void i40e_tx_enable_csum(struct sk_buff *skb, u32 *tx_flags,
-   u32 *td_cmd, u32 *td_offset,
-   

[net-next 08/16] i40e/i40evf: Do not write to descriptor unless we complete

2016-02-18 Thread Jeff Kirsher
From: Alexander Duyck 

This patch defers writing to the Tx descriptor bits until we know we have
successfully completed a given operation.  So for example we defer updating
the tunnelling portion of the context descriptor until we have fully
identified the type.

The advantage to this approach is that we can assemble values as we go
instead of having to try and kludge everything together all at once.  As a
result we can significantly clean up the tunneling configuration for
instance as we can just do a pointer walk and do the math for the distance
between each set of points.

Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 80 ++-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 80 ++-
 2 files changed, 84 insertions(+), 76 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index e49fe8f..5b591b8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2403,24 +2403,26 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, 
u32 *tx_flags,
unsigned char *hdr;
} l4;
unsigned char *exthdr;
-   u32 l4_tunnel = 0;
+   u32 offset, cmd = 0, tunnel = 0;
__be16 frag_off;
u8 l4_proto = 0;
 
ip.hdr = skb_network_header(skb);
l4.hdr = skb_transport_header(skb);
 
+   /* compute outer L2 header size */
+   offset = ((ip.hdr - skb->data) / 2) << I40E_TX_DESC_LENGTH_MACLEN_SHIFT;
+
if (skb->encapsulation) {
/* define outer network header type */
if (*tx_flags & I40E_TX_FLAGS_IPV4) {
-   if (*tx_flags & I40E_TX_FLAGS_TSO)
-   *cd_tunneling |= I40E_TX_CTX_EXT_IP_IPV4;
-   else
-   *cd_tunneling |=
-I40E_TX_CTX_EXT_IP_IPV4_NO_CSUM;
+   tunnel |= (*tx_flags & I40E_TX_FLAGS_TSO) ?
+ I40E_TX_CTX_EXT_IP_IPV4 :
+ I40E_TX_CTX_EXT_IP_IPV4_NO_CSUM;
+
l4_proto = ip.v4->protocol;
} else if (*tx_flags & I40E_TX_FLAGS_IPV6) {
-   *cd_tunneling |= I40E_TX_CTX_EXT_IP_IPV6;
+   tunnel |= I40E_TX_CTX_EXT_IP_IPV6;
 
exthdr = ip.hdr + sizeof(*ip.v6);
l4_proto = ip.v6->nexthdr;
@@ -2429,33 +2431,38 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, 
u32 *tx_flags,
 &l4_proto, &frag_off);
}
 
+   /* compute outer L3 header size */
+   tunnel |= ((l4.hdr - ip.hdr) / 4) <<
+ I40E_TXD_CTX_QW0_EXT_IPLEN_SHIFT;
+
+   /* switch IP header pointer from outer to inner header */
+   ip.hdr = skb_inner_network_header(skb);
+
/* define outer transport */
switch (l4_proto) {
case IPPROTO_UDP:
-   l4_tunnel = I40E_TXD_CTX_UDP_TUNNELING;
+   tunnel |= I40E_TXD_CTX_UDP_TUNNELING;
*tx_flags |= I40E_TX_FLAGS_UDP_TUNNEL;
break;
case IPPROTO_GRE:
-   l4_tunnel = I40E_TXD_CTX_GRE_TUNNELING;
+   tunnel |= I40E_TXD_CTX_GRE_TUNNELING;
*tx_flags |= I40E_TX_FLAGS_UDP_TUNNEL;
break;
default:
return;
}
 
+   /* compute tunnel header size */
+   tunnel |= ((ip.hdr - l4.hdr) / 2) <<
+ I40E_TXD_CTX_QW0_NATLEN_SHIFT;
+
+   /* record tunnel offload values */
+   *cd_tunneling |= tunnel;
+
/* switch L4 header pointer from outer to inner */
-   ip.hdr = skb_inner_network_header(skb);
l4.hdr = skb_inner_transport_header(skb);
l4_proto = 0;
 
-   /* Now set the ctx descriptor fields */
-   *cd_tunneling |= (skb_network_header_len(skb) >> 2) <<
-  I40E_TXD_CTX_QW0_EXT_IPLEN_SHIFT  |
-  l4_tunnel |
-  ((skb_inner_network_offset(skb) -
-   skb_transport_offset(skb)) >> 1) <<
-  I40E_TXD_CTX_QW0_NATLEN_SHIFT;
-
/* reset type as we transition from outer to inner headers */
*tx_flags &= ~(I40E_TX_FLAGS_IPV4 | I40E_TX_FLAGS_IPV6);
if (ip.v4->version == 4)
@@ 

[net-next 10/16] i40e/i40evf: Clean-up Rx packet checksum handling

2016-02-18 Thread Jeff Kirsher
From: Alexander Duyck 

This is mostly a minor clean-up for the Rx checksum path in order to avoid
some of the unnecessary conditional checks that were being applied.

Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 23 ++-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 23 ++-
 2 files changed, 20 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 6b08b0f..ded73c0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1389,13 +1389,7 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
u16 rx_ptype)
 {
struct i40e_rx_ptype_decoded decoded = decode_rx_desc_ptype(rx_ptype);
-   bool ipv4 = false, ipv6 = false;
-   bool ipv4_tunnel, ipv6_tunnel;
-
-   ipv4_tunnel = (rx_ptype >= I40E_RX_PTYPE_GRENAT4_MAC_PAY3) &&
-(rx_ptype <= I40E_RX_PTYPE_GRENAT4_MACVLAN_IPV6_ICMP_PAY4);
-   ipv6_tunnel = (rx_ptype >= I40E_RX_PTYPE_GRENAT6_MAC_PAY3) &&
-(rx_ptype <= I40E_RX_PTYPE_GRENAT6_MACVLAN_IPV6_ICMP_PAY4);
+   bool ipv4, ipv6, ipv4_tunnel, ipv6_tunnel;
 
skb->ip_summed = CHECKSUM_NONE;
 
@@ -1411,12 +1405,10 @@ static inline void i40e_rx_checksum(struct i40e_vsi 
*vsi,
if (!(decoded.known && decoded.outer_ip))
return;
 
-   if (decoded.outer_ip == I40E_RX_PTYPE_OUTER_IP &&
-   decoded.outer_ip_ver == I40E_RX_PTYPE_OUTER_IPV4)
-   ipv4 = true;
-   else if (decoded.outer_ip == I40E_RX_PTYPE_OUTER_IP &&
-decoded.outer_ip_ver == I40E_RX_PTYPE_OUTER_IPV6)
-   ipv6 = true;
+   ipv4 = (decoded.outer_ip == I40E_RX_PTYPE_OUTER_IP) &&
+  (decoded.outer_ip_ver == I40E_RX_PTYPE_OUTER_IPV4);
+   ipv6 = (decoded.outer_ip == I40E_RX_PTYPE_OUTER_IP) &&
+  (decoded.outer_ip_ver == I40E_RX_PTYPE_OUTER_IPV6);
 
if (ipv4 &&
(rx_error & (BIT(I40E_RX_DESC_ERROR_IPE_SHIFT) |
@@ -1447,6 +1439,11 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
 * inner checksum report CHECKSUM_UNNECESSARY.
 */
 
+   ipv4_tunnel = (rx_ptype >= I40E_RX_PTYPE_GRENAT4_MAC_PAY3) &&
+(rx_ptype <= I40E_RX_PTYPE_GRENAT4_MACVLAN_IPV6_ICMP_PAY4);
+   ipv6_tunnel = (rx_ptype >= I40E_RX_PTYPE_GRENAT6_MAC_PAY3) &&
+(rx_ptype <= I40E_RX_PTYPE_GRENAT6_MACVLAN_IPV6_ICMP_PAY4);
+
skb->ip_summed = CHECKSUM_UNNECESSARY;
skb->csum_level = ipv4_tunnel || ipv6_tunnel;
 
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 143c570..3f40e0e 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -861,13 +861,7 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
u16 rx_ptype)
 {
struct i40e_rx_ptype_decoded decoded = decode_rx_desc_ptype(rx_ptype);
-   bool ipv4 = false, ipv6 = false;
-   bool ipv4_tunnel, ipv6_tunnel;
-
-   ipv4_tunnel = (rx_ptype >= I40E_RX_PTYPE_GRENAT4_MAC_PAY3) &&
-(rx_ptype <= I40E_RX_PTYPE_GRENAT4_MACVLAN_IPV6_ICMP_PAY4);
-   ipv6_tunnel = (rx_ptype >= I40E_RX_PTYPE_GRENAT6_MAC_PAY3) &&
-(rx_ptype <= I40E_RX_PTYPE_GRENAT6_MACVLAN_IPV6_ICMP_PAY4);
+   bool ipv4, ipv6, ipv4_tunnel, ipv6_tunnel;
 
skb->ip_summed = CHECKSUM_NONE;
 
@@ -883,12 +877,10 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
if (!(decoded.known && decoded.outer_ip))
return;
 
-   if (decoded.outer_ip == I40E_RX_PTYPE_OUTER_IP &&
-   decoded.outer_ip_ver == I40E_RX_PTYPE_OUTER_IPV4)
-   ipv4 = true;
-   else if (decoded.outer_ip == I40E_RX_PTYPE_OUTER_IP &&
-decoded.outer_ip_ver == I40E_RX_PTYPE_OUTER_IPV6)
-   ipv6 = true;
+   ipv4 = (decoded.outer_ip == I40E_RX_PTYPE_OUTER_IP) &&
+  (decoded.outer_ip_ver == I40E_RX_PTYPE_OUTER_IPV4);
+   ipv6 = (decoded.outer_ip == I40E_RX_PTYPE_OUTER_IP) &&
+  (decoded.outer_ip_ver == I40E_RX_PTYPE_OUTER_IPV6);
 
if (ipv4 &&
(rx_error & (BIT(I40E_RX_DESC_ERROR_IPE_SHIFT) |
@@ -919,6 +911,11 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
 * inner checksum report CHECKSUM_UNNECESSARY.
 */
 
+   ipv4_tunnel = (rx_ptype >= I40E_RX_PTYPE_GRENAT4_MAC_PAY3) &&
+(rx_ptype <= I40E_RX_PTYPE_GRENAT4_MACVLAN_IPV6_ICMP_PAY4);
+   ipv6_tunnel = (rx_ptype >= I40E_RX_PTYPE_GRENAT6_MAC_PAY3) &&
+(rx_ptype <= 

[net-next 00/16][pull request] 40GbE Intel Wired LAN Driver Updates 2016-02-18

2016-02-18 Thread Jeff Kirsher
This series contains updates to i40e and i40evf only.

Alex Duyck provides all the patches in the series to update and fix the
drivers.  Fixed the driver to drop the outer checksum offload on UDP
tunnels, since the issue is that the upper levels of the stack never
requested such an offload and it results in possible errors.  Updates the
TSO function to just use u64 values, so we do not have to end up casting
u32 values.  In the TSO path, factored out the L4 header offsets allowing
us to ignore the L4 header offsets when dealing with the L3 checksum and
length update.  Consolidates all of the spots where we were updating
either the TCP or IP checksums in the TSO and checksum path into the TSO
function.  Fixed two issues by adding support for IPv4 encapsulated in
IPv6, first issue was the fact that iphdr(skb)->protocol was being used to
test for the outer transport protocol which breaks IPv6 support.  The second
was that we cleared the flag for v4 going to v6, but we did not take care
of txflags going the other way.  Added support for IPv6 extension headers
in setting up the Tx checksum.  Added exception handling to the Tx
checksum path so that we can handle cases of TSO where the frame is bad,
or Tx checksum where we did not recognize a protocol.  Fixed a number of
issues to make certain that we are using the correct protocols when
parsing both the inner and outer headers of a frame that is mixed between
IPv4 and IPv6 for inner and outer.  Updated the feature flags to reflect
the newly enabled/added features.

Sorry, no witty patch descriptions this time around, probably should
let Mitch help in writing patch descriptions for Alex. :-)

The following are changes since commit 7e6e18fbc033e00a4d4af3d4ea7bad0db6b7ad1b:
  net_sched: Improve readability of filter processing
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE

Alexander Duyck (16):
  i40e/i40evf: Drop outer checksum offload that was not requested
  i40e/i40evf: Use u64 values instead of casting them in TSO function
  i40e/i40evf: Factor out L4 header and checksum from L3 bits in TSO
path
  i40e/i40evf: Consolidate all header changes into TSO function
  i40e/i40evf: Replace header pointers with unions of pointers in Tx
checksum path
  i40e/i40evf: Add support for IPv4 encapsulated in IPv6
  i40e/i40evf: Handle IPv6 extension headers in checksum offload
  i40e/i40evf: Do not write to descriptor unless we complete
  i40e/i40evf: Add exception handling for Tx checksum
  i40e/i40evf: Clean-up Rx packet checksum handling
  i40e/i40evf: Enable support for SKB_GSO_UDP_TUNNEL_CSUM
  i40e: Fix ATR in relation to tunnels
  i40e: Do not drop support for IPv6 VXLAN or GENEVE tunnels
  i40e: Update feature flags to reflect newly enabled features
  i40evf: Update feature flags to reflect newly enabled features
  i40e: Add support for ATR w/ IPv6 extension headers

 drivers/net/ethernet/intel/i40e/i40e_main.c |  28 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 404 
 drivers/net/ethernet/intel/i40e/i40e_txrx.h |   2 -
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c   | 360 +++--
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h   |   2 -
 drivers/net/ethernet/intel/i40evf/i40evf_main.c |  23 +-
 6 files changed, 433 insertions(+), 386 deletions(-)

-- 
2.5.0



RE: [PATCH V2 net-next 2/3] lan78xx: add ethtool set & get pause functions

2016-02-18 Thread Woojung.Huh
> > Ben, thanks for comments.
> > How about comment in include/uapi/linux/ethtool.h?
> > It says
> > ** struct ethtool_pauseparam - Ethernet pause (flow control) parameters
> > ...
> > * If @autoneg is non-zero, the MAC is configured to send and/or
> > * receive pause frames according to the result of autonegotiation.
> >
> > Doesn't this mean get_pauseparam() returns pause settings based on
> > Result of autonegotation? Not manual settings of rx_param & tx_param?
> 
> No, get_pauseparam should return the same settings that were passed to
> the last set_pauseparam.
> 
> Ben.

I used the drivers/net/ethernet/intel/e1000e driver as a reference.
Its implementation also returns the status updated after autonegotiation.
Am I looking at the wrong one?

Woojung


Re: [PATCH V2 net-next 2/3] lan78xx: add ethtool set & get pause functions

2016-02-18 Thread Ben Hutchings
On Fri, 2016-02-19 at 00:03 +, woojung@microchip.com wrote:
> > > Add ethtool operations of set_pauseparam and get_pauseparam.
> > [...]
> > > +static void lan78xx_get_pause(struct net_device *net,
> > > +   struct ethtool_pauseparam *pause)
> > > +{
> > > + struct lan78xx_net *dev = netdev_priv(net);
> > > + struct phy_device *phydev = net->phydev;
> > > + struct ethtool_cmd ecmd = { .cmd = ETHTOOL_GSET };
> > > +
> > > + phy_ethtool_gset(phydev, &ecmd);
> > > +
> > > + pause->autoneg = dev->fc_autoneg;
> > > +
> > > + if (dev->fc_autoneg) {
> > > + if (dev->fc_autoneg_control & FLOW_CTRL_TX)
> > > + pause->tx_pause = 1;
> > > +
> > > + if (dev->fc_autoneg_control & FLOW_CTRL_RX)
> > > + pause->rx_pause = 1;
> > 
> > This is incorrect; you should always return the manual settings
> > (fc_request_control flags) here.  If autonegotiation is enabled then
> > your get_settings function will return the actual pause flags.
> > 
> > Ben.
> 
> Ben, thanks for comments.
> How about comment in include/uapi/linux/ethtool.h?
> It says 
> ** struct ethtool_pauseparam - Ethernet pause (flow control) parameters
> ...
> * If @autoneg is non-zero, the MAC is configured to send and/or
> * receive pause frames according to the result of autonegotiation.
> 
> Doesn't this mean get_pauseparam() returns pause settings based on 
> Result of autonegotation? Not manual settings of rx_param & tx_param?

No, get_pauseparam should return the same settings that were passed to
the last set_pauseparam.

Ben.

-- 
Ben Hutchings
Tomorrow will be cancelled due to lack of interest.

signature.asc
Description: This is a digitally signed message part


RE: [PATCH V2 net-next 2/3] lan78xx: add ethtool set & get pause functions

2016-02-18 Thread Woojung.Huh
> > Add ethtool operations of set_pauseparam and get_pauseparam.
> [...]
> > +static void lan78xx_get_pause(struct net_device *net,
> > +     struct ethtool_pauseparam *pause)
> > +{
> > +   struct lan78xx_net *dev = netdev_priv(net);
> > +   struct phy_device *phydev = net->phydev;
> > +   struct ethtool_cmd ecmd = { .cmd = ETHTOOL_GSET };
> > +
> > +   phy_ethtool_gset(phydev, &ecmd);
> > +
> > +   pause->autoneg = dev->fc_autoneg;
> > +
> > +   if (dev->fc_autoneg) {
> > +   if (dev->fc_autoneg_control & FLOW_CTRL_TX)
> > +   pause->tx_pause = 1;
> > +
> > +   if (dev->fc_autoneg_control & FLOW_CTRL_RX)
> > +   pause->rx_pause = 1;
> 
> This is incorrect; you should always return the manual settings
> (fc_request_control flags) here.  If autonegotiation is enabled then
> your get_settings function will return the actual pause flags.
> 
> Ben.

Ben, thanks for comments.
How about comment in include/uapi/linux/ethtool.h?
It says 
** struct ethtool_pauseparam - Ethernet pause (flow control) parameters
...
* If @autoneg is non-zero, the MAC is configured to send and/or
* receive pause frames according to the result of autonegotiation.

Doesn't this mean get_pauseparam() returns pause settings based on 
Result of autonegotation? Not manual settings of rx_param & tx_param?

Woojung


Re: [PATCH 1/1] ser_gigaset: use container_of() instead of detour

2016-02-18 Thread Paul Bolle
Hi Greg,

On do, 2016-02-18 at 15:46 -0800, Greg KH wrote:
> 
> 
> This is not the correct way to submit patches for inclusion in the
> stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
> for how to do this properly.
> 
> 

What in my submission triggers this form letter?

Thanks,


Paul Bolle


Re: [PATCH] rose: correct integer overflow check

2016-02-18 Thread Ralf Baechle
On Thu, Feb 18, 2016 at 04:03:16PM -0500, Insu Yun wrote:

> 
> Because of the types on the right hand side of the comparison
> the expressions are all promoted to unsigned.
> 
> Did you look at the compiler's assembler output?  I did when
> reviewing your patch.
> 
> 
> I checked the assembler output right now.
> You are right.
> I realized that right hand side becomes unsigned due to sizeof.
> I think this patch is wrong. 
> Thanks. 

On a different level, ROSE's current approach of just generating
a fixed number of devices at initialization time is, if not wrong,
then at least very archaic.  The default number is 10 devices, and probably
9 of those are unused on a typical setup - that is, if the module has
been loaded intentionally at all.

As a solution I've implemented a patch to support creating ROSE
devices through netlink, plus the necessary changes to iproute2 to go
along with that.

  Ralf


Re: [PATCH 1/1] ser_gigaset: use container_of() instead of detour

2016-02-18 Thread Greg KH
On Thu, Feb 18, 2016 at 09:29:08PM +0100, Paul Bolle wrote:
> The purpose of gigaset_device_release() is to kfree() the struct
> ser_cardstate that contains our struct device. This is done via a bit of
> a detour. First we make our struct device's driver_data point to the
> container of our struct ser_cardstate (which is a struct cardstate). In
> gigaset_device_release() we then retrieve that driver_data again. And
> after that we finally kfree() the struct ser_cardstate that was saved in
> the struct cardstate.
> 
> All of this can be achieved much easier by using container_of() to get
> from our struct device to its container, struct ser_cardstate. Do so.
> 
> Note that at the time the detour was implemented commit b8b2c7d845d5
> ("base/platform: assert that dev_pm_domain callbacks are called
> unconditionally") had just entered the tree. That commit disconnected
> our platform_device and our platform_driver. These were reconnected
> again in v4.5-rc2 through commit 25cad69f21f5 ("base/platform: Fix
> platform drivers with no probe callback"). And one of the consequences
> of that fix was that it broke the detour via driver_data. That's because
> it made __device_release_driver() stop being a NOP for our struct device
> and actually do stuff again. One of the things it now does, is setting
> our driver_data to NULL. That, in turn, makes it impossible for
> gigaset_device_release() to get to our struct cardstate. Which has the
> net effect of leaking a struct ser_cardstate at every call of this
> driver's tty close() operation. So using container_of() has the
> additional benefit of actually working.
> 
> Reported-by: Dmitry Vyukov 
> Tested-by: Dmitry Vyukov 
> Signed-off-by: Paul Bolle 
> ---
>  drivers/isdn/gigaset/ser-gigaset.c | 9 +
>  1 file changed, 1 insertion(+), 8 deletions(-)




This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
for how to do this properly.




Re: [PATCH V2 net-next 2/3] lan78xx: add ethtool set & get pause functions

2016-02-18 Thread Ben Hutchings
On Thu, 2016-02-18 at 22:40 +, woojung@microchip.com wrote:
> Add ethtool operations of set_pauseparam and get_pauseparam.
[...]
> +static void lan78xx_get_pause(struct net_device *net,
> +   struct ethtool_pauseparam *pause)
> +{
> + struct lan78xx_net *dev = netdev_priv(net);
> + struct phy_device *phydev = net->phydev;
> + struct ethtool_cmd ecmd = { .cmd = ETHTOOL_GSET };
> +
> + phy_ethtool_gset(phydev, &ecmd);
> +
> + pause->autoneg = dev->fc_autoneg;
> +
> + if (dev->fc_autoneg) {
> + if (dev->fc_autoneg_control & FLOW_CTRL_TX)
> + pause->tx_pause = 1;
> +
> + if (dev->fc_autoneg_control & FLOW_CTRL_RX)
> + pause->rx_pause = 1;

This is incorrect; you should always return the manual settings
(fc_request_control flags) here.  If autonegotiation is enabled then
your get_settings function will return the actual pause flags.

Ben.


> + } else {
> + if (dev->fc_request_control & FLOW_CTRL_TX)
> + pause->tx_pause = 1;
> +
> + if (dev->fc_request_control & FLOW_CTRL_RX)
> + pause->rx_pause = 1;
> + }
> +}
[...]

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

signature.asc
Description: This is a digitally signed message part


linux-next: manual merge of the net-next tree with the net tree

2016-02-18 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  drivers/net/phy/marvell.c

between commit:

  79be1a1c9090 ("phy: marvell: Fix and unify reg-init behavior")

from the net tree and commit:

  930b37ee8d84 ("net: phy: Add SGMII support for Marvell 
88E1510/1512/1514/1518")

from the net-next tree.

OK, I didn't know how to resolve this, so I just used the net-next
tree version (which is probably wrong, but will build).

-- 
Cheers,
Stephen Rothwell


Re: [RFC][PATCH 00/10] Add trace event support to eBPF

2016-02-18 Thread Daniel Borkmann

On 02/18/2016 10:27 PM, Tom Zanussi wrote:

On Tue, 2016-02-16 at 20:51 -0800, Alexei Starovoitov wrote:

On Tue, Feb 16, 2016 at 04:35:27PM -0600, Tom Zanussi wrote:

On Sun, 2016-02-14 at 01:02 +0100, Alexei Starovoitov wrote:

[...]

Take a look at all the tools written on top of it:
https://github.com/iovisor/bcc/tree/master/tools


That's great, but it's all out-of-tree.  Supporting out-of-tree users
has never been justification for merging in-kernel code (or for blocking
it from being merged).


huh? perf is the only in-tree user space project.
All others tools and libraries are out-of-tree and that makes sense.


What about all the other things under tools/?


Actually would be great to merge bcc with perf eventually, but choice
of C++ isn't going to make it easy. The only real difference
between perf+bpf and bcc is that bcc integrates clang/llvm
as a library whereas perf+bpf deals with elf files and standalone compiler.
There are pros and cons for both and it's great that both are actively
growing and gaining user traction.


Why worry about merging bcc with perf?  Why not a tools/bcc?


It would indeed be great to mid-term have bcc internals to some degree
merged(/rewritten) into perf. tools/bcc doesn't make much sense to me
as it really should be perf, where already the rest of the eBPF front-end
logic resides that Wang et al initially integrated. So efforts could
consolidate from that side.

The user could be given a choice whether his use-case is to load the
object file, or directly pass in a C file where perf does the rest. And
f.e. Brendan's tools could ship natively as perf "built-ins" that users
can go and try out from there directly and/or use the code as a starting
point for their own proglets.

Cheers,
Daniel


[PATCH V2 net-next 3/3] MAINTAINERS: Add LAN78XX entry

2016-02-18 Thread Woojung.Huh
Add maintainers for Microchip LAN78XX.
unglinuxdri...@microchip.com is an alias email address that goes to the
current developer(s) working on Microchip network-related products.

Signed-off-by: Woojung Huh 
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 355e1c8..5593d18 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11343,6 +11343,13 @@ S: Maintained
 F: drivers/usb/host/isp116x*
 F: include/linux/usb/isp116x.h
 
+USB LAN78XX ETHERNET DRIVER
+M: Woojung Huh 
+M: Microchip Linux Driver Support 
+L: netdev@vger.kernel.org
+S: Maintained
+F: drivers/net/usb/lan78xx.*
+
 USB MASS STORAGE DRIVER
 M: Matthew Dharm 
 L: linux-...@vger.kernel.org
-- 
2.7.0


[PATCH V2 net-next 2/3] lan78xx: add ethtool set & get pause functions

2016-02-18 Thread Woojung.Huh
Add ethtool operations set_pauseparam and get_pauseparam.

Signed-off-by: Woojung Huh 
---
 drivers/net/usb/lan78xx.c | 84 +--
 1 file changed, 81 insertions(+), 3 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 4ec25e8..6bcb312 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -36,7 +36,7 @@
 #define DRIVER_AUTHOR  "WOOJUNG HUH "
 #define DRIVER_DESC"LAN78XX USB 3.0 Gigabit Ethernet Devices"
 #define DRIVER_NAME"lan78xx"
-#define DRIVER_VERSION "1.0.2"
+#define DRIVER_VERSION "1.0.3"
 
 #define TX_TIMEOUT_JIFFIES (5 * HZ)
 #define THROTTLE_JIFFIES   (HZ / 8)
@@ -281,6 +281,10 @@ struct lan78xx_net {
u32 chipid;
u32 chiprev;
struct mii_bus  *mdiobus;
+
+   int fc_autoneg;
+   u8  fc_autoneg_control;
+   u8  fc_request_control;
 };
 
 /* use ethtool to change the level for any given device */
@@ -902,11 +906,17 @@ static int lan78xx_update_flowcontrol(struct lan78xx_net 
*dev, u8 duplex,
 {
u32 flow = 0, fct_flow = 0;
int ret;
+   u8 cap;
 
-   u8 cap = mii_resolve_flowctrl_fdx(lcladv, rmtadv);
+   if (dev->fc_autoneg) {
+   cap = mii_resolve_flowctrl_fdx(lcladv, rmtadv);
+   dev->fc_autoneg_control = cap;
+   } else {
+   cap = dev->fc_request_control;
+   }
 
if (cap & FLOW_CTRL_TX)
-   flow = (FLOW_CR_TX_FCEN_ | 0xFFFF);
+   flow |= (FLOW_CR_TX_FCEN_ | 0xFFFF);
 
if (cap & FLOW_CTRL_RX)
flow |= FLOW_CR_RX_FCEN_;
@@ -1386,6 +1396,70 @@ static int lan78xx_set_settings(struct net_device *net, 
struct ethtool_cmd *cmd)
return ret;
 }
 
+static void lan78xx_get_pause(struct net_device *net,
+ struct ethtool_pauseparam *pause)
+{
+   struct lan78xx_net *dev = netdev_priv(net);
+   struct phy_device *phydev = net->phydev;
+   struct ethtool_cmd ecmd = { .cmd = ETHTOOL_GSET };
+
+   phy_ethtool_gset(phydev, &ecmd);
+
+   pause->autoneg = dev->fc_autoneg;
+
+   if (dev->fc_autoneg) {
+   if (dev->fc_autoneg_control & FLOW_CTRL_TX)
+   pause->tx_pause = 1;
+
+   if (dev->fc_autoneg_control & FLOW_CTRL_RX)
+   pause->rx_pause = 1;
+   } else {
+   if (dev->fc_request_control & FLOW_CTRL_TX)
+   pause->tx_pause = 1;
+
+   if (dev->fc_request_control & FLOW_CTRL_RX)
+   pause->rx_pause = 1;
+   }
+}
+
+static int lan78xx_set_pause(struct net_device *net,
+struct ethtool_pauseparam *pause)
+{
+   struct lan78xx_net *dev = netdev_priv(net);
+   struct phy_device *phydev = net->phydev;
+   struct ethtool_cmd ecmd = { .cmd = ETHTOOL_GSET };
+   int ret;
+
+   phy_ethtool_gset(phydev, &ecmd);
+
+   if (pause->autoneg && !ecmd.autoneg) {
+   ret = -EINVAL;
+   goto exit;
+   }
+
+   dev->fc_request_control = 0;
+   if (pause->rx_pause)
+   dev->fc_request_control |= FLOW_CTRL_RX;
+
+   if (pause->tx_pause)
+   dev->fc_request_control |= FLOW_CTRL_TX;
+
+   if (ecmd.autoneg) {
+   u32 mii_adv;
+
+   ecmd.advertising &= ~(ADVERTISED_Pause | ADVERTISED_Asym_Pause);
+   mii_adv = (u32)mii_advertise_flowctrl(dev->fc_request_control);
+   ecmd.advertising |= mii_adv_to_ethtool_adv_t(mii_adv);
+   phy_ethtool_sset(phydev, &ecmd);
+   }
+
+   dev->fc_autoneg = pause->autoneg;
+
+   ret = 0;
+exit:
+   return ret;
+}
+
 static const struct ethtool_ops lan78xx_ethtool_ops = {
.get_link   = lan78xx_get_link,
.nway_reset = lan78xx_nway_reset,
@@ -1404,6 +1478,8 @@ static const struct ethtool_ops lan78xx_ethtool_ops = {
.set_wol= lan78xx_set_wol,
.get_eee= lan78xx_get_eee,
.set_eee= lan78xx_set_eee,
+   .get_pauseparam = lan78xx_get_pause,
+   .set_pauseparam = lan78xx_set_pause,
 };
 
 static int lan78xx_ioctl(struct net_device *netdev, struct ifreq *rq, int cmd)
@@ -1631,6 +1707,8 @@ static int lan78xx_phy_init(struct lan78xx_net *dev)
  SUPPORTED_Pause | SUPPORTED_Asym_Pause);
genphy_config_aneg(phydev);
 
+   dev->fc_autoneg = phydev->autoneg;
+
phy_start(phydev);
 
netif_dbg(dev, ifup, dev->net, "phy initialised successfully");
-- 
2.7.0
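As background for the patch above: when fc_autoneg is set, the driver feeds the local and remote advertisements through mii_resolve_flowctrl_fdx() to decide tx/rx pause. A standalone sketch of those two helpers follows; the constants mirror the standard MII advertisement bits from include/linux/mii.h, reproduced here purely for illustration, and the function names are simplified stand-ins for the kernel's inline versions.

```c
#include <stdint.h>

/* Illustrative mirrors of the kernel's flow-control bits and helpers
 * (see include/linux/mii.h); not the authoritative definitions. */
#define FLOW_CTRL_TX          0x01
#define FLOW_CTRL_RX          0x02
#define ADVERTISE_PAUSE_CAP   0x0400  /* symmetric pause */
#define ADVERTISE_PAUSE_ASYM  0x0800  /* asymmetric pause */

/* Map a requested rx/tx pause capability to advertisement bits
 * (cf. mii_advertise_flowctrl()). */
static uint32_t advertise_flowctrl(int cap)
{
    uint32_t adv = 0;

    if (cap & FLOW_CTRL_RX)
        adv = ADVERTISE_PAUSE_CAP | ADVERTISE_PAUSE_ASYM;
    if (cap & FLOW_CTRL_TX)
        adv ^= ADVERTISE_PAUSE_ASYM;
    return adv;
}

/* Resolve the negotiated pause modes for a full-duplex link from the
 * local and remote advertisements (cf. mii_resolve_flowctrl_fdx()). */
static int resolve_flowctrl_fdx(uint32_t lcladv, uint32_t rmtadv)
{
    int cap = 0;

    if (lcladv & rmtadv & ADVERTISE_PAUSE_CAP) {
        cap = FLOW_CTRL_TX | FLOW_CTRL_RX;
    } else if (lcladv & rmtadv & ADVERTISE_PAUSE_ASYM) {
        if (lcladv & ADVERTISE_PAUSE_CAP)
            cap = FLOW_CTRL_RX;
        else if (rmtadv & ADVERTISE_PAUSE_CAP)
            cap = FLOW_CTRL_TX;
    }
    return cap;
}
```

Note the asymmetric case: a local side requesting only RX pause against a remote requesting only TX pause resolves to RX pause locally, which is exactly what the fc_autoneg path in lan78xx_update_flowcontrol() then programs into the MAC.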


[PATCH V2 net-next 1/3] lan78xx: replace devid to chipid & chiprev

2016-02-18 Thread Woojung.Huh
Replace devid with chipid & chiprev for easier access.

Signed-off-by: Woojung Huh 
---
 drivers/net/usb/lan78xx.c | 20 +++-
 drivers/net/usb/lan78xx.h |  1 +
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 1c299b8..4ec25e8 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -278,7 +278,8 @@ struct lan78xx_net {
int link_on;
u8  mdix_ctrl;
 
-   u32 devid;
+   u32 chipid;
+   u32 chiprev;
struct mii_bus  *mdiobus;
 };
 
@@ -471,7 +472,7 @@ static int lan78xx_read_raw_eeprom(struct lan78xx_net *dev, 
u32 offset,
 */
ret = lan78xx_read_reg(dev, HW_CFG, );
saved = val;
-   if ((dev->devid & ID_REV_CHIP_ID_MASK_) == 0x7800) {
+   if (dev->chipid == ID_REV_CHIP_ID_7800_) {
val &= ~(HW_CFG_LED1_EN_ | HW_CFG_LED0_EN_);
ret = lan78xx_write_reg(dev, HW_CFG, val);
}
@@ -505,7 +506,7 @@ static int lan78xx_read_raw_eeprom(struct lan78xx_net *dev, 
u32 offset,
 
retval = 0;
 exit:
-   if ((dev->devid & ID_REV_CHIP_ID_MASK_) == 0x7800)
+   if (dev->chipid == ID_REV_CHIP_ID_7800_)
ret = lan78xx_write_reg(dev, HW_CFG, saved);
 
return retval;
@@ -539,7 +540,7 @@ static int lan78xx_write_raw_eeprom(struct lan78xx_net 
*dev, u32 offset,
 */
ret = lan78xx_read_reg(dev, HW_CFG, );
saved = val;
-   if ((dev->devid & ID_REV_CHIP_ID_MASK_) == 0x7800) {
+   if (dev->chipid == ID_REV_CHIP_ID_7800_) {
val &= ~(HW_CFG_LED1_EN_ | HW_CFG_LED0_EN_);
ret = lan78xx_write_reg(dev, HW_CFG, val);
}
@@ -587,7 +588,7 @@ static int lan78xx_write_raw_eeprom(struct lan78xx_net 
*dev, u32 offset,
 
retval = 0;
 exit:
-   if ((dev->devid & ID_REV_CHIP_ID_MASK_) == 0x7800)
+   if (dev->chipid == ID_REV_CHIP_ID_7800_)
ret = lan78xx_write_reg(dev, HW_CFG, saved);
 
return retval;
@@ -1555,9 +1556,9 @@ static int lan78xx_mdio_init(struct lan78xx_net *dev)
snprintf(dev->mdiobus->id, MII_BUS_ID_SIZE, "usb-%03d:%03d",
 dev->udev->bus->busnum, dev->udev->devnum);
 
-   switch (dev->devid & ID_REV_CHIP_ID_MASK_) {
-   case 0x7800:
-   case 0x7850:
+   switch (dev->chipid) {
+   case ID_REV_CHIP_ID_7800_:
+   case ID_REV_CHIP_ID_7850_:
/* set to internal PHY id */
dev->mdiobus->phy_mask = ~(1 << 1);
break;
@@ -1918,7 +1919,8 @@ static int lan78xx_reset(struct lan78xx_net *dev)
 
/* save DEVID for later usage */
ret = lan78xx_read_reg(dev, ID_REV, );
-   dev->devid = buf;
+   dev->chipid = (buf & ID_REV_CHIP_ID_MASK_) >> 16;
+   dev->chiprev = buf & ID_REV_CHIP_REV_MASK_;
 
/* Respond to the IN token with a NAK */
ret = lan78xx_read_reg(dev, USB_CFG0, );
diff --git a/drivers/net/usb/lan78xx.h b/drivers/net/usb/lan78xx.h
index a93fb65..4092790 100644
--- a/drivers/net/usb/lan78xx.h
+++ b/drivers/net/usb/lan78xx.h
@@ -107,6 +107,7 @@
 #define ID_REV_CHIP_ID_MASK_   (0xFFFF0000)
 #define ID_REV_CHIP_REV_MASK_  (0x0000FFFF)
 #define ID_REV_CHIP_ID_7800_   (0x7800)
+#define ID_REV_CHIP_ID_7850_   (0x7850)
 
 #define FPGA_REV   (0x04)
 #define FPGA_REV_MINOR_MASK_   (0xFF00)
-- 
2.7.0
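The ID_REV decoding that patch 1/3 introduces is a simple high/low 16-bit split of the register value; a self-contained sketch (helper names are mine, mask values follow the driver's ID_REV layout with the chip ID in the upper 16 bits):

```c
#include <stdint.h>

/* Illustrative mirror of the chipid/chiprev split done in
 * lan78xx_reset(): chip ID lives in bits 31:16 of ID_REV,
 * chip revision in bits 15:0. */
#define ID_REV_CHIP_ID_MASK_   0xFFFF0000u
#define ID_REV_CHIP_REV_MASK_  0x0000FFFFu

static uint32_t id_rev_chipid(uint32_t id_rev)
{
    return (id_rev & ID_REV_CHIP_ID_MASK_) >> 16;
}

static uint32_t id_rev_chiprev(uint32_t id_rev)
{
    return id_rev & ID_REV_CHIP_REV_MASK_;
}
```

With this split, comparisons like `dev->chipid == ID_REV_CHIP_ID_7800_` in the patch work directly against the 0x7800/0x7850 part numbers instead of masking the raw register on every use.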


[PATCH V2 net-next 0/3] lan78xx: driver update

2016-02-18 Thread Woojung.Huh
This patch series adds new ethtool functions set_pauseparam & get_pauseparam
and a MAINTAINERS entry.

Woojung Huh (3):
  lan78xx: replace devid to chipid & chiprev
  lan78xx: add ethtool set & get pause functions
  MAINTAINERS: Add LAN78XX entry

 MAINTAINERS   |   7 
 drivers/net/usb/lan78xx.c | 104 --
 drivers/net/usb/lan78xx.h |   1 +
 3 files changed, 100 insertions(+), 12 deletions(-)

-- 
2.7.0


RE: linux-next: build failure after merge of the net-next tree

2016-02-18 Thread Yuval Mintz
> > After merging the net-next tree, today's linux-next build (powerpc
> > ppc64_defconfig) failed like this:
> >
> > In file included from drivers/net/ethernet/broadcom/bnx2x/bnx2x.h:56:0,
> >  from drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c:30:
> > drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c: In function
> 'bnx2x_dcbx_get_ap_feature':
> > drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c:224:11: error:
> 'DCBX_APP_SF_DEFAULT' undeclared (first use in this function)
> >DCBX_APP_SF_DEFAULT) &&
> >^
> > Caused by commit
> >
> >   e5d3a51cefbb ("This adds support for default application priority.")
> >
> > This build is big endian.
> 
> Yuval and Ariel, you _MUST_ fix this.
> 
> This build failure has been in the tree for 24 hours and I haven't heard 
> anything
> from you two yet.

Hi Dave,

Perhaps I wasn't clear enough in the title I've provided for the fixing patch,
but I've sent it yesterday and it's marked as 'under review' for net-next.
The patch is "bnx2x: Add missing HSI for big-endian machines".

Sorry about all the noise.




Re: [PATCH v7 5/8] time: Add history to cross timestamp interface supporting slower devices

2016-02-18 Thread Richard Cochran
On Fri, Feb 12, 2016 at 12:25:26PM -0800, Christopher S. Hall wrote:
>  /**
>   * get_device_system_crosststamp - Synchronously capture system/device 
> timestamp
> - * @sync_devicetime: Callback to get simultaneous device time and
> + * @get_time_fn: Callback to get simultaneous device time and

Fold this into earlier patch?

>   *   system counter from the device driver
> + * @history_ref: Historical reference point used to interpolate system
> + *   time when counter provided by the driver is before the current interval

KernelDoc says history_ref,

>   * @xtstamp: Receives simultaneously captured system and device time
>   *
>   * Reads a timestamp from a device and correlates it to system time
> @@ -920,6 +1035,7 @@ int get_device_system_crosststamp(int (*get_time_fn)
>  struct system_counterval_t *sys_counterval,
>  void *ctx),
> void *ctx,
> +   struct system_time_snapshot *history_begin,
> struct system_device_crosststamp *xtstamp)

... but parameter is called history_begin.

Thanks,
Richard


Re: [PATCH 1/1] ser_gigaset: use container_of() instead of detour

2016-02-18 Thread Paul Bolle
On Thu, 2016-02-18 at 22:42 +0100, Tilman Schmidt wrote:
> Acked-by: Tilman Schmidt 
> 
> Thanks for cleaning up the mess I left behind.

That's not how I look at it.

Commit 4c5e354a9742 ("ser_gigaset: fix deallocation of platform device
structure") provided a plausible fix for the issue we were trying to fix
two months ago. If the platform code hadn't been broken when you wrote
it - broken for ser_gigaset's corner case, that is - you would have
noticed the lifetime rules of driver_data the hard way. (I suspect I
might have never noticed if syzkaller hadn't insisted in rubbing my nose
in it.)

So chances are you would have come up with something similar to this fix
had commit 25cad69f21f5 ("base/platform: Fix platform drivers with no
probe callback") already landed.

Thanks,


Paul Bolle


Re: [PATCH 1/1] ser_gigaset: use container_of() instead of detour

2016-02-18 Thread Tilman Schmidt
Am 18.02.2016 um 21:29 schrieb Paul Bolle:
> The purpose of gigaset_device_release() is to kfree() the struct
> ser_cardstate that contains our struct device. This is done via a bit of
> a detour. First we make our struct device's driver_data point to the
> container of our struct ser_cardstate (which is a struct cardstate). In
> gigaset_device_release() we then retrieve that driver_data again. And
> after that we finally kfree() the struct ser_cardstate that was saved in
> the struct cardstate.
> 
> All of this can be achieved much easier by using container_of() to get
> from our struct device to its container, struct ser_cardstate. Do so.

You're absolutely right. Very nice!

> Note that at the time the detour was implemented commit b8b2c7d845d5
> ("base/platform: assert that dev_pm_domain callbacks are called
> unconditionally") had just entered the tree. That commit disconnected
> our platform_device and our platform_driver. These were reconnected
> again in v4.5-rc2 through commit 25cad69f21f5 ("base/platform: Fix
> platform drivers with no probe callback"). And one of the consequences
> of that fix was that it broke the detour via driver_data. That's because
> it made __device_release_driver() stop being a NOP for our struct device
> and actually do stuff again. One of the things it now does, is setting
> our driver_data to NULL. That, in turn, makes it impossible for
> gigaset_device_release() to get to our struct cardstate. Which has the
> net effect of leaking a struct ser_cardstate at every call of this
> driver's tty close() operation. So using container_of() has the
> additional benefit of actually working.
> 
> Reported-by: Dmitry Vyukov 
> Tested-by: Dmitry Vyukov 
> Signed-off-by: Paul Bolle 

Acked-by: Tilman Schmidt 

Thanks for cleaning up the mess I left behind.

Tilman

-- 
Tilman Schmidt  E-Mail: til...@imap.cc
Bonn, Germany
"We have flowers and candles to protect us."





Re: [RFC][PATCH 00/10] Add trace event support to eBPF

2016-02-18 Thread Tom Zanussi
On Tue, 2016-02-16 at 20:51 -0800, Alexei Starovoitov wrote:
> On Tue, Feb 16, 2016 at 04:35:27PM -0600, Tom Zanussi wrote:
> > On Sun, 2016-02-14 at 01:02 +0100, Alexei Starovoitov wrote:
> > > On Fri, Feb 12, 2016 at 10:11:18AM -0600, Tom Zanussi wrote:
> 
> > > this hist triggers belong in the kernel. BPF already can do
> > > way more complex aggregation and histograms.
> > 
> > Way more?  I still can't accomplish with eBPF some of the most basic and
> > helpful use cases that I can with hist triggers, such as using
> > stacktraces as hash keys.  And the locking in the eBPF hashmap
> > implementation prevents anything like the function_hist [1] tracer from
> > being implemented on top of it:
> 
> Both statements are not true.

Erm, not exactly...

> In the link from previous email take a look at funccount_example.txt:
> # ./funccount 'vfs_*'
> Tracing... Ctrl-C to end.
> ^C
> ADDR FUNC  COUNT
> 811efe81 vfs_create1
> 811f24a1 vfs_rename1
> 81215191 vfs_fsync_range   2
> 81231df1 vfs_lock_file30
> 811e8dd1 vfs_fstatat 152
> 811e8d71 vfs_fstat   154
> 811e4381 vfs_write   166
> 811e8c71 vfs_getattr_nosec   262
> 811e8d41 vfs_getattr 262
> 811e3221 vfs_open264
> 811e4251 vfs_read470
> Detaching...
> 

When I tried to use it to trace all functions:

  # ./funccount.py '*'

I got about 5 minutes worth of these kinds of error messages, which I
couldn't break out of:

write of "p:kprobes/p_rootfs_mount rootfs_mount" into kprobe_events failed: 
Device or resource busy
open(/sys/kernel/debug/tracing/events/kprobes/p_legacy_pic_int_noop/id): Too 
many open files
open(/sys/kernel/debug/tracing/events/kprobes/p_legacy_pic_irq_pending_noop/id):
 Too many open files
open(/sys/kernel/debug/tracing/events/kprobes/p_legacy_pic_probe/id): Too many 
open files
open(/sys/kernel/debug/tracing/events/kprobes/p_unmask_8259A_irq/id): Too many 
open files
open(/sys/kernel/debug/tracing/events/kprobes/p_enable_8259A_irq/id): Too many 
open files
open(/sys/kernel/debug/tracing/events/kprobes/p_mask_8259A_irq/id): Too many 
open files
open(/sys/kernel/debug/tracing/events/kprobes/p_disable_8259A_irq/id): Too many 
open files
open(/sys/kernel/debug/tracing/events/kprobes/p_i8259A_irq_pending/id): Too 
many open files
...

Which is probably a good thing, since that's a relatively common thing
for someone to try, whereas this subset probably isn't:

  # ./funccount.py '*spin*'

Which on my machine resulted in a hard lockup on all CPUs.  I'm not set
up to grab serial output on that machine, but here's a screenshot of
most of the stacktrace, all I could get:

  http://picpaste.com/bcc-spinstar-hardlock-fobQbcuG.JPG

There's nothing special about that machine and it's running a stock
4.4.0 kernel, so it should be pretty easy to reproduce on anything with
just a BPF-enabled config.  If not, let me know and I'll send more
specific info.

> And this is done without adding new code to the kernel.
> 
> Another example is offwaketime that uses two stack traces as
> part of single key.
> 

Which has the below code in the script itself implementing a stack
walker:

static u64 get_frame(u64 *bp) {
if (*bp) {
// The following stack walker is x86_64 specific
u64 ret = 0;
if (bpf_probe_read(&ret, sizeof(ret), (void *)(*bp+8)))
return 0;
if (bpf_probe_read(bp, sizeof(*bp), (void *)*bp))
*bp = 0;
if (ret < __START_KERNEL_map)
return 0;
return ret;
}
return 0;
}

int waker(struct pt_regs *ctx, struct task_struct *p) {
u32 pid = p->pid;

if (!(FILTER))
return 0;

u64 bp = 0;
struct wokeby_t woke = {};
int depth = 0;
bpf_get_current_comm(&woke.name, sizeof(woke.name));
bp = ctx->bp;

// unrolled loop (MAXWDEPTH):
if (!(woke.ret[depth++] = get_frame(&bp))) goto out;
if (!(woke.ret[depth++] = get_frame(&bp))) goto out;
if (!(woke.ret[depth++] = get_frame(&bp))) goto out;
if (!(woke.ret[depth++] = get_frame(&bp))) goto out;
if (!(woke.ret[depth++] = get_frame(&bp))) goto out;
if (!(woke.ret[depth++] = get_frame(&bp))) goto out;
if (!(woke.ret[depth++] = get_frame(&bp))) goto out;
if (!(woke.ret[depth++] = get_frame(&bp))) goto out;
if (!(woke.ret[depth++] = get_frame(&bp))) goto out;
woke.ret[depth] = get_frame(&bp);

I think one call to this pretty much cancels out any performance gains
you might have made by fastidiously avoiding trace events.

So technically users can use stacktraces as hash keys, but they're
expected to use the bpf_probe_read() kernel primitives to write a stack
walker themselves.  Not exactly what I was hoping for..

> > > Take a look at all the tools written on 

Re: [PATCH v7 6/8] x86: tsc: Always Running Timer (ART) correlated clocksource

2016-02-18 Thread Andy Lutomirski

On 02/12/2016 12:25 PM, Christopher S. Hall wrote:

On modern Intel systems TSC is derived from the new Always Running Timer
(ART). ART can be captured simultaneous to the capture of
audio and network device clocks, allowing a correlation between timebases
to be constructed. Upon capture, the driver converts the captured ART
value to the appropriate system clock using the correlated clocksource
mechanism.

On systems that support ART a new CPUID leaf (0x15) returns parameters
“m” and “n” such that:

TSC_value = (ART_value * m) / n + k [n >= 2]

[k is an offset that can adjusted by a privileged agent. The
IA32_TSC_ADJUST MSR is an example of an interface to adjust k.
See 17.14.4 of the Intel SDM for more details]

Signed-off-by: Christopher S. Hall 
[jstultz: Tweaked to fix build issue, also reworked math for
64bit division on 32bit systems]
Signed-off-by: John Stultz 
---
  arch/x86/include/asm/cpufeature.h |  3 ++-
  arch/x86/include/asm/tsc.h|  2 ++
  arch/x86/kernel/cpu/scattered.c   |  1 +
  arch/x86/kernel/tsc.c | 50 +++
  4 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeature.h 
b/arch/x86/include/asm/cpufeature.h
index 7ad8c94..111b892 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -85,7 +85,7 @@
  #define X86_FEATURE_P4( 3*32+ 7) /* "" P4 */
  #define X86_FEATURE_CONSTANT_TSC ( 3*32+ 8) /* TSC ticks at a constant rate */
  #define X86_FEATURE_UP( 3*32+ 9) /* smp kernel running on up 
*/
-/* free, was #define X86_FEATURE_FXSAVE_LEAK ( 3*32+10) * "" FXSAVE leaks 
FOP/FIP/FOP */
+#define X86_FEATURE_ART(3*32+10) /* Platform has always 
running timer (ART) */
  #define X86_FEATURE_ARCH_PERFMON ( 3*32+11) /* Intel Architectural PerfMon */
  #define X86_FEATURE_PEBS  ( 3*32+12) /* Precise-Event Based Sampling */
  #define X86_FEATURE_BTS   ( 3*32+13) /* Branch Trace Store */
@@ -188,6 +188,7 @@

  #define X86_FEATURE_CPB   ( 7*32+ 2) /* AMD Core Performance 
Boost */
  #define X86_FEATURE_EPB   ( 7*32+ 3) /* IA32_ENERGY_PERF_BIAS 
support */
+#define X86_FEATURE_INVARIANT_TSC (7*32+4) /* Intel Invariant TSC */


How is this related to the rest of the patch?


+/*
+ * Convert ART to TSC given numerator/denominator found in detect_art()
+ */
+struct system_counterval_t convert_art_to_tsc(cycle_t art)
+{
+   u64 tmp, res, rem;
+
+   rem = do_div(art, art_to_tsc_denominator);
+
+   res = art * art_to_tsc_numerator;
+   tmp = rem * art_to_tsc_numerator;
+
+   do_div(tmp, art_to_tsc_denominator);
+   res += tmp;
+
+   return (struct system_counterval_t) {.cs = art_related_clocksource,
+   .cycles = res};


The SDM and the patch description both mention an offset "k".  Shouldn't 
this code at least have a comment about how it deals with the k != 0 case?


--Andy
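The quotient/remainder dance in convert_art_to_tsc() above is just an overflow-safe way to evaluate (art * m) / n: splitting art = q*n + r gives (art*m)/n = q*m + (r*m)/n, so the intermediate products stay small. A userspace sketch of the same arithmetic, with names of my own choosing (and, matching the quoted code, no handling of the offset "k" Andy asks about):

```c
#include <stdint.h>

/* Overflow-safe (art * m) / n, mirroring the split in the patch's
 * convert_art_to_tsc().  Computing art * m directly could overflow
 * 64 bits, so split art into quotient and remainder w.r.t. n first:
 *   (art * m) / n == (art / n) * m + ((art % n) * m) / n
 * The remainder term's product is bounded by (n - 1) * m. */
static uint64_t art_to_tsc(uint64_t art, uint32_t m, uint32_t n)
{
    uint64_t q = art / n;
    uint64_t r = art % n;

    return q * m + (r * m) / n;
}
```

For exactly divisible inputs this matches the naive formula; for others it agrees with the naive 64-bit result whenever that result doesn't overflow, which is the point of the rewrite.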


[PATCHv2] sctp: Fix port hash table size computation

2016-02-18 Thread Neil Horman
Dmitry Vyukov noted recently that the sctp_port_hashtable had an error in
its size computation, observing that the current method never guaranteed
that the hashsize (measured in number of entries) would be a power of two,
which the input hash function for that table requires.  The root cause of
the problem is that two values need to be computed (one, the allocation
order of the storage requires, as passed to __get_free_pages, and two the
number of entries for the hash table).  Both need to be ^2, but for
different reasons, and the existing code is simply computing one order
value, and using it as the basis for both, which is wrong (i.e. it assumes
that ((1 << order) * PAGE_SIZE) / sizeof(struct sctp_bind_hashbucket) is
also a power of two, which need not hold).
Reported-by: Dmitry Vyukov 
CC: Dmitry Vyukov 
CC: Vladislav Yasevich 
CC: "David S. Miller" 

---
Change notes:

v2) Fix type error for num_entries and max_entry_order.  Should have caught
that, sorry Dave
---
 net/sctp/protocol.c | 46 ++
 1 file changed, 38 insertions(+), 8 deletions(-)

diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index ab0d538..1099e99 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -60,6 +60,8 @@
 #include 
 #include 
 
+#define MAX_SCTP_PORT_HASH_ENTRIES (64 * 1024)
+
 /* Global data structures. */
 struct sctp_globals sctp_globals __read_mostly;
 
@@ -1355,6 +1357,8 @@ static __init int sctp_init(void)
unsigned long limit;
int max_share;
int order;
+   int num_entries;
+   int max_entry_order;
 
sock_skb_cb_check_size(sizeof(struct sctp_ulpevent));
 
@@ -1407,14 +1411,24 @@ static __init int sctp_init(void)
 
/* Size and allocate the association hash table.
 * The methodology is similar to that of the tcp hash tables.
+* Though not identical.  Start by getting a goal size
 */
if (totalram_pages >= (128 * 1024))
goal = totalram_pages >> (22 - PAGE_SHIFT);
else
goal = totalram_pages >> (24 - PAGE_SHIFT);
 
-   for (order = 0; (1UL << order) < goal; order++)
-   ;
+   /* Then compute the page order for said goal */
+   order = get_order(goal);
+
+   /* Now compute the required page order for the maximum sized table we
+* want to create
+*/
+   max_entry_order = get_order(MAX_SCTP_PORT_HASH_ENTRIES *
+   sizeof(struct sctp_bind_hashbucket));
+
+   /* Limit the page order by that maximum hash table size */
+   order = min(order, max_entry_order);
 
/* Allocate and initialize the endpoint hash table.  */
sctp_ep_hashsize = 64;
@@ -1430,20 +1444,35 @@ static __init int sctp_init(void)
INIT_HLIST_HEAD(&sctp_ep_hashtable[i].chain);
}
 
-   /* Allocate and initialize the SCTP port hash table.  */
+   /* Allocate and initialize the SCTP port hash table.
+* Note that order is initalized to start at the max sized
+* table we want to support.  If we can't get that many pages
+* reduce the order and try again
+*/
do {
-   sctp_port_hashsize = (1UL << order) * PAGE_SIZE /
-   sizeof(struct sctp_bind_hashbucket);
-   if ((sctp_port_hashsize > (64 * 1024)) && order > 0)
-   continue;
sctp_port_hashtable = (struct sctp_bind_hashbucket *)
__get_free_pages(GFP_KERNEL | __GFP_NOWARN, order);
} while (!sctp_port_hashtable && --order > 0);
+
if (!sctp_port_hashtable) {
pr_err("Failed bind hash alloc\n");
status = -ENOMEM;
goto err_bhash_alloc;
}
+
+   /* Now compute the number of entries that will fit in the
+* port hash space we allocated
+*/
+   num_entries = (1UL << order) * PAGE_SIZE /
+ sizeof(struct sctp_bind_hashbucket);
+
+   /* And finish by rounding it down to the nearest power of two
+* this wastes some memory of course, but its needed because
+* the hash function operates based on the assumption that
+ 
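The changelog's central point is that two distinct quantities must each be a power of two: the page order handed to __get_free_pages(), and the entry count that "hash & (size - 1)" indexing requires. The patch therefore rounds the entry count down independently of the order. A minimal sketch of that rounding, with the helper written out by hand (the kernel itself provides rounddown_pow_of_two() in linux/log2.h):

```c
/* Round n down to the nearest power of two (requires n >= 1).
 * This is the property the sctp port hash needs of its entry
 * count so that "hash & (n - 1)" style indexing stays valid,
 * independent of the page order used for the allocation. */
static unsigned long rounddown_pow2(unsigned long n)
{
    unsigned long p = 1;

    while (p <= n / 2)   /* find the largest power of two <= n */
        p *= 2;
    return p;
}
```

For example, if the allocated pages fit 1000 buckets, the table is sized to 512 entries, wasting some memory, as the patch comment notes, but keeping the mask-based hash correct.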

Re: [PATCH net-next 0/2] qed{,e}: Add vlan filtering offload

2016-02-18 Thread David Miller
From: Yuval Mintz 
Date: Thu, 18 Feb 2016 17:00:38 +0200

> This series adds vlan filtering offload to qede.
> First patch introduces small additional infrastructure needed in
> qed to support it, while second contains the main bulk of driver changes.

Series applied to net-next, thanks.


Re: [PATCH] USB: cdc_subset: only build when one driver is enabled

2016-02-18 Thread David Miller
From: Arnd Bergmann 
Date: Wed, 17 Feb 2016 23:25:11 +0100

> This avoids a harmless randconfig warning I get when USB_NET_CDC_SUBSET
> is enabled, but all of the more specific drivers are not:
> 
> drivers/net/usb/cdc_subset.c:241:2: #warning You need to configure some 
> hardware for this driver
> 
> The current behavior is clearly intentional, giving a warning when
> a user picks a configuration that won't do anything good. The only
> reason for even addressing this is that I'm getting close to
> eliminating all 'randconfig' warnings on ARM, and this came up
> a couple of times.
> 
> My workaround is to not even build the module when none of the
> configurations are enable.
> 
> Alternatively we could simply remove the #warning (nothing wrong
> for compile-testing), turn it into a runtime warning, or
> change the Kconfig options into a menu to hide CONFIG_USB_NET_CDC_SUBSET.
> 
> Signed-off-by: Arnd Bergmann 

Applied, thanks Arnd.


Re: [PATCH] net: phy: dp83848: Fix sysfs naming collision warning

2016-02-18 Thread David Miller
From: "Andrew F. Davis" 
Date: Wed, 17 Feb 2016 18:10:00 -0600

> Files in sysfs are created using the name from the phy_driver struct,
> when two names are the same we may get a duplicate filename warning,
> fix this.
> 
> Reported-by: kernel test robot 
> Signed-off-by: Andrew F. Davis 

Applied, thanks.


Re: [PATCH net-next 0/2] bridge: mdb: add support for extended attributes

2016-02-18 Thread Nikolay Aleksandrov
On 02/18/2016 09:37 PM, David Miller wrote:
> From: Nikolay Aleksandrov 
> Date: Tue, 16 Feb 2016 12:46:52 +0100
> 
>> Note that the reason we can't simply add an attribute after
>> MDBA_MDB_ENTRY_INFO is that current users (e.g. iproute2) walk over
>> the attribute list directly without checking for the attribute type.
> 
> Honestly that sounds like a bug in iproute2 to me...
> 

I agree, but changing this in the kernel would make older iproute2 versions
incompatible with newer kernels, possibly outputting garbage from the
additional attributes. Besides, we would still have to turn
MDBA_MDB_ENTRY_INFO into a nested attribute and insert the struct with a
header, as that's currently the per-mdb-entry attribute.

Alternatively I have a version that uses a request flag in the dump request and
sends back an alternative/"extended" version of the mdbs where every field is
a netlink attribute and is extensible, thus keeping the old format in place
and offering extended attribute support to anyone who requests it.

I just thought this version is a middle ground between the two solutions and
still doesn't break user-space while being extensible.

There're no more holes in the mdb entry struct to reuse.. :-)

Cheers,
 Nik



[PATCHv3 net-next 07/14] nfp: preallocate RX buffers early in .ndo_open

2016-02-18 Thread Jakub Kicinski
We want the .ndo_open() to have following structure:
 - allocate resources;
 - configure HW/FW;
 - enable the device from stack perspective.
Therefore filling RX rings needs to be moved to the beginning
of .ndo_open().

Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 34 +++---
 1 file changed, 11 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index b640e1693377..1e1e0f7ac077 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1664,28 +1664,19 @@ static void nfp_net_clear_config_and_disable(struct 
nfp_net *nn)
  * @nn:  NFP Net device structure
  * @r_vec:   Ring vector to be started
  */
-static int nfp_net_start_vec(struct nfp_net *nn, struct nfp_net_r_vector 
*r_vec)
+static void
+nfp_net_start_vec(struct nfp_net *nn, struct nfp_net_r_vector *r_vec)
 {
unsigned int irq_vec;
-   int err = 0;
 
irq_vec = nn->irq_entries[r_vec->irq_idx].vector;
 
disable_irq(irq_vec);
 
-   err = nfp_net_rx_ring_bufs_alloc(r_vec->nfp_net, r_vec->rx_ring);
-   if (err) {
-   nn_err(nn, "RV%02d: couldn't allocate enough buffers\n",
-  r_vec->irq_idx);
-   goto out;
-   }
nfp_net_rx_ring_fill_freelist(r_vec->rx_ring);
-
napi_enable(&r_vec->napi);
-out:
-   enable_irq(irq_vec);
 
-   return err;
+   enable_irq(irq_vec);
 }
 
 static int nfp_net_netdev_open(struct net_device *netdev)
@@ -1740,6 +1731,10 @@ static int nfp_net_netdev_open(struct net_device *netdev)
err = nfp_net_rx_ring_alloc(nn->r_vecs[r].rx_ring);
if (err)
goto err_free_tx_ring_p;
+
+   err = nfp_net_rx_ring_bufs_alloc(nn, nn->r_vecs[r].rx_ring);
+   if (err)
+   goto err_flush_rx_ring_p;
}
 
err = netif_set_real_num_tx_queues(netdev, nn->num_tx_rings);
@@ -1812,11 +1807,8 @@ static int nfp_net_netdev_open(struct net_device *netdev)
 * - enable all TX queues
 * - set link state
 */
-   for (r = 0; r < nn->num_r_vecs; r++) {
-   err = nfp_net_start_vec(nn, &nn->r_vecs[r]);
-   if (err)
-   goto err_disable_napi;
-   }
+   for (r = 0; r < nn->num_r_vecs; r++)
+   nfp_net_start_vec(nn, &nn->r_vecs[r]);
 
netif_tx_wake_all_queues(netdev);
 
@@ -1825,18 +1817,14 @@ static int nfp_net_netdev_open(struct net_device 
*netdev)
 
return 0;
 
-err_disable_napi:
-   while (r--) {
-   napi_disable(&nn->r_vecs[r].napi);
-   nfp_net_rx_ring_reset(nn->r_vecs[r].rx_ring);
-   nfp_net_rx_ring_bufs_free(nn, nn->r_vecs[r].rx_ring);
-   }
 err_clear_config:
nfp_net_clear_config_and_disable(nn);
 err_free_rings:
r = nn->num_r_vecs;
 err_free_prev_vecs:
while (r--) {
+   nfp_net_rx_ring_bufs_free(nn, nn->r_vecs[r].rx_ring);
+err_flush_rx_ring_p:
nfp_net_rx_ring_free(nn->r_vecs[r].rx_ring);
 err_free_tx_ring_p:
nfp_net_tx_ring_free(nn->r_vecs[r].tx_ring);
-- 
1.9.1
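The error path above uses a label placed inside the unwind loop (err_flush_rx_ring_p), so a failure partway through ring r's setup frees only what ring r had allocated so far, then falls into the normal per-ring teardown for all earlier rings. A small self-contained illustration of that staggered-unwind idiom, with names and allocation sizes of my own invention:

```c
#include <stdlib.h>

/* Illustrative sketch of the staggered goto-unwind pattern used in
 * nfp_net_netdev_open(): each ring allocates two resources (a, b);
 * on failure we jump into the middle of the unwind loop so only the
 * resources actually allocated get freed. */
struct ring { void *a, *b; };

static int open_rings(struct ring *r, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        r[i].a = malloc(64);
        if (!r[i].a)
            goto err_free_prev;      /* ring i allocated nothing yet */
        r[i].b = malloc(64);
        if (!r[i].b)
            goto err_free_a;         /* ring i has only 'a' to free */
    }
    return 0;

err_free_prev:
    while (i--) {
        free(r[i].b);
err_free_a:                          /* entered mid-body for ring i */
        free(r[i].a);
    }
    return -1;
}

/* Exercise the success path and release everything afterwards. */
static int demo(void)
{
    struct ring rs[4] = { { 0 } };
    int i, ret = open_rings(rs, 4);

    for (i = 0; i < 4; i++) {
        free(rs[i].a);
        free(rs[i].b);
    }
    return ret;
}
```

Jumping into the loop body is legal C (no variable-length arrays are involved); the `while (i--)` condition then takes over and unwinds the earlier, fully set-up rings.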



[PATCHv3 net-next 11/14] nfp: propagate list buffer size in struct rx_ring

2016-02-18 Thread Jakub Kicinski
Free list buffer size needs to be propagated to a few functions as a
parameter and added to struct nfp_net_rx_ring, since soon some of the
functions will be reused to manage rings with buffers of a size
different from nn->fl_bufsz.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |  3 +++
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 24 ++
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h 
b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 0a87571a7d9c..1e08c9cf3ee0 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -298,6 +298,8 @@ struct nfp_net_rx_buf {
  * @rxds:   Virtual address of FL/RX ring in host memory
  * @dma:DMA address of the FL/RX ring
  * @size:   Size, in bytes, of the FL/RX ring (needed to free)
+ * @bufsz: Buffer allocation size for convenience of management routines
+ * (NOTE: this is in second cache line, do not use on fast path!)
  */
 struct nfp_net_rx_ring {
struct nfp_net_r_vector *r_vec;
@@ -319,6 +321,7 @@ struct nfp_net_rx_ring {
 
dma_addr_t dma;
unsigned int size;
+   unsigned int bufsz;
 } cacheline_aligned;
 
 /**
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 15d695cd8c44..fd226d2e8606 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -958,25 +958,27 @@ static inline int nfp_net_rx_space(struct nfp_net_rx_ring 
*rx_ring)
  * nfp_net_rx_alloc_one() - Allocate and map skb for RX
  * @rx_ring:   RX ring structure of the skb
  * @dma_addr:  Pointer to storage for DMA address (output param)
+ * @fl_bufsz:  size of freelist buffers
  *
 * This function will allocate a new skb and map it for DMA.
  *
  * Return: allocated skb or NULL on failure.
  */
 static struct sk_buff *
-nfp_net_rx_alloc_one(struct nfp_net_rx_ring *rx_ring, dma_addr_t *dma_addr)
+nfp_net_rx_alloc_one(struct nfp_net_rx_ring *rx_ring, dma_addr_t *dma_addr,
+unsigned int fl_bufsz)
 {
struct nfp_net *nn = rx_ring->r_vec->nfp_net;
struct sk_buff *skb;
 
-   skb = netdev_alloc_skb(nn->netdev, nn->fl_bufsz);
+   skb = netdev_alloc_skb(nn->netdev, fl_bufsz);
if (!skb) {
nn_warn_ratelimit(nn, "Failed to alloc receive SKB\n");
return NULL;
}
 
*dma_addr = dma_map_single(&nn->pdev->dev, skb->data,
- nn->fl_bufsz, DMA_FROM_DEVICE);
+  fl_bufsz, DMA_FROM_DEVICE);
if (dma_mapping_error(&nn->pdev->dev, *dma_addr)) {
dev_kfree_skb_any(skb);
nn_warn_ratelimit(nn, "Failed to map DMA RX buffer\n");
@@ -1069,7 +1071,7 @@ nfp_net_rx_ring_bufs_free(struct nfp_net *nn, struct 
nfp_net_rx_ring *rx_ring)
continue;
 
dma_unmap_single(&pdev->dev, rx_ring->rxbufs[i].dma_addr,
-nn->fl_bufsz, DMA_FROM_DEVICE);
+rx_ring->bufsz, DMA_FROM_DEVICE);
dev_kfree_skb_any(rx_ring->rxbufs[i].skb);
rx_ring->rxbufs[i].dma_addr = 0;
rx_ring->rxbufs[i].skb = NULL;
@@ -1091,7 +1093,8 @@ nfp_net_rx_ring_bufs_alloc(struct nfp_net *nn, struct 
nfp_net_rx_ring *rx_ring)
 
for (i = 0; i < rx_ring->cnt - 1; i++) {
rxbufs[i].skb =
-   nfp_net_rx_alloc_one(rx_ring, &rxbufs[i].dma_addr);
+   nfp_net_rx_alloc_one(rx_ring, &rxbufs[i].dma_addr,
+rx_ring->bufsz);
if (!rxbufs[i].skb) {
nfp_net_rx_ring_bufs_free(nn, rx_ring);
return -ENOMEM;
@@ -1279,7 +1282,8 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, 
int budget)
 
skb = rx_ring->rxbufs[idx].skb;
 
-   new_skb = nfp_net_rx_alloc_one(rx_ring, &new_dma_addr);
+   new_skb = nfp_net_rx_alloc_one(rx_ring, &new_dma_addr,
+  nn->fl_bufsz);
if (!new_skb) {
nfp_net_rx_give_one(rx_ring, rx_ring->rxbufs[idx].skb,
rx_ring->rxbufs[idx].dma_addr);
@@ -1463,10 +1467,12 @@ static void nfp_net_rx_ring_free(struct nfp_net_rx_ring 
*rx_ring)
 /**
  * nfp_net_rx_ring_alloc() - Allocate resource for a RX ring
  * @rx_ring:  RX ring to allocate
+ * @fl_bufsz: Size of buffers to allocate
  *
  * Return: 0 on success, negative errno otherwise.
  */
-static int nfp_net_rx_ring_alloc(struct nfp_net_rx_ring *rx_ring)
+static int
+nfp_net_rx_ring_alloc(struct nfp_net_rx_ring *rx_ring, unsigned int fl_bufsz)
 {
struct nfp_net_r_vector *r_vec = 

[PATCHv3 net-next 05/14] nfp: cleanup tx ring flush and rename to reset

2016-02-18 Thread Jakub Kicinski
Since flush was never used without freeing the ring afterwards,
the functionality of the two operations became mixed.
Rename flush to ring reset and move into it all the things
which have to be done after the FW ring state is cleared.
While at it, do some clean-ups.
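The reset described above boils down to: walk from rd_p to wr_p freeing anything still in flight, then zero the descriptors and rewind all pointers. A toy model of that shape (illustrative names, not the driver code):

```c
#include <assert.h>
#include <string.h>

#define RING_CNT 8

/* Toy TX ring: reset frees whatever is still queued and rewinds all
 * pointers, so the ring can be handed back to the FW from scratch. */
struct tx_ring {
	int bufs[RING_CNT];	/* non-zero = buffer still in flight */
	unsigned int rd_p;	/* read (completion) pointer */
	unsigned int wr_p;	/* write (producer) pointer */
};

static void tx_ring_reset(struct tx_ring *t)
{
	/* free everything between the completion and producer pointers */
	while (t->rd_p != t->wr_p) {
		unsigned int idx = t->rd_p % RING_CNT;

		t->bufs[idx] = 0;	/* stands in for unmap + kfree_skb */
		t->rd_p++;
	}
	/* then wipe descriptors and rewind, as the renamed helper does */
	memset(t->bufs, 0, sizeof(t->bufs));
	t->rd_p = 0;
	t->wr_p = 0;
}
```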

Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 81 ++
 1 file changed, 37 insertions(+), 44 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index faaa25dd5a1e..cc8b06651f57 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -868,61 +868,59 @@ static void nfp_net_tx_complete(struct nfp_net_tx_ring 
*tx_ring)
 }
 
 /**
- * nfp_net_tx_flush() - Free any untransmitted buffers currently on the TX ring
- * @tx_ring: TX ring structure
+ * nfp_net_tx_ring_reset() - Free any untransmitted buffers and reset pointers
+ * @nn:NFP Net device
+ * @tx_ring:   TX ring structure
  *
  * Assumes that the device is stopped
  */
-static void nfp_net_tx_flush(struct nfp_net_tx_ring *tx_ring)
+static void
+nfp_net_tx_ring_reset(struct nfp_net *nn, struct nfp_net_tx_ring *tx_ring)
 {
-   struct nfp_net_r_vector *r_vec = tx_ring->r_vec;
-   struct nfp_net *nn = r_vec->nfp_net;
-   struct pci_dev *pdev = nn->pdev;
const struct skb_frag_struct *frag;
struct netdev_queue *nd_q;
-   struct sk_buff *skb;
-   int nr_frags;
-   int fidx;
-   int idx;
+   struct pci_dev *pdev = nn->pdev;
 
while (tx_ring->rd_p != tx_ring->wr_p) {
-   idx = tx_ring->rd_p % tx_ring->cnt;
+   int nr_frags, fidx, idx;
+   struct sk_buff *skb;
 
+   idx = tx_ring->rd_p % tx_ring->cnt;
skb = tx_ring->txbufs[idx].skb;
-   if (skb) {
-   nr_frags = skb_shinfo(skb)->nr_frags;
-   fidx = tx_ring->txbufs[idx].fidx;
-
-   if (fidx == -1) {
-   /* unmap head */
-   dma_unmap_single(&pdev->dev,
-tx_ring->txbufs[idx].dma_addr,
-skb_headlen(skb),
-DMA_TO_DEVICE);
-   } else {
-   /* unmap fragment */
-   frag = &skb_shinfo(skb)->frags[fidx];
-   dma_unmap_page(&pdev->dev,
-  tx_ring->txbufs[idx].dma_addr,
-  skb_frag_size(frag),
-  DMA_TO_DEVICE);
-   }
-
-   /* check for last gather fragment */
-   if (fidx == nr_frags - 1)
-   dev_kfree_skb_any(skb);
-
-   tx_ring->txbufs[idx].dma_addr = 0;
-   tx_ring->txbufs[idx].skb = NULL;
-   tx_ring->txbufs[idx].fidx = -2;
+   nr_frags = skb_shinfo(skb)->nr_frags;
+   fidx = tx_ring->txbufs[idx].fidx;
+
+   if (fidx == -1) {
+   /* unmap head */
+   dma_unmap_single(&pdev->dev,
+tx_ring->txbufs[idx].dma_addr,
+skb_headlen(skb), DMA_TO_DEVICE);
+   } else {
+   /* unmap fragment */
+   frag = &skb_shinfo(skb)->frags[fidx];
+   dma_unmap_page(&pdev->dev,
+  tx_ring->txbufs[idx].dma_addr,
+  skb_frag_size(frag), DMA_TO_DEVICE);
}
 
-   memset(&tx_ring->txds[idx], 0, sizeof(tx_ring->txds[idx]));
+   /* check for last gather fragment */
+   if (fidx == nr_frags - 1)
+   dev_kfree_skb_any(skb);
+
+   tx_ring->txbufs[idx].dma_addr = 0;
+   tx_ring->txbufs[idx].skb = NULL;
+   tx_ring->txbufs[idx].fidx = -2;
 
tx_ring->qcp_rd_p++;
tx_ring->rd_p++;
}
 
+   memset(tx_ring->txds, 0, sizeof(*tx_ring->txds) * tx_ring->cnt);
+   tx_ring->wr_p = 0;
+   tx_ring->rd_p = 0;
+   tx_ring->qcp_rd_p = 0;
+   tx_ring->wr_ptr_add = 0;
+
nd_q = netdev_get_tx_queue(nn->netdev, tx_ring->idx);
netdev_tx_reset_queue(nd_q);
 }
@@ -1360,11 +1358,6 @@ static void nfp_net_tx_ring_free(struct nfp_net_tx_ring 
*tx_ring)
  tx_ring->txds, tx_ring->dma);
 
tx_ring->cnt = 0;
-   tx_ring->wr_p = 0;
-   tx_ring->rd_p = 0;
-   tx_ring->qcp_rd_p = 0;
-   tx_ring->wr_ptr_add = 0;
-

[PATCHv3 net-next 12/14] nfp: convert .ndo_change_mtu() to prepare/commit paradigm

2016-02-18 Thread Jakub Kicinski
When changing the MTU on a running device, first allocate new rings
and buffers, and only once that succeeds proceed with changing the MTU.

Allocation of new rings is not really necessary for this
operation - it's done to keep the code simple and because the
size of the extra ring memory is quite small compared to
the size of the buffers.

The operation can still fail midway through if FW communication
times out.  In that case we retry with the old rings; if the failure
persists there is little we can do - we just free all resources
and leave the device in a fully closed state.
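The prepare/commit/rollback flow described in the message can be modelled in a few lines. The helpers below are hypothetical stand-ins, not the driver's actual functions:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the MTU-change flow: prepare new resources first,
 * commit, and fall back to the old configuration on failure. */
struct rings { int mtu; };

static struct rings ring_storage[2];
static int ring_slot;

static struct rings *rings_prepare(int mtu)
{
	struct rings *r;

	if (mtu <= 0)
		return NULL;	/* allocation failed: device untouched */
	r = &ring_storage[ring_slot++ & 1];
	r->mtu = mtu;
	return r;
}

/* fw_enable_fails lets the test model a FW reconfig timeout */
static int set_config_and_enable(struct rings *active, int fw_enable_fails)
{
	(void)active;
	return fw_enable_fails ? -1 : 0;
}

static int change_mtu(struct rings **active, int new_mtu, int fw_enable_fails)
{
	struct rings *old, *fresh;

	fresh = rings_prepare(new_mtu);
	if (!fresh)
		return -1;	/* nothing was stopped or swapped yet */

	old = *active;
	*active = fresh;	/* commit the new rings */
	if (set_config_and_enable(*active, fw_enable_fails) == 0)
		return 0;

	*active = old;		/* roll back to the old rings and MTU */
	return set_config_and_enable(*active, 0);
}
```

Note the key property: the only fallible step before the swap is the allocation, which leaves the device untouched on failure.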

Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 113 +++--
 1 file changed, 105 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index fd226d2e8606..0153fce33dff 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1504,6 +1504,64 @@ err_alloc:
return -ENOMEM;
 }
 
+static struct nfp_net_rx_ring *
+nfp_net_shadow_rx_rings_prepare(struct nfp_net *nn, unsigned int fl_bufsz)
+{
+   struct nfp_net_rx_ring *rings;
+   unsigned int r;
+
+   rings = kcalloc(nn->num_rx_rings, sizeof(*rings), GFP_KERNEL);
+   if (!rings)
+   return NULL;
+
+   for (r = 0; r < nn->num_rx_rings; r++) {
+   nfp_net_rx_ring_init(&rings[r], nn->rx_rings[r].r_vec, r);
+
+   if (nfp_net_rx_ring_alloc(&rings[r], fl_bufsz))
+   goto err_free_prev;
+
+   if (nfp_net_rx_ring_bufs_alloc(nn, &rings[r]))
+   goto err_free_ring;
+   }
+
+   return rings;
+
+err_free_prev:
+   while (r--) {
+   nfp_net_rx_ring_bufs_free(nn, &rings[r]);
+err_free_ring:
+   nfp_net_rx_ring_free(&rings[r]);
+   }
+   kfree(rings);
+   return NULL;
+}
+
+static struct nfp_net_rx_ring *
+nfp_net_shadow_rx_rings_swap(struct nfp_net *nn, struct nfp_net_rx_ring *rings)
+{
+   struct nfp_net_rx_ring *old = nn->rx_rings;
+   unsigned int r;
+
+   for (r = 0; r < nn->num_rx_rings; r++)
+   old[r].r_vec->rx_ring = &rings[r];
+
+   nn->rx_rings = rings;
+   return old;
+}
+
+static void
+nfp_net_shadow_rx_rings_free(struct nfp_net *nn, struct nfp_net_rx_ring *rings)
+{
+   unsigned int r;
+
+   for (r = 0; r < nn->num_r_vecs; r++) {
+   nfp_net_rx_ring_bufs_free(nn, &rings[r]);
+   nfp_net_rx_ring_free(&rings[r]);
+   }
+
+   kfree(rings);
+}
+
 static int
 nfp_net_prepare_vector(struct nfp_net *nn, struct nfp_net_r_vector *r_vec,
   int idx)
@@ -1977,25 +2035,64 @@ static void nfp_net_set_rx_mode(struct net_device 
*netdev)
 
 static int nfp_net_change_mtu(struct net_device *netdev, int new_mtu)
 {
+   unsigned int old_mtu, old_fl_bufsz, new_fl_bufsz;
struct nfp_net *nn = netdev_priv(netdev);
-   int ret = 0;
+   struct nfp_net_rx_ring *tmp_rings;
+   int err, err2;
 
if (new_mtu < 68 || new_mtu > nn->max_mtu) {
nn_err(nn, "New MTU (%d) is not valid\n", new_mtu);
return -EINVAL;
}
 
-   if (netif_running(netdev))
-   nfp_net_netdev_close(netdev);
+   old_mtu = netdev->mtu;
+   old_fl_bufsz = nn->fl_bufsz;
+   new_fl_bufsz = NFP_NET_MAX_PREPEND + ETH_HLEN + VLAN_HLEN * 2 +
+   MPLS_HLEN * 8 + new_mtu;
+
+   if (!(nn->ctrl & NFP_NET_CFG_CTRL_ENABLE)) {
+   netdev->mtu = new_mtu;
+   nn->fl_bufsz = new_fl_bufsz;
+   return 0;
+   }
+
+   /* Prepare new rings */
+   tmp_rings = nfp_net_shadow_rx_rings_prepare(nn, new_fl_bufsz);
+   if (!tmp_rings)
+   return -ENOMEM;
+
+   /* Stop device, swap in new rings, try to start the device */
+   nfp_net_close_stack(nn);
+   nfp_net_clear_config_and_disable(nn);
+
+   tmp_rings = nfp_net_shadow_rx_rings_swap(nn, tmp_rings);
 
netdev->mtu = new_mtu;
-   nn->fl_bufsz = NFP_NET_MAX_PREPEND + ETH_HLEN + VLAN_HLEN * 2 +
-   MPLS_HLEN * 8 + new_mtu;
+   nn->fl_bufsz = new_fl_bufsz;
+
+   err = nfp_net_set_config_and_enable(nn);
+   if (err) {
+   /* Try with old configuration and old rings */
+   tmp_rings = nfp_net_shadow_rx_rings_swap(nn, tmp_rings);
+
+   netdev->mtu = old_mtu;
+   nn->fl_bufsz = old_fl_bufsz;
+
+   err2 = nfp_net_set_config_and_enable(nn);
+   if (err2) {
+   nn_err(nn, "Can't restore MTU - FW communication failed (%d,%d)\n",
+  err, err2);
+   nfp_net_shadow_rx_rings_free(nn, tmp_rings);
+   nfp_net_close_free_all(nn);
+   return err2;
+   }
+   }
 
-   if (netif_running(netdev))
-   ret = 

[PATCHv3 net-next 10/14] nfp: sync ring state during FW reconfiguration

2016-02-18 Thread Jakub Kicinski
FW reconfiguration in .ndo_open()/.ndo_stop() should reset/
restore queue state.  Since we need IRQs to be disabled when
filling rings on the RX path, we have to move disable_irq() from
.ndo_open() all the way up to IRQ allocation.

Since nfp_net_start_vec() becomes trivial now, it can be
inlined.

Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 45 --
 1 file changed, 16 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 4ce17cb95e6f..15d695cd8c44 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1517,6 +1517,7 @@ nfp_net_prepare_vector(struct nfp_net *nn, struct 
nfp_net_r_vector *r_vec,
nn_err(nn, "Error requesting IRQ %d\n", entry->vector);
return err;
}
+   disable_irq(entry->vector);
 
/* Setup NAPI */
netif_napi_add(nn->netdev, &r_vec->napi,
@@ -1645,13 +1646,14 @@ static void nfp_net_clear_config_and_disable(struct 
nfp_net *nn)
 
nn_writel(nn, NFP_NET_CFG_CTRL, new_ctrl);
err = nfp_net_reconfig(nn, update);
-   if (err) {
+   if (err)
nn_err(nn, "Could not disable device: %d\n", err);
-   return;
-   }
 
-   for (r = 0; r < nn->num_r_vecs; r++)
+   for (r = 0; r < nn->num_r_vecs; r++) {
+   nfp_net_rx_ring_reset(nn->r_vecs[r].rx_ring);
+   nfp_net_tx_ring_reset(nn, nn->r_vecs[r].tx_ring);
nfp_net_vec_clear_ring_data(nn, r);
+   }
 
nn->ctrl = new_ctrl;
 }
@@ -1725,6 +1727,9 @@ static int nfp_net_set_config_and_enable(struct nfp_net 
*nn)
 
nn->ctrl = new_ctrl;
 
+   for (r = 0; r < nn->num_r_vecs; r++)
+   nfp_net_rx_ring_fill_freelist(nn->r_vecs[r].rx_ring);
+
/* Since reconfiguration requests while NFP is down are ignored we
 * have to wipe the entire VXLAN configuration and reinitialize it.
 */
@@ -1742,26 +1747,6 @@ err_clear_config:
 }
 
 /**
- * nfp_net_start_vec() - Start ring vector
- * @nn:  NFP Net device structure
- * @r_vec:   Ring vector to be started
- */
-static void
-nfp_net_start_vec(struct nfp_net *nn, struct nfp_net_r_vector *r_vec)
-{
-   unsigned int irq_vec;
-
-   irq_vec = nn->irq_entries[r_vec->irq_idx].vector;
-
-   disable_irq(irq_vec);
-
-   nfp_net_rx_ring_fill_freelist(r_vec->rx_ring);
-   napi_enable(&r_vec->napi);
-
-   enable_irq(irq_vec);
-}
-
-/**
  * nfp_net_open_stack() - Start the device from stack's perspective
  * @nn:  NFP Net device to reconfigure
  */
@@ -1769,8 +1754,10 @@ static void nfp_net_open_stack(struct nfp_net *nn)
 {
unsigned int r;
 
-   for (r = 0; r < nn->num_r_vecs; r++)
-   nfp_net_start_vec(nn, &nn->r_vecs[r]);
+   for (r = 0; r < nn->num_r_vecs; r++) {
+   napi_enable(&nn->r_vecs[r].napi);
+   enable_irq(nn->irq_entries[nn->r_vecs[r].irq_idx].vector);
+   }
 
netif_tx_wake_all_queues(nn->netdev);
 
@@ -1895,8 +1882,10 @@ static void nfp_net_close_stack(struct nfp_net *nn)
netif_carrier_off(nn->netdev);
nn->link_up = false;
 
-   for (r = 0; r < nn->num_r_vecs; r++)
+   for (r = 0; r < nn->num_r_vecs; r++) {
+   disable_irq(nn->irq_entries[nn->r_vecs[r].irq_idx].vector);
napi_disable(&nn->r_vecs[r].napi);
+   }
 
netif_tx_disable(nn->netdev);
 }
@@ -1910,9 +1899,7 @@ static void nfp_net_close_free_all(struct nfp_net *nn)
unsigned int r;
 
for (r = 0; r < nn->num_r_vecs; r++) {
-   nfp_net_rx_ring_reset(nn->r_vecs[r].rx_ring);
nfp_net_rx_ring_bufs_free(nn, nn->r_vecs[r].rx_ring);
-   nfp_net_tx_ring_reset(nn, nn->r_vecs[r].tx_ring);
nfp_net_rx_ring_free(nn->r_vecs[r].rx_ring);
nfp_net_tx_ring_free(nn->r_vecs[r].tx_ring);
nfp_net_cleanup_vector(nn, &nn->r_vecs[r]);
-- 
1.9.1



[PATCHv3 net-next 08/14] nfp: move filling ring information to FW config

2016-02-18 Thread Jakub Kicinski
nfp_net_[rt]x_ring_{alloc,free} should only allocate or free
ring resources without touching the device.  Move setting
parameters in the BAR to separate functions.  This will make
it possible to reuse alloc/free functions to allocate new
rings while the device is running.

Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 50 ++
 1 file changed, 32 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 1e1e0f7ac077..34f933f19059 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1385,10 +1385,6 @@ static void nfp_net_tx_ring_free(struct nfp_net_tx_ring 
*tx_ring)
struct nfp_net *nn = r_vec->nfp_net;
struct pci_dev *pdev = nn->pdev;
 
-   nn_writeq(nn, NFP_NET_CFG_TXR_ADDR(tx_ring->idx), 0);
-   nn_writeb(nn, NFP_NET_CFG_TXR_SZ(tx_ring->idx), 0);
-   nn_writeb(nn, NFP_NET_CFG_TXR_VEC(tx_ring->idx), 0);
-
kfree(tx_ring->txbufs);
 
if (tx_ring->txds)
@@ -1428,11 +1424,6 @@ static int nfp_net_tx_ring_alloc(struct nfp_net_tx_ring 
*tx_ring)
if (!tx_ring->txbufs)
goto err_alloc;
 
-   /* Write the DMA address, size and MSI-X info to the device */
-   nn_writeq(nn, NFP_NET_CFG_TXR_ADDR(tx_ring->idx), tx_ring->dma);
-   nn_writeb(nn, NFP_NET_CFG_TXR_SZ(tx_ring->idx), ilog2(tx_ring->cnt));
-   nn_writeb(nn, NFP_NET_CFG_TXR_VEC(tx_ring->idx), r_vec->irq_idx);
-
netif_set_xps_queue(nn->netdev, &r_vec->affinity_mask, tx_ring->idx);
 
nn_dbg(nn, "TxQ%02d: QCidx=%02d cnt=%d dma=%#llx host=%p\n",
@@ -1456,10 +1447,6 @@ static void nfp_net_rx_ring_free(struct nfp_net_rx_ring 
*rx_ring)
struct nfp_net *nn = r_vec->nfp_net;
struct pci_dev *pdev = nn->pdev;
 
-   nn_writeq(nn, NFP_NET_CFG_RXR_ADDR(rx_ring->idx), 0);
-   nn_writeb(nn, NFP_NET_CFG_RXR_SZ(rx_ring->idx), 0);
-   nn_writeb(nn, NFP_NET_CFG_RXR_VEC(rx_ring->idx), 0);
-
kfree(rx_ring->rxbufs);
 
if (rx_ring->rxds)
@@ -1499,11 +1486,6 @@ static int nfp_net_rx_ring_alloc(struct nfp_net_rx_ring 
*rx_ring)
if (!rx_ring->rxbufs)
goto err_alloc;
 
-   /* Write the DMA address, size and MSI-X info to the device */
-   nn_writeq(nn, NFP_NET_CFG_RXR_ADDR(rx_ring->idx), rx_ring->dma);
-   nn_writeb(nn, NFP_NET_CFG_RXR_SZ(rx_ring->idx), ilog2(rx_ring->cnt));
-   nn_writeb(nn, NFP_NET_CFG_RXR_VEC(rx_ring->idx), r_vec->irq_idx);
-
nn_dbg(nn, "RxQ%02d: FlQCidx=%02d RxQCidx=%02d cnt=%d dma=%#llx host=%p\n",
   rx_ring->idx, rx_ring->fl_qcidx, rx_ring->rx_qcidx,
   rx_ring->cnt, (unsigned long long)rx_ring->dma, rx_ring->rxds);
@@ -1628,6 +1610,17 @@ static void nfp_net_write_mac_addr(struct nfp_net *nn, 
const u8 *mac)
  get_unaligned_be16(nn->netdev->dev_addr + 4) << 16);
 }
 
+static void nfp_net_vec_clear_ring_data(struct nfp_net *nn, unsigned int idx)
+{
+   nn_writeq(nn, NFP_NET_CFG_RXR_ADDR(idx), 0);
+   nn_writeb(nn, NFP_NET_CFG_RXR_SZ(idx), 0);
+   nn_writeb(nn, NFP_NET_CFG_RXR_VEC(idx), 0);
+
+   nn_writeq(nn, NFP_NET_CFG_TXR_ADDR(idx), 0);
+   nn_writeb(nn, NFP_NET_CFG_TXR_SZ(idx), 0);
+   nn_writeb(nn, NFP_NET_CFG_TXR_VEC(idx), 0);
+}
+
 /**
  * nfp_net_clear_config_and_disable() - Clear control BAR and disable NFP
  * @nn:  NFP Net device to reconfigure
@@ -1635,6 +1628,7 @@ static void nfp_net_write_mac_addr(struct nfp_net *nn, 
const u8 *mac)
 static void nfp_net_clear_config_and_disable(struct nfp_net *nn)
 {
u32 new_ctrl, update;
+   unsigned int r;
int err;
 
new_ctrl = nn->ctrl;
@@ -1656,9 +1650,26 @@ static void nfp_net_clear_config_and_disable(struct 
nfp_net *nn)
return;
}
 
+   for (r = 0; r < nn->num_r_vecs; r++)
+   nfp_net_vec_clear_ring_data(nn, r);
+
nn->ctrl = new_ctrl;
 }
 
+static void
+nfp_net_vec_write_ring_data(struct nfp_net *nn, struct nfp_net_r_vector *r_vec,
+   unsigned int idx)
+{
+   /* Write the DMA address, size and MSI-X info to the device */
+   nn_writeq(nn, NFP_NET_CFG_RXR_ADDR(idx), r_vec->rx_ring->dma);
+   nn_writeb(nn, NFP_NET_CFG_RXR_SZ(idx), ilog2(r_vec->rx_ring->cnt));
+   nn_writeb(nn, NFP_NET_CFG_RXR_VEC(idx), r_vec->irq_idx);
+
+   nn_writeq(nn, NFP_NET_CFG_TXR_ADDR(idx), r_vec->tx_ring->dma);
+   nn_writeb(nn, NFP_NET_CFG_TXR_SZ(idx), ilog2(r_vec->tx_ring->cnt));
+   nn_writeb(nn, NFP_NET_CFG_TXR_VEC(idx), r_vec->irq_idx);
+}
+
 /**
  * nfp_net_start_vec() - Start ring vector
  * @nn:  NFP Net device structure
@@ -1766,6 +1777,9 @@ static int nfp_net_netdev_open(struct net_device *netdev)
 * - Set the Freelist buffer size
 * - Enable the FW
 */
+   

[PATCHv3 net-next 14/14] nfp: allow ring size reconfiguration at runtime

2016-02-18 Thread Jakub Kicinski
Since many of the required changes have already been made for
changing the MTU at runtime, let's use them for ring size changes
as well.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |   1 +
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 129 +
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   |  30 ++---
 3 files changed, 139 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h 
b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 1e08c9cf3ee0..90ad6264e62c 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -724,6 +724,7 @@ void nfp_net_rss_write_key(struct nfp_net *nn);
 void nfp_net_coalesce_write_cfg(struct nfp_net *nn);
 int nfp_net_irqs_alloc(struct nfp_net *nn);
 void nfp_net_irqs_disable(struct nfp_net *nn);
+int nfp_net_set_ring_size(struct nfp_net *nn, u32 rxd_cnt, u32 txd_cnt);
 
 #ifdef CONFIG_NFP_NET_DEBUG
 void nfp_net_debugfs_create(void);
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 2c86a10abcd3..70d366bdd4b7 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1442,6 +1442,59 @@ err_alloc:
return -ENOMEM;
 }
 
+static struct nfp_net_tx_ring *
+nfp_net_shadow_tx_rings_prepare(struct nfp_net *nn, u32 buf_cnt)
+{
+   struct nfp_net_tx_ring *rings;
+   unsigned int r;
+
+   rings = kcalloc(nn->num_tx_rings, sizeof(*rings), GFP_KERNEL);
+   if (!rings)
+   return NULL;
+
+   for (r = 0; r < nn->num_tx_rings; r++) {
+   nfp_net_tx_ring_init(&rings[r], nn->tx_rings[r].r_vec, r);
+
+   if (nfp_net_tx_ring_alloc(&rings[r], buf_cnt))
+   goto err_free_prev;
+   }
+
+   return rings;
+
+err_free_prev:
+   while (r--)
+   nfp_net_tx_ring_free(&rings[r]);
+   kfree(rings);
+   return NULL;
+}
+
+static struct nfp_net_tx_ring *
+nfp_net_shadow_tx_rings_swap(struct nfp_net *nn, struct nfp_net_tx_ring *rings)
+{
+   struct nfp_net_tx_ring *old = nn->tx_rings;
+   unsigned int r;
+
+   for (r = 0; r < nn->num_tx_rings; r++)
+   old[r].r_vec->tx_ring = &rings[r];
+
+   nn->tx_rings = rings;
+   return old;
+}
+
+static void
+nfp_net_shadow_tx_rings_free(struct nfp_net *nn, struct nfp_net_tx_ring *rings)
+{
+   unsigned int r;
+
+   if (!rings)
+   return;
+
+   for (r = 0; r < nn->num_tx_rings; r++)
+   nfp_net_tx_ring_free(&rings[r]);
+
+   kfree(rings);
+}
+
 /**
  * nfp_net_rx_ring_free() - Free resources allocated to a RX ring
  * @rx_ring:  RX ring to free
@@ -1558,6 +1611,9 @@ nfp_net_shadow_rx_rings_free(struct nfp_net *nn, struct 
nfp_net_rx_ring *rings)
 {
unsigned int r;
 
+   if (!rings)
+   return;
+
for (r = 0; r < nn->num_r_vecs; r++) {
nfp_net_rx_ring_bufs_free(nn, &rings[r]);
nfp_net_rx_ring_free(&rings[r]);
@@ -2100,6 +2156,79 @@ static int nfp_net_change_mtu(struct net_device *netdev, 
int new_mtu)
return err;
 }
 
+int nfp_net_set_ring_size(struct nfp_net *nn, u32 rxd_cnt, u32 txd_cnt)
+{
+   struct nfp_net_tx_ring *tx_rings = NULL;
+   struct nfp_net_rx_ring *rx_rings = NULL;
+   u32 old_rxd_cnt, old_txd_cnt;
+   int err, err2;
+
+   if (!(nn->ctrl & NFP_NET_CFG_CTRL_ENABLE)) {
+   nn->rxd_cnt = rxd_cnt;
+   nn->txd_cnt = txd_cnt;
+   return 0;
+   }
+
+   old_rxd_cnt = nn->rxd_cnt;
+   old_txd_cnt = nn->txd_cnt;
+
+   /* Prepare new rings */
+   if (nn->rxd_cnt != rxd_cnt) {
+   rx_rings = nfp_net_shadow_rx_rings_prepare(nn, nn->fl_bufsz,
+  rxd_cnt);
+   if (!rx_rings)
+   return -ENOMEM;
+   }
+   if (nn->txd_cnt != txd_cnt) {
+   tx_rings = nfp_net_shadow_tx_rings_prepare(nn, txd_cnt);
+   if (!tx_rings) {
+   nfp_net_shadow_rx_rings_free(nn, rx_rings);
+   return -ENOMEM;
+   }
+   }
+
+   /* Stop device, swap in new rings, try to start the device */
+   nfp_net_close_stack(nn);
+   nfp_net_clear_config_and_disable(nn);
+
+   if (rx_rings)
+   rx_rings = nfp_net_shadow_rx_rings_swap(nn, rx_rings);
+   if (tx_rings)
+   tx_rings = nfp_net_shadow_tx_rings_swap(nn, tx_rings);
+
+   nn->rxd_cnt = rxd_cnt;
+   nn->txd_cnt = txd_cnt;
+
+   err = nfp_net_set_config_and_enable(nn);
+   if (err) {
+   /* Try with old configuration and old rings */
+   if (rx_rings)
+   rx_rings = nfp_net_shadow_rx_rings_swap(nn, rx_rings);
+   if (tx_rings)
+   

[PATCHv3 net-next 13/14] nfp: pass ring count as function parameter

2016-02-18 Thread Jakub Kicinski
Since ring resize will soon call these functions with values
different from the current configuration, we need to
explicitly pass the ring count as a parameter.

Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 23 +-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 0153fce33dff..2c86a10abcd3 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1405,17 +1405,18 @@ static void nfp_net_tx_ring_free(struct nfp_net_tx_ring 
*tx_ring)
 /**
  * nfp_net_tx_ring_alloc() - Allocate resource for a TX ring
  * @tx_ring:   TX Ring structure to allocate
+ * @cnt:   Ring buffer count
  *
  * Return: 0 on success, negative errno otherwise.
  */
-static int nfp_net_tx_ring_alloc(struct nfp_net_tx_ring *tx_ring)
+static int nfp_net_tx_ring_alloc(struct nfp_net_tx_ring *tx_ring, u32 cnt)
 {
struct nfp_net_r_vector *r_vec = tx_ring->r_vec;
struct nfp_net *nn = r_vec->nfp_net;
struct pci_dev *pdev = nn->pdev;
int sz;
 
-   tx_ring->cnt = nn->txd_cnt;
+   tx_ring->cnt = cnt;
 
tx_ring->size = sizeof(*tx_ring->txds) * tx_ring->cnt;
tx_ring->txds = dma_zalloc_coherent(&pdev->dev, tx_ring->size,
@@ -1468,18 +1469,20 @@ static void nfp_net_rx_ring_free(struct nfp_net_rx_ring 
*rx_ring)
  * nfp_net_rx_ring_alloc() - Allocate resource for a RX ring
  * @rx_ring:  RX ring to allocate
  * @fl_bufsz: Size of buffers to allocate
+ * @cnt:  Ring buffer count
  *
  * Return: 0 on success, negative errno otherwise.
  */
 static int
-nfp_net_rx_ring_alloc(struct nfp_net_rx_ring *rx_ring, unsigned int fl_bufsz)
+nfp_net_rx_ring_alloc(struct nfp_net_rx_ring *rx_ring, unsigned int fl_bufsz,
+ u32 cnt)
 {
struct nfp_net_r_vector *r_vec = rx_ring->r_vec;
struct nfp_net *nn = r_vec->nfp_net;
struct pci_dev *pdev = nn->pdev;
int sz;
 
-   rx_ring->cnt = nn->rxd_cnt;
+   rx_ring->cnt = cnt;
rx_ring->bufsz = fl_bufsz;
 
rx_ring->size = sizeof(*rx_ring->rxds) * rx_ring->cnt;
@@ -1505,7 +1508,8 @@ err_alloc:
 }
 
 static struct nfp_net_rx_ring *
-nfp_net_shadow_rx_rings_prepare(struct nfp_net *nn, unsigned int fl_bufsz)
+nfp_net_shadow_rx_rings_prepare(struct nfp_net *nn, unsigned int fl_bufsz,
+   u32 buf_cnt)
 {
struct nfp_net_rx_ring *rings;
unsigned int r;
@@ -1517,7 +1521,7 @@ nfp_net_shadow_rx_rings_prepare(struct nfp_net *nn, 
unsigned int fl_bufsz)
for (r = 0; r < nn->num_rx_rings; r++) {
nfp_net_rx_ring_init(&rings[r], nn->rx_rings[r].r_vec, r);
 
-   if (nfp_net_rx_ring_alloc(&rings[r], fl_bufsz))
+   if (nfp_net_rx_ring_alloc(&rings[r], fl_bufsz, buf_cnt))
goto err_free_prev;
 
if (nfp_net_rx_ring_bufs_alloc(nn, &rings[r]))
@@ -1871,12 +1875,12 @@ static int nfp_net_netdev_open(struct net_device 
*netdev)
if (err)
goto err_free_prev_vecs;
 
-   err = nfp_net_tx_ring_alloc(nn->r_vecs[r].tx_ring);
+   err = nfp_net_tx_ring_alloc(nn->r_vecs[r].tx_ring, nn->txd_cnt);
if (err)
goto err_cleanup_vec_p;
 
err = nfp_net_rx_ring_alloc(nn->r_vecs[r].rx_ring,
-   nn->fl_bufsz);
+   nn->fl_bufsz, nn->rxd_cnt);
if (err)
goto err_free_tx_ring_p;
 
@@ -2057,7 +2061,8 @@ static int nfp_net_change_mtu(struct net_device *netdev, 
int new_mtu)
}
 
/* Prepare new rings */
-   tmp_rings = nfp_net_shadow_rx_rings_prepare(nn, new_fl_bufsz);
+   tmp_rings = nfp_net_shadow_rx_rings_prepare(nn, new_fl_bufsz,
+   nn->rxd_cnt);
if (!tmp_rings)
return -ENOMEM;
 
-- 
1.9.1



[PATCHv3 net-next 04/14] nfp: allocate ring SW structs dynamically

2016-02-18 Thread Jakub Kicinski
To be able to switch rings more easily on config changes, allocate
them dynamically, separately from the nfp_net structure.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |  6 ++---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 28 +-
 2 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h 
b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index ab264e1bccd0..0a87571a7d9c 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -472,6 +472,9 @@ struct nfp_net {
 
u32 rx_offset;
 
+   struct nfp_net_tx_ring *tx_rings;
+   struct nfp_net_rx_ring *rx_rings;
+
 #ifdef CONFIG_PCI_IOV
unsigned int num_vfs;
struct vf_data_storage *vfinfo;
@@ -504,9 +507,6 @@ struct nfp_net {
int txd_cnt;
int rxd_cnt;
 
-   struct nfp_net_tx_ring tx_rings[NFP_NET_MAX_TX_RINGS];
-   struct nfp_net_rx_ring rx_rings[NFP_NET_MAX_RX_RINGS];
-
u8 num_irqs;
u8 num_r_vecs;
struct nfp_net_r_vector r_vecs[NFP_NET_MAX_TX_RINGS];
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 8299d4c002fb..faaa25dd5a1e 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -414,12 +414,6 @@ static void nfp_net_irqs_assign(struct net_device *netdev)
r_vec->irq_idx = NFP_NET_NON_Q_VECTORS + r;
 
cpumask_set_cpu(r, &r_vec->affinity_mask);
-
-   r_vec->tx_ring = &nn->tx_rings[r];
-   nfp_net_tx_ring_init(r_vec->tx_ring, r_vec, r);
-
-   r_vec->rx_ring = &nn->rx_rings[r];
-   nfp_net_rx_ring_init(r_vec->rx_ring, r_vec, r);
}
 }
 
@@ -1501,6 +1495,12 @@ nfp_net_prepare_vector(struct nfp_net *nn, struct 
nfp_net_r_vector *r_vec,
struct msix_entry *entry = &nn->irq_entries[r_vec->irq_idx];
int err;
 
+   r_vec->tx_ring = &nn->tx_rings[idx];
+   nfp_net_tx_ring_init(r_vec->tx_ring, r_vec, idx);
+
+   r_vec->rx_ring = &nn->rx_rings[idx];
+   nfp_net_rx_ring_init(r_vec->rx_ring, r_vec, idx);
+
snprintf(r_vec->name, sizeof(r_vec->name),
 "%s-rxtx-%d", nn->netdev->name, idx);
err = request_irq(entry->vector, r_vec->handler, 0, r_vec->name, r_vec);
@@ -1691,6 +1691,15 @@ static int nfp_net_netdev_open(struct net_device *netdev)
goto err_free_exn;
disable_irq(nn->irq_entries[NFP_NET_CFG_LSC].vector);
 
+   nn->rx_rings = kcalloc(nn->num_rx_rings, sizeof(*nn->rx_rings),
+  GFP_KERNEL);
+   if (!nn->rx_rings)
+   goto err_free_lsc;
+   nn->tx_rings = kcalloc(nn->num_tx_rings, sizeof(*nn->tx_rings),
+  GFP_KERNEL);
+   if (!nn->tx_rings)
+   goto err_free_rx_rings;
+
for (r = 0; r < nn->num_r_vecs; r++) {
err = nfp_net_prepare_vector(nn, &nn->r_vecs[r], r);
if (err)
@@ -1805,6 +1814,10 @@ err_free_tx_ring_p:
 err_cleanup_vec_p:
nfp_net_cleanup_vector(nn, &nn->r_vecs[r]);
}
+   kfree(nn->tx_rings);
+err_free_rx_rings:
+   kfree(nn->rx_rings);
+err_free_lsc:
nfp_net_aux_irq_free(nn, NFP_NET_CFG_LSC, NFP_NET_IRQ_LSC_IDX);
 err_free_exn:
nfp_net_aux_irq_free(nn, NFP_NET_CFG_EXN, NFP_NET_IRQ_EXN_IDX);
@@ -1850,6 +1863,9 @@ static int nfp_net_netdev_close(struct net_device *netdev)
nfp_net_cleanup_vector(nn, &nn->r_vecs[r]);
}
 
+   kfree(nn->rx_rings);
+   kfree(nn->tx_rings);
+
nfp_net_aux_irq_free(nn, NFP_NET_CFG_LSC, NFP_NET_IRQ_LSC_IDX);
nfp_net_aux_irq_free(nn, NFP_NET_CFG_EXN, NFP_NET_IRQ_EXN_IDX);
 
-- 
1.9.1



[PATCHv3 net-next 00/14] nfp: MTU fixes for net

2016-02-18 Thread Jakub Kicinski
Hi Dave!

This is the second part of MTU reconfiguration fixes, targeted
at net-next.

Patches 1-8 refactor open/stop paths to look like this:
 - alloc;
 - dev/FW init;
 - stack init/start.
stop:
 - stack quiescence/stop;
 - dev/FW down;
 - free.
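That ordering makes stop the exact mirror of open. A sketch of the symmetry, with each stage tracked so a mis-ordered call would trip an assertion (hypothetical names, not the driver's functions):

```c
#include <assert.h>

/* Mirror-image open/stop ordering from the cover letter; each stage is
 * modelled as a counter so the sequence can be checked. */
static int stage;		/* 0 = closed, 3 = fully up */

static void alloc_rings(void)	{ assert(stage == 0); stage = 1; }
static void dev_fw_init(void)	{ assert(stage == 1); stage = 2; }
static void stack_start(void)	{ assert(stage == 2); stage = 3; }

static void stack_stop(void)	{ assert(stage == 3); stage = 2; }
static void dev_fw_down(void)	{ assert(stage == 2); stage = 1; }
static void free_rings(void)	{ assert(stage == 1); stage = 0; }

static void ndo_open(void)
{
	alloc_rings();		/* 1. allocate all SW resources first */
	dev_fw_init();		/* 2. program and enable the device/FW */
	stack_start();		/* 3. hand the device to the stack last */
}

static void ndo_stop(void)
{
	stack_stop();		/* reverse order: quiesce the stack first */
	dev_fw_down();
	free_rings();
}
```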
That's quite a bit of code churn.  I did my best to split
it up, but it's probably still not much fun to review.

Patch 9 splits the open/stop into chunks I can call later.

Patch 10 makes sure that FW start/stop operations are
reflected in SW state (which was not needed earlier since
we always did full down/up).

[Patches 11 and 13 are trivial, split for readability.]

Patch 12 does what you requested for MTU change:
 - alloc new resources;
 - stop dev;
 - try to start dev with new config;
 - if failed try with old config;
 - if failed die loudly.

Patch 14 does the same thing for ring resize.

I tested this with various error injection hacks and it 
seems quite solid.

This is on top of the first series, merge of net into
net-next will be required for this to apply.

Thanks!


Jakub Kicinski (14):
  nfp: move link state interrupt request/free calls
  nfp: break up nfp_net_{alloc|free}_rings
  nfp: make *x_ring_init do all the init
  nfp: allocate ring SW structs dynamically
  nfp: cleanup tx ring flush and rename to reset
  nfp: reorganize initial filling of RX rings
  nfp: preallocate RX buffers early in .ndo_open
  nfp: move filling ring information to FW config
  nfp: slice .ndo_open() and .ndo_stop() up
  nfp: sync ring state during FW reconfiguration
  nfp: propagate list buffer size in struct rx_ring
  nfp: convert .ndo_change_mtu() to prepare/commit paradigm
  nfp: pass ring count as function parameter
  nfp: allow ring size reconfiguration at runtime

 drivers/net/ethernet/netronome/nfp/nfp_net.h   |  10 +-
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 898 ++---
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   |  30 +-
 3 files changed, 615 insertions(+), 323 deletions(-)

-- 
1.9.1
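The fall-back flow described for patch 12 (try the new config, revert to the old one on failure, give up loudly if even that fails) can be sketched in plain C. This is a minimal model on a made-up `fake_dev` struct and `dev_start()` helper, not the real nfp driver API:

```c
#include <assert.h>

/* Sketch of the prepare/commit pattern: hypothetical types, not nfp code. */
struct fake_dev { int mtu; };

static int start_ok = 1;   /* test knob standing in for HW/FW start success */

static int dev_start(struct fake_dev *d, int mtu)
{
	(void)d; (void)mtu;
	return start_ok ? 0 : -1;
}

static int change_mtu(struct fake_dev *d, int new_mtu)
{
	int old_mtu = d->mtu;

	if (dev_start(d, new_mtu) == 0) {   /* new config took effect */
		d->mtu = new_mtu;
		return 0;
	}
	if (dev_start(d, old_mtu) == 0)     /* fall back to old config */
		return -1;
	return -2;                          /* "die loudly" */
}
```

The key property is that the device's recorded state only changes once the new configuration has actually started.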



Re: [PATCH] sctp: Fix port hash table size computation

2016-02-18 Thread David Miller
From: Neil Horman 
Date: Thu, 18 Feb 2016 10:02:04 -0500

> Dmitry Vyukov noted recently that the sctp_port_hashtable had an error in
> its size computation, observing that the current method never guaranteed
> that the hashsize (measured in number of entries) would be a power of two,
> which the input hash function for that table requires.  The root cause of
> the problem is that two values need to be computed (one, the allocation
> order of the storage required, as passed to __get_free_pages, and two, the
> number of entries for the hash table).  Both need to be powers of two, but for
> different reasons, and the existing code is simply computing one order
> value, and using it as the basis for both, which is wrong (i.e. it assumes
> that ((1< 
> To fix this, we change the logic slightly.  We start by computing a goal
> allocation order (which is limited by the maximum size hash table we want
> to support).  Then we attempt to allocate that size table, decreasing the
> order until a successful allocation is made.  Then, with the resultant
> successful order we compute the number of buckets that hash table supports,
> which we then round down to the nearest power of two, giving us the number
> of entries the table actually supports.
> 
> I've tested this locally here, using non-debug and spinlock-debug kernels,
> and the number of entries in the hashtable consistently work out to be
> powers of two in all cases.
> 
> Signed-off-by: Neil Horman 
> Reported-by: Dmitry Vyukov 

This needs some work:

In file included from include/linux/list.h:8:0,
  from include/linux/module.h:9, from net/sctp/protocol.c:44:
  net/sctp/protocol.c: In function ‘sctp_init’:
  include/linux/kernel.h:752:17: warning: comparison of distinct
  pointer types lacks a cast (void) (&_min1 == &_min2); \ ^
  net/sctp/protocol.c:1431:10: note: in expansion of macro ‘min’ order
  = min(order, max_entry_order); ^ In file included from
  include/linux/printk.h:6:0, from include/linux/kernel.h:13, from
  include/linux/list.h:8, from include/linux/module.h:9, from
  net/sctp/protocol.c:44:
include/linux/kern_levels.h:4:18: warning: format ‘%d’ expects argument of type 
‘int’, but argument 3 has type ‘long unsigned int’ [-Wformat=]
 #define KERN_SOH "\001"  /* ASCII Start Of Header */
  ^
include/linux/kern_levels.h:13:19: note: in expansion of macro ‘KERN_SOH’
 #define KERN_INFO KERN_SOH "6" /* informational */
   ^
include/linux/printk.h:259:9: note: in expansion of macro ‘KERN_INFO’
  printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
 ^
net/sctp/protocol.c:1484:2: note: in expansion of macro ‘pr_info’
  pr_info("Hash tables configured (bind %d/%d)\n", sctp_port_hashsize,
  ^
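The last step Neil describes, rounding the bucket count the allocation actually supports down to the nearest power of two, looks roughly like this in plain C. The kernel has rounddown_pow_of_two() for this; `rounddown_pow2()` below is an illustrative stand-in, not the actual sctp code:

```c
#include <assert.h>

/* Round n down to the nearest power of two (n >= 1 assumed).
 * Mirrors what the fix needs: the hash function requires the
 * entry count to be an exact power of two.
 */
static unsigned long rounddown_pow2(unsigned long n)
{
	unsigned long p = 1;

	/* keep doubling while the next power still fits and doesn't overflow */
	while (p * 2 != 0 && p * 2 <= n)
		p *= 2;
	return p;
}
```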


[PATCHv3 net-next 02/14] nfp: break up nfp_net_{alloc|free}_rings

2016-02-18 Thread Jakub Kicinski
nfp_net_{alloc|free}_rings contained a strange mix of allocations
and vector initialization.  Remove it, declare vector init as
a separate function, and handle allocations explicitly.

Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 126 -
 1 file changed, 47 insertions(+), 79 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index bebdae80ccda..d39ac3553e1e 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1486,91 +1486,40 @@ err_alloc:
return -ENOMEM;
 }
 
-static void __nfp_net_free_rings(struct nfp_net *nn, unsigned int n_free)
+static int
+nfp_net_prepare_vector(struct nfp_net *nn, struct nfp_net_r_vector *r_vec,
+  int idx)
 {
-   struct nfp_net_r_vector *r_vec;
-   struct msix_entry *entry;
+   struct msix_entry *entry = &nn->irq_entries[r_vec->irq_idx];
+   int err;
 
-   while (n_free--) {
-   r_vec = &nn->r_vecs[n_free];
-   entry = &nn->irq_entries[r_vec->irq_idx];
+   snprintf(r_vec->name, sizeof(r_vec->name),
+"%s-rxtx-%d", nn->netdev->name, idx);
+   err = request_irq(entry->vector, r_vec->handler, 0, r_vec->name, r_vec);
+   if (err) {
+   nn_err(nn, "Error requesting IRQ %d\n", entry->vector);
+   return err;
+   }
 
-   nfp_net_rx_ring_free(r_vec->rx_ring);
-   nfp_net_tx_ring_free(r_vec->tx_ring);
+   /* Setup NAPI */
+   netif_napi_add(nn->netdev, &r_vec->napi,
+  nfp_net_poll, NAPI_POLL_WEIGHT);
 
-   irq_set_affinity_hint(entry->vector, NULL);
-   free_irq(entry->vector, r_vec);
+   irq_set_affinity_hint(entry->vector, &r_vec->affinity_mask);
 
-   netif_napi_del(&r_vec->napi);
-   }
-}
+   nn_dbg(nn, "RV%02d: irq=%03d/%03d\n", idx, entry->vector, entry->entry);
 
-/**
- * nfp_net_free_rings() - Free all ring resources
- * @nn:  NFP Net device to reconfigure
- */
-static void nfp_net_free_rings(struct nfp_net *nn)
-{
-   __nfp_net_free_rings(nn, nn->num_r_vecs);
+   return 0;
 }
 
-/**
- * nfp_net_alloc_rings() - Allocate resources for RX and TX rings
- * @nn:  NFP Net device to reconfigure
- *
- * Return: 0 on success or negative errno on error.
- */
-static int nfp_net_alloc_rings(struct nfp_net *nn)
+static void
+nfp_net_cleanup_vector(struct nfp_net *nn, struct nfp_net_r_vector *r_vec)
 {
-   struct nfp_net_r_vector *r_vec;
-   struct msix_entry *entry;
-   int err;
-   int r;
+   struct msix_entry *entry = &nn->irq_entries[r_vec->irq_idx];
 
-   for (r = 0; r < nn->num_r_vecs; r++) {
-   r_vec = &nn->r_vecs[r];
-   entry = &nn->irq_entries[r_vec->irq_idx];
-
-   /* Setup NAPI */
-   netif_napi_add(nn->netdev, &r_vec->napi,
-  nfp_net_poll, NAPI_POLL_WEIGHT);
-
-   snprintf(r_vec->name, sizeof(r_vec->name),
-"%s-rxtx-%d", nn->netdev->name, r);
-   err = request_irq(entry->vector, r_vec->handler, 0,
- r_vec->name, r_vec);
-   if (err) {
-   nn_dbg(nn, "Error requesting IRQ %d\n", entry->vector);
-   goto err_napi_del;
-   }
-
-   irq_set_affinity_hint(entry->vector, &r_vec->affinity_mask);
-
-   nn_dbg(nn, "RV%02d: irq=%03d/%03d\n",
-  r, entry->vector, entry->entry);
-
-   /* Allocate TX ring resources */
-   err = nfp_net_tx_ring_alloc(r_vec->tx_ring);
-   if (err)
-   goto err_free_irq;
-
-   /* Allocate RX ring resources */
-   err = nfp_net_rx_ring_alloc(r_vec->rx_ring);
-   if (err)
-   goto err_free_tx;
-   }
-
-   return 0;
-
-err_free_tx:
-   nfp_net_tx_ring_free(r_vec->tx_ring);
-err_free_irq:
irq_set_affinity_hint(entry->vector, NULL);
-   free_irq(entry->vector, r_vec);
-err_napi_del:
netif_napi_del(&r_vec->napi);
-   __nfp_net_free_rings(nn, r);
-   return err;
+   free_irq(entry->vector, r_vec);
 }
 
 /**
@@ -1734,9 +1683,19 @@ static int nfp_net_netdev_open(struct net_device *netdev)
goto err_free_exn;
disable_irq(nn->irq_entries[NFP_NET_CFG_LSC].vector);
 
-   err = nfp_net_alloc_rings(nn);
-   if (err)
-   goto err_free_lsc;
+   for (r = 0; r < nn->num_r_vecs; r++) {
+   err = nfp_net_prepare_vector(nn, &nn->r_vecs[r], r);
+   if (err)
+   goto err_free_prev_vecs;
+
+   err = nfp_net_tx_ring_alloc(nn->r_vecs[r].tx_ring);
+   if (err)
+   goto 

[PATCHv3 net-next 06/14] nfp: reorganize initial filling of RX rings

2016-02-18 Thread Jakub Kicinski
Separate the allocation of buffers from giving them to the FW;
thanks to this, it will be possible to move allocation
earlier on the .ndo_open() path and to reuse buffers during
runtime reconfiguration.

Similar to the TX side, clean up the spill of functionality
from flush into ring freeing.  Unlike on the TX side,
RX ring reset does not free buffers from the ring.
Ring reset means only that FW pointers are zeroed and
buffers on the ring must be placed in [0, cnt - 1)
positions.

Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 119 ++---
 1 file changed, 78 insertions(+), 41 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index cc8b06651f57..b640e1693377 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1021,62 +1021,100 @@ static void nfp_net_rx_give_one(struct nfp_net_rx_ring *rx_ring,
 }
 
 /**
- * nfp_net_rx_flush() - Free any buffers currently on the RX ring
- * @rx_ring:  RX ring to remove buffers from
+ * nfp_net_rx_ring_reset() - Reflect in SW state of freelist after disable
+ * @rx_ring:   RX ring structure
  *
- * Assumes that the device is stopped
+ * Warning: Do *not* call if ring buffers were never put on the FW freelist
+ * (i.e. device was not enabled)!
  */
-static void nfp_net_rx_flush(struct nfp_net_rx_ring *rx_ring)
+static void nfp_net_rx_ring_reset(struct nfp_net_rx_ring *rx_ring)
 {
-   struct nfp_net *nn = rx_ring->r_vec->nfp_net;
-   struct pci_dev *pdev = nn->pdev;
-   int idx;
+   unsigned int wr_idx, last_idx;
 
-   while (rx_ring->rd_p != rx_ring->wr_p) {
-   idx = rx_ring->rd_p % rx_ring->cnt;
+   /* Move the empty entry to the end of the list */
+   wr_idx = rx_ring->wr_p % rx_ring->cnt;
+   last_idx = rx_ring->cnt - 1;
+   rx_ring->rxbufs[wr_idx].dma_addr = rx_ring->rxbufs[last_idx].dma_addr;
+   rx_ring->rxbufs[wr_idx].skb = rx_ring->rxbufs[last_idx].skb;
+   rx_ring->rxbufs[last_idx].dma_addr = 0;
+   rx_ring->rxbufs[last_idx].skb = NULL;
 
-   if (rx_ring->rxbufs[idx].skb) {
-   dma_unmap_single(&pdev->dev,
-rx_ring->rxbufs[idx].dma_addr,
-nn->fl_bufsz, DMA_FROM_DEVICE);
-   dev_kfree_skb_any(rx_ring->rxbufs[idx].skb);
-   rx_ring->rxbufs[idx].dma_addr = 0;
-   rx_ring->rxbufs[idx].skb = NULL;
-   }
+   memset(rx_ring->rxds, 0, sizeof(*rx_ring->rxds) * rx_ring->cnt);
+   rx_ring->wr_p = 0;
+   rx_ring->rd_p = 0;
+   rx_ring->wr_ptr_add = 0;
+}
 
-   memset(&rx_ring->rxds[idx], 0, sizeof(rx_ring->rxds[idx]));
+/**
+ * nfp_net_rx_ring_bufs_free() - Free any buffers currently on the RX ring
+ * @nn:NFP Net device
+ * @rx_ring:   RX ring to remove buffers from
+ *
+ * Assumes that the device is stopped and buffers are in [0, ring->cnt - 1)
+ * entries.  After device is disabled nfp_net_rx_ring_reset() must be called
+ * to restore required ring geometry.
+ */
+static void
+nfp_net_rx_ring_bufs_free(struct nfp_net *nn, struct nfp_net_rx_ring *rx_ring)
+{
+   struct pci_dev *pdev = nn->pdev;
+   unsigned int i;
 
-   rx_ring->rd_p++;
+   for (i = 0; i < rx_ring->cnt - 1; i++) {
+   /* NULL skb can only happen when initial filling of the ring
+* fails to allocate enough buffers and calls here to free
+* already allocated ones.
+*/
+   if (!rx_ring->rxbufs[i].skb)
+   continue;
+
+   dma_unmap_single(&pdev->dev, rx_ring->rxbufs[i].dma_addr,
+nn->fl_bufsz, DMA_FROM_DEVICE);
+   dev_kfree_skb_any(rx_ring->rxbufs[i].skb);
+   rx_ring->rxbufs[i].dma_addr = 0;
+   rx_ring->rxbufs[i].skb = NULL;
}
 }
 
 /**
- * nfp_net_rx_fill_freelist() - Attempt filling freelist with RX buffers
- * @rx_ring: RX ring to fill
- *
- * Try to fill as many buffers as possible into freelist.  Return
- * number of buffers added.
- *
- * Return: Number of freelist buffers added.
+ * nfp_net_rx_ring_bufs_alloc() - Fill RX ring with buffers (don't give to FW)
+ * @nn:NFP Net device
+ * @rx_ring:   RX ring to remove buffers from
  */
-static int nfp_net_rx_fill_freelist(struct nfp_net_rx_ring *rx_ring)
+static int
+nfp_net_rx_ring_bufs_alloc(struct nfp_net *nn, struct nfp_net_rx_ring *rx_ring)
 {
-   struct sk_buff *skb;
-   dma_addr_t dma_addr;
+   struct nfp_net_rx_buf *rxbufs;
+   unsigned int i;
+
+   rxbufs = rx_ring->rxbufs;
 
-   while (nfp_net_rx_space(rx_ring)) {
-   skb = nfp_net_rx_alloc_one(rx_ring, &dma_addr);
-   if (!skb) {
-  
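The index juggling in nfp_net_rx_ring_reset() above, where the single empty slot is moved to position cnt - 1 so that all buffers end up in positions [0, cnt - 1), can be modeled in userspace C. The structs below are simplified stand-ins, not the real nfp ring types:

```c
#include <assert.h>

/* Toy model of the RX ring reset: move the hole to the last slot,
 * then restart the ring pointers.
 */
struct rxbuf { int dma_addr; };
struct ring  { struct rxbuf bufs[4]; unsigned int wr_p, cnt; };

static void ring_reset(struct ring *r)
{
	unsigned int wr_idx = r->wr_p % r->cnt;   /* where the hole currently is */
	unsigned int last   = r->cnt - 1;

	r->bufs[wr_idx] = r->bufs[last];          /* move last buffer into the hole */
	r->bufs[last].dma_addr = 0;               /* last slot becomes the hole */
	r->wr_p = 0;                              /* restart pointers */
}
```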

[PATCHv3 net-next 03/14] nfp: make *x_ring_init do all the init

2016-02-18 Thread Jakub Kicinski
nfp_net_[rt]x_ring_init functions used to be called from probe
path only and some of their functionality was spilled to the
call site.  In order to reuse them for ring reconfiguration
we need them to do all the init.

Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 28 ++
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index d39ac3553e1e..8299d4c002fb 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -348,12 +348,18 @@ static irqreturn_t nfp_net_irq_exn(int irq, void *data)
 /**
  * nfp_net_tx_ring_init() - Fill in the boilerplate for a TX ring
  * @tx_ring:  TX ring structure
+ * @r_vec:IRQ vector servicing this ring
+ * @idx:  Ring index
  */
-static void nfp_net_tx_ring_init(struct nfp_net_tx_ring *tx_ring)
+static void
+nfp_net_tx_ring_init(struct nfp_net_tx_ring *tx_ring,
+struct nfp_net_r_vector *r_vec, unsigned int idx)
 {
-   struct nfp_net_r_vector *r_vec = tx_ring->r_vec;
struct nfp_net *nn = r_vec->nfp_net;
 
+   tx_ring->idx = idx;
+   tx_ring->r_vec = r_vec;
+
tx_ring->qcidx = tx_ring->idx * nn->stride_tx;
tx_ring->qcp_q = nn->tx_bar + NFP_QCP_QUEUE_OFF(tx_ring->qcidx);
 }
@@ -361,12 +367,18 @@ static void nfp_net_tx_ring_init(struct nfp_net_tx_ring *tx_ring)
 /**
  * nfp_net_rx_ring_init() - Fill in the boilerplate for a RX ring
  * @rx_ring:  RX ring structure
+ * @r_vec:IRQ vector servicing this ring
+ * @idx:  Ring index
  */
-static void nfp_net_rx_ring_init(struct nfp_net_rx_ring *rx_ring)
+static void
+nfp_net_rx_ring_init(struct nfp_net_rx_ring *rx_ring,
+struct nfp_net_r_vector *r_vec, unsigned int idx)
 {
-   struct nfp_net_r_vector *r_vec = rx_ring->r_vec;
struct nfp_net *nn = r_vec->nfp_net;
 
+   rx_ring->idx = idx;
+   rx_ring->r_vec = r_vec;
+
rx_ring->fl_qcidx = rx_ring->idx * nn->stride_rx;
rx_ring->rx_qcidx = rx_ring->fl_qcidx + (nn->stride_rx - 1);
 
@@ -404,14 +416,10 @@ static void nfp_net_irqs_assign(struct net_device *netdev)
cpumask_set_cpu(r, &r_vec->affinity_mask);
 
r_vec->tx_ring = &nn->tx_rings[r];
-   nn->tx_rings[r].idx = r;
-   nn->tx_rings[r].r_vec = r_vec;
-   nfp_net_tx_ring_init(r_vec->tx_ring);
+   nfp_net_tx_ring_init(r_vec->tx_ring, r_vec, r);
 
r_vec->rx_ring = &nn->rx_rings[r];
-   nn->rx_rings[r].idx = r;
-   nn->rx_rings[r].r_vec = r_vec;
-   nfp_net_rx_ring_init(r_vec->rx_ring);
+   nfp_net_rx_ring_init(r_vec->rx_ring, r_vec, r);
}
 }
 
-- 
1.9.1



Re: [PATCH net-next 0/2] Add support for PHY packet generators

2016-02-18 Thread David Miller
From: Andrew Lunn 
Date: Wed, 17 Feb 2016 21:32:05 +0100

> Some Ethernet PHYs contain a simple packet generator. This can be
> useful for bringing up new devices, trying to determine if a problem
> lies in the MAC-PHY connection or PHY-Socket. Also, the PHY generators
> can generate invalid packets, which is hard to do in software.
> 
> Add support to ethtool(1) and wire up the Marvell PHY packet generator.

You really cannot make this blocking, every time we've added a blocking
ethtool op that could take a non-trivial amount of time we've been
burnt.

So as Ben mentioned blocking for 0.3 seconds or whatever is a non-starter.


[PATCHv3 net-next 09/14] nfp: slice .ndo_open() and .ndo_stop() up

2016-02-18 Thread Jakub Kicinski
Divide .ndo_open() and .ndo_stop() into logical, callable
chunks.  No functional changes.

Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 213 +
 1 file changed, 131 insertions(+), 82 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 34f933f19059..4ce17cb95e6f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1671,6 +1671,77 @@ nfp_net_vec_write_ring_data(struct nfp_net *nn, struct nfp_net_r_vector *r_vec,
 }
 
 /**
+ * nfp_net_set_config_and_enable() - Write control BAR and enable NFP
+ * @nn:  NFP Net device to reconfigure
+ */
+static int nfp_net_set_config_and_enable(struct nfp_net *nn)
+{
+   u32 new_ctrl, update = 0;
+   unsigned int r;
+   int err;
+
+   new_ctrl = nn->ctrl;
+
+   if (nn->cap & NFP_NET_CFG_CTRL_RSS) {
+   nfp_net_rss_write_key(nn);
+   nfp_net_rss_write_itbl(nn);
+   nn_writel(nn, NFP_NET_CFG_RSS_CTRL, nn->rss_cfg);
+   update |= NFP_NET_CFG_UPDATE_RSS;
+   }
+
+   if (nn->cap & NFP_NET_CFG_CTRL_IRQMOD) {
+   nfp_net_coalesce_write_cfg(nn);
+
+   new_ctrl |= NFP_NET_CFG_CTRL_IRQMOD;
+   update |= NFP_NET_CFG_UPDATE_IRQMOD;
+   }
+
+   for (r = 0; r < nn->num_r_vecs; r++)
+   nfp_net_vec_write_ring_data(nn, &nn->r_vecs[r], r);
+
+   nn_writeq(nn, NFP_NET_CFG_TXRS_ENABLE, nn->num_tx_rings == 64 ?
+ 0xffffffffffffffffULL : ((u64)1 << nn->num_tx_rings) - 1);
+
+   nn_writeq(nn, NFP_NET_CFG_RXRS_ENABLE, nn->num_rx_rings == 64 ?
+ 0xffffffffffffffffULL : ((u64)1 << nn->num_rx_rings) - 1);
+
+   nfp_net_write_mac_addr(nn, nn->netdev->dev_addr);
+
+   nn_writel(nn, NFP_NET_CFG_MTU, nn->netdev->mtu);
+   nn_writel(nn, NFP_NET_CFG_FLBUFSZ, nn->fl_bufsz);
+
+   /* Enable device */
+   new_ctrl |= NFP_NET_CFG_CTRL_ENABLE;
+   update |= NFP_NET_CFG_UPDATE_GEN;
+   update |= NFP_NET_CFG_UPDATE_MSIX;
+   update |= NFP_NET_CFG_UPDATE_RING;
+   if (nn->cap & NFP_NET_CFG_CTRL_RINGCFG)
+   new_ctrl |= NFP_NET_CFG_CTRL_RINGCFG;
+
+   nn_writel(nn, NFP_NET_CFG_CTRL, new_ctrl);
+   err = nfp_net_reconfig(nn, update);
+   if (err)
+   goto err_clear_config;
+
+   nn->ctrl = new_ctrl;
+
+   /* Since reconfiguration requests while NFP is down are ignored we
+* have to wipe the entire VXLAN configuration and reinitialize it.
+*/
+   if (nn->ctrl & NFP_NET_CFG_CTRL_VXLAN) {
+   memset(&nn->vxlan_ports, 0, sizeof(nn->vxlan_ports));
+   memset(&nn->vxlan_usecnt, 0, sizeof(nn->vxlan_usecnt));
+   vxlan_get_rx_port(nn->netdev);
+   }
+
+   return 0;
+
+err_clear_config:
+   nfp_net_clear_config_and_disable(nn);
+   return err;
+}
+
+/**
  * nfp_net_start_vec() - Start ring vector
  * @nn:  NFP Net device structure
  * @r_vec:   Ring vector to be started
@@ -1690,20 +1761,33 @@ nfp_net_start_vec(struct nfp_net *nn, struct nfp_net_r_vector *r_vec)
enable_irq(irq_vec);
 }
 
+/**
+ * nfp_net_open_stack() - Start the device from stack's perspective
+ * @nn:  NFP Net device to reconfigure
+ */
+static void nfp_net_open_stack(struct nfp_net *nn)
+{
+   unsigned int r;
+
+   for (r = 0; r < nn->num_r_vecs; r++)
+   nfp_net_start_vec(nn, &nn->r_vecs[r]);
+
+   netif_tx_wake_all_queues(nn->netdev);
+
+   enable_irq(nn->irq_entries[NFP_NET_CFG_LSC].vector);
+   nfp_net_read_link_status(nn);
+}
+
 static int nfp_net_netdev_open(struct net_device *netdev)
 {
struct nfp_net *nn = netdev_priv(netdev);
int err, r;
-   u32 update = 0;
-   u32 new_ctrl;
 
if (nn->ctrl & NFP_NET_CFG_CTRL_ENABLE) {
nn_err(nn, "Dev is already enabled: 0x%08x\n", nn->ctrl);
return -EBUSY;
}
 
-   new_ctrl = nn->ctrl;
-
/* Step 1: Allocate resources for rings and the like
 * - Request interrupts
 * - Allocate RX and TX ring resources
@@ -1756,20 +1840,6 @@ static int nfp_net_netdev_open(struct net_device *netdev)
if (err)
goto err_free_rings;
 
-   if (nn->cap & NFP_NET_CFG_CTRL_RSS) {
-   nfp_net_rss_write_key(nn);
-   nfp_net_rss_write_itbl(nn);
-   nn_writel(nn, NFP_NET_CFG_RSS_CTRL, nn->rss_cfg);
-   update |= NFP_NET_CFG_UPDATE_RSS;
-   }
-
-   if (nn->cap & NFP_NET_CFG_CTRL_IRQMOD) {
-   nfp_net_coalesce_write_cfg(nn);
-
-   new_ctrl |= NFP_NET_CFG_CTRL_IRQMOD;
-   update |= NFP_NET_CFG_UPDATE_IRQMOD;
-   }
-
/* Step 2: Configure the NFP
 * - Enable rings from 0 to tx_rings/rx_rings - 1.
 * - 
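The `num_tx_rings == 64` special case in nfp_net_set_config_and_enable() above exists because shifting a 64-bit value by 64 is undefined behaviour in C, so the all-rings mask cannot be built from the shift expression alone. A minimal illustration (plain C, not driver code):

```c
#include <assert.h>
#include <stdint.h>

/* Build an n-ring "rings enabled" bitmask.  ((u64)1 << 64) is UB in C,
 * so n == 64 must be special-cased to all-ones, exactly as the
 * nn_writeq() calls in the patch do.
 */
static uint64_t ring_enable_mask(unsigned int n_rings)
{
	return n_rings == 64 ? 0xffffffffffffffffULL
			     : (((uint64_t)1 << n_rings) - 1);
}
```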

[PATCHv3 net-next 01/14] nfp: move link state interrupt request/free calls

2016-02-18 Thread Jakub Kicinski
We need to be able to disable the link state interrupt when
the device is brought down.  We used to just free the IRQ
at the beginning of .ndo_stop().  As we now move towards
more ordered .ndo_open()/.ndo_stop() paths LSC allocation
should be placed in the "allocate resource" section.

Since the IRQ can't be freed early in .ndo_stop(), it is
disabled instead.

Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 23 +++---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 070645f9bc21..bebdae80ccda 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1727,10 +1727,16 @@ static int nfp_net_netdev_open(struct net_device *netdev)
  NFP_NET_IRQ_EXN_IDX, nn->exn_handler);
if (err)
return err;
+   err = nfp_net_aux_irq_request(nn, NFP_NET_CFG_LSC, "%s-lsc",
+ nn->lsc_name, sizeof(nn->lsc_name),
+ NFP_NET_IRQ_LSC_IDX, nn->lsc_handler);
+   if (err)
+   goto err_free_exn;
+   disable_irq(nn->irq_entries[NFP_NET_CFG_LSC].vector);
 
err = nfp_net_alloc_rings(nn);
if (err)
-   goto err_free_exn;
+   goto err_free_lsc;
 
err = netif_set_real_num_tx_queues(netdev, nn->num_tx_rings);
if (err)
@@ -1810,19 +1816,11 @@ static int nfp_net_netdev_open(struct net_device *netdev)
 
netif_tx_wake_all_queues(netdev);
 
-   err = nfp_net_aux_irq_request(nn, NFP_NET_CFG_LSC, "%s-lsc",
- nn->lsc_name, sizeof(nn->lsc_name),
- NFP_NET_IRQ_LSC_IDX, nn->lsc_handler);
-   if (err)
-   goto err_stop_tx;
+   enable_irq(nn->irq_entries[NFP_NET_CFG_LSC].vector);
nfp_net_read_link_status(nn);
 
return 0;
 
-err_stop_tx:
-   netif_tx_disable(netdev);
-   for (r = 0; r < nn->num_r_vecs; r++)
-   nfp_net_tx_flush(nn->r_vecs[r].tx_ring);
 err_disable_napi:
while (r--) {
napi_disable(>r_vecs[r].napi);
@@ -1832,6 +1830,8 @@ err_clear_config:
nfp_net_clear_config_and_disable(nn);
 err_free_rings:
nfp_net_free_rings(nn);
+err_free_lsc:
+   nfp_net_aux_irq_free(nn, NFP_NET_CFG_LSC, NFP_NET_IRQ_LSC_IDX);
 err_free_exn:
nfp_net_aux_irq_free(nn, NFP_NET_CFG_EXN, NFP_NET_IRQ_EXN_IDX);
return err;
@@ -1853,7 +1853,7 @@ static int nfp_net_netdev_close(struct net_device *netdev)
 
/* Step 1: Disable RX and TX rings from the Linux kernel perspective
 */
-   nfp_net_aux_irq_free(nn, NFP_NET_CFG_LSC, NFP_NET_IRQ_LSC_IDX);
+   disable_irq(nn->irq_entries[NFP_NET_CFG_LSC].vector);
netif_carrier_off(netdev);
nn->link_up = false;
 
@@ -1874,6 +1874,7 @@ static int nfp_net_netdev_close(struct net_device *netdev)
}
 
nfp_net_free_rings(nn);
+   nfp_net_aux_irq_free(nn, NFP_NET_CFG_LSC, NFP_NET_IRQ_LSC_IDX);
nfp_net_aux_irq_free(nn, NFP_NET_CFG_EXN, NFP_NET_IRQ_EXN_IDX);
 
nn_dbg(nn, "%s down", netdev->name);
-- 
1.9.1



[PATCHv3 net 1/5] nfp: return error if MTU change fails

2016-02-18 Thread Jakub Kicinski
When reopening the device fails after an MTU change, let userspace
know.  The MTU remains changed even though an error is returned;
this is what all Ethernet devices do.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Rolf Neugebauer 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 43c618bafdb6..006d9600240f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1911,6 +1911,7 @@ static void nfp_net_set_rx_mode(struct net_device *netdev)
 static int nfp_net_change_mtu(struct net_device *netdev, int new_mtu)
 {
struct nfp_net *nn = netdev_priv(netdev);
+   int ret = 0;
u32 tmp;
 
nn_dbg(nn, "New MTU = %d\n", new_mtu);
@@ -1929,10 +1930,10 @@ static int nfp_net_change_mtu(struct net_device *netdev, int new_mtu)
/* restart if running */
if (netif_running(netdev)) {
nfp_net_netdev_close(netdev);
-   nfp_net_netdev_open(netdev);
+   ret = nfp_net_netdev_open(netdev);
}
 
-   return 0;
+   return ret;
 }
 
 static struct rtnl_link_stats64 *nfp_net_stat64(struct net_device *netdev,
-- 
1.9.1



[PATCHv3 net 2/5] nfp: free buffers before changing MTU

2016-02-18 Thread Jakub Kicinski
For freeing DMA buffers we depend on nfp_net.fl_bufsz having the same
value as during allocation; therefore, in .ndo_change_mtu() we must
first free the buffers and then change the setting.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Rolf Neugebauer 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 006d9600240f..b381682de3d6 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1921,17 +1921,17 @@ static int nfp_net_change_mtu(struct net_device *netdev, int new_mtu)
return -EINVAL;
}
 
+   if (netif_running(netdev))
+   nfp_net_netdev_close(netdev);
+
netdev->mtu = new_mtu;
 
/* Freelist buffer size rounded up to the nearest 1K */
tmp = new_mtu + ETH_HLEN + VLAN_HLEN + NFP_NET_MAX_PREPEND;
nn->fl_bufsz = roundup(tmp, 1024);
 
-   /* restart if running */
-   if (netif_running(netdev)) {
-   nfp_net_netdev_close(netdev);
+   if (netif_running(netdev))
ret = nfp_net_netdev_open(netdev);
-   }
 
return ret;
 }
-- 
1.9.1



[PATCHv3 net 4/5] nfp: fix RX buffer length validation

2016-02-18 Thread Jakub Kicinski
The meaning of the data_len and meta_len RX WB descriptor fields
depends slightly on whether rx_offset is dynamic or not.  For dynamic
offsets data_len includes meta_len.  This makes the code harder
to follow; in fact, our RX buffer length check is incorrect:
we are comparing the allocation length to data_len while we should
also account for meta_len.

Let's adjust the values of data_len and meta_len to their natural
meaning and simplify the logic.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Rolf Neugebauer 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 19 ---
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 553ae64e2f7f..070645f9bc21 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1259,22 +1259,19 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 
meta_len = rxd->rxd.meta_len_dd & PCIE_DESC_RX_META_LEN_MASK;
data_len = le16_to_cpu(rxd->rxd.data_len);
+   /* For dynamic offset data_len includes meta_len, adjust */
+   if (nn->rx_offset == NFP_NET_CFG_RX_OFFSET_DYNAMIC)
+   data_len -= meta_len;
+   else
+   meta_len = nn->rx_offset;
 
-   if (WARN_ON_ONCE(data_len > nn->fl_bufsz)) {
+   if (WARN_ON_ONCE(meta_len + data_len > nn->fl_bufsz)) {
dev_kfree_skb_any(skb);
continue;
}
 
-   if (nn->rx_offset == NFP_NET_CFG_RX_OFFSET_DYNAMIC) {
-   /* The packet data starts after the metadata */
-   skb_reserve(skb, meta_len);
-   } else {
-   /* The packet data starts at a fixed offset */
-   skb_reserve(skb, nn->rx_offset);
-   }
-
-   /* Adjust the SKB for the dynamic meta data pre-pended */
-   skb_put(skb, data_len - meta_len);
+   skb_reserve(skb, meta_len);
+   skb_put(skb, data_len);
 
nfp_net_set_hash(nn->netdev, skb, rxd);
 
-- 
1.9.1



[PATCHv3 net 5/5] nfp: don't trust netif_running() in debug code

2016-02-18 Thread Jakub Kicinski
Since change_mtu() can fail and leave us with netif_running()
returning true even though all rings were freed, we should
look at the NFP_NET_CFG_CTRL_ENABLE flag to determine whether
the device is really open.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_debugfs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_debugfs.c b/drivers/net/ethernet/netronome/nfp/nfp_net_debugfs.c
index 4c97c713121c..7af404d492cc 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_debugfs.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_debugfs.c
@@ -52,7 +52,7 @@ static int nfp_net_debugfs_rx_q_read(struct seq_file *file, void *data)
if (!rx_ring->r_vec || !rx_ring->r_vec->nfp_net)
goto out;
nn = rx_ring->r_vec->nfp_net;
-   if (!netif_running(nn->netdev))
+   if (!(nn->ctrl & NFP_NET_CFG_CTRL_ENABLE))
goto out;
 
rxd_cnt = rx_ring->cnt;
@@ -127,7 +127,7 @@ static int nfp_net_debugfs_tx_q_read(struct seq_file *file, void *data)
if (!tx_ring->r_vec || !tx_ring->r_vec->nfp_net)
goto out;
nn = tx_ring->r_vec->nfp_net;
-   if (!netif_running(nn->netdev))
+   if (!(nn->ctrl & NFP_NET_CFG_CTRL_ENABLE))
goto out;
 
txd_cnt = tx_ring->cnt;
-- 
1.9.1



[PATCHv3 net 0/5] nfp: MTU fixes for net

2016-02-18 Thread Jakub Kicinski
Hi Dave!

This is the first part of MTU reconfiguration fixes.  These
are the patches which I would like to get into -net.  The
requested overhaul of the way MTU configuration is done is
posted as a separate series targeted at net-next.

Thanks!


Jakub Kicinski (5):
  nfp: return error if MTU change fails
  nfp: free buffers before changing MTU
  nfp: correct RX buffer length calculation
  nfp: fix RX buffer length validation
  nfp: don't trust netif_running() in debug code

 .../net/ethernet/netronome/nfp/nfp_net_common.c| 42 ++
 .../net/ethernet/netronome/nfp/nfp_net_debugfs.c   |  4 +--
 2 files changed, 20 insertions(+), 26 deletions(-)

-- 
1.9.1



[PATCHv3 net 3/5] nfp: correct RX buffer length calculation

2016-02-18 Thread Jakub Kicinski
When calculating the RX buffer length we need to account for
up to 2 VLAN tags and up to 8 MPLS labels.  Rounding up to 1k
is a relic of the distant past and can be removed.  While at
it, also remove a trivial print statement.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index b381682de3d6..553ae64e2f7f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -61,6 +61,7 @@
 
 #include 
 
+#include 
 #include 
 
 #include "nfp_net_ctrl.h"
@@ -1912,9 +1913,6 @@ static int nfp_net_change_mtu(struct net_device *netdev, int new_mtu)
 {
struct nfp_net *nn = netdev_priv(netdev);
int ret = 0;
-   u32 tmp;
-
-   nn_dbg(nn, "New MTU = %d\n", new_mtu);
 
if (new_mtu < 68 || new_mtu > nn->max_mtu) {
nn_err(nn, "New MTU (%d) is not valid\n", new_mtu);
@@ -1925,10 +1923,8 @@ static int nfp_net_change_mtu(struct net_device *netdev, int new_mtu)
nfp_net_netdev_close(netdev);
 
netdev->mtu = new_mtu;
-
-   /* Freelist buffer size rounded up to the nearest 1K */
-   tmp = new_mtu + ETH_HLEN + VLAN_HLEN + NFP_NET_MAX_PREPEND;
-   nn->fl_bufsz = roundup(tmp, 1024);
+   nn->fl_bufsz = NFP_NET_MAX_PREPEND + ETH_HLEN + VLAN_HLEN * 2 +
+   MPLS_HLEN * 8 + new_mtu;
 
if (netif_running(netdev))
ret = nfp_net_netdev_open(netdev);
-- 
1.9.1
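The worst-case framing math from the patch above can be written out in plain C. The header sizes below match the usual kernel values, but treat them (NFP_NET_MAX_PREPEND in particular) as illustrative assumptions rather than authoritative constants:

```c
#include <assert.h>

/* Illustrative constants; values assumed for the sketch. */
#define NFP_NET_MAX_PREPEND 64
#define ETH_HLEN            14
#define VLAN_HLEN            4
#define MPLS_HLEN            4

static int fl_bufsz(int mtu)
{
	/* account for up to 2 VLAN tags and up to 8 MPLS labels,
	 * with no 1K round-up
	 */
	return NFP_NET_MAX_PREPEND + ETH_HLEN + VLAN_HLEN * 2 +
	       MPLS_HLEN * 8 + mtu;
}
```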



Re: [PATCH net-next 0/2] bridge: mdb: add support for extended attributes

2016-02-18 Thread David Miller
From: Nikolay Aleksandrov 
Date: Tue, 16 Feb 2016 12:46:52 +0100

> Note that the reason we can't simply add an attribute after
> MDBA_MDB_ENTRY_INFO is that current users (e.g. iproute2) walk over
> the attribute list directly without checking for the attribute type.

Honestly that sounds like a bug in iproute2 to me...


Re: [PATCH] net: fix bridge multicast packet checksum validation

2016-02-18 Thread David Miller
From: Linus Lüssing 
Date: Mon, 15 Feb 2016 03:07:06 +0100

> @@ -4084,10 +4089,22 @@ struct sk_buff *skb_checksum_trimmed(struct sk_buff *skb,
>   if (!pskb_may_pull(skb_chk, offset))
>   goto err;
>  
> - __skb_pull(skb_chk, offset);
> + ip_summed = skb->ip_summed;
> + csum_valid = skb->csum_valid;
> + csum_level = skb->csum_level;
> + csum_bad = skb->csum_bad;
> + csum = skb->csum;
> +
> + skb_pull_rcsum(skb_chk, offset);
>   ret = skb_chkf(skb_chk);
>   __skb_push(skb_chk, offset);
>  
> + skb->ip_summed = ip_summed;
> + skb->csum_valid = csum_valid;
> + skb->csum_level = csum_level;
> + skb->csum_bad = csum_bad;
> + skb->csum = csum;
> +

There really has to be a better way to fix this :-/


Re: [PATCH net] r8169:fix "rtl_counters_cond == 1 (loop: 1000, delay: 10)" log spam.

2016-02-18 Thread Francois Romieu
Chunhao Lin  :
[...]
> I add checking driver's pm runtime status in rtl8169_get_stats64() to fix
> this issue.

Would you consider taking the device out of suspended mode during
rtl8169_get_stats64 to prevent outdated stats ?

-- 
Ueimor


Re: [PATCH] rose: correct integer overflow check

2016-02-18 Thread David Miller
From: Insu Yun 
Date: Wed, 17 Feb 2016 15:25:13 -0500

> Since rose_ndevs is signed integer type,
> it can be overflowed when it is negative.
> 
> Signed-off-by: Insu Yun 

That's not how the expression is evaluated.

Because of the types on the right hand side of the comparison
the expressions are all promoted to unsigned.

Did you look at the compiler's assembler output?  I did when
reviewing your patch.


[PATCH 1/1] ser_gigaset: use container_of() instead of detour

2016-02-18 Thread Paul Bolle
The purpose of gigaset_device_release() is to kfree() the struct
ser_cardstate that contains our struct device. This is done via a bit of
a detour. First we make our struct device's driver_data point to the
container of our struct ser_cardstate (which is a struct cardstate). In
gigaset_device_release() we then retrieve that driver_data again. And
after that we finally kfree() the struct ser_cardstate that was saved in
the struct cardstate.

All of this can be achieved much easier by using container_of() to get
from our struct device to its container, struct ser_cardstate. Do so.

Note that at the time the detour was implemented commit b8b2c7d845d5
("base/platform: assert that dev_pm_domain callbacks are called
unconditionally") had just entered the tree. That commit disconnected
our platform_device and our platform_driver. These were reconnected
again in v4.5-rc2 through commit 25cad69f21f5 ("base/platform: Fix
platform drivers with no probe callback"). And one of the consequences
of that fix was that it broke the detour via driver_data. That's because
it made __device_release_driver() stop being a NOP for our struct device
and actually do stuff again. One of the things it now does, is setting
our driver_data to NULL. That, in turn, makes it impossible for
gigaset_device_release() to get to our struct cardstate. Which has the
net effect of leaking a struct ser_cardstate at every call of this
driver's tty close() operation. So using container_of() has the
additional benefit of actually working.

Reported-by: Dmitry Vyukov 
Tested-by: Dmitry Vyukov 
Signed-off-by: Paul Bolle 
---
 drivers/isdn/gigaset/ser-gigaset.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/isdn/gigaset/ser-gigaset.c b/drivers/isdn/gigaset/ser-gigaset.c
index 2a506fe0c8a4..d1f8ab915b15 100644
--- a/drivers/isdn/gigaset/ser-gigaset.c
+++ b/drivers/isdn/gigaset/ser-gigaset.c
@@ -373,13 +373,7 @@ static void gigaset_freecshw(struct cardstate *cs)
 
 static void gigaset_device_release(struct device *dev)
 {
-   struct cardstate *cs = dev_get_drvdata(dev);
-
-   if (!cs)
-   return;
-   dev_set_drvdata(dev, NULL);
-   kfree(cs->hw.ser);
-   cs->hw.ser = NULL;
+   kfree(container_of(dev, struct ser_cardstate, dev.dev));
 }
 
 /*
@@ -408,7 +402,6 @@ static int gigaset_initcshw(struct cardstate *cs)
cs->hw.ser = NULL;
return rc;
}
-   dev_set_drvdata(&cs->hw.ser->dev.dev, cs);
 
tasklet_init(&cs->write_tasklet,
 gigaset_modem_fill, (unsigned long) cs);
-- 
2.4.3



[PATCH 0/1] ser_gigaset: use container_of() instead of detour

2016-02-18 Thread Paul Bolle
Dmitry Vyukov reported that the syzkaller fuzzer uncovered a leak in
ser_gigaset (see
https://lkml.kernel.org/g/cact4y+y+p7-pm0d4htz4zf6i+rya22eokpdnrv_omdcmb7e...@mail.gmail.com).
This small patch fixes that. The commit explanation contains all the details to
understand how this leak made its way into the code.

This should eventually land in stable. No fixes tag because backporting is
probably not so straightforward for v4.3 and earlier. So I suggest I'll send
proper backports in about two weeks.

Paul Bolle (1):
  ser_gigaset: use container_of() instead of detour

 drivers/isdn/gigaset/ser-gigaset.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

-- 
2.4.3



Re: [net-next PATCH] net: Optimize local checksum offload

2016-02-18 Thread David Miller
From: Alexander Duyck 
Date: Wed, 17 Feb 2016 11:23:55 -0800

> This patch takes advantage of several assumptions we can make about the
> headers of the frame in order to reduce overall processing overhead for
> computing the outer header checksum.
> 
> First we can assume the entire header is in the region pointed to by
> skb->head as this is what csum_start is based on.
> 
> Second, as a result of our first assumption, we can just call csum_partial
> instead of making a call to skb_checksum which would end up having to
> configure things so that we could walk through the frags list.
> 
> Signed-off-by: Alexander Duyck 

Applied, thanks Alex.


Re: linux-next: build failure after merge of the net-next tree

2016-02-18 Thread David Miller
From: Stephen Rothwell 
Date: Thu, 18 Feb 2016 12:28:55 +1100

> After merging the net-next tree, today's linux-next build (powerpc
> ppc64_defconfig) failed like this:
> 
> In file included from drivers/net/ethernet/broadcom/bnx2x/bnx2x.h:56:0,
>  from drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c:30:
> drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c: In function 'bnx2x_dcbx_get_ap_feature':
> drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c:224:11: error: 'DCBX_APP_SF_DEFAULT' undeclared (first use in this function)
>DCBX_APP_SF_DEFAULT) &&
>^
> drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.h:120:45: note: in definition of macro 'GET_FLAGS'
>  #define GET_FLAGS(flags, bits)  ((flags) & (bits))
>  ^
> drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c:224:11: note: each undeclared identifier is reported only once for each function it appears in
>DCBX_APP_SF_DEFAULT) &&
>^
> drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.h:120:45: note: in definition of macro 'GET_FLAGS'
>  #define GET_FLAGS(flags, bits)  ((flags) & (bits))
>  ^
> 
> Caused by commit
> 
>   e5d3a51cefbb ("This adds support for default application priority.")
> 
> This build is big endian.

Yuval and Ariel, you _MUST_ fix this.

This build failure has been in the tree for 24 hours and I haven't heard
anything from you two yet.

If this persists for another day I'm reverting all of your changes.


Re: [PATCH] ipv6: Annotate change of locking mechanism for np->opt

2016-02-18 Thread David Miller
From: Benjamin Poirier 
Date: Wed, 17 Feb 2016 16:20:33 -0800

> follows up commit 45f6fad84cc3 ("ipv6: add complete rcu protection around
> np->opt") which added mixed rcu/refcount protection to np->opt.
> 
> Given the current implementation of rcu_pointer_handoff(), this has no
> effect at runtime.
> 
> Signed-off-by: Benjamin Poirier 

Since it has no effect on run-time, applied to net-next, thanks.


Re: [PATCH v2 net] bonding: don't use stale speed and duplex information

2016-02-18 Thread David Miller
From: Jay Vosburgh 
Date: Thu, 18 Feb 2016 12:25:52 -0800

> David Miller  wrote:
> [...]
>>> This was done historically in bonding, but the call to
>>> bond_update_speed_duplex was removed in commit 876254ae2758 ("bonding:
>>> don't call update_speed_duplex() under spinlocks"), as it might sleep
>>> under lock.  Later, the locking was changed to only hold RTNL, and so
>>> after commit 876254ae2758 ("bonding: don't call update_speed_duplex()
>>> under spinlocks") this call is again safe.
>>> 
>>> Tested-by: "Tantilov, Emil S" 
>>> Cc: Veaceslav Falico 
>>> Cc: dingtianhong 
>>> Fixes: 876254ae2758 ("bonding: don't call update_speed_duplex() under spinlocks")
>>> Signed-off-by: Jay Vosburgh 
>>
>>Applied, thanks Jay.
> 
>   Rereading the above, I just noticed that I put the wrong commit
> into the fixes tag (and the "Later, the locking was changed" text); the
> correct fixes tag should be:
> 
> Fixes: 4cb4f97b7e36 ("bonding: rebuild the lock use for bond_mii_monitor()")
> 
>   Kernels between 876254ae2758 and 4cb4f97b7e36 should not have
> this patch applied, as it might sleep under lock.
> 
>   Sorry for the error,

Ok, thanks for the info.


Re: [PATCH v2 net] bonding: don't use stale speed and duplex information

2016-02-18 Thread Jay Vosburgh
David Miller  wrote:
[...]
>>  This was done historically in bonding, but the call to
>> bond_update_speed_duplex was removed in commit 876254ae2758 ("bonding:
>> don't call update_speed_duplex() under spinlocks"), as it might sleep
>> under lock.  Later, the locking was changed to only hold RTNL, and so
>> after commit 876254ae2758 ("bonding: don't call update_speed_duplex()
>> under spinlocks") this call is again safe.
>> 
>> Tested-by: "Tantilov, Emil S" 
>> Cc: Veaceslav Falico 
>> Cc: dingtianhong 
>> Fixes: 876254ae2758 ("bonding: don't call update_speed_duplex() under spinlocks")
>> Signed-off-by: Jay Vosburgh 
>
>Applied, thanks Jay.

Rereading the above, I just noticed that I put the wrong commit
into the fixes tag (and the "Later, the locking was changed" text); the
correct fixes tag should be:

Fixes: 4cb4f97b7e36 ("bonding: rebuild the lock use for bond_mii_monitor()")

Kernels between 876254ae2758 and 4cb4f97b7e36 should not have
this patch applied, as it might sleep under lock.

Sorry for the error,

-J

---
-Jay Vosburgh, jay.vosbu...@canonical.com


Re: [PATCH 8/9] rfkill: Userspace control for airplane mode

2016-02-18 Thread Johannes Berg
Hi,

Sorry for the delay reviewing this.



On Mon, 2016-02-08 at 10:41 -0500, João Paulo Rechi Vita wrote:
> Provide an interface for the airplane-mode indicator to be controlled
> from userspace. The user has to first acquire control through
> RFKILL_OP_AIRPLANE_MODE_ACQUIRE and keep the fd open for the whole
> time it wants to be in control of the indicator. Closing the fd or
> using RFKILL_OP_AIRPLANE_MODE_RELEASE restores the default policy.

I've come to the conclusion that the new ops are probably the best
thing to do here.

> +Userspace can also override the default airplane-mode indicator policy
> +through /dev/rfkill. Control of the airplane-mode indicator has to be
> +acquired first, using RFKILL_OP_AIRPLANE_MODE_ACQUIRE, and is only
> +available to one userspace application at a time. Closing the fd or
> +using RFKILL_OP_AIRPLANE_MODE_RELEASE reverts the airplane-mode
> +indicator back to the default kernel policy and makes it available
> +for other applications to take control. Changes to the airplane-mode
> +indicator state can be made using RFKILL_OP_AIRPLANE_MODE_CHANGE,
> +passing the new value in the 'soft' field of 'struct rfkill_event'.

I don't really see any value in _RELEASE, since an application can just
close the fd? I'd prefer not having the duplicate functionality
and force us to exercise the single code path every time.

>  For further details consult Documentation/ABI/stable/sysfs-class-
> rfkill.
> diff --git a/include/uapi/linux/rfkill.h
> b/include/uapi/linux/rfkill.h
> index 2e00dce..9cb999b 100644
> --- a/include/uapi/linux/rfkill.h
> +++ b/include/uapi/linux/rfkill.h
> @@ -67,6 +67,9 @@ enum rfkill_operation {
>   RFKILL_OP_DEL,
>   RFKILL_OP_CHANGE,
>   RFKILL_OP_CHANGE_ALL,
> + RFKILL_OP_AIRPLANE_MODE_ACQUIRE,
> + RFKILL_OP_AIRPLANE_MODE_RELEASE,
> + RFKILL_OP_AIRPLANE_MODE_CHANGE,
>  };
 
> @@ -1199,7 +1202,7 @@ static ssize_t rfkill_fop_write(struct file *file, const char __user *buf,
>   if (copy_from_user(&ev, buf, count))
>   return -EFAULT;
>  
> - if (ev.op != RFKILL_OP_CHANGE && ev.op != RFKILL_OP_CHANGE_ALL)
> + if (ev.op < RFKILL_OP_CHANGE)
>   return -EINVAL;

You need to also reject invalid high values, like 27.

>   mutex_lock(&rfkill_global_mutex);
>  
> + if (ev.op == RFKILL_OP_AIRPLANE_MODE_ACQUIRE) {
> + if (rfkill_apm_owned && !data->is_apm_owner) {
> + count = -EACCES;
> + } else {
> + rfkill_apm_owned = true;
> + data->is_apm_owner = true;
> + }
> + }
> +
> + if (ev.op == RFKILL_OP_AIRPLANE_MODE_RELEASE) {

It would probably be better to simply use "switch (ev.op)" and make the
default case do a reject.

>   if (ev.op == RFKILL_OP_CHANGE_ALL)
>   rfkill_update_global_state(ev.type, ev.soft);

Also moving the existing code inside the switch, of course.

johannes


Re: [PATCH 7/9] rfkill: Create "rfkill-airplane_mode" LED trigger

2016-02-18 Thread Johannes Berg
On Mon, 2016-02-08 at 10:41 -0500, João Paulo Rechi Vita wrote:
> This creates a new LED trigger to be used by platform drivers as a
> default trigger for airplane-mode indicator LEDs.
> 
> By default this trigger will fire when RFKILL_OP_CHANGE_ALL is called
> for all types (RFKILL_TYPE_ALL), setting the LED brightness to
> LED_FULL
> when the changing the state to blocked, and to LED_OFF when the
> changing
> the state to unblocked. In the future there will be a mechanism for
> userspace to override the default policy, so it can implement its
> own.
> 
> This trigger will be used by the asus-wireless x86 platform driver.

Just one comment - I think you should be consistent with the _ vs - and
just use "rfkill-airplane-mode" as the name.

johannes


Re: [PATCH] net: ti: netcp: restore get/set_pad_info() functionality

2016-02-18 Thread David Miller
From: Murali Karicheri 
Date: Thu, 18 Feb 2016 12:13:10 -0500

> On 02/16/2016 03:24 PM, David Miller wrote:
>> 
>> I would like some of the feedback to be taken into consideration and
>> integrated into this patch.
>> 
>> Part of the reason this regression was introduced was probably because
>> the purpose of some fields or descriptor semantics was not defined
>> properly.
>> 
>> Therefore it is absolutely appropriate to properly name and document
>> these fields as part of the bug fix.
>> 
>> Thank you.
>> 
> David,
> 
> I will take over this from Grygorii as he is out of office. 
> 
> I propose to keep this patch as is and add additional patch to address
> the feedback in the same series (v1). Is that fine with you?

No, I want the code to be clarified along with the bug fix so that
this bug is unlikely to resurface.


Re: [PATCH net v2 2/3] geneve: Relax MTU constraints

2016-02-18 Thread David Miller
From: David Wragg 
Date: Thu, 18 Feb 2016 16:54:14 +

> Tom Herbert  writes:
>> Please implement like in ip_tunnel_change_mtu (or better yet call it),
>> that is the precedent for tunnels.
> 
> I've made geneve_change_mtu follow ip_tunnel_change_mtu in v2.
> 
> If it were to call it instead, are you suggesting just passing in
> t_hlen?  Or restructuring geneve.c to re-use the whole ip_tunnel
> infrastructure?
> 
> Also, I'm not sure where the 0xFFF8 comes from in
> __ip_tunnel_change_mtu.  Any ideas why 0xFFF8 rather than 0xFFFF?  It
> goes all the way back to the initial import of the kernel into git.

Some 8 byte multiple requirement, perhaps to do with fragmentation.

