date:20160215

Re: [PATCH v7 4/8] time: Add driver cross timestamp interface for higher precision time synchronization

2016-02-15 Thread Richard Cochran

On Fri, Feb 12, 2016 at 12:25:25PM -0800, Christopher S. Hall wrote:
>  /**
> + * get_device_system_crosststamp - Synchronously capture system/device timestamp
> + * @sync_devicetime: Callback to get simultaneous device time and
> + *   system counter from the device driver
> + * @xtstamp: Receives simultaneously captured system and device time
  @ctx: Private context passed to the 'get_time_fn' callback.

> + *
> + * Reads a timestamp from a device and correlates it to system time
> + */
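Richard's point is that @ctx needs its own kernel-doc line. Below is a minimal user-space sketch of the callback-plus-context pattern the comment describes. All names (sketch_get_crosststamp, sketch_get_time_fn, struct sketch_counterval, struct sketch_xtstamp) are hypothetical stand-ins, not the signatures in the patch. The step that converts the counter value into system time is omitted. The core code never looks inside ctx; it only hands ctx back to the driver's callback, which is why the parameter deserves documentation.

```c
#include <stdint.h>

/* Hypothetical stand-in for the system counter value a device can latch. */
struct sketch_counterval {
	uint64_t cycles;
};

/* Result: device time and system counter captured at the same instant. */
struct sketch_xtstamp {
	int64_t device_ns;
	uint64_t sys_cycles;
};

/*
 * Driver callback: fill in both values from one hardware capture.
 * ctx is opaque to the core code and is passed back unchanged.
 */
typedef int (*sketch_get_time_fn)(int64_t *device_ns,
				  struct sketch_counterval *sys, void *ctx);

static int sketch_get_crosststamp(sketch_get_time_fn get_time_fn, void *ctx,
				  struct sketch_xtstamp *xtstamp)
{
	int64_t device_ns;
	struct sketch_counterval sys;
	int ret;

	ret = get_time_fn(&device_ns, &sys, ctx);
	if (ret)
		return ret;	/* propagate the driver's error unchanged */

	/* A real implementation would also convert sys.cycles to system time. */
	xtstamp->device_ns = device_ns;
	xtstamp->sys_cycles = sys.cycles;
	return 0;
}
```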

Thanks,
Richard

Re: Same data to several sockets with just one syscall ?

2016-02-15 Thread Claudio Scordino

Hi Eric,

2016-02-15 19:16 GMT+01:00 Eric Dumazet :
> On Mon, 2016-02-15 at 11:03 +0100, Claudio Scordino wrote:
>> Hi Eric,
>>
>> 2016-02-12 11:35 GMT+01:00 Eric Dumazet :
>> > On Fri, 2016-02-12 at 09:53 +0100, Claudio Scordino wrote:
>> >
>> >> This makes the application waste time in entering/exiting the kernel
>> >> level several times.
>> >
>> > Syscall overhead is usually small. The real cost is actually getting to
>> > the socket objects (fd manipulation), which you won't avoid with a
>> > super-syscall anyway.
>>
>> Thank you for answering. I see your point.
>>
>> However, assuming that a switch from user-space to kernel-space (and
>> back) needs about 200nsec of computation (which I guess is a
>> reasonable value for a 3GHz x86 architecture), the 50th receiver
>> experiences a latency of about 10 usec. In some domains (e.g.,
>> finance) this delay is not negligible.
>
> I thought these domains were using multicast.

They don't :)

There are a couple of reasons behind their choice:

- Multicast works only in SOCK_DGRAM (i.e. unreliable)

- For a limited number of receivers (e.g. 50) and depending on the
data size, the latency of multicast is almost equal to that of TCP

>
>>
>> Moving the "fan-out" code into kernel space would remove this waste of
>> time. IMHO, the latency reduction would pay back the 100 lines of code
>> for adding a new syscall.
>
> It won't reduce the latency at all, and it will add a lot of maintenance hassle.
>
> syscall overhead is about 40 ns.

I thought it was slightly higher. Does this time also include the
interrupt return to go back to user space?


> This is the time taken to transmit ~50 bytes on a 10 Gbit link.
>
> 40ns * 50 = 2 usec only.
>
> Feel free to implement your idea and test it, you'll discover the added
> complexity is not worth it.
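Both estimates in this exchange follow from a simple serial-cost model: with back-to-back calls, the n-th receiver waits roughly n times the per-call cost. The sketch below reproduces that arithmetic. The 200 ns and 40 ns inputs are the figures quoted in the thread, not measurements, and the model ignores everything except per-call overhead. The wire-time helper relies on 1 Gbit/s moving exactly one bit per nanosecond; both function names are hypothetical.

```c
#include <stdint.h>

/*
 * Serial fan-out cost model: if every send costs per_call_ns and the
 * calls are issued back to back, the n-th receiver waits about n times that.
 */
static uint64_t nth_receiver_delay_ns(uint64_t per_call_ns, uint64_t n)
{
	return per_call_ns * n;
}

/* Serialization time on the wire: 1 Gbit/s is exactly 1 bit per ns. */
static uint64_t wire_time_ns(uint64_t bytes, uint64_t link_gbps)
{
	return bytes * 8 / link_gbps;
}
```

With 200 ns per call the 50th receiver waits 10 usec; with 40 ns it waits 2 usec, and 50 bytes take 40 ns to serialize on a 10 Gbit/s link.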

Honestly, I can't see how it could be that difficult: the kernel-side
code could just iterate over the existing syscall...

Can you please elaborate a bit further to help me understand why it
would be that complex?
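For reference, the user-space fan-out under discussion is a loop with one send() per receiving socket. The sketch below assumes connected stream sockets and, for brevity, treats short writes as failures. fan_out() is a hypothetical helper. Each iteration pays one kernel entry/exit plus the fd-to-socket lookup. Eric's point is that a kernel-side loop would still pay that lookup for every socket. A real program would also need to handle SIGPIPE (for example with MSG_NOSIGNAL), which the sketch leaves out.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Send the same buffer to every socket in fds[], one send() per socket.
 * Returns the number of sockets that accepted the whole buffer.
 */
static size_t fan_out(const int *fds, size_t nfds, const void *buf, size_t len)
{
	size_t ok = 0;

	for (size_t i = 0; i < nfds; i++) {
		ssize_t n;

		/* Retry if a signal interrupted the call before any data went out. */
		do {
			n = send(fds[i], buf, len, 0);
		} while (n < 0 && errno == EINTR);

		if (n == (ssize_t)len)
			ok++;
	}
	return ok;
}
```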

Many thanks and best regards,

 Claudio

Re: [PATCH v7 3/8] time: Remove duplicated code in ktime_get_raw_and_real()

2016-02-15 Thread Richard Cochran

On Fri, Feb 12, 2016 at 12:25:24PM -0800, Christopher S. Hall wrote:
> The code in ktime_get_snapshot() is a superset of the code in
> ktime_get_raw_and_real() code. Further, ktime_get_raw_and_real() is
> called only by the PPS code, pps_get_ts(). Consolidate the
> pps_get_ts() code into a single function calling ktime_get_snapshot()
> and eliminate ktime_get_raw_and_real(). A side effect of this is that
> the raw and real results of pps_get_ts() correspond to exactly the
> same clock cycle. Previously these values represented separate reads
> of the system clock.

Nice improvement.
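The gain described in the commit message is that raw and real time now come from one counter read instead of two. The user-space sketch below shows that idea. The counter, the clock structure, and every name in it (read_counter, fake_counter, struct sketch_clock, sketch_get_snapshot) are hypothetical. It leaves out the seqlock and the rest of the real timekeeper. In this model, two separate reads would give raw and real values that were 7 cycles apart.

```c
#include <stdint.h>

/* Hypothetical free-running counter; every read moves it forward. */
static uint64_t fake_counter = 1000;
static unsigned int counter_reads;

static uint64_t read_counter(void)
{
	counter_reads++;
	fake_counter += 7;	/* time passes between reads */
	return fake_counter;
}

/* One clock: value at the last update plus a cycles-to-ns scale. */
struct sketch_clock {
	uint64_t cycle_last;	/* counter value at the last update */
	int64_t base_ns;	/* clock value at cycle_last */
	uint32_t mult;		/* ns = (delta * mult) >> shift */
	uint32_t shift;
};

static int64_t sketch_cycles_to_ns(const struct sketch_clock *c, uint64_t now)
{
	uint64_t delta = now - c->cycle_last;

	return c->base_ns + (int64_t)((delta * c->mult) >> c->shift);
}

struct sketch_snapshot {
	uint64_t cycles;
	int64_t raw_ns;
	int64_t real_ns;
};

/* Both clocks are derived from the same counter read. */
static void sketch_get_snapshot(const struct sketch_clock *raw,
				const struct sketch_clock *real,
				struct sketch_snapshot *snap)
{
	uint64_t now = read_counter();

	snap->cycles = now;
	snap->raw_ns = sketch_cycles_to_ns(raw, now);
	snap->real_ns = sketch_cycles_to_ns(real, now);
}
```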

> @@ -888,6 +888,8 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
>   s64 nsec_real;
>   cycle_t now;
>  
> + WARN_ON(timekeeping_suspended);
...
> - WARN_ON_ONCE(timekeeping_suspended);

Is this change intentional?
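The practical difference behind the question is that WARN_ON() reports every time its condition is true, while WARN_ON_ONCE() reports only the first time per call site. A snapshot taken repeatedly while timekeeping is suspended would therefore flood the log with the former. The user-space sketch below models both macros. The SKETCH_ names are hypothetical. Unlike the kernel macros, these do not evaluate to the condition, so they cannot be used inside an if ().

```c
#include <stdbool.h>
#include <stdio.h>

static unsigned int warnings_emitted;

static void sketch_report(const char *file, int line)
{
	warnings_emitted++;
	fprintf(stderr, "WARNING: at %s:%d\n", file, line);
}

/* Reports every time the condition is true. */
#define SKETCH_WARN_ON(cond)					\
	do {							\
		if (cond)					\
			sketch_report(__FILE__, __LINE__);	\
	} while (0)

/* Reports only the first time; each expansion site has its own flag. */
#define SKETCH_WARN_ON_ONCE(cond)				\
	do {							\
		static bool warned_;				\
		if ((cond) && !warned_) {			\
			warned_ = true;				\
			sketch_report(__FILE__, __LINE__);	\
		}						\
	} while (0)
```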

Thanks,
Richard

Re: ravb: Possible Regression In "net: phy: Avoid polling PHY with PHY_IGNORE_INTERRUPTS"

2016-02-15 Thread Geert Uytterhoeven

Hi Simon,

On Tue, Feb 16, 2016 at 4:26 AM, Simon Horman  wrote:
> I have observed what appears to be a regression in the ravb ethernet driver
> caused by d5c3d84657db ("net: phy: Avoid polling PHY with
> PHY_IGNORE_INTERRUPTS").
>
> When booting net-next configured with the ARM64 defconfig on the Renesas
> r8a7795/salvator-x I see the following and the ravb is unable to access the
> network. With the above mentioned patch reverted I am able to boot to
> user-space using nfsroot.

The ravb interrupt is connected to a GPIO controller, which is
runtime-suspended and thus not serving the interrupt.

Cfr. "[PATCH/RFC] gpio: rcar: Add Runtime PM handling for interrupts"
(http://www.spinics.net/lists/linux-renesas-soc/msg00532.html).

I assume it worked before as the PHY driver polled the PHY instead of relying
solely on the interrupt.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

[net-next PATCH v2 8/8] net: ixgbe: abort with cls u32 divisor groups greater than 1

2016-02-15 Thread John Fastabend

This patch ensures ixgbe will not try to offload hash tables from the
u32 module. The device class does not currently support them, so until
support is added, just abort on these tables.

Interestingly, the more flexible your hardware is, the less code you
need to implement to guard against these cases.

Signed-off-by: John Fastabend 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h  |1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   31 +
 2 files changed, 32 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index fc877c7..84fa28c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -798,6 +798,7 @@ struct ixgbe_adapter {
 
 #define IXGBE_MAX_LINK_HANDLE 10
struct ixgbe_mat_field *jump_tables[IXGBE_MAX_LINK_HANDLE];
+   unsigned long tables;
 
 /* maximum number of RETA entries among all devices supported by ixgbe
  * driver: currently it's x550 device in non-SRIOV mode
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index adbb7c7..3d154c9 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8217,6 +8217,27 @@ static int ixgbe_delete_clsu32(struct ixgbe_adapter *adapter,
return err;
 }
 
+static int ixgbe_configure_clsu32_add_hnode(struct ixgbe_adapter *adapter,
+   __be16 protocol,
+   struct tc_cls_u32_offload *cls)
+{
+   /* These ixgbe devices do not support hash tables at the moment
+* so abort when given hash tables.
+*/
+   if (cls->hnode.divisor > 0)
+   return -EINVAL;
+
+   set_bit(TC_U32_USERHTID(cls->hnode.handle), &adapter->tables);
+   return 0;
+}
+
+static int ixgbe_configure_clsu32_del_hnode(struct ixgbe_adapter *adapter,
+   struct tc_cls_u32_offload *cls)
+{
+   clear_bit(TC_U32_USERHTID(cls->hnode.handle), &adapter->tables);
+   return 0;
+}
+
 static int ixgbe_configure_clsu32(struct ixgbe_adapter *adapter,
  __be16 protocol,
  struct tc_cls_u32_offload *cls)
@@ -8251,6 +8272,9 @@ static int ixgbe_configure_clsu32(struct ixgbe_adapter *adapter,
struct ixgbe_nexthdr *nexthdr = ixgbe_ipv4_jumps;
u32 uhtid = TC_U32_USERHTID(cls->knode.link_handle);
 
+   if (!test_bit(uhtid, &adapter->tables))
+   return -EINVAL;
+
for (i = 0; nexthdr[i].jump; i++) {
if (nexthdr->o != cls->knode.sel->offoff ||
nexthdr->s != cls->knode.sel->offshift ||
@@ -8386,6 +8410,13 @@ int __ixgbe_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
  proto, tc->cls_u32);
case TC_CLSU32_DELETE_KNODE:
return ixgbe_delete_clsu32(adapter, tc->cls_u32);
+   case TC_CLSU32_NEW_HNODE:
+   case TC_CLSU32_REPLACE_HNODE:
+   return ixgbe_configure_clsu32_add_hnode(adapter, proto,
+   tc->cls_u32);
+   case TC_CLSU32_DELETE_HNODE:
+   return ixgbe_configure_clsu32_del_hnode(adapter,
+   tc->cls_u32);
default:
return -EINVAL;
}
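The patch records each accepted hash table in a bitmap and refuses knodes that link to an unknown table. The user-space sketch below shows that bookkeeping. The SKETCH_ macros are assumed to mirror TC_U32_HTID()/TC_U32_USERHTID() from the uapi header. struct sketch_adapter and the helpers are hypothetical. The sketch assumes, as we understand cls_u32, that the divisor passed down is the user-supplied divisor minus one, so "divisor 1" arrives as 0 and only real hash tables are rejected. The id-range guard is an addition that is not in the patch: it rejects ids that do not fit in one unsigned long, such as 0x800 from the "800:" table.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/* Assumed to mirror TC_U32_HTID() / TC_U32_USERHTID(). */
#define SKETCH_U32_HTID(h)	((h) & 0xFFF00000u)
#define SKETCH_U32_USERHTID(h)	(SKETCH_U32_HTID(h) >> 20)
#define SKETCH_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

struct sketch_adapter {
	unsigned long tables;	/* bit n set: user hash table n was accepted */
};

static int sketch_add_hnode(struct sketch_adapter *a, uint32_t handle,
			    unsigned int divisor)
{
	uint32_t id = SKETCH_U32_USERHTID(handle);

	if (divisor > 0)		/* hash tables cannot be offloaded */
		return -EINVAL;
	if (id >= SKETCH_BITS_PER_LONG)	/* extra guard, not in the patch */
		return -EINVAL;
	a->tables |= 1UL << id;
	return 0;
}

static void sketch_del_hnode(struct sketch_adapter *a, uint32_t handle)
{
	uint32_t id = SKETCH_U32_USERHTID(handle);

	if (id < SKETCH_BITS_PER_LONG)
		a->tables &= ~(1UL << id);
}

/* A knode may only link to a table that was accepted earlier. */
static int sketch_check_link(const struct sketch_adapter *a, uint32_t link_handle)
{
	uint32_t id = SKETCH_U32_USERHTID(link_handle);

	if (id >= SKETCH_BITS_PER_LONG || !(a->tables & (1UL << id)))
		return -EINVAL;
	return 0;
}
```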

[net-next PATCH v2 7/8] net: ixgbe: add support for tc_u32 offload

2016-02-15 Thread John Fastabend

This adds initial support for offloading the u32 tc classifier. This
initial implementation only implements a few base matches and actions
to illustrate the use of the infrastructure patches.

However, it is an interesting subset because it handles the u32
next-header logic to correctly map TCP packets from IP headers using
the IHL and protocol fields. Once this is accepted, we can easily
extend the match and action fields by updating the model header file.
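In the test script below, the "offset at 0 mask 0f00 shift 6" link computes where the TCP header starts. u32 reads a big-endian 16-bit word at the offset, masks it, and shifts it right. For a 0x45 first byte, this gives (0x4500 & 0x0f00) >> 6 = 20, which is IHL * 4. The user-space sketch below reproduces that computation and the "match ip protocol 6 ff" check. All helper names are hypothetical, and the code does no bounds checking against the packet length.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 16-bit value, as u32 does for its offset field. */
static uint16_t be16_at(const uint8_t *p, size_t off)
{
	return (uint16_t)((p[off] << 8) | p[off + 1]);
}

/*
 * Offset of the next header: (be16 at offoff & offmask) >> offshift.
 * With offoff 0, offmask 0x0f00 and offshift 6 this equals IHL * 4.
 */
static size_t next_hdr_offset(const uint8_t *ip, size_t offoff,
			      uint16_t offmask, unsigned int offshift)
{
	return (size_t)((be16_at(ip, offoff) & offmask) >> offshift);
}

/* "match ip protocol 6 ff": byte 9 of the IPv4 header must be 6 (TCP). */
static int is_tcp(const uint8_t *ip)
{
	return ip[9] == 6;
}

/* TCP source port: the first 16 bits of the header found via the jump. */
static uint16_t tcp_sport(const uint8_t *ip)
{
	return be16_at(ip, next_hdr_offset(ip, 0, 0x0f00, 6));
}
```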

Also, only the drop action is supported initially.

Here is a short test script:

 #tc qdisc add dev eth4 ingress
 #tc filter add dev eth4 parent : protocol ip \
u32 ht 800: order 1 \
match ip dst 15.0.0.1/32 match ip src 15.0.0.2/32 action drop

<-- hardware has dst/src ip match rule installed -->

 #tc filter del dev eth4 parent : prio 49152
 #tc filter add dev eth4 parent : protocol ip prio 99 \
handle 1: u32 divisor 1
 #tc filter add dev eth4 protocol ip parent : prio 99 \
u32 ht 800: order 1 link 1: \
offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
 #tc filter add dev eth4 parent : protocol ip \
u32 ht 1: order 3 match tcp src 23  action drop

<-- hardware has tcp src port rule installed -->

 #tc qdisc del dev eth4 parent :

<-- hardware cleaned up -->

Signed-off-by: John Fastabend 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h |6 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |6 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c|  231 +++---
 3 files changed, 213 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 4b9156c..fc877c7 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -796,6 +796,9 @@ struct ixgbe_adapter {
u8 default_up;
unsigned long fwd_bitmask; /* Bitmask indicating in use pools */
 
+#define IXGBE_MAX_LINK_HANDLE 10
+   struct ixgbe_mat_field *jump_tables[IXGBE_MAX_LINK_HANDLE];
+
 /* maximum number of RETA entries among all devices supported by ixgbe
  * driver: currently it's x550 device in non-SRIOV mode
  */
@@ -925,6 +928,9 @@ s32 ixgbe_fdir_erase_perfect_filter_82599(struct ixgbe_hw *hw,
  u16 soft_id);
 void ixgbe_atr_compute_perfect_hash_82599(union ixgbe_atr_input *input,
  union ixgbe_atr_input *mask);
+int ixgbe_update_ethtool_fdir_entry(struct ixgbe_adapter *adapter,
+   struct ixgbe_fdir_filter *input,
+   u16 sw_idx);
 void ixgbe_set_rx_mode(struct net_device *netdev);
 #ifdef CONFIG_IXGBE_DCB
 void ixgbe_set_rx_drop_en(struct ixgbe_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index bea96b3..726e0ee 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -2520,9 +2520,9 @@ static int ixgbe_get_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd,
return ret;
 }
 
-static int ixgbe_update_ethtool_fdir_entry(struct ixgbe_adapter *adapter,
-  struct ixgbe_fdir_filter *input,
-  u16 sw_idx)
+int ixgbe_update_ethtool_fdir_entry(struct ixgbe_adapter *adapter,
+   struct ixgbe_fdir_filter *input,
+   u16 sw_idx)
 {
struct ixgbe_hw *hw = &adapter->hw;
struct hlist_node *node2;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index dca2298..adbb7c7 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -51,6 +51,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #ifdef CONFIG_OF
 #include 
@@ -65,6 +67,7 @@
 #include "ixgbe_common.h"
 #include "ixgbe_dcb_82599.h"
 #include "ixgbe_sriov.h"
+#include "ixgbe_model.h"
 
 char ixgbe_driver_name[] = "ixgbe";
 static const char ixgbe_driver_string[] =
@@ -5545,6 +5548,9 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter)
 #endif /* CONFIG_IXGBE_DCB */
 #endif /* IXGBE_FCOE */
 
+   /* initialize static ixgbe jump table entries */
+   adapter->jump_tables[0] = ixgbe_ipv4_fields;
+
adapter->mac_table = kzalloc(sizeof(struct ixgbe_mac_addr) *
 hw->mac.num_rar_entries,
 GFP_ATOMIC);
@@ -8200,10 +8206,191 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
return 0;
 }
 
+static int ixgbe_delete_clsu32(struct ixgbe_adapter *adapter,
+  struct tc_cls_u32_offload *cls)
+{
+   int err;
+
+   spin_lock(&adapter->fdir_perfect_lock);
+   err = ixgbe_update_ethtool_fdir_entry(adapter,

[net-next PATCH v2 5/8] net: tc: helper functions to query action types

2016-02-15 Thread John Fastabend

This is a helper function drivers can use to learn if the
action type is a drop action.

Signed-off-by: John Fastabend 
---
 include/net/tc_act/tc_gact.h |   16 
 1 file changed, 16 insertions(+)

diff --git a/include/net/tc_act/tc_gact.h b/include/net/tc_act/tc_gact.h
index 592a6bc..3067a10 100644
--- a/include/net/tc_act/tc_gact.h
+++ b/include/net/tc_act/tc_gact.h
@@ -2,6 +2,7 @@
 #define __NET_TC_GACT_H
 
 #include 
+#include 
 
 struct tcf_gact {
struct tcf_common   common;
@@ -15,4 +16,19 @@ struct tcf_gact {
 #define to_gact(a) \
container_of(a->priv, struct tcf_gact, common)
 
+#ifdef CONFIG_NET_CLS_ACT
+static inline bool is_tcf_gact_dropped(const struct tc_action *a)
+{
+   struct tcf_gact *gact;
+
+   if (a->ops && a->ops->type != TCA_ACT_GACT)
+   return false;
+
+   gact = a->priv;
+   if (gact->tcf_action == TC_ACT_SHOT)
+   return true;
+
+   return false;
+}
+#endif
 #endif /* __NET_TC_GACT_H */
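Patch 7/8 offloads only the drop action. The intended use of this helper is therefore a driver that walks a rule's actions and gives up unless each one is a generic "shot" action. The user-space sketch below models that check. The action types, verdict values, and function names are hypothetical and do not use the kernel's constants. Requiring at least one action is a choice made in this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical action kinds and verdicts, not the kernel's values. */
enum sketch_act_type { SKETCH_ACT_GACT, SKETCH_ACT_MIRRED };
enum sketch_verdict { SKETCH_VERDICT_OK, SKETCH_VERDICT_SHOT };

struct sketch_action {
	enum sketch_act_type type;
	enum sketch_verdict verdict;	/* only meaningful for GACT */
};

/* Same shape as is_tcf_gact_dropped(): a generic action whose verdict is "shot". */
static bool sketch_is_gact_dropped(const struct sketch_action *a)
{
	return a->type == SKETCH_ACT_GACT && a->verdict == SKETCH_VERDICT_SHOT;
}

/* A drop-only device can take the rule only if every action is a drop. */
static bool sketch_can_offload(const struct sketch_action *acts, size_t n)
{
	if (n == 0)
		return false;
	for (size_t i = 0; i < n; i++)
		if (!sketch_is_gact_dropped(&acts[i]))
			return false;
	return true;
}
```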

[net-next PATCH v2 6/8] net: ixgbe: add minimal parser details for ixgbe

2016-02-15 Thread John Fastabend

This adds an ixgbe data structure that is used to determine which
headers and fields can be matched and in what order they are supported.

For hardware devices this can be a bit tricky, because typically
only pre-programmed (firmware, ucode, RTL) parse graphs are
supported and we don't yet have an interface to change these from
the OS. So at the moment it's a "you get whatever your friendly
vendor provides" affair.

In the future we can add get and set routines to update this data
structure. One interesting thing to note is that this data structure
identifies Ethernet, IP, and TCP fields without having to hardcode
them as enumerations or use other identifiers.
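The table-driven approach works like this: a u32 key names a header offset, and the driver looks that offset up in a terminated array of { offset, programming callback } entries. A key whose offset is not in the table cannot be offloaded. The user-space sketch below models that lookup with a trimmed-down filter. struct sketch_filter, struct sketch_field, and the helpers are hypothetical, and byte order is ignored (kernel u32 keys are big-endian).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down filter: only the fields the table can set. */
struct sketch_filter {
	uint32_t src_ip, src_ip_mask;
	uint32_t dst_ip, dst_ip_mask;
};

struct sketch_field {
	unsigned int off;	/* byte offset of the 32-bit word in the header */
	int (*val)(struct sketch_filter *f, uint32_t val, uint32_t mask);
};

static int prgm_sip(struct sketch_filter *f, uint32_t val, uint32_t mask)
{
	f->src_ip = val;
	f->src_ip_mask = mask;
	return 0;
}

static int prgm_dip(struct sketch_filter *f, uint32_t val, uint32_t mask)
{
	f->dst_ip = val;
	f->dst_ip_mask = mask;
	return 0;
}

/* IPv4 source address at offset 12, destination at 16; NULL val ends it. */
static const struct sketch_field sketch_ipv4_fields[] = {
	{ .off = 12, .val = prgm_sip },
	{ .off = 16, .val = prgm_dip },
	{ .val = NULL },
};

/* Program one u32 key, or fail if the hardware cannot match that offset. */
static int sketch_program_key(const struct sketch_field *table,
			      struct sketch_filter *f,
			      unsigned int off, uint32_t val, uint32_t mask)
{
	for (size_t i = 0; table[i].val; i++)
		if (table[i].off == off)
			return table[i].val(f, val, mask);
	return -EINVAL;
}
```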

Signed-off-by: John Fastabend 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_model.h |  112 
 1 file changed, 112 insertions(+)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_model.h

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h
new file mode 100644
index 000..43ebec4
--- /dev/null
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h
@@ -0,0 +1,112 @@
+/***
+ *
+ * Intel 10 Gigabit PCI Express Linux driver
+ * Copyright(c) 2013 - 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see .
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Contact Information:
+ * e1000-devel Mailing List 
+ * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+ *
+ **/
+
+#ifndef _IXGBE_MODEL_H_
+#define _IXGBE_MODEL_H_
+
+#include "ixgbe.h"
+#include "ixgbe_type.h"
+
+struct ixgbe_mat_field {
+   unsigned int off;
+   unsigned int mask;
+   int (*val)(struct ixgbe_fdir_filter *input,
+  union ixgbe_atr_input *mask,
+  __u32 val, __u32 m);
+   unsigned int type;
+};
+
+static inline int ixgbe_mat_prgm_sip(struct ixgbe_fdir_filter *input,
+union ixgbe_atr_input *mask,
+__u32 val, __u32 m)
+{
+   input->filter.formatted.src_ip[0] = val;
+   mask->formatted.src_ip[0] = m;
+   return 0;
+}
+
+static inline int ixgbe_mat_prgm_dip(struct ixgbe_fdir_filter *input,
+union ixgbe_atr_input *mask,
+__u32 val, __u32 m)
+{
+   input->filter.formatted.dst_ip[0] = val;
+   mask->formatted.dst_ip[0] = m;
+   return 0;
+}
+
+static struct ixgbe_mat_field ixgbe_ipv4_fields[] = {
+   { .off = 12, .mask = -1, .val = ixgbe_mat_prgm_sip,
+ .type = IXGBE_ATR_FLOW_TYPE_IPV4},
+   { .off = 16, .mask = -1, .val = ixgbe_mat_prgm_dip,
+ .type = IXGBE_ATR_FLOW_TYPE_IPV4},
+   { .val = NULL } /* terminal node */
+};
+
+static inline int ixgbe_mat_prgm_sport(struct ixgbe_fdir_filter *input,
+  union ixgbe_atr_input *mask,
+  __u32 val, __u32 m)
+{
+   input->filter.formatted.src_port = val & 0x;
+   mask->formatted.src_port = m & 0x;
+   return 0;
+};
+
+static inline int ixgbe_mat_prgm_dport(struct ixgbe_fdir_filter *input,
+  union ixgbe_atr_input *mask,
+  __u32 val, __u32 m)
+{
+   input->filter.formatted.dst_port = val & 0x;
+   mask->formatted.dst_port = m & 0x;
+   return 0;
+};
+
+static struct ixgbe_mat_field ixgbe_tcp_fields[] = {
+   {.off = 0, .mask = 0x, .val = ixgbe_mat_prgm_sport,
+.type = IXGBE_ATR_FLOW_TYPE_TCPV4},
+   {.off = 2, .mask = 0x, .val = ixgbe_mat_prgm_dport,
+.type = IXGBE_ATR_FLOW_TYPE_TCPV4},
+   { .val = NULL } /* terminal node */
+};
+
+struct ixgbe_nexthdr {
+   /* offset, shift, and mask of position to next header */
+   unsigned int o;
+   __u32 s;
+   __u32 m;
+   /* match criteria to make this jump*/
+   unsigned int off;
+   __u32 val;
+   __u32 mask;
+   /* location of jump to make */
+   struct ixgbe_mat_field *jump;
+};
+
+static struct ixgbe_nexthdr ixgbe_ipv4_jumps[] = {
+   { .o = 0, .s = 6, .m = 0xf,
+ .off = 8, .val = 0x600,

[net-next PATCH v2 4/8] net: add tc offload feature flag

2016-02-15 Thread John Fastabend

It's useful to be able to turn off the qdisc offload feature on a
per-device basis. This gives us a big hammer to enable/disable
offloading. More fine-grained control (i.e. per rule) may be supported later.
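The "big hammer" is a single feature bit: when it is clear, nothing should be pushed into hardware, whatever the rule. The user-space sketch below shows that gate, modeled on the BIT-enum plus __NETIF_F() pattern in the diff. The bit numbers, the SKETCH_ macros, sketch_try_offload(), and the choice of -EOPNOTSUPP are all hypothetical. Where exactly the check lives (core or driver) is not specified by this patch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical bit numbers; the real ones come from netdev_features.h. */
enum { SKETCH_F_BUSY_POLL_BIT, SKETCH_F_HW_TC_BIT };

#define SKETCH_F(name)	((uint64_t)1 << SKETCH_F_##name##_BIT)
#define SKETCH_F_HW_TC	SKETCH_F(HW_TC)

/* Refuse to push a classifier into hardware unless the device-wide switch is on. */
static int sketch_try_offload(uint64_t features, int (*offload)(void))
{
	if (!(features & SKETCH_F_HW_TC))
		return -EOPNOTSUPP;
	return offload();
}
```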

Signed-off-by: John Fastabend 
---
 include/linux/netdev_features.h |3 +++
 net/core/ethtool.c  |1 +
 2 files changed, 4 insertions(+)

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index d9654f0e..a734bf4 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -67,6 +67,8 @@ enum {
NETIF_F_HW_L2FW_DOFFLOAD_BIT,   /* Allow L2 Forwarding in Hardware */
NETIF_F_BUSY_POLL_BIT,  /* Busy poll */
 
+   NETIF_F_HW_TC_BIT,  /* Offload TC infrastructure */
+
/*
 * Add your fresh new feature above and remember to update
 * netdev_features_strings[] in net/core/ethtool.c and maybe
@@ -124,6 +126,7 @@ enum {
 #define NETIF_F_HW_VLAN_STAG_TX__NETIF_F(HW_VLAN_STAG_TX)
 #define NETIF_F_HW_L2FW_DOFFLOAD   __NETIF_F(HW_L2FW_DOFFLOAD)
 #define NETIF_F_BUSY_POLL  __NETIF_F(BUSY_POLL)
+#define NETIF_F_HW_TC  __NETIF_F(HW_TC)
 
 #define for_each_netdev_feature(mask_addr, bit)\
for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT)
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 453c803..ab8376a7 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -98,6 +98,7 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN]
[NETIF_F_RXALL_BIT] ="rx-all",
[NETIF_F_HW_L2FW_DOFFLOAD_BIT] = "l2-fwd-offload",
[NETIF_F_BUSY_POLL_BIT] ="busy-poll",
+   [NETIF_F_HW_TC_BIT] ="hw-tc-offload",
 };
 
 static const char

[net-next PATCH v2 3/8] net: sched: add cls_u32 offload hooks for netdevs

2016-02-15 Thread John Fastabend

This patch allows netdev drivers to consume cls_u32 offloads via
the ndo_setup_tc op.

This work aligns with how network drivers have been doing qdisc
offloads for mqprio.
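On the driver side, struct tc_cls_u32_offload carries a command plus a union whose valid member (knode or hnode) depends on that command. A driver therefore switches on the command before touching the payload, as __ixgbe_setup_tc() does in patch 8/8. The user-space sketch below mirrors that dispatch. The SK_ enum, the structures, and the driver state are hypothetical copies, trimmed to the fields the example needs.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical mirrors of the command enum and payload union in pkt_cls.h. */
enum {
	SK_NEW_KNODE, SK_REPLACE_KNODE, SK_DELETE_KNODE,
	SK_NEW_HNODE, SK_REPLACE_HNODE, SK_DELETE_HNODE,
};

struct sk_knode { uint32_t handle; uint32_t link_handle; };
struct sk_hnode { uint32_t handle; unsigned int divisor; };

struct sk_u32_offload {
	int command;
	union {			/* which member is valid depends on command */
		struct sk_knode knode;
		struct sk_hnode hnode;
	};
};

/* Hypothetical driver state: the last installed key and table. */
struct sk_state { uint32_t knode; uint32_t hnode; };

static int sk_setup(struct sk_state *s, const struct sk_u32_offload *cls)
{
	switch (cls->command) {
	case SK_NEW_KNODE:
	case SK_REPLACE_KNODE:
		s->knode = cls->knode.handle;	/* only knode is valid here */
		return 0;
	case SK_DELETE_KNODE:
		if (s->knode == cls->knode.handle)
			s->knode = 0;
		return 0;
	case SK_NEW_HNODE:
	case SK_REPLACE_HNODE:
		s->hnode = cls->hnode.handle;	/* only hnode is valid here */
		return 0;
	case SK_DELETE_HNODE:
		if (s->hnode == cls->hnode.handle)
			s->hnode = 0;
		return 0;
	default:
		return -EINVAL;
	}
}
```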

Signed-off-by: John Fastabend 
---
 include/linux/netdevice.h |6 ++-
 include/net/pkt_cls.h |   34 +++
 net/sched/cls_u32.c   |   99 -
 3 files changed, 136 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 105a661..f8b500e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -779,17 +779,21 @@ static inline bool netdev_phys_item_id_same(struct netdev_phys_item_id *a,
 typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
   struct sk_buff *skb);
 
-/* This structure holds attributes of qdisc and classifiers
+/* These structures hold the attributes of qdisc and classifiers
  * that are being passed to the netdevice through the setup_tc op.
  */
 enum {
TC_SETUP_MQPRIO,
+   TC_SETUP_CLSU32,
 };
 
+struct tc_cls_u32_offload;
+
 struct tc_to_netdev {
unsigned int type;
union {
u8 tc;
+   struct tc_cls_u32_offload *cls_u32;
};
 };
 
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index bc49967..174e4dc 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -358,4 +358,38 @@ tcf_match_indev(struct sk_buff *skb, int ifindex)
 }
 #endif /* CONFIG_NET_CLS_IND */
 
+struct tc_cls_u32_knode {
+   struct tcf_exts *exts;
+   u8 fshift;
+   u32 handle;
+   u32 val;
+   u32 mask;
+   u32 link_handle;
+   struct tc_u32_sel *sel;
+};
+
+struct tc_cls_u32_hnode {
+   u32 handle;
+   u32 prio;
+   unsigned int divisor;
+};
+
+enum {
+   TC_CLSU32_NEW_KNODE,
+   TC_CLSU32_REPLACE_KNODE,
+   TC_CLSU32_DELETE_KNODE,
+   TC_CLSU32_NEW_HNODE,
+   TC_CLSU32_REPLACE_HNODE,
+   TC_CLSU32_DELETE_HNODE,
+};
+
+struct tc_cls_u32_offload {
+   /* knode values */
+   int command;
+   union {
+   struct tc_cls_u32_knode knode;
+   struct tc_cls_u32_hnode hnode;
+   };
+};
+
 #endif
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 4fbb674..d54bc94 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct tc_u_knode {
struct tc_u_knode __rcu *next;
@@ -424,6 +425,93 @@ static int u32_delete_key(struct tcf_proto *tp, struct tc_u_knode *key)
return 0;
 }
 
+static void u32_remove_hw_knode(struct tcf_proto *tp, u32 handle)
+{
+   struct net_device *dev = tp->q->dev_queue->dev;
+   struct tc_cls_u32_offload u32_offload = {0};
+   struct tc_to_netdev offload;
+
+   offload.type = TC_SETUP_CLSU32;
+   offload.cls_u32 = &u32_offload;
+
+   if (dev->netdev_ops->ndo_setup_tc) {
+   offload.cls_u32->command = TC_CLSU32_DELETE_KNODE;
+   offload.cls_u32->knode.handle = handle;
+   dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
+ tp->protocol, &offload);
+   }
+}
+
+static void u32_replace_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h)
+{
+   struct net_device *dev = tp->q->dev_queue->dev;
+   struct tc_cls_u32_offload u32_offload = {0};
+   struct tc_to_netdev offload;
+
+   offload.type = TC_SETUP_CLSU32;
+   offload.cls_u32 = &u32_offload;
+
+   if (dev->netdev_ops->ndo_setup_tc) {
+   offload.cls_u32->command = TC_CLSU32_NEW_HNODE;
+   offload.cls_u32->hnode.divisor = h->divisor;
+   offload.cls_u32->hnode.handle = h->handle;
+   offload.cls_u32->hnode.prio = h->prio;
+
+   dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
+ tp->protocol, &offload);
+   }
+}
+
+static void u32_clear_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h)
+{
+   struct net_device *dev = tp->q->dev_queue->dev;
+   struct tc_cls_u32_offload u32_offload = {0};
+   struct tc_to_netdev offload;
+
+   offload.type = TC_SETUP_CLSU32;
+   offload.cls_u32 = &u32_offload;
+
+   if (dev->netdev_ops->ndo_setup_tc) {
+   offload.cls_u32->command = TC_CLSU32_DELETE_HNODE;
+   offload.cls_u32->hnode.divisor = h->divisor;
+   offload.cls_u32->hnode.handle = h->handle;
+   offload.cls_u32->hnode.prio = h->prio;
+
+   dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
+ tp->protocol, &offload);
+   }
+}
+
+static void u32_replace_hw_knode(struct tcf_proto *tp, struct tc_u_knode *n)
+{
+   struct net_device *dev = tp->q->dev_queue->dev;
+   struct tc_cls_u32_offload u32_offload = {0};
+   struct tc_to_netdev offload;
+
+   offload.type = TC_SETUP_CLSU3

[net-next PATCH v2 2/8] net: rework setup_tc ndo op to consume general tc operand

2016-02-15 Thread John Fastabend

This patch updates setup_tc so we can pass additional parameters into
the ndo op in a generic way. To do this, we provide a structured union
and a type flag.

This lets each classifier and qdisc provide its own set of attributes
without having to add new ndo ops or grow the signature of the
callback.
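The structured union plus type flag means an mqprio-only driver now has to check two things before reading tc: the handle must be the root, and the type must be MQPRIO. That is the pattern the xgbe, bnx2x, and bnxt conversions below follow. The user-space sketch below models it. The SK_ names are hypothetical. SK_H_ROOT is assumed to match TC_H_ROOT (0xFFFFFFFF), and max_tc stands in for a device limit.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical mirror of the type tag and operand union. */
enum { SK_SETUP_MQPRIO, SK_SETUP_CLSU32 };

struct sk_cls_u32;	/* opaque here; only the pointer travels */

struct sk_tc_to_netdev {
	unsigned int type;
	union {
		uint8_t tc;			/* valid for SK_SETUP_MQPRIO */
		struct sk_cls_u32 *cls_u32;	/* valid for SK_SETUP_CLSU32 */
	};
};

#define SK_H_ROOT	0xFFFFFFFFu	/* assumed to match TC_H_ROOT */

/* An mqprio-only driver: accept root + MQPRIO, reject everything else. */
static int sk_mqprio_setup_tc(uint32_t handle, const struct sk_tc_to_netdev *tc,
			      uint8_t max_tc, uint8_t *num_tc)
{
	if (handle != SK_H_ROOT || tc->type != SK_SETUP_MQPRIO)
		return -EINVAL;
	if (tc->tc > max_tc)
		return -EINVAL;
	*num_tc = tc->tc;	/* safe: type says tc is the valid member */
	return 0;
}
```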

Signed-off-by: John Fastabend 
---
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c|9 ++---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c |7 ---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h |3 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c   |8 ++--
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c |7 ---
 drivers/net/ethernet/intel/i40e/i40e.h  |3 ++-
 drivers/net/ethernet/intel/i40e/i40e_main.c |   10 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   |7 ---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |7 ---
 drivers/net/ethernet/sfc/efx.h  |3 ++-
 drivers/net/ethernet/sfc/tx.c   |9 ++---
 drivers/net/ethernet/ti/netcp_core.c|   13 +++--
 include/linux/netdevice.h   |   20 +++-
 net/sched/sch_mqprio.c  |9 ++---
 14 files changed, 78 insertions(+), 37 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
index 9955cae..cfd3f7e 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
@@ -1626,15 +1626,18 @@ static void xgbe_poll_controller(struct net_device *netdev)
 }
 #endif /* End CONFIG_NET_POLL_CONTROLLER */
 
-static int xgbe_setup_tc(struct net_device *netdev, u32 handle, u8 tc)
+static int xgbe_setup_tc(struct net_device *netdev, u32 handle, __be16 proto,
+struct tc_to_netdev *tc_to_netdev)
 {
struct xgbe_prv_data *pdata = netdev_priv(netdev);
unsigned int offset, queue;
-   u8 i;
+   u8 i, tc;
 
-   if (handle != TC_H_ROOT)
+   if (handle != TC_H_ROOT || tc_to_netdev->type != TC_SETUP_MQPRIO)
return -EINVAL;
 
+   tc = tc_to_netdev->tc;
+
if (tc && (tc != pdata->hw_feat.tc_cnt))
return -EINVAL;
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 1c7ff51..e925831 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -4272,11 +4272,12 @@ int bnx2x_setup_tc(struct net_device *dev, u8 num_tc)
return 0;
 }
 
-int __bnx2x_setup_tc(struct net_device *dev, u32 handle, u8 num_tc)
+int __bnx2x_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+struct tc_to_netdev *tc)
 {
-   if (handle != TC_H_ROOT)
+   if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
return -EINVAL;
-   return bnx2x_setup_tc(dev, num_tc);
+   return bnx2x_setup_tc(dev, tc->tc);
 }
 
 /* called with rtnl_lock */
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index e92d6e7..ef2c776 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -486,7 +486,8 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev);
 
 /* setup_tc callback */
 int bnx2x_setup_tc(struct net_device *dev, u8 num_tc);
-int __bnx2x_setup_tc(struct net_device *dev, u32 handle, u8 num_tc);
+int __bnx2x_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+struct tc_to_netdev *tc);
 
 int bnx2x_get_vf_config(struct net_device *dev, int vf,
struct ifla_vf_info *ivi);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index ff08faf..169920a 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -5370,13 +5370,17 @@ static int bnxt_change_mtu(struct net_device *dev, int new_mtu)
return 0;
 }
 
-static int bnxt_setup_tc(struct net_device *dev, u32 handle, u8 tc)
+static int bnxt_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+struct tc_to_netdev *ntc)
 {
struct bnxt *bp = netdev_priv(dev);
+   u8 tc;
 
-   if (handle != TC_H_ROOT)
+   if (handle != TC_H_ROOT || ntc->type != TC_SETUP_MQPRIO)
return -EINVAL;
 
+   tc = ntc->tc;
+
if (tc > bp->max_tc) {
netdev_err(dev, "too many traffic classes requested: %d Max supported is %d\n",
   tc, bp->max_tc);
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index 12701a4..dc1a821 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -1204,12 +1204,13 @@ err_queueing_scheme:

[net-next PATCH v2 1/8] net: rework ndo tc op to consume additional qdisc handle parameter

2016-02-15 Thread John Fastabend

The ndo_setup_tc() op was added to support drivers offloading TX
qdiscs; however, only support for mqprio was ever added, so only the
number of traffic classes was ever passed to the driver.

This patch generalizes the ndo_setup_tc op so that a handle can
be provided to indicate whether the offload is for ingress or egress,
or potentially even for child qdiscs.
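With the handle passed in, a driver can tell where an offload request is attached. The root handle means the device's top-level (egress) qdisc, while filters on the ingress qdisc arrive with that qdisc's own handle, ffff:0. This is the tp->q->handle that the cls_u32 hooks in patch 3/8 pass down. The user-space sketch below classifies a handle that way. The SK_ macros are assumed to mirror TC_H_MAJ(), TC_H_ROOT, and TC_H_INGRESS from include/uapi/linux/pkt_sched.h. The enum and function are hypothetical.

```c
#include <stdint.h>

/* Assumed to mirror include/uapi/linux/pkt_sched.h. */
#define SK_H_MAJ(h)	((h) & 0xFFFF0000u)
#define SK_H_ROOT	0xFFFFFFFFu
#define SK_H_INGRESS	0xFFFFFFF1u

enum sk_attach { SK_ATTACH_ROOT, SK_ATTACH_INGRESS, SK_ATTACH_OTHER };

/* Classify the handle a driver receives in ndo_setup_tc(). */
static enum sk_attach sk_classify_handle(uint32_t handle)
{
	/* Check root first: its major number also equals ingress's (ffff). */
	if (handle == SK_H_ROOT)
		return SK_ATTACH_ROOT;
	if (SK_H_MAJ(handle) == SK_H_MAJ(SK_H_INGRESS))
		return SK_ATTACH_INGRESS;
	return SK_ATTACH_OTHER;	/* e.g. a child qdisc such as 1: */
}
```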

CC: Murali Karicheri 
CC: Shradha Shah 
CC: Or Gerlitz 
CC: Ariel Elior 
CC: Jeff Kirsher 
CC: Bruce Allan 
CC: Jesse Brandeburg 
CC: Don Skidmore 
Signed-off-by: John Fastabend 
---
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c |5 -
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  |7 +++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h  |1 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c|5 -
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c  |   10 +-
 drivers/net/ethernet/intel/i40e/i40e.h   |2 +-
 drivers/net/ethernet/intel/i40e/i40e_fcoe.c  |2 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c  |   17 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c|   11 ++-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |   12 ++--
 drivers/net/ethernet/sfc/efx.h   |2 +-
 drivers/net/ethernet/sfc/tx.c|5 -
 drivers/net/ethernet/ti/netcp_core.c |5 -
 include/linux/netdevice.h|3 ++-
 net/sched/sch_mqprio.c   |5 +++--
 16 files changed, 74 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
index 8a9b493..9955cae 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
@@ -1626,12 +1626,15 @@ static void xgbe_poll_controller(struct net_device *netdev)
 }
 #endif /* End CONFIG_NET_POLL_CONTROLLER */
 
-static int xgbe_setup_tc(struct net_device *netdev, u8 tc)
+static int xgbe_setup_tc(struct net_device *netdev, u32 handle, u8 tc)
 {
struct xgbe_prv_data *pdata = netdev_priv(netdev);
unsigned int offset, queue;
u8 i;
 
+   if (handle != TC_H_ROOT)
+   return -EINVAL;
+
if (tc && (tc != pdata->hw_feat.tc_cnt))
return -EINVAL;
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 9695a4c..1c7ff51 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -4272,6 +4272,13 @@ int bnx2x_setup_tc(struct net_device *dev, u8 num_tc)
return 0;
 }
 
+int __bnx2x_setup_tc(struct net_device *dev, u32 handle, u8 num_tc)
+{
+   if (handle != TC_H_ROOT)
+   return -EINVAL;
+   return bnx2x_setup_tc(dev, num_tc);
+}
+
 /* called with rtnl_lock */
 int bnx2x_change_mac_addr(struct net_device *dev, void *p)
 {
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index 4cbb03f8..e92d6e7 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -486,6 +486,7 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev);
 
 /* setup_tc callback */
 int bnx2x_setup_tc(struct net_device *dev, u8 num_tc);
+int __bnx2x_setup_tc(struct net_device *dev, u32 handle, u8 num_tc);
 
 int bnx2x_get_vf_config(struct net_device *dev, int vf,
struct ifla_vf_info *ivi);
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 6c4e3a6..b17bb17 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -12994,7 +12994,7 @@ static const struct net_device_ops bnx2x_netdev_ops = {
 #ifdef CONFIG_NET_POLL_CONTROLLER
.ndo_poll_controller= poll_bnx2x,
 #endif
-   .ndo_setup_tc   = bnx2x_setup_tc,
+   .ndo_setup_tc   = __bnx2x_setup_tc,
 #ifdef CONFIG_BNX2X_SRIOV
.ndo_set_vf_mac = bnx2x_set_vf_mac,
.ndo_set_vf_vlan= bnx2x_set_vf_vlan,
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 5dc89e5..ff08faf 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -5370,10 +5370,13 @@ static int bnxt_change_mtu(struct net_device *dev, int new_mtu)
return 0;
 }
 
-static int bnxt_setup_tc(struct net_device *dev, u8 tc)
+static int bnxt_setup_tc(struct net_device *dev, u32 handle, u8 tc)
 {
struct bnxt *bp = netdev_priv(dev);
 
+   if (handle != TC_H_ROOT)
+   return -EINVAL;
+
if (tc > bp->max_tc) {
netdev_err(dev, "too many traffic classes requested: %d Max supported is %d\n",

[net-next PATCH v2 0/8] tc offload for cls_u32 on ixgbe

2016-02-15 Thread John Fastabend

This extends the setup_tc framework so it can support more than just
the mqprio offload and push other classifiers and qdiscs into the
hardware. The series here targets the u32 classifier and the ixgbe
driver. I chose the u32 classifier because it is protocol
oblivious and aligns with multiple hardware devices I have access
to. I did an initial implementation on ixgbe because (a) I have one
in my box, (b) it's a stable driver and (c) it is relatively simple
compared to the other devices I have here but still has enough
flexibility to exercise the features of cls_u32.
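
A minimal user-space sketch of the callback shape introduced by the first
patch, modeled on the xgbe and bnxt hunks earlier in this archive: the
callback now receives the qdisc handle, and a driver that only supports
mqprio at the root rejects any other handle. `demo_setup_tc()` and
`struct demo_priv` are hypothetical; only the `TC_H_ROOT` value
(0xFFFFFFFF, from the uapi header <linux/pkt_sched.h>) comes from the kernel.

```c
#include <errno.h>
#include <stdint.h>

/* Value of TC_H_ROOT as defined in <linux/pkt_sched.h>. */
#define DEMO_TC_H_ROOT 0xFFFFFFFFu

/* Hypothetical driver state: supported and configured traffic classes. */
struct demo_priv {
	uint8_t max_tc;
	uint8_t num_tc;
};

/*
 * Sketch of the reworked callback: the handle says which qdisc the
 * request belongs to, and a driver that can only configure mqprio at
 * the root returns -EINVAL for anything else.
 */
static int demo_setup_tc(struct demo_priv *priv, uint32_t handle, uint8_t tc)
{
	if (handle != DEMO_TC_H_ROOT)
		return -EINVAL;   /* not the root qdisc: not ours to offload */

	if (tc > priv->max_tc)
		return -EINVAL;   /* more traffic classes than the HW supports */

	priv->num_tc = tc;
	return 0;
}
```

Checking the handle first keeps the existing mqprio behavior unchanged
while leaving room for non-root requests (such as classifier offloads)
to be routed elsewhere later.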

I intentionally limited the scope of this series to the basic
feature set. Specifically, this uses a 'big hammer' feature bit
to decide whether to offload: if the bit is set, rules are offloaded;
if it is not, they are not. If we can agree on this patch series,
there are more patches in my queue we can talk about that make the
offload decision per rule using flags, similar to how we do L2 MAC
updates. Additionally, the error strategy can be improved, e.g. to
abort hard or to log and continue. I think these are nice-to-have
improvements but shouldn't block this series.
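
The device-wide ('big hammer') policy described above can be sketched as
follows. `DEMO_F_HW_TC` stands in for the feature bit the series adds, and
`demo_add_rule()`, `demo_hw_ok()` and `demo_hw_full()` are hypothetical.
The sketch assumes the rule is always kept in software as well, and falling
back to software-only when the driver call fails is also an assumption,
since the cover letter leaves the error strategy open.

```c
#include <stdint.h>

/* Hypothetical stand-in for the series' tc offload feature bit. */
#define DEMO_F_HW_TC (1ULL << 0)

enum demo_placement { DEMO_SW_ONLY, DEMO_SW_AND_HW };

/* Hypothetical driver hooks: one accepts every rule, one rejects all. */
static int demo_hw_ok(int rule)   { (void)rule; return 0; }
static int demo_hw_full(int rule) { (void)rule; return -1; }

/*
 * One device-wide bit decides for every rule: with the bit clear no rule
 * is handed to the driver; with it set every rule is.
 */
static enum demo_placement demo_add_rule(uint64_t features,
					 int (*hw_add)(int rule), int rule)
{
	if (!(features & DEMO_F_HW_TC))
		return DEMO_SW_ONLY;
	if (hw_add(rule) != 0)
		return DEMO_SW_ONLY;   /* assumed log-and-continue fallback */
	return DEMO_SW_AND_HW;
}
```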

Also, by adding get_parse_graph and set_parse_graph attributes as
in my previous flow_api work, we can build programmable devices
and programmatically learn when rules can or cannot be loaded
into the hardware. Again, future work.

---

John Fastabend (8):
  net: rework ndo tc op to consume additional qdisc handle parameter
  net: rework setup_tc ndo op to consume general tc operand
  net: sched: add cls_u32 offload hooks for netdevs
  net: add tc offload feature flag
  net: tc: helper functions to query action types
  net: ixgbe: add minimal parser details for ixgbe
  net: ixgbe: add support for tc_u32 offload
  net: ixgbe: abort with cls u32 divisor groups greater than 1


 drivers/net/ethernet/amd/xgbe/xgbe-drv.c |   10 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  |8 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h  |2 
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |2 
 drivers/net/ethernet/broadcom/bnxt/bnxt.c|9 +
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c  |   11 +
 drivers/net/ethernet/intel/i40e/i40e.h   |3 
 drivers/net/ethernet/intel/i40e/i40e_fcoe.c  |2 
 drivers/net/ethernet/intel/i40e/i40e_main.c  |   19 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe.h |7 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |6 
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c|  272 --
 drivers/net/ethernet/intel/ixgbe/ixgbe_model.h   |  112 +
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |   13 +
 drivers/net/ethernet/sfc/efx.h   |3 
 drivers/net/ethernet/sfc/tx.c|   10 +
 drivers/net/ethernet/ti/netcp_core.c |   14 +
 include/linux/netdev_features.h  |3 
 include/linux/netdevice.h|   25 ++
 include/net/pkt_cls.h|   34 +++
 include/net/tc_act/tc_gact.h |   16 +
 net/core/ethtool.c   |1 
 net/sched/cls_u32.c  |   99 
 net/sched/sch_mqprio.c   |8 -
 24 files changed, 632 insertions(+), 57 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_model.h


Re: ravb: Possible Regression In "net: phy: Avoid polling PHY with PHY_IGNORE_INTERRUPTS"

2016-02-15 Thread Simon Horman

Hi Florian,

On Mon, Feb 15, 2016 at 09:08:46PM -0800, Florian Fainelli wrote:
> On February 15, 2016 7:26:46 PM PST, Simon Horman  wrote:
> >Hi Florian,
> >
> >I have observed what appears to be a regression in the ravb ethernet
> >driver caused by d5c3d84657db ("net: phy: Avoid polling PHY with
> >PHY_IGNORE_INTERRUPTS").
> >
> >When booting net-next configured with the ARM64 defconfig on the
> >Renesas r8a7795/salvator-x I see the following and the ravb is unable
> >to access the network. With the above mentioned patch reverted I am
> >able to boot to user-space using nfsroot.
> 
> Thanks for the report, I will take a closer look tomorrow, can you test
> patches easily on top of 4.5-rc on this platform?

Thanks.

Yes, I can easily test patches. The platform is supported in mainline as of
v4.5-rc (this problem aside) and I have access to a board.

Re: ravb: Possible Regression In "net: phy: Avoid polling PHY with PHY_IGNORE_INTERRUPTS"

2016-02-15 Thread Florian Fainelli

On February 15, 2016 7:26:46 PM PST, Simon Horman  wrote:
>Hi Florian,
>
>I have observed what appears to be a regression in the ravb ethernet
>driver caused by d5c3d84657db ("net: phy: Avoid polling PHY with
>PHY_IGNORE_INTERRUPTS").
>
>When booting net-next configured with the ARM64 defconfig on the
>Renesas r8a7795/salvator-x I see the following and the ravb is unable
>to access the network. With the above mentioned patch reverted I am
>able to boot to user-space using nfsroot.

Thanks for the report, I will take a closer look tomorrow, can you test
patches easily on top of 4.5-rc on this platform?


[PATCH net-next 0/2] cxgb4: Use __dev_[um]c_[un]sync for MAC address syncing

2016-02-15 Thread Hariprasad Shenai

Hi

This patch series adds support for using __dev_uc_sync/__dev_mc_sync to add
MAC addresses and __dev_uc_unsync/__dev_mc_unsync to delete them.
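
With __dev_uc_sync()/__dev_mc_sync() the driver no longer walks the whole
address list itself; it supplies per-address sync and unsync callbacks
(such as cxgb4_mac_sync() in patch 1/2) and the core calls them only for
addresses that changed. The following is a simplified user-space model of
that contract, not the kernel implementation: `struct demo_hw_addr` with
its `wanted`/`synced` flags and all `demo_*` names are hypothetical, and
the real core tracks this state with reference and sync counts instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified entry: not the kernel's struct netdev_hw_addr. */
struct demo_hw_addr {
	unsigned char addr[6];
	bool wanted;   /* still requested by the stack */
	bool synced;   /* already programmed into the device */
};

typedef int (*demo_addr_cb)(const unsigned char *addr);

/*
 * Two passes: first withdraw entries the stack no longer wants, then
 * push new ones. The callbacks only ever see one address at a time.
 */
static int demo_addr_sync(struct demo_hw_addr *list, size_t n,
			  demo_addr_cb sync, demo_addr_cb unsync)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (list[i].synced && !list[i].wanted &&
		    unsync(list[i].addr) == 0)
			list[i].synced = false;   /* on failure, retry next time */
	}
	for (i = 0; i < n; i++) {
		if (list[i].wanted && !list[i].synced) {
			int err = sync(list[i].addr);

			if (err)
				return err;
			list[i].synced = true;
		}
	}
	return 0;
}

/* Example callbacks that just count invocations. */
static int demo_sync_calls, demo_unsync_calls;
static int demo_count_sync(const unsigned char *addr)
{
	(void)addr;
	demo_sync_calls++;
	return 0;
}
static int demo_count_unsync(const unsigned char *addr)
{
	(void)addr;
	demo_unsync_calls++;
	return 0;
}
```

Calling demo_addr_sync() a second time with an unchanged list makes no
callback invocations, which is the property that lets the driver drop
its own full-list rebuild.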

This patch series has been created against the net-next tree and includes
patches for the cxgb4 and cxgb4vf drivers.

We have included all the maintainers of the respective drivers. Kindly review
the change and let us know if you have any review comments.

Thanks

Hariprasad Shenai (2):
  cxgb4: Use __dev_uc_sync/__dev_mc_sync to sync MAC address
  cxgb4vf: Use __dev_uc_sync/__dev_mc_sync to sync MAC address

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h |  27 
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c| 138 -
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c |  92 +++---
 drivers/net/ethernet/chelsio/cxgb4vf/adapter.h |   8 ++
 .../net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c| 116 ++---
 drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h |  20 +++
 drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c |  88 ++---
 7 files changed, 355 insertions(+), 134 deletions(-)

-- 
2.3.4

[PATCH net-next 2/2] cxgb4vf: Use __dev_uc_sync/__dev_mc_sync to sync MAC address

2016-02-15 Thread Hariprasad Shenai

Signed-off-by: Hariprasad Shenai 
---
 drivers/net/ethernet/chelsio/cxgb4vf/adapter.h |   8 ++
 .../net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c| 116 +
 drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h |  20 
 drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c |  88 +---
 4 files changed, 171 insertions(+), 61 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h b/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
index 6049f70e110c..4a707c32d76f 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/adapter.h
@@ -348,6 +348,11 @@ struct sge {
 #define for_each_ethrxq(sge, iter) \
for (iter = 0; iter < (sge)->ethqsets; iter++)
 
+struct hash_mac_addr {
+   struct list_head list;
+   u8 addr[ETH_ALEN];
+};
+
 /*
  * Per-"adapter" (Virtual Function) information.
  */
@@ -381,6 +386,9 @@ struct adapter {
 
/* various locks */
spinlock_t stats_lock;
+
+   /* list of MAC addresses in MPS Hash */
+   struct list_head mac_hlist;
 };
 
 enum { /* adapter flags */
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
index 0cfa5d72cafd..8337514ababb 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
@@ -741,6 +741,9 @@ static int adapter_up(struct adapter *adapter)
 */
enable_rx(adapter);
t4vf_sge_start(adapter);
+
+   /* Initialize hash mac addr list */
+   INIT_LIST_HEAD(&adapter->mac_hlist);
return 0;
 }
 
@@ -905,51 +908,74 @@ static inline unsigned int collect_netdev_mc_list_addrs(const struct net_device
return naddr;
 }
 
-/*
- * Configure the exact and hash address filters to handle a port's multicast
- * and secondary unicast MAC addresses.
- */
-static int set_addr_filters(const struct net_device *dev, bool sleep)
+static inline int cxgb4vf_set_addr_hash(struct port_info *pi)
 {
-   u64 mhash = 0;
-   u64 uhash = 0;
-   bool free = true;
-   unsigned int offset, naddr;
-   const u8 *addr[7];
-   int ret;
-   const struct port_info *pi = netdev_priv(dev);
+   struct adapter *adapter = pi->adapter;
+   u64 vec = 0;
+   bool ucast = false;
+   struct hash_mac_addr *entry;
 
-   /* first do the secondary unicast addresses */
-   for (offset = 0; ; offset += naddr) {
-   naddr = collect_netdev_uc_list_addrs(dev, addr, offset,
-ARRAY_SIZE(addr));
-   if (naddr == 0)
-   break;
+   /* Calculate the hash vector for the updated list and program it */
+   list_for_each_entry(entry, &adapter->mac_hlist, list) {
+   ucast |= is_unicast_ether_addr(entry->addr);
+   vec |= (1ULL << hash_mac_addr(entry->addr));
+   }
+   return t4vf_set_addr_hash(adapter, pi->viid, ucast, vec, false);
+}
 
-   ret = t4vf_alloc_mac_filt(pi->adapter, pi->viid, free,
- naddr, addr, NULL, &uhash, sleep);
-   if (ret < 0)
-   return ret;
+static int cxgb4vf_mac_sync(struct net_device *netdev, const u8 *mac_addr)
+{
+   struct port_info *pi = netdev_priv(netdev);
+   struct adapter *adapter = pi->adapter;
+   int ret;
+   u64 mhash = 0;
+   u64 uhash = 0;
+   bool free = false;
+   bool ucast = is_unicast_ether_addr(mac_addr);
+   const u8 *maclist[1] = {mac_addr};
+   struct hash_mac_addr *new_entry;
 
-   free = false;
+   ret = t4vf_alloc_mac_filt(adapter, pi->viid, free, 1, maclist,
+ NULL, ucast ? &uhash : &mhash, false);
+   if (ret < 0)
+   goto out;
+   /* if hash != 0, then add the addr to hash addr list
+* so on the end we will calculate the hash for the
+* list and program it
+*/
+   if (uhash || mhash) {
+   new_entry = kzalloc(sizeof(*new_entry), GFP_ATOMIC);
+   if (!new_entry)
+   return -ENOMEM;
+   ether_addr_copy(new_entry->addr, mac_addr);
+   list_add_tail(&new_entry->list, &adapter->mac_hlist);
+   ret = cxgb4vf_set_addr_hash(pi);
}
+out:
+   return ret < 0 ? ret : 0;
+}
 
-   /* next set up the multicast addresses */
-   for (offset = 0; ; offset += naddr) {
-   naddr = collect_netdev_mc_list_addrs(dev, addr, offset,
-ARRAY_SIZE(addr));
-   if (naddr == 0)
-   break;
+static int cxgb4vf_mac_unsync(struct net_device *netdev, const u8 *mac_addr)
+{
+   struct port_info *pi = netdev_priv(netdev);
+   struct adapter *adapter = pi->adapter;
+   int ret;
+   const u8 *maclist[1] = {mac_addr};

[PATCH net-next 1/2] cxgb4: Use __dev_uc_sync/__dev_mc_sync to sync MAC address

2016-02-15 Thread Hariprasad Shenai

Signed-off-by: Hariprasad Shenai 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h  |  27 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 138 ++--
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c  |  92 +---
 3 files changed, 184 insertions(+), 73 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index ec6e849676c1..1dac6c6111bf 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -702,6 +702,11 @@ struct doorbell_stats {
u32 db_full;
 };
 
+struct hash_mac_addr {
+   struct list_head list;
+   u8 addr[ETH_ALEN];
+};
+
 struct adapter {
void __iomem *regs;
void __iomem *bar2;
@@ -740,6 +745,7 @@ struct adapter {
void *uld_handle[CXGB4_ULD_MAX];
struct list_head list_node;
struct list_head rcu_node;
+   struct list_head mac_hlist; /* list of MAC addresses in MPS Hash */
 
struct tid_info tids;
void **tid_release_head;
@@ -1207,6 +1213,24 @@ static inline int t4_wr_mbox_ns(struct adapter *adap, int mbox, const void *cmd,
return t4_wr_mbox_meat(adap, mbox, cmd, size, rpl, false);
 }
 
+/**
+ * hash_mac_addr - return the hash value of a MAC address
+ * @addr: the 48-bit Ethernet MAC address
+ *
+ * Hashes a MAC address according to the hash function used by HW inexact
+ * (hash) address matching.
+ */
+static inline int hash_mac_addr(const u8 *addr)
+{
+   u32 a = ((u32)addr[0] << 16) | ((u32)addr[1] << 8) | addr[2];
+   u32 b = ((u32)addr[3] << 16) | ((u32)addr[4] << 8) | addr[5];
+
+   a ^= b;
+   a ^= (a >> 12);
+   a ^= (a >> 6);
+   return a & 0x3f;
+}
+
 void t4_write_indirect(struct adapter *adap, unsigned int addr_reg,
   unsigned int data_reg, const u32 *vals,
   unsigned int nregs, unsigned int start_idx);
@@ -1389,6 +1413,9 @@ int t4_set_rxmode(struct adapter *adap, unsigned int mbox, unsigned int viid,
 int t4_alloc_mac_filt(struct adapter *adap, unsigned int mbox,
  unsigned int viid, bool free, unsigned int naddr,
  const u8 **addr, u16 *idx, u64 *hash, bool sleep_ok);
+int t4_free_mac_filt(struct adapter *adap, unsigned int mbox,
+unsigned int viid, unsigned int naddr,
+const u8 **addr, bool sleep_ok);
 int t4_change_mac(struct adapter *adap, unsigned int mbox, unsigned int viid,
  int idx, const u8 *addr, bool persist, bool add_smt);
int t4_set_addr_hash(struct adapter *adap, unsigned int mbox, unsigned int viid,
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index b8a5fb0c32d4..adad73f7c8cd 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -338,84 +338,108 @@ void t4_os_portmod_changed(const struct adapter *adap, int port_id)
netdev_info(dev, "%s module inserted\n", mod_str[pi->mod_type]);
 }
 
+int dbfifo_int_thresh = 10; /* 10 == 640 entry threshold */
+module_param(dbfifo_int_thresh, int, 0644);
+MODULE_PARM_DESC(dbfifo_int_thresh, "doorbell fifo interrupt threshold");
+
 /*
- * Configure the exact and hash address filters to handle a port's multicast
- * and secondary unicast MAC addresses.
+ * usecs to sleep while draining the dbfifo
  */
-static int set_addr_filters(const struct net_device *dev, bool sleep)
+static int dbfifo_drain_delay = 1000;
+module_param(dbfifo_drain_delay, int, 0644);
+MODULE_PARM_DESC(dbfifo_drain_delay,
+"usecs to sleep while draining the dbfifo");
+
+static inline int cxgb4_set_addr_hash(struct port_info *pi)
 {
+   struct adapter *adap = pi->adapter;
+   u64 vec = 0;
+   bool ucast = false;
+   struct hash_mac_addr *entry;
+
+   /* Calculate the hash vector for the updated list and program it */
+   list_for_each_entry(entry, &adap->mac_hlist, list) {
+   ucast |= is_unicast_ether_addr(entry->addr);
+   vec |= (1ULL << hash_mac_addr(entry->addr));
+   }
+   return t4_set_addr_hash(adap, adap->mbox, pi->viid, ucast,
+   vec, false);
+}
+
+static int cxgb4_mac_sync(struct net_device *netdev, const u8 *mac_addr)
+{
+   struct port_info *pi = netdev_priv(netdev);
+   struct adapter *adap = pi->adapter;
+   int ret;
u64 mhash = 0;
u64 uhash = 0;
-   bool free = true;
-   u16 filt_idx[7];
-   const u8 *addr[7];
-   int ret, naddr = 0;
-   const struct netdev_hw_addr *ha;
-   int uc_cnt = netdev_uc_count(dev);
-   int mc_cnt = netdev_mc_count(dev);
-   const struct port_info *pi = netdev_priv(dev);
-   unsigned int mb = pi->adapter->pf;
+   bool free = false;
+   bool ucast = is_unicast_ether_addr(mac_addr);
+   const u8 *maclis
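
When the filter allocation in cxgb4_mac_sync()/cxgb4vf_mac_sync() returns a
nonzero hash, the patch keeps the address on mac_hlist and folds the whole
list into a 64-bit vector. A standalone sketch of that computation follows;
demo_hash_mac_addr() copies the fold from hash_mac_addr() in the patch,
while demo_hash_vector() is a hypothetical array-based version of the loop
in cxgb4_set_addr_hash().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same fold as hash_mac_addr() in the patch: 48 bits down to 6 bits. */
static int demo_hash_mac_addr(const uint8_t *addr)
{
	uint32_t a = ((uint32_t)addr[0] << 16) | ((uint32_t)addr[1] << 8) | addr[2];
	uint32_t b = ((uint32_t)addr[3] << 16) | ((uint32_t)addr[4] << 8) | addr[5];

	a ^= b;
	a ^= (a >> 12);
	a ^= (a >> 6);
	return a & 0x3f;
}

/*
 * OR one bit per hashed address into a 64-bit vector and note whether
 * any address is unicast (I/G bit of the first octet clear), as
 * is_unicast_ether_addr() does in the patch.
 */
static uint64_t demo_hash_vector(const uint8_t (*addrs)[6], size_t n, bool *ucast)
{
	uint64_t vec = 0;
	size_t i;

	*ucast = false;
	for (i = 0; i < n; i++) {
		*ucast |= !(addrs[i][0] & 0x01);
		vec |= 1ULL << demo_hash_mac_addr(addrs[i]);
	}
	return vec;
}
```

For example, 01:00:5e:00:00:01 folds to bucket 14, so a list holding only
that multicast address yields the vector 1ULL << 14 with ucast false.
Because several addresses can share a bucket, the hash filter is inexact
and the stack still has to drop unwanted frames.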

ravb: Possible Regression In "net: phy: Avoid polling PHY with PHY_IGNORE_INTERRUPTS"

2016-02-15 Thread Simon Horman

Hi Florian,

I have observed what appears to be a regression in the ravb ethernet driver
caused by d5c3d84657db ("net: phy: Avoid polling PHY with
PHY_IGNORE_INTERRUPTS").

When booting net-next configured with the ARM64 defconfig on the Renesas
r8a7795/salvator-x I see the following and the ravb is unable to access the
network. With the above mentioned patch reverted I am able to boot to
user-space using nfsroot.


[0.00] Booting Linux on physical CPU 0x0
[0.00] Linux version 4.5.0-rc2+ (ho...@ayumi.isobedori.kobe.vergenet.net) (gcc version 4.8.5 (Linaro GCC 4.8-2015.06) ) #90 SMP PREEMPT Tue Feb 16 12:22:08 JST 2016
[0.00] Boot CPU: AArch64 Processor [411fd073]
[0.00] debug: ignoring loglevel setting.
[0.00] efi: Getting EFI parameters from FDT:
[0.00] efi: UEFI not found.
[0.00] cma: Reserved 16 MiB at 0x7f00
[0.00] On node 0 totalpages: 229376
[0.00]   DMA zone: 3584 pages used for memmap
[0.00]   DMA zone: 0 pages reserved
[0.00]   DMA zone: 229376 pages, LIFO batch:31
[0.00] psci: probing for conduit method from DT.
[0.00] psci: PSCIv1.0 detected in firmware.
[0.00] psci: Using standard PSCI v0.2 function IDs
[0.00] psci: Trusted OS migration not required
[0.00] PERCPU: Embedded 20 pages/cpu @ffc036f8f000 s42496 r8192 d31232 u81920
[0.00] pcpu-alloc: s42496 r8192 d31232 u81920 alloc=20*4096
[0.00] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 
[0.00] Detected PIPT I-cache on CPU0
[0.00] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 225792
[0.00] Kernel command line: ignore_loglevel rw root=/dev/nfs ip=dhcp nfsroot=10.3.3.135:/srv/nfs/salvator-x-arm64
[0.00] log_buf_len individual max cpu contribution: 4096 bytes
[0.00] log_buf_len total cpu_extra contributions: 12288 bytes
[0.00] log_buf_len min size: 16384 bytes
[0.00] log_buf_len: 32768 bytes
[0.00] early log buf free: 14796(90%)
[0.00] PID hash table entries: 4096 (order: 3, 32768 bytes)
[0.00] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
[0.00] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
[0.00] software IO TLB [mem 0x79c0-0x7dc0] (64MB) mapped at [ffc031c0-ffc035bf]
[0.00] Memory: 805744K/917504K available (6468K kernel code, 593K rwdata, 2900K rodata, 724K init, 242K bss, 95376K reserved, 16384K cma-reserved)
[0.00] Virtual kernel memory layout:
[0.00] vmalloc : 0xff80 - 0xffbdbfff   (   246 GB)
[0.00] vmemmap : 0xffbdc000 - 0xffbfc000   ( 8 GB maximum)
[0.00]   0xffbdc120 - 0xffbdc200   (14 MB actual)
[0.00] fixed   : 0xffbffa7fd000 - 0xffbffac0   (  4108 KB)
[0.00] PCI I/O : 0xffbffae0 - 0xffbffbe0   (16 MB)
[0.00] modules : 0xffbffc00 - 0xffc0   (64 MB)
[0.00] memory  : 0xffc0 - 0xffc03800   (   896 MB)
[0.00]   .init : 0xffc0009a9000 - 0xffc000a5e000   (   724 KB)
[0.00]   .text : 0xffc8 - 0xffc0009a8244   (  9377 KB)
[0.00]   .data : 0xffc000a5e000 - 0xffc000af2600   (   594 KB)
[0.00] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[0.00] Preemptible hierarchical RCU implementation.
[0.00]  Build-time adjustment of leaf fanout to 64.
[0.00]  RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=4.
[0.00] RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=4
[0.00] NR_IRQS:64 nr_irqs:64 0
[0.00] Architected cp15 timer(s) running at 16.66MHz (virt).
[0.00] clocksource: arch_sys_counter: mask: 0xff max_cycles: 0x3d7a162dd, max_idle_ns: 440795202225 ns
[0.01] sched_clock: 56 bits at 16MHz, resolution 60ns, wraps every 219902321ns
[0.55] Console: colour dummy device 80x25
[0.000217] console [tty0] enabled
[0.000229] Calibrating delay loop (skipped), value calculated using timer frequency.. 33.32 BogoMIPS (lpj=66640)
[0.000238] pid_max: default: 32768 minimum: 301
[0.000263] Security Framework initialized
[0.000278] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
[0.000283] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
[0.000678] ASID allocator initialised with 65536 entries
[0.020145] EFI services will not be available.
[0.036098] Detected PIPT I-cache on CPU1
[0.036121] CPU1: Booted secondary processor [411fd073]
[0.048089] Detected PIPT I-cache on CPU2
[0.048097] CPU2: Booted secondary processor [411fd073]
[0.060095] Detected PIPT I-cache on CPU3
[0.060104] CPU3: Booted secondary processor [411

[net-next 08/14] igb: Always enable VLAN 0 even if 8021q is not loaded

2016-02-15 Thread Jeff Kirsher

From: Alexander Duyck 

This patch makes it so that we always add VLAN 0.  This is important as we
need to guarantee the PF can receive untagged frames in the case of SR-IOV
being enabled but VLAN filtering not being enabled in the kernel.

Signed-off-by: Alexander Duyck 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/igb/igb_main.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 9461480..f3e1738 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -7177,11 +7177,12 @@ static int igb_vlan_rx_kill_vid(struct net_device *netdev,
 
 static void igb_restore_vlan(struct igb_adapter *adapter)
 {
-   u16 vid;
+   u16 vid = 1;
 
igb_vlan_mode(adapter->netdev, adapter->netdev->features);
+   igb_vlan_rx_add_vid(adapter->netdev, htons(ETH_P_8021Q), 0);
 
-   for_each_set_bit(vid, adapter->active_vlans, VLAN_N_VID)
+   for_each_set_bit_from(vid, adapter->active_vlans, VLAN_N_VID)
igb_vlan_rx_add_vid(adapter->netdev, htons(ETH_P_8021Q), vid);
 }
 
-- 
2.5.0
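
The effect of the igb_restore_vlan() change can be checked outside the
kernel with a small model: VID 0 is added unconditionally, and the walk
over the active-VLAN bitmap starts at 1 (the job for_each_set_bit_from()
does in the patch), so VID 0 is never added twice. demo_restore_vlan(),
demo_test_bit() and the recording callback are hypothetical stand-ins for
the driver and the kernel bitmap helpers.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_VLAN_N_VID    4096
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical test_bit() equivalent over an unsigned long bitmap. */
static bool demo_test_bit(unsigned int nr, const unsigned long *map)
{
	return (map[nr / DEMO_BITS_PER_LONG] >> (nr % DEMO_BITS_PER_LONG)) & 1UL;
}

/* Add VID 0 first, then every active VID starting from 1. */
static void demo_restore_vlan(const unsigned long *active,
			      void (*add_vid)(uint16_t vid))
{
	unsigned int vid;

	add_vid(0);
	for (vid = 1; vid < DEMO_VLAN_N_VID; vid++)
		if (demo_test_bit(vid, active))
			add_vid((uint16_t)vid);
}

/* Example callback that records the first few VIDs it is given. */
static unsigned int demo_added[8], demo_n_added;
static void demo_record_vid(uint16_t vid)
{
	if (demo_n_added < 8)
		demo_added[demo_n_added] = vid;
	demo_n_added++;
}
```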

[net-next 04/14] igb: clean up code for setting MAC address

2016-02-15 Thread Jeff Kirsher

From: Alexander Duyck 

Drop a bunch of hand-written byte swapping code in favor of the standard
le32_to_cpup()/le16_to_cpup() helpers.  The registers are little endian
registers storing a big endian value, so if we read the MAC address array
as little endian the values will already be in the proper layout for the
registers.

Signed-off-by: Alexander Duyck 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/igb/igb_main.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 85c47aa..02f19e4 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -7698,15 +7698,14 @@ static void igb_io_resume(struct pci_dev *pdev)
 static void igb_rar_set_qsel(struct igb_adapter *adapter, u8 *addr, u32 index,
 u8 qsel)
 {
-   u32 rar_low, rar_high;
struct e1000_hw *hw = &adapter->hw;
+   u32 rar_low, rar_high;
 
/* HW expects these in little endian so we reverse the byte order
-* from network order (big endian) to little endian
+* from network order (big endian) to CPU endian
 */
-   rar_low = ((u32) addr[0] | ((u32) addr[1] << 8) |
-  ((u32) addr[2] << 16) | ((u32) addr[3] << 24));
-   rar_high = ((u32) addr[4] | ((u32) addr[5] << 8));
+   rar_low = le32_to_cpup((__le32 *)(addr));
+   rar_high = le16_to_cpup((__le16 *)(addr + 4));
 
/* Indicate to hardware the Address is Valid. */
rar_high |= E1000_RAH_AV;
-- 
2.5.0
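
The byte order argument in this commit message can be made concrete:
le32_to_cpup()/le16_to_cpup() interpret the first four and the next two
MAC bytes as little-endian numbers, which on any host gives the same
values as the removed shift-and-OR code. A portable sketch of that value
(the demo_mac_to_rar() name is hypothetical, and the E1000_RAH_AV valid
bit is left out):

```c
#include <stdint.h>

/*
 * Byte 0 of the MAC lands in bits 7:0 of the low register and byte 4
 * in bits 7:0 of the high register, independent of host endianness.
 */
static void demo_mac_to_rar(const uint8_t addr[6],
			    uint32_t *rar_low, uint32_t *rar_high)
{
	*rar_low = (uint32_t)addr[0] | ((uint32_t)addr[1] << 8) |
		   ((uint32_t)addr[2] << 16) | ((uint32_t)addr[3] << 24);
	*rar_high = (uint32_t)addr[4] | ((uint32_t)addr[5] << 8);
}
```

For 00:1b:21:aa:bb:cc this yields a low word of 0xaa211b00 and a high
word of 0x0000ccbb: the first byte on the wire sits in the least
significant byte of the low register.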

[net-next 00/14][pull request] 1GbE Intel Wired LAN Driver Updates 2016-02-15

2016-02-15 Thread Jeff Kirsher

This series contains updates to igb only.

Shota Suzuki cleans up unnecessary flag setting for 82576 in
igb_set_flag_queue_pairs(): the default block already sets
IGB_FLAG_QUEUE_PAIRS to the correct value, so the e1000_82576 code block
is not necessary and can simply fall through.  He then fixes an issue
where IGB_FLAG_QUEUE_PAIRS, which can now be set with the "ethtool -L"
option, was never cleared unless the driver was reloaded; the queue
pairing is now cleared when it becomes unnecessary as a result of
"ethtool -L".

Mitch fixes igbvf so that it no longer gives up the first time it fails
to get the hardware mailbox lock.  This can happen when the PF-VF
communication channel is heavily loaded, and it caused a complete
communications failure between the PF and VF drivers, so a counter and a
delay are added and the driver now retries ten times before giving up on
getting the mailbox lock.
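
The retry behavior described for the mailbox lock amounts to a bounded
loop. A minimal sketch, assuming a try_lock() callback that reports
whether the lock was obtained and a delay() callback standing in for the
pause between attempts; the names and the -1 return value are
hypothetical, and only the count of ten comes from the description above.

```c
#include <stdbool.h>

#define DEMO_MBX_LOCK_RETRIES 10   /* "retry ten times" */

/*
 * Returns 0 once try_lock() succeeds, or -1 after
 * DEMO_MBX_LOCK_RETRIES failed attempts.
 */
static int demo_obtain_mbx_lock(bool (*try_lock)(void), void (*delay)(void))
{
	int countdown = DEMO_MBX_LOCK_RETRIES;

	while (countdown--) {
		if (try_lock())
			return 0;
		delay();   /* a udelay() or similar in a real driver */
	}
	return -1;
}

/* Example hooks: the lock becomes available on attempt demo_succeed_on. */
static int demo_attempts, demo_succeed_on;
static bool demo_try_lock(void) { return ++demo_attempts >= demo_succeed_on; }
static void demo_no_delay(void) { }
```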

The remaining patches in the series are from Alex Duyck, starting with
cleaning up the code that sets the MAC address.  He then refactors the
VFTA and VLVF configuration to simplify it and align it with the similar
setup in the ixgbe driver.  He fixes an issue where the VLAN header size
was being added to the value programmed into the RLPML registers; these
registers already take the VLAN headers into account when determining
the maximum packet length, so the code that adds the size can be
dropped.  He also cleans up the VF port-based VLAN configuration, fixes
the igb driver so that it can fully support SR-IOV or the recently added
NTUPLE filtering while still allowing VLAN promiscuous mode, and adds the
ability to use the bridge utility to add an FDB entry for the PF to an
igb port.

The following are changes since commit 667f00630ebefc4d73aa105c6ab254e4aec867f8:
  Merge branch 'local-checksum-offload'
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 1GbE

Alexander Duyck (11):
  igb: clean up code for setting MAC address
  igb: Refactor VFTA configuration
  igb: Allow asymmetric configuration of MTU versus Rx frame size
  igb: Do not factor VLANs into RLPML calculation
  igb: Always enable VLAN 0 even if 8021q is not loaded
  igb: Merge VLVF configuration into igb_vfta_set
  igb: Clean-up configuration of VF port VLANs
  igb: Add support for VLAN promiscuous with SR-IOV and NTUPLE
  igb: Drop unnecessary checks in transmit path
  igb: Enable use of "bridge fdb add" to set unicast table entries
  igb: Add workaround for VLAN tag stripping on 82576

Mitch Williams (1):
  igb/igbvf: don't give up

Shota Suzuki (2):
  igb: Remove unnecessary flag setting in igb_set_flag_queue_pairs()
  igb: Unpair the queues when changing the number of queues

 drivers/net/ethernet/intel/igb/e1000_82575.c   |  39 +-
 drivers/net/ethernet/intel/igb/e1000_defines.h |   3 +-
 drivers/net/ethernet/intel/igb/e1000_hw.h  |   2 +-
 drivers/net/ethernet/intel/igb/e1000_mac.c | 213 ---
 drivers/net/ethernet/intel/igb/e1000_mac.h |   5 +-
 drivers/net/ethernet/intel/igb/e1000_mbx.c |  18 +-
 drivers/net/ethernet/intel/igb/igb.h   |   2 +-
 drivers/net/ethernet/intel/igb/igb_main.c  | 774 ++---
 drivers/net/ethernet/intel/igbvf/mbx.c |  20 +-
 9 files changed, 637 insertions(+), 439 deletions(-)

-- 
2.5.0

[net-next 09/14] igb: Merge VLVF configuration into igb_vfta_set

2016-02-15 Thread Jeff Kirsher

From: Alexander Duyck 

This change merges the configuration of the VLVF registers into the
setting of the VFTA register.  Doing this simplifies the logic and reuses
functionality we have already added for ixgbe, making both drivers
easier to maintain.
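
The two parts of the reworked igb_vfta_set() can be modeled outside the
driver: the VLVF search from the igb_find_vlvf_slot() hunk in this patch
(top-down scan, remembering the first empty slot unless vlvf_bypass is
set) and the VFTA bit update, where VLAN id v maps to register v / 32,
bit v % 32, and only bits that actually change are toggled. The table
size of 32 VLVF entries, the error value and all demo_* names are
assumptions for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_VLVF_ENTRIES 32       /* assumed table size for the demo */
#define DEMO_VLANID_MASK  0x0FFFu
#define DEMO_ERR_NO_SPACE (-1)
#define DEMO_VFTA_REGS    128      /* 128 x 32 bits = 4096 VLAN ids */

/* Scan slots (ENTRIES - 1) .. 1 for @vlan, noting the first empty one. */
static int demo_find_vlvf_slot(const uint32_t vlvf[DEMO_VLVF_ENTRIES],
			       uint32_t vlan, bool vlvf_bypass)
{
	int first_empty = vlvf_bypass ? DEMO_ERR_NO_SPACE : 0;
	int i;

	if (vlan == 0)
		return 0;   /* special case, as in the patch */

	for (i = DEMO_VLVF_ENTRIES; --i > 0;) {
		uint32_t bits = vlvf[i] & DEMO_VLANID_MASK;

		if (bits == vlan)
			return i;
		if (!first_empty && !bits)
			first_empty = i;
	}
	return first_empty ? first_empty : DEMO_ERR_NO_SPACE;
}

/* Returns the bits that changed; 0 means no register write is needed. */
static uint32_t demo_vfta_update(uint32_t vfta[DEMO_VFTA_REGS],
				 uint32_t vlan, bool vlan_on)
{
	uint32_t regidx = vlan / 32;
	uint32_t delta = 1u << (vlan % 32);

	delta &= vlan_on ? ~vfta[regidx] : vfta[regidx];
	vfta[regidx] ^= delta;
	return delta;
}
```

Computing the delta first is what lets the driver skip the VFTA write
when the bit is already in the requested state.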

Signed-off-by: Alexander Duyck 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/igb/e1000_mac.c | 119 -
 drivers/net/ethernet/intel/igb/e1000_mac.h |   3 +-
 drivers/net/ethernet/intel/igb/igb_main.c  | 105 +
 3 files changed, 135 insertions(+), 92 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/e1000_mac.c b/drivers/net/ethernet/intel/igb/e1000_mac.c
index 97f6fae..07cf4fe 100644
--- a/drivers/net/ethernet/intel/igb/e1000_mac.c
+++ b/drivers/net/ethernet/intel/igb/e1000_mac.c
@@ -141,21 +141,69 @@ void igb_init_rx_addrs(struct e1000_hw *hw, u16 rar_count)
 }
 
 /**
+ *  igb_find_vlvf_slot - find the VLAN id or the first empty slot
+ *  @hw: pointer to hardware structure
+ *  @vlan: VLAN id to write to VLAN filter
+ *  @vlvf_bypass: skip VLVF if no match is found
+ *
+ *  return the VLVF index where this VLAN id should be placed
+ *
+ **/
+static s32 igb_find_vlvf_slot(struct e1000_hw *hw, u32 vlan, bool vlvf_bypass)
+{
+   s32 regindex, first_empty_slot;
+   u32 bits;
+
+   /* short cut the special case */
+   if (vlan == 0)
+   return 0;
+
+   /* if vlvf_bypass is set we don't want to use an empty slot, we
+* will simply bypass the VLVF if there are no entries present in the
+* VLVF that contain our VLAN
+*/
+   first_empty_slot = vlvf_bypass ? -E1000_ERR_NO_SPACE : 0;
+
+   /* Search for the VLAN id in the VLVF entries. Save off the first empty
+* slot found along the way.
+*
+* pre-decrement loop covering (IXGBE_VLVF_ENTRIES - 1) .. 1
+*/
+   for (regindex = E1000_VLVF_ARRAY_SIZE; --regindex > 0;) {
+   bits = rd32(E1000_VLVF(regindex)) & E1000_VLVF_VLANID_MASK;
+   if (bits == vlan)
+   return regindex;
+   if (!first_empty_slot && !bits)
+   first_empty_slot = regindex;
+   }
+
+   return first_empty_slot ? : -E1000_ERR_NO_SPACE;
+}
+
+/**
  *  igb_vfta_set - enable or disable vlan in VLAN filter table
  *  @hw: pointer to the HW structure
  *  @vlan: VLAN id to add or remove
+ *  @vind: VMDq output index that maps queue to VLAN id
  *  @vlan_on: if true add filter, if false remove
  *
  *  Sets or clears a bit in the VLAN filter table array based on VLAN id
  *  and if we are adding or removing the filter
  **/
-s32 igb_vfta_set(struct e1000_hw *hw, u32 vlan, bool vlan_on)
+s32 igb_vfta_set(struct e1000_hw *hw, u32 vlan, u32 vind,
+bool vlan_on, bool vlvf_bypass)
 {
struct igb_adapter *adapter = hw->back;
-   u32 regidx, vfta_delta, vfta;
+   u32 regidx, vfta_delta, vfta, bits;
+   s32 vlvf_index;
 
-   if (vlan > 4095)
-   return E1000_ERR_PARAM;
+   if ((vlan > 4095) || (vind > 7))
+   return -E1000_ERR_PARAM;
+
+   /* this is a 2 part operation - first the VFTA, then the
+* VLVF and VLVFB if VT Mode is set
+* We don't write the VFTA until we know the VLVF part succeeded.
+*/
 
/* Part 1
 * The VFTA is a bitstring made up of 128 32-bit registers
@@ -174,6 +222,69 @@ s32 igb_vfta_set(struct e1000_hw *hw, u32 vlan, bool vlan_on)
vfta_delta &= vlan_on ? ~vfta : vfta;
vfta ^= vfta_delta;
 
+   /* Part 2
+* If VT Mode is set
+*   Either vlan_on
+* make sure the VLAN is in VLVF
+* set the vind bit in the matching VLVFB
+*   Or !vlan_on
+* clear the pool bit and possibly the vind
+*/
+   if (!adapter->vfs_allocated_count)
+   goto vfta_update;
+
+   vlvf_index = igb_find_vlvf_slot(hw, vlan, vlvf_bypass);
+   if (vlvf_index < 0) {
+   if (vlvf_bypass)
+   goto vfta_update;
+   return vlvf_index;
+   }
+
+   bits = rd32(E1000_VLVF(vlvf_index));
+
+   /* set the pool bit */
+   bits |= 1 << (E1000_VLVF_POOLSEL_SHIFT + vind);
+   if (vlan_on)
+   goto vlvf_update;
+
+   /* clear the pool bit */
+   bits ^= 1 << (E1000_VLVF_POOLSEL_SHIFT + vind);
+
+   if (!(bits & E1000_VLVF_POOLSEL_MASK)) {
+   /* Clear VFTA first, then disable VLVF.  Otherwise
+* we run the risk of stray packets leaking into
+* the PF via the default pool
+*/
+   if (vfta_delta)
+   hw->mac.ops.write_vfta(hw, regidx, vfta);
+
+   /* disable VLVF and clear remaining bit from pool */
+   wr32(E1000_VLVF(vlvf_index), 0);
+
+   return 0;

[net-next 13/14] igb: Enable use of "bridge fdb add" to set unicast table entries

2016-02-15 Thread Jeff Kirsher

From: Alexander Duyck 

This change makes it so that we can use the bridge utility to add an FDB
entry for the PF to an igb port.  By doing this we can enable the VFs to
talk to virtual ports residing on top of the PF.

In addition, this should address issues with MACVLANs residing on top of
the PF, since they would have run into similar problems when added to the
PF with SR-IOV enabled.
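The capacity check that igb_ndo_fdb_add() makes before deferring to ndo_dflt_fdb_add() is plain arithmetic on the receive address (RAR) table. It appears to reserve one entry per allocated VF plus one for the PF's own MAC. The helper below is a hypothetical user-space sketch of that check. The RAR count in the usage note is only an example value, not a claim about any specific part.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the check in igb_ndo_fdb_add(): (vfs_allocated + 1)
 * RAR entries are kept back, and the remainder bounds how many extra
 * unicast filters may be stacked on the netdev. addr_needs_rar models
 * is_unicast_ether_addr(addr) || is_link_local_ether_addr(addr). */
static int fdb_add_would_fit(int rar_entry_count, int vfs_allocated,
                             int uc_count, bool addr_needs_rar)
{
    int rar_entries = rar_entry_count - (vfs_allocated + 1);

    if (addr_needs_rar && uc_count >= rar_entries)
        return -ENOMEM;
    return 0;   /* the real code would now call ndo_dflt_fdb_add() */
}
```

For example, with 24 RAR entries and 7 VFs, 16 entries remain, so a 17th unicast address is refused with -ENOMEM. Multicast addresses skip the check entirely.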

Signed-off-by: Alexander Duyck 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/igb/igb_main.c | 39 ---
 1 file changed, 30 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index bb5be40..e9bdad7 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2067,6 +2067,25 @@ static int igb_set_features(struct net_device *netdev,
return 0;
 }
 
+static int igb_ndo_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
+  struct net_device *dev,
+  const unsigned char *addr, u16 vid,
+  u16 flags)
+{
+   /* guarantee we can provide a unique filter for the unicast address */
+   if (is_unicast_ether_addr(addr) || is_link_local_ether_addr(addr)) {
+   struct igb_adapter *adapter = netdev_priv(dev);
+   struct e1000_hw *hw = &adapter->hw;
+   int vfn = adapter->vfs_allocated_count;
+   int rar_entries = hw->mac.rar_entry_count - (vfn + 1);
+
+   if (netdev_uc_count(dev) >= rar_entries)
+   return -ENOMEM;
+   }
+
+   return ndo_dflt_fdb_add(ndm, tb, dev, addr, vid, flags);
+}
+
 static const struct net_device_ops igb_netdev_ops = {
.ndo_open   = igb_open,
.ndo_stop   = igb_close,
@@ -2090,6 +2109,7 @@ static const struct net_device_ops igb_netdev_ops = {
 #endif
.ndo_fix_features   = igb_fix_features,
.ndo_set_features   = igb_set_features,
+   .ndo_fdb_add        = igb_ndo_fdb_add,
.ndo_features_check = passthru_features_check,
 };
 
@@ -4132,15 +4152,16 @@ static void igb_set_rx_mode(struct net_device *netdev)
vmolr |= E1000_VMOLR_ROMPE;
}
}
-   /* Write addresses to available RAR registers, if there is not
-* sufficient space to store all the addresses then enable
-* unicast promiscuous mode
-*/
-   count = igb_write_uc_addr_list(netdev);
-   if (count < 0) {
-   rctl |= E1000_RCTL_UPE;
-   vmolr |= E1000_VMOLR_ROPE;
-   }
+   }
+
+   /* Write addresses to available RAR registers, if there is not
+* sufficient space to store all the addresses then enable
+* unicast promiscuous mode
+*/
+   count = igb_write_uc_addr_list(netdev);
+   if (count < 0) {
+   rctl |= E1000_RCTL_UPE;
+   vmolr |= E1000_VMOLR_ROPE;
}
 
/* enable VLAN filtering by default */
-- 
2.5.0

[net-next 03/14] igb/igbvf: don't give up

2016-02-15 Thread Jeff Kirsher

From: Mitch Williams 

The driver shouldn't just give up if it fails to get the hardware
mailbox lock. Failing to get the lock can happen when the PF-VF
communication channel is heavily loaded, and giving up then causes a
complete communications failure between the PF and VF drivers.

Add a counter and a delay. The driver will now retry ten times, waiting
one millisecond between retries.
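The retry shape used in both igb_obtain_mbx_lock_pf() and e1000_obtain_mbx_lock_vf() can be exercised in user space against a fake mailbox. Struct fake_mbx and its fields are hypothetical test scaffolding, and the 1 ms udelay() is only counted, not performed. One detail is worth noticing: do { } while (count-- > 0) starting at 10 gives one initial attempt plus ten retries, so up to 11 attempts in total.

```c
#include <stdbool.h>

/* Hypothetical fake mailbox: ownership is granted starting with the
 * succeed_on-th attempt (0 = never), simulating a loaded PF-VF channel. */
struct fake_mbx {
    int calls;       /* number of ownership attempts made */
    int succeed_on;  /* attempt number that first succeeds, 0 = never */
    int delays_ms;   /* simulated time spent in udelay(1000) */
};

/* Models "write PFU/VFU, then read it back to see if we own the buffer". */
static bool fake_take(struct fake_mbx *m)
{
    m->calls++;
    return m->succeed_on && m->calls >= m->succeed_on;
}

/* Same loop structure as the patch: try, and on failure wait 1 ms,
 * until the counter runs out. */
static int obtain_mbx_lock(struct fake_mbx *m)
{
    int count = 10;

    do {
        if (fake_take(m))
            return 0;
        m->delays_ms += 1;   /* stands in for udelay(1000) */
    } while (count-- > 0);
    return -1;               /* stands in for -E1000_ERR_MBX */
}
```

In the worst case a caller now spins for about 11 ms instead of failing immediately. That is the trade-off this patch accepts in exchange for surviving a busy mailbox.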

Signed-off-by: Mitch Williams 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/igb/e1000_mbx.c | 18 --
 drivers/net/ethernet/intel/igbvf/mbx.c | 20 +---
 2 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/e1000_mbx.c b/drivers/net/ethernet/intel/igb/e1000_mbx.c
index 162cc49..10f5c9e 100644
--- a/drivers/net/ethernet/intel/igb/e1000_mbx.c
+++ b/drivers/net/ethernet/intel/igb/e1000_mbx.c
@@ -322,14 +322,20 @@ static s32 igb_obtain_mbx_lock_pf(struct e1000_hw *hw, u16 vf_number)
 {
s32 ret_val = -E1000_ERR_MBX;
u32 p2v_mailbox;
+   int count = 10;
 
-   /* Take ownership of the buffer */
-   wr32(E1000_P2VMAILBOX(vf_number), E1000_P2VMAILBOX_PFU);
+   do {
+   /* Take ownership of the buffer */
+   wr32(E1000_P2VMAILBOX(vf_number), E1000_P2VMAILBOX_PFU);
 
-   /* reserve mailbox for vf use */
-   p2v_mailbox = rd32(E1000_P2VMAILBOX(vf_number));
-   if (p2v_mailbox & E1000_P2VMAILBOX_PFU)
-   ret_val = 0;
+   /* reserve mailbox for vf use */
+   p2v_mailbox = rd32(E1000_P2VMAILBOX(vf_number));
+   if (p2v_mailbox & E1000_P2VMAILBOX_PFU) {
+   ret_val = 0;
+   break;
+   }
+   udelay(1000);
+   } while (count-- > 0);
 
return ret_val;
 }
diff --git a/drivers/net/ethernet/intel/igbvf/mbx.c b/drivers/net/ethernet/intel/igbvf/mbx.c
index 7b6cb4c..01752f4 100644
--- a/drivers/net/ethernet/intel/igbvf/mbx.c
+++ b/drivers/net/ethernet/intel/igbvf/mbx.c
@@ -234,13 +234,19 @@ static s32 e1000_check_for_rst_vf(struct e1000_hw *hw)
 static s32 e1000_obtain_mbx_lock_vf(struct e1000_hw *hw)
 {
s32 ret_val = -E1000_ERR_MBX;
-
-   /* Take ownership of the buffer */
-   ew32(V2PMAILBOX(0), E1000_V2PMAILBOX_VFU);
-
-   /* reserve mailbox for VF use */
-   if (e1000_read_v2p_mailbox(hw) & E1000_V2PMAILBOX_VFU)
-   ret_val = E1000_SUCCESS;
+   int count = 10;
+
+   do {
+   /* Take ownership of the buffer */
+   ew32(V2PMAILBOX(0), E1000_V2PMAILBOX_VFU);
+
+   /* reserve mailbox for VF use */
+   if (e1000_read_v2p_mailbox(hw) & E1000_V2PMAILBOX_VFU) {
+   ret_val = 0;
+   break;
+   }
+   udelay(1000);
+   } while (count-- > 0);
 
return ret_val;
 }
-- 
2.5.0

[net-next 05/14] igb: Refactor VFTA configuration

2016-02-15 Thread Jeff Kirsher

From: Alexander Duyck 

This patch starts the clean-up process on the VFTA configuration.
Specifically in this patch I attempt to address and simplify several items
while also updating the code to bring it more in line with what is already
in ixgbe.
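The core of the VFTA update, which this series keeps as "Part 1" of igb_vfta_set(), is a small bit trick. The table is 128 32-bit registers, and vfta_delta keeps the VLAN's bit only if flipping it moves the register toward the requested state. Below is a user-space sketch over a plain array. The function name vfta_update() and its "did it change" return value are hypothetical and added for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define VFTA_REGS 128   /* 128 x 32-bit registers = one bit per VLAN id 0..4095 */

/* Apply the add/remove request for @vlan to the shadow table and report
 * whether the register value changed (i.e. whether a write is needed). */
static bool vfta_update(uint32_t vfta[VFTA_REGS], uint32_t vlan, bool vlan_on)
{
    uint32_t regidx = vlan / 32;            /* which 32-bit register */
    uint32_t delta = 1u << (vlan % 32);     /* which bit inside it */
    uint32_t val = vfta[regidx];

    /* keep the bit only if it is not already in the requested state */
    delta &= vlan_on ? ~val : val;
    val ^= delta;
    vfta[regidx] = val;
    return delta != 0;
}
```

A zero delta means the hardware register does not need to be rewritten. The VLVF code added later in the series also relies on this when it checks vfta_delta before calling write_vfta.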

Signed-off-by: Alexander Duyck 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/igb/e1000_82575.c |  37 --
 drivers/net/ethernet/intel/igb/e1000_hw.h|   2 +-
 drivers/net/ethernet/intel/igb/e1000_mac.c   | 102 +--
 drivers/net/ethernet/intel/igb/e1000_mac.h   |   2 +-
 4 files changed, 67 insertions(+), 76 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/e1000_82575.c b/drivers/net/ethernet/intel/igb/e1000_82575.c
index adb33e2..fff5052 100644
--- a/drivers/net/ethernet/intel/igb/e1000_82575.c
+++ b/drivers/net/ethernet/intel/igb/e1000_82575.c
@@ -34,6 +34,7 @@
 #include "e1000_mac.h"
 #include "e1000_82575.h"
 #include "e1000_i210.h"
+#include "igb.h"
 
 static s32  igb_get_invariants_82575(struct e1000_hw *);
 static s32  igb_acquire_phy_82575(struct e1000_hw *);
@@ -71,6 +72,32 @@ static s32 igb_update_nvm_checksum_i350(struct e1000_hw *hw);
 static const u16 e1000_82580_rxpbs_table[] = {
36, 72, 144, 1, 2, 4, 8, 16, 35, 70, 140 };
 
+/* Due to a hw errata, if the host tries to  configure the VFTA register
+ * while performing queries from the BMC or DMA, then the VFTA in some
+ * cases won't be written.
+ */
+
+/**
+ *  igb_write_vfta_i350 - Write value to VLAN filter table
+ *  @hw: pointer to the HW structure
+ *  @offset: register offset in VLAN filter table
+ *  @value: register value written to VLAN filter table
+ *
+ *  Writes value at the given offset in the register array which stores
+ *  the VLAN filter table.
+ **/
+static void igb_write_vfta_i350(struct e1000_hw *hw, u32 offset, u32 value)
+{
+   struct igb_adapter *adapter = hw->back;
+   int i;
+
+   for (i = 10; i--;)
+   array_wr32(E1000_VFTA, offset, value);
+
+   wrfl();
+   adapter->shadow_vfta[offset] = value;
+}
+
 /**
  *  igb_sgmii_uses_mdio_82575 - Determine if I2C pins are for external MDIO
  *  @hw: pointer to the HW structure
@@ -429,6 +456,11 @@ static s32 igb_init_mac_params_82575(struct e1000_hw *hw)
mac->ops.release_swfw_sync = igb_release_swfw_sync_82575;
}
 
+   if ((hw->mac.type == e1000_i350) || (hw->mac.type == e1000_i354))
+   mac->ops.write_vfta = igb_write_vfta_i350;
+   else
+   mac->ops.write_vfta = igb_write_vfta;
+
/* Set if part includes ASF firmware */
mac->asf_firmware_present = true;
/* Set if manageability features are enabled. */
@@ -1517,10 +1549,7 @@ static s32 igb_init_hw_82575(struct e1000_hw *hw)
 
/* Disabling VLAN filtering */
hw_dbg("Initializing the IEEE VLAN\n");
-   if ((hw->mac.type == e1000_i350) || (hw->mac.type == e1000_i354))
-   igb_clear_vfta_i350(hw);
-   else
-   igb_clear_vfta(hw);
+   igb_clear_vfta(hw);
 
/* Setup the receive address */
igb_init_rx_addrs(hw, rar_count);
diff --git a/drivers/net/ethernet/intel/igb/e1000_hw.h b/drivers/net/ethernet/intel/igb/e1000_hw.h
index 4034207..f0c416e 100644
--- a/drivers/net/ethernet/intel/igb/e1000_hw.h
+++ b/drivers/net/ethernet/intel/igb/e1000_hw.h
@@ -325,7 +325,7 @@ struct e1000_mac_operations {
s32 (*get_thermal_sensor_data)(struct e1000_hw *);
s32 (*init_thermal_sensor_thresh)(struct e1000_hw *);
 #endif
-
+   void (*write_vfta)(struct e1000_hw *, u32, u32);
 };
 
 struct e1000_phy_operations {
diff --git a/drivers/net/ethernet/intel/igb/e1000_mac.c b/drivers/net/ethernet/intel/igb/e1000_mac.c
index 2a88595..97f6fae 100644
--- a/drivers/net/ethernet/intel/igb/e1000_mac.c
+++ b/drivers/net/ethernet/intel/igb/e1000_mac.c
@@ -92,10 +92,8 @@ void igb_clear_vfta(struct e1000_hw *hw)
 {
u32 offset;
 
-   for (offset = 0; offset < E1000_VLAN_FILTER_TBL_SIZE; offset++) {
-   array_wr32(E1000_VFTA, offset, 0);
-   wrfl();
-   }
+   for (offset = E1000_VLAN_FILTER_TBL_SIZE; offset--;)
+   hw->mac.ops.write_vfta(hw, offset, 0);
 }
 
 /**
@@ -107,54 +105,14 @@ void igb_clear_vfta(struct e1000_hw *hw)
  *  Writes value at the given offset in the register array which stores
  *  the VLAN filter table.
  **/
-static void igb_write_vfta(struct e1000_hw *hw, u32 offset, u32 value)
+void igb_write_vfta(struct e1000_hw *hw, u32 offset, u32 value)
 {
+   struct igb_adapter *adapter = hw->back;
+
array_wr32(E1000_VFTA, offset, value);
wrfl();
-}
-
-/* Due to a hw errata, if the host tries to  configure the VFTA register
- * while performing queries from the BMC or DMA, then the VFTA in some
- * cases won't be written.
- */
-
-/**
- *  igb_clear_vfta_i350 - Clear VLAN filter table
- *  @hw: pointer to the HW structure
- *
- *  Clears the register arr

[net-next 06/14] igb: Allow asymmetric configuration of MTU versus Rx frame size

2016-02-15 Thread Jeff Kirsher

From: Alexander Duyck 

Since the igb driver is using page-based receive there is no point in
limiting the Rx capabilities of the device.  The driver can receive 9K
jumbo frames at all times.  The only changes needed due to MTU changes are
updates for the FIFO sizes and flow-control watermarks.

Update the maximum frame size to reflect the 9.5K limitation of the
hardware, and replace all instances of max_frame_size with
MAX_JUMBO_FRAME_SIZE when referring to an Rx FIFO or frame.
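The new FIFO floors in igb_reset() reduce to a little integer arithmetic. The Rx floor is now a fixed 10 KB (9728 bytes rounded up to whole KB). The Tx floor counts 512-byte blocks for one frame, which equals the old "two frames, rounded up to 1 KB" figure. The sketch below checks that equivalence. It assumes sizeof(union e1000_adv_tx_desc) is 16 bytes and defines a local DIV_ROUND_UP with the same meaning as the kernel macro.

```c
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

#define MAX_JUMBO_FRAME_SIZE 0x2600u  /* 9728 bytes, value from this patch */
#define ADV_TX_DESC_SIZE     16u      /* assumed sizeof(union e1000_adv_tx_desc) */
#define ETH_FCS_LEN          4u       /* hardware appends the FCS on Tx */

/* Rx FIFO floor in KB: one maximum-size jumbo frame, regardless of MTU. */
static uint32_t min_rx_space_kb(void)
{
    return DIV_ROUND_UP(MAX_JUMBO_FRAME_SIZE, 1024u);
}

/* Tx FIFO floor in KB, new form: 512-byte blocks for one frame == KB for two. */
static uint32_t min_tx_space_kb(uint32_t max_frame_size)
{
    uint32_t per_frame = max_frame_size + ADV_TX_DESC_SIZE - ETH_FCS_LEN;

    return DIV_ROUND_UP(per_frame, 512u);
}

/* Tx FIFO floor in KB, old form: double the frame, then ALIGN to 1 KB and >> 10. */
static uint32_t old_min_tx_space_kb(uint32_t max_frame_size)
{
    uint32_t bytes = (max_frame_size + ADV_TX_DESC_SIZE - ETH_FCS_LEN) * 2;

    return DIV_ROUND_UP(bytes, 1024u);
}
```

For a 1522-byte frame (1500-byte MTU plus Ethernet header, FCS and one VLAN tag), both forms give 3 KB of Tx FIFO. The Rx floor stays at 10 KB even when the MTU is small.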

Signed-off-by: Alexander Duyck 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/igb/e1000_defines.h |   3 +-
 drivers/net/ethernet/intel/igb/igb_main.c  | 107 +
 2 files changed, 42 insertions(+), 68 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h b/drivers/net/ethernet/intel/igb/e1000_defines.h
index c3c598c..e9f23ee 100644
--- a/drivers/net/ethernet/intel/igb/e1000_defines.h
+++ b/drivers/net/ethernet/intel/igb/e1000_defines.h
@@ -356,7 +356,8 @@
 /* Ethertype field values */
 #define ETHERNET_IEEE_VLAN_TYPE 0x8100  /* 802.3ac packet */
 
-#define MAX_JUMBO_FRAME_SIZE   0x3F00
+/* As per the EAS the maximum supported size is 9.5KB (9728 bytes) */
+#define MAX_JUMBO_FRAME_SIZE   0x2600
 
 /* PBA constants */
 #define E1000_PBA_34K 0x0022
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 02f19e4..b676881 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -1862,7 +1862,7 @@ void igb_reset(struct igb_adapter *adapter)
struct e1000_hw *hw = &adapter->hw;
struct e1000_mac_info *mac = &hw->mac;
struct e1000_fc_info *fc = &hw->fc;
-   u32 pba = 0, tx_space, min_tx_space, min_rx_space, hwm;
+   u32 pba, hwm;
 
/* Repartition Pba for greater than 9k mtu
 * To take effect CTRL.RST is required.
@@ -1886,9 +1886,10 @@ void igb_reset(struct igb_adapter *adapter)
break;
}
 
-   if ((adapter->max_frame_size > ETH_FRAME_LEN + ETH_FCS_LEN) &&
-   (mac->type < e1000_82576)) {
-   /* adjust PBA for jumbo frames */
+   if (mac->type == e1000_82575) {
+   u32 min_rx_space, min_tx_space, needed_tx_space;
+
+   /* write Rx PBA so that hardware can report correct Tx PBA */
wr32(E1000_PBA, pba);
 
/* To maintain wire speed transmits, the Tx FIFO should be
@@ -1898,31 +1899,26 @@ void igb_reset(struct igb_adapter *adapter)
 * one full receive packet and is similarly rounded up and
 * expressed in KB.
 */
-   pba = rd32(E1000_PBA);
-   /* upper 16 bits has Tx packet buffer allocation size in KB */
-   tx_space = pba >> 16;
-   /* lower 16 bits has Rx packet buffer allocation size in KB */
-   pba &= 0xffff;
-   /* the Tx fifo also stores 16 bytes of information about the Tx
-* but don't include ethernet FCS because hardware appends it
+   min_rx_space = DIV_ROUND_UP(MAX_JUMBO_FRAME_SIZE, 1024);
+
+   /* The Tx FIFO also stores 16 bytes of information about the Tx
+* but don't include Ethernet FCS because hardware appends it.
+* We only need to round down to the nearest 512 byte block
+* count since the value we care about is 2 frames, not 1.
 */
-   min_tx_space = (adapter->max_frame_size +
-   sizeof(union e1000_adv_tx_desc) -
-   ETH_FCS_LEN) * 2;
-   min_tx_space = ALIGN(min_tx_space, 1024);
-   min_tx_space >>= 10;
-   /* software strips receive CRC, so leave room for it */
-   min_rx_space = adapter->max_frame_size;
-   min_rx_space = ALIGN(min_rx_space, 1024);
-   min_rx_space >>= 10;
+   min_tx_space = adapter->max_frame_size;
+   min_tx_space += sizeof(union e1000_adv_tx_desc) - ETH_FCS_LEN;
+   min_tx_space = DIV_ROUND_UP(min_tx_space, 512);
+
+   /* upper 16 bits has Tx packet buffer allocation size in KB */
+   needed_tx_space = min_tx_space - (rd32(E1000_PBA) >> 16);
 
/* If current Tx allocation is less than the min Tx FIFO size,
 * and the min Tx FIFO size is less than the current Rx FIFO
-* allocation, take space away from current Rx allocation
+* allocation, take space away from current Rx allocation.
 */
-   if (tx_space < min_tx_space &&
-   ((min_tx_space - tx_space) < pba)) {
-   pba = pba - (min_tx_space - tx_space);
+   if (needed_tx_space < pba) {
+   pba -= needed_tx_space;
 
/* if s

[net-next 10/14] igb: Clean-up configuration of VF port VLANs

2016-02-15 Thread Jeff Kirsher

From: Alexander Duyck 

This patch is meant to clean up the configuration of the VF port-based
VLANs.  The original logic was a bit muddled and had some
undesirable side effects such as VLANs being either completely stripped
from the port or VLANs being left when they shouldn't be.  The idea behind
this code is to avoid any events such as spurious spoof notifications when
we are removing one VLAN tag and replacing it with another.

Signed-off-by: Alexander Duyck 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/igb/igb_main.c | 181 ++
 1 file changed, 110 insertions(+), 71 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index e7c3a94..6876ae5 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -5786,53 +5786,6 @@ static void igb_clear_vf_vfta(struct igb_adapter *adapter, u32 vf)
}
 }
 
-static void igb_set_vmvir(struct igb_adapter *adapter, u32 vid, u32 vf)
-{
-   struct e1000_hw *hw = &adapter->hw;
-
-   if (vid)
-   wr32(E1000_VMVIR(vf), (vid | E1000_VMVIR_VLANA_DEFAULT));
-   else
-   wr32(E1000_VMVIR(vf), 0);
-}
-
-static int igb_ndo_set_vf_vlan(struct net_device *netdev,
-  int vf, u16 vlan, u8 qos)
-{
-   struct igb_adapter *adapter = netdev_priv(netdev);
-   struct e1000_hw *hw = &adapter->hw;
-   int err = 0;
-
-   if ((vf >= adapter->vfs_allocated_count) || (vlan > 4095) || (qos > 7))
-   return -EINVAL;
-   if (vlan || qos) {
-   err = igb_vfta_set(hw, vlan, vf, !!vlan, false);
-   if (err)
-   goto out;
-   igb_set_vmvir(adapter, vlan | (qos << VLAN_PRIO_SHIFT), vf);
-   igb_set_vmolr(adapter, vf, !vlan);
-   adapter->vf_data[vf].pf_vlan = vlan;
-   adapter->vf_data[vf].pf_qos = qos;
-   dev_info(&adapter->pdev->dev,
-"Setting VLAN %d, QOS 0x%x on VF %d\n", vlan, qos, vf);
-   if (test_bit(__IGB_DOWN, &adapter->state)) {
-   dev_warn(&adapter->pdev->dev,
-"The VF VLAN has been set, but the PF device is not up.\n");
-   dev_warn(&adapter->pdev->dev,
-"Bring the PF device up before attempting to use the VF device.\n");
-   }
-   } else {
-   igb_vfta_set(hw, adapter->vf_data[vf].pf_vlan, vf,
-false, false);
-   igb_set_vmvir(adapter, vlan, vf);
-   igb_set_vmolr(adapter, vf, true);
-   adapter->vf_data[vf].pf_vlan = 0;
-   adapter->vf_data[vf].pf_qos = 0;
-   }
-out:
-   return err;
-}
-
 static int igb_find_vlvf_entry(struct igb_adapter *adapter, int vid)
 {
struct e1000_hw *hw = &adapter->hw;
@@ -5853,23 +5806,25 @@ static int igb_find_vlvf_entry(struct igb_adapter *adapter, int vid)
return i;
 }
 
-static int igb_set_vf_vlan(struct igb_adapter *adapter, u32 *msgbuf, u32 vf)
+static s32 igb_set_vf_vlan(struct igb_adapter *adapter, u32 vid,
+  bool add, u32 vf)
 {
+   int pf_id = adapter->vfs_allocated_count;
struct e1000_hw *hw = &adapter->hw;
-   int add = (msgbuf[0] & E1000_VT_MSGINFO_MASK) >> E1000_VT_MSGINFO_SHIFT;
-   int vid = (msgbuf[1] & E1000_VLVF_VLANID_MASK);
-   int err = 0;
+   int err;
 
-   /* If in promiscuous mode we need to make sure the PF also has
-* the VLAN filter set.
+   /* If VLAN overlaps with one the PF is currently monitoring make
+* sure that we are able to allocate a VLVF entry.  This may be
+* redundant but it guarantees PF will maintain visibility to
+* the VLAN.
 */
-   if (add && (adapter->netdev->flags & IFF_PROMISC))
-   err = igb_vfta_set(hw, vid, adapter->vfs_allocated_count,
-  true, false);
-   if (err)
-   goto out;
+   if (add && (adapter->netdev->flags & IFF_PROMISC)) {
+   err = igb_vfta_set(hw, vid, pf_id, true, false);
+   if (err)
+   return err;
+   }
 
-   err = igb_vfta_set(hw, vid, vf, !!add, false);
+   err = igb_vfta_set(hw, vid, vf, add, false);
 
if (err)
goto out;
@@ -5904,23 +5859,107 @@ out:
return err;
 }
 
-static inline void igb_vf_reset(struct igb_adapter *adapter, u32 vf)
+static void igb_set_vmvir(struct igb_adapter *adapter, u32 vid, u32 vf)
 {
-   /* clear flags - except flag that indicates PF has set the MAC */
-   adapter->vf_data[vf].flags &= IGB_VF_FLAG_PF_SET_MAC;
-   adapter->vf_data[vf].last_nack = jiffies;
+   struct e1000_hw *hw = &adapter->hw;
 
-   /* reset offloads to defaults */
+

[net-next 02/14] igb: Unpair the queues when changing the number of queues

2016-02-15 Thread Jeff Kirsher

From: Shota Suzuki 

Since commit 72ddef0506da ("igb: Fix oops caused by missing queue
pairing"), the IGB_FLAG_QUEUE_PAIRS flag can be set when changing the
number of queues with "ethtool -L", but it is never cleared unless the igb
driver is reloaded.
This patch clears it if queue pairing becomes unnecessary as a result of
"ethtool -L".
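The fix turns a one-way latch into a two-way toggle. The sketch below models the decision made in this branch of igb_set_flag_queue_pairs() as a pure function. FLAG_QUEUE_PAIRS is a hypothetical bit position standing in for IGB_FLAG_QUEUE_PAIRS, and the other flag bits are passed through untouched.

```c
/* Hypothetical bit; the real value lives in igb.h as IGB_FLAG_QUEUE_PAIRS. */
#define FLAG_QUEUE_PAIRS (1u << 4)

/* Pair Tx/Rx queues only when the requested RSS queue count exceeds half
 * of the hardware maximum; otherwise clear the flag (the line this patch
 * adds), so a later "ethtool -L" can undo an earlier pairing. */
static unsigned int update_queue_pairs(unsigned int flags,
                                       unsigned int rss_queues,
                                       unsigned int max_rss_queues)
{
    if (rss_queues > max_rss_queues / 2)
        flags |= FLAG_QUEUE_PAIRS;
    else
        flags &= ~FLAG_QUEUE_PAIRS;
    return flags;
}
```

Before the patch, only the first branch existed, so going from 8 queues down to 2 left the flag set until the module was reloaded.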

Signed-off-by: Shota Suzuki 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/igb/igb_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index eb24b40..85c47aa 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2931,6 +2931,8 @@ void igb_set_flag_queue_pairs(struct igb_adapter *adapter,
 */
if (adapter->rss_queues > (max_rss_queues / 2))
adapter->flags |= IGB_FLAG_QUEUE_PAIRS;
+   else
+   adapter->flags &= ~IGB_FLAG_QUEUE_PAIRS;
break;
}
 }
-- 
2.5.0

[net-next 12/14] igb: Drop unnecessary checks in transmit path

2016-02-15 Thread Jeff Kirsher

From: Alexander Duyck 

This patch drops several checks that we dropped from ixgbe some time ago.  It
should not be possible for us to be called with either of the conditional
statements returning true so we can just drop them from the hot-path.

Signed-off-by: Alexander Duyck 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/igb/igb_main.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 7366d4f..bb5be40 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -5193,16 +5193,6 @@ static netdev_tx_t igb_xmit_frame(struct sk_buff *skb,
 {
struct igb_adapter *adapter = netdev_priv(netdev);
 
-   if (test_bit(__IGB_DOWN, &adapter->state)) {
-   dev_kfree_skb_any(skb);
-   return NETDEV_TX_OK;
-   }
-
-   if (skb->len <= 0) {
-   dev_kfree_skb_any(skb);
-   return NETDEV_TX_OK;
-   }
-
/* The minimum packet size with TCTL.PSP set is 17 so pad the skb
 * in order to meet this minimum size requirement.
 */
-- 
2.5.0

[net-next 07/14] igb: Do not factor VLANs into RLPML calculation

2016-02-15 Thread Jeff Kirsher

From: Alexander Duyck 

The RLPML registers already take the size of VLAN headers into account when
determining the maximum packet length.  This is called out in EAS documents
for several parts including the 82576 and the i350.  As such we can drop
the addition of the VLAN tag size to the value programmed into the RLPML
registers.
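After this change, igb_set_vf_rlpml() only needs to clamp the requested size and drop it into the RLPML field of VMOLR. The function below is a user-space sketch of that field update. The mask width (VMOLR_RLPML_MASK = 0x3FFF) is an assumption standing in for E1000_VMOLR_RLPML_MASK. Any other VMOLR bits the real function may set are deliberately left out.

```c
#include <stdint.h>

#define MAX_JUMBO_FRAME_SIZE 0x2600u  /* 9728 bytes, per patch 06/14 */
#define VMOLR_RLPML_MASK     0x3FFFu  /* assumed width of the RLPML field */

/* Program the per-pool maximum packet length. No VLAN_TAG_SIZE is added,
 * because the hardware already allows for the tag; the size is clamped
 * to the jumbo limit and all bits outside the field are preserved. */
static uint32_t vmolr_set_rlpml(uint32_t vmolr, uint32_t size)
{
    if (size > MAX_JUMBO_FRAME_SIZE)
        size = MAX_JUMBO_FRAME_SIZE;
    vmolr &= ~VMOLR_RLPML_MASK;   /* clear the old length */
    vmolr |= size;                /* insert the new one */
    return vmolr;
}
```

Since the size no longer depends on whether a VF has VLANs enabled, the vlans_enabled counter and its +4/-4 adjustments in igb_vlvf_set() can go away entirely.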

Signed-off-by: Alexander Duyck 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/igb/igb.h  |  1 -
 drivers/net/ethernet/intel/igb/igb_main.c | 43 ++-
 2 files changed, 2 insertions(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index e3cb93b..d135261 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -95,7 +95,6 @@ struct vf_data_storage {
unsigned char vf_mac_addresses[ETH_ALEN];
u16 vf_mc_hashes[IGB_MAX_VF_MC_ENTRIES];
u16 num_vf_mc_hashes;
-   u16 vlans_enabled;
u32 flags;
unsigned long last_nack;
u16 pf_vlan; /* When set, guest VLAN config not allowed. */
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index b676881..9461480 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3531,12 +3531,8 @@ static inline int igb_set_vf_rlpml(struct igb_adapter *adapter, int size,
struct e1000_hw *hw = &adapter->hw;
u32 vmolr;
 
-   /* if it isn't the PF check to see if VFs are enabled and
-* increase the size to support vlan tags
-*/
-   if (vfn < adapter->vfs_allocated_count &&
-   adapter->vf_data[vfn].vlans_enabled)
-   size += VLAN_TAG_SIZE;
+   if (size > MAX_JUMBO_FRAME_SIZE)
+   size = MAX_JUMBO_FRAME_SIZE;
 
vmolr = rd32(E1000_VMOLR(vfn));
vmolr &= ~E1000_VMOLR_RLPML_MASK;
@@ -5787,8 +5783,6 @@ static void igb_clear_vf_vfta(struct igb_adapter *adapter, u32 vf)
 
wr32(E1000_VLVF(i), reg);
}
-
-   adapter->vf_data[vf].vlans_enabled = 0;
 }
 
 static s32 igb_vlvf_set(struct igb_adapter *adapter, u32 vid, bool add, u32 vf)
@@ -5837,23 +5831,6 @@ static s32 igb_vlvf_set(struct igb_adapter *adapter, u32 vid, bool add, u32 vf)
reg &= ~E1000_VLVF_VLANID_MASK;
reg |= vid;
wr32(E1000_VLVF(i), reg);
-
-   /* do not modify RLPML for PF devices */
-   if (vf >= adapter->vfs_allocated_count)
-   return 0;
-
-   if (!adapter->vf_data[vf].vlans_enabled) {
-   u32 size;
-
-   reg = rd32(E1000_VMOLR(vf));
-   size = reg & E1000_VMOLR_RLPML_MASK;
-   size += 4;
-   reg &= ~E1000_VMOLR_RLPML_MASK;
-   reg |= size;
-   wr32(E1000_VMOLR(vf), reg);
-   }
-
-   adapter->vf_data[vf].vlans_enabled++;
}
} else {
if (i < E1000_VLVF_ARRAY_SIZE) {
@@ -5865,22 +5842,6 @@ static s32 igb_vlvf_set(struct igb_adapter *adapter, u32 vid, bool add, u32 vf)
igb_vfta_set(hw, vid, false);
}
wr32(E1000_VLVF(i), reg);
-
-   /* do not modify RLPML for PF devices */
-   if (vf >= adapter->vfs_allocated_count)
-   return 0;
-
-   adapter->vf_data[vf].vlans_enabled--;
-   if (!adapter->vf_data[vf].vlans_enabled) {
-   u32 size;
-
-   reg = rd32(E1000_VMOLR(vf));
-   size = reg & E1000_VMOLR_RLPML_MASK;
-   size -= 4;
-   reg &= ~E1000_VMOLR_RLPML_MASK;
-   reg |= size;
-   wr32(E1000_VMOLR(vf), reg);
-   }
}
}
return 0;
-- 
2.5.0

[net-next 14/14] igb: Add workaround for VLAN tag stripping on 82576

2016-02-15 Thread Jeff Kirsher

From: Alexander Duyck 

There was a workaround partially implemented for the 82576 that is needed
in order for VLAN tag stripping to function correctly.  The original code
had side effects that would make it so the workaround was active on all
MACs.  I have updated the code so that the workaround is enabled, but
limited to the 82576, or activated if we exceed the available unicast
addresses.

The workaround has a side effect of mirroring all of the traffic outgoing
from the VFs back to the PF.  As such it is not recommended to use the
82576 in promiscuous mode as it will take a performance hit, though this is
now consistent with the performance as seen on the out-of-tree igb driver.

I also limited setting all of the UTA bits to only when the ROPE bit in
the VMOLR register is enabled.  This should limit the effects of the UTA
register so that we don't pick up any excess traffic unless promiscuous
mode has been enabled on the PF, whereas before the PF would have ended up
in something equivalent to unicast promiscuous mode with VLAN filtering
otherwise.
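The reworked igb_set_uta(adapter, set) visibly computes "u32 uta = set ? ~0 : 0;". Its caller now passes !!(vmolr & E1000_VMOLR_ROPE), where ROPE is set for promiscuous mode only on the 82576, or when the unicast list overflows the RAR table. The rest of the function is truncated in this archive, so the sketch below is a user-space model under that reading only. The UTA registers are a plain array, uta_reg_count follows the 0/128 split added to e1000_82575.c, and uta_enabled() is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the ROPE decision: promiscuous on an 82576, or too many
 * unicast addresses for the RAR table (igb_write_uc_addr_list() < 0). */
static bool uta_enabled(bool promisc, bool is_82576, bool uc_overflow)
{
    return (promisc && is_82576) || uc_overflow;
}

/* Fill every unicast table array register with all ones or all zeros.
 * With uta_reg_count == 0 (82575) this does nothing. */
static void set_uta(uint32_t *uta, unsigned int uta_reg_count, bool set)
{
    uint32_t val = set ? ~0u : 0u;
    unsigned int i;

    for (i = 0; i < uta_reg_count; i++)
        uta[i] = val;
}
```

Tying the all-ones UTA to ROPE means the "every unicast hash matches" behavior is only active when the pool is meant to be unicast-promiscuous anyway, instead of on every configuration as before.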

Signed-off-by: Alexander Duyck 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/igb/e1000_82575.c |  2 ++
 drivers/net/ethernet/intel/igb/igb_main.c| 26 ++
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/e1000_82575.c b/drivers/net/ethernet/intel/igb/e1000_82575.c
index fff5052..9a1a9c7 100644
--- a/drivers/net/ethernet/intel/igb/e1000_82575.c
+++ b/drivers/net/ethernet/intel/igb/e1000_82575.c
@@ -425,6 +425,8 @@ static s32 igb_init_mac_params_82575(struct e1000_hw *hw)
 
/* Set mta register count */
mac->mta_reg_count = 128;
+   /* Set uta register count */
+   mac->uta_reg_count = (hw->mac.type == e1000_82575) ? 0 : 128;
/* Set rar entry count */
switch (mac->type) {
case e1000_82576:
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index e9bdad7..af46fcf 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -140,7 +140,7 @@ static struct rtnl_link_stats64 *igb_get_stats64(struct net_device *dev,
  struct rtnl_link_stats64 *stats);
 static int igb_change_mtu(struct net_device *, int);
 static int igb_set_mac(struct net_device *, void *);
-static void igb_set_uta(struct igb_adapter *adapter);
+static void igb_set_uta(struct igb_adapter *adapter, bool set);
 static irqreturn_t igb_intr(int irq, void *);
 static irqreturn_t igb_intr_msi(int irq, void *);
 static irqreturn_t igb_msix_other(int irq, void *);
@@ -3670,9 +3670,6 @@ static void igb_configure_rx(struct igb_adapter *adapter)
 {
int i;
 
-   /* set UTA to appropriate mode */
-   igb_set_uta(adapter);
-
/* set the correct pool for the PF default MAC address in entry 0 */
igb_rar_set_qsel(adapter, adapter->hw.mac.addr, 0,
 adapter->vfs_allocated_count);
@@ -4134,7 +4131,11 @@ static void igb_set_rx_mode(struct net_device *netdev)
/* Check for Promiscuous and All Multicast modes */
if (netdev->flags & IFF_PROMISC) {
rctl |= E1000_RCTL_UPE | E1000_RCTL_MPE;
-   vmolr |= E1000_VMOLR_ROPE | E1000_VMOLR_MPME;
+   vmolr |= E1000_VMOLR_MPME;
+
+   /* enable use of UTA filter to force packets to default pool */
+   if (hw->mac.type == e1000_82576)
+   vmolr |= E1000_VMOLR_ROPE;
} else {
if (netdev->flags & IFF_ALLMULTI) {
rctl |= E1000_RCTL_MPE;
@@ -4190,6 +4191,9 @@ static void igb_set_rx_mode(struct net_device *netdev)
if ((hw->mac.type < e1000_82576) || (hw->mac.type > e1000_i350))
return;
 
+   /* set UTA to appropriate mode */
+   igb_set_uta(adapter, !!(vmolr & E1000_VMOLR_ROPE));
+
vmolr |= rd32(E1000_VMOLR(vfn)) &
 ~(E1000_VMOLR_ROPE | E1000_VMOLR_MPME | E1000_VMOLR_ROMPE);
 
@@ -6323,6 +6327,7 @@ static void igb_msg_task(struct igb_adapter *adapter)
 /**
  *  igb_set_uta - Set unicast filter table address
  *  @adapter: board private structure
+ *  @set: boolean indicating if we are setting or clearing bits
  *
  *  The unicast table address is a register array of 32-bit registers.
  *  The table is meant to be used in a way similar to how the MTA is used
@@ -6330,21 +6335,18 @@ static void igb_msg_task(struct igb_adapter *adapter)
  *  set all the hash bits to 1 and use the VMOLR ROPE bit as a promiscuous
  *  enable bit to allow vlan tag stripping when promiscuous mode is enabled
  **/
-static void igb_set_uta(struct igb_adapter *adapter)
+static void igb_set_uta(struct igb_adapter *adapter, bool set)
 {
struct e1000_hw *hw = &adapter->hw;
+   u32 uta = set ? ~0 : 0;
int i;
 
-   /* The UTA table only exists on 82576 hardwa

[net-next 11/14] igb: Add support for VLAN promiscuous with SR-IOV and NTUPLE

2016-02-15 Thread Jeff Kirsher

From: Alexander Duyck 

This change fixes things so that we can fully support SR-IOV or the
recently added NTUPLE filtering while allowing support for VLAN promiscuous
mode.  By making this change we are able to support possible scenarios such
as SR-IOV with the PF connected to a Linux bridge hosting other VMs.

Signed-off-by: Alexander Duyck 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/igb/igb.h  |   1 +
 drivers/net/ethernet/intel/igb/igb_main.c | 313 +++---
 2 files changed, 242 insertions(+), 72 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index d135261..707ae5c 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -481,6 +481,7 @@ struct igb_adapter {
 #define IGB_FLAG_MAS_ENABLE(1 << 12)
 #define IGB_FLAG_HAS_MSIX  (1 << 13)
 #define IGB_FLAG_EEE   (1 << 14)
+#define IGB_FLAG_VLAN_PROMISC  BIT(15)
 
 /* Media Auto Sense */
 #define IGB_MAS_ENABLE_0   0X0001
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 6876ae5..7366d4f 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -1819,6 +1819,10 @@ void igb_down(struct igb_adapter *adapter)
 
if (!pci_channel_offline(adapter->pdev))
igb_reset(adapter);
+
+   /* clear VLAN promisc flag so VFTA will be updated if necessary */
+   adapter->flags &= ~IGB_FLAG_VLAN_PROMISC;
+
igb_clean_all_tx_rings(adapter);
igb_clean_all_rx_rings(adapter);
 #ifdef CONFIG_IGB_DCA
@@ -2050,7 +2054,7 @@ static int igb_set_features(struct net_device *netdev,
if (changed & NETIF_F_HW_VLAN_CTAG_RX)
igb_vlan_mode(netdev, features);
 
-   if (!(changed & NETIF_F_RXALL))
+   if (!(changed & (NETIF_F_RXALL | NETIF_F_NTUPLE)))
return 0;
 
netdev->features = features;
@@ -3515,8 +3519,7 @@ void igb_setup_rctl(struct igb_adapter *adapter)
 E1000_RCTL_BAM | /* RX All Bcast Pkts */
 E1000_RCTL_PMCF); /* RX All MAC Ctrl Pkts */
 
-   rctl &= ~(E1000_RCTL_VFE | /* Disable VLAN filter */
- E1000_RCTL_DPF | /* Allow filtered pause */
+   rctl &= ~(E1000_RCTL_DPF | /* Allow filtered pause */
  E1000_RCTL_CFIEN); /* Dis VLAN CFIEN Filter */
/* Do not mess with E1000_CTRL_VME, it affects transmit as well,
 * and that breaks VLANs.
@@ -3967,6 +3970,130 @@ static int igb_write_uc_addr_list(struct net_device *netdev)
return count;
 }
 
+static int igb_vlan_promisc_enable(struct igb_adapter *adapter)
+{
+   struct e1000_hw *hw = &adapter->hw;
+   u32 i, pf_id;
+
+   switch (hw->mac.type) {
+   case e1000_i210:
+   case e1000_i211:
+   case e1000_i350:
+   /* VLAN filtering needed for VLAN prio filter */
+   if (adapter->netdev->features & NETIF_F_NTUPLE)
+   break;
+   /* fall through */
+   case e1000_82576:
+   case e1000_82580:
+   case e1000_i354:
+   /* VLAN filtering needed for pool filtering */
+   if (adapter->vfs_allocated_count)
+   break;
+   /* fall through */
+   default:
+   return 1;
+   }
+
+   /* We are already in VLAN promisc, nothing to do */
+   if (adapter->flags & IGB_FLAG_VLAN_PROMISC)
+   return 0;
+
+   if (!adapter->vfs_allocated_count)
+   goto set_vfta;
+
+   /* Add PF to all active pools */
+   pf_id = adapter->vfs_allocated_count + E1000_VLVF_POOLSEL_SHIFT;
+
+   for (i = E1000_VLVF_ARRAY_SIZE; --i;) {
+   u32 vlvf = rd32(E1000_VLVF(i));
+
+   vlvf |= 1 << pf_id;
+   wr32(E1000_VLVF(i), vlvf);
+   }
+
+set_vfta:
+   /* Set all bits in the VLAN filter table array */
+   for (i = E1000_VLAN_FILTER_TBL_SIZE; i--;)
+   hw->mac.ops.write_vfta(hw, i, ~0U);
+
+   /* Set flag so we don't redo unnecessary work */
+   adapter->flags |= IGB_FLAG_VLAN_PROMISC;
+
+   return 0;
+}
+
+#define VFTA_BLOCK_SIZE 8
+static void igb_scrub_vfta(struct igb_adapter *adapter, u32 vfta_offset)
+{
+   struct e1000_hw *hw = &adapter->hw;
+   u32 vfta[VFTA_BLOCK_SIZE] = { 0 };
+   u32 vid_start = vfta_offset * 32;
+   u32 vid_end = vid_start + (VFTA_BLOCK_SIZE * 32);
+   u32 i, vid, word, bits, pf_id;
+
+   /* guarantee that we don't scrub out management VLAN */
+   vid = adapter->mng_vlan_id;
+   if (vid >= vid_start && vid < vid_end)
+   vfta[(vid - vid_start) / 32] |= 1 << (vid % 32);
+
+   if (!adapter->vfs_allocated_count)
+   goto set_vfta;
+
+   pf_id = a
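
The VLAN filter table array (VFTA) touched by igb_vlan_promisc_enable() and igb_scrub_vfta() above holds one bit per VLAN ID: vid / 32 selects the 32-bit word and vid % 32 the bit within it. A minimal user-space sketch of that arithmetic follows; the table is a local array standing in for the hardware registers, and vfta_set(), vfta_test() and vfta_set_all() are hypothetical helpers, not igb functions.

```c
#include <assert.h>
#include <stdint.h>

#define VLAN_N_VID 4096              /* 12-bit VLAN ID space */
#define VFTA_WORDS (VLAN_N_VID / 32) /* 128 x 32-bit words cover all IDs */

/* Accept one VLAN ID: word vid / 32, bit vid % 32. */
static void vfta_set(uint32_t vfta[VFTA_WORDS], unsigned int vid)
{
	vfta[vid / 32] |= 1u << (vid % 32);
}

/* Return 1 if the VLAN ID is accepted by the filter, else 0. */
static int vfta_test(const uint32_t vfta[VFTA_WORDS], unsigned int vid)
{
	return (vfta[vid / 32] >> (vid % 32)) & 1u;
}

/* VLAN promiscuous: accept every VLAN ID by setting every bit,
 * walking the table from the top as the igb loop does. */
static void vfta_set_all(uint32_t vfta[VFTA_WORDS])
{
	for (unsigned int i = VFTA_WORDS; i--;)
		vfta[i] = ~0u;
}
```

For example, VLAN 100 lands in word 3, bit 4. The sketch uses an unsigned 1u for the shift so that bit 31 is set without shifting into a sign bit.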

[net-next 01/14] igb: Remove unnecessary flag setting in igb_set_flag_queue_pairs()

2016-02-15 Thread Jeff Kirsher

From: Shota Suzuki 

If VFs are enabled (max_vfs >= 1), both max_rss_queues and
adapter->rss_queues are set to 2 for e1000_82576.
In this case, IGB_FLAG_QUEUE_PAIRS is always set in the default block as a
result of the fall-through, so setting it in the e1000_82576 block is
unnecessary.

Signed-off-by: Shota Suzuki 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/igb/igb_main.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 31e5f39..eb24b40 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2921,14 +2921,6 @@ void igb_set_flag_queue_pairs(struct igb_adapter *adapter,
/* Device supports enough interrupts without queue pairing. */
break;
case e1000_82576:
-   /* If VFs are going to be allocated with RSS queues then we
-* should pair the queues in order to conserve interrupts due
-* to limited supply.
-*/
-   if ((adapter->rss_queues > 1) &&
-   (adapter->vfs_allocated_count > 6))
-   adapter->flags |= IGB_FLAG_QUEUE_PAIRS;
-   /* fall through */
case e1000_82580:
case e1000_i350:
case e1000_i354:
-- 
2.5.0
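
The redundancy argued in the commit message comes down to switch fall-through: once the e1000_82576 case falls into the shared block, that block decides the flag regardless of what the 82576 case did. The sketch below models this with hypothetical types and flag values, and it assumes the shared default block pairs queues when rss_queues exceeds half of max_rss_queues; it is an illustration, not the igb source.

```c
#include <assert.h>

/* Hypothetical stand-ins for e1000_mac_type values and IGB_FLAG_QUEUE_PAIRS. */
enum fake_mac_type { FAKE_82575, FAKE_82576, FAKE_82580, FAKE_I350 };
#define FAKE_FLAG_QUEUE_PAIRS (1u << 4)

struct fake_adapter {
	unsigned int flags;
	unsigned int rss_queues;
};

static void fake_set_flag_queue_pairs(struct fake_adapter *a,
				      enum fake_mac_type mac,
				      unsigned int max_rss_queues)
{
	switch (mac) {
	case FAKE_82575:
		/* Enough interrupts without queue pairing. */
		break;
	case FAKE_82576:
		/* No special case: fall through to the shared block. */
	case FAKE_82580:
	case FAKE_I350:
	default:
		/* Assumed rule: pair queues if RSS uses more than half the maximum. */
		if (a->rss_queues > max_rss_queues / 2)
			a->flags |= FAKE_FLAG_QUEUE_PAIRS;
		else
			a->flags &= ~FAKE_FLAG_QUEUE_PAIRS;
		break;
	}
}
```

With rss_queues = 2 and max_rss_queues = 2 (the VF case described above), 2 > 1 holds, so the shared block sets the flag on its own.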

Re: [PATCH] net: bcmgenet: Add MDIO_INTR in GENETv2

2016-02-15 Thread Jaedon Shin

Hi Florian,

> On Feb 16, 2016, at 3:18 AM, Florian Fainelli  wrote:
> 
> Hi Jaedon,
> 
> On 15/02/2016 00:42, Jaedon Shin wrote:
>> The GENETv2 chipsets has MDIO interrupt like the GENETv3+ chipsets.
>> 
>> The previous commit d5c3d84657db ("net: phy: Avoid polling PHY with
>> PHY_IGNORE_INTERRUPTS") and commit 49f7a471e4d1 ("net: bcmgenet: Properly
>> configure PHY to ignore interrupt") cause link-down PHY always in some
>> 40nm generation chipsets.
> 
> Humm, these are two different things here:
> 
> - GENET_HAS_MDIO_INTR is about telling the driver whether the hardware
> supports MDIO_INTR_DONE and MDIO_INTR_ERROR
> - eliminating PHY polling is about utilizing LINK_UP and LINK_DOWN to
> avoid polling the PHY
> 
> So the original problem is actually here:
> 
> bcmgenet_irq_task():
> 
>   /* Link UP/DOWN event */
>   if ((priv->hw_params->flags & GENET_HAS_MDIO_INTR) &&
>   (priv->irq0_stat & UMAC_IRQ_LINK_EVENT)) {
> 
> These two checks are actually orthogonal, so we should remove the first
> check on GENET_HAS_MDIO_INTR.
> 

As you said, the problem is in bcmgenet_irq_task().

A bcmgenet using the internal PHY has to use phy_mac_interrupt() because it
does not use PHY_POLL; it depends on the Ethernet MAC ISR.

UMAC_IRQ_LINK_EVENT (LINK_UP and LINK_DOWN) was working correctly on GENETv2,
but the (priv->hw_params->flags & GENET_HAS_MDIO_INTR) check was preventing
phy_mac_interrupt() from being called. I could not find anything in the
datasheet saying that GENETv2 lacks MDIO_INTR. However, I'm not sure about
using MDIO_INTR.

Therefore, if MDIO_INTR is not valid on GENETv2, I will resend the patch
to remove the first check on GENET_HAS_MDIO_INTR once you confirm.

Thanks,
Jaedon 
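
Florian's point that the two checks are orthogonal can be illustrated with a small model of the link-event decision in bcmgenet_irq_task(). The flag and interrupt bit values below are illustrative placeholders rather than values copied from bcmgenet.h, and both helper functions are hypothetical; each returns whether phy_mac_interrupt() would be reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values; the real definitions live in bcmgenet.h. */
#define GENET_HAS_EXT        (1u << 1)
#define GENET_HAS_MDIO_INTR  (1u << 2)
#define UMAC_IRQ_LINK_UP     (1u << 4)
#define UMAC_IRQ_LINK_DOWN   (1u << 5)
#define UMAC_IRQ_LINK_EVENT  (UMAC_IRQ_LINK_UP | UMAC_IRQ_LINK_DOWN)

/* Current logic: link events are ignored unless MDIO_INTR is advertised. */
static bool link_event_current(uint32_t hw_flags, uint32_t irq0_stat)
{
	return (hw_flags & GENET_HAS_MDIO_INTR) &&
	       (irq0_stat & UMAC_IRQ_LINK_EVENT);
}

/* Proposed logic: only the LINK_UP/LINK_DOWN status matters. */
static bool link_event_proposed(uint32_t hw_flags, uint32_t irq0_stat)
{
	(void)hw_flags; /* MDIO_DONE/MDIO_ERROR support is a separate question */
	return (irq0_stat & UMAC_IRQ_LINK_EVENT) != 0;
}
```

With a GENETv2-style flag set (GENET_HAS_EXT but no GENET_HAS_MDIO_INTR), only the proposed variant lets a LINK_UP event through to the PHY layer.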

> Your patch remains valid though, just the explanation needs a bit
> tweaking, thanks!
> 
>> 
>> Signed-off-by: Jaedon Shin 
>> ---
>> drivers/net/ethernet/broadcom/genet/bcmgenet.c | 4 ++--
>> drivers/net/ethernet/broadcom/genet/bcmgenet.h | 2 +-
>> 2 files changed, 3 insertions(+), 3 deletions(-)
>> 
>> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>> index b15a60d787c7..8e9aa8f6390d 100644
>> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>> @@ -1904,7 +1904,7 @@ static int init_umac(struct bcmgenet_priv *priv)
>>  bcmgenet_bp_mc_set(priv, reg);
>>  }
>> 
>> -/* Enable MDIO interrupts on GENET v3+ */
>> +/* Enable MDIO interrupts on GENET v2+ */
>>  if (priv->hw_params->flags & GENET_HAS_MDIO_INTR)
>>  int0_enable |= (UMAC_IRQ_MDIO_DONE | UMAC_IRQ_MDIO_ERROR);
>> 
>> @@ -3168,7 +3168,7 @@ static struct bcmgenet_hw_params bcmgenet_hw_params[] = {
>>  .rdma_offset = 0x3000,
>>  .tdma_offset = 0x4000,
>>  .words_per_bd = 2,
>> -.flags = GENET_HAS_EXT,
>> +.flags = GENET_HAS_EXT | GENET_HAS_MDIO_INTR,
>>  },
>>  [GENET_V3] = {
>>  .tx_queues = 4,
>> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
>> index 967367557309..c14bfbfbe06a 100644
>> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.h
>> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
>> @@ -310,7 +310,7 @@ struct bcmgenet_mib_counters {
>> #define UMAC_IRQ_TXDMA_BDONE (1 << 18)
>> #define UMAC_IRQ_TXDMA_DONE  UMAC_IRQ_TXDMA_MBDONE
>> 
>> -/* Only valid for GENETv3+ */
>> +/* Only valid for GENETv2+ */
>> #define UMAC_IRQ_MDIO_DONE   (1 << 23)
>> #define UMAC_IRQ_MDIO_ERROR  (1 << 24)
>> 
>>

Re: [PATCHv2] af_llc: fix types on llc_ui_wait_for_conn

2016-02-15 Thread Simon Horman

On Mon, Feb 15, 2016 at 07:41:51PM +, Alan wrote:
> The timeout is a long, we return it truncated if it is huge. Basically
> harmless as the only caller does a boolean check, but tidy it up anyway.

If the only caller performs a boolean check then perhaps
it would be best if the function's return type was bool.

> (64bit build tested this time. Thank you 0day)
> 
> Signed-off-by: Alan Cox 
> ---
>  net/llc/af_llc.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
> index 8dab4e5..b3c52e3 100644
> --- a/net/llc/af_llc.c
> +++ b/net/llc/af_llc.c
> @@ -38,7 +38,7 @@ static u16 llc_ui_sap_link_no_max[256];
>  static struct sockaddr_llc llc_ui_addrnull;
>  static const struct proto_ops llc_ui_ops;
>  
> -static int llc_ui_wait_for_conn(struct sock *sk, long timeout);
> +static long llc_ui_wait_for_conn(struct sock *sk, long timeout);
>  static int llc_ui_wait_for_disc(struct sock *sk, long timeout);
>  static int llc_ui_wait_for_busy_core(struct sock *sk, long timeout);
>  
> @@ -551,7 +551,7 @@ static int llc_ui_wait_for_disc(struct sock *sk, long timeout)
>   return rc;
>  }
>  
> -static int llc_ui_wait_for_conn(struct sock *sk, long timeout)
> +static long llc_ui_wait_for_conn(struct sock *sk, long timeout)
>  {
>   DEFINE_WAIT(wait);
>  
>
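
The truncation Alan describes can be made concrete. On an LP64 target (64-bit long, 32-bit int), returning a long timeout through an int keeps only the low 32 bits, so a large non-zero remaining timeout can come back as 0 and flip the caller's boolean check. The two functions below are hypothetical stand-ins for the old and new llc_ui_wait_for_conn() return types; the narrowing shown is what GCC and Clang do, while ISO C leaves out-of-range conversions implementation-defined.

```c
#include <assert.h>

/* Old signature: the long remaining timeout is narrowed to int on return. */
static int wait_returns_int(long timeout)
{
	return timeout; /* implicit long -> int conversion */
}

/* Fixed signature: the full value survives. */
static long wait_returns_long(long timeout)
{
	return timeout;
}
```

On such a target, a remaining timeout of 0x100000000 is non-zero as a long but reads as 0 after the int return, so a caller testing the result for truthiness would see a failure that never happened.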

[PATCH] tcp: correctly crypto_alloc_hash return check

2016-02-15 Thread Insu Yun

crypto_alloc_hash never returns NULL; on failure it returns an
ERR_PTR()-encoded error, so IS_ERR() is sufficient.

Signed-off-by: Insu Yun 
---
 net/ipv4/tcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index fd17eec..a95aac1 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2946,7 +2946,7 @@ static void __tcp_alloc_md5sig_pool(void)
struct crypto_hash *hash;
 
hash = crypto_alloc_hash("md5", 0, CRYPTO_ALG_ASYNC);
-   if (IS_ERR_OR_NULL(hash))
+   if (IS_ERR(hash))
return;
per_cpu(tcp_md5sig_pool, cpu).md5_desc.tfm = hash;
}
-- 
1.9.1
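
The reasoning behind the IS_ERR_OR_NULL() to IS_ERR() change rests on the kernel's error-pointer convention: small negative errno values are encoded in the top page of the address space, so a function can return either a valid pointer or an error, but never NULL. A user-space re-creation of that convention follows; err_ptr(), ptr_err() and is_err() are local helpers modeled on ERR_PTR(), PTR_ERR() and IS_ERR() from include/linux/err.h, and fake_alloc_hash() is a hypothetical allocator, not the crypto API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ERRNO 4095

static inline void *err_ptr(long error) { return (void *)error; }
static inline long ptr_err(const void *ptr) { return (long)ptr; }

/* True if the pointer lies in the last MAX_ERRNO bytes of the address space. */
static inline bool is_err(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical allocator that, like crypto_alloc_hash(), never returns NULL. */
static int dummy_tfm;
static void *fake_alloc_hash(bool fail)
{
	return fail ? err_ptr(-ENOENT) : &dummy_tfm;
}
```

Because the failure path always yields an encoded error and never NULL, an extra NULL test adds nothing for such an allocator.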

[PATCH] et131x: check return value of dma_alloc_coherent

2016-02-15 Thread Insu Yun

For error handling, the return value of dma_alloc_coherent() needs to be
checked, not its dma_handle argument.

Signed-off-by: Insu Yun 
---
 drivers/net/ethernet/agere/et131x.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/agere/et131x.c b/drivers/net/ethernet/agere/et131x.c
index 3f3bcbe..0907ab6 100644
--- a/drivers/net/ethernet/agere/et131x.c
+++ b/drivers/net/ethernet/agere/et131x.c
@@ -2380,7 +2380,7 @@ static int et131x_tx_dma_memory_alloc(struct et131x_adapter *adapter)
sizeof(u32),
&tx_ring->tx_status_pa,
GFP_KERNEL);
-   if (!tx_ring->tx_status_pa) {
+   if (!tx_ring->tx_status) {
dev_err(&adapter->pdev->dev,
"Cannot alloc memory for Tx status block\n");
return -ENOMEM;
-- 
1.9.1
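
The et131x bug is a general C pitfall: when an allocator reports failure through its return value, an output parameter is not a reliable failure indicator. In the sketch below, fake_dma_alloc() is a hypothetical stand-in for dma_alloc_coherent() that, by assumption, leaves *handle untouched when it fails. Checking a stale handle then misses the failure, while checking the returned pointer catches it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical allocator: returns NULL on failure and, by assumption,
 * does not write *handle in that case. */
static void *fake_dma_alloc(size_t size, uint64_t *handle, bool fail)
{
	void *cpu_addr;

	if (fail)
		return NULL;
	cpu_addr = malloc(size);
	if (cpu_addr)
		*handle = (uint64_t)(uintptr_t)cpu_addr; /* pretend bus address */
	return cpu_addr;
}

/* Buggy check (old et131x style): inspects the output parameter. */
static bool alloc_failed_by_handle(const void *cpu_addr, uint64_t handle)
{
	(void)cpu_addr;
	return handle == 0;
}

/* Correct check: inspects the returned CPU address. */
static bool alloc_failed_by_return(const void *cpu_addr, uint64_t handle)
{
	(void)handle;
	return cpu_addr == NULL;
}
```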

[PATCH v2] gre: Avoid kernel panic by clearing IPCB before dst_link_failure called

2016-02-15 Thread Bernie Harris

skb->cb may contain data from previous layers (in the observed case, the
qdisc layer). In the observed scenario, the data was misinterpreted as
IP header options, which later caused the ihl to be set to an invalid
value (<5). This resulted in an infinite loop in the MIPS implementation
of ip_fast_csum.

This patch clears IPCB before dst_link_failure is called from the functions
ip_tunnel_xmit and ip6gre_xmit2, similar to what commit 11c21a30 does for
an IPv4 case.

Signed-off-by: Bernie Harris 
---
 net/ipv4/ip_tunnel.c | 1 +
 net/ipv6/ip6_gre.c   | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 89e8861..946091a 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -799,6 +799,7 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 
 #if IS_ENABLED(CONFIG_IPV6)
 tx_error_icmp:
+   memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
dst_link_failure(skb);
 #endif
 tx_error:
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index f37f18b..93fc6f9 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -678,6 +678,7 @@ static netdev_tx_t ip6gre_xmit2(struct sk_buff *skb,
tunnel->err_time + IP6TUNNEL_ERR_TIMEO)) {
tunnel->err_count--;
 
+   memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
dst_link_failure(skb);
} else
tunnel->err_count = 0;
@@ -761,6 +762,7 @@ static netdev_tx_t ip6gre_xmit2(struct sk_buff *skb,
return 0;
 tx_err_link_failure:
stats->tx_carrier_errors++;
+   memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
dst_link_failure(skb);
 tx_err_dst_release:
dst_release(dst);
-- 
2.7.1
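
A small model shows why a stale control block is dangerous. skb->cb is a scratch area that each layer reinterprets through its own overlay (IPCB() for IPv4), so bytes left behind by a previous user read back as bogus IP option state until the overlay is cleared. The structures below are simplified, hypothetical stand-ins for struct sk_buff, struct inet_skb_parm and struct ip_options. A union replaces the kernel's pointer cast to keep the example well-defined in ISO C.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for struct ip_options. */
struct fake_ip_options {
	unsigned int faddr;
	unsigned char optlen; /* non-zero means "IP options present" */
};

/* Simplified stand-in for struct inet_skb_parm, the IPCB() overlay. */
struct fake_inet_skb_parm {
	int iif;
	struct fake_ip_options opt;
};

/* Simplified stand-in for struct sk_buff: cb is shared by every layer. */
struct fake_skb {
	union {
		char cb[48];
		struct fake_inet_skb_parm ipcb;
	};
};

#define FAKE_IPCB(skb) (&(skb)->ipcb)

/* A previous layer (e.g. a qdisc) stores its own private data in cb. */
static void previous_layer_uses_cb(struct fake_skb *skb)
{
	memset(skb->cb, 0xAB, sizeof(skb->cb));
}

/* The fix: reset the IP control block before IPv4 code interprets it. */
static void clear_ipcb(struct fake_skb *skb)
{
	memset(FAKE_IPCB(skb), 0, sizeof(*FAKE_IPCB(skb)));
}
```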

[PATCH net-next] net: macb: make magic-packet property generic

2016-02-15 Thread Sergio Prado

Signed-off-by: Sergio Prado 
---
As requested by Rob Herring on patch
https://patchwork.ozlabs.org/patch/580862/
---
 Documentation/devicetree/bindings/net/macb.txt | 2 +-
 drivers/net/ethernet/cadence/macb.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/macb.txt b/Documentation/devicetree/bindings/net/macb.txt
index c6b1cb5ffa87..b5a42df4c928 100644
--- a/Documentation/devicetree/bindings/net/macb.txt
+++ b/Documentation/devicetree/bindings/net/macb.txt
@@ -25,7 +25,7 @@ Required properties:
 
 Optional properties for PHY child node:
 - reset-gpios : Should specify the gpio for phy reset
-- cdns,magic-packet : If present, indicates that the hardware supports waking
+- magic-packet : If present, indicates that the hardware supports waking
   up via magic packet.
 
 Examples:
diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 69af049e55a8..7ccf2298a5fa 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -2929,7 +2929,7 @@ static int macb_probe(struct platform_device *pdev)
bp->jumbo_max_len = macb_config->jumbo_max_len;
 
bp->wol = 0;
-   if (of_get_property(np, "cdns,magic-packet", NULL))
+   if (of_get_property(np, "magic-packet", NULL))
bp->wol |= MACB_WOL_HAS_MAGIC_PACKET;
device_init_wakeup(&pdev->dev, bp->wol & MACB_WOL_HAS_MAGIC_PACKET);
 
-- 
1.9.1

Re: [net-next PATCH 06/11] RFC: mlx5: RX bulking or bundling of packets before calling network stack

2016-02-15 Thread Saeed Mahameed

On Wed, Feb 10, 2016 at 10:26 PM, Jesper Dangaard Brouer
 wrote:
> On Tue, 9 Feb 2016 13:57:41 +0200
> Saeed Mahameed  wrote:
>
>> On Tue, Feb 2, 2016 at 11:13 PM, Jesper Dangaard Brouer
>>  wrote:
>> > There are several techniques/concepts combined in this optimization.
>> > It is both a data-cache and instruction-cache optimization.
>> >
>> > First of all, this is primarily about delaying touching
>> > packet-data, which happend in eth_type_trans, until the prefetch
>> > have had time to fetch.  Thus, hopefully avoiding a cache-miss on
>> > packet data.
>> >
>> > Secondly, the instruction-cache optimization is about, not
>> > calling the network stack for every packet, which is pulled out
>> > of the RX ring.  Calling the full stack likely removes/flushes
>> > the instruction cache every time.
>> >
>> > Thus, have two loops, one loop pulling out packet from the RX
>> > ring and starting the prefetching, and the second loop calling
>> > eth_type_trans() and invoking the stack via napi_gro_receive().
>> >
>> > Signed-off-by: Jesper Dangaard Brouer 
>> >
>> >
>> > Notes:
>> > This is the patch that gave a speed up of 6.2Mpps to 12Mpps, when
>> > trying to measure lowest RX level, by dropping the packets in the
>> > driver itself (marked drop point as comment).
>>
>> Indeed looks very promising in respect of instruction-cache
>> optimization, but i have some doubts regarding the data-cache
>> optimizations (prefetch), please see my below questions.
>>
>> We will take this patch and test it in house.
>>
>> >
>> > For now, the ring is emptied upto the budget.  I don't know if it
>> > would be better to chunk it up more?
>>
>> Not sure, according to netdevice.h :
>>
>> /* Default NAPI poll() weight
>>  * Device drivers are strongly advised to not use bigger value
>>  */
>> #define NAPI_POLL_WEIGHT 64
>>
>> we will also compare different budget values with your approach, but I
>> doubt it will be accepted to increase the NAPI_POLL_WEIGHT for mlx5
>> drivers. furthermore increasing NAPI poll budget might cause cache overflow
>> with this approach since you are chunking up all "prefetch(skb->data)"
>> (I didn't do the math yet in regards of cache utilization with this
>> approach).
>
> You misunderstood me... I don't want to increase the NAPI_POLL_WEIGHT.
> I want to keep the 64, but sort of split it up, and e.g. call the stack
> for each 16 packets. Due to cache-size limits...
I see.

>
> One approach could be to compare the HW skb->hash to prev packet, and
> exit loop if they don't match (and call netstack with this bundle).
Sorry, I fail to see how this would help; either way you need an
inner budget of 16, as you said before.
>
>
>> > mlx5e_handle_csum(netdev, cqe, rq, skb);
>> >
>> > -   skb->protocol = eth_type_trans(skb, netdev);
>> > -
>>
>> mlx5e_handle_csum also access the skb->data in is_first_ethertype_ip
>> function, but i think it is not interesting since this is not the
>> common case,
>> e.g: for the none common case of L4 traffic with no HW checksum
>> offload you won't benefit from this optimization since we access the
>> skb->data to know the L3 header type, and this can be fixed in driver
>> code to check the CQE meta data for these fields instead of accessing
>> the skb->data, but I will need to look further into that.
>
> Okay, understood.  We should look into this too, but not as top priority.
> We can simply move mlx5e_handle_csum() like eth_type_trans().
No, it is not that simple. mlx5e_handle_csum needs the cqe from the
first loop, and referencing the cqe again in the second loop might
introduce new cache misses, since the cqe is already "cold". What I like
in your approach is that you separated two different flows
(read from device & create SKB bundle --> pass bundle to netstack);
now we don't want the "pass bundle to netstack" flow to look back at
the device's structures (cqes/wqes etc.).

Again, this is not the main issue for now as it is not the common case,
but we are preparing a patch that changes mlx5e_handle_csum to not
look at skb->data at all; we will share it once it is ready.
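
The two-loop structure discussed in this thread, with the 64-packet NAPI budget split into smaller bundles as Jesper suggests, can be sketched as follows. Everything here is a hypothetical user-space model: fake_type_trans() stands in for eth_type_trans() as the first touch of packet data, fake_stack_receive() stands in for napi_gro_receive(), and a BUNDLE_SIZE of 16 is the inner budget mentioned above. __builtin_prefetch() is a GCC/Clang builtin and only a hint.

```c
#include <assert.h>
#include <stddef.h>

#define NAPI_BUDGET 64 /* default NAPI_POLL_WEIGHT */
#define BUNDLE_SIZE 16 /* inner budget discussed in the thread */

struct fake_pkt {
	unsigned char data[64];
	int protocol;
};

static size_t delivered;

/* Stand-in for eth_type_trans(): the first read of packet data. */
static int fake_type_trans(const struct fake_pkt *p)
{
	return (p->data[12] << 8) | p->data[13];
}

/* Stand-in for napi_gro_receive(). */
static void fake_stack_receive(struct fake_pkt *p)
{
	(void)p;
	delivered++;
}

static int fake_poll_rx(struct fake_pkt **ring, int avail, int budget)
{
	struct fake_pkt *bundle[BUNDLE_SIZE];
	int done = 0;

	while (done < budget && done < avail) {
		int cnt = 0;

		/* Loop 1: pull packets off the ring and start prefetching,
		 * without touching packet data yet. */
		while (cnt < BUNDLE_SIZE && done < budget && done < avail) {
			struct fake_pkt *p = ring[done++];

			__builtin_prefetch(p->data);
			bundle[cnt++] = p;
		}

		/* Loop 2: touch data and hand the bundle to the stack; no
		 * ring descriptors are revisited here. */
		for (int i = 0; i < cnt; i++) {
			bundle[i]->protocol = fake_type_trans(bundle[i]);
			fake_stack_receive(bundle[i]);
		}
	}
	return done;
}
```

Capping each bundle at 16 limits how many prefetched lines must stay resident before the second loop consumes them, while the outer loop still honors the full 64-packet budget.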
>
>
>> > @@ -252,7 +257,6 @@ int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget)
>> > wqe_counter= be16_to_cpu(wqe_counter_be);
>> > wqe= mlx5_wq_ll_get_wqe(&rq->wq, wqe_counter);
>> > skb= rq->skb[wqe_counter];
>> > -   prefetch(skb->data);
>> > rq->skb[wqe_counter] = NULL;
>> >
>> > dma_unmap_single(rq->pdev,
>> > @@ -265,16 +269,27 @@ int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget)
>> > dev_kfree_skb(skb);
>> > goto wq_ll_pop;
>> > }
>> > +   prefetch(skb->data);
>>
>> is this optimal for all CPU archs ?
>
> For some CPU ARCHs the prefetch is compile time removed.
>
>> is it ok to use up to 64 cache lines at once ?
>
> That is not the problem, using 64 cache-lines * 64 = 4096 bytes.
> The B

Re: [PATCH] ravb: Update DT binding example for final CPG/MSSR bindings

2016-02-15 Thread Simon Horman

On Mon, Feb 15, 2016 at 01:41:31PM +0100, Geert Uytterhoeven wrote:
> The example in the DT binding documentation uses the preliminary DT
> bindings for the r8a7795 MSTP clocks, which never went upstream.
> Update the example to use the DT bindings for the upstream Clock Pulse
> Generator / Module Standby and Software Reset hardware block.
> 
> Signed-off-by: Geert Uytterhoeven 

Reviewed-by: Simon Horman

Re: [PATCH 2/2] sh_eth: kill useless switch defaults

2016-02-15 Thread Simon Horman

On Sun, Feb 14, 2016 at 10:56:33PM +0300, Sergei Shtylyov wrote:
> The driver often has the *default* cases doing nothing in the *switch*
> statements with  the integer expressions -- remove them.
> 
> Signed-off-by: Sergei Shtylyov 

Reviewed-by: Simon Horman

Re: [PATCH 1/2] ravb: kill useless switch defaults

2016-02-15 Thread Simon Horman

On Sun, Feb 14, 2016 at 10:56:03PM +0300, Sergei Shtylyov wrote:
> The  driver has the *default* case doing nothing in the *switch* statement
> with an integer expression -- remove it.
> 
> Signed-off-by: Sergei Shtylyov 

Reviewed-by: Simon Horman

[PATCH] net-sysfs: remove unused fmt_long_hex

2016-02-15 Thread Colin King

From: Colin Ian King 

Ever since commit 04ed3e741d0f133e02bed7fa5c98edba128f90e7
("net: change netdev->features to u32") the format string
fmt_long_hex has not been used, so we may as well remove it.

Signed-off-by: Colin Ian King 
---
 net/core/net-sysfs.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index b6c8a66..e326707 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -29,7 +29,6 @@
 
 #ifdef CONFIG_SYSFS
 static const char fmt_hex[] = "%#x\n";
-static const char fmt_long_hex[] = "%#lx\n";
 static const char fmt_dec[] = "%d\n";
 static const char fmt_ulong[] = "%lu\n";
 static const char fmt_u64[] = "%llu\n";
-- 
2.7.0

[PATCH v3] phy: marvell: Fix and unify reg-init behavior

2016-02-15 Thread Clemens Gruber

For the Marvell 88E1510, marvell_of_reg_init was called too late, in the
config_aneg function.
Since commit 113c74d83eef ("net: phy: turn carrier off on phy attach"),
this led to the link not coming up at boot anymore, because the PHY
state machine was stuck waiting for interrupts (which are off by default
on the 88E1510).
For seven other Marvell PHYs, marvell_of_reg_init was not called at all.

Add a generic marvell_config_init function, which in turn calls
marvell_of_reg_init.
PHYs that already have a specific config_init function calling
marvell_of_reg_init are left untouched. The generic marvell_config_init
function is called for all the others, to get consistent behavior across
all Marvell PHYs.

Signed-off-by: Clemens Gruber 
---

Changes from v2:
- Simplified marvell_config_init (No preemptive error handling)

---
 drivers/net/phy/marvell.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index e3eb964..ab1d0fc 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -446,6 +446,12 @@ static int m88e1510_config_aneg(struct phy_device *phydev)
if (err < 0)
return err;
 
+   return 0;
+}
+
+static int marvell_config_init(struct phy_device *phydev)
+{
+   /* Set registers from marvell,reg-init DT property */
return marvell_of_reg_init(phydev);
 }
 
@@ -495,7 +501,7 @@ static int m88e1116r_config_init(struct phy_device *phydev)
 
mdelay(500);
 
-   return 0;
+   return marvell_config_init(phydev);
 }
 
 static int m88e3016_config_init(struct phy_device *phydev)
@@ -514,7 +520,7 @@ static int m88e3016_config_init(struct phy_device *phydev)
if (reg < 0)
return reg;
 
-   return 0;
+   return marvell_config_init(phydev);
 }
 
 static int m88e_config_init(struct phy_device *phydev)
@@ -1078,6 +1084,7 @@ static struct phy_driver marvell_drivers[] = {
.features = PHY_GBIT_FEATURES,
.probe = marvell_probe,
.flags = PHY_HAS_INTERRUPT,
+   .config_init = &marvell_config_init,
.config_aneg = &marvell_config_aneg,
.read_status = &genphy_read_status,
.ack_interrupt = &marvell_ack_interrupt,
@@ -1149,6 +1156,7 @@ static struct phy_driver marvell_drivers[] = {
.features = PHY_GBIT_FEATURES,
.flags = PHY_HAS_INTERRUPT,
.probe = marvell_probe,
+   .config_init = &marvell_config_init,
.config_aneg = &m88e1121_config_aneg,
.read_status = &marvell_read_status,
.ack_interrupt = &marvell_ack_interrupt,
@@ -1167,6 +1175,7 @@ static struct phy_driver marvell_drivers[] = {
.features = PHY_GBIT_FEATURES,
.flags = PHY_HAS_INTERRUPT,
.probe = marvell_probe,
+   .config_init = &marvell_config_init,
.config_aneg = &m88e1318_config_aneg,
.read_status = &marvell_read_status,
.ack_interrupt = &marvell_ack_interrupt,
@@ -1259,6 +1268,7 @@ static struct phy_driver marvell_drivers[] = {
.features = PHY_GBIT_FEATURES,
.flags = PHY_HAS_INTERRUPT,
.probe = marvell_probe,
+   .config_init = &marvell_config_init,
.config_aneg = &m88e1510_config_aneg,
.read_status = &marvell_read_status,
.ack_interrupt = &marvell_ack_interrupt,
@@ -1277,6 +1287,7 @@ static struct phy_driver marvell_drivers[] = {
.features = PHY_GBIT_FEATURES,
.flags = PHY_HAS_INTERRUPT,
.probe = marvell_probe,
+   .config_init = &marvell_config_init,
.config_aneg = &m88e1510_config_aneg,
.read_status = &marvell_read_status,
.ack_interrupt = &marvell_ack_interrupt,
-- 
2.7.1
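
The ordering problem in this commit message can be modeled with a tiny ops table. config_init runs when the PHY is attached and initialized, while config_aneg runs only later from the state machine, so register setup placed in config_aneg is missed if the state machine stalls first. The structures and functions below are hypothetical stand-ins for struct phy_driver and phylib, not the real API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_phy {
	bool reg_init_done;
	bool aneg_started;
};

/* Hypothetical subset of struct phy_driver. */
struct fake_phy_driver {
	int (*config_init)(struct fake_phy *phy);
	int (*config_aneg)(struct fake_phy *phy);
};

/* Stand-in for marvell_of_reg_init(): apply the marvell,reg-init values. */
static int fake_of_reg_init(struct fake_phy *phy)
{
	phy->reg_init_done = true;
	return 0;
}

/* Generic hook, in the spirit of marvell_config_init(). */
static int fake_config_init(struct fake_phy *phy)
{
	return fake_of_reg_init(phy);
}

static int fake_config_aneg(struct fake_phy *phy)
{
	phy->aneg_started = true;
	return 0;
}

/* Model of attach: only config_init is guaranteed to run here. */
static int fake_phy_attach(const struct fake_phy_driver *drv,
			   struct fake_phy *phy)
{
	return drv->config_init ? drv->config_init(phy) : 0;
}
```

A driver entry without a config_init hook therefore reaches the running state with no reg-init applied, which is the situation the patch fixes for the listed PHYs.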

Re: [PATCH v2] phy: marvell: Fix and unify reg-init behavior

2016-02-15 Thread Clemens Gruber

Hi Fabio,

On Mon, Feb 15, 2016 at 06:54:29PM -0200, Fabio Estevam wrote:
> On Mon, Feb 15, 2016 at 6:01 PM, Clemens Gruber
>  wrote:
> 
> > +static int marvell_config_init(struct phy_device *phydev)
> > +{
> > +   int err;
> > +
> > +   /* Set registers from marvell,reg-init DT property */
> > +   err = marvell_of_reg_init(phydev);
> > +   if (err < 0)
> > +   return err;
> > +
> > +   return 0;
> >  }
> 
> Couldn't this be replaced by
> 
> return marvell_of_reg_init(phydev); ?

I wanted to add some missing errata fixes from the Marvell Release Notes
into that function (in the near future).
But you are right, I should probably not change things preemptively.

I'll send a v3 with that part replaced!

Thanks,
Clemens

Re: [PATCH net-next 1/3] lwtunnel: autoload of lwt modules

2016-02-15 Thread Eric W. Biederman

Robert Shearman  writes:

> The lwt implementations using net devices can autoload using the
> existing mechanism using IFLA_INFO_KIND. However, there's no mechanism
> that lwt modules not using net devices can use.
>
> Therefore, add the ability to autoload modules registering lwt
> operations for lwt implementations not using a net device so that
> users don't have to manually load the modules.
>
> Signed-off-by: Robert Shearman 
> ---
>  include/net/lwtunnel.h |  4 +++-
>  net/core/lwtunnel.c| 32 
>  2 files changed, 35 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
> index 66350ce3e955..e9f116e29c22 100644
> --- a/include/net/lwtunnel.h
> +++ b/include/net/lwtunnel.h
> @@ -170,6 +170,8 @@ static inline int lwtunnel_input(struct sk_buff *skb)
>   return -EOPNOTSUPP;
>  }
>  
> -#endif
> +#endif /* CONFIG_LWTUNNEL */
> +
> +#define MODULE_ALIAS_RTNL_LWT(encap_type) MODULE_ALIAS("rtnl-lwt-" __stringify(encap_type))
>  
>  #endif /* __NET_LWTUNNEL_H */
> diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
> index 299cfc24d888..8ef5e5cec03e 100644
> --- a/net/core/lwtunnel.c
> +++ b/net/core/lwtunnel.c
> @@ -27,6 +27,30 @@
>  #include 
>  #include 
>  
> +#ifdef CONFIG_MODULES
> +
> +static const char *lwtunnel_encap_str(enum lwtunnel_encap_types encap_type)
> +{
> + switch (encap_type) {
> + case LWTUNNEL_ENCAP_MPLS:
> + return "LWTUNNEL_ENCAP_MPLS";
> + case LWTUNNEL_ENCAP_IP:
> + return "LWTUNNEL_ENCAP_IP";
> + case LWTUNNEL_ENCAP_ILA:
> + return "LWTUNNEL_ENCAP_ILA";
> + case LWTUNNEL_ENCAP_IP6:
> + return "LWTUNNEL_ENCAP_IP6";
> + case LWTUNNEL_ENCAP_NONE:
> + case __LWTUNNEL_ENCAP_MAX:
> + /* should not have got here */
> + break;
> + }
> + WARN_ON(1);
> + return "LWTUNNEL_ENCAP_NONE";
> +}
> +
> +#endif /* CONFIG_MODULES */
> +
>  struct lwtunnel_state *lwtunnel_state_alloc(int encap_len)
>  {
>   struct lwtunnel_state *lws;
> @@ -85,6 +109,14 @@ int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
>   ret = -EOPNOTSUPP;
>   rcu_read_lock();
>   ops = rcu_dereference(lwtun_encaps[encap_type]);
> +#ifdef CONFIG_MODULES
> + if (!ops) {
> + rcu_read_unlock();
> + request_module("rtnl-lwt-%s", lwtunnel_encap_str(encap_type));
> + rcu_read_lock();
> + ops = rcu_dereference(lwtun_encaps[encap_type]);
> + }
> +#endif
>   if (likely(ops && ops->build_state))
>   ret = ops->build_state(dev, encap, family, cfg, lws);
>   rcu_read_unlock();

My memory is fuzzy on how this is done elsewhere, but this looks like it
needs a capability check to ensure that non-root users can't trigger
this.

It tends to be problematic if a non-root user can trigger an autoload of
a known-buggy module.  With a combination of user namespaces and network
namespaces unprivileged users can cause just about every corner of the
network stack to be exercised.

Eric
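
Autoloading only works if the string passed to request_module() matches an alias the module declares. The sketch below reproduces both sides in user space. The request side formats "rtnl-lwt-%s" with the encap name as lwtunnel_encap_str() does in this patch. The module side stringifies its argument the way MODULE_ALIAS_RTNL_LWT() does through __stringify(). STRINGIFY, RTNL_LWT_ALIAS, encap_str() and request_name() are local, hypothetical re-creations, and the enum mirrors the encap types listed in the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum lwtunnel_encap_types {
	LWTUNNEL_ENCAP_NONE,
	LWTUNNEL_ENCAP_MPLS,
	LWTUNNEL_ENCAP_IP,
	LWTUNNEL_ENCAP_ILA,
	LWTUNNEL_ENCAP_IP6,
};

/* Two-level stringification, like the kernel's __stringify(). */
#define STRINGIFY_1(x) #x
#define STRINGIFY(x) STRINGIFY_1(x)

/* Module side: the alias string MODULE_ALIAS_RTNL_LWT(encap_type) embeds.
 * Enum constants are not macros, so the name itself is stringified. */
#define RTNL_LWT_ALIAS(encap_type) ("rtnl-lwt-" STRINGIFY(encap_type))

/* Request side: same mapping as lwtunnel_encap_str() in the patch. */
static const char *encap_str(enum lwtunnel_encap_types t)
{
	switch (t) {
	case LWTUNNEL_ENCAP_MPLS: return "LWTUNNEL_ENCAP_MPLS";
	case LWTUNNEL_ENCAP_IP:   return "LWTUNNEL_ENCAP_IP";
	case LWTUNNEL_ENCAP_ILA:  return "LWTUNNEL_ENCAP_ILA";
	case LWTUNNEL_ENCAP_IP6:  return "LWTUNNEL_ENCAP_IP6";
	default:                  return "LWTUNNEL_ENCAP_NONE";
	}
}

/* Builds the name request_module("rtnl-lwt-%s", ...) would look up. */
static void request_name(char *buf, size_t len, enum lwtunnel_encap_types t)
{
	snprintf(buf, len, "rtnl-lwt-%s", encap_str(t));
}
```

Under this naming, a module would need to pass the full enum name (for example LWTUNNEL_ENCAP_MPLS) to MODULE_ALIAS_RTNL_LWT() for the two strings to line up. The capability check Eric raises is a separate gate in front of request_module() and is not modeled here.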

Re: IPv4/IPv6 sysctl defaults in new namespace

2016-02-15 Thread Eric W. Biederman

Konstantin Khlebnikov  writes:

> IPv6 initialized with default. That's ok.
> IPv4 makes a copy from init_net. Looks like a bug, here
> v2.6.24-2577-g752d14dc6aa9
>
> root@zurg:~# sysctl net.ipv4.conf.all.forwarding=0 net.ipv6.conf.all.forwarding=0
> net.ipv4.conf.all.forwarding = 0
> net.ipv6.conf.all.forwarding = 0
> root@zurg:~# unshare -n sysctl net.ipv4.conf.all.forwarding net.ipv6.conf.all.forwarding
> net.ipv4.conf.all.forwarding = 0
> net.ipv6.conf.all.forwarding = 0
> root@zurg:~# sysctl net.ipv4.conf.all.forwarding=1 net.ipv6.conf.all.forwarding=1
> net.ipv4.conf.all.forwarding = 1
> net.ipv6.conf.all.forwarding = 1
> root@zurg:~# unshare -n sysctl net.ipv4.conf.all.forwarding net.ipv6.conf.all.forwarding
> net.ipv4.conf.all.forwarding = 1
> net.ipv6.conf.all.forwarding = 0
>
> This is nasty. Could we fix this or this bug set in stone?

The test is whether we break anyone, and the initial network namespace
is arbitrary enough that I don't think anyone can depend upon a specific
value when creating a new network namespace.  So if someone is willing
to do the work, we can fix this.

Of course the fixes would have to be made against a recent kernel, not
something as ancient as 2.6.24.

Eric

[PATCH net-next 3/3] ipv4: Remove inet_lro library

2016-02-15 Thread Ben Hutchings

There are no longer any in-tree drivers that use it.

Signed-off-by: Ben Hutchings 
---
 include/linux/inet_lro.h | 142 --
 net/ipv4/Kconfig |   8 -
 net/ipv4/Makefile|   1 -
 net/ipv4/inet_lro.c  | 374 ---
 4 files changed, 525 deletions(-)
 delete mode 100644 include/linux/inet_lro.h
 delete mode 100644 net/ipv4/inet_lro.c

diff --git a/include/linux/inet_lro.h b/include/linux/inet_lro.h
deleted file mode 100644
index 9a715cfa1fe3..
--- a/include/linux/inet_lro.h
+++ /dev/null
@@ -1,142 +0,0 @@
-/*
- *  linux/include/linux/inet_lro.h
- *
- *  Large Receive Offload (ipv4 / tcp)
- *
- *  (C) Copyright IBM Corp. 2007
- *
- *  Authors:
- *   Jan-Bernd Themann 
- *   Christoph Raisch 
- *
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2, or (at your option)
- * any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-#ifndef __INET_LRO_H_
-#define __INET_LRO_H_
-
-#include 
-#include 
-
-/*
- * LRO statistics
- */
-
-struct net_lro_stats {
-   unsigned long aggregated;
-   unsigned long flushed;
-   unsigned long no_desc;
-};
-
-/*
- * LRO descriptor for a tcp session
- */
-struct net_lro_desc {
-   struct sk_buff *parent;
-   struct sk_buff *last_skb;
-   struct skb_frag_struct *next_frag;
-   struct iphdr *iph;
-   struct tcphdr *tcph;
-   __wsum  data_csum;
-   __be32 tcp_rcv_tsecr;
-   __be32 tcp_rcv_tsval;
-   __be32 tcp_ack;
-   u32 tcp_next_seq;
-   u32 skb_tot_frags_len;
-   u16 ip_tot_len;
-   u16 tcp_saw_tstamp; /* timestamps enabled */
-   __be16 tcp_window;
-   int pkt_aggr_cnt;   /* counts aggregated packets */
-   int vlan_packet;
-   int mss;
-   int active;
-};
-
-/*
- * Large Receive Offload (LRO) Manager
- *
- * Fields must be set by driver
- */
-
-struct net_lro_mgr {
-   struct net_device *dev;
-   struct net_lro_stats stats;
-
-   /* LRO features */
-   unsigned long features;
-#define LRO_F_NAPI            1  /* Pass packets to stack via NAPI */
-#define LRO_F_EXTRACT_VLAN_ID 2  /* Set flag if VLAN IDs are extracted
-   from received packets and eth protocol
-   is still ETH_P_8021Q */
-
-   /*
-* Set for generated SKBs that are not added to
-* the frag list in fragmented mode
-*/
-   u32 ip_summed;
-   u32 ip_summed_aggr; /* Set in aggregated SKBs: CHECKSUM_UNNECESSARY
-* or CHECKSUM_NONE */
-
-   int max_desc; /* Max number of LRO descriptors  */
-   int max_aggr; /* Max number of LRO packets to be aggregated */
-
-   int frag_align_pad; /* Padding required to properly align layer 3
-* headers in generated skb when using frags */
-
-   struct net_lro_desc *lro_arr; /* Array of LRO descriptors */
-
-   /*
-* Optimized driver functions
-*
-* get_skb_header: returns tcp and ip header for packet in SKB
-*/
-   int (*get_skb_header)(struct sk_buff *skb, void **ip_hdr,
- void **tcpudp_hdr, u64 *hdr_flags, void *priv);
-
-   /* hdr_flags: */
-#define LRO_IPV4 1 /* ip_hdr is IPv4 header */
-#define LRO_TCP  2 /* tcpudp_hdr is TCP header */
-
-   /*
-* get_frag_header: returns mac, tcp and ip header for packet in SKB
-*
-* @hdr_flags: Indicate what kind of LRO has to be done
-* (IPv4/IPv6/TCP/UDP)
-*/
-   int (*get_frag_header)(struct skb_frag_struct *frag, void **mac_hdr,
-  void **ip_hdr, void **tcpudp_hdr, u64 *hdr_flags,
-  void *priv);
-};
-
-/*
- * Processes a SKB
- *
- * @lro_mgr: LRO manager to use
- * @skb: SKB to aggregate
- * @priv: Private data that may be used by driver functions
- *(for example get_tcp_ip_hdr)
- */
-
-void lro_receive_skb(struct net_lro_mgr *lro_mgr,
-struct sk_buff *skb,
-void *priv);
-/*
- * Forward all aggregated SKBs held by lro_mgr to network stack
- */
-
-void lro_flush_all(struct net_lro_mgr *lro_mgr);
-
-#endif
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 775824720b6b..6c4b79c98fda 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -405,14 +405,6 @@ config
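The removed header shows the per-flow state inet_lro kept (tcp_next_seq and pkt_aggr_cnt in struct net_lro_desc) and the max_aggr limit in struct net_lro_mgr. As a rough userspace model of the core merge decision (names are hypothetical, and the real library also checked TCP flags, timestamps and header lengths), a segment is merged only if it continues the byte stream exactly and the aggregation limit has not been reached. GRO makes a similar in-order check in the core stack, which is why drivers no longer need their own copy.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of the fields of struct net_lro_desc
 * that matter for the in-order check. */
struct lro_flow_model {
	uint32_t tcp_next_seq;  /* sequence number expected next */
	int pkt_aggr_cnt;       /* segments merged so far */
	bool active;            /* an aggregate is currently held */
};

/* Returns true if the segment was accepted (starting or extending the
 * aggregate). False means the caller should flush the held aggregate
 * and start a new one with this segment. */
static bool lro_model_try_merge(struct lro_flow_model *f, uint32_t seq,
				uint32_t payload_len, int max_aggr)
{
	if (!f->active) {
		f->active = true;
		f->tcp_next_seq = seq + payload_len;
		f->pkt_aggr_cnt = 1;
		return true;
	}
	/* Out of order, or the max_aggr limit is already reached. */
	if (seq != f->tcp_next_seq || f->pkt_aggr_cnt >= max_aggr)
		return false;
	/* Unsigned addition wraps modulo 2^32, like TCP sequence space. */
	f->tcp_next_seq += payload_len;
	f->pkt_aggr_cnt++;
	return true;
}
```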

[PATCH net-next 2/3] RDMA/nes: Replace LRO with GRO

2016-02-15 Thread Ben Hutchings

GRO is simpler to use than the old inet_lro library, and is compatible
with forwarding and bridging configurations.

Compile-tested only.

Signed-off-by: Ben Hutchings 
---
 drivers/infiniband/hw/nes/Kconfig   |  1 -
 drivers/infiniband/hw/nes/nes_hw.c  | 44 +
 drivers/infiniband/hw/nes/nes_hw.h  |  7 --
 drivers/infiniband/hw/nes/nes_nic.c |  7 --
 4 files changed, 1 insertion(+), 58 deletions(-)

diff --git a/drivers/infiniband/hw/nes/Kconfig b/drivers/infiniband/hw/nes/Kconfig
index 846dc97cf260..7964eba8e7ed 100644
--- a/drivers/infiniband/hw/nes/Kconfig
+++ b/drivers/infiniband/hw/nes/Kconfig
@@ -2,7 +2,6 @@ config INFINIBAND_NES
tristate "NetEffect RNIC Driver"
depends on PCI && INET && INFINIBAND
select LIBCRC32C
-   select INET_LRO
---help---
  This is the RDMA Network Interface Card (RNIC) driver for
  NetEffect Ethernet Cluster Server Adapters.
diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c
index 4713dd7ed764..a1c6481d8038 100644
--- a/drivers/infiniband/hw/nes/nes_hw.c
+++ b/drivers/infiniband/hw/nes/nes_hw.c
@@ -35,18 +35,11 @@
 #include 
 #include 
 #include 
-#include 
-#include 
 #include 
-#include 
 #include 
 
 #include "nes.h"
 
-static unsigned int nes_lro_max_aggr = NES_LRO_MAX_AGGR;
-module_param(nes_lro_max_aggr, uint, 0444);
-MODULE_PARM_DESC(nes_lro_max_aggr, "NIC LRO max packet aggregation");
-
 static int wide_ppm_offset;
 module_param(wide_ppm_offset, int, 0644);
 MODULE_PARM_DESC(wide_ppm_offset, "Increase CX4 interface clock ppm offset, 0=100ppm (default), 1=300ppm");
@@ -1642,25 +1635,6 @@ static void nes_rq_wqes_timeout(unsigned long parm)
 }
 
 
-static int nes_lro_get_skb_hdr(struct sk_buff *skb, void **iphdr,
-  void **tcph, u64 *hdr_flags, void *priv)
-{
-   unsigned int ip_len;
-   struct iphdr *iph;
-   skb_reset_network_header(skb);
-   iph = ip_hdr(skb);
-   if (iph->protocol != IPPROTO_TCP)
-   return -1;
-   ip_len = ip_hdrlen(skb);
-   skb_set_transport_header(skb, ip_len);
-   *tcph = tcp_hdr(skb);
-
-   *hdr_flags = LRO_IPV4 | LRO_TCP;
-   *iphdr = iph;
-   return 0;
-}
-
-
 /**
  * nes_init_nic_qp
  */
@@ -1895,14 +1869,6 @@ int nes_init_nic_qp(struct nes_device *nesdev, struct net_device *netdev)
return -ENOMEM;
}
 
-   nesvnic->lro_mgr.max_aggr   = nes_lro_max_aggr;
-   nesvnic->lro_mgr.max_desc   = NES_MAX_LRO_DESCRIPTORS;
-   nesvnic->lro_mgr.lro_arr    = nesvnic->lro_desc;
-   nesvnic->lro_mgr.get_skb_header = nes_lro_get_skb_hdr;
-   nesvnic->lro_mgr.features   = LRO_F_NAPI | LRO_F_EXTRACT_VLAN_ID;
-   nesvnic->lro_mgr.dev        = netdev;
-   nesvnic->lro_mgr.ip_summed  = CHECKSUM_UNNECESSARY;
-   nesvnic->lro_mgr.ip_summed_aggr = CHECKSUM_UNNECESSARY;
return 0;
 }
 
@@ -2809,13 +2775,10 @@ void nes_nic_ce_handler(struct nes_device *nesdev, struct nes_hw_nic_cq *cq)
u16 pkt_type;
u16 rqes_processed = 0;
u8 sq_cqes = 0;
-   u8 nes_use_lro = 0;
 
head = cq->cq_head;
cq_size = cq->cq_size;
cq->cqes_pending = 1;
-   if (nesvnic->netdev->features & NETIF_F_LRO)
-   nes_use_lro = 1;
do {
if (le32_to_cpu(cq->cq_vbase[head].cqe_words[NES_NIC_CQE_MISC_IDX]) &
NES_NIC_CQE_VALID) {
@@ -2950,10 +2913,7 @@ void nes_nic_ce_handler(struct nes_device *nesdev, struct nes_hw_nic_cq *cq)
 
__vlan_hwaccel_put_tag(rx_skb, htons(ETH_P_8021Q), vlan_tag);
}
-   if (nes_use_lro)
-   lro_receive_skb(&nesvnic->lro_mgr, rx_skb, NULL);
-   else
-   netif_receive_skb(rx_skb);
+   napi_gro_receive(&nesvnic->napi, rx_skb);
 
 skip_rx_indicate0:
;
@@ -2984,8 +2944,6 @@ skip_rx_indicate0:
 
} while (1);
 
-   if (nes_use_lro)
-   lro_flush_all(&nesvnic->lro_mgr);
if (sq_cqes) {
barrier();
/* restart the queue if it had been stopped */
diff --git a/drivers/infiniband/hw/nes/nes_hw.h b/drivers/infiniband/hw/nes/nes_hw.h
index c9080208aad2..1b66ef1e9937 100644
--- a/drivers/infiniband/hw/nes/nes_hw.h
+++ b/drivers/infiniband/hw/nes/nes_hw.h
@@ -33,8 +33,6 @@
 #ifndef __NES_HW_H
 #define __NES_HW_H
 
-#include 
-
 #define NES_PHY_TYPE_CX4   1
 #define NES_PHY_TYPE_1G    2
 #define NES_PHY_TYPE_ARGUS 4
@@ -1049,8 +1047,6 @@ struct nes_hw_tune_timer {
 #define NES_TIMER_ENABLE_LIMIT  4
 #define NES_MAX_LINK_INTERRUPTS 128
 #define NES_MAX_LINK_CHECK  200
-#define NES_MAX_LRO_DESCRIPTORS 32
-#define NES_LRO_MAX_AG

[PATCH net-next 1/3] pasemi_mac: Replace LRO with GRO

2016-02-15 Thread Ben Hutchings

GRO is simpler to use than the old inet_lro library, and is compatible
with forwarding and bridging configurations.

Compile-tested only.

Signed-off-by: Ben Hutchings 
---
 drivers/net/ethernet/pasemi/Kconfig  |  5 +--
 drivers/net/ethernet/pasemi/pasemi_mac.c | 50 +---
 drivers/net/ethernet/pasemi/pasemi_mac.h |  4 --
 drivers/net/ethernet/pasemi/pasemi_mac_ethtool.c |  1 -
 4 files changed, 3 insertions(+), 57 deletions(-)

diff --git a/drivers/net/ethernet/pasemi/Kconfig b/drivers/net/ethernet/pasemi/Kconfig
index db19c6f49859..7c92e8306c19 100644
--- a/drivers/net/ethernet/pasemi/Kconfig
+++ b/drivers/net/ethernet/pasemi/Kconfig
@@ -5,7 +5,7 @@
 config NET_VENDOR_PASEMI
bool "PA Semi devices"
default y
-   depends on PPC_PASEMI && PCI && INET
+   depends on PPC_PASEMI && PCI
---help---
  If you have a network (Ethernet) card belonging to this class, say Y.
 
@@ -18,9 +18,8 @@ if NET_VENDOR_PASEMI
 
 config PASEMI_MAC
tristate "PA Semi 1/10Gbit MAC"
-   depends on PPC_PASEMI && PCI && INET
+   depends on PPC_PASEMI && PCI
select PHYLIB
-   select INET_LRO
---help---
  This driver supports the on-chip 1/10Gbit Ethernet controller on
  PA Semi's PWRficient line of chips.
diff --git a/drivers/net/ethernet/pasemi/pasemi_mac.c b/drivers/net/ethernet/pasemi/pasemi_mac.c
index 57a6e6cd74fc..af54df52aa6b 100644
--- a/drivers/net/ethernet/pasemi/pasemi_mac.c
+++ b/drivers/net/ethernet/pasemi/pasemi_mac.c
@@ -30,9 +30,7 @@
 #include 
 
 #include 
-#include 
 #include 
-#include 
 #include 
 
 #include 
@@ -52,12 +50,9 @@
  *
  * - Multicast support
  * - Large MTU support
- * - SW LRO
  * - Multiqueue RX/TX
  */
 
-#define LRO_MAX_AGGR 64
-
 #define PE_MIN_MTU 64
 #define PE_MAX_MTU 9000
 #define PE_DEF_MTU ETH_DATA_LEN
@@ -257,37 +252,6 @@ static int pasemi_mac_set_mac_addr(struct net_device *dev, void *p)
return 0;
 }
 
-static int get_skb_hdr(struct sk_buff *skb, void **iphdr,
-  void **tcph, u64 *hdr_flags, void *data)
-{
-   u64 macrx = (u64) data;
-   unsigned int ip_len;
-   struct iphdr *iph;
-
-   /* IPv4 header checksum failed */
-   if ((macrx & XCT_MACRX_HTY_M) != XCT_MACRX_HTY_IPV4_OK)
-   return -1;
-
-   /* non tcp packet */
-   skb_reset_network_header(skb);
-   iph = ip_hdr(skb);
-   if (iph->protocol != IPPROTO_TCP)
-   return -1;
-
-   ip_len = ip_hdrlen(skb);
-   skb_set_transport_header(skb, ip_len);
-   *tcph = tcp_hdr(skb);
-
-   /* check if ip header and tcp header are complete */
-   if (ntohs(iph->tot_len) < ip_len + tcp_hdrlen(skb))
-   return -1;
-
-   *hdr_flags = LRO_IPV4 | LRO_TCP;
-   *iphdr = iph;
-
-   return 0;
-}
-
 static int pasemi_mac_unmap_tx_skb(struct pasemi_mac *mac,
const int nfrags,
struct sk_buff *skb,
@@ -817,7 +781,7 @@ static int pasemi_mac_clean_rx(struct pasemi_mac_rxring *rx,
skb_put(skb, len-4);
 
skb->protocol = eth_type_trans(skb, mac->netdev);
-   lro_receive_skb(&mac->lro_mgr, skb, (void *)macrx);
+   napi_gro_receive(&mac->napi, skb);
 
 next:
RX_DESC(rx, n) = 0;
@@ -839,8 +803,6 @@ next:
 
rx_ring(mac)->next_to_clean = n;
 
-   lro_flush_all(&mac->lro_mgr);
-
/* Increase is in number of 16-byte entries, and since each descriptor
 * with an 8BRES takes up 3x8 bytes (padded to 4x8), increase with
 * count*2.
@@ -1754,16 +1716,6 @@ pasemi_mac_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
dev->features = NETIF_F_IP_CSUM | NETIF_F_LLTX | NETIF_F_SG |
NETIF_F_HIGHDMA | NETIF_F_GSO;
 
-   mac->lro_mgr.max_aggr = LRO_MAX_AGGR;
-   mac->lro_mgr.max_desc = MAX_LRO_DESCRIPTORS;
-   mac->lro_mgr.lro_arr = mac->lro_desc;
-   mac->lro_mgr.get_skb_header = get_skb_hdr;
-   mac->lro_mgr.features = LRO_F_NAPI | LRO_F_EXTRACT_VLAN_ID;
-   mac->lro_mgr.dev = mac->netdev;
-   mac->lro_mgr.ip_summed = CHECKSUM_UNNECESSARY;
-   mac->lro_mgr.ip_summed_aggr = CHECKSUM_UNNECESSARY;
-
-
mac->dma_pdev = pci_get_device(PCI_VENDOR_ID_PASEMI, 0xa007, NULL);
if (!mac->dma_pdev) {
dev_err(&mac->pdev->dev, "Can't find DMA Controller\n");
diff --git a/drivers/net/ethernet/pasemi/pasemi_mac.h b/drivers/net/ethernet/pasemi/pasemi_mac.h
index a5807703ab96..161c99a98403 100644
--- a/drivers/net/ethernet/pasemi/pasemi_mac.h
+++ b/drivers/net/ethernet/pasemi/pasemi_mac.h
@@ -31,7 +31,6 @@
 #define CS_RING_SIZE (TX_RING_SIZE*2)
 
 
-#define MAX_LRO_DESCRIPTORS 8
 #define MAX_CS 2
 
 struct pasemi_mac_txring {
@@ -84,10 +83,7 @@ struct pasemi_mac {
 
u8  mac_addr[ETH_ALEN];
 
-   struct net_lro_mgr
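The removed get_skb_hdr() in pasemi_mac.c only handed a frame to inet_lro after a few header sanity checks: IPv4, protocol TCP, and an IP total length covering both headers. A userspace sketch of the same checks on a raw byte buffer could look like this; the function name is hypothetical, and the hardware checksum-status test on macrx is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 'pkt' points at the IPv4 header; 'caplen' is the number of bytes
 * available. Returns true only for IPv4/TCP whose tot_len covers the
 * IP header plus the TCP header. */
static bool ipv4_tcp_headers_complete(const uint8_t *pkt, size_t caplen)
{
	size_t ip_len, tcp_len, tot_len;

	if (caplen < 20 || (pkt[0] >> 4) != 4)
		return false;                          /* too short or not IPv4 */
	ip_len = (size_t)(pkt[0] & 0x0f) * 4;          /* IHL in 32-bit words */
	if (ip_len < 20 || caplen < ip_len + 20)
		return false;                          /* need a full TCP header */
	if (pkt[9] != 6)                               /* IPPROTO_TCP */
		return false;
	tot_len = ((size_t)pkt[2] << 8) | pkt[3];      /* network byte order */
	tcp_len = (size_t)(pkt[ip_len + 12] >> 4) * 4; /* TCP data offset */
	if (tcp_len < 20)
		return false;
	/* Same idea as: ntohs(iph->tot_len) < ip_len + tcp_hdrlen(skb) */
	return tot_len >= ip_len + tcp_len;
}
```

With GRO the stack parses the headers itself, which is why the patch can drop this callback entirely and just call napi_gro_receive().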

[PATCH net-next 0/3] Remove the inet_lro library

2016-02-15 Thread Ben Hutchings

The old inet_lro library has been deprecated ever since GRO was
introduced, but there are still a few drivers using it.  Convert
them to GRO and remove it.

Ben.

Ben Hutchings (3):
  pasemi_mac: Replace LRO with GRO
  RDMA/nes: Replace LRO with GRO
  ipv4: Remove inet_lro library

 drivers/infiniband/hw/nes/Kconfig|   1 -
 drivers/infiniband/hw/nes/nes_hw.c   |  44 +--
 drivers/infiniband/hw/nes/nes_hw.h   |   7 -
 drivers/infiniband/hw/nes/nes_nic.c  |   7 -
 drivers/net/ethernet/pasemi/Kconfig  |   5 +-
 drivers/net/ethernet/pasemi/pasemi_mac.c |  50 +--
 drivers/net/ethernet/pasemi/pasemi_mac.h |   4 -
 drivers/net/ethernet/pasemi/pasemi_mac_ethtool.c |   1 -
 include/linux/inet_lro.h | 142 -
 net/ipv4/Kconfig |   8 -
 net/ipv4/Makefile|   1 -
 net/ipv4/inet_lro.c  | 374 ---
 12 files changed, 4 insertions(+), 640 deletions(-)
 delete mode 100644 include/linux/inet_lro.h
 delete mode 100644 net/ipv4/inet_lro.c




Re: [net-next PATCH 1/1] net_sched fix: reclassification needs to consider ether protocol changes

2016-02-15 Thread Daniel Borkmann


On 02/15/2016 08:49 PM, Jamal Hadi Salim wrote:

From: Jamal Hadi Salim 

Actions could change the etherproto, in particular with Ethernet
tunnelled data. Typically such actions, after peeling the outer header,
will ask for the packet to be reclassified. We then need to restart
the classification with the new proto header.

Example setup used to catch this:
sudo tc qdisc add dev $ETH ingress
sudo tc filter add dev $ETH parent : prio 2 protocol 0xbeef \
u32 match u32 0 0 flowid 1:1 \
action ife decode reclassify


The ife action is out of tree, but I believe this should be possible with
the vlan action as well, using reclassify as the tc opcode.


Fixes: 3b3ae880266d ("net: sched: consolidate tc_classify{,_compat}")
Signed-off-by: Jamal Hadi Salim 


This has kbuild bot issues; I'd probably just move this under the 'reset'
label, like:

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index b5c2cf2..af1acf0 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1852,6 +1852,7 @@ reset:
}

tp = old_tp;
+   protocol = tc_skb_protocol(skb);
goto reclassify;
 #endif
 }

Thanks,
Daniel
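The control flow being fixed can be modeled in plain C. In this hypothetical sketch a filter list is walked, an action may rewrite the packet's protocol and request reclassification, and the protocol is re-read before restarting, which is the effect of both Jamal's patch and Daniel's suggested placement. The types, constants and loop limit are stand-ins, not the kernel's tc API; in the real tc_classify() the declaration of `protocol` has to stay outside `#ifdef CONFIG_NET_CLS_ACT`, otherwise builds without that option fail as the kbuild robot reported.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_ETH_P_ALL      0x0003
#define MODEL_NOMATCH        (-1)
#define MODEL_RECLASSIFY     (-2)  /* stand-in for TC_ACT_RECLASSIFY */
#define MODEL_MAX_RECLASSIFY 4     /* loop guard, loosely like 'limit' */

struct model_pkt {
	uint16_t proto;            /* stands in for tc_skb_protocol(skb) */
};

struct model_filter {
	uint16_t protocol;         /* like tp->protocol, host byte order */
	/* Returns a classid >= 0, MODEL_NOMATCH or MODEL_RECLASSIFY. */
	int (*classify)(struct model_pkt *pkt);
	const struct model_filter *next;
};

static int model_classify(struct model_pkt *pkt, const struct model_filter *tp)
{
	const struct model_filter *old_tp = tp;
	uint16_t protocol = pkt->proto;
	int limit = 0;

reclassify:
	for (; tp; tp = tp->next) {
		int err;

		if (tp->protocol != protocol && tp->protocol != MODEL_ETH_P_ALL)
			continue;
		err = tp->classify(pkt);
		if (err == MODEL_RECLASSIFY) {
			if (++limit >= MODEL_MAX_RECLASSIFY)
				return MODEL_NOMATCH;
			tp = old_tp;
			/* The fix: the action may have changed the protocol. */
			protocol = pkt->proto;
			goto reclassify;
		}
		if (err >= 0)
			return err;
	}
	return MODEL_NOMATCH;
}

/* Example actions: decapsulate a 0xbeef frame into IPv4, then match IPv4. */
static int decap_beef(struct model_pkt *pkt)
{
	pkt->proto = 0x0800;
	return MODEL_RECLASSIFY;
}

static int match_ipv4(struct model_pkt *pkt)
{
	(void)pkt;
	return 1;                  /* classid 1 */
}
```

Without the `protocol = pkt->proto;` line, the restarted walk would still look for 0xbeef, hit the decap filter again, and stop at the loop limit with no match.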

Re: [PATCH v2] phy: marvell: Fix and unify reg-init behavior

2016-02-15 Thread Fabio Estevam

On Mon, Feb 15, 2016 at 6:01 PM, Clemens Gruber
 wrote:

> +static int marvell_config_init(struct phy_device *phydev)
> +{
> +   int err;
> +
> +   /* Set registers from marvell,reg-init DT property */
> +   err = marvell_of_reg_init(phydev);
> +   if (err < 0)
> +   return err;
> +
> +   return 0;
>  }

Couldn't this be replaced by

return marvell_of_reg_init(phydev); ?
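The shorter form behaves the same only if marvell_of_reg_init() never returns a positive value: the longer form maps any positive return to 0, while a direct return passes it through. A small userspace check of that equivalence, with a hypothetical stub standing in for the real function:

```c
/* Value the stub returns; stands in for marvell_of_reg_init(). */
static int stub_ret;

static int stub_of_reg_init(void)
{
	return stub_ret;
}

/* Form used in the patch: negative errors propagate, anything else is 0. */
static int config_init_long(void)
{
	int err = stub_of_reg_init();

	if (err < 0)
		return err;
	return 0;
}

/* Suggested form: return the callee's result directly. */
static int config_init_short(void)
{
	return stub_of_reg_init();
}
```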

Re: [PATCH] phy: marvell: Fix and unify reg-init behavior

2016-02-15 Thread Clemens Gruber

Hi Florian,

On Mon, Feb 15, 2016 at 10:22:14AM -0800, Florian Fainelli wrote:
> 
> 
> On 15/02/2016 10:19, Florian Fainelli wrote:
> > On 15/02/2016 09:52, Clemens Gruber wrote:
> >> For the Marvell 88E1510, marvell_of_reg_init was called too late (in
> >> m88e1510_config_aneg), which led to the phy state machine being stuck
> >> waiting for interrupts, which are off by default on the 88E1510.
> >> This further led to the ethernet link not coming up at boot.
> >> For some Marvell PHYs, marvell_of_reg_init was not called at all.
> > 
> > You could mention that this became apparent with
> 
> ...
> 113c74d83eef ("net: phy: turn carrier off on phy attach")

Thanks for your comments. I thought it would be better to just leave all
PHYs with special requirements (resets, ..) as they were and only change
the behavior of those who were missing the marvell_of_reg_init calls.
What do you think?

Clemens

Re: [PATCH net] ipv4: fix memory leaks in ip_cmsg_send() callers

2016-02-15 Thread Cong Wang

On Thu, Feb 4, 2016 at 6:23 AM, Eric Dumazet  wrote:
> From: Eric Dumazet 
>
> Dmitry reported memory leaks of IP options allocated in
> ip_cmsg_send() when/if this function returns an error.
>
> Callers are responsible for the freeing.

Right: because there is a loop in ip_cmsg_send(), it is easier for the
callers to free it than for the callee.

The other thing is that we perhaps have another leak in the following code:

if (ipc.opt && ipc.opt->opt.srr) {
if (!daddr)
return -EINVAL;
faddr = ipc.opt->opt.faddr;
}

since ipc.opt could be allocated on the heap... We need something like:

@@ -770,8 +770,11 @@ static int ping_v4_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
ipc.addr = faddr = daddr;

if (ipc.opt && ipc.opt->opt.srr) {
-   if (!daddr)
+   if (!daddr) {
+   if (free)
+   kfree(ipc.opt);
return -EINVAL;
+   }
faddr = ipc.opt->opt.faddr;
}
tos = get_rttos(&ipc, inet);
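The ownership rule in this thread (the callee may allocate and then fail, so every caller exit path must free, but only when the options were actually allocated for this call) can be sketched in userspace. All names below are hypothetical; a counter of live allocations makes a missed free visible, and the srr branch corresponds to the extra leak Cong points out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for the IP options ip_cmsg_send() may allocate. */
struct model_opt {
	bool srr;                      /* source route requested */
};

static int live_opts;                  /* outstanding heap allocations */
static struct model_opt sock_opt;      /* socket's own options: not heap-owned */

static void opt_free(struct model_opt *o)
{
	if (o) {
		live_opts--;
		free(o);
	}
}

/* Like ip_cmsg_send(): may allocate *opt and still fail afterwards,
 * leaving the allocation for the caller to release. */
static int model_cmsg_send(bool want_srr, bool fail_late, struct model_opt **opt)
{
	*opt = malloc(sizeof(**opt));
	if (!*opt)
		return -ENOMEM;
	live_opts++;
	(*opt)->srr = want_srr;
	if (fail_late)                 /* e.g. a later cmsg is malformed */
		return -EINVAL;
	return 0;
}

static int model_sendmsg(bool has_cmsg, bool want_srr, bool fail_late,
			 unsigned int daddr)
{
	struct model_opt *opt = NULL;
	bool free_opt = false;         /* mirrors the 'free' flag */
	int err;

	if (has_cmsg) {
		err = model_cmsg_send(want_srr, fail_late, &opt);
		if (err) {
			opt_free(opt);  /* callers free, even on error */
			return err;
		}
		free_opt = true;
	} else {
		opt = &sock_opt;
	}

	if (opt->srr && !daddr) {
		if (free_opt)
			opt_free(opt);  /* the extra error path */
		return -EINVAL;
	}

	/* ... build and send the packet using opt ... */
	if (free_opt)
		opt_free(opt);
	return 0;
}
```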

Re: [net-next PATCH 1/1] net_sched fix: reclassification needs to consider ether protocol changes

2016-02-15 Thread kbuild test robot

Hi Jamal,

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Jamal-Hadi-Salim/net_sched-fix-reclassification-needs-to-consider-ether-protocol-changes/20160216-035147
config: i386-randconfig-x008-201607 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   net/sched/sch_api.c: In function 'tc_classify':
>> net/sched/sch_api.c:1832:23: error: 'protocol' undeclared (first use in this function)
  if (tp->protocol != protocol &&
  ^
   net/sched/sch_api.c:1832:23: note: each undeclared identifier is reported only once for each function it appears in

vim +/protocol +1832 net/sched/sch_api.c

3b3ae880 Daniel Borkmann  2015-08-26  1826  reclassify:
bd4d9963 Jamal Hadi Salim 2016-02-15  1827  protocol = tc_skb_protocol(skb);
3b3ae880 Daniel Borkmann  2015-08-26  1828  #endif
25d8c0d5 John Fastabend   2014-09-12  1829  for (; tp; tp = rcu_dereference_bh(tp->next)) {
3b3ae880 Daniel Borkmann  2015-08-26  1830  int err;
3b3ae880 Daniel Borkmann  2015-08-26  1831  
cc7ec456 Eric Dumazet 2011-01-19 @1832  if (tp->protocol != protocol &&
cc7ec456 Eric Dumazet 2011-01-19  1833  tp->protocol != htons(ETH_P_ALL))
cc7ec456 Eric Dumazet 2011-01-19  1834  continue;
cc7ec456 Eric Dumazet 2011-01-19  1835  

:: The code at line 1832 was first introduced by commit
:: cc7ec456f82da7f89a5b376e613b3ac4311b3e9a net_sched: cleanups

:: TO: Eric Dumazet 
:: CC: David S. Miller 

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



Re: [PATCH 00/33] Compile-time stack metadata validation

2016-02-15 Thread Josh Poimboeuf

On Mon, Feb 15, 2016 at 08:56:21AM -0800, Linus Torvalds wrote:
> On Feb 15, 2016 8:31 AM, "Josh Poimboeuf"  wrote:
> >
> > So is the goal to optimize for size?  If I replace the calls to
> > __preempt_schedule[_notrace]() with real C calls and remove the thunks,
> > it only adds about 2k to vmlinux.
> 
> It adds nasty register pressure in some of the most critical parts of the
> kernel, and makes the asm code harder to read.
> 
> And yes, I still read the asm. For performance reasons, and when decoding
> oopses.
> 
> I realize that few other people care about generated code. That's sad.
> 
> > There are two ways to fix the warnings:
> >
> > 1. get rid of the thunks and call the C functions directly; or
> 
> No. Not until gcc learns about per-function calling conventions (so that it
> can be marked as not clobbering registers).
> 
> > 2. add the stack pointer to the asm() statement output operand list to
> > ensure a stack frame gets created in the caller function before the
> > call.
> 
> That probably doesn't make things much worse. It probably makes least
> functions have a stack frame if they do preempt disable, but it might still
> be acceptable. Hard to say before I end up hurting this case again.

Oddly, this change (see patch below) actually seems to make things
faster in a lot of cases.  For many smaller functions it causes the
stack frame creation to get moved out of the common path and into the
unlikely path.

For example, here's the original cyc2ns_read_end():

  8101f8c0 :
  8101f8c0:   55                      push   %rbp
  8101f8c1:   48 89 e5                mov    %rsp,%rbp
  8101f8c4:   83 6f 10 01             subl   $0x1,0x10(%rdi)
  8101f8c8:   75 08                   jne    8101f8d2
  8101f8ca:   65 48 89 3d e6 5a ff    mov    %rdi,%gs:0x7eff5ae6(%rip)    # 153b8
  8101f8d1:   7e
  8101f8d2:   65 ff 0d 77 c4 fe 7e    decl   %gs:0x7efec477(%rip)         # bd50 <__preempt_count>
  8101f8d9:   74 02                   je     8101f8dd
  8101f8db:   5d                      pop    %rbp
  8101f8dc:   c3                      retq
  8101f8dd:   e8 1e 37 fe ff          callq  81003000 <___preempt_schedule>
  8101f8e2:   5d                      pop    %rbp
  8101f8e3:   c3                      retq
  8101f8e4:   66 66 66 2e 0f 1f 84    data16 data16 nopw %cs:0x0(%rax,%rax,1)
  8101f8eb:   00 00 00 00 00

And here's the same function with the patch:

  8101f8c0 :
  8101f8c0:   83 6f 10 01             subl   $0x1,0x10(%rdi)
  8101f8c4:   75 08                   jne    8101f8ce
  8101f8c6:   65 48 89 3d ea 5a ff    mov    %rdi,%gs:0x7eff5aea(%rip)    # 153b8
  8101f8cd:   7e
  8101f8ce:   65 ff 0d 7b c4 fe 7e    decl   %gs:0x7efec47b(%rip)         # bd50 <__preempt_count>
  8101f8d5:   74 01                   je     8101f8d8
  8101f8d7:   c3                      retq
  8101f8d8:   55                      push   %rbp
  8101f8d9:   48 89 e5                mov    %rsp,%rbp
  8101f8dc:   e8 1f 37 fe ff          callq  81003000 <___preempt_schedule>
  8101f8e1:   5d                      pop    %rbp
  8101f8e2:   c3                      retq
  8101f8e3:   66 66 66 66 2e 0f 1f    data16 data16 data16 nopw %cs:0x0(%rax,%rax,1)
  8101f8ea:   84 00 00 00 00 00

Notice that it moved the frame pointer setup code to the unlikely
___preempt_schedule() call path.  Going through a sampling of the
differences in the asm, that's the most common change I see.

Otherwise it has no real effect on callers which already have stack
frames (though it does change the ordering of some 'mov's).

And it has the intended effect of fixing the following warnings by
ensuring these call sites have stack frames:

  stacktool: drivers/scsi/hpsa.o: hpsa_scsi_do_simple_cmd.constprop.106()+0x79: call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_find_first()+0x70: call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_find_first()+0x92: call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0xff: call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0xf5: call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0x11a: call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_get()+0x225: call without frame pointer save/setup
  stacktool: kernel/locking/percpu-rwsem.o: percpu_up_read()+0x27: call without frame pointer save/setup
  stacktool: kernel/profile.o: do_profile_hits.isra.5()+0x139: call without frame pointer save/setup
  stacktool: lib/nmi_backtrace.o: nmi_trigger_all_c

Re: [PATCH 00/33] Compile-time stack metadata validation

2016-02-15 Thread Andi Kleen

> > There are two ways to fix the warnings:
> >
> > 1. get rid of the thunks and call the C functions directly; or
> 
> No. Not until gcc learns about per-function calling conventions (so that it
> can be marked as not clobbering registers).

It already does for static functions in gcc 5.x (with -fipa-ra).
And with LTO it can be used even somewhat globally.

Even older versions supported it, but only for x86->SSE on 32-bit, which
is useless for the kernel. But the new IPA-RA propagates which registers
are clobbered.

That said it will probably be a long time until we can drop support for
older compilers.  So for now the manual method is still needed.

-Andi

[PATCH v2] phy: marvell: Fix and unify reg-init behavior

2016-02-15 Thread Clemens Gruber

For the Marvell 88E1510, marvell_of_reg_init was called too late, in the
config_aneg function.
Since commit 113c74d83eef ("net: phy: turn carrier off on phy attach"),
this led to the link not coming up at boot anymore, due to the phy
state machine being stuck waiting for interrupts (off by default on
the 88E1510).
For seven other Marvell PHYs, marvell_of_reg_init was not called at all.

Add a generic marvell_config_init function, which in turn calls
marvell_of_reg_init.
PHYs that already have a specific config_init function with a call to
marvell_of_reg_init are left untouched. The generic marvell_config_init
function is called for all the others, to get consistent behavior across
all Marvell PHYs.

Signed-off-by: Clemens Gruber 
---

Changes since v1:
- No longer touch the PHYs that already call marvell_of_reg_init
- No more reset in marvell_config_init.
- Moved the function block to avoid a predeclaration

---
 drivers/net/phy/marvell.c | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index e3eb964..bfef5d1 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -446,7 +446,19 @@ static int m88e1510_config_aneg(struct phy_device *phydev)
if (err < 0)
return err;
 
-   return marvell_of_reg_init(phydev);
+   return 0;
+}
+
+static int marvell_config_init(struct phy_device *phydev)
+{
+   int err;
+
+   /* Set registers from marvell,reg-init DT property */
+   err = marvell_of_reg_init(phydev);
+   if (err < 0)
+   return err;
+
+   return 0;
 }
 
 static int m88e1116r_config_init(struct phy_device *phydev)
@@ -495,7 +507,7 @@ static int m88e1116r_config_init(struct phy_device *phydev)
 
mdelay(500);
 
-   return 0;
+   return marvell_config_init(phydev);
 }
 
 static int m88e3016_config_init(struct phy_device *phydev)
@@ -514,7 +526,7 @@ static int m88e3016_config_init(struct phy_device *phydev)
if (reg < 0)
return reg;
 
-   return 0;
+   return marvell_config_init(phydev);
 }
 
 static int m88e_config_init(struct phy_device *phydev)
@@ -1078,6 +1090,7 @@ static struct phy_driver marvell_drivers[] = {
.features = PHY_GBIT_FEATURES,
.probe = marvell_probe,
.flags = PHY_HAS_INTERRUPT,
+   .config_init = &marvell_config_init,
.config_aneg = &marvell_config_aneg,
.read_status = &genphy_read_status,
.ack_interrupt = &marvell_ack_interrupt,
@@ -1149,6 +1162,7 @@ static struct phy_driver marvell_drivers[] = {
.features = PHY_GBIT_FEATURES,
.flags = PHY_HAS_INTERRUPT,
.probe = marvell_probe,
+   .config_init = &marvell_config_init,
.config_aneg = &m88e1121_config_aneg,
.read_status = &marvell_read_status,
.ack_interrupt = &marvell_ack_interrupt,
@@ -1167,6 +1181,7 @@ static struct phy_driver marvell_drivers[] = {
.features = PHY_GBIT_FEATURES,
.flags = PHY_HAS_INTERRUPT,
.probe = marvell_probe,
+   .config_init = &marvell_config_init,
.config_aneg = &m88e1318_config_aneg,
.read_status = &marvell_read_status,
.ack_interrupt = &marvell_ack_interrupt,
@@ -1259,6 +1274,7 @@ static struct phy_driver marvell_drivers[] = {
.features = PHY_GBIT_FEATURES,
.flags = PHY_HAS_INTERRUPT,
.probe = marvell_probe,
+   .config_init = &marvell_config_init,
.config_aneg = &m88e1510_config_aneg,
.read_status = &marvell_read_status,
.ack_interrupt = &marvell_ack_interrupt,
@@ -1277,6 +1293,7 @@ static struct phy_driver marvell_drivers[] = {
.features = PHY_GBIT_FEATURES,
.flags = PHY_HAS_INTERRUPT,
.probe = marvell_probe,
+   .config_init = &marvell_config_init,
.config_aneg = &m88e1510_config_aneg,
.read_status = &marvell_read_status,
.ack_interrupt = &marvell_ack_interrupt,
-- 
2.7.1
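The effect of wiring .config_init into the driver table can be modeled in userspace: the DT register setup runs when the PHY is initialized rather than inside config_aneg. The types and functions below are hypothetical stand-ins for struct phy_driver and the PHY core, detailed only enough to show the ordering.

```c
#include <stddef.h>

struct model_phydev {
	int regs_initialized;   /* set once the reg-init values are written */
};

struct model_phy_driver {
	const char *name;
	int (*config_init)(struct model_phydev *phydev);
	int (*config_aneg)(struct model_phydev *phydev);
};

/* Stand-in for marvell_of_reg_init(). */
static int model_of_reg_init(struct model_phydev *phydev)
{
	phydev->regs_initialized = 1;
	return 0;
}

/* Generic hook, as in the patch: apply the DT register values. */
static int model_config_init(struct model_phydev *phydev)
{
	return model_of_reg_init(phydev);
}

/* Autonegotiation that relies on reg-init having happened first. */
static int model_config_aneg(struct model_phydev *phydev)
{
	return phydev->regs_initialized ? 0 : -1;
}

/* Loosely like the core's init step: config_init is optional. */
static int model_init_hw(const struct model_phy_driver *drv,
			 struct model_phydev *phydev)
{
	if (!drv->config_init)
		return 0;
	return drv->config_init(phydev);
}
```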

Re: [PATCH V6] netfilter: h323: avoid potential attack

2016-02-15 Thread Pablo Neira Ayuso

On Tue, Feb 02, 2016 at 09:40:04PM +0800, Zhouyi Zhou wrote:
> diff --git a/net/netfilter/nf_conntrack_h323_main.c b/net/netfilter/nf_conntrack_h323_main.c
> index 9511af0..8d24c4b 100644
> --- a/net/netfilter/nf_conntrack_h323_main.c
> +++ b/net/netfilter/nf_conntrack_h323_main.c
> @@ -110,6 +110,21 @@ int (*nat_q931_hook) (struct sk_buff *skb,
>  
>  static DEFINE_SPINLOCK(nf_h323_lock);
>  static char *h323_buffer;
> +static int h323_buffer_valid_bytes;
> +
> +static bool h323_buffer_ref_valid(void *p, int len)

I'd rather see you pass a parameter to this function with the
remaining size in the buffer, so we don't need this global variable
h323_buffer_valid_bytes.

You can probably add an initial patch to add a structure to store the
state information so we reduce the function parameter footprint.

struct h323_ct_state {
...
int buflen;
};

So you pass up the h323_ct_state structure to function calls,
something like that.

Thanks.

> +{
> + if ((unsigned long)len > h323_buffer_valid_bytes)
> + return false;
> +
> + if (p + len > (void *)h323_buffer + h323_buffer_valid_bytes)
> + return false;
> +
> + if (p < (void *)h323_buffer)
> + return false;
> +
> + return true;
> +}
>  
>  static struct nf_conntrack_helper nf_conntrack_helper_h245;
>  static struct nf_conntrack_helper nf_conntrack_helper_q931[];
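The suggestion is to pass the valid buffer length explicitly instead of relying on the global h323_buffer_valid_bytes. A minimal sketch of such a check, assuming a hypothetical state struct along the lines of the h323_ct_state outline and assuming p was derived from the same buffer:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-call state; only the fields the check needs. */
struct h323_ct_state_model {
	const unsigned char *buf;
	size_t buflen;          /* number of valid bytes in buf */
};

/* True if [p, p + len) lies entirely inside the valid part of buf.
 * Works with a size_t offset, so no pointer past the buffer is formed
 * and "off + len" never has to be computed. */
static bool h323_ref_valid(const struct h323_ct_state_model *st,
			   const unsigned char *p, size_t len)
{
	size_t off;

	if (p < st->buf)
		return false;
	off = (size_t)(p - st->buf);
	if (off > st->buflen)
		return false;
	return len <= st->buflen - off;
}
```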

[net-next PATCH 1/1] net_sched fix: reclassification needs to consider ether protocol changes

2016-02-15 Thread Jamal Hadi Salim

From: Jamal Hadi Salim 

Actions could change the etherproto, in particular with Ethernet
tunnelled data. Typically such actions, after peeling the outer header,
will ask for the packet to be reclassified. We then need to restart
the classification with the new proto header.

Example setup used to catch this:
sudo tc qdisc add dev $ETH ingress
sudo tc filter add dev $ETH parent : prio 2 protocol 0xbeef \
u32 match u32 0 0 flowid 1:1 \
action ife decode reclassify

Fixes: 3b3ae880266d ("net: sched: consolidate tc_classify{,_compat}")
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/sch_api.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index b5c2cf2..07cfb54 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1818,12 +1818,13 @@ done:
 int tc_classify(struct sk_buff *skb, const struct tcf_proto *tp,
struct tcf_result *res, bool compat_mode)
 {
-   __be16 protocol = tc_skb_protocol(skb);
 #ifdef CONFIG_NET_CLS_ACT
const struct tcf_proto *old_tp = tp;
int limit = 0;
+   __be16 protocol;
 
 reclassify:
+   protocol = tc_skb_protocol(skb);
 #endif
for (; tp; tp = rcu_dereference_bh(tp->next)) {
int err;
-- 
1.9.1

[PATCHv2] af_llc: fix types on llc_ui_wait_for_conn

2016-02-15 Thread Alan

The timeout is a long, but we return it as an int, truncating it if it is
huge. This is basically harmless as the only caller does a boolean check,
but tidy it up anyway.

(64bit build tested this time. Thank you 0day)

Signed-off-by: Alan Cox 
---
 net/llc/af_llc.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 8dab4e5..b3c52e3 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -38,7 +38,7 @@ static u16 llc_ui_sap_link_no_max[256];
 static struct sockaddr_llc llc_ui_addrnull;
 static const struct proto_ops llc_ui_ops;
 
-static int llc_ui_wait_for_conn(struct sock *sk, long timeout);
+static long llc_ui_wait_for_conn(struct sock *sk, long timeout);
 static int llc_ui_wait_for_disc(struct sock *sk, long timeout);
 static int llc_ui_wait_for_busy_core(struct sock *sk, long timeout);
 
@@ -551,7 +551,7 @@ static int llc_ui_wait_for_disc(struct sock *sk, long timeout)
return rc;
 }
 
-static int llc_ui_wait_for_conn(struct sock *sk, long timeout)
+static long llc_ui_wait_for_conn(struct sock *sk, long timeout)
 {
DEFINE_WAIT(wait);
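The narrowing fixed here can be shown in userspace. In the kernel MAX_SCHEDULE_TIMEOUT is LONG_MAX, and socket timeouts default to it, so a "wait forever" value does not fit in an int on 64-bit builds. The sketch below (function names hypothetical) contrasts the old int return with the fixed long return.

```c
#include <limits.h>

/* Mirrors the kernel's MAX_SCHEDULE_TIMEOUT under a hypothetical name. */
#define MODEL_MAX_SCHEDULE_TIMEOUT LONG_MAX

/* Old prototype: the remaining timeout is narrowed to int on return.
 * Converting an out-of-range long to int is implementation-defined;
 * GCC reduces it modulo 2^32, so LONG_MAX comes back as -1. */
static int wait_for_conn_int(long timeout)
{
	return (int)timeout;
}

/* Fixed prototype: the value is returned unchanged. */
static long wait_for_conn_long(long timeout)
{
	return timeout;
}
```

With GCC the int version returns -1 for LONG_MAX, which is still nonzero, consistent with the note that the caller's boolean check makes the old behavior basically harmless.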

Re: [PATCH nf] netfilter: nfnetlink: correctly validate length of batch messages

2016-02-15 Thread Pablo Neira Ayuso

On Tue, Feb 02, 2016 at 01:36:45PM -0500, phil.turnb...@oracle.com wrote:
> From: Phil Turnbull 
> 
> If nlh->nlmsg_len is zero then an infinite loop is triggered because
> 'skb_pull(skb, msglen);' pulls zero bytes.
> 
> The calculation in nlmsg_len() underflows if 'nlh->nlmsg_len <
> NLMSG_HDRLEN' which bypasses the length validation and will later
> trigger an out-of-bound read.
> 
> If the length validation does fail then the malformed batch message is
> copied back to userspace. However, we cannot do this because the
> nlh->nlmsg_len can be invalid. This leads to an out-of-bounds read in
> netlink_ack:
> 
> [   41.455421] ==
> [   41.456431] BUG: KASAN: slab-out-of-bounds in memcpy+0x1d/0x40 at addr 880119e79340
> [   41.456431] Read of size 4294967280 by task a.out/987
> [   41.456431] =
> [   41.456431] BUG kmalloc-512 (Not tainted): kasan: bad access detected
> [   41.456431] -
> ...
> [   41.456431] Bytes b4 880119e79310: 00 00 00 00 d5 03 00 00 b0 fb fe ff 00 00 00 00
> [   41.456431] Object 880119e79320: 20 00 00 00 10 00 05 00 00 00 00 00 00 00 00 00   ...
> [   41.456431] Object 880119e79330: 14 00 0a 00 01 03 fc 40 45 56 11 22 33 10 00 05  ...@EV."3...
> [   41.456431] Object 880119e79340: f0 ff ff ff 88 99 aa bb 00 14 00 0a 00 06 fe fb
>                                     ^^ start of batch nlmsg with
>                                        nlmsg_len=4294967280
> ...
> [   41.456431] Memory state around the buggy address:
> [   41.456431]  880119e79400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [   41.456431]  880119e79480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [   41.456431] >880119e79500: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
> [   41.456431]^
> [   41.456431]  880119e79580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [   41.456431]  880119e79600: fc fc fc fc fc fc fc fc fc fc fb fb fb fb fb fb
> [   41.456431] ==
> 
> Fix this with better validation of nlh->nlmsg_len and by setting
> NFNL_BATCH_FAILURE if any batch message fails length validation.
> 
> CAP_NET_ADMIN is required to trigger the bugs.

Applied, thanks.
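The length validation described in the patch can be sketched in userspace as follows. This is not the applied kernel code: the helper names are hypothetical, and the 16-byte header length is assumed to match NLMSG_HDRLEN on Linux. The check rejects a message shorter than its header (which both stops the zero-length loop and avoids the unsigned underflow in the payload-length subtraction) and one that runs past the remaining buffer; per the commit message, the real fix additionally sets NFNL_BATCH_FAILURE instead of echoing the malformed message back.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to match NLMSG_HDRLEN (sizeof(struct nlmsghdr), 16 bytes). */
#define SKETCH_NLMSG_HDRLEN 16u

/* Mirrors the subtraction in nlmsg_len(): wraps around when
 * nlmsg_len < SKETCH_NLMSG_HDRLEN. */
static uint32_t naive_payload_len(uint32_t nlmsg_len)
{
	return nlmsg_len - SKETCH_NLMSG_HDRLEN;
}

/* Hypothetical check: is a batch message claiming nlmsg_len bytes
 * well-formed, given `remaining` bytes left in the buffer? */
static int batch_msg_len_ok(uint32_t nlmsg_len, size_t remaining)
{
	/* A full header must be present in the buffer. */
	if (remaining < SKETCH_NLMSG_HDRLEN)
		return 0;
	/* Zero or too-short lengths: no forward progress, and underflow. */
	if (nlmsg_len < SKETCH_NLMSG_HDRLEN)
		return 0;
	/* Lengths that run past the end of the buffer. */
	if (nlmsg_len > remaining)
		return 0;
	return 1;
}
```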

Re: [PATCH] af_llc: fix types on llc_ui_wait_for_conn

2016-02-15 Thread kbuild test robot

Hi Alan,

[auto build test ERROR on net-next/master]
[also build test ERROR on v4.5-rc4 next-20160215]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Alan/af_llc-fix-types-on-llc_ui_wait_for_conn/20160216-030508
config: x86_64-randconfig-x019-201607 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All error/warnings (new ones prefixed by >>):

>> net/llc/af_llc.c:554:13: error: conflicting types for 'llc_ui_wait_for_conn'
    static long llc_ui_wait_for_conn(struct sock *sk, long timeout)
                ^
   net/llc/af_llc.c:41:12: note: previous declaration of 'llc_ui_wait_for_conn' was here
    static int llc_ui_wait_for_conn(struct sock *sk, long timeout);
               ^
>> net/llc/af_llc.c:41:12: warning: 'llc_ui_wait_for_conn' used but never defined
   net/llc/af_llc.c:554:13: warning: 'llc_ui_wait_for_conn' defined but not used [-Wunused-function]
    static long llc_ui_wait_for_conn(struct sock *sk, long timeout)
                ^

vim +/llc_ui_wait_for_conn +554 net/llc/af_llc.c

   548  rc = 0;
   549  }
   550  finish_wait(sk_sleep(sk), &wait);
   551  return rc;
   552  }
   553  
 > 554  static long llc_ui_wait_for_conn(struct sock *sk, long timeout)
   555  {
   556  DEFINE_WAIT(wait);
   557  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



[PATCH] af_llc: fix types on llc_ui_wait_for_conn

2016-02-15 Thread Alan

The timeout is a long, but we return it as an int, truncating it if it is huge.
This is basically harmless, as the only caller does a boolean check, but tidy it up anyway.

Signed-off-by: Alan Cox 
---
 net/llc/af_llc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 8dab4e5..f05a2cb 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -551,7 +551,7 @@ static int llc_ui_wait_for_disc(struct sock *sk, long timeout)
return rc;
 }
 
-static int llc_ui_wait_for_conn(struct sock *sk, long timeout)
+static long llc_ui_wait_for_conn(struct sock *sk, long timeout)
 {
DEFINE_WAIT(wait);

Re: [PATCH net-next] core: remove unneeded headers for net cgroup controllers.

2016-02-15 Thread Tejun Heo

On Mon, Feb 15, 2016 at 02:39:43AM +0200, Rami Rosen wrote:
> commit 3ed80a6 (cgroup: drop module support) made including 
> module.h redundant in the net cgroup controllers, 
> netclassid_cgroup.c and netprio_cgroup.c. This patch 
> removes them.
> 
> Signed-off-by: Rami Rosen 

Acked-by: Tejun Heo 

Thanks.

-- 
tejun

Re: [PATCH] netfilter: nft_counter: fix erroneous return values

2016-02-15 Thread Pablo Neira Ayuso

On Sat, Feb 06, 2016 at 11:31:19PM -0500, Anton Protopopov wrote:
> The nft_counter_init() and nft_counter_clone() functions should return
> negative error value -ENOMEM instead of positive ENOMEM.

Applied, thanks.
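Why the sign matters: with the common kernel pattern `if (err < 0)`, a positive ENOMEM slips through as if initialisation had succeeded. A minimal userspace sketch; the function names are hypothetical and `fail` simulates an allocation failure.

```c
#include <errno.h>

static int counter_init_buggy(int fail)
{
	return fail ? ENOMEM : 0;	/* positive: looks like success to err < 0 */
}

static int counter_init_fixed(int fail)
{
	return fail ? -ENOMEM : 0;	/* kernel convention: negative errno */
}

/* Typical caller: only negative return values are treated as errors. */
static int caller_sees_error(int (*init)(int), int fail)
{
	int err = init(fail);

	return err < 0;
}
```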

Re: [PATCH] phy: marvell: Fix and unify reg-init behavior

2016-02-15 Thread Florian Fainelli



On 15/02/2016 10:19, Florian Fainelli wrote:
> On 15/02/2016 09:52, Clemens Gruber wrote:
>> For the Marvell 88E1510, marvell_of_reg_init was called too late (in
>> m88e1510_config_aneg), which led to the PHY state machine being stuck
>> waiting for interrupts, which are off by default on the 88E1510.
>> This in turn led to the Ethernet link not coming up at boot.
>> For some Marvell PHYs, marvell_of_reg_init was not called at all.
> 
> You could mention that this became apparent with

...
113c74d83eef ("net: phy: turn carrier off on phy attach")
--
Florian

Re: [PATCH] phy: marvell: Fix and unify reg-init behavior

2016-02-15 Thread Florian Fainelli

On 15/02/2016 09:52, Clemens Gruber wrote:
> For the Marvell 88E1510, marvell_of_reg_init was called too late (in
> m88e1510_config_aneg), which led to the PHY state machine being stuck
> waiting for interrupts, which are off by default on the 88E1510.
> This in turn led to the Ethernet link not coming up at boot.
> For some Marvell PHYs, marvell_of_reg_init was not called at all.

You could mention that this became apparent with

> 
> Add a generic marvell_config_init function, which in turn calls
> marvell_of_reg_init and resets the PHY, to get more consistent behavior
> across all Marvell PHYs.

Looks good, just few comments below:

> 
> Signed-off-by: Clemens Gruber 
> ---
>  drivers/net/phy/marvell.c | 65 +++
>  1 file changed, 32 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
> index e3eb964..473beaa 100644
> --- a/drivers/net/phy/marvell.c
> +++ b/drivers/net/phy/marvell.c
> @@ -153,6 +153,8 @@ struct marvell_priv {
>   u64 stats[ARRAY_SIZE(marvell_hw_stats)];
>  };
>  
> +static int marvell_of_reg_init(struct phy_device *phydev);

Can we avoid the forward declaration by re-arranging function bodies?

> +
>  static int marvell_ack_interrupt(struct phy_device *phydev)
>  {
>   int err;
> @@ -215,6 +217,24 @@ static int marvell_set_polarity(struct phy_device *phydev, int polarity)
>   return 0;
>  }
>  
> +static int marvell_config_init(struct phy_device *phydev)
> +{
> + int err;
> +
> + /* Set page to 0 */
> + err = phy_write(phydev, MII_MARVELL_PHY_PAGE, 0x0);
> + if (err < 0)
> + return err;
> +
> + /* Set registers from marvell,reg-init DT property */
> + err = marvell_of_reg_init(phydev);
> + if (err < 0)
> + return err;
> +
> + /* Reset the PHY (The page is 0 already) */
> + return phy_write(phydev, MII_BMCR, BMCR_RESET);

This does not appear to be needed except for the 88E, 1118 and 1149;
it might be better to check these PHY IDs explicitly here, to avoid a
software reset on the other PHYs and keep the behavior identical
before and after the change?
--
Florian
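A userspace model of the suggestion above: keep the common page-0 plus DT register initialisation for every PHY, but issue the BMCR software reset only for the models that were reset before the change. Everything here is a stand-in: `mock_phy`, `mock_write()`, `mock_of_reg_init()` and the `needs_soft_reset` flag are hypothetical (the flag replaces an explicit PHY ID comparison so no ID values are invented), the page-select register number 22 is an assumption, and MII_BMCR (0) and BMCR_RESET (0x8000) follow IEEE 802.3 clause 22.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_MII_BMCR		0x00	/* clause 22 basic mode control register */
#define SK_BMCR_RESET		0x8000	/* clause 22 software reset bit */
#define SK_MARVELL_PHY_PAGE	22	/* assumed Marvell page-select register */

/* Hypothetical PHY model: a register file plus a few test knobs. */
struct mock_phy {
	uint16_t regs[32];
	bool needs_soft_reset;	/* set only for models the old code reset */
	bool fail_reg_init;	/* simulate a bad marvell,reg-init property */
	int resets;		/* number of BMCR software resets issued */
};

static int mock_write(struct mock_phy *phy, int reg, uint16_t val)
{
	if (reg < 0 || reg >= 32)
		return -EINVAL;
	phy->regs[reg] = val;
	if (reg == SK_MII_BMCR && (val & SK_BMCR_RESET))
		phy->resets++;
	return 0;
}

/* Stand-in for marvell_of_reg_init(): would apply the DT register list. */
static int mock_of_reg_init(struct mock_phy *phy)
{
	return phy->fail_reg_init ? -EINVAL : 0;
}

static int sketch_config_init(struct mock_phy *phy)
{
	int err;

	/* Select page 0 before applying the DT-provided registers. */
	err = mock_write(phy, SK_MARVELL_PHY_PAGE, 0);
	if (err < 0)
		return err;

	err = mock_of_reg_init(phy);
	if (err < 0)
		return err;

	/* Reset only where the old per-model code did. */
	if (!phy->needs_soft_reset)
		return 0;
	return mock_write(phy, SK_MII_BMCR, SK_BMCR_RESET);
}
```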

Re: [PATCH] net: bcmgenet: Add MDIO_INTR in GENETv2

2016-02-15 Thread Florian Fainelli

Hi Jaedon,

On 15/02/2016 00:42, Jaedon Shin wrote:
> The GENETv2 chipsets have an MDIO interrupt, like the GENETv3+ chipsets.
> 
> The previous commits d5c3d84657db ("net: phy: Avoid polling PHY with
> PHY_IGNORE_INTERRUPTS") and 49f7a471e4d1 ("net: bcmgenet: Properly
> configure PHY to ignore interrupt") cause the PHY link to always be
> reported as down on some 40nm-generation chipsets.

Humm, these are two different things here:

- GENET_HAS_MDIO_INTR is about telling the driver whether the hardware
supports MDIO_INTR_DONE and MDIO_INTR_ERROR
- eliminating PHY polling is about utilizing LINK_UP and LINK_DOWN to
avoid polling the PHY

So the original problem is actually here:

bcmgenet_irq_task():

/* Link UP/DOWN event */
if ((priv->hw_params->flags & GENET_HAS_MDIO_INTR) &&
(priv->irq0_stat & UMAC_IRQ_LINK_EVENT)) {

These two checks are actually orthogonal, so we should remove the first
check on GENET_HAS_MDIO_INTR.

Your patch remains valid though, just the explanation needs a bit
tweaking, thanks!
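The point about orthogonal checks can be shown with a small model of the condition in bcmgenet_irq_task(): whether the hardware has MDIO done/error interrupts says nothing about whether a link UP/DOWN event arrived, so gating the link check on GENET_HAS_MDIO_INTR drops link events on parts without that flag. The flag and bit values below are placeholders for the sketch, not the driver's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values, for illustration only. */
#define SK_GENET_HAS_MDIO_INTR	(1u << 0)
#define SK_UMAC_IRQ_LINK_UP	(1u << 4)
#define SK_UMAC_IRQ_LINK_DOWN	(1u << 5)
#define SK_UMAC_IRQ_LINK_EVENT	(SK_UMAC_IRQ_LINK_UP | SK_UMAC_IRQ_LINK_DOWN)

/* Current check: link events are only seen if MDIO interrupts exist. */
static bool link_event_gated(uint32_t hw_flags, uint32_t irq0_stat)
{
	return (hw_flags & SK_GENET_HAS_MDIO_INTR) &&
	       (irq0_stat & SK_UMAC_IRQ_LINK_EVENT);
}

/* Suggested check: test only the link-event status bits. */
static bool link_event_ungated(uint32_t irq0_stat)
{
	return (irq0_stat & SK_UMAC_IRQ_LINK_EVENT) != 0;
}
```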

> 
> Signed-off-by: Jaedon Shin 
> ---
>  drivers/net/ethernet/broadcom/genet/bcmgenet.c | 4 ++--
>  drivers/net/ethernet/broadcom/genet/bcmgenet.h | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> index b15a60d787c7..8e9aa8f6390d 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> @@ -1904,7 +1904,7 @@ static int init_umac(struct bcmgenet_priv *priv)
>   bcmgenet_bp_mc_set(priv, reg);
>   }
>  
> - /* Enable MDIO interrupts on GENET v3+ */
> + /* Enable MDIO interrupts on GENET v2+ */
>   if (priv->hw_params->flags & GENET_HAS_MDIO_INTR)
>   int0_enable |= (UMAC_IRQ_MDIO_DONE | UMAC_IRQ_MDIO_ERROR);
>  
> @@ -3168,7 +3168,7 @@ static struct bcmgenet_hw_params bcmgenet_hw_params[] = {
>   .rdma_offset = 0x3000,
>   .tdma_offset = 0x4000,
>   .words_per_bd = 2,
> - .flags = GENET_HAS_EXT,
> + .flags = GENET_HAS_EXT | GENET_HAS_MDIO_INTR,
>   },
>   [GENET_V3] = {
>   .tx_queues = 4,
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
> index 967367557309..c14bfbfbe06a 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.h
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
> @@ -310,7 +310,7 @@ struct bcmgenet_mib_counters {
>  #define UMAC_IRQ_TXDMA_BDONE (1 << 18)
>  #define UMAC_IRQ_TXDMA_DONE  UMAC_IRQ_TXDMA_MBDONE
>  
> -/* Only valid for GENETv3+ */
> +/* Only valid for GENETv2+ */
>  #define UMAC_IRQ_MDIO_DONE   (1 << 23)
>  #define UMAC_IRQ_MDIO_ERROR  (1 << 24)
>  
>

Re: Same data to several sockets with just one syscall ?

2016-02-15 Thread Eric Dumazet

On Mon, 2016-02-15 at 11:03 +0100, Claudio Scordino wrote:
> Hi Eric,
> 
> 2016-02-12 11:35 GMT+01:00 Eric Dumazet :
> > On Fri, 2016-02-12 at 09:53 +0100, Claudio Scordino wrote:
> >
> >> This makes the application waste time in entering/exiting the kernel
> >> level several times.
> >
> > syscall overhead is usually small. The real cost is actually getting to the
> > socket objects (fd manipulation), which you won't avoid with a
> > super-syscall anyway.
> 
> Thank you for answering. I see your point.
> 
> However, assuming that a switch from user-space to kernel-space (and
> back) needs about 200nsec of computation (which I guess is a
> reasonable value for a 3GHz x86 architecture), the 50th receiver
> experiences a latency of about 10 usec. In some domains (e.g.,
> finance) this delay is not negligible.

I thought these domains were using multicast.

> 
> Moving the "fan-out" code into kernel space would remove this waste of
> time. IMHO, the latency reduction would pay back the 100 lines of code
> for adding a new syscall.

It won't reduce the latency at all, and it adds a lot of maintenance hassle.

syscall overhead is about 40 ns.
This is the time taken to transmit ~50 bytes on 10Gbit link.

40ns * 50 = 2 usec only.

Feel free to implement your idea and test it, you'll discover the added
complexity is not worth it.
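The two estimates in this exchange, worked through under the thread's own simplification that the fan-out costs one syscall per receiver, issued back to back, so the last of N receivers waits roughly N syscall times:

```latex
\begin{aligned}
t_{\text{last}} &\approx N \, t_{\text{syscall}} \\
50 \times 200\ \text{ns} &= 10\ \mu\text{s} \quad \text{(Claudio's assumed cost)} \\
50 \times 40\ \text{ns} &= 2\ \mu\text{s} \quad \text{(Eric's figure)} \\
40\ \text{ns} \times 10\ \text{Gbit/s} &= 400\ \text{bit} = 50\ \text{bytes}
\end{aligned}
```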

Re: [PATCH] netfilter: tee: select NF_DUP_IPV6 unconditionally

2016-02-15 Thread Pablo Neira Ayuso

On Fri, Feb 05, 2016 at 10:20:21AM +0100, Arnd Bergmann wrote:
> The NETFILTER_XT_TARGET_TEE option selects NF_DUP_IPV6 whenever
> IP6_NF_IPTABLES is enabled, and it ensures that it cannot be
> builtin itself if NF_CONNTRACK is a loadable module, as that
> is a dependency for NF_DUP_IPV6.
> 
> However, NF_DUP_IPV6 can be enabled even if IP6_NF_IPTABLES is
> turned off, and it only really depends on IPV6. With the current
> check in tee_tg6, we call nf_dup_ipv6() whenever NF_DUP_IPV6
> is enabled. This can however be a loadable module which is
> unreachable from a built-in xt_TEE:
> 
> net/built-in.o: In function `tee_tg6':
> :(.text+0x67728): undefined reference to `nf_dup_ipv6'
> 
> The bug was originally introduced in the split of the xt_TEE module
> into separate modules for ipv4 and ipv6, and two patches tried
> to fix it unsuccessfully afterwards.
> 
> This is a revert of the first incorrect attempt to fix it,
> going back to depending on IPV6 as the dependency, and we
> adapt the 'select' condition accordingly.

Applied, thanks Arnd.

Re: [PATCH net-next 1/3] lwtunnel: autoload of lwt modules

2016-02-15 Thread Robert Shearman


On 15/02/16 16:32, Jiri Benc wrote:

On Mon, 15 Feb 2016 16:22:08 +, Robert Shearman wrote:

Yeah, it's the C preprocessor. MODULE_ALIAS_RTNL_LWT includes the string
for the encap type in the module alias, and since the LWT encap types
are defined as enums, this is a symbolic name. I can't see any way of
getting the preprocessor to convert
MODULE_ALIAS_RTNL_LWT(LWTUNNEL_ENCAP_MPLS) into "rtnl-lwt-MPLS", but I'm
open to suggestions.


MODULE_ALIAS_RTNL_LWT(MPLS)?

But whatever, as I said, no strong preference.


I was so hung up on making the string match the name of the enum
that I'd discounted that, but you're right that doing that would reduce 
duplication in the alias string.



True, but I figured that it was cleaner for the lwtunnel infra to not
assume how those modules are implemented. If you disagree, then
I can change to doing as you suggest.


It's not completely transparent to the infrastructure anyway, the
tunnel type needs to be added to lwtunnel_encap_str for new tunnels.
The way I suggested, it's only added for those tunnels having the
module alias set.

Just trying to get rid of the unnecessary strings in
lwtunnel_encap_str. There's no need to bloat the kernel with them if
they're never used.


Ok, will resubmit without the unnecessary strings in that function as 
well as with your suggestion above.


Thanks for the review,
Rob
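Why the preprocessor cannot turn LWTUNNEL_ENCAP_MPLS into "MPLS": enum constants are not macros, so stringification only sees the spelling of the argument. A self-contained sketch; `SKETCH_STRINGIFY` mimics the kernel's two-level `__stringify()`, `SKETCH_RTNL_LWT_ALIAS` is a hypothetical stand-in for the MODULE_ALIAS_RTNL_LWT macro under discussion (it yields the alias string instead of emitting a module alias), and `alias_matches()` models the exact-match lookup that request_module() relies on.

```c
#include <string.h>

/* Two-level stringify: the argument is macro-expanded first, then
 * turned into a string literal. */
#define SKETCH_STRINGIFY_1(x)	#x
#define SKETCH_STRINGIFY(x)	SKETCH_STRINGIFY_1(x)

/* Hypothetical stand-in for MODULE_ALIAS_RTNL_LWT(). */
#define SKETCH_RTNL_LWT_ALIAS(encap)	("rtnl-lwt-" SKETCH_STRINGIFY(encap))

/* Enumerators only exist after preprocessing, so macro expansion leaves
 * SKETCH_ENCAP_MPLS spelled exactly as written. */
enum sketch_encap_types {
	SKETCH_ENCAP_NONE,
	SKETCH_ENCAP_MPLS,
};

/* modprobe-style lookup: the requested name must equal the alias. */
static int alias_matches(const char *alias, const char *requested)
{
	return strcmp(alias, requested) == 0;
}
```

Passing the bare token, as in `SKETCH_RTNL_LWT_ALIAS(MPLS)`, yields "rtnl-lwt-MPLS", which is the shorter form suggested in the thread.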

[PATCH] phy: marvell: Fix and unify reg-init behavior

2016-02-15 Thread Clemens Gruber

For the Marvell 88E1510, marvell_of_reg_init was called too late (in
m88e1510_config_aneg), which led to the PHY state machine being stuck
waiting for interrupts, which are off by default on the 88E1510.
This in turn led to the Ethernet link not coming up at boot.
For some Marvell PHYs, marvell_of_reg_init was not called at all.

Add a generic marvell_config_init function, which in turn calls
marvell_of_reg_init and resets the PHY, to get more consistent behavior
across all Marvell PHYs.

Signed-off-by: Clemens Gruber 
---
 drivers/net/phy/marvell.c | 65 +++
 1 file changed, 32 insertions(+), 33 deletions(-)

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index e3eb964..473beaa 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -153,6 +153,8 @@ struct marvell_priv {
u64 stats[ARRAY_SIZE(marvell_hw_stats)];
 };
 
+static int marvell_of_reg_init(struct phy_device *phydev);
+
 static int marvell_ack_interrupt(struct phy_device *phydev)
 {
int err;
@@ -215,6 +217,24 @@ static int marvell_set_polarity(struct phy_device *phydev, int polarity)
return 0;
 }
 
+static int marvell_config_init(struct phy_device *phydev)
+{
+   int err;
+
+   /* Set page to 0 */
+   err = phy_write(phydev, MII_MARVELL_PHY_PAGE, 0x0);
+   if (err < 0)
+   return err;
+
+   /* Set registers from marvell,reg-init DT property */
+   err = marvell_of_reg_init(phydev);
+   if (err < 0)
+   return err;
+
+   /* Reset the PHY (The page is 0 already) */
+   return phy_write(phydev, MII_BMCR, BMCR_RESET);
+}
+
 static int marvell_config_aneg(struct phy_device *phydev)
 {
int err;
@@ -446,7 +466,7 @@ static int m88e1510_config_aneg(struct phy_device *phydev)
if (err < 0)
return err;
 
-   return marvell_of_reg_init(phydev);
+   return 0;
 }
 
 static int m88e1116r_config_init(struct phy_device *phydev)
@@ -495,7 +515,7 @@ static int m88e1116r_config_init(struct phy_device *phydev)
 
mdelay(500);
 
-   return 0;
+   return marvell_config_init(phydev);
 }
 
 static int m88e3016_config_init(struct phy_device *phydev)
@@ -514,7 +534,7 @@ static int m88e3016_config_init(struct phy_device *phydev)
if (reg < 0)
return reg;
 
-   return 0;
+   return marvell_config_init(phydev);
 }
 
 static int m88e_config_init(struct phy_device *phydev)
@@ -618,11 +638,7 @@ static int m88e_config_init(struct phy_device *phydev)
return err;
}
 
-   err = marvell_of_reg_init(phydev);
-   if (err < 0)
-   return err;
-
-   return phy_write(phydev, MII_BMCR, BMCR_RESET);
+   return marvell_config_init(phydev);
 }
 
 static int m88e1118_config_aneg(struct phy_device *phydev)
@@ -669,16 +685,7 @@ static int m88e1118_config_init(struct phy_device *phydev)
if (err < 0)
return err;
 
-   err = marvell_of_reg_init(phydev);
-   if (err < 0)
-   return err;
-
-   /* Reset address */
-   err = phy_write(phydev, MII_MARVELL_PHY_PAGE, 0x0);
-   if (err < 0)
-   return err;
-
-   return phy_write(phydev, MII_BMCR, BMCR_RESET);
+   return marvell_config_init(phydev);
 }
 
 static int m88e1149_config_init(struct phy_device *phydev)
@@ -695,16 +702,7 @@ static int m88e1149_config_init(struct phy_device *phydev)
if (err < 0)
return err;
 
-   err = marvell_of_reg_init(phydev);
-   if (err < 0)
-   return err;
-
-   /* Reset address */
-   err = phy_write(phydev, MII_MARVELL_PHY_PAGE, 0x0);
-   if (err < 0)
-   return err;
-
-   return phy_write(phydev, MII_BMCR, BMCR_RESET);
+   return marvell_config_init(phydev);
 }
 
 static int m88e1145_config_init(struct phy_device *phydev)
@@ -781,11 +779,7 @@ static int m88e1145_config_init(struct phy_device *phydev)
return err;
}
 
-   err = marvell_of_reg_init(phydev);
-   if (err < 0)
-   return err;
-
-   return 0;
+   return marvell_config_init(phydev);
 }
 
 /* marvell_read_status
@@ -1078,6 +1072,7 @@ static struct phy_driver marvell_drivers[] = {
.features = PHY_GBIT_FEATURES,
.probe = marvell_probe,
.flags = PHY_HAS_INTERRUPT,
+   .config_init = &marvell_config_init,
.config_aneg = &marvell_config_aneg,
.read_status = &genphy_read_status,
.ack_interrupt = &marvell_ack_interrupt,
@@ -1149,6 +1144,7 @@ static struct phy_driver marvell_drivers[] = {
.features = PHY_GBIT_FEATURES,
.flags = PHY_HAS_INTERRUPT,
.probe = marvell_probe,
+   .config_init = &marvell_config_init,
.config_aneg = &m88e1121_config_aneg,
.re

Re: [PATCH net-next v2 1/2] nsh: encapsulation module

2016-02-15 Thread Jiri Benc

On Thu, 11 Feb 2016 19:57:05 +, Brian Russell wrote:
> --- /dev/null
> +++ b/net/ipv4/nsh.c
> @@ -0,0 +1,365 @@
> +/*
> + * Network Service Header (NSH) inserted onto encapsulated packets
> + * or frames to realize service function paths.
> + * NSH also provides a mechanism for metadata exchange along the
> + * instantiated service path.
> + *
> + * https://tools.ietf.org/html/draft-ietf-sfc-nsh-01
> + *
> + * Copyright (c) 2016 by Brocade Communications Systems, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +#include 
> +#include 
> +
> +static struct list_head nsh_listeners;
> +static DEFINE_MUTEX(nsh_listener_mutex);
> +static struct nsh_metadata *decap_ctx_hdrs;
> +static unsigned char limit_ctx_hdrs = 10;
> +module_param_named(nsh_hdrs, limit_ctx_hdrs, byte, 0444);
> +MODULE_PARM_DESC(nsh_hdrs, "Maximum NSH metadata headers per packet");

No module parameters, please. Especially not for something like
encapsulation where multiple users will want different settings.

> +
> +int nsh_register_listener(struct nsh_listener *listener)
> +{
> + if (listener->max_ctx_hdrs > limit_ctx_hdrs)
> + return -ENOMEM;
> +
> + mutex_lock(&nsh_listener_mutex);
> + list_add(&listener->list, &nsh_listeners);
> + mutex_unlock(&nsh_listener_mutex);
> + return 0;
> +}
> +EXPORT_SYMBOL(nsh_register_listener);
> +
> +int nsh_unregister_listener(struct nsh_listener *listener)
> +{
> + mutex_lock(&nsh_listener_mutex);
> + list_del(&listener->list);
> + mutex_unlock(&nsh_listener_mutex);
> + return 0;
> +}
> +EXPORT_SYMBOL(nsh_unregister_listener);

I'd like to see how this listener stuff is used. Please do not submit
patches adding API without actual users. It's hard (or, in this case,
I'd say even impossible) to properly review this without seeing how it
is used.

> +
> +static int
> +notify_listeners(struct sk_buff *skb,

Please do not break lines between the return type and name of the function.

> +  u32 service_path_id,
> +  u8 service_index,
> +  u8 next_proto,
> +  struct nsh_metadata *ctx_hdrs,
> +  unsigned int num_ctx_hdrs)
> +{
> + struct nsh_listener *listener;
> + int i, err = 0;
> +
> + mutex_lock(&nsh_listener_mutex);
> + list_for_each_entry(listener, &nsh_listeners, list) {
> + for (i = 0; i < num_ctx_hdrs; i++)
> + if (listener->class == ctx_hdrs[i].class) {
> + err = listener->notify(skb,
> +service_path_id,
> +service_index,
> +next_proto,
> +ctx_hdrs,
> +num_ctx_hdrs);
> + if (err < 0) {
> + mutex_unlock(&nsh_listener_mutex);
> + return err;
> + }
> + break;
> + }
> + }
> + mutex_unlock(&nsh_listener_mutex);
> + return 0;
> +}
> +
> +static int
> +type_1_decap(struct sk_buff *skb,
> +  struct nsh_md_type_1 *md,
> +  unsigned int max_ctx_hdrs,
> +  struct nsh_metadata *ctx_hdrs,
> +  unsigned int *num_ctx_hdrs)
> +{
> + int i;
> + u32 *data =  &md->ctx_hdr1;
> +
> + if (max_ctx_hdrs == 0)
> + return -ENOMEM;
> +
> + ctx_hdrs[0].class = NSH_MD_CLASS_TYPE_1;
> + ctx_hdrs[0].type = NSH_MD_TYPE_TYPE_1;
> + ctx_hdrs[0].len = NSH_MD_LEN_TYPE_1;
> + ctx_hdrs[0].data = data;
> +
> + for (i = 0; i < NSH_MD_TYPE_1_NUM_HDRS; i++, data++)
> + *data = ntohl(*data);
> +
> + *num_ctx_hdrs = 1;
> +
> + return 0;
> +}
> +
> +static int
> +type_2_decap(struct sk_buff *skb,
> +  struct nsh_md_type_2 *md,
> +  u8 md_len,
> +  unsigned int max_ctx_hdrs,
> +  struct nsh_metadata *ctx_hdrs,
> +  unsigned int *num_ctx_hdrs)
> +{
> + u32 *data;
> + int i = 0, j;
> +
> + while (md_len > 0) {
> + if (i > max_ctx_hdrs)
> + return -ENOMEM;
> +
> + ctx_hdrs[i].class = ntohs(md->tlv_class);
> + ctx_hdrs[i].type = md->tlv_type;
> + if (ctx_hdrs[i].type & NSH_TYPE_CRIT) {
> + ctx_hdrs[i].type &= ~NSH_TYPE_CRIT;
> + ctx_hdrs[i].crit = 1;
> + }
> + ctx_hdrs[i].len = md->length;
> +
> + data = (u32 *) ++md;
> + md_len--;
> +
> + ctx_hdrs[i].data = data;
> +
> + for (j = 0; j < ctx_hdrs[i].len; j++)
> + data[j] = n

Re: [PATCH 00/33] Compile-time stack metadata validation

2016-02-15 Thread Peter Zijlstra

On Mon, Feb 15, 2016 at 10:31:34AM -0600, Josh Poimboeuf wrote:
> On Fri, Feb 12, 2016 at 09:10:11PM +0100, Peter Zijlstra wrote:
> > On Fri, Feb 12, 2016 at 12:32:06PM -0600, Josh Poimboeuf wrote:
> > > What I actually see in the listing is:
> > > 
> > >   decl__percpu_prefix:__preempt_count
> > >   je  1f:
> > >   
> > >  1:
> > >   call___preempt_schedule
> > > 
> > > So it puts the "call ___preempt_schedule" in the slow path.
> > 
> > Ah yes indeed. Same difference though.
> > 
> > > I also don't see how that would be related to the use of the asm
> > > statement in the __preempt_schedule() macro.  Doesn't the use of
> > > unlikely() in preempt_enable() put the call in the slow path?
> > 
> > Sadly no, unlikely() and asm_goto don't work well together. But the slow
> > path or not isn't the reason we do the asm call thing.
> > 
> > >   #define preempt_enable() \
> > >   do { \
> > > barrier(); \
> > > if (unlikely(preempt_count_dec_and_test())) \
> > > preempt_schedule(); \
> > >   } while (0)
> > > 
> > > Also, why is the thunk needed?  Any reason why preempt_enable() can't be
> > > called directly from C?
> > 
> > That would make the call-site save registers and increase the size of
> > every preempt_enable(). By using the thunk we can do callee saved
> > registers and avoid blowing up the call site.
> 
> So is the goal to optimize for size?  

General performance impact of preempt_enable().

> If I replace the calls to
> __preempt_schedule[_notrace]() with real C calls and remove the thunks,
> it only adds about 2k to vmlinux.

That's less than I had expected, but probably still worth it.

And is that added text purely in the slow path? We really want to avoid
putting any more register pressure on the preempt_enable() call sites.
The single memop and Jcc is about as fast as we can get, and we spent quite
a bit of effort getting there.

> There are two ways to fix the warnings:
> 
> 1. get rid of the thunks and call the C functions directly; or
> 
> 2. add the stack pointer to the asm() statement output operand list to
> ensure a stack frame gets created in the caller function before the
> call.  (Note this still allows the thunks to do callee saved registers.)
> 
> I like #1 better, but maybe I'm still missing the point of the thunks.

Ingo, Linus?

Re: [PATCH net-next v2 2/2] vxlan: support GPE/NSH

2016-02-15 Thread Jiri Benc

On Thu, 11 Feb 2016 19:57:06 +, Brian Russell wrote:
> +skip_l2:
>   skb_reset_network_header(skb);
> +
>   /* In flow-based mode, GBP is carried in dst_metadata */
> - if (!(vs->flags & VXLAN_F_COLLECT_METADATA))
> + if (!(vs->flags & VXLAN_F_COLLECT_METADATA) &&
> + !(vs->flags & VXLAN_F_GPE))
>   skb->mark = md->gbp;

This is completely wrong. You cannot return a packet with garbage in
place of the Ethernet header from ARPHRD_ETHER interface. For proper
VXLAN-GPE support, the vxlan interface needs to be in L3 mode, e.g.
ARPHRD_NONE.

To support L3 mode, the vxlan driver needs *tons* of cleanups (or tons
of duplicate code). This is exactly what I've done and what I'm in the
process of merging. The number of patches is too big to be submitted as
a single patchset, hence I'm submitting in parts. The first one has
been already merged (net-next commit 19f76f63507f). For the full code,
look at: https://github.com/jbenc/linux-vxlan/commits/master

Comments are welcome.

 Jiri

-- 
Jiri Benc

Re: [PATCH net-next 1/3] lwtunnel: autoload of lwt modules

2016-02-15 Thread Jiri Benc

On Mon, 15 Feb 2016 16:22:08 +, Robert Shearman wrote:
> Yeah, it's the C preprocessor. MODULE_ALIAS_RTNL_LWT includes the string 
> for the encap type in the module alias, and since the LWT encap types 
> are defined as enums, this is a symbolic name. I can't see any way of
> getting the preprocessor to convert 
> MODULE_ALIAS_RTNL_LWT(LWTUNNEL_ENCAP_MPLS) into "rtnl-lwt-MPLS", but I'm 
> open to suggestions.

MODULE_ALIAS_RTNL_LWT(MPLS)?

But whatever, as I said, no strong preference.

> True, but I figured that it was cleaner for the lwtunnel infra to not 
> assume how those modules are implemented. If you disagree, then
> I can change to doing as you suggest.

It's not completely transparent to the infrastructure anyway, the
tunnel type needs to be added to lwtunnel_encap_str for new tunnels.
The way I suggested, it's only added for those tunnels having the
module alias set.

Just trying to get rid of the unnecessary strings in
lwtunnel_encap_str. There's no need to bloat the kernel with them if
they're never used.

 Jiri

-- 
Jiri Benc

Re: [PATCH 00/33] Compile-time stack metadata validation

2016-02-15 Thread Josh Poimboeuf

On Fri, Feb 12, 2016 at 09:10:11PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 12, 2016 at 12:32:06PM -0600, Josh Poimboeuf wrote:
> > What I actually see in the listing is:
> > 
> > decl__percpu_prefix:__preempt_count
> > je  1f:
> > 
> >  1:
> > call___preempt_schedule
> > 
> > So it puts the "call ___preempt_schedule" in the slow path.
> 
> Ah yes indeed. Same difference though.
> 
> > I also don't see how that would be related to the use of the asm
> > statement in the __preempt_schedule() macro.  Doesn't the use of
> > unlikely() in preempt_enable() put the call in the slow path?
> 
> Sadly no, unlikely() and asm_goto don't work well together. But the slow
> path or not isn't the reason we do the asm call thing.
> 
> >   #define preempt_enable() \
> >   do { \
> >   barrier(); \
> >   if (unlikely(preempt_count_dec_and_test())) \
> >   preempt_schedule(); \
> >   } while (0)
> > 
> > Also, why is the thunk needed?  Any reason why preempt_enable() can't be
> > called directly from C?
> 
> That would make the call-site save registers and increase the size of
> every preempt_enable(). By using the thunk we can do callee saved
> registers and avoid blowing up the call site.

So is the goal to optimize for size?  If I replace the calls to
__preempt_schedule[_notrace]() with real C calls and remove the thunks,
it only adds about 2k to vmlinux.

There are two ways to fix the warnings:

1. get rid of the thunks and call the C functions directly; or

2. add the stack pointer to the asm() statement output operand list to
ensure a stack frame gets created in the caller function before the
call.  (Note this still allows the thunks to do callee saved registers.)

I like #1 better, but maybe I'm still missing the point of the thunks.

-- 
Josh

Re: [PATCH net-next 1/3] lwtunnel: autoload of lwt modules

2016-02-15 Thread Robert Shearman


On 15/02/16 16:02, Jiri Benc wrote:

On Mon, 15 Feb 2016 15:42:01 +, Robert Shearman wrote:

+static const char *lwtunnel_encap_str(enum lwtunnel_encap_types encap_type)
+{
+   switch (encap_type) {
+   case LWTUNNEL_ENCAP_MPLS:
+   return "LWTUNNEL_ENCAP_MPLS";
+   case LWTUNNEL_ENCAP_IP:
+   return "LWTUNNEL_ENCAP_IP";
+   case LWTUNNEL_ENCAP_ILA:
+   return "LWTUNNEL_ENCAP_ILA";
+   case LWTUNNEL_ENCAP_IP6:
+   return "LWTUNNEL_ENCAP_IP6";
+   case LWTUNNEL_ENCAP_NONE:
+   case __LWTUNNEL_ENCAP_MAX:
+   /* should not have got here */
+   break;
+   }
+   WARN_ON(1);
+   return "LWTUNNEL_ENCAP_NONE";
+}
+
+#endif /* CONFIG_MODULES */
+
  struct lwtunnel_state *lwtunnel_state_alloc(int encap_len)
  {
struct lwtunnel_state *lws;
@@ -85,6 +109,14 @@ int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
ret = -EOPNOTSUPP;
rcu_read_lock();
ops = rcu_dereference(lwtun_encaps[encap_type]);
+#ifdef CONFIG_MODULES
+   if (!ops) {
+   rcu_read_unlock();
+   request_module("rtnl-lwt-%s", lwtunnel_encap_str(encap_type));


Why the repetition of "lwt"/"LWTUNNEL" and the unnecessary "ENCAP"?
Wouldn't it be enough for lwtunnel_encap_str to return just "MPLS" or "ILA"?
I don't have any strong preference here, it just looks weird to me.
Maybe there's a reason.


Yeah, it's the C preprocessor. MODULE_ALIAS_RTNL_LWT includes the string 
for the encap type in the module alias, and since the LWT encap types 
are defined as enums, this is a symbolic name. I can't see any way of
getting the preprocessor to convert 
MODULE_ALIAS_RTNL_LWT(LWTUNNEL_ENCAP_MPLS) into "rtnl-lwt-MPLS", but I'm 
open to suggestions.


I could just drop the "lwt-" bit of the alias string given that it's 
included in the name of the enum values.



Also, this doesn't affect IP lwtunnels, i.e. LWTUNNEL_ENCAP_IP and
LWTUNNEL_ENCAP_IP6. Should we just return NULL from lwtunnel_encap_str
in such cases (plus unknown encap_type) and WARN on the NULL here?


True, but I figured that it was cleaner for the lwtunnel infra to not 
assume how those modules are implemented. If you disagree, then
I can change to doing as you suggest.


Thanks,
Rob

Re: [PATCH net-next 1/3] lwtunnel: autoload of lwt modules

2016-02-15 Thread Jiri Benc

On Mon, 15 Feb 2016 15:42:01 +, Robert Shearman wrote:
> +static const char *lwtunnel_encap_str(enum lwtunnel_encap_types encap_type)
> +{
> + switch (encap_type) {
> + case LWTUNNEL_ENCAP_MPLS:
> + return "LWTUNNEL_ENCAP_MPLS";
> + case LWTUNNEL_ENCAP_IP:
> + return "LWTUNNEL_ENCAP_IP";
> + case LWTUNNEL_ENCAP_ILA:
> + return "LWTUNNEL_ENCAP_ILA";
> + case LWTUNNEL_ENCAP_IP6:
> + return "LWTUNNEL_ENCAP_IP6";
> + case LWTUNNEL_ENCAP_NONE:
> + case __LWTUNNEL_ENCAP_MAX:
> + /* should not have got here */
> + break;
> + }
> + WARN_ON(1);
> + return "LWTUNNEL_ENCAP_NONE";
> +}
> +
> +#endif /* CONFIG_MODULES */
> +
>  struct lwtunnel_state *lwtunnel_state_alloc(int encap_len)
>  {
>   struct lwtunnel_state *lws;
> @@ -85,6 +109,14 @@ int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
>   ret = -EOPNOTSUPP;
>   rcu_read_lock();
>   ops = rcu_dereference(lwtun_encaps[encap_type]);
> +#ifdef CONFIG_MODULES
> + if (!ops) {
> + rcu_read_unlock();
> + request_module("rtnl-lwt-%s", lwtunnel_encap_str(encap_type));

Why the repeating of "lwt"/"LWTUNNEL" and the unnecessary "ENCAP"?
Wouldn't it be enough for lwtunnel_encap_str to return just "MPLS" or "ILA"?
I don't have any strong preference here, it just looks weird to me.
Maybe there's a reason.

Also, this doesn't affect IP lwtunnels, i.e. LWTUNNEL_ENCAP_IP and
LWTUNNEL_ENCAP_IP6. Should we just return NULL from lwtunnel_encap_str
in such cases (plus unknown encap_type) and WARN on the NULL here?

 Jiri

-- 
Jiri Benc

[PATCH net] pppoe: fix reference counting in PPPoE proxy

2016-02-15 Thread Guillaume Nault

Drop reference on the relay_po socket when __pppoe_xmit() succeeds.
This is already handled correctly in the error path.

Signed-off-by: Guillaume Nault 
---
No 'Fixes' tag, since this issue seems to predate git's history.

 drivers/net/ppp/pppoe.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index f3c6302..4ddae81 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -395,6 +395,8 @@ static int pppoe_rcv_core(struct sock *sk, struct sk_buff *skb)
 
if (!__pppoe_xmit(sk_pppox(relay_po), skb))
goto abort_put;
+
+   sock_put(sk_pppox(relay_po));
} else {
if (sock_queue_rcv_skb(sk, skb))
goto abort_kfree;
-- 
2.7.0
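The rule the fix restores is that every exit from the relay branch drops the reference taken when relay_po was looked up. A minimal user-space sketch of that shape follows; demo_sock, demo_sock_hold/put and demo_xmit are hypothetical stand-ins, and it assumes (as the existing abort_put label implies) that the lookup returns the relay socket with a reference held.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a socket and its reference count. */
struct demo_sock {
	int refcnt;
};

static void demo_sock_hold(struct demo_sock *sk) { sk->refcnt++; }
static void demo_sock_put(struct demo_sock *sk) { sk->refcnt--; }

/* Stand-in for __pppoe_xmit(): nonzero on success, 0 on failure. */
static int demo_xmit(bool ok)
{
	return ok ? 1 : 0;
}

/* Shape of the relay branch after the fix: both exits drop the reference. */
static int demo_relay(struct demo_sock *relay, bool xmit_ok)
{
	demo_sock_hold(relay);		/* reference taken by the lookup */

	if (!demo_xmit(xmit_ok))
		goto abort_put;

	demo_sock_put(relay);		/* the put this patch adds */
	return 0;

abort_put:
	demo_sock_put(relay);		/* the error path already did this */
	return -1;
}
```

Without the success-path put, each relayed frame would leave the count one higher, so the relay socket could never be freed.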

Re: sctp: bad hash index calculation

2016-02-15 Thread Neil Horman

On Mon, Feb 15, 2016 at 04:56:01PM +0100, Dmitry Vyukov wrote:
> On Mon, Feb 15, 2016 at 4:50 PM, Neil Horman  wrote:
> > On Mon, Feb 15, 2016 at 04:11:22PM +0100, Dmitry Vyukov wrote:
> >> Hello,
> >>
> >> While looking into some memory leaks of sctp ports I've noticed that
> >> sctp_init initializes port hash table as follows:
> >>
> >> /* Allocate and initialize the SCTP port hash table.  */
> >> do {
> >> sctp_port_hashsize = (1UL << order) * PAGE_SIZE /
> >> sizeof(struct sctp_bind_hashbucket);
> >> if ((sctp_port_hashsize > (64 * 1024)) && order > 0)
> >> continue;
> >> sctp_port_hashtable = (struct sctp_bind_hashbucket *)
> >> __get_free_pages(GFP_KERNEL | __GFP_NOWARN, order);
> >> } while (!sctp_port_hashtable && --order > 0);
> >>
> >> and then hash index is computed as follows:
> >>
> >> /* Warning: The following hash functions assume a power of two 'size'. */
> >> /* This is the hash function for the SCTP port hash table. */
> >> static inline int sctp_phashfn(struct net *net, __u16 lport)
> >> {
> >> return (net_hash_mix(net) + lport) & (sctp_port_hashsize - 1);
> >> }
> >>
> >> I don't see what ensures that sctp_port_hashsize is in fact a power-of-2.
> >>
> >> spinlock_t in sctp_bind_hashbucket can be 2 words in some configs,
> >> then sizeof(sctp_bind_hashbucket) == 24, which can render half of hash
> >> table unused.
> >>
> >> struct sctp_bind_hashbucket {
> >> spinlock_t  lock;
> >> struct hlist_head   chain;
> >> };
> >>
> >> Am I missing something?
> >>
> > You're right, it's not.  It seems to me that sctp_port_hashsize is meant to
> > simply bound the upper index of the hashtable array, and as such the phashfn
> > should not assume that it's a power of 2 (i.e. it should simply mod the hash
> > value by sctp_port_hashsize rather than and-ing it).  Alternatively we could
> > simply use alloc_large_system_hash to allocate this hash table here, the way tcp
> > does.  I'm traveling right now, but can take care of this as soon as I get home
> > on Wednesday.
> 
> Hi Neil,
> 
> Thanks for confirming. It's yours; I won't try to fix it before you do.
> 
Copy that, I'll look at it later this week.  Thank you for the report
neil

Re: sctp: bad hash index calculation

2016-02-15 Thread Dmitry Vyukov

On Mon, Feb 15, 2016 at 4:50 PM, Neil Horman  wrote:
> On Mon, Feb 15, 2016 at 04:11:22PM +0100, Dmitry Vyukov wrote:
>> Hello,
>>
>> While looking into some memory leaks of sctp ports I've noticed that
>> sctp_init initializes port hash table as follows:
>>
>> /* Allocate and initialize the SCTP port hash table.  */
>> do {
>> sctp_port_hashsize = (1UL << order) * PAGE_SIZE /
>> sizeof(struct sctp_bind_hashbucket);
>> if ((sctp_port_hashsize > (64 * 1024)) && order > 0)
>> continue;
>> sctp_port_hashtable = (struct sctp_bind_hashbucket *)
>> __get_free_pages(GFP_KERNEL | __GFP_NOWARN, order);
>> } while (!sctp_port_hashtable && --order > 0);
>>
>> and then hash index is computed as follows:
>>
>> /* Warning: The following hash functions assume a power of two 'size'. */
>> /* This is the hash function for the SCTP port hash table. */
>> static inline int sctp_phashfn(struct net *net, __u16 lport)
>> {
>> return (net_hash_mix(net) + lport) & (sctp_port_hashsize - 1);
>> }
>>
>> I don't see what ensures that sctp_port_hashsize is in fact a power-of-2.
>>
>> spinlock_t in sctp_bind_hashbucket can be 2 words in some configs,
>> then sizeof(sctp_bind_hashbucket) == 24, which can render half of hash
>> table unused.
>>
>> struct sctp_bind_hashbucket {
>> spinlock_t  lock;
>> struct hlist_head   chain;
>> };
>>
>> Am I missing something?
>>
> You're right, it's not.  It seems to me that sctp_port_hashsize is meant to
> simply bound the upper index of the hashtable array, and as such the phashfn
> should not assume that it's a power of 2 (i.e. it should simply mod the hash
> value by sctp_port_hashsize rather than and-ing it).  Alternatively we could
> simply use alloc_large_system_hash to allocate this hash table here, the way tcp
> does.  I'm traveling right now, but can take care of this as soon as I get home
> on Wednesday.

Hi Neil,

Thanks for confirming. It's yours; I won't try to fix it before you do.

Re: sctp: bad hash index calculation

2016-02-15 Thread Neil Horman

On Mon, Feb 15, 2016 at 04:11:22PM +0100, Dmitry Vyukov wrote:
> Hello,
> 
> While looking into some memory leaks of sctp ports I've noticed that
> sctp_init initializes port hash table as follows:
> 
> /* Allocate and initialize the SCTP port hash table.  */
> do {
> sctp_port_hashsize = (1UL << order) * PAGE_SIZE /
> sizeof(struct sctp_bind_hashbucket);
> if ((sctp_port_hashsize > (64 * 1024)) && order > 0)
> continue;
> sctp_port_hashtable = (struct sctp_bind_hashbucket *)
> __get_free_pages(GFP_KERNEL | __GFP_NOWARN, order);
> } while (!sctp_port_hashtable && --order > 0);
> 
> and then hash index is computed as follows:
> 
> /* Warning: The following hash functions assume a power of two 'size'. */
> /* This is the hash function for the SCTP port hash table. */
> static inline int sctp_phashfn(struct net *net, __u16 lport)
> {
> return (net_hash_mix(net) + lport) & (sctp_port_hashsize - 1);
> }
> 
> I don't see what ensures that sctp_port_hashsize is in fact a power-of-2.
> 
> spinlock_t in sctp_bind_hashbucket can be 2 words in some configs,
> then sizeof(sctp_bind_hashbucket) == 24, which can render half of hash
> table unused.
> 
> struct sctp_bind_hashbucket {
> spinlock_t  lock;
> struct hlist_head   chain;
> };
> 
> Am I missing something?
> 
You're right, it's not.  It seems to me that sctp_port_hashsize is meant to
simply bound the upper index of the hashtable array, and as such the phashfn
should not assume that it's a power of 2 (i.e. it should simply mod the hash
value by sctp_port_hashsize rather than and-ing it).  Alternatively we could
simply use alloc_large_system_hash to allocate this hash table here, the way tcp
does.  I'm traveling right now, but can take care of this as soon as I get home
on Wednesday.
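The two in-place options mentioned (reduce with modulo, or make the size a power of two so the mask is valid) can be sketched in user space as below. All names are hypothetical; demo_rounddown_pow_of_two() stands in for the kernel's rounddown_pow_of_two() from <linux/log2.h>, and the hash input is simplified to a 32-bit mix plus the port, loosely modelled on sctp_phashfn(). This is not the eventual fix, and it does not model alloc_large_system_hash().

```c
#include <stdint.h>

/* Option 1: keep any table size and reduce with modulo instead of a mask. */
static unsigned long demo_phashfn_mod(uint32_t mix, uint16_t lport,
				      unsigned long size)
{
	return (mix + lport) % size;
}

/* Largest power of two <= n, for n >= 1. */
static unsigned long demo_rounddown_pow_of_two(unsigned long n)
{
	unsigned long p = 1;

	while (p <= n / 2)
		p <<= 1;
	return p;
}

/* Option 2: shrink the bucket count to a power of two so that masking
 * with (size - 1) is valid again and every bucket is reachable. */
static unsigned long demo_phashfn_mask(uint32_t mix, uint16_t lport,
				       unsigned long pow2_size)
{
	return (mix + lport) & (pow2_size - 1);
}
```

The trade-off: modulo costs a division per lookup but uses the whole allocation, while rounding down keeps the cheap mask and leaves the tail of the allocation unused (170 buckets become 128 in the order-0 case).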

Neil

[PATCH net-next 2/3] mpls: autoload lwt module

2016-02-15 Thread Robert Shearman

Avoid users having to manually load the module by adding a module
alias allowing it to be autoloaded by the lwt infra.

Signed-off-by: Robert Shearman 
---
 net/mpls/mpls_iptunnel.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
index fb31aa87de81..1b4536960f79 100644
--- a/net/mpls/mpls_iptunnel.c
+++ b/net/mpls/mpls_iptunnel.c
@@ -227,5 +227,6 @@ static void __exit mpls_iptunnel_exit(void)
 }
 module_exit(mpls_iptunnel_exit);
 
+MODULE_ALIAS_RTNL_LWT(LWTUNNEL_ENCAP_MPLS);
 MODULE_DESCRIPTION("MultiProtocol Label Switching IP Tunnels");
 MODULE_LICENSE("GPL v2");
-- 
2.1.4

[PATCH net-next 1/3] lwtunnel: autoload of lwt modules

2016-02-15 Thread Robert Shearman

The lwt implementations that use net devices can already be autoloaded
via the existing IFLA_INFO_KIND mechanism. However, there's no equivalent
mechanism for lwt modules that don't use net devices.

Therefore, add the ability to autoload modules registering lwt
operations for lwt implementations not using a net device so that
users don't have to manually load the modules.

Signed-off-by: Robert Shearman 
---
 include/net/lwtunnel.h |  4 +++-
 net/core/lwtunnel.c| 32 
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
index 66350ce3e955..e9f116e29c22 100644
--- a/include/net/lwtunnel.h
+++ b/include/net/lwtunnel.h
@@ -170,6 +170,8 @@ static inline int lwtunnel_input(struct sk_buff *skb)
return -EOPNOTSUPP;
 }
 
-#endif
+#endif /* CONFIG_LWTUNNEL */
+
+#define MODULE_ALIAS_RTNL_LWT(encap_type) MODULE_ALIAS("rtnl-lwt-" __stringify(encap_type))
 
 #endif /* __NET_LWTUNNEL_H */
diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index 299cfc24d888..8ef5e5cec03e 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -27,6 +27,30 @@
 #include 
 #include 
 
+#ifdef CONFIG_MODULES
+
+static const char *lwtunnel_encap_str(enum lwtunnel_encap_types encap_type)
+{
+   switch (encap_type) {
+   case LWTUNNEL_ENCAP_MPLS:
+   return "LWTUNNEL_ENCAP_MPLS";
+   case LWTUNNEL_ENCAP_IP:
+   return "LWTUNNEL_ENCAP_IP";
+   case LWTUNNEL_ENCAP_ILA:
+   return "LWTUNNEL_ENCAP_ILA";
+   case LWTUNNEL_ENCAP_IP6:
+   return "LWTUNNEL_ENCAP_IP6";
+   case LWTUNNEL_ENCAP_NONE:
+   case __LWTUNNEL_ENCAP_MAX:
+   /* should not have got here */
+   break;
+   }
+   WARN_ON(1);
+   return "LWTUNNEL_ENCAP_NONE";
+}
+
+#endif /* CONFIG_MODULES */
+
 struct lwtunnel_state *lwtunnel_state_alloc(int encap_len)
 {
struct lwtunnel_state *lws;
@@ -85,6 +109,14 @@ int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
ret = -EOPNOTSUPP;
rcu_read_lock();
ops = rcu_dereference(lwtun_encaps[encap_type]);
+#ifdef CONFIG_MODULES
+   if (!ops) {
+   rcu_read_unlock();
+   request_module("rtnl-lwt-%s", lwtunnel_encap_str(encap_type));
+   rcu_read_lock();
+   ops = rcu_dereference(lwtun_encaps[encap_type]);
+   }
+#endif
if (likely(ops && ops->build_state))
ret = ops->build_state(dev, encap, family, cfg, lws);
rcu_read_unlock();
-- 
2.1.4
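The core of lwtunnel_build_state() after this patch is a look-up, load-on-miss, look-up-again pattern. A single-threaded user-space sketch of that control flow is below; the ops table, demo_request_module() and the rest are hypothetical stand-ins, and RCU is reduced to a comment, since the point is only that the second lookup picks up ops registered by the module's init.

```c
#include <errno.h>

#define DEMO_ENCAP_MAX 4

/* Hypothetical, cut-down version of struct lwtunnel_encap_ops. */
struct demo_ops {
	int (*build_state)(void);
};

static const struct demo_ops *demo_encaps[DEMO_ENCAP_MAX];
static int demo_load_calls;

static int demo_mpls_build(void)
{
	return 0;
}

static const struct demo_ops demo_mpls_ops = {
	.build_state = demo_mpls_build,
};

/* Stand-in for request_module(): "loading" type 1 registers its ops,
 * any other type has no module to load. */
static void demo_request_module(unsigned int type)
{
	demo_load_calls++;
	if (type == 1)
		demo_encaps[type] = &demo_mpls_ops;
}

static int demo_build_state(unsigned int type)
{
	const struct demo_ops *ops;
	int ret = -EOPNOTSUPP;

	if (type >= DEMO_ENCAP_MAX)
		return -EINVAL;

	ops = demo_encaps[type];		/* first lookup */
	if (!ops) {
		/* The kernel drops the RCU read lock around this call,
		 * since loading a module can sleep. */
		demo_request_module(type);
		ops = demo_encaps[type];	/* look again after the load */
	}
	if (ops && ops->build_state)
		ret = ops->build_state();
	return ret;
}
```

Once the module has registered, later calls take the fast path and no further load is requested; a type with no module still ends in -EOPNOTSUPP.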

[PATCH net-next 3/3] ila: autoload module

2016-02-15 Thread Robert Shearman

Avoid users having to manually load the module by adding a module
alias allowing it to be autoloaded by the lwt infra.

Signed-off-by: Robert Shearman 
---
 net/ipv6/ila/ila_common.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv6/ila/ila_common.c b/net/ipv6/ila/ila_common.c
index 32dc9aab7297..8ecf2482560e 100644
--- a/net/ipv6/ila/ila_common.c
+++ b/net/ipv6/ila/ila_common.c
@@ -99,5 +99,6 @@ static void __exit ila_fini(void)
 
 module_init(ila_init);
 module_exit(ila_fini);
+MODULE_ALIAS_RTNL_LWT(LWTUNNEL_ENCAP_ILA);
 MODULE_AUTHOR("Tom Herbert ");
 MODULE_LICENSE("GPL");
-- 
2.1.4

[PATCH net-next 0/3] lwtunnel: autoload of lwt modules

2016-02-15 Thread Robert Shearman

The lwt implementations that use net devices can already be autoloaded
via the existing IFLA_INFO_KIND mechanism. However, there's no equivalent
mechanism for lwt modules that don't use net devices.

Therefore, add the ability to autoload modules registering lwt
operations for lwt implementations not using a net device so that
users don't have to manually load the modules.

Robert Shearman (3):
  lwtunnel: autoload of lwt modules
  mpls: autoload lwt module
  ila: autoload module

 include/net/lwtunnel.h|  4 +++-
 net/core/lwtunnel.c   | 32 
 net/ipv6/ila/ila_common.c |  1 +
 net/mpls/mpls_iptunnel.c  |  1 +
 4 files changed, 37 insertions(+), 1 deletion(-)

-- 
2.1.4

sctp: bad hash index calculation

2016-02-15 Thread Dmitry Vyukov

Hello,

While looking into some memory leaks of sctp ports I've noticed that
sctp_init initializes port hash table as follows:

	/* Allocate and initialize the SCTP port hash table.  */
	do {
		sctp_port_hashsize = (1UL << order) * PAGE_SIZE /
					sizeof(struct sctp_bind_hashbucket);
		if ((sctp_port_hashsize > (64 * 1024)) && order > 0)
			continue;
		sctp_port_hashtable = (struct sctp_bind_hashbucket *)
			__get_free_pages(GFP_KERNEL | __GFP_NOWARN, order);
	} while (!sctp_port_hashtable && --order > 0);

and then hash index is computed as follows:

/* Warning: The following hash functions assume a power of two 'size'. */
/* This is the hash function for the SCTP port hash table. */
static inline int sctp_phashfn(struct net *net, __u16 lport)
{
	return (net_hash_mix(net) + lport) & (sctp_port_hashsize - 1);
}

I don't see what ensures that sctp_port_hashsize is in fact a power-of-2.

spinlock_t in sctp_bind_hashbucket can be 2 words in some configs, in
which case sizeof(struct sctp_bind_hashbucket) == 24, which can render
half of the hash table unused.

struct sctp_bind_hashbucket {
	spinlock_t		lock;
	struct hlist_head	chain;
};

Am I missing something?
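The effect can be measured in user space. The sketch below recomputes the bucket count the allocation loop would choose and counts how many buckets `h & (size - 1)` actually reaches. It assumes 4 KiB pages and a 24-byte bucket as in the report, and approximates `net_hash_mix(net) + lport` by letting h range over all 16-bit values; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Bucket count the allocation loop computes for a given order. */
static unsigned long demo_hashsize(unsigned int order, unsigned long page_size,
				   size_t bucket_size)
{
	return (1UL << order) * page_size / bucket_size;
}

/* Number of distinct buckets that index = h & (size - 1) reaches for
 * every h in [0, nvals). Returns 0 if the scratch allocation fails. */
static unsigned long demo_buckets_used_mask(unsigned long size,
					    unsigned long nvals)
{
	bool *seen = calloc(size, sizeof(*seen));
	unsigned long used = 0;

	if (!seen)
		return 0;
	for (unsigned long h = 0; h < nvals; h++) {
		unsigned long idx = h & (size - 1);

		if (!seen[idx]) {
			seen[idx] = true;
			used++;
		}
	}
	free(seen);
	return used;
}
```

With these inputs the mask reaches 16 of 170 buckets at order 0 and 256 of 43690 at order 8: the count is 2 raised to the number of set bits in size - 1, so how much of the table goes unused depends on that bit pattern.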

[PATCH ethtool 3/3] Documentation for IPv6 NFC

2016-02-15 Thread Edward Cree

Leaves 'src-ip' and 'dst-ip' documented as taking x.x.x.x, because there's
more low-level nroff here than I can parse, let alone emit.

Signed-off-by: Edward Cree 
---
 ethtool.8.in | 34 +-
 ethtool.c|  4 +++-
 2 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/ethtool.8.in b/ethtool.8.in
index eeffa70..5c0a9e3 100644
--- a/ethtool.8.in
+++ b/ethtool.8.in
@@ -74,7 +74,7 @@
 .\"
 .\"\(*NC - Network Classifier type values
 .\"
-.ds NC \fBether\fP|\fBip4\fP|\fBtcp4\fP|\fBudp4\fP|\fBsctp4\fP|\fBah4\fP|\fBesp4\fP
+.ds NC \fBether\fP|\fBip4\fP|\fBip6\fP|\fBtcp4\fP|\fBudp4\fP|\fBsctp4\fP|\fBah4\fP|\fBesp4\fP|\fBtcp6\fP|\fBudp6\fP|\fBah6\fP|\fBesp6\fP|\fBsctp6\fP
 ..
 .\"
 .\" Start URL.
@@ -260,6 +260,8 @@ ethtool \- query or control network driver and hardware settings
 .RB [ src\-ip \ \*(PA\ [ m \ \*(PA]]
 .RB [ dst\-ip \ \*(PA\ [ m \ \*(PA]]
 .BM tos
+.BM tclass
+.BM nexthdr
 .BM l4proto
 .BM src\-port
 .BM dst\-port
@@ -676,11 +678,17 @@ nokeep;
 lB l.
 ether  Ethernet
 ip4Raw IPv4
+ip6Raw IPv6
 tcp4   TCP over IPv4
 udp4   UDP over IPv4
 sctp4  SCTP over IPv4
 ah4IPSEC AH over IPv4
 esp4   IPSEC ESP over IPv4
+tcp6   TCP over IPv6
+udp6   UDP over IPv6
+sctp6  SCTP over IPv6
+ah6IPSEC AH over IPv6
+esp6   IPSEC ESP over IPv6
 .TE
 .PP
 For all fields that allow both a value and a mask to be specified, the
@@ -706,38 +714,46 @@ Valid only for flow-type ether.
 .TP
 .BR src\-ip \ \*(PA\ [ m \ \*(PA]
 Specify the source IP address of the incoming packet to match along with
-an optional mask.  Valid for all IPv4 based flow-types.
+an optional mask.  Valid for all IP based flow-types.
 .TP
 .BR dst\-ip \ \*(PA\ [ m \ \*(PA]
 Specify the destination IP address of the incoming packet to match along
-with an optional mask.  Valid for all IPv4 based flow-types.
+with an optional mask.  Valid for all IP based flow-types.
 .TP
 .BI tos \ N \\fR\ [\\fPm \ N \\fR]\\fP
 Specify the value of the Type of Service field in the incoming packet to
 match along with an optional mask.  Applies to all IPv4 based flow-types.
 .TP
+.BI tclass \ N \\fR\ [\\fPm \ N \\fR]\\fP
+Specify the value of the Traffic Class field in the incoming packet to
+match along with an optional mask.  Applies to all IPv6 based flow-types.
+.TP
+.BI nexthdr \ N \\fR\ [\\fPm \ N \\fR]\\fP
+Includes the IPv6 Next Header and optional mask.  Valid only for flow-type
+ip6.
+.TP
 .BI l4proto \ N \\fR\ [\\fPm \ N \\fR]\\fP
 Includes the layer 4 protocol number and optional mask.  Valid only for
-flow-type ip4.
+flow-types ip4 and ip6.
 .TP
 .BI src\-port \ N \\fR\ [\\fPm \ N \\fR]\\fP
 Specify the value of the source port field (applicable to TCP/UDP packets)
 in the incoming packet to match along with an optional mask.  Valid for
-flow-types ip4, tcp4, udp4, and sctp4.
+flow-types ip4, tcp4, udp4, and sctp4 and their IPv6 equivalents.
 .TP
 .BI dst\-port \ N \\fR\ [\\fPm \ N \\fR]\\fP
 Specify the value of the destination port field (applicable to TCP/UDP
 packets)in the incoming packet to match along with an optional mask.
-Valid for flow-types ip4, tcp4, udp4, and sctp4.
+Valid for flow-types ip4, tcp4, udp4, and sctp4 and their IPv6 equivalents.
 .TP
 .BI spi \ N \\fR\ [\\fPm \ N \\fR]\\fP
 Specify the value of the security parameter index field (applicable to
 AH/ESP packets)in the incoming packet to match along with an optional
-mask.  Valid for flow-types ip4, ah4, and esp4.
+mask.  Valid for flow-types ip4, ah4, and esp4 and their IPv6 equivalents.
 .TP
 .BI l4data \ N \\fR\ [\\fPm \ N \\fR]\\fP
 Specify the value of the first 4 Bytes of Layer 4 in the incoming packet to
-match along with an optional mask.  Valid for ip4 flow-type.
+match along with an optional mask.  Valid for ip4 and ip6 flow-types.
 .TP
 .BI vlan\-etype \ N \\fR\ [\\fPm \ N \\fR]\\fP
 Includes the VLAN tag Ethertype and an optional mask.
@@ -751,7 +767,7 @@ Includes 64-bits of user-specific data and an optional mask.
 .BR dst-mac \ \*(MA\ [ m \ \*(MA]
 Includes the destination MAC address, specified as 6 bytes in hexadecimal
 separated by colons, along with an optional mask.
-Valid for all IPv4 based flow-types.
+Valid for all IP based flow-types.
 .TP
 .BI action \ N
 Specifies the Rx queue to send packets to, or some other action.
diff --git a/ethtool.c b/ethtool.c
index f18ad73..b1de453 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -4108,13 +4108,15 @@ static const struct option {
  "Configure Rx network flow classification options or rules",
  " rx-flow-hash tcp4|udp4|ah4|esp4|sctp4|"
  "tcp6|udp6|ah6|esp6|sctp6 m|v|t|s|d|f|n|r... |\n"
- " flow-type ether|ip4|tcp4|udp4|sctp4|ah4|esp4\n"
+ " flow-type ether|ip4|tcp4|udp4|sctp4|ah4|esp4|"
+ "ip6|tcp6|udp6|ah6|esp6|sctp6\n"
  " [ src %x:%x:%x:%x:%x:%x [m %x:%x:%x:%x:%x:%x] 
]\n"
  " [ dst %x:%x:%x:%x:%x:%x [m %x:%x:%x:%x:%x:

[PATCH ethtool 2/3] Add IPv6 support to NFC

2016-02-15 Thread Edward Cree

Signed-off-by: Edward Cree 
---
 ethtool.c |  21 +
 rxclass.c | 272 ++
 2 files changed, 279 insertions(+), 14 deletions(-)

diff --git a/ethtool.c b/ethtool.c
index 92c40b8..f18ad73 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -3492,6 +3493,22 @@ static int do_permaddr(struct cmd_context *ctx)
return err;
 }
 
+static bool flow_type_is_ntuple_supported(__u32 flow_type)
+{
+   switch (flow_type) {
+   case TCP_V4_FLOW:
+   case UDP_V4_FLOW:
+   case SCTP_V4_FLOW:
+   case AH_V4_FLOW:
+   case ESP_V4_FLOW:
+   case IPV4_USER_FLOW:
+   case ETHER_FLOW:
+   return true;
+   default:
+   return false;
+   }
+}
+
 static int flow_spec_to_ntuple(struct ethtool_rx_flow_spec *fsp,
   struct ethtool_rx_ntuple_flow_spec *ntuple)
 {
@@ -3515,6 +3532,10 @@ static int flow_spec_to_ntuple(struct ethtool_rx_flow_spec *fsp,
fsp->m_ext.vlan_etype)
return -1;
 
+   /* IPv6 flow types are not supported by ntuple */
+   if (!flow_type_is_ntuple_supported(fsp->flow_type & ~FLOW_EXT))
+   return -1;
+
/* Set entire ntuple to ~0 to guarantee all masks are set */
memset(ntuple, ~0, sizeof(*ntuple));
 
diff --git a/rxclass.c b/rxclass.c
index cd686a3..d3150d5 100644
--- a/rxclass.c
+++ b/rxclass.c
@@ -39,6 +39,25 @@ static void rxclass_print_ipv4_rule(__be32 sip, __be32 sipm, __be32 dip,
tos, tosm);
 }
 
+static void rxclass_print_ipv6_rule(__be32 *sip, __be32 *sipm, __be32 *dip,
+   __be32 *dipm, u8 tclass, u8 tclassm)
+{
+   char sip_str[INET6_ADDRSTRLEN];
+   char sipm_str[INET6_ADDRSTRLEN];
+   char dip_str[INET6_ADDRSTRLEN];
+   char dipm_str[INET6_ADDRSTRLEN];
+
+   fprintf(stdout,
+   "\tSrc IP addr: %s mask: %s\n"
+   "\tDest IP addr: %s mask: %s\n"
+   "\tTraffic Class: 0x%x mask: 0x%x\n",
+   inet_ntop(AF_INET6, sip, sip_str, INET6_ADDRSTRLEN),
+   inet_ntop(AF_INET6, sipm, sipm_str, INET6_ADDRSTRLEN),
+   inet_ntop(AF_INET6, dip, dip_str, INET6_ADDRSTRLEN),
+   inet_ntop(AF_INET6, dipm, dipm_str, INET6_ADDRSTRLEN),
+   tclass, tclassm);
+}
+
 static void rxclass_print_nfc_spec_ext(struct ethtool_rx_flow_spec *fsp)
 {
if (fsp->flow_type & FLOW_EXT) {
@@ -127,7 +146,7 @@ static void rxclass_print_nfc_rule(struct ethtool_rx_flow_spec *fsp)
ntohl(fsp->h_u.esp_ip4_spec.spi),
ntohl(fsp->m_u.esp_ip4_spec.spi));
break;
-   case IP_USER_FLOW:
+   case IPV4_USER_FLOW:
fprintf(stdout, "\tRule Type: Raw IPv4\n");
rxclass_print_ipv4_rule(fsp->h_u.usr_ip4_spec.ip4src,
 fsp->m_u.usr_ip4_spec.ip4src,
@@ -143,6 +162,62 @@ static void rxclass_print_nfc_rule(struct ethtool_rx_flow_spec *fsp)
ntohl(fsp->h_u.usr_ip4_spec.l4_4_bytes),
ntohl(fsp->m_u.usr_ip4_spec.l4_4_bytes));
break;
+   case TCP_V6_FLOW:
+   case UDP_V6_FLOW:
+   case SCTP_V6_FLOW:
+   if (flow_type == TCP_V6_FLOW)
+   fprintf(stdout, "\tRule Type: TCP over IPv6\n");
+   else if (flow_type == UDP_V6_FLOW)
+   fprintf(stdout, "\tRule Type: UDP over IPv6\n");
+   else
+   fprintf(stdout, "\tRule Type: SCTP over IPv6\n");
+   rxclass_print_ipv6_rule(fsp->h_u.tcp_ip6_spec.ip6src,
+fsp->m_u.tcp_ip6_spec.ip6src,
+fsp->h_u.tcp_ip6_spec.ip6dst,
+fsp->m_u.tcp_ip6_spec.ip6dst,
+fsp->h_u.tcp_ip6_spec.tclass,
+fsp->m_u.tcp_ip6_spec.tclass);
+   fprintf(stdout,
+   "\tSrc port: %d mask: 0x%x\n"
+   "\tDest port: %d mask: 0x%x\n",
+   ntohs(fsp->h_u.tcp_ip6_spec.psrc),
+   ntohs(fsp->m_u.tcp_ip6_spec.psrc),
+   ntohs(fsp->h_u.tcp_ip6_spec.pdst),
+   ntohs(fsp->m_u.tcp_ip6_spec.pdst));
+   break;
+   case AH_V6_FLOW:
+   case ESP_V6_FLOW:
+   if (flow_type == AH_V6_FLOW)
+   fprintf(stdout, "\tRule Type: IPSEC AH over IPv6\n");
+   else
+   fprintf(stdout, "\tRule Type: IPSEC ESP over IPv6\n");
+   rxclass_print_ipv6_rule(fsp->h_u.ah_ip6_spec.ip6src,
+fsp->m_u.ah_ip6_spec.ip6src,
+fsp->h_u.ah_ip6_spec.ip6dst,
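The new IPv6 specs store each address as `__be32[4]`, i.e. the 16 raw bytes in network order, which is exactly what inet_pton()/inet_ntop() produce and consume. The sketch below shows that round trip on a local mirror of struct ethtool_tcpip6_spec (uint*_t types stand in for the __be types); demo_fill_spec() and demo_src_str() are hypothetical helpers, not ethtool's parser.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Local mirror of struct ethtool_tcpip6_spec from this series. */
struct demo_tcpip6_spec {
	uint32_t ip6src[4];
	uint32_t ip6dst[4];
	uint16_t psrc;
	uint16_t pdst;
	uint8_t tclass;
};

/* Parse textual addresses into the spec; inet_pton() writes the 16 bytes
 * in network order, ports go through htons(). 0 on success, -1 on error. */
static int demo_fill_spec(struct demo_tcpip6_spec *spec, const char *src,
			  const char *dst, uint16_t sport, uint16_t dport)
{
	memset(spec, 0, sizeof(*spec));
	if (inet_pton(AF_INET6, src, spec->ip6src) != 1 ||
	    inet_pton(AF_INET6, dst, spec->ip6dst) != 1)
		return -1;
	spec->psrc = htons(sport);
	spec->pdst = htons(dport);
	return 0;
}

/* Format the source address back to text, as rxclass_print_ipv6_rule()
 * does with inet_ntop(). */
static const char *demo_src_str(const struct demo_tcpip6_spec *spec,
				char *buf, socklen_t len)
{
	return inet_ntop(AF_INET6, spec->ip6src, buf, len);
}
```

Because the array already holds network-order bytes, no per-word ntohl() is needed when printing; only the 16-bit ports need ntohs().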
+

[PATCH ethtool 1/3] ethtool-copy.h: sync with net-next

2016-02-15 Thread Edward Cree

This covers kernel changes up to:

commit 72bb68721f80a1441e871b6afc9ab0b3793d5031
Author: Edward Cree 
Date:   Fri Feb 5 11:16:21 2016 +

ethtool: add IPv6 to the NFC API

Signed-off-by: Edward Cree 
---
 ethtool-copy.h | 149 ++---
 1 file changed, 142 insertions(+), 7 deletions(-)

diff --git a/ethtool-copy.h b/ethtool-copy.h
index d23ffc4..39e89e3 100644
--- a/ethtool-copy.h
+++ b/ethtool-copy.h
@@ -215,6 +215,11 @@ enum tunable_id {
ETHTOOL_ID_UNSPEC,
ETHTOOL_RX_COPYBREAK,
ETHTOOL_TX_COPYBREAK,
+   /*
+* Add your fresh new tubale attribute above and remember to update
+* tunable_strings[] in net/core/ethtool.c
+*/
+   __ETHTOOL_TUNABLE_COUNT,
 };
 
 enum tunable_type_id {
@@ -537,6 +542,7 @@ struct ethtool_pauseparam {
  * now deprecated
  * @ETH_SS_FEATURES: Device feature names
  * @ETH_SS_RSS_HASH_FUNCS: RSS hush function names
+ * @ETH_SS_PHY_STATS: Statistic names, for use with %ETHTOOL_GPHYSTATS
  */
 enum ethtool_stringset {
ETH_SS_TEST = 0,
@@ -545,6 +551,8 @@ enum ethtool_stringset {
ETH_SS_NTUPLE_FILTERS,
ETH_SS_FEATURES,
ETH_SS_RSS_HASH_FUNCS,
+   ETH_SS_TUNABLES,
+   ETH_SS_PHY_STATS,
 };
 
 /**
@@ -740,6 +748,56 @@ struct ethtool_usrip4_spec {
__u8proto;
 };
 
+/**
+ * struct ethtool_tcpip6_spec - flow specification for TCP/IPv6 etc.
+ * @ip6src: Source host
+ * @ip6dst: Destination host
+ * @psrc: Source port
+ * @pdst: Destination port
+ * @tclass: Traffic Class
+ *
+ * This can be used to specify a TCP/IPv6, UDP/IPv6 or SCTP/IPv6 flow.
+ */
+struct ethtool_tcpip6_spec {
+   __be32  ip6src[4];
+   __be32  ip6dst[4];
+   __be16  psrc;
+   __be16  pdst;
+   __u8tclass;
+};
+
+/**
+ * struct ethtool_ah_espip6_spec - flow specification for IPsec/IPv6
+ * @ip6src: Source host
+ * @ip6dst: Destination host
+ * @spi: Security parameters index
+ * @tclass: Traffic Class
+ *
+ * This can be used to specify an IPsec transport or tunnel over IPv6.
+ */
+struct ethtool_ah_espip6_spec {
+   __be32  ip6src[4];
+   __be32  ip6dst[4];
+   __be32  spi;
+   __u8tclass;
+};
+
+/**
+ * struct ethtool_usrip6_spec - general flow specification for IPv6
+ * @ip6src: Source host
+ * @ip6dst: Destination host
+ * @l4_4_bytes: First 4 bytes of transport (layer 4) header
+ * @tclass: Traffic Class
+ * @l4_proto: Transport protocol number (nexthdr after any Extension Headers)
+ */
+struct ethtool_usrip6_spec {
+   __be32  ip6src[4];
+   __be32  ip6dst[4];
+   __be32  l4_4_bytes;
+   __u8tclass;
+   __u8l4_proto;
+};
+
 union ethtool_flow_union {
struct ethtool_tcpip4_spec  tcp_ip4_spec;
struct ethtool_tcpip4_spec  udp_ip4_spec;
@@ -747,6 +805,12 @@ union ethtool_flow_union {
struct ethtool_ah_espip4_spec   ah_ip4_spec;
struct ethtool_ah_espip4_spec   esp_ip4_spec;
struct ethtool_usrip4_spec  usr_ip4_spec;
+   struct ethtool_tcpip6_spec  tcp_ip6_spec;
+   struct ethtool_tcpip6_spec  udp_ip6_spec;
+   struct ethtool_tcpip6_spec  sctp_ip6_spec;
+   struct ethtool_ah_espip6_spec   ah_ip6_spec;
+   struct ethtool_ah_espip6_spec   esp_ip6_spec;
+   struct ethtool_usrip6_spec  usr_ip6_spec;
struct ethhdr   ether_spec;
__u8hdata[52];
 };
@@ -796,6 +860,31 @@ struct ethtool_rx_flow_spec {
__u32   location;
 };
 
+/* How rings are layed out when accessing virtual functions or
+ * offloaded queues is device specific. To allow users to do flow
+ * steering and specify these queues the ring cookie is partitioned
+ * into a 32bit queue index with an 8 bit virtual function id.
+ * This also leaves the 3bytes for further specifiers. It is possible
+ * future devices may support more than 256 virtual functions if
+ * devices start supporting PCIe w/ARI. However at the moment I
+ * do not know of any devices that support this so I do not reserve
+ * space for this at this time. If a future patch consumes the next
+ * byte it should be aware of this possiblity.
+ */
+#define ETHTOOL_RX_FLOW_SPEC_RING	0x00000000FFFFFFFFLL
+#define ETHTOOL_RX_FLOW_SPEC_RING_VF	0x000000FF00000000LL
+#define ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF 32
+static __inline__ __u64 ethtool_get_flow_spec_ring(__u64 ring_cookie)
+{
+   return ETHTOOL_RX_FLOW_SPEC_RING & ring_cookie;
+};
+
+static __inline__ __u64 ethtool_get_flow_spec_ring_vf(__u64 ring_cookie)
+{
+   return (ETHTOOL_RX_FLOW_SPEC_RING_VF & ring_cookie) >>
+   ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF;
+};
+
 /**
  * struct ethtool_rxnfc - command to get or set RX flow classification rules
  * @cmd: Specific command number - %ETHTOOL_GRXFH, %ETHTOOL_SRXF
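The ring_cookie layout described in the comment above (queue index in the low 32 bits, VF id in the next 8) can be exercised with a small sketch. The mask values are assumed to be the upstream ETHTOOL_RX_FLOW_SPEC_RING* values, copied under hypothetical DEMO_* names; demo_make_cookie() is a hypothetical packing helper with no counterpart in the header.

```c
#include <stdint.h>

/* Assumed upstream layout: queue index in bits 0-31, VF id in bits 32-39. */
#define DEMO_RING_MASK		0x00000000FFFFFFFFULL
#define DEMO_RING_VF_MASK	0x000000FF00000000ULL
#define DEMO_RING_VF_OFF	32

/* Hypothetical helper: pack a queue index and VF id into a ring_cookie. */
static uint64_t demo_make_cookie(uint32_t queue, uint8_t vf)
{
	return ((uint64_t)vf << DEMO_RING_VF_OFF) | queue;
}

/* Same logic as ethtool_get_flow_spec_ring(). */
static uint64_t demo_cookie_ring(uint64_t cookie)
{
	return cookie & DEMO_RING_MASK;
}

/* Same logic as ethtool_get_flow_spec_ring_vf(). */
static uint64_t demo_cookie_vf(uint64_t cookie)
{
	return (cookie & DEMO_RING_VF_MASK) >> DEMO_RING_VF_OFF;
}
```

For example, queue 7 on VF 3 packs to 0x0000000300000007, and the top three bytes stay zero, leaving them free as the header comment intends.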

[PATCH ethtool 0/3] IPv6 RXNFC

2016-02-15 Thread Edward Cree

This series adds support for steering of IPv6 receive flows.

Edward Cree (3):
  ethtool-copy.h: sync with net-next
  Add IPv6 support to NFC
  Documentation for IPv6 NFC

 ethtool-copy.h | 149 +--
 ethtool.8.in   |  34 ++--
 ethtool.c  |  25 +-
 rxclass.c  | 272 ++---
 4 files changed, 449 insertions(+), 31 deletions(-)

[PATCH 0/7] fix IS_ERR_VALUE usage

2016-02-15 Thread Andrzej Hajda

Hi,

This small set of independent patches tries to fix incorrect
IS_ERR_VALUE macro usage. It fixes most usages leading to errors
as described in [1]. It also follows the conclusion from the discussion
[1][2]: IS_ERR_VALUE should be used only with the unsigned long type;
signed types should use the comparison 'ret < 0'.
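Why the argument type matters can be shown with a simplified copy of the macro, assumed here to match the form in include/linux/err.h at the time minus the unlikely() hint. A u32 holding a negative errno is zero-extended, not sign-extended, when compared against an unsigned long, so on a 64-bit kernel it never looks like an error. The DEMO_* names and helpers are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Simplified IS_ERR_VALUE(): true for the top MAX_ERRNO values of
 * unsigned long, where negative errnos land after conversion. */
#define DEMO_IS_ERR_VALUE(x) ((x) >= (unsigned long)-DEMO_MAX_ERRNO)

static bool demo_err_ulong(unsigned long v) { return DEMO_IS_ERR_VALUE(v); }

/* With a u32 argument, (u32)-ENOMEM becomes 0xFFFFFFF4 as an unsigned
 * long, which is below the threshold on LP64: the check silently fails. */
static bool demo_err_u32(uint32_t v) { return DEMO_IS_ERR_VALUE(v); }

/* The idiom recommended above for signed return values. */
static bool demo_err_int(int v) { return v < 0; }
```

On a 32-bit build the u32 case happens to work, which is one reason such misuse can go unnoticed.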

The patchset does not fix errors present in the net/ethernet/freescale
and soc/fsl/qe drivers: these drivers mix different types (dma_addr_t,
u32, unsigned long), and fixing them properly seems more challenging to
me; maybe maintainers or brave volunteers can look at it.

The list of missing fixes:
drivers/net/ethernet/freescale/fs_enet/mac-scc.c:149:36-37: WARNING: incorrect argument type in IS_ERR_VALUE(fep -> ring_mem_addr)
drivers/net/ethernet/freescale/ucc_geth.c:2237:48-49: WARNING: incorrect argument type in IS_ERR_VALUE(ugeth -> tx_bd_ring_offset [ j ])
drivers/net/ethernet/freescale/ucc_geth.c:2314:48-49: WARNING: incorrect argument type in IS_ERR_VALUE(ugeth -> rx_bd_ring_offset [ j ])
drivers/net/ethernet/freescale/ucc_geth.c:2524:44-45: WARNING: incorrect argument type in IS_ERR_VALUE(ugeth -> tx_glbl_pram_offset)
drivers/net/ethernet/freescale/ucc_geth.c:2544:45-46: WARNING: incorrect argument type in IS_ERR_VALUE(ugeth -> thread_dat_tx_offset)
drivers/net/ethernet/freescale/ucc_geth.c:2571:46-47: WARNING: incorrect argument type in IS_ERR_VALUE(ugeth -> send_q_mem_reg_offset)
drivers/net/ethernet/freescale/ucc_geth.c:2612:42-43: WARNING: incorrect argument type in IS_ERR_VALUE(ugeth -> scheduler_offset)
drivers/net/ethernet/freescale/ucc_geth.c:2659:54-55: WARNING: incorrect argument type in IS_ERR_VALUE(ugeth -> tx_fw_statistics_pram_offset)
drivers/net/ethernet/freescale/ucc_geth.c:2696:44-45: WARNING: incorrect argument type in IS_ERR_VALUE(ugeth -> rx_glbl_pram_offset)
drivers/net/ethernet/freescale/ucc_geth.c:2715:45-46: WARNING: incorrect argument type in IS_ERR_VALUE(ugeth -> thread_dat_rx_offset)
drivers/net/ethernet/freescale/ucc_geth.c:2736:54-55: WARNING: incorrect argument type in IS_ERR_VALUE(ugeth -> rx_fw_statistics_pram_offset)
drivers/net/ethernet/freescale/ucc_geth.c:2756:53-54: WARNING: incorrect argument type in IS_ERR_VALUE(ugeth -> rx_irq_coalescing_tbl_offset)
drivers/net/ethernet/freescale/ucc_geth.c:2822:44-45: WARNING: incorrect argument type in IS_ERR_VALUE(ugeth -> rx_bd_qs_tbl_offset)
drivers/net/ethernet/freescale/ucc_geth.c:2908:47-48: WARNING: incorrect argument type in IS_ERR_VALUE(ugeth -> exf_glbl_param_offset)
drivers/net/ethernet/freescale/ucc_geth.c:292:36-37: WARNING: incorrect argument type in IS_ERR_VALUE(init_enet_offset)
drivers/net/ethernet/freescale/ucc_geth.c:3042:39-40: WARNING: incorrect argument type in IS_ERR_VALUE(init_enet_pram_offset)
drivers/soc/fsl/qe/ucc_fast.c:271:60-61: WARNING: incorrect argument type in IS_ERR_VALUE(uccf -> ucc_fast_tx_virtual_fifo_base_offset)
drivers/soc/fsl/qe/ucc_fast.c:284:60-61: WARNING: incorrect argument type in IS_ERR_VALUE(uccf -> ucc_fast_rx_virtual_fifo_base_offset)
drivers/soc/fsl/qe/ucc_slow.c:186:38-39: WARNING: incorrect argument type in IS_ERR_VALUE(uccs -> us_pram_offset)
drivers/soc/fsl/qe/ucc_slow.c:213:38-39: WARNING: incorrect argument type in IS_ERR_VALUE(uccs -> rx_base_offset)
drivers/soc/fsl/qe/ucc_slow.c:224:38-39: WARNING: incorrect argument type in IS_ERR_VALUE(uccs -> tx_base_offset)
drivers/net/ethernet/freescale/fs_enet/mac-fcc.c:110:35-36: WARNING: unknown argument type in IS_ERR_VALUE(fpi -> dpram_offset)

[1]: http://permalink.gmane.org/gmane.linux.kernel/2120927
[2]: http://permalink.gmane.org/gmane.linux.kernel/2150581

Regards
Andrzej


Andrzej Hajda (7):
  netfilter: fix IS_ERR_VALUE usage
  MIPS: module: fix incorrect IS_ERR_VALUE macro usages
  drivers: char: mem: fix IS_ERROR_VALUE usage
  atmel-isi: fix IS_ERR_VALUE usage
  serial: clps711x: fix IS_ERR_VALUE usage
  fbdev: exynos: fix IS_ERR_VALUE usage
  usb: gadget: fsl_qe_udc: fix IS_ERR_VALUE usage

 arch/mips/kernel/module-rela.c|  2 +-
 arch/mips/kernel/module.c |  2 +-
 drivers/char/mem.c|  2 +-
 drivers/media/platform/soc_camera/atmel-isi.c |  4 ++--
 drivers/tty/serial/clps711x.c | 14 --
 drivers/usb/gadget/udc/fsl_qe_udc.c   |  2 +-
 drivers/video/fbdev/exynos/exynos_mipi_dsi.c  |  6 +++---
 include/linux/netfilter/x_tables.h|  6 +++---
 net/ipv4/netfilter/arp_tables.c   | 11 +++
 net/ipv4/netfilter/ip_tables.c| 12 
 net/ipv6/netfilter/ip6_tables.c   | 13 +
 11 files changed, 44 insertions(+), 30 deletions(-)

-- 
1.9.1

[PATCH] ravb: Update DT binding example for final CPG/MSSR bindings

2016-02-15 Thread Geert Uytterhoeven

The example in the DT binding documentation uses the preliminary DT
bindings for the r8a7795 MSTP clocks, which never went upstream.
Update the example to use the DT bindings for the upstream Clock Pulse
Generator / Module Standby and Software Reset hardware block.

Signed-off-by: Geert Uytterhoeven 
---
 Documentation/devicetree/bindings/net/renesas,ravb.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/renesas,ravb.txt b/Documentation/devicetree/bindings/net/renesas,ravb.txt
index 81a9f9e6b45ff85c..c8ac222eac67a0c3 100644
--- a/Documentation/devicetree/bindings/net/renesas,ravb.txt
+++ b/Documentation/devicetree/bindings/net/renesas,ravb.txt
@@ -82,8 +82,8 @@ Example:
  "ch16", "ch17", "ch18", "ch19",
  "ch20", "ch21", "ch22", "ch23",
  "ch24";
-   clocks = <&mstp8_clks R8A7795_CLK_ETHERAVB>;
-   power-domains = <&cpg_clocks>;
+   clocks = <&cpg CPG_MOD 812>;
+   power-domains = <&cpg>;
phy-mode = "rgmii-id";
phy-handle = <&phy0>;
 
-- 
1.9.1

[patch net 0/2] mlxsw fixes

2016-02-15 Thread Jiri Pirko

From: Jiri Pirko 

Just a couple of fixes from Ido.

Ido Schimmel (2):
  mlxsw: Treat local port 64 as valid
  mlxsw: spectrum: Set STP state when leaving 802.1D bridge

 drivers/net/ethernet/mellanox/mlxsw/port.h | 2 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 8 
 2 files changed, 9 insertions(+), 1 deletion(-)

-- 
1.9.3
