date:20161017

Re: [PATCH 00/28] Reenable maybe-uninitialized warnings

2016-10-17 Thread Christoph Hellwig

On Tue, Oct 18, 2016 at 12:03:28AM +0200, Arnd Bergmann wrote:
> This is a set of patches that I hope to get into v4.9 in some form
> in order to turn on the -Wmaybe-uninitialized warnings again.

Hi Arnd,

I just complained to Geert that I was introducing way too many
bugs or pointless warnings for some compilers lately, but gcc didn't
warn me about them.  From a little research the lack of
-Wmaybe-uninitialized seems to be the reason for it, so I'm all
for re-enabling it.
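
For readers unfamiliar with the warning, the class of bug it targets looks roughly like the sketch below. The function names are invented for illustration, and whether a given gcc actually warns depends on its version and optimization level.

```c
#include <assert.h>

#if 0
/* Buggy variant, kept out of the build: 'ret' is never assigned when
 * mode is neither 1 nor 2, so the final return reads garbage. */
int mode_to_speed_buggy(int mode)
{
	int ret;

	if (mode == 1)
		ret = 10;
	else if (mode == 2)
		ret = 20;
	return ret;
}
#endif

/* Fixed variant: every path assigns 'ret' before it is read. */
int mode_to_speed(int mode)
{
	int ret = -1;	/* explicit "unknown mode" value */

	if (mode == 1)
		ret = 10;
	else if (mode == 2)
		ret = 20;
	return ret;
}
```

With the warning disabled, the buggy variant compiles silently, which matches the experience described above.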

Re: [patch net-next RFC 4/6] Introduce sample tc action

2016-10-17 Thread Roopa Prabhu

On 10/17/16, 5:17 PM, Roopa Prabhu wrote:
> On 10/17/16, 3:10 AM, Jamal Hadi Salim wrote:
[snip]

inline below more data/context..

 +
 +struct sample_packet_metadata {
 +int sample_size;
 +int orig_size;
 +int ifindex;
 +};
 +
>>> This metadata does not look extensible.. can it be made to ?
>>>
>> Sure it can...
more sflow context here... [1]

An extensible metadata scheme is highly desirable when passing data from the
dataplane to the sampling agent in userspace. Looking forward, advanced
instrumentation is being added to data planes, and keeping the API
future-proof will help.

>>
>>> With sflow in context, you need a pair of ifindex numbers to encode ingress 
>>> and egress ports.
>> What is the use case for both?
> I have heard that most monitoring tools have moved to ingress only sampling 
> because of operational
> complexity (use case is sflow). I think hardware also supports ingress and 
> egress only sampling.
> better to have an option to reflect that in the api.

The reason for having two ifindex numbers is to record the ingress and egress 
ports (i.e. the path that the packet takes through the datapath/ASIC). You may 
actually have three ifindex numbers associated with a sample:
1. The data source that made the measurement (on a linux system each bridge has 
its own ifindex)
2. The ifindex associated with the ingress switch port
3. The ifindex associated with the egress switch port.

All three apply irrespective of sampling direction.

thanks,
Roopa

[1] Additional extended flow attributes have been defined to further extend 
sFlow packet samples:
http://sflow.org/sflow_tunnels.txt 
http://sflow.org/sflow_openflow.txt
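
One way to make the metadata extensible is to carry it as type-length-value attributes instead of a fixed struct. The following is only a userspace sketch under that assumption: the attribute names, the header layout and both helper functions are hypothetical and are not part of the patch under review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute types; not taken from the patch. */
enum sample_attr_type {
	SAMPLE_ATTR_SAMPLE_SIZE = 1,
	SAMPLE_ATTR_ORIG_SIZE,
	SAMPLE_ATTR_SOURCE_IFINDEX,	/* data source that took the sample */
	SAMPLE_ATTR_IN_IFINDEX,		/* ingress switch port */
	SAMPLE_ATTR_OUT_IFINDEX,	/* egress switch port */
};

/* Header in front of every value; len counts header plus value. */
struct sample_attr {
	uint16_t type;
	uint16_t len;
};

/* Append one u32 attribute at 'off'. Returns the new offset, or 0 if
 * the buffer is too small. */
size_t sample_put_u32(uint8_t *buf, size_t off, size_t cap,
		      uint16_t type, uint32_t val)
{
	struct sample_attr hdr = { .type = type, .len = sizeof(hdr) + sizeof(val) };

	if (off > cap || cap - off < hdr.len)
		return 0;
	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + sizeof(hdr), &val, sizeof(val));
	return off + hdr.len;
}

/* Find a u32 attribute by type. Unknown types are skipped, which lets
 * a newer producer add fields without breaking an older reader.
 * Returns 0 on success, -1 if the attribute is absent or malformed. */
int sample_get_u32(const uint8_t *buf, size_t len, uint16_t type, uint32_t *val)
{
	size_t off = 0;

	while (len - off >= sizeof(struct sample_attr)) {
		struct sample_attr hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len < sizeof(hdr) || hdr.len > len - off)
			return -1;
		if (hdr.type == type && hdr.len == sizeof(hdr) + sizeof(*val)) {
			memcpy(val, buf + off + sizeof(hdr), sizeof(*val));
			return 0;
		}
		off += hdr.len;
	}
	return -1;
}
```

All three ifindex values listed above fit this layout as separate attributes, and a sample that lacks one of them simply omits it.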

Useless debug warning "netlink: 16 bytes leftover after parsing attributes"

2016-10-17 Thread Marcel Holtmann

Hi,

so lately I am seeing a bunch of these warnings:

netlink: 16 bytes leftover after parsing attributes..

While they give you the process name, they are still useless to track down the 
message that causes them. I find them even more useless since an updated 
userspace on an older kernel can trigger the nla_policy warning here. And that 
updated userspace program is doing nothing wrong by including extra attributes. 
So what purpose is this warning serving?

Regards

Marcel
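
To illustrate where a "bytes leftover" count can come from, here is a simplified userspace model of an attribute walk. It is not the kernel's code: the header struct, the alignment macro and the function name are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified attribute header: total length (header + payload), then type. */
struct attr_hdr {
	uint16_t len;
	uint16_t type;
};

/* Attributes are padded to a 4-byte boundary in this model. */
#define ATTR_ALIGN(n) (((size_t)(n) + 3u) & ~(size_t)3u)

/* Walk the attributes in buf[0..len) and return how many trailing bytes
 * could not be interpreted as a well-formed attribute. */
size_t attr_leftover(const uint8_t *buf, size_t len, unsigned int *parsed)
{
	size_t rem = len;

	*parsed = 0;
	while (rem >= sizeof(struct attr_hdr)) {
		struct attr_hdr hdr;
		size_t step;

		memcpy(&hdr, buf + (len - rem), sizeof(hdr));
		if (hdr.len < sizeof(hdr) || hdr.len > rem)
			break;		/* malformed or truncated: stop here */
		(*parsed)++;
		step = ATTR_ALIGN(hdr.len);
		rem -= step < rem ? step : rem;	/* last attribute may lack padding */
	}
	return rem;
}
```

In this model the walker knows only the byte count that remained, not which message or attribute was involved, which is consistent with the complaint that the warning is hard to act on.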

Re: [PATCH net-next] tcp: Change txhash on some non-RTO retransmits

2016-10-17 Thread Tom Herbert

On Mon, Oct 17, 2016 at 8:35 PM, Lawrence Brakmo  wrote:
> Yuchung and Eric, thank you for your comments.
>
> It looks like I need to think more about this patch. I was trying
> to reduce the likelihood of reordering (which seems even more
> important based on Eric's comment on pacing), but it seems like
> the only way to prevent reordering is to only re-hash after an RTO
> or when there are no packets in flight (which may not occur).
>
Sounds like that should be the same condition as when we set ooo_okay?

>
> On 10/11/16, 8:56 PM, "Yuchung Cheng"  wrote:
>
>>On Tue, Oct 11, 2016 at 6:01 PM, Yuchung Cheng  wrote:
>>> On Tue, Oct 11, 2016 at 2:08 PM, Lawrence Brakmo  wrote:
 Yuchung, thank you for your comments. Responses inline.

 On 10/11/16, 12:49 PM, "Yuchung Cheng"  wrote:

>On Mon, Oct 10, 2016 at 5:18 PM, Lawrence Brakmo  wrote:
>>
>> The purpose of this patch is to help balance flows across paths. A
>>new
>> sysctl "tcp_retrans_txhash_prob" specifies the probability (0-100)
>>that
>> the txhash (IPv6 flowlabel) will be changed after a non-RTO
>>retransmit.
>> A probability is used in order to control how many flows are moved
>> during a congestion event and prevent the congested path from
>>becoming
>> under utilized (which could occur if too many flows leave the current
>> path). Txhash changes may be delayed in order to decrease the
>>likelihood
>> that it will trigger retransmits due to too much reordering.
>>
>> Another sysctl "tcp_retrans_txhash_mode" determines the behavior
>>after
>> RTOs. If the sysctl is 0, then after an RTO, only RTOs can trigger
>> txhash changes. The idea is to decrease the likelihood of going back
>> to a broken path. That is, we don't want flow balancing to trigger
>> changes to broken paths. The drawback is that flow balancing does
>> not work as well. If the sysctl is greater than 1, then we always
>> do flow balancing, even after RTOs.
>>
>> Tested with packetdrill tests (for correctness) and performance
>> experiments with 2 and 3 paths. Performance experiments looked at
>> aggregate goodput and fairness. For each run, we looked at the ratio
>>of
>> the goodputs for the fastest and slowest flows. These were averaged
>>for
>> all the runs. A fairness of 1 means all flows had the same goodput, a
>> fairness of 2 means the fastest flow was twice as fast as the slowest
>> flow.
>>
>> The setup for the performance experiments was 4 or 5 servers in a
>>rack,
>> 10G links. I tested various probabilities, but 20 seemed to have the
>> best tradeoff for my setup (small RTTs).
>>
>>   --- node1 -
>> sender --- switch --- node2 - switch  receiver
>>   --- node3 -
>>
>> Scenario 1: One sender sends to one receiver through 2 routes (node1
>>or
>> node 2). The output from node1 and node2 is 1G (1gbit/sec). With
>>only 2
>> flows, without flow balancing (prob=0) the average goodput is 1.6G
>>vs.
>> 1.9G with flow balancing due to 2 flows ending up in one link and
>>either
>> not moving and taking some time to move. Fairness was 1 in all cases.
>> For 7 flows, goodput was 1.9G for all, but fairness was 1.5, 1.4 or
>>1.2
>> for prob=0, prob=20,mode=0 and prob=20,mode=1 respectively. That is,
>> flow balancing increased fairness.
>>
>> Scenario 2: One sender to one receiver, through 3 routes (node1,...
>> node2). With 6 or 16 flows the goodput was the same for all, but
>> fairness was 1.8, 1.5 and 1.2 respectively. Interestingly, the worst
>> case fairness out of 10 runs were 2.2, 1.8 and 1.4 respectively. That
>>is,
>> prob=20,mode=1 improved average and worst case fairness.
>I am wondering if we can build better API with routing layer to
>implement this type of feature, instead of creeping the tx_rehashing
>logic scatter in TCP. For example, we call dst_negative_advice on TCP
>write timeouts.

 Not sure. The route is not necessarily bad, may be temporarily
congested
 or they may all be congested. If all we want to do is change the txhash
 (unlike dst_negative_advice), then calling a tx_rehashing function may
 be the appropriate call.

>
>On the patch itself, it seems aggressive to (attempt to) rehash every
>post-RTO retransmission. Also you can just use ca_state (==CA_Loss) to
>identify post-RTO retransmission directly.

 Thanks, I will add the test.

>
>is this an implementation of the Flow Bender ?
>http://dl.acm.org/citation.cfm?id=2674985
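
The fairness figure used in the quoted patch description (ratio of the fastest to the slowest flow's goodput per run, averaged over all runs) can be written out as a short sketch. The function names are made up for illustration, and the handling of empty or zero-goodput input is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Fairness of one run: fastest flow's goodput over slowest flow's.
 * Returns 0.0 if the input is empty or a flow made no progress. */
double run_fairness(const double *goodput, size_t nflows)
{
	double lo, hi;

	if (nflows == 0)
		return 0.0;
	lo = hi = goodput[0];
	for (size_t i = 1; i < nflows; i++) {
		if (goodput[i] < lo)
			lo = goodput[i];
		if (goodput[i] > hi)
			hi = goodput[i];
	}
	return lo > 0.0 ? hi / lo : 0.0;
}

/* Average of the per-run fairness values; 'runs' holds nruns rows of
 * nflows goodput samples each. */
double mean_fairness(const double *runs, size_t nruns, size_t nflows)
{
	double sum = 0.0;

	if (nruns == 0)
		return 0.0;
	for (size_t r = 0; r < nruns; r++)
		sum += run_fairness(runs + r * nflows, nflows);
	return sum / (double)nruns;
}
```

A value of 1 means every flow in every run had the same goodput; larger values mean a wider gap between the fastest and slowest flow.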

[PATCH net-next] r8152: add new products of Lenovo

2016-10-17 Thread Hayes Wang

Add the following four products of Lenovo and sort the order of the list.

VID PID
0x17ef  0x3062
0x17ef  0x3069
0x17ef  0x720c
0x17ef  0x7214

Signed-off-by: Hayes Wang 
---
 drivers/net/usb/cdc_ether.c | 28 
 drivers/net/usb/r8152.c |  6 +-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c
index c47ec0a..45e5e43 100644
--- a/drivers/net/usb/cdc_ether.c
+++ b/drivers/net/usb/cdc_ether.c
@@ -687,6 +687,20 @@ static const struct usb_device_id  products[] = {
.driver_info = 0,
 },
 
+/* ThinkPad USB-C Dock (based on Realtek RTL8153) */
+{
+   USB_DEVICE_AND_INTERFACE_INFO(LENOVO_VENDOR_ID, 0x3062, USB_CLASS_COMM,
+   USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE),
+   .driver_info = 0,
+},
+
+/* ThinkPad Thunderbolt 3 Dock (based on Realtek RTL8153) */
+{
+   USB_DEVICE_AND_INTERFACE_INFO(LENOVO_VENDOR_ID, 0x3069, USB_CLASS_COMM,
+   USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE),
+   .driver_info = 0,
+},
+
 /* Lenovo Thinkpad USB 3.0 Ethernet Adapters (based on Realtek RTL8153) */
 {
USB_DEVICE_AND_INTERFACE_INFO(LENOVO_VENDOR_ID, 0x7205, USB_CLASS_COMM,
@@ -694,6 +708,20 @@ static const struct usb_device_id  products[] = {
.driver_info = 0,
 },
 
+/* Lenovo USB C to Ethernet Adapter (based on Realtek RTL8153) */
+{
+   USB_DEVICE_AND_INTERFACE_INFO(LENOVO_VENDOR_ID, 0x720c, USB_CLASS_COMM,
+   USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE),
+   .driver_info = 0,
+},
+
+/* Lenovo USB-C Travel Hub (based on Realtek RTL8153) */
+{
+   USB_DEVICE_AND_INTERFACE_INFO(LENOVO_VENDOR_ID, 0x7214, USB_CLASS_COMM,
+   USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE),
+   .driver_info = 0,
+},
+
 /* NVIDIA Tegra USB 3.0 Ethernet Adapters (based on Realtek RTL8153) */
 {
USB_DEVICE_AND_INTERFACE_INFO(NVIDIA_VENDOR_ID, 0x09ff, USB_CLASS_COMM,
diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 2886946..8d6e13c 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -4411,8 +4411,12 @@ static struct usb_device_id rtl8152_table[] = {
{REALTEK_USB_DEVICE(VENDOR_ID_REALTEK, 0x8152)},
{REALTEK_USB_DEVICE(VENDOR_ID_REALTEK, 0x8153)},
{REALTEK_USB_DEVICE(VENDOR_ID_SAMSUNG, 0xa101)},
-   {REALTEK_USB_DEVICE(VENDOR_ID_LENOVO,  0x7205)},
{REALTEK_USB_DEVICE(VENDOR_ID_LENOVO,  0x304f)},
+   {REALTEK_USB_DEVICE(VENDOR_ID_LENOVO,  0x3062)},
+   {REALTEK_USB_DEVICE(VENDOR_ID_LENOVO,  0x3069)},
+   {REALTEK_USB_DEVICE(VENDOR_ID_LENOVO,  0x7205)},
+   {REALTEK_USB_DEVICE(VENDOR_ID_LENOVO,  0x720c)},
+   {REALTEK_USB_DEVICE(VENDOR_ID_LENOVO,  0x7214)},
{REALTEK_USB_DEVICE(VENDOR_ID_NVIDIA,  0x09ff)},
{}
 };
-- 
2.7.4
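
As a quick cross-check of the "sort the order of the list" part, the Lenovo entries of rtl8152_table after this patch can be modelled in userspace as below. The struct and helper names are hypothetical; the kernel's real matching goes through struct usb_device_id and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct usb_id {
	uint16_t vid;
	uint16_t pid;
};

#define VID_LENOVO 0x17ef

/* Lenovo entries in the order they appear after the patch. */
const struct usb_id lenovo_ids[] = {
	{ VID_LENOVO, 0x304f },
	{ VID_LENOVO, 0x3062 },
	{ VID_LENOVO, 0x3069 },
	{ VID_LENOVO, 0x7205 },
	{ VID_LENOVO, 0x720c },
	{ VID_LENOVO, 0x7214 },
};
const size_t lenovo_ids_len = sizeof(lenovo_ids) / sizeof(lenovo_ids[0]);

/* Linear search for an exact vendor/product match. */
bool usb_id_match(const struct usb_id *tbl, size_t n, uint16_t vid, uint16_t pid)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].vid == vid && tbl[i].pid == pid)
			return true;
	return false;
}

/* True if the table is in strictly ascending (vid, pid) order. */
bool usb_id_sorted(const struct usb_id *tbl, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		if (tbl[i - 1].vid > tbl[i].vid)
			return false;
		if (tbl[i - 1].vid == tbl[i].vid && tbl[i - 1].pid >= tbl[i].pid)
			return false;
	}
	return true;
}
```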

Re: [PATCH net-next] tcp: Change txhash on some non-RTO retransmits

2016-10-17 Thread Lawrence Brakmo

Yuchung and Eric, thank you for your comments.

It looks like I need to think more about this patch. I was trying
to reduce the likelihood of reordering (which seems even more
important based on Eric's comment on pacing), but it seems like
the only way to prevent reordering is to only re-hash after an RTO
or when there are no packets in flight (which may not occur).
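
The condition described above can be sketched as a small decision function. This is a hypothetical model, not code from the patch: the names are invented, the random number is supplied by the caller, and treating an RTO as an unconditional rehash is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Rehashing cannot reorder packets if it happens after an RTO or while
 * nothing is in flight. */
bool rehash_is_safe(bool after_rto, unsigned int packets_in_flight)
{
	return after_rto || packets_in_flight == 0;
}

/* prob is a percentage in 0..100; roll is a caller-supplied random number. */
bool rehash_roll(unsigned int prob, unsigned int roll)
{
	return (roll % 100) < prob;
}

/* Combined policy: never rehash when it could reorder; otherwise always
 * rehash after an RTO and rehash with probability 'prob' in other cases. */
bool should_rehash(bool after_rto, unsigned int packets_in_flight,
		   unsigned int prob, unsigned int roll)
{
	if (!rehash_is_safe(after_rto, packets_in_flight))
		return false;
	return after_rto || rehash_roll(prob, roll);
}
```

With prob set to 0 the non-RTO branch never fires, and with 100 it always does, which matches the 0-100 range given in the patch description.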


On 10/11/16, 8:56 PM, "Yuchung Cheng"  wrote:

>On Tue, Oct 11, 2016 at 6:01 PM, Yuchung Cheng  wrote:
>> On Tue, Oct 11, 2016 at 2:08 PM, Lawrence Brakmo  wrote:
>>> Yuchung, thank you for your comments. Responses inline.
>>>
>>> On 10/11/16, 12:49 PM, "Yuchung Cheng"  wrote:
>>>
On Mon, Oct 10, 2016 at 5:18 PM, Lawrence Brakmo  wrote:
>
> The purpose of this patch is to help balance flows across paths. A
>new
> sysctl "tcp_retrans_txhash_prob" specifies the probability (0-100)
>that
> the txhash (IPv6 flowlabel) will be changed after a non-RTO
>retransmit.
> A probability is used in order to control how many flows are moved
> during a congestion event and prevent the congested path from
>becoming
> under utilized (which could occur if too many flows leave the current
> path). Txhash changes may be delayed in order to decrease the
>likelihood
> that it will trigger retransmits due to too much reordering.
>
> Another sysctl "tcp_retrans_txhash_mode" determines the behavior
>after
> RTOs. If the sysctl is 0, then after an RTO, only RTOs can trigger
> txhash changes. The idea is to decrease the likelihood of going back
> to a broken path. That is, we don't want flow balancing to trigger
> changes to broken paths. The drawback is that flow balancing does
> not work as well. If the sysctl is greater than 1, then we always
> do flow balancing, even after RTOs.
>
> Tested with packetdrill tests (for correctness) and performance
> experiments with 2 and 3 paths. Performance experiments looked at
> aggregate goodput and fairness. For each run, we looked at the ratio
>of
> the goodputs for the fastest and slowest flows. These were averaged
>for
> all the runs. A fairness of 1 means all flows had the same goodput, a
> fairness of 2 means the fastest flow was twice as fast as the slowest
> flow.
>
> The setup for the performance experiments was 4 or 5 servers in a
>rack,
> 10G links. I tested various probabilities, but 20 seemed to have the
> best tradeoff for my setup (small RTTs).
>
>   --- node1 -
> sender --- switch --- node2 - switch  receiver
>   --- node3 -
>
> Scenario 1: One sender sends to one receiver through 2 routes (node1
>or
> node 2). The output from node1 and node2 is 1G (1gbit/sec). With
>only 2
> flows, without flow balancing (prob=0) the average goodput is 1.6G
>vs.
> 1.9G with flow balancing due to 2 flows ending up in one link and
>either
> not moving and taking some time to move. Fairness was 1 in all cases.
> For 7 flows, goodput was 1.9G for all, but fairness was 1.5, 1.4 or
>1.2
> for prob=0, prob=20,mode=0 and prob=20,mode=1 respectively. That is,
> flow balancing increased fairness.
>
> Scenario 2: One sender to one receiver, through 3 routes (node1,...
> node2). With 6 or 16 flows the goodput was the same for all, but
> fairness was 1.8, 1.5 and 1.2 respectively. Interestingly, the worst
> case fairness out of 10 runs were 2.2, 1.8 and 1.4 respectively. That
>is,
> prob=20,mode=1 improved average and worst case fairness.
I am wondering if we can build better API with routing layer to
implement this type of feature, instead of creeping the tx_rehashing
logic scatter in TCP. For example, we call dst_negative_advice on TCP
write timeouts.
>>>
>>> Not sure. The route is not necessarily bad, may be temporarily
>>>congested
>>> or they may all be congested. If all we want to do is change the txhash
>>> (unlike dst_negative_advice), then calling a tx_rehashing function may
>>> be the appropriate call.
>>>

On the patch itself, it seems aggressive to (attempt to) rehash every
post-RTO retransmission. Also you can just use ca_state (==CA_Loss) to
identify post-RTO retransmission directly.
>>>
>>> Thanks, I will add the test.
>>>

is this an implementation of the Flow Bender ?
http://dl.acm.org/citation.cfm?id=2674985
>>>
>>> Part of flow bender, although there are also some similarities to
>>>flowlet
>>> switching.
>>>

>
> Scenario 3: One sender to one receiver, 2

[PATCH net-next 06/11] ixgbe: Flip to the new dev walk API

2016-10-17 Thread David Ahern

Convert ixgbe users to new dev walk API. This is just a code conversion;
no functional change is intended.

Signed-off-by: David Ahern 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 132 --
 1 file changed, 82 insertions(+), 50 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 784b0b98ab2f..f380fda11eb6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -5012,24 +5012,23 @@ static int ixgbe_fwd_ring_up(struct net_device *vdev,
return err;
 }
 
-static void ixgbe_configure_dfwd(struct ixgbe_adapter *adapter)
+static int ixgbe_upper_dev_walk(struct net_device *upper, void *data)
 {
-   struct net_device *upper;
-   struct list_head *iter;
-   int err;
-
-   netdev_for_each_all_upper_dev_rcu(adapter->netdev, upper, iter) {
-   if (netif_is_macvlan(upper)) {
-   struct macvlan_dev *dfwd = netdev_priv(upper);
-   struct ixgbe_fwd_adapter *vadapter = dfwd->fwd_priv;
+   if (netif_is_macvlan(upper)) {
+   struct macvlan_dev *dfwd = netdev_priv(upper);
+   struct ixgbe_fwd_adapter *vadapter = dfwd->fwd_priv;
 
-   if (dfwd->fwd_priv) {
-   err = ixgbe_fwd_ring_up(upper, vadapter);
-   if (err)
-   continue;
-   }
-   }
+   if (dfwd->fwd_priv)
+   ixgbe_fwd_ring_up(upper, vadapter);
}
+
+   return 0;
+}
+
+static void ixgbe_configure_dfwd(struct ixgbe_adapter *adapter)
+{
+   netdev_walk_all_upper_dev_rcu(adapter->netdev,
+ ixgbe_upper_dev_walk, NULL);
 }
 
 static void ixgbe_configure(struct ixgbe_adapter *adapter)
@@ -5448,12 +5447,25 @@ static void ixgbe_fdir_filter_exit(struct ixgbe_adapter *adapter)
spin_unlock(&adapter->fdir_perfect_lock);
 }
 
+static int ixgbe_disable_macvlan(struct net_device *upper, void *data)
+{
+   if (netif_is_macvlan(upper)) {
+   struct macvlan_dev *vlan = netdev_priv(upper);
+
+   if (vlan->fwd_priv) {
+   netif_tx_stop_all_queues(upper);
+   netif_carrier_off(upper);
+   netif_tx_disable(upper);
+   }
+   }
+
+   return 0;
+}
+
 void ixgbe_down(struct ixgbe_adapter *adapter)
 {
struct net_device *netdev = adapter->netdev;
struct ixgbe_hw *hw = &adapter->hw;
-   struct net_device *upper;
-   struct list_head *iter;
int i;
 
/* signal that we are down to the interrupt handler */
@@ -5477,17 +5489,8 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
netif_tx_disable(netdev);
 
/* disable any upper devices */
-   netdev_for_each_all_upper_dev_rcu(adapter->netdev, upper, iter) {
-   if (netif_is_macvlan(upper)) {
-   struct macvlan_dev *vlan = netdev_priv(upper);
-
-   if (vlan->fwd_priv) {
-   netif_tx_stop_all_queues(upper);
-   netif_carrier_off(upper);
-   netif_tx_disable(upper);
-   }
-   }
-   }
+   netdev_walk_all_upper_dev_rcu(adapter->netdev,
+ ixgbe_disable_macvlan, NULL);
 
ixgbe_irq_disable(adapter);
 
@@ -6728,6 +6731,18 @@ static void ixgbe_update_default_up(struct ixgbe_adapter *adapter)
 #endif
 }
 
+static int ixgbe_enable_macvlan(struct net_device *upper, void *data)
+{
+   if (netif_is_macvlan(upper)) {
+   struct macvlan_dev *vlan = netdev_priv(upper);
+
+   if (vlan->fwd_priv)
+   netif_tx_wake_all_queues(upper);
+   }
+
+   return 0;
+}
+
 /**
  * ixgbe_watchdog_link_is_up - update netif_carrier status and
  * print link up message
@@ -6737,8 +6752,6 @@ static void ixgbe_watchdog_link_is_up(struct ixgbe_adapter *adapter)
 {
struct net_device *netdev = adapter->netdev;
struct ixgbe_hw *hw = &adapter->hw;
-   struct net_device *upper;
-   struct list_head *iter;
u32 link_speed = adapter->link_speed;
const char *speed_str;
bool flow_rx, flow_tx;
@@ -6809,14 +6822,8 @@ static void ixgbe_watchdog_link_is_up(struct ixgbe_adapter *adapter)
 
/* enable any upper devices */
rtnl_lock();
-   netdev_for_each_all_upper_dev_rcu(adapter->netdev, upper, iter) {
-   if (netif_is_macvlan(upper)) {
-   struct macvlan_dev *vlan = netdev_priv(upper);
-
-   if (vlan->fwd_priv)
-   netif_tx_wake_all_queues(upper);
-   }
-   }
+

[PATCH net-next 10/11] net: Add warning if any lower device is still in adjacency list

2016-10-17 Thread David Ahern

Lower list should be empty just like upper.

Signed-off-by: David Ahern 
---
 net/core/dev.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index a9fe14908b44..c6bbf310d407 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5219,6 +5219,20 @@ struct net_device *netdev_master_upper_dev_get(struct net_device *dev)
 }
 EXPORT_SYMBOL(netdev_master_upper_dev_get);
 
+/**
+ * netdev_has_any_lower_dev - Check if device is linked to some device
+ * @dev: device
+ *
+ * Find out if a device is linked to a lower device and return true in case
+ * it is. The caller must hold the RTNL lock.
+ */
+static bool netdev_has_any_lower_dev(struct net_device *dev)
+{
+   ASSERT_RTNL();
+
+   return !list_empty(&dev->adj_list.lower);
+}
+
 void *netdev_adjacent_get_private(struct list_head *adj_list)
 {
struct netdev_adjacent *adj;
@@ -6616,6 +6630,7 @@ static void rollback_registered_many(struct list_head *head)
 
/* Notifier chain MUST detach us all upper devices. */
WARN_ON(netdev_has_any_upper_dev(dev));
+   WARN_ON(netdev_has_any_lower_dev(dev));
 
/* Remove entries from kobject tree */
netdev_unregister_kobject(dev);
-- 
2.1.4

[PATCH net-next 05/11] IB/ipoib: Flip to new dev walk API

2016-10-17 Thread David Ahern

Convert ipoib_get_net_dev_match_addr to the new upper device walk API.
This is just a code conversion; no functional change is intended.

v2
- removed typecast of data

Signed-off-by: David Ahern 
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 37 +--
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 5636fc3da6b8..cc059218c962 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -292,6 +292,25 @@ static struct net_device *ipoib_get_master_net_dev(struct net_device *dev)
return dev;
 }
 
+struct ipoib_walk_data {
+   const struct sockaddr *addr;
+   struct net_device *result;
+};
+
+static int ipoib_upper_walk(struct net_device *upper, void *_data)
+{
+   struct ipoib_walk_data *data = _data;
+   int ret = 0;
+
+   if (ipoib_is_dev_match_addr_rcu(data->addr, upper)) {
+   dev_hold(upper);
+   data->result = upper;
+   ret = 1;
+   }
+
+   return ret;
+}
+
 /**
  * Find a net_device matching the given address, which is an upper device of
  * the given net_device.
@@ -304,27 +323,21 @@ static struct net_device *ipoib_get_master_net_dev(struct net_device *dev)
 static struct net_device *ipoib_get_net_dev_match_addr(
const struct sockaddr *addr, struct net_device *dev)
 {
-   struct net_device *upper,
- *result = NULL;
-   struct list_head *iter;
+   struct ipoib_walk_data data = {
+   .addr = addr,
+   };
 
rcu_read_lock();
if (ipoib_is_dev_match_addr_rcu(addr, dev)) {
dev_hold(dev);
-   result = dev;
+   data.result = dev;
goto out;
}
 
-   netdev_for_each_all_upper_dev_rcu(dev, upper, iter) {
-   if (ipoib_is_dev_match_addr_rcu(addr, upper)) {
-   dev_hold(upper);
-   result = upper;
-   break;
-   }
-   }
+   netdev_walk_all_upper_dev_rcu(dev, ipoib_upper_walk, &data);
 out:
rcu_read_unlock();
-   return result;
+   return data.result;
 }
 
 /* returns the number of IPoIB netdevs on top a given ipoib device matching a
-- 
2.1.4
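
The conversion above relies on a common pattern: a small context struct carries both the search key and the result through the callback's void pointer. The sketch below models that pattern in userspace. The device struct, the walker and the integer address are simplifications invented for illustration, and the refcount increment only stands in for dev_hold().

```c
#include <assert.h>
#include <stddef.h>

struct dev {
	int addr;
	int refcnt;
	struct dev *upper;	/* single upper device; NULL at the top */
};

typedef int (*walk_fn)(struct dev *upper, void *data);

/* Call fn for each upper device; stop at the first non-zero return. */
int walk_uppers(struct dev *dev, walk_fn fn, void *data)
{
	for (struct dev *u = dev->upper; u; u = u->upper) {
		int ret = fn(u, data);

		if (ret)
			return ret;
	}
	return 0;
}

/* Context handed to the callback: input key plus output slot. */
struct match_data {
	int addr;
	struct dev *result;
};

int match_addr(struct dev *upper, void *_data)
{
	struct match_data *data = _data;

	if (upper->addr != data->addr)
		return 0;	/* keep walking */
	upper->refcnt++;	/* stands in for dev_hold() */
	data->result = upper;
	return 1;		/* stop the walk */
}

struct dev *find_upper_by_addr(struct dev *dev, int addr)
{
	struct match_data data = { .addr = addr };	/* result starts as NULL */

	walk_uppers(dev, match_addr, &data);
	return data.result;
}
```

Returning 1 from the callback plays the role of the `break` in the removed loop, so only the first match takes a reference.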

[PATCH net-next 11/11] net: dev: Improve debug statements for adjacency tracking

2016-10-17 Thread David Ahern

Adjacency code only has debugs for the insert case. Add debugs for
the remove path and make both consistently worded to make it easier
to follow the insert and removal with reference counts.

In addition, change the BUG to a WARN_ON. A missing adjacency at
removal time is not cause for a panic.

Signed-off-by: David Ahern 
---
 net/core/dev.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index c6bbf310d407..f55fb4536016 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5561,6 +5561,9 @@ static int __netdev_adjacent_dev_insert(struct net_device *dev,
 
if (adj) {
adj->ref_nr += 1;
+   pr_debug("Insert adjacency: dev %s adj_dev %s adj->ref_nr %d\n",
+dev->name, adj_dev->name, adj->ref_nr);
+
return 0;
}
 
@@ -5574,8 +5577,8 @@ static int __netdev_adjacent_dev_insert(struct net_device *dev,
adj->private = private;
dev_hold(adj_dev);
 
-   pr_debug("dev_hold for %s, because of link added from %s to %s\n",
-adj_dev->name, dev->name, adj_dev->name);
+   pr_debug("Insert adjacency: dev %s adj_dev %s adj->ref_nr %d; dev_hold on %s\n",
+dev->name, adj_dev->name, adj->ref_nr, adj_dev->name);
 
if (netdev_adjacent_is_neigh_list(dev, adj_dev, dev_list)) {
ret = netdev_adjacent_sysfs_add(dev, adj_dev, dev_list);
@@ -5614,17 +5617,22 @@ static void __netdev_adjacent_dev_remove(struct net_device *dev,
 {
struct netdev_adjacent *adj;
 
+   pr_debug("Remove adjacency: dev %s adj_dev %s ref_nr %d\n",
+dev->name, adj_dev->name, ref_nr);
+
adj = __netdev_find_adj(adj_dev, dev_list);
 
if (!adj) {
-   pr_err("tried to remove device %s from %s\n",
+   pr_err("Adjacency does not exist for device %s from %s\n",
   dev->name, adj_dev->name);
-   BUG();
+   WARN_ON(1);
+   return;
}
 
if (adj->ref_nr > ref_nr) {
-   pr_debug("%s to %s ref_nr-%d = %d\n", dev->name, adj_dev->name,
-ref_nr, adj->ref_nr-ref_nr);
+   pr_debug("adjacency: %s to %s ref_nr - %d = %d\n",
+dev->name, adj_dev->name, ref_nr,
+adj->ref_nr - ref_nr);
adj->ref_nr -= ref_nr;
return;
}
@@ -5636,7 +5644,7 @@ static void __netdev_adjacent_dev_remove(struct net_device *dev,
netdev_adjacent_sysfs_del(dev, adj_dev->name, dev_list);
 
list_del_rcu(&adj->list);
-   pr_debug("dev_put for %s, because link removed from %s to %s\n",
+   pr_debug("adjacency: dev_put for %s, because link removed from %s to %s\n",
 adj_dev->name, dev->name, adj_dev->name);
dev_put(adj_dev);
kfree_rcu(adj, rcu);
-- 
2.1.4
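
The removal path after this patch (warn and return on a missing adjacency, otherwise drop references and unlink at zero) can be modelled in userspace as follows. This is a deliberately reduced sketch: the struct, the `linked` flag and the return codes are invented here, and the real function also handles sysfs links, RCU list removal and dev_put().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct adjacency {
	int ref_nr;
	bool linked;	/* stands in for "present on the adjacency list" */
};

/* Drop ref_nr references. Returns 0 on success, -1 if the adjacency
 * does not exist; that case only warns instead of aborting. */
int adjacency_remove(struct adjacency *adj, int ref_nr)
{
	if (!adj || !adj->linked) {
		fprintf(stderr, "warning: adjacency does not exist\n");
		return -1;
	}

	if (adj->ref_nr > ref_nr) {
		adj->ref_nr -= ref_nr;	/* still referenced: keep it linked */
		return 0;
	}

	adj->ref_nr = 0;
	adj->linked = false;		/* last reference gone: unlink */
	return 0;
}
```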

[PATCH net-next 04/11] IB/core: Flip to the new dev walk API

2016-10-17 Thread David Ahern

Convert rdma_is_upper_dev_rcu, handle_netdev_upper and
ipoib_get_net_dev_match_addr to the new upper device walk API.
This is just a code conversion; no functional change is intended.

v2
- removed typecast of data

Signed-off-by: David Ahern 
---
 drivers/infiniband/core/core_priv.h |  9 +--
 drivers/infiniband/core/roce_gid_mgmt.c | 42 ++---
 2 files changed, 24 insertions(+), 27 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 19d499dcab76..0c0bea091de8 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -127,14 +127,7 @@ void ib_cache_release_one(struct ib_device *device);
 static inline bool rdma_is_upper_dev_rcu(struct net_device *dev,
 struct net_device *upper)
 {
-   struct net_device *_upper = NULL;
-   struct list_head *iter;
-
-   netdev_for_each_all_upper_dev_rcu(dev, _upper, iter)
-   if (_upper == upper)
-   break;
-
-   return _upper == upper;
+   return netdev_has_upper_dev_all_rcu(dev, upper);
 }
 
 int addr_init(void);
diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
index 06556c34606d..3a64a0881882 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -437,6 +437,28 @@ static void callback_for_addr_gid_device_scan(struct ib_device *device,
  >gid_attr);
 }
 
+struct upper_list {
+   struct list_head list;
+   struct net_device *upper;
+};
+
+static int netdev_upper_walk(struct net_device *upper, void *data)
+{
+   struct upper_list *entry = kmalloc(sizeof(*entry), GFP_ATOMIC);
+   struct list_head *upper_list = data;
+
+   if (!entry) {
+   pr_info("roce_gid_mgmt: couldn't allocate entry to delete ndev\n");
+   return 0;
+   }
+
+   list_add_tail(&entry->list, upper_list);
+   dev_hold(upper);
+   entry->upper = upper;
+
+   return 0;
+}
+
 static void handle_netdev_upper(struct ib_device *ib_dev, u8 port,
void *cookie,
void (*handle_netdev)(struct ib_device *ib_dev,
@@ -444,30 +466,12 @@ static void handle_netdev_upper(struct ib_device *ib_dev, u8 port,
  struct net_device *ndev))
 {
struct net_device *ndev = (struct net_device *)cookie;
-   struct upper_list {
-   struct list_head list;
-   struct net_device *upper;
-   };
-   struct net_device *upper;
-   struct list_head *iter;
struct upper_list *upper_iter;
struct upper_list *upper_temp;
LIST_HEAD(upper_list);
 
rcu_read_lock();
-   netdev_for_each_all_upper_dev_rcu(ndev, upper, iter) {
-   struct upper_list *entry = kmalloc(sizeof(*entry),
-  GFP_ATOMIC);
-
-   if (!entry) {
-   pr_info("roce_gid_mgmt: couldn't allocate entry to delete ndev\n");
-   continue;
-   }
-
-   list_add_tail(&entry->list, &upper_list);
-   dev_hold(upper);
-   entry->upper = upper;
-   }
+   netdev_walk_all_upper_dev_rcu(ndev, netdev_upper_walk, &upper_list);
rcu_read_unlock();
 
handle_netdev(ib_dev, port, ndev);
-- 
2.1.4

[PATCH net-next 07/11] mlxsw: Flip to the new dev walk API

2016-10-17 Thread David Ahern

Convert mlxsw users to new dev walk API. This is just a code conversion;
no functional change is intended.

Signed-off-by: David Ahern 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 37 --
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 43a5eddc2c11..99805fd3d110 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -3092,19 +3092,30 @@ static bool mlxsw_sp_port_dev_check(const struct net_device *dev)
return dev->netdev_ops == &mlxsw_sp_port_netdev_ops;
 }
 
+static int mlxsw_lower_dev_walk(struct net_device *lower_dev, void *data)
+{
+   struct mlxsw_sp_port **port = data;
+   int ret = 0;
+
+   if (mlxsw_sp_port_dev_check(lower_dev)) {
+   *port = netdev_priv(lower_dev);
+   ret = 1;
+   }
+
+   return ret;
+}
+
 static struct mlxsw_sp_port *mlxsw_sp_port_dev_lower_find(struct net_device *dev)
 {
-   struct net_device *lower_dev;
-   struct list_head *iter;
+   struct mlxsw_sp_port *port;
 
if (mlxsw_sp_port_dev_check(dev))
return netdev_priv(dev);
 
-   netdev_for_each_all_lower_dev(dev, lower_dev, iter) {
-   if (mlxsw_sp_port_dev_check(lower_dev))
-   return netdev_priv(lower_dev);
-   }
-   return NULL;
+   port = NULL;
+   netdev_walk_all_lower_dev(dev, mlxsw_lower_dev_walk, &port);
+
+   return port;
 }
 
 static struct mlxsw_sp *mlxsw_sp_lower_get(struct net_device *dev)
@@ -3117,17 +3128,15 @@ static struct mlxsw_sp *mlxsw_sp_lower_get(struct net_device *dev)
 
 static struct mlxsw_sp_port *mlxsw_sp_port_dev_lower_find_rcu(struct net_device *dev)
 {
-   struct net_device *lower_dev;
-   struct list_head *iter;
+   struct mlxsw_sp_port *port;
 
if (mlxsw_sp_port_dev_check(dev))
return netdev_priv(dev);
 
-   netdev_for_each_all_lower_dev_rcu(dev, lower_dev, iter) {
-   if (mlxsw_sp_port_dev_check(lower_dev))
-   return netdev_priv(lower_dev);
-   }
-   return NULL;
+   port = NULL;
+   netdev_walk_all_lower_dev_rcu(dev, mlxsw_lower_dev_walk, &port);
+
+   return port;
 }
 
 struct mlxsw_sp_port *mlxsw_sp_port_lower_dev_hold(struct net_device *dev)
-- 
2.1.4

[PATCH net-next 08/11] rocker: Flip to the new dev walk API

2016-10-17 Thread David Ahern

Convert rocker to the new dev walk API. This is just a code conversion;
no functional change is intended.

v2
- removed typecast of data

Signed-off-by: David Ahern 
---
 drivers/net/ethernet/rocker/rocker_main.c | 31 ---
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker_main.c b/drivers/net/ethernet/rocker/rocker_main.c
index 5424fb341613..5deb25f26e5f 100644
--- a/drivers/net/ethernet/rocker/rocker_main.c
+++ b/drivers/net/ethernet/rocker/rocker_main.c
@@ -2839,20 +2839,37 @@ static bool rocker_port_dev_check_under(const struct net_device *dev,
return true;
 }
 
+struct rocker_walk_data {
+   struct rocker *rocker;
+   struct rocker_port *port;
+};
+
+static int rocker_lower_dev_walk(struct net_device *lower_dev, void *_data)
+{
+   struct rocker_walk_data *data = _data;
+   int ret = 0;
+
+   if (rocker_port_dev_check_under(lower_dev, data->rocker)) {
+   data->port = netdev_priv(lower_dev);
+   ret = 1;
+   }
+
+   return ret;
+}
+
 struct rocker_port *rocker_port_dev_lower_find(struct net_device *dev,
   struct rocker *rocker)
 {
-   struct net_device *lower_dev;
-   struct list_head *iter;
+   struct rocker_walk_data data;
 
if (rocker_port_dev_check_under(dev, rocker))
return netdev_priv(dev);
 
-   netdev_for_each_all_lower_dev(dev, lower_dev, iter) {
-   if (rocker_port_dev_check_under(lower_dev, rocker))
-   return netdev_priv(lower_dev);
-   }
-   return NULL;
+   data.rocker = rocker;
+   data.port = NULL;
+   netdev_walk_all_lower_dev(dev, rocker_lower_dev_walk, &data);
+
+   return data.port;
 }
 
 static int rocker_netdevice_event(struct notifier_block *unused,
-- 
2.1.4

[PATCH net-next 02/11] net: Introduce new api for walking upper and lower devices

2016-10-17 Thread David Ahern

This patch introduces netdev_walk_all_upper_dev_rcu,
netdev_walk_all_lower_dev and netdev_walk_all_lower_dev_rcu. These
functions recursively walk the adj_list of devices to determine all upper
and lower devices.

The functions take a callback function that is invoked for each device
in the list. If the callback returns non-0, the walk is terminated and
the functions return that code back to callers.
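
As a rough illustration of the callback contract described above, here is a
minimal user-space sketch. It is not the kernel implementation: struct node,
walk_all_upper(), match_node() and has_upper() are hypothetical stand-ins, and
the visit order (callback on the neighbour first, then recursion into it) is an
assumption based on the comment visible in the patch.

```c
#include <stddef.h>

#define MAX_ADJ 4

/* Hypothetical user-space stand-in for struct net_device. */
struct node {
	const char *name;
	struct node *upper[MAX_ADJ];	/* direct upper neighbours only */
	int n_upper;
};

/* Recursively visit every upper node. Stop as soon as fn returns non-0
 * and hand that value back to the caller.
 */
static int walk_all_upper(struct node *n,
			  int (*fn)(struct node *upper, void *data),
			  void *data)
{
	int i, ret;

	for (i = 0; i < n->n_upper; i++) {
		/* first the direct upper neighbour itself */
		ret = fn(n->upper[i], data);
		if (ret)
			return ret;

		/* then everything stacked on top of it */
		ret = walk_all_upper(n->upper[i], fn, data);
		if (ret)
			return ret;
	}

	return 0;
}

/* Callback: non-0 when the visited node is the one being looked for. */
static int match_node(struct node *upper, void *data)
{
	return upper == data;
}

/* Similar in spirit to netdev_has_upper_dev_all_rcu() in the patch. */
static int has_upper(struct node *n, struct node *target)
{
	return !!walk_all_upper(n, match_node, target);
}

/* Build eth3 -> bond1 -> bridge and check reachability both ways. */
static int demo_has_upper(void)
{
	struct node bridge = { "bridge", { NULL }, 0 };
	struct node bond1 = { "bond1", { &bridge }, 1 };
	struct node eth3 = { "eth3", { &bond1 }, 1 };

	return has_upper(&eth3, &bridge) && !has_upper(&bridge, &eth3);
}
```

The void *data argument lets a caller pass both inputs and an output slot in
one struct, which is the pattern the driver conversions in this series follow.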

v3
- simplified netdev_has_upper_dev_all_rcu and __netdev_has_upper_dev and
  removed typecast as suggested by Stephen

v2
- fixed definition of netdev_next_lower_dev_rcu to mirror the upper_dev
  version.

Signed-off-by: David Ahern 
---
 include/linux/netdevice.h |  17 +
 net/core/dev.c| 155 ++
 2 files changed, 172 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index bf341b65ca5e..a5902d995907 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3778,6 +3778,14 @@ struct net_device 
*netdev_all_upper_get_next_dev_rcu(struct net_device *dev,
 updev; \
 updev = netdev_all_upper_get_next_dev_rcu(dev, &(iter)))
 
+int netdev_walk_all_upper_dev_rcu(struct net_device *dev,
+ int (*fn)(struct net_device *upper_dev,
+   void *data),
+ void *data);
+
+bool netdev_has_upper_dev_all_rcu(struct net_device *dev,
+ struct net_device *upper_dev);
+
 void *netdev_lower_get_next_private(struct net_device *dev,
struct list_head **iter);
 void *netdev_lower_get_next_private_rcu(struct net_device *dev,
@@ -3821,6 +3829,15 @@ struct net_device *netdev_all_lower_get_next_rcu(struct 
net_device *dev,
 ldev; \
 ldev = netdev_all_lower_get_next_rcu(dev, &(iter)))
 
+int netdev_walk_all_lower_dev(struct net_device *dev,
+ int (*fn)(struct net_device *lower_dev,
+   void *data),
+ void *data);
+int netdev_walk_all_lower_dev_rcu(struct net_device *dev,
+ int (*fn)(struct net_device *lower_dev,
+   void *data),
+ void *data);
+
 void *netdev_adjacent_get_private(struct list_head *adj_list);
 void *netdev_lower_get_first_private_rcu(struct net_device *dev);
 struct net_device *netdev_master_upper_dev_get(struct net_device *dev);
diff --git a/net/core/dev.c b/net/core/dev.c
index f67fd16615bb..fc48337cfab8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5156,6 +5156,31 @@ bool netdev_has_upper_dev(struct net_device *dev,
 EXPORT_SYMBOL(netdev_has_upper_dev);
 
 /**
+ * netdev_has_upper_dev_all - Check if device is linked to an upper device
+ * @dev: device
+ * @upper_dev: upper device to check
+ *
+ * Find out if a device is linked to specified upper device and return true
+ * in case it is. Note that this checks the entire upper device chain.
+ * The caller must hold rcu lock.
+ */
+
+static int __netdev_has_upper_dev(struct net_device *upper_dev, void *data)
+{
+   struct net_device *dev = data;
+
+   return upper_dev == dev;
+}
+
+bool netdev_has_upper_dev_all_rcu(struct net_device *dev,
+ struct net_device *upper_dev)
+{
+   return !!netdev_walk_all_upper_dev_rcu(dev, __netdev_has_upper_dev,
+  upper_dev);
+}
+EXPORT_SYMBOL(netdev_has_upper_dev_all_rcu);
+
+/**
  * netdev_has_any_upper_dev - Check if device is linked to some device
  * @dev: device
  *
@@ -5255,6 +5280,51 @@ struct net_device 
*netdev_all_upper_get_next_dev_rcu(struct net_device *dev,
 }
 EXPORT_SYMBOL(netdev_all_upper_get_next_dev_rcu);
 
+static struct net_device *netdev_next_upper_dev_rcu(struct net_device *dev,
+   struct list_head **iter)
+{
+   struct netdev_adjacent *upper;
+
+   WARN_ON_ONCE(!rcu_read_lock_held() && !lockdep_rtnl_is_held());
+
+   upper = list_entry_rcu((*iter)->next, struct netdev_adjacent, list);
+
+   if (&upper->list == &dev->adj_list.upper)
+   return NULL;
+
+   *iter = &upper->list;
+
+   return upper->dev;
+}
+
+int netdev_walk_all_upper_dev_rcu(struct net_device *dev,
+ int (*fn)(struct net_device *dev,
+   void *data),
+ void *data)
+{
+   struct net_device *udev;
+   struct list_head *iter;
+   int ret;
+
+   for (iter = &dev->adj_list.upper,
+udev = netdev_next_upper_dev_rcu(dev, &iter);
+udev;
+udev = netdev_next_upper_dev_rcu(dev, &iter)) {
+   /* first is the upper device itself */
+   ret = fn(udev, data);
+   if (ret)
+

[PATCH net-next 03/11] net: bonding: Flip to the new dev walk API

2016-10-17 Thread David Ahern

Convert alb_send_learning_packets and bond_has_this_ip to use the new
netdev_walk_all_upper_dev_rcu API. In both cases this is just a code
conversion; no functional change is intended.

v2
- removed typecast of data and simplified bond_upper_dev_walk

Signed-off-by: David Ahern 
---
 drivers/net/bonding/bond_alb.c  | 82 ++---
 drivers/net/bonding/bond_main.c | 17 +
 2 files changed, 61 insertions(+), 38 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 551f0f8dead3..c80b023092dd 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -950,13 +950,61 @@ static void alb_send_lp_vid(struct slave *slave, u8 
mac_addr[],
dev_queue_xmit(skb);
 }
 
+struct alb_walk_data {
+   struct bonding *bond;
+   struct slave *slave;
+   u8 *mac_addr;
+   bool strict_match;
+};
+
+static int alb_upper_dev_walk(struct net_device *upper, void *_data)
+{
+   struct alb_walk_data *data = _data;
+   bool strict_match = data->strict_match;
+   struct bonding *bond = data->bond;
+   struct slave *slave = data->slave;
+   u8 *mac_addr = data->mac_addr;
+   struct bond_vlan_tag *tags;
+
+   if (is_vlan_dev(upper) && vlan_get_encap_level(upper) == 0) {
+   if (strict_match &&
+   ether_addr_equal_64bits(mac_addr,
+   upper->dev_addr)) {
+   alb_send_lp_vid(slave, mac_addr,
+   vlan_dev_vlan_proto(upper),
+   vlan_dev_vlan_id(upper));
+   } else if (!strict_match) {
+   alb_send_lp_vid(slave, upper->dev_addr,
+   vlan_dev_vlan_proto(upper),
+   vlan_dev_vlan_id(upper));
+   }
+   }
+
+   /* If this is a macvlan device, then only send updates
+* when strict_match is turned off.
+*/
+   if (netif_is_macvlan(upper) && !strict_match) {
+   tags = bond_verify_device_path(bond->dev, upper, 0);
+   if (IS_ERR_OR_NULL(tags))
+   BUG();
+   alb_send_lp_vid(slave, upper->dev_addr,
+   tags[0].vlan_proto, tags[0].vlan_id);
+   kfree(tags);
+   }
+
+   return 0;
+}
+
 static void alb_send_learning_packets(struct slave *slave, u8 mac_addr[],
  bool strict_match)
 {
struct bonding *bond = bond_get_bond_by_slave(slave);
-   struct net_device *upper;
-   struct list_head *iter;
-   struct bond_vlan_tag *tags;
+   struct alb_walk_data data = {
+   .strict_match = strict_match,
+   .mac_addr = mac_addr,
+   .slave = slave,
+   .bond = bond,
+   };
 
/* send untagged */
alb_send_lp_vid(slave, mac_addr, 0, 0);
@@ -965,33 +1013,7 @@ static void alb_send_learning_packets(struct slave 
*slave, u8 mac_addr[],
 * for that device.
 */
rcu_read_lock();
-   netdev_for_each_all_upper_dev_rcu(bond->dev, upper, iter) {
-   if (is_vlan_dev(upper) && vlan_get_encap_level(upper) == 0) {
-   if (strict_match &&
-   ether_addr_equal_64bits(mac_addr,
-   upper->dev_addr)) {
-   alb_send_lp_vid(slave, mac_addr,
-   vlan_dev_vlan_proto(upper),
-   vlan_dev_vlan_id(upper));
-   } else if (!strict_match) {
-   alb_send_lp_vid(slave, upper->dev_addr,
-   vlan_dev_vlan_proto(upper),
-   vlan_dev_vlan_id(upper));
-   }
-   }
-
-   /* If this is a macvlan device, then only send updates
-* when strict_match is turned off.
-*/
-   if (netif_is_macvlan(upper) && !strict_match) {
-   tags = bond_verify_device_path(bond->dev, upper, 0);
-   if (IS_ERR_OR_NULL(tags))
-   BUG();
-   alb_send_lp_vid(slave, upper->dev_addr,
-   tags[0].vlan_proto, tags[0].vlan_id);
-   kfree(tags);
-   }
-   }
+   netdev_walk_all_upper_dev_rcu(bond->dev, alb_upper_dev_walk, &data);
rcu_read_unlock();
 }
 
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 5fa36ebc0640..c9944d86d045 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2270,22 +2270,23 @@ static void bond_mii_monitor(struct

[PATCH net-next 01/11] net: Remove refnr arg when inserting link adjacencies

2016-10-17 Thread David Ahern

Commit 93409033ae65 ("net: Add netdev all_adj_list refcnt propagation to
fix panic") propagated the refnr to insert and remove functions tracking
the netdev adjacency graph. However, for the insert path the refnr can
only be 1. Accordingly, remove the refnr argument to make that clear.
I.e., the refnr arg in 93409033ae65 was only needed for the remove path.

Signed-off-by: David Ahern 
---
 net/core/dev.c | 27 ---
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 352e98129601..f67fd16615bb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5453,7 +5453,6 @@ static inline bool netdev_adjacent_is_neigh_list(struct 
net_device *dev,
 
 static int __netdev_adjacent_dev_insert(struct net_device *dev,
struct net_device *adj_dev,
-   u16 ref_nr,
struct list_head *dev_list,
void *private, bool master)
 {
@@ -5463,7 +5462,7 @@ static int __netdev_adjacent_dev_insert(struct net_device 
*dev,
adj = __netdev_find_adj(adj_dev, dev_list);
 
if (adj) {
-   adj->ref_nr += ref_nr;
+   adj->ref_nr += 1;
return 0;
}
 
@@ -5473,7 +5472,7 @@ static int __netdev_adjacent_dev_insert(struct net_device 
*dev,
 
adj->dev = adj_dev;
adj->master = master;
-   adj->ref_nr = ref_nr;
+   adj->ref_nr = 1;
adj->private = private;
dev_hold(adj_dev);
 
@@ -5547,22 +5546,21 @@ static void __netdev_adjacent_dev_remove(struct 
net_device *dev,
 
 static int __netdev_adjacent_dev_link_lists(struct net_device *dev,
struct net_device *upper_dev,
-   u16 ref_nr,
struct list_head *up_list,
struct list_head *down_list,
void *private, bool master)
 {
int ret;
 
-   ret = __netdev_adjacent_dev_insert(dev, upper_dev, ref_nr, up_list,
+   ret = __netdev_adjacent_dev_insert(dev, upper_dev, up_list,
   private, master);
if (ret)
return ret;
 
-   ret = __netdev_adjacent_dev_insert(upper_dev, dev, ref_nr, down_list,
+   ret = __netdev_adjacent_dev_insert(upper_dev, dev, down_list,
   private, false);
if (ret) {
-   __netdev_adjacent_dev_remove(dev, upper_dev, ref_nr, up_list);
+   __netdev_adjacent_dev_remove(dev, upper_dev, 1, up_list);
return ret;
}
 
@@ -5570,10 +5568,9 @@ static int __netdev_adjacent_dev_link_lists(struct 
net_device *dev,
 }
 
 static int __netdev_adjacent_dev_link(struct net_device *dev,
- struct net_device *upper_dev,
- u16 ref_nr)
+ struct net_device *upper_dev)
 {
-   return __netdev_adjacent_dev_link_lists(dev, upper_dev, ref_nr,
+   return __netdev_adjacent_dev_link_lists(dev, upper_dev,
&dev->all_adj_list.upper,
&upper_dev->all_adj_list.lower,
NULL, false);
@@ -5602,12 +5599,12 @@ static int __netdev_adjacent_dev_link_neighbour(struct 
net_device *dev,
struct net_device *upper_dev,
void *private, bool master)
 {
-   int ret = __netdev_adjacent_dev_link(dev, upper_dev, 1);
+   int ret = __netdev_adjacent_dev_link(dev, upper_dev);
 
if (ret)
return ret;
 
-   ret = __netdev_adjacent_dev_link_lists(dev, upper_dev, 1,
+   ret = __netdev_adjacent_dev_link_lists(dev, upper_dev,
   &dev->adj_list.upper,
   &upper_dev->adj_list.lower,
   private, master);
@@ -5676,7 +5673,7 @@ static int __netdev_upper_dev_link(struct net_device *dev,
list_for_each_entry(j, &upper_dev->all_adj_list.upper, list) {
pr_debug("Interlinking %s with %s, non-neighbour\n",
 i->dev->name, j->dev->name);
-   ret = __netdev_adjacent_dev_link(i->dev, j->dev, 
i->ref_nr);
+   ret = __netdev_adjacent_dev_link(i->dev, j->dev);
if (ret)
goto rollback_mesh;
}
@@ -5686,7 +5683,7 @@ static int __netdev_upper_dev_link(struct net_device *dev,
list_for_each_entry(i, &upper_dev->all_adj_list.upper, list) {
pr_debug("linking %s's upper

[PATCH net-next 09/11] net: Remove all_adj_list and its references

2016-10-17 Thread David Ahern

Only direct adjacencies are maintained. All upper or lower devices can
be learned via the new walk API which recursively walks the adj_list for
upper devices or lower devices.

Signed-off-by: David Ahern 
---
 include/linux/netdevice.h |  25 --
 net/core/dev.c| 223 --
 2 files changed, 18 insertions(+), 230 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a5902d995907..458c87631e7f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1456,7 +1456,6 @@ enum netdev_priv_flags {
  * @ptype_specific: Device-specific, protocol-specific packet handlers
  *
  * @adj_list:  Directly linked devices, like slaves for bonding
- * @all_adj_list:  All linked devices, *including* neighbours
  * @features:  Currently active device features
  * @hw_features:   User-changeable features
  *
@@ -1675,11 +1674,6 @@ struct net_device {
struct list_head lower;
} adj_list;
 
-   struct {
-   struct list_head upper;
-   struct list_head lower;
-   } all_adj_list;
-
netdev_features_t   features;
netdev_features_t   hw_features;
netdev_features_t   wanted_features;
@@ -3771,13 +3765,6 @@ struct net_device 
*netdev_all_upper_get_next_dev_rcu(struct net_device *dev,
 updev; \
 updev = netdev_upper_get_next_dev_rcu(dev, &(iter)))
 
-/* iterate through upper list, must be called under RCU read lock */
-#define netdev_for_each_all_upper_dev_rcu(dev, updev, iter) \
-   for (iter = &(dev)->all_adj_list.upper, \
-updev = netdev_all_upper_get_next_dev_rcu(dev, &(iter)); \
-updev; \
-updev = netdev_all_upper_get_next_dev_rcu(dev, &(iter)))
-
 int netdev_walk_all_upper_dev_rcu(struct net_device *dev,
  int (*fn)(struct net_device *upper_dev,
void *data),
@@ -3817,18 +3804,6 @@ struct net_device *netdev_all_lower_get_next(struct 
net_device *dev,
 struct net_device *netdev_all_lower_get_next_rcu(struct net_device *dev,
 struct list_head **iter);
 
-#define netdev_for_each_all_lower_dev(dev, ldev, iter) \
-   for (iter = (dev)->all_adj_list.lower.next, \
-ldev = netdev_all_lower_get_next(dev, &(iter)); \
-ldev; \
-ldev = netdev_all_lower_get_next(dev, &(iter)))
-
-#define netdev_for_each_all_lower_dev_rcu(dev, ldev, iter) \
-   for (iter = (dev)->all_adj_list.lower.next, \
-ldev = netdev_all_lower_get_next_rcu(dev, &(iter)); \
-ldev; \
-ldev = netdev_all_lower_get_next_rcu(dev, &(iter)))
-
 int netdev_walk_all_lower_dev(struct net_device *dev,
  int (*fn)(struct net_device *lower_dev,
void *data),
diff --git a/net/core/dev.c b/net/core/dev.c
index fc48337cfab8..a9fe14908b44 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5137,6 +5137,13 @@ static struct netdev_adjacent *__netdev_find_adj(struct 
net_device *adj_dev,
return NULL;
 }
 
+static int __netdev_has_upper_dev(struct net_device *upper_dev, void *data)
+{
+   struct net_device *dev = data;
+
+   return upper_dev == dev;
+}
+
 /**
  * netdev_has_upper_dev - Check if device is linked to an upper device
  * @dev: device
@@ -5151,7 +5158,8 @@ bool netdev_has_upper_dev(struct net_device *dev,
 {
ASSERT_RTNL();
 
-   return __netdev_find_adj(upper_dev, &dev->all_adj_list.upper);
+   return netdev_walk_all_upper_dev_rcu(dev, __netdev_has_upper_dev,
+upper_dev);
 }
 EXPORT_SYMBOL(netdev_has_upper_dev);
 
@@ -5165,13 +5173,6 @@ EXPORT_SYMBOL(netdev_has_upper_dev);
  * The caller must hold rcu lock.
  */
 
-static int __netdev_has_upper_dev(struct net_device *upper_dev, void *data)
-{
-   struct net_device *dev = data;
-
-   return upper_dev == dev;
-}
-
 bool netdev_has_upper_dev_all_rcu(struct net_device *dev,
  struct net_device *upper_dev)
 {
@@ -5191,7 +5192,7 @@ static bool netdev_has_any_upper_dev(struct net_device 
*dev)
 {
ASSERT_RTNL();
 
-   return !list_empty(&dev->all_adj_list.upper);
+   return !list_empty(&dev->adj_list.upper);
 }
 
 /**
@@ -5254,32 +5255,6 @@ struct net_device *netdev_upper_get_next_dev_rcu(struct 
net_device *dev,
 }
 EXPORT_SYMBOL(netdev_upper_get_next_dev_rcu);
 
-/**
- * netdev_all_upper_get_next_dev_rcu - Get the next dev from upper list
- * @dev: device
- * @iter: list_head ** of the current position
- *
- * Gets the next device from the dev's upper list, starting from iter
- * position. The caller must hold RCU read lock.
- */
-struct net_device *netdev_all_upper_get_next_dev_rcu(struct net_device *dev,
-

[PATCH v3 net-next 00/11] net: Fix netdev adjacency tracking

2016-10-17 Thread David Ahern

The netdev adjacency tracking is failing to create proper dependencies
for some topologies. For example this topology

+--------+
|  myvrf |
+--------+
  |    |
  |  +---------+
  |  | macvlan |
  |  +---------+
  |    |
  +----------+
  |  bridge  |
  +----------+
  |
  +--------+
  | bond1  |
  +--------+
  |
  +--------+
  |  eth3  |
  +--------+

hits 1 of 2 problems depending on the order of enslavement. The base set of
commands for both cases:

ip link add bond1 type bond
ip link set bond1 up
ip link set eth3 down
ip link set eth3 master bond1
ip link set eth3 up

ip link add bridge type bridge
ip link set bridge up
ip link add macvlan link bridge type macvlan
ip link set macvlan up

ip link add myvrf type vrf table 1234
ip link set myvrf up

ip link set bridge master myvrf

Case 1: enslave macvlan to the vrf before enslaving the bond to the bridge:

ip link set macvlan master myvrf
ip link set bond1 master bridge

Attempts to delete the VRF:
ip link delete myvrf

trigger the BUG in __netdev_adjacent_dev_remove:

[  587.405260] tried to remove device eth3 from myvrf
[  587.407269] ------------[ cut here ]------------
[  587.408918] kernel BUG at /home/dsa/kernel.git/net/core/dev.c:5661!
[  587.43] invalid opcode:  [#1] SMP
[  587.412454] Modules linked in: macvlan bridge stp llc bonding vrf
[  587.414765] CPU: 0 PID: 726 Comm: ip Not tainted 4.8.0+ #109
[  587.416766] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.7.5-20140531_083030-gandalf 04/01/2014
[  587.420241] task: 88013ab6eec0 task.stack: c9628000
[  587.422163] RIP: 0010:[]  [] 
__netdev_adjacent_dev_remove+0x40/0x12c
...
[  587.446053] Call Trace:
[  587.446424]  [] __netdev_adjacent_dev_unlink+0x20/0x3c
[  587.447390]  [] netdev_upper_dev_unlink+0xfa/0x15e
[  587.448297]  [] vrf_del_slave+0x13/0x2a [vrf]
[  587.449153]  [] vrf_dev_uninit+0xea/0x114 [vrf]
[  587.450036]  [] rollback_registered_many+0x22b/0x2da
[  587.450974]  [] unregister_netdevice_many+0x17/0x48
[  587.451903]  [] rtnl_delete_link+0x3c/0x43
[  587.452719]  [] rtnl_dellink+0x180/0x194

When the BUG is converted to a WARN_ON it shows 4 missing adjacencies:
  eth3 - myvrf, myvrf - eth3, bond1 - myvrf and myvrf - bond1

All of those are because the __netdev_upper_dev_link function does not
properly link macvlan lower devices to myvrf when it is enslaved.

The second case just flips the ordering of the enslavements:
ip link set bond1 master bridge
ip link set macvlan master myvrf

Then run:
ip link delete bond1
ip link delete myvrf

The vrf delete command hangs because myvrf has a reference that has not
been released. In this case the removal code does not account for 2 paths 
between eth3 and myvrf - one from bridge to vrf and the other through the
macvlan.

Rather than try to maintain a linked list of all upper and lower devices
per netdevice, only track the direct neighbors. The remaining stack can
be determined by recursively walking the neighbors.

The existing netdev_for_each_all_upper_dev_rcu,
netdev_for_each_all_lower_dev and netdev_for_each_all_lower_dev_rcu macros
are replaced with APIs that walk the upper and lower device lists. The
new APIs take a callback function and a data arg that is passed to the
callback for each device in the list. Drivers using the old macros are
converted in separate patches to make it easier on reviewers. It is an
API conversion only; no functional change is intended.
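
To make the "two paths" problem from the second case concrete, the following
user-space sketch models the topology above with direct upper neighbours only
and counts how often a recursive walk starting at eth3 reaches myvrf. All names
here (struct dev, walk_upper(), count_cb()) are hypothetical and exist only for
this illustration; it is not the code from the series.

```c
#include <stddef.h>
#include <string.h>

#define MAX_UPPER 2

/* Hypothetical stand-in for a netdevice that tracks direct uppers only. */
struct dev {
	const char *name;
	struct dev *upper[MAX_UPPER];
	int n_upper;
};

struct count_data {
	const char *name;	/* device name to look for */
	int hits;		/* how many times the walk reached it */
};

/* Callback: count matches, never ask the walk to stop early. */
static int count_cb(struct dev *upper, void *_data)
{
	struct count_data *data = _data;

	if (strcmp(upper->name, data->name) == 0)
		data->hits++;

	return 0;
}

/* Depth-first walk over all upper devices reachable from d. */
static int walk_upper(struct dev *d,
		      int (*fn)(struct dev *upper, void *data),
		      void *data)
{
	int i, ret;

	for (i = 0; i < d->n_upper; i++) {
		ret = fn(d->upper[i], data);
		if (ret)
			return ret;

		ret = walk_upper(d->upper[i], fn, data);
		if (ret)
			return ret;
	}

	return 0;
}

/* eth3 -> bond1 -> bridge -> { myvrf, macvlan -> myvrf } */
static int paths_from_eth3_to_myvrf(void)
{
	struct dev myvrf = { "myvrf", { NULL }, 0 };
	struct dev macvlan = { "macvlan", { &myvrf }, 1 };
	struct dev bridge = { "bridge", { &myvrf, &macvlan }, 2 };
	struct dev bond1 = { "bond1", { &bridge }, 1 };
	struct dev eth3 = { "eth3", { &bond1 }, 1 };
	struct count_data data = { "myvrf", 0 };

	walk_upper(&eth3, count_cb, &data);

	return data.hits;
}
```

In this model the walk reaches myvrf twice, once through the bridge directly
and once through the macvlan, which is the situation the flattened all-device
lists did not account for.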

v3
- address Stephen's comment to simplify logic and remove typecasts

v2
- fixed bond0 references in cover-letter
- fixed definition of netdev_next_lower_dev_rcu to mirror the upper_dev
  version.

David Ahern (11):
  net: Remove refnr arg when inserting link adjacencies
  net: Introduce new api for walking upper and lower devices
  net: bonding: Flip to the new dev walk API
  IB/core: Flip to the new dev walk API
  IB/ipoib: Flip to new dev walk API
  ixgbe: Flip to the new dev walk API
  mlxsw: Flip to the new dev walk API
  rocker: Flip to the new dev walk API
  net: Remove all_adj_list and its references
  net: Add warning if any lower device is still in adjacency list
  net: dev: Improve debug statements for adjacency tracking

 drivers/infiniband/core/core_priv.h|   9 +-
 drivers/infiniband/core/roce_gid_mgmt.c|  42 +--
 drivers/infiniband/ulp/ipoib/ipoib_main.c  |  37 ++-
 drivers/net/bonding/bond_alb.c |  82 +++---
 drivers/net/bonding/bond_main.c|  17 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  | 132 ++
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c |  37 ++-
 drivers/net/ethernet/rocker/rocker_main.c  |  31 ++-
 include/linux/netdevice.h  |  38 ++-
 net/core/dev.c | 350 -
 10 files changed,

Re: [PATCH net-next 1/1] net: vlan: Use sizeof instead of literal number

2016-10-17 Thread Gao Feng

Hi David,

On Tue, Oct 18, 2016 at 9:36 AM, David Miller  wrote:
>
> It never makes sense to send the same patch for both net and net-next.
>
> If it's a bug fix, it goes to 'net'.  And it will eventually
> be naturally merged into 'net-next'.
>
> Otherwise, if it's a new feature, cleanup, or optimization it goes to
> 'net-next'.

I forgot to add "net-next" to the title of the first patch, so I
sent a second patch with the right title.
I have also replied to the first patch to explain the reason.

Regards
Feng

Re: [PATCH net-next 1/1] net: vlan: Use sizeof instead of literal number

2016-10-17 Thread David Miller


It never makes sense to send the same patch for both net and net-next.

If it's a bug fix, it goes to 'net'.  And it will eventually
be naturally merged into 'net-next'.

Otherwise, if it's a new feature, cleanup, or optimization it goes to
'net-next'.

Re: [PATCH 1/1] net: vlan: Use sizeof instead of literal number

2016-10-17 Thread Feng Gao

On Tue, Oct 18, 2016 at 8:44 AM,   wrote:
> From: Gao Feng 
>
> Use sizeof variable instead of literal number to enhance the readability.
>
> Signed-off-by: Gao Feng 
> ---
>  net/8021q/vlan.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> index 8de138d..5a3903b 100644
> --- a/net/8021q/vlan.c
> +++ b/net/8021q/vlan.c
> @@ -515,8 +515,8 @@ static int vlan_ioctl_handler(struct net *net, void 
> __user *arg)
> return -EFAULT;
>
> /* Null terminate this sucker, just in case. */
> -   args.device1[23] = 0;
> -   args.u.device2[23] = 0;
> +   args.device1[sizeof(args.device1) - 1] = 0;
> +   args.u.device2[sizeof(args.u.device2) - 1] = 0;
>
> rtnl_lock();
>
> --
> 1.9.1
>
>

Sorry, I forgot to add "net-next" to the title.
I have now sent a new patch; please ignore this conversation.

Regards
Feng

[PATCH net-next 1/1] net: vlan: Use sizeof instead of literal number

2016-10-17 Thread fgao

From: Gao Feng 

Use sizeof variable instead of literal number to enhance the readability.

Signed-off-by: Gao Feng 
---
 net/8021q/vlan.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 8de138d..5a3903b 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -515,8 +515,8 @@ static int vlan_ioctl_handler(struct net *net, void __user 
*arg)
return -EFAULT;
 
/* Null terminate this sucker, just in case. */
-   args.device1[23] = 0;
-   args.u.device2[23] = 0;
+   args.device1[sizeof(args.device1) - 1] = 0;
+   args.u.device2[sizeof(args.u.device2) - 1] = 0;
 
rtnl_lock();
 
-- 
1.9.1
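
The effect of the change can be tried in user space with the sketch below.
struct demo_args is a hypothetical stand-in for the ioctl argument: only the
field names echo the patch, and the 24-byte sizes are an assumption inferred
from the old literal index 23.

```c
#include <string.h>

/* Hypothetical stand-in for the ioctl argument structure. */
struct demo_args {
	char device1[24];
	union {
		char device2[24];
		int vid;
	} u;
};

/* Force termination using the array's own size, so the index keeps
 * tracking the declaration if the size ever changes.
 */
static void terminate_names(struct demo_args *args)
{
	args->device1[sizeof(args->device1) - 1] = 0;
	args->u.device2[sizeof(args->u.device2) - 1] = 0;
}

/* Fill the struct with non-zero bytes, terminate, and measure. */
static size_t demo_terminated_len(void)
{
	struct demo_args args;

	memset(&args, 'x', sizeof(args));	/* simulate unterminated input */
	terminate_names(&args);

	return strlen(args.device1);
}
```

Note that sizeof only gives the element count here because the fields are char
arrays; for other element types the usual idiom divides by the element size.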

[PATCH 1/1] net: vlan: Use sizeof instead of literal number

2016-10-17 Thread fgao

From: Gao Feng 

Use sizeof variable instead of literal number to enhance the readability.

Signed-off-by: Gao Feng 
---
 net/8021q/vlan.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 8de138d..5a3903b 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -515,8 +515,8 @@ static int vlan_ioctl_handler(struct net *net, void __user 
*arg)
return -EFAULT;
 
/* Null terminate this sucker, just in case. */
-   args.device1[23] = 0;
-   args.u.device2[23] = 0;
+   args.device1[sizeof(args.device1) - 1] = 0;
+   args.u.device2[sizeof(args.u.device2) - 1] = 0;
 
rtnl_lock();
 
-- 
1.9.1

Re: [PATCH net-next 02/11] net: Introduce new api for walking upper and lower devices

2016-10-17 Thread David Ahern

On 10/17/16 6:21 AM, Stephen Hemminger wrote:
> 
> No if/else needed. No cast of void * ptr need. Use const if possible?
> 

So much of the stack does not use const that trying to add it for this API does 
not work: the upper or lower device is passed to the callbacks, and those 
callbacks invoke other APIs. For example, the bond patch calls vlan_get_encap_level, 
bond_verify_device_path and bond_confirm_addr, and none of those accept a const 
dev.

v3 coming up with the more succinct versions, but const is not possible.

Re: [patch net-next RFC 4/6] Introduce sample tc action

2016-10-17 Thread Roopa Prabhu

On 10/17/16, 3:10 AM, Jamal Hadi Salim wrote:
>
> Some comments:
> IIUC, the main struggle seems to be whether the redirect to dummy0
> is useful or not? i.e instead of just letting the packets go up the
> stack on eth1?

Yep, correct, given that the existing workflow for the non-offloaded case is
to receive sample packets via a bpf filter on a socket, or to
use netlink as a sample delivery mechanism (e.g., NFLOG).


> It seems like sflowd needs to read off eth1 via packet socket?
> To be backward compatible - supporting that approach seems sensible.
>
> Note:
> There is a clear efficiency benefit of both using IFE encoding and
> redirecting to dummy0.
> 1) Redirecting to dummy0 implies you dont need to exercise a bpf
> filter around every packet that comes off eth1.
> I understand there are probably not millions of pps for this case;
> but in a non-offloaded cases it could be millions pps.
> And in case of sampling over many ethx devices, you can redirect
> samples from many other ethx devices.
> So making dummy0 the sflow device is a win.
> 2) Encaping an IFE header implies a much more efficient bpf filter
> (IFE ethertype is an excellent discriminator for bpf).
>
> Additional benefit is as mentioned before - redirecting to a device
> means you can send it remotely over ethernet to a more powerful
> machine without having to cross kernel-userspace. Redirecting instead
> of mirroring to tuntap is also an interesting option.

Sure, this seems like a good option to have.
Generally you have one instance of the sampling agent on a hypervisor or 
switch.
But if you have use cases where monitoring agents run externally, sure.
I would have preferred it to be optional or an add-on and not the default.

Regarding the device, yeah, agreed, there are pros and cons.
An additional device just to sample packets seems like overkill.
But if there is no other option, and there are benefits to it, no 
objections.
Hopefully we can add another option to the existing api to skip the device in 
the future.


>
>
> On 16-10-15 12:34 PM, Roopa Prabhu wrote:
>> On 10/12/16, 5:41 AM, Jiri Pirko wrote:
>>> From: Yotam Gigi 
>
>
>>> +
>>> +struct sample_packet_metadata {
>>> +int sample_size;
>>> +int orig_size;
>>> +int ifindex;
>>> +};
>>> +
>> This metadata does not look extensible.. can it be made to ?
>>
>
> Sure it can...
>
>> With sflow in context, you need a pair of ifindex numbers to encode ingress 
>> and egress ports.
>
> What is the use case for both?

I have heard that most monitoring tools have moved to ingress-only sampling 
because of operational
complexity (use case is sflow). I think hardware also supports ingress-only and 
egress-only sampling.
It would be better to have an option to reflect that in the api.

>> Ideally you would also include a sequence number and a count of the total 
>> number of packets
>> that were candidates for sampling.
>
> Sequence number may make sense (they will help show a gap if something
> gets dropped). But i am not sure about the stats consuming such space.
> Stats are something that can be queried (tc stats should have a record
> of how many bytes/packets )

Sure, that's fine.
>
>> The OVS implementation is a good example, the metadata includes all the 
>> actions applied
>> to the packet in the kernel data path.
>>
>
> Again not sure what the use case would be (and why waste such space
> especially when you are sending over the wire with such details).

All of this is in use currently. But this can come from other APIs that sflow uses
for monitoring.
http://openvswitch.org/support/ovscon2014/17/1400-ovs-sflow.pdf

Does not have to be part of the main/basic sampling api...
it was just an example.

>
>>> +rcu_read_lock();
>>> +retval = READ_ONCE(s->tcf_action);
>>> +
>>> +if (++s->packet_counter % s->rate == 0) {
>>
>> The sampling function isn’t random
>>
>> if (++s->packet_counter % s->rate == 0) {
>>
>> This is unsuitable for sFlow, which is specific about the random sampling 
>> function required.
>> BPF, OVS, and the
>> ULOG statistics module include efficient kernel based random sampling 
>> functions that could be used instead.
>>
>
> If i understood correctly, the above is a fallback sampling algorithm.
> In the case of the spectrum it already does the sampling in the ASIC
> so there is no need to repeat it in software.
> Agreed that in that case the sampling approach is not sufficiently
> random.

Yes. And since the same sampling api will be used for the offloaded and 
non-offloaded cases,
the sampling algorithm here for the non-offloaded case can do better; it should at least 
match the efficiency of the existing
api. We would want people to use the same api for the offloaded and 
non-offloaded cases.

thanks,
Roopa
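
The difference between the counter-based check quoted above and a random
1-in-N sampler can be sketched in user space as follows. This is only an
illustration under stated assumptions: the xorshift32 generator is used here
purely to keep the example self-contained, and none of the function names come
from the patch or from any kernel API.

```c
#include <stdint.h>

/* Deterministic 1-in-rate: picks exactly every rate-th packet, like the
 * quoted "++s->packet_counter % s->rate == 0" check. rate must be non-zero.
 */
static int sample_every_nth(uint32_t *counter, uint32_t rate)
{
	return ++(*counter) % rate == 0;
}

/* Small xorshift32 PRNG; the state must be seeded with a non-zero value. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;

	return x;
}

/* Random 1-in-rate on average: each packet is chosen independently, so
 * the gaps between samples vary. rate must be non-zero.
 */
static int sample_random(uint32_t *state, uint32_t rate)
{
	return xorshift32(state) % rate == 0;
}

/* Count how many of the first 'packets' packets the counter scheme picks. */
static int count_every_nth(uint32_t packets, uint32_t rate)
{
	uint32_t counter = 0;
	uint32_t i;
	int n = 0;

	for (i = 0; i < packets; i++)
		n += sample_every_nth(&counter, rate);

	return n;
}
```

With the counter scheme, traffic that happens to be periodic with the same
period as the rate is either always or never sampled; the random variant
avoids that, at the cost of needing a per-packet random number.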

Re: [Intel-wired-lan] [PATCH V3 RFC 2/2] ixgbe: ixgbe_atr() compute l4_proto only if non-paged data has network/transport headers

2016-10-17 Thread Jeff Kirsher

On Mon, 2016-10-17 at 18:47 -0400, Sowmini Varadhan wrote:
> On (10/17/16 15:37), Jeff Kirsher wrote:
> > > Reviewed-by: Alexander Duyck 
> > 
> > Sowmini, can you re-submit this to intel-wired-lan but without the RFC
> > in the title?
> 
> V4 resubmitted.. I think I just inadvertently forgot to add Alex as the
> reviewed-by.. could you please fix that (or I can resubmit v5 if needed).

No need to resubmit, I can make sure Alex's reviewed-by gets added.


Re: [Intel-wired-lan] [PATCH V3 RFC 2/2] ixgbe: ixgbe_atr() compute l4_proto only if non-paged data has network/transport headers

2016-10-17 Thread Sowmini Varadhan

On (10/17/16 15:37), Jeff Kirsher wrote:
> > Reviewed-by: Alexander Duyck 
> 
> Sowmini, can you re-submit this to intel-wired-lan but without the RFC in
> the title?

V4 resubmitted.. I think I just inadvertently forgot to add Alex as the
reviewed-by.. could you please fix that (or I can resubmit v5 if needed).

--Sowmini

Re: [Intel-wired-lan] [PATCH V3 RFC 2/2] ixgbe: ixgbe_atr() compute l4_proto only if non-paged data has network/transport headers

2016-10-17 Thread Jeff Kirsher

On Mon, 2016-10-17 at 15:29 -0700, Alexander Duyck wrote:
> On Mon, Oct 17, 2016 at 2:12 PM, Sowmini Varadhan
>  wrote:
> > 
> > For some Tx paths (e.g., tpacket_snd()), ixgbe_atr may be
> > passed down an sk_buff that has the network and transport
> > header in the paged data, so it needs to make sure these
> > headers are available in the headlen bytes to calculate the
> > l4_proto.
> > 
> > This patch expects that network and transport headers are
> > already available in the non-paged header data.  The assumption
> > is that the caller has set this up if l4_proto based Tx
> > steering is desired.
> > 
> > Signed-off-by: Sowmini Varadhan 
> 
> This all looks correct to me.  I would recommend having Jeff pull it
> in to be submitted to the net queue.
> 
> Reviewed-by: Alexander Duyck 

Sowmini, can you re-submit this to intel-wired-lan but without the RFC in
the title?


Re: [Intel-wired-lan] [PATCH V3 RFC 2/2] ixgbe: ixgbe_atr() compute l4_proto only if non-paged data has network/transport headers

2016-10-17 Thread Alexander Duyck

On Mon, Oct 17, 2016 at 2:12 PM, Sowmini Varadhan
 wrote:
> For some Tx paths (e.g., tpacket_snd()), ixgbe_atr may be
> passed down an sk_buff that has the network and transport
> header in the paged data, so it needs to make sure these
> headers are available in the headlen bytes to calculate the
> l4_proto.
>
> This patch expects that network and transport headers are
> already available in the non-paged header data.  The assumption
> is that the caller has set this up if l4_proto based Tx
> steering is desired.
>
> Signed-off-by: Sowmini Varadhan 

This all looks correct to me.  I would recommend having Jeff pull it
in to be submitted to the net queue.

Reviewed-by: Alexander Duyck

[PATCH 28/28] Kbuild: bring back -Wmaybe-uninitialized warning

2016-10-17 Thread Arnd Bergmann

Traditionally, we have always had warnings about uninitialized variables
enabled, as this is part of -Wall, and generally a good idea [1], but it
also always produced false positives, mainly because this is a variation
of the halting problem and provably impossible to get right in all cases
[2].
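
For readers who have not run into the warning, the pattern it is aimed at
can be sketched in a few lines of userspace C. This is only an illustration:
lookup() and use_lookup() are made-up names, not kernel functions. Whether a
particular gcc version warns about 'val' here may depend on inlining and
optimization level, which is the kind of variability described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: writes *out only when it returns true. */
static bool lookup(int key, int *out)
{
	if (key < 0)
		return false;
	*out = key * 2;
	return true;
}

/*
 * 'val' has no initializer.  It is only read after lookup() reported
 * success, so the code is correct, but the compiler has to prove that
 * by following the return value through the call.
 */
int use_lookup(int key)
{
	int val;

	if (!lookup(key, &val))
		return -1;
	return val;
}
```

The code is safe as written; the open question is only whether the compiler
can see the link between the return value and the store.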

Various people have identified cases that are particularly bad for false
positives, and in commit e74fc973b6e5 ("Turn off -Wmaybe-uninitialized
when building with -Os"), I turned off the warning for any build that
was done with CC_OPTIMIZE_FOR_SIZE.  This drastically reduced the number
of false positive warnings in the default build but unfortunately had
the side effect of turning the warning off completely in 'allmodconfig'
builds, which in turn caused a lot of warnings (both actual bugs and
remaining false positives) to go in unnoticed.

Commit 877417e6ffb9 ("Kbuild: change CC_OPTIMIZE_FOR_SIZE
definition") enabled the warning again for allmodconfig builds in v4.7,
and in v4.8-rc1 I had finally managed to address all warnings I get in
an ARM allmodconfig build and most other maybe-uninitialized warnings
for ARM randconfig builds.

However, commit 6e8d666e9253 ("Disable "maybe-uninitialized" warning
globally") was merged at the same time and disabled it completely for
all configurations, because of false-positive warnings on x86 that
I had not addressed until then. This caused a lot of actual bugs to
get merged into mainline, and I sent several dozen patches for these
during the v4.9 development cycle. Most of these are actual bugs,
some are for correct code that is safe because it is only called
under external constraints that make it impossible to run into
the case that gcc sees, and in a few cases gcc is just stupid and
finds something that can obviously never happen.

I have now done a few thousand randconfig builds on x86 and collected
all patches that I needed to address every single warning I got
(I can provide the combined patch for the other warnings if anyone
is interested), so I hope we can get the warning back and let people
catch the actual bugs earlier.

Note that the majority of the patches I created are for the third kind
of problem (stupid false-positives), for one of two reasons:
- some of them only get triggered in certain combinations of config
  options, so we don't always run into them, and
- the actual bugs tend to get addressed much quicker as they also
  lead to incorrect runtime behavior.

These 27 patches address the warnings that either occur in one of the more
common configurations (defconfig, allmodconfig, or something built by the
kbuild robot or kernelci.org), or they are about a real bug. It would be
good to get these all into v4.9 if we want to turn on the warning again.
I have tested these extensively with gcc-4.9 and gcc-6 and done a bit
of testing with gcc-5, and all of these should now be fine. gcc-4.8
is much worse about the false-positive warnings and is also fairly old
now, so I'm leaving the warning disabled with that version. gcc-4.7 and
older don't understand the -Wno-maybe-uninitialized option and are not
affected by this patch either way.

I have another (smaller) series of patches for warnings that are both
harmless and not as easy to trigger, and I will send them for inclusion
in v4.10.

Link: https://rusty.ozlabs.org/?p=232 [1]
Link: https://gcc.gnu.org/wiki/Better_Uninitialized_Warnings [2]
Signed-off-by: Arnd Bergmann 
---
 Makefile   | 10 ++
 arch/arc/Makefile  |  4 +++-
 scripts/Makefile.ubsan |  4 
 3 files changed, 13 insertions(+), 5 deletions(-)

Cc: x...@kernel.org
Cc: linux-me...@vger.kernel.org
Cc: Mauro Carvalho Chehab 
Cc: Martin Schwidefsky 
Cc: linux-s...@vger.kernel.org
Cc: Ilya Dryomov 
Cc: dri-de...@lists.freedesktop.org
Cc: linux-...@lists.infradead.org
Cc: Herbert Xu 
Cc: linux-cry...@vger.kernel.org
Cc: "David S. Miller" 
Cc: netdev@vger.kernel.org
Cc: Greg Kroah-Hartman 
Cc: ceph-de...@vger.kernel.org
Cc: linux-f2fs-de...@lists.sourceforge.net
Cc: linux-e...@vger.kernel.org
Cc: netfilter-de...@vger.kernel.org

diff --git a/Makefile b/Makefile
index 512e47a..43cd3d9 100644
--- a/Makefile
+++ b/Makefile
@@ -370,7 +370,7 @@ LDFLAGS_MODULE  =
 CFLAGS_KERNEL  =
 AFLAGS_KERNEL  =
 LDFLAGS_vmlinux =
-CFLAGS_GCOV= -fprofile-arcs -ftest-coverage -fno-tree-loop-im
+CFLAGS_GCOV= -fprofile-arcs -ftest-coverage -fno-tree-loop-im -Wno-maybe-uninitialized
 CFLAGS_KCOV:= $(call cc-option,-fsanitize-coverage=trace-pc,)
 
 
@@ -620,7 +620,6 @@ ARCH_CFLAGS :=
 include arch/$(SRCARCH)/Makefile
 
 KBUILD_CFLAGS  += $(call cc-option,-fno-delete-null-pointer-checks,)
-KBUILD_CFLAGS  += $(call cc-disable-warning,maybe-uninitialized,)
 KBUILD_CFLAGS  += $(call cc-disable-warning,frame-address,)
 
 ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
@@ -629,15 +628,18 @@

[PATCH 27/28] rocker: fix maybe-uninitialized warning

2016-10-17 Thread Arnd Bergmann

In some rare configurations, we get a warning about the 'index' variable
being used without an initialization:

drivers/net/ethernet/rocker/rocker_ofdpa.c: In function ‘ofdpa_port_fib_ipv4.isra.16.constprop’:
drivers/net/ethernet/rocker/rocker_ofdpa.c:2425:92: warning: ‘index’ may be used uninitialized in this function [-Wmaybe-uninitialized]

This is a false positive: the logic is just a bit too complex for gcc
to follow here. Moving the initialization of 'index' a little further
down makes it clear to gcc that the function always returns an error
if it is not initialized.
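
A simplified userspace sketch of the resulting control flow, under the
assumption that the 'adding' case assigns a fresh index (that part is not
visible in the hunk below). struct neigh_entry and nh_resolve_index() are
hypothetical stand-ins, not the rocker code:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a neighbour table entry. */
struct neigh_entry {
	int index;
};

/*
 * Every branch either stores to *index or returns an error, so a caller
 * that checks the return value never reads an unset index.
 */
int nh_resolve_index(const struct neigh_entry *found, bool adding,
		     int next_index, int *index)
{
	bool updating = found && adding;
	bool removing = found && !adding;

	adding = !found && adding;

	if (adding) {
		*index = next_index;	/* assumed: new entry gets a new index */
	} else if (removing) {
		*index = found->index;
	} else if (updating) {
		*index = found->index;
	} else {
		return -ENOENT;		/* *index intentionally left untouched */
	}

	return 0;
}
```

With the assignment placed next to each successful branch, the "set or
error" property is visible locally instead of depending on an earlier
'if (found)'.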

Signed-off-by: Arnd Bergmann 
---
 drivers/net/ethernet/rocker/rocker_ofdpa.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker_ofdpa.c b/drivers/net/ethernet/rocker/rocker_ofdpa.c
index 431a608..4ca4613 100644
--- a/drivers/net/ethernet/rocker/rocker_ofdpa.c
+++ b/drivers/net/ethernet/rocker/rocker_ofdpa.c
@@ -1493,8 +1493,6 @@ static int ofdpa_port_ipv4_nh(struct ofdpa_port *ofdpa_port,
spin_lock_irqsave(&ofdpa->neigh_tbl_lock, lock_flags);
 
found = ofdpa_neigh_tbl_find(ofdpa, ip_addr);
-   if (found)
-   *index = found->index;
 
updating = found && adding;
removing = found && !adding;
@@ -1508,9 +1506,11 @@ static int ofdpa_port_ipv4_nh(struct ofdpa_port *ofdpa_port,
resolved = false;
} else if (removing) {
ofdpa_neigh_del(trans, found);
+   *index = found->index;
} else if (updating) {
ofdpa_neigh_update(found, trans, NULL, false);
resolved = !is_zero_ether_addr(found->eth_dst);
+   *index = found->index;
} else {
err = -ENOENT;
}
-- 
2.9.0

[PATCH 21/28] net/hyperv: avoid uninitialized variable

2016-10-17 Thread Arnd Bergmann

The hdr_offset variable is only initialized if we deal with a TCP or UDP packet,
but as the check surrounding its usage tests for skb_is_gso()
instead, the compiler has no idea if the variable is initialized
or not at that point:

drivers/net/hyperv/netvsc_drv.c: In function ‘netvsc_start_xmit’:
drivers/net/hyperv/netvsc_drv.c:494:42: error: ‘hdr_offset’ may be used uninitialized in this function [-Werror=maybe-uninitialized]

This adds an additional check for the transport type, which
tells the compiler that this path cannot happen. Since the
get_net_transport_info() function should always be inlined
here, I don't expect this to result in additional runtime
checks.
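
The idea can be sketched in userspace as follows. The flag values, the
offset of 34 and the helper names are illustrative assumptions, not the
netvsc definitions:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values, chosen for this sketch only. */
#define INFO_TCP 0x1u
#define INFO_UDP 0x2u

/* Hypothetical helper: sets *hdr_offset only for TCP (6) and UDP (17). */
static unsigned int transport_info(int l4_proto, unsigned int *hdr_offset)
{
	switch (l4_proto) {
	case 6:
		*hdr_offset = 34;
		return INFO_TCP;
	case 17:
		*hdr_offset = 34;
		return INFO_UDP;
	default:
		return 0;	/* *hdr_offset is left unset */
	}
}

/*
 * Testing the same flags that guard the store lets the compiler tie the
 * read of 'hdr_offset' to a path on which it was written.
 */
unsigned int lso_hdr_offset(int l4_proto, bool is_gso)
{
	unsigned int hdr_offset;
	unsigned int info = transport_info(l4_proto, &hdr_offset);

	if ((info & (INFO_TCP | INFO_UDP)) && is_gso)
		return hdr_offset;

	return 0;
}
```

Checking only is_gso would be equally correct at runtime if GSO packets are
always TCP or UDP, but that fact is not visible to the compiler.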

Signed-off-by: Arnd Bergmann 
---
 drivers/net/hyperv/netvsc_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index f0919bd..5d6e75a 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -447,7 +447,7 @@ static int netvsc_start_xmit(struct sk_buff *skb, struct net_device *net)
 * Setup the sendside checksum offload only if this is not a
 * GSO packet.
 */
-   if (skb_is_gso(skb)) {
+   if ((net_trans_info & (INFO_TCP | INFO_UDP)) && skb_is_gso(skb)) {
struct ndis_tcp_lso_info *lso_info;
 
rndis_msg_size += NDIS_LSO_PPI_SIZE;
-- 
2.9.0

[PATCH 20/28] net: bcm63xx: avoid referencing uninitialized variable

2016-10-17 Thread Arnd Bergmann

gcc found a reference to an uninitialized variable in the error handling
of bcm_enet_open, introduced by a recent cleanup:

drivers/net/ethernet/broadcom/bcm63xx_enet.c: In function 'bcm_enet_open'
drivers/net/ethernet/broadcom/bcm63xx_enet.c:1129:2: warning: 'phydev' may be used uninitialized in this function [-Wmaybe-uninitialized]

This makes the use of that variable conditional, so we only reference it
here after it has been assigned. Unlike my normal patches, I have not
build-tested this one, as I don't currently have a MIPS test in my
randconfig setup.
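
A reduced userspace sketch of the error path, assuming the pointer is only
assigned when the device has a PHY. struct enet_priv, phy_disconnect_stub()
and enet_open() are hypothetical names for this illustration:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical private data: only some devices have a PHY attached. */
struct enet_priv {
	bool has_phy;
};

static int disconnect_calls;	/* counts cleanup calls, for illustration */

static void phy_disconnect_stub(int *phydev)
{
	(void)phydev;
	disconnect_calls++;
}

/* 'phydev' is assigned only when has_phy is set, so cleanup mirrors that. */
int enet_open(const struct enet_priv *priv, bool fail_late)
{
	static int dummy_phy;
	int *phydev;
	int ret = 0;

	if (priv->has_phy)
		phydev = &dummy_phy;

	if (fail_late) {
		ret = -EIO;
		goto out_phy_disconnect;
	}

	return 0;

out_phy_disconnect:
	if (priv->has_phy)
		phy_disconnect_stub(phydev);

	return ret;
}
```

The guard on the cleanup path uses the same condition as the assignment,
which keeps the two in step.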

Fixes: 625eb8667d6f ("net: ethernet: broadcom: bcm63xx: use phydev from struct net_device")
Cc: Philippe Reynes 
Reported-by: kbuild test robot 
Signed-off-by: Arnd Bergmann 
---
 drivers/net/ethernet/broadcom/bcm63xx_enet.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bcm63xx_enet.c b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
index ae364c7..5370909 100644
--- a/drivers/net/ethernet/broadcom/bcm63xx_enet.c
+++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
@@ -1126,7 +1126,8 @@ static int bcm_enet_open(struct net_device *dev)
free_irq(dev->irq, dev);
 
 out_phy_disconnect:
-   phy_disconnect(phydev);
+   if (priv->has_phy)
+   phy_disconnect(phydev);
 
return ret;
 }
-- 
2.9.0

[PATCH 19/28] brcmfmac: avoid maybe-uninitialized warning in brcmf_cfg80211_start_ap

2016-10-17 Thread Arnd Bergmann

A bugfix added a sanity check around the assignment and use of the
'is_11d' variable, which looks correct to me, but as the function is
rather complex already, this confuses the compiler to the point where
it can no longer figure out if the variable is always initialized
correctly:

brcm80211/brcmfmac/cfg80211.c: In function ‘brcmf_cfg80211_start_ap’:
brcm80211/brcmfmac/cfg80211.c:4586:10: error: ‘is_11d’ may be used uninitialized in this function [-Werror=maybe-uninitialized]

This adds an initialization for the newly introduced case in which
the variable should not really be used, in order to make the warning
go away.

Fixes: b3589dfe0212 ("brcmfmac: ignore 11d configuration errors")
Cc: Hante Meuleman 
Cc: Arend van Spriel 
Cc: Kalle Valo 
Signed-off-by: Arnd Bergmann 
---
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
index b777e1b..78d9966 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
@@ -4516,7 +4516,7 @@ brcmf_cfg80211_start_ap(struct wiphy *wiphy, struct net_device *ndev,
/* store current 11d setting */
if (brcmf_fil_cmd_int_get(ifp, BRCMF_C_GET_REGULATORY,
  &ifp->vif->is_11d)) {
-   supports_11d = false;
+   is_11d = supports_11d = false;
} else {
country_ie = brcmf_parse_tlvs((u8 *)settings->beacon.tail,
  settings->beacon.tail_len,
-- 
2.9.0

[PATCH 01/28] [v2] netfilter: nf_tables: avoid uninitialized variable warning

2016-10-17 Thread Arnd Bergmann

The newly added nft_range_eval() function handles the two possible
nft range operations, but as the compiler warning points out,
any unexpected value would lead to the 'mismatch' variable being
used without being initialized:

net/netfilter/nft_range.c: In function 'nft_range_eval':
net/netfilter/nft_range.c:45:5: error: 'mismatch' may be used uninitialized in this function [-Werror=maybe-uninitialized]

This removes the variable in question and instead moves the
condition into the switch itself, which is potentially more
efficient than adding a bogus 'default' clause as in my
first approach, and is nicer than using the 'uninitialized_var'
macro.
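
A self-contained sketch of the same shape, for illustration only. The enum,
the verdict codes and range_eval() are simplified stand-ins for the nft
types, and the register data is replaced by a plain buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the nft range operations and verdicts. */
enum range_op { RANGE_EQ, RANGE_NEQ };
enum { VERDICT_CONTINUE = 0, VERDICT_BREAK = 1 };

struct verdict {
	int code;
};

/*
 * The condition lives inside each case, so there is no temporary that an
 * unexpected 'op' value could leave unset.
 */
void range_eval(enum range_op op, const void *data, const void *from,
		const void *to, size_t len, struct verdict *v)
{
	int d1 = memcmp(data, from, len);
	int d2 = memcmp(data, to, len);

	switch (op) {
	case RANGE_EQ:
		if (d1 < 0 || d2 > 0)
			v->code = VERDICT_BREAK;
		break;
	case RANGE_NEQ:
		if (d1 >= 0 && d2 <= 0)
			v->code = VERDICT_BREAK;
		break;
	}
}
```

An unknown op simply leaves the verdict as it was, which matches what the
removed 'mismatch' variable could not express without an initializer.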

Fixes: 0f3cd9b36977 ("netfilter: nf_tables: add range expression")
Link: http://patchwork.ozlabs.org/patch/677114/
Signed-off-by: Arnd Bergmann 
---
 net/netfilter/nft_range.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

Cc: Pablo Neira Ayuso 

diff --git a/net/netfilter/nft_range.c b/net/netfilter/nft_range.c
index c6d5358..2dd80f4 100644
--- a/net/netfilter/nft_range.c
+++ b/net/netfilter/nft_range.c
@@ -28,22 +28,20 @@ static void nft_range_eval(const struct nft_expr *expr,
 const struct nft_pktinfo *pkt)
 {
const struct nft_range_expr *priv = nft_expr_priv(expr);
-   bool mismatch;
int d1, d2;
 
d1 = memcmp(&regs->data[priv->sreg], &priv->data_from, priv->len);
d2 = memcmp(&regs->data[priv->sreg], &priv->data_to, priv->len);
switch (priv->op) {
case NFT_RANGE_EQ:
-   mismatch = (d1 < 0 || d2 > 0);
+   if (d1 < 0 || d2 > 0)
+   regs->verdict.code = NFT_BREAK;
break;
case NFT_RANGE_NEQ:
-   mismatch = (d1 >= 0 && d2 <= 0);
+   if (d1 >= 0 && d2 <= 0)
+   regs->verdict.code = NFT_BREAK;
break;
}
-
-   if (mismatch)
-   regs->verdict.code = NFT_BREAK;
 }
 
 static const struct nla_policy nft_range_policy[NFTA_RANGE_MAX + 1] = {
-- 
2.9.0

[PATCH 00/28] Reenable maybe-uninitialized warnings

2016-10-17 Thread Arnd Bergmann

This is a set of patches that I hope to get into v4.9 in some form
in order to turn on the -Wmaybe-uninitialized warnings again.

After talking to Linus in person at Linaro Connect about this, I
spent some time on finding all the remaining warnings, and this
is the resulting patch series. More details are in the description
of the last patch that actually enables the warning.

Let me know if there are other warnings that I missed, and whether
you think these are still appropriate for v4.9 or not.
A couple of patches are non-obvious, and could use some more
detailed review.

Arnd

Arnd Bergmann (28):
  [v2] netfilter: nf_tables: avoid uninitialized variable warning
  [v2] mtd: mtk: avoid warning in mtk_ecc_encode
  [v2] infiniband: shut up a maybe-uninitialized warning
  f2fs: replace a build-time warning with runtime WARN_ON
  ext2: avoid bogus -Wmaybe-uninitialized warning
  NFSv4.1: work around -Wmaybe-uninitialized warning
  ceph: avoid false positive maybe-uninitialized warning
  staging: lustre: restore initialization of return code
  staging: lustre: remove broken dead code in
cfs_cpt_table_create_pattern
  UBI: fix uninitialized access of vid_hdr pointer
  block: rdb: false-postive gcc-4.9 -Wmaybe-uninitialized
  [media] rc: print correct variable for z8f0811
  [media] dib0700: fix uninitialized data on 'repeat' event
  iio: accel: sca3000_core: avoid potentially uninitialized variable
  crypto: aesni: avoid -Wmaybe-uninitialized warning
  pcmcia: fix return value of soc_pcmcia_regulator_set
  spi: fsl-espi: avoid processing uninitalized data on error
  drm: avoid uninitialized timestamp use in wait_vblank
  brcmfmac: avoid maybe-uninitialized warning in brcmf_cfg80211_start_ap
  net: bcm63xx: avoid referencing uninitialized variable
  net/hyperv: avoid uninitialized variable
  x86: apm: avoid uninitialized data
  x86: mark target address as output in 'insb' asm
  x86: math-emu: possible uninitialized variable use
  s390: pci: don't print uninitialized data for debugging
  nios2: fix timer initcall return value
  rocker: fix maybe-uninitialized warning
  Kbuild: bring back -Wmaybe-uninitialized warning

 Makefile   |  10 +-
 arch/arc/Makefile  |   4 +-
 arch/nios2/kernel/time.c   |   1 +
 arch/s390/pci/pci_dma.c|   2 +-
 arch/x86/crypto/aesni-intel_glue.c | 121 +
 arch/x86/include/asm/io.h  |   4 +-
 arch/x86/kernel/apm_32.c   |   5 +-
 arch/x86/math-emu/Makefile |   4 +-
 arch/x86/math-emu/reg_compare.c|  16 +--
 drivers/block/rbd.c|   1 +
 drivers/gpu/drm/drm_irq.c  |   4 +-
 drivers/infiniband/core/cma.c  |  56 +-
 drivers/media/i2c/ir-kbd-i2c.c |   2 +-
 drivers/media/usb/dvb-usb/dib0700_core.c   |  10 +-
 drivers/mtd/nand/mtk_ecc.c |  19 ++--
 drivers/mtd/ubi/eba.c  |   2 +-
 drivers/net/ethernet/broadcom/bcm63xx_enet.c   |   3 +-
 drivers/net/ethernet/rocker/rocker_ofdpa.c |   4 +-
 drivers/net/hyperv/netvsc_drv.c|   2 +-
 .../broadcom/brcm80211/brcmfmac/cfg80211.c |   2 +-
 drivers/pcmcia/soc_common.c|   2 +-
 drivers/spi/spi-fsl-espi.c |   2 +-
 drivers/staging/iio/accel/sca3000_core.c   |   2 +
 .../staging/lustre/lnet/libcfs/linux/linux-cpu.c   |   7 --
 drivers/staging/lustre/lustre/lov/lov_pack.c   |   2 +
 fs/ceph/super.c|   3 +-
 fs/ext2/inode.c|   7 +-
 fs/f2fs/data.c |   7 ++
 fs/nfs/nfs4session.c   |  10 +-
 net/netfilter/nft_range.c  |  10 +-
 scripts/Makefile.ubsan |   4 +
 31 files changed, 187 insertions(+), 141 deletions(-)

-- 
Cc: x...@kernel.org
Cc: linux-me...@vger.kernel.org
Cc: Mauro Carvalho Chehab 
Cc: Martin Schwidefsky 
Cc: linux-s...@vger.kernel.org
Cc: Ilya Dryomov 
Cc: dri-de...@lists.freedesktop.org
Cc: linux-...@lists.infradead.org
Cc: Herbert Xu 
Cc: linux-cry...@vger.kernel.org
Cc: "David S. Miller" 
Cc: netdev@vger.kernel.org
Cc: Greg Kroah-Hartman 
Cc: ceph-de...@vger.kernel.org
Cc: linux-f2fs-de...@lists.sourceforge.net
Cc: linux-e...@vger.kernel.org
Cc: netfilter-de...@vger.kernel.org
2.9.0

[PATCH net] soreuseport: do not export reuseport_add_sock()

2016-10-17 Thread Eric Dumazet

From: Eric Dumazet 

reuseport_add_sock() is not used from a module,
no need to export it.

Signed-off-by: Eric Dumazet 
---
 net/core/sock_reuseport.c |1 -
 1 file changed, 1 deletion(-)

diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
index e92b759d906c..9a1a352fd1eb 100644
--- a/net/core/sock_reuseport.c
+++ b/net/core/sock_reuseport.c
@@ -129,7 +129,6 @@ int reuseport_add_sock(struct sock *sk, struct sock *sk2)
 
return 0;
 }
-EXPORT_SYMBOL(reuseport_add_sock);
 
 static void reuseport_free_rcu(struct rcu_head *head)
 {

Re: [Intel-wired-lan] [PATCH V2 RFC 2/2] ixgbe: ixgbe_atr() compute l4_proto only if non-paged data has network/transport headers

2016-10-17 Thread Sowmini Varadhan

On (10/17/16 12:49), Alexander Duyck wrote:
> >> > /* Currently only IPv4/IPv6 with TCP is supported */
> >> > switch (hdr.ipv4->version) {
> >> > case IPVERSION:
> >> > /* access ihl as u8 to avoid unaligned access on ia64 */
> >> > hlen = (hdr.network[0] & 0x0F) << 2;
> >> > +   if (skb_tail_pointer(skb) < hdr.network + hlen +
> >> > +   sizeof(struct tcphdr))
> >> > +   return;
> >> > l4_proto = hdr.ipv4->protocol;
> >> > break;
> >> > case 6:
> >> > hlen = hdr.network - skb->data;
> >> > +   if (skb_tail_pointer(skb) < hdr.network + hlen +
> >> > +   sizeof(struct tcphdr))
> >> > +   return;
> >> > l4_proto = ipv6_find_hdr(skb, &hlen, IPPROTO_TCP, NULL, NULL);
> >> > hlen -= hdr.network - skb->data;
> >> > break;
   :
> >> So you probably need to add a check for "skb_tail_pointer(skb) <
> >> (hdr.network + hlen + 20)".
> >
> > But isn't that the same thing as the checks before the l4_proto
> > computation above?
> 
> Sort of.  The problem is IPv6 can include extension headers and that
> can totally mess with us.  So we need to do one more check to verify
> that we have enough space for IPv6 w/ TCP which would be hdr.raw + 20
> + hlen.

Yes, you are right. So given that I already check that I have
at least 40 bytes past the network header, and ipv6_find_hdr
will pull up exthdrs as needed, my checks are not needed, and the
real ones should happen after we come out of that switch().

--Sowmini

[PATCH v3 RFC 0/2] ixgbe: ixgbe_atr() bug fixes

2016-10-17 Thread Sowmini Varadhan

Two bug fixes:
- ixgbe_atr() should check for protocol == udp in the
  skb->encapsulation case (instead of !=)
- ixgbe_atr() should make sure the non-paged data has the
  needed network/transport header for computing l4_proto.

v3: Alex Duyck comments

Sowmini Varadhan (2):
  ixgbe: ixgbe_atr() should access udp_hdr(skb) only for UDP packets
  ixgbe: ixgbe_atr() compute l4_proto only if non-paged data has
network/transport headers

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   19 ++-
 1 files changed, 18 insertions(+), 1 deletions(-)

[PATCH V3 RFC 2/2] ixgbe: ixgbe_atr() compute l4_proto only if non-paged data has network/transport headers

2016-10-17 Thread Sowmini Varadhan

For some Tx paths (e.g., tpacket_snd()), ixgbe_atr may be
passed down an sk_buff that has the network and transport
header in the paged data, so it needs to make sure these
headers are available in the headlen bytes to calculate the
l4_proto.

This patch expects that network and transport headers are
already available in the non-paged header data.  The assumption
is that the caller has set this up if l4_proto based Tx
steering is desired.
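
The check can be pictured with a plain byte buffer standing in for the
non-paged part of the sk_buff. This is a userspace sketch for the IPv4 case
only; tcp_hdr_in_linear() and its parameters are made up for illustration,
and the 40-byte minimum follows the comment in the diff below (minimum IPv4
header plus TCP, or an IPv6 header):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TCP_HDR_LEN 20	/* size of a TCP header without options */

/*
 * data:    start of the linear (non-paged) bytes
 * headlen: number of linear bytes available
 * net_off: offset of the network header within data
 *
 * Returns true only if an IPv4 header plus a minimal TCP header lie
 * entirely inside the linear area.
 */
bool tcp_hdr_in_linear(const uint8_t *data, size_t headlen, size_t net_off)
{
	size_t avail, hlen;

	if (net_off >= headlen)
		return false;

	avail = headlen - net_off;
	if (avail < 40)
		return false;

	if ((data[net_off] >> 4) != 4)		/* sketch covers IPv4 only */
		return false;

	hlen = (size_t)(data[net_off] & 0x0F) << 2;	/* ihl in bytes */

	return avail >= hlen + TCP_HDR_LEN;
}
```

A caller would skip l4_proto based steering whenever this returns false,
rather than reading header bytes that live in paged data.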

Signed-off-by: Sowmini Varadhan 
---
v3: add unlikely(); remove needless check for hdr.network against
skb_tail_pointer(); refactor check to make sure we have tcp header
in non-paged part.

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   17 +
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index eceb47b..a9d2b0c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -54,6 +54,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ixgbe.h"
 #include "ixgbe_common.h"
@@ -7651,11 +7652,17 @@ static void ixgbe_atr(struct ixgbe_ring *ring,
/* snag network header to get L4 type and address */
skb = first->skb;
hdr.network = skb_network_header(skb);
+   if (unlikely(hdr.network <= skb->data))
+   return;
if (skb->encapsulation &&
first->protocol == htons(ETH_P_IP) &&
hdr.ipv4->protocol == IPPROTO_UDP) {
struct ixgbe_adapter *adapter = q_vector->adapter;
 
+   if (unlikely(skb_tail_pointer(skb) < hdr.network +
+   VXLAN_HEADROOM))
+   return;
+
/* verify the port is recognized as VXLAN */
if (adapter->vxlan_port &&
udp_hdr(skb)->dest == adapter->vxlan_port)
@@ -7666,6 +7673,12 @@ static void ixgbe_atr(struct ixgbe_ring *ring,
hdr.network = skb_inner_network_header(skb);
}
 
+   /* Make sure we have at least [minimum IPv4 header + TCP]
+* or [IPv6 header] bytes
+*/
+   if (unlikely(skb_tail_pointer(skb) < hdr.network + 40))
+   return;
+
/* Currently only IPv4/IPv6 with TCP is supported */
switch (hdr.ipv4->version) {
case IPVERSION:
@@ -7685,6 +7698,10 @@ static void ixgbe_atr(struct ixgbe_ring *ring,
if (l4_proto != IPPROTO_TCP)
return;
 
+   if (unlikely(skb_tail_pointer(skb) < hdr.network +
+hlen + sizeof(struct tcphdr)))
+   return;
+
th = (struct tcphdr *)(hdr.network + hlen);
 
/* skip this packet since the socket is closing */
-- 
1.7.1

[PATCH V3 RFC 1/2] ixgbe: ixgbe_atr() should access udp_hdr(skb) only for UDP packets

2016-10-17 Thread Sowmini Varadhan

Commit 9f12df906cd8 ("ixgbe: Store VXLAN port number in network order")
incorrectly checks for hdr.ipv4->protocol != IPPROTO_UDP
in ixgbe_atr(). This check should be for "==" instead.

Signed-off-by: Sowmini Varadhan 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index a244d9a..eceb47b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -7653,7 +7653,7 @@ static void ixgbe_atr(struct ixgbe_ring *ring,
hdr.network = skb_network_header(skb);
if (skb->encapsulation &&
first->protocol == htons(ETH_P_IP) &&
-   hdr.ipv4->protocol != IPPROTO_UDP) {
+   hdr.ipv4->protocol == IPPROTO_UDP) {
struct ixgbe_adapter *adapter = q_vector->adapter;
 
/* verify the port is recognized as VXLAN */
-- 
1.7.1

[PATCH 1/3 net v2] ibmvnic: Driver Version 1.0.1

2016-10-17 Thread Thomas Falcon

Increment driver version to reflect features that have
been added since release.

Signed-off-by: Thomas Falcon 
---
Sorry about the extra noise.

v2: revise version format to be consistent with other IBM virtual drivers
---
 drivers/net/ethernet/ibm/ibmvnic.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.h b/drivers/net/ethernet/ibm/ibmvnic.h
index bfc84c7..d244e29 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.h
+++ b/drivers/net/ethernet/ibm/ibmvnic.h
@@ -27,7 +27,7 @@
 /**/
 
 #define IBMVNIC_NAME   "ibmvnic"
-#define IBMVNIC_DRIVER_VERSION "1.0"
+#define IBMVNIC_DRIVER_VERSION "1.0.1"
 #define IBMVNIC_INVALID_MAP-1
 #define IBMVNIC_STATS_TIMEOUT  1
 /* basic structures plus 100 2k buffers */
-- 
2.7.4

Re: Need help with mdiobus_register and phy

2016-10-17 Thread Timur Tabi


Zefir Kurtisi wrote:

Anyway, since the SGMII reset is required, instead of reverting the patch in
full I suggest moving the SGMII power down from at803x_suspend() and doing a
SerDes power cycle in at803x_resume(). Could you please test if the patch
below fixes the problem?


I have never seen the original problem that you noticed.  When I use the 
generic phy driver instead of the at803x driver, everything works great 
for me.  Perhaps the problem that you noticed only occurs with the 
Gianfar NIC?


Anyway, I tested the change you suggested, and it fixes the problem for 
me.  I moved the power-down code to before the power-up code.  But like 
I said, since I never experienced the original problem, I don't know if 
that works for you.


I suggest you make the changes in the driver yourself and test it, and 
then I will test whether that patch also works for me.


--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the
Code Aurora Forum, hosted by The Linux Foundation.

Re: [PATCH net-next 2/2] bpf: add initial suite for selftests

2016-10-17 Thread Alexei Starovoitov

On Mon, Oct 17, 2016 at 02:28:36PM +0200, Daniel Borkmann wrote:
> Add a start of a test suite for kernel selftests. This moves test_verifier
> and test_maps over to tools/testing/selftests/bpf/ along with various
> code improvements and also adds a script for invoking test_bpf module.
> The test suite can simply be run via selftest framework, f.e.:
> 
>   # cd tools/testing/selftests/bpf/
>   # make
>   # make run_tests
> 
> Both test_verifier and test_maps were kind of misplaced in samples/bpf/
> directory and we were looking into adding them to selftests for a while
> now, so it can be picked up by kbuild bot et al and hopefully also get
> more exposure and thus new test case additions.
> 
> Signed-off-by: Daniel Borkmann 

Acked-by: Alexei Starovoitov

Re: [PATCH net-next 1/2] bpf: add various tests around spill/fill of regs

2016-10-17 Thread Alexei Starovoitov

On Mon, Oct 17, 2016 at 02:28:35PM +0200, Daniel Borkmann wrote:
> Add several spill/fill tests. Among others, one that performs xadd
> on the spilled register, and one ldx/stx test where different types are
> spilled from two branches and read out from a common path. The verifier
> handles all of them correctly.
> 
> Signed-off-by: Daniel Borkmann 

Acked-by: Alexei Starovoitov

Re: [PATCH -next] net: wan: slic_ds26522: Remove .owner field for driver

2016-10-17 Thread Javier Martinez Canillas

Hello Wei,

On 10/17/2016 11:51 AM, Wei Yongjun wrote:
> From: Wei Yongjun 
> 
> Remove .owner field if calls are used which set it automatically.
> 
> Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci
> 
> Signed-off-by: Wei Yongjun 
> ---
>  drivers/net/wan/slic_ds26522.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/net/wan/slic_ds26522.c b/drivers/net/wan/slic_ds26522.c
> index b776a0a..5bca31c 100644
> --- a/drivers/net/wan/slic_ds26522.c
> +++ b/drivers/net/wan/slic_ds26522.c
> @@ -241,7 +241,6 @@ static struct spi_driver slic_ds26522_driver = {
>   .driver = {
>  .name = "ds26522",
>  .bus = &spi_bus_type,
> -.owner = THIS_MODULE,
>  .of_match_table = slic_ds26522_match,
>  },
>   .probe = slic_ds26522_probe,
> 

Reviewed-by: Javier Martinez Canillas 

Best regards,
-- 
Javier Martinez Canillas
Open Source Group
Samsung Research America

Re: [PATCH -next] net: wan: slic_ds26522: Use module_spi_driver to simplify the code

2016-10-17 Thread Javier Martinez Canillas

Hello Wei,

On 10/17/2016 11:50 AM, Wei Yongjun wrote:
> From: Wei Yongjun 
> 
> module_spi_driver() makes the code simpler by eliminating
> boilerplate code.
> 
> Signed-off-by: Wei Yongjun 
> ---
>  drivers/net/wan/slic_ds26522.c | 13 +
>  1 file changed, 1 insertion(+), 12 deletions(-)
> 
> diff --git a/drivers/net/wan/slic_ds26522.c b/drivers/net/wan/slic_ds26522.c
> index b776a0a..8f782f1 100644
> --- a/drivers/net/wan/slic_ds26522.c
> +++ b/drivers/net/wan/slic_ds26522.c
> @@ -249,15 +249,4 @@ static struct spi_driver slic_ds26522_driver = {
>   .id_table = slic_ds26522_id,
>  };
>  
> -static int __init slic_ds26522_init(void)
> -{
> - return spi_register_driver(&slic_ds26522_driver);
> -}
> -
> -static void __exit slic_ds26522_exit(void)
> -{
> - spi_unregister_driver(_ds26522_driver);
> -}
> -
> -module_init(slic_ds26522_init);
> -module_exit(slic_ds26522_exit);
> +module_spi_driver(slic_ds26522_driver);
> 

Reviewed-by: Javier Martinez Canillas 

Best regards,
-- 
Javier Martinez Canillas
Open Source Group
Samsung Research America

Re: [PATCH net-next 00/15] ethernet: use core min/max MTU checking

2016-10-17 Thread Jarod Wilson

On Mon, Oct 17, 2016 at 04:03:41PM -0400, David Miller wrote:
> From: Jarod Wilson 
> Date: Mon, 17 Oct 2016 15:54:02 -0400
> 
> > For the most part, every patch does the same essential thing: removes the
> > MTU range checking from the drivers' ndo_change_mtu function, puts those
> > ranges into the core net_device min_mtu and max_mtu fields, and where
> > possible, removes ndo_change_mtu functions entirely.
> 
> Jarod, please read my other posting.

Done, didn't see it until just after I'd hit send, have replied there as
well.

> You've positively broken the maximum MTU for all of these drivers.
> 
> That's not cool.
>
> And this series fixing things doesn't make things better, because now
> we've significantly broken bisection for anyone running into this
> regression.

Agreed, and my suggestion right now is to revert the 2nd patch from the
prior series. I believe it can be resubmitted after all other callers of
ether_setup() have been converted to have their own min/max_mtu.

> You should have arranged this in such a way that the drivers needing
> > 1500 byte MTU were not impacted at all by your changes, but that
> isn't what happened.

Yeah, I must admit to not looking closely enough at the state the first
two patches left things in. It was absolutely my intention to not alter
behaviour in any way, but I neglected to test sufficiently without this
additional set applied.

-- 
Jarod Wilson
ja...@redhat.com

[PATCH net v2] flow_dissector: Check skb for VLAN only if skb specified.

2016-10-17 Thread Eric Garver

Fixes a panic when calling eth_get_headlen(). Noticed on i40e driver.
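
A reduced sketch of the guard, assuming the dissector can be handed a raw
buffer with no skb at all (as the subject line implies). struct pkt and
need_parse_vlan() are hypothetical, and protocol values are kept in host
byte order to keep the example short:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_8021Q  0x8100
#define ETH_P_8021AD 0x88A8

/* Hypothetical, much reduced stand-in for the skb fields involved. */
struct pkt {
	bool vlan_tag_present;	/* tag was stripped and stored out of band */
	uint16_t protocol;	/* protocol following the stripped tag */
};

static bool is_vlan_proto(uint16_t proto)
{
	return proto == ETH_P_8021Q || proto == ETH_P_8021AD;
}

/*
 * Returns true when a VLAN header still has to be parsed from the buffer.
 * 'skb' may be NULL when only a data pointer is being dissected.
 */
bool need_parse_vlan(const struct pkt *skb, uint16_t proto)
{
	if (skb && skb->vlan_tag_present)
		proto = skb->protocol;

	return is_vlan_proto(proto);
}
```

Without the NULL test, the first condition would dereference 'skb' on the
buffer-only path.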

Fixes: d5709f7ab776 ("flow_dissector: For stripped vlan, get vlan info from skb->vlan_tci")
Signed-off-by: Eric Garver 
---
 net/core/flow_dissector.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 1a7b80f73376..44e6ba9d3a6b 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -247,12 +247,10 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
case htons(ETH_P_8021Q): {
const struct vlan_hdr *vlan;
 
-   if (skb_vlan_tag_present(skb))
+   if (skb && skb_vlan_tag_present(skb))
proto = skb->protocol;
 
-   if (!skb_vlan_tag_present(skb) ||
-   proto == cpu_to_be16(ETH_P_8021Q) ||
-   proto == cpu_to_be16(ETH_P_8021AD)) {
+   if (eth_type_vlan(proto)) {
struct vlan_hdr _vlan;
 
vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan),
-- 
2.5.5

[PATCH 1/3 net] ibmvnic: Driver Version 1.01

2016-10-17 Thread Thomas Falcon

Increment driver version to reflect features that have
been added since release.

Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.h 
b/drivers/net/ethernet/ibm/ibmvnic.h
index bfc84c7..d244e29 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.h
+++ b/drivers/net/ethernet/ibm/ibmvnic.h
@@ -27,7 +27,7 @@
 /**/
 
 #define IBMVNIC_NAME   "ibmvnic"
-#define IBMVNIC_DRIVER_VERSION "1.0"
+#define IBMVNIC_DRIVER_VERSION "1.01"
 #define IBMVNIC_INVALID_MAP    -1
 #define IBMVNIC_STATS_TIMEOUT  1
 /* basic structures plus 100 2k buffers */
-- 
2.7.4

[PATCH 2/3 net] ibmvnic: Fix GFP_KERNEL allocation in interrupt context

2016-10-17 Thread Thomas Falcon

Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index bfe17d9..928bf8a 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1190,7 +1190,7 @@ static struct ibmvnic_sub_crq_queue 
*init_sub_crq_queue(struct ibmvnic_adapter
if (!scrq)
return NULL;
 
-   scrq->msgs = (union sub_crq *)__get_free_pages(GFP_KERNEL, 2);
+   scrq->msgs = (union sub_crq *)__get_free_pages(GFP_ATOMIC, 2);
memset(scrq->msgs, 0, 4 * PAGE_SIZE);
if (!scrq->msgs) {
dev_warn(dev, "Couldn't allocate crq queue messages page\n");
-- 
2.7.4

[PATCH 3/3 net] ibmvnic: Update MTU after device initialization

2016-10-17 Thread Thomas Falcon

It is possible for the MTU to be changed during the initialization
process with the VNIC Server.  Ensure that the net device is updated 
to reflect the new MTU.

Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 928bf8a..213162d 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -3654,6 +3654,7 @@ static void handle_crq_init_rsp(struct work_struct *work)
goto task_failed;
 
netdev->real_num_tx_queues = adapter->req_tx_queues;
+   netdev->mtu = adapter->req_mtu;
 
if (adapter->failover) {
adapter->failover = false;
@@ -3792,6 +3793,7 @@ static int ibmvnic_probe(struct vio_dev *dev, const 
struct vio_device_id *id)
}
 
netdev->real_num_tx_queues = adapter->req_tx_queues;
+   netdev->mtu = adapter->req_mtu;
 
rc = register_netdev(netdev);
if (rc) {
-- 
2.7.4

[PATCH net] flow_dissector: Check skb for VLAN only if skb specified.

2016-10-17 Thread Eric Garver

From: Eric Garver 

Fixes a panic when calling eth_get_headlen(). Noticed on i40e driver.

Fixes: d5709f7ab776 ("flow_dissector: For stripped vlan, get vlan info from skb->vlan_tci")
Signed-off-by: Eric Garver 
---
 net/core/flow_dissector.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 1a7b80f73376..44e6ba9d3a6b 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -247,12 +247,10 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
case htons(ETH_P_8021Q): {
const struct vlan_hdr *vlan;
 
-   if (skb_vlan_tag_present(skb))
+   if (skb && skb_vlan_tag_present(skb))
proto = skb->protocol;
 
-   if (!skb_vlan_tag_present(skb) ||
-   proto == cpu_to_be16(ETH_P_8021Q) ||
-   proto == cpu_to_be16(ETH_P_8021AD)) {
+   if (eth_type_vlan(proto)) {
struct vlan_hdr _vlan;
 
vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan),
-- 
2.5.5

Re: [PATCH net] flow_dissector: Check skb for VLAN only if skb specified.

2016-10-17 Thread Eric Garver

On Mon, Oct 17, 2016 at 04:21:57PM -0400, Eric Garver wrote:
> From: Eric Garver 
> 
> Fixes a panic when calling eth_get_headlen(). Noticed on i40e driver.
> 
> Fixes: d5709f7ab776 ("flow_dissector: For stripped vlan, get vlan info from skb->vlan_tci")
> Signed-off-by: Eric Garver 

Oops. Messed up the From line. Sending a v2.

Re: [PATCH net-next 03/15] ethernet/intel: use core min/max MTU checking

2016-10-17 Thread Jakub Kicinski

Looks better. Unfortunately, I can't test it, since net-next also seems to
make NFS implode on my setup.

On Mon, 17 Oct 2016 15:54:05 -0400, Jarod Wilson wrote:
> @@ -7187,6 +7180,11 @@ static int e1000_probe(struct pci_dev *pdev, const 
> struct pci_device_id *ent)
>   netdev->vlan_features |= NETIF_F_HIGHDMA;
>   }
>  
> + /* MTU range: 68 - max_hw_frame_size */
> + netdev->min_mtu = ETH_MIN_MTU;

Nitpick: we probably don't have to reinit min_mtu if it's 68.

Re: [PATCH v3 net-next 2/2] net: deprecate eth_change_mtu, remove usage

2016-10-17 Thread Jarod Wilson

On Mon, Oct 17, 2016 at 04:07:12PM -0400, Jarod Wilson wrote:
> On Mon, Oct 17, 2016 at 01:25:53PM -0400, David Miller wrote:
> > From: Jakub Kicinski 
> > Date: Mon, 17 Oct 2016 18:20:49 +0100
> > 
> > > Hm.  I must be missing something really obvious.  I just booted
> > > net-next an hour ago and couldn't set MTU to anything larger than 1500
> > > on either nfp or igb.  As far as I can read the code it will set the
> > > max_mtu to 1500 in ether_setup() but none of the jumbo-capable drivers
> > > had been touched by Jarod so far...
> > 
> > Indeed.
> > 
> > Jarod, this doesn't work.
> > 
> > I guess the idea was that if the driver overrides ndo_change_mtu and
> > enforced its limits there, the driver would still work after your
> > changes.
> > 
> > But that isn't what is happening, look at the IGB example.
> > 
> > It uses ether_setup(), which sets those new defaults, but now when
> > the MTU is changed you enforce those default min/max before the
> > driver's ->ndo_change_mtu() has a chance to step in front and make
> > the decision on its own.
> > 
> > This means your changes pretty much did indeed break a lot of
> > drivers' ability to set larger than a 1500 byte MTU.
> 
> Argh. Yeah, I see it now. I was primarily operating with the follow-on
> patches also in play, which do touch all the ethernet drivers and set
> max_mtu to match current behavior, and didn't consider the max_mtu case where
> only the initial patches were applied and the follow-on ones weren't. I've
> sent that set, which should theoretically make this problem go away, but I
> can also try to rework things if need be to restore intermediate jumbo
> frames functionality. (And there are actually non-ethernet devices that
> also call ether_setup and may or may not have larger than 1500 mtu that
> aren't yet addressed by that follow-on set).

Looks like the simplest thing to do is to revert a52ad514 and reintroduce
that change only after all callers of ether_setup() set min/max_mtu
themselves as needed.

-- 
Jarod Wilson
ja...@redhat.com

Re: [PATCH v3 net-next 2/2] net: deprecate eth_change_mtu, remove usage

2016-10-17 Thread Jarod Wilson

On Mon, Oct 17, 2016 at 01:25:53PM -0400, David Miller wrote:
> From: Jakub Kicinski 
> Date: Mon, 17 Oct 2016 18:20:49 +0100
> 
> > Hm.  I must be missing something really obvious.  I just booted
> > net-next an hour ago and couldn't set MTU to anything larger than 1500
> > on either nfp or igb.  As far as I can read the code it will set the
> > max_mtu to 1500 in ether_setup() but none of the jumbo-capable drivers
> > had been touched by Jarod so far...
> 
> Indeed.
> 
> Jarod, this doesn't work.
> 
> I guess the idea was that if the driver overrides ndo_change_mtu and
> enforced its limits there, the driver would still work after your
> changes.
> 
> But that isn't what is happening, look at the IGB example.
> 
> It uses ether_setup(), which sets those new defaults, but now when
> the MTU is changed you enforce those default min/max before the
> driver's ->ndo_change_mtu() has a chance to step in front and make
> the decision on its own.
> 
> This means your changes pretty much did indeed break a lot of
> drivers' ability to set larger than a 1500 byte MTU.

Argh. Yeah, I see it now. I was primarily operating with the follow-on
patches also in play, which do touch all the ethernet drivers and set
max_mtu to match current behavior, and didn't consider the max_mtu case where
only the initial patches were applied and the follow-on ones weren't. I've
sent that set, which should theoretically make this problem go away, but I
can also try to rework things if need be to restore intermediate jumbo
frames functionality. (And there are actually non-ethernet devices that
also call ether_setup and may or may not have larger than 1500 mtu that
aren't yet addressed by that follow-on set).

-- 
Jarod Wilson
ja...@redhat.com
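
The ordering problem described above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: model_dev, model_ether_setup, model_igb_change_mtu and model_dev_set_mtu are invented names, not the kernel's structures or functions. The 68/1500 defaults and the igb limit of 9216 are the values quoted in this thread.

```c
#include <errno.h>
#include <stddef.h>

#define ETH_MIN_MTU  68   /* default min_mtu after ether_setup() */
#define ETH_DATA_LEN 1500 /* default max_mtu after ether_setup() */

/* Hypothetical, stripped-down stand-in for struct net_device. */
struct model_dev {
	int mtu;
	int min_mtu;
	int max_mtu;
	int (*change_mtu)(struct model_dev *dev, int new_mtu);
	int driver_calls; /* counts how often the driver hook ran */
};

/* Models the new defaults applied to every Ethernet device. */
static void model_ether_setup(struct model_dev *dev)
{
	dev->mtu = ETH_DATA_LEN;
	dev->min_mtu = ETH_MIN_MTU;
	dev->max_mtu = ETH_DATA_LEN;
}

/* Models a jumbo-capable driver that still does its own range check. */
static int model_igb_change_mtu(struct model_dev *dev, int new_mtu)
{
	dev->driver_calls++;
	if (new_mtu < 68 || new_mtu > 9216)
		return -EINVAL;
	dev->mtu = new_mtu;
	return 0;
}

/* Models the core path: the range check runs before the driver hook. */
static int model_dev_set_mtu(struct model_dev *dev, int new_mtu)
{
	if (new_mtu < dev->min_mtu)
		return -EINVAL;
	if (dev->max_mtu > 0 && new_mtu > dev->max_mtu)
		return -EINVAL;
	if (dev->change_mtu)
		return dev->change_mtu(dev, new_mtu);
	dev->mtu = new_mtu;
	return 0;
}
```

In this model a request for MTU 9000 is rejected by the core check and the driver hook is never reached, until the driver raises max_mtu itself (here, to 9216).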

Re: [PATCH net-next 00/15] ethernet: use core min/max MTU checking

2016-10-17 Thread David Miller

From: Jarod Wilson 
Date: Mon, 17 Oct 2016 15:54:02 -0400

> For the most part, every patch does the same essential thing: removes the
> MTU range checking from the drivers' ndo_change_mtu function, puts those
> ranges into the core net_device min_mtu and max_mtu fields, and where
> possible, removes ndo_change_mtu functions entirely.

Jarod, please read my other posting.

You've positively broken the maximum MTU for all of these drivers.

That's not cool.

And this series fixing things doesn't make things better, because now
we've significantly broken bisection for anyone running into this
regression.

You should have arranged this in such a way that the drivers needing
> 1500 byte MTU were not impacted at all by your changes, but that
isn't what happened.

[PATCH net-next 04/15] ethernet/marvell: use core min/max MTU checking

2016-10-17 Thread Jarod Wilson

mvneta: min_mtu 68, max_mtu 9676
- mtu validation routine mostly did range check, merge back into
  mvneta_change_mtu for simplicity

mvpp2: min_mtu 68, max_mtu 9676
- mtu validation routine mostly did range check, merge back into
  mvpp2_change_mtu for simplicity

pxa168_eth: min_mtu 68, max_mtu 9500

skge: min_mtu 60, max_mtu 9000

sky2: min_mtu 68, max_mtu 1500 or 9000, depending on hw

CC: netdev@vger.kernel.org
CC: Mirko Lindner 
CC: Stephen Hemminger 
CC: Thomas Petazzoni 
Signed-off-by: Jarod Wilson 
---
 drivers/net/ethernet/marvell/mvneta.c | 36 +--
 drivers/net/ethernet/marvell/mvpp2.c  | 36 ---
 drivers/net/ethernet/marvell/pxa168_eth.c |  7 +++---
 drivers/net/ethernet/marvell/skge.c   |  7 +++---
 drivers/net/ethernet/marvell/sky2.c   | 18 +++-
 5 files changed, 35 insertions(+), 69 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index 5cb07c2..b85819e 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3024,29 +3024,6 @@ static void mvneta_stop_dev(struct mvneta_port *pp)
mvneta_rx_reset(pp);
 }
 
-/* Return positive if MTU is valid */
-static int mvneta_check_mtu_valid(struct net_device *dev, int mtu)
-{
-   if (mtu < 68) {
-   netdev_err(dev, "cannot change mtu to less than 68\n");
-   return -EINVAL;
-   }
-
-   /* 9676 == 9700 - 20 and rounding to 8 */
-   if (mtu > 9676) {
-   netdev_info(dev, "Illegal MTU value %d, round to 9676\n", mtu);
-   mtu = 9676;
-   }
-
-   if (!IS_ALIGNED(MVNETA_RX_PKT_SIZE(mtu), 8)) {
-   netdev_info(dev, "Illegal MTU value %d, rounding to %d\n",
-   mtu, ALIGN(MVNETA_RX_PKT_SIZE(mtu), 8));
-   mtu = ALIGN(MVNETA_RX_PKT_SIZE(mtu), 8);
-   }
-
-   return mtu;
-}
-
 static void mvneta_percpu_enable(void *arg)
 {
struct mvneta_port *pp = arg;
@@ -3067,9 +3044,11 @@ static int mvneta_change_mtu(struct net_device *dev, int 
mtu)
struct mvneta_port *pp = netdev_priv(dev);
int ret;
 
-   mtu = mvneta_check_mtu_valid(dev, mtu);
-   if (mtu < 0)
-   return -EINVAL;
+   if (!IS_ALIGNED(MVNETA_RX_PKT_SIZE(mtu), 8)) {
+   netdev_info(dev, "Illegal MTU value %d, rounding to %d\n",
+   mtu, ALIGN(MVNETA_RX_PKT_SIZE(mtu), 8));
+   mtu = ALIGN(MVNETA_RX_PKT_SIZE(mtu), 8);
+   }
 
dev->mtu = mtu;
 
@@ -4154,6 +4133,11 @@ static int mvneta_probe(struct platform_device *pdev)
dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIVE_ADDR_CHANGE;
dev->gso_max_segs = MVNETA_MAX_TSO_SEGS;
 
+   /* MTU range: 68 - 9676 */
+   dev->min_mtu = ETH_MIN_MTU;
+   /* 9676 == 9700 - 20 and rounding to 8 */
+   dev->max_mtu = 9676;
+
err = register_netdev(dev);
if (err < 0) {
dev_err(&pdev->dev, "failed to register\n");
diff --git a/drivers/net/ethernet/marvell/mvpp2.c 
b/drivers/net/ethernet/marvell/mvpp2.c
index 60227a3..c8bf155 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -5453,29 +5453,6 @@ static void mvpp2_stop_dev(struct mvpp2_port *port)
phy_stop(ndev->phydev);
 }
 
-/* Return positive if MTU is valid */
-static inline int mvpp2_check_mtu_valid(struct net_device *dev, int mtu)
-{
-   if (mtu < 68) {
-   netdev_err(dev, "cannot change mtu to less than 68\n");
-   return -EINVAL;
-   }
-
-   /* 9676 == 9700 - 20 and rounding to 8 */
-   if (mtu > 9676) {
-   netdev_info(dev, "illegal MTU value %d, round to 9676\n", mtu);
-   mtu = 9676;
-   }
-
-   if (!IS_ALIGNED(MVPP2_RX_PKT_SIZE(mtu), 8)) {
-   netdev_info(dev, "illegal MTU value %d, round to %d\n", mtu,
-   ALIGN(MVPP2_RX_PKT_SIZE(mtu), 8));
-   mtu = ALIGN(MVPP2_RX_PKT_SIZE(mtu), 8);
-   }
-
-   return mtu;
-}
-
 static int mvpp2_check_ringparam_valid(struct net_device *dev,
   struct ethtool_ringparam *ring)
 {
@@ -5717,10 +5694,10 @@ static int mvpp2_change_mtu(struct net_device *dev, int 
mtu)
struct mvpp2_port *port = netdev_priv(dev);
int err;
 
-   mtu = mvpp2_check_mtu_valid(dev, mtu);
-   if (mtu < 0) {
-   err = mtu;
-   goto error;
+   if (!IS_ALIGNED(MVPP2_RX_PKT_SIZE(mtu), 8)) {
+   netdev_info(dev, "illegal MTU value %d, round to %d\n", mtu,
+   ALIGN(MVPP2_RX_PKT_SIZE(mtu), 8));
+   mtu = ALIGN(MVPP2_RX_PKT_SIZE(mtu), 8);
}
 
if (!netif_running(dev)) {
@@ -6212,6 +6189,11 @@ static int mvpp2_port_probe(struct

[PATCH net-next 00/15] ethernet: use core min/max MTU checking

2016-10-17 Thread Jarod Wilson

Now that the network stack core min/max MTU checking infrastructure is in
place, time to start making drivers use it. We'll start with the easiest
ones, the ethernet drivers, split roughly by vendor, with a catch-all
patch at the end.

For the most part, every patch does the same essential thing: removes the
MTU range checking from the drivers' ndo_change_mtu function, puts those
ranges into the core net_device min_mtu and max_mtu fields, and where
possible, removes ndo_change_mtu functions entirely.

These patches have all been built through the 0-day build infrastructure
provided by Intel, on top of net-next as of October 17.

A rebasing git tree with these patches can be found here:

  https://github.com/jarodwilson/linux-muck

Jarod Wilson (15):
  ethernet/atheros: use core min/max MTU checking
  ethernet/broadcom: use core min/max MTU checking
  ethernet/intel: use core min/max MTU checking
  ethernet/marvell: use core min/max MTU checking
  ethernet/mellanox: use core min/max MTU checking
  ethernet/qlogic: use core min/max MTU checking
  ethernet/realtek: use core min/max MTU checking
  ethernet/sun: use core min/max MTU checking
  ethernet/dlink: use core min/max MTU checking
  ethernet/neterion: use core min/max MTU checking
  ethernet/cavium: use core min/max MTU checking
  ethernet/ibm: use core min/max MTU checking
  ethernet/tile: use core min/max MTU checking
  ethernet/toshiba: use core min/max MTU checking
  ethernet: use core min/max MTU checking

CC: netdev@vger.kernel.org

 drivers/net/ethernet/agere/et131x.c|  7 ++--
 drivers/net/ethernet/altera/altera_tse.h   |  1 -
 drivers/net/ethernet/altera/altera_tse_main.c  | 14 ++--
 drivers/net/ethernet/amd/amd8111e.c|  5 ++-
 drivers/net/ethernet/atheros/alx/hw.h  |  1 -
 drivers/net/ethernet/atheros/alx/main.c| 10 ++
 drivers/net/ethernet/atheros/atl1c/atl1c_main.c| 41 ++
 drivers/net/ethernet/atheros/atl1e/atl1e_main.c| 12 +++
 drivers/net/ethernet/atheros/atlx/atl1.c   | 15 
 drivers/net/ethernet/atheros/atlx/atl2.c   | 16 -
 drivers/net/ethernet/atheros/atlx/atl2.h   |  3 --
 drivers/net/ethernet/broadcom/b44.c|  9 +++--
 drivers/net/ethernet/broadcom/bcm63xx_enet.c   | 35 ++
 drivers/net/ethernet/broadcom/bnx2.c   | 16 -
 drivers/net/ethernet/broadcom/bnx2.h   |  6 ++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h|  6 ++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c|  8 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c   | 22 +---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c   |  7 ++--
 drivers/net/ethernet/broadcom/bnxt/bnxt.c  |  7 ++--
 drivers/net/ethernet/broadcom/tg3.c|  9 ++---
 drivers/net/ethernet/brocade/bna/bnad.c|  7 ++--
 drivers/net/ethernet/cadence/macb.c| 19 +-
 drivers/net/ethernet/calxeda/xgmac.c   | 20 +++
 drivers/net/ethernet/cavium/liquidio/lio_main.c| 15 +++-
 .../net/ethernet/cavium/liquidio/octeon_network.h  |  2 +-
 drivers/net/ethernet/cavium/octeon/octeon_mgmt.c   | 13 ++-
 drivers/net/ethernet/cavium/thunder/nicvf_main.c   | 10 +++---
 drivers/net/ethernet/chelsio/cxgb/common.h |  5 +++
 drivers/net/ethernet/chelsio/cxgb/cxgb2.c  | 18 --
 drivers/net/ethernet/chelsio/cxgb/pm3393.c |  8 +
 drivers/net/ethernet/chelsio/cxgb/vsc7326.c|  5 ---
 drivers/net/ethernet/cisco/enic/enic_main.c|  7 ++--
 drivers/net/ethernet/cisco/enic/enic_res.h |  2 +-
 drivers/net/ethernet/dlink/dl2k.c  | 22 +++-
 drivers/net/ethernet/dlink/sundance.c  |  6 ++--
 drivers/net/ethernet/freescale/gianfar.c   |  9 ++---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c  |  3 +-
 drivers/net/ethernet/hisilicon/hns/hns_enet.c  | 11 +++---
 drivers/net/ethernet/ibm/ehea/ehea_main.c  | 13 +++
 drivers/net/ethernet/ibm/emac/core.c   |  9 ++---
 drivers/net/ethernet/intel/e100.c  |  9 -
 drivers/net/ethernet/intel/e1000/e1000_main.c  | 12 +++
 drivers/net/ethernet/intel/e1000e/netdev.c | 14 
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c| 15 +++-
 drivers/net/ethernet/intel/i40e/i40e_main.c| 10 +++---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c|  8 ++---
 drivers/net/ethernet/intel/igb/e1000_defines.h |  3 +-
 drivers/net/ethernet/intel/igb/igb_main.c  | 15 +++-
 drivers/net/ethernet/intel/igbvf/defines.h |  3 +-
 drivers/net/ethernet/intel/igbvf/netdev.c  | 14 +++-
 drivers/net/ethernet/intel/ixgb/ixgb_main.c| 16 +++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  | 11 +++---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c  | 33 -

[PATCH net-next 03/15] ethernet/intel: use core min/max MTU checking

2016-10-17 Thread Jarod Wilson

e100: min_mtu 68, max_mtu 1500
- remove e100_change_mtu entirely, is identical to old eth_change_mtu,
  and no longer serves a purpose. No need to set min_mtu or max_mtu
  explicitly, as ether_setup() will already set them to 68 and 1500.

e1000: min_mtu 46, max_mtu 16110

e1000e: min_mtu 68, max_mtu varies based on adapter

fm10k: min_mtu 68, max_mtu 15342
- remove fm10k_change_mtu entirely, does nothing now

i40e: min_mtu 68, max_mtu 9706

i40evf: min_mtu 68, max_mtu 9706

igb: min_mtu 68, max_mtu 9216
- There are two different "max" frame sizes claimed and both checked in
  the driver, the larger value wasn't relevant though, so I've set max_mtu
  to the smaller of the two values here to retain identical behavior.

igbvf: min_mtu 68, max_mtu 9216
- Same issue as igb duplicated

ixgb: min_mtu 68, max_mtu 16114
- Also remove pointless old == new check, as that's done in dev_set_mtu

ixgbe: min_mtu 68, max_mtu 9710

ixgbevf: min_mtu 68, max_mtu dependent on hardware/firmware
- Some hw can only handle up to max_mtu 1504 on a vf, others 9710
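
As a worked check of the e1000 numbers above, the two bounds follow from the standard Ethernet constants. This is a small sketch, not driver code: the helper names are invented, and MAX_JUMBO_FRAME_SIZE is assumed to be 16128 (0x3F00), which is the value implied by the "MTU range: 46 - 16110" comment in the diff.

```c
/* Standard Ethernet constants, as defined in if_ether.h. */
#define ETH_HLEN    14 /* destination MAC + source MAC + EtherType */
#define ETH_ZLEN    60 /* minimum frame length, FCS not included */
#define ETH_FCS_LEN 4  /* frame check sequence */

/* Assumed value, inferred from the 16110 upper bound in the patch comment. */
#define MAX_JUMBO_FRAME_SIZE 16128

/* min_mtu = ETH_ZLEN - ETH_HLEN */
static int e1000_min_mtu_sketch(void)
{
	return ETH_ZLEN - ETH_HLEN;
}

/* max_mtu = MAX_JUMBO_FRAME_SIZE - (ETH_HLEN + ETH_FCS_LEN) */
static int e1000_max_mtu_sketch(void)
{
	return MAX_JUMBO_FRAME_SIZE - (ETH_HLEN + ETH_FCS_LEN);
}
```

That is 60 - 14 = 46 for the lower bound and 16128 - 18 = 16110 for the upper bound.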

CC: netdev@vger.kernel.org
CC: intel-wired-...@lists.osuosl.org
CC: Jeff Kirsher 
Signed-off-by: Jarod Wilson 
---
 drivers/net/ethernet/intel/e100.c |  9 ---
 drivers/net/ethernet/intel/e1000/e1000_main.c | 12 -
 drivers/net/ethernet/intel/e1000e/netdev.c| 14 +-
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c   | 15 +++
 drivers/net/ethernet/intel/i40e/i40e_main.c   | 10 +++
 drivers/net/ethernet/intel/i40evf/i40evf_main.c   |  8 +++---
 drivers/net/ethernet/intel/igb/e1000_defines.h|  3 ++-
 drivers/net/ethernet/intel/igb/igb_main.c | 15 +++
 drivers/net/ethernet/intel/igbvf/defines.h|  3 ++-
 drivers/net/ethernet/intel/igbvf/netdev.c | 14 +++---
 drivers/net/ethernet/intel/ixgb/ixgb_main.c   | 16 +++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 11 
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 33 ---
 13 files changed, 62 insertions(+), 101 deletions(-)

diff --git a/drivers/net/ethernet/intel/e100.c 
b/drivers/net/ethernet/intel/e100.c
index 068789e..25c6dfd 100644
--- a/drivers/net/ethernet/intel/e100.c
+++ b/drivers/net/ethernet/intel/e100.c
@@ -2286,14 +2286,6 @@ static int e100_set_mac_address(struct net_device 
*netdev, void *p)
return 0;
 }
 
-static int e100_change_mtu(struct net_device *netdev, int new_mtu)
-{
-   if (new_mtu < ETH_ZLEN || new_mtu > ETH_DATA_LEN)
-   return -EINVAL;
-   netdev->mtu = new_mtu;
-   return 0;
-}
-
 static int e100_asf(struct nic *nic)
 {
/* ASF can be enabled from eeprom */
@@ -2834,7 +2826,6 @@ static const struct net_device_ops e100_netdev_ops = {
.ndo_validate_addr  = eth_validate_addr,
.ndo_set_rx_mode= e100_set_multicast_list,
.ndo_set_mac_address= e100_set_mac_address,
-   .ndo_change_mtu = e100_change_mtu,
.ndo_do_ioctl   = e100_do_ioctl,
.ndo_tx_timeout = e100_tx_timeout,
 #ifdef CONFIG_NET_POLL_CONTROLLER
diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c 
b/drivers/net/ethernet/intel/e1000/e1000_main.c
index f42129d..33076fa 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -1085,6 +1085,10 @@ static int e1000_probe(struct pci_dev *pdev, const 
struct pci_device_id *ent)
hw->subsystem_vendor_id != PCI_VENDOR_ID_VMWARE)
netdev->priv_flags |= IFF_UNICAST_FLT;
 
+   /* MTU range: 46 - 16110 */
+   netdev->min_mtu = ETH_ZLEN - ETH_HLEN;
+   netdev->max_mtu = MAX_JUMBO_FRAME_SIZE - (ETH_HLEN + ETH_FCS_LEN);
+
adapter->en_mng_pt = e1000_enable_mng_pass_thru(hw);
 
/* initialize eeprom parameters */
@@ -3549,13 +3553,7 @@ static int e1000_change_mtu(struct net_device *netdev, 
int new_mtu)
 {
struct e1000_adapter *adapter = netdev_priv(netdev);
struct e1000_hw *hw = &adapter->hw;
-   int max_frame = new_mtu + ENET_HEADER_SIZE + ETHERNET_FCS_SIZE;
-
-   if ((max_frame < MINIMUM_ETHERNET_FRAME_SIZE) ||
-   (max_frame > MAX_JUMBO_FRAME_SIZE)) {
-   e_err(probe, "Invalid MTU setting\n");
-   return -EINVAL;
-   }
+   int max_frame = new_mtu + ETH_HLEN + ETH_FCS_LEN;
 
/* Adapter-specific max frame size limits. */
switch (hw->mac_type) {
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 7017281..8759d92 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -5974,19 +5974,12 @@ static int e1000_change_mtu(struct net_device *netdev, 
int new_mtu)
int max_frame = new_mtu + VLAN_ETH_HLEN + ETH_FCS_LEN;
 
/* Jumbo frame support */
-   if ((max_frame > (VLAN_ETH_FRAME_LEN +

[PATCH net-next 06/15] ethernet/qlogic: use core min/max MTU checking

2016-10-17 Thread Jarod Wilson

qede: min_mtu 46, max_mtu 9600
- Put define for max in qede.h

qlcnic: min_mtu 68, max_mtu 9600
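
A quick way to convince yourself that the qede bounds match the check being removed is to compare the two predicates directly. The sketch below uses the constants from the deleted code (60 and 9600) plus the standard ETH_ZLEN and ETH_HLEN values; the function names are invented for illustration.

```c
#include <stdbool.h>

#define ETH_HLEN 14 /* Ethernet header length */
#define ETH_ZLEN 60 /* minimum frame length, FCS not included */

/* Values taken from the lines this patch removes or moves. */
#define OLD_ETH_MIN_PACKET_SIZE    60
#define QEDE_MAX_JUMBO_PACKET_SIZE 9600

/* The range check that used to live in qede_change_mtu(). */
static bool qede_old_check_rejects(int new_mtu)
{
	return new_mtu > QEDE_MAX_JUMBO_PACKET_SIZE ||
	       (new_mtu + ETH_HLEN) < OLD_ETH_MIN_PACKET_SIZE;
}

/* The same decision expressed through min_mtu/max_mtu bounds. */
static bool qede_core_bounds_reject(int new_mtu)
{
	int min_mtu = ETH_ZLEN - ETH_HLEN; /* 46 */
	int max_mtu = QEDE_MAX_JUMBO_PACKET_SIZE;

	return new_mtu < min_mtu || new_mtu > max_mtu;
}
```

Both predicates reject exactly the same MTU values, so moving the limits into min_mtu and max_mtu should not change which requests are accepted.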

CC: netdev@vger.kernel.org
CC: dept-gelinuxnic...@qlogic.com
CC: Yuval Mintz 
CC: Ariel Elior 
Signed-off-by: Jarod Wilson 
---
 drivers/net/ethernet/qlogic/qede/qede.h          | 5 +++--
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c  | 8 --------
 drivers/net/ethernet/qlogic/qede/qede_main.c     | 4 ++++
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c   | 6 ------
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 4 ++++
 5 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede.h 
b/drivers/net/ethernet/qlogic/qede/qede.h
index f50e527..9135b9d 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -362,8 +362,9 @@ void qede_recycle_rx_bd_ring(struct qede_rx_queue *rxq, 
struct qede_dev *edev,
 #define NUM_TX_BDS_MIN 128
 #define NUM_TX_BDS_DEF NUM_TX_BDS_MAX
 
-#define QEDE_MIN_PKT_LEN   64
-#define QEDE_RX_HDR_SIZE   256
+#define QEDE_MIN_PKT_LEN   64
+#define QEDE_RX_HDR_SIZE   256
+#define QEDE_MAX_JUMBO_PACKET_SIZE 9600
 #define for_each_queue(i) for (i = 0; i < edev->num_queues; i++)
 
 #endif /* _QEDE_H_ */
diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c 
b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
index 25a9b29..b7dbb44 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
@@ -723,19 +723,11 @@ static void qede_update_mtu(struct qede_dev *edev, union 
qede_reload_args *args)
 }
 
 /* Netdevice NDOs */
-#define ETH_MAX_JUMBO_PACKET_SIZE  9600
-#define ETH_MIN_PACKET_SIZE60
 int qede_change_mtu(struct net_device *ndev, int new_mtu)
 {
struct qede_dev *edev = netdev_priv(ndev);
union qede_reload_args args;
 
-   if ((new_mtu > ETH_MAX_JUMBO_PACKET_SIZE) ||
-   ((new_mtu + ETH_HLEN) < ETH_MIN_PACKET_SIZE)) {
-   DP_ERR(edev, "Can't support requested MTU size\n");
-   return -EINVAL;
-   }
-
DP_VERBOSE(edev, (NETIF_MSG_IFUP | NETIF_MSG_IFDOWN),
   "Configuring MTU size of %d\n", new_mtu);
 
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c 
b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 0e483af..4f29865 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -2391,6 +2391,10 @@ static void qede_init_ndev(struct qede_dev *edev)
 
ndev->hw_features = hw_features;
 
+   /* MTU range: 46 - 9600 */
+   ndev->min_mtu = ETH_ZLEN - ETH_HLEN;
+   ndev->max_mtu = QEDE_MAX_JUMBO_PACKET_SIZE;
+
/* Set network device HW mac */
ether_addr_copy(edev->ndev->dev_addr, edev->dev_info.common.hw_mac);
 }
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c 
b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c
index 509b596..838cc0c 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c
@@ -1024,12 +1024,6 @@ int qlcnic_change_mtu(struct net_device *netdev, int mtu)
struct qlcnic_adapter *adapter = netdev_priv(netdev);
int rc = 0;
 
-   if (mtu < P3P_MIN_MTU || mtu > P3P_MAX_MTU) {
-   dev_err(&adapter->netdev->dev, "%d bytes < mtu < %d bytes"
-   " not supported\n", P3P_MAX_MTU, P3P_MIN_MTU);
-   return -EINVAL;
-   }
-
rc = qlcnic_fw_cmd_set_mtu(adapter, mtu);
 
if (!rc)
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c 
b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
index 3ae3968..4c0cce96 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
@@ -2342,6 +2342,10 @@ qlcnic_setup_netdev(struct qlcnic_adapter *adapter, 
struct net_device *netdev,
netdev->priv_flags |= IFF_UNICAST_FLT;
netdev->irq = adapter->msix_entries[0].vector;
 
+   /* MTU range: 68 - 9600 */
+   netdev->min_mtu = P3P_MIN_MTU;
+   netdev->max_mtu = P3P_MAX_MTU;
+
err = qlcnic_set_real_num_queues(adapter, adapter->drv_tx_rings,
 adapter->drv_sds_rings);
if (err)
-- 
2.10.0

[PATCH net-next 07/15] ethernet/realtek: use core min/max MTU checking

2016-10-17 Thread Jarod Wilson

8139cp: min_mtu 60, max_mtu 4096

8139too: min_mtu 68, max_mtu 1770

r8169: min_mtu 60, max_mtu depends on chipset, 1500 to 9k-ish

CC: netdev@vger.kernel.org
CC: Realtek linux nic maintainers 
Signed-off-by: Jarod Wilson 
---
 drivers/net/ethernet/realtek/8139cp.c  |  8 ++++----
 drivers/net/ethernet/realtek/8139too.c | 13 ++++---------
 drivers/net/ethernet/realtek/r8169.c   |  8 ++++----
 3 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/realtek/8139cp.c 
b/drivers/net/ethernet/realtek/8139cp.c
index 5297bf7..b7c89eb 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -1277,10 +1277,6 @@ static int cp_change_mtu(struct net_device *dev, int 
new_mtu)
 {
struct cp_private *cp = netdev_priv(dev);
 
-   /* check for invalid MTU, according to hardware limits */
-   if (new_mtu < CP_MIN_MTU || new_mtu > CP_MAX_MTU)
-   return -EINVAL;
-
/* if network interface not up, no need for complexity */
if (!netif_running(dev)) {
dev->mtu = new_mtu;
@@ -2010,6 +2006,10 @@ static int cp_init_one (struct pci_dev *pdev, const 
struct pci_device_id *ent)
dev->vlan_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO |
NETIF_F_HIGHDMA;
 
+   /* MTU range: 60 - 4096 */
+   dev->min_mtu = CP_MIN_MTU;
+   dev->max_mtu = CP_MAX_MTU;
+
rc = register_netdev(dev);
if (rc)
goto err_out_iomap;
diff --git a/drivers/net/ethernet/realtek/8139too.c 
b/drivers/net/ethernet/realtek/8139too.c
index da4c2d8..9bc047a 100644
--- a/drivers/net/ethernet/realtek/8139too.c
+++ b/drivers/net/ethernet/realtek/8139too.c
@@ -924,19 +924,10 @@ static int rtl8139_set_features(struct net_device *dev, 
netdev_features_t featur
return 0;
 }
 
-static int rtl8139_change_mtu(struct net_device *dev, int new_mtu)
-{
-   if (new_mtu < 68 || new_mtu > MAX_ETH_DATA_SIZE)
-   return -EINVAL;
-   dev->mtu = new_mtu;
-   return 0;
-}
-
 static const struct net_device_ops rtl8139_netdev_ops = {
.ndo_open   = rtl8139_open,
.ndo_stop   = rtl8139_close,
.ndo_get_stats64= rtl8139_get_stats64,
-   .ndo_change_mtu = rtl8139_change_mtu,
.ndo_validate_addr  = eth_validate_addr,
.ndo_set_mac_address= rtl8139_set_mac_address,
.ndo_start_xmit = rtl8139_start_xmit,
@@ -1022,6 +1013,10 @@ static int rtl8139_init_one(struct pci_dev *pdev,
dev->hw_features |= NETIF_F_RXALL;
dev->hw_features |= NETIF_F_RXFCS;
 
+   /* MTU range: 68 - 1770 */
+   dev->min_mtu = ETH_MIN_MTU;
+   dev->max_mtu = MAX_ETH_DATA_SIZE;
+
/* tp zeroed and aligned in alloc_etherdev */
tp = netdev_priv(dev);
 
diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index e55638c..b698ea5 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6673,10 +6673,6 @@ static int rtl8169_change_mtu(struct net_device *dev, 
int new_mtu)
 {
struct rtl8169_private *tp = netdev_priv(dev);
 
-   if (new_mtu < ETH_ZLEN ||
-   new_mtu > rtl_chip_infos[tp->mac_version].jumbo_max)
-   return -EINVAL;
-
if (new_mtu > ETH_DATA_LEN)
rtl_hw_jumbo_enable(tp);
else
@@ -8430,6 +8426,10 @@ static int rtl_init_one(struct pci_dev *pdev, const 
struct pci_device_id *ent)
dev->hw_features |= NETIF_F_RXALL;
dev->hw_features |= NETIF_F_RXFCS;
 
+   /* MTU range: 60 - hw-specific max */
+   dev->min_mtu = ETH_ZLEN;
+   dev->max_mtu = rtl_chip_infos[chipset].jumbo_max;
+
tp->hw_start = cfg->hw_start;
tp->event_slow = cfg->event_slow;
 
-- 
2.10.0

[PATCH net-next 01/15] ethernet/atheros: use core min/max MTU checking

2016-10-17 Thread Jarod Wilson

atl2: min_mtu 40, max_mtu 1504

- Remove a few redundant defines that already have equivalents in
  if_ether.h.

atl1: min_mtu 42, max_mtu 10218

atl1e: min_mtu 42, max_mtu 8170

atl1c: min_mtu 42, max_mtu 6122/1500

- GbE hardware gets a max_mtu of 6122, slower hardware gets 1500.

alx: min_mtu 34, max_mtu 9256

- Not so sure that minimum MTU number is really what was intended, but
  that's what the math actually makes it out to be, due to max_frame
  manipulations and comparison in alx_change_mtu, rather than just
  comparing new_mtu. (I think 68 was the intended min_mtu value).

CC: netdev@vger.kernel.org
CC: Jay Cliburn 
CC: Chris Snook 
Signed-off-by: Jarod Wilson 
---
 drivers/net/ethernet/atheros/alx/hw.h   |  1 -
 drivers/net/ethernet/atheros/alx/main.c | 10 ++
 drivers/net/ethernet/atheros/atl1c/atl1c_main.c | 41 -
 drivers/net/ethernet/atheros/atl1e/atl1e_main.c | 12 +++-
 drivers/net/ethernet/atheros/atlx/atl1.c| 15 -
 drivers/net/ethernet/atheros/atlx/atl2.c| 16 +-
 drivers/net/ethernet/atheros/atlx/atl2.h|  3 --
 7 files changed, 47 insertions(+), 51 deletions(-)

diff --git a/drivers/net/ethernet/atheros/alx/hw.h 
b/drivers/net/ethernet/atheros/alx/hw.h
index 0191477..e42d7e0 100644
--- a/drivers/net/ethernet/atheros/alx/hw.h
+++ b/drivers/net/ethernet/atheros/alx/hw.h
@@ -351,7 +351,6 @@ struct alx_rrd {
 #define ALX_MAX_JUMBO_PKT_SIZE (9*1024)
 #define ALX_MAX_TSO_PKT_SIZE   (7*1024)
 #define ALX_MAX_FRAME_SIZE ALX_MAX_JUMBO_PKT_SIZE
-#define ALX_MIN_FRAME_SIZE (ETH_ZLEN + ETH_FCS_LEN + VLAN_HLEN)
 
 #define ALX_MAX_RX_QUEUES  8
 #define ALX_MAX_TX_QUEUES  4
diff --git a/drivers/net/ethernet/atheros/alx/main.c b/drivers/net/ethernet/atheros/alx/main.c
index c0f84b7..eccbacd 100644
--- a/drivers/net/ethernet/atheros/alx/main.c
+++ b/drivers/net/ethernet/atheros/alx/main.c
@@ -892,6 +892,9 @@ static int alx_init_sw(struct alx_priv *alx)
hw->smb_timer = 400;
hw->mtu = alx->dev->mtu;
alx->rxbuf_size = ALX_MAX_FRAME_LEN(hw->mtu);
+   /* MTU range: 34 - 9256 */
+   alx->dev->min_mtu = 34;
+   alx->dev->max_mtu = ALX_MAX_FRAME_LEN(ALX_MAX_FRAME_SIZE);
alx->tx_ringsz = 256;
alx->rx_ringsz = 512;
hw->imt = 200;
@@ -994,13 +997,6 @@ static int alx_change_mtu(struct net_device *netdev, int mtu)
struct alx_priv *alx = netdev_priv(netdev);
int max_frame = ALX_MAX_FRAME_LEN(mtu);
 
-   if ((max_frame < ALX_MIN_FRAME_SIZE) ||
-   (max_frame > ALX_MAX_FRAME_SIZE))
-   return -EINVAL;
-
-   if (netdev->mtu == mtu)
-   return 0;
-
netdev->mtu = mtu;
alx->hw.mtu = mtu;
alx->rxbuf_size = max(max_frame, ALX_DEF_RXBUF_SIZE);
diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
index a3200ea..773d3b7 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
@@ -519,6 +519,26 @@ static int atl1c_set_features(struct net_device *netdev,
return 0;
 }
 
+static void atl1c_set_max_mtu(struct net_device *netdev)
+{
+   struct atl1c_adapter *adapter = netdev_priv(netdev);
+   struct atl1c_hw *hw = &adapter->hw;
+
+   switch (hw->nic_type) {
+   /* These (GbE) devices support jumbo packets, max_mtu 6122 */
+   case athr_l1c:
+   case athr_l1d:
+   case athr_l1d_2:
+   netdev->max_mtu = MAX_JUMBO_FRAME_SIZE -
+ (ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN);
+   break;
+   /* The 10/100 devices don't support jumbo packets, max_mtu 1500 */
+   default:
+   netdev->max_mtu = ETH_DATA_LEN;
+   break;
+   }
+}
+
 /**
  * atl1c_change_mtu - Change the Maximum Transfer Unit
  * @netdev: network interface device structure
@@ -529,22 +549,9 @@ static int atl1c_set_features(struct net_device *netdev,
 static int atl1c_change_mtu(struct net_device *netdev, int new_mtu)
 {
struct atl1c_adapter *adapter = netdev_priv(netdev);
-   struct atl1c_hw *hw = &adapter->hw;
-   int old_mtu   = netdev->mtu;
-   int max_frame = new_mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
-
-   /* Fast Ethernet controller doesn't support jumbo packet */
-   if (((hw->nic_type == athr_l2c ||
- hw->nic_type == athr_l2c_b ||
- hw->nic_type == athr_l2c_b2) && new_mtu > ETH_DATA_LEN) ||
- max_frame < ETH_ZLEN + ETH_FCS_LEN ||
- max_frame > MAX_JUMBO_FRAME_SIZE) {
-   if (netif_msg_link(adapter))
-   dev_warn(&adapter->pdev->dev, "invalid MTU setting\n");
-   return -EINVAL;
-   }
+
/* set MTU */
-   if (old_mtu != new_mtu && netif_running(netdev)) {
+   if (netif_running(netdev)) {
while

[PATCH net-next 02/15] ethernet/broadcom: use core min/max MTU checking

2016-10-17 Thread Jarod Wilson

tg3: min_mtu 60, max_mtu 9000/1500

bnxt: min_mtu 60, max_mtu 9000

bnx2x: min_mtu 46, max_mtu 9600
- Fix up ETH_OVREHEAD -> ETH_OVERHEAD while we're in here, remove
  duplicated defines from bnx2x_link.c.

bnx2: min_mtu 46, max_mtu 9000
- Use more standard ETH_* defines while we're at it.

bcm63xx_enet: min_mtu 46, max_mtu 2028
- compute_hw_mtu was made largely pointless, and thus merged back into
  bcm_enet_change_mtu.

b44: min_mtu 60, max_mtu 1500
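
The bcm63xx_enet limits above can be recovered from the range test this
patch removes from compute_hw_mtu, which compared mtu + VLAN_ETH_HLEN
against 64 and BCMENET_MAX_MTU. The sketch below replays that test over
candidate MTUs. BCMENET_MAX_MTU is assumed to be 2046 (inferred from the
quoted max_mtu of 2028, not shown in the patch), and old_check and
accepted_range are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

#define VLAN_ETH_HLEN     18   /* Ethernet header plus one VLAN tag */
#define BCMENET_MAX_MTU 2046   /* assumption: inferred from max_mtu 2028 */

/* The removed range test: returns 0 if the MTU was accepted, -1 if not. */
int old_check(int mtu)
{
    int actual_mtu = mtu + VLAN_ETH_HLEN;

    if (actual_mtu < 64 || actual_mtu > BCMENET_MAX_MTU)
        return -1;
    return 0;
}

/* Scan candidate MTUs and report the smallest and largest accepted one. */
void accepted_range(int *min_mtu, int *max_mtu)
{
    assert(min_mtu != NULL && max_mtu != NULL);

    *min_mtu = -1;
    *max_mtu = -1;
    for (int mtu = 0; mtu <= 65535; mtu++) {
        if (old_check(mtu) != 0)
            continue;
        if (*min_mtu < 0)
            *min_mtu = mtu;
        *max_mtu = mtu;
    }
}
```

With those values the accepted range comes out as 46 to 2028, matching the
min_mtu and max_mtu listed for bcm63xx_enet.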

CC: netdev@vger.kernel.org
CC: Michael Chan 
CC: Sony Chacko 
CC: Ariel Elior 
CC: dept-hsglinuxnic...@qlogic.com
CC: Siva Reddy Kallam 
CC: Prashant Sreedharan 
Signed-off-by: Jarod Wilson 
---
 drivers/net/ethernet/broadcom/b44.c  |  9 +++---
 drivers/net/ethernet/broadcom/bcm63xx_enet.c | 35 
 drivers/net/ethernet/broadcom/bnx2.c | 16 +--
 drivers/net/ethernet/broadcom/bnx2.h |  6 ++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h  |  6 ++--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  |  8 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c | 22 ++-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |  7 +++--
 drivers/net/ethernet/broadcom/bnxt/bnxt.c|  7 +++--
 drivers/net/ethernet/broadcom/tg3.c  |  9 +++---
 10 files changed, 51 insertions(+), 74 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/b44.c b/drivers/net/ethernet/broadcom/b44.c
index 17aa33c..1df3048 100644
--- a/drivers/net/ethernet/broadcom/b44.c
+++ b/drivers/net/ethernet/broadcom/b44.c
@@ -59,8 +59,8 @@
 #define B44_TX_TIMEOUT (5 * HZ)
 
 /* hardware minimum and maximum for a single frame's data payload */
-#define B44_MIN_MTU60
-#define B44_MAX_MTU1500
+#define B44_MIN_MTUETH_ZLEN
+#define B44_MAX_MTUETH_DATA_LEN
 
 #define B44_RX_RING_SIZE   512
 #define B44_DEF_RX_RING_PENDING200
@@ -1064,9 +1064,6 @@ static int b44_change_mtu(struct net_device *dev, int new_mtu)
 {
struct b44 *bp = netdev_priv(dev);
 
-   if (new_mtu < B44_MIN_MTU || new_mtu > B44_MAX_MTU)
-   return -EINVAL;
-
if (!netif_running(dev)) {
/* We'll just catch it later when the
 * device is up'd.
@@ -2377,6 +2374,8 @@ static int b44_init_one(struct ssb_device *sdev,
dev->netdev_ops = _netdev_ops;
netif_napi_add(dev, >napi, b44_poll, 64);
dev->watchdog_timeo = B44_TX_TIMEOUT;
+   dev->min_mtu = B44_MIN_MTU;
+   dev->max_mtu = B44_MAX_MTU;
dev->irq = sdev->irq;
dev->ethtool_ops = _ethtool_ops;
 
diff --git a/drivers/net/ethernet/broadcom/bcm63xx_enet.c b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
index ae364c7..7e513ca 100644
--- a/drivers/net/ethernet/broadcom/bcm63xx_enet.c
+++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
@@ -1622,20 +1622,19 @@ static int bcm_enet_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 }
 
 /*
- * calculate actual hardware mtu
+ * adjust mtu, can't be called while device is running
  */
-static int compute_hw_mtu(struct bcm_enet_priv *priv, int mtu)
+static int bcm_enet_change_mtu(struct net_device *dev, int new_mtu)
 {
-   int actual_mtu;
+   struct bcm_enet_priv *priv = netdev_priv(dev);
+   int actual_mtu = new_mtu;
 
-   actual_mtu = mtu;
+   if (netif_running(dev))
+   return -EBUSY;
 
/* add ethernet header + vlan tag size */
actual_mtu += VLAN_ETH_HLEN;
 
-   if (actual_mtu < 64 || actual_mtu > BCMENET_MAX_MTU)
-   return -EINVAL;
-
/*
 * setup maximum size before we get overflow mark in
 * descriptor, note that this will not prevent reception of
@@ -1650,22 +1649,7 @@ static int compute_hw_mtu(struct bcm_enet_priv *priv, int mtu)
 */
priv->rx_skb_size = ALIGN(actual_mtu + ETH_FCS_LEN,
  priv->dma_maxburst * 4);
-   return 0;
-}
 
-/*
- * adjust mtu, can't be called while device is running
- */
-static int bcm_enet_change_mtu(struct net_device *dev, int new_mtu)
-{
-   int ret;
-
-   if (netif_running(dev))
-   return -EBUSY;
-
-   ret = compute_hw_mtu(netdev_priv(dev), new_mtu);
-   if (ret)
-   return ret;
dev->mtu = new_mtu;
return 0;
 }
@@ -1755,7 +1739,7 @@ static int bcm_enet_probe(struct platform_device *pdev)
priv->enet_is_sw = false;
priv->dma_maxburst = BCMENET_DMA_MAXBURST;
 
-   ret = compute_hw_mtu(priv, dev->mtu);
+   ret = bcm_enet_change_mtu(dev, dev->mtu);
if (ret)
goto out;
 
@@ -1888,6 +1872,9 @@ static int bcm_enet_probe(struct platform_device *pdev)
netif_napi_add(dev, >napi, bcm_enet_poll, 16);

[PATCH net-next 05/15] ethernet/mellanox: use core min/max MTU checking

2016-10-17 Thread Jarod Wilson

mlx4: min_mtu 46, max_mtu depends on hardware

mlx5: min_mtu 68, max_mtu depends on hardware

CC: netdev@vger.kernel.org
CC: Tariq Toukan 
CC: Saeed Mahameed 
Signed-off-by: Jarod Wilson 
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c|  8 
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 24 ++-
 2 files changed, 10 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 7e703be..bf35ac4 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2205,10 +2205,6 @@ static int mlx4_en_change_mtu(struct net_device *dev, int new_mtu)
en_dbg(DRV, priv, "Change MTU called - current:%d new:%d\n",
 dev->mtu, new_mtu);
 
-   if ((new_mtu < MLX4_EN_MIN_MTU) || (new_mtu > priv->max_mtu)) {
-   en_err(priv, "Bad MTU size:%d.\n", new_mtu);
-   return -EPERM;
-   }
if (priv->xdp_ring_num && MLX4_EN_EFF_MTU(new_mtu) > FRAG_SZ0) {
en_err(priv, "MTU size:%d requires frags but XDP running\n",
   new_mtu);
@@ -3288,6 +3284,10 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
dev->gso_partial_features = NETIF_F_GSO_UDP_TUNNEL_CSUM;
}
 
+   /* MTU range: 46 - hw-specific max */
+   dev->min_mtu = MLX4_EN_MIN_MTU;
+   dev->max_mtu = priv->max_mtu;
+
mdev->pndev[port] = dev;
mdev->upper[port] = NULL;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 7eaf380..03183eb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2851,31 +2851,13 @@ static int mlx5e_set_features(struct net_device *netdev,
return err ? -EINVAL : 0;
 }
 
-#define MXL5_HW_MIN_MTU 64
-#define MXL5E_MIN_MTU (MXL5_HW_MIN_MTU + ETH_FCS_LEN)
-
 static int mlx5e_change_mtu(struct net_device *netdev, int new_mtu)
 {
struct mlx5e_priv *priv = netdev_priv(netdev);
-   struct mlx5_core_dev *mdev = priv->mdev;
bool was_opened;
-   u16 max_mtu;
-   u16 min_mtu;
int err = 0;
bool reset;
 
-   mlx5_query_port_max_mtu(mdev, &max_mtu, 1);
-
-   max_mtu = MLX5E_HW2SW_MTU(max_mtu);
-   min_mtu = MLX5E_HW2SW_MTU(MXL5E_MIN_MTU);
-
-   if (new_mtu > max_mtu || new_mtu < min_mtu) {
-   netdev_err(netdev,
-  "%s: Bad MTU (%d), valid range is: [%d..%d]\n",
-  __func__, new_mtu, min_mtu, max_mtu);
-   return -EINVAL;
-   }
-
mutex_lock(&priv->state_lock);
 
reset = !priv->params.lro_en &&
@@ -3835,6 +3817,7 @@ int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev)
 {
const struct mlx5e_profile *profile;
struct mlx5e_priv *priv;
+   u16 max_mtu;
int err;
 
priv = netdev_priv(netdev);
@@ -3865,6 +3848,11 @@ int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev)
 
mlx5e_init_l2_addr(priv);
 
+   /* MTU range: 68 - hw-specific max */
+   netdev->min_mtu = ETH_MIN_MTU;
+   mlx5_query_port_max_mtu(priv->mdev, &max_mtu, 1);
+   netdev->max_mtu = MLX5E_HW2SW_MTU(max_mtu);
+
mlx5e_set_dev_port_mtu(netdev);
 
if (profile->enable)
-- 
2.10.0

[PATCH net-next 08/15] ethernet/sun: use core min/max MTU checking

2016-10-17 Thread Jarod Wilson

cassini: min_mtu 60, max_mtu 9000

niu: min_mtu 68, max_mtu 9216

sungem: min_mtu 68, max_mtu 1500 (comments say jumbo mode is broken)

sunvnet: min_mtu 68, max_mtu 65535
- removed sunvnet_change_mtu_common as it does nothing now

CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/ethernet/sun/cassini.c|  7 ---
 drivers/net/ethernet/sun/ldmvsw.c |  5 -
 drivers/net/ethernet/sun/niu.c|  7 ---
 drivers/net/ethernet/sun/sungem.c | 11 ++-
 drivers/net/ethernet/sun/sunvnet.c|  5 -
 drivers/net/ethernet/sun/sunvnet_common.c | 10 --
 drivers/net/ethernet/sun/sunvnet_common.h |  3 ++-
 7 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/sun/cassini.c b/drivers/net/ethernet/sun/cassini.c
index 062bce9..e9e5ef2 100644
--- a/drivers/net/ethernet/sun/cassini.c
+++ b/drivers/net/ethernet/sun/cassini.c
@@ -3863,9 +3863,6 @@ static int cas_change_mtu(struct net_device *dev, int new_mtu)
 {
struct cas *cp = netdev_priv(dev);
 
-   if (new_mtu < CAS_MIN_MTU || new_mtu > CAS_MAX_MTU)
-   return -EINVAL;
-
dev->mtu = new_mtu;
if (!netif_running(dev) || !netif_device_present(dev))
return 0;
@@ -5115,6 +5112,10 @@ static int cas_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
if (pci_using_dac)
dev->features |= NETIF_F_HIGHDMA;
 
+   /* MTU range: 60 - varies or 9000 */
+   dev->min_mtu = CAS_MIN_MTU;
+   dev->max_mtu = CAS_MAX_MTU;
+
if (register_netdev(dev)) {
dev_err(>dev, "Cannot register net device, aborting\n");
goto err_out_free_consistent;
diff --git a/drivers/net/ethernet/sun/ldmvsw.c b/drivers/net/ethernet/sun/ldmvsw.c
index 0ac449a..335b876 100644
--- a/drivers/net/ethernet/sun/ldmvsw.c
+++ b/drivers/net/ethernet/sun/ldmvsw.c
@@ -139,7 +139,6 @@ static const struct net_device_ops vsw_ops = {
.ndo_set_mac_address= sunvnet_set_mac_addr_common,
.ndo_validate_addr  = eth_validate_addr,
.ndo_tx_timeout = sunvnet_tx_timeout_common,
-   .ndo_change_mtu = sunvnet_change_mtu_common,
.ndo_start_xmit = vsw_start_xmit,
.ndo_select_queue   = vsw_select_queue,
 #ifdef CONFIG_NET_POLL_CONTROLLER
@@ -239,6 +238,10 @@ static struct net_device *vsw_alloc_netdev(u8 hwaddr[],
   NETIF_F_HW_CSUM | NETIF_F_SG;
dev->features = dev->hw_features;
 
+   /* MTU range: 68 - 65535 */
+   dev->min_mtu = ETH_MIN_MTU;
+   dev->max_mtu = VNET_MAX_MTU;
+
SET_NETDEV_DEV(dev, >dev);
 
return dev;
diff --git a/drivers/net/ethernet/sun/niu.c b/drivers/net/ethernet/sun/niu.c
index a2371aa..f90d1af 100644
--- a/drivers/net/ethernet/sun/niu.c
+++ b/drivers/net/ethernet/sun/niu.c
@@ -6754,9 +6754,6 @@ static int niu_change_mtu(struct net_device *dev, int new_mtu)
struct niu *np = netdev_priv(dev);
int err, orig_jumbo, new_jumbo;
 
-   if (new_mtu < 68 || new_mtu > NIU_MAX_MTU)
-   return -EINVAL;
-
orig_jumbo = (dev->mtu > ETH_DATA_LEN);
new_jumbo = (new_mtu > ETH_DATA_LEN);
 
@@ -9823,6 +9820,10 @@ static int niu_pci_init_one(struct pci_dev *pdev,
 
dev->irq = pdev->irq;
 
+   /* MTU range: 68 - 9216 */
+   dev->min_mtu = ETH_MIN_MTU;
+   dev->max_mtu = NIU_MAX_MTU;
+
niu_assign_netdev_ops(dev);
 
err = niu_get_invariants(np);
diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c
index d6ad0fb..66ecf0f 100644
--- a/drivers/net/ethernet/sun/sungem.c
+++ b/drivers/net/ethernet/sun/sungem.c
@@ -2476,9 +2476,9 @@ static void gem_set_multicast(struct net_device *dev)
 }
 
 /* Jumbo-grams don't seem to work :-( */
-#define GEM_MIN_MTU68
+#define GEM_MIN_MTUETH_MIN_MTU
 #if 1
-#define GEM_MAX_MTU1500
+#define GEM_MAX_MTUETH_DATA_LEN
 #else
 #define GEM_MAX_MTU9000
 #endif
@@ -2487,9 +2487,6 @@ static int gem_change_mtu(struct net_device *dev, int new_mtu)
 {
struct gem *gp = netdev_priv(dev);
 
-   if (new_mtu < GEM_MIN_MTU || new_mtu > GEM_MAX_MTU)
-   return -EINVAL;
-
dev->mtu = new_mtu;
 
/* We'll just catch it later when the device is up'd or resumed */
@@ -2977,6 +2974,10 @@ static int gem_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
if (pci_using_dac)
dev->features |= NETIF_F_HIGHDMA;
 
+   /* MTU range: 68 - 1500 (Jumbo mode is broken) */
+   dev->min_mtu = GEM_MIN_MTU;
+   dev->max_mtu = GEM_MAX_MTU;
+
/* Register with kernel */
if (register_netdev(dev)) {
pr_err("Cannot register net device, aborting\n");
diff --git a/drivers/net/ethernet/sun/sunvnet.c b/drivers/net/ethernet/sun/sunvnet.c
index a2f9b47..5356a70 100644
--- a/drivers/net/ethernet/sun/sunvnet.c

[PATCH net-next 12/15] ethernet/ibm: use core min/max MTU checking

2016-10-17 Thread Jarod Wilson

ehea: min_mtu 68, max_mtu 9022
- remove ehea_change_mtu, it's now redundant

emac: min_mtu 46, max_mtu 1500 or whatever gets read from OF
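
A driver hook such as ehea_change_mtu becomes redundant once the range test
lives in the core, because nothing is left for the driver to do beyond
storing the new value. The sketch below models that division of labour in
plain C. It is an illustration only and not the kernel's actual code:
struct toy_netdev and toy_set_mtu are hypothetical names, and treating a
max_mtu of 0 as "no upper limit" is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct net_device. */
struct toy_netdev {
    int mtu;
    int min_mtu;
    int max_mtu;   /* assumption: 0 means no upper limit */
    /* Optional driver hook, only needed for hardware-specific work. */
    int (*change_mtu)(struct toy_netdev *dev, int new_mtu);
};

/* Core-side MTU change: validate the range once, then defer to the driver. */
int toy_set_mtu(struct toy_netdev *dev, int new_mtu)
{
    assert(dev != NULL);

    if (new_mtu == dev->mtu)
        return 0;                       /* nothing to do */
    if (new_mtu < 0 || new_mtu < dev->min_mtu)
        return -EINVAL;                 /* below the advertised minimum */
    if (dev->max_mtu > 0 && new_mtu > dev->max_mtu)
        return -EINVAL;                 /* above the advertised maximum */

    if (dev->change_mtu)
        return dev->change_mtu(dev, new_mtu);

    dev->mtu = new_mtu;                 /* no hook: just store the value */
    return 0;
}
```

With min_mtu 68 and max_mtu 9022, as listed for ehea, a device with no hook
rejects 67 and 9023 and accepts 9000, which is all the removed function did.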

CC: netdev@vger.kernel.org
CC: Douglas Miller 
Signed-off-by: Jarod Wilson 
---
 drivers/net/ethernet/ibm/ehea/ehea_main.c | 13 -
 drivers/net/ethernet/ibm/emac/core.c  |  9 +
 2 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ehea/ehea_main.c b/drivers/net/ethernet/ibm/ehea/ehea_main.c
index 54efa9a..e9719ba 100644
--- a/drivers/net/ethernet/ibm/ehea/ehea_main.c
+++ b/drivers/net/ethernet/ibm/ehea/ehea_main.c
@@ -1981,14 +1981,6 @@ static void ehea_set_multicast_list(struct net_device *dev)
ehea_update_bcmc_registrations();
 }
 
-static int ehea_change_mtu(struct net_device *dev, int new_mtu)
-{
-   if ((new_mtu < 68) || (new_mtu > EHEA_MAX_PACKET_SIZE))
-   return -EINVAL;
-   dev->mtu = new_mtu;
-   return 0;
-}
-
 static void xmit_common(struct sk_buff *skb, struct ehea_swqe *swqe)
 {
swqe->tx_control |= EHEA_SWQE_IMM_DATA_PRESENT | EHEA_SWQE_CRC;
@@ -2968,7 +2960,6 @@ static const struct net_device_ops ehea_netdev_ops = {
.ndo_set_mac_address= ehea_set_mac_addr,
.ndo_validate_addr  = eth_validate_addr,
.ndo_set_rx_mode= ehea_set_multicast_list,
-   .ndo_change_mtu = ehea_change_mtu,
.ndo_vlan_rx_add_vid= ehea_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid   = ehea_vlan_rx_kill_vid,
.ndo_tx_timeout = ehea_tx_watchdog,
@@ -3041,6 +3032,10 @@ static struct ehea_port *ehea_setup_single_port(struct ehea_adapter *adapter,
NETIF_F_IP_CSUM;
dev->watchdog_timeo = EHEA_WATCH_DOG_TIMEOUT;
 
+   /* MTU range: 68 - 9022 */
+   dev->min_mtu = ETH_MIN_MTU;
+   dev->max_mtu = EHEA_MAX_PACKET_SIZE;
+
INIT_WORK(>reset_task, ehea_reset_port);
INIT_DELAYED_WORK(>stats_work, ehea_update_stats);
 
diff --git a/drivers/net/ethernet/ibm/emac/core.c b/drivers/net/ethernet/ibm/emac/core.c
index 5d804a5..52a69c9 100644
--- a/drivers/net/ethernet/ibm/emac/core.c
+++ b/drivers/net/ethernet/ibm/emac/core.c
@@ -1099,9 +1099,6 @@ static int emac_change_mtu(struct net_device *ndev, int new_mtu)
struct emac_instance *dev = netdev_priv(ndev);
int ret = 0;
 
-   if (new_mtu < EMAC_MIN_MTU || new_mtu > dev->max_mtu)
-   return -EINVAL;
-
DBG(dev, "change_mtu(%d)" NL, new_mtu);
 
if (netif_running(ndev)) {
@@ -2564,7 +2561,7 @@ static int emac_init_config(struct emac_instance *dev)
if (emac_read_uint_prop(np, "cell-index", >cell_index, 1))
return -ENXIO;
if (emac_read_uint_prop(np, "max-frame-size", >max_mtu, 0))
-   dev->max_mtu = 1500;
+   dev->max_mtu = ETH_DATA_LEN;
if (emac_read_uint_prop(np, "rx-fifo-size", >rx_fifo_size, 0))
dev->rx_fifo_size = 2048;
if (emac_read_uint_prop(np, "tx-fifo-size", >tx_fifo_size, 0))
@@ -2890,6 +2887,10 @@ static int emac_probe(struct platform_device *ofdev)
ndev->netdev_ops = _netdev_ops;
ndev->ethtool_ops = _ethtool_ops;
 
+   /* MTU range: 46 - 1500 or whatever is in OF */
+   ndev->min_mtu = EMAC_MIN_MTU;
+   ndev->max_mtu = dev->max_mtu;
+
netif_carrier_off(ndev);
 
err = register_netdev(ndev);
-- 
2.10.0

[PATCH net-next 15/15] ethernet: use core min/max MTU checking

2016-10-17 Thread Jarod Wilson

et131x: min_mtu 64, max_mtu 9216

altera_tse: min_mtu 64, max_mtu 1500

amd8111e: min_mtu 60, max_mtu 9000

bnad: min_mtu 46, max_mtu 9000

macb: min_mtu 68, max_mtu 1500 or 10240 depending on hardware capability

xgmac: min_mtu 46, max_mtu 9000

cxgb2: min_mtu 68, max_mtu 9582 (pm3393) or 9600 (vsc7326)

enic: min_mtu 68, max_mtu 9000

gianfar: min_mtu 50, max_mtu 9586

hns_enet: min_mtu 68, max_mtu 9578 (v1) or 9706 (v2)

ksz884x: min_mtu 60, max_mtu 1894

myri10ge: min_mtu 68, max_mtu 9000

natsemi: min_mtu 64, max_mtu 2024

nfp: min_mtu 68, max_mtu hardware-specific

forcedeth: min_mtu 64, max_mtu 1500 or 9100, depending on hardware

pch_gbe: min_mtu 46, max_mtu 10300

pasemi_mac: min_mtu 64, max_mtu 9000

qcaspi: min_mtu 46, max_mtu 1500
- remove qcaspi_netdev_change_mtu as it is now redundant

rocker: min_mtu 68, max_mtu 9000

sxgbe: min_mtu 68, max_mtu 9000

stmmac: min_mtu 46, max_mtu depends on hardware

tehuti: min_mtu 60, max_mtu 16384
- driver had no max mtu checking, but product docs say 16k jumbo packets
  are supported by the hardware

netcp: min_mtu 68, max_mtu 9486
- remove netcp_ndo_change_mtu as it is now redundant

via-velocity: min_mtu 64, max_mtu 9000

octeon: min_mtu 46, max_mtu 65370

CC: netdev@vger.kernel.org
CC: Mark Einon 
CC: Vince Bridgers 
CC: Rasesh Mody 
CC: Nicolas Ferre 
CC: Santosh Raspatur 
CC: Hariprasad S 
CC:  Christian Benvenuti 
CC: Sujith Sankar 
CC: Govindarajulu Varadarajan <_gov...@gmx.com>
CC: Neel Patel 
CC: Claudiu Manoil 
CC: Yisen Zhuang 
CC: Salil Mehta 
CC: Hyong-Youb Kim 
CC: Jakub Kicinski 
CC: Olof Johansson 
CC: Jiri Pirko 
CC: Byungho An 
CC: Girish K S 
CC: Vipul Pandya 
CC: Giuseppe Cavallaro 
CC: Alexandre Torgue 
CC: Andy Gospodarek 
CC: Wingman Kwok 
CC: Murali Karicheri 
CC: Francois Romieu 
Signed-off-by: Jarod Wilson 
---
 drivers/net/ethernet/agere/et131x.c|  7 +++--
 drivers/net/ethernet/altera/altera_tse.h   |  1 -
 drivers/net/ethernet/altera/altera_tse_main.c  | 14 ++---
 drivers/net/ethernet/amd/amd8111e.c|  5 ++--
 drivers/net/ethernet/brocade/bna/bnad.c|  7 +++--
 drivers/net/ethernet/cadence/macb.c| 19 ++---
 drivers/net/ethernet/calxeda/xgmac.c   | 20 -
 drivers/net/ethernet/chelsio/cxgb/common.h |  5 
 drivers/net/ethernet/chelsio/cxgb/cxgb2.c  | 18 ++--
 drivers/net/ethernet/chelsio/cxgb/pm3393.c |  8 +-
 drivers/net/ethernet/chelsio/cxgb/vsc7326.c|  5 
 drivers/net/ethernet/cisco/enic/enic_main.c|  7 +++--
 drivers/net/ethernet/cisco/enic/enic_res.h |  2 +-
 drivers/net/ethernet/freescale/gianfar.c   |  9 ++
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c  |  3 +-
 drivers/net/ethernet/hisilicon/hns/hns_enet.c  | 11 +---
 drivers/net/ethernet/micrel/ksz884x.c  | 33 +++---
 drivers/net/ethernet/myricom/myri10ge/myri10ge.c   | 20 +++--
 drivers/net/ethernet/natsemi/natsemi.c |  7 +++--
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 10 +++
 drivers/net/ethernet/nvidia/forcedeth.c|  9 +++---
 .../net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c   | 13 -
 drivers/net/ethernet/pasemi/pasemi_mac.c   | 12 
 drivers/net/ethernet/qualcomm/qca_framing.h|  6 ++--
 drivers/net/ethernet/qualcomm/qca_spi.c| 16 +++
 drivers/net/ethernet/rocker/rocker_main.c  | 12 
 drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c| 17 +++
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 25 ++--
 drivers/net/ethernet/tehuti/tehuti.c   | 14 +++--
 drivers/net/ethernet/tehuti/tehuti.h   |  3 ++
 drivers/net/ethernet/ti/netcp_core.c   | 20 +++--
 drivers/net/ethernet/via/via-velocity.c| 11 +++-
 drivers/staging/octeon/ethernet.c  | 22 +++
 33 files changed, 169 insertions(+), 222 deletions(-)

diff --git a/drivers/net/ethernet/agere/et131x.c b/drivers/net/ethernet/agere/et131x.c
index 9066838..831bab3 100644
--- a/drivers/net/ethernet/agere/et131x.c
+++ b/drivers/net/ethernet/agere/et131x.c
@@ -176,6 +176,8 @@ MODULE_DESCRIPTION("10/100/1000 Base-T Ethernet Driver for the ET1310 by Agere S
 #define NUM_FBRS   2
 
 #define MAX_PACKETS_HANDLED

[PATCH net-next 14/15] ethernet/toshiba: use core min/max MTU checking

2016-10-17 Thread Jarod Wilson

gelic_net: min_mtu 64, max_mtu 1518
- remove gelic_net_change_mtu now that it is redundant

spidernet: min_mtu 64, max_mtu 2294
- remove spider_net_change_mtu now that it is redundant

CC: netdev@vger.kernel.org
CC: Geoff Levand 
CC: Ishizaki Kou 
Signed-off-by: Jarod Wilson 
---
 drivers/net/ethernet/toshiba/ps3_gelic_net.c  | 23 --
 drivers/net/ethernet/toshiba/ps3_gelic_net.h  |  1 -
 drivers/net/ethernet/toshiba/ps3_gelic_wireless.c |  1 -
 drivers/net/ethernet/toshiba/spider_net.c | 24 ---
 4 files changed, 8 insertions(+), 41 deletions(-)

diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.c b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
index 272f2b1..345316c 100644
--- a/drivers/net/ethernet/toshiba/ps3_gelic_net.c
+++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
@@ -1114,24 +1114,6 @@ static int gelic_net_poll(struct napi_struct *napi, int budget)
}
return packets_done;
 }
-/**
- * gelic_net_change_mtu - changes the MTU of an interface
- * @netdev: interface device structure
- * @new_mtu: new MTU value
- *
- * returns 0 on success, <0 on failure
- */
-int gelic_net_change_mtu(struct net_device *netdev, int new_mtu)
-{
-   /* no need to re-alloc skbs or so -- the max mtu is about 2.3k
-* and mtu is outbound only anyway */
-   if ((new_mtu < GELIC_NET_MIN_MTU) ||
-   (new_mtu > GELIC_NET_MAX_MTU)) {
-   return -EINVAL;
-   }
-   netdev->mtu = new_mtu;
-   return 0;
-}
 
 /**
  * gelic_card_interrupt - event handler for gelic_net
@@ -1446,7 +1428,6 @@ static const struct net_device_ops gelic_netdevice_ops = {
.ndo_stop = gelic_net_stop,
.ndo_start_xmit = gelic_net_xmit,
.ndo_set_rx_mode = gelic_net_set_multi,
-   .ndo_change_mtu = gelic_net_change_mtu,
.ndo_tx_timeout = gelic_net_tx_timeout,
.ndo_set_mac_address = eth_mac_addr,
.ndo_validate_addr = eth_validate_addr,
@@ -1513,6 +1494,10 @@ int gelic_net_setup_netdev(struct net_device *netdev, struct gelic_card *card)
netdev->features |= NETIF_F_VLAN_CHALLENGED;
}
 
+   /* MTU range: 64 - 1518 */
+   netdev->min_mtu = GELIC_NET_MIN_MTU;
+   netdev->max_mtu = GELIC_NET_MAX_MTU;
+
status = register_netdev(netdev);
if (status) {
dev_err(ctodev(card), "%s:Couldn't register %s %d\n",
diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.h b/drivers/net/ethernet/toshiba/ps3_gelic_net.h
index 8505196..003d045 100644
--- a/drivers/net/ethernet/toshiba/ps3_gelic_net.h
+++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.h
@@ -373,7 +373,6 @@ int gelic_net_stop(struct net_device *netdev);
 int gelic_net_xmit(struct sk_buff *skb, struct net_device *netdev);
 void gelic_net_set_multi(struct net_device *netdev);
 void gelic_net_tx_timeout(struct net_device *netdev);
-int gelic_net_change_mtu(struct net_device *netdev, int new_mtu);
 int gelic_net_setup_netdev(struct net_device *netdev, struct gelic_card *card);
 
 /* shared ethtool ops */
diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_wireless.c b/drivers/net/ethernet/toshiba/ps3_gelic_wireless.c
index 446ea58..b3abd02 100644
--- a/drivers/net/ethernet/toshiba/ps3_gelic_wireless.c
+++ b/drivers/net/ethernet/toshiba/ps3_gelic_wireless.c
@@ -2558,7 +2558,6 @@ static const struct net_device_ops gelic_wl_netdevice_ops = {
.ndo_stop = gelic_wl_stop,
.ndo_start_xmit = gelic_net_xmit,
.ndo_set_rx_mode = gelic_net_set_multi,
-   .ndo_change_mtu = gelic_net_change_mtu,
.ndo_tx_timeout = gelic_net_tx_timeout,
.ndo_set_mac_address = eth_mac_addr,
.ndo_validate_addr = eth_validate_addr,
diff --git a/drivers/net/ethernet/toshiba/spider_net.c b/drivers/net/ethernet/toshiba/spider_net.c
index 36a6e8b..cb341df 100644
--- a/drivers/net/ethernet/toshiba/spider_net.c
+++ b/drivers/net/ethernet/toshiba/spider_net.c
@@ -1279,25 +1279,6 @@ static int spider_net_poll(struct napi_struct *napi, int budget)
 }
 
 /**
- * spider_net_change_mtu - changes the MTU of an interface
- * @netdev: interface device structure
- * @new_mtu: new MTU value
- *
- * returns 0 on success, <0 on failure
- */
-static int
-spider_net_change_mtu(struct net_device *netdev, int new_mtu)
-{
-   /* no need to re-alloc skbs or so -- the max mtu is about 2.3k
-* and mtu is outbound only anyway */
-   if ( (new_mtu < SPIDER_NET_MIN_MTU ) ||
-   (new_mtu > SPIDER_NET_MAX_MTU) )
-   return -EINVAL;
-   netdev->mtu = new_mtu;
-   return 0;
-}
-
-/**
  * spider_net_set_mac - sets the MAC of an interface
  * @netdev: interface device structure
  * @ptr: pointer to new MAC address
@@ -2229,7 +2210,6 @@ static const struct net_device_ops spider_net_ops = {
.ndo_start_xmit = spider_net_xmit,
.ndo_set_rx_mode=

[PATCH net-next 09/15] ethernet/dlink: use core min/max MTU checking

2016-10-17 Thread Jarod Wilson

dl2k: min_mtu 68, max_mtu 1536 or 8000, depending on hardware
- Removed change_mtu, does nothing productive anymore

sundance: min_mtu 68, max_mtu 8191

CC: netdev@vger.kernel.org
CC: Denis Kirjanov 
Signed-off-by: Jarod Wilson 
---
 drivers/net/ethernet/dlink/dl2k.c | 22 --
 drivers/net/ethernet/dlink/sundance.c |  6 --
 2 files changed, 8 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/dlink/dl2k.c b/drivers/net/ethernet/dlink/dl2k.c
index 78f1446..8c95a8a 100644
--- a/drivers/net/ethernet/dlink/dl2k.c
+++ b/drivers/net/ethernet/dlink/dl2k.c
@@ -76,7 +76,6 @@ static void rio_free_tx (struct net_device *dev, int irq);
 static void tx_error (struct net_device *dev, int tx_status);
 static int receive_packet (struct net_device *dev);
 static void rio_error (struct net_device *dev, int int_status);
-static int change_mtu (struct net_device *dev, int new_mtu);
 static void set_multicast (struct net_device *dev);
 static struct net_device_stats *get_stats (struct net_device *dev);
 static int clear_stats (struct net_device *dev);
@@ -106,7 +105,6 @@ static const struct net_device_ops netdev_ops = {
.ndo_set_rx_mode= set_multicast,
.ndo_do_ioctl   = rio_ioctl,
.ndo_tx_timeout = rio_tx_timeout,
-   .ndo_change_mtu = change_mtu,
 };
 
 static int
@@ -230,6 +228,10 @@ rio_probe1 (struct pci_dev *pdev, const struct pci_device_id *ent)
 #if 0
dev->features = NETIF_F_IP_CSUM;
 #endif
+   /* MTU range: 68 - 1536 or 8000 */
+   dev->min_mtu = ETH_MIN_MTU;
+   dev->max_mtu = np->jumbo ? MAX_JUMBO : PACKET_SIZE;
+
pci_set_drvdata (pdev, dev);
 
ring_space = pci_alloc_consistent (pdev, TX_TOTAL_SIZE, _dma);
@@ -1198,22 +1200,6 @@ clear_stats (struct net_device *dev)
return 0;
 }
 
-
-static int
-change_mtu (struct net_device *dev, int new_mtu)
-{
-   struct netdev_private *np = netdev_priv(dev);
-   int max = (np->jumbo) ? MAX_JUMBO : 1536;
-
-   if ((new_mtu < 68) || (new_mtu > max)) {
-   return -EINVAL;
-   }
-
-   dev->mtu = new_mtu;
-
-   return 0;
-}
-
 static void
 set_multicast (struct net_device *dev)
 {
diff --git a/drivers/net/ethernet/dlink/sundance.c b/drivers/net/ethernet/dlink/sundance.c
index 79d8009..eab36ac 100644
--- a/drivers/net/ethernet/dlink/sundance.c
+++ b/drivers/net/ethernet/dlink/sundance.c
@@ -580,6 +580,10 @@ static int sundance_probe1(struct pci_dev *pdev,
dev->ethtool_ops = _ops;
dev->watchdog_timeo = TX_TIMEOUT;
 
+   /* MTU range: 68 - 8191 */
+   dev->min_mtu = ETH_MIN_MTU;
+   dev->max_mtu = 8191;
+
pci_set_drvdata(pdev, dev);
 
i = register_netdev(dev);
@@ -713,8 +717,6 @@ static int sundance_probe1(struct pci_dev *pdev,
 
 static int change_mtu(struct net_device *dev, int new_mtu)
 {
-   if ((new_mtu < 68) || (new_mtu > 8191)) /* Set by RxDMAFrameLen */
-   return -EINVAL;
if (netif_running(dev))
return -EBUSY;
dev->mtu = new_mtu;
-- 
2.10.0

[PATCH net-next 10/15] ethernet/neterion: use core min/max MTU checking

2016-10-17 Thread Jarod Wilson

s2io: min_mtu 46, max_mtu 9600

vxge: min_mtu 68, max_mtu 9600

CC: netdev@vger.kernel.org
CC: Jon Mason 
Signed-off-by: Jarod Wilson 
---
 drivers/net/ethernet/neterion/s2io.c | 9 -
 drivers/net/ethernet/neterion/vxge/vxge-config.h | 2 +-
 drivers/net/ethernet/neterion/vxge/vxge-main.c   | 9 -
 3 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/neterion/s2io.c b/drivers/net/ethernet/neterion/s2io.c
index eaa37c0..564f682 100644
--- a/drivers/net/ethernet/neterion/s2io.c
+++ b/drivers/net/ethernet/neterion/s2io.c
@@ -6678,11 +6678,6 @@ static int s2io_change_mtu(struct net_device *dev, int new_mtu)
struct s2io_nic *sp = netdev_priv(dev);
int ret = 0;
 
-   if ((new_mtu < MIN_MTU) || (new_mtu > S2IO_JUMBO_SIZE)) {
-   DBG_PRINT(ERR_DBG, "%s: MTU size is invalid.\n", dev->name);
-   return -EPERM;
-   }
-
dev->mtu = new_mtu;
if (netif_running(dev)) {
s2io_stop_all_tx_queue(sp);
@@ -8019,6 +8014,10 @@ s2io_init_nic(struct pci_dev *pdev, const struct pci_device_id *pre)
config->mc_start_offset = S2IO_HERC_MC_ADDR_START_OFFSET;
}
 
+   /* MTU range: 46 - 9600 */
+   dev->min_mtu = MIN_MTU;
+   dev->max_mtu = S2IO_JUMBO_SIZE;
+
/* store mac addresses from CAM to s2io_nic structure */
do_s2io_store_unicast_mc(sp);
 
diff --git a/drivers/net/ethernet/neterion/vxge/vxge-config.h b/drivers/net/ethernet/neterion/vxge/vxge-config.h
index 6ce4412..cfa9704 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-config.h
+++ b/drivers/net/ethernet/neterion/vxge/vxge-config.h
@@ -27,7 +27,7 @@
(((size) - (((u64)adrs) & ((size)-1))) & ((size)-1))
 #endif
 
-#define VXGE_HW_MIN_MTU68
+#define VXGE_HW_MIN_MTUETH_MIN_MTU
 #define VXGE_HW_MAX_MTU9600
 #define VXGE_HW_DEFAULT_MTU1500
 
diff --git a/drivers/net/ethernet/neterion/vxge/vxge-main.c b/drivers/net/ethernet/neterion/vxge/vxge-main.c
index e0993eb..e07b936 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-main.c
+++ b/drivers/net/ethernet/neterion/vxge/vxge-main.c
@@ -3074,11 +3074,6 @@ static int vxge_change_mtu(struct net_device *dev, int new_mtu)
 
vxge_debug_entryexit(vdev->level_trace,
"%s:%d", __func__, __LINE__);
-   if ((new_mtu < VXGE_HW_MIN_MTU) || (new_mtu > VXGE_HW_MAX_MTU)) {
-   vxge_debug_init(vdev->level_err,
-   "%s: mtu size is invalid", dev->name);
-   return -EPERM;
-   }
 
/* check if device is down already */
if (unlikely(!is_vxge_card_up(vdev))) {
@@ -3462,6 +3457,10 @@ static int vxge_device_register(struct __vxge_hw_device 
*hldev,
"%s : using High DMA", __func__);
}
 
+   /* MTU range: 68 - 9600 */
+   ndev->min_mtu = VXGE_HW_MIN_MTU;
+   ndev->max_mtu = VXGE_HW_MAX_MTU;
+
ret = register_netdev(ndev);
if (ret) {
vxge_debug_init(vxge_hw_device_trace_level_get(hldev),
-- 
2.10.0

[PATCH net-next 11/15] ethernet/cavium: use core min/max MTU checking

2016-10-17 Thread Jarod Wilson

liquidio: min_mtu 68, max_mtu 16000

thunder: min_mtu 64, max_mtu 9200

CC: netdev@vger.kernel.org
CC: Sunil Goutham 
CC: Robert Richter 
CC: Derek Chickles 
CC: Satanand Burla 
CC: Felix Manlunas 
CC: Raghu Vatsavayi 
Signed-off-by: Jarod Wilson 
---
 drivers/net/ethernet/cavium/liquidio/lio_main.c   | 15 ---
 drivers/net/ethernet/cavium/liquidio/octeon_network.h |  2 +-
 drivers/net/ethernet/cavium/octeon/octeon_mgmt.c  | 13 +++--
 drivers/net/ethernet/cavium/thunder/nicvf_main.c  | 10 --
 4 files changed, 12 insertions(+), 28 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c 
b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index afc6f9dc..71d01a7 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -2868,17 +2868,6 @@ static int liquidio_change_mtu(struct net_device 
*netdev, int new_mtu)
struct octnic_ctrl_pkt nctrl;
int ret = 0;
 
-   /* Limit the MTU to make sure the ethernet packets are between 68 bytes
-* and 16000 bytes
-*/
-   if ((new_mtu < LIO_MIN_MTU_SIZE) ||
-   (new_mtu > LIO_MAX_MTU_SIZE)) {
-   dev_err(>pci_dev->dev, "Invalid MTU: %d\n", new_mtu);
-   dev_err(>pci_dev->dev, "Valid range %d and %d\n",
-   LIO_MIN_MTU_SIZE, LIO_MAX_MTU_SIZE);
-   return -EINVAL;
-   }
-
memset(&nctrl, 0, sizeof(struct octnic_ctrl_pkt));
 
nctrl.ncmd.u64 = 0;
@@ -3891,6 +3880,10 @@ static int setup_nic_devices(struct octeon_device 
*octeon_dev)
netdev->hw_features = netdev->hw_features &
~NETIF_F_HW_VLAN_CTAG_RX;
 
+   /* MTU range: 68 - 16000 */
+   netdev->min_mtu = LIO_MIN_MTU_SIZE;
+   netdev->max_mtu = LIO_MAX_MTU_SIZE;
+
/* Point to the  properties for octeon device to which this
 * interface belongs.
 */
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_network.h 
b/drivers/net/ethernet/cavium/liquidio/octeon_network.h
index e5d1deb..54b9665 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_network.h
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_network.h
@@ -29,7 +29,7 @@
 #include 
 
 #define LIO_MAX_MTU_SIZE (OCTNET_MAX_FRM_SIZE - OCTNET_FRM_HEADER_SIZE)
-#define LIO_MIN_MTU_SIZE 68
+#define LIO_MIN_MTU_SIZE ETH_MIN_MTU
 
 struct oct_nic_stats_resp {
u64 rh;
diff --git a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c 
b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
index 4ab404f..16e12c4 100644
--- a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
+++ b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
@@ -645,16 +645,6 @@ static int octeon_mgmt_change_mtu(struct net_device 
*netdev, int new_mtu)
struct octeon_mgmt *p = netdev_priv(netdev);
int size_without_fcs = new_mtu + OCTEON_MGMT_RX_HEADROOM;
 
-   /* Limit the MTU to make sure the ethernet packets are between
-* 64 bytes and 16383 bytes.
-*/
-   if (size_without_fcs < 64 || size_without_fcs > 16383) {
-   dev_warn(p->dev, "MTU must be between %d and %d.\n",
-64 - OCTEON_MGMT_RX_HEADROOM,
-16383 - OCTEON_MGMT_RX_HEADROOM);
-   return -EINVAL;
-   }
-
netdev->mtu = new_mtu;
 
cvmx_write_csr(p->agl + AGL_GMX_RX_FRM_MAX, size_without_fcs);
@@ -1491,6 +1481,9 @@ static int octeon_mgmt_probe(struct platform_device *pdev)
netdev->netdev_ops = _mgmt_ops;
netdev->ethtool_ops = _mgmt_ethtool_ops;
 
+   netdev->min_mtu = 64 - OCTEON_MGMT_RX_HEADROOM;
+   netdev->max_mtu = 16383 - OCTEON_MGMT_RX_HEADROOM;
+
mac = of_get_mac_address(pdev->dev.of_node);
 
if (mac)
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c 
b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index 45a13f7..b192712 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -1312,12 +1312,6 @@ static int nicvf_change_mtu(struct net_device *netdev, 
int new_mtu)
 {
struct nicvf *nic = netdev_priv(netdev);
 
-   if (new_mtu > NIC_HW_MAX_FRS)
-   return -EINVAL;
-
-   if (new_mtu < NIC_HW_MIN_FRS)
-   return -EINVAL;
-
if (nicvf_update_hw_max_frs(nic, new_mtu))
return -EINVAL;
netdev->mtu = new_mtu;
@@ -1630,6 +1624,10 @@ static int nicvf_probe(struct pci_dev *pdev, const 
struct pci_device_id *ent)
netdev->netdev_ops = _netdev_ops;
netdev->watchdog_timeo = NICVF_TX_TIMEOUT;
 
+   /* MTU range: 64 - 9200 */
+   netdev->min_mtu = NIC_HW_MIN_FRS;
+

[PATCH net-next 13/15] ethernet/tile: use core min/max MTU checking

2016-10-17 Thread Jarod Wilson

tilegx: min_mtu 68, max_mtu 1500 or 9000, depending on modparam
- remove tile_net_change_mtu now that it is fully redundant

tilepro: min_mtu 68, max_mtu 1500
- hardware supports jumbo packets up to 10226, but it's not implemented or
  tested yet, according to code comments
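
The per-driver range checks removed below can be modelled in user space roughly as follows. This is a minimal sketch under assumptions: `struct mtu_limits`, `tilegx_limits()` and `check_mtu()` are hypothetical names used only for illustration, not kernel API. Only the `min_mtu`/`max_mtu` idea and the tilegx values (68, 1500, 9000, selected by `jumbo_num`) come from the patch.

```c
#include <errno.h>

/* Hypothetical stand-in for the min_mtu/max_mtu fields of struct net_device. */
struct mtu_limits {
	unsigned int min_mtu;
	unsigned int max_mtu;
};

/* Values from the tilegx patch: 68..1500, or 68..9000 when jumbo_num != 0. */
static struct mtu_limits tilegx_limits(int jumbo_num)
{
	struct mtu_limits l = { 68, jumbo_num ? 9000 : 1500 };

	return l;
}

/* Returns 0 if new_mtu lies in [min_mtu, max_mtu], -EINVAL otherwise. */
static int check_mtu(const struct mtu_limits *l, int new_mtu)
{
	if (new_mtu < 0 || (unsigned int)new_mtu < l->min_mtu)
		return -EINVAL;
	if ((unsigned int)new_mtu > l->max_mtu)
		return -EINVAL;
	return 0;
}
```

With the limits stored once at setup time, a driver whose change_mtu callback only did this range check no longer needs the callback at all, which is the tilegx case.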

CC: netdev@vger.kernel.org
CC: Chris Metcalf 
Signed-off-by: Jarod Wilson 
---
 drivers/net/ethernet/tile/tilegx.c  | 21 -
 drivers/net/ethernet/tile/tilepro.c | 27 +--
 2 files changed, 13 insertions(+), 35 deletions(-)

diff --git a/drivers/net/ethernet/tile/tilegx.c 
b/drivers/net/ethernet/tile/tilegx.c
index 11213a3..0aaf975 100644
--- a/drivers/net/ethernet/tile/tilegx.c
+++ b/drivers/net/ethernet/tile/tilegx.c
@@ -59,6 +59,9 @@
 /* Maximum number of packets to handle per "poll". */
 #define TILE_NET_WEIGHT 64
 
+/* Maximum Jumbo Packet MTU */
+#define TILE_JUMBO_MAX_MTU 9000
+
 /* Number of entries in each iqueue. */
 #define IQUEUE_ENTRIES 512
 
@@ -2101,17 +2104,6 @@ static int tile_net_ioctl(struct net_device *dev, struct 
ifreq *rq, int cmd)
return -EOPNOTSUPP;
 }
 
-/* Change the MTU. */
-static int tile_net_change_mtu(struct net_device *dev, int new_mtu)
-{
-   if (new_mtu < 68)
-   return -EINVAL;
-   if (new_mtu > ((jumbo_num != 0) ? 9000 : 1500))
-   return -EINVAL;
-   dev->mtu = new_mtu;
-   return 0;
-}
-
 /* Change the Ethernet address of the NIC.
  *
  * The hypervisor driver does not support changing MAC address.  However,
@@ -2154,7 +2146,6 @@ static const struct net_device_ops tile_net_ops = {
.ndo_start_xmit = tile_net_tx,
.ndo_select_queue = tile_net_select_queue,
.ndo_do_ioctl = tile_net_ioctl,
-   .ndo_change_mtu = tile_net_change_mtu,
.ndo_tx_timeout = tile_net_tx_timeout,
.ndo_set_mac_address = tile_net_set_mac_address,
 #ifdef CONFIG_NET_POLL_CONTROLLER
@@ -2174,7 +2165,11 @@ static void tile_net_setup(struct net_device *dev)
ether_setup(dev);
dev->netdev_ops = &tile_net_ops;
dev->watchdog_timeo = TILE_NET_TIMEOUT;
-   dev->mtu = 1500;
+
+   /* MTU range: 68 - 1500 or 9000 */
+   dev->mtu = ETH_DATA_LEN;
+   dev->min_mtu = ETH_MIN_MTU;
+   dev->max_mtu = jumbo_num ? TILE_JUMBO_MAX_MTU : ETH_DATA_LEN;
 
features |= NETIF_F_HW_CSUM;
features |= NETIF_F_SG;
diff --git a/drivers/net/ethernet/tile/tilepro.c 
b/drivers/net/ethernet/tile/tilepro.c
index 4ef605a..0a3b7da 100644
--- a/drivers/net/ethernet/tile/tilepro.c
+++ b/drivers/net/ethernet/tile/tilepro.c
@@ -87,7 +87,7 @@
 /* This should be 1500 if "jumbo" is not set in LIPP. */
 /* This should be at most 10226 (10240 - 14) if "jumbo" is set in LIPP. */
 /* ISSUE: This has not been thoroughly tested (except at 1500). */
-#define TILE_NET_MTU 1500
+#define TILE_NET_MTU ETH_DATA_LEN
 
 /* HACK: Define this to verify incoming packets. */
 /* #define TILE_NET_VERIFY_INGRESS */
@@ -2095,26 +2095,6 @@ static struct rtnl_link_stats64 
*tile_net_get_stats64(struct net_device *dev,
 }
 
 
-/*
- * Change the "mtu".
- *
- * The "change_mtu" method is usually not needed.
- * If you need it, it must be like this.
- */
-static int tile_net_change_mtu(struct net_device *dev, int new_mtu)
-{
-   PDEBUG("tile_net_change_mtu()\n");
-
-   /* Check ranges. */
-   if ((new_mtu < 68) || (new_mtu > 1500))
-   return -EINVAL;
-
-   /* Accept the value. */
-   dev->mtu = new_mtu;
-
-   return 0;
-}
-
 
 /*
  * Change the Ethernet Address of the NIC.
@@ -2229,7 +2209,6 @@ static const struct net_device_ops tile_net_ops = {
.ndo_start_xmit = tile_net_tx,
.ndo_do_ioctl = tile_net_ioctl,
.ndo_get_stats64 = tile_net_get_stats64,
-   .ndo_change_mtu = tile_net_change_mtu,
.ndo_tx_timeout = tile_net_tx_timeout,
.ndo_set_mac_address = tile_net_set_mac_address,
 #ifdef CONFIG_NET_POLL_CONTROLLER
@@ -2252,7 +2231,11 @@ static void tile_net_setup(struct net_device *dev)
dev->netdev_ops = &tile_net_ops;
dev->watchdog_timeo = TILE_NET_TIMEOUT;
dev->tx_queue_len = TILE_NET_TX_QUEUE_LEN;
+
+   /* MTU range: 68 - 1500 */
dev->mtu = TILE_NET_MTU;
+   dev->min_mtu = ETH_MIN_MTU;
+   dev->max_mtu = TILE_NET_MTU;
 
features |= NETIF_F_HW_CSUM;
features |= NETIF_F_SG;
-- 
2.10.0

[PATCH v3 3/4] net: smc91x: take into account half-word workaround

2016-10-17 Thread Robert Jarzmik

For device-tree builds, platforms such as mainstone, idp and stargate2
must have their u16 writes all aligned on 32 bit boundaries. This is
already enabled in platform data builds, and this patch adds it to
device-tree builds.

Signed-off-by: Robert Jarzmik 
---
Since v1: rename dt property to pxa-u16-align4
---
 drivers/net/ethernet/smsc/smc91x.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/smsc/smc91x.c 
b/drivers/net/ethernet/smsc/smc91x.c
index 705d99b2d947..65077c77082a 100644
--- a/drivers/net/ethernet/smsc/smc91x.c
+++ b/drivers/net/ethernet/smsc/smc91x.c
@@ -2326,6 +2326,8 @@ static int smc_drv_probe(struct platform_device *pdev)
if (!device_property_read_u32(&pdev->dev, "reg-shift",
  &val))
lp->io_shift = val;
+   lp->cfg.pxa_u16_align4 =
+   device_property_read_bool(&pdev->dev, "pxa-u16-align4");
}
 #endif
 
-- 
2.1.4

[PATCH v3 4/4] net: smsc91x: add u16 workaround for pxa platforms

2016-10-17 Thread Robert Jarzmik

Add a workaround for mainstone, idp and stargate2 boards, for u16 writes
which must be aligned on 32-bit addresses.

Signed-off-by: Robert Jarzmik 
Cc: Jeremy Linton 
---
Since v1: rename dt property to pxa-u16-align4
  change the binding documentation file
---
 Documentation/devicetree/bindings/net/smsc-lan91c111.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/smsc-lan91c111.txt 
b/Documentation/devicetree/bindings/net/smsc-lan91c111.txt
index e77e167593db..309e37eb7c7c 100644
--- a/Documentation/devicetree/bindings/net/smsc-lan91c111.txt
+++ b/Documentation/devicetree/bindings/net/smsc-lan91c111.txt
@@ -13,3 +13,5 @@ Optional properties:
   16-bit access only.
 - power-gpios: GPIO to control the PWRDWN pin
 - reset-gpios: GPIO to control the RESET pin
+- pxa-u16-align4 : Boolean, put in place the workaround to force all
+  u16 writes to be 32 bits aligned
-- 
2.1.4

[PATCH v3 2/4] net: smc91x: isolate u16 writes alignment workaround

2016-10-17 Thread Robert Jarzmik

u16 writes have special handling on 3 PXA platforms, where the
hardware wiring forces these writes to be u32 aligned.

This patch isolates this handling for PXA platforms as before, but
enables this "workaround" to be set up dynamically, which will be the
case in device-tree build types.

This patch was tested on 2 PXA platforms : mainstone, which relies on
the workaround, and lubbock, which doesn't.
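
The read-modify-write at the heart of the workaround can be modelled in user space as below. This is a sketch under stated assumptions: `regs[]` and `model_outw()` are hypothetical, the register window is simulated by a plain array, and the half-word at offset 4*n + 2 is assumed to sit in the upper 16 bits of its 32-bit word (as the `val << 16` in the driver suggests).

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated register window as 32-bit words (assumption, not real MMIO). */
static uint32_t regs[4];

/*
 * Model of a 16-bit write at even byte offset reg.
 * With the workaround enabled and reg of the form 4*n + 2, the value is
 * merged into the upper half of the containing word and written back with
 * one aligned 32-bit store. Otherwise only the addressed half-word changes.
 */
static void model_outw(uint16_t val, unsigned int reg, bool align4)
{
	uint32_t *word = &regs[reg / 4];

	if (align4 && (reg & 2)) {
		uint32_t v = (uint32_t)val << 16;

		v |= *word & 0xffff;	/* keep the lower half-word */
		*word = v;		/* single 32-bit aligned write */
	} else if (reg & 2) {
		*word = (*word & 0x0000ffff) | ((uint32_t)val << 16);
	} else {
		*word = (*word & 0xffff0000) | val;
	}
}
```

In this model both paths leave the same register contents; what differs on real hardware is the width and alignment of the bus access, which is the only thing the affected boards care about.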

Signed-off-by: Robert Jarzmik 
--
Since v2: fixed arch/mn10300 case
  removed machine_is_*() calls
---
 arch/mn10300/unit-asb2303/include/unit/smc9.h |  2 +-
 drivers/net/ethernet/smsc/smc91x.c|  3 +-
 drivers/net/ethernet/smsc/smc91x.h| 82 ---
 3 files changed, 47 insertions(+), 40 deletions(-)

diff --git a/arch/mn10300/unit-asb2303/include/unit/smc9.h 
b/arch/mn10300/unit-asb2303/include/unit/smc9.h
index dd456e9c513f..dd4e2946438e 100644
--- a/arch/mn10300/unit-asb2303/include/unit/smc9.h
+++ b/arch/mn10300/unit-asb2303/include/unit/smc9.h
@@ -30,7 +30,7 @@
 
 #if SMC_CAN_USE_16BIT
 #define SMC_inw(a, r)  inw((unsigned long) ((a) + (r)))
-#define SMC_outw(v, a, r)  outw(v, (unsigned long) ((a) + (r)))
+#define SMC_outw(lp, v, a, r)  outw(v, (unsigned long) ((a) + (r)))
 #define SMC_insw(a, r, p, l)   insw((unsigned long) ((a) + (r)), (p), (l))
 #define SMC_outsw(a, r, p, l)  outsw((unsigned long) ((a) + (r)), (p), (l))
 #endif
diff --git a/drivers/net/ethernet/smsc/smc91x.c 
b/drivers/net/ethernet/smsc/smc91x.c
index 9b4780f87863..705d99b2d947 100644
--- a/drivers/net/ethernet/smsc/smc91x.c
+++ b/drivers/net/ethernet/smsc/smc91x.c
@@ -602,7 +602,8 @@ static void smc_hardware_send_pkt(unsigned long data)
SMC_PUSH_DATA(lp, buf, len & ~1);
 
/* Send final ctl word with the last byte if there is one */
-   SMC_outw(((len & 1) ? (0x2000 | buf[len-1]) : 0), ioaddr, DATA_REG(lp));
+   SMC_outw(lp, ((len & 1) ? (0x2000 | buf[len - 1]) : 0), ioaddr,
+DATA_REG(lp));
 
/*
 * If THROTTLE_TX_PKTS is set, we stop the queue here. This will
diff --git a/drivers/net/ethernet/smsc/smc91x.h 
b/drivers/net/ethernet/smsc/smc91x.h
index ea8465467469..45e6b81a6a92 100644
--- a/drivers/net/ethernet/smsc/smc91x.h
+++ b/drivers/net/ethernet/smsc/smc91x.h
@@ -63,8 +63,6 @@
 
 #if defined(CONFIG_ARM)
 
-#include 
-
 /* Now the bus width is specified in the platform data
  * pretend here to support all I/O access types
  */
@@ -86,11 +84,11 @@
 
 #define SMC_inl(a, r)  readl((a) + (r))
 #define SMC_outb(v, a, r)  writeb(v, (a) + (r))
-#define SMC_outw(v, a, r)  \
+#define SMC_outw(lp, v, a, r)  \
do {\
unsigned int __v = v, __smc_r = r;  \
if (SMC_16BIT(lp))  \
-   __SMC_outw(__v, a, __smc_r);\
+   __SMC_outw(lp, __v, a, __smc_r);\
else if (SMC_8BIT(lp))  \
SMC_outw_b(__v, a, __smc_r);\
else\
@@ -107,10 +105,10 @@
 #define SMC_IRQ_FLAGS  (-1)/* from resource */
 
 /* We actually can't write halfwords properly if not word aligned */
-static inline void __SMC_outw(u16 val, void __iomem *ioaddr, int reg)
+static inline void _SMC_outw_align4(u16 val, void __iomem *ioaddr, int reg,
+   bool use_align4_workaround)
 {
-   if ((machine_is_mainstone() || machine_is_stargate2() ||
-machine_is_pxa_idp()) && reg & 2) {
+   if (use_align4_workaround) {
unsigned int v = val << 16;
v |= readl(ioaddr + (reg & ~2)) & 0xffff;
writel(v, ioaddr + (reg & ~2));
@@ -119,6 +117,12 @@ static inline void __SMC_outw(u16 val, void __iomem 
*ioaddr, int reg)
}
 }
 
+#define __SMC_outw(lp, v, a, r)
\
+   _SMC_outw_align4((v), (a), (r), \
+IS_BUILTIN(CONFIG_ARCH_PXA) && ((r) & 2) &&\
+(lp)->cfg.pxa_u16_align4)
+
+
 #elif  defined(CONFIG_SH_SH4202_MICRODEV)
 
 #define SMC_CAN_USE_8BIT   0
@@ -129,7 +133,7 @@ static inline void __SMC_outw(u16 val, void __iomem 
*ioaddr, int reg)
 #define SMC_inw(a, r)  inw((a) + (r) - 0xa000)
 #define SMC_inl(a, r)  inl((a) + (r) - 0xa000)
 #define SMC_outb(v, a, r)  outb(v, (a) + (r) - 0xa000)
-#define SMC_outw(v, a, r)  outw(v, (a) + (r) - 0xa000)
+#define SMC_outw(lp, v, a, r)  outw(v, (a) + (r) - 0xa000)
 #define SMC_outl(v, a, r)  outl(v, (a) + (r) - 0xa000)
 #define

[PATCH v3 1/4] ARM: pxa: enhance smc91x platform data

2016-10-17 Thread Robert Jarzmik

Instead of having the smc91x driver rely on machine_is_*() calls,
provide this data through platform data for the affected boards, i.e. idp,
mainstone and stargate2.

This way, the driver no longer needs machine_is_*() calls, which
would not work with a device-tree build.

Signed-off-by: Robert Jarzmik 
---
 arch/arm/mach-pxa/idp.c   | 1 +
 arch/arm/mach-pxa/mainstone.c | 1 +
 arch/arm/mach-pxa/stargate2.c | 1 +
 include/linux/smc91x.h| 1 +
 4 files changed, 4 insertions(+)

diff --git a/arch/arm/mach-pxa/idp.c b/arch/arm/mach-pxa/idp.c
index 66070acaa888..d1db32b1a2c6 100644
--- a/arch/arm/mach-pxa/idp.c
+++ b/arch/arm/mach-pxa/idp.c
@@ -85,6 +85,7 @@ static struct resource smc91x_resources[] = {
 static struct smc91x_platdata smc91x_platdata = {
.flags = SMC91X_USE_8BIT | SMC91X_USE_16BIT | SMC91X_USE_32BIT |
 SMC91X_USE_DMA | SMC91X_NOWAIT,
+   .pxa_u16_align4 = true,
 };
 
 static struct platform_device smc91x_device = {
diff --git a/arch/arm/mach-pxa/mainstone.c b/arch/arm/mach-pxa/mainstone.c
index 40964069a17c..a2d851a3a546 100644
--- a/arch/arm/mach-pxa/mainstone.c
+++ b/arch/arm/mach-pxa/mainstone.c
@@ -140,6 +140,7 @@ static struct resource smc91x_resources[] = {
 static struct smc91x_platdata mainstone_smc91x_info = {
.flags  = SMC91X_USE_8BIT | SMC91X_USE_16BIT | SMC91X_USE_32BIT |
  SMC91X_NOWAIT | SMC91X_USE_DMA,
+   .pxa_u16_align4 = true,
 };
 
 static struct platform_device smc91x_device = {
diff --git a/arch/arm/mach-pxa/stargate2.c b/arch/arm/mach-pxa/stargate2.c
index 702f4f14b708..7b6610e9dae4 100644
--- a/arch/arm/mach-pxa/stargate2.c
+++ b/arch/arm/mach-pxa/stargate2.c
@@ -673,6 +673,7 @@ static struct resource smc91x_resources[] = {
 static struct smc91x_platdata stargate2_smc91x_info = {
.flags = SMC91X_USE_8BIT | SMC91X_USE_16BIT | SMC91X_USE_32BIT
| SMC91X_NOWAIT | SMC91X_USE_DMA,
+   .pxa_u16_align4 = true,
 };
 
 static struct platform_device smc91x_device = {
diff --git a/include/linux/smc91x.h b/include/linux/smc91x.h
index e302c447e057..129bc674dcf5 100644
--- a/include/linux/smc91x.h
+++ b/include/linux/smc91x.h
@@ -39,6 +39,7 @@ struct smc91x_platdata {
unsigned long flags;
unsigned char leda;
unsigned char ledb;
+   bool pxa_u16_align4;/* PXA buggy u16 writes on 4*n+2 addresses */
 };
 
 #endif /* __SMC91X_H__ */
-- 
2.1.4

Re: [Intel-wired-lan] [PATCH V2 RFC 2/2] ixgbe: ixgbe_atr() compute l4_proto only if non-paged data has network/transport headers

2016-10-17 Thread Alexander Duyck

On Mon, Oct 17, 2016 at 12:18 PM, Sowmini Varadhan
 wrote:
> On (10/17/16 11:15), Alexander Duyck wrote:
>> I would say you probably only need the first check here for skb->data
>> and could probably skip the second part.  You will be testing for
>> skb_tail_pointer in all the other tests you added so this check is
>> redundant anyway.
>>
>> Also you might want to go through and wrap these with unlikely() since
>> most of these are exception cases.
>
> Ok.. v3 will have this.
>
>> > /* Currently only IPv4/IPv6 with TCP is supported */
>> > switch (hdr.ipv4->version) {
>> > case IPVERSION:
>> > /* access ihl as u8 to avoid unaligned access on ia64 */
>> > hlen = (hdr.network[0] & 0x0F) << 2;
>> > +   if (skb_tail_pointer(skb) < hdr.network + hlen +
>> > +   sizeof(struct tcphdr))
>> > +   return;
>> > l4_proto = hdr.ipv4->protocol;
>> > break;
>> > case 6:
>> > hlen = hdr.network - skb->data;
>> > +   if (skb_tail_pointer(skb) < hdr.network + hlen +
>> > +   sizeof(struct tcphdr))
>> > +   return;
>> > l4_proto = ipv6_find_hdr(skb, &hlen, IPPROTO_TCP, NULL, 
>> > NULL);
>> > hlen -= hdr.network - skb->data;
>> > break;
>>
>> I believe one more check is needed after this block to verify the TCP
>> header fields are present.
>>
>> So you probably need to add a check for "skb_tail_pointer(skb) <
>> (hdr.network + hlen + 20)".
>
> But isnt that the same thing as the checks before l4_proto computation above?

Sort of.  The problem is IPv6 can include extension headers and that
can totally mess with us.  So we need to do one more check to verify
that we have enough space for IPv6 w/ TCP which would be hdr.raw + 20
+ hlen.
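
A minimal sketch of the kind of bounds check being discussed, assuming a flat buffer in place of the skb linear area. `tcp_header_fits()` and its parameters are hypothetical names for illustration; the constant 20 is the minimum TCP header length mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>

#define TCP_MIN_HDR_LEN 20	/* minimum TCP header size in bytes */

/*
 * network: start of the network header
 * tail:    one past the last valid byte of the linear data
 * hlen:    bytes from the network header to the start of the TCP header
 *
 * Returns true if a minimal TCP header lies entirely before tail.
 */
static bool tcp_header_fits(const unsigned char *network,
			    const unsigned char *tail, size_t hlen)
{
	if (tail < network)
		return false;
	return (size_t)(tail - network) >= hlen + TCP_MIN_HDR_LEN;
}
```

Comparing lengths rather than adding to the pointer avoids forming an address past the end of the buffer when hlen is large, which is the situation IPv6 extension headers can create.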

Thanks.

- Alex

[PATCH v3 0/4] support smc91x on mainstone and devicetree

2016-10-17 Thread Robert Jarzmik

This series aims to bring support for the mainstone board to device-tree based
builds, matching what is already in place for legacy mainstone.

The bulk of the mainstone "specific" behavior is that a u16 write doesn't work
on an address of the form 4*n + 2, while it works on 4*n.

The legacy workaround was in SMC_outw(), with calls to
machine_is_mainstone(). These calls don't work with a pxa27x-dt machine type,
which is used when a generic device-tree pxa27x machine is used to boot the
mainstone board.

Therefore, this series enables the smc91c111 adapter of the mainstone board to
work on a device-tree build, exactly as it's been working for years with the
legacy arch/arm/mach-pxa/mainstone.c definition.

To sum up, this extends an existing mechanism to device-tree based pxa
platforms.

Cheers.

--
Robert

Robert Jarzmik (4):
  ARM: pxa: enhance smc91x platform data
  net: smc91x: isolate u16 writes alignment workaround
  net: smc91x: take into account half-word workaround
  net: smsc91x: add u16 workaround for pxa platforms

 .../devicetree/bindings/net/smsc-lan91c111.txt |  2 +
 arch/arm/mach-pxa/idp.c|  1 +
 arch/arm/mach-pxa/mainstone.c  |  1 +
 arch/arm/mach-pxa/stargate2.c  |  1 +
 arch/mn10300/unit-asb2303/include/unit/smc9.h  |  2 +-
 drivers/net/ethernet/smsc/smc91x.c |  5 +-
 drivers/net/ethernet/smsc/smc91x.h | 82 --
 include/linux/smc91x.h |  1 +
 8 files changed, 55 insertions(+), 40 deletions(-)

-- 
2.1.4

[PATCH v2 net-next 7/7] fou: Support flow dissection

2016-10-17 Thread Tom Herbert

This patch performs flow dissection for GUE and FOU. This is an
optional feature on the receiver and is set by the FOU_ATTR_DEEP_HASH
netlink configuration. When enabled, the UDP socket flow_dissect
function is set to fou_flow_dissect or gue_flow_dissect as
appropriate. These functions return FLOW_DIS_RET_IPPROTO and
set the IP protocol argument. In the case of GUE the header is
parsed to find the protocol number.
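
The GUE version 0 case can be sketched in user space as follows. This is an illustration under assumptions, not the kernel code: `gue_model_dissect()` and the `RET_*` values are hypothetical, and the byte layout (2-bit version, 1-bit control, 5-bit hlen in the first byte, protocol in the second, 4-byte fixed header) is assumed from the field names used in the patch. The version 1 (direct IP encapsulation) case is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define UDP_HDR_LEN 8	/* size of a UDP header */
#define GUE_BASE_LEN 4	/* fixed part of the GUE header (assumed layout) */

/* Hypothetical return codes mirroring the FLOW_DIS_RET_* idea. */
enum dissect_ret { RET_BAD, RET_PASS, RET_IPPROTO };

/*
 * pkt/len: flat packet buffer; *nhoff: offset of the UDP header.
 * On RET_IPPROTO, *nhoff points past the GUE header and *ip_proto holds
 * the encapsulated protocol number.
 */
static enum dissect_ret gue_model_dissect(const uint8_t *pkt, size_t len,
					  size_t *nhoff, uint8_t *ip_proto)
{
	const uint8_t *gue;
	unsigned int version, control, hlen;

	if (*nhoff > len || len - *nhoff < UDP_HDR_LEN + GUE_BASE_LEN)
		return RET_BAD;

	gue = pkt + *nhoff + UDP_HDR_LEN;
	version = gue[0] >> 6;
	control = (gue[0] >> 5) & 1;
	hlen = gue[0] & 0x1f;	/* optional fields, in 32-bit words */

	if (version != 0 || control)
		return RET_PASS;

	*nhoff += UDP_HDR_LEN + GUE_BASE_LEN + (hlen << 2);
	*ip_proto = gue[1];
	return RET_IPPROTO;
}
```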

Signed-off-by: Tom Herbert 
---
 include/uapi/linux/fou.h |  1 +
 net/ipv4/fou.c   | 68 +++-
 2 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/fou.h b/include/uapi/linux/fou.h
index d2947c5..2c837eb 100644
--- a/include/uapi/linux/fou.h
+++ b/include/uapi/linux/fou.h
@@ -15,6 +15,7 @@ enum {
FOU_ATTR_IPPROTO,   /* u8 */
FOU_ATTR_TYPE,  /* u8 */
FOU_ATTR_REMCSUM_NOPARTIAL, /* flag */
+   FOU_ATTR_DEEP_HASH, /* flag */
 
__FOU_ATTR_MAX,
 };
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index cf50f7e..95ac5a8 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -27,7 +27,8 @@ struct fou {
struct rcu_head rcu;
 };
 
-#define FOU_F_REMCSUM_NOPARTIAL BIT(0)
+#define FOU_F_REMCSUM_NOPARTIALBIT(0)
+#define FOU_F_DEEP_HASHBIT(1)
 
 struct fou_cfg {
u16 type;
@@ -281,6 +282,16 @@ static int fou_gro_complete(struct sock *sk, struct 
sk_buff *skb,
return err;
 }
 
+static int fou_flow_dissect(struct sock *sk, const struct sk_buff *skb,
+   void *data, int hlen, int *nhoff, u8 *ip_proto,
+   __be16 *proto)
+{
+   *ip_proto = fou_from_sock(sk)->protocol;
+   *nhoff += sizeof(struct udphdr);
+
+   return FLOW_DIS_RET_IPPROTO;
+}
+
 static struct guehdr *gue_gro_remcsum(struct sk_buff *skb, unsigned int off,
  struct guehdr *guehdr, void *data,
  size_t hdrlen, struct gro_remcsum *grc,
@@ -498,6 +509,48 @@ static int gue_gro_complete(struct sock *sk, struct 
sk_buff *skb, int nhoff)
return err;
 }
 
+static int gue_flow_dissect(struct sock *sk, const struct sk_buff *skb,
+   void *data, int hlen, int *nhoff, u8 *ip_proto,
+   __be16 *proto)
+{
+   struct guehdr _hdr, *hdr;
+
+   hdr = __skb_header_pointer(skb, *nhoff + sizeof(struct udphdr),
+  sizeof(_hdr), data, hlen, &_hdr);
+   if (!hdr)
+   return FLOW_DIS_RET_BAD;
+
+   switch (hdr->version) {
+   case 0: /* Full GUE header present */
+   if (hdr->control)
+   return FLOW_DIS_RET_PASS;
+
+   *nhoff += sizeof(struct udphdr) + sizeof(_hdr) +
+ (hdr->hlen << 2);
+   *ip_proto = hdr->proto_ctype;
+
+   return FLOW_DIS_RET_IPPROTO;
+   case 1:
+   /* Direct encapsulation of IPv4 or IPv6 */
+
+   switch (((struct iphdr *)hdr)->version) {
+   case 4:
+   *nhoff += sizeof(struct udphdr);
+   *ip_proto = IPPROTO_IPIP;
+   return FLOW_DIS_RET_IPPROTO;
+   case 6:
+   *nhoff += sizeof(struct udphdr);
+   *ip_proto = IPPROTO_IPV6;
+   return FLOW_DIS_RET_IPPROTO;
+   default:
+   return FLOW_DIS_RET_PASS;
+   }
+
+   default:
+   return FLOW_DIS_RET_PASS;
+   }
+}
+
 static int fou_add_to_port_list(struct net *net, struct fou *fou)
 {
struct fou_net *fn = net_generic(net, fou_net_id);
@@ -568,12 +621,16 @@ static int fou_create(struct net *net, struct fou_cfg 
*cfg,
tunnel_cfg.encap_rcv = fou_udp_recv;
tunnel_cfg.gro_receive = fou_gro_receive;
tunnel_cfg.gro_complete = fou_gro_complete;
+   if (cfg->flags & FOU_F_DEEP_HASH)
+   tunnel_cfg.flow_dissect = fou_flow_dissect;
fou->protocol = cfg->protocol;
break;
case FOU_ENCAP_GUE:
tunnel_cfg.encap_rcv = gue_udp_recv;
tunnel_cfg.gro_receive = gue_gro_receive;
tunnel_cfg.gro_complete = gue_gro_complete;
+   if (cfg->flags & FOU_F_DEEP_HASH)
+   tunnel_cfg.flow_dissect = gue_flow_dissect;
break;
default:
err = -EINVAL;
@@ -637,6 +694,7 @@ static const struct nla_policy fou_nl_policy[FOU_ATTR_MAX + 
1] = {
[FOU_ATTR_IPPROTO] = { .type = NLA_U8, },
[FOU_ATTR_TYPE] = { .type = NLA_U8, },
[FOU_ATTR_REMCSUM_NOPARTIAL] = { .type = NLA_FLAG, },
+   [FOU_ATTR_DEEP_HASH] = { .type = NLA_FLAG },
 };
 
 static int parse_nl_config(struct

[PATCH v2 net-next 5/7] udp: Add UDP flow dissection functions to IPv4 and IPv6

2016-10-17 Thread Tom Herbert

Add per protocol offload callbacks for flow_dissect to UDP for
IPv4 and IPv6. The callback functions extract the port numbers
and, together with the packet addresses (given in an argument of
type flow_dissector_key_addrs), use them to look up the UDP
socket. If a socket is found and flow_dissect is set for the
socket then that function is called.
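
The port extraction step can be illustrated with a small user-space sketch. It is an assumption-laden model: `struct udp_ports` and `read_udp_ports()` are hypothetical, a flat buffer stands in for the skb, and the ports are converted to host order for readability, whereas the patch passes them on in network order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the two ports at the start of a UDP header. */
struct udp_ports {
	uint16_t src;
	uint16_t dst;
};

/*
 * Reads the source and destination ports found at offset nhoff.
 * Returns 0 on success, or -1 if fewer than 4 bytes are available there,
 * the case in which the dissector would report a bad packet.
 */
static int read_udp_ports(const uint8_t *pkt, size_t len, size_t nhoff,
			  struct udp_ports *out)
{
	if (nhoff > len || len - nhoff < 4)
		return -1;

	out->src = (uint16_t)((pkt[nhoff] << 8) | pkt[nhoff + 1]);
	out->dst = (uint16_t)((pkt[nhoff + 2] << 8) | pkt[nhoff + 3]);
	return 0;
}
```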

Signed-off-by: Tom Herbert 
---
 net/ipv4/udp_offload.c | 39 +++
 net/ipv6/udp_offload.c | 40 +++-
 2 files changed, 78 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index f9333c9..c7753ba 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -377,11 +377,50 @@ static int udp4_gro_complete(struct sk_buff *skb, int 
nhoff)
return udp_gro_complete(skb, nhoff, udp4_lib_lookup_skb);
 }
 
+/* Assumes rcu lock is held */
+static int udp4_flow_dissect(const struct sk_buff *skb, void *data, int hlen,
+int *nhoff, u8 *ip_proto, __be16 *proto,
+struct flow_dissector_key_addrs *key_addrs)
+{
+   u16 _ports[2], *ports;
+   struct net *net;
+   struct sock *sk;
+   int dif = -1;
+
+   /* See if there is a flow dissector in the UDP socket */
+
+   if (skb->dev) {
+   net = dev_net(skb->dev);
+   dif = skb->dev->ifindex;
+   } else if (skb->sk) {
+   net = sock_net(skb->sk);
+   } else {
+   return FLOW_DIS_RET_PASS;
+   }
+
+   ports = __skb_header_pointer(skb, *nhoff, sizeof(_ports),
+data, hlen, &_ports);
+   if (!ports)
+   return FLOW_DIS_RET_BAD;
+
+   sk = udp4_lib_lookup_noref(net,
+  key_addrs->v4addrs.src, ports[0],
+  key_addrs->v4addrs.dst, ports[1],
+  dif);
+
+   if (sk && udp_sk(sk)->flow_dissect)
+   return udp_sk(sk)->flow_dissect(sk, skb, data, hlen, nhoff,
+   ip_proto, proto);
+   else
+   return FLOW_DIS_RET_PASS;
+}
+
 static const struct net_offload udpv4_offload = {
.callbacks = {
.gso_segment = udp4_ufo_fragment,
.gro_receive  = udp4_gro_receive,
.gro_complete = udp4_gro_complete,
+   .flow_dissect = udp4_flow_dissect,
},
 };
 
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index ac858c4..12d9a92 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -1,5 +1,5 @@
 /*
- * IPV6 GSO/GRO offload support
+ * ipv6 gso/gro offload support
  * Linux INET6 implementation
  *
  * This program is free software; you can redistribute it and/or
@@ -163,11 +163,49 @@ static int udp6_gro_complete(struct sk_buff *skb, int 
nhoff)
return udp_gro_complete(skb, nhoff, udp6_lib_lookup_skb);
 }
 
+/* Assumes rcu lock is held */
+static int udp6_flow_dissect(const struct sk_buff *skb, void *data, int hlen,
+int *nhoff, u8 *ip_proto, __be16 *proto,
+const struct flow_dissector_key_addrs *key_addrs)
+{
+   u16 _ports[2], *ports;
+   struct net *net;
+   struct sock *sk;
+   int dif = -1;
+
+   /* See if there is a flow dissector in the UDP socket */
+
+   if (skb->dev) {
+   net = dev_net(skb->dev);
+   dif = skb->dev->ifindex;
+   } else if (skb->sk) {
+   net = sock_net(skb->sk);
+   } else {
+   return FLOW_DIS_RET_PASS;
+   }
+
+   ports = __skb_header_pointer(skb, *nhoff, sizeof(_ports),
+data, hlen, &_ports);
+   if (!ports)
+   return FLOW_DIS_RET_BAD;
+
+   sk = udp6_lib_lookup_noref(net,
+  &key_addrs->v6addrs.src, ports[0],
+  &key_addrs->v6addrs.dst, ports[1],
+  dif);
+
+   if (sk && udp_sk(sk)->flow_dissect)
+   return udp_sk(sk)->flow_dissect(sk, skb, data, hlen, nhoff,
+   ip_proto, proto);
+   return FLOW_DIS_RET_PASS;
+}
+
 static const struct net_offload udpv6_offload = {
.callbacks = {
.gso_segment=   udp6_ufo_fragment,
.gro_receive=   udp6_gro_receive,
.gro_complete   =   udp6_gro_complete,
+   .flow_dissect   =   udp6_flow_dissect,
},
 };
 
-- 
2.9.3

[PATCH v2 net-next 6/7] udp: UDP tunnel flow dissection infrastructure

2016-10-17 Thread Tom Herbert

Add infrastructure to allow UDP tunnels to set up flow dissection.

Signed-off-by: Tom Herbert 
---
 include/net/udp_tunnel.h | 5 +
 net/ipv4/udp_tunnel.c| 5 +
 2 files changed, 10 insertions(+)

diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
index 02c5be0..81d2584 100644
--- a/include/net/udp_tunnel.h
+++ b/include/net/udp_tunnel.h
@@ -69,6 +69,10 @@ typedef struct sk_buff **(*udp_tunnel_gro_receive_t)(struct 
sock *sk,
 struct sk_buff *skb);
 typedef int (*udp_tunnel_gro_complete_t)(struct sock *sk, struct sk_buff *skb,
 int nhoff);
+typedef int (*udp_tunnel_flow_dissect_t)(struct sock *sk,
+const struct sk_buff *skb,
+void *data, int hlen, int *nhoff,
+u8 *ip_proto, __be16 *proto);
 
 struct udp_tunnel_sock_cfg {
void *sk_user_data; /* user data used by encap_rcv call back */
@@ -78,6 +82,7 @@ struct udp_tunnel_sock_cfg {
udp_tunnel_encap_destroy_t encap_destroy;
udp_tunnel_gro_receive_t gro_receive;
udp_tunnel_gro_complete_t gro_complete;
+   udp_tunnel_flow_dissect_t flow_dissect;
 };
 
 /* Setup the given (UDP) sock to receive UDP encapsulated packets */
diff --git a/net/ipv4/udp_tunnel.c b/net/ipv4/udp_tunnel.c
index 58bd39f..4459288 100644
--- a/net/ipv4/udp_tunnel.c
+++ b/net/ipv4/udp_tunnel.c
@@ -72,6 +72,11 @@ void setup_udp_tunnel_sock(struct net *net, struct socket 
*sock,
udp_sk(sk)->gro_receive = cfg->gro_receive;
udp_sk(sk)->gro_complete = cfg->gro_complete;
 
+   if (cfg->flow_dissect) {
+   udp_sk(sk)->flow_dissect = cfg->flow_dissect;
+   udp_flow_dissect_enable();
+   }
+
udp_tunnel_encap_enable(sock);
 }
 EXPORT_SYMBOL_GPL(setup_udp_tunnel_sock);
-- 
2.9.3

[PATCH v2 net-next 3/7] udp: Add socket lookup functions with noref

2016-10-17 Thread Tom Herbert

Create udp4_lib_lookup_noref and udp6_lib_lookup_noref. These perform
a socket lookup on addresses and ports without taking a reference.

Signed-off-by: Tom Herbert 
---
 include/net/udp.h |  8 
 net/ipv4/udp.c|  8 
 net/ipv6/udp.c| 10 ++
 3 files changed, 26 insertions(+)

diff --git a/include/net/udp.h b/include/net/udp.h
index ea53a87..717a972 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -275,6 +275,10 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 
saddr, __be16 sport,
   struct udp_table *tbl, struct sk_buff *skb);
 struct sock *udp4_lib_lookup_skb(struct sk_buff *skb,
 __be16 sport, __be16 dport);
+struct sock *udp4_lib_lookup_noref(struct net *net,
+  __be32 saddr, __be16 sport,
+  __be32 daddr, __be16 dport,
+  int dif);
 struct sock *udp6_lib_lookup(struct net *net,
 const struct in6_addr *saddr, __be16 sport,
 const struct in6_addr *daddr, __be16 dport,
@@ -286,6 +290,10 @@ struct sock *__udp6_lib_lookup(struct net *net,
   struct sk_buff *skb);
 struct sock *udp6_lib_lookup_skb(struct sk_buff *skb,
 __be16 sport, __be16 dport);
+struct sock *udp6_lib_lookup_noref(struct net *net,
+  const struct in6_addr *saddr, __be16 sport,
+  const struct in6_addr *daddr, __be16 dport,
+  int dif);
 
 /*
  * SNMP statistics for UDP and UDP-Lite
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 7d96dc2..7f84c51 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -595,6 +595,14 @@ struct sock *udp4_lib_lookup(struct net *net, __be32 
saddr, __be16 sport,
 EXPORT_SYMBOL_GPL(udp4_lib_lookup);
 #endif
 
+struct sock *udp4_lib_lookup_noref(struct net *net, __be32 saddr, __be16 sport,
+  __be32 daddr, __be16 dport, int dif)
+{
+   return __udp4_lib_lookup(net, saddr, sport, daddr, dport,
+dif, &udp_table, NULL);
+}
+EXPORT_SYMBOL_GPL(udp4_lib_lookup_noref);
+
 static inline bool __udp_is_mcast_sock(struct net *net, struct sock *sk,
   __be16 loc_port, __be32 loc_addr,
   __be16 rmt_port, __be32 rmt_addr,
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 9aa7c1c..6e382d9 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -317,6 +317,16 @@ struct sock *udp6_lib_lookup(struct net *net, const struct 
in6_addr *saddr, __be
 EXPORT_SYMBOL_GPL(udp6_lib_lookup);
 #endif
 
+struct sock *udp6_lib_lookup_noref(struct net *net,
+  const struct in6_addr *saddr, __be16 sport,
+  const struct in6_addr *daddr, __be16 dport,
+  int dif)
+{
+   return __udp6_lib_lookup(net, saddr, sport, daddr, dport,
+dif, &udp_table, NULL);
+}
+EXPORT_SYMBOL_GPL(udp6_lib_lookup_noref);
+
 /*
  * This should be easy, if there is something there we
  * return it, otherwise we block.
-- 
2.9.3

[PATCH v2 net-next 0/7] udp: Flow dissection for tunnels

2016-10-17 Thread Tom Herbert

Now that we have a means to perform a UDP socket lookup without taking
a reference, it is feasible to have flow dissector crack open UDP
encapsulated packets. Generally, we would expect that the UDP source
port or the flow label in IPv6 would contain enough entropy about
the encapsulated flow. However, there will be cases, such as a static
UDP tunnel with fixed ports, where dissecting the encapsulated packet
is valuable.

The model here is similar to that implemented for UDP GRO. A
tunnel implementation (e.g. GUE) may set a flow_dissect function
in the udp_sk. In __skb_flow_dissect a case has been added for
UDP to check if there is a socket with flow_dissect set. If there
is, the function is called. The (per tunnel implementation)
function can parse the encapsulation headers and return the
next protocol for __skb_flow_dissect to process and its position
in nhoff.

Since performing a UDP lookup on every packet might be expensive
I added a static key check to bypass the lookup if there are no
sockets with flow_dissect set. I should mention that doing the
lookup wasn't particularly a big hit anyway.

Fou/gue was modified to perform tunnel dissection. This is enabled
on each listener socket via a netlink configuration option.

v2:
  - davem suggested that we don't need udp_flow_dissect and that
udp{v6}_encap_needed could be used. Problem is that those are
in respective udp.c and flow_dissector.c is in net/core. Keep
udp_flow_dissect as more generic item.
  - Fixed Makefile issue where we were using CONFIG_NET instead of
CONFIG_INET.
  - Added limits in flow dissector for controlling the number of nested
encapsulations or EHs that are dissected.
  - Added CONFIG_INET around use of inet_offloads in flow_dissector.c.

Tested:

Running 200 streams with TCP_RR.

GRE/GUE variable source port (baseline)
RSS distributes packets, RFS is effective
1211702 tps
147/241/442 50/90/99% latencies
87.95% CPU utilization

GRE/GUE fixed source port
All packets to one CPU, RFS is ineffective
173680 tps
1170/1377/1853 50/90/99% latencies
7.42% CPU utilization

GRE/GUE fixed source port with deep hash enabled
All packets to one CPU, but now RFS is effective
730359 tps
263/325/464 50/90/99% latencies
38.25% CPU utilization (Interrupting CPU is maxed out)


Tom Herbert (7):
  ipv6: Fix Makefile conditional to use CONFIG_INET
  flow_dissector: Limit processing of next encaps and extensions
  udp: Add socket lookup functions with noref
  udp: UDP flow dissector
  udp: Add UDP flow dissection functions to IPv4 and IPv6
  udp: UDP tunnel flow dissection infrastructure
  fou: Support flow dissection

 include/linux/netdevice.h|   5 ++
 include/linux/udp.h  |   7 +++
 include/net/flow_dissector.h |   8 +++
 include/net/udp.h|  12 +
 include/net/udp_tunnel.h |   5 ++
 include/uapi/linux/fou.h |   1 +
 net/Makefile |   2 +-
 net/core/flow_dissector.c| 122 ++-
 net/ipv4/fou.c   |  68 +++-
 net/ipv4/udp.c   |  11 
 net/ipv4/udp_offload.c   |  39 ++
 net/ipv4/udp_tunnel.c|   5 ++
 net/ipv6/udp.c   |  10 
 net/ipv6/udp_offload.c   |  40 +-
 14 files changed, 320 insertions(+), 15 deletions(-)

-- 
2.9.3
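
For readers following the cover letter above, here is a rough userspace sketch of the gating it describes: the socket lookup is skipped entirely unless at least one socket has a flow_dissect callback set. This is not the kernel code. A plain counter stands in for the static key, a small array stands in for the UDP socket table, and every model_* name is made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-socket callback: returns 0 and advances *nhoff on success. */
typedef int (*flow_dissect_fn)(const unsigned char *data, int hlen, int *nhoff);

struct model_sock {
    unsigned short port;
    flow_dissect_fn flow_dissect;
};

#define MODEL_MAX_SOCKS 4

static struct model_sock model_table[MODEL_MAX_SOCKS];
static int model_nsocks;
static int model_lookups;       /* how many table lookups were performed */
static int flow_dissect_users;  /* stand-in for the static key */

/* Register a socket; bump the user count only if it has a callback. */
static int model_bind(unsigned short port, flow_dissect_fn fn)
{
    if (model_nsocks == MODEL_MAX_SOCKS)
        return -1;
    model_table[model_nsocks].port = port;
    model_table[model_nsocks].flow_dissect = fn;
    model_nsocks++;
    if (fn)
        flow_dissect_users++;
    return 0;
}

static struct model_sock *model_lookup(unsigned short port)
{
    model_lookups++;
    for (int i = 0; i < model_nsocks; i++)
        if (model_table[i].port == port)
            return &model_table[i];
    return NULL;
}

/* Returns true if a per-socket dissector ran and advanced *nhoff. */
static bool model_udp_dissect(unsigned short dport, const unsigned char *data,
                              int hlen, int *nhoff)
{
    struct model_sock *sk;

    if (flow_dissect_users == 0)
        return false;           /* fast path: no lookup at all */

    sk = model_lookup(dport);
    if (!sk || !sk->flow_dissect)
        return false;

    return sk->flow_dissect(data, hlen, nhoff) == 0;
}

/* Hypothetical tunnel dissector: skips a fixed 4-byte encapsulation header. */
static int model_skip4(const unsigned char *data, int hlen, int *nhoff)
{
    (void)data;
    if (*nhoff + 4 > hlen)
        return -1;
    *nhoff += 4;
    return 0;
}
```

Before any socket registers a callback, model_udp_dissect() returns without touching the table, which is the property the static key check is meant to provide.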

[PATCH v2 net-next 1/7] ipv6: Fix Makefile conditional to use CONFIG_INET

2016-10-17 Thread Tom Herbert

ipv6 directory was being built based on CONFIG_NET not CONFIG_INET.

Signed-off-by: Tom Herbert 
---
 net/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/Makefile b/net/Makefile
index 4cafaa2..82ffb91 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -17,7 +17,7 @@ obj-$(CONFIG_NETFILTER)   += netfilter/
 obj-$(CONFIG_INET) += ipv4/
 obj-$(CONFIG_XFRM) += xfrm/
 obj-$(CONFIG_UNIX) += unix/
-obj-$(CONFIG_NET)  += ipv6/
+obj-$(CONFIG_INET) += ipv6/
 obj-$(CONFIG_PACKET)   += packet/
 obj-$(CONFIG_NET_KEY)  += key/
 obj-$(CONFIG_BRIDGE)   += bridge/
-- 
2.9.3

[PATCH v2 net-next 2/7] flow_dissector: Limit processing of next encaps and extensions

2016-10-17 Thread Tom Herbert

Flow dissector does not limit the number of encapsulated packets or IPv6
header extensions that will be processed. This could easily be
susceptible to a DoS attack; for instance, a 1500 byte packet could contain
75 IPIP headers.

This patch places limits on the number of encapsulations and IPv6 extension
headers that are processed in flow dissector.

Signed-off-by: Tom Herbert 
---
 net/core/flow_dissector.c | 37 +++--
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 1a7b80f..919bd02 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -91,6 +91,22 @@ __be32 __skb_flow_get_ports(const struct sk_buff *skb, int 
thoff, u8 ip_proto,
 }
 EXPORT_SYMBOL(__skb_flow_get_ports);
 
+#define MAX_DISSECT_DEPTH  10
+#define MAX_DISSECT_EXT    10
+
+#define __DISSECT_AGAIN(_target, _depth, _limit) do {  \
+   (_depth)++; \
+   if ((_depth) > (_limit))\
+   goto out_good;  \
+   else\
+   goto _target;   \
+} while (0)
+
+#define DISSECT_AGAIN(target) \
+   __DISSECT_AGAIN(target, depth, MAX_DISSECT_DEPTH)
+#define DISSECT_AGAIN_EXT(target) \
+   __DISSECT_AGAIN(target, ext_cnt, MAX_DISSECT_EXT)
+
 /**
  * __skb_flow_dissect - extract the flow_keys struct and return it
  * @skb: sk_buff to extract the flow from, can be NULL if the rest are 
specified
@@ -123,6 +139,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
bool skip_vlan = false;
u8 ip_proto = 0;
bool ret = false;
+   int depth = 0, ext_cnt = 0;
 
if (!data) {
data = skb->data;
@@ -262,7 +279,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
proto = vlan->h_vlan_encapsulated_proto;
nhoff += sizeof(*vlan);
if (skip_vlan)
-   goto again;
+   DISSECT_AGAIN(again);
}
 
skip_vlan = true;
@@ -285,7 +302,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
}
}
 
-   goto again;
+   DISSECT_AGAIN(again);
}
case htons(ETH_P_PPP_SES): {
struct {
@@ -299,9 +316,9 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
nhoff += PPPOE_SES_HLEN;
switch (proto) {
case htons(PPP_IP):
-   goto ip;
+   DISSECT_AGAIN(ip);
case htons(PPP_IPV6):
-   goto ipv6;
+   DISSECT_AGAIN(ipv6);
default:
goto out_bad;
}
@@ -472,7 +489,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
if (flags & FLOW_DISSECTOR_F_STOP_AT_ENCAP)
goto out_good;
 
-   goto again;
+   DISSECT_AGAIN(again);
}
case NEXTHDR_HOP:
case NEXTHDR_ROUTING:
@@ -490,7 +507,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
ip_proto = opthdr[0];
nhoff += (opthdr[1] + 1) << 3;
 
-   goto ip_proto_again;
+   DISSECT_AGAIN_EXT(ip_proto_again);
}
case NEXTHDR_FRAGMENT: {
struct frag_hdr _fh, *fh;
@@ -512,7 +529,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
if (!(fh->frag_off & htons(IP6_OFFSET))) {
key_control->flags |= FLOW_DIS_FIRST_FRAG;
if (flags & FLOW_DISSECTOR_F_PARSE_1ST_FRAG)
-   goto ip_proto_again;
+   DISSECT_AGAIN_EXT(ip_proto_again);
}
goto out_good;
}
@@ -523,7 +540,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
if (flags & FLOW_DISSECTOR_F_STOP_AT_ENCAP)
goto out_good;
 
-   goto ip;
+   DISSECT_AGAIN(ip);
case IPPROTO_IPV6:
proto = htons(ETH_P_IPV6);
 
@@ -531,10 +548,10 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
if (flags & FLOW_DISSECTOR_F_STOP_AT_ENCAP)
goto out_good;
 
-   goto ipv6;
+   DISSECT_AGAIN(ipv6);
case IPPROTO_MPLS:
proto = htons(ETH_P_MPLS_UC);
-   goto mpls;
+   DISSECT_AGAIN(mpls);
default:
break;
}
-- 
2.9.3
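
To make the numbers in the commit message concrete: a 1500 byte packet holds 1500 / 20 = 75 option-less IPv4 headers. The sketch below is a userspace model only, assuming 20-byte headers with protocol 4 (IPIP) at byte offset 9. It counts how many headers get examined with and without a limit that behaves like __DISSECT_AGAIN (increment, then stop once the count exceeds the limit). The sketch_* names are illustrative and do not appear in the patch.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_DEPTH  10  /* mirrors MAX_DISSECT_DEPTH in the patch */
#define SKETCH_IPIP_PROTO 4   /* IPPROTO_IPIP */
#define SKETCH_IP_HLEN    20  /* IPv4 header without options */

/* Fill a buffer with back-to-back IPv4 headers that all claim IPIP. */
static void sketch_fill_ipip(unsigned char *pkt, size_t len)
{
    memset(pkt, 0, len);
    for (size_t off = 0; off + SKETCH_IP_HLEN <= len; off += SKETCH_IP_HLEN)
        pkt[off + 9] = SKETCH_IPIP_PROTO;  /* protocol field */
}

/*
 * Walk nested IPIP headers and return how many were examined.
 * A negative limit means "no limit" (the behaviour before the patch).
 */
static int sketch_count_headers(const unsigned char *pkt, size_t len, int limit)
{
    size_t off = 0;
    int headers = 0;
    int jumps = 0;

    while (off + SKETCH_IP_HLEN <= len) {
        headers++;
        if (pkt[off + 9] != SKETCH_IPIP_PROTO)
            break;
        /* Equivalent of DISSECT_AGAIN(ip): count the jump, then check. */
        jumps++;
        if (limit >= 0 && jumps > limit)
            break;
        off += SKETCH_IP_HLEN;
    }
    return headers;
}
```

With no limit the walk touches all 75 headers; with the limit of 10 it stops after the 11th header, since the initial pass is not counted as a jump.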

[PATCH v2 net-next 4/7] udp: UDP flow dissector

2016-10-17 Thread Tom Herbert

Add infrastructure for performing per protocol flow dissection and
support flow dissection in UDP payloads (e.g. flow dissection on a
UDP encapsulated tunnel).

The per protocol flow dissector is called by flow_dissect function
in the offload_callbacks of a protocol. The arguments of this function
include the necessary information to do flow dissection as derived
from __skb_flow_dissect which is where the callback is intended to be
called from. There are return codes from the callback in the form
FLOW_DIS_RET_* that indicate the result. FLOW_DIS_RET_IPPROTO
means that the payload should be dissected as an IP proto, the
specific protocol is returned in a pointer argument. Likewise,
FLOW_DIS_RET_PROTO indicates the payload should be processed as
an ethertype which is returned in another argument.

A case for IPPROTO_UDP was added to __skb_flow_dissect. Since
UDP flow dissector involves a relatively expensive socket lookup
there is a static key check first to see if there are any sockets
that have enabled flow dissection. After this check, the offload
ops for UDP for either IPv4 or IPv6 are considered. If the
flow_dissect function is set, it is called. Upon return the result
is processed (pass, out_bad, process as IP protocol, process
as ethertype). Note that if the result indicates a protocol must
be processed it is expected that nhoff has been updated to the
encapsulated protocol header.
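
The result handling described above could be sketched roughly as follows. The FLOW_DIS_RET_* values are the ones this patch adds to flow_dissector.h; the sketch_* names, and the choice to treat an unknown code as bad, are assumptions made for illustration and are not taken from the patch.

```c
/* Return codes as added to include/net/flow_dissector.h by this patch. */
enum {
    FLOW_DIS_RET_PASS = 0,
    FLOW_DIS_RET_BAD,
    FLOW_DIS_RET_IPPROTO,
    FLOW_DIS_RET_PROTO,
};

/* Hypothetical names for what the caller does next. */
enum sketch_action {
    SKETCH_CONTINUE_UDP,      /* no tunnel handling, carry on as plain UDP */
    SKETCH_OUT_BAD,           /* dissection failed */
    SKETCH_DISSECT_IPPROTO,   /* payload is an IP protocol, see *ip_proto */
    SKETCH_DISSECT_ETHERTYPE, /* payload is an ethertype, see *proto */
};

/* Map a callback return code to the next step of dissection. */
static enum sketch_action sketch_handle_ret(int ret)
{
    switch (ret) {
    case FLOW_DIS_RET_PASS:
        return SKETCH_CONTINUE_UDP;
    case FLOW_DIS_RET_IPPROTO:
        return SKETCH_DISSECT_IPPROTO;
    case FLOW_DIS_RET_PROTO:
        return SKETCH_DISSECT_ETHERTYPE;
    case FLOW_DIS_RET_BAD:
    default:
        /* Assumption: anything unrecognized is treated like BAD. */
        return SKETCH_OUT_BAD;
    }
}
```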

Signed-off-by: Tom Herbert 
---
 include/linux/netdevice.h|  5 +++
 include/linux/udp.h  |  7 
 include/net/flow_dissector.h |  8 +
 include/net/udp.h|  4 +++
 net/core/flow_dissector.c| 85 ++--
 net/ipv4/udp.c   |  3 ++
 6 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index bf341b6..c5f4295 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2203,6 +2203,11 @@ struct offload_callbacks {
struct sk_buff  **(*gro_receive)(struct sk_buff **head,
 struct sk_buff *skb);
int (*gro_complete)(struct sk_buff *skb, int nhoff);
+   int (*flow_dissect)(const struct sk_buff *skb,
+   void *data, int hlen,
+   int *nhoff, u8 *ip_proto,
+   __be16 *proto,
+struct flow_dissector_key_addrs *key_addrs);
 };
 
 struct packet_offload {
diff --git a/include/linux/udp.h b/include/linux/udp.h
index d1fd8cd..608ebf4 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -79,6 +79,13 @@ struct udp_sock {
int (*gro_complete)(struct sock *sk,
struct sk_buff *skb,
int nhoff);
+
+   /* Flow dissector function for UDP socket */
+   int (*flow_dissect)(struct sock *sk,
+   const struct sk_buff *skb,
+   void *data, int hlen,
+   int *nhoff, u8 *ip_proto,
+   __be16 *proto);
 };
 
 static inline struct udp_sock *udp_sk(const struct sock *sk)
diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index d953492..9de4904 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -203,4 +203,12 @@ static inline void *skb_flow_dissector_target(struct 
flow_dissector *flow_dissec
return ((char *)target_container) + flow_dissector->offset[key_id];
 }
 
+/* Return codes from per socket flow dissector (e.g. UDP) */
+enum {
+   FLOW_DIS_RET_PASS = 0,
+   FLOW_DIS_RET_BAD,
+   FLOW_DIS_RET_IPPROTO,
+   FLOW_DIS_RET_PROTO,
+};
+
 #endif
diff --git a/include/net/udp.h b/include/net/udp.h
index 717a972..8d364e8 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -360,4 +360,8 @@ void udp_encap_enable(void);
 #if IS_ENABLED(CONFIG_IPV6)
 void udpv6_encap_enable(void);
 #endif
+
+void udp_flow_dissect_enable(void);
+void udp_flow_dissect_disable(void);
+
 #endif /* _UDP_H */
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 919bd02..06ccfd5 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -8,6 +8,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -57,6 +59,20 @@ void skb_flow_dissector_init(struct flow_dissector 
*flow_dissector,
 }
 EXPORT_SYMBOL(skb_flow_dissector_init);
 
+static struct static_key udp_flow_dissect __read_mostly;
+
+void udp_flow_dissect_enable(void)
+{
+   static_key_slow_inc(&udp_flow_dissect);
+}
+EXPORT_SYMBOL(udp_flow_dissect_enable);
+
+void udp_flow_dissect_disable(void)
+{
+   static_key_slow_dec(&udp_flow_dissect);
+}
+EXPORT_SYMBOL(udp_flow_dissect_disable);
+
 /**
  * __skb_flow_get_ports -

Re: [PATCH] qed: Use list_move_tail instead of list_del/list_add_tail

2016-10-17 Thread David Miller

From: "Mintz, Yuval" 
Date: Mon, 17 Oct 2016 19:10:10 +

>> Using list_move_tail() instead of list_del() + list_add_tail().
> 
>> Signed-off-by: Wei Yongjun 
> 
> Thanks.
> Acked-by: Yuval Mintz 

Applied, thanks.
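
For anyone unfamiliar with the helper, here is a minimal userspace sketch of why the two forms are interchangeable. It is a toy circular doubly-linked list, not the kernel's list.h; the sketch_* names are made up, and sketch_del() simply leaves the removed node's own pointers untouched, whereas the kernel's list_del() poisons them.

```c
struct sketch_list {
    struct sketch_list *next, *prev;
};

static void sketch_init(struct sketch_list *h)
{
    h->next = h;
    h->prev = h;
}

/* Insert n just before the head h, i.e. at the tail of the list. */
static void sketch_add_tail(struct sketch_list *n, struct sketch_list *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Unlink n from whatever list it is on. */
static void sketch_del(struct sketch_list *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* One call that unlinks n and appends it to the list headed by h. */
static void sketch_move_tail(struct sketch_list *n, struct sketch_list *h)
{
    sketch_del(n);
    sketch_add_tail(n, h);
}
```

The single call leaves both lists in the same state as the two-call sequence, which is all the qed change relies on.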

nfs NULL-dereferencing in net-next

2016-10-17 Thread Jakub Kicinski

Hi!

I'm hitting this reliably on net-next, HEAD at 3f3177bb680f
("fsl/fman: fix error return code in mac_probe()").

[   23.409633] BUG: unable to handle kernel NULL pointer dereference at 
0172
[   23.418716] IP: [] rpc_clnt_xprt_switch_has_addr+0xc/0x40 
[sunrpc]
[   23.427574] PGD 859020067 [   23.430472] PUD 858f2d067 
PMD 0 [   23.434311] 
[   23.436133] Oops:  [#1] PREEMPT SMP
[   23.440506] Modules linked in: nfsv4 ip6table_filter ip6_tables 
iptable_filter ip_tables ebtable_nat ebtables x_tables intel_ri
[   23.505915] CPU: 1 PID: 1067 Comm: mount.nfs Not tainted 
4.8.0-perf-13951-g3f3177bb680f #51
[   23.515363] Hardware name: Dell Inc. PowerEdge T630/0W9WXC, BIOS 1.2.10 
03/10/2015
[   23.523937] task: 983e9086ea00 task.stack: ac6c0a57c000
[   23.530641] RIP: 0010:[]  [] 
rpc_clnt_xprt_switch_has_addr+0xc/0x40 [sunrpc]
[   23.542229] RSP: 0018:ac6c0a57fb28  EFLAGS: 00010a97
[   23.548255] RAX: c80214ac RBX: 983e97c7b000 RCX: 983e9b3bc180
[   23.556320] RDX: 0001 RSI: 983e9928ed28 RDI: ffea
[   23.564386] RBP: ac6c0a57fb38 R08: 983e97090630 R09: 983e9928ed30
[   23.572452] R10: ac6c0a57fba0 R11: 0010 R12: ac6c0a57fba0
[   23.580517] R13: 983e9928ed28 R14:  R15: 983e91360560
[   23.588585] FS:  7f4c348aa880() GS:983e9f24() 
knlGS:
[   23.597742] CS:  0010 DS:  ES:  CR0: 80050033
[   23.604251] CR2: 0172 CR3: 000850a5f000 CR4: 001406e0
[   23.612316] Stack:
[   23.614648]  983e97c7b000 ac6c0a57fba0 ac6c0a57fb90 
c04d38c3
[   23.623331]  983e91360500 983e9928ed30 c0b9e560 
983e913605b8
[   23.632016]  983e9882e800 983e9882e800 ac6c0a57fc30 
ac6c0a57fdb8
[   23.640706] Call Trace:
[   23.643535]  [] nfs_get_client+0x123/0x340 [nfs]
[   23.650542]  [] nfs4_set_client+0x80/0xb0 [nfsv4]
[   23.657642]  [] nfs4_create_server+0x115/0x2a0 [nfsv4]
[   23.665230]  [] nfs4_remote_mount+0x2e/0x60 [nfsv4]
[   23.672519]  [] mount_fs+0x3a/0x160
[   23.678254]  [] ? alloc_vfsmnt+0x19e/0x230
[   23.684669]  [] vfs_kern_mount+0x67/0x110
[   23.690990]  [] nfs_do_root_mount+0x84/0xc0 [nfsv4]
[   23.698284]  [] nfs4_try_mount+0x37/0x50 [nfsv4]
[   23.705287]  [] nfs_fs_mount+0x2d1/0xa70 [nfs]
[   23.712092]  [] ? find_next_bit+0x18/0x20
[   23.718413]  [] ? nfs_remount+0x3c0/0x3c0 [nfs]
[   23.725316]  [] ? nfs_clone_super+0x130/0x130 [nfs]
[   23.732606]  [] mount_fs+0x3a/0x160
[   23.738340]  [] ? alloc_vfsmnt+0x19e/0x230
[   23.744755]  [] vfs_kern_mount+0x67/0x110
[   23.751071]  [] do_mount+0x1bf/0xc70
[   23.756904]  [] ? copy_mount_options+0xbb/0x220
[   23.763803]  [] SyS_mount+0x83/0xd0
[   23.769538]  [] entry_SYSCALL_64_fastpath+0x17/0x98
[   23.776817] Code: 01 00 48 8b 93 f8 04 00 00 44 89 e6 48 c7 c7 98 b2 43 c0 
e8 9f 0d d4 f9 eb c0 0f 1f 44 00 00 0f 1f 44 00 00  
[   23.802909] RIP  [] rpc_clnt_xprt_switch_has_addr+0xc/0x40 
[sunrpc]
[   23.811857]  RSP 
[   23.815839] CR2: 0172
[   23.819629] ---[ end trace 9958eca92c9eeafe ]---
[   23.827345] note: mount.nfs[1067] exited with preempt_count 1


.config
Description: Binary data

Re: [Intel-wired-lan] [PATCH V2 RFC 2/2] ixgbe: ixgbe_atr() compute l4_proto only if non-paged data has network/transport headers

2016-10-17 Thread Sowmini Varadhan

On (10/17/16 11:15), Alexander Duyck wrote:
> I would say you probably only need the first check here for skb->data
> and could probably skip the second part.  You will be testing for
> skb_tail_pointer in all the other tests you added so this check is
> redundant anyway.
> 
> Also you might want to go through and wrap these with unlikely() since
> most of these are exception cases.

Ok.. v3 will have this.

> > /* Currently only IPv4/IPv6 with TCP is supported */
> > switch (hdr.ipv4->version) {
> > case IPVERSION:
> > /* access ihl as u8 to avoid unaligned access on ia64 */
> > hlen = (hdr.network[0] & 0x0F) << 2;
> > +   if (skb_tail_pointer(skb) < hdr.network + hlen +
> > +   sizeof(struct tcphdr))
> > +   return;
> > l4_proto = hdr.ipv4->protocol;
> > break;
> > case 6:
> > hlen = hdr.network - skb->data;
> > +   if (skb_tail_pointer(skb) < hdr.network + hlen +
> > +   sizeof(struct tcphdr))
> > +   return;
> > l4_proto = ipv6_find_hdr(skb, &hlen, IPPROTO_TCP, NULL, 
> > NULL);
> > hlen -= hdr.network - skb->data;
> > break;
> 
> I believe one more check is needed after this block to verify the TCP
> header fields are present.
> 
> So you probably need to add a check for "skb_tail_pointer(skb) <
> (hdr.network + hlen + 20)".

But isn't that the same thing as the checks before l4_proto computation above?

--Sowmini
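
For reference, the kind of check being discussed in this thread, reduced to a standalone sketch. It models only the pointer arithmetic on a flat buffer (no skb), assumes a 20-byte minimum TCP header, and uses made-up sketch_* names; it is not the ixgbe code.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_TCP_HLEN 20  /* minimum TCP header, as in the review comment */

/* IPv4 header length in bytes, taken from the low nibble of byte 0. */
static size_t sketch_ipv4_hlen(const unsigned char *network)
{
    return (size_t)(network[0] & 0x0F) << 2;
}

/*
 * True if the network header of length hlen plus a minimal TCP header
 * both lie before 'tail' (the end of the linear data).
 * Lengths are compared instead of forming network + hlen + 20, so no
 * pointer past the end of the buffer is ever computed.
 */
static bool sketch_l4_in_linear(const unsigned char *network,
                                const unsigned char *tail, size_t hlen)
{
    size_t avail;

    if (tail < network)
        return false;
    avail = (size_t)(tail - network);
    return avail >= hlen + SKETCH_TCP_HLEN;
}
```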

Re: [PATCH 4/4] gpio: ptxpmb-ext-cpld: Document bindings of PTXPMB extended CPLD

2016-10-17 Thread Pantelis Antoniou

Hi Rob,

> On Oct 10, 2016, at 23:19 , Rob Herring  wrote:
> 
> On Fri, Oct 07, 2016 at 06:19:34PM +0300, Pantelis Antoniou wrote:
>> From: Georgi Vlaev 
>> 
>> Add device tree bindings document for the GPIO driver of
>> Juniper's PTXPMB extended CPLD.
>> 
>> Signed-off-by: Georgi Vlaev 
>> [Ported from Juniper kernel]
>> Signed-off-by: Pantelis Antoniou 
>> ---
>> .../bindings/gpio/jnx,gpio-ptxpmb-ext-cpld.txt | 36 
>> ++
>> 1 file changed, 36 insertions(+)
>> create mode 100644 
>> Documentation/devicetree/bindings/gpio/jnx,gpio-ptxpmb-ext-cpld.txt
>> 
>> diff --git 
>> a/Documentation/devicetree/bindings/gpio/jnx,gpio-ptxpmb-ext-cpld.txt 
>> b/Documentation/devicetree/bindings/gpio/jnx,gpio-ptxpmb-ext-cpld.txt
>> new file mode 100644
>> index 000..87f01b9
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/gpio/jnx,gpio-ptxpmb-ext-cpld.txt
>> @@ -0,0 +1,36 @@
>> +Juniper PTXPMB extended CPLD GPIO block
>> +
>> +Required properties:
>> +
>> +- compatible:
>> +Must be "jnx,gpio-ptxpmb-ext-cpld"
> 
> Generally, '-gpio' would be last.
> 

OK.


>> +
>> +- #gpio-cells:
>> +Should be <2>.  The first cell is the pin number (within the 
>> controller's
>> +pin space), and the second is used for the following flags:
>> +bit[0]: direction (0 = out, 1 = in)
>> +bit[1]: init high
>> +bit[2]: active low
> 
> Same comment as all the other gpio bindings...
> 
>> +
>> +- gpio-controller:
>> +Specifies that the node is a GPIO controller.
>> +
>> +- interrupt-controller:
>> +Specifies that the node is an interrupt controller.
>> +
>> +Optional properties:
>> +
>> +- reg:
>> +Address and length of the register set for the device. Usually supplied
>> +by the parent MFD device.
> 
> Make it required.
> 

Hmm, the current driver supplies that range via platform data (it’s an mfd 
driver).
What’s the take on mixing those?

>> +
>> +
>> +Example:
>> +
>> +gpio_ext_cpld: cpld-ext-gpio {
>> +compatible = "jnx,gpio-ptxpmb-ext-cpld";
>> +#gpio-cells = <2>;
>> +#interrupt-cells = <2>;
>> +gpio-controller;
>> +interrupt-controller;
>> +};
>> -- 
>> 1.9.1
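
As an illustration of the second #gpio-cells cell quoted above, a small sketch that decodes the three flag bits as the binding text describes them (bit 0 direction, bit 1 init high, bit 2 active low). The struct and the SKETCH_* names are hypothetical and are not part of the driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_FLAG_DIR_IN     (1u << 0)  /* 0 = out, 1 = in */
#define SKETCH_FLAG_INIT_HIGH  (1u << 1)
#define SKETCH_FLAG_ACTIVE_LOW (1u << 2)

struct sketch_gpio_spec {
    uint32_t pin;
    bool input;
    bool init_high;
    bool active_low;
};

/* Decode a two-cell specifier <pin flags> into its fields. */
static struct sketch_gpio_spec sketch_decode(uint32_t pin, uint32_t flags)
{
    struct sketch_gpio_spec s;

    s.pin = pin;
    s.input = (flags & SKETCH_FLAG_DIR_IN) != 0;
    s.init_high = (flags & SKETCH_FLAG_INIT_HIGH) != 0;
    s.active_low = (flags & SKETCH_FLAG_ACTIVE_LOW) != 0;
    return s;
}
```

So a specifier of <5 0x5> would read as pin 5, input, active low.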

Re: [PATCH] qed: Use list_move_tail instead of list_del/list_add_tail

2016-10-17 Thread Mintz, Yuval

> Using list_move_tail() instead of list_del() + list_add_tail().

> Signed-off-by: Wei Yongjun 

Thanks.
Acked-by: Yuval Mintz

Re: [PATCH 08/10] mtd: flash-sam: Bindings for Juniper's SAM FPGA flash

2016-10-17 Thread Pantelis Antoniou

Hi Rob,

> On Oct 10, 2016, at 23:07 , Rob Herring  wrote:
> 
> On Fri, Oct 07, 2016 at 06:18:36PM +0300, Pantelis Antoniou wrote:
>> From: Georgi Vlaev 
>> 
>> Add binding document for Juniper's Flash IP block present
>> in the SAM FPGA on PTX series of routers.
>> 
>> Signed-off-by: Georgi Vlaev 
>> [Ported from Juniper kernel]
>> Signed-off-by: Pantelis Antoniou 
>> ---
>> .../devicetree/bindings/mtd/flash-sam.txt  | 31 
>> ++
>> 1 file changed, 31 insertions(+)
>> create mode 100644 Documentation/devicetree/bindings/mtd/flash-sam.txt
>> 
>> diff --git a/Documentation/devicetree/bindings/mtd/flash-sam.txt 
>> b/Documentation/devicetree/bindings/mtd/flash-sam.txt
>> new file mode 100644
>> index 000..bdf1d78
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/mtd/flash-sam.txt
>> @@ -0,0 +1,31 @@
>> +Flash device on a Juniper SAM FPGA
>> +
>> +These flash chips are found in the PTX series of Juniper routers.
>> +
>> +They are regular CFI compatible (Intel or AMD extended) flash chips with
>> +some special write protect/VPP bits that can be controlled by the machine's
>> +system controller.
> 
> And where's the description of the sys ctrlr?
> 

The system controller is Juniper IP. We’ll have to ask around about
specifics, and it’s pretty uninspiring stuff.

>> +
>> +Required properties:
>> +- compatible : must be "jnx,flash-sam"
>> +
>> +Optional properties:
>> +- reg : memory address for the flash chip, note that this is not
>> +required since usually the device is a subdevice of the SAM MFD
>> +driver which fills in the register fields.
>> +
>> +For the rest of the properties, see mtd-physmap.txt.
>> +
>> +The device tree may optionally contain sub-nodes describing partitions of 
>> the
>> +address space. See partition.txt for more detail.
>> +
>> +Example:
>> +
>> +flash_sam {
>> +compatible = "jnx,flash-sam";
>> +partition@0 {
> 
> This should have a hierarchy of a controller node, a flash child node, 
> partitions child node, and partition child nodes.
> 

OK.

>> +reg = <0x0 0x40>;
>> +label = "pic0-golden";
>> +read-only;
>> +};
>> +};
>> -- 
>> 1.9.1

Re: [PATCH 2/4] mfd: ptxpmb-ext-cpld: Add documentation for PTXPMB extended CPLD

2016-10-17 Thread Pantelis Antoniou

Hi Rob,

> On Oct 10, 2016, at 23:10 , Rob Herring  wrote:
> 
> On Fri, Oct 07, 2016 at 06:19:32PM +0300, Pantelis Antoniou wrote:
>> From: Georgi Vlaev 
>> 
>> Add DT bindings document for the PTXPMB extended CPLD device.
>> 
>> Signed-off-by: Georgi Vlaev 
>> [Ported from Juniper kernel]
>> Signed-off-by: Pantelis Antoniou 
>> ---
>> .../bindings/mfd/jnx-ptxpmb-ext-cpld.txt   | 35 
>> ++
>> 1 file changed, 35 insertions(+)
>> create mode 100644 
>> Documentation/devicetree/bindings/mfd/jnx-ptxpmb-ext-cpld.txt
>> 
>> diff --git a/Documentation/devicetree/bindings/mfd/jnx-ptxpmb-ext-cpld.txt 
>> b/Documentation/devicetree/bindings/mfd/jnx-ptxpmb-ext-cpld.txt
>> new file mode 100644
>> index 000..098a548a
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/mfd/jnx-ptxpmb-ext-cpld.txt
>> @@ -0,0 +1,35 @@
>> +* Device tree bindings for Juniper's PTXPMB Extended CPLD FPGA MFD driver
>> +
>> +The device supports a gpio block which is described in the
>> +jnx-gpio-ptxpmb-ext-cpld document.
>> +
>> +Required properties:
>> +
>> +- compatible:   "jnx,ptxpmb-ext-cpld"
>> +
>> +- reg:  contains offset/length value for device state 
>> control
>> +registers space.
>> +
>> +Optional properties:
>> +
>> +- interrupts:   The interrupt line(s) the /IRQ signal(s) for 
>> the device is
>> +connected to.
>> +
>> +- interrupt-parent: The parent interrupt controller.
>> +
>> +Example:
>> +
>> +ext-cpld@1,0 {
>> +compatible = "jnx,ptxpmb-ext-cpld";
>> +reg = <0x1 0 0x1000>;
> 
> What's the bus type here? Unit address is probably wrong.
> 

localbus on a gpmc memory controller.

>> +interrupt-parent = <>;
>> +interrupts = <7 2>, <8 2>;
>> +
>> +gpio_ext_cpld: cpld-ext-gpio {
>> +compatible = "jnx,gpio-ptxpmb-ext-cpld";
>> +#gpio-cells = <2>;
>> +#interrupt-cells = <2>;
>> +gpio-controller;
>> +interrupt-controller;
>> +};
>> +};
>> -- 
>> 1.9.1

Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest

2016-10-17 Thread Richard Cochran

On Mon, Oct 17, 2016 at 01:36:45PM -0500, Julia Cartwright wrote:
> While that is certainly the case, and would explain the most egregious
> of measured latency spikes, it doesn't invalidate the test if you
> consider the valuable data point(s) to be the minimum and/or median
> latencies.

Well, consider the case where an interrupt is stuck on.  That is a
possible cause, and it can be positively excluded by either disabling
local interrupts around the time stamps or by putting the vector
events into the trace.

(Doesn't matter now that bisection fingered the PCIe setup, just sayin.)

Thanks,
Richard
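
To illustrate the point about minimum and median being robust against the occasional preempted sample, a small sketch over an array of measured deltas. The function names and the choice of the lower median for even counts are assumptions for illustration; nothing here comes from the trace patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static int sketch_cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;

    return (x > y) - (x < y);
}

/* Smallest sample; n must be at least 1. */
static uint64_t sketch_min(const uint64_t *v, size_t n)
{
    uint64_t m = v[0];

    for (size_t i = 1; i < n; i++)
        if (v[i] < m)
            m = v[i];
    return m;
}

/* Sorts v in place and returns the (lower) median; n must be at least 1. */
static uint64_t sketch_median(uint64_t *v, size_t n)
{
    qsort(v, n, sizeof(v[0]), sketch_cmp_u64);
    return v[(n - 1) / 2];
}
```

A single outlier from preemption between the two trace points moves the maximum a lot but leaves the minimum and median where they were.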

Re: [PATCH 06/10] gpio: sam: Document bindings of SAM FPGA GPIO block

2016-10-17 Thread Pantelis Antoniou

Hi Rob,

> On Oct 10, 2016, at 23:03 , Rob Herring  wrote:
> 
> On Fri, Oct 07, 2016 at 06:18:34PM +0300, Pantelis Antoniou wrote:
>> From: Georgi Vlaev 
>> 
>> Add device tree bindings document for the GPIO driver of
>> Juniper's SAM FPGA.
>> 
>> Signed-off-by: Georgi Vlaev 
>> [Ported from Juniper kernel]
>> Signed-off-by: Pantelis Antoniou 
>> ---
>> .../devicetree/bindings/gpio/jnx,gpio-sam.txt  | 110 
>> +
>> 1 file changed, 110 insertions(+)
>> create mode 100644 Documentation/devicetree/bindings/gpio/jnx,gpio-sam.txt
>> 
>> diff --git a/Documentation/devicetree/bindings/gpio/jnx,gpio-sam.txt 
>> b/Documentation/devicetree/bindings/gpio/jnx,gpio-sam.txt
>> new file mode 100644
>> index 000..514c350
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/gpio/jnx,gpio-sam.txt
>> @@ -0,0 +1,110 @@
>> +Juniper SAM FPGA GPIO block
>> +
>> +The controller's registers are organized as sets of eight 32-bit
>> +registers with each set controlling a bank of up to 32 pins.  A single
>> +interrupt is shared for all of the banks handled by the controller.
>> +
>> +Required properties:
>> +
>> +- compatible:
>> +Must be "jnx,gpio-sam"
>> +
>> +- #gpio-cells:
>> +Should be <2>.  The first cell is the pin number (within the 
>> controller's
>> +pin space), and the second is used for the following flags:
>> +bit[0]: direction (0 = out, 1 = in)
>> +bit[1]: init high
>> +bit[2]: active low
>> +bit[3]: open drain
>> +bit[4]: open drain
> 
> Use and/or add to standard flags.
> 

OK.

>> +
>> +- gpio-controller:
>> +Specifies that the node is a GPIO controller.
>> +
>> +Optional properties:
>> +
>> +- reg:
>> +This driver is part of the SAM FPGA MFD driver, so the
>> +address range is supplied by that driver. However you can
>> +override using this property.
>> +
>> +- gpio-base:
>> +Base of the GPIO pins of this instance. If not present use system 
>> allocated.
> 
> This probably needs to go.
> 

OK.

>> +
>> +- gpio-count:
> 
> ngpios instead.
> 

OK.

>> +Number of GPIO pins of this instance. If not present read the number 
>> from
>> +the one configured in the FPGA data. Maximum number is 512.
>> +
>> +- #interrupt-cells:
>> +Should be <2>.  The first cell is the GPIO number, the second should 
>> specify
>> +flags.  The following subset of flags is supported:
>> +- bits[16,4:0] trigger type and level flags
>> +bit  0: rising edge interrupt
>> +bit  1: falling edge interrupt
>> +bit  2: active high interrupt
>> +bit  3: active low interrupt
>> +bit  4: enable debounce
>> +bit 16: signal is active low
> 
> What does this mean?
> 

I will reword.

>> +See also 
>> Documentation/devicetree/bindings/interrupt-controller/interrupts.txt
>> +
>> +- gpio-interrupts:
>> +A number of triples that define the mapping of interrupt groups to a 
>> range of
>> +pins. The first cell defines the interrupt group, the second is the 
>> start of
>> +the pin range and the third the number of pins in the range.
> 
> Needs a vendor prefix.
> 

OK.

>> +
>> +- gpio-exports:
>> +A subnode containing the list of pins that will be exported to 
>> user-space.
> 
> DT doesn't know about userspace. Drop this.
> 

OK, the export bit should go. 

>> +Each subnode contains:
>> +Required properties:
>> +- pin: The gpio to be exported and the relevant flags.
>> +Optional properties:
>> +- label: The label to use for export; if not supplied use the node 
>> name.
>> +
>> +Example:
>> +
>> +gpio20: gpio-sam {
>> +compatible = "jnx,gpio-sam";
>> +gpio-controller;
>> +interrupt-controller;
>> +/* 1st cell: gpio pin
>> + * 2nd cell: flags (bit mask)
>> + * bit  0: rising edge interrupt
>> + * bit  1: falling edge interrupt
>> + * bit  2: active high interrupt
>> + * bit  3: active low interrupt
>> + * bit  4: enable debounce
>> + * bit 16: signal is active low
>> + */
>> +#interrupt-cells = <2>;
>> +#gpio-cells = <2>;
>> +gpio-count = <340>;
>> +/* 1st cell: gpio interrupt status bit
>> + * 2nd cell: 1st pin
>> + * 3rd cell: # of pins
>> + */
>> +gpio-interrupts =
>> +<0 0 32>,   /* TL / TQ */
>> +<1 32 32>,  /* PIC 1 */
>> +<2 32 32>,  /* PIC 1 spare */
>> +<7 148 32>, /* PIC 0 */
>> +<8 170 32>, /* PIC 0 spare */
>> +<16 318 22>;/* FPC */
>> +
>> +gpio-exports {
>> +/*
>> + * flags:
>> + * GPIOF_DIR_IN bit 0=1
>> + * GPIOF_DIR_OUTbit 0=0
>> + * GPIOF_INIT_HIGH  bit 1=1
>> + *   GPIOF_INIT_HIGH is raw, not translated
>> + * GPIOF_ACTIVE_LOW bit 2=1
>> + *

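The gpio-interrupts triples quoted above (interrupt group, first pin, number of pins) can be read as a simple range table. The sketch below shows one way to look up the group for a pin; it is an illustration only, with hypothetical sketch_* names, and it assumes the first matching range wins, which matters because some ranges in the quoted example overlap.

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_irq_range {
    uint32_t group;      /* 1st cell: interrupt status bit / group */
    uint32_t first_pin;  /* 2nd cell: first pin of the range */
    uint32_t npins;      /* 3rd cell: number of pins in the range */
};

/* Return the interrupt group for a pin, or -1 if no range covers it. */
static int sketch_pin_to_group(const struct sketch_irq_range *map, size_t n,
                               uint32_t pin)
{
    for (size_t i = 0; i < n; i++) {
        if (pin >= map[i].first_pin &&
            pin - map[i].first_pin < map[i].npins)
            return (int)map[i].group;
    }
    return -1;
}
```

Fed with the six triples from the example node, pins 0 to 31 map to group 0 and pins 318 to 339 map to group 16.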
Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest

2016-10-17 Thread Julia Cartwright

On Sat, Oct 15, 2016 at 12:06:33AM +0200, Richard Cochran wrote:
> On Fri, Oct 14, 2016 at 08:58:22AM +, Koehrer Mathias (ETAS/ESW5) wrote:
> > @@ -753,7 +756,9 @@ u32 igb_rd32(struct e1000_hw *hw, u32 re
> > if (E1000_REMOVED(hw_addr))
> > return ~value;
> >  
> > +trace_igb(801);
> > value = readl(&hw_addr[reg]);
> > +trace_igb(802);
> 
> Nothing prevents this code from being preempted between the two trace
> points, and so you can't be sure whether the time delta in the trace
> is caused by the PCIe read stalling or not.

While that is certainly the case, and would explain the most egregious
of measured latency spikes, it doesn't invalidate the test if you
consider the valuable data point(s) to be the minimum and/or median
latencies.

   Julia

Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest

2016-10-17 Thread Julia Cartwright

+linux-pci

On Mon, Oct 17, 2016 at 08:39:40AM -0700, Alexander Duyck wrote:
> On Mon, Oct 17, 2016 at 8:00 AM, Koehrer Mathias (ETAS/ESW5)
>  wrote:
> > Hi Julia!
> >> > > Have you tested on a vanilla (non-RT) kernel?  I doubt there is
> >> > > anything RT specific about what you are seeing, but it might be nice
> >> > > to get confirmation.  Also, bisection would probably be easier if you 
> >> > > confirm on a
> >> vanilla kernel.
> >> > >
> >> > > I find it unlikely that it's a kernel config option that changed
> >> > > which regressed you, but instead was a code change to a driver.
> >> > > Which driver is now the question, and the surface area is still big
> >> > > (processor mapping attributes for this region, PCI root complex 
> >> > > configuration,
> >> PCI bridge configuration, igb driver itself, etc.).
> >> > >
> >> > > Big enough that I'd recommend a bisection.  It looks like a
> >> > > bisection between 3.18 and 4.8 would take you about 18 tries to narrow 
> >> > > down,
> >> assuming all goes well.
> >> > >
> >> >
> >> > I have now repeated my tests using the vanilla kernel.
> >> > There I got the very same issue.
> >> > Using kernel 4.0 is fine, however starting with kernel 4.1, the issue 
> >> > appears.
> >>
> >> Great, thanks for confirming!  That helps narrow things down quite a bit.
> >>
> >> > Here is my exact (reproducible) test description:
> >> > I applied the following patch to the kernel to get the igb trace.
> >> > This patch instruments the igb_rd32() function to measure the call to
> >> > readl() which is used to access registers of the igb NIC.
> >>
> >> I took your test setup and ran it between 4.0 and 4.1 on the hardware on
> >> my desk, which is an Atom-based board with dual I210s, however I didn't
> >> see much difference.
> >>
> >> However, it's a fairly simple board, with a much simpler PCI topology than
> >> your workstation.  I'll see if I can find some other hardware to test on.
> >>
> >> [..]
> >> > This means, that I think that some other stuff in kernel 4.1 has
> >> > changed, which has impact on the igb accesses.
> >> >
> >> > Any idea what component could cause this kind of issue?
> >>
> >> Can you continue your bisection using 'git bisect'?  You've already
> >> narrowed it down between 4.0 and 4.1, so you're well on your way.
> >>
> >
> > OK - done.
> > And finally I was successful!
> > The following git commit is the one that is causing the trouble!
> > (The full commit is in the attachment).
> > + BEGIN +++
> > commit 387d37577fdd05e9472c20885464c2a53b3c945f
> > Author: Matthew Garrett 
> > Date:   Tue Apr 7 11:07:00 2015 -0700
> >
> > PCI: Don't clear ASPM bits when the FADT declares it's unsupported
> >
> > Communications with a hardware vendor confirm that the expected behaviour
> > on systems that set the FADT ASPM disable bit but which still grant full
> > PCIe control is for the OS to leave any BIOS configuration intact and
> > refuse to touch the ASPM bits.  This mimics the behaviour of Windows.
> >
> > Signed-off-by: Matthew Garrett 
> > Signed-off-by: Bjorn Helgaas 
> > + HEADER +++
> >
> > The only files that are modified by this commit are
> > drivers/acpi/pci_root.c
> > drivers/pci/pcie/aspm.c
> > include/linux/pci-aspm.h
> >
> > This is all generic PCIe stuff - however I do not really understand what
> > the changes of the commit are...
> >
> > In my setup I am using a dual port igb Ethernet adapter.
> > This has an onboard PCIe switch and it might be that the configuration of
> > this PCIe switch on the Intel board is causing the trouble.
> >
> > Please see also the output of "lspci -v" in the attachment.
> > The relevant PCI address of the NIC is 04:00.0 / 04:00.1
> >
> > Any feedback on this is welcome!
> >
> > Thanks
> >
> > Mathias
>
> Hi Mathias,
>
> If you could send the output of lspci -vvv it might be more useful as
> most of the configuration data isn't present in the lspci dump you had
> attached.  Specifically if you could do this for the working case and
> the non-working case we could verify if this issue is actually due to
> the ASPM configuration on the device.
>
> Also one thing you might try is booting your kernel with the kernel
> parameter "pcie_aspm=off".  It sounds like the extra latency is likely
> due to your platform enabling ASPM on the device and this in turn will
> add latency if the PCIe link is disabled when you attempt to perform a
> read as it takes some time to bring the PCIe link up when in L1 state.

So if we assume it's this commit causing the regression, then it's safe
to assume that this system's BIOS is claiming to not support ASPM in the
FADT, but the BIOS is leaving ASPM configured in some way on the
relevant devices.
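
When comparing the working and non-working cases, the field to look at is the ASPM Control field in bits 1:0 of each device's PCIe Link Control register (bit 0 enables L0s, bit 1 enables L1). As a rough sketch, assuming the 16-bit register value has already been read out by some other means, it could be decoded like this. The macro and function names here are made up for illustration; they are not the kernel's own definitions.

```c
#include <stdint.h>

/* ASPM Control field: bits 1:0 of the PCIe Link Control register. */
#define LNKCTL_ASPM_MASK 0x0003u
#define LNKCTL_ASPM_L0S  0x0001u  /* bit 0: L0s entry enabled */
#define LNKCTL_ASPM_L1   0x0002u  /* bit 1: L1 entry enabled */

/* Returns 1 if L0s entry is enabled in the given Link Control value. */
static int aspm_l0s_enabled(uint16_t lnkctl)
{
    return (lnkctl & LNKCTL_ASPM_L0S) != 0;
}

/* Returns 1 if L1 entry is enabled in the given Link Control value. */
static int aspm_l1_enabled(uint16_t lnkctl)
{
    return (lnkctl & LNKCTL_ASPM_L1) != 0;
}

/* Returns 1 if no ASPM state is enabled at all. */
static int aspm_disabled(uint16_t lnkctl)
{
    return (lnkctl & LNKCTL_ASPM_MASK) == 0;
}
```

A device whose value has bit 1 set on 4.1 but not on 4.0 would be consistent with the L1 exit latency theory above.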

Also, unfortunately, taking a look at the code which handles

Re: [Intel-wired-lan] [PATCH V2 RFC 2/2] ixgbe: ixgbe_atr() compute l4_proto only if non-paged data has network/transport headers

2016-10-17 Thread Alexander Duyck

On Mon, Oct 17, 2016 at 10:25 AM, Sowmini Varadhan
 wrote:
> For some Tx paths (e.g., tpacket_snd()), ixgbe_atr may be
> passed down an sk_buff that has the network and transport
> header in the paged data, so it needs to make sure these
> headers are available in the headlen bytes to calculate the
> l4_proto.
>
> This patch expects that network and transport headers are
> already available in the non-paged header data.  The assumption
> is that the caller has set this up if l4_proto based Tx
> steering is desired.
>
> Signed-off-by: Sowmini Varadhan 
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   18 ++
>  1 files changed, 18 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index eceb47b..2cc1dae 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -54,6 +54,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include "ixgbe.h"
>  #include "ixgbe_common.h"
> @@ -7651,11 +7652,16 @@ static void ixgbe_atr(struct ixgbe_ring *ring,
> /* snag network header to get L4 type and address */
> skb = first->skb;
> hdr.network = skb_network_header(skb);
> +   if (hdr.network <= skb->data || hdr.network >= skb_tail_pointer(skb))
> +   return;

I would say you probably only need the first check here for skb->data
and could probably skip the second part.  You will be testing for
skb_tail_pointer in all the other tests you added so this check is
redundant anyway.

Also you might want to go through and wrap these with unlikely() since
most of these are exception cases.
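
As a rough userspace illustration of that suggestion: the macro below mirrors the usual definition of unlikely() on top of the GCC/Clang builtin, and hdr_after_data() is a hypothetical stand-in for the first check only, not driver code.

```c
#include <stddef.h>

/* Userspace stand-in for unlikely(); relies on a GCC/Clang builtin. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Hypothetical helper: returns 1 when the network header pointer lies
 * strictly after the start of the linear data, 0 otherwise. The failing
 * branch is marked unlikely because it is the exception case.
 */
static int hdr_after_data(const unsigned char *data, const unsigned char *hdr)
{
    if (unlikely(hdr <= data))
        return 0;   /* exception case: caller should bail out */
    return 1;
}
```

The hint only affects code layout and branch prediction heuristics; the result of the test is unchanged.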

> if (skb->encapsulation &&
> first->protocol == htons(ETH_P_IP) &&
> hdr.ipv4->protocol == IPPROTO_UDP) {
> struct ixgbe_adapter *adapter = q_vector->adapter;
>
> +   if (skb_tail_pointer(skb) < hdr.network + VXLAN_HEADROOM)
> +   return;
> +
> /* verify the port is recognized as VXLAN */
> if (adapter->vxlan_port &&
> udp_hdr(skb)->dest == adapter->vxlan_port)
> @@ -7666,15 +7672,27 @@ static void ixgbe_atr(struct ixgbe_ring *ring,
> hdr.network = skb_inner_network_header(skb);
> }
>
> +   /* Make sure we have at least [minimum IPv4 header + TCP]
> +* or [IPv6 header] bytes
> +*/
> +   if (skb_tail_pointer(skb) < hdr.network + 40)
> +   return;
> +
> /* Currently only IPv4/IPv6 with TCP is supported */
> switch (hdr.ipv4->version) {
> case IPVERSION:
> /* access ihl as u8 to avoid unaligned access on ia64 */
> hlen = (hdr.network[0] & 0x0F) << 2;
> +   if (skb_tail_pointer(skb) < hdr.network + hlen +
> +   sizeof(struct tcphdr))
> +   return;
> l4_proto = hdr.ipv4->protocol;
> break;
> case 6:
> hlen = hdr.network - skb->data;
> +   if (skb_tail_pointer(skb) < hdr.network + hlen +
> +   sizeof(struct tcphdr))
> +   return;
> l4_proto = ipv6_find_hdr(skb, &hlen, IPPROTO_TCP, NULL, NULL);
> hlen -= hdr.network - skb->data;
> break;

I believe one more check is needed after this block to verify the TCP
header fields are present.

So you probably need to add a check for "skb_tail_pointer(skb) <
(hdr.network + hlen + 20)".
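
A minimal standalone sketch of that kind of check for the IPv4 case, operating on a plain byte buffer rather than an sk_buff. The function name and the TCP_MIN_HDR_LEN constant are illustrative assumptions; network and tail play the roles of hdr.network and skb_tail_pointer(skb).

```c
#include <stddef.h>
#include <stdint.h>

#define TCP_MIN_HDR_LEN 20  /* TCP header without options */

/*
 * Returns 1 if the IPv4 header starting at `network` plus a minimal TCP
 * header both fit before `tail` (one past the last linear byte), else 0.
 */
static int ipv4_tcp_in_linear(const uint8_t *network, const uint8_t *tail)
{
    size_t avail, hlen;

    if (tail <= network)
        return 0;   /* not even the first header byte is available */
    avail = (size_t)(tail - network);

    /* IHL is the low nibble of the first byte, counted in 32-bit words. */
    hlen = (size_t)(network[0] & 0x0F) << 2;
    if (hlen < 20)
        return 0;   /* shorter than the minimum IPv4 header: malformed */

    return avail >= hlen + TCP_MIN_HDR_LEN;
}
```

Comparing lengths instead of computing network + hlen + 20 avoids forming a pointer past the end of the buffer in this standalone version.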

Thanks.

- Alex
