date:20161126

Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()

2016-11-26 Thread Roi Dayan




On 27/11/2016 06:47, Roi Dayan wrote:



On 27/11/2016 02:33, Daniel Borkmann wrote:

On 11/26/2016 12:09 PM, Daniel Borkmann wrote:

On 11/26/2016 07:46 AM, Cong Wang wrote:
On Thu, Nov 24, 2016 at 7:20 AM, Daniel Borkmann wrote:

[...]

Ok, strange, qdisc_destroy() calls into ops->destroy(), where ingress
drops its entire chain via tcf_destroy_chain(), so that will be NULL
eventually. The tps are freed by call_rcu() as well as qdisc itself
later on via qdisc_rcu_free(), where it frees per-cpu bstats as well.
Outstanding readers should either bail out due to if (!cl) or can still
process the chain until read section ends, but during that time, cl->q
resp. bstats should be good. Do you happen to know what's at address
880a68b04028? I was wondering wrt call_rcu() vs call_rcu_bh(), but
at least on ingress (netif_receive_skb_internal()) we hold rcu_read_lock()
here. The KASAN report is reliably happening at this location, right?


I am confused as well, I don't see how it could be related to my patch yet.

I will take a deep look in the weekend.




Hi Cong,

When I reported the new trace I didn't mean it's related to your patch,
I just wanted to point out that it exposed something. I should have been
clear about it.





Ok, I'm currently on the run. Got too late yesterday night, but I'll
write what I found in the evening today, not related to ingress though.


Just pushed out my analysis to netdev under "[PATCH net] net, sched: respect
rcu grace period on cls destruction". My conclusion is that both issues are
actually separate, and that one is small enough where we could route it via
net actually. Perhaps this at the same time shrinks your "[PATCH net-next]
net_sched: move the empty tp check from ->destroy() to ->delete()" to a
reasonable size that it's suitable to net as well. Your ->delete()/->destroy()
one is definitely needed, too. The tp->root one is independent of ->delete()/
->destroy() as they are different races and tp->root could also happen when
you just destroy the whole tp directly. I think that seems like a good path
forward to me.

Thanks,
Daniel




Hi Daniel,

As for the tainted kernel: I was on an old (a week or two) net-next tree
and only cherry-picked from latest net-next the patches related to
Mellanox HCA, cls_api, cls_flower and devlink, so those are the tainted
modules.
I have the issue reproducing in that tree, so I wanted to check it
with Cong's patch instead of latest net-next.
I'll try reproducing the issue with your new patch and later
try latest net-next as well.


Thanks,
Roi



Hi,

I tested "[PATCH net] net, sched: respect rcu grace period on cls 
destruction" and could not reproduce my original issue.
I rebased "[Patch net-next] net_sched: move the empty tp check from 
->destroy() to ->delete()" over to test it in the same tree and got into 
a new trace in fl_delete.


[35659.012123] BUG: KASAN: wild-memory-access on address 1803ca31
[35659.020042] Write of size 1 by task ovs-vswitchd/20135
[35659.025878] CPU: 19 PID: 20135 Comm: ovs-vswitchd Tainted: G           O    4.9.0-rc3+ #18
[35659.035948] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 07/01/2015
[35659.043730] Call Trace:
[35659.046619]  [] dump_stack+0x63/0x81
[35659.052456]  [] kasan_report_error+0x408/0x4e0
[35659.059402]  [] kasan_report+0x58/0x60
[35659.065428]  [] ? call_rcu_sched+0x1d/0x20
[35659.072119]  [] ? fl_destroy_filter+0x21/0x30 [cls_flower]
[35659.080217]  [] ? fl_delete+0x1df/0x2e0 [cls_flower]
[35659.087580]  [] __asan_store1+0x4a/0x50
[35659.093697]  [] fl_delete+0x1df/0x2e0 [cls_flower]
[35659.100870]  [] tc_ctl_tfilter+0x10da/0x1b90


0x1d02 is in fl_delete (net/sched/cls_flower.c:805).
800     struct cls_fl_filter *f = (struct cls_fl_filter *) arg;
801
802     rhashtable_remove_fast(&head->ht, &f->ht_node,
803                            head->ht_params);
804     __fl_delete(tp, f);
805     *last = list_empty(&head->filters);
806     return 0;
807 }


Thanks,
Roi

Re: Crash due to mutex genl_lock called from RCU context

2016-11-26 Thread Cong Wang

On Sat, Nov 26, 2016 at 6:26 PM, Eric Dumazet  wrote:
>
> Are you telling me inet_release() is called when we close() the first
> file descriptor ?
>
> fd1 = socket()
> fd2 = dup(fd1);
> close(fd2) -> release() ???

Sorry, I didn't express myself clearly. I meant that your change,
if we exclude the SOCK_RCU_FREE part, basically reverts this commit:

commit 3f660d66dfbc13ea4b61d3865851b348444c24b4
Author: Herbert Xu 
Date:   Thu May 3 03:17:14 2007 -0700

[NETLINK]: Kill CB only when socket is unused

IOW, ->release() is called when the last sock fd ref is gone, but ->destructor()
is called when the last sock ref is gone. They are very different.
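
The fd-level half of that distinction can be observed from userspace. The
following is only a minimal sketch, assuming Linux and an AF_UNIX socketpair
rather than a netlink socket; the function name is made up for illustration,
and it observes ->release() indirectly (the peer sees EOF), not the kernel
callback itself:

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns 1 if the socket stays usable after closing a dup()'ed descriptor
 * and the peer only sees EOF once the last descriptor is closed.
 * Returns 0 if the behaviour differs, -1 on setup failure.
 */
static int socket_survives_dup_close(void)
{
	int sv[2];
	int fd2;
	int ok;
	char c = 'x';

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	fd2 = dup(sv[0]);
	if (fd2 < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	/* Closing the duplicate drops one fd reference, not the socket. */
	close(fd2);

	/* The original descriptor still works, so nothing was released. */
	ok = (write(sv[0], &c, 1) == 1) && (read(sv[1], &c, 1) == 1);

	/* Closing the last descriptor tears the socket down: peer sees EOF. */
	close(sv[0]);
	ok = ok && (read(sv[1], &c, 1) == 0);

	close(sv[1]);
	return ok ? 1 : 0;
}
```

The second half (->destructor() running only when the last sock reference is
gone) is not visible from userspace and is not covered by this sketch.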

>> I don't see why we need to get genl_lock in ->done() here, because we are
>> already the last sock using it and module ref protects the ops from being
>> removed via module, seems we are pretty safe without any lock.
>
> Well, at least this exposes a real bug in Thomas patch.
>
> Removing the lock might be done for net-next, not stable branches.

I am confused, what Subash reported is a kernel warning which can
surely be fixed by removing genl lock (if it is correct, I need to double
check), so why for net-next?

RE: [PATCH] net: fec: turn on device when extracting statistics

2016-11-26 Thread Andy Duan

From: Nikita Yushchenko Sent: Friday, November 25, 2016 6:02 PM
 >To: Andy Duan ; David S. Miller
 >; Troy Kisky ;
 >Andrew Lunn ; Eric Nelson ; Philippe
 >Reynes ; Johannes Berg ;
 >netdev@vger.kernel.org; linux-ker...@vger.kernel.org
 >Cc: Chris Healy ; Nikita Yushchenko
 >
 >Subject: [PATCH] net: fec: turn on device when extracting statistics
 >
 >Executing 'ethtool -S' on a fec device that is down causes an OOPS on a
 >Vybrid board:
 >
 >Unhandled fault: external abort on non-linefetch (0x1008) at 0xe0898200 pgd
 >= ddecc000 [e0898200] *pgd=9e406811, *pte=400d1653, *ppte=400d1453
 >Internal error: : 1008 [#1] SMP ARM ...
 >
 >Reason of OOPS is that fec_enet_get_ethtool_stats() accesses fec registers
 >while IPG clock is stopped by PM.
 >
 >Fix that by wrapping statistics extraction into pm_runtime_get_sync() ...
 >pm_runtime_put_autosuspend() braces.
 >
 >Signed-off-by: Nikita Yushchenko 
 >---

Acked-by: Fugang Duan 

 > drivers/net/ethernet/freescale/fec_main.c | 11 ++-
 > 1 file changed, 10 insertions(+), 1 deletion(-)
 >
 >diff --git a/drivers/net/ethernet/freescale/fec_main.c
 >b/drivers/net/ethernet/freescale/fec_main.c
 >index 5aa9d4ded214..9c7592b80ce8 100644
 >--- a/drivers/net/ethernet/freescale/fec_main.c
 >+++ b/drivers/net/ethernet/freescale/fec_main.c
 >@@ -2317,10 +2317,19 @@ static void fec_enet_get_ethtool_stats(struct
 >net_device *dev,
 >  struct ethtool_stats *stats, u64 *data)  {
 >  struct fec_enet_private *fep = netdev_priv(dev);
 >- int i;
 >+ int i, ret;
 >+
 >+ ret = pm_runtime_get_sync(&fep->pdev->dev);
 >+ if (IS_ERR_VALUE(ret)) {
 >+ memset(data, 0, sizeof(*data) * ARRAY_SIZE(fec_stats));
 >+ return;
 >+ }
 >
 >  for (i = 0; i < ARRAY_SIZE(fec_stats); i++)
 >  data[i] = readl(fep->hwp + fec_stats[i].offset);
 >+
 >+ pm_runtime_mark_last_busy(&fep->pdev->dev);
 >+ pm_runtime_put_autosuspend(&fep->pdev->dev);
 > }
 >
 > static void fec_enet_get_strings(struct net_device *netdev,
 >--
 >2.1.4

[PATCH net-next 2/4] Documentation: net: phy: Add a paragraph about pause frames/flow control

2016-11-26 Thread Florian Fainelli

Describe that the Ethernet MAC controller is ultimately responsible for
dealing with proper pause frames/flow control advertisement and
enabling, and that it is therefore allowed to have it change
phydev->supported/advertising with SUPPORTED_Pause and
SUPPORTED_AsymPause.

Signed-off-by: Florian Fainelli 
---
 Documentation/networking/phy.txt | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index 4b25c0f24201..9a42a9414cea 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -127,8 +127,9 @@ Letting the PHY Abstraction Layer do Everything
  values pruned from them which don't make sense for your controller (a 10/100
  controller may be connected to a gigabit capable PHY, so you would need to
  mask off SUPPORTED_1000baseT*).  See include/linux/ethtool.h for definitions
- for these bitfields. Note that you should not SET any bits, or the PHY may
- get put into an unsupported state.
+ for these bitfields. Note that you should not SET any bits, except the
+ SUPPORTED_Pause and SUPPORTED_AsymPause bits (see below), or the PHY may get
+ put into an unsupported state.
 
  Lastly, once the controller is ready to handle network traffic, you call
  phy_start(phydev).  This tells the PAL that you are ready, and configures the
@@ -139,6 +140,19 @@ Letting the PHY Abstraction Layer do Everything
  When you want to disconnect from the network (even if just briefly), you call
  phy_stop(phydev).
 
+Pause frames / flow control
+
+ The PHY does not participate directly in flow control/pause frames except by
+ making sure that the SUPPORTED_Pause and SUPPORTED_AsymPause bits are set in
+ MII_ADVERTISE to indicate towards the link partner that the Ethernet MAC
+ controller supports such a thing. Since flow control/pause frames generation
+ involves the Ethernet MAC driver, it is recommended that this driver takes care
+ of properly indicating advertisement and support for such features by setting
+ the SUPPORTED_Pause and SUPPORTED_AsymPause bits accordingly. This can be done
+ either before or after phy_connect() and/or as a result of implementing the
+ ethtool::set_pauseparam feature.
+
+
 Keeping Close Tabs on the PAL
 
  It is possible that the PAL's built-in state machine needs a little help to
-- 
2.9.3

[PATCH net-next 3/4] Documentation: net: phy: Add blurb about RGMII

2016-11-26 Thread Florian Fainelli

RGMII is a recurring source of pain for people with Gigabit Ethernet
hardware since it may require PHY driver and MAC driver level
configuration hints. Document what are the expectations from PHYLIB and
what options exist.

Signed-off-by: Florian Fainelli 
---
 Documentation/networking/phy.txt | 56 
 1 file changed, 56 insertions(+)

diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index 9a42a9414cea..18e9f518b6f9 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -65,6 +65,62 @@ The MDIO bus
  drivers/net/ethernet/freescale/fsl_pq_mdio.c and an associated DTS file
  for one of the users. (e.g. "git grep fsl,.*-mdio arch/powerpc/boot/dts/")
 
+(RG)MII/electrical interface considerations
+
+ The Reduced Gigabit Medium Independent Interface (RGMII) is a 12-pin
+ electrical signal using a synchronous 125MHz clock signal and several data
+ lines. Due to this design decision, a 1.5ns to 2ns delay must be added between
+ the clock line (RXC or TXC) and the data lines to let the sink (PHY or MAC)
+ have enough setup and hold times to sample the data lines correctly. The PHY
+ library offers different types of PHY_INTERFACE_MODE_RGMII* values to let the
+ PHY driver and optionally the MAC driver implement the required delay. The
+ values of phy_interface_t must be understood from the perspective of the PHY
+ device itself, leading to the following:
+
+ * PHY_INTERFACE_MODE_RGMII: the PHY is not responsible for inserting any
+   internal delay by itself, it assumes that either the Ethernet MAC (if capable)
+   or the PCB traces insert the correct 1.5-2ns delay
+
+ * PHY_INTERFACE_MODE_RGMII_TXID: the PHY should be inserting a delay for the
+   transmit data lines (TXD[3:0]) processed by the PHY device
+
+ * PHY_INTERFACE_MODE_RGMII_RXID: the PHY should be inserting a delay for the
+   receive data lines (RXD[3:0]) processed by the PHY device
+
+ * PHY_INTERFACE_MODE_RGMII_ID: the PHY should be inserting a delay for both
+   transmit AND receive data lines from/to the PHY device
+
+ Whenever it is possible, it is preferable to utilize the PHY side RGMII delay
+ for several reasons:
+
+ * PHY devices may offer sub-nanosecond granularity in how they allow a
+   receiver/transmitter side delay (e.g: 0.5, 1.0, 1.5ns) etc.
+
+ * PHY devices are typically qualified for a large range of temperatures, and
+   they provide a constant and reliable delay across
+   temperature/pressure/voltage ranges
+
+ * PHY device drivers in PHYLIB being reusable by nature, being able to
+   configure correctly a specified delay enables more designs with similar delay
+   requirements to be enabled
+
+ For cases where the PHY is not capable of providing this delay, but the
+ Ethernet MAC driver is capable of doing it, the correct phy_interface_t value
+ should be PHY_INTERFACE_MODE_RGMII, and the Ethernet MAC driver should be
+ configured correctly in order to provide the required transmit and/or receive
+ side delay from the perspective of the PHY device.
+
+ In case neither the Ethernet MAC, nor the PHY are capable of providing the
+ required delays, as defined per the RGMII standard, several options may be
+ available:
+
+ * Some SoCs may offer a pin pad/mux/controller capable of configuring a given
+   set of pins' drive strength, delays and voltage, and it may be a suitable
+   option to insert the expected 2ns RGMII delay
+
+ * Modifying the PCB design to include a fixed delay (e.g: using a specifically
+   designed serpentine), which may not require software configuration at all
+
 Connecting to a PHY
 
  Sometime during startup, the network driver needs to establish a connection
-- 
2.9.3

[PATCH net-next 0/4] Documentation: net: phy: Improve documentation

2016-11-26 Thread Florian Fainelli

Hi all,

This patch series addresses discussions and feedback that was recently received
on the mailing-list in the area of: flow control/pause frames, interpretation of
phy_interface_t and finally add some links to useful standards documents.

Florian Fainelli (4):
  Documentation: net: phy: remove description of function pointers
  Documentation: net: phy: Add a paragraph about pause frames/flow
control
  Documentation: net: phy: Add blurb about RGMII
  Documentation: net: phy: Add links to several standards documents

 Documentation/networking/phy.txt | 119 +++
 1 file changed, 84 insertions(+), 35 deletions(-)

-- 
2.9.3

[PATCH net-next 1/4] Documentation: net: phy: remove description of function pointers

2016-11-26 Thread Florian Fainelli

Remove the function pointers documentation which duplicates information
found in include/linux/phy.h. Maintaining documentation about two
different locations just does not work, but the code is less likely to
be outdated.

Signed-off-by: Florian Fainelli 
---
 Documentation/networking/phy.txt | 35 ++-
 1 file changed, 2 insertions(+), 33 deletions(-)

diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index 7ab9404a8412..4b25c0f24201 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -251,39 +251,8 @@ Writing a PHY driver
  PHY_BASIC_FEATURES, but you can look in include/mii.h for other
  features.
 
- Each driver consists of a number of function pointers:
-
-   soft_reset: perform a PHY software reset
-   config_init: configures PHY into a sane state after a reset.
- For instance, a Davicom PHY requires descrambling disabled.
-   probe: Allocate phy->priv, optionally refuse to bind.
-   PHY may not have been reset or had fixups run yet.
-   suspend/resume: power management
-   config_aneg: Changes the speed/duplex/negotiation settings
-   aneg_done: Determines the auto-negotiation result
-   read_status: Reads the current speed/duplex/negotiation settings
-   ack_interrupt: Clear a pending interrupt
-   did_interrupt: Checks if the PHY generated an interrupt
-   config_intr: Enable or disable interrupts
-   remove: Does any driver take-down
-   ts_info: Queries about the HW timestamping status
-   match_phy_device: used for Clause 45 capable PHYs to match devices
-   in package and ensure they are compatible
-   hwtstamp: Set the PHY HW timestamping configuration
-   rxtstamp: Requests a receive timestamp at the PHY level for a 'skb'
-   txtsamp: Requests a transmit timestamp at the PHY level for a 'skb'
-   set_wol: Enable Wake-on-LAN at the PHY level
-   get_wol: Get the Wake-on-LAN status at the PHY level
-   link_change_notify: called to inform the core is about to change the
-   link state, can be used to work around bogus PHY between state changes
-   read_mmd_indirect: Read PHY MMD indirect register
-   write_mmd_indirect: Write PHY MMD indirect register
-   module_info: Get the size and type of an EEPROM contained in an plug-in
-   module
-   module_eeprom: Get EEPROM information of a plug-in module
-   get_sset_count: Get number of strings sets that get_strings will count
-   get_strings: Get strings from requested objects (statistics)
-   get_stats: Get the extended statistics from the PHY device
+ Each driver consists of a number of function pointers, documented
+ in include/linux/phy.h under the phy_driver structure.
 
  Of these, only config_aneg and read_status are required to be
  assigned by the driver code.  The rest are optional.  Also, it is
-- 
2.9.3

[PATCH net-next 4/4] Documentation: net: phy: Add links to several standards documents

2016-11-26 Thread Florian Fainelli

Add links to the IEEE 802.3-2008 document, and the RGMII v1.3 and v2.0
revisions of the standard.

Signed-off-by: Florian Fainelli 
---
 Documentation/networking/phy.txt | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index 18e9f518b6f9..9908490363d6 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -386,3 +386,13 @@ Board Fixups
  The stubs set one of the two matching criteria, and set the other one to
  match anything.
 
+Standards
+
+ IEEE Standard 802.3: CSMA/CD Access Method and Physical Layer Specifications, Section Two:
+ http://standards.ieee.org/getieee802/download/802.3-2008_section2.pdf
+
+ RGMII v1.3:
+ http://web.archive.org/web/20160303212629/http://www.hp.com/rnd/pdfs/RGMIIv1_3.pdf
+
+ RGMII v2.0:
+ http://web.archive.org/web/20160303171328/http://www.hp.com/rnd/pdfs/RGMIIv2_0_final_hp.pdf
-- 
2.9.3

Re: [PATCH 2/2] net: phy: realtek: fix enabling of the TX-delay for RTL8211F

2016-11-26 Thread Florian Fainelli



On 11/25/2016 05:12 AM, Martin Blumenstingl wrote:
> The old logic always enabled the TX-delay when the phy-mode was set to
> PHY_INTERFACE_MODE_RGMII. There are dedicated phy-modes which tell the
> PHY driver to enable the RX and/or TX delays:
> - PHY_INTERFACE_MODE_RGMII should disable the RX and TX delay in the
>   PHY (if required, the MAC should add the delays in this case)
> - PHY_INTERFACE_MODE_RGMII_ID should enable RX and TX delay in the PHY
> - PHY_INTERFACE_MODE_RGMII_TXID should enable the TX delay in the PHY
> - PHY_INTERFACE_MODE_RGMII_RXID should enable the RX delay in the PHY
>   (currently not supported by RTL8211F)
> 
> With this patch we enable the TX delay for PHY_INTERFACE_MODE_RGMII_ID
> and PHY_INTERFACE_MODE_RGMII_TXID.
> Additionally we now explicitly disable the TX-delay, which seems to be
> enabled automatically after a hard-reset of the PHY (by triggering its
> reset pin) to get a consistent state (as defined by the phy-mode).
> 
> This fixes a compatibility problem with some SoCs where the TX-delay was
> also added by the MAC. With the TX-delay being applied twice the TX
> clock was off and TX traffic was broken or very slow (<10Mbit/s) on
> 1000Mbit/s links.
> 
> Signed-off-by: Martin Blumenstingl 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH 1/2] Documentation: devicetree: clarify usage of the RGMII phy-modes

2016-11-26 Thread Florian Fainelli



On 11/25/2016 05:12 AM, Martin Blumenstingl wrote:
> RGMII requires special RX and/or TX delays depending on the actual
> hardware circuit/wiring. These delays can be added by the MAC, the PHY
> or the designer of the circuit (the latter means that no delay has to
> be added by PHY or MAC).
> There are 4 RGMII phy-modes used to describe where a delay should be
> applied:
> - rgmii: the RX and TX delays are either added by the MAC (where the
>   exact delay is typically configurable, and can be turned off when no
>   extra delay is needed) or not needed at all (because the hardware
>   wiring adds the delay already). The PHY should neither add the RX nor
>   TX delay in this case.
> - rgmii-rxid: configures the PHY to enable the RX delay. The MAC should
>   not add the RX delay in this case.
> - rgmii-txid: configures the PHY to enable the TX delay. The MAC should
>   not add the TX delay in this case.
> - rgmii-id: combines rgmii-rxid and rgmii-txid and thus configures the
>   PHY to enable the RX and TX delays. The MAC should neither add the RX
>   nor TX delay in this case.
> 
> Document these cases in the ethernet.txt documentation to make it clear
> when to use each mode.
> If applied incorrectly one might end up with MAC and PHY both enabling
> for example the TX delay, which breaks ethernet TX traffic on 1000Mbit/s
> links.
> 
> Signed-off-by: Martin Blumenstingl 

Reviewed-by: Florian Fainelli 
-- 
Florian
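
For readers following the phy-mode discussion above, the mapping between the
four RGMII modes and the delays the PHY is expected to insert can be written
down as a small table-like function. This is only an illustrative userspace
sketch: the enum and function names below are hypothetical stand-ins, not the
kernel's phy_interface_t or any PHYLIB API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the PHY_INTERFACE_MODE_RGMII* values. */
enum rgmii_mode {
	MODE_RGMII,      /* PHY adds no delay; MAC or PCB provides it */
	MODE_RGMII_ID,   /* PHY adds both RX and TX delay */
	MODE_RGMII_RXID, /* PHY adds RX delay only */
	MODE_RGMII_TXID  /* PHY adds TX delay only */
};

struct rgmii_delays {
	bool rx;
	bool tx;
};

/* Which delays the PHY itself should enable for a given mode. */
static struct rgmii_delays phy_side_delays(enum rgmii_mode mode)
{
	struct rgmii_delays d = { false, false };

	switch (mode) {
	case MODE_RGMII_ID:
		d.rx = true;
		d.tx = true;
		break;
	case MODE_RGMII_RXID:
		d.rx = true;
		break;
	case MODE_RGMII_TXID:
		d.tx = true;
		break;
	case MODE_RGMII:
		break;
	}
	return d;
}

/*
 * The misconfiguration described in the thread: the MAC and the PHY both
 * add the TX delay, which breaks TX traffic on 1000Mbit/s links.
 */
static bool tx_delay_applied_twice(enum rgmii_mode mode, bool mac_adds_tx_delay)
{
	return mac_adds_tx_delay && phy_side_delays(mode).tx;
}
```

With this model, a MAC that adds the TX delay is only consistent with the
modes where the PHY does not (MODE_RGMII and MODE_RGMII_RXID here).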

Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()

2016-11-26 Thread Roi Dayan




On 27/11/2016 02:33, Daniel Borkmann wrote:

On 11/26/2016 12:09 PM, Daniel Borkmann wrote:

On 11/26/2016 07:46 AM, Cong Wang wrote:
On Thu, Nov 24, 2016 at 7:20 AM, Daniel Borkmann wrote:

[...]

Ok, strange, qdisc_destroy() calls into ops->destroy(), where ingress
drops its entire chain via tcf_destroy_chain(), so that will be NULL
eventually. The tps are freed by call_rcu() as well as qdisc itself
later on via qdisc_rcu_free(), where it frees per-cpu bstats as well.
Outstanding readers should either bail out due to if (!cl) or can still
process the chain until read section ends, but during that time, cl->q
resp. bstats should be good. Do you happen to know what's at address
880a68b04028? I was wondering wrt call_rcu() vs call_rcu_bh(), but
at least on ingress (netif_receive_skb_internal()) we hold rcu_read_lock()
here. The KASAN report is reliably happening at this location, right?


I am confused as well, I don't see how it could be related to my patch yet.

I will take a deep look in the weekend.




Hi Cong,

When I reported the new trace I didn't mean it's related to your patch,
I just wanted to point out that it exposed something. I should have been
clear about it.





Ok, I'm currently on the run. Got too late yesterday night, but I'll
write what I found in the evening today, not related to ingress though.


Just pushed out my analysis to netdev under "[PATCH net] net, sched: respect
rcu grace period on cls destruction". My conclusion is that both issues are
actually separate, and that one is small enough where we could route it via
net actually. Perhaps this at the same time shrinks your "[PATCH net-next]
net_sched: move the empty tp check from ->destroy() to ->delete()" to a
reasonable size that it's suitable to net as well. Your ->delete()/->destroy()
one is definitely needed, too. The tp->root one is independent of ->delete()/
->destroy() as they are different races and tp->root could also happen when
you just destroy the whole tp directly. I think that seems like a good path
forward to me.

Thanks,
Daniel




Hi Daniel,

As for the tainted kernel: I was on an old (a week or two) net-next tree
and only cherry-picked from latest net-next the patches related to
Mellanox HCA, cls_api, cls_flower and devlink, so those are the tainted
modules.
I have the issue reproducing in that tree, so I wanted to check it
with Cong's patch instead of latest net-next.
I'll try reproducing the issue with your new patch and later
try latest net-next as well.


Thanks,
Roi

Re: Large performance regression with 6in4 tunnel (sit)

2016-11-26 Thread Stephen Rothwell

Hi Sven-Haegar,

On Fri, 25 Nov 2016 05:06:53 +0100 (CET) Sven-Haegar Koch wrote:
>
> Somehow this problem description really reminds me of a report on 
> netdev a bit ago, which the following patch fixed:
> 
> commit 9ee6c5dc816aa8256257f2cd4008a9291ec7e985
> Author: Lance Richardson 
> Date:   Wed Nov 2 16:36:17 2016 -0400
> 
> ipv4: allow local fragmentation in ip_finish_output_gso()
> 
> Some configurations (e.g. geneve interface with default
> MTU of 1500 over an ethernet interface with 1500 MTU) result
> in the transmission of packets that exceed the configured MTU.
> While this should be considered to be a "bad" configuration,
> it is still allowed and should not result in the sending
> of packets that exceed the configured MTU.
> 
> Could this be related?
> 
> I suppose it would be difficult to test this patch on this machine?

The kernel I am running on is based on 4.7.8, so the above patch
doesn't come close to applying. Most of what it is reverting was
introduced in commit 359ebda25aa0 ("net/ipv4: Introduce IPSKB_FRAG_SEGS
bit to inet_skb_parm.flags") in v4.8-rc1.

-- 
Cheers,
Stephen Rothwell

ip manpage comments

2016-11-26 Thread Jon LaBadie

Though not new to *nix, I am new to using the ip(8) command.
Thus some of my historical assumptions about ip may be wrong.

It seems that an inclusive manpage for the ip command was
broken up into a shorter ip(8) manpage and 15 or more
ip-(8) manpages.  I'm basing this assumption
on long, inclusive manpages on https://linux.die.net/man/8/ip
and CentOS 6 while CentOS 7 and Fedora 24 each have the
sub-divided style.

I won't debate the wisdom of this subdivision, only comment
on how it was done.

The ip(8) manpage makes no mention of additional subordinate
documents.  The listing of the additional documents in the
See Also section is insufficient.  This section is typically
used to mention related commands and other sources of reference
materials such as info docs, wikis, blogs, or mailing lists.

When one does investigate one of the subordinate manpages,
they do not state that they document subcommands of the
ip command.  In fact, on the ip-address(8) manpage it says

  The `ip address command' ...   (quotes added)

My first thought was "typo", this is the manpage about the
"ip-address" command.  Of course there is no ip-address command.
But "ip address" is not a command either, it is the "ip" command
with an argument.

There are several commands that have broken their manpage into
several manpages.  Two which come to mind are zsh(1) and perl(1).
The authors of those pages clearly state on the primary manpage
that this is an overview page and give clear pointers to the
additional manpages as well as additional documentation.  I would
recommend reorganizing the ip(8) manpage in a similar fashion.

Thank you for consideration of my opinion and for the development
of an awesome tool.

Jon
-- 
Jon H. LaBadie  jo...@jgcomp.com

Re: Crash due to mutex genl_lock called from RCU context

2016-11-26 Thread Eric Dumazet

On Sat, 2016-11-26 at 18:08 -0800, Cong Wang wrote:
> On Fri, Nov 25, 2016 at 8:54 PM, Eric Dumazet  wrote:
> >
> > Oh well, this won't work, since sk->sk_destruct will be called from RCU
> > callback.
> >
> > Grabbing the mutex should not be done from netlink_sock_destruct() but
> > from netlink_release()
> 
> But you also change the behavior of cb.done(), currently it is called when the
> last sock ref is gone, with your patch it is called when the first
> sock is closed.

No. It will be called when last refcount on the socket is released,
sk_refcnt transitions to final 0.

My patch changes the sock_hold() to the variant that makes sure
sk_refcnt is not 0 before increase, otherwise a race can happen and
release could be called twice.

Classic refcounting stuff coupled to rcu rules.
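
The "take a reference only if the count is not already zero" idea can be
sketched in plain C11 as follows. This is a userspace illustration under
stated assumptions, not the patch itself: struct obj and both helpers are
hypothetical names, and kernel code would typically rely on an existing
helper such as atomic_inc_not_zero() rather than an open-coded loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for a socket. */
struct obj {
	atomic_int refcnt;
};

/*
 * Take a reference only if the object is still alive. Returns false when
 * the count has already reached zero, so a concurrent lookup cannot
 * resurrect an object whose release path is already running.
 */
static bool obj_get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true exactly once, on the final 1 -> 0 step. */
static bool obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcnt, 1);

	assert(old > 0); /* catch underflow, i.e. a double release */
	return old == 1;
}
```

A plain unconditional increment would let a late lookup bump the count from
0 back to 1, after which the final put (and hence the release) could run a
second time; the compare-and-swap loop is what rules that out here.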

> No?

Are you telling me inet_release() is called when we close() the first
file descriptor ?

fd1 = socket()
fd2 = dup(fd1);
close(fd2) -> release() ???

> 
> I don't see why we need to get genl_lock in ->done() here, because we are
> already the last sock using it and module ref protects the ops from being
> removed via module, seems we are pretty safe without any lock.

Well, at least this exposes a real bug in Thomas patch.

Removing the lock might be done for net-next, not stable branches.

Re: [PATCH] mlx4: give precise rx/tx bytes/packets counters

2016-11-26 Thread Eric Dumazet

On Sun, 2016-11-27 at 00:47 +0200, Saeed Mahameed wrote:
> On Fri, Nov 25, 2016 at 5:46 PM, Eric Dumazet  wrote:

> As you see here in SRIOV mode (PF only) reads   sw stats from FW.
> Tariq, I think we need to fix this.

Sure, my patch does not change this at all.

If mlx4_is_master() is false, then we aggregate the software stats and
only the software stats.

My patch makes this aggregation possible at the time ethtool or
ndo_get_stat64() are called, since this absolutely does not depend on the 250
ms timer fetching hardware stats.

Re: Crash due to mutex genl_lock called from RCU context

2016-11-26 Thread Cong Wang

On Fri, Nov 25, 2016 at 8:54 PM, Eric Dumazet  wrote:
>
> Oh well, this won't work, since sk->sk_destruct will be called from RCU
> callback.
>
> Grabbing the mutex should not be done from netlink_sock_destruct() but
> from netlink_release()

But you also change the behavior of cb.done(), currently it is called when the
last sock ref is gone, with your patch it is called when the first
sock is closed.
No?

I don't see why we need to get genl_lock in ->done() here, because we are
already the last sock using it and module ref protects the ops from being
removed via module, seems we are pretty safe without any lock.

Re: [PATCH net-next 6/6] tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING

2016-11-26 Thread kbuild test robot

Hi Francis,

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Yuchung-Cheng/tcp-sender-chronographs-instrumentation/20161127-041428
config: arm-spear6xx_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   net/built-in.o: In function `__skb_tstamp_tx':
>> net/core/skbuff.c:3846: undefined reference to `tcp_get_timestamping_opt_stats'

vim +3846 net/core/skbuff.c

  3840  return;
  3841  
  3842  if (tsonly) {
  3843  if ((sk->sk_tsflags & SOF_TIMESTAMPING_OPT_STATS) &&
  3844  sk->sk_protocol == IPPROTO_TCP &&
  3845  sk->sk_type == SOCK_STREAM)
> 3846  skb = tcp_get_timestamping_opt_stats(sk);
  3847  else
  3848  skb = alloc_skb(0, GFP_ATOMIC);
  3849  } else {

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


Re: [PATCH net] sit: Set skb->protocol properly in ipip6_tunnel_xmit()

2016-11-26 Thread Stephen Rothwell

Hi Eli,

[Just for Dave's information]

On Fri, 25 Nov 2016 13:50:17 +0800 Eli Cooper  wrote:
>
> Similar to commit ae148b085876
> ("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()"),
> sit tunnels also need to update skb->protocol; otherwise, TSO/GSO packets
> might not be properly segmented, which causes the packets being dropped.
> 
> Reported-by: Stephen Rothwell 
> Tested-by: Eli Cooper 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Eli Cooper 

I tested this patch and it does *not* solve my problem.

-- 
Cheers,
Stephen Rothwell

Re: Large performance regression with 6in4 tunnel (sit)

2016-11-26 Thread Stephen Rothwell

Hi Eli,

On Sun, 27 Nov 2016 11:54:41 +1100 Stephen Rothwell wrote:
>
> On Fri, 25 Nov 2016 14:05:04 +0800 Eli Cooper  wrote:
> >
> > I think this is similar to the bug I fixed in commit ae148b085876
> > ("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()").
> > 
> > I can reproduce a similar problem by applying xfrm to sit traffic.
> > TSO/GSO packets are dropped when IPSec is enabled, and IPv6 throughput
> > drops to 10s of Kbps. I am not sure if this is the same issue you
> > experienced, but I wrote a patch that fixed at least the issue I had.
> > 
> > Could you test the patch I sent to the mailing list just now?  
> 
> Thanks for the patch!
> 
> > It's a bit tricky to test since the problem only occurs in a production
> machine (I tried reproducing in a VM, but the problem did not occur),
> but I will try to just rebuild the sit module and see if I can insert
> the modified one.

OK, I tried your patch and unfortunately, it doesn't seem to have
worked ... I still get the large packets dropped and resent smaller.

-- 
Cheers,
Stephen Rothwell

Re: netlink: GPF in sock_sndtimeo

2016-11-26 Thread Cong Wang

On Sat, Nov 26, 2016 at 7:44 AM, Dmitry Vyukov  wrote:
> Hello,
>
> The following program triggers GPF in sock_sndtimeo:
> https://gist.githubusercontent.com/dvyukov/c19cadd309791cf5cb9b2bf936d3f48d/raw/1743ba0211079a5465d039512b427bc6b59b1a76/gistfile1.txt
>
> On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24).
>
> general protection fault:  [#1] SMP DEBUG_PAGEALLOC KASAN
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 1 PID: 19950 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 88002a0d0840 task.stack: 88003692
> RIP: 0010:[]  [< inline >] sock_sndtimeo
> include/net/sock.h:2075
> RIP: 0010:[]  []
> netlink_unicast+0xe1/0x730 net/netlink/af_netlink.c:1232
> RSP: 0018:880036926f68  EFLAGS: 00010202
> RAX: 0068 RBX: 880036927000 RCX: c900021d
> RDX: 0d63 RSI: 024000c0 RDI: 0340
> RBP: 880036927028 R08: ed0006ea7aab R09: ed0006ea7aab
> R10: 0001 R11: ed0006ea7aaa R12: dc00
> R13:  R14: 880035de3400 R15: 880035de3400
> FS:  7f90a2fc7700() GS:88003ed0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 006de0c0 CR3: 35de6000 CR4: 06e0
> Stack:
>  880035de3400 819f02a1 110006d24df4 0004
>  4db40014 880036926fd8  41b58ab3
>  89653c11 86cb3500 819f0345 880035de3400
> Call Trace:
>  [< inline >] audit_replace kernel/audit.c:817
>  [] audit_receive_msg+0x22c9/0x2ce0 kernel/audit.c:894
>  [< inline >] audit_receive_skb kernel/audit.c:1120
>  [] audit_receive+0x1dc/0x360 kernel/audit.c:1133
>  [< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1214
>  [] netlink_unicast+0x514/0x730 
> net/netlink/af_netlink.c:1240
>  [] netlink_sendmsg+0xaa4/0xe50 
> net/netlink/af_netlink.c:1786
>  [< inline >] sock_sendmsg_nosec net/socket.c:621
>  [] sock_sendmsg+0xcf/0x110 net/socket.c:631
>  [] sock_write_iter+0x32b/0x620 net/socket.c:829
>  [< inline >] new_sync_write fs/read_write.c:499
>  [] __vfs_write+0x4fe/0x830 fs/read_write.c:512
>  [] vfs_write+0x175/0x4e0 fs/read_write.c:560
>  [< inline >] SYSC_write fs/read_write.c:607
>  [] SyS_write+0x100/0x240 fs/read_write.c:599
>  [] do_syscall_64+0x2f4/0x940 arch/x86/entry/common.c:280
>  [] entry_SYSCALL64_slow_path+0x25/0x25
> Code: fe 4c 89 f7 e8 31 16 ff ff 8b 8d 70 ff ff ff 49 89 c7 31 c0 85
> c9 75 25 e8 7d 4a a3 fa 49 8d bd 40 03 00 00 48 89 f8 48 c1 e8 03 <42>
> 80 3c 20 00 0f 85 3a 06 00 00 49 8b 85 40 03 00 00 4c 8d 73
> RIP  [< inline >] sock_sndtimeo include/net/sock.h:2075
> RIP  [] netlink_unicast+0xe1/0x730
> net/netlink/af_netlink.c:1232
>  RSP 
> ---[ end trace 8383a15fba6fdc59 ]---

It is racy on audit_sock, especially on the netns exit path.

Could the following patch help a little bit? Also, I don't see how the
synchronize_net() here could sync with the netlink rcv path, since unlike
packets from the wire, netlink messages are not handled in BH context,
nor do I see any RCU read-side lock taken on the rcv path.

diff --git a/kernel/audit.c b/kernel/audit.c
index f1ca116..20bc79e 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1167,10 +1167,13 @@ static void __net_exit audit_net_exit(struct net *net)
 {
struct audit_net *aunet = net_generic(net, audit_net_id);
struct sock *sock = aunet->nlsk;
+
+   mutex_lock(&audit_cmd_mutex);
if (sock == audit_sock) {
audit_pid = 0;
audit_sock = NULL;
}
+   mutex_unlock(&audit_cmd_mutex);

RCU_INIT_POINTER(aunet->nlsk, NULL);
synchronize_net();

Re: [PATCH v3 net-next 1/2] net: ethernet: slicoss: add slicoss gigabit ethernet driver

2016-11-26 Thread Lino Sanfilippo


Hi Rami,


On 26.11.2016 16:48, Rami Rosen wrote:
>> @@ -0,0 +1,28 @@
>> +config NET_VENDOR_ALACRITECH
>> +bool "Alacritech devices"
>> +default y
>> +---help---
>> +  If you have a network (Ethernet) card belonging to this class, say Y.
>> +
>> +  Note that the answer to this question doesn't directly affect the
>> +  kernel: saying N will just cause the configurator to skip all
> 
> Shouldn't it be "Alacritech devices" here, as appears earlier ?
> 
>> +  the questions about Renesas devices. If you say Y, you will be asked

Yes, it definitely should not be Renesas :). This is a stupid copy-and-paste
error; I will fix it, thank you!

>> +  for your specific device in the following questions.
>> +
> 
> ...
> ...
> ...
>> +struct slic_device {
>> +   struct pci_dev *pdev;
> ...
>> +   bool promisc;
> 
> It seems that the autoneg boolean is only ever set to true once, in
> the slic_set_link_autoneg() method, and is never read anywhere,
> so it seems it should be removed.
> 
>> +   bool autoneg;
>> +   int speed;

Agreed, this variable can be removed.

> ...
> 
>> +static int slic_load_rcvseq_firmware(struct slic_device *sdev)
>> +{
>> +   const struct firmware *fw;
>> +   const char *file;
>> +   u32 codelen;
>> +   int idx = 0;
>> +   u32 instr;
>> +   u32 addr;
>> +   int err;
>> +
> ...
>> +   /* Do an initial sanity check concerning firmware size now. A further
>> +* check follows below.
>> +*/
>> +   if (fw->size < SLIC_FIRMWARE_MIN_SIZE) {
>> +   dev_err(&sdev->pdev->dev,
>> +   "invalid firmware size %zu (min %u expected)\n",
>> +   fw->size, SLIC_FIRMWARE_MIN_SIZE);
>> +   err = -EINVAL;
> 
> At the release label, 0 is always returned, so this error code is lost:
> 
>> +   goto release;
>> +   }
>> +
>> +   codelen = slic_read_dword_from_firmware(fw, &idx);
>> +
>> +   /* do another sanity check against firmware size */
>> +   if ((codelen + 4) > fw->size) {
>> +   dev_err(&sdev->pdev->dev,
>> +   "invalid rcv-sequencer firmware size %zu\n", fw->size);
>> +   err = -EINVAL;
> 
> Again, at the release label, 0 is always returned:
> 
>> +   goto release;
>> +   }
>> +
>>
>> +release:
>> +   release_firmware(fw);
>> +
>> +   return 0;
>> +}

This should return "err", I will fix it.
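For the archive, the pattern being discussed can be shown with a small standalone userspace sketch. This is not the slicoss code: struct blob, BLOB_MIN_SIZE, release_blob() and check_blob() are hypothetical names made up for illustration. The only point is that every exit path goes through the release label and the function returns err rather than a hard-coded 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a firmware image; not the kernel's struct firmware. */
struct blob {
	size_t size;
	unsigned char *data;
};

#define BLOB_MIN_SIZE 64	/* assumed minimum size, for illustration only */

static int releases;		/* counts cleanup calls so the exit path can be checked */

static void release_blob(struct blob *b)
{
	assert(b != NULL);
	free(b->data);
	b->data = NULL;
	releases++;
}

/* Validate a blob; every exit after allocation goes through "release". */
static int check_blob(size_t size)
{
	struct blob b = { .size = size, .data = malloc(size ? size : 1) };
	int err = 0;

	if (!b.data)
		return -ENOMEM;

	if (b.size < BLOB_MIN_SIZE) {
		err = -EINVAL;
		goto release;
	}

	/* ... further processing would go here ... */

release:
	release_blob(&b);
	return err;	/* returning 0 here would hide the -EINVAL set above */
}
```

With this shape, check_blob(16) reports -EINVAL while still releasing the buffer, and check_blob(128) returns 0.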

Thanks a lot for the review!

Regards,
Lino

Re: Large performance regression with 6in4 tunnel (sit)

2016-11-26 Thread Stephen Rothwell

Hi Eli,

On Fri, 25 Nov 2016 14:05:04 +0800 Eli Cooper  wrote:
>
> I think this is similar to the bug I fixed in commit ae148b085876
> ("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()").
> 
> I can reproduce a similar problem by applying xfrm to sit traffic.
> TSO/GSO packets are dropped when IPSec is enabled, and IPv6 throughput
> drops to 10s of Kbps. I am not sure if this is the same issue you
> experienced, but I wrote a patch that fixed at least the issue I had.
> 
> Could you test the patch I sent to the mailing list just now?

Thanks for the patch!

It's a bit tricky to test since the problem only occurs in a production
machine (I tried reproducing in a VM, but the problem did not occur),
but I will try to just rebuild the sit module and see if I can insert
the modified one.

-- 
Cheers,
Stephen Rothwell

[PATCH net-next 4/4] bnxt_en: Add PFC statistics.

2016-11-26 Thread Michael Chan

Report PFC statistics to ethtool -S and DCBNL.

Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  7 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c | 14 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 23 ---
 3 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 275e560..a72adec 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1124,6 +1124,13 @@ struct bnxt {
u32 lpi_tmr_hi;
 };
 
+#define BNXT_RX_STATS_OFFSET(counter)  \
+   (offsetof(struct rx_port_stats, counter) / 8)
+
+#define BNXT_TX_STATS_OFFSET(counter)  \
+   ((offsetof(struct tx_port_stats, counter) + \
+ sizeof(struct rx_port_stats) + 512) / 8)
+
 #ifdef CONFIG_NET_RX_BUSY_POLL
 static inline void bnxt_enable_poll(struct bnxt_napi *bnapi)
 {
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
index f391b47..fdf2d8c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
@@ -347,8 +347,10 @@ static int bnxt_dcbnl_ieee_setets(struct net_device *dev, struct ieee_ets *ets)
 static int bnxt_dcbnl_ieee_getpfc(struct net_device *dev, struct ieee_pfc *pfc)
 {
struct bnxt *bp = netdev_priv(dev);
+   __le64 *stats = (__le64 *)bp->hw_rx_port_stats;
struct ieee_pfc *my_pfc = bp->ieee_pfc;
-   int rc;
+   long rx_off, tx_off;
+   int i, rc;
 
pfc->pfc_cap = bp->max_lltc;
 
@@ -369,6 +371,16 @@ static int bnxt_dcbnl_ieee_getpfc(struct net_device *dev, struct ieee_pfc *pfc)
pfc->mbc = my_pfc->mbc;
pfc->delay = my_pfc->delay;
 
+   if (!stats)
+   return 0;
+
+   rx_off = BNXT_RX_STATS_OFFSET(rx_pfc_ena_frames_pri0);
+   tx_off = BNXT_TX_STATS_OFFSET(tx_pfc_ena_frames_pri0);
+   for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++, rx_off++, tx_off++) {
+   pfc->requests[i] = le64_to_cpu(*(stats + tx_off));
+   pfc->indications[i] = le64_to_cpu(*(stats + rx_off));
+   }
+
return 0;
 }
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index fa6125e..784aa77 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -107,16 +107,9 @@ static int bnxt_set_coalesce(struct net_device *dev,
 
 #define BNXT_NUM_STATS 21
 
-#define BNXT_RX_STATS_OFFSET(counter)  \
-   (offsetof(struct rx_port_stats, counter) / 8)
-
 #define BNXT_RX_STATS_ENTRY(counter)   \
{ BNXT_RX_STATS_OFFSET(counter), __stringify(counter) }
 
-#define BNXT_TX_STATS_OFFSET(counter)  \
-   ((offsetof(struct tx_port_stats, counter) + \
- sizeof(struct rx_port_stats) + 512) / 8)
-
 #define BNXT_TX_STATS_ENTRY(counter)   \
{ BNXT_TX_STATS_OFFSET(counter), __stringify(counter) }
 
@@ -150,6 +143,14 @@ static int bnxt_set_coalesce(struct net_device *dev,
BNXT_RX_STATS_ENTRY(rx_tagged_frames),
BNXT_RX_STATS_ENTRY(rx_double_tagged_frames),
BNXT_RX_STATS_ENTRY(rx_good_frames),
+   BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri0),
+   BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri1),
+   BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri2),
+   BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri3),
+   BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri4),
+   BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri5),
+   BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri6),
+   BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri7),
BNXT_RX_STATS_ENTRY(rx_undrsz_frames),
BNXT_RX_STATS_ENTRY(rx_eee_lpi_events),
BNXT_RX_STATS_ENTRY(rx_eee_lpi_duration),
@@ -179,6 +180,14 @@ static int bnxt_set_coalesce(struct net_device *dev,
BNXT_TX_STATS_ENTRY(tx_fcs_err_frames),
BNXT_TX_STATS_ENTRY(tx_err),
BNXT_TX_STATS_ENTRY(tx_fifo_underruns),
+   BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri0),
+   BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri1),
+   BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri2),
+   BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri3),
+   BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri4),
+   BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri5),
+   BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri6),
+   BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri7),
BNXT_TX_STATS_ENTRY(tx_eee_lpi_events),
BNXT_TX_STATS_ENTRY(tx_eee_lpi_duration),
BNXT_TX_STATS_ENTRY(tx_total_collisions),
-- 
1.8.3.1
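
For readers following the offset arithmetic in this patch, here is a small standalone sketch of the same idea: counters live in one flat buffer (an RX block, a gap, then a TX block), and a counter's index in units of 64-bit words is derived with offsetof(). The struct layouts below are hypothetical and much reduced compared with the real rx_port_stats/tx_port_stats in bnxt_hsi.h, the 512-byte gap is taken from the macro in the patch, and the little-endian conversion (le64_to_cpu) is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much-reduced counter layouts; the real structs have
 * many more fields. */
struct rx_stats {
	uint64_t rx_good_frames;
	uint64_t rx_pfc_pri[8];
};

struct tx_stats {
	uint64_t tx_good_frames;
	uint64_t tx_pfc_pri[8];
};

#define STATS_GAP 512	/* padding between the RX and TX blocks, as in the patch */

/* One contiguous buffer: RX block, gap, then TX block. */
struct port_stats {
	struct rx_stats rx;
	uint8_t gap[STATS_GAP];
	struct tx_stats tx;
};

/* Index of a counter when the buffer is viewed as an array of 64-bit words. */
#define RX_OFF(field)	(offsetof(struct rx_stats, field) / 8)
#define TX_OFF(field)	((offsetof(struct tx_stats, field) + \
			  sizeof(struct rx_stats) + STATS_GAP) / 8)

/* Copy the eight per-priority PFC counters out of the flat buffer. */
static void read_pfc(const struct port_stats *ps,
		     uint64_t requests[8], uint64_t indications[8])
{
	const uint64_t *words = (const uint64_t *)ps;
	size_t rx_off = RX_OFF(rx_pfc_pri);
	size_t tx_off = TX_OFF(tx_pfc_pri);
	int i;

	assert(ps != NULL);
	for (i = 0; i < 8; i++, rx_off++, tx_off++) {
		requests[i] = words[tx_off];	/* PFC frames sent */
		indications[i] = words[rx_off];	/* PFC frames received */
	}
}
```

Because the eight per-priority counters are adjacent in each block, a single starting offset per direction plus an increment per priority is enough, which is what the loop in bnxt_dcbnl_ieee_getpfc() relies on.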

[PATCH net-next 0/4] bnxt_en: Add DCBNL support.

2016-11-26 Thread Michael Chan

This series adds DCBNL operations to support host-based IEEE DCBX.

Michael Chan (4):
  bnxt_en: Re-factor bnxt_setup_tc().
  bnxt_en: Update firmware header file to include DCB command structs.
  bnxt_en: Implement DCBNL to support host-based DCBX.
  bnxt_en: Add PFC statistics.

 drivers/net/ethernet/broadcom/Kconfig |  10 +
 drivers/net/ethernet/broadcom/bnxt/Makefile   |   2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |  30 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  18 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c | 502 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.h |  59 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c |  23 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h | 326 ++
 8 files changed, 952 insertions(+), 18 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.h

-- 
1.8.3.1

[PATCH net-next 2/4] bnxt_en: Update firmware header file to include DCB command structs.

2016-11-26 Thread Michael Chan

Get and store the max number of lossless TCs the hardware can support.

Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |   4 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |   1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h | 326 ++
 3 files changed, 331 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index b75f4d0..58a75f4 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4252,12 +4252,16 @@ static int bnxt_hwrm_queue_qportcfg(struct bnxt *bp)
goto qportcfg_exit;
}
bp->max_tc = resp->max_configurable_queues;
+   bp->max_lltc = resp->max_configurable_lossless_queues;
if (bp->max_tc > BNXT_MAX_QUEUE)
bp->max_tc = BNXT_MAX_QUEUE;
 
if (resp->queue_cfg_info & QUEUE_QPORTCFG_RESP_QUEUE_CFG_INFO_ASYM_CFG)
bp->max_tc = 1;
 
+   if (bp->max_lltc > bp->max_tc)
+   bp->max_lltc = bp->max_tc;
+
qptr = &resp->queue_id0;
for (i = 0; i < bp->max_tc; i++) {
bp->q_info[i].queue_id = *qptr++;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index fcd07ee..edde11e 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1010,6 +1010,7 @@ struct bnxt {
u32 rss_hash_cfg;
 
u8  max_tc;
+   u8  max_lltc;   /* lossless TCs */
struct bnxt_queue_info  q_info[BNXT_MAX_QUEUE];
 
unsigned int current_interval;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h
index 0456d5b..5565612 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h
@@ -2355,6 +2355,39 @@ struct hwrm_queue_cfg_output {
u8 valid;
 };
 
+/* hwrm_queue_pfcenable_qcfg */
+/* Input (24 bytes) */
+struct hwrm_queue_pfcenable_qcfg_input {
+   __le16 req_type;
+   __le16 cmpl_ring;
+   __le16 seq_id;
+   __le16 target_id;
+   __le64 resp_addr;
+   __le16 port_id;
+   __le16 unused_0[3];
+};
+
+/* Output (16 bytes) */
+struct hwrm_queue_pfcenable_qcfg_output {
+   __le16 error_code;
+   __le16 req_type;
+   __le16 seq_id;
+   __le16 resp_len;
+   __le32 flags;
+   #define QUEUE_PFCENABLE_QCFG_RESP_FLAGS_PRI0_PFC_ENABLED   0x1UL
+   #define QUEUE_PFCENABLE_QCFG_RESP_FLAGS_PRI1_PFC_ENABLED   0x2UL
+   #define QUEUE_PFCENABLE_QCFG_RESP_FLAGS_PRI2_PFC_ENABLED   0x4UL
+   #define QUEUE_PFCENABLE_QCFG_RESP_FLAGS_PRI3_PFC_ENABLED   0x8UL
+   #define QUEUE_PFCENABLE_QCFG_RESP_FLAGS_PRI4_PFC_ENABLED   0x10UL
+   #define QUEUE_PFCENABLE_QCFG_RESP_FLAGS_PRI5_PFC_ENABLED   0x20UL
+   #define QUEUE_PFCENABLE_QCFG_RESP_FLAGS_PRI6_PFC_ENABLED   0x40UL
+   #define QUEUE_PFCENABLE_QCFG_RESP_FLAGS_PRI7_PFC_ENABLED   0x80UL
+   u8 unused_0;
+   u8 unused_1;
+   u8 unused_2;
+   u8 valid;
+};
+
 /* hwrm_queue_pfcenable_cfg */
 /* Input (24 bytes) */
 struct hwrm_queue_pfcenable_cfg_input {
@@ -2389,6 +2422,48 @@ struct hwrm_queue_pfcenable_cfg_output {
u8 valid;
 };
 
+/* hwrm_queue_pri2cos_qcfg */
+/* Input (24 bytes) */
+struct hwrm_queue_pri2cos_qcfg_input {
+   __le16 req_type;
+   __le16 cmpl_ring;
+   __le16 seq_id;
+   __le16 target_id;
+   __le64 resp_addr;
+   __le32 flags;
+   #define QUEUE_PRI2COS_QCFG_REQ_FLAGS_PATH   0x1UL
+   #define QUEUE_PRI2COS_QCFG_REQ_FLAGS_PATH_TX   (0x0UL << 0)
+   #define QUEUE_PRI2COS_QCFG_REQ_FLAGS_PATH_RX   (0x1UL << 0)
+   #define QUEUE_PRI2COS_QCFG_REQ_FLAGS_PATH_LAST QUEUE_PRI2COS_QCFG_REQ_FLAGS_PATH_RX
+   #define QUEUE_PRI2COS_QCFG_REQ_FLAGS_IVLAN  0x2UL
+   u8 port_id;
+   u8 unused_0[3];
+};
+
+/* Output (24 bytes) */
+struct hwrm_queue_pri2cos_qcfg_output {
+   __le16 error_code;
+   __le16 req_type;
+   __le16 seq_id;
+   __le16 resp_len;
+   u8 pri0_cos_queue_id;
+   u8 pri1_cos_queue_id;
+   u8 pri2_cos_queue_id;
+   u8 pri3_cos_queue_id;
+   u8 pri4_cos_queue_id;
+   u8 pri5_cos_queue_id;
+   u8 pri6_cos_queue_id;
+   u8 pri7_cos_queue_id;
+   u8 queue_cfg_info;
+   #define QUEUE_PRI2COS_QCFG_RESP_QUEUE_CFG_INFO_ASYM_CFG 0x1UL
+   u8 unused_0;
+   __le16 unused_1;
+   u8 unused_2;
+   u8 unused_3;
+   u8 unused_4;
+   u8 valid;
+};
+
 /* hwrm_queue_pri2cos_cfg */
 /* Input (40 bytes) */
 struct hwrm_queue_pri2cos_cfg_input {
@@ -2439,6 +2514,257 @@ struct hwrm_queue_pri2cos_cfg_output {
u8 valid;
 };
 
+/* hwrm_queue_cos2bw_qcfg */
+/* Input (24 bytes) */
+struct

[PATCH net-next 3/4] bnxt_en: Implement DCBNL to support host-based DCBX.

2016-11-26 Thread Michael Chan

Support only IEEE DCBX initially.  Add IEEE DCBNL ops and functions to
get and set the hardware DCBX parameters.  The DCB code is conditional on
Kconfig CONFIG_BNXT_DCB.

Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/Kconfig |  10 +
 drivers/net/ethernet/broadcom/bnxt/Makefile   |   2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |   8 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |   9 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c | 490 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.h |  59 
 6 files changed, 575 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.h

diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig
index bd8c80c..404c020 100644
--- a/drivers/net/ethernet/broadcom/Kconfig
+++ b/drivers/net/ethernet/broadcom/Kconfig
@@ -203,4 +203,14 @@ config BNXT_SRIOV
  Virtualization support in the NetXtreme-C/E products. This
  allows for virtual function acceleration in virtual environments.
 
+config BNXT_DCB
+   bool "Data Center Bridging (DCB) Support"
+   default n
+   depends on BNXT && DCB
+   ---help---
+ Say Y here if you want to use Data Center Bridging (DCB) in the
+ driver.
+
+ If unsure, say N.
+
 endif # NET_VENDOR_BROADCOM
diff --git a/drivers/net/ethernet/broadcom/bnxt/Makefile b/drivers/net/ethernet/broadcom/bnxt/Makefile
index 97e78e2..b233a86 100644
--- a/drivers/net/ethernet/broadcom/bnxt/Makefile
+++ b/drivers/net/ethernet/broadcom/bnxt/Makefile
@@ -1,3 +1,3 @@
 obj-$(CONFIG_BNXT) += bnxt_en.o
 
-bnxt_en-y := bnxt.o bnxt_sriov.o bnxt_ethtool.o
+bnxt_en-y := bnxt.o bnxt_sriov.o bnxt_ethtool.o bnxt_dcb.o
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 58a75f4..cec24b4 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -54,6 +54,7 @@
 #include "bnxt.h"
 #include "bnxt_sriov.h"
 #include "bnxt_ethtool.h"
+#include "bnxt_dcb.h"
 
 #define BNXT_TX_TIMEOUT (5 * HZ)
 
@@ -4988,7 +4989,7 @@ static void bnxt_enable_napi(struct bnxt *bp)
}
 }
 
-static void bnxt_tx_disable(struct bnxt *bp)
+void bnxt_tx_disable(struct bnxt *bp)
 {
int i;
struct bnxt_tx_ring_info *txr;
@@ -5006,7 +5007,7 @@ static void bnxt_tx_disable(struct bnxt *bp)
netif_carrier_off(bp->dev);
 }
 
-static void bnxt_tx_enable(struct bnxt *bp)
+void bnxt_tx_enable(struct bnxt *bp)
 {
int i;
struct bnxt_tx_ring_info *txr;
@@ -6677,6 +6678,7 @@ static void bnxt_remove_one(struct pci_dev *pdev)
 
bnxt_hwrm_func_drv_unrgtr(bp);
bnxt_free_hwrm_resources(bp);
+   bnxt_dcb_free(bp);
pci_iounmap(pdev, bp->bar2);
pci_iounmap(pdev, bp->bar1);
pci_iounmap(pdev, bp->bar0);
@@ -6904,6 +6906,8 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
dev->min_mtu = ETH_ZLEN;
dev->max_mtu = 9500;
 
+   bnxt_dcb_init(bp);
+
 #ifdef CONFIG_BNXT_SRIOV
init_waitqueue_head(&bp->sriov_cfg_wait);
 #endif
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index edde11e..275e560 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1026,6 +1026,13 @@ struct bnxt {
struct bnxt_irq *irq_tbl;
u8  mac_addr[ETH_ALEN];
 
+#ifdef CONFIG_BNXT_DCB
+   struct ieee_pfc *ieee_pfc;
+   struct ieee_ets *ieee_ets;
+   u8  dcbx_cap;
+   u8  default_pri;
+#endif /* CONFIG_BNXT_DCB */
+
u32 msg_enable;
 
u32 hwrm_spec_code;
@@ -1221,6 +1228,8 @@ static inline void bnxt_disable_poll(struct bnxt_napi *bnapi)
 int hwrm_send_message_silent(struct bnxt *, void *, u32, int);
 int bnxt_hwrm_set_coal(struct bnxt *);
 int bnxt_hwrm_func_qcaps(struct bnxt *);
+void bnxt_tx_disable(struct bnxt *bp);
+void bnxt_tx_enable(struct bnxt *bp);
 int bnxt_hwrm_set_pause(struct bnxt *);
 int bnxt_hwrm_set_link_setting(struct bnxt *, bool, bool);
 int bnxt_hwrm_fw_set_time(struct bnxt *);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
new file mode 100644
index 0000000..f391b47
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
@@ -0,0 +1,490 @@
+/* Broadcom NetXtreme-C/E network driver.
+ *
+ * Copyright (c) 2014-2016 Broadcom Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include

[PATCH net-next 1/4] bnxt_en: Re-factor bnxt_setup_tc().

2016-11-26 Thread Michael Chan

Add a new function bnxt_setup_mq_tc() to handle MQPRIO.  This new function
will be called during ETS setup when we add DCBNL in the next patch.

Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 18 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  1 +
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 8c7bdbe..b75f4d0 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -6328,17 +6328,10 @@ static int bnxt_change_mtu(struct net_device *dev, int new_mtu)
return 0;
 }
 
-static int bnxt_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
-struct tc_to_netdev *ntc)
+int bnxt_setup_mq_tc(struct net_device *dev, u8 tc)
 {
struct bnxt *bp = netdev_priv(dev);
bool sh = false;
-   u8 tc;
-
-   if (ntc->type != TC_SETUP_MQPRIO)
-   return -EINVAL;
-
-   tc = ntc->tc;
 
if (tc > bp->max_tc) {
netdev_err(dev, "too many traffic classes requested: %d Max supported is %d\n",
@@ -6381,6 +6374,15 @@ static int bnxt_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
return 0;
 }
 
+static int bnxt_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+struct tc_to_netdev *ntc)
+{
+   if (ntc->type != TC_SETUP_MQPRIO)
+   return -EINVAL;
+
+   return bnxt_setup_mq_tc(dev, ntc->tc);
+}
+
 #ifdef CONFIG_RFS_ACCEL
 static bool bnxt_fltr_match(struct bnxt_ntuple_filter *f1,
struct bnxt_ntuple_filter *f2)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 47be789..fcd07ee 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1225,5 +1225,6 @@ static inline void bnxt_disable_poll(struct bnxt_napi *bnapi)
 int bnxt_hwrm_fw_set_time(struct bnxt *);
 int bnxt_open_nic(struct bnxt *, bool, bool);
 int bnxt_close_nic(struct bnxt *, bool, bool);
+int bnxt_setup_mq_tc(struct net_device *dev, u8 tc);
 int bnxt_get_max_rings(struct bnxt *, int *, int *, bool);
 #endif
-- 
1.8.3.1
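
The shape of this refactoring, reduced to a standalone sketch: a thin entry point checks the request type and then delegates to a core helper that other callers (such as the ETS setup mentioned in the changelog) can invoke directly. The types and names below (setup_req, dev_state, setup_mq_tc(), setup_tc()) are hypothetical simplifications, not the driver's actual structures, and the helper body is only a placeholder for the real ring reconfiguration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum setup_type { SETUP_MQPRIO, SETUP_OTHER };	/* hypothetical request kinds */

struct setup_req {
	enum setup_type type;
	unsigned char tc;
};

struct dev_state {
	unsigned char max_tc;	/* upper bound on traffic classes */
	unsigned char num_tc;	/* currently configured traffic classes */
};

/* Core helper: callable directly by any code path that already knows tc. */
static int setup_mq_tc(struct dev_state *dev, unsigned char tc)
{
	assert(dev != NULL);
	if (tc > dev->max_tc)
		return -EINVAL;
	dev->num_tc = tc;
	return 0;
}

/* Thin entry point: validates the request type, then delegates. */
static int setup_tc(struct dev_state *dev, const struct setup_req *req)
{
	if (req->type != SETUP_MQPRIO)
		return -EINVAL;
	return setup_mq_tc(dev, req->tc);
}
```

Keeping the type check in the wrapper means the helper's contract is only about the traffic class count, so a second caller does not need to fabricate a request structure.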

Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()

2016-11-26 Thread Daniel Borkmann


On 11/26/2016 12:09 PM, Daniel Borkmann wrote:

On 11/26/2016 07:46 AM, Cong Wang wrote:

On Thu, Nov 24, 2016 at 7:20 AM, Daniel Borkmann  wrote:

[...]

Ok, strange, qdisc_destroy() calls into ops->destroy(), where ingress
drops its entire chain via tcf_destroy_chain(), so that will be NULL
eventually. The tps are freed by call_rcu() as well as qdisc itself
later on via qdisc_rcu_free(), where it frees per-cpu bstats as well.
Outstanding readers should either bail out due to if (!cl) or can still
process the chain until read section ends, but during that time, cl->q
resp. bstats should be good. Do you happen to know what's at address
880a68b04028? I was wondering wrt call_rcu() vs call_rcu_bh(), but
at least on ingress (netif_receive_skb_internal()) we hold rcu_read_lock()
here. The KASAN report is reliably happening at this location, right?


I am confused as well; I don't yet see how it could be related to my patch.
I will take a deep look over the weekend.


Ok, I'm currently on the run. Got too late yesterday night, but I'll
write what I found in the evening today, not related to ingress though.


Just pushed out my analysis to netdev under "[PATCH net] net, sched: respect
rcu grace period on cls destruction". My conclusion is that the two issues are
actually separate, and that one is small enough that we could route it via
net. Perhaps this at the same time shrinks your "[PATCH net-next]
net_sched: move the empty tp check from ->destroy() to ->delete()" to a
size reasonable enough to be suitable for net as well. Your ->delete()/->destroy()
one is definitely needed, too. The tp->root one is independent of ->delete()/
->destroy(), as they are different races, and the tp->root one could also happen
when you just destroy the whole tp directly. That seems like a good path
forward to me.

Thanks,
Daniel

[PATCH net] net, sched: respect rcu grace period on cls destruction

2016-11-26 Thread Daniel Borkmann

Roi reported a crash in flower where tp->root was NULL in ->classify()
callbacks. Reason is that in ->destroy() tp->root is set to NULL via
RCU_INIT_POINTER(). It's problematic for some of the classifiers, because
this doesn't respect RCU grace period for them, and as a result, still
outstanding readers from tc_classify() will try to blindly dereference
a NULL tp->root.

The tp->root object is strictly private to the classifier implementation
and holds internal data the core such as tc_ctl_tfilter() doesn't know
about. Within some classifiers, such as cls_bpf, cls_basic, etc, tp->root
is only checked for NULL in ->get() callback, but nowhere else. This is
misleading and seems to have been copied from old classifier code that was not
cleaned up properly. For example, d3fa76ee6b4a ("[NET_SCHED]: cls_basic:
fix NULL pointer dereference") moved tp->root initialization into ->init()
routine, where before it was part of ->change(), so ->get() had to deal
with tp->root being NULL back then. That was indeed a valid case then, but
after d3fa76ee6b4a, not really anymore. We used to set tp->root to NULL long
ago in ->destroy(), see 47a1a1d4be29 ("pkt_sched: remove unnecessary xchg()
in packet classifiers"); the NULLifying was reintroduced with the
RCUification, but it's not correct for every classifier implementation.

In the cases that are fixed here with one exception of cls_cgroup, tp->root
object is allocated and initialized inside ->init() callback, which is always
performed at a point in time after we allocate a new tp, which means tp and
thus tp->root was not globally visible in the tp chain yet (see tc_ctl_tfilter()).
Also, on destruction tp->root is strictly kfree_rcu()'ed in ->destroy()
handler, same for the tp which is kfree_rcu()'ed right when we return
from ->destroy() in tcf_destroy(). This means, the head object's lifetime
for such classifiers is always tied to the tp lifetime. The RCU callback
invocation for the two kfree_rcu() could be out of order, but that's fine
since both are independent.

Dropping the RCU_INIT_POINTER(tp->root, NULL) for these classifiers here
means that 1) we don't need a useless NULL check in the fast path, and 2)
outstanding readers of that tp in tc_classify() can still execute within the
RCU grace period, as is actually expected.
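
To make the lifetime argument concrete, here is a deliberately simplified, single-threaded toy model in plain C. It is not kernel code and does not use real RCU: deferred_free() and grace_period_end() are hypothetical stand-ins for kfree_rcu() and the end of a grace period, and classify() stands in for a ->classify() fast path that dereferences tp->root without a NULL check.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: no threads, no real RCU. */
struct head { int value; };
struct tp { struct head *root; };

#define MAX_DEFERRED 8
static void *deferred[MAX_DEFERRED];
static int n_deferred;

/* Stand-in for kfree_rcu(): remember the object, free it later. */
static void deferred_free(void *p)
{
	assert(n_deferred < MAX_DEFERRED);
	deferred[n_deferred++] = p;
}

/* Stand-in for the end of a grace period: run all pending frees. */
static int grace_period_end(void)
{
	int freed = n_deferred;

	while (n_deferred > 0)
		free(deferred[--n_deferred]);
	return freed;
}

/* Reader fast path: no NULL check, relies on root staying valid. */
static int classify(const struct tp *tp)
{
	return tp->root->value;
}

/* Destroy without clearing tp->root: readers still holding tp keep working. */
static void destroy(struct tp *tp)
{
	deferred_free(tp->root);
	deferred_free(tp);
}

static struct tp *create(int value)
{
	struct tp *tp = malloc(sizeof(*tp));
	struct head *h = malloc(sizeof(*h));

	assert(tp && h);
	h->value = value;
	tp->root = h;
	return tp;
}
```

In this model a reader that obtained tp before destroy() can still call classify() safely until grace_period_end() runs. Had destroy() set tp->root to NULL immediately, the same reader would dereference NULL, which is the crash described above.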

Things that haven't been touched here: cls_fw and cls_route. They each
handle tp->root being NULL in ->classify() path for historic reasons, so
their ->destroy() implementation can stay as is. If someone actually
cares, they could get cleaned up at some point to avoid the test in fast
path. cls_u32 doesn't set tp->root to NULL. For cls_rsvp, I just added a
!head check should anyone actually be using/testing it, so it at least aligns with
cls_fw and cls_route. For cls_flower we additionally need to defer rhashtable
destruction (to a sleepable context) after RCU grace period as concurrent
readers might still access it. (Note that in this case we need to hold module
reference to keep work callback address intact, since we only wait on module
unload for all call_rcu()s to finish.)

This fixes one race to bring RCU grace period guarantees back. Next step
as worked on by Cong however is to fix 1e052be69d04 ("net_sched: destroy
proto tp when all filters are gone") to get the order of unlinking the tp
in tc_ctl_tfilter() for the RTM_DELTFILTER case right by moving
RCU_INIT_POINTER() before tcf_destroy() and let the notification for
removal be done through the prior ->delete() callback. Both are independent
issues. Once we have that right, we can then clean tp->root up for a number
of classifiers by not making them RCU pointers, which requires a new callback
(->uninit) that is triggered from tp's RCU callback, where we just kfree()
tp->root from there.

Fixes: 1f947bf151e9 ("net: sched: rcu'ify cls_bpf")
Fixes: 9888faefe132 ("net: sched: cls_basic use RCU")
Fixes: 70da9f0bf999 ("net: sched: cls_flow use RCU")
Fixes: 77b9900ef53a ("tc: introduce Flower classifier")
Fixes: bf3994d2ed31 ("net/sched: introduce Match-all classifier")
Fixes: 952313bd6258 ("net: sched: cls_cgroup use RCU")
Reported-by: Roi Dayan 
Signed-off-by: Daniel Borkmann 
Cc: Cong Wang 
Cc: John Fastabend 
Cc: Roi Dayan 
Cc: Jiri Pirko 
---
 net/sched/cls_basic.c|  4 
 net/sched/cls_bpf.c  |  4 
 net/sched/cls_cgroup.c   |  7 +++
 net/sched/cls_flow.c |  1 -
 net/sched/cls_flower.c   | 31 ++-
 net/sched/cls_matchall.c |  1 -
 net/sched/cls_rsvp.h |  3 ++-
 net/sched/cls_tcindex.c  |  1 -
 8 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
index eb219b7..5877f60 100644
--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -62,9 +62,6 @@ static unsigned long basic_get(struct tcf_proto *tp, u32 handle)
struct basic_head *head =

Re: [PATCH] mlx4: give precise rx/tx bytes/packets counters

2016-11-26 Thread Saeed Mahameed

On Fri, Nov 25, 2016 at 5:46 PM, Eric Dumazet  wrote:
> From: Eric Dumazet 
>
> mlx4 stats are chaotic because a deferred work queue is responsible
> to update them every 250 ms.
>
Hello Eric,

Well, the only historical reason for this deferred work is that we
query FW for some counters, which might sleep,
and there is one place in the kernel where dev_get_stats()
is called under a rw lock, "read_lock(&dev_base_lock);",
in http://lxr.free-electrons.com/source/net/core/net-sysfs.c#L552. I
am not sure why it is this way. Maybe it is time to fix this and get rid
of the deferred work, which will give you the same precision even
when reading ethtool stats, which this patch didn't take care of.
This will also improve other drivers that might sleep while reading
stats.

> Even sampling stats every one second with "sar -n DEV 1" gives
> variations like the following :
>
> lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65
> 07:39:22 eth0 146877.00 3265554.00   9467.15 4828168.50
> 07:39:23 eth0 146587.00 3260329.00   9448.15 4820445.98
> 07:39:24 eth0 146894.00 3259989.00   9468.55 4819943.26
> 07:39:25 eth0 110368.00 2454497.00   7113.95 3629012.17  <<>>
> 07:39:26 eth0 146563.00 3257502.00   9447.25 4816266.23
> 07:39:27 eth0 145678.00 3258292.00   9389.79 4817414.39
> 07:39:28 eth0 145268.00 3253171.00   9363.85 4809852.46
> 07:39:29 eth0 146439.00 3262185.00   9438.97 4823172.48
> 07:39:30 eth0 146758.00 3264175.00   9459.94 4826124.13
> 07:39:31 eth0 146843.00 3256903.00   9465.44 4815381.97
> Average: eth0 142827.50 3179259.70   9206.30 4700578.16
>
> This patch allows rx/tx bytes/packets counters being folded at the
> time we need stats.
>
> We now can fetch stats every 1 ms if we want to check NIC behavior
> on a small time window. It is also easier to detect anomalies.
>
> lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65
> 07:42:50 eth0 142915.00 3177696.00   9212.06 4698270.42
> 07:42:51 eth0 143741.00 3200232.00   9265.15 4731593.02
> 07:42:52 eth0 142781.00 3171600.00   9202.92 4689260.16
> 07:42:53 eth0 143835.00 3192932.00   9271.80 4720761.39
> 07:42:54 eth0 141922.00 3165174.00   9147.64 4679759.21
> 07:42:55 eth0 142993.00 3207038.00   9216.78 4741653.05
> 07:42:56 eth0 141394.06 3154335.64   9113.85 4663731.73
> 07:42:57 eth0 141850.00 3161202.00   9144.48 4673866.07
> 07:42:58 eth0 143439.00 3180736.00   9246.05 4702755.35
> 07:42:59 eth0 143501.00 3210992.00   9249.99 4747501.84
> Average: eth0 142835.66 3182165.93   9206.98 4704874.08
>
> Signed-off-by: Eric Dumazet 
> Cc: Tariq Toukan 
> ---
>  drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |2
>  drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |1
>  drivers/net/ethernet/mellanox/mlx4/en_port.c|   77 +-
>  drivers/net/ethernet/mellanox/mlx4/mlx4_en.h|1
>  4 files changed, 58 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c 
> b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> index 
> 487a58f9c192896852fef271b6cce9bde132deb7..d9c9f86a30df953fa555934c5406057dcaf28960
>  100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> @@ -367,6 +367,8 @@ static void mlx4_en_get_ethtool_stats(struct net_device 
> *dev,
>
> spin_lock_bh(>stats_lock);
>
> +   mlx4_en_fold_software_stats(dev);
> +
> for (i = 0; i < NUM_MAIN_STATS; i++, bitmap_iterator_inc())
> if (bitmap_iterator_test())
> data[index++] = ((unsigned long *)>stats)[i];
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c 
> b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index 
> 9018bb1b2e12142e048281a9d28ddf95e0023a61..d28d841db23ce885d2011877a156bacf23f65afe
>  100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -1321,6 +1321,7 @@ mlx4_en_get_stats64(struct net_device *dev, struct 
> rtnl_link_stats64 *stats)
> struct mlx4_en_priv *priv = netdev_priv(dev);
>
> spin_lock_bh(>stats_lock);
> +   mlx4_en_fold_software_stats(dev);
> netdev_stats_to_stats64(stats, >stats);
> spin_unlock_bh(>stats_lock);
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_port.c 
> b/drivers/net/ethernet/mellanox/mlx4/en_port.c
> index 
> 1eb4c1e10bad1dad26049876acf107a2073a6ab1..c6c4f1238923e09eced547454b86c68720292859
>  100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_port.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_port.c
> @@ -147,6 +147,39 @@ static unsigned long en_stats_adder(__be64 *start, 
> __be64 *next, int num)
> return ret;
>  }
>
> +void mlx4_en_fold_software_stats(struct net_device *dev)
> +{
> +

Re: [PATCH 1/1] NET: usb: cdc_ncm: adding MBIM RESET_FUNCTION request and modifying ncm bind common code

2016-11-26 Thread Bjørn Mork

Bjørn Mork  writes:

> Finally, I found my modems (or at least a number of them) again today.
> But I'm sorry to say, that the troublesome Huawei E3372h-153 is still
> giving us a hard time.  It does not work with your patch. The symptom is
> the same as earlier:  The modem returns MBIM frames with 32bit headers.
>
> So for now, I have to NAK this patch.
>
> I am sure we can find a good solution that makes all of these modems
> work, but I cannot support a patch that breaks previously working
> configurations. Sorry.  I'll do a few experiments and see if there is a
> simple fix for this.  Otherwise we'll probably have to do the quirk
> game.


This is a proof-of-concept only, but it appears to be working.  Please
test with your device(s) too.  It's still mostly your code, as you can
see.

If this turns out to work, then I believe we should refactor
cdc_ncm_init() and cdc_ncm_bind_common() to make the whole
initialisation sequence a bit cleaner.  And maybe also include
cdc_mbim_bind().  Ideally, the MBIM specific RESET should happen there
instead of "polluting" the NCM driver with MBIM specific code.

But anyway:  The sequence that seems to work for both the E3372h-153
and the EM7455 is

 USB_CDC_GET_NTB_PARAMETERS
 USB_CDC_RESET_FUNCTION
 usb_set_interface(dev->udev, 'data interface no', 0);
 remaining parts of cdc_ncm_init(), excluding USB_CDC_GET_NTB_PARAMETERS
 usb_set_interface(dev->udev, 'data interface no', 'data alt setting');

without any additional delay between the two usb_set_interface() calls.
So the major difference from your patch is that I moved the two control
requests out of cdc_ncm_init() to allow running them _before_ setting
the data interface to altsetting 0.

But maybe I was just lucky.  This was barely proof tested.  Needs a lot
more testing and cleanups as suggested.  I'd appreciate it if you
continued that, as I don't really have any time for it...

FWIW, I also ran a quick test with a D-Link DWM-156A7 (Mediatek MBIM
firmware) and a Huawei E367 (Qualcomm device with early Huawei MBIM
firmware, distinctly different from the E3372h-153 and most other
MBIM devices I've seen)



Bjørn

---
 drivers/net/usb/cdc_ncm.c| 48 
 include/uapi/linux/usb/cdc.h |  1 +
 2 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
index 877c9516e781..be019cbf1719 100644
--- a/drivers/net/usb/cdc_ncm.c
+++ b/drivers/net/usb/cdc_ncm.c
@@ -488,16 +488,6 @@ static int cdc_ncm_init(struct usbnet *dev)
u8 iface_no = ctx->control->cur_altsetting->desc.bInterfaceNumber;
int err;
 
-   err = usbnet_read_cmd(dev, USB_CDC_GET_NTB_PARAMETERS,
- USB_TYPE_CLASS | USB_DIR_IN
- |USB_RECIP_INTERFACE,
- 0, iface_no, &ctx->ncm_parm,
- sizeof(ctx->ncm_parm));
-   if (err < 0) {
-   dev_err(&dev->intf->dev, "failed GET_NTB_PARAMETERS\n");
-   return err; /* GET_NTB_PARAMETERS is required */
-   }
-
/* set CRC Mode */
if (cdc_ncm_flags(dev) & USB_CDC_NCM_NCAP_CRC_MODE) {
dev_dbg(&dev->intf->dev, "Setting CRC mode off\n");
@@ -837,12 +827,43 @@ int cdc_ncm_bind_common(struct usbnet *dev, struct 
usb_interface *intf, u8 data_
}
}
 
+   iface_no = ctx->control->cur_altsetting->desc.bInterfaceNumber;
+   temp = usbnet_read_cmd(dev, USB_CDC_GET_NTB_PARAMETERS,
+  USB_TYPE_CLASS | USB_DIR_IN
+  | USB_RECIP_INTERFACE,
+  0, iface_no, &ctx->ncm_parm,
+  sizeof(ctx->ncm_parm));
+   if (temp < 0) {
+   dev_err(&dev->intf->dev, "failed GET_NTB_PARAMETERS\n");
+   goto error; /* GET_NTB_PARAMETERS is required */
+   }
+
+   /* Some modems (e.g. Telit LE922A6) need to reset the MBIM function
+* or they will fail to work properly.
+* For details on RESET_FUNCTION request see document
+* "USB Communication Class Subclass Specification for MBIM"
+* RESET_FUNCTION should be harmless for all the other MBIM modems
+*/
+   if (cdc_ncm_comm_intf_is_mbim(ctx->control->cur_altsetting)) {
+   temp = usbnet_write_cmd(dev, USB_CDC_RESET_FUNCTION,
+   USB_TYPE_CLASS | USB_DIR_OUT
+   | USB_RECIP_INTERFACE,
+   0, iface_no, NULL, 0);
+   if (temp < 0)
+   dev_err(&dev->intf->dev, "failed RESET_FUNCTION\n");
+   }
+
iface_no = ctx->data->cur_altsetting->desc.bInterfaceNumber;
 
/* Reset data interface. Some devices will not reset properly
 * unless they are configured first.  Toggle the altsetting to
 * force a reset
+* This is applied only to ncm

[GIT] Networking

2016-11-26 Thread David Miller


1) Fix leak in fsl/fman driver, from Dan Carpenter.

2) Call flow dissector initcall earlier than any networking driver can
   register and start to use it, from Eric Dumazet.

3) Some dup header fixes from Geliang Tang.

4) TIPC link monitoring compat fix from Jon Paul Maloy.

5) Link changes require EEE re-negotiation in bcm_sf2 driver, from
   Florian Fainelli.

6) Fix bogus handle ID passed into tfilter_notify_chain(), from
   Roman Mashak.

7) Fix dump size calculation in rtnl_calcit(), from Zhang Shengju.

Please pull, thanks a lot!

The following changes since commit 3b404a519815b9820f73f1ecf404e5546c9270ba:

  Merge branch 'for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 
(2016-11-21 15:27:41 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to 6998cc6ec23740347670da13186d2979c5401903:

  tipc: resolve connection flow control compatibility problem (2016-11-25 
21:38:16 -0500)


Andrew Lunn (1):
  net: ethernet: mvneta: Remove IFF_UNICAST_FLT which is not implemented

Andy Gospodarek (1):
  bnxt: do not busy-poll when link is down

Arnd Bergmann (1):
  mvpp2: use correct size for memset

Christophe Jaillet (1):
  bnxt_en: Fix a VXLAN vs GENEVE issue

Dan Carpenter (1):
  fsl/fman: fix a leak in tgec_free()

David S. Miller (2):
  Merge branch 'for-upstream' of 
git://git.kernel.org/.../bluetooth/bluetooth
  Merge tag 'linux-can-fixes-for-4.9-20161123' of 
git://git.kernel.org/.../mkl/linux-can

Eric Dumazet (2):
  flow_dissect: call init_default_flow_dissectors() earlier
  udplite: call proper backlog handlers

Florian Fainelli (1):
  net: dsa: bcm_sf2: Ensure we re-negotiate EEE during after link change

Gao Feng (1):
  driver: macvlan: Check if need rollback multicast setting in macvlan_open

Geliang Tang (4):
  dwc_eth_qos: drop duplicate headers
  ibmvnic: drop duplicate header seq_file.h
  net: ieee802154: drop duplicate header delay.h
  net/mlx5: drop duplicate header delay.h

Johan Hedberg (1):
  Bluetooth: Fix using the correct source address type

Jon Paul Maloy (3):
  tipc: fix compatibility bug in link monitoring
  tipc: improve sanity check for received domain records
  tipc: resolve connection flow control compatibility problem

Kirill Esipov (1):
  net: phy: micrel: fix KSZ8041FTL supported value

Miroslav Lichvar (1):
  net: ethtool: don't require CAP_NET_ADMIN for ETHTOOL_GLINKSETTINGS

Oliver Hartkopp (1):
  can: bcm: fix support for CAN FD frames

Paolo Abeni (1):
  ipv6: bump genid when the IFA_F_TENTATIVE flag is clear

Randy Dunlap (1):
  netdevice.h: fix kernel-doc warning

Roman Mashak (1):
  net sched filters: fix filter handle ID in tfilter_notify_chain()

Tariq Toukan (1):
  net/mlx4_en: Free netdev resources under state lock

WANG Cong (1):
  net: revert "net: l2tp: Treat NET_XMIT_CN as success in l2tp_eth_dev_xmit"

Zhang Shengju (1):
  rtnetlink: fix the wrong minimal dump size getting from rtnl_calcit()

 drivers/net/dsa/bcm_sf2.c   |  4 
 drivers/net/ethernet/broadcom/bnxt/bnxt.c   | 15 ---
 drivers/net/ethernet/freescale/fman/fman_tgec.c |  3 ---
 drivers/net/ethernet/ibm/ibmvnic.c  |  1 -
 drivers/net/ethernet/marvell/mvneta.c   |  2 +-
 drivers/net/ethernet/marvell/mvpp2.c|  2 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |  5 -
 drivers/net/ethernet/mellanox/mlx5/core/main.c  |  1 -
 drivers/net/ethernet/synopsys/dwc_eth_qos.c |  2 --
 drivers/net/ieee802154/adf7242.c|  1 -
 drivers/net/macvlan.c   |  3 ++-
 drivers/net/phy/micrel.c|  8 
 include/linux/netdevice.h   |  2 +-
 include/net/bluetooth/hci_core.h|  2 +-
 net/bluetooth/6lowpan.c |  4 ++--
 net/bluetooth/hci_conn.c| 26 --
 net/bluetooth/l2cap_core.c  |  2 +-
 net/bluetooth/rfcomm/tty.c  |  2 +-
 net/bluetooth/sco.c |  2 +-
 net/can/bcm.c   | 18 ++
 net/core/ethtool.c  |  1 +
 net/core/flow_dissector.c   |  2 +-
 net/core/rtnetlink.c|  2 +-
 net/ipv4/udp.c  |  2 +-
 net/ipv4/udp_impl.h |  2 +-
 net/ipv4/udplite.c  |  2 +-
 net/ipv6/addrconf.c | 18 --
 net/ipv6/udp.c  |  2 +-
 net/ipv6/udp_impl.h |  2 +-
 net/ipv6/udplite.c  |  2 +-

[PATCH] amd-xgbe: Fix unused suspend handlers build warning

2016-11-26 Thread Borislav Petkov

From: Borislav Petkov 

Fix:

  drivers/net/ethernet/amd/xgbe/xgbe-main.c:835:12: warning: ‘xgbe_suspend’ defined but not used [-Wunused-function]
  drivers/net/ethernet/amd/xgbe/xgbe-main.c:855:12: warning: ‘xgbe_resume’ defined but not used [-Wunused-function]

I see it during randconfig builds here.

Signed-off-by: Borislav Petkov 
Cc: Tom Lendacky 
Cc: netdev@vger.kernel.org
---
 drivers/net/ethernet/amd/xgbe/xgbe-main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-main.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-main.c
index e10e569c0d5f..2e8451b0a74a 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-main.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-main.c
@@ -831,7 +831,7 @@ static int xgbe_remove(struct platform_device *pdev)
return 0;
 }
 
-#ifdef CONFIG_PM
+#ifdef CONFIG_PM_SLEEP
 static int xgbe_suspend(struct device *dev)
 {
struct net_device *netdev = dev_get_drvdata(dev);
@@ -876,7 +876,7 @@ static int xgbe_resume(struct device *dev)
 
return ret;
 }
-#endif /* CONFIG_PM */
+#endif /* CONFIG_PM_SLEEP */
 
 #ifdef CONFIG_ACPI
 static const struct acpi_device_id xgbe_acpi_match[] = {
-- 
2.10.0

Re: net: GPF in eth_header

2016-11-26 Thread Eric Dumazet

2016-11-26 12:05 GMT-08:00 Eric Dumazet :
>> Hi Eric,
>>
>> The crash happens when the kernel tries to access shadow for nonmapped 
>> memory.
>>
>> The issue here is an integer overflow which happens in 
>> neigh_resolve_output().
>> skb_network_offset(skb) can return negative number, but __skb_pull()
>> accepts unsigned int as len.
>> As a result, the least significant bit in higher 32 bits of skb->data
>> gets set and we get an out-of-bounds with offset of 4 GB.
>>
>> I've attached a short reproducer, but you either need KASAN or to add
>> a BUG_ON to see the crash.
>> In this reproducer skb_network_offset() becomes negative after merging
>> two ipv6 fragments.
>>
>> I actually see multiple places where skb_network_offset() is used as
>> an argument to skb_pull().
>> So I guess every place can potentially be buggy.
>
> Well, I think the intent is to accept a negative number.
>
> This definitely was assumed by commit e1f165032c8bade authors !
>
> I guess they were using a 32bit kernel for their tests.

Correct fix would be to use

skb_push(skb, -skb_network_offset(skb));

As done in other locations...
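
To illustrate the overflow described in the quoted report, here is a
minimal user-space sketch. It is not kernel code: model_pull(),
model_push() and pull_error() are made-up stand-ins for illustration,
and the sketch assumes 64-bit addresses. It shows what happens when a
negative int is passed as an unsigned int length and then added to a
64-bit address.

```c
#include <stdint.h>

/* Hypothetical stand-in for __skb_pull(): advances a 64-bit "data"
 * address by an unsigned 32-bit length.
 */
uint64_t model_pull(uint64_t data, unsigned int len)
{
	return data + len;
}

/* Hypothetical stand-in for __skb_push(): moves "data" back by len. */
uint64_t model_push(uint64_t data, unsigned int len)
{
	return data - len;
}

/* Distance between the address the pull produced and the address a
 * signed (sign-extended) offset would have produced.
 */
uint64_t pull_error(uint64_t data, int offset)
{
	uint64_t wanted = data + (uint64_t)(int64_t)offset; /* sign-extended */
	uint64_t got = model_pull(data, (unsigned int)offset); /* zero-extended */

	return got - wanted;
}
```

With a negative offset, the pulled address ends up exactly 2^32 bytes
(4 GB) above the intended one, while pushing by the negated offset
lands on the intended address.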

Re: [PATCH 1/1] NET: usb: cdc_ncm: adding MBIM RESET_FUNCTION request and modifying ncm bind common code

2016-11-26 Thread Bjørn Mork

Finally, I found my modems (or at least a number of them) again today.
But I'm sorry to say, that the troublesome Huawei E3372h-153 is still
giving us a hard time.  It does not work with your patch. The symptom is
the same as earlier:  The modem returns MBIM frames with 32bit headers.

So for now, I have to NAK this patch.

I am sure we can find a good solution that makes all of these modems
work, but I cannot support a patch that breaks previously working
configurations. Sorry.  I'll do a few experiments and see if there is a
simple fix for this.  Otherwise we'll probably have to do the quirk
game.


Bjørn

[PATCH net-next 4/6] tcp: instrument how long TCP is limited by insufficient send buffer

2016-11-26 Thread Yuchung Cheng

From: Francis Yan 

This patch measures the amount of time when TCP runs out of new data
to send to the network due to insufficient send buffer, while TCP
is still busy delivering (i.e. write queue is not empty). The goal
is to indicate either the send buffer autotuning or user SO_SNDBUF
setting has resulted in network under-utilization.

The measurement starts conservatively by checking various conditions
to minimize false claims (i.e. under-estimation is more likely).
The measurement stops when the SOCK_NOSPACE flag is cleared. But it
does not account for the time elapsed till the next application write.
Also the measurement only starts if the sender is still busy sending
data, s.t. the limit accounted is part of the total busy time.

Signed-off-by: Francis Yan 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Soheil Hassas Yeganeh 
---
 net/ipv4/tcp.c| 10 --
 net/ipv4/tcp_input.c  |  5 -
 net/ipv4/tcp_output.c | 12 
 3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 913f9bb..259ffb5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -996,8 +996,11 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct 
page *page, int offset,
goto out;
 out_err:
/* make sure we wake any epoll edge trigger waiter */
-   if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 && err == -EAGAIN))
+   if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 &&
+err == -EAGAIN)) {
sk->sk_write_space(sk);
+   tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);
+   }
return sk_stream_error(sk, flags, err);
 }
 
@@ -1331,8 +1334,11 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t size)
 out_err:
err = sk_stream_error(sk, flags, err);
/* make sure we wake any epoll edge trigger waiter */
-   if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 && err == -EAGAIN))
+   if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 &&
+err == -EAGAIN)) {
sk->sk_write_space(sk);
+   tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);
+   }
release_sock(sk);
return err;
 }
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a5d1727..56fe736 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5059,8 +5059,11 @@ static void tcp_check_space(struct sock *sk)
/* pairs with tcp_poll() */
smp_mb__after_atomic();
if (sk->sk_socket &&
-   test_bit(SOCK_NOSPACE, &sk->sk_socket->flags))
+   test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) {
tcp_new_space(sk);
+   if (!test_bit(SOCK_NOSPACE, &sk->sk_socket->flags))
+   tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);
+   }
}
 }
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b7c..d3545d0 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1514,6 +1514,18 @@ static void tcp_cwnd_validate(struct sock *sk, bool 
is_cwnd_limited)
if (sysctl_tcp_slow_start_after_idle &&
(s32)(tcp_time_stamp - tp->snd_cwnd_stamp) >= 
inet_csk(sk)->icsk_rto)
tcp_cwnd_application_limited(sk);
+
+   /* The following conditions together indicate the starvation
+* is caused by insufficient sender buffer:
+* 1) just sent some data (see tcp_write_xmit)
+* 2) not cwnd limited (this else condition)
+* 3) no more data to send (null tcp_send_head )
+* 4) application is hitting buffer limit (SOCK_NOSPACE)
+*/
+   if (!tcp_send_head(sk) && sk->sk_socket &&
+   test_bit(SOCK_NOSPACE, &sk->sk_socket->flags) &&
+   (1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT))
+   tcp_chrono_start(sk, TCP_CHRONO_SNDBUF_LIMITED);
}
 }
 
-- 
2.8.0.rc3.226.g39d4020

[PATCH net-next 5/6] tcp: export sender limits chronographs to TCP_INFO

2016-11-26 Thread Yuchung Cheng

From: Francis Yan 

This patch exports all the sender chronograph measurements collected
in the previous patches to the TCP_INFO interface. Note that the busy
time exported includes all the other sending limits (rwnd-limited,
sndbuf-limited). Internally the time unit is the jiffy, but externally
the measurements are in microseconds for future extensions.

Signed-off-by: Francis Yan 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Soheil Hassas Yeganeh 
---
 include/uapi/linux/tcp.h |  4 
 net/ipv4/tcp.c   | 20 
 2 files changed, 24 insertions(+)

diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 73ac0db..2863b66 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -214,6 +214,10 @@ struct tcp_info {
__u32   tcpi_data_segs_out; /* RFC4898 tcpEStatsDataSegsOut */
 
__u64   tcpi_delivery_rate;
+
+   __u64   tcpi_busy_time;  /* Time (usec) busy sending data */
+   __u64   tcpi_rwnd_limited;   /* Time (usec) limited by receive window */
+   __u64   tcpi_sndbuf_limited; /* Time (usec) limited by send buffer */
 };
 
 /* for TCP_MD5SIG socket option */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 259ffb5..cdde20f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2708,6 +2708,25 @@ int compat_tcp_setsockopt(struct sock *sk, int level, 
int optname,
 EXPORT_SYMBOL(compat_tcp_setsockopt);
 #endif
 
+static void tcp_get_info_chrono_stats(const struct tcp_sock *tp,
+ struct tcp_info *info)
+{
+   u64 stats[__TCP_CHRONO_MAX], total = 0;
+   enum tcp_chrono i;
+
+   for (i = TCP_CHRONO_BUSY; i < __TCP_CHRONO_MAX; ++i) {
+   stats[i] = tp->chrono_stat[i - 1];
+   if (i == tp->chrono_type)
+   stats[i] += tcp_time_stamp - tp->chrono_start;
+   stats[i] *= USEC_PER_SEC / HZ;
+   total += stats[i];
+   }
+
+   info->tcpi_busy_time = total;
+   info->tcpi_rwnd_limited = stats[TCP_CHRONO_RWND_LIMITED];
+   info->tcpi_sndbuf_limited = stats[TCP_CHRONO_SNDBUF_LIMITED];
+}
+
 /* Return information about state of tcp endpoint in API format. */
 void tcp_get_info(struct sock *sk, struct tcp_info *info)
 {
@@ -2800,6 +2819,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
info->tcpi_bytes_acked = tp->bytes_acked;
info->tcpi_bytes_received = tp->bytes_received;
info->tcpi_notsent_bytes = max_t(int, 0, tp->write_seq - tp->snd_nxt);
+   tcp_get_info_chrono_stats(tp, info);
 
unlock_sock_fast(sk, slow);
 
-- 
2.8.0.rc3.226.g39d4020
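
The jiffies-to-microseconds conversion done in
tcp_get_info_chrono_stats() above can be tried out in user space. The
sketch below is only an illustration: the function names are invented
here, and HZ is passed as a parameter (hz) so that several tick rates
can be compared, whereas the kernel uses the compile-time HZ.

```c
#include <stdint.h>

#define USEC_PER_SEC 1000000ULL

/* Convert a jiffies count to microseconds for a given tick rate.
 * Mirrors "stats[i] *= USEC_PER_SEC / HZ" from the patch.
 */
uint64_t chrono_jiffies_to_usec(uint32_t jiffies, unsigned int hz)
{
	return (uint64_t)jiffies * (USEC_PER_SEC / hz);
}

/* Time accumulated in one state, in microseconds. If that state is
 * the one currently running, the open interval (now - start) is added
 * first; the u32 subtraction stays correct across a counter wrap.
 */
uint64_t chrono_total_usec(uint32_t stat, int is_current,
			   uint32_t now, uint32_t start, unsigned int hz)
{
	uint64_t j = stat;

	if (is_current)
		j += (uint32_t)(now - start);
	return j * (USEC_PER_SEC / hz);
}
```

Note that USEC_PER_SEC / hz is an integer division, so in this sketch
a tick rate that does not divide 1,000,000 evenly (for example 300)
yields a slightly truncated result.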

[PATCH net-next 6/6] tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING

2016-11-26 Thread Yuchung Cheng

From: Francis Yan 

This patch exports the sender chronograph stats via the socket
SO_TIMESTAMPING channel. Currently we can instrument how long a
particular application unit of data was queued in TCP by tracking
SOF_TIMESTAMPING_TX_SOFTWARE and SOF_TIMESTAMPING_TX_SCHED. Having
these sender chronograph stats exported simultaneously along with
these timestamps allows further breaking down the various sender
limitations.  For example, a video server can tell if a particular
chunk of video on a connection takes a long time to deliver because
TCP was experiencing a small receive window. It is not possible to
tell before this patch without packet traces.

To prepare these stats, the user needs to set
SOF_TIMESTAMPING_OPT_STATS and SOF_TIMESTAMPING_OPT_TSONLY flags
while requesting other SOF_TIMESTAMPING TX timestamps. When the
timestamps are available in the error queue, the stats are returned
in a separate control message of type SCM_TIMESTAMPING_OPT_STATS,
in a list of TLVs (struct nlattr) of types: TCP_NLA_BUSY_TIME,
TCP_NLA_RWND_LIMITED, TCP_NLA_SNDBUF_LIMITED. Unit is microsecond.
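
As a rough user-space sketch of the request described above (assuming
kernel headers that already carry this patch, so that
SOF_TIMESTAMPING_OPT_STATS is defined in linux/net_tstamp.h), the flags
could be set as follows. The helper names tx_stats_tstamp_flags() and
enable_tx_stats() are made up for this example, and reading the
resulting control messages from the error queue is not shown.

```c
#include <linux/net_tstamp.h>
#include <sys/socket.h>

/* Flag mask described above: TX timestamps plus OPT_STATS, which has
 * to be combined with OPT_TSONLY.
 */
unsigned int tx_stats_tstamp_flags(void)
{
	return SOF_TIMESTAMPING_TX_SOFTWARE |
	       SOF_TIMESTAMPING_TX_SCHED |
	       SOF_TIMESTAMPING_SOFTWARE |
	       SOF_TIMESTAMPING_OPT_TSONLY |
	       SOF_TIMESTAMPING_OPT_STATS;
}

/* Request the timestamps and stats on socket fd.
 * Returns 0 on success, -1 on error (errno set by setsockopt).
 */
int enable_tx_stats(int fd)
{
	unsigned int flags = tx_stats_tstamp_flags();

	return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
			  &flags, sizeof(flags));
}
```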

Signed-off-by: Francis Yan 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Soheil Hassas Yeganeh 
---
 Documentation/networking/timestamping.txt | 10 ++
 arch/alpha/include/uapi/asm/socket.h  |  2 ++
 arch/frv/include/uapi/asm/socket.h|  2 ++
 arch/ia64/include/uapi/asm/socket.h   |  2 ++
 arch/m32r/include/uapi/asm/socket.h   |  2 ++
 arch/mips/include/uapi/asm/socket.h   |  2 ++
 arch/mn10300/include/uapi/asm/socket.h|  2 ++
 arch/parisc/include/uapi/asm/socket.h |  2 ++
 arch/powerpc/include/uapi/asm/socket.h|  2 ++
 arch/s390/include/uapi/asm/socket.h   |  2 ++
 arch/sparc/include/uapi/asm/socket.h  |  2 ++
 arch/xtensa/include/uapi/asm/socket.h |  2 ++
 include/linux/tcp.h   |  2 ++
 include/uapi/asm-generic/socket.h |  2 ++
 include/uapi/linux/net_tstamp.h   |  3 ++-
 include/uapi/linux/tcp.h  |  8 
 net/core/skbuff.c | 12 +---
 net/core/sock.c   |  7 +++
 net/ipv4/tcp.c| 20 
 net/socket.c  |  7 ++-
 20 files changed, 88 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/timestamping.txt 
b/Documentation/networking/timestamping.txt
index 671cccf..96f5069 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -182,6 +182,16 @@ SOF_TIMESTAMPING_OPT_TSONLY:
   the timestamp even if sysctl net.core.tstamp_allow_data is 0.
   This option disables SOF_TIMESTAMPING_OPT_CMSG.
 
+SOF_TIMESTAMPING_OPT_STATS:
+
+  Optional stats that are obtained along with the transmit timestamps.
+  It must be used together with SOF_TIMESTAMPING_OPT_TSONLY. When the
+  transmit timestamp is available, the stats are available in a
+  separate control message of type SCM_TIMESTAMPING_OPT_STATS, as a
+  list of TLVs (struct nlattr) of types. These stats allow the
+  application to associate various transport layer stats with
+  the transmit timestamps, such as how long a certain block of
+  data was limited by peer's receiver window.
 
 New applications are encouraged to pass SOF_TIMESTAMPING_OPT_ID to
 disambiguate timestamps and SOF_TIMESTAMPING_OPT_TSONLY to operate
diff --git a/arch/alpha/include/uapi/asm/socket.h 
b/arch/alpha/include/uapi/asm/socket.h
index 9e46d6e..afc901b 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -97,4 +97,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SCM_TIMESTAMPING_OPT_STATS 54
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/frv/include/uapi/asm/socket.h 
b/arch/frv/include/uapi/asm/socket.h
index afbc98f0..81e0353 100644
--- a/arch/frv/include/uapi/asm/socket.h
+++ b/arch/frv/include/uapi/asm/socket.h
@@ -90,5 +90,7 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SCM_TIMESTAMPING_OPT_STATS 54
+
 #endif /* _ASM_SOCKET_H */
 
diff --git a/arch/ia64/include/uapi/asm/socket.h 
b/arch/ia64/include/uapi/asm/socket.h
index 0018fad..57feb0c 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -99,4 +99,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SCM_TIMESTAMPING_OPT_STATS 54
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/uapi/asm/socket.h 
b/arch/m32r/include/uapi/asm/socket.h
index 5fe42fc..5853f8e9 100644
--- a/arch/m32r/include/uapi/asm/socket.h
+++ b/arch/m32r/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SCM_TIMESTAMPING_OPT_STATS 54
+
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h 
b/arch/mips/include/uapi/asm/socket.h
index 2027240a..566ecdc 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++

[PATCH net-next 0/6] tcp: sender chronographs instrumentation

2016-11-26 Thread Yuchung Cheng

This patch set provides instrumentation on TCP sender limitations.
While developing the BBR congestion control, we noticed that the TCP
sending process is often limited by factors unrelated to congestion
control: insufficient sender buffer and/or insufficient receive
window/buffer to saturate the network bandwidth. Unfortunately these
limits are not visible to the users and often the poor performance
is attributed to the congestion control of choice.

This patch set aims to help users get a high-level understanding of
what the sending process is limited by, similar to the TCP_INFO design.
It is not meant to replace detailed kernel tracing and instrumentation
facilities.

In addition, this patch set provides a new option to the timestamping
work to instrument these limits per application data unit. For example,
one can use SO_TIMESTAMPING and this patch set to measure how long a
particular HTTP response is limited by a small receive window.

Patch set was initially written by Francis Yan then polished
by Yuchung Cheng, with lots of help from Eric Dumazet and Soheil
Hassas Yeganeh.

Francis Yan (6):
  tcp: instrument tcp sender limits chronographs
  tcp: instrument how long TCP is busy sending
  tcp: instrument how long TCP is limited by receive window
  tcp: instrument how long TCP is limited by insufficient send buffer
  tcp: export sender limits chronographs to TCP_INFO
  tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING

 Documentation/networking/timestamping.txt | 10 +
 arch/alpha/include/uapi/asm/socket.h  |  2 +
 arch/frv/include/uapi/asm/socket.h|  2 +
 arch/ia64/include/uapi/asm/socket.h   |  2 +
 arch/m32r/include/uapi/asm/socket.h   |  2 +
 arch/mips/include/uapi/asm/socket.h   |  2 +
 arch/mn10300/include/uapi/asm/socket.h|  2 +
 arch/parisc/include/uapi/asm/socket.h |  2 +
 arch/powerpc/include/uapi/asm/socket.h|  2 +
 arch/s390/include/uapi/asm/socket.h   |  2 +
 arch/sparc/include/uapi/asm/socket.h  |  2 +
 arch/xtensa/include/uapi/asm/socket.h |  2 +
 include/linux/tcp.h   |  9 -
 include/net/tcp.h | 20 +-
 include/uapi/asm-generic/socket.h |  2 +
 include/uapi/linux/net_tstamp.h   |  3 +-
 include/uapi/linux/tcp.h  | 12 ++
 net/core/skbuff.c | 12 --
 net/core/sock.c   |  7 
 net/ipv4/tcp.c| 50 ++-
 net/ipv4/tcp_input.c  |  8 +++-
 net/ipv4/tcp_output.c | 66 ++-
 net/socket.c  |  7 +++-
 23 files changed, 215 insertions(+), 13 deletions(-)

-- 
2.8.0.rc3.226.g39d4020

[PATCH net-next 1/6] tcp: instrument tcp sender limits chronographs

2016-11-26 Thread Yuchung Cheng

From: Francis Yan 

This patch implements the skeleton of the TCP chronograph
instrumentation on sender side limits:

1) idle (unspec)
2) busy sending data other than 3-4 below
3) rwnd-limited
4) sndbuf-limited

The limits are enumerated in 'enum tcp_chrono'. Since a connection in
theory can idle forever, we do not track the actual length of this
uninteresting idle period. For the rest we track how long the sender
spends in each limit. At any point during the lifetime of a
connection, the sender must be in one of the four states.

If there are multiple conditions worthy of tracking in a chronograph
then the highest priority enum takes precedence over the other
conditions, so that if something "more interesting" starts happening,
we stop the previous chrono and start a new one.

The time unit is jiffy (u32) in order to save space in tcp_sock.
This implies that an application must sample the stats at least once
every 49 days with a 1 ms jiffy.
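
The 49-day figure follows from the width of the counter: a u32 counting
1 ms ticks wraps after 2^32 ms. A small sketch of that arithmetic, with
function names invented for this example and the tick rate passed as a
parameter:

```c
#include <stdint.h>

/* Whole seconds until a 32-bit jiffies counter wraps at tick rate hz. */
uint64_t u32_jiffies_wrap_seconds(unsigned int hz)
{
	return (1ULL << 32) / hz;
}

/* The same interval expressed in whole days. */
uint64_t u32_jiffies_wrap_days(unsigned int hz)
{
	return u32_jiffies_wrap_seconds(hz) / 86400;
}
```

At HZ=1000 this gives 49 whole days; slower tick rates give
proportionally longer intervals.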

Signed-off-by: Francis Yan 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Soheil Hassas Yeganeh 
---
 include/linux/tcp.h   |  7 +--
 include/net/tcp.h | 14 ++
 net/ipv4/tcp_output.c | 30 ++
 3 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 32a7c7e..d5d3bd8 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -211,8 +211,11 @@ struct tcp_sock {
u8 reord;/* reordering detected */
} rack;
u16 advmss; /* Advertised MSS   */
-   u8  rate_app_limited:1,  /* rate_{delivered,interval_us} limited? */
-   unused:7;
+   u32 chrono_start;   /* Start time in jiffies of a TCP chrono */
+   u32 chrono_stat[3]; /* Time in jiffies for chrono_stat stats */
+   u8  chrono_type:2,  /* current chronograph type */
+   rate_app_limited:1,  /* rate_{delivered,interval_us} limited? */
+   unused:5;
u8  nonagle : 4,/* Disable Nagle algorithm? */
thin_lto: 1,/* Use linear timeouts for thin streams */
thin_dupack : 1,/* Fast retransmit on first dupack  */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7de8073..e5ff408 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1516,6 +1516,20 @@ struct tcp_fastopen_context {
struct rcu_head rcu;
 };
 
+/* Latencies incurred by various limits for a sender. They are
+ * chronograph-like stats that are mutually exclusive.
+ */
+enum tcp_chrono {
+   TCP_CHRONO_UNSPEC,
+   TCP_CHRONO_BUSY, /* Actively sending data (non-empty write queue) */
+   TCP_CHRONO_RWND_LIMITED, /* Stalled by insufficient receive window */
+   TCP_CHRONO_SNDBUF_LIMITED, /* Stalled by insufficient send buffer */
+   __TCP_CHRONO_MAX,
+};
+
+void tcp_chrono_start(struct sock *sk, const enum tcp_chrono type);
+void tcp_chrono_stop(struct sock *sk, const enum tcp_chrono type);
+
 /* write queue abstraction */
 static inline void tcp_write_queue_purge(struct sock *sk)
 {
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 19105b4..34f7517 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2081,6 +2081,36 @@ static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb,
return false;
 }
 
+static void tcp_chrono_set(struct tcp_sock *tp, const enum tcp_chrono new)
+{
+   const u32 now = tcp_time_stamp;
+
+   if (tp->chrono_type > TCP_CHRONO_UNSPEC)
+   tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;
+   tp->chrono_start = now;
+   tp->chrono_type = new;
+}
+
+void tcp_chrono_start(struct sock *sk, const enum tcp_chrono type)
+{
+   struct tcp_sock *tp = tcp_sk(sk);
+
+   /* If there are multiple conditions worthy of tracking in a
+* chronograph then the highest priority enum takes precedence over
+* the other conditions. So that if something "more interesting"
+* starts happening, stop the previous chrono and start a new one.
+*/
+   if (type > tp->chrono_type)
+   tcp_chrono_set(tp, type);
+}
+
+void tcp_chrono_stop(struct sock *sk, const enum tcp_chrono type)
+{
+   struct tcp_sock *tp = tcp_sk(sk);
+
+   tcp_chrono_set(tp, TCP_CHRONO_UNSPEC);
+}
+
 /* This routine writes packets to the network.  It advances the
  * send_head.  This happens as incoming acks open up the remote
  * window for us.
-- 
2.8.0.rc3.226.g39d4020
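
For readers following the chronograph logic in patch 1/6, here is a minimal
user-space sketch of tcp_chrono_set() and tcp_chrono_start(). It is an
illustration, not the kernel code: struct chrono, the chrono_* names and the
explicit `now` argument are assumptions that stand in for tcp_sock and
tcp_time_stamp. It shows the two properties the commit message relies on: a
higher enum value preempts a lower one, and the u32 subtraction stays correct
across a single counter wrap (2^32 ticks of 1 ms is roughly 49.7 days).

```c
#include <stdint.h>

/* Mirrors enum tcp_chrono from the patch; shortened names are assumptions. */
enum chrono_type {
	CHRONO_UNSPEC,
	CHRONO_BUSY,
	CHRONO_RWND_LIMITED,
	CHRONO_SNDBUF_LIMITED,
	CHRONO_MAX,
};

/* Hypothetical stand-in for the three fields the patch adds to tcp_sock. */
struct chrono {
	uint32_t start;                /* tick at which the current chrono began */
	uint32_t stat[CHRONO_MAX - 1]; /* accumulated ticks per chrono type */
	enum chrono_type type;         /* chrono currently running */
};

/* Close the running chrono (if any) and switch to new_type. */
static void chrono_set(struct chrono *c, enum chrono_type new_type, uint32_t now)
{
	if (c->type > CHRONO_UNSPEC)
		c->stat[c->type - 1] += now - c->start; /* unsigned: wraps mod 2^32 */
	c->start = now;
	c->type = new_type;
}

/* Only a strictly higher-priority type replaces the running chrono. */
static void chrono_start(struct chrono *c, enum chrono_type type, uint32_t now)
{
	if (type > c->type)
		chrono_set(c, type, now);
}
```

Starting BUSY at tick 0xfffffff0 and then RWND_LIMITED at tick 0x10 credits
0x20 ticks to BUSY, because the unsigned subtraction wraps modulo 2^32. A
second wrap inside one interval would be lost, which is why the stats have to
be sampled within that window.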

[PATCH net-next 2/6] tcp: instrument how long TCP is busy sending

2016-11-26 Thread Yuchung Cheng

From: Francis Yan 

This patch measures TCP busy time, which is defined as the period
of time when sender has data (or FIN) to send. The time starts when
data is buffered and stops when the write queue is flushed by ACKs
or error events.

Note the busy time does not include SYN time, unless data is
included in SYN (i.e. Fast Open). It does include FIN time even
if the FIN carries no payload. Excluding pure FIN is possible but
would incur one additional test in the fast path, which may not
be worth it.

Signed-off-by: Francis Yan 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Soheil Hassas Yeganeh 
---
 include/net/tcp.h |  6 +-
 net/ipv4/tcp_input.c  |  3 +++
 net/ipv4/tcp_output.c | 19 ---
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index e5ff408..3e097e3 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1535,6 +1535,7 @@ static inline void tcp_write_queue_purge(struct sock *sk)
 {
struct sk_buff *skb;
 
+   tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
sk_wmem_free_skb(sk, skb);
sk_mem_reclaim(sk);
@@ -1593,8 +1594,10 @@ static inline void tcp_advance_send_head(struct sock *sk, const struct sk_buff *
 
 static inline void tcp_check_send_head(struct sock *sk, struct sk_buff *skb_unlinked)
 {
-   if (sk->sk_send_head == skb_unlinked)
+   if (sk->sk_send_head == skb_unlinked) {
sk->sk_send_head = NULL;
+   tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
+   }
if (tcp_sk(sk)->highest_sack == skb_unlinked)
tcp_sk(sk)->highest_sack = NULL;
 }
@@ -1616,6 +1619,7 @@ static inline void tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb
/* Queue it, remembering where we must start sending. */
if (sk->sk_send_head == NULL) {
sk->sk_send_head = skb;
+   tcp_chrono_start(sk, TCP_CHRONO_BUSY);
 
if (tcp_sk(sk)->highest_sack == NULL)
tcp_sk(sk)->highest_sack = skb;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 22e6a20..a5d1727 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3178,6 +3178,9 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
tp->lost_skb_hint = NULL;
}
 
+   if (!skb)
+   tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
+
if (likely(between(tp->snd_up, prior_snd_una, tp->snd_una)))
tp->snd_up = tp->snd_una;
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 34f7517..e8ea584 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2096,8 +2096,8 @@ void tcp_chrono_start(struct sock *sk, const enum tcp_chrono type)
struct tcp_sock *tp = tcp_sk(sk);
 
/* If there are multiple conditions worthy of tracking in a
-* chronograph then the highest priority enum takes precedence over
-* the other conditions. So that if something "more interesting"
+* chronograph then the highest priority enum takes precedence
+* over the other conditions. So that if something "more interesting"
 * starts happening, stop the previous chrono and start a new one.
 */
if (type > tp->chrono_type)
@@ -2108,7 +2108,18 @@ void tcp_chrono_stop(struct sock *sk, const enum tcp_chrono type)
 {
struct tcp_sock *tp = tcp_sk(sk);
 
-   tcp_chrono_set(tp, TCP_CHRONO_UNSPEC);
+
+   /* There are multiple conditions worthy of tracking in a
+* chronograph, so that the highest priority enum takes
+* precedence over the other conditions (see tcp_chrono_start).
+* If a condition stops, we only stop chrono tracking if
+* it's the "most interesting" or current chrono we are
+* tracking and starts busy chrono if we have pending data.
+*/
+   if (tcp_write_queue_empty(sk))
+   tcp_chrono_set(tp, TCP_CHRONO_UNSPEC);
+   else if (type == tp->chrono_type)
+   tcp_chrono_set(tp, TCP_CHRONO_BUSY);
 }
 
 /* This routine writes packets to the network.  It advances the
@@ -3328,6 +3339,8 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
fo->copied = space;
 
tcp_connect_queue_skb(sk, syn_data);
+   if (syn_data->len)
+   tcp_chrono_start(sk, TCP_CHRONO_BUSY);
 
err = tcp_transmit_skb(sk, syn_data, 1, sk->sk_allocation);
 
-- 
2.8.0.rc3.226.g39d4020
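
The revised tcp_chrono_stop() in patch 2/6 can be modelled the same way. The
sketch below is a user-space approximation under stated assumptions: struct
chrono and the chrono_* names are hypothetical, `queue_empty` stands in for
tcp_write_queue_empty(sk), and `now` replaces tcp_time_stamp. It shows that
stopping a condition falls back to BUSY while data is still queued, and only
goes idle once the write queue is empty.

```c
#include <stdbool.h>
#include <stdint.h>

enum chrono_type {
	CHRONO_UNSPEC,
	CHRONO_BUSY,
	CHRONO_RWND_LIMITED,
	CHRONO_SNDBUF_LIMITED,
	CHRONO_MAX,
};

/* Hypothetical stand-in for the chrono fields in tcp_sock. */
struct chrono {
	uint32_t start;
	uint32_t stat[CHRONO_MAX - 1];
	enum chrono_type type;
};

/* Close the running chrono (if any) and switch to new_type. */
static void chrono_set(struct chrono *c, enum chrono_type new_type, uint32_t now)
{
	if (c->type > CHRONO_UNSPEC)
		c->stat[c->type - 1] += now - c->start;
	c->start = now;
	c->type = new_type;
}

/* Same branch structure as the patched tcp_chrono_stop(). */
static void chrono_stop(struct chrono *c, enum chrono_type type,
			bool queue_empty, uint32_t now)
{
	if (queue_empty)
		chrono_set(c, CHRONO_UNSPEC, now); /* nothing left to send */
	else if (type == c->type)
		chrono_set(c, CHRONO_BUSY, now);   /* limit lifted, data pending */
	/* otherwise a different chrono is running: leave it alone */
}
```

With RWND_LIMITED running since tick 100, stopping BUSY at tick 150 with data
pending changes nothing; stopping RWND_LIMITED at tick 200 credits 100 ticks
to it and switches to BUSY; stopping with an empty queue at tick 260 credits
60 ticks to BUSY and returns to UNSPEC.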

[PATCH net-next 3/6] tcp: instrument how long TCP is limited by receive window

2016-11-26 Thread Yuchung Cheng

From: Francis Yan 

This patch measures the total time when the TCP stops sending because
the receiver's advertised window is not large enough. Note that
once the limit is lifted we are likely in the busy status if we
have data pending.

Signed-off-by: Francis Yan 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Soheil Hassas Yeganeh 
---
 net/ipv4/tcp_output.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index e8ea584..b7c 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2144,7 +2144,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
unsigned int tso_segs, sent_pkts;
int cwnd_quota;
int result;
-   bool is_cwnd_limited = false;
+   bool is_cwnd_limited = false, is_rwnd_limited = false;
u32 max_segs;
 
sent_pkts = 0;
@@ -2181,8 +2181,10 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
break;
}
 
-   if (unlikely(!tcp_snd_wnd_test(tp, skb, mss_now)))
+   if (unlikely(!tcp_snd_wnd_test(tp, skb, mss_now))) {
+   is_rwnd_limited = true;
break;
+   }
 
if (tso_segs == 1) {
if (unlikely(!tcp_nagle_test(tp, skb, mss_now,
@@ -2227,6 +2229,11 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
break;
}
 
+   if (is_rwnd_limited)
+   tcp_chrono_start(sk, TCP_CHRONO_RWND_LIMITED);
+   else
+   tcp_chrono_stop(sk, TCP_CHRONO_RWND_LIMITED);
+
if (likely(sent_pkts)) {
if (tcp_in_cwnd_reduction(sk))
tp->prr_out += sent_pkts;
-- 
2.8.0.rc3.226.g39d4020

Re: net: GPF in eth_header

2016-11-26 Thread Eric Dumazet

> Hi Eric,
>
> The crash happens when the kernel tries to access shadow for nonmapped memory.
>
> The issue here is an integer overflow which happens in neigh_resolve_output().
> skb_network_offset(skb) can return negative number, but __skb_pull()
> accepts unsigned int as len.
> As a result, the least significant bit in higher 32 bits of skb->data
> gets set and we get an out-of-bounds with offset of 4 GB.
>
> I've attached a short reproducer, but you either need KASAN or to add
> a BUG_ON to see the crash.
> In this reproducer skb_network_offset() becomes negative after merging
> two ipv6 fragments.
>
> I actually see multiple places where skb_network_offset() is used as
> an argument to skb_pull().
> So I guess every place can potentially be buggy.

Well, I think the intent is to accept a negative number.

This definitely was assumed by commit e1f165032c8bade authors !

I guess they were using a 32bit kernel for their tests.

unsubscribe

2016-11-26 Thread seba

unsubscribe

Re: net: stmmac: Meson GXBB: attempting to execute userspace memory

2016-11-26 Thread Ben Dooks




On 2016-11-26 07:53, Heinrich Schuchardt wrote:

For Odroid C2 I have compiled kernel
4.9.0-rc6-next-20161124-1-gbf7e142
with one additional patch
https://github.com/xypron/kernel-odroid-c2/blob/master/patch/0001-stmmac-RTL8211F-Meson-GXBB-TX-throughput-problems.patch

I repeatedly see faults like the one below:

[ 2557.400796] Unhandled fault: synchronous external abort (0x9210)
at 0x40001e8ee4b0
[ 2557.952413] CPU: 0 PID: 22837 Comm: cc1 Tainted: G  D
4.9.0-rc6-next-20161124-1-gbf7e142 #1
[ 2557.962062] Hardware name: Hardkernel ODROID-C2 (DT)
[ 2557.966980] task: 80006ddb7080 task.stack: 80006dd9c000
[ 2557.972846] PC is at 0x6a0d98
[ 2557.975776] LR is at 0x6a0e54
[ 2557.978709] pc : [<006a0d98>] lr : [<006a0e54>]
pstate: 8000
[ 2557.986040] sp : f3ee5f80
[ 2557.989318] x29: f3ee5f80 x28: 4b3f1240
[ 2557.994578] x27: 012a7000 x26: 4b3f1288
[ 2557.999840] x25: 00f58f88 x24: 4b3f1240
[ 2558.005101] x23:  x22: 0001
[ 2558.010362] x21: 0001 x20: 4b3f1250
[ 2558.015623] x19: 0054 x18: 0001
[ 2558.020885] x17: 48acaa10 x16: 01285050
[ 2558.026146] x15: 4ad96dc8 x14: 001f
[ 2558.031407] x13: 4b3f1270 x12: 4b3f1258
[ 2558.036668] x11: 01347000 x10: 0661
[ 2558.041930] x9 : 0005 x8 : 0003
[ 2558.047191] x7 : 4b3f1240 x6 : 20020033
[ 2558.052452] x5 : 4b402020 x4 : 4b3e1aa0
[ 2558.057713] x3 : 000c x2 : 0020
[ 2558.062974] x1 : 00f45000 x0 : 0065
[ 2558.068235]
[ 2558.069712] Internal error: Attempting to execute userspace memory:
860f [#7] PREEMPT SMP
[ 2558.078155] Modules linked in: meson_rng rng_core meson_gxbb_wdt
ip_tables x_tables ipv6 dwmac_generic realtek dwmac_meson8b
stmmac_platform stmmac
[ 2558.091267] CPU: 0 PID: 22837 Comm: cc1 Tainted: G  D
4.9.0-rc6-next-20161124-1-gbf7e142 #1
[ 2558.100925] Hardware name: Hardkernel ODROID-C2 (DT)
[ 2558.105841] task: 80006ddb7080 task.stack: 80006dd9c000
[ 2558.111706] PC is at 0x6a0e54
[ 2558.114638] LR is at 0x6a0e54
[ 2558.117571] pc : [<006a0e54>] lr : [<006a0e54>]
pstate: 63c5
[ 2558.124902] sp : 80006dd9fec0
[ 2558.128179] x29:  x28: 80006ddb7080
[ 2558.133441] x27: 012a7000 x26: 4b3f1288
[ 2558.138702] x25: 00f58f88 x24: 4b3f1240
[ 2558.143963] x23: 8000 x22: 006a0d98
[ 2558.149225] x21:  x20: 80006e223000
[ 2558.154486] x19:  x18: 0010
[ 2558.159747] x17: 48acaa10 x16: 01285050
[ 2558.165008] x15: 88e91f07 x14: 0006
[ 2558.170270] x13: 08e91f15 x12: 000f
[ 2558.175531] x11: 0002 x10: 02ea
[ 2558.180792] x9 : 80006dd9fb40 x8 : 00010a8b
[ 2558.186053] x7 :  x6 : 020e
[ 2558.191315] x5 : 020f020e x4 : 
[ 2558.196576] x3 :  x2 : 020f
[ 2558.201837] x1 : 80006ddb7080 x0 : 
[ 2558.207098]
[ 2558.208565] Process cc1 (pid: 22837, stack limit = 
0x80006dd9c000)

[ 2558.215035] Stack: (0x80006dd9fec0 to 0x80006dda)
[ 2558.220728] fec0: 0065 00f45000 0020
000c
[ 2558.228490] fee0: 4b3e1aa0 4b402020 20020033
4b3f1240
[ 2558.236253] ff00: 0003 0005 0661
01347000
[ 2558.244015] ff20: 4b3f1258 4b3f1270 001f
4ad96dc8
[ 2558.251778] ff40: 01285050 48acaa10 0001
0054
[ 2558.259540] ff60: 4b3f1250 0001 0001

[ 2558.267303] ff80: 4b3f1240 00f58f88 4b3f1288
012a7000
[ 2558.275065] ffa0: 4b3f1240 f3ee5f80 006a0e54
f3ee5f80
[ 2558.282828] ffc0: 006a0d98 8000 0003

[ 2558.290590] ffe0:   

[ 2558.298351] Call trace:
[ 2558.300769] Exception stack(0x80006dd9fcf0 to 
0x80006dd9fe20)

[ 2558.307149] fce0:   
0001
[ 2558.314913] fd00: 80006dd9fec0 006a0e54 800073acf500
0004
[ 2558.322675] fd20:  08dbbc18 80006ddb7080
6dd9fdd0
[ 2558.330438] fd40: 80006dd9fd90 080ca878 80006dd9fe40
80006ddb7080
[ 2558.338200] fd60: 0004 03c0 80006dd9fe40
4b3f1240
[ 2558.345963] fd80: 00f58f88 4b3f1288 
80006ddb7080
[ 2558.353725] fda0: 020f

Re: net: GPF in eth_header

2016-11-26 Thread Andrey Konovalov

On Sat, Nov 26, 2016 at 7:28 PM, 'Eric Dumazet' via syzkaller
 wrote:
> On Sat, Nov 26, 2016 at 9:30 AM, Dmitry Vyukov  wrote:
>> Hello,
>>
>> The following program triggers GPF in eth_header:
>>
>> https://gist.githubusercontent.com/dvyukov/613cadf05543b55a419f237e419cd495/raw/5471231523d1a07c3de55f11f87472c2816ee06c/gistfile1.txt
>>
>> On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24).
>>
>> BUG: unable to handle kernel paging request at ed002d14d74a
>> IP: [] eth_header+0x75/0x260 net/ethernet/eth.c:88
>> PGD 7fff6067 [   50.787819] PUD 7fff5067
>> PMD 0 [   50.787819]
>> Oops:  [#1] SMP DEBUG_PAGEALLOC KASAN
>> Modules linked in:
>> CPU: 2 PID: 6712 Comm: a.out Not tainted 4.9.0-rc6+ #55
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> task: 88003a1841c0 task.stack: 880034d08000
>> RIP: 0010:[]  []
>> eth_header+0x75/0x260 net/ethernet/eth.c:88
>> RSP: 0018:880034d0eb68  EFLAGS: 00010a03
>> RAX: 11002d14d74a RBX: 880168a6ba4a RCX: 88006a9c7858
>> RDX: dd86 RSI: dc00 RDI: 880168a6ba56
>> RBP: 880034d0eb98 R08:  R09: 0031
>> R10: 0002 R11:  R12: 
>> R13: 88006c208d80 R14: 86dd R15: 88006a9c7858
>> FS:  01a02940() GS:88006d00() knlGS:
>> CS:  0010 DS:  ES:  CR0: 80050033
>> CR2: ed002d14d74a CR3: 37373000 CR4: 06e0
>> Stack:
>>  00316881ab40 88006a9c76c0 88006881ab40 88006a9c77f8
>>   dc00 880034d0ee98 86b31af9
>>  8719605c 880034d0f0f8 86dd 86be3220
>> Call Trace:
>>  [< inline >] dev_hard_header ./include/linux/netdevice.h:2762
>>  [] neigh_resolve_output+0x659/0xb20 
>> net/core/neighbour.c:1302
>>  [< inline >] dst_neigh_output ./include/net/dst.h:464
>>  [] ip6_finish_output2+0xb3c/0x2500 
>> net/ipv6/ip6_output.c:121
>>  [] ip6_finish_output+0x2eb/0x760 net/ipv6/ip6_output.c:139
>>  [< inline >] NF_HOOK_COND ./include/linux/netfilter.h:246
>>  [] ip6_output+0x1d7/0x9a0 net/ipv6/ip6_output.c:153
>>  [< inline >] dst_output ./include/net/dst.h:501
>>  [] ip6_local_out+0x9a/0x180 net/ipv6/output_core.c:170
>>  [] ip6_send_skb+0xa6/0x340 net/ipv6/ip6_output.c:1712
>>  [] ip6_push_pending_frames+0xb8/0xe0
>> net/ipv6/ip6_output.c:1732
>>  [< inline >] rawv6_push_pending_frames net/ipv6/raw.c:607
>>  [] rawv6_sendmsg+0x250b/0x2c20 net/ipv6/raw.c:920
>>  [] inet_sendmsg+0x385/0x590 net/ipv4/af_inet.c:734
>>  [< inline >] sock_sendmsg_nosec net/socket.c:621
>>  [] sock_sendmsg+0xcf/0x110 net/socket.c:631
>>  [] sock_write_iter+0x32b/0x620 net/socket.c:829
>>  [] do_iter_readv_writev+0x363/0x670 fs/read_write.c:695
>>  [] do_readv_writev+0x431/0x9b0 fs/read_write.c:872
>>  [] vfs_writev+0x8c/0xc0 fs/read_write.c:911
>>  [] do_writev+0x115/0x2d0 fs/read_write.c:944
>>  [< inline >] SYSC_writev fs/read_write.c:1017
>>  [] SyS_writev+0x2c/0x40 fs/read_write.c:1014
>>  [] entry_SYSCALL_64_fastpath+0x23/0xc6
>> arch/x86/entry/entry_64.S:209
>> Code: 41 83 fe 04 0f 84 aa 00 00 00 e8 17 4e b0 fa 48 8d 7b 0c 48 be
>> 00 00 00 00 00 fc ff df 44 89 f2 66 c1 c2 08 48 89 f8 48 c1 e8 03 <0f>
>> b6 0c 30 48 8d 43 0d 49 89 c0 49 c1 e8 03 41 0f b6 34 30 49
>> RIP  [] eth_header+0x75/0x260 net/ethernet/eth.c:88
>>  RSP 
>> CR2: ed002d14d74a
>> ---[ end trace a73fedfdc11bd60c ]---
>
>
> Hi Dmitry
>
> I could not reproduce the issue. Might need some specific configuration...
>
> loopback device has proper ethernet header (all 0)
>
> Fault happens in :
>
> 0f b6 0c 30 movzbl (%rax,%rsi,1),%ecx
>
> RAX=11002d14d74a  which is RDI>>3, and RSI=dc00
>
> Could this be a KASAN problem ?

Hi Eric,

The crash happens when the kernel tries to access shadow for nonmapped memory.

The issue here is an integer overflow which happens in neigh_resolve_output().
skb_network_offset(skb) can return negative number, but __skb_pull()
accepts unsigned int as len.
As a result, the least significant bit in higher 32 bits of skb->data
gets set and we get an out-of-bounds with offset of 4 GB.

I've attached a short reproducer, but you either need KASAN or to add
a BUG_ON to see the crash.
In this reproducer skb_network_offset() becomes negative after merging
two ipv6 fragments.

I actually see multiple places where skb_network_offset() is used as
an argument to skb_pull().
So I guess every place can potentially be buggy.
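
The conversion described above can be reproduced in plain C. This is a
minimal sketch, not kernel code: pull64() and pull32() are hypothetical
stand-ins for __skb_pull() on 64-bit and 32-bit kernels, and pointers are
modelled as integers so the arithmetic is well defined in user space. It
assumes a 32-bit int, as on Linux targets.

```c
#include <stdint.h>

/* 64-bit pointer: the converted length is zero-extended before the add. */
static uint64_t pull64(uint64_t data, int offset)
{
	unsigned int len = offset; /* implicit int -> unsigned int conversion */
	return data + len;         /* -8 becomes +0xfffffff8, no sign extension */
}

/* 32-bit pointer: the same add wraps and behaves like a subtraction. */
static uint32_t pull32(uint32_t data, int offset)
{
	unsigned int len = offset;
	return data + len;         /* wraps modulo 2^32 */
}
```

With data at 0x1000 and an offset of -8, pull64() yields 0x100000ff8, which
is 4 GB minus 8 bytes past the start, while pull32() yields 0xff8, the
address a caller passing a negative offset presumably intended.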

Thanks!




ipv6-poc1.c
Description: Binary

Re: [PATCH net-next] cgroup, bpf: remove unnecessary #include

2016-11-26 Thread Rami Rosen

Acked-by: Rami Rosen 

On 26 November 2016 at 09:23, Alexei Starovoitov  wrote:
> this #include is unnecessary and brings whole set of
> other headers into cgroup-defs.h. Remove it.
>
> Fixes: 3007098494be ("cgroup: add support for eBPF programs")
> Signed-off-by: Alexei Starovoitov 
> ---
>  include/linux/bpf-cgroup.h | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> index ec80d0c0953e..0cf1adfadd2d 100644
> --- a/include/linux/bpf-cgroup.h
> +++ b/include/linux/bpf-cgroup.h
> @@ -1,7 +1,6 @@
>  #ifndef _BPF_CGROUP_H
>  #define _BPF_CGROUP_H
>
> -#include 
>  #include 
>  #include 
>
> --
> 2.8.0
>

Re: net: GPF in eth_header

2016-11-26 Thread Eric Dumazet

On Sat, Nov 26, 2016 at 9:30 AM, Dmitry Vyukov  wrote:
> Hello,
>
> The following program triggers GPF in eth_header:
>
> https://gist.githubusercontent.com/dvyukov/613cadf05543b55a419f237e419cd495/raw/5471231523d1a07c3de55f11f87472c2816ee06c/gistfile1.txt
>
> On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24).
>
> BUG: unable to handle kernel paging request at ed002d14d74a
> IP: [] eth_header+0x75/0x260 net/ethernet/eth.c:88
> PGD 7fff6067 [   50.787819] PUD 7fff5067
> PMD 0 [   50.787819]
> Oops:  [#1] SMP DEBUG_PAGEALLOC KASAN
> Modules linked in:
> CPU: 2 PID: 6712 Comm: a.out Not tainted 4.9.0-rc6+ #55
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 88003a1841c0 task.stack: 880034d08000
> RIP: 0010:[]  []
> eth_header+0x75/0x260 net/ethernet/eth.c:88
> RSP: 0018:880034d0eb68  EFLAGS: 00010a03
> RAX: 11002d14d74a RBX: 880168a6ba4a RCX: 88006a9c7858
> RDX: dd86 RSI: dc00 RDI: 880168a6ba56
> RBP: 880034d0eb98 R08:  R09: 0031
> R10: 0002 R11:  R12: 
> R13: 88006c208d80 R14: 86dd R15: 88006a9c7858
> FS:  01a02940() GS:88006d00() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: ed002d14d74a CR3: 37373000 CR4: 06e0
> Stack:
>  00316881ab40 88006a9c76c0 88006881ab40 88006a9c77f8
>   dc00 880034d0ee98 86b31af9
>  8719605c 880034d0f0f8 86dd 86be3220
> Call Trace:
>  [< inline >] dev_hard_header ./include/linux/netdevice.h:2762
>  [] neigh_resolve_output+0x659/0xb20 
> net/core/neighbour.c:1302
>  [< inline >] dst_neigh_output ./include/net/dst.h:464
>  [] ip6_finish_output2+0xb3c/0x2500 
> net/ipv6/ip6_output.c:121
>  [] ip6_finish_output+0x2eb/0x760 net/ipv6/ip6_output.c:139
>  [< inline >] NF_HOOK_COND ./include/linux/netfilter.h:246
>  [] ip6_output+0x1d7/0x9a0 net/ipv6/ip6_output.c:153
>  [< inline >] dst_output ./include/net/dst.h:501
>  [] ip6_local_out+0x9a/0x180 net/ipv6/output_core.c:170
>  [] ip6_send_skb+0xa6/0x340 net/ipv6/ip6_output.c:1712
>  [] ip6_push_pending_frames+0xb8/0xe0
> net/ipv6/ip6_output.c:1732
>  [< inline >] rawv6_push_pending_frames net/ipv6/raw.c:607
>  [] rawv6_sendmsg+0x250b/0x2c20 net/ipv6/raw.c:920
>  [] inet_sendmsg+0x385/0x590 net/ipv4/af_inet.c:734
>  [< inline >] sock_sendmsg_nosec net/socket.c:621
>  [] sock_sendmsg+0xcf/0x110 net/socket.c:631
>  [] sock_write_iter+0x32b/0x620 net/socket.c:829
>  [] do_iter_readv_writev+0x363/0x670 fs/read_write.c:695
>  [] do_readv_writev+0x431/0x9b0 fs/read_write.c:872
>  [] vfs_writev+0x8c/0xc0 fs/read_write.c:911
>  [] do_writev+0x115/0x2d0 fs/read_write.c:944
>  [< inline >] SYSC_writev fs/read_write.c:1017
>  [] SyS_writev+0x2c/0x40 fs/read_write.c:1014
>  [] entry_SYSCALL_64_fastpath+0x23/0xc6
> arch/x86/entry/entry_64.S:209
> Code: 41 83 fe 04 0f 84 aa 00 00 00 e8 17 4e b0 fa 48 8d 7b 0c 48 be
> 00 00 00 00 00 fc ff df 44 89 f2 66 c1 c2 08 48 89 f8 48 c1 e8 03 <0f>
> b6 0c 30 48 8d 43 0d 49 89 c0 49 c1 e8 03 41 0f b6 34 30 49
> RIP  [] eth_header+0x75/0x260 net/ethernet/eth.c:88
>  RSP 
> CR2: ed002d14d74a
> ---[ end trace a73fedfdc11bd60c ]---


Hi Dmitry

I could not reproduce the issue. Might need some specific configuration...

loopback device has proper ethernet header (all 0)

Fault happens in :

0f b6 0c 30 movzbl (%rax,%rsi,1),%ecx

RAX=11002d14d74a  which is RDI>>3, and RSI=dc00

Could this be a KASAN problem ?
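
For context on the faulting instruction, the address computation it performs
can be sketched as follows. This is an illustration under assumptions, not
the kernel's implementation: the offset constant is read from the
`48 be 00 00 00 00 00 fc ff df` bytes in the Code: line (a movabs immediate,
little-endian), the shift comes from `48 c1 e8 03` (shr $3), and the address
used in the usage note is hypothetical.

```c
#include <stdint.h>

/* Immediate of the movabs in the Code: line, read little-endian. */
#define SHADOW_OFFSET      0xdffffc0000000000ULL
/* One shadow byte covers 8 bytes of memory, matching the shr $3. */
#define SHADOW_SCALE_SHIFT 3

/* Address of the shadow byte that the movzbl would load for addr. */
static uint64_t shadow_addr(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}
```

For a hypothetical address 0xffff880000001000 this gives 0xffffed0000000200.
If the checked pointer is itself out of bounds, its shadow byte may not be
mapped, and the fault is then reported at the shadow address rather than at
the original pointer.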

net: BUG in unix_notinflight

2016-11-26 Thread Dmitry Vyukov

Hello,

I am hitting the following BUG while running syzkaller fuzzer:

kernel BUG at net/unix/garbage.c:149!
invalid opcode:  [#1] SMP DEBUG_PAGEALLOC KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 23491 Comm: syz-executor Not tainted 4.9.0-rc5+ #41
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
task: 8801c16b06c0 task.stack: 8801c2928000
RIP: 0010:[]  []
unix_notinflight+0x3b4/0x490 net/unix/garbage.c:149
RSP: 0018:8801c292ea40  EFLAGS: 00010297
RAX: 8801c16b06c0 RBX: 110038525d4a RCX: dc00
RDX:  RSI: 110038525d4e RDI: 8a6e9d84
RBP: 8801c292eb18 R08:  R09: 
R10: cdca594876e035a1 R11: 0005 R12: 110038525d4e
R13: 899156e0 R14: 8801c292eaf0 R15: 88018b7cd780
FS:  7f10420fa700() GS:8801d980() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 2000a000 CR3: 0001c2ecc000 CR4: 001406f0
DR0:  DR1: 0400 DR2: 
DR3:  DR6: 0ff0 DR7: 0600
Stack:
 dc00 88019f036970 41b58ab3 894c5120
 8717e840 8801c16b06c0 88018b7cdcf0 894c51e2
 81576d50   1100
Call Trace:
 [] unix_detach_fds.isra.19+0xff/0x170 net/unix/af_unix.c:1487
 [] unix_destruct_scm+0xf9/0x210 net/unix/af_unix.c:1496
 [] skb_release_head_state+0x101/0x200 net/core/skbuff.c:655
 [] skb_release_all+0x1a/0x60 net/core/skbuff.c:668
 [] __kfree_skb+0x1a/0x30 net/core/skbuff.c:684
 [] kfree_skb+0x184/0x570 net/core/skbuff.c:705
 [] unix_release_sock+0x5b5/0xbd0 net/unix/af_unix.c:559
 [] unix_release+0x49/0x90 net/unix/af_unix.c:836
 [] sock_release+0x92/0x1f0 net/socket.c:570
 [] sock_close+0x1b/0x20 net/socket.c:1017
 [] __fput+0x34e/0x910 fs/file_table.c:208
 [] fput+0x1a/0x20 fs/file_table.c:244
 [] task_work_run+0x1a0/0x280 kernel/task_work.c:116
 [< inline >] exit_task_work include/linux/task_work.h:21
 [] do_exit+0x183a/0x2640 kernel/exit.c:828
 [] do_group_exit+0x14e/0x420 kernel/exit.c:931
 [] get_signal+0x663/0x1880 kernel/signal.c:2307
 [] do_signal+0xc5/0x2190 arch/x86/kernel/signal.c:807
 [] exit_to_usermode_loop+0x1ea/0x2d0
arch/x86/entry/common.c:156
 [< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190
 [] syscall_return_slowpath+0x4d3/0x570
arch/x86/entry/common.c:259
 [] entry_SYSCALL_64_fastpath+0xc4/0xc6
Code: df 49 89 87 70 05 00 00 41 c6 04 14 f8 48 89 f9 48 c1 e9 03 80
3c 11 00 75 64 49 89 87 78 05 00 00 e9 65 ff ff ff e8 ac 94 56 fa <0f>
0b 48 89 d7 48 89 95 30 ff ff ff e8 bb 22 87 fa 48 8b 95 30
RIP  [] unix_notinflight+0x3b4/0x490 net/unix/garbage.c:149
 RSP 
---[ end trace 4cbbd52674b68dab ]---


On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24).
Unfortunately this is not reproducible outside of syzkaller.
But easily reproducible with syzkaller. If you need to reproduce it,
follow instructions described here:
https://github.com/google/syzkaller/wiki/How-to-execute-syzkaller-programs
With the following as the program:

mmap(&(0x7f00/0xdd5000)=nil, (0xdd5000), 0x3, 0x32,
0x, 0x0)
socketpair$unix(0x1, 0x5, 0x0,

net: GPF in eth_header

2016-11-26 Thread Dmitry Vyukov

Hello,

The following program triggers GPF in eth_header:

https://gist.githubusercontent.com/dvyukov/613cadf05543b55a419f237e419cd495/raw/5471231523d1a07c3de55f11f87472c2816ee06c/gistfile1.txt

On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24).

BUG: unable to handle kernel paging request at ed002d14d74a
IP: [] eth_header+0x75/0x260 net/ethernet/eth.c:88
PGD 7fff6067 [   50.787819] PUD 7fff5067
PMD 0 [   50.787819]
Oops:  [#1] SMP DEBUG_PAGEALLOC KASAN
Modules linked in:
CPU: 2 PID: 6712 Comm: a.out Not tainted 4.9.0-rc6+ #55
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 88003a1841c0 task.stack: 880034d08000
RIP: 0010:[]  []
eth_header+0x75/0x260 net/ethernet/eth.c:88
RSP: 0018:880034d0eb68  EFLAGS: 00010a03
RAX: 11002d14d74a RBX: 880168a6ba4a RCX: 88006a9c7858
RDX: dd86 RSI: dc00 RDI: 880168a6ba56
RBP: 880034d0eb98 R08:  R09: 0031
R10: 0002 R11:  R12: 
R13: 88006c208d80 R14: 86dd R15: 88006a9c7858
FS:  01a02940() GS:88006d00() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: ed002d14d74a CR3: 37373000 CR4: 06e0
Stack:
 00316881ab40 88006a9c76c0 88006881ab40 88006a9c77f8
  dc00 880034d0ee98 86b31af9
 8719605c 880034d0f0f8 86dd 86be3220
Call Trace:
 [< inline >] dev_hard_header ./include/linux/netdevice.h:2762
 [] neigh_resolve_output+0x659/0xb20 net/core/neighbour.c:1302
 [< inline >] dst_neigh_output ./include/net/dst.h:464
 [] ip6_finish_output2+0xb3c/0x2500 net/ipv6/ip6_output.c:121
 [] ip6_finish_output+0x2eb/0x760 net/ipv6/ip6_output.c:139
 [< inline >] NF_HOOK_COND ./include/linux/netfilter.h:246
 [] ip6_output+0x1d7/0x9a0 net/ipv6/ip6_output.c:153
 [< inline >] dst_output ./include/net/dst.h:501
 [] ip6_local_out+0x9a/0x180 net/ipv6/output_core.c:170
 [] ip6_send_skb+0xa6/0x340 net/ipv6/ip6_output.c:1712
 [] ip6_push_pending_frames+0xb8/0xe0
net/ipv6/ip6_output.c:1732
 [< inline >] rawv6_push_pending_frames net/ipv6/raw.c:607
 [] rawv6_sendmsg+0x250b/0x2c20 net/ipv6/raw.c:920
 [] inet_sendmsg+0x385/0x590 net/ipv4/af_inet.c:734
 [< inline >] sock_sendmsg_nosec net/socket.c:621
 [] sock_sendmsg+0xcf/0x110 net/socket.c:631
 [] sock_write_iter+0x32b/0x620 net/socket.c:829
 [] do_iter_readv_writev+0x363/0x670 fs/read_write.c:695
 [] do_readv_writev+0x431/0x9b0 fs/read_write.c:872
 [] vfs_writev+0x8c/0xc0 fs/read_write.c:911
 [] do_writev+0x115/0x2d0 fs/read_write.c:944
 [< inline >] SYSC_writev fs/read_write.c:1017
 [] SyS_writev+0x2c/0x40 fs/read_write.c:1014
 [] entry_SYSCALL_64_fastpath+0x23/0xc6
arch/x86/entry/entry_64.S:209
Code: 41 83 fe 04 0f 84 aa 00 00 00 e8 17 4e b0 fa 48 8d 7b 0c 48 be
00 00 00 00 00 fc ff df 44 89 f2 66 c1 c2 08 48 89 f8 48 c1 e8 03 <0f>
b6 0c 30 48 8d 43 0d 49 89 c0 49 c1 e8 03 41 0f b6 34 30 49
RIP  [] eth_header+0x75/0x260 net/ethernet/eth.c:88
 RSP 
CR2: ed002d14d74a
---[ end trace a73fedfdc11bd60c ]---

Re: wl1251 & mac address & calibration data

2016-11-26 Thread Pali Rohár

On Thursday 24 November 2016 19:46:01 Aaro Koskinen wrote:
> Hi,
> 
> On Thu, Nov 24, 2016 at 04:20:45PM +0100, Pali Rohár wrote:
> > Proprietary, signed and closed bootloader NOLO does not support DT.
> > So for booting you need to append DTS file to kernel image.
> > 
> > U-Boot is optional and can be used as intermediate bootloader
> > between NOLO and kernel. But still it has problems with reading
> > from nand, so cannot read NVS data nor MAC address.
> 
> You could use kexec to pass the fixed DT.
> 
> A.

IIRC it was broken for N900/omap3, no idea if somebody fixed it.

-- 
Pali Rohár
pali.ro...@gmail.com



Re: wl1251 & mac address & calibration data

2016-11-26 Thread Pavel Machek

On Thu 2016-11-24 20:46:01, Aaro Koskinen wrote:
> Hi,
> 
> On Thu, Nov 24, 2016 at 04:20:45PM +0100, Pali Rohár wrote:
> > Proprietary, signed and closed bootloader NOLO does not support DT. So
> > for booting you need to append DTS file to kernel image.
> > 
> > U-Boot is optional and can be used as intermediate bootloader between
> > NOLO and kernel. But still it has problems with reading from nand, so
> > cannot read NVS data nor MAC address.
> 
> You could use kexec to pass the fixed DT.

Yeah. You could also strap desktop PC to a USB GPRS card, and call it
phone. You could also make a pig fly.

But because you could does not mean you should. No, sorry, kexec is
not acceptable. Too hard to set up, slows boot too much.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



Re: net: deadlock on genl_mutex

2016-11-26 Thread Eric Dumazet

On Sat, Nov 26, 2016 at 9:04 AM, Dmitry Vyukov  wrote:
> Hello,
>
> The following program triggers deadlock warnings on genl_mutex:
>
> https://gist.githubusercontent.com/dvyukov/65e33d053e507d2ab0bf6ae83d989585/raw/b3c640ec58e894b50bcbf255c471406466cfa5d0/gistfile1.txt
>
> On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24).
>
> BUG: sleeping function called from invalid context at 
> kernel/locking/mutex.c:620
> in_atomic(): 1, irqs_disabled(): 0, pid: 32289, name: syz-executor
> CPU: 0 PID: 32289 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>  88003ec06420 834c2e39  110007d80c17
>  ed0007d80c0f 41b58ab3 89575550 834c2b4b
>  8baab1a0 dc00  880068f794e0
> Call Trace:
>   [  287.394552]  [< inline >] __dump_stack lib/dump_stack.c:15
>   [  287.394552]  [] dump_stack+0x2ee/0x3f5
> lib/dump_stack.c:51
>  [] ___might_sleep+0x483/0x660 kernel/sched/core.c:7761
>  [] __might_sleep+0x9a/0x1a0 kernel/sched/core.c:7720
>  [] mutex_lock_nested+0x1ea/0xf20 kernel/locking/mutex.c:620
>  [< inline >] genl_lock net/netlink/genetlink.c:31
>  [] genl_lock_done+0x71/0xd0 net/netlink/genetlink.c:531
>  [] netlink_sock_destruct+0xf8/0x400
> net/netlink/af_netlink.c:331
>  [] __sk_destruct+0xf4/0x7f0 net/core/sock.c:1423
>  [] sk_destruct+0x4c/0x80 net/core/sock.c:1453
>  [] __sk_free+0x5c/0x230 net/core/sock.c:1461
>  [] sk_free+0x28/0x30 net/core/sock.c:1472
>  [< inline >] sock_put include/net/sock.h:1591
>  [] deferred_put_nlk_sk+0x31/0x40 
> net/netlink/af_netlink.c:652
>  [< inline >] __rcu_reclaim kernel/rcu/rcu.h:118
>  [] rcu_do_batch.isra.70+0x9ed/0xe20 kernel/rcu/tree.c:2776
>  [< inline >] invoke_rcu_callbacks kernel/rcu/tree.c:3040
>  [< inline >] __rcu_process_callbacks kernel/rcu/tree.c:3007
>  [] rcu_process_callbacks+0x48c/0xd70 kernel/rcu/tree.c:3024
>  [] __do_softirq+0x32b/0xca8 kernel/softirq.c:284
>  [< inline >] invoke_softirq kernel/softirq.c:364
>  [] irq_exit+0x1d1/0x210 kernel/softirq.c:405
>  [< inline >] exiting_irq arch/x86/include/asm/apic.h:659
>  [] smp_apic_timer_interrupt+0x80/0xa0
> arch/x86/kernel/apic/apic.c:960
>  [] apic_timer_interrupt+0x8c/0xa0
> arch/x86/entry/entry_64.S:489
>   [  287.403717]  [] ? lock_is_held+0x247/0x310
>  [] ___might_sleep+0x59e/0x660 kernel/sched/core.c:7729
>  [] __might_sleep+0x9a/0x1a0 kernel/sched/core.c:7720
>  [] down_read+0x78/0x160 kernel/locking/rwsem.c:21
>  [< inline >] anon_vma_lock_read include/linux/rmap.h:127
>  [] validate_mm+0xe5/0x880 mm/mmap.c:347
>  [] vma_link+0x11b/0x180 mm/mmap.c:605
>  [] mmap_region+0x1076/0x1880 mm/mmap.c:1692
>  [] do_mmap+0x6ff/0xe80 mm/mmap.c:1450
>  [< inline >] do_mmap_pgoff include/linux/mm.h:2039
>  [] vm_mmap_pgoff+0x1b7/0x210 mm/util.c:305
>  [< inline >] SYSC_mmap_pgoff mm/mmap.c:1500
>  [] SyS_mmap_pgoff+0x231/0x5e0 mm/mmap.c:1458
>  [< inline >] SYSC_mmap arch/x86/kernel/sys_x86_64.c:95
>  [] SyS_mmap+0x1b/0x30 arch/x86/kernel/sys_x86_64.c:86
>  [] entry_SYSCALL_64_fastpath+0x23/0xc6
>
> =
> [ INFO: inconsistent lock state ]
> 4.9.0-rc5+ #54 Tainted: GW
> -
> inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
> syz-executor/32289 [HC0[0]:SC1[1]:HE1:SE0] takes:
>  ([  287.580014] genl_mutex
> [< inline >] genl_lock net/netlink/genetlink.c:31
> [] genl_lock_done+0x71/0xd0 net/netlink/genetlink.c:531
> {SOFTIRQ-ON-W} state was registered at:
>   [  287.580014] [< inline >] mark_irqflags
> kernel/locking/lockdep.c:2938
>   [  287.580014] [] __lock_acquire+0x6e7/0x3380
> kernel/locking/lockdep.c:3292
>   [  287.580014] [] lock_acquire+0x2a2/0x790
> kernel/locking/lockdep.c:3746
>   [  287.580014] [< inline >] __mutex_lock_common
> kernel/locking/mutex.c:521
>   [  287.580014] [] mutex_lock_nested+0x23f/0xf20
> kernel/locking/mutex.c:621
>   [  287.580014] [< inline >] genl_lock net/netlink/genetlink.c:31
>   [  287.580014] [< inline >] genl_lock_all net/netlink/genetlink.c:52
>   [  287.580014] []
> __genl_register_family+0x2ce/0x1870 net/netlink/genetlink.c:374
>   [  287.580014] [< inline >]
> _genl_register_family_with_ops_grps include/net/genetlink.h:173
>   [  287.580014] [] genl_init+0x11d/0x185
> net/netlink/genetlink.c:1084
>   [  287.580014] [] do_one_initcall+0xfb/0x3f0 
> init/main.c:778
>   [  287.580014] [< inline >] do_initcall_level init/main.c:844
>   [  287.580014] [< inline >] do_initcalls init/main.c:852
>   [  287.580014] [< inline >] do_basic_setup init/main.c:870
>   [  287.580014] [] kernel_init_freeable+0x5c4/0x69e
> init/main.c:1017
>   [  287.580014] [] kernel_init+0x18/0x180 init/main.c:943
>   [  287.580014] [] ret_from_fork+0x2a/0x40
>

net: deadlock on genl_mutex

2016-11-26 Thread Dmitry Vyukov

Hello,

The following program triggers deadlock warnings on genl_mutex:

https://gist.githubusercontent.com/dvyukov/65e33d053e507d2ab0bf6ae83d989585/raw/b3c640ec58e894b50bcbf255c471406466cfa5d0/gistfile1.txt

On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24).

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
in_atomic(): 1, irqs_disabled(): 0, pid: 32289, name: syz-executor
CPU: 0 PID: 32289 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 88003ec06420 834c2e39  110007d80c17
 ed0007d80c0f 41b58ab3 89575550 834c2b4b
 8baab1a0 dc00  880068f794e0
Call Trace:
  [  287.394552]  [< inline >] __dump_stack lib/dump_stack.c:15
  [  287.394552]  [] dump_stack+0x2ee/0x3f5
lib/dump_stack.c:51
 [] ___might_sleep+0x483/0x660 kernel/sched/core.c:7761
 [] __might_sleep+0x9a/0x1a0 kernel/sched/core.c:7720
 [] mutex_lock_nested+0x1ea/0xf20 kernel/locking/mutex.c:620
 [< inline >] genl_lock net/netlink/genetlink.c:31
 [] genl_lock_done+0x71/0xd0 net/netlink/genetlink.c:531
 [] netlink_sock_destruct+0xf8/0x400
net/netlink/af_netlink.c:331
 [] __sk_destruct+0xf4/0x7f0 net/core/sock.c:1423
 [] sk_destruct+0x4c/0x80 net/core/sock.c:1453
 [] __sk_free+0x5c/0x230 net/core/sock.c:1461
 [] sk_free+0x28/0x30 net/core/sock.c:1472
 [< inline >] sock_put include/net/sock.h:1591
 [] deferred_put_nlk_sk+0x31/0x40 net/netlink/af_netlink.c:652
 [< inline >] __rcu_reclaim kernel/rcu/rcu.h:118
 [] rcu_do_batch.isra.70+0x9ed/0xe20 kernel/rcu/tree.c:2776
 [< inline >] invoke_rcu_callbacks kernel/rcu/tree.c:3040
 [< inline >] __rcu_process_callbacks kernel/rcu/tree.c:3007
 [] rcu_process_callbacks+0x48c/0xd70 kernel/rcu/tree.c:3024
 [] __do_softirq+0x32b/0xca8 kernel/softirq.c:284
 [< inline >] invoke_softirq kernel/softirq.c:364
 [] irq_exit+0x1d1/0x210 kernel/softirq.c:405
 [< inline >] exiting_irq arch/x86/include/asm/apic.h:659
 [] smp_apic_timer_interrupt+0x80/0xa0
arch/x86/kernel/apic/apic.c:960
 [] apic_timer_interrupt+0x8c/0xa0
arch/x86/entry/entry_64.S:489
  [  287.403717]  [] ? lock_is_held+0x247/0x310
 [] ___might_sleep+0x59e/0x660 kernel/sched/core.c:7729
 [] __might_sleep+0x9a/0x1a0 kernel/sched/core.c:7720
 [] down_read+0x78/0x160 kernel/locking/rwsem.c:21
 [< inline >] anon_vma_lock_read include/linux/rmap.h:127
 [] validate_mm+0xe5/0x880 mm/mmap.c:347
 [] vma_link+0x11b/0x180 mm/mmap.c:605
 [] mmap_region+0x1076/0x1880 mm/mmap.c:1692
 [] do_mmap+0x6ff/0xe80 mm/mmap.c:1450
 [< inline >] do_mmap_pgoff include/linux/mm.h:2039
 [] vm_mmap_pgoff+0x1b7/0x210 mm/util.c:305
 [< inline >] SYSC_mmap_pgoff mm/mmap.c:1500
 [] SyS_mmap_pgoff+0x231/0x5e0 mm/mmap.c:1458
 [< inline >] SYSC_mmap arch/x86/kernel/sys_x86_64.c:95
 [] SyS_mmap+0x1b/0x30 arch/x86/kernel/sys_x86_64.c:86
 [] entry_SYSCALL_64_fastpath+0x23/0xc6

=
[ INFO: inconsistent lock state ]
4.9.0-rc5+ #54 Tainted: GW
-
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
syz-executor/32289 [HC0[0]:SC1[1]:HE1:SE0] takes:
 ([  287.580014] genl_mutex
[< inline >] genl_lock net/netlink/genetlink.c:31
[] genl_lock_done+0x71/0xd0 net/netlink/genetlink.c:531
{SOFTIRQ-ON-W} state was registered at:
  [  287.580014] [< inline >] mark_irqflags
kernel/locking/lockdep.c:2938
  [  287.580014] [] __lock_acquire+0x6e7/0x3380
kernel/locking/lockdep.c:3292
  [  287.580014] [] lock_acquire+0x2a2/0x790
kernel/locking/lockdep.c:3746
  [  287.580014] [< inline >] __mutex_lock_common
kernel/locking/mutex.c:521
  [  287.580014] [] mutex_lock_nested+0x23f/0xf20
kernel/locking/mutex.c:621
  [  287.580014] [< inline >] genl_lock net/netlink/genetlink.c:31
  [  287.580014] [< inline >] genl_lock_all net/netlink/genetlink.c:52
  [  287.580014] []
__genl_register_family+0x2ce/0x1870 net/netlink/genetlink.c:374
  [  287.580014] [< inline >]
_genl_register_family_with_ops_grps include/net/genetlink.h:173
  [  287.580014] [] genl_init+0x11d/0x185
net/netlink/genetlink.c:1084
  [  287.580014] [] do_one_initcall+0xfb/0x3f0 init/main.c:778
  [  287.580014] [< inline >] do_initcall_level init/main.c:844
  [  287.580014] [< inline >] do_initcalls init/main.c:852
  [  287.580014] [< inline >] do_basic_setup init/main.c:870
  [  287.580014] [] kernel_init_freeable+0x5c4/0x69e
init/main.c:1017
  [  287.580014] [] kernel_init+0x18/0x180 init/main.c:943
  [  287.580014] [] ret_from_fork+0x2a/0x40
arch/x86/entry/entry_64.S:433

[   78.258919] [ INFO: inconsistent lock state ]
[   78.258919] 4.9.0-rc5+ #54 Tainted: GW
[   78.258919] -
[   78.258919] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[   78.258919] syz-fuzzer/5211

net: GPF in rt6_get_cookie

2016-11-26 Thread Dmitry Vyukov

Hello,

I got several GPFs in rt6_get_cookie while running syzkaller:

general protection fault:  [#1] SMP DEBUG_PAGEALLOC KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 2 PID: 10156 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 880016f40480 task.stack: 88000fc0
RIP: 0010:[]  [< inline >] rt6_get_cookie
include/net/ip6_fib.h:174
RIP: 0010:[]  []
sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
RSP: 0018:88000fc07298  EFLAGS: 00010202
RAX: dc00 RBX:  RCX: c900029f5000
RDX: 0015 RSI: 0001 RDI: 00a8
RBP: 88000fc07580 R08:  R09: 0001
R10:  R11:  R12: 880066cd0068
R13: 110001f80e92 R14: 880066cd0040 R15: 88005f2d2808
FS:  7f52c41f7700() GS:88006d00() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 20016000 CR3: 65dd7000 CR4: 06e0
DR0: 0400 DR1: 0400 DR2: 
DR3:  DR6: 0ff0 DR7: 0600
Stack:
 87a210f6 8701ad45 88006768ec20 88006768ec20
  16f40480 88000fc07450 11000cd9a017
 88006768ec00 880066fc0730 880066cd0068 110001f80e66
Call Trace:
 [] sctp_transport_route+0xad/0x430 net/sctp/transport.c:279
 [] sctp_assoc_add_peer+0x5a6/0x13e0 net/sctp/associola.c:641
 [] sctp_sendmsg+0x1921/0x3bc0 net/sctp/socket.c:1864
 [] inet_sendmsg+0x385/0x590 net/ipv4/af_inet.c:734
 [< inline >] sock_sendmsg_nosec net/socket.c:621
 [] sock_sendmsg+0xcf/0x110 net/socket.c:631
 [] SYSC_sendto+0x660/0x810 net/socket.c:1656
 [] SyS_sendto+0x45/0x60 net/socket.c:1624
 [] entry_SYSCALL_64_fastpath+0x23/0xc6
Code: 00 00 48 8b 84 24 88 00 00 00 48 8b 58 40 e8 80 76 cc f9 48 8d
bb a8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80>
3c 02 00 0f 85 56 0f 00 00 48 8b 9b a8 00 00 00 45 31 ed 48
RIP  [< inline >] rt6_get_cookie include/net/ip6_fib.h:174
RIP  [] sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
 RSP 
---[ end trace b8d1354fa571700d ]---


general protection fault:  [#1] SMP DEBUG_PAGEALLOC KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 3 PID: 22744 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 88006b92a840 task.stack: 88006a73
RIP: 0010:[]  [< inline >] rt6_get_cookie
include/net/ip6_fib.h:174
RIP: 0010:[]  []
sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
RSP: 0018:88006a736b88  EFLAGS: 00010202
RAX: dc00 RBX:  RCX: c90003c4f000
RDX: 0015 RSI: 0001 RDI: 00a8
RBP: 88006a736e68 R08:  R09: 0001
R10:  R11:  R12: 880064cff268
R13: 11000d4e6db0 R14: 880064cff240 R15: 88006a4b6808
FS:  7f74f4ec9700() GS:88006d10() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 2070effc CR3: 3bd2f000 CR4: 06e0
DR0: 0400 DR1: 0400 DR2: 
DR3:  DR6: 0ff0 DR7: 0600
Stack:
 87a210f6 000bbd2d 88006c2cd5a0 88006c2cd5a0
  6ccb46c0 88006a736d40 11000c99fe57
 88006c2cd500 8800658b1f30 880064cff268 11000d4e6d84
Call Trace:
 [] sctp_transport_route+0xad/0x430 net/sctp/transport.c:279
 [] sctp_assoc_add_peer+0x5a6/0x13e0 net/sctp/associola.c:641
 [] __sctp_connect+0x288/0xc90 net/sctp/socket.c:1178
 [] __sctp_setsockopt_connectx+0x1ab/0x200
net/sctp/socket.c:1332
 [< inline >] sctp_getsockopt_connectx3 net/sctp/socket.c:1417
 [] sctp_getsockopt+0x36ed/0x6800 net/sctp/socket.c:6474
 [] sock_common_getsockopt+0x9a/0xe0 net/core/sock.c:2649
 [< inline >] SYSC_getsockopt net/socket.c:1788
 [] SyS_getsockopt+0x257/0x390 net/socket.c:1770
 [] entry_SYSCALL_64_fastpath+0x23/0xc6
Code: 00 00 48 8b 84 24 88 00 00 00 48 8b 58 40 e8 80 76 cc f9 48 8d
bb a8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80>
3c 02 00 0f 85 56 0f 00 00 48 8b 9b a8 00 00 00 45 31 ed 48
RIP  [< inline >] rt6_get_cookie include/net/ip6_fib.h:174
RIP  [] sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
 RSP 
---[ end trace f42d1c14cb6d2835 ]---

This happened on commit a25f0944ba9b1d8a6813fd6f1a86f1bd59ac25a6 (Nov 13).

Unfortunately this is not reproducible.

The line is:

return rt->rt6i_node ? rt->rt6i_node->fn_sernum : 0;

Could this be a data race, where rt->rt6i_node is non-NULL when checked
but already NULL by the time it is dereferenced? That would explain both
the crash and the non-reproducibility (need ThreadSanitizer!).
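
To make the suspected window concrete, here is a minimal userspace sketch,
not the kernel code. The names (struct fib_node, struct route,
get_cookie_racy, get_cookie_single_load) are hypothetical stand-ins, and C11
atomics are used in place of the kernel's own accessors. It only illustrates
the "check, then load again" shape of the quoted line versus testing and
dereferencing one snapshot of the pointer; it is not a confirmed diagnosis.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the fib node and the route that points at it. */
struct fib_node {
	int sernum;
};

struct route {
	_Atomic(struct fib_node *) node;
};

/* Racy shape: the pointer is loaded twice, so a concurrent writer can
 * clear it between the NULL test and the dereference.
 */
int get_cookie_racy(struct route *rt)
{
	return atomic_load(&rt->node) ? atomic_load(&rt->node)->sernum : 0;
}

/* Single-load shape: test and dereference the same snapshot. */
int get_cookie_single_load(struct route *rt)
{
	struct fib_node *fn = atomic_load(&rt->node);

	return fn ? fn->sernum : 0;
}

/* Single-threaded usage: a set pointer yields the serial number,
 * a cleared pointer yields 0.
 */
void cookie_example(void)
{
	struct fib_node fn = { .sernum = 7 };
	struct route rt;

	atomic_init(&rt.node, &fn);
	assert(get_cookie_single_load(&rt) == 7);

	atomic_store(&rt.node, (struct fib_node *)NULL);
	assert(get_cookie_single_load(&rt) == 0);
}
```

Loading the pointer once only removes the NULL-dereference window. It says
nothing about whether the node itself is still alive, which would need a
separate lifetime guarantee.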

This always happened

Re: netlink: GPF in sock_sndtimeo

2016-11-26 Thread Eric Dumazet

CC Richard Guy Briggs 

On Sat, Nov 26, 2016 at 7:44 AM, Dmitry Vyukov  wrote:
> Hello,
>
> The following program triggers GPF in sock_sndtimeo:
> https://gist.githubusercontent.com/dvyukov/c19cadd309791cf5cb9b2bf936d3f48d/raw/1743ba0211079a5465d039512b427bc6b59b1a76/gistfile1.txt
>
> On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24).
>
> general protection fault:  [#1] SMP DEBUG_PAGEALLOC KASAN
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 1 PID: 19950 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 88002a0d0840 task.stack: 88003692
> RIP: 0010:[]  [< inline >] sock_sndtimeo
> include/net/sock.h:2075
> RIP: 0010:[]  []
> netlink_unicast+0xe1/0x730 net/netlink/af_netlink.c:1232
> RSP: 0018:880036926f68  EFLAGS: 00010202
> RAX: 0068 RBX: 880036927000 RCX: c900021d
> RDX: 0d63 RSI: 024000c0 RDI: 0340
> RBP: 880036927028 R08: ed0006ea7aab R09: ed0006ea7aab
> R10: 0001 R11: ed0006ea7aaa R12: dc00
> R13:  R14: 880035de3400 R15: 880035de3400
> FS:  7f90a2fc7700() GS:88003ed0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 006de0c0 CR3: 35de6000 CR4: 06e0
> Stack:
>  880035de3400 819f02a1 110006d24df4 0004
>  4db40014 880036926fd8  41b58ab3
>  89653c11 86cb3500 819f0345 880035de3400
> Call Trace:
>  [< inline >] audit_replace kernel/audit.c:817
>  [] audit_receive_msg+0x22c9/0x2ce0 kernel/audit.c:894
>  [< inline >] audit_receive_skb kernel/audit.c:1120
>  [] audit_receive+0x1dc/0x360 kernel/audit.c:1133
>  [< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1214
>  [] netlink_unicast+0x514/0x730 
> net/netlink/af_netlink.c:1240
>  [] netlink_sendmsg+0xaa4/0xe50 
> net/netlink/af_netlink.c:1786
>  [< inline >] sock_sendmsg_nosec net/socket.c:621
>  [] sock_sendmsg+0xcf/0x110 net/socket.c:631
>  [] sock_write_iter+0x32b/0x620 net/socket.c:829
>  [< inline >] new_sync_write fs/read_write.c:499
>  [] __vfs_write+0x4fe/0x830 fs/read_write.c:512
>  [] vfs_write+0x175/0x4e0 fs/read_write.c:560
>  [< inline >] SYSC_write fs/read_write.c:607
>  [] SyS_write+0x100/0x240 fs/read_write.c:599
>  [] do_syscall_64+0x2f4/0x940 arch/x86/entry/common.c:280
>  [] entry_SYSCALL64_slow_path+0x25/0x25
> Code: fe 4c 89 f7 e8 31 16 ff ff 8b 8d 70 ff ff ff 49 89 c7 31 c0 85
> c9 75 25 e8 7d 4a a3 fa 49 8d bd 40 03 00 00 48 89 f8 48 c1 e8 03 <42>
> 80 3c 20 00 0f 85 3a 06 00 00 49 8b 85 40 03 00 00 4c 8d 73
> RIP  [< inline >] sock_sndtimeo include/net/sock.h:2075
> RIP  [] netlink_unicast+0xe1/0x730
> net/netlink/af_netlink.c:1232
>  RSP 
> ---[ end trace 8383a15fba6fdc59 ]---


Looks like a bug added in commit 32a1dbaece7e37cea415e03cd426172249aa859e
("audit: try harder to send to auditd upon netlink failure")
or 133e1e5acd4a63c4a0dcc413e90d5decdbce9c4a ("audit: stop an old
auditd being starved out by a new auditd").

Richard, can you take a look ?

Thanks !

Re: [PATCH v3 net-next 1/2] net: ethernet: slicoss: add slicoss gigabit ethernet driver

2016-11-26 Thread Rami Rosen

Hi, Lino,

...

> @@ -0,0 +1,28 @@
> +config NET_VENDOR_ALACRITECH
> +bool "Alacritech devices"
> +default y
> +---help---
> +  If you have a network (Ethernet) card belonging to this class, say 
> Y.
> +
> +  Note that the answer to this question doesn't directly affect the
> +  kernel: saying N will just cause the configurator to skip all

Shouldn't the next line say "Alacritech devices", as the prompt above does,
rather than "Renesas devices"?

> +  the questions about Renesas devices. If you say Y, you will be 
> asked
> +  for your specific device in the following questions.
> +

...
...
...
> +struct slic_device {
> +   struct pci_dev *pdev;
...
> +   bool promisc;

The autoneg boolean below seems to be unused: it is set to true once in
slic_set_link_autoneg() and never read anywhere else, so it should
probably be removed.

> +   bool autoneg;
> +   int speed;
...
...

> +static int slic_load_rcvseq_firmware(struct slic_device *sdev)
> +{
> +   const struct firmware *fw;
> +   const char *file;
> +   u32 codelen;
> +   int idx = 0;
> +   u32 instr;
> +   u32 addr;
> +   int err;
> +
...
> +   /* Do an initial sanity check concerning firmware size now. A further
> +* check follows below.
> +*/
> +   if (fw->size < SLIC_FIRMWARE_MIN_SIZE) {
> +   dev_err(&sdev->pdev->dev,
> +   "invalid firmware size %zu (min %u expected)\n",
> +   fw->size, SLIC_FIRMWARE_MIN_SIZE);
> +   err = -EINVAL;

The release label always returns 0, so this error code is lost:

> +   goto release;
> +   }
> +
> +   codelen = slic_read_dword_from_firmware(fw, &idx);
> +
> +   /* do another sanity check against firmware size */
> +   if ((codelen + 4) > fw->size) {
> +   dev_err(&sdev->pdev->dev,
> +   "invalid rcv-sequencer firmware size %zu\n", 
> fw->size);
> +   err = -EINVAL;

Again, the release label always returns 0, so this error code is lost too:

> +   goto release;
> +   }
> +
>
> +release:
> +   release_firmware(fw);
> +
> +   return 0;
> +}
> +
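
The review comments above can be illustrated with a minimal userspace
sketch. It is not the driver code: struct blob, blob_release(), check_blob()
and FW_MIN_SIZE are hypothetical stand-ins for the firmware object, its
release call, the loader and the size limit. The only point it shows is that
err starts at 0 and the shared release label returns err instead of a
literal 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define FW_MIN_SIZE 64u /* hypothetical minimum, not the driver's value */

/* Hypothetical stand-in for the firmware object. */
struct blob {
	size_t size;
	int released;
};

static void blob_release(struct blob *b)
{
	b->released = 1;
}

/* Every exit goes through the release label, and the label returns
 * err rather than a literal 0, so failures reach the caller.
 */
int check_blob(struct blob *b, size_t codelen)
{
	int err = 0;

	if (b->size < FW_MIN_SIZE) {
		err = -EINVAL;
		goto release;
	}

	if (codelen + 4 > b->size) {
		err = -EINVAL;
		goto release;
	}

release:
	blob_release(b);
	return err;
}

/* Example: an undersized blob is rejected and still released. */
void check_blob_example(void)
{
	struct blob small = { .size = 16, .released = 0 };

	assert(check_blob(&small, 4) == -EINVAL);
	assert(small.released == 1);
}
```

With this shape the success path and both failure paths share the cleanup,
and the caller can tell them apart by the return value.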

Regards,
Rami Rosen

netlink: GPF in sock_sndtimeo

2016-11-26 Thread Dmitry Vyukov

Hello,

The following program triggers GPF in sock_sndtimeo:
https://gist.githubusercontent.com/dvyukov/c19cadd309791cf5cb9b2bf936d3f48d/raw/1743ba0211079a5465d039512b427bc6b59b1a76/gistfile1.txt

On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24).

general protection fault:  [#1] SMP DEBUG_PAGEALLOC KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 19950 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 88002a0d0840 task.stack: 88003692
RIP: 0010:[]  [< inline >] sock_sndtimeo
include/net/sock.h:2075
RIP: 0010:[]  []
netlink_unicast+0xe1/0x730 net/netlink/af_netlink.c:1232
RSP: 0018:880036926f68  EFLAGS: 00010202
RAX: 0068 RBX: 880036927000 RCX: c900021d
RDX: 0d63 RSI: 024000c0 RDI: 0340
RBP: 880036927028 R08: ed0006ea7aab R09: ed0006ea7aab
R10: 0001 R11: ed0006ea7aaa R12: dc00
R13:  R14: 880035de3400 R15: 880035de3400
FS:  7f90a2fc7700() GS:88003ed0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 006de0c0 CR3: 35de6000 CR4: 06e0
Stack:
 880035de3400 819f02a1 110006d24df4 0004
 4db40014 880036926fd8  41b58ab3
 89653c11 86cb3500 819f0345 880035de3400
Call Trace:
 [< inline >] audit_replace kernel/audit.c:817
 [] audit_receive_msg+0x22c9/0x2ce0 kernel/audit.c:894
 [< inline >] audit_receive_skb kernel/audit.c:1120
 [] audit_receive+0x1dc/0x360 kernel/audit.c:1133
 [< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1214
 [] netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1240
 [] netlink_sendmsg+0xaa4/0xe50 net/netlink/af_netlink.c:1786
 [< inline >] sock_sendmsg_nosec net/socket.c:621
 [] sock_sendmsg+0xcf/0x110 net/socket.c:631
 [] sock_write_iter+0x32b/0x620 net/socket.c:829
 [< inline >] new_sync_write fs/read_write.c:499
 [] __vfs_write+0x4fe/0x830 fs/read_write.c:512
 [] vfs_write+0x175/0x4e0 fs/read_write.c:560
 [< inline >] SYSC_write fs/read_write.c:607
 [] SyS_write+0x100/0x240 fs/read_write.c:599
 [] do_syscall_64+0x2f4/0x940 arch/x86/entry/common.c:280
 [] entry_SYSCALL64_slow_path+0x25/0x25
Code: fe 4c 89 f7 e8 31 16 ff ff 8b 8d 70 ff ff ff 49 89 c7 31 c0 85
c9 75 25 e8 7d 4a a3 fa 49 8d bd 40 03 00 00 48 89 f8 48 c1 e8 03 <42>
80 3c 20 00 0f 85 3a 06 00 00 49 8b 85 40 03 00 00 4c 8d 73
RIP  [< inline >] sock_sndtimeo include/net/sock.h:2075
RIP  [] netlink_unicast+0xe1/0x730
net/netlink/af_netlink.c:1232
 RSP 
---[ end trace 8383a15fba6fdc59 ]---

[PATCH net-next 04/10] net/mlx5: Add DCBX firmware commands support

2016-11-26 Thread Saeed Mahameed

From: Huy Nguyen 

Add set/query commands for DCBX_PARAM register

Signed-off-by: Huy Nguyen 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/port.c | 20 
 include/linux/mlx5/driver.h|  7 +++
 include/linux/mlx5/port.h  |  2 ++
 3 files changed, 29 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c 
b/drivers/net/ethernet/mellanox/mlx5/core/port.c
index ed4898f..d2ec9d2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -548,6 +548,26 @@ int mlx5_max_tc(struct mlx5_core_dev *mdev)
return num_tc - 1;
 }
 
+int mlx5_query_port_dcbx_param(struct mlx5_core_dev *mdev, u32 *out)
+{
+   u32 in[MLX5_ST_SZ_DW(dcbx_param)] = {0};
+
+   MLX5_SET(dcbx_param, in, port_number, 1);
+
+   return  mlx5_core_access_reg(mdev, in, sizeof(in), out,
+   sizeof(in), MLX5_REG_DCBX_PARAM, 0, 0);
+}
+
+int mlx5_set_port_dcbx_param(struct mlx5_core_dev *mdev, u32 *in)
+{
+   u32 out[MLX5_ST_SZ_DW(dcbx_param)];
+
+   MLX5_SET(dcbx_param, in, port_number, 1);
+
+   return mlx5_core_access_reg(mdev, in, sizeof(out), out,
+   sizeof(out), MLX5_REG_DCBX_PARAM, 0, 1);
+}
+
 int mlx5_set_port_prio_tc(struct mlx5_core_dev *mdev, u8 *prio_tc)
 {
u32 in[MLX5_ST_SZ_DW(qtct_reg)] = {0};
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index ae1f451..68b85ef 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -104,6 +104,8 @@ enum {
 enum {
MLX5_REG_QETCR   = 0x4005,
MLX5_REG_QTCT= 0x400a,
+   MLX5_REG_DCBX_PARAM  = 0x4020,
+   MLX5_REG_DCBX_APP= 0x4021,
MLX5_REG_PCAP= 0x5001,
MLX5_REG_PMTU= 0x5003,
MLX5_REG_PTYS= 0x5004,
@@ -124,6 +126,11 @@ enum {
MLX5_REG_MPCNT   = 0x9051,
 };
 
+enum mlx5_dcbx_oper_mode {
+   MLX5E_DCBX_PARAM_VER_OPER_HOST  = 0x0,
+   MLX5E_DCBX_PARAM_VER_OPER_AUTO  = 0x3,
+};
+
 enum {
MLX5_ATOMIC_OPS_CMP_SWAP= 1 << 0,
MLX5_ATOMIC_OPS_FETCH_ADD   = 1 << 1,
diff --git a/include/linux/mlx5/port.h b/include/linux/mlx5/port.h
index bdee439..e527732 100644
--- a/include/linux/mlx5/port.h
+++ b/include/linux/mlx5/port.h
@@ -162,4 +162,6 @@ void mlx5_query_port_fcs(struct mlx5_core_dev *mdev, bool 
*supported,
 int mlx5_query_module_eeprom(struct mlx5_core_dev *dev,
 u16 offset, u16 size, u8 *data);
 
+int mlx5_query_port_dcbx_param(struct mlx5_core_dev *mdev, u32 *out);
+int mlx5_set_port_dcbx_param(struct mlx5_core_dev *mdev, u32 *in);
 #endif /* __MLX5_PORT_H__ */
-- 
2.7.4

[PATCH net-next 09/10] net/mlx5e: Moves pflags to priv->params

2016-11-26 Thread Saeed Mahameed

From: Shaker Daibes 

pflags is a configuration parameter for the netdev, naturally it belongs
to priv->params.
Also introduce MLX5E_GET_PFLAG

Signed-off-by: Shaker Daibes 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h | 16 +---
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c |  6 +++---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c|  4 ++--
 3 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 9cf32d3..84ac78f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -177,14 +177,16 @@ enum mlx5e_priv_flag {
MLX5E_PFLAG_RX_CQE_BASED_MODER = (1 << 0),
 };
 
-#define MLX5E_SET_PRIV_FLAG(priv, pflag, enable)\
-   do {\
-   if (enable) \
-   priv->pflags |= pflag;  \
-   else\
-   priv->pflags &= ~pflag; \
+#define MLX5E_SET_PFLAG(priv, pflag, enable)   \
+   do {\
+   if (enable) \
+   (priv)->params.pflags |= (pflag);   \
+   else\
+   (priv)->params.pflags &= ~(pflag);  \
} while (0)
 
+#define MLX5E_GET_PFLAG(priv, pflag) (!!((priv)->params.pflags & (pflag)))
+
 #ifdef CONFIG_MLX5_CORE_EN_DCB
 #define MLX5E_MAX_BW_ALLOC 100 /* Max percentage of BW allocation */
 #endif
@@ -218,6 +220,7 @@ struct mlx5e_params {
bool vlan_strip_disable;
bool rx_am_enabled;
u32 lro_timeout;
+   u32 pflags;
 };
 
 #ifdef CONFIG_MLX5_CORE_EN_DCB
@@ -705,7 +708,6 @@ struct mlx5e_priv {
struct work_struct tx_timeout_work;
struct delayed_workupdate_stats_work;
 
-   u32pflags;
struct mlx5_core_dev  *mdev;
struct net_device *netdev;
struct mlx5e_stats stats;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 839c4e9..d2bdccb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1488,7 +1488,7 @@ static int mlx5e_handle_pflag(struct net_device *netdev,
 {
struct mlx5e_priv *priv = netdev_priv(netdev);
bool enable = !!(wanted_flags & flag);
-   u32 changes = wanted_flags ^ priv->pflags;
+   u32 changes = wanted_flags ^ priv->params.pflags;
int err;
 
if (!(changes & flag))
@@ -1501,7 +1501,7 @@ static int mlx5e_handle_pflag(struct net_device *netdev,
return err;
}
 
-   MLX5E_SET_PRIV_FLAG(priv, flag, enable);
+   MLX5E_SET_PFLAG(priv, flag, enable);
return 0;
 }
 
@@ -1524,7 +1524,7 @@ static u32 mlx5e_get_priv_flags(struct net_device *netdev)
 {
struct mlx5e_priv *priv = netdev_priv(netdev);
 
-   return priv->pflags;
+   return priv->params.pflags;
 }
 
 static int mlx5e_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 89d5c65..004940a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3488,8 +3488,8 @@ static void mlx5e_build_nic_netdev_priv(struct 
mlx5_core_dev *mdev,
SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
/* Initialize pflags */
-   MLX5E_SET_PRIV_FLAG(priv, MLX5E_PFLAG_RX_CQE_BASED_MODER,
-   priv->params.rx_cq_period_mode == 
MLX5_CQ_PERIOD_MODE_START_FROM_CQE);
+   MLX5E_SET_PFLAG(priv, MLX5E_PFLAG_RX_CQE_BASED_MODER,
+   priv->params.rx_cq_period_mode == 
MLX5_CQ_PERIOD_MODE_START_FROM_CQE);
 
mutex_init(&priv->state_lock);
 
-- 
2.7.4
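
A small userspace sketch of the macro change in this patch, under stated
assumptions: struct params, struct priv, PFLAG_A, PFLAG_B, SET_PFLAG and
GET_PFLAG are simplified, hypothetical miniatures of the driver's
definitions. It shows what the added parentheses around (priv) and (pflag)
make possible, namely passing expressions as macro arguments.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical miniature of the driver structures. */
struct params {
	uint32_t pflags;
};

struct priv {
	struct params params;
};

enum {
	PFLAG_A = 1 << 0,
	PFLAG_B = 1 << 1,
};

/* Arguments are parenthesized so that expressions expand correctly. */
#define SET_PFLAG(priv, pflag, enable)				\
	do {							\
		if (enable)					\
			(priv)->params.pflags |= (pflag);	\
		else						\
			(priv)->params.pflags &= ~(pflag);	\
	} while (0)

#define GET_PFLAG(priv, pflag) (!!((priv)->params.pflags & (pflag)))

void pflag_example(void)
{
	struct priv privs[2] = { { { 0 } }, { { 0 } } };
	struct priv *p = privs;

	/* Both arguments are expressions: "p + 1" and "PFLAG_A | PFLAG_B". */
	SET_PFLAG(p + 1, PFLAG_A | PFLAG_B, 1);
	assert(GET_PFLAG(p + 1, PFLAG_A) == 1);
	assert(privs[0].params.pflags == 0);

	/* Without the parentheses, "~PFLAG_A | PFLAG_B" would leave bits set. */
	SET_PFLAG(p + 1, PFLAG_A | PFLAG_B, 0);
	assert(privs[1].params.pflags == 0);
}
```

The !! in the getter normalizes any non-zero mask result to 1, so the value
can be compared against or stored in a bool without surprises.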

[PATCH net-next 07/10] net/mlx5e: Add support for ethtool self diagnostics test

2016-11-26 Thread Saeed Mahameed

From: Kamal Heib 

The self diagnostics test implementation includes the following features:
1. Link Test: Check that link is in up state.
2. Speed Test: Check that link was negotiated correctly.
3. Health Test: Check the device health.

Signed-off-by: Kamal Heib 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   5 +
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   8 +-
 .../net/ethernet/mellanox/mlx5/core/en_selftest.c  | 126 +
 4 files changed, 139 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 0343725..9f43beb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -8,6 +8,6 @@ mlx5_core-y :=  main.o cmd.o debugfs.o fw.o eq.o uar.o 
pagealloc.o \
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o eswitch_offloads.o \
en_main.o en_common.o en_fs.o en_ethtool.o en_tx.o \
en_rx.o en_rx_am.o en_txrx.o en_clock.o vxlan.o \
-   en_tc.o en_arfs.o en_rep.o en_fs_ethtool.o
+   en_tc.o en_arfs.o en_rep.o en_fs_ethtool.o en_selftest.o
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) +=  en_dcbnl.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 6c954cd..f7bb4a7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -167,6 +167,8 @@ struct mlx5e_umr_wqe {
struct mlx5_wqe_data_seg   data;
 };
 
+extern const char mlx5e_self_tests[][ETH_GSTRING_LEN];
+
 static const char mlx5e_priv_flags[][ETH_GSTRING_LEN] = {
"rx_cqe_moder",
 };
@@ -754,6 +756,9 @@ int mlx5e_create_flow_steering(struct mlx5e_priv *priv);
 void mlx5e_destroy_flow_steering(struct mlx5e_priv *priv);
 void mlx5e_init_l2_addr(struct mlx5e_priv *priv);
 void mlx5e_destroy_flow_table(struct mlx5e_flow_table *ft);
+int mlx5e_self_test_num(struct mlx5e_priv *priv);
+void mlx5e_self_test(struct net_device *ndev, struct ethtool_test *etest,
+u64 *buf);
 int mlx5e_ethtool_get_flow(struct mlx5e_priv *priv, struct ethtool_rxnfc *info,
   int location);
 int mlx5e_ethtool_get_all_flows(struct mlx5e_priv *priv,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 9ea7b37..839c4e9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -180,6 +180,8 @@ static int mlx5e_get_sset_count(struct net_device *dev, int 
sset)
 
case ETH_SS_PRIV_FLAGS:
return ARRAY_SIZE(mlx5e_priv_flags);
+   case ETH_SS_TEST:
+   return mlx5e_self_test_num(priv);
/* fallthrough */
default:
return -EOPNOTSUPP;
@@ -286,6 +288,9 @@ static void mlx5e_get_strings(struct net_device *dev,
break;
 
case ETH_SS_TEST:
+   for (i = 0; i < mlx5e_self_test_num(priv); i++)
+   strcpy(data + i * ETH_GSTRING_LEN,
+  mlx5e_self_tests[i]);
break;
 
case ETH_SS_STATS:
@@ -1573,5 +1578,6 @@ const struct ethtool_ops mlx5e_ethtool_ops = {
.get_module_info   = mlx5e_get_module_info,
.get_module_eeprom = mlx5e_get_module_eeprom,
.get_priv_flags= mlx5e_get_priv_flags,
-   .set_priv_flags= mlx5e_set_priv_flags
+   .set_priv_flags= mlx5e_set_priv_flags,
+   .self_test = mlx5e_self_test,
 };
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
new file mode 100644
index 000..a25dfc5
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
@@ -0,0 +1,126 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies, Ltd.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following

[PATCH net-next 02/10] net/mlx5e: Support DCBX CEE API

2016-11-26 Thread Saeed Mahameed

From: Huy Nguyen 

Add DCBX CEE API interface for ConnectX-4. Configurations are stored in
a temporary structure and are applied to the card's firmware when
the CEE's setall callback function is called.

Note:
  priority group in CEE is equivalent to traffic class in ConnectX-4
  hardware spec.

  bw allocation per priority in CEE is not supported because ConnectX-4
  only supports bw allocation per traffic class.

  user priority in CEE does not have an equivalent term in ConnectX-4.
  Therefore, user priority to priority mapping in CEE is not supported.
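
The mapping described in the note above (CEE priority group to traffic
class) can be sketched in userspace as follows. This is an illustration
only: the struct layouts, cee_to_ets() and the constant values (MAX_PGS,
MAX_PRIO, TSA_ETS) are assumptions made for the example, not the driver's
definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_PGS  8 /* assumed number of CEE priority groups */
#define MAX_PRIO 8 /* assumed number of priorities */
#define TSA_ETS  2 /* assumed value for the ETS selection algorithm */

/* Hypothetical pending CEE configuration. */
struct cee_config {
	uint8_t pg_bw_pct[MAX_PGS];       /* bandwidth percent per group */
	uint8_t prio_to_pg_map[MAX_PRIO]; /* priority -> priority group */
};

/* Hypothetical IEEE-style ETS settings. */
struct ets_config {
	uint8_t tc_tx_bw[MAX_PGS];
	uint8_t tc_rx_bw[MAX_PGS];
	uint8_t tc_tsa[MAX_PGS];
	uint8_t prio_tc[MAX_PRIO];
};

/* Treat each CEE priority group as one traffic class. */
void cee_to_ets(const struct cee_config *cee, struct ets_config *ets)
{
	int i;

	memset(ets, 0, sizeof(*ets));

	for (i = 0; i < MAX_PGS; i++) {
		ets->tc_tx_bw[i] = cee->pg_bw_pct[i];
		ets->tc_rx_bw[i] = cee->pg_bw_pct[i];
		ets->tc_tsa[i] = TSA_ETS;
	}

	for (i = 0; i < MAX_PRIO; i++)
		ets->prio_tc[i] = cee->prio_to_pg_map[i];
}

void cee_to_ets_example(void)
{
	struct cee_config cee;
	struct ets_config ets;

	memset(&cee, 0, sizeof(cee));
	cee.pg_bw_pct[0] = 60;
	cee.pg_bw_pct[1] = 40;
	cee.prio_to_pg_map[3] = 1;

	cee_to_ets(&cee, &ets);
	assert(ets.tc_tx_bw[0] == 60);
	assert(ets.tc_rx_bw[1] == 40);
	assert(ets.prio_tc[3] == 1);
}
```

Bandwidth is copied per group only, which matches the note that per-priority
bandwidth allocation is not supported.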

Signed-off-by: Huy Nguyen 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  24 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 301 -
 drivers/net/ethernet/mellanox/mlx5/core/port.c |  43 +++
 include/linux/mlx5/port.h  |   4 +
 4 files changed, 370 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index a2b32ed..31387ed 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -221,6 +221,26 @@ struct mlx5e_params {
u32 lro_timeout;
 };
 
+#ifdef CONFIG_MLX5_CORE_EN_DCB
+struct mlx5e_cee_config {
+   /* bw pct for priority group */
+   u8 pg_bw_pct[CEE_DCBX_MAX_PGS];
+   u8 prio_to_pg_map[CEE_DCBX_MAX_PRIO];
+   bool   pfc_setting[CEE_DCBX_MAX_PRIO];
+   bool   pfc_enable;
+};
+
+enum {
+   MLX5_DCB_CHG_RESET,
+   MLX5_DCB_NO_CHG,
+   MLX5_DCB_CHG_NO_RESET,
+};
+
+struct mlx5e_dcbx {
+   struct mlx5e_cee_configcee_cfg; /* pending configuration */
+};
+#endif
+
 struct mlx5e_tstamp {
rwlock_t   lock;
struct cyclecountercycles;
@@ -688,6 +708,10 @@ struct mlx5e_priv {
struct mlx5e_stats stats;
struct mlx5e_tstamptstamp;
u16 q_counter;
+#ifdef CONFIG_MLX5_CORE_EN_DCB
+   struct mlx5e_dcbx  dcbx;
+#endif
+
const struct mlx5e_profile *profile;
void  *ppriv;
 };
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index 762af16..0595243 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -38,6 +38,9 @@
 #define MLX5E_100MB (10)
 #define MLX5E_1GB   (100)
 
+#define MLX5E_CEE_STATE_UP1
+#define MLX5E_CEE_STATE_DOWN  0
+
 static int mlx5e_dcbnl_ieee_getets(struct net_device *netdev,
   struct ieee_ets *ets)
 {
@@ -222,13 +225,15 @@ static int mlx5e_dcbnl_ieee_setpfc(struct net_device *dev,
 
 static u8 mlx5e_dcbnl_getdcbx(struct net_device *dev)
 {
-   return DCB_CAP_DCBX_HOST | DCB_CAP_DCBX_VER_IEEE;
+   return DCB_CAP_DCBX_HOST |
+  DCB_CAP_DCBX_VER_IEEE |
+  DCB_CAP_DCBX_VER_CEE;
 }
 
 static u8 mlx5e_dcbnl_setdcbx(struct net_device *dev, u8 mode)
 {
if ((mode & DCB_CAP_DCBX_LLD_MANAGED) ||
-   (mode & DCB_CAP_DCBX_VER_CEE) ||
+   !(mode & DCB_CAP_DCBX_VER_CEE) ||
!(mode & DCB_CAP_DCBX_VER_IEEE) ||
!(mode & DCB_CAP_DCBX_HOST))
return 1;
@@ -304,6 +309,281 @@ static int mlx5e_dcbnl_ieee_setmaxrate(struct net_device 
*netdev,
return mlx5_modify_port_ets_rate_limit(mdev, max_bw_value, max_bw_unit);
 }
 
+static u8 mlx5e_dcbnl_setall(struct net_device *netdev)
+{
+   struct mlx5e_priv *priv = netdev_priv(netdev);
+   struct mlx5e_cee_config *cee_cfg = &priv->dcbx.cee_cfg;
+   struct mlx5_core_dev *mdev = priv->mdev;
+   struct ieee_ets ets;
+   struct ieee_pfc pfc;
+   int err;
+   int i;
+
+   memset(&ets, 0, sizeof(ets));
+   memset(&pfc, 0, sizeof(pfc));
+
+   ets.ets_cap = IEEE_8021QAZ_MAX_TCS;
+   for (i = 0; i < CEE_DCBX_MAX_PGS; i++) {
+   ets.tc_tx_bw[i] = cee_cfg->pg_bw_pct[i];
+   ets.tc_rx_bw[i] = cee_cfg->pg_bw_pct[i];
+   ets.tc_tsa[i]   = IEEE_8021QAZ_TSA_ETS;
+   ets.prio_tc[i]  = cee_cfg->prio_to_pg_map[i];
+   }
+
+   err = mlx5e_dbcnl_validate_ets(netdev, &ets);
+   if (err) {
+   netdev_err(netdev,
+  "%s, Failed to validate ETS: %d\n", __func__, err);
+   goto out;
+   }
+
+   err = mlx5e_dcbnl_ieee_setets_core(priv, &ets);
+   if (err) {
+   netdev_err(netdev,
+  "%s, Failed to set ETS: %d\n", __func__, err);
+   goto out;
+   }
+
+   /* Set PFC */
+   pfc.pfc_cap = mlx5_max_tc(mdev) + 1;
+   if (!cee_cfg->pfc_enable)
+   pfc.pfc_en = 0;
+   else
+   for (i = 0; i <

[PATCH net-next 01/10] net/mlx5e: Add qos capability check

2016-11-26 Thread Saeed Mahameed

From: Huy Nguyen 

Make sure firmware supports qos before exposing the DCB API.

Signed-off-by: Huy Nguyen 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 19403d6..2b42112 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3549,7 +3549,8 @@ static void mlx5e_build_nic_netdev(struct net_device 
*netdev)
if (MLX5_CAP_GEN(mdev, vport_group_manager)) {
netdev->netdev_ops = &mlx5e_netdev_ops_sriov;
 #ifdef CONFIG_MLX5_CORE_EN_DCB
-   netdev->dcbnl_ops = &mlx5e_dcbnl_ops;
+   if (MLX5_CAP_GEN(mdev, qos))
+   netdev->dcbnl_ops = &mlx5e_dcbnl_ops;
 #endif
} else {
netdev->netdev_ops = &mlx5e_netdev_ops_basic;
-- 
2.7.4

[PATCH net-next 06/10] net/mlx5e: Add DCBX control interface

2016-11-26 Thread Saeed Mahameed

From: Huy Nguyen 

Use setdcbx interface to set the DCBX mode to firmware or os.
If setdcbx is called with mode value of zero, the DCBX mode
is set to firmware.
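
As a rough illustration of the rule described above (not the driver code itself), the request handling can be sketched in plain C. The enum and the function name dcbx_mode_from_request are hypothetical stand-ins; the DCB_CAP_DCBX_* values are assumed to match include/uapi/linux/dcbnl.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's DCBX operating modes. */
enum dcbx_oper_mode {
	DCBX_OPER_HOST,	/* DCBX negotiated by host software */
	DCBX_OPER_AUTO,	/* DCBX handled by firmware */
};

/* Capability bits, assumed to match include/uapi/linux/dcbnl.h. */
#define DCB_CAP_DCBX_HOST        0x01
#define DCB_CAP_DCBX_LLD_MANAGED 0x02
#define DCB_CAP_DCBX_VER_CEE     0x04
#define DCB_CAP_DCBX_VER_IEEE    0x08

/*
 * Decide which mode a setdcbx request asks for.
 * Returns 0 and stores the mode in *out on success, 1 on rejection,
 * following the u8 return convention of the setdcbx callback.
 */
int dcbx_mode_from_request(uint8_t mode, int fw_has_dcbx,
			   enum dcbx_oper_mode *out)
{
	assert(out);

	/* A request of zero hands control to firmware, if it supports DCBX. */
	if (!mode && fw_has_dcbx) {
		*out = DCBX_OPER_AUTO;
		return 0;
	}

	/* Otherwise the request must describe a host-managed IEEE + CEE setup. */
	if ((mode & DCB_CAP_DCBX_LLD_MANAGED) ||
	    !(mode & DCB_CAP_DCBX_VER_CEE) ||
	    !(mode & DCB_CAP_DCBX_VER_IEEE) ||
	    !(mode & DCB_CAP_DCBX_HOST))
		return 1;

	*out = DCBX_OPER_HOST;
	return 0;
}
```

With this sketch, a mode of zero on firmware without DCBX support is rejected, since it fails the host-mode checks.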

Signed-off-by: Huy Nguyen 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 27 +++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index 2e94717..64c45e9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -288,13 +288,34 @@ static int mlx5e_dcbnl_ieee_setpfc(struct net_device *dev,
 
 static u8 mlx5e_dcbnl_getdcbx(struct net_device *dev)
 {
-   return DCB_CAP_DCBX_HOST |
-  DCB_CAP_DCBX_VER_IEEE |
-  DCB_CAP_DCBX_VER_CEE;
+   struct mlx5e_priv *priv = netdev_priv(dev);
+   struct mlx5e_dcbx *dcbx = &priv->dcbx;
+   u8 mode = DCB_CAP_DCBX_VER_IEEE | DCB_CAP_DCBX_VER_CEE;
+
+   if (dcbx->mode == MLX5E_DCBX_PARAM_VER_OPER_HOST)
+   mode |= DCB_CAP_DCBX_HOST;
+
+   return mode;
 }
 
 static u8 mlx5e_dcbnl_setdcbx(struct net_device *dev, u8 mode)
 {
+   struct mlx5e_priv *priv = netdev_priv(dev);
+   struct mlx5e_dcbx *dcbx = &priv->dcbx;
+
+   if ((!mode) && MLX5_CAP_GEN(priv->mdev, dcbx)) {
+   if (dcbx->mode == MLX5E_DCBX_PARAM_VER_OPER_AUTO)
+   return 0;
+
+   /* set dcbx to fw controlled */
+   if (!mlx5e_dcbnl_set_dcbx_mode(priv, MLX5E_DCBX_PARAM_VER_OPER_AUTO)) {
+   dcbx->mode = MLX5E_DCBX_PARAM_VER_OPER_AUTO;
+   return 0;
+   }
+
+   return 1;
+   }
+
if (mlx5e_dcbnl_switch_to_host_mode(netdev_priv(dev)))
return 1;
 
-- 
2.7.4

[PATCH net-next 03/10] net/mlx5e: Read ETS settings directly from firmware

2016-11-26 Thread Saeed Mahameed

From: Huy Nguyen 

Issue description:
The current implementation saves the ETS settings from the user in
a temporary software copy and returns these settings when the user
queries the ETS settings.

With the new DCBX firmware, the ETS settings can be changed
by firmware when DCBX is in firmware controlled mode. Therefore,
the user would obtain stale values from the temporary software copy.

Solution:
1. Read the ETS settings directly from firmware.
2. For tc_tsa:
   a. Initialize tc_tsa to vendor IEEE_8021QAZ_TSA_VENDOR at netdev
  creation.
   b. When reading ETS setting from FW, if the traffic class bandwidth
      is less than 100, set tc_tsa to IEEE_8021QAZ_TSA_ETS. This
      handles the scenario where DCBX is under FW control and the
      willing bit is on, which means the ETS setting is dictated
      by the remote switch.
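
A minimal standalone sketch of the tc_tsa handling in step 2, assuming the usual constant values (8 traffic classes, 100 as the full bandwidth allocation, TSA_ETS = 2, TSA_VENDOR = 255). The function names tsa_init and tsa_refresh are hypothetical and only illustrate the rule; the driver implements it inside its own init and getets paths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TCS       8    /* assumed value of IEEE_8021QAZ_MAX_TCS */
#define MAX_BW_ALLOC  100  /* stand-in for MLX5E_MAX_BW_ALLOC */
#define TSA_ETS       2    /* assumed value of IEEE_8021QAZ_TSA_ETS */
#define TSA_VENDOR    255  /* assumed value of IEEE_8021QAZ_TSA_VENDOR */

/* Step 2a: at netdev creation every class starts out as "vendor". */
void tsa_init(uint8_t tc_tsa[MAX_TCS])
{
	for (size_t i = 0; i < MAX_TCS; i++)
		tc_tsa[i] = TSA_VENDOR;
}

/*
 * Step 2b: after reading per-class bandwidth from firmware, any class
 * with less than the full allocation is reported as ETS. Classes at
 * 100% keep whatever value was cached earlier.
 */
void tsa_refresh(uint8_t tc_tsa[MAX_TCS], const uint8_t tc_tx_bw[MAX_TCS],
		 size_t num_tc)
{
	assert(num_tc <= MAX_TCS);

	for (size_t i = 0; i < num_tc; i++)
		if (tc_tx_bw[i] < MAX_BW_ALLOC)
			tc_tsa[i] = TSA_ETS;
}
```

Note that in this sketch the cached value is only ever moved towards ETS by a query; resetting it is left to the set path, as in the patch.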

Signed-off-by: Huy Nguyen 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  6 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 35 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 28 +
 3 files changed, 47 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 31387ed..60aa13b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -214,9 +214,6 @@ struct mlx5e_params {
u8  toeplitz_hash_key[40];
u32 indirection_rqt[MLX5E_INDIR_RQT_SIZE];
bool vlan_strip_disable;
-#ifdef CONFIG_MLX5_CORE_EN_DCB
-   struct ieee_ets ets;
-#endif
bool rx_am_enabled;
u32 lro_timeout;
 };
@@ -238,6 +235,9 @@ enum {
 
 struct mlx5e_dcbx {
struct mlx5e_cee_configcee_cfg; /* pending configuration */
+
+   /* The only setting that cannot be read from FW */
+   u8 tc_tsa[IEEE_8021QAZ_MAX_TCS];
 };
 #endif
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index 0595243..8f6b5a7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -45,12 +45,31 @@ static int mlx5e_dcbnl_ieee_getets(struct net_device 
*netdev,
   struct ieee_ets *ets)
 {
struct mlx5e_priv *priv = netdev_priv(netdev);
+   struct mlx5_core_dev *mdev = priv->mdev;
+   int i;
+   int err = 0;
 
if (!MLX5_CAP_GEN(priv->mdev, ets))
return -ENOTSUPP;
 
-   memcpy(ets, &priv->params.ets, sizeof(*ets));
-   return 0;
+   ets->ets_cap = mlx5_max_tc(priv->mdev) + 1;
+   for (i = 0; i < ets->ets_cap; i++) {
+   err = mlx5_query_port_prio_tc(mdev, i, &ets->prio_tc[i]);
+   if (err)
+   return err;
+   }
+
+   for (i = 0; i < ets->ets_cap; i++) {
+   err = mlx5_query_port_tc_bw_alloc(mdev, i, &ets->tc_tx_bw[i]);
+   if (err)
+   return err;
+   if (ets->tc_tx_bw[i] < MLX5E_MAX_BW_ALLOC)
+   priv->dcbx.tc_tsa[i] = IEEE_8021QAZ_TSA_ETS;
+   }
+
+   memcpy(ets->tc_tsa, priv->dcbx.tc_tsa, sizeof(ets->tc_tsa));
+
+   return err;
 }
 
 enum {
@@ -127,7 +146,14 @@ int mlx5e_dcbnl_ieee_setets_core(struct mlx5e_priv *priv, 
struct ieee_ets *ets)
if (err)
return err;
 
-   return mlx5_set_port_tc_bw_alloc(mdev, tc_tx_bw);
+   err = mlx5_set_port_tc_bw_alloc(mdev, tc_tx_bw);
+
+   if (err)
+   return err;
+
+   memcpy(priv->dcbx.tc_tsa, ets->tc_tsa, sizeof(ets->tc_tsa));
+
+   return err;
 }
 
 static int mlx5e_dbcnl_validate_ets(struct net_device *netdev,
@@ -181,9 +207,6 @@ static int mlx5e_dcbnl_ieee_setets(struct net_device 
*netdev,
if (err)
return err;
 
-   memcpy(&priv->params.ets, ets, sizeof(*ets));
-   priv->params.ets.ets_cap = mlx5_max_tc(priv->mdev) + 1;
-
return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 2b42112..9743c4c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3329,17 +3329,23 @@ u16 mlx5e_get_max_inline_cap(struct mlx5_core_dev *mdev)
 static void mlx5e_ets_init(struct mlx5e_priv *priv)
 {
int i;
-
-   priv->params.ets.ets_cap = mlx5_max_tc(priv->mdev) + 1;
-   for (i = 0; i < priv->params.ets.ets_cap; i++) {
-   priv->params.ets.tc_tx_bw[i] = MLX5E_MAX_BW_ALLOC;
-   priv->params.ets.tc_tsa[i] = IEEE_8021QAZ_TSA_VENDOR;
-   priv->params.ets.prio_tc[i] = i;
+   struct ieee_ets ets;
+
+   memset(&ets, 0, sizeof(ets));
+   ets.ets_cap = mlx5_max_tc(priv->mdev) + 1;
+   for (i = 0; i < ets.ets_cap; i++) {
+

[PATCH net-next 05/10] net/mlx5e: ConnectX-4 firmware support for DCBX

2016-11-26 Thread Saeed Mahameed

From: Huy Nguyen 

This patch sets up the infrastructure to support the new
DCBX firmware.

Signed-off-by: Huy Nguyen 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  2 +
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 92 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 26 +-
 3 files changed, 95 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 60aa13b..6c954cd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -234,6 +234,7 @@ enum {
 };
 
 struct mlx5e_dcbx {
+   enum mlx5_dcbx_oper_mode   mode;
struct mlx5e_cee_configcee_cfg; /* pending configuration */
 
/* The only setting that cannot be read from FW */
@@ -843,6 +844,7 @@ extern const struct ethtool_ops mlx5e_ethtool_ops;
 #ifdef CONFIG_MLX5_CORE_EN_DCB
 extern const struct dcbnl_rtnl_ops mlx5e_dcbnl_ops;
 int mlx5e_dcbnl_ieee_setets_core(struct mlx5e_priv *priv, struct ieee_ets *ets);
+void mlx5e_dcbnl_initialize(struct mlx5e_priv *priv);
 #endif
 
 #ifndef CONFIG_RFS_ACCEL
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index 8f6b5a7..2e94717 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -41,6 +41,43 @@
 #define MLX5E_CEE_STATE_UP    1
 #define MLX5E_CEE_STATE_DOWN  0
 
+/* If dcbx mode is non-host set the dcbx mode to host.
+ */
+static int mlx5e_dcbnl_set_dcbx_mode(struct mlx5e_priv *priv,
+enum mlx5_dcbx_oper_mode mode)
+{
+   struct mlx5_core_dev *mdev = priv->mdev;
+   u32 param[MLX5_ST_SZ_DW(dcbx_param)];
+   int err;
+
+   err = mlx5_query_port_dcbx_param(mdev, param);
+   if (err)
+   return err;
+
+   MLX5_SET(dcbx_param, param, version_admin, mode);
+   if (mode != MLX5E_DCBX_PARAM_VER_OPER_HOST)
+   MLX5_SET(dcbx_param, param, willing_admin, 1);
+
+   return mlx5_set_port_dcbx_param(mdev, param);
+}
+
+static int mlx5e_dcbnl_switch_to_host_mode(struct mlx5e_priv *priv)
+{
+   struct mlx5e_dcbx *dcbx = &priv->dcbx;
+
+   if (!MLX5_CAP_GEN(priv->mdev, dcbx))
+   return 0;
+
+   if (dcbx->mode == MLX5E_DCBX_PARAM_VER_OPER_HOST)
+   return 0;
+
+   if (mlx5e_dcbnl_set_dcbx_mode(priv, MLX5E_DCBX_PARAM_VER_OPER_HOST))
+   return 1;
+
+   dcbx->mode = MLX5E_DCBX_PARAM_VER_OPER_HOST;
+   return 0;
+}
+
 static int mlx5e_dcbnl_ieee_getets(struct net_device *netdev,
   struct ieee_ets *ets)
 {
@@ -199,6 +236,9 @@ static int mlx5e_dcbnl_ieee_setets(struct net_device 
*netdev,
struct mlx5e_priv *priv = netdev_priv(netdev);
int err;
 
+   if (!MLX5_CAP_GEN(priv->mdev, ets))
+   return -ENOTSUPP;
+
err = mlx5e_dbcnl_validate_ets(netdev, ets);
if (err)
return err;
@@ -255,6 +295,9 @@ static u8 mlx5e_dcbnl_getdcbx(struct net_device *dev)
 
 static u8 mlx5e_dcbnl_setdcbx(struct net_device *dev, u8 mode)
 {
+   if (mlx5e_dcbnl_switch_to_host_mode(netdev_priv(dev)))
+   return 1;
+
if ((mode & DCB_CAP_DCBX_LLD_MANAGED) ||
!(mode & DCB_CAP_DCBX_VER_CEE) ||
!(mode & DCB_CAP_DCBX_VER_IEEE) ||
@@ -634,3 +677,52 @@ const struct dcbnl_rtnl_ops mlx5e_dcbnl_ops = {
.getpfcstate= mlx5e_dcbnl_getpfcstate,
.setpfcstate= mlx5e_dcbnl_setpfcstate,
 };
+
+static void mlx5e_dcbnl_query_dcbx_mode(struct mlx5e_priv *priv,
+   enum mlx5_dcbx_oper_mode *mode)
+{
+   u32 out[MLX5_ST_SZ_DW(dcbx_param)];
+
+   *mode = MLX5E_DCBX_PARAM_VER_OPER_HOST;
+
+   if (!mlx5_query_port_dcbx_param(priv->mdev, out))
+   *mode = MLX5_GET(dcbx_param, out, version_oper);
+
+   /* From driver's point of view, we only care if the mode
+* is host (HOST) or non-host (AUTO)
+*/
+   if (*mode != MLX5E_DCBX_PARAM_VER_OPER_HOST)
+   *mode = MLX5E_DCBX_PARAM_VER_OPER_AUTO;
+}
+
+static void mlx5e_ets_init(struct mlx5e_priv *priv)
+{
+   int i;
+   struct ieee_ets ets;
+
+   memset(&ets, 0, sizeof(ets));
+   ets.ets_cap = mlx5_max_tc(priv->mdev) + 1;
+   for (i = 0; i < ets.ets_cap; i++) {
+   ets.tc_tx_bw[i] = MLX5E_MAX_BW_ALLOC;
+   ets.tc_tsa[i] = IEEE_8021QAZ_TSA_VENDOR;
+   ets.prio_tc[i] = i;
+   }
+
+   memcpy(priv->dcbx.tc_tsa, ets.tc_tsa, sizeof(ets.tc_tsa));
+
+   /* tclass[prio=0]=1, tclass[prio=1]=0, tclass[prio=i]=i (for i>1) */
+   ets.prio_tc[0] = 1;
+   ets.prio_tc[1] = 0;
+
+   mlx5e_dcbnl_ieee_setets_core(priv, &ets);
+}
+
+void

[PATCH net-next 08/10] net/mlx5e: Add support for loopback selftest

2016-11-26 Thread Saeed Mahameed

Extend the self diagnostic tests to support loopback test.

The loopback test doesn't require the offline flag, it will use the
generic dev_queue_xmit and a dedicated packet_type to capture and verify
mlx5e selftest loopback packets.
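
To illustrate how a receive hook can recognise the test's own packets, here is a small userspace sketch of a payload header carrying a version, a magic value and a text tag. The magic and text values are taken from the patch; the packed byte layout, the offsets and the helper names test_hdr_fill and test_hdr_valid are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TEST_MAGIC  0x5AEED15C001ULL   /* MLX5E_TEST_MAGIC from the patch */
#define TEST_TEXT   "MLX5E SELF TEST"  /* mlx5e_test_text from the patch */
#define TEXT_LEN    32                 /* assumed value of ETH_GSTRING_LEN */
#define HDR_LEN     (4 + 8 + TEXT_LEN) /* version + magic + text, packed */

/* Store the low n bytes of v at dst in big-endian (network) order. */
static void put_be(uint8_t *dst, uint64_t v, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] = (uint8_t)(v >> (8 * (n - 1 - i)));
}

/* Read n big-endian bytes from src. */
static uint64_t get_be(const uint8_t *src, size_t n)
{
	uint64_t v = 0;

	for (size_t i = 0; i < n; i++)
		v = (v << 8) | src[i];
	return v;
}

/* Fill the payload header the sender places after the UDP header. */
void test_hdr_fill(uint8_t buf[HDR_LEN])
{
	memset(buf, 0, HDR_LEN);
	put_be(buf, 0, 4);                              /* version */
	put_be(buf + 4, TEST_MAGIC, 8);                 /* magic */
	memcpy(buf + 12, TEST_TEXT, sizeof(TEST_TEXT)); /* text, NUL included */
}

/* Return 1 if the received payload carries the expected magic and text. */
int test_hdr_valid(const uint8_t *buf, size_t len)
{
	if (len < HDR_LEN)
		return 0;
	if (get_be(buf + 4, 8) != TEST_MAGIC)
		return 0;
	return strncmp((const char *)buf + 12, TEST_TEXT, TEXT_LEN) == 0;
}
```

Checking both the magic and the text keeps unrelated traffic that happens to reach the hook from being counted as a successful loopback.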

Signed-off-by: Saeed Mahameed 
Signed-off-by: Kamal Heib 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   3 +-
 .../net/ethernet/mellanox/mlx5/core/en_common.c|   7 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   2 +-
 .../net/ethernet/mellanox/mlx5/core/en_selftest.c  | 218 +
 4 files changed, 227 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index f7bb4a7..9cf32d3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -885,7 +885,8 @@ void mlx5e_destroy_tir(struct mlx5_core_dev *mdev,
   struct mlx5e_tir *tir);
 int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev);
 void mlx5e_destroy_mdev_resources(struct mlx5_core_dev *mdev);
-int mlx5e_refresh_tirs_self_loopback_enable(struct mlx5_core_dev *mdev);
+int mlx5e_refresh_tirs_self_loopback(struct mlx5_core_dev *mdev,
+bool enable_uc_lb);
 
 struct mlx5_eswitch_rep;
 int mlx5e_vport_rep_load(struct mlx5_eswitch *esw,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
index 029e856..f175518 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
@@ -137,7 +137,8 @@ void mlx5e_destroy_mdev_resources(struct mlx5_core_dev 
*mdev)
mlx5_unmap_free_uar(mdev, >cq_uar);
 }
 
-int mlx5e_refresh_tirs_self_loopback_enable(struct mlx5_core_dev *mdev)
+int mlx5e_refresh_tirs_self_loopback(struct mlx5_core_dev *mdev,
+bool enable_uc_lb)
 {
struct mlx5e_tir *tir;
void *in;
@@ -149,6 +150,10 @@ int mlx5e_refresh_tirs_self_loopback_enable(struct 
mlx5_core_dev *mdev)
if (!in)
return -ENOMEM;
 
+   if (enable_uc_lb)
+   MLX5_SET(modify_tir_in, in, ctx.self_lb_block,
+MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST_);
+
MLX5_SET(modify_tir_in, in, bitmask.self_lb_en, 1);
 
list_for_each_entry(tir, >mlx5e_res.td.tirs_list, list) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index f5b93c2..89d5c65 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2136,7 +2136,7 @@ int mlx5e_open_locked(struct net_device *netdev)
goto err_clear_state_opened_flag;
}
 
-   err = mlx5e_refresh_tirs_self_loopback_enable(priv->mdev);
+   err = mlx5e_refresh_tirs_self_loopback(priv->mdev, false);
if (err) {
netdev_err(netdev, "%s: mlx5e_refresh_tirs_self_loopback_enable failed, %d\n",
   __func__, err);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
index a25dfc5..a823054 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
@@ -30,12 +30,16 @@
  * SOFTWARE.
  */
 
+#include 
+#include 
+#include 
 #include "en.h"
 
 enum {
MLX5E_ST_LINK_STATE,
MLX5E_ST_LINK_SPEED,
MLX5E_ST_HEALTH_INFO,
+   MLX5E_ST_LOOPBACK,
MLX5E_ST_NUM,
 };
 
@@ -43,6 +47,7 @@ const char mlx5e_self_tests[MLX5E_ST_NUM][ETH_GSTRING_LEN] = {
"Link Test",
"Speed Test",
"Health Test",
+   "Loopback Test",
 };
 
 int mlx5e_self_test_num(struct mlx5e_priv *priv)
@@ -88,10 +93,223 @@ static int mlx5e_test_link_speed(struct mlx5e_priv *priv)
return 1;
 }
 
+/* loopback test */
+#define MLX5E_TEST_PKT_SIZE (MLX5_MPWRQ_SMALL_PACKET_THRESHOLD - NET_IP_ALIGN)
+static const char mlx5e_test_text[ETH_GSTRING_LEN] = "MLX5E SELF TEST";
+#define MLX5E_TEST_MAGIC 0x5AEED15C001
+
+struct mlx5ehdr {
+   __be32 version;
+   __be64 magic;
+   char   text[ETH_GSTRING_LEN];
+};
+
+static struct sk_buff *mlx5e_test_get_udp_skb(struct mlx5e_priv *priv)
+{
+   struct sk_buff *skb = NULL;
+   struct mlx5ehdr *mlxh;
+   struct ethhdr *ethh;
+   struct udphdr *udph;
+   struct iphdr *iph;
+   int datalen, iplen;
+
+   datalen = MLX5E_TEST_PKT_SIZE -
+ (sizeof(*ethh) + sizeof(*iph) + sizeof(*udph));
+
+   skb = netdev_alloc_skb(priv->netdev, MLX5E_TEST_PKT_SIZE);
+   if (!skb) {
+   netdev_err(priv->netdev, "\tFailed to alloc loopback skb\n");
+   return NULL;
+   }
+
+   prefetchw(skb->data);
+   skb_reserve(skb, NET_IP_ALIGN);
+
+   /*

[PATCH net-next 10/10] net/mlx5e: Add CQE compression user control

2016-11-26 Thread Saeed Mahameed

From: Shaker Daibes 

The user can now override the automatic driver decision using the
rx_cqe_compress flag, which is the preference for CQE compression.
The flag is initialized with the automatic driver decision.
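
The checks applied when the flag is toggled can be sketched in userspace C as follows. struct nic_state and set_rx_cqe_compress are hypothetical stand-ins for the driver's private state and handler, and -EOPNOTSUPP replaces the kernel-internal -ENOTSUPP, which is not available in userspace errno.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Private flag bits, mirroring enum mlx5e_priv_flag in the patch. */
enum pflag {
	PFLAG_RX_CQE_BASED_MODER = 1 << 0,
	PFLAG_RX_CQE_COMPRESS    = 1 << 1,
};

/* Hypothetical stand-in for the relevant parts of the driver state. */
struct nic_state {
	uint32_t pflags;
	bool hw_cqe_compression; /* device capability bit */
	bool rx_timestamping;    /* rx_filter != HWTSTAMP_FILTER_NONE */
};

/* Apply the user's preference, refusing combinations the patch rejects. */
int set_rx_cqe_compress(struct nic_state *s, bool enable)
{
	assert(s);

	if (!s->hw_cqe_compression)
		return -EOPNOTSUPP;

	/* Compression cannot be turned on while RX timestamping is active. */
	if (enable && s->rx_timestamping)
		return -EINVAL;

	if (enable)
		s->pflags |= PFLAG_RX_CQE_COMPRESS;
	else
		s->pflags &= ~(uint32_t)PFLAG_RX_CQE_COMPRESS;
	return 0;
}
```

The sketch leaves out the close/reopen of the channels that the driver performs when the interface is up.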

Signed-off-by: Shaker Daibes 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  5 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c |  3 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 39 --
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 13 
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|  4 +--
 5 files changed, 51 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 84ac78f..442dbc3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -171,10 +171,12 @@ extern const char mlx5e_self_tests[][ETH_GSTRING_LEN];
 
 static const char mlx5e_priv_flags[][ETH_GSTRING_LEN] = {
"rx_cqe_moder",
+   "rx_cqe_compress",
 };
 
 enum mlx5e_priv_flag {
MLX5E_PFLAG_RX_CQE_BASED_MODER = (1 << 0),
+   MLX5E_PFLAG_RX_CQE_COMPRESS = (1 << 1),
 };
 
 #define MLX5E_SET_PFLAG(priv, pflag, enable)   \
@@ -205,8 +207,7 @@ struct mlx5e_params {
u16 num_channels;
u8  num_tc;
u8  rx_cq_period_mode;
-   bool rx_cqe_compress_admin;
-   bool rx_cqe_compress;
+   bool rx_cqe_compress_def;
struct mlx5e_cq_moder rx_cq_moderation;
struct mlx5e_cq_moder tx_cq_moderation;
u16 min_rx_wqes;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
index 13dc388..2cd8e56 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
@@ -94,7 +94,7 @@ int mlx5e_hwstamp_set(struct net_device *dev, struct ifreq 
*ifr)
switch (config.rx_filter) {
case HWTSTAMP_FILTER_NONE:
/* Reset CQE compression to Admin default */
-   mlx5e_modify_rx_cqe_compression(priv, priv->params.rx_cqe_compress_admin);
+   mlx5e_modify_rx_cqe_compression(priv, priv->params.rx_cqe_compress_def);
break;
case HWTSTAMP_FILTER_ALL:
case HWTSTAMP_FILTER_SOME:
@@ -111,6 +111,7 @@ int mlx5e_hwstamp_set(struct net_device *dev, struct ifreq 
*ifr)
case HWTSTAMP_FILTER_PTP_V2_SYNC:
case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
/* Disable CQE compression */
+   netdev_warn(dev, "Disabling cqe compression");
mlx5e_modify_rx_cqe_compression(priv, false);
config.rx_filter = HWTSTAMP_FILTER_ALL;
break;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index d2bdccb..aa963d7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1481,6 +1481,35 @@ static int set_pflag_rx_cqe_based_moder(struct 
net_device *netdev, bool enable)
return err;
 }
 
+static int set_pflag_rx_cqe_compress(struct net_device *netdev,
+bool enable)
+{
+   struct mlx5e_priv *priv = netdev_priv(netdev);
+   struct mlx5_core_dev *mdev = priv->mdev;
+   int err = 0;
+   bool reset;
+
+   if (!MLX5_CAP_GEN(mdev, cqe_compression))
+   return -ENOTSUPP;
+
+   if (enable && priv->tstamp.hwtstamp_config.rx_filter != HWTSTAMP_FILTER_NONE) {
+   netdev_err(netdev, "Can't enable cqe compression while timestamping is enabled.\n");
+   return -EINVAL;
+   }
+
+   reset = test_bit(MLX5E_STATE_OPENED, &priv->state);
+
+   if (reset)
+   mlx5e_close_locked(netdev);
+
+   MLX5E_SET_PFLAG(priv, MLX5E_PFLAG_RX_CQE_COMPRESS, enable);
+   priv->params.rx_cqe_compress_def = enable;
+
+   if (reset)
+   err = mlx5e_open_locked(netdev);
+   return err;
+}
+
 static int mlx5e_handle_pflag(struct net_device *netdev,
  u32 wanted_flags,
  enum mlx5e_priv_flag flag,
@@ -1511,13 +1540,19 @@ static int mlx5e_set_priv_flags(struct net_device 
*netdev, u32 pflags)
int err;
 
mutex_lock(&priv->state_lock);
-
err = mlx5e_handle_pflag(netdev, pflags,
 MLX5E_PFLAG_RX_CQE_BASED_MODER,
 set_pflag_rx_cqe_based_moder);
+   if (err)
+   goto out;
 
+   err = mlx5e_handle_pflag(netdev, pflags,
+MLX5E_PFLAG_RX_CQE_COMPRESS,
+set_pflag_rx_cqe_compress);
+
+out:
mutex_unlock(&priv->state_lock);
-   return err ? -EINVAL : 0;
+   return err;

[PATCH net-next 00/10] Mellanox 100G mlx5 DCBX and ethtool updates

2016-11-26 Thread Saeed Mahameed

Hi Dave, 

This series provides the following mlx5 updates:

From Huy:
DCBX CEE API and DCBX firmware/host modes support.
- 1st patch ensures the dcbnl_rtnl_ops is published only when the qos
  capability bit is on.
- 2nd patch adds the support for CEE interfaces into mlx5 dcbnl_rtnl_ops.
- 3rd patch refactors ETS query to read ETS configuration directly from
  firmware rather than having a software shadow of it. The existing IEEE
  interfaces stay the same.
- 4th patch adds the support for MLX5_REG_DCBX_PARAM and MLX5_REG_DCBX_APP
  firmware commands to manipulate mlx5 DCBX mode.
- 5th patch adds the driver support for the new DCBX firmware. This ensures
  backward compatibility with both the old and new firmware. With the new
  DCBX firmware, qos settings can be controlled by either firmware or
  software depending on the DCBX mode.

From Kamal and Saeed:
- mlx5 self-test support.

From Shaker:
- Private flag to give the user the ability to enable/disable mlx5 CQE
  compression.

This series was generated against commit:
e5f12b3f5ebb ("Merge branch 'mlxsw-trap-groups-and-policers'")

Thanks
Saeed.

Huy Nguyen (6):
  net/mlx5e: Add qos capability check
  net/mlx5e: Support DCBX CEE API
  net/mlx5e: Read ETS settings directly from firmware
  net/mlx5: Add DCBX firmware commands support
  net/mlx5e: ConnectX-4 firmware support for DCBX
  net/mlx5e: Add DCBX control interface

Kamal Heib (1):
  net/mlx5e: Add support for ethtool self diagnostics test

Saeed Mahameed (1):
  net/mlx5e: Add support for loopback selftest

Shaker Daibes (2):
  net/mlx5e: Moves pflags to priv->params
  net/mlx5e: Add CQE compression user control

 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  61 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c |   3 +-
 .../net/ethernet/mellanox/mlx5/core/en_common.c|   7 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 449 -
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  53 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  46 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|   4 +-
 .../net/ethernet/mellanox/mlx5/core/en_selftest.c  | 344 
 drivers/net/ethernet/mellanox/mlx5/core/port.c |  63 +++
 include/linux/mlx5/driver.h|   7 +
 include/linux/mlx5/port.h  |   6 +
 12 files changed, 980 insertions(+), 65 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c

-- 
2.7.4

Re: net: stmmac: Meson GXBB: attempting to execute userspace memory

2016-11-26 Thread Heinrich Schuchardt

On 11/26/2016 10:08 AM, Martin Blumenstingl wrote:
> Hello Heinrich,
> 
> On Sat, Nov 26, 2016 at 8:53 AM, Heinrich Schuchardt
>  wrote:
>> For Odroid C2 I have compiled kernel
>> 4.9.0-rc6-next-20161124-1-gbf7e142
>> with one additional patch
>> https://github.com/xypron/kernel-odroid-c2/blob/master/patch/0001-stmmac-RTL8211F-Meson-GXBB-TX-throughput-problems.patch
>>
>> I repeatedly see faults like the one below:
> do you see the same errors with the RTL8211F patch *not* applied?
> 
>> [ 2557.400796] Unhandled fault: synchronous external abort (0x9210)
>> at 0x40001e8ee4b0
>> [ 2557.952413] CPU: 0 PID: 22837 Comm: cc1 Tainted: G  D
>> 4.9.0-rc6-next-20161124-1-gbf7e142 #1
>> [ 2557.962062] Hardware name: Hardkernel ODROID-C2 (DT)
>> [ 2557.966980] task: 80006ddb7080 task.stack: 80006dd9c000
>> [ 2557.972846] PC is at 0x6a0d98
>> [ 2557.975776] LR is at 0x6a0e54
>> [ 2557.978709] pc : [<006a0d98>] lr : [<006a0e54>]
>> pstate: 8000
>> [ 2557.986040] sp : f3ee5f80
>> [ 2557.989318] x29: f3ee5f80 x28: 4b3f1240
>> [ 2557.994578] x27: 012a7000 x26: 4b3f1288
>> [ 2557.999840] x25: 00f58f88 x24: 4b3f1240
>> [ 2558.005101] x23:  x22: 0001
>> [ 2558.010362] x21: 0001 x20: 4b3f1250
>> [ 2558.015623] x19: 0054 x18: 0001
>> [ 2558.020885] x17: 48acaa10 x16: 01285050
>> [ 2558.026146] x15: 4ad96dc8 x14: 001f
>> [ 2558.031407] x13: 4b3f1270 x12: 4b3f1258
>> [ 2558.036668] x11: 01347000 x10: 0661
>> [ 2558.041930] x9 : 0005 x8 : 0003
>> [ 2558.047191] x7 : 4b3f1240 x6 : 20020033
>> [ 2558.052452] x5 : 4b402020 x4 : 4b3e1aa0
>> [ 2558.057713] x3 : 000c x2 : 0020
>> [ 2558.062974] x1 : 00f45000 x0 : 0065
>> [ 2558.068235]
>> [ 2558.069712] Internal error: Attempting to execute userspace memory:
>> 860f [#7] PREEMPT SMP
>> [ 2558.078155] Modules linked in: meson_rng rng_core meson_gxbb_wdt
>> ip_tables x_tables ipv6 dwmac_generic realtek dwmac_meson8b
>> stmmac_platform stmmac
>> [ 2558.091267] CPU: 0 PID: 22837 Comm: cc1 Tainted: G  D
>> 4.9.0-rc6-next-20161124-1-gbf7e142 #1
>> [ 2558.100925] Hardware name: Hardkernel ODROID-C2 (DT)
>> [ 2558.105841] task: 80006ddb7080 task.stack: 80006dd9c000
>> [ 2558.111706] PC is at 0x6a0e54
>> [ 2558.114638] LR is at 0x6a0e54
>> [ 2558.117571] pc : [<006a0e54>] lr : [<006a0e54>]
>> pstate: 63c5
>> [ 2558.124902] sp : 80006dd9fec0
>> [ 2558.128179] x29:  x28: 80006ddb7080
>> [ 2558.133441] x27: 012a7000 x26: 4b3f1288
>> [ 2558.138702] x25: 00f58f88 x24: 4b3f1240
>> [ 2558.143963] x23: 8000 x22: 006a0d98
>> [ 2558.149225] x21:  x20: 80006e223000
>> [ 2558.154486] x19:  x18: 0010
>> [ 2558.159747] x17: 48acaa10 x16: 01285050
>> [ 2558.165008] x15: 88e91f07 x14: 0006
>> [ 2558.170270] x13: 08e91f15 x12: 000f
>> [ 2558.175531] x11: 0002 x10: 02ea
>> [ 2558.180792] x9 : 80006dd9fb40 x8 : 00010a8b
>> [ 2558.186053] x7 :  x6 : 020e
>> [ 2558.191315] x5 : 020f020e x4 : 
>> [ 2558.196576] x3 :  x2 : 020f
>> [ 2558.201837] x1 : 80006ddb7080 x0 : 
>> [ 2558.207098]
>> [ 2558.208565] Process cc1 (pid: 22837, stack limit = 0x80006dd9c000)
>> [ 2558.215035] Stack: (0x80006dd9fec0 to 0x80006dda)
>> [ 2558.220728] fec0: 0065 00f45000 0020
>> 000c
>> [ 2558.228490] fee0: 4b3e1aa0 4b402020 20020033
>> 4b3f1240
>> [ 2558.236253] ff00: 0003 0005 0661
>> 01347000
>> [ 2558.244015] ff20: 4b3f1258 4b3f1270 001f
>> 4ad96dc8
>> [ 2558.251778] ff40: 01285050 48acaa10 0001
>> 0054
>> [ 2558.259540] ff60: 4b3f1250 0001 0001
>> 
>> [ 2558.267303] ff80: 4b3f1240 00f58f88 4b3f1288
>> 012a7000
>> [ 2558.275065] ffa0: 4b3f1240 f3ee5f80 006a0e54
>> f3ee5f80
>> [ 2558.282828] ffc0: 006a0d98 8000 0003
>> 
>> [ 2558.290590] ffe0:   
>> 
>> [ 2558.298351] Call trace:
>> [ 2558.300769] Exception stack(0x80006dd9fcf0 to 0x80006dd9fe20)
>> [ 2558.307149] fce0:   
>> 0001
>> [ 2558.314913] fd00: 80006dd9fec0

Re: Gigabit ethernet driver for Alacritechs SLIC devices (v3)

2016-11-26 Thread Lino Sanfilippo

On 26.11.2016 13:20, Lino Sanfilippo wrote:

> v3:
> - don't add defines to pci.h but instead put them into the driver's header file

This should of course be "pci_ids.h".

Regards,
Lino

[PATCH v3 net-next 2/2] MAINTAINERS: add entry for slicoss ethernet driver

2016-11-26 Thread Lino Sanfilippo

Add myself as maintainer for the slicoss ethernet driver.

Signed-off-by: Lino Sanfilippo 
---
 MAINTAINERS | 5 +
 1 file changed, 5 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6781a3f..bb9af28 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -562,6 +562,11 @@ T: git git://linuxtv.org/anttip/media_tree.git
 S: Maintained
 F: drivers/media/usb/airspy/
 
+ALACRITECH GIGABIT ETHERNET DRIVER
+M: Lino Sanfilippo 
+S: Maintained
+F: drivers/net/ethernet/alacritech/*
+
 ALCATEL SPEEDTOUCH USB DRIVER
 M: Duncan Sands 
 L: linux-...@vger.kernel.org
-- 
1.9.1

[PATCH v3 net-next 1/2] net: ethernet: slicoss: add slicoss gigabit ethernet driver

2016-11-26 Thread Lino Sanfilippo

Add driver for Alacritech gigabit ethernet cards with SLIC (session-layer
interface control) technology. The driver provides basic support without
SLIC for the following devices:

- Mojave cards (single port PCI Gigabit) both copper and fiber
- Oasis cards (single and dual port PCI-x Gigabit) copper and fiber
- Kalahari cards (dual and quad port PCI-e Gigabit) copper and fiber

Signed-off-by: Lino Sanfilippo 
---
 drivers/net/ethernet/Kconfig  |1 +
 drivers/net/ethernet/Makefile |1 +
 drivers/net/ethernet/alacritech/Kconfig   |   28 +
 drivers/net/ethernet/alacritech/Makefile  |4 +
 drivers/net/ethernet/alacritech/slic.h|  576 +
 drivers/net/ethernet/alacritech/slicoss.c | 1867 +
 6 files changed, 2477 insertions(+)
 create mode 100644 drivers/net/ethernet/alacritech/Kconfig
 create mode 100644 drivers/net/ethernet/alacritech/Makefile
 create mode 100644 drivers/net/ethernet/alacritech/slic.h
 create mode 100644 drivers/net/ethernet/alacritech/slicoss.c

diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index 2ffd634..a4cc87fe 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -21,6 +21,7 @@ source "drivers/net/ethernet/3com/Kconfig"
 source "drivers/net/ethernet/adaptec/Kconfig"
 source "drivers/net/ethernet/aeroflex/Kconfig"
 source "drivers/net/ethernet/agere/Kconfig"
+source "drivers/net/ethernet/alacritech/Kconfig"
 source "drivers/net/ethernet/allwinner/Kconfig"
 source "drivers/net/ethernet/alteon/Kconfig"
 source "drivers/net/ethernet/altera/Kconfig"
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index 1d349e9..b448027 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -7,6 +7,7 @@ obj-$(CONFIG_NET_VENDOR_8390) += 8390/
 obj-$(CONFIG_NET_VENDOR_ADAPTEC) += adaptec/
 obj-$(CONFIG_GRETH) += aeroflex/
 obj-$(CONFIG_NET_VENDOR_AGERE) += agere/
+obj-$(CONFIG_NET_VENDOR_ALACRITECH) += alacritech/
 obj-$(CONFIG_NET_VENDOR_ALLWINNER) += allwinner/
 obj-$(CONFIG_NET_VENDOR_ALTEON) += alteon/
 obj-$(CONFIG_ALTERA_TSE) += altera/
diff --git a/drivers/net/ethernet/alacritech/Kconfig 
b/drivers/net/ethernet/alacritech/Kconfig
new file mode 100644
index 000..41000a3
--- /dev/null
+++ b/drivers/net/ethernet/alacritech/Kconfig
@@ -0,0 +1,28 @@
+config NET_VENDOR_ALACRITECH
+bool "Alacritech devices"
+default y
+---help---
+  If you have a network (Ethernet) card belonging to this class, say Y.
+
+  Note that the answer to this question doesn't directly affect the
+  kernel: saying N will just cause the configurator to skip all
+  the questions about Alacritech devices. If you say Y, you will be asked
+  for your specific device in the following questions.
+
+if NET_VENDOR_ALACRITECH
+
+config SLICOSS
+   tristate "Alacritech Slicoss support"
+   depends on PCI
+   select CRC32
+   ---help---
+ This driver supports Gigabit Ethernet adapters based on the
+ Session Layer Interface (SLIC) technology by Alacritech.
+
+ Supported are Mojave (1 port) and Oasis (1, 2 and 4 port) cards,
+ both copper and fiber.
+
+ To compile this driver as a module, choose M here: the module
+ will be called slicoss. This is recommended.
+
+endif # NET_VENDOR_ALACRITECH
diff --git a/drivers/net/ethernet/alacritech/Makefile 
b/drivers/net/ethernet/alacritech/Makefile
new file mode 100644
index 000..8790e9e
--- /dev/null
+++ b/drivers/net/ethernet/alacritech/Makefile
@@ -0,0 +1,4 @@
+#
+# Makefile for the Alacritech Slicoss driver
+#
+obj-$(CONFIG_SLICOSS) += slicoss.o
diff --git a/drivers/net/ethernet/alacritech/slic.h b/drivers/net/ethernet/alacritech/slic.h
new file mode 100644
index 000..c62d46b
--- /dev/null
+++ b/drivers/net/ethernet/alacritech/slic.h
@@ -0,0 +1,576 @@
+
+#ifndef _SLIC_H
+#define _SLIC_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define SLIC_VGBSTAT_XPERR     0x4000
+#define SLIC_VGBSTAT_XERRSHFT  25
+#define SLIC_VGBSTAT_XCSERR    0x23
+#define SLIC_VGBSTAT_XUFLOW    0x22
+#define SLIC_VGBSTAT_XHLEN     0x20
+#define SLIC_VGBSTAT_NETERR    0x0100
+#define SLIC_VGBSTAT_NERRSHFT  16
+#define SLIC_VGBSTAT_NERRMSK   0x1ff
+#define SLIC_VGBSTAT_NCSERR    0x103
+#define SLIC_VGBSTAT_NUFLOW    0x102
+#define SLIC_VGBSTAT_NHLEN     0x100
+#define SLIC_VGBSTAT_LNKERR    0x0080
+#define SLIC_VGBSTAT_LERRMSK   0xff
+#define SLIC_VGBSTAT_LDEARLY   0x86
+#define SLIC_VGBSTAT_LBOFLO    0x85
+#define SLIC_VGBSTAT_LCODERR   0x84
+#define SLIC_VGBSTAT_LDBLNBL   0x83
+#define SLIC_VGBSTAT_LCRCERR   0x82
+#define SLIC_VGBSTAT_LOFLO     0x81
+#define SLIC_VGBSTAT_LUFLO

Gigabit ethernet driver for Alacritechs SLIC devices (v3)

2016-11-26 Thread Lino Sanfilippo

Hi,

this is the third version of the slicoss gigabit ethernet driver (which is a
rework of the driver from Alacritech which can currently be found under
drivers/staging/slicoss). The driver is supposed to support Mojave, Oasis and
Kalahari cards, for both copper and fiber.

If this code is accepted the staging version can be removed.

The driver has been tested on a SEN2104ET adapter (4 Port PCIe copper).

v3:
- don't add defines to pci.h but instead put them into the driver's header file
  (requested by Greg Kroah-Hartman)

v2:
- remove unusual padding in statistic strings (suggested by Andrew Lunn)
- for mdio register and bit names use defines from mii.h instead of own ones
  (suggested by Andrew Lunn)
- remove unused defines
- ensure PCI flush at two more places
- use mmiowb before lock to prevent mmio writes leaking out of lock
- fix some typos in comments
- add copyright and GPL header

Regards,
Lino

Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()

2016-11-26 Thread Daniel Borkmann


On 11/26/2016 07:46 AM, Cong Wang wrote:

On Thu, Nov 24, 2016 at 7:20 AM, Daniel Borkmann  wrote:


Ok, strange, qdisc_destroy() calls into ops->destroy(), where ingress
drops its entire chain via tcf_destroy_chain(), so that will be NULL
eventually. The tps are freed by call_rcu() as well as qdisc itself
later on via qdisc_rcu_free(), where it frees per-cpu bstats as well.
Outstanding readers should either bail out due to if (!cl) or can still
process the chain until read section ends, but during that time, cl->q
resp. bstats should be good. Do you happen to know what's at address
880a68b04028? I was wondering wrt call_rcu() vs call_rcu_bh(), but
at least on ingress (netif_receive_skb_internal()) we hold rcu_read_lock()
here. The KASAN report is reliably happening at this location, right?
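
For readers following the thread, the lifetime rule described above (unlink
the chain first, free the tps only after a grace period, readers either see
NULL or keep a pointer that stays valid until their read section ends) can be
sketched as a toy, single-threaded userspace model. This is only an
illustration under stated assumptions: every toy_* name is hypothetical, and
none of this is the kernel's RCU or net_sched API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a classifier (tp); "stats" models cl->q / bstats. */
struct toy_tp {
    int stats;
    struct toy_tp *next_free;   /* link in the deferred-free list */
};

static struct toy_tp *toy_chain;    /* what new readers see */
static struct toy_tp *toy_pending;  /* unlinked, waiting for the grace period */
static int toy_freed_count;         /* number of objects actually freed */

/* Publish a classifier on the (one-element) chain. */
static struct toy_tp *toy_attach(int stats)
{
    struct toy_tp *tp = calloc(1, sizeof(*tp));

    if (!tp)
        return NULL;
    tp->stats = stats;
    toy_chain = tp;
    return tp;
}

/* Writer side: unlink now, free later (loosely models call_rcu()). */
static void toy_destroy_chain(void)
{
    struct toy_tp *tp = toy_chain;

    toy_chain = NULL;
    if (tp) {
        tp->next_free = toy_pending;
        toy_pending = tp;
    }
}

/* Reader side: bail out with -1 once the chain is gone (the "if (!cl)" case). */
static int toy_classify(void)
{
    struct toy_tp *cl = toy_chain;

    if (!cl)
        return -1;
    return cl->stats;
}

/* All readers that started before the unlink are done: run the deferred frees. */
static void toy_grace_period_end(void)
{
    while (toy_pending) {
        struct toy_tp *tp = toy_pending;

        toy_pending = tp->next_free;
        free(tp);
        toy_freed_count++;
    }
}

/* Walk through the scenario once; returns the number of freed objects. */
static int toy_demo(void)
{
    struct toy_tp *held = toy_attach(7);   /* a reader grabbed this pointer */

    assert(held && toy_classify() == 7);
    toy_destroy_chain();
    assert(toy_classify() == -1);          /* new readers bail out */
    assert(held->stats == 7);              /* old reader: still valid */
    assert(toy_freed_count == 0);          /* nothing freed before grace period */
    toy_grace_period_end();                /* "held" must not be used after this */
    return toy_freed_count;
}
```

In this model a use-after-free like the one KASAN reports would correspond to
touching "held" after toy_grace_period_end(), or to freeing the object without
going through the pending list at all.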


I am confused as well, I don't see how it could be related to my patch yet.
I will take a deep look in the weekend.


Ok, I'm currently on the run. Got too late yesterday night, but I'll
write what I found in the evening today, not related to ingress though.

Cheers,
Daniel

Re: net: stmmac: Meson GXBB: attempting to execute userspace memory

2016-11-26 Thread Martin Blumenstingl

Hello Heinrich,

On Sat, Nov 26, 2016 at 8:53 AM, Heinrich Schuchardt
 wrote:
> For Odroid C2 I have compiled kernel
> 4.9.0-rc6-next-20161124-1-gbf7e142
> with one additional patch
> https://github.com/xypron/kernel-odroid-c2/blob/master/patch/0001-stmmac-RTL8211F-Meson-GXBB-TX-throughput-problems.patch
>
> I repeatedly see faults like the one below:
Do you see the same errors with the RTL8211F patch *not* applied?

> [ 2557.400796] Unhandled fault: synchronous external abort (0x9210)
> at 0x40001e8ee4b0
> [ 2557.952413] CPU: 0 PID: 22837 Comm: cc1 Tainted: G  D
> 4.9.0-rc6-next-20161124-1-gbf7e142 #1
> [ 2557.962062] Hardware name: Hardkernel ODROID-C2 (DT)
> [ 2557.966980] task: 80006ddb7080 task.stack: 80006dd9c000
> [ 2557.972846] PC is at 0x6a0d98
> [ 2557.975776] LR is at 0x6a0e54
> [ 2557.978709] pc : [<006a0d98>] lr : [<006a0e54>]
> pstate: 8000
> [ 2557.986040] sp : f3ee5f80
> [ 2557.989318] x29: f3ee5f80 x28: 4b3f1240
> [ 2557.994578] x27: 012a7000 x26: 4b3f1288
> [ 2557.999840] x25: 00f58f88 x24: 4b3f1240
> [ 2558.005101] x23:  x22: 0001
> [ 2558.010362] x21: 0001 x20: 4b3f1250
> [ 2558.015623] x19: 0054 x18: 0001
> [ 2558.020885] x17: 48acaa10 x16: 01285050
> [ 2558.026146] x15: 4ad96dc8 x14: 001f
> [ 2558.031407] x13: 4b3f1270 x12: 4b3f1258
> [ 2558.036668] x11: 01347000 x10: 0661
> [ 2558.041930] x9 : 0005 x8 : 0003
> [ 2558.047191] x7 : 4b3f1240 x6 : 20020033
> [ 2558.052452] x5 : 4b402020 x4 : 4b3e1aa0
> [ 2558.057713] x3 : 000c x2 : 0020
> [ 2558.062974] x1 : 00f45000 x0 : 0065
> [ 2558.068235]
> [ 2558.069712] Internal error: Attempting to execute userspace memory:
> 860f [#7] PREEMPT SMP
> [ 2558.078155] Modules linked in: meson_rng rng_core meson_gxbb_wdt
> ip_tables x_tables ipv6 dwmac_generic realtek dwmac_meson8b
> stmmac_platform stmmac
> [ 2558.091267] CPU: 0 PID: 22837 Comm: cc1 Tainted: G  D
> 4.9.0-rc6-next-20161124-1-gbf7e142 #1
> [ 2558.100925] Hardware name: Hardkernel ODROID-C2 (DT)
> [ 2558.105841] task: 80006ddb7080 task.stack: 80006dd9c000
> [ 2558.111706] PC is at 0x6a0e54
> [ 2558.114638] LR is at 0x6a0e54
> [ 2558.117571] pc : [<006a0e54>] lr : [<006a0e54>]
> pstate: 63c5
> [ 2558.124902] sp : 80006dd9fec0
> [ 2558.128179] x29:  x28: 80006ddb7080
> [ 2558.133441] x27: 012a7000 x26: 4b3f1288
> [ 2558.138702] x25: 00f58f88 x24: 4b3f1240
> [ 2558.143963] x23: 8000 x22: 006a0d98
> [ 2558.149225] x21:  x20: 80006e223000
> [ 2558.154486] x19:  x18: 0010
> [ 2558.159747] x17: 48acaa10 x16: 01285050
> [ 2558.165008] x15: 88e91f07 x14: 0006
> [ 2558.170270] x13: 08e91f15 x12: 000f
> [ 2558.175531] x11: 0002 x10: 02ea
> [ 2558.180792] x9 : 80006dd9fb40 x8 : 00010a8b
> [ 2558.186053] x7 :  x6 : 020e
> [ 2558.191315] x5 : 020f020e x4 : 
> [ 2558.196576] x3 :  x2 : 020f
> [ 2558.201837] x1 : 80006ddb7080 x0 : 
> [ 2558.207098]
> [ 2558.208565] Process cc1 (pid: 22837, stack limit = 0x80006dd9c000)
> [ 2558.215035] Stack: (0x80006dd9fec0 to 0x80006dda)
> [ 2558.220728] fec0: 0065 00f45000 0020
> 000c
> [ 2558.228490] fee0: 4b3e1aa0 4b402020 20020033
> 4b3f1240
> [ 2558.236253] ff00: 0003 0005 0661
> 01347000
> [ 2558.244015] ff20: 4b3f1258 4b3f1270 001f
> 4ad96dc8
> [ 2558.251778] ff40: 01285050 48acaa10 0001
> 0054
> [ 2558.259540] ff60: 4b3f1250 0001 0001
> 
> [ 2558.267303] ff80: 4b3f1240 00f58f88 4b3f1288
> 012a7000
> [ 2558.275065] ffa0: 4b3f1240 f3ee5f80 006a0e54
> f3ee5f80
> [ 2558.282828] ffc0: 006a0d98 8000 0003
> 
> [ 2558.290590] ffe0:   
> 
> [ 2558.298351] Call trace:
> [ 2558.300769] Exception stack(0x80006dd9fcf0 to 0x80006dd9fe20)
> [ 2558.307149] fce0:   
> 0001
> [ 2558.314913] fd00: 80006dd9fec0 006a0e54 800073acf500
> 0004
> [ 2558.322675] fd20:  08dbbc18 80006ddb7080
> 6dd9fdd0
> [
