Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()
On 27/11/2016 06:47, Roi Dayan wrote: On 27/11/2016 02:33, Daniel Borkmann wrote: On 11/26/2016 12:09 PM, Daniel Borkmann wrote: On 11/26/2016 07:46 AM, Cong Wang wrote: On Thu, Nov 24, 2016 at 7:20 AM, Daniel Borkmann wrote: [...] Ok, strange, qdisc_destroy() calls into ops->destroy(), where ingress drops its entire chain via tcf_destroy_chain(), so that will be NULL eventually. The tps are freed by call_rcu() as well as qdisc itself later on via qdisc_rcu_free(), where it frees per-cpu bstats as well. Outstanding readers should either bail out due to if (!cl) or can still process the chain until read section ends, but during that time, cl->q resp. bstats should be good. Do you happen to know what's at address 880a68b04028? I was wondering wrt call_rcu() vs call_rcu_bh(), but at least on ingress (netif_receive_skb_internal()) we hold rcu_read_lock() here. The KASAN report is reliably happening at this location, right? I am confused as well, I don't see how it could be related to my patch yet. I will take a deep look in the weekend. Hi Cong, When I reported the new trace, I didn't mean it was related to your patch; I just wanted to point out that it exposed something. I should have been clearer about it. Ok, I'm currently on the run. Got too late yesterday night, but I'll write what I found in the evening today, not related to ingress though. Just pushed out my analysis to netdev under "[PATCH net] net, sched: respect rcu grace period on cls destruction". My conclusion is that both issues are actually separate, and that one is small enough where we could route it via net actually. Perhaps this at the same time shrinks your "[PATCH net-next] net_sched: move the empty tp check from ->destroy() to ->delete()" to a reasonable size that it's suitable to net as well. Your ->delete()/->destroy() one is definitely needed, too. 
The tp->root one is independent of ->delete()/->destroy() as they are different races and tp->root could also happen when you just destroy the whole tp directly. I think that seems like a good path forward to me. Thanks, Daniel

Hi Daniel,

As for the tainted kernel: I was on an old (week or two) net-next tree and only cherry-picked the latest net-next patches related to the Mellanox HCA, cls_api, cls_flower and devlink, so those are the tainted modules. I have the issue reproducing in that tree, so I wanted to check it with Cong's patch instead of latest net-next. I'll try reproducing the issue with your new patch and later try latest net-next as well.

Thanks,
Roi

Hi,

I tested "[PATCH net] net, sched: respect rcu grace period on cls destruction" and could not reproduce my original issue. I rebased "[Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()" to test it in the same tree and got a new trace in fl_delete.

[35659.012123] BUG: KASAN: wild-memory-access on address 1803ca31
[35659.020042] Write of size 1 by task ovs-vswitchd/20135
[35659.025878] CPU: 19 PID: 20135 Comm: ovs-vswitchd Tainted: G O 4.9.0-rc3+ #18
[35659.035948] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 07/01/2015
[35659.043730] Call Trace:
[35659.046619] [] dump_stack+0x63/0x81
[35659.052456] [] kasan_report_error+0x408/0x4e0
[35659.059402] [] kasan_report+0x58/0x60
[35659.065428] [] ? call_rcu_sched+0x1d/0x20
[35659.072119] [] ? fl_destroy_filter+0x21/0x30 [cls_flower]
[35659.080217] [] ? fl_delete+0x1df/0x2e0 [cls_flower]
[35659.087580] [] __asan_store1+0x4a/0x50
[35659.093697] [] fl_delete+0x1df/0x2e0 [cls_flower]
[35659.100870] [] tc_ctl_tfilter+0x10da/0x1b90

0x1d02 is in fl_delete (net/sched/cls_flower.c:805).
800         struct cls_fl_filter *f = (struct cls_fl_filter *) arg;
801
802         rhashtable_remove_fast(&head->ht, &f->ht_node,
803                                head->ht_params);
804         __fl_delete(tp, f);
805         *last = list_empty(&head->filters);
806         return 0;
807 }

Thanks,
Roi
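The patch under test makes ->delete() report, through the `last` out-parameter, whether the classifier's filter list became empty, so the caller can tear down the tp at that point instead of relying on a check in ->destroy(). The sketch below models only that contract in userspace, assuming a plain singly linked list in place of cls_flower's list_head and rhashtable; `struct tp_sketch`, `filter_add()` and `filter_delete()` are hypothetical names. It frees the filter immediately, whereas the kernel defers freeing with call_rcu() as discussed above.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a classifier's filter list (not kernel types). */
struct filter_sketch {
	int handle;
	struct filter_sketch *next;
};

struct tp_sketch {
	struct filter_sketch *filters;	/* NULL when the list is empty */
};

/* Add a filter at the head of the list; returns 0 on success. */
int filter_add(struct tp_sketch *tp, int handle)
{
	struct filter_sketch *f = malloc(sizeof(*f));

	if (!f)
		return -1;
	f->handle = handle;
	f->next = tp->filters;
	tp->filters = f;
	return 0;
}

/*
 * Remove the filter with @handle and report through @last whether the
 * list is now empty, mirroring "*last = list_empty(&head->filters)".
 * Returns 0 on success, -1 if no such filter exists.
 */
int filter_delete(struct tp_sketch *tp, int handle, bool *last)
{
	struct filter_sketch **pp = &tp->filters;

	while (*pp && (*pp)->handle != handle)
		pp = &(*pp)->next;
	if (!*pp)
		return -1;

	struct filter_sketch *victim = *pp;
	*pp = victim->next;
	free(victim);	/* the kernel would defer this via call_rcu() */

	*last = (tp->filters == NULL);
	return 0;
}
```

A caller would destroy the tp only when `filter_delete()` succeeds and sets `last`, keeping the emptiness decision next to the deletion that caused it.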
Re: Crash due to mutex genl_lock called from RCU context
On Sat, Nov 26, 2016 at 6:26 PM, Eric Dumazet wrote:
>
> Are you telling me inet_release() is called when we close() the first
> file descriptor ?
>
> fd1 = socket()
> fd2 = dup(fd1);
> close(fd2) -> release() ???

Sorry, I didn't express myself clearly. I meant that your change, if we exclude the SOCK_RCU_FREE part, basically reverts this commit:

commit 3f660d66dfbc13ea4b61d3865851b348444c24b4
Author: Herbert Xu
Date:   Thu May 3 03:17:14 2007 -0700

    [NETLINK]: Kill CB only when socket is unused

IOW, ->release() is called when the last sock fd ref is gone, but ->destructor() is called when the last sock ref is gone. They are very different.

>> I don't see why we need to get genl_lock in ->done() here, because we are
>> already the last sock using it and module ref protects the ops from being
>> removed via module, seems we are pretty safe without any lock.
>
> Well, at least this exposes a real bug in Thomas patch.
>
> Removing the lock might be done for net-next, not stable branches.

I am confused: what Subash reported is a kernel warning which can surely be fixed by removing the genl lock (if that is correct; I need to double check), so why net-next?
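Eric's fd1/fd2 question rests on ordinary file-descriptor semantics: closing one descriptor of a dup()'ed pair does not release the underlying open file; only dropping the last descriptor does. This userspace sketch makes that observable with a pipe, assuming POSIX pipe(2), dup(2), read(2) and write(2). It demonstrates fd reference counting only, not the netlink ->release()/->destructor() paths themselves, and the function name is hypothetical.

```c
#include <unistd.h>

/*
 * Returns 1 if closing a dup()'ed copy of a pipe's write end leaves the
 * pipe usable and closing the last write descriptor then releases the
 * writer side (read() sees EOF), 0 otherwise, -1 on setup failure.
 */
int dup_close_keeps_file_open(void)
{
	int fds[2];
	char c = 'x', r = 0;
	int ok;

	if (pipe(fds) != 0)
		return -1;

	int fd2 = dup(fds[1]);	/* second reference to the same open file */
	if (fd2 < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	close(fd2);	/* drops one fd reference; the file stays open */

	/* Still open: a byte written through fds[1] can be read back. */
	ok = write(fds[1], &c, 1) == 1 && read(fds[0], &r, 1) == 1 && r == 'x';

	close(fds[1]);	/* last write-side reference: writer released */

	/* With no writers left and no data, read() reports EOF (0). */
	ok = ok && read(fds[0], &r, 1) == 0;

	close(fds[0]);
	return ok;
}
```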
RE: [PATCH] net: fec: turn on device when extracting statistics
From: Nikita Yushchenko Sent: Friday, November 25, 2016 6:02 PM
>To: Andy Duan ; David S. Miller ; Troy Kisky ; Andrew Lunn ; Eric Nelson ; Philippe Reynes ; Johannes Berg ; netdev@vger.kernel.org; linux-ker...@vger.kernel.org
>Cc: Chris Healy ; Nikita Yushchenko
>Subject: [PATCH] net: fec: turn on device when extracting statistics
>
>Executing 'ethtool -S' on a fec device that is down causes an OOPS on the
>Vybrid board:
>
>Unhandled fault: external abort on non-linefetch (0x1008) at 0xe0898200
>pgd = ddecc000
>[e0898200] *pgd=9e406811, *pte=400d1653, *ppte=400d1453
>Internal error: : 1008 [#1] SMP ARM
>...
>
>The reason for the OOPS is that fec_enet_get_ethtool_stats() accesses fec
>registers while the IPG clock is stopped by PM.
>
>Fix that by wrapping statistics extraction in pm_runtime_get_sync() ...
>pm_runtime_put_autosuspend() braces.
>
>Signed-off-by: Nikita Yushchenko
>---

Acked-by: Fugang Duan

> drivers/net/ethernet/freescale/fec_main.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
>index 5aa9d4ded214..9c7592b80ce8 100644
>--- a/drivers/net/ethernet/freescale/fec_main.c
>+++ b/drivers/net/ethernet/freescale/fec_main.c
>@@ -2317,10 +2317,19 @@ static void fec_enet_get_ethtool_stats(struct net_device *dev,
>                                        struct ethtool_stats *stats, u64 *data)
> {
>         struct fec_enet_private *fep = netdev_priv(dev);
>-        int i;
>+        int i, ret;
>+
>+        ret = pm_runtime_get_sync(&fep->pdev->dev);
>+        if (IS_ERR_VALUE(ret)) {
>+                memset(data, 0, sizeof(*data) * ARRAY_SIZE(fec_stats));
>+                return;
>+        }
>
>         for (i = 0; i < ARRAY_SIZE(fec_stats); i++)
>                 data[i] = readl(fep->hwp + fec_stats[i].offset);
>+
>+        pm_runtime_mark_last_busy(&fep->pdev->dev);
>+        pm_runtime_put_autosuspend(&fep->pdev->dev);
> }
>
> static void fec_enet_get_strings(struct net_device *netdev,
>--
>2.1.4
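The fix brackets the register reads with pm_runtime_get_sync() and pm_runtime_mark_last_busy()/pm_runtime_put_autosuspend(). The sketch below models that bracket in userspace with a hypothetical usage counter and a `clock_on` flag standing in for the IPG clock; `struct dev_sketch`, `get_sync()`, `put_autosuspend()` and `get_stats()` are illustrative names, not the runtime PM core. Real autosuspend is deferred by a timer, which is omitted here, so the model suspends as soon as the last user drops its reference.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NSTATS 4

/* Hypothetical device model: registers are only readable while clock_on. */
struct dev_sketch {
	int usage;		/* like the runtime PM usage counter */
	bool clock_on;		/* stands in for the IPG clock */
	uint32_t regs[NSTATS];
};

static int get_sync(struct dev_sketch *d)
{
	if (d->usage++ == 0)
		d->clock_on = true;	/* resume on the first user */
	return 0;
}

static void put_autosuspend(struct dev_sketch *d)
{
	/* A real driver would suspend only after the autosuspend delay. */
	if (--d->usage == 0)
		d->clock_on = false;
}

/* Returns false if a register would be touched with the clock off. */
static bool read_reg(const struct dev_sketch *d, int i, uint64_t *out)
{
	if (!d->clock_on)
		return false;	/* on the real board: external abort */
	*out = d->regs[i];
	return true;
}

/* Mirrors the flow of the patched fec_enet_get_ethtool_stats(). */
bool get_stats(struct dev_sketch *d, uint64_t *data)
{
	bool ok = true;

	if (get_sync(d) < 0) {
		memset(data, 0, sizeof(*data) * NSTATS);
		return false;
	}
	for (int i = 0; i < NSTATS; i++)
		ok = read_reg(d, i, &data[i]) && ok;
	put_autosuspend(d);
	return ok;
}
```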
[PATCH net-next 2/4] Documentation: net: phy: Add a paragraph about pause frames/flow control
Describe that the Ethernet MAC controller is ultimately responsible for dealing with proper pause frames/flow control advertisement and enabling, and that it is therefore allowed to have it change phydev->supported/advertising with SUPPORTED_Pause and SUPPORTED_AsymPause. Signed-off-by: Florian Fainelli--- Documentation/networking/phy.txt | 18 -- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt index 4b25c0f24201..9a42a9414cea 100644 --- a/Documentation/networking/phy.txt +++ b/Documentation/networking/phy.txt @@ -127,8 +127,9 @@ Letting the PHY Abstraction Layer do Everything values pruned from them which don't make sense for your controller (a 10/100 controller may be connected to a gigabit capable PHY, so you would need to mask off SUPPORTED_1000baseT*). See include/linux/ethtool.h for definitions - for these bitfields. Note that you should not SET any bits, or the PHY may - get put into an unsupported state. + for these bitfields. Note that you should not SET any bits, except the + SUPPORTED_Pause and SUPPORTED_AsymPause bits (see below), or the PHY may get + put into an unsupported state. Lastly, once the controller is ready to handle network traffic, you call phy_start(phydev). This tells the PAL that you are ready, and configures the @@ -139,6 +140,19 @@ Letting the PHY Abstraction Layer do Everything When you want to disconnect from the network (even if just briefly), you call phy_stop(phydev). +Pause frames / flow control + + The PHY does not participate directly in flow control/pause frames except by + making sure that the SUPPORTED_Pause and SUPPORTED_AsymPause bits are set in + MII_ADVERTISE to indicate towards the link partner that the Ethernet MAC + controller supports such a thing. 
Since flow control/pause frames generation + involves the Ethernet MAC driver, it is recommended that this driver takes care + of properly indicating advertisement and support for such features by setting + the SUPPORTED_Pause and SUPPORTED_AsymPause bits accordingly. This can be done + either before or after phy_connect() and/or as a result of implementing the + ethtool::set_pauseparam feature. + + Keeping Close Tabs on the PAL It is possible that the PAL's built-in state machine needs a little help to -- 2.9.3
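As a sketch of the MAC driver's side of this, the code below encodes a requested RX/TX pause setting as the two advertisement bits and then resolves local against link-partner advertisement for full duplex, following the IEEE 802.3 Annex 28B table (the same logic as mii_resolve_flowctrl_fdx() in include/linux/mii.h). The register bit values are copied from include/uapi/linux/mii.h; note that include/uapi/linux/ethtool.h spells the ethtool-side macros SUPPORTED_Pause and SUPPORTED_Asym_Pause. The encoding "Pause = rx, AsymPause = rx XOR tx" is one common choice, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as in include/uapi/linux/mii.h. */
#define ADVERTISE_PAUSE_CAP  0x0400
#define ADVERTISE_PAUSE_ASYM 0x0800
#define FLOW_CTRL_TX 0x01
#define FLOW_CTRL_RX 0x02

/* Encode the desired pause mode: Pause = rx, AsymPause = rx XOR tx. */
uint16_t pause_adv_bits(bool rx, bool tx)
{
	uint16_t adv = 0;

	if (rx)
		adv |= ADVERTISE_PAUSE_CAP;
	if (rx != tx)
		adv |= ADVERTISE_PAUSE_ASYM;
	return adv;
}

/* Full-duplex resolution of local vs. link-partner advertisement. */
uint8_t resolve_pause_fdx(uint16_t lcladv, uint16_t rmtadv)
{
	uint8_t cap = 0;

	if (lcladv & rmtadv & ADVERTISE_PAUSE_CAP) {
		cap = FLOW_CTRL_TX | FLOW_CTRL_RX;	/* symmetric pause */
	} else if (lcladv & rmtadv & ADVERTISE_PAUSE_ASYM) {
		if (lcladv & ADVERTISE_PAUSE_CAP)
			cap = FLOW_CTRL_RX;	/* we honour partner's frames */
		else if (rmtadv & ADVERTISE_PAUSE_CAP)
			cap = FLOW_CTRL_TX;	/* we send, partner honours */
	}
	return cap;
}
```

The MAC driver, not the PHY, then has to program its own pause-frame generation and reception from the resolved result once the link is up.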
[PATCH net-next 3/4] Documentation: net: phy: Add blurb about RGMII
RGMII is a recurring source of pain for people with Gigabit Ethernet hardware since it may require PHY driver and MAC driver level configuration hints. Document what the expectations from PHYLIB are and what options exist. Signed-off-by: Florian Fainelli --- Documentation/networking/phy.txt | 56 1 file changed, 56 insertions(+) diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt index 9a42a9414cea..18e9f518b6f9 100644 --- a/Documentation/networking/phy.txt +++ b/Documentation/networking/phy.txt @@ -65,6 +65,62 @@ The MDIO bus drivers/net/ethernet/freescale/fsl_pq_mdio.c and an associated DTS file for one of the users. (e.g. "git grep fsl,.*-mdio arch/powerpc/boot/dts/") +(RG)MII/electrical interface considerations + + The Reduced Gigabit Medium Independent Interface (RGMII) is a 12-pin + electrical interface using a synchronous 125MHz clock signal and several data + lines. Due to this design decision, a 1.5ns to 2ns delay must be added between + the clock line (RXC or TXC) and the data lines to let the sink (PHY or MAC) + have enough setup and hold times to sample the data lines correctly. The PHY + library offers different types of PHY_INTERFACE_MODE_RGMII* values to let the + PHY driver and optionally the MAC driver implement the required delay.
The + values of phy_interface_t must be understood from the perspective of the PHY + device itself, leading to the following: + + * PHY_INTERFACE_MODE_RGMII: the PHY is not responsible for inserting any + internal delay by itself, it assumes that either the Ethernet MAC (if capable) + or the PCB traces insert the correct 1.5-2ns delay + + * PHY_INTERFACE_MODE_RGMII_TXID: the PHY should be inserting a delay for the + transmit data lines (TXD[3:0]) processed by the PHY device + + * PHY_INTERFACE_MODE_RGMII_RXID: the PHY should be inserting a delay for the + receive data lines (RXD[3:0]) processed by the PHY device + + * PHY_INTERFACE_MODE_RGMII_ID: the PHY should be inserting a delay for both + transmit AND receive data lines from/to the PHY device + + Whenever it is possible, it is preferable to utilize the PHY side RGMII delay + for several reasons: + + * PHY devices may offer sub-nanosecond granularity in how they allow a + receiver/transmitter side delay (e.g. 0.5, 1.0, 1.5ns) to be specified. + + * PHY devices are typically qualified for a large range of temperatures, and + they provide a constant and reliable delay across + temperature/pressure/voltage ranges + + * PHY device drivers in PHYLIB being reusable by nature, being able to + correctly configure a specified delay enables more designs with similar delay + requirements to be enabled + + For cases where the PHY is not capable of providing this delay, but the + Ethernet MAC driver is capable of doing it, the correct phy_interface_t value + should be PHY_INTERFACE_MODE_RGMII, and the Ethernet MAC driver should be + configured correctly in order to provide the required transmit and/or receive + side delay from the perspective of the PHY device.
+ + In case neither the Ethernet MAC nor the PHY is capable of providing the + required delays, as defined per the RGMII standard, several options may be + available: + + * Some SoCs may offer a pin pad/mux/controller capable of configuring a given + set of pins' drive strength, delays and voltage, and it may be a suitable + option to insert the expected 2ns RGMII delay + + * Modifying the PCB design to include a fixed delay (e.g. using a specifically + designed serpentine), which may not require software configuration at all + Connecting to a PHY Sometime during startup, the network driver needs to establish a connection -- 2.9.3
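The four modes amount to a small table of which delays the PHY itself inserts, with everything else left to the MAC or the board. A sketch, using a local copy of the RGMII enum names (the kernel's phy_interface_t has many more members) and a hypothetical `phy_rgmii_delays()` helper. For the 2ns figure: 125MHz gives an 8ns clock period, and at 1000Mbit/s RGMII transfers data on both clock edges, so each data nibble is valid for roughly 4ns; a delay of about 2ns places the sampling edge near the middle of that window.

```c
#include <stdbool.h>

/* Local copy of the RGMII members of phy_interface_t (names as in the text). */
enum rgmii_mode {
	PHY_INTERFACE_MODE_RGMII,
	PHY_INTERFACE_MODE_RGMII_ID,
	PHY_INTERFACE_MODE_RGMII_RXID,
	PHY_INTERFACE_MODE_RGMII_TXID,
};

struct phy_delays {
	bool rx;	/* PHY adds the receive-path clock/data delay */
	bool tx;	/* PHY adds the transmit-path clock/data delay */
};

/* Which delays the PHY itself must insert, seen from the PHY side. */
struct phy_delays phy_rgmii_delays(enum rgmii_mode mode)
{
	struct phy_delays d = { false, false };

	switch (mode) {
	case PHY_INTERFACE_MODE_RGMII_ID:
		d.rx = d.tx = true;
		break;
	case PHY_INTERFACE_MODE_RGMII_RXID:
		d.rx = true;
		break;
	case PHY_INTERFACE_MODE_RGMII_TXID:
		d.tx = true;
		break;
	case PHY_INTERFACE_MODE_RGMII:
		/* MAC (if capable) or PCB traces provide the 1.5-2ns delay. */
		break;
	}
	return d;
}
```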
[PATCH net-next 0/4] Documentation: net: phy: Improve documentation
Hi all,

This patch series addresses discussions and feedback that were recently received on the mailing-list in the areas of flow control/pause frames and the interpretation of phy_interface_t, and finally adds some links to useful standards documents.

Florian Fainelli (4):
  Documentation: net: phy: remove description of function pointers
  Documentation: net: phy: Add a paragraph about pause frames/flow control
  Documentation: net: phy: Add blurb about RGMII
  Documentation: net: phy: Add links to several standards documents

 Documentation/networking/phy.txt | 119 +++
 1 file changed, 84 insertions(+), 35 deletions(-)

--
2.9.3
[PATCH net-next 1/4] Documentation: net: phy: remove description of function pointers
Remove the function pointers documentation which duplicates information found in include/linux/phy.h. Maintaining documentation about two different locations just does not work, but the code is less likely to be outdated. Signed-off-by: Florian Fainelli--- Documentation/networking/phy.txt | 35 ++- 1 file changed, 2 insertions(+), 33 deletions(-) diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt index 7ab9404a8412..4b25c0f24201 100644 --- a/Documentation/networking/phy.txt +++ b/Documentation/networking/phy.txt @@ -251,39 +251,8 @@ Writing a PHY driver PHY_BASIC_FEATURES, but you can look in include/mii.h for other features. - Each driver consists of a number of function pointers: - - soft_reset: perform a PHY software reset - config_init: configures PHY into a sane state after a reset. - For instance, a Davicom PHY requires descrambling disabled. - probe: Allocate phy->priv, optionally refuse to bind. - PHY may not have been reset or had fixups run yet. - suspend/resume: power management - config_aneg: Changes the speed/duplex/negotiation settings - aneg_done: Determines the auto-negotiation result - read_status: Reads the current speed/duplex/negotiation settings - ack_interrupt: Clear a pending interrupt - did_interrupt: Checks if the PHY generated an interrupt - config_intr: Enable or disable interrupts - remove: Does any driver take-down - ts_info: Queries about the HW timestamping status - match_phy_device: used for Clause 45 capable PHYs to match devices - in package and ensure they are compatible - hwtstamp: Set the PHY HW timestamping configuration - rxtstamp: Requests a receive timestamp at the PHY level for a 'skb' - txtsamp: Requests a transmit timestamp at the PHY level for a 'skb' - set_wol: Enable Wake-on-LAN at the PHY level - get_wol: Get the Wake-on-LAN status at the PHY level - link_change_notify: called to inform the core is about to change the - link state, can be used to work around bogus PHY between state changes 
- read_mmd_indirect: Read PHY MMD indirect register - write_mmd_indirect: Write PHY MMD indirect register - module_info: Get the size and type of an EEPROM contained in an plug-in - module - module_eeprom: Get EEPROM information of a plug-in module - get_sset_count: Get number of strings sets that get_strings will count - get_strings: Get strings from requested objects (statistics) - get_stats: Get the extended statistics from the PHY device + Each driver consists of a number of function pointers, documented + in include/linux/phy.h under the phy_driver structure. Of these, only config_aneg and read_status are required to be assigned by the driver code. The rest are optional. Also, it is -- 2.9.3
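The retained text says a PHY driver is a table of function pointers of which only config_aneg and read_status must be assigned. A sketch of how a core could enforce that and treat the rest as optional, using a hypothetical, heavily trimmed `struct phy_ops_sketch` instead of the real struct phy_driver from include/linux/phy.h; how the actual PHY core handles unset callbacks is not shown here.

```c
#include <stddef.h>

struct phy_dev_sketch;	/* opaque, hypothetical device handle */

/* Trimmed, hypothetical stand-in for struct phy_driver. */
struct phy_ops_sketch {
	const char *name;
	int (*config_aneg)(struct phy_dev_sketch *dev);	/* required */
	int (*read_status)(struct phy_dev_sketch *dev);	/* required */
	int (*suspend)(struct phy_dev_sketch *dev);	/* optional */
	int (*resume)(struct phy_dev_sketch *dev);	/* optional */
};

/* Example callback used to fill in a table. */
int example_noop(struct phy_dev_sketch *dev)
{
	(void)dev;
	return 0;
}

/* Reject a driver that leaves a required callback unset. */
int ops_register(const struct phy_ops_sketch *ops)
{
	if (!ops || !ops->config_aneg || !ops->read_status)
		return -1;
	return 0;
}

/* Optional callbacks are invoked only when present. */
int ops_suspend(const struct phy_ops_sketch *ops, struct phy_dev_sketch *dev)
{
	return ops->suspend ? ops->suspend(dev) : 0;
}
```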
[PATCH net-next 4/4] Documentation: net: phy: Add links to several standards documents
Add links to the IEEE 802.3-2008 document, and the RGMII v1.3 and v2.0 revisions of the standard. Signed-off-by: Florian Fainelli--- Documentation/networking/phy.txt | 10 ++ 1 file changed, 10 insertions(+) diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt index 18e9f518b6f9..9908490363d6 100644 --- a/Documentation/networking/phy.txt +++ b/Documentation/networking/phy.txt @@ -386,3 +386,13 @@ Board Fixups The stubs set one of the two matching criteria, and set the other one to match anything. +Standards + + IEEE Standard 802.3: CSMA/CD Access Method and Physical Layer Specifications, Section Two: + http://standards.ieee.org/getieee802/download/802.3-2008_section2.pdf + + RGMII v1.3: + http://web.archive.org/web/20160303212629/http://www.hp.com/rnd/pdfs/RGMIIv1_3.pdf + + RGMII v2.0: + http://web.archive.org/web/20160303171328/http://www.hp.com/rnd/pdfs/RGMIIv2_0_final_hp.pdf -- 2.9.3
Re: [PATCH 2/2] net: phy: realtek: fix enabling of the TX-delay for RTL8211F
On 11/25/2016 05:12 AM, Martin Blumenstingl wrote:
> The old logic always enabled the TX-delay when the phy-mode was set to
> PHY_INTERFACE_MODE_RGMII. There are dedicated phy-modes which tell the
> PHY driver to enable the RX and/or TX delays:
> - PHY_INTERFACE_MODE_RGMII should disable the RX and TX delay in the
>   PHY (if required, the MAC should add the delays in this case)
> - PHY_INTERFACE_MODE_RGMII_ID should enable RX and TX delay in the PHY
> - PHY_INTERFACE_MODE_RGMII_TXID should enable the TX delay in the PHY
> - PHY_INTERFACE_MODE_RGMII_RXID should enable the RX delay in the PHY
>   (currently not supported by RTL8211F)
>
> With this patch we enable the TX delay for PHY_INTERFACE_MODE_RGMII_ID
> and PHY_INTERFACE_MODE_RGMII_TXID.
> Additionally we now explicitly disable the TX-delay, which seems to be
> enabled automatically after a hard-reset of the PHY (by triggering its
> reset pin) to get a consistent state (as defined by the phy-mode).
>
> This fixes a compatibility problem with some SoCs where the TX-delay was
> also added by the MAC. With the TX-delay being applied twice the TX
> clock was off and TX traffic was broken or very slow (<10Mbit/s) on
> 1000Mbit/s links.
>
> Signed-off-by: Martin Blumenstingl

Reviewed-by: Florian Fainelli
--
Florian
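The essential change is that the TX-delay bit is now written in both directions, so a delay left enabled by a hard reset gets cleared for plain RGMII. A read-modify-write sketch of that decision; the bit position `TX_DELAY_BIT_SKETCH` and the register values are hypothetical (not taken from the RTL8211F datasheet), and the real driver's paged register access is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

enum rgmii_mode_sketch { RGMII, RGMII_ID, RGMII_RXID, RGMII_TXID };

#define TX_DELAY_BIT_SKETCH (1u << 8)	/* hypothetical bit position */

/* TX delay in the PHY only for rgmii-id and rgmii-txid. */
static bool wants_tx_delay(enum rgmii_mode_sketch mode)
{
	return mode == RGMII_ID || mode == RGMII_TXID;
}

/* Return the new register value: always set or clear, never leave as-is. */
uint16_t apply_tx_delay(uint16_t reg, enum rgmii_mode_sketch mode)
{
	if (wants_tx_delay(mode))
		return reg | TX_DELAY_BIT_SKETCH;
	return reg & (uint16_t)~TX_DELAY_BIT_SKETCH;
}
```

Clearing explicitly rather than only setting is what makes the result depend on the phy-mode alone, not on whatever state the reset pin left behind.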
Re: [PATCH 1/2] Documentation: devicetree: clarify usage of the RGMII phy-modes
On 11/25/2016 05:12 AM, Martin Blumenstingl wrote:
> RGMII requires special RX and/or TX delays depending on the actual
> hardware circuit/wiring. These delays can be added by the MAC, the PHY
> or the designer of the circuit (the latter means that no delay has to
> be added by PHY or MAC).
> There are 4 RGMII phy-modes used to describe where a delay should be
> applied:
> - rgmii: the RX and TX delays are either added by the MAC (where the
>   exact delay is typically configurable, and can be turned off when no
>   extra delay is needed) or not needed at all (because the hardware
>   wiring adds the delay already). The PHY should neither add the RX nor
>   TX delay in this case.
> - rgmii-rxid: configures the PHY to enable the RX delay. The MAC should
>   not add the RX delay in this case.
> - rgmii-txid: configures the PHY to enable the TX delay. The MAC should
>   not add the TX delay in this case.
> - rgmii-id: combines rgmii-rxid and rgmii-txid and thus configures the
>   PHY to enable the RX and TX delays. The MAC should neither add the RX
>   nor TX delay in this case.
>
> Document these cases in the ethernet.txt documentation to make it clear
> when to use each mode.
> If applied incorrectly one might end up with MAC and PHY both enabling
> for example the TX delay, which breaks ethernet TX traffic on 1000Mbit/s
> links.
>
> Signed-off-by: Martin Blumenstingl

Reviewed-by: Florian Fainelli
--
Florian
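Read together, the four bindings say that each direction's delay should come from exactly one place: the PHY, the MAC or the board. A hedged sketch of a consistency check built on that simplified rule; `phy_side_delays()`, `delays_consistent()` and `struct delays` are hypothetical, and no such helper is claimed to exist in the kernel.

```c
#include <stdbool.h>
#include <string.h>

struct delays {
	bool rx, tx;
};

/* PHY-side delays implied by a DT "phy-mode" string; -1 if not RGMII. */
int phy_side_delays(const char *mode, struct delays *phy)
{
	if (!strcmp(mode, "rgmii"))
		*phy = (struct delays){ false, false };
	else if (!strcmp(mode, "rgmii-id"))
		*phy = (struct delays){ true, true };
	else if (!strcmp(mode, "rgmii-rxid"))
		*phy = (struct delays){ true, false };
	else if (!strcmp(mode, "rgmii-txid"))
		*phy = (struct delays){ false, true };
	else
		return -1;
	return 0;
}

/*
 * True when every direction gets its delay from exactly one source.
 * A MAC adding the TX delay on top of "rgmii-txid" (the double-delay
 * case described above) makes the TX sum 2 and fails the check.
 */
bool delays_consistent(const char *mode, struct delays mac, struct delays board)
{
	struct delays phy;

	if (phy_side_delays(mode, &phy) < 0)
		return false;
	return (phy.rx + mac.rx + board.rx == 1) &&
	       (phy.tx + mac.tx + board.tx == 1);
}
```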
Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()
On 27/11/2016 02:33, Daniel Borkmann wrote: On 11/26/2016 12:09 PM, Daniel Borkmann wrote: On 11/26/2016 07:46 AM, Cong Wang wrote: On Thu, Nov 24, 2016 at 7:20 AM, Daniel Borkmann wrote: [...] Ok, strange, qdisc_destroy() calls into ops->destroy(), where ingress drops its entire chain via tcf_destroy_chain(), so that will be NULL eventually. The tps are freed by call_rcu() as well as qdisc itself later on via qdisc_rcu_free(), where it frees per-cpu bstats as well. Outstanding readers should either bail out due to if (!cl) or can still process the chain until read section ends, but during that time, cl->q resp. bstats should be good. Do you happen to know what's at address 880a68b04028? I was wondering wrt call_rcu() vs call_rcu_bh(), but at least on ingress (netif_receive_skb_internal()) we hold rcu_read_lock() here. The KASAN report is reliably happening at this location, right? I am confused as well, I don't see how it could be related to my patch yet. I will take a deep look in the weekend. Hi Cong, When I reported the new trace, I didn't mean it was related to your patch; I just wanted to point out that it exposed something. I should have been clearer about it. Ok, I'm currently on the run. Got too late yesterday night, but I'll write what I found in the evening today, not related to ingress though. Just pushed out my analysis to netdev under "[PATCH net] net, sched: respect rcu grace period on cls destruction". My conclusion is that both issues are actually separate, and that one is small enough where we could route it via net actually. Perhaps this at the same time shrinks your "[PATCH net-next] net_sched: move the empty tp check from ->destroy() to ->delete()" to a reasonable size that it's suitable to net as well. Your ->delete()/->destroy() one is definitely needed, too. The tp->root one is independent of ->delete()/->destroy() as they are different races and tp->root could also happen when you just destroy the whole tp directly. I think that seems like a good path forward to me. Thanks, Daniel

Hi Daniel,

As for the tainted kernel: I was on an old (week or two) net-next tree and only cherry-picked the latest net-next patches related to the Mellanox HCA, cls_api, cls_flower and devlink, so those are the tainted modules. I have the issue reproducing in that tree, so I wanted to check it with Cong's patch instead of latest net-next. I'll try reproducing the issue with your new patch and later try latest net-next as well.

Thanks,
Roi
Re: Large performance regression with 6in4 tunnel (sit)
Hi Sven-Haegar,

On Fri, 25 Nov 2016 05:06:53 +0100 (CET) Sven-Haegar Koch wrote:
>
> Somehow this problem description really reminds me of a report on
> netdev a bit ago, which the following patch fixed:
>
> commit 9ee6c5dc816aa8256257f2cd4008a9291ec7e985
> Author: Lance Richardson
> Date:   Wed Nov 2 16:36:17 2016 -0400
>
>     ipv4: allow local fragmentation in ip_finish_output_gso()
>
>     Some configurations (e.g. geneve interface with default
>     MTU of 1500 over an ethernet interface with 1500 MTU) result
>     in the transmission of packets that exceed the configured MTU.
>     While this should be considered to be a "bad" configuration,
>     it is still allowed and should not result in the sending
>     of packets that exceed the configured MTU.
>
> Could this be related?
>
> I suppose it would be difficult to test this patch on this machine?

The kernel I am running on is based on 4.7.8, so the above patch doesn't come close to applying. Most of what it is reverting was introduced in commit 359ebda25aa0 ("net/ipv4: Introduce IPSKB_FRAG_SEGS bit to inet_skb_parm.flags") in v4.8-rc1.

--
Cheers,
Stephen Rothwell
ip manpage comments
Though not new to *nix, I am new to using the ip(8) command, so some of my historical assumptions about ip may be wrong.

It seems that an inclusive manpage for the ip command was broken up into a shorter ip(8) manpage and 15 or more ip-(8) manpages. I'm basing this assumption on the long, inclusive manpages at https://linux.die.net/man/8/ip and on CentOS 6, while CentOS 7 and Fedora 24 each have the subdivided style. I won't debate the wisdom of this subdivision, only comment on how it was done.

The ip(8) manpage makes no mention of additional subordinate documents. Listing the additional documents in the See Also section is insufficient: that section is typically used to mention related commands and other sources of reference material such as info docs, wikis, blogs, or mailing lists.

When one does investigate one of the subordinate manpages, it does not state that it documents subcommands of the ip command. In fact, the ip-address(8) manpage says

The `ip address command' ... (quotes added)

My first thought was "typo": this is the manpage about the "ip-address" command. Of course there is no ip-address command. But "ip address" is not a command either; it is the "ip" command with an argument.

There are several commands that have broken their manpage into several manpages. Two which come to mind are zsh(1) and perl(1). The authors of those pages clearly state on the primary manpage that it is an overview page and give clear pointers to the additional manpages as well as additional documentation. I would recommend reorganizing the ip(8) manpage in a similar fashion.

Thank you for considering my opinion and for the development of an awesome tool.

Jon
--
Jon H. LaBadie jo...@jgcomp.com
Re: Crash due to mutex genl_lock called from RCU context
On Sat, 2016-11-26 at 18:08 -0800, Cong Wang wrote:
> On Fri, Nov 25, 2016 at 8:54 PM, Eric Dumazet wrote:
> >
> > Oh well, this wont work, since sk->sk_destruct will be called from RCU
> > callback.
> >
> > Grabbing the mutex should not be done from netlink_sock_destruct() but
> > from netlink_release()
>
> But you also change the behavior of cb.done(), currently it is called when the
> last sock ref is gone, with your patch it is called when the first
> sock is closed.

No. It will be called when the last refcount on the socket is released, i.e. when sk_refcnt transitions to its final 0.

My patch changes the sock_hold() to the variant that makes sure sk_refcnt is not 0 before the increase; otherwise a race can happen and release could be called twice. Classic refcounting stuff coupled to RCU rules.

> No?

Are you telling me inet_release() is called when we close() the first file descriptor?

fd1 = socket()
fd2 = dup(fd1);
close(fd2) -> release() ???

>
> I don't see why we need to get genl_lock in ->done() here, because we are
> already the last sock using it and module ref protects the ops from being
> removed via module, seems we are pretty safe without any lock.

Well, at least this exposes a real bug in Thomas' patch.

Removing the lock might be done for net-next, not stable branches.
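The refcounting rule Eric describes, taking a reference only if the count has not already reached 0, is the classic "increment unless zero" pattern; in the kernel it is typically built on atomic_inc_not_zero(). A userspace sketch with C11 atomics; `ref_get_not_zero()` and `ref_put()` are hypothetical names, and the RCU grace period that keeps the object's memory valid while a lookup performs this check is not modelled.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only if the object is still live (count > 0). */
bool ref_get_not_zero(atomic_int *refcnt)
{
	int old = atomic_load(refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(refcnt, &old, old + 1))
			return true;
	}
	return false;	/* already dropping to zero: do not resurrect it */
}

/* Drop a reference; returns true when the caller dropped the last one. */
bool ref_put(atomic_int *refcnt)
{
	return atomic_fetch_sub(refcnt, 1) == 1;
}
```

With a plain increment, a lookup racing with the final ref_put() could bump 0 back to 1 and later trigger a second release; refusing to increment from 0 closes that window.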
Re: [PATCH] mlx4: give precise rx/tx bytes/packets counters
On Sun, 2016-11-27 at 00:47 +0200, Saeed Mahameed wrote:
> On Fri, Nov 25, 2016 at 5:46 PM, Eric Dumazet wrote:
> As you see here in SRIOV mode (PF only) reads sw stats from FW.
> Tariq, I think we need to fix this.

Sure, my patch does not change this at all.

If mlx4_is_master() is false, then we aggregate the software stats and only the software stats.

My patch makes this aggregation possible at the time ethtool or ndo_get_stats64() is called, since this does not depend at all on the 250 ms timer fetching hardware stats.
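Eric's point is that software counters can be summed at the moment ethtool or ndo_get_stats64() asks, rather than being refreshed by the 250 ms hardware-stats timer. A sketch of on-demand aggregation over per-ring counters; `struct ring_stats_sketch` and `sum_rings()` are hypothetical, and the per-ring synchronization a real driver needs for 64-bit counters on 32-bit hosts (u64_stats_sync in the kernel) is omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-ring software counters, updated by the datapath. */
struct ring_stats_sketch {
	uint64_t packets;
	uint64_t bytes;
};

struct totals {
	uint64_t packets;
	uint64_t bytes;
};

/* Sum the rings when stats are requested, not from a periodic timer. */
struct totals sum_rings(const struct ring_stats_sketch *rings, size_t n)
{
	struct totals t = { 0, 0 };

	for (size_t i = 0; i < n; i++) {
		t.packets += rings[i].packets;
		t.bytes += rings[i].bytes;
	}
	return t;
}
```

Because nothing is cached, a reader always sees the counters as of the call, so the reported values are precise rather than up to one timer period stale.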
Re: Crash due to mutex genl_lock called from RCU context
On Fri, Nov 25, 2016 at 8:54 PM, Eric Dumazet wrote:
>
> Oh well, this wont work, since sk->sk_destruct will be called from RCU
> callback.
>
> Grabbing the mutex should not be done from netlink_sock_destruct() but
> from netlink_release()

But you also change the behavior of cb.done(): currently it is called when the last sock ref is gone, while with your patch it is called when the first sock is closed. No?

I don't see why we need to get genl_lock in ->done() here, because we are already the last sock using it and the module ref protects the ops from being removed via module, so it seems we are pretty safe without any lock.
Re: [PATCH net-next 6/6] tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING
Hi Francis,

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Yuchung-Cheng/tcp-sender-chronographs-instrumentation/20161127-041428
config: arm-spear6xx_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm

All errors (new ones prefixed by >>):

   net/built-in.o: In function `__skb_tstamp_tx':
>> net/core/skbuff.c:3846: undefined reference to `tcp_get_timestamping_opt_stats'

vim +3846 net/core/skbuff.c

  3840                  return;
  3841  
  3842          if (tsonly) {
  3843                  if ((sk->sk_tsflags & SOF_TIMESTAMPING_OPT_STATS) &&
  3844                      sk->sk_protocol == IPPROTO_TCP &&
  3845                      sk->sk_type == SOCK_STREAM)
> 3846                          skb = tcp_get_timestamping_opt_stats(sk);
  3847                  else
  3848                          skb = alloc_skb(0, GFP_ATOMIC);
  3849          } else {

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

.config.gz Description: application/gzip
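The report does not say why the symbol is missing in this arm configuration. One common way such a link error is handled, shown here only as a hedged userspace sketch, is to compile the call out when the code that provides the symbol is not built. `HAVE_TCP_SKETCH` is a hypothetical stand-in for a Kconfig symbol, and `tstamp_payload()`/`tcp_opt_stats_sketch()` are hypothetical names.

```c
#include <stdbool.h>

#define PAYLOAD_EMPTY 0	/* like alloc_skb(0, GFP_ATOMIC) */
#define PAYLOAD_STATS 1	/* like the TCP opt-stats skb */

/* Uncomment only when the TCP code (and its symbol) is linked in. */
/* #define HAVE_TCP_SKETCH 1 */

#ifdef HAVE_TCP_SKETCH
/* Hypothetical symbol provided by another object file. */
int tcp_opt_stats_sketch(void);
#endif

/*
 * Without HAVE_TCP_SKETCH the TCP branch is removed by the preprocessor,
 * so no reference to the TCP symbol reaches the linker.
 */
int tstamp_payload(bool want_stats, bool is_tcp)
{
#ifdef HAVE_TCP_SKETCH
	if (want_stats && is_tcp)
		return tcp_opt_stats_sketch();
#else
	(void)want_stats;
	(void)is_tcp;
#endif
	return PAYLOAD_EMPTY;
}
```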
Re: [PATCH net] sit: Set skb->protocol properly in ipip6_tunnel_xmit()
Hi Eli,

[Just for Dave's information]

On Fri, 25 Nov 2016 13:50:17 +0800 Eli Cooper wrote:
>
> Similar to commit ae148b085876
> ("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()"),
> sit tunnels also need to update skb->protocol; otherwise, TSO/GSO packets
> might not be properly segmented, which causes the packets being dropped.
>
> Reported-by: Stephen Rothwell
> Tested-by: Eli Cooper
> Cc: sta...@vger.kernel.org
> Signed-off-by: Eli Cooper

I tested this patch and it does *not* solve my problem.

--
Cheers,
Stephen Rothwell
Re: Large performance regression with 6in4 tunnel (sit)
Hi Eli,

On Sun, 27 Nov 2016 11:54:41 +1100 Stephen Rothwell wrote:
>
> On Fri, 25 Nov 2016 14:05:04 +0800 Eli Cooper wrote:
> >
> > I think this is similar to the bug I fixed in commit ae148b085876
> > ("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()").
> >
> > I can reproduce a similar problem by applying xfrm to sit traffic.
> > TSO/GSO packets are dropped when IPSec is enabled, and IPv6 throughput
> > drops to 10s of Kbps. I am not sure if this is the same issue you
> > experienced, but I wrote a patch that fixed at least the issue I had.
> >
> > Could you test the patch I sent to the mailing list just now?
>
> Thanks for the patch!
>
> It's a bit tricky to test since the problem only occurs in a production
> machine (I tried reproducing in a VM, but the problem did not occur),
> but I will try to just rebuild the sit module and see if I can insert
> the modified one.

OK, I tried your patch and unfortunately, it doesn't seem to have worked ... I still get the large packets dropped and resent smaller.

--
Cheers,
Stephen Rothwell
Re: netlink: GPF in sock_sndtimeo
On Sat, Nov 26, 2016 at 7:44 AM, Dmitry Vyukov wrote:
> Hello,
>
> The following program triggers GPF in sock_sndtimeo:
> https://gist.githubusercontent.com/dvyukov/c19cadd309791cf5cb9b2bf936d3f48d/raw/1743ba0211079a5465d039512b427bc6b59b1a76/gistfile1.txt
>
> On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24).
>
> general protection fault: [#1] SMP DEBUG_PAGEALLOC KASAN
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 1 PID: 19950 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 88002a0d0840 task.stack: 88003692
> RIP: 0010:[] [< inline >] sock_sndtimeo include/net/sock.h:2075
> RIP: 0010:[] [] netlink_unicast+0xe1/0x730 net/netlink/af_netlink.c:1232
> RSP: 0018:880036926f68 EFLAGS: 00010202
> RAX: 0068 RBX: 880036927000 RCX: c900021d
> RDX: 0d63 RSI: 024000c0 RDI: 0340
> RBP: 880036927028 R08: ed0006ea7aab R09: ed0006ea7aab
> R10: 0001 R11: ed0006ea7aaa R12: dc00
> R13: R14: 880035de3400 R15: 880035de3400
> FS: 7f90a2fc7700() GS:88003ed0() knlGS:
> CS: 0010 DS: ES: CR0: 80050033
> CR2: 006de0c0 CR3: 35de6000 CR4: 06e0
> Stack:
> 880035de3400 819f02a1 110006d24df4 0004
> 4db40014 880036926fd8 41b58ab3
> 89653c11 86cb3500 819f0345 880035de3400
> Call Trace:
> [< inline >] audit_replace kernel/audit.c:817
> [] audit_receive_msg+0x22c9/0x2ce0 kernel/audit.c:894
> [< inline >] audit_receive_skb kernel/audit.c:1120
> [] audit_receive+0x1dc/0x360 kernel/audit.c:1133
> [< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1214
> [] netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1240
> [] netlink_sendmsg+0xaa4/0xe50 net/netlink/af_netlink.c:1786
> [< inline >] sock_sendmsg_nosec net/socket.c:621
> [] sock_sendmsg+0xcf/0x110 net/socket.c:631
> [] sock_write_iter+0x32b/0x620 net/socket.c:829
> [< inline >] new_sync_write fs/read_write.c:499
> [] __vfs_write+0x4fe/0x830 fs/read_write.c:512
> [] vfs_write+0x175/0x4e0 fs/read_write.c:560
> [< inline >] SYSC_write fs/read_write.c:607
> [] SyS_write+0x100/0x240 fs/read_write.c:599
> [] do_syscall_64+0x2f4/0x940 arch/x86/entry/common.c:280
> [] entry_SYSCALL64_slow_path+0x25/0x25
> Code: fe 4c 89 f7 e8 31 16 ff ff 8b 8d 70 ff ff ff 49 89 c7 31 c0 85
> c9 75 25 e8 7d 4a a3 fa 49 8d bd 40 03 00 00 48 89 f8 48 c1 e8 03 <42>
> 80 3c 20 00 0f 85 3a 06 00 00 49 8b 85 40 03 00 00 4c 8d 73
> RIP [< inline >] sock_sndtimeo include/net/sock.h:2075
> RIP [] netlink_unicast+0xe1/0x730 net/netlink/af_netlink.c:1232
> RSP
> ---[ end trace 8383a15fba6fdc59 ]---

It is racy on audit_sock, especially on the netns exit path. Could the following patch help a little bit? Also, I don't see how the synchronize_net() here could sync with the netlink rcv path, since unlike packets from the wire, netlink messages are not handled in BH context, nor do I see any RCU taken on the rcv path.

diff --git a/kernel/audit.c b/kernel/audit.c
index f1ca116..20bc79e 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1167,10 +1167,13 @@ static void __net_exit audit_net_exit(struct net *net)
 {
         struct audit_net *aunet = net_generic(net, audit_net_id);
         struct sock *sock = aunet->nlsk;
+
+        mutex_lock(&audit_cmd_mutex);
         if (sock == audit_sock) {
                 audit_pid = 0;
                 audit_sock = NULL;
         }
+        mutex_unlock(&audit_cmd_mutex);
 
         RCU_INIT_POINTER(aunet->nlsk, NULL);
         synchronize_net();
Re: [PATCH v3 net-next 1/2] net: ethernet: slicoss: add slicoss gigabit ethernet driver
Hi Rami,

On 26.11.2016 16:48, Rami Rosen wrote:
>> @@ -0,0 +1,28 @@
>> +config NET_VENDOR_ALACRITECH
>> +	bool "Alacritech devices"
>> +	default y
>> +	---help---
>> +	  If you have a network (Ethernet) card belonging to this class, say Y.
>> +
>> +	  Note that the answer to this question doesn't directly affect the
>> +	  kernel: saying N will just cause the configurator to skip all
>
> Shouldn't it be "Alacritech devices" here, as appears earlier ?
>
>> +	  the questions about Renesas devices. If you say Y, you will be asked
>> +	  for your specific device in the following questions.
>> +
>
> ...
> ...
> ...
>> +struct slic_device {
>> +	struct pci_dev *pdev;
> ...
>> +	bool promisc;
>
> Seems that the autoneg boolean is not used anywhere, apart from
> setting it once to true in
> the slic_set_link_autoneg() method. Apart from this member it is not
> accessed anywhere, so it seems it should be removed.
>
>> +	bool autoneg;
>> +	int speed;

Yes, it definitely should not be Renesas :). This is a stupid copy and
paste error, I will fix it, thank you!

Agreed, the autoneg variable can be removed.

> ...
>
>> +static int slic_load_rcvseq_firmware(struct slic_device *sdev)
>> +{
>> +	const struct firmware *fw;
>> +	const char *file;
>> +	u32 codelen;
>> +	int idx = 0;
>> +	u32 instr;
>> +	u32 addr;
>> +	int err;
>> +
> ...
>> +	/* Do an initial sanity check concerning firmware size now. A further
>> +	 * check follows below.
>> +	 */
>> +	if (fw->size < SLIC_FIRMWARE_MIN_SIZE) {
>> +		dev_err(>pdev->dev,
>> +			"invalid firmware size %zu (min %u expected)\n",
>> +			fw->size, SLIC_FIRMWARE_MIN_SIZE);
>> +		err = -EINVAL;
>
> in the release label, always 0 is returned:
>
>> +		goto release;
>> +	}
>> +
>> +	codelen = slic_read_dword_from_firmware(fw, );
>> +
>> +	/* do another sanity check against firmware size */
>> +	if ((codelen + 4) > fw->size) {
>> +		dev_err(>pdev->dev,
>> +			"invalid rcv-sequencer firmware size %zu\n", fw->size);
>> +		err = -EINVAL;
>
> Again, in the release label, always 0 is returned:
>
>> +		goto release;
>> +	}
>> +
>>
>> +release:
>> +	release_firmware(fw);
>> +
>> +	return 0;
>> +}

This should return "err"; I will fix it.

Thanks a lot for the review!

Regards,
Lino
Re: Large performance regression with 6in4 tunnel (sit)
Hi Eli,

On Fri, 25 Nov 2016 14:05:04 +0800 Eli Cooper wrote:
>
> I think this is similar to the bug I fixed in commit ae148b085876
> ("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()").
>
> I can reproduce a similar problem by applying xfrm to sit traffic.
> TSO/GSO packets are dropped when IPSec is enabled, and IPv6 throughput
> drops to 10s of Kbps. I am not sure if this is the same issue you
> experienced, but I wrote a patch that fixed at least the issue I had.
>
> Could you test the patch I sent to the mailing list just now?

Thanks for the patch! It's a bit tricky to test since the problem only
occurs on a production machine (I tried reproducing it in a VM, but the
problem did not occur), but I will try to just rebuild the sit module
and see if I can insert the modified one.

-- 
Cheers,
Stephen Rothwell
[PATCH net-next 4/4] bnxt_en: Add PFC statistics.
Report PFC statistics to ethtool -S and DCBNL.

Signed-off-by: Michael Chan
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.h         |  7 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c     | 14 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 23 ---
 3 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 275e560..a72adec 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1124,6 +1124,13 @@ struct bnxt {
 	u32			lpi_tmr_hi;
 };
 
+#define BNXT_RX_STATS_OFFSET(counter)	\
+	(offsetof(struct rx_port_stats, counter) / 8)
+
+#define BNXT_TX_STATS_OFFSET(counter)			\
+	((offsetof(struct tx_port_stats, counter) +	\
+	  sizeof(struct rx_port_stats) + 512) / 8)
+
 #ifdef CONFIG_NET_RX_BUSY_POLL
 static inline void bnxt_enable_poll(struct bnxt_napi *bnapi)
 {
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
index f391b47..fdf2d8c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
@@ -347,8 +347,10 @@ static int bnxt_dcbnl_ieee_setets(struct net_device *dev, struct ieee_ets *ets)
 static int bnxt_dcbnl_ieee_getpfc(struct net_device *dev, struct ieee_pfc *pfc)
 {
 	struct bnxt *bp = netdev_priv(dev);
+	__le64 *stats = (__le64 *)bp->hw_rx_port_stats;
 	struct ieee_pfc *my_pfc = bp->ieee_pfc;
-	int rc;
+	long rx_off, tx_off;
+	int i, rc;
 
 	pfc->pfc_cap = bp->max_lltc;
 
@@ -369,6 +371,16 @@ static int bnxt_dcbnl_ieee_getpfc(struct net_device *dev, struct ieee_pfc *pfc)
 	pfc->mbc = my_pfc->mbc;
 	pfc->delay = my_pfc->delay;
 
+	if (!stats)
+		return 0;
+
+	rx_off = BNXT_RX_STATS_OFFSET(rx_pfc_ena_frames_pri0);
+	tx_off = BNXT_TX_STATS_OFFSET(tx_pfc_ena_frames_pri0);
+	for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++, rx_off++, tx_off++) {
+		pfc->requests[i] = le64_to_cpu(*(stats + tx_off));
+		pfc->indications[i] = le64_to_cpu(*(stats + rx_off));
+	}
+
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index fa6125e..784aa77 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -107,16 +107,9 @@ static int bnxt_set_coalesce(struct net_device *dev,
 
 #define BNXT_NUM_STATS	21
 
-#define BNXT_RX_STATS_OFFSET(counter)	\
-	(offsetof(struct rx_port_stats, counter) / 8)
-
 #define BNXT_RX_STATS_ENTRY(counter)	\
 	{ BNXT_RX_STATS_OFFSET(counter), __stringify(counter) }
 
-#define BNXT_TX_STATS_OFFSET(counter)			\
-	((offsetof(struct tx_port_stats, counter) +	\
-	  sizeof(struct rx_port_stats) + 512) / 8)
-
 #define BNXT_TX_STATS_ENTRY(counter)	\
 	{ BNXT_TX_STATS_OFFSET(counter), __stringify(counter) }
 
@@ -150,6 +143,14 @@ static int bnxt_set_coalesce(struct net_device *dev,
 	BNXT_RX_STATS_ENTRY(rx_tagged_frames),
 	BNXT_RX_STATS_ENTRY(rx_double_tagged_frames),
 	BNXT_RX_STATS_ENTRY(rx_good_frames),
+	BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri0),
+	BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri1),
+	BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri2),
+	BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri3),
+	BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri4),
+	BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri5),
+	BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri6),
+	BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri7),
 	BNXT_RX_STATS_ENTRY(rx_undrsz_frames),
 	BNXT_RX_STATS_ENTRY(rx_eee_lpi_events),
 	BNXT_RX_STATS_ENTRY(rx_eee_lpi_duration),
@@ -179,6 +180,14 @@ static int bnxt_set_coalesce(struct net_device *dev,
 	BNXT_TX_STATS_ENTRY(tx_fcs_err_frames),
 	BNXT_TX_STATS_ENTRY(tx_err),
 	BNXT_TX_STATS_ENTRY(tx_fifo_underruns),
+	BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri0),
+	BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri1),
+	BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri2),
+	BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri3),
+	BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri4),
+	BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri5),
+	BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri6),
+	BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri7),
 	BNXT_TX_STATS_ENTRY(tx_eee_lpi_events),
 	BNXT_TX_STATS_ENTRY(tx_eee_lpi_duration),
 	BNXT_TX_STATS_ENTRY(tx_total_collisions),
-- 
1.8.3.1
[PATCH net-next 0/4] bnxt_en: Add DCBNL support.
This series adds DCBNL operations to support host-based IEEE DCBX.

Michael Chan (4):
  bnxt_en: Re-factor bnxt_setup_tc().
  bnxt_en: Update firmware header file to include DCB command structs.
  bnxt_en: Implement DCBNL to support host-based DCBX.
  bnxt_en: Add PFC statistics.

 drivers/net/ethernet/broadcom/Kconfig             |  10 +
 drivers/net/ethernet/broadcom/bnxt/Makefile       |   2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c         |  30 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h         |  18 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c     | 502 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.h     |  59 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c |  23 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h     | 326 ++
 8 files changed, 952 insertions(+), 18 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.h

-- 
1.8.3.1
[PATCH net-next 2/4] bnxt_en: Update firmware header file to include DCB command structs.
Get and store the max number of lossless TCs the hardware can support.

Signed-off-by: Michael Chan
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |   4 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |   1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h | 326 ++
 3 files changed, 331 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index b75f4d0..58a75f4 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4252,12 +4252,16 @@ static int bnxt_hwrm_queue_qportcfg(struct bnxt *bp)
 		goto qportcfg_exit;
 	}
 	bp->max_tc = resp->max_configurable_queues;
+	bp->max_lltc = resp->max_configurable_lossless_queues;
 	if (bp->max_tc > BNXT_MAX_QUEUE)
 		bp->max_tc = BNXT_MAX_QUEUE;
 
 	if (resp->queue_cfg_info & QUEUE_QPORTCFG_RESP_QUEUE_CFG_INFO_ASYM_CFG)
 		bp->max_tc = 1;
 
+	if (bp->max_lltc > bp->max_tc)
+		bp->max_lltc = bp->max_tc;
+
 	qptr = >queue_id0;
 	for (i = 0; i < bp->max_tc; i++) {
 		bp->q_info[i].queue_id = *qptr++;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index fcd07ee..edde11e 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1010,6 +1010,7 @@ struct bnxt {
 	u32			rss_hash_cfg;
 
 	u8			max_tc;
+	u8			max_lltc;	/* lossless TCs */
 	struct bnxt_queue_info	q_info[BNXT_MAX_QUEUE];
 
 	unsigned int		current_interval;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h
index 0456d5b..5565612 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h
@@ -2355,6 +2355,39 @@ struct hwrm_queue_cfg_output {
 	u8 valid;
 };
 
+/* hwrm_queue_pfcenable_qcfg */
+/* Input (24 bytes) */
+struct hwrm_queue_pfcenable_qcfg_input {
+	__le16 req_type;
+	__le16 cmpl_ring;
+	__le16 seq_id;
+	__le16 target_id;
+	__le64 resp_addr;
+	__le16 port_id;
+	__le16 unused_0[3];
+};
+
+/* Output (16 bytes) */
+struct hwrm_queue_pfcenable_qcfg_output {
+	__le16 error_code;
+	__le16 req_type;
+	__le16 seq_id;
+	__le16 resp_len;
+	__le32 flags;
+	#define QUEUE_PFCENABLE_QCFG_RESP_FLAGS_PRI0_PFC_ENABLED    0x1UL
+	#define QUEUE_PFCENABLE_QCFG_RESP_FLAGS_PRI1_PFC_ENABLED    0x2UL
+	#define QUEUE_PFCENABLE_QCFG_RESP_FLAGS_PRI2_PFC_ENABLED    0x4UL
+	#define QUEUE_PFCENABLE_QCFG_RESP_FLAGS_PRI3_PFC_ENABLED    0x8UL
+	#define QUEUE_PFCENABLE_QCFG_RESP_FLAGS_PRI4_PFC_ENABLED    0x10UL
+	#define QUEUE_PFCENABLE_QCFG_RESP_FLAGS_PRI5_PFC_ENABLED    0x20UL
+	#define QUEUE_PFCENABLE_QCFG_RESP_FLAGS_PRI6_PFC_ENABLED    0x40UL
+	#define QUEUE_PFCENABLE_QCFG_RESP_FLAGS_PRI7_PFC_ENABLED    0x80UL
+	u8 unused_0;
+	u8 unused_1;
+	u8 unused_2;
+	u8 valid;
+};
+
 /* hwrm_queue_pfcenable_cfg */
 /* Input (24 bytes) */
 struct hwrm_queue_pfcenable_cfg_input {
@@ -2389,6 +2422,48 @@ struct hwrm_queue_pfcenable_cfg_output {
 	u8 valid;
 };
 
+/* hwrm_queue_pri2cos_qcfg */
+/* Input (24 bytes) */
+struct hwrm_queue_pri2cos_qcfg_input {
+	__le16 req_type;
+	__le16 cmpl_ring;
+	__le16 seq_id;
+	__le16 target_id;
+	__le64 resp_addr;
+	__le32 flags;
+	#define QUEUE_PRI2COS_QCFG_REQ_FLAGS_PATH                   0x1UL
+	#define QUEUE_PRI2COS_QCFG_REQ_FLAGS_PATH_TX                (0x0UL << 0)
+	#define QUEUE_PRI2COS_QCFG_REQ_FLAGS_PATH_RX                (0x1UL << 0)
+	#define QUEUE_PRI2COS_QCFG_REQ_FLAGS_PATH_LAST    QUEUE_PRI2COS_QCFG_REQ_FLAGS_PATH_RX
+	#define QUEUE_PRI2COS_QCFG_REQ_FLAGS_IVLAN                  0x2UL
+	u8 port_id;
+	u8 unused_0[3];
+};
+
+/* Output (24 bytes) */
+struct hwrm_queue_pri2cos_qcfg_output {
+	__le16 error_code;
+	__le16 req_type;
+	__le16 seq_id;
+	__le16 resp_len;
+	u8 pri0_cos_queue_id;
+	u8 pri1_cos_queue_id;
+	u8 pri2_cos_queue_id;
+	u8 pri3_cos_queue_id;
+	u8 pri4_cos_queue_id;
+	u8 pri5_cos_queue_id;
+	u8 pri6_cos_queue_id;
+	u8 pri7_cos_queue_id;
+	u8 queue_cfg_info;
+	#define QUEUE_PRI2COS_QCFG_RESP_QUEUE_CFG_INFO_ASYM_CFG     0x1UL
+	u8 unused_0;
+	__le16 unused_1;
+	u8 unused_2;
+	u8 unused_3;
+	u8 unused_4;
+	u8 valid;
+};
+
 /* hwrm_queue_pri2cos_cfg */
 /* Input (40 bytes) */
 struct hwrm_queue_pri2cos_cfg_input {
@@ -2439,6 +2514,257 @@ struct hwrm_queue_pri2cos_cfg_output {
 	u8 valid;
 };
 
+/* hwrm_queue_cos2bw_qcfg */
+/* Input (24 bytes) */
+struct
[PATCH net-next 3/4] bnxt_en: Implement DCBNL to support host-based DCBX.
Support only IEEE DCBX initially. Add IEEE DCBNL ops and functions to
get and set the hardware DCBX parameters. The DCB code is conditional
on Kconfig CONFIG_BNXT_DCB.

Signed-off-by: Michael Chan
---
 drivers/net/ethernet/broadcom/Kconfig         |  10 +
 drivers/net/ethernet/broadcom/bnxt/Makefile   |   2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |   8 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |   9 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c | 490 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.h |  59
 6 files changed, 575 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.h

diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig
index bd8c80c..404c020 100644
--- a/drivers/net/ethernet/broadcom/Kconfig
+++ b/drivers/net/ethernet/broadcom/Kconfig
@@ -203,4 +203,14 @@ config BNXT_SRIOV
 	  Virtualization support in the NetXtreme-C/E products. This
 	  allows for virtual function acceleration in virtual environments.
 
+config BNXT_DCB
+	bool "Data Center Bridging (DCB) Support"
+	default n
+	depends on BNXT && DCB
+	---help---
+	  Say Y here if you want to use Data Center Bridging (DCB) in the
+	  driver.
+
+	  If unsure, say N.
+
 endif # NET_VENDOR_BROADCOM
diff --git a/drivers/net/ethernet/broadcom/bnxt/Makefile b/drivers/net/ethernet/broadcom/bnxt/Makefile
index 97e78e2..b233a86 100644
--- a/drivers/net/ethernet/broadcom/bnxt/Makefile
+++ b/drivers/net/ethernet/broadcom/bnxt/Makefile
@@ -1,3 +1,3 @@
 obj-$(CONFIG_BNXT) += bnxt_en.o
 
-bnxt_en-y := bnxt.o bnxt_sriov.o bnxt_ethtool.o
+bnxt_en-y := bnxt.o bnxt_sriov.o bnxt_ethtool.o bnxt_dcb.o
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 58a75f4..cec24b4 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -54,6 +54,7 @@
 #include "bnxt.h"
 #include "bnxt_sriov.h"
 #include "bnxt_ethtool.h"
+#include "bnxt_dcb.h"
 
 #define BNXT_TX_TIMEOUT		(5 * HZ)
 
@@ -4988,7 +4989,7 @@ static void bnxt_enable_napi(struct bnxt *bp)
 	}
 }
 
-static void bnxt_tx_disable(struct bnxt *bp)
+void bnxt_tx_disable(struct bnxt *bp)
 {
 	int i;
 	struct bnxt_tx_ring_info *txr;
@@ -5006,7 +5007,7 @@ static void bnxt_tx_disable(struct bnxt *bp)
 	netif_carrier_off(bp->dev);
 }
 
-static void bnxt_tx_enable(struct bnxt *bp)
+void bnxt_tx_enable(struct bnxt *bp)
 {
 	int i;
 	struct bnxt_tx_ring_info *txr;
@@ -6677,6 +6678,7 @@ static void bnxt_remove_one(struct pci_dev *pdev)
 
 	bnxt_hwrm_func_drv_unrgtr(bp);
 	bnxt_free_hwrm_resources(bp);
+	bnxt_dcb_free(bp);
 	pci_iounmap(pdev, bp->bar2);
 	pci_iounmap(pdev, bp->bar1);
 	pci_iounmap(pdev, bp->bar0);
@@ -6904,6 +6906,8 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	dev->min_mtu = ETH_ZLEN;
 	dev->max_mtu = 9500;
 
+	bnxt_dcb_init(bp);
+
 #ifdef CONFIG_BNXT_SRIOV
 	init_waitqueue_head(>sriov_cfg_wait);
 #endif
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index edde11e..275e560 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1026,6 +1026,13 @@ struct bnxt {
 	struct bnxt_irq		*irq_tbl;
 	u8			mac_addr[ETH_ALEN];
 
+#ifdef CONFIG_BNXT_DCB
+	struct ieee_pfc		*ieee_pfc;
+	struct ieee_ets		*ieee_ets;
+	u8			dcbx_cap;
+	u8			default_pri;
+#endif /* CONFIG_BNXT_DCB */
+
 	u32			msg_enable;
 
 	u32			hwrm_spec_code;
@@ -1221,6 +1228,8 @@ static inline void bnxt_disable_poll(struct bnxt_napi *bnapi)
 int hwrm_send_message_silent(struct bnxt *, void *, u32, int);
 int bnxt_hwrm_set_coal(struct bnxt *);
 int bnxt_hwrm_func_qcaps(struct bnxt *);
+void bnxt_tx_disable(struct bnxt *bp);
+void bnxt_tx_enable(struct bnxt *bp);
 int bnxt_hwrm_set_pause(struct bnxt *);
 int bnxt_hwrm_set_link_setting(struct bnxt *, bool, bool);
 int bnxt_hwrm_fw_set_time(struct bnxt *);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
new file mode 100644
index 000..f391b47
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
@@ -0,0 +1,490 @@
+/* Broadcom NetXtreme-C/E network driver.
+ *
+ * Copyright (c) 2014-2016 Broadcom Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
[PATCH net-next 1/4] bnxt_en: Re-factor bnxt_setup_tc().
Add a new function bnxt_setup_mq_tc() to handle MQPRIO. This new
function will be called during ETS setup when we add DCBNL in the next
patch.

Signed-off-by: Michael Chan
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 18 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  1 +
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 8c7bdbe..b75f4d0 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -6328,17 +6328,10 @@ static int bnxt_change_mtu(struct net_device *dev, int new_mtu)
 	return 0;
 }
 
-static int bnxt_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
-			 struct tc_to_netdev *ntc)
+int bnxt_setup_mq_tc(struct net_device *dev, u8 tc)
 {
 	struct bnxt *bp = netdev_priv(dev);
 	bool sh = false;
-	u8 tc;
-
-	if (ntc->type != TC_SETUP_MQPRIO)
-		return -EINVAL;
-
-	tc = ntc->tc;
 
 	if (tc > bp->max_tc) {
 		netdev_err(dev, "too many traffic classes requested: %d Max supported is %d\n",
@@ -6381,6 +6374,15 @@ static int bnxt_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
 	return 0;
 }
 
+static int bnxt_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+			 struct tc_to_netdev *ntc)
+{
+	if (ntc->type != TC_SETUP_MQPRIO)
+		return -EINVAL;
+
+	return bnxt_setup_mq_tc(dev, ntc->tc);
+}
+
 #ifdef CONFIG_RFS_ACCEL
 static bool bnxt_fltr_match(struct bnxt_ntuple_filter *f1,
 			    struct bnxt_ntuple_filter *f2)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 47be789..fcd07ee 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1225,5 +1225,6 @@ static inline void bnxt_disable_poll(struct bnxt_napi *bnapi)
 int bnxt_hwrm_fw_set_time(struct bnxt *);
 int bnxt_open_nic(struct bnxt *, bool, bool);
 int bnxt_close_nic(struct bnxt *, bool, bool);
+int bnxt_setup_mq_tc(struct net_device *dev, u8 tc);
 int bnxt_get_max_rings(struct bnxt *, int *, int *, bool);
 #endif
-- 
1.8.3.1
Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()
On 11/26/2016 12:09 PM, Daniel Borkmann wrote:
> On 11/26/2016 07:46 AM, Cong Wang wrote:
>> On Thu, Nov 24, 2016 at 7:20 AM, Daniel Borkmann wrote:
>>> [...]
>>> Ok, strange, qdisc_destroy() calls into ops->destroy(), where ingress
>>> drops its entire chain via tcf_destroy_chain(), so that will be NULL
>>> eventually. The tps are freed by call_rcu() as well as qdisc itself
>>> later on via qdisc_rcu_free(), where it frees per-cpu bstats as well.
>>> Outstanding readers should either bail out due to if (!cl) or can still
>>> process the chain until read section ends, but during that time, cl->q
>>> resp. bstats should be good. Do you happen to know what's at address
>>> 880a68b04028? I was wondering wrt call_rcu() vs call_rcu_bh(), but at
>>> least on ingress (netif_receive_skb_internal()) we hold rcu_read_lock()
>>> here. The KASAN report is reliably happening at this location, right?
>>
>> I am confused as well, I don't see how it could be related to my patch
>> yet. I will take a deep look in the weekend.
>
> Ok, I'm currently on the run. Got too late yesterday night, but I'll
> write what I found in the evening today, not related to ingress though.

Just pushed out my analysis to netdev under "[PATCH net] net, sched:
respect rcu grace period on cls destruction".

My conclusion is that both issues are actually separate, and that one is
small enough that we could route it via net. Perhaps this at the same
time shrinks your "[PATCH net-next] net_sched: move the empty tp check
from ->destroy() to ->delete()" to a reasonable size so that it's
suitable for net as well. Your ->delete()/->destroy() one is definitely
needed, too. The tp->root one is independent of ->delete()/->destroy(),
as they are different races, and the tp->root one could also happen when
you just destroy the whole tp directly. I think that seems like a good
path forward to me.

Thanks,
Daniel
[PATCH net] net, sched: respect rcu grace period on cls destruction
Roi reported a crash in flower where tp->root was NULL in the
->classify() callbacks. The reason is that tp->root is set to NULL in
->destroy() via RCU_INIT_POINTER(). This is problematic for some of the
classifiers because it doesn't respect the RCU grace period for them;
as a result, still-outstanding readers from tc_classify() will try to
blindly dereference a NULL tp->root.

The tp->root object is strictly private to the classifier implementation
and holds internal data that the core (e.g. tc_ctl_tfilter()) doesn't
know about. Within some classifiers, such as cls_bpf, cls_basic, etc.,
tp->root is only checked for NULL in the ->get() callback, but nowhere
else. This is misleading and seems to have been copied from old
classifier code that was not cleaned up properly. For example,
d3fa76ee6b4a ("[NET_SCHED]: cls_basic: fix NULL pointer dereference")
moved tp->root initialization into the ->init() routine; before that it
was part of ->change(), so ->get() had to deal with tp->root being NULL
back then. That was indeed a valid case before d3fa76ee6b4a, but not
really anymore after it.

Long ago we used to set tp->root to NULL in ->destroy(), see
47a1a1d4be29 ("pkt_sched: remove unnecessary xchg() in packet
classifiers"); the NULLifying was reintroduced with the RCUification,
but it's not correct for every classifier implementation.

In the cases fixed here, with the one exception of cls_cgroup, the
tp->root object is allocated and initialized inside the ->init()
callback, which always runs after we allocate a new tp; at that point
tp, and thus tp->root, is not yet globally visible in the tp chain (see
tc_ctl_tfilter()). Also, on destruction, tp->root is strictly
kfree_rcu()'ed in the ->destroy() handler, and the same holds for the
tp, which is kfree_rcu()'ed right after we return from ->destroy() in
tcf_destroy(). This means the head object's lifetime for such
classifiers is always tied to the tp lifetime.
The RCU callbacks for the two kfree_rcu() calls could be invoked out of
order, but that's fine since both objects are independent. Dropping the
RCU_INIT_POINTER(tp->root, NULL) for these classifiers means that 1) we
don't need a useless NULL check in the fast path, and 2) outstanding
readers of that tp in tc_classify() can still execute while respecting
the RCU grace period, as is actually expected.

Things that haven't been touched here: cls_fw and cls_route. They each
handle tp->root being NULL in the ->classify() path for historic
reasons, so their ->destroy() implementation can stay as is. If someone
actually cares, they could get cleaned up at some point to avoid the
test in the fast path. cls_u32 doesn't set tp->root to NULL. For
cls_rsvp, I just added a !head check should anyone actually be
using/testing it, so it at least aligns with cls_fw and cls_route. For
cls_flower, we additionally need to defer rhashtable destruction (to a
sleepable context) until after the RCU grace period, as concurrent
readers might still access it. (Note that in this case we need to hold
a module reference to keep the work callback address intact, since on
module unload we only wait for all call_rcu()s to finish.)

This fixes one race and brings the RCU grace period guarantees back. The
next step, which Cong is working on, is to fix 1e052be69d04
("net_sched: destroy proto tp when all filters are gone"): get the order
of unlinking the tp in tc_ctl_tfilter() right for the RTM_DELTFILTER
case by moving RCU_INIT_POINTER() before tcf_destroy(), and let the
removal notification be done through the prior ->delete() callback.
Both are independent issues. Once we have that right, we can clean up
tp->root for a number of classifiers by no longer making them RCU
pointers; this requires a new callback (->uninit), triggered from the
tp's RCU callback, from which we just kfree() tp->root.
Fixes: 1f947bf151e9 ("net: sched: rcu'ify cls_bpf")
Fixes: 9888faefe132 ("net: sched: cls_basic use RCU")
Fixes: 70da9f0bf999 ("net: sched: cls_flow use RCU")
Fixes: 77b9900ef53a ("tc: introduce Flower classifier")
Fixes: bf3994d2ed31 ("net/sched: introduce Match-all classifier")
Fixes: 952313bd6258 ("net: sched: cls_cgroup use RCU")
Reported-by: Roi Dayan
Signed-off-by: Daniel Borkmann
Cc: Cong Wang
Cc: John Fastabend
Cc: Roi Dayan
Cc: Jiri Pirko
---
 net/sched/cls_basic.c    |  4 
 net/sched/cls_bpf.c      |  4 
 net/sched/cls_cgroup.c   |  7 +++
 net/sched/cls_flow.c     |  1 -
 net/sched/cls_flower.c   | 31 ++-
 net/sched/cls_matchall.c |  1 -
 net/sched/cls_rsvp.h     |  3 ++-
 net/sched/cls_tcindex.c  |  1 -
 8 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
index eb219b7..5877f60 100644
--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -62,9 +62,6 @@ static unsigned long basic_get(struct tcf_proto *tp, u32 handle)
 	struct basic_head *head =
Re: [PATCH] mlx4: give precise rx/tx bytes/packets counters
On Fri, Nov 25, 2016 at 5:46 PM, Eric Dumazet wrote:
> From: Eric Dumazet
>
> mlx4 stats are chaotic because a deferred work queue is responsible
> to update them every 250 ms.
>

Hello Eric,

Well, the only historical reason for this deferred work is that we
query the FW for some counters, which might sleep. And there is one
place in the kernel where dev_get_stats(dev, ) is called under a rw
lock, "read_lock(_base_lock);", in
http://lxr.free-electrons.com/source/net/core/net-sysfs.c#L552; I am not
sure why it is this way.

Maybe it is time to fix this and get rid of the deferred work, which
would give you the same precision even when reading ethtool stats, which
this patch didn't take care of. This would also help other drivers that
might sleep while reading stats.

> Even sampling stats every one second with "sar -n DEV 1" gives
> variations like the following :
>
> lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65
> 07:39:22 eth0 146877.00 3265554.00 9467.15 4828168.50
> 07:39:23 eth0 146587.00 3260329.00 9448.15 4820445.98
> 07:39:24 eth0 146894.00 3259989.00 9468.55 4819943.26
> 07:39:25 eth0 110368.00 2454497.00 7113.95 3629012.17 <<>>
> 07:39:26 eth0 146563.00 3257502.00 9447.25 4816266.23
> 07:39:27 eth0 145678.00 3258292.00 9389.79 4817414.39
> 07:39:28 eth0 145268.00 3253171.00 9363.85 4809852.46
> 07:39:29 eth0 146439.00 3262185.00 9438.97 4823172.48
> 07:39:30 eth0 146758.00 3264175.00 9459.94 4826124.13
> 07:39:31 eth0 146843.00 3256903.00 9465.44 4815381.97
> Average: eth0 142827.50 3179259.70 9206.30 4700578.16
>
> This patch allows rx/tx bytes/packets counters being folded at the
> time we need stats.
>
> We now can fetch stats every 1 ms if we want to check NIC behavior
> on a small time window. It is also easier to detect anomalies.
>
> lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65
> 07:42:50 eth0 142915.00 3177696.00 9212.06 4698270.42
> 07:42:51 eth0 143741.00 3200232.00 9265.15 4731593.02
> 07:42:52 eth0 142781.00 3171600.00 9202.92 4689260.16
> 07:42:53 eth0 143835.00 3192932.00 9271.80 4720761.39
> 07:42:54 eth0 141922.00 3165174.00 9147.64 4679759.21
> 07:42:55 eth0 142993.00 3207038.00 9216.78 4741653.05
> 07:42:56 eth0 141394.06 3154335.64 9113.85 4663731.73
> 07:42:57 eth0 141850.00 3161202.00 9144.48 4673866.07
> 07:42:58 eth0 143439.00 3180736.00 9246.05 4702755.35
> 07:42:59 eth0 143501.00 3210992.00 9249.99 4747501.84
> Average: eth0 142835.66 3182165.93 9206.98 4704874.08
>
> Signed-off-by: Eric Dumazet
> Cc: Tariq Toukan
> ---
>  drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |  2
>  drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |  1
>  drivers/net/ethernet/mellanox/mlx4/en_port.c    | 77 +-
>  drivers/net/ethernet/mellanox/mlx4/mlx4_en.h    |  1
>  4 files changed, 58 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> index 487a58f9c192896852fef271b6cce9bde132deb7..d9c9f86a30df953fa555934c5406057dcaf28960 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> @@ -367,6 +367,8 @@ static void mlx4_en_get_ethtool_stats(struct net_device *dev,
>
>  	spin_lock_bh(>stats_lock);
>
> +	mlx4_en_fold_software_stats(dev);
> +
>  	for (i = 0; i < NUM_MAIN_STATS; i++, bitmap_iterator_inc())
>  		if (bitmap_iterator_test())
>  			data[index++] = ((unsigned long *)>stats)[i];
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index 9018bb1b2e12142e048281a9d28ddf95e0023a61..d28d841db23ce885d2011877a156bacf23f65afe 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -1321,6 +1321,7 @@ mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
>  	struct mlx4_en_priv *priv = netdev_priv(dev);
>
>  	spin_lock_bh(>stats_lock);
> +	mlx4_en_fold_software_stats(dev);
>  	netdev_stats_to_stats64(stats, >stats);
>  	spin_unlock_bh(>stats_lock);
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_port.c b/drivers/net/ethernet/mellanox/mlx4/en_port.c
> index 1eb4c1e10bad1dad26049876acf107a2073a6ab1..c6c4f1238923e09eced547454b86c68720292859 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_port.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_port.c
> @@ -147,6 +147,39 @@ static unsigned long en_stats_adder(__be64 *start, __be64 *next, int num)
>  	return ret;
>  }
>
> +void mlx4_en_fold_software_stats(struct net_device *dev)
> +{
> +
Re: [PATCH 1/1] NET: usb: cdc_ncm: adding MBIM RESET_FUNCTION request and modifying ncm bind common code
Bjørn Morkwrites: > Finally, I found my modems (or at least a number of them) again today. > But I'm sorry to say, that the troublesome Huawei E3372h-153 is still > giving us a hard time. It does not work with your patch. The symptom is > the same as earlier: The modem returns MBIM frames with 32bit headers. > > So for now, I have to NAK this patch. > > I am sure we can find a good solution that makes all of these modems > work, but I cannot support a patch that breaks previously working > configurations. Sorry. I'll do a few experiments and see if there is a > simple fix for this. Otherwise we'll probably have to do the quirk > game. This is a proof-of-concept only, but it appears to be working. Please test with your device(s) too. It's still mostly your code, as you can see. If this turns out to work, then I believe we should refactor cdc_ncm_init() and cdc_ncm_bind_common() to make the whole initialisation sequence a bit cleaner. And maybe also include cdc_mbim_bind(). Ideally, the MBIM specific RESET should happen there instead of "polluting" the NCM driver with MBIM specific code. But anyway: The sequence that seems to work for both the E3372h-153 and the EM7455 is USB_CDC_GET_NTB_PARAMETERS USB_CDC_RESET_FUNCTION usb_set_interface(dev->udev, 'data interface no', 0); remaining parts of cdc_ncm_init(), excluding USB_CDC_GET_NTB_PARAMETERS usb_set_interface(dev->udev, 'data interface no', 'data alt setting'); without any additional delay between the two usb_set_interface() calls. So the major difference from your patch is that I moved the two control requests out of cdc_ncm_init() to allow running them _before_ setting the data interface to altsetting 0. But maybe I was just lucky. This was barely proof tested. Needs a lot more testing and cleanups as suggested. I'd appreciate it if you continued that, as I don't really have any time for it... 
FWIW, I also ran a quick test with a D-Link DWM-156A7 (Mediatek MBIM firmware) and a Huawei E367 (Qualcomm device with early Huawei MBIM firmware, distinctly different from the E3372h-153 and most other MBIM devices I've seen) Bjørn --- drivers/net/usb/cdc_ncm.c| 48 include/uapi/linux/usb/cdc.h | 1 + 2 files changed, 32 insertions(+), 17 deletions(-) diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c index 877c9516e781..be019cbf1719 100644 --- a/drivers/net/usb/cdc_ncm.c +++ b/drivers/net/usb/cdc_ncm.c @@ -488,16 +488,6 @@ static int cdc_ncm_init(struct usbnet *dev) u8 iface_no = ctx->control->cur_altsetting->desc.bInterfaceNumber; int err; - err = usbnet_read_cmd(dev, USB_CDC_GET_NTB_PARAMETERS, - USB_TYPE_CLASS | USB_DIR_IN - |USB_RECIP_INTERFACE, - 0, iface_no, &ctx->ncm_parm, - sizeof(ctx->ncm_parm)); - if (err < 0) { - dev_err(&dev->intf->dev, "failed GET_NTB_PARAMETERS\n"); - return err; /* GET_NTB_PARAMETERS is required */ - } - /* set CRC Mode */ if (cdc_ncm_flags(dev) & USB_CDC_NCM_NCAP_CRC_MODE) { dev_dbg(&dev->intf->dev, "Setting CRC mode off\n"); @@ -837,12 +827,43 @@ int cdc_ncm_bind_common(struct usbnet *dev, struct usb_interface *intf, u8 data_ } } + iface_no = ctx->control->cur_altsetting->desc.bInterfaceNumber; + temp = usbnet_read_cmd(dev, USB_CDC_GET_NTB_PARAMETERS, + USB_TYPE_CLASS | USB_DIR_IN + | USB_RECIP_INTERFACE, + 0, iface_no, &ctx->ncm_parm, + sizeof(ctx->ncm_parm)); + if (temp < 0) { + dev_err(&dev->intf->dev, "failed GET_NTB_PARAMETERS\n"); + goto error; /* GET_NTB_PARAMETERS is required */ + } + + /* Some modems (e.g. Telit LE922A6) need to reset the MBIM function +* or they will fail to work properly.
+* For details on RESET_FUNCTION request see document +* "USB Communication Class Subclass Specification for MBIM" +* RESET_FUNCTION should be harmless for all the other MBIM modems +*/ + if (cdc_ncm_comm_intf_is_mbim(ctx->control->cur_altsetting)) { + temp = usbnet_write_cmd(dev, USB_CDC_RESET_FUNCTION, + USB_TYPE_CLASS | USB_DIR_OUT + | USB_RECIP_INTERFACE, + 0, iface_no, NULL, 0); + if (temp < 0) + dev_err(&dev->intf->dev, "failed RESET_FUNCTION\n"); + } + iface_no = ctx->data->cur_altsetting->desc.bInterfaceNumber; /* Reset data interface. Some devices will not reset properly * unless they are configured first. Toggle the altsetting to * force a reset +* This is applied only to ncm
[GIT] Networking
1) Fix leak in fsl/fman driver, from Dan Carpenter. 2) Call flow dissector initcall earlier than any networking driver can register and start to use it, from Eric Dumazet. 3) Some dup header fixes from Geliang Tang. 4) TIPC link monitoring compat fix from Jon Paul Maloy. 5) Link changes require EEE re-negotiation in bcm_sf2 driver, from Florian Fainelli. 6) Fix bogus handle ID passed into tfilter_notify_chain(), from Roman Mashak. 7) Fix dump size calculation in rtnl_calcit(), from Zhang Shengju. Please pull, thanks a lot! The following changes since commit 3b404a519815b9820f73f1ecf404e5546c9270ba: Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security (2016-11-21 15:27:41 -0800) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git for you to fetch changes up to 6998cc6ec23740347670da13186d2979c5401903: tipc: resolve connection flow control compatibility problem (2016-11-25 21:38:16 -0500) Andrew Lunn (1): net: ethernet: mvneta: Remove IFF_UNICAST_FLT which is not implemented Andy Gospodarek (1): bnxt: do not busy-poll when link is down Arnd Bergmann (1): mvpp2: use correct size for memset Christophe Jaillet (1): bnxt_en: Fix a VXLAN vs GENEVE issue Dan Carpenter (1): fsl/fman: fix a leak in tgec_free() David S. 
Miller (2): Merge branch 'for-upstream' of git://git.kernel.org/.../bluetooth/bluetooth Merge tag 'linux-can-fixes-for-4.9-20161123' of git://git.kernel.org/.../mkl/linux-can Eric Dumazet (2): flow_dissect: call init_default_flow_dissectors() earlier udplite: call proper backlog handlers Florian Fainelli (1): net: dsa: bcm_sf2: Ensure we re-negotiate EEE during after link change Gao Feng (1): driver: macvlan: Check if need rollback multicast setting in macvlan_open Geliang Tang (4): dwc_eth_qos: drop duplicate headers ibmvnic: drop duplicate header seq_file.h net: ieee802154: drop duplicate header delay.h net/mlx5: drop duplicate header delay.h Johan Hedberg (1): Bluetooth: Fix using the correct source address type Jon Paul Maloy (3): tipc: fix compatibility bug in link monitoring tipc: improve sanity check for received domain records tipc: resolve connection flow control compatibility problem Kirill Esipov (1): net: phy: micrel: fix KSZ8041FTL supported value Miroslav Lichvar (1): net: ethtool: don't require CAP_NET_ADMIN for ETHTOOL_GLINKSETTINGS Oliver Hartkopp (1): can: bcm: fix support for CAN FD frames Paolo Abeni (1): ipv6: bump genid when the IFA_F_TENTATIVE flag is clear Randy Dunlap (1): netdevice.h: fix kernel-doc warning Roman Mashak (1): net sched filters: fix filter handle ID in tfilter_notify_chain() Tariq Toukan (1): net/mlx4_en: Free netdev resources under state lock WANG Cong (1): net: revert "net: l2tp: Treat NET_XMIT_CN as success in l2tp_eth_dev_xmit" Zhang Shengju (1): rtnetlink: fix the wrong minimal dump size getting from rtnl_calcit() drivers/net/dsa/bcm_sf2.c | 4 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 15 --- drivers/net/ethernet/freescale/fman/fman_tgec.c | 3 --- drivers/net/ethernet/ibm/ibmvnic.c | 1 - drivers/net/ethernet/marvell/mvneta.c | 2 +- drivers/net/ethernet/marvell/mvpp2.c| 2 +- drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 5 - drivers/net/ethernet/mellanox/mlx5/core/main.c | 1 - 
drivers/net/ethernet/synopsys/dwc_eth_qos.c | 2 -- drivers/net/ieee802154/adf7242.c| 1 - drivers/net/macvlan.c | 3 ++- drivers/net/phy/micrel.c| 8 include/linux/netdevice.h | 2 +- include/net/bluetooth/hci_core.h| 2 +- net/bluetooth/6lowpan.c | 4 ++-- net/bluetooth/hci_conn.c| 26 -- net/bluetooth/l2cap_core.c | 2 +- net/bluetooth/rfcomm/tty.c | 2 +- net/bluetooth/sco.c | 2 +- net/can/bcm.c | 18 ++ net/core/ethtool.c | 1 + net/core/flow_dissector.c | 2 +- net/core/rtnetlink.c| 2 +- net/ipv4/udp.c | 2 +- net/ipv4/udp_impl.h | 2 +- net/ipv4/udplite.c | 2 +- net/ipv6/addrconf.c | 18 -- net/ipv6/udp.c | 2 +- net/ipv6/udp_impl.h | 2 +- net/ipv6/udplite.c | 2 +-
[PATCH] amd-xgbe: Fix unused suspend handlers build warning
From: Borislav Petkov

Fix:

  drivers/net/ethernet/amd/xgbe/xgbe-main.c:835:12: warning: ‘xgbe_suspend’ defined but not used [-Wunused-function]
  drivers/net/ethernet/amd/xgbe/xgbe-main.c:855:12: warning: ‘xgbe_resume’ defined but not used [-Wunused-function]

I see it during randconfig builds here.

Signed-off-by: Borislav Petkov
Cc: Tom Lendacky
Cc: netdev@vger.kernel.org
--- drivers/net/ethernet/amd/xgbe/xgbe-main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-main.c b/drivers/net/ethernet/amd/xgbe/xgbe-main.c index e10e569c0d5f..2e8451b0a74a 100644 --- a/drivers/net/ethernet/amd/xgbe/xgbe-main.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-main.c @@ -831,7 +831,7 @@ static int xgbe_remove(struct platform_device *pdev) return 0; } -#ifdef CONFIG_PM +#ifdef CONFIG_PM_SLEEP static int xgbe_suspend(struct device *dev) { struct net_device *netdev = dev_get_drvdata(dev); @@ -876,7 +876,7 @@ static int xgbe_resume(struct device *dev) return ret; } -#endif /* CONFIG_PM */ +#endif /* CONFIG_PM_SLEEP */ #ifdef CONFIG_ACPI static const struct acpi_device_id xgbe_acpi_match[] = { -- 2.10.0
Re: net: GPF in eth_header
2016-11-26 12:05 GMT-08:00 Eric Dumazet: >> Hi Eric, >> >> The crash happens when the kernel tries to access shadow for nonmapped >> memory. >> >> The issue here is an integer overflow which happens in >> neigh_resolve_output(). >> skb_network_offset(skb) can return a negative number, but __skb_pull() >> accepts unsigned int as len. >> As a result, the least significant bit in the higher 32 bits of skb->data >> gets set and we get an out-of-bounds access with an offset of 4 GB. >> >> I've attached a short reproducer, but you either need KASAN or to add >> a BUG_ON to see the crash. >> In this reproducer skb_network_offset() becomes negative after merging >> two ipv6 fragments. >> >> I actually see multiple places where skb_network_offset() is used as >> an argument to skb_pull(). >> So I guess every place can potentially be buggy. > > Well, I think the intent is to accept a negative number. > > This definitely was assumed by commit e1f165032c8bade authors! > > I guess they were using a 32-bit kernel for their tests. The correct fix would be to use skb_push(skb, -skb_network_offset(skb)), as done in other locations...
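A minimal userspace sketch of the overflow described in this thread, assuming a simplified model in which skb->data is just an integer address. The model_pull64(), model_pull32() and model_push64() helpers are hypothetical; they only mimic the unsigned int len parameter of __skb_pull() and skb_push(), not the kernel implementations. The sketch shows why a negative skb_network_offset() moves the pointer forward by almost 4 GB on a 64-bit kernel but happens to wrap back to the intended address on a 32-bit one, and why passing the negated offset to a push helper avoids the problem.

```c
#include <assert.h>
#include <stdint.h>

/* Like __skb_pull(): len is unsigned int, so a negative int argument
 * becomes 2^32 - |len| and is then zero-extended onto a 64-bit address. */
static uint64_t model_pull64(uint64_t data, unsigned int len)
{
	return data + len;
}

/* Same call on a 32-bit address: the addition wraps modulo 2^32, which
 * happens to yield data - |len|, so the bug stays hidden on 32-bit. */
static uint32_t model_pull32(uint32_t data, unsigned int len)
{
	return data + len;
}

/* Like skb_push(): move data back by len bytes. Passing the negated
 * (now positive) offset expresses the intended move explicitly. */
static uint64_t model_push64(uint64_t data, unsigned int len)
{
	return data - len;
}
```

With an offset of -14, the 64-bit pull lands 0xfffffff2 bytes past data (that is, data - 14 + 4 GB), while the 32-bit pull and the 64-bit push both land 14 bytes before data.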
Re: [PATCH 1/1] NET: usb: cdc_ncm: adding MBIM RESET_FUNCTION request and modifying ncm bind common code
Finally, I found my modems (or at least a number of them) again today. But I'm sorry to say, that the troublesome Huawei E3372h-153 is still giving us a hard time. It does not work with your patch. The symptom is the same as earlier: The modem returns MBIM frames with 32bit headers. So for now, I have to NAK this patch. I am sure we can find a good solution that makes all of these modems work, but I cannot support a patch that breaks previously working configurations. Sorry. I'll do a few experiments and see if there is a simple fix for this. Otherwise we'll probably have to do the quirk game. Bjørn
[PATCH net-next 4/6] tcp: instrument how long TCP is limited by insufficient send buffer
From: Francis Yan

This patch measures the amount of time during which TCP runs out of new data to send to the network because of an insufficient send buffer, while TCP is still busy delivering (i.e. the write queue is not empty). The goal is to indicate whether send buffer autotuning or the user's SO_SNDBUF setting has resulted in network under-utilization.

The measurement starts conservatively, checking various conditions to minimize false claims (i.e. under-estimation is more likely). The measurement stops when the SOCK_NOSPACE flag is cleared, but it does not account for the time elapsed until the next application write. Also, the measurement only starts if the sender is still busy sending data, so that the accounted limit is part of the total busy time.

Signed-off-by: Francis Yan
Signed-off-by: Yuchung Cheng
Signed-off-by: Soheil Hassas Yeganeh
--- net/ipv4/tcp.c| 10 -- net/ipv4/tcp_input.c | 5 - net/ipv4/tcp_output.c | 12 3 files changed, 24 insertions(+), 3 deletions(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 913f9bb..259ffb5 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -996,8 +996,11 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset, goto out; out_err: /* make sure we wake any epoll edge trigger waiter */ - if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 && err == -EAGAIN)) + if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 && +err == -EAGAIN)) { sk->sk_write_space(sk); + tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED); + } return sk_stream_error(sk, flags, err); } @@ -1331,8 +1334,11 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) out_err: err = sk_stream_error(sk, flags, err); /* make sure we wake any epoll edge trigger waiter */ - if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 && err == -EAGAIN)) + if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 && +err == -EAGAIN)) { sk->sk_write_space(sk); + tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED); + } release_sock(sk); return err; } diff --git a/net/ipv4/tcp_input.c
b/net/ipv4/tcp_input.c index a5d1727..56fe736 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -5059,8 +5059,11 @@ static void tcp_check_space(struct sock *sk) /* pairs with tcp_poll() */ smp_mb__after_atomic(); if (sk->sk_socket && - test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) + test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) { tcp_new_space(sk); + if (!test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) + tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED); + } } } diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index b7c..d3545d0 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1514,6 +1514,18 @@ static void tcp_cwnd_validate(struct sock *sk, bool is_cwnd_limited) if (sysctl_tcp_slow_start_after_idle && (s32)(tcp_time_stamp - tp->snd_cwnd_stamp) >= inet_csk(sk)->icsk_rto) tcp_cwnd_application_limited(sk); + + /* The following conditions together indicate the starvation +* is caused by insufficient sender buffer: +* 1) just sent some data (see tcp_write_xmit) +* 2) not cwnd limited (this else condition) +* 3) no more data to send (null tcp_send_head ) +* 4) application is hitting buffer limit (SOCK_NOSPACE) +*/ + if (!tcp_send_head(sk) && sk->sk_socket && + test_bit(SOCK_NOSPACE, &sk->sk_socket->flags) && + (1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) + tcp_chrono_start(sk, TCP_CHRONO_SNDBUF_LIMITED); } } -- 2.8.0.rc3.226.g39d4020
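The start condition added to tcp_cwnd_validate() can be sketched as a standalone predicate. This is a simplified model: struct model_sender and model_sndbuf_limited() are hypothetical stand-ins for the socket fields the patch reads, the TCP state numbers are assumed to match the kernel's (TCP_ESTABLISHED = 1, TCP_CLOSE = 7, TCP_CLOSE_WAIT = 8), and conditions 1) and 2) from the patch comment are assumed to be guaranteed by the caller (the not-cwnd-limited branch, right after data was sent).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to match the kernel's TCP state numbering. */
enum { MODEL_TCP_ESTABLISHED = 1, MODEL_TCP_CLOSE = 7, MODEL_TCP_CLOSE_WAIT = 8 };
#define MODEL_TCPF(s) (1U << (s))   /* same idea as TCPF_* = 1 << TCP_* */

/* Hypothetical snapshot of the inputs the patch looks at. */
struct model_sender {
	bool has_send_head;  /* tcp_send_head(sk) != NULL: unsent data queued */
	bool has_socket;     /* sk->sk_socket != NULL */
	bool nospace;        /* SOCK_NOSPACE set: writer hit the sndbuf limit */
	int state;           /* sk->sk_state */
};

/* Conditions 3) and 4) from the patch comment, plus the state filter. */
static bool model_sndbuf_limited(const struct model_sender *s)
{
	return !s->has_send_head && s->has_socket && s->nospace &&
	       (MODEL_TCPF(s->state) &
		(MODEL_TCPF(MODEL_TCP_ESTABLISHED) | MODEL_TCPF(MODEL_TCP_CLOSE_WAIT)));
}
```

The bitmask test checks membership in a set of states with a single AND, which is why the kernel defines TCPF_* flags alongside the TCP_* state numbers.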
[PATCH net-next 5/6] tcp: export sender limits chronographs to TCP_INFO
From: Francis Yan

This patch exports all the sender chronograph measurements collected in the previous patches to the TCP_INFO interface. Note that the exported busy time includes the other sending limits (rwnd-limited, sndbuf-limited). Internally the time unit is the jiffy, but externally the measurements are in microseconds to allow for future extensions.

Signed-off-by: Francis Yan
Signed-off-by: Yuchung Cheng
Signed-off-by: Soheil Hassas Yeganeh
--- include/uapi/linux/tcp.h | 4 net/ipv4/tcp.c | 20 2 files changed, 24 insertions(+) diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h index 73ac0db..2863b66 100644 --- a/include/uapi/linux/tcp.h +++ b/include/uapi/linux/tcp.h @@ -214,6 +214,10 @@ struct tcp_info { __u32 tcpi_data_segs_out; /* RFC4898 tcpEStatsDataSegsOut */ __u64 tcpi_delivery_rate; + + __u64 tcpi_busy_time; /* Time (usec) busy sending data */ + __u64 tcpi_rwnd_limited; /* Time (usec) limited by receive window */ + __u64 tcpi_sndbuf_limited; /* Time (usec) limited by send buffer */ }; /* for TCP_MD5SIG socket option */ diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 259ffb5..cdde20f 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2708,6 +2708,25 @@ int compat_tcp_setsockopt(struct sock *sk, int level, int optname, EXPORT_SYMBOL(compat_tcp_setsockopt); #endif +static void tcp_get_info_chrono_stats(const struct tcp_sock *tp, + struct tcp_info *info) +{ + u64 stats[__TCP_CHRONO_MAX], total = 0; + enum tcp_chrono i; + + for (i = TCP_CHRONO_BUSY; i < __TCP_CHRONO_MAX; ++i) { + stats[i] = tp->chrono_stat[i - 1]; + if (i == tp->chrono_type) + stats[i] += tcp_time_stamp - tp->chrono_start; + stats[i] *= USEC_PER_SEC / HZ; + total += stats[i]; + } + + info->tcpi_busy_time = total; + info->tcpi_rwnd_limited = stats[TCP_CHRONO_RWND_LIMITED]; + info->tcpi_sndbuf_limited = stats[TCP_CHRONO_SNDBUF_LIMITED]; +} + /* Return information about state of tcp endpoint in API format.
*/ void tcp_get_info(struct sock *sk, struct tcp_info *info) { @@ -2800,6 +2819,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info) info->tcpi_bytes_acked = tp->bytes_acked; info->tcpi_bytes_received = tp->bytes_received; info->tcpi_notsent_bytes = max_t(int, 0, tp->write_seq - tp->snd_nxt); + tcp_get_info_chrono_stats(tp, info); unlock_sock_fast(sk, slow); -- 2.8.0.rc3.226.g39d4020
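A userspace sketch of the conversion done by tcp_get_info_chrono_stats(), assuming HZ = 1000 and using hypothetical struct and enum names. It folds the running chrono into its bucket, scales jiffies to microseconds, and reports busy time as the sum of all three non-idle buckets, which is why tcpi_busy_time includes the rwnd-limited and sndbuf-limited periods.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_HZ 1000U               /* assumed jiffy rate */
#define MODEL_USEC_PER_SEC 1000000U

enum info_chrono { I_UNSPEC, I_BUSY, I_RWND_LIMITED, I_SNDBUF_LIMITED, I_MAX };

struct info_tp {
	uint32_t chrono_stat[3];        /* accumulated jiffies for types 1..3 */
	uint32_t chrono_start;          /* jiffy when the current chrono began */
	enum info_chrono chrono_type;   /* currently running chrono */
};

struct model_info { uint64_t busy_us, rwnd_us, sndbuf_us; };

static void model_chrono_stats(const struct info_tp *tp, uint32_t now,
			       struct model_info *info)
{
	uint64_t stats[I_MAX] = { 0 }, total = 0;

	for (int i = I_BUSY; i < I_MAX; ++i) {
		stats[i] = tp->chrono_stat[i - 1];
		if (i == (int)tp->chrono_type)  /* include the still-running chrono */
			stats[i] += (uint32_t)(now - tp->chrono_start);
		stats[i] *= MODEL_USEC_PER_SEC / MODEL_HZ;  /* jiffies -> usec */
		total += stats[i];
	}
	info->busy_us = total;          /* busy time covers all non-idle buckets */
	info->rwnd_us = stats[I_RWND_LIMITED];
	info->sndbuf_us = stats[I_SNDBUF_LIMITED];
}
```

Because USEC_PER_SEC / HZ is an integer division, the scale factor is exact for HZ values such as 100, 250 or 1000, but truncates for HZ = 300 (3333 instead of 3333.33...), so the reported microseconds would be slightly low on such kernels.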
[PATCH net-next 6/6] tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING
From: Francis Yan

This patch exports the sender chronograph stats via the socket SO_TIMESTAMPING channel. Currently we can instrument how long a particular application unit of data was queued in TCP by tracking SOF_TIMESTAMPING_TX_SOFTWARE and SOF_TIMESTAMPING_TX_SCHED. Exporting the sender chronograph stats together with these timestamps allows a further breakdown by sender limitation. For example, a video server can tell whether a particular chunk of video on a connection took a long time to deliver because TCP was experiencing a small receive window. Before this patch, that was not possible without packet traces.

To obtain these stats, the user needs to set the SOF_TIMESTAMPING_OPT_STATS and SOF_TIMESTAMPING_OPT_TSONLY flags while requesting other SOF_TIMESTAMPING TX timestamps. When the timestamps are available in the error queue, the stats are returned in a separate control message of type SCM_TIMESTAMPING_OPT_STATS, as a list of TLVs (struct nlattr) of types TCP_NLA_BUSY_TIME, TCP_NLA_RWND_LIMITED and TCP_NLA_SNDBUF_LIMITED. The unit is microseconds.
Signed-off-by: Francis Yan Signed-off-by: Yuchung Cheng Signed-off-by: Soheil Hassas Yeganeh --- Documentation/networking/timestamping.txt | 10 ++ arch/alpha/include/uapi/asm/socket.h | 2 ++ arch/frv/include/uapi/asm/socket.h| 2 ++ arch/ia64/include/uapi/asm/socket.h | 2 ++ arch/m32r/include/uapi/asm/socket.h | 2 ++ arch/mips/include/uapi/asm/socket.h | 2 ++ arch/mn10300/include/uapi/asm/socket.h| 2 ++ arch/parisc/include/uapi/asm/socket.h | 2 ++ arch/powerpc/include/uapi/asm/socket.h| 2 ++ arch/s390/include/uapi/asm/socket.h | 2 ++ arch/sparc/include/uapi/asm/socket.h | 2 ++ arch/xtensa/include/uapi/asm/socket.h | 2 ++ include/linux/tcp.h | 2 ++ include/uapi/asm-generic/socket.h | 2 ++ include/uapi/linux/net_tstamp.h | 3 ++- include/uapi/linux/tcp.h | 8 net/core/skbuff.c | 12 +--- net/core/sock.c | 7 +++ net/ipv4/tcp.c| 20 net/socket.c | 7 ++- 20 files changed, 88 insertions(+), 5 deletions(-) diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt index 671cccf..96f5069 100644 --- a/Documentation/networking/timestamping.txt +++ b/Documentation/networking/timestamping.txt @@ -182,6 +182,16 @@ SOF_TIMESTAMPING_OPT_TSONLY: the timestamp even if sysctl net.core.tstamp_allow_data is 0. This option disables SOF_TIMESTAMPING_OPT_CMSG. +SOF_TIMESTAMPING_OPT_STATS: + + Optional stats that are obtained along with the transmit timestamps. + It must be used together with SOF_TIMESTAMPING_OPT_TSONLY. When the + transmit timestamp is available, the stats are available in a + separate control message of type SCM_TIMESTAMPING_OPT_STATS, as a + list of TLVs (struct nlattr) of types. These stats allow the + application to associate various transport layer stats with + the transmit timestamps, such as how long a certain block of + data was limited by peer's receiver window. 
New applications are encouraged to pass SOF_TIMESTAMPING_OPT_ID to disambiguate timestamps and SOF_TIMESTAMPING_OPT_TSONLY to operate diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h index 9e46d6e..afc901b 100644 --- a/arch/alpha/include/uapi/asm/socket.h +++ b/arch/alpha/include/uapi/asm/socket.h @@ -97,4 +97,6 @@ #define SO_CNX_ADVICE 53 +#define SCM_TIMESTAMPING_OPT_STATS 54 + #endif /* _UAPI_ASM_SOCKET_H */ diff --git a/arch/frv/include/uapi/asm/socket.h b/arch/frv/include/uapi/asm/socket.h index afbc98f0..81e0353 100644 --- a/arch/frv/include/uapi/asm/socket.h +++ b/arch/frv/include/uapi/asm/socket.h @@ -90,5 +90,7 @@ #define SO_CNX_ADVICE 53 +#define SCM_TIMESTAMPING_OPT_STATS 54 + #endif /* _ASM_SOCKET_H */ diff --git a/arch/ia64/include/uapi/asm/socket.h b/arch/ia64/include/uapi/asm/socket.h index 0018fad..57feb0c 100644 --- a/arch/ia64/include/uapi/asm/socket.h +++ b/arch/ia64/include/uapi/asm/socket.h @@ -99,4 +99,6 @@ #define SO_CNX_ADVICE 53 +#define SCM_TIMESTAMPING_OPT_STATS 54 + #endif /* _ASM_IA64_SOCKET_H */ diff --git a/arch/m32r/include/uapi/asm/socket.h b/arch/m32r/include/uapi/asm/socket.h index 5fe42fc..5853f8e9 100644 --- a/arch/m32r/include/uapi/asm/socket.h +++ b/arch/m32r/include/uapi/asm/socket.h @@ -90,4 +90,6 @@ #define SO_CNX_ADVICE 53 +#define SCM_TIMESTAMPING_OPT_STATS 54 + #endif /* _ASM_M32R_SOCKET_H */ diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h index 2027240a..566ecdc 100644 --- a/arch/mips/include/uapi/asm/socket.h +++
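On the receiving side, an application has to walk the TLV list carried in the SCM_TIMESTAMPING_OPT_STATS control message. The sketch below is a hedged, self-contained parser for such a list. struct model_nlattr mirrors the layout of struct nlattr (16-bit length including the 4-byte header, 16-bit type, 4-byte alignment), and model_find_u64() is a hypothetical helper. The numeric values of TCP_NLA_BUSY_TIME and the other types are not shown in this excerpt, so the test uses arbitrary type numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct nlattr: length (header included), then type. */
struct model_nlattr {
	uint16_t nla_len;
	uint16_t nla_type;
};
#define MODEL_NLA_ALIGN(len) (((len) + 3U) & ~3U)   /* 4-byte alignment */

/* Find a u64 attribute of the given type in a TLV list.
 * Returns 0 and stores the value on success, -1 if absent or malformed. */
static int model_find_u64(const void *buf, size_t len, uint16_t type,
			  uint64_t *out)
{
	const unsigned char *p = buf;

	while (len >= sizeof(struct model_nlattr)) {
		struct model_nlattr nla;
		size_t step;

		memcpy(&nla, p, sizeof(nla));       /* avoid unaligned access */
		if (nla.nla_len < sizeof(nla) || nla.nla_len > len)
			return -1;                  /* malformed attribute */
		if (nla.nla_type == type &&
		    nla.nla_len >= sizeof(nla) + sizeof(*out)) {
			memcpy(out, p + sizeof(nla), sizeof(*out));
			return 0;
		}
		step = MODEL_NLA_ALIGN(nla.nla_len);
		if (step > len)
			break;                      /* padding runs past the buffer */
		p += step;
		len -= step;
	}
	return -1;
}
```

In a real program the buffer would presumably be CMSG_DATA(cmsg) with length cmsg->cmsg_len - CMSG_LEN(0), for a cmsg whose cmsg_type is SCM_TIMESTAMPING_OPT_STATS.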
[PATCH net-next 0/6] tcp: sender chronographs instrumentation
This patch set provides instrumentation of TCP sender limitations. While developing the BBR congestion control, we noticed that the TCP sending process is often limited by factors unrelated to congestion control: an insufficient sender buffer and/or an insufficient receive window/buffer to saturate the network bandwidth. Unfortunately these limits are not visible to users, and the poor performance is often attributed to the congestion control of choice. This patch set aims to give users a high-level understanding of what the sending process is limited by, similar to the TCP_INFO design. It is not meant to replace detailed kernel tracing and instrumentation facilities. In addition, this patch set provides a new option to the timestamping interface to instrument these limits per application data unit. For example, one can use SO_TIMESTAMPING and this patch set to measure how long a particular HTTP response is limited by a small receive window. The patch set was initially written by Francis Yan and then polished by Yuchung Cheng, with lots of help from Eric Dumazet and Soheil Hassas Yeganeh.
Francis Yan (6): tcp: instrument tcp sender limits chronographs tcp: instrument how long TCP is busy sending tcp: instrument how long TCP is limited by receive window tcp: instrument how long TCP is limited by insufficient send buffer tcp: export sender limits chronographs to TCP_INFO tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING Documentation/networking/timestamping.txt | 10 + arch/alpha/include/uapi/asm/socket.h | 2 + arch/frv/include/uapi/asm/socket.h| 2 + arch/ia64/include/uapi/asm/socket.h | 2 + arch/m32r/include/uapi/asm/socket.h | 2 + arch/mips/include/uapi/asm/socket.h | 2 + arch/mn10300/include/uapi/asm/socket.h| 2 + arch/parisc/include/uapi/asm/socket.h | 2 + arch/powerpc/include/uapi/asm/socket.h| 2 + arch/s390/include/uapi/asm/socket.h | 2 + arch/sparc/include/uapi/asm/socket.h | 2 + arch/xtensa/include/uapi/asm/socket.h | 2 + include/linux/tcp.h | 9 - include/net/tcp.h | 20 +- include/uapi/asm-generic/socket.h | 2 + include/uapi/linux/net_tstamp.h | 3 +- include/uapi/linux/tcp.h | 12 ++ net/core/skbuff.c | 12 -- net/core/sock.c | 7 net/ipv4/tcp.c| 50 ++- net/ipv4/tcp_input.c | 8 +++- net/ipv4/tcp_output.c | 66 ++- net/socket.c | 7 +++- 23 files changed, 215 insertions(+), 13 deletions(-) -- 2.8.0.rc3.226.g39d4020
[PATCH net-next 1/6] tcp: instrument tcp sender limits chronographs
From: Francis Yan

This patch implements the skeleton of the TCP chronograph instrumentation for sender-side limits:

  1) idle (unspec)
  2) busy sending data other than 3-4 below
  3) rwnd-limited
  4) sndbuf-limited

The limits are enumerated in 'enum tcp_chrono'. Since a connection can in theory be idle forever, we do not track the length of this uninteresting idle period. For the rest, we track how long the sender spends in each limit. At any point during the lifetime of a connection, the sender must be in one of the four states. If multiple conditions worthy of tracking hold at once, the highest-priority enum takes precedence over the others, so that if something "more interesting" starts happening, the previous chrono is stopped and a new one is started.

The time unit is the jiffy (u32) in order to save space in tcp_sock. This implies that the application must sample the stats at least once every 49 days with a 1 ms jiffy (2^32 ms is about 49.7 days).

Signed-off-by: Francis Yan
Signed-off-by: Yuchung Cheng
Signed-off-by: Soheil Hassas Yeganeh
--- include/linux/tcp.h | 7 +-- include/net/tcp.h | 14 ++ net/ipv4/tcp_output.c | 30 ++ 3 files changed, 49 insertions(+), 2 deletions(-) diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 32a7c7e..d5d3bd8 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -211,8 +211,11 @@ struct tcp_sock { u8 reord;/* reordering detected */ } rack; u16 advmss; /* Advertised MSS */ - u8 rate_app_limited:1, /* rate_{delivered,interval_us} limited? */ - unused:7; + u32 chrono_start; /* Start time in jiffies of a TCP chrono */ + u32 chrono_stat[3]; /* Time in jiffies for chrono_stat stats */ + u8 chrono_type:2, /* current chronograph type */ + rate_app_limited:1, /* rate_{delivered,interval_us} limited? */ + unused:5; u8 nonagle : 4,/* Disable Nagle algorithm?
*/ thin_lto: 1,/* Use linear timeouts for thin streams */ thin_dupack : 1,/* Fast retransmit on first dupack */ diff --git a/include/net/tcp.h b/include/net/tcp.h index 7de8073..e5ff408 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1516,6 +1516,20 @@ struct tcp_fastopen_context { struct rcu_head rcu; }; +/* Latencies incurred by various limits for a sender. They are + * chronograph-like stats that are mutually exclusive. + */ +enum tcp_chrono { + TCP_CHRONO_UNSPEC, + TCP_CHRONO_BUSY, /* Actively sending data (non-empty write queue) */ + TCP_CHRONO_RWND_LIMITED, /* Stalled by insufficient receive window */ + TCP_CHRONO_SNDBUF_LIMITED, /* Stalled by insufficient send buffer */ + __TCP_CHRONO_MAX, +}; + +void tcp_chrono_start(struct sock *sk, const enum tcp_chrono type); +void tcp_chrono_stop(struct sock *sk, const enum tcp_chrono type); + /* write queue abstraction */ static inline void tcp_write_queue_purge(struct sock *sk) { diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 19105b4..34f7517 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2081,6 +2081,36 @@ static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb, return false; } +static void tcp_chrono_set(struct tcp_sock *tp, const enum tcp_chrono new) +{ + const u32 now = tcp_time_stamp; + + if (tp->chrono_type > TCP_CHRONO_UNSPEC) + tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start; + tp->chrono_start = now; + tp->chrono_type = new; +} + +void tcp_chrono_start(struct sock *sk, const enum tcp_chrono type) +{ + struct tcp_sock *tp = tcp_sk(sk); + + /* If there are multiple conditions worthy of tracking in a +* chronograph then the highest priority enum takes precedence over +* the other conditions. So that if something "more interesting" +* starts happening, stop the previous chrono and start a new one. 
+*/ + if (type > tp->chrono_type) + tcp_chrono_set(tp, type); +} + +void tcp_chrono_stop(struct sock *sk, const enum tcp_chrono type) +{ + struct tcp_sock *tp = tcp_sk(sk); + + tcp_chrono_set(tp, TCP_CHRONO_UNSPEC); +} + /* This routine writes packets to the network. It advances the * send_head. This happens as incoming acks open up the remote * window for us. -- 2.8.0.rc3.226.g39d4020
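The bookkeeping in tcp_chrono_set(), tcp_chrono_start() and tcp_chrono_stop() from this patch can be exercised in userspace with a small model. The names are hypothetical, and the current jiffy is passed in explicitly instead of reading tcp_time_stamp. The model shows the priority rule (a higher enum value preempts a lower one, and a lower one is ignored) and shows that the u32 subtraction stays correct across a single counter wrap.

```c
#include <assert.h>
#include <stdint.h>

enum model_chrono { M_UNSPEC, M_BUSY, M_RWND_LIMITED, M_SNDBUF_LIMITED };

struct model_tp {
	uint32_t chrono_start;          /* jiffy when the current chrono started */
	uint32_t chrono_stat[3];        /* accumulated jiffies for types 1..3 */
	enum model_chrono chrono_type;  /* currently running chrono */
};

/* Close the running chrono (if any) and open a new one at 'now'. */
static void model_chrono_set(struct model_tp *tp, enum model_chrono new_type,
			     uint32_t now)
{
	if (tp->chrono_type > M_UNSPEC)  /* idle time is not tracked */
		tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;
	tp->chrono_start = now;
	tp->chrono_type = new_type;
}

/* Higher enum value means higher priority; lower-priority starts are ignored. */
static void model_chrono_start(struct model_tp *tp, enum model_chrono type,
			       uint32_t now)
{
	if (type > tp->chrono_type)
		model_chrono_set(tp, type, now);
}

/* Version from this patch: any stop returns the socket to idle. */
static void model_chrono_stop(struct model_tp *tp, uint32_t now)
{
	model_chrono_set(tp, M_UNSPEC, now);
}
```

Both the per-interval subtraction and the u32 accumulators wrap after 2^32 jiffies; with 1 ms jiffies that is about 49.7 days, which matches the sampling requirement in the commit message.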
[PATCH net-next 2/6] tcp: instrument how long TCP is busy sending
From: Francis Yan

This patch measures TCP busy time, which is defined as the period of time during which the sender has data (or a FIN) to send. The time starts when data is buffered and stops when the write queue is flushed by ACKs or error events. Note that the busy time does not include SYN time, unless data is included in the SYN (i.e. Fast Open). It does include FIN time even if the FIN carries no payload. Excluding a pure FIN is possible but would incur one additional test in the fast path, which may not be worth it.

Signed-off-by: Francis Yan
Signed-off-by: Yuchung Cheng
Signed-off-by: Soheil Hassas Yeganeh
--- include/net/tcp.h | 6 +- net/ipv4/tcp_input.c | 3 +++ net/ipv4/tcp_output.c | 19 --- 3 files changed, 24 insertions(+), 4 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index e5ff408..3e097e3 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1535,6 +1535,7 @@ static inline void tcp_write_queue_purge(struct sock *sk) { struct sk_buff *skb; + tcp_chrono_stop(sk, TCP_CHRONO_BUSY); while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL) sk_wmem_free_skb(sk, skb); sk_mem_reclaim(sk); @@ -1593,8 +1594,10 @@ static inline void tcp_advance_send_head(struct sock *sk, const struct sk_buff * static inline void tcp_check_send_head(struct sock *sk, struct sk_buff *skb_unlinked) { - if (sk->sk_send_head == skb_unlinked) + if (sk->sk_send_head == skb_unlinked) { sk->sk_send_head = NULL; + tcp_chrono_stop(sk, TCP_CHRONO_BUSY); + } if (tcp_sk(sk)->highest_sack == skb_unlinked) tcp_sk(sk)->highest_sack = NULL; } @@ -1616,6 +1619,7 @@ static inline void tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb /* Queue it, remembering where we must start sending.
*/ if (sk->sk_send_head == NULL) { sk->sk_send_head = skb; + tcp_chrono_start(sk, TCP_CHRONO_BUSY); if (tcp_sk(sk)->highest_sack == NULL) tcp_sk(sk)->highest_sack = skb; diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 22e6a20..a5d1727 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3178,6 +3178,9 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets, tp->lost_skb_hint = NULL; } + if (!skb) + tcp_chrono_stop(sk, TCP_CHRONO_BUSY); + if (likely(between(tp->snd_up, prior_snd_una, tp->snd_una))) tp->snd_up = tp->snd_una; diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 34f7517..e8ea584 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2096,8 +2096,8 @@ void tcp_chrono_start(struct sock *sk, const enum tcp_chrono type) struct tcp_sock *tp = tcp_sk(sk); /* If there are multiple conditions worthy of tracking in a -* chronograph then the highest priority enum takes precedence over -* the other conditions. So that if something "more interesting" +* chronograph then the highest priority enum takes precedence +* over the other conditions. So that if something "more interesting" * starts happening, stop the previous chrono and start a new one. */ if (type > tp->chrono_type) @@ -2108,7 +2108,18 @@ void tcp_chrono_stop(struct sock *sk, const enum tcp_chrono type) { struct tcp_sock *tp = tcp_sk(sk); - tcp_chrono_set(tp, TCP_CHRONO_UNSPEC); + + /* There are multiple conditions worthy of tracking in a +* chronograph, so that the highest priority enum takes +* precedence over the other conditions (see tcp_chrono_start). +* If a condition stops, we only stop chrono tracking if +* it's the "most interesting" or current chrono we are +* tracking and starts busy chrono if we have pending data. +*/ + if (tcp_write_queue_empty(sk)) + tcp_chrono_set(tp, TCP_CHRONO_UNSPEC); + else if (type == tp->chrono_type) + tcp_chrono_set(tp, TCP_CHRONO_BUSY); } /* This routine writes packets to the network. 
It advances the @@ -3328,6 +3339,8 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn) fo->copied = space; tcp_connect_queue_skb(sk, syn_data); + if (syn_data->len) + tcp_chrono_start(sk, TCP_CHRONO_BUSY); err = tcp_transmit_skb(sk, syn_data, 1, sk->sk_allocation); -- 2.8.0.rc3.226.g39d4020
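This patch changes tcp_chrono_stop() so that stopping a limit does not always return the socket to idle. A hypothetical userspace model of the new rule, with a boolean standing in for tcp_write_queue_empty(sk): an empty queue goes to UNSPEC, stopping the current chrono while data is still queued falls back to BUSY, and stopping a chrono that is not the current one changes nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum model_chrono2 { N_UNSPEC, N_BUSY, N_RWND_LIMITED, N_SNDBUF_LIMITED };

struct model_tp2 {
	uint32_t chrono_start;
	uint32_t chrono_stat[3];
	enum model_chrono2 chrono_type;
	bool write_queue_empty;   /* stands in for tcp_write_queue_empty(sk) */
};

static void model2_set(struct model_tp2 *tp, enum model_chrono2 t, uint32_t now)
{
	if (tp->chrono_type > N_UNSPEC)
		tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;
	tp->chrono_start = now;
	tp->chrono_type = t;
}

static void model2_start(struct model_tp2 *tp, enum model_chrono2 t, uint32_t now)
{
	if (t > tp->chrono_type)   /* higher-priority condition preempts */
		model2_set(tp, t, now);
}

/* Revised stop: idle only when nothing is queued; otherwise a stop of the
 * current chrono falls back to busy, and other stops are ignored. */
static void model2_stop(struct model_tp2 *tp, enum model_chrono2 t, uint32_t now)
{
	if (tp->write_queue_empty)
		model2_set(tp, N_UNSPEC, now);
	else if (t == tp->chrono_type)
		model2_set(tp, N_BUSY, now);
}
```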
[PATCH net-next 3/6] tcp: instrument how long TCP is limited by receive window
From: Francis Yan

This patch measures the total time during which TCP stops sending because the receiver's advertised window is not large enough. Note that once the limit is lifted, we are likely in the busy state if we have data pending.

Signed-off-by: Francis Yan
Signed-off-by: Yuchung Cheng
Signed-off-by: Soheil Hassas Yeganeh
--- net/ipv4/tcp_output.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index e8ea584..b7c 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2144,7 +2144,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, unsigned int tso_segs, sent_pkts; int cwnd_quota; int result; - bool is_cwnd_limited = false; + bool is_cwnd_limited = false, is_rwnd_limited = false; u32 max_segs; sent_pkts = 0; @@ -2181,8 +2181,10 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, break; } - if (unlikely(!tcp_snd_wnd_test(tp, skb, mss_now))) + if (unlikely(!tcp_snd_wnd_test(tp, skb, mss_now))) { + is_rwnd_limited = true; break; + } if (tso_segs == 1) { if (unlikely(!tcp_nagle_test(tp, skb, mss_now, @@ -2227,6 +2229,11 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, break; } + if (is_rwnd_limited) + tcp_chrono_start(sk, TCP_CHRONO_RWND_LIMITED); + else + tcp_chrono_stop(sk, TCP_CHRONO_RWND_LIMITED); + if (likely(sent_pkts)) { if (tcp_in_cwnd_reduction(sk)) tp->prr_out += sent_pkts; -- 2.8.0.rc3.226.g39d4020
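The detection added to tcp_write_xmit() amounts to remembering whether the send loop stopped on the receive-window test. Below is a simplified, hypothetical model: equal-sized segments, no cwnd, Nagle or TSO checks, and a model_after() written like the kernel's wrap-safe sequence comparison. After the loop, the patch calls tcp_chrono_start(sk, TCP_CHRONO_RWND_LIMITED) when the flag is set and tcp_chrono_stop(sk, TCP_CHRONO_RWND_LIMITED) otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "seq1 is after seq2", in the style of the kernel's after(). */
static bool model_after(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq2 - seq1) < 0;
}

/* Hypothetical simplified sender: equal-sized segments queued back to back. */
struct model_xmit {
	uint32_t snd_nxt;      /* next sequence number to send */
	uint32_t wnd_end;      /* snd_una + snd_wnd */
	uint32_t mss;          /* segment size */
	unsigned int queued;   /* segments waiting to be sent */
};

/* Send as many segments as the receive window allows. Returns the number
 * sent and sets *rwnd_limited if the loop stopped on the window test. */
static unsigned int model_write_xmit(struct model_xmit *x, bool *rwnd_limited)
{
	unsigned int sent = 0;

	*rwnd_limited = false;
	while (x->queued > 0) {
		uint32_t end_seq = x->snd_nxt + x->mss;

		if (model_after(end_seq, x->wnd_end)) {  /* window test failed */
			*rwnd_limited = true;
			break;
		}
		x->snd_nxt = end_seq;
		x->queued--;
		sent++;
	}
	return sent;
}
```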
Re: net: GPF in eth_header
> Hi Eric, > > The crash happens when the kernel tries to access shadow for nonmapped memory. > > The issue here is an integer overflow which happens in neigh_resolve_output(). > skb_network_offset(skb) can return a negative number, but __skb_pull() > accepts unsigned int as len. > As a result, the least significant bit in the higher 32 bits of skb->data > gets set and we get an out-of-bounds access with an offset of 4 GB. > > I've attached a short reproducer, but you either need KASAN or to add > a BUG_ON to see the crash. > In this reproducer skb_network_offset() becomes negative after merging > two ipv6 fragments. > > I actually see multiple places where skb_network_offset() is used as > an argument to skb_pull(). > So I guess every place can potentially be buggy. Well, I think the intent is to accept a negative number. This definitely was assumed by commit e1f165032c8bade authors! I guess they were using a 32-bit kernel for their tests.
unsuscribe
Re: net: stmmac: Meson GXBB: attempting to execute userspace memory
On 2016-11-26 07:53, Heinrich Schuchardt wrote: For Odroid C2 I have compiled kernel 4.9.0-rc6-next-20161124-1-gbf7e142 with one additional patch https://github.com/xypron/kernel-odroid-c2/blob/master/patch/0001-stmmac-RTL8211F-Meson-GXBB-TX-throughput-problems.patch I repeatedly see faults like the one below: [ 2557.400796] Unhandled fault: synchronous external abort (0x9210) at 0x40001e8ee4b0 [ 2557.952413] CPU: 0 PID: 22837 Comm: cc1 Tainted: G D 4.9.0-rc6-next-20161124-1-gbf7e142 #1 [ 2557.962062] Hardware name: Hardkernel ODROID-C2 (DT) [ 2557.966980] task: 80006ddb7080 task.stack: 80006dd9c000 [ 2557.972846] PC is at 0x6a0d98 [ 2557.975776] LR is at 0x6a0e54 [ 2557.978709] pc : [<006a0d98>] lr : [<006a0e54>] pstate: 8000 [ 2557.986040] sp : f3ee5f80 [ 2557.989318] x29: f3ee5f80 x28: 4b3f1240 [ 2557.994578] x27: 012a7000 x26: 4b3f1288 [ 2557.999840] x25: 00f58f88 x24: 4b3f1240 [ 2558.005101] x23: x22: 0001 [ 2558.010362] x21: 0001 x20: 4b3f1250 [ 2558.015623] x19: 0054 x18: 0001 [ 2558.020885] x17: 48acaa10 x16: 01285050 [ 2558.026146] x15: 4ad96dc8 x14: 001f [ 2558.031407] x13: 4b3f1270 x12: 4b3f1258 [ 2558.036668] x11: 01347000 x10: 0661 [ 2558.041930] x9 : 0005 x8 : 0003 [ 2558.047191] x7 : 4b3f1240 x6 : 20020033 [ 2558.052452] x5 : 4b402020 x4 : 4b3e1aa0 [ 2558.057713] x3 : 000c x2 : 0020 [ 2558.062974] x1 : 00f45000 x0 : 0065 [ 2558.068235] [ 2558.069712] Internal error: Attempting to execute userspace memory: 860f [#7] PREEMPT SMP [ 2558.078155] Modules linked in: meson_rng rng_core meson_gxbb_wdt ip_tables x_tables ipv6 dwmac_generic realtek dwmac_meson8b stmmac_platform stmmac [ 2558.091267] CPU: 0 PID: 22837 Comm: cc1 Tainted: G D 4.9.0-rc6-next-20161124-1-gbf7e142 #1 [ 2558.100925] Hardware name: Hardkernel ODROID-C2 (DT) [ 2558.105841] task: 80006ddb7080 task.stack: 80006dd9c000 [ 2558.111706] PC is at 0x6a0e54 [ 2558.114638] LR is at 0x6a0e54 [ 2558.117571] pc : [<006a0e54>] lr : [<006a0e54>] pstate: 63c5 [ 2558.124902] sp : 80006dd9fec0 [ 
2558.128179] x29: x28: 80006ddb7080 [ 2558.133441] x27: 012a7000 x26: 4b3f1288 [ 2558.138702] x25: 00f58f88 x24: 4b3f1240 [ 2558.143963] x23: 8000 x22: 006a0d98 [ 2558.149225] x21: x20: 80006e223000 [ 2558.154486] x19: x18: 0010 [ 2558.159747] x17: 48acaa10 x16: 01285050 [ 2558.165008] x15: 88e91f07 x14: 0006 [ 2558.170270] x13: 08e91f15 x12: 000f [ 2558.175531] x11: 0002 x10: 02ea [ 2558.180792] x9 : 80006dd9fb40 x8 : 00010a8b [ 2558.186053] x7 : x6 : 020e [ 2558.191315] x5 : 020f020e x4 : [ 2558.196576] x3 : x2 : 020f [ 2558.201837] x1 : 80006ddb7080 x0 : [ 2558.207098] [ 2558.208565] Process cc1 (pid: 22837, stack limit = 0x80006dd9c000) [ 2558.215035] Stack: (0x80006dd9fec0 to 0x80006dda) [ 2558.220728] fec0: 0065 00f45000 0020 000c [ 2558.228490] fee0: 4b3e1aa0 4b402020 20020033 4b3f1240 [ 2558.236253] ff00: 0003 0005 0661 01347000 [ 2558.244015] ff20: 4b3f1258 4b3f1270 001f 4ad96dc8 [ 2558.251778] ff40: 01285050 48acaa10 0001 0054 [ 2558.259540] ff60: 4b3f1250 0001 0001 [ 2558.267303] ff80: 4b3f1240 00f58f88 4b3f1288 012a7000 [ 2558.275065] ffa0: 4b3f1240 f3ee5f80 006a0e54 f3ee5f80 [ 2558.282828] ffc0: 006a0d98 8000 0003 [ 2558.290590] ffe0: [ 2558.298351] Call trace: [ 2558.300769] Exception stack(0x80006dd9fcf0 to 0x80006dd9fe20) [ 2558.307149] fce0: 0001 [ 2558.314913] fd00: 80006dd9fec0 006a0e54 800073acf500 0004 [ 2558.322675] fd20: 08dbbc18 80006ddb7080 6dd9fdd0 [ 2558.330438] fd40: 80006dd9fd90 080ca878 80006dd9fe40 80006ddb7080 [ 2558.338200] fd60: 0004 03c0 80006dd9fe40 4b3f1240 [ 2558.345963] fd80: 00f58f88 4b3f1288 80006ddb7080 [ 2558.353725] fda0: 020f
Re: net: GPF in eth_header
On Sat, Nov 26, 2016 at 7:28 PM, 'Eric Dumazet' via syzkaller wrote:
> On Sat, Nov 26, 2016 at 9:30 AM, Dmitry Vyukov wrote: >> Hello, >> >> The following program triggers GPF in eth_header: >> >> https://gist.githubusercontent.com/dvyukov/613cadf05543b55a419f237e419cd495/raw/5471231523d1a07c3de55f11f87472c2816ee06c/gistfile1.txt >> >> On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24). >> >> BUG: unable to handle kernel paging request at ed002d14d74a >> IP: [] eth_header+0x75/0x260 net/ethernet/eth.c:88 >> PGD 7fff6067 [ 50.787819] PUD 7fff5067 >> PMD 0 [ 50.787819] >> Oops: [#1] SMP DEBUG_PAGEALLOC KASAN >> Modules linked in: >> CPU: 2 PID: 6712 Comm: a.out Not tainted 4.9.0-rc6+ #55 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 >> task: 88003a1841c0 task.stack: 880034d08000 >> RIP: 0010:[] [] >> eth_header+0x75/0x260 net/ethernet/eth.c:88 >> RSP: 0018:880034d0eb68 EFLAGS: 00010a03 >> RAX: 11002d14d74a RBX: 880168a6ba4a RCX: 88006a9c7858 >> RDX: dd86 RSI: dc00 RDI: 880168a6ba56 >> RBP: 880034d0eb98 R08: R09: 0031 >> R10: 0002 R11: R12: >> R13: 88006c208d80 R14: 86dd R15: 88006a9c7858 >> FS: 01a02940() GS:88006d00() knlGS: >> CS: 0010 DS: ES: CR0: 80050033 >> CR2: ed002d14d74a CR3: 37373000 CR4: 06e0 >> Stack: >> 00316881ab40 88006a9c76c0 88006881ab40 88006a9c77f8 >> dc00 880034d0ee98 86b31af9 >> 8719605c 880034d0f0f8 86dd 86be3220 >> Call Trace: >> [< inline >] dev_hard_header ./include/linux/netdevice.h:2762 >> [] neigh_resolve_output+0x659/0xb20 >> net/core/neighbour.c:1302 >> [< inline >] dst_neigh_output ./include/net/dst.h:464 >> [] ip6_finish_output2+0xb3c/0x2500 >> net/ipv6/ip6_output.c:121 >> [] ip6_finish_output+0x2eb/0x760 net/ipv6/ip6_output.c:139 >> [< inline >] NF_HOOK_COND ./include/linux/netfilter.h:246 >> [] ip6_output+0x1d7/0x9a0 net/ipv6/ip6_output.c:153 >> [< inline >] dst_output ./include/net/dst.h:501 >> [] ip6_local_out+0x9a/0x180 net/ipv6/output_core.c:170 >> [] ip6_send_skb+0xa6/0x340
net/ipv6/ip6_output.c:1712 >> [] ip6_push_pending_frames+0xb8/0xe0 >> net/ipv6/ip6_output.c:1732 >> [< inline >] rawv6_push_pending_frames net/ipv6/raw.c:607 >> [] rawv6_sendmsg+0x250b/0x2c20 net/ipv6/raw.c:920 >> [] inet_sendmsg+0x385/0x590 net/ipv4/af_inet.c:734 >> [< inline >] sock_sendmsg_nosec net/socket.c:621 >> [] sock_sendmsg+0xcf/0x110 net/socket.c:631 >> [] sock_write_iter+0x32b/0x620 net/socket.c:829 >> [] do_iter_readv_writev+0x363/0x670 fs/read_write.c:695 >> [] do_readv_writev+0x431/0x9b0 fs/read_write.c:872 >> [] vfs_writev+0x8c/0xc0 fs/read_write.c:911 >> [] do_writev+0x115/0x2d0 fs/read_write.c:944 >> [< inline >] SYSC_writev fs/read_write.c:1017 >> [] SyS_writev+0x2c/0x40 fs/read_write.c:1014 >> [] entry_SYSCALL_64_fastpath+0x23/0xc6 >> arch/x86/entry/entry_64.S:209 >> Code: 41 83 fe 04 0f 84 aa 00 00 00 e8 17 4e b0 fa 48 8d 7b 0c 48 be >> 00 00 00 00 00 fc ff df 44 89 f2 66 c1 c2 08 48 89 f8 48 c1 e8 03 <0f> >> b6 0c 30 48 8d 43 0d 49 89 c0 49 c1 e8 03 41 0f b6 34 30 49 >> RIP [] eth_header+0x75/0x260 net/ethernet/eth.c:88 >> RSP >> CR2: ed002d14d74a >> ---[ end trace a73fedfdc11bd60c ]--- > > > Hi Dmitry > > I could not reproduce the issue. Might need some specific configuration... > > loopback device has proper ethernet header (all 0) > > Fault happens in : > > 0f b6 0c 30 movzbl (%rax,%rsi,1),%ecx > > RAX=11002d14d74a which is RDI>>3, and RSI=dc00 > > Could this be a KASAN problem ?

Hi Eric,

The crash happens when the kernel tries to access shadow for nonmapped memory.

The issue here is an integer overflow which happens in neigh_resolve_output(). skb_network_offset(skb) can return a negative number, but __skb_pull() accepts unsigned int as len. As a result, the least significant bit in the higher 32 bits of skb->data gets set and we get an out-of-bounds access with an offset of 4 GB.

I've attached a short reproducer, but you either need KASAN or to add a BUG_ON to see the crash.
In this reproducer skb_network_offset() becomes negative after merging two IPv6 fragments.

I actually see multiple places where skb_network_offset() is used as an argument to skb_pull(), so I guess every such place can potentially be buggy.

Thanks!

[Attachment: ipv6-poc1.c]
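The signed-to-unsigned conversion described above can be reproduced in user space. A minimal sketch, assuming a 32-bit unsigned int (as on Linux) and using plain integers instead of real pointers so the wraparound is well-defined; pull64() and pull32() are hypothetical stand-ins for advancing skb->data by an unsigned int length on 64-bit and 32-bit kernels respectively:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for advancing a 64-bit data pointer by an
 * unsigned int length. A negative int argument is converted to a large
 * unsigned value (-14 becomes 0xFFFFFFF2) and then zero-extended to
 * 64 bits, so the result lands almost 4 GB past the intended address. */
static uint64_t pull64(uint64_t data, unsigned int len)
{
	return data + len;
}

/* The same operation with 32-bit addresses: the sum is computed modulo
 * 2^32, so adding 0xFFFFFFF2 is equivalent to subtracting 14. */
static uint32_t pull32(uint32_t data, unsigned int len)
{
	return data + len;
}
```

Passing an offset of -14 to both, pull32() returns the address 14 bytes earlier, while pull64() returns that same address plus 2^32, i.e. with bit 32 set. On a 32-bit kernel the wraparound hides the problem.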
Re: [PATCH net-next] cgroup, bpf: remove unnecessary #include
Acked-by: Rami Rosen

On 26 November 2016 at 09:23, Alexei Starovoitov wrote:
> this #include is unnecessary and brings whole set of
> other headers into cgroup-defs.h. Remove it.
>
> Fixes: 3007098494be ("cgroup: add support for eBPF programs")
> Signed-off-by: Alexei Starovoitov
> ---
>  include/linux/bpf-cgroup.h | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> index ec80d0c0953e..0cf1adfadd2d 100644
> --- a/include/linux/bpf-cgroup.h
> +++ b/include/linux/bpf-cgroup.h
> @@ -1,7 +1,6 @@
>  #ifndef _BPF_CGROUP_H
>  #define _BPF_CGROUP_H
>
> -#include
>  #include
>  #include
>
> --
> 2.8.0
>
Re: net: GPF in eth_header
On Sat, Nov 26, 2016 at 9:30 AM, Dmitry Vyukov wrote:
> Hello, > > The following program triggers GPF in eth_header: > > https://gist.githubusercontent.com/dvyukov/613cadf05543b55a419f237e419cd495/raw/5471231523d1a07c3de55f11f87472c2816ee06c/gistfile1.txt > > On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24). > > BUG: unable to handle kernel paging request at ed002d14d74a > IP: [] eth_header+0x75/0x260 net/ethernet/eth.c:88 > PGD 7fff6067 [ 50.787819] PUD 7fff5067 > PMD 0 [ 50.787819] > Oops: [#1] SMP DEBUG_PAGEALLOC KASAN > Modules linked in: > CPU: 2 PID: 6712 Comm: a.out Not tainted 4.9.0-rc6+ #55 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > task: 88003a1841c0 task.stack: 880034d08000 > RIP: 0010:[] [] > eth_header+0x75/0x260 net/ethernet/eth.c:88 > RSP: 0018:880034d0eb68 EFLAGS: 00010a03 > RAX: 11002d14d74a RBX: 880168a6ba4a RCX: 88006a9c7858 > RDX: dd86 RSI: dc00 RDI: 880168a6ba56 > RBP: 880034d0eb98 R08: R09: 0031 > R10: 0002 R11: R12: > R13: 88006c208d80 R14: 86dd R15: 88006a9c7858 > FS: 01a02940() GS:88006d00() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: ed002d14d74a CR3: 37373000 CR4: 06e0 > Stack: > 00316881ab40 88006a9c76c0 88006881ab40 88006a9c77f8 > dc00 880034d0ee98 86b31af9 > 8719605c 880034d0f0f8 86dd 86be3220 > Call Trace: > [< inline >] dev_hard_header ./include/linux/netdevice.h:2762 > [] neigh_resolve_output+0x659/0xb20 > net/core/neighbour.c:1302 > [< inline >] dst_neigh_output ./include/net/dst.h:464 > [] ip6_finish_output2+0xb3c/0x2500 > net/ipv6/ip6_output.c:121 > [] ip6_finish_output+0x2eb/0x760 net/ipv6/ip6_output.c:139 > [< inline >] NF_HOOK_COND ./include/linux/netfilter.h:246 > [] ip6_output+0x1d7/0x9a0 net/ipv6/ip6_output.c:153 > [< inline >] dst_output ./include/net/dst.h:501 > [] ip6_local_out+0x9a/0x180 net/ipv6/output_core.c:170 > [] ip6_send_skb+0xa6/0x340 net/ipv6/ip6_output.c:1712 > [] ip6_push_pending_frames+0xb8/0xe0 > net/ipv6/ip6_output.c:1732 > [< inline >]
rawv6_push_pending_frames net/ipv6/raw.c:607 > [] rawv6_sendmsg+0x250b/0x2c20 net/ipv6/raw.c:920 > [] inet_sendmsg+0x385/0x590 net/ipv4/af_inet.c:734 > [< inline >] sock_sendmsg_nosec net/socket.c:621 > [] sock_sendmsg+0xcf/0x110 net/socket.c:631 > [] sock_write_iter+0x32b/0x620 net/socket.c:829 > [] do_iter_readv_writev+0x363/0x670 fs/read_write.c:695 > [] do_readv_writev+0x431/0x9b0 fs/read_write.c:872 > [] vfs_writev+0x8c/0xc0 fs/read_write.c:911 > [] do_writev+0x115/0x2d0 fs/read_write.c:944 > [< inline >] SYSC_writev fs/read_write.c:1017 > [] SyS_writev+0x2c/0x40 fs/read_write.c:1014 > [] entry_SYSCALL_64_fastpath+0x23/0xc6 > arch/x86/entry/entry_64.S:209 > Code: 41 83 fe 04 0f 84 aa 00 00 00 e8 17 4e b0 fa 48 8d 7b 0c 48 be > 00 00 00 00 00 fc ff df 44 89 f2 66 c1 c2 08 48 89 f8 48 c1 e8 03 <0f> > b6 0c 30 48 8d 43 0d 49 89 c0 49 c1 e8 03 41 0f b6 34 30 49 > RIP [] eth_header+0x75/0x260 net/ethernet/eth.c:88 > RSP > CR2: ed002d14d74a > ---[ end trace a73fedfdc11bd60c ]---

Hi Dmitry

I could not reproduce the issue. It might need some specific configuration...

The loopback device has a proper Ethernet header (all 0).

The fault happens in:

0f b6 0c 30    movzbl (%rax,%rsi,1),%ecx

RAX=11002d14d74a, which is RDI>>3, and RSI=dc00.

Could this be a KASAN problem?
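The reading of the faulting instruction above matches how generic KASAN on x86_64 locates the shadow byte for an address: shift it right by 3 and add a fixed offset (the value held in RSI here). A minimal sketch, assuming the common x86_64 shadow offset 0xdffffc0000000000; the register values in the dump are truncated, so the sample address in the usage below is hypothetical. It shows that setting bit 32 of an address, as the signed/unsigned mix-up in this thread does, moves its shadow byte 2^29 bytes (512 MiB) away, potentially into shadow that was never mapped:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: generic KASAN on x86_64 with the usual shadow offset.
 * One shadow byte describes 8 bytes of memory (scale shift of 3). */
#define ASSUMED_SHADOW_OFFSET 0xdffffc0000000000ULL
#define SHADOW_SCALE_SHIFT 3

/* Address of the shadow byte KASAN would check for 'addr'. */
static uint64_t shadow_addr(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + ASSUMED_SHADOW_OFFSET;
}
```

Under this assumption, the start of the x86_64 direct map (0xffff880000000000) maps to shadow at 0xffffed0000000000, and adding 2^32 to any address adds exactly 2^29 to its shadow address.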
net: BUG in unix_notinflight
Hello, I am hitting the following BUG while running syzkaller fuzzer: kernel BUG at net/unix/garbage.c:149! invalid opcode: [#1] SMP DEBUG_PAGEALLOC KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 23491 Comm: syz-executor Not tainted 4.9.0-rc5+ #41 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 task: 8801c16b06c0 task.stack: 8801c2928000 RIP: 0010:[] [] unix_notinflight+0x3b4/0x490 net/unix/garbage.c:149 RSP: 0018:8801c292ea40 EFLAGS: 00010297 RAX: 8801c16b06c0 RBX: 110038525d4a RCX: dc00 RDX: RSI: 110038525d4e RDI: 8a6e9d84 RBP: 8801c292eb18 R08: R09: R10: cdca594876e035a1 R11: 0005 R12: 110038525d4e R13: 899156e0 R14: 8801c292eaf0 R15: 88018b7cd780 FS: 7f10420fa700() GS:8801d980() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 2000a000 CR3: 0001c2ecc000 CR4: 001406f0 DR0: DR1: 0400 DR2: DR3: DR6: 0ff0 DR7: 0600 Stack: dc00 88019f036970 41b58ab3 894c5120 8717e840 8801c16b06c0 88018b7cdcf0 894c51e2 81576d50 1100 Call Trace: [] unix_detach_fds.isra.19+0xff/0x170 net/unix/af_unix.c:1487 [] unix_destruct_scm+0xf9/0x210 net/unix/af_unix.c:1496 [] skb_release_head_state+0x101/0x200 net/core/skbuff.c:655 [] skb_release_all+0x1a/0x60 net/core/skbuff.c:668 [] __kfree_skb+0x1a/0x30 net/core/skbuff.c:684 [] kfree_skb+0x184/0x570 net/core/skbuff.c:705 [] unix_release_sock+0x5b5/0xbd0 net/unix/af_unix.c:559 [] unix_release+0x49/0x90 net/unix/af_unix.c:836 [] sock_release+0x92/0x1f0 net/socket.c:570 [] sock_close+0x1b/0x20 net/socket.c:1017 [] __fput+0x34e/0x910 fs/file_table.c:208 [] fput+0x1a/0x20 fs/file_table.c:244 [] task_work_run+0x1a0/0x280 kernel/task_work.c:116 [< inline >] exit_task_work include/linux/task_work.h:21 [] do_exit+0x183a/0x2640 kernel/exit.c:828 [] do_group_exit+0x14e/0x420 kernel/exit.c:931 [] get_signal+0x663/0x1880 kernel/signal.c:2307 [] do_signal+0xc5/0x2190 arch/x86/kernel/signal.c:807 [] exit_to_usermode_loop+0x1ea/0x2d0 arch/x86/entry/common.c:156 [< inline >] 
prepare_exit_to_usermode arch/x86/entry/common.c:190 [] syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259 [] entry_SYSCALL_64_fastpath+0xc4/0xc6 Code: df 49 89 87 70 05 00 00 41 c6 04 14 f8 48 89 f9 48 c1 e9 03 80 3c 11 00 75 64 49 89 87 78 05 00 00 e9 65 ff ff ff e8 ac 94 56 fa <0f> 0b 48 89 d7 48 89 95 30 ff ff ff e8 bb 22 87 fa 48 8b 95 30 RIP [] unix_notinflight+0x3b4/0x490 net/unix/garbage.c:149 RSP ---[ end trace 4cbbd52674b68dab ]---

On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24).

Unfortunately this is not reproducible outside of syzkaller, but it is easily reproducible with syzkaller. If you need to reproduce it, follow the instructions described here:
https://github.com/google/syzkaller/wiki/How-to-execute-syzkaller-programs
with the following as the program:

mmap(&(0x7f00/0xdd5000)=nil, (0xdd5000), 0x3, 0x32, 0x, 0x0)
socketpair$unix(0x1, 0x5, 0x0,
net: GPF in eth_header
Hello, The following program triggers GPF in eth_header: https://gist.githubusercontent.com/dvyukov/613cadf05543b55a419f237e419cd495/raw/5471231523d1a07c3de55f11f87472c2816ee06c/gistfile1.txt On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24). BUG: unable to handle kernel paging request at ed002d14d74a IP: [] eth_header+0x75/0x260 net/ethernet/eth.c:88 PGD 7fff6067 [ 50.787819] PUD 7fff5067 PMD 0 [ 50.787819] Oops: [#1] SMP DEBUG_PAGEALLOC KASAN Modules linked in: CPU: 2 PID: 6712 Comm: a.out Not tainted 4.9.0-rc6+ #55 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: 88003a1841c0 task.stack: 880034d08000 RIP: 0010:[] [] eth_header+0x75/0x260 net/ethernet/eth.c:88 RSP: 0018:880034d0eb68 EFLAGS: 00010a03 RAX: 11002d14d74a RBX: 880168a6ba4a RCX: 88006a9c7858 RDX: dd86 RSI: dc00 RDI: 880168a6ba56 RBP: 880034d0eb98 R08: R09: 0031 R10: 0002 R11: R12: R13: 88006c208d80 R14: 86dd R15: 88006a9c7858 FS: 01a02940() GS:88006d00() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: ed002d14d74a CR3: 37373000 CR4: 06e0 Stack: 00316881ab40 88006a9c76c0 88006881ab40 88006a9c77f8 dc00 880034d0ee98 86b31af9 8719605c 880034d0f0f8 86dd 86be3220 Call Trace: [< inline >] dev_hard_header ./include/linux/netdevice.h:2762 [] neigh_resolve_output+0x659/0xb20 net/core/neighbour.c:1302 [< inline >] dst_neigh_output ./include/net/dst.h:464 [] ip6_finish_output2+0xb3c/0x2500 net/ipv6/ip6_output.c:121 [] ip6_finish_output+0x2eb/0x760 net/ipv6/ip6_output.c:139 [< inline >] NF_HOOK_COND ./include/linux/netfilter.h:246 [] ip6_output+0x1d7/0x9a0 net/ipv6/ip6_output.c:153 [< inline >] dst_output ./include/net/dst.h:501 [] ip6_local_out+0x9a/0x180 net/ipv6/output_core.c:170 [] ip6_send_skb+0xa6/0x340 net/ipv6/ip6_output.c:1712 [] ip6_push_pending_frames+0xb8/0xe0 net/ipv6/ip6_output.c:1732 [< inline >] rawv6_push_pending_frames net/ipv6/raw.c:607 [] rawv6_sendmsg+0x250b/0x2c20 net/ipv6/raw.c:920 [] inet_sendmsg+0x385/0x590 net/ipv4/af_inet.c:734 [< inline >] 
sock_sendmsg_nosec net/socket.c:621 [] sock_sendmsg+0xcf/0x110 net/socket.c:631 [] sock_write_iter+0x32b/0x620 net/socket.c:829 [] do_iter_readv_writev+0x363/0x670 fs/read_write.c:695 [] do_readv_writev+0x431/0x9b0 fs/read_write.c:872 [] vfs_writev+0x8c/0xc0 fs/read_write.c:911 [] do_writev+0x115/0x2d0 fs/read_write.c:944 [< inline >] SYSC_writev fs/read_write.c:1017 [] SyS_writev+0x2c/0x40 fs/read_write.c:1014 [] entry_SYSCALL_64_fastpath+0x23/0xc6 arch/x86/entry/entry_64.S:209 Code: 41 83 fe 04 0f 84 aa 00 00 00 e8 17 4e b0 fa 48 8d 7b 0c 48 be 00 00 00 00 00 fc ff df 44 89 f2 66 c1 c2 08 48 89 f8 48 c1 e8 03 <0f> b6 0c 30 48 8d 43 0d 49 89 c0 49 c1 e8 03 41 0f b6 34 30 49 RIP [] eth_header+0x75/0x260 net/ethernet/eth.c:88 RSP CR2: ed002d14d74a ---[ end trace a73fedfdc11bd60c ]---
Re: wl1251 & mac address & calibration data
On Thursday 24 November 2016 19:46:01 Aaro Koskinen wrote:
> Hi,
>
> On Thu, Nov 24, 2016 at 04:20:45PM +0100, Pali Rohár wrote:
> > Proprietary, signed and closed bootloader NOLO does not support DT.
> > So for booting you need to append DTS file to kernel image.
> >
> > U-Boot is optional and can be used as intermediate bootloader
> > between NOLO and kernel. But still it has problems with reading
> > from nand, so cannot read NVS data nor MAC address.
>
> You could use kexec to pass the fixed DT.
>
> A.

IIRC it was broken for N900/omap3, no idea if somebody fixed it.

--
Pali Rohár
pali.ro...@gmail.com
Re: wl1251 & mac address & calibration data
On Thu 2016-11-24 20:46:01, Aaro Koskinen wrote:
> Hi,
>
> On Thu, Nov 24, 2016 at 04:20:45PM +0100, Pali Rohár wrote:
> > Proprietary, signed and closed bootloader NOLO does not support DT. So
> > for booting you need to append DTS file to kernel image.
> >
> > U-Boot is optional and can be used as intermediate bootloader between
> > NOLO and kernel. But still it has problems with reading from nand, so
> > cannot read NVS data nor MAC address.
>
> You could use kexec to pass the fixed DT.

Yeah. You could also strap a desktop PC to a USB GPRS card and call it a phone. You could also make a pig fly. But just because you could does not mean you should.

No, sorry, kexec is not acceptable. It is too hard to set up and slows boot too much.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: net: deadlock on genl_mutex
On Sat, Nov 26, 2016 at 9:04 AM, Dmitry Vyukov wrote:
> Hello, > > The following program triggers deadlock warnings on genl_mutex: > > https://gist.githubusercontent.com/dvyukov/65e33d053e507d2ab0bf6ae83d989585/raw/b3c640ec58e894b50bcbf255c471406466cfa5d0/gistfile1.txt > > On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24). > > BUG: sleeping function called from invalid context at > kernel/locking/mutex.c:620 > in_atomic(): 1, irqs_disabled(): 0, pid: 32289, name: syz-executor > CPU: 0 PID: 32289 Comm: syz-executor Not tainted 4.9.0-rc5+ #54 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > 88003ec06420 834c2e39 110007d80c17 > ed0007d80c0f 41b58ab3 89575550 834c2b4b > 8baab1a0 dc00 880068f794e0 > Call Trace: > [ 287.394552] [< inline >] __dump_stack lib/dump_stack.c:15 > [ 287.394552] [] dump_stack+0x2ee/0x3f5 > lib/dump_stack.c:51 > [] ___might_sleep+0x483/0x660 kernel/sched/core.c:7761 > [] __might_sleep+0x9a/0x1a0 kernel/sched/core.c:7720 > [] mutex_lock_nested+0x1ea/0xf20 kernel/locking/mutex.c:620 > [< inline >] genl_lock net/netlink/genetlink.c:31 > [] genl_lock_done+0x71/0xd0 net/netlink/genetlink.c:531 > [] netlink_sock_destruct+0xf8/0x400 > net/netlink/af_netlink.c:331 > [] __sk_destruct+0xf4/0x7f0 net/core/sock.c:1423 > [] sk_destruct+0x4c/0x80 net/core/sock.c:1453 > [] __sk_free+0x5c/0x230 net/core/sock.c:1461 > [] sk_free+0x28/0x30 net/core/sock.c:1472 > [< inline >] sock_put include/net/sock.h:1591 > [] deferred_put_nlk_sk+0x31/0x40 > net/netlink/af_netlink.c:652 > [< inline >] __rcu_reclaim kernel/rcu/rcu.h:118 > [] rcu_do_batch.isra.70+0x9ed/0xe20 kernel/rcu/tree.c:2776 > [< inline >] invoke_rcu_callbacks kernel/rcu/tree.c:3040 > [< inline >] __rcu_process_callbacks kernel/rcu/tree.c:3007 > [] rcu_process_callbacks+0x48c/0xd70 kernel/rcu/tree.c:3024 > [] __do_softirq+0x32b/0xca8 kernel/softirq.c:284 > [< inline >] invoke_softirq kernel/softirq.c:364 > [] irq_exit+0x1d1/0x210 kernel/softirq.c:405 > [< inline >]
exiting_irq arch/x86/include/asm/apic.h:659 > [] smp_apic_timer_interrupt+0x80/0xa0 > arch/x86/kernel/apic/apic.c:960 > [] apic_timer_interrupt+0x8c/0xa0 > arch/x86/entry/entry_64.S:489 > [ 287.403717] [] ? lock_is_held+0x247/0x310 > [] ___might_sleep+0x59e/0x660 kernel/sched/core.c:7729 > [] __might_sleep+0x9a/0x1a0 kernel/sched/core.c:7720 > [] down_read+0x78/0x160 kernel/locking/rwsem.c:21 > [< inline >] anon_vma_lock_read include/linux/rmap.h:127 > [] validate_mm+0xe5/0x880 mm/mmap.c:347 > [] vma_link+0x11b/0x180 mm/mmap.c:605 > [] mmap_region+0x1076/0x1880 mm/mmap.c:1692 > [] do_mmap+0x6ff/0xe80 mm/mmap.c:1450 > [< inline >] do_mmap_pgoff include/linux/mm.h:2039 > [] vm_mmap_pgoff+0x1b7/0x210 mm/util.c:305 > [< inline >] SYSC_mmap_pgoff mm/mmap.c:1500 > [] SyS_mmap_pgoff+0x231/0x5e0 mm/mmap.c:1458 > [< inline >] SYSC_mmap arch/x86/kernel/sys_x86_64.c:95 > [] SyS_mmap+0x1b/0x30 arch/x86/kernel/sys_x86_64.c:86 > [] entry_SYSCALL_64_fastpath+0x23/0xc6 > > = > [ INFO: inconsistent lock state ] > 4.9.0-rc5+ #54 Tainted: GW > - > inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. 
> syz-executor/32289 [HC0[0]:SC1[1]:HE1:SE0] takes: > ([ 287.580014] genl_mutex > [< inline >] genl_lock net/netlink/genetlink.c:31 > [] genl_lock_done+0x71/0xd0 net/netlink/genetlink.c:531 > {SOFTIRQ-ON-W} state was registered at: > [ 287.580014] [< inline >] mark_irqflags > kernel/locking/lockdep.c:2938 > [ 287.580014] [] __lock_acquire+0x6e7/0x3380 > kernel/locking/lockdep.c:3292 > [ 287.580014] [] lock_acquire+0x2a2/0x790 > kernel/locking/lockdep.c:3746 > [ 287.580014] [< inline >] __mutex_lock_common > kernel/locking/mutex.c:521 > [ 287.580014] [] mutex_lock_nested+0x23f/0xf20 > kernel/locking/mutex.c:621 > [ 287.580014] [< inline >] genl_lock net/netlink/genetlink.c:31 > [ 287.580014] [< inline >] genl_lock_all net/netlink/genetlink.c:52 > [ 287.580014] [] > __genl_register_family+0x2ce/0x1870 net/netlink/genetlink.c:374 > [ 287.580014] [< inline >] > _genl_register_family_with_ops_grps include/net/genetlink.h:173 > [ 287.580014] [] genl_init+0x11d/0x185 > net/netlink/genetlink.c:1084 > [ 287.580014] [] do_one_initcall+0xfb/0x3f0 > init/main.c:778 > [ 287.580014] [< inline >] do_initcall_level init/main.c:844 > [ 287.580014] [< inline >] do_initcalls init/main.c:852 > [ 287.580014] [< inline >] do_basic_setup init/main.c:870 > [ 287.580014] [] kernel_init_freeable+0x5c4/0x69e > init/main.c:1017 > [ 287.580014] [] kernel_init+0x18/0x180 init/main.c:943 > [ 287.580014] [] ret_from_fork+0x2a/0x40 >
net: deadlock on genl_mutex
Hello, The following program triggers deadlock warnings on genl_mutex: https://gist.githubusercontent.com/dvyukov/65e33d053e507d2ab0bf6ae83d989585/raw/b3c640ec58e894b50bcbf255c471406466cfa5d0/gistfile1.txt On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24). BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620 in_atomic(): 1, irqs_disabled(): 0, pid: 32289, name: syz-executor CPU: 0 PID: 32289 Comm: syz-executor Not tainted 4.9.0-rc5+ #54 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 88003ec06420 834c2e39 110007d80c17 ed0007d80c0f 41b58ab3 89575550 834c2b4b 8baab1a0 dc00 880068f794e0 Call Trace: [ 287.394552] [< inline >] __dump_stack lib/dump_stack.c:15 [ 287.394552] [] dump_stack+0x2ee/0x3f5 lib/dump_stack.c:51 [] ___might_sleep+0x483/0x660 kernel/sched/core.c:7761 [] __might_sleep+0x9a/0x1a0 kernel/sched/core.c:7720 [] mutex_lock_nested+0x1ea/0xf20 kernel/locking/mutex.c:620 [< inline >] genl_lock net/netlink/genetlink.c:31 [] genl_lock_done+0x71/0xd0 net/netlink/genetlink.c:531 [] netlink_sock_destruct+0xf8/0x400 net/netlink/af_netlink.c:331 [] __sk_destruct+0xf4/0x7f0 net/core/sock.c:1423 [] sk_destruct+0x4c/0x80 net/core/sock.c:1453 [] __sk_free+0x5c/0x230 net/core/sock.c:1461 [] sk_free+0x28/0x30 net/core/sock.c:1472 [< inline >] sock_put include/net/sock.h:1591 [] deferred_put_nlk_sk+0x31/0x40 net/netlink/af_netlink.c:652 [< inline >] __rcu_reclaim kernel/rcu/rcu.h:118 [] rcu_do_batch.isra.70+0x9ed/0xe20 kernel/rcu/tree.c:2776 [< inline >] invoke_rcu_callbacks kernel/rcu/tree.c:3040 [< inline >] __rcu_process_callbacks kernel/rcu/tree.c:3007 [] rcu_process_callbacks+0x48c/0xd70 kernel/rcu/tree.c:3024 [] __do_softirq+0x32b/0xca8 kernel/softirq.c:284 [< inline >] invoke_softirq kernel/softirq.c:364 [] irq_exit+0x1d1/0x210 kernel/softirq.c:405 [< inline >] exiting_irq arch/x86/include/asm/apic.h:659 [] smp_apic_timer_interrupt+0x80/0xa0 arch/x86/kernel/apic/apic.c:960 [] 
apic_timer_interrupt+0x8c/0xa0 arch/x86/entry/entry_64.S:489 [ 287.403717] [] ? lock_is_held+0x247/0x310 [] ___might_sleep+0x59e/0x660 kernel/sched/core.c:7729 [] __might_sleep+0x9a/0x1a0 kernel/sched/core.c:7720 [] down_read+0x78/0x160 kernel/locking/rwsem.c:21 [< inline >] anon_vma_lock_read include/linux/rmap.h:127 [] validate_mm+0xe5/0x880 mm/mmap.c:347 [] vma_link+0x11b/0x180 mm/mmap.c:605 [] mmap_region+0x1076/0x1880 mm/mmap.c:1692 [] do_mmap+0x6ff/0xe80 mm/mmap.c:1450 [< inline >] do_mmap_pgoff include/linux/mm.h:2039 [] vm_mmap_pgoff+0x1b7/0x210 mm/util.c:305 [< inline >] SYSC_mmap_pgoff mm/mmap.c:1500 [] SyS_mmap_pgoff+0x231/0x5e0 mm/mmap.c:1458 [< inline >] SYSC_mmap arch/x86/kernel/sys_x86_64.c:95 [] SyS_mmap+0x1b/0x30 arch/x86/kernel/sys_x86_64.c:86 [] entry_SYSCALL_64_fastpath+0x23/0xc6 = [ INFO: inconsistent lock state ] 4.9.0-rc5+ #54 Tainted: GW - inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. syz-executor/32289 [HC0[0]:SC1[1]:HE1:SE0] takes: ([ 287.580014] genl_mutex [< inline >] genl_lock net/netlink/genetlink.c:31 [] genl_lock_done+0x71/0xd0 net/netlink/genetlink.c:531 {SOFTIRQ-ON-W} state was registered at: [ 287.580014] [< inline >] mark_irqflags kernel/locking/lockdep.c:2938 [ 287.580014] [] __lock_acquire+0x6e7/0x3380 kernel/locking/lockdep.c:3292 [ 287.580014] [] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3746 [ 287.580014] [< inline >] __mutex_lock_common kernel/locking/mutex.c:521 [ 287.580014] [] mutex_lock_nested+0x23f/0xf20 kernel/locking/mutex.c:621 [ 287.580014] [< inline >] genl_lock net/netlink/genetlink.c:31 [ 287.580014] [< inline >] genl_lock_all net/netlink/genetlink.c:52 [ 287.580014] [] __genl_register_family+0x2ce/0x1870 net/netlink/genetlink.c:374 [ 287.580014] [< inline >] _genl_register_family_with_ops_grps include/net/genetlink.h:173 [ 287.580014] [] genl_init+0x11d/0x185 net/netlink/genetlink.c:1084 [ 287.580014] [] do_one_initcall+0xfb/0x3f0 init/main.c:778 [ 287.580014] [< inline >] do_initcall_level 
init/main.c:844 [ 287.580014] [< inline >] do_initcalls init/main.c:852 [ 287.580014] [< inline >] do_basic_setup init/main.c:870 [ 287.580014] [] kernel_init_freeable+0x5c4/0x69e init/main.c:1017 [ 287.580014] [] kernel_init+0x18/0x180 init/main.c:943 [ 287.580014] [] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433 [ 78.258919] [ INFO: inconsistent lock state ] [ 78.258919] 4.9.0-rc5+ #54 Tainted: GW [ 78.258919] - [ 78.258919] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 78.258919] syz-fuzzer/5211
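The second splat is lockdep's softirq-usage check: genl_mutex was first taken in process context with softirqs enabled (via genl_init()), and later from softirq context (the RCU callback path through netlink_sock_destruct() and genl_lock_done()). A softirq that interrupts the lock holder on the same CPU and then waits for the same lock cannot make progress. A simplified user-space model of that single rule, with hypothetical names; it is not lockdep's implementation:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-lock usage bits, loosely modeled on lockdep's
 * SOFTIRQ-ON-W and IN-SOFTIRQ-W states. */
struct lock_usage {
	bool softirq_on_w;  /* taken in process context with softirqs enabled */
	bool in_softirq_w;  /* taken from softirq context */
};

/* Record one acquisition. Returns true once the lock has been used in
 * both ways, the combination reported as
 * "inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage". */
static bool record_acquire(struct lock_usage *u, bool in_softirq,
			   bool softirqs_enabled)
{
	if (in_softirq)
		u->in_softirq_w = true;
	else if (softirqs_enabled)
		u->softirq_on_w = true;

	return u->softirq_on_w && u->in_softirq_w;
}
```

Real lockdep tracks more states (hardirq usage, read versus write acquisitions) and validates lock chains as well; this model keeps only the softirq/write pair seen in the report. A lock that is only ever taken with softirqs disabled, or only from softirq context, never trips the check.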
net: GPF in rt6_get_cookie
Hello,

I got several GPFs in rt6_get_cookie while running syzkaller:

general protection fault: [#1] SMP DEBUG_PAGEALLOC KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 2 PID: 10156 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 880016f40480 task.stack: 88000fc0
RIP: 0010:[] [< inline >] rt6_get_cookie include/net/ip6_fib.h:174
RIP: 0010:[] [] sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
RSP: 0018:88000fc07298 EFLAGS: 00010202
RAX: dc00 RBX: RCX: c900029f5000
RDX: 0015 RSI: 0001 RDI: 00a8
RBP: 88000fc07580 R08: R09: 0001
R10: R11: R12: 880066cd0068
R13: 110001f80e92 R14: 880066cd0040 R15: 88005f2d2808
FS: 7f52c41f7700() GS:88006d00() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 20016000 CR3: 65dd7000 CR4: 06e0
DR0: 0400 DR1: 0400 DR2:
DR3: DR6: 0ff0 DR7: 0600
Stack:
87a210f6 8701ad45 88006768ec20 88006768ec20 16f40480 88000fc07450 11000cd9a017 88006768ec00 880066fc0730 880066cd0068 110001f80e66
Call Trace:
[] sctp_transport_route+0xad/0x430 net/sctp/transport.c:279
[] sctp_assoc_add_peer+0x5a6/0x13e0 net/sctp/associola.c:641
[] sctp_sendmsg+0x1921/0x3bc0 net/sctp/socket.c:1864
[] inet_sendmsg+0x385/0x590 net/ipv4/af_inet.c:734
[< inline >] sock_sendmsg_nosec net/socket.c:621
[] sock_sendmsg+0xcf/0x110 net/socket.c:631
[] SYSC_sendto+0x660/0x810 net/socket.c:1656
[] SyS_sendto+0x45/0x60 net/socket.c:1624
[] entry_SYSCALL_64_fastpath+0x23/0xc6
Code: 00 00 48 8b 84 24 88 00 00 00 48 8b 58 40 e8 80 76 cc f9 48 8d bb a8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 56 0f 00 00 48 8b 9b a8 00 00 00 45 31 ed 48
RIP [< inline >] rt6_get_cookie include/net/ip6_fib.h:174
RIP [] sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
RSP
---[ end trace b8d1354fa571700d ]---

general protection fault: [#1] SMP DEBUG_PAGEALLOC KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 3 PID: 22744 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 88006b92a840 task.stack: 88006a73
RIP: 0010:[] [< inline >] rt6_get_cookie include/net/ip6_fib.h:174
RIP: 0010:[] [] sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
RSP: 0018:88006a736b88 EFLAGS: 00010202
RAX: dc00 RBX: RCX: c90003c4f000
RDX: 0015 RSI: 0001 RDI: 00a8
RBP: 88006a736e68 R08: R09: 0001
R10: R11: R12: 880064cff268
R13: 11000d4e6db0 R14: 880064cff240 R15: 88006a4b6808
FS: 7f74f4ec9700() GS:88006d10() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 2070effc CR3: 3bd2f000 CR4: 06e0
DR0: 0400 DR1: 0400 DR2:
DR3: DR6: 0ff0 DR7: 0600
Stack:
87a210f6 000bbd2d 88006c2cd5a0 88006c2cd5a0 6ccb46c0 88006a736d40 11000c99fe57 88006c2cd500 8800658b1f30 880064cff268 11000d4e6d84
Call Trace:
[] sctp_transport_route+0xad/0x430 net/sctp/transport.c:279
[] sctp_assoc_add_peer+0x5a6/0x13e0 net/sctp/associola.c:641
[] __sctp_connect+0x288/0xc90 net/sctp/socket.c:1178
[] __sctp_setsockopt_connectx+0x1ab/0x200 net/sctp/socket.c:1332
[< inline >] sctp_getsockopt_connectx3 net/sctp/socket.c:1417
[] sctp_getsockopt+0x36ed/0x6800 net/sctp/socket.c:6474
[] sock_common_getsockopt+0x9a/0xe0 net/core/sock.c:2649
[< inline >] SYSC_getsockopt net/socket.c:1788
[] SyS_getsockopt+0x257/0x390 net/socket.c:1770
[] entry_SYSCALL_64_fastpath+0x23/0xc6
Code: 00 00 48 8b 84 24 88 00 00 00 48 8b 58 40 e8 80 76 cc f9 48 8d bb a8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 56 0f 00 00 48 8b 9b a8 00 00 00 45 31 ed 48
RIP [< inline >] rt6_get_cookie include/net/ip6_fib.h:174
RIP [] sctp_v6_get_dst+0x7c8/0x1960 net/sctp/ipv6.c:340
RSP
---[ end trace f42d1c14cb6d2835 ]---

This happened on commit a25f0944ba9b1d8a6813fd6f1a86f1bd59ac25a6 (Nov 13). Unfortunately, this is not reproducible.

The line is:

return rt->rt6i_node ? rt->rt6i_node->fn_sernum : 0;

Can it be a data race? rt->rt6i_node != NULL, but the next moment it is already NULL? That would explain the crash and the non-reproducibility (need ThreadSanitizer!). This always happened
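The suspected race can be shown in isolation. In `return rt->rt6i_node ? rt->rt6i_node->fn_sernum : 0;` the pointer is fetched twice, so a writer that clears it between the NULL test and the dereference would produce exactly this kind of NULL-based GPF. The following is a minimal userspace sketch, not kernel code: `node_sketch` and `route_sketch` are hypothetical stand-ins for `fib6_node` and `rt6_info`, and C11 atomics stand in for the kernel's `READ_ONCE()`/`rcu_dereference()`. It contrasts the double-fetch pattern with one that loads the pointer exactly once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct fib6_node. */
struct node_sketch {
	unsigned int fn_sernum;
};

/* Hypothetical stand-in for struct rt6_info; another thread may
 * store NULL into rt6i_node at any time. */
struct route_sketch {
	_Atomic(struct node_sketch *) rt6i_node;
};

/* Racy pattern: the NULL test and the dereference use two separate
 * loads, so the second load can observe NULL. */
static unsigned int cookie_double_fetch(struct route_sketch *rt)
{
	return atomic_load(&rt->rt6i_node) ?
	       atomic_load(&rt->rt6i_node)->fn_sernum : 0;
}

/* Single-load pattern: the test and the dereference see the same value. */
static unsigned int cookie_single_load(struct route_sketch *rt)
{
	struct node_sketch *fn = atomic_load(&rt->rt6i_node);

	return fn ? fn->fn_sernum : 0;
}
```

Loading once only closes the NULL window; the node must also stay allocated while `fn` is in use, which in the kernel would typically mean freeing it only after an RCU grace period. Whether this is the right fix for this particular report is an open question here.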
Re: netlink: GPF in sock_sndtimeo
CC Richard Guy Briggs

On Sat, Nov 26, 2016 at 7:44 AM, Dmitry Vyukov wrote:
> Hello,
>
> The following program triggers GPF in sock_sndtimeo:
> https://gist.githubusercontent.com/dvyukov/c19cadd309791cf5cb9b2bf936d3f48d/raw/1743ba0211079a5465d039512b427bc6b59b1a76/gistfile1.txt
>
> On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24).
>
> general protection fault: [#1] SMP DEBUG_PAGEALLOC KASAN
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 1 PID: 19950 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 88002a0d0840 task.stack: 88003692
> RIP: 0010:[] [< inline >] sock_sndtimeo
> include/net/sock.h:2075
> RIP: 0010:[] []
> netlink_unicast+0xe1/0x730 net/netlink/af_netlink.c:1232
> RSP: 0018:880036926f68 EFLAGS: 00010202
> RAX: 0068 RBX: 880036927000 RCX: c900021d
> RDX: 0d63 RSI: 024000c0 RDI: 0340
> RBP: 880036927028 R08: ed0006ea7aab R09: ed0006ea7aab
> R10: 0001 R11: ed0006ea7aaa R12: dc00
> R13: R14: 880035de3400 R15: 880035de3400
> FS: 7f90a2fc7700() GS:88003ed0() knlGS:
> CS: 0010 DS: ES: CR0: 80050033
> CR2: 006de0c0 CR3: 35de6000 CR4: 06e0
> Stack:
> 880035de3400 819f02a1 110006d24df4 0004
> 4db40014 880036926fd8 41b58ab3
> 89653c11 86cb3500 819f0345 880035de3400
> Call Trace:
> [< inline >] audit_replace kernel/audit.c:817
> [] audit_receive_msg+0x22c9/0x2ce0 kernel/audit.c:894
> [< inline >] audit_receive_skb kernel/audit.c:1120
> [] audit_receive+0x1dc/0x360 kernel/audit.c:1133
> [< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1214
> [] netlink_unicast+0x514/0x730
> net/netlink/af_netlink.c:1240
> [] netlink_sendmsg+0xaa4/0xe50
> net/netlink/af_netlink.c:1786
> [< inline >] sock_sendmsg_nosec net/socket.c:621
> [] sock_sendmsg+0xcf/0x110 net/socket.c:631
> [] sock_write_iter+0x32b/0x620 net/socket.c:829
> [< inline >] new_sync_write fs/read_write.c:499
> [] __vfs_write+0x4fe/0x830 fs/read_write.c:512
> [] vfs_write+0x175/0x4e0 fs/read_write.c:560
> [< inline >] SYSC_write fs/read_write.c:607
> [] SyS_write+0x100/0x240 fs/read_write.c:599
> [] do_syscall_64+0x2f4/0x940 arch/x86/entry/common.c:280
> [] entry_SYSCALL64_slow_path+0x25/0x25
> Code: fe 4c 89 f7 e8 31 16 ff ff 8b 8d 70 ff ff ff 49 89 c7 31 c0 85
> c9 75 25 e8 7d 4a a3 fa 49 8d bd 40 03 00 00 48 89 f8 48 c1 e8 03 <42>
> 80 3c 20 00 0f 85 3a 06 00 00 49 8b 85 40 03 00 00 4c 8d 73
> RIP [< inline >] sock_sndtimeo include/net/sock.h:2075
> RIP [] netlink_unicast+0xe1/0x730
> net/netlink/af_netlink.c:1232
> RSP
> ---[ end trace 8383a15fba6fdc59 ]---

Looks like a bug added in commit 32a1dbaece7e37cea415e03cd426172249aa859e
("audit: try harder to send to auditd upon netlink failure")
or 133e1e5acd4a63c4a0dcc413e90d5decdbce9c4a
("audit: stop an old auditd being starved out by a new auditd").

Richard, can you take a look?

Thanks!
Re: [PATCH v3 net-next 1/2] net: ethernet: slicoss: add slicoss gigabit ethernet driver
Hi, Lino,

...
> @@ -0,0 +1,28 @@
> +config NET_VENDOR_ALACRITECH
> +	bool "Alacritech devices"
> +	default y
> +	---help---
> +	  If you have a network (Ethernet) card belonging to this class, say Y.
> +
> +	  Note that the answer to this question doesn't directly affect the
> +	  kernel: saying N will just cause the configurator to skip all

Shouldn't "Renesas devices" on the next line be "Alacritech devices", as earlier?

> +	  the questions about Renesas devices. If you say Y, you will be asked
> +	  for your specific device in the following questions.
> +
...
...
...
> +struct slic_device {
> +	struct pci_dev *pdev;
...
> +	bool promisc;

The autoneg boolean below seems to be unused: apart from being set once to true in slic_set_link_autoneg(), it is not accessed anywhere, so it should probably be removed.

> +	bool autoneg;
> +	int speed;
...
...
> +static int slic_load_rcvseq_firmware(struct slic_device *sdev)
> +{
> +	const struct firmware *fw;
> +	const char *file;
> +	u32 codelen;
> +	int idx = 0;
> +	u32 instr;
> +	u32 addr;
> +	int err;
> +
...
> +	/* Do an initial sanity check concerning firmware size now. A further
> +	 * check follows below.
> +	 */
> +	if (fw->size < SLIC_FIRMWARE_MIN_SIZE) {
> +		dev_err(&sdev->pdev->dev,
> +			"invalid firmware size %zu (min %u expected)\n",
> +			fw->size, SLIC_FIRMWARE_MIN_SIZE);
> +		err = -EINVAL;

err is set here, but the release label always returns 0:

> +		goto release;
> +	}
> +
> +	codelen = slic_read_dword_from_firmware(fw, &idx);
> +
> +	/* do another sanity check against firmware size */
> +	if ((codelen + 4) > fw->size) {
> +		dev_err(&sdev->pdev->dev,
> +			"invalid rcv-sequencer firmware size %zu\n", fw->size);
> +		err = -EINVAL;

Again, the release label always returns 0:

> +		goto release;
> +	}
> +
>
> +release:
> +	release_firmware(fw);
> +
> +	return 0;
> +}
> +

Regards,
Rami Rosen
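Both review comments about the release label point at the same pattern: an error code is computed and then discarded. A minimal sketch of the usual goto-cleanup shape, using hypothetical names (`fw_blob_sketch`, `load_fw_sketch`, `FW_MIN_SIZE_SKETCH`) rather than the driver's real types, returns `err` from the cleanup label instead of a hard-coded 0:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical firmware blob; stands in for struct firmware. */
struct fw_blob_sketch {
	size_t size;
};

#define FW_MIN_SIZE_SKETCH 8u	/* hypothetical minimum size */

static int released;	/* counts cleanup calls, for illustration only */

static void release_fw_sketch(struct fw_blob_sketch *fw)
{
	(void)fw;
	released++;
}

static int load_fw_sketch(struct fw_blob_sketch *fw, unsigned int codelen)
{
	int err = 0;	/* success unless a check below fails */

	if (fw->size < FW_MIN_SIZE_SKETCH) {
		err = -EINVAL;
		goto release;
	}

	if ((size_t)codelen + 4 > fw->size) {
		err = -EINVAL;
		goto release;
	}

	/* ... program the sequencer here ... */

release:
	release_fw_sketch(fw);	/* cleanup runs on every path */
	return err;		/* propagate the error instead of returning 0 */
}
```

Initializing `err` to 0 matters: in the quoted driver code `err` is declared without an initializer, so changing `return 0;` to `return err;` would also require setting it on the success path.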
netlink: GPF in sock_sndtimeo
Hello,

The following program triggers GPF in sock_sndtimeo:
https://gist.githubusercontent.com/dvyukov/c19cadd309791cf5cb9b2bf936d3f48d/raw/1743ba0211079a5465d039512b427bc6b59b1a76/gistfile1.txt

On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24).

general protection fault: [#1] SMP DEBUG_PAGEALLOC KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 19950 Comm: syz-executor Not tainted 4.9.0-rc5+ #54
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 88002a0d0840 task.stack: 88003692
RIP: 0010:[] [< inline >] sock_sndtimeo include/net/sock.h:2075
RIP: 0010:[] [] netlink_unicast+0xe1/0x730 net/netlink/af_netlink.c:1232
RSP: 0018:880036926f68 EFLAGS: 00010202
RAX: 0068 RBX: 880036927000 RCX: c900021d
RDX: 0d63 RSI: 024000c0 RDI: 0340
RBP: 880036927028 R08: ed0006ea7aab R09: ed0006ea7aab
R10: 0001 R11: ed0006ea7aaa R12: dc00
R13: R14: 880035de3400 R15: 880035de3400
FS: 7f90a2fc7700() GS:88003ed0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 006de0c0 CR3: 35de6000 CR4: 06e0
Stack:
880035de3400 819f02a1 110006d24df4 0004
4db40014 880036926fd8 41b58ab3
89653c11 86cb3500 819f0345 880035de3400
Call Trace:
[< inline >] audit_replace kernel/audit.c:817
[] audit_receive_msg+0x22c9/0x2ce0 kernel/audit.c:894
[< inline >] audit_receive_skb kernel/audit.c:1120
[] audit_receive+0x1dc/0x360 kernel/audit.c:1133
[< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1214
[] netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1240
[] netlink_sendmsg+0xaa4/0xe50 net/netlink/af_netlink.c:1786
[< inline >] sock_sendmsg_nosec net/socket.c:621
[] sock_sendmsg+0xcf/0x110 net/socket.c:631
[] sock_write_iter+0x32b/0x620 net/socket.c:829
[< inline >] new_sync_write fs/read_write.c:499
[] __vfs_write+0x4fe/0x830 fs/read_write.c:512
[] vfs_write+0x175/0x4e0 fs/read_write.c:560
[< inline >] SYSC_write fs/read_write.c:607
[] SyS_write+0x100/0x240 fs/read_write.c:599
[] do_syscall_64+0x2f4/0x940 arch/x86/entry/common.c:280
[] entry_SYSCALL64_slow_path+0x25/0x25
Code: fe 4c 89 f7 e8 31 16 ff ff 8b 8d 70 ff ff ff 49 89 c7 31 c0 85 c9 75 25 e8 7d 4a a3 fa 49 8d bd 40 03 00 00 48 89 f8 48 c1 e8 03 <42> 80 3c 20 00 0f 85 3a 06 00 00 49 8b 85 40 03 00 00 4c 8d 73
RIP [< inline >] sock_sndtimeo include/net/sock.h:2075
RIP [] netlink_unicast+0xe1/0x730 net/netlink/af_netlink.c:1232
RSP
---[ end trace 8383a15fba6fdc59 ]---
[PATCH net-next 04/10] net/mlx5: Add DCBX firmware commands support
From: Huy Nguyen

Add set/query commands for DCBX_PARAM register

Signed-off-by: Huy Nguyen
Signed-off-by: Saeed Mahameed
---
 drivers/net/ethernet/mellanox/mlx5/core/port.c | 20
 include/linux/mlx5/driver.h                    |  7 +++
 include/linux/mlx5/port.h                      |  2 ++
 3 files changed, 29 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c
index ed4898f..d2ec9d2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -548,6 +548,26 @@ int mlx5_max_tc(struct mlx5_core_dev *mdev)
 	return num_tc - 1;
 }
 
+int mlx5_query_port_dcbx_param(struct mlx5_core_dev *mdev, u32 *out)
+{
+	u32 in[MLX5_ST_SZ_DW(dcbx_param)] = {0};
+
+	MLX5_SET(dcbx_param, in, port_number, 1);
+
+	return  mlx5_core_access_reg(mdev, in, sizeof(in), out,
+				    sizeof(in), MLX5_REG_DCBX_PARAM, 0, 0);
+}
+
+int mlx5_set_port_dcbx_param(struct mlx5_core_dev *mdev, u32 *in)
+{
+	u32 out[MLX5_ST_SZ_DW(dcbx_param)];
+
+	MLX5_SET(dcbx_param, in, port_number, 1);
+
+	return mlx5_core_access_reg(mdev, in, sizeof(out), out,
+				    sizeof(out), MLX5_REG_DCBX_PARAM, 0, 1);
+}
+
 int mlx5_set_port_prio_tc(struct mlx5_core_dev *mdev, u8 *prio_tc)
 {
 	u32 in[MLX5_ST_SZ_DW(qtct_reg)] = {0};
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index ae1f451..68b85ef 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -104,6 +104,8 @@ enum {
 enum {
 	MLX5_REG_QETCR		 = 0x4005,
 	MLX5_REG_QTCT		 = 0x400a,
+	MLX5_REG_DCBX_PARAM	 = 0x4020,
+	MLX5_REG_DCBX_APP	 = 0x4021,
 	MLX5_REG_PCAP		 = 0x5001,
 	MLX5_REG_PMTU		 = 0x5003,
 	MLX5_REG_PTYS		 = 0x5004,
@@ -124,6 +126,11 @@ enum {
 	MLX5_REG_MPCNT		 = 0x9051,
 };
 
+enum mlx5_dcbx_oper_mode {
+	MLX5E_DCBX_PARAM_VER_OPER_HOST  = 0x0,
+	MLX5E_DCBX_PARAM_VER_OPER_AUTO  = 0x3,
+};
+
 enum {
 	MLX5_ATOMIC_OPS_CMP_SWAP	= 1 << 0,
 	MLX5_ATOMIC_OPS_FETCH_ADD	= 1 << 1,
diff --git a/include/linux/mlx5/port.h b/include/linux/mlx5/port.h
index bdee439..e527732 100644
--- a/include/linux/mlx5/port.h
+++ b/include/linux/mlx5/port.h
@@ -162,4 +162,6 @@ void mlx5_query_port_fcs(struct mlx5_core_dev *mdev, bool *supported,
 int mlx5_query_module_eeprom(struct mlx5_core_dev *dev,
 			     u16 offset, u16 size, u8 *data);
 
+int mlx5_query_port_dcbx_param(struct mlx5_core_dev *mdev, u32 *out);
+int mlx5_set_port_dcbx_param(struct mlx5_core_dev *mdev, u32 *in);
 #endif /* __MLX5_PORT_H__ */
-- 
2.7.4
[PATCH net-next 09/10] net/mlx5e: Moves pflags to priv->params
From: Shaker Daibes

pflags is a configuration parameter of the netdev, so it naturally belongs in
priv->params. Also introduce MLX5E_GET_PFLAG.

Signed-off-by: Shaker Daibes
Signed-off-by: Saeed Mahameed
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h         | 16 +---
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c |  6 +++---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c    |  4 ++--
 3 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 9cf32d3..84ac78f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -177,14 +177,16 @@ enum mlx5e_priv_flag {
 	MLX5E_PFLAG_RX_CQE_BASED_MODER = (1 << 0),
 };
 
-#define MLX5E_SET_PRIV_FLAG(priv, pflag, enable)    \
-	do {                                        \
-		if (enable)                         \
-			priv->pflags |= pflag;      \
-		else                                \
-			priv->pflags &= ~pflag;     \
+#define MLX5E_SET_PFLAG(priv, pflag, enable)			\
+	do {							\
+		if (enable)					\
+			(priv)->params.pflags |= (pflag);	\
+		else						\
+			(priv)->params.pflags &= ~(pflag);	\
 	} while (0)
 
+#define MLX5E_GET_PFLAG(priv, pflag) (!!((priv)->params.pflags & (pflag)))
+
 #ifdef CONFIG_MLX5_CORE_EN_DCB
 #define MLX5E_MAX_BW_ALLOC 100 /* Max percentage of BW allocation */
 #endif
@@ -218,6 +220,7 @@ struct mlx5e_params {
 	bool vlan_strip_disable;
 	bool rx_am_enabled;
 	u32 lro_timeout;
+	u32 pflags;
 };
 
 #ifdef CONFIG_MLX5_CORE_EN_DCB
@@ -705,7 +708,6 @@ struct mlx5e_priv {
 	struct work_struct         tx_timeout_work;
 	struct delayed_work        update_stats_work;
 
-	u32                        pflags;
 	struct mlx5_core_dev      *mdev;
 	struct net_device         *netdev;
 	struct mlx5e_stats         stats;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 839c4e9..d2bdccb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1488,7 +1488,7 @@ static int mlx5e_handle_pflag(struct net_device *netdev,
 {
 	struct mlx5e_priv *priv = netdev_priv(netdev);
 	bool enable = !!(wanted_flags & flag);
-	u32 changes = wanted_flags ^ priv->pflags;
+	u32 changes = wanted_flags ^ priv->params.pflags;
 	int err;
 
 	if (!(changes & flag))
@@ -1501,7 +1501,7 @@ static int mlx5e_handle_pflag(struct net_device *netdev,
 		return err;
 	}
 
-	MLX5E_SET_PRIV_FLAG(priv, flag, enable);
+	MLX5E_SET_PFLAG(priv, flag, enable);
 	return 0;
 }
 
@@ -1524,7 +1524,7 @@ static u32 mlx5e_get_priv_flags(struct net_device *netdev)
 {
 	struct mlx5e_priv *priv = netdev_priv(netdev);
 
-	return priv->pflags;
+	return priv->params.pflags;
 }
 
 static int mlx5e_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 89d5c65..004940a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3488,8 +3488,8 @@ static void mlx5e_build_nic_netdev_priv(struct mlx5_core_dev *mdev,
 		SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
 	/* Initialize pflags */
-	MLX5E_SET_PRIV_FLAG(priv, MLX5E_PFLAG_RX_CQE_BASED_MODER,
-			    priv->params.rx_cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE);
+	MLX5E_SET_PFLAG(priv, MLX5E_PFLAG_RX_CQE_BASED_MODER,
+			priv->params.rx_cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE);
 
 	mutex_init(&priv->state_lock);
 
-- 
2.7.4
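The new macros can be exercised outside the driver. This sketch copies the shape of `MLX5E_SET_PFLAG`/`MLX5E_GET_PFLAG` onto hypothetical `priv_sketch`/`params_sketch` structs (not the real `mlx5e_priv`) to show what the added parentheses, the `do { } while (0)` wrapper and the `!!` normalization buy:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins for mlx5e_params / mlx5e_priv. */
struct params_sketch {
	uint32_t pflags;
};

struct priv_sketch {
	struct params_sketch params;
};

#define PFLAG_RX_CQE_BASED_MODER_SKETCH (1u << 0)

/* Same shape as MLX5E_SET_PFLAG: every argument is parenthesized, and
 * do { } while (0) makes the macro behave like a single statement. */
#define SET_PFLAG_SKETCH(priv, pflag, enable)			\
	do {							\
		if (enable)					\
			(priv)->params.pflags |= (pflag);	\
		else						\
			(priv)->params.pflags &= ~(pflag);	\
	} while (0)

/* Same shape as MLX5E_GET_PFLAG: !! collapses the masked value to 0 or 1. */
#define GET_PFLAG_SKETCH(priv, pflag) (!!((priv)->params.pflags & (pflag)))
```

With the old unparenthesized `priv->pflags &= ~pflag`, passing a compound mask such as `A | B` would expand to `&= ~A | B` and leave the bits of `B` untouched, while the parenthesized `~(pflag)` clears both.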
[PATCH net-next 07/10] net/mlx5e: Add support for ethtool self diagnostics test
From: Kamal Heib

The self-diagnostics test implementation includes the following tests:
1. Link Test: Check that the link is in the up state.
2. Speed Test: Check that the link was negotiated correctly.
3. Health Test: Check the device health.

Signed-off-by: Kamal Heib
Signed-off-by: Saeed Mahameed
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |   5 +
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   8 +-
 .../net/ethernet/mellanox/mlx5/core/en_selftest.c  | 126 +
 4 files changed, 139 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 0343725..9f43beb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -8,6 +8,6 @@ mlx5_core-y :=	main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o eswitch_offloads.o \
 		en_main.o en_common.o en_fs.o en_ethtool.o en_tx.o \
 		en_rx.o en_rx_am.o en_txrx.o en_clock.o vxlan.o \
-		en_tc.o en_arfs.o en_rep.o en_fs_ethtool.o
+		en_tc.o en_arfs.o en_rep.o en_fs_ethtool.o en_selftest.o
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) +=  en_dcbnl.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 6c954cd..f7bb4a7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -167,6 +167,8 @@ struct mlx5e_umr_wqe {
 	struct mlx5_wqe_data_seg       data;
 };
 
+extern const char mlx5e_self_tests[][ETH_GSTRING_LEN];
+
 static const char mlx5e_priv_flags[][ETH_GSTRING_LEN] = {
 	"rx_cqe_moder",
 };
@@ -754,6 +756,9 @@ int mlx5e_create_flow_steering(struct mlx5e_priv *priv);
 void mlx5e_destroy_flow_steering(struct mlx5e_priv *priv);
 void mlx5e_init_l2_addr(struct mlx5e_priv *priv);
 void mlx5e_destroy_flow_table(struct mlx5e_flow_table *ft);
+int mlx5e_self_test_num(struct mlx5e_priv *priv);
+void mlx5e_self_test(struct net_device *ndev, struct ethtool_test *etest,
+		     u64 *buf);
 int mlx5e_ethtool_get_flow(struct mlx5e_priv *priv, struct ethtool_rxnfc *info,
 			   int location);
 int mlx5e_ethtool_get_all_flows(struct mlx5e_priv *priv,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 9ea7b37..839c4e9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -180,6 +180,8 @@ static int mlx5e_get_sset_count(struct net_device *dev, int sset)
 	case ETH_SS_PRIV_FLAGS:
 		return ARRAY_SIZE(mlx5e_priv_flags);
+	case ETH_SS_TEST:
+		return mlx5e_self_test_num(priv);
 	/* fallthrough */
 	default:
 		return -EOPNOTSUPP;
@@ -286,6 +288,9 @@ static void mlx5e_get_strings(struct net_device *dev,
 		break;
 
 	case ETH_SS_TEST:
+		for (i = 0; i < mlx5e_self_test_num(priv); i++)
+			strcpy(data + i * ETH_GSTRING_LEN,
+			       mlx5e_self_tests[i]);
 		break;
 
 	case ETH_SS_STATS:
@@ -1573,5 +1578,6 @@ const struct ethtool_ops mlx5e_ethtool_ops = {
 	.get_module_info   = mlx5e_get_module_info,
 	.get_module_eeprom = mlx5e_get_module_eeprom,
 	.get_priv_flags    = mlx5e_get_priv_flags,
-	.set_priv_flags    = mlx5e_set_priv_flags
+	.set_priv_flags    = mlx5e_set_priv_flags,
+	.self_test         = mlx5e_self_test,
 };
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
new file mode 100644
index 000..a25dfc5
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
@@ -0,0 +1,126 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies, Ltd.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
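The three checks listed in the commit message map onto ethtool's self-test contract: the driver advertises the test names through `ETH_SS_TEST`, fills one `u64` result per test, and flags an overall failure in `etest->flags`. A minimal userspace sketch with hypothetical device state and checks (the real driver queries the device, and "negotiated correctly" is reduced here to a nonzero speed), assuming `ETH_TEST_FL_FAILED` is bit 1 as in `<linux/ethtool.h>`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TEST_FL_FAILED_SKETCH (1u << 1)	/* same bit as ETH_TEST_FL_FAILED */

/* Hypothetical device state the checks look at. */
struct dev_state_sketch {
	bool link_up;
	int speed_mbps;		/* negotiated speed, 0 if unknown */
	bool health_ok;
};

/* Each check returns 0 on success and nonzero on failure (one u64 slot). */
static uint64_t test_link(const struct dev_state_sketch *d)   { return d->link_up ? 0 : 1; }
static uint64_t test_speed(const struct dev_state_sketch *d)  { return d->speed_mbps > 0 ? 0 : 1; }
static uint64_t test_health(const struct dev_state_sketch *d) { return d->health_ok ? 0 : 1; }

typedef uint64_t (*self_test_fn)(const struct dev_state_sketch *);

/* Order must match the advertised test-name strings. */
static const self_test_fn self_tests_sketch[] = { test_link, test_speed, test_health };

#define NUM_SELF_TESTS_SKETCH (sizeof(self_tests_sketch) / sizeof(self_tests_sketch[0]))

/* Fills buf[i] for every test and raises the FAILED flag if any test fails. */
static void run_self_tests(const struct dev_state_sketch *d, uint32_t *flags,
			   uint64_t *buf)
{
	for (size_t i = 0; i < NUM_SELF_TESTS_SKETCH; i++) {
		buf[i] = self_tests_sketch[i](d);
		if (buf[i])
			*flags |= TEST_FL_FAILED_SKETCH;
	}
}
```

Keeping the tests in a table means the count reported for `ETH_SS_TEST` and the loop that fills the results cannot drift apart.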
[PATCH net-next 02/10] net/mlx5e: Support DCBX CEE API
From: Huy Nguyen

Add DCBX CEE API interface for ConnectX-4. Configurations are stored in a
temporary structure and are applied to the card's firmware when the CEE's
setall callback function is called.

Note:
Priority group in CEE is equivalent to traffic class in the ConnectX-4
hardware spec.

Bandwidth allocation per priority in CEE is not supported because
ConnectX-4 only supports bandwidth allocation per traffic class.

User priority in CEE does not have an equivalent term in ConnectX-4.
Therefore, user priority to priority mapping in CEE is not supported.

Signed-off-by: Huy Nguyen
Signed-off-by: Saeed Mahameed
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  24 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 301 -
 drivers/net/ethernet/mellanox/mlx5/core/port.c     |  43 +++
 include/linux/mlx5/port.h                          |   4 +
 4 files changed, 370 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index a2b32ed..31387ed 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -221,6 +221,26 @@ struct mlx5e_params {
 	u32 lro_timeout;
 };
 
+#ifdef CONFIG_MLX5_CORE_EN_DCB
+struct mlx5e_cee_config {
+	/* bw pct for priority group */
+	u8                         pg_bw_pct[CEE_DCBX_MAX_PGS];
+	u8                         prio_to_pg_map[CEE_DCBX_MAX_PRIO];
+	bool                       pfc_setting[CEE_DCBX_MAX_PRIO];
+	bool                       pfc_enable;
+};
+
+enum {
+	MLX5_DCB_CHG_RESET,
+	MLX5_DCB_NO_CHG,
+	MLX5_DCB_CHG_NO_RESET,
+};
+
+struct mlx5e_dcbx {
+	struct mlx5e_cee_config    cee_cfg; /* pending configuration */
+};
+#endif
+
 struct mlx5e_tstamp {
 	rwlock_t                   lock;
 	struct cyclecounter        cycles;
@@ -688,6 +708,10 @@ struct mlx5e_priv {
 	struct mlx5e_stats         stats;
 	struct mlx5e_tstamp        tstamp;
 	u16 q_counter;
+#ifdef CONFIG_MLX5_CORE_EN_DCB
+	struct mlx5e_dcbx          dcbx;
+#endif
+
 	const struct mlx5e_profile *profile;
 	void                      *ppriv;
 };
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index 762af16..0595243 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -38,6 +38,9 @@
 #define MLX5E_100MB (10)
 #define MLX5E_1GB   (100)
 
+#define MLX5E_CEE_STATE_UP    1
+#define MLX5E_CEE_STATE_DOWN  0
+
 static int mlx5e_dcbnl_ieee_getets(struct net_device *netdev,
 				   struct ieee_ets *ets)
 {
@@ -222,13 +225,15 @@ static int mlx5e_dcbnl_ieee_setpfc(struct net_device *dev,
 
 static u8 mlx5e_dcbnl_getdcbx(struct net_device *dev)
 {
-	return DCB_CAP_DCBX_HOST | DCB_CAP_DCBX_VER_IEEE;
+	return DCB_CAP_DCBX_HOST |
+	       DCB_CAP_DCBX_VER_IEEE |
+	       DCB_CAP_DCBX_VER_CEE;
 }
 
 static u8 mlx5e_dcbnl_setdcbx(struct net_device *dev, u8 mode)
 {
 	if ((mode & DCB_CAP_DCBX_LLD_MANAGED) ||
-	    (mode & DCB_CAP_DCBX_VER_CEE) ||
+	    !(mode & DCB_CAP_DCBX_VER_CEE) ||
 	    !(mode & DCB_CAP_DCBX_VER_IEEE) ||
 	    !(mode & DCB_CAP_DCBX_HOST))
 		return 1;
@@ -304,6 +309,281 @@ static int mlx5e_dcbnl_ieee_setmaxrate(struct net_device *netdev,
 	return mlx5_modify_port_ets_rate_limit(mdev, max_bw_value, max_bw_unit);
 }
 
+static u8 mlx5e_dcbnl_setall(struct net_device *netdev)
+{
+	struct mlx5e_priv *priv = netdev_priv(netdev);
+	struct mlx5e_cee_config *cee_cfg = &priv->dcbx.cee_cfg;
+	struct mlx5_core_dev *mdev = priv->mdev;
+	struct ieee_ets ets;
+	struct ieee_pfc pfc;
+	int err;
+	int i;
+
+	memset(&ets, 0, sizeof(ets));
+	memset(&pfc, 0, sizeof(pfc));
+
+	ets.ets_cap = IEEE_8021QAZ_MAX_TCS;
+	for (i = 0; i < CEE_DCBX_MAX_PGS; i++) {
+		ets.tc_tx_bw[i] = cee_cfg->pg_bw_pct[i];
+		ets.tc_rx_bw[i] = cee_cfg->pg_bw_pct[i];
+		ets.tc_tsa[i]   = IEEE_8021QAZ_TSA_ETS;
+		ets.prio_tc[i]  = cee_cfg->prio_to_pg_map[i];
+	}
+
+	err = mlx5e_dbcnl_validate_ets(netdev, &ets);
+	if (err) {
+		netdev_err(netdev,
+			   "%s, Failed to validate ETS: %d\n", __func__, err);
+		goto out;
+	}
+
+	err = mlx5e_dcbnl_ieee_setets_core(priv, &ets);
+	if (err) {
+		netdev_err(netdev,
+			   "%s, Failed to set ETS: %d\n", __func__, err);
+		goto out;
+	}
+
+	/* Set PFC */
+	pfc.pfc_cap = mlx5_max_tc(mdev) + 1;
+	if (!cee_cfg->pfc_enable)
+		pfc.pfc_en = 0;
+	else
+		for (i = 0; i <
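The ETS half of `mlx5e_dcbnl_setall` is a 1:1 translation, because a CEE priority group is a ConnectX-4 traffic class. A trimmed-down sketch with hypothetical `cee_cfg_sketch`/`ets_sketch` structs (8 entries assumed, matching `CEE_DCBX_MAX_PGS`) shows the mapping. The bandwidth-sum check is an assumed rule for illustration, not a copy of `mlx5e_dbcnl_validate_ets`:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PGS_SKETCH 8	/* assumed to match CEE_DCBX_MAX_PGS */
#define TSA_ETS_SKETCH 2	/* value of IEEE_8021QAZ_TSA_ETS */

/* Hypothetical pending CEE configuration, modeled on mlx5e_cee_config. */
struct cee_cfg_sketch {
	uint8_t pg_bw_pct[MAX_PGS_SKETCH];
	uint8_t prio_to_pg_map[MAX_PGS_SKETCH];
};

/* Hypothetical subset of struct ieee_ets. */
struct ets_sketch {
	uint8_t tc_tx_bw[MAX_PGS_SKETCH];
	uint8_t tc_rx_bw[MAX_PGS_SKETCH];
	uint8_t tc_tsa[MAX_PGS_SKETCH];
	uint8_t prio_tc[MAX_PGS_SKETCH];
};

/* Priority group i becomes traffic class i; every class uses ETS. */
static void cee_to_ets(const struct cee_cfg_sketch *cee, struct ets_sketch *ets)
{
	for (int i = 0; i < MAX_PGS_SKETCH; i++) {
		ets->tc_tx_bw[i] = cee->pg_bw_pct[i];
		ets->tc_rx_bw[i] = cee->pg_bw_pct[i];
		ets->tc_tsa[i] = TSA_ETS_SKETCH;
		ets->prio_tc[i] = cee->prio_to_pg_map[i];
	}
}

/* Assumed validation rule: ETS classes must share exactly 100%. */
static int ets_bw_sum_ok(const struct ets_sketch *ets)
{
	unsigned int sum = 0;

	for (int i = 0; i < MAX_PGS_SKETCH; i++)
		if (ets->tc_tsa[i] == TSA_ETS_SKETCH)
			sum += ets->tc_tx_bw[i];
	return sum == 100;
}
```

Because only the pending copy is edited until setall runs, a user can stage several CEE changes and have them validated and pushed to firmware together.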
[PATCH net-next 01/10] net/mlx5e: Add qos capability check
From: Huy Nguyen

Make sure firmware supports QoS before exposing the DCB API.

Signed-off-by: Huy Nguyen
Signed-off-by: Saeed Mahameed
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 19403d6..2b42112 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3549,7 +3549,8 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
 	if (MLX5_CAP_GEN(mdev, vport_group_manager)) {
 		netdev->netdev_ops = &mlx5e_netdev_ops_sriov;
 #ifdef CONFIG_MLX5_CORE_EN_DCB
-		netdev->dcbnl_ops = &mlx5e_dcbnl_ops;
+		if (MLX5_CAP_GEN(mdev, qos))
+			netdev->dcbnl_ops = &mlx5e_dcbnl_ops;
 #endif
 	} else {
 		netdev->netdev_ops = &mlx5e_netdev_ops_basic;
-- 
2.7.4
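The capability check follows a common pattern: publish an ops table only when the device advertises the feature, and otherwise leave the pointer NULL. A hypothetical sketch, where `netdev_sketch`, `dcb_ops_sketch` and the two boolean arguments stand in for `net_device`, `dcbnl_rtnl_ops` and the `MLX5_CAP_GEN()` checks:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dcbnl_rtnl_ops. */
struct dcb_ops_sketch {
	int (*getets)(void);
};

/* Hypothetical stand-in for struct net_device. */
struct netdev_sketch {
	const struct dcb_ops_sketch *dcbnl_ops;
};

static int getets_stub(void)
{
	return 0;
}

static const struct dcb_ops_sketch dcb_ops_impl = { .getets = getets_stub };

/* Expose the DCB ops only when both capabilities are present; otherwise
 * leave the pointer NULL, which is the "not supported" state. */
static void build_netdev_sketch(struct netdev_sketch *nd,
				bool vport_group_manager, bool qos_cap)
{
	nd->dcbnl_ops = NULL;
	if (vport_group_manager && qos_cap)
		nd->dcbnl_ops = &dcb_ops_impl;
}
```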
[PATCH net-next 06/10] net/mlx5e: Add DCBX control interface
From: Huy Nguyen

Use the setdcbx interface to set the DCBX mode to firmware or OS.
If setdcbx is called with a mode value of zero, the DCBX mode is set to firmware.

Signed-off-by: Huy Nguyen
Signed-off-by: Saeed Mahameed
---
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 27 +++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index 2e94717..64c45e9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -288,13 +288,34 @@ static int mlx5e_dcbnl_ieee_setpfc(struct net_device *dev,
 
 static u8 mlx5e_dcbnl_getdcbx(struct net_device *dev)
 {
-	return DCB_CAP_DCBX_HOST |
-	       DCB_CAP_DCBX_VER_IEEE |
-	       DCB_CAP_DCBX_VER_CEE;
+	struct mlx5e_priv *priv = netdev_priv(dev);
+	struct mlx5e_dcbx *dcbx = &priv->dcbx;
+	u8 mode = DCB_CAP_DCBX_VER_IEEE | DCB_CAP_DCBX_VER_CEE;
+
+	if (dcbx->mode == MLX5E_DCBX_PARAM_VER_OPER_HOST)
+		mode |= DCB_CAP_DCBX_HOST;
+
+	return mode;
 }
 
 static u8 mlx5e_dcbnl_setdcbx(struct net_device *dev, u8 mode)
 {
+	struct mlx5e_priv *priv = netdev_priv(dev);
+	struct mlx5e_dcbx *dcbx = &priv->dcbx;
+
+	if ((!mode) && MLX5_CAP_GEN(priv->mdev, dcbx)) {
+		if (dcbx->mode == MLX5E_DCBX_PARAM_VER_OPER_AUTO)
+			return 0;
+
+		/* set dcbx to fw controlled */
+		if (!mlx5e_dcbnl_set_dcbx_mode(priv, MLX5E_DCBX_PARAM_VER_OPER_AUTO)) {
+			dcbx->mode = MLX5E_DCBX_PARAM_VER_OPER_AUTO;
+			return 0;
+		}
+
+		return 1;
+	}
+
 	if (mlx5e_dcbnl_switch_to_host_mode(netdev_priv(dev)))
 		return 1;
 
-- 
2.7.4
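The resulting getdcbx/setdcbx behavior can be modeled as plain bit arithmetic. The `DCB_CAP_DCBX_*` values below are those of `<linux/dcbnl.h>`; the mode enum mirrors `enum mlx5_dcbx_oper_mode` from this series, and `fw_write_ok` is a hypothetical stand-in for the result of `mlx5e_dcbnl_set_dcbx_mode()`. The `MLX5_CAP_GEN(mdev, dcbx)` check is omitted for brevity:

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as defined in include/uapi/linux/dcbnl.h. */
#define DCB_CAP_DCBX_HOST        0x01
#define DCB_CAP_DCBX_LLD_MANAGED 0x02
#define DCB_CAP_DCBX_VER_CEE     0x04
#define DCB_CAP_DCBX_VER_IEEE    0x08

/* Mirrors enum mlx5_dcbx_oper_mode from the series. */
enum dcbx_oper_mode_sketch {
	OPER_HOST = 0x0,
	OPER_AUTO = 0x3,
};

/* getdcbx: both versions are always advertised; HOST only when the
 * host (OS) controls DCBX. */
static uint8_t getdcbx_sketch(enum dcbx_oper_mode_sketch mode)
{
	uint8_t caps = DCB_CAP_DCBX_VER_IEEE | DCB_CAP_DCBX_VER_CEE;

	if (mode == OPER_HOST)
		caps |= DCB_CAP_DCBX_HOST;
	return caps;
}

/* setdcbx(0): hand control back to firmware. Returns 0 on success and 1
 * on failure, following the patch's convention. */
static uint8_t setdcbx_zero_sketch(enum dcbx_oper_mode_sketch *cur, int fw_write_ok)
{
	if (*cur == OPER_AUTO)
		return 0;	/* already firmware-controlled */
	if (!fw_write_ok)
		return 1;	/* register write failed, keep current mode */
	*cur = OPER_AUTO;
	return 0;
}
```

For example, a host-controlled port reports `0x0D` (IEEE | CEE | HOST), and a firmware-controlled one reports `0x0C`.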
[PATCH net-next 03/10] net/mlx5e: Read ETS settings directly from firmware
From: Huy Nguyen

Issue description:
The current implementation saves the ETS settings from the user in a
temporary soft copy and returns these settings when the user queries the
ETS settings. With the new DCBX firmware, the ETS settings can be changed
by firmware when DCBX is in firmware-controlled mode. Therefore, the user
will obtain wrong values from the temporary soft copy.

Solution:
1. Read the ETS settings directly from firmware.
2. For tc_tsa:
   a. Initialize tc_tsa to IEEE_8021QAZ_TSA_VENDOR at netdev creation.
   b. When reading the ETS settings from FW, if the traffic class
      bandwidth is less than 100, set tc_tsa to IEEE_8021QAZ_TSA_ETS.

This implementation covers the scenario where DCBX is under FW control
and the willing bit is set, which means the ETS settings are dictated by
the remote switch.

Signed-off-by: Huy Nguyen
Signed-off-by: Saeed Mahameed
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  6 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 35 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 28 +
 3 files changed, 47 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 31387ed..60aa13b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -214,9 +214,6 @@ struct mlx5e_params {
 	u8  toeplitz_hash_key[40];
 	u32 indirection_rqt[MLX5E_INDIR_RQT_SIZE];
 	bool vlan_strip_disable;
-#ifdef CONFIG_MLX5_CORE_EN_DCB
-	struct ieee_ets ets;
-#endif
 	bool rx_am_enabled;
 	u32 lro_timeout;
 };
@@ -238,6 +235,9 @@ enum {
 
 struct mlx5e_dcbx {
 	struct mlx5e_cee_config    cee_cfg; /* pending configuration */
+
+	/* The only setting that cannot be read from FW */
+	u8                         tc_tsa[IEEE_8021QAZ_MAX_TCS];
 };
 #endif
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index 0595243..8f6b5a7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -45,12 +45,31 @@ static int mlx5e_dcbnl_ieee_getets(struct net_device *netdev,
 				   struct ieee_ets *ets)
 {
 	struct mlx5e_priv *priv = netdev_priv(netdev);
+	struct mlx5_core_dev *mdev = priv->mdev;
+	int i;
+	int err = 0;
 
 	if (!MLX5_CAP_GEN(priv->mdev, ets))
 		return -ENOTSUPP;
 
-	memcpy(ets, &priv->params.ets, sizeof(*ets));
-	return 0;
+	ets->ets_cap = mlx5_max_tc(priv->mdev) + 1;
+	for (i = 0; i < ets->ets_cap; i++) {
+		err = mlx5_query_port_prio_tc(mdev, i, &ets->prio_tc[i]);
+		if (err)
+			return err;
+	}
+
+	for (i = 0; i < ets->ets_cap; i++) {
+		err = mlx5_query_port_tc_bw_alloc(mdev, i, &ets->tc_tx_bw[i]);
+		if (err)
+			return err;
+		if (ets->tc_tx_bw[i] < MLX5E_MAX_BW_ALLOC)
+			priv->dcbx.tc_tsa[i] = IEEE_8021QAZ_TSA_ETS;
+	}
+
+	memcpy(ets->tc_tsa, priv->dcbx.tc_tsa, sizeof(ets->tc_tsa));
+
+	return err;
 }
 
 enum {
@@ -127,7 +146,14 @@ int mlx5e_dcbnl_ieee_setets_core(struct mlx5e_priv *priv, struct ieee_ets *ets)
 	if (err)
 		return err;
 
-	return mlx5_set_port_tc_bw_alloc(mdev, tc_tx_bw);
+	err = mlx5_set_port_tc_bw_alloc(mdev, tc_tx_bw);
+
+	if (err)
+		return err;
+
+	memcpy(priv->dcbx.tc_tsa, ets->tc_tsa, sizeof(ets->tc_tsa));
+
+	return err;
 }
 
 static int mlx5e_dbcnl_validate_ets(struct net_device *netdev,
@@ -181,9 +207,6 @@ static int mlx5e_dcbnl_ieee_setets(struct net_device *netdev,
 	if (err)
 		return err;
 
-	memcpy(&priv->params.ets, ets, sizeof(*ets));
-	priv->params.ets.ets_cap = mlx5_max_tc(priv->mdev) + 1;
-
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 2b42112..9743c4c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3329,17 +3329,23 @@ u16 mlx5e_get_max_inline_cap(struct mlx5_core_dev *mdev)
 static void mlx5e_ets_init(struct mlx5e_priv *priv)
 {
 	int i;
-
-	priv->params.ets.ets_cap = mlx5_max_tc(priv->mdev) + 1;
-	for (i = 0; i < priv->params.ets.ets_cap; i++) {
-		priv->params.ets.tc_tx_bw[i] = MLX5E_MAX_BW_ALLOC;
-		priv->params.ets.tc_tsa[i] = IEEE_8021QAZ_TSA_VENDOR;
-		priv->params.ets.prio_tc[i] = i;
+	struct ieee_ets ets;
+
+	memset(&ets, 0, sizeof(ets));
+	ets.ets_cap = mlx5_max_tc(priv->mdev) + 1;
+	for (i = 0; i < ets.ets_cap; i++) {
+
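Point 2 of the solution can be sketched on its own: firmware reports per-TC bandwidth but not the TSA, so the driver keeps a TSA cache that starts as vendor-specific and is promoted to ETS whenever a TC has less than the full 100%. Hypothetical names throughout; the constants match `IEEE_8021QAZ_TSA_ETS` (2), `IEEE_8021QAZ_TSA_VENDOR` (255) and `MLX5E_MAX_BW_ALLOC` (100):

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TCS_SKETCH 8	/* IEEE_8021QAZ_MAX_TCS */
#define TSA_ETS        2	/* IEEE_8021QAZ_TSA_ETS */
#define TSA_VENDOR     255	/* IEEE_8021QAZ_TSA_VENDOR */
#define MAX_BW_ALLOC   100	/* MLX5E_MAX_BW_ALLOC */

/* Soft copy of the one ETS field firmware cannot report. */
static uint8_t tc_tsa_cache[MAX_TCS_SKETCH];

/* Step 2a: everything starts as vendor-specific at netdev creation. */
static void init_tsa_cache(void)
{
	for (int i = 0; i < MAX_TCS_SKETCH; i++)
		tc_tsa_cache[i] = TSA_VENDOR;
}

/* Step 2b: a TC holding less than 100% of the bandwidth must be an ETS
 * class, so promote it, then report the whole cache. */
static void derive_tsa(const uint8_t *fw_tc_bw, int num_tc, uint8_t *out_tsa)
{
	for (int i = 0; i < num_tc; i++)
		if (fw_tc_bw[i] < MAX_BW_ALLOC)
			tc_tsa_cache[i] = TSA_ETS;
	for (int i = 0; i < MAX_TCS_SKETCH; i++)
		out_tsa[i] = tc_tsa_cache[i];
}
```

As in the patch, a TC promoted to ETS on read stays ETS until a later setets overwrites the cache.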
[PATCH net-next 05/10] net/mlx5e: ConnectX-4 firmware support for DCBX
From: Huy NguyenThis patch sets up the infrastructure to support the new DCBX firmware. Signed-off-by: Huy Nguyen Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 + drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 92 ++ drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 26 +- 3 files changed, 95 insertions(+), 25 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 60aa13b..6c954cd 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -234,6 +234,7 @@ enum { }; struct mlx5e_dcbx { + enum mlx5_dcbx_oper_mode mode; struct mlx5e_cee_configcee_cfg; /* pending configuration */ /* The only setting that cannot be read from FW */ @@ -843,6 +844,7 @@ extern const struct ethtool_ops mlx5e_ethtool_ops; #ifdef CONFIG_MLX5_CORE_EN_DCB extern const struct dcbnl_rtnl_ops mlx5e_dcbnl_ops; int mlx5e_dcbnl_ieee_setets_core(struct mlx5e_priv *priv, struct ieee_ets *ets); +void mlx5e_dcbnl_initialize(struct mlx5e_priv *priv); #endif #ifndef CONFIG_RFS_ACCEL diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c index 8f6b5a7..2e94717 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c @@ -41,6 +41,43 @@ #define MLX5E_CEE_STATE_UP1 #define MLX5E_CEE_STATE_DOWN 0 +/* If dcbx mode is non-host set the dcbx mode to host. 
+ */ +static int mlx5e_dcbnl_set_dcbx_mode(struct mlx5e_priv *priv, +enum mlx5_dcbx_oper_mode mode) +{ + struct mlx5_core_dev *mdev = priv->mdev; + u32 param[MLX5_ST_SZ_DW(dcbx_param)]; + int err; + + err = mlx5_query_port_dcbx_param(mdev, param); + if (err) + return err; + + MLX5_SET(dcbx_param, param, version_admin, mode); + if (mode != MLX5E_DCBX_PARAM_VER_OPER_HOST) + MLX5_SET(dcbx_param, param, willing_admin, 1); + + return mlx5_set_port_dcbx_param(mdev, param); +} + +static int mlx5e_dcbnl_switch_to_host_mode(struct mlx5e_priv *priv) +{ + struct mlx5e_dcbx *dcbx = >dcbx; + + if (!MLX5_CAP_GEN(priv->mdev, dcbx)) + return 0; + + if (dcbx->mode == MLX5E_DCBX_PARAM_VER_OPER_HOST) + return 0; + + if (mlx5e_dcbnl_set_dcbx_mode(priv, MLX5E_DCBX_PARAM_VER_OPER_HOST)) + return 1; + + dcbx->mode = MLX5E_DCBX_PARAM_VER_OPER_HOST; + return 0; +} + static int mlx5e_dcbnl_ieee_getets(struct net_device *netdev, struct ieee_ets *ets) { @@ -199,6 +236,9 @@ static int mlx5e_dcbnl_ieee_setets(struct net_device *netdev, struct mlx5e_priv *priv = netdev_priv(netdev); int err; + if (!MLX5_CAP_GEN(priv->mdev, ets)) + return -ENOTSUPP; + err = mlx5e_dbcnl_validate_ets(netdev, ets); if (err) return err; @@ -255,6 +295,9 @@ static u8 mlx5e_dcbnl_getdcbx(struct net_device *dev) static u8 mlx5e_dcbnl_setdcbx(struct net_device *dev, u8 mode) { + if (mlx5e_dcbnl_switch_to_host_mode(netdev_priv(dev))) + return 1; + if ((mode & DCB_CAP_DCBX_LLD_MANAGED) || !(mode & DCB_CAP_DCBX_VER_CEE) || !(mode & DCB_CAP_DCBX_VER_IEEE) || @@ -634,3 +677,52 @@ const struct dcbnl_rtnl_ops mlx5e_dcbnl_ops = { .getpfcstate= mlx5e_dcbnl_getpfcstate, .setpfcstate= mlx5e_dcbnl_setpfcstate, }; + +static void mlx5e_dcbnl_query_dcbx_mode(struct mlx5e_priv *priv, + enum mlx5_dcbx_oper_mode *mode) +{ + u32 out[MLX5_ST_SZ_DW(dcbx_param)]; + + *mode = MLX5E_DCBX_PARAM_VER_OPER_HOST; + + if (!mlx5_query_port_dcbx_param(priv->mdev, out)) + *mode = MLX5_GET(dcbx_param, out, version_oper); + + /* From driver's point of 
view, we only care if the mode + * is host (HOST) or non-host (AUTO) + */ + if (*mode != MLX5E_DCBX_PARAM_VER_OPER_HOST) + *mode = MLX5E_DCBX_PARAM_VER_OPER_AUTO; +} + +static void mlx5e_ets_init(struct mlx5e_priv *priv) +{ + int i; + struct ieee_ets ets; + + memset(&ets, 0, sizeof(ets)); + ets.ets_cap = mlx5_max_tc(priv->mdev) + 1; + for (i = 0; i < ets.ets_cap; i++) { + ets.tc_tx_bw[i] = MLX5E_MAX_BW_ALLOC; + ets.tc_tsa[i] = IEEE_8021QAZ_TSA_VENDOR; + ets.prio_tc[i] = i; + } + + memcpy(priv->dcbx.tc_tsa, ets.tc_tsa, sizeof(ets.tc_tsa)); + + /* tclass[prio=0]=1, tclass[prio=1]=0, tclass[prio=i]=i (for i>1) */ + ets.prio_tc[0] = 1; + ets.prio_tc[1] = 0; + + mlx5e_dcbnl_ieee_setets_core(priv, &ets); +} + +void
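The DCBX mode handling in this patch reduces whatever the firmware reports to two states, host (HOST) or non-host (AUTO), and only issues a firmware write when the cached mode is not already HOST. Below is a minimal userspace sketch of that decision logic. It assumes hypothetical names (`dcbx_oper_mode`, `normalize_dcbx_mode()`, `switch_to_host_mode()`) and a stubbed firmware callback standing in for `mlx5_set_port_dcbx_param()`; the numeric enum values are assumptions, not taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for enum mlx5_dcbx_oper_mode; the numeric values
 * are assumptions for this sketch only. */
enum dcbx_oper_mode {
	DCBX_OPER_HOST = 0x0,
	DCBX_OPER_AUTO = 0x3,
};

/* Cached driver view, modelled on struct mlx5e_dcbx plus the dcbx cap bit. */
struct dcbx_state {
	bool fw_has_dcbx_cap;
	enum dcbx_oper_mode mode;
};

/* Mirrors mlx5e_dcbnl_query_dcbx_mode(): default to HOST if the query
 * fails, otherwise fold every non-HOST firmware mode into AUTO. */
static enum dcbx_oper_mode normalize_dcbx_mode(bool query_ok,
					       unsigned int fw_mode)
{
	unsigned int mode = DCBX_OPER_HOST;

	if (query_ok)
		mode = fw_mode;

	return mode == DCBX_OPER_HOST ? DCBX_OPER_HOST : DCBX_OPER_AUTO;
}

/* Mirrors mlx5e_dcbnl_switch_to_host_mode(): returns 0 when there is
 * nothing to do or the switch succeeded, 1 when the firmware write fails.
 * fw_set_mode is a hypothetical stub for the firmware command. */
static int switch_to_host_mode(struct dcbx_state *st,
			       int (*fw_set_mode)(enum dcbx_oper_mode))
{
	if (!st->fw_has_dcbx_cap)
		return 0;

	if (st->mode == DCBX_OPER_HOST)
		return 0;

	if (fw_set_mode(DCBX_OPER_HOST))
		return 1;	/* cached mode stays unchanged on failure */

	st->mode = DCBX_OPER_HOST;
	return 0;
}
```

As in the patch, a `setdcbx` request first tries to force host mode, and a failed firmware write makes the handler return 1 before any of the requested mode bits are examined.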
[PATCH net-next 08/10] net/mlx5e: Add support for loopback selftest
Extend the self-diagnostic tests to support a loopback test. The loopback test doesn't require the offline flag; it uses the generic dev_queue_xmit and a dedicated packet_type to capture and verify mlx5e selftest loopback packets. Signed-off-by: Saeed Mahameed Signed-off-by: Kamal Heib --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 3 +- .../net/ethernet/mellanox/mlx5/core/en_common.c | 7 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +- .../net/ethernet/mellanox/mlx5/core/en_selftest.c | 218 + 4 files changed, 227 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index f7bb4a7..9cf32d3 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -885,7 +885,8 @@ void mlx5e_destroy_tir(struct mlx5_core_dev *mdev, struct mlx5e_tir *tir); int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev); void mlx5e_destroy_mdev_resources(struct mlx5_core_dev *mdev); -int mlx5e_refresh_tirs_self_loopback_enable(struct mlx5_core_dev *mdev); +int mlx5e_refresh_tirs_self_loopback(struct mlx5_core_dev *mdev, +bool enable_uc_lb); struct mlx5_eswitch_rep; int mlx5e_vport_rep_load(struct mlx5_eswitch *esw, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c index 029e856..f175518 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c @@ -137,7 +137,8 @@ void mlx5e_destroy_mdev_resources(struct mlx5_core_dev *mdev) mlx5_unmap_free_uar(mdev, >cq_uar); } -int mlx5e_refresh_tirs_self_loopback_enable(struct mlx5_core_dev *mdev) +int mlx5e_refresh_tirs_self_loopback(struct mlx5_core_dev *mdev, +bool enable_uc_lb) { struct mlx5e_tir *tir; void *in; @@ -149,6 +150,10 @@ int mlx5e_refresh_tirs_self_loopback_enable(struct mlx5_core_dev *mdev) if (!in) return -ENOMEM; + if (enable_uc_lb) + MLX5_SET(modify_tir_in,
in, ctx.self_lb_block, +MLX5_TIRC_SELF_LB_BLOCK_BLOCK_UNICAST_); + MLX5_SET(modify_tir_in, in, bitmask.self_lb_en, 1); list_for_each_entry(tir, &mdev->mlx5e_res.td.tirs_list, list) { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index f5b93c2..89d5c65 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -2136,7 +2136,7 @@ int mlx5e_open_locked(struct net_device *netdev) goto err_clear_state_opened_flag; } - err = mlx5e_refresh_tirs_self_loopback_enable(priv->mdev); + err = mlx5e_refresh_tirs_self_loopback(priv->mdev, false); if (err) { netdev_err(netdev, "%s: mlx5e_refresh_tirs_self_loopback_enable failed, %d\n", __func__, err); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c index a25dfc5..a823054 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c @@ -30,12 +30,16 @@ * SOFTWARE.
*/ +#include +#include +#include #include "en.h" enum { MLX5E_ST_LINK_STATE, MLX5E_ST_LINK_SPEED, MLX5E_ST_HEALTH_INFO, + MLX5E_ST_LOOPBACK, MLX5E_ST_NUM, }; @@ -43,6 +47,7 @@ const char mlx5e_self_tests[MLX5E_ST_NUM][ETH_GSTRING_LEN] = { "Link Test", "Speed Test", "Health Test", + "Loopback Test", }; int mlx5e_self_test_num(struct mlx5e_priv *priv) @@ -88,10 +93,223 @@ static int mlx5e_test_link_speed(struct mlx5e_priv *priv) return 1; } +/* loopback test */ +#define MLX5E_TEST_PKT_SIZE (MLX5_MPWRQ_SMALL_PACKET_THRESHOLD - NET_IP_ALIGN) +static const char mlx5e_test_text[ETH_GSTRING_LEN] = "MLX5E SELF TEST"; +#define MLX5E_TEST_MAGIC 0x5AEED15C001 + +struct mlx5ehdr { + __be32 version; + __be64 magic; + char text[ETH_GSTRING_LEN]; +}; + +static struct sk_buff *mlx5e_test_get_udp_skb(struct mlx5e_priv *priv) +{ + struct sk_buff *skb = NULL; + struct mlx5ehdr *mlxh; + struct ethhdr *ethh; + struct udphdr *udph; + struct iphdr *iph; + int datalen, iplen; + + datalen = MLX5E_TEST_PKT_SIZE - + (sizeof(*ethh) + sizeof(*iph) + sizeof(*udph)); + + skb = netdev_alloc_skb(priv->netdev, MLX5E_TEST_PKT_SIZE); + if (!skb) { + netdev_err(priv->netdev, "\tFailed to alloc loopback skb\n"); + return NULL; + } + + prefetchw(skb->data); + skb_reserve(skb, NET_IP_ALIGN); + + /*
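The loopback test has to recognize its own packets on receive, so each test frame carries a small header with a magic number and a fixed text string. Below is a userspace sketch of building and checking such a header. It assumes a packed big-endian layout (4-byte version, 8-byte magic, 32-byte text) serialized by hand, whereas the patch's `struct mlx5ehdr` uses the compiler's natural alignment. `build_test_hdr()` and `is_test_hdr()` are hypothetical names; only the magic value, the text string and `ETH_GSTRING_LEN` (32) come from the kernel side.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TEST_TEXT_LEN 32                 /* ETH_GSTRING_LEN */
#define TEST_MAGIC    0x5AEED15C001ULL   /* MLX5E_TEST_MAGIC from the patch */
#define TEST_HDR_LEN  (4 + 8 + TEST_TEXT_LEN)

static const char test_text[TEST_TEXT_LEN] = "MLX5E SELF TEST";

/* Store the low n bytes of v in big-endian (network) order. */
static void put_be(uint8_t *p, uint64_t v, int n)
{
	for (int i = n - 1; i >= 0; i--) {
		p[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Read n big-endian bytes into a host integer. */
static uint64_t get_be(const uint8_t *p, int n)
{
	uint64_t v = 0;

	for (int i = 0; i < n; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Fill buf (at least TEST_HDR_LEN bytes) with version 0, magic and text. */
static void build_test_hdr(uint8_t *buf)
{
	put_be(buf, 0, 4);
	put_be(buf + 4, TEST_MAGIC, 8);
	memcpy(buf + 12, test_text, TEST_TEXT_LEN);
}

/* Receive-side filter: short payloads or a wrong magic/text are not ours. */
static bool is_test_hdr(const uint8_t *buf, size_t len)
{
	if (len < TEST_HDR_LEN)
		return false;
	if (get_be(buf + 4, 8) != TEST_MAGIC)
		return false;
	return memcmp(buf + 12, test_text, TEST_TEXT_LEN) == 0;
}
```

In the driver, the commit message says the receive side is a dedicated `packet_type` handler. Here the check is a plain function so that the matching rule can be exercised on its own.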
[PATCH net-next 10/10] net/mlx5e: Add CQE compression user control
From: Shaker Daibes
The user can now override the automatic driver decision using the rx_cqe_compress private flag, which sets the preference for CQE compression. The flag is initialized to the automatic driver decision. Signed-off-by: Shaker Daibes Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 5 +-- drivers/net/ethernet/mellanox/mlx5/core/en_clock.c | 3 +- .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 39 -- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 13 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 4 +-- 5 files changed, 51 insertions(+), 13 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 84ac78f..442dbc3 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -171,10 +171,12 @@ extern const char mlx5e_self_tests[][ETH_GSTRING_LEN]; static const char mlx5e_priv_flags[][ETH_GSTRING_LEN] = { "rx_cqe_moder", + "rx_cqe_compress", }; enum mlx5e_priv_flag { MLX5E_PFLAG_RX_CQE_BASED_MODER = (1 << 0), + MLX5E_PFLAG_RX_CQE_COMPRESS = (1 << 1), }; #define MLX5E_SET_PFLAG(priv, pflag, enable) \ @@ -205,8 +207,7 @@ struct mlx5e_params { u16 num_channels; u8 num_tc; u8 rx_cq_period_mode; - bool rx_cqe_compress_admin; - bool rx_cqe_compress; + bool rx_cqe_compress_def; struct mlx5e_cq_moder rx_cq_moderation; struct mlx5e_cq_moder tx_cq_moderation; u16 min_rx_wqes; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c index 13dc388..2cd8e56 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c @@ -94,7 +94,7 @@ int mlx5e_hwstamp_set(struct net_device *dev, struct ifreq *ifr) switch (config.rx_filter) { case HWTSTAMP_FILTER_NONE: /* Reset CQE compression to Admin default */ - mlx5e_modify_rx_cqe_compression(priv, priv->params.rx_cqe_compress_admin); +
mlx5e_modify_rx_cqe_compression(priv, priv->params.rx_cqe_compress_def); break; case HWTSTAMP_FILTER_ALL: case HWTSTAMP_FILTER_SOME: @@ -111,6 +111,7 @@ int mlx5e_hwstamp_set(struct net_device *dev, struct ifreq *ifr) case HWTSTAMP_FILTER_PTP_V2_SYNC: case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ: /* Disable CQE compression */ + netdev_warn(dev, "Disabling cqe compression"); mlx5e_modify_rx_cqe_compression(priv, false); config.rx_filter = HWTSTAMP_FILTER_ALL; break; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c index d2bdccb..aa963d7 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c @@ -1481,6 +1481,35 @@ static int set_pflag_rx_cqe_based_moder(struct net_device *netdev, bool enable) return err; } +static int set_pflag_rx_cqe_compress(struct net_device *netdev, +bool enable) +{ + struct mlx5e_priv *priv = netdev_priv(netdev); + struct mlx5_core_dev *mdev = priv->mdev; + int err = 0; + bool reset; + + if (!MLX5_CAP_GEN(mdev, cqe_compression)) + return -ENOTSUPP; + + if (enable && priv->tstamp.hwtstamp_config.rx_filter != HWTSTAMP_FILTER_NONE) { + netdev_err(netdev, "Can't enable cqe compression while timestamping is enabled.\n"); + return -EINVAL; + } + + reset = test_bit(MLX5E_STATE_OPENED, &priv->state); + + if (reset) + mlx5e_close_locked(netdev); + + MLX5E_SET_PFLAG(priv, MLX5E_PFLAG_RX_CQE_COMPRESS, enable); + priv->params.rx_cqe_compress_def = enable; + + if (reset) + err = mlx5e_open_locked(netdev); + return err; +} + static int mlx5e_handle_pflag(struct net_device *netdev, u32 wanted_flags, enum mlx5e_priv_flag flag, @@ -1511,13 +1540,19 @@ static int mlx5e_set_priv_flags(struct net_device *netdev, u32 pflags) int err; mutex_lock(&priv->state_lock); - err = mlx5e_handle_pflag(netdev, pflags, MLX5E_PFLAG_RX_CQE_BASED_MODER, set_pflag_rx_cqe_based_moder); + if (err) + goto out; + err = mlx5e_handle_pflag(netdev, pflags,
+MLX5E_PFLAG_RX_CQE_COMPRESS, +set_pflag_rx_cqe_compress); + +out: mutex_unlock(&priv->state_lock); - return err ? -EINVAL : 0; + return err;
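The new ethtool handler follows a close, update, reopen pattern. With this patch, `mlx5e_set_priv_flags()` also returns the handler's own error code instead of folding every failure into -EINVAL. Below is a userspace model of that flow using a hypothetical `struct model_priv`. The requested-vs-current bit comparison in `handle_pflag()` is an assumption about `mlx5e_handle_pflag()`, whose body is not shown here, and -EOPNOTSUPP stands in for the kernel-internal -ENOTSUPP.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define PFLAG_RX_CQE_BASED_MODER (1u << 0)
#define PFLAG_RX_CQE_COMPRESS    (1u << 1)

/* Hypothetical model of the driver state the patch touches. */
struct model_priv {
	bool cap_cqe_compression;   /* MLX5_CAP_GEN(mdev, cqe_compression) */
	bool hwtstamp_rx_on;        /* rx_filter != HWTSTAMP_FILTER_NONE */
	bool opened;                /* MLX5E_STATE_OPENED */
	uint32_t pflags;
	bool rx_cqe_compress_def;
	int reopen_count;           /* counts simulated mlx5e_open_locked() calls */
};

static int set_pflag_rx_cqe_compress(struct model_priv *p, bool enable)
{
	bool reset;

	if (!p->cap_cqe_compression)
		return -EOPNOTSUPP;

	/* CQE compression is refused while RX timestamping is active. */
	if (enable && p->hwtstamp_rx_on)
		return -EINVAL;

	reset = p->opened;
	if (reset)
		p->opened = false;          /* mlx5e_close_locked() */

	if (enable)
		p->pflags |= PFLAG_RX_CQE_COMPRESS;
	else
		p->pflags &= ~PFLAG_RX_CQE_COMPRESS;
	p->rx_cqe_compress_def = enable;

	if (reset) {
		p->opened = true;           /* mlx5e_open_locked() */
		p->reopen_count++;
	}
	return 0;
}

/* Call the per-flag handler only when the requested bit differs from the
 * current one, and pass its error code through unchanged. */
static int handle_pflag(struct model_priv *p, uint32_t wanted, uint32_t flag,
			int (*cb)(struct model_priv *, bool))
{
	bool enable = (wanted & flag) != 0;
	bool cur = (p->pflags & flag) != 0;

	if (enable == cur)
		return 0;
	return cb(p, enable);
}
```

Under the patch, a request such as `ethtool --set-priv-flags <dev> rx_cqe_compress on` would therefore be refused while RX hardware timestamping is enabled. Repeating a request for the current state does not reopen the channels.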
[PATCH net-next 00/10] Mellanox 100G mlx5 DCBX and ethtool updates
Hi Dave,

This series provides the following mlx5 updates:

From Huy: DCBX CEE API and DCBX firmware/host modes support.
- 1st patch ensures the dcbnl_rtnl_ops is published only when the qos capability bit is on.
- 2nd patch adds support for the CEE interfaces to mlx5 dcbnl_rtnl_ops.
- 3rd patch refactors the ETS query to read the ETS configuration directly from firmware rather than keeping a software shadow of it. The existing IEEE interfaces stay the same.
- 4th patch adds support for the MLX5_REG_DCBX_PARAM and MLX5_REG_DCBX_APP firmware commands to manipulate the mlx5 DCBX mode.
- 5th patch adds driver support for the new DCBX firmware. This ensures backward compatibility with both old and new firmware. With the new DCBX firmware, qos settings can be controlled by either firmware or software, depending on the DCBX mode.

From Kamal and Saeed:
- mlx5 self-test support.

From Shaker:
- Private flag to let the user enable/disable mlx5 CQE compression.

This series was generated against commit:
e5f12b3f5ebb ("Merge branch 'mlxsw-trap-groups-and-policers'")

Thanks,
Saeed.
Huy Nguyen (6):
  net/mlx5e: Add qos capability check
  net/mlx5e: Support DCBX CEE API
  net/mlx5e: Read ETS settings directly from firmware
  net/mlx5: Add DCBX firmware commands support
  net/mlx5e: ConnectX-4 firmware support for DCBX
  net/mlx5e: Add DCBX control interface

Kamal Heib (1):
  net/mlx5e: Add support for ethtool self diagnostics test

Saeed Mahameed (1):
  net/mlx5e: Add support for loopback selftest

Shaker Daibes (2):
  net/mlx5e: Moves pflags to priv->params
  net/mlx5e: Add CQE compression user control

 drivers/net/ethernet/mellanox/mlx5/core/Makefile | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h | 61 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c | 3 +-
 .../net/ethernet/mellanox/mlx5/core/en_common.c | 7 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 449 -
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 53 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 46 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 4 +-
 .../net/ethernet/mellanox/mlx5/core/en_selftest.c | 344
 drivers/net/ethernet/mellanox/mlx5/core/port.c | 63 +++
 include/linux/mlx5/driver.h | 7 +
 include/linux/mlx5/port.h | 6 +
 12 files changed, 980 insertions(+), 65 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c

--
2.7.4
Re: net: stmmac: Meson GXBB: attempting to execute userspace memory
On 11/26/2016 10:08 AM, Martin Blumenstingl wrote: > Hello Heinrich, > > On Sat, Nov 26, 2016 at 8:53 AM, Heinrich Schuchardt >wrote: >> For Odroid C2 I have compiled kernel >> 4.9.0-rc6-next-20161124-1-gbf7e142 >> with one additional patch >> https://github.com/xypron/kernel-odroid-c2/blob/master/patch/0001-stmmac-RTL8211F-Meson-GXBB-TX-throughput-problems.patch >> >> I repeatedly see faults like the one below: > do you see the same errors with the RTL8211F patch *not* applied? > >> [ 2557.400796] Unhandled fault: synchronous external abort (0x9210) >> at 0x40001e8ee4b0 >> [ 2557.952413] CPU: 0 PID: 22837 Comm: cc1 Tainted: G D >> 4.9.0-rc6-next-20161124-1-gbf7e142 #1 >> [ 2557.962062] Hardware name: Hardkernel ODROID-C2 (DT) >> [ 2557.966980] task: 80006ddb7080 task.stack: 80006dd9c000 >> [ 2557.972846] PC is at 0x6a0d98 >> [ 2557.975776] LR is at 0x6a0e54 >> [ 2557.978709] pc : [<006a0d98>] lr : [<006a0e54>] >> pstate: 8000 >> [ 2557.986040] sp : f3ee5f80 >> [ 2557.989318] x29: f3ee5f80 x28: 4b3f1240 >> [ 2557.994578] x27: 012a7000 x26: 4b3f1288 >> [ 2557.999840] x25: 00f58f88 x24: 4b3f1240 >> [ 2558.005101] x23: x22: 0001 >> [ 2558.010362] x21: 0001 x20: 4b3f1250 >> [ 2558.015623] x19: 0054 x18: 0001 >> [ 2558.020885] x17: 48acaa10 x16: 01285050 >> [ 2558.026146] x15: 4ad96dc8 x14: 001f >> [ 2558.031407] x13: 4b3f1270 x12: 4b3f1258 >> [ 2558.036668] x11: 01347000 x10: 0661 >> [ 2558.041930] x9 : 0005 x8 : 0003 >> [ 2558.047191] x7 : 4b3f1240 x6 : 20020033 >> [ 2558.052452] x5 : 4b402020 x4 : 4b3e1aa0 >> [ 2558.057713] x3 : 000c x2 : 0020 >> [ 2558.062974] x1 : 00f45000 x0 : 0065 >> [ 2558.068235] >> [ 2558.069712] Internal error: Attempting to execute userspace memory: >> 860f [#7] PREEMPT SMP >> [ 2558.078155] Modules linked in: meson_rng rng_core meson_gxbb_wdt >> ip_tables x_tables ipv6 dwmac_generic realtek dwmac_meson8b >> stmmac_platform stmmac >> [ 2558.091267] CPU: 0 PID: 22837 Comm: cc1 Tainted: G D >> 4.9.0-rc6-next-20161124-1-gbf7e142 #1 >> [ 
2558.100925] Hardware name: Hardkernel ODROID-C2 (DT) >> [ 2558.105841] task: 80006ddb7080 task.stack: 80006dd9c000 >> [ 2558.111706] PC is at 0x6a0e54 >> [ 2558.114638] LR is at 0x6a0e54 >> [ 2558.117571] pc : [<006a0e54>] lr : [<006a0e54>] >> pstate: 63c5 >> [ 2558.124902] sp : 80006dd9fec0 >> [ 2558.128179] x29: x28: 80006ddb7080 >> [ 2558.133441] x27: 012a7000 x26: 4b3f1288 >> [ 2558.138702] x25: 00f58f88 x24: 4b3f1240 >> [ 2558.143963] x23: 8000 x22: 006a0d98 >> [ 2558.149225] x21: x20: 80006e223000 >> [ 2558.154486] x19: x18: 0010 >> [ 2558.159747] x17: 48acaa10 x16: 01285050 >> [ 2558.165008] x15: 88e91f07 x14: 0006 >> [ 2558.170270] x13: 08e91f15 x12: 000f >> [ 2558.175531] x11: 0002 x10: 02ea >> [ 2558.180792] x9 : 80006dd9fb40 x8 : 00010a8b >> [ 2558.186053] x7 : x6 : 020e >> [ 2558.191315] x5 : 020f020e x4 : >> [ 2558.196576] x3 : x2 : 020f >> [ 2558.201837] x1 : 80006ddb7080 x0 : >> [ 2558.207098] >> [ 2558.208565] Process cc1 (pid: 22837, stack limit = 0x80006dd9c000) >> [ 2558.215035] Stack: (0x80006dd9fec0 to 0x80006dda) >> [ 2558.220728] fec0: 0065 00f45000 0020 >> 000c >> [ 2558.228490] fee0: 4b3e1aa0 4b402020 20020033 >> 4b3f1240 >> [ 2558.236253] ff00: 0003 0005 0661 >> 01347000 >> [ 2558.244015] ff20: 4b3f1258 4b3f1270 001f >> 4ad96dc8 >> [ 2558.251778] ff40: 01285050 48acaa10 0001 >> 0054 >> [ 2558.259540] ff60: 4b3f1250 0001 0001 >> >> [ 2558.267303] ff80: 4b3f1240 00f58f88 4b3f1288 >> 012a7000 >> [ 2558.275065] ffa0: 4b3f1240 f3ee5f80 006a0e54 >> f3ee5f80 >> [ 2558.282828] ffc0: 006a0d98 8000 0003 >> >> [ 2558.290590] ffe0: >> >> [ 2558.298351] Call trace: >> [ 2558.300769] Exception stack(0x80006dd9fcf0 to 0x80006dd9fe20) >> [ 2558.307149] fce0: >> 0001 >> [ 2558.314913] fd00: 80006dd9fec0
Re: Gigabit ethernet driver for Alacritech's SLIC devices (v3)
On 26.11.2016 13:20, Lino Sanfilippo wrote:
> v3:
> - don't add defines to pci.h but instead put them into the driver's header file

This should of course be "pci_ids.h".

Regards,
Lino
[PATCH v3 net-next 2/2] MAINTAINERS: add entry for slicoss ethernet driver
Add myself as maintainer for the slicoss ethernet driver. Signed-off-by: Lino Sanfilippo --- MAINTAINERS | 5 + 1 file changed, 5 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 6781a3f..bb9af28 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -562,6 +562,11 @@ T: git git://linuxtv.org/anttip/media_tree.git S: Maintained F: drivers/media/usb/airspy/ +ALACRITECH GIGABIT ETHERNET DRIVER +M: Lino Sanfilippo +S: Maintained +F: drivers/net/ethernet/alacritech/* + ALCATEL SPEEDTOUCH USB DRIVER M: Duncan Sands L: linux-...@vger.kernel.org -- 1.9.1
[PATCH v3 net-next 1/2] net: ethernet: slicoss: add slicoss gigabit ethernet driver
Add a driver for Alacritech gigabit ethernet cards with SLIC (session-layer interface control) technology. The driver provides basic support, without SLIC, for the following devices:
- Mojave cards (single-port PCI Gigabit), copper and fiber
- Oasis cards (single- and dual-port PCI-X Gigabit), copper and fiber
- Kalahari cards (dual- and quad-port PCIe Gigabit), copper and fiber

Signed-off-by: Lino Sanfilippo --- drivers/net/ethernet/Kconfig | 1 + drivers/net/ethernet/Makefile | 1 + drivers/net/ethernet/alacritech/Kconfig | 28 + drivers/net/ethernet/alacritech/Makefile | 4 + drivers/net/ethernet/alacritech/slic.h | 576 + drivers/net/ethernet/alacritech/slicoss.c | 1867 + 6 files changed, 2477 insertions(+) create mode 100644 drivers/net/ethernet/alacritech/Kconfig create mode 100644 drivers/net/ethernet/alacritech/Makefile create mode 100644 drivers/net/ethernet/alacritech/slic.h create mode 100644 drivers/net/ethernet/alacritech/slicoss.c diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig index 2ffd634..a4cc87fe 100644 --- a/drivers/net/ethernet/Kconfig +++ b/drivers/net/ethernet/Kconfig @@ -21,6 +21,7 @@ source "drivers/net/ethernet/3com/Kconfig" source "drivers/net/ethernet/adaptec/Kconfig" source "drivers/net/ethernet/aeroflex/Kconfig" source "drivers/net/ethernet/agere/Kconfig" +source "drivers/net/ethernet/alacritech/Kconfig" source "drivers/net/ethernet/allwinner/Kconfig" source "drivers/net/ethernet/alteon/Kconfig" source "drivers/net/ethernet/altera/Kconfig" diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile index 1d349e9..b448027 100644 --- a/drivers/net/ethernet/Makefile +++ b/drivers/net/ethernet/Makefile @@ -7,6 +7,7 @@ obj-$(CONFIG_NET_VENDOR_8390) += 8390/ obj-$(CONFIG_NET_VENDOR_ADAPTEC) += adaptec/ obj-$(CONFIG_GRETH) += aeroflex/ obj-$(CONFIG_NET_VENDOR_AGERE) += agere/ +obj-$(CONFIG_NET_VENDOR_ALACRITECH) += alacritech/ obj-$(CONFIG_NET_VENDOR_ALLWINNER) += allwinner/ obj-$(CONFIG_NET_VENDOR_ALTEON) +=
alteon/ obj-$(CONFIG_ALTERA_TSE) += altera/ diff --git a/drivers/net/ethernet/alacritech/Kconfig b/drivers/net/ethernet/alacritech/Kconfig new file mode 100644 index 000..41000a3 --- /dev/null +++ b/drivers/net/ethernet/alacritech/Kconfig @@ -0,0 +1,28 @@ +config NET_VENDOR_ALACRITECH +bool "Alacritech devices" +default y +---help--- + If you have a network (Ethernet) card belonging to this class, say Y. + + Note that the answer to this question doesn't directly affect the + kernel: saying N will just cause the configurator to skip all + the questions about Alacritech devices. If you say Y, you will be asked + for your specific device in the following questions. + +if NET_VENDOR_ALACRITECH + +config SLICOSS + tristate "Alacritech Slicoss support" + depends on PCI + select CRC32 + ---help--- + This driver supports Gigabit Ethernet adapters based on the + Session Layer Interface (SLIC) technology by Alacritech. + + Supported are Mojave (1 port) and Oasis (1, 2 and 4 port) cards, + both copper and fiber. + + To compile this driver as a module, choose M here: the module + will be called slicoss. This is recommended.
+ +endif # NET_VENDOR_ALACRITECH diff --git a/drivers/net/ethernet/alacritech/Makefile b/drivers/net/ethernet/alacritech/Makefile new file mode 100644 index 000..8790e9e --- /dev/null +++ b/drivers/net/ethernet/alacritech/Makefile @@ -0,0 +1,4 @@ +# +# Makefile for the Alacritech Slicoss driver +# +obj-$(CONFIG_SLICOSS) += slicoss.o diff --git a/drivers/net/ethernet/alacritech/slic.h b/drivers/net/ethernet/alacritech/slic.h new file mode 100644 index 000..c62d46b --- /dev/null +++ b/drivers/net/ethernet/alacritech/slic.h @@ -0,0 +1,576 @@ + +#ifndef _SLIC_H +#define _SLIC_H + +#include +#include +#include +#include +#include +#include +#include +#include + +#define SLIC_VGBSTAT_XPERR 0x4000 +#define SLIC_VGBSTAT_XERRSHFT 25 +#define SLIC_VGBSTAT_XCSERR0x23 +#define SLIC_VGBSTAT_XUFLOW0x22 +#define SLIC_VGBSTAT_XHLEN 0x20 +#define SLIC_VGBSTAT_NETERR0x0100 +#define SLIC_VGBSTAT_NERRSHFT 16 +#define SLIC_VGBSTAT_NERRMSK 0x1ff +#define SLIC_VGBSTAT_NCSERR0x103 +#define SLIC_VGBSTAT_NUFLOW0x102 +#define SLIC_VGBSTAT_NHLEN 0x100 +#define SLIC_VGBSTAT_LNKERR0x0080 +#define SLIC_VGBSTAT_LERRMSK 0xff +#define SLIC_VGBSTAT_LDEARLY 0x86 +#define SLIC_VGBSTAT_LBOFLO0x85 +#define SLIC_VGBSTAT_LCODERR 0x84 +#define SLIC_VGBSTAT_LDBLNBL 0x83 +#define SLIC_VGBSTAT_LCRCERR 0x82 +#define SLIC_VGBSTAT_LOFLO 0x81 +#define SLIC_VGBSTAT_LUFLO
Gigabit ethernet driver for Alacritech's SLIC devices (v3)
Hi,

This is the third version of the slicoss gigabit ethernet driver (a rework of the driver from Alacritech that can currently be found under drivers/staging/slicoss). The driver is supposed to support Mojave, Oasis and Kalahari cards, both copper and fiber. If this code is accepted, the staging version can be removed.

The driver has been tested on a SEN2104ET adapter (4-port PCIe copper).

v3:
- don't add defines to pci.h but instead put them into the driver's header file (requested by Greg Kroah-Hartman)

v2:
- remove unusual padding in statistic strings (suggested by Andrew Lunn)
- for mdio register and bit names use defines from mii.h instead of own ones (suggested by Andrew Lunn)
- remove unused defines
- ensure PCI flush at two more places
- use mmiowb before lock to prevent mmio writes leaking out of lock
- fix some typos in comments
- add copyright and GPL header

Regards,
Lino
Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()
On 11/26/2016 07:46 AM, Cong Wang wrote:
> On Thu, Nov 24, 2016 at 7:20 AM, Daniel Borkmann wrote:
>> Ok, strange, qdisc_destroy() calls into ops->destroy(), where ingress drops its entire chain via tcf_destroy_chain(), so that will be NULL eventually. The tps are freed by call_rcu(), as is the qdisc itself later on via qdisc_rcu_free(), where it frees the per-cpu bstats as well. Outstanding readers should either bail out due to the if (!cl) check or can still process the chain until the read-side section ends, but during that time cl->q and its bstats should still be valid. Do you happen to know what's at address 880a68b04028? I was wondering about call_rcu() vs. call_rcu_bh(), but at least on ingress (netif_receive_skb_internal()) we hold rcu_read_lock() here. The KASAN report is reliably happening at this location, right?
>
> I am confused as well; I don't see how it could be related to my patch yet. I will take a deep look over the weekend.

Ok, I'm currently on the run. It got too late last night, but I'll write up what I found this evening. It's not related to ingress, though.

Cheers,
Daniel
Re: net: stmmac: Meson GXBB: attempting to execute userspace memory
Hello Heinrich, On Sat, Nov 26, 2016 at 8:53 AM, Heinrich Schuchardtwrote: > For Odroid C2 I have compiled kernel > 4.9.0-rc6-next-20161124-1-gbf7e142 > with one additional patch > https://github.com/xypron/kernel-odroid-c2/blob/master/patch/0001-stmmac-RTL8211F-Meson-GXBB-TX-throughput-problems.patch > > I repeatedly see faults like the one below: do you see the same errors with the RTL8211F patch *not* applied? > [ 2557.400796] Unhandled fault: synchronous external abort (0x9210) > at 0x40001e8ee4b0 > [ 2557.952413] CPU: 0 PID: 22837 Comm: cc1 Tainted: G D > 4.9.0-rc6-next-20161124-1-gbf7e142 #1 > [ 2557.962062] Hardware name: Hardkernel ODROID-C2 (DT) > [ 2557.966980] task: 80006ddb7080 task.stack: 80006dd9c000 > [ 2557.972846] PC is at 0x6a0d98 > [ 2557.975776] LR is at 0x6a0e54 > [ 2557.978709] pc : [<006a0d98>] lr : [<006a0e54>] > pstate: 8000 > [ 2557.986040] sp : f3ee5f80 > [ 2557.989318] x29: f3ee5f80 x28: 4b3f1240 > [ 2557.994578] x27: 012a7000 x26: 4b3f1288 > [ 2557.999840] x25: 00f58f88 x24: 4b3f1240 > [ 2558.005101] x23: x22: 0001 > [ 2558.010362] x21: 0001 x20: 4b3f1250 > [ 2558.015623] x19: 0054 x18: 0001 > [ 2558.020885] x17: 48acaa10 x16: 01285050 > [ 2558.026146] x15: 4ad96dc8 x14: 001f > [ 2558.031407] x13: 4b3f1270 x12: 4b3f1258 > [ 2558.036668] x11: 01347000 x10: 0661 > [ 2558.041930] x9 : 0005 x8 : 0003 > [ 2558.047191] x7 : 4b3f1240 x6 : 20020033 > [ 2558.052452] x5 : 4b402020 x4 : 4b3e1aa0 > [ 2558.057713] x3 : 000c x2 : 0020 > [ 2558.062974] x1 : 00f45000 x0 : 0065 > [ 2558.068235] > [ 2558.069712] Internal error: Attempting to execute userspace memory: > 860f [#7] PREEMPT SMP > [ 2558.078155] Modules linked in: meson_rng rng_core meson_gxbb_wdt > ip_tables x_tables ipv6 dwmac_generic realtek dwmac_meson8b > stmmac_platform stmmac > [ 2558.091267] CPU: 0 PID: 22837 Comm: cc1 Tainted: G D > 4.9.0-rc6-next-20161124-1-gbf7e142 #1 > [ 2558.100925] Hardware name: Hardkernel ODROID-C2 (DT) > [ 2558.105841] task: 80006ddb7080 task.stack: 
80006dd9c000 > [ 2558.111706] PC is at 0x6a0e54 > [ 2558.114638] LR is at 0x6a0e54 > [ 2558.117571] pc : [<006a0e54>] lr : [<006a0e54>] > pstate: 63c5 > [ 2558.124902] sp : 80006dd9fec0 > [ 2558.128179] x29: x28: 80006ddb7080 > [ 2558.133441] x27: 012a7000 x26: 4b3f1288 > [ 2558.138702] x25: 00f58f88 x24: 4b3f1240 > [ 2558.143963] x23: 8000 x22: 006a0d98 > [ 2558.149225] x21: x20: 80006e223000 > [ 2558.154486] x19: x18: 0010 > [ 2558.159747] x17: 48acaa10 x16: 01285050 > [ 2558.165008] x15: 88e91f07 x14: 0006 > [ 2558.170270] x13: 08e91f15 x12: 000f > [ 2558.175531] x11: 0002 x10: 02ea > [ 2558.180792] x9 : 80006dd9fb40 x8 : 00010a8b > [ 2558.186053] x7 : x6 : 020e > [ 2558.191315] x5 : 020f020e x4 : > [ 2558.196576] x3 : x2 : 020f > [ 2558.201837] x1 : 80006ddb7080 x0 : > [ 2558.207098] > [ 2558.208565] Process cc1 (pid: 22837, stack limit = 0x80006dd9c000) > [ 2558.215035] Stack: (0x80006dd9fec0 to 0x80006dda) > [ 2558.220728] fec0: 0065 00f45000 0020 > 000c > [ 2558.228490] fee0: 4b3e1aa0 4b402020 20020033 > 4b3f1240 > [ 2558.236253] ff00: 0003 0005 0661 > 01347000 > [ 2558.244015] ff20: 4b3f1258 4b3f1270 001f > 4ad96dc8 > [ 2558.251778] ff40: 01285050 48acaa10 0001 > 0054 > [ 2558.259540] ff60: 4b3f1250 0001 0001 > > [ 2558.267303] ff80: 4b3f1240 00f58f88 4b3f1288 > 012a7000 > [ 2558.275065] ffa0: 4b3f1240 f3ee5f80 006a0e54 > f3ee5f80 > [ 2558.282828] ffc0: 006a0d98 8000 0003 > > [ 2558.290590] ffe0: > > [ 2558.298351] Call trace: > [ 2558.300769] Exception stack(0x80006dd9fcf0 to 0x80006dd9fe20) > [ 2558.307149] fce0: > 0001 > [ 2558.314913] fd00: 80006dd9fec0 006a0e54 800073acf500 > 0004 > [ 2558.322675] fd20: 08dbbc18 80006ddb7080 > 6dd9fdd0 > [