[PATCH net-next v5] net: stmmac: Delete dead code for MDIO registration

2017-08-24 Thread Romain Perier
This code is no longer used, the logging function was changed by commit
fbca164776e4 ("net: stmmac: Use the right logging function in 
stmmac_mdio_register").
It was previously showing information about the type of the IRQ, if it's
polled, ignored or a normal interrupt. As we don't want information loss,
I have moved this code to phy_attached_print().

Fixes: fbca164776e4 ("net: stmmac: Use the right logging function in 
stmmac_mdio_register")
Signed-off-by: Romain Perier 
---

Hello,

This is the continuity of "[PATCH v2 0/2] net: stmmac: phy logging fixes",
see https://lkml.org/lkml/2017/8/21/288

Changes in v5:
- Don't truncater commit fbca164776e4 in the message of *this* commit
- Fixed typo

Changes in v4:
- Don't truncate the commit subject for "Fixes"
- Fixed invalid sizeof() used with snprintf
- Added "net-next" prefix to the title of the commit

Changes in v3:
- Removed patch 2/2: "net: phy: Don't use drv when it is NULL in 
phy_attached_print",
  fixed upstream by fcd03e362b1c
  ("net: phy: Deal with unbound PHY driver in phy_attached_print()")
- Moved this code to phy_attached_print(), so we have more informations
  about the IRQ (poll, ignore or normal irq) and no information are loss.
- Re-worded commit message

 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 16 
 drivers/net/phy/phy_device.c  | 21 ++---
 2 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
index 72ec711fcba2..f5f37bfa1d58 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
@@ -248,9 +248,6 @@ int stmmac_mdio_register(struct net_device *ndev)
found = 0;
for (addr = 0; addr < PHY_MAX_ADDR; addr++) {
struct phy_device *phydev = mdiobus_get_phy(new_bus, addr);
-   int act = 0;
-   char irq_num[4];
-   char *irq_str;
 
if (!phydev)
continue;
@@ -273,19 +270,6 @@ int stmmac_mdio_register(struct net_device *ndev)
if (priv->plat->phy_addr == -1)
priv->plat->phy_addr = addr;
 
-   act = (priv->plat->phy_addr == addr);
-   switch (phydev->irq) {
-   case PHY_POLL:
-   irq_str = "POLL";
-   break;
-   case PHY_IGNORE_INTERRUPT:
-   irq_str = "IGNORE";
-   break;
-   default:
-   sprintf(irq_num, "%d", phydev->irq);
-   irq_str = irq_num;
-   break;
-   }
phy_attached_info(phydev);
found = 1;
}
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 2f742ae5b92e..4f67cafda2ef 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -861,21 +861,36 @@ void phy_attached_info(struct phy_device *phydev)
 }
 EXPORT_SYMBOL(phy_attached_info);
 
-#define ATTACHED_FMT "attached PHY driver [%s] (mii_bus:phy_addr=%s, irq=%d)"
+#define ATTACHED_FMT "attached PHY driver [%s] (mii_bus:phy_addr=%s, irq=%s)"
 void phy_attached_print(struct phy_device *phydev, const char *fmt, ...)
 {
const char *drv_name = phydev->drv ? phydev->drv->name : "unbound";
+   char *irq_str;
+   char irq_num[4];
+
+   switch(phydev->irq) {
+   case PHY_POLL:
+   irq_str = "POLL";
+   break;
+   case PHY_IGNORE_INTERRUPT:
+   irq_str = "IGNORE";
+   break;
+   default:
+   snprintf(irq_num, sizeof(irq_num), "%d", phydev->irq);
+   irq_str = irq_num;
+   break;
+   }
 
if (!fmt) {
dev_info(&phydev->mdio.dev, ATTACHED_FMT "\n",
 drv_name, phydev_name(phydev),
-phydev->irq);
+irq_str);
} else {
va_list ap;
 
dev_info(&phydev->mdio.dev, ATTACHED_FMT,
 drv_name, phydev_name(phydev),
-phydev->irq);
+irq_str);
 
va_start(ap, fmt);
vprintk(fmt, ap);
-- 
2.11.0



RE: [PATCH v3 0/7] Add RSS to DPAA 1.x Ethernet driver

2017-08-24 Thread Madalin-cristian Bucur
> -Original Message-
> From: David Miller [mailto:da...@davemloft.net]
> Subject: Re: [PATCH v3 0/7] Add RSS to DPAA 1.x Ethernet driver
> 
> From: David Miller 
> Date: Thu, 24 Aug 2017 09:42:20 -0700 (PDT)
> 
> > From: Madalin Bucur 
> > Date: Thu, 24 Aug 2017 10:28:21 +0300
> >
> >> This patch set introduces Receive Side Scaling for the DPAA Ethernet
> >> driver. Documentation is updated with details related to the new
> >> feature and limitations that apply.
> >> Added also a small fix.
> >>
> >> v2: removed a C++ style comment
> >> v3: move struct fman to header file to avoid exporting a function
> >
> > Series applied, thanks.
> 
> Actually I'm reverting, this doesn't even compile.

Hi,

Sorry for this blunder, I've only tested on PPC, where it works.
Will come back with a proper patch set.

Madalin


[PATCH] net: sxgbe: check memory allocation failure

2017-08-24 Thread Christophe JAILLET
Check memory allocation failure and return -ENOMEM in such a case, as
already done few lines below for another memory allocation.

Signed-off-by: Christophe JAILLET 
---
 drivers/net/ethernet/samsung/sxgbe/sxgbe_platform.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/samsung/sxgbe/sxgbe_platform.c 
b/drivers/net/ethernet/samsung/sxgbe/sxgbe_platform.c
index 73427e29df2a..fbd00cb0cb7d 100644
--- a/drivers/net/ethernet/samsung/sxgbe/sxgbe_platform.c
+++ b/drivers/net/ethernet/samsung/sxgbe/sxgbe_platform.c
@@ -47,6 +47,8 @@ static int sxgbe_probe_config_dt(struct platform_device *pdev,
plat->mdio_bus_data = devm_kzalloc(&pdev->dev,
   sizeof(*plat->mdio_bus_data),
   GFP_KERNEL);
+   if (!plat->mdio_bus_data)
+   return -ENOMEM;
 
dma_cfg = devm_kzalloc(&pdev->dev, sizeof(*dma_cfg), GFP_KERNEL);
if (!dma_cfg)
-- 
2.11.0



Re: nf_nat_pptp 4.12.3 kernel lockup/reboot

2017-08-24 Thread Florian Westphal
Denys Fedoryshchenko  wrote:
> >>> I am trying to upgrade kernel 4.11.8 to 4.12.3 (it is a nat/router, 
> >>> handling
> >>> approx 2gbps of pppoe users traffic) and noticed that after while server
> >>> rebooting(i have set reboot on panic and etc).
> >>> I can't run serial console, and in pstore / netconsole there is nothing.
> >>> Best i got is some very short message about softlockup in ipmi, but as
> >>> storage very limited there - it is near useless.
> >>>
> >>> By preliminary testing (can't do it much, as it's production) - it seems
> >>> following lines causing issue, they worked in 4.11.8 and no more in 
> >>> 4.12.3.
> >>
> >>Wild guess here, does this help?
> >>
> >>diff --git a/net/netfilter/nf_conntrack_helper.c
> >>b/net/netfilter/nf_conntrack_helper.c
> >>--- a/net/netfilter/nf_conntrack_helper.c
> >>+++ b/net/netfilter/nf_conntrack_helper.c
> >>@@ -266,6 +266,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct,
> >>struct nf_conn *tmpl,
> >>help = nf_ct_helper_ext_add(ct, helper, flags);
> >>if (help == NULL)
> >>return -ENOMEM;
> >>+   if (!nf_ct_ext_add(ct, NF_CT_EXT_NAT, flags));
> >
> >sigh, stupid typo, should be no ';' at the end above.
> Sorry, is there any plans to push this to 4.12 stable queue?

No, sorry, this patch adds the extension for all connections
that use a helper, but the nat extension is only used/required by pptp
helper (and masquerade).

Thing is that this patch should not be needed, I will have
to review pptp again, maybe i missed a case where the extension is not
added.

Do you happen to have an oops backtrace?

That might speed this up a bit.


Re: [PATCH net] net: xfrm: don't double-hold dst when sk_policy in use.

2017-08-24 Thread Steffen Klassert
On Wed, Aug 23, 2017 at 05:14:39PM +0900, Lorenzo Colitti wrote:
> While removing dst_entry garbage collection, commit 52df157f17e5
> ("xfrm: take refcnt of dst when creating struct xfrm_dst bundle")
> changed xfrm_resolve_and_create_bundle so it returns an xdst with
> a refcount of 1 instead of 0.
> 
> However, it did not delete the dst_hold performed by xfrm_lookup
> when a per-socket policy is in use. This means that when a
> socket policy is in use, dst entries returned by xfrm_lookup have
> a refcount of 2, and are not freed when no longer in use.
> 
> Cc: Wei Wang 
> Fixes: 52df157f17 ("xfrm: take refcnt of dst when creating struct xfrm_dst 
> bundle")
> Tested: https://android-review.googlesource.com/417481
> Tested: https://android-review.googlesource.com/418659
> Tested: https://android-review.googlesource.com/424463
> Tested: https://android-review.googlesource.com/452776 passes on net-next
> Signed-off-by: Lorenzo Colitti 

Patch applied, thanks a lot!


Re: [PATCH] netvsc: fix deadlock betwen link status and removal

2017-08-24 Thread David Miller
From: Stephen Hemminger 
Date: Thu, 24 Aug 2017 16:49:16 -0700

> There is a deadlock possible when canceling the link status
> delayed work queue. The removal process is run with RTNL held,
> and the link status callback is acquring RTNL.
> 
> Resolve the issue by using trylock and rescheduling.
> If cancel is in process, that block it from happening.
> 
> Fixes: 122a5f6410f4 ("staging: hv: use delayed_work for netvsc_send_garp()")
> Signed-off-by: Stephen Hemminger 

Applied and queued up for -stable, thanks.


Re: [PATCH net-next] hv_netvsc: Fix rndis_filter_close error during netvsc_remove

2017-08-24 Thread David Miller
From: Haiyang Zhang 
Date: Thu, 24 Aug 2017 11:50:02 -0700

> From: Haiyang Zhang 
> 
> We now remove rndis filter before unregister_netdev(), which calls
> device close. It involves closing rndis filter already removed.
> 
> This patch fixes this error.
> 
> Signed-off-by: Haiyang Zhang 

Applied.


Re: [PATCH] strparser: initialize all callbacks

2017-08-24 Thread David Miller
From: Eric Biggers 
Date: Thu, 24 Aug 2017 14:38:51 -0700

> From: Eric Biggers 
> 
> commit bbb03029a899 ("strparser: Generalize strparser") added more
> function pointers to 'struct strp_callbacks'; however, kcm_attach() was
> not updated to initialize them.  This could cause the ->lock() and/or
> ->unlock() function pointers to be set to garbage values, causing a
> crash in strp_work().
> 
> Fix the bug by moving the callback structs into static memory, so
> unspecified members are zeroed.  Also constify them while we're at it.
> 
> This bug was found by syzkaller, which encountered the following splat:
 ...
> Fixes: bbb03029a899 ("strparser: Generalize strparser")
> Signed-off-by: Eric Biggers 

This commit is only in net-next, so that is where I am applying
this patch.

Thanks.


Re: [PATCH net v1 0/3] tipc: buffer reassignment fixes

2017-08-24 Thread David Miller
From: Parthasarathy Bhuvaragan 
Date: Thu, 24 Aug 2017 16:31:21 +0200

> This series contains fixes for buffer reassignments and a context
> imbalance.

Series applied, thanks.


Re: [pull request][net-next 0/8] Mellanox, mlx5updates 2017-08-24

2017-08-24 Thread David Miller
From: Saeed Mahameed 
Date: Thu, 24 Aug 2017 16:21:44 +0300

> The following changes provide updates to mlx5 core driver.
> 
> For more details please see tag log message below.
> Please pull and let me know if there's any problem.

Pulled, thanks Saeed.


Re: [PATCH net-next] hinic: uninitialized variable in hinic_api_cmd_init()

2017-08-24 Thread David Miller
From: Dan Carpenter 
Date: Thu, 24 Aug 2017 13:47:39 +0300

> We never set the error code in this function.
> 
> Fixes: eabf0fad81d5 ("net-next/hinic: Initialize api cmd resources")
> Signed-off-by: Dan Carpenter 

Applied, thanks.


Re: [PATCH net-next 0/4] net: mvpp2: fix the mac address retrieval logic

2017-08-24 Thread David Miller
From: Antoine Tenart 
Date: Thu, 24 Aug 2017 11:46:54 +0200

> The MAC address retrieval logic was broken and when using the PPv2
> driver on PPv2.2 engines I ended up using the same mac address on all
> ports. This series of patches fixes this, and also tackle a possible bug
> when defining the mac address in the device tree.
> 
> To fix this in a nice way I ended up using a dedicated function to
> handle the mac retrieval logic. This can be hard to backport into stable
> kernels. This is why I also made a quick fix which is easy to backport
> (patch 1/14), to tackle down the PPv2.2 mac retrieval bug. Let me know
> if this approach is the proper way to handle this or if I should do
> something else.

This patch series doesn't apply to any of my trees, that is the first
thing.

Secondly, this is a bug fix, and the bug exists in the 'net' tree.
Therefore this patch series should target the 'net' tree.

Please always target legitimate bug fixes at the 'net' tree, rather
than 'net-next'.

Thank you.




[PATCH net-next 2/3 v9] net: arp: Add support for raw IP device

2017-08-24 Thread Subash Abhinov Kasiviswanathan
Define the raw IP type. This is needed for raw IP net devices
like rmnet.

Signed-off-by: Subash Abhinov Kasiviswanathan 
---
 include/uapi/linux/if_arp.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/if_arp.h b/include/uapi/linux/if_arp.h
index cf73510..a2a6356 100644
--- a/include/uapi/linux/if_arp.h
+++ b/include/uapi/linux/if_arp.h
@@ -59,6 +59,7 @@
 #define ARPHRD_LAPB516 /* LAPB */
 #define ARPHRD_DDCMP517/* Digital's DDCMP protocol */
 #define ARPHRD_RAWHDLC 518 /* Raw HDLC */
+#define ARPHRD_RAWIP519/* Raw IP   */
 
 #define ARPHRD_TUNNEL  768 /* IPIP tunnel  */
 #define ARPHRD_TUNNEL6 769 /* IP6IP6 tunnel*/
-- 
1.9.1



[PATCH net-next 3/3 v9] drivers: net: ethernet: qualcomm: rmnet: Initial implementation

2017-08-24 Thread Subash Abhinov Kasiviswanathan
RmNet driver provides a transport agnostic MAP (multiplexing and
aggregation protocol) support in embedded module. Module provides
virtual network devices which can be attached to any IP-mode
physical device. This will be used to provide all MAP functionality
on future hardware in a single consistent location.

Signed-off-by: Subash Abhinov Kasiviswanathan 
---
 Documentation/networking/rmnet.txt |  82 +
 drivers/net/ethernet/qualcomm/Kconfig  |   2 +
 drivers/net/ethernet/qualcomm/Makefile |   2 +
 drivers/net/ethernet/qualcomm/rmnet/Kconfig|  12 +
 drivers/net/ethernet/qualcomm/rmnet/Makefile   |  10 +
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c | 401 +
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h |  54 +++
 .../net/ethernet/qualcomm/rmnet/rmnet_handlers.c   | 271 ++
 .../net/ethernet/qualcomm/rmnet/rmnet_handlers.h   |  26 ++
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h|  88 +
 .../ethernet/qualcomm/rmnet/rmnet_map_command.c| 107 ++
 .../net/ethernet/qualcomm/rmnet/rmnet_map_data.c   | 105 ++
 .../net/ethernet/qualcomm/rmnet/rmnet_private.h|  45 +++
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c| 197 ++
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h|  30 ++
 15 files changed, 1432 insertions(+)
 create mode 100644 Documentation/networking/rmnet.txt
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/Kconfig
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/Makefile
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.h
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h

diff --git a/Documentation/networking/rmnet.txt 
b/Documentation/networking/rmnet.txt
new file mode 100644
index 000..6b341ea
--- /dev/null
+++ b/Documentation/networking/rmnet.txt
@@ -0,0 +1,82 @@
+1. Introduction
+
+rmnet driver is used for supporting the Multiplexing and aggregation
+Protocol (MAP). This protocol is used by all recent chipsets using Qualcomm
+Technologies, Inc. modems.
+
+This driver can be used to register onto any physical network device in
+IP mode. Physical transports include USB, HSIC, PCIe and IP accelerator.
+
+Multiplexing allows for creation of logical netdevices (rmnet devices) to
+handle multiple private data networks (PDN) like a default internet, tethering,
+multimedia messaging service (MMS) or IP media subsystem (IMS). Hardware sends
+packets with MAP headers to rmnet. Based on the multiplexer id, rmnet
+routes to the appropriate PDN after removing the MAP header.
+
+Aggregation is required to achieve high data rates. This involves hardware
+sending aggregated bunch of MAP frames. rmnet driver will de-aggregate
+these MAP frames and send them to appropriate PDN's.
+
+2. Packet format
+
+a. MAP packet (data / control)
+
+MAP header has the same endianness of the IP packet.
+
+Packet format -
+
+Bit 0 1   2-7  8 - 15   16 - 31
+Function   Command / Data   Reserved Pad   Multiplexer IDPayload length
+Bit32 - x
+Function Raw  Bytes
+
+Command (1)/ Data (0) bit value is to indicate if the packet is a MAP command
+or data packet. Control packet is used for transport level flow control. Data
+packets are standard IP packets.
+
+Reserved bits are usually zeroed out and to be ignored by receiver.
+
+Padding is number of bytes to be added for 4 byte alignment if required by
+hardware.
+
+Multiplexer ID is to indicate the PDN on which data has to be sent.
+
+Payload length includes the padding length but does not include MAP header
+length.
+
+b. MAP packet (command specific)
+
+Bit 0 1   2-7  8 - 15   16 - 31
+Function   Command Reserved Pad   Multiplexer IDPayload length
+Bit  32 - 3940 - 4546 - 47   48 - 63
+Function   Command nameReserved   Command Type   Reserved
+Bit  64 - 95
+Function   Transaction ID
+Bit  96 - 127
+Function   Command data
+
+Command 1 indicates disabling flow while 2 is enabling flow
+
+Command types -
+0 for MAP command request
+1 is to acknowledge the receipt of a command
+2 is for unsupported commands
+3 is for error during processing of commands
+
+c. Aggregation
+
+Aggregation is multiple MAP packets (can be data or command) delivered to
+rmnet in a single l

[PATCH net-next 1/3 v9] net: ether: Add support for multiplexing and aggregation type

2017-08-24 Thread Subash Abhinov Kasiviswanathan
Define the multiplexing and aggregation (MAP) ether type 0x00F9. This
is needed for receiving data in the MAP protocol like RMNET. This is
not an officially registered ID.

Signed-off-by: Subash Abhinov Kasiviswanathan 
---
 include/uapi/linux/if_ether.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/if_ether.h b/include/uapi/linux/if_ether.h
index 5bc9bfd..0d73ecc 100644
--- a/include/uapi/linux/if_ether.h
+++ b/include/uapi/linux/if_ether.h
@@ -137,6 +137,7 @@
 #define ETH_P_IEEE802154 0x00F6/* IEEE802.15.4 frame   
*/
 #define ETH_P_CAIF 0x00F7  /* ST-Ericsson CAIF protocol*/
 #define ETH_P_XDSA 0x00F8  /* Multiplexed DSA protocol */
+#define ETH_P_MAP  0x00F9  /* Multiplex & aggregation proto*/
 
 /*
  * This is an Ethernet frame header.
-- 
1.9.1



[PATCH net-next 0/3 v9] Add support for rmnet driver

2017-08-24 Thread Subash Abhinov Kasiviswanathan
This patch adds support for the rmnet driver which is required to
support recent chipsets using Qualcomm Technologies, Inc. modems. The data
from hardware follows the multiplexing and aggregation protocol (MAP).

This driver can be used to register onto any physical network device in
IP mode. Physical transports include USB, HSIC, PCIe and IP accelerator.

rmnet driver helps to decode these packets and queue them to network
stack (and encode and transmit it to the physical device).

--
v1: Same as the RFC patch with some minor fixes for issues reported by
kbuild test robot.

v1->v2: Change datatypes and remove config IOCTL as mentioned by David.
Also fix checkpatch issues and remove some unused code.

v2->v3: Move location to drivers/net and rename to rmnet. Change the
userspace - netlink communication from custom netlink to rtnl_link_ops.
Refactor some code. Use a fixed config for ingress and egress.

v3->v4: Move location to drivers/net/ethernet/qualcomm/.
Fix comments from Stephen and Jiri -
Split the ether and arp type changes into seperate patches.
Remove debug and custom logging and switch to standard netdevice log.
Remove module parameters. Refactor and change some code style issues.

v4->v5: Rename some structs and variables. Move the initializer
before the for loop start. Put the arp type in correct sequence.

v5->v6: Fix comments from Dan -
Use the upper link API. As a result, remove all the refcounting logic.
Device refcount is explicitly held on real_dev on rx_handler
registration only. Modifiy the flow control struct. Remove the unused
ethernet mode handling.

v6->v7: Fix comments from David - Add newline to end of Makefile. Remove
inline from .c files. Move the module init/exit to rmnet config. Fix an
error reported by kbuild test robot for an unused file.

v7->v8: Use a smaller value for ETH_P_MAP as mentioned by David. Change
netdev_info to netdev_dbg as mentioned by Andew. Fix comments from
Stephen regarding netdev_priv and sparse related errors of using 0 as NULL

v8->v9: Fix comments from David - Remove the CFLAG rule. Change the way
rmnet devices are freed. Instead of using a workqueue to unregister devices
individually, go through the list and free all devices within the rtnl_lock().

Subash Abhinov Kasiviswanathan (3):
  net: ether: Add support for multiplexing and aggregation type
  net: arp: Add support for raw IP device
  drivers: net: ethernet: qualcomm: rmnet: Initial implementation

 Documentation/networking/rmnet.txt |  82 +
 drivers/net/ethernet/qualcomm/Kconfig  |   2 +
 drivers/net/ethernet/qualcomm/Makefile |   2 +
 drivers/net/ethernet/qualcomm/rmnet/Kconfig|  12 +
 drivers/net/ethernet/qualcomm/rmnet/Makefile   |  10 +
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c | 401 +
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h |  54 +++
 .../net/ethernet/qualcomm/rmnet/rmnet_handlers.c   | 271 ++
 .../net/ethernet/qualcomm/rmnet/rmnet_handlers.h   |  26 ++
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h|  88 +
 .../ethernet/qualcomm/rmnet/rmnet_map_command.c| 107 ++
 .../net/ethernet/qualcomm/rmnet/rmnet_map_data.c   | 105 ++
 .../net/ethernet/qualcomm/rmnet/rmnet_private.h|  45 +++
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c| 197 ++
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h|  30 ++
 include/uapi/linux/if_arp.h|   1 +
 include/uapi/linux/if_ether.h  |   1 +
 17 files changed, 1434 insertions(+)
 create mode 100644 Documentation/networking/rmnet.txt
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/Kconfig
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/Makefile
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.h
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
 create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h

-- 
1.9.1



Re: [PATCH net-next v2] net: mv643xx_eth: Be drop monitor friendly

2017-08-24 Thread Eric Dumazet
On Thu, 2017-08-24 at 20:55 -0700, Florian Fainelli wrote:
> txq_reclaim() does the normal transmit queue reclamation and
> rxq_deinit() does the RX ring cleanup, none of these are packet drops,
> so use dev_consume_skb() for both locations.
> 
> Signed-off-by: Florian Fainelli 
> ---
>  drivers/net/ethernet/marvell/mv643xx_eth.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c 
> b/drivers/net/ethernet/marvell/mv643xx_eth.c
> index fb2d533ae4ef..81c1fac00d33 100644
> --- a/drivers/net/ethernet/marvell/mv643xx_eth.c
> +++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
> @@ -1121,7 +1121,7 @@ static int txq_reclaim(struct tx_queue *txq, int 
> budget, int force)
>   struct sk_buff *skb = __skb_dequeue(&txq->tx_skb);
>  
>   if (!WARN_ON(!skb))
> - dev_kfree_skb(skb);
> + dev_consume_skb_any(skb);
>   }
>  
>   if (cmd_sts & ERROR_SUMMARY) {
> @@ -2024,7 +2024,7 @@ static void rxq_deinit(struct rx_queue *rxq)
>  
>   for (i = 0; i < rxq->rx_ring_size; i++) {
>   if (rxq->rx_skb[i]) {
> - dev_kfree_skb(rxq->rx_skb[i]);
> + dev_consume_skb_any(rxq->rx_skb[i]);
>   rxq->rx_desc_count--;
>   }
>   }


I do not believe this patch is needed.

dev_kfree_skb() is an alias of consume_skb(), which is already drop
monitor ready ;)





Re: [PATCH net] net_sched: fix a refcount_t issue with noop_qdisc

2017-08-24 Thread David Miller
From: Eric Dumazet 
Date: Thu, 24 Aug 2017 21:12:28 -0700

> From: Eric Dumazet 
> 
> syzkaller reported a refcount_t warning [1]
> 
> Issue here is that noop_qdisc refcnt was never really considered as
> a true refcount, since qdisc_destroy() found TCQ_F_BUILTIN set :
> 
> if (qdisc->flags & TCQ_F_BUILTIN ||
> !refcount_dec_and_test(&qdisc->refcnt)))
>   return;
> 
> Meaning that all atomic_inc() we did on noop_qdisc.refcnt were not
> really needed, but harmless until refcount_t came.
> 
> To fix this problem, we simply need to not increment noop_qdisc.refcnt,
> since we never decrement it.
> 
> [1]
> refcount_t: increment on 0; use-after-free.
 ...
> Fixes: 7b9364050246 ("net, sched: convert Qdisc.refcnt from atomic_t to 
> refcount_t")
> Signed-off-by: Eric Dumazet 
> Reported-by: Dmitry Vyukov 
> Cc: Reshetova, Elena 

Applied.


[PATCH net-next 1/2] nfp: add basic SR-IOV ndo functions

2017-08-24 Thread Jakub Kicinski
From: Pablo Cascón 

Add basic ndo_set/get_vf to support SR-IOV.

VF to egress phy static mapping by now.

Use vfcfg ABI version 2 to write the info to the FW and collect
the return value from the mailbox.

Signed-off-by: Pablo Cascón 
Signed-off-by: Jimmy Kizito 
Signed-off-by: Rami Tomer 
Signed-off-by: Simon Horman 
Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/Makefile|   1 +
 drivers/net/ethernet/netronome/nfp/nfp_main.h  |   4 +
 .../net/ethernet/netronome/nfp/nfp_net_common.c|   6 +
 drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h  |   1 +
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c  |  20 +-
 drivers/net/ethernet/netronome/nfp/nfp_net_sriov.c | 243 +
 drivers/net/ethernet/netronome/nfp/nfp_net_sriov.h |  86 
 drivers/net/ethernet/netronome/nfp/nic/main.c  |  12 +
 8 files changed, 372 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_net_sriov.c
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_net_sriov.h

diff --git a/drivers/net/ethernet/netronome/nfp/Makefile 
b/drivers/net/ethernet/netronome/nfp/Makefile
index b8e1358868bd..96e579a15cbe 100644
--- a/drivers/net/ethernet/netronome/nfp/Makefile
+++ b/drivers/net/ethernet/netronome/nfp/Makefile
@@ -23,6 +23,7 @@ nfp-objs := \
nfp_net_ethtool.o \
nfp_net_main.o \
nfp_net_repr.o \
+   nfp_net_sriov.o \
nfp_netvf_main.o \
nfp_port.o \
bpf/main.o \
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.h 
b/drivers/net/ethernet/netronome/nfp/nfp_main.h
index 6922410806db..be0ee59f2eb9 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.h
@@ -73,6 +73,8 @@ struct nfp_rtsym_table;
  * @mac_stats_mem: Pointer to mapped MAC stats area
  * @vf_cfg_bar:Pointer to the CPP area for the VF 
configuration BAR
  * @vf_cfg_mem:Pointer to mapped VF configuration area
+ * @vfcfg_tbl2_area:   Pointer to the CPP area for the VF config table
+ * @vfcfg_tbl2:Pointer to mapped VF config table
  * @irq_entries:   Array of MSI-X entries for all vNICs
  * @limit_vfs: Number of VFs supported by firmware (~0 for PCI limit)
  * @num_vfs:   Number of SR-IOV VFs enabled
@@ -107,6 +109,8 @@ struct nfp_pf {
u8 __iomem *mac_stats_mem;
struct nfp_cpp_area *vf_cfg_bar;
u8 __iomem *vf_cfg_mem;
+   struct nfp_cpp_area *vfcfg_tbl2_area;
+   u8 __iomem *vfcfg_tbl2;
 
struct msix_entry *irq_entries;
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 1dbe9a3a02d1..2920889fa6d6 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -71,6 +71,7 @@
 #include "nfp_app.h"
 #include "nfp_net_ctrl.h"
 #include "nfp_net.h"
+#include "nfp_net_sriov.h"
 #include "nfp_port.h"
 
 /**
@@ -3421,6 +3422,11 @@ const struct net_device_ops nfp_net_netdev_ops = {
.ndo_get_stats64= nfp_net_stat64,
.ndo_vlan_rx_add_vid= nfp_net_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid   = nfp_net_vlan_rx_kill_vid,
+   .ndo_set_vf_mac = nfp_app_set_vf_mac,
+   .ndo_set_vf_vlan= nfp_app_set_vf_vlan,
+   .ndo_set_vf_spoofchk= nfp_app_set_vf_spoofchk,
+   .ndo_get_vf_config  = nfp_app_get_vf_config,
+   .ndo_set_vf_link_state  = nfp_app_set_vf_link_state,
.ndo_setup_tc   = nfp_port_setup_tc,
.ndo_tx_timeout = nfp_net_tx_timeout,
.ndo_set_rx_mode= nfp_net_set_rx_mode,
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h 
b/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h
index e5e94e0746ec..b0a452ba9039 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h
@@ -164,6 +164,7 @@
 #define   NFP_NET_CFG_UPDATE_BPF (0x1 << 10) /* BPF program load */
 #define   NFP_NET_CFG_UPDATE_MACADDR (0x1 << 11) /* MAC address change */
 #define   NFP_NET_CFG_UPDATE_MBOX(0x1 << 12) /* Mailbox update */
+#define   NFP_NET_CFG_UPDATE_VF  (0x1 << 13) /* VF settings 
change */
 #define   NFP_NET_CFG_UPDATE_ERR  (0x1 << 31) /* A error occurred */
 #define NFP_NET_CFG_TXRS_ENABLE 0x0008
 #define NFP_NET_CFG_RXRS_ENABLE 0x0010
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
index 4c3be635f475..5a21e61662d6 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
@@ -57,6 +57,7 @@
 #include "nfpcore/nfp6000_pcie.h"
 #include "nfp_app.h"
 #include "nfp_net_ctrl.h"
+#include "nfp_net_sriov.h"
 #include "nfp_net.h"
 #include "nfp_main.h"
 #include "nfp_port.

[PATCH net-next 0/2] nfp: SR-IOV ndos support

2017-08-24 Thread Jakub Kicinski
Hi!

This set adds basic SR-IOV including setting/getting VF MAC addresses,
VLANs, link state and spoofcheck settings.  It is wired up for both
vNICs and representors (note: ip link will not report VF settings on
VF/PF representors because they are not linked to the PF PCI device).

Pablo and team add the basic implementation, Simon and Dirk follow
up with the representor plumbing.

Pablo Cascón (1):
  nfp: add basic SR-IOV ndo functions

Simon Horman (1):
  nfp: add basic SR-IOV ndo functions to representors

 drivers/net/ethernet/netronome/nfp/Makefile|   1 +
 drivers/net/ethernet/netronome/nfp/nfp_main.h  |   4 +
 .../net/ethernet/netronome/nfp/nfp_net_common.c|   6 +
 drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h  |   1 +
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c  |  20 +-
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.c  |   6 +
 drivers/net/ethernet/netronome/nfp/nfp_net_sriov.c | 243 +
 drivers/net/ethernet/netronome/nfp/nfp_net_sriov.h |  86 
 drivers/net/ethernet/netronome/nfp/nic/main.c  |  12 +
 9 files changed, 378 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_net_sriov.c
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_net_sriov.h

-- 
2.11.0



[PATCH net-next 2/2] nfp: add basic SR-IOV ndo functions to representors

2017-08-24 Thread Jakub Kicinski
From: Simon Horman 

Add basic ndo_set/get_vf to support SR-IOV on all types
of port representors.

Signed-off-by: Simon Horman 
Signed-off-by: Dirk van der Merwe 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
index 0f9878d1bf40..d540a9dc77b3 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
@@ -43,6 +43,7 @@
 #include "nfp_main.h"
 #include "nfp_net_ctrl.h"
 #include "nfp_net_repr.h"
+#include "nfp_net_sriov.h"
 #include "nfp_port.h"
 
 static void
@@ -247,6 +248,11 @@ const struct net_device_ops nfp_repr_netdev_ops = {
.ndo_get_offload_stats  = nfp_repr_get_offload_stats,
.ndo_get_phys_port_name = nfp_port_get_phys_port_name,
.ndo_setup_tc   = nfp_port_setup_tc,
+   .ndo_set_vf_mac = nfp_app_set_vf_mac,
+   .ndo_set_vf_vlan= nfp_app_set_vf_vlan,
+   .ndo_set_vf_spoofchk= nfp_app_set_vf_spoofchk,
+   .ndo_get_vf_config  = nfp_app_get_vf_config,
+   .ndo_set_vf_link_state  = nfp_app_set_vf_link_state,
 };
 
 static void nfp_repr_clean(struct nfp_repr *repr)
-- 
2.11.0



Re: [PATCH net-next v2] e1000e: Be drop monitor friendly

2017-08-24 Thread David Miller
From: Florian Fainelli 
Date: Thu, 24 Aug 2017 20:58:24 -0700

> e1000_put_txbuf() cleans up the successfully transmitted TX packets,
> e1000e_tx_hwtstamp_work() also does the successfully completes the
> timestamped TX packets, e1000_clean_rx_ring() cleans up the RX ring and
> e1000_remove() cleans up the timestampted packets. None of these
> functions should be reporting dropped packets, so make them use
> dev_consume_skb_any() to be drop monitor friendly.
> 
> Signed-off-by: Florian Fainelli 

I'll let the Intel folks pick this up.


Re: [PATCH net-next v2] net: mv643xx_eth: Be drop monitor friendly

2017-08-24 Thread David Miller
From: Florian Fainelli 
Date: Thu, 24 Aug 2017 20:55:40 -0700

> txq_reclaim() does the normal transmit queue reclamation and
> rxq_deinit() does the RX ring cleanup, none of these are packet drops,
> so use dev_consume_skb() for both locations.
> 
> Signed-off-by: Florian Fainelli 

Applied.


[PATCH net] net_sched: fix a refcount_t issue with noop_qdisc

2017-08-24 Thread Eric Dumazet
From: Eric Dumazet 

syzkaller reported a refcount_t warning [1]

Issue here is that noop_qdisc refcnt was never really considered as
a true refcount, since qdisc_destroy() found TCQ_F_BUILTIN set :

if (qdisc->flags & TCQ_F_BUILTIN ||
!refcount_dec_and_test(&qdisc->refcnt)))
return;

Meaning that all atomic_inc() we did on noop_qdisc.refcnt were not
really needed, but harmless until refcount_t came.

To fix this problem, we simply need to not increment noop_qdisc.refcnt,
since we never decrement it.

[1]
refcount_t: increment on 0; use-after-free.
[ cut here ]
WARNING: CPU: 0 PID: 21754 at lib/refcount.c:152 refcount_inc+0x47/0x50 
lib/refcount.c:152
Kernel panic - not syncing: panic_on_warn set ...

CPU: 0 PID: 21754 Comm: syz-executor7 Not tainted 4.13.0-rc6+ #20
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:52
 panic+0x1e4/0x417 kernel/panic.c:180
 __warn+0x1c4/0x1d9 kernel/panic.c:541
 report_bug+0x211/0x2d0 lib/bug.c:183
 fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
 do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
 do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
 do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
 invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:846
RIP: 0010:refcount_inc+0x47/0x50 lib/refcount.c:152
RSP: 0018:8801c43477a0 EFLAGS: 00010282
RAX: 002b RBX: 86093c14 RCX: 
RDX: 002b RSI: 8159314e RDI: ed0038868ee8
RBP: 8801c43477a8 R08: 0001 R09: 
R10:  R11:  R12: 86093ac0
R13: 0001 R14: 8801d0f3bac0 R15: dc00
 attach_default_qdiscs net/sched/sch_generic.c:792 [inline]
 dev_activate+0x7d3/0xaa0 net/sched/sch_generic.c:833
 __dev_open+0x227/0x330 net/core/dev.c:1380
 __dev_change_flags+0x695/0x990 net/core/dev.c:6726
 dev_change_flags+0x88/0x140 net/core/dev.c:6792
 dev_ifsioc+0x5a6/0x930 net/core/dev_ioctl.c:256
 dev_ioctl+0x2bc/0xf90 net/core/dev_ioctl.c:554
 sock_do_ioctl+0x94/0xb0 net/socket.c:968
 sock_ioctl+0x2c2/0x440 net/socket.c:1058
 vfs_ioctl fs/ioctl.c:45 [inline]
 do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
 SYSC_ioctl fs/ioctl.c:700 [inline]
 SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691

Fixes: 7b9364050246 ("net, sched: convert Qdisc.refcnt from atomic_t to 
refcount_t")
Signed-off-by: Eric Dumazet 
Reported-by: Dmitry Vyukov 
Cc: Reshetova, Elena 
---
 include/net/sch_generic.h |7 +++
 net/sched/sch_api.c   |6 +++---
 net/sched/sch_generic.c   |2 +-
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 
67f815e5d52517390226bc3531b1ea7b5f1020bc..c1109cdbbfa6afb9aff0d6033aef7b615630ffc1
 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -101,6 +101,13 @@ struct Qdisc {
spinlock_t  busylock cacheline_aligned_in_smp;
 };
 
+static inline void qdisc_refcount_inc(struct Qdisc *qdisc)
+{
+   if (qdisc->flags & TCQ_F_BUILTIN)
+   return;
+   refcount_inc(&qdisc->refcnt);
+}
+
 static inline bool qdisc_is_running(const struct Qdisc *qdisc)
 {
return (raw_read_seqcount(&qdisc->running) & 1) ? true : false;
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 
a3fa144b864871088209386fd573bded1886432f..4fb5a3222d0d324167f079f755be14eb028b4a50
 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -836,7 +836,7 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc 
*parent,
 
old = dev_graft_qdisc(dev_queue, new);
if (new && i > 0)
-   refcount_inc(&new->refcnt);
+   qdisc_refcount_inc(new);
 
if (!ingress)
qdisc_destroy(old);
@@ -847,7 +847,7 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc 
*parent,
notify_and_destroy(net, skb, n, classid,
   dev->qdisc, new);
if (new && !new->ops->attach)
-   refcount_inc(&new->refcnt);
+   qdisc_refcount_inc(new);
dev->qdisc = new ? : &noop_qdisc;
 
if (new && new->ops->attach)
@@ -1256,7 +1256,7 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct 
nlmsghdr *n,
if (q == p ||
(p && check_loop(q, p, 0)))
return -ELOOP;
-   refcount_inc(&q->refcnt);
+   qdisc_refcount_inc(q);
goto graft;
} else {
 

Re: [PATCH v3 3/4] net: stmmac: register parent MDIO node for sun8i-h3-emac

2017-08-24 Thread Florian Fainelli


On 08/24/2017 08:41 PM, Chen-Yu Tsai wrote:
> On Fri, Aug 25, 2017 at 11:05 AM, Florian Fainelli  
> wrote:
>>
>>
>> On 08/24/2017 07:54 PM, Chen-Yu Tsai wrote:
>>> On Fri, Aug 25, 2017 at 3:59 AM, Florian Fainelli  
>>> wrote:
 On 08/24/2017 01:21 AM, Corentin Labbe wrote:
> On Wed, Aug 23, 2017 at 09:31:53AM -0700, Florian Fainelli wrote:
>> On 08/23/2017 12:49 AM, Maxime Ripard wrote:
>>> Hi Florian,
>>>
>>> On Tue, Aug 22, 2017 at 11:35:01AM -0700, Florian Fainelli wrote:
>>> So I think what you are saying is either impossible or 
>>> engineering-wise
>>> a very stupid design, like using an external MAC with a discrete PHY
>>> connected to the internal MAC's MDIO bus, while using the internal 
>>> MAC
>>> with the internal PHY.
>>>
>>> Now can we please decide on something? We're a week and a half from
>>> the 4.13 release. If mdio-mux is wrong, then we could have two mdio
>>> nodes (internal-mdio & external-mdio).
>>
>> I really don't see a need for a mdio-mux in the first place, just 
>> have
>> one MDIO controller (current state) sub-node which describes the
>> built-in STMMAC MDIO controller and declare the internal PHY as a 
>> child
>> node (along with 'phy-is-integrated'). If a different configuration 
>> is
>> used, then just put the external PHY as a child node there.
>>
>> If fixed-link is required, the mdio node becomes unused anyway.
>>
>> Works for everyone?
>
> If we put an external PHY with reg=1 as a child of internal MDIO,
> il will be merged with internal PHY node and get
> phy-is-integrated.

 Then have the .dtsi file contain just the mdio node, but no internal or
 external PHY and push all the internal and external PHY node definition
 (in its entirety) to the per-board DTS file, does not that work?
>>>
>>> If possible, I'd really like to have the internal PHY in the
>>> DTSI. It's always there in hardware anyway, and duplicating the PHY,
>>> with its clock, reset line, and whatever info we might need in the
>>> future in each and every board DTS that uses it will be very error
>>> prone and we will have the usual bunch of issues that come up with
>>> duplication.
>>
>> OK, then what if you put the internal PHY in the DTSI, mark it with a
>> status = "disabled" property, and have the per-board DTS put a status =
>> "okay" property along with a "phy-is-integrated" boolean property? Would
>> that work?
>
> No, I tested and for example with sun8i-h3-orangepi-plus.dts, the 
> external PHY (ethernet-phy@1) is still merged.

 Is not there is a mistake in the unit address not matching the "reg"
 property, or am I not looking at the right tree?

 &mdio {
 ext_rgmii_phy: ethernet-phy@1 {
 compatible = "ethernet-phy-ieee802.3-c22";
 reg = <0>;
 };
 };

 If the PHY is really at MDIO address 0, then it should be
 ethernet-phy@0, and not ethernet-phy@1, and then no problem with the
 merging?
>>>
>>> That is wrong. The board described in the example likely has a Realtek
>>> RTL8211E @ address 0x1. Address 0 for this PHY is a broadcast address,
>>> so it still works, but is the wrong representation.
>>>


> So that adding a 'status = "disabled"' does not bring anything.
>
>>
>> What I really don't think is necessary is:
>>
>> - duplicating the "mdio" controller node for external vs. internal PHY,
>> because this is not accurate, there is just one MDIO controller, but
>> there may be different kinds of MDIO/PHY devices attached
>
> For me, if we want to represent the reality, we need two MDIO:
> - since two PHY at the same address could co-exists
> - since they are isolated so not on the same MDIO bus

 Is that really true? It might be, but from experience with e.g:
 bcmgenet, the integrated PHY and the external PHYs are on the same MDIO
 bus, which is convenient, except when you have an address conflict.
>>>
>>> There's a mux in the hardware: either the internal MDIO+MII lines
>>> from the internal PHY are connected to the MAC, or the external
>>> MDIO+MII lines from the pin controller are connected. I believe
>>> this was already mentioned?
>>
>> There is obviously a mux for the data lines and clock to switch between
>> internal PHY and external PHYs, that does not mean there is one for MDIO
>> and MDC lines, which is what is being suggested to be used here, does
>> the mux also takes care of these lines?
>>
>>>

>
>> - having the STMMAC driver MDIO probing code having to deal with a
>> "mdio" sub-node or an "internal-mdio" sub-node because this is confusing
>> and requiring m

[PATCH net-next v2] e1000e: Be drop monitor friendly

2017-08-24 Thread Florian Fainelli
e1000_put_txbuf() cleans up the successfully transmitted TX packets,
e1000e_tx_hwtstamp_work() also does the successfully completes the
timestamped TX packets, e1000_clean_rx_ring() cleans up the RX ring and
e1000_remove() cleans up the timestampted packets. None of these
functions should be reporting dropped packets, so make them use
dev_consume_skb_any() to be drop monitor friendly.

Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 327dfe5bedc0..a90e459c5b8a 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1085,7 +1085,7 @@ static void e1000_put_txbuf(struct e1000_ring *tx_ring,
buffer_info->dma = 0;
}
if (buffer_info->skb) {
-   dev_kfree_skb_any(buffer_info->skb);
+   dev_consume_skb_any(buffer_info->skb);
buffer_info->skb = NULL;
}
buffer_info->time_stamp = 0;
@@ -1199,7 +1199,7 @@ static void e1000e_tx_hwtstamp_work(struct work_struct 
*work)
wmb(); /* force write prior to skb_tstamp_tx */
 
skb_tstamp_tx(skb, &shhwtstamps);
-   dev_kfree_skb_any(skb);
+   dev_consume_skb_any(skb);
} else if (time_after(jiffies, adapter->tx_hwtstamp_start
  + adapter->tx_timeout_factor * HZ)) {
dev_kfree_skb_any(adapter->tx_hwtstamp_skb);
@@ -1716,7 +1716,7 @@ static void e1000_clean_rx_ring(struct e1000_ring 
*rx_ring)
}
 
if (buffer_info->skb) {
-   dev_kfree_skb(buffer_info->skb);
+   dev_consume_skb_any(buffer_info->skb);
buffer_info->skb = NULL;
}
 
@@ -1734,7 +1734,7 @@ static void e1000_clean_rx_ring(struct e1000_ring 
*rx_ring)
 
/* there also may be some cached data from a chained receive */
if (rx_ring->rx_skb_top) {
-   dev_kfree_skb(rx_ring->rx_skb_top);
+   dev_consume_skb_any(rx_ring->rx_skb_top);
rx_ring->rx_skb_top = NULL;
}
 
@@ -7411,7 +7411,7 @@ static void e1000_remove(struct pci_dev *pdev)
if (adapter->flags & FLAG_HAS_HW_TIMESTAMP) {
cancel_work_sync(&adapter->tx_hwtstamp_work);
if (adapter->tx_hwtstamp_skb) {
-   dev_kfree_skb_any(adapter->tx_hwtstamp_skb);
+   dev_consume_skb_any(adapter->tx_hwtstamp_skb);
adapter->tx_hwtstamp_skb = NULL;
}
}
-- 
2.11.0



[PATCH net-next v2] net: mv643xx_eth: Be drop monitor friendly

2017-08-24 Thread Florian Fainelli
txq_reclaim() does the normal transmit queue reclamation and
rxq_deinit() does the RX ring cleanup, none of these are packet drops,
so use dev_consume_skb() for both locations.

Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/marvell/mv643xx_eth.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c 
b/drivers/net/ethernet/marvell/mv643xx_eth.c
index fb2d533ae4ef..81c1fac00d33 100644
--- a/drivers/net/ethernet/marvell/mv643xx_eth.c
+++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
@@ -1121,7 +1121,7 @@ static int txq_reclaim(struct tx_queue *txq, int budget, 
int force)
struct sk_buff *skb = __skb_dequeue(&txq->tx_skb);
 
if (!WARN_ON(!skb))
-   dev_kfree_skb(skb);
+   dev_consume_skb_any(skb);
}
 
if (cmd_sts & ERROR_SUMMARY) {
@@ -2024,7 +2024,7 @@ static void rxq_deinit(struct rx_queue *rxq)
 
for (i = 0; i < rxq->rx_ring_size; i++) {
if (rxq->rx_skb[i]) {
-   dev_kfree_skb(rxq->rx_skb[i]);
+   dev_consume_skb_any(rxq->rx_skb[i]);
rxq->rx_desc_count--;
}
}
-- 
2.11.0



Re: [PATCH v3 3/4] net: stmmac: register parent MDIO node for sun8i-h3-emac

2017-08-24 Thread Chen-Yu Tsai
On Fri, Aug 25, 2017 at 11:05 AM, Florian Fainelli  wrote:
>
>
> On 08/24/2017 07:54 PM, Chen-Yu Tsai wrote:
>> On Fri, Aug 25, 2017 at 3:59 AM, Florian Fainelli  
>> wrote:
>>> On 08/24/2017 01:21 AM, Corentin Labbe wrote:
 On Wed, Aug 23, 2017 at 09:31:53AM -0700, Florian Fainelli wrote:
> On 08/23/2017 12:49 AM, Maxime Ripard wrote:
>> Hi Florian,
>>
>> On Tue, Aug 22, 2017 at 11:35:01AM -0700, Florian Fainelli wrote:
>> So I think what you are saying is either impossible or 
>> engineering-wise
>> a very stupid design, like using an external MAC with a discrete PHY
>> connected to the internal MAC's MDIO bus, while using the internal 
>> MAC
>> with the internal PHY.
>>
>> Now can we please decide on something? We're a week and a half from
>> the 4.13 release. If mdio-mux is wrong, then we could have two mdio
>> nodes (internal-mdio & external-mdio).
>
> I really don't see a need for a mdio-mux in the first place, just have
> one MDIO controller (current state) sub-node which describes the
> built-in STMMAC MDIO controller and declare the internal PHY as a 
> child
> node (along with 'phy-is-integrated'). If a different configuration is
> used, then just put the external PHY as a child node there.
>
> If fixed-link is required, the mdio node becomes unused anyway.
>
> Works for everyone?

 If we put an external PHY with reg=1 as a child of internal MDIO,
 il will be merged with internal PHY node and get
 phy-is-integrated.
>>>
>>> Then have the .dtsi file contain just the mdio node, but no internal or
>>> external PHY and push all the internal and external PHY node definition
>>> (in its entirety) to the per-board DTS file, does not that work?
>>
>> If possible, I'd really like to have the internal PHY in the
>> DTSI. It's always there in hardware anyway, and duplicating the PHY,
>> with its clock, reset line, and whatever info we might need in the
>> future in each and every board DTS that uses it will be very error
>> prone and we will have the usual bunch of issues that come up with
>> duplication.
>
> OK, then what if you put the internal PHY in the DTSI, mark it with a
> status = "disabled" property, and have the per-board DTS put a status =
> "okay" property along with a "phy-is-integrated" boolean property? Would
> that work?

 No, I tested and for example with sun8i-h3-orangepi-plus.dts, the external 
 PHY (ethernet-phy@1) is still merged.
>>>
>>> Is not there is a mistake in the unit address not matching the "reg"
>>> property, or am I not looking at the right tree?
>>>
>>> &mdio {
>>> ext_rgmii_phy: ethernet-phy@1 {
>>> compatible = "ethernet-phy-ieee802.3-c22";
>>> reg = <0>;
>>> };
>>> };
>>>
>>> If the PHY is really at MDIO address 0, then it should be
>>> ethernet-phy@0, and not ethernet-phy@1, and then no problem with the
>>> merging?
>>
>> That is wrong. The board described in the example likely has a Realtek
>> RTL8211E @ address 0x1. Address 0 for this PHY is a broadcast address,
>> so it still works, but is the wrong representation.
>>
>>>
>>>
 So that adding a 'status = "disabled"' does not bring anything.

>
> What I really don't think is necessary is:
>
> - duplicating the "mdio" controller node for external vs. internal PHY,
> because this is not accurate, there is just one MDIO controller, but
> there may be different kinds of MDIO/PHY devices attached

 For me, if we want to represent the reality, we need two MDIO:
 - since two PHY at the same address could co-exists
 - since they are isolated so not on the same MDIO bus
>>>
>>> Is that really true? It might be, but from experience with e.g:
>>> bcmgenet, the integrated PHY and the external PHYs are on the same MDIO
>>> bus, which is convenient, except when you have an address conflict.
>>
>> There's a mux in the hardware: either the internal MDIO+MII lines
>> from the internal PHY are connected to the MAC, or the external
>> MDIO+MII lines from the pin controller are connected. I believe
>> this was already mentioned?
>
> There is obviously a mux for the data lines and clock to switch between
> internal PHY and external PHYs, that does not mean there is one for MDIO
> and MDC lines, which is what is being suggested to be used here, does
> the mux also takes care of these lines?
>
>>
>>>

> - having the STMMAC driver MDIO probing code having to deal with a
> "mdio" sub-node or an "internal-mdio" sub-node because this is confusing
> and requiring more driver-level changes that are error prone

 My patch for stmmac is really small, only the name of my variable 
 ("need_mdio_mux_ids")
 have to be changed to somethi

Re: XDP redirect measurements, gotchas and tracepoints

2017-08-24 Thread Michael Chan
On Wed, Aug 23, 2017 at 1:29 AM, Jesper Dangaard Brouer
 wrote:
> On Tue, 22 Aug 2017 23:59:05 -0700
> Michael Chan  wrote:
>
>> On Tue, Aug 22, 2017 at 6:06 PM, Alexander Duyck
>>  wrote:
>> > On Tue, Aug 22, 2017 at 1:04 PM, Michael Chan  
>> > wrote:
>> >>
>> >> Right, but it's conceivable to add an API to "return" the buffer to
>> >> the input device, right?
>
> Yes, I would really like to see an API like this.
>
>> >
>> > You could, it is just added complexity. "just free the buffer" in
>> > ixgbe usually just amounts to one atomic operation to decrement the
>> > total page count since page recycling is already implemented in the
>> > driver. You still would have to unmap the buffer regardless of if you
>> > were recycling it or not so all you would save is 1.15259 atomic
>> > operations per packet. The fraction is because once every 64K uses we
>> > have to bulk update the count on the page.
>> >
>>
>> If the buffer is returned to the input device, the input device can
>> keep the DMA mapping.  All it needs to do is to dma_sync it back to
>> the input device when the buffer is returned.
>
> Yes, exactly, return to the input device. I really think we should
> work on a solution where we can keep the DMA mapping around.  We have
> an opportunity here to make ndo_xdp_xmit TX queues use a specialized
> page return call, to achieve this. (I imagine other arch's have a high
> DMA overhead than Intel)
>
> I'm not sure how the API should look.  The ixgbe recycle mechanism and
> splitting the page (into two packets) actually complicates things, and
> tie us into a page-refcnt based model.  We could get around this by
> each driver implementing a page-return-callback, that allow us to
> return the page to the input device?  Then, drivers implementing the
> 1-packet-per-page can simply check/read the page-refcnt, and if it is
> "1" DMA-sync and reuse it in the RX queue.
>

Yeah, based on Alex' description, it's not clear to me whether ixgbe
redirecting to a non-intel NIC or vice versa will actually work.  It
sounds like the output device has to make some assumptions about how
the page was allocated by the input device.  With buffer return API,
each driver can cleanly recycle or free its own buffers properly.

Let me discuss this further with Andy to see if we can come up with a
good scheme.


RE: [PATCH v2 net-next 1/1] hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)

2017-08-24 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Thursday, August 24, 2017 18:20
> > +#define VMBUS_PKT_TRAILER  (sizeof(u64))
>
> This is not the packet trailer, it's the size of the packet trailer.

Thanks! I'll change it to VMBUS_PKT_TRAILER_SIZE.

> > +   /* Have we sent the zero-length packet (FIN)? */
> > +   unsigned long fin_sent;
>
> Why does this need to be atomic?  Why can't a smaller simpler
It doesn't have to be. It was originally made for a quick workaround.
Thanks! I should do it in the right way now.

> mechanism be used to make sure hvs_shutdown() only performs
> hvs_send_data() call once on the channel?
I'll change "fin_sent" to bool, and avoid test_and_set_bit().
I'll add lock_sock/release_sock()  in hvs_shutdown() like this:

static int hvs_shutdown(struct vsock_sock *vsk, int mode)
{
 ...
   lock_sock(sk);

hvs = vsk->trans;
if (hvs->fin_sent)
goto out;

send_buf = (struct hvs_send_buf *)&hdr;
(void)hvs_send_data(hvs->chan, send_buf, 0);

hvs->fin_sent = true;
out:
release_sock(sk);
return 0;
}

> > +static inline bool is_valid_srv_id(const uuid_le *id)
> > +{
> > +   return !memcmp(&id->b[4], &srv_id_template.b[4], sizeof(uuid_le) -
> 4);
> > +}
>
> Do not use the inline function attribute in *.c code.  Let the
> compiler decide.

OK. Will remove all the inline usages.

> > +   *((u32 *)&h->vm_srv_id) = vsk->local_addr.svm_port;
> > +   *((u32 *)&h->host_srv_id) = vsk->remote_addr.svm_port;
>
> There has to be a better way to express this.
I may need to define a uinon here. Let me try it.

 > And if this is partially initializing vm_srv_id, at a minimum
> endianness needs to be taken into account.
I may need to use cpu_to_le32(). Let me check it.



Re: [PATCH net-next] e1000e: Be drop monitor friendly

2017-08-24 Thread Florian Fainelli


On 08/24/2017 06:36 PM, Florian Fainelli wrote:
> e1000_put_txbuf() cleans up the successfully transmitted TX packets,
> e1000e_tx_hwtstamp_work() also does the successfully completes the
> timestamped TX packets, e1000_clean_rx_ring() cleans up the RX ring and
> e1000_remove() cleans up the timestampted packets. None of these
> functions should be reporting dropped packets, so make them use
> dev_consume_skb*() to be drop monitor friendly.

This also won't compile, I marked it a Superseded in patchwork since
there is a v2 coming.

-- 
Florian


Re: [PATCH v3 3/4] net: stmmac: register parent MDIO node for sun8i-h3-emac

2017-08-24 Thread Florian Fainelli


On 08/24/2017 07:54 PM, Chen-Yu Tsai wrote:
> On Fri, Aug 25, 2017 at 3:59 AM, Florian Fainelli  
> wrote:
>> On 08/24/2017 01:21 AM, Corentin Labbe wrote:
>>> On Wed, Aug 23, 2017 at 09:31:53AM -0700, Florian Fainelli wrote:
 On 08/23/2017 12:49 AM, Maxime Ripard wrote:
> Hi Florian,
>
> On Tue, Aug 22, 2017 at 11:35:01AM -0700, Florian Fainelli wrote:
> So I think what you are saying is either impossible or 
> engineering-wise
> a very stupid design, like using an external MAC with a discrete PHY
> connected to the internal MAC's MDIO bus, while using the internal MAC
> with the internal PHY.
>
> Now can we please decide on something? We're a week and a half from
> the 4.13 release. If mdio-mux is wrong, then we could have two mdio
> nodes (internal-mdio & external-mdio).

 I really don't see a need for a mdio-mux in the first place, just have
 one MDIO controller (current state) sub-node which describes the
 built-in STMMAC MDIO controller and declare the internal PHY as a child
 node (along with 'phy-is-integrated'). If a different configuration is
 used, then just put the external PHY as a child node there.

 If fixed-link is required, the mdio node becomes unused anyway.

 Works for everyone?
>>>
>>> If we put an external PHY with reg=1 as a child of internal MDIO,
>>> il will be merged with internal PHY node and get
>>> phy-is-integrated.
>>
>> Then have the .dtsi file contain just the mdio node, but no internal or
>> external PHY and push all the internal and external PHY node definition
>> (in its entirety) to the per-board DTS file, does not that work?
>
> If possible, I'd really like to have the internal PHY in the
> DTSI. It's always there in hardware anyway, and duplicating the PHY,
> with its clock, reset line, and whatever info we might need in the
> future in each and every board DTS that uses it will be very error
> prone and we will have the usual bunch of issues that come up with
> duplication.

 OK, then what if you put the internal PHY in the DTSI, mark it with a
 status = "disabled" property, and have the per-board DTS put a status =
 "okay" property along with a "phy-is-integrated" boolean property? Would
 that work?
>>>
>>> No, I tested and for example with sun8i-h3-orangepi-plus.dts, the external 
>>> PHY (ethernet-phy@1) is still merged.
>>
>> Is not there is a mistake in the unit address not matching the "reg"
>> property, or am I not looking at the right tree?
>>
>> &mdio {
>> ext_rgmii_phy: ethernet-phy@1 {
>> compatible = "ethernet-phy-ieee802.3-c22";
>> reg = <0>;
>> };
>> };
>>
>> If the PHY is really at MDIO address 0, then it should be
>> ethernet-phy@0, and not ethernet-phy@1, and then no problem with the
>> merging?
> 
> That is wrong. The board described in the example likely has a Realtek
> RTL8211E @ address 0x1. Address 0 for this PHY is a broadcast address,
> so it still works, but is the wrong representation.
> 
>>
>>
>>> So that adding a 'status = "disabled"' does not bring anything.
>>>

 What I really don't think is necessary is:

 - duplicating the "mdio" controller node for external vs. internal PHY,
 because this is not accurate, there is just one MDIO controller, but
 there may be different kinds of MDIO/PHY devices attached
>>>
>>> For me, if we want to represent the reality, we need two MDIO:
>>> - since two PHY at the same address could co-exists
>>> - since they are isolated so not on the same MDIO bus
>>
>> Is that really true? It might be, but from experience with e.g:
>> bcmgenet, the integrated PHY and the external PHYs are on the same MDIO
>> bus, which is convenient, except when you have an address conflict.
> 
> There's a mux in the hardware: either the internal MDIO+MII lines
> from the internal PHY are connected to the MAC, or the external
> MDIO+MII lines from the pin controller are connected. I believe
> this was already mentioned?

There is obviously a mux for the data lines and clock to switch between
internal PHY and external PHYs, that does not mean there is one for MDIO
and MDC lines, which is what is being suggested to be used here, does
the mux also takes care of these lines?

> 
>>
>>>
 - having the STMMAC driver MDIO probing code having to deal with a
 "mdio" sub-node or an "internal-mdio" sub-node because this is confusing
 and requiring more driver-level changes that are error prone
>>>
>>> My patch for stmmac is really small, only the name of my variable 
>>> ("need_mdio_mux_ids")
>>> have to be changed to something like "register_parent_mdio"
>>>
>>>
>>> So I agree with Maxime, we need to avoid merging PHY nodes, and we can 
>>> avoid it only by having two separate MDIO nodes.
>>> Furthermore, with only

Re: nf_nat_pptp 4.12.3 kernel lockup/reboot

2017-08-24 Thread Denys Fedoryshchenko

On 2017-07-24 19:20, Florian Westphal wrote:

Florian Westphal  wrote:

Denys Fedoryshchenko  wrote:
> Hi,
>
> I am trying to upgrade kernel 4.11.8 to 4.12.3 (it is a nat/router, handling
> approx 2gbps of pppoe users traffic) and noticed that after while server
> rebooting(i have set reboot on panic and etc).
> I can't run serial console, and in pstore / netconsole there is nothing.
> Best i got is some very short message about softlockup in ipmi, but as
> storage very limited there - it is near useless.
>
> By preliminary testing (can't do it much, as it's production) - it seems
> following lines causing issue, they worked in 4.11.8 and no more in 4.12.3.

Wild guess here, does this help?

diff --git a/net/netfilter/nf_conntrack_helper.c 
b/net/netfilter/nf_conntrack_helper.c

--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -266,6 +266,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, 
struct nf_conn *tmpl,

help = nf_ct_helper_ext_add(ct, helper, flags);
if (help == NULL)
return -ENOMEM;
+   if (!nf_ct_ext_add(ct, NF_CT_EXT_NAT, flags));


sigh, stupid typo, should be no ';' at the end above.

Sorry, is there any plans to push this to 4.12 stable queue?


Re: [PATCH v3 3/4] net: stmmac: register parent MDIO node for sun8i-h3-emac

2017-08-24 Thread Chen-Yu Tsai
On Fri, Aug 25, 2017 at 3:59 AM, Florian Fainelli  wrote:
> On 08/24/2017 01:21 AM, Corentin Labbe wrote:
>> On Wed, Aug 23, 2017 at 09:31:53AM -0700, Florian Fainelli wrote:
>>> On 08/23/2017 12:49 AM, Maxime Ripard wrote:
 Hi Florian,

 On Tue, Aug 22, 2017 at 11:35:01AM -0700, Florian Fainelli wrote:
 So I think what you are saying is either impossible or engineering-wise
 a very stupid design, like using an external MAC with a discrete PHY
 connected to the internal MAC's MDIO bus, while using the internal MAC
 with the internal PHY.

 Now can we please decide on something? We're a week and a half from
 the 4.13 release. If mdio-mux is wrong, then we could have two mdio
 nodes (internal-mdio & external-mdio).
>>>
>>> I really don't see a need for a mdio-mux in the first place, just have
>>> one MDIO controller (current state) sub-node which describes the
>>> built-in STMMAC MDIO controller and declare the internal PHY as a child
>>> node (along with 'phy-is-integrated'). If a different configuration is
>>> used, then just put the external PHY as a child node there.
>>>
>>> If fixed-link is required, the mdio node becomes unused anyway.
>>>
>>> Works for everyone?
>>
>> If we put an external PHY with reg=1 as a child of internal MDIO,
>> il will be merged with internal PHY node and get
>> phy-is-integrated.
>
> Then have the .dtsi file contain just the mdio node, but no internal or
> external PHY and push all the internal and external PHY node definition
> (in its entirety) to the per-board DTS file, does not that work?

 If possible, I'd really like to have the internal PHY in the
 DTSI. It's always there in hardware anyway, and duplicating the PHY,
 with its clock, reset line, and whatever info we might need in the
 future in each and every board DTS that uses it will be very error
 prone and we will have the usual bunch of issues that come up with
 duplication.
>>>
>>> OK, then what if you put the internal PHY in the DTSI, mark it with a
>>> status = "disabled" property, and have the per-board DTS put a status =
>>> "okay" property along with a "phy-is-integrated" boolean property? Would
>>> that work?
>>
>> No, I tested and for example with sun8i-h3-orangepi-plus.dts, the external 
>> PHY (ethernet-phy@1) is still merged.
>
> Is not there is a mistake in the unit address not matching the "reg"
> property, or am I not looking at the right tree?
>
> &mdio {
> ext_rgmii_phy: ethernet-phy@1 {
> compatible = "ethernet-phy-ieee802.3-c22";
> reg = <0>;
> };
> };
>
> If the PHY is really at MDIO address 0, then it should be
> ethernet-phy@0, and not ethernet-phy@1, and then no problem with the
> merging?

That is wrong. The board described in the example likely has a Realtek
RTL8211E @ address 0x1. Address 0 for this PHY is a broadcast address,
so it still works, but is the wrong representation.

>
>
>> So that adding a 'status = "disabled"' does not bring anything.
>>
>>>
>>> What I really don't think is necessary is:
>>>
>>> - duplicating the "mdio" controller node for external vs. internal PHY,
>>> because this is not accurate, there is just one MDIO controller, but
>>> there may be different kinds of MDIO/PHY devices attached
>>
>> For me, if we want to represent the reality, we need two MDIO:
>> - since two PHY at the same address could co-exists
>> - since they are isolated so not on the same MDIO bus
>
> Is that really true? It might be, but from experience with e.g:
> bcmgenet, the integrated PHY and the external PHYs are on the same MDIO
> bus, which is convenient, except when you have an address conflict.

There's a mux in the hardware: either the internal MDIO+MII lines
from the internal PHY are connected to the MAC, or the external
MDIO+MII lines from the pin controller are connected. I believe
this was already mentioned?

>
>>
>>> - having the STMMAC driver MDIO probing code having to deal with a
>>> "mdio" sub-node or an "internal-mdio" sub-node because this is confusing
>>> and requiring more driver-level changes that are error prone
>>
>> My patch for stmmac is really small, only the name of my variable 
>> ("need_mdio_mux_ids")
>> have to be changed to something like "register_parent_mdio"
>>
>>
>> So I agree with Maxime, we need to avoid merging PHY nodes, and we can avoid 
>> it only by having two separate MDIO nodes.
>> Furthermore, with only one MDIO, we will face with lots of small patch for 
>> adding phy-is-integrated, with two we do not need to change any board DT, 
>> all is simply clean.
>> Really having two MDIO seems cleaner.
>
> The only valid thing that you have provided so far is this merging
> problem. Anything else ranging from "we will face with lots of small
> patch for adding phy-is-integrated" to "Really having two MDIO seems
> cleaner." are hard to 

[PATCH] pktgen: add a new sample script for 40G and above link testing

2017-08-24 Thread Robert Hoo
From: Robert Ho 

It's hard to benchmark 40G+ network bandwidth using ordinary
tools like iperf, netperf. I then tried with pktgen multiqueue sample
scripts, but still cannot reach line rate.
I then derived this NUMA awared irq affinity sample script from
multi-queue sample one, successfully benchmarked 40G link. I think this can
also be useful for 100G reference, though I haven't got device to test.

This script simply does:
Detect $DEV's NUMA node belonging.
Bind each thread (processor from that NUMA node) with each $DEV queue's
irq affinity, 1:1 mapping.
How many '-t' threads input determines how many queues will be
utilized.

Tested with Intel XL710 NIC with Cisco 3172 switch.

It would be even slightly better if the irqbalance service is turned
off outside.

Referrences:
https://people.netfilter.org/hawk/presentations/LCA2015/net_stack_challenges_100G_LCA2015.pdf
http://www.intel.cn/content/dam/www/public/us/en/documents/reference-guides/xl710-x710-performance-tuning-linux-guide.pdf

Signed-off-by: Robert Hoo 
---
 ...tgen_sample06_numa_awared_queue_irq_affinity.sh | 132 +
 1 file changed, 132 insertions(+)
 create mode 100755 
samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh

diff --git a/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh 
b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
new file mode 100755
index 000..f0ee25c
--- /dev/null
+++ b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
@@ -0,0 +1,132 @@
+#!/bin/bash
+#
+# Multiqueue: Using pktgen threads for sending on multiple CPUs
+#  * adding devices to kernel threads which are in the same NUMA node
+#  * bound devices queue's irq affinity to the threads, 1:1 mapping
+#  * notice the naming scheme for keeping device names unique
+#  * nameing scheme: dev@thread_number
+#  * flow variation via random UDP source port
+#
+basedir=`dirname $0`
+source ${basedir}/functions.sh
+root_check_run_with_sudo "$@"
+#
+# Required param: -i dev in $DEV
+source ${basedir}/parameters.sh
+
+get_iface_node()
+{
+   echo `cat /sys/class/net/$1/device/numa_node`
+}
+
+get_iface_irqs()
+{
+   local IFACE=$1
+   local queues="${IFACE}-.*TxRx"
+
+   irqs=$(grep "$queues" /proc/interrupts | cut -f1 -d:)
+   [ -z "$irqs" ] && irqs=$(grep $IFACE /proc/interrupts | cut -f1 -d:)
+   [ -z "$irqs" ] && irqs=$(for i in `ls -Ux 
/sys/class/net/$IFACE/device/msi_irqs` ;\
+   do grep "$i:.*TxRx" /proc/interrupts | grep -v fdir | cut -f 1 
-d : ;\
+   done)
+   [ -z "$irqs" ] && echo "Error: Could not find interrupts for $IFACE"
+
+   echo $irqs
+}
+
+get_node_cpus()
+{
+   local node=$1
+   local node_cpu_list
+   local node_cpu_range_list=`cut -f1- -d, --output-delimiter=" " \
+   /sys/devices/system/node/node$node/cpulist`
+
+   for cpu_range in $node_cpu_range_list
+   do
+   node_cpu_list="$node_cpu_list "`seq -s " " ${cpu_range//-/ }`
+   done
+
+   echo $node_cpu_list
+}
+
+
+# Base Config
+DELAY="0"# Zero means max speed
+COUNT="2000"   # Zero means indefinitely
+[ -z "$CLONE_SKB" ] && CLONE_SKB="0"
+
+# Flow variation random source port between min and max
+UDP_MIN=9
+UDP_MAX=109
+
+node=`get_iface_node $DEV`
+irq_array=(`get_iface_irqs $DEV`)
+cpu_array=(`get_node_cpus $node`)
+
+[ $THREADS -gt ${#irq_array[*]} -o $THREADS -gt ${#cpu_array[*]}  ] && \
+   err 1 "Thread number $THREADS exceeds: min 
(${#irq_array[*]},${#cpu_array[*]})"
+
+# (example of setting default params in your script)
+if [ -z "$DEST_IP" ]; then
+[ -z "$IP6" ] && DEST_IP="198.18.0.42" || DEST_IP="FD00::1"
+fi
+[ -z "$DST_MAC" ] && DST_MAC="90:e2:ba:ff:ff:ff"
+
+# General cleanup everything since last run
+pg_ctrl "reset"
+
+# Threads are specified with parameter -t value in $THREADS
+for ((i = 0; i < $THREADS; i++)); do
+# The device name is extended with @name, using thread number to
+# make then unique, but any name will do.
+# Set the queue's irq affinity to this $thread (processor)
+thread=${cpu_array[$i]}
+dev=${DEV}@${thread}
+echo $thread > /proc/irq/${irq_array[$i]}/smp_affinity_list
+echo "irq ${irq_array[$i]} is set affinity to `cat 
/proc/irq/${irq_array[$i]}/smp_affinity_list`"
+
+# Add remove all other devices and add_device $dev to thread
+pg_thread $thread "rem_device_all"
+pg_thread $thread "add_device" $dev
+
+# select queue and bind the queue and $dev in 1:1 relationship
+queue_num=$i
+echo "queue number is $queue_num"
+pg_set $dev "queue_map_min $queue_num"
+pg_set $dev "queue_map_max $queue_num"
+
+# Notice config queue to map to cpu (mirrors smp_processor_id())
+# It is beneficial to map IRQ /proc/irq/*/smp_affinity 1:1 to CPU number
+pg_set $dev "flag QUEUE_MAP_CPU"
+
+# Base config of dev
+pg_set $dev "count $COUNT"
+pg_set $dev "clone_skb $CLONE_SKB"
+pg_set $dev "pk

Re: [PATCH] staging: rtlwifi: Improve debugging by using debugfs

2017-08-24 Thread Andrew Lunn
On Thu, Aug 24, 2017 at 04:28:08PM -0500, Larry Finger wrote:
> The changes in this commit are also being sent to the main rtlwifi
> drivers in wireless-next; however, these changes will also be useful for
> any debugging of r8822be before it gets moved into the main tree.
> 
> Use debugfs to dump register and btcoex status, and also write registers
> and h2c.
> 
> We create topdir in /sys/kernel/debug/rtlwifi/, and use the MAC address
> as subdirectory with several entries to dump mac_reg, bb_reg, rf_reg etc.
> An example is
> /sys/kernel/debug/rtlwifi/00-11-22-33-44-55-66/mac_0
> 
> This change permits examination of device registers in a dynamic manner,
> a feature not available with the current debug mechanism.

Hi Larry

netdev frowns upon debugfs. You should try to keep this altogether,
making it easy to throw away before the driver is moved out of
staging.

You might want to look at ethtool -d. That will be accepted.

Andrew


Re: [PATCH net-next] net: mv643xx_eth: Be drop monitor friendly

2017-08-24 Thread David Miller
From: David Miller 
Date: Thu, 24 Aug 2017 18:27:56 -0700 (PDT)

> From: Florian Fainelli 
> Date: Thu, 24 Aug 2017 16:04:25 -0700
> 
>> txq_reclaim() does the normal transmit queue reclamation and
>> rxq_deinit() does the RX ring cleanup, none of these are packet drops,
>> so use dev_consume_skb() for both locations.
>> 
>> Signed-off-by: Florian Fainelli 
> 
> Applied.

-ENOCOMPILE?

drivers/net/ethernet/marvell/mv643xx_eth.c: In function ‘txq_reclaim’:
drivers/net/ethernet/marvell/mv643xx_eth.c:1124:5: error: implicit declaration 
of function ‘dev_consume_skb’; did you mean ‘dev_consume_skb_any’? 
[-Werror=implicit-function-declaration]
 dev_consume_skb(skb);
 ^~~
 dev_consume_skb_any

Reverted.


[PATCH net-next] e1000e: Be drop monitor friendly

2017-08-24 Thread Florian Fainelli
e1000_put_txbuf() cleans up the successfully transmitted TX packets,
e1000e_tx_hwtstamp_work() also does the successfully completes the
timestamped TX packets, e1000_clean_rx_ring() cleans up the RX ring and
e1000_remove() cleans up the timestampted packets. None of these
functions should be reporting dropped packets, so make them use
dev_consume_skb*() to be drop monitor friendly.

Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 327dfe5bedc0..91303544975a 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1085,7 +1085,7 @@ static void e1000_put_txbuf(struct e1000_ring *tx_ring,
buffer_info->dma = 0;
}
if (buffer_info->skb) {
-   dev_kfree_skb_any(buffer_info->skb);
+   dev_consume_skb_any(buffer_info->skb);
buffer_info->skb = NULL;
}
buffer_info->time_stamp = 0;
@@ -1199,7 +1199,7 @@ static void e1000e_tx_hwtstamp_work(struct work_struct 
*work)
wmb(); /* force write prior to skb_tstamp_tx */
 
skb_tstamp_tx(skb, &shhwtstamps);
-   dev_kfree_skb_any(skb);
+   dev_consume_skb_any(skb);
} else if (time_after(jiffies, adapter->tx_hwtstamp_start
  + adapter->tx_timeout_factor * HZ)) {
dev_kfree_skb_any(adapter->tx_hwtstamp_skb);
@@ -1716,7 +1716,7 @@ static void e1000_clean_rx_ring(struct e1000_ring 
*rx_ring)
}
 
if (buffer_info->skb) {
-   dev_kfree_skb(buffer_info->skb);
+   dev_consume_skb(buffer_info->skb);
buffer_info->skb = NULL;
}
 
@@ -1734,7 +1734,7 @@ static void e1000_clean_rx_ring(struct e1000_ring 
*rx_ring)
 
/* there also may be some cached data from a chained receive */
if (rx_ring->rx_skb_top) {
-   dev_kfree_skb(rx_ring->rx_skb_top);
+   dev_consume_skb(rx_ring->rx_skb_top);
rx_ring->rx_skb_top = NULL;
}
 
@@ -7411,7 +7411,7 @@ static void e1000_remove(struct pci_dev *pdev)
if (adapter->flags & FLAG_HAS_HW_TIMESTAMP) {
cancel_work_sync(&adapter->tx_hwtstamp_work);
if (adapter->tx_hwtstamp_skb) {
-   dev_kfree_skb_any(adapter->tx_hwtstamp_skb);
+   dev_consume_skb_any(adapter->tx_hwtstamp_skb);
adapter->tx_hwtstamp_skb = NULL;
}
}
-- 
2.9.3



[PATCH net 0/2] r8169: Be drop monitor friendly

2017-08-24 Thread Florian Fainelli
Hi all,

First patch may be questionable but no other driver appears to be doing that
and while it is defendable to account for left packets as dropped during TX
clean, this appears misleadning. I picked Stanislaw changes which brings us
back to 2010, but this was present from pre-git days as well.

Second patch fixes the two missing calls to dev_consume_skb_any().

Florian Fainelli (2):
  r8169: Do not increment tx_dropped in TX ring cleaning
  r8169: Be drop monitor friendly

 drivers/net/ethernet/realtek/r8169.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

-- 
2.9.3



[PATCH net 2/2] r8169: Be drop monitor friendly

2017-08-24 Thread Florian Fainelli
rtl_tx() is the TX reclamation process whereas rtl8169_tx_clear_range() does
the TX ring cleaning during shutdown, both of these functions should call
dev_consume_skb_any() to be drop monitor friendly.

Fixes: cac4b22f3d6a ("r8169: do not account fragments as packets")
Fixes: eb781397904e ("r8169: Do not use dev_kfree_skb in xmit path")
Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/realtek/r8169.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index 8a1bbd2a6a20..e03fcf914690 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6863,7 +6863,7 @@ static void rtl8169_tx_clear_range(struct rtl8169_private 
*tp, u32 start,
rtl8169_unmap_tx_skb(&tp->pci_dev->dev, tx_skb,
 tp->TxDescArray + entry);
if (skb) {
-   dev_kfree_skb_any(skb);
+   dev_consume_skb_any(skb);
tx_skb->skb = NULL;
}
}
@@ -7318,7 +7318,7 @@ static void rtl_tx(struct net_device *dev, struct 
rtl8169_private *tp)
tp->tx_stats.packets++;
tp->tx_stats.bytes += tx_skb->skb->len;
u64_stats_update_end(&tp->tx_stats.syncp);
-   dev_kfree_skb_any(tx_skb->skb);
+   dev_consume_skb_any(tx_skb->skb);
tx_skb->skb = NULL;
}
dirty_tx++;
-- 
2.9.3



[PATCH net 1/2] r8169: Do not increment tx_dropped in TX ring cleaning

2017-08-24 Thread Florian Fainelli
rtl8169_tx_clear_range() is responsible for cleaning up the TX ring
during interface shutdown, incrementing tx_dropped for every SKB that we
left at the time in the ring is misleading.

Fixes: cac4b22f3d6a ("r8169: do not account fragments as packets")
Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/realtek/r8169.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index bd07a15d3b7c..8a1bbd2a6a20 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6863,7 +6863,6 @@ static void rtl8169_tx_clear_range(struct rtl8169_private 
*tp, u32 start,
rtl8169_unmap_tx_skb(&tp->pci_dev->dev, tx_skb,
 tp->TxDescArray + entry);
if (skb) {
-   tp->dev->stats.tx_dropped++;
dev_kfree_skb_any(skb);
tx_skb->skb = NULL;
}
-- 
2.9.3



[PATCH net 0/2] r8169: Be drop monitor friendly

2017-08-24 Thread Florian Fainelli
Hi all,

First patch may be questionable but no other driver appears to be doing that
and while it is defendable to account for left packets as dropped during TX
clean, this appears misleadning. I picked Stanislaw changes which brings us
back to 2010, but this was present from pre-git days as well.

Second patch fixes the two missing calls to dev_consume_skb_any().

Florian Fainelli (2):
  r8169: Do not increment tx_dropped in TX ring cleaning
  r8169: Be drop monitor friendly

 drivers/net/ethernet/realtek/r8169.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

-- 
2.9.3



RE: Question about ip_defrag

2017-08-24 Thread liujian (CE)

> -Original Message-
> From: Jesper Dangaard Brouer [mailto:bro...@redhat.com]
> Sent: Friday, August 25, 2017 2:59 AM
> To: liujian (CE)
> Cc: da...@davemloft.net; kuz...@ms2.inr.ac.ru; yoshf...@linux-ipv6.org;
> elena.reshet...@intel.com; eduma...@google.com; netdev@vger.kernel.org;
> bro...@redhat.com
> Subject: Re: Question about ip_defrag
> 
> 
> On Thu, 24 Aug 2017 16:04:41 + "liujian (CE)" 
> wrote:
> 
> > >What kernel version have you seen this issue with?
> >
> > 3.10,with some backport.
> >
> >  >As far as I remember, this issue have been fixed before...
> >
> > which one patch? I didnot find out the patch:(
> 
> AFAIK it was some bugs in the percpu_counter code.  If you need to backport
> look at the git commits:
> 
>  git log lib/percpu_counter.c include/linux/percpu_counter.h
> 
> Are you maintaining your own 3.10 kernel?
> 
> I know that for RHEL7 (also kernel 3.10) we backported the percpu_counter
> fixes...
>
Could you tell me which one patch?  we have backported most of the two files's 
change. 
Thank you ~


> --Jesper
> 
> 
> > 发件人: Jesper Dangaard Brouer
> > 收件人: liujian
> (CE)mailto:liujia...@huawei.com>>
> > 抄送:
> >
> da...@davemloft.net;kuz...@ms2.inr.ac.ru
>  > ailto:kuz...@ms2.inr.ac.ru>;yoshf...@linux-ipv6.org > nux-ipv6.org>;elena.reshet...@intel.com >
> om>;eduma...@google.com;netdev@vger.k
> ernel
> > .org;bro...@redhat.com e
> > dhat.com>
> > 主题: Re: Question about ip_defrag
> > 时间: 2017-08-24 21:53:17
> >
> >
> > On Thu, 24 Aug 2017 13:15:33 + "liujian (CE)" 
> wrote:
> > > Hello,
> > >
> > > With below patch we met one issue.
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/c
> > > ommit/?h=v4.13-rc6&id=6d7b857d541e
> > >
> > > the issue:
> > > Ip_defrag fail caused by frag_mem_limit reached 4M(frags.high_thresh).
> > > At this moment,sum_frag_mem_limit is about 10K.
> > > and my test machine's cpu num is 64.
> > >
> > > Can i only change frag_mem_limit to sum_ frag_mem_limit?
> > >
> > >
> > > diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> > > index 96e95e8..f09c00b 100644
> > > --- a/net/ipv4/inet_fragment.c
> > > +++ b/net/ipv4/inet_fragment.c
> > > @@ -120,7 +120,7 @@ static void inet_frag_secret_rebuild(struct
> > > inet_frags *f)  static bool inet_fragq_should_evict(const struct
> > > inet_frag_queue *q)  {
> > > return q->net->low_thresh == 0 ||
> > > -  frag_mem_limit(q->net) >= q->net->low_thresh;
> > > +  sum_frag_mem_limit(q->net) >= q->net->low_thresh;
> > >  }
> > >
> > >  static unsigned int
> > > @@ -355,7 +355,7 @@ static struct inet_frag_queue
> > > *inet_frag_alloc(struct netns_frags *nf,  {
> > > struct inet_frag_queue *q;
> > >
> > > -   if (!nf->high_thresh || frag_mem_limit(nf) > nf->high_thresh) {
> > > +   if (!nf->high_thresh || sum_frag_mem_limit(nf) >
> > > + nf->high_thresh) {
> > > inet_frag_schedule_worker(f);
> > > return NULL;
> > > }
> > > @@ -396,7 +396,7 @@ struct inet_frag_queue *inet_frag_find(struct
> netns_frags *nf,
> > > struct inet_frag_queue *q;
> > > int depth = 0;
> > >
> > > -   if (frag_mem_limit(nf) > nf->low_thresh)
> > > +   if (sum_frag_mem_limit(nf) > nf->low_thresh)
> > > inet_frag_schedule_worker(f);
> > >
> > > hash &= (INETFRAGS_HASHSZ - 1);
> > > --
> > >
> > > Thank you for your time.
> >
> > What kernel version have you seen this issue with?
> >
> > As far as I remember, this issue have been fixed before...
> >
> > --
> > Best regards,
> >   Jesper Dangaard Brouer
> >   MSc.CS, Principal Kernel Engineer at Red Hat
> >   LinkedIn: http://www.linkedin.com/in/brouer
> 
> 
> 
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH net-next] net: mv643xx_eth: Be drop monitor friendly

2017-08-24 Thread David Miller
From: Florian Fainelli 
Date: Thu, 24 Aug 2017 16:04:25 -0700

> txq_reclaim() does the normal transmit queue reclamation and
> rxq_deinit() does the RX ring cleanup, none of these are packet drops,
> so use dev_consume_skb() for both locations.
> 
> Signed-off-by: Florian Fainelli 

Applied.


Re: [PATCH net-next v2] tg3: Be drop monitor friendly

2017-08-24 Thread David Miller
From: Florian Fainelli 
Date: Thu, 24 Aug 2017 17:47:11 -0700

> tg3_tx() does the normal packet TX completion,
> tigon3_dma_hwbug_workaround() and tg3_tso_bug() both need to allocate a
> new SKB that is suitable to workaround HW bugs, and finally
> tg3_free_rings() is doing ring cleanup. Use dev_consume_skb_any() for
> these 3 locations to be SKB drop monitor friendly.
> 
> Signed-off-by: Florian Fainelli 

Applied.


Re: [PATCH net] net: systemport: Free DMA coherent descriptors on errors

2017-08-24 Thread David Miller
From: Florian Fainelli 
Date: Thu, 24 Aug 2017 16:01:13 -0700

> In case bcm_sysport_init_tx_ring() is not able to allocate ring->cbs, we
> would return with an error, and call bcm_sysport_fini_tx_ring() and it
> would see that ring->cbs is NULL and do nothing. This would leak the
> coherent DMA descriptor area, so we need to free it on error before
> returning.
> 
> Reported-by: Eric Dumazet 
> Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC 
> driver")
> Signed-off-by: Florian Fainelli 

Applied and queued up for -stable.


Re: [PATCH net] net: bcmgenet: Be drop monitor friendly

2017-08-24 Thread David Miller
From: Florian Fainelli 
Date: Thu, 24 Aug 2017 15:56:29 -0700

> There are 3 spots where we call dev_kfree_skb() but we are actually
> just doing a normal SKB consumption: __bcmgenet_tx_reclaim() for normal
> TX reclamation, bcmgenet_alloc_rx_buffers() during the initial RX ring
> setup and bcmgenet_free_rx_buffers() during RX ring cleanup.
> 
> Fixes: d6707bec5986 ("net: bcmgenet: rewrite bcmgenet_rx_refill()")
> Fixes: f48bed16a756 ("net: bcmgenet: Free skb after last Tx frag")
> Signed-off-by: Florian Fainelli 

Applied.

> David, I tagged these two commits, but this can actually go back
> to when the driver was first merged. Let me know if you would want
> me to provide -stable backports against other trees.

Ok, I'll queue these up for -stable.


Re: [PATCH net] bpf: fix bpf_setsockopts return value

2017-08-24 Thread David Miller
From: Yuchung Cheng 
Date: Thu, 24 Aug 2017 15:48:21 -0700

> This patch fixes a bug causing any sock operations to always return EINVAL.
> 
> Fixes: a5192c52377e ("bpf: fix to bpf_setsockops").
> Reported-by: Neal Cardwell 
> Signed-off-by: Yuchung Cheng 
> Acked-by: Neal Cardwell 
> Acked-by: Craig Gallek 

Applied.


Re: [PATCH net] net: systemport: Be drop monitor friendly

2017-08-24 Thread David Miller
From: Florian Fainelli 
Date: Thu, 24 Aug 2017 15:20:41 -0700

> Utilize dev_consume_skb_any(cb->skb) in bcm_sysport_free_cb() which is
> used when a TX packet is completed, as well as when the RX ring is
> cleaned on shutdown. None of these two cases are packet drops, so be
> drop monitor friendly.
> 
> Suggested-by: Eric Dumazet 
> Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC 
> driver")
> Signed-off-by: Florian Fainelli 

Applied.


Re: [PATCH net-next v3 0/4] Route ICMPv6 errors with the flow when ECMP in use

2017-08-24 Thread David Miller
From: Jakub Sitnicki 
Date: Wed, 23 Aug 2017 09:58:27 +0200

> This patch set is another take at making Path MTU Discovery work when
> server nodes are behind a router employing multipath routing in a
> load-balance or anycast setup (that is, when not every end-node can be
> reached by every path). The problem has been well described in RFC 7690
> [1], but in short - in such setups ICMPv6 PTB errors are not guaranteed
> to be routed back to the server node that sent a reply that exceeds path
> MTU.
 ...

Ok, looks not to bad.

Applied, thanks.



Re: [PATCH v2 net-next 1/1] hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)

2017-08-24 Thread David Miller
From: Dexuan Cui 
Date: Wed, 23 Aug 2017 04:52:14 +

> +#define VMBUS_PKT_TRAILER(sizeof(u64))

This is not the packet trailer, it's the size of the packet trailer.
Please make this macro name match more accurately what it is.

> + /* Have we sent the zero-length packet (FIN)? */
> + unsigned long fin_sent;

Why does this need to be atomic?  Why can't a smaller simpler
mechanism be used to make sure hvs_shutdown() only performs
hvs_send_data() call once on the channel?

> +static inline bool is_valid_srv_id(const uuid_le *id)
> +{
> + return !memcmp(&id->b[4], &srv_id_template.b[4], sizeof(uuid_le) - 4);
> +}

Do not use the inline function attribute in *.c code.  Let the
compiler decide.

> +static inline unsigned int get_port_by_srv_id(const uuid_le *svr_id)

Likewise.

> +static inline void hvs_addr_init(struct sockaddr_vm *addr,

Likewise.

> +static inline void hvs_remote_addr_init(struct sockaddr_vm *remote,
> + struct sockaddr_vm *local)

Likewise.

And so on and so forth, please audit this for your entire patch.

> + *((u32 *)&h->vm_srv_id) = vsk->local_addr.svm_port;
> + *((u32 *)&h->host_srv_id) = vsk->remote_addr.svm_port;

There has to be a better way to express this.

And if this is partially initializing vm_srv_id, at a minimum
endianness needs to be taken into account.


Re: [net-next:master 1328/1341] drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:77:3: error: 'devlink_dpipe_header_ethernet' undeclared here (not in a function)

2017-08-24 Thread David Miller
From: kbuild test robot 
Date: Fri, 25 Aug 2017 08:16:01 +0800

> All error/warnings (new ones prefixed by >>):
> 
>>> drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:77:3: error: 
>>> 'devlink_dpipe_header_ethernet' undeclared here (not in a function)
>  &devlink_dpipe_header_ethernet,
>   ^
>>> drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:78:3: error: 
>>> 'devlink_dpipe_header_ipv4' undeclared here (not in a function)
>  &devlink_dpipe_header_ipv4,
>   ^
>drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c: In function 
> 'mlxsw_sp_dpipe_erif_table_init':
>drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:305:9: error: too few 
> arguments to function 'devlink_dpipe_table_register'
>  return devlink_dpipe_table_register(devlink,

Jiri and co., I think you'll need some ifdef'ery to deal with this
properly.


Re: [net-next:master 1324/1341] drivers/net//ethernet/mellanox/mlxsw/spectrum_dpipe.c:323:9: error: too few arguments to function 'devlink_dpipe_table_register'

2017-08-24 Thread David Miller
From: kbuild test robot 
Date: Fri, 25 Aug 2017 08:03:28 +0800

> All errors (new ones prefixed by >>):
> 
>drivers/net//ethernet/mellanox/mlxsw/spectrum_dpipe.c: In function 
> 'mlxsw_sp_dpipe_erif_table_init':
>>> drivers/net//ethernet/mellanox/mlxsw/spectrum_dpipe.c:323:9: error: too few 
>>> arguments to function 'devlink_dpipe_table_register'
>  return devlink_dpipe_table_register(devlink,
> ^
>In file included from 
> drivers/net//ethernet/mellanox/mlxsw/spectrum_dpipe.c:36:0:
>include/net/devlink.h:401:1: note: declared here
> devlink_dpipe_table_register(struct devlink *devlink,
> ^
>drivers/net//ethernet/mellanox/mlxsw/spectrum_dpipe.c:327:1: warning: 
> control reaches end of non-void function [-Wreturn-type]
> }
> ^

I'll push the following fix into net-next for this:


>From 790c6056686cc4dd5b149b330bbd5ae208d4d721 Mon Sep 17 00:00:00 2001
From: "David S. Miller" 
Date: Thu, 24 Aug 2017 18:10:46 -0700
Subject: [PATCH] devlink: Fix devlink_dpipe_table_register() stub signature.

One too many arguments compared to the non-stub version.

Reported-by: kbuild test robot 
Fixes: ffd3cdccf214 ("devlink: Add support for dynamic table size")
Signed-off-by: David S. Miller 
---
 include/net/devlink.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 047a0c54f652..aaf7178127a2 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -402,8 +402,7 @@ static inline int
 devlink_dpipe_table_register(struct devlink *devlink,
 const char *table_name,
 struct devlink_dpipe_table_ops *table_ops,
-void *priv, u64 size,
-bool counter_control_extern)
+void *priv, bool counter_control_extern)
 {
return 0;
 }
-- 
2.13.5



Re: [PATCH net-next] ipv6: Add sysctl for per namespace flow label reflection

2017-08-24 Thread David Miller
From: Jakub Sitnicki 
Date: Wed, 23 Aug 2017 09:55:41 +0200

> Reflecting IPv6 Flow Label at server nodes is useful in environments
> that employ multipath routing to load balance the requests. As "IPv6
> Flow Label Reflection" standard draft [1] points out - ICMPv6 PTB error
> messages generated in response to a downstream packets from the server
> can be routed by a load balancer back to the original server without
> looking at transport headers, if the server applies the flow label
> reflection. This enables the Path MTU Discovery past the ECMP router in
> load-balance or anycast environments where each server node is reachable
> by only one path.
> 
> Introduce a sysctl to enable flow label reflection per net namespace for
> all newly created sockets. Same could be earlier achieved only per
> socket by setting the IPV6_FL_F_REFLECT flag for the IPV6_FLOWLABEL_MGR
> socket option.
> 
> [1] https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01
> 
> Signed-off-by: Jakub Sitnicki 

Ok, applied.


Re: [PATCH net-next v2] tg3: Be drop monitor friendly

2017-08-24 Thread Michael Chan
On Thu, Aug 24, 2017 at 5:47 PM, Florian Fainelli  wrote:
> tg3_tx() does the normal packet TX completion,
> tigon3_dma_hwbug_workaround() and tg3_tso_bug() both need to allocate a
> new SKB that is suitable to workaround HW bugs, and finally
> tg3_free_rings() is doing ring cleanup. Use dev_consume_skb_any() for
> these 3 locations to be SKB drop monitor friendly.
>
> Signed-off-by: Florian Fainelli 

Acked-by: Michael Chan 

> ---
> Changes in v2:
>
> - also included tg3_tso_bug() as indicated by Michael
>


[PATCH net-next v2] tg3: Be drop monitor friendly

2017-08-24 Thread Florian Fainelli
tg3_tx() does the normal packet TX completion,
tigon3_dma_hwbug_workaround() and tg3_tso_bug() both need to allocate a
new SKB that is suitable to workaround HW bugs, and finally
tg3_free_rings() is doing ring cleanup. Use dev_consume_skb_any() for
these 3 locations to be SKB drop monitor friendly.

Signed-off-by: Florian Fainelli 
---
Changes in v2:

- also included tg3_tso_bug() as indicated by Michael

 drivers/net/ethernet/broadcom/tg3.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c 
b/drivers/net/ethernet/broadcom/tg3.c
index d600c41fb1dc..af33dc15c55f 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -6587,7 +6587,7 @@ static void tg3_tx(struct tg3_napi *tnapi)
pkts_compl++;
bytes_compl += skb->len;
 
-   dev_kfree_skb_any(skb);
+   dev_consume_skb_any(skb);
 
if (unlikely(tx_bug)) {
tg3_tx_recover(tp);
@@ -7829,7 +7829,7 @@ static int tigon3_dma_hwbug_workaround(struct tg3_napi 
*tnapi,
}
}
 
-   dev_kfree_skb_any(skb);
+   dev_consume_skb_any(skb);
*pskb = new_skb;
return ret;
 }
@@ -7882,7 +7882,7 @@ static int tg3_tso_bug(struct tg3 *tp, struct tg3_napi 
*tnapi,
} while (segs);
 
 tg3_tso_bug_end:
-   dev_kfree_skb_any(skb);
+   dev_consume_skb_any(skb);
 
return NETDEV_TX_OK;
 }
@@ -8543,7 +8543,7 @@ static void tg3_free_rings(struct tg3 *tp)
tg3_tx_skb_unmap(tnapi, i,
 skb_shinfo(skb)->nr_frags - 1);
 
-   dev_kfree_skb_any(skb);
+   dev_consume_skb_any(skb);
}
netdev_tx_reset_queue(netdev_get_tx_queue(tp->dev, j));
}
-- 
2.9.3



Re: [PATCH net-next] tg3: Be drop monitor friendly

2017-08-24 Thread Florian Fainelli
On 08/24/2017 05:04 PM, Michael Chan wrote:
> On Thu, Aug 24, 2017 at 4:25 PM, Florian Fainelli  > wrote:
>>
>> tg3_tx() does the normal packet TX completion,
>> tigon3_dma_hwbug_workaround() needs to allocate a new SKB that is
>> suitable for the DMA hardware bug, and finally tg3_free_rings() is doing
>> ring cleanup. Use dev_consume_skb_any() for these 3 locations to be SKB
>> drop monitor friendly.
>>
>> Signed-off-by: Florian Fainelli  >
> 
> Florian, the one at the end of tg3_tso_bug() should also be converted to
> dev_consume_skb_any().  tg3_tso_bug() is similar to
> tigon3_dma_hwbug_workaround().  Thanks.

Indeed, thanks, will resubmit.
-- 
Florian


[net-next:master 1328/1341] drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:77:3: error: 'devlink_dpipe_header_ethernet' undeclared here (not in a function)

2017-08-24 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 
master
head:   39a7e5892418514db114ea4842d7d88aee6a61b8
commit: 6aecb36bc01fc9c84f24c4155e8850b6a1962f03 [1328/1341] mlxsw: 
spectrum_dpipe: Add IPv4 host table initial support
config: i386-randconfig-sb0-08250550 (attached as .config)
compiler: gcc-5 (Debian 5.4.1-2) 5.4.1 20160904
reproduce:
git checkout 6aecb36bc01fc9c84f24c4155e8850b6a1962f03
# save the attached .config to linux build tree
make ARCH=i386 

All error/warnings (new ones prefixed by >>):

>> drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:77:3: error: 
>> 'devlink_dpipe_header_ethernet' undeclared here (not in a function)
 &devlink_dpipe_header_ethernet,
  ^
>> drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:78:3: error: 
>> 'devlink_dpipe_header_ipv4' undeclared here (not in a function)
 &devlink_dpipe_header_ipv4,
  ^
   drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c: In function 
'mlxsw_sp_dpipe_erif_table_init':
   drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:305:9: error: too few 
arguments to function 'devlink_dpipe_table_register'
 return devlink_dpipe_table_register(devlink,
^
   In file included from 
drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:36:0:
   include/net/devlink.h:402:1: note: declared here
devlink_dpipe_table_register(struct devlink *devlink,
^
>> drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:305:9: warning: return 
>> makes integer from pointer without a cast [-Wint-conversion]
 return devlink_dpipe_table_register(devlink,
^
   drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c: In function 
'mlxsw_sp_dpipe_table_host_matches_dump':
>> drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:332:15: error: 
>> assignment from incompatible pointer type 
>> [-Werror=incompatible-pointer-types]
 match.header = &devlink_dpipe_header_ipv4;
  ^
   drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c: In function 
'mlxsw_sp_dpipe_table_host4_actions_dump':
   drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:350:16: error: 
assignment from incompatible pointer type [-Werror=incompatible-pointer-types]
 action.header = &devlink_dpipe_header_ethernet;
   ^
   drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c: In function 
'mlxsw_sp_dpipe_host4_table_init':
   drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:397:9: error: too few 
arguments to function 'devlink_dpipe_table_register'
 return devlink_dpipe_table_register(devlink,
^
   In file included from 
drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:36:0:
   include/net/devlink.h:402:1: note: declared here
devlink_dpipe_table_register(struct devlink *devlink,
^
   drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:397:9: warning: return 
makes integer from pointer without a cast [-Wint-conversion]
 return devlink_dpipe_table_register(devlink,
^
   cc1: some warnings being treated as errors

vim +/devlink_dpipe_header_ethernet +77 
drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c

  > 36  #include 
37  
38  #include "spectrum.h"
39  #include "spectrum_dpipe.h"
40  #include "spectrum_router.h"
41  
42  enum mlxsw_sp_field_metadata_id {
43  MLXSW_SP_DPIPE_FIELD_METADATA_ERIF_PORT,
44  MLXSW_SP_DPIPE_FIELD_METADATA_L3_FORWARD,
45  MLXSW_SP_DPIPE_FIELD_METADATA_L3_DROP,
46  };
47  
48  static struct devlink_dpipe_field mlxsw_sp_dpipe_fields_metadata[] = {
49  { .name = "erif_port",
50.id = MLXSW_SP_DPIPE_FIELD_METADATA_ERIF_PORT,
51.bitwidth = 32,
52.mapping_type = DEVLINK_DPIPE_FIELD_MAPPING_TYPE_IFINDEX,
53  },
54  { .name = "l3_forward",
55.id = MLXSW_SP_DPIPE_FIELD_METADATA_L3_FORWARD,
56.bitwidth = 1,
57  },
58  { .name = "l3_drop",
59.id = MLXSW_SP_DPIPE_FIELD_METADATA_L3_DROP,
60.bitwidth = 1,
61  },
62  };
63  
64  enum mlxsw_sp_dpipe_header_id {
65  MLXSW_SP_DPIPE_HEADER_METADATA,
66  };
67  
68  static struct devlink_dpipe_header mlxsw_sp_dpipe_header_metadata = {
69  .name = "mlxsw_meta",
70  .id = MLXSW_SP_DPIPE_HEADER_METADATA,
71  .fields = mlxsw_sp_dpipe_fields_metadata,
72  .fields_count = ARRAY_SIZE(mlxsw_sp_dpipe_fields_metadata),
73  };
74  
75  static struct devlink_dpipe_header *mlxsw_dpipe_headers[] = {
76  &mlxsw_sp_dpipe_header_metadata,
  > 77  &devlink_dpipe_header_ethernet,
  > 78  &devlink_dpipe_header_ipv4,
79  };
80  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.

[net-next:master 1324/1341] drivers/net//ethernet/mellanox/mlxsw/spectrum_dpipe.c:323:9: error: too few arguments to function 'devlink_dpipe_table_register'

2017-08-24 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 
master
head:   39a7e5892418514db114ea4842d7d88aee6a61b8
commit: ffd3cdccf214cf0df08856a6738544076c4cd548 [1324/1341] devlink: Add 
support for dynamic table size
config: i386-randconfig-sb0-08250550 (attached as .config)
compiler: gcc-5 (Debian 5.4.1-2) 5.4.1 20160904
reproduce:
git checkout ffd3cdccf214cf0df08856a6738544076c4cd548
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/net//ethernet/mellanox/mlxsw/spectrum_dpipe.c: In function 
'mlxsw_sp_dpipe_erif_table_init':
>> drivers/net//ethernet/mellanox/mlxsw/spectrum_dpipe.c:323:9: error: too few 
>> arguments to function 'devlink_dpipe_table_register'
 return devlink_dpipe_table_register(devlink,
^
   In file included from 
drivers/net//ethernet/mellanox/mlxsw/spectrum_dpipe.c:36:0:
   include/net/devlink.h:401:1: note: declared here
devlink_dpipe_table_register(struct devlink *devlink,
^
   drivers/net//ethernet/mellanox/mlxsw/spectrum_dpipe.c:327:1: warning: 
control reaches end of non-void function [-Wreturn-type]
}
^

vim +/devlink_dpipe_table_register +323 
drivers/net//ethernet/mellanox/mlxsw/spectrum_dpipe.c

d54b70fe Arkadi Sharshevsky 2017-03-28  318  
d54b70fe Arkadi Sharshevsky 2017-03-28  319  static int 
mlxsw_sp_dpipe_erif_table_init(struct mlxsw_sp *mlxsw_sp)
d54b70fe Arkadi Sharshevsky 2017-03-28  320  {
d54b70fe Arkadi Sharshevsky 2017-03-28  321 struct devlink *devlink = 
priv_to_devlink(mlxsw_sp->core);
d54b70fe Arkadi Sharshevsky 2017-03-28  322  
d54b70fe Arkadi Sharshevsky 2017-03-28 @323 return 
devlink_dpipe_table_register(devlink,
d54b70fe Arkadi Sharshevsky 2017-03-28  324 
MLXSW_SP_DPIPE_TABLE_NAME_ERIF,
d54b70fe Arkadi Sharshevsky 2017-03-28  325 
&mlxsw_sp_erif_ops,
ffd3cdcc Arkadi Sharshevsky 2017-08-24  326 
mlxsw_sp, false);
d54b70fe Arkadi Sharshevsky 2017-03-28  327  }
d54b70fe Arkadi Sharshevsky 2017-03-28  328  

:: The code at line 323 was first introduced by commit
:: d54b70feb696f0d110626438432d0acec3cb4752 mlxsw: spectrum: Add definition 
for egress rif table

:: TO: Arkadi Sharshevsky 
:: CC: David S. Miller 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


[Patch net-next v2 3/4] net_sched: remove tc class reference counting

2017-08-24 Thread Cong Wang
For TC classes, their ->get() and ->put() are always paired, and the
reference counting is completely useless, because:

1) For class modification and dumping paths, we already hold RTNL lock,
   so all of these ->get(),->change(),->put() are atomic.

2) For filter bindiing/unbinding, we use other reference counter than
   this one, and they should have RTNL lock too.

3) For ->qlen_notify(), it is special because it is called on ->enqueue()
   path, but we already hold qdisc tree lock there, and we hold this
   tree lock when graft or delete the class too, so it should not be gone
   or changed until we release the tree lock.

Therefore, this patch removes ->get() and ->put(), but:

1) Adds a new ->find() to find the pointer to a class by classid, no
   refcnt.

2) Move the original class destroy upon the last refcnt into ->delete(),
   right after releasing tree lock. This is fine because the class is
   already removed from hash when holding the lock.

For those who also use ->put() as ->unbind(), just rename them to reflect
this change.

Cc: Jamal Hadi Salim 
Signed-off-by: Cong Wang 
---
 include/net/sch_generic.h |  3 +--
 net/sched/cls_api.c   | 17 ++---
 net/sched/sch_api.c   | 21 -
 net/sched/sch_atm.c   | 30 --
 net/sched/sch_cbq.c   | 41 -
 net/sched/sch_drr.c   | 30 +-
 net/sched/sch_dsmark.c| 17 -
 net/sched/sch_fq_codel.c  |  9 -
 net/sched/sch_hfsc.c  | 32 +---
 net/sched/sch_htb.c   | 33 +++--
 net/sched/sch_ingress.c   | 20 +---
 net/sched/sch_mq.c|  9 ++---
 net/sched/sch_mqprio.c|  9 ++---
 net/sched/sch_multiq.c| 11 +--
 net/sched/sch_netem.c |  9 ++---
 net/sched/sch_prio.c  | 11 +--
 net/sched/sch_qfq.c   | 30 +-
 net/sched/sch_red.c   |  9 ++---
 net/sched/sch_sfb.c   |  9 -
 net/sched/sch_sfq.c   |  9 -
 net/sched/sch_tbf.c   |  9 ++---
 21 files changed, 106 insertions(+), 262 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 1688f0f6c7ba..d817585aa5df 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -147,8 +147,7 @@ struct Qdisc_class_ops {
void(*qlen_notify)(struct Qdisc *, unsigned long);
 
/* Class manipulation routines */
-   unsigned long   (*get)(struct Qdisc *, u32 classid);
-   void(*put)(struct Qdisc *, unsigned long);
+   unsigned long   (*find)(struct Qdisc *, u32 classid);
int (*change)(struct Qdisc *, u32, u32,
struct nlattr **, unsigned long *);
int (*delete)(struct Qdisc *, unsigned long);
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index eef6b077f30e..d470a4e2de58 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -586,7 +586,7 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n,
 
/* Do we search for filter, attached to class? */
if (TC_H_MIN(parent)) {
-   cl = cops->get(q, parent);
+   cl = cops->find(q, parent);
if (cl == 0)
return -ENOENT;
}
@@ -716,8 +716,6 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n,
 errout:
if (chain)
tcf_chain_put(chain);
-   if (cl)
-   cops->put(q, cl);
if (err == -EAGAIN)
/* Replay the request. */
goto replay;
@@ -822,17 +820,17 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct 
netlink_callback *cb)
goto out;
cops = q->ops->cl_ops;
if (!cops)
-   goto errout;
+   goto out;
if (!cops->tcf_block)
-   goto errout;
+   goto out;
if (TC_H_MIN(tcm->tcm_parent)) {
-   cl = cops->get(q, tcm->tcm_parent);
+   cl = cops->find(q, tcm->tcm_parent);
if (cl == 0)
-   goto errout;
+   goto out;
}
block = cops->tcf_block(q, cl);
if (!block)
-   goto errout;
+   goto out;
 
index_start = cb->args[0];
index = 0;
@@ -847,9 +845,6 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct 
netlink_callback *cb)
 
cb->args[0] = index;
 
-errout:
-   if (cl)
-   cops->put(q, cl);
 out:
return skb->len;
 }
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 3ef4eb578739..e7f8e4bfd4ec 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -153,7 +153,7 @@ int register_qdisc(struct Qdisc_ops *qops)
if (qops->cl_ops) {

[Patch net-next v2 2/4] net_sched: introduce tclass_del_notify()

2017-08-24 Thread Cong Wang
Like for TC actions, ->delete() is a special case,
we have to prepare and fill the notification before delete
otherwise would get use-after-free after we remove the
reference count.

Signed-off-by: Cong Wang 
---
 net/sched/sch_api.c | 39 +--
 1 file changed, 33 insertions(+), 6 deletions(-)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 330ffaea9974..3ef4eb578739 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1618,6 +1618,38 @@ static int tclass_notify(struct net *net, struct sk_buff 
*oskb,
  n->nlmsg_flags & NLM_F_ECHO);
 }
 
+static int tclass_del_notify(struct net *net,
+const struct Qdisc_class_ops *cops,
+struct sk_buff *oskb, struct nlmsghdr *n,
+struct Qdisc *q, unsigned long cl)
+{
+   u32 portid = oskb ? NETLINK_CB(oskb).portid : 0;
+   struct sk_buff *skb;
+   int err = 0;
+
+   if (!cops->delete)
+   return -EOPNOTSUPP;
+
+   skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
+   if (!skb)
+   return -ENOBUFS;
+
+   if (tc_fill_tclass(skb, q, cl, portid, n->nlmsg_seq, 0,
+  RTM_DELTCLASS) < 0) {
+   kfree_skb(skb);
+   return -EINVAL;
+   }
+
+   err = cops->delete(q, cl);
+   if (err) {
+   kfree_skb(skb);
+   return err;
+   }
+
+   return rtnetlink_send(skb, net, portid, RTNLGRP_TC,
+ n->nlmsg_flags & NLM_F_ECHO);
+}
+
 static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n,
 struct netlink_ext_ack *extack)
 {
@@ -1722,12 +1754,7 @@ static int tc_ctl_tclass(struct sk_buff *skb, struct 
nlmsghdr *n,
goto out;
break;
case RTM_DELTCLASS:
-   err = -EOPNOTSUPP;
-   if (cops->delete)
-   err = cops->delete(q, cl);
-   if (err == 0)
-   tclass_notify(net, skb, n, q, cl,
- RTM_DELTCLASS);
+   err = tclass_del_notify(net, cops, skb, n, q, cl);
goto out;
case RTM_GETTCLASS:
err = tclass_notify(net, skb, n, q, cl, RTM_NEWTCLASS);
-- 
2.13.0



[Patch net-next v2 1/4] net_sched: get rid of more forward declarations

2017-08-24 Thread Cong Wang
This is not needed if we move them up properly.

Signed-off-by: Cong Wang 
---
 net/sched/sch_api.c | 348 +---
 1 file changed, 169 insertions(+), 179 deletions(-)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index aaf552b8e120..330ffaea9974 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -36,13 +36,6 @@
 #include 
 #include 
 
-static int qdisc_notify(struct net *net, struct sk_buff *oskb,
-   struct nlmsghdr *n, u32 clid,
-   struct Qdisc *old, struct Qdisc *new);
-static int tclass_notify(struct net *net, struct sk_buff *oskb,
-struct nlmsghdr *n, struct Qdisc *q,
-unsigned long cl, int event);
-
 /*
 
Short review.
@@ -775,6 +768,111 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, 
unsigned int n,
 }
 EXPORT_SYMBOL(qdisc_tree_reduce_backlog);
 
+static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
+u32 portid, u32 seq, u16 flags, int event)
+{
+   struct gnet_stats_basic_cpu __percpu *cpu_bstats = NULL;
+   struct gnet_stats_queue __percpu *cpu_qstats = NULL;
+   struct tcmsg *tcm;
+   struct nlmsghdr  *nlh;
+   unsigned char *b = skb_tail_pointer(skb);
+   struct gnet_dump d;
+   struct qdisc_size_table *stab;
+   __u32 qlen;
+
+   cond_resched();
+   nlh = nlmsg_put(skb, portid, seq, event, sizeof(*tcm), flags);
+   if (!nlh)
+   goto out_nlmsg_trim;
+   tcm = nlmsg_data(nlh);
+   tcm->tcm_family = AF_UNSPEC;
+   tcm->tcm__pad1 = 0;
+   tcm->tcm__pad2 = 0;
+   tcm->tcm_ifindex = qdisc_dev(q)->ifindex;
+   tcm->tcm_parent = clid;
+   tcm->tcm_handle = q->handle;
+   tcm->tcm_info = refcount_read(&q->refcnt);
+   if (nla_put_string(skb, TCA_KIND, q->ops->id))
+   goto nla_put_failure;
+   if (q->ops->dump && q->ops->dump(q, skb) < 0)
+   goto nla_put_failure;
+   qlen = q->q.qlen;
+
+   stab = rtnl_dereference(q->stab);
+   if (stab && qdisc_dump_stab(skb, stab) < 0)
+   goto nla_put_failure;
+
+   if (gnet_stats_start_copy_compat(skb, TCA_STATS2, TCA_STATS, TCA_XSTATS,
+NULL, &d, TCA_PAD) < 0)
+   goto nla_put_failure;
+
+   if (q->ops->dump_stats && q->ops->dump_stats(q, &d) < 0)
+   goto nla_put_failure;
+
+   if (qdisc_is_percpu_stats(q)) {
+   cpu_bstats = q->cpu_bstats;
+   cpu_qstats = q->cpu_qstats;
+   }
+
+   if (gnet_stats_copy_basic(qdisc_root_sleeping_running(q),
+ &d, cpu_bstats, &q->bstats) < 0 ||
+   gnet_stats_copy_rate_est(&d, &q->rate_est) < 0 ||
+   gnet_stats_copy_queue(&d, cpu_qstats, &q->qstats, qlen) < 0)
+   goto nla_put_failure;
+
+   if (gnet_stats_finish_copy(&d) < 0)
+   goto nla_put_failure;
+
+   nlh->nlmsg_len = skb_tail_pointer(skb) - b;
+   return skb->len;
+
+out_nlmsg_trim:
+nla_put_failure:
+   nlmsg_trim(skb, b);
+   return -1;
+}
+
+static bool tc_qdisc_dump_ignore(struct Qdisc *q, bool dump_invisible)
+{
+   if (q->flags & TCQ_F_BUILTIN)
+   return true;
+   if ((q->flags & TCQ_F_INVISIBLE) && !dump_invisible)
+   return true;
+
+   return false;
+}
+
+static int qdisc_notify(struct net *net, struct sk_buff *oskb,
+   struct nlmsghdr *n, u32 clid,
+   struct Qdisc *old, struct Qdisc *new)
+{
+   struct sk_buff *skb;
+   u32 portid = oskb ? NETLINK_CB(oskb).portid : 0;
+
+   skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
+   if (!skb)
+   return -ENOBUFS;
+
+   if (old && !tc_qdisc_dump_ignore(old, false)) {
+   if (tc_fill_qdisc(skb, old, clid, portid, n->nlmsg_seq,
+ 0, RTM_DELQDISC) < 0)
+   goto err_out;
+   }
+   if (new && !tc_qdisc_dump_ignore(new, false)) {
+   if (tc_fill_qdisc(skb, new, clid, portid, n->nlmsg_seq,
+ old ? NLM_F_REPLACE : 0, RTM_NEWQDISC) < 0)
+   goto err_out;
+   }
+
+   if (skb->len)
+   return rtnetlink_send(skb, net, portid, RTNLGRP_TC,
+ n->nlmsg_flags & NLM_F_ECHO);
+
+err_out:
+   kfree_skb(skb);
+   return -EINVAL;
+}
+
 static void notify_and_destroy(struct net *net, struct sk_buff *skb,
   struct nlmsghdr *n, u32 clid,
   struct Qdisc *old, struct Qdisc *new)
@@ -1342,111 +1440,6 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct 
nlmsghdr *n,
return 0;
 }
 
-static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
-u32 portid, u32 seq, u16 flags, int event)
-{
-   st

[Patch net-next v2 4/4] net_sched: kill u32_node pointer in Qdisc

2017-08-24 Thread Cong Wang
It is ugly to hide a u32-filter-specific pointer inside Qdisc,
this breaks the TC layers:

1. Qdisc is a generic representation, should not have any specific
   data of any type

2. Qdisc layer is above filter layer, should only save filters in
   the list of struct tcf_proto.

This pointer is used as the head of the chain of u32 hash tables,
that is struct tc_u_hnode, because u32 filter is very special,
it allows to create multiple hash tables within one qdisc and
across multiple u32 filters.

Instead of using this ugly pointer, we can just save it in a global
hash table key'ed by (dev ifindex, qdisc handle), therefore we can
still treat it as a per qdisc basis data structure conceptually.

Of course, because of network namespaces, this key is not unique
at all, but it is fine as we already have a pointer to Qdisc in
struct tc_u_common, we can just compare the pointers when collision.

And this only affects slow paths, has no impact to fast path,
thanks to the pointer ->tp_c.

Cc: Jamal Hadi Salim 
Cc: Jiri Pirko 
Signed-off-by: Cong Wang 
---
 include/net/sch_generic.h |  1 -
 net/sched/cls_u32.c   | 57 +++
 2 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index d817585aa5df..c30b634c5f82 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -75,7 +75,6 @@ struct Qdisc {
struct hlist_node   hash;
u32 handle;
u32 parent;
-   void*u32_node;
 
struct netdev_queue *dev_queue;
 
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index af22742d2847..99ea4c74dd5b 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -40,6 +40,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -92,6 +94,7 @@ struct tc_u_common {
struct Qdisc*q;
int refcnt;
u32 hgenerator;
+   struct hlist_node   hnode;
struct rcu_head rcu;
 };
 
@@ -323,12 +326,40 @@ static u32 gen_new_htid(struct tc_u_common *tp_c)
return i > 0 ? (tp_c->hgenerator|0x800)<<20 : 0;
 }
 
+static struct hlist_head *tc_u_common_hash;
+
+#define U32_HASH_SHIFT 10
+#define U32_HASH_SIZE (1 << U32_HASH_SHIFT)
+
+static unsigned int tc_u_hash(const struct tcf_proto *tp)
+{
+   struct net_device *dev = tp->q->dev_queue->dev;
+   u32 qhandle = tp->q->handle;
+   int ifindex = dev->ifindex;
+
+   return hash_64((u64)ifindex << 32 | qhandle, U32_HASH_SHIFT);
+}
+
+static struct tc_u_common *tc_u_common_find(const struct tcf_proto *tp)
+{
+   struct tc_u_common *tc;
+   unsigned int h;
+
+   h = tc_u_hash(tp);
+   hlist_for_each_entry(tc, &tc_u_common_hash[h], hnode) {
+   if (tc->q == tp->q)
+   return tc;
+   }
+   return NULL;
+}
+
 static int u32_init(struct tcf_proto *tp)
 {
struct tc_u_hnode *root_ht;
struct tc_u_common *tp_c;
+   unsigned int h;
 
-   tp_c = tp->q->u32_node;
+   tp_c = tc_u_common_find(tp);
 
root_ht = kzalloc(sizeof(*root_ht), GFP_KERNEL);
if (root_ht == NULL)
@@ -345,7 +376,10 @@ static int u32_init(struct tcf_proto *tp)
return -ENOBUFS;
}
tp_c->q = tp->q;
-   tp->q->u32_node = tp_c;
+   INIT_HLIST_NODE(&tp_c->hnode);
+
+   h = tc_u_hash(tp);
+   hlist_add_head(&tp_c->hnode, &tc_u_common_hash[h]);
}
 
tp_c->refcnt++;
@@ -585,7 +619,7 @@ static void u32_destroy(struct tcf_proto *tp)
if (--tp_c->refcnt == 0) {
struct tc_u_hnode *ht;
 
-   tp->q->u32_node = NULL;
+   hlist_del(&tp_c->hnode);
 
for (ht = rtnl_dereference(tp_c->hlist);
 ht;
@@ -1213,6 +1247,8 @@ static struct tcf_proto_ops cls_u32_ops __read_mostly = {
 
 static int __init init_u32(void)
 {
+   int i, ret;
+
pr_info("u32 classifier\n");
 #ifdef CONFIG_CLS_U32_PERF
pr_info("Performance counters on\n");
@@ -1223,12 +1259,25 @@ static int __init init_u32(void)
 #ifdef CONFIG_NET_CLS_ACT
pr_info("Actions configured\n");
 #endif
-   return register_tcf_proto_ops(&cls_u32_ops);
+   tc_u_common_hash = kvmalloc_array(U32_HASH_SIZE,
+ sizeof(struct hlist_head),
+ GFP_KERNEL);
+   if (!tc_u_common_hash)
+   return -ENOMEM;
+
+   for (i = 0; i < U32_HASH_SIZE; i++)
+   INIT_HLIST_HEAD(&tc_u_common_hash[i]);
+
+   ret = register_tcf_proto_ops(&cls_u32_ops);
+   if (ret)
+   kvfree(tc_u_common_hash);
+   return ret;
 }
 
 static void __exit exit_u32(void)
 {
unregister_tcf_proto_ops(&cls_u32_ops);
+ 

[Patch net-next v2 0/4] net_sched: clean up tc classes and u32 filter

2017-08-24 Thread Cong Wang
Patch 1 and patch 2 prepare for patch 3. Major changes
are in patch 3 and patch 4, details are there too.

Cong Wang (4):
  net_sched: get rid of more forward declarations
  net_sched: introduce tclass_del_notify()
  net_sched: remove tc class reference counting
  net_sched: kill u32_node pointer in Qdisc

---
v2: Add patch 1 and 2, group all into a patchset
Fix a coding style issue in patch 4

 include/net/sch_generic.h |   4 +-
 net/sched/cls_api.c   |  17 +-
 net/sched/cls_u32.c   |  57 ++-
 net/sched/sch_api.c   | 408 --
 net/sched/sch_atm.c   |  30 ++--
 net/sched/sch_cbq.c   |  41 +
 net/sched/sch_drr.c   |  30 +---
 net/sched/sch_dsmark.c|  17 +-
 net/sched/sch_fq_codel.c  |   9 +-
 net/sched/sch_hfsc.c  |  32 +---
 net/sched/sch_htb.c   |  33 +---
 net/sched/sch_ingress.c   |  20 +--
 net/sched/sch_mq.c|   9 +-
 net/sched/sch_mqprio.c|   9 +-
 net/sched/sch_multiq.c|  11 +-
 net/sched/sch_netem.c |   9 +-
 net/sched/sch_prio.c  |  11 +-
 net/sched/sch_qfq.c   |  30 +---
 net/sched/sch_red.c   |   9 +-
 net/sched/sch_sfb.c   |   9 +-
 net/sched/sch_sfq.c   |   9 +-
 net/sched/sch_tbf.c   |   9 +-
 22 files changed, 361 insertions(+), 452 deletions(-)

-- 
2.13.0



[PATCH] netvsc: fix deadlock betwen link status and removal

2017-08-24 Thread Stephen Hemminger
There is a deadlock possible when canceling the link status
delayed work queue. The removal process is run with RTNL held,
and the link status callback is acquring RTNL.

Resolve the issue by using trylock and rescheduling.
If cancel is in process, that block it from happening.

Fixes: 122a5f6410f4 ("staging: hv: use delayed_work for netvsc_send_garp()")
Signed-off-by: Stephen Hemminger 
---
This should be queued for stable as well. Problem has existed back to 3.10

 drivers/net/hyperv/netvsc_drv.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 0d78727f1a14..d91cbc6c3ca4 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1269,7 +1269,12 @@ static void netvsc_link_change(struct work_struct *w)
bool notify = false, reschedule = false;
unsigned long flags, next_reconfig, delay;
 
-   rtnl_lock();
+   /* if changes are happening, comeback later */
+   if (!rtnl_trylock()) {
+   schedule_delayed_work(&ndev_ctx->dwork, LINKCHANGE_INT);
+   return;
+   }
+
net_device = rtnl_dereference(ndev_ctx->nvdev);
if (!net_device)
goto out_unlock;
-- 
2.11.0



Re: [PATCH net] bpf: fix bpf_setsockopts return value

2017-08-24 Thread Lawrence Brakmo


On 8/24/17, 4:37 PM, "Daniel Borkmann"  wrote:

On 08/25/2017 12:48 AM, Yuchung Cheng wrote:
> This patch fixes a bug causing any sock operations to always return 
EINVAL.
>
> Fixes: a5192c52377e ("bpf: fix to bpf_setsockops").
> Reported-by: Neal Cardwell 
> Signed-off-by: Yuchung Cheng 
> Acked-by: Neal Cardwell 
> Acked-by: Craig Gallek 

Ohh well, thanks for the fix!

Acked-by: Daniel Borkmann 

Thanks! I never checked return values in by bpf test programs

Acked-by: Lawrence Brakmo  
   



Re: [PATCH net] bpf: fix bpf_setsockopts return value

2017-08-24 Thread Daniel Borkmann

On 08/25/2017 12:48 AM, Yuchung Cheng wrote:

This patch fixes a bug causing any sock operations to always return EINVAL.

Fixes: a5192c52377e ("bpf: fix to bpf_setsockops").
Reported-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Acked-by: Neal Cardwell 
Acked-by: Craig Gallek 


Ohh well, thanks for the fix!

Acked-by: Daniel Borkmann 


[PATCH net-next] tg3: Be drop monitor friendly

2017-08-24 Thread Florian Fainelli
tg3_tx() does the normal packet TX completion,
tigon3_dma_hwbug_workaround() needs to allocate a new SKB that is
suitable for the DMA hardware bug, and finally tg3_free_rings() is doing
ring cleanup. Use dev_consume_skb_any() for these 3 locations to be SKB
drop monitor friendly.

Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/broadcom/tg3.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c 
b/drivers/net/ethernet/broadcom/tg3.c
index d600c41fb1dc..8b8da7e32e24 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -6587,7 +6587,7 @@ static void tg3_tx(struct tg3_napi *tnapi)
pkts_compl++;
bytes_compl += skb->len;
 
-   dev_kfree_skb_any(skb);
+   dev_consume_skb_any(skb);
 
if (unlikely(tx_bug)) {
tg3_tx_recover(tp);
@@ -7829,7 +7829,7 @@ static int tigon3_dma_hwbug_workaround(struct tg3_napi 
*tnapi,
}
}
 
-   dev_kfree_skb_any(skb);
+   dev_consume_skb_any(skb);
*pskb = new_skb;
return ret;
 }
@@ -8543,7 +8543,7 @@ static void tg3_free_rings(struct tg3 *tp)
tg3_tx_skb_unmap(tnapi, i,
 skb_shinfo(skb)->nr_frags - 1);
 
-   dev_kfree_skb_any(skb);
+   dev_consume_skb_any(skb);
}
netdev_tx_reset_queue(netdev_get_tx_queue(tp->dev, j));
}
-- 
2.9.3



Re: [PATCH] isdn: hisax: fix buffer overflow check

2017-08-24 Thread Arnd Bergmann
On Fri, Aug 25, 2017 at 12:58 AM, Arnd Bergmann  wrote:
> gcc-8 warns about a corner case that can overflow a memcpy buffer when a
> length variable is negative. While the code checks for an overly large
> value, it does not check for a negative length that would get turned
> into a large positive number:
>
> In function 'memcpy',
> inlined from 'skb_put_data' at include/linux/skbuff.h:2042:2,
> inlined from 'l3dss1_cmd_global' at drivers/isdn/hisax/l3dss1.c:2219:4:
> include/linux/string.h:348:9: error: '__builtin_memcpy' reading 266 or more 
> bytes from a region of size 265 [-Werror=stringop-overflow=]
>
> In function 'memcpy',
> inlined from 'skb_put_data' at include/linux/skbuff.h:2042:2,
> inlined from 'l3ni1_cmd_global' at drivers/isdn/hisax/l3ni1.c:2079:4:
> include/linux/string.h:348:9: error: '__builtin_memcpy' reading between 266 
> and 4294967295 bytes from a region of size 265 [-Werror=stringop-overflow=]
>
> It's not clear to me whether the warning should be here, or if this
> is another case of an optimization step in gcc causing a warning about
> something that would otherwise be silently ignored. Either way, making
> the length 'unsigned int' instead ensures that no overflow can happen
> here, and avoids the warning. The same code exists in two files, so I'm
> patching both the same way.
>
> Signed-off-by: Arnd Bergmann 

Sorry, I sent this out too early (trying to get fixes posted before my
vacation), please ignore this patch, it doesn't fix all the warnings I get
for this overflow problem.

 Arnd


[PATCH net-next] net: mv643xx_eth: Be drop monitor friendly

2017-08-24 Thread Florian Fainelli
txq_reclaim() does the normal transmit queue reclamation and
rxq_deinit() does the RX ring cleanup, none of these are packet drops,
so use dev_consume_skb() for both locations.

Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/marvell/mv643xx_eth.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c 
b/drivers/net/ethernet/marvell/mv643xx_eth.c
index 9c94ea9b2b80..7655616bffef 100644
--- a/drivers/net/ethernet/marvell/mv643xx_eth.c
+++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
@@ -1123,7 +1123,7 @@ static int txq_reclaim(struct tx_queue *txq, int budget, 
int force)
struct sk_buff *skb = __skb_dequeue(&txq->tx_skb);
 
if (!WARN_ON(!skb))
-   dev_kfree_skb(skb);
+   dev_consume_skb(skb);
}
 
if (cmd_sts & ERROR_SUMMARY) {
@@ -2026,7 +2026,7 @@ static void rxq_deinit(struct rx_queue *rxq)
 
for (i = 0; i < rxq->rx_ring_size; i++) {
if (rxq->rx_skb[i]) {
-   dev_kfree_skb(rxq->rx_skb[i]);
+   dev_consume_skb(rxq->rx_skb[i]);
rxq->rx_desc_count--;
}
}
-- 
2.9.3



[PATCH net] net: systemport: Free DMA coherent descriptors on errors

2017-08-24 Thread Florian Fainelli
In case bcm_sysport_init_tx_ring() is not able to allocate ring->cbs, we
would return with an error, and call bcm_sysport_fini_tx_ring() and it
would see that ring->cbs is NULL and do nothing. This would leak the
coherent DMA descriptor area, so we need to free it on error before
returning.

Reported-by: Eric Dumazet 
Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC 
driver")
Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c 
b/drivers/net/ethernet/broadcom/bcmsysport.c
index e6add99cc31c..c28fa5a8734c 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -1346,6 +1346,8 @@ static int bcm_sysport_init_tx_ring(struct 
bcm_sysport_priv *priv,
 
ring->cbs = kcalloc(size, sizeof(struct bcm_sysport_cb), GFP_KERNEL);
if (!ring->cbs) {
+   dma_free_coherent(kdev, sizeof(struct dma_desc),
+ ring->desc_cpu, ring->desc_dma);
netif_err(priv, hw, priv->netdev, "CB allocation failed\n");
return -ENOMEM;
}
-- 
2.9.3



[PATCH] isdn: hisax: fix buffer overflow check

2017-08-24 Thread Arnd Bergmann
gcc-8 warns about a corner case that can overflow a memcpy buffer when a
length variable is negative. While the code checks for an overly large
value, it does not check for a negative length that would get turned
into a large positive number:

In function 'memcpy',
inlined from 'skb_put_data' at include/linux/skbuff.h:2042:2,
inlined from 'l3dss1_cmd_global' at drivers/isdn/hisax/l3dss1.c:2219:4:
include/linux/string.h:348:9: error: '__builtin_memcpy' reading 266 or more 
bytes from a region of size 265 [-Werror=stringop-overflow=]

In function 'memcpy',
inlined from 'skb_put_data' at include/linux/skbuff.h:2042:2,
inlined from 'l3ni1_cmd_global' at drivers/isdn/hisax/l3ni1.c:2079:4:
include/linux/string.h:348:9: error: '__builtin_memcpy' reading between 266 and 
4294967295 bytes from a region of size 265 [-Werror=stringop-overflow=]

It's not clear to me whether the warning should be here, or if this
is another case of an optimization step in gcc causing a warning about
something that would otherwise be silently ignored. Either way, making
the length 'unsigned int' instead ensures that no overflow can happen
here, and avoids the warning. The same code exists in two files, so I'm
patching both the same way.

Signed-off-by: Arnd Bergmann 
---
 drivers/isdn/hisax/l3dss1.c | 3 ++-
 drivers/isdn/hisax/l3ni1.c  | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/isdn/hisax/l3dss1.c b/drivers/isdn/hisax/l3dss1.c
index 18a3484b1f7e..85cef7a2e709 100644
--- a/drivers/isdn/hisax/l3dss1.c
+++ b/drivers/isdn/hisax/l3dss1.c
@@ -2168,7 +2168,8 @@ static int l3dss1_cmd_global(struct PStack *st, isdn_ctrl 
*ic)
 { u_char id;
u_char temp[265];
u_char *p = temp;
-   int i, l, proc_len;
+   int i;
+   unsigned int l, proc_len;
struct sk_buff *skb;
struct l3_process *pc = NULL;
 
diff --git a/drivers/isdn/hisax/l3ni1.c b/drivers/isdn/hisax/l3ni1.c
index ea311e7df48e..99e0ba5d49a1 100644
--- a/drivers/isdn/hisax/l3ni1.c
+++ b/drivers/isdn/hisax/l3ni1.c
@@ -2024,7 +2024,8 @@ static int l3ni1_cmd_global(struct PStack *st, isdn_ctrl 
*ic)
 { u_char id;
u_char temp[265];
u_char *p = temp;
-   int i, l, proc_len;
+   int i;
+   unsigned int l, proc_len;
struct sk_buff *skb;
struct l3_process *pc = NULL;
 
-- 
2.9.0



[PATCH net] net: bcmgenet: Be drop monitor friendly

2017-08-24 Thread Florian Fainelli
There are 3 spots where we call dev_kfree_skb() but we are actually
just doing a normal SKB consumption: __bcmgenet_tx_reclaim() for normal
TX reclamation, bcmgenet_alloc_rx_buffers() during the initial RX ring
setup and bcmgenet_free_rx_buffers() during RX ring cleanup.

Fixes: d6707bec5986 ("net: bcmgenet: rewrite bcmgenet_rx_refill()")
Fixes: f48bed16a756 ("net: bcmgenet: Free skb after last Tx frag")
Signed-off-by: Florian Fainelli 
---
David, I tagged these two commits, but this can actually go back
to when the driver was first merged. Let me know if you would want
me to provide -stable backports against other trees.

Thanks!

 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c 
b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index a981c4ee9d72..fea3f9a5fb2d 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -1360,7 +1360,7 @@ static unsigned int __bcmgenet_tx_reclaim(struct 
net_device *dev,
if (skb) {
pkts_compl++;
bytes_compl += GENET_CB(skb)->bytes_sent;
-   dev_kfree_skb_any(skb);
+   dev_consume_skb_any(skb);
}
 
txbds_processed++;
@@ -1875,7 +1875,7 @@ static int bcmgenet_alloc_rx_buffers(struct bcmgenet_priv 
*priv,
cb = ring->cbs + i;
skb = bcmgenet_rx_refill(priv, cb);
if (skb)
-   dev_kfree_skb_any(skb);
+   dev_consume_skb_any(skb);
if (!cb->skb)
return -ENOMEM;
}
@@ -1894,7 +1894,7 @@ static void bcmgenet_free_rx_buffers(struct bcmgenet_priv 
*priv)
 
skb = bcmgenet_free_rx_cb(&priv->pdev->dev, cb);
if (skb)
-   dev_kfree_skb_any(skb);
+   dev_consume_skb_any(skb);
}
 }
 
-- 
2.9.3



RE: [PATCH net-next v3 2/7] netvsc: fix warnings reported by lockdep

2017-08-24 Thread Stephen Hemminger
These are false positives; lockdep is complaining about things that are safe
It is just that annotations were missing or incorrect.

-Original Message-
From: David Woodhouse [mailto:dw...@infradead.org] 
Sent: Thursday, August 24, 2017 1:37 AM
To: Stephen Hemminger ; KY Srinivasan 
; Haiyang Zhang ; Stephen Hemminger 

Cc: de...@linuxdriverproject.org; netdev@vger.kernel.org
Subject: Re: [PATCH net-next v3 2/7] netvsc: fix warnings reported by lockdep

On Fri, 2017-07-28 at 08:59 -0700, Stephen Hemminger wrote:
> This includes a bunch of fixups for issues reported by
> lockdep.
>    * ethtool routines can assume RTNL
>    * send is done with RCU lock (and BH disable)
>    * avoid refetching internal device struct (netvsc)
>  instead pass it as a parameter.
> 
> Signed-off-by: Stephen Hemminger 

The subject and the commit message are inconsistent — is this fixing
*warnings* (i.e. shut up a false positive), or is it fixing *issues*?
It looks like it's actually fixing issues, not just warnings?

It's really useful to get that right.

FWIW the reason I'm looking in my netdev@ folder for lockdep warning
fixes is because I was trying to confirm whether the commit message in
https://patchwork.kernel.org/patch/4372391/ is actually telling the
truth or not — in that case I think it *is* just a false positive being
shut up (and thus it's OK to say "fix warning"), not really fixing a
true issue.


[PATCH net] bpf: fix bpf_setsockopts return value

2017-08-24 Thread Yuchung Cheng
This patch fixes a bug causing any sock operations to always return EINVAL.

Fixes: a5192c52377e ("bpf: fix to bpf_setsockops").
Reported-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Acked-by: Neal Cardwell 
Acked-by: Craig Gallek 
---
 net/core/filter.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 6280a602604c..8eb81e5fae08 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2872,7 +2872,6 @@ BPF_CALL_5(bpf_setsockopt, struct bpf_sock_ops_kern *, 
bpf_sock,
ret = -EINVAL;
}
}
-   ret = -EINVAL;
 #endif
} else {
ret = -EINVAL;
-- 
2.14.1.342.g6490525c54-goog



Re: [PATCH net-next 3/3 v7] drivers: net: ethernet: qualcomm: rmnet: Initial implementation

2017-08-24 Thread Subash Abhinov Kasiviswanathan

+
+CFLAGS_rmnet.o := -I$(src)


You do not need this CFLAGS rule, the local include files are included
using "" double quotes so it uses the local directory always.


Hi David

I'll remove this.


+static void rmnet_free_later(struct work_struct *work)
+{
+   struct rmnet_free_work *fwork;
+
+   fwork = container_of(work, struct rmnet_free_work, work);
+
+   rtnl_lock();
+   rmnet_delink(fwork->rmnet_dev, NULL);
+   rtnl_unlock();
+
+   kfree(fwork);
+}


This is racy and doesn't work properly.

When you schedule this work, the RTNL mutex is dropped.  Meanwhile
another request can come in the associate this device.

Your work function will still run and erroneously unlink the object.

Furthermore, during this time that the RTNL mutex is dropped, people
will see the unassociated device in the lists.

You have to atomically remove the object from all possible locations
which provide external visibility of that object, before the RTNL
mutex is dropped.

So you can defer the freeing, but you cannot defer the unlink
operation.


I had incorrectly assumed earlier that the check in rtnl_newlink for
NLM_F_EXCL would guard against the scenario of re-associating a
device which was unlinked.



You probably need to move to RCU as well in order for all of this to
work properly since scans of the lists occur in the data path which
is completely asynchronous and not protected by the RTNL mutex.


I'll remove all the rtnl locks and checks and switch to rcu and post
an update.

--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project


Re: [iproute PATCH v3 0/6] Covscan: Misc fixes

2017-08-24 Thread Stephen Hemminger
On Thu, 24 Aug 2017 11:41:25 +0200
Phil Sutter  wrote:

> This series collects patches from v1 addressing miscellaneous issues
> detected by covscan.
> 
> Changes since v2:
> - Dropped patch 1 since v2 discussion is still inconclusive.
> - Replaced patch 2 by a more appropriate one given feedback from v2.
> 
> Phil Sutter (6):
>   ss: Make struct tcpstat fields 'timer' and 'timeout' unsigned
>   ss: Make sure scanned index value to unix_state_map is sane
>   netem/maketable: Check return value of fscanf()
>   lib/bpf: Check return value of write()
>   lib/fs: Fix and simplify make_path()
>   lib/libnetlink: Don't pass NULL parameter to memcpy()
> 
>  lib/bpf.c |  3 ++-
>  lib/fs.c  | 20 +---
>  lib/libnetlink.c  |  6 --
>  misc/ss.c | 11 +--
>  netem/maketable.c |  4 ++--
>  5 files changed, 18 insertions(+), 26 deletions(-)
> 

Applied this group


Re: [iproute PATCH v3 6/6] lib/libnetlink: Don't pass NULL parameter to memcpy()

2017-08-24 Thread Stephen Hemminger
On Thu, 24 Aug 2017 11:41:31 +0200
Phil Sutter  wrote:

> Both addattr_l() and rta_addattr_l() may be called with NULL data
> pointer and 0 alen parameters. Avoid calling memcpy() in that case.
> 
> Signed-off-by: Phil Sutter 
> ---
>  lib/libnetlink.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/libnetlink.c b/lib/libnetlink.c
> index 874e660be7eb4..fbe719ee10449 100644
> --- a/lib/libnetlink.c
> +++ b/lib/libnetlink.c
> @@ -871,7 +871,8 @@ int addattr_l(struct nlmsghdr *n, int maxlen, int type, 
> const void *data,
>   rta = NLMSG_TAIL(n);
>   rta->rta_type = type;
>   rta->rta_len = len;
> - memcpy(RTA_DATA(rta), data, alen);
> + if (alen)
> + memcpy(RTA_DATA(rta), data, alen);
>   n->nlmsg_len = NLMSG_ALIGN(n->nlmsg_len) + RTA_ALIGN(len);
>   return 0;
>  }
> @@ -958,7 +959,8 @@ int rta_addattr_l(struct rtattr *rta, int maxlen, int 
> type,
>   subrta = (struct rtattr *)(((char *)rta) + RTA_ALIGN(rta->rta_len));
>   subrta->rta_type = type;
>   subrta->rta_len = len;
> - memcpy(RTA_DATA(subrta), data, alen);
> + if (alen)
> + memcpy(RTA_DATA(subrta), data, alen);
>   rta->rta_len = NLMSG_ALIGN(rta->rta_len) + RTA_ALIGN(len);
>   return 0;
>  }

Ok, applied. You never know when GCC language experts might decide
to exploit undefined behavior.


[PATCH net] net: systemport: Be drop monitor friendly

2017-08-24 Thread Florian Fainelli
Utilize dev_consume_skb_any(cb->skb) in bcm_sysport_free_cb() which is
used when a TX packet is completed, as well as when the RX ring is
cleaned on shutdown. None of these two cases are packet drops, so be
drop monitor friendly.

Suggested-by: Eric Dumazet 
Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC 
driver")
Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c 
b/drivers/net/ethernet/broadcom/bcmsysport.c
index dc3052751bc1..e6add99cc31c 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -597,7 +597,7 @@ static int bcm_sysport_set_coalesce(struct net_device *dev,
 
 static void bcm_sysport_free_cb(struct bcm_sysport_cb *cb)
 {
-   dev_kfree_skb_any(cb->skb);
+   dev_consume_skb_any(cb->skb);
cb->skb = NULL;
dma_unmap_addr_set(cb, dma_addr, 0);
 }
-- 
2.9.3



Re: [iproute PATCH v4 0/6] Covscan: Fixes for string termination

2017-08-24 Thread Stephen Hemminger
On Thu, 24 Aug 2017 11:51:44 +0200
Phil Sutter  wrote:

> This series collects patches from v1 dealing with code potentially
> leaving string buffers unterminated. This does not include situations
> where it happens for parsed interface names since an overall solution
> was attempted for that and it's state is still unclear due to lack of
> feedback from upstream.
> 
> Changes since v3:
> - Dropped patch 2 since upstream discussion in v3 is not conclusive yet.
> 
> Phil Sutter (6):
>   ipntable: Avoid memory allocation for filter.name
>   lib/fs: Fix format string in find_fs_mount()
>   lib/inet_proto: Review inet_proto_{a2n,n2a}()
>   lnstat_util: Simplify alloc_and_open() a bit
>   tc/m_xt: Fix for potential string buffer overflows
>   lib/ll_map: Choose size of new cache items at run-time
> 
>  ip/ipntable.c  |  6 +++---
>  lib/fs.c   |  2 +-
>  lib/inet_proto.c   | 24 +---
>  lib/ll_map.c   |  4 ++--
>  misc/lnstat_util.c |  7 ++-
>  tc/m_xt.c  |  7 ---
>  6 files changed, 25 insertions(+), 25 deletions(-)
> 

Applied.


Re: [PATCH net-next 09/13] net: mvpp2: dynamic reconfiguration of the PHY mode

2017-08-24 Thread Russell King - ARM Linux
On Thu, Aug 24, 2017 at 07:45:19PM +0200, Andrew Lunn wrote:
> > The 88x3310 driver in the kernel knows about these combinations and
> > sets the phy interface parameter correctly depending on whether the
> > PHY has configured itself for copper at whatever speed or SFP+.
> 
> So when the PHY decides to swap from copper to fibre etc, is the
> phylib state machine kept up to date. Does it see a down, followed by
> an up?

I'd have to re-check to make sure, but I believe it does, because the
negotiation is held off on the "other" media until the currently active
link has gone down.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


Re: [PATCH v4] net: sunrpc: svcsock: fix NULL-pointer exception

2017-08-24 Thread J. Bruce Fields
On Wed, Aug 23, 2017 at 06:33:42AM -0400, Jeff Layton wrote:
> I think this one looks fine. Bruce, do you mind picking this one up? It
> might also be reasonable for stable...

Yep, thanks to you both, I'll plan to send a pull request tonight or
tomorrow.

--b.


Re: [Patch net-next] net_sched: remove tc class reference counting

2017-08-24 Thread Cong Wang
On Thu, Aug 24, 2017 at 12:32 PM, David Miller  wrote:
> From: Cong Wang 
> Date: Tue, 22 Aug 2017 21:38:10 -0700
>
>> For TC classes, their ->get() and ->put() are always paired, and the
>> reference counting is completely useless, because:
>>
>> 1) For class modification and dumping paths, we already hold RTNL lock,
>>so all of these ->get(),->change(),->put() are atomic.
>>
>> 2) For filter bindiing/unbinding, we use other reference counter than
>>this one, and they should have RTNL lock too.
>>
>> 3) For ->qlen_notify(), it is special because it is called on ->enqueue()
>>path, but we already hold qdisc tree lock there, and we hold this
>>tree lock when graft or delete the class too, so it should not be gone
>>or changed until we release the tree lock.
>>
>> Therefore, this patch removes ->get() and ->put(), but:
>>
>> 1) Adds a new ->find() to find the pointer to a class by classid, no
>>refcnt.
>>
>> 2) Move the original class destroy upon the last refcnt into ->delete(),
>>right after releasing tree lock. This is fine because the class is
>>already removed from hash when holding the lock.
>>
>> For those who also use ->put() as ->unbind(), just rename them to reflect
>> this change.
>>
>> Cc: Jamal Hadi Salim 
>> Signed-off-by: Cong Wang 
>
> This one looks good to me.
>
> Jamal, please give it a quick look over and review.
>

Hold on, I spot a problem near ->delete(), need a fix like
I did for tc actions...

The reason why I didn't catch it in my test is actually we
don't notify delete event for classes when deleting the
whole tree.

Anyway, v2 is coming.


[PATCH] cfg80211: Use tabs for identation

2017-08-24 Thread Himanshu Jha
Removed whitespaces and added necessary tabs conforming to coding style.
Issue found using checkpatch.pl

Signed-off-by: Himanshu Jha 
---
 net/wireless/chan.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/wireless/chan.c b/net/wireless/chan.c
index b8aa5a7..13c95e5 100644
--- a/net/wireless/chan.c
+++ b/net/wireless/chan.c
@@ -984,9 +984,9 @@ int cfg80211_set_monitor_channel(struct 
cfg80211_registered_device *rdev,
 
 void
 cfg80211_get_chan_state(struct wireless_dev *wdev,
-   struct ieee80211_channel **chan,
-   enum cfg80211_chan_mode *chanmode,
-   u8 *radar_detect)
+   struct ieee80211_channel **chan,
+   enum cfg80211_chan_mode *chanmode,
+   u8 *radar_detect)
 {
int ret;
 
-- 
2.7.4



Re: [iproute PATCH v4 0/4] Covscan: Fix potential NULL pointer dereferences

2017-08-24 Thread Stephen Hemminger
On Thu, 24 Aug 2017 11:46:30 +0200
Phil Sutter  wrote:

> This series collects patches from v1 which eliminate possible cases of
> NULL pointer dereferences.
> 
> Changes since v3:
> - Dropped upstream rejected patch 2.
> 
> Phil Sutter (4):
>   ifstat, nstat: Check fdopen() return value
>   tc/q_netem: Don't dereference possibly NULL pointer
>   tc/tc_filter: Make sure filter name is not empty
>   tipc/bearer: Prevent NULL pointer dereference
> 
>  misc/ifstat.c  | 16 +++-
>  misc/nstat.c   | 16 +++-
>  tc/q_netem.c   |  3 ++-
>  tc/tc_filter.c |  3 +++
>  tipc/bearer.c  |  2 +-
>  5 files changed, 28 insertions(+), 12 deletions(-)
> 

Applied. Thanks


Re: [PATCH net] virtio_net: be drop monitor friend

2017-08-24 Thread Michael S. Tsirkin
On Thu, Aug 24, 2017 at 09:02:49AM -0700, Eric Dumazet wrote:
> From: Eric Dumazet 
> 
> This change is needed to not fool drop monitor.
> (perf record ... -e skb:kfree_skb )
> 
> Packets were properly sent and are consumed after TX completion.
> 
> Signed-off-by: Eric Dumazet 

Acked-by: Michael S. Tsirkin 

> ---
>  drivers/net/virtio_net.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 98f17b05c68b..b06169ea60dc 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1058,7 +1058,7 @@ static void free_old_xmit_skbs(struct send_queue *sq)
>   bytes += skb->len;
>   packets++;
>  
> - dev_kfree_skb_any(skb);
> + dev_consume_skb_any(skb);
>   }
>  
>   /* Avoid overhead when no packets have been processed
> 


[PATCH] strparser: initialize all callbacks

2017-08-24 Thread Eric Biggers
From: Eric Biggers 

commit bbb03029a899 ("strparser: Generalize strparser") added more
function pointers to 'struct strp_callbacks'; however, kcm_attach() was
not updated to initialize them.  This could cause the ->lock() and/or
->unlock() function pointers to be set to garbage values, causing a
crash in strp_work().

Fix the bug by moving the callback structs into static memory, so
unspecified members are zeroed.  Also constify them while we're at it.

This bug was found by syzkaller, which encountered the following splat:

IP: 0x55
PGD 3b1ca067
P4D 3b1ca067
PUD 3b12f067
PMD 0

Oops: 0010 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 2 PID: 1194 Comm: kworker/u8:1 Not tainted 4.13.0-rc4-next-20170811 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: kstrp strp_work
task: 88006bb0e480 task.stack: 88006bb1
RIP: 0010:0x55
RSP: 0018:88006bb17540 EFLAGS: 00010246
RAX: dc00 RBX: 88006ce4bd60 RCX: 
RDX: 11000d9c97bd RSI:  RDI: 88006ce4bc48
RBP: 88006bb17558 R08: 81467ab2 R09: 
R10: 88006bb17438 R11: 88006bb17940 R12: 88006ce4bc48
R13: 88003c683018 R14: 88006bb17980 R15: 88003c683000
FS:  () GS:88006de0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0055 CR3: 3c145000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 process_one_work+0xbf3/0x1bc0 kernel/workqueue.c:2098
 worker_thread+0x223/0x1860 kernel/workqueue.c:2233
 kthread+0x35e/0x430 kernel/kthread.c:231
 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
Code:  Bad RIP value.
RIP: 0x55 RSP: 88006bb17540
CR2: 0055
---[ end trace f0e4920047069cee ]---

Here is a C reproducer (requires CONFIG_BPF_SYSCALL=y and
CONFIG_AF_KCM=y):

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

static const struct bpf_insn bpf_insns[3] = {
{ .code = 0xb7 }, /* BPF_MOV64_IMM(0, 0) */
{ .code = 0x95 }, /* BPF_EXIT_INSN() */
};

static const union bpf_attr bpf_attr = {
.prog_type = 1,
.insn_cnt = 2,
.insns = (uintptr_t)&bpf_insns,
.license = (uintptr_t)"",
};

int main(void)
{
int bpf_fd = syscall(__NR_bpf, BPF_PROG_LOAD,
 &bpf_attr, sizeof(bpf_attr));
int inet_fd = socket(AF_INET, SOCK_STREAM, 0);
int kcm_fd = socket(AF_KCM, SOCK_DGRAM, 0);

ioctl(kcm_fd, SIOCKCMATTACH,
  &(struct kcm_attach) { .fd = inet_fd, .bpf_fd = bpf_fd });
}

Fixes: bbb03029a899 ("strparser: Generalize strparser")
Cc: Dmitry Vyukov 
Cc: Tom Herbert 
Signed-off-by: Eric Biggers 
---
 Documentation/networking/strparser.txt |  2 +-
 include/net/strparser.h|  2 +-
 kernel/bpf/sockmap.c   | 10 +-
 net/kcm/kcmsock.c  | 11 +--
 net/strparser/strparser.c  |  2 +-
 5 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/Documentation/networking/strparser.txt 
b/Documentation/networking/strparser.txt
index fe01302471ae..13081b3decef 100644
--- a/Documentation/networking/strparser.txt
+++ b/Documentation/networking/strparser.txt
@@ -35,7 +35,7 @@ Functions
 =
 
 strp_init(struct strparser *strp, struct sock *sk,
- struct strp_callbacks *cb)
+ const struct strp_callbacks *cb)
 
  Called to initialize a stream parser. strp is a struct of type
  strparser that is allocated by the upper layer. sk is the TCP
diff --git a/include/net/strparser.h b/include/net/strparser.h
index 4fe966a0ad92..7dc131d62ad5 100644
--- a/include/net/strparser.h
+++ b/include/net/strparser.h
@@ -138,7 +138,7 @@ void strp_done(struct strparser *strp);
 void strp_stop(struct strparser *strp);
 void strp_check_rcv(struct strparser *strp);
 int strp_init(struct strparser *strp, struct sock *sk,
- struct strp_callbacks *cb);
+ const struct strp_callbacks *cb);
 void strp_data_ready(struct strparser *strp);
 int strp_process(struct strparser *strp, struct sk_buff *orig_skb,
 unsigned int orig_offset, size_t orig_len,
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 78b2bb9370ac..617c239590c2 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -368,12 +368,12 @@ static int smap_read_sock_done(struct strparser *strp, 
int err)
 static int smap_init_sock(struct smap_psock *psock,
  struct sock *sk)
 {
-   struct strp_callbacks cb;
+   static const struct strp_callbacks cb = {

Re: [PATCH][next] net: hinic: fix comparison of a uint16_t type with -1

2017-08-24 Thread David Miller
From: Colin King 
Date: Wed, 23 Aug 2017 16:39:36 +0100

> From: Colin Ian King 
> 
> The comparison of hw_ioctxt.rx_buf_sz_idx == -1 is always false because
> rx_buf_sz_idx is a uint16_t. Fix this by explicitly casting -1 to uint16_t.
> 
> Detected by CoverityScan, CID#1454559 ("Operands don't affect result")
> 
> Signed-off-by: Colin Ian King 
> ---
>  drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c 
> b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
> index 09dec6de8dd5..71e26070fb7f 100644
> --- a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
> +++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
> @@ -352,7 +352,7 @@ static int set_hw_ioctxt(struct hinic_hwdev *hwdev, 
> unsigned int rq_depth,
>   }
>   }
>  
> - if (hw_ioctxt.rx_buf_sz_idx == -1)
> + if (hw_ioctxt.rx_buf_sz_idx == (uint16_t)-1)
>   return -EINVAL;
>  
>   hw_ioctxt.sq_depth  = ilog2(sq_depth);

This is really silly.

The code in question is trying to convert a size (HINIC_RX_BUF_SZ)
into a table index, using a loop.

It should just compute this at compile time and do away with this
overly confusing code.

Even a switch statement would be much better and the compiler
would optimize the whole away into a single assignment in
the generated code.


[PATCH] staging: rtlwifi: Improve debugging by using debugfs

2017-08-24 Thread Larry Finger
The changes in this commit are also being sent to the main rtlwifi
drivers in wireless-next; however, these changes will also be useful for
any debugging of r8822be before it gets moved into the main tree.

Use debugfs to dump register and btcoex status, and also write registers
and h2c.

We create topdir in /sys/kernel/debug/rtlwifi/, and use the MAC address
as subdirectory with several entries to dump mac_reg, bb_reg, rf_reg etc.
An example is
/sys/kernel/debug/rtlwifi/00-11-22-33-44-55-66/mac_0

This change permits examination of device registers in a dynamic manner,
a feature not available with the current debug mechanism.

We use seq_file to replace RT_TRACE to dump status, then we can use 'cat'
to access btcoex's status through debugfs.
(i.e. /sys/kernel/debug/rtlwifi/00-11-22-33-44-55-66/btcoex)
Other related changes are
1. implement btc_disp_dbg_msg() to access btcoex's common status.
2. remove obsolete field bt_exist

Signed-off-by: Ping-Ke Shih 
Signed-off-by: Larry Finger 
Cc: Yan-Hsuan Chuang 
Cc: Birming Chiu 
Cc: Shaofu 
Cc: Steven Ting 
---
 drivers/staging/rtlwifi/debug.c | 226 
 1 file changed, 135 insertions(+), 91 deletions(-)

diff --git a/drivers/staging/rtlwifi/debug.c b/drivers/staging/rtlwifi/debug.c
index b9fd47aeaa9b..7446d71c41d1 100644
--- a/drivers/staging/rtlwifi/debug.c
+++ b/drivers/staging/rtlwifi/debug.c
@@ -80,9 +80,11 @@ void _rtl_dbg_print_data(struct rtl_priv *rtlpriv, u64 comp, 
int level,
}
 }
 
-struct rtl_debgufs_priv {
+struct rtl_debugfs_priv {
struct rtl_priv *rtlpriv;
-   int (*cb)(struct seq_file *m, void *v);
+   int (*cb_read)(struct seq_file *m, void *v);
+   ssize_t (*cb_write)(struct file *filp, const char __user *buffer,
+   size_t count, loff_t *loff);
u32 cb_data;
 };
 
@@ -90,9 +92,9 @@ static struct dentry *debugfs_topdir;
 
 static int rtl_debug_get_common(struct seq_file *m, void *v)
 {
-   struct rtl_debgufs_priv *debugfs_priv = m->private;
+   struct rtl_debugfs_priv *debugfs_priv = m->private;
 
-   return debugfs_priv->cb(m, v);
+   return debugfs_priv->cb_read(m, v);
 }
 
 static int dl_debug_open_common(struct inode *inode, struct file *file)
@@ -109,7 +111,7 @@ static const struct file_operations file_ops_common = {
 
 static int rtl_debug_get_mac_page(struct seq_file *m, void *v)
 {
-   struct rtl_debgufs_priv *debugfs_priv = m->private;
+   struct rtl_debugfs_priv *debugfs_priv = m->private;
struct rtl_priv *rtlpriv = debugfs_priv->rtlpriv;
u32 page = debugfs_priv->cb_data;
int i, n;
@@ -126,8 +128,8 @@ static int rtl_debug_get_mac_page(struct seq_file *m, void 
*v)
 }
 
 #define RTL_DEBUG_IMPL_MAC_SERIES(page, addr)  \
-struct rtl_debgufs_priv rtl_debug_priv_mac_ ##page = { \
-   .cb = rtl_debug_get_mac_page,   \
+struct rtl_debugfs_priv rtl_debug_priv_mac_ ##page = { \
+   .cb_read = rtl_debug_get_mac_page,  \
.cb_data = addr,\
 }
 
@@ -150,7 +152,7 @@ RTL_DEBUG_IMPL_MAC_SERIES(17, 0x1700);
 
 static int rtl_debug_get_bb_page(struct seq_file *m, void *v)
 {
-   struct rtl_debgufs_priv *debugfs_priv = m->private;
+   struct rtl_debugfs_priv *debugfs_priv = m->private;
struct rtl_priv *rtlpriv = debugfs_priv->rtlpriv;
struct ieee80211_hw *hw = rtlpriv->hw;
u32 page = debugfs_priv->cb_data;
@@ -168,8 +170,8 @@ static int rtl_debug_get_bb_page(struct seq_file *m, void 
*v)
 }
 
 #define RTL_DEBUG_IMPL_BB_SERIES(page, addr)   \
-struct rtl_debgufs_priv rtl_debug_priv_bb_ ##page = {  \
-   .cb = rtl_debug_get_bb_page,\
+struct rtl_debugfs_priv rtl_debug_priv_bb_ ##page = {  \
+   .cb_read = rtl_debug_get_bb_page,   \
.cb_data = addr,\
 }
 
@@ -192,7 +194,7 @@ RTL_DEBUG_IMPL_BB_SERIES(1f, 0x1f00);
 
 static int rtl_debug_get_reg_rf(struct seq_file *m, void *v)
 {
-   struct rtl_debgufs_priv *debugfs_priv = m->private;
+   struct rtl_debugfs_priv *debugfs_priv = m->private;
struct rtl_priv *rtlpriv = debugfs_priv->rtlpriv;
struct ieee80211_hw *hw = rtlpriv->hw;
enum radio_path rfpath = debugfs_priv->cb_data;
@@ -215,8 +217,8 @@ static int rtl_debug_get_reg_rf(struct seq_file *m, void *v)
 }
 
 #define RTL_DEBUG_IMPL_RF_SERIES(page, addr)   \
-struct rtl_debgufs_priv rtl_debug_priv_rf_ ##page = {  \
-   .cb = rtl_debug_get_reg_rf, \
+struct rtl_debugfs_priv rtl_debug_priv_rf_ ##page = {  \
+   .cb_read = rtl_debug_get_reg_rf,\
.cb_data = addr,\
 }
 
@@ -225,7 +227,7 @@ RTL_DEBUG_IMPL_RF_SERIES(b, RF90_PATH_B);
 
 static int rtl_debug_get_cam_register(struct seq_file *m, void *v)
 {
-   struct rtl_debgufs_priv *debugfs_priv = m->private;
+   struct rtl_debugfs_priv *de

Re: [PATCH net-next 1/3] gre: refactor the gre_fb_xmit

2017-08-24 Thread David Miller

Please repost this series with a proper "[PATCH net-next 0/3] ..." header
posting.

Thank you.


Re: [PATCH net-next 2/5] ipv6: sr: add support for encapsulation of L2 frames

2017-08-24 Thread David Miller
From: David Lebrun 
Date: Wed, 23 Aug 2017 17:32:00 +0200

> + case SEG6_IPTUN_MODE_L2ENCAP:
> + head = sizeof(struct ipv6hdr) + 14;
> + break;

This 14 is ETH_HLEN.

>  
> + if (pskb_expand_head(skb, skb->mac_len, 0, GFP_ATOMIC) < 0)
> + return -ENOMEM;

But you seem to support arbitrary L2 MAC header sizes.

Please sort this out.


Re: [PATCH] tipc: Fix tipc_sk_reinit handling of -EAGAIN

2017-08-24 Thread David Miller
From: Bob Peterson 
Date: Wed, 23 Aug 2017 10:43:02 -0400 (EDT)

> In 9dbbfb0ab6680c6a85609041011484e6658e7d3c function tipc_sk_reinit
> had additional logic added to loop in the event that function
> rhashtable_walk_next() returned -EAGAIN. No worries.
> 
> However, if rhashtable_walk_start returns -EAGAIN, it does "continue",
> and therefore skips the call to rhashtable_walk_stop(). That has
> the effect of calling rcu_read_lock() without its paired call to
> rcu_read_unlock(). Since rcu_read_lock() may be nested, the problem
> may not be apparent for a while, especially since resize events may
> be rare. But the comments to rhashtable_walk_start() state:
> 
>  * ...Note that we take the RCU lock in all
>  * cases including when we return an error.  So you must always call
>  * rhashtable_walk_stop to clean up.
> 
> This patch replaces the continue with a goto and label to ensure a
> matching call to rhashtable_walk_stop().
> 
> Signed-off-by: Bob Peterson 

Applied and queued up for -stable, thanks.


Re: [PATCH] [net-next] qlge: avoid memcpy buffer overflow

2017-08-24 Thread David Miller
From: Arnd Bergmann 
Date: Wed, 23 Aug 2017 15:59:49 +0200

> gcc-8.0.0 (snapshot) points out that we copy a variable-length string
> into a fixed length field using memcpy() with the destination length,
> and that ends up copying whatever follows the string:
> 
> inlined from 'ql_core_dump' at 
> drivers/net/ethernet/qlogic/qlge/qlge_dbg.c:1106:2:
> drivers/net/ethernet/qlogic/qlge/qlge_dbg.c:708:2: error: 'memcpy' reading 15 
> bytes from a region of size 14 [-Werror=stringop-overflow=]
>   memcpy(seg_hdr->description, desc, (sizeof(seg_hdr->description)) - 1);
> 
> Changing it to use strncpy() will instead zero-pad the destination,
> which seems to be the right thing to do here.
> 
> The bug is probably harmless, but it seems like a good idea to address
> it in stable kernels as well, if only for the purpose of building with
> gcc-8 without warnings.
> 
> Cc: sta...@vger.kernel.org

Please don't use explicit stable CC:'ing for networking changes.

> Fixes: a61f80261306 ("qlge: Add ethtool register dump function.")
> Signed-off-by: Arnd Bergmann 

Applied to 'net' and queued up for -stable, thanks.


Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi

2017-08-24 Thread Michael S. Tsirkin
On Thu, Aug 24, 2017 at 04:20:39PM -0400, Willem de Bruijn wrote:
> >> Traffic shaping can introduce msec timescale latencies.
> >>
> >> The delay may actually be a useful signal. If the guest does not
> >> orphan skbs early, TSQ will throttle the socket causing host
> >> queue build up.
> >>
> >> But, if completions are queued in-order, unrelated flows may be
> >> throttled as well. Allowing out of order completions would resolve
> >> this HoL blocking.
> >
> > We can allow out of order, no guests that follow virtio spec
> > will break. But this won't help in all cases
> > - a single slow flow can occupy the whole ring, you will not
> >   be able to make any new buffers available for the fast flow
> > - what host considers a single flow can be multiple flows for guest
> >
> > There are many other examples.
> 
> These examples are due to exhaustion of the fixed ubuf_info pool,
> right?

No - the ring size itself.

> We could use dynamic allocation or a resizable pool if these
> issues are serious enough.

We need some kind of limit on how many requests a guest can queue in the
host, or it's an obvious DoS attack vector. We used the ring size for
that.

> >> > Neither
> >> > do I see why would using tx interrupts within guest be a work around -
> >> > AFAIK windows driver uses tx interrupts.
> >>
> >> It does not address completion latency itself. What I meant was
> >> that in an interrupt-driver model, additional starvation issues,
> >> such as the potential deadlock raised at the start of this thread,
> >> or the timer delay observed before packets were orphaned in
> >> virtio-net in commit b0c39dbdc204, are mitigated.
> >>
> >> Specifically, it breaks the potential deadlock where sockets are
> >> blocked waiting for completions (to free up budget in sndbuf, tsq, ..),
> >> yet completion handling is blocked waiting for a new packet to
> >> trigger free_old_xmit_skbs from start_xmit.
> >
> > This talk of potential deadlock confuses me - I think you mean we would
> > deadlock if we did not orphan skbs in !use_napi - is that right?  If you
> > mean that you can drop skb orphan and this won't lead to a deadlock if
> > free skbs upon a tx interrupt, I agree, for sure.
> 
> Yes, that is what I meant.
> 
> >> >> That is the only thing keeping us from removing the HoL blocking in 
> >> >> vhost-net zerocopy.
> >> >
> >> > We don't enable network watchdog on virtio but we could and maybe
> >> > should.
> >>
> >> Can you elaborate?
> >
> > The issue is that holding onto buffers for very long times makes guests
> > think they are stuck. This is funamentally because from guest point of
> > view this is a NIC, so it is supposed to transmit things out in
> > a timely manner. If host backs the virtual NIC by something that is not
> > a NIC, with traffic shaping etc introducing unbounded latencies,
> > guest will be confused.
> 
> That assumes that guests are fragile in this regard. A linux guest
> does not make such assumptions.

Yes it does. Examples above:
> > - a single slow flow can occupy the whole ring, you will not
> >   be able to make any new buffers available for the fast flow
> > - what host considers a single flow can be multiple flows for guest

it's easier to see if you enable the watchdog timer for virtio. Linux
supports that.

> There are NICs with hardware
> rate limiting, so I'm not sure how much of a leap host os rate
> limiting is.

I don't know what happens if these NICs hold onto packets for seconds.

-- 
MST


Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi

2017-08-24 Thread Willem de Bruijn
>> Traffic shaping can introduce msec timescale latencies.
>>
>> The delay may actually be a useful signal. If the guest does not
>> orphan skbs early, TSQ will throttle the socket causing host
>> queue build up.
>>
>> But, if completions are queued in-order, unrelated flows may be
>> throttled as well. Allowing out of order completions would resolve
>> this HoL blocking.
>
> We can allow out of order, no guests that follow virtio spec
> will break. But this won't help in all cases
> - a single slow flow can occupy the whole ring, you will not
>   be able to make any new buffers available for the fast flow
> - what host considers a single flow can be multiple flows for guest
>
> There are many other examples.

These examples are due to exhaustion of the fixed ubuf_info pool,
right? We could use dynamic allocation or a resizable pool if these
issues are serious enough.

>> > Neither
>> > do I see why would using tx interrupts within guest be a work around -
>> > AFAIK windows driver uses tx interrupts.
>>
>> It does not address completion latency itself. What I meant was
>> that in an interrupt-driver model, additional starvation issues,
>> such as the potential deadlock raised at the start of this thread,
>> or the timer delay observed before packets were orphaned in
>> virtio-net in commit b0c39dbdc204, are mitigated.
>>
>> Specifically, it breaks the potential deadlock where sockets are
>> blocked waiting for completions (to free up budget in sndbuf, tsq, ..),
>> yet completion handling is blocked waiting for a new packet to
>> trigger free_old_xmit_skbs from start_xmit.
>
> This talk of potential deadlock confuses me - I think you mean we would
> deadlock if we did not orphan skbs in !use_napi - is that right?  If you
> mean that you can drop skb orphan and this won't lead to a deadlock if
> free skbs upon a tx interrupt, I agree, for sure.

Yes, that is what I meant.

>> >> That is the only thing keeping us from removing the HoL blocking in 
>> >> vhost-net zerocopy.
>> >
>> > We don't enable network watchdog on virtio but we could and maybe
>> > should.
>>
>> Can you elaborate?
>
> The issue is that holding onto buffers for very long times makes guests
> think they are stuck. This is funamentally because from guest point of
> view this is a NIC, so it is supposed to transmit things out in
> a timely manner. If host backs the virtual NIC by something that is not
> a NIC, with traffic shaping etc introducing unbounded latencies,
> guest will be confused.

That assumes that guests are fragile in this regard. A linux guest
does not make such assumptions. There are NICs with hardware
rate limiting, so I'm not sure how much of a leap host os rate
limiting is.


Re: [PATCH v3 3/4] net: stmmac: register parent MDIO node for sun8i-h3-emac

2017-08-24 Thread Florian Fainelli
On 08/24/2017 01:21 AM, Corentin Labbe wrote:
> On Wed, Aug 23, 2017 at 09:31:53AM -0700, Florian Fainelli wrote:
>> On 08/23/2017 12:49 AM, Maxime Ripard wrote:
>>> Hi Florian,
>>>
>>> On Tue, Aug 22, 2017 at 11:35:01AM -0700, Florian Fainelli wrote:
>>> So I think what you are saying is either impossible or engineering-wise
>>> a very stupid design, like using an external MAC with a discrete PHY
>>> connected to the internal MAC's MDIO bus, while using the internal MAC
>>> with the internal PHY.
>>>
>>> Now can we please decide on something? We're a week and a half from
>>> the 4.13 release. If mdio-mux is wrong, then we could have two mdio
>>> nodes (internal-mdio & external-mdio).
>>
>> I really don't see a need for a mdio-mux in the first place, just have
>> one MDIO controller (current state) sub-node which describes the
>> built-in STMMAC MDIO controller and declare the internal PHY as a child
>> node (along with 'phy-is-integrated'). If a different configuration is
>> used, then just put the external PHY as a child node there.
>>
>> If fixed-link is required, the mdio node becomes unused anyway.
>>
>> Works for everyone?
>
> If we put an external PHY with reg=1 as a child of internal MDIO,
> il will be merged with internal PHY node and get
> phy-is-integrated.

 Then have the .dtsi file contain just the mdio node, but no internal or
 external PHY and push all the internal and external PHY node definition
 (in its entirety) to the per-board DTS file, does not that work?
>>>
>>> If possible, I'd really like to have the internal PHY in the
>>> DTSI. It's always there in hardware anyway, and duplicating the PHY,
>>> with its clock, reset line, and whatever info we might need in the
>>> future in each and every board DTS that uses it will be very error
>>> prone and we will have the usual bunch of issues that come up with
>>> duplication.
>>
>> OK, then what if you put the internal PHY in the DTSI, mark it with a
>> status = "disabled" property, and have the per-board DTS put a status =
>> "okay" property along with a "phy-is-integrated" boolean property? Would
>> that work?
> 
> No, I tested and for example with sun8i-h3-orangepi-plus.dts, the external 
> PHY (ethernet-phy@1) is still merged.

Is not there is a mistake in the unit address not matching the "reg"
property, or am I not looking at the right tree?

&mdio {
ext_rgmii_phy: ethernet-phy@1 {
compatible = "ethernet-phy-ieee802.3-c22";
reg = <0>;
};
};

If the PHY is really at MDIO address 0, then it should be
ethernet-phy@0, and not ethernet-phy@1, and then no problem with the
merging?


> So that adding a 'status = "disabled"' does not bring anything.
> 
>>
>> What I really don't think is necessary is:
>>
>> - duplicating the "mdio" controller node for external vs. internal PHY,
>> because this is not accurate, there is just one MDIO controller, but
>> there may be different kinds of MDIO/PHY devices attached
> 
> For me, if we want to represent the reality, we need two MDIO:
> - since two PHY at the same address could co-exists
> - since they are isolated so not on the same MDIO bus

Is that really true? It might be, but from experience with e.g:
bcmgenet, the integrated PHY and the external PHYs are on the same MDIO
bus, which is convenient, except when you have an address conflict.

> 
>> - having the STMMAC driver MDIO probing code having to deal with a
>> "mdio" sub-node or an "internal-mdio" sub-node because this is confusing
>> and requiring more driver-level changes that are error prone
> 
> My patch for stmmac is really small, only the name of my variable 
> ("need_mdio_mux_ids")
> have to be changed to something like "register_parent_mdio"
> 
> 
> So I agree with Maxime, we need to avoid merging PHY nodes, and we can avoid 
> it only by having two separate MDIO nodes.
> Furthermore, with only one MDIO, we will face with lots of small patch for 
> adding phy-is-integrated, with two we do not need to change any board DT, all 
> is simply clean.
> Really having two MDIO seems cleaner.

The only valid thing that you have provided so far is this merging
problem. Anything else ranging from "we will face with lots of small
patch for adding phy-is-integrated" to "Really having two MDIO seems
cleaner." are hard to receive as technical arguments for correctness.

What happens if someone connects an external PHY at the same MDIO
address than the internal PHY, which one do you get responses from? If
you shutdown the internal PHY and it stops responding, then this
probably becomes deterministic, but it still supports the fact there is
just one MDIO bus controller per MAC.

Anyway, do whatever works best for you I guess.
-- 
Florian


Re: [PATCH v3 3/4] net: stmmac: register parent MDIO node for sun8i-h3-emac

2017-08-24 Thread Corentin Labbe
On Thu, Aug 24, 2017 at 10:21:24AM +0200, Corentin Labbe wrote:
> On Wed, Aug 23, 2017 at 09:31:53AM -0700, Florian Fainelli wrote:
> > On 08/23/2017 12:49 AM, Maxime Ripard wrote:
> > > Hi Florian,
> > > 
> > > On Tue, Aug 22, 2017 at 11:35:01AM -0700, Florian Fainelli wrote:
> > > So I think what you are saying is either impossible or 
> > > engineering-wise
> > > a very stupid design, like using an external MAC with a discrete PHY
> > > connected to the internal MAC's MDIO bus, while using the internal MAC
> > > with the internal PHY.
> > >
> > > Now can we please decide on something? We're a week and a half from
> > > the 4.13 release. If mdio-mux is wrong, then we could have two mdio
> > > nodes (internal-mdio & external-mdio).
> > 
> >  I really don't see a need for a mdio-mux in the first place, just have
> >  one MDIO controller (current state) sub-node which describes the
> >  built-in STMMAC MDIO controller and declare the internal PHY as a child
> >  node (along with 'phy-is-integrated'). If a different configuration is
> >  used, then just put the external PHY as a child node there.
> > 
> >  If fixed-link is required, the mdio node becomes unused anyway.
> > 
> >  Works for everyone?
> > >>>
> > >>> If we put an external PHY with reg=1 as a child of internal MDIO,
> > >>> il will be merged with internal PHY node and get
> > >>> phy-is-integrated.
> > >>
> > >> Then have the .dtsi file contain just the mdio node, but no internal or
> > >> external PHY and push all the internal and external PHY node definition
> > >> (in its entirety) to the per-board DTS file, does not that work?
> > > 
> > > If possible, I'd really like to have the internal PHY in the
> > > DTSI. It's always there in hardware anyway, and duplicating the PHY,
> > > with its clock, reset line, and whatever info we might need in the
> > > future in each and every board DTS that uses it will be very error
> > > prone and we will have the usual bunch of issues that come up with
> > > duplication.
> > 
> > OK, then what if you put the internal PHY in the DTSI, mark it with a
> > status = "disabled" property, and have the per-board DTS put a status =
> > "okay" property along with a "phy-is-integrated" boolean property? Would
> > that work?
> 
> No, I tested and for example with sun8i-h3-orangepi-plus.dts, the external 
> PHY (ethernet-phy@1) is still merged.
> So that adding a 'status = "disabled"' does not bring anything.
> 
> > 
> > What I really don't think is necessary is:
> > 
> > - duplicating the "mdio" controller node for external vs. internal PHY,
> > because this is not accurate, there is just one MDIO controller, but
> > there may be different kinds of MDIO/PHY devices attached
> 
> For me, if we want to represent the reality, we need two MDIO:
> - since two PHY at the same address could co-exists
> - since they are isolated so not on the same MDIO bus
> 
> > - having the STMMAC driver MDIO probing code having to deal with a
> > "mdio" sub-node or an "internal-mdio" sub-node because this is confusing
> > and requiring more driver-level changes that are error prone
> 
> My patch for stmmac is really small, only the name of my variable 
> ("need_mdio_mux_ids")
> have to be changed to something like "register_parent_mdio"
> 
> 
> So I agree with Maxime, we need to avoid merging PHY nodes, and we can avoid 
> it only by having two separate MDIO nodes.
> Furthermore, with only one MDIO, we will face with lots of small patch for 
> adding phy-is-integrated, with two we do not need to change any board DT, all 
> is simply clean.
> Really having two MDIO seems cleaner.
> 

Hello

I have speaked with Neil Amstrong, and he said that they get the same problem 
on amlogic.
They use a mdio-mux-mmioreg, (see eth-phy-mux in 
arch/arm64/boot/dts/amlogic/meson-gxl.dtsi)
So tomorow, i will send a patch series which do the same with the exception 
that we need a mdio-mux-syscon (which is easy/simple to do).
Since their setup use stmmac, it means that we will need 0 changes on stmmac.

Regards


Re: [PATCH] net/mlx4_core: make mlx4_profile const

2017-08-24 Thread David Miller
From: Bhumika Goyal 
Date: Wed, 23 Aug 2017 18:17:39 +0530

> Make these const as they are only used in a copy operation.
> 
> Signed-off-by: Bhumika Goyal 

Applied to net-next.


Re: [PATCH net] virtio_net: be drop monitor friend

2017-08-24 Thread Eric Dumazet
On Thu, 2017-08-24 at 11:50 -0700, David Miller wrote:
> From: Eric Dumazet 
> Date: Thu, 24 Aug 2017 09:02:49 -0700
> 
> > From: Eric Dumazet 
> > 
> > This change is needed to not fool drop monitor.
> > (perf record ... -e skb:kfree_skb )
> > 
> > Packets were properly sent and are consumed after TX completion.
> > 
> > Signed-off-by: Eric Dumazet 
> 
> I'm pretty sure you meant "friendly" in the Subject line so I fixed
> it up to say that :-)
> 
> Applied, thanks.

Ah, great, thanks for fixing this ;)




Re: [PATCH] net/mlx5e: make mlx5e_profile const

2017-08-24 Thread David Miller
From: Bhumika Goyal 
Date: Wed, 23 Aug 2017 18:22:01 +0530

> Make this const as it is only passed as an argument to the function
> mlx5e_create_netdev and the corresponding argument is of type const.
> 
> Signed-off-by: Bhumika Goyal 

Applied to net-next.


  1   2   3   >