date:20170527

Re: running an eBPF program

2017-05-27 Thread Y Song

On Sat, May 27, 2017 at 5:11 PM, David Miller  wrote:
> From: Y Song 
> Date: Sat, 27 May 2017 13:52:27 -0700
>
>> On Sat, May 27, 2017 at 1:23 PM, Y Song  wrote:
>>>
>>> From verifier error message:
>>> ==
>>> 0: (bf) r6 = r1
>>>
>>> 1: (18) r9 = 0xffee
>>>
>>> 3: (69) r0 = *(u16 *)(r6 +16)
>>>
>>> invalid bpf_context access off=16 size=2
>>> ==
>>>
>>> The offset 16 of struct __sk_buff is hash.
>>> What instruction #3 tries to do is to access 2 bytes of the hash value
>>> instead of full 4 bytes.
>>> This is explicitly not allowed in verifier due to endianness issue.
>>
>>
>> I can reproduce the issue now. My previous statement saying to access
>> "hash" field is not correct. It is accessing the protocol field.
>>
>> static __inline__ bool flow_dissector(struct __sk_buff *skb,
>>   struct flow_keys *flow)
>> {
>> int poff, nh_off = BPF_LL_OFF + ETH_HLEN;
>> __be16 proto = skb->protocol;
>> __u8 ip_proto;
>>
>> The plan so far is to see whether we can fix the issue in LLVM side.
>
> If the compiler properly asks for "__sk_buff + 16" on little-endian
> and "__sk_buff + 20" on big-endian, the verifier should instead be
> fixed to allow the access to pass.
>
> I can't see any reason why LLVM won't set the offset properly like
> that, and it's a completely legitimate optimization that we shouldn't
> try to stop LLVM from performing.

I do agree that such an optimization in LLVM is perfectly fine and actually
beneficial.
My only reason for considering an LLVM-side fix was to avoid introducing
endianness into the verifier.
It may not be too much work there. Let me do some experiments and come up with
a patch for that.

Thanks!

Yonghong

>
> It also makes it so that we don't have to fix having absurdly defined
> __sk_buff's protocol field as a u32.
>
> Thanks.
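To make the offset discussion concrete, here is a minimal userspace sketch of a narrow load from a wider field. It only assumes what the thread states: the field sits at offset 16 and is 4 bytes wide, and the program wants its low 2 bytes. All function names are hypothetical and nothing here is verifier or LLVM code. Under this model the low half of the field starts at the field offset on little-endian and at field offset + 4 - 2 on big-endian, which is not the big-endian figure quoted in the thread; treat the arithmetic as an illustration of the model only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of the low-order `access` bytes inside a `field_size`-byte field. */
size_t narrow_load_off(size_t field_off, size_t field_size, size_t access,
		       int big_endian)
{
	assert(access <= field_size);
	return big_endian ? field_off + field_size - access : field_off;
}

/* Store a 32-bit value at buf in the requested byte order. */
void store_u32(uint8_t *buf, uint32_t v, int big_endian)
{
	for (int i = 0; i < 4; i++) {
		int shift = big_endian ? 8 * (3 - i) : 8 * i;

		buf[i] = (uint8_t)(v >> shift);
	}
}

/* Load 16 bits from buf in the requested byte order. */
uint16_t load_u16(const uint8_t *buf, int big_endian)
{
	return big_endian ? (uint16_t)(buf[0] << 8 | buf[1])
			  : (uint16_t)(buf[1] << 8 | buf[0]);
}

/* Read the low 16 bits of a 32-bit context field with one 2-byte load. */
uint16_t read_low16(const uint8_t *ctx, size_t field_off, int big_endian)
{
	size_t off = narrow_load_off(field_off, 4, 2, big_endian);

	return load_u16(ctx + off, big_endian);
}
```

A verifier that accepts such accesses would have to check that the requested offset matches the one this kind of calculation yields for the host byte order.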

Re: [PATCH v2 net-next 3/9] net: lwtunnel: Add extack to encap attr validation

2017-05-27 Thread David Ahern

On 5/27/17 7:02 PM, kbuild test robot wrote:
> Hi David,
> 
> [auto build test ERROR on net-next/master]
> 
> url:
> https://github.com/0day-ci/linux/commits/David-Ahern/net-another-round-of-extack-handling-for-routing/20170528-062659
> config: ia64-allmodconfig (attached as .config)
> compiler: ia64-linux-gcc (GCC) 6.2.0
> reproduce:
> wget 
> https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> make.cross ARCH=ia64 
> 
> Note: the 
> linux-review/David-Ahern/net-another-round-of-extack-handling-for-routing/20170528-062659
>  HEAD 44d11f94ccbe86e1a088f83a49ef3ca473118ad8 builds fine.
>   It only hurts bisectibility.
> 
> All errors (new ones prefixed by >>):
> 
>ERROR: "ia64_delay_loop" [drivers/spi/spi-thunderx.ko] undefined!
>>> ERROR: "ia64_delay_loop" [drivers/net/phy/mdio-cavium.ko] undefined!

Appears to be a false positive. I do not modify anything related to
delay loops and none of this code is arch specific.

kbuild guys: can you clarify?

Re: [PATCH 7/7] mlx5: Do not build eswitch_offloads if CONFIG_MLX5_EN_ESWITCH_OFFLOADS is set

2017-05-27 Thread Jes Sorensen


On 05/27/2017 05:02 PM, Or Gerlitz wrote:
> On Sat, May 27, 2017 at 12:16 AM, Jes Sorensen  wrote:
>> This gets rid of the temporary #ifdef spaghetti and allows the code to
>> compile without offload support enabled.
>
> Hi Jes,
>
> I am pretty sure we can do that exercise you're up to without any
> spaghetti cooking and even put more code under that CONFIG directive
> (en_rep.c), I'll take that with Saeed.

Hi Or,

I want to avoid adding #ifdef CONFIG_foo to the main code in order to
keep it readable. I did it gradually to make sure I didn't break
anything and to allow for it to be bisected in case something did break.
If we can move out more code from places like en_rep.c into
eswitch_offload.c and get it disabled that way that would be great, but
I like to limit the number of #ifdefs we add to the actual code.

> Just wondering, you are motivated by a wish to put some mlx5
> functionalities under their own CONFIG directives which could be
> useful when backporting the latest upstream driver into older kernel
> and being able not to deal with parts of it, right? in that respect,
> are you using SRIOV but not the offloads mode?

The motivation is two-fold, the primary is to be able to disable
features not being used for those who compile a custom kernel and who
wish to reduce the codebase compiled. It also makes it more flexible
when back porting the code to older kernels since it is easier to pick
out a smaller subset. I was going to look into making TC support etc.
optional next, but I wanted to have a discussion about this patchset first.

Cheers,
Jes
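For readers following the #ifdef discussion, the usual way to keep the main code free of conditionals is to hide the switch behind a stub. The sketch below is a generic illustration only; CONFIG_EXAMPLE_OFFLOADS, struct example_dev and both functions are made-up names and are not taken from mlx5.

```c
/* Hypothetical Kconfig symbol; define it to build the real implementation. */
/* #define CONFIG_EXAMPLE_OFFLOADS 1 */

struct example_dev {
	int offloads_ready;
};

#ifdef CONFIG_EXAMPLE_OFFLOADS
/* Real implementation, which would normally live in its own source file. */
int example_offloads_init(struct example_dev *dev)
{
	dev->offloads_ready = 1;
	return 0;
}
#else
/* Header stub: callers compile unchanged when the feature is disabled. */
static inline int example_offloads_init(struct example_dev *dev)
{
	(void)dev;
	return 0;
}
#endif

/* Main code path: no #ifdef is needed at the call site. */
int example_probe(struct example_dev *dev)
{
	dev->offloads_ready = 0;
	return example_offloads_init(dev);
}
```

With the symbol left undefined, example_probe() still builds and simply does nothing for the optional feature, which is the property being asked for in the thread.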

[net-next:master 329/368] serdes.c:(.text+0x12f): multiple definition of `mv88e6xxx_g2_pvt_write'

2017-05-27 Thread kbuild test robot

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 
master
head:   a3995460491d4570af8e99ad34ddf6d1948254d9
commit: 6335e9f2446b44139ac0722a81759a2b2f90bb4c [329/368] net: dsa: mv88e6xxx: 
mv88e6390X SERDES support
config: i386-randconfig-i1-05241633 (attached as .config)
compiler: gcc-4.8 (Debian 4.8.4-1) 4.8.4
reproduce:
git checkout 6335e9f2446b44139ac0722a81759a2b2f90bb4c
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/net/dsa/mv88e6xxx/serdes.o: In function `mv88e6xxx_g2_pvt_write':
>> serdes.c:(.text+0x12f): multiple definition of `mv88e6xxx_g2_pvt_write'
   drivers/net/dsa/mv88e6xxx/chip.o:chip.c:(.text+0x23f3): first defined here
   drivers/net/dsa/mv88e6xxx/serdes.o: In function 
`mv88e6xxx_g2_misc_4_bit_port':
>> serdes.c:(.text+0x13e): multiple definition of `mv88e6xxx_g2_misc_4_bit_port'
   drivers/net/dsa/mv88e6xxx/chip.o:chip.c:(.text+0x2402): first defined here

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



Re: [PATCH v2 net-next 3/9] net: lwtunnel: Add extack to encap attr validation

2017-05-27 Thread kbuild test robot

Hi David,

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/David-Ahern/net-another-round-of-extack-handling-for-routing/20170528-062659
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 6.2.0
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=ia64 

Note: the 
linux-review/David-Ahern/net-another-round-of-extack-handling-for-routing/20170528-062659
 HEAD 44d11f94ccbe86e1a088f83a49ef3ca473118ad8 builds fine.
  It only hurts bisectibility.

All errors (new ones prefixed by >>):

   ERROR: "ia64_delay_loop" [drivers/spi/spi-thunderx.ko] undefined!
>> ERROR: "ia64_delay_loop" [drivers/net/phy/mdio-cavium.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



[PATCH net-next 04/12] nfp: only try to get to PCIe ctrl memory if BARs are wide enough

2017-05-27 Thread Jakub Kicinski

For accessing PCIe ctrl memory we depend on the BAR aperture being
large enough to reach all registers.  Since the BAR aperture can
be set in the flash make sure the driver won't oops the kernel
when the PCIe configuration is unusual.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
index 1fde213d5b83..597ac8febb63 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
@@ -119,6 +119,11 @@
 #define NFP_PCIE_EM 0x02
 #define NFP_PCIE_SRAM   0x00
 
+/* Minimal size of the PCIe cfg memory we depend on being mapped,
+ * queue controller and DMA controller don't have to be covered.
+ */
+#define NFP_PCI_MIN_MAP_SIZE   0x08
+
 #define NFP_PCIE_P2C_FIXED_SIZE(bar)   (1 << (bar)->bitsize)
 #define NFP_PCIE_P2C_BULK_SIZE(bar)(1 << (bar)->bitsize)
 #define NFP_PCIE_P2C_GENERAL_TARGET_OFFSET(bar, x) ((x) << ((bar)->bitsize - 
2))
@@ -628,8 +633,9 @@ static int enable_bars(struct nfp6000_pcie *nfp, u16 
interface)
 
/* Configure, and lock, BAR0.0 for General Target use (MSI-X SRAM) */
bar = &nfp->bar[0];
-   bar->iomem = ioremap_nocache(nfp_bar_resource_start(bar),
-nfp_bar_resource_len(bar));
+   if (nfp_bar_resource_len(bar) >= NFP_PCI_MIN_MAP_SIZE)
+   bar->iomem = ioremap_nocache(nfp_bar_resource_start(bar),
+nfp_bar_resource_len(bar));
if (bar->iomem) {
dev_info(nfp->dev,
 "BAR0.0 RESERVED: General Mapping/MSI-X SRAM\n");
-- 
2.11.0
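The guard added by this patch boils down to "only map the window if it covers everything we will touch". A minimal userspace sketch of that check follows; EXAMPLE_MIN_MAP_SIZE and example_map_ctrl() are hypothetical stand-ins, the size value is arbitrary, and no real mapping is performed.

```c
#include <stddef.h>

/* Hypothetical minimum number of bytes the caller needs to reach. */
#define EXAMPLE_MIN_MAP_SIZE 0x1000u

/* Stand-in for the mapping call: hand back the window only if wide enough. */
void *example_map_ctrl(void *window, size_t window_len)
{
	if (window_len < EXAMPLE_MIN_MAP_SIZE)
		return NULL;	/* caller must use another access path */
	return window;
}
```

Returning NULL here lets the existing NULL checks on the mapping decide whether the fast path is usable.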

[PATCH net-next 03/12] nfp: don't set aux pointers if ioremap failed

2017-05-27 Thread Jakub Kicinski

If ioremap of PCIe ctrl memory failed we can still get to it through
PCI config space, therefore we allow ioremap() to fail.  When it fails,
however, we must leave all the IOMEM pointers as NULL.  Currently we
would calculate csr and em pointers, adding offsets to the potential
NULL value and therefore making the NULL-checks throughout the code
ineffective.

Signed-off-by: Jakub Kicinski 
---
 .../ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c| 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
index 43dc68e01274..1fde213d5b83 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
@@ -639,19 +639,23 @@ static int enable_bars(struct nfp6000_pcie *nfp, u16 
interface)
nfp6000_bar_write(nfp, bar, barcfg_msix_general);
 
nfp->expl.data = bar->iomem + NFP_PCIE_SRAM + 0x1000;
+
+   if (nfp->pdev->device == PCI_DEVICE_ID_NETRONOME_NFP4000 ||
+   nfp->pdev->device == PCI_DEVICE_ID_NETRONOME_NFP6000) {
+   nfp->iomem.csr = bar->iomem + NFP_PCIE_BAR(0);
+   } else {
+   int pf = nfp->pdev->devfn & 7;
+
+   nfp->iomem.csr = bar->iomem + NFP_PCIE_BAR(pf);
+   }
+   nfp->iomem.em = bar->iomem + NFP_PCIE_EM;
}
 
if (nfp->pdev->device == PCI_DEVICE_ID_NETRONOME_NFP4000 ||
-   nfp->pdev->device == PCI_DEVICE_ID_NETRONOME_NFP6000) {
-   nfp->iomem.csr = bar->iomem + NFP_PCIE_BAR(0);
+   nfp->pdev->device == PCI_DEVICE_ID_NETRONOME_NFP6000)
expl_groups = 4;
-   } else {
-   int pf = nfp->pdev->devfn & 7;
-
-   nfp->iomem.csr = bar->iomem + NFP_PCIE_BAR(pf);
+   else
expl_groups = 1;
-   }
-   nfp->iomem.em = bar->iomem + NFP_PCIE_EM;
 
/* Configure, and lock, BAR0.1 for PCIe XPB (MSI-X PBA) */
bar = &nfp->bar[1];
-- 
2.11.0
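The bug class fixed here is easy to reproduce outside the kernel: a pointer computed as "base + offset" is non-NULL even when base is NULL, so later NULL checks on it never fire. The sketch below shows the corrected ordering with hypothetical names (struct example_iomem, example_set_aux); it is an illustration, not the driver code.

```c
#include <stddef.h>

struct example_iomem {
	unsigned char *base;
	unsigned char *csr;
	unsigned char *em;
};

/* Derive the auxiliary pointers only when the base mapping exists. */
void example_set_aux(struct example_iomem *io, unsigned char *base,
		     size_t csr_off, size_t em_off)
{
	io->base = base;
	io->csr = NULL;
	io->em = NULL;
	if (!base)
		return;		/* leave aux pointers NULL so checks still work */

	io->csr = base + csr_off;
	io->em = base + em_off;
}
```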

[PATCH net-next 12/12] nfp: don't keep count for free buffers delayed kick

2017-05-27 Thread Jakub Kicinski

We only kick RX free buffer queue controller every NFP_NET_FL_BATCH
(currently 16) entries.  This means that we will always kick the QC
when write ring index is divisible by NFP_NET_FL_BATCH.  There is
no need to keep counts.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h| 3 ---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 7 ++-
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h 
b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 7882d2604835..cb7114309656 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -328,8 +328,6 @@ struct nfp_net_rx_buf {
  * @idx:Ring index from Linux's perspective
  * @fl_qcidx:   Queue Controller Peripheral (QCP) queue index for the freelist
  * @qcp_fl: Pointer to base of the QCP freelist queue
- * @wr_ptr_add: Accumulated number of buffers to add to QCP write pointer
- *  (used for free list batching)
  * @rxbufs: Array of transmitted FL/RX buffers
  * @rxds:   Virtual address of FL/RX ring in host memory
  * @dma:DMA address of the FL/RX ring
@@ -343,7 +341,6 @@ struct nfp_net_rx_ring {
u32 rd_p;
 
u32 idx;
-   u32 wr_ptr_add;
 
int fl_qcidx;
u8 __iomem *qcp_fl;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 68013d048e9d..c9a140376621 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1212,14 +1212,12 @@ static void nfp_net_rx_give_one(const struct nfp_net_dp 
*dp,
  dma_addr + dp->rx_dma_off);
 
rx_ring->wr_p++;
-   rx_ring->wr_ptr_add++;
-   if (rx_ring->wr_ptr_add >= NFP_NET_FL_BATCH) {
+   if (!(rx_ring->wr_p % NFP_NET_FL_BATCH)) {
/* Update write pointer of the freelist queue. Make
 * sure all writes are flushed before telling the hardware.
 */
wmb();
-   nfp_qcp_wr_ptr_add(rx_ring->qcp_fl, rx_ring->wr_ptr_add);
-   rx_ring->wr_ptr_add = 0;
+   nfp_qcp_wr_ptr_add(rx_ring->qcp_fl, NFP_NET_FL_BATCH);
}
 }
 
@@ -1245,7 +1243,6 @@ static void nfp_net_rx_ring_reset(struct nfp_net_rx_ring 
*rx_ring)
memset(rx_ring->rxds, 0, sizeof(*rx_ring->rxds) * rx_ring->cnt);
rx_ring->wr_p = 0;
rx_ring->rd_p = 0;
-   rx_ring->wr_ptr_add = 0;
 }
 
 /**
-- 
2.11.0
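The argument in the commit message can be checked with a small model: if the write index starts at zero and the hardware pointer is advanced by a fixed batch whenever the index is a multiple of the batch size, no separate counter is needed. The names below (struct example_rx_ring, example_give_one) are hypothetical, and the batch size of 16 is the value the message cites.

```c
#include <stdint.h>

#define EXAMPLE_FL_BATCH 16u

struct example_rx_ring {
	uint32_t wr_p;		/* free-running host write index */
	uint32_t hw_wr_p;	/* what the queue controller has been told */
	unsigned int kicks;	/* number of doorbell writes issued */
};

/* Hand one buffer to the ring; kick the hardware once per full batch. */
void example_give_one(struct example_rx_ring *r)
{
	r->wr_p++;
	if (!(r->wr_p % EXAMPLE_FL_BATCH)) {
		r->hw_wr_p += EXAMPLE_FL_BATCH;
		r->kicks++;
	}
}
```

Because 16 divides 2^32, the modulo test stays aligned with the batch boundary even after the 32-bit index wraps.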

[PATCH net-next 07/12] nfp: support variable NSP response lengths

2017-05-27 Thread Jakub Kicinski

We want to support extendable commands, where newer versions
of the management FW may provide more information.  Zero out
the communication buffer before passing control to NSP.  This
way if management FW is old and only fills in first N bytes,
the remaining ones will be zeros which extended ABI fields
should reserve as not supported/not available.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
index 2fa9247bb23d..58cc3d532769 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
@@ -419,6 +419,14 @@ static int nfp_nsp_command_buf(struct nfp_nsp *nsp, u16 
code, u32 option,
if (err < 0)
return err;
}
+   /* Zero out remaining part of the buffer */
+   if (out_buf && out_size && out_size > in_size) {
+   memset(out_buf, 0, out_size - in_size);
+   err = nfp_cpp_write(cpp, cpp_id, cpp_buf + in_size,
+   out_buf, out_size - in_size);
+   if (err < 0)
+   return err;
+   }
 
ret = nfp_nsp_command(nsp, code, option, cpp_id, cpp_buf);
if (ret < 0)
-- 
2.11.0
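A simplified model of why zeroing helps: if the responder only fills the first N bytes, any field added later in the ABI reads back as zero and can be treated as "not available". The sketch uses a single buffer and made-up names (struct example_resp, example_old_fw_reply); the real patch works on a separate device-side buffer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical response layout; `power` was added in a newer ABI revision. */
struct example_resp {
	uint32_t temperature;
	uint32_t power;
};

/* Zero the part of the buffer that the request itself does not occupy. */
void example_prepare_buf(uint8_t *buf, size_t in_size, size_t out_size)
{
	if (out_size > in_size)
		memset(buf + in_size, 0, out_size - in_size);
}

/* Old firmware: only knows about the first field. */
void example_old_fw_reply(uint8_t *buf)
{
	uint32_t temperature = 42;

	memcpy(buf, &temperature, sizeof(temperature));
}

/* Copy the raw bytes into the structure the newer caller expects. */
struct example_resp example_parse(const uint8_t *buf)
{
	struct example_resp r;

	memcpy(&r, buf, sizeof(r));
	return r;
}
```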

[PATCH net-next 01/12] nfp: add set_mac_address support while the interface is up

2017-05-27 Thread Jakub Kicinski

From: Pablo Cascón 

Expose FW app ability to change MAC address at runtime.  Make sure
we only depend on it if FW app advertised the right capability.

Signed-off-by: Pablo Cascón 
Reviewed-by: Jakub Kicinski 
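For reference, the helper touched by this patch reads the 6-byte address as one 32-bit and one 16-bit big-endian value before writing them out. A userspace sketch of that split is shown here; example_get_be32() and example_get_be16() are hand-written stand-ins for the kernel's unaligned accessors, and the sample address in the test is arbitrary.

```c
#include <stdint.h>

/* Read 4 bytes as a big-endian 32-bit value, regardless of alignment. */
uint32_t example_get_be32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Read 2 bytes as a big-endian 16-bit value. */
uint16_t example_get_be16(const uint8_t *p)
{
	return (uint16_t)(p[0] << 8 | p[1]);
}

/* Split a MAC address into the two words that would be written out. */
void example_split_mac(const uint8_t mac[6], uint32_t *hi, uint16_t *lo)
{
	*hi = example_get_be32(mac);
	*lo = example_get_be16(mac + 4);
}
```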
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 44 +-
 drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h  |  2 +
 2 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index b3f5c8af6789..9312a737fbc9 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -2123,17 +2123,16 @@ void nfp_net_coalesce_write_cfg(struct nfp_net *nn)
 /**
  * nfp_net_write_mac_addr() - Write mac address to the device control BAR
  * @nn:  NFP Net device to reconfigure
+ * @addr:MAC address to write
  *
  * Writes the MAC address from the netdev to the device control BAR.  Does not
  * perform the required reconfig.  We do a bit of byte swapping dance because
  * firmware is LE.
  */
-static void nfp_net_write_mac_addr(struct nfp_net *nn)
+static void nfp_net_write_mac_addr(struct nfp_net *nn, const u8 *addr)
 {
-   nn_writel(nn, NFP_NET_CFG_MACADDR + 0,
- get_unaligned_be32(nn->dp.netdev->dev_addr));
-   nn_writew(nn, NFP_NET_CFG_MACADDR + 6,
- get_unaligned_be16(nn->dp.netdev->dev_addr + 4));
+   nn_writel(nn, NFP_NET_CFG_MACADDR + 0, get_unaligned_be32(addr));
+   nn_writew(nn, NFP_NET_CFG_MACADDR + 6, get_unaligned_be16(addr + 4));
 }
 
 static void nfp_net_vec_clear_ring_data(struct nfp_net *nn, unsigned int idx)
@@ -2238,7 +2237,7 @@ static int nfp_net_set_config_and_enable(struct nfp_net 
*nn)
nn_writeq(nn, NFP_NET_CFG_RXRS_ENABLE, nn->dp.num_rx_rings == 64 ?
  0xULL : ((u64)1 << nn->dp.num_rx_rings) - 1);
 
-   nfp_net_write_mac_addr(nn);
+   nfp_net_write_mac_addr(nn, nn->dp.netdev->dev_addr);
 
nn_writel(nn, NFP_NET_CFG_MTU, nn->dp.netdev->mtu);
 
@@ -2997,6 +2996,27 @@ static int nfp_net_xdp(struct net_device *netdev, struct 
netdev_xdp *xdp)
}
 }
 
+static int nfp_net_set_mac_address(struct net_device *netdev, void *addr)
+{
+   struct nfp_net *nn = netdev_priv(netdev);
+   struct sockaddr *saddr = addr;
+   int err;
+
+   err = eth_prepare_mac_addr_change(netdev, addr);
+   if (err)
+   return err;
+
+   nfp_net_write_mac_addr(nn, saddr->sa_data);
+
+   err = nfp_net_reconfig(nn, NFP_NET_CFG_UPDATE_MACADDR);
+   if (err)
+   return err;
+
+   eth_commit_mac_addr_change(netdev, addr);
+
+   return 0;
+}
+
 const struct net_device_ops nfp_net_netdev_ops = {
.ndo_open   = nfp_net_netdev_open,
.ndo_stop   = nfp_net_netdev_close,
@@ -3006,7 +3026,7 @@ const struct net_device_ops nfp_net_netdev_ops = {
.ndo_tx_timeout = nfp_net_tx_timeout,
.ndo_set_rx_mode= nfp_net_set_rx_mode,
.ndo_change_mtu = nfp_net_change_mtu,
-   .ndo_set_mac_address= eth_mac_addr,
+   .ndo_set_mac_address= nfp_net_set_mac_address,
.ndo_set_features   = nfp_net_set_features,
.ndo_features_check = nfp_net_features_check,
.ndo_get_phys_port_name = nfp_port_get_phys_port_name,
@@ -3029,7 +3049,7 @@ void nfp_net_info(struct nfp_net *nn)
nn->fw_ver.resv, nn->fw_ver.class,
nn->fw_ver.major, nn->fw_ver.minor,
nn->max_mtu);
-   nn_info(nn, "CAP: %#x %s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
+   nn_info(nn, "CAP: %#x %s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
nn->cap,
nn->cap & NFP_NET_CFG_CTRL_PROMISC  ? "PROMISC "  : "",
nn->cap & NFP_NET_CFG_CTRL_L2BC ? "L2BCFILT " : "",
@@ -3051,7 +3071,8 @@ void nfp_net_info(struct nfp_net *nn)
nn->cap & NFP_NET_CFG_CTRL_NVGRE? "NVGRE ": "",
nfp_net_ebpf_capable(nn)? "BPF "  : "",
nn->cap & NFP_NET_CFG_CTRL_CSUM_COMPLETE ?
- "RXCSUM_COMPLETE " : "");
+ "RXCSUM_COMPLETE " : "",
+   nn->cap & NFP_NET_CFG_CTRL_LIVE_ADDR ? "LIVE_ADDR " : "");
 }
 
 /**
@@ -3211,7 +3232,7 @@ int nfp_net_init(struct nfp_net *nn)
if (nn->dp.chained_metadata_format && nn->fw_ver.major != 4)
nn->cap &= ~NFP_NET_CFG_CTRL_RSS;
 
-   nfp_net_write_mac_addr(nn);
+   nfp_net_write_mac_addr(nn, nn->dp.netdev->dev_addr);
 
/* Determine RX packet/metadata boundary offset */
if (nn->fw_ver.major >= 2) {
@@ -3241,6 +3262,9 @@ int nfp_net_init(struct nfp_net *nn)
 * and

[PATCH net-next 06/12] nfp: shorten CPP core probe logs

2017-05-27 Thread Jakub Kicinski

We currently print reserved BAR mappings info as we create them.
This makes the probe logs longer than necessary.  Print into a
buffer instead and log all the info as a single line.

Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
index 597ac8febb63..cd678323bacb 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
@@ -588,9 +588,15 @@ static int enable_bars(struct nfp6000_pcie *nfp, u16 
interface)
NFP_PCIE_BAR_PCIE2CPP_MapType(
NFP_PCIE_BAR_PCIE2CPP_MapType_EXPLICIT3),
};
+   char status_msg[196] = {};
struct nfp_bar *bar;
int i, bars_free;
int expl_groups;
+   char *msg, *end;
+
+   msg = status_msg +
+   snprintf(status_msg, sizeof(status_msg) - 1, "RESERVED BARs: ");
+   end = status_msg + sizeof(status_msg) - 1;
 
bar = &nfp->bar[0];
for (i = 0; i < ARRAY_SIZE(nfp->bar); i++, bar++) {
@@ -637,8 +643,7 @@ static int enable_bars(struct nfp6000_pcie *nfp, u16 
interface)
bar->iomem = ioremap_nocache(nfp_bar_resource_start(bar),
 nfp_bar_resource_len(bar));
if (bar->iomem) {
-   dev_info(nfp->dev,
-"BAR0.0 RESERVED: General Mapping/MSI-X SRAM\n");
+   msg += snprintf(msg, end - msg, "0.0: General/MSI-X SRAM, ");
atomic_inc(&bar->refcnt);
bars_free--;
 
@@ -665,7 +670,7 @@ static int enable_bars(struct nfp6000_pcie *nfp, u16 
interface)
 
/* Configure, and lock, BAR0.1 for PCIe XPB (MSI-X PBA) */
bar = &nfp->bar[1];
-   dev_info(nfp->dev, "BAR0.1 RESERVED: PCIe XPB/MSI-X PBA\n");
+   msg += snprintf(msg, end - msg, "0.1: PCIe XPB/MSI-X PBA, ");
atomic_inc(&bar->refcnt);
bars_free--;
 
@@ -684,9 +689,8 @@ static int enable_bars(struct nfp6000_pcie *nfp, u16 
interface)
bar->iomem = ioremap_nocache(nfp_bar_resource_start(bar),
 nfp_bar_resource_len(bar));
if (bar->iomem) {
-   dev_info(nfp->dev,
-"BAR0.%d RESERVED: Explicit%d Mapping\n",
-4 + i, i);
+   msg += snprintf(msg, end - msg,
+   "0.%d: Explicit%d, ", 4 + i, i);
atomic_inc(&bar->refcnt);
bars_free--;
 
@@ -704,8 +708,7 @@ static int enable_bars(struct nfp6000_pcie *nfp, u16 
interface)
sort(&nfp->bar[0], nfp->bars, sizeof(nfp->bar[0]),
 bar_cmp, NULL);
 
-   dev_info(nfp->dev, "%d NFP PCI2CPP BARs, %d free\n",
-nfp->bars, bars_free);
+   dev_info(nfp->dev, "%sfree: %d/%d\n", status_msg, bars_free, nfp->bars);
 
return 0;
 }
-- 
2.11.0
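The "print into a buffer, log once" pattern can be sketched in plain C as below. One caveat worth spelling out: snprintf() returns the length it would have written, so a bare "msg += snprintf(...)" can step past the end on truncation. The hypothetical helper example_append() clamps the position instead; it is an illustration under that assumption and not the driver's code.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Append formatted text at msg, never writing at or beyond end.
 * Returns the new write position, which always points at a NUL byte.
 */
char *example_append(char *msg, char *end, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (msg >= end)
		return msg;

	va_start(ap, fmt);
	n = vsnprintf(msg, (size_t)(end - msg), fmt, ap);
	va_end(ap);

	if (n < 0)
		return msg;		/* formatting error: nothing appended */
	if (n >= end - msg)
		return end - 1;		/* truncated: stay on the final NUL */
	return msg + n;
}
```

Callers chain it as p = example_append(p, end, ...) and emit the whole buffer with a single log call at the end.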

[PATCH net-next 09/12] nfp: don't wait for resources indefinitely

2017-05-27 Thread Jakub Kicinski

There is currently no timeout to the resource and lock acquiring
loops.  We printed warnings and depended on user sending a signal
to the waiting process to stop the waiting.  This doesn't work
very well when wait happens out of a work queue.  The simplest
example of that is PCI probe.  When user loads the module and card
is in a broken state modprobe will wait forever and signals sent
to it will not actually reach the probing thread.

Make sure all wait loops have a time out.  Set the upper wait time
to 60 seconds to stay on the safe side.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h  |  5 +
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_mutex.c|  9 +++--
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.c | 10 --
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h
index 8d46b9acb69f..0a46c0984e68 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h
@@ -63,6 +63,11 @@
 /* Max size of area it should be safe to request */
 #define NFP_CPP_SAFE_AREA_SIZE SZ_2M
 
+/* NFP_MUTEX_WAIT_* are timeouts in seconds when waiting for a mutex */
+#define NFP_MUTEX_WAIT_FIRST_WARN  15
+#define NFP_MUTEX_WAIT_NEXT_WARN   5
+#define NFP_MUTEX_WAIT_ERROR   60
+
 struct device;
 
 struct nfp_cpp_area;
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_mutex.c 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_mutex.c
index 8a99c189efa8..f7b958181126 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_mutex.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_mutex.c
@@ -195,7 +195,8 @@ void nfp_cpp_mutex_free(struct nfp_cpp_mutex *mutex)
  */
 int nfp_cpp_mutex_lock(struct nfp_cpp_mutex *mutex)
 {
-   unsigned long warn_at = jiffies + 15 * HZ;
+   unsigned long warn_at = jiffies + NFP_MUTEX_WAIT_FIRST_WARN * HZ;
+   unsigned long err_at = jiffies + NFP_MUTEX_WAIT_ERROR * HZ;
unsigned int timeout_ms = 1;
int err;
 
@@ -214,12 +215,16 @@ int nfp_cpp_mutex_lock(struct nfp_cpp_mutex *mutex)
return -ERESTARTSYS;
 
if (time_is_before_eq_jiffies(warn_at)) {
-   warn_at = jiffies + 60 * HZ;
+   warn_at = jiffies + NFP_MUTEX_WAIT_NEXT_WARN * HZ;
nfp_warn(mutex->cpp,
 "Warning: waiting for NFP mutex [depth:%hd 
target:%d addr:%llx key:%08x]\n",
 mutex->depth,
 mutex->target, mutex->address, mutex->key);
}
+   if (time_is_before_eq_jiffies(err_at)) {
+   nfp_err(mutex->cpp, "Error: mutex wait timed out\n");
+   return -EBUSY;
+   }
}
 
return err;
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.c 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.c
index 2d15a7c9d0de..072612263dab 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.c
@@ -181,7 +181,8 @@ nfp_resource_try_acquire(struct nfp_cpp *cpp, struct 
nfp_resource *res,
 struct nfp_resource *
 nfp_resource_acquire(struct nfp_cpp *cpp, const char *name)
 {
-   unsigned long warn_at = jiffies + 15 * HZ;
+   unsigned long warn_at = jiffies + NFP_MUTEX_WAIT_FIRST_WARN * HZ;
+   unsigned long err_at = jiffies + NFP_MUTEX_WAIT_ERROR * HZ;
struct nfp_cpp_mutex *dev_mutex;
struct nfp_resource *res;
int err;
@@ -214,10 +215,15 @@ nfp_resource_acquire(struct nfp_cpp *cpp, const char 
*name)
}
 
if (time_is_before_eq_jiffies(warn_at)) {
-   warn_at = jiffies + 60 * HZ;
+   warn_at = jiffies + NFP_MUTEX_WAIT_NEXT_WARN * HZ;
nfp_warn(cpp, "Warning: waiting for NFP resource %s\n",
 name);
}
+   if (time_is_before_eq_jiffies(err_at)) {
+   nfp_err(cpp, "Error: resource %s timed out\n", name);
+   err = -EBUSY;
+   goto err_free;
+   }
}
 
nfp_cpp_mutex_free(dev_mutex);
-- 
2.11.0
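The wait policy described above (first warning after 15 seconds, further warnings every 5 seconds, give up after 60) can be modelled with a simulated clock. In the sketch each failed attempt counts as one second; example_wait() and the two callbacks are hypothetical, and the real code uses jiffies and sleeps instead.

```c
#include <errno.h>

#define EXAMPLE_WAIT_FIRST_WARN	15
#define EXAMPLE_WAIT_NEXT_WARN	5
#define EXAMPLE_WAIT_ERROR	60

/* Callback that never succeeds, modelling a wedged device. */
int example_never(void *arg)
{
	(void)arg;
	return 0;
}

/* Callback that succeeds once the counter behind arg reaches zero. */
int example_after_n(void *arg)
{
	int *left = arg;

	if (*left == 0)
		return 1;
	(*left)--;
	return 0;
}

/* Poll try_acquire; each failed attempt advances the clock by one second. */
int example_wait(int (*try_acquire)(void *), void *arg, unsigned int *warnings)
{
	unsigned long now = 0;
	unsigned long warn_at = EXAMPLE_WAIT_FIRST_WARN;
	unsigned long err_at = EXAMPLE_WAIT_ERROR;

	*warnings = 0;
	for (;;) {
		if (try_acquire(arg))
			return 0;
		now++;			/* stand-in for sleeping one second */
		if (now >= warn_at) {
			warn_at = now + EXAMPLE_WAIT_NEXT_WARN;
			(*warnings)++;
		}
		if (now >= err_at)
			return -EBUSY;	/* bounded wait: report and bail out */
	}
}
```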

[PATCH net-next 00/12] nfp: pci core, hwmon, live mac addr change

2017-05-27 Thread Jakub Kicinski

This series brings updates to core PCI code, SR-IOV, exposes 
firmware's capability to change MAC address at runtime and HWMON
interfaces.  

The PCI code updates include resiliency improvement in conditions 
which are quite unusual, but still shouldn't make the driver oops.
We also handle very large device memory operation more gracefully.
A timeout is added to acquiring mutexes in device memory.

Pablo provides a patch to expose to the stack the ability to change
MAC addresses under traffic while David adds HWMON interface for
reading device temperature and power consumption.

Last three patches are minor improvements to the netdev code.

David Brunecz (1):
  nfp: add hwmon support

Jakub Kicinski (10):
  nfp: set driver VF limit
  nfp: don't set aux pointers if ioremap failed
  nfp: only try to get to PCIe ctrl memory if BARs are wide enough
  nfp: support long reads and writes with the cpp helpers
  nfp: shorten CPP core probe logs
  nfp: support variable NSP response lengths
  nfp: don't wait for resources indefinitely
  nfp: fix print format for ring pointers in ring dumps
  nfp: don't add ring size to index calculations
  nfp: don't keep count for free buffers delayed kick

Pablo Cascón (1):
  nfp: add set_mac_address support while the interface is up

 drivers/net/ethernet/netronome/nfp/Makefile|   1 +
 drivers/net/ethernet/netronome/nfp/nfp_hwmon.c | 190 +
 drivers/net/ethernet/netronome/nfp/nfp_main.c  |  44 +++--
 drivers/net/ethernet/netronome/nfp/nfp_main.h  |   8 +
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |   3 -
 .../net/ethernet/netronome/nfp/nfp_net_common.c|  55 --
 drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h  |   2 +
 .../net/ethernet/netronome/nfp/nfp_net_debugfs.c   |   4 +-
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h   |   2 +
 .../ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c  |  49 --
 .../net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h   |   8 +
 .../ethernet/netronome/nfp/nfpcore/nfp_cppcore.c   |  87 --
 .../net/ethernet/netronome/nfp/nfpcore/nfp_mutex.c |   9 +-
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c   |  16 ++
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h   |  12 ++
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp_cmds.c  |  47 -
 .../ethernet/netronome/nfp/nfpcore/nfp_resource.c  |  10 +-
 17 files changed, 470 insertions(+), 77 deletions(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_hwmon.c

-- 
2.11.0

[PATCH net-next 02/12] nfp: set driver VF limit

2017-05-27 Thread Jakub Kicinski

PCI subsystem has support for drivers limiting the number of VFs
available below what the IOV capability claims.  Make use of it.

While at it remove the #ifdef/#endif on CONFIG_PCI_IOV, it was
there to avoid unnecessary warnings in case device read failed
but kernel doesn't have SR-IOV support anyway.  Device reads
should not fail.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_main.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c 
b/drivers/net/ethernet/netronome/nfp/nfp_main.c
index f22f56c9218f..ba174e163834 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c
@@ -73,20 +73,22 @@ static const struct pci_device_id nfp_pci_device_ids[] = {
 };
 MODULE_DEVICE_TABLE(pci, nfp_pci_device_ids);
 
-static void nfp_pcie_sriov_read_nfd_limit(struct nfp_pf *pf)
+static int nfp_pcie_sriov_read_nfd_limit(struct nfp_pf *pf)
 {
-#ifdef CONFIG_PCI_IOV
int err;
 
pf->limit_vfs = nfp_rtsym_read_le(pf->cpp, "nfd_vf_cfg_max_vfs", &err);
if (!err)
-   return;
+   return pci_sriov_set_totalvfs(pf->pdev, pf->limit_vfs);
 
pf->limit_vfs = ~0;
+   pci_sriov_set_totalvfs(pf->pdev, 0); /* 0 is unset */
/* Allow any setting for backwards compatibility if symbol not found */
-   if (err != -ENOENT)
-   nfp_warn(pf->cpp, "Warning: VF limit read failed: %d\n", err);
-#endif
+   if (err == -ENOENT)
+   return 0;
+
+   nfp_warn(pf->cpp, "Warning: VF limit read failed: %d\n", err);
+   return err;
 }
 
 static int nfp_pcie_sriov_enable(struct pci_dev *pdev, int num_vfs)
@@ -373,14 +375,18 @@ static int nfp_pci_probe(struct pci_dev *pdev,
if (err)
goto err_devlink_unreg;
 
-   nfp_pcie_sriov_read_nfd_limit(pf);
+   err = nfp_pcie_sriov_read_nfd_limit(pf);
+   if (err)
+   goto err_fw_unload;
 
err = nfp_net_pci_probe(pf);
if (err)
-   goto err_fw_unload;
+   goto err_sriov_unlimit;
 
return 0;
 
+err_sriov_unlimit:
+   pci_sriov_set_totalvfs(pf->pdev, 0);
 err_fw_unload:
if (pf->fw_loaded)
nfp_fw_unload(pf);
@@ -411,6 +417,7 @@ static void nfp_pci_remove(struct pci_dev *pdev)
nfp_net_pci_remove(pf);
 
nfp_pcie_sriov_disable(pdev);
+   pci_sriov_set_totalvfs(pf->pdev, 0);
 
devlink_unregister(devlink);
 
-- 
2.11.0
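The error handling in the reworked helper has three outcomes: a successful read sets the limit, a missing symbol means "no limit" for backwards compatibility, and any other error is propagated. A compact model of just that decision, with the hypothetical name example_resolve_vf_limit() and no PCI calls, might look like this:

```c
#include <errno.h>

/*
 * read_err: result of reading the firmware symbol (0 on success).
 * fw_limit: value read when read_err is 0.
 * limit:    resulting VF limit, ~0u meaning "unlimited".
 */
int example_resolve_vf_limit(int read_err, unsigned int fw_limit,
			     unsigned int *limit)
{
	if (!read_err) {
		*limit = fw_limit;
		return 0;
	}

	*limit = ~0u;
	if (read_err == -ENOENT)
		return 0;	/* symbol absent: allow any setting */
	return read_err;	/* genuine failure: let the caller abort */
}
```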

[PATCH net-next 10/12] nfp: fix print format for ring pointers in ring dumps

2017-05-27 Thread Jakub Kicinski

Ring pointers are unsigned.  Fix the print formats to avoid
showing users negative values.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_debugfs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_debugfs.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_debugfs.c
index 6cf1b234eecd..8c52c0e8379c 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_debugfs.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_debugfs.c
@@ -62,7 +62,7 @@ static int nfp_net_debugfs_rx_q_read(struct seq_file *file, 
void *data)
fl_rd_p = nfp_qcp_rd_ptr_read(rx_ring->qcp_fl);
fl_wr_p = nfp_qcp_wr_ptr_read(rx_ring->qcp_fl);
 
-   seq_printf(file, "RX[%02d,%02d]: cnt=%d dma=%pad host=%p   H_RD=%d 
H_WR=%d FL_RD=%d FL_WR=%d\n",
+   seq_printf(file, "RX[%02d,%02d]: cnt=%u dma=%pad host=%p   H_RD=%u 
H_WR=%u FL_RD=%u FL_WR=%u\n",
   rx_ring->idx, rx_ring->fl_qcidx,
   rx_ring->cnt, &rx_ring->dma, rx_ring->rxds,
   rx_ring->rd_p, rx_ring->wr_p, fl_rd_p, fl_wr_p);
@@ -146,7 +146,7 @@ static int nfp_net_debugfs_tx_q_read(struct seq_file *file, 
void *data)
d_rd_p = nfp_qcp_rd_ptr_read(tx_ring->qcp_q);
d_wr_p = nfp_qcp_wr_ptr_read(tx_ring->qcp_q);
 
-   seq_printf(file, "TX[%02d,%02d%s]: cnt=%d dma=%pad host=%p   H_RD=%d 
H_WR=%d D_RD=%d D_WR=%d\n",
+   seq_printf(file, "TX[%02d,%02d%s]: cnt=%u dma=%pad host=%p   H_RD=%u 
H_WR=%u D_RD=%u D_WR=%u\n",
   tx_ring->idx, tx_ring->qcidx,
   tx_ring == r_vec->tx_ring ? "" : "xdp",
   tx_ring->cnt, &tx_ring->dma, tx_ring->txds,
-- 
2.11.0
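
As a rough illustration of why the format change matters, the sketch below
formats a ring pointer that has grown past INT_MAX with both a signed and an
unsigned conversion. The helper name fmt_ring_ptr() is hypothetical and not
part of the driver, and the sketch assumes a 32-bit int on a two's-complement
target.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper, not part of the nfp driver: format a free-running
 * 32-bit ring pointer either as signed (what "%d" used to show) or as
 * unsigned (what "%u" shows after the fix).
 */
static int fmt_ring_ptr(char *buf, size_t len, unsigned int ptr, int as_signed)
{
	if (as_signed)
		return snprintf(buf, len, "H_RD=%d", (int)ptr);
	return snprintf(buf, len, "H_RD=%u", ptr);
}
```

With ptr = 0x80000001 the signed variant yields a negative number, while the
unsigned variant yields 2147483649.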

[PATCH net-next 05/12] nfp: support long reads and writes with the cpp helpers

2017-05-27 Thread Jakub Kicinski

nfp_cpp_{read,write}() helpers map device memory (setting the
PCIe -> NOC translation BARs) and access it.  They, however,
currently implicitly expect that the length of the entire operation will
fit in one BAR translation window.  There are a number of 16MB windows
available, and we don't really need to access such large areas today.

If the user, however, manages to trick the driver into making a big
mapping (e.g. by providing a huge fake FW file), the driver will
print a warning saying "No suitable BAR found for request" and a
stack trace - which most users find concerning.

To be future-proof and not scare users with warnings, make the
nfp_cpp_{read,write}() helpers do accesses chunk by chunk if the area
size is large.  Set the notion of "large" to 2MB, which is the size
of the smallest BAR window.
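
The chunking described above can be sketched in plain C as follows. This is a
minimal userspace illustration, not the driver code: SAFE_AREA_SIZE mirrors
NFP_CPP_SAFE_AREA_SIZE, while align_up(), next_chunk() and count_chunks() are
hypothetical names chosen for the example.

```c
#include <stddef.h>

#define SAFE_AREA_SIZE (2ULL << 20)	/* 2MB, mirrors NFP_CPP_SAFE_AREA_SIZE */

/* Round x up to the next multiple of a (a must be a power of two). */
static unsigned long long align_up(unsigned long long x, unsigned long long a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Size of the next chunk: never crosses a SAFE_AREA_SIZE boundary, so the
 * first chunk is shortened when addr is not aligned.
 */
static size_t next_chunk(unsigned long long addr, size_t remaining)
{
	unsigned long long to_boundary;

	to_boundary = align_up(addr + 1, SAFE_AREA_SIZE) - addr;
	return remaining < to_boundary ? remaining : (size_t)to_boundary;
}

/* Number of chunks a transfer of `length` bytes at `addr` is split into. */
static unsigned int count_chunks(unsigned long long addr, size_t length)
{
	unsigned int chunks = 0;
	size_t offset, n;

	for (offset = 0; offset < length; offset += n) {
		n = next_chunk(addr + offset, length - offset);
		chunks++;
	}
	return chunks;
}
```

Aligning on addr + 1 rather than addr makes an already aligned address yield
a full window instead of a zero-length chunk.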

Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h   |  3 +
 .../ethernet/netronome/nfp/nfpcore/nfp_cppcore.c   | 87 +-
 2 files changed, 72 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h
index 154b0b594184..8d46b9acb69f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h
@@ -42,6 +42,7 @@
 
 #include 
 #include 
+#include 
 
 #ifndef NFP_SUBSYS
 #define NFP_SUBSYS "nfp"
@@ -59,6 +60,8 @@
 #define PCI_64BIT_BAR_COUNT 3
 
 #define NFP_CPP_NUM_TARGETS 16
+/* Max size of area it should be safe to request */
+#define NFP_CPP_SAFE_AREA_SIZE SZ_2M
 
 struct device;
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
index e2abba4c3a3f..5672d309d07d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
@@ -924,18 +924,9 @@ area_cache_put(struct nfp_cpp *cpp, struct 
nfp_cpp_area_cache *cache)
mutex_unlock(&cpp->area_cache_mutex);
 }
 
-/**
- * nfp_cpp_read() - read from CPP target
- * @cpp:   CPP handle
- * @destination:   CPP id
- * @address:   offset into CPP target
- * @kernel_vaddr:  kernel buffer for result
- * @length:number of bytes to read
- *
- * Return: length of io, or -ERRNO
- */
-int nfp_cpp_read(struct nfp_cpp *cpp, u32 destination,
-unsigned long long address, void *kernel_vaddr, size_t length)
+static int __nfp_cpp_read(struct nfp_cpp *cpp, u32 destination,
+ unsigned long long address, void *kernel_vaddr,
+ size_t length)
 {
struct nfp_cpp_area_cache *cache;
struct nfp_cpp_area *area;
@@ -968,18 +959,43 @@ int nfp_cpp_read(struct nfp_cpp *cpp, u32 destination,
 }
 
 /**
- * nfp_cpp_write() - write to CPP target
+ * nfp_cpp_read() - read from CPP target
  * @cpp:   CPP handle
  * @destination:   CPP id
  * @address:   offset into CPP target
- * @kernel_vaddr:  kernel buffer to read from
- * @length:number of bytes to write
+ * @kernel_vaddr:  kernel buffer for result
+ * @length:number of bytes to read
  *
  * Return: length of io, or -ERRNO
  */
-int nfp_cpp_write(struct nfp_cpp *cpp, u32 destination,
- unsigned long long address,
- const void *kernel_vaddr, size_t length)
+int nfp_cpp_read(struct nfp_cpp *cpp, u32 destination,
+unsigned long long address, void *kernel_vaddr,
+size_t length)
+{
+   size_t n, offset;
+   int ret;
+
+   for (offset = 0; offset < length; offset += n) {
+   unsigned long long r_addr = address + offset;
+
+   /* make first read smaller to align to safe window */
+   n = min_t(size_t, length - offset,
+ ALIGN(r_addr + 1, NFP_CPP_SAFE_AREA_SIZE) - r_addr);
+
+   ret = __nfp_cpp_read(cpp, destination, address + offset,
+kernel_vaddr + offset, n);
+   if (ret < 0)
+   return ret;
+   if (ret != n)
+   return offset + n;
+   }
+
+   return length;
+}
+
+static int __nfp_cpp_write(struct nfp_cpp *cpp, u32 destination,
+  unsigned long long address,
+  const void *kernel_vaddr, size_t length)
 {
struct nfp_cpp_area_cache *cache;
struct nfp_cpp_area *area;
@@ -1011,6 +1027,41 @@ int nfp_cpp_write(struct nfp_cpp *cpp, u32 destination,
return err;
 }
 
+/**
+ * nfp_cpp_write() - write to CPP target
+ * @cpp:   CPP handle
+ * @destination:   CPP id
+ * @address:   offset into CPP target
+ * @kernel_vaddr:  kernel buffer to read from
+ * @length:number of bytes to write
+ *
+ * Return:

[PATCH net-next 08/12] nfp: add hwmon support

2017-05-27 Thread Jakub Kicinski

From: David Brunecz 

Add support for retrieving temperature and power sensor and limits via NSP.

Signed-off-by: David Brunecz 
Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/Makefile|   1 +
 drivers/net/ethernet/netronome/nfp/nfp_hwmon.c | 190 +
 drivers/net/ethernet/netronome/nfp/nfp_main.c  |  21 ++-
 drivers/net/ethernet/netronome/nfp/nfp_main.h  |  10 ++
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h   |   2 +
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c   |   8 +
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h   |  12 ++
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp_cmds.c  |  47 -
 8 files changed, 284 insertions(+), 7 deletions(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_hwmon.c

diff --git a/drivers/net/ethernet/netronome/nfp/Makefile 
b/drivers/net/ethernet/netronome/nfp/Makefile
index 95f6b97b5d71..83039c65e061 100644
--- a/drivers/net/ethernet/netronome/nfp/Makefile
+++ b/drivers/net/ethernet/netronome/nfp/Makefile
@@ -16,6 +16,7 @@ nfp-objs := \
nfpcore/nfp_target.o \
nfp_app.o \
nfp_devlink.o \
+   nfp_hwmon.o \
nfp_main.o \
nfp_net_common.o \
nfp_net_ethtool.o \
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_hwmon.c 
b/drivers/net/ethernet/netronome/nfp/nfp_hwmon.c
new file mode 100644
index ..bef58ee37c3a
--- /dev/null
+++ b/drivers/net/ethernet/netronome/nfp/nfp_hwmon.c
@@ -0,0 +1,190 @@
+/*
+ * Copyright (C) 2017 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  1. Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ *  2. Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+
+#include "nfpcore/nfp_cpp.h"
+#include "nfpcore/nfp_nsp.h"
+#include "nfp_main.h"
+
+#define NFP_TEMP_MAX   (95 * 1000)
+#define NFP_TEMP_CRIT  (105 * 1000)
+
+#define NFP_POWER_MAX  (25 * 1000 * 1000)
+
+static int nfp_hwmon_sensor_id(enum hwmon_sensor_types type, int channel)
+{
+   if (type == hwmon_temp)
+   return NFP_SENSOR_CHIP_TEMPERATURE;
+   if (type == hwmon_power)
+   return NFP_SENSOR_ASSEMBLY_POWER + channel;
+   return -EINVAL;
+}
+
+static int
+nfp_hwmon_read(struct device *dev, enum hwmon_sensor_types type, u32 attr,
+  int channel, long *val)
+{
+   static const struct {
+   enum hwmon_sensor_types type;
+   u32 attr;
+   long val;
+   } const_vals[] = {
+   { hwmon_temp,   hwmon_temp_max, NFP_TEMP_MAX },
+   { hwmon_temp,   hwmon_temp_crit,NFP_TEMP_CRIT },
+   { hwmon_power,  hwmon_power_max,NFP_POWER_MAX },
+   };
+   struct nfp_pf *pf = dev_get_drvdata(dev);
+   enum nfp_nsp_sensor_id id;
+   int err, i;
+
+   for (i = 0; i < ARRAY_SIZE(const_vals); i++)
+   if (const_vals[i].type == type && const_vals[i].attr == attr) {
+   *val = const_vals[i].val;
+   return 0;
+   }
+
+   err = nfp_hwmon_sensor_id(type, channel);
+   if (err < 0)
+   return err;
+   id = err;
+
+   if (!(pf->nspi->sensor_mask & BIT(id)))
+   return -EOPNOTSUPP;
+
+   if (type == hwmon_temp && attr == hwmon_temp_input)
+   return nfp_hwmon_read_sensor(pf->cpp, id, val);
+   if (type == hwmon_power && attr == hwmon_power_input)
+   return nfp_hwmon_read_sensor(pf->cpp, id, val);
+
+   return -EINVAL;
+}
+
+static

[PATCH net-next 11/12] nfp: don't add ring size to index calculations

2017-05-27 Thread Jakub Kicinski

Adding ring size to index calculation is pointless, since index
will be masked with ring size - 1.

Suggested-by: David Laight 
Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 9312a737fbc9..68013d048e9d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -928,7 +928,7 @@ static void nfp_net_tx_complete(struct nfp_net_tx_ring 
*tx_ring)
if (qcp_rd_p == tx_ring->qcp_rd_p)
return;
 
-   todo = D_IDX(tx_ring, qcp_rd_p + tx_ring->cnt - tx_ring->qcp_rd_p);
+   todo = D_IDX(tx_ring, qcp_rd_p - tx_ring->qcp_rd_p);
 
while (todo--) {
idx = D_IDX(tx_ring, tx_ring->rd_p++);
@@ -999,7 +999,7 @@ static bool nfp_net_xdp_complete(struct nfp_net_tx_ring 
*tx_ring)
if (qcp_rd_p == tx_ring->qcp_rd_p)
return true;
 
-   todo = D_IDX(tx_ring, qcp_rd_p + tx_ring->cnt - tx_ring->qcp_rd_p);
+   todo = D_IDX(tx_ring, qcp_rd_p - tx_ring->qcp_rd_p);
 
done_all = todo <= NFP_NET_XDP_MAX_COMPLETE;
todo = min(todo, NFP_NET_XDP_MAX_COMPLETE);
-- 
2.11.0
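
A small sketch of why the extra "+ cnt" changes nothing, assuming D_IDX()
masks the index with ring size - 1 and the ring size is a power of two. The
function names ring_idx(), todo_old() and todo_new() are made up for this
example; unsigned arithmetic wraps modulo 2^32, so adding a multiple of the
ring size is removed again by the mask.

```c
#include <stdint.h>

/* Stand-in for the driver's D_IDX(): mask an index with ring size - 1.
 * cnt must be a power of two.
 */
static uint32_t ring_idx(uint32_t cnt, uint32_t idx)
{
	return idx & (cnt - 1);
}

/* Old calculation: add the ring size before subtracting. */
static uint32_t todo_old(uint32_t cnt, uint32_t hw_rd, uint32_t sw_rd)
{
	return ring_idx(cnt, hw_rd + cnt - sw_rd);
}

/* New calculation: rely on unsigned wrap-around plus the mask. */
static uint32_t todo_new(uint32_t cnt, uint32_t hw_rd, uint32_t sw_rd)
{
	return ring_idx(cnt, hw_rd - sw_rd);
}
```

Both variants agree even when the hardware pointer has wrapped and is
numerically smaller than the software copy.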

Re: running an eBPF program

2017-05-27 Thread David Miller

From: Y Song 
Date: Sat, 27 May 2017 13:52:27 -0700

> On Sat, May 27, 2017 at 1:23 PM, Y Song  wrote:
>>
>> From verifier error message:
>> ==
>> 0: (bf) r6 = r1
>>
>> 1: (18) r9 = 0xffee
>>
>> 3: (69) r0 = *(u16 *)(r6 +16)
>>
>> invalid bpf_context access off=16 size=2
>> ==
>>
>> The offset 16 of struct __sk_buff is hash.
>> What instruction #3 tries to do is to access 2 bytes of the hash value
>> instead of full 4 bytes.
>> This is explicitly not allowed in verifier due to endianness issue.
> 
> 
> I can reproduce the issue now. My previous statement saying to access
> "hash" field is not correct. It is accessing the protocol field.
> 
> static __inline__ bool flow_dissector(struct __sk_buff *skb,
>   struct flow_keys *flow)
> {
> int poff, nh_off = BPF_LL_OFF + ETH_HLEN;
> __be16 proto = skb->protocol;
> __u8 ip_proto;
> 
> The plan so far is to see whether we can fix the issue in LLVM side.

If the compiler properly asks for "__sk_buff + 16" on little-endian
and "__sk_buff + 20" on big-endian, the verifier should instead be
fixed to allow the access to pass.

I can't see any reason why LLVM won't set the offset properly like
that, and it's a completely legitimate optimization that we shouldn't
try to stop LLVM from performing.

It also makes it so that we don't have to fix having absurdly defined
__sk_buff's protocol field as a u32.

Thanks.
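
To make the byte-order dependence concrete, here is a minimal userspace
sketch of a narrowed 2-byte load from a 32-bit field. It is not verifier or
LLVM code; the helper names are hypothetical, and the offsets it computes are
relative to the start of the field rather than to struct __sk_buff.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
	const uint16_t probe = 1;
	uint8_t first;

	memcpy(&first, &probe, 1);
	return first == 1;
}

/* Byte offset, within a 32-bit field, of its low-order 16 bits. */
static size_t low_half_offset(void)
{
	return host_is_little_endian() ? 0 : 2;
}

/* Narrowed load: read only the low-order half of a 32-bit field, the way
 * a compiler may do when the value is immediately truncated to 16 bits.
 */
static uint16_t load_low_half(const uint32_t *field)
{
	uint16_t half;

	memcpy(&half, (const uint8_t *)field + low_half_offset(), sizeof(half));
	return half;
}
```

The same C source therefore produces a different load offset depending on the
target's byte order, which is what a verifier check on narrow context
accesses would have to take into account.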

Re: [PATCH net-next] liquidio: add support for OVS offload

2017-05-27 Thread David Miller

From: Felix Manlunas 
Date: Sat, 27 May 2017 08:56:33 -0700

> From: VSR Burru 
> 
> Add support for OVS offload.  By default PF driver runs in basic NIC mode
> as usual.  To run in OVS mode, use the insmod parameter "fw_type=ovs".
> 
> For OVS mode, create a management interface for communication with NIC
> firmware.  This communication channel uses PF0's I/O rings.
> 
> Bump up driver version to 1.6.0 to match newer firmware.
> 
> Signed-off-by: VSR Burru 
> Signed-off-by: Felix Manlunas 

How does this work?

What in userspace installs the OVS rules onto the card?

We do not support direct offload of OVS, as an OVS entity, instead we
required all vendors to make their OVS offloads visible as packet
scheduler classifiers and actions.

The same rules apply to liquidio.

If there is some special set of userspace interfaces that are used to
communicate with these different firmwares in some liquidio specific
way, I am going to be very upset.  That is definitely not allowed.

I'm not applying this patch until the above is resolved and at least
more information is added to this commit log message to explain how
this stuff works.

Re: [PATCH net-next v2] net: phy: Relax error checking on sysfs_create_link()

2017-05-27 Thread David Miller

From: Florian Fainelli 
Date: Sat, 27 May 2017 10:42:25 -0700

> Some Ethernet drivers will attach/connect to a PHY device before calling
> register_netdevice() which is responsible for calling 
> netdev_register_kobject()
> which would do the network device's kobject initialization. In such a case,
> sysfs_create_link() would return -ENOENT because the network device's kobject
> is not ready yet, and we would fail to connect to the PHY device.
> 
> In order to keep things simple and symmetrical, we just take the success
> path as indicative of the ability to access the network device's kobject,
> and create the second link if that's the case.
> 
> Fixes: 5568363f0cb3 ("net: phy: Create sysfs reciprocal links for 
> attached_dev/phydev")
> Reported-by: Woojung Hung 
> Signed-off-by: Florian Fainelli 
> ---
> Changes in v2:
> - make sure phydev->sysfs_links is set to false before setting again

Applied, thanks Florian.

[PATCH iproute2 2/4] ip address: Move filter struct to ip_common.h

2017-05-27 Thread David Ahern

Move filter struct to ip_common.h as struct link_filter.

Signed-off-by: David Ahern 
---
 ip/ip_common.h | 20 
 ip/ipaddress.c | 22 +-
 2 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/ip/ip_common.h b/ip/ip_common.h
index 450b45ac2b60..2b3cf7049b65 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -1,3 +1,23 @@
+struct link_filter {
+   int ifindex;
+   int family;
+   int oneline;
+   int showqueue;
+   inet_prefix pfx;
+   int scope, scopemask;
+   int flags, flagmask;
+   int up;
+   char *label;
+   int flushed;
+   char *flushb;
+   int flushp;
+   int flushe;
+   int group;
+   int master;
+   char *kind;
+   char *slave_kind;
+};
+
 int get_operstate(const char *name);
 int print_linkinfo(const struct sockaddr_nl *who,
   struct nlmsghdr *n, void *arg);
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index c805b929134d..3e2c38a8e53e 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -44,27 +44,7 @@ enum {
IPADD_SAVE,
 };
 
-static struct
-{
-   int ifindex;
-   int family;
-   int oneline;
-   int showqueue;
-   inet_prefix pfx;
-   int scope, scopemask;
-   int flags, flagmask;
-   int up;
-   char *label;
-   int flushed;
-   char *flushb;
-   int flushp;
-   int flushe;
-   int group;
-   int master;
-   char *kind;
-   char *slave_kind;
-} filter;
-
+static struct link_filter filter;
 static int do_link;
 
 static void usage(void) __attribute__((noreturn));
-- 
2.11.0 (Apple Git-81)

[PATCH iproute2 3/4] ip address: Change print_linkinfo_brief to take filter as an input

2017-05-27 Thread David Ahern

Change print_linkinfo_brief to take the filter as an input arg.
If the arg is NULL, use the global filter in ipaddress.c.

Signed-off-by: David Ahern 
---
 ip/ip_common.h |  3 ++-
 ip/ipaddress.c | 35 ---
 ip/iplink.c|  2 +-
 3 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/ip/ip_common.h b/ip/ip_common.h
index 2b3cf7049b65..77e9dd06b864 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -22,7 +22,8 @@ int get_operstate(const char *name);
 int print_linkinfo(const struct sockaddr_nl *who,
   struct nlmsghdr *n, void *arg);
 int print_linkinfo_brief(const struct sockaddr_nl *who,
-struct nlmsghdr *n, void *arg);
+struct nlmsghdr *n, void *arg,
+struct link_filter *filter);
 int print_addrinfo(const struct sockaddr_nl *who,
   struct nlmsghdr *n, void *arg);
 int print_addrlabel(const struct sockaddr_nl *who,
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 3e2c38a8e53e..4900dce09df8 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -634,7 +634,8 @@ static void print_link_stats(FILE *fp, struct nlmsghdr *n)
 }
 
 int print_linkinfo_brief(const struct sockaddr_nl *who,
-   struct nlmsghdr *n, void *arg)
+struct nlmsghdr *n, void *arg,
+struct link_filter *pfilter)
 {
FILE *fp = (FILE *)arg;
struct ifinfomsg *ifi = NLMSG_DATA(n);
@@ -651,9 +652,12 @@ int print_linkinfo_brief(const struct sockaddr_nl *who,
if (len < 0)
return -1;
 
-   if (filter.ifindex && ifi->ifi_index != filter.ifindex)
+   if (!pfilter)
+   pfilter = &filter;
+
+   if (pfilter->ifindex && ifi->ifi_index != pfilter->ifindex)
return -1;
-   if (filter.up && !(ifi->ifi_flags&IFF_UP))
+   if (pfilter->up && !(ifi->ifi_flags&IFF_UP))
return -1;
 
parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
@@ -664,30 +668,30 @@ int print_linkinfo_brief(const struct sockaddr_nl *who,
name = rta_getattr_str(tb[IFLA_IFNAME]);
}
 
-   if (filter.label &&
-   (!filter.family || filter.family == AF_PACKET) &&
-   fnmatch(filter.label, RTA_DATA(tb[IFLA_IFNAME]), 0))
+   if (pfilter->label &&
+   (!pfilter->family || pfilter->family == AF_PACKET) &&
+   fnmatch(pfilter->label, RTA_DATA(tb[IFLA_IFNAME]), 0))
return -1;
 
if (tb[IFLA_GROUP]) {
int group = rta_getattr_u32(tb[IFLA_GROUP]);
 
-   if (filter.group != -1 && group != filter.group)
+   if (pfilter->group != -1 && group != pfilter->group)
return -1;
}
 
if (tb[IFLA_MASTER]) {
int master = rta_getattr_u32(tb[IFLA_MASTER]);
 
-   if (filter.master > 0 && master != filter.master)
+   if (pfilter->master > 0 && master != pfilter->master)
return -1;
-   } else if (filter.master > 0)
+   } else if (pfilter->master > 0)
return -1;
 
-   if (filter.kind && match_link_kind(tb, filter.kind, 0))
+   if (pfilter->kind && match_link_kind(tb, pfilter->kind, 0))
return -1;
 
-   if (filter.slave_kind && match_link_kind(tb, filter.slave_kind, 1))
+   if (pfilter->slave_kind && match_link_kind(tb, pfilter->slave_kind, 1))
return -1;
 
if (n->nlmsg_type == RTM_DELLINK)
@@ -713,7 +717,7 @@ int print_linkinfo_brief(const struct sockaddr_nl *who,
if (tb[IFLA_OPERSTATE])
print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
 
-   if (filter.family == AF_PACKET) {
+   if (pfilter->family == AF_PACKET) {
SPRINT_BUF(b1);
if (tb[IFLA_ADDRESS]) {
color_fprintf(fp, COLOR_MAC, "%s ",
@@ -724,10 +728,10 @@ int print_linkinfo_brief(const struct sockaddr_nl *who,
}
}
 
-   if (filter.family == AF_PACKET)
+   if (pfilter->family == AF_PACKET)
print_link_flags(fp, ifi->ifi_flags, m_flag);
 
-   if (filter.family == AF_PACKET)
+   if (pfilter->family == AF_PACKET)
fprintf(fp, "\n");
fflush(fp);
return 0;
@@ -1736,7 +1740,8 @@ static int ipaddr_list_flush_or_save(int argc, char 
**argv, int action)
struct ifinfomsg *ifi = NLMSG_DATA(&l->h);
 
if (brief) {
-   if (print_linkinfo_brief(NULL, &l->h, stdout) == 0)
+   if (print_linkinfo_brief(NULL, &l->h,
+stdout, NULL) == 0)
if (filter.family != AF_PACKET)
print_selected_addrinfo(ifi,
ainfo->head,
diff --git a/ip/iplink.c

[PATCH iproute2 0/4] ip: Add vrf show command

2017-05-27 Thread David Ahern

Refactor ip address to export its capability to save a list of nlmsg's
for links and its link filter. Use both to add an 'ip vrf show' command
to list all configured VRF with table id.

David Ahern (4):
  ip address: Export ip_linkaddr_list
  ip address: Move filter struct to ip_common.h
  ip address: Change print_linkinfo_brief to take filter as an input
  ip vrf: Add show command

 include/libnetlink.h |  10 
 ip/ip_common.h   |  27 -
 ip/ipaddress.c   | 144 +++-
 ip/iplink.c  |   2 +-
 ip/ipvrf.c   | 153 +--
 man/man8/ip-vrf.8|  11 
 6 files changed, 266 insertions(+), 81 deletions(-)

-- 
2.11.0 (Apple Git-81)

[PATCH iproute2 4/4] ip vrf: Add show command

2017-05-27 Thread David Ahern

Add show command to list all configured VRF and their table ids.

Signed-off-by: David Ahern 
---
 ip/ipvrf.c| 153 --
 man/man8/ip-vrf.8 |  11 
 2 files changed, 159 insertions(+), 5 deletions(-)

diff --git a/ip/ipvrf.c b/ip/ipvrf.c
index 0f611b44b78a..0094cf8557cd 100644
--- a/ip/ipvrf.c
+++ b/ip/ipvrf.c
@@ -32,9 +32,12 @@
 
 #define CGRP_PROC_FILE  "/cgroup.procs"
 
+static struct link_filter vrf_filter;
+
 static void usage(void)
 {
-   fprintf(stderr, "Usage: ip vrf exec [NAME] cmd ...\n");
+   fprintf(stderr, "Usage: ip vrf show [NAME] ...\n");
+   fprintf(stderr, "   ip vrf exec [NAME] cmd ...\n");
fprintf(stderr, "   ip vrf identify [PID]\n");
fprintf(stderr, "   ip vrf pids [NAME]\n");
 
@@ -467,13 +470,148 @@ void vrf_reset(void)
vrf_switch("default");
 }
 
-int do_ipvrf(int argc, char **argv)
+static int ipvrf_filter_req(struct nlmsghdr *nlh, int reqlen)
+{
+   struct rtattr *linkinfo;
+   int err;
+
+   if (vrf_filter.kind) {
+   linkinfo = addattr_nest(nlh, reqlen, IFLA_LINKINFO);
+
+   err = addattr_l(nlh, reqlen, IFLA_INFO_KIND, vrf_filter.kind,
+   strlen(vrf_filter.kind));
+   if (err)
+   return err;
+
+   addattr_nest_end(nlh, linkinfo);
+   }
+
+   return 0;
+}
+
+/* input arg is linkinfo */
+static __u32 vrf_table_linkinfo(struct rtattr *li[])
+{
+   struct rtattr *attr[IFLA_VRF_MAX + 1];
+
+   if (li[IFLA_INFO_DATA]) {
+   parse_rtattr_nested(attr, IFLA_VRF_MAX, li[IFLA_INFO_DATA]);
+
+   if (attr[IFLA_VRF_TABLE])
+   return rta_getattr_u32(attr[IFLA_VRF_TABLE]);
+   }
+
+   return 0;
+}
+
+static int ipvrf_print(struct nlmsghdr *n)
+{
+   struct ifinfomsg *ifi = NLMSG_DATA(n);
+   struct rtattr *tb[IFLA_MAX+1];
+   struct rtattr *li[IFLA_INFO_MAX+1];
+   int len = n->nlmsg_len;
+   const char *name;
+   __u32 tb_id;
+
+   len -= NLMSG_LENGTH(sizeof(*ifi));
+   if (len < 0)
+   return 0;
+
+   if (vrf_filter.ifindex && vrf_filter.ifindex != ifi->ifi_index)
+   return 0;
+
+   parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
+
+   /* kernel does not support filter by master device */
+   if (tb[IFLA_MASTER]) {
+   int master = *(int *)RTA_DATA(tb[IFLA_MASTER]);
+
+   if (vrf_filter.master && master != vrf_filter.master)
+   return 0;
+   }
+
+   if (!tb[IFLA_IFNAME]) {
+   fprintf(stderr,
+   "BUG: device with ifindex %d has nil ifname\n",
+   ifi->ifi_index);
+   return 0;
+   }
+   name = rta_getattr_str(tb[IFLA_IFNAME]);
+
+   /* missing LINKINFO means not VRF. e.g., kernel does not
+* support filtering on kind, so userspace needs to handle
+*/
+   if (!tb[IFLA_LINKINFO])
+   return 0;
+
+   parse_rtattr_nested(li, IFLA_INFO_MAX, tb[IFLA_LINKINFO]);
+
+   if (!li[IFLA_INFO_KIND])
+   return 0;
+
+   if (strcmp(RTA_DATA(li[IFLA_INFO_KIND]), "vrf"))
+   return 0;
+
+   tb_id = vrf_table_linkinfo(li);
+   if (!tb_id) {
+   fprintf(stderr,
+   "BUG: VRF %s is missing table id\n", name);
+   return 0;
+   }
+
+   printf("%-16s %5u", name, tb_id);
+
+   printf("\n");
+   return 1;
+}
+
+static int ipvrf_show(int argc, char **argv)
 {
-   if (argc == 0) {
-   fprintf(stderr, "No command given. Try \"ip vrf help\".\n");
-   exit(-1);
+   struct nlmsg_chain linfo = { NULL, NULL};
+   int rc = 0;
+
+   vrf_filter.kind = "vrf";
+
+   if (argc > 1)
+   usage();
+
+   if (argc == 1) {
+   __u32 tb_id;
+
+   tb_id = ipvrf_get_table(argv[0]);
+   if (!tb_id) {
+   fprintf(stderr, "Invalid VRF\n");
+   return 1;
+   }
+   printf("%s %u\n", argv[0], tb_id);
+   return 0;
}
 
+   if (ip_linkaddr_list(0, ipvrf_filter_req, &linfo, NULL) == 0) {
+   struct nlmsg_list *l;
+   unsigned nvrf = 0;
+   int n;
+
+   n = printf("%-16s  %5s\n", "Name", "Table");
+   printf("%.*s\n", n-1, "---");
+   for (l = linfo.head; l; l = l->next)
+   nvrf += ipvrf_print(&l->h);
+
+   if (!nvrf)
+   printf("No VRF has been configured\n");
+   } else
+   rc = 1;
+
+   free_nlmsg_chain(&linfo);
+
+   return rc;
+}
+
+int do_ipvrf(int argc, char **argv)
+{
+   if (argc == 0)
+   return ipvrf_show(0, NULL);
+
if (matches(*argv, "identify") ==

[PATCH iproute2 1/4] ip address: Export ip_linkaddr_list

2017-05-27 Thread David Ahern

ipaddr_list_flush_or_save generates a list of nlmsg's for links and
optionally for addresses. Move the code into ip_linkaddr_list and
export it along with the supporting infrastructure.

API to use this function is:
struct nlmsg_chain linfo = { NULL, NULL};
struct nlmsg_chain ainfo = { NULL, NULL};

ip_linkaddr_list(family, filter_req, &linfo, &ainfo);

... error checking and code looping over linfo/ainfo ...

free_nlmsg_chain(&linfo);
free_nlmsg_chain(&ainfo);
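
For readers unfamiliar with the head/tail chain pattern used here, the
following is a simplified, self-contained sketch. The types msg_list and
msg_chain and the helpers chain_append(), chain_sum() and chain_free() are
hypothetical stand-ins: the real nlmsg_list nodes carry a struct nlmsghdr,
not an int, and the real free_nlmsg_chain() may differ in detail.

```c
#include <stdlib.h>

/* Simplified stand-ins for struct nlmsg_list / struct nlmsg_chain. */
struct msg_list {
	struct msg_list *next;
	int payload;
};

struct msg_chain {
	struct msg_list *head;
	struct msg_list *tail;
};

/* Append one node at the tail; returns 0 on success, -1 on allocation
 * failure.
 */
static int chain_append(struct msg_chain *chain, int payload)
{
	struct msg_list *node = malloc(sizeof(*node));

	if (!node)
		return -1;
	node->next = NULL;
	node->payload = payload;
	if (chain->tail)
		chain->tail->next = node;
	else
		chain->head = node;
	chain->tail = node;
	return 0;
}

/* Walk the chain from the head, as a caller looping over linfo would. */
static int chain_sum(const struct msg_chain *chain)
{
	const struct msg_list *l;
	int sum = 0;

	for (l = chain->head; l; l = l->next)
		sum += l->payload;
	return sum;
}

/* Free every node and, in this sketch, reset the chain for reuse. */
static void chain_free(struct msg_chain *chain)
{
	struct msg_list *l, *n;

	for (l = chain->head; l; l = n) {
		n = l->next;
		free(l);
	}
	chain->head = chain->tail = NULL;
}
```

Keeping a tail pointer makes each append constant time, which matters when a
dump returns many links.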

Signed-off-by: David Ahern 
---
 include/libnetlink.h | 10 ++
 ip/ip_common.h   |  4 +++
 ip/ipaddress.c   | 87 +---
 3 files changed, 63 insertions(+), 38 deletions(-)

diff --git a/include/libnetlink.h b/include/libnetlink.h
index c43ab0a2d9d9..643c3bc56929 100644
--- a/include/libnetlink.h
+++ b/include/libnetlink.h
@@ -25,6 +25,16 @@ struct rtnl_handle {
int flags;
 };
 
+struct nlmsg_list {
+   struct nlmsg_list *next;
+   struct nlmsghdr   h;
+};
+
+struct nlmsg_chain {
+   struct nlmsg_list *head;
+   struct nlmsg_list *tail;
+};
+
 extern int rcvbuf;
 
 int rtnl_open(struct rtnl_handle *rth, unsigned int subscriptions)
diff --git a/ip/ip_common.h b/ip/ip_common.h
index 202fc399e61a..450b45ac2b60 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -65,6 +65,10 @@ int do_seg6(int argc, char **argv);
 int iplink_get(unsigned int flags, char *name, __u32 filt_mask);
 int iplink_ifla_xstats(int argc, char **argv);
 
+int ip_linkaddr_list(int family, req_filter_fn_t filter_fn,
+struct nlmsg_chain *linfo, struct nlmsg_chain *ainfo);
+void free_nlmsg_chain(struct nlmsg_chain *info);
+
 static inline int rtm_get_table(struct rtmsg *r, struct rtattr **tb)
 {
__u32 table = r->rtm_table;
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index b8d9c7d917fe..c805b929134d 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -1211,16 +1211,6 @@ int print_addrinfo(const struct sockaddr_nl *who, struct 
nlmsghdr *n,
return 0;
 }
 
-struct nlmsg_list {
-   struct nlmsg_list *next;
-   struct nlmsghdr   h;
-};
-
-struct nlmsg_chain {
-   struct nlmsg_list *head;
-   struct nlmsg_list *tail;
-};
-
 static int print_selected_addrinfo(struct ifinfomsg *ifi,
   struct nlmsg_list *ainfo, FILE *fp)
 {
@@ -1371,7 +1361,7 @@ static int ipaddr_restore(void)
exit(rtnl_from_file(stdin, &restore_handler, NULL));
 }
 
-static void free_nlmsg_chain(struct nlmsg_chain *info)
+void free_nlmsg_chain(struct nlmsg_chain *info)
 {
struct nlmsg_list *l, *n;
 
@@ -1534,10 +1524,43 @@ static int iplink_filter_req(struct nlmsghdr *nlh, int 
reqlen)
return 0;
 }
 
+/* fills in linfo with link data and optionally ainfo with address info
+ * caller can walk lists as desired and must call free_nlmsg_chain for
+ * both when done
+ */
+int ip_linkaddr_list(int family, req_filter_fn_t filter_fn,
+struct nlmsg_chain *linfo, struct nlmsg_chain *ainfo)
+{
+   if (rtnl_wilddump_req_filter_fn(&rth, preferred_family, RTM_GETLINK,
+   filter_fn) < 0) {
+   perror("Cannot send dump request");
+   return 1;
+   }
+
+   if (rtnl_dump_filter(&rth, store_nlmsg, linfo) < 0) {
+   fprintf(stderr, "Dump terminated\n");
+   return 1;
+   }
+
+   if (ainfo) {
+   if (rtnl_wilddump_request(&rth, family, RTM_GETADDR) < 0) {
+   perror("Cannot send dump request");
+   return 1;
+   }
+
+   if (rtnl_dump_filter(&rth, store_nlmsg, ainfo) < 0) {
+   fprintf(stderr, "Dump terminated\n");
+   return 1;
+   }
+   }
+
+   return 0;
+}
+
 static int ipaddr_list_flush_or_save(int argc, char **argv, int action)
 {
struct nlmsg_chain linfo = { NULL, NULL};
-   struct nlmsg_chain ainfo = { NULL, NULL};
+   struct nlmsg_chain _ainfo = { NULL, NULL}, *ainfo = NULL;
struct nlmsg_list *l;
char *filter_dev = NULL;
int no_link = 0;
@@ -1714,33 +1737,19 @@ static int ipaddr_list_flush_or_save(int argc, char 
**argv, int action)
exit(0);
}
 
-   if (rtnl_wilddump_req_filter_fn(&rth, preferred_family, RTM_GETLINK,
-   iplink_filter_req) < 0) {
-   perror("Cannot send dump request");
-   exit(1);
-   }
-
-   if (rtnl_dump_filter(&rth, store_nlmsg, &linfo) < 0) {
-   fprintf(stderr, "Dump terminated\n");
-   exit(1);
-   }
-
if (filter.family != AF_PACKET) {
+   ainfo = &_ainfo;
+
if (filter.oneline)
no_link = 1;
+   }
 
-   if (rtnl_wilddump_request(&rth, filter.family, RTM_GETADDR) < 0) {
-   perror("Cannot send

Re: [PATCH net-next] net: dsa: mv88e6xxx: handle SERDES error appropriately

2017-05-27 Thread David Miller

From: Vivien Didelot 
Date: Fri, 26 May 2017 18:02:42 -0400

> mv88e6xxx_serdes_power returns an error, so no need to print an error
> message inside of it. Rather print it in its caller when the error is
> ignored, which is in the mv88e6xxx_port_disable void function.
> 
> Catch and return its error in the counterpart mv88e6xxx_port_enable.
> 
> Fixes: 04aca9938255 ("dsa: mv88e6xxx: Enable/Disable SERDES on port 
> enable/disable")
> Signed-off-by: Vivien Didelot 

Applied, thanks.

Re: [PATCH V6 net-next 0/2] rtnetlink: Updates to rtnetlink_event()

2017-05-27 Thread David Miller

From: Vladislav Yasevich 
Date: Sat, 27 May 2017 10:14:33 -0400

> First is the patch to add IFLA_EVENT attribute to the netlink message.  It
> supports only currently white-listed events.
> Like before, this is just an attribute that gets added to the rtnetlink
> message only when the message was generated as a result of a netdev event.
> In my case, this is necessary since I want to trap NETDEV_NOTIFY_PEERS
> event (also possibly NETDEV_RESEND_IGMP event) and perform certain actions
> in user space.  This is not possible since the messages generated as
> a result of netdev events do not usually contain any changed data.  They
> are just notifications.  This patch exposes this notification type to
> userspace.
> 
> Second, I remove duplicate messages that are a result of a change to bonding
> options.  If netlink is used to configure bonding options, 2 messages
> are generated, one as a result of the NETDEV_CHANGEINFODATA event triggered
> by bonding code and one as a result of device state changes triggered by
> netdev_state_change (called from do_setlink).
 ...

Series applied, thanks Vlad.

Re: [PATCH v3] hdlcdrv: Fix divide by zero in hdlcdrv_ioctl

2017-05-27 Thread David Miller

From: Firo Yang 
Date: Fri, 26 May 2017 22:37:38 +0800

> The syzkaller fuzzer triggered a divide by zero when setting calibration
> through ioctl().
> 
> To fix it, check 'bitrate' and return -EINVAL if it is negative or 0.
> 
> Reported-by: Andrey Konovalov 
> Signed-off-by: Firo Yang 

Applied, thank you.
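
A minimal sketch of the kind of guard the patch description refers to. This
is not the hdlcdrv code: calibrate_bits(), its parameters and the "/ 16"
scaling are assumptions made purely for illustration. The point is that the
divisor is validated before it is used in the overflow check.

```c
#include <errno.h>
#include <limits.h>

/* Hypothetical stand-in for a calibrate ioctl path: convert a duration in
 * seconds into a count derived from the bitrate. The overflow check divides
 * by bitrate, so bitrate must be validated first.
 */
static int calibrate_bits(int bitrate, int seconds, int *out)
{
	if (bitrate <= 0)
		return -EINVAL;		/* would otherwise divide by zero */
	if (seconds < 0 || seconds > INT_MAX / bitrate)
		return -EINVAL;		/* product would overflow an int */

	*out = seconds * bitrate / 16;
	return 0;
}
```

For example, calibrate_bits(1200, 2, &out) succeeds and stores 150 in out,
while a bitrate of 0 is rejected before any division happens.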

Re: [PATCH 0/2] Document and use eeprom-length property

2017-05-27 Thread David Miller

From: Shawn Guo 
Date: Sat, 27 May 2017 16:13:34 +0800

> On Fri, May 26, 2017 at 03:02:42PM -0400, David Miller wrote:
>> From: Andrew Lunn 
>> Date: Fri, 26 May 2017 01:44:42 +0200
>> 
>> > The mv88e6xxx switch driver allows the size of the attached EEPROM to
>> > be described in DT. This property is missing from the binding
>> > documentation. Add it. And make use of it on the ZII Devel B board.
>> > 
>> > David, Shawn, please could you talk amongst yourselves to decide who
>> > takes what.
>> 
>> I can take this if it works for Shawn, otherwise I'm also fine if Shawn
>> takes it and if so feel free to add my:
>> 
>> Acked-by: David S. Miller 
> 
> Hi David,
> 
> I see these two patches can be applied separately, so I picked up 2/2
> and left 1/2 to you.

Ok I applied 1/2, thanks.

[PATCH v2 net-next 6/9] net: mpls: Pull common label check into helper

2017-05-27 Thread David Ahern

mpls_route_add and mpls_route_del have the same checks on the label.
Move to a helper. Avoid duplicate extack messages in the next patch.

Signed-off-by: David Ahern 
---
 net/mpls/af_mpls.c | 32 +---
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index f3830951fb1c..726eafecc793 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -856,6 +856,19 @@ static int mpls_nh_build_multi(struct mpls_route_config *cfg,
return err;
 }
 
+static bool mpls_label_ok(struct net *net, unsigned int index)
+{
+   /* Reserved labels may not be set */
+   if (index < MPLS_LABEL_FIRST_UNRESERVED)
+   return false;
+
+   /* The full 20 bit range may not be supported. */
+   if (index >= net->mpls.platform_labels)
+   return false;
+
+   return true;
+}
+
 static int mpls_route_add(struct mpls_route_config *cfg)
 {
struct mpls_route __rcu **platform_label;
@@ -875,12 +888,7 @@ static int mpls_route_add(struct mpls_route_config *cfg)
index = find_free_label(net);
}
 
-   /* Reserved labels may not be set */
-   if (index < MPLS_LABEL_FIRST_UNRESERVED)
-   goto errout;
-
-   /* The full 20 bit range may not be supported. */
-   if (index >= net->mpls.platform_labels)
+   if (!mpls_label_ok(net, index))
goto errout;
 
/* Append makes no sense with mpls */
@@ -952,12 +960,7 @@ static int mpls_route_del(struct mpls_route_config *cfg)
 
index = cfg->rc_label;
 
-   /* Reserved labels may not be removed */
-   if (index < MPLS_LABEL_FIRST_UNRESERVED)
-   goto errout;
-
-   /* The full 20 bit range may not be supported */
-   if (index >= net->mpls.platform_labels)
+   if (!mpls_label_ok(net, index))
goto errout;
 
mpls_route_update(net, index, NULL, &cfg->rc_nlinfo);
@@ -1735,10 +1738,9 @@ static int rtm_to_route_config(struct sk_buff *skb,  struct nlmsghdr *nlh,
   &cfg->rc_label, NULL))
goto errout;
 
-   /* Reserved labels may not be set */
-   if (cfg->rc_label < MPLS_LABEL_FIRST_UNRESERVED)
+   if (!mpls_label_ok(cfg->rc_nlinfo.nl_net,
+  cfg->rc_label))
goto errout;
-
break;
}
case RTA_VIA:
-- 
2.11.0 (Apple Git-81)
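
For readers following along outside the kernel tree, the two range checks
pulled into mpls_label_ok() can be sketched in userspace roughly as below.
This is only an illustration: struct mpls_ns is a hypothetical stand-in for
the per-namespace net->mpls state, and the value 16 for
MPLS_LABEL_FIRST_UNRESERVED is taken from the uapi mpls.h header.

```c
#include <stdbool.h>

/* Labels 0-15 are reserved; value as defined in include/uapi/linux/mpls.h. */
#define MPLS_LABEL_FIRST_UNRESERVED 16

/* Hypothetical stand-in for the per-netns net->mpls state (not kernel API). */
struct mpls_ns {
	unsigned int platform_labels;	/* size of the configured label table */
};

/* Mirrors the two checks shared by the route add and delete paths. */
static bool mpls_label_ok(const struct mpls_ns *ns, unsigned int index)
{
	/* Reserved labels may not be set */
	if (index < MPLS_LABEL_FIRST_UNRESERVED)
		return false;

	/* The full 20 bit range may not be supported. */
	if (index >= ns->platform_labels)
		return false;

	return true;
}
```

With platform_labels set to 1024, labels 16 through 1023 pass and everything
else is rejected, which is the behaviour both callers relied on before the
refactor.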

[PATCH v2 net-next 8/9] net: mpls: Make nla_get_via in af_mpls.c

2017-05-27 Thread David Ahern

nla_get_via is only used in af_mpls.c. Remove declaration from internal.h
and move up in af_mpls.c before first use. Code move only; no
functional change intended.

Signed-off-by: David Ahern 
---
 net/mpls/af_mpls.c  | 96 ++---
 net/mpls/internal.h |  2 --
 2 files changed, 48 insertions(+), 50 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 0133d1ad9032..a953fcf169ba 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -684,6 +684,54 @@ static int mpls_nh_assign_dev(struct net *net, struct mpls_route *rt,
return err;
 }
 
+static int nla_get_via(const struct nlattr *nla, u8 *via_alen, u8 *via_table,
+  u8 via_addr[], struct netlink_ext_ack *extack)
+{
+   struct rtvia *via = nla_data(nla);
+   int err = -EINVAL;
+   int alen;
+
+   if (nla_len(nla) < offsetof(struct rtvia, rtvia_addr)) {
+   NL_SET_ERR_MSG_ATTR(extack, nla,
+   "Invalid attribute length for RTA_VIA");
+   goto errout;
+   }
+   alen = nla_len(nla) -
+   offsetof(struct rtvia, rtvia_addr);
+   if (alen > MAX_VIA_ALEN) {
+   NL_SET_ERR_MSG_ATTR(extack, nla,
+   "Invalid address length for RTA_VIA");
+   goto errout;
+   }
+
+   /* Validate the address family */
+   switch (via->rtvia_family) {
+   case AF_PACKET:
+   *via_table = NEIGH_LINK_TABLE;
+   break;
+   case AF_INET:
+   *via_table = NEIGH_ARP_TABLE;
+   if (alen != 4)
+   goto errout;
+   break;
+   case AF_INET6:
+   *via_table = NEIGH_ND_TABLE;
+   if (alen != 16)
+   goto errout;
+   break;
+   default:
+   /* Unsupported address family */
+   goto errout;
+   }
+
+   memcpy(via_addr, via->rtvia_addr, alen);
+   *via_alen = alen;
+   err = 0;
+
+errout:
+   return err;
+}
+
 static int mpls_nh_build_from_cfg(struct mpls_route_config *cfg,
  struct mpls_route *rt)
 {
@@ -1641,54 +1689,6 @@ int nla_get_labels(const struct nlattr *nla, u8 max_labels, u8 *labels,
 }
 EXPORT_SYMBOL_GPL(nla_get_labels);
 
-int nla_get_via(const struct nlattr *nla, u8 *via_alen, u8 *via_table,
-   u8 via_addr[], struct netlink_ext_ack *extack)
-{
-   struct rtvia *via = nla_data(nla);
-   int err = -EINVAL;
-   int alen;
-
-   if (nla_len(nla) < offsetof(struct rtvia, rtvia_addr)) {
-   NL_SET_ERR_MSG_ATTR(extack, nla,
-   "Invalid attribute length for RTA_VIA");
-   goto errout;
-   }
-   alen = nla_len(nla) -
-   offsetof(struct rtvia, rtvia_addr);
-   if (alen > MAX_VIA_ALEN) {
-   NL_SET_ERR_MSG_ATTR(extack, nla,
-   "Invalid address length for RTA_VIA");
-   goto errout;
-   }
-
-   /* Validate the address family */
-   switch (via->rtvia_family) {
-   case AF_PACKET:
-   *via_table = NEIGH_LINK_TABLE;
-   break;
-   case AF_INET:
-   *via_table = NEIGH_ARP_TABLE;
-   if (alen != 4)
-   goto errout;
-   break;
-   case AF_INET6:
-   *via_table = NEIGH_ND_TABLE;
-   if (alen != 16)
-   goto errout;
-   break;
-   default:
-   /* Unsupported address family */
-   goto errout;
-   }
-
-   memcpy(via_addr, via->rtvia_addr, alen);
-   *via_alen = alen;
-   err = 0;
-
-errout:
-   return err;
-}
-
 static int rtm_to_route_config(struct sk_buff *skb,
   struct nlmsghdr *nlh,
   struct mpls_route_config *cfg,
diff --git a/net/mpls/internal.h b/net/mpls/internal.h
index a015a6a1143b..cf65aec2e551 100644
--- a/net/mpls/internal.h
+++ b/net/mpls/internal.h
@@ -204,8 +204,6 @@ int nla_put_labels(struct sk_buff *skb, int attrtype,  u8 labels,
   const u32 label[]);
 int nla_get_labels(const struct nlattr *nla, u8 max_labels, u8 *labels,
   u32 label[], struct netlink_ext_ack *extack);
-int nla_get_via(const struct nlattr *nla, u8 *via_alen, u8 *via_table,
-   u8 via[], struct netlink_ext_ack *extack);
 bool mpls_output_possible(const struct net_device *dev);
 unsigned int mpls_dev_mtu(const struct net_device *dev);
 bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu);
-- 
2.11.0 (Apple Git-81)
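
The address-family validation that nla_get_via() performs can be sketched
in userspace as below, as an illustration only. The names via_validate,
enum via_table and SKETCH_MAX_VIA_ALEN are hypothetical; they stand in for
the kernel's NEIGH_*_TABLE constants and MAX_VIA_ALEN, and the netlink
attribute parsing and extack reporting are left out. A Linux libc is
assumed for AF_PACKET.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6, AF_PACKET (Linux) */

/* Hypothetical upper bound on the via address length for this sketch. */
#define SKETCH_MAX_VIA_ALEN 32

/* Hypothetical stand-ins for the kernel's neighbour table identifiers. */
enum via_table { VIA_LINK, VIA_ARP, VIA_ND };

/* Returns 0 and sets *table when family and address length are consistent. */
static int via_validate(int family, size_t alen, enum via_table *table)
{
	if (alen > SKETCH_MAX_VIA_ALEN)
		return -EINVAL;

	/* Validate the address family */
	switch (family) {
	case AF_PACKET:
		*table = VIA_LINK;	/* link-layer address, length not pinned */
		break;
	case AF_INET:
		*table = VIA_ARP;
		if (alen != 4)		/* IPv4 address must be 4 bytes */
			return -EINVAL;
		break;
	case AF_INET6:
		*table = VIA_ND;
		if (alen != 16)		/* IPv6 address must be 16 bytes */
			return -EINVAL;
		break;
	default:
		/* Unsupported address family */
		return -EINVAL;
	}

	return 0;
}
```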

[PATCH v2 net-next 9/9] net: mpls: remove unnecessary initialization of err

2017-05-27 Thread David Ahern

err is initialized to EINVAL and not used before it is set again.
Remove the unnecessary initialization.

Signed-off-by: David Ahern 
---
 net/mpls/af_mpls.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index a953fcf169ba..94b3317232a6 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -743,8 +743,6 @@ static int mpls_nh_build_from_cfg(struct mpls_route_config *cfg,
if (!nh)
return -ENOMEM;
 
-   err = -EINVAL;
-
nh->nh_labels = cfg->rc_output_labels;
for (i = 0; i < nh->nh_labels; i++)
nh->nh_label[i] = cfg->rc_output_label[i];
-- 
2.11.0 (Apple Git-81)

[PATCH v2 net-next 4/9] net: add extack arg to lwtunnel build state

2017-05-27 Thread David Ahern

Pass extack arg down to lwtunnel_build_state and the build_state callbacks.
Add messages for failures in lwtunnel_build_state, and add the extack arg to
nla_parse where possible in the build_state callbacks.

Signed-off-by: David Ahern 
---
 include/linux/netlink.h   | 10 ++
 include/net/lwtunnel.h|  9 ++---
 net/core/lwt_bpf.c|  5 +++--
 net/core/lwtunnel.c   | 20 +---
 net/ipv4/fib_lookup.h |  3 ++-
 net/ipv4/fib_semantics.c  | 20 +++-
 net/ipv4/fib_trie.c   |  2 +-
 net/ipv4/ip_tunnel_core.c | 11 +++
 net/ipv6/ila/ila_lwt.c|  5 +++--
 net/ipv6/route.c  |  2 +-
 net/ipv6/seg6_iptunnel.c  |  5 +++--
 net/mpls/mpls_iptunnel.c  |  5 +++--
 12 files changed, 67 insertions(+), 30 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index a68aad484c69..8664fd26eb5d 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -102,6 +102,16 @@ struct netlink_ext_ack {
(extack)->bad_attr = (attr);\
 } while (0)
 
+#define NL_SET_ERR_MSG_ATTR(extack, attr, msg) do {\
+   static const char __msg[] = (msg);  \
+   struct netlink_ext_ack *__extack = (extack);\
+   \
+   if (__extack) { \
+   __extack->_msg = __msg; \
+   __extack->bad_attr = (attr);\
+   }   \
+} while (0)
+
 extern void netlink_kernel_release(struct sock *sk);
 extern int __netlink_change_ngroups(struct sock *sk, unsigned int groups);
 extern int netlink_change_ngroups(struct sock *sk, unsigned int groups);
diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
index ca6f002774ef..7c26863b8cf4 100644
--- a/include/net/lwtunnel.h
+++ b/include/net/lwtunnel.h
@@ -35,7 +35,8 @@ struct lwtunnel_state {
 struct lwtunnel_encap_ops {
int (*build_state)(struct nlattr *encap,
   unsigned int family, const void *cfg,
-  struct lwtunnel_state **ts);
+  struct lwtunnel_state **ts,
+  struct netlink_ext_ack *extack);
void (*destroy_state)(struct lwtunnel_state *lws);
int (*output)(struct net *net, struct sock *sk, struct sk_buff *skb);
int (*input)(struct sk_buff *skb);
@@ -114,7 +115,8 @@ int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int len,
 int lwtunnel_build_state(u16 encap_type,
 struct nlattr *encap,
 unsigned int family, const void *cfg,
-struct lwtunnel_state **lws);
+struct lwtunnel_state **lws,
+struct netlink_ext_ack *extack);
 int lwtunnel_fill_encap(struct sk_buff *skb,
struct lwtunnel_state *lwtstate);
 int lwtunnel_get_encap_size(struct lwtunnel_state *lwtstate);
@@ -192,7 +194,8 @@ static inline int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int len,
 static inline int lwtunnel_build_state(u16 encap_type,
   struct nlattr *encap,
   unsigned int family, const void *cfg,
-  struct lwtunnel_state **lws)
+  struct lwtunnel_state **lws,
+  struct netlink_ext_ack *extack)
 {
return -EOPNOTSUPP;
 }
diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
index b3bc0a31af9f..1307731ddfe4 100644
--- a/net/core/lwt_bpf.c
+++ b/net/core/lwt_bpf.c
@@ -240,7 +240,8 @@ static const struct nla_policy bpf_nl_policy[LWT_BPF_MAX + 1] = {
 
 static int bpf_build_state(struct nlattr *nla,
   unsigned int family, const void *cfg,
-  struct lwtunnel_state **ts)
+  struct lwtunnel_state **ts,
+  struct netlink_ext_ack *extack)
 {
struct nlattr *tb[LWT_BPF_MAX + 1];
struct lwtunnel_state *newts;
@@ -250,7 +251,7 @@ static int bpf_build_state(struct nlattr *nla,
if (family != AF_INET && family != AF_INET6)
return -EAFNOSUPPORT;
 
-   ret = nla_parse_nested(tb, LWT_BPF_MAX, nla, bpf_nl_policy, NULL);
+   ret = nla_parse_nested(tb, LWT_BPF_MAX, nla, bpf_nl_policy, extack);
if (ret < 0)
return ret;
 
diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index ab840386a74d..d9cb3532f1dd 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -103,25 +103,39 @@ EXPORT_SYMBOL(lwtunnel_encap_del_ops);
 
 int lwtunnel_build_state(u16 encap_type,
 struct nlattr *encap, unsigned int family,
-const void *cfg, struct lwtunnel_state **lws)
+const void *cfg, struct lwtunnel_state **lws,
+

[PATCH v2 net-next 3/9] net: lwtunnel: Add extack to encap attr validation

2017-05-27 Thread David Ahern

Pass extack down to lwtunnel_valid_encap_type and
lwtunnel_valid_encap_type_attr. Add messages for unknown
or unsupported encap types.

Signed-off-by: David Ahern 
---
 include/net/lwtunnel.h  | 13 +
 net/core/lwtunnel.c | 18 +-
 net/ipv4/fib_frontend.c |  6 --
 net/ipv6/route.c|  4 ++--
 4 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
index ebfe237aad7e..ca6f002774ef 100644
--- a/include/net/lwtunnel.h
+++ b/include/net/lwtunnel.h
@@ -107,8 +107,10 @@ int lwtunnel_encap_add_ops(const struct lwtunnel_encap_ops *op,
   unsigned int num);
 int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op,
   unsigned int num);
-int lwtunnel_valid_encap_type(u16 encap_type);
-int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int len);
+int lwtunnel_valid_encap_type(u16 encap_type,
+ struct netlink_ext_ack *extack);
+int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int len,
+  struct netlink_ext_ack *extack);
 int lwtunnel_build_state(u16 encap_type,
 struct nlattr *encap,
 unsigned int family, const void *cfg,
@@ -172,11 +174,14 @@ static inline int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op,
return -EOPNOTSUPP;
 }
 
-static inline int lwtunnel_valid_encap_type(u16 encap_type)
+static inline int lwtunnel_valid_encap_type(u16 encap_type,
+   struct netlink_ext_ack *extack)
 {
+   NL_SET_ERR_MSG(extack, "CONFIG_LWTUNNEL is not enabled in this kernel");
return -EOPNOTSUPP;
 }
-static inline int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int len)
+static inline int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int len,
+struct netlink_ext_ack *extack)
 {
/* return 0 since we are not walking attr looking for
 * RTA_ENCAP_TYPE attribute on nexthops.
diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index cfae3d5fe11f..ab840386a74d 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -126,14 +126,16 @@ int lwtunnel_build_state(u16 encap_type,
 }
 EXPORT_SYMBOL(lwtunnel_build_state);
 
-int lwtunnel_valid_encap_type(u16 encap_type)
+int lwtunnel_valid_encap_type(u16 encap_type, struct netlink_ext_ack *extack)
 {
const struct lwtunnel_encap_ops *ops;
int ret = -EINVAL;
 
if (encap_type == LWTUNNEL_ENCAP_NONE ||
-   encap_type > LWTUNNEL_ENCAP_MAX)
+   encap_type > LWTUNNEL_ENCAP_MAX) {
+   NL_SET_ERR_MSG(extack, "Unknown lwt encapsulation type");
return ret;
+   }
 
rcu_read_lock();
ops = rcu_dereference(lwtun_encaps[encap_type]);
@@ -153,11 +155,16 @@ int lwtunnel_valid_encap_type(u16 encap_type)
}
}
 #endif
-   return ops ? 0 : -EOPNOTSUPP;
+   ret = ops ? 0 : -EOPNOTSUPP;
+   if (ret < 0)
+   NL_SET_ERR_MSG(extack, "lwt encapsulation type not supported");
+
+   return ret;
 }
 EXPORT_SYMBOL(lwtunnel_valid_encap_type);
 
-int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int remaining)
+int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int remaining,
+  struct netlink_ext_ack *extack)
 {
struct rtnexthop *rtnh = (struct rtnexthop *)attr;
struct nlattr *nla_entype;
@@ -174,7 +181,8 @@ int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int remaining)
if (nla_entype) {
encap_type = nla_get_u16(nla_entype);
 
-   if (lwtunnel_valid_encap_type(encap_type) != 0)
+   if (lwtunnel_valid_encap_type(encap_type,
+ extack) != 0)
return -EOPNOTSUPP;
}
}
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 715b7967d8ea..4e678fa892dd 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -685,7 +685,8 @@ static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
break;
case RTA_MULTIPATH:
err = lwtunnel_valid_encap_type_attr(nla_data(attr),
-nla_len(attr));
+nla_len(attr),
+extack);
if (err < 0)
goto errout;
cfg->fc_mp = nla_data(attr);
@@ -702,7 +703,8 @@ static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
break;

[PATCH v2 net-next 5/9] net: Fill in extack for mpls lwt encap

2017-05-27 Thread David Ahern

Fill in extack for errors in build_state for mpls lwt encap including
passing extack to nla_get_labels and adding error messages for failures
in it.

Signed-off-by: David Ahern 
---
 net/mpls/af_mpls.c   | 49 ++--
 net/mpls/internal.h  |  2 +-
 net/mpls/mpls_iptunnel.c | 12 +++-
 3 files changed, 47 insertions(+), 16 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 257ec66009da..f3830951fb1c 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -728,8 +728,8 @@ static int mpls_nh_build(struct net *net, struct mpls_route *rt,
goto errout;
 
if (newdst) {
-   err = nla_get_labels(newdst, max_labels,
-   &nh->nh_labels, nh->nh_label);
+   err = nla_get_labels(newdst, max_labels, &nh->nh_labels,
+nh->nh_label, NULL);
if (err)
goto errout;
}
@@ -782,7 +782,8 @@ static u8 mpls_count_nexthops(struct rtnexthop *rtnh, int len,
 
nla = nla_find(attrs, attrlen, RTA_NEWDST);
if (nla &&
-   nla_get_labels(nla, MAX_NEW_LABELS, &n_labels, NULL) != 0)
+   nla_get_labels(nla, MAX_NEW_LABELS, &n_labels,
+  NULL, NULL) != 0)
return 0;
 
*max_labels = max_t(u8, *max_labels, n_labels);
@@ -1541,8 +1542,8 @@ int nla_put_labels(struct sk_buff *skb, int attrtype,
 }
 EXPORT_SYMBOL_GPL(nla_put_labels);
 
-int nla_get_labels(const struct nlattr *nla,
-  u8 max_labels, u8 *labels, u32 label[])
+int nla_get_labels(const struct nlattr *nla, u8 max_labels, u8 *labels,
+  u32 label[], struct netlink_ext_ack *extack)
 {
unsigned len = nla_len(nla);
struct mpls_shim_hdr *nla_label;
@@ -1553,13 +1554,18 @@ int nla_get_labels(const struct nlattr *nla,
/* len needs to be an even multiple of 4 (the label size). Number
 * of labels is a u8 so check for overflow.
 */
-   if (len & 3 || len / 4 > 255)
+   if (len & 3 || len / 4 > 255) {
+   NL_SET_ERR_MSG_ATTR(extack, nla,
+   "Invalid length for labels attribute");
return -EINVAL;
+   }
 
/* Limit the number of new labels allowed */
nla_labels = len/4;
-   if (nla_labels > max_labels)
+   if (nla_labels > max_labels) {
+   NL_SET_ERR_MSG(extack, "Too many labels");
return -EINVAL;
+   }
 
/* when label == NULL, caller wants number of labels */
if (!label)
@@ -1574,8 +1580,29 @@ int nla_get_labels(const struct nlattr *nla,
/* Ensure the bottom of stack flag is properly set
 * and ttl and tc are both clear.
 */
-   if ((dec.bos != bos) || dec.ttl || dec.tc)
+   if (dec.ttl) {
+   NL_SET_ERR_MSG_ATTR(extack, nla,
+   "TTL in label must be 0");
+   return -EINVAL;
+   }
+
+   if (dec.tc) {
+   NL_SET_ERR_MSG_ATTR(extack, nla,
+   "Traffic class in label must be 0");
return -EINVAL;
+   }
+
+   if (dec.bos != bos) {
+   NL_SET_BAD_ATTR(extack, nla);
+   if (bos) {
+   NL_SET_ERR_MSG(extack,
+  "BOS bit must be set in first label");
+   } else {
+   NL_SET_ERR_MSG(extack,
+  "BOS bit can only be set in first label");
+   }
+   return -EINVAL;
+   }
 
switch (dec.label) {
case MPLS_LABEL_IMPLNULL:
@@ -1583,6 +1610,8 @@ int nla_get_labels(const struct nlattr *nla,
 * assign and distribute, but which never
 * actually appears in the encapsulation.
 */
+   NL_SET_ERR_MSG_ATTR(extack, nla,
+   "Implicit NULL Label (3) can not be used in encapsulation");
return -EINVAL;
}
 
@@ -1696,14 +1725,14 @@ static int rtm_to_route_config(struct sk_buff *skb,  struct nlmsghdr *nlh,
case RTA_NEWDST:
if (nla_get_labels(nla, MAX_NEW_LABELS,
   &cfg->rc_output_labels,
-  cfg->rc_output_label))
+  cfg->rc_output_label, NULL))
goto errout;
break;
case RTA_DST:

[PATCH v2 net-next 7/9] net: mpls: Add extack messages for route add and delete failures

2017-05-27 Thread David Ahern

Add error messages for failures in adding and deleting mpls routes.
This covers most of the annoying EINVAL errors.

Signed-off-by: David Ahern 
---
 net/mpls/af_mpls.c  | 125 
 net/mpls/internal.h |   2 +-
 2 files changed, 87 insertions(+), 40 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 726eafecc793..0133d1ad9032 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -720,7 +720,8 @@ static int mpls_nh_build_from_cfg(struct mpls_route_config *cfg,
 
 static int mpls_nh_build(struct net *net, struct mpls_route *rt,
 struct mpls_nh *nh, int oif, struct nlattr *via,
-struct nlattr *newdst, u8 max_labels)
+struct nlattr *newdst, u8 max_labels,
+struct netlink_ext_ack *extack)
 {
int err = -ENOMEM;
 
@@ -729,14 +730,14 @@ static int mpls_nh_build(struct net *net, struct mpls_route *rt,
 
if (newdst) {
err = nla_get_labels(newdst, max_labels, &nh->nh_labels,
-nh->nh_label, NULL);
+nh->nh_label, extack);
if (err)
goto errout;
}
 
if (via) {
err = nla_get_via(via, &nh->nh_via_alen, &nh->nh_via_table,
- __mpls_nh_via(rt, nh));
+ __mpls_nh_via(rt, nh), extack);
if (err)
goto errout;
} else {
@@ -803,7 +804,8 @@ static u8 mpls_count_nexthops(struct rtnexthop *rtnh, int len,
 }
 
 static int mpls_nh_build_multi(struct mpls_route_config *cfg,
-  struct mpls_route *rt, u8 max_labels)
+  struct mpls_route *rt, u8 max_labels,
+  struct netlink_ext_ack *extack)
 {
struct rtnexthop *rtnh = cfg->rc_mp;
struct nlattr *nla_via, *nla_newdst;
@@ -837,7 +839,7 @@ static int mpls_nh_build_multi(struct mpls_route_config *cfg,
 
err = mpls_nh_build(cfg->rc_nlinfo.nl_net, rt, nh,
rtnh->rtnh_ifindex, nla_via, nla_newdst,
-   max_labels);
+   max_labels, extack);
if (err)
goto errout;
 
@@ -856,20 +858,28 @@ static int mpls_nh_build_multi(struct mpls_route_config *cfg,
return err;
 }
 
-static bool mpls_label_ok(struct net *net, unsigned int index)
+static bool mpls_label_ok(struct net *net, unsigned int index,
+ struct netlink_ext_ack *extack)
 {
/* Reserved labels may not be set */
-   if (index < MPLS_LABEL_FIRST_UNRESERVED)
+   if (index < MPLS_LABEL_FIRST_UNRESERVED) {
+   NL_SET_ERR_MSG(extack,
+  "Invalid label - must be MPLS_LABEL_FIRST_UNRESERVED or higher");
return false;
+   }
 
/* The full 20 bit range may not be supported. */
-   if (index >= net->mpls.platform_labels)
+   if (index >= net->mpls.platform_labels) {
+   NL_SET_ERR_MSG(extack,
+  "Label >= configured maximum in platform_labels");
return false;
+   }
 
return true;
 }
 
-static int mpls_route_add(struct mpls_route_config *cfg)
+static int mpls_route_add(struct mpls_route_config *cfg,
+ struct netlink_ext_ack *extack)
 {
struct mpls_route __rcu **platform_label;
struct net *net = cfg->rc_nlinfo.nl_net;
@@ -888,13 +898,15 @@ static int mpls_route_add(struct mpls_route_config *cfg)
index = find_free_label(net);
}
 
-   if (!mpls_label_ok(net, index))
+   if (!mpls_label_ok(net, index, extack))
goto errout;
 
/* Append makes no sense with mpls */
err = -EOPNOTSUPP;
-   if (cfg->rc_nlflags & NLM_F_APPEND)
+   if (cfg->rc_nlflags & NLM_F_APPEND) {
+   NL_SET_ERR_MSG(extack, "MPLS does not support route append");
goto errout;
+   }
 
err = -EEXIST;
platform_label = rtnl_dereference(net->mpls.platform_label);
@@ -921,8 +933,10 @@ static int mpls_route_add(struct mpls_route_config *cfg)
nhs = 1;
}
 
-   if (nhs == 0)
+   if (nhs == 0) {
+   NL_SET_ERR_MSG(extack, "Route does not contain a nexthop");
goto errout;
+   }
 
err = -ENOMEM;
rt = mpls_rt_alloc(nhs, max_via_alen, max_labels);
@@ -936,7 +950,7 @@ static int mpls_route_add(struct mpls_route_config *cfg)
rt->rt_ttl_propagate = cfg->rc_ttl_propagate;
 
if (cfg->rc_mp)
-   err = mpls_nh_build_multi(cfg, rt, max_labels);
+   err = mpls_nh_build_multi(cfg, rt, max_labels, extack);
else
err =

[PATCH v2 net-next 1/9] net: ipv4: refactor key and length checks

2017-05-27 Thread David Ahern

fib_table_insert and fib_table_delete have the same checks on the prefix
and length. Refactor into a helper. Avoids duplicate extack messages in
the next patch.

Signed-off-by: David Ahern 
---
 net/ipv4/fib_trie.c | 25 +++--
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 6e9df7d9bcc2..9bd46e1e1037 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1099,6 +1099,17 @@ static int fib_insert_alias(struct trie *t, struct key_vector *tp,
return 0;
 }
 
+static bool fib_valid_key_len(u32 key, u8 plen)
+{
+   if (plen > KEYLENGTH)
+   return false;
+
+   if ((plen < KEYLENGTH) && (key << plen))
+   return false;
+
+   return true;
+}
+
 /* Caller must hold RTNL. */
 int fib_table_insert(struct net *net, struct fib_table *tb,
 struct fib_config *cfg, struct netlink_ext_ack *extack)
@@ -1115,16 +1126,13 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
u32 key;
int err;
 
-   if (plen > KEYLENGTH)
-   return -EINVAL;
-
key = ntohl(cfg->fc_dst);
 
-   pr_debug("Insert table=%u %08x/%d\n", tb->tb_id, key, plen);
-
-   if ((plen < KEYLENGTH) && (key << plen))
+   if (!fib_valid_key_len(key, plen))
return -EINVAL;
 
+   pr_debug("Insert table=%u %08x/%d\n", tb->tb_id, key, plen);
+
fi = fib_create_info(cfg, extack);
if (IS_ERR(fi)) {
err = PTR_ERR(fi);
@@ -1518,12 +1526,9 @@ int fib_table_delete(struct net *net, struct fib_table *tb,
u8 tos = cfg->fc_tos;
u32 key;
 
-   if (plen > KEYLENGTH)
-   return -EINVAL;
-
key = ntohl(cfg->fc_dst);
 
-   if ((plen < KEYLENGTH) && (key << plen))
+   if (!fib_valid_key_len(key, plen))
return -EINVAL;
 
l = fib_find_node(t, &tp, key);
-- 
2.11.0 (Apple Git-81)

[PATCH v2 net-next 0/9] net: another round of extack handling for routing

2017-05-27 Thread David Ahern

This set focuses on passing extack through lwtunnel and MPLS with
additional catches for IPv4 route add and minor cleanups in MPLS
encountered passing the extack arg around.

v2
- mindful of bloat adding duplicate messages
  + refactored prefix and prefix length checks in ipv4's fib_table_insert
and fib_table_del
  + refactored label check in mpls

- split mpls cleanups into 2 patches
  + move nla_get_via up in af_mpls to avoid forward declaration

David Ahern (9):
  net: ipv4: refactor key and length checks
  net: ipv4: Add extack message for invalid prefix or length
  net: lwtunnel: Add extack to encap attr validation
  net: add extack arg to lwtunnel build state
  net: Fill in extack for mpls lwt encap
  net: mpls: Pull common label check into helper
  net: mpls: Add extack messages for route add and delete failures
  net: mpls: Make nla_get_via in af_mpls.c
  net: mpls: remove unnecessary initialization of err

 include/linux/netlink.h   |  10 ++
 include/net/ip_fib.h  |   3 +-
 include/net/lwtunnel.h|  22 ++--
 net/core/lwt_bpf.c|   5 +-
 net/core/lwtunnel.c   |  38 +--
 net/ipv4/fib_frontend.c   |  13 ++-
 net/ipv4/fib_lookup.h |   3 +-
 net/ipv4/fib_semantics.c  |  20 ++--
 net/ipv4/fib_trie.c   |  34 +++---
 net/ipv4/ip_tunnel_core.c |  11 +-
 net/ipv6/ila/ila_lwt.c|   5 +-
 net/ipv6/route.c  |   6 +-
 net/ipv6/seg6_iptunnel.c  |   5 +-
 net/mpls/af_mpls.c| 266 +-
 net/mpls/internal.h   |   4 +-
 net/mpls/mpls_iptunnel.c  |  17 +--
 16 files changed, 301 insertions(+), 161 deletions(-)

-- 
2.11.0 (Apple Git-81)

[PATCH v2 net-next 2/9] net: ipv4: Add extack message for invalid prefix or length

2017-05-27 Thread David Ahern

Add extack error message for invalid prefix length and invalid prefix.
Example of the latter is a route spec containing 172.16.100.1/24, where
the /24 mask means the lower 8-bits should be 0. Amazing how easy that
one is to overlook when an EINVAL is returned.

Signed-off-by: David Ahern 
---
 include/net/ip_fib.h|  3 ++-
 net/ipv4/fib_frontend.c |  7 ---
 net/ipv4/fib_trie.c | 17 +++--
 3 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 25f5c516afd1..c3fa1f0438c1 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -266,7 +266,8 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 struct fib_result *res, int fib_flags);
 int fib_table_insert(struct net *, struct fib_table *, struct fib_config *,
 struct netlink_ext_ack *extack);
-int fib_table_delete(struct net *, struct fib_table *, struct fib_config *);
+int fib_table_delete(struct net *, struct fib_table *, struct fib_config *,
+struct netlink_ext_ack *extack);
 int fib_table_dump(struct fib_table *table, struct sk_buff *skb,
   struct netlink_callback *cb);
 int fib_table_flush(struct net *net, struct fib_table *table);
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 14d2f7bd7c76..715b7967d8ea 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -588,7 +588,8 @@ int ip_rt_ioctl(struct net *net, unsigned int cmd, void __user *arg)
if (cmd == SIOCDELRT) {
tb = fib_get_table(net, cfg.fc_table);
if (tb)
-   err = fib_table_delete(net, tb, &cfg);
+   err = fib_table_delete(net, tb, &cfg,
+  NULL);
else
err = -ESRCH;
} else {
@@ -732,7 +733,7 @@ static int inet_rtm_delroute(struct sk_buff *skb, struct nlmsghdr *nlh,
goto errout;
}
 
-   err = fib_table_delete(net, tb, &cfg);
+   err = fib_table_delete(net, tb, &cfg, extack);
 errout:
return err;
 }
@@ -851,7 +852,7 @@ static void fib_magic(int cmd, int type, __be32 dst, int dst_len, struct in_ifad
if (cmd == RTM_NEWROUTE)
fib_table_insert(net, tb, &cfg, NULL);
else
-   fib_table_delete(net, tb, &cfg);
+   fib_table_delete(net, tb, &cfg, NULL);
 }
 
 void fib_add_ifaddr(struct in_ifaddr *ifa)
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 9bd46e1e1037..a624d380c81d 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1099,13 +1099,18 @@ static int fib_insert_alias(struct trie *t, struct key_vector *tp,
return 0;
 }
 
-static bool fib_valid_key_len(u32 key, u8 plen)
+static bool fib_valid_key_len(u32 key, u8 plen, struct netlink_ext_ack *extack)
 {
-   if (plen > KEYLENGTH)
+   if (plen > KEYLENGTH) {
+   NL_SET_ERR_MSG(extack, "Invalid prefix length");
return false;
+   }
 
-   if ((plen < KEYLENGTH) && (key << plen))
+   if ((plen < KEYLENGTH) && (key << plen)) {
+   NL_SET_ERR_MSG(extack,
+  "Invalid prefix for given prefix length");
return false;
+   }
 
return true;
 }
@@ -1128,7 +1133,7 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 
key = ntohl(cfg->fc_dst);
 
-   if (!fib_valid_key_len(key, plen))
+   if (!fib_valid_key_len(key, plen, extack))
return -EINVAL;
 
pr_debug("Insert table=%u %08x/%d\n", tb->tb_id, key, plen);
@@ -1516,7 +1521,7 @@ static void fib_remove_alias(struct trie *t, struct key_vector *tp,
 
 /* Caller must hold RTNL. */
 int fib_table_delete(struct net *net, struct fib_table *tb,
-struct fib_config *cfg)
+struct fib_config *cfg, struct netlink_ext_ack *extack)
 {
struct trie *t = (struct trie *) tb->tb_data;
struct fib_alias *fa, *fa_to_delete;
@@ -1528,7 +1533,7 @@ int fib_table_delete(struct net *net, struct fib_table *tb,
 
key = ntohl(cfg->fc_dst);
 
-   if (!fib_valid_key_len(key, plen))
+   if (!fib_valid_key_len(key, plen, extack))
return -EINVAL;
 
l = fib_find_node(t, &tp, key);
-- 
2.11.0 (Apple Git-81)
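
To make the 172.16.100.1/24 case from the commit message concrete, here is
a rough userspace sketch of the same two checks. It is an illustration
under stated assumptions rather than the kernel code: the const char **msg
out-parameter is a hypothetical stand-in for struct netlink_ext_ack, and
KEYLENGTH is taken to be 32, the width of an IPv4 key.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KEYLENGTH 32	/* bits in an IPv4 key */

/*
 * key is the destination in host byte order, plen the prefix length.
 * msg (may be NULL) receives the error string; it plays the role that
 * extack plays in the patch.
 */
static bool fib_valid_key_len(uint32_t key, uint8_t plen, const char **msg)
{
	if (plen > KEYLENGTH) {
		if (msg)
			*msg = "Invalid prefix length";
		return false;
	}

	/* Any bit left after shifting out the prefix is a stray host bit. */
	if ((plen < KEYLENGTH) && (uint32_t)(key << plen)) {
		if (msg)
			*msg = "Invalid prefix for given prefix length";
		return false;
	}

	return true;
}
```

172.16.100.1 is 0xAC106401 in host order; shifting left by 24 leaves
0x01000000, so the /24 is rejected with the second message, while
172.16.100.0/24 (0xAC106400) shifts to zero and passes.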

Re: [PATCH net-next] net: stmmac: use correct pointer when printing normal descriptor ring

2017-05-27 Thread Andy Shevchenko

On Tue, May 9, 2017 at 7:52 PM, Niklas Cassel  wrote:
> From: Niklas Cassel 

Commit message?

> seq_printf(seq, "%d [0x%x]: 0x%x 0x%x 0x%x 0x%x\n",
> -  i, (unsigned int)virt_to_phys(ep),
> +  i, (unsigned int)virt_to_phys(p),

There should not be a cast here. The pointer might be 64-bit, so %pap
must be used instead, with a reference to the physical address.

>le32_to_cpu(p->des0), le32_to_cpu(p->des1),
>le32_to_cpu(p->des2), le32_to_cpu(p->des3));
> p++;

-- 
With Best Regards,
Andy Shevchenko

[PATCH v2] mac80211: Invoke TX LED in more code paths

2017-05-27 Thread Bjorn Andersson

ieee80211_tx_status() is only one of the possible ways a driver can
report a handled packet; some drivers call this for every packet while
others call it rarely or never.

In order to invoke the TX LED in the non-status reporting cases this
patch pushes the call to ieee80211_led_tx() into
ieee80211_report_used_skb(), which is shared between the various code
paths.

Signed-off-by: Bjorn Andersson 
---
 net/mac80211/status.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/mac80211/status.c b/net/mac80211/status.c
index be47ac5cd8c8..a9fa6ee57e8f 100644
--- a/net/mac80211/status.c
+++ b/net/mac80211/status.c
@@ -546,6 +546,8 @@ static void ieee80211_report_used_skb(struct ieee80211_local *local,
skb->wifi_acked_valid = 1;
skb->wifi_acked = acked;
}
+
+   ieee80211_led_tx(local);
 }
 
 /*
@@ -823,8 +825,6 @@ static void __ieee80211_tx_status(struct ieee80211_hw *hw,
}
}
 
-   ieee80211_led_tx(local);
-
/* SNMP counters
 * Fragments are passed to low-level drivers as separate skbs, so these
 * are actually fragments, not frames. Update frame counters only for
-- 
2.12.0

Re: [PATCH V6 net-next 2/2] bonding: Prevent duplicate userspace notification

2017-05-27 Thread David Ahern

On 5/27/17 8:14 AM, Vladislav Yasevich wrote:
> Whenever a user changes bonding options, a NETDEV_CHANGEINFODATA
> notification is generated which results in an rtnetlink message
> being sent.  While running 'ip monitor', we can actually see 2 messages,
> one a result of the event, and the other a result of state change
> that is generated by netdev_state_change().  However, this is not
> always the case. If bonding changes were done via sysfs or ifenslave
> (old ioctl interface), then only 1 message is seen.
> 
> This patch removes duplicate messages in the case of using netlink
> to configure bonding.  It introduces a separate function that
> triggers a netdev event and uses that function in the sysfs and ioctl
> cases.
> 
> This was discovered while auditing all the different events and
> continues the effort of cleaning up duplicated netlink messages.
> 
> CC: David Ahern 
> CC: Jiri Pirko 
> Signed-off-by: Vladislav Yasevich 
> ---
>  drivers/net/bonding/bond_main.c|  3 ++-
>  drivers/net/bonding/bond_options.c | 27 +--
>  include/net/bond_options.h |  2 ++
>  3 files changed, 29 insertions(+), 3 deletions(-)
> 

Acked-by: David Ahern

Re: [PATCH V6 net-next 1/2] rtnl: Add support for netdev event to link messages

2017-05-27 Thread David Ahern

On 5/27/17 8:14 AM, Vladislav Yasevich wrote:
> When netdev events happen, a rtnetlink_event() handler will send
> messages for every event in its white list.  These messages contain
> current information about a particular device, but they do not include
> the information about which event just happened.  So, it is impossible
> to tell what just happened for these events.
> 
> This patch adds a new extension to RTM_NEWLINK message called IFLA_EVENT
> that would have an encoding of event that triggered this
> message.  This would allow the message consumer to easily determine
> if it needs to perform certain actions.
> 
> Signed-off-by: Vladislav Yasevich 
> ---
>  include/linux/rtnetlink.h|  3 +-
>  include/uapi/linux/if_link.h | 11 
>  net/core/dev.c   |  2 +-
>  net/core/rtnetlink.c | 65 ++--
>  4 files changed, 70 insertions(+), 11 deletions(-)

Acked-by: David Ahern
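
For readers following the IFLA_EVENT discussion, here is a minimal userspace sketch of how a consumer might pick the event out of a link message's attribute area. It is self-contained and deliberately does not use the kernel headers: the ex_ names, the EX_IFLA_EVENT number and both helpers are hypothetical stand-ins, and only the attribute layout (16-bit length, 16-bit type, payload padded to 4 bytes) follows the usual netlink convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute type number, for illustration only. */
#define EX_IFLA_EVENT 44
#define EX_ALIGN(len) (((len) + 3u) & ~3u)

/* Netlink-style attribute header: total length (header + payload), then type. */
struct ex_attr {
	uint16_t len;
	uint16_t type;
};

/* Append one u32 attribute to buf at offset off; returns the new offset. */
static size_t ex_put_u32(unsigned char *buf, size_t off, uint16_t type,
			 uint32_t val)
{
	struct ex_attr hdr = { .len = sizeof(hdr) + sizeof(val), .type = type };

	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + sizeof(hdr), &val, sizeof(val));
	return off + EX_ALIGN(hdr.len);
}

/* Walk the attributes; return 1 and store the value if type is found. */
static int ex_find_u32(const unsigned char *buf, size_t len, uint16_t type,
		       uint32_t *out)
{
	size_t off = 0;

	while (off + sizeof(struct ex_attr) <= len) {
		struct ex_attr hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len < sizeof(hdr) || hdr.len > len - off)
			return 0;	/* malformed attribute, stop */
		if (hdr.type == type && hdr.len == sizeof(hdr) + sizeof(*out)) {
			memcpy(out, buf + off + sizeof(hdr), sizeof(*out));
			return 1;
		}
		off += EX_ALIGN(hdr.len);
	}
	return 0;
}
```

A consumer would call ex_find_u32() on the attribute area of a link message and compare the result against the IFLA_EVENT_* values to decide whether any action is needed; a message without the attribute simply yields 0.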

Re: [PATCH 7/7] mlx5: Do not build eswitch_offloads if CONFIG_MLX5_EN_ESWITCH_OFFLOADS is set

2017-05-27 Thread Or Gerlitz

On Sat, May 27, 2017 at 12:16 AM, Jes Sorensen  wrote:
> This gets rid of the temporary #ifdef spaghetti and allows the code to
> compile without offload support enabled.

Hi Jes,

I am pretty sure we can do that exercise you're up to without any
spaghetti cooking and even put more code under that CONFIG directive
(en_rep.c), I'll take that with Saeed.

Just wondering: are you motivated by a wish to put some mlx5
functionalities under their own CONFIG directives, which could be
useful when backporting the latest upstream driver into an older kernel
without having to deal with parts of it? In that respect,
are you using SRIOV but not the offloads mode?

Or.

Re: [PATCH net-next 1/4] net/flow_dissector: add support for dissection of misc ip header fields

2017-05-27 Thread Or Gerlitz

On Sat, May 27, 2017 at 8:18 PM, Tom Herbert  wrote:

> I think the problem is I don't know what you're dealing with. The only
> thing I can derive from the commit log is that tos and ttl are being
> extracted, but I don't know why they are needed.

The current case for matching on TTL I am dealing with is using
TC/flower for offloading OVS in a flow-based VM traffic routing env
(Open-Stack and ODL DVR - Distributed Virtual Routing) -- where packet
headers are re-written to set the next hop MACs and the TTL is changed.
Fields which are modified are also matched beforehand, and here comes
the matching on TTL.

> I do know this is
> adding complexity to an already overly complex function, and this
> introduces new conditionals and code into the primary use case of
> flow_dissector which is to create a key for deriving skb->hash. I
> don't see that the cost of this patch has been justified.

I hear what you're saying, but part of the rules is that everything to
be offloaded can also be carried out in the kernel SW data-path, which
is why this area gets touched. I have used the smallest footprint
I could and set the code to run in a separate helper called from the
main dissection function.

>> When we did the flower patches for being able to classify on both
>> the inner and outer fields (say outer src/dst ip, tunnel key) for what
>> relates to the macs/ips/ports/etc -- I don't think we touched the
>> existing dissection, I will look into that to see if I am wrong..

Re: running an eBPF program

2017-05-27 Thread Y Song

On Sat, May 27, 2017 at 1:23 PM, Y Song  wrote:
>
> From verifier error message:
> ==
> 0: (bf) r6 = r1
>
> 1: (18) r9 = 0xffee
>
> 3: (69) r0 = *(u16 *)(r6 +16)
>
> invalid bpf_context access off=16 size=2
> ==
>
> The offset 16 of struct __sk_buff is hash.
> What instruction #3 tries to do is to access 2 bytes of the hash value
> instead of full 4 bytes.
> This is explicitly not allowed in verifier due to endianness issue.


I can reproduce the issue now. My previous statement that it accesses
the "hash" field is not correct. It is accessing the protocol field.

static __inline__ bool flow_dissector(struct __sk_buff *skb,
  struct flow_keys *flow)
{
int poff, nh_off = BPF_LL_OFF + ETH_HLEN;
__be16 proto = skb->protocol;
__u8 ip_proto;

The plan so far is to see whether we can fix the issue on the LLVM side.

Yonghong

>
>
> Looking at the iproute2 example code, it looks like the following may be responsible:
>
> bpf_tailcall.c:#define MAX_JMP_SIZE 2
> bpf_tailcall.c:tail_call(skb, &jmp_tc, skb->hash & (MAX_JMP_SIZE - 1));
>
> I am thinking of implementing something in LLVM to prevent
> optimization from LD4=>LD2/LD1 for context access like this.
>
>
> On Fri, May 26, 2017 at 4:00 AM, Adel Fuchs  wrote:
> > Hi
> >
> > I'm trying to run this eBPF program:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git/tree/examples/bpf
> >
> >
> > and I get this error:
> >
> >
> > :~/iproute2/examples/bpf$sudo tc filter add dev enx00e11100329b parent
> > 1: bpf obj bpf.o exp /tmp/bpf-uds flowid 1:1 action bpf obj bpf.o sec
> > action-markaction bpf obj bpf.o sec action-rand ok
> >
> > [sudo] password for adel:
> >
> >
> >
> > Prog section 'classifier' rejected: Permission denied (13)!
> >
> > - Type: 3
> >
> > - Instructions: 218 (0 over limit)
> >
> > - License:  GPL
> >
> >
> >
> > Verifier analysis:
> >
> >
> >
> > 0: (bf) r6 = r1
> >
> > 1: (18) r9 = 0xffee
> >
> > 3: (69) r0 = *(u16 *)(r6 +16)
> >
> > invalid bpf_context access off=16 size=2
> >
> >
> >
> > Error fetching program/map!
> >
> > Failed to retrieve (e)BPF data!
> >
> >
> > Any suggestions?
> >
> > Thanks,
> >
> > Adel

Re: running an eBPF program

2017-05-27 Thread Y Song

From verifier error message:
==
0: (bf) r6 = r1

1: (18) r9 = 0xffee

3: (69) r0 = *(u16 *)(r6 +16)

invalid bpf_context access off=16 size=2
==

The offset 16 of struct __sk_buff is hash.
What instruction #3 tries to do is to access 2 bytes of the hash value
instead of full 4 bytes.
This is explicitly not allowed in verifier due to endianness issue.
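
To make the endianness concern concrete, here is a minimal C sketch. It is not verifier or LLVM code, and the helper names are made up for illustration. It shows that a 2-byte load of the low half of a 4-byte field has to use a different byte offset depending on host byte order, which is why a narrowed context access cannot be checked against a single fixed offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Byte offset at which a load of load_size bytes picks up the low-order
 * bytes of a field_size-byte field starting at field_off.
 */
static size_t narrow_load_off(size_t field_off, size_t field_size,
			      size_t load_size, int big_endian)
{
	assert(load_size <= field_size);
	return big_endian ? field_off + (field_size - load_size) : field_off;
}

/* Returns 1 when the host stores the most significant byte first. */
static int host_is_big_endian(void)
{
	const uint16_t probe = 0x0102;
	unsigned char first;

	memcpy(&first, &probe, 1);
	return first == 0x01;
}

/* Read the low 16 bits of a 32-bit field in ctx with a 2-byte load. */
static uint16_t load_low16(const unsigned char *ctx, size_t field_off)
{
	uint16_t v;
	size_t off = narrow_load_off(field_off, sizeof(uint32_t), sizeof(v),
				     host_is_big_endian());

	memcpy(&v, ctx + off, sizeof(v));
	return v;
}
```

For a 4-byte field at offset 16, the helper gives offset 16 on a little-endian host and offset 18 on a big-endian one for the same 2-byte load.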

Looking at the iproute2 example code, it looks like the following may be responsible:

bpf_tailcall.c:#define MAX_JMP_SIZE 2
bpf_tailcall.c:tail_call(skb, &jmp_tc, skb->hash & (MAX_JMP_SIZE - 1));

I am thinking of implementing something in LLVM to prevent
optimization from LD4=>LD2/LD1 for context access like this.


On Fri, May 26, 2017 at 4:00 AM, Adel Fuchs  wrote:
> Hi
>
> I'm trying to run this eBPF program:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git/tree/examples/bpf
>
>
> and I get this error:
>
>
> :~/iproute2/examples/bpf$sudo tc filter add dev enx00e11100329b parent
> 1: bpf obj bpf.o exp /tmp/bpf-uds flowid 1:1 action bpf obj bpf.o sec
> action-markaction bpf obj bpf.o sec action-rand ok
>
> [sudo] password for adel:
>
>
>
> Prog section 'classifier' rejected: Permission denied (13)!
>
> - Type: 3
>
> - Instructions: 218 (0 over limit)
>
> - License:  GPL
>
>
>
> Verifier analysis:
>
>
>
> 0: (bf) r6 = r1
>
> 1: (18) r9 = 0xffee
>
> 3: (69) r0 = *(u16 *)(r6 +16)
>
> invalid bpf_context access off=16 size=2
>
>
>
> Error fetching program/map!
>
> Failed to retrieve (e)BPF data!
>
>
> Any suggestions?
>
> Thanks,
>
> Adel

[PATCH net-next v2] net: phy: Relax error checking on sysfs_create_link()

2017-05-27 Thread Florian Fainelli

Some Ethernet drivers will attach/connect to a PHY device before calling
register_netdevice() which is responsible for calling netdev_register_kobject()
which would do the network device's kobject initialization. In such a case,
sysfs_create_link() would return -ENOENT because the network device's kobject
is not ready yet, and we would fail to connect to the PHY device.

In order to keep things simple and symmetrical, we just take the success path as
indicative of the ability to access the network device's kobject, and create
the second link if that's the case.

Fixes: 5568363f0cb3 ("net: phy: Create sysfs reciprocal links for attached_dev/phydev")
Reported-by: Woojung Hung 
Signed-off-by: Florian Fainelli 
---
Changes in v2:
- make sure phydev->sysfs_links is set to false before setting again

 drivers/net/phy/phy_device.c | 30 ++
 include/linux/phy.h  |  2 ++
 2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index f84414b8f2ee..37a1e98908e3 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -960,15 +960,27 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
 
phydev->attached_dev = dev;
dev->phydev = phydev;
+
+   /* Some Ethernet drivers try to connect to a PHY device before
+* calling register_netdevice() -> netdev_register_kobject() and
+* does the dev->dev.kobj initialization. Here we only check for
+* success which indicates that the network device kobject is
+* ready. Once we do that we still need to keep track of whether
+* links were successfully set up or not for phy_detach() to
+* remove them accordingly.
+*/
+   phydev->sysfs_links = false;
+
err = sysfs_create_link(&phydev->mdio.dev.kobj, &dev->dev.kobj,
"attached_dev");
-   if (err)
-   goto error;
+   if (!err) {
+   err = sysfs_create_link(&dev->dev.kobj, &phydev->mdio.dev.kobj,
+   "phydev");
+   if (err)
+   goto error;
 
-   err = sysfs_create_link(&dev->dev.kobj, &phydev->mdio.dev.kobj,
-   "phydev");
-   if (err)
-   goto error;
+   phydev->sysfs_links = true;
+   }
 
phydev->dev_flags = flags;
 
@@ -1059,8 +1071,10 @@ void phy_detach(struct phy_device *phydev)
struct mii_bus *bus;
int i;
 
-   sysfs_remove_link(&dev->dev.kobj, "phydev");
-   sysfs_remove_link(&phydev->mdio.dev.kobj, "attached_dev");
+   if (phydev->sysfs_links) {
+   sysfs_remove_link(&dev->dev.kobj, "phydev");
+   sysfs_remove_link(&phydev->mdio.dev.kobj, "attached_dev");
+   }
phydev->attached_dev->phydev = NULL;
phydev->attached_dev = NULL;
phy_suspend(phydev);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 5a808a26e4cf..58f1b45a4c44 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -363,6 +363,7 @@ struct phy_c45_device_ids {
  * is_pseudo_fixed_link: Set to true if this phy is an Ethernet switch, etc.
  * has_fixups: Set to true if this phy has fixups/quirks.
  * suspended: Set to true if this phy has been suspended successfully.
+ * sysfs_links: Internal boolean tracking sysfs symbolic links setup/removal.
  * state: state of the PHY for management purposes
  * dev_flags: Device-specific flags used by the PHY driver.
  * link_timeout: The number of timer firings to wait before the
@@ -399,6 +400,7 @@ struct phy_device {
bool is_pseudo_fixed_link;
bool has_fixups;
bool suspended;
+   bool sysfs_links;
 
enum phy_state state;
 
-- 
2.11.0
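
The control flow of the patch above can be modelled in plain userspace C. This is only a sketch under stated assumptions: the ex_ types and functions are hypothetical stand-ins, not the kernel's phy or sysfs API, and link creation is reduced to a stub that fails with -ENOENT until the network device's kobject is ready.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two devices; not the kernel structures. */
struct ex_netdev {
	bool kobj_ready;	/* set once the device has been registered */
};

struct ex_phy {
	struct ex_netdev *attached_dev;
	bool links;		/* mirrors the sysfs_links flag in the patch */
	int nr_links;		/* number of links currently present */
};

/* Stub for link creation: fails with -ENOENT until the kobject exists. */
static int ex_create_link(struct ex_phy *phy, const struct ex_netdev *dev)
{
	if (!dev->kobj_ready)
		return -ENOENT;
	phy->nr_links++;
	return 0;
}

static int ex_attach(struct ex_phy *phy, struct ex_netdev *dev)
{
	int err;

	phy->attached_dev = dev;
	phy->links = false;	/* do not trust a value left by an earlier attach */

	err = ex_create_link(phy, dev);		/* "attached_dev" */
	if (!err) {
		err = ex_create_link(phy, dev);	/* "phydev" */
		if (err)
			return err;
		phy->links = true;
	}
	return 0;	/* a failed first link is tolerated */
}

static void ex_detach(struct ex_phy *phy)
{
	if (phy->links)
		phy->nr_links -= 2;	/* only remove what was created */
	phy->attached_dev = NULL;
}
```

Resetting the flag at the top of ex_attach() corresponds to the v2 change: without it, a flag left set by an earlier successful attach would make a later detach remove links that were never created.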

Re: [PATCH net-next 1/4] net/flow_dissector: add support for dissection of misc ip header fields

2017-05-27 Thread Tom Herbert

On Sat, May 27, 2017 at 9:31 AM, Or Gerlitz  wrote:
> On Thu, May 25, 2017 at 7:22 PM, Tom Herbert  wrote:
>> On Thu, May 25, 2017 at 6:24 AM, Or Gerlitz  wrote:
>>> Add support for dissection of ip tos and ttl and ipv6 traffic-class
>>> and hoplimit. Both are dissected into the same struct.
>
>>> Uses similar call to ip dissection function as with tcp, arp and others.
>
>
>>> +/**
>>> + * struct flow_dissector_key_ip:
>>> + * @tos: tos
>>> + * @ttl: ttl
>>> + */
>>> +struct flow_dissector_key_ip {
>>> +   __u8 tos;
>>> +   __u8 ttl;
>>> +};
>>> --- a/net/core/flow_dissector.c
>>> +++ b/net/core/flow_dissector.c
>
>>> +static void
>>> +__skb_flow_dissect_ipv4(const struct sk_buff *skb,
>>> +   struct flow_dissector *flow_dissector,
>>> +   void *target_container, void *data, const struct iphdr *iph)
>>> +{
>>> +   struct flow_dissector_key_ip *key_ip;
>>> +
>>> +   if (!dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_IP))
>>> +   return;
>>> +
>>> +   key_ip = skb_flow_dissector_target(flow_dissector,
>>> +  FLOW_DISSECTOR_KEY_IP,
>>> +  target_container);
>>> +   key_ip->tos = iph->tos;
>>> +   key_ip->ttl = iph->ttl;
>>
>> In an encapsulation this returns the tos and ttl of the encapsulated
>> packet. Is that really useful to the caller? Seems more likely that
>> they need the outer tos and ttl for forwarding.
>
> In what we are dealing with, classification is carried out after the
> packet is decapsulated by the shared tunnel device. So even today, e.g.
> for the src/dst IP, the dissection is carried out on what were the inner
> fields before decap.
>
Or,

I think the problem is I don't know what you're dealing with. The only
thing I can derive from the commit log is that tos and ttl are being
extracted, but I don't know why they are needed. I do know this is
adding complexity to an already overly complex function, and this
introduces new conditionals and code into the primary use case of
flow_dissector which is to create a key for deriving skb->hash. I
don't see that the cost of this patch has been justified.

Tom

> When we did the flower patches for being able to classify on both
> the inner and outer fields (say outer src/dst ip, tunnel key) for what
> relates to the macs/ips/ports/etc -- I don't think we touched the
> existing dissection, I will look into that to see if I am wrong..

Re: [PATCH net-next 1/4] net/flow_dissector: add support for dissection of misc ip header fields

2017-05-27 Thread Or Gerlitz

On Thu, May 25, 2017 at 7:22 PM, Tom Herbert  wrote:
> On Thu, May 25, 2017 at 6:24 AM, Or Gerlitz  wrote:
>> Add support for dissection of ip tos and ttl and ipv6 traffic-class
>> and hoplimit. Both are dissected into the same struct.

>> Uses similar call to ip dissection function as with tcp, arp and others.


>> +/**
>> + * struct flow_dissector_key_ip:
>> + * @tos: tos
>> + * @ttl: ttl
>> + */
>> +struct flow_dissector_key_ip {
>> +   __u8 tos;
>> +   __u8 ttl;
>> +};
>> --- a/net/core/flow_dissector.c
>> +++ b/net/core/flow_dissector.c

>> +static void
>> +__skb_flow_dissect_ipv4(const struct sk_buff *skb,
>> +   struct flow_dissector *flow_dissector,
>> +   void *target_container, void *data, const struct iphdr *iph)
>> +{
>> +   struct flow_dissector_key_ip *key_ip;
>> +
>> +   if (!dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_IP))
>> +   return;
>> +
>> +   key_ip = skb_flow_dissector_target(flow_dissector,
>> +  FLOW_DISSECTOR_KEY_IP,
>> +  target_container);
>> +   key_ip->tos = iph->tos;
>> +   key_ip->ttl = iph->ttl;
>
> In an encapsulation this returns the tos and ttl of the encapsulated
> packet. Is that really useful to the caller? Seems more likely that
> they need the outer tos and ttl for forwarding.

In what we are dealing with, classification is carried out after the
packet is decapsulated by the shared tunnel device. So even today, e.g.
for the src/dst IP, the dissection is carried out on what were the inner
fields before decap.

When we did the flower patches for being able to classify on both
the inner and outer fields (say outer src/dst ip, tunnel key) for what
relates to the macs/ips/ports/etc -- I don't think we touched the
existing dissection, I will look into that to see if I am wrong..

Re: [PATCH net-next 1/4] net/flow_dissector: add support for dissection of misc ip header fields

2017-05-27 Thread Or Gerlitz

On Thu, May 25, 2017 at 6:42 PM, Tom Herbert  wrote:
> On Thu, May 25, 2017 at 6:24 AM, Or Gerlitz  wrote:
>> Add support for dissection of ip tos and ttl and ipv6 traffic-class
>> and hoplimit. Both are dissected into the same struct.

>> --- a/include/net/flow_dissector.h
>> +++ b/include/net/flow_dissector.h

>> +/**
>> + * struct flow_dissector_key_ip:
>> + * @tos: tos
>> + * @ttl: ttl
>> + */
>> +struct flow_dissector_key_ip {
>> +   __u8 tos;
>> +   __u8 ttl;
>> +};
>> +

> Looks like yet more complexity being piled onto flow dissector. Instead
> of splitting out individual fields can we just return a pointer to the
> IP header and let the caller extract the fields they're interested in?

Do you mean that struct flow_dissector_key_ip will only contain
(union?) const struct iphdr * and const struct ipv6hdr * ? I wasn't
sure how that would then look on the kernel SW classification path
(the non-offloaded case)
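
As a reference point for the "both are dissected into the same struct" design, here is a small standalone sketch of pulling TOS/TTL from an IPv4 header and traffic class/hop limit from an IPv6 header into one key. It works on raw header bytes rather than struct iphdr or struct ipv6hdr, and the ex_ names are illustrative only, not the flow dissector API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative mirror of the key in the patch: one struct for both families. */
struct ex_key_ip {
	uint8_t tos;	/* IPv4 TOS or IPv6 traffic class */
	uint8_t ttl;	/* IPv4 TTL or IPv6 hop limit */
};

/* Fill the key from the first bytes of an IP header; returns 0 on success. */
static int ex_dissect_ip(const uint8_t *hdr, size_t len, struct ex_key_ip *key)
{
	if (len < 1)
		return -1;

	switch (hdr[0] >> 4) {	/* version nibble */
	case 4:
		if (len < 20)
			return -1;
		key->tos = hdr[1];
		key->ttl = hdr[8];
		return 0;
	case 6:
		if (len < 40)
			return -1;
		/* traffic class straddles the first two bytes */
		key->tos = (uint8_t)(((hdr[0] & 0x0f) << 4) | (hdr[1] >> 4));
		key->ttl = hdr[7];
		return 0;
	default:
		return -1;
	}
}
```

Copying the two bytes out, as in the posted patch, keeps the classifier independent of the address family; returning a header pointer instead would move this version switch into every caller.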

Re: [PATCH] net/core: remove explicit do_softirq() from busy_poll_stop()

2017-05-27 Thread Sebastian Andrzej Siewior

On 2017-05-22 14:26:44 [-0700], Eric Dumazet wrote:
> On Mon, May 22, 2017 at 12:26 PM, Sebastian Andrzej Siewior
>  wrote:
> > Since commit 217f69743681 ("net: busy-poll: allow preemption in
> > sk_busy_loop()") there is an explicit do_softirq() invocation after
> > local_bh_enable() has been invoked.
> > I don't understand why we need this because local_bh_enable() will
> > invoke do_softirq() once the softirq counter reached zero and we have
> > softirq-related work pending.
> >
> > Signed-off-by: Sebastian Andrzej Siewior 
> > ---
> >  net/core/dev.c | 2 --
> >  1 file changed, 2 deletions(-)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index fca407b4a6ea..e84eb0ec5529 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -5199,8 +5199,6 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock)
> > if (rc == BUSY_POLL_BUDGET)
> > __napi_schedule(napi);
> > local_bh_enable();
> > -   if (local_softirq_pending())
> > -   do_softirq();
> >  }
> 
> preemption is disabled.
so? This patch:

diff --git a/init/main.c b/init/main.c
--- a/init/main.c
+++ b/init/main.c
@@ -1001,6 +1001,21 @@ static int __ref kernel_init(void *unused)
  "See Linux Documentation/admin-guide/init.rst for guidance.");
 }
 
+static void delay_thingy_func(struct work_struct *x)
+{
+   preempt_disable();
+   local_bh_disable();
+   pr_err("one %s\n", current->comm);
+   raise_softirq(TASKLET_SOFTIRQ);
+   pr_err("two %s\n", current->comm);
+   local_bh_enable();
+   pr_err("three %s\n", current->comm);
+   preempt_enable();
+   pr_err("four %s\n", current->comm);
+}
+
+static DECLARE_DELAYED_WORK(delay_thingy, delay_thingy_func);
+
 static noinline void __init kernel_init_freeable(void)
 {
/*
@@ -1038,6 +1053,7 @@ static noinline void __init kernel_init_freeable(void)
 
do_basic_setup();
 
+   schedule_delayed_work(&delay_thingy, HZ * 5);
/* Open the /dev/console on the rootfs, this should never fail */
if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
pr_err("Warning: unable to open an initial console.\n");
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 4e09821f9d9e..b8dcb9dc5692 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -500,6 +500,7 @@ static __latent_entropy void tasklet_action(struct softirq_action *a)
 {
struct tasklet_struct *list;
 
+   pr_err("%s() %s\n", __func__, current->comm);
local_irq_disable();
list = __this_cpu_read(tasklet_vec.head);
__this_cpu_write(tasklet_vec.head, NULL);

gives me this output:
[7.132439] one kworker/4:2
[7.132806] two kworker/4:2
[7.133120] softirq: tasklet_action() kworker/4:2
[7.133623] three kworker/4:2
[7.133940] four kworker/4:2

which means after the last local_bh_enable() we already invoke the
raised softirq handler. It does not matter that we are in a
preempt_disable() region. 
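
The ordering in that log can be reproduced with a toy single-CPU model. This is a rough sketch only: the ex_ names are hypothetical and the two counters are kept separate for readability, which is a simplification of how the kernel tracks them. The point it illustrates is that leaving the outermost BH-disabled section runs the pending softirq regardless of the preemption count.

```c
#include <assert.h>

/* Toy, single-CPU model; the ex_ names are hypothetical, not kernel API. */
struct ex_cpu {
	int preempt_count;	/* models preempt_disable() nesting */
	int bh_count;		/* models local_bh_disable() nesting */
	int pending;		/* a softirq has been raised */
	int handled;		/* how many times the handler ran */
};

static void ex_preempt_disable(struct ex_cpu *c) { c->preempt_count++; }
static void ex_preempt_enable(struct ex_cpu *c)  { c->preempt_count--; }
static void ex_bh_disable(struct ex_cpu *c)      { c->bh_count++; }
static void ex_raise_softirq(struct ex_cpu *c)   { c->pending = 1; }

/* Leaving the outermost BH-disabled section runs whatever is pending. */
static void ex_bh_enable(struct ex_cpu *c)
{
	assert(c->bh_count > 0);
	if (--c->bh_count == 0 && c->pending) {
		c->pending = 0;
		c->handled++;	/* stands in for tasklet_action() */
	}
}
```

Running the same sequence as delay_thingy_func() against this model, the handler fires inside ex_bh_enable() while preempt_count is still 1, matching the position of the tasklet_action() line between "two" and "three".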

> 
> Look at netif_rx_ni() for a similar construct.

Correct, this is old and it is already patched in -RT. And I have no
clue why this is required, because netif_rx_internal() itself does
preempt_disable() / get_cpu(), so preemption is already disabled there.
Looking at it, I *think* you want local_bh_disable() instead of
preempt_disable() and do_softirq() removed, too.

Let me browse the museum a little bit… ahhh, here -> "[PATCH] Make
netif_rx_ni preempt-safe" [0]. That is where the preempt_disable() came
from; it protects against parallel invocations of do_softirq() in user
context. And we only do the check for pending softirqs (and invoke
do_softirq()) because netif_rx() sets the pending bits without raising
the softirq and it is done in a BH-enabled section. In interrupt
context we check for those in the interrupt-exit path, so in this case
we have to do it manually.

[0] http://oss.sgi.com/projects/netdev/archive/2004-10/msg02211.html

> What exact problem do you have with existing code, that is worth
> adding this change ?

I need to work around the non-existent do_softirq() function in RT and my
current workaround is the patch I posted. I don't see the need for the
two lines. And it seems that the other construct in netif_rx_ni() can be
simplified / removed, too.

> Thanks.

Sebastian

[PATCH net-next] liquidio: add support for OVS offload

2017-05-27 Thread Felix Manlunas

From: VSR Burru 

Add support for OVS offload.  By default the PF driver runs in basic NIC
mode as usual.  To run in OVS mode, use the insmod parameter "fw_type=ovs".

For OVS mode, create a management interface for communication with NIC
firmware.  This communication channel uses PF0's I/O rings.

Bump up driver version to 1.6.0 to match newer firmware.

Signed-off-by: VSR Burru 
Signed-off-by: Felix Manlunas 
---
 drivers/net/ethernet/cavium/liquidio/Makefile  |   1 +
 drivers/net/ethernet/cavium/liquidio/lio_main.c|  27 +-
 .../net/ethernet/cavium/liquidio/liquidio_common.h |  23 +-
 .../net/ethernet/cavium/liquidio/liquidio_image.h  |   1 +
 .../net/ethernet/cavium/liquidio/liquidio_mgmt.c   | 439 +
 .../net/ethernet/cavium/liquidio/octeon_console.c  |  27 +-
 drivers/net/ethernet/cavium/liquidio/octeon_main.h |   9 +
 7 files changed, 516 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/Makefile b/drivers/net/ethernet/cavium/liquidio/Makefile
index c4d411d..2064157 100644
--- a/drivers/net/ethernet/cavium/liquidio/Makefile
+++ b/drivers/net/ethernet/cavium/liquidio/Makefile
@@ -15,6 +15,7 @@ liquidio-$(CONFIG_LIQUIDIO) += lio_ethtool.o \
octeon_mailbox.o   \
octeon_mem_ops.o   \
octeon_droq.o  \
+   liquidio_mgmt.o  \
octeon_nic.o
 
 liquidio-objs := lio_main.o octeon_console.o $(liquidio-y)
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index ba01242..b22eb74 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -43,6 +43,8 @@ MODULE_FIRMWARE(LIO_FW_DIR LIO_FW_BASE_NAME LIO_210SV_NAME LIO_FW_NAME_SUFFIX);
 MODULE_FIRMWARE(LIO_FW_DIR LIO_FW_BASE_NAME LIO_210NV_NAME LIO_FW_NAME_SUFFIX);
 MODULE_FIRMWARE(LIO_FW_DIR LIO_FW_BASE_NAME LIO_410NV_NAME LIO_FW_NAME_SUFFIX);
 MODULE_FIRMWARE(LIO_FW_DIR LIO_FW_BASE_NAME LIO_23XX_NAME LIO_FW_NAME_SUFFIX);
+MODULE_FIRMWARE(LIO_FW_DIR LIO_FW_BASE_NAME LIO_23XX_NAME "_"
+LIO_FW_NAME_TYPE_OVS LIO_FW_NAME_SUFFIX);
 
 static int ddr_timeout = 1;
 module_param(ddr_timeout, int, 0644);
@@ -57,7 +59,7 @@ MODULE_PARM_DESC(debug, "NETIF_MSG debug bits");
 
 static char fw_type[LIO_MAX_FW_TYPE_LEN];
 module_param_string(fw_type, fw_type, sizeof(fw_type), );
-MODULE_PARM_DESC(fw_type, "Type of firmware to be loaded. Default \"nic\"");
+MODULE_PARM_DESC(fw_type, "Type of firmware to be loaded (nic,ovs,none). Default \"nic\".  Use \"none\" to load firmware from flash on LiquidIO adapter.");
 
 static int ptp_enable = 1;
 
@@ -1414,6 +1416,12 @@ static bool fw_type_is_none(void)
   sizeof(LIO_FW_NAME_TYPE_NONE)) == 0;
 }
 
+static bool is_fw_type_ovs(void)
+{
+   return strncmp(fw_type, LIO_FW_NAME_TYPE_OVS,
+  sizeof(LIO_FW_NAME_TYPE_OVS)) == 0;
+}
+
 /**
  *\brief Destroy resources associated with octeon device
  * @param pdev PCI device structure
@@ -1776,6 +1784,9 @@ static void liquidio_remove(struct pci_dev *pdev)
 
dev_dbg(&oct_dev->pci_dev->dev, "Stopping device\n");
 
+   if (is_fw_type_ovs())
+   lio_mgmt_exit();
+
if (oct_dev->watchdog_task)
kthread_stop(oct_dev->watchdog_task);
 
@@ -3933,6 +3944,8 @@ static int setup_nic_devices(struct octeon_device *octeon_dev)
u32 resp_size, ctx_size, data_size;
u32 ifidx_or_pfnum;
struct lio_version *vdata;
+   union oct_nic_vf_info vf_info;
+
 
/* This is to handle link status changes */
octeon_register_dispatch_fn(octeon_dev, OPCODE_NIC,
@@ -4001,9 +4014,16 @@ static int setup_nic_devices(struct octeon_device *octeon_dev)
 
sc->iq_no = 0;
 
+   /* Populate VF info for OVS firmware */
+   vf_info.u64 = 0;
+
+   vf_info.s.bus_num = octeon_dev->pci_dev->bus->number;
+   vf_info.s.dev_fn = octeon_dev->pci_dev->devfn;
+   vf_info.s.max_vfs = octeon_dev->sriov_info.max_vfs;
+
octeon_prepare_soft_command(octeon_dev, sc, OPCODE_NIC,
OPCODE_NIC_IF_CFG, 0,
-   if_cfg.u64, 0);
+   if_cfg.u64, vf_info.u64);
 
sc->callback = if_cfg_callback;
sc->callback_arg = sc;
@@ -4382,6 +4402,9 @@ static int liquidio_init_nic_module(struct octeon_device *oct)
goto octnet_init_failure;
}
 
+   if (is_fw_type_ovs())
+   lio_mgmt_init(oct);
+
liquidio_ptp_init(oct);
 
dev_dbg(&oct->pci_dev->dev, "Network interfaces ready\n");
diff --git a/drivers/net/ethernet/cavium/liquidio/liquidio_common.h

Re: [PATCH net-next] net: phy: Relax error checking on sysfs_create_link()

2017-05-27 Thread Florian Fainelli

On May 26, 2017 8:34:18 PM PDT, Florian Fainelli  wrote:
>Some Ethernet drivers will attach/connect to a PHY device before
>calling
>register_netdevice() which is responsible for calling
>netdev_register_kobject()
>which would do the network device's kobject initialization. In such a
>case,
>sysfs_create_link() would return -ENOENT because the network device's
>kobject
>is not ready yet, and we would fail to connect to the PHY device.
>
>In order to keep things simple and symmetrical, we just take the success
>path as
>indicative of the ability to access the network device's kobject, and
>create
>the second link if that's the case.
>
>Fixes: 5568363f0cb3 ("net: phy: Create sysfs reciprocal links for
>attached_dev/phydev")
>Reported-by: Woojung Hung 
>Signed-off-by: Florian Fainelli 
>---
> drivers/net/phy/phy_device.c | 28 
> include/linux/phy.h  |  2 ++
> 2 files changed, 22 insertions(+), 8 deletions(-)
>
>diff --git a/drivers/net/phy/phy_device.c
>b/drivers/net/phy/phy_device.c
>index f84414b8f2ee..523366bd705a 100644
>--- a/drivers/net/phy/phy_device.c
>+++ b/drivers/net/phy/phy_device.c
>@@ -960,15 +960,25 @@ int phy_attach_direct(struct net_device *dev,
>struct phy_device *phydev,
> 
>   phydev->attached_dev = dev;
>   dev->phydev = phydev;
>+
>+  /* Some Ethernet drivers try to connect to a PHY device before
>+   * calling register_netdevice() -> netdev_register_kobject() and
>+   * does the dev->dev.kobj initialization. Here we only check for
>+   * success which indicates that the network device kobject is
>+   * ready. Once we do that we still need to keep track of whether
>+   * links were successfully set up or not for phy_detach() to
>+   * remove them accordingly.
>+   */
>   err = sysfs_create_link(&phydev->mdio.dev.kobj, &dev->dev.kobj,
>   "attached_dev");
>-  if (err)
>-  goto error;
>+  if (!err) {
>+  err = sysfs_create_link(&dev->dev.kobj, &phydev->mdio.dev.kobj,
>+  "phydev");
>+  if (err)
>+  goto error;
> 
>-  err = sysfs_create_link(&dev->dev.kobj, &phydev->mdio.dev.kobj,
>-  "phydev");
>-  if (err)
>-  goto error;
>+  phydev->sysfs_links = true;
>+  }
> 

We should not assume that this Boolean will be true or false if we enter
this function a second time; v2 coming.

-- 
Florian

Re: [PATCH net-next] net: ndisc.c: reduce size of __ndisc_fill_addr_option()

2017-05-27 Thread Alexey Dobriyan

> --- a/net/ipv6/ndisc.c
> +++ b/net/ipv6/ndisc.c
> @@ -148,17 +148,18 @@ void __ndisc_fill_addr_option(struct sk_buff *skb, int type, void *data,

>   space -= data_len;
> - if (space > 0)
> - memset(opt, 0, space);
> +
> + memset(opt, 0, space);

This can't be right.

And what size are you reducing?

Re: [PATCH] dsa: mv88e6xxx: fix returnvar.cocci warnings

2017-05-27 Thread Andrew Lunn

On Sat, May 27, 2017 at 06:38:14AM +0200, Julia Lawall wrote:
> Remove unneeded variable used to store return value.
> 
> Generated by: scripts/coccinelle/misc/returnvar.cocci

Hi Julia

Thanks for the patch. However, Vivien already submitted a patch.

   Andrew

[PATCH V6 net-next 1/2] rtnl: Add support for netdev event to link messages

2017-05-27 Thread Vladislav Yasevich

When netdev events happen, a rtnetlink_event() handler will send
messages for every event in its white list.  These messages contain
current information about a particular device, but they do not include
the information about which event just happened.  So, it is impossible
to tell what just happened for these events.

This patch adds a new extension to RTM_NEWLINK message called IFLA_EVENT
that would have an encoding of event that triggered this
message.  This would allow the message consumer to easily determine
if it needs to perform certain actions.

Signed-off-by: Vladislav Yasevich 
---
 include/linux/rtnetlink.h|  3 +-
 include/uapi/linux/if_link.h | 11 
 net/core/dev.c   |  2 +-
 net/core/rtnetlink.c | 65 ++--
 4 files changed, 70 insertions(+), 11 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 57e5484..dea59c8 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -18,7 +18,8 @@ extern int rtnl_put_cacheinfo(struct sk_buff *skb, struct dst_entry *dst,
 
 void rtmsg_ifinfo(int type, struct net_device *dev, unsigned change, gfp_t flags);
 struct sk_buff *rtmsg_ifinfo_build_skb(int type, struct net_device *dev,
-  unsigned change, gfp_t flags);
+  unsigned change, u32 event,
+  gfp_t flags);
 void rtmsg_ifinfo_send(struct sk_buff *skb, struct net_device *dev,
   gfp_t flags);
 
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 15ac203..8ed679f 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -157,6 +157,7 @@ enum {
IFLA_GSO_MAX_SIZE,
IFLA_PAD,
IFLA_XDP,
+   IFLA_EVENT,
__IFLA_MAX
 };
 
@@ -911,4 +912,14 @@ enum {
 
 #define IFLA_XDP_MAX (__IFLA_XDP_MAX - 1)
 
+enum {
+   IFLA_EVENT_NONE,
+   IFLA_EVENT_REBOOT,  /* internal reset / reboot */
+   IFLA_EVENT_FEATURES,  /* change in offload features */
+   IFLA_EVENT_BONDING_FAILOVER,  /* change in active slave */
+   IFLA_EVENT_NOTIFY_PEERS,  /* re-sent grat. arp/ndisc */
+   IFLA_EVENT_IGMP_RESEND,  /* re-sent IGMP JOIN */
+   IFLA_EVENT_BONDING_OPTIONS,  /* change in bonding options */
+};
+
 #endif /* _UAPI_LINUX_IF_LINK_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index 3d98fbf..06e0a74 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7084,7 +7084,7 @@ static void rollback_registered_many(struct list_head *head)
 
if (!dev->rtnl_link_ops ||
dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
-   skb = rtmsg_ifinfo_build_skb(RTM_DELLINK, dev, ~0U,
+   skb = rtmsg_ifinfo_build_skb(RTM_DELLINK, dev, ~0U, 0,
 GFP_KERNEL);
 
/*
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index dab2834..07218eb 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -941,6 +941,7 @@ static noinline size_t if_nlmsg_size(const struct 
net_device *dev,
   + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_SWITCH_ID */
   + nla_total_size(IFNAMSIZ) /* IFLA_PHYS_PORT_NAME */
   + rtnl_xdp_size() /* IFLA_XDP */
+  + nla_total_size(4)  /* IFLA_EVENT */
   + nla_total_size(1); /* IFLA_PROTO_DOWN */
 
 }
@@ -1282,9 +1283,40 @@ static int rtnl_xdp_fill(struct sk_buff *skb, struct 
net_device *dev)
return err;
 }
 
+static u32 rtnl_get_event(unsigned long event)
+{
+   u32 rtnl_event_type = IFLA_EVENT_NONE;
+
+   switch (event) {
+   case NETDEV_REBOOT:
+   rtnl_event_type = IFLA_EVENT_REBOOT;
+   break;
+   case NETDEV_FEAT_CHANGE:
+   rtnl_event_type = IFLA_EVENT_FEATURES;
+   break;
+   case NETDEV_BONDING_FAILOVER:
+   rtnl_event_type = IFLA_EVENT_BONDING_FAILOVER;
+   break;
+   case NETDEV_NOTIFY_PEERS:
+   rtnl_event_type = IFLA_EVENT_NOTIFY_PEERS;
+   break;
+   case NETDEV_RESEND_IGMP:
+   rtnl_event_type = IFLA_EVENT_IGMP_RESEND;
+   break;
+   case NETDEV_CHANGEINFODATA:
+   rtnl_event_type = IFLA_EVENT_BONDING_OPTIONS;
+   break;
+   default:
+   break;
+   }
+
+   return rtnl_event_type;
+}
+
 static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
int type, u32 pid, u32 seq, u32 change,
-   unsigned int flags, u32 ext_filter_mask)
+   unsigned int flags, u32 ext_filter_mask,
+   u32 event)
 {
struct ifinfomsg *ifm;
struct nlmsghdr *nlh;
@@ -1333,6 +1365,11 @@ static

[PATCH V6 net-next iproute] ip: Add support for netdev events to monitor

2017-05-27 Thread Vladislav Yasevich

Add IFLA_EVENT handling so that event types can be viewed with the
'monitor' command.  This gives a little more information about why
a given message was received.

Signed-off-by: Vladislav Yasevich 
---
 include/linux/if_link.h | 11 +++
 ip/ipaddress.c  | 21 +
 2 files changed, 32 insertions(+)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 5a3a048..c0a6769 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -157,6 +157,7 @@ enum {
IFLA_GSO_MAX_SIZE,
IFLA_PAD,
IFLA_XDP,
+   IFLA_EVENT,
__IFLA_MAX
 };
 
@@ -909,4 +910,14 @@ enum {
 
 #define IFLA_XDP_MAX (__IFLA_XDP_MAX - 1)
 
+enum {
+   IFLA_EVENT_NONE,
+   IFLA_EVENT_REBOOT,  /* internal reset / reboot */
+   IFLA_EVENT_FEATURES,/* change in offload features */
+   IFLA_EVENT_BONDING_FAILOVER,/* change in active slave */
+   IFLA_EVENT_NOTIFY_PEERS,/* re-sent grat. arp/ndisc */
+   IFLA_EVENT_IGMP_RESEND, /* re-sent IGMP JOIN */
+   IFLA_EVENT_BONDING_OPTIONS, /* change in bonding options */
+};
+
 #endif /* _LINUX_IF_LINK_H */
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index b8d9c7d..c6e7413 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -753,6 +753,24 @@ int print_linkinfo_brief(const struct sockaddr_nl *who,
return 0;
 }
 
+static const char *netdev_events[] = {"NONE",
+ "REBOOT",
+ "FEATURE CHANGE",
+ "BONDING FAILOVER",
+ "NOTIFY PEERS",
+ "RESEND IGMP",
+ "BONDING OPTION"};
+
+static void print_dev_event(FILE *f, __u32 event)
+{
+   if (event >= ARRAY_SIZE(netdev_events))
+   fprintf(f, "event %d ", event);
+   else {
+   if (event)
+   fprintf(f, "event %s ", netdev_events[event]);
+   }
+}
+
 int print_linkinfo(const struct sockaddr_nl *who,
   struct nlmsghdr *n, void *arg)
 {
@@ -858,6 +876,9 @@ int print_linkinfo(const struct sockaddr_nl *who,
if (filter.showqueue)
print_queuelen(fp, tb);
 
+   if (tb[IFLA_EVENT])
+   print_dev_event(fp, rta_getattr_u32(tb[IFLA_EVENT]));
+
if (!filter.family || filter.family == AF_PACKET || show_details) {
SPRINT_BUF(b1);
fprintf(fp, "%s", _SL_);
-- 
2.7.4
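
For readers who want to try the decoding logic outside of iproute2, here
is a minimal userspace sketch of the same idea. It is not the code from
the patch: format_dev_event() and event_names[] are hypothetical names
chosen for illustration, the enum values are copied from the if_link.h
hunk above, and the output goes into a buffer instead of a FILE so it
is easy to check.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Values copied from the enum added to if_link.h in the patch above. */
enum {
	IFLA_EVENT_NONE,
	IFLA_EVENT_REBOOT,
	IFLA_EVENT_FEATURES,
	IFLA_EVENT_BONDING_FAILOVER,
	IFLA_EVENT_NOTIFY_PEERS,
	IFLA_EVENT_IGMP_RESEND,
	IFLA_EVENT_BONDING_OPTIONS,
};

/* Hypothetical name table, indexed by the IFLA_EVENT_* value. */
static const char *const event_names[] = {
	[IFLA_EVENT_NONE]             = "NONE",
	[IFLA_EVENT_REBOOT]           = "REBOOT",
	[IFLA_EVENT_FEATURES]         = "FEATURE CHANGE",
	[IFLA_EVENT_BONDING_FAILOVER] = "BONDING FAILOVER",
	[IFLA_EVENT_NOTIFY_PEERS]     = "NOTIFY PEERS",
	[IFLA_EVENT_IGMP_RESEND]      = "RESEND IGMP",
	[IFLA_EVENT_BONDING_OPTIONS]  = "BONDING OPTION",
};

/*
 * Hypothetical helper: write the text for one event value into buf.
 * Unknown values are printed numerically, IFLA_EVENT_NONE prints
 * nothing.  Returns what snprintf() returns, or 0 for NONE.
 */
static int format_dev_event(char *buf, size_t len, uint32_t event)
{
	if (event >= sizeof(event_names) / sizeof(event_names[0]))
		return snprintf(buf, len, "event %u ", (unsigned int)event);

	if (event == IFLA_EVENT_NONE) {
		if (len)
			buf[0] = '\0';
		return 0;
	}

	return snprintf(buf, len, "event %s ", event_names[event]);
}
```

Printing unknown values numerically means an older tool still shows
something useful if the kernel later grows new event types.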

[PATCH V6 net-next 2/2] bonding: Prevent duplicate userspace notification

2017-05-27 Thread Vladislav Yasevich

Whenever a user changes bonding options, a NETDEV_CHANGEINFODATA
notification is generated, which results in an rtnetlink message
being sent.  While running 'ip monitor', we can actually see 2 messages,
one a result of the event, and the other a result of the state change
that is generated by netdev_state_change().  However, this is not
always the case. If bonding changes were done via sysfs or ifenslave
(old ioctl interface), then only 1 message is seen.

This patch removes duplicate messages in the case of using netlink
to configure bonding.  It introduces a separate function that
triggers a netdev event and uses that function in the sysfs and ioctl
cases.

This was discovered while auditing all the different events and
continues the effort of cleaning up duplicated netlink messages.

CC: David Ahern 
CC: Jiri Pirko 
Signed-off-by: Vladislav Yasevich 
---
 drivers/net/bonding/bond_main.c|  3 ++-
 drivers/net/bonding/bond_options.c | 27 +--
 include/net/bond_options.h |  2 ++
 3 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 7331331..d7aa137 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3481,7 +3481,8 @@ static int bond_do_ioctl(struct net_device *bond_dev, 
struct ifreq *ifr, int cmd
case BOND_CHANGE_ACTIVE_OLD:
case SIOCBONDCHANGEACTIVE:
bond_opt_initstr(, slave_dev->name);
-   res = __bond_opt_set(bond, BOND_OPT_ACTIVE_SLAVE, );
+   res = __bond_opt_set_notify(bond, BOND_OPT_ACTIVE_SLAVE,
+   );
break;
default:
res = -EOPNOTSUPP;
diff --git a/drivers/net/bonding/bond_options.c 
b/drivers/net/bonding/bond_options.c
index 1bcbb89..8ca6833 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -673,7 +673,30 @@ int __bond_opt_set(struct bonding *bond,
 out:
if (ret)
bond_opt_error_interpret(bond, opt, ret, val);
-   else if (bond->dev->reg_state == NETREG_REGISTERED)
+
+   return ret;
+}
+/**
+ * __bond_opt_set_notify - set a bonding option
+ * @bond: target bond device
+ * @option: option to set
+ * @val: value to set it to
+ *
+ * This function is used to change the bond's option value and trigger
+ * a notification to user space. It can be used for both enabling/changing
+ * an option and for disabling it. RTNL lock must be obtained before calling
+ * this function.
+ */
+int __bond_opt_set_notify(struct bonding *bond,
+ unsigned int option, struct bond_opt_value *val)
+{
+   int ret = -ENOENT;
+
+   ASSERT_RTNL();
+
+   ret = __bond_opt_set(bond, option, val);
+
+   if (!ret && (bond->dev->reg_state == NETREG_REGISTERED))
call_netdevice_notifiers(NETDEV_CHANGEINFODATA, bond->dev);
 
return ret;
@@ -696,7 +719,7 @@ int bond_opt_tryset_rtnl(struct bonding *bond, unsigned int 
option, char *buf)
if (!rtnl_trylock())
return restart_syscall();
bond_opt_initstr(, buf);
-   ret = __bond_opt_set(bond, option, );
+   ret = __bond_opt_set_notify(bond, option, );
rtnl_unlock();
 
return ret;
diff --git a/include/net/bond_options.h b/include/net/bond_options.h
index 1797235..d79d28f 100644
--- a/include/net/bond_options.h
+++ b/include/net/bond_options.h
@@ -104,6 +104,8 @@ struct bond_option {
 
 int __bond_opt_set(struct bonding *bond, unsigned int option,
   struct bond_opt_value *val);
+int __bond_opt_set_notify(struct bonding *bond, unsigned int option,
+ struct bond_opt_value *val);
 int bond_opt_tryset_rtnl(struct bonding *bond, unsigned int option, char *buf);
 
 const struct bond_opt_value *bond_opt_parse(const struct bond_option *opt,
-- 
2.7.4
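
The split between a plain setter and a notifying wrapper can be modelled
in a few lines of userspace C. This is only a sketch of the pattern, not
the bonding code: struct model_bond and both model_* functions are
hypothetical, and a counter stands in for call_netdevice_notifiers().

```c
#include <stdbool.h>

/* Hypothetical stand-in for a bond device, not the kernel's struct bonding. */
struct model_bond {
	int option_value;
	bool registered;	/* models reg_state == NETREG_REGISTERED */
	int notifications;	/* counts events that would reach userspace */
};

/* Models __bond_opt_set() after the patch: change the option, never notify. */
static int model_opt_set(struct model_bond *b, int val)
{
	if (val < 0)
		return -1;	/* stand-in for an option validation failure */

	b->option_value = val;
	return 0;
}

/*
 * Models __bond_opt_set_notify(): call the plain setter, then notify
 * only if the set succeeded and the device is registered.
 */
static int model_opt_set_notify(struct model_bond *b, int val)
{
	int ret = model_opt_set(b, val);

	if (!ret && b->registered)
		b->notifications++;

	return ret;
}
```

In this model a caller that already produces its own message (the netlink
path in the description above) uses model_opt_set(), while callers that
produce nothing themselves (sysfs, ioctl) use model_opt_set_notify(), so
each path ends up with one notification.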

[PATCH V6 net-next 0/2] rtnetlink: Updates to rtnetlink_event()

2017-05-27 Thread Vladislav Yasevich

First is the patch to add the IFLA_EVENT attribute to the netlink message.  It
supports only currently white-listed events.
Like before, this is just an attribute that gets added to the rtnetlink
message only when the message was generated as a result of a netdev event.
In my case, this is necessary since I want to trap the NETDEV_NOTIFY_PEERS
event (also possibly the NETDEV_RESEND_IGMP event) and perform certain
actions in user space.  This is not possible since the messages generated as
a result of netdev events do not usually contain any changed data.  They
are just notifications.  This patch exposes this notification type to
userspace.

Second, I remove duplicate messages that are a result of a change to bonding
options.  If netlink is used to configure bonding options, 2 messages
are generated, one as a result of the NETDEV_CHANGEINFODATA event triggered
by bonding code and one as a result of device state changes triggered by
netdev_state_change (called from do_setlink).


V6: Updated names and refactored to make it less tied to netdev events.
(From David Ahern)
V5: Rebased.  Added iproute2 patch to the series.
V4:
  * Removed the patch that removed NETDEV_CHANGENAME from event whitelist.
It doesn't trigger duplicate messages since name changes can only be
done while device is down and netdev_state_change() doesn't report
changes while device is down.
  * Added a patch to clean-up duplicate messages on bonding option changes.

V3: Rebased.  Cleaned-up duplicate event.

V2: Added missed events (from David Ahern)


Vladislav Yasevich (2):
  rtnl: Add support for netdev event to link messages
  bonding: Prevent duplicate userspace notification

 drivers/net/bonding/bond_main.c|  3 +-
 drivers/net/bonding/bond_options.c | 27 ++--
 include/linux/rtnetlink.h  |  3 +-
 include/net/bond_options.h |  2 ++
 include/uapi/linux/if_link.h   | 11 +++
 net/core/dev.c |  2 +-
 net/core/rtnetlink.c   | 65 --
 7 files changed, 99 insertions(+), 14 deletions(-)

-- 
2.7.4

Re: [PATCH net-next v2 0/8] net: extend RTM_GETROUTE to return fib result

2017-05-27 Thread David Ahern

On 5/27/17 12:00 AM, Roopa Prabhu wrote:
> On Fri, May 26, 2017 at 11:18 AM, David Miller  wrote:
>> From: Roopa Prabhu 
>> Date: Thu, 25 May 2017 10:42:32 -0700
>>
>>> This series adds a new RTM_F_FIB_MATCH flag to return matched fib result
>>> with RTM_GETROUTE. This is useful for applications and protocols in
>>> userspace wanting to query the selected route.
>>
>> Looks good, series applied, thanks.
>>
> 
> thank you.
> 
>> Have you considered taking this further and allowing one to see which
>> nexthop a route lookup picked?
> 
> since the default RTM_GETROUTE output gives most of the attributes
> from the resolved dst,
> have not considered adding more to it yet...but certainly can if
> needed in the future.
> 

One extension is to pass in prefix and length (plus any options, such
as metric, needed to uniquely discriminate a route) and get back the
route entry. It is needed to retrieve the BPF code for routes with a bpf
encap. This patch set makes it easier.

Re: [PATCH v2 6/6] stmmac: pci: Remove setup handler indirection via stmmac_pci_info

2017-05-27 Thread Andy Shevchenko

On Fri, May 26, 2017 at 7:07 PM, Jan Kiszka  wrote:
> By now, stmmac_pci_info only contains a single entry.

_For now_.

> Register this
> directly with the PCI device table, removing one indirection.

I am not sure this patch is needed.

Next time something comes up we would need to extend this and
effectively revert this change.
So, my vote is to leave it as is for now.

-- 
With Best Regards,
Andy Shevchenko

Re: [PATCH v2 5/6] stmmac: pci: Use dmi_system_id table for retrieving PHY addresses

2017-05-27 Thread Andy Shevchenko

On Fri, May 26, 2017 at 7:07 PM, Jan Kiszka  wrote:
> Avoids reimplementation of DMI matching in stmmac_pci_find_phy_addr.

>  struct stmmac_pci_dmi_data {
> -   const char *name;
> -   const char *asset_tag;
> -   unsigned int func;
> +   int func;
> int phy_addr;
>  };

Can we leave unsigned type here...

> -static struct stmmac_pci_dmi_data quark_pci_dmi_data[] = {
> +static const struct stmmac_pci_dmi_data galileo_stmmac_dmi_data[] = {

> +   {-1, -1},
> +};

> +static const struct stmmac_pci_dmi_data iot2040_stmmac_dmi_data[] = {

> +   {-1, -1},
> +};

...and avoid this not so standard terminators?

> +   .matches = {
> +   DMI_EXACT_MATCH(DMI_BOARD_NAME, "GalileoGen2"),
> +   },

> +   .driver_data = (void *)galileo_stmmac_dmi_data,

Can't this be slightly better

 .driver_data = _stmmac_dmi_data,

?

-- 
With Best Regards,
Andy Shevchenko

Re: [PATCH v2 3/6] stmmac: pci: Make stmmac_pci_find_phy_addr truly generic

2017-05-27 Thread Andy Shevchenko

On Fri, May 26, 2017 at 7:07 PM, Jan Kiszka  wrote:
> Move the special case for the early Galileo firmware into
> quark_default_setup. This allows to use stmmac_pci_find_phy_addr for
> non-quark cases.

> ret = stmmac_pci_find_phy_addr(pdev, info);
> -   if (ret < 0)
> -   return ret;
> +   if (ret < 0) {
> +   /*
> +* Galileo boards with old firmware don't support DMI. We 
> always
> +* use 1 here as PHY address, so at least the first found MAC
> +* controller would be probed.
> +*/
> +   if (!dmi_get_system_info(DMI_BOARD_NAME))
> +   ret = 1;
> +   else
> +   return ret;

Perhaps

/* Return error to the caller on DMI enabled boards */
if (dmi_...)
 return ret;
/*
 * Comment goes here, I suppose.
 */
ret = 1;

> +   }

-- 
With Best Regards,
Andy Shevchenko
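
The early-return shape suggested above can be written out as a small
standalone function. This is a sketch only: resolve_phy_addr() is a
hypothetical name, find_ret stands in for the return value of
stmmac_pci_find_phy_addr(), and board_name stands in for the result of
dmi_get_system_info(DMI_BOARD_NAME), with NULL meaning "no DMI".

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model of the suggested control flow.
 * find_ret:   result of the PHY address lookup (negative on failure)
 * board_name: DMI board name, or NULL if the firmware has no DMI
 */
static int resolve_phy_addr(int find_ret, const char *board_name)
{
	if (find_ret >= 0)
		return find_ret;

	/* Return error to the caller on DMI enabled boards */
	if (board_name)
		return find_ret;

	/*
	 * Galileo boards with old firmware don't support DMI. Use 1 as
	 * the PHY address, so at least the first found MAC controller
	 * would be probed.
	 */
	return 1;
}
```

With the error case handled first, the fallback is the last statement
and there is no else branch to follow.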

Re: [PATCH v2 2/6] stmmac: pci: Use stmmac_pci_info for all devices

2017-05-27 Thread Andy Shevchenko

On Sat, May 27, 2017 at 4:13 PM, Andy Shevchenko
 wrote:
> On Fri, May 26, 2017 at 7:07 PM, Jan Kiszka  wrote:
>> Make stmmac_default_data compatible with stmmac_pci_info.setup and use
>> an info structure for all devices. This allows to make the probing more
>> regular.

> Or converting defines first to PCI_DEVICE_ID_*

It looks like even for the previously mentioned approach we need to
rename the constants first.

> and
>
> #define STMMAC_DEVICE(_vid, _did, info) {   \
>PCI_DEVICE(PCI_VENDOR_ID_##_vid, PCI_DEVICE_ID_##_did), \
>
> which I like even better.

Or even
 #define STMMAC_DEVICE(_vid, _did, info) {   \
PCI_VDEVICE(_vid, PCI_DEVICE_ID_##_did),\


-- 
With Best Regards,
Andy Shevchenko

Re: [PATCH v2 2/6] stmmac: pci: Use stmmac_pci_info for all devices

2017-05-27 Thread Andy Shevchenko

On Fri, May 26, 2017 at 7:07 PM, Jan Kiszka  wrote:
> Make stmmac_default_data compatible with stmmac_pci_info.setup and use
> an info structure for all devices. This allows to make the probing more
> regular.


> +#define STMMAC_DEVICE(vendor_id, dev_id, info) {   \
> +   PCI_DEVICE(vendor_id, dev_id),  \

Perhaps

#define STMMAC_DEVICE(_vid, _did, info) {   \
   PCI_DEVICE(PCI_VENDOR_ID_##_vid, _did),  \

Or converting defines first to PCI_DEVICE_ID_*
and

#define STMMAC_DEVICE(_vid, _did, info) {   \
   PCI_DEVICE(PCI_VENDOR_ID_##_vid, PCI_DEVICE_ID_##_did), \

which I like even better.

> +   .driver_data = (kernel_ulong_t)\
> +   }
> +
>  static const struct pci_device_id stmmac_id_table[] = {
> -   {PCI_DEVICE(STMMAC_VENDOR_ID, STMMAC_DEVICE_ID)},
> -   {PCI_DEVICE(PCI_VENDOR_ID_STMICRO, PCI_DEVICE_ID_STMICRO_MAC)},
> -   {PCI_VDEVICE(INTEL, STMMAC_QUARK_ID), 
> (kernel_ulong_t)_pci_info},
> +   STMMAC_DEVICE(STMMAC_VENDOR_ID, STMMAC_DEVICE_ID, stmmac_pci_info),
> +   STMMAC_DEVICE(PCI_VENDOR_ID_STMICRO, PCI_DEVICE_ID_STMICRO_MAC,
> + stmmac_pci_info),
> +   STMMAC_DEVICE(PCI_VENDOR_ID_INTEL, STMMAC_QUARK_ID, quark_pci_info),

-- 
With Best Regards,
Andy Shevchenko
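
To illustrate the token-pasting variant outside the kernel tree, here is
a self-contained sketch. Everything in it is hypothetical: the DEMO IDs,
struct demo_pci_id and DEMO_DEVICE() are made up for illustration and
only mimic the shape of PCI_DEVICE() and the device table.

```c
#include <stdint.h>

/* Hypothetical IDs; the real constants live in the kernel's PCI headers. */
#define PCI_VENDOR_ID_DEMO	0x1234
#define PCI_DEVICE_ID_DEMO_MAC	0x5678

/* Hypothetical, simplified stand-in for struct pci_device_id. */
struct demo_pci_id {
	uint16_t vendor;
	uint16_t device;
	const void *driver_data;
};

/* Hypothetical per-device info structure. */
struct demo_info {
	int dummy;
};

static const struct demo_info demo_pci_info = { 0 };

/*
 * Token pasting builds the full constant names from short suffixes,
 * so the table entries only carry the distinguishing part.
 */
#define DEMO_DEVICE(_vid, _did, info) {			\
	.vendor = PCI_VENDOR_ID_##_vid,			\
	.device = PCI_DEVICE_ID_##_did,			\
	.driver_data = &info,				\
}

static const struct demo_pci_id demo_id_table[] = {
	DEMO_DEVICE(DEMO, DEMO_MAC, demo_pci_info),
	{ 0 }	/* all-zero terminator */
};
```

The cost of this form is that every ID has to follow the
PCI_VENDOR_ID_* / PCI_DEVICE_ID_* naming, which is why the constants
would need to be renamed first.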

TCP get SND_CWND change on loss event

2017-05-27 Thread Lars Erik Storbukås

I want to store the value of snd_cwnd when a congestion event occurs
(value before snd_cwnd is reduced), and the new value of snd_cwnd (the
value it has been reduced to). In other words: the congestion window
before and after a congestion event occurs.

I'm uncertain where (and how) it would be logical to implement this. I
have found two possible locations in the tcp_input.c where (I think)
it could be implemented:

static void tcp_cong_control(...) {
 ...
  if (tcp_in_cwnd_reduction(sk)) {
struct tcp_sock *tp = tcp_sk(sk);
prior_congestion_window = tp->snd_cwnd;

/* Reduce cwnd if state mandates */
tcp_cwnd_reduction(sk, acked_sacked, flag);

reduced_congestion_window = tp->snd_cwnd;
  }
 ...
}

or

static void tcp_fastretrans_alert(...) {
  ...
  default:
...
struct tcp_sock *tp = tcp_sk(sk);
prior_congestion_window = tp->snd_cwnd;

/* Otherwise enter Recovery state */
tcp_enter_recovery(sk, (flag & FLAG_ECE));
fast_rexmit = 1;

reduced_congestion_window = tp->snd_cwnd;
  ...
}

Does anyone have advice on where (and how) to implement this? Does
either of the proposed solutions above seem logical?

/ Lars Erik Storbukås

Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-05-27 Thread Ding Tianhong



On 2017/5/26 3:49, Alexander Duyck wrote:
> On Thu, May 25, 2017 at 6:35 AM, Ding Tianhong  
> wrote:
>>
>> On 2017/5/9 8:48, Casey Leedom wrote:
>>>
>>> | From: Alexander Duyck 
>>> | Date: Saturday, May 6, 2017 11:07 AM
>>> |
>>> | | From: Ding Tianhong 
>>> | | Date: Fri, May 5, 2017 at 8:08 PM
>>> | |
>>> | | According the suggestion, I could only think of this code:
>>> | | ..
>>> |
>>> | This is a bit simplistic but it is a start.
>>>
>>>   Yes, something tells me that this is going to be more complicated than any
>>> of us like ...
>>>
>>> | The other bit I was getting at is that we need to update the core PCIe
>>> | code so that when we configure devices and the root complex reports no
>>> | support for relaxed ordering it should be clearing the relaxed
>>> | ordering bits in the PCIe configuration registers on the upstream
>>> | facing devices.
>>>
>>>   Of course, this can be written to by the Driver at any time ... and is in
>>> the case of the cxgb4 Driver ...
>>>
>>>   After a lot of rummaging around, it does look like KVM prohibits writes to
>>> the PCIe Capability Device Control register in drivers/xen/xen-pciback/
>>> conf_space_capability.c and conf_space.c simply because writes aren't
>>> allowed unless "permissive" is set.  So it ~looks~ like a driver running in
>>> a Virtual Machine can't turn Enable Relaxed Ordering back on ...
>>>
>>> | The last bit we need in all this is a way to allow for setups where
>>> | peer-to-peer wants to perform relaxed ordering but for writes to the
>>> | host we have to not use relaxed ordering. For that we need to enable a
>>> | special case and that isn't handled right now in any of the solutions
>>> | we have coded up so far.
>>>
>>>   Yes, we do need this.
>>>
>>>
>>> | From: Alexander Duyck 
>>> | Date: Saturday, May 8, 2017 08:22 AM
>>> |
>>> | The problem is we need to have something that can be communicated
>>> | through a VM. Your change doesn't work in that regard. That was why I
>>> | suggested just updating the code so that we when we initialized PCIe
>>> | devices what we do is either set or clear the relaxed ordering bit in
>>> | the PCIe device control register. That way when we direct assign an
>>> | interface it could know just based on the bits int the PCIe
>>> | configuration if it could use relaxed ordering or not.
>>> |
>>> | At that point the driver code itself becomes very simple since you
>>> | could just enable the relaxed ordering by default in the igb/ixgbe
>>> | driver and if the bit is set or cleared in the PCIe configuration then
>>> | we are either sending with relaxed ordering requests or not and don't
>>> | have to try and locate the root complex.
>>> |
>>> | So from the sound of it Casey has a special use case where he doesn't
>>> | want to send relaxed ordering frames to the root complex, but instead
>>> | would like to send them to another PCIe device. To do that he needs to
>>> | have a way to enable the relaxed ordering bit in the PCIe
>>> | configuration but then not send any to the root complex. Odds are that
>>> | is something he might be able to just implement in the driver, but is
>>> | something that may become a more general case in the future. I don't
>>> | see our change here impacting it as long as we keep the solution
>>> | generic and mostly confined to when we instantiate the devices as the
>>> | driver could likely make the decision to change the behavior later.
>>>
>>>   It's not just me.  Intel has said that while RO directed at the Root
>>> Complex Host Coherent Memory has a performance bug (not Data Corruption),
>>> it's a performance win for Peer-to-Peer writes to MMIO Space.  (I'll be very
>>> interested in hearing what the bug is if we get that much detail.  The very
>>> same TLPs directed to the Root Complex Port without Relaxed Ordering set get
>>> good performance.  So this is essentially a bug in the hardware that was
>>> ~trying~ to implement a performance win.)
>>>
>>>   Meanwhile, I currently only know of a single PCIe End Point which causes
>>> catastrophic results: the AMD A1100 ARM SoC ("SEATTLE").  And it's not even
>>> clear that product is even alive anymore since I haven't been able to get
>>> any responses from them for several months.
>>>
>>>   What I'm saying is: let's try to architect a solution which doesn't throw
>>> the baby out with the bath water ...
>>>
>>>   I think that if a Device's Root Complex Port has problems with Relaxed
>>> Ordering, it ~probably~ makes sense to turn off the PCIe Capability Device
>>> Control[Enable Relaxed Ordering] when we assign a device to a Virtual
>>> Machine since the Device Driver can no longer query the Relaxed Ordering
>>> Support of the Root Complex Port.  The only down side of this would be if we
>>> assigned two Peers to a VM in an application which wanted to do Peer-to-Peer
>>> transfers.  But that seems like a hard application to

Re: [Patch net-next] net_sched: only create filter chains for new filters/actions

2017-05-27 Thread Jiri Pirko

Fri, May 26, 2017 at 06:55:25PM CEST, xiyou.wangc...@gmail.com wrote:
>On Fri, May 26, 2017 at 7:54 AM, David Miller  wrote:
>> And I also didn't find the boolean logic hard to understand at all.
>>
>> It is in fact a very common pattern to pass a "create" boolean into
>> lookup functions, to tell them whether to create a new object on
>> lookup failure or not.  And then also to control that boolean via
>> what kind of netlink request we are processing.
>
>+10
>
>It is a widely used pattern among the kernel source code.
>I'd be surprised if an experienced kernel developer is not
>aware of this pattern. ;)

Cong, as you wisely put, I'm not aware of this pattern and I'm also
unaware of the existence of the ternary operator. Are these notes necessary?
Does that make you feel better?

Re: [Patch net-next] net_sched: only create filter chains for new filters/actions

2017-05-27 Thread Jiri Pirko

Fri, May 26, 2017 at 04:54:43PM CEST, da...@davemloft.net wrote:
>From: Jiri Pirko 
>Date: Fri, 26 May 2017 07:53:52 +0200
>
>> Thu, May 25, 2017 at 06:14:56PM CEST, da...@davemloft.net wrote:
>>>From: Cong Wang 
>>>Date: Tue, 23 May 2017 09:42:37 -0700
>>>
>>>> tcf_chain_get() always creates a new filter chain if not found
>>>> in existing ones. This is totally unnecessary when we get or
>>>> delete filters, new chain should be only created for new filters
>>>> (or new actions).
>>>>
>>>> Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters")
>>>> Cc: Jamal Hadi Salim 
>>>> Cc: Jiri Pirko 
>>>> Signed-off-by: Cong Wang 
>>>
>>>Indeed, get and delete requests should not create new objects, ever.
>>>
>>>I have pretty much no idea why Jiri is making such a big fuss about
>>>this change, to be quite honest. :-)
>> 
>> Because it makes already hard to read code even worse, for *no* benefit.
>> That's why.
>
>Jiri, if you say the same thing 100 times, it doesn't help anyone
>understand your arguments any better.
>
>Creating new objects when a GET or a DEL is requested is flat out
>wrong.
 
All right. I ack that.


>
>And Cong is fixing that.
>
>And I also didn't find the boolean logic hard to understand at all.
>
>It is in fact a very common pattern to pass a "create" boolean into
>lookup functions, to tell them whether to create a new object on
>lookup failure or not.  And then also to control that boolean via
>what kind of netlink request we are processing.
>
>So you tell me what's so bad about his change given the above?
>
>Give me details and real facts, like I just did, rather than vague
>statements about "benefit" and "hard to read".

What I don't like is the double "n->nlmsg_type == RTM_NEWTFILTER" check
and the return value decision made according to the latter check. The
code logic is split between the tcf_chain_get function and its caller.
That is at least odd.

Since you don't like the PTR_ERR approach, I'll try to figure out how to
do this another way.
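
For anyone following the thread without the patch at hand, the pattern
under discussion looks roughly like the sketch below. It is a userspace
model, not the net/sched code: struct demo_chain, demo_chain_get(),
demo_handle() and the DEMO_REQ_* values are hypothetical, and the error
codes picked by the caller are an assumption made for the example.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request types standing in for the netlink message types. */
enum demo_req { DEMO_REQ_NEW, DEMO_REQ_GET, DEMO_REQ_DEL };

/* Hypothetical chain object kept on a singly linked list. */
struct demo_chain {
	unsigned int index;
	struct demo_chain *next;
};

/* Look up a chain by index; allocate and link it only if create is set. */
static struct demo_chain *demo_chain_get(struct demo_chain **head,
					 unsigned int index, bool create)
{
	struct demo_chain *c;

	for (c = *head; c; c = c->next)
		if (c->index == index)
			return c;

	if (!create)
		return NULL;

	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;

	c->index = index;
	c->next = *head;
	*head = c;
	return c;
}

/* The request type drives the boolean, and is checked again on failure. */
static int demo_handle(struct demo_chain **head, enum demo_req type,
		       unsigned int index)
{
	bool create = (type == DEMO_REQ_NEW);
	struct demo_chain *c = demo_chain_get(head, index, create);

	if (!c)
		return create ? -ENOMEM : -ENOENT;

	return 0;
}
```

The second use of "create" in demo_handle() is the part objected to
above: NULL from the lookup means two different things, and the caller
has to re-derive which one applies.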

Re: [patch net-next] net/sched: let chain_get to figure out the return value

2017-05-27 Thread Jiri Pirko

Fri, May 26, 2017 at 04:59:12PM CEST, da...@davemloft.net wrote:
>From: Jiri Pirko 
>Date: Fri, 26 May 2017 09:21:29 +0200
>
>> From: Jiri Pirko 
>> 
>> Alhough I believe that this create/nocreate dance is completelly
>> pointless, at least make it a bit nicer and easier to read.
>> Push the decision on what error value is returned to chain_get function
>> and use ERR macros.
>> 
>> Signed-off-by: Jiri Pirko 
>
>No, this is quite worse.
>
>You're leaving pointer error values in structures.  That's extremely
>error prone.

Yet used everywhere in the kernel.

>
>And as stated in the other thread, I don't think Cong's logic is strange
>or hard to understand at all.

That is why tc code looks how it does :/
But perhaps I'm slow and everything is crystal-clear to everyone else.
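
As background for the "ERR macros" mentioned above, here is a userspace
emulation of the error-pointer idiom. The demo_* helpers are
hypothetical and only imitate the kernel's ERR_PTR(), PTR_ERR() and
IS_ERR(); the lookup function is made up for the example.

```c
#include <errno.h>
#include <stdint.h>

/* Error codes are encoded in the last 4095 values of the address space. */
#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if the pointer value falls in the reserved error range. */
static inline int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Hypothetical chain object and a lookup in which only chain 0 exists. */
struct demo_chain {
	unsigned int index;
};

static struct demo_chain demo_chain0 = { .index = 0 };

static struct demo_chain *demo_chain_lookup(unsigned int index)
{
	if (index != 0)
		return demo_err_ptr(-ENOENT);

	return &demo_chain0;
}
```

An error pointer is not NULL, so if one is stored in a structure field
a later plain "if (ptr)" test passes. That is the hazard with leaving
such values in structures.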

Re: [PATCH 0/2] Document and use eeprom-length property

2017-05-27 Thread Shawn Guo

On Fri, May 26, 2017 at 03:02:42PM -0400, David Miller wrote:
> From: Andrew Lunn 
> Date: Fri, 26 May 2017 01:44:42 +0200
> 
> > The mv88e6xxx switch driver allows the size of the attached EEPROM to
> > be described in DT. This property is missing from the binding
> > documentation. Add it. And make use of it on the ZII Devel B board.
> > 
> > David, Shawn, please could you talk amongs yourself to decide who
> > takes what.
> 
> I can take this if it works for Shawn, otherwise I'm also fine if Shawn
> takes it and if so feel free to add my:
> 
> Acked-by: David S. Miller 

Hi David,

I see these two patches can be applied separately, so I picked up 2/2
and left 1/2 to you.

Shawn

Re: [PATCH 2/2] ARM: VF610: ZII devel b: Add switch eeprom-length properties

2017-05-27 Thread Shawn Guo

On Fri, May 26, 2017 at 01:44:44AM +0200, Andrew Lunn wrote:
> Two of the Ethernet switches on this board have EEPROMs connected.
> Add the eeprom-length property to the device tree, making it possible
> to access the EEPROM using ethtool -e.
> 
> Signed-off-by: Andrew Lunn 

Applied with a small update to the subject prefix.  Thanks.

Shawn

Re: [PATCH net-next v2 0/8] net: extend RTM_GETROUTE to return fib result

2017-05-27 Thread Roopa Prabhu

On Fri, May 26, 2017 at 11:18 AM, David Miller  wrote:
> From: Roopa Prabhu 
> Date: Thu, 25 May 2017 10:42:32 -0700
>
>> This series adds a new RTM_F_FIB_MATCH flag to return matched fib result
>> with RTM_GETROUTE. This is useful for applications and protocols in
>> userspace wanting to query the selected route.
>
> Looks good, series applied, thanks.
>

thank you.

> Have you considered taking this further and allowing one to see which
> nexthop a route lookup picked?

since the default RTM_GETROUTE output gives most of the attributes
from the resolved dst,
have not considered adding more to it yet...but certainly can if
needed in the future.
