date:20161015

Re: [PATCH v3] net: Require exact match for TCP socket lookups if dif is l3mdev

2016-10-15 Thread David Miller

From: David Ahern 
Date: Sat, 15 Oct 2016 17:07:53 -0600

> I believe at netconf someone mentioned it would be a great day when
> something is done for IPv6 first and IPv4 was a follow on. Here you
> go. :-)

:-)

> I can rename the existing one to skb_l3mdev_slave_6 and make the new
> one skb_l3mdev_slave_4.

That works.  So do names with "ipv4_" and "ipv6_" prefixes which at
least to me seems more canonical.  But maybe I'm just weird like that.
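Here is a minimal sketch of helpers named the way David suggests, with the address family in the prefix. The names ipv4_skb_l3mdev_slave()/ipv6_skb_l3mdev_slave() and the HYP_* flag values are hypothetical placeholders, not the kernel's definitions. The point is only that each helper reads flags from one family's control block and says so in its name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; the real IPSKB_L3SLAVE / IP6SKB_L3SLAVE values
 * are defined in the kernel's IPv4/IPv6 control-block headers. */
#define HYP_IPSKB_L3SLAVE   (1u << 7)
#define HYP_IP6SKB_L3SLAVE  (1u << 6)

/* IPv4-only: flags must come from the IPv4 control block. */
static inline bool ipv4_skb_l3mdev_slave(uint16_t flags)
{
	return !!(flags & HYP_IPSKB_L3SLAVE);
}

/* IPv6-only: flags must come from the IPv6 control block. */
static inline bool ipv6_skb_l3mdev_slave(uint16_t flags)
{
	return !!(flags & HYP_IP6SKB_L3SLAVE);
}
```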

Re: [PATCH 0/2] net: Fix compiler warnings

2016-10-15 Thread tndave

On 10/15/2016 02:48 PM, David Miller wrote:

> From: Tushar Dave 
> Date: Fri, 14 Oct 2016 17:06:04 -0700
>
>> Recently, ATU (iommu) changes that enable 64-bit DMA on SPARC were
>> submitted to linux-sparc. However, these changes also make
>> 'incompatible pointer type' compiler warnings inevitable in the sunqe
>> and sunbmac drivers.
>>
>> The two patches in this series fix those compiler warnings.
>
> Only the sparc tree has this build problem, so these patches
> really ought to be submitted for and applied there.

Okay. I will send these to sparclinux then.

Thanks.

-Tushar

> Thanks.

Re: [PATCH v3] net: Require exact match for TCP socket lookups if dif is l3mdev

2016-10-15 Thread David Ahern

On 10/15/16 3:46 PM, David Miller wrote:
> From: David Ahern 
> Date: Fri, 14 Oct 2016 12:29:19 -0700
> 
>> +/* can not be used in TCP layer after tcp_v6_fill_cb */
>> +static inline bool inet6_exact_dif_match(struct net *net, struct sk_buff *skb)
>> +{
>> +#if defined(CONFIG_NET_L3_MASTER_DEV)
>> +	if (!net->ipv4.sysctl_tcp_l3mdev_accept &&
>> +	    skb_l3mdev_slave(IP6CB(skb)->flags))
>> +		return true;
>> +#endif
>> +	return false;
>> +}
>  ...
>> +static inline bool skb_l3mdev_slave4(u16 flags)
>> +{
>> +	return !!(flags & IPSKB_L3SLAVE);
>> +}
> 
> I think this makes the code confusing.
> 
> Actually it has been from the beginning, because we have a generically
> named "skb_l3mdev_slave()" helper which strictly operates on ipv6
> state.
> 
> Please do something with the naming of these two helpers,
> skb_l3mdev_slave() and skb_l3mdev_slave4(), so that it is clear that
> they are ipv6 and ipv4 specific helpers, respectively.
> 

I believe at netconf someone mentioned it would be a great day when something 
is done for IPv6 first and IPv4 was a follow on. Here you go. :-)

I can rename the existing one to skb_l3mdev_slave_6 and make the new one 
skb_l3mdev_slave_4.

Re: [PATCH] ipvlan: constify l3mdev_ops structure

2016-10-15 Thread David Miller

From: Julia Lawall 
Date: Sat, 15 Oct 2016 17:40:30 +0200

> This l3mdev_ops structure is only stored in the l3mdev_ops field of a
> net_device structure.  This field is declared const, so the l3mdev_ops
> structure can be declared as const also.  Additionally drop the
> __read_mostly annotation.
> 
> The semantic patch that adds const is as follows:
> (http://coccinelle.lip6.fr/)
 ...
> Signed-off-by: Julia Lawall 

Applied, thanks.
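The reasoning behind the constification is easiest to see with hypothetical types standing in for struct l3mdev_ops and struct net_device. If the only place the table is stored is a pointer-to-const field, nothing ever writes through it, so the object itself can be const. A const object has no use for __read_mostly, which is a placement hint for writable data.

```c
#include <stddef.h>

struct pkt;  /* opaque, for illustration only */

/* Hypothetical ops table, shaped like a driver callback table. */
struct demo_l3_ops {
	int (*l3_rcv)(struct pkt *p);
};

struct demo_netdev {
	const struct demo_l3_ops *l3_ops;  /* the field is const-qualified */
};

static int demo_rcv(struct pkt *p)
{
	return p == NULL ? 0 : 1;
}

/* Only ever stored through a pointer-to-const, so it can be const too. */
static const struct demo_l3_ops demo_ops = {
	.l3_rcv = demo_rcv,
};

static void demo_setup(struct demo_netdev *dev)
{
	dev->l3_ops = &demo_ops;
}
```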

Re: [PATCH 0/2] net: Fix compiler warnings

2016-10-15 Thread David Miller

From: Tushar Dave 
Date: Fri, 14 Oct 2016 17:06:04 -0700

> Recently, ATU (iommu) changes that enable 64-bit DMA on SPARC were
> submitted to linux-sparc. However, these changes also make
> 'incompatible pointer type' compiler warnings inevitable in the sunqe
> and sunbmac drivers.
> 
> The two patches in this series fix those compiler warnings.

Only the sparc tree has this build problem, so these patches
really ought to be submitted for and applied there.

Thanks.

Re: [PATCH v2] vmxnet3: avoid assumption about invalid dma_pa in vmxnet3_set_mc()

2016-10-15 Thread David Miller

From: Alexey Khoroshilov 
Date: Sat, 15 Oct 2016 00:01:20 +0300

> vmxnet3_set_mc() checks new_table_pa returned by dma_map_single()
> with dma_mapping_error(), but even there it assumes zero is invalid pa
> (it assumes dma_mapping_error(...,0) returns true if new_table is NULL).
> 
> The patch adds an explicit variable to track status of new_table_pa.
> 
> Found by Linux Driver Verification project (linuxtesting.org).
> 
> v2: use "bool" and "true"/"false" for boolean variables.
> Signed-off-by: Alexey Khoroshilov 

Applied.
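The following userspace sketch illustrates the idea in the vmxnet3 patch. A separate bool records whether the mapping succeeded, instead of treating a particular address (such as 0) as "invalid". demo_map() is a hypothetical stand-in for the dma_map_single()/dma_mapping_error() pair, and the struct and field names are illustrative rather than the driver's own.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t demo_dma_addr_t;

/* Hypothetical mapping helper: reports failure through its return value
 * and never relies on any address value meaning "invalid". */
static bool demo_map(const void *buf, demo_dma_addr_t *out)
{
	if (buf == NULL)
		return false;
	*out = (demo_dma_addr_t)(uintptr_t)buf;
	return true;
}

struct demo_mc_state {
	demo_dma_addr_t table_pa;
	bool table_pa_valid;  /* explicit status of table_pa */
};

static void demo_set_mc(struct demo_mc_state *st, const void *new_table)
{
	st->table_pa = 0;
	st->table_pa_valid = false;
	if (new_table != NULL)
		st->table_pa_valid = demo_map(new_table, &st->table_pa);
}
```

Callers then check table_pa_valid, never table_pa itself, before using or unmapping the address.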

Re: [PATCH v3] net: Require exact match for TCP socket lookups if dif is l3mdev

2016-10-15 Thread David Miller

From: David Ahern 
Date: Fri, 14 Oct 2016 12:29:19 -0700

> +/* can not be used in TCP layer after tcp_v6_fill_cb */
> +static inline bool inet6_exact_dif_match(struct net *net, struct sk_buff *skb)
> +{
> +#if defined(CONFIG_NET_L3_MASTER_DEV)
> +	if (!net->ipv4.sysctl_tcp_l3mdev_accept &&
> +	    skb_l3mdev_slave(IP6CB(skb)->flags))
> +		return true;
> +#endif
> +	return false;
> +}
 ...
> +static inline bool skb_l3mdev_slave4(u16 flags)
> +{
> +	return !!(flags & IPSKB_L3SLAVE);
> +}

I think this makes the code confusing.

Actually it has been from the beginning, because we have a generically
named "skb_l3mdev_slave()" helper which strictly operates on ipv6
state.

Please do something with the naming of these two helpers,
skb_l3mdev_slave() and skb_l3mdev_slave4(), so that it is clear that
they are ipv6 and ipv4 specific helpers, respectively.
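The decision that the quoted inet6_exact_dif_match() makes can be modelled as a pure function. This is a sketch under stated assumptions: struct exact_match_input and exact_dif_match() are hypothetical, and the three fields stand in for CONFIG_NET_L3_MASTER_DEV, the tcp_l3mdev_accept sysctl and the l3mdev slave flag in the IPv6 control block.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state the quoted helper consults. */
struct exact_match_input {
	bool l3mdev_configured;   /* models CONFIG_NET_L3_MASTER_DEV=y */
	int  tcp_l3mdev_accept;   /* models net->ipv4.sysctl_tcp_l3mdev_accept */
	bool arrived_on_l3_slave; /* models the slave flag in IP6CB(skb)->flags */
};

/* True when the socket lookup must match the incoming device exactly. */
static bool exact_dif_match(const struct exact_match_input *in)
{
	if (!in->l3mdev_configured)
		return false;
	return !in->tcp_l3mdev_accept && in->arrived_on_l3_slave;
}
```

As the patch title suggests, with the sysctl left at 0 a packet that arrived through an l3mdev slave is matched only against sockets bound exactly to that device. Setting the sysctl relaxes this.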

Re: [patch] stmmac: fix an error code in stmmac_ptp_register()

2016-10-15 Thread David Miller

From: Dan Carpenter 
Date: Fri, 14 Oct 2016 22:26:11 +0300

> PTR_ERR(NULL) is success.  We have to preserve the error code earlier.
> 
> Fixes: 7086605a6ab5 ("stmmac: fix error check when init ptp")
> Signed-off-by: Dan Carpenter 

Good catch, applied.
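The sketch below re-creates the error-pointer helpers in userspace to show why PTR_ERR(NULL) reads as success. The kernel's versions are ERR_PTR(), PTR_ERR() and IS_ERR(); the demo_* versions only mimic them. The buggy/fixed pair models the ordering problem implied by the one-line description, where the pointer is cleared before its error code is read. It is not the stmmac code itself.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mimics of the kernel's error-pointer helpers: small negative
 * errno values are encoded in the top page of the address space. */
#define DEMO_MAX_ERRNO 4095

static inline void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_priv { void *clock; };

/* Buggy order: the pointer is cleared first, so the "error" is 0. */
static int demo_register_buggy(struct demo_priv *priv, void *result)
{
	priv->clock = result;
	if (demo_is_err(priv->clock)) {
		priv->clock = NULL;
		return (int)demo_ptr_err(priv->clock);
	}
	return 0;
}

/* Fixed order: save the error code before clearing the pointer. */
static int demo_register_fixed(struct demo_priv *priv, void *result)
{
	priv->clock = result;
	if (demo_is_err(priv->clock)) {
		int ret = (int)demo_ptr_err(priv->clock);

		priv->clock = NULL;
		return ret;
	}
	return 0;
}
```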

Re: [PATCH] net: qcom/emac: disable interrupts before calling phy_disconnect

2016-10-15 Thread David Miller

From: Timur Tabi 
Date: Fri, 14 Oct 2016 14:14:35 -0500

> There is a race condition that can occur if EMAC interrupts are
> enabled when phy_disconnect() is called.  phy_disconnect() sets
> adjust_link to NULL.  When an interrupt occurs, the ISR might
> call phy_mac_interrupt(), which wakes up the workqueue function
> phy_state_machine().  This function might reference adjust_link,
> thereby causing a null pointer exception.
> 
> Signed-off-by: Timur Tabi 

Applied.
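The model below shows the ordering named in the patch subject. It is single-threaded and does not reproduce the race itself (no workqueue, no concurrency). It only shows that once interrupts are masked first, nothing can reach the cleared adjust_link callback. All demo_* names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the objects named in the commit message. */
struct demo_emac_phy {
	void (*adjust_link)(void *ctx);
};

struct demo_emac {
	bool irq_enabled;
	struct demo_emac_phy phy;
	int link_updates;
};

static void demo_adjust_link(void *ctx)
{
	((struct demo_emac *)ctx)->link_updates++;
}

/* Models ISR -> phy_mac_interrupt() -> phy_state_machine() calling
 * adjust_link. Returns false where the real code would dereference NULL. */
static bool demo_interrupt(struct demo_emac *e)
{
	if (!e->irq_enabled)
		return true;            /* masked: nothing runs */
	if (e->phy.adjust_link == NULL)
		return false;           /* the crash the patch avoids */
	e->phy.adjust_link(e);
	return true;
}

/* Patch order: mask interrupts first, then disconnect the PHY. */
static void demo_shutdown(struct demo_emac *e)
{
	e->irq_enabled = false;
	e->phy.adjust_link = NULL;  /* what phy_disconnect() does to adjust_link */
}
```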

Re: [PATCH v2 net-next 0/2] ila: Cache a route in ILA lwt structure

2016-10-15 Thread David Miller

From: Tom Herbert 
Date: Fri, 14 Oct 2016 11:25:35 -0700

> Add a dst_cache to ila_lwt structure. This holds a cached route for the
> translated address. In ila_output we now perform a route lookup after
> translation and if possible (destination in original route is full 128
> bits) we set the dst_cache. Subsequent calls to ila_output can then use
> the cache to avoid the route lookup.
 ...

Series applied, thanks Tom.
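The following is a rough model of the caching rule in the cover letter. Hypothetical demo_* types replace struct ila_lwt and the kernel's dst_cache. The comment's rationale, that a full 128-bit destination means every packet translates to the same address, is an inference rather than something the cover letter states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical route: just an output-interface index here. */
struct demo_route { int oif; };

struct demo_ila_lwt {
	bool cache_valid;
	struct demo_route cached;  /* stands in for the dst_cache */
	int lookups;               /* counts slow-path route lookups */
};

/* Hypothetical slow path standing in for a real route lookup. */
static struct demo_route demo_route_lookup(struct demo_ila_lwt *lwt,
					   const uint8_t dst[16])
{
	struct demo_route rt = { .oif = dst[15] % 4 };

	lwt->lookups++;
	return rt;
}

/* Look up after translation; cache only when the original route's
 * destination is a full 128-bit prefix (one fixed translated address). */
static struct demo_route demo_ila_output(struct demo_ila_lwt *lwt,
					 const uint8_t translated[16],
					 int orig_dst_plen)
{
	if (lwt->cache_valid)
		return lwt->cached;

	struct demo_route rt = demo_route_lookup(lwt, translated);

	if (orig_dst_plen == 128) {
		lwt->cached = rt;
		lwt->cache_valid = true;
	}
	return rt;
}
```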

Re: [PATCH v2] r8169: set coherent DMA mask as well as streaming DMA mask

2016-10-15 Thread David Miller

From: Ard Biesheuvel 
Date: Fri, 14 Oct 2016 14:48:51 +0100

> 
>> On 14 Oct 2016, at 14:42, David Laight  wrote:
>> 
>> From: Of Ard Biesheuvel
>>> Sent: 14 October 2016 14:41
>>> PCI devices that are 64-bit DMA capable should set the coherent
>>> DMA mask as well as the streaming DMA mask. On some architectures,
>>> these are managed separately, and so the coherent DMA mask will be
>>> left at its default value of 32 if it is not set explicitly. This
>>> results in errors such as
>>> 
>>> r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
>>> hwdev DMA mask = 0x, dev_addr = 0x0080fbfff000
>>> swiotlb: coherent allocation failed for device :02:00.0 size=4096
>>> CPU: 0 PID: 1062 Comm: systemd-udevd Not tainted 4.8.0+ #35
>>> Hardware name: AMD Seattle/Seattle, BIOS 10:53:24 Oct 13 2016
>>> 
>>> on systems without memory that is 32-bit addressable by PCI devices.
>>> 
>>> Signed-off-by: Ard Biesheuvel 
>>> ---
>>> v2: dropped the hunk that sets the coherent DMA mask to DMA_BIT_MASK(32),
>>>which is unnecessary given that it is the default
>>> 
>>> drivers/net/ethernet/realtek/r8169.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
>>> index e55638c7505a..bf000d819a21 100644
>>> --- a/drivers/net/ethernet/realtek/r8169.c
>>> +++ b/drivers/net/ethernet/realtek/r8169.c
>>> @@ -8273,7 +8273,8 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>>>if ((sizeof(dma_addr_t) > 4) &&
>>>(use_dac == 1 || (use_dac == -1 && pci_is_pcie(pdev) &&
>>>  tp->mac_version >= RTL_GIGA_MAC_VER_18)) &&
>>> -!pci_set_dma_mask(pdev, DMA_BIT_MASK(64))) {
>>> +!pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) &&
>>> +!pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64))) {
>> 
>> Isn't there a dma_set_mask_and_coherent() function ?
>> 
> 
> Not of the pci_xxx variety afaik

You can often use the "dev_*" variants interchangeably with the pci_*()
ones.

In fact you'll find that for several architectures pci_*() is
implemented via calls to dev_*().
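In the kernel, the combined helper David Laight mentions is dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)), which sets the streaming and coherent masks in one call. The userspace model below uses the kernel's DMA_BIT_MASK() definition and a hypothetical demo_dev to show the two masks moving together. Unlike the real helper, it never rejects a mask.

```c
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical model of the two masks a device carries. */
struct demo_dev {
	uint64_t dma_mask;           /* streaming mappings */
	uint64_t coherent_dma_mask;  /* coherent allocations */
};

/* Both masks move together, so the coherent one cannot be left at its
 * 32-bit default by accident. */
static int demo_set_mask_and_coherent(struct demo_dev *dev, uint64_t mask)
{
	dev->dma_mask = mask;
	dev->coherent_dma_mask = mask;
	return 0;
}
```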

[PATCH RFC] ixgbe: ixgbe_atr() must check if network header is available in headlen

2016-10-15 Thread Sowmini Varadhan


For some Tx paths (e.g., tpacket_snd()), ixgbe_atr may be
passed down an sk_buff that has the network and transport
headers in the paged data, so it needs to make sure these
headers are available in the headlen bytes to calculate the
l4_proto.

This patch bails out if the headlen is "too short", and does
not attempt to call skb_header_pointer() to get the needed
bytes: the assumption is that the caller should set things
up properly if the l4_proto based tx steering is desired.

Signed-off-by: Sowmini Varadhan 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index a244d9a..0868de1 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -7632,6 +7632,7 @@ static void ixgbe_atr(struct ixgbe_ring *ring,
struct sk_buff *skb;
__be16 vlan_id;
int l4_proto;
+   int min_hdr_size = 0;
 
/* if ring doesn't have a interrupt vector, cannot perform ATR */
if (!q_vector)
@@ -7650,6 +7651,14 @@ static void ixgbe_atr(struct ixgbe_ring *ring,
 
/* snag network header to get L4 type and address */
skb = first->skb;
+   if (first->protocol == htons(ETH_P_IP))
+   min_hdr_size = sizeof(struct iphdr) +
+  sizeof(struct tcphdr);
+   else if (first->protocol == htons(ETH_P_IPV6))
+   min_hdr_size = sizeof(struct ipv6hdr) +
+  sizeof(struct tcphdr);
+   if (min_hdr_size && skb_headlen(skb) < ETH_HLEN + min_hdr_size)
+   return;
hdr.network = skb_network_header(skb);
if (skb->encapsulation &&
first->protocol == htons(ETH_P_IP) &&
-- 
1.7.1
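The arithmetic behind the check works out as follows. With option-less headers, IPv4 + TCP needs 14 + 20 + 20 = 54 bytes in the linear area, and IPv6 + TCP needs 14 + 40 + 20 = 74. Below is a userspace sketch of the same test. The names and constants are hypothetical, and like the patch as written, it ignores VLAN tags and IP/TCP options.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_ETH_HLEN   14   /* same value as the kernel's ETH_HLEN */
#define DEMO_IPV4_HLEN  20   /* sizeof(struct iphdr) */
#define DEMO_IPV6_HLEN  40   /* sizeof(struct ipv6hdr) */
#define DEMO_TCP_HLEN   20   /* sizeof(struct tcphdr) */

enum demo_l3 { DEMO_L3_OTHER, DEMO_L3_IPV4, DEMO_L3_IPV6 };

/* Returns true when ATR may parse the headers, i.e. the linear part
 * (skb_headlen) holds Ethernet + IP + TCP. Other protocols are not
 * checked, just as min_hdr_size stays 0 in the patch. */
static bool demo_atr_may_parse(enum demo_l3 proto, size_t headlen)
{
	size_t min_hdr_size = 0;

	if (proto == DEMO_L3_IPV4)
		min_hdr_size = DEMO_IPV4_HLEN + DEMO_TCP_HLEN;
	else if (proto == DEMO_L3_IPV6)
		min_hdr_size = DEMO_IPV6_HLEN + DEMO_TCP_HLEN;

	return !(min_hdr_size && headlen < DEMO_ETH_HLEN + min_hdr_size);
}
```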

Re: [PATCH v2] r8169: set coherent DMA mask as well as streaming DMA mask

2016-10-15 Thread David Miller

From: Ard Biesheuvel 
Date: Fri, 14 Oct 2016 14:40:33 +0100

> PCI devices that are 64-bit DMA capable should set the coherent
> DMA mask as well as the streaming DMA mask. On some architectures,
> these are managed separately, and so the coherent DMA mask will be
> left at its default value of 32 if it is not set explicitly. This
> results in errors such as
> 
>  r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
>  hwdev DMA mask = 0x, dev_addr = 0x0080fbfff000
>  swiotlb: coherent allocation failed for device :02:00.0 size=4096
>  CPU: 0 PID: 1062 Comm: systemd-udevd Not tainted 4.8.0+ #35
>  Hardware name: AMD Seattle/Seattle, BIOS 10:53:24 Oct 13 2016
> 
> on systems without memory that is 32-bit addressable by PCI devices.
> 
> Signed-off-by: Ard Biesheuvel 

Applied.

linux-next: manual merge of the net tree with Linus' tree

2016-10-15 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the net tree got a conflict in:

  drivers/net/ethernet/qlogic/Kconfig

between commit:

  2e0cbc4dd077 ("qedr: Add RoCE driver framework")

from Linus' tree and commit:

  0189efb8f4f8 ("qed*: Fix Kconfig dependencies with INFINIBAND_QEDR")

from the net tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

I also added this merge fix patch:

From: Stephen Rothwell 
Date: Sun, 16 Oct 2016 08:09:42 +1100
Subject: [PATCH] qed*: merge fix for CONFIG_INFINIBAND_QEDR Kconfig move

Signed-off-by: Stephen Rothwell 
---
 drivers/infiniband/hw/qedr/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/hw/qedr/Kconfig b/drivers/infiniband/hw/qedr/Kconfig
index 7c06d85568d4..6c9f3923e838 100644
--- a/drivers/infiniband/hw/qedr/Kconfig
+++ b/drivers/infiniband/hw/qedr/Kconfig
@@ -2,6 +2,7 @@ config INFINIBAND_QEDR
tristate "QLogic RoCE driver"
depends on 64BIT && QEDE
select QED_LL2
+   select QED_RDMA
---help---
  This driver provides low-level InfiniBand over Ethernet
  support for QLogic QED host channel adapters (HCAs).
-- 
2.8.1

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/net/ethernet/qlogic/Kconfig
index 1e8339a67f6e,77567727528a..
--- a/drivers/net/ethernet/qlogic/Kconfig
+++ b/drivers/net/ethernet/qlogic/Kconfig
@@@ -107,4 -107,19 +107,7 @@@ config QED
---help---
  This enables the support for ...
  
+ config QED_RDMA
+   bool
+ 
 -config INFINIBAND_QEDR
 -  tristate "QLogic qede RoCE sources [debug]"
 -  depends on QEDE && 64BIT
 -  select QED_LL2
 -  select QED_RDMA
 -  default n
 -  ---help---
 -This provides a temporary node that allows the compilation
 -and logical testing of the InfiniBand over Ethernet support
 -for QLogic QED. This would be replaced by the 'real' option
 -once the QEDR driver is added [+relocated].
 -
  endif # NET_VENDOR_QLOGIC

Re: [PATCH v2 1/3] net: smc91x: isolate u16 writes alignment workaround

2016-10-15 Thread Robert Jarzmik

Sorry David, I just noticed you weren't in the "To:" of this series, but I won't
forget you for the v3 I need to release anyway
(https://lkml.org/lkml/2016/10/15/104).

Robert Jarzmik  writes:
> + lp->half_word_align4 =
> + machine_is_mainstone() || machine_is_stargate2() ||
> + machine_is_pxa_idp();

Bah this one is not good enough.

First, machine_is_*() is not defined if CONFIG_ARM=n, and this part is not under
a #ifdef CONFIG_ARM.

Moreover, I think it is a good occasion to go further, and:
 - enhance smc91x_platdata and add a pxa_u16_align4 boolean
 - transform this statement into:
lp->half_word_align4 = lp->cfg.pxa_u16_align4

This will remove the machine_*() calls from the smc91x driver, which looks like
a good move, doesn't it?

Cheers.

--
Robert
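The following sketches the platform-data approach Robert proposes. The demo_* structures are trimmed, hypothetical stand-ins for smc91x_platdata and the driver's private struct, and only the proposed pxa_u16_align4 field is modelled.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down platform data with the proposed field. */
struct demo_smc91x_platdata {
	unsigned long flags;
	bool pxa_u16_align4;   /* set by board code (or DT), not the driver */
};

struct demo_smc_local {
	struct demo_smc91x_platdata cfg;
	bool half_word_align4;
};

/* Probe no longer asks "which machine is this?"; it copies what the
 * board told it. */
static void demo_probe(struct demo_smc_local *lp,
		       const struct demo_smc91x_platdata *pd)
{
	lp->cfg = *pd;
	lp->half_word_align4 = lp->cfg.pxa_u16_align4;
}
```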

Re: [PATCH 2/2] rds: Remove duplicate prefix from rds_conn_path_error use

2016-10-15 Thread Santosh Shilimkar


On 10/15/2016 11:53 AM, Joe Perches wrote:

rds_conn_path_error already prefixes "RDS:" to the output.

Signed-off-by: Joe Perches 
---

Acked-by: Santosh Shilimkar

Re: [PATCH 1/2] rds: Remove unused rds_conn_error

2016-10-15 Thread Santosh Shilimkar


On 10/15/2016 11:53 AM, Joe Perches wrote:

This macro's last use was removed in commit d769ef81d5b59
("RDS: Update rds_conn_shutdown to work with rds_conn_path")
so make the macro and the __rds_conn_error function definition
and declaration disappear.

Signed-off-by: Joe Perches 
---

Had same patch along with few more in the queue but
didn't find time of late to get it on the list.
Thanks for both patches.

Acked-by: Santosh Shilimkar

[PATCH v2 3/3] net: smsc91x: add u16 workaround for pxa platforms

2016-10-15 Thread Robert Jarzmik

Add a workaround for mainstone, idp and stargate2 boards, for u16 writes,
which must be aligned on 32-bit addresses.

Signed-off-by: Robert Jarzmik 
Cc: Jeremy Linton 
---
Since v1: rename dt property to pxa-u16-align4
  change the binding documentation file
---
 Documentation/devicetree/bindings/net/smsc-lan91c111.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/smsc-lan91c111.txt b/Documentation/devicetree/bindings/net/smsc-lan91c111.txt
index e77e167593db..309e37eb7c7c 100644
--- a/Documentation/devicetree/bindings/net/smsc-lan91c111.txt
+++ b/Documentation/devicetree/bindings/net/smsc-lan91c111.txt
@@ -13,3 +13,5 @@ Optional properties:
   16-bit access only.
 - power-gpios: GPIO to control the PWRDWN pin
 - reset-gpios: GPIO to control the RESET pin
+- pxa-u16-align4 : Boolean, put in place the workaround to force all
+  u16 writes to be 32-bit aligned
-- 
2.1.4

[PATCH v2 1/3] net: smc91x: isolate u16 writes alignment workaround

2016-10-15 Thread Robert Jarzmik

u16 writes need special handling on 3 PXA platforms, where the
hardware wiring forces these writes to be u32 aligned.

This patch isolates this handling for PXA platforms as before, but
enables this "workaround" to be set up dynamically, which will be the
case in device-tree build types.

This patch was tested on 2 PXA platforms: mainstone, which relies on
the workaround, and lubbock, which doesn't.

Signed-off-by: Robert Jarzmik 
---
 drivers/net/ethernet/smsc/smc91x.c |  6 ++-
 drivers/net/ethernet/smsc/smc91x.h | 78 +-
 2 files changed, 48 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/smsc/smc91x.c b/drivers/net/ethernet/smsc/smc91x.c
index 9b4780f87863..5658c2b28ec8 100644
--- a/drivers/net/ethernet/smsc/smc91x.c
+++ b/drivers/net/ethernet/smsc/smc91x.c
@@ -602,7 +602,8 @@ static void smc_hardware_send_pkt(unsigned long data)
SMC_PUSH_DATA(lp, buf, len & ~1);
 
/* Send final ctl word with the last byte if there is one */
-   SMC_outw(((len & 1) ? (0x2000 | buf[len-1]) : 0), ioaddr, DATA_REG(lp));
+   SMC_outw(lp, ((len & 1) ? (0x2000 | buf[len-1]) : 0), ioaddr,
+DATA_REG(lp));
 
/*
 * If THROTTLE_TX_PKTS is set, we stop the queue here. This will
@@ -2282,6 +2283,9 @@ static int smc_drv_probe(struct platform_device *pdev)
goto out_free_netdev;
}
}
+   lp->half_word_align4 =
+   machine_is_mainstone() || machine_is_stargate2() ||
+   machine_is_pxa_idp();
 
 #if IS_BUILTIN(CONFIG_OF)
match = of_match_device(of_match_ptr(smc91x_match), &pdev->dev);
diff --git a/drivers/net/ethernet/smsc/smc91x.h b/drivers/net/ethernet/smsc/smc91x.h
index ea8465467469..dff165ed106d 100644
--- a/drivers/net/ethernet/smsc/smc91x.h
+++ b/drivers/net/ethernet/smsc/smc91x.h
@@ -86,11 +86,11 @@
 
 #define SMC_inl(a, r)  readl((a) + (r))
 #define SMC_outb(v, a, r)  writeb(v, (a) + (r))
-#define SMC_outw(v, a, r)  \
+#define SMC_outw(lp, v, a, r)  \
do {\
unsigned int __v = v, __smc_r = r;  \
if (SMC_16BIT(lp))  \
-   __SMC_outw(__v, a, __smc_r);\
+   __SMC_outw(lp, __v, a, __smc_r);\
else if (SMC_8BIT(lp))  \
SMC_outw_b(__v, a, __smc_r);\
else\
@@ -107,10 +107,10 @@
 #define SMC_IRQ_FLAGS  (-1)/* from resource */
 
 /* We actually can't write halfwords properly if not word aligned */
-static inline void __SMC_outw(u16 val, void __iomem *ioaddr, int reg)
+static inline void _SMC_outw_align4(u16 val, void __iomem *ioaddr, int reg,
+   bool use_align4_workaround)
 {
-   if ((machine_is_mainstone() || machine_is_stargate2() ||
-machine_is_pxa_idp()) && reg & 2) {
+   if (use_align4_workaround) {
unsigned int v = val << 16;
v |= readl(ioaddr + (reg & ~2)) & 0xffff;
writel(v, ioaddr + (reg & ~2));
@@ -119,6 +119,12 @@ static inline void __SMC_outw(u16 val, void __iomem *ioaddr, int reg)
}
 }
 
+#define __SMC_outw(lp, v, a, r)					\
+   _SMC_outw_align4((v), (a), (r), \
+IS_BUILTIN(CONFIG_ARCH_PXA) && ((r) & 2) &&\
+lp->half_word_align4)
+
+
 #elif  defined(CONFIG_SH_SH4202_MICRODEV)
 
 #define SMC_CAN_USE_8BIT   0
@@ -129,7 +135,7 @@ static inline void __SMC_outw(u16 val, void __iomem *ioaddr, int reg)
 #define SMC_inw(a, r)  inw((a) + (r) - 0xa000)
 #define SMC_inl(a, r)  inl((a) + (r) - 0xa000)
 #define SMC_outb(v, a, r)  outb(v, (a) + (r) - 0xa000)
-#define SMC_outw(v, a, r)  outw(v, (a) + (r) - 0xa000)
+#define SMC_outw(lp, v, a, r)  outw(v, (a) + (r) - 0xa000)
 #define SMC_outl(v, a, r)  outl(v, (a) + (r) - 0xa000)
 #define SMC_insl(a, r, p, l)   insl((a) + (r) - 0xa000, p, l)
 #define SMC_outsl(a, r, p, l)  outsl((a) + (r) - 0xa000, p, l)
@@ -147,7 +153,7 @@ static inline void __SMC_outw(u16 val, void __iomem *ioaddr, int reg)
 #define SMC_inb(a, r)  inb(((u32)a) + (r))
 #define SMC_inw(a, r)  inw(((u32)a) + (r))
 #define SMC_outb(v, a, r)  outb(v, ((u32)a) + (r))
-#define SMC_outw(v, a, r)  outw(v, ((u32)a) + (r))
+#define SMC_outw(lp, v, a, r)  outw(v, ((u32)a) + (r))
 #define SMC_insw(a, r, p, l)   insw(((u32)a) + (r), p, l)
 #define SMC_outsw(a, r, p, l)  outsw(((u32)a) + (r), p, l)
 
@@ -175,7 +181,7 @@ static

[PATCH v2 2/3] net: smc91x: take into account half-word workaround

2016-10-15 Thread Robert Jarzmik

For device-tree builds, platforms such as mainstone, idp and stargate2
must have their u16 writes all aligned on 32 bit boundaries. This is
already enabled in platform data builds, and this patch adds it to
device-tree builds.

Signed-off-by: Robert Jarzmik 
---
Since v1: rename dt property to pxa-u16-align4
---
 drivers/net/ethernet/smsc/smc91x.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/smsc/smc91x.c b/drivers/net/ethernet/smsc/smc91x.c
index 5658c2b28ec8..c14676805d06 100644
--- a/drivers/net/ethernet/smsc/smc91x.c
+++ b/drivers/net/ethernet/smsc/smc91x.c
@@ -2329,6 +2329,8 @@ static int smc_drv_probe(struct platform_device *pdev)
if (!device_property_read_u32(&pdev->dev, "reg-shift",
  &val))
lp->io_shift = val;
+   lp->half_word_align4 =
+   device_property_read_bool(&pdev->dev, "pxa-u16-align4");
}
 #endif
 
-- 
2.1.4

[PATCH v2 0/3] support smc91x on mainstone and devicetree

2016-10-15 Thread Robert Jarzmik

This series aims at bringing support for the mainstone board to device-tree
based builds, matching what is already in place for the legacy mainstone build.

The bulk of the mainstone "specific" behavior is that a u16 write doesn't work
on an address of the form 4*n + 2, while it works on 4*n.

The legacy workaround was in SMC_outw(), with calls to
machine_is_mainstone(). These calls don't work with a pxa27x-dt machine type,
which is used when a generic device-tree pxa27x machine is used to boot the
mainstone board.

Therefore, this series enables the smc91c111 adapter of the mainstone board to
work on a device-tree build, exactly as it's been working for years with the
legacy arch/arm/mach-pxa/mainstone.c definition.

Cheers.

--
Robert

Robert Jarzmik (3):
  net: smc91x: isolate u16 writes alignment workaround
  net: smc91x: take into account half-word workaround
  net: smsc91x: add u16 workaround for pxa platforms

 .../devicetree/bindings/net/smsc-lan91c111.txt |  2 +
 drivers/net/ethernet/smsc/smc91x.c |  8 ++-
 drivers/net/ethernet/smsc/smc91x.h | 78 --
 3 files changed, 52 insertions(+), 36 deletions(-)

-- 
2.1.4
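The 4*n + 2 problem and its read-modify-write workaround can be modelled in userspace. In this sketch, a plain array stands in for the register window and the demo_* names are hypothetical. It assumes that offset 4*n + 2 is the upper half of the aligned 32-bit word, as the << 16 in the quoted _SMC_outw_align4() implies.

```c
#include <stdint.h>

/* Hypothetical register window that only takes 32-bit accesses. */
static uint32_t demo_regs[4];

static uint32_t demo_readl(unsigned int off)
{
	return demo_regs[off / 4];
}

static void demo_writel(uint32_t v, unsigned int off)
{
	demo_regs[off / 4] = v;
}

/* u16 write to reg = 4*n + 2: merge the value into the upper half of the
 * aligned word, keeping the low half-word that is already there. */
static void demo_outw_align4(uint16_t val, unsigned int reg)
{
	uint32_t v = (uint32_t)val << 16;

	v |= demo_readl(reg & ~2u) & 0xffff;
	demo_writel(v, reg & ~2u);
}
```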

Re: [rds-devel] [PATCH 2/2] rds: Remove duplicate prefix from rds_conn_path_error use

2016-10-15 Thread Sowmini Varadhan

On (10/15/16 11:53), Joe Perches wrote:
> 
> rds_conn_path_error already prefixes "RDS:" to the output.
> 
> Signed-off-by: Joe Perches 
Acked-by: Sowmini Varadhan

Re: [rds-devel] [PATCH 1/2] rds: Remove unused rds_conn_error

2016-10-15 Thread Sowmini Varadhan

On (10/15/16 11:53), Joe Perches wrote:
> This macro's last use was removed in commit d769ef81d5b59
> ("RDS: Update rds_conn_shutdown to work with rds_conn_path")
> so make the macro and the __rds_conn_error function definition
> and declaration disappear.
> 
> Signed-off-by: Joe Perches 

Acked-by: Sowmini Varadhan

[PATCH 1/2] rds: Remove unused rds_conn_error

2016-10-15 Thread Joe Perches

This macro's last use was removed in commit d769ef81d5b59
("RDS: Update rds_conn_shutdown to work with rds_conn_path")
so make the macro and the __rds_conn_error function definition
and declaration disappear.

Signed-off-by: Joe Perches 
---
 net/rds/connection.c | 15 ---
 net/rds/rds.h|  4 
 2 files changed, 19 deletions(-)

diff --git a/net/rds/connection.c b/net/rds/connection.c
index f5058559bb08..13f459dad4ef 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -689,21 +689,6 @@ void rds_conn_connect_if_down(struct rds_connection *conn)
 }
 EXPORT_SYMBOL_GPL(rds_conn_connect_if_down);
 
-/*
- * An error occurred on the connection
- */
-void
-__rds_conn_error(struct rds_connection *conn, const char *fmt, ...)
-{
-   va_list ap;
-
-   va_start(ap, fmt);
-   vprintk(fmt, ap);
-   va_end(ap);
-
-   rds_conn_drop(conn);
-}
-
 void
 __rds_conn_path_error(struct rds_conn_path *cp, const char *fmt, ...)
 {
diff --git a/net/rds/rds.h b/net/rds/rds.h
index fd0bccb2f9f9..25532a46602f 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -683,10 +683,6 @@ void rds_for_each_conn_info(struct socket *sock, unsigned int len,
  struct rds_info_lengths *lens,
  int (*visitor)(struct rds_connection *, void *),
  size_t item_len);
-__printf(2, 3)
-void __rds_conn_error(struct rds_connection *conn, const char *, ...);
-#define rds_conn_error(conn, fmt...) \
-   __rds_conn_error(conn, KERN_WARNING "RDS: " fmt)
 
 __printf(2, 3)
 void __rds_conn_path_error(struct rds_conn_path *cp, const char *, ...);
-- 
2.10.0.rc2.1.g053435c

[PATCH 2/2] rds: Remove duplicate prefix from rds_conn_path_error use

2016-10-15 Thread Joe Perches

rds_conn_path_error already prefixes "RDS:" to the output.

Signed-off-by: Joe Perches 
---
 net/rds/threads.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/rds/threads.c b/net/rds/threads.c
index e42df11bf30a..e36e333a0aa0 100644
--- a/net/rds/threads.c
+++ b/net/rds/threads.c
@@ -171,8 +171,7 @@ void rds_connect_worker(struct work_struct *work)
 RDS_CONN_DOWN))
rds_queue_reconnect(cp);
else
-   rds_conn_path_error(cp,
-   "RDS: connect failed\n");
+   rds_conn_path_error(cp, "connect failed\n");
}
}
 }
-- 
2.10.0.rc2.1.g053435c
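The duplicate prefix comes from the wrapper, which pastes "RDS: " in front of the caller's format string, just as the removed rds_conn_error() macro does with KERN_WARNING "RDS: " fmt. A hypothetical demo_path_error() macro shows the effect of string-literal concatenation.

```c
#include <stdio.h>

/* Hypothetical logger standing in for rds_conn_path_error(): it formats
 * into a buffer so the result can be inspected. */
static char demo_log[128];

#define demo_path_error(...) \
	snprintf(demo_log, sizeof(demo_log), "RDS: " __VA_ARGS__)
```

Passing "RDS: connect failed\n" to such a wrapper would log "RDS: RDS: connect failed", which is what patch 2/2 avoids.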

[PATCH 0/2] rds: logging neatening

2016-10-15 Thread Joe Perches

Joe Perches (2):
  rds: Remove unused rds_conn_error
  rds: Remove duplicate prefix from rds_conn_path_error use

 net/rds/connection.c | 15 ---
 net/rds/rds.h|  4 
 net/rds/threads.c|  3 +--
 3 files changed, 1 insertion(+), 21 deletions(-)

-- 
2.10.0.rc2.1.g053435c

Re: Need help with mdiobus_register and phy

2016-10-15 Thread Timur Tabi


Andrew Lunn wrote:

1) Take the SerDes power down out of the suspend code for the at803x.

2) Assume MII_PHYID1/2 registers are not guaranteed to be available
when the PHY is powered down. So get_phy_id should first read
MII_BMCR. If it gets 0xffff, assume there is no PHY there. If the
PDOWN bit is set, power up the PHY. Then read the ID registers.


Before we take approach #1, I'd like to hear from the developer of that 
patch, Zefir.  According to him, that patch is necessary to fix a bug. 
I don't know if that bug exists only on his system, though.


--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the
Code Aurora Forum, hosted by The Linux Foundation.

Re: Need help with mdiobus_register and phy

2016-10-15 Thread Andrew Lunn

On Sat, Oct 15, 2016 at 09:39:12AM -0500, Timur Tabi wrote:
> Florian Fainelli wrote:
> >After reading the spec again, it does not appear to me that a PHY
> >with PDOWN set is guaranteed or even required to respond to other
> >register reads such as MII_PHYID1/2, in which case we may have to
> >implement a MDIO bus reset routine which clears PDOWN for all PHYs
> >that we detect(ed), or as Andrew suggested, utilize the matching by
> >compatible string with the PHY OUI in it.
> 
> The 8031 does respond normally when PDOWN is set.  However, the ID
> registers are not available when the SerDes bus is also powered
> down. I'll call this PDOWN+.  This is a special power-down sequence
> that the at803x driver does on suspend.  See my other email for
> details.

So we appear to have two ways to go:

1) Take the SerDes power down out of the suspend code for the at803x.

2) Assume MII_PHYID1/2 registers are not guaranteed to be available
when the PHY is powered down. So get_phy_id should first read
MII_BMCR. If it gets 0xffff, assume there is no PHY there. If the
PDOWN bit is set, power up the PHY. Then read the ID registers.

  Andrew
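Option 2 can be sketched against a simulated PHY. The register numbers and BMCR_PDOWN have the values from <linux/mii.h>. demo_phy, demo_mdio_read()/demo_mdio_write() and demo_get_phy_id() are hypothetical, and the simulation simplifies Timur's "PDOWN+" state to "ID registers read 0xffff while PDOWN is set".

```c
#include <stdbool.h>
#include <stdint.h>

/* Same values as in <linux/mii.h>. */
#define MII_BMCR     0x00
#define MII_PHYSID1  0x02
#define MII_PHYSID2  0x03
#define BMCR_PDOWN   0x0800

/* Hypothetical simulated PHY. */
struct demo_phy {
	bool present;
	uint16_t bmcr;
	uint32_t id;
};

static uint16_t demo_mdio_read(struct demo_phy *phy, int reg)
{
	if (!phy->present)
		return 0xffff;  /* nothing drives the bus */
	switch (reg) {
	case MII_BMCR:
		return phy->bmcr;
	case MII_PHYSID1:
		return (phy->bmcr & BMCR_PDOWN) ? 0xffff : phy->id >> 16;
	case MII_PHYSID2:
		return (phy->bmcr & BMCR_PDOWN) ? 0xffff : phy->id & 0xffff;
	default:
		return 0;
	}
}

static void demo_mdio_write(struct demo_phy *phy, int reg, uint16_t val)
{
	if (phy->present && reg == MII_BMCR)
		phy->bmcr = val;
}

/* Read BMCR first, power up if needed, then read the IDs.
 * Returns -1 if no PHY answers. */
static int demo_get_phy_id(struct demo_phy *phy, uint32_t *id)
{
	uint16_t bmcr = demo_mdio_read(phy, MII_BMCR);

	if (bmcr == 0xffff)
		return -1;
	if (bmcr & BMCR_PDOWN)
		demo_mdio_write(phy, MII_BMCR, (uint16_t)(bmcr & ~BMCR_PDOWN));

	*id = (uint32_t)demo_mdio_read(phy, MII_PHYSID1) << 16;
	*id |= demo_mdio_read(phy, MII_PHYSID2);
	return 0;
}
```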

[net:master 10/15] net/ipv6/addrconf.c:1251:14-30: WARNING: Unsigned expression compared with zero: tmp_prefered_lft < 0

2016-10-15 Thread Julia Lawall

I haven't checked the entire context, but it could be useful to look at
line 1251.

julia

-- Forwarded message --
Date: Sun, 16 Oct 2016 01:34:18 +0800
From: kbuild test robot 
To: kbu...@01.org
Cc: Julia Lawall 
Subject: [net:master 10/15] net/ipv6/addrconf.c:1251:14-30: WARNING: Unsigned
expression compared with zero: tmp_prefered_lft < 0

CC: kbuild-...@01.org
CC: netdev@vger.kernel.org
TO: Jiri Bohac 

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master
head:   9e55d0f95460a067def5400fa5eee5dabb0fc5a5
commit: 76506a986dc31394fd1f2741db037d29c7e57843 [10/15] IPv6: fix DESYNC_FACTOR
:: branch date: 21 hours ago
:: commit date: 27 hours ago

>> net/ipv6/addrconf.c:1251:14-30: WARNING: Unsigned expression compared with zero: tmp_prefered_lft < 0

git remote add net https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git
git remote update net
git checkout 76506a986dc31394fd1f2741db037d29c7e57843
vim +1251 net/ipv6/addrconf.c

76506a986 Jiri Bohac     2016-10-13  1235  	if (unlikely(idev->desync_factor > max_desync_factor)) {
76506a986 Jiri Bohac     2016-10-13  1236  		if (max_desync_factor > 0) {
76506a986 Jiri Bohac     2016-10-13  1237  			get_random_bytes(&idev->desync_factor,
76506a986 Jiri Bohac     2016-10-13  1238  					 sizeof(idev->desync_factor));
76506a986 Jiri Bohac     2016-10-13  1239  			idev->desync_factor %= max_desync_factor;
76506a986 Jiri Bohac     2016-10-13  1240  		} else {
76506a986 Jiri Bohac     2016-10-13  1241  			idev->desync_factor = 0;
76506a986 Jiri Bohac     2016-10-13  1242  		}
76506a986 Jiri Bohac     2016-10-13  1243  	}
76506a986 Jiri Bohac     2016-10-13  1244  
^1da177e4 Linus Torvalds 2005-04-16  1245  	tmp_valid_lft = min_t(__u32,
^1da177e4 Linus Torvalds 2005-04-16  1246  			      ifp->valid_lft,
7a876b0ef Glenn Wurster  2010-09-27  1247  			      idev->cnf.temp_valid_lft + age);
76506a986 Jiri Bohac     2016-10-13  1248  	tmp_prefered_lft = idev->cnf.temp_prefered_lft + age -
76506a986 Jiri Bohac     2016-10-13  1249  			    idev->desync_factor;
76506a986 Jiri Bohac     2016-10-13  1250  	/* guard against underflow in case of concurrent updates to cnf */
76506a986 Jiri Bohac     2016-10-13 @1251  	if (unlikely(tmp_prefered_lft < 0))
76506a986 Jiri Bohac     2016-10-13  1252  		tmp_prefered_lft = 0;
76506a986 Jiri Bohac     2016-10-13  1253  	tmp_prefered_lft = min_t(__u32, ifp->prefered_lft, tmp_prefered_lft);
^1da177e4 Linus Torvalds 2005-04-16  1254  	tmp_plen = ifp->prefix_len;
^1da177e4 Linus Torvalds 2005-04-16  1255  	tmp_tstamp = ifp->tstamp;
^1da177e4 Linus Torvalds 2005-04-16  1256  	spin_unlock_bh(&ifp->lock);
^1da177e4 Linus Torvalds 2005-04-16  1257  
53bd67491 Jiri Pirko     2013-12-06  1258  	write_unlock_bh(&idev->lock);
95c385b4d Neil Horman    2007-04-25  1259  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation
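One way to make the guard meaningful is to test before subtracting, so the value never wraps. The sketch below assumes all three inputs are unsigned 32-bit. The function name is hypothetical, and this is not necessarily how the warning was resolved upstream.

```c
#include <stdint.h>

/* With u32 operands, "a + b - c" wraps to a huge value instead of going
 * negative, so a later "< 0" test on an unsigned variable never fires.
 * Compare first (in 64 bits, so a + b cannot overflow), then subtract. */
static uint32_t demo_tmp_prefered_lft(uint32_t temp_prefered_lft,
				      uint32_t age, uint32_t desync_factor)
{
	uint64_t sum = (uint64_t)temp_prefered_lft + age;

	if (sum <= desync_factor)
		return 0;
	sum -= desync_factor;
	return sum > UINT32_MAX ? UINT32_MAX : (uint32_t)sum;
}
```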

Re: [patch net-next RFC 4/6] Introduce sample tc action

2016-10-15 Thread Roopa Prabhu

On 10/15/16, 9:34 AM, Roopa Prabhu wrote:
> On 10/12/16, 5:41 AM, Jiri Pirko wrote:
>> From: Yotam Gigi 
>>
>> This action allows the user to sample traffic matched by a tc classifier.
>> The sampling consists of choosing packets randomly, truncating them,
>> adding some informative metadata regarding the interface and the original
>> packet size and marking them with a specific mark, to allow further tc
>> rules to match and process them. The marked sample packets are then
>> injected into the device ingress qdisc using netif_receive_skb.
>>
>> The packet metadata is packed using the ife encapsulation protocol, and
>> the outer packet's ethernet dest, source and eth_type, along with the
>> rate, mark and the optional truncation size can be configured from
>> userspace.
>>
>> Example:
>> To sample ingress traffic from interface eth1, and redirect the sampled
>> packets to interface dummy0, one may use the commands:
>>
>> tc qdisc add dev eth1 handle ffff: ingress
>>
>> tc filter add dev eth1 parent ffff: \
>> matchall action sample rate 12 mark 17
>>
>> tc filter add parent ffff: dev eth1 protocol all \
>> u32 match mark 172 0xff \
>> action mirred egress redirect dev dummy0
>>
>> Where the first command adds an ingress qdisc and the second starts
>> sampling every 12th packet on dev eth1 and marks the sampled packets with
>> 17. The third command catches the sampled packets, which are marked with
>> 17, and redirects them to dev dummy0.
>>
>> Signed-off-by: Yotam Gigi 
>> Signed-off-by: Jiri Pirko 
> channeling some feedback from Peter Phaal @sflow inline below:
>
>
If it helps, one more thing that came up was using bpf.
They also use bpf filters for pkt sampling in the non-offloaded case:
http://blog.sflow.com/2016/05/berkeley-packet-filter-bpf.html

so, existing apps (like sflow) that care about packet sampling do prefer to use
a socket API for sample delivery: netlink nflog or bpf-like socket filters.

also, to keep the software and hardware models the same, I am wondering if
ebpf attach can be a viable option (I have not thought about the offloaded
case completely yet). This would give apps more control over attaching sample
headers (like sflow) if needed.
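As a rough sketch of BPF-based sampling in the non-offloaded case (not taken from the sFlow blog post or from sFlow's own agent), a classic BPF socket filter can draw a random number with the SKF_AD_RANDOM ancillary load, reduce it modulo N, and accept the packet only when the remainder is zero, so on average one packet in N reaches the socket. build_sampling_filter() is a hypothetical helper name; the opcodes and macros come from linux/filter.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/filter.h>

/* Fill prog[] with a 5-instruction classic BPF program that accepts
 * roughly one packet in n. Returns the number of instructions. */
static size_t build_sampling_filter(uint32_t n, struct sock_filter prog[5])
{
    assert(n > 0); /* a modulo by zero would be rejected by the kernel */

    struct sock_filter p[5] = {
        /* A = random 32-bit value (ancillary load) */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, SKF_AD_OFF + SKF_AD_RANDOM),
        /* A = A % n */
        BPF_STMT(BPF_ALU | BPF_MOD | BPF_K, n),
        /* if (A == 0) fall through to accept, else skip one to drop */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, 0xffffffff), /* accept whole packet */
        BPF_STMT(BPF_RET | BPF_K, 0),          /* drop */
    };

    for (size_t i = 0; i < 5; i++)
        prog[i] = p[i];
    return 5;
}
```

To use it, one would wrap the array in a struct sock_fprog (.len = 5, .filter = prog) and attach it to a packet socket with setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &fprog, sizeof(fprog)).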

thanks,
Roopa

[PATCH] crypto: ccm - avoid scatterlist for MAC encryption

2016-10-15 Thread Ard Biesheuvel

The CCM code goes out of its way to perform the CTR encryption of the MAC
using the subordinate CTR driver. To this end, it tweaks the input and
output scatterlists so the aead_req 'odata' and/or 'auth_tag' fields [which
may live on the stack] are prepended to the CTR payload. This involves
calling sg_set_buf() on addresses which are not direct mapped, which is
not supported.

Since the calculation of the MAC keystream involves a single call into
the cipher, to which we have a handle already given that the CBC-MAC
calculation uses it as well, just calculate the MAC keystream directly,
and record it in the aead_req private context so we can apply it to the
MAC in crypto_ccm_auth_mac(). This greatly simplifies the scatterlist
manipulation, and no longer requires scatterlists to refer to buffers
that may live on the stack.

Signed-off-by: Ard Biesheuvel 
---

This is an alternative for the patch 'mac80211: aes_ccm: move struct
aead_req off the stack' that I sent out yesterday. IMO, this is a more
correct approach, since it addresses the problem directly in crypto/ccm.c,
which is the only CCM-AES driver that suffers from this issue.
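The patch relies on the CCM counter-block layout (RFC 3610): byte 0 of the IV holds L' = L - 1, and the last L bytes hold a big-endian counter. The memset() in crypto_ccm_init_crypt() zeroes that counter to form block A0, which is encrypted once to produce the keystream that masks the CBC-MAC; the payload then starts at counter 1, which is why a plain `iv[15] = 1` is enough right after the zeroing. The sketch below shows only that bookkeeping; the block cipher is not included, and ccm_set_counter() and xor_block() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write ctr big-endian into the last L bytes of a CCM counter block,
 * where L = L' + 1 and L' sits in the low three bits of block[0]. */
static void ccm_set_counter(uint8_t block[16], uint64_t ctr)
{
    unsigned int L = (block[0] & 0x07) + 1;

    for (unsigned int i = 0; i < L; i++) {
        block[15 - i] = (uint8_t)(ctr & 0xff);
        ctr >>= 8;
    }
}

/* XOR a keystream block into the tag, i.e. what
 * crypto_xor(odata, pctx->cmac, 16) does in the patch once the
 * CBC-MAC has been computed. */
static void xor_block(uint8_t *dst, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] ^= src[i];
}
```

Counter 0 gives A0 (the tag keystream), counter 1 is the first payload block, and the nonce bytes between the flags byte and the counter are never touched.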

 crypto/ccm.c | 55 +++-
 1 file changed, 29 insertions(+), 26 deletions(-)

diff --git a/crypto/ccm.c b/crypto/ccm.c
index 006d8575ef5c..faa5efcf59e2 100644
--- a/crypto/ccm.c
+++ b/crypto/ccm.c
@@ -46,10 +46,13 @@ struct crypto_ccm_req_priv_ctx {
u8 odata[16];
u8 idata[16];
u8 auth_tag[16];
+   u8 cmac[16];
u32 ilen;
u32 flags;
-   struct scatterlist src[3];
-   struct scatterlist dst[3];
+   struct scatterlist *src;
+   struct scatterlist *dst;
+   struct scatterlist srcbuf[2];
+   struct scatterlist dstbuf[2];
struct skcipher_request skreq;
 };
 
@@ -280,6 +283,8 @@ static int crypto_ccm_auth(struct aead_request *req, struct scatterlist *plain,
if (cryptlen)
get_data_to_compute(cipher, pctx, plain, cryptlen);
 
+   crypto_xor(odata, pctx->cmac, 16);
+
 out:
return err;
 }
@@ -307,10 +312,12 @@ static inline int crypto_ccm_check_iv(const u8 *iv)
return 0;
 }
 
-static int crypto_ccm_init_crypt(struct aead_request *req, u8 *tag)
+static int crypto_ccm_init_crypt(struct aead_request *req)
 {
+   struct crypto_aead *aead = crypto_aead_reqtfm(req);
+   struct crypto_ccm_ctx *ctx = crypto_aead_ctx(aead);
struct crypto_ccm_req_priv_ctx *pctx = crypto_ccm_reqctx(req);
-   struct scatterlist *sg;
+   struct crypto_cipher *cipher = ctx->cipher;
u8 *iv = req->iv;
int err;
 
@@ -325,19 +332,16 @@ static int crypto_ccm_init_crypt(struct aead_request *req, u8 *tag)
 */
memset(iv + 15 - iv[0], 0, iv[0] + 1);
 
-   sg_init_table(pctx->src, 3);
-   sg_set_buf(pctx->src, tag, 16);
-   sg = scatterwalk_ffwd(pctx->src + 1, req->src, req->assoclen);
-   if (sg != pctx->src + 1)
-   sg_chain(pctx->src, 2, sg);
+   /* prepare the key stream for the auth tag  */
+   crypto_cipher_encrypt_one(cipher, pctx->cmac, iv);
 
-   if (req->src != req->dst) {
-   sg_init_table(pctx->dst, 3);
-   sg_set_buf(pctx->dst, tag, 16);
-   sg = scatterwalk_ffwd(pctx->dst + 1, req->dst, req->assoclen);
-   if (sg != pctx->dst + 1)
-   sg_chain(pctx->dst, 2, sg);
-   }
+   /* increment BE counter in IV[] for the actual payload */
+   iv[15] = 1;
+
+   pctx->src = scatterwalk_ffwd(pctx->srcbuf, req->src, req->assoclen);
+   if (req->src != req->dst)
+   pctx->dst = scatterwalk_ffwd(pctx->dstbuf, req->dst,
+req->assoclen);
 
return 0;
 }
@@ -354,11 +358,11 @@ static int crypto_ccm_encrypt(struct aead_request *req)
u8 *iv = req->iv;
int err;
 
-   err = crypto_ccm_init_crypt(req, odata);
+   err = crypto_ccm_init_crypt(req);
if (err)
return err;
 
-   err = crypto_ccm_auth(req, sg_next(pctx->src), cryptlen);
+   err = crypto_ccm_auth(req, pctx->src, cryptlen);
if (err)
return err;
 
@@ -369,13 +373,13 @@ static int crypto_ccm_encrypt(struct aead_request *req)
skcipher_request_set_tfm(skreq, ctx->ctr);
skcipher_request_set_callback(skreq, pctx->flags,
  crypto_ccm_encrypt_done, req);
-   skcipher_request_set_crypt(skreq, pctx->src, dst, cryptlen + 16, iv);
+   skcipher_request_set_crypt(skreq, pctx->src, dst, cryptlen, iv);
err = crypto_skcipher_encrypt(skreq);
if (err)
return err;
 
/* copy authtag to end of dst */
-   scatterwalk_map_and_copy(odata, sg_next(dst), cryptlen,
+   scatterwalk_map_and_copy(odata, dst, cryptlen,
 crypto_aead_authsize(aead), 1);
return err;
 }
@@ -392,7 +396,7 @@ static void crypto_ccm_decrypt_do

Re: [PATCH] net: limit a number of namespaces which can be cleaned up concurrently

2016-10-15 Thread Eric W. Biederman

Andrei Vagin  writes:

> On Thu, Oct 13, 2016 at 10:06:28PM -0500, Eric W. Biederman wrote:
>> Andrei Vagin  writes:
>> 
>> > On Thu, Oct 13, 2016 at 10:49:38AM -0500, Eric W. Biederman wrote:
>> >> Andrei Vagin  writes:
>> >> 
>> >> > From: Andrey Vagin 
>> >> >
>> >> > The operation of destroying netns is heavy and it is executed under
>> >> > net_mutex. If many namespaces are destroyed concurrently, net_mutex can
>> >> > be locked for a long time. It is impossible to create a new netns during
>> >> > this period of time.
>> >> 
>> >> This may be the right approach or at least the right approach to bound
>> >> net_mutex hold times but I have to take exception to calling network
>> >> namespace cleanup heavy.
>> >> 
>> >> The only particularly time consuming operations I have ever found are
>> >> calls to synchronize_rcu/synchronize_sched/synchronize_net.
>> >
>> > I booted the kernel with maxcpus=1, in which case these functions work
>> > very fast, and the problem is there anyway.
>> >
>> > According to perf, we spend a lot of time in kobject_uevent:
>> >
>> > -   99.96% 0.00%  kworker/u4:1 [kernel.kallsyms]  [k] unregister_netdevice_many
>> >- unregister_netdevice_many
>> >   - 99.95% rollback_registered_many
>> >  - 99.64% netdev_unregister_kobject
>> > - 33.43% netdev_queue_update_kobjects
>> >- 33.40% kobject_put
>> >   - kobject_release
>> >  + 33.37% kobject_uevent
>> >  + 0.03% kobject_del
>> >+ 0.03% sysfs_remove_group
>> > - 33.13% net_rx_queue_update_kobjects
>> >- kobject_put
>> >- kobject_release
>> >   + 33.11% kobject_uevent
>> >   + 0.01% kobject_del
>> > 0.00% rx_queue_release
>> > - 33.08% device_del
>> >+ 32.75% kobject_uevent
>> >+ 0.17% device_remove_attrs
>> >+ 0.07% dpm_sysfs_remove
>> >+ 0.04% device_remove_class_symlinks
>> >+ 0.01% kobject_del
>> >+ 0.01% device_pm_remove
>> >+ 0.01% sysfs_remove_file_ns
>> >+ 0.00% klist_del
>> >+ 0.00% driver_deferred_probe_del
>> >  0.00% cleanup_glue_dir.isra.14.part.15
>> >  0.00% to_acpi_device_node
>> >  0.00% sysfs_remove_group
>> >   0.00% klist_del
>> >   0.00% device_remove_attrs
>> >  + 0.26% call_netdevice_notifiers_info
>> >  + 0.04% rtmsg_ifinfo_build_skb
>> >  + 0.01% rtmsg_ifinfo_send
>> > 0.00% dev_uc_flush
>> > 0.00% netif_reset_xps_queues_gt
>> >
>> > Someone can listen to these uevents, so we can't stop sending them without
>> > breaking backward compatibility. We can try to optimize
>> > kobject_uevent...
>> 
>> Oh that is a surprise.  We can definitely skip generating uevents for
>> network namespaces that are exiting because by definition no one can see
>> those network namespaces.  If a socket existed that could see those
>> uevents it would hold a reference to the network namespace and as such
>> the network namespace could not exit.
>> 
>> That sounds like it is worth investigating a little more deeply.
>> 
>> I am surprised that allocation and freeing is so heavy that we are spending
>> lots of time doing that.  On the other hand kobj_bcast_filter is very
>> dumb and very late so I expect something can be moved earlier and make
>> that code cheaper with the tiniest bit of work.
>> 
>
> I'm sorry, I've collected this data for a kernel with debug options
> (DEBUG_SPINLOCK, PROVE_LOCKING, DEBUG_LIST, etc). If a kernel is
> compiled without debug options, kobject_uevent becomes less expensive,
> but still expensive.
>
> -   98.64% 0.00%  kworker/u4:2  [kernel.kallsyms][k] cleanup_net
>- cleanup_net
>   - 98.54% ops_exit_list.isra.4
>  - 60.48% default_device_exit_batch
> - 60.40% unregister_netdevice_many
>- rollback_registered_many
>   - 59.82% netdev_unregister_kobject
>  - 20.10% device_del
> + 19.44% kobject_uevent
> + 0.40% device_remove_attrs
> + 0.17% dpm_sysfs_remove
> + 0.04% device_remove_class_symlinks
> + 0.04% kobject_del
> + 0.01% device_pm_remove
> + 0.01% sysfs_remove_file_ns
>  - 19.89% netdev_queue_update_kobjects
> + 19.81% kobject_put
> + 0.07% sysfs_remove_group
>  - 19.79% net_rx_queue_update_kobjects
>   kobject_put
> - kobject_release
>+ 19.77% kobject_uevent
>+ 0.02% kobject_d

Re: [patch net-next RFC 4/6] Introduce sample tc action

2016-10-15 Thread Roopa Prabhu

On 10/12/16, 5:41 AM, Jiri Pirko wrote:
> From: Yotam Gigi 
>
> This action allows the user to sample traffic matched by a tc classifier.
> The sampling consists of choosing packets randomly, truncating them,
> adding some informative metadata regarding the interface and the original
> packet size, and marking them with a specific mark, to allow further tc rules to
> match and process. The marked sample packets are then injected into the
> device ingress qdisc using netif_receive_skb.
>
> The packet metadata is packed using the ife encapsulation protocol, and
> the outer packet's ethernet dest, source and eth_type, along with the
> rate, mark and the optional truncation size can be configured from
> userspace.
>
> Example:
> To sample ingress traffic from interface eth1 and redirect the sampled
> packets to interface dummy0, one may use the commands:
>
> tc qdisc add dev eth1 handle : ingress
>
> tc filter add dev eth1 parent : \
>  matchall action sample rate 12 mark 17
>
> tc filter add parent : dev eth1 protocol all \
>  u32 match mark 17 0xff \
>  action mirred egress redirect dev dummy0
>
> Here the first command adds an ingress qdisc and the second starts
> sampling every 12th packet on dev eth1 and marks the sampled packets with
> 17. The third command catches the sampled packets, which are marked with
> 17, and redirects them to dev dummy0.
>
> Signed-off-by: Yotam Gigi 
> Signed-off-by: Jiri Pirko 
channeling some feedback from Peter Phaal @sflow inline below:

> ---
>  
> diff --git a/include/net/tc_act/tc_sample.h b/include/net/tc_act/tc_sample.h
> new file mode 100644
> index 000..a2b445a
> --- /dev/null
> +++ b/include/net/tc_act/tc_sample.h
> @@ -0,0 +1,88 @@
> +#ifndef __NET_TC_SAMPLE_H
> +#define __NET_TC_SAMPLE_H
> +
> +#include 
> +#include 
> +
> +struct tcf_sample {
> + struct tc_action common;
> + u32 rate;
> + u32 mark;
> + bool truncate;
> + u32 trunc_size;
> + u32 packet_counter;
> + u8  eth_dst[ETH_ALEN];
> + u8  eth_src[ETH_ALEN];
> + u16 eth_type;
> + bool eth_type_set;
> + struct list_head tcfm_list;
> +};
> +#define to_sample(a) ((struct tcf_sample *)a)
> +
> +struct sample_packet_metadata {
> + int sample_size;
> + int orig_size;
> + int ifindex;
> +};
> +
This metadata does not look extensible. Can it be made so?

With sflow in context, you need a pair of ifindex numbers to encode ingress and 
egress ports. Ideally you would also include a sequence number and a count of 
the total number of packets that were candidates for sampling. The OVS 
implementation is a good example; its metadata includes all the actions applied
to the packet in the kernel data path.
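To make the sFlow suggestion concrete, a sampler that tracks the candidate count (the sample pool) and a per-sample sequence number alongside the rate might look like the sketch below. It is hypothetical: struct sampler and sampler_consider() are not part of the patch, and xorshift32 is used only to keep the example self-contained where kernel code would use its own random source. With rate = 12 each candidate is sampled with probability 1/12, and the pool count lets a collector scale sample counts back up to totals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-action sampler state. */
struct sampler {
    uint32_t rate; /* sample on average 1 in `rate` packets */
    uint64_t pool; /* packets that were candidates for sampling */
    uint32_t seq;  /* sequence number of the last emitted sample */
    uint32_t rng;  /* xorshift32 state; must be non-zero */
};

/* xorshift32 PRNG, only to keep the sketch self-contained. */
static uint32_t xorshift32(uint32_t *s)
{
    uint32_t x = *s;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *s = x;
    return x;
}

/* Count the packet as a candidate; return true if it is sampled, in
 * which case *seq_out receives the sample's sequence number. */
static bool sampler_consider(struct sampler *s, uint32_t *seq_out)
{
    assert(s->rate > 0 && s->rng != 0);
    s->pool++;
    if (xorshift32(&s->rng) % s->rate != 0)
        return false;
    *seq_out = ++s->seq;
    return true;
}
```

Gaps in the sequence numbers seen by a collector then indicate lost samples, independently of the sampling rate.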



[snip]

> diff --git a/include/uapi/linux/tc_act/tc_sample.h 
> b/include/uapi/linux/tc_act/tc_sample.h
> new file mode 100644
> index 000..654945b
> --- /dev/null
> +++ b/include/uapi/linux/tc_act/tc_sample.h
> @@ -0,0 +1,31 @@
> +#ifndef __LINUX_TC_SAMPLE_H
> +#define __LINUX_TC_SAMPLE_H
> +
> +#include 
> +#include 
> +#include 
> +
> +#define TCA_ACT_SAMPLE 26
> +
> +struct tc_sample {
> + tc_gen;
> + __u32   rate;   /* sample rate */
> + __u32   mark;   /* mark to put on the sampled packets */
> + bool    truncate;   /* whether to truncate the packets */
> + __u32   trunc_size; /* truncation size */
> + __u8    eth_dst[ETH_ALEN]; /* encapsulated mac destination */
> + __u8    eth_src[ETH_ALEN]; /* encapsulated mac source */
> + bool    eth_type_set;  /* whether to override ethtype */
> + __u16   eth_type;  /* encapsulated mac ethtype */
> +};
> +
this does not look extensible and is part of the UAPI.

Doing the minimum in the kernel and leaving the rest to the user space agent is 
much more flexible. The user space agent can attach additional metadata and 
offer more flexibility in forwarding (sFlow uses XDR encoding over UDP and is 
routable over IPv4/IPv6).
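One common way to make such metadata extensible is a type-length-value layout, as netlink attributes do: new fields get new type numbers, and an older parser skips types it does not recognize. The sketch below is hypothetical (the SMD_* types and tlv_* helpers are illustrative, not part of the patch) and, unlike netlink, does no padding or alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical metadata field types. */
enum {
    SMD_IN_IFINDEX  = 1,
    SMD_OUT_IFINDEX = 2,
    SMD_ORIG_SIZE   = 3,
    SMD_SEQ         = 4,
};

struct tlv_hdr {
    uint16_t type;
    uint16_t len; /* length of the value only, in bytes */
};

/* Append one u32 field at offset off; return the new offset,
 * or 0 if it does not fit in a buffer of cap bytes. */
static size_t tlv_put_u32(uint8_t *buf, size_t off, size_t cap,
                          uint16_t type, uint32_t val)
{
    struct tlv_hdr h = { type, sizeof(val) };

    if (off > cap || cap - off < sizeof(h) + sizeof(val))
        return 0;
    memcpy(buf + off, &h, sizeof(h));
    memcpy(buf + off + sizeof(h), &val, sizeof(val));
    return off + sizeof(h) + sizeof(val);
}

/* Find a u32 field by type, skipping unknown ones; 0 on success. */
static int tlv_get_u32(const uint8_t *buf, size_t len, uint16_t type,
                       uint32_t *val)
{
    size_t off = 0;

    while (len - off >= sizeof(struct tlv_hdr)) {
        struct tlv_hdr h;

        memcpy(&h, buf + off, sizeof(h));
        off += sizeof(h);
        if (h.len > len - off)
            return -1; /* truncated field */
        if (h.type == type && h.len == sizeof(*val)) {
            memcpy(val, buf + off, sizeof(*val));
            return 0;
        }
        off += h.len; /* skip fields this parser does not understand */
    }
    return -1;
}
```

A field type unknown to an older parser (99 in the test below) is simply stepped over, which is the property a fixed UAPI struct lacks.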



> +enum {
> + TCA_SAMPLE_UNSPEC,
> + TCA_SAMPLE_TM,
> + TCA_SAMPLE_PARMS,
> + TCA_SAMPLE_PAD,
> + __TCA_SAMPLE_MAX
> +};
> +#define TCA_SAMPLE_MAX (__TCA_SAMPLE_MAX - 1)
> +
> +#endif
> diff --git a/net/sched/Kconfig b/net/sched/Kconfig
> index 24f7cac..c54ea6b 100644
> --- a/net/sched/Kconfig
> +++ b/net/sched/Kconfig
> @@ -650,6 +650,19 @@ config NET_ACT_MIRRED
> To compile this code as a module, choose M here: the
> module will be called act_mirred.
>  
> +config NET_ACT_SAMPLE
> +tristate "Traffic Sampling"
> +depends on NET_CLS_ACT
> +select NET_IFE
> +---help---
> +   Say Y here to allow packet sampling tc action. The packet sample
> +   action consi

[PATCH] ipvlan: constify l3mdev_ops structure

2016-10-15 Thread Julia Lawall

This l3mdev_ops structure is only stored in the l3mdev_ops field of a
net_device structure.  This field is declared const, so the l3mdev_ops
structure can be declared as const also.  Additionally drop the
__read_mostly annotation.

The semantic patch that adds const is as follows:
(http://coccinelle.lip6.fr/)

// 
@r disable optional_qualifier@
identifier i;
position p;
@@
static struct l3mdev_ops i@p = { ... };

@ok@
identifier r.i;
struct net_device *e;
position p;
@@
e->l3mdev_ops = &i@p;

@bad@
position p != {r.p,ok.p};
identifier r.i;
struct l3mdev_ops e;
@@
e@i@p

@depends on !bad disable optional_qualifier@
identifier r.i;
@@
static
+const
 struct l3mdev_ops i = { ... };
// 

The effect on the layout of the .o file is shown by the following output
of the size command, first before then after the transformation:

   text    data     bss     dec     hex filename
   7364     466      52    7882    1eca drivers/net/ipvlan/ipvlan_main.o
   7412     434      52    7898    1eda drivers/net/ipvlan/ipvlan_main.o

Signed-off-by: Julia Lawall 

---
 drivers/net/ipvlan/ipvlan_main.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -u -p a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -26,7 +26,7 @@ static struct nf_hook_ops ipvl_nfops[] _
},
 };
 
-static struct l3mdev_ops ipvl_l3mdev_ops __read_mostly = {
+static const struct l3mdev_ops ipvl_l3mdev_ops = {
.l3mdev_l3_rcv = ipvlan_l3_rcv,
 };
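The pattern the semantic patch looks for can be shown in a few lines of plain C. In this sketch, demo_l3mdev_ops and demo_dev are hypothetical stand-ins for struct l3mdev_ops and struct net_device: because the device only stores a pointer to const, the table it points at can itself be declared const, which lets the compiler place it in read-only data and turns any accidental write into a compile error. size(1) counts read-only data under text, which fits the data column shrinking in the figures above.

```c
#include <assert.h>
#include <stddef.h>

struct pkt; /* opaque, for illustration only */

/* Hypothetical stand-ins for struct l3mdev_ops and struct net_device. */
struct demo_l3mdev_ops {
    int (*l3_rcv)(struct pkt *p);
};

struct demo_dev {
    const struct demo_l3mdev_ops *l3mdev_ops; /* pointer-to-const field */
};

static int demo_l3_rcv(struct pkt *p)
{
    (void)p;
    return 1;
}

/* Only ever stored in a const pointer, so the table can be const too. */
static const struct demo_l3mdev_ops demo_ops = {
    .l3_rcv = demo_l3_rcv,
};

static void demo_setup(struct demo_dev *dev)
{
    dev->l3mdev_ops = &demo_ops;
}
```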

[PATCH net] net: pktgen: remove rcu locking in pktgen_change_name()

2016-10-15 Thread Eric Dumazet

From: Eric Dumazet 

After Jesper's commit back in linux-3.18, we trigger a lockdep
splat in proc_create_data() while allocating memory from
pktgen_change_name().

This patch converts t->if_lock to a mutex, since it is now only
used from the control path, and adds proper locking to pktgen_change_name():

1) pktgen_thread_lock to protect the outer loop (iterating threads)
2) t->if_lock to protect the inner loop (iterating devices)

Note that before Jesper's patch, pktgen_change_name() was lacking proper
protection, but lockdep was not able to detect the problem.

Fixes: 8788370a1d4b ("pktgen: RCU-ify "if_list" to remove lock in next_to_run()")
Reported-by: John Sperbeck 
Signed-off-by: Eric Dumazet 
Cc: Jesper Dangaard Brouer 
---
 net/core/pktgen.c |   17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 5219a9e2127a..306b8f0e03c1 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -216,8 +216,8 @@
 #define M_QUEUE_XMIT   2   /* Inject packet into qdisc */
 
 /* If lock -- protects updating of if_list */
-#define   if_lock(t)   spin_lock(&(t->if_lock));
-#define   if_unlock(t)   spin_unlock(&(t->if_lock));
+#define   if_lock(t)   mutex_lock(&(t->if_lock));
+#define   if_unlock(t)   mutex_unlock(&(t->if_lock));
 
 /* Used to help with determining the pkts on receive */
 #define PKTGEN_MAGIC 0xbe9be955
@@ -423,7 +423,7 @@ struct pktgen_net {
 };
 
 struct pktgen_thread {
-   spinlock_t if_lock; /* for list of devices */
+   struct mutex if_lock;   /* for list of devices */
struct list_head if_list;   /* All device here */
struct list_head th_list;
struct task_struct *tsk;
@@ -2010,11 +2010,13 @@ static void pktgen_change_name(const struct pktgen_net *pn, struct net_device *d
 {
struct pktgen_thread *t;
 
+   mutex_lock(&pktgen_thread_lock);
+
list_for_each_entry(t, &pn->pktgen_threads, th_list) {
struct pktgen_dev *pkt_dev;
 
-   rcu_read_lock();
-   list_for_each_entry_rcu(pkt_dev, &t->if_list, list) {
+   if_lock(t);
+   list_for_each_entry(pkt_dev, &t->if_list, list) {
if (pkt_dev->odev != dev)
continue;
 
@@ -2029,8 +2031,9 @@ static void pktgen_change_name(const struct pktgen_net *pn, struct net_device *d
   dev->name);
break;
}
-   rcu_read_unlock();
+   if_unlock(t);
}
+   mutex_unlock(&pktgen_thread_lock);
 }
 
 static int pktgen_device_event(struct notifier_block *unused,
@@ -3762,7 +3765,7 @@ static int __net_init pktgen_create_thread(int cpu, struct pktgen_net *pn)
return -ENOMEM;
}
 
-   spin_lock_init(&t->if_lock);
+   mutex_init(&t->if_lock);
t->cpu = cpu;
 
INIT_LIST_HEAD(&t->if_list);
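The locking scheme in this patch (a global mutex for the thread list, then a per-thread mutex for each device list) can be modeled in userspace with pthreads. This is a hypothetical sketch, not pktgen code: pg_thread, pg_dev and change_name() are illustrative names, and malloc() stands in for the allocation inside proc_create_data(), which may sleep and therefore must not run under a spinlock or inside an RCU read-side critical section.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct pg_dev {
    const void *odev;     /* device this entry is bound to */
    char *name;           /* heap-allocated, or NULL */
    struct pg_dev *next;
};

struct pg_thread {
    pthread_mutex_t if_lock; /* protects if_list */
    struct pg_dev *if_list;
    struct pg_thread *next;
};

static pthread_mutex_t thread_lock = PTHREAD_MUTEX_INITIALIZER;

/* Rename the entry bound to odev in every thread; returns how many
 * entries were renamed. Lock order: thread_lock, then t->if_lock. */
static int change_name(struct pg_thread *threads, const void *odev,
                       const char *newname)
{
    int renamed = 0;

    pthread_mutex_lock(&thread_lock);            /* outer: thread list */
    for (struct pg_thread *t = threads; t; t = t->next) {
        pthread_mutex_lock(&t->if_lock);         /* inner: device list */
        for (struct pg_dev *d = t->if_list; d; d = d->next) {
            if (d->odev != odev)
                continue;
            /* may block; allowed because both locks are mutexes */
            char *copy = malloc(strlen(newname) + 1);
            if (copy) {
                strcpy(copy, newname);
                free(d->name);
                d->name = copy;
                renamed++;
            }
            break; /* as in pktgen, at most one match per thread */
        }
        pthread_mutex_unlock(&t->if_lock);
    }
    pthread_mutex_unlock(&thread_lock);
    return renamed;
}
```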

Re: [PATCH] ethtool: Zero memory allocated for statistics

2016-10-15 Thread David Miller

From: Vlad Tsyrklevich 
Date: Sat, 15 Oct 2016 15:11:08 +

> I agree that we should propagate those errors and I'll prepare a new change
> to do so for phy_driver.get_stats(), ethtool_ops.self_test(), and
> ethtool_ops.get_ethtool_stats(). However, I still think this change should
> be adopted. 3/5 of the cases here are reachable without any special
> capabilities and programming defensively at the ethtool interface can
> eliminate an entire class of potential driver bugs instead of fixing them
> one by one. For example, get_eeprom() propagates errors but with a brief
> grep I found that qlcnic_get_eeprom() will return 0 incorrectly even though
> it read nothing for some NICs. Deeper bugs are undoubtedly lying around.

I'm all for defensive programming when practical.

But statistics gathering is highly performance sensitive for many
important use cases, so I'm not ready to add a whole bzero() here
unless absolutely, positively, necessary.

Thanks.

Re: linux/atm_zatm.h not really usable in userspace since cf00713a655d3019be7faa184402f16c43a0fed3

2016-10-15 Thread Pascal Terjan

On 15 October 2016 at 16:10, Mikko Rapeli  wrote:
> On Sat, Oct 15, 2016 at 03:33:22PM +0100, Pascal Terjan wrote:
>> On 15 October 2016 at 15:09, Mikko Rapeli  wrote:
>> > On Sat, Oct 15, 2016 at 01:05:10PM +0100, Pascal Terjan wrote:
>> >> It is no longer possible to include  + userspace
>> >> headers using time, for example  , this broke for example
>> >> the build of linux-atm.
>> >>
>> >> Reproducer:
>> >>
>> >> $ cat test.c
>> >> #include 
>> >> #include 
>> >
>> > If possible, please reverse the order of includes to first include glibc
>> > headers and then Linux kernel uapi ones.
>>
>> That was what I tried first but this didn't help:
>>
>> In file included from /usr/include/linux/atm_zatm.h:17:0,
>>  from test.c:2:
>> /usr/include/linux/time.h:9:8: error: redefinition of 'struct timespec'
>>  struct timespec {
>> ^
>> In file included from /usr/include/sys/select.h:43:0,
>>  from /usr/include/sys/types.h:219,
>>  from /usr/include/stdlib.h:314,
>>  from test.c:1:
>> /usr/include/time.h:120:8: note: originally defined here
>>  struct timespec
>> ^
>> In file included from /usr/include/linux/atm_zatm.h:17:0,
>>  from test.c:2:
>> /usr/include/linux/time.h:15:8: error: redefinition of 'struct timeval'
>>  struct timeval {
>> ^
>> In file included from /usr/include/sys/select.h:45:0,
>>  from /usr/include/sys/types.h:219,
>>  from /usr/include/stdlib.h:314,
>>  from test.c:1:
>> /usr/include/bits/time.h:30:8: note: originally defined here
>>  struct timeval
>> ^
>> > Kernel uapi headers did not declare their header file dependencies correctly
>> > and I've been fixing them. I have also tried to fix compatibility issues
>> > with glibc headers, but unfortunately they only work when glibc headers
>> > are included before kernel headers. Userspace which has been relying on
>> > the magic include order for various uapi headers is now unfortunately
>> > affected. Sorry about that.
>>
>> In this case no order works; it seems the kernel doesn't handle it in
>> time.h, unlike many other headers.
>
> Ok, then https://patchwork.kernel.org/patch/9294305/ hasn't been applied yet.
> You can apply that or revert cf00713a655d3019be7faa184402f16c43a0fed3
> for the time being.

Ah thanks, I'll take that patch :)

> It's a bit tricky to push through changes touching uapi headers for various
> kernel subsystems since they may get applied in a different order and at different times.

Yeah I can imagine, thanks for doing it

Re: [PATCH] ethtool: Zero memory allocated for statistics

2016-10-15 Thread Vlad Tsyrklevich

I agree that we should propagate those errors and I'll prepare a new
change to do so for phy_driver.get_stats(), ethtool_ops.self_test(),
and ethtool_ops.get_ethtool_stats(). However, I still think this
change should be adopted. 3/5 of the cases here are reachable without
any special capabilities and programming defensively at the ethtool
interface can eliminate an entire class of potential driver bugs
instead of fixing them one by one. For example, get_eeprom()
propagates errors but with a brief grep I found that
qlcnic_get_eeprom() will return 0 incorrectly even though it read
nothing for some NICs. Deeper bugs are undoubtedly lying around.

On Sat, Oct 15, 2016 at 5:11 PM, Vlad Tsyrklevich  wrote:
> I agree that we should propagate those errors and I'll prepare a new change
> to do so for phy_driver.get_stats(), ethtool_ops.self_test(), and
> ethtool_ops.get_ethtool_stats(). However, I still think this change should
> be adopted. 3/5 of the cases here are reachable without any special
> capabilities and programming defensively at the ethtool interface can
> eliminate an entire class of potential driver bugs instead of fixing them
> one by one. For example, get_eeprom() propagates errors but with a brief
> grep I found that qlcnic_get_eeprom() will return 0 incorrectly even though
> it read nothing for some NICs. Deeper bugs are undoubtedly lying around.
>
> On Sat, Oct 15, 2016 at 3:21 AM David Miller  wrote:
>>
>> From: Vlad Tsyrklevich 
>> Date: Fri, 14 Oct 2016 11:59:18 +0200
>>
>> > enic_get_ethtool_stats()
>>
>> Looking merely at this shows the real problem.
>>
>> We don't propagate and handle errors for this method.
>>
>> And that's what we should fix, making the get_ethtool_stats() method
>> return an integer error.
>>
>> Then ethtool_get_stats() should return any non-zero value provided by
>> ops->get_ethtool_stats() and not attempt to copy any bytes of 'data'
>> to userspace in that case.
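David's proposal can be sketched as follows: the driver hook returns an error code, and the core copies the statistics out only on success. Everything here is hypothetical (demo_ethtool_ops, demo_get_stats() and the two sample drivers); memcpy() into out stands in for copy_to_user(). Vlad's alternative would instead replace malloc() with calloc(), so that a driver which silently fills nothing cannot leak stale heap contents, at the cost of zeroing the buffer on every read.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical hook in the proposed shape: 0 or a negative errno. */
struct demo_ethtool_ops {
    int (*get_ethtool_stats)(void *dev, uint64_t *data, size_t n);
};

/* Core side: allocate, call the driver, copy out only on success. */
static int demo_get_stats(const struct demo_ethtool_ops *ops, void *dev,
                          uint64_t *out, size_t n)
{
    uint64_t *data = malloc(n * sizeof(*data)); /* not zeroed */
    int err;

    if (!data)
        return -ENOMEM;
    err = ops->get_ethtool_stats(dev, data, n);
    if (err == 0)
        memcpy(out, data, n * sizeof(*data));
    free(data);
    return err;
}

static int demo_stats_ok(void *dev, uint64_t *data, size_t n)
{
    (void)dev;
    for (size_t i = 0; i < n; i++)
        data[i] = i + 1;
    return 0;
}

static int demo_stats_fail(void *dev, uint64_t *data, size_t n)
{
    (void)dev;
    (void)data;
    (void)n;
    return -EIO; /* e.g. firmware did not answer; data left untouched */
}
```

On the success path the error check costs one branch, whereas zeroing costs a full pass over the buffer on every statistics read.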

Re: linux/atm_zatm.h not really usable in userspace since cf00713a655d3019be7faa184402f16c43a0fed3

2016-10-15 Thread Mikko Rapeli

On Sat, Oct 15, 2016 at 03:33:22PM +0100, Pascal Terjan wrote:
> On 15 October 2016 at 15:09, Mikko Rapeli  wrote:
> > On Sat, Oct 15, 2016 at 01:05:10PM +0100, Pascal Terjan wrote:
> >> It is no longer possible to include  + userspace
> >> headers using time, for example  , this broke for example
> >> the build of linux-atm.
> >>
> >> Reproducer:
> >>
> >> $ cat test.c
> >> #include 
> >> #include 
> >
> > If possible, please reverse the order of includes to first include glibc
> > headers and then Linux kernel uapi ones.
> 
> That was what I tried first but this didn't help:
> 
> In file included from /usr/include/linux/atm_zatm.h:17:0,
>  from test.c:2:
> /usr/include/linux/time.h:9:8: error: redefinition of 'struct timespec'
>  struct timespec {
> ^
> In file included from /usr/include/sys/select.h:43:0,
>  from /usr/include/sys/types.h:219,
>  from /usr/include/stdlib.h:314,
>  from test.c:1:
> /usr/include/time.h:120:8: note: originally defined here
>  struct timespec
> ^
> In file included from /usr/include/linux/atm_zatm.h:17:0,
>  from test.c:2:
> /usr/include/linux/time.h:15:8: error: redefinition of 'struct timeval'
>  struct timeval {
> ^
> In file included from /usr/include/sys/select.h:45:0,
>  from /usr/include/sys/types.h:219,
>  from /usr/include/stdlib.h:314,
>  from test.c:1:
> /usr/include/bits/time.h:30:8: note: originally defined here
>  struct timeval
> ^
> > Kernel uapi headers did not declare their header file dependencies correctly
> > and I've been fixing them. I have also tried to fix compatibility issues
> > with glibc headers, but unfortunately they only work when glibc headers
> > are included before kernel headers. Userspace which has been relying on
> > the magic include order for various uapi headers is now unfortunately
> > affected. Sorry about that.
> 
> In this case no order works; it seems the kernel doesn't handle it in
> time.h, unlike many other headers.

Ok, then https://patchwork.kernel.org/patch/9294305/ hasn't been applied yet.
You can apply that or revert cf00713a655d3019be7faa184402f16c43a0fed3
for the time being.

It's a bit tricky to push through changes touching uapi headers for various
kernel subsystems since they may get applied in a different order and at different times.

-Mikko

Re: Need help with mdiobus_register and phy

2016-10-15 Thread Timur Tabi


Florian Fainelli wrote:

> After reading the spec again, it does not appear to me that a PHY
> with PDOWN set is guaranteed or even required to respond to other
> register reads such as MII_PHYID1/2, in which case we may have to
> implement a MDIO bus reset routine which clears PDOWN for all PHYs
> that we detect(ed), or as Andrew suggested, utilize the matching by
> compatible string with the PHY OUI in it.


The 8031 does respond normally when PDOWN is set.  However, the ID 
registers are not available when the SerDes bus is also powered down. 
I'll call this PDOWN+.  This is a special power-down sequence that the 
at803x driver does on suspend.  See my other email for details.
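Florian's suggested MDIO bus reset routine, which clears PDOWN on every responding PHY address before probing, might look roughly like the sketch below. The bus structure and accessors are hypothetical stand-ins for a real MDIO bus driver; only the register layout (BMCR at register 0, power-down in bit 11, value 0x0800) follows the MII definitions in linux/mii.h. Given Timur's reply, it is not clear that clearing BMCR PDOWN alone would be sufficient in the PDOWN+ case, where the SerDes is also powered down.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MII_BMCR     0x00   /* basic mode control register */
#define DEMO_BMCR_PDOWN   0x0800 /* power-down, bit 11 of BMCR */
#define DEMO_PHY_MAX_ADDR 32

/* Hypothetical bus with accessors; reads return < 0 on error. */
struct demo_mdio_bus {
    int (*read)(struct demo_mdio_bus *bus, int addr, int reg);
    int (*write)(struct demo_mdio_bus *bus, int addr, int reg, uint16_t val);
    uint16_t regs[DEMO_PHY_MAX_ADDR]; /* simulated BMCR per address */
};

/* Simulated accessors: 0xffff models an empty address (bus pull-up). */
static int demo_sim_read(struct demo_mdio_bus *bus, int addr, int reg)
{
    if (reg != DEMO_MII_BMCR || bus->regs[addr] == 0xffff)
        return -1;
    return bus->regs[addr];
}

static int demo_sim_write(struct demo_mdio_bus *bus, int addr, int reg,
                          uint16_t val)
{
    if (reg != DEMO_MII_BMCR)
        return -1;
    bus->regs[addr] = val;
    return 0;
}

/* Clear PDOWN on every address whose BMCR reads back successfully;
 * returns the number of PHYs that were powered up. */
static int demo_mdio_clear_pdown(struct demo_mdio_bus *bus)
{
    int cleared = 0;

    for (int addr = 0; addr < DEMO_PHY_MAX_ADDR; addr++) {
        int bmcr = bus->read(bus, addr, DEMO_MII_BMCR);

        if (bmcr < 0 || !(bmcr & DEMO_BMCR_PDOWN))
            continue;
        if (bus->write(bus, addr, DEMO_MII_BMCR,
                       (uint16_t)(bmcr & ~DEMO_BMCR_PDOWN)) == 0)
            cleared++;
    }
    return cleared;
}
```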


--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the
Code Aurora Forum, hosted by The Linux Foundation.

Re: linux/atm_zatm.h not really usable in userspace since cf00713a655d3019be7faa184402f16c43a0fed3

2016-10-15 Thread Pascal Terjan

On 15 October 2016 at 15:09, Mikko Rapeli  wrote:
> On Sat, Oct 15, 2016 at 01:05:10PM +0100, Pascal Terjan wrote:
>> It is no longer possible to include  + userspace
>> headers using time, for example  , this broke for example
>> the build of linux-atm.
>>
>> Reproducer:
>>
>> $ cat test.c
>> #include 
>> #include 
>
> If possible, please reverse the order of includes to first include glibc
> headers and then Linux kernel uapi ones.

That was what I tried first but this didn't help:

In file included from /usr/include/linux/atm_zatm.h:17:0,
 from test.c:2:
/usr/include/linux/time.h:9:8: error: redefinition of 'struct timespec'
 struct timespec {
^
In file included from /usr/include/sys/select.h:43:0,
 from /usr/include/sys/types.h:219,
 from /usr/include/stdlib.h:314,
 from test.c:1:
/usr/include/time.h:120:8: note: originally defined here
 struct timespec
^
In file included from /usr/include/linux/atm_zatm.h:17:0,
 from test.c:2:
/usr/include/linux/time.h:15:8: error: redefinition of 'struct timeval'
 struct timeval {
^
In file included from /usr/include/sys/select.h:45:0,
 from /usr/include/sys/types.h:219,
 from /usr/include/stdlib.h:314,
 from test.c:1:
/usr/include/bits/time.h:30:8: note: originally defined here
 struct timeval
^
> Kernel uapi headers did not declare their header file dependencies correctly
> and I've been fixing them. I have also tried to fix compatibility issues
> with glibc headers, but unfortunately they only work when glibc headers
> are included before kernel headers. Userspace which has been relying on
> the magic include order for various uapi headers is now unfortunately
> affected. Sorry about that.

In this case no order works; it seems the kernel doesn't handle it in
time.h, unlike many other headers.

Re: linux/atm_zatm.h not really usable in userspace since cf00713a655d3019be7faa184402f16c43a0fed3

2016-10-15 Thread Mikko Rapeli

On Sat, Oct 15, 2016 at 01:05:10PM +0100, Pascal Terjan wrote:
> It is no longer possible to include  + userspace
> headers using time, for example  , this broke for example
> the build of linux-atm.
> 
> Reproducer:
> 
> $ cat test.c
> #include 
> #include 

If possible, please reverse the order of includes to first include glibc
headers and then Linux kernel uapi ones.

Kernel uapi headers did not declare their header file dependencies correctly
and I've been fixing them. I have also tried to fix compatibility issues
with glibc headers, but unfortunately they only work when glibc headers
are included before kernel headers. Userspace which has been relying on
the magic include order for various uapi headers is now unfortunately
affected. Sorry about that.

-Mikko

> $ gcc -c test.c
> In file included from /usr/include/sys/select.h:43:0,
>  from /usr/include/sys/types.h:219,
>  from /usr/include/stdlib.h:314,
>  from test.c:2:
> /usr/include/time.h:120:8: error: redefinition of 'struct timespec'
>  struct timespec
> ^
> In file included from /usr/include/linux/atm_zatm.h:17:0,
>  from test.c:1:
> /usr/include/linux/time.h:9:8: note: originally defined here
>  struct timespec {
> ^
> In file included from /usr/include/sys/select.h:45:0,
>  from /usr/include/sys/types.h:219,
>  from /usr/include/stdlib.h:314,
>  from test.c:2:
> /usr/include/bits/time.h:30:8: error: redefinition of 'struct timeval'
>  struct timeval
> ^
> In file included from /usr/include/linux/atm_zatm.h:17:0,
>  from test.c:1:
> /usr/include/linux/time.h:15:8: note: originally defined here
>  struct timeval {
> ^

Re: userspace build broken by include changes

2016-10-15 Thread Mikko Rapeli

Hi,

On Sat, Oct 15, 2016 at 12:40:43PM +0100, Pascal Terjan wrote:
> rp-pppoe plugin of ppp no longer builds:
> 
> In file included from pppoe.h:87:0,
>  from plugin.c:29:
> /usr/include/linux/in.h:28:3: error: redeclaration of enumerator 'IPPROTO_IP'
>IPPROTO_IP = 0,  /* Dummy protocol for TCP  */
>^
> /usr/include/netinet/in.h:42:5: note: previous definition of
> 'IPPROTO_IP' was here
>  IPPROTO_IP = 0,/* Dummy protocol for TCP.  */
> 
> Short reproducer:
> 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> 
> Full log:
> http://pkgsubmit.mageia.org/autobuild/cauldron/x86_64/core/2016-10-12/ppp-2.4.7-8.mga6.src.rpm/build.0.20161012185227.log
> 
> Moving the include of linux/if.h after netinet/in.h fixes it.
> 
> I guess the breakage is caused by
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/include/uapi/linux?id=eafe92114308acf14e45c6c3d154a5dad5523d1a
> but the commit doesn't look wrong to me.

These kernel uapi headers did not declare their dependencies correctly,
and this patch fixed that. Unfortunately, many kernel uapi headers
conflict with glibc and other userspace headers, and userspace code often
relies on an include order that happened to work.

My patch series has compatibility changes so that kernel headers can be
included after glibc ones.

Unfortunately, I haven't had time to provide similar patches to glibc, so
things might break if kernel headers are included before glibc headers.

So the best I can do for now is to ask you to change the userspace include
order to first include glibc headers and then kernel uapi ones.

This is an unfortunate kernel header API break, sorry. ABIs are not affected,
though.

-Mikko

[RFC v2 2/2] proc connector: add a "get feature" op

2016-10-15 Thread Alban Crequy

From: Alban Crequy 

As more kinds of events are being added in the proc connector, userspace
needs a way to detect whether the kernel supports those new events.

When a kind of event is not supported, userspace should report an error
properly, or fall back to other methods (regular polling of procfs).

The events fork, exec, uid, gid, sid, ptrace, comm, exit were added
together. Then commit 2b5faa4c ("connector: Added coredumping event to
the process connector") added coredump events but without a way for
userspace to detect if the kernel will emit those. So I am grouping
them all together in PROC_CN_FEATURE_BASIC.

- PROC_CN_FEATURE_BASIC: supports fork, exec, uid, gid, sid, ptrace,
  comm, exit, coredump.

- PROC_CN_FEATURE_NS: supports ns.
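A userspace sketch of how a monitor might act on the PROC_CN_GET_FEATURES reply. It assumes the netlink socket setup and message parsing are done elsewhere, and that the caller has extracted msg->flags and ev->event_data.ack.err from the ack, which is where the diff below places the feature bits and the (positive) error value. A kernel without this patch leaves msg->flags at 0 ("not used"), so a missing bit is treated as "unsupported". The helper names and the fallback policy are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values proposed by this RFC for include/uapi/linux/cn_proc.h; redefined
 * here because released headers may not carry them. */
#define PROC_CN_FEATURE_BASIC 0x0001
#define PROC_CN_FEATURE_NS    0x0002

/* Hypothetical helper: test one feature bit in the ack's msg->flags. */
static bool cn_proc_has_feature(uint16_t ack_flags, uint16_t feature)
{
    return (ack_flags & feature) == feature;
}

enum ns_tracking { TRACK_VIA_CONNECTOR, TRACK_VIA_PROCFS_POLLING };

/* Hypothetical policy: use the connector only when the GET_FEATURES ack
 * carried no error and advertised namespace events; otherwise fall back
 * to regular polling of procfs, as the commit message suggests. */
static enum ns_tracking choose_ns_tracking(uint16_t ack_flags, int ack_err)
{
    if (ack_err != 0)
        return TRACK_VIA_PROCFS_POLLING;
    if (!cn_proc_has_feature(ack_flags, PROC_CN_FEATURE_NS))
        return TRACK_VIA_PROCFS_POLLING;
    return TRACK_VIA_CONNECTOR;
}
```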

Signed-off-by: Alban Crequy 
---
 drivers/connector/cn_proc.c  | 25 +++--
 include/uapi/linux/cn_proc.h |  4 
 2 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
index c38733d..5f9ace6 100644
--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -442,15 +442,12 @@ void proc_ns_connector_send(struct ns_event_prepare *prepare, struct task_struct
  * values because it's not being returned via syscall return
  * mechanisms.
  */
-static void cn_proc_ack(int err, int rcvd_seq, int rcvd_ack)
+static void cn_proc_ack(int err, u16 flags, int rcvd_seq, int rcvd_ack)
 {
struct cn_msg *msg;
struct proc_event *ev;
__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
 
-   if (atomic_read(&proc_event_num_listeners) < 1)
-   return;
-
msg = buffer_to_cn_msg(buffer);
ev = (struct proc_event *)msg->data;
memset(&ev->event_data, 0, sizeof(ev->event_data));
@@ -462,7 +459,7 @@ static void cn_proc_ack(int err, int rcvd_seq, int rcvd_ack)
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = rcvd_ack + 1;
msg->len = sizeof(*ev);
-   msg->flags = 0; /* not used */
+   msg->flags = flags;
send_msg(msg);
 }
 
@@ -475,9 +472,12 @@ static void cn_proc_mcast_ctl(struct cn_msg *msg,
 {
enum proc_cn_mcast_op *mc_op = NULL;
int err = 0;
+   u16 flags = 0;
 
-   if (msg->len != sizeof(*mc_op))
-   return;
+   if (msg->len != sizeof(*mc_op)) {
+   err = EINVAL;
+   goto out;
+   }
 
/* 
 * Events are reported with respect to the initial pid
@@ -485,8 +485,10 @@ static void cn_proc_mcast_ctl(struct cn_msg *msg,
 * other namespaces.
 */
if ((current_user_ns() != &init_user_ns) ||
-   (task_active_pid_ns(current) != &init_pid_ns))
-   return;
+   (task_active_pid_ns(current) != &init_pid_ns)) {
+   err = EPERM;
+   goto out;
+   }
 
/* Can only change if privileged. */
if (!__netlink_ns_capable(nsp, &init_user_ns, CAP_NET_ADMIN)) {
@@ -496,6 +498,9 @@ static void cn_proc_mcast_ctl(struct cn_msg *msg,
 
mc_op = (enum proc_cn_mcast_op *)msg->data;
switch (*mc_op) {
+   case PROC_CN_GET_FEATURES:
+   flags = PROC_CN_FEATURE_BASIC | PROC_CN_FEATURE_NS;
+   break;
case PROC_CN_MCAST_LISTEN:
atomic_inc(&proc_event_num_listeners);
break;
@@ -508,7 +513,7 @@ static void cn_proc_mcast_ctl(struct cn_msg *msg,
}
 
 out:
-   cn_proc_ack(err, msg->seq, msg->ack);
+   cn_proc_ack(err, flags, msg->seq, msg->ack);
 }
 
 /*
diff --git a/include/uapi/linux/cn_proc.h b/include/uapi/linux/cn_proc.h
index 3270e8c..2ea0e5d 100644
--- a/include/uapi/linux/cn_proc.h
+++ b/include/uapi/linux/cn_proc.h
@@ -25,10 +25,14 @@
  * for events on the connector.
  */
 enum proc_cn_mcast_op {
+   PROC_CN_GET_FEATURES = 0,
PROC_CN_MCAST_LISTEN = 1,
PROC_CN_MCAST_IGNORE = 2
 };
 
+#define PROC_CN_FEATURE_BASIC 0x0001
+#define PROC_CN_FEATURE_NS    0x0002
+
 /*
  * From the user's point of view, the process
  * ID is the thread group ID and thread ID is the internal
-- 
2.7.4

[RFC v2 1/2] proc connector: add namespace events

2016-10-15 Thread Alban Crequy

From: Alban Crequy 

The act of a process creating or joining a namespace via clone(),
unshare() or setns() is a useful signal for monitoring applications.

I am working on a monitoring application that keeps track of all the
containers and all processes inside each container. The current way of
doing it is by polling regularly in /proc for the list of processes and
in /proc/*/ns/* to know which namespaces they belong to. This is
inefficient on systems with a large number of containers and a large
number of processes.

Instead, I would inspect /proc only one time and get the updates with
the proc connector. Unfortunately, the proc connector gives me the list
of processes but does not notify me when a process changes namespaces.
So I would still need to inspect /proc/*/ns/*.

This patch adds namespace events for processes. It generates a namespace
event each time a process changes namespace via clone(), unshare() or
setns().

For example, the following command:
| # unshare -n -i -f ls -l /proc/self/ns/
| total 0
| lrwxrwxrwx 1 root root 0 Sep 25 22:31 cgroup -> 'cgroup:[4026531835]'
| lrwxrwxrwx 1 root root 0 Sep 25 22:31 ipc -> 'ipc:[4026532208]'
| lrwxrwxrwx 1 root root 0 Sep 25 22:31 mnt -> 'mnt:[4026531840]'
| lrwxrwxrwx 1 root root 0 Sep 25 22:31 net -> 'net:[4026532210]'
| lrwxrwxrwx 1 root root 0 Sep 25 22:31 pid -> 'pid:[4026531836]'
| lrwxrwxrwx 1 root root 0 Sep 25 22:31 user -> 'user:[4026531837]'
| lrwxrwxrwx 1 root root 0 Sep 25 22:31 uts -> 'uts:[4026531838]'

causes the proc connector to generate the following events:
| fork: ppid=691 pid=808
| exec: pid=808
| ns: pid=808 reason=unshare count=2
| type=ipc  4026531839 -> 4026532208
| type=net  4026531957 -> 4026532210
| fork: ppid=808 pid=809
| exec: pid=809
| exit: pid=809
| exit: pid=808
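A sketch of how a consumer could print one namespace item in the format of the example output above. The struct is a hypothetical userspace view: the real layout is defined in this patch's uapi/linux/cn_proc.h change, which is cut off in this archive. It assumes, as the visible part of proc_ns_connector_send() shows, that each item's type holds the CLONE_NEW* flag of the namespace that changed.

```c
#define _GNU_SOURCE             /* exposes CLONE_NEW* in <sched.h> */
#include <assert.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace view of one item; not the patch's uapi struct. */
struct ns_item_view {
    uint32_t type;      /* CLONE_NEW* flag of the namespace that changed */
    uint64_t old_inum;  /* namespace inode before the change */
    uint64_t inum;      /* namespace inode after the change */
};

static const char *ns_type_name(uint32_t type)
{
    switch (type) {
    case CLONE_NEWUSER: return "user";
    case CLONE_NEWUTS:  return "uts";
    case CLONE_NEWIPC:  return "ipc";
    case CLONE_NEWNS:   return "mnt";   /* mount namespaces use CLONE_NEWNS */
    case CLONE_NEWPID:  return "pid";
    case CLONE_NEWNET:  return "net";
#ifdef CLONE_NEWCGROUP                  /* missing from older glibc */
    case CLONE_NEWCGROUP: return "cgroup";
#endif
    default:            return "?";
    }
}

/* Format one item as "type=<name> <old> -> <new>". */
static int format_ns_item(char *buf, size_t len, const struct ns_item_view *it)
{
    return snprintf(buf, len, "type=%-4s %llu -> %llu",
                    ns_type_name(it->type),
                    (unsigned long long)it->old_inum,
                    (unsigned long long)it->inum);
}
```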

Signed-off-by: Alban Crequy 
---
 drivers/connector/cn_proc.c  | 138 +++
 include/linux/cn_proc.h      |  25 
 include/uapi/linux/cn_proc.h |  23 +++-
 kernel/fork.c                |  10 
 kernel/nsproxy.c             |   6 ++
 5 files changed, 201 insertions(+), 1 deletion(-)

diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
index a782ce8..c38733d 100644
--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -30,8 +30,13 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
 
 #include 
+#include 
 
 /*
  * Size of a cn_msg followed by a proc_event structure.  Since the
@@ -296,6 +301,139 @@ void proc_exit_connector(struct task_struct *task)
send_msg(msg);
 }
 
+void proc_ns_connector_prepare(struct ns_event_prepare *prepare, u16 reason)
+{
+   struct nsproxy *ns = current->nsproxy;
+   struct ns_common *mntns;
+
+   prepare->num_listeners = atomic_read(&proc_event_num_listeners);
+
+   if (prepare->num_listeners < 1)
+   return;
+
+   prepare->reason = reason;
+
+   prepare->user_inum = current->cred->user_ns->ns.inum;
+   prepare->uts_inum = ns->uts_ns->ns.inum;
+   prepare->ipc_inum = ns->ipc_ns->ns.inum;
+
+   mntns = mntns_operations.get(current);
+   if (mntns) {
+   prepare->mnt_inum = mntns->inum;
+   mntns_operations.put(mntns);
+   } else
+   prepare->mnt_inum = 0;
+
+   prepare->pid_inum = ns->pid_ns_for_children->ns.inum;
+   prepare->net_inum = ns->net_ns->ns.inum;
+   prepare->cgroup_inum = ns->cgroup_ns->ns.inum;
+}
+
+void proc_ns_connector_send(struct ns_event_prepare *prepare, struct task_struct *task)
+{
+   struct nsproxy *ns = task->nsproxy;
+   struct ns_common *mntns;
+   struct cn_msg *msg;
+   struct proc_event *ev;
+   __u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
+   int count;
+
+   if (prepare->num_listeners < 1)
+   return;
+
+   if (atomic_read(&proc_event_num_listeners) < 1)
+   return;
+
+   msg = buffer_to_cn_msg(buffer);
+   ev = (struct proc_event *)msg->data;
+   memset(&ev->event_data, 0, sizeof(ev->event_data));
+   ev->timestamp_ns = ktime_get_ns();
+   ev->what = PROC_EVENT_NS;
+
+   ev->event_data.ns.process_pid  = task->pid;
+   ev->event_data.ns.process_tgid = task->tgid;
+   ev->event_data.ns.reason = prepare->reason;
+   count = 0;
+
+   /* user */
+   if (prepare->user_inum != task->cred->user_ns->ns.inum) {
+   ev->event_data.ns.items[count].type = CLONE_NEWUSER;
+   ev->event_data.ns.items[count].flags = 0;
+   ev->event_data.ns.items[count].old_inum = prepare->user_inum;
+   ev->event_data.ns.items[count].inum = task->cred->user_ns->ns.inum;
+   count++;
+   }
+
+   /* uts */
+   if (prepare->uts_inum != ns->uts_ns->ns.inum) {
+   ev->event_data.ns.items[count].type = CLONE_NEWUTS;
+   ev->event_data.ns.items[count].flags = 0;
+   ev->event_data.ns.items[count].old_in

[RFC v2 0/2] proc connector: get namespace events

2016-10-15 Thread Alban Crequy

This is v2 of the patch set to add namespace events in the proc connector.

The act of a process creating or joining a namespace via clone(),
unshare() or setns() is a useful signal for monitoring applications.

I am working on a monitoring application that keeps track of all the
containers and all processes inside each container. The current way of
doing it is by polling regularly in /proc for the list of processes and
in /proc/*/ns/* to know which namespaces they belong to. This is
inefficient on systems with a large number of containers and a large
number of processes.

Instead, I would inspect /proc only one time and get the updates with
the proc connector. Unfortunately, the proc connector gives me the list
of processes but does not notify me when a process changes namespaces.
So I would still need to inspect /proc/*/ns/*.

 (1) Add namespace events for processes. It generates a namespace event each
 time a process changes namespace via clone(), unshare() or setns().

 (2) Add a way for userspace to detect if proc connector is able to send
 namespace events.


Changes since RFC-v1: https://lkml.org/lkml/2016/9/8/588

* Supports userns.

* The reason field says exactly whether it is clone/setns/unshare.

* Sends aggregated messages containing details of several namespaces
  changes. Suggested by Evgeniy Polyakov.

* Add patch 2 to detect if proc connector is able to send namespace events.
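The aggregation in patch 1 follows a snapshot-and-compare pattern: proc_ns_connector_prepare() records each namespace inode number before clone()/unshare()/setns(), and proc_ns_connector_send() compares them afterwards, adding one item per namespace that changed. Below is a plain userspace C sketch of that pattern; the enum, structs, and function are hypothetical and only mirror the idea, not the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical namespace kinds, in the order proc_ns_connector_prepare()
 * records them. */
enum ns_kind { NS_USER, NS_UTS, NS_IPC, NS_MNT, NS_PID, NS_NET, NS_CGROUP,
               NS_KIND_COUNT };

/* Hypothetical snapshot: one inode number per namespace kind. */
struct ns_snapshot {
    uint64_t inum[NS_KIND_COUNT];
};

struct ns_change {
    enum ns_kind kind;
    uint64_t old_inum;
    uint64_t new_inum;
};

/* Compare the "before" and "after" snapshots and record every namespace
 * whose inode changed, so a single message can report all of them.
 * Returns the number of entries written to out. */
static size_t collect_ns_changes(const struct ns_snapshot *before,
                                 const struct ns_snapshot *after,
                                 struct ns_change out[NS_KIND_COUNT])
{
    size_t count = 0;

    for (int k = 0; k < NS_KIND_COUNT; k++) {
        if (before->inum[k] != after->inum[k]) {
            out[count].kind = (enum ns_kind)k;
            out[count].old_inum = before->inum[k];
            out[count].new_inum = after->inum[k];
            count++;
        }
    }
    return count;
}
```

With the inode numbers from the `unshare -n -i` example in patch 1, only the ipc and net entries differ, so one message would carry two items.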


This patch set is available in the git repository at:

  https://github.com/kinvolk/linux.git alban/proc_ns_connector-v2-5


Alban Crequy (2):
  proc connector: add namespace events
  proc connector: add a "get feature" op

 drivers/connector/cn_proc.c  | 163 ---
 include/linux/cn_proc.h      |  25 +++
 include/uapi/linux/cn_proc.h |  27 ++-
 kernel/fork.c                |  10 +++
 kernel/nsproxy.c             |   6 ++
 5 files changed, 220 insertions(+), 11 deletions(-)

-- 
2.7.4

linux/atm_zatm.h not really usable in userspace since cf00713a655d3019be7faa184402f16c43a0fed3

2016-10-15 Thread Pascal Terjan

It is no longer possible to include  together with userspace
headers that use time types, for example  . Among other things, this
broke the build of linux-atm.

Reproducer:

$ cat test.c
#include 
#include 

$ gcc -c test.c
In file included from /usr/include/sys/select.h:43:0,
 from /usr/include/sys/types.h:219,
 from /usr/include/stdlib.h:314,
 from test.c:2:
/usr/include/time.h:120:8: error: redefinition of 'struct timespec'
 struct timespec
^
In file included from /usr/include/linux/atm_zatm.h:17:0,
 from test.c:1:
/usr/include/linux/time.h:9:8: note: originally defined here
 struct timespec {
^
In file included from /usr/include/sys/select.h:45:0,
 from /usr/include/sys/types.h:219,
 from /usr/include/stdlib.h:314,
 from test.c:2:
/usr/include/bits/time.h:30:8: error: redefinition of 'struct timeval'
 struct timeval
^
In file included from /usr/include/linux/atm_zatm.h:17:0,
 from test.c:1:
/usr/include/linux/time.h:15:8: note: originally defined here
 struct timeval {
^

userspace build broken by include changes

2016-10-15 Thread Pascal Terjan

rp-pppoe plugin of ppp no longer builds:

In file included from pppoe.h:87:0,
 from plugin.c:29:
/usr/include/linux/in.h:28:3: error: redeclaration of enumerator 'IPPROTO_IP'
   IPPROTO_IP = 0,  /* Dummy protocol for TCP  */
   ^
/usr/include/netinet/in.h:42:5: note: previous definition of
'IPPROTO_IP' was here
 IPPROTO_IP = 0,/* Dummy protocol for TCP.  */

Short reproducer:

#include 
#include 
#include 
#include 
#include 
#include 

Full log:
http://pkgsubmit.mageia.org/autobuild/cauldron/x86_64/core/2016-10-12/ppp-2.4.7-8.mga6.src.rpm/build.0.20161012185227.log

Moving the include of linux/if.h after netinet/in.h fixes it.

I guess the breakage is caused by
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/include/uapi/linux?id=eafe92114308acf14e45c6c3d154a5dad5523d1a
but the commit doesn't look wrong to me.

This is indeed enough to cause the error:

#include 
#include 
#include

Re: Need help with mdiobus_register and phy

2016-10-15 Thread Florian Fainelli

On October 14, 2016 7:25:14 PM CEST, Andrew Lunn  wrote:

>> So after calling BMCR_PDOWN, the PHYSID1 and PHYSID2 registers are
>> no longer readable.  Is that expected?
>
>You are making two changes here. Is it the SGMII power down which is
>causing the id registers to return 0x, or the BMCR_PDOWN.

I would be curious to know about that as well.

>
>The generic suspend code sets the PDOWN bit, so it is assuming the PHY
>will respond afterwards.

After reading the spec again, it does not appear to me that a PHY with PDOWN 
set is guaranteed or even required to respond to other register reads such as 
MII_PHYID1/2, in which case we may have to implement a MDIO bus reset routine 
which clears PDOWN for all PHYs that we detect(ed), or as Andrew suggested, 
utilize the matching by compatible string with the PHY OUI in it.
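A userspace mock of the "clear PDOWN before probing" idea. The register numbers and BMCR_PDOWN match the values in <linux/mii.h>. The mock bus, its accessors, and mdio_bus_clear_pdown() are hypothetical; in a real driver this logic might sit in the struct mii_bus reset callback and use the bus read/write ops. The mock assumes a PHY that stops answering PHYSID reads while powered down and returns all ones for an unanswered read, which is typical for an MDIO bus with a pull-up.

```c
#include <assert.h>
#include <stdint.h>

/* Register numbers and bit value as in <linux/mii.h>. */
#define MII_BMCR     0x00
#define MII_PHYSID1  0x02
#define MII_PHYSID2  0x03
#define BMCR_PDOWN   0x0800
#define PHY_MAX_ADDR 32

/* Hypothetical mock bus: 32 PHY addresses x 32 registers each. */
struct mock_mdio_bus {
    uint16_t regs[PHY_MAX_ADDR][32];
};

/* Mock read: ID registers read as all ones while PDOWN is set. */
static uint16_t mock_read(const struct mock_mdio_bus *bus, int addr, int reg)
{
    if ((reg == MII_PHYSID1 || reg == MII_PHYSID2) &&
        (bus->regs[addr][MII_BMCR] & BMCR_PDOWN))
        return 0xffff;
    return bus->regs[addr][reg];
}

static void mock_write(struct mock_mdio_bus *bus, int addr, int reg,
                       uint16_t val)
{
    bus->regs[addr][reg] = val;
}

/* Hypothetical bus reset step: clear PDOWN on every address before the
 * bus is scanned, so PHYSID1/2 can be read during probing. */
static void mdio_bus_clear_pdown(struct mock_mdio_bus *bus)
{
    for (int addr = 0; addr < PHY_MAX_ADDR; addr++) {
        uint16_t bmcr = mock_read(bus, addr, MII_BMCR);

        if (bmcr & BMCR_PDOWN)
            mock_write(bus, addr, MII_BMCR, (uint16_t)(bmcr & ~BMCR_PDOWN));
    }
}
```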

-- 
Florian

Re: net/sctp: BUG: KASAN: stack-out-of-bounds in memcmp

2016-10-15 Thread Baozeng Ding

Hello Xin Long,

On 2016/10/14 19:13, Xin Long wrote:
> On Sat, Aug 20, 2016 at 3:51 PM, Baozeng Ding  wrote:
>> Hello all,
>> The following program triggers  stack-out-of-bounds in memcmp. The kernel 
>> version is 4.8.0-rc1+ (on Aug 13 commit 
>> 118253a593bd1c57de2d1193df1ccffe1abe745b). Thanks.
> ...
>>
>> #define _GNU_SOURCE
>> #include 
>> #include 
>> #include 
>> #include 
>> #include 
>> #include 
>> #include 
>> #include 
>>
>> int main()
>> {
>> int fd;
>> mmap((void *)0x2000ul, 0xff2000ul, 0x3ul, 0x32ul, -1, 0x0ul);
>> fd = socket(AF_INET6, SOCK_STREAM, IPPROTO_SCTP);
>> memcpy((void*)0x20f82f80, 
>> "\x0a\x00\xab\x12\x72\xd4\x19\x9a\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x85\xda\x00\xa0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
>>  128);
>> bind(fd, (struct sockaddr*)0x20f82f80ul, 0x80ul);
>> *(uint64_t*)0x202e1fc8 = (uint64_t)0x20f77f80;
>> *(uint32_t*)0x202e1fd0 = (uint32_t)0x80;
>> *(uint64_t*)0x202e1fd8 = (uint64_t)0x20f7dfe0;
>> *(uint64_t*)0x202e1fe0 = (uint64_t)0x2;
>> *(uint64_t*)0x202e1fe8 = (uint64_t)0x20f77000;
>> *(uint64_t*)0x202e1ff0 = (uint64_t)0x3;
>> *(uint32_t*)0x202e1ff8 = (uint32_t)0x80;
>> memcpy((void*)0x20f77f80, 
>> "\x0a\x00\xab\x12\xb0\xb3\x20\x7b\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\xc2\xc2\x0b\xb2\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
>>  128);
>> *(uint64_t*)0x20f7dfe0 = (uint64_t)0x20f77fc5;
>> *(uint64_t*)0x20f7dfe8 = (uint64_t)0x3b;
>> *(uint64_t*)0x20f7dff0 = (uint64_t)0x20f77fac;
>> *(uint64_t*)0x20f7dff8 = (uint64_t)0x54;
>> memcpy((void*)0x20f77fc5, 
>> "\xa5\x7d\xf3\xc4\xfe\xd3\xfd\x44\x63\x00\x8c\x1e\x4c\x2e\x8d\x8d\x9a\x9c\x9c\x9d\x5b\x7c\xe1\x06\xf7\x15\x16\xed\x68\xd1\xfc\xf4\xa4\x3a\xe4\x69\x51\x16\x74\xf4\x1a\xcf\x0e\x99\xc3\xa3\x87\xe7\x81\x6c\x10\x78\x75\x17\x69\x9d\x11\x0c\xc7",
>>  59);
>> memcpy((void*)0x20f77fac, 
>> "\x86\x08\x89\x3c\xf3\x58\xea\xe7\x64\x6a\xfb\xb5\xe8\xdd\x5f\x69\xa5\xd4\xdc\xd9\xe7\x71\x95\x07\x78\x7b\x21\xda\x43\x9c\x62\x4d\xca\x64\xb5\x6e\x96\x55\xe9\x58\x76\x66\x1d\xb9\x7b\xe6\x20\xc1\xa9\xed\x70\xc1\x2b\x7c\x86\x8c\xba\x28\xb3\x2c\xb9\x64\xb7\x84\x65\x0d\x7f\xa6\x98\x6f\x49\xcb\x35\xad\x5a\xdf\x13\x75\x99\x57\x7e\xbb\x38\x89",
>>  84);
>> *(uint64_t*)0x20f77000 = (uint64_t)0x15;
>> *(uint32_t*)0x20f77008 = (uint32_t)0x1;
>> *(uint32_t*)0x20f7700c = (uint32_t)0xfffe;
>> *(uint8_t*)0x20f77010 = (uint8_t)0xbb;
>> *(uint8_t*)0x20f77011 = (uint8_t)0x2;
>> *(uint8_t*)0x20f77012 = (uint8_t)0x5;
>> *(uint8_t*)0x20f77013 = (uint8_t)0x2;
>> *(uint8_t*)0x20f77014 = (uint8_t)0x8000;
>> *(uint64_t*)0x20f77015 = (uint64_t)0x10;
>> *(uint32_t*)0x20f7701d = (uint32_t)0x;
>> *(uint32_t*)0x20f77021 = (uint32_t)0x1;
>> *(uint64_t*)0x20f77025 = (uint64_t)0x13;
>> *(uint32_t*)0x20f7702d = (uint32_t)0x6;
>> *(uint32_t*)0x20f77031 = (uint32_t)0xfe00;
>> *(uint8_t*)0x20f77035 = (uint8_t)0x8000;
>> *(uint8_t*)0x20f77036 = (uint8_t)0xfff8;
>> sendmmsg(fd, (struct mmsghdr *)0x202e1fc8ul, 0x1ul, 0x1ul);
>> return 0;
>> }
>>
> Hi, Baozeng, I couldn't reproduce this issue with this script,
> even in 118253a593bd1c57de2d1193df1ccffe1abe745b
> do I need to do some extra config for this ?
> 
You need to enable KASAN in the kernel config:
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_KASAN=y
CONFIG_KASAN_INLINE=y
CONFIG_KASAN_SHADOW_OFFSET=0xdc00

I just tested with b67be92feb486f800d80d72c67fd87b47b79b18e (October 12),
and it still exists. If you still cannot reproduce it, I will send the .config
to you privately. Thanks.
