[PATCH net] sctp: fix the issue sctp requeue auth chunk incorrectly

2016-07-29 Thread Xin Long
sctp needs to queue auth chunk back when we know that we are going
to generate another segment. But commit f1533cce60d1 ("sctp: fix
panic when sending auth chunks") requeues the last chunk processed
which is probably not the auth chunk.

This causes a panic when calculating the MAC in sctp_auth_calculate_hmac(),
because the offset of the auth chunk in skb->data is then incorrect.

This fix is to requeue it by using packet->auth.

Fixes: f1533cce60d1 ("sctp: fix panic when sending auth chunks")
Signed-off-by: Xin Long 
---
 net/sctp/output.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sctp/output.c b/net/sctp/output.c
index 7425f6c..1f1682b 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -610,7 +610,8 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
/* We will generate more packets, so re-queue
 * auth chunk.
 */
-   list_add(&chunk->list, &packet->chunk_list);
+   list_add(&packet->auth->list,
+            &packet->chunk_list);
} else {
sctp_chunk_free(packet->auth);
packet->auth = NULL;
-- 
2.1.0
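
The difference between the two list_add() calls in the patch above can be
modelled in user space. This is only a sketch: struct chunk, struct packet and
the three helpers are simplified stand-ins invented for illustration, not the
kernel's sctp definitions, and the list is a plain singly linked list rather
than a list_head.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; not the real definitions. */
enum chunk_type { CHUNK_AUTH, CHUNK_DATA };

struct chunk {
	enum chunk_type type;
	struct chunk *next;
};

struct packet {
	struct chunk *chunk_list;	/* chunks still waiting to be sent */
	struct chunk *auth;		/* the AUTH chunk of this packet, if any */
};

/* Push a chunk back onto the head of the pending list. */
static void requeue(struct packet *pkt, struct chunk *c)
{
	c->next = pkt->chunk_list;
	pkt->chunk_list = c;
}

/* Old behaviour: requeue whatever chunk the loop touched last. */
static void requeue_last_processed(struct packet *pkt, struct chunk *last)
{
	requeue(pkt, last);
}

/* Patched behaviour: always requeue the AUTH chunk recorded in the packet. */
static void requeue_auth(struct packet *pkt)
{
	assert(pkt->auth != NULL);
	requeue(pkt, pkt->auth);
}
```

With a DATA chunk as the last one processed, requeue_last_processed() puts the
DATA chunk at the head of the list, while requeue_auth() puts the AUTH chunk
there regardless of what was processed last.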



Re: [PATCH net-next 08/10] drivers: net: xgene: Poll link status via GPIO

2016-07-29 Thread kbuild test robot
Hi,

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Iyappan-Subramanian/Fix-warning-and-issues/20160730-083713
config: mips-allmodconfig (attached as .config)
compiler: mips-linux-gnu-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=mips 

All errors (new ones prefixed by >>):

   drivers/gpio/gpio-xgene-sb.c: In function 'xgene_gpio_sb_irq_set_type':
   drivers/gpio/gpio-xgene-sb.c:111:10: error: implicit declaration of function 
'irq_chip_set_type_parent' [-Werror=implicit-function-declaration]
  return irq_chip_set_type_parent(d, IRQ_TYPE_EDGE_RISING);
 ^
   drivers/gpio/gpio-xgene-sb.c: At top level:
   drivers/gpio/gpio-xgene-sb.c:118:13: error: 'irq_chip_eoi_parent' undeclared 
here (not in a function)
 .irq_eoi = irq_chip_eoi_parent,
^
   drivers/gpio/gpio-xgene-sb.c:119:20: error: 'irq_chip_mask_parent' 
undeclared here (not in a function)
 .irq_mask   = irq_chip_mask_parent,
   ^
   drivers/gpio/gpio-xgene-sb.c:120:20: error: 'irq_chip_unmask_parent' 
undeclared here (not in a function)
 .irq_unmask = irq_chip_unmask_parent,
   ^
   drivers/gpio/gpio-xgene-sb.c: In function 'xgene_gpio_sb_domain_alloc':
   drivers/gpio/gpio-xgene-sb.c:198:3: error: implicit declaration of function 
'irq_domain_set_hwirq_and_chip' [-Werror=implicit-function-declaration]
  irq_domain_set_hwirq_and_chip(domain, virq + i, hwirq + i,
  ^
   drivers/gpio/gpio-xgene-sb.c:201:31: error: 'struct irq_domain' has no 
member named 'parent'
 parent_fwspec.fwnode = domain->parent->fwnode;
  ^
   drivers/gpio/gpio-xgene-sb.c:215:9: error: implicit declaration of function 
'irq_domain_alloc_irqs_parent' [-Werror=implicit-function-declaration]
 return irq_domain_alloc_irqs_parent(domain, virq, nr_irqs,
^
   drivers/gpio/gpio-xgene-sb.c: At top level:
   drivers/gpio/gpio-xgene-sb.c:220:2: error: unknown field 'translate' 
specified in initializer
 .translate  = xgene_gpio_sb_domain_translate,
 ^
>> drivers/gpio/gpio-xgene-sb.c:220:20: error: initialization from incompatible 
>> pointer type [-Werror=incompatible-pointer-types]
 .translate  = xgene_gpio_sb_domain_translate,
   ^
   drivers/gpio/gpio-xgene-sb.c:220:20: note: (near initialization for 
'xgene_gpio_sb_domain_ops.match')
   drivers/gpio/gpio-xgene-sb.c:221:2: error: unknown field 'alloc' specified 
in initializer
 .alloc  = xgene_gpio_sb_domain_alloc,
 ^
   drivers/gpio/gpio-xgene-sb.c:221:20: error: initialization from incompatible 
pointer type [-Werror=incompatible-pointer-types]
 .alloc  = xgene_gpio_sb_domain_alloc,
   ^
   drivers/gpio/gpio-xgene-sb.c:221:20: note: (near initialization for 
'xgene_gpio_sb_domain_ops.select')
   drivers/gpio/gpio-xgene-sb.c:222:2: error: unknown field 'free' specified in 
initializer
 .free   = irq_domain_free_irqs_common,
 ^
   drivers/gpio/gpio-xgene-sb.c:222:20: error: 'irq_domain_free_irqs_common' 
undeclared here (not in a function)
 .free   = irq_domain_free_irqs_common,
   ^
   drivers/gpio/gpio-xgene-sb.c:223:2: error: unknown field 'activate' 
specified in initializer
 .activate = xgene_gpio_sb_domain_activate,
 ^
   drivers/gpio/gpio-xgene-sb.c:223:14: error: initialization from incompatible 
pointer type [-Werror=incompatible-pointer-types]
 .activate = xgene_gpio_sb_domain_activate,
 ^
   drivers/gpio/gpio-xgene-sb.c:223:14: note: (near initialization for 
'xgene_gpio_sb_domain_ops.unmap')
   drivers/gpio/gpio-xgene-sb.c:224:2: error: unknown field 'deactivate' 
specified in initializer
 .deactivate = xgene_gpio_sb_domain_deactivate,
 ^
   drivers/gpio/gpio-xgene-sb.c:224:16: error: initialization from incompatible 
pointer type [-Werror=incompatible-pointer-types]
 .deactivate = xgene_gpio_sb_domain_deactivate,
   ^
   drivers/gpio/gpio-xgene-sb.c:224:16: note: (near initialization for 
'xgene_gpio_sb_domain_ops.xlate')
   drivers/gpio/gpio-xgene-sb.c: In function 'xgene_gpio_sb_probe':
   drivers/gpio/gpio-xgene-sb.c:293:21: error: implicit declaration of function 
'irq_domain_create_hierarchy' [-Werror=implicit-function-declaration]
 priv->irq_domain = irq_domain_create_hierarchy(parent_domain,
^
   drivers/gpio/gpio-xgene-sb.c:293:19: warning: assignment makes pointer from 
integer without a cast [-Wint-conversion]
 priv->irq_domain = irq_domain_create_hierarchy(parent_domain,
  ^
   cc1: some warnings being treated as errors

vim +220 

Re: [PATCH] bpf: fix size of copy_to_user in percpu map.

2016-07-29 Thread Alexei Starovoitov
On Fri, Jul 29, 2016 at 10:23:06PM -0700, William Tu wrote:
> On Fri, Jul 29, 2016 at 5:19 PM, Daniel Borkmann  wrote:
> > On 07/29/2016 10:03 PM, William Tu wrote:
> >>
> >> I'm not using ARM. It's x86 in a VM with 2 vcpu. By printk() in kernel, I
> >> got
> >>num_possible_cpu == 64
> >>num_online_cpu == 2 == sysconf(_SC_NPROCESSORS_CONF)
...
> >> To fix it, I could either
> >> 1). declare values array based on num_possible_cpu in test_map.c,
> >>long values[64];
> >> or 2) in kernel, only copying 8*2 = 16 byte from kernel to user.
...
> Since percpu array adds variable length of data passing between kernel
> and userspace, I wonder if we should add a 'value_len' field in 'union
> bpf_attr' so kernel knows how much data to copy to user?

I think the first step is to figure out why num_possible is 64,
since it hurts all per-cpu allocations. If it is a widespread issue,
it hurts a lot of VMs.
Hopefully it's not the case, since in my kvm setup num_possible==num_online
qemu version 2.4.0
booting with -enable-kvm -smp N



Re: [PATCH] bpf: fix size of copy_to_user in percpu map.

2016-07-29 Thread William Tu
On Fri, Jul 29, 2016 at 5:19 PM, Daniel Borkmann  wrote:
> On 07/29/2016 10:03 PM, William Tu wrote:
>>
>> Hi Daniel and Alexei,
>>
>> Thanks for the reply. My apology for too brief description. In short,
>> in my environment, running samples/bpf/test_map always segfault under
>> percpu array/hash map operations. I think it's due to stack
>> corruption.
>>
>> I'm not using ARM. It's x86 in a VM with 2 vcpu. By printk() in kernel, I
>> got
>>num_possible_cpu == 64
>>num_online_cpu == 2 == sysconf(_SC_NPROCESSORS_CONF)
>
>
> Ok, thanks for the data!
>
>> So at samples/bpf/test_maps.c, test_percpu_arraymap_sanity(),
>> we define:
>>long values[nr_cpus]; //nr_cpus=2
>>
>>... // create map and update map ...
>>
>>/* check that key=0 is also found and zero initialized */
>>assert(bpf_lookup_elem(map_fd, &key, values) == 0 &&
>>  values[0] == 0 && values[nr_cpus - 1] == 0);
>>
>> Here we enter the bpf syscall, calls into kernel "map_lookup_elem()"
>> and we calculate:
>>value_size = round_up(map->value_size, 8) * num_possible_cpus();
>>// which in my case 8 * 64 = 512
>>...
>>// then copy to user, which writes 512B to the "values[nr_cpus]" on
>> stack
>>if (copy_to_user(uvalue, value, value_size) != 0)
>>
>> And I think this 512B write to userspace corrupts the userspace stack
>> and causes a coredump. After bpf_lookup_elem() calls, gdb shows
>> 'values' points to memory address 0x0.
>>
>> To fix it, I could either
>> 1). declare values array based on num_possible_cpu in test_map.c,
>>long values[64];
>> or 2) in kernel, only copying 8*2 = 16 byte from kernel to user.
>
>
> But I think the patch of using num_online_cpus() would also not be correct
> in the sense that f.e. your application could alloc an array at time X
> where map lookup at time Y would not fit to the expectations anymore due
> to CPU hotplugging (since apparently _SC_NPROCESSORS_CONF maps to online
> CPUs in some cases). So also there you could potentially corrupt your
> application or mem allocator in user space, or not all your valid data
> might get copied, hmm.
>
Yes, you're right. CPU hotplugging might cause the same issue.

Since percpu array adds variable length of data passing between kernel
and userspace, I wonder if we should add a 'value_len' field in 'union
bpf_attr' so kernel knows how much data to copy to user?

Regards,
William
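
The size mismatch discussed in this thread can be worked through in a few
lines. This is a sketch of the arithmetic only: the helper names are made up
for illustration, and the 8-byte value size and the CPU counts 2 and 64 are
the ones reported above.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of 8, like round_up(x, 8) in the kernel. */
static size_t round_up8(size_t x)
{
	return (x + 7) & ~(size_t)7;
}

/* Bytes copied to user space for one per-CPU map lookup. */
static size_t percpu_lookup_bytes(size_t value_size, unsigned int possible_cpus)
{
	assert(possible_cpus > 0);
	return round_up8(value_size) * possible_cpus;
}

/* Returns 1 if a user buffer of buf_bytes can hold the whole copy. */
static int lookup_buffer_fits(size_t buf_bytes, size_t value_size,
			      unsigned int possible_cpus)
{
	return buf_bytes >= percpu_lookup_bytes(value_size, possible_cpus);
}
```

For an 8-byte value and 64 possible CPUs the copy is 512 bytes, so a buffer
sized for 2 CPUs (16 bytes) is too small, while one sized for 64 CPUs fits.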


Re: igb: question regarding auto-negotiation

2016-07-29 Thread Alexander Duyck
On Fri, Jul 29, 2016 at 4:37 PM, Dominic Curran
 wrote:
> Hi
>
> This question refers to igb codebase.
> I have a question regarding the setting of hw->mac.autoneg.
>
> Is it correct to say for igb driver:
>"if speed=1000 and duplex=FULL and media_type=COPPER  then  only
> auto-negotiate enable is supported"
>
> i.e.
>with these settings (speed/duplex/media_type) then auto-negotiate can
> _not_ be disabled.  Correct ?
>
> I say this for two reasons:
> 1) The code in igb_set_spd_dplx() seems to indicate it:
>
>case SPEED_1000 + DUPLEX_FULL:
> mac->autoneg = 1;
> adapter->hw.phy.autoneg_advertised = ADVERTISE_1000_FULL;
> break;
>
> 2) Instrumenting the driver, I always see the autoneg code in
> e1000_check_for_copper_link_generic()  get called after an igb_reset().
>
>
> Have i understood correctly ?
>
> thanks in advance
> dom

If you are using copper then you are likely referring to 1000Base-T,
correct?  If so then autonegotiation is a requirement.

Here is the wikipedia URL that refers to this:
https://en.wikipedia.org/wiki/Gigabit_Ethernet#1000BASE-T

Hope this helps to clear it up.

Thanks.

- Alex
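
The behaviour described in this thread can be sketched as a small user-space
model. Only the SPEED_1000 + DUPLEX_FULL case is taken from the code quoted
above; struct link_cfg, the function name, and the handling of the other
cases are assumptions made for illustration and are not the igb driver's
code. The SPEED_* and DUPLEX_* values are the ones ethtool uses.

```c
#include <assert.h>

#define SPEED_10	10
#define SPEED_100	100
#define SPEED_1000	1000
#define DUPLEX_HALF	0
#define DUPLEX_FULL	1

struct link_cfg {
	int autoneg;	/* 1 if auto-negotiation stays enabled */
	int valid;	/* 0 if the speed/duplex pair was rejected */
};

/* Hypothetical model of a forced speed/duplex request on a copper link. */
static struct link_cfg set_spd_dplx(int speed, int duplex)
{
	struct link_cfg cfg = { 0, 1 };

	switch (speed + duplex) {
	case SPEED_10 + DUPLEX_HALF:
	case SPEED_10 + DUPLEX_FULL:
	case SPEED_100 + DUPLEX_HALF:
	case SPEED_100 + DUPLEX_FULL:
		cfg.autoneg = 0;	/* assumed: forced speed and duplex */
		break;
	case SPEED_1000 + DUPLEX_FULL:
		cfg.autoneg = 1;	/* from the thread: negotiation stays on */
		break;
	default:
		cfg.valid = 0;		/* assumed: anything else is rejected */
		break;
	}
	return cfg;
}
```

In this model a request for 1000/full always comes back with autoneg set,
which matches the observation that negotiation cannot be turned off for
1000Base-T.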


Re: [PATCH net-next 08/10] drivers: net: xgene: Poll link status via GPIO

2016-07-29 Thread kbuild test robot
Hi,

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Iyappan-Subramanian/Fix-warning-and-issues/20160730-083713
config: xtensa-allmodconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 4.9.0
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=xtensa 

All error/warnings (new ones prefixed by >>):

   drivers/gpio/gpio-xgene-sb.c: In function 'xgene_gpio_sb_irq_set_type':
>> drivers/gpio/gpio-xgene-sb.c:111:3: error: implicit declaration of function 
>> 'irq_chip_set_type_parent' [-Werror=implicit-function-declaration]
  return irq_chip_set_type_parent(d, IRQ_TYPE_EDGE_RISING);
  ^
   drivers/gpio/gpio-xgene-sb.c: At top level:
>> drivers/gpio/gpio-xgene-sb.c:118:13: error: 'irq_chip_eoi_parent' undeclared 
>> here (not in a function)
 .irq_eoi = irq_chip_eoi_parent,
^
>> drivers/gpio/gpio-xgene-sb.c:119:20: error: 'irq_chip_mask_parent' 
>> undeclared here (not in a function)
 .irq_mask   = irq_chip_mask_parent,
   ^
>> drivers/gpio/gpio-xgene-sb.c:120:20: error: 'irq_chip_unmask_parent' 
>> undeclared here (not in a function)
 .irq_unmask = irq_chip_unmask_parent,
   ^
   drivers/gpio/gpio-xgene-sb.c: In function 'xgene_gpio_sb_domain_alloc':
>> drivers/gpio/gpio-xgene-sb.c:198:3: error: implicit declaration of function 
>> 'irq_domain_set_hwirq_and_chip' [-Werror=implicit-function-declaration]
  irq_domain_set_hwirq_and_chip(domain, virq + i, hwirq + i,
  ^
>> drivers/gpio/gpio-xgene-sb.c:201:31: error: 'struct irq_domain' has no 
>> member named 'parent'
 parent_fwspec.fwnode = domain->parent->fwnode;
  ^
>> drivers/gpio/gpio-xgene-sb.c:215:2: error: implicit declaration of function 
>> 'irq_domain_alloc_irqs_parent' [-Werror=implicit-function-declaration]
 return irq_domain_alloc_irqs_parent(domain, virq, nr_irqs,
 ^
   drivers/gpio/gpio-xgene-sb.c: At top level:
>> drivers/gpio/gpio-xgene-sb.c:220:2: error: unknown field 'translate' 
>> specified in initializer
 .translate  = xgene_gpio_sb_domain_translate,
 ^
>> drivers/gpio/gpio-xgene-sb.c:220:2: warning: initialization from 
>> incompatible pointer type
   drivers/gpio/gpio-xgene-sb.c:220:2: warning: (near initialization for 
'xgene_gpio_sb_domain_ops.match')
>> drivers/gpio/gpio-xgene-sb.c:221:2: error: unknown field 'alloc' specified 
>> in initializer
 .alloc  = xgene_gpio_sb_domain_alloc,
 ^
   drivers/gpio/gpio-xgene-sb.c:221:2: warning: initialization from 
incompatible pointer type
   drivers/gpio/gpio-xgene-sb.c:221:2: warning: (near initialization for 
'xgene_gpio_sb_domain_ops.select')
>> drivers/gpio/gpio-xgene-sb.c:222:2: error: unknown field 'free' specified in 
>> initializer
 .free   = irq_domain_free_irqs_common,
 ^
>> drivers/gpio/gpio-xgene-sb.c:222:20: error: 'irq_domain_free_irqs_common' 
>> undeclared here (not in a function)
 .free   = irq_domain_free_irqs_common,
   ^
>> drivers/gpio/gpio-xgene-sb.c:223:2: error: unknown field 'activate' 
>> specified in initializer
 .activate = xgene_gpio_sb_domain_activate,
 ^
   drivers/gpio/gpio-xgene-sb.c:223:2: warning: initialization from 
incompatible pointer type
   drivers/gpio/gpio-xgene-sb.c:223:2: warning: (near initialization for 
'xgene_gpio_sb_domain_ops.unmap')
>> drivers/gpio/gpio-xgene-sb.c:224:2: error: unknown field 'deactivate' 
>> specified in initializer
 .deactivate = xgene_gpio_sb_domain_deactivate,
 ^
   drivers/gpio/gpio-xgene-sb.c:224:2: warning: initialization from 
incompatible pointer type
   drivers/gpio/gpio-xgene-sb.c:224:2: warning: (near initialization for 
'xgene_gpio_sb_domain_ops.xlate')
   drivers/gpio/gpio-xgene-sb.c: In function 'xgene_gpio_sb_probe':
>> drivers/gpio/gpio-xgene-sb.c:293:2: error: implicit declaration of function 
>> 'irq_domain_create_hierarchy' [-Werror=implicit-function-declaration]
 priv->irq_domain = irq_domain_create_hierarchy(parent_domain,
 ^
>> drivers/gpio/gpio-xgene-sb.c:293:19: warning: assignment makes pointer from 
>> integer without a cast
 priv->irq_domain = irq_domain_create_hierarchy(parent_domain,
  ^
   cc1: some warnings being treated as errors

vim +/irq_chip_set_type_parent +111 drivers/gpio/gpio-xgene-sb.c

1013fc417 Quan Nguyen 2016-02-17  105  			gpio * 2, 1);
1013fc417 Quan Nguyen 2016-02-17  106  	xgene_gpio_set_bit(&priv->gc, priv->regs + MPA_GPIO_INT_LVL,
1013fc417 Quan Nguyen 2016-02-17  107  			d->hwirq, lvl_type);
1013fc417 Quan Nguyen 2016-02-17  108  
1013fc417 Quan Nguyen 2016-02-17  109  	/* Propagate IRQ type setting to parent */

Re: [PATCH net-next 00/10] Fix warning and issues

2016-07-29 Thread David Miller
From: Iyappan Subramanian 
Date: Fri, 29 Jul 2016 17:33:53 -0700

> This patch set fixes the following warning and issues,

net-next is closed, please resubmit this after the merge window
is closed and net-next re-opens.

Thank you.


Re: [PATCH net-next 08/10] drivers: net: xgene: Poll link status via GPIO

2016-07-29 Thread kbuild test robot
Hi,

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Iyappan-Subramanian/Fix-warning-and-issues/20160730-083713
config: xtensa-allmodconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 4.9.0
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=xtensa 

All warnings (new ones prefixed by >>):

warning: (NET_XGENE) selects GPIO_XGENE_SB which has unmet direct dependencies 
(GPIOLIB && ARCH_XGENE && OF_GPIO)

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



Re: [PATCH v2 1/5] ethernet: add sun8i-emac driver

2016-07-29 Thread Chen-Yu Tsai
On Sat, Jul 30, 2016 at 1:25 AM, Maxime Ripard
 wrote:
> On Thu, Jul 28, 2016 at 04:57:34PM +0200, LABBE Corentin wrote:
>> > > +static int sun8i_mdio_write(struct mii_bus *bus, int phy_addr, int 
>> > > phy_reg,
>> > > + u16 data)
>> > > +{
>> > > + struct net_device *ndev = bus->priv;
>> > > + struct sun8i_emac_priv *priv = netdev_priv(ndev);
>> > > + u32 reg;
>> > > + int err;
>> > > +
>> > > + err = readl_poll_timeout(priv->base + SUN8I_EMAC_MDIO_CMD, reg,
>> > > +  !(reg & MDIO_CMD_MII_BUSY), 100, 1);
>> > > + if (err) {
>> > > + dev_err(priv->dev, "%s timeout %x\n", __func__, reg);
>> > > + return err;
>> > > + }
>> >
>> > Why the poll_timeout variant?
>> >
>> Because, in case of bad clock/reset/regulator setting, the value
>> expected to come could never be set.
>
> Ah, I missed that it was for a busy bit, my bad. However, you seem to
> be using that on several occasions, maybe you could turn that into a
> function?
>
>> > > +static void sun8i_emac_unset_syscon(struct net_device *ndev)
>> > > +{
>> > > + struct sun8i_emac_priv *priv = netdev_priv(ndev);
>> > > + u32 reg = 0;
>> > > +
>> > > + if (priv->variant == H3_EMAC)
>> > > + reg = H3_EPHY_DEFAULT_VALUE;
>> >
>> > Why do you need that?
>> >
>> For resetting the syscon to the factory default.
>
> Yes, but does it matter? Does it have any side effect? Is that
> register shared with another device?
>
> Otherwise, either it won't be used anymore, and you don't care, or you
> will reload the driver later, and the driver should work whatever
> state is programmed in there. In both cases, you don't need to reset
> that value.

The "default" setting also disables and powers down the internal PHY.
I think that's a good thing? The naming could be better.

>> > > +static irqreturn_t sun8i_emac_dma_interrupt(int irq, void *dev_id)
>> > > +{
>> > > + struct net_device *ndev = dev_id;
>> > > + struct sun8i_emac_priv *priv = netdev_priv(ndev);
>> > > + u32 v, u;
>> > > +
>> > > + v = readl(priv->base + SUN8I_EMAC_INT_STA);
>> > > +
>> > > + /* When this bit is asserted, a frame transmission is completed. */
>> > > + if (v & BIT(0)) {
>> > > + priv->estats.tx_int++;
>> > > + writel(0, priv->base + SUN8I_EMAC_INT_EN);
>> > > + napi_schedule(&priv->napi);
>> > > + }
>> > > +
>> > > + /* When this bit is asserted, the TX DMA FSM is stopped. */
>> > > + if (v & BIT(1))
>> > > + priv->estats.tx_dma_stop++;
>> > > +
>> > > + /* When this asserted, the TX DMA can not acquire next TX descriptor
>> > > +  * and TX DMA FSM is suspended.
>> > > + */
>> > > + if (v & BIT(2))
>> > > + priv->estats.tx_dma_ua++;
>> > > +
>> > > + if (v & BIT(3))
>> > > + netif_dbg(priv, intr, ndev, "Unhandled interrupt TX TIMEOUT\n");
>> >
>> > Why do you enable that interrupt if you can't handle it?
>>
>> Some interrupt fire even when not enabled (like RX_BUF_UA_INT/TX_BUF_UA_INT)
>
> So the bits 9 and 2, respectively, in the interrupt enable register
> are useless?

Does it actually fire, i.e. pull the interrupt line on the GIC? Or is it just
the interrupt state showing an event? IIRC some other hardware blocks have this
behavior, such as the timer.

ChenYu

>> > And printing in the interrupt handler is a very bad idea.
>>
>> There are printed only when DEBUG is set, so not a problem ?
>
> It's always a problem, this adds a very significant latency and will
> fill the kernel log buffer at an insane rate, flushing out actual
> important messages, for no particular reason.
>> > > +
>> > > + return IRQ_HANDLED;
>> >
>> > The lack of spinlocks in there is quite worrying.
>> >
>>
>> The interrupt handler does nothing harmful if it races with itself.
>> Just stats, enabling NAPI etc..
>> Anyway, it is missing a comment for that non-locking strategy
>
> The interrupt handler cannot race with itself. The interrupts will be
> masked on the local CPU and the interrupt can only be delivered to a
> single CPU (so, the one that the handler is currently running from).
>
>> > > +}
>> > > +
>> > > +static int sun8i_emac_probe(struct platform_device *pdev)
>> > > +{
>> > > + struct device_node *node = pdev->dev.of_node;
>> > > + struct sun8i_emac_priv *priv;
>> > > + struct net_device *ndev;
>> > > + struct resource *res;
>> > > + int ret;
>> > > +
>> > > + ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
>> > > + if (ret) {
>> > > + dev_err(&pdev->dev, "No suitable DMA available\n");
>> > > + return ret;
>> > > + }
>> >
>> > Isn't that the default?
>> >
>> No, it is necessary on arm64 as apritzel requested.
>
> http://lxr.free-electrons.com/source/drivers/of/device.c#L93
>
> It seems to be shared between the two.
>
> Thanks!
> Maxime
>
> --
> Maxime Ripard, Free Electrons
> Embedded Linux and Kernel engineering
> http://free-electrons.com
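
The suggestion above to turn the repeated busy-bit polling into one function
could look roughly like the following user-space model. Everything here is
hypothetical: the register is faked, MDIO_BUSY and ERR_TIMEOUT are placeholder
names, and the timeout is expressed as a number of reads rather than as time.
In the driver the helper would presumably wrap readl_poll_timeout() instead.

```c
#include <assert.h>
#include <stdint.h>

#define MDIO_BUSY	(1u << 0)	/* hypothetical busy bit */
#define ERR_TIMEOUT	(-1)		/* stand-in for -ETIMEDOUT */

/* Fake register: reports busy for a fixed number of reads, then idle. */
struct fake_reg {
	unsigned int busy_reads_left;
};

static uint32_t fake_readl(struct fake_reg *r)
{
	if (r->busy_reads_left > 0) {
		r->busy_reads_left--;
		return MDIO_BUSY;
	}
	return 0;
}

/*
 * Poll until the busy bit clears or max_polls reads have been made.
 * Returns 0 on success and ERR_TIMEOUT otherwise. *last receives the
 * final value read (it is left untouched if max_polls is 0).
 */
static int wait_not_busy(struct fake_reg *r, unsigned int max_polls,
			 uint32_t *last)
{
	unsigned int i;

	for (i = 0; i < max_polls; i++) {
		*last = fake_readl(r);
		if (!(*last & MDIO_BUSY))
			return 0;
	}
	return ERR_TIMEOUT;
}
```

Each call site would then reduce to one call plus one error check, and the
dev_err() message could live in a single place.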


[PATCH] Networking: Core: netpoll: Fixed a missing spin_unlock

2016-07-29 Thread Salil Kapur
In the case when the loop breaks at line 390, the txq lock is not
released. Added an unlock statement before the break statement.

Signed-off-by: Salil Kapur 
---
 net/core/netpoll.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index fc75c9e..9124f76 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -386,8 +386,10 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
!vlan_hw_offload_capable(netif_skb_features(skb),
 skb->vlan_proto)) {
skb = __vlan_put_tag(skb, skb->vlan_proto, vlan_tx_tag_get(skb));
-   if (unlikely(!skb))
+   if (unlikely(!skb)) {
+   __netif_tx_unlock(txq);
break;
+   }
skb->vlan_tci = 0;
}
 
-- 
1.9.1
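
The lock imbalance this patch addresses can be shown with a small user-space
model. It is a sketch under assumptions: struct txq, tx_trylock(), tx_unlock()
and send_once() are invented for illustration and only mimic the shape of the
loop in netpoll_send_skb_on_dev(). The tag_fails flag stands for the tagging
helper returning NULL.

```c
#include <assert.h>
#include <stddef.h>

struct txq {
	int locked;	/* 1 while the queue lock is held */
};

static int tx_trylock(struct txq *q)
{
	if (q->locked)
		return 0;
	q->locked = 1;
	return 1;
}

static void tx_unlock(struct txq *q)
{
	assert(q->locked);
	q->locked = 0;
}

/*
 * Model of the send loop. unlock_on_error selects the patched behaviour.
 * Returns 0 if the frame was "sent" and -1 otherwise.
 */
static int send_once(struct txq *q, int tag_fails, int unlock_on_error)
{
	int tries;
	int status = -1;

	for (tries = 3; tries > 0; --tries) {
		if (!tx_trylock(q))
			continue;
		if (tag_fails) {
			if (unlock_on_error)
				tx_unlock(q);
			break;		/* early exit from the loop */
		}
		status = 0;		/* pretend the frame was transmitted */
		tx_unlock(q);
		break;
	}
	return status;
}
```

Without the unlock before the break, the model leaves q->locked set after a
failed tag insertion. With it, the lock is released on every exit path.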



[PATCH net-next 00/10] Fix warning and issues

2016-07-29 Thread Iyappan Subramanian
This patch set fixes the following warning and issues,

  1. Fix kbuild warning
- drivers: net: xgene: Fix kbuild warning
  2. Unmap DMA memory on xgene_enet_delete_bufpool()
- drivers: net: xgene: fix: Add dma_unmap_single
  3. Delete descriptor rings and buffer pools on error
- drivers: net: xgene: fix: Delete descriptor rings and buffer pools
  4. Fix error deconstruction on probe()
- drivers: net: xgene: Fix error deconstruction path
  5. Fix RSS indirection table fields
- drivers: net: xgene: Fix RSS indirection table fields
  6. Change the port init sequence as per hardware specification
- drivers: net: xgene: Change port init sequence
  7. Fix link not recovered after link is down issue
- drivers: net: xgene: XFI PCS reset when link is down
  8. Fix link up is reported when no SFP+ module is plugged in issue
- drivers: net: xgene: Poll link status via GPIO
- dtb: xgene: Add rxlos-gpios property
- Documentation: dtb: xgene: Add rxlos GPIO mapping

Signed-off-by: Iyappan Subramanian 
Tested-by: Fushen Chen 
---

Iyappan Subramanian (10):
  drivers: net: xgene: Fix kbuild warning
  drivers: net: xgene: fix: Add dma_unmap_single
  drivers: net: xgene: fix: Delete descriptor rings and buffer pools
  drivers: net: xgene: Fix error deconstruction path
  drivers: net: xgene: Fix RSS indirection table fields
  drivers: net: xgene: Change port init sequence
  drivers: net: xgene: XFI PCS reset when link is down
  drivers: net: xgene: Poll link status via GPIO
  dtb: xgene: Add rxlos-gpios property
  Documentation: dtb: xgene: Add rxlos GPIO mapping

 .../devicetree/bindings/net/apm-xgene-enet.txt |  1 +
 arch/arm64/boot/dts/apm/apm-mustang.dts|  1 +
 drivers/net/ethernet/apm/xgene/Kconfig |  1 +
 drivers/net/ethernet/apm/xgene/xgene_enet_cle.c| 17 --
 drivers/net/ethernet/apm/xgene/xgene_enet_cle.h| 10 +++-
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c |  7 +--
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.h |  6 ++
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c   | 69 +-
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h   |  2 +
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c  | 53 -
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.h  |  4 ++
 11 files changed, 140 insertions(+), 31 deletions(-)

-- 
1.9.1



[PATCH net-next 01/10] drivers: net: xgene: Fix kbuild warning

2016-07-29 Thread Iyappan Subramanian
This patch fixes the following kbuild warning, when ACPI was not enabled.

>> drivers/net/ethernet/apm/xgene/xgene_enet_hw.c:878:23: warning: 'phy_dev' 
>> may be used uninitialized in this function [-Wmaybe-uninitialized]
 phy_dev->advertising = phy_dev->supported;

Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
index 7714b7d..b6bc6fa 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
@@ -776,15 +776,11 @@ int xgene_enet_phy_connect(struct net_device *ndev)
netdev_err(ndev, "Could not connect to PHY\n");
return -ENODEV;
}
-
-   pdata->phy_dev = phy_dev;
} else {
 #ifdef CONFIG_ACPI
struct acpi_device *adev = acpi_phy_find_device(dev);
if (adev)
-   pdata->phy_dev =  adev->driver_data;
-
-   phy_dev = pdata->phy_dev;
+   phy_dev =  adev->driver_data;
 
if (!phy_dev ||
phy_connect_direct(ndev, phy_dev, &xgene_enet_adjust_link,
@@ -795,6 +791,7 @@ int xgene_enet_phy_connect(struct net_device *ndev)
 #endif
}
 
+   pdata->phy_dev = phy_dev;
pdata->phy_speed = SPEED_UNKNOWN;
phy_dev->supported &= ~SUPPORTED_10baseT_Half &
  ~SUPPORTED_100baseT_Half &
-- 
1.9.1



[PATCH net-next 04/10] drivers: net: xgene: Fix error deconstruction path

2016-07-29 Thread Iyappan Subramanian
Since register_netdev() call in xgene_enet_probe() was moved down to
the end, it doesn't properly handle errors that may occur, by
deconstructing everything that was setup before the error occurred.

Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 27 +---
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 6fc5698..d05f999 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -1643,8 +1643,8 @@ static int xgene_enet_probe(struct platform_device *pdev)
}
 #endif
if (!pdata->enet_id) {
-   free_netdev(ndev);
-   return -ENODEV;
+   ret = -ENODEV;
+   goto err;
}
 
ret = xgene_enet_get_resources(pdata);
@@ -1667,7 +1667,7 @@ static int xgene_enet_probe(struct platform_device *pdev)
 
ret = xgene_enet_init_hw(pdata);
if (ret)
-   goto err_netdev;
+   goto err;
 
link_state = pdata->mac_ops->link_state;
if (pdata->phy_mode == PHY_INTERFACE_MODE_XGMII) {
@@ -1677,21 +1677,32 @@ static int xgene_enet_probe(struct platform_device *pdev)
ret = xgene_enet_mdio_config(pdata);
else
INIT_DELAYED_WORK(&pdata->link_work, link_state);
+
+   if (ret)
+   goto err1;
}
-   if (ret)
-   goto err;
 
xgene_enet_napi_add(pdata);
ret = register_netdev(ndev);
if (ret) {
netdev_err(ndev, "Failed to register netdev\n");
-   goto err;
+   goto err2;
}
 
return 0;
 
-err_netdev:
-   unregister_netdev(ndev);
+err2:
+   /*
+* If necessary, free_netdev() will call netif_napi_del() and undo
+* the effects of xgene_enet_napi_add()'s calls to netif_napi_add().
+*/
+
+   if (pdata->mdio_driver)
+   xgene_enet_phy_disconnect(pdata);
+   else if (pdata->phy_mode == PHY_INTERFACE_MODE_RGMII)
+   xgene_enet_mdio_remove(pdata);
+err1:
+   xgene_enet_delete_desc_rings(pdata);
 err:
free_netdev(ndev);
return ret;
-- 
1.9.1
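
The err2/err1/err label ordering used in this patch follows the usual
goto-unwind pattern: each label undoes one setup step, and the labels appear
in the reverse order of the setup. A minimal user-space sketch of that pattern
follows. The flag names, struct dev_state and the fail_at field are invented
for illustration and do not correspond to driver symbols.

```c
#include <assert.h>

/* Bit flags recording which setup steps are currently in effect. */
enum { RINGS = 1 << 0, MDIO = 1 << 1, NETDEV = 1 << 2 };

struct dev_state {
	int live;	/* OR of the flags above */
	int fail_at;	/* step that should fail (0 = none), for the demo */
};

static int step(struct dev_state *s, int flag)
{
	if (s->fail_at == flag)
		return -1;
	s->live |= flag;
	return 0;
}

static void undo(struct dev_state *s, int flag)
{
	s->live &= ~flag;
}

static int probe(struct dev_state *s)
{
	int ret;

	ret = step(s, RINGS);
	if (ret)
		goto err;
	ret = step(s, MDIO);
	if (ret)
		goto err1;
	ret = step(s, NETDEV);
	if (ret)
		goto err2;
	return 0;

err2:
	undo(s, MDIO);		/* falls through to the earlier steps */
err1:
	undo(s, RINGS);
err:
	return ret;
}
```

Whichever step fails, control enters the label chain at the right depth and
falls through, so nothing that was set up before the failure stays behind.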



[PATCH net-next 05/10] drivers: net: xgene: Fix RSS indirection table fields

2016-07-29 Thread Iyappan Subramanian
This patch fixes the length of the FPSel and NxtFPSel fields, making them
5-bit values. XGENE_ENET1 keeps the previous 4-bit layout.

Signed-off-by: Quan Nguyen 
Signed-off-by: Iyappan Subramanian 
Tested-by: Fushen Chen 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_cle.c | 17 -
 drivers/net/ethernet/apm/xgene/xgene_enet_cle.h | 10 +++---
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_cle.c b/drivers/net/ethernet/apm/xgene/xgene_enet_cle.c
index 472c0fb..23d72af 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_cle.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_cle.c
@@ -32,12 +32,19 @@ static void xgene_cle_sband_to_hw(u8 frag, enum xgene_cle_prot_version ver,
SET_VAL(SB_HDRLEN, len);
 }
 
-static void xgene_cle_idt_to_hw(u32 dstqid, u32 fpsel,
+static void xgene_cle_idt_to_hw(struct xgene_enet_pdata *pdata,
+   u32 dstqid, u32 fpsel,
u32 nfpsel, u32 *idt_reg)
 {
-   *idt_reg =  SET_VAL(IDT_DSTQID, dstqid) |
-   SET_VAL(IDT_FPSEL, fpsel) |
-   SET_VAL(IDT_NFPSEL, nfpsel);
+   if (pdata->enet_id == XGENE_ENET1) {
+   *idt_reg = SET_VAL(IDT_DSTQID, dstqid) |
+  SET_VAL(IDT_FPSEL1, fpsel)  |
+  SET_VAL(IDT_NFPSEL1, nfpsel);
+   } else {
+   *idt_reg = SET_VAL(IDT_DSTQID, dstqid) |
+  SET_VAL(IDT_FPSEL, fpsel)   |
+  SET_VAL(IDT_NFPSEL, nfpsel);
+   }
 }
 
 static void xgene_cle_dbptr_to_hw(struct xgene_enet_pdata *pdata,
@@ -344,7 +351,7 @@ static int xgene_cle_set_rss_idt(struct xgene_enet_pdata *pdata)
nfpsel = 0;
idt_reg = 0;
 
-   xgene_cle_idt_to_hw(dstqid, fpsel, nfpsel, &idt_reg);
+   xgene_cle_idt_to_hw(pdata, dstqid, fpsel, nfpsel, &idt_reg);
ret = xgene_cle_dram_wr(&pdata->cle, &idt_reg, 1, i,
RSS_IDT, CLE_CMD_WR);
if (ret)
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_cle.h b/drivers/net/ethernet/apm/xgene/xgene_enet_cle.h
index 33c5f6b..9ac9f8e 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_cle.h
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_cle.h
@@ -196,9 +196,13 @@ enum xgene_cle_ptree_dbptrs {
 #define IDT_DSTQID_POS 0
 #define IDT_DSTQID_LEN 12
 #define IDT_FPSEL_POS  12
-#define IDT_FPSEL_LEN  4
-#define IDT_NFPSEL_POS 16
-#define IDT_NFPSEL_LEN 4
+#define IDT_FPSEL_LEN  5
+#define IDT_NFPSEL_POS 17
+#define IDT_NFPSEL_LEN 5
+#define IDT_FPSEL1_POS 12
+#define IDT_FPSEL1_LEN 4
+#define IDT_NFPSEL1_POS        16
+#define IDT_NFPSEL1_LEN        4
 
 struct xgene_cle_ptree_branch {
bool valid;
-- 
1.9.1
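
The two field layouts in this patch can be compared with a small packing
helper. This is a sketch only: set_field() and pack_idt() are hypothetical
helpers written for illustration, and the driver's own SET_VAL() macro may
mask and shift in a different way. The positions and lengths are the
IDT_*_POS and IDT_*_LEN values from the diff above.

```c
#include <assert.h>
#include <stdint.h>

/* Place the low `len` bits of val at bit position pos. */
static uint32_t set_field(unsigned int pos, unsigned int len, uint32_t val)
{
	uint32_t mask;

	assert(len > 0 && len < 32 && pos + len <= 32);
	mask = (1u << len) - 1;
	return (val & mask) << pos;
}

/* Pack one indirection-table entry; is_enet1 selects the 4-bit layout. */
static uint32_t pack_idt(int is_enet1, uint32_t dstqid, uint32_t fpsel,
			 uint32_t nfpsel)
{
	uint32_t reg = set_field(0, 12, dstqid);

	if (is_enet1)
		return reg | set_field(12, 4, fpsel) | set_field(16, 4, nfpsel);
	return reg | set_field(12, 5, fpsel) | set_field(17, 5, nfpsel);
}
```

With dstqid 0x123, fpsel 0x15 and nfpsel 0x0a, the 5-bit layout gives
0x155123, while the 4-bit layout truncates fpsel to 0x5 and gives 0xa5123.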



[PATCH net-next 03/10] drivers: net: xgene: fix: Delete descriptor rings and buffer pools

2016-07-29 Thread Iyappan Subramanian
xgene_enet_init_hw() should delete any descriptor rings and
buffer pools it has set up if cle_ops->cle_init() returns an error.

Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 5246457..6fc5698 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -1464,10 +1464,8 @@ static int xgene_enet_init_hw(struct xgene_enet_pdata *pdata)
buf_pool = pdata->rx_ring[i]->buf_pool;
xgene_enet_init_bufpool(buf_pool);
ret = xgene_enet_refill_bufpool(buf_pool, pdata->rx_buff_cnt);
-   if (ret) {
-   xgene_enet_delete_desc_rings(pdata);
-   return ret;
-   }
+   if (ret)
+   goto err;
}
 
dst_ring_num = xgene_enet_dst_ring_num(pdata->rx_ring[0]);
@@ -1484,7 +1482,7 @@ static int xgene_enet_init_hw(struct xgene_enet_pdata *pdata)
ret = pdata->cle_ops->cle_init(pdata);
if (ret) {
netdev_err(ndev, "Preclass Tree init error\n");
-   return ret;
+   goto err;
}
} else {
pdata->port_ops->cle_bypass(pdata, dst_ring_num, buf_pool->id);
@@ -1494,6 +1492,10 @@ static int xgene_enet_init_hw(struct xgene_enet_pdata *pdata)
pdata->mac_ops->init(pdata);
 
return ret;
+
+err:
+   xgene_enet_delete_desc_rings(pdata);
+   return ret;
 }
 
 static void xgene_enet_setup_ops(struct xgene_enet_pdata *pdata)
-- 
1.9.1
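
The single error label used in the patch above is the usual kernel cleanup
idiom: every failure path jumps to one place, so the teardown cannot be
forgotten on a newly added path. A minimal userspace sketch of that shape
follows. The names fake_pdata, fake_refill(), fake_cle_init() and
fake_delete_rings() are hypothetical stand-ins, not the driver's functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's private data. */
struct fake_pdata {
	int refill_ret;     /* value the fake refill step returns */
	int cle_ret;        /* value the fake classifier init returns */
	bool rings_deleted; /* set once the cleanup has run */
};

int fake_refill(struct fake_pdata *p)   { return p->refill_ret; }
int fake_cle_init(struct fake_pdata *p) { return p->cle_ret; }
void fake_delete_rings(struct fake_pdata *p) { p->rings_deleted = true; }

/* Every failure funnels through one label, so cleanup always runs. */
int fake_init_hw(struct fake_pdata *p)
{
	int ret;

	assert(p != NULL);

	ret = fake_refill(p);
	if (ret)
		goto err;

	ret = fake_cle_init(p);
	if (ret)
		goto err;

	return 0;

err:
	fake_delete_rings(p);
	return ret;
}
```

With this layout, a later patch that adds another init step only needs one
more "goto err" rather than its own copy of the teardown call.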



[PATCH net-next 08/10] drivers: net: xgene: Poll link status via GPIO

2016-07-29 Thread Iyappan Subramanian
When a 10GbE SFP+ module is not plugged in or the cable is not connected,
the link status register does not report the proper state due
to a floating signal. This patch checks the module-present status via a
GPIO to determine whether to ignore the link status register and report
link down.

Signed-off-by: Quan Nguyen 
Signed-off-by: Iyappan Subramanian 
Tested-by: Fushen Chen 
---
 drivers/net/ethernet/apm/xgene/Kconfig|  1 +
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c  | 15 +++
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h  |  1 +
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c |  6 ++
 4 files changed, 23 insertions(+)

diff --git a/drivers/net/ethernet/apm/xgene/Kconfig 
b/drivers/net/ethernet/apm/xgene/Kconfig
index 300e3b5..6c60a7d 100644
--- a/drivers/net/ethernet/apm/xgene/Kconfig
+++ b/drivers/net/ethernet/apm/xgene/Kconfig
@@ -4,6 +4,7 @@ config NET_XGENE
depends on ARCH_XGENE || COMPILE_TEST
select PHYLIB
select MDIO_XGENE
+   select GPIO_XGENE_SB
help
  This is the Ethernet driver for the on-chip ethernet interface on the
  APM X-Gene SoC.
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 383e7ad..bda386d 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -19,6 +19,7 @@
  * along with this program.  If not, see .
  */
 
+#include 
 #include "xgene_enet_main.h"
 #include "xgene_enet_hw.h"
 #include "xgene_enet_sgmac.h"
@@ -1322,6 +1323,18 @@ static int xgene_enet_check_phy_handle(struct 
xgene_enet_pdata *pdata)
return 0;
 }
 
+static void xgene_enet_gpiod_get(struct xgene_enet_pdata *pdata)
+{
+   struct device *dev = &pdata->pdev->dev;
+
+   if (pdata->phy_mode != PHY_INTERFACE_MODE_XGMII)
+   return;
+
+   pdata->sfp_rdy = gpiod_get(dev, "rxlos", GPIOD_IN);
+   if (IS_ERR(pdata->sfp_rdy))
+   pdata->sfp_rdy = gpiod_get(dev, "sfp", GPIOD_IN);
+}
+
 static int xgene_enet_get_resources(struct xgene_enet_pdata *pdata)
 {
struct platform_device *pdev;
@@ -1411,6 +1424,8 @@ static int xgene_enet_get_resources(struct 
xgene_enet_pdata *pdata)
if (ret)
return ret;
 
+   xgene_enet_gpiod_get(pdata);
+
pdata->clk = devm_clk_get(&pdev->dev, NULL);
if (IS_ERR(pdata->clk)) {
/* Firmware may have set up the clock already. */
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h 
b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
index 53f4a16..b339fc1 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
@@ -217,6 +217,7 @@ struct xgene_enet_pdata {
u8 tx_delay;
u8 rx_delay;
bool mdio_driver;
+   struct gpio_desc *sfp_rdy;
 };
 
 struct xgene_indirect_ctl {
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c
index 4087dba..d672e71 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c
@@ -18,6 +18,8 @@
  * along with this program.  If not, see .
  */
 
+#include 
+#include 
 #include "xgene_enet_main.h"
 #include "xgene_enet_hw.h"
 #include "xgene_enet_xgmac.h"
@@ -399,10 +401,14 @@ static void xgene_enet_link_state(struct work_struct 
*work)
 {
struct xgene_enet_pdata *pdata = container_of(to_delayed_work(work),
 struct xgene_enet_pdata, link_work);
+   struct gpio_desc *sfp_rdy = pdata->sfp_rdy;
struct net_device *ndev = pdata->ndev;
u32 link_status, poll_interval;
 
link_status = xgene_enet_link_status(pdata);
+   if (link_status && !IS_ERR(sfp_rdy) && !gpiod_get_value(sfp_rdy))
+   link_status = 0;
+
if (link_status) {
if (!netif_carrier_ok(ndev)) {
netif_carrier_on(ndev);
-- 
1.9.1
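
The link decision added to xgene_enet_link_state() in the patch above can be
read as a small truth table: the register value is trusted only when the GPIO
is either unavailable or reads high. A userspace sketch of just that decision
follows. The enum sfp_gpio and effective_link_up() are hypothetical names
introduced for illustration; the real code uses IS_ERR() and
gpiod_get_value() on a struct gpio_desc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tri-state for the module-present GPIO. */
enum sfp_gpio {
	SFP_GPIO_UNAVAILABLE, /* descriptor lookup failed; GPIO is ignored */
	SFP_GPIO_LOW,         /* module absent or signal lost */
	SFP_GPIO_HIGH         /* module present */
};

/*
 * Report link up only if the status register says so and, when a GPIO
 * is available, that GPIO reads high.
 */
bool effective_link_up(uint32_t link_status_reg, enum sfp_gpio gpio)
{
	if (!link_status_reg)
		return false;
	if (gpio == SFP_GPIO_LOW)
		return false;
	return true;
}
```

Note that a missing GPIO leaves the old behaviour untouched, which matches
the !IS_ERR(sfp_rdy) guard in the patch.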



[PATCH net-next 10/10] Documentation: dtb: xgene: Add rxlos GPIO mapping

2016-07-29 Thread Iyappan Subramanian
Signed-off-by: Quan Nguyen 
Signed-off-by: Iyappan Subramanian 
Tested-by: Fushen Chen 
---
 Documentation/devicetree/bindings/net/apm-xgene-enet.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/net/apm-xgene-enet.txt 
b/Documentation/devicetree/bindings/net/apm-xgene-enet.txt
index 05f705e3..b83ae67 100644
--- a/Documentation/devicetree/bindings/net/apm-xgene-enet.txt
+++ b/Documentation/devicetree/bindings/net/apm-xgene-enet.txt
@@ -24,6 +24,7 @@ Required properties for all the ethernet interfaces:
 - clocks: Reference to the clock entry.
 - local-mac-address: MAC address assigned to this device
 - phy-connection-type: Interface type between ethernet device and PHY device
+- rxlos-gpios: rxlos GPIO mapping
 
 Required properties for ethernet interfaces that have external PHY:
 - phy-handle: Reference to a PHY node connected to this device
-- 
1.9.1



[PATCH net-next 07/10] drivers: net: xgene: XFI PCS reset when link is down

2016-07-29 Thread Iyappan Subramanian
This patch fixes the link recovery issue, by doing PCS reset
when the link is down.

Signed-off-by: Fushen Chen 
Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.h|  6 
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c  |  1 +
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h  |  1 +
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c | 42 +++
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.h |  4 +++
 5 files changed, 54 insertions(+)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h 
b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
index 179a44d..8a8d055 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
@@ -124,6 +124,12 @@ enum xgene_enet_rm {
 #define MAC_READ_REG_OFFSET 0x0c
 #define MAC_COMMAND_DONE_REG_OFFSET 0x10
 
+#define PCS_ADDR_REG_OFFSET 0x00
+#define PCS_COMMAND_REG_OFFSET 0x04
+#define PCS_WRITE_REG_OFFSET 0x08
+#define PCS_READ_REG_OFFSET 0x0c
+#define PCS_COMMAND_DONE_REG_OFFSET 0x10
+
 #define MII_MGMT_CONFIG_ADDR   0x20
 #define MII_MGMT_COMMAND_ADDR  0x24
 #define MII_MGMT_ADDRESS_ADDR  0x28
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index d05f999..383e7ad 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -1435,6 +1435,7 @@ static int xgene_enet_get_resources(struct 
xgene_enet_pdata *pdata)
} else {
pdata->mcx_mac_addr = base_addr + BLOCK_AXG_MAC_OFFSET;
pdata->mcx_mac_csr_addr = base_addr + BLOCK_AXG_MAC_CSR_OFFSET;
+   pdata->pcs_addr = base_addr + BLOCK_PCS_OFFSET;
}
pdata->rx_buff_cnt = NUM_PKT_BUF;
 
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h 
b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
index 217546e..53f4a16 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
@@ -196,6 +196,7 @@ struct xgene_enet_pdata {
void __iomem *mcx_mac_addr;
void __iomem *mcx_mac_csr_addr;
void __iomem *base_addr;
+   void __iomem *pcs_addr;
void __iomem *ring_csr_addr;
void __iomem *ring_cmd_addr;
int phy_mode;
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c
index d53c053..4087dba 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c
@@ -84,6 +84,21 @@ static void xgene_enet_wr_mac(struct xgene_enet_pdata *pdata,
   wr_addr);
 }
 
+static void xgene_enet_wr_pcs(struct xgene_enet_pdata *pdata,
+ u32 wr_addr, u32 wr_data)
+{
+   void __iomem *addr, *wr, *cmd, *cmd_done;
+
+   addr = pdata->pcs_addr + PCS_ADDR_REG_OFFSET;
+   wr = pdata->pcs_addr + PCS_WRITE_REG_OFFSET;
+   cmd = pdata->pcs_addr + PCS_COMMAND_REG_OFFSET;
+   cmd_done = pdata->pcs_addr + PCS_COMMAND_DONE_REG_OFFSET;
+
+   if (!xgene_enet_wr_indirect(addr, wr, cmd, cmd_done, wr_addr, wr_data))
+   netdev_err(pdata->ndev, "PCS write failed, addr: %04x\n",
+  wr_addr);
+}
+
 static void xgene_enet_rd_csr(struct xgene_enet_pdata *pdata,
  u32 offset, u32 *val)
 {
@@ -122,6 +137,7 @@ static bool xgene_enet_rd_indirect(void __iomem *addr, void 
__iomem *rd,
 
return true;
 }
+
 static void xgene_enet_rd_mac(struct xgene_enet_pdata *pdata,
  u32 rd_addr, u32 *rd_data)
 {
@@ -137,6 +153,21 @@ static void xgene_enet_rd_mac(struct xgene_enet_pdata 
*pdata,
   rd_addr);
 }
 
+static void xgene_enet_rd_pcs(struct xgene_enet_pdata *pdata,
+ u32 rd_addr, u32 *rd_data)
+{
+   void __iomem *addr, *rd, *cmd, *cmd_done;
+
+   addr = pdata->pcs_addr + PCS_ADDR_REG_OFFSET;
+   rd = pdata->pcs_addr + PCS_READ_REG_OFFSET;
+   cmd = pdata->pcs_addr + PCS_COMMAND_REG_OFFSET;
+   cmd_done = pdata->pcs_addr + PCS_COMMAND_DONE_REG_OFFSET;
+
+   if (!xgene_enet_rd_indirect(addr, rd, cmd, cmd_done, rd_addr, rd_data))
+   netdev_err(pdata->ndev, "PCS read failed, addr: %04x\n",
+  rd_addr);
+}
+
 static int xgene_enet_ecc_init(struct xgene_enet_pdata *pdata)
 {
struct net_device *ndev = pdata->ndev;
@@ -171,6 +202,15 @@ static void xgene_xgmac_reset(struct xgene_enet_pdata 
*pdata)
xgene_enet_wr_mac(pdata, AXGMAC_CONFIG_0, 0);
 }
 
+static void xgene_pcs_reset(struct xgene_enet_pdata *pdata)
+{
+   u32 data;
+
+   xgene_enet_rd_pcs(pdata, PCS_CONTROL_1, &data);
+   xgene_enet_wr_pcs(pdata, PCS_CONTROL_1, data | PCS_CTRL_PCS_RST);
+   

[PATCH net-next 09/10] dtb: xgene: Add rxlos-gpios property

2016-07-29 Thread Iyappan Subramanian
Added rxlos GPIO mapping by adding rxlos-gpios property.

Signed-off-by: Quan Nguyen 
Signed-off-by: Iyappan Subramanian 
Tested-by: Fushen Chen 
---
 arch/arm64/boot/dts/apm/apm-mustang.dts | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/boot/dts/apm/apm-mustang.dts 
b/arch/arm64/boot/dts/apm/apm-mustang.dts
index b7fb5d9..32a961c 100644
--- a/arch/arm64/boot/dts/apm/apm-mustang.dts
+++ b/arch/arm64/boot/dts/apm/apm-mustang.dts
@@ -74,6 +74,7 @@
 
  {
status = "ok";
+   rxlos-gpios = < 12 1>;
 };
 
  {
-- 
1.9.1



[PATCH net-next 06/10] drivers: net: xgene: Change port init sequence

2016-07-29 Thread Iyappan Subramanian
This patch rearranges the port initialization sequence as recommended by
the hardware specification.  This patch also removes the mac_init() call from
xgene_enet_link_state(), as it was not required.

Signed-off-by: Iyappan Subramanian 
Tested-by: Fushen Chen 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c
index 9c6ad0d..d53c053 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c
@@ -216,12 +216,12 @@ static void xgene_xgmac_init(struct xgene_enet_pdata 
*pdata)
data |= CFG_RSIF_FPBUFF_TIMEOUT_EN;
xgene_enet_wr_csr(pdata, XG_RSIF_CONFIG_REG_ADDR, data);
 
-   xgene_enet_wr_csr(pdata, XG_CFG_BYPASS_ADDR, RESUME_TX);
-   xgene_enet_wr_csr(pdata, XGENET_RX_DV_GATE_REG_0_ADDR, 0);
xgene_enet_rd_csr(pdata, XG_ENET_SPARE_CFG_REG_ADDR, &data);
data |= BIT(12);
xgene_enet_wr_csr(pdata, XG_ENET_SPARE_CFG_REG_ADDR, data);
xgene_enet_wr_csr(pdata, XG_ENET_SPARE_CFG_REG_1_ADDR, 0x82);
+   xgene_enet_wr_csr(pdata, XGENET_RX_DV_GATE_REG_0_ADDR, 0);
+   xgene_enet_wr_csr(pdata, XG_CFG_BYPASS_ADDR, RESUME_TX);
 }
 
 static void xgene_xgmac_rx_enable(struct xgene_enet_pdata *pdata)
@@ -366,7 +366,6 @@ static void xgene_enet_link_state(struct work_struct *work)
if (link_status) {
if (!netif_carrier_ok(ndev)) {
netif_carrier_on(ndev);
-   xgene_xgmac_init(pdata);
xgene_xgmac_rx_enable(pdata);
xgene_xgmac_tx_enable(pdata);
netdev_info(ndev, "Link is Up - 10Gbps\n");
-- 
1.9.1



[PATCH net-next 02/10] drivers: net: xgene: fix: Add dma_unmap_single

2016-07-29 Thread Iyappan Subramanian
In addition to xgene_enet_delete_bufpool() freeing skbs, their associated
dma memory should also be unmapped.

Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index d1d6b5e..5246457 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -72,7 +72,6 @@ static int xgene_enet_refill_bufpool(struct 
xgene_enet_desc_ring *buf_pool,
skb = netdev_alloc_skb_ip_align(ndev, len);
if (unlikely(!skb))
return -ENOMEM;
-   buf_pool->rx_skb[tail] = skb;
 
dma_addr = dma_map_single(dev, skb->data, len, DMA_FROM_DEVICE);
if (dma_mapping_error(dev, dma_addr)) {
@@ -81,6 +80,8 @@ static int xgene_enet_refill_bufpool(struct 
xgene_enet_desc_ring *buf_pool,
return -EINVAL;
}
 
+   buf_pool->rx_skb[tail] = skb;
+
raw_desc->m1 = cpu_to_le64(SET_VAL(DATAADDR, dma_addr) |
   SET_VAL(BUFDATALEN, bufdatalen) |
   SET_BIT(COHERENT));
@@ -102,12 +103,21 @@ static u8 xgene_enet_hdr_len(const void *data)
 
 static void xgene_enet_delete_bufpool(struct xgene_enet_desc_ring *buf_pool)
 {
+   struct device *dev = ndev_to_dev(buf_pool->ndev);
+   struct xgene_enet_raw_desc16 *raw_desc;
+   dma_addr_t dma_addr;
int i;
 
/* Free up the buffers held by hardware */
for (i = 0; i < buf_pool->slots; i++) {
-   if (buf_pool->rx_skb[i])
+   if (buf_pool->rx_skb[i]) {
dev_kfree_skb_any(buf_pool->rx_skb[i]);
+
+   raw_desc = &buf_pool->raw_desc16[i];
+   dma_addr = GET_VAL(DATAADDR, le64_to_cpu(raw_desc->m1));
+   dma_unmap_single(dev, dma_addr, XGENE_ENET_MAX_MTU,
+DMA_FROM_DEVICE);
+   }
}
 }
 
-- 
1.9.1



Re: [PATCH] bpf: fix size of copy_to_user in percpu map.

2016-07-29 Thread Daniel Borkmann

On 07/29/2016 10:03 PM, William Tu wrote:

Hi Daniel and Alexei,

Thanks for the reply. My apology for too brief description. In short,
in my environment, running samples/bpf/test_map always segfault under
percpu array/hash map operations. I think it's due to stack
corruption.

I'm not using ARM. It's x86 in a VM with 2 vcpu. By printk() in kernel, I got
   num_possible_cpu == 64
   num_online_cpu == 2 == sysconf(_SC_NPROCESSORS_CONF)


Ok, thanks for the data!


So at samples/bpf/test_maps.c, test_percpu_arraymap_sanity(),
we define:
   long values[nr_cpus]; //nr_cpus=2

   ... // create map and update map ...

   /* check that key=0 is also found and zero initialized */
   assert(bpf_lookup_elem(map_fd, &key, values) == 0 &&
 values[0] == 0 && values[nr_cpus - 1] == 0);

Here we enter the bpf syscall, calls into kernel "map_lookup_elem()"
and we calculate:
   value_size = round_up(map->value_size, 8) * num_possible_cpus();
   // which in my case 8 * 64 = 512
   ...
   // then copy to user, which writes 512B to the "values[nr_cpus]" on stack
   if (copy_to_user(uvalue, value, value_size) != 0)

And I think this 512B write to userspace corrupts the userspace stack
and causes a coredump. After bpf_lookup_elem() calls, gdb shows
'values' points to memory address 0x0.

To fix it, I could either
1). declare values array based on num_possible_cpu in test_map.c,
   long values[64];
or 2) in kernel, only copying 8*2 = 16 byte from kernel to user.


But I think the patch of using num_online_cpus() would also not be correct
in the sense that f.e. your application could alloc an array at time X
where map lookup at time Y would not fit to the expectations anymore due
to CPU hotplugging (since apparently _SC_NPROCESSORS_CONF maps to online
CPUs in some cases). So also there you could potentially corrupt your
application or mem allocator in user space, or not all your valid data
might get copied, hmm.
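
The arithmetic in this thread (8 bytes per CPU times 64 possible CPUs gives
512 bytes) suggests sizing the userspace buffer from the possible-CPU count
rather than the online count. A minimal sketch under that assumption follows.
It assumes the caller has already read a kernel-style CPU list such as the
contents of /sys/devices/system/cpu/possible; the helper names
count_cpus_in_list() and percpu_lookup_bytes() are hypothetical and not part
of samples/bpf.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Count the CPUs named in a kernel-style list such as "0-63" or
 * "0-3,8-11". Returns -1 on malformed or empty input.
 */
int count_cpus_in_list(const char *s)
{
	int total = 0;

	while (*s && *s != '\n') {
		char *end;
		long lo = strtol(s, &end, 10);
		long hi = lo;

		if (end == s || lo < 0)
			return -1;
		if (*end == '-') {
			const char *p = end + 1;

			hi = strtol(p, &end, 10);
			if (end == p || hi < lo)
				return -1;
		}
		total += (int)(hi - lo + 1);
		if (*end == ',')
			end++;
		else if (*end && *end != '\n')
			return -1;
		s = end;
	}
	return total > 0 ? total : -1;
}

/* Bytes a per-CPU lookup writes: value size rounded up to 8, per CPU. */
size_t percpu_lookup_bytes(size_t value_size, int possible_cpus)
{
	size_t rounded = (value_size + 7) & ~(size_t)7;

	return rounded * (size_t)possible_cpus;
}
```

For the environment described above, count_cpus_in_list("0-63") gives 64 and
percpu_lookup_bytes(sizeof(long), 64) gives 512 on a 64-bit build, which is
the size the values[] array would have to be.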


Regards,
William


On Fri, Jul 29, 2016 at 12:54 AM, Daniel Borkmann  wrote:

On 07/29/2016 08:47 AM, Alexei Starovoitov wrote:


On Thu, Jul 28, 2016 at 05:42:21PM -0700, William Tu wrote:


The total size of value copy_to_user() writes to userspace should
be the (current number of cpu) * (value size), instead of
num_possible_cpus() * (value size).  Found by samples/bpf/test_maps.c,
which always copies 512 byte to userspace, crashing the userspace
program stack.



hmm. I'm missing something. The sample code assumes no cpu hotplug,
so sysconf(_SC_NPROCESSORS_CONF) == num_possible_cpu == num_online_cpu,
unless the crazy INIT_ALL_POSSIBLE config option is used.



Are you using ARM by chance? What is the count that you get in
user space and from kernel side?

http://lists.infradead.org/pipermail/linux-arm-kernel/2011-June/054177.html




igb: question regarding auto-negotiation

2016-07-29 Thread Dominic Curran

Hi

This question refers to igb codebase.
I have a question regarding the setting of hw->mac.autoneg.

Is it correct to say for the igb driver:
   "if speed=1000 and duplex=FULL and media_type=COPPER then only
   auto-negotiate enable is supported"


i.e.
   with these settings (speed/duplex/media_type), auto-negotiate
   can _not_ be disabled.  Correct?


I say this for two reasons:
1) The code in igb_set_spd_dplx() seems to indicate it:

   case SPEED_1000 + DUPLEX_FULL:
mac->autoneg = 1;
adapter->hw.phy.autoneg_advertised = ADVERTISE_1000_FULL;
break;

2) Instrumenting the driver, I always see the autoneg code in
e1000_check_for_copper_link_generic() get called after an igb_reset().



Have I understood correctly?

thanks in advance
dom



[iproute PATCH 1/2] include: Add linux/sctp.h

2016-07-29 Thread Phil Sutter
This header does not exist in this form upstream yet, as it contains
struct sctp_info which is required for SCTP support in 'ss' and hasn't
been exported yet.

Signed-off-by: Phil Sutter 
---
 include/linux/sctp.h | 1005 ++
 1 file changed, 1005 insertions(+)
 create mode 100644 include/linux/sctp.h

diff --git a/include/linux/sctp.h b/include/linux/sctp.h
new file mode 100644
index 0..eee08c066679e
--- /dev/null
+++ b/include/linux/sctp.h
@@ -0,0 +1,1005 @@
+/* SCTP kernel implementation
+ * (C) Copyright IBM Corp. 2001, 2004
+ * Copyright (c) 1999-2000 Cisco, Inc.
+ * Copyright (c) 1999-2001 Motorola, Inc.
+ * Copyright (c) 2002 Intel Corp.
+ *
+ * This file is part of the SCTP kernel implementation
+ *
+ * This header represents the structures and constants needed to support
+ * the SCTP Extension to the Sockets API.
+ *
+ * This SCTP implementation is free software;
+ * you can redistribute it and/or modify it under the terms of
+ * the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This SCTP implementation is distributed in the hope that it
+ * will be useful, but WITHOUT ANY WARRANTY; without even the implied
+ * 
+ * warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with GNU CC; see the file COPYING.  If not, see
+ * .
+ *
+ * Please send any bug reports or fixes you make to the
+ * email address(es):
+ *lksctp developers 
+ *
+ * Or submit a bug report through the following website:
+ *http://www.sf.net/projects/lksctp
+ *
+ * Written or modified by:
+ *La Monte H.P. Yarroll
+ *R. Stewart   
+ *K. Morneau   
+ *Q. Xie   
+ *Karl Knutson 
+ *Jon Grimm
+ *Daisy Chang  
+ *Ryan Layer   
+ *Ardelle Fan  
+ *Sridhar Samudrala
+ *Inaky Perez-Gonzalez 
+ *Vlad Yasevich
+ *
+ * Any bugs reported given to us we will try to fix... any fixes shared will
+ * be incorporated into the next SCTP release.
+ */
+
+#ifndef _SCTP_H
+#define _SCTP_H
+
+#include 
+#include 
+
+typedef __s32 sctp_assoc_t;
+
+/* The following symbols come from the Sockets API Extensions for
+ * SCTP .
+ */
+#define SCTP_RTOINFO   0
+#define SCTP_ASSOCINFO  1
+#define SCTP_INITMSG   2
+#define SCTP_NODELAY   3   /* Get/set nodelay option. */
+#define SCTP_AUTOCLOSE 4
+#define SCTP_SET_PEER_PRIMARY_ADDR 5
+#define SCTP_PRIMARY_ADDR  6
+#define SCTP_ADAPTATION_LAYER  7
+#define SCTP_DISABLE_FRAGMENTS 8
+#define SCTP_PEER_ADDR_PARAMS  9
+#define SCTP_DEFAULT_SEND_PARAM 10
+#define SCTP_EVENTS 11
+#define SCTP_I_WANT_MAPPED_V4_ADDR 12  /* Turn on/off mapped v4 addresses  */
+#define SCTP_MAXSEG 13  /* Get/set maximum fragment. */
+#define SCTP_STATUS 14
+#define SCTP_GET_PEER_ADDR_INFO 15
+#define SCTP_DELAYED_ACK_TIME  16
+#define SCTP_DELAYED_ACK SCTP_DELAYED_ACK_TIME
+#define SCTP_DELAYED_SACK SCTP_DELAYED_ACK_TIME
+#define SCTP_CONTEXT   17
+#define SCTP_FRAGMENT_INTERLEAVE   18
+#define SCTP_PARTIAL_DELIVERY_POINT 19 /* Set/Get partial delivery point */
+#define SCTP_MAX_BURST 20  /* Set/Get max burst */
+#define SCTP_AUTH_CHUNK 21  /* Set only: add a chunk type to authenticate */
+#define SCTP_HMAC_IDENT 22
+#define SCTP_AUTH_KEY  23
+#define SCTP_AUTH_ACTIVE_KEY   24
+#define SCTP_AUTH_DELETE_KEY   25
+#define SCTP_PEER_AUTH_CHUNKS  26  /* Read only */
+#define SCTP_LOCAL_AUTH_CHUNKS 27  /* Read only */
+#define SCTP_GET_ASSOC_NUMBER  28  /* Read only */
+#define SCTP_GET_ASSOC_ID_LIST 29  /* Read only */
+#define SCTP_AUTO_ASCONF   30
+#define SCTP_PEER_ADDR_THLDS   31
+#define SCTP_RECVRCVINFO   32
+#define SCTP_RECVNXTINFO   33
+#define SCTP_DEFAULT_SNDINFO   34
+
+/* Internal Socket Options. Some of the sctp library functions are
+ * implemented using these socket options.
+ */
+#define SCTP_SOCKOPT_BINDX_ADD 100 /* BINDX requests for adding addrs */
+#define SCTP_SOCKOPT_BINDX_REM 101 /* BINDX requests for removing addrs. */
+#define SCTP_SOCKOPT_PEELOFF   102 /* peel off association. */
+/* Options 104-106 are deprecated and removed. Do not use this space */
+#define SCTP_SOCKOPT_CONNECTX_OLD  107 /* CONNECTX old requests. */
+#define 

[iproute PATCH 0/2] ss: Implement sctp_diag support

2016-07-29 Thread Phil Sutter
This patch series adds the necessary bits to make use of recently added
sctp_diag module in Linux. There are a few pending kernel fixes
currently under review which this series kind of depends on, but it is
already useable as is (although a bit buggy here and there).

The first patch adds include/linux/sctp.h which albeit existing in
kernel uapi headers contains modifications included in above mentioned
kernel fixes so doesn't comply with upstream and therefore shouldn't be
applied as-is (at least not now). Probably it will be obsoleted by the
usual kernel headers import anyway.

So please don't consider this series as the final word, but rather a
base for early review/testing and feedback.

Phil Sutter (2):
  include: Add linux/sctp.h
  ss: Add support for SCTP protocol

 include/linux/sctp.h | 1005 ++
 misc/ss.c|  212 ++-
 2 files changed, 1209 insertions(+), 8 deletions(-)
 create mode 100644 include/linux/sctp.h

-- 
2.8.2



[iproute PATCH 2/2] ss: Add support for SCTP protocol

2016-07-29 Thread Phil Sutter
This makes use of the sctp_diag interface recently added to the kernel.

Joint work with Xin Long who provided the PoC implementation which I
merely polished up a bit.

Signed-off-by: Phil Sutter 
---
 misc/ss.c | 212 +++---
 1 file changed, 204 insertions(+), 8 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index e758f5720a452..6a8f65af259f9 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define MAGIC_SEQ 123456
 
@@ -101,6 +102,7 @@ int show_header = 1;
 /* If show_users & show_proc_ctx only do user_ent_hash_build() once */
 int user_ent_hash_build_init;
 int follow_events;
+int sctp_ino;
 
 int netid_width;
 int state_width;
@@ -110,6 +112,7 @@ int serv_width;
 int screen_width;
 
 static const char *TCP_PROTO = "tcp";
+static const char *SCTP_PROTO = "sctp";
 static const char *UDP_PROTO = "udp";
 static const char *RAW_PROTO = "raw";
 static const char *dg_proto;
@@ -125,13 +128,14 @@ enum {
PACKET_DG_DB,
PACKET_R_DB,
NETLINK_DB,
+   SCTP_DB,
MAX_DB
 };
 
 #define PACKET_DBM ((1<ino)
+   return false;
+   return true;
+}
+
 static void sock_state_print(struct sockstat *s, const char *sock_name)
 {
if (netid_width)
-   printf("%-*s ", netid_width, sock_name);
-   if (state_width)
-   printf("%-*s ", state_width, sstate_name[s->state]);
+   printf("%-*s ", netid_width,
+  is_sctp_assoc(s, sock_name) ? "" : sock_name);
+   if (state_width) {
+   if (is_sctp_assoc(s, sock_name))
+   printf("`- %-*s ", state_width - 3,
+  sctp_sstate_name[s->state]);
+   else
+   printf("%-*s ", state_width, sstate_name[s->state]);
+   }
 
printf("%-6d %-6d ", s->rq, s->wq);
 }
@@ -901,6 +950,8 @@ static void init_service_resolver(void)
c->proto = TCP_PROTO;
else if (strcmp(proto, UDP_PROTO) == 0)
c->proto = UDP_PROTO;
+   else if (strcmp(proto, SCTP_PROTO) == 0)
+   c->proto = SCTP_PROTO;
else
c->proto = NULL;
c->next = rlist;
@@ -1628,6 +1679,8 

Re: [PATCH v5 4/8] thunderbolt: Communication with the ICM (firmware)

2016-07-29 Thread Greg KH
On Fri, Jul 29, 2016 at 02:02:24PM -0700, Stephen Hemminger wrote:
> On Thu, 28 Jul 2016 11:15:17 +0300
> Amir Levy  wrote:
> 
> > +static LIST_HEAD(controllers_list);
> > +static DECLARE_RWSEM(controllers_list_rwsem);
> 
> Why use a semaphore when simple spinlock or mutex would be better?

And never use a RW semaphore unless you can benchmark the difference
from a normal lock.  If you can't benchmark it, then don't use it...


Re: [PATCH net] tcp: fix functions of tcp_congestion_ops from being called before initialization

2016-07-29 Thread Li, Ji
Thank you for the reply. I don’t think there would be a kernel crash. But there
must be some unexpected behavior caused by calling these functions before
initialization. Let’s still use dctcp as an example.

If SYN loss happens during active open, dctcp_ssthresh() is called to calculate
the new ssthresh using an uninitialized dctcp_alpha (i.e. 0), instead of using
the alpha specified as a module parameter. Is this expected?  Another example:
when the ACK for the SYN is being processed, dctcp_update_alpha() is called
with an uninitialized prior_snd_una (again, 0). That makes the local variable
acked_bytes equal to tp->snd_una, which is wrong, and it is then used to
calculate the new alpha. I agree that alpha will be initialized eventually when
.init() gets called. But what is the point of invoking those functions with
uninitialized parameters in the first place?

The possible unexpected effect for a particular congestion control depends on
how that algorithm uses its parameters. IMHO, it is unreasonable and dangerous
to call a ca_ops function before parameters that are supposed to be initialized
to non-zero values have been set.

By “non-established state”, are you asking about TCP_SYN_SENT/TCP_SYN_RECV?
In that case, the patch falls back to tcp_reno_ssthresh() for .ssthresh() if
uninitialized, and .cong_avoid() will not be called if uninitialized. My
impression is that init_cwnd should not grow by SYN/ACK or its
acknowledgement in 3WHS according to RFC 3390. Please let
me know if that is wrong. But if ca_ops functions really need to be called
during 3WHS, why don’t we initialize them earlier?
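
The fallback described here (use the Reno ssthresh until the module's
.init() has run) can be modelled outside the kernel. The sketch below is a
userspace illustration only. The fake_* types and functions are hypothetical,
and fake_scaled_ssthresh() is loosely modelled on an alpha-scaled reduction,
not copied from tcp_dctcp.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct fake_sock;

/* Hypothetical, cut-down congestion control ops table. */
struct fake_ca_ops {
	uint32_t (*ssthresh)(const struct fake_sock *sk);
};

struct fake_sock {
	uint32_t snd_cwnd;
	bool ca_initialized;          /* set once the module's init has run */
	const struct fake_ca_ops *ca_ops;
	uint32_t alpha;               /* private state that init would set up */
};

/* Reno-style fallback: half the window, but never below 2. */
uint32_t fake_reno_ssthresh(const struct fake_sock *sk)
{
	uint32_t half = sk->snd_cwnd >> 1;

	return half > 2 ? half : 2;
}

/* A module hook whose result depends on private state (alpha). */
uint32_t fake_scaled_ssthresh(const struct fake_sock *sk)
{
	uint32_t cut = (sk->snd_cwnd * sk->alpha) >> 11;
	uint32_t v = sk->snd_cwnd - cut;

	return v > 2 ? v : 2;
}

/* Dispatch to the module only after its state has been initialized. */
uint32_t fake_ca_ssthresh(const struct fake_sock *sk)
{
	assert(sk != NULL && sk->ca_ops != NULL);

	if (sk->ca_initialized)
		return sk->ca_ops->ssthresh(sk);
	return fake_reno_ssthresh(sk);
}
```

With alpha still zero, calling the module hook directly would return the
window unchanged, so a loss before initialization would not reduce ssthresh
at all. The guarded dispatch returns the Reno value instead.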


On 7/29/16, 5:09 AM, "Florian Westphal"  wrote:

Li, Ji  wrote:
> In Linux 3.17 and earlier, tcp_init_congestion_ops (i.e. tcp_reno) is
> used as the ca_ops during 3WHS, and after 3WHS, ca_ops is assigned as 
> the default congestion control set by sysctl and immediately its parameters
> stored in icsk_ca_priv[] are initialized. Commit 55d8694fa82c ("net:
> tcp: assign tcp cong_ops when tcp sk is created") splits assignment and
> initialization into two steps: assignment is done before SYN or SYN-ACK
> is sent out; initialization is done after 3WHS (assume without
> fastopen). But this can cause out-of-order invocation for ca_ops functions
> other than .init() during 3WHS, as they could be called before its
> parameters get initialized. It may cause unexpected behavior for
> congestion controls, and make troubles for those that need dynamic
> object allocation, like tcp_cdg etc.

What exactly is the problem?
Kernel crash?

AFAICS cdg can cope with NULL ca->gradients.

> We used tcp_dctcp as an example to visualize the problem, and set it as
> default congestion control via sysctl. Three parameters
> (ca->prior_snd_una, ca->prior_rcv_nxt, ca->dctcp_alpha) were monitored
> when functions, such as dctcp_update_alpha() and dctcp_ssthresh(), are
> called during 3WHS. All of three are found to be zero, which is likely
> impossible if dctcp_init() was called ahead, where those three
> parameters should be initialized. Some other congestion controls are
> examined too and the same problem was reproduced.

Why is this a problem?

> diff --git a/include/net/tcp.h b/include/net/tcp.h
> +{
> +   if (inet_csk(sk)->icsk_ca_initialized)
> +   return inet_csk(sk)->icsk_ca_ops->ssthresh(sk);
> +   else
> +   return tcp_reno_ssthresh(sk);
> +}
> +
>  /* Enter Loss state. If we detect SACK reneging, forget all SACK information
>   * and reset tags completely, otherwise preserve SACKs. If receiver
>   * dropped its ofo queue, we will know this due to reneging detection.
> @@ -1896,7 +1904,7 @@ void tcp_enter_loss(struct sock *sk)
> !after(tp->high_seq, tp->snd_una) ||
> (icsk->icsk_ca_state == TCP_CA_Loss && !icsk->icsk_retransmits)) {
> tp->prior_ssthresh = tcp_current_ssthresh(sk);
> -   tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk);
> +   tp->snd_ssthresh = tcp_ca_ssthresh(sk);
> tcp_ca_event(sk, CA_EVENT_LOSS);
> tcp_init_undo(tp);
> }

Can you explain how we can do loss recovery on a non-established
connection ?

> @@ -3335,7 +3343,8 @@ static void tcp_cong_control(struct sock *sk, u32 ack, u32 acked_sacked,
> if (tcp_in_cwnd_reduction(sk)) {
> /* Reduce cwnd if state mandates */
> tcp_cwnd_reduction(sk, acked_sacked, flag);
> -   } else if (tcp_may_raise_cwnd(sk, flag)) {
> +   } else if (tcp_may_raise_cwnd(sk, flag) &&
> +  inet_csk(sk)->icsk_ca_initialized) {
> /* Advance cwnd if state allows */
> tcp_cong_avoid(sk, ack, acked_sacked);


Re: [PATCH v5 6/8] thunderbolt: Networking transmit and receive

2016-07-29 Thread Stephen Hemminger
On Thu, 28 Jul 2016 11:15:19 +0300
Amir Levy  wrote:

> + /* pad short packets */
> + if (unlikely(skb->len < ETH_ZLEN)) {
> + int pad_len = ETH_ZLEN - skb->len;
> +
> + /* The skb is freed on error */
> + if (unlikely(skb_pad(skb, pad_len))) {
> + cleaned_count += frame_count;
> + continue;
> + }
> + __skb_put(skb, pad_len);
> + }

Packets should be padded on transmit, not on receive??


Re: [PATCH v5 4/8] thunderbolt: Communication with the ICM (firmware)

2016-07-29 Thread Stephen Hemminger
On Thu, 28 Jul 2016 11:15:17 +0300
Amir Levy  wrote:

> +int nhi_send_message(struct tbt_nhi_ctxt *nhi_ctxt, enum pdf_value pdf,
> +  u32 msg_len, const u8 *msg, bool ignore_icm_resp)
> +{

Why not make msg a void * and not have to do so many casts?


Re: [PATCH v5 4/8] thunderbolt: Communication with the ICM (firmware)

2016-07-29 Thread Stephen Hemminger
On Thu, 28 Jul 2016 11:15:17 +0300
Amir Levy  wrote:

> +static LIST_HEAD(controllers_list);
> +static DECLARE_RWSEM(controllers_list_rwsem);

Why use a semaphore when simple spinlock or mutex would be better?


Re: [PATCH 2/3] sctp_diag: export timer value only if it is active

2016-07-29 Thread Marcelo Ricardo Leitner
On Fri, Jul 29, 2016 at 06:59:39PM +0200, Phil Sutter wrote:
> Since it is exported as unsigned value, userspace has no way detecting
> whether it is negative or just very large. Therefore do this in kernel
> space where it is a simple comparison.
> 
> Signed-off-by: Phil Sutter 
> ---
>  net/sctp/sctp_diag.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/net/sctp/sctp_diag.c b/net/sctp/sctp_diag.c
> index f69edcf219e51..0ad6033a7330c 100644
> --- a/net/sctp/sctp_diag.c
> +++ b/net/sctp/sctp_diag.c
> @@ -40,10 +40,12 @@ static void inet_diag_msg_sctpasoc_fill(struct 
> inet_diag_msg *r,
>   }
>  
>   r->idiag_state = asoc->state;
> - r->idiag_timer = SCTP_EVENT_TIMEOUT_T3_RTX;
> - r->idiag_retrans = asoc->rtx_data_chunks;
> - r->idiag_expires = jiffies_to_msecs(
> - asoc->timeouts[SCTP_EVENT_TIMEOUT_T3_RTX] - jiffies);

I think we have two issues here, prior to your patch, but I noticed
while reviewing it :-)

This array is actually not based on jiffies but on intervals instead, as
per:

sm_sideeffect.c:
        case SCTP_CMD_TIMER_START:   [1]
                timer = &asoc->timers[cmd->obj.to];
                timeout = asoc->timeouts[cmd->obj.to];   <---
                BUG_ON(!timeout);

                timer->expires = jiffies + timeout;  <---

But more importantly, this array is actually not used for this timeout
and the timeout is sctp_transport dependent, as per:

/* Schedule retransmission on the given transport */
void sctp_transport_immediate_rtx(struct sctp_transport *t)
{
        /* Stop pending T3_rtx_timer */
        if (del_timer(&t->T3_rtx_timer))
                sctp_transport_put(t);

        sctp_retransmit(&t->asoc->outqueue, t, SCTP_RTXR_T3_RTX);
        if (!timer_pending(&t->T3_rtx_timer)) {
                if (!mod_timer(&t->T3_rtx_timer, jiffies + t->rto))
                        sctp_transport_hold(t);

Note how on sctp_get_sctp_info() it fetches the RTO (which is T3_RTX)
this way:
info->sctpi_p_rto = jiffies_to_msecs(prim->rto);
If we want to know how long is left for the timer to expire, we have to
read directly from it.

With git grep -A 1 TIMER_START we can confirm that [1] is never hit for
SCTP_EVENT_TIMEOUT_T3_RTX. Yet, the asoc is allocated with kzalloc(), so
I guess you were just reading -jiffies in there.
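
A minimal user-space sketch of the point about reading the remaining time
from the timer itself, not kernel code: the value has to be derived from the
armed timer's absolute expiry, and the comparison should survive wraparound
of the tick counter. The names fake_timer, tick_after() and
timer_remaining_ms() are made up for illustration, and HZ is assumed to be
1000.

```c
/* Assumption for this sketch only: 1000 ticks per second. */
#define HZ 1000UL

/* Hypothetical stand-in for a kernel timer: armed flag plus absolute expiry. */
struct fake_timer {
        int pending;            /* non-zero while the timer is armed */
        unsigned long expires;  /* absolute expiry, in ticks */
};

/* Wraparound-safe "a is later than b", same idea as the kernel's time_after(). */
static int tick_after(unsigned long a, unsigned long b)
{
        return (long)(b - a) < 0;
}

/* Milliseconds until expiry; 0 if the timer is not armed or already expired. */
static unsigned long timer_remaining_ms(const struct fake_timer *t,
                                        unsigned long now)
{
        if (!t->pending || !tick_after(t->expires, now))
                return 0;
        /* Unsigned subtraction gives the right distance even across a wrap. */
        return (t->expires - now) * 1000UL / HZ;
}
```

Under these assumptions, a timer armed 250 ticks ahead reports 250 ms, and a
timer that is not pending reports 0 instead of a huge unsigned value.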

Note, however, that rtx_data_chunks is an accumulated statistic, so it is
fine as is, and that we may have multiple T3 timers running at once, with
different timeouts.

Xin, ideas on how we can fix this? I'm not sure if we can dump
per-transport info in there. Not as it is now, I guess.

> + if (asoc->timeouts[SCTP_EVENT_TIMEOUT_T3_RTX] > jiffies) {
> + r->idiag_timer = SCTP_EVENT_TIMEOUT_T3_RTX;
> + r->idiag_retrans = asoc->rtx_data_chunks;
> + r->idiag_expires = jiffies_to_msecs(
> + asoc->timeouts[SCTP_EVENT_TIMEOUT_T3_RTX] - jiffies);
> + }
>  }
>  
>  static int inet_diag_msg_sctpladdrs_fill(struct sk_buff *skb,
> -- 
> 2.8.2
> 


Re: [PATCH] bpf: fix size of copy_to_user in percpu map.

2016-07-29 Thread William Tu
Hi Daniel and Alexei,

Thanks for the reply. My apologies for the overly brief description. In short,
in my environment, running samples/bpf/test_maps always segfaults under
percpu array/hash map operations. I think it's due to stack
corruption.

I'm not using ARM. It's x86 in a VM with 2 vcpu. By printk() in kernel, I got
  num_possible_cpu == 64
  num_online_cpu == 2 == sysconf(_SC_NPROCESSORS_CONF)

So at samples/bpf/test_maps.c, test_percpu_arraymap_sanity(),
we define:
  long values[nr_cpus]; //nr_cpus=2

  ... // create map and update map ...

  /* check that key=0 is also found and zero initialized */
  assert(bpf_lookup_elem(map_fd, &key, values) == 0 &&
values[0] == 0 && values[nr_cpus - 1] == 0);

Here we enter the bpf syscall, which calls into the kernel's map_lookup_elem(),
and we calculate:
  value_size = round_up(map->value_size, 8) * num_possible_cpus();
  // which in my case 8 * 64 = 512
  ...
  // then copy to user, which writes 512B to the "values[nr_cpus]" on stack
  if (copy_to_user(uvalue, value, value_size) != 0)

And I think this 512B write to userspace corrupts the userspace stack
and causes a coredump. After bpf_lookup_elem() calls, gdb shows
'values' points to memory address 0x0.

To fix it, I could either
1) declare the values array based on num_possible_cpus() in test_maps.c,
  long values[64];
or 2) in the kernel, only copy 8*2 = 16 bytes from kernel to user.
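
The size mismatch described above can be sketched in plain C. This only
illustrates the arithmetic and is not the kernel or test_maps.c code;
round_up_pow2(), percpu_lookup_bytes() and percpu_buf_fits() are hypothetical
helper names.

```c
#include <stddef.h>

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t round_up_pow2(size_t n, size_t align)
{
        return (n + align - 1) & ~(align - 1);
}

/* Bytes copied to user space for one per-CPU lookup, per the formula above. */
static size_t percpu_lookup_bytes(size_t value_size, unsigned int possible_cpus)
{
        return round_up_pow2(value_size, 8) * possible_cpus;
}

/* Non-zero when a user buffer sized for buf_cpus CPUs can hold the copy. */
static int percpu_buf_fits(size_t value_size, unsigned int buf_cpus,
                           unsigned int possible_cpus)
{
        return round_up_pow2(value_size, 8) * buf_cpus >=
               percpu_lookup_bytes(value_size, possible_cpus);
}
```

For the numbers in this report, percpu_lookup_bytes(8, 64) is 512, while a
buffer sized for 2 CPUs holds only 16 bytes, so percpu_buf_fits(8, 2, 64)
is 0.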

Regards,
William


On Fri, Jul 29, 2016 at 12:54 AM, Daniel Borkmann  wrote:
> On 07/29/2016 08:47 AM, Alexei Starovoitov wrote:
>>
>> On Thu, Jul 28, 2016 at 05:42:21PM -0700, William Tu wrote:
>>>
>>> The total size of value copy_to_user() writes to userspace should
>>> be the (current number of cpu) * (value size), instead of
>>> num_possible_cpus() * (value size).  Found by samples/bpf/test_maps.c,
>>> which always copies 512 bytes to userspace, crashing the userspace
>>> program stack.
>>
>>
>> hmm. I'm missing something. The sample code assumes no cpu hotplug,
>> so sysconf(_SC_NPROCESSORS_CONF) == num_possible_cpu == num_online_cpu,
>> unless some crazy INIT_ALL_POSSIBLE config option is used.
>
>
> Are you using ARM by chance? What is the count that you get in
> user space and from kernel side?
>
> http://lists.infradead.org/pipermail/linux-arm-kernel/2011-June/054177.html


Re: [PATCH v5] net: sched: convert qdisc linked list to hashtable

2016-07-29 Thread kbuild test robot
Hi,

[auto build test WARNING on v4.7-rc7]
[cannot apply to net/master net-next/master ipsec-next/master]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Jiri-Kosina/net-sched-convert-qdisc-linked-list-to-hashtable/20160729-155412
config: x86_64-randconfig-s1-07292101 (attached as .config)
compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from drivers/net/ethernet/dec/tulip/de4x5.c:480:
   drivers/net/ethernet/dec/tulip/de4x5.h:864:1: warning: "HASH_BITS" redefined
   In file included from include/linux/netdevice.h:55,
from drivers/net/ethernet/dec/tulip/de4x5.c:459:
>> include/linux/hashtable.h:27:1: warning: this is the location of the previous definition
   drivers/net/ethernet/dec/tulip/de4x5.o: warning: objtool: dma_free_coherent()+0x4b: function has unreachable instruction

vim +27 include/linux/hashtable.h

d9b482c8 Sasha Levin  2012-10-30  11  #include 
d9b482c8 Sasha Levin  2012-10-30  12  #include 
d9b482c8 Sasha Levin  2012-10-30  13  #include 
d9b482c8 Sasha Levin  2012-10-30  14  
d9b482c8 Sasha Levin  2012-10-30  15  #define DEFINE_HASHTABLE(name, bits)  \
d9b482c8 Sasha Levin  2012-10-30  16          struct hlist_head name[1 << (bits)] =  \
d9b482c8 Sasha Levin  2012-10-30  17                          { [0 ... ((1 << (bits)) - 1)] = HLIST_HEAD_INIT }
d9b482c8 Sasha Levin  2012-10-30  18  
6180d9de Eric Dumazet 2015-11-18  19  #define DEFINE_READ_MOSTLY_HASHTABLE(name, bits)  \
6180d9de Eric Dumazet 2015-11-18  20          struct hlist_head name[1 << (bits)] __read_mostly =  \
6180d9de Eric Dumazet 2015-11-18  21                          { [0 ... ((1 << (bits)) - 1)] = HLIST_HEAD_INIT }
6180d9de Eric Dumazet 2015-11-18  22  
d9b482c8 Sasha Levin  2012-10-30  23  #define DECLARE_HASHTABLE(name, bits)  \
d9b482c8 Sasha Levin  2012-10-30  24          struct hlist_head name[1 << (bits)]
d9b482c8 Sasha Levin  2012-10-30  25  
d9b482c8 Sasha Levin  2012-10-30  26  #define HASH_SIZE(name) (ARRAY_SIZE(name))
d9b482c8 Sasha Levin  2012-10-30 @27  #define HASH_BITS(name) ilog2(HASH_SIZE(name))
d9b482c8 Sasha Levin  2012-10-30  28  
d9b482c8 Sasha Levin  2012-10-30  29  /* Use hash_32 when possible to allow for fast 32bit hashing in 64bit kernels. */
d9b482c8 Sasha Levin  2012-10-30  30  #define hash_min(val, bits)  \
d9b482c8 Sasha Levin  2012-10-30  31          (sizeof(val) <= 4 ? hash_32(val, bits) : hash_long(val, bits))
d9b482c8 Sasha Levin  2012-10-30  32  
d9b482c8 Sasha Levin  2012-10-30  33  static inline void __hash_init(struct hlist_head *ht, unsigned int sz)
d9b482c8 Sasha Levin  2012-10-30  34  {
d9b482c8 Sasha Levin  2012-10-30  35          unsigned int i;

:: The code at line 27 was first introduced by commit
:: d9b482c8ba1973a189f2d4c8175d405b87fbf2d7 hashtable: introduce a small 
and naive hashtable

:: TO: Sasha Levin <levinsasha...@gmail.com>
:: CC: Linus Torvalds <torva...@linux-foundation.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation




Re: [PATCH] net: dsa: bcm_sf2: Unwind errors in correct order

2016-07-29 Thread Vivien Didelot
Florian Fainelli  writes:

> In case we cannot complete bcm_sf2_sw_setup() for any reason, and we
> go to the out_unmap label, but the MDIO bus has not been registered yet,
> we will hit the BUG condition in drivers/net/phy/mdio_bus.c about the
> bus not being registered. Fix this by dedicating a specific label for
> when we fail after the MDIO bus has been successfully registered.
>
> Fixes: 461cd1b03e32 ("net: dsa: bcm_sf2: Register our slave MDIO bus")
> Signed-off-by: Florian Fainelli 

Reviewed-by: Vivien Didelot 


Re: [PATCH] net: dsa: bcm_sf2: Unwind errors in correct order

2016-07-29 Thread Florian Fainelli
On 07/29/2016 12:35 PM, Florian Fainelli wrote:
> In case we cannot complete bcm_sf2_sw_setup() for any reason, and we
> go to the out_unmap label, but the MDIO bus has not been registered yet,
> we will hit the BUG condition in drivers/net/phy/mdio_bus.c about the
> bus not being registered. Fix this by dedicating a specific label for
> when we fail after the MDIO bus has been successfully registered.
> 
> Fixes: 461cd1b03e32 ("net: dsa: bcm_sf2: Register our slave MDIO bus")
> Signed-off-by: Florian Fainelli 

David, this is for 'net', forgot to mention it in the subject.
-- 
Florian


[PATCH] net: dsa: bcm_sf2: Unwind errors in correct order

2016-07-29 Thread Florian Fainelli
In case we cannot complete bcm_sf2_sw_setup() for any reason, and we
go to the out_unmap label, but the MDIO bus has not been registered yet,
we will hit the BUG condition in drivers/net/phy/mdio_bus.c about the
bus not being registered. Fix this by dedicating a specific label for
when we fail after the MDIO bus has been successfully registered.

Fixes: 461cd1b03e32 ("net: dsa: bcm_sf2: Register our slave MDIO bus")
Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/bcm_sf2.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index cd1d630ae3a9..b2b838724a9b 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -1622,7 +1622,7 @@ static int bcm_sf2_sw_setup(struct dsa_switch *ds)
  "switch_0", priv);
if (ret < 0) {
pr_err("failed to request switch_0 IRQ\n");
-   goto out_unmap;
+   goto out_mdio;
}
 
ret = request_irq(priv->irq1, bcm_sf2_switch_1_isr, 0,
@@ -1679,6 +1679,8 @@ static int bcm_sf2_sw_setup(struct dsa_switch *ds)
 
 out_free_irq0:
free_irq(priv->irq0, priv);
+out_mdio:
+   bcm_sf2_mdio_unregister(priv);
 out_unmap:
base = &priv->core;
for (i = 0; i < BCM_SF2_REGS_NUM; i++) {
@@ -1686,7 +1688,6 @@ out_unmap:
iounmap(*base);
base++;
}
-   bcm_sf2_mdio_unregister(priv);
return ret;
 }
 
-- 
2.7.4
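
The unwind ordering this patch restores can be modelled in a few lines of
stand-alone C. This is a sketch under stated assumptions, not the driver
code: demo_setup(), undo() and undo_log are hypothetical names, and the four
steps (map registers, register MDIO bus, request IRQ 0, request IRQ 1) only
mirror the order implied by the labels in the diff.

```c
#include <string.h>

/* Records the order in which undo steps ran, one letter per step. */
static char undo_log[8];

static void undo(char tag)
{
        size_t n = strlen(undo_log);

        if (n + 1 < sizeof(undo_log)) {
                undo_log[n] = tag;
                undo_log[n + 1] = '\0';
        }
}

/* fail_at: 1-based index of the step that fails, 0 for success. */
static int demo_setup(int fail_at)
{
        int ret = -1;

        undo_log[0] = '\0';

        if (fail_at == 1)       /* map registers: nothing to undo yet */
                return ret;
        if (fail_at == 2)       /* register MDIO bus: only the mapping exists */
                goto out_unmap;
        if (fail_at == 3)       /* request IRQ 0: MDIO bus is registered */
                goto out_mdio;
        if (fail_at == 4)       /* request IRQ 1: IRQ 0 is held */
                goto out_free_irq0;
        return 0;

out_free_irq0:
        undo('I');              /* free IRQ 0 */
out_mdio:
        undo('M');              /* unregister MDIO bus */
out_unmap:
        undo('U');              /* unmap registers */
        return ret;
}
```

Each label undoes exactly what was acquired before the failing step, in
reverse order, so a failure before the MDIO bus is registered never reaches
the unregister call.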



Re: [RFC PATCH] xfrm: Add option to reset oif in xfrm lookup

2016-07-29 Thread subashab

Please don't try to workaround a bug with a sysctl.
If we have a bug here, we should fix it. Choosing
between bug A and bug B with a sysctl is not what
we are doing ;)


Sure, this was just a quick hack.


Can you give an example of your use case -- e.g., commands for others
(me) to reproduce?


Here is an equivalent set of rules. We see a difference in the oif when we
reset the oif vs. when we preserve it.
eth1 is the interface from which traffic is generated, while eth0 is the
tunnel.


--
#Commands
echo 1 > /proc/sys/net/ipv4/ip_forward
echo 1 > /proc/sys/net/ipv4/conf/all/accept_local
echo 1 > /proc/sys/net/ipv4/conf/eth0/accept_local
echo 1 > /proc/sys/net/ipv4/conf/eth1/accept_local

ip addr add 192.168.77.2/24 dev eth0
ip link set eth0 mtu 1400
ip link set eth0 up

ip addr add 192.168.33.2/24 dev eth1
ip link set eth1 mtu 1400
ip link set eth1 up

ip ru add to 192.168.33.1 lookup 8 prio 4000
ip ru add oif eth1 lookup 8 prio 4010
ip ru add to 192.168.77.1 lookup 9 prio 4030

ip route add default dev eth1 table 8
ip route add default dev eth0 table 9

iptables -t raw -A OUTPUT -j LOG --log-prefix "RAW-OUT >> "
iptables -t mangle -A POSTROUTING -j LOG --log-prefix "MAN-PST >> "

echo 0 > /proc/sys/net/ipv4/tcp_timestamps

# out direction
ip xfrm state add src 192.168.77.2 dst 192.168.77.1 proto esp spi 0x1234 \
    mode tunnel enc 'cbc(aes)' \
    0xbb31df5b207dc1c7a8512eeda0b2d0691e27bc8059dbb82df616bb9955058cd5 auth \
    'hmac(sha1)' 0x93b43b527d564efb9eac8cd04510b86e409f8ea7 flag af-unspec \
    encap espinudp 4500 4500 0.0.0.0

ip xfrm policy add dir out src 192.168.33.2 tmpl src 192.168.77.2 dst \
    192.168.77.1 proto esp spi 0x1234 mode tunnel

# in direction
ip xfrm state add src 192.168.77.1 dst 192.168.77.2 proto esp spi 0x4321 \
    mode tunnel enc 'cbc(aes)' \
    0x5d3ca96d1af2eaa9cf8f1c1cace88f550e2a5b7b82027023287e1fe2a42f7f54 auth \
    'hmac(sha1)' 0xcd09f850d7c0dd6dc0ed342619c1165571452f9d flag af-unspec \
    encap espinudp 4500 4500 0.0.0.0

ip xfrm policy add dir in dst 192.168.33.2 tmpl src 192.168.77.1 dst \
    192.168.77.2 proto esp spi 0x4321 mode tunnel
ip xfrm policy add dir fwd dst 192.168.33.2 tmpl src 192.168.77.1 dst \
    192.168.77.2 proto esp spi 0x4321 mode tunnel

--

Output when resetting oif (3.18)

root@vm:~# ping -c 1 -I eth1 192.168.33.1
PING 192.168.33.1 (192.168.33.1) 56(84) bytes of data.
RAW-OUT >> IN= OUT=eth0 SRC=192.168.33.2 DST=192.168.33.1 LEN=84 
TOS=0x00 PREC=0x00 TTL=64 ID=801 DF PROTO=ICMP TYPE=8 CODE=0 ID=2040 
SEQ=1
MAN-PST >> IN= OUT=eth0 SRC=192.168.33.2 DST=192.168.33.1 LEN=84 
TOS=0x00 PREC=0x00 TTL=64 ID=801 DF PROTO=ICMP TYPE=8 CODE=0 ID=2040 
SEQ=1
RAW-OUT >> IN= OUT=eth0 SRC=192.168.77.2 DST=192.168.77.1 LEN=160 
TOS=0x00 PREC=0x00 TTL=64 ID=41757 DF PROTO=UDP SPT=4500 DPT=4500 
LEN=140
MAN-PST >> IN= OUT=eth0 SRC=192.168.77.2 DST=192.168.77.1 LEN=160 
TOS=0x00 PREC=0x00 TTL=64 ID=41757 DF PROTO=UDP SPT=4500 DPT=4500 
LEN=140


--

Output when preserving oif (4.4)

root@vm:~# ping -c 1 -I eth1 192.168.33.1
PING 192.168.33.1 (192.168.33.1) 56(84) bytes of data.
RAW-OUT >> IN= OUT=eth1 SRC=192.168.33.2 DST=192.168.33.1 LEN=84 
TOS=0x00 PREC=0x00 TTL=64 ID=20191 DF PROTO=ICMP TYPE=8 CODE=0 ID=2043 
SEQ=1
MAN-PST >> IN= OUT=eth1 SRC=192.168.33.2 DST=192.168.33.1 LEN=84 
TOS=0x00 PREC=0x00 TTL=64 ID=20191 DF PROTO=ICMP TYPE=8 CODE=0 ID=2043 
SEQ=1
RAW-OUT >> IN= OUT=eth1 SRC=192.168.77.2 DST=192.168.77.1 LEN=160 
TOS=0x00 PREC=0x00 TTL=64 ID=49515 DF PROTO=UDP SPT=4500 DPT=4500 
LEN=140
MAN-PST >> IN= OUT=eth1 SRC=192.168.77.2 DST=192.168.77.1 LEN=160 
TOS=0x00 PREC=0x00 TTL=64 ID=49515 DF PROTO=UDP SPT=4500 DPT=4500 
LEN=140


--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project




Re: [PATCH v2 3/5] ARM: sun8i: dt: Add DT bindings documentation for Allwinner sun8i-emac

2016-07-29 Thread Maxime Ripard
On Fri, Jul 29, 2016 at 10:15:19AM +0200, LABBE Corentin wrote:
> > > > > +See ethernet.txt in the same directory for generic bindings for 
> > > > > ethernet
> > > > > +controllers.
> > > > > +
> > > > > +The device node referenced by "phy" or "phy-handle" should be a 
> > > > > child node
> > > > > +of this node. See phy.txt for the generic PHY bindings.
> > > > > +
> > > > > +Optional properties:
> > > > > +- phy-supply: phandle to a regulator if the PHY needs one
> > > > > +- phy-io-supply: phandle to a regulator if the PHY needs another 
> > > > > one for I/O.
> > > > > +  This is sometimes found with RGMII PHYs, which use a 
> > > > > second
> > > > > +  regulator for the lower I/O voltage.
> > > > > +- allwinner,tx-delay: The setting of the TX clock delay chain
> > > > > +- allwinner,rx-delay: The setting of the RX clock delay chain
> > > > 
> > > > In which unit? What is the default value?
> > > 
> > > The unit is unknown to me, but I have added a comment for the
> > > default and acceptable range value.
> > 
> > That's unfortunate. We'll see how the DT maintainers feel about that.
> > 
> 
> I have searched for txdelay in Documentation, and found a few drivers
> that give the units (us/ps).
>
> But in that case, the value in ps/us must be found in a table
> indexed by the Xxdelay value.
>
> So the setting always seems to be a raw number, and for sun8i-emac
> nothing in the user manual helps to find what each value is or relates
> to.
> 
> So the right value is found either by "try and test" or by "copy the
> value found in the fex file".

What I meant was that, just like you found out already, most of the
time the properties should be in absolute units, so that it doesn't
depend on some clock rate most likely in that case.

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com




[PATCH net-next 3/3] bna: remove global bnad_list_mutex

2016-07-29 Thread Ivan Vecera
Remove global bnad_list_mutex as it is not used anymore. This makes
bnad_add_to_list() and bnad_remove_from_list() empty so remove them too.

Signed-off-by: Ivan Vecera 
---
 drivers/net/ethernet/brocade/bna/bnad.c | 20 
 1 file changed, 20 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/bnad.c 
b/drivers/net/ethernet/brocade/bna/bnad.c
index 2bed050..f9df4b5 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.c
+++ b/drivers/net/ethernet/brocade/bna/bnad.c
@@ -55,7 +55,6 @@ MODULE_PARM_DESC(bna_debugfs_enable, "Enables debugfs 
feature, default=1,"
  */
 static u32 bnad_rxqs_per_cq = 2;
 static atomic_t bna_id;
-static struct mutex bnad_list_mutex;
 static const u8 bnad_bcast_addr[] __aligned(2) =
{ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
 
@@ -75,20 +74,6 @@ do { 
\
(_res_info)->res_u.mem_info.len = (_size);  \
 } while (0)
 
-static void
-bnad_add_to_list(struct bnad *bnad)
-{
-   mutex_lock(&bnad_list_mutex);
-   mutex_unlock(&bnad_list_mutex);
-}
-
-static void
-bnad_remove_from_list(struct bnad *bnad)
-{
-   mutex_lock(&bnad_list_mutex);
-   mutex_unlock(&bnad_list_mutex);
-}
-
 /*
  * Reinitialize completions in CQ, once Rx is taken down
  */
@@ -3569,14 +3554,12 @@ bnad_lock_init(struct bnad *bnad)
 {
spin_lock_init(&bnad->bna_lock);
mutex_init(&bnad->conf_mutex);
-   mutex_init(&bnad_list_mutex);
 }
 
 static void
 bnad_lock_uninit(struct bnad *bnad)
 {
mutex_destroy(&bnad->conf_mutex);
-   mutex_destroy(&bnad_list_mutex);
 }
 
 /* PCI Initialization */
@@ -3649,7 +3632,6 @@ bnad_pci_probe(struct pci_dev *pdev,
}
bnad = netdev_priv(netdev);
bnad_lock_init(bnad);
-   bnad_add_to_list(bnad);
bnad->id = atomic_inc_return(&bna_id) - 1;
 
mutex_lock(&bnad->conf_mutex);
@@ -3804,7 +3786,6 @@ pci_uninit:
bnad_pci_uninit(pdev);
 unlock_mutex:
mutex_unlock(&bnad->conf_mutex);
-   bnad_remove_from_list(bnad);
bnad_lock_uninit(bnad);
free_netdev(netdev);
return err;
@@ -3842,7 +3823,6 @@ bnad_pci_remove(struct pci_dev *pdev)
bnad_disable_msix(bnad);
bnad_pci_uninit(pdev);
mutex_unlock(&bnad->conf_mutex);
-   bnad_remove_from_list(bnad);
bnad_lock_uninit(bnad);
/* Remove the debugfs node for this bnad */
kfree(bnad->regdata);
-- 
2.7.3



[PATCH net-next 0/3] bna: remove useless global variables

2016-07-29 Thread Ivan Vecera
The set removes useless global bnad_list as well as bnad->entry that track
a list of driver instances but it is not used anywhere. The associated
bnad_list_mutex is removed as well but as it is also used to protect
bna_id increment it is necessary to convert bna_id to atomic_t.

Signed-off-by: Ivan Vecera 

 drivers/net/ethernet/brocade/bna/bnad.c | 27 ++-
 drivers/net/ethernet/brocade/bna/bnad.h |  1 -
 2 files changed, 2 insertions(+), 26 deletions(-)

-- 
2.7.3



[PATCH net-next 2/3] bna: change type of bna_id to atomic_t

2016-07-29 Thread Ivan Vecera
Change type of bna_id to atomic_t. The bnad_list_mutex is used to prevent
a race when bna_id is incremented. After the change the mutex can be
removed in the next step.

Signed-off-by: Ivan Vecera 
---
 drivers/net/ethernet/brocade/bna/bnad.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/bnad.c 
b/drivers/net/ethernet/brocade/bna/bnad.c
index 696bbae..2bed050 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.c
+++ b/drivers/net/ethernet/brocade/bna/bnad.c
@@ -54,7 +54,7 @@ MODULE_PARM_DESC(bna_debugfs_enable, "Enables debugfs 
feature, default=1,"
  * Global variables
  */
 static u32 bnad_rxqs_per_cq = 2;
-static u32 bna_id;
+static atomic_t bna_id;
 static struct mutex bnad_list_mutex;
 static const u8 bnad_bcast_addr[] __aligned(2) =
{ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
@@ -79,7 +79,6 @@ static void
 bnad_add_to_list(struct bnad *bnad)
 {
mutex_lock(&bnad_list_mutex);
-   bnad->id = bna_id++;
mutex_unlock(&bnad_list_mutex);
 }
 
@@ -3651,6 +3650,7 @@ bnad_pci_probe(struct pci_dev *pdev,
bnad = netdev_priv(netdev);
bnad_lock_init(bnad);
bnad_add_to_list(bnad);
+   bnad->id = atomic_inc_return(&bna_id) - 1;
 
mutex_lock(&bnad->conf_mutex);
/*
-- 
2.7.3
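
The idea behind this change, handing out unique instance IDs without a
mutex, can be sketched in user space with C11 atomics. This is an
illustration only; next_id and alloc_id() are hypothetical names, and
atomic_fetch_add() is the C11 counterpart used here, not the kernel API from
the patch.

```c
#include <stdatomic.h>

/* Zero-initialized at program start, like the static atomic_t in the patch. */
static atomic_uint next_id;

/*
 * atomic_fetch_add() returns the value before the increment, so IDs start
 * at 0. The kernel's atomic_inc_return() returns the value after the
 * increment, which is why the patch subtracts 1.
 */
static unsigned int alloc_id(void)
{
        return atomic_fetch_add(&next_id, 1);
}
```

Successive callers get distinct values even when they race, because the read
and the increment happen as one atomic operation.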



[PATCH net-next 1/3] bna: remove useless linked list

2016-07-29 Thread Ivan Vecera
Remove global variable bnad_list and bnad->list_entry that are used
as list of bna driver instances. It is not necessary and useless.

Signed-off-by: Ivan Vecera 
---
 drivers/net/ethernet/brocade/bna/bnad.c | 3 ---
 drivers/net/ethernet/brocade/bna/bnad.h | 1 -
 2 files changed, 4 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/bnad.c 
b/drivers/net/ethernet/brocade/bna/bnad.c
index 771cc26..696bbae 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.c
+++ b/drivers/net/ethernet/brocade/bna/bnad.c
@@ -56,7 +56,6 @@ MODULE_PARM_DESC(bna_debugfs_enable, "Enables debugfs 
feature, default=1,"
 static u32 bnad_rxqs_per_cq = 2;
 static u32 bna_id;
 static struct mutex bnad_list_mutex;
-static LIST_HEAD(bnad_list);
 static const u8 bnad_bcast_addr[] __aligned(2) =
{ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
 
@@ -80,7 +79,6 @@ static void
 bnad_add_to_list(struct bnad *bnad)
 {
mutex_lock(&bnad_list_mutex);
-   list_add_tail(&bnad->list_entry, &bnad_list);
bnad->id = bna_id++;
mutex_unlock(&bnad_list_mutex);
 }
@@ -89,7 +87,6 @@ static void
 bnad_remove_from_list(struct bnad *bnad)
 {
mutex_lock(&bnad_list_mutex);
-   list_del(&bnad->list_entry);
mutex_unlock(&bnad_list_mutex);
 }
 
diff --git a/drivers/net/ethernet/brocade/bna/bnad.h 
b/drivers/net/ethernet/brocade/bna/bnad.h
index f4ed816..46f7b84 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.h
+++ b/drivers/net/ethernet/brocade/bna/bnad.h
@@ -288,7 +288,6 @@ struct bnad_rx_unmap_q {
 struct bnad {
struct net_device   *netdev;
u32 id;
-   struct list_head    list_entry;
 
/* Data path */
struct bnad_tx_info tx_info[BNAD_MAX_TX];
-- 
2.7.3



Re: [PATCH v2 1/5] ethernet: add sun8i-emac driver

2016-07-29 Thread Maxime Ripard
On Thu, Jul 28, 2016 at 04:57:34PM +0200, LABBE Corentin wrote:
> > > +static int sun8i_mdio_write(struct mii_bus *bus, int phy_addr, int 
> > > phy_reg,
> > > + u16 data)
> > > +{
> > > + struct net_device *ndev = bus->priv;
> > > + struct sun8i_emac_priv *priv = netdev_priv(ndev);
> > > + u32 reg;
> > > + int err;
> > > +
> > > + err = readl_poll_timeout(priv->base + SUN8I_EMAC_MDIO_CMD, reg,
> > > +  !(reg & MDIO_CMD_MII_BUSY), 100, 1);
> > > + if (err) {
> > > + dev_err(priv->dev, "%s timeout %x\n", __func__, reg);
> > > + return err;
> > > + }
> > 
> > Why the poll_timeout variant?
> > 
> Because, in case of bad clock/reset/regulator setting, the value
> expected to come could never be set.

Ah, I missed that it was for a busy bit, my bad. However, you seem to
be using that on several occasions, maybe you could turn that into a
function?

> > > +static void sun8i_emac_unset_syscon(struct net_device *ndev)
> > > +{
> > > + struct sun8i_emac_priv *priv = netdev_priv(ndev);
> > > + u32 reg = 0;
> > > +
> > > + if (priv->variant == H3_EMAC)
> > > + reg = H3_EPHY_DEFAULT_VALUE;
> > 
> > Why do you need that?
> > 
> For resetting the syscon to the factory default.

Yes, but does it matter? Does it have any side effect? Is that
register shared with another device?

Otherwise, either it won't be used anymore, and you don't care, or you
will reload the driver later, and the driver should work whatever
state is programmed in there. In both cases, you don't need to reset
that value.

> > > +static irqreturn_t sun8i_emac_dma_interrupt(int irq, void *dev_id)
> > > +{
> > > + struct net_device *ndev = dev_id;
> > > + struct sun8i_emac_priv *priv = netdev_priv(ndev);
> > > + u32 v, u;
> > > +
> > > + v = readl(priv->base + SUN8I_EMAC_INT_STA);
> > > +
> > > + /* When this bit is asserted, a frame transmission is completed. */
> > > + if (v & BIT(0)) {
> > > + priv->estats.tx_int++;
> > > + writel(0, priv->base + SUN8I_EMAC_INT_EN);
> > > + napi_schedule(&priv->napi);
> > > + }
> > > +
> > > + /* When this bit is asserted, the TX DMA FSM is stopped. */
> > > + if (v & BIT(1))
> > > + priv->estats.tx_dma_stop++;
> > > +
> > > + /* When this asserted, the TX DMA can not acquire next TX descriptor
> > > +  * and TX DMA FSM is suspended.
> > > + */
> > > + if (v & BIT(2))
> > > + priv->estats.tx_dma_ua++;
> > > +
> > > + if (v & BIT(3))
> > > + netif_dbg(priv, intr, ndev, "Unhandled interrupt TX TIMEOUT\n");
> > 
> > Why do you enable that interrupt if you can't handle it?
>
> Some interrupts fire even when not enabled (like RX_BUF_UA_INT/TX_BUF_UA_INT)

So the bits 9 and 2, respectively, in the interrupt enable register
are useless?

> > And printing in the interrupt handler is a very bad idea.
> 
> They are printed only when DEBUG is set, so not a problem?

It's always a problem, this adds a very significant latency and will
fill the kernel log buffer at an insane rate, flushing out actual
important messages, for no particular reason.
> > > +
> > > + return IRQ_HANDLED;
> > 
> > The lack of spinlocks in there is quite worrying.
> > 
> 
> The interrupt handler does nothing harmful if it races with itself.
> Just stats, enabling NAPI etc.
> Anyway, it is missing a comment about that non-locking strategy.

The interrupt handler cannot race with itself. The interrupts will be
masked on the local CPU and the interrupt can only be delivered to a
single CPU (so, the one that the handler is currently running from).

> > > +}
> > > +
> > > +static int sun8i_emac_probe(struct platform_device *pdev)
> > > +{
> > > + struct device_node *node = pdev->dev.of_node;
> > > + struct sun8i_emac_priv *priv;
> > > + struct net_device *ndev;
> > > + struct resource *res;
> > > + int ret;
> > > +
> > > + ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
> > > + if (ret) {
> > > + dev_err(&pdev->dev, "No suitable DMA available\n");
> > > + return ret;
> > > + }
> > 
> > Isn't that the default?
> > 
> No, it is necessary on arm64 as apritzel requested.

http://lxr.free-electrons.com/source/drivers/of/device.c#L93

It seems to be shared between the two.

Thanks!
Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com




[PATCH 1/3] sctp: Export struct sctp_info to userspace

2016-07-29 Thread Phil Sutter
This is required to correctly interpret INET_DIAG_INFO messages exported
by sctp_diag module.

Signed-off-by: Phil Sutter 
---
 include/linux/sctp.h  | 64 ---
 include/uapi/linux/sctp.h | 64 +++
 2 files changed, 64 insertions(+), 64 deletions(-)

diff --git a/include/linux/sctp.h b/include/linux/sctp.h
index de1f64318fc4e..fcb4c36461732 100644
--- a/include/linux/sctp.h
+++ b/include/linux/sctp.h
@@ -705,70 +705,6 @@ typedef struct sctp_auth_chunk {
sctp_authhdr_t auth_hdr;
 } __packed sctp_auth_chunk_t;
 
-struct sctp_info {
-   __u32   sctpi_tag;
-   __u32   sctpi_state;
-   __u32   sctpi_rwnd;
-   __u16   sctpi_unackdata;
-   __u16   sctpi_penddata;
-   __u16   sctpi_instrms;
-   __u16   sctpi_outstrms;
-   __u32   sctpi_fragmentation_point;
-   __u32   sctpi_inqueue;
-   __u32   sctpi_outqueue;
-   __u32   sctpi_overall_error;
-   __u32   sctpi_max_burst;
-   __u32   sctpi_maxseg;
-   __u32   sctpi_peer_rwnd;
-   __u32   sctpi_peer_tag;
-   __u8    sctpi_peer_capable;
-   __u8    sctpi_peer_sack;
-   __u16   __reserved1;
-
-   /* assoc status info */
-   __u64   sctpi_isacks;
-   __u64   sctpi_osacks;
-   __u64   sctpi_opackets;
-   __u64   sctpi_ipackets;
-   __u64   sctpi_rtxchunks;
-   __u64   sctpi_outofseqtsns;
-   __u64   sctpi_idupchunks;
-   __u64   sctpi_gapcnt;
-   __u64   sctpi_ouodchunks;
-   __u64   sctpi_iuodchunks;
-   __u64   sctpi_oodchunks;
-   __u64   sctpi_iodchunks;
-   __u64   sctpi_octrlchunks;
-   __u64   sctpi_ictrlchunks;
-
-   /* primary transport info */
-   struct sockaddr_storage sctpi_p_address;
-   __s32   sctpi_p_state;
-   __u32   sctpi_p_cwnd;
-   __u32   sctpi_p_srtt;
-   __u32   sctpi_p_rto;
-   __u32   sctpi_p_hbinterval;
-   __u32   sctpi_p_pathmaxrxt;
-   __u32   sctpi_p_sackdelay;
-   __u32   sctpi_p_sackfreq;
-   __u32   sctpi_p_ssthresh;
-   __u32   sctpi_p_partial_bytes_acked;
-   __u32   sctpi_p_flight_size;
-   __u16   sctpi_p_error;
-   __u16   __reserved2;
-
-   /* sctp sock info */
-   __u32   sctpi_s_autoclose;
-   __u32   sctpi_s_adaptation_ind;
-   __u32   sctpi_s_pd_point;
-   __u8    sctpi_s_nodelay;
-   __u8    sctpi_s_disable_fragments;
-   __u8    sctpi_s_v4mapped;
-   __u8    sctpi_s_frag_interleave;
-   __u32   sctpi_s_type;
-   __u32   __reserved3;
-};
-
 struct sctp_infox {
struct sctp_info *sctpinfo;
struct sctp_association *asoc;
diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h
index d304f4c9792c4..a406adcc0793e 100644
--- a/include/uapi/linux/sctp.h
+++ b/include/uapi/linux/sctp.h
@@ -944,4 +944,68 @@ struct sctp_default_prinfo {
__u16 pr_policy;
 };
 
+struct sctp_info {
+   __u32   sctpi_tag;
+   __u32   sctpi_state;
+   __u32   sctpi_rwnd;
+   __u16   sctpi_unackdata;
+   __u16   sctpi_penddata;
+   __u16   sctpi_instrms;
+   __u16   sctpi_outstrms;
+   __u32   sctpi_fragmentation_point;
+   __u32   sctpi_inqueue;
+   __u32   sctpi_outqueue;
+   __u32   sctpi_overall_error;
+   __u32   sctpi_max_burst;
+   __u32   sctpi_maxseg;
+   __u32   sctpi_peer_rwnd;
+   __u32   sctpi_peer_tag;
+   __u8    sctpi_peer_capable;
+   __u8    sctpi_peer_sack;
+   __u16   __reserved1;
+
+   /* assoc status info */
+   __u64   sctpi_isacks;
+   __u64   sctpi_osacks;
+   __u64   sctpi_opackets;
+   __u64   sctpi_ipackets;
+   __u64   sctpi_rtxchunks;
+   __u64   sctpi_outofseqtsns;
+   __u64   sctpi_idupchunks;
+   __u64   sctpi_gapcnt;
+   __u64   sctpi_ouodchunks;
+   __u64   sctpi_iuodchunks;
+   __u64   sctpi_oodchunks;
+   __u64   sctpi_iodchunks;
+   __u64   sctpi_octrlchunks;
+   __u64   sctpi_ictrlchunks;
+
+   /* primary transport info */
+   struct sockaddr_storage sctpi_p_address;
+   __s32   sctpi_p_state;
+   __u32   sctpi_p_cwnd;
+   __u32   sctpi_p_srtt;
+   __u32   sctpi_p_rto;
+   __u32   sctpi_p_hbinterval;
+   __u32   sctpi_p_pathmaxrxt;
+   __u32   sctpi_p_sackdelay;
+   __u32   sctpi_p_sackfreq;
+   __u32   sctpi_p_ssthresh;
+   __u32   sctpi_p_partial_bytes_acked;
+   __u32   sctpi_p_flight_size;
+   __u16   sctpi_p_error;
+   __u16   __reserved2;
+
+   /* sctp sock info */
+   __u32   sctpi_s_autoclose;
+   __u32   sctpi_s_adaptation_ind;
+   __u32   sctpi_s_pd_point;
+   __u8    sctpi_s_nodelay;
+   __u8    sctpi_s_disable_fragments;
+   __u8    sctpi_s_v4mapped;
+   __u8    sctpi_s_frag_interleave;
+   __u32   sctpi_s_type;
+   __u32   __reserved3;
+};
+
 #endif /* _UAPI_SCTP_H */
-- 
2.8.2



[PATCH 3/3] sctp_diag: Respect ss adding TCPF_CLOSE to idiag_states

2016-07-29 Thread Phil Sutter
Since 'ss' always adds TCPF_CLOSE to idiag_states flags, sctp_diag can't
rely upon TCPF_LISTEN flag solely being present when listening sockets
are requested.

Signed-off-by: Phil Sutter 
---
 net/sctp/sctp_diag.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sctp/sctp_diag.c b/net/sctp/sctp_diag.c
index 0ad6033a7330c..92ae2828189d5 100644
--- a/net/sctp/sctp_diag.c
+++ b/net/sctp/sctp_diag.c
@@ -352,7 +352,7 @@ static int sctp_ep_dump(struct sctp_endpoint *ep, void *p)
if (cb->args[4] < cb->args[1])
goto next;
 
-   if ((r->idiag_states & ~TCPF_LISTEN) && !list_empty(&ep->asocs))
+   if (!(r->idiag_states & TCPF_LISTEN) && !list_empty(&ep->asocs))
goto next;
 
if (r->sdiag_family != AF_UNSPEC &&
@@ -467,7 +467,7 @@ skip:
 * 3 : to mark if we have dumped the ep info of the current asoc
 * 4 : to work as a temporary variable to traversal list
 */
-   if (!(idiag_states & ~TCPF_LISTEN))
+   if (!(idiag_states & ~(TCPF_LISTEN | TCPF_CLOSE)))
goto done;
sctp_for_each_transport(sctp_tsp_dump, net, cb->args[2], &commp);
 done:
-- 
2.8.2
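
The effect of the second hunk can be checked with a small stand-alone
sketch. The DEMO_TCPF_* constants are local copies that are assumed to match
the bit positions in include/net/tcp_states.h, and wants_transports_old()
and wants_transports_new() are hypothetical names for the condition before
and after the patch.

```c
/* Local copies of the state bits; assumed to match include/net/tcp_states.h. */
#define DEMO_TCPF_ESTABLISHED   (1U << 1)
#define DEMO_TCPF_CLOSE         (1U << 7)
#define DEMO_TCPF_LISTEN        (1U << 10)

/* Old condition: any bit other than LISTEN triggers the transport dump. */
static int wants_transports_old(unsigned int states)
{
        return (states & ~DEMO_TCPF_LISTEN) != 0;
}

/* New condition: CLOSE, which 'ss' always adds, no longer counts. */
static int wants_transports_new(unsigned int states)
{
        return (states & ~(DEMO_TCPF_LISTEN | DEMO_TCPF_CLOSE)) != 0;
}
```

With a request for listening sockets only, to which 'ss' has added
TCPF_CLOSE, the old condition still walks the transports while the new one
skips them; a request that also includes TCPF_ESTABLISHED dumps transports
either way.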



[PATCH 0/3] sctp_diag: A bunch of fixes for upcoming 'ss' support

2016-07-29 Thread Phil Sutter
The following series contains a number of fixes necessary to make my yet
unpublished 'ss' support patch functional.

Phil Sutter (3):
  sctp: Export struct sctp_info to userspace
  sctp_diag: export timer value only if it is active
  sctp_diag: Respect ss adding TCPF_CLOSE to idiag_states

 include/linux/sctp.h  | 64 ---
 include/uapi/linux/sctp.h | 64 +++
 net/sctp/sctp_diag.c  | 14 ++-
 3 files changed, 72 insertions(+), 70 deletions(-)

-- 
2.8.2



[PATCH 2/3] sctp_diag: export timer value only if it is active

2016-07-29 Thread Phil Sutter
Since it is exported as an unsigned value, userspace has no way of
detecting whether it is negative or just very large. Therefore do this in
kernel space, where it is a simple comparison.

Signed-off-by: Phil Sutter 
---
 net/sctp/sctp_diag.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/sctp/sctp_diag.c b/net/sctp/sctp_diag.c
index f69edcf219e51..0ad6033a7330c 100644
--- a/net/sctp/sctp_diag.c
+++ b/net/sctp/sctp_diag.c
@@ -40,10 +40,12 @@ static void inet_diag_msg_sctpasoc_fill(struct inet_diag_msg *r,
}
 
r->idiag_state = asoc->state;
-   r->idiag_timer = SCTP_EVENT_TIMEOUT_T3_RTX;
-   r->idiag_retrans = asoc->rtx_data_chunks;
-   r->idiag_expires = jiffies_to_msecs(
-   asoc->timeouts[SCTP_EVENT_TIMEOUT_T3_RTX] - jiffies);
+   if (asoc->timeouts[SCTP_EVENT_TIMEOUT_T3_RTX] > jiffies) {
+   r->idiag_timer = SCTP_EVENT_TIMEOUT_T3_RTX;
+   r->idiag_retrans = asoc->rtx_data_chunks;
+   r->idiag_expires = jiffies_to_msecs(
+   asoc->timeouts[SCTP_EVENT_TIMEOUT_T3_RTX] - jiffies);
+   }
 }
 
 static int inet_diag_msg_sctpladdrs_fill(struct sk_buff *skb,
-- 
2.8.2
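
Why the unguarded subtraction is a problem for userspace can be shown with a
small user-space sketch. It assumes the exported field is 32 bits wide;
ticks_remaining() and ticks_unguarded() are hypothetical helpers, and like
the patch's plain '>' comparison they ignore counter wraparound.

```c
#include <stdint.h>

/* Hypothetical helper: ticks left until 'expires', or 0 if the timer
 * has already expired (mirrors the guarded code in the patch).
 */
static uint32_t ticks_remaining(unsigned long expires, unsigned long now)
{
	if (expires > now)
		return (uint32_t)(expires - now);
	return 0;
}

/* What the old, unguarded subtraction exports for the same inputs. */
static uint32_t ticks_unguarded(unsigned long expires, unsigned long now)
{
	return (uint32_t)(expires - now);
}
```

For an expiry 100 ticks in the past, the unguarded version yields a value
close to UINT32_MAX, which a reader of the unsigned field cannot tell apart
from a genuinely distant timeout.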



[PATCH RESEND nf] netfilter: avoid a race between nf_register_hook() and cleanup_net()

2016-07-29 Thread Michal Kubecek
There is a race condition between nf_{,un}register_hook() and
cleanup_net() which can either trigger a WARN check or cause a memory
leak. The scenario is like this (2a and 2b are alternatives):

1.  cleanup_net() removes one or more struct net from net_namespace_list
2a. nf_register_hook() adds per-netns hooks to all netns (but not those
removed in step 1) and adds the hook to global nf_hook_list
2b. nf_unregister_hook() deletes per-netns hooks from all netns (but not
those removed in step 1) and removes the hook from nf_hook_list
3.  cleanup_net() calls pernet subsystem exit functions for netns being
removed; one of them is netfilter_net_exit() which (among others)
calls nf_unregister_net_hook() to unregister per-netns hooks for all
hooks in nf_hook_list.

In case (a), per-netns hooks are never added as the namespace was
already invisible to for_each_net() in step 2a but an attempt to remove
them in step 3 (the hook is already in nf_hook_list) triggers a WARN
check in nf_unregister_net_hook() (no real harm done, however). In case
(b), the per-netns hook is removed neither in step 2b (netns is already
invisible to for_each_net()) nor in step 3 (the hook is already removed
from nf_hook_list), causing a memory leak.

Prevent the race by protecting the for_each_net() loop in
nf_{,un}register_hook() (also) by net_mutex. There is already a
precedent for this in rtnl_link_unregister(), which addresses a similar
race.

Fixes: 085db2c04557 ("netfilter: Per network namespace netfilter hooks.")
Signed-off-by: Michal Kubecek 
---
 net/netfilter/core.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index f39276d1c2d7..860978c9f82e 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -193,6 +193,8 @@ int nf_register_hook(struct nf_hook_ops *reg)
struct net *net, *last;
int ret;
 
+   /* prevent race with cleanup_net() */
+   mutex_lock(&net_mutex);
rtnl_lock();
for_each_net(net) {
ret = nf_register_net_hook(net, reg);
@@ -201,6 +203,7 @@ int nf_register_hook(struct nf_hook_ops *reg)
}
list_add_tail(&reg->list, &nf_hook_list);
rtnl_unlock();
+   mutex_unlock(&net_mutex);
 
return 0;
 rollback:
@@ -211,6 +214,7 @@ rollback:
nf_unregister_net_hook(net, reg);
}
rtnl_unlock();
+   mutex_unlock(&net_mutex);
return ret;
 }
 EXPORT_SYMBOL(nf_register_hook);
@@ -219,11 +223,14 @@ void nf_unregister_hook(struct nf_hook_ops *reg)
 {
struct net *net;
 
+   /* prevent race with cleanup_net() */
+   mutex_lock(&net_mutex);
rtnl_lock();
list_del(&reg->list);
for_each_net(net)
nf_unregister_net_hook(net, reg);
rtnl_unlock();
+   mutex_unlock(&net_mutex);
 }
 EXPORT_SYMBOL(nf_unregister_hook);
 
-- 
2.9.2
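
Case (b) from the description can be replayed with a tiny single-threaded
model. This is a sketch under stated assumptions, not kernel code: all sim_*
names are hypothetical, and "serialized" assumes that cleanup_net() holds
net_mutex across both delisting the netns and running its exit handlers, so
that an unregister can only run entirely before or entirely after them.

```c
#include <stdbool.h>

#define SIM_MAX_NS 4

struct sim_netns {
	bool listed;    /* still visible to the for_each_net() stand-in */
	bool has_hook;  /* per-netns hook entry currently allocated */
};

struct sim_state {
	struct sim_netns ns[SIM_MAX_NS];
	bool hook_on_global_list;
};

static void sim_register(struct sim_state *s)
{
	for (int i = 0; i < SIM_MAX_NS; i++)
		if (s->ns[i].listed)
			s->ns[i].has_hook = true;
	s->hook_on_global_list = true;
}

static void sim_unregister(struct sim_state *s)
{
	s->hook_on_global_list = false;
	for (int i = 0; i < SIM_MAX_NS; i++)
		if (s->ns[i].listed)
			s->ns[i].has_hook = false;
}

/* Exit handler: only drops hooks that are still on the global list. */
static void sim_netns_exit(struct sim_state *s, int i)
{
	if (s->hook_on_global_list)
		s->ns[i].has_hook = false;
}

/* Returns the number of per-netns hooks leaked in scenario (b). */
static int sim_case_b(bool serialized)
{
	struct sim_state s = { .hook_on_global_list = false };
	int leaked = 0;

	for (int i = 0; i < SIM_MAX_NS; i++)
		s.ns[i].listed = true;
	sim_register(&s);

	s.ns[0].listed = false;           /* step 1 */
	if (serialized) {
		sim_netns_exit(&s, 0);    /* step 3 completes first */
		sim_unregister(&s);
	} else {
		sim_unregister(&s);       /* step 2b slips in between */
		sim_netns_exit(&s, 0);    /* step 3 */
	}

	for (int i = 0; i < SIM_MAX_NS; i++)
		if (!s.ns[i].listed && s.ns[i].has_hook)
			leaked++;
	return leaked;
}
```

In this model the unserialized ordering leaves one hook behind on the
delisted namespace, while the serialized ordering leaves none.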



RE: [Intel-wired-lan] [PATCH net-next v3 1/2] e1000e: factor out systim sanitization

2016-07-29 Thread Woodford, Timothy W.
>>> This is preparatory work for an expanding list of adapter families that have 
>>> occasional ~10 hour clock jumps when being used for PTP. Factor out the 
>>> sanitization function and convert to using a feature (bug) flag, per 
>>> suggestion from Jesse Brandeburg.
>>> 
>>> Littering functional code with device-specific checks is much messier than 
>>> simply checking a flag, and having device-specific init set flags as needed.
>>> There are probably a number of other cases in the e1000e code that 
>>> could/should be converted similarly.
>> 
>> Looks ok to me.
>> Adding Chris who asked what happens if we reach the max retry counter 
>> (E1000_MAX_82574_SYSTIM_REREAD)?
>> This counter is set to 50. 
>> Can you, for testing purposes, decrease this value (or even set it to 0) 
>> and see what happens?
>  I'll set the max retry counter to 1 and run an overnight test to see what 
> happens.

After running with this configuration for about 36 hours, I haven't seen any 
timing jumps.  Either this configuration eliminates the error, or it makes it 
significantly less likely to occur.

Tim Woodford


Re: [PATCH v2 net] tcp: consider recv buf for the initial window scale

2016-07-29 Thread Neal Cardwell
On Fri, Jul 29, 2016 at 9:34 AM, Soheil Hassas Yeganeh
 wrote:
> To fix this, consider the maximum of net.ipv4.tcp_rmem[2],
> net.core.rmem_max and socket's initial buffer space.
>
> Fixes: b0573dea1fb3 ("[NET]: Introduce SO_{SND,RCV}BUFFORCE socket options")
> Signed-off-by: Soheil Hassas Yeganeh 
> Suggested-by: Neal Cardwell 

Acked-by: Neal Cardwell 

Thanks, Soheil.

neal


[PATCH net 1/3] macsec: fix reference counting on RXSC in macsec_handle_frame

2016-07-29 Thread Sabrina Dubroca
Currently, we look up the RXSC without taking a reference on it.  The
RXSA holds a reference on the RXSC, but the SA and SC could still both
disappear before we take a reference on the SA.

Take a reference on the RXSC in macsec_handle_frame.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Sabrina Dubroca 
---
 drivers/net/macsec.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index 2d0beb1b801c..718cf98023ff 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -863,6 +863,7 @@ static void macsec_decrypt_done(struct crypto_async_request *base, int err)
struct net_device *dev = skb->dev;
struct macsec_dev *macsec = macsec_priv(dev);
struct macsec_rx_sa *rx_sa = macsec_skb_cb(skb)->rx_sa;
+   struct macsec_rx_sc *rx_sc = rx_sa->sc;
int len, ret;
u32 pn;
 
@@ -891,6 +892,7 @@ static void macsec_decrypt_done(struct crypto_async_request *base, int err)
 
 out:
macsec_rxsa_put(rx_sa);
+   macsec_rxsc_put(rx_sc);
dev_put(dev);
 }
 
@@ -1106,6 +1108,7 @@ static rx_handler_result_t macsec_handle_frame(struct sk_buff **pskb)
 
list_for_each_entry_rcu(macsec, >secys, secys) {
struct macsec_rx_sc *sc = find_rx_sc(>secy, sci);
+   sc = sc ? macsec_rxsc_get(sc) : NULL;
 
if (sc) {
secy = >secy;
@@ -1180,8 +1183,10 @@ static rx_handler_result_t macsec_handle_frame(struct sk_buff **pskb)
 
if (IS_ERR(skb)) {
/* the decrypt callback needs the reference */
-   if (PTR_ERR(skb) != -EINPROGRESS)
+   if (PTR_ERR(skb) != -EINPROGRESS) {
macsec_rxsa_put(rx_sa);
+   macsec_rxsc_put(rx_sc);
+   }
rcu_read_unlock();
*pskb = NULL;
return RX_HANDLER_CONSUMED;
@@ -1197,6 +1202,7 @@ deliver:
 
if (rx_sa)
macsec_rxsa_put(rx_sa);
+   macsec_rxsc_put(rx_sc);
 
ret = gro_cells_receive(>gro_cells, skb);
if (ret == NET_RX_SUCCESS)
@@ -1212,6 +1218,7 @@ deliver:
 drop:
macsec_rxsa_put(rx_sa);
 drop_nosa:
+   macsec_rxsc_put(rx_sc);
rcu_read_unlock();
 drop_direct:
kfree_skb(skb);
-- 
2.9.0
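
The "take a reference only if the object is still alive" pattern that this
patch applies to the RXSC lookup can be sketched in user space with C11
atomics. This is a minimal illustration, not the driver's implementation:
struct sketch_sc and the sketch_sc_get()/sketch_sc_put() helpers are
hypothetical, and the 'freed' flag merely stands in for the real release.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_sc {
	atomic_int refcnt;
	bool freed;     /* stands in for actually freeing the object */
};

/* Take a reference unless the count has already dropped to zero.
 * Returns the object on success and NULL if it is already dying.
 */
static struct sketch_sc *sketch_sc_get(struct sketch_sc *sc)
{
	int old = atomic_load(&sc->refcnt);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&sc->refcnt, &old, old + 1))
			return sc;
	}
	return NULL;
}

/* Drop a reference; the last put releases the object. */
static void sketch_sc_put(struct sketch_sc *sc)
{
	if (atomic_fetch_sub(&sc->refcnt, 1) == 1)
		sc->freed = true;
}
```

Every successful get must be paired with exactly one put on each exit path,
which is what the added macsec_rxsc_put() calls in the hunks above do for the
deliver, drop and decrypt-callback paths.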



[PATCH net 0/3] macsec: reference counting fixes

2016-07-29 Thread Sabrina Dubroca
Patch 1 adds explicit reference counting on RXSCs, instead of the
current implicit reference counting using the RXSA's refcount.

Patch 2 fixes possible kernel panics during module unload caused by an
RCU callback that schedules another RCU callback, which the
rcu_barrier() added in b196c22af5c3 ("macsec: add rcu_barrier() on
module exit") didn't protect against.

Patch 3 fixes a refcounting issue with the underlying device for a
macsec device when link creation fails.

Sabrina Dubroca (3):
  macsec: fix reference counting on RXSC in macsec_handle_frame
  macsec: RXSAs don't need to hold a reference on RXSCs
  macsec: fix negative refcnt on parent link

 drivers/net/macsec.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

-- 
2.9.0



[PATCH net 2/3] macsec: RXSAs don't need to hold a reference on RXSCs

2016-07-29 Thread Sabrina Dubroca
Following the previous patch, RXSCs are held and properly refcounted in
the RX path (instead of being implicitly held by their SA), so the SA
doesn't need to hold a reference on its parent RXSC.

This also avoids panics on module unload caused by the double layer of
RCU callbacks (call_rcu frees the RXSA, which puts the final reference
on the RXSC and allows it to be freed in its own call_rcu) that commit
b196c22af5c3 ("macsec: add rcu_barrier() on module exit") didn't
protect against.
There were also some refcounting bugs in macsec_add_rxsa where I didn't
put the reference on the RXSC on the error paths, which would lead to
memory leaks.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Sabrina Dubroca 
---
 drivers/net/macsec.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index 718cf98023ff..7f5c5e0f9cf9 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -344,7 +344,6 @@ static void free_rxsa(struct rcu_head *head)
 
crypto_free_aead(sa->key.tfm);
free_percpu(sa->stats);
-   macsec_rxsc_put(sa->sc);
kfree(sa);
 }
 
@@ -1653,7 +1652,7 @@ static int macsec_add_rxsa(struct sk_buff *skb, struct genl_info *info)
 
rtnl_lock();
rx_sc = get_rxsc_from_nl(genl_info_net(info), attrs, tb_rxsc, , 
);
-   if (IS_ERR(rx_sc) || !macsec_rxsc_get(rx_sc)) {
+   if (IS_ERR(rx_sc)) {
rtnl_unlock();
return PTR_ERR(rx_sc);
}
-- 
2.9.0



[PATCH net 3/3] macsec: fix negative refcnt on parent link

2016-07-29 Thread Sabrina Dubroca
When creation of a macsec device fails because an identical device
already exists on this link, the current code decrements the refcnt on
the parent link (in ->destructor for the macsec device), but it had not
been incremented yet.

Move the dev_hold(parent_link) call earlier during macsec device
creation.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Sabrina Dubroca 
---
 drivers/net/macsec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index 7f5c5e0f9cf9..d13e6e15d7b5 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -3179,6 +3179,8 @@ static int macsec_newlink(struct net *net, struct net_device *dev,
if (err < 0)
return err;
 
+   dev_hold(real_dev);
+
/* need to be already registered so that ->init has run and
 * the MAC addr is set
 */
@@ -3207,8 +3209,6 @@ static int macsec_newlink(struct net *net, struct net_device *dev,
 
macsec_generation++;
 
-   dev_hold(real_dev);
-
return 0;
 
 del_dev:
-- 
2.9.0



[PATCH v2 net] tcp: consider recv buf for the initial window scale

2016-07-29 Thread Soheil Hassas Yeganeh
From: Soheil Hassas Yeganeh 

tcp_select_initial_window() intends to advertise a window
scaling for the maximum possible window size. To do so,
it considers the maximum of net.ipv4.tcp_rmem[2] and
net.core.rmem_max as the only possible upper-bounds.
However, users with CAP_NET_ADMIN can use SO_RCVBUFFORCE
to set the socket's receive buffer size to values
larger than net.ipv4.tcp_rmem[2] and net.core.rmem_max.
Thus, SO_RCVBUFFORCE is effectively ignored by
tcp_select_initial_window().

To fix this, consider the maximum of net.ipv4.tcp_rmem[2],
net.core.rmem_max and socket's initial buffer space.

Fixes: b0573dea1fb3 ("[NET]: Introduce SO_{SND,RCV}BUFFORCE socket options")
Signed-off-by: Soheil Hassas Yeganeh 
Suggested-by: Neal Cardwell 
---
 net/ipv4/tcp_output.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b26aa87..bdaef7f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -236,7 +236,8 @@ void tcp_select_initial_window(int __space, __u32 mss,
/* Set window scaling on max possible window
 * See RFC1323 for an explanation of the limit to 14
 */
-   space = max_t(u32, sysctl_tcp_rmem[2], sysctl_rmem_max);
+   space = max_t(u32, space, sysctl_tcp_rmem[2]);
+   space = max_t(u32, space, sysctl_rmem_max);
space = min_t(u32, space, *window_clamp);
while (space > 65535 && (*rcv_wscale) < 14) {
space >>= 1;
-- 
2.8.0.rc3.226.g39d4020
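
The effect of the change on the advertised scale factor can be checked with a
user-space sketch of the patched calculation. sketch_rcv_wscale() and its
parameter names are hypothetical; the sysctl values are passed in as plain
arguments, and the handling of a zero window clamp done elsewhere in the real
function is left out.

```c
#include <stdint.h>

static uint32_t sketch_max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

static uint32_t sketch_min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Window scale for the largest window the socket could ever offer:
 * the maximum of the socket's own buffer space and both sysctl limits,
 * clamped, then shifted down until it fits into 16 bits (limit 14).
 */
static uint8_t sketch_rcv_wscale(uint32_t space, uint32_t tcp_rmem_max,
				 uint32_t core_rmem_max,
				 uint32_t window_clamp)
{
	uint8_t wscale = 0;

	space = sketch_max_u32(space, tcp_rmem_max);
	space = sketch_max_u32(space, core_rmem_max);
	space = sketch_min_u32(space, window_clamp);
	while (space > 65535 && wscale < 14) {
		space >>= 1;
		wscale++;
	}
	return wscale;
}
```

With assumed limits of 6291456 and 212992 bytes, a default-sized socket gets
a scale of 7, whereas a 64 MiB buffer forced via SO_RCVBUFFORCE now yields 11
instead of being capped by the sysctl values.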



Re: [PATCH net] tcp: consider recv buf for the initial window scale

2016-07-29 Thread Soheil Hassas Yeganeh
On Fri, Jul 29, 2016 at 9:21 AM, Neal Cardwell  wrote:
>
> On Thu, Jul 28, 2016 at 11:11 PM, Soheil Hassas Yeganeh
>  wrote:
> >
> > From: Soheil Hassas Yeganeh 
> >
> > tcp_select_initial_window() intends to advertise a window
> > scaling for the maximum possible window size. To do so,
> > it considers the maximum of net.ipv4.tcp_rmem[2] and
> > net.core.rmem_max as the only possible upper-bounds.
> > However, users with CAP_NET_ADMIN can use SO_RCVBUFFORCE
> > to set the socket's receive buffer size to values
> > larger than net.ipv4.tcp_rmem[2] and net.core.rmem_max.
> > Thus, SO_RCVBUFFORCE is effectively ignored by
> > tcp_select_initial_window().
> >
> > To fix this, consider the maximum of net.ipv4.tcp_rmem[2],
> > net.core.rmem_max and socket's initial buffer space.
> >
> > This part of the code does not have git history and as a
> > result this patch does not have a `Fixes:` tag.
> >
> > Signed-off-by: Soheil Hassas Yeganeh 
> > Suggested-by: Neal Cardwell 
>
> I think it makes sense to tag this commit with:
>
> Fixes: b0573dea1fb3 ("[NET]: Introduce SO_{SND,RCV}BUFFORCE socket options")

Thanks for noting the SHA1. I'll send a v2.

Thanks,
Soheil

> ... because that's the moment at which this line of code started
> failing to achieve its stated objective of setting the window scaling
> factor based on the max possible window.
>
> And having a Fixes tag would help maintainers figure out that the
> patch makes sense to apply to kernels after that commit.
>
> thanks,
> neal


Re: [PATCH net] tcp: consider recv buf for the initial window scale

2016-07-29 Thread Neal Cardwell
On Thu, Jul 28, 2016 at 11:11 PM, Soheil Hassas Yeganeh
 wrote:
>
> From: Soheil Hassas Yeganeh 
>
> tcp_select_initial_window() intends to advertise a window
> scaling for the maximum possible window size. To do so,
> it considers the maximum of net.ipv4.tcp_rmem[2] and
> net.core.rmem_max as the only possible upper-bounds.
> However, users with CAP_NET_ADMIN can use SO_RCVBUFFORCE
> to set the socket's receive buffer size to values
> larger than net.ipv4.tcp_rmem[2] and net.core.rmem_max.
> Thus, SO_RCVBUFFORCE is effectively ignored by
> tcp_select_initial_window().
>
> To fix this, consider the maximum of net.ipv4.tcp_rmem[2],
> net.core.rmem_max and socket's initial buffer space.
>
> This part of the code does not have git history and as a
> result this patch does not have a `Fixes:` tag.
>
> Signed-off-by: Soheil Hassas Yeganeh 
> Suggested-by: Neal Cardwell 

I think it makes sense to tag this commit with:

Fixes: b0573dea1fb3 ("[NET]: Introduce SO_{SND,RCV}BUFFORCE socket options")

... because that's the moment at which this line of code started
failing to achieve its stated objective of setting the window scaling
factor based on the max possible window.

And having a Fixes tag would help maintainers figure out that the
patch makes sense to apply to kernels after that commit.

thanks,
neal


Re: PROBLEM: TPROXY and DNAT broken (bisected to 079096f103fa)

2016-07-29 Thread Florian Westphal
Brandon Cazander  wrote:
> * When it fails, no traffic hits the WEBSERVER. A tcpdump on the bad kernel shows:
> root@dons-qemu-new-kernel:~# tcpdump -niany tcp and port 8080
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
> 16:42:31.551952 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [S], seq 3793582216, win 29200, options [mss 1460,sackOK,TS val 632068656 ecr 0,nop,wscale 7], length 0
> 16:42:31.551988 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [S.], seq 4042636216, ack 3793582217, win 28960, options [mss 1460,sackOK,TS val 745382 ecr 632068656,nop,wscale 7], length 0
> 16:42:31.55 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [.], ack 1, win 229, options [nop,nop,TS val 632068657 ecr 745382], length 0
> 16:42:31.552238 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [R], seq 4042636217, win 0, length 0
> 16:42:31.552246 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [P.], seq 1:78, ack 1, win 229, options [nop,nop,TS val 632068657 ecr 745382], length 77
> 16:42:31.552251 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [R], seq 4042636217, win 0, length 0
> 16:42:32.551668 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [S.], seq 4042636216, ack 3793582217, win 28960, options [mss 1460,sackOK,TS val 745632 ecr 632068656,nop,wscale 7], length 0
> 16:42:32.551925 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [R], seq 3793582217, win 0, length 0
> 16:42:34.551668 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [S.], seq 4042636216, ack 3793582217, win 28960, options [mss 1460,sackOK,TS val 746132 ecr 632068656,nop,wscale 7], length 0
> 16:42:34.551995 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [R], seq 3793582217, win 0, length 0

Please try this patch, it makes it work for me again.
I decided to extend the existing snat support in xt_socket.c instead
of changing TPROXY target:

diff --git a/net/netfilter/xt_socket.c b/net/netfilter/xt_socket.c
--- a/net/netfilter/xt_socket.c
+++ b/net/netfilter/xt_socket.c
@@ -144,6 +144,44 @@ static bool xt_socket_sk_is_transparent(struct sock *sk)
}
 }
 
+static void get_lookup_daddr(const struct sk_buff *skb, u32 *daddr, u16 *dport)
+{
+#ifdef XT_SOCKET_HAVE_CONNTRACK
+   const struct iphdr *iph = ip_hdr(skb);
+   enum ip_conntrack_info ctinfo;
+   enum ip_conntrack_dir dir;
+   struct nf_conn const *ct;
+
+   /* Do the lookup with the original socket address in
+* case this is a packet of an SNAT-ted connection.
+*/
+   ct = nf_ct_get(skb, );
+   if (!ct || nf_ct_is_untracked(ct))
+   return;
+
+   if ((ct->status & IPS_SRC_NAT_DONE) == 0)
+   return;
+
+   dir = CTINFO2DIR(ctinfo);
+   switch (iph->protocol) {
+   case IPPROTO_ICMP:
+   if (ctinfo != IP_CT_RELATED_REPLY)
+   return;
+   break;
+   case IPPROTO_TCP:
+   *dport = ct->tuplehash[!dir].tuple.src.u.tcp.port;
+   break;
+   case IPPROTO_UDP:
+   *dport = ct->tuplehash[!dir].tuple.src.u.udp.port;
+   break;
+   default:
+   return;
+   }
+
+   *daddr = ct->tuplehash[!dir].tuple.src.u3.ip;
+#endif
+}
+
 static struct sock *xt_socket_lookup_slow_v4(struct net *net,
 const struct sk_buff *skb,
 const struct net_device *indev)
@@ -154,10 +192,6 @@ static struct sock *xt_socket_lookup_slow_v4(struct net *net,
__be32 uninitialized_var(daddr), uninitialized_var(saddr);
__be16 uninitialized_var(dport), uninitialized_var(sport);
u8 uninitialized_var(protocol);
-#ifdef XT_SOCKET_HAVE_CONNTRACK
-   struct nf_conn const *ct;
-   enum ip_conntrack_info ctinfo;
-#endif
 
if (iph->protocol == IPPROTO_UDP || iph->protocol == IPPROTO_TCP) {
struct udphdr _hdr, *hp;
@@ -185,25 +219,7 @@ static struct sock *xt_socket_lookup_slow_v4(struct net *net,
return NULL;
}
 
-#ifdef XT_SOCKET_HAVE_CONNTRACK
-   /* Do the lookup with the original socket address in
-* case this is a reply packet of an established
-* SNAT-ted connection.
-*/
-   ct = nf_ct_get(skb, );
-   if (ct && !nf_ct_is_untracked(ct) &&
-   ((iph->protocol != IPPROTO_ICMP &&
- ctinfo == IP_CT_ESTABLISHED_REPLY) ||
-(iph->protocol == IPPROTO_ICMP &&
- ctinfo == IP_CT_RELATED_REPLY)) &&
-   (ct->status & IPS_SRC_NAT_DONE)) {
-
-   daddr = ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip;
-   dport = (iph->protocol == IPPROTO_TCP) ?
-   ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u.tcp.port :
-   

Re: [PATCH v2 4/4] ARM: OMAP2+: omap_device: fix crash on omap_device removal

2016-07-29 Thread Grygorii Strashko

On 07/29/2016 08:15 AM, Peter Ujfalusi wrote:

On 07/28/16 20:50, Grygorii Strashko wrote:

Below call chain causes system crash when OMAP device is
removed by calling of_platform_depopulate()/device_del():


Should you swap 3 <-> 4 in the series?
Currently patch 3 will introduce the crash you are fixing in patch 4...


No. The key function here is device_del(). of_device_unregister(),
which was used previously, has the same issue:

  of_device_unregister() -> device_unregister() -> device_del()

In general, all these patches are independent; they were simply created
in the order in which the bugs were detected.






device_del()
- blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 BUS_NOTIFY_DEL_DEVICE, dev);
  - _omap_device_notifier_call()
- omap_device_delete()
  - od->pdev->archdata.od = NULL;
kfree(od->hwmods);
kfree(od);
  - bus_remove_device()
- device_release_driver()
  - __device_release_driver()
- pm_runtime_get_sync()
   - _od_runtime_resume()
             - omap_hwmod_enable() <- OOPS: od is already deleted

Backtrace:
Unable to handle kernel NULL pointer dereference at virtual address 000d
pgd = eb10
[000d] *pgd=ad6e1831, *pte=, *ppte=
Internal error: Oops: 17 [#1] PREEMPT SMP ARM
CPU: 1 PID: 1273 Comm: modprobe Not tainted 4.4.15-rt19-00115-ge4d3cd3-dirty #68
Hardware name: Generic DRA74X (Flattened Device Tree)
task: eb1ee800 ti: ec962000 task.ti: ec962000
PC is at omap_device_enable+0x10/0x90
LR is at _od_runtime_resume+0x10/0x24
[...]
[] (omap_device_enable) from [] (_od_runtime_resume+0x10/0x24)
[] (_od_runtime_resume) from [] (__rpm_callback+0x20/0x34)
[] (__rpm_callback) from [] (rpm_callback+0x20/0x80)
[] (rpm_callback) from [] (rpm_resume+0x48c/0x964)
[] (rpm_resume) from [] (__pm_runtime_resume+0x60/0x88)
[] (__pm_runtime_resume) from [] (__device_release_driver+0x30/0x100)
[] (__device_release_driver) from [] (device_release_driver+0x1c/0x28)
[] (device_release_driver) from [] (bus_remove_device+0xec/0x144)
[] (bus_remove_device) from [] (device_del+0x10c/0x210)
[] (device_del) from [] (platform_device_del+0x18/0x84)
[] (platform_device_del) from [] (platform_device_unregister+0xc/0x20)
[] (platform_device_unregister) from [] (of_platform_device_destroy+0x8c/0x90)
[] (of_platform_device_destroy) from [] (device_for_each_child+0x4c/0x78)
[] (device_for_each_child) from [] (of_platform_depopulate+0x30/0x44)
[] (of_platform_depopulate) from [] (cpsw_remove+0x68/0xf4 [ti_cpsw])
[] (cpsw_remove [ti_cpsw]) from [] (platform_drv_remove+0x24/0x3c)
[] (platform_drv_remove) from [] (__device_release_driver+0x84/0x100)
[] (__device_release_driver) from [] (driver_detach+0xac/0xb0)
[] (driver_detach) from [] (bus_remove_driver+0x60/0xd4)
[] (bus_remove_driver) from [] (SyS_delete_module+0x184/0x20c)
[] (SyS_delete_module) from [] (ret_fast_syscall+0x0/0x1c)
Code: e350 e92d4070 1590630c 01a06000 (e5d6300d)

Hence, fix it by using the BUS_NOTIFY_REMOVED_DEVICE event for OMAP
device deletion, which is sent once DD has finished processing the
device deletion.

Cc: Tony Lindgren 
Cc: Tero Kristo 
Signed-off-by: Grygorii Strashko 
---
 arch/arm/mach-omap2/omap_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mach-omap2/omap_device.c b/arch/arm/mach-omap2/omap_device.c
index f7ff3b9..208f115 100644
--- a/arch/arm/mach-omap2/omap_device.c
+++ b/arch/arm/mach-omap2/omap_device.c
@@ -194,7 +194,7 @@ static int _omap_device_notifier_call(struct notifier_block *nb,
int err;

switch (event) {
-   case BUS_NOTIFY_DEL_DEVICE:
+   case BUS_NOTIFY_REMOVED_DEVICE:
if (pdev->archdata.od)
omap_device_delete(pdev->archdata.od);
break;







--
regards,
-grygorii


Re: [RFC] net/mlx5_core/en_main: Remove deprecated create_workqueue

2016-07-29 Thread Tejun Heo
Hello,

On Fri, Jul 29, 2016 at 01:30:05AM +0300, Saeed Mahameed wrote:
> > Are the workitems being used on a memory reclaim path?
> 
> do you mean they need to allocate memory ?

It's a bit convoluted.  A workqueue needs the WQ_MEM_RECLAIM flag to be
guaranteed forward progress under memory pressure, so any workqueue
which may be depended upon during memory reclaim should have the flag
set; otherwise, the system can deadlock (an attempt to reclaim memory
hits a wq which can't make forward progress for lack of memory).  For
network devices, the requirement comes from block-over-network or nfs,
which can be involved in memory reclaim.

Thanks.

-- 
tejun


Re: [PATCH] net/mlx5_core/pagealloc: Remove deprecated create_singlethread_workqueue

2016-07-29 Thread Tejun Heo
Hello,

On Thu, Jul 28, 2016 at 12:37:35PM +0300, Leon Romanovsky wrote:
> Did you test this patch? Did you notice the memory reclaim path nature
> of this work?

The conversion uses WQ_MEM_RECLAIM, which is standard for all
workqueues which can stall packet processing if stalled.  The
requirement comes from nfs or block devices over network.

Thanks.

-- 
tejun


Re: [PATCH v2 1/1] Add timer to handle OOM situations

2016-07-29 Thread Stefan Hajnoczi
On Tue, Jul 26, 2016 at 04:28:21PM +0200, ggar...@abra.uab.cat wrote:
> @@ -493,6 +524,9 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
>   goto out;
>   }
>  
> + setup_timer(&vsock->tx_kick,
> + vhost_vsock_rehandle_tx_kick, (unsigned long) NULL);
> +
>   vqs[VSOCK_VQ_TX] = &vsock->vqs[VSOCK_VQ_TX];
>   vqs[VSOCK_VQ_RX] = &vsock->vqs[VSOCK_VQ_RX];
>   vsock->vqs[VSOCK_VQ_TX].handle_kick = vhost_vsock_handle_tx_kick;
> @@ -555,6 +589,7 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file)
>   vhost_dev_stop(&vsock->dev);
>   vhost_dev_cleanup(&vsock->dev, false);
>   kfree(vsock->dev.vqs);
> + del_timer(&vsock->tx_kick);

Please use del_timer_sync() so that we know the timer callback has
finished executing if it's racing with us.

Also please figure out the correct ordering of this call so
vhost_poll_queue() doesn't crash if the timer fires while we are
executing vhost_vsock_dev_release().  In other words, vq and vq->poll
must still be alive when we delete the timer, otherwise the callback
could fire and run on a vq that has already been cleaned up by
vhost_vsock_dev_release().


signature.asc
Description: PGP signature


Re: [PATCH net] tcp: fix functions of tcp_congestion_ops from being called before initialization

2016-07-29 Thread Florian Westphal
Li, Ji  wrote:
> In Linux 3.17 and earlier, tcp_init_congestion_ops (i.e. tcp_reno) is
> used as the ca_ops during 3WHS, and after 3WHS, ca_ops is assigned as 
> the default congestion control set by sysctl and immediately its parameters
> stored in icsk_ca_priv[] are initialized. Commit 55d8694fa82c ("net:
> tcp: assign tcp cong_ops when tcp sk is created") splits assignment and
> initialization into two steps: assignment is done before SYN or SYN-ACK
> is sent out; initialization is done after 3WHS (assume without
> fastopen). But this can cause out-of-order invocation for ca_ops functions
> other than .init() during 3WHS, as they could be called before its
> parameters get initialized. It may cause unexpected behavior for
> congestion controls, and cause trouble for those that need dynamic
> object allocation, like tcp_cdg etc.

What exactly is the problem?
Kernel crash?

AFAICS cdg can cope with NULL ca->gradients.

> We used tcp_dctcp as an example to visualize the problem, and set it as
> default congestion control via sysctl. Three parameters
> (ca->prior_snd_una, ca->prior_rcv_nxt, ca->dctcp_alpha) were monitored
> when functions, such as dctcp_update_alpha() and dctcp_ssthresh(), are
> called during 3WHS. All of three are found to be zero, which is likely
> impossible if dctcp_init() was called ahead, where those three
> parameters should be initialized. Some other congestion controls are
> examined too and the same problem was reproduced.

Why is this a problem?

> diff --git a/include/net/tcp.h b/include/net/tcp.h
> +{
> +   if (inet_csk(sk)->icsk_ca_initialized)
> +   return inet_csk(sk)->icsk_ca_ops->ssthresh(sk);
> +   else
> +   return tcp_reno_ssthresh(sk);
> +}
> +
>  /* Enter Loss state. If we detect SACK reneging, forget all SACK information
>   * and reset tags completely, otherwise preserve SACKs. If receiver
>   * dropped its ofo queue, we will know this due to reneging detection.
> @@ -1896,7 +1904,7 @@ void tcp_enter_loss(struct sock *sk)
> !after(tp->high_seq, tp->snd_una) ||
> (icsk->icsk_ca_state == TCP_CA_Loss && !icsk->icsk_retransmits)) {
> tp->prior_ssthresh = tcp_current_ssthresh(sk);
> -   tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk);
> +   tp->snd_ssthresh = tcp_ca_ssthresh(sk);
> tcp_ca_event(sk, CA_EVENT_LOSS);
> tcp_init_undo(tp);
> }

Can you explain how we can do loss recovery on a non-established
connection ?

> @@ -3335,7 +3343,8 @@ static void tcp_cong_control(struct sock *sk, u32 ack, u32 acked_sacked,
> if (tcp_in_cwnd_reduction(sk)) {
> /* Reduce cwnd if state mandates */
> tcp_cwnd_reduction(sk, acked_sacked, flag);
> -   } else if (tcp_may_raise_cwnd(sk, flag)) {
> +   } else if (tcp_may_raise_cwnd(sk, flag) &&
> +  inet_csk(sk)->icsk_ca_initialized) {
> /* Advance cwnd if state allows */
> tcp_cong_avoid(sk, ack, acked_sacked);

Same here.  How is this called for minisock/sk with non-inited cong ops?
Once sk moves to TCP_ESTABLISHED congestion ops are supposed to
be initialized.

If that's not the case then that's a bug and should be fixed rather
than not calling the cc state machinery any more.


[PATCH 1/1] Bluetooth: add printf format attribute to hci_set_[fh]w_info()

2016-07-29 Thread Nicolas Iooss
Commit 5177a83827cd ("Bluetooth: Add debugfs fields for hardware and
firmware info") introduced hci_set_hw_info() and hci_set_fw_info().
These functions use kvasprintf_const() but are not marked with a
__printf attribute.  Adding such an attribute helps detect issues
related to printf-formatting at build time.

Signed-off-by: Nicolas Iooss 
---
 include/net/bluetooth/hci_core.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
index ee7fc47680a1..012e5031fe47 100644
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -1026,8 +1026,8 @@ int hci_resume_dev(struct hci_dev *hdev);
 int hci_reset_dev(struct hci_dev *hdev);
 int hci_recv_frame(struct hci_dev *hdev, struct sk_buff *skb);
 int hci_recv_diag(struct hci_dev *hdev, struct sk_buff *skb);
-void hci_set_hw_info(struct hci_dev *hdev, const char *fmt, ...);
-void hci_set_fw_info(struct hci_dev *hdev, const char *fmt, ...);
+__printf(2, 3) void hci_set_hw_info(struct hci_dev *hdev, const char *fmt, ...);
+__printf(2, 3) void hci_set_fw_info(struct hci_dev *hdev, const char *fmt, ...);
 int hci_dev_open(__u16 dev);
 int hci_dev_close(__u16 dev);
 int hci_dev_do_close(struct hci_dev *hdev);
-- 
2.9.0
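
Outside the kernel, the same build-time check can be tried with the compiler
attribute that __printf(2, 3) is understood to expand to. The sketch below
assumes GCC or Clang; struct sketch_dev and sketch_set_hw_info() are
hypothetical stand-ins for the hci_dev helpers.

```c
#include <stdarg.h>
#include <stdio.h>

struct sketch_dev {
	char hw_info[64];
};

/* format(printf, 2, 3): argument 2 is the format string and the
 * variadic arguments to check start at argument 3.
 */
__attribute__((format(printf, 2, 3)))
static void sketch_set_hw_info(struct sketch_dev *dev, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(dev->hw_info, sizeof(dev->hw_info), fmt, ap);
	va_end(ap);
}
```

With the attribute in place and format warnings enabled, a call such as
sketch_set_hw_info(&dev, "%s", 42) should be flagged at build time rather
than misbehaving at run time.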



Re: [PATCH ipsec] xfrm: Ignore socket policies when rebuilding hash tables

2016-07-29 Thread Steffen Klassert
On Fri, Jul 29, 2016 at 04:19:11PM +0800, Herbert Xu wrote:
> On Fri, Jul 29, 2016 at 09:57:32AM +0200, Tobias Brunner wrote:
> > Whenever thresholds are changed the hash tables are rebuilt.  This is
> > done by enumerating all policies and hashing and inserting them into
> > the right table according to the thresholds and direction.
> > 
> > Because socket policies are also contained in net->xfrm.policy_all but
> > no hash tables are defined for their direction (dir + XFRM_POLICY_MAX)
> > this causes a NULL or invalid pointer dereference after returning from
> > policy_hash_bysel() if the rebuild is done while any socket policies
> > are installed.
> > 
> > Since the rebuild after changing thresholds is scheduled this crash
> > could even occur if the userland sets thresholds seemingly before
> > installing any socket policies.
> > 
> > Fixes: 53c2e285f970 ("xfrm: Do not hash socket policies")
> > Signed-off-by: Tobias Brunner 
> 
> Acked-by: Herbert Xu 

Applied to the ipsec tree, thanks a lot Tobias!
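
The rebuild described in the commit message can be sketched in userspace as follows. This is a toy model, not the xfrm code: `struct toy_policy`, `toy_rebuild()` and the value 3 for `TOY_POLICY_MAX` are assumptions made for illustration. Socket policies are modelled as entries whose direction is `dir + TOY_POLICY_MAX`, and the loop skips them because no table exists for that index.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value: three real directions (in, out, fwd). */
#define TOY_POLICY_MAX 3

struct toy_policy {
	int dir;	/* socket policies use dir + TOY_POLICY_MAX */
};

/* Count per-direction insertions, skipping socket policies. */
static size_t toy_rebuild(const struct toy_policy *all, size_t n,
			  size_t table_count[TOY_POLICY_MAX])
{
	size_t i, skipped = 0;

	for (i = 0; i < TOY_POLICY_MAX; i++)
		table_count[i] = 0;

	for (i = 0; i < n; i++) {
		if (all[i].dir >= TOY_POLICY_MAX) {
			/* No hash table exists for this direction. */
			skipped++;
			continue;
		}
		table_count[all[i].dir]++;
	}
	return skipped;
}
```

Without the `continue`, the out-of-range direction would index past the per-direction tables, which corresponds to the invalid pointer dereference reported above.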


Re: question: tg3 driver/nics and inconsistent RX ring count

2016-07-29 Thread Siva Reddy Kallam
On Wed, Jul 27, 2016 at 4:25 AM, Michael Chan  wrote:
> On Tue, Jul 26, 2016 at 1:32 PM, Michal Soltys  wrote:
>> On 2016-07-26 22:06, Alexander Duyck wrote:
>>> On Tue, Jul 26, 2016 at 12:52 PM, Michal Soltys  wrote:
>>>> Hi,
>>>>
>>>> I have a few of BCM5720 and BCM5719 kinds sitting in Dell R320 and R520
>>>> servers - and all of them have certain peculiarity: they claim to have
>>>> up to 4 TX and RX rings (and this can be set/verified just fine through
>>>> ethtool -l/-L, with driver defaulting to 4 rings), indirection table
>>>> (ethtool -x) looks fine as well:
>>>>
>>>> RX flow hash indirection table for eth1b with 3 RX ring(s):
>>>> 0:  0 1 2 3 0 1 2 3
>>>> 8:  0 1 2 3 0 1 2 3
>>>> ..
>>>>
>>>> But this "3 RX ring(s)" is actually a real limit if I try to adjust
>>>> anything, for example all those commands would fail:
>>>>
>>>> ethtool -X eth1b equal 4
>>>> ethtool -X eth1b weight 1 1 1 1
>>>>
>>>> But would work fine for 3 and less rings. This was quickly tested with
>>>> different kernel/ethtool combinations, from old 3.16 to 4.6.y with same
>>>> exact results. Nothing fancier (-N/-U) is supported either.
>>>>
>>>> Any hints/comments about the cause of this and/or possible workarounds ?
>>>
>>> Well a quick look at the driver code makes it seem the problem lies in
>>> tg3_get_rxnfc.  I suspect the bug is related to the following lines:
>>>
>>> /* The first interrupt vector only
>>>  * handles link interrupts.
>>>  */
>>> info->data -= 1;
>>>
>>> I'm not sure what the number of interrupt vectors has to do with the
>>> number of rings.  Perhaps someone more familiar with the driver can
>>> point out why you would subtract 1 from tp->rxq_cnt to get the number
>>> of queues when it seems like it should be tp->rxq_cnt.
>>>
>>> Hope that helps.
>>>
>>> - Alex
>>>
>>
>> Ah thanks, seems to be the case then. Quick git blame shows it's been
>> since the very introduction of RSS indirection configurability (ca.
>> 2011) in this driver, 90415477bf1356f72acc34063ff52441fc10a754.
>>
>> I've CCed the author, maybe he will be able to shed some light.
>
> Matt is no longer working here.  The driver should support up to 5
> MSIX vectors and 4 RSS rings.  It looks like the code to subtract 1 in
> tg3_get_rxnfc() is not correct.  Siva will look further into this.
> Thanks for reporting the issue.
Yes, the code that subtracts 1 in tg3_get_rxnfc() looks incorrect. We will
upstream a patch to remove it.
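
The effect discussed in this thread can be modelled with a small sketch. It is a hypothetical model, not the tg3 code: `toy_reported_rings()` and `toy_equal_ok()` are invented names, and the sketch assumes that a request is checked against the ring count the driver reports, as the symptoms above suggest.

```c
#include <assert.h>

/* Hypothetical model: ring count reported to ethtool. */
static unsigned int toy_reported_rings(unsigned int rxq_cnt, int subtract_one)
{
	return subtract_one ? rxq_cnt - 1 : rxq_cnt;
}

/* An "equal N" request is accepted only if N <= reported rings. */
static int toy_equal_ok(unsigned int n, unsigned int reported)
{
	return n >= 1 && n <= reported;
}
```

With four configured rings and the subtraction in place, the reported count is 3 and "equal 4" is rejected; without the subtraction the same request is accepted.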


[PATCH 1/1] phy/micrel: Change phy_id_mask for KSZ8721

2016-07-29 Thread Alexander Stein
There are KSZ8721 PHYs with phy_id 0x00221619. In order to detect them
as PHY_ID_KSZ8001 compatible while keeping them distinct from
PHY_ID_KSZ9021, ignore the last two bits when matching PHY_ID.

Signed-off-by: Alexander Stein 
---
 drivers/net/phy/micrel.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
index 5a8fefc25157..469247b9bdc7 100644
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -839,7 +839,7 @@ static struct phy_driver ksphy_driver[] = {
 }, {
.phy_id = PHY_ID_KSZ8001,
.name   = "Micrel KSZ8001 or KS8721",
-   .phy_id_mask= 0x00ff,
+   .phy_id_mask= 0x00fc,
.features   = (PHY_BASIC_FEATURES | SUPPORTED_Pause),
.flags  = PHY_HAS_MAGICANEG | PHY_HAS_INTERRUPT,
.driver_data= _type,
@@ -963,7 +963,7 @@ MODULE_LICENSE("GPL");
 static struct mdio_device_id __maybe_unused micrel_tbl[] = {
{ PHY_ID_KSZ9021, 0x000e },
{ PHY_ID_KSZ9031, MICREL_PHY_ID_MASK },
-   { PHY_ID_KSZ8001, 0x00ff },
+   { PHY_ID_KSZ8001, 0x00fc },
{ PHY_ID_KS8737, MICREL_PHY_ID_MASK },
{ PHY_ID_KSZ8021, 0x00ff },
{ PHY_ID_KSZ8031, 0x00ff },
-- 
2.9.2
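
The masked comparison behind this change can be checked with a short sketch. The ID constants and the full-width mask values below are assumptions chosen for illustration (the masks shown in this archive copy of the patch appear shortened), and `toy_phy_match()` is a hypothetical helper, not the phylib matching routine.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed IDs and full-width masks, for illustration only. */
#define TOY_ID_KSZ8001		0x0022161AU
#define TOY_ID_KSZ9021		0x00221610U
#define TOY_MASK_8001_OLD	0x00ffffffU
#define TOY_MASK_8001_NEW	0x00fffffcU	/* ignores the last two bits */
#define TOY_MASK_9021		0x000ffffeU

/* A driver entry matches when the masked IDs are equal. */
static int toy_phy_match(uint32_t phy_id, uint32_t drv_id, uint32_t mask)
{
	return (phy_id & mask) == (drv_id & mask);
}
```

Under these assumed values, a PHY reporting 0x00221619 fails the old KSZ8001 match, passes the new one, and still does not match the KSZ9021 entry.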



Re: [PATCH v2 1/5] ethernet: add sun8i-emac driver

2016-07-29 Thread Andre Przywara
Hi,

On 25/07/16 20:54, Maxime Ripard wrote:
> On Wed, Jul 20, 2016 at 10:03:16AM +0200, LABBE Corentin wrote:
>> This patch adds support for sun8i-emac ethernet MAC hardware.
>> It could be found in Allwinner H3/A83T/A64 SoCs.
>>
>> It supports 10/100/1000 Mbit/s speed with half/full duplex.
>> It can use an internal PHY (MII 10/100) or an external PHY
>> via RGMII/RMII.
>>
>> Signed-off-by: LABBE Corentin 
>> ---
>>  drivers/net/ethernet/allwinner/Kconfig  |   13 +
>>  drivers/net/ethernet/allwinner/Makefile |1 +
>>  drivers/net/ethernet/allwinner/sun8i-emac.c | 2129 +++
>>  3 files changed, 2143 insertions(+)
>>  create mode 100644 drivers/net/ethernet/allwinner/sun8i-emac.c

...

>> diff --git a/drivers/net/ethernet/allwinner/sun8i-emac.c b/drivers/net/ethernet/allwinner/sun8i-emac.c
>> new file mode 100644
>> index 000..fc0c1dd
>> --- /dev/null
>> +++ b/drivers/net/ethernet/allwinner/sun8i-emac.c

...

>> +
>> +/* struct dma_desc - Structure of DMA descriptor used by the hardware
>> + * @status: Status of the frame written by HW, so RO for the
>> + *  driver (except for BIT(31) which is R/W)
>> + * @ctl: Information on the frame written by the driver (INT, len,...)
>> + * @buf_addr: physical address of the frame data
>> + * @next: physical address of next dma_desc
>> + */
>> +struct dma_desc {
>> +u32 status;
>> +u32 ctl;
>> +u32 buf_addr;
>> +u32 next;
>> +};
> 
> You should use the endian-aware variants here.

For the record: just doing the sparse annotation with __le32 here will
of course not be sufficient to make it work on BE kernels. I added
proper endianness conversion to all accesses to the descriptors and got
it to work with an arm64 big-endian kernel on the Pine64.

I put a patch here:
https://gist.github.com/apritzel/bc792c4dbbd8789f5f18aef538e8c440

This particular version is untested (though it compiles), since I just
adapted the working patch to the newer driver code and couldn't
test it yet.
I am not really an endianness expert, so I don't know if there are smarter
ways to tackle this, for instance whether we should provide access wrappers
for the DMA descriptor fields.

I will try to test this later today. If that works, feel free to merge
those changes into your driver.

Cheers,
Andre.
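
As a rough illustration of the "access wrappers" idea mentioned above, here is a userspace sketch. It is not taken from the linked patch: `struct toy_le32`, `toy_desc_write()` and `toy_desc_read()` are hypothetical names, and in the kernel the conversions would normally be done with cpu_to_le32() and le32_to_cpu() on __le32 fields.

```c
#include <assert.h>
#include <stdint.h>

/* Descriptor field stored as little-endian bytes, as the hardware reads it. */
struct toy_le32 {
	uint8_t b[4];
};

struct toy_dma_desc {
	struct toy_le32 status;
	struct toy_le32 ctl;
	struct toy_le32 buf_addr;
	struct toy_le32 next;
};

/* Store a CPU-order value as little-endian, whatever the host order is. */
static void toy_desc_write(struct toy_le32 *f, uint32_t v)
{
	f->b[0] = (uint8_t)(v & 0xff);
	f->b[1] = (uint8_t)((v >> 8) & 0xff);
	f->b[2] = (uint8_t)((v >> 16) & 0xff);
	f->b[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Load a little-endian field back into CPU order. */
static uint32_t toy_desc_read(const struct toy_le32 *f)
{
	return (uint32_t)f->b[0] | ((uint32_t)f->b[1] << 8) |
	       ((uint32_t)f->b[2] << 16) | ((uint32_t)f->b[3] << 24);
}
```

Routing every descriptor access through such a pair keeps the byte order in memory fixed, so the same code behaves identically on little-endian and big-endian hosts.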


[PATCH 1/3] net: ipconfig: Add device name to debug messages

2016-07-29 Thread Uwe Kleine-König
This simplifies understanding what happens when there is more than one
device.

Signed-off-by: Uwe Kleine-König 
---
 net/ipv4/ipconfig.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index 1d71c40eaaf3..369e4a004850 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -666,14 +666,14 @@ static const u8 ic_bootp_cookie[4] = { 99, 130, 83, 99 };
 #ifdef IPCONFIG_DHCP
 
 static void __init
-ic_dhcp_init_options(u8 *options)
+ic_dhcp_init_options(u8 *options, struct ic_device *d)
 {
u8 mt = ((ic_servaddr == NONE)
 ? DHCPDISCOVER : DHCPREQUEST);
u8 *e = options;
int len;
 
-   pr_debug("DHCP: Sending message type %d\n", mt);
+   pr_debug("DHCP: Sending message type %d (%s)\n", mt, d->dev->name);
 
memcpy(e, ic_bootp_cookie, 4);  /* RFC1048 Magic Cookie */
e += 4;
@@ -857,7 +857,7 @@ static void __init ic_bootp_send_if(struct ic_device *d, unsigned long jiffies_d
/* add DHCP options or BOOTP extensions */
 #ifdef IPCONFIG_DHCP
if (ic_proto_enabled & IC_USE_DHCP)
-   ic_dhcp_init_options(b->exten);
+   ic_dhcp_init_options(b->exten, d);
else
 #endif
ic_bootp_init_ext(b->exten);
@@ -1033,8 +1033,8 @@ static int __init ic_bootp_recv(struct sk_buff *skb, struct net_device *dev, str
/* Is it a reply to our BOOTP request? */
if (b->op != BOOTP_REPLY ||
b->xid != d->xid) {
-   net_err_ratelimited("DHCP/BOOTP: Reply not for us, op[%x] xid[%x]\n",
-   b->op, b->xid);
+   net_err_ratelimited("DHCP/BOOTP: Reply not for us on %s, op[%x] xid[%x]\n",
+   d->dev->name, b->op, b->xid);
goto drop_unlock;
}
 
@@ -1075,7 +1075,7 @@ static int __init ic_bootp_recv(struct sk_buff *skb, struct net_device *dev, str
}
}
 
-   pr_debug("DHCP: Got message type %d\n", mt);
+   pr_debug("DHCP: Got message type %d (%s)\n", mt, d->dev->name);
 
switch (mt) {
case DHCPOFFER:
-- 
2.8.1



[PATCH 2/3] net: ipconfig: Support using "delayed" DHCP replies

2016-07-29 Thread Uwe Kleine-König
The DHCP code only waits 1s between sending DHCP requests on different
devices and only accepts an answer for the device that sent out the last
request. Only the timeout at the end of a loop is increased iteratively,
which favours only the last device. This makes it impossible to work
with a DHCP server that takes a little more than 1s to answer and is
connected to a device that is not the last one.

Instead of also increasing the inter-device timeout, teach the code to
handle delayed replies.

To accomplish that, make *ic_dev track the current ic_device instead of
the current net_device and adapt all users accordingly. The relevant
change then is to reset d to ic_dev on a reply to ensure that the
follow-up request goes through the right device.

Signed-off-by: Uwe Kleine-König 
---
 net/ipv4/ipconfig.c | 29 ++---
 1 file changed, 10 insertions(+), 19 deletions(-)

diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index 369e4a004850..5af6736bd384 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -188,7 +188,7 @@ struct ic_device {
 };
 
 static struct ic_device *ic_first_dev __initdata;  /* List of open device */
-static struct net_device *ic_dev __initdata;   /* Selected device */
+static struct ic_device *ic_dev __initdata;/* Selected device */
 
 static bool __init ic_is_init_dev(struct net_device *dev)
 {
@@ -307,7 +307,7 @@ static void __init ic_close_devs(void)
while ((d = next)) {
next = d->next;
dev = d->dev;
-   if (dev != ic_dev && !netdev_uses_dsa(dev)) {
+   if (dev != ic_dev->dev && !netdev_uses_dsa(dev)) {
pr_debug("IP-Config: Downing %s\n", dev->name);
dev_change_flags(dev, d->flags);
}
@@ -372,7 +372,7 @@ static int __init ic_setup_if(void)
int err;
 
memset(&ir, 0, sizeof(ir));
-   strcpy(ir.ifr_ifrn.ifrn_name, ic_dev->name);
+   strcpy(ir.ifr_ifrn.ifrn_name, ic_dev->dev->name);
set_sockaddr(sin, ic_myaddr, 0);
if ((err = ic_devinet_ioctl(SIOCSIFADDR, &ir)) < 0) {
pr_err("IP-Config: Unable to set interface address (%d)\n",
@@ -396,7 +396,7 @@ static int __init ic_setup_if(void)
 * out, we'll try to muddle along.
 */
if (ic_dev_mtu != 0) {
-   strcpy(ir.ifr_name, ic_dev->name);
+   strcpy(ir.ifr_name, ic_dev->dev->name);
ir.ifr_mtu = ic_dev_mtu;
if ((err = ic_dev_ioctl(SIOCSIFMTU, &ir)) < 0)
pr_err("IP-Config: Unable to set interface mtu to %d (%d)\n",
@@ -568,7 +568,7 @@ ic_rarp_recv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt
goto drop_unlock;
 
/* We have a winner! */
-   ic_dev = dev;
+   ic_dev = d;
if (ic_myaddr == NONE)
ic_myaddr = tip;
ic_servaddr = sip;
@@ -655,8 +655,6 @@ static struct packet_type bootp_packet_type __initdata = {
.func = ic_bootp_recv,
 };
 
-static __be32 ic_dev_xid;  /* Device under configuration */
-
 /*
  *  Initialize DHCP/BOOTP extension fields in the request.
  */
@@ -1038,12 +1036,6 @@ static int __init ic_bootp_recv(struct sk_buff *skb, struct net_device *dev, str
goto drop_unlock;
}
 
-   /* Is it a reply for the device we are configuring? */
-   if (b->xid != ic_dev_xid) {
-   net_err_ratelimited("DHCP/BOOTP: Ignoring delayed packet\n");
-   goto drop_unlock;
-   }
-
/* Parse extensions */
if (ext_len >= 4 &&
!memcmp(b->exten, ic_bootp_cookie, 4)) { /* Check magic cookie */
@@ -1130,7 +1122,7 @@ static int __init ic_bootp_recv(struct sk_buff *skb, struct net_device *dev, str
}
 
/* We have a winner! */
-   ic_dev = dev;
+   ic_dev = d;
ic_myaddr = b->your_ip;
ic_servaddr = b->server_ip;
ic_addrservaddr = b->iph.saddr;
@@ -1225,9 +1217,6 @@ static int __init ic_dynamic(void)
timeout = CONF_BASE_TIMEOUT + (timeout % (unsigned int) CONF_TIMEOUT_RANDOM);
for (;;) {
 #ifdef IPCONFIG_BOOTP
-   /* Track the device we are configuring */
-   ic_dev_xid = d->xid;
-
if (do_bootp && (d->able & IC_BOOTP))
ic_bootp_send_if(d, jiffies - start_jiffies);
 #endif
@@ -1245,6 +1234,8 @@ static int __init ic_dynamic(void)
(ic_proto_enabled & IC_USE_DHCP) &&
ic_dhcp_msgtype != DHCPACK) {
ic_got_reply = 0;
+   /* continue on device that got the reply */
+   d = ic_dev;
pr_cont(",");
continue;
}
@@ -1487,7 +1478,7 @@ static int __init ip_auto_config(void)
 #endif /* IPCONFIG_DYNAMIC */
} else {
/* Device selected 

[PATCH 3/3] net: ipconfig: drop inter-device timeout

2016-07-29 Thread Uwe Kleine-König
Now that ipconfig learned to handle "delayed replies" in the previous
commit, there is no reason any more to delay sending a first request per
device.

Signed-off-by: Uwe Kleine-König 
---
 net/ipv4/ipconfig.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index 5af6736bd384..42cf629357b5 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -85,7 +85,6 @@
 /* Define the timeout for waiting for a DHCP/BOOTP/RARP reply */
 #define CONF_OPEN_RETRIES  2   /* (Re)open devices twice */
 #define CONF_SEND_RETRIES  6   /* Send six requests per open */
-#define CONF_INTER_TIMEOUT (HZ)/* Inter-device timeout: 1 second */
 #define CONF_BASE_TIMEOUT  (HZ*2)  /* Initial timeout: 2 seconds */
 #define CONF_TIMEOUT_RANDOM(HZ)/* Maximum amount of randomization */
 #define CONF_TIMEOUT_MULT  *7/4/* Rate of timeout growth */
@@ -1225,9 +1224,11 @@ static int __init ic_dynamic(void)
ic_rarp_send_if(d);
 #endif
 
-   jiff = jiffies + (d->next ? CONF_INTER_TIMEOUT : timeout);
-   while (time_before(jiffies, jiff) && !ic_got_reply)
-   schedule_timeout_uninterruptible(1);
+   if (!d->next) {
+   jiff = jiffies + timeout;
+   while (time_before(jiffies, jiff) && !ic_got_reply)
+   schedule_timeout_uninterruptible(1);
+   }
 #ifdef IPCONFIG_DHCP
/* DHCP isn't done until we get a DHCPACK. */
if ((ic_got_reply & IC_BOOTP) &&
-- 
2.8.1



[PATCH 0/3] net: ipconfig: improve DHCP timeout handling

2016-07-29 Thread Uwe Kleine-König
Hello,

this series teaches the ipconfig code to handle a DHCP reply on eth0 even if a
request on eth1 was already sent out.
This is a follow-up fix to 2513dfb83fc7 ("ipconfig: handle case of delayed DHCP
server") that dropped a late reply.

This makes it possible to work with slow DHCP servers at all in some
configurations and improves boot speed in general.

The first patch is not really necessary, it only helps decoding debug messages
when there is more than one device.

Best regards
Uwe

Uwe Kleine-König (3):
  net: ipconfig: Add device name to debug messages
  net: ipconfig: Support using "delayed" DHCP replies
  net: ipconfig: drop inter-device timeout

 net/ipv4/ipconfig.c | 50 +-
 1 file changed, 21 insertions(+), 29 deletions(-)

-- 
2.8.1
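
The core idea of the series, attributing a reply to the device whose request it answers rather than to the device that sent last, can be sketched as below. This is a simplified userspace model: `struct toy_ic_device` and `toy_match_reply()` are hypothetical, and the sketch assumes each device carries its own transaction id, as the `d->xid` comparison in patch 1/3 suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_ic_device {
	struct toy_ic_device *next;
	uint32_t xid;		/* transaction id used for this device */
	const char *name;
};

/* Return the device whose request this reply answers, or NULL. */
static struct toy_ic_device *toy_match_reply(struct toy_ic_device *first,
					     uint32_t reply_xid)
{
	struct toy_ic_device *d;

	for (d = first; d; d = d->next)
		if (d->xid == reply_xid)
			return d;
	return NULL;
}
```

A late reply carrying eth0's id is then still matched to eth0 even after a request went out on eth1, and the follow-up request can continue on the matched device.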



Re: [PATCH 15/15] ethernet: ti: davinci_emac: add missing of_node_put after calling of_parse_phandle

2016-07-29 Thread Mugunthan V N
+ Linux Omap ML

On Wednesday 27 July 2016 01:13 PM, Peter Chen wrote:
>  
>> On Wednesday 27 July 2016 07:50 AM, Peter Chen wrote:
>>> of_node_put() needs to be called once the device node obtained from
>>> of_parse_phandle() is no longer in use.
>>>
>>> Signed-off-by: Peter Chen 
>>> ---
>>>  drivers/net/ethernet/ti/davinci_emac.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/net/ethernet/ti/davinci_emac.c
>>> b/drivers/net/ethernet/ti/davinci_emac.c
>>> index c6c5465..d8cb9d0 100644
>>> --- a/drivers/net/ethernet/ti/davinci_emac.c
>>> +++ b/drivers/net/ethernet/ti/davinci_emac.c
>>> @@ -1571,6 +1571,7 @@ static int emac_dev_open(struct net_device *ndev)
>>> if (priv->phy_node) {
>>> phydev = of_phy_connect(ndev, priv->phy_node,
>>> _adjust_link, 0, 0);
>>> +   of_node_put(priv->phy_node);
>>> if (!phydev) {
>>> dev_err(emac_dev, "could not connect to phy %s\n",
>>> priv->phy_node->full_name);
>>>
>>
>> phy_node is accessed if of_phy_connect() returns an error, so the
>> of_node_put() call has to be moved after the dev_err log
>>
> 
> Yeah, you are right. I will change it, thanks.
> 

I see one more problem: with of_node_put() called here, when you stop and
reopen the interface there is no way to get the phy_node again and the
interface will be unusable, so of_node_put() should be moved to remove().

Regards
Mugunthan V N



Re: [PATCH v2 1/5] ethernet: add sun8i-emac driver

2016-07-29 Thread Arnd Bergmann
On Thursday, July 28, 2016 3:18:26 PM CEST LABBE Corentin wrote:
> 
> I have reworked the locking and it seems that no locking is necessary.
> I have added the following comment about the locking strategy:
> 
> /* Locking strategy:
>  * RX queue does not need any lock since only sun8i_emac_poll() accesses it.
>  * (All other RX modifiers (ringparam/ndo_stop) disable NAPI and so sun8i_emac_poll())
>  * TX queue is handled by sun8i_emac_xmit(), sun8i_emac_complete_xmit() and sun8i_emac_tx_timeout()
>  * (All other RX modifiers (ringparam/ndo_stop) disable NAPI and stop queue)
>  *
>  * sun8i_emac_xmit() could fire only once (netif_tx_lock)
>  * sun8i_emac_complete_xmit() could fire only once (called from NAPI)
>  * sun8i_emac_tx_timeout() could fire only once (netif_tx_lock) and couldn't
>  * race with sun8i_emac_xmit (due to netif_tx_lock) and with sun8i_emac_complete_xmit which disable NAPI.
>  *
>  * So only sun8i_emac_xmit and sun8i_emac_complete_xmit could fire at the same time.
>  * But they never could modify the same descriptors:
>  * - sun8i_emac_complete_xmit() will modify only descriptors with empty status
>  * - sun8i_emac_xmit() will modify only descriptors set to DCLEAN
>  * Proper memory barriers ensure that descriptor set to DCLEAN could not be
>  * modified later by sun8i_emac_complete_xmit().
>  * */

Sounds good, the comment is certainly very helpful here.

Arnd


Re: [PATCH 2/3] stmmac: change dma descriptors to __le32

2016-07-29 Thread Michael Weiser
Hi Giuseppe,

On Tue, Jul 26, 2016 at 02:13:45PM +0200, Giuseppe CAVALLARO wrote:

> > -   ep++;
> > +   ep++);
> there is a build problem here.
> Pls fix it.

Thanks for your review. I've got it fixed locally. An updated patchset
will be forthcoming as soon as I've sorted a problem with sun4i-emac
which I've been asked to look into as well.
-- 
Thanks,
Michael


[PATCH net v2 0/3] r8169:fix 3 runtime pm related issues.

2016-07-29 Thread Chunhao Lin
v2:
use "struct device *d = &tp->pci_dev->dev" instead of "struct pci_dev *pdev = tp->pci_dev"

v1:
This series of patches fix 3 runtime pm related issues that are listed below.

Chunhao Lin (3):
  r8169:fix kernel log spam when set or get hardware wol setting.
  r8169:add checking driver's runtime pm status in
rtl8169_get_ethtool_stats()
  r8169:fix nic may not work after changing mac address.

 drivers/net/ethernet/realtek/r8169.c | 37 
 1 file changed, 33 insertions(+), 4 deletions(-)

-- 
1.9.1



[PATCH net v2 1/3] r8169:fix kernel log spam when set or get hardware wol setting.

2016-07-29 Thread Chunhao Lin
The NIC is put into the D3 state during runtime suspend. When setting or
getting the hardware WoL setting, the driver writes or reads hardware
registers. If this is done in the runtime suspend state, the NIC is in D3
and every hardware register read returns all 0xff. This makes the driver
think that the register flag has not toggled, and it then prints the
warning message "rtl_counters_cond == 1 (loop: 1000, delay: 10)" to the
kernel log.

To fix this issue, check the driver's runtime PM status in
rtl8169_get_wol() and rtl8169_set_wol().

Signed-off-by: Chunhao Lin 
---
 drivers/net/ethernet/realtek/r8169.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 0e62d74..00c387b 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -1749,13 +1749,21 @@ static u32 __rtl8169_get_wol(struct rtl8169_private *tp)
 static void rtl8169_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
 {
struct rtl8169_private *tp = netdev_priv(dev);
+   struct device *d = &tp->pci_dev->dev;
+
+   pm_runtime_get_noresume(d);
 
rtl_lock_work(tp);
 
wol->supported = WAKE_ANY;
-   wol->wolopts = __rtl8169_get_wol(tp);
+   if (pm_runtime_active(d))
+   wol->wolopts = __rtl8169_get_wol(tp);
+   else
+   wol->wolopts = tp->saved_wolopts;
 
rtl_unlock_work(tp);
+
+   pm_runtime_put_noidle(d);
 }
 
 static void __rtl8169_set_wol(struct rtl8169_private *tp, u32 wolopts)
@@ -1845,6 +1853,9 @@ static void __rtl8169_set_wol(struct rtl8169_private *tp, u32 wolopts)
 static int rtl8169_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
 {
struct rtl8169_private *tp = netdev_priv(dev);
+   struct device *d = &tp->pci_dev->dev;
+
+   pm_runtime_get_noresume(d);
 
rtl_lock_work(tp);
 
@@ -1852,12 +1863,17 @@ static int rtl8169_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
tp->features |= RTL_FEATURE_WOL;
else
tp->features &= ~RTL_FEATURE_WOL;
-   __rtl8169_set_wol(tp, wol->wolopts);
+   if (pm_runtime_active(d))
+   __rtl8169_set_wol(tp, wol->wolopts);
+   else
+   tp->saved_wolopts = wol->wolopts;
 
rtl_unlock_work(tp);
 
device_set_wakeup_enable(&tp->pci_dev->dev, wol->wolopts);
 
+   pm_runtime_put_noidle(d);
+
return 0;
 }
 
-- 
1.9.1
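
The failure mode described in this commit message can be reproduced in a toy model. Everything here is hypothetical (`struct toy_nic`, `toy_read()`, `toy_wait_flag_clear()`); the sketch only assumes, as the message states, that reads from a device in D3 return all ones.

```c
#include <assert.h>
#include <stdint.h>

struct toy_nic {
	int powered;		/* 0 models the D3 state */
	uint32_t reg;
};

/* A register read from a powered-down device returns all ones. */
static uint32_t toy_read(const struct toy_nic *nic)
{
	return nic->powered ? nic->reg : 0xffffffffU;
}

/* Poll until the busy flag clears; return 0 on success, -1 on timeout. */
static int toy_wait_flag_clear(const struct toy_nic *nic, uint32_t flag,
			       int loops)
{
	while (loops-- > 0)
		if (!(toy_read(nic) & flag))
			return 0;
	return -1;
}
```

When the device is powered the flag is seen clear on the first read; in D3 every read has the flag set, so the loop runs to its limit, which is the condition that triggers the logged warning.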



[PATCH net v2 2/3] r8169:add checking driver's runtime pm status in rtl8169_get_ethtool_stats()

2016-07-29 Thread Chunhao Lin
Do not call rtl8169_update_counters() to dump the tally counters when the
driver is in the runtime suspend state.

Calling rtl8169_update_counters() in the runtime suspend state produces
the warning message "rtl_counters_cond == 1 (loop: 1000, delay: 10)".

Signed-off-by: Chunhao Lin 
---
 drivers/net/ethernet/realtek/r8169.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 00c387b..d0b5cae 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -2308,11 +2308,17 @@ static void rtl8169_get_ethtool_stats(struct net_device *dev,
  struct ethtool_stats *stats, u64 *data)
 {
struct rtl8169_private *tp = netdev_priv(dev);
+   struct device *d = &tp->pci_dev->dev;
struct rtl8169_counters *counters = tp->counters;
 
ASSERT_RTNL();
 
-   rtl8169_update_counters(dev);
+   pm_runtime_get_noresume(d);
+
+   if (pm_runtime_active(d))
+   rtl8169_update_counters(dev);
+
+   pm_runtime_put_noidle(d);
 
data[0] = le64_to_cpu(counters->tx_packets);
data[1] = le64_to_cpu(counters->rx_packets);
-- 
1.9.1



[PATCH net v2 3/3] r8169:fix nic may not work after changing mac address.

2016-07-29 Thread Chunhao Lin
When there is no AC power, the NIC may not work after changing the MAC
address. Please refer to the following link.
http://www.spinics.net/lists/netdev/msg356572.html

This issue is caused by runtime power management. When there is no AC
power, if we put the NIC down (ifconfig down), the driver will be in the
runtime suspend state and the hardware will be put into the D3 state.
During this time, the driver cannot access hardware registers. So if you
set a new MAC address during this time, it will not be written to the
hardware. After resume, the NIC will keep using the old MAC address and
the network will not work normally.

This patch checks the runtime PM status when setting the MAC address.
If the driver is in the runtime suspend state, it skips writing the MAC
address to the hardware, keeps the new MAC address, and writes it during
runtime resume.

Signed-off-by: Chunhao Lin 
---
 drivers/net/ethernet/realtek/r8169.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index d0b5cae..e55638c 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -4480,6 +4480,7 @@ static void rtl_rar_set(struct rtl8169_private *tp, u8 *addr)
 static int rtl_set_mac_address(struct net_device *dev, void *p)
 {
struct rtl8169_private *tp = netdev_priv(dev);
+   struct device *d = &tp->pci_dev->dev;
struct sockaddr *addr = p;
 
if (!is_valid_ether_addr(addr->sa_data))
@@ -4487,7 +4488,12 @@ static int rtl_set_mac_address(struct net_device *dev, void *p)
 
memcpy(dev->dev_addr, addr->sa_data, dev->addr_len);
 
-   rtl_rar_set(tp, dev->dev_addr);
+   pm_runtime_get_noresume(d);
+
+   if (pm_runtime_active(d))
+   rtl_rar_set(tp, dev->dev_addr);
+
+   pm_runtime_put_noidle(d);
 
return 0;
 }
@@ -7890,6 +7896,7 @@ static int rtl8169_runtime_resume(struct device *device)
struct pci_dev *pdev = to_pci_dev(device);
struct net_device *dev = pci_get_drvdata(pdev);
struct rtl8169_private *tp = netdev_priv(dev);
+   rtl_rar_set(tp, dev->dev_addr);
 
if (!tp->TxDescArray)
return 0;
-- 
1.9.1
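
The "keep the new address and write it at resume" behaviour can be sketched in userspace as follows. The names `struct toy_mac_nic`, `toy_set_mac()` and `toy_runtime_resume()` are hypothetical, and the `hw_addr` array merely stands in for the hardware address registers.

```c
#include <assert.h>
#include <string.h>

#define TOY_ETH_ALEN 6

struct toy_mac_nic {
	int active;				/* runtime PM state */
	unsigned char dev_addr[TOY_ETH_ALEN];	/* software copy */
	unsigned char hw_addr[TOY_ETH_ALEN];	/* models the hardware registers */
};

/* Always record the address; touch the "hardware" only when active. */
static void toy_set_mac(struct toy_mac_nic *nic, const unsigned char *addr)
{
	memcpy(nic->dev_addr, addr, TOY_ETH_ALEN);
	if (nic->active)
		memcpy(nic->hw_addr, addr, TOY_ETH_ALEN);
}

static void toy_runtime_resume(struct toy_mac_nic *nic)
{
	nic->active = 1;
	/* Replay the address that may have been set while suspended. */
	memcpy(nic->hw_addr, nic->dev_addr, TOY_ETH_ALEN);
}
```

Setting the address while suspended leaves the modelled hardware untouched, and the resume path brings it in line with the software copy.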



Re: [PATCH v2 3/4] drivers: net: cpsw: use of_platform_depopulate()

2016-07-29 Thread Mugunthan V N
On Thursday 28 July 2016 11:20 PM, Grygorii Strashko wrote:
> Use of_platform_depopulate() in cpsw_remove() instead of
> of_device_unregister(), because CPSW child devices will not be
> recreated otherwise on next insmod. of_platform_depopulate() is
> correct way now as it will ensure that all steps done in
> of_platform_populate() are reverted, including cleaning up of
> OF_POPULATED flag.
> 
> Signed-off-by: Grygorii Strashko 

Reviewed-by: Mugunthan V N 

Regards
Mugunthan V N



Re: [PATCH v2 2/4] drivers: net: cpsw: fix wrong regs access in cpsw_remove

2016-07-29 Thread Mugunthan V N
On Thursday 28 July 2016 11:20 PM, Grygorii Strashko wrote:
> The L3 error will be generated and system will crash during unloading
> of CPSW driver if CPSW is used as module and ethX devices are down.
> This happens because CPSW can be powered off by PM runtime now when ethX
> devices are down.
> 
> Hence, ensure that CPSW is powered up by PM runtime before performing any
> deinitialization actions which require CPSW registers access. In case
> of PM runtime error just leave cpsw_remove() as we can't do anything
> anymore.
> 
> Signed-off-by: Grygorii Strashko 

Reviewed-by: Mugunthan V N 

Regards
Mugunthan V N



Re: [PATCH 2/3] stmmac: change dma descriptors to __le32

2016-07-29 Thread Giuseppe CAVALLARO

On 7/29/2016 9:53 AM, Michael Weiser wrote:
> Hi Giuseppe,
>
> On Tue, Jul 26, 2016 at 02:13:45PM +0200, Giuseppe CAVALLARO wrote:
>
>>> -   ep++;
>>> +   ep++);
>> there is a build problem here.
>> Pls fix it.
>
> Thanks for your review. I've got it fixed locally. An updated patchset
> will be forthcoming as soon as I've sorted a problem with sun4i-emac
> which I've been asked to look into as well.

OK, once you send the new version I will also do a run on an ARM-based box.

peppe


Re: [PATCH ipsec] xfrm: Ignore socket policies when rebuilding hash tables

2016-07-29 Thread Herbert Xu
On Fri, Jul 29, 2016 at 09:57:32AM +0200, Tobias Brunner wrote:
> Whenever thresholds are changed the hash tables are rebuilt.  This is
> done by enumerating all policies and hashing and inserting them into
> the right table according to the thresholds and direction.
> 
> Because socket policies are also contained in net->xfrm.policy_all but
> no hash tables are defined for their direction (dir + XFRM_POLICY_MAX)
> this causes a NULL or invalid pointer dereference after returning from
> policy_hash_bysel() if the rebuild is done while any socket policies
> are installed.
> 
> Since the rebuild after changing thresholds is scheduled this crash
> could even occur if the userland sets thresholds seemingly before
> installing any socket policies.
> 
> Fixes: 53c2e285f970 ("xfrm: Do not hash socket policies")
> Signed-off-by: Tobias Brunner 

Acked-by: Herbert Xu 

Good catch, thanks!
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: Microsemi VSC 8531/41 PHY Driver

2016-07-29 Thread Andrew Lunn
> > +/* RGMII Rx Clock delay value change with board lay-out */
> > +static u8 rgmii_rx_clk_delay = RGMII_RX_CLK_DELAY_1_1_NS;
> 
> Doesn't this stop you from having a board with two PHYs with different
> layouts? You should be getting this value from the device tree.
> 
> Raju: As of now, the RGMII Rx clock delay should be 1.1 ns, which is the
> optimized/recommended value.
> We tested on a BeagleBone Black with a VSC 8531 PHY.
> We would like to provide a new function to configure the correct/required
> value based on the PHY layout, along with other RGMII configuration
> parameters, as part of our next implementation.

Please either do it properly now or hard code it as the default, and
then later replace it with device tree, etc. We don't like to see half
finished features.

> What are you locking against?
> 
> Raju: The VSC 8531 has different pages. Whenever the PHY control registers
> are accessed over MDC/MDIO, the page number is set first and then the
> register address is read or written. The default page should be page 0.
> When I want to access a register on a non-default page, I have to lock PHY
> device access so that changing the page number and accessing the register
> happen as one atomic operation.

I understand all that, which is why I asked, "what are you locking
against?", not "why are you locking?" What are the other call paths? I
don't see you taking this lock anywhere else? Should you be? I would
just like to see a comment which suggests you understand when this
lock is needed, and when not.

 Andrew
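
The paged access being discussed can be modelled as below. This is a hypothetical sketch, not the Microsemi driver: `struct toy_phy` and `toy_read_paged()` are invented, the `locked` flag stands in for a mutex, and the register file is a plain array.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGES 4
#define TOY_REGS  32

struct toy_phy {
	int locked;			/* stands in for a mutex */
	unsigned int page;		/* currently selected page */
	uint16_t regs[TOY_PAGES][TOY_REGS];
};

/* Select page, read, and restore page 0 as one critical section. */
static uint16_t toy_read_paged(struct toy_phy *phy, unsigned int page,
			       unsigned int reg)
{
	uint16_t val;

	assert(!phy->locked);		/* a real driver would block here */
	phy->locked = 1;
	phy->page = page;
	val = phy->regs[phy->page][reg];
	phy->page = 0;
	phy->locked = 0;
	return val;
}
```

The sequence is only safe if every other path that reads or writes registers takes the same lock, since an access that bypasses it could run while a non-default page is selected.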


Re: [PATCH v2 1/4] net: ethernet: ti: cpdma: fix lockup in cpdma_ctlr_destroy()

2016-07-29 Thread Mugunthan V N
On Thursday 28 July 2016 11:20 PM, Grygorii Strashko wrote:
> Fix deadlock in cpdma_ctlr_destroy() which is triggered now on
> cpsw module removal:
>  cpsw_remove()
>  - cpdma_ctlr_destroy()
>- spin_lock_irqsave(>lock, flags)
>- cpdma_ctlr_stop()
>  - spin_lock_irqsave(>lock, flags);
>- cpdma_chan_destroy()
>  - spin_lock_irqsave(>lock, flags);
> 
> The issue has not been observed before because CPDMA channels have
> been destroyed manually by CPSW until commit d941ebe88a41 ("net:
> ethernet: ti: cpsw: use destroy ctlr to destroy channels") was merged.
> 
> Signed-off-by: Grygorii Strashko 

Reviewed-by: Mugunthan V N 

Regards
Mugunthan V N
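
The call chain quoted above has the shape of a self-deadlock on a non-recursive lock. The sketch below models that shape in userspace, assuming all three acquisitions are of the same lock (the lock name is not legible in this archive copy). `struct toy_lock` and the `toy_*` functions are hypothetical, and a failed try-lock stands in for the point where a real spinlock would spin forever.

```c
#include <assert.h>

/* Toy non-recursive lock: a second acquire by the holder would spin forever. */
struct toy_lock {
	int held;
};

/* Returns 1 on success, 0 where a real spinlock would deadlock. */
static int toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return 0;
	l->held = 1;
	return 1;
}

static void toy_unlock(struct toy_lock *l)
{
	l->held = 0;
}

/* Inner helper that takes the lock itself, like the stop/destroy helpers. */
static int toy_stop(struct toy_lock *l)
{
	if (!toy_trylock(l))
		return -1;
	toy_unlock(l);
	return 0;
}

/* Buggy shape: call the helper while already holding the lock. */
static int toy_destroy_buggy(struct toy_lock *l)
{
	int ret;

	toy_trylock(l);
	ret = toy_stop(l);
	toy_unlock(l);
	return ret;
}

/* One possible fix: do not hold the lock across the helper call. */
static int toy_destroy_fixed(struct toy_lock *l)
{
	return toy_stop(l);
}
```

The fixed variant is only one way to break the cycle; the actual patch may restructure the locking differently.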


Re: [PATCH v2 3/5] ARM: sun8i: dt: Add DT bindings documentation for Allwinner sun8i-emac

2016-07-29 Thread LABBE Corentin
On Thu, Jul 28, 2016 at 08:49:16PM +0200, Maxime Ripard wrote:
> On Thu, Jul 28, 2016 at 03:40:31PM +0200, LABBE Corentin wrote:
> > On Thu, Jul 21, 2016 at 09:55:19AM +0200, Maxime Ripard wrote:
> > > Hi,
> > > 
> > > On Wed, Jul 20, 2016 at 10:03:18AM +0200, LABBE Corentin wrote:
> > > > This patch adds documentation for Device-Tree bindings for the
> > > > Allwinner sun8i-emac driver.
> > > > 
> > > > Signed-off-by: LABBE Corentin 
> > > > ---
> > > >  .../bindings/net/allwinner,sun8i-emac.txt  | 65 ++
> > > >  1 file changed, 65 insertions(+)
> > > >  create mode 100644 Documentation/devicetree/bindings/net/allwinner,sun8i-emac.txt
> > > > 
> > > > diff --git a/Documentation/devicetree/bindings/net/allwinner,sun8i-emac.txt b/Documentation/devicetree/bindings/net/allwinner,sun8i-emac.txt
> > > > new file mode 100644
> > > > index 000..4bf4e53
> > > > --- /dev/null
> > > > +++ b/Documentation/devicetree/bindings/net/allwinner,sun8i-emac.txt
> > > > @@ -0,0 +1,65 @@
> > > > +* Allwinner sun8i EMAC ethernet controller
> > > > +
> > > > +Required properties:
> > > > +- compatible: "allwinner,sun8i-a83t-emac", "allwinner,sun8i-h3-emac",
> > > > +   or "allwinner,sun50i-a64-emac"
> > > > +- reg: address and length of the register sets for the device.
> > > > +- reg-names: should be "emac" and "syscon", matching the register sets
> > > 
> > > Blindly mapping a register of some other device on the SoC doesn't
> > > look very reasonable.
> > 
> > As we discuss after this mail on IRC, this register is dedicated to EMAC.
> 
> I don't think we did. It's still right in the middle of some other
> hardware block register space. You actually have a syscon driver to do
> just that, why not use it?
> 

I will try with the syscon driver.

> > > > +See ethernet.txt in the same directory for generic bindings for 
> > > > ethernet
> > > > +controllers.
> > > > +
> > > > +The device node referenced by "phy" or "phy-handle" should be a child 
> > > > node
> > > > +of this node. See phy.txt for the generic PHY bindings.
> > > > +
> > > > +Optional properties:
> > > > +- phy-supply: phandle to a regulator if the PHY needs one
> > > > +- phy-io-supply: phandle to a regulator if the PHY needs another one 
> > > > for I/O.
> > > > +This is sometimes found with RGMII PHYs, which use a 
> > > > second
> > > > +regulator for the lower I/O voltage.
> > > > +- allwinner,tx-delay: The setting of the TX clock delay chain
> > > > +- allwinner,rx-delay: The setting of the RX clock delay chain
> > > 
> > > In which unit? What is the default value?
> > 
> > The unit is unknown to me, but I have added a comment for the
> > default and acceptable range value.
> 
> That's unfortunate. We'll see how the DT maintainers feel about that.
> 

I have searched for txdelay in Documentation, and found a few drivers that give 
the units (us/ps).
But in those cases, the value in ps/us must be looked up in a table indexed by the 
tx/rx delay value.
So the setting always seems to be a raw number, and for sun8i-emac nothing in the 
user manual helps to determine what each value corresponds to.

So the right value is found either by trial and error or by copying the value from 
the fex file.

Regards

LABBE Corentin


[PATCH ipsec] xfrm: Ignore socket policies when rebuilding hash tables

2016-07-29 Thread Tobias Brunner
Whenever thresholds are changed the hash tables are rebuilt.  This is
done by enumerating all policies and hashing and inserting them into
the right table according to the thresholds and direction.

Because socket policies are also contained in net->xfrm.policy_all but
no hash tables are defined for their direction (dir + XFRM_POLICY_MAX)
this causes a NULL or invalid pointer dereference after returning from
policy_hash_bysel() if the rebuild is done while any socket policies
are installed.

Since the rebuild after changing thresholds is scheduled this crash
could even occur if the userland sets thresholds seemingly before
installing any socket policies.

Fixes: 53c2e285f970 ("xfrm: Do not hash socket policies")
Signed-off-by: Tobias Brunner 
---
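The skip described above can be modelled outside the kernel. The sketch
below is a user-space illustration, not the kernel code: the demo_* names
and enum values are stand-ins, and it assumes the direction is carried in
the low three bits of the policy index, with socket policies using
dir + POLICY_MAX as explained in the description.

```c
#include <stddef.h>

/* Stand-in directions for this sketch; the kernel defines its own. */
enum { POLICY_IN, POLICY_OUT, POLICY_FWD, POLICY_MAX };

struct demo_policy {
	unsigned int index;	/* low 3 bits assumed to carry the direction */
};

/* Assumed mapping from a policy index to its direction. */
static int demo_id2dir(unsigned int index)
{
	return index & 7;
}

/* Count the policies that would be re-hashed; socket policies are skipped. */
static size_t demo_count_rehashed(const struct demo_policy *p, size_t n)
{
	size_t i, hashed = 0;

	for (i = 0; i < n; i++) {
		if (demo_id2dir(p[i].index) >= POLICY_MAX)
			continue;	/* no hash table exists for this direction */
		hashed++;
	}
	return hashed;
}
```

Without the check, a policy whose direction is POLICY_MAX or above would be
hashed into a table that was never set up for it.
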
 net/xfrm/xfrm_policy.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index b5e665b3cfb0..45f9cf97ea25 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -626,6 +626,10 @@ static void xfrm_hash_rebuild(struct work_struct *work)
 
/* re-insert all policies by order of creation */
list_for_each_entry_reverse(policy, &net->xfrm.policy_all, walk.all) {
+   if (xfrm_policy_id2dir(policy->index) >= XFRM_POLICY_MAX) {
+   /* skip socket policies */
+   continue;
+   }
newpos = NULL;
chain = policy_hash_bysel(net, &policy->selector,
  policy->family,
-- 
1.9.1


Re: Microsemi VSC 8531/41 PHY Driver

2016-07-29 Thread Andrew Lunn
On Thu, Jul 28, 2016 at 06:44:37AM +, Raju Lakkaraju wrote:
> Hello Andrew,
> 
> Thank you for the valuable comments.
> Please see my responses inline.
> 
> Thanks,
> Raju
> 
> -Original Message-
> From: Andrew Lunn [mailto:and...@lunn.ch] 
> Sent: Tuesday, July 26, 2016 6:14 PM
> To: Raju Lakkaraju
> Cc: netdev@vger.kernel.org; f.faine...@gmail.com; Allan Nielsen
> Subject: Re: Microsemi VSC 8531/41 PHY Driver
> 
> 
> > +/* RGMII Rx Clock delay value change with board lay-out */
> > +static u8 rgmii_rx_clk_delay = RGMII_RX_CLK_DELAY_1_1_NS;
> 
> Doesn't this stop you from having a board with two PHYs with different 
> layouts? You should be getting this value from the device tree.
> 
> Raju: As of now, the RGMII Rx clock delay should be 1.1 ns, which is the 
> optimized/recommended value. 
> We tested on a BeagleBone Black with the VSC 8531 PHY.
> As part of our next implementation, we would like to provide a new function 
> to configure the required value based on the PHY layout, 
> along with the other RGMII configuration parameters.

Hi Raju

Please can you use standard email quoting, just like everybody else does.

   Andrew


Re: [PATCH] bpf: fix size of copy_to_user in percpu map.

2016-07-29 Thread Daniel Borkmann

On 07/29/2016 08:47 AM, Alexei Starovoitov wrote:
> On Thu, Jul 28, 2016 at 05:42:21PM -0700, William Tu wrote:
> > The total size of value copy_to_user() writes to userspace should
> > be the (current number of cpu) * (value size), instead of
> > num_possible_cpus() * (value size).  Found by samples/bpf/test_maps.c,
> > which always copies 512 bytes to userspace, crashing the userspace
> > program stack.
>
> hmm. I'm missing something. The sample code assumes no cpu hotplug,
> so sysconf(_SC_NPROCESSORS_CONF) == num_possible_cpu == num_online_cpu,
> unless the crazy INIT_ALL_POSSIBLE config option is used.

Are you using ARM by chance? What is the count that you get in
user space and from kernel side?

http://lists.infradead.org/pipermail/linux-arm-kernel/2011-June/054177.html
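
To compare the counts on the user-space side, something like the sketch
below could be used. It is only an illustration: the demo_* helpers are
made-up names, and sizing the buffer from sysconf(_SC_NPROCESSORS_CONF)
assumes that this value matches the kernel's num_possible_cpus(), which is
exactly the assumption being questioned in this thread.

```c
#include <stdlib.h>
#include <unistd.h>

/* CPUs configured on the system, or 1 if sysconf() cannot tell. */
static long demo_configured_cpus(void)
{
	long n = sysconf(_SC_NPROCESSORS_CONF);

	return n > 0 ? n : 1;
}

/* CPUs currently online, or 1 if sysconf() cannot tell. */
static long demo_online_cpus(void)
{
	long n = sysconf(_SC_NPROCESSORS_ONLN);

	return n > 0 ? n : 1;
}

/*
 * Allocate room for one value per configured CPU. If the kernel copies
 * more values than this, the caller's buffer is overrun.
 */
static void *demo_alloc_percpu_buf(size_t value_size, size_t *total)
{
	*total = (size_t)demo_configured_cpus() * value_size;
	return calloc(1, *total);
}
```

Printing demo_configured_cpus() and demo_online_cpus() next to the count
the kernel reports would show whether the two sides disagree.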


[PATCH v5] net: sched: convert qdisc linked list to hashtable

2016-07-29 Thread Jiri Kosina
From: Jiri Kosina 

Convert the per-device linked list into a hashtable. The primary 
motivation for this change is that currently, we're not tracking all the 
qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup 
performed over the linked list by qdisc_match_from_root() is rather 
expensive.

The ultimate goal is to get rid of hidden qdiscs completely, which will 
bring much more determinism in user experience.

As we're adding hashtable.h include into generic netdevice.h, we have to 
make sure HASH_SIZE macro is now non-conflicting with local definitions.

Reviewed-by: Cong Wang 
Signed-off-by: Jiri Kosina 
---

v1 -> v2: fix up RCU hashtable usage wrt. rtnl
  fix compilation of .c files which define their own
  HASH_SIZE that now conflicts with the one from
  hashtable.h (newly included via netdevice.h)

v2 -> v3: resolve HASH_SIZE identifier conflicts in a cleaner way
  fix up the number of hash bucket bits (4 bits for 16 buckets)

v3 -> v4: put the hashtable into struct netdevice only if 
  CONFIG_NET_SCHED has been enabled

v4 -> v5: fix !CONFIG_NET_SCHED build (reported by Fengguang Wu)
  add Cong Wang's reviewed-by
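
For illustration, the lookup idea can be sketched in user space as follows.
This is not the kernel implementation: the demo_* names are hypothetical,
the chaining uses a plain singly linked list, and the multiplicative hash is
just one simple choice for spreading 32-bit handles over 16 buckets (4 bits).

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_HASH_BITS	4
#define DEMO_BUCKETS	(1u << DEMO_HASH_BITS)

struct demo_qdisc {
	uint32_t handle;
	struct demo_qdisc *next;	/* next entry in the same bucket */
};

struct demo_dev {
	struct demo_qdisc *qdisc_hash[DEMO_BUCKETS];
};

/* Map a handle to a bucket using the top bits of a multiplicative hash. */
static unsigned int demo_bucket(uint32_t handle)
{
	return (uint32_t)(handle * 2654435761u) >> (32 - DEMO_HASH_BITS);
}

/* Insert at the head of the bucket chain. */
static void demo_hash_add(struct demo_dev *dev, struct demo_qdisc *q)
{
	unsigned int b = demo_bucket(q->handle);

	q->next = dev->qdisc_hash[b];
	dev->qdisc_hash[b] = q;
}

/* Walk a single bucket instead of every qdisc on the device. */
static struct demo_qdisc *demo_lookup(const struct demo_dev *dev,
				      uint32_t handle)
{
	struct demo_qdisc *q;

	for (q = dev->qdisc_hash[demo_bucket(handle)]; q; q = q->next)
		if (q->handle == handle)
			return q;
	return NULL;
}
```

With many qdiscs per device, a lookup only has to walk the entries that
share a bucket, which is what makes tracking all qdiscs affordable.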

 include/linux/netdevice.h |  4 
 include/net/pkt_sched.h   |  4 ++--
 include/net/sch_generic.h |  2 +-
 net/core/dev.c|  3 +++
 net/ipv6/ip6_gre.c| 12 ++--
 net/ipv6/ip6_tunnel.c | 10 +-
 net/ipv6/ip6_vti.c| 10 +-
 net/ipv6/sit.c| 10 +-
 net/sched/sch_api.c   | 23 +--
 net/sched/sch_generic.c   |  8 +---
 net/sched/sch_mq.c|  2 +-
 net/sched/sch_mqprio.c|  2 +-
 12 files changed, 51 insertions(+), 39 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f45929c..17c6499 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include <linux/hashtable.h>
 
 struct netpoll_info;
 struct device;
@@ -1778,6 +1779,9 @@ struct net_device {
unsigned intnum_tx_queues;
unsigned intreal_num_tx_queues;
struct Qdisc*qdisc;
+#ifdef CONFIG_NET_SCHED
+   DECLARE_HASHTABLE   (qdisc_hash, 4);
+#endif
unsigned long   tx_queue_len;
spinlock_t  tx_global_lock;
int watchdog_timeo;
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index fea53f4..8ba11b4 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -90,8 +90,8 @@ int unregister_qdisc(struct Qdisc_ops *qops);
 void qdisc_get_default(char *id, size_t len);
 int qdisc_set_default(const char *id);
 
-void qdisc_list_add(struct Qdisc *q);
-void qdisc_list_del(struct Qdisc *q);
+void qdisc_hash_add(struct Qdisc *q);
+void qdisc_hash_del(struct Qdisc *q);
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
 struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle);
 struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 62d5531..26f5cb3 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -67,7 +67,7 @@ struct Qdisc {
u32 limit;
const struct Qdisc_ops  *ops;
struct qdisc_size_table __rcu *stab;
-   struct list_headlist;
+   struct hlist_node   hash;
u32 handle;
u32 parent;
int (*reshape_fail)(struct sk_buff *skb,
diff --git a/net/core/dev.c b/net/core/dev.c
index 904ff43..d3736d5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7511,6 +7511,9 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, 
const char *name,
INIT_LIST_HEAD(&dev->all_adj_list.lower);
INIT_LIST_HEAD(&dev->ptype_all);
INIT_LIST_HEAD(&dev->ptype_specific);
+#ifdef CONFIG_NET_SCHED
+   hash_init(dev->qdisc_hash);
+#endif
dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
setup(dev);
 
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index fdc9de2..d3697a4 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -61,12 +61,12 @@ static bool log_ecn_error = true;
 module_param(log_ecn_error, bool, 0644);
 MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
-#define HASH_SIZE_SHIFT  5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define IP6_GRE_HASH_SIZE_SHIFT  5
+#define IP6_GRE_HASH_SIZE (1 << IP6_GRE_HASH_SIZE_SHIFT)
 
 static int ip6gre_net_id __read_mostly;
 struct ip6gre_net {
-   struct ip6_tnl __rcu *tunnels[4][HASH_SIZE];
+   struct ip6_tnl __rcu *tunnels[4][IP6_GRE_HASH_SIZE];
 
struct net_device *fb_tunnel_dev;
 };
@@ -96,12 +96,12 @@ static void ip6gre_tnl_link_config(struct ip6_tnl *t, int 

[PATCH net] tcp: fix functions of tcp_congestion_ops from being called before initialization

2016-07-29 Thread Li, Ji
In Linux 3.17 and earlier, tcp_init_congestion_ops (i.e. tcp_reno) is
used as the ca_ops during 3WHS. After 3WHS, ca_ops is set to the
default congestion control chosen via sysctl, and its parameters
stored in icsk_ca_priv[] are initialized immediately. Commit
55d8694fa82c ("net: tcp: assign tcp cong_ops when tcp sk is created")
splits assignment and initialization into two steps: assignment is done
before the SYN or SYN-ACK is sent out; initialization is done after 3WHS
(assuming no fastopen). But this allows ca_ops functions other than
.init() to be invoked out of order during 3WHS, as they can be called
before their parameters are initialized. This may cause unexpected
behavior in congestion controls, and cause trouble for those that need
dynamic object allocation, such as tcp_cdg.

We used tcp_dctcp as an example to visualize the problem, and set it as
the default congestion control via sysctl. Three parameters
(ca->prior_snd_una, ca->prior_rcv_nxt, ca->dctcp_alpha) were monitored
when functions such as dctcp_update_alpha() and dctcp_ssthresh() were
called during 3WHS. All three were found to be zero, which is very
unlikely if dctcp_init(), which initializes those three parameters, had
been called beforehand. Some other congestion controls were
examined too and the same problem was reproduced.

This patch checks the icsk_ca_initialized flag before ca_ops is invoked,
and allows a call only if initialization has been done. .ssthresh() is a
special case: tcp_reno_ssthresh() is called instead if the congestion
control is uninitialized. .get_info() is still always allowed, as that is
the expected case mentioned in commit 55d8694fa82c ("net: tcp: assign tcp
cong_ops when tcp sk is created").

Fixes: 55d8694fa82c ("net: tcp: assign tcp cong_ops when tcp sk is
created")
Signed-off-by: Ji Li 
---
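The guard pattern described above can be sketched in user space as follows.
This is an illustration only: the demo_* types, the sample ops and the
event counter are made up for the example, and the Reno fallback is
modelled as half of the congestion window with a floor of 2.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_sock;

struct demo_ca_ops {
	uint32_t (*ssthresh)(struct demo_sock *sk);
	void (*cwnd_event)(struct demo_sock *sk, int event);
};

struct demo_sock {
	const struct demo_ca_ops *ca_ops;
	unsigned int ca_initialized : 1;	/* set once .init() has run */
	uint32_t snd_cwnd;
	int events_seen;
};

/* Reno-style fallback: half the window, but never below 2. */
static uint32_t demo_reno_ssthresh(struct demo_sock *sk)
{
	uint32_t half = sk->snd_cwnd >> 1;

	return half > 2 ? half : 2;
}

/* Use the module's ssthresh only after it has been initialized. */
static uint32_t demo_ca_ssthresh(struct demo_sock *sk)
{
	if (sk->ca_initialized)
		return sk->ca_ops->ssthresh(sk);
	return demo_reno_ssthresh(sk);
}

/* Drop events that arrive before initialization. */
static void demo_ca_event(struct demo_sock *sk, int event)
{
	if (sk->ca_ops->cwnd_event && sk->ca_initialized)
		sk->ca_ops->cwnd_event(sk, event);
}

/* A sample congestion control used to exercise the guards. */
static uint32_t demo_fixed_ssthresh(struct demo_sock *sk)
{
	(void)sk;
	return 7;
}

static void demo_count_event(struct demo_sock *sk, int event)
{
	(void)event;
	sk->events_seen++;
}

static const struct demo_ca_ops demo_ops = {
	demo_fixed_ssthresh,
	demo_count_event,
};
```

Before the flag is set, a socket with snd_cwnd of 10 gets the fallback
value and its events are ignored; afterwards the module's own callbacks
are used.
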
 include/net/inet_connection_sock.h |  3 ++-
 include/net/tcp.h  |  4 ++--
 net/ipv4/tcp_cong.c|  4 +++-
 net/ipv4/tcp_input.c   | 21 +++--
 4 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/include/net/inet_connection_sock.h 
b/include/net/inet_connection_sock.h
index 49dcad4..933d217 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -100,7 +100,8 @@ struct inet_connection_sock {
const struct tcp_congestion_ops *icsk_ca_ops;
const struct inet_connection_sock_af_ops *icsk_af_ops;
unsigned int  (*icsk_sync_mss)(struct sock *sk, u32 pmtu);
-   __u8  icsk_ca_state:6,
+   __u8  icsk_ca_state:5,
+ icsk_ca_initialized:1,
  icsk_ca_setsockopt:1,
  icsk_ca_dst_locked:1;
__u8  icsk_retransmits;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 0bcc70f..4c26e1c 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -934,7 +934,7 @@ static inline void tcp_set_ca_state(struct sock *sk, const 
u8 ca_state)
 {
struct inet_connection_sock *icsk = inet_csk(sk);

-   if (icsk->icsk_ca_ops->set_state)
+   if (icsk->icsk_ca_ops->set_state && icsk->icsk_ca_initialized)
icsk->icsk_ca_ops->set_state(sk, ca_state);
icsk->icsk_ca_state = ca_state;
 }
@@ -943,7 +943,7 @@ static inline void tcp_ca_event(struct sock *sk, const enum 
tcp_ca_event event)
 {
const struct inet_connection_sock *icsk = inet_csk(sk);

-   if (icsk->icsk_ca_ops->cwnd_event)
+   if (icsk->icsk_ca_ops->cwnd_event && icsk->icsk_ca_initialized)
icsk->icsk_ca_ops->cwnd_event(sk, event);
 }

diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 882caa4..dd1de39 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -185,6 +185,7 @@ void tcp_init_congestion_control(struct sock *sk)

if (icsk->icsk_ca_ops->init)
icsk->icsk_ca_ops->init(sk);
+   inet_csk(sk)->icsk_ca_initialized = 1;
if (tcp_ca_needs_ecn(sk))
INET_ECN_xmit(sk);
else
@@ -209,8 +210,9 @@ void tcp_cleanup_congestion_control(struct sock *sk)
{
struct inet_connection_sock *icsk = inet_csk(sk);

-   if (icsk->icsk_ca_ops->release)
+   if (icsk->icsk_ca_ops->release && icsk->icsk_ca_initialized)
icsk->icsk_ca_ops->release(sk);
+   icsk->icsk_ca_initialized = 0;
module_put(icsk->icsk_ca_ops->owner);
 }

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 42bf89a..62f65dc 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1878,6 +1878,14 @@ static inline void tcp_init_undo(struct tcp_sock *tp)
tp->undo_retrans = tp->retrans_out ? : -1;
 }

+static inline u32 tcp_ca_ssthresh(struct sock *sk)
+{
+   if (inet_csk(sk)->icsk_ca_initialized)
+   return inet_csk(sk)->icsk_ca_ops->ssthresh(sk);
+   else
+   return tcp_reno_ssthresh(sk);
+}
+
 /* Enter 

Re: [PATCH] bpf: fix size of copy_to_user in percpu map.

2016-07-29 Thread Alexei Starovoitov
On Thu, Jul 28, 2016 at 05:42:21PM -0700, William Tu wrote:
> The total size of value copy_to_user() writes to userspace should
> be the (current number of cpu) * (value size), instead of
> num_possible_cpus() * (value size).  Found by samples/bpf/test_maps.c,
> which always copies 512 bytes to userspace, crashing the userspace
> program stack.

hmm. I'm missing something. The sample code assumes no cpu hotplug,
so sysconf(_SC_NPROCESSORS_CONF) == num_possible_cpu == num_online_cpu,
unless the crazy INIT_ALL_POSSIBLE config option is used.