Re: [PATCH net-next 1/2] mpls: packet stats

2016-07-28 Thread Roopa Prabhu
On 2/5/16, 11:27 AM, Robert Shearman wrote:
> Having MPLS packet stats is useful for observing network operation and
> for diagnosing network problems. In the absence of anything better,
> use RFCs for MIBs defining MPLS stats for guidance on the semantics of
> the stats to expose. RFC3813 details two per-interface packet stats
> that should be provided (label lookup failures and fragmented packets)
> and also provides interpretation of RFC2863 for other per-interface
> stats (in/out ucast, mcast and bcast, in/out discards and errors and
> in unknown protos).
>
> Multicast, fragment and broadcast packet counters are printed, but not
> stored to allow for future implementation of current standards or
> future standards without user-space having to change.
>
> All the introduced fields are 64-bit, even error ones, to ensure no
> overflow with long uptimes. Per-CPU counters are used to avoid
> cache-line contention on the commonly used fields. The other fields
> have also been made per-CPU for code to avoid performance problems in
> error conditions on the assumption that on some platforms the cost of
> atomic operations could be more expensive than sending the packet
> (which is what would be done in the success case). If that's not the
> case, we could instead not use per-CPU counters for these fields.
>
> The IPv6 proc code was used as an inspiration for the proc code
> here, both in terms of the implementation as well as the location of
> the per-device stats proc files: /proc/net/dev_snmp_mpls/.
>
> Signed-off-by: Robert Shearman 
>
Robert, any interest in moving this to the new stats API?

I had done some work for AF_ stats but did not end up including it in
the final version. The AF_ infra patch is here:
https://github.com/CumulusNetworks/net-next/commits/mpls-stats

Thanks!
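
The per-CPU counter idea in the quoted patch description can be sketched in userspace C. This is only a model under stated assumptions, not the patch's code: the slot count, the 64-byte cache line, and every model_* name are hypothetical. Each CPU slot owns its own cache-line-aligned 64-bit counters, and a reader sums the slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_CPUS    4   /* assumption: fixed slot count for the sketch */
#define MODEL_CACHE_LINE 64  /* assumption: 64-byte cache lines */

/* One counter block per CPU slot; the alignment keeps slots on separate
 * cache lines so that writers on different CPUs do not contend. */
struct model_mpls_stats {
    _Alignas(MODEL_CACHE_LINE) uint64_t rx_packets;
    uint64_t rx_errors;
};

static struct model_mpls_stats model_percpu[MODEL_NR_CPUS];

/* Fast path: only the slot of the running CPU is written, no atomics. */
void model_count_rx(unsigned int cpu, int is_error)
{
    assert(cpu < MODEL_NR_CPUS);
    if (is_error)
        model_percpu[cpu].rx_errors++;
    else
        model_percpu[cpu].rx_packets++;
}

/* Slow path (for example a proc read): fold all slots into one total. */
struct model_mpls_stats model_read_totals(void)
{
    struct model_mpls_stats total = { 0 };

    for (size_t i = 0; i < MODEL_NR_CPUS; i++) {
        total.rx_packets += model_percpu[i].rx_packets;
        total.rx_errors += model_percpu[i].rx_errors;
    }
    return total;
}
```

The trade-off is the one the description names: writes stay cheap and local, while reads pay for a walk over all slots.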


Re: [PATCH v2 4/4] ARM: OMAP2+: omap_device: fix crash on omap_device removal

2016-07-28 Thread Peter Ujfalusi
On 07/28/16 20:50, Grygorii Strashko wrote:
> Below call chain causes system crash when OMAP device is
> removed by calling of_platform_depopulate()/device_del():

Should you swap 3 <-> 4 in the series?
Currently patch 3 will introduce the crash you are fixing in patch 4...

> 
> device_del()
> - blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
>                                BUS_NOTIFY_DEL_DEVICE, dev);
>   - _omap_device_notifier_call()
>     - omap_device_delete()
>       - od->pdev->archdata.od = NULL;
>         kfree(od->hwmods);
>         kfree(od);
>   - bus_remove_device()
>     - device_release_driver()
>       - __device_release_driver()
>         - pm_runtime_get_sync()
>           - _od_runtime_resume()
>             - omap_hwmod_enable() <- OOPS od's deleted already
> 
> Backtrace:
> Unable to handle kernel NULL pointer dereference at virtual address 000d
> pgd = eb10
> [000d] *pgd=ad6e1831, *pte=, *ppte=
> Internal error: Oops: 17 [#1] PREEMPT SMP ARM
> CPU: 1 PID: 1273 Comm: modprobe Not tainted 4.4.15-rt19-00115-ge4d3cd3-dirty 
> #68
> Hardware name: Generic DRA74X (Flattened Device Tree)
> task: eb1ee800 ti: ec962000 task.ti: ec962000
> PC is at omap_device_enable+0x10/0x90
> LR is at _od_runtime_resume+0x10/0x24
> [...]
> [] (omap_device_enable) from [] 
> (_od_runtime_resume+0x10/0x24)
> [] (_od_runtime_resume) from [] (__rpm_callback+0x20/0x34)
> [] (__rpm_callback) from [] (rpm_callback+0x20/0x80)
> [] (rpm_callback) from [] (rpm_resume+0x48c/0x964)
> [] (rpm_resume) from [] (__pm_runtime_resume+0x60/0x88)
> [] (__pm_runtime_resume) from [] 
> (__device_release_driver+0x30/0x100)
> [] (__device_release_driver) from [] 
> (device_release_driver+0x1c/0x28)
> [] (device_release_driver) from [] 
> (bus_remove_device+0xec/0x144)
> [] (bus_remove_device) from [] (device_del+0x10c/0x210)
> [] (device_del) from [] (platform_device_del+0x18/0x84)
> [] (platform_device_del) from [] 
> (platform_device_unregister+0xc/0x20)
> [] (platform_device_unregister) from [] 
> (of_platform_device_destroy+0x8c/0x90)
> [] (of_platform_device_destroy) from [] 
> (device_for_each_child+0x4c/0x78)
> [] (device_for_each_child) from [] 
> (of_platform_depopulate+0x30/0x44)
> [] (of_platform_depopulate) from [] 
> (cpsw_remove+0x68/0xf4 [ti_cpsw])
> [] (cpsw_remove [ti_cpsw]) from [] 
> (platform_drv_remove+0x24/0x3c)
> [] (platform_drv_remove) from [] 
> (__device_release_driver+0x84/0x100)
> [] (__device_release_driver) from [] 
> (driver_detach+0xac/0xb0)
> [] (driver_detach) from [] (bus_remove_driver+0x60/0xd4)
> [] (bus_remove_driver) from [] 
> (SyS_delete_module+0x184/0x20c)
> [] (SyS_delete_module) from [] (ret_fast_syscall+0x0/0x1c)
> Code: e350 e92d4070 1590630c 01a06000 (e5d6300d)
> 
> Hence, fix it by using BUS_NOTIFY_REMOVED_DEVICE event for OMAP device
> deletion which is sent when DD has finished processing of device
> deletion.
> 
> Cc: Tony Lindgren 
> Cc: Tero Kristo 
> Signed-off-by: Grygorii Strashko 
> ---
>  arch/arm/mach-omap2/omap_device.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm/mach-omap2/omap_device.c 
> b/arch/arm/mach-omap2/omap_device.c
> index f7ff3b9..208f115 100644
> --- a/arch/arm/mach-omap2/omap_device.c
> +++ b/arch/arm/mach-omap2/omap_device.c
> @@ -194,7 +194,7 @@ static int _omap_device_notifier_call(struct 
> notifier_block *nb,
>   int err;
>  
>   switch (event) {
> - case BUS_NOTIFY_DEL_DEVICE:
> + case BUS_NOTIFY_REMOVED_DEVICE:
>   if (pdev->archdata.od)
>   omap_device_delete(pdev->archdata.od);
>   break;
> 


-- 
Péter


[PATCH net] tcp: consider recv buf for the initial window scale

2016-07-28 Thread Soheil Hassas Yeganeh
From: Soheil Hassas Yeganeh 

tcp_select_initial_window() intends to advertise a window
scaling for the maximum possible window size. To do so,
it considers the maximum of net.ipv4.tcp_rmem[2] and
net.core.rmem_max as the only possible upper-bounds.
However, users with CAP_NET_ADMIN can use SO_RCVBUFFORCE
to set the socket's receive buffer size to values
larger than net.ipv4.tcp_rmem[2] and net.core.rmem_max.
Thus, SO_RCVBUFFORCE is effectively ignored by
tcp_select_initial_window().

To fix this, consider the maximum of net.ipv4.tcp_rmem[2],
net.core.rmem_max and the socket's initial buffer space.

This part of the code does not have git history and as a
result this patch does not have a `Fixes:` tag.

Signed-off-by: Soheil Hassas Yeganeh 
Suggested-by: Neal Cardwell 
---
 net/ipv4/tcp_output.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b26aa87..bdaef7f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -236,7 +236,8 @@ void tcp_select_initial_window(int __space, __u32 mss,
/* Set window scaling on max possible window
 * See RFC1323 for an explanation of the limit to 14
 */
-   space = max_t(u32, sysctl_tcp_rmem[2], sysctl_rmem_max);
+   space = max_t(u32, space, sysctl_tcp_rmem[2]);
+   space = max_t(u32, space, sysctl_rmem_max);
space = min_t(u32, space, *window_clamp);
while (space > 65535 && (*rcv_wscale) < 14) {
space >>= 1;
-- 
2.8.0.rc3.226.g39d4020
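
The scale selection touched by this patch can be modelled in plain C. This is a sketch, not the kernel function: model_select_wscale and its parameter names are hypothetical, with sock_space standing for the socket's initial buffer space, tcp_rmem_max for net.ipv4.tcp_rmem[2], and core_rmem_max for net.core.rmem_max. The sketch also assumes the caller passes a non-zero window clamp.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAX_WSCALE 14  /* limit explained in RFC1323 */

static uint32_t model_max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }
static uint32_t model_min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Pick the window scale for the largest window the socket could ever
 * advertise: the maximum of the three bounds, limited by the clamp. */
unsigned int model_select_wscale(uint32_t sock_space, uint32_t tcp_rmem_max,
                                 uint32_t core_rmem_max, uint32_t window_clamp)
{
    unsigned int wscale = 0;
    uint32_t space;

    assert(window_clamp != 0);  /* assumption: clamp already resolved */

    space = model_max_u32(sock_space, tcp_rmem_max);
    space = model_max_u32(space, core_rmem_max);
    space = model_min_u32(space, window_clamp);

    /* Halve until the value fits the 16-bit window field. */
    while (space > 65535 && wscale < MODEL_MAX_WSCALE) {
        space >>= 1;
        wscale++;
    }
    return wscale;
}
```

With illustrative sysctl values of 6291456 and 212992 bytes, the sketch returns 7 when the socket space is ignored (passed as 0) and 11 once a 64 MiB forced receive buffer is taken into account.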



[PATCH] bpf: fix size of copy_to_user in percpu map.

2016-07-28 Thread William Tu
The total size of the value that copy_to_user() writes to userspace
should be (current number of CPUs) * (value size), instead of
num_possible_cpus() * (value size).  Found by samples/bpf/test_maps.c,
which always copies 512 bytes to userspace, crashing the userspace
program stack.

Signed-off-by: William Tu 
---
 kernel/bpf/syscall.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 228f962..47f738e 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -324,7 +324,8 @@ static int map_lookup_elem(union bpf_attr *attr)
goto free_value;
 
err = -EFAULT;
-   if (copy_to_user(uvalue, value, value_size) != 0)
+   if (copy_to_user(uvalue, value,
+   map->value_size * num_online_cpus()) != 0)
goto free_value;
 
err = 0;
-- 
2.5.0
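
The size mismatch described above is simple arithmetic, sketched here in userspace C. The figures are assumptions chosen only because they multiply to the 512 bytes mentioned (an 8-byte value and 64 CPUs); the patch does not state them, and the model_* helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes taken by one per-CPU value laid out as ncpus consecutive copies. */
size_t model_percpu_bytes(size_t value_size, unsigned int ncpus)
{
    return value_size * ncpus;
}

/* Bounded copy: refuses to write more than the destination can hold.
 * Returns 0 on success and -1 if src_len would overrun dst. */
int model_copy_checked(void *dst, size_t dst_len,
                       const void *src, size_t src_len)
{
    assert(dst != NULL && src != NULL);
    if (src_len > dst_len)
        return -1;
    memcpy(dst, src, src_len);
    return 0;
}
```

A caller that sized its buffer for 4 CPUs (32 bytes) cannot safely receive a 64-CPU copy (512 bytes); both sides have to agree on which CPU count defines the layout.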



Re: [Intel-wired-lan] [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-28 Thread Francois Romieu
Eric Dumazet  :
[...]
> I would prefer having a definitive advice from Thomas Gleixner and/or
> others if disable_irq() is forbidden from IRQ path.
> 
> As I said, about all netpoll() methods in net drivers use disable_irq()
> so a lot of patches would be needed.

s/about all/many/

There has been a WARN_ONCE(!irqs_disabled(), ...) in netpoll_send_skb_on_dev()
for quite some time now, but it is apparently screened by too many tests to be
effective. :o/

-- 
Ueimor


Re: [PATCH] net/mlx5_core/pagealloc: Remove deprecated create_singlethread_workqueue

2016-07-28 Thread Saeed Mahameed
On Thu, Jul 28, 2016 at 12:37 PM, Leon Romanovsky  wrote:
> On Thu, Jul 28, 2016 at 01:49:49PM +0530, Bhaktipriya Shridhar wrote:
>> A dedicated workqueue has been used since the work items are being used
>> on a memory reclaim path. WQ_MEM_RECLAIM has been set to guarantee forward
>> progress under memory pressure.
>>
>> The workqueue has a single work item. Hence, alloc_workqueue() is used
>> instead of alloc_ordered_workqueue() since ordering is unnecessary when
>> there's only one work item.
>>
>> Explicit concurrency limit is unnecessary here since there are only a
>> fixed number of work items.
>>
>> Signed-off-by: Bhaktipriya Shridhar 
>> ---
>>  drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> Hi Bhaktipriya,
>
> First of all, I would like to thank you for your work and invite you to
> continue, but can you please submit ONE patch SERIES which changes all
> similar places?
>

I agree with Leon, please push one series for all mlx5 patches and add
some explanation in the cover letter regarding the motivation of this
work.

> BTW,
> Did you test this patch? Did you notice the memory reclaim path nature
> of this work?
>
> Thanks
>
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c 
>> b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
>> index 905..7c85262 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
>> @@ -552,7 +552,8 @@ void mlx5_pagealloc_cleanup(struct mlx5_core_dev *dev)
>>
>>  int mlx5_pagealloc_start(struct mlx5_core_dev *dev)
>>  {
>> - dev->priv.pg_wq = create_singlethread_workqueue("mlx5_page_allocator");
>> + dev->priv.pg_wq = alloc_workqueue("mlx5_page_allocator",
>> +   WQ_MEM_RECLAIM, 0);
>>   if (!dev->priv.pg_wq)
>>   return -ENOMEM;
>>
>> --
>> 2.1.4
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] net/mlx5_core/pagealloc: Remove deprecated create_singlethread_workqueue

2016-07-28 Thread Saeed Mahameed
On Thu, Jul 28, 2016 at 11:19 AM, Bhaktipriya Shridhar
 wrote:
> A dedicated workqueue has been used since the work items are being used
> on a memory reclaim path. WQ_MEM_RECLAIM has been set to guarantee forward
> progress under memory pressure.
>
> The workqueue has a single work item. Hence, alloc_workqueue() is used
> instead of alloc_ordered_workqueue() since ordering is unnecessary when
> there's only one work item.

Let's keep the current behavior (single-threaded WQ): at the moment we
don't know how this WQ will evolve in the future, and the original author
had something in mind, so let's keep it.

>
> Explicit concurrency limit is unnecessary here since there are only a
> fixed number of work items.
>
> Signed-off-by: Bhaktipriya Shridhar 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> index 905..7c85262 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> @@ -552,7 +552,8 @@ void mlx5_pagealloc_cleanup(struct mlx5_core_dev *dev)
>
>  int mlx5_pagealloc_start(struct mlx5_core_dev *dev)
>  {
> -   dev->priv.pg_wq = 
> create_singlethread_workqueue("mlx5_page_allocator");
> +   dev->priv.pg_wq = alloc_workqueue("mlx5_page_allocator",
> + WQ_MEM_RECLAIM, 0);
> if (!dev->priv.pg_wq)
> return -ENOMEM;
>
> --
> 2.1.4
>


Re: [RFC] net/mlx5_core/en_main: Remove deprecated create_workqueue

2016-07-28 Thread Saeed Mahameed
On Wed, Jul 27, 2016 at 9:12 AM, Bhaktipriya Shridhar
 wrote:
> alloc_ordered_workqueue() with WQ_MEM_RECLAIM set replaces
> deprecated create_singlethread_workqueue(). This is the identity
> conversion.
>
> A dedicated workqueue has been used since mlx5e workqueue was created to
> handle all mlx5e specific tasks. This is in preparation for vxlan using
> the mlx5e workqueue in order to schedule port add/remove operations.
> WQ_MEM_RECLAIM has been set to guarantee forward progress under memory
> pressure.
>
> Can the workitems be executed concurrently?

Well, the work items currently using the mlx5e WQ are:

priv->update_carrier_work: reads the hardware link state and updates the
netdev carrier.
priv->update_stats_work: a periodic task that runs once every 1/4 sec
to update netdev statistics.
priv->set_rx_mode_work: a task queued from the netdev set_rx_mode/set_mac
NDOs, and sometimes explicitly from the driver, to update the netdev RX
mode and filters.
mlx5e_vxlan_queue_work: queues a _dynamically_ allocated work item to
add/remove a vxlan port. Since these items are dynamically allocated,
they must not be executed concurrently, to guarantee add/remove
ordering.

Bottom line: I wouldn't change the concurrency of the mlx5e work queue.

> Are the workitems being used on a memory reclaim path?

Do you mean they need to allocate memory?

> Signed-off-by: Bhaktipriya Shridhar 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index fd43929..1a96445 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -3042,7 +3042,7 @@ static void *mlx5e_create_netdev(struct mlx5_core_dev 
> *mdev)
>
> priv = netdev_priv(netdev);
>
> -   priv->wq = create_singlethread_workqueue("mlx5e");
> +   priv->wq = alloc_ordered_workqueue("mlx5e", WQ_MEM_RECLAIM);
> if (!priv->wq)
> goto err_free_netdev;
>
> --
> 2.1.4
>


Re: [PATCH net 1/3] r8169:fix kernel log spam when set or get hardware wol setting.

2016-07-28 Thread Francois Romieu
Hau  :
[...]
> > Either the driver resumes the device so that it can perform requested
> > operation or it signals .set_wol failure when the device is suspended.
> > 
> > If the driver does something else, "spam removal" translates to "silent
> > failure".
> 
> Because "tp->saved_wolopts" will be used to set the hardware wol capability
> in rtl8169_runtime_resume(), I prefer to save "wol->wolopts" in
> "tp->saved_wolopts" while in the runtime suspend state and to apply it to
> hardware in rtl8169_runtime_resume().

It would be fine if it could be proven that rtl8169_runtime_resume() will
always be run before software state is lost.

-- 
Ueimor


Re: [PATCH -next] drivers: net: phy: xgene: Remove redundant dev_err call in xgene_mdio_probe()

2016-07-28 Thread Iyappan Subramanian
On Wed, Jul 27, 2016 at 7:12 PM, Wei Yongjun  wrote:
> There is an error message within devm_ioremap_resource
> already, so remove the dev_err call to avoid a redundant
> error message.
>
> Signed-off-by: Wei Yongjun 
> ---
>  drivers/net/phy/mdio-xgene.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/net/phy/mdio-xgene.c b/drivers/net/phy/mdio-xgene.c
> index d94a978..7756748 100644
> --- a/drivers/net/phy/mdio-xgene.c
> +++ b/drivers/net/phy/mdio-xgene.c
> @@ -345,10 +345,8 @@ static int xgene_mdio_probe(struct platform_device *pdev)
>
> res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> csr_base = devm_ioremap_resource(dev, res);
> -   if (IS_ERR(csr_base)) {
> -   dev_err(dev, "Unable to retrieve mac CSR region\n");
> +   if (IS_ERR(csr_base))
> return PTR_ERR(csr_base);
> -   }
> pdata->mac_csr_addr = csr_base;
> pdata->mdio_csr_addr = csr_base + BLOCK_XG_MDIO_CSR_OFFSET;
> pdata->diag_csr_addr = csr_base + BLOCK_DIAG_CSR_OFFSET;
>
>
>

Thanks.

Acked-By: Iyappan Subramanian 


Re: [PATCH] [v6] net: emac: emac gigabit ethernet controller driver

2016-07-28 Thread Timur Tabi

Lino Sanfilippo wrote:


+   skb = dev_alloc_skb(adpt->rxbuf_size + NET_IP_ALIGN);
+   if (!skb)
+   break;
+
+   /* Make buffer alignment 2 beyond a 16 byte boundary
+* this will result in a 16 byte aligned IP header after
+* the 14 byte MAC header is removed
+*/
+   skb_reserve(skb, NET_IP_ALIGN);


__netdev_alloc_skb_ip_align will do this for you.


Will fix.


+   curr_rxbuf->dma_addr = dma_map_single(adpt->netdev->dev.parent,
+ skb_data,
+ curr_rxbuf->length,
+ DMA_FROM_DEVICE);



Mapping can fail. You should check the result via dma_mapping_error().
There are several other places in which dma_map_single() is called and the
return value is not checked.


Will fix.


+   if (ret) {
+   netdev_err(adpt->netdev,
+  "error:%d on request_irq(%d:%s flags:0)\n", ret,
+  irq->irq, EMAC_MAC_IRQ_RES);


freeing the irq is missing


Will fix.


+   /* disable mac irq */
+   writel(DIS_INT, adpt->base + EMAC_INT_STATUS);
+   writel(0, adpt->base + EMAC_INT_MASK);
+   synchronize_irq(adpt->irq.irq);
+   free_irq(adpt->irq.irq, >irq);
+   clear_bit(EMAC_STATUS_TASK_REINIT_REQ, >status);
+
+   cancel_work_sync(>tx_ts_task);
+   spin_lock_irqsave(>tx_ts_lock, flags);


Maybe I am missing something but AFAICS tx_ts_lock is never called from irq
context, so there is no reason to disable irqs.


It might have been that way in an older version of the code, but it 
appears you are correct.  I will change it to a normal spinlock.  Thanks.



+/* Push the received skb to upper layers */
+static void emac_receive_skb(struct emac_rx_queue *rx_q,
+struct sk_buff *skb,
+u16 vlan_tag, bool vlan_flag)
+{
+   if (vlan_flag) {
+   u16 vlan;
+
+   EMAC_TAG_TO_VLAN(vlan_tag, vlan);
+   __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlan);
+   }
+
+   napi_gro_receive(&rx_q->napi, skb);


napi_gro_receive requires rx checksum offload. However emac_receive_skb() is
also called if hardware checksumming is disabled.


So the hardware is a little weird here.  Apparently, there is a bug in 
the parsing of the packet headers that is avoided if we disable hardware 
checksumming.


In emac_mac_rx_process(), right before it calls emac_receive_skb(), it 
does this:


if (netdev->features & NETIF_F_RXCSUM)
skb->ip_summed = RRD_L4F() ?
  CHECKSUM_NONE : CHECKSUM_UNNECESSARY;
else
skb_checksum_none_assert(skb);

RRD_L4F() is always zero and NETIF_F_RXCSUM is set by default, so 
ip_summed is set to CHECKSUM_UNNECESSARY.


So you're saying that if NETIF_F_RXCSUM is not set, then 
napi_gro_receive() should not be called?


I see examples of other drivers that *appear* to call napi_gro_receive() 
even when hardware checksumming is disabled.


For example, bfin_mac_rx() in adi/bfin_mac.c does this:

/*
 * Disable hardware checksum for bug #5600 if writeback cache is
 * enabled. Otherwize, corrupted RX packet will be sent up stack
 * without error mark.
 */
#ifndef CONFIG_BFIN_EXTMEM_WRITEBACK
#define BFIN_MAC_CSUM_OFFLOAD
#endif

...

#if defined(BFIN_MAC_CSUM_OFFLOAD)
...
#endif
napi_gro_receive(>napi, skb);

Shouldn't the call to napi_gro_receive() be before the #endif?

Function i40e_receive_skb() has similar code to my driver.

In fact, I have not been able to find any clear example of a driver that 
intentionally avoids calling napi_gro_receive() if hardware checksumming 
is disabled.



+/* Transmit the packet using specified transmit queue */
+int emac_mac_tx_buf_send(struct emac_adapter *adpt, struct emac_tx_queue *tx_q,
+struct sk_buff *skb)
+{
+   struct emac_tpd tpd;
+   u32 prod_idx;
+
+   if (!emac_tx_has_enough_descs(tx_q, skb)) {


Drivers should avoid this situation right from the start by checking after each
transmission whether the maximum number of possible descriptors is still
available for a further transmission, and stopping the queue if it is not.


Ok, to be clear, you're saying I should do what bcmgenet_xmit() does.

if (ring->free_bds <= (MAX_SKB_FRAGS + 1))
netif_tx_stop_queue(txq);

At the end of emac_mac_tx_buf_send(), I should call 
emac_tpd_num_free_descs() and check to see whether the number of free 
descriptors is <= (MAX_SKB_FRAGS + 1).



Furthermore there does not seem to be any function that wakes the queue up 
again once it has been stopped.


If I make the above fix, won't that also fix this bug?
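
The stop-and-wake pattern under discussion can be modelled in a few lines of userspace C. This is a sketch under assumptions, not the emac or bcmgenet code: the model_* names are hypothetical and MODEL_MAX_SKB_FRAGS is set to 17 purely for illustration. In this model the transmit path stops the queue when a worst-case packet might no longer fit, and only the completion path, once descriptors have been reclaimed, starts it again.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_SKB_FRAGS 17  /* assumption: illustrative value */
#define MODEL_STOP_THRESH   (MODEL_MAX_SKB_FRAGS + 1)

struct model_txq {
    unsigned int free_descs;  /* descriptors currently available */
    bool stopped;             /* true while the stack must not transmit */
};

/* Transmit path: consume descriptors, then stop the queue if the next
 * worst-case packet could run out of descriptors. */
void model_xmit(struct model_txq *q, unsigned int descs_needed)
{
    assert(!q->stopped && descs_needed <= q->free_descs);
    q->free_descs -= descs_needed;
    if (q->free_descs <= MODEL_STOP_THRESH)
        q->stopped = true;
}

/* Completion path: reclaim descriptors and wake the queue once there is
 * room for a worst-case packet again. */
void model_tx_complete(struct model_txq *q, unsigned int descs_done)
{
    q->free_descs += descs_done;
    if (q->stopped && q->free_descs > MODEL_STOP_THRESH)
        q->stopped = false;
}
```

In the model, the check at the end of the transmit path only ever stops the queue; without the second half in the completion handler, a stopped queue would stay stopped.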


+/* reinitialize */
+void emac_reinit_locked(struct emac_adapter *adpt)
+{
+   while (test_and_set_bit(EMAC_STATUS_RESETTING, >status))

Re: [PATCH v2 3/5] ARM: sun8i: dt: Add DT bindings documentation for Allwinner sun8i-emac

2016-07-28 Thread Maxime Ripard
On Thu, Jul 28, 2016 at 03:40:31PM +0200, LABBE Corentin wrote:
> On Thu, Jul 21, 2016 at 09:55:19AM +0200, Maxime Ripard wrote:
> > Hi,
> > 
> > On Wed, Jul 20, 2016 at 10:03:18AM +0200, LABBE Corentin wrote:
> > > This patch adds documentation for Device-Tree bindings for the
> > > Allwinner sun8i-emac driver.
> > > 
> > > Signed-off-by: LABBE Corentin 
> > > ---
> > >  .../bindings/net/allwinner,sun8i-emac.txt  | 65 
> > > ++
> > >  1 file changed, 65 insertions(+)
> > >  create mode 100644 
> > > Documentation/devicetree/bindings/net/allwinner,sun8i-emac.txt
> > > 
> > > diff --git 
> > > a/Documentation/devicetree/bindings/net/allwinner,sun8i-emac.txt 
> > > b/Documentation/devicetree/bindings/net/allwinner,sun8i-emac.txt
> > > new file mode 100644
> > > index 000..4bf4e53
> > > --- /dev/null
> > > +++ b/Documentation/devicetree/bindings/net/allwinner,sun8i-emac.txt
> > > @@ -0,0 +1,65 @@
> > > +* Allwinner sun8i EMAC ethernet controller
> > > +
> > > +Required properties:
> > > +- compatible: "allwinner,sun8i-a83t-emac", "allwinner,sun8i-h3-emac",
> > > + or "allwinner,sun50i-a64-emac"
> > > +- reg: address and length of the register sets for the device.
> > > +- reg-names: should be "emac" and "syscon", matching the register sets
> > 
> > Blindly mapping a register of some other device on the SoC doesn't
> > look very reasonable.
> 
> As we discussed on IRC after this mail, this register is dedicated to EMAC.

I don't think we did. It's still right in the middle of some other
hardware block register space. You actually have a syscon driver to do
just that, why not use it?

> > > +See ethernet.txt in the same directory for generic bindings for ethernet
> > > +controllers.
> > > +
> > > +The device node referenced by "phy" or "phy-handle" should be a child 
> > > node
> > > +of this node. See phy.txt for the generic PHY bindings.
> > > +
> > > +Optional properties:
> > > +- phy-supply: phandle to a regulator if the PHY needs one
> > > +- phy-io-supply: phandle to a regulator if the PHY needs a another one 
> > > for I/O.
> > > +  This is sometimes found with RGMII PHYs, which use a second
> > > +  regulator for the lower I/O voltage.
> > > +- allwinner,tx-delay: The setting of the TX clock delay chain
> > > +- allwinner,rx-delay: The setting of the RX clock delay chain
> > 
> > In which unit? What is the default value?
> 
> The unit is unknown to me, but I have added a comment for the
> default and acceptable range value.

That's unfortunate. We'll see how the DT maintainers feel about that.

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com



[PATCH v2 0/4] drivers: net: cpsw: fix driver loading/unloading

2016-07-28 Thread Grygorii Strashko
This series fixes a set of issues observed when the CPSW driver module is
unloaded/loaded:
1) rmmod: deadlock in cpdma_ctlr_destroy
2) rmmod: L3 back-trace and crash if all net interfaces are down, because CPSW
can be powered down by PM runtime in this case.
3) insmod: mdio device is not recreated on next insmod
 - need to use of_platform_depopulate() in cpsw_remove().
4) rmmod: system crash on omap_device removal

Tested on: am437x-idk, am57xx-beagle-x15

Changes in v2:
- build warning fixed
- added fix for correct omap_device removal

Link to v1:
 https://lkml.org/lkml/2016/7/22/240

Grygorii Strashko (4):
  net: ethernet: ti: cpdma: fix lockup in cpdma_ctlr_destroy()
  drivers: net: cpsw: fix wrong regs access in cpsw_remove
  drivers: net: cpsw: use of_platform_depopulate()
  ARM: OMAP2+: omap_device: fix crash on omap_device removal

 arch/arm/mach-omap2/omap_device.c   |  2 +-
 drivers/net/ethernet/ti/cpsw.c  | 19 +--
 drivers/net/ethernet/ti/davinci_cpdma.c |  3 ---
 3 files changed, 10 insertions(+), 14 deletions(-)

-- 
2.9.2



[PATCH v2 4/4] ARM: OMAP2+: omap_device: fix crash on omap_device removal

2016-07-28 Thread Grygorii Strashko
Below call chain causes system crash when OMAP device is
removed by calling of_platform_depopulate()/device_del():

device_del()
- blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
                               BUS_NOTIFY_DEL_DEVICE, dev);
  - _omap_device_notifier_call()
    - omap_device_delete()
      - od->pdev->archdata.od = NULL;
        kfree(od->hwmods);
        kfree(od);
  - bus_remove_device()
    - device_release_driver()
      - __device_release_driver()
        - pm_runtime_get_sync()
          - _od_runtime_resume()
            - omap_hwmod_enable() <- OOPS od's deleted already

Backtrace:
Unable to handle kernel NULL pointer dereference at virtual address 000d
pgd = eb10
[000d] *pgd=ad6e1831, *pte=, *ppte=
Internal error: Oops: 17 [#1] PREEMPT SMP ARM
CPU: 1 PID: 1273 Comm: modprobe Not tainted 4.4.15-rt19-00115-ge4d3cd3-dirty #68
Hardware name: Generic DRA74X (Flattened Device Tree)
task: eb1ee800 ti: ec962000 task.ti: ec962000
PC is at omap_device_enable+0x10/0x90
LR is at _od_runtime_resume+0x10/0x24
[...]
[] (omap_device_enable) from [] 
(_od_runtime_resume+0x10/0x24)
[] (_od_runtime_resume) from [] (__rpm_callback+0x20/0x34)
[] (__rpm_callback) from [] (rpm_callback+0x20/0x80)
[] (rpm_callback) from [] (rpm_resume+0x48c/0x964)
[] (rpm_resume) from [] (__pm_runtime_resume+0x60/0x88)
[] (__pm_runtime_resume) from [] 
(__device_release_driver+0x30/0x100)
[] (__device_release_driver) from [] 
(device_release_driver+0x1c/0x28)
[] (device_release_driver) from [] 
(bus_remove_device+0xec/0x144)
[] (bus_remove_device) from [] (device_del+0x10c/0x210)
[] (device_del) from [] (platform_device_del+0x18/0x84)
[] (platform_device_del) from [] 
(platform_device_unregister+0xc/0x20)
[] (platform_device_unregister) from [] 
(of_platform_device_destroy+0x8c/0x90)
[] (of_platform_device_destroy) from [] 
(device_for_each_child+0x4c/0x78)
[] (device_for_each_child) from [] 
(of_platform_depopulate+0x30/0x44)
[] (of_platform_depopulate) from [] (cpsw_remove+0x68/0xf4 
[ti_cpsw])
[] (cpsw_remove [ti_cpsw]) from [] 
(platform_drv_remove+0x24/0x3c)
[] (platform_drv_remove) from [] 
(__device_release_driver+0x84/0x100)
[] (__device_release_driver) from [] 
(driver_detach+0xac/0xb0)
[] (driver_detach) from [] (bus_remove_driver+0x60/0xd4)
[] (bus_remove_driver) from [] 
(SyS_delete_module+0x184/0x20c)
[] (SyS_delete_module) from [] (ret_fast_syscall+0x0/0x1c)
Code: e350 e92d4070 1590630c 01a06000 (e5d6300d)

Hence, fix it by using BUS_NOTIFY_REMOVED_DEVICE event for OMAP device
deletion which is sent when DD has finished processing of device
deletion.

Cc: Tony Lindgren 
Cc: Tero Kristo 
Signed-off-by: Grygorii Strashko 
---
 arch/arm/mach-omap2/omap_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mach-omap2/omap_device.c 
b/arch/arm/mach-omap2/omap_device.c
index f7ff3b9..208f115 100644
--- a/arch/arm/mach-omap2/omap_device.c
+++ b/arch/arm/mach-omap2/omap_device.c
@@ -194,7 +194,7 @@ static int _omap_device_notifier_call(struct notifier_block 
*nb,
int err;
 
switch (event) {
-   case BUS_NOTIFY_DEL_DEVICE:
+   case BUS_NOTIFY_REMOVED_DEVICE:
if (pdev->archdata.od)
omap_device_delete(pdev->archdata.od);
break;
-- 
2.9.2
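
The ordering problem in the call chain above can be reproduced in a small userspace model. This is a sketch only: the model_* names are hypothetical, an int allocation stands in for the omap_device, and the sequence of steps follows the call chain in this commit message rather than the actual driver core code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum model_event { MODEL_DEL_DEVICE, MODEL_REMOVED_DEVICE };

struct model_dev {
    int *od;             /* stands in for pdev->archdata.od */
    bool resume_saw_od;  /* what the runtime-resume step observed */
};

/* Notifier callback: frees od when the configured event arrives. */
static void model_notifier(struct model_dev *dev, enum model_event ev,
                           enum model_event free_on)
{
    if (ev == free_on && dev->od) {
        free(dev->od);
        dev->od = NULL;
    }
}

/* Model of the device_del() ordering shown in the call chain. */
void model_device_del(struct model_dev *dev, enum model_event free_on)
{
    /* 1. The DEL_DEVICE notification is sent first. */
    model_notifier(dev, MODEL_DEL_DEVICE, free_on);
    /* 2. Driver release resumes the device, which needs od. */
    dev->resume_saw_od = (dev->od != NULL);
    /* 3. REMOVED_DEVICE is sent once removal processing is done. */
    model_notifier(dev, MODEL_REMOVED_DEVICE, free_on);
}

/* Runs one removal; returns whether the resume step still had a valid od. */
bool model_run(enum model_event free_on)
{
    struct model_dev dev = { .od = malloc(sizeof(int)), .resume_saw_od = false };

    assert(dev.od != NULL);
    model_device_del(&dev, free_on);
    assert(dev.od == NULL);  /* od is released in both variants */
    return dev.resume_saw_od;
}
```

Freeing on the first event leaves the resume step without its data, which corresponds to the NULL dereference in the backtrace; freeing on the later event keeps the data alive for the whole removal.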



[PATCH v2 1/4] net: ethernet: ti: cpdma: fix lockup in cpdma_ctlr_destroy()

2016-07-28 Thread Grygorii Strashko
Fix deadlock in cpdma_ctlr_destroy() which is triggered now on
cpsw module removal:
 cpsw_remove()
 - cpdma_ctlr_destroy()
   - spin_lock_irqsave(&ctlr->lock, flags)
   - cpdma_ctlr_stop()
     - spin_lock_irqsave(&ctlr->lock, flags);
   - cpdma_chan_destroy()
     - spin_lock_irqsave(&ctlr->lock, flags);

The issue has not been observed before because CPDMA channels have
been destroyed manually by CPSW until commit d941ebe88a41 ("net:
ethernet: ti: cpsw: use destroy ctlr to destroy channels") was merged.

Signed-off-by: Grygorii Strashko 
---
 drivers/net/ethernet/ti/davinci_cpdma.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c 
b/drivers/net/ethernet/ti/davinci_cpdma.c
index c8154ff..fdc0f4f 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -357,13 +357,11 @@ EXPORT_SYMBOL_GPL(cpdma_ctlr_stop);
 
 int cpdma_ctlr_destroy(struct cpdma_ctlr *ctlr)
 {
-   unsigned long flags;
int ret = 0, i;
 
if (!ctlr)
return -EINVAL;
 
-   spin_lock_irqsave(&ctlr->lock, flags);
if (ctlr->state != CPDMA_STATE_IDLE)
cpdma_ctlr_stop(ctlr);
 
@@ -371,7 +369,6 @@ int cpdma_ctlr_destroy(struct cpdma_ctlr *ctlr)
cpdma_chan_destroy(ctlr->channels[i]);
 
cpdma_desc_pool_destroy(ctlr->pool);
-   spin_unlock_irqrestore(&ctlr->lock, flags);
return ret;
 }
 EXPORT_SYMBOL_GPL(cpdma_ctlr_destroy);
-- 
2.9.2
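
Why the nested locking above cannot make progress can be shown with a toy non-recursive lock. This is a userspace sketch built on C11 atomic_flag, not the kernel spinlock API, and all model_* names are hypothetical. The try-lock returns false where a real spinlock would spin forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_lock { atomic_flag held; };

/* Returns true if the lock was acquired. A real spinlock would keep
 * spinning instead of returning false. */
bool model_trylock(struct model_lock *l)
{
    return !atomic_flag_test_and_set(&l->held);
}

void model_unlock(struct model_lock *l)
{
    atomic_flag_clear(&l->held);
}

/* Inner helper that takes the lock itself, like cpdma_ctlr_stop() in the
 * call chain. Returns false where the real code would deadlock. */
bool model_ctlr_stop(struct model_lock *l)
{
    if (!model_trylock(l))
        return false;
    model_unlock(l);
    return true;
}

/* Outer function; hold_outer models the pre-patch behaviour of holding
 * the same lock around the helper call. */
bool model_ctlr_destroy(struct model_lock *l, bool hold_outer)
{
    bool ok;

    if (hold_outer) {
        bool got = model_trylock(l);
        assert(got);
        (void)got;
    }
    ok = model_ctlr_stop(l);
    if (hold_outer)
        model_unlock(l);
    return ok;
}
```

Dropping the outer acquisition, as the patch does, lets the inner helper take the lock normally.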



[PATCH v2 2/4] drivers: net: cpsw: fix wrong regs access in cpsw_remove

2016-07-28 Thread Grygorii Strashko
An L3 error will be generated and the system will crash during unloading
of the CPSW driver if CPSW is used as a module and the ethX devices are down.
This happens because CPSW can now be powered off by PM runtime when the ethX
devices are down.

Hence, ensure that CPSW is powered up by PM runtime before performing any
deinitialization actions which require CPSW register access. In case
of a PM runtime error, just leave cpsw_remove(), as we can't do anything
more.

Signed-off-by: Grygorii Strashko 
---
 drivers/net/ethernet/ti/cpsw.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 8f1eab9..ec6f473 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2579,6 +2579,13 @@ static int cpsw_remove(struct platform_device *pdev)
 {
struct net_device *ndev = platform_get_drvdata(pdev);
struct cpsw_priv *priv = netdev_priv(ndev);
+   int ret;
+
+   ret = pm_runtime_get_sync(&pdev->dev);
+   if (ret < 0) {
+   pm_runtime_put_noidle(&pdev->dev);
+   return ret;
+   }
 
if (priv->data.dual_emac)
unregister_netdev(cpsw_get_slave_ndev(priv, 1));
@@ -2586,8 +2593,9 @@ static int cpsw_remove(struct platform_device *pdev)
 
cpsw_ale_destroy(priv->ale);
cpdma_ctlr_destroy(priv->dma);
-   pm_runtime_disable(&pdev->dev);
device_for_each_child(&pdev->dev, NULL, cpsw_remove_child_device);
+   pm_runtime_put_sync(&pdev->dev);
+   pm_runtime_disable(&pdev->dev);
if (priv->data.dual_emac)
free_netdev(cpsw_get_slave_ndev(priv, 1));
free_netdev(ndev);
-- 
2.9.2
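
The ordering this patch introduces in cpsw_remove() can be illustrated with a
small userspace model. This is only a sketch: every model_* name is
hypothetical, and the model_* functions imitate the documented behaviour of
the runtime PM helpers (in particular, that pm_runtime_get_sync() raises the
usage counter even when the resume fails) rather than calling kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a runtime-PM-managed device. */
struct model_dev {
	int usage_count;   /* runtime PM usage counter */
	bool powered;      /* true while registers are accessible */
	bool resume_fails; /* test hook: force the resume to fail */
	int reg_writes;    /* register accesses done while powered */
	int bus_errors;    /* register accesses done while powered off */
};

/* Models pm_runtime_get_sync(): the counter is raised even on failure. */
static int model_get_sync(struct model_dev *d)
{
	d->usage_count++;
	if (d->resume_fails)
		return -1;
	d->powered = true;
	return 0;
}

/* Models pm_runtime_put_noidle(): drop the counter, no power change. */
static void model_put_noidle(struct model_dev *d)
{
	assert(d->usage_count > 0);
	d->usage_count--;
}

/* Models pm_runtime_put_sync(): drop the counter, power off at zero. */
static void model_put_sync(struct model_dev *d)
{
	assert(d->usage_count > 0);
	if (--d->usage_count == 0)
		d->powered = false;
}

/* A register access; touching a powered-off device stands in for the L3 error. */
static void model_reg_write(struct model_dev *d)
{
	if (d->powered)
		d->reg_writes++;
	else
		d->bus_errors++;
}

/* Teardown in the order the patch uses: power up, deinit, power down. */
static int model_remove(struct model_dev *d)
{
	int ret = model_get_sync(d);

	if (ret < 0) {
		model_put_noidle(d); /* rebalance the counter, then give up */
		return ret;
	}
	model_reg_write(d);          /* deinit step that touches registers */
	model_put_sync(d);
	return 0;
}
```

In this model a remove on a powered-off device performs its register access
only after the resume succeeded, and the usage counter ends at zero on both
the success and the failure path.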



[PATCH v2 3/4] drivers: net: cpsw: use of_platform_depopulate()

2016-07-28 Thread Grygorii Strashko
Use of_platform_depopulate() in cpsw_remove() instead of
of_device_unregister(), because otherwise the CPSW child devices will not
be recreated on the next insmod. of_platform_depopulate() is now the
correct way, as it ensures that all steps done in
of_platform_populate() are reverted, including clearing the
OF_POPULATED flag.

Signed-off-by: Grygorii Strashko 
---
 drivers/net/ethernet/ti/cpsw.c | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index ec6f473..f0ed470 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2566,15 +2566,6 @@ clean_ndev_ret:
return ret;
 }
 
-static int cpsw_remove_child_device(struct device *dev, void *c)
-{
-   struct platform_device *pdev = to_platform_device(dev);
-
-   of_device_unregister(pdev);
-
-   return 0;
-}
-
 static int cpsw_remove(struct platform_device *pdev)
 {
struct net_device *ndev = platform_get_drvdata(pdev);
@@ -2593,7 +2584,7 @@ static int cpsw_remove(struct platform_device *pdev)
 
cpsw_ale_destroy(priv->ale);
cpdma_ctlr_destroy(priv->dma);
-   device_for_each_child(&pdev->dev, NULL, cpsw_remove_child_device);
+   of_platform_depopulate(&pdev->dev);
pm_runtime_put_sync(&pdev->dev);
pm_runtime_disable(&pdev->dev);
if (priv->data.dual_emac)
-- 
2.9.2
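
Why a stale OF_POPULATED flag prevents the child devices from coming back can
be sketched with a toy model. The model_* names are hypothetical and the logic
only mirrors what the commit message describes (populate skips nodes already
marked as populated); it is not the OF core implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device tree node with one child device. */
struct model_node {
	bool populated;  /* plays the role of the OF_POPULATED flag */
	bool has_device; /* true while a child device is registered */
};

/* Models of_platform_populate(): nodes already marked populated are skipped. */
static int model_populate(struct model_node *n)
{
	if (n->populated)
		return 0; /* nothing created */
	n->populated = true;
	n->has_device = true;
	return 1;         /* one device created */
}

/* Models the old remove path: the device goes away, the flag stays set. */
static void model_unregister_only(struct model_node *n)
{
	n->has_device = false;
}

/* Models of_platform_depopulate(): the device goes away and the flag is cleared. */
static void model_depopulate(struct model_node *n)
{
	n->has_device = false;
	n->populated = false;
}
```

With model_unregister_only() a second model_populate() creates nothing, which
corresponds to the missing child devices on the next insmod; with
model_depopulate() the second populate succeeds.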



Re: Microsemi VSC 8531/41 PHY Driver

2016-07-28 Thread Florian Fainelli
On 07/27/2016 11:44 PM, Raju Lakkaraju wrote:
> Hello Andrew,
> 
> Thank you for the valuable comments.
> Please see my responses inline.
> 
> Thanks,
> Raju
> 
> -Original Message-
> From: Andrew Lunn [mailto:and...@lunn.ch] 
> Sent: Tuesday, July 26, 2016 6:14 PM
> To: Raju Lakkaraju
> Cc: netdev@vger.kernel.org; f.faine...@gmail.com; Allan Nielsen
> Subject: Re: Microsemi VSC 8531/41 PHY Driver
> 
> 
>> +/* RGMII Rx Clock delay value change with board lay-out */
>> +static u8 rgmii_rx_clk_delay = RGMII_RX_CLK_DELAY_1_1_NS;
> 
> Doesn't this stop you from having a board with two PHYs with different 
> layouts? You should be getting this value from the device tree.
> 
> Raju: As of now, the RGMII Rx clock delay value should be 1.1 nsec, which is
> the optimized/recommended value.
> We tested on a Beaglebone Black with the VSC 8531 PHY.

That is true, until the next design with a PHY that does not need this
value and then, it will have to be adjusted.

> We would like to provide a new function to configure the correct/required
> value based on PHY layouts, along with other RGMII configuration
> parameters, as part of our next implementation.

You can either introduce a Device Tree property to allow boards to
specify what the correct delay(s) should be, or if the platform does not
use Device Tree, using phy_register_fixup_for_id would be acceptable for
that.
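
The difference between a file-scope default and a per-board value can be
sketched as follows. This is a userspace illustration only: the model_* types,
the delay codes and the lookup table are hypothetical, and the table merely
stands in for whatever a Device Tree property or a PHY fixup would supply.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical delay codes; the real register encoding is not shown here. */
enum model_rx_delay {
	MODEL_DELAY_0_2_NS,
	MODEL_DELAY_1_1_NS,
	MODEL_DELAY_2_0_NS,
};

/* Per-PHY state: each instance carries its own delay. */
struct model_phy {
	int addr;
	enum model_rx_delay rx_clk_delay;
};

/* Stand-in for a per-board description of one PHY. */
struct model_board_entry {
	int addr;
	enum model_rx_delay rx_clk_delay;
};

/* Initialise one PHY, letting the board description override the default. */
static void model_phy_init(struct model_phy *phy, int addr,
			   const struct model_board_entry *tbl, size_t n)
{
	size_t i;

	phy->addr = addr;
	phy->rx_clk_delay = MODEL_DELAY_1_1_NS; /* default if the board is silent */
	for (i = 0; i < n; i++)
		if (tbl[i].addr == addr)
			phy->rx_clk_delay = tbl[i].rx_clk_delay;
}
```

Because the delay lives in the per-device structure, two PHYs on the same
board can end up with different values, which a single static variable cannot
express.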

> 
>> + phydev->supported = (SUPPORTED_1000baseT_Full |
>> +  SUPPORTED_1000baseT_Half |
>> +  SUPPORTED_100baseT_Full  |
>> +  SUPPORTED_100baseT_Half  |
>> +  SUPPORTED_10baseT_Full   |
>> +  SUPPORTED_10baseT_Half   |
>> +  SUPPORTED_Autoneg|
>> +  SUPPORTED_Pause  |
>> +  SUPPORTED_Asym_Pause |
>> +  SUPPORTED_TP);

This is not necessary; your driver should advertise what the PHY is
capable of in phy_driver::features. The Ethernet MAC driver should later
adjust phydev->supported to what it actually supports: there are
cases where you connect a 10/100 Mbit/s MAC to a 1 Gbit/s PHY, and you want
to properly restrict unsupported speeds.
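
The division of labour described above amounts to a bitmask intersection. The
sketch below uses hypothetical MODEL_* capability bits (the kernel's
SUPPORTED_* values differ) and hypothetical function names; it only shows the
masking step, not the PHYLIB code paths.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bits, one per speed/duplex mode. */
#define MODEL_10_HALF   (1u << 0)
#define MODEL_10_FULL   (1u << 1)
#define MODEL_100_HALF  (1u << 2)
#define MODEL_100_FULL  (1u << 3)
#define MODEL_1000_HALF (1u << 4)
#define MODEL_1000_FULL (1u << 5)

#define MODEL_BASIC_FEATURES \
	(MODEL_10_HALF | MODEL_10_FULL | MODEL_100_HALF | MODEL_100_FULL)
#define MODEL_GBIT_FEATURES \
	(MODEL_BASIC_FEATURES | MODEL_1000_HALF | MODEL_1000_FULL)

struct model_phydev {
	uint32_t supported;
	uint32_t advertising;
};

/* PHY side: publish everything the chip can do. */
static void model_phy_probe(struct model_phydev *p, uint32_t drv_features)
{
	p->supported = drv_features;
	p->advertising = drv_features;
}

/* MAC side: drop the modes the MAC cannot handle. */
static void model_mac_restrict(struct model_phydev *p, uint32_t mac_features)
{
	p->supported &= mac_features;
	p->advertising &= p->supported;
}
```

A gigabit PHY attached to a 10/100 MAC ends up with only the basic modes in
both masks, without the PHY driver hard-coding anything in its init path.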

>> +
>> + phydev->speed = SPEED_1000;
>> + phydev->duplex = DUPLEX_FULL;
>> + phydev->pause = 0;
>> + phydev->asym_pause = 0;
>> + phydev->interface = PHY_INTERFACE_MODE_RGMII;
>> + phydev->mdix = ETH_TP_MDI_AUTO;
> 
> Why are you setting all these? This is not normal, if you look at other 
> drivers.
> 
> Raju: I would like to update the default values in software data structure 
> (phydev). 
> Our PHY is 1G speed support device and RGMII supported device.

Whether RGMII is used as the interface/connection type between the MAC
and PHY is decided by the consumer of the PHYLIB API
(typically the Ethernet MAC/Switch driver). Your PHY cannot enforce
anything, but the driver can check that the connection interface is sensible.

All of these default values that you are setting here will potentially
be changed by the state machine (link, duplex, pause) in
reaction to link state changes, so this change needs to be dropped.

> 
>> +
>> + mutex_lock(&phydev->lock);
> 
> What are you locking against?
> 
> Raju: The VSC 8531 has several register pages. Whenever MDC/MDIO accesses the
> PHY control registers, the page number is set first and then the register
> address is read or written. The default page should be Page 0.
> When I want to access a register on a non-default page, I have to lock PHY
> device access so that changing the page number and accessing the register
> happen as one atomic operation.

Based on the execution context of this function, acquiring the mutex is
not necessary, the state machine has not started yet, so there cannot be
a conflicting PHY read which would end up changing the page selection.
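
The page-select sequence under discussion, and the hazard a lock would guard
against once concurrent accesses are possible, can be sketched with a toy
register file. Everything here is hypothetical: the model_* names, the number
of pages and the page-select register number are assumptions for illustration,
not values taken from the driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SELECT_REG 31 /* hypothetical page-select register */
#define MODEL_NUM_PAGES       4
#define MODEL_NUM_REGS        32

/* Toy PHY: one register bank per page plus the currently selected page. */
struct model_phy {
	uint16_t regs[MODEL_NUM_PAGES][MODEL_NUM_REGS];
	uint16_t page;
};

static void model_phy_reset(struct model_phy *phy)
{
	memset(phy, 0, sizeof(*phy));
}

/* Plain MDIO-style read: always hits the currently selected page. */
static uint16_t model_mdio_read(struct model_phy *phy, int reg)
{
	if (reg == MODEL_PAGE_SELECT_REG)
		return phy->page;
	return phy->regs[phy->page][reg];
}

/* Plain MDIO-style write: always hits the currently selected page. */
static void model_mdio_write(struct model_phy *phy, int reg, uint16_t val)
{
	if (reg == MODEL_PAGE_SELECT_REG) {
		phy->page = val % MODEL_NUM_PAGES;
		return;
	}
	phy->regs[phy->page][reg] = val;
}

/*
 * Select a page, write the register, restore page 0. If other accesses can
 * run in between, the caller has to keep them out for the whole sequence.
 */
static void model_write_paged(struct model_phy *phy, uint16_t page,
			      int reg, uint16_t val)
{
	model_mdio_write(phy, MODEL_PAGE_SELECT_REG, page);
	model_mdio_write(phy, reg, val);
	model_mdio_write(phy, MODEL_PAGE_SELECT_REG, 0);
}
```

A read issued while a non-default page is still selected returns that page's
contents, which is the conflict a lock prevents; when nothing else can touch
the PHY yet, as in the context described above, the sequence is safe without
it.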

[snip]

>> +
>> +static int vsc85xx_ack_interrupt(struct phy_device *phydev) {
>> + int rc = 0;
>> +
>> + if (phydev->interrupts == PHY_INTERRUPT_ENABLED)
>> + rc = phy_read(phydev, MII_VSC85XX_INT_STATUS);
>> +
>> + return (rc < 0) ? rc : 0;
>> +}
>> +
>> +static int vsc85xx_config_intr(struct phy_device *phydev) {
>> + int rc;
>> +
>> + if (phydev->interrupts == PHY_INTERRUPT_ENABLED) {
>> + rc = phy_write(phydev, MII_VSC85XX_INT_MASK,
>> +MII_VSC85XX_INT_MASK_MASK);
>> + } else {
>> + rc = phy_read(phydev, MII_VSC85XX_INT_STATUS);
>> + if (rc < 0)
>> + return rc;
> 
> And the purpose of this read is? I assume it clears an outstanding interrupt? 
> If so, shouldn't you do it after disabling interrupts, not before? Otherwise 
> you have a race condition.
> 
> Raju: The Interrupt status register is read on clean. When, 
> PHY_INTERRUPT_DISABLE case, 
> I should make sure that status should be clear. If I read the Interrupt 
> status registers, it clears all 

Re: [PATCH v4] net: sched: convert qdisc linked list to hashtable

2016-07-28 Thread Cong Wang
On Thu, Jul 28, 2016 at 5:53 AM, Fengguang Wu  wrote:
> On Thu, Jul 28, 2016 at 01:18:27PM +0200, Jiri Kosina wrote:
>> This issue would be there even without my patch (but with qdisc_list_add
>> instead), isn't it?
>
>
> Yes, it looks so; this error happens in a number of places:
>
> dns_query.c:(.text+0x39b84): undefined reference to `qdisc_hash_add'
> include/linux/netdevice.h:1935: undefined reference to `qdisc_hash_add'
> net/core/netevent.c:31: undefined reference to `qdisc_hash_add'
> net/sched/sch_generic.c:789: undefined reference to `qdisc_hash_add'
> sch_generic.c:(.text+0x33487): undefined reference to `qdisc_hash_add'
> switchdev.c:(.text+0x3bf58): undefined reference to `qdisc_hash_add'
> sysctl_net.c:(.text+0x31f70): undefined reference to `qdisc_hash_add'
> (.text.dev_activate+0x228): undefined reference to `qdisc_hash_add'
> (.text+0x37d0b): undefined reference to `qdisc_hash_add'
> wext-proc.c:(.text+0x390a8): undefined reference to `qdisc_hash_add'
>
>> The problem is that sch_generic.c (where dev_activate() is) is being
>> compiled every time CONFIG_NET is set, but sch_api.c (where
>> qdisc_list_add() is defined) only when CONFIG_NET_SCHED is set (and there
>> is no stub for !CONFIG_NET_SCHED case).
>
>
> So it looks like a more general problem than specific to this patch.

Agreed. I can send a patch if Jiri doesn't. ;)
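
One common way to handle a function that lives in an optionally built file is
a no-op stub for the disabled case. The sketch below is an assumption about
the general shape of such a fix, not the patch that was eventually sent:
MODEL_CONFIG_SCHED stands in for CONFIG_NET_SCHED and all model_* names are
hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical switch standing in for CONFIG_NET_SCHED. It is left
 * undefined here to model the configuration that failed to link.
 */
/* #define MODEL_CONFIG_SCHED 1 */

struct model_qdisc {
	int handle;
	int hashed; /* set by the real implementation when built */
};

#ifdef MODEL_CONFIG_SCHED
/* Would be provided by the optionally built object file. */
void model_qdisc_hash_add(struct model_qdisc *q);
#else
/* Stub: keeps always-built callers linking when the option is off. */
static inline void model_qdisc_hash_add(struct model_qdisc *q)
{
	(void)q;
}
#endif

/* Always-built caller, in the role dev_activate() plays in sch_generic.c. */
static void model_dev_activate(struct model_qdisc *q)
{
	model_qdisc_hash_add(q);
}
```

With the option disabled the call compiles to nothing and the caller no longer
produces an undefined reference.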


Re: [PATCH v4] net: sched: convert qdisc linked list to hashtable

2016-07-28 Thread Cong Wang
On Thu, Jul 28, 2016 at 2:56 AM, Jiri Kosina  wrote:
> From: Jiri Kosina 
>
> Convert the per-device linked list into a hashtable. The primary
> motivation for this change is that currently, we're not tracking all the
> qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup
> performed over the linked list by qdisc_match_from_root() is rather
> expensive.
>
> The ultimate goal is to get rid of hidden qdiscs completely, which will
> bring much more determinism in user experience.
>
> As we're adding hashtable.h include into generic netdevice.h, we have to
> make sure HASH_SIZE macro is now non-conflicting with local definitions.
>
> Signed-off-by: Jiri Kosina 
> ---
> v1 -> v2: fix up RCU hashtable usage wrt. rtnl
>   fix compilation of .c files which define their own
>   HASH_SIZE that now conflicts with the one from
>   hashtable.h (newly included via netdevice.h)
>
> v2 -> v3: resolve HASH_SIZE identifier conflicts in a cleaner way
>   fix up the number of hash bucket bits (4 bits for 16 buckets)
>
> v3 -> v4: put the hashtable into struct netdevice only if
>   CONFIG_NET_SCHED has been enabled

Reviewed-by: Cong Wang 

Thanks!
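
The "4 bits for 16 buckets" sizing and the per-device lookup can be sketched
in plain C. This is a userspace illustration with hypothetical model_* names
and a generic multiplicative hash; it does not use the kernel's hashtable.h
helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_HASH_BITS 4
#define MODEL_HASH_SIZE (1u << MODEL_HASH_BITS) /* 4 bits give 16 buckets */

struct model_qdisc {
	uint32_t handle;
	struct model_qdisc *next; /* chain within one bucket */
};

struct model_dev {
	struct model_qdisc *buckets[MODEL_HASH_SIZE];
};

/* Generic multiplicative hash folded down to MODEL_HASH_BITS bits. */
static unsigned int model_hash(uint32_t handle)
{
	return (uint32_t)(handle * 0x9E3779B1u) >> (32 - MODEL_HASH_BITS);
}

/* Insert at the head of the bucket chain. */
static void model_hash_add(struct model_dev *dev, struct model_qdisc *q)
{
	unsigned int b = model_hash(q->handle);

	q->next = dev->buckets[b];
	dev->buckets[b] = q;
}

/* Walk only the one bucket the handle hashes to. */
static struct model_qdisc *model_lookup(struct model_dev *dev, uint32_t handle)
{
	struct model_qdisc *q;

	for (q = dev->buckets[model_hash(handle)]; q; q = q->next)
		if (q->handle == handle)
			return q;
	return NULL;
}
```

A lookup touches a single bucket chain instead of every qdisc on the device,
which is what makes tracking all qdiscs affordable.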


[PATCH -next] net: ipv6: use list_move instead of list_del/list_add

2016-07-28 Thread Wei Yongjun
Using list_move() instead of list_del() + list_add().

Signed-off-by: Wei Yongjun 
---
 net/ipv6/addrconf.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 6287a8b..ab3e796 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3624,8 +3624,7 @@ restart:
state = ifa->state;
ifa->state = INET6_IFADDR_STATE_DEAD;
 
-   list_del(&ifa->if_list);
-   list_add(&ifa->if_list, &del_list);
+   list_move(&ifa->if_list, &del_list);
}
 
spin_unlock_bh(&ifa->lock);
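
That list_move() is equivalent to list_del() followed by list_add() can be
seen in a minimal userspace re-implementation of a circular doubly linked
list. The model_* functions below are a sketch written for this note, not the
kernel's list.h.

```c
#include <assert.h>

/* Minimal circular doubly linked list node, in the style of list_head. */
struct model_list {
	struct model_list *prev, *next;
};

static void model_list_init(struct model_list *head)
{
	head->prev = head->next = head;
}

/* Unlink an entry from whatever list it is on. */
static void model_list_del(struct model_list *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Insert an entry right after head. */
static void model_list_add(struct model_list *entry, struct model_list *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Move: unlink from the old list, then insert after the new head. */
static void model_list_move(struct model_list *entry, struct model_list *head)
{
	model_list_del(entry);
	model_list_add(entry, head);
}
```

After the move the source list is empty again and the entry is the only
element of the destination list, exactly as with the two separate calls.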





RE: [PATCH 14/15] ethernet: stmicro: stmmac: stmmac_platform: add missing of_node_put after calling of_parse_phandle

2016-07-28 Thread Peter Chen
 
>Hi,
>
>On 07/27/2016 04:20 AM, Peter Chen wrote:
>> of_node_put needs to be called once the device node obtained from
>> of_parse_phandle is no longer in use.
>>
>> Signed-off-by: Peter Chen 
>> ---
>>   drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 5 -
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
>> b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
>> index f7dfc0a..8d88782 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
>> @@ -113,8 +113,10 @@ static struct stmmac_axi *stmmac_axi_setup(struct platform_device *pdev)
>>  return NULL;
>>
>>  axi = kzalloc(sizeof(*axi), GFP_KERNEL);
>> -if (!axi)
>> +if (!axi) {
>> +of_node_put(np);
>>  return ERR_PTR(-ENOMEM);
>> +}
>>
>>  axi->axi_lpi_en = of_property_read_bool(np, "snps,lpi_en");
>>  axi->axi_xit_frm = of_property_read_bool(np, "snps,xit_frm");
>> @@ -127,6 +129,7 @@ static struct stmmac_axi *stmmac_axi_setup(struct platform_device *pdev)
>>  of_property_read_u32(np, "snps,wr_osr_lmt", &axi->axi_wr_osr_lmt);
>>  of_property_read_u32(np, "snps,rd_osr_lmt", &axi->axi_rd_osr_lmt);
>>  of_property_read_u32_array(np, "snps,blen", axi->axi_blen, AXI_BLEN);
>> +of_node_put(np);
>>
>>  return axi;
>>   }
>>
>
>I agree with the modification inside stmmac_axi. I just have a question about
>np = pdev->dev.of_node inside stmmac_probe_config_dt (same file).
>Could we add an "of_node_put(np)" just before "return plat"?
>

Yes, you remind me that there is still one node that needs to be put (it
should be plat->phy_node). Apart from changing the node name at the error
path, how about calling of_node_put(plat->phy_node) in stmmac_release
after the phy is disconnected?

Peter
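
The rule being applied in this thread, that every exit path after a successful
of_parse_phandle() must drop the reference, can be sketched with a plain
reference counter. The model_* names and the simplified setup function are
hypothetical and only mimic the shape of stmmac_axi_setup().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device node with a reference count. */
struct model_node {
	int refcount;
};

/* Models of_parse_phandle(): returns the node with its refcount raised. */
static struct model_node *model_parse_phandle(struct model_node *target)
{
	if (target)
		target->refcount++;
	return target;
}

/* Models of_node_put(): drops one reference. */
static void model_node_put(struct model_node *n)
{
	if (!n)
		return;
	assert(n->refcount > 0);
	n->refcount--;
}

/*
 * Simplified setup: returns 0 if there is no node, -1 if the allocation
 * fails, 1 on success. Both paths that hold a reference release it.
 */
static int model_axi_setup(struct model_node *target, int alloc_fails,
			   int *burst_out)
{
	struct model_node *np = model_parse_phandle(target);

	if (!np)
		return 0;           /* nothing was taken, nothing to put */
	if (alloc_fails) {
		model_node_put(np); /* error path */
		return -1;
	}
	*burst_out = 4;             /* stands in for reading the properties */
	model_node_put(np);         /* normal path, node no longer needed */
	return 1;
}
```

The node's count is the same before and after the call on every path, which
is the property the patch restores.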


Re: PROBLEM: TPROXY and DNAT broken (bisected to 079096f103fa)

2016-07-28 Thread Florian Westphal
Brandon Cazander  wrote:
> Hopefully that's enough detail to replicate this issue. I have the full 
> environment set up for both working and non-working kernel versions, so 
> please let me know if there's anything else I can provide.

No need, this reproduces easily with this two-line ruleset:

-t nat -A PREROUTING -d 192.168.7.20/32 -i eth0 -j DNAT --to-destination 192.168.8.1
-t mangle -A PREROUTING -p tcp -m tcp --dport 8080 -j TPROXY --on-port 9876 --on-ip 0.0.0.0 --tproxy-mark 0x1/0x1

AFAIU the problem is this:

SYN:
1. -j TPROXY finds listen sk, redirects to it
2. DNAT takes place (iphdr(skb)->daddr is mangled).
3. tcp stack puts request sk into ehash table.

Note that the ehash entry uses the updated/dnatted address.

ACK:
1. -j TPROXY finds no established or request socket
since it uses iph->daddr but ehash contains dnatted-to address
... so we redirect to the listener socket.

Before the ehash change, for skb to listen sk the kernel
used to search both the listener socket request queue and
the ehash table, using the iphdr daddr (which at this point
is the DNAT'ed address).  So this used to work because this
returns the request sk.

After the ehash change we only check syn cookie and will then
emit a reset.

Eric, AFAICS the only solution for this is to extend
TPROXY and obtain the lookup saddr/daddr info from the conntrack
entry instead of the ip headers, which should make this work again.

Do you agree?
Any other suggestions?

Thanks!
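
The key mismatch described above can be reproduced with a toy lookup table.
This is a deliberately simplified sketch: the model_* names are hypothetical,
the table is a flat array rather than the real ehash, and only the destination
address and port are used as the key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_IP(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))
#define MODEL_TABLE_SIZE 8

/* Hypothetical stand-in for a request socket stored in the table. */
struct model_sock {
	uint32_t daddr;
	uint16_t dport;
	int used;
};

struct model_table {
	struct model_sock slots[MODEL_TABLE_SIZE];
};

static void model_table_init(struct model_table *t)
{
	size_t i;

	for (i = 0; i < MODEL_TABLE_SIZE; i++)
		t->slots[i].used = 0;
}

/* Store a socket under the destination address the TCP stack saw. */
static int model_insert(struct model_table *t, uint32_t daddr, uint16_t dport)
{
	size_t i;

	for (i = 0; i < MODEL_TABLE_SIZE; i++) {
		if (!t->slots[i].used) {
			t->slots[i].daddr = daddr;
			t->slots[i].dport = dport;
			t->slots[i].used = 1;
			return 0;
		}
	}
	return -1;
}

/* Look a socket up by the destination address the caller supplies. */
static struct model_sock *model_lookup(struct model_table *t,
				       uint32_t daddr, uint16_t dport)
{
	size_t i;

	for (i = 0; i < MODEL_TABLE_SIZE; i++)
		if (t->slots[i].used && t->slots[i].daddr == daddr &&
		    t->slots[i].dport == dport)
			return &t->slots[i];
	return NULL;
}
```

Inserting under the DNAT'ed address (192.168.8.1) and looking up with the
address still in the IP header (192.168.7.20) misses; looking up with the
translated address, as a conntrack-based lookup would, finds the entry.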


Re: [PATCH v2 1/5] ethernet: add sun8i-emac driver

2016-07-28 Thread LABBE Corentin
On Mon, Jul 25, 2016 at 09:54:55PM +0200, Maxime Ripard wrote:
> On Wed, Jul 20, 2016 at 10:03:16AM +0200, LABBE Corentin wrote:
> > This patch adds support for sun8i-emac ethernet MAC hardware.
> > It can be found in Allwinner H3/A83T/A64 SoCs.
> > 
> > It supports 10/100/1000 Mbit/s speed with half/full duplex.
> > It can use an internal PHY (MII 10/100) or an external PHY
> > via RGMII/RMII.
> > 
> > Signed-off-by: LABBE Corentin 
> > ---
> >  drivers/net/ethernet/allwinner/Kconfig  |   13 +
> >  drivers/net/ethernet/allwinner/Makefile |1 +
> >  drivers/net/ethernet/allwinner/sun8i-emac.c | 2129 +++
> >  3 files changed, 2143 insertions(+)
> >  create mode 100644 drivers/net/ethernet/allwinner/sun8i-emac.c
> > 
> > diff --git a/drivers/net/ethernet/allwinner/Kconfig 
> > b/drivers/net/ethernet/allwinner/Kconfig
> > index 47da7e7..060569c 100644
> > --- a/drivers/net/ethernet/allwinner/Kconfig
> > +++ b/drivers/net/ethernet/allwinner/Kconfig
> > @@ -33,4 +33,17 @@ config SUN4I_EMAC
> >To compile this driver as a module, choose M here.  The module
> >will be called sun4i-emac.
> >  
> > +config SUN8I_EMAC
> > +   tristate "Allwinner sun8i EMAC support"
> > +   depends on ARCH_SUNXI || COMPILE_TEST
> > +   depends on OF
> > +   select MII
> > +   select PHYLIB
> > +---help---
> > + This driver support the sun8i EMAC ethernet driver present on
> > + H3/A83T/A64 Allwinner SoCs.
> > +
> > +  To compile this driver as a module, choose M here.  The module
> > +  will be called sun8i-emac.
> > +
> >  endif # NET_VENDOR_ALLWINNER
> > diff --git a/drivers/net/ethernet/allwinner/Makefile 
> > b/drivers/net/ethernet/allwinner/Makefile
> > index 03129f7..8bd1693c 100644
> > --- a/drivers/net/ethernet/allwinner/Makefile
> > +++ b/drivers/net/ethernet/allwinner/Makefile
> > @@ -3,3 +3,4 @@
> >  #
> >  
> >  obj-$(CONFIG_SUN4I_EMAC) += sun4i-emac.o
> > +obj-$(CONFIG_SUN8I_EMAC) += sun8i-emac.o
> > diff --git a/drivers/net/ethernet/allwinner/sun8i-emac.c 
> > b/drivers/net/ethernet/allwinner/sun8i-emac.c
> > new file mode 100644
> > index 000..fc0c1dd
> > --- /dev/null
> > +++ b/drivers/net/ethernet/allwinner/sun8i-emac.c
> > @@ -0,0 +1,2129 @@
> > +/*
> > + * sun8i-emac driver
> > + *
> > + * Copyright (C) 2015-2016 Corentin LABBE 
> > + *
> > + * This is the driver for Allwinner Ethernet MAC found in H3/A83T/A64 SoC
> > + *
> > + * TODO:
> > + * - MAC filtering
> > + * - Jumbo frame
> > + * - features rx-all (NETIF_F_RXALL_BIT)
> > + */
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#define SUN8I_EMAC_BASIC_CTL0  0x00
> > +#define SUN8I_EMAC_BASIC_CTL1  0x04
> > +#define SUN8I_EMAC_INT_STA 0x08
> > +#define SUN8I_EMAC_INT_EN  0x0C
> > +#define SUN8I_EMAC_TX_CTL0 0x10
> > +#define SUN8I_EMAC_TX_CTL1 0x14
> > +#define SUN8I_EMAC_TX_FLOW_CTL 0x1C
> > +#define SUN8I_EMAC_RX_CTL0 0x24
> > +#define SUN8I_EMAC_RX_CTL1 0x28
> > +#define SUN8I_EMAC_RX_FRM_FLT  0x38
> > +#define SUN8I_EMAC_MDIO_CMD0x48
> > +#define SUN8I_EMAC_MDIO_DATA   0x4C
> > +#define SUN8I_EMAC_TX_DMA_STA  0xB0
> > +#define SUN8I_EMAC_TX_CUR_DESC 0xB4
> > +#define SUN8I_EMAC_TX_CUR_BUF  0xB8
> > +#define SUN8I_EMAC_RX_DMA_STA  0xC0
> > +
> > +#define MDIO_CMD_MII_BUSY  BIT(0)
> > +#define MDIO_CMD_MII_WRITE BIT(1)
> > +#define MDIO_CMD_MII_PHY_REG_ADDR_MASK GENMASK(8, 4)
> > +#define MDIO_CMD_MII_PHY_REG_ADDR_SHIFT4
> > +#define MDIO_CMD_MII_PHY_ADDR_MASK GENMASK(16, 12)
> > +#define MDIO_CMD_MII_PHY_ADDR_SHIFT12
> > +
> > +#define SUN8I_EMAC_MACADDR_HI  0x50
> > +#define SUN8I_EMAC_MACADDR_LO  0x54
> > +
> > +#define SUN8I_EMAC_RX_DESC_LIST 0x34
> > +#define SUN8I_EMAC_TX_DESC_LIST 0x20
> > +
> > +#define SUN8I_EMAC_RX_DO_CRC BIT(27)
> > +#define SUN8I_EMAC_RX_STRIP_FCS BIT(28)
> > +
> > +#define SUN8I_COULD_BE_USED_BY_DMA BIT(31)
> > +
> > +/* Used in RX_CTL1*/
> > +#define RX_DMA_EN  BIT(30)
> > +#define RX_DMA_START   BIT(31)
> > +/* Used in TX_CTL1*/
> > +#define TX_DMA_EN  BIT(30)
> > +#define TX_DMA_START   BIT(31)
> > +
> > +/* Used in RX_CTL0 */
> > +#define RX_RECEIVER_EN BIT(31)
> > +/* Used in TX_CTL0 */
> > +#define TX_TRANSMITTER_EN  BIT(31)
> > +
> > +/* Basic CTL0 */
> > +#define BCTL0_FD BIT(0)
> > +#define BCTL0_SPEED_10 2
> > +#define BCTL0_SPEED_1003
> > +#define BCTL0_SPEED_MASK   GENMASK(3, 2)
> > +#define BCTL0_SPEED_SHIFT  2
> > +
> > +#define FLOW_RX 1
> > +#define FLOW_TX 2
> > +
> > +#define RX_INT  BIT(8)
> > +#define TX_INT  BIT(0)
> > +
> > +/* Bits used in frame RX status */
> > +#define 

[RFC v6 6/6] VSOCK: Add Makefile and Kconfig

2016-07-28 Thread Stefan Hajnoczi
From: Asias He 

Enable virtio-vsock and vhost-vsock.

Signed-off-by: Asias He 
Signed-off-by: Stefan Hajnoczi 
---
v6:
 * Rename virtio-vsock kernel modules to vmw_vsock_virtio_transport*
   instead of just virtio_transport to make the name clearer [Ian
   Campbell]
v4:
 * Make checkpatch.pl happy with longer option description
 * Clarify dependency on virtio rather than QEMU as suggested by Alex
   Bennee
v3:
 * Don't put vhost vsock driver into staging
 * Add missing Kconfig dependencies (Arnd Bergmann )
---
 drivers/vhost/Kconfig  | 15 +++
 drivers/vhost/Makefile |  4 
 net/vmw_vsock/Kconfig  | 20 
 net/vmw_vsock/Makefile |  6 ++
 4 files changed, 45 insertions(+)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 533eaf0..d7aae9e 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -21,6 +21,21 @@ config VHOST_SCSI
Say M here to enable the vhost_scsi TCM fabric module
for use with virtio-scsi guests
 
+config VHOST_VSOCK
+   tristate "vhost virtio-vsock driver"
+   depends on VSOCKETS && EVENTFD
+   select VIRTIO_VSOCKETS_COMMON
+   select VHOST
+   select VHOST_RING
+   default n
+   ---help---
+   This kernel module can be loaded in the host kernel to provide AF_VSOCK
+   sockets for communicating with guests.  The guests must have the
+   virtio_transport.ko driver loaded to use the virtio-vsock device.
+
+   To compile this driver as a module, choose M here: the module will be called
+   vhost_vsock.
+
 config VHOST_RING
tristate
---help---
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index e0441c3..6b012b9 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -4,5 +4,9 @@ vhost_net-y := net.o
 obj-$(CONFIG_VHOST_SCSI) += vhost_scsi.o
 vhost_scsi-y := scsi.o
 
+obj-$(CONFIG_VHOST_VSOCK) += vhost_vsock.o
+vhost_vsock-y := vsock.o
+
 obj-$(CONFIG_VHOST_RING) += vringh.o
+
 obj-$(CONFIG_VHOST)+= vhost.o
diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
index 14810ab..8831e7c 100644
--- a/net/vmw_vsock/Kconfig
+++ b/net/vmw_vsock/Kconfig
@@ -26,3 +26,23 @@ config VMWARE_VMCI_VSOCKETS
 
  To compile this driver as a module, choose M here: the module
  will be called vmw_vsock_vmci_transport. If unsure, say N.
+
+config VIRTIO_VSOCKETS
+   tristate "virtio transport for Virtual Sockets"
+   depends on VSOCKETS && VIRTIO
+   select VIRTIO_VSOCKETS_COMMON
+   help
+ This module implements a virtio transport for Virtual Sockets.
+
+ Enable this transport if your Virtual Machine host supports Virtual
+ Sockets over virtio.
+
+ To compile this driver as a module, choose M here: the module will be
+ called vmw_vsock_virtio_transport. If unsure, say N.
+
+config VIRTIO_VSOCKETS_COMMON
+   tristate
+   help
+ This option is selected by any driver which needs to access
+ the virtio_vsock.  The module will be called
+ vmw_vsock_virtio_transport_common.
diff --git a/net/vmw_vsock/Makefile b/net/vmw_vsock/Makefile
index 2ce52d7..bc27c70 100644
--- a/net/vmw_vsock/Makefile
+++ b/net/vmw_vsock/Makefile
@@ -1,7 +1,13 @@
 obj-$(CONFIG_VSOCKETS) += vsock.o
 obj-$(CONFIG_VMWARE_VMCI_VSOCKETS) += vmw_vsock_vmci_transport.o
+obj-$(CONFIG_VIRTIO_VSOCKETS) += vmw_vsock_virtio_transport.o
+obj-$(CONFIG_VIRTIO_VSOCKETS_COMMON) += vmw_vsock_virtio_transport_common.o
 
 vsock-y += af_vsock.o vsock_addr.o
 
 vmw_vsock_vmci_transport-y += vmci_transport.o vmci_transport_notify.o \
vmci_transport_notify_qstate.o
+
+vmw_vsock_virtio_transport-y += virtio_transport.o
+
+vmw_vsock_virtio_transport_common-y += virtio_transport_common.o
-- 
2.7.4



[RFC v6 5/6] VSOCK: Introduce vhost_vsock.ko

2016-07-28 Thread Stefan Hajnoczi
From: Asias He 

VM sockets vhost transport implementation.  This driver runs on the
host.

Signed-off-by: Asias He 
Signed-off-by: Stefan Hajnoczi 
---
v6:
 * Fall back to non-contiguous pages if vsock struct is too large (idea
   stolen from vhost_net)
 * 64-bit CIDs in packet header and ioctl to match virtio-vsock
   specification
 * Add VHOST_VSOCK_SET_RUNNING ioctl to start/stop vhost cleanly
 * Remove total_tx_buf accounting, it is ineffective because control
   packets are not included
 * Start/stop rx depending on reply packet accounting to bound memory
   allocation if the guest is not processing rx packets
 * Turn vhost_vsock_mutex into a spinlock so packets can be sent in
   atomic context without lockdep errors
v5:
 * Only take rx/tx virtqueues, userspace handles the other virtqueues
 * Explicitly skip instances without a CID when transferring packets
 * Add VHOST_VSOCK_START ioctl to begin vhost virtqueue processing
 * Reset established connections when device is closed
v4:
 * Add MAINTAINERS file entry
 * virtqueue used len is now sizeof(pkt->hdr) + pkt->len instead of just
   pkt->len
 * checkpatch.pl cleanups
 * Clarify struct vhost_vsock locking
 * Add comments about optimization that disables virtqueue notify
 * Drop unused vhost_vsock_handle_ctl_kick()
 * Call wake_up() after decrementing total_tx_buf to prevent deadlock
v3:
 * Remove unneeded variable used to store return value
   (Fengguang Wu  and Julia Lawall
   )
v2:
 * Add missing total_tx_buf decrement
 * Support flexible rx/tx descriptor layout
 * Refuse to assign reserved CIDs
 * Refuse guest CID if already in use
 * Only accept correctly addressed packets
---
 MAINTAINERS|   2 +
 drivers/vhost/vsock.c  | 722 +
 include/uapi/linux/vhost.h |   5 +
 3 files changed, 729 insertions(+)
 create mode 100644 drivers/vhost/vsock.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7302663..12c79e5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12148,6 +12148,8 @@ F:  include/linux/virtio_vsock.h
 F: include/uapi/linux/virtio_vsock.h
 F: net/vmw_vsock/virtio_transport_common.c
 F: net/vmw_vsock/virtio_transport.c
+F: drivers/vhost/vsock.c
+F: drivers/vhost/vsock.h
 
 VIRTUAL SERIO DEVICE DRIVER
 M: Stephen Chandler Paul 
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
new file mode 100644
index 000..028ca16
--- /dev/null
+++ b/drivers/vhost/vsock.c
@@ -0,0 +1,722 @@
+/*
+ * vhost transport for vsock
+ *
+ * Copyright (C) 2013-2015 Red Hat, Inc.
+ * Author: Asias He 
+ * Stefan Hajnoczi 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include "vhost.h"
+
+#define VHOST_VSOCK_DEFAULT_HOST_CID   2
+
+enum {
+   VHOST_VSOCK_FEATURES = VHOST_FEATURES,
+};
+
+/* Used to track all the vhost_vsock instances on the system. */
+static DEFINE_SPINLOCK(vhost_vsock_lock);
+static LIST_HEAD(vhost_vsock_list);
+
+struct vhost_vsock {
+   struct vhost_dev dev;
+   struct vhost_virtqueue vqs[2];
+
+   /* Link to global vhost_vsock_list, protected by vhost_vsock_lock */
+   struct list_head list;
+
+   struct vhost_work send_pkt_work;
+   spinlock_t send_pkt_list_lock;
+   struct list_head send_pkt_list; /* host->guest pending packets */
+
+   atomic_t queued_replies;
+
+   u32 guest_cid;
+};
+
+static u32 vhost_transport_get_local_cid(void)
+{
+   return VHOST_VSOCK_DEFAULT_HOST_CID;
+}
+
+static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
+{
+   struct vhost_vsock *vsock;
+
+   spin_lock_bh(&vhost_vsock_lock);
+   list_for_each_entry(vsock, &vhost_vsock_list, list) {
+   u32 other_cid = vsock->guest_cid;
+
+   /* Skip instances that have no CID yet */
+   if (other_cid == 0)
+   continue;
+
+   if (other_cid == guest_cid) {
+   spin_unlock_bh(&vhost_vsock_lock);
+   return vsock;
+   }
+   }
+   spin_unlock_bh(&vhost_vsock_lock);
+
+   return NULL;
+}
+
+static void
+vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
+   struct vhost_virtqueue *vq)
+{
+   struct vhost_virtqueue *tx_vq = &vsock->vqs[VSOCK_VQ_TX];
+   bool added = false;
+   bool restart_tx = false;
+
+   mutex_lock(&vq->mutex);
+
+   if (!vq->private_data)
+   goto out;
+
+   /* Avoid further vmexits, we're already processing the virtqueue */
+   vhost_disable_notify(&vsock->dev, vq);
+
+   for (;;) {
+   struct virtio_vsock_pkt *pkt;
+   struct iov_iter iov_iter;
+   unsigned out, in;
+   size_t nbytes;
+   

[RFC v6 4/6] VSOCK: Introduce virtio_transport.ko

2016-07-28 Thread Stefan Hajnoczi
From: Asias He 

VM sockets virtio transport implementation.  This driver runs in the
guest.

Signed-off-by: Asias He 
Signed-off-by: Stefan Hajnoczi 
---
v6:
 * Start/stop rx depending on reply packet accounting to bound memory
   allocation if the host is not processing rx packets
 * 64-bit CID in packet header to match virtio-vsock specification
 * Use send_pkt_list to defer transmission
 * Avoid leaking virtqueue buffers and pkt lists on shutdown
 * Remove total_tx_buf accounting, it is ineffective because control
   packets are not included
v5:
 * Add transport reset event handling
 * Drop ctrl virtqueue
v4:
 * Add MAINTAINERS file entry
 * Drop short/long rx packets
 * checkpatch.pl cleanups
 * Clarify locking in struct virtio_vsock
 * Narrow local variable scopes as suggested by Alex Bennee
 * Call wake_up() after decrementing total_tx_buf to avoid deadlock
v2:
 * Fix total_tx_buf accounting
 * Add virtio_transport global mutex to prevent races
---
 MAINTAINERS  |   1 +
 net/vmw_vsock/virtio_transport.c | 624 +++
 2 files changed, 625 insertions(+)
 create mode 100644 net/vmw_vsock/virtio_transport.c

diff --git a/MAINTAINERS b/MAINTAINERS
index b49ffb8..7302663 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12147,6 +12147,7 @@ S:  Maintained
 F: include/linux/virtio_vsock.h
 F: include/uapi/linux/virtio_vsock.h
 F: net/vmw_vsock/virtio_transport_common.c
+F: net/vmw_vsock/virtio_transport.c
 
 VIRTUAL SERIO DEVICE DRIVER
 M: Stephen Chandler Paul 
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
new file mode 100644
index 000..699dfab
--- /dev/null
+++ b/net/vmw_vsock/virtio_transport.c
@@ -0,0 +1,624 @@
+/*
+ * virtio transport for vsock
+ *
+ * Copyright (C) 2013-2015 Red Hat, Inc.
+ * Author: Asias He 
+ * Stefan Hajnoczi 
+ *
+ * Some of the code is taken from Gerd Hoffmann's
+ * early virtio-vsock proof-of-concept bits.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static struct workqueue_struct *virtio_vsock_workqueue;
+static struct virtio_vsock *the_virtio_vsock;
+static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
+
+struct virtio_vsock {
+   struct virtio_device *vdev;
+   struct virtqueue *vqs[VSOCK_VQ_MAX];
+
+   /* Virtqueue processing is deferred to a workqueue */
+   struct work_struct tx_work;
+   struct work_struct rx_work;
+   struct work_struct event_work;
+
+   /* The following fields are protected by tx_lock.  vqs[VSOCK_VQ_TX]
+* must be accessed with tx_lock held.
+*/
+   struct mutex tx_lock;
+
+   struct work_struct send_pkt_work;
+   spinlock_t send_pkt_list_lock;
+   struct list_head send_pkt_list;
+
+   atomic_t queued_replies;
+
+   /* The following fields are protected by rx_lock.  vqs[VSOCK_VQ_RX]
+* must be accessed with rx_lock held.
+*/
+   struct mutex rx_lock;
+   int rx_buf_nr;
+   int rx_buf_max_nr;
+
+   /* The following fields are protected by event_lock.
+* vqs[VSOCK_VQ_EVENT] must be accessed with event_lock held.
+*/
+   struct mutex event_lock;
+   struct virtio_vsock_event event_list[8];
+
+   u32 guest_cid;
+};
+
+static struct virtio_vsock *virtio_vsock_get(void)
+{
+   return the_virtio_vsock;
+}
+
+static u32 virtio_transport_get_local_cid(void)
+{
+   struct virtio_vsock *vsock = virtio_vsock_get();
+
+   return vsock->guest_cid;
+}
+
+static void
+virtio_transport_send_pkt_work(struct work_struct *work)
+{
+   struct virtio_vsock *vsock =
+   container_of(work, struct virtio_vsock, send_pkt_work);
+   struct virtqueue *vq;
+   bool added = false;
+   bool restart_rx = false;
+
+   mutex_lock(&vsock->tx_lock);
+
+   vq = vsock->vqs[VSOCK_VQ_TX];
+
+   /* Avoid unnecessary interrupts while we're processing the ring */
+   virtqueue_disable_cb(vq);
+
+   for (;;) {
+   struct virtio_vsock_pkt *pkt;
+   struct scatterlist hdr, buf, *sgs[2];
+   int ret, in_sg = 0, out_sg = 0;
+   bool reply;
+
+   spin_lock_bh(&vsock->send_pkt_list_lock);
+   if (list_empty(&vsock->send_pkt_list)) {
+   spin_unlock_bh(&vsock->send_pkt_list_lock);
+   virtqueue_enable_cb(vq);
+   break;
+   }
+
+   pkt = list_first_entry(&vsock->send_pkt_list,
+  struct virtio_vsock_pkt, list);
+   list_del_init(&pkt->list);
+   spin_unlock_bh(&vsock->send_pkt_list_lock);
+
+   reply = 

[RFC v6 3/6] VSOCK: Introduce virtio_vsock_common.ko

2016-07-28 Thread Stefan Hajnoczi
From: Asias He 

This module contains the common code and header files for the
virtio_transport and vhost_vsock kernel modules that follow.

Signed-off-by: Asias He 
Signed-off-by: Claudio Imbrenda 
Signed-off-by: Stefan Hajnoczi 
---
v6:
 * Add graceful shutdown to avoid port reuse while peer is still closing
   socket [Ian Campbell]
 * Use spinlocks instead of mutexes for tx_lock/rx_lock because they are
   used in sections that are not allowed to sleep. [Claudio]
 * Use nested lock notation in cases where child socket is held
   [Claudio]
 * 64-bit CIDs in virtio_vsock.h to match virtio-vsock specification
 * Avoid memcpy_to_msg() in atomic region
 * Call sk_write_space() with sk_lock held
 * Move duplicated send_pkt() logic from vhost and virtio transports
   into virtio_transport_common.ko
 * Add __attribute__((packed)) to avoid struct padding for ABI structs
   [Gerard Garcia]
v5:
 * Add event virtqueue, struct virtio_vsock_event, and transport reset
   event
 * Reorder virtqueue indices: rx, tx, event
 * Drop unused virtqueue_pairs config field
 * Drop unused ctrl virtqueue
 * Switch to a free virtio device ID, the previous one was reserved
v4:
 * Add MAINTAINERS file entry
 * checkpatch.pl cleanups
 * linux_vsock.h: drop wrong copy-pasted license header
 * Move tx sock refcounting to virtio_transport_alloc/free_pkt() to fix
   leaks in error paths
 * Add send_pkt_no_sock() to send RST packets with no listen socket
 * Rename per-socket state from virtio_transport to virtio_vsock_sock
 * Move send_pkt_ops to new virtio_transport struct
 * Drop dumppkt function, packet capture will be added in the future
 * Drop empty virtio_transport_dec_tx_pkt()
 * Allow guest->host connections again
 * Use trace events instead of pr_debug()
v3:
 * Remove unnecessary 3-way handshake, just do REQUEST/RESPONSE instead
   of REQUEST/RESPONSE/ACK
 * Remove SOCK_DGRAM support and focus on SOCK_STREAM first
 * Only allow host->guest connections (same security model as latest
   VMware)
v2:
 * Fix peer_buf_alloc inheritance on child socket
 * Notify other side of SOCK_STREAM disconnect (fixes shutdown
   semantics)
 * Avoid recursive mutex_lock(tx_lock) for write_space (fixes deadlock)
 * Define VIRTIO_VSOCK_TYPE_STREAM/DGRAM hardware interface constants
 * Define VIRTIO_VSOCK_SHUTDOWN_RCV/SEND hardware interface constants
---
 MAINTAINERS|  10 +
 include/linux/virtio_vsock.h   | 154 
 include/net/af_vsock.h |   2 +
 .../trace/events/vsock_virtio_transport_common.h   | 144 +++
 include/uapi/linux/Kbuild  |   1 +
 include/uapi/linux/virtio_ids.h|   1 +
 include/uapi/linux/virtio_vsock.h  |  94 ++
 net/vmw_vsock/virtio_transport_common.c| 992 +
 8 files changed, 1398 insertions(+)
 create mode 100644 include/linux/virtio_vsock.h
 create mode 100644 include/trace/events/vsock_virtio_transport_common.h
 create mode 100644 include/uapi/linux/virtio_vsock.h
 create mode 100644 net/vmw_vsock/virtio_transport_common.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 8c20323..b49ffb8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12138,6 +12138,16 @@ S: Maintained
 F: drivers/media/v4l2-core/videobuf2-*
 F: include/media/videobuf2-*
 
+VIRTIO AND VHOST VSOCK DRIVER
+M: Stefan Hajnoczi 
+L: k...@vger.kernel.org
+L: virtualizat...@lists.linux-foundation.org
+L: netdev@vger.kernel.org
+S: Maintained
+F: include/linux/virtio_vsock.h
+F: include/uapi/linux/virtio_vsock.h
+F: net/vmw_vsock/virtio_transport_common.c
+
 VIRTUAL SERIO DEVICE DRIVER
 M: Stephen Chandler Paul 
 S: Maintained
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
new file mode 100644
index 0000000..9638bfe
--- /dev/null
+++ b/include/linux/virtio_vsock.h
@@ -0,0 +1,154 @@
+#ifndef _LINUX_VIRTIO_VSOCK_H
+#define _LINUX_VIRTIO_VSOCK_H
+
+#include 
+#include 
+#include 
+#include 
+
+#define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE  128
+#define VIRTIO_VSOCK_DEFAULT_BUF_SIZE  (1024 * 256)
+#define VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE  (1024 * 256)
+#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE   (1024 * 4)
+#define VIRTIO_VSOCK_MAX_BUF_SIZE  0xFFFFFFFFUL
+#define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE  (1024 * 64)
+
+enum {
+   VSOCK_VQ_RX = 0, /* for host to guest data */
+   VSOCK_VQ_TX = 1, /* for guest to host data */
+   VSOCK_VQ_EVENT  = 2,
+   VSOCK_VQ_MAX= 3,
+};
+
+/* Per-socket state (accessed via vsk->trans) */
+struct virtio_vsock_sock {
+   struct vsock_sock *vsk;
+
+   /* Protected by lock_sock(sk_vsock(trans->vsk)) */
+   u32 buf_size;
+   u32 buf_size_min;
+   u32 buf_size_max;
+
+   spinlock_t tx_lock;
+   

[RFC v6 2/6] VSOCK: defer sock removal to transports

2016-07-28 Thread Stefan Hajnoczi
The virtio transport will implement graceful shutdown and the related
SO_LINGER socket option.  This requires orphaning the sock but keeping
it in the table of connections after .release().

This patch adds the vsock_remove_sock() function and leaves it up to the
transport when to remove the sock.

Signed-off-by: Stefan Hajnoczi 
---
 include/net/af_vsock.h |  1 +
 net/vmw_vsock/af_vsock.c   | 16 ++--
 net/vmw_vsock/vmci_transport.c |  2 ++
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 23f5525..3af0b22 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -180,6 +180,7 @@ void vsock_remove_connected(struct vsock_sock *vsk);
 struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr);
 struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
 struct sockaddr_vm *dst);
+void vsock_remove_sock(struct vsock_sock *vsk);
 void vsock_for_each_connected_socket(void (*fn)(struct sock *sk));
 
 #endif /* __AF_VSOCK_H__ */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index e34d96f..17dbbe6 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -344,6 +344,16 @@ static bool vsock_in_connected_table(struct vsock_sock 
*vsk)
return ret;
 }
 
+void vsock_remove_sock(struct vsock_sock *vsk)
+{
+   if (vsock_in_bound_table(vsk))
+   vsock_remove_bound(vsk);
+
+   if (vsock_in_connected_table(vsk))
+   vsock_remove_connected(vsk);
+}
+EXPORT_SYMBOL_GPL(vsock_remove_sock);
+
 void vsock_for_each_connected_socket(void (*fn)(struct sock *sk))
 {
int i;
@@ -660,12 +670,6 @@ static void __vsock_release(struct sock *sk)
vsk = vsock_sk(sk);
pending = NULL; /* Compiler warning. */
 
-   if (vsock_in_bound_table(vsk))
-   vsock_remove_bound(vsk);
-
-   if (vsock_in_connected_table(vsk))
-   vsock_remove_connected(vsk);
-
transport->release(vsk);
 
lock_sock(sk);
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 4120b7a..4be4fbb 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -1644,6 +1644,8 @@ static void vmci_transport_destruct(struct vsock_sock 
*vsk)
 
 static void vmci_transport_release(struct vsock_sock *vsk)
 {
+   vsock_remove_sock(vsk);
+
if (!vmci_handle_is_invalid(vmci_trans(vsk)->dg_handle)) {
vmci_datagram_destroy_handle(vmci_trans(vsk)->dg_handle);
vmci_trans(vsk)->dg_handle = VMCI_INVALID_HANDLE;
-- 
2.7.4



[RFC v6 1/6] VSOCK: transport-specific vsock_transport functions

2016-07-28 Thread Stefan Hajnoczi
struct vsock_transport contains function pointers called by AF_VSOCK
core code.  The transport may want its own transport-specific function
pointers and they can be added after struct vsock_transport.

Allow the transport to fetch vsock_transport.  It can downcast it to
access transport-specific function pointers.

The virtio transport will use this.

Signed-off-by: Stefan Hajnoczi 
---
 include/net/af_vsock.h   | 3 +++
 net/vmw_vsock/af_vsock.c | 9 +
 2 files changed, 12 insertions(+)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index e9eb2d6..23f5525 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -165,6 +165,9 @@ static inline int vsock_core_init(const struct 
vsock_transport *t)
 }
 void vsock_core_exit(void);
 
+/* The transport may downcast this to access transport-specific functions */
+const struct vsock_transport *vsock_core_get_transport(void);
+
 /**** UTILS ****/
 
 void vsock_release_pending(struct sock *pending);
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index b96ac91..e34d96f 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1995,6 +1995,15 @@ void vsock_core_exit(void)
 }
 EXPORT_SYMBOL_GPL(vsock_core_exit);
 
+const struct vsock_transport *vsock_core_get_transport(void)
+{
+   /* vsock_register_mutex not taken since only the transport uses this
+* function and only while registered.
+*/
+   return transport;
+}
+EXPORT_SYMBOL_GPL(vsock_core_get_transport);
+
 MODULE_AUTHOR("VMware, Inc.");
 MODULE_DESCRIPTION("VMware Virtual Socket Family");
 MODULE_VERSION("1.0.1.0-k");
-- 
2.7.4
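
The downcast described in the commit message above can be sketched in plain
userspace C.  This is only an illustration under assumptions: base_transport,
ext_transport, core_get_transport() and to_ext() are hypothetical stand-ins
for struct vsock_transport, a transport-specific struct that embeds it, and
vsock_core_get_transport(); none of these names come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the ops table the core knows about. */
struct base_transport {
	int (*connect)(int cid);
};

/* Hypothetical transport-specific table: the base struct comes first and
 * the extra function pointers are added after it.
 */
struct ext_transport {
	struct base_transport base;
	int (*send_pkt)(int len);
};

/* Userspace equivalent of the kernel's container_of() helper. */
#define container_of(ptr, type, member) \
	((type *)((const char *)(ptr) - offsetof(type, member)))

static int demo_connect(int cid) { return cid; }
static int demo_send_pkt(int len) { return len * 2; }

static const struct ext_transport demo_transport = {
	.base = { .connect = demo_connect },
	.send_pkt = demo_send_pkt,
};

/* Plays the role of the core getter: it only exposes the base type. */
const struct base_transport *core_get_transport(void)
{
	return &demo_transport.base;
}

/* The transport recovers its own, larger struct from the base pointer. */
const struct ext_transport *to_ext(const struct base_transport *t)
{
	return container_of(t, const struct ext_transport, base);
}

/* Small self-check showing the round trip. */
void downcast_selfcheck(void)
{
	const struct ext_transport *ext = to_ext(core_get_transport());

	assert(ext == &demo_transport);
	assert(ext->send_pkt(21) == 42);
}
```

The downcast is only valid when the registered transport really is embedded
in the larger struct, which is why the getter is meant to be used by the
transport itself.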



[RFC v6 0/6] Add virtio transport for AF_VSOCK

2016-07-28 Thread Stefan Hajnoczi
This series is based on v4.7.

This RFC is the implementation for the new VIRTIO Socket device.  It is
developed in parallel with the VIRTIO device specification and proves the
design.  Once the specification has been accepted I will send a non-RFC version
of this patch series.

v6:
 * Add VHOST_VSOCK_SET_RUNNING ioctl to start/stop vhost cleanly
 * Add graceful shutdown to avoid port reuse while peer is still closing
   socket [Ian Campbell]
 * Start/stop rx depending on reply packet accounting to bound memory
   allocation if the host is not processing rx packets
 * Use send_pkt_list to defer transmission
 * Use spinlocks instead of mutexes for tx_lock/rx_lock because they are
   used in sections that are not allowed to sleep. [Claudio]
 * 64-bit CIDs in virtio_vsock.h to match virtio-vsock specification
 * Move duplicated send_pkt() logic from vhost and virtio transports
   into virtio_transport_common.ko
 * ...and more, see individual patch changelogs

v5:
 * Transport reset event for live migration support
 * Reorder virtqueues, drop unused ctrl virtqueue
 * Switch to a free virtio device ID
 * More small changes, see patches for individual items

v4:
 * Addressed code review comments from Alex Bennee
 * MAINTAINERS file entries for new files
 * Trace events instead of pr_debug()
 * RST packet is sent when there is no listen socket
 * Allow guest->host connections again (began discussing netfilter support with
   Matt Benjamin instead of hard-coding security policy in virtio-vsock code)
 * Many checkpatch.pl cleanups (will be 100% clean in v5)

v3:
 * Remove unnecessary 3-way handshake, just do REQUEST/RESPONSE instead
   of REQUEST/RESPONSE/ACK
 * Remove SOCK_DGRAM support and focus on SOCK_STREAM first
   (also drop v2 Patch 1, it's only needed for SOCK_DGRAM)
 * Only allow host->guest connections (same security model as latest
   VMware)
 * Don't put vhost vsock driver into staging
 * Add missing Kconfig dependencies (Arnd Bergmann)
 * Remove unneeded variable used to store return value
   (Fengguang Wu and Julia Lawall)

v2:
 * Rebased onto Linux v4.4-rc2
 * vhost: Refuse to assign reserved CIDs
 * vhost: Refuse guest CID if already in use
 * vhost: Only accept correctly addressed packets (no spoofing!)
 * vhost: Support flexible rx/tx descriptor layout
 * vhost: Add missing total_tx_buf decrement
 * virtio_transport: Fix total_tx_buf accounting
 * virtio_transport: Add virtio_transport global mutex to prevent races
 * common: Notify other side of SOCK_STREAM disconnect (fixes shutdown
   semantics)
 * common: Avoid recursive mutex_lock(tx_lock) for write_space (fixes deadlock)
 * common: Define VIRTIO_VSOCK_TYPE_STREAM/DGRAM hardware interface constants
 * common: Define VIRTIO_VSOCK_SHUTDOWN_RCV/SEND hardware interface constants
 * common: Fix peer_buf_alloc inheritance on child socket

This patch series adds a virtio transport for AF_VSOCK (net/vmw_vsock/).
AF_VSOCK is designed for communication between virtual machines and
hypervisors.  It is currently only implemented for VMware's VMCI transport.

Much of the work was done by Asias He and Gerd Hoffmann a while back.  I have
picked up the series again.

The QEMU userspace changes are here:
https://github.com/stefanha/qemu/commits/vsock

Why virtio-vsock?
-----------------
Guest<->host communication is currently done over the virtio-serial device.
This makes it hard to port sockets API-based applications and is limited to
static ports.

virtio-vsock uses the sockets API so that applications can rely on familiar
SOCK_STREAM semantics.  Applications on the host can easily connect to guest
agents because the sockets API allows multiple connections to a listen socket
(unlike virtio-serial).  This simplifies the guest<->host communication and
eliminates the need for extra processes on the host to arbitrate virtio-serial
ports.

Overview
--------
This series adds 3 pieces:

1. virtio_transport_common.ko - core virtio vsock code that uses vsock.ko

2. virtio_transport.ko - guest driver

3. drivers/vhost/vsock.ko - host driver

Howto
-----
The following kernel options are needed:
  CONFIG_VSOCKETS=y
  CONFIG_VIRTIO_VSOCKETS=y
  CONFIG_VIRTIO_VSOCKETS_COMMON=y
  CONFIG_VHOST_VSOCK=m

Launch QEMU as follows:
  # qemu ... -device vhost-vsock-pci,id=vhost-vsock-pci0,guest-cid=3

Guest and host can communicate via AF_VSOCK sockets.  The host's CID (address)
is 2 and the guest must be assigned a CID (3 in the example above).

See http://qemu-project.org/Features/VirtioVsock for more info.
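
As a rough illustration of the addressing described above, a guest program
could reach a host service along these lines.  This is a minimal sketch, not
part of the series: vsock_addr() and vsock_connect() are hypothetical helper
names, and any port number passed in is an arbitrary choice by the caller.
Only struct sockaddr_vm, AF_VSOCK and VMADDR_CID_HOST come from the existing
<linux/vm_sockets.h> userspace API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/vm_sockets.h>

/* Build an AF_VSOCK address for the given CID and port. */
struct sockaddr_vm vsock_addr(unsigned int cid, unsigned int port)
{
	struct sockaddr_vm sa;

	memset(&sa, 0, sizeof(sa));
	sa.svm_family = AF_VSOCK;
	sa.svm_cid = cid;
	sa.svm_port = port;
	return sa;
}

/* Connect a stream socket to cid:port.  Returns a connected fd, or -1 on
 * error (for example when no vsock transport is loaded).
 */
int vsock_connect(unsigned int cid, unsigned int port)
{
	struct sockaddr_vm sa = vsock_addr(cid, port);
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Self-check of the address helper only; it does not open a socket. */
void vsock_addr_selfcheck(void)
{
	struct sockaddr_vm sa = vsock_addr(VMADDR_CID_HOST, 1234);

	assert(sa.svm_family == AF_VSOCK);
	assert(sa.svm_cid == 2);	/* the host's CID, as stated above */
	assert(sa.svm_port == 1234);
}
```

From the guest, vsock_connect(VMADDR_CID_HOST, port) would target the host;
from the host, the guest CID given to QEMU (3 in the example above) would be
used instead.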

Asias He (4):
  VSOCK: Introduce virtio_vsock_common.ko
  VSOCK: Introduce virtio_transport.ko
  VSOCK: Introduce vhost_vsock.ko
  VSOCK: Add Makefile and Kconfig

Stefan Hajnoczi (2):
  VSOCK: transport-specific vsock_transport functions
  VSOCK: defer sock removal to transports

 MAINTAINERS|  13 +
 drivers/vhost/Kconfig  |  15 +
 

Re: [PATCH 1/1] ixgbevf: replace integer number with bool value

2016-07-28 Thread Greg
On Wed, 2016-07-27 at 21:28 +0800, Zhu Yanjun wrote:
> The field get_link_status is a bool, so assign it a bool value rather
> than an integer.
> 
> Signed-off-by: Zhu Yanjun 

Looks good to me.

Reviewed-by: Greg Rose 

> ---
>  drivers/net/ethernet/intel/ixgbevf/ethtool.c  | 2 +-
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ixgbevf/ethtool.c 
> b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
> index 508e72c..ce221d1 100644
> --- a/drivers/net/ethernet/intel/ixgbevf/ethtool.c
> +++ b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
> @@ -104,7 +104,7 @@ static int ixgbevf_get_settings(struct net_device *netdev,
>   ecmd->transceiver = XCVR_DUMMY1;
>   ecmd->port = -1;
>  
> - hw->mac.get_link_status = 1;
> + hw->mac.get_link_status = true;
>   hw->mac.ops.check_link(hw, &link_speed, &link_up, false);
>  
>   if (link_up) {
> diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c 
> b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> index acc2401..a98e7c2 100644
> --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> @@ -1285,7 +1285,7 @@ static irqreturn_t ixgbevf_msix_other(int irq, void 
> *data)
>   struct ixgbevf_adapter *adapter = data;
>   struct ixgbe_hw *hw = &adapter->hw;
>  
> - hw->mac.get_link_status = 1;
> + hw->mac.get_link_status = true;
>  
>   ixgbevf_service_event_schedule(adapter);
>  
> @@ -2109,7 +2109,7 @@ static void ixgbevf_up_complete(struct ixgbevf_adapter 
> *adapter)
>   ixgbevf_save_reset_stats(adapter);
>   ixgbevf_init_last_counter_stats(adapter);
>  
> - hw->mac.get_link_status = 1;
> + hw->mac.get_link_status = true;
>   mod_timer(&adapter->service_timer, jiffies);
>  }
>  




[PATCH 1/1] ixgbevf: replace integer number with bool value

2016-07-28 Thread Zhu Yanjun
The field get_link_status is a bool, so assign it a bool value rather
than an integer.

Signed-off-by: Zhu Yanjun 
---
 drivers/net/ethernet/intel/ixgbevf/ethtool.c  | 2 +-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ethtool.c 
b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
index 508e72c..ce221d1 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
@@ -104,7 +104,7 @@ static int ixgbevf_get_settings(struct net_device *netdev,
ecmd->transceiver = XCVR_DUMMY1;
ecmd->port = -1;
 
-   hw->mac.get_link_status = 1;
+   hw->mac.get_link_status = true;
hw->mac.ops.check_link(hw, &link_speed, &link_up, false);
 
if (link_up) {
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c 
b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index acc2401..a98e7c2 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1285,7 +1285,7 @@ static irqreturn_t ixgbevf_msix_other(int irq, void *data)
struct ixgbevf_adapter *adapter = data;
struct ixgbe_hw *hw = &adapter->hw;
 
-   hw->mac.get_link_status = 1;
+   hw->mac.get_link_status = true;
 
ixgbevf_service_event_schedule(adapter);
 
@@ -2109,7 +2109,7 @@ static void ixgbevf_up_complete(struct ixgbevf_adapter 
*adapter)
ixgbevf_save_reset_stats(adapter);
ixgbevf_init_last_counter_stats(adapter);
 
-   hw->mac.get_link_status = 1;
+   hw->mac.get_link_status = true;
mod_timer(&adapter->service_timer, jiffies);
 }
 
-- 
2.7.4



Re: [PATCH 1/2] SUNRPC: accept() may return sockets that are still in SYN_RECV

2016-07-28 Thread Fields Bruce James
On Wed, Jul 27, 2016 at 07:11:23PM +, Trond Myklebust wrote:
> Hi Eric,
> 
> > On Jul 27, 2016, at 14:59, Eric Dumazet  wrote:
> > 
> > On Wed, 2016-07-27 at 14:48 -0400, Fields Bruce James wrote:
> >> On Tue, Jul 26, 2016 at 04:08:29PM +, Trond Myklebust wrote:
> >>> 
>  On Jul 26, 2016, at 11:43, J. Bruce Fields  wrote:
>  
>  On Tue, Jul 26, 2016 at 09:51:19AM -0400, Trond Myklebust wrote:
> > We're seeing traces of the following form:
> > 
> > [10952.396347] svc: transport 88042ba4a000 dequeued, inuse=2
> > [10952.396351] svc: tcp_accept 88042ba4a000 sock 88042a6e4c80
> > [10952.396362] nfsd: connect from 10.2.6.1, port=187
> > [10952.396364] svc: svc_setup_socket 8800b99bcf00
> > [10952.396368] setting up TCP socket for reading
> > [10952.396370] svc: svc_setup_socket created 8803eb10a000 (inet 
> > 88042b75b800)
> > [10952.396373] svc: transport 8803eb10a000 put into queue
> > [10952.396375] svc: transport 88042ba4a000 put into queue
> > [10952.396377] svc: server 8800bb0ec000 waiting for data (to = 
> > 360)
> > [10952.396380] svc: transport 8803eb10a000 dequeued, inuse=2
> > [10952.396381] svc_recv: found XPT_CLOSE
> > [10952.396397] svc: svc_delete_xprt(8803eb10a000)
> > [10952.396398] svc: svc_tcp_sock_detach(8803eb10a000)
> > [10952.396399] svc: svc_sock_detach(8803eb10a000)
> > [10952.396412] svc: svc_sock_free(8803eb10a000)
> > 
> > i.e. an immediate close of the socket after initialisation.
>  
>  Interesting, thanks!
>  
>  So the one thing I don't understand is why this is correct behavior for
>  accept--I thought it wasn't supposed to return a socket until it was
>  fully established.
> >>> 
> >>> inet_accept() appears to allow SYN_RECV:
> >> 
> >> OK.  Cc'ing netdev just to make sure we didn't overlook anything.
> >> 
> > 
> > SYN_RECV after accept() is a TCP Fast Open property I think.
> > 
> > Maybe you are playing with some global TCP Fast Open settings ?
> > 
> 
> The Linux kernel client should not be using TCP fast open, but it is possible 
> that some of the other NFSv3 clients we’re using are.
> Would a standard knfsd listener respond to a TCP fast open request, or would 
> the default behaviour be to ignore it?
> 
> If the default behaviour for the server is to allow fast open, then we do 
> need these patches, IMO.

Even if it's not a default, if there's a configuration that allows
accept to return a socket in SYN_RECV state, then knfsd should handle it
gracefully, especially as long as it's this easy.

It'd still be useful to understand why this is happening, though.

--b.
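
For anyone who wants to observe the state discussed above from userspace,
here is a rough sketch that reads the TCP state of a socket with the
TCP_INFO socket option.  It is an illustration only, not the knfsd fix:
tcp_socket_state() and tcp_state_name() are hypothetical helper names, and
the sketch assumes a Linux system with glibc's <netinet/tcp.h>.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Return the TCP state of fd (TCP_ESTABLISHED, TCP_SYN_RECV, ...),
 * or -1 if the state cannot be read.
 */
int tcp_socket_state(int fd)
{
	struct tcp_info info;
	socklen_t len = sizeof(info);

	if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) < 0)
		return -1;
	return info.tcpi_state;
}

/* Human-readable name for the few states relevant to this thread. */
const char *tcp_state_name(int state)
{
	switch (state) {
	case TCP_ESTABLISHED:
		return "ESTABLISHED";
	case TCP_SYN_RECV:
		return "SYN_RECV";
	case TCP_CLOSE:
		return "CLOSE";
	default:
		return "OTHER";
	}
}

/* Self-check that needs no network: an invalid fd yields -1. */
void tcp_state_selfcheck(void)
{
	assert(tcp_socket_state(-1) == -1);
	assert(tcp_state_name(TCP_SYN_RECV)[0] == 'S');
}
```

A test server could call tcp_socket_state() on the descriptor returned by
accept() and log tcp_state_name() of the result to see whether it ever
reports SYN_RECV.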


Re: [PATCH net-next 1/3] strparser: Stream parser for messages

2016-07-28 Thread David Ahern

On 7/27/16 3:03 PM, Tom Herbert wrote:

diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
new file mode 100644
index 0000000..d7aec13
--- /dev/null
+++ b/net/strparser/strparser.c
@@ -0,0 +1,492 @@


missing copyright header?


+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 


-8<-



+/* Lower sock lock held */
+void strp_tcp_data_ready(struct sock *sk)
+{
+   struct strparser *strp;
+
+   read_lock_bh(&sk->sk_callback_lock);
+
+   strp = (struct strparser *)sk->sk_user_data;
+   if (unlikely(!strp || strp->rx_stopped))
+   goto out;
+
+   if (strp->rx_paused)
+   goto out;
+
+   if (strp->rx_need_bytes) {
+   if (tcp_inq(sk) >= strp->rx_need_bytes)
+   strp->rx_need_bytes = 0;
+   else
+   goto out;
+   }
+
+   if (strp_tcp_read_sock(strp) == -ENOMEM)
+   queue_delayed_work(strp_wq, &strp->rx_delayed_work, 0);
+
+out:
+   read_unlock_bh(&sk->sk_callback_lock);
+}
+EXPORT_SYMBOL(strp_tcp_data_ready);


The module is GPL; did you want the symbol exports to be GPL as well?




Re: [PATCH v2 3/5] ARM: sun8i: dt: Add DT bindings documentation for Allwinner sun8i-emac

2016-07-28 Thread LABBE Corentin
On Thu, Jul 21, 2016 at 09:55:19AM +0200, Maxime Ripard wrote:
> Hi,
> 
> On Wed, Jul 20, 2016 at 10:03:18AM +0200, LABBE Corentin wrote:
> > This patch adds documentation for Device-Tree bindings for the
> > Allwinner sun8i-emac driver.
> > 
> > Signed-off-by: LABBE Corentin 
> > ---
> >  .../bindings/net/allwinner,sun8i-emac.txt  | 65 
> > ++
> >  1 file changed, 65 insertions(+)
> >  create mode 100644 
> > Documentation/devicetree/bindings/net/allwinner,sun8i-emac.txt
> > 
> > diff --git a/Documentation/devicetree/bindings/net/allwinner,sun8i-emac.txt 
> > b/Documentation/devicetree/bindings/net/allwinner,sun8i-emac.txt
> > new file mode 100644
> > index 000..4bf4e53
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/net/allwinner,sun8i-emac.txt
> > @@ -0,0 +1,65 @@
> > +* Allwinner sun8i EMAC ethernet controller
> > +
> > +Required properties:
> > +- compatible: "allwinner,sun8i-a83t-emac", "allwinner,sun8i-h3-emac",
> > +   or "allwinner,sun50i-a64-emac"
> > +- reg: address and length of the register sets for the device.
> > +- reg-names: should be "emac" and "syscon", matching the register sets
> 
> Blindly mapping a register of some other device on the SoC doesn't
> look very reasonable.
> 

As we discussed on IRC after this mail, this register is dedicated to the EMAC.

> > +- interrupts: interrupt for the device
> > +- clocks: A phandle to the reference clock for this device
> > +- clock-names: should be "ahb"
> > +- resets: A phandle to the reset control for this device
> > +- reset-names: should be "ahb"
> > +- phy-mode: See ethernet.txt
> > +- phy or phy-handle: See ethernet.txt
> > +- #address-cells: shall be 1
> > +- #size-cells: shall be 0
> > +
> > +"allwinner,sun8i-h3-emac" also requires:
> > +- clocks: an extra phandle to the reference clock for the EPHY
> > +- clock-names: an extra "ephy" entry matching the clocks property
> > +- resets: an extra phandle to the reset control for the EPHY
> > +- resets-names: an extra "ephy" entry matching the resets property
> 
> Shouldn't that be attached to the phy itself?
> 

Ok I will move them.

> > +See ethernet.txt in the same directory for generic bindings for ethernet
> > +controllers.
> > +
> > +The device node referenced by "phy" or "phy-handle" should be a child node
> > +of this node. See phy.txt for the generic PHY bindings.
> > +
> > +Optional properties:
> > +- phy-supply: phandle to a regulator if the PHY needs one
> > +- phy-io-supply: phandle to a regulator if the PHY needs a another one for 
> > I/O.
> > +This is sometimes found with RGMII PHYs, which use a second
> > +regulator for the lower I/O voltage.
> > +- allwinner,tx-delay: The setting of the TX clock delay chain
> > +- allwinner,rx-delay: The setting of the RX clock delay chain
> 
> In which unit? What is the default value?
> 

I do not know the unit, but I have added a comment documenting the default
value and the acceptable range.

> > +
> > +The TX/RX clock delay chain settings are board specific.
> > +
> > +Optional properties for "allwinner,sun8i-h3-emac":
> > +- allwinner,use-internal-phy: Use the H3 SoC's internal E(thernet) PHY
> 
> Can't that be derived from the presence of the phy property?
> 

Yes, I have reworked the "variant" handling in the driver to make this easy.

> > +- allwinner,leds-active-low: EPHY LEDs are active low
> 
> That also seems PHY related. Overall, I feel like we really need a phy
> node for the internal phy.
> 

Also moved to the PHY node.

> Maxime
> 
> -- 
> Maxime Ripard, Free Electrons
> Embedded Linux and Kernel engineering
> http://free-electrons.com

Best regards

Thanks

LABBE Corentin



Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-28 Thread Fengguang Wu

Hi Sabrina,


The idea when this first came up was to skip the sleeping part of
disable_irq():

http://marc.info/?l=linux-netdev=142314159626052

This fell off my todolist and I didn't send the conversion patches,
which would basically look like this:


Yes, it works on the several machines that had the BUG!

[   23.806847] netpoll: netconsole: local port 6665
[   23.807145] netpoll: netconsole: local IPv4 address 0.0.0.0
[   23.807494] netpoll: netconsole: interface 'eth0'
[   23.807799] netpoll: netconsole: remote port 6646
[   23.808096] netpoll: netconsole: remote IPv4 address 192.168.1.1
[   23.808474] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
[   23.808910] netpoll: netconsole: local IP 192.168.1.161
[   23.811680] 28 Jul 19:42:10 ntpdate[376]: step time server 192.168.1.1 
offset 1696.257557 sec
[   23.811886] console [netcon0] enabled
[   23.812131] netconsole: network logging started

Thanks,
Fengguang



diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 41f32c0b341e..b022691e680b 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6713,20 +6713,20 @@ static irqreturn_t e1000_intr_msix(int __always_unused 
irq, void *data)

vector = 0;
msix_irq = adapter->msix_entries[vector].vector;
-   disable_irq(msix_irq);
-   e1000_intr_msix_rx(msix_irq, netdev);
+   if (disable_hardirq(msix_irq))
+   e1000_intr_msix_rx(msix_irq, netdev);
enable_irq(msix_irq);

vector++;
msix_irq = adapter->msix_entries[vector].vector;
-   disable_irq(msix_irq);
-   e1000_intr_msix_tx(msix_irq, netdev);
+   if (disable_hardirq(msix_irq))
+   e1000_intr_msix_tx(msix_irq, netdev);
enable_irq(msix_irq);

vector++;
msix_irq = adapter->msix_entries[vector].vector;
-   disable_irq(msix_irq);
-   e1000_msix_other(msix_irq, netdev);
+   if (disable_hardirq(msix_irq))
+   e1000_msix_other(msix_irq, netdev);
enable_irq(msix_irq);
}

@@ -6750,13 +6750,13 @@ static void e1000_netpoll(struct net_device *netdev)
e1000_intr_msix(adapter->pdev->irq, netdev);
break;
case E1000E_INT_MODE_MSI:
-   disable_irq(adapter->pdev->irq);
-   e1000_intr_msi(adapter->pdev->irq, netdev);
+   if (disable_hardirq(adapter->pdev->irq))
+   e1000_intr_msi(adapter->pdev->irq, netdev);
enable_irq(adapter->pdev->irq);
break;
default:/* E1000E_INT_MODE_LEGACY */
-   disable_irq(adapter->pdev->irq);
-   e1000_intr(adapter->pdev->irq, netdev);
+   if (disable_hardirq(adapter->pdev->irq))
+   e1000_intr(adapter->pdev->irq, netdev);
enable_irq(adapter->pdev->irq);
break;
}


[PATCH 1/1] ixgbevf: replace integer number with bool value

2016-07-28 Thread zyjzyj2000
From: Zhu Yanjun 

The field get_link_status is a bool, so assign it a bool value rather
than an integer.

Signed-off-by: Zhu Yanjun 
---
 drivers/net/ethernet/intel/ixgbevf/ethtool.c  | 2 +-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ethtool.c 
b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
index 508e72c..ce221d1 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
@@ -104,7 +104,7 @@ static int ixgbevf_get_settings(struct net_device *netdev,
ecmd->transceiver = XCVR_DUMMY1;
ecmd->port = -1;
 
-   hw->mac.get_link_status = 1;
+   hw->mac.get_link_status = true;
hw->mac.ops.check_link(hw, &link_speed, &link_up, false);
 
if (link_up) {
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c 
b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index acc2401..a98e7c2 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1285,7 +1285,7 @@ static irqreturn_t ixgbevf_msix_other(int irq, void *data)
struct ixgbevf_adapter *adapter = data;
struct ixgbe_hw *hw = &adapter->hw;
 
-   hw->mac.get_link_status = 1;
+   hw->mac.get_link_status = true;
 
ixgbevf_service_event_schedule(adapter);
 
@@ -2109,7 +2109,7 @@ static void ixgbevf_up_complete(struct ixgbevf_adapter 
*adapter)
ixgbevf_save_reset_stats(adapter);
ixgbevf_init_last_counter_stats(adapter);
 
-   hw->mac.get_link_status = 1;
+   hw->mac.get_link_status = true;
mod_timer(&adapter->service_timer, jiffies);
 }
 
-- 
2.7.4



Re: [PATCH v2 3/5] ARM: sun8i: dt: Add DT bindings documentation for Allwinner sun8i-emac

2016-07-28 Thread LABBE Corentin
On Wed, Jul 20, 2016 at 02:15:33PM -0500, Rob Herring wrote:
> On Wed, Jul 20, 2016 at 10:03:18AM +0200, LABBE Corentin wrote:
> > This patch adds documentation for Device-Tree bindings for the
> > Allwinner sun8i-emac driver.
> > 
> > Signed-off-by: LABBE Corentin 
> > ---
> >  .../bindings/net/allwinner,sun8i-emac.txt  | 65 
> > ++
> >  1 file changed, 65 insertions(+)
> >  create mode 100644 
> > Documentation/devicetree/bindings/net/allwinner,sun8i-emac.txt
> > 
> > diff --git a/Documentation/devicetree/bindings/net/allwinner,sun8i-emac.txt 
> > b/Documentation/devicetree/bindings/net/allwinner,sun8i-emac.txt
> > new file mode 100644
> > index 000..4bf4e53
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/net/allwinner,sun8i-emac.txt
> > @@ -0,0 +1,65 @@
> > +* Allwinner sun8i EMAC ethernet controller
> > +
> > +Required properties:
> > +- compatible: "allwinner,sun8i-a83t-emac", "allwinner,sun8i-h3-emac",
> > +   or "allwinner,sun50i-a64-emac"
> 
> List one per line.
> 
ok

> > +- reg: address and length of the register sets for the device.
> > +- reg-names: should be "emac" and "syscon", matching the register sets
> > +- interrupts: interrupt for the device
> > +- clocks: A phandle to the reference clock for this device
> > +- clock-names: should be "ahb"
> > +- resets: A phandle to the reset control for this device
> > +- reset-names: should be "ahb"
> > +- phy-mode: See ethernet.txt
> > +- phy or phy-handle: See ethernet.txt
> > +- #address-cells: shall be 1
> > +- #size-cells: shall be 0
> > +
> > +"allwinner,sun8i-h3-emac" also requires:
> > +- clocks: an extra phandle to the reference clock for the EPHY
> > +- clock-names: an extra "ephy" entry matching the clocks property
> > +- resets: an extra phandle to the reset control for the EPHY
> > +- resets-names: an extra "ephy" entry matching the resets property
> > +
> > +See ethernet.txt in the same directory for generic bindings for ethernet
> > +controllers.
> > +
> > +The device node referenced by "phy" or "phy-handle" should be a child node
> > +of this node. See phy.txt for the generic PHY bindings.
> > +
> > +Optional properties:
> > +- phy-supply: phandle to a regulator if the PHY needs one
> > +- phy-io-supply: phandle to a regulator if the PHY needs a another one for 
> > I/O.
> > +This is sometimes found with RGMII PHYs, which use a second
> > +regulator for the lower I/O voltage.
> 
> I previously said these should go in the phy node, and you said you 
> would remove them.
> 
> Rob

Sorry, I forgot to remove them.
I have removed them now.

Regards

Thanks

LABBE Corentin


Re: [PATCH v2 1/5] ethernet: add sun8i-emac driver

2016-07-28 Thread LABBE Corentin
On Wed, Jul 20, 2016 at 11:56:12AM +0200, Arnd Bergmann wrote:
> On Wednesday, July 20, 2016 10:03:16 AM CEST LABBE Corentin wrote:
> > +
> > +   /* Benched on OPIPC with 100M, setting more than 256 does not give 
> > any
> > +* perf boost
> > +*/
> > +   priv->nbdesc_rx = 128;
> > +   priv->nbdesc_tx = 256;
> > +
> > 
> 
> 256 tx descriptors can introduce a significant latency. Can you add
> support for BQL (netdev_sent_queue/netdev_completed_queue) to limit
> the queue size to the minimum?

Done, since setting it below 256 gives lower performance with iperf.

> 
> I also noticed that your tx_lock() prevents you from concurrently
> running sun8i_emac_complete_xmit() and sun8i_emac_xmit(). Is that
> necessary? I'd think that you can find a way to make them work
> concurrently.
> 
>   Arnd

I have reworked the locking, and it seems that no locking is necessary.
I have added the following comment about the locking strategy:

/* Locking strategy:
 * RX queue does not need any lock since only sun8i_emac_poll() accesses it.
 * (All other RX modifiers (ringparam/ndo_stop) disable NAPI and so
 * sun8i_emac_poll())
 * TX queue is handled by sun8i_emac_xmit(), sun8i_emac_complete_xmit() and
 * sun8i_emac_tx_timeout()
 * (All other TX modifiers (ringparam/ndo_stop) disable NAPI and stop the queue)
 *
 * sun8i_emac_xmit() can run only once at a time (netif_tx_lock)
 * sun8i_emac_complete_xmit() can run only once at a time (called from NAPI)
 * sun8i_emac_tx_timeout() can run only once at a time (netif_tx_lock) and
 * cannot race with sun8i_emac_xmit() (due to netif_tx_lock) or with
 * sun8i_emac_complete_xmit() (NAPI is disabled).
 *
 * So only sun8i_emac_xmit() and sun8i_emac_complete_xmit() can run at the
 * same time.
 * But they can never modify the same descriptors:
 * - sun8i_emac_complete_xmit() will modify only descriptors with empty status
 * - sun8i_emac_xmit() will modify only descriptors set to DCLEAN
 * Proper memory barriers ensure that a descriptor set to DCLEAN cannot be
 * modified later by sun8i_emac_complete_xmit().
 */
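
The ownership rule in that comment can be sketched in userspace C roughly as
follows. This is only a model under stated assumptions, not the driver code:
the DQUEUED and DDONE states, the ring layout and all ring_*() helpers are
hypothetical names (only DCLEAN appears in the comment), and C11
acquire/release atomics stand in for the kernel memory barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states: only DCLEAN is named in the comment above. */
enum desc_state { DCLEAN, DQUEUED, DDONE };

#define RING_SIZE 4

struct desc {
	_Atomic int status;	/* who may touch the descriptor right now */
	int len;		/* payload, written only by the current owner */
};

struct ring {
	struct desc d[RING_SIZE];
	size_t head;		/* used only by the xmit side */
	size_t tail;		/* used only by the completion side */
};

static void ring_init(struct ring *r)
{
	for (size_t i = 0; i < RING_SIZE; i++) {
		atomic_init(&r->d[i].status, DCLEAN);
		r->d[i].len = 0;
	}
	r->head = 0;
	r->tail = 0;
}

/* xmit side: takes a descriptor only if it is DCLEAN. */
static bool ring_xmit(struct ring *r, int len)
{
	struct desc *d = &r->d[r->head];

	if (atomic_load_explicit(&d->status, memory_order_acquire) != DCLEAN)
		return false;		/* ring full */
	d->len = len;
	/* release: len is visible before the descriptor is handed over */
	atomic_store_explicit(&d->status, DQUEUED, memory_order_release);
	r->head = (r->head + 1) % RING_SIZE;
	return true;
}

/* Stand-in for the hardware finishing descriptor i. */
static bool ring_hw_done(struct ring *r, size_t i)
{
	int expected = DQUEUED;

	assert(i < RING_SIZE);
	return atomic_compare_exchange_strong(&r->d[i].status, &expected, DDONE);
}

/* completion side: reclaims only descriptors the hardware has finished. */
static int ring_complete(struct ring *r)
{
	int freed = 0;

	for (;;) {
		struct desc *d = &r->d[r->tail];

		if (atomic_load_explicit(&d->status, memory_order_acquire) != DDONE)
			break;
		d->len = 0;
		/* release: cleanup is visible before xmit may reuse the slot */
		atomic_store_explicit(&d->status, DCLEAN, memory_order_release);
		r->tail = (r->tail + 1) % RING_SIZE;
		freed++;
	}
	return freed;
}
```

Each side writes a descriptor only while the status says it owns it, and
head and tail each have a single writer, so in this model the two paths need
no shared lock.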

Am I right?

Thanks for your review.

Regards

LABBE Corentin


RE: [PATCH net 1/3] r8169:fix kernel log spam when set or get hardware wol setting.

2016-07-28 Thread Hau
[...]
> Nit: you may directly use "struct device *d = &tp->pci_dev->dev;"
> 

I will do that in the next version of the patch.

Thanks.
--Please consider the environment before printing this e-mail.


RE: [PATCH net 1/3] r8169:fix kernel log spam when set or get hardware wol setting.

2016-07-28 Thread Hau
[...]
> > @@ -1852,12 +1863,17 @@ static int rtl8169_set_wol(struct net_device
> *dev, struct ethtool_wolinfo *wol)
> > tp->features |= RTL_FEATURE_WOL;
> > else
> > tp->features &= ~RTL_FEATURE_WOL;
> > -   __rtl8169_set_wol(tp, wol->wolopts);
> > +   if (pm_runtime_active(>dev))
> > +   __rtl8169_set_wol(tp, wol->wolopts);
> > +   else
> > +   tp->saved_wolopts = wol->wolopts;
> >
> > rtl_unlock_work(tp);
> >
> > device_set_wakeup_enable(&tp->pci_dev->dev, wol->wolopts);
> >
> > +   pm_runtime_put_noidle(>dev);
> > +
> > return 0;
> 
> Either the driver resumes the device so that it can perform requested
> operation or it signals .set_wol failure when the device is suspended.
> 
> If the driver does something else, "spam removal" translates to "silent
> failure".

Because tp->saved_wolopts is used to set the hardware WoL capability in
rtl8169_runtime_resume(), I prefer to save wol->wolopts to
tp->saved_wolopts while the device is runtime suspended and apply it to
the hardware in rtl8169_runtime_resume().

Thanks.



Re: [PATCH v4] net: sched: convert qdisc linked list to hashtable

2016-07-28 Thread Fengguang Wu

On Thu, Jul 28, 2016 at 01:18:27PM +0200, Jiri Kosina wrote:

On Thu, 28 Jul 2016, kbuild test robot wrote:


[auto build test ERROR on v4.7-rc7]
[also build test ERROR on next-20160728]
[cannot apply to net/master net-next/master ipsec-next/master]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Jiri-Kosina/net-sched-convert-qdisc-linked-list-to-hashtable/20160728-182303
config: i386-randconfig-s0-201630 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

All errors (new ones prefixed by >>):

   net/built-in.o: In function `dev_activate':
>> (.text+0x37ccb): undefined reference to `qdisc_hash_add'


Dear 0-day team,

could you please check my question regarding this very build failure here?

lkml.kernel.org/r/alpine.lnx.2.00.1607141612560.24...@cbobk.fhfr.pm


Sorry I missed that. For your convenience, here is the answer to the
original email:


This issue would be there even without my patch (but with qdisc_list_add
instead), wouldn't it?


Yes, it looks so; this error happens in a number of places:

dns_query.c:(.text+0x39b84): undefined reference to `qdisc_hash_add'
include/linux/netdevice.h:1935: undefined reference to `qdisc_hash_add'
net/core/netevent.c:31: undefined reference to `qdisc_hash_add'
net/sched/sch_generic.c:789: undefined reference to `qdisc_hash_add'
sch_generic.c:(.text+0x33487): undefined reference to `qdisc_hash_add'
switchdev.c:(.text+0x3bf58): undefined reference to `qdisc_hash_add'
sysctl_net.c:(.text+0x31f70): undefined reference to `qdisc_hash_add'
(.text.dev_activate+0x228): undefined reference to `qdisc_hash_add'
(.text+0x37d0b): undefined reference to `qdisc_hash_add'
wext-proc.c:(.text+0x390a8): undefined reference to `qdisc_hash_add'


The problem is that sch_generic.c (where dev_activate() is) is being
compiled every time CONFIG_NET is set, but sch_api.c (where
qdisc_list_add() is defined) only when CONFIG_NET_SCHED is set (and there
is no stub for the !CONFIG_NET_SCHED case).


So it looks like a more general problem than specific to this patch.
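
One common way to handle this kind of link failure is a static inline no-op
stub for the disabled configuration. The sketch below is a userspace model
and not the actual kernel fix: the struct layout and dev_activate_model()
are hypothetical, and CONFIG_NET_SCHED is left undefined here to model the
failing configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct Qdisc. */
struct Qdisc {
	int hashed;
};

/*
 * In the kernel CONFIG_NET_SCHED comes from Kconfig. It is deliberately
 * left undefined here, which is the configuration that failed to link.
 */
#ifdef CONFIG_NET_SCHED
void qdisc_hash_add(struct Qdisc *q);	/* real definition lives elsewhere */
#else
/* Stub: callers still compile and link, the call just does nothing. */
static inline void qdisc_hash_add(struct Qdisc *q)
{
	(void)q;
}
#endif

/* Model of a caller that is always built, like dev_activate(). */
static int dev_activate_model(struct Qdisc *q)
{
	if (q)
		qdisc_hash_add(q);
	return q ? q->hashed : -1;
}
```

With the stub in place the always-built caller has a definition to bind to
in both configurations, so no undefined reference is left for the linker.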

Thanks,
Fengguang


Re: [RFC PATCH v3] net: sched: convert qdisc linked list to hashtable

2016-07-28 Thread Fengguang Wu

Hi Jiri,

On Thu, Jul 14, 2016 at 04:14:58PM +0200, Jiri Kosina wrote:


[ added CCs ]

On Tue, 12 Jul 2016, kbuild test robot wrote:


Hi,

[auto build test ERROR on net/master]
[also build test ERROR on v4.7-rc7 next-20160711]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Jiri-Kosina/net-sched-convert-qdisc-linked-list-to-hashtable/20160711-220527
config: arm-tct_hammer_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm

All errors (new ones prefixed by >>):

   net/built-in.o: In function `dev_activate':
>> wext-proc.c:(.text+0x38544): undefined reference to `qdisc_hash_add'


This issue would be there even without my patch (but with qdisc_list_add
instead), wouldn't it?


Yes, it looks so; this error happens in a number of places:

dns_query.c:(.text+0x39b84): undefined reference to `qdisc_hash_add'
include/linux/netdevice.h:1935: undefined reference to `qdisc_hash_add'
net/core/netevent.c:31: undefined reference to `qdisc_hash_add'
net/sched/sch_generic.c:789: undefined reference to `qdisc_hash_add'
sch_generic.c:(.text+0x33487): undefined reference to `qdisc_hash_add'
switchdev.c:(.text+0x3bf58): undefined reference to `qdisc_hash_add'
sysctl_net.c:(.text+0x31f70): undefined reference to `qdisc_hash_add'
(.text.dev_activate+0x228): undefined reference to `qdisc_hash_add'
(.text+0x37d0b): undefined reference to `qdisc_hash_add'
wext-proc.c:(.text+0x390a8): undefined reference to `qdisc_hash_add'


The problem is that sch_generic.c (where dev_activate() is) is being
compiled every time CONFIG_NET is set, but sch_api.c (where
qdisc_list_add() is defined) only when CONFIG_NET_SCHED is set (and there
is no stub for the !CONFIG_NET_SCHED case).


So it looks like a more general problem than specific to this patch.

Thanks,
Fengguang


Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-28 Thread Thomas Gleixner
On Thu, 28 Jul 2016, Sabrina Dubroca wrote:
> 2016-07-28, 07:43:55 +0200, Eric Dumazet wrote:
> > I would prefer having a definitive advice from Thomas Gleixner and/or
> > others if disable_irq() is forbidden from IRQ path.

Yes it is. Before we added threaded interrupt handlers it was not an issue,
but with (possibly) threaded interrupts it's an absolute no-no.

> > As I said, about all netpoll() methods in net drivers use disable_irq()
> > so a lot of patches would be needed.
> > 
> > disable_irq() should then test this condition earlier, so that we can
> > detect potential bug, even if the IRQ is not (yet) threaded.
> 
> The idea when this first came up was to skip the sleeping part of
> disable_irq():
> 
> http://marc.info/?l=linux-netdev&m=142314159626052
> 
> This fell off my todolist and I didn't send the conversion patches,
> which would basically look like this:
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
> b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 41f32c0b341e..b022691e680b 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -6713,20 +6713,20 @@ static irqreturn_t e1000_intr_msix(int 
> __always_unused irq, void *data)
>  
>   vector = 0;
>   msix_irq = adapter->msix_entries[vector].vector;
> - disable_irq(msix_irq);
> - e1000_intr_msix_rx(msix_irq, netdev);
> + if (disable_hardirq(msix_irq))
> + e1000_intr_msix_rx(msix_irq, netdev);
>   enable_irq(msix_irq);

That'll work nicely even when one of the affected interrupts is threaded.

Thanks,

tglx


Re: [PATCH 0/2] net: davinci_cpdma: reduce latency on -rt

2016-07-28 Thread Uwe Kleine-König
Hello Grygorii,

On Thu, Jul 28, 2016 at 12:34:19PM +0300, Grygorii Strashko wrote:
> Thanks. I just wanted to have a clear understanding of the [possible] issue.
> And I'd appreciate it if you could share any measurement results if you have them.

I didn't measure anything (yet), I just considered these patches
low-hanging fruit. But when I look at this again, I will provide some
numbers.

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


Re: [PATCH v4] net: sched: convert qdisc linked list to hashtable

2016-07-28 Thread kbuild test robot
Hi,

[auto build test ERROR on v4.7-rc7]
[also build test ERROR on next-20160728]
[cannot apply to net/master net-next/master ipsec-next/master]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Jiri-Kosina/net-sched-convert-qdisc-linked-list-to-hashtable/20160728-182303
config: arm-sunxi_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   net/built-in.o: In function `dev_activate':
>> dns_query.c:(.text+0x39adc): undefined reference to `qdisc_hash_add'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: Binary data


Re: [PATCH v5 5/8] thunderbolt: Networking state machine

2016-07-28 Thread Lukas Wunner
On Thu, Jul 28, 2016 at 11:15:18AM +0300, Amir Levy wrote:
> +static void nhi_handle_notification_msg(struct tbt_nhi_ctxt *nhi_ctxt,
> + const u8 *msg)
> +{
> + struct port_net_dev *port;
> + u8 port_num;
> +
> +#define INTER_DOMAIN_LINK_SHIFT 0
> +#define INTER_DOMAIN_LINK_MASK   GENMASK(2, INTER_DOMAIN_LINK_SHIFT)
> + switch (msg[3]) {
> +
> + case NC_INTER_DOMAIN_CONNECTED:
> + port_num = PORT_NUM_FROM_MSG(msg[5]);
> +#define INTER_DOMAIN_APPROVED BIT(3)
> + if (likely(port_num < nhi_ctxt->num_ports)) {
> + if (!(msg[5] & INTER_DOMAIN_APPROVED))

I find these interspersed #defines make the code hard to read,
but maybe that's just me.
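
A sketch of the alternative layout, with the field definitions grouped ahead
of the code that uses them. BIT() and GENMASK() are redefined locally so the
fragment builds in userspace, and the two helper functions are hypothetical
names, not part of the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the kernel's BIT() and GENMASK() helpers. */
#define BIT(n)		(1u << (n))
#define GENMASK(h, l)	((~0u << (l)) & (~0u >> (31 - (h))))

/* All field definitions for the message byte live in one place. */
#define INTER_DOMAIN_LINK_SHIFT	0
#define INTER_DOMAIN_LINK_MASK	GENMASK(2, INTER_DOMAIN_LINK_SHIFT)
#define INTER_DOMAIN_APPROVED	BIT(3)

/* Hypothetical helper: extract the link field from a message byte. */
static unsigned int inter_domain_link(uint8_t byte)
{
	return (byte & INTER_DOMAIN_LINK_MASK) >> INTER_DOMAIN_LINK_SHIFT;
}

/* Hypothetical helper: test the approved flag of a message byte. */
static bool inter_domain_approved(uint8_t byte)
{
	return (byte & INTER_DOMAIN_APPROVED) != 0;
}
```

The switch body then contains only control flow, and the bit layout can be
read in one block.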


> + nhi_ctxt->net_devices[
> + port_num].medium_sts =

Looks like a carriage return slipped in here.

In patch [4/8], I've found it a bit puzzling that FW->SW responses and
FW->SW notifications are defined in icm_nhi.c, whereas SW->FW commands
are defined in net.h. It would perhaps be more logical to have them
all in the header file. The FW->SW responses and SW->FW commands are
almost identical, but there are odd spelling differences (CONNEXION vs.
CONNECTION).

It would probably be good to explain the PDF acronym somewhere.

I've skimmed over all patches in the series, but too superficially to
provide a Reviewed-by: it's just too much code to review thoroughly and I
also lack the hardware to test it. Broadly, though, this LGTM.

Thanks,

Lukas


Re: 4.6.3, pppoe + shaper workload, skb_panic / skb_push / ppp_start_xmit

2016-07-28 Thread Denys Fedoryshchenko

On 2016-07-28 14:09, Guillaume Nault wrote:

On Tue, Jul 12, 2016 at 10:31:18AM -0700, Cong Wang wrote:

On Mon, Jul 11, 2016 at 12:45 PM,   wrote:
> Hi
>
> On latest kernel i noticed kernel panic happening 1-2 times per day. It is
> also happening on older kernel (at least 4.5.3).
>
...
>  [42916.426463] Call Trace:
>  [42916.426658]  
>
>  [42916.426719]  [] skb_push+0x36/0x37
>  [42916.427111]  [] ppp_start_xmit+0x10f/0x150
> [ppp_generic]
>  [42916.427314]  [] dev_hard_start_xmit+0x25a/0x2d3
>  [42916.427516]  [] ?
> validate_xmit_skb.isra.107.part.108+0x11d/0x238
>  [42916.427858]  [] sch_direct_xmit+0x89/0x1b5
>  [42916.428060]  [] __qdisc_run+0x133/0x170
>  [42916.428261]  [] net_tx_action+0xe3/0x148
>  [42916.428462]  [] __do_softirq+0xb9/0x1a9
>  [42916.428663]  [] irq_exit+0x37/0x7c
>  [42916.428862]  [] smp_apic_timer_interrupt+0x3d/0x48
>  [42916.429063]  [] apic_timer_interrupt+0x7c/0x90

Interesting, we call a skb_cow_head() before skb_push() in 
ppp_start_xmit(),

I have no idea why this could happen.


The skb is corrupted: head is at 8800b0bf2800 while data is at
ffa00500b0bf284c.

Figuring out how this corruption happened is going to be hard without a
way to reproduce the problem.

Denys, can you confirm you're using a vanilla kernel?
Also I guess the ppp devices and tc settings are handled by accel-ppp.
If so, can you share more info about your setup (accel-ppp.conf, radius
attributes, iptables...) so that I can try to reproduce it on my
machines?


I have a slight modification from vanilla:

--- linux/net/sched/sch_htb.c   2016-06-08 01:23:53.0 +
+++ linux-new/net/sched/sch_htb.c   2016-06-21 14:03:08.398486593 +
@@ -1495,10 +1495,10 @@
cl->common.classid);
cl->quantum = 1000;
}
-   if (!hopt->quantum && cl->quantum > 20) {
+   if (!hopt->quantum && cl->quantum > 200) {
pr_warn("HTB: quantum of class %X is big. Consider r2q 
change.\n",
cl->common.classid);
-   cl->quantum = 20;
+   cl->quantum = 200;
}
if (hopt->quantum)
cl->quantum = hopt->quantum;

But I guess it should not be the reason for the crash (it is related to
another system; without it I was unable to shape over 7Gbps, and maybe with
the latest kernel I will not need this patch).


I'm trying to find reproducible conditions for the crash, because right now
it happens only on some servers in large networks (completely different
ISPs, so I have excluded a hardware fault on a specific server). It is a
complex config: I have accel-ppp, plus my own "shaping daemon" that
applies several shapers on ppp interfaces. The worst thing is that it
happens only with live customers; I am unable to reproduce it in stress
tests. Also, until the recent kernel I was getting different panic messages
(but all related to ppp).


I also think at least one cause of the crashes was fixed by "ppp: defer
netns reference release for ppp channel" in 4.7.0 (maybe that's why I am
getting fewer crashes recently).
I also tried various kernel debug options that don't cause major
performance degradation (lock checking, freed memory poisoning, etc.),
without any luck yet. Would it be useful if I post panics that occur at
least twice? (I will post an example below, which I got recently.)
Of course, if I manage to find reproducible conditions I will send them
immediately.



 [ 5449.900988] general protection fault:  [#1] SMP
 [ 5449.901263] Modules linked in: cls_fw act_police cls_u32 sch_ingress sch_sfq sch_htb pppoe pppox ppp_generic slhc netconsole configfs xt_nat ts_bm xt_string xt_connmark xt_TCPMSS xt_tcpudp xt_mark iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle ip_tables x_tables 8021q garp mrp stp llc ixgbe dca

 [ 5449.904989] CPU: 1 PID: 6359 Comm: ip Not tainted 
4.7.0-build-0109 #2
 [ 5449.905255] Hardware name: Supermicro 
X10SLM+-LN4F/X10SLM+-LN4F, BIOS 3.0 04/24/2015
 [ 5449.905712] task: 8803eef4 ti: 8803fd754000 
task.ti: 8803fd754000

 [ 5449.906168] RIP: 0010:[]
 [] inet_fill_ifaddr+0x5a/0x264
 [ 5449.906710] RSP: 0018:8803fd757b98  EFLAGS: 00010286
 [ 5449.906976] RAX: 8803ef65cb90 RBX: 8803f7d2cd00 
RCX: 
 [ 5449.907248] RDX: 00080002 RSI: 8803ef65cb90 
RDI: 8803ef65cba8
 [ 5449.907519] RBP: 8803fd757be0 R08: 0008 
R09: 0002
 [ 5449.907792] R10: ffa005040269f480 R11: 820a1c00 
R12: ffa005040269f480
 [ 5449.908067] R13: 8803ef65cb90 R14:  
R15: 8803f7d2cd00
 [ 5449.908339] FS:  7f660674d700() 
GS:88041fc4() knlGS:
 [ 5449.908796] CS:  0010 DS:  ES:  CR0: 
80050033
 [ 5449.909067] CR2: 008b9018 CR3: 


ip=dhcp woes

2016-07-28 Thread Uwe Kleine-König
Hello,

I have a machine with four network interfaces and I'm using ip=dhcp
during development on it.

In ic_dynamic() the procedure is the following (assuming no successful
reply is received in time):

  timeout = 2s + random([0, 1]) s
loop:
  send bootp on 1st dev
  wait 1s
  send bootp on 2nd dev
  wait 1s
  send bootp on 3rd dev
  wait 1s
  send bootp on 4th dev
  wait timeout
  timeout = timeout * 7 / 4
  goto loop;

My problem now is: the DHCP server is reachable via the first device and
takes a little more than 1s to respond. A reply must match the last sent
request to be accepted.

So the obvious questions are:

Why is only the last timeout increased for each loop? Why is there a
difference at all between the waits which results in a special casing of
the last device?

Alternatively, why not accept a reply on eth0 when eth1 has already sent
a request? Then the procedure could be:

  timeout = 2s + random([0, 1]) s
loop:
  send bootp on 1st dev
  send bootp on 2nd dev
  send bootp on 3rd dev
  send bootp on 4th dev
  wait timeout
  timeout = timeout * 7 / 4
  goto loop;

which looks more effective.
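
To make the timing argument concrete, here is a rough model of both
procedures in C. It is a sketch under assumptions, not the ipconfig code:
the random part of the timeout is dropped, the timeout is assumed to grow by
7/4 per round, "must match the last sent request" is modelled as "the reply
arrives before the next request goes out", and both function names are
made up.

```c
#include <assert.h>

#define BASE_TIMEOUT_MS 2000	/* 2s; the random extra part is left out */
#define PER_DEV_WAIT_MS 1000	/* wait between devices */

/*
 * Serial procedure: a reply counts only while its request is still the
 * most recently sent one. Returns the time of acceptance in ms, or -1.
 */
static long first_accept_serial_ms(int ndev, int server_dev,
				   long latency_ms, int max_rounds)
{
	long t = 0;
	long timeout = BASE_TIMEOUT_MS;

	for (int round = 0; round < max_rounds; round++) {
		for (int dev = 0; dev < ndev; dev++) {
			/* only the last device gets the long, growing wait */
			long window = (dev == ndev - 1) ? timeout
							: PER_DEV_WAIT_MS;

			if (dev == server_dev && latency_ms < window)
				return t + latency_ms;
			t += window;
		}
		timeout = timeout * 7 / 4;	/* assumed growth factor */
	}
	return -1;
}

/* Proposed procedure: send on all devices, then accept a reply from any. */
static long first_accept_parallel_ms(long latency_ms, int max_rounds)
{
	long t = 0;
	long timeout = BASE_TIMEOUT_MS;

	for (int round = 0; round < max_rounds; round++) {
		if (latency_ms < timeout)
			return t + latency_ms;
		t += timeout;
		timeout = timeout * 7 / 4;	/* assumed growth factor */
	}
	return -1;
}
```

In this model a server behind the first of four devices with 1100 ms latency
is never accepted by the serial procedure, however many rounds are run, while
the same server behind the last device is accepted in the first round.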

Is there anything I missed?

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


Re: [PATCH v4] net: sched: convert qdisc linked list to hashtable

2016-07-28 Thread kbuild test robot
Hi,

[auto build test ERROR on v4.7-rc7]
[also build test ERROR on next-20160728]
[cannot apply to net/master net-next/master ipsec-next/master]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Jiri-Kosina/net-sched-convert-qdisc-linked-list-to-hashtable/20160728-182303
config: sh-edosk7760_defconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sh 

All errors (new ones prefixed by >>):

   net/built-in.o: In function `dev_activate':
>> net/sched/sch_generic.c:733: undefined reference to `qdisc_hash_add'

vim +733 net/sched/sch_generic.c

   727  qdisc = qdisc_create_dflt(txq, _qdisc_ops, 
TC_H_ROOT);
   728  if (qdisc) {
   729  dev->qdisc = qdisc;
   730  qdisc->ops->attach(qdisc);
   731  }
   732  }
 > 733  if (dev->qdisc)
   734  qdisc_hash_add(dev->qdisc);
   735  }
   736  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation




Re: [PATCH v4] net: sched: convert qdisc linked list to hashtable

2016-07-28 Thread Jiri Kosina
On Thu, 28 Jul 2016, kbuild test robot wrote:

> [auto build test ERROR on v4.7-rc7]
> [also build test ERROR on next-20160728]
> [cannot apply to net/master net-next/master ipsec-next/master]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Jiri-Kosina/net-sched-convert-qdisc-linked-list-to-hashtable/20160728-182303
> config: i386-randconfig-s0-201630 (attached as .config)
> compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=i386 
> 
> All errors (new ones prefixed by >>):
> 
>net/built-in.o: In function `dev_activate':
> >> (.text+0x37ccb): undefined reference to `qdisc_hash_add'

Dear 0-day team,

could you please check my question regarding this very build failure here?

lkml.kernel.org/r/alpine.lnx.2.00.1607141612560.24...@cbobk.fhfr.pm

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH v4] net: sched: convert qdisc linked list to hashtable

2016-07-28 Thread kbuild test robot
Hi,

[auto build test ERROR on v4.7-rc7]
[also build test ERROR on next-20160728]
[cannot apply to net/master net-next/master ipsec-next/master]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Jiri-Kosina/net-sched-convert-qdisc-linked-list-to-hashtable/20160728-182303
config: i386-randconfig-s0-201630 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   net/built-in.o: In function `dev_activate':
>> (.text+0x37ccb): undefined reference to `qdisc_hash_add'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation




Re: 4.6.3, pppoe + shaper workload, skb_panic / skb_push / ppp_start_xmit

2016-07-28 Thread Guillaume Nault
On Tue, Jul 12, 2016 at 10:31:18AM -0700, Cong Wang wrote:
> On Mon, Jul 11, 2016 at 12:45 PM,   wrote:
> > Hi
> >
> > On latest kernel i noticed kernel panic happening 1-2 times per day. It is
> > also happening on older kernel (at least 4.5.3).
> >
> ...
> >  [42916.426463] Call Trace:
> >  [42916.426658]  
> >
> >  [42916.426719]  [] skb_push+0x36/0x37
> >  [42916.427111]  [] ppp_start_xmit+0x10f/0x150
> > [ppp_generic]
> >  [42916.427314]  [] dev_hard_start_xmit+0x25a/0x2d3
> >  [42916.427516]  [] ?
> > validate_xmit_skb.isra.107.part.108+0x11d/0x238
> >  [42916.427858]  [] sch_direct_xmit+0x89/0x1b5
> >  [42916.428060]  [] __qdisc_run+0x133/0x170
> >  [42916.428261]  [] net_tx_action+0xe3/0x148
> >  [42916.428462]  [] __do_softirq+0xb9/0x1a9
> >  [42916.428663]  [] irq_exit+0x37/0x7c
> >  [42916.428862]  [] smp_apic_timer_interrupt+0x3d/0x48
> >  [42916.429063]  [] apic_timer_interrupt+0x7c/0x90
> 
> Interesting, we call a skb_cow_head() before skb_push() in ppp_start_xmit(),
> I have no idea why this could happen.
>
The skb is corrupted: head is at 8800b0bf2800 while data is at
ffa00500b0bf284c.

Figuring out how this corruption happened is going to be hard without a
way to reproduce the problem.

Denys, can you confirm you're using a vanilla kernel?
Also I guess the ppp devices and tc settings are handled by accel-ppp.
If so, can you share more info about your setup (accel-ppp.conf, radius
attributes, iptables...) so that I can try to reproduce it on my
machines?

Regards

Guillaume


skb_release_data causes "BUG: Bad page state"

2016-07-28 Thread Alex Lyakas

Greetings,

We had this warning[1] on long-term mainline kernel 3.18.19. Can anybody
please advise on what might be causing it?


Thanks,
Alex.

[1]
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.538709] BUG: Bad page state 
in process kworker/0:1H  pfn:4b317
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.539555] 
page:ea00012cc5c0 count:0 mapcount:-1 mapping:  (null) index:0x0
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.540541] flags: 
0x1008000(tail)
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541087] page dumped 
because: nonzero mapcount
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541664] Modules linked in: 
dm_crypt(OE) xfrm_user(E) xfrm4_tunnel(E) tunnel4(E) ipcomp(E) 
xfrm_ipcomp(E) esp4(E) ah4(E) xt_multiport(E) dm_queue_length(E) sd_mod(E) 
bonding(E) ib_iser(OE) rdma_cm(OE) iw_cm(OE) ib_cm(OE) ib_sa(OE) ib_mad(OE) 
ib_core(OE) ib_addr(OE) compat(OE) iscsi_tcp(OE) libiscsi_tcp(OE) 
libiscsi(OE) scsi_transport_iscsi(OE) ipt_MASQUERADE(E) 
nf_nat_masquerade_ipv4(E) iptable_nat(E) nf_nat_ipv4(E) nf_nat(E) 
nf_conntrack_ipv4(E) nf_defrag_ipv4(E) xt_conntrack(E) nf_conntrack(E) 
ipt_REJECT(E) nf_reject_ipv4(E) xt_CHECKSUM(E) iptable_mangle(E) 
xt_tcpudp(E) dm_zcache(OE) bridge(E) stp(E) llc(E) xfs(OE) btrfs(OE) 
ip6table_filter(E) ip6_tables(E) raid456(OE) iptable_filter(E) 
async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) ip_tables(E) 
x_tables(E) xor(E) async_tx(E) raid6_pq(E) raid1(OE) md_mod(OE) deflate(E) 
ctr(E) twofish_generic(E) twofish_x86_64_3way(E) twofish_x86_64(E) 
twofish_common(E) camellia_generic(E) camellia_x86_64(E) 
serpent_sse2_x86_64(E) xts(E) serpent_generic(E) lrw(E) gf128mul(E) 
glue_helper(E) blowfish_generic(E) blowfish_x86_64(E) blowfish_common(E) 
cast5_generic(E) cast_common(E) ablk_helper(E) cryptd(E) des3_ede_x86_64(E) 
des_generic(E) cmac(E) xcbc(E) rmd160(E) crypto_null(E) af_key(E) 
xfrm_algo(E) iscsi_scst(OE) scst_utgt(OE) scst_vdisk(OE) libcrc32c(E) 
scst(OE) ppdev(E) nls_iso8859_1(E) dm_multipath(OE) kvm(E) scsi_dh(E) 
serio_raw(E) ttm(E) drm_kms_helper(E) drm(E) nfsd(OE) syscopyarea(E) 
auth_rpcgss(E) nfs_acl(E) sysfillrect(E) 8250_fintek(E) sysimgblt(E) nfs(E) 
parport_pc(E) lockd(E) grace(E) i2c_piix4(E) sunrpc(E) fscache(E) 
i6300esb(E) mac_hid(E) lp(E) parport(E) ata_generic(E) pata_acpi(E) 
ata_piix(E) psmouse(E) floppy(E) libata(E) scsi_mod(OE) ixgbevf(OE)
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541736] CPU: 0 PID: 3885 
Comm: kworker/0:1H Tainted: G   OE  3.18.19-zadara05 #1
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541738] Hardware name: 
Bochs Bochs, BIOS Bochs 01/01/2007
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541743] Workqueue: kblockd 
blk_delay_work
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541744]  81a79568 
88008fa03c88 81710c85 1fcc
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541747]  ea00012cc5c0 
88008fa03cb8 8170d7d7 ffff
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541749]  ea00012cc5c0 
 0000 88008fa03d08

Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541751] Call Trace:
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541753]   
[] dump_stack+0x4e/0x71
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541759] 
[] bad_page.part.50+0xe0/0xfe
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541764] 
[] free_pages_prepare+0x199/0x1b0
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541766] 
[] __free_pages_ok+0x1b/0xd0
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541768] 
[] free_compound_page+0x1b/0x20
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541770] 
[] __put_compound_page+0x19/0x1d
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541773] 
[] put_compound_page+0x1bf/0x1e0
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541776] 
[] put_page+0x4b/0x50
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541779] 
[] ? reschedule_interrupt+0x6d/0x80
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541782] 
[] skb_release_data+0x87/0xd0
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541784] 
[] skb_release_all+0x28/0x30
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541786] 
[] consume_skb+0x2c/0xa0
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541789] 
[] __dev_kfree_skb_any+0x35/0x40
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541797] 
[] ixgbevf_poll+0xd1/0x550 [ixgbevf]
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541799] 
[] net_rx_action+0x152/0x280
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541803] 
[] __do_softirq+0xf5/0x320
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541805] 
[] irq_exit+0x115/0x120
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541809] 
[] smp_call_function_single_interrupt+0x35/0x40
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541811] 
[] call_function_single_interrupt+0x6d/0x80
Jul 21 22:57:27 vsa-01cc-vc-1 kernel: [96804.541812]   
[] ? scsi_host_free_command.isra.10+0x47/0x50 

Re: [Patch net] ppp: defer netns reference release for ppp channel

2016-07-28 Thread Guillaume Nault
On Wed, Jul 06, 2016 at 03:25:15PM +0300, Cyrill Gorcunov wrote:
> On Wed, Jul 06, 2016 at 11:26:02AM +0300, Cyrill Gorcunov wrote:
> > On Tue, Jul 05, 2016 at 10:12:36PM -0700, Cong Wang wrote:
> > > Matt reported that we have a NULL pointer dereference
> > > in ppp_pernet() from ppp_connect_channel(),
> > > i.e. pch->chan_net is NULL.
> > > 
> > > This is due to that a parallel ppp_unregister_channel()
> > > could happen while we are in ppp_connect_channel(), during
> > > which pch->chan_net set to NULL. Since we need a reference
> > > to net per channel, it makes sense to sync the refcnt
> > > with the life time of the channel, therefore we should
> > > release this reference when we destroy it.
> > > 
> > > Fixes: 1f461dcdd296 ("ppp: take reference on channels netns")
> > > Reported-by: Matt Bennett 
> > > Cc: Paul Mackerras 
> > > Cc: linux-...@vger.kernel.org
> > > Cc: Guillaume Nault 
> > > Cc: Cyrill Gorcunov 
> > > Signed-off-by: Cong Wang 
> > > ---
> > 
> > Hi Cong! I may be wrong, but this doesn't look right in general.
> > We take the net in ppp_register_channel->ppp_register_net_channel
> > and (name) context implies that ppp_unregister_channel does
> > the reverse. Maybe there some sync point missed? I'll review
> > in detail a bit later.
> 
> After staring more I think the patch should be fine as a fix
> since implementing sync with ppp_[re|un]register_channel and
> ppp_ioctl might need a way more work.
> 

[Sorry for arriving so late in the game, I was offline the last 3 weeks]

I agree having some symmetry between the creation and deletion
processes would be nice and would make the code easier to reason about.
Actually, I released the channel netns in ppp_unregister_channel() for
exactly this reason (and failed to spot this race).

But the code is already quite asymmetric and it's certainly too late to
move away from this scheme now. So releasing the channel netns in
ppp_destroy_channel() is in line with ppp_generic's architecture. Other
data are handled this way: e.g. channel_count is incremented in
ppp_register_net_channel() and decremented in ppp_destroy_channel().

Thank you all for testing and fixing this issue!

Guillaume


RE: [PATCH 05/15] ethernet: cavium: octeon: add missing of_node_put after calling of_parse_phandle

2016-07-28 Thread Peter Chen
 
>
>> ---
>>   drivers/net/ethernet/cavium/octeon/octeon_mgmt.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
>> b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
>> index e8bc15b..5eb9d8c 100644
>> --- a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
>> +++ b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
>> @@ -960,6 +960,7 @@ static int octeon_mgmt_init_phy(struct net_device 
>> *netdev)
>>  phydev = of_phy_connect(netdev, p->phy_np,
>>  octeon_mgmt_adjust_link, 0,
>>  PHY_INTERFACE_MODE_MII);
>> +of_node_put(p->phy_np);
>
>I don't think you can do this here.  octeon_mgmt_init_phy() may be called 
>multiple
>times in the life of the driver, so p->phy_np must remain valid.
>
>It may be appropriate to do the  of_node_put() in the
>octeon_mgmt_remove() function.
 
Thanks, I will change it.
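
The lifetime rule suggested in the review can be modelled like this. It is a
plain userspace sketch, not the OF API: struct node, node_get(), node_put()
and the model_*() functions are hypothetical stand-ins for the device node,
its reference helpers and the driver's probe, init_phy and remove paths.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted device node. */
struct node {
	int refcount;
};

static struct node *node_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void node_put(struct node *n)
{
	if (n) {
		assert(n->refcount > 0);
		n->refcount--;
	}
}

struct priv {
	struct node *phy_np;	/* held for the whole life of the device */
};

/* probe: take the reference once. */
static void model_probe(struct priv *p, struct node *n)
{
	p->phy_np = node_get(n);
}

/* init_phy: may run many times, so it must not drop the reference. */
static int model_init_phy(struct priv *p)
{
	return (p->phy_np && p->phy_np->refcount > 0) ? 0 : -1;
}

/* remove: the single matching put. */
static void model_remove(struct priv *p)
{
	node_put(p->phy_np);
	p->phy_np = NULL;
}
```

Pairing the get with probe and the put with remove keeps the pointer valid
for every call in between, whatever the number of init_phy runs.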

Peter



Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-28 Thread Sabrina Dubroca
2016-07-28, 07:43:55 +0200, Eric Dumazet wrote:
> On Wed, 2016-07-27 at 14:38 -0700, Jeff Kirsher wrote:
> > On Tue, 2016-07-26 at 11:14 +0200, Eric Dumazet wrote:
> > > Could you try this ?
> > > 
> > > diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c
> > > b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > > index
> > > f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a
> > > 938b3820b 100644
> > > --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> > > +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > > @@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device
> > > *netdev)
> > >  {
> > > struct e1000_adapter *adapter = netdev_priv(netdev);
> > >  
> > > -   disable_irq(adapter->pdev->irq);
> > > -   e1000_intr(adapter->pdev->irq, netdev);
> > > -   enable_irq(adapter->pdev->irq);
> > > > +   if (napi_schedule_prep(&adapter->napi)) {
> > > > +   adapter->total_tx_bytes = 0;
> > > > +   adapter->total_tx_packets = 0;
> > > > +   adapter->total_rx_bytes = 0;
> > > > +   adapter->total_rx_packets = 0;
> > > > +   __napi_schedule(&adapter->napi);
> > > +   }
> > >  }
> > >  #endif
> > >  
> > 
> > Since this fixes the issue Fengguang saw, will you be submitting a formal
> > patch Eric? (please) I can get this queued up for Dave's net tree as soon
> > as I receive the formal patch.
> 
> I would prefer having definitive advice from Thomas Gleixner and/or
> others on whether disable_irq() is forbidden from the IRQ path.
> 
> As I said, just about all netpoll() methods in net drivers use disable_irq()
> so a lot of patches would be needed.
> 
> disable_irq() should then test this condition earlier, so that we can
> detect potential bug, even if the IRQ is not (yet) threaded.

The idea when this first came up was to skip the sleeping part of
disable_irq():

http://marc.info/?l=linux-netdev&m=142314159626052

This fell off my todolist and I didn't send the conversion patches,
which would basically look like this:


diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 41f32c0b341e..b022691e680b 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6713,20 +6713,20 @@ static irqreturn_t e1000_intr_msix(int __always_unused 
irq, void *data)
 
vector = 0;
msix_irq = adapter->msix_entries[vector].vector;
-   disable_irq(msix_irq);
-   e1000_intr_msix_rx(msix_irq, netdev);
+   if (disable_hardirq(msix_irq))
+   e1000_intr_msix_rx(msix_irq, netdev);
enable_irq(msix_irq);
 
vector++;
msix_irq = adapter->msix_entries[vector].vector;
-   disable_irq(msix_irq);
-   e1000_intr_msix_tx(msix_irq, netdev);
+   if (disable_hardirq(msix_irq))
+   e1000_intr_msix_tx(msix_irq, netdev);
enable_irq(msix_irq);
 
vector++;
msix_irq = adapter->msix_entries[vector].vector;
-   disable_irq(msix_irq);
-   e1000_msix_other(msix_irq, netdev);
+   if (disable_hardirq(msix_irq))
+   e1000_msix_other(msix_irq, netdev);
enable_irq(msix_irq);
}
 
@@ -6750,13 +6750,13 @@ static void e1000_netpoll(struct net_device *netdev)
e1000_intr_msix(adapter->pdev->irq, netdev);
break;
case E1000E_INT_MODE_MSI:
-   disable_irq(adapter->pdev->irq);
-   e1000_intr_msi(adapter->pdev->irq, netdev);
+   if (disable_hardirq(adapter->pdev->irq))
+   e1000_intr_msi(adapter->pdev->irq, netdev);
enable_irq(adapter->pdev->irq);
break;
default:/* E1000E_INT_MODE_LEGACY */
-   disable_irq(adapter->pdev->irq);
-   e1000_intr(adapter->pdev->irq, netdev);
+   if (disable_hardirq(adapter->pdev->irq))
+   e1000_intr(adapter->pdev->irq, netdev);
enable_irq(adapter->pdev->irq);
break;
}


-- 
Sabrina
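
The control flow of the conversion above can be modelled in user space.
This is only a sketch: struct irq_line and the model_*() helpers are
hypothetical, and they assume the semantics described in this thread,
namely that disable_hardirq() masks the line without sleeping and returns
false when a threaded handler is still running.

```c
#include <stdbool.h>

/* Toy model of an interrupt line; not the kernel's struct irq_desc. */
struct irq_line {
    int  disable_depth;   /* > 0 means the line is masked */
    bool thread_running;  /* a threaded handler is currently active */
    int  polls;           /* times the poll path ran the handler itself */
};

/*
 * Models disable_hardirq(): mask the line without sleeping and report
 * whether it is safe to call the handler directly.
 */
bool model_disable_hardirq(struct irq_line *l)
{
    l->disable_depth++;
    return !l->thread_running;
}

/* Models enable_irq(): undo one level of masking. */
void model_enable_irq(struct irq_line *l)
{
    l->disable_depth--;
}

/* Netpoll-style path following the pattern in the diff above. */
void model_netpoll(struct irq_line *l)
{
    if (model_disable_hardirq(l))
        l->polls++;           /* stands in for calling the handler */
    model_enable_irq(l);      /* always balances the disable */
}
```

The enable call is unconditional because the line is masked in both
outcomes; only the direct handler call is skipped when a threaded handler
is active.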


[PATCH v4] net: sched: convert qdisc linked list to hashtable

2016-07-28 Thread Jiri Kosina
From: Jiri Kosina 

Convert the per-device linked list into a hashtable. The primary 
motivation for this change is that currently, we're not tracking all the 
qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup 
performed over the linked list by qdisc_match_from_root() is rather 
expensive.

The ultimate goal is to get rid of hidden qdiscs completely, which will 
bring much more determinism in user experience.

As we're adding hashtable.h include into generic netdevice.h, we have to 
make sure HASH_SIZE macro is now non-conflicting with local definitions.

Signed-off-by: Jiri Kosina 
---
v1 -> v2: fix up RCU hashtable usage wrt. rtnl
  fix compilation of .c files which define their own
  HASH_SIZE that now conflicts with the one from
  hashtable.h (newly included via netdevice.h)

v2 -> v3: resolve HASH_SIZE identifier conflicts in a cleaner way
  fix up the number of hash bucket bits (4 bits for 16 buckets)

v3 -> v4: put the hashtable into struct netdevice only if 
  CONFIG_NET_SCHED has been enabled
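
For readers unfamiliar with the data structure, here is a minimal
user-space sketch of a 16-bucket table keyed by qdisc handle. It is not
the kernel implementation: struct toy_qdisc, struct toy_dev and the
multiplicative hash are assumptions for illustration, whereas the patch
relies on the helpers from hashtable.h.

```c
#include <stddef.h>
#include <stdint.h>

#define QDISC_HASH_BITS 4                       /* 4 bits -> 16 buckets */
#define QDISC_HASH_SIZE (1u << QDISC_HASH_BITS)

/* Simplified stand-in for struct Qdisc: handle plus chain link only. */
struct toy_qdisc {
    uint32_t handle;
    struct toy_qdisc *next;
};

/* Simplified stand-in for the per-device table. */
struct toy_dev {
    struct toy_qdisc *buckets[QDISC_HASH_SIZE];
};

/* Illustrative multiplicative hash; keeps the top QDISC_HASH_BITS bits. */
unsigned int toy_hash(uint32_t handle)
{
    uint32_t h = (uint32_t)(handle * UINT32_C(2654435761));

    return h >> (32 - QDISC_HASH_BITS);
}

/* Insert at the head of the bucket chain. */
void toy_hash_add(struct toy_dev *dev, struct toy_qdisc *q)
{
    unsigned int b = toy_hash(q->handle);

    q->next = dev->buckets[b];
    dev->buckets[b] = q;
}

/* Walk a single bucket instead of every qdisc on the device. */
struct toy_qdisc *toy_lookup(struct toy_dev *dev, uint32_t handle)
{
    struct toy_qdisc *q;

    for (q = dev->buckets[toy_hash(handle)]; q; q = q->next)
        if (q->handle == handle)
            return q;
    return NULL;
}
```

A lookup touches only the entries that share a bucket, which is what makes
tracking every qdisc in the hierarchy affordable.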

 include/linux/netdevice.h |  4 ++++
 include/net/pkt_sched.h   |  4 ++--
 include/net/sch_generic.h |  2 +-
 net/core/dev.c            |  3 +++
 net/ipv6/ip6_gre.c        | 12 ++--
 net/ipv6/ip6_tunnel.c     | 10 +-
 net/ipv6/ip6_vti.c        | 10 +-
 net/ipv6/sit.c            | 10 +-
 net/sched/sch_api.c       | 23 +--
 net/sched/sch_generic.c   |  6 +++---
 net/sched/sch_mq.c        |  2 +-
 net/sched/sch_mqprio.c    |  2 +-
 12 files changed, 49 insertions(+), 39 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f45929c..17c6499 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include <linux/hashtable.h>
 
 struct netpoll_info;
 struct device;
@@ -1778,6 +1779,9 @@ struct net_device {
        unsigned int            num_tx_queues;
        unsigned int            real_num_tx_queues;
        struct Qdisc            *qdisc;
+#ifdef CONFIG_NET_SCHED
+       DECLARE_HASHTABLE       (qdisc_hash, 4);
+#endif
        unsigned long           tx_queue_len;
        spinlock_t              tx_global_lock;
        int                     watchdog_timeo;
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index fea53f4..8ba11b4 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -90,8 +90,8 @@ int unregister_qdisc(struct Qdisc_ops *qops);
 void qdisc_get_default(char *id, size_t len);
 int qdisc_set_default(const char *id);
 
-void qdisc_list_add(struct Qdisc *q);
-void qdisc_list_del(struct Qdisc *q);
+void qdisc_hash_add(struct Qdisc *q);
+void qdisc_hash_del(struct Qdisc *q);
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
 struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle);
 struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 62d5531..26f5cb3 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -67,7 +67,7 @@ struct Qdisc {
        u32                     limit;
        const struct Qdisc_ops  *ops;
        struct qdisc_size_table __rcu *stab;
-       struct list_head        list;
+       struct hlist_node       hash;
        u32                     handle;
        u32                     parent;
        int                     (*reshape_fail)(struct sk_buff *skb,
diff --git a/net/core/dev.c b/net/core/dev.c
index 904ff43..d3736d5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7511,6 +7511,9 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
        INIT_LIST_HEAD(&dev->all_adj_list.lower);
        INIT_LIST_HEAD(&dev->ptype_all);
        INIT_LIST_HEAD(&dev->ptype_specific);
+#ifdef CONFIG_NET_SCHED
+       hash_init(dev->qdisc_hash);
+#endif
        dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
        setup(dev);
 
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index fdc9de2..d3697a4 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -61,12 +61,12 @@ static bool log_ecn_error = true;
 module_param(log_ecn_error, bool, 0644);
 MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
-#define HASH_SIZE_SHIFT  5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define IP6_GRE_HASH_SIZE_SHIFT  5
+#define IP6_GRE_HASH_SIZE (1 << IP6_GRE_HASH_SIZE_SHIFT)
 
 static int ip6gre_net_id __read_mostly;
 struct ip6gre_net {
-   struct ip6_tnl __rcu *tunnels[4][HASH_SIZE];
+   struct ip6_tnl __rcu *tunnels[4][IP6_GRE_HASH_SIZE];
 
struct net_device *fb_tunnel_dev;
 };
@@ -96,12 +96,12 @@ static void ip6gre_tnl_link_config(struct ip6_tnl *t, int 
set_mtu);
will match fallback tunnel.
  */
 
-#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>4))&(HASH_SIZE - 
1))
+#define HASH_KEY(key) (((__force 

Re: [PATCH -next] tipc: fix imbalance read_unlock_bh in __tipc_nl_add_monitor()

2016-07-28 Thread Ying Xue
On 07/28/2016 10:07 AM, Wei Yongjun wrote:
> In the error handling path taken when nla_nest_start() fails,
> read_unlock_bh() is called to unlock a lock that has not been taken yet.
> sparse warns about the context imbalance as follows:
> 
> net/tipc/monitor.c:799:23: warning:
>  context imbalance in '__tipc_nl_add_monitor' - different lock contexts for 
> basic block
> 
> Fixes: cf6f7e1d5109 ('tipc: dump monitor attributes')
> Signed-off-by: Wei Yongjun 

Acked-by: Ying Xue 

> ---
>  net/tipc/monitor.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/tipc/monitor.c b/net/tipc/monitor.c
> index be70a57..b62caa1 100644
> --- a/net/tipc/monitor.c
> +++ b/net/tipc/monitor.c
> @@ -794,10 +794,10 @@ int __tipc_nl_add_monitor(struct net *net, struct 
> tipc_nl_msg *msg,
>   return 0;
>  
>  attr_msg_full:
> + read_unlock_bh(&mon->lock);
>   nla_nest_cancel(msg->skb, attrs);
>  msg_full:
>   genlmsg_cancel(msg->skb, hdr);
> - read_unlock_bh(&mon->lock);
>  
>   return -EMSGSIZE;
>  }
> 
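
The shape of the fix can be sketched in user space. The model below
assumes, as the commit message says, that the first failure point is
reached before the lock is taken and the second while it is held;
struct toy_lock and toy_add_monitor() are hypothetical names.

```c
/* Toy lock that only counts acquisitions so imbalance is observable. */
struct toy_lock {
    int depth;
};

void toy_lock_take(struct toy_lock *l)
{
    l->depth++;
}

void toy_lock_release(struct toy_lock *l)
{
    l->depth--;
}

/*
 * fail_at: 0 = success, 1 = fail before locking, 2 = fail while locked.
 * Each error label undoes only what has been acquired on its own path.
 */
int toy_add_monitor(struct toy_lock *l, int fail_at)
{
    if (fail_at == 1)
        goto msg_full;          /* lock not taken yet */

    toy_lock_take(l);
    if (fail_at == 2)
        goto attr_msg_full;     /* lock is held here */
    toy_lock_release(l);
    return 0;

attr_msg_full:
    toy_lock_release(l);        /* release belongs to this label only */
msg_full:
    return -1;
}
```

With the release placed under the first label, every path leaves the lock
depth at zero, which is the balance sparse checks for.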
> 



Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-28 Thread Valdis . Kletnieks
On Thu, 28 Jul 2016 09:45:12 +0200, Thomas Gleixner said:
> On Tue, 26 Jul 2016, nick wrote:
> > diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c 
> > b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > index f42129d..e1830af 100644
> > --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> > +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > @@ -3797,7 +3797,7 @@ static irqreturn_t e1000_intr(int irq, void *data)
> > hw->get_link_status = 1;
> > /* guard against interrupt when we're going down */
> > if (!test_bit(__E1000_DOWN, &adapter->flags))
> > -   schedule_delayed_work(&adapter->watchdog_task, 1);
> > +   mod_work(&adapter->watchdog_task, jiffies + 1);
>
> And that's not even funny anymore. Are you using a random generator to create
> these patches?

At some point, we need to decide if the occasional accidentally-correct
trivial patch from Nick is worth all the wasted maintainer time.





Re: [PATCH 1/3] net: ethernet: ti: cpdma: fix lockup in cpdma_ctlr_destroy()

2016-07-28 Thread Grygorii Strashko
On 07/26/2016 11:54 PM, ivan.khoronzhuk wrote:
> 
> 
> On 26.07.16 19:02, Grygorii Strashko wrote:
>> On 07/23/2016 09:24 AM, Ivan Khoronzhuk wrote:
>>>
>>>
>>> On 22.07.16 16:58, Grygorii Strashko wrote:
>>>> Fix deadlock in cpdma_ctlr_destroy() which is triggered now on
>>>> cpsw module removal:
>>>>  cpsw_remove()
>>>>  - cpdma_ctlr_destroy()
>>>>    - spin_lock_irqsave(&ctlr->lock, flags)
>>>>    - cpdma_ctlr_stop()
>>>>      - spin_lock_irqsave(&ctlr->lock, flags); <- deadlock
>>>>    - cpdma_chan_destroy()
>>>>      - spin_lock_irqsave(&ctlr->lock, flags); <- deadlock
>>>>
>>>> The issue has not been observed before because CPDMA channels have
>>>> been destroyed manually by CPSW until commit d941ebe88a41 ("net:
>>>> ethernet: ti: cpsw: use destroy ctlr to destroy channels") was merged.
>>>>
>>>> Signed-off-by: Grygorii Strashko 
>>>> ---
>>>>  drivers/net/ethernet/ti/davinci_cpdma.c | 2 --
>>>>  1 file changed, 2 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c
>>>> b/drivers/net/ethernet/ti/davinci_cpdma.c
>>>> index a68652a..89242e9 100644
>>>> --- a/drivers/net/ethernet/ti/davinci_cpdma.c
>>>> +++ b/drivers/net/ethernet/ti/davinci_cpdma.c
>>>> @@ -436,7 +436,6 @@ int cpdma_ctlr_destroy(struct cpdma_ctlr *ctlr)
>>>>      if (!ctlr)
>>>>          return -EINVAL;
>>>>
>>>> -    spin_lock_irqsave(&ctlr->lock, flags);
>>> Should ctlr->state be checked under lock?
>>> Seems like here should be used unlocked static versions of
>>> cpdma_ctlr_stop() and cpdma_chan_destroy() instead.
>>
>> As per my understanding, ctlr->state is not expected to change at this
>> moment, as all net devices have already been unregistered.
> Seems so, yes. The race can happen only with incorrect usage (stop while
> destroy, destroy while start, etc.), and those are mostly unrealistic
> use cases, you are right. But a check done without the lock always
> catches the eye and makes you think that something is wrong.
> 
>>
>>>
>>>>      if (ctlr->state != CPDMA_STATE_IDLE)
>>
>> May be I can move above check in cpdma_ctlr_stop() instead.
>> What do you think?
> Yes, it would be clearer.
> I was also thinking about removing the lock, since under this destroy
> function the ctlr destroys its resources one by one. OK, the channels are
> destroyed under the lock, but what about the pool (it's good that it's
> destroyed after the channels)? I see that it should never happen, but ctlr
> is an external structure, and who knows how it might be used while it is
> being destroyed.
> That was my paranoid point, so don't pay a lot of attention to it. With
> normal usage, as it currently is and should be, the lock can be removed.


After some more thought and code checking, I'm going to keep it as is:
I don't see any reason for races here, and I can't simply move this check
into cpdma_ctlr_stop(), as that might break ndo_open failure handling
(which is not something I'd like to fix within this series).

I'll resend v2 with the build issue fixed and with a fix for a new issue
I've found.
> 
>>
>>>>          cpdma_ctlr_stop(ctlr);
>>>>
>>>> @@ -444,7 +443,6 @@ int cpdma_ctlr_destroy(struct cpdma_ctlr *ctlr)
>>>>          cpdma_chan_destroy(ctlr->channels[i]);
>>>>
>>>>      cpdma_desc_pool_destroy(ctlr->pool);
>>>> -    spin_unlock_irqrestore(&ctlr->lock, flags);
>>>>      return ret;
>>>>  }
>>>>  EXPORT_SYMBOL_GPL(cpdma_ctlr_destroy);
>>>>
>>>
>>
>>


-- 
regards,
-grygorii
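
The deadlock discussed in this thread comes from re-acquiring a
non-recursive lock on the same call path. A small user-space model makes
that visible without hanging; struct toy_spinlock and the toy_*()
functions are hypothetical, and the model reports the re-acquisition
instead of spinning the way a real spinlock would.

```c
#include <stdbool.h>

/* Toy non-recursive lock. */
struct toy_spinlock {
    bool held;
};

/* Returns 0 on success, -1 where a real spinlock would spin forever. */
int toy_spin_lock(struct toy_spinlock *l)
{
    if (l->held)
        return -1;
    l->held = true;
    return 0;
}

void toy_spin_unlock(struct toy_spinlock *l)
{
    l->held = false;
}

/* Inner helper that takes the lock itself, like cpdma_ctlr_stop(). */
int toy_stop(struct toy_spinlock *l)
{
    if (toy_spin_lock(l))
        return -1;
    toy_spin_unlock(l);
    return 0;
}

/* Before the patch: the outer function holds the lock across the call. */
int toy_destroy_locked(struct toy_spinlock *l)
{
    int ret;

    if (toy_spin_lock(l))
        return -1;
    ret = toy_stop(l);          /* tries to take the same lock again */
    toy_spin_unlock(l);
    return ret;
}

/* After the patch: the outer function no longer takes the lock. */
int toy_destroy_unlocked(struct toy_spinlock *l)
{
    return toy_stop(l);
}
```

toy_destroy_locked() fails at the nested acquisition, while
toy_destroy_unlocked() completes, matching the behaviour before and after
the two removed lines.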


Re: [PATCH] net/mlx5_core/pagealloc: Remove deprecated create_singlethread_workqueue

2016-07-28 Thread Leon Romanovsky
On Thu, Jul 28, 2016 at 01:49:49PM +0530, Bhaktipriya Shridhar wrote:
> A dedicated workqueue has been used since the work items are being used
> on a memory reclaim path. WQ_MEM_RECLAIM has been set to guarantee forward
> progress under memory pressure.
> 
> The workqueue has a single work item. Hence, alloc_workqueue() is used
> instead of alloc_ordered_workqueue() since ordering is unnecessary when
> there's only one work item.
> 
> Explicit concurrency limit is unnecessary here since there are only a
> fixed number of work items.
> 
> Signed-off-by: Bhaktipriya Shridhar 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

Hi Bhaktipriya,

First of all, I would like to thank you for your work and invite you to
continue, but can you please submit ONE patch SERIES which changes all
similar places?

BTW,
Did you test this patch? Did you notice the memory reclaim path nature
of this work?

Thanks

> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> index 905..7c85262 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> @@ -552,7 +552,8 @@ void mlx5_pagealloc_cleanup(struct mlx5_core_dev *dev)
> 
>  int mlx5_pagealloc_start(struct mlx5_core_dev *dev)
>  {
> - dev->priv.pg_wq = create_singlethread_workqueue("mlx5_page_allocator");
> + dev->priv.pg_wq = alloc_workqueue("mlx5_page_allocator",
> +   WQ_MEM_RECLAIM, 0);
>   if (!dev->priv.pg_wq)
>   return -ENOMEM;
> 
> --
> 2.1.4
> 




Re: [PATCH 0/2] net: davinci_cpdma: reduce latency on -rt

2016-07-28 Thread Grygorii Strashko
On 07/27/2016 05:38 PM, Uwe Kleine-König wrote:
> Hello,
> 
> On Wed, Jul 27, 2016 at 05:11:54PM +0300, Grygorii Strashko wrote:
>> On 07/27/2016 10:03 AM, Uwe Kleine-König wrote:
>>> On Tue, Jul 26, 2016 at 05:36:49PM +0300, Grygorii Strashko wrote:
>>>> On 07/26/2016 03:02 PM, Uwe Kleine-König wrote:
>>>>> Hello,
>>>>>
>>>>> these patches are based on next-20160726. I didn't check yet how latency
>>>>> improves by using these patches, but even if the improvement is small,
>>>>> it's still a good idea to have them.
>>>>
>>>> Sorry, but how will this affect -RT? These are not raw locks, so
>>>> they will be converted to rt-mutexes, which are sleepable.
>>>> Or have I missed something?
>>>
>>> They are still locks after all. On -rt I saw for the relevant
>>> application:
>>>
>>>   send package |
>>> take lock  |
>>> write pckt to hw   |
>>>| rcv irq
>>>|   take lock
>>>| schedule
>>> drop lock  | 
>>>   schedule |
>>>|   get pckt from hw
>>>|   drop lock
>>>
>>> So reducing the time a lock is taken reduces the chances that the lock
>>> is contended for another thread which results in extra context switches.
>>>
>> Thanks a lot for explanation. So, this is not exactly rt-latency reduction,
>> but it might improve net performance on -RT. correct?
> 
> Well, it's not really rt related, but if you hit a locked lock on rt it
> hurts more than on !rt. And this results in increased latency.
> 

Thanks. I just wanted to have a clear understanding of the [possible] issue.
And I'd appreciate it if you could share any measurement results you have.

-- 
regards,
-grygorii


[PATCH v5 3/8] thunderbolt: Kconfig for Thunderbolt(TM) networking

2016-07-28 Thread Amir Levy
Updating the Kconfig Thunderbolt(TM) description.

Signed-off-by: Amir Levy 
---
 drivers/thunderbolt/Kconfig  | 25 +
 drivers/thunderbolt/Makefile |  2 +-
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/thunderbolt/Kconfig b/drivers/thunderbolt/Kconfig
index c121acc..d34b0f5 100644
--- a/drivers/thunderbolt/Kconfig
+++ b/drivers/thunderbolt/Kconfig
@@ -1,13 +1,30 @@
-menuconfig THUNDERBOLT
-   tristate "Thunderbolt support for Apple devices"
+config THUNDERBOLT
+   tristate "Thunderbolt(TM) support"
depends on PCI
select CRC32
help
- Cactus Ridge Thunderbolt Controller driver
+ Thunderbolt(TM) Controller driver
+
+if THUNDERBOLT
+
+config THUNDERBOLT_APPLE
+   tristate "Apple hardware support"
+   help
  This driver is required if you want to hotplug Thunderbolt devices on
  Apple hardware.
 
  Device chaining is currently not supported.
 
- To compile this driver a module, choose M here. The module will be
+ To compile this driver as a module, choose M here. The module will be
  called thunderbolt.
+
+config THUNDERBOLT_ICM
+   tristate "Thunderbolt(TM) Networking"
+   help
+ This driver is required if you want Thunderbolt(TM) Networking on
+ non-Apple hardware.
+
+ To compile this driver as a module, choose M here. The module will be
+ called thunderbolt_icm.
+
+endif
diff --git a/drivers/thunderbolt/Makefile b/drivers/thunderbolt/Makefile
index 5d1053c..7a85bd1 100644
--- a/drivers/thunderbolt/Makefile
+++ b/drivers/thunderbolt/Makefile
@@ -1,3 +1,3 @@
-obj-${CONFIG_THUNDERBOLT} := thunderbolt.o
+obj-${CONFIG_THUNDERBOLT_APPLE} := thunderbolt.o
 thunderbolt-objs := nhi.o ctl.o tb.o switch.o cap.o path.o tunnel_pci.o 
eeprom.o
 
-- 
2.7.4



[PATCH v5 5/8] thunderbolt: Networking state machine

2016-07-28 Thread Amir Levy
Negotiation states that a peer goes through in order to establish
the communication with the second peer.
This includes communication with upper layer and additional
infrastructure support to communicate with the second peer through ICM.

Signed-off-by: Amir Levy 
---
 drivers/thunderbolt/icm/Makefile  |   2 +-
 drivers/thunderbolt/icm/icm_nhi.c | 303 ++-
 drivers/thunderbolt/icm/net.c | 792 ++
 drivers/thunderbolt/icm/net.h |  70 
 4 files changed, 1156 insertions(+), 11 deletions(-)
 create mode 100644 drivers/thunderbolt/icm/net.c

diff --git a/drivers/thunderbolt/icm/Makefile b/drivers/thunderbolt/icm/Makefile
index 3adfc35..624ee31 100644
--- a/drivers/thunderbolt/icm/Makefile
+++ b/drivers/thunderbolt/icm/Makefile
@@ -25,4 +25,4 @@
 

 
 obj-${CONFIG_THUNDERBOLT_ICM} += thunderbolt-icm.o
-thunderbolt-icm-objs := icm_nhi.o
+thunderbolt-icm-objs := icm_nhi.o net.o
diff --git a/drivers/thunderbolt/icm/icm_nhi.c 
b/drivers/thunderbolt/icm/icm_nhi.c
index 2106dea..e491b0e 100644
--- a/drivers/thunderbolt/icm/icm_nhi.c
+++ b/drivers/thunderbolt/icm/icm_nhi.c
@@ -101,6 +101,12 @@ static const struct nla_policy 
nhi_genl_policy[NHI_ATTR_MAX + 1] = {
.len = TBT_ICM_RING_MAX_FRAME_SIZE },
[NHI_ATTR_MSG_FROM_ICM] = { .type = NLA_BINARY,
.len = TBT_ICM_RING_MAX_FRAME_SIZE },
+   [NHI_ATTR_LOCAL_ROUTE_STRING]   = {.len = sizeof(struct route_string)},
+   [NHI_ATTR_LOCAL_UUID]   = { .len = sizeof(uuid_be) },
+   [NHI_ATTR_REMOTE_UUID]  = { .len = sizeof(uuid_be) },
+   [NHI_ATTR_LOCAL_DEPTH]  = { .type = NLA_U8, },
+   [NHI_ATTR_ENABLE_FULL_E2E]  = { .type = NLA_FLAG, },
+   [NHI_ATTR_MATCH_FRAME_ID]   = { .type = NLA_FLAG, },
 };
 
 /* NHI genetlink family */
@@ -542,6 +548,29 @@ int nhi_mailbox(struct tbt_nhi_ctxt *nhi_ctxt, u32 cmd, 
u32 data, bool deinit)
return 0;
 }
 
+static inline bool nhi_is_path_disconnected(u32 cmd, u8 num_ports)
+{
+   return (cmd >= DISCONNECT_PORT_A_INTER_DOMAIN_PATH &&
+   cmd < (DISCONNECT_PORT_A_INTER_DOMAIN_PATH + num_ports));
+}
+
+static int nhi_mailbox_disconn_path(struct tbt_nhi_ctxt *nhi_ctxt, u32 cmd)
+   __releases(_list_rwsem)
+{
+   struct port_net_dev *port;
+   u32 port_num = cmd - DISCONNECT_PORT_A_INTER_DOMAIN_PATH;
+
+   port = &(nhi_ctxt->net_devices[port_num]);
+   mutex_lock(&port->state_mutex);
+
+   up_read(_list_rwsem);
+   port->medium_sts = MEDIUM_READY_FOR_APPROVAL;
+   if (port->net_dev)
+   negotiation_events(port->net_dev, MEDIUM_DISCONNECTED);
+   mutex_unlock(&port->state_mutex);
+   return  0;
+}
+
 static int nhi_mailbox_generic(struct tbt_nhi_ctxt *nhi_ctxt, u32 mb_cmd)
__releases(_list_rwsem)
 {
@@ -590,13 +619,93 @@ static int nhi_genl_mailbox(__always_unused struct 
sk_buff *u_skb,
down_read(_list_rwsem);
 
nhi_ctxt = nhi_search_ctxt(*(u32 *)info->userhdr);
-   if (nhi_ctxt && !nhi_ctxt->d0_exit)
-   return nhi_mailbox_generic(nhi_ctxt, mb_cmd);
+   if (nhi_ctxt && !nhi_ctxt->d0_exit) {
+
+   /* rwsem is released later by the below functions */
+   if (nhi_is_path_disconnected(cmd, nhi_ctxt->num_ports))
+   return nhi_mailbox_disconn_path(nhi_ctxt, cmd);
+   else
+   return nhi_mailbox_generic(nhi_ctxt, mb_cmd);
+
+   }
 
up_read(_list_rwsem);
return -ENODEV;
 }
 
+static int nhi_genl_approve_networking(__always_unused struct sk_buff *u_skb,
+  struct genl_info *info)
+{
+   struct tbt_nhi_ctxt *nhi_ctxt;
+   struct route_string *route_str;
+   int res = -ENODEV;
+   u8 port_num;
+
+   if (!info || !info->userhdr || !info->attrs ||
+   !info->attrs[NHI_ATTR_LOCAL_ROUTE_STRING] ||
+   !info->attrs[NHI_ATTR_LOCAL_UUID] ||
+   !info->attrs[NHI_ATTR_REMOTE_UUID] ||
+   !info->attrs[NHI_ATTR_LOCAL_DEPTH])
+   return -EINVAL;
+
+   /*
+* route_str is an unique topological address
+* used for approving remote controller
+*/
+   route_str = nla_data(info->attrs[NHI_ATTR_LOCAL_ROUTE_STRING]);
+   /* extracts the port we're connected to */
+   port_num = PORT_NUM_FROM_LINK(L0_PORT_NUM(route_str->lo));
+
+   down_read(_list_rwsem);
+
+   nhi_ctxt = nhi_search_ctxt(*(u32 *)info->userhdr);
+   if (nhi_ctxt && !nhi_ctxt->d0_exit) {
+   struct port_net_dev *port;
+
+   if (port_num >= nhi_ctxt->num_ports) {
+   res = -EINVAL;
+   goto free_ctl_list;
+   }
+
+   port = &(nhi_ctxt->net_devices[port_num]);
+
+
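
The range check in nhi_is_path_disconnected() and the port index derived
from it can be tried in isolation. This is a sketch only: the numeric
value given to DISCONNECT_PORT_A_INTER_DOMAIN_PATH below is made up for
the example, since the real constant is defined in driver headers that are
not part of this excerpt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical value, chosen only so the example is self-contained. */
#define DISCONNECT_PORT_A_INTER_DOMAIN_PATH 0x10u

/* True when cmd is one of the per-port "disconnect path" commands. */
bool is_path_disconnected(uint32_t cmd, uint8_t num_ports)
{
    return cmd >= DISCONNECT_PORT_A_INTER_DOMAIN_PATH &&
           cmd < (DISCONNECT_PORT_A_INTER_DOMAIN_PATH + num_ports);
}

/* Only meaningful after is_path_disconnected() returned true. */
uint32_t disconnect_cmd_to_port(uint32_t cmd)
{
    return cmd - DISCONNECT_PORT_A_INTER_DOMAIN_PATH;
}
```

Because the commands are consecutive, the subtraction yields a valid index
into the per-port array exactly when the range check has passed.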

[PATCH] net/mlx5_core/pagealloc: Remove deprecated create_singlethread_workqueue

2016-07-28 Thread Bhaktipriya Shridhar
A dedicated workqueue has been used since the work items are being used
on a memory reclaim path. WQ_MEM_RECLAIM has been set to guarantee forward
progress under memory pressure.

The workqueue has a single work item. Hence, alloc_workqueue() is used
instead of alloc_ordered_workqueue() since ordering is unnecessary when
there's only one work item.

Explicit concurrency limit is unnecessary here since there are only a
fixed number of work items.

Signed-off-by: Bhaktipriya Shridhar 
---
 drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
index 905..7c85262 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
@@ -552,7 +552,8 @@ void mlx5_pagealloc_cleanup(struct mlx5_core_dev *dev)

 int mlx5_pagealloc_start(struct mlx5_core_dev *dev)
 {
-   dev->priv.pg_wq = create_singlethread_workqueue("mlx5_page_allocator");
+   dev->priv.pg_wq = alloc_workqueue("mlx5_page_allocator",
+ WQ_MEM_RECLAIM, 0);
if (!dev->priv.pg_wq)
return -ENOMEM;

--
2.1.4



[PATCH v5 8/8] thunderbolt: Adding maintainer entry

2016-07-28 Thread Amir Levy
Add Amir Levy as maintainer for Thunderbolt(TM) ICM driver

Signed-off-by: Amir Levy 
---
 MAINTAINERS | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 771c31c..5f24eb2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10141,7 +10141,13 @@ F: include/uapi/linux/stm.h
 THUNDERBOLT DRIVER
 M: Andreas Noever 
 S: Maintained
-F: drivers/thunderbolt/
+F: drivers/thunderbolt/*
+
+THUNDERBOLT ICM DRIVER
+M: Amir Levy 
+S: Maintained
+F: drivers/thunderbolt/icm/
+F: Documentation/thunderbolt-networking.txt
 
 TI BQ27XXX POWER SUPPLY DRIVER
 R: Andrew F. Davis 
-- 
2.7.4



[PATCH v5 4/8] thunderbolt: Communication with the ICM (firmware)

2016-07-28 Thread Amir Levy
Firmware-based (a.k.a ICM - Intel Connection Manager) controller is
used for establishing and maintaining the Thunderbolt Networking
connection. We need to be able to communicate with it.

Signed-off-by: Amir Levy 
---
 drivers/thunderbolt/Makefile  |1 +
 drivers/thunderbolt/icm/Makefile  |   28 +
 drivers/thunderbolt/icm/icm_nhi.c | 1343 +
 drivers/thunderbolt/icm/icm_nhi.h |   93 +++
 drivers/thunderbolt/icm/net.h |  200 ++
 5 files changed, 1665 insertions(+)
 create mode 100644 drivers/thunderbolt/icm/Makefile
 create mode 100644 drivers/thunderbolt/icm/icm_nhi.c
 create mode 100644 drivers/thunderbolt/icm/icm_nhi.h
 create mode 100644 drivers/thunderbolt/icm/net.h

diff --git a/drivers/thunderbolt/Makefile b/drivers/thunderbolt/Makefile
index 7a85bd1..b6aa6a3 100644
--- a/drivers/thunderbolt/Makefile
+++ b/drivers/thunderbolt/Makefile
@@ -1,3 +1,4 @@
 obj-${CONFIG_THUNDERBOLT_APPLE} := thunderbolt.o
 thunderbolt-objs := nhi.o ctl.o tb.o switch.o cap.o path.o tunnel_pci.o 
eeprom.o
 
+obj-${CONFIG_THUNDERBOLT_ICM} += icm/
diff --git a/drivers/thunderbolt/icm/Makefile b/drivers/thunderbolt/icm/Makefile
new file mode 100644
index 000..3adfc35
--- /dev/null
+++ b/drivers/thunderbolt/icm/Makefile
@@ -0,0 +1,28 @@
+
+#
+# Intel Thunderbolt(TM) driver
+# Copyright(c) 2014 - 2016 Intel Corporation.
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms and conditions of the GNU General Public License,
+# version 2, as published by the Free Software Foundation.
+#
+# This program is distributed in the hope it will be useful, but WITHOUT
+# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+# more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program.  If not, see .
+#
+# The full GNU General Public License is included in this distribution in
+# the file called "COPYING".
+#
+# Contact Information:
+# Intel Thunderbolt Mailing List 
+# Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+#
+
+
+obj-${CONFIG_THUNDERBOLT_ICM} += thunderbolt-icm.o
+thunderbolt-icm-objs := icm_nhi.o
diff --git a/drivers/thunderbolt/icm/icm_nhi.c 
b/drivers/thunderbolt/icm/icm_nhi.c
new file mode 100644
index 000..2106dea
--- /dev/null
+++ b/drivers/thunderbolt/icm/icm_nhi.c
@@ -0,0 +1,1343 @@
+/***
+ *
+ * Intel Thunderbolt(TM) driver
+ * Copyright(c) 2014 - 2016 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see .
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Contact Information:
+ * Intel Thunderbolt Mailing List 
+ * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+ *
+ 
**/
+
+#include 
+#include 
+#include 
+#include 
+#include "icm_nhi.h"
+#include "net.h"
+
+#define NHI_GENL_VERSION 1
+#define NHI_GENL_NAME DRV_NAME
+
+#define DEVICE_DATA(num_ports, dma_port, nvm_ver_offset, nvm_auth_on_boot,\
+   support_full_e2e) \
+   ((num_ports) | ((dma_port) << 4) | ((nvm_ver_offset) << 10) | \
+((nvm_auth_on_boot) << 22) | ((support_full_e2e) << 23))
+#define DEVICE_DATA_NUM_PORTS(device_data) ((device_data) & 0xf)
+#define DEVICE_DATA_DMA_PORT(device_data) (((device_data) >> 4) & 0x3f)
+#define DEVICE_DATA_NVM_VER_OFFSET(device_data) (((device_data) >> 10) & 0xfff)
+#define DEVICE_DATA_NVM_AUTH_ON_BOOT(device_data) (((device_data) >> 22) & 0x1)
+#define DEVICE_DATA_SUPPORT_FULL_E2E(device_data) (((device_data) >> 23) & 0x1)
+
+#define USEC_TO_256_NSECS(usec) DIV_ROUND_UP((usec) * NSEC_PER_USEC, 256)
+
+/*
+ * FW->SW responses
+ * RC = response code
+ */
+enum {
+   RC_GET_TBT_TOPOLOGY = 1,
+   RC_GET_VIDEO_RESOURCES_DATA,
+   RC_DRV_READY,
+   RC_APPROVE_PCI_CONNEXION,
+   RC_CHALLENGE_PCI_CONNEXION,
+   
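
The DEVICE_DATA packing macros quoted above can be exercised on their own
in user space. The macros are copied from the patch; NSEC_PER_USEC and
DIV_ROUND_UP are redefined locally as assumptions so the example compiles
outside the kernel, and device_data_fields_fit() is a hypothetical helper
added to show the field widths implied by the masks.

```c
/* Copied from the patch: pack five fields into one word. */
#define DEVICE_DATA(num_ports, dma_port, nvm_ver_offset, nvm_auth_on_boot,\
                    support_full_e2e) \
    ((num_ports) | ((dma_port) << 4) | ((nvm_ver_offset) << 10) | \
     ((nvm_auth_on_boot) << 22) | ((support_full_e2e) << 23))
#define DEVICE_DATA_NUM_PORTS(device_data) ((device_data) & 0xf)
#define DEVICE_DATA_DMA_PORT(device_data) (((device_data) >> 4) & 0x3f)
#define DEVICE_DATA_NVM_VER_OFFSET(device_data) (((device_data) >> 10) & 0xfff)
#define DEVICE_DATA_NVM_AUTH_ON_BOOT(device_data) (((device_data) >> 22) & 0x1)
#define DEVICE_DATA_SUPPORT_FULL_E2E(device_data) (((device_data) >> 23) & 0x1)

/* Local stand-ins for kernel definitions (assumed, for user space). */
#define NSEC_PER_USEC 1000L
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define USEC_TO_256_NSECS(usec) DIV_ROUND_UP((usec) * NSEC_PER_USEC, 256)

/*
 * Hypothetical helper: the masks imply 4, 6 and 12 bit fields, so larger
 * inputs would spill into the neighbouring field when packed.
 */
int device_data_fields_fit(unsigned int num_ports, unsigned int dma_port,
                           unsigned int nvm_ver_offset)
{
    return num_ports <= 0xf && dma_port <= 0x3f && nvm_ver_offset <= 0xfff;
}
```

Packing and then extracting returns the original values as long as each
input fits its field; USEC_TO_256_NSECS() rounds up to whole 256 ns units.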

[PATCH v5 7/8] thunderbolt: Networking doc

2016-07-28 Thread Amir Levy
Adding Thunderbolt(TM) networking documentation.

Signed-off-by: Amir Levy 
---
 Documentation/00-INDEX   |   2 +
 Documentation/thunderbolt-networking.txt | 135 +++
 2 files changed, 137 insertions(+)
 create mode 100644 Documentation/thunderbolt-networking.txt

diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
index cb9a6c6..80a6706 100644
--- a/Documentation/00-INDEX
+++ b/Documentation/00-INDEX
@@ -439,6 +439,8 @@ this_cpu_ops.txt
- List rationale behind and the way to use this_cpu operations.
 thermal/
- directory with information on managing thermal issues (CPU/temp)
+thunderbolt-networking.txt
+   - Thunderbolt(TM) Networking driver description.
 trace/
- directory with info on tracing technologies within linux
 unaligned-memory-access.txt
diff --git a/Documentation/thunderbolt-networking.txt 
b/Documentation/thunderbolt-networking.txt
new file mode 100644
index 000..b7714cf
--- /dev/null
+++ b/Documentation/thunderbolt-networking.txt
@@ -0,0 +1,135 @@
+Intel Thunderbolt(TM) Linux driver
+==================================
+
+Copyright(c) 2013 - 2016 Intel Corporation.
+
+Contact Information:
+Intel Thunderbolt mailing list 
+Edited by Michael Jamet 
+
+Overview
+========
+
+Thunderbolt(TM) Networking mode is introduced with this driver.
+This kernel code creates an ethernet device utilized in computer to computer
+communication over a Thunderbolt cable.
+This driver has been added on the top of the existing thunderbolt driver
+for systems with firmware (FW) based Thunderbolt controllers supporting
+Thunderbolt Networking.
+
+Files
+=====
+
+- icm_nhi.c/h: These files allow communication with the FW (a.k.a ICM) based controller.
+               In addition, they create an interface for netlink communication with
+               a user space daemon.
+
+- net.c/net.h: These files implement the 'eth' interface for the Thunderbolt(TM)
+               networking.
+
+Interface to user space
+=======================
+
+The interface to the user space module is implemented through a Generic Netlink.
+In order to be accessed by the user space module, both kernel and user space
+modules have to register with the same GENL_NAME. In our case, this is
+simply "thunderbolt".
+The registration is done at driver initialization time for all instances of
+the Thunderbolt controllers.
+The communication is then carried through pre-defined Thunderbolt messages.
+Each specific message has a callback function that is called when
+the related message is received.
+
+The messages are defined as follows:
+* NHI_CMD_UNSPEC: Not used.
+* NHI_CMD_SUBSCRIBE: Subscription request from daemon to driver to open the
+  communication channel.
+* NHI_CMD_UNSUBSCRIBE: Request from daemon to driver to unsubscribe
+  to close communication channel.
+* NHI_CMD_QUERY_INFORMATION: Request information from the driver such as
+  driver version, FW version offset, number of ports in the controller
+  and DMA port.
+* NHI_CMD_MSG_TO_ICM: Message from user space module to FW.
+* NHI_CMD_MSG_FROM_ICM: Response from FW to user space module.
+* NHI_CMD_MAILBOX: Message that uses mailbox mechanism such as FW policy
+  changes or disconnect path.
+* NHI_CMD_APPROVE_TBT_NETWORKING: Request from user space
+  module to FW to establish path.
+* NHI_CMD_ICM_IN_SAFE_MODE: Indication that the FW has entered safe mode.
+
+Communication with ICM (Firmware)
+=================================
+
+The communication with ICM is principally achieved through
+a DMA mechanism on Ring 0.
+The driver allocates a shared memory that is physically mapped onto
+the DMA physical space at Ring 0.
+
+Interrupts
+==========
+
+Thunderbolt relies on MSI-X interrupts.
+The MSI-X vector is allocated as follows:
+ICM
+ - Tx: MSI-X vector index 0
+ - Rx: MSI-X vector index 1
+
+Port 0
+ - Tx: MSI-X vector index 2
+ - Rx: MSI-X vector index 3
+
+Port 1
+ - Tx: MSI-X vector index 4
+ - Rx: MSI-X vector index 5
+
+ICM interrupts are used for communication with ICM only.
+Port 0 and Port 1 interrupts are used for Thunderbolt Networking
+communications.
+In case MSI-X is not available, the driver requests to enable MSI only.
+
+Mutexes, semaphores and spinlocks
+=================================
+
+The driver should be able to operate in an environment where hardware
+is asynchronously accessed by multiple entities such as netlink,
+multiple controllers etc.
+
+* send_sem: This semaphore enforces a unique sender (one sender at a time)
+  to avoid wrong pairing with responses. FW may process one message
+  at a time.
+* d0_exit_send_mutex: This mutex protects D0 exit (D3) situation
+  to avoid continuing to send messages to FW.
+* d0_exit_mailbox_mutex: This mutex protects D0 exit (D3) situation to
+  avoid continuing to send commands to mailbox.
+* mailbox_mutex: This mutex enforces 

[PATCH v5 2/8] thunderbolt: Updating the register definitions

2016-07-28 Thread Amir Levy
Adding more Thunderbolt(TM) register definitions
and some helper macros.
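
As a rough illustration of what the new ring helper macros compute, here is a
user-space sketch. The offsets, shifts and bit ranges are copied from the
nhi_regs.h hunk in this patch; GENMASK32() and the inline function names are
stand-ins invented for this example (the kernel uses GENMASK() and the
TBT_REG_RING_*_EXTRACT macros), so treat it as a sketch rather than driver code.

```c
#include <stdint.h>

/* User-space stand-in for the kernel's GENMASK(h, l), limited to 32 bits. */
#define GENMASK32(h, l) \
	((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h))))

/* Values copied from the nhi_regs.h hunk in this patch. */
#define REG_RX_RING_BASE		0x08000
#define REG_RING_STEP			16
#define REG_RING_CONS_PROD_OFFSET	8
#define REG_RING_CONS_SHIFT		0
#define REG_RING_CONS_MASK		GENMASK32(15, REG_RING_CONS_SHIFT)
#define REG_RING_PROD_SHIFT		16
#define REG_RING_PROD_MASK		GENMASK32(31, REG_RING_PROD_SHIFT)

/* Byte offset of a ring's cons/prod register (hypothetical helper name). */
static inline uint32_t ring_cons_prod_offset(uint32_t ringbase,
					     uint32_t ringnumber)
{
	return ringbase + ringnumber * REG_RING_STEP +
	       REG_RING_CONS_PROD_OFFSET;
}

/* Producer index lives in bits 31:16 of the cons/prod register. */
static inline uint32_t ring_prod(uint32_t val)
{
	return (val & REG_RING_PROD_MASK) >> REG_RING_PROD_SHIFT;
}

/* Consumer index lives in bits 15:0 of the cons/prod register. */
static inline uint32_t ring_cons(uint32_t val)
{
	return (val & REG_RING_CONS_MASK) >> REG_RING_CONS_SHIFT;
}
```

With these definitions, a register value of 0x00051234 would decode to
producer index 5 and consumer index 0x1234.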

Signed-off-by: Amir Levy 
---
 drivers/thunderbolt/nhi_regs.h | 109 +
 1 file changed, 109 insertions(+)

diff --git a/drivers/thunderbolt/nhi_regs.h b/drivers/thunderbolt/nhi_regs.h
index 75cf069..b8e961f 100644
--- a/drivers/thunderbolt/nhi_regs.h
+++ b/drivers/thunderbolt/nhi_regs.h
@@ -9,6 +9,11 @@
 
 #include 
 
+#define NHI_MMIO_BAR 0
+
+#define TBT_RING_MIN_NUM_BUFFERS   2
+#define TBT_RING_MAX_FRAME_SIZE    (4 * 1024)
+
 enum ring_flags {
RING_FLAG_ISOCH_ENABLE = 1 << 27, /* TX only? */
RING_FLAG_E2E_FLOW_CONTROL = 1 << 28,
@@ -39,6 +44,33 @@ struct ring_desc {
u32 time; /* write zero */
 } __packed;
 
+/**
+ * struct tbt_buf_desc - TX/RX ring buffer descriptor.
+ * This is same as struct ring_desc, but without the use of bitfields and
+ * with explicit endianity.
+ */
+struct tbt_buf_desc {
+   __le64 phys;
+   __le32 attributes;
+   __le32 time;
+};
+
+#define DESC_ATTR_LEN_SHIFT        0
+#define DESC_ATTR_LEN_MASK         GENMASK(11, DESC_ATTR_LEN_SHIFT)
+#define DESC_ATTR_EOF_SHIFT        12
+#define DESC_ATTR_EOF_MASK         GENMASK(15, DESC_ATTR_EOF_SHIFT)
+#define DESC_ATTR_SOF_SHIFT        16
+#define DESC_ATTR_SOF_MASK         GENMASK(19, DESC_ATTR_SOF_SHIFT)
+#define DESC_ATTR_TX_ISOCH_DMA_EN  BIT(20) /* TX */
+#define DESC_ATTR_RX_CRC_ERR       BIT(20) /* RX after use */
+#define DESC_ATTR_DESC_DONE        BIT(21)
+#define DESC_ATTR_REQ_STS          BIT(22) /* TX and RX before use */
+#define DESC_ATTR_RX_BUF_OVRN_ERR  BIT(22) /* RX after use */
+#define DESC_ATTR_INT_EN           BIT(23)
+#define DESC_ATTR_OFFSET_SHIFT     24
+#define DESC_ATTR_OFFSET_MASK      GENMASK(31, DESC_ATTR_OFFSET_SHIFT)
+
+
 /* NHI registers in bar 0 */
 
 /*
@@ -60,6 +92,30 @@ struct ring_desc {
  */
 #define REG_RX_RING_BASE   0x08000
 
+#define REG_RING_STEP              16
+#define REG_RING_PHYS_LO_OFFSET    0
+#define REG_RING_PHYS_HI_OFFSET    4
+#define REG_RING_CONS_PROD_OFFSET  8   /* cons - RO, prod - RW */
+#define REG_RING_CONS_SHIFT        0
+#define REG_RING_CONS_MASK         GENMASK(15, REG_RING_CONS_SHIFT)
+#define REG_RING_PROD_SHIFT        16
+#define REG_RING_PROD_MASK         GENMASK(31, REG_RING_PROD_SHIFT)
+#define REG_RING_SIZE_OFFSET       12
+#define REG_RING_SIZE_SHIFT        0
+#define REG_RING_SIZE_MASK         GENMASK(15, REG_RING_SIZE_SHIFT)
+#define REG_RING_BUF_SIZE_SHIFT    16
+#define REG_RING_BUF_SIZE_MASK     GENMASK(27, REG_RING_BUF_SIZE_SHIFT)
+
+#define TBT_RING_CONS_PROD_REG(iobase, ringbase, ringnumber) \
+ ((iobase) + (ringbase) + \
+ ((ringnumber) * REG_RING_STEP) + \
+ REG_RING_CONS_PROD_OFFSET)
+
+#define TBT_REG_RING_PROD_EXTRACT(val) (((val) & REG_RING_PROD_MASK) >> \
+  REG_RING_PROD_SHIFT)
+
+#define TBT_REG_RING_CONS_EXTRACT(val) (((val) & REG_RING_CONS_MASK) >> \
+  REG_RING_CONS_SHIFT)
 /*
  * 32 bytes per entry, one entry for every hop (REG_HOP_COUNT)
  * 00: enum_ring_flags
@@ -77,6 +133,19 @@ struct ring_desc {
  * ..: unknown
  */
 #define REG_RX_OPTIONS_BASE        0x29800
+#define REG_RX_OPTS_TX_E2E_HOP_ID_SHIFT    12
+#define REG_RX_OPTS_TX_E2E_HOP_ID_MASK \
+   GENMASK(22, REG_RX_OPTS_TX_E2E_HOP_ID_SHIFT)
+#define REG_RX_OPTS_MASK_OFFSET    4
+#define REG_RX_OPTS_MASK_EOF_SHIFT 0
+#define REG_RX_OPTS_MASK_EOF_MASK  GENMASK(15, REG_RX_OPTS_MASK_EOF_SHIFT)
+#define REG_RX_OPTS_MASK_SOF_SHIFT 16
+#define REG_RX_OPTS_MASK_SOF_MASK  GENMASK(31, REG_RX_OPTS_MASK_SOF_SHIFT)
+
+#define REG_OPTS_STEP  32
+#define REG_OPTS_E2E_EN BIT(28)
+#define REG_OPTS_RAW   BIT(30)
+#define REG_OPTS_VALID BIT(31)
 
 /*
  * three bitfields: tx, rx, rx overflow
@@ -86,6 +155,7 @@ struct ring_desc {
  */
 #define REG_RING_NOTIFY_BASE   0x37800
 #define RING_NOTIFY_REG_COUNT(nhi) ((31 + 3 * nhi->hop_count) / 32)
+#define REG_RING_NOTIFY_STEP   4
 
 /*
  * two bitfields: rx, tx
@@ -94,8 +164,47 @@ struct ring_desc {
  */
 #define REG_RING_INTERRUPT_BASE    0x38200
 #define RING_INTERRUPT_REG_COUNT(nhi) ((31 + 2 * nhi->hop_count) / 32)
+#define REG_RING_INT_TX_PROCESSED(ring_num)    BIT(ring_num)
+#define REG_RING_INT_RX_PROCESSED(ring_num, num_paths) BIT((ring_num) + \
+   (num_paths))
+#define RING_INT_DISABLE(base, val) iowrite32( \
+   ioread32((base) + REG_RING_INTERRUPT_BASE) & ~(val), \
+   (base) + REG_RING_INTERRUPT_BASE)
+#define RING_INT_ENABLE(base, val) 

[PATCH v5 6/8] thunderbolt: Networking transmit and receive

2016-07-28 Thread Amir Levy
Handling the transmission to second peer and receiving from it.
This includes communication with upper layer, the network stack
and configuration of Thunderbolt(TM) HW.

Signed-off-by: Amir Levy 
---
 drivers/thunderbolt/icm/icm_nhi.c |   15 +
 drivers/thunderbolt/icm/net.c | 1475 +
 2 files changed, 1490 insertions(+)

diff --git a/drivers/thunderbolt/icm/icm_nhi.c b/drivers/thunderbolt/icm/icm_nhi.c
index e491b0e..efadf9c 100644
--- a/drivers/thunderbolt/icm/icm_nhi.c
+++ b/drivers/thunderbolt/icm/icm_nhi.c
@@ -1055,6 +1055,7 @@ static irqreturn_t nhi_msi(int __always_unused irq, void *data)
 {
struct tbt_nhi_ctxt *nhi_ctxt = data;
u32 isr0, isr1, imr0, imr1;
+   int i;
 
/* clear on read */
isr0 = ioread32(nhi_ctxt->iobase + REG_RING_NOTIFY_BASE);
@@ -1077,6 +1078,20 @@ static irqreturn_t nhi_msi(int __always_unused irq, void *data)
 
spin_unlock(&nhi_ctxt->lock);
 
+   for (i = 0; i < nhi_ctxt->num_ports; ++i) {
+   struct net_device *net_dev =
+   nhi_ctxt->net_devices[i].net_dev;
+   if (net_dev) {
+   u8 path = PATH_FROM_PORT(nhi_ctxt->num_paths, i);
+
+   if (isr0 & REG_RING_INT_RX_PROCESSED(
+   path, nhi_ctxt->num_paths))
+   tbt_net_rx_msi(net_dev);
+   if (isr0 & REG_RING_INT_TX_PROCESSED(path))
+   tbt_net_tx_msi(net_dev);
+   }
+   }
+
if (isr0 & REG_RING_INT_RX_PROCESSED(TBT_ICM_RING_NUM,
 nhi_ctxt->num_paths))
schedule_work(&nhi_ctxt->icm_msgs_work);
diff --git a/drivers/thunderbolt/icm/net.c b/drivers/thunderbolt/icm/net.c
index 75801b5..6ce7c18 100644
--- a/drivers/thunderbolt/icm/net.c
+++ b/drivers/thunderbolt/icm/net.c
@@ -134,6 +134,17 @@ struct approve_inter_domain_connection_cmd {
 
 };
 
+struct tbt_frame_header {
+   /* size of the data with the frame */
+   __le32 frame_size;
+   /* running index on the frames */
+   __le16 frame_index;
+   /* ID of the frame to match frames to specific packet */
+   __le16 frame_id;
+   /* how many frames assembles a full packet */
+   __le32 frame_count;
+};
+
 enum neg_event {
RECEIVE_LOGOUT = NUM_MEDIUM_STATUSES,
RECEIVE_LOGIN_RESPONSE,
@@ -141,15 +152,81 @@ enum neg_event {
NUM_NEG_EVENTS
 };
 
+enum frame_status {
+   GOOD_FRAME,
+   GOOD_AS_FIRST_FRAME,
+   GOOD_AS_FIRST_MULTICAST_FRAME,
+   FRAME_NOT_READY,
+   FRAME_ERROR,
+};
+
+enum packet_filter {
+   /* all multicast MAC addresses */
+   PACKET_TYPE_ALL_MULTICAST,
+   /* all types of MAC addresses: multicast, unicast and broadcast */
+   PACKET_TYPE_PROMISCUOUS,
+   /* all unicast MAC addresses */
+   PACKET_TYPE_UNICAST_PROMISCUOUS,
+};
+
 enum disconnect_path_stage {
STAGE_1 = BIT(0),
STAGE_2 = BIT(1)
 };
 
+struct tbt_net_stats {
+   u64 tx_packets;
+   u64 tx_bytes;
+   u64 tx_errors;
+   u64 rx_packets;
+   u64 rx_bytes;
+   u64 rx_length_errors;
+   u64 rx_over_errors;
+   u64 rx_crc_errors;
+   u64 rx_missed_errors;
+   u64 multicast;
+};
+
+static const char tbt_net_gstrings_stats[][ETH_GSTRING_LEN] = {
+   "tx_packets",
+   "tx_bytes",
+   "tx_errors",
+   "rx_packets",
+   "rx_bytes",
+   "rx_length_errors",
+   "rx_over_errors",
+   "rx_crc_errors",
+   "rx_missed_errors",
+   "multicast",
+};
+
+struct tbt_buffer {
+   dma_addr_t dma;
+   union {
+   struct tbt_frame_header *hdr;
+   struct page *page;
+   };
+   u32 page_offset;
+};
+
+struct tbt_desc_ring {
+   /* pointer to the descriptor ring memory */
+   struct tbt_buf_desc *desc;
+   /* physical address of the descriptor ring */
+   dma_addr_t dma;
+   /* array of buffer structs */
+   struct tbt_buffer *buffers;
+   /* last descriptor that was associated with a buffer */
+   u16 last_allocated;
+   /* next descriptor to check for DD status bit */
+   u16 next_to_clean;
+};
+
 /**
 *  struct tbt_port - the basic tbt_port structure
 *  @tbt_nhi_ctxt:  context of the nhi controller.
 *  @net_dev:   networking device object.
+*  @napi:  network API
 *  @login_retry_work:  work queue for sending login requests.
 *  @login_response_work:   work queue for sending login responses.
 *  @work_struct logout_work:   work queue for sending logout requests.
@@ -165,6 +242,11 @@ enum disconnect_path_stage {
 *  @login_retry_count: counts number of login retries sent.
 *  @local_depth:   depth of the remote peer in the chain.
 *  @transmit_path: routing parameter for the icm.
+*  

[PATCH v5 1/8] thunderbolt: Macro rename

2016-07-28 Thread Amir Levy
This first patch updates the registers file to
reflect that it isn't only for Cactus Ridge.
No functional change intended.

Signed-off-by: Amir Levy 
---
 drivers/thunderbolt/nhi_regs.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/thunderbolt/nhi_regs.h b/drivers/thunderbolt/nhi_regs.h
index 86b996c..75cf069 100644
--- a/drivers/thunderbolt/nhi_regs.h
+++ b/drivers/thunderbolt/nhi_regs.h
@@ -1,11 +1,11 @@
 /*
- * Thunderbolt Cactus Ridge driver - NHI registers
+ * Thunderbolt driver - NHI registers
  *
  * Copyright (c) 2014 Andreas Noever 
  */
 
-#ifndef DSL3510_REGS_H_
-#define DSL3510_REGS_H_
+#ifndef NHI_REGS_H_
+#define NHI_REGS_H_
 
 #include 
 
-- 
2.7.4



[PATCH v5 0/8] thunderbolt: Introducing Thunderbolt(TM) networking

2016-07-28 Thread Amir Levy
This is version 5 of Thunderbolt(TM) driver for non-Apple hardware.

Changes since v4:
 - Added Amir Levy as maintainer of thunderbolt/icm
 - Replaced private uuid definitions with uuid_be

These patches were pushed to GitHub where they can be reviewed more
comfortably with green/red highlighting:
https://github.com/01org/thunderbolt-software-kernel-tree

Daemon code:
https://github.com/01org/thunderbolt-software-daemon

For reference, here's a link to version 4:
[v4]:   https://lkml.org/lkml/2016/7/18/171

Amir Levy (8):
  thunderbolt: Macro rename
  thunderbolt: Updating the register definitions
  thunderbolt: Kconfig for Thunderbolt(TM) networking
  thunderbolt: Communication with the ICM (firmware)
  thunderbolt: Networking state machine
  thunderbolt: Networking transmit and receive
  thunderbolt: Networking doc
  thunderbolt: Adding maintainer entry

 Documentation/00-INDEX   |2 +
 Documentation/thunderbolt-networking.txt |  135 ++
 MAINTAINERS  |8 +-
 drivers/thunderbolt/Kconfig  |   25 +-
 drivers/thunderbolt/Makefile |3 +-
 drivers/thunderbolt/icm/Makefile |   28 +
 drivers/thunderbolt/icm/icm_nhi.c| 1641 +
 drivers/thunderbolt/icm/icm_nhi.h|   93 ++
 drivers/thunderbolt/icm/net.c| 2267 ++
 drivers/thunderbolt/icm/net.h|  270 
 drivers/thunderbolt/nhi_regs.h   |  115 +-
 11 files changed, 4578 insertions(+), 9 deletions(-)
 create mode 100644 Documentation/thunderbolt-networking.txt
 create mode 100644 drivers/thunderbolt/icm/Makefile
 create mode 100644 drivers/thunderbolt/icm/icm_nhi.c
 create mode 100644 drivers/thunderbolt/icm/icm_nhi.h
 create mode 100644 drivers/thunderbolt/icm/net.c
 create mode 100644 drivers/thunderbolt/icm/net.h

-- 
2.7.4



Unreachable default lo route in table 0

2016-07-28 Thread ashwanth


On an Android device (using a 3.18 kernel) we see multiple entries 
like the following added to route table 0 (unspec) by the kernel. We 
noticed that this typically happens when we toggle WIFI repeatedly. The 
problem is that under such a stress test, table 0 fills up 
with these entries (4096), and after that no further routes can be added, 
leading to instability.


unreachable default dev lo table 0 proto kernel metric 4294967295 error -101


Can you please let us know why this route gets added for the lo device 
and when it is supposed to be flushed out?


--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project


[PATCH net] cxgb4/cxgb4vf: Fixes regression in perf when tx vlan offload is disabled

2016-07-28 Thread Hariprasad Shenai
The commit 637d3e997351 ("cxgb4: Discard the packet if the length is
greater than mtu") introduced a regression in the VLAN interface
performance when Tx VLAN offload is disabled.

Check if skb is tagged, regardless of whether it is hardware accelerated
or not. Previously we checked only for the hardware-accelerated tag,
which caused performance to drop to ~0.17Mbps on a 10GbE adapter for a
VLAN interface when Tx VLAN offload is turned off using ethtool.
The Ethernet header length calculation went wrong in this case, and the
driver ended up dropping packets.
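
To make the length arithmetic concrete, here is a simplified user-space model
of the check. It is only a sketch: struct frame, make_frame() and the two
predicate functions are hypothetical stand-ins for the skb and for
skb_vlan_tag_present()/skb_vlan_tagged(), and the model assumes that with Tx
VLAN offload disabled the 4-byte tag is already part of the frame data.

```c
#include <stdbool.h>
#include <stddef.h>

#define ETH_HLEN  14
#define VLAN_HLEN 4

/* Hypothetical stand-in for the bits of an skb that matter here. */
struct frame {
	size_t len;        /* bytes handed to the driver, headers included */
	bool hwaccel_tag;  /* tag kept out of band for hardware insertion */
	bool inband_tag;   /* tag already present in the frame data */
};

static struct frame make_frame(size_t len, bool hwaccel_tag, bool inband_tag)
{
	struct frame f = { len, hwaccel_tag, inband_tag };
	return f;
}

/* Models the old check: only the hardware-accelerated tag counts. */
static bool tag_present_only(struct frame f)
{
	return f.hwaccel_tag;
}

/* Models the new check: either kind of tag counts. */
static bool tagged_any(struct frame f)
{
	return f.hwaccel_tag || f.inband_tag;
}

/* Returns true if the MTU check would discard the frame. */
static bool would_drop(struct frame f, size_t mtu, bool (*tagged)(struct frame))
{
	size_t max_pkt_len = ETH_HLEN + mtu;

	if (tagged(f))
		max_pkt_len += VLAN_HLEN;
	return f.len > max_pkt_len;
}
```

In this model a full-size frame on a 1500-byte MTU with an in-band tag is
1518 bytes long, so the old predicate rejects it while the new one lets it
through.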

Fixes: 637d3e997351 ("cxgb4: Discard the packet if the length is greater than mtu")

Signed-off-by: Hariprasad Shenai 
---
 drivers/net/ethernet/chelsio/cxgb4/sge.c   | 2 +-
 drivers/net/ethernet/chelsio/cxgb4vf/sge.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index bad253beb8c8..ad3552df0545 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -1192,7 +1192,7 @@ out_free: dev_kfree_skb_any(skb);
 
/* Discard the packet if the length is greater than mtu */
max_pkt_len = ETH_HLEN + dev->mtu;
-   if (skb_vlan_tag_present(skb))
+   if (skb_vlan_tagged(skb))
max_pkt_len += VLAN_HLEN;
if (!skb_shinfo(skb)->gso_size && (unlikely(skb->len > max_pkt_len)))
goto out_free;
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
index 1bb57d3fbbe8..c8fd4f8fe1fa 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
@@ -1188,7 +1188,7 @@ int t4vf_eth_xmit(struct sk_buff *skb, struct net_device *dev)
 
/* Discard the packet if the length is greater than mtu */
max_pkt_len = ETH_HLEN + dev->mtu;
-   if (skb_vlan_tag_present(skb))
+   if (skb_vlan_tagged(skb))
max_pkt_len += VLAN_HLEN;
if (!skb_shinfo(skb)->gso_size && (unlikely(skb->len > max_pkt_len)))
goto out_free;
-- 
2.3.4



Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-28 Thread Thomas Gleixner
On Tue, 26 Jul 2016, nick wrote:
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> index f42129d..e1830af 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -3797,7 +3797,7 @@ static irqreturn_t e1000_intr(int irq, void *data)
>   hw->get_link_status = 1;
>   /* guard against interrupt when we're going down */
>   if (!test_bit(__E1000_DOWN, &adapter->flags))
> - schedule_delayed_work(&adapter->watchdog_task, 1);
> + mod_work(&adapter->watchdog_task, jiffies + 1);

And that's not even funny anymore. Are you using a random generator to create
these patches?




Re: [PATCH 14/15] ethernet: stmicro: stmmac: stmmac_platform: add missing of_node_put after calling of_parse_phandle

2016-07-28 Thread Alexandre Torgue

Hi,

On 07/27/2016 04:20 AM, Peter Chen wrote:

of_node_put() needs to be called once the device node obtained
from of_parse_phandle() is no longer in use.

Signed-off-by: Peter Chen 
---
  drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index f7dfc0a..8d88782 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -113,8 +113,10 @@ static struct stmmac_axi *stmmac_axi_setup(struct platform_device *pdev)
return NULL;

axi = kzalloc(sizeof(*axi), GFP_KERNEL);
-   if (!axi)
+   if (!axi) {
+   of_node_put(np);
return ERR_PTR(-ENOMEM);
+   }

axi->axi_lpi_en = of_property_read_bool(np, "snps,lpi_en");
axi->axi_xit_frm = of_property_read_bool(np, "snps,xit_frm");
@@ -127,6 +129,7 @@ static struct stmmac_axi *stmmac_axi_setup(struct platform_device *pdev)
of_property_read_u32(np, "snps,wr_osr_lmt", &axi->axi_wr_osr_lmt);
of_property_read_u32(np, "snps,rd_osr_lmt", &axi->axi_rd_osr_lmt);
of_property_read_u32_array(np, "snps,blen", axi->axi_blen, AXI_BLEN);
+   of_node_put(np);

return axi;
  }



I agree with the modification inside stmmac_axi_setup(). I just have a question 
about np = pdev->dev.of_node inside stmmac_probe_config_dt() (same file): 
could we add an of_node_put(np) just before "return plat"?


Regards

alex


[PATCH 1/1] rps: Inspect PPTP encapsulated by GRE to get flow hash

2016-07-28 Thread fgao
From: Gao Feng 

PPTP is encapsulated in a GRE header whose GRE_VERSION bits must
contain one, but the current GRE RPS code requires GRE_VERSION to be
zero. So RPS does not work for PPTP traffic.

In my test environment there are four MIPS cores, and all traffic
passes through PPTP. As a result, only one core is 100% busy
while the other three cores are nearly idle. After this patch, the load
is well balanced across the four cores.
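
The header walk that the diff in this patch performs for version-1 GRE can be
sketched in user space roughly as follows. The flag values mirror
include/uapi/linux/if_tunnel.h but are written in host byte order here (the
kernel header wraps them in __cpu_to_be16()); pptp_gre_ppp_offset() is a
hypothetical name used only for this illustration.

```c
#include <stdint.h>

/* GRE flag bits in host byte order, for illustration only. */
#define GRE_CSUM      0x8000
#define GRE_ROUTING   0x4000
#define GRE_KEY       0x2000
#define GRE_SEQ       0x1000
#define GRE_STRICT    0x0800
#define GRE_REC       0x0700
#define GRE_ACK       0x0080
#define GRE_FLAGS     0x0078
#define GRE_VERSION   0x0007

#define GRE_PROTO_PPP 0x880b

/*
 * Returns the offset of the PPP header from the start of the GRE header,
 * or -1 if the header does not look like PPTP-in-GRE.
 */
static int pptp_gre_ppp_offset(uint16_t flags, uint16_t proto)
{
	int offset;

	if (flags & GRE_ROUTING)
		return -1;	/* routing is never dissected */
	if (!(flags & GRE_VERSION))
		return -1;	/* version 0 takes the original GRE path */
	if (proto != GRE_PROTO_PPP || !(flags & GRE_KEY))
		return -1;	/* PPTP needs the key and PPP payload */
	if (flags & (GRE_CSUM | GRE_STRICT | GRE_REC | GRE_FLAGS))
		return -1;	/* these bits must be clear for PPTP */

	offset = 4;		/* base GRE header */
	offset += 4;		/* payload length and call id */
	if (flags & GRE_SEQ)
		offset += 4;	/* sequence number */
	if (flags & GRE_ACK)
		offset += 4;	/* acknowledgment number */
	return offset;
}
```

The inner IP header then starts PPP_HDRLEN bytes after the returned offset,
which is the amount the diff adds to nhoff before dissecting again.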

Signed-off-by: Gao Feng 
---
 v2: Update according to Tom and Philp's advice. 
 1) Consolidate the code with GRE version 0 path;
 2) Use PPP_PROTOCOL to get the PPP protocol;
 3) Set the FLOW_DIS_ENCAPSULATION flag;
 v1: Initial patch 

 include/uapi/linux/if_tunnel.h |   5 +-
 net/core/flow_dissector.c  | 146 ++---
 2 files changed, 97 insertions(+), 54 deletions(-)

diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index 1046f55..dda4e4b 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -24,9 +24,12 @@
 #define GRE_SEQ    __cpu_to_be16(0x1000)
 #define GRE_STRICT __cpu_to_be16(0x0800)
 #define GRE_REC    __cpu_to_be16(0x0700)
-#define GRE_FLAGS  __cpu_to_be16(0x00F8)
+#define GRE_ACK    __cpu_to_be16(0x0080)
+#define GRE_FLAGS  __cpu_to_be16(0x0078)
 #define GRE_VERSION    __cpu_to_be16(0x0007)
 
+#define GRE_PROTO_PPP  __cpu_to_be16(0x880b)
+
 struct ip_tunnel_parm {
char name[IFNAMSIZ];
int link;
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 61ad43f..33e957b 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -346,63 +346,103 @@ ip_proto_again:
hdr = __skb_header_pointer(skb, nhoff, sizeof(_hdr), data, hlen, &_hdr);
if (!hdr)
goto out_bad;
-   /*
-* Only look inside GRE if version zero and no
-* routing
-*/
-   if (hdr->flags & (GRE_VERSION | GRE_ROUTING))
-   break;
-
-   proto = hdr->proto;
-   nhoff += 4;
-   if (hdr->flags & GRE_CSUM)
-   nhoff += 4;
-   if (hdr->flags & GRE_KEY) {
-   const __be32 *keyid;
-   __be32 _keyid;
-
-   keyid = __skb_header_pointer(skb, nhoff, sizeof(_keyid),
-data, hlen, &_keyid);
 
-   if (!keyid)
-   goto out_bad;
+   /* Only look inside GRE without routing */
+   if (!(hdr->flags & GRE_ROUTING)) {
+   proto = hdr->proto;
+
+   if (hdr->flags & GRE_VERSION) {
+   /* It should be the PPTP in GRE */
+   u8 _ppp_hdr[PPP_HDRLEN];
+   u8 *ppp_hdr;
+   int offset = 0;
+
+   /* Check the flags according to RFC 2637*/
+   if (!(proto == GRE_PROTO_PPP && (hdr->flags & GRE_KEY) &&
+ !(hdr->flags & (GRE_CSUM | GRE_STRICT | GRE_REC | GRE_FLAGS)))) {
+   break;
+   }
+
+   /* Skip GRE header */
+   offset += 4;
+   /* Skip payload length and call id */
+   offset += 4;
+
+   if (hdr->flags & GRE_SEQ)
+   offset += 4;
+
+   if (hdr->flags & GRE_ACK)
+   offset += 4;
+
+   ppp_hdr = skb_header_pointer(skb, nhoff + offset, sizeof(_ppp_hdr), _ppp_hdr);
+   if (!ppp_hdr)
+   goto out_bad;
+   proto = PPP_PROTOCOL(ppp_hdr);
+   if (proto == PPP_IP) {
+   nhoff += (PPP_HDRLEN + offset);
+   proto = htons(ETH_P_IP);
+   key_control->flags |= FLOW_DIS_ENCAPSULATION;
+   goto again;
+   } else if (proto == PPP_IPV6) {
+   nhoff += (PPP_HDRLEN + offset);
+   proto = htons(ETH_P_IPV6);
+   key_control->flags |= FLOW_DIS_ENCAPSULATION;
+   goto again;
+   }
+   } else {
+   /* Original GRE */
+  

RE: Microsemi VSC 8531/41 PHY Driver

2016-07-28 Thread Raju Lakkaraju
Hello Andrew,

Thank you for the valuable comments.
Please see my responses inline.

Thanks,
Raju

-Original Message-
From: Andrew Lunn [mailto:and...@lunn.ch] 
Sent: Tuesday, July 26, 2016 6:14 PM
To: Raju Lakkaraju
Cc: netdev@vger.kernel.org; f.faine...@gmail.com; Allan Nielsen
Subject: Re: Microsemi VSC 8531/41 PHY Driver

EXTERNAL EMAIL


> +/* RGMII Rx Clock delay value change with board lay-out */
> +static u8 rgmii_rx_clk_delay = RGMII_RX_CLK_DELAY_1_1_NS;

Doesn't this stop you from having a board with two PHYs with different layouts? 
You should be getting this value from the device tree.

Raju: As of now, the RGMII Rx clock delay should be 1.1 nsec, which is the 
optimized/recommended value. 
We tested on a Beaglebone Black with the VSC 8531 PHY.
We would like to provide a new function to configure the correct/required value 
based on the PHY layout, 
along with other RGMII configuration parameters, as part of our next 
implementation.

> + phydev->supported = (SUPPORTED_1000baseT_Full |
> +  SUPPORTED_1000baseT_Half |
> +  SUPPORTED_100baseT_Full  |
> +  SUPPORTED_100baseT_Half  |
> +  SUPPORTED_10baseT_Full   |
> +  SUPPORTED_10baseT_Half   |
> +  SUPPORTED_Autoneg|
> +  SUPPORTED_Pause  |
> +  SUPPORTED_Asym_Pause |
> +  SUPPORTED_TP);
> +
> + phydev->speed = SPEED_1000;
> + phydev->duplex = DUPLEX_FULL;
> + phydev->pause = 0;
> + phydev->asym_pause = 0;
> + phydev->interface = PHY_INTERFACE_MODE_RGMII;
> + phydev->mdix = ETH_TP_MDI_AUTO;

Why are you setting all these? This is not normal, if you look at other drivers.

Raju: I would like to set the default values in the software data structure 
(phydev). 
Our PHY supports 1G speed and RGMII.

> +
> + mutex_lock(&phydev->lock);

What are you locking against?

Raju: The VSC 8531 has several register pages. Whenever the PHY control 
registers are accessed over MDC/MDIO, 
the page number is set first and then the register is read or written. The 
default page is Page 0.
When I want to access a register on a non-default page, I have to lock PHY device 
access so that changing 
the page number and accessing the register happen as one atomic operation. 
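
The select-page, access, restore-page sequence described above can be modelled
in user space as below. This is only a sketch against a simulated PHY: every
name in it (struct sim_phy, the sim_* helpers, register 31 as the page-select
register, the lock_held flag standing in for the mutex) is an assumption made
for illustration and is not taken from the driver under review.

```c
#include <stdint.h>

#define SIM_NUM_PAGES       3
#define SIM_NUM_REGS        32
#define SIM_PAGE_SELECT_REG 31	/* assumed page-select register */
#define SIM_PAGE_STANDARD   0
#define SIM_PAGE_EXTENDED_2 2

/* Simulated PHY: one register bank per page plus the current page. */
struct sim_phy {
	int lock_held;		/* stands in for the mutex in phydev */
	uint16_t page;
	uint16_t regs[SIM_NUM_PAGES][SIM_NUM_REGS];
};

static void sim_lock(struct sim_phy *phy)   { phy->lock_held = 1; }
static void sim_unlock(struct sim_phy *phy) { phy->lock_held = 0; }

/* Write a register on the currently selected page; -1 on a bad argument. */
static int sim_write(struct sim_phy *phy, unsigned int reg, uint16_t val)
{
	if (reg >= SIM_NUM_REGS)
		return -1;
	if (reg == SIM_PAGE_SELECT_REG) {
		if (val >= SIM_NUM_PAGES)
			return -1;
		phy->page = val;
		return 0;
	}
	phy->regs[phy->page][reg] = val;
	return 0;
}

/* Read a register on the currently selected page; -1 on a bad argument. */
static int sim_read(const struct sim_phy *phy, unsigned int reg)
{
	if (reg >= SIM_NUM_REGS)
		return -1;
	if (reg == SIM_PAGE_SELECT_REG)
		return phy->page;
	return phy->regs[phy->page][reg];
}

/* Read-modify-write one register on a given page as a single locked unit. */
static int sim_paged_modify(struct sim_phy *phy, uint16_t page,
			    unsigned int reg, uint16_t mask, uint16_t set)
{
	int rc, rc_restore, val;

	sim_lock(phy);
	rc = sim_write(phy, SIM_PAGE_SELECT_REG, page);
	if (rc)
		goto out_unlock;

	val = sim_read(phy, reg);
	if (val < 0)
		rc = val;
	else
		rc = sim_write(phy, reg, (uint16_t)((val & ~mask) | set));

	/* Always return to the standard page, keeping the first error. */
	rc_restore = sim_write(phy, SIM_PAGE_SELECT_REG, SIM_PAGE_STANDARD);
	if (!rc)
		rc = rc_restore;

out_unlock:
	sim_unlock(phy);
	return rc;
}
```

The point of the single helper is that no other register access can observe
the PHY while a non-default page is selected, and that the error code from
the failing step is passed through unchanged.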

> + rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_EXTENDED_2);
> + if (rc != 0) {
> + rc = -EINVAL;

Why do you overwrite the error code vsc85xx_phy_page_set gives you?

Raju: Initially I wanted to create a new type of error code. Then I decided 
to use an existing one. 
I accept your comment. I will remove the code.

> + goto out_unlock;
> + }
> + reg_val = phy_read(phydev, MSCC_PHY_RGMII_CNTL);
> + reg_val &= ~(RGMII_RX_CLK_DELAY_MASK);
> + reg_val |= (rgmii_rx_clk_delay << RGMII_RX_CLK_DELAY_POS);
> + phy_write(phydev, MSCC_PHY_RGMII_CNTL, reg_val);
> + rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_STANDARD);
> + if (rc != 0)
> + rc = -EINVAL;

Same here.

Raju: I accept your comment. I will remove the code.

> +
> +out_unlock:
> + mutex_unlock(&phydev->lock);
> +
> + return rc;
> +}
> +
> +static int vsc85xx_config_init(struct phy_device *phydev) {
> + int rc = 0;

No need to initialise rc.

Raju: I accept your comment. I will remove the code.

> + rc = vsc85xx_default_config(phydev);

if (rc)
return rc;

> + rc = genphy_config_init(phydev);
> +
> + return rc;

Or just
return genphy_config_init(phydev);

Raju: I accept your comment. I will remove the code.

> +}
> +
> +static int vsc85xx_ack_interrupt(struct phy_device *phydev) {
> + int rc = 0;
> +
> + if (phydev->interrupts == PHY_INTERRUPT_ENABLED)
> + rc = phy_read(phydev, MII_VSC85XX_INT_STATUS);
> +
> + return (rc < 0) ? rc : 0;
> +}
> +
> +static int vsc85xx_config_intr(struct phy_device *phydev) {
> + int rc;
> +
> + if (phydev->interrupts == PHY_INTERRUPT_ENABLED) {
> + rc = phy_write(phydev, MII_VSC85XX_INT_MASK,
> +MII_VSC85XX_INT_MASK_MASK);
> + } else {
> + rc = phy_read(phydev, MII_VSC85XX_INT_STATUS);
> + if (rc < 0)
> + return rc;

And the purpose of this read is? I assume it clears an outstanding interrupt? 
If so, shouldn't you do it after disabling interrupts, not before? Otherwise 
you have a race condition.

Raju: The interrupt status register is clear-on-read. In the 
PHY_INTERRUPT_DISABLE case, 
I should make sure that the status is cleared. Reading the interrupt status 
register clears all preexisting interrupts.

> + rc = phy_write(phydev, MII_VSC85XX_INT_MASK, 0);
> + }
> +
> + return rc;

  Andrew


RE: [PATCH 06/15] ethernet: hisilicon: hns: hns_dsaf_mac: add missing of_node_put after calling of_parse_phandle

2016-07-28 Thread Peter Chen
 
>On 2016/7/27 10:20, Peter Chen wrote:
>> of_node_put() needs to be called once the device node obtained from
>> of_parse_phandle() is no longer in use.
>>
>> Signed-off-by: Peter Chen 
>> ---
>>  drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c | 9 ++---
>>  1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
>> b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
>> index 3fb87e2..18d72ea 100644
>> --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
>> +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
>> @@ -786,6 +786,7 @@ static int  hns_mac_get_info(struct hns_mac_cb *mac_cb)
>>  np = of_parse_phandle(mac_cb->dev->of_node, "phy-handle",
>>mac_cb->mac_id);
>>  mac_cb->phy_dev = of_phy_find_device(np);
>> +of_node_put(np);
>>  if (mac_cb->phy_dev) {
>>  /* refcount is held by of_phy_find_device()
>>   * if the phy_dev is found
>
>np is still accessed when of_phy_find_device() returns a non-NULL value, so 
>the of_node_put() has to be
>moved after the dev_dbg log.
>
>> @@ -804,6 +805,7 @@ static int  hns_mac_get_info(struct hns_mac_cb *mac_cb)
>>  np = of_parse_phandle(to_of_node(mac_cb->fw_port),
>>"phy-handle", 0);
>>  mac_cb->phy_dev = of_phy_find_device(np);
>> +of_node_put(np);
>>  if (mac_cb->phy_dev) {
>>  /* refcount is held by of_phy_find_device()
>>   * if the phy_dev is found
>
>np is still accessed when of_phy_find_device() returns a non-NULL value, so 
>the of_node_put() has to be
>moved after the dev_dbg log.
>

Thanks, I will change them.

Peter