[PATCH] Revert "net/ipv6: fix metrics leak"

2018-08-01 Thread David Miller


This reverts commit df18b50448fab1dff093731dfd0e25e77e1afcd1.

This change causes other problems and use-after-free situations as
found by syzbot.

Signed-off-by: David S. Miller 
---
 net/ipv6/ip6_fib.c | 18 --
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 211a2d437b56..d212738e9d10 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -167,22 +167,11 @@ struct fib6_info *fib6_info_alloc(gfp_t gfp_flags)
return f6i;
 }
 
-static void fib6_metrics_release(struct fib6_info *f6i)
-{
-   struct dst_metrics *m;
-
-   if (!f6i)
-   return;
-
-   m = f6i->fib6_metrics;
-   if (m != _default_metrics && refcount_dec_and_test(>refcnt))
-   kfree(m);
-}
-
 void fib6_info_destroy_rcu(struct rcu_head *head)
 {
struct fib6_info *f6i = container_of(head, struct fib6_info, rcu);
struct rt6_exception_bucket *bucket;
+   struct dst_metrics *m;
 
WARN_ON(f6i->fib6_node);
 
@@ -212,7 +201,9 @@ void fib6_info_destroy_rcu(struct rcu_head *head)
if (f6i->fib6_nh.nh_dev)
dev_put(f6i->fib6_nh.nh_dev);
 
-   fib6_metrics_release(f6i);
+   m = f6i->fib6_metrics;
+   if (m != _default_metrics && refcount_dec_and_test(>refcnt))
+   kfree(m);
 
kfree(f6i);
 }
@@ -896,7 +887,6 @@ static void fib6_drop_pcpu_from(struct fib6_info *f6i,
 
from = rcu_dereference_protected(pcpu_rt->from,
 lockdep_is_held(>tb6_lock));
-   fib6_metrics_release(from);
rcu_assign_pointer(pcpu_rt->from, NULL);
fib6_info_release(from);
}
-- 
2.17.1



[PATCH net-next] rxrpc: Remove set but not used variable 'nowj'

2018-08-01 Thread Wei Yongjun
Fixes gcc '-Wunused-but-set-variable' warning:

net/rxrpc/proc.c: In function 'rxrpc_call_seq_show':
net/rxrpc/proc.c:66:29: warning:
 variable 'nowj' set but not used [-Wunused-but-set-variable]
  unsigned long timeout = 0, nowj;
 ^

Signed-off-by: Wei Yongjun 
---
 net/rxrpc/proc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/rxrpc/proc.c b/net/rxrpc/proc.c
index 163d05d..9805e3b 100644
--- a/net/rxrpc/proc.c
+++ b/net/rxrpc/proc.c
@@ -63,7 +63,7 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v)
struct rxrpc_peer *peer;
struct rxrpc_call *call;
struct rxrpc_net *rxnet = rxrpc_net(seq_file_net(seq));
-   unsigned long timeout = 0, nowj;
+   unsigned long timeout = 0;
rxrpc_seq_t tx_hard_ack, rx_hard_ack;
char lbuff[50], rbuff[50];
 
@@ -97,7 +97,6 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v)
 
if (call->state != RXRPC_CALL_SERVER_PREALLOC) {
timeout = READ_ONCE(call->expect_rx_by);
-   nowj = jiffies;
timeout -= jiffies;
}



Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31

2018-08-01 Thread David Miller
From: Jakub Kicinski 
Date: Wed, 1 Aug 2018 17:00:47 -0700

> On Wed,  1 Aug 2018 14:52:45 -0700, Saeed Mahameed wrote:
>> - According to the discussion outcome, we are keeping the congestion control
>>   setting as mlx5 device specific for the current HW generation.
> 
> I still see queuing and marking based on queue level.  You want to add 
> a Qdisc that will mirror your HW's behaviour to offload, if you really
> believe this is not a subset of RED, why not...  But devlink params?

I totally agree, devlink seems like absolutely to wrong level and set
of interfaces to be doing this stuff.

I will not pull these changes in and I probably should have not
accepted the DCB changes from the other day and they were sneakily
leading up to this crap.

Sorry, please follow Jakub's lead as I think his approach makes much
more technical sense than using devlink for this.

Thanks.


Re: [Patch net-next] net: hns3: fix return value error while hclge_cmd_csq_clean failed

2018-08-01 Thread tanhuazhong




On 2018/8/2 1:03, David Miller wrote:

From: Huazhong Tan 
Date: Wed, 1 Aug 2018 17:53:28 +0800


From: fredalu 

While cleaning the command queue, the value of the HEAD register is not
in the range of next_to_clean and next_to_use, meaning that this value
is invalid. This also means that there is a hardware error and the
hardware will trigger a reset soon. At this time we should return an
error code instead of 0, and HCLGE_STATE_CMD_DISABLE needs to be set to
prevent sending command again.

Fixes: 3ff504908f95 ("net: hns3: fix a dead loop in hclge_cmd_csq_clean")
Signed-off-by: Huazhong Tan wq

 --->  ^^^

I think I know what text editor you use... :-)

Applied with 'wq' removed from your signoff... Thanks.

.



Sorry about this, I have sent V2 to fix it.
Thanks.:)



Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31

2018-08-01 Thread Alexander Duyck
On Wed, Aug 1, 2018 at 4:13 PM, Saeed Mahameed
 wrote:
> On Wed, Aug 1, 2018 at 3:34 PM, Alexander Duyck
>  wrote:
>> On Wed, Aug 1, 2018 at 2:52 PM, Saeed Mahameed  wrote:
>>> Hi Dave,
>>>
>>> This series provides devlink parameters updates to both devlink API and
>>> mlx5 driver, it is a 2nd iteration of the dropped patches sent in a previous
>>> mlx5 submission "net/mlx5: Support PCIe buffer congestion handling via
>>> Devlink" to address review comments [1].
>>>
>>> Changes from the original series:
>>> - According to the discussion outcome, we are keeping the congestion control
>>>   setting as mlx5 device specific for the current HW generation.
>>> - Changed the congestion_mode and congestion action param type to string
>>> - Added patches to fix devlink handling of param type string
>>> - Added a patch which adds extack messages support for param set.
>>> - At the end of this series, I've added yet another mlx5 devlink related
>>>  feature, firmware snapshot support.
>>>
>>> For more information please see tag log below.
>>>
>>> Please pull and let me know if there's any problem.
>>>
>>> [1] https://patchwork.ozlabs.org/patch/945996/
>>>
>>> Thanks,
>>> Saeed.
>>>
>>> ---
>>>
>>> The following changes since commit e6476c21447c4b17c47e476aade6facf050f31e8:
>>>
>>>   net: remove bogus RCU annotations on socket.wq (2018-07-31 12:40:22 -0700)
>>>
>>> are available in the Git repository at:
>>>
>>>   git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git 
>>> tags/mlx5-updates-2018-08-01
>>>
>>> for you to fetch changes up to 2ac6108c65ffcb1e5eab1fba1fd59272604d1c32:
>>>
>>>   net/mlx5: Use devlink region_snapshot parameter (2018-08-01 14:49:09 
>>> -0700)
>>>
>>> 
>>> mlx5-updates-2018-08-01
>>>
>>> This series provides devlink parameters updates to both devlink API and
>>> mlx5 driver,
>>>
>>> 1) Devlink changes: (Moshe Shemesh)
>>> The first two patches fix devlink param infrastructure for string type
>>> params.
>>> The third patch adds a devlink helper function to safely copy string from
>>> driver to devlink.
>>> The forth patch adds extack support for param set.
>>>
>>> 2) mlx5 specific congestion parameters: (Eran Ben Elisha)
>>> Next three patches add new devlink driver specific params for controlling
>>> congestion action and mode, using string type params and extack messages 
>>> support.
>>>
>>> This congestion mode enables hw workaround in specific devices which is
>>> controlled by devlink driver-specific params. The workaround is device
>>> specific for this NIC generation, the next NIC will not need it.
>>>
>>> Congestion parameters:
>>>  - Congestion action
>>> HW W/A mechanism in the PCIe buffer which monitors the amount of
>>> consumed PCIe buffer per host.  This mechanism supports the
>>> following actions in case of threshold overflow:
>>> - Disabled - NOP (Default)
>>> - Drop
>>> - Mark - Mark CE bit in the CQE of received packet
>>> - Congestion mode
>>> - Aggressive - Aggressive static trigger threshold (Default)
>>> - Dynamic - Dynamically change the trigger threshold
>>>
>>> 3) mlx5 firmware snapshot support via devlink: (Alex Vesker)
>>> Last three patches, add the support for capturing region snapshot of the
>>> firmware crspace during critical errors, using devlink region_snapshot
>>> parameter.
>>>
>>> -Saeed.
>>>
>>> 
>>> Alex Vesker (3):
>>>   net/mlx5: Add Vendor Specific Capability access gateway
>>>   net/mlx5: Add Crdump FW snapshot support
>>>   net/mlx5: Use devlink region_snapshot parameter
>>>
>>> Eran Ben Elisha (3):
>>>   net/mlx5: Move all devlink related functions calls to devlink.c
>>>   net/mlx5: Add MPEGC register configuration functionality
>>>   net/mlx5: Enable PCIe buffer congestion handling workaround via 
>>> devlink
>>>
>>> Moshe Shemesh (4):
>>>   devlink: Fix param set handling for string type
>>>   devlink: Fix param cmode driverinit for string type
>>>   devlink: Add helper function for safely copy string param
>>>   devlink: Add extack messages support to param set
>>>
>>>  drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c  |   3 +-
>>>  drivers/net/ethernet/mellanox/mlx4/main.c  |   6 +-
>>>  drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   3 +-
>>>  drivers/net/ethernet/mellanox/mlx5/core/devlink.c  | 388 
>>> +
>>>  drivers/net/ethernet/mellanox/mlx5/core/devlink.h  |  13 +
>>>  .../net/ethernet/mellanox/mlx5/core/diag/crdump.c  | 223 
>>>  drivers/net/ethernet/mellanox/mlx5/core/health.c   |   3 +
>>>  drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h |   4 +
>>>  .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c  | 320 +
>>>  .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h  |  56 +++
>>>  

Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31

2018-08-01 Thread Jakub Kicinski
On Wed,  1 Aug 2018 14:52:45 -0700, Saeed Mahameed wrote:
> - According to the discussion outcome, we are keeping the congestion control
>   setting as mlx5 device specific for the current HW generation.

I still see queuing and marking based on queue level.  You want to add 
a Qdisc that will mirror your HW's behaviour to offload, if you really
believe this is not a subset of RED, why not...  But devlink params?


re: hello

2018-08-01 Thread Erika@EuropeMillions




--
Congratulations, You Have won SIX HUNDRED AND FIFTY THOUSAND EURO in the
monthly Euro/ Google draws on July 1, 2018. reply to fill your claim 
form.


Erika Hermann
Online Coordinator
Desk038984EU


Re: [PATCH net-next] virtio_net: force_napi_tx module param.

2018-08-01 Thread Willem de Bruijn
On Wed, Aug 1, 2018 at 6:48 PM Michael S. Tsirkin  wrote:
>
> On Wed, Aug 01, 2018 at 06:43:53PM -0400, Willem de Bruijn wrote:
> >  > > So e.g. we could set an affinity hint to a group of CPUs that
> > > > > might transmit to this queue.
> > > >
> > > > We also want to set the xps mask for all cpus in the group to this 
> > > > queue.
> > > >
> > > > Is there a benefit over explicitly choosing one cpu from the set, btw?
> > >
> > > If only one CPU actually transmits on this queue then probably yes.
> > > And virtio doesn't know whether that's the case.
> >
> > Oh, yes, good example.
> >
> > To plumb that through, vp_set_vq_affinity would have to take a cpu
> > mask like irq_set_affinity_hint, instead of a single value.
> >
> > Though I don't suggest doing that as part of the core
> > virtnet_set_affinity changes.
>
> Why not? It seems a reasonable thing to do.

Well, at least let's split it into separate patches.


Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31

2018-08-01 Thread Saeed Mahameed
On Wed, Aug 1, 2018 at 3:34 PM, Alexander Duyck
 wrote:
> On Wed, Aug 1, 2018 at 2:52 PM, Saeed Mahameed  wrote:
>> Hi Dave,
>>
>> This series provides devlink parameters updates to both devlink API and
>> mlx5 driver, it is a 2nd iteration of the dropped patches sent in a previous
>> mlx5 submission "net/mlx5: Support PCIe buffer congestion handling via
>> Devlink" to address review comments [1].
>>
>> Changes from the original series:
>> - According to the discussion outcome, we are keeping the congestion control
>>   setting as mlx5 device specific for the current HW generation.
>> - Changed the congestion_mode and congestion action param type to string
>> - Added patches to fix devlink handling of param type string
>> - Added a patch which adds extack messages support for param set.
>> - At the end of this series, I've added yet another mlx5 devlink related
>>  feature, firmware snapshot support.
>>
>> For more information please see tag log below.
>>
>> Please pull and let me know if there's any problem.
>>
>> [1] https://patchwork.ozlabs.org/patch/945996/
>>
>> Thanks,
>> Saeed.
>>
>> ---
>>
>> The following changes since commit e6476c21447c4b17c47e476aade6facf050f31e8:
>>
>>   net: remove bogus RCU annotations on socket.wq (2018-07-31 12:40:22 -0700)
>>
>> are available in the Git repository at:
>>
>>   git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git 
>> tags/mlx5-updates-2018-08-01
>>
>> for you to fetch changes up to 2ac6108c65ffcb1e5eab1fba1fd59272604d1c32:
>>
>>   net/mlx5: Use devlink region_snapshot parameter (2018-08-01 14:49:09 -0700)
>>
>> 
>> mlx5-updates-2018-08-01
>>
>> This series provides devlink parameters updates to both devlink API and
>> mlx5 driver,
>>
>> 1) Devlink changes: (Moshe Shemesh)
>> The first two patches fix devlink param infrastructure for string type
>> params.
>> The third patch adds a devlink helper function to safely copy string from
>> driver to devlink.
>> The forth patch adds extack support for param set.
>>
>> 2) mlx5 specific congestion parameters: (Eran Ben Elisha)
>> Next three patches add new devlink driver specific params for controlling
>> congestion action and mode, using string type params and extack messages 
>> support.
>>
>> This congestion mode enables hw workaround in specific devices which is
>> controlled by devlink driver-specific params. The workaround is device
>> specific for this NIC generation, the next NIC will not need it.
>>
>> Congestion parameters:
>>  - Congestion action
>> HW W/A mechanism in the PCIe buffer which monitors the amount of
>> consumed PCIe buffer per host.  This mechanism supports the
>> following actions in case of threshold overflow:
>> - Disabled - NOP (Default)
>> - Drop
>> - Mark - Mark CE bit in the CQE of received packet
>> - Congestion mode
>> - Aggressive - Aggressive static trigger threshold (Default)
>> - Dynamic - Dynamically change the trigger threshold
>>
>> 3) mlx5 firmware snapshot support via devlink: (Alex Vesker)
>> Last three patches, add the support for capturing region snapshot of the
>> firmware crspace during critical errors, using devlink region_snapshot
>> parameter.
>>
>> -Saeed.
>>
>> 
>> Alex Vesker (3):
>>   net/mlx5: Add Vendor Specific Capability access gateway
>>   net/mlx5: Add Crdump FW snapshot support
>>   net/mlx5: Use devlink region_snapshot parameter
>>
>> Eran Ben Elisha (3):
>>   net/mlx5: Move all devlink related functions calls to devlink.c
>>   net/mlx5: Add MPEGC register configuration functionality
>>   net/mlx5: Enable PCIe buffer congestion handling workaround via devlink
>>
>> Moshe Shemesh (4):
>>   devlink: Fix param set handling for string type
>>   devlink: Fix param cmode driverinit for string type
>>   devlink: Add helper function for safely copy string param
>>   devlink: Add extack messages support to param set
>>
>>  drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c  |   3 +-
>>  drivers/net/ethernet/mellanox/mlx4/main.c  |   6 +-
>>  drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   3 +-
>>  drivers/net/ethernet/mellanox/mlx5/core/devlink.c  | 388 
>> +
>>  drivers/net/ethernet/mellanox/mlx5/core/devlink.h  |  13 +
>>  .../net/ethernet/mellanox/mlx5/core/diag/crdump.c  | 223 
>>  drivers/net/ethernet/mellanox/mlx5/core/health.c   |   3 +
>>  drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h |   4 +
>>  .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c  | 320 +
>>  .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h  |  56 +++
>>  drivers/net/ethernet/mellanox/mlx5/core/main.c |  10 +-
>>  include/linux/mlx5/driver.h|   5 +
>>  include/net/devlink.h  |  15 +-
>>  

Re: [PATCH net-next] virtio_net: force_napi_tx module param.

2018-08-01 Thread Michael S. Tsirkin
On Wed, Aug 01, 2018 at 06:43:53PM -0400, Willem de Bruijn wrote:
>  > > So e.g. we could set an affinity hint to a group of CPUs that
> > > > might transmit to this queue.
> > >
> > > We also want to set the xps mask for all cpus in the group to this queue.
> > >
> > > Is there a benefit over explicitly choosing one cpu from the set, btw?
> >
> > If only one CPU actually transmits on this queue then probably yes.
> > And virtio doesn't know whether that's the case.
> 
> Oh, yes, good example.
> 
> To plumb that through, vp_set_vq_affinity would have to take a cpu
> mask like irq_set_affinity_hint, instead of a single value.
> 
> Though I don't suggest doing that as part of the core
> virtnet_set_affinity changes.

Why not? It seems a reasonable thing to do.

-- 
MST


Re: [PATCH net-next] virtio_net: force_napi_tx module param.

2018-08-01 Thread Willem de Bruijn
 > > So e.g. we could set an affinity hint to a group of CPUs that
> > > might transmit to this queue.
> >
> > We also want to set the xps mask for all cpus in the group to this queue.
> >
> > Is there a benefit over explicitly choosing one cpu from the set, btw?
>
> If only one CPU actually transmits on this queue then probably yes.
> And virtio doesn't know whether that's the case.

Oh, yes, good example.

To plumb that through, vp_set_vq_affinity would have to take a cpu
mask like irq_set_affinity_hint, instead of a single value.

Though I don't suggest doing that as part of the core
virtnet_set_affinity changes.


Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31

2018-08-01 Thread Alexander Duyck
On Wed, Aug 1, 2018 at 2:52 PM, Saeed Mahameed  wrote:
> Hi Dave,
>
> This series provides devlink parameters updates to both devlink API and
> mlx5 driver, it is a 2nd iteration of the dropped patches sent in a previous
> mlx5 submission "net/mlx5: Support PCIe buffer congestion handling via
> Devlink" to address review comments [1].
>
> Changes from the original series:
> - According to the discussion outcome, we are keeping the congestion control
>   setting as mlx5 device specific for the current HW generation.
> - Changed the congestion_mode and congestion action param type to string
> - Added patches to fix devlink handling of param type string
> - Added a patch which adds extack messages support for param set.
> - At the end of this series, I've added yet another mlx5 devlink related
>  feature, firmware snapshot support.
>
> For more information please see tag log below.
>
> Please pull and let me know if there's any problem.
>
> [1] https://patchwork.ozlabs.org/patch/945996/
>
> Thanks,
> Saeed.
>
> ---
>
> The following changes since commit e6476c21447c4b17c47e476aade6facf050f31e8:
>
>   net: remove bogus RCU annotations on socket.wq (2018-07-31 12:40:22 -0700)
>
> are available in the Git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git 
> tags/mlx5-updates-2018-08-01
>
> for you to fetch changes up to 2ac6108c65ffcb1e5eab1fba1fd59272604d1c32:
>
>   net/mlx5: Use devlink region_snapshot parameter (2018-08-01 14:49:09 -0700)
>
> 
> mlx5-updates-2018-08-01
>
> This series provides devlink parameters updates to both devlink API and
> mlx5 driver,
>
> 1) Devlink changes: (Moshe Shemesh)
> The first two patches fix devlink param infrastructure for string type
> params.
> The third patch adds a devlink helper function to safely copy string from
> driver to devlink.
> The forth patch adds extack support for param set.
>
> 2) mlx5 specific congestion parameters: (Eran Ben Elisha)
> Next three patches add new devlink driver specific params for controlling
> congestion action and mode, using string type params and extack messages 
> support.
>
> This congestion mode enables hw workaround in specific devices which is
> controlled by devlink driver-specific params. The workaround is device
> specific for this NIC generation, the next NIC will not need it.
>
> Congestion parameters:
>  - Congestion action
> HW W/A mechanism in the PCIe buffer which monitors the amount of
> consumed PCIe buffer per host.  This mechanism supports the
> following actions in case of threshold overflow:
> - Disabled - NOP (Default)
> - Drop
> - Mark - Mark CE bit in the CQE of received packet
> - Congestion mode
> - Aggressive - Aggressive static trigger threshold (Default)
> - Dynamic - Dynamically change the trigger threshold
>
> 3) mlx5 firmware snapshot support via devlink: (Alex Vesker)
> Last three patches, add the support for capturing region snapshot of the
> firmware crspace during critical errors, using devlink region_snapshot
> parameter.
>
> -Saeed.
>
> 
> Alex Vesker (3):
>   net/mlx5: Add Vendor Specific Capability access gateway
>   net/mlx5: Add Crdump FW snapshot support
>   net/mlx5: Use devlink region_snapshot parameter
>
> Eran Ben Elisha (3):
>   net/mlx5: Move all devlink related functions calls to devlink.c
>   net/mlx5: Add MPEGC register configuration functionality
>   net/mlx5: Enable PCIe buffer congestion handling workaround via devlink
>
> Moshe Shemesh (4):
>   devlink: Fix param set handling for string type
>   devlink: Fix param cmode driverinit for string type
>   devlink: Add helper function for safely copy string param
>   devlink: Add extack messages support to param set
>
>  drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c  |   3 +-
>  drivers/net/ethernet/mellanox/mlx4/main.c  |   6 +-
>  drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   3 +-
>  drivers/net/ethernet/mellanox/mlx5/core/devlink.c  | 388 
> +
>  drivers/net/ethernet/mellanox/mlx5/core/devlink.h  |  13 +
>  .../net/ethernet/mellanox/mlx5/core/diag/crdump.c  | 223 
>  drivers/net/ethernet/mellanox/mlx5/core/health.c   |   3 +
>  drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h |   4 +
>  .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c  | 320 +
>  .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h  |  56 +++
>  drivers/net/ethernet/mellanox/mlx5/core/main.c |  10 +-
>  include/linux/mlx5/driver.h|   5 +
>  include/net/devlink.h  |  15 +-
>  net/core/devlink.c |  44 ++-
>  14 files changed, 1076 insertions(+), 17 deletions(-)
>  create mode 100644 

Re: [net-next 01/10] devlink: Fix param set handling for string type

2018-08-01 Thread Jakub Kicinski
On Wed,  1 Aug 2018 14:52:46 -0700, Saeed Mahameed wrote:
> From: Moshe Shemesh 
> 
> In case devlink param type is string, it needs to copy the string value
> it got from the input to devlink_param_value.
> 
> Fixes: e3b7ca18ad7b ("devlink: Add param set command")
> Signed-off-by: Moshe Shemesh 
> Acked-by: Jiri Pirko 
> Signed-off-by: Saeed Mahameed 
> ---
>  include/net/devlink.h | 2 +-
>  net/core/devlink.c| 3 ++-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/devlink.h b/include/net/devlink.h
> index b9b89d6604d4..b0e17c025fdc 100644
> --- a/include/net/devlink.h
> +++ b/include/net/devlink.h
> @@ -311,7 +311,7 @@ union devlink_param_value {
>   u8 vu8;
>   u16 vu16;
>   u32 vu32;
> - const char *vstr;
> + char vstr[DEVLINK_PARAM_MAX_STRING_VALUE];
>   bool vbool;
>  };
>  
> diff --git a/net/core/devlink.c b/net/core/devlink.c
> index 65fc366a78a4..61e126c28526 100644
> --- a/net/core/devlink.c
> +++ b/net/core/devlink.c
> @@ -3014,7 +3014,8 @@ devlink_param_value_get_from_info(const struct 
> devlink_param *param,
>   if (nla_len(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]) >
>   DEVLINK_PARAM_MAX_STRING_VALUE)
>   return -EINVAL;
> - value->vstr = 
> nla_data(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]);
> + strcpy(value->vstr,
> +nla_data(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]));

DEVLINK_ATTR_PARAM_VALUE_DATA is a type mux, I'm struggling to find in
the code where it's checked for being null-terminated for strings.

>   break;
>   case DEVLINK_PARAM_TYPE_BOOL:
>   value->vbool = info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA] ?



Re: [PATCH net-next] virtio_net: force_napi_tx module param.

2018-08-01 Thread Michael S. Tsirkin
On Mon, Jul 30, 2018 at 02:06:50PM +0800, Jason Wang wrote:
> 
> 
> On 2018年07月25日 08:17, Jon Olson wrote:
> > On Tue, Jul 24, 2018 at 3:46 PM Michael S. Tsirkin  wrote:
> > > On Tue, Jul 24, 2018 at 06:31:54PM -0400, Willem de Bruijn wrote:
> > > > On Tue, Jul 24, 2018 at 6:23 PM Michael S. Tsirkin  
> > > > wrote:
> > > > > On Tue, Jul 24, 2018 at 04:52:53PM -0400, Willem de Bruijn wrote:
> > > > > > >From the above linked patch, I understand that there are yet
> > > > > > other special cases in production, such as a hard cap on #tx queues 
> > > > > > to
> > > > > > 32 regardless of number of vcpus.
> > > > > I don't think upstream kernels have this limit - we can
> > > > > now use vmalloc for higher number of queues.
> > > > Yes. that patch* mentioned it as a google compute engine imposed
> > > > limit. It is exactly such cloud provider imposed rules that I'm
> > > > concerned about working around in upstream drivers.
> > > > 
> > > > * for reference, I mean https://patchwork.ozlabs.org/patch/725249/
> > > Yea. Why does GCE do it btw?
> > There are a few reasons for the limit, some historical, some current.
> > 
> > Historically we did this because of a kernel limit on the number of
> > TAP queues (in Montreal I thought this limit was 32). To my chagrin,
> > the limit upstream at the time we did it was actually eight. We had
> > increased the limit from eight to 32 internally, and it appears in
> > upstream it has subsequently increased upstream to 256. We no longer
> > use TAP for networking, so that constraint no longer applies for us,
> > but when looking at removing/raising the limit we discovered no
> > workloads that clearly benefited from lifting it, and it also placed
> > more pressure on our virtual networking stack particularly on the Tx
> > side. We left it as-is.
> > 
> > In terms of current reasons there are really two. One is memory usage.
> > As you know, virtio-net uses rx/tx pairs, so there's an expectation
> > that the guest will have an Rx queue for every Tx queue. We run our
> > individual virtqueues fairly deep (4096 entries) to give guests a wide
> > time window for re-posting Rx buffers and avoiding starvation on
> > packet delivery. Filling an Rx vring with max-sized mergeable buffers
> > (4096 bytes) is 16MB of GFP_ATOMIC allocations. At 32 queues this can
> > be up to 512MB of memory posted for network buffers. Scaling this to
> > the largest VM GCE offers today (160 VCPUs -- n1-ultramem-160) keeping
> > all of the Rx rings full would (in the large average Rx packet size
> > case) consume up to 2.5 GB(!) of guest RAM. Now, those VMs have 3.8T
> > of RAM available, but I don't believe we've observed a situation where
> > they would have benefited from having 2.5 gigs of buffers posted for
> > incoming network traffic :)
> 
> We can work to have async txq and rxq instead of paris if there's a strong
> requirement.

I think the reason we don't is because RX queueing is programmed by TX
packets.  It might make sense if we support RX queueing policy that
isn't dependent on TX.


> > 
> > The second reason is interrupt related -- as I mentioned above, we
> > have found no workloads that clearly benefit from so many queues, but
> > we have found workloads that degrade. In particular workloads that do
> > a lot of small packet processing but which aren't extremely latency
> > sensitive can achieve higher PPS by taking fewer interrupt across
> > fewer VCPUs due to better batching (this also incurs higher latency,
> > but at the limit the "busy" cores end up suppressing most interrupts
> > and spending most of their cycles farming out work). Memcache is a
> > good example here, particularly if the latency targets for request
> > completion are in the ~milliseconds range (rather than the
> > microseconds we typically strive for with TCP_RR-style workloads).
> > 
> > All of that said, we haven't been forthcoming with data (and
> > unfortunately I don't have it handy in a useful form, otherwise I'd
> > simply post it here), so I understand the hesitation to simply run
> > with napi_tx across the board. As Willem said, this patch seemed like
> > the least disruptive way to allow us to continue down the road of
> > "universal" NAPI Tx and to hopefully get data across enough workloads
> > (with VMs small, large, and absurdly large :) to present a compelling
> > argument in one direction or another. As far as I know there aren't
> > currently any NAPI related ethtool commands (based on a quick perusal
> > of ethtool.h)
> 
> As I suggest before, maybe we can (ab)use tx-frames-irq.
> 
> Thanks
> 
> > -- it seems like it would be fairly involved/heavyweight
> > to plumb one solely for this unless NAPI Tx is something many users
> > will want to tune (and for which other drivers would support tuning).
> > 
> > --
> > Jon Olson


Re: [PATCH net-next] virtio_net: force_napi_tx module param.

2018-08-01 Thread Michael S. Tsirkin
On Wed, Aug 01, 2018 at 11:46:14AM -0400, Willem de Bruijn wrote:
> On Tue, Jul 31, 2018 at 8:34 AM Michael S. Tsirkin  wrote:
> >
> > On Sun, Jul 29, 2018 at 05:32:56PM -0400, Willem de Bruijn wrote:
> > > On Sun, Jul 29, 2018 at 12:01 PM David Miller  wrote:
> > > >
> > > > From: Caleb Raitto 
> > > > Date: Mon, 23 Jul 2018 16:11:19 -0700
> > > >
> > > > > From: Caleb Raitto 
> > > > >
> > > > > The driver disables tx napi if it's not certain that completions will
> > > > > be processed affine with tx service.
> > > > >
> > > > > Its heuristic doesn't account for some scenarios where it is, such as
> > > > > when the queue pair count matches the core but not hyperthread count.
> > > > >
> > > > > Allow userspace to override the heuristic. This is an alternative
> > > > > solution to that in the linked patch. That added more logic in the
> > > > > kernel for these cases, but the agreement was that this was better 
> > > > > left
> > > > > to user control.
> > > > >
> > > > > Do not expand the existing napi_tx variable to a ternary value,
> > > > > because doing so can break user applications that expect
> > > > > boolean ('Y'/'N') instead of integer output. Add a new param instead.
> > > > >
> > > > > Link: https://patchwork.ozlabs.org/patch/725249/
> > > > > Acked-by: Willem de Bruijn 
> > > > > Acked-by: Jon Olson 
> > > > > Signed-off-by: Caleb Raitto 
> > > >
> > > > So I looked into the history surrounding these issues.
> > > >
> > > > First of all, it's always ends up turning out crummy when drivers start
> > > > to set affinities themselves.  The worst possible case is to do it
> > > > _conditionally_, and that is exactly what virtio_net is doing.
> > > >
> > > > From the user's perspective, this provides a really bad experience.
> > > >
> > > > So if I have a 32-queue device and there are 32 cpus, you'll do all
> > > > the affinity settings, stopping Irqbalanced from doing anything
> > > > right?
> > > >
> > > > So if I add one more cpu, you'll say "oops, no idea what to do in
> > > > this situation" and not touch the affinities at all?
> > > >
> > > > That makes no sense at all.
> > > >
> > > > If the driver is going to set affinities at all, OWN that decision
> > > > and set it all the time to something reasonable.
> > > >
> > > > Or accept that you shouldn't be touching this stuff in the first place
> > > > and leave the affinities alone.
> > > >
> > > > Right now we're kinda in a situation where the driver has been setting
> > > > affinities in the ncpus==nqueues cases for some time, so we can't stop
> > > > doing it.
> > > >
> > > > Which means we have to set them in all cases to make the user
> > > > experience sane again.
> > > >
> > > > I looked at the linked to patch again:
> > > >
> > > > https://patchwork.ozlabs.org/patch/725249/
> > > >
> > > > And I think the strategy should be made more generic, to get rid of
> > > > the hyperthreading assumptions.  I also agree that the "assign
> > > > to first N cpus" logic doesn't make much sense either.
> > > >
> > > > Just distribute across the available cpus evenly, and be done with it.
> > >
> > > Sounds good to me.
> >
> > So e.g. we could set an affinity hint to a group of CPUs that
> > might transmit to this queue.
> 
> We also want to set the xps mask for all cpus in the group to this queue.
> 
> Is there a benefit over explicitly choosing one cpu from the set, btw?

If only one CPU actually transmits on this queue then probably yes.
And virtio doesn't know whether that's the case.


> I assumed striping. Something along the lines of
> 
>   int stripe = max_t(int, num_online_cpus() / vi->curr_queue_pairs, 1);
>   int vq = 0;
> 
>   cpumask_clear(xps_mask);
> 
>   for_each_online_cpu(cpu) {
>   cpumask_set_cpu(cpu, xps_mask);
> 
>   if ((i + 1) % stripe == 0) {
>   virtqueue_set_affinity(vi->rq[vq].vq, cpu);
>   virtqueue_set_affinity(vi->sq[vq].vq, cpu);
>   netif_set_xps_queue(vi->dev, xps_mask, vq);
>   cpumask_clear(xps_mask);
>   vq++;
>  }
>   i++;
>   }


Re: [net-next 07/10] net/mlx5: Enable PCIe buffer congestion handling workaround via devlink

2018-08-01 Thread Alexander Duyck
On Wed, Aug 1, 2018 at 2:52 PM, Saeed Mahameed  wrote:
> From: Eran Ben Elisha 
>
> Add support for two driver parameters via devlink params interface:
> - Congestion action
> HW mechanism in the PCIe buffer which monitors the amount of
> consumed PCIe buffer per host. This mechanism supports the
> following actions in case of threshold overflow:
> - disabled - NOP (Default)
> - drop
> - mark - Mark CE bit in the CQE of received packet

Any chance you could clarify the differences between "disabled" and
"drop"? I am assuming the "drop" is a head-of-line drop versus the
"disabled" being a incoming packet drop?

Also I still don't see this as necessarily being all that unique of a
feature/issue. Basically being PCIe bus limited is not all that
uncommon of a thing and has existed since the early days of PCI. In
the case of the Intel NICs we just throw a warning and end up dropping
the incoming packets instead of providing the two other options you
have listed.

> - Congestion mode
> - aggressive - Aggressive static trigger threshold (Default)
> - dynamic - Dynamically change the trigger threshold
>
> These driver-specific params enable the NIC HW workaround to handle
> buffer congestion on the current NIC generation.

Is there any documentation anywhere for any of these features? In the
patch set I see you adding interfaces, but I don't see them documented
anywhere.

> Signed-off-by: Eran Ben Elisha 
> Reviewed-by: Moshe Shemesh 
> Reviewed-by: Jiri Pirko 
> Signed-off-by: Saeed Mahameed 
> ---
>  .../net/ethernet/mellanox/mlx5/core/devlink.c | 204 +-
>  1 file changed, 203 insertions(+), 1 deletion(-)

Ideally there should be some documentation going into the kernel when
you extend the devlink interface at least so that I know how to use
your new interfaces when you define them. Just updating devlink.c
seems like a messy way to do things.

- Alex


[net-next 09/10] net/mlx5: Add Crdump FW snapshot support

2018-08-01 Thread Saeed Mahameed
From: Alex Vesker 

Crdump allows the driver to create a snapshot of the FW PCI
crspace. This is useful in case of catastrophic issues which
require FW reset. The snapshot can be used for later debug.

The snapshot is exposed using devlink, cr-space
address regions are registered on init and snapshots are attached
once a new snapshot is collected by the driver.

Signed-off-by: Alex Vesker 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   3 +-
 .../ethernet/mellanox/mlx5/core/diag/crdump.c | 201 ++
 .../net/ethernet/mellanox/mlx5/core/health.c  |   1 +
 .../ethernet/mellanox/mlx5/core/lib/mlx5.h|   2 +
 .../net/ethernet/mellanox/mlx5/core/main.c|   5 +
 include/linux/mlx5/driver.h   |   4 +
 6 files changed, 215 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 15f6916efe1b..6ea9e9462c77 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -6,7 +6,8 @@ mlx5_core-y :=  main.o cmd.o debugfs.o fw.o eq.o uar.o 
pagealloc.o \
health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o \
mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
fs_counters.o rl.o lag.o dev.o wq.o lib/gid.o lib/clock.o \
-   lib/pci_vsc.o diag/fs_tracepoint.o diag/fw_tracer.o devlink.o
+   lib/pci_vsc.o diag/fs_tracepoint.o diag/fw_tracer.o 
diag/crdump.o \
+   devlink.o
 
 mlx5_core-$(CONFIG_MLX5_ACCEL) += accel/ipsec.o accel/tls.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c 
b/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
new file mode 100644
index ..fe779e62fc70
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
@@ -0,0 +1,201 @@
+/*
+ * Copyright (c) 2018, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+#include "mlx5_core.h"
+#include "lib/pci_vsc.h"
+
+#define BAD_ACCESS 0xBADACCE5
+#define MLX5_PROTECTED_CR_SCAN_CRSPACE 0x7
+#define MAX_NUM_OF_DUMPS_TO_STORE  (8)
+
+static const char *region_cr_space_str = "cr-space";
+
+struct mlx5_fw_crdump {
+   u32 size;
+   struct devlink_region   *region_crspace;
+};
+
+bool mlx5_crdump_enbaled(struct mlx5_core_dev *dev)
+{
+   struct mlx5_priv *priv = >priv;
+
+   return (!!priv->health.crdump);
+}
+
+static int mlx5_crdump_fill(struct mlx5_core_dev *dev)
+{
+   struct devlink *devlink = priv_to_devlink(dev);
+   struct mlx5_priv *priv = >priv;
+   struct mlx5_fw_crdump *crdump = priv->health.crdump;
+   int i, ret = 0;
+   u32 *cr_data;
+   u32 id;
+
+   cr_data = kvmalloc(crdump->size, GFP_KERNEL);
+   if (!cr_data)
+   return -ENOMEM;
+
+   for (i = 0; i < (crdump->size / 4); i++)
+   cr_data[i] = BAD_ACCESS;
+
+   ret = mlx5_vsc_gw_read_block_fast(dev, cr_data, crdump->size);
+   if (ret <= 0)
+   goto free_data;
+
+   if (crdump->size != ret) {
+   mlx5_core_warn(dev, "failed to read full dump, read %d out of 
%u\n",
+  ret, crdump->size);
+   ret = -EINVAL;
+   goto free_data;
+   }
+
+   /* Get the available snapshot ID for the dumps */
+   id = devlink_region_shapshot_id_get(devlink);
+   ret = 

[net-next 10/10] net/mlx5: Use devlink region_snapshot parameter

2018-08-01 Thread Saeed Mahameed
From: Alex Vesker 

This parameter enables capturing region snapshot of the crspace
during critical errors. The default value of this parameter is
disabled, it can be enabled using devlink param commands.
It is possible to configure during runtime and also driver init.

Signed-off-by: Alex Vesker 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 49 +++
 .../ethernet/mellanox/mlx5/core/diag/crdump.c | 22 +
 .../ethernet/mellanox/mlx5/core/lib/mlx5.h|  2 +
 3 files changed, 73 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c 
b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index ec0ca690839e..4e33049096d0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -2,6 +2,7 @@
 /* Copyright (c) 2018, Mellanox Technologies inc. All rights reserved. */
 
 #include 
+#include "lib/mlx5.h"
 
 enum {
MLX5_DEVLINK_MPEGC_FIELD_SELECT_TX_OVERFLOW_DROP_EN = BIT(0),
@@ -288,6 +289,24 @@ static int mlx5_devlink_get_congestion_mode(struct devlink 
*devlink, u32 id,
return 0;
 }
 
+static int mlx5_devlink_get_crdump_snapshot(struct devlink *devlink, u32 id,
+   struct devlink_param_gset_ctx *ctx)
+{
+   struct mlx5_core_dev *dev = devlink_priv(devlink);
+
+   ctx->val.vbool = mlx5_crdump_is_snapshot_enabled(dev);
+   return 0;
+}
+
+static int mlx5_devlink_set_crdump_snapshot(struct devlink *devlink, u32 id,
+   struct devlink_param_gset_ctx *ctx,
+   struct netlink_ext_ack *extack)
+{
+   struct mlx5_core_dev *dev = devlink_priv(devlink);
+
+   return mlx5_crdump_set_snapshot_enabled(dev, ctx->val.vbool);
+}
+
 enum mlx5_devlink_param_id {
MLX5_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
MLX5_DEVLINK_PARAM_ID_CONGESTION_ACTION,
@@ -295,6 +314,11 @@ enum mlx5_devlink_param_id {
 };
 
 static const struct devlink_param mlx5_devlink_params[] = {
+   DEVLINK_PARAM_GENERIC(REGION_SNAPSHOT,
+ BIT(DEVLINK_PARAM_CMODE_RUNTIME) |
+ BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
+ mlx5_devlink_get_crdump_snapshot,
+ mlx5_devlink_set_crdump_snapshot, NULL),
DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_CONGESTION_ACTION,
 "congestion_action",
 DEVLINK_PARAM_TYPE_STRING,
@@ -309,6 +333,29 @@ static const struct devlink_param mlx5_devlink_params[] = {
 mlx5_devlink_set_congestion_mode, NULL),
 };
 
+static void mlx5_devlink_set_init_value(struct devlink *devlink, u32 param_id,
+   union devlink_param_value init_val)
+{
+   struct mlx5_core_dev *dev = devlink_priv(devlink);
+   int err;
+
+   err = devlink_param_driverinit_value_set(devlink, param_id, init_val);
+   if (err)
+   dev_warn(>pdev->dev,
+"devlink set parameter %u value failed (err = %d)",
+param_id, err);
+}
+
+static void mlx5_devlink_set_params_init_values(struct devlink *devlink)
+{
+   union devlink_param_value value;
+
+   value.vbool = false;
+   mlx5_devlink_set_init_value(devlink,
+   DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT,
+   value);
+}
+
 int mlx5_devlink_register(struct devlink *devlink, struct device *dev)
 {
int err;
@@ -324,6 +371,8 @@ int mlx5_devlink_register(struct devlink *devlink, struct 
device *dev)
goto unregister;
}
 
+   mlx5_devlink_set_params_init_values(devlink);
+
return 0;
 
 unregister:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c 
b/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
index fe779e62fc70..94b74b256eaa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
@@ -44,6 +44,7 @@ static const char *region_cr_space_str = "cr-space";
 
 struct mlx5_fw_crdump {
u32 size;
+   boolsnapshot_enable;
struct devlink_region   *region_crspace;
 };
 
@@ -125,6 +126,27 @@ int mlx5_crdump_collect(struct mlx5_core_dev *dev)
return ret;
 }
 
+bool mlx5_crdump_is_snapshot_enabled(struct mlx5_core_dev *dev)
+{
+   struct mlx5_priv *priv = >priv;
+
+   if (mlx5_crdump_enbaled(dev))
+   return priv->health.crdump->snapshot_enable;
+
+   return false;
+}
+
+int mlx5_crdump_set_snapshot_enabled(struct mlx5_core_dev *dev, bool value)
+{
+   struct mlx5_priv *priv = >priv;
+
+   if (!mlx5_crdump_enbaled(dev))
+   return -ENODEV;
+
+   priv->health.crdump->snapshot_enable = value;
+   

[net-next 07/10] net/mlx5: Enable PCIe buffer congestion handling workaround via devlink

2018-08-01 Thread Saeed Mahameed
From: Eran Ben Elisha 

Add support for two driver parameters via devlink params interface:
- Congestion action
HW mechanism in the PCIe buffer which monitors the amount of
consumed PCIe buffer per host. This mechanism supports the
following actions in case of threshold overflow:
- disabled - NOP (Default)
- drop
- mark - Mark CE bit in the CQE of received packet
- Congestion mode
- aggressive - Aggressive static trigger threshold (Default)
- dynamic - Dynamically change the trigger threshold

These driver-specific params enable the NIC HW workaround to handle
buffer congestion on the current NIC generation.

Signed-off-by: Eran Ben Elisha 
Reviewed-by: Moshe Shemesh 
Reviewed-by: Jiri Pirko 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 204 +-
 1 file changed, 203 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c 
b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index e3a5de6b4ee7..ec0ca690839e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -126,12 +126,214 @@ static int mlx5_devlink_query_tx_overflow_sense(struct 
mlx5_core_dev *mdev,
return 0;
 }
 
+static const char *const action_to_str[] = {
+   [MLX5_DEVLINK_CONGESTION_ACTION_DISABLED] = "disabled",
+   [MLX5_DEVLINK_CONGESTION_ACTION_DROP] = "drop",
+   [MLX5_DEVLINK_CONGESTION_ACTION_MARK] = "mark"
+};
+
+static const char *mlx5_devlink_congestion_action_to_str(int action)
+{
+   if (action > MLX5_DEVLINK_CONGESTION_ACTION_MAX) {
+   WARN_ON(1);
+   return ERR_PTR(-EINVAL);
+   }
+
+   return action_to_str[action];
+}
+
+static int mlx5_devlink_str_to_congestion_action(const char *str, u8 *action)
+{
+   int i;
+
+   for (i = 0; i <= MLX5_DEVLINK_CONGESTION_ACTION_MAX; i++) {
+   if (!strcmp(str, action_to_str[i])) {
+   *action = i;
+   return 0;
+   }
+   }
+
+   return -EINVAL;
+}
+
+static int mlx5_devlink_set_congestion_action(struct devlink *devlink, u32 id,
+ struct devlink_param_gset_ctx 
*ctx,
+ struct netlink_ext_ack *extack)
+{
+   struct mlx5_core_dev *dev = devlink_priv(devlink);
+   u8 max = MLX5_DEVLINK_CONGESTION_ACTION_MAX;
+   u8 congestion_action;
+   u8 sense;
+   int err;
+
+   if (!MLX5_CAP_MCAM_FEATURE(dev, mark_tx_action_cqe) &&
+   !MLX5_CAP_MCAM_FEATURE(dev, mark_tx_action_cnp))
+   max = MLX5_DEVLINK_CONGESTION_ACTION_MARK - 1;
+
+   err = mlx5_devlink_str_to_congestion_action(ctx->val.vstr,
+   _action);
+   if (err)
+   return err;
+
+   if (congestion_action > max) {
+   NL_SET_ERR_MSG(extack, "Requested congestion action is not 
supported on current device/FW");
+   return -EINVAL;
+   }
+
+   err = mlx5_devlink_query_tx_overflow_sense(dev, );
+   if (err)
+   return err;
+
+   if (congestion_action == MLX5_DEVLINK_CONGESTION_ACTION_DISABLED &&
+   sense != MLX5_DEVLINK_CONGESTION_MODE_AGGRESSIVE) {
+   NL_SET_ERR_MSG(extack, "Congestion action \"disabled\" is 
allowed only while mode is configured to aggressive");
+   return -EINVAL;
+   }
+
+   return mlx5_devlink_set_tx_lossy_overflow(dev, congestion_action);
+}
+
+static int mlx5_devlink_get_congestion_action(struct devlink *devlink, u32 id,
+ struct devlink_param_gset_ctx 
*ctx)
+{
+   struct mlx5_core_dev *dev = devlink_priv(devlink);
+   u8 congestion_action;
+   const char *val;
+   int err;
+
+   err = mlx5_devlink_query_tx_lossy_overflow(dev, _action);
+   if (err)
+   return err;
+
+   val = mlx5_devlink_congestion_action_to_str(congestion_action);
+   if (IS_ERR(val))
+   return PTR_ERR(val);
+
+   devlink_param_value_str_fill(>val, val);
+   return 0;
+}
+
+static const char *const mode_to_str[] = {
+   [MLX5_DEVLINK_CONGESTION_MODE_AGGRESSIVE] = "aggressive",
+   [MLX5_DEVLINK_CONGESTION_MODE_DYNAMIC_ADJUSTMENT] = "dynamic"
+};
+
+static const char *mlx5_devlink_congestion_mode_to_str(int mode)
+{
+   if (mode > MLX5_DEVLINK_CONGESTION_MODE_MAX) {
+   WARN_ON(1);
+   return ERR_PTR(-EINVAL);
+   }
+
+   return mode_to_str[mode];
+}
+
+static int mlx5_devlink_str_to_congestion_mode(const char *str, u8 *mode)
+{
+   int i;
+
+   for (i = 0; i <= MLX5_DEVLINK_CONGESTION_MODE_MAX; i++) {
+   if (!strcmp(str, mode_to_str[i])) {
+   *mode = i;
+   return 0;
+   }
+   }
+
+  

[net-next 08/10] net/mlx5: Add Vendor Specific Capability access gateway

2018-08-01 Thread Saeed Mahameed
From: Alex Vesker 

The Vendor Specific Capability (VSC) is used to activate a gateway
interfacing with the device. The gateway is used to read or write
device configurations, which are organized in different domains (spaces).
A configuration access may result in multiple actions, reads, writes.

Example usages are accessing the Crspace domain to read the crspace or
locking a device semaphore using the Semaphore domain.

Signed-off-by: Alex Vesker 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 .../net/ethernet/mellanox/mlx5/core/health.c  |   2 +
 .../ethernet/mellanox/mlx5/core/lib/pci_vsc.c | 320 ++
 .../ethernet/mellanox/mlx5/core/lib/pci_vsc.h |  56 +++
 include/linux/mlx5/driver.h   |   1 +
 5 files changed, 380 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 0be5bf93f33b..15f6916efe1b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -6,7 +6,7 @@ mlx5_core-y :=  main.o cmd.o debugfs.o fw.o eq.o uar.o 
pagealloc.o \
health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o \
mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
fs_counters.o rl.o lag.o dev.o wq.o lib/gid.o lib/clock.o \
-   diag/fs_tracepoint.o diag/fw_tracer.o devlink.o
+   lib/pci_vsc.o diag/fs_tracepoint.o diag/fw_tracer.o devlink.o
 
 mlx5_core-$(CONFIG_MLX5_ACCEL) += accel/ipsec.o accel/tls.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c 
b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index d39b0b7011b2..db9e39fdc33e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include "mlx5_core.h"
+#include "lib/pci_vsc.h"
 
 enum {
MLX5_HEALTH_POLL_INTERVAL   = 2 * HZ,
@@ -388,6 +389,7 @@ int mlx5_health_init(struct mlx5_core_dev *dev)
spin_lock_init(>wq_lock);
INIT_WORK(>work, health_care);
INIT_DELAYED_WORK(>recover_work, health_recover);
+   mlx5_vsc_init(dev);
 
return 0;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
new file mode 100644
index ..cfc2eebbf7c5
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
@@ -0,0 +1,320 @@
+/*
+ * Copyright (c) 2018, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include "mlx5_core.h"
+#include "pci_vsc.h"
+
+#define MLX5_EXTRACT_C(source, offset, size)   \
+   unsigned)(source)) >> (offset)) & MLX5_ONES32(size))
+#define MLX5_EXTRACT(src, start, len)  \
+   (((len) == 32) ? (src) : MLX5_EXTRACT_C(src, start, len))
+#define MLX5_ONES32(size)  \
+   ((size) ? (0x >> (32 - (size))) : 0)
+#define MLX5_MASK32(offset, size)  \
+   (MLX5_ONES32(size) << (offset))
+#define MLX5_MERGE_C(rsrc1, rsrc2, start, len)  \
+   rsrc2) << (start)) & (MLX5_MASK32((start), (len | \
+   ((rsrc1) & (~MLX5_MASK32((start), (len)
+#define MLX5_MERGE(rsrc1, rsrc2, start, len)   \
+   (((len) == 32) ? (rsrc2) : MLX5_MERGE_C(rsrc1, rsrc2, start, len))
+
+#define VSC_MAX_RETRIES 2048
+
+enum {
+   

[net-next 06/10] net/mlx5: Add MPEGC register configuration functionality

2018-08-01 Thread Saeed Mahameed
From: Eran Ben Elisha 

MPEGC register is used to configure and access the PCIe general
configuration.

Expose set/get for TX lossy overflow and TX overflow sense which use the
MPEGC register. These will be used in a downstream patch via devlink
params.

Signed-off-by: Eran Ben Elisha 
Reviewed-by: Moshe Shemesh 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 123 ++
 .../net/ethernet/mellanox/mlx5/core/devlink.h |   1 +
 2 files changed, 124 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c 
b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 2daf686bcc98..e3a5de6b4ee7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -3,6 +3,129 @@
 
 #include 
 
+enum {
+   MLX5_DEVLINK_MPEGC_FIELD_SELECT_TX_OVERFLOW_DROP_EN = BIT(0),
+   MLX5_DEVLINK_MPEGC_FIELD_SELECT_TX_OVERFLOW_SENSE = BIT(3),
+   MLX5_DEVLINK_MPEGC_FIELD_SELECT_MARK_TX_ACTION_CQE = BIT(4),
+   MLX5_DEVLINK_MPEGC_FIELD_SELECT_MARK_TX_ACTION_CNP = BIT(5),
+};
+
+enum {
+   MLX5_DEVLINK_CONGESTION_ACTION_DISABLED,
+   MLX5_DEVLINK_CONGESTION_ACTION_DROP,
+   MLX5_DEVLINK_CONGESTION_ACTION_MARK,
+   __MLX5_DEVLINK_CONGESTION_ACTION_MAX,
+   MLX5_DEVLINK_CONGESTION_ACTION_MAX = 
__MLX5_DEVLINK_CONGESTION_ACTION_MAX - 1,
+};
+
+enum {
+   MLX5_DEVLINK_CONGESTION_MODE_AGGRESSIVE,
+   MLX5_DEVLINK_CONGESTION_MODE_DYNAMIC_ADJUSTMENT,
+   __MLX5_DEVLINK_CONGESTION_MODE_MAX,
+   MLX5_DEVLINK_CONGESTION_MODE_MAX = __MLX5_DEVLINK_CONGESTION_MODE_MAX - 
1,
+};
+
+static int mlx5_devlink_set_mpegc(struct mlx5_core_dev *mdev, u32 *in,
+ int size_in)
+{
+   u32 out[MLX5_ST_SZ_DW(mpegc_reg)] = {0};
+
+   if (!MLX5_CAP_MCAM_REG(mdev, mpegc))
+   return -EOPNOTSUPP;
+
+   return mlx5_core_access_reg(mdev, in, size_in, out,
+   sizeof(out), MLX5_REG_MPEGC, 0, 1);
+}
+
+static int mlx5_devlink_set_tx_lossy_overflow(struct mlx5_core_dev *mdev,
+ u8 tx_lossy_overflow)
+{
+   u32 in[MLX5_ST_SZ_DW(mpegc_reg)] = {0};
+   u8 field_select = 0;
+
+   if (tx_lossy_overflow == MLX5_DEVLINK_CONGESTION_ACTION_MARK) {
+   if (MLX5_CAP_MCAM_FEATURE(mdev, mark_tx_action_cqe))
+   field_select |=
+   
MLX5_DEVLINK_MPEGC_FIELD_SELECT_MARK_TX_ACTION_CQE;
+
+   if (MLX5_CAP_MCAM_FEATURE(mdev, mark_tx_action_cnp))
+   field_select |=
+   
MLX5_DEVLINK_MPEGC_FIELD_SELECT_MARK_TX_ACTION_CNP;
+
+   if (!field_select)
+   return -EOPNOTSUPP;
+   }
+
+   MLX5_SET(mpegc_reg, in, field_select,
+field_select |
+MLX5_DEVLINK_MPEGC_FIELD_SELECT_TX_OVERFLOW_DROP_EN);
+   MLX5_SET(mpegc_reg, in, tx_lossy_overflow_oper, tx_lossy_overflow);
+   MLX5_SET(mpegc_reg, in, mark_cqe, 0x1);
+   MLX5_SET(mpegc_reg, in, mark_cnp, 0x1);
+
+   return mlx5_devlink_set_mpegc(mdev, in, sizeof(in));
+}
+
+static int mlx5_devlink_set_tx_overflow_sense(struct mlx5_core_dev *mdev,
+ u8 tx_overflow_sense)
+{
+   u32 in[MLX5_ST_SZ_DW(mpegc_reg)] = {0};
+
+   if (!MLX5_CAP_MCAM_FEATURE(mdev, dynamic_tx_overflow))
+   return -EOPNOTSUPP;
+
+   MLX5_SET(mpegc_reg, in, field_select,
+MLX5_DEVLINK_MPEGC_FIELD_SELECT_TX_OVERFLOW_SENSE);
+   MLX5_SET(mpegc_reg, in, tx_overflow_sense, tx_overflow_sense);
+
+   return mlx5_devlink_set_mpegc(mdev, in, sizeof(in));
+}
+
+static int mlx5_devlink_query_mpegc(struct mlx5_core_dev *mdev, u32 *out,
+   int size_out)
+{
+   u32 in[MLX5_ST_SZ_DW(mpegc_reg)] = {0};
+
+   if (!MLX5_CAP_MCAM_REG(mdev, mpegc))
+   return -EOPNOTSUPP;
+
+   return mlx5_core_access_reg(mdev, in, sizeof(in), out,
+   size_out, MLX5_REG_MPEGC, 0, 0);
+}
+
+static int mlx5_devlink_query_tx_lossy_overflow(struct mlx5_core_dev *mdev,
+   u8 *tx_lossy_overflow)
+{
+   u32 out[MLX5_ST_SZ_DW(mpegc_reg)] = {0};
+   int err;
+
+   err = mlx5_devlink_query_mpegc(mdev, out, sizeof(out));
+   if (err)
+   return err;
+
+   *tx_lossy_overflow = MLX5_GET(mpegc_reg, out, tx_lossy_overflow_oper);
+
+   return 0;
+}
+
+static int mlx5_devlink_query_tx_overflow_sense(struct mlx5_core_dev *mdev,
+   u8 *tx_overflow_sense)
+{
+   u32 out[MLX5_ST_SZ_DW(mpegc_reg)] = {0};
+   int err;
+
+   if (!MLX5_CAP_MCAM_FEATURE(mdev, dynamic_tx_overflow))
+   return -EOPNOTSUPP;
+
+   err = mlx5_devlink_query_mpegc(mdev, out, sizeof(out));
+   if (err)
+

[net-next 05/10] net/mlx5: Move all devlink related functions calls to devlink.c

2018-08-01 Thread Saeed Mahameed
From: Eran Ben Elisha 

Centralize all devlink related callbacks in one file.
In the downstream patch, some more functionality will be added, this
patch is preparing the driver infrastructure for it.

Currently, move devlink un/register functions calls into this file.

Signed-off-by: Eran Ben Elisha 
Reviewed-by: Moshe Shemesh 
Reviewed-by: Jiri Pirko 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 14 ++
 drivers/net/ethernet/mellanox/mlx5/core/devlink.h | 12 
 drivers/net/ethernet/mellanox/mlx5/core/main.c|  5 +++--
 4 files changed, 30 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index f20fda1ced4f..0be5bf93f33b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -6,7 +6,7 @@ mlx5_core-y :=  main.o cmd.o debugfs.o fw.o eq.o uar.o 
pagealloc.o \
health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o \
mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
fs_counters.o rl.o lag.o dev.o wq.o lib/gid.o lib/clock.o \
-   diag/fs_tracepoint.o diag/fw_tracer.o
+   diag/fs_tracepoint.o diag/fw_tracer.o devlink.o
 
 mlx5_core-$(CONFIG_MLX5_ACCEL) += accel/ipsec.o accel/tls.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c 
b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
new file mode 100644
index ..2daf686bcc98
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2018, Mellanox Technologies inc. All rights reserved. */
+
+#include 
+
+int mlx5_devlink_register(struct devlink *devlink, struct device *dev)
+{
+   return devlink_register(devlink, dev);
+}
+
+void mlx5_devlink_unregister(struct devlink *devlink)
+{
+   devlink_unregister(devlink);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h 
b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
new file mode 100644
index ..455bfa4e89c0
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2018, Mellanox Technologies inc. All rights reserved. */
+
+#ifndef __MLX5_DEVLINK_H__
+#define __MLX5_DEVLINK_H__
+
+#include 
+
+int mlx5_devlink_register(struct devlink *devlink, struct device *dev);
+void mlx5_devlink_unregister(struct devlink *devlink);
+
+#endif /* __MLX5_DEVLINK_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 03b9c6733eed..7a1ddf96f065 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -56,6 +56,7 @@
 #include "fs_core.h"
 #include "lib/mpfs.h"
 #include "eswitch.h"
+#include "devlink.h"
 #include "lib/mlx5.h"
 #include "fpga/core.h"
 #include "fpga/ipsec.h"
@@ -1445,7 +1446,7 @@ static int init_one(struct pci_dev *pdev,
 
request_module_nowait(MLX5_IB_MOD);
 
-   err = devlink_register(devlink, >dev);
+   err = mlx5_devlink_register(devlink, >dev);
if (err)
goto clean_load;
 
@@ -1475,7 +1476,7 @@ static void remove_one(struct pci_dev *pdev)
struct devlink *devlink = priv_to_devlink(dev);
struct mlx5_priv *priv = >priv;
 
-   devlink_unregister(devlink);
+   mlx5_devlink_unregister(devlink);
mlx5_unregister_device(dev);
 
if (mlx5_unload_one(dev, priv, true)) {
-- 
2.17.0



[net-next 03/10] devlink: Add helper function for safely copy string param

2018-08-01 Thread Saeed Mahameed
From: Moshe Shemesh 

Devlink string param buffer is allocated at the size of
DEVLINK_PARAM_MAX_STRING_VALUE. Add helper function which makes sure
this size is not exceeded.
Renamed DEVLINK_PARAM_MAX_STRING_VALUE to
__DEVLINK_PARAM_MAX_STRING_VALUE to emphasize that it should be used by
devlink only. The driver should use the helper function instead to
verify it doesn't exceed the allowed length.

Signed-off-by: Moshe Shemesh 
Acked-by: Jiri Pirko 
Signed-off-by: Saeed Mahameed 
---
 include/net/devlink.h | 12 ++--
 net/core/devlink.c| 19 ++-
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index b0e17c025fdc..99efc156a309 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -298,7 +298,7 @@ struct devlink_resource {
 
 #define DEVLINK_RESOURCE_ID_PARENT_TOP 0
 
-#define DEVLINK_PARAM_MAX_STRING_VALUE 32
+#define __DEVLINK_PARAM_MAX_STRING_VALUE 32
 enum devlink_param_type {
DEVLINK_PARAM_TYPE_U8,
DEVLINK_PARAM_TYPE_U16,
@@ -311,7 +311,7 @@ union devlink_param_value {
u8 vu8;
u16 vu16;
u32 vu32;
-   char vstr[DEVLINK_PARAM_MAX_STRING_VALUE];
+   char vstr[__DEVLINK_PARAM_MAX_STRING_VALUE];
bool vbool;
 };
 
@@ -553,6 +553,8 @@ int devlink_param_driverinit_value_get(struct devlink 
*devlink, u32 param_id,
 int devlink_param_driverinit_value_set(struct devlink *devlink, u32 param_id,
   union devlink_param_value init_val);
 void devlink_param_value_changed(struct devlink *devlink, u32 param_id);
+void devlink_param_value_str_fill(union devlink_param_value *dst_val,
+ const char *src);
 struct devlink_region *devlink_region_create(struct devlink *devlink,
 const char *region_name,
 u32 region_max_snapshots,
@@ -789,6 +791,12 @@ devlink_param_value_changed(struct devlink *devlink, u32 
param_id)
 {
 }
 
+static inline void
+devlink_param_value_str_fill(union devlink_param_value *dst_val,
+const char *src)
+{
+}
+
 static inline struct devlink_region *
 devlink_region_create(struct devlink *devlink,
  const char *region_name,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 99ad53e8297f..367564099357 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3012,7 +3012,7 @@ devlink_param_value_get_from_info(const struct 
devlink_param *param,
break;
case DEVLINK_PARAM_TYPE_STRING:
if (nla_len(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]) >
-   DEVLINK_PARAM_MAX_STRING_VALUE)
+   __DEVLINK_PARAM_MAX_STRING_VALUE)
return -EINVAL;
strcpy(value->vstr,
   nla_data(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]));
@@ -4614,6 +4614,23 @@ void devlink_param_value_changed(struct devlink 
*devlink, u32 param_id)
 }
 EXPORT_SYMBOL_GPL(devlink_param_value_changed);
 
+/**
+ * devlink_param_value_str_fill - Safely fill-up the string preventing
+ *from overflow of the preallocated buffer
+ *
+ * @dst_val: destination devlink_param_value
+ * @src: source buffer
+ */
+void devlink_param_value_str_fill(union devlink_param_value *dst_val,
+ const char *src)
+{
+   size_t len;
+
+   len = strlcpy(dst_val->vstr, src, __DEVLINK_PARAM_MAX_STRING_VALUE);
+   WARN_ON(len >= __DEVLINK_PARAM_MAX_STRING_VALUE);
+}
+EXPORT_SYMBOL_GPL(devlink_param_value_str_fill);
+
 /**
  * devlink_region_create - create a new address region
  *
-- 
2.17.0



[net-next 04/10] devlink: Add extack messages support to param set

2018-08-01 Thread Saeed Mahameed
From: Moshe Shemesh 

Add param extack messages support to param->set() function.
This will enable providing clear reasoning to user on setting new value
failure. Note that param->validate() already supports extack messages,
but it is not enough as input validation can pass or even not needed and
still setting new value may fail for some reason, such as device state,
FW support, etc.

Signed-off-by: Moshe Shemesh 
Signed-off-by: Jiri Pirko 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 3 ++-
 drivers/net/ethernet/mellanox/mlx4/main.c | 6 --
 include/net/devlink.h | 3 ++-
 net/core/devlink.c| 7 ---
 4 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
index 7bd96ab4f7c5..97b987f31881 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
@@ -91,7 +91,8 @@ static int bnxt_dl_nvm_param_get(struct devlink *dl, u32 id,
 }
 
 static int bnxt_dl_nvm_param_set(struct devlink *dl, u32 id,
-struct devlink_param_gset_ctx *ctx)
+struct devlink_param_gset_ctx *ctx,
+struct netlink_ext_ack *extack)
 {
struct hwrm_nvm_set_variable_input req = {0};
struct bnxt *bp = bnxt_get_bp_from_dl(dl);
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c 
b/drivers/net/ethernet/mellanox/mlx4/main.c
index d2d59444f562..629a5c2a4971 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -186,7 +186,8 @@ static int mlx4_devlink_ierr_reset_get(struct devlink 
*devlink, u32 id,
 }
 
 static int mlx4_devlink_ierr_reset_set(struct devlink *devlink, u32 id,
-  struct devlink_param_gset_ctx *ctx)
+  struct devlink_param_gset_ctx *ctx,
+  struct netlink_ext_ack *extack)
 {
mlx4_internal_err_reset = ctx->val.vbool;
return 0;
@@ -203,7 +204,8 @@ static int mlx4_devlink_crdump_snapshot_get(struct devlink 
*devlink, u32 id,
 }
 
 static int mlx4_devlink_crdump_snapshot_set(struct devlink *devlink, u32 id,
-   struct devlink_param_gset_ctx *ctx)
+   struct devlink_param_gset_ctx *ctx,
+   struct netlink_ext_ack *extack)
 {
struct mlx4_priv *priv = devlink_priv(devlink);
struct mlx4_dev *dev = >dev;
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 99efc156a309..84294f4c2580 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -344,7 +344,8 @@ struct devlink_param {
int (*get)(struct devlink *devlink, u32 id,
   struct devlink_param_gset_ctx *ctx);
int (*set)(struct devlink *devlink, u32 id,
-  struct devlink_param_gset_ctx *ctx);
+  struct devlink_param_gset_ctx *ctx,
+  struct netlink_ext_ack *extack);
int (*validate)(struct devlink *devlink, u32 id,
union devlink_param_value val,
struct netlink_ext_ack *extack);
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 367564099357..54ba7e55b1de 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -2746,11 +2746,12 @@ static int devlink_param_get(struct devlink *devlink,
 
 static int devlink_param_set(struct devlink *devlink,
 const struct devlink_param *param,
-struct devlink_param_gset_ctx *ctx)
+struct devlink_param_gset_ctx *ctx,
+struct netlink_ext_ack *extack)
 {
if (!param->set)
return -EOPNOTSUPP;
-   return param->set(devlink, param->id, ctx);
+   return param->set(devlink, param->id, ctx, extack);
 }
 
 static int
@@ -3112,7 +3113,7 @@ static int devlink_nl_cmd_param_set_doit(struct sk_buff 
*skb,
return -EOPNOTSUPP;
ctx.val = value;
ctx.cmode = cmode;
-   err = devlink_param_set(devlink, param, );
+   err = devlink_param_set(devlink, param, , info->extack);
if (err)
return err;
}
-- 
2.17.0



[net-next 01/10] devlink: Fix param set handling for string type

2018-08-01 Thread Saeed Mahameed
From: Moshe Shemesh 

In case devlink param type is string, it needs to copy the string value
it got from the input to devlink_param_value.

Fixes: e3b7ca18ad7b ("devlink: Add param set command")
Signed-off-by: Moshe Shemesh 
Acked-by: Jiri Pirko 
Signed-off-by: Saeed Mahameed 
---
 include/net/devlink.h | 2 +-
 net/core/devlink.c| 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index b9b89d6604d4..b0e17c025fdc 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -311,7 +311,7 @@ union devlink_param_value {
u8 vu8;
u16 vu16;
u32 vu32;
-   const char *vstr;
+   char vstr[DEVLINK_PARAM_MAX_STRING_VALUE];
bool vbool;
 };
 
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 65fc366a78a4..61e126c28526 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3014,7 +3014,8 @@ devlink_param_value_get_from_info(const struct 
devlink_param *param,
if (nla_len(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]) >
DEVLINK_PARAM_MAX_STRING_VALUE)
return -EINVAL;
-   value->vstr = 
nla_data(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]);
+   strcpy(value->vstr,
+  nla_data(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]));
break;
case DEVLINK_PARAM_TYPE_BOOL:
value->vbool = info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA] ?
-- 
2.17.0



[net-next 02/10] devlink: Fix param cmode driverinit for string type

2018-08-01 Thread Saeed Mahameed
From: Moshe Shemesh 

Driverinit configuration mode value is held by devlink to enable the
driver fetch the value after reload command. In case the param type is
string devlink should copy the value from driver string buffer to
devlink string buffer on devlink_param_driverinit_value_set() and
vice-versa on devlink_param_driverinit_value_get().

Fixes: ec01aeb1803e ("devlink: Add support for get/set driverinit value")
Signed-off-by: Moshe Shemesh 
Acked-by: Jiri Pirko 
Signed-off-by: Saeed Mahameed 
---
 net/core/devlink.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 61e126c28526..99ad53e8297f 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3102,7 +3102,10 @@ static int devlink_nl_cmd_param_set_doit(struct sk_buff 
*skb,
return -EOPNOTSUPP;
 
if (cmode == DEVLINK_PARAM_CMODE_DRIVERINIT) {
-   param_item->driverinit_value = value;
+   if (param->type == DEVLINK_PARAM_TYPE_STRING)
+   strcpy(param_item->driverinit_value.vstr, value.vstr);
+   else
+   param_item->driverinit_value = value;
param_item->driverinit_value_valid = true;
} else {
if (!param->set)
@@ -4542,7 +4545,10 @@ int devlink_param_driverinit_value_get(struct devlink 
*devlink, u32 param_id,
  DEVLINK_PARAM_CMODE_DRIVERINIT))
return -EOPNOTSUPP;
 
-   *init_val = param_item->driverinit_value;
+   if (param_item->param->type == DEVLINK_PARAM_TYPE_STRING)
+   strcpy(init_val->vstr, param_item->driverinit_value.vstr);
+   else
+   *init_val = param_item->driverinit_value;
 
return 0;
 }
@@ -4573,7 +4579,10 @@ int devlink_param_driverinit_value_set(struct devlink 
*devlink, u32 param_id,
  DEVLINK_PARAM_CMODE_DRIVERINIT))
return -EOPNOTSUPP;
 
-   param_item->driverinit_value = init_val;
+   if (param_item->param->type == DEVLINK_PARAM_TYPE_STRING)
+   strcpy(param_item->driverinit_value.vstr, init_val.vstr);
+   else
+   param_item->driverinit_value = init_val;
param_item->driverinit_value_valid = true;
 
devlink_param_notify(devlink, param_item, DEVLINK_CMD_PARAM_NEW);
-- 
2.17.0



[pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31

2018-08-01 Thread Saeed Mahameed
Hi Dave,

This series provides devlink parameters updates to both devlink API and
mlx5 driver, it is a 2nd iteration of the dropped patches sent in a previous
mlx5 submission "net/mlx5: Support PCIe buffer congestion handling via
Devlink" to address review comments [1].

Changes from the original series:
- According to the discussion outcome, we are keeping the congestion control
  setting as mlx5 device specific for the current HW generation.
- Changed the congestion_mode and congestion action param type to string
- Added patches to fix devlink handling of param type string
- Added a patch which adds extack messages support for param set.
- At the end of this series, I've added yet another mlx5 devlink related
 feature, firmware snapshot support.

For more information please see tag log below.

Please pull and let me know if there's any problem.

[1] https://patchwork.ozlabs.org/patch/945996/

Thanks,
Saeed.

---

The following changes since commit e6476c21447c4b17c47e476aade6facf050f31e8:

  net: remove bogus RCU annotations on socket.wq (2018-07-31 12:40:22 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git 
tags/mlx5-updates-2018-08-01

for you to fetch changes up to 2ac6108c65ffcb1e5eab1fba1fd59272604d1c32:

  net/mlx5: Use devlink region_snapshot parameter (2018-08-01 14:49:09 -0700)


mlx5-updates-2018-08-01

This series provides devlink parameters updates to both devlink API and
mlx5 driver,

1) Devlink changes: (Moshe Shemesh)
The first two patches fix devlink param infrastructure for string type
params.
The third patch adds a devlink helper function to safely copy string from
driver to devlink.
The forth patch adds extack support for param set.

2) mlx5 specific congestion parameters: (Eran Ben Elisha)
Next three patches add new devlink driver specific params for controlling
congestion action and mode, using string type params and extack messages 
support.

This congestion mode enables hw workaround in specific devices which is
controlled by devlink driver-specific params. The workaround is device
specific for this NIC generation, the next NIC will not need it.

Congestion parameters:
 - Congestion action
HW W/A mechanism in the PCIe buffer which monitors the amount of
consumed PCIe buffer per host.  This mechanism supports the
following actions in case of threshold overflow:
- Disabled - NOP (Default)
- Drop
- Mark - Mark CE bit in the CQE of received packet
- Congestion mode
- Aggressive - Aggressive static trigger threshold (Default)
- Dynamic - Dynamically change the trigger threshold

3) mlx5 firmware snapshot support via devlink: (Alex Vesker)
Last three patches, add the support for capturing region snapshot of the
firmware crspace during critical errors, using devlink region_snapshot
parameter.

-Saeed.


Alex Vesker (3):
  net/mlx5: Add Vendor Specific Capability access gateway
  net/mlx5: Add Crdump FW snapshot support
  net/mlx5: Use devlink region_snapshot parameter

Eran Ben Elisha (3):
  net/mlx5: Move all devlink related functions calls to devlink.c
  net/mlx5: Add MPEGC register configuration functionality
  net/mlx5: Enable PCIe buffer congestion handling workaround via devlink

Moshe Shemesh (4):
  devlink: Fix param set handling for string type
  devlink: Fix param cmode driverinit for string type
  devlink: Add helper function for safely copy string param
  devlink: Add extack messages support to param set

 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c  |   3 +-
 drivers/net/ethernet/mellanox/mlx4/main.c  |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   3 +-
 drivers/net/ethernet/mellanox/mlx5/core/devlink.c  | 388 +
 drivers/net/ethernet/mellanox/mlx5/core/devlink.h  |  13 +
 .../net/ethernet/mellanox/mlx5/core/diag/crdump.c  | 223 
 drivers/net/ethernet/mellanox/mlx5/core/health.c   |   3 +
 drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h |   4 +
 .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c  | 320 +
 .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h  |  56 +++
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  10 +-
 include/linux/mlx5/driver.h|   5 +
 include/net/devlink.h  |  15 +-
 net/core/devlink.c |  44 ++-
 14 files changed, 1076 insertions(+), 17 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
 create mode 100644 

Re: how PHY driver is notified that cable is unplugged? (possibly related to IFF_RUNNING flag)

2018-08-01 Thread Robert P. J. Day
On Wed, 1 Aug 2018, Florian Fainelli wrote:

> On 08/01/2018 05:58 AM, rpj...@crashcourse.ca wrote:
> >
> >   (warning that i have a few questions that are probably trivial
> > until i get up to speed with networking code.)
> >
> >   a colleague asked for advice about the following -- apparently a
> > new PHY driver works properly when being brought up with "ifconfig
> >  up", part of that process apparently setting the
> > IFF_RUNNING net_device flags bit. so far, so good.
> >
> >   now, when the cable is physically unplugged, the claim is that
> > there is no obvious status change for that port, accompanied by
> > the suggestion that it is that IFF_RUNNING flag bit that is not
> > being unset.
> >
> >   asking a more general question, where can i read up on the
> > proper protocol for a driver being notified of, and properly
> > handling, physical disconnection on that port? and, of course, the
> > cable being plugged back in.
>
> The basic mechanism used by the PHY library is to read the standard
> Basic Mode Status Register and check the Link status bit to
> determine what the state of the link is set. This event can be
> triggered either through polling, or the use of an interrupt that
> the PHY is generating.
>
> Once the link state is determined, because the PHY device is
> "connected" to a network device, the PHY library can call
> netif_carrier_{on,off} against the network device attached to the
> PHY and that propagates through the networking stack and sets the
> appropriate bits, including IFF_RUNNING.

  yup, that agrees with the code i was perusing, thanks for
clarifying. more questions coming shortly, hopefully tougher.

rday

[net-next:master 408/425] net/rds/connection.c:71:31: sparse: incorrect type in argument 1 (different base types)

2018-08-01 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 
master
head:   b69ab96ab13ddaa28a506a2433ae5b12669615e8
commit: e65d4d96334e3ff4fe0064612a93a51c63de08de [408/425] rds: Remove IPv6 
dependency
reproduce:
# apt-get install sparse
git checkout e65d4d96334e3ff4fe0064612a93a51c63de08de
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> net/rds/connection.c:71:31: sparse: incorrect type in argument 1 (different 
>> base types) @@expected restricted __be32 const [usertype] laddr @@
>> got unsigned inrestricted __be32 const [usertype] laddr @@
   net/rds/connection.c:71:31:expected restricted __be32 const [usertype] 
laddr
   net/rds/connection.c:71:31:got unsigned int [unsigned] [assigned] 
[usertype] lhash
>> net/rds/connection.c:71:41: sparse: incorrect type in argument 3 (different 
>> base types) @@expected restricted __be32 const [usertype] faddr @@
>> got unsigned inrestricted __be32 const [usertype] faddr @@
   net/rds/connection.c:71:41:expected restricted __be32 const [usertype] 
faddr
   net/rds/connection.c:71:41:got unsigned int [unsigned] [assigned] 
[usertype] fhash
   include/linux/slab.h:631:13: sparse: undefined identifier 
'__builtin_mul_overflow'
   include/linux/slab.h:631:13: sparse: call with no type!

vim +71 net/rds/connection.c

53  
54  static struct hlist_head *rds_conn_bucket(const struct in6_addr *laddr,
55const struct in6_addr *faddr)
56  {
57  static u32 rds6_hash_secret __read_mostly;
58  static u32 rds_hash_secret __read_mostly;
59  
60  u32 lhash, fhash, hash;
61  
62  net_get_random_once(_hash_secret, sizeof(rds_hash_secret));
63  net_get_random_once(_hash_secret, 
sizeof(rds6_hash_secret));
64  
65  lhash = (__force u32)laddr->s6_addr32[3];
66  #if IS_ENABLED(CONFIG_IPV6)
67  fhash = __ipv6_addr_jhash(faddr, rds6_hash_secret);
68  #else
69  fhash = (__force u32)faddr->s6_addr32[3];
70  #endif
  > 71  hash = __inet_ehashfn(lhash, 0, fhash, 0, rds_hash_secret);
72  
73  return _conn_hash[hash & RDS_CONNECTION_HASH_MASK];
74  }
75  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Re: [PATCH net-next v1] net: don't declare IPv6 non-local bind helper if CONFIG_IPV6 undefined

2018-08-01 Thread David Miller
From: Vincent Bernat 
Date: Wed,  1 Aug 2018 22:05:10 +0200

> Fixes: 83ba4645152d ("net: add helpers checking if socket can be bound to 
> nonlocal address")
> Signed-off-by: Vincent Bernat 

Applied, thanks Vincent.


Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.

2018-08-01 Thread Dave Watson
On 08/01/18 01:49 PM, Vakul Garg wrote:
> > I don't think this patch is safe as-is.  sgin_arr is a stack array of size
> > MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is that it
> > walks the whole chain of skbs from skb->next, and can return any number of
> > segments.  Therefore we need to heap allocate.  I think I copied the IPSEC
> > code here.
> > 
> > For perf though, we could use the stack array if skb_cow_data returns <=
> > MAX_SKB_FRAGS.
> 
> skb_cow_data() is being called only when caller passes sgout=NULL (i.e.
> non-zero copy case). In case of zero-copy, we do not call skb_cow_data()
> and just assume that MAX_SKB_FRAGS+2 sized scatterlist array sgin_arr[]
> is sufficient. This assumption could be wrong. So skb_cow_data() should be
> called unconditionally to determine number of scatterlist entries required
> for skb.

I agree it is best to unify them.  I was originally worried about perf
with the extra allocation (which you proposed fixing by merging with
the crypto allocation, which would be great), and the perf of
skb_cow_data().  Zerocopy doesn't require skb_cow_data(), but we do
have to walk the skbs to calculate nsg correctly.

However skb_cow_data perf might be fine after your fix "strparser: Call
skb_unclone conditionally".


Re: Security enhancement proposal for kernel TLS

2018-08-01 Thread Dave Watson
On 07/31/18 10:45 AM, Vakul Garg wrote:
> > > IIUC, with the upstream implementation of tls record layer in kernel,
> > > the decryption of tls FINISHED message happens in kernel. Therefore
> > > the keys are already being sent to kernel tls socket before handshake is
> > completed.
> > 
> > This is incorrect.  
> 
> Let us first reach a common ground on this.
> 
>  The kernel TLS implementation can decrypt only after setting the keys on the 
> socket.
> The TLS message 'finished' (which is encrypted) is received after receiving 
> 'CCS'
> message. After the user space  TLS library receives CCS message, it sets the 
> keys
> on kernel TLS socket. Therefore, the next message in the  socket receive queue
> which is TLS finished gets decrypted in kernel only.
> 
> Please refer to following Boris's patch on openssl. The  commit log says:
> " We choose to set this option at the earliest - just after CCS is complete".

I agree that Boris' patch does what you say it does - it sets keys
immediately after CCS instead of after FINISHED message.  I disagree
that the kernel tls implementation currently requires that specific
ordering, nor do I think that it should require that ordering.


Re: [PATCH bpf] xdp: add NULL pointer check in __xdp_return()

2018-08-01 Thread Daniel Borkmann
On 08/01/2018 04:43 PM, Björn Töpel wrote:
> Den ons 1 aug. 2018 kl 16:14 skrev Jesper Dangaard Brouer :
>> On Mon, 23 Jul 2018 11:41:02 +0200
>> Björn Töpel  wrote:
>>
>> diff --git a/net/core/xdp.c b/net/core/xdp.c
>> index 9d1f220..1c12bc7 100644
>> --- a/net/core/xdp.c
>> +++ b/net/core/xdp.c
>> @@ -345,7 +345,8 @@ static void __xdp_return(void *data, struct 
>> xdp_mem_info *mem, bool napi_direct,
>>   rcu_read_lock();
>>   /* mem->id is valid, checked in 
>> xdp_rxq_info_reg_mem_model() */
>>   xa = rhashtable_lookup(mem_id_ht, >id, 
>> mem_id_rht_params);
>> - xa->zc_alloc->free(xa->zc_alloc, handle);
>> + if (xa)
>> + xa->zc_alloc->free(xa->zc_alloc, handle);
> hmm...It is not clear to me the "!xa" case don't have to be handled?

 Thank you for reviewing!

 Returning NULL pointer is bug case such as calling after use
 xdp_rxq_info_unreg().
 so that, I think it can't handle at that moment.
 we can make __xdp_return to add WARN_ON_ONCE() or
 add return error code to driver.
 But I'm not sure if these is useful information.

 I might have misunderstood scenario of MEM_TYPE_ZERO_COPY
 because there is no use case of MEM_TYPE_ZERO_COPY yet.
>>>
>>> Taehee, again, sorry for the slow response and thanks for patch!
>>>
>>> If xa is NULL, the driver has a buggy/broken implementation. What
>>> would be a proper way of dealing with this? BUG?
>>
>> Hmm... I don't like these kind of changes to the hot-path code!
>>
>> You might not realize this, but adding BUG() and WARN_ON() to the code
>> affect performance in ways you might not realize!  These macros gets
>> compiled and uses an asm instruction called "ud2".  Seeing the "ud2"
>> instruction causes the CPUs instruction cache prefetcher to stop.
>> Thus, if some code ends up below this instruction, this will cause more
>> i-cache-misses.
>>
>> I don't know if xa==NULL is even possible, but if it is, then I think
>> this is a result of a driver mem_reg API usage bug.  And the mem-reg
>> API is full of WARN's and error messages, exactly to push these kind of
>> checks out of the fast-path.  There is no need for a BUG() call, as
>> deref a NULL pointer will case an OOPS, that is easy to read and
>> understand.
> 
> Jesper, thanks for having a look! So, you're right that if xa==NULL
> the driver is "broken/buggy" (as stated earlier!). I agree that
> OOPSing on a NULL pointer is as good as a BUG!
> 
> The applied patch adds a WARN_ON_ONCE, and I thought best practice was
> that a buggy driver shouldn't crash the kernel... What is considered
> best practices in these scenarios? *I'd* prefer an OOPS instead of
> WARN_ON_ONCE, to catch that buggy driver. Again, that's me. I thought
> that most people prefer not crashing, hence the patch. :-)

In that case, lets send a revert for the patch with a proper analysis
of why it is safe to omit the NULL check which should be placed as a
comment right near the rhashtable_lookup().

Thanks,
Daniel


[PATCH net-next v1] net: don't declare IPv6 non-local bind helper if CONFIG_IPV6 undefined

2018-08-01 Thread Vincent Bernat
Fixes: 83ba4645152d ("net: add helpers checking if socket can be bound to 
nonlocal address")
Signed-off-by: Vincent Bernat 
---
 include/net/ipv6.h | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 82deb684ba73..ff33f498c137 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -766,13 +766,6 @@ static inline int ip6_sk_dst_hoplimit(struct ipv6_pinfo 
*np, struct flowi6 *fl6,
return hlimit;
 }
 
-static inline bool ipv6_can_nonlocal_bind(struct net *net,
- struct inet_sock *inet)
-{
-   return net->ipv6.sysctl.ip_nonlocal_bind ||
-   inet->freebind || inet->transparent;
-}
-
 /* copy IPv6 saddr & daddr to flow_keys, possibly using 64bit load/store
  * Equivalent to : flow->v6addrs.src = iph->saddr;
  * flow->v6addrs.dst = iph->daddr;
@@ -789,6 +782,13 @@ static inline void iph_to_flow_copy_v6addrs(struct 
flow_keys *flow,
 
 #if IS_ENABLED(CONFIG_IPV6)
 
+static inline bool ipv6_can_nonlocal_bind(struct net *net,
+ struct inet_sock *inet)
+{
+   return net->ipv6.sysctl.ip_nonlocal_bind ||
+   inet->freebind || inet->transparent;
+}
+
 /* Sysctl settings for net ipv6.auto_flowlabels */
 #define IP6_AUTO_FLOW_LABEL_OFF0
 #define IP6_AUTO_FLOW_LABEL_OPTOUT 1
-- 
2.18.0



[net-next:master 415/424] include/net/ipv6.h:772:14: error: 'struct net' has no member named 'ipv6'; did you mean 'ipv4'?

2018-08-01 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 
master
head:   90d4c5bb98bf6665266917edf0e16ccd35f9
commit: 83ba4645152d1177c161750e1064e3a8e7cee19b [415/424] net: add helpers 
checking if socket can be bound to nonlocal address
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
git checkout 83ba4645152d1177c161750e1064e3a8e7cee19b
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   In file included from include/net/addrconf.h:54:0,
from lib/vsprintf.c:35:
   include/net/ipv6.h: In function 'ipv6_can_nonlocal_bind':
>> include/net/ipv6.h:772:14: error: 'struct net' has no member named 'ipv6'; 
>> did you mean 'ipv4'?
 return net->ipv6.sysctl.ip_nonlocal_bind ||
 ^~~~
 ipv4

vim +772 include/net/ipv6.h

   768  
   769  static inline bool ipv6_can_nonlocal_bind(struct net *net,
   770struct inet_sock *inet)
   771  {
 > 772  return net->ipv6.sysctl.ip_nonlocal_bind ||
   773  inet->freebind || inet->transparent;
   774  }
   775  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [net-next 10/16] net/mlx5: Support PCIe buffer congestion handling via Devlink

2018-08-01 Thread Moshe Shemesh
On Tue, Jul 31, 2018 at 2:06 PM, Bjorn Helgaas  wrote:

> On Mon, Jul 30, 2018 at 08:19:50PM -0700, Alexander Duyck wrote:
> > On Mon, Jul 30, 2018 at 7:33 PM, Bjorn Helgaas 
> wrote:
> > > On Mon, Jul 30, 2018 at 08:02:48AM -0700, Alexander Duyck wrote:
> > >> On Mon, Jul 30, 2018 at 7:07 AM, Bjorn Helgaas 
> wrote:
> > >> > On Sun, Jul 29, 2018 at 03:00:28PM -0700, Alexander Duyck wrote:
> > >> >> On Sun, Jul 29, 2018 at 2:23 AM, Moshe Shemesh <
> moshes20...@gmail.com> wrote:
> > >> >> > On Sat, Jul 28, 2018 at 7:06 PM, Bjorn Helgaas <
> helg...@kernel.org> wrote:
> > >> >> >> On Thu, Jul 26, 2018 at 07:00:20AM -0700, Alexander Duyck wrote:
> > >> >> >> > On Thu, Jul 26, 2018 at 12:14 AM, Jiri Pirko <
> j...@resnulli.us> wrote:
> > >> >> >> > > Thu, Jul 26, 2018 at 02:43:59AM CEST,
> jakub.kicin...@netronome.com
> > >> >> >> > > wrote:
> > >> >> >> > >>On Wed, 25 Jul 2018 08:23:26 -0700, Alexander Duyck wrote:
> > >> >> >> > >>> On Wed, Jul 25, 2018 at 5:31 AM, Eran Ben Elisha wrote:
> > >> >> >> > >>> > On 7/24/2018 10:51 PM, Jakub Kicinski wrote:
> > >> >> >> > >>>  The devlink params haven't been upstream even for a
> full cycle
> > >> >> >> > >>>  and
> > >> >> >> > >>>  already you guys are starting to use them to
> configure standard
> > >> >> >> > >>>  features like queuing.
> > >> >> >> > >>> >>>
> > >> >> >> > >>> >>> We developed the devlink params in order to support
> non-standard
> > >> >> >> > >>> >>> configuration only. And for non-standard, there are
> generic and
> > >> >> >> > >>> >>> vendor
> > >> >> >> > >>> >>> specific options.
> > >> >> >> > >>> >>
> > >> >> >> > >>> >> I thought it was developed for performing non-standard
> and
> > >> >> >> > >>> >> possibly
> > >> >> >> > >>> >> vendor specific configuration.  Look at
> DEVLINK_PARAM_GENERIC_*
> > >> >> >> > >>> >> for
> > >> >> >> > >>> >> examples of well justified generic options for which
> we have no
> > >> >> >> > >>> >> other API.  The vendor mlx4 options look fairly vendor
> specific
> > >> >> >> > >>> >> if you
> > >> >> >> > >>> >> ask me, too.
> > >> >> >> > >>> >>
> > >> >> >> > >>> >> Configuring queuing has an API.  The question is it
> acceptable to
> > >> >> >> > >>> >> enter
> > >> >> >> > >>> >> into the risky territory of controlling offloads via
> devlink
> > >> >> >> > >>> >> parameters
> > >> >> >> > >>> >> or would we rather make vendors take the time and
> effort to model
> > >> >> >> > >>> >> things to (a subset) of existing APIs.  The HW never
> fits the
> > >> >> >> > >>> >> APIs
> > >> >> >> > >>> >> perfectly.
> > >> >> >> > >>> >
> > >> >> >> > >>> > I understand what you meant here, I would like to
> highlight that
> > >> >> >> > >>> > this
> > >> >> >> > >>> > mechanism was not meant to handle SRIOV, Representors,
> etc.
> > >> >> >> > >>> > The vendor specific configuration suggested here is to
> handle a
> > >> >> >> > >>> > congestion
> > >> >> >> > >>> > state in Multi Host environment (which includes PF and
> multiple
> > >> >> >> > >>> > VFs per
> > >> >> >> > >>> > host), where one host is not aware to the other hosts,
> and each is
> > >> >> >> > >>> > running
> > >> >> >> > >>> > on its own pci/driver. It is a device working mode
> configuration.
> > >> >> >> > >>> >
> > >> >> >> > >>> > This  couldn't fit into any existing API, thus creating
> this
> > >> >> >> > >>> > vendor specific
> > >> >> >> > >>> > unique API is needed.
> > >> >> >> > >>>
> > >> >> >> > >>> If we are just going to start creating devlink interfaces
> in for
> > >> >> >> > >>> every
> > >> >> >> > >>> one-off option a device wants to add why did we even
> bother with
> > >> >> >> > >>> trying to prevent drivers from using sysfs? This just
> feels like we
> > >> >> >> > >>> are back to the same arguments we had back in the day
> with it.
> > >> >> >> > >>>
> > >> >> >> > >>> I feel like the bigger question here is if devlink is how
> we are
> > >> >> >> > >>> going
> > >> >> >> > >>> to deal with all PCIe related features going forward, or
> should we
> > >> >> >> > >>> start looking at creating a new interface/tool for
> PCI/PCIe related
> > >> >> >> > >>> features? My concern is that we have already had features
> such as
> > >> >> >> > >>> DMA
> > >> >> >> > >>> Coalescing that didn't really fit into anything and now
> we are
> > >> >> >> > >>> starting to see other things related to DMA and PCIe bus
> credits.
> > >> >> >> > >>> I'm
> > >> >> >> > >>> wondering if we shouldn't start looking at a
> tool/interface to
> > >> >> >> > >>> configure all the PCIe related features such as
> interrupts, error
> > >> >> >> > >>> reporting, DMA configuration, power management, etc.
> Maybe we could
> > >> >> >> > >>> even look at sharing it across subsystems and include
> things like
> > >> >> >> > >>> storage, graphics, and other subsystems in the
> conversation.
> > >> >> >> > >>
> > >> >> >> > >>Agreed, for actual PCIe configuration (i.e. not ECN
> marking) we do
> > >> >> >> > >> need
> > >> >> >> > 

Re: [Patch net-next] net: hns3: fix return value error while hclge_cmd_csq_clean failed

2018-08-01 Thread David Miller
From: Huazhong Tan 
Date: Wed, 1 Aug 2018 17:53:28 +0800

> From: fredalu 
> 
> While cleaning the command queue, the value of the HEAD register is not
> in the range of next_to_clean and next_to_use, meaning that this value
> is invalid. This also means that there is a hardware error and the
> hardware will trigger a reset soon. At this time we should return an
> error code instead of 0, and HCLGE_STATE_CMD_DISABLE needs to be set to
> prevent sending command again.
> 
> Fixes: 3ff504908f95 ("net: hns3: fix a dead loop in hclge_cmd_csq_clean")
> Signed-off-by: Huazhong Tan wq
--->  ^^^

I think I know what text editor you use... :-)

Applied with 'wq' removed from your signoff... Thanks.


Re: [PATCH net-next] cxgb4: fix endian to test F_FW_PORT_CMD_DCBXDIS32

2018-08-01 Thread David Miller
From: Ganesh Goudar 
Date: Wed,  1 Aug 2018 18:15:32 +0530

> For FW_PORT_ACTION_GET_PORT_INFO32 messages, the
> u.info32.lstatus32_to_cbllen32 is 32-bit Big Endian.
> We need to translate that to CPU Endian in order to
> test F_FW_PORT_CMD_DCBXDIS32.
> 
> Signed-off-by: Casey Leedom 
> Signed-off-by: Ganesh Goudar 

Applied, thank you.


Re: [PATCH net-next] strparser: remove redundant variable 'rd_desc'

2018-08-01 Thread David Miller
From: YueHaibing 
Date: Wed, 1 Aug 2018 15:10:37 +0800

> Variable 'rd_desc' is being assigned but never used,
> so can be removed.
> 
> fix this clang warning:
> net/strparser/strparser.c:411:20: warning: variable ‘rd_desc’ set but not 
> used [-Wunused-but-set-variable]
> 
> Signed-off-by: YueHaibing 

Applied.


Re: [patch net-next v2 0/3] net: sched: couple of adjustments/fixes

2018-08-01 Thread David Miller
From: Jiri Pirko 
Date: Wed,  1 Aug 2018 12:36:54 +0200

> From: Jiri Pirko 
> 
> Jiri Pirko (3):
>   net: sched: change name of zombie chain to "held_by_acts_only"
>   net: sched: fix notifications for action-held chains
>   net: sched: make tcf_chain_{get,put}() static

Series applied, thanks Jiri.


Re: [PATCH net-next v1] net: add helpers checking if socket can be bound to nonlocal address

2018-08-01 Thread David Miller
From: Vincent Bernat 
Date: Tue, 31 Jul 2018 21:18:11 +0200

> The construction "net->ipv4.sysctl_ip_nonlocal_bind || inet->freebind
> || inet->transparent" is present three times and its IPv6 counterpart
> is also present three times. We introduce two small helpers to
> characterize these tests uniformly.
> 
> Signed-off-by: Vincent Bernat 

Looks great, thanks for doing this.

Applied.


Re: how PHY driver is notified that cable is unplugged? (possibly related to IFF_RUNNING flag)

2018-08-01 Thread Florian Fainelli
On 08/01/2018 05:58 AM, rpj...@crashcourse.ca wrote:
> 
>   (warning that i have a few questions that are probably trivial until i
> get up to speed
> with networking code.)
> 
>   a colleague asked for advice about the following -- apparently a new
> PHY driver works
> properly when being brought up with "ifconfig  up", part of that
> process
> apparently setting the IFF_RUNNING net_device flags bit. so far, so good.
> 
>   now, when the cable is physically unplugged, the claim is that there
> is no obvious
> status change for that port, accompanied by the suggestion that it is
> that IFF_RUNNING
> flag bit that is not being unset.
> 
>   asking a more general question, where can i read up on the proper
> protocol for
> a driver being notified of, and properly handling, physical
> disconnection on that
> port? and, of course, the cable being plugged back in.

The basic mechanism used by the PHY library is to read the standard
Basic Mode Status Register and check the Link status bit to determine
what the state of the link is set. This event can be triggered either
through polling, or the use of an interrupt that the PHY is generating.

Once the link state is determined, because the PHY device is "connected"
to a network device, the PHY library can call netif_carrier_{on,off}
against the network device attached to the PHY and that propagates
through the networking stack and sets the appropriate bits, including
IFF_RUNNING.

Hope this helps.
-- 
Florian


Re: [PATCH v2 net-next 0/5] tcp: add 4 new stats

2018-08-01 Thread David Miller
From: Wei Wang 
Date: Tue, 31 Jul 2018 17:46:19 -0700

> From: Wei Wang 
> 
> This patch series adds 3 RFC4898 stats:
> 1. tcpEStatsPerfHCDataOctetsOut
> 2. tcpEStatsPerfOctetsRetrans
> 3. tcpEStatsStackDSACKDups
> and an addtional stat to record the number of data packet reordering
> events seen:
> 4. tcp_reord_seen
> 
> Together with the existing stats, application can use them to measure
> the retransmission rate in bytes, exclude spurious retransmissions
> reflected by DSACK, and keep track of the reordering events on live
> connections.
> In particular the networks with different MTUs make bytes-based loss stats
> more useful. Google servers have been using these stats for many years to
> instrument transport and network performance.
> 
> Note: The first patch is a refactor to add a helper to calculate
> opt_stats size in order to make later changes cleaner.

Series applied, thank you.


Re: [PATCH net-next] tcp: remove set but not used variable 'skb_size'

2018-08-01 Thread David Miller
From: Wei Yongjun 
Date: Wed, 1 Aug 2018 01:59:56 +

> Fixes gcc '-Wunused-but-set-variable' warning:
> 
> net/ipv4/tcp_output.c: In function 'tcp_collapse_retrans':
> net/ipv4/tcp_output.c:2700:6: warning:
>  variable 'skb_size' set but not used [-Wunused-but-set-variable]
>   int skb_size, next_skb_size;
>   ^
> 
> Signed-off-by: Wei Yongjun 

Applied, thanks.


Re: [PATCH V2 net-next 0/2] be2net: patch-set

2018-08-01 Thread David Miller
From: Suresh Reddy 
Date: Tue, 31 Jul 2018 11:39:41 -0400

> v1->v2 : Modified the subject line and commit log.
> 
> Please consider applying these two patches to net-next.

Series applied, thanks.


Re: [PATCH net-next 1/2] rds: rds_ib_recv_alloc_cache() should call alloc_percpu_gfp() instead

2018-08-01 Thread David Miller
From: Ka-Cheong Poon 
Date: Mon, 30 Jul 2018 22:48:41 -0700

> Currently, rds_ib_conn_alloc() calls rds_ib_recv_alloc_caches()
> without passing along the gfp_t flag.  But rds_ib_recv_alloc_caches()
> and rds_ib_recv_alloc_cache() should take a gfp_t parameter so that
> rds_ib_recv_alloc_cache() can call alloc_percpu_gfp() using the
> correct flag instead of calling alloc_percpu().
> 
> Signed-off-by: Ka-Cheong Poon 

Applied.


Re: [PATCH net] net: stmmac: Fix WoL for PCI-based setups

2018-08-01 Thread David Miller
From: Jose Abreu 
Date: Tue, 31 Jul 2018 15:08:20 +0100

> WoL won't work in PCI-based setups because we are not saving the PCI EP
> state before entering suspend state and not allowing D3 wake.
> 
> Fix this by using a wrapper around stmmac_{suspend/resume} which
> correctly sets the PCI EP state.
> 
> Signed-off-by: Jose Abreu 

Applied and queued up for -stable, thanks.


Re: [PATCH v2 net] bonding: avoid lockdep confusion in bond_get_stats()

2018-08-01 Thread David Miller
From: Eric Dumazet 
Date: Tue, 31 Jul 2018 06:30:54 -0700

> syzbot found that the following sequence produces a LOCKDEP splat [1]
 ...
> To fix this, we can use the already provided nest_level.
> 
> This patch also provides correct nesting for dev->addr_list_lock
 ...
> Signed-off-by: Eric Dumazet 

Applied, thanks Eric.


Re: [PATCH ethtool] ethtool: Add support for WAKE_FILTER

2018-08-01 Thread David Miller
From: Florian Fainelli 
Date: Mon, 30 Jul 2018 15:26:24 -0700

> On 07/17/2018 08:36 AM, Florian Fainelli wrote:
>> Allow re-purposing the wol->sopass storage area to specify a bitmask of 
>> filters
>> (programmed previously via ethtool::rxnfc) to be used as wake-up patterns.
> 
> John, David, can you provide some feedback if the approach is
> acceptable? I will address Andrew's comment about the user friendliness
> and allow providing a comma separate list of filter identifiers.
> 
> One usability issue with this approach is that one cannot specify
> wake-on-LAN using WAKE_MAGICSECURE *and* WAKE_FILTER at the same time,
> since it uses the same location in the ioctl() structure that is being
> passed. Do you see this as a problem?

Once again we are stuck in this weird situation, a sort of limbo.

On the one hand, I don't want to block your work on the ethtool
netlink stuff being done.

However it is clear that by using netlink attributes, it would
be so much cleaner.

I honestly don't know what to say at this time.  I wish I had
a clear piece of advice and a way for everyone to move forward,
and usually I do, but this time I really don't :-/


Re: [PATCH net-next 2/2] rds: Remove IPv6 dependency

2018-08-01 Thread David Miller
From: Ka-Cheong Poon 
Date: Mon, 30 Jul 2018 22:48:42 -0700

> This patch removes the IPv6 dependency from RDS.
> 
> Signed-off-by: Ka-Cheong Poon 

Applied.


Re: [PATCH net-next] virtio_net: force_napi_tx module param.

2018-08-01 Thread Willem de Bruijn
> > > > Just distribute across the available cpus evenly, and be done with it.
> > >
> > > Sounds good to me.
> >
> > So e.g. we could set an affinity hint to a group of CPUs that
> > might transmit to this queue.
>
> We also want to set the xps mask for all cpus in the group to this queue.
>
> Is there a benefit over explicitly choosing one cpu from the set, btw?
> I assumed striping. Something along the lines of
>
>   int stripe = max_t(int, num_online_cpus() / vi->curr_queue_pairs, 1);
>   int vq = 0;
>
>   cpumask_clear(xps_mask);
>
>   for_each_online_cpu(cpu) {
>   cpumask_set_cpu(cpu, xps_mask);
>
>   if ((i + 1) % stripe == 0) {
>   virtqueue_set_affinity(vi->rq[vq].vq, cpu);
>   virtqueue_set_affinity(vi->sq[vq].vq, cpu);
>   netif_set_xps_queue(vi->dev, xps_mask, vq);
>   cpumask_clear(xps_mask);
>   vq++;
>  }
>   i++;
>   }

.. but handling edge cases correctly, such as #cpu not being a perfect
multiple of #vq.


Re: [PATCH net-next] virtio_net: force_napi_tx module param.

2018-08-01 Thread Willem de Bruijn
On Tue, Jul 31, 2018 at 8:34 AM Michael S. Tsirkin  wrote:
>
> On Sun, Jul 29, 2018 at 05:32:56PM -0400, Willem de Bruijn wrote:
> > On Sun, Jul 29, 2018 at 12:01 PM David Miller  wrote:
> > >
> > > From: Caleb Raitto 
> > > Date: Mon, 23 Jul 2018 16:11:19 -0700
> > >
> > > > From: Caleb Raitto 
> > > >
> > > > The driver disables tx napi if it's not certain that completions will
> > > > be processed affine with tx service.
> > > >
> > > > Its heuristic doesn't account for some scenarios where it is, such as
> > > > when the queue pair count matches the core but not hyperthread count.
> > > >
> > > > Allow userspace to override the heuristic. This is an alternative
> > > > solution to that in the linked patch. That added more logic in the
> > > > kernel for these cases, but the agreement was that this was better left
> > > > to user control.
> > > >
> > > > Do not expand the existing napi_tx variable to a ternary value,
> > > > because doing so can break user applications that expect
> > > > boolean ('Y'/'N') instead of integer output. Add a new param instead.
> > > >
> > > > Link: https://patchwork.ozlabs.org/patch/725249/
> > > > Acked-by: Willem de Bruijn 
> > > > Acked-by: Jon Olson 
> > > > Signed-off-by: Caleb Raitto 
> > >
> > > So I looked into the history surrounding these issues.
> > >
> > > First of all, it's always ends up turning out crummy when drivers start
> > > to set affinities themselves.  The worst possible case is to do it
> > > _conditionally_, and that is exactly what virtio_net is doing.
> > >
> > > From the user's perspective, this provides a really bad experience.
> > >
> > > So if I have a 32-queue device and there are 32 cpus, you'll do all
> > > the affinity settings, stopping Irqbalanced from doing anything
> > > right?
> > >
> > > So if I add one more cpu, you'll say "oops, no idea what to do in
> > > this situation" and not touch the affinities at all?
> > >
> > > That makes no sense at all.
> > >
> > > If the driver is going to set affinities at all, OWN that decision
> > > and set it all the time to something reasonable.
> > >
> > > Or accept that you shouldn't be touching this stuff in the first place
> > > and leave the affinities alone.
> > >
> > > Right now we're kinda in a situation where the driver has been setting
> > > affinities in the ncpus==nqueues cases for some time, so we can't stop
> > > doing it.
> > >
> > > Which means we have to set them in all cases to make the user
> > > experience sane again.
> > >
> > > I looked at the linked to patch again:
> > >
> > > https://patchwork.ozlabs.org/patch/725249/
> > >
> > > And I think the strategy should be made more generic, to get rid of
> > > the hyperthreading assumptions.  I also agree that the "assign
> > > to first N cpus" logic doesn't make much sense either.
> > >
> > > Just distribute across the available cpus evenly, and be done with it.
> >
> > Sounds good to me.
>
> So e.g. we could set an affinity hint to a group of CPUs that
> might transmit to this queue.

We also want to set the xps mask for all cpus in the group to this queue.

Is there a benefit over explicitly choosing one cpu from the set, btw?
I assumed striping. Something along the lines of

  int stripe = max_t(int, num_online_cpus() / vi->curr_queue_pairs, 1);
  int vq = 0;

  cpumask_clear(xps_mask);

  for_each_online_cpu(cpu) {
  cpumask_set_cpu(cpu, xps_mask);

  if ((i + 1) % stripe == 0) {
  virtqueue_set_affinity(vi->rq[vq].vq, cpu);
  virtqueue_set_affinity(vi->sq[vq].vq, cpu);
  netif_set_xps_queue(vi->dev, xps_mask, vq);
  cpumask_clear(xps_mask);
  vq++;
 }
  i++;
  }


Re: [PATCH net-next 7/9] net: stmmac: Integrate XGMAC into main driver flow

2018-08-01 Thread Andrew Lunn
> @@ -842,6 +863,12 @@ static void stmmac_adjust_link(struct net_device *dev)
>   new_state = true;
>   ctrl &= ~priv->hw->link.speed_mask;
>   switch (phydev->speed) {
> + case SPEED_1:
> + ctrl |= priv->hw->link.speed1;
> + break;
> + case SPEED_2500:
> + ctrl |= priv->hw->link.speed2500;
> + break;
>   case SPEED_1000:
>   ctrl |= priv->hw->link.speed1000;
>   break;

Hi Jose

What PHY did you test this with?

10G phys change the interface mode when the speed change. In general,
10/100/1000G copper uses SGMII. A 1G SFP optical module generally
wants 1000Base-X. 2.5G wants 2500Base-X, 10G copper wants 10GKR, etc.

So your adjust link callback needs to look at phydev->interface and
reconfigure the MAC as requested.

You might also want to consider moving from phylib to phylink. It has
a better interface for things like this, and makes support for SFP
interfaces much easier. A MAC which supports 10G is likely to be used
with SFPs...

 Andrew


Re: [PATCH bpf] selftests/bpf: update test_lwt_seg6local.sh according to iproute2

2018-08-01 Thread Y Song
On Wed, Aug 1, 2018 at 8:34 AM, Mathieu Xhonneux  wrote:
> The shell file for test_lwt_seg6local contains an early iproute2 syntax
> for installing a seg6local End.BPF route. iproute2 support for this
> feature has recently been upstreamed, but with an additional keyword
> required. This patch updates test_lwt_seg6local.sh to the definitive
> iproute2 syntax
>
> Signed-off-by: Mathieu Xhonneux 

Acked-by: Yonghong Song 


Re: [PATCH v6 bpf-next 4/9] veth: Handle xdp_frames in xdp napi ring

2018-08-01 Thread Jesper Dangaard Brouer
On Wed, 1 Aug 2018 14:41:08 +0900
Toshiaki Makita  wrote:

> On 2018/07/31 21:46, Jesper Dangaard Brouer wrote:
> > On Tue, 31 Jul 2018 19:40:08 +0900
> > Toshiaki Makita  wrote:
> >   
> >> On 2018/07/31 19:26, Jesper Dangaard Brouer wrote:  
> >>>
> >>> Context needed from: [PATCH v6 bpf-next 2/9] veth: Add driver XDP
> >>>
> >>> On Mon, 30 Jul 2018 19:43:44 +0900
> >>> Toshiaki Makita  wrote:
> >>> 
[...]
> >>>
> >>> Here you are adding an assumption that struct xdp_frame is always
> >>> located in-the-top of the packet-data area.  I tried hard not to add
> >>> such a dependency!  You can calculate the beginning of the frame from
> >>> the xdp_frame->data pointer.
> >>>
> >>> Why not add such a dependency?  Because for AF_XDP zero-copy, we cannot
> >>> make such an assumption.  
> >>>
> >>> Currently, when an RX-queue is in AF-XDP-ZC mode (MEM_TYPE_ZERO_COPY)
> >>> the packet will get dropped when calling convert_to_xdp_frame(), but as
> >>> the TODO comment indicated in convert_to_xdp_frame() this is not the
> >>> end-goal. 
> >>>
> >>> The comment in convert_to_xdp_frame(), indicate we need a full
> >>> alloc+copy, but that is actually not necessary, if we can just use
> >>> another memory area for struct xdp_frame, and a pointer to data.  Thus,
> >>> allowing devmap-redir to work-ZC and allow cpumap-redir to do the copy
> >>> on the remote CPU.
> >>
> >> Thanks for pointing this out.
> >> Seems you are saying xdp_frame area is not reusable. That means we
> >> reduce usable headroom on every REDIRECT. I wanted to avoid this but
> >> actually it is impossible, right?  
> > 
> > I'm not sure I understand fully...  has this something to do, with the
> > below memset?  
> 
> Sorry for not being so clear...
> It has something to do with the memset as well but mainly I was talking
> about XDP_TX and REDIRECT introduced in patch 8. On REDIRECT,
> dev_map_enqueue() calls convert_to_xdp_frame() so we use the headroom
> for struct xdp_frame on REDIRECT. If we don't reuse xdp_frame region of
> the original xdp packet, we reduce the headroom size each time on
> REDIRECT. When ZC is used, in the future xdp_frame can be non-contiguous
> to the buffer, so we cannot reuse the xdp_frame region in
> convert_to_xdp_frame()? But current convert_to_xdp_frame()
> implementation requires xdp_frame region in headroom so I think I cannot
> avoid this dependency now.
> 
> SKB has a similar problem if we cannot reuse it. It can be passed to a
> bridge and redirected to another veth which has driver XDP. In that case
> we need to reallocate the page if we have reduced the headroom because
> sufficient headroom is required for XDP processing for now (can we
> remove this requirement actually?).

Okay, now I understand.  Your changes allow multiple levels of
XDP_REDIRECT between/into other veth net_devices.  This is very
interesting and exciting stuff, but also a bit scary, when thinking
about if we got he life-time correct for the different memory objects.

You have convinced me.  We should not sacrifice/reduce the headroom
this way.  I'll also fix up cpumap.

To avoid the performance penalty of the memset, I propose that we just
clear the xdp_frame->data pointer.  But lets implement it via a common
sanitize/scrub function.


> > When cpumap generate an SKB for the netstack, then we sacrifice/reduce
> > the SKB headroom available, by in convert_to_xdp_frame() reducing the
> > headroom by xdp_frame size.
> > 
> >  xdp_frame->headroom = headroom - sizeof(*xdp_frame)
> > 
> > In-order to avoid doing such memset of this area.  We are actually only
> > worried about exposing the 'data' pointer, thus we could just clear
> > that.  (See commit 6dfb970d3dbd, this is because Alexei is planing to
> > move from CAP_SYS_ADMIN to lesser privileged mode CAP_NET_ADMIN)
> > 
> > See commits:
> >  97e19cce05e5 ("bpf: reserve xdp_frame size in xdp headroom")
> >  6dfb970d3dbd ("xdp: avoid leaking info stored in frame data on page 
> > reuse")  
> 
> We have talked about that...
> https://patchwork.ozlabs.org/patch/903536/
> 
> The memset is introduced as per your feedback, but I'm still not sure if
> we need this. In general the headroom is not cleared after allocation in
> drivers, so anyway unprivileged users should not see it no matter if it
> contains xdp_frame or not...

I actually got this request from Alexei. That is why I implemented it.
Personally I don't think this clearing is really needed, until someone
actually makes the TC/cls_act BPF hook CAP_NET_ADMIN.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH net-next 5/9] net: stmmac: Add MDIO related functions for XGMAC2

2018-08-01 Thread Andrew Lunn
Hi Jose

> +static int stmmac_xgmac2_mdio_read(struct stmmac_priv *priv, int phyaddr,
> +int phyreg)
> +{
> + unsigned int mii_address = priv->hw->mii.addr;
> + unsigned int mii_data = priv->hw->mii.data;
> + u32 tmp, addr, value = MII_XGMAC_BUSY;
> + int data;
> +
> + if (phyreg & MII_ADDR_C45) {
> + addr = ((phyreg >> 16) & 0x1f) << 21;
> + addr |= (phyaddr << 16) | (phyreg & 0x);

Do you need to tell the hardware this is a C45 transfer? Normally an
extra bit needs setting somewhere.

> + } else {
> + if (phyaddr >= 4)
> + return -ENODEV;

Can the MDIO bus be external? If so, is there a reason why there
cannot be a PHY at addresses > 4. So maybe there is an Ethernet
switch, which needs lots of addresses? And C45 can have devices > 4
but C22 cannot?

> + writel(~0x0, priv->ioaddr + 0x220);
> + addr = (phyaddr << 16) | (phyreg & 0x1f);
> + }
> +
> + value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
> + & priv->hw->mii.clk_csr_mask;
> + value |= BIT(18);

Please add a #define for this bit.

> + value |= MII_XGMAC_READ;
> +
> + if (readl_poll_timeout(priv->ioaddr + mii_data, tmp,
> +!(tmp & MII_XGMAC_BUSY), 100, 1))
> + return -EBUSY;
> +
> + writel(addr, priv->ioaddr + mii_address);
> + writel(value, priv->ioaddr + mii_data);
> +
> + if (readl_poll_timeout(priv->ioaddr + mii_data, tmp,
> +!(tmp & MII_XGMAC_BUSY), 100, 1))
> + return -EBUSY;
> +
> + /* Read the data from the MII data register */
> + data = (int)readl(priv->ioaddr + mii_data) & GENMASK(15, 0);

Is the cast needed? And why use GENMASK here, but not in all the other
places you have masks in this code?

>  /**
>   * stmmac_mdio_read
>   * @bus: points to the mii_bus structure
> @@ -59,6 +141,9 @@ static int stmmac_mdio_read(struct mii_bus *bus, int 
> phyaddr, int phyreg)
>   int data;
>   u32 value = MII_BUSY;
>  
> + if (priv->plat->has_xgmac)
> + return stmmac_xgmac2_mdio_read(priv, phyaddr, phyreg);

It would be cleaner to instead do this in stmmac_mdio_register() when
setting new_bus->read.

Andrew


Re: [PATCH net-next 9/9] bindings: net: stmmac: Add the bindings documentation for XGMAC2.

2018-08-01 Thread Sergei Shtylyov
Hello!

On 08/01/2018 03:10 PM, Jose Abreu wrote:

> Adds the documentation for XGMAC2 DT bindings.
> 
> Signed-off-by: Jose Abreu 
> Cc: David S. Miller 
> Cc: Joao Pinto 
> Cc: Giuseppe Cavallaro 
> Cc: Alexandre Torgue 
> ---
>  Documentation/devicetree/bindings/net/stmmac.txt | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/stmmac.txt 
> b/Documentation/devicetree/bindings/net/stmmac.txt
> index 3a28a5d8857d..525425beb6e7 100644
> --- a/Documentation/devicetree/bindings/net/stmmac.txt
> +++ b/Documentation/devicetree/bindings/net/stmmac.txt
> @@ -1,7 +1,8 @@
>  * STMicroelectronics 10/100/1000 Ethernet driver (GMAC)
>  
>  Required properties:
> -- compatible: Should be "snps,dwmac-", "snps,dwmac"
> +- compatible: Should be "snps,dwmac-", "snps,dwmac" or
> + "snps,dwxgmac-

[...]

MBR, Sergei


Re: [PATCH bpf] xdp: add NULL pointer check in __xdp_return()

2018-08-01 Thread Björn Töpel
Den ons 1 aug. 2018 kl 16:14 skrev Jesper Dangaard Brouer :
>
> On Mon, 23 Jul 2018 11:41:02 +0200
> Björn Töpel  wrote:
>
> > > >> diff --git a/net/core/xdp.c b/net/core/xdp.c
> > > >> index 9d1f220..1c12bc7 100644
> > > >> --- a/net/core/xdp.c
> > > >> +++ b/net/core/xdp.c
> > > >> @@ -345,7 +345,8 @@ static void __xdp_return(void *data, struct 
> > > >> xdp_mem_info *mem, bool napi_direct,
> > > >>   rcu_read_lock();
> > > >>   /* mem->id is valid, checked in 
> > > >> xdp_rxq_info_reg_mem_model() */
> > > >>   xa = rhashtable_lookup(mem_id_ht, >id, 
> > > >> mem_id_rht_params);
> > > >> - xa->zc_alloc->free(xa->zc_alloc, handle);
> > > >> + if (xa)
> > > >> + xa->zc_alloc->free(xa->zc_alloc, handle);
> > > > hmm...It is not clear to me the "!xa" case don't have to be handled?
> > >
> > > Thank you for reviewing!
> > >
> > > Returning NULL pointer is bug case such as calling after use
> > > xdp_rxq_info_unreg().
> > > so that, I think it can't handle at that moment.
> > > we can make __xdp_return to add WARN_ON_ONCE() or
> > > add return error code to driver.
> > > But I'm not sure if these is useful information.
> > >
> > > I might have misunderstood scenario of MEM_TYPE_ZERO_COPY
> > > because there is no use case of MEM_TYPE_ZERO_COPY yet.
> > >
> >
> > Taehee, again, sorry for the slow response and thanks for patch!
> >
> > If xa is NULL, the driver has a buggy/broken implementation. What
> > would be a proper way of dealing with this? BUG?
>
> Hmm... I don't like these kind of changes to the hot-path code!
>
> You might not realize this, but adding BUG() and WARN_ON() to the code
> affect performance in ways you might not realize!  These macros gets
> compiled and uses an asm instruction called "ud2".  Seeing the "ud2"
> instruction causes the CPUs instruction cache prefetcher to stop.
> Thus, if some code ends up below this instruction, this will cause more
> i-cache-misses.
>
> I don't know if xa==NULL is even possible, but if it is, then I think
> this is a result of a driver mem_reg API usage bug.  And the mem-reg
> API is full of WARN's and error messages, exactly to push these kind of
> checks out of the fast-path.  There is no need for a BUG() call, as
> deref a NULL pointer will case an OOPS, that is easy to read and
> understand.
>

Jesper, thanks for having a look! So, you're right that if xa==NULL
the driver is "broken/buggy" (as stated earlier!). I agree that
OOPSing on a NULL pointer is as good as a BUG!

The applied patch adds a WARN_ON_ONCE, and I thought best practice was
that a buggy driver shouldn't crash the kernel... What is considered
best practices in these scenarios? *I'd* prefer an OOPS instead of
WARN_ON_ONCE, to catch that buggy driver. Again, that's me. I thought
that most people prefer not crashing, hence the patch. :-)


Björn

> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH net-next v5 1/4] net/sched: user-space can't set unknown tcfa_action values

2018-08-01 Thread Jamal Hadi Salim

On 31/07/18 10:40 AM, Paolo Abeni wrote:


If we choose to reject unknown opcodes, such user-space configuration
will fail.



I think that is a good thing. The kernel should not be accepting things
it doesnt understand. This is a good opportunity to enforce that.


What would happen before this patch is that configurations using such
TC_ACT_ value would be successful. This is why I proposed to keep
the fixup.



Note: Such behavior can only occur if tc(user space) allows you
to pass illegitimate values which today can only happen when you have a 
new user space but older kernel (with "old" starting with your current

changes).
iow, fixing a policy in a kernel which has no support for TC_ACT_
to translate intent to be TC_ACT_OK/PIPE is problematic (as i was
showing earlier).


I initially thought the kernel behavior in the above scenario would
match exactly TC_ACT_UNSPEC processing, but as you noted with the
example in your previous email, TC_ACT_UNSPEC processing is actually a
bit different.



I worry: I dont think we can get a good default for most use
cases. No point in maintaining faulty  expectations
(because  IMO: the user will - eventually - fix their scripts if they
dont see expected behavior).

cheers,
jamal


Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask

2018-08-01 Thread Max Gurtovoy



On 8/1/2018 8:12 AM, Sagi Grimberg wrote:

Hi Max,


Hi,




Yes, since nvmf is the only user of this function.
Still waiting for comments on the suggested patch :)



Sorry for the late response (but I'm on vacation so I have
an excuse ;))


NP :) currently the code works..



I'm thinking that we should avoid trying to find an assignment
when stuff like irqbalance daemon is running and changing
the affinitization.


but this is exactly what Steve complained and Leon try to fix (and break 
the connection establishment).
If this is the case and we all agree then we're good without Leon's 
patch and without our suggestions.




This extension was made to apply optimal affinity assignment
when the device irq affinity is lined up in a vector per
core.

I'm thinking that when we identify this is not the case, we immediately
fallback to the default mapping.

1. when we get a mask, if its weight != 1, we fallback.
2. if a queue was left unmapped, we fallback.

Maybe something like the following:


did you test it ? I think it will not work since you need to map all the 
queues and all the CPUs.



--
diff --git a/block/blk-mq-rdma.c b/block/blk-mq-rdma.c
index 996167f1de18..1ada6211c55e 100644
--- a/block/blk-mq-rdma.c
+++ b/block/blk-mq-rdma.c
@@ -35,17 +35,26 @@ int blk_mq_rdma_map_queues(struct blk_mq_tag_set *set,
     const struct cpumask *mask;
     unsigned int queue, cpu;

+   /* reset all CPUs mapping */
+   for_each_possible_cpu(cpu)
+   set->mq_map[cpu] = UINT_MAX;
+
     for (queue = 0; queue < set->nr_hw_queues; queue++) {
     mask = ib_get_vector_affinity(dev, first_vec + queue);
     if (!mask)
     goto fallback;

-   for_each_cpu(cpu, mask)
-   set->mq_map[cpu] = queue;
+   if (cpumask_weight(mask) != 1)
+   goto fallback;
+
+   cpu = cpumask_first(mask);
+   if (set->mq_map[cpu] != UINT_MAX)
+   goto fallback;
+
+   set->mq_map[cpu] = queue;
     }

     return 0;
-
  fallback:
     return blk_mq_map_queues(set);
  }
--


see attached another algorithem that can improve the mapping (although 
it's not a short one)...


it will try to map according to affinity mask, and also in case the mask 
weight > 1 it will try to be better than the naive mapping I suggest in 
the previous email.


From 007d773af7b65a1f1ca543f031ca58b3afa5b7d9 Mon Sep 17 00:00:00 2001
From: Max Gurtovoy 
Date: Thu, 19 Jul 2018 12:42:00 +
Subject: [PATCH 1/1] blk-mq: fix RDMA queue/cpu mappings assignments for mq

Signed-off-by: Max Gurtovoy 
Signed-off-by: Israel Rukshin 
---
 block/blk-mq-cpumap.c  | 41 ++--
 block/blk-mq-rdma.c| 84 ++
 include/linux/blk-mq.h |  1 +
 3 files changed, 103 insertions(+), 23 deletions(-)

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 3eb169f..02b888f 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -30,29 +30,36 @@ static int get_first_sibling(unsigned int cpu)
return cpu;
 }
 
-int blk_mq_map_queues(struct blk_mq_tag_set *set)
+void blk_mq_map_queue_to_cpu(struct blk_mq_tag_set *set, unsigned int cpu)
 {
unsigned int *map = set->mq_map;
unsigned int nr_queues = set->nr_hw_queues;
-   unsigned int cpu, first_sibling;
+   unsigned int first_sibling;
 
-   for_each_possible_cpu(cpu) {
-   /*
-* First do sequential mapping between CPUs and queues.
-* In case we still have CPUs to map, and we have some number of
-* threads per cores then map sibling threads to the same queue 
for
-* performace optimizations.
-*/
-   if (cpu < nr_queues) {
+   /*
+* First do sequential mapping between CPUs and queues.
+* In case we still have CPUs to map, and we have some number of
+* threads per cores then map sibling threads to the same queue for
+* performace optimizations.
+*/
+   if (cpu < nr_queues) {
+   map[cpu] = cpu_to_queue_index(nr_queues, cpu);
+   } else {
+   first_sibling = get_first_sibling(cpu);
+   if (first_sibling == cpu)
map[cpu] = cpu_to_queue_index(nr_queues, cpu);
-   } else {
-   first_sibling = get_first_sibling(cpu);
-   if (first_sibling == cpu)
-   map[cpu] = cpu_to_queue_index(nr_queues, cpu);
-   else
-   map[cpu] = map[first_sibling];
-   }
+   else
+   map[cpu] = map[first_sibling];
}
+}
+EXPORT_SYMBOL_GPL(blk_mq_map_queue_to_cpu);
+
+int blk_mq_map_queues(struct blk_mq_tag_set *set)
+{
+   unsigned int cpu;
+
+   

Re: [PATCH bpf] xdp: add NULL pointer check in __xdp_return()

2018-08-01 Thread Jesper Dangaard Brouer
On Mon, 23 Jul 2018 11:41:02 +0200
Björn Töpel  wrote:

> > >> diff --git a/net/core/xdp.c b/net/core/xdp.c
> > >> index 9d1f220..1c12bc7 100644
> > >> --- a/net/core/xdp.c
> > >> +++ b/net/core/xdp.c
> > >> @@ -345,7 +345,8 @@ static void __xdp_return(void *data, struct 
> > >> xdp_mem_info *mem, bool napi_direct,
> > >>   rcu_read_lock();
> > >>   /* mem->id is valid, checked in 
> > >> xdp_rxq_info_reg_mem_model() */
> > >>   xa = rhashtable_lookup(mem_id_ht, >id, 
> > >> mem_id_rht_params);
> > >> - xa->zc_alloc->free(xa->zc_alloc, handle);
> > >> + if (xa)
> > >> + xa->zc_alloc->free(xa->zc_alloc, handle);  
> > > hmm...It is not clear to me the "!xa" case don't have to be handled?  
> >
> > Thank you for reviewing!
> >
> > Returning NULL pointer is bug case such as calling after use
> > xdp_rxq_info_unreg().
> > so that, I think it can't handle at that moment.
> > we can make __xdp_return to add WARN_ON_ONCE() or
> > add return error code to driver.
> > But I'm not sure if these is useful information.
> >
> > I might have misunderstood scenario of MEM_TYPE_ZERO_COPY
> > because there is no use case of MEM_TYPE_ZERO_COPY yet.
> >  
> 
> Taehee, again, sorry for the slow response and thanks for patch!
> 
> If xa is NULL, the driver has a buggy/broken implementation. What
> would be a proper way of dealing with this? BUG?

Hmm... I don't like these kind of changes to the hot-path code!

You might not realize this, but adding BUG() and WARN_ON() to the code
affect performance in ways you might not realize!  These macros gets
compiled and uses an asm instruction called "ud2".  Seeing the "ud2"
instruction causes the CPUs instruction cache prefetcher to stop.
Thus, if some code ends up below this instruction, this will cause more
i-cache-misses.

I don't know if xa==NULL is even possible, but if it is, then I think
this is a result of a driver mem_reg API usage bug.  And the mem-reg
API is full of WARN's and error messages, exactly to push these kind of
checks out of the fast-path.  There is no need for a BUG() call, as
deref a NULL pointer will case an OOPS, that is easy to read and
understand.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


RE: [net-next v5 3/3] net/tls: Remove redundant array allocation.

2018-08-01 Thread Vakul Garg



> -Original Message-
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Monday, July 23, 2018 10:05 PM
> To: David Miller 
> Cc: Vakul Garg ; netdev@vger.kernel.org;
> bor...@mellanox.com; avia...@mellanox.com; Doron Roberts-Kedes
> 
> Subject: Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.
> 
> On 07/21/18 07:25 PM, David Miller wrote:
> > From: Vakul Garg 
> > Date: Thu, 19 Jul 2018 21:56:13 +0530
> >
> > > In function decrypt_skb(), array allocation in case when sgout is
> > > NULL is unnecessary. Instead, local variable sgin_arr[] can be used.
> > >
> > > Signed-off-by: Vakul Garg 
> >
> > Hmmm...
> >
> > Dave, can you take a look at this?  Do you think there might have been
> > a reason you felt that you needed to dynamically allocate the
> > scatterlists when you COW and skb and do in-place decryption?
> >
> > I guess this change is ok, nsg can only get smaller when the SKB is
> > COW'd.
> >
> 
> > > memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
> > > if (!sgout) {
> > > nsg = skb_cow_data(skb, 0, ) + 1;
> > > -   sgin = kmalloc_array(nsg, sizeof(*sgin), 
> > > sk->sk_allocation);
> > > sgout = sgin;
> > > }
> 
> I don't think this patch is safe as-is.  sgin_arr is a stack array of size
> MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is that it
> walks the whole chain of skbs from skb->next, and can return any number of
> segments.  Therefore we need to heap allocate.  I think I copied the IPSEC
> code here.
> 
> For perf though, we could use the stack array if skb_cow_data returns <=
> MAX_SKB_FRAGS.

skb_cow_data() is being called only when caller passes sgout=NULL (i.e.
non-zero copy case). In case of zero-copy, we do not call skb_cow_data()
and just assume that MAX_SKB_FRAGS+2 sized scatterlist array sgin_arr[]
is sufficient. This assumption could be wrong. So skb_cow_data() should be
called unconditionally to determine number of scatterlist entries required
for skb.

> 
> This code is slightly confusing though, since we don't heap allocate in the
> zerocopy case - what happens is that skb_to_sgvec returns -EMSGSIZE, and
> we fall back to the non-zerocopy case, and return again to this function,
> where we then hit the skb_cow_data path and heap allocate.

If skb_to_sgvec return -EMSGSIZE, decrypt_skb() would return failure, 
resulting in abort of TLS session.



[PATCH bpf] selftests/bpf: update test_lwt_seg6local.sh according to iproute2

2018-08-01 Thread Mathieu Xhonneux
The shell file for test_lwt_seg6local contains an early iproute2 syntax
for installing a seg6local End.BPF route. iproute2 support for this
feature has recently been upstreamed, but with an additional keyword
required. This patch updates test_lwt_seg6local.sh to the definitive
iproute2 syntax

Signed-off-by: Mathieu Xhonneux 
---
 tools/testing/selftests/bpf/test_lwt_seg6local.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_lwt_seg6local.sh 
b/tools/testing/selftests/bpf/test_lwt_seg6local.sh
index 270fa8f49573..785eabf2a593 100755
--- a/tools/testing/selftests/bpf/test_lwt_seg6local.sh
+++ b/tools/testing/selftests/bpf/test_lwt_seg6local.sh
@@ -115,14 +115,14 @@ ip netns exec ns2 ip -6 route add fb00::6 encap bpf in 
obj test_lwt_seg6local.o
 ip netns exec ns2 ip -6 route add fd00::1 dev veth3 via fb00::43 scope link
 
 ip netns exec ns3 ip -6 route add fc42::1 dev veth5 via fb00::65
-ip netns exec ns3 ip -6 route add fd00::1 encap seg6local action End.BPF obj 
test_lwt_seg6local.o sec add_egr_x dev veth4
+ip netns exec ns3 ip -6 route add fd00::1 encap seg6local action End.BPF 
endpoint obj test_lwt_seg6local.o sec add_egr_x dev veth4
 
-ip netns exec ns4 ip -6 route add fd00::2 encap seg6local action End.BPF obj 
test_lwt_seg6local.o sec pop_egr dev veth6
+ip netns exec ns4 ip -6 route add fd00::2 encap seg6local action End.BPF 
endpoint obj test_lwt_seg6local.o sec pop_egr dev veth6
 ip netns exec ns4 ip -6 addr add fc42::1 dev lo
 ip netns exec ns4 ip -6 route add fd00::3 dev veth7 via fb00::87
 
 ip netns exec ns5 ip -6 route add fd00::4 table 117 dev veth9 via fb00::109
-ip netns exec ns5 ip -6 route add fd00::3 encap seg6local action End.BPF obj 
test_lwt_seg6local.o sec inspect_t dev veth8
+ip netns exec ns5 ip -6 route add fd00::3 encap seg6local action End.BPF 
endpoint obj test_lwt_seg6local.o sec inspect_t dev veth8
 
 ip netns exec ns6 ip -6 addr add fb00::6/16 dev lo
 ip netns exec ns6 ip -6 addr add fd00::4/16 dev lo
-- 
2.16.1



how PHY driver is notified that cable is unplugged? (possibly related to IFF_RUNNING flag)

2018-08-01 Thread rpjday



  (warning that i have a few questions that are probably trivial  
until i get up to speed

with networking code.)

  a colleague asked for advice about the following -- apparently a  
new PHY driver works
properly when being brought up with "ifconfig  up", part of  
that process

apparently setting the IFF_RUNNING net_device flags bit. so far, so good.

  now, when the cable is physically unplugged, the claim is that  
there is no obvious
status change for that port, accompanied by the suggestion that it is  
that IFF_RUNNING

flag bit that is not being unset.

  asking a more general question, where can i read up on the proper  
protocol for
a driver being notified of, and properly handling, physical  
disconnection on that

port? and, of course, the cable being plugged back in.

  thanks in advance for any pointers.

rday



Re: [RFC PATCH 0/2] net: macb: Disable TX checksum offloading on all Zynq

2018-08-01 Thread Harini Katakam
Hi Jennifer,

On Tue, Jun 5, 2018 at 10:21 AM, Harini Katakam  wrote:
> Hi Jeniffer,
>
> On Mon, Jun 4, 2018 at 8:35 PM, Nicolas Ferre
>  wrote:
>> Jennifer,
>>
>> On 25/05/2018 at 23:44, Jennifer Dahm wrote:
>>>
>>> During testing, I discovered that the Zynq GEM hardware overwrites all
>>> outgoing UDP packet checksums, which is illegal in packet forwarding
>>> cases. This happens both with and without the checksum-zeroing
>>> behavior  introduced  in  007e4ba3ee137f4700f39aa6dbaf01a71047c5f6
>>> ("net: macb: initialize checksum when using checksum offloading"). The
>>> only solution to both the small packet bug and the packet forwarding
>>> bug that I can find is to disable TX checksum offloading entirely.
>>
>>
>
> Thanks for the extensive testing.
> I'll try to reproduce and see if it is something to be fixed in the driver.
>
>> Are the bugs listed above present in all revisions of the GEM IP, only for
>> some revisions?
>> Is there an errata that describe this issue for the Zynq GEM?
>
> @Nicolas, AFAIK, there is no errata for this in either Cadence or
> Zynq documentation.

I was unable to reproduce this issue on Zynq.
Although I do not have HW with two GEM ports,
I tried by routing one GEM via PL and another via on board RGMII.
Since there was no specific errata related to this, I also tried on
subsequent ZynqMP versions with multiple GEM ports but dint find any
checksum issues. I discussed the same with cadence and they
tried the test with 2 bytes of UDP payload on the Zynq GEM IP
version in their regressions and did not hit any issue either.

I tried to reach out earlier to see if you can share your exact
application. Could you please let me know if you have any
further updates?

Regards,
Harini


[PATCH net-next] cxgb4: fix endian to test F_FW_PORT_CMD_DCBXDIS32

2018-08-01 Thread Ganesh Goudar
For FW_PORT_ACTION_GET_PORT_INFO32 messages, the
u.info32.lstatus32_to_cbllen32 is 32-bit Big Endian.
We need to translate that to CPU Endian in order to
test F_FW_PORT_CMD_DCBXDIS32.

Signed-off-by: Casey Leedom 
Signed-off-by: Ganesh Goudar 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 40cf8dc..674997d 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -554,10 +554,9 @@ static int fwevtq_handler(struct sge_rspq *q, const __be64 
*rsp,
 
dev = q->adap->port[q->adap->chan_map[port]];
dcbxdis = (action == FW_PORT_ACTION_GET_PORT_INFO
-  ? !!(pcmd->u.info.dcbxdis_pkd &
-   FW_PORT_CMD_DCBXDIS_F)
-  : !!(pcmd->u.info32.lstatus32_to_cbllen32 &
-   FW_PORT_CMD_DCBXDIS32_F));
+ ? !!(pcmd->u.info.dcbxdis_pkd & FW_PORT_CMD_DCBXDIS_F)
+ : !!(be32_to_cpu(pcmd->u.info32.lstatus32_to_cbllen32)
+  & FW_PORT_CMD_DCBXDIS32_F));
state_input = (dcbxdis
   ? CXGB4_DCB_INPUT_FW_DISABLED
   : CXGB4_DCB_INPUT_FW_ENABLED);
-- 
2.1.0



[PATCH net-next 6/9] net: stmmac: Add PTP support for XGMAC2

2018-08-01 Thread Jose Abreu
XGMAC2 uses the same engine of timestamping as GMAC4. Let's use the same
callbacks.

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
---
 drivers/net/ethernet/stmicro/stmmac/hwif.c   | 4 ++--
 drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c | 6 --
 drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h | 1 +
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.c 
b/drivers/net/ethernet/stmicro/stmmac/hwif.c
index 4b4ba1c8bad5..357309a6d6a5 100644
--- a/drivers/net/ethernet/stmicro/stmmac/hwif.c
+++ b/drivers/net/ethernet/stmicro/stmmac/hwif.c
@@ -193,13 +193,13 @@ static const struct stmmac_hwif_entry {
.xgmac = true,
.min_id = DWXGMAC_CORE_2_10,
.regs = {
-   .ptp_off = 0,
+   .ptp_off = PTP_XGMAC_OFFSET,
.mmc_off = 0,
},
.desc = _desc_ops,
.dma = _dma_ops,
.mac = _ops,
-   .hwtimestamp = NULL,
+   .hwtimestamp = _ptp,
.mode = NULL,
.tc = NULL,
.setup = dwxgmac2_setup,
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
index 0cb0e39a2be9..2293e21f789f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
@@ -71,6 +71,9 @@ static int stmmac_adjust_time(struct ptp_clock_info *ptp, s64 
delta)
u32 sec, nsec;
u32 quotient, reminder;
int neg_adj = 0;
+   bool xmac;
+
+   xmac = priv->plat->has_gmac4 || priv->plat->has_xgmac;
 
if (delta < 0) {
neg_adj = 1;
@@ -82,8 +85,7 @@ static int stmmac_adjust_time(struct ptp_clock_info *ptp, s64 
delta)
nsec = reminder;
 
spin_lock_irqsave(>ptp_lock, flags);
-   stmmac_adjust_systime(priv, priv->ptpaddr, sec, nsec, neg_adj,
-   priv->plat->has_gmac4);
+   stmmac_adjust_systime(priv, priv->ptpaddr, sec, nsec, neg_adj, xmac);
spin_unlock_irqrestore(>ptp_lock, flags);
 
return 0;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h
index f4b31d69f60e..ecccf895fd7e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h
@@ -21,6 +21,7 @@
 #ifndef__STMMAC_PTP_H__
 #define__STMMAC_PTP_H__
 
+#define PTP_XGMAC_OFFSET   0xd00
 #definePTP_GMAC4_OFFSET0xb00
 #definePTP_GMAC3_X_OFFSET  0x700
 
-- 
2.7.4




[PATCH net-next 7/9] net: stmmac: Integrate XGMAC into main driver flow

2018-08-01 Thread Jose Abreu
Now that we have all the XGMAC related callbacks, lets start integrating
this IP block into main driver.

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 55 ++-
 1 file changed, 45 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index d9e60cfd8a85..2c7a571140bb 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -51,6 +51,7 @@
 #include 
 #include 
 #include "dwmac1000.h"
+#include "dwxgmac2.h"
 #include "hwif.h"
 
 #define STMMAC_ALIGN(x)L1_CACHE_ALIGN(x)
@@ -262,6 +263,21 @@ static void stmmac_clk_csr_set(struct stmmac_priv *priv)
else
priv->clk_csr = 0;
}
+
+   if (priv->plat->has_xgmac) {
+   if (clk_rate > 4)
+   priv->clk_csr = 0x5;
+   else if (clk_rate > 35000)
+   priv->clk_csr = 0x4;
+   else if (clk_rate > 3)
+   priv->clk_csr = 0x3;
+   else if (clk_rate > 25000)
+   priv->clk_csr = 0x2;
+   else if (clk_rate > 15000)
+   priv->clk_csr = 0x1;
+   else
+   priv->clk_csr = 0x0;
+   }
 }
 
 static void print_pkt(unsigned char *buf, int len)
@@ -498,7 +514,7 @@ static void stmmac_get_rx_hwtstamp(struct stmmac_priv 
*priv, struct dma_desc *p,
if (!priv->hwts_rx_en)
return;
/* For GMAC4, the valid timestamp is from CTX next desc. */
-   if (priv->plat->has_gmac4)
+   if (priv->plat->has_gmac4 || priv->plat->has_xgmac)
desc = np;
 
/* Check if timestamp is available */
@@ -540,6 +556,9 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, 
struct ifreq *ifr)
u32 ts_event_en = 0;
u32 value = 0;
u32 sec_inc;
+   bool xmac;
+
+   xmac = priv->plat->has_gmac4 || priv->plat->has_xgmac;
 
if (!(priv->dma_cap.time_stamp || priv->adv_ts)) {
netdev_alert(priv->dev, "No support for HW time stamping\n");
@@ -575,7 +594,7 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, 
struct ifreq *ifr)
/* PTP v1, UDP, any kind of event packet */
config.rx_filter = HWTSTAMP_FILTER_PTP_V1_L4_EVENT;
/* take time stamp for all event messages */
-   if (priv->plat->has_gmac4)
+   if (xmac)
snap_type_sel = PTP_GMAC4_TCR_SNAPTYPSEL_1;
else
snap_type_sel = PTP_TCR_SNAPTYPSEL_1;
@@ -610,7 +629,7 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, 
struct ifreq *ifr)
config.rx_filter = HWTSTAMP_FILTER_PTP_V2_L4_EVENT;
ptp_v2 = PTP_TCR_TSVER2ENA;
/* take time stamp for all event messages */
-   if (priv->plat->has_gmac4)
+   if (xmac)
snap_type_sel = PTP_GMAC4_TCR_SNAPTYPSEL_1;
else
snap_type_sel = PTP_TCR_SNAPTYPSEL_1;
@@ -647,7 +666,7 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, 
struct ifreq *ifr)
config.rx_filter = HWTSTAMP_FILTER_PTP_V2_EVENT;
ptp_v2 = PTP_TCR_TSVER2ENA;
/* take time stamp for all event messages */
-   if (priv->plat->has_gmac4)
+   if (xmac)
snap_type_sel = PTP_GMAC4_TCR_SNAPTYPSEL_1;
else
snap_type_sel = PTP_TCR_SNAPTYPSEL_1;
@@ -718,7 +737,7 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, 
struct ifreq *ifr)
/* program Sub Second Increment reg */
stmmac_config_sub_second_increment(priv,
priv->ptpaddr, priv->plat->clk_ptp_rate,
-   priv->plat->has_gmac4, _inc);
+   xmac, _inc);
temp = div_u64(10ULL, sec_inc);
 
/* Store sub second increment and flags for later use */
@@ -755,12 +774,14 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, 
struct ifreq *ifr)
  */
 static int stmmac_init_ptp(struct stmmac_priv *priv)
 {
+   bool xmac = priv->plat->has_gmac4 || priv->plat->has_xgmac;
+
if (!(priv->dma_cap.time_stamp || priv->dma_cap.atime_stamp))
return -EOPNOTSUPP;
 
priv->adv_ts = 0;
-   /* Check if adv_ts can be enabled for dwmac 4.x core */
-   if 

[PATCH net-next 9/9] bindings: net: stmmac: Add the bindings documentation for XGMAC2.

2018-08-01 Thread Jose Abreu
Adds the documentation for XGMAC2 DT bindings.

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
---
 Documentation/devicetree/bindings/net/stmmac.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/stmmac.txt 
b/Documentation/devicetree/bindings/net/stmmac.txt
index 3a28a5d8857d..525425beb6e7 100644
--- a/Documentation/devicetree/bindings/net/stmmac.txt
+++ b/Documentation/devicetree/bindings/net/stmmac.txt
@@ -1,7 +1,8 @@
 * STMicroelectronics 10/100/1000 Ethernet driver (GMAC)
 
 Required properties:
-- compatible: Should be "snps,dwmac-", "snps,dwmac"
+- compatible: Should be "snps,dwmac-", "snps,dwmac" or
+   "snps,dwxgmac-

[PATCH net-next 8/9] net: stmmac: Add the bindings parsing for XGMAC2

2018-08-01 Thread Jose Abreu
Add the bindings parsing for XGMAC2 IP block.

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c   | 2 ++
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 6 ++
 2 files changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c
index 3304095c934c..fad503820e04 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c
@@ -78,6 +78,8 @@ static const struct of_device_id dwmac_generic_match[] = {
{ .compatible = "snps,dwmac-4.00"},
{ .compatible = "snps,dwmac-4.10a"},
{ .compatible = "snps,dwmac"},
+   { .compatible = "snps,dwxgmac-2.10"},
+   { .compatible = "snps,dwxgmac"},
{ }
 };
 MODULE_DEVICE_TABLE(of, dwmac_generic_match);
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 72da77b94ecd..3609c7b696c7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -486,6 +486,12 @@ stmmac_probe_config_dt(struct platform_device *pdev, const 
char **mac)
plat->force_sf_dma_mode = 1;
}
 
+   if (of_device_is_compatible(np, "snps,dwxgmac")) {
+   plat->has_xgmac = 1;
+   plat->pmt = 1;
+   plat->tso_en = of_property_read_bool(np, "snps,tso");
+   }
+
dma_cfg = devm_kzalloc(>dev, sizeof(*dma_cfg),
   GFP_KERNEL);
if (!dma_cfg) {
-- 
2.7.4




[PATCH net-next 3/9] net: stmmac: Add DMA related callbacks for XGMAC2

2018-08-01 Thread Jose Abreu
Add the DMA related callbacks for the new IP block XGMAC2.

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
---
 drivers/net/ethernet/stmicro/stmmac/Makefile   |   2 +-
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h |  56 +++
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c | 410 +
 drivers/net/ethernet/stmicro/stmmac/hwif.c |   2 +-
 drivers/net/ethernet/stmicro/stmmac/hwif.h |   1 +
 5 files changed, 469 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c

diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile 
b/drivers/net/ethernet/stmicro/stmmac/Makefile
index a6cf632c9592..da40d3bba037 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Makefile
+++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
@@ -5,7 +5,7 @@ stmmac-objs:= stmmac_main.o stmmac_ethtool.o stmmac_mdio.o 
ring_mode.o  \
  dwmac100_core.o dwmac100_dma.o enh_desc.o norm_desc.o \
  mmc_core.o stmmac_hwtstamp.o stmmac_ptp.o dwmac4_descs.o  \
  dwmac4_dma.o dwmac4_lib.o dwmac4_core.o dwmac5.o hwif.o \
- stmmac_tc.o dwxgmac2_core.o $(stmmac-y)
+ stmmac_tc.o dwxgmac2_core.o dwxgmac2_dma.o $(stmmac-y)
 
 # Ordering matters. Generic driver must be last.
 obj-$(CONFIG_STMMAC_PLATFORM)  += stmmac-platform.o
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h 
b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
index ef63a62a699d..69d225a9c890 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
@@ -137,4 +137,60 @@
 #define XGMAC_ABPSIS   BIT(1)
 #define XGMAC_TXUNFIS  BIT(0)
 
+/* DMA Registers */
+#define XGMAC_DMA_MODE 0x3000
+#define XGMAC_SWR  BIT(0)
+#define XGMAC_DMA_SYSBUS_MODE  0x3004
+#define XGMAC_WR_OSR_LMT   GENMASK(29, 24)
+#define XGMAC_WR_OSR_LMT_SHIFT 24
+#define XGMAC_RD_OSR_LMT   GENMASK(21, 16)
+#define XGMAC_RD_OSR_LMT_SHIFT 16
+#define XGMAC_EN_LPI   BIT(15)
+#define XGMAC_LPI_XIT_PKT  BIT(14)
+#define XGMAC_AAL  BIT(12)
+#define XGMAC_BLEN256  BIT(7)
+#define XGMAC_BLEN128  BIT(6)
+#define XGMAC_BLEN64   BIT(5)
+#define XGMAC_BLEN32   BIT(4)
+#define XGMAC_BLEN16   BIT(3)
+#define XGMAC_BLEN8BIT(2)
+#define XGMAC_BLEN4BIT(1)
+#define XGMAC_UNDEFBIT(0)
+#define XGMAC_DMA_CH_CONTROL(x)(0x3100 + (0x80 * (x)))
+#define XGMAC_PBLx8BIT(16)
+#define XGMAC_DMA_CH_TX_CONTROL(x) (0x3104 + (0x80 * (x)))
+#define XGMAC_TxPBLGENMASK(21, 16)
+#define XGMAC_TxPBL_SHIFT  16
+#define XGMAC_TSE  BIT(12)
+#define XGMAC_OSP  BIT(4)
+#define XGMAC_TXST BIT(0)
+#define XGMAC_DMA_CH_RX_CONTROL(x) (0x3108 + (0x80 * (x)))
+#define XGMAC_RxPBLGENMASK(21, 16)
+#define XGMAC_RxPBL_SHIFT  16
+#define XGMAC_RXST BIT(0)
+#define XGMAC_DMA_CH_TxDESC_LADDR(x)   (0x3114 + (0x80 * (x)))
+#define XGMAC_DMA_CH_RxDESC_LADDR(x)   (0x311c + (0x80 * (x)))
+#define XGMAC_DMA_CH_TxDESC_TAIL_LPTR(x)   (0x3124 + (0x80 * (x)))
+#define XGMAC_DMA_CH_RxDESC_TAIL_LPTR(x)   (0x312c + (0x80 * (x)))
+#define XGMAC_DMA_CH_TxDESC_RING_LEN(x)(0x3130 + (0x80 * 
(x)))
+#define XGMAC_DMA_CH_RxDESC_RING_LEN(x)(0x3134 + (0x80 * 
(x)))
+#define XGMAC_DMA_CH_INT_EN(x) (0x3138 + (0x80 * (x)))
+#define XGMAC_NIE  BIT(15)
+#define XGMAC_AIE  BIT(14)
+#define XGMAC_RBUE BIT(7)
+#define XGMAC_RIE  BIT(6)
+#define XGMAC_TIE  BIT(0)
+#define XGMAC_DMA_INT_DEFAULT_EN   (XGMAC_NIE | XGMAC_AIE | XGMAC_RBUE | \
+   XGMAC_RIE | XGMAC_TIE)
+#define XGMAC_DMA_CH_Rx_WATCHDOG(x)(0x313c + (0x80 * (x)))
+#define XGMAC_RWT  GENMASK(7, 0)
+#define XGMAC_DMA_CH_STATUS(x) (0x3160 + (0x80 * (x)))
+#define XGMAC_NIS  BIT(15)
+#define XGMAC_AIS  BIT(14)
+#define XGMAC_FBE  BIT(12)
+#define XGMAC_RBU  BIT(7)
+#define XGMAC_RI   BIT(6)
+#define XGMAC_TPS  BIT(1)
+#define XGMAC_TI   BIT(0)
+
 #endif /* __STMMAC_DWXGMAC2_H__ */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c 
b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c
new file mode 100644
index ..50d9fffc32b5
--- /dev/null
+++ 

[PATCH net-next 5/9] net: stmmac: Add MDIO related functions for XGMAC2

2018-08-01 Thread Jose Abreu
Add the MDIO related funcionalities for the new IP block XGMAC2.

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 88 +++
 1 file changed, 88 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
index 5df1a608e566..e6bf86dc14eb 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
@@ -39,6 +39,88 @@
 #define MII_GMAC4_WRITE(1 << MII_GMAC4_GOC_SHIFT)
 #define MII_GMAC4_READ (3 << MII_GMAC4_GOC_SHIFT)
 
+/* XGMAC defines */
+#define MII_XGMAC_CMD_SHIFT16
+#define MII_XGMAC_WRITE(1 << MII_XGMAC_CMD_SHIFT)
+#define MII_XGMAC_READ (3 << MII_XGMAC_CMD_SHIFT)
+#define MII_XGMAC_BUSY BIT(22)
+
+static int stmmac_xgmac2_mdio_read(struct stmmac_priv *priv, int phyaddr,
+  int phyreg)
+{
+   unsigned int mii_address = priv->hw->mii.addr;
+   unsigned int mii_data = priv->hw->mii.data;
+   u32 tmp, addr, value = MII_XGMAC_BUSY;
+   int data;
+
+   if (phyreg & MII_ADDR_C45) {
+   addr = ((phyreg >> 16) & 0x1f) << 21;
+   addr |= (phyaddr << 16) | (phyreg & 0x);
+   } else {
+   if (phyaddr >= 4)
+   return -ENODEV;
+   writel(~0x0, priv->ioaddr + 0x220);
+   addr = (phyaddr << 16) | (phyreg & 0x1f);
+   }
+
+   value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
+   & priv->hw->mii.clk_csr_mask;
+   value |= BIT(18);
+   value |= MII_XGMAC_READ;
+
+   if (readl_poll_timeout(priv->ioaddr + mii_data, tmp,
+  !(tmp & MII_XGMAC_BUSY), 100, 1))
+   return -EBUSY;
+
+   writel(addr, priv->ioaddr + mii_address);
+   writel(value, priv->ioaddr + mii_data);
+
+   if (readl_poll_timeout(priv->ioaddr + mii_data, tmp,
+  !(tmp & MII_XGMAC_BUSY), 100, 1))
+   return -EBUSY;
+
+   /* Read the data from the MII data register */
+   data = (int)readl(priv->ioaddr + mii_data) & GENMASK(15, 0);
+
+   return data;
+}
+
+static int stmmac_xgmac2_mdio_write(struct stmmac_priv *priv, int phyaddr,
+   int phyreg, u16 phydata)
+{
+   unsigned int mii_address = priv->hw->mii.addr;
+   unsigned int mii_data = priv->hw->mii.data;
+   u32 addr, tmp, value = MII_XGMAC_BUSY;
+
+   if (phyreg & MII_ADDR_C45) {
+   addr = ((phyreg >> 16) & 0x1f) << 21;
+   addr |= (phyaddr << 16) | (phyreg & 0x);
+   } else {
+   if (phyaddr >= 4)
+   return -ENODEV;
+   writel(~0x0, priv->ioaddr + 0x220);
+   addr = (phyaddr << 16) | (phyreg & 0x1f);
+   }
+
+   value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
+   & priv->hw->mii.clk_csr_mask;
+   value |= phydata | BIT(18);
+   value |= MII_XGMAC_WRITE;
+
+   /* Wait until any existing MII operation is complete */
+   if (readl_poll_timeout(priv->ioaddr + mii_data, tmp,
+  !(tmp & MII_XGMAC_BUSY), 100, 1))
+   return -EBUSY;
+
+   /* Set the MII address register to write */
+   writel(addr, priv->ioaddr + mii_address);
+   writel(value, priv->ioaddr + mii_data);
+
+   /* Wait until any existing MII operation is complete */
+   return readl_poll_timeout(priv->ioaddr + mii_data, tmp,
+ !(tmp & MII_XGMAC_BUSY), 100, 1);
+}
+
 /**
  * stmmac_mdio_read
  * @bus: points to the mii_bus structure
@@ -59,6 +141,9 @@ static int stmmac_mdio_read(struct mii_bus *bus, int 
phyaddr, int phyreg)
int data;
u32 value = MII_BUSY;
 
+   if (priv->plat->has_xgmac)
+   return stmmac_xgmac2_mdio_read(priv, phyaddr, phyreg);
+
value |= (phyaddr << priv->hw->mii.addr_shift)
& priv->hw->mii.addr_mask;
value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask;
@@ -101,6 +186,9 @@ static int stmmac_mdio_write(struct mii_bus *bus, int 
phyaddr, int phyreg,
u32 v;
u32 value = MII_BUSY;
 
+   if (priv->plat->has_xgmac)
+   return stmmac_xgmac2_mdio_write(priv, phyaddr, phyreg, phydata);
+
value |= (phyaddr << priv->hw->mii.addr_shift)
& priv->hw->mii.addr_mask;
value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask;
-- 
2.7.4




[PATCH net-next 4/9] net: stmmac: Add descriptor related callbacks for XGMAC2

2018-08-01 Thread Jose Abreu
Add the descriptor related callbacks for the new IP block XGMAC2.

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
---
 drivers/net/ethernet/stmicro/stmmac/Makefile   |   3 +-
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h |  30 +++
 .../net/ethernet/stmicro/stmmac/dwxgmac2_descs.c   | 280 +
 drivers/net/ethernet/stmicro/stmmac/hwif.c |   2 +-
 drivers/net/ethernet/stmicro/stmmac/hwif.h |   1 +
 5 files changed, 314 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c

diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile 
b/drivers/net/ethernet/stmicro/stmmac/Makefile
index da40d3bba037..99967a80a8c8 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Makefile
+++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
@@ -5,7 +5,8 @@ stmmac-objs:= stmmac_main.o stmmac_ethtool.o stmmac_mdio.o 
ring_mode.o  \
  dwmac100_core.o dwmac100_dma.o enh_desc.o norm_desc.o \
  mmc_core.o stmmac_hwtstamp.o stmmac_ptp.o dwmac4_descs.o  \
  dwmac4_dma.o dwmac4_lib.o dwmac4_core.o dwmac5.o hwif.o \
- stmmac_tc.o dwxgmac2_core.o dwxgmac2_dma.o $(stmmac-y)
+ stmmac_tc.o dwxgmac2_core.o dwxgmac2_dma.o dwxgmac2_descs.o \
+ $(stmmac-y)
 
 # Ordering matters. Generic driver must be last.
 obj-$(CONFIG_STMMAC_PLATFORM)  += stmmac-platform.o
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h 
b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
index 69d225a9c890..7d9f4428d8d5 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
@@ -193,4 +193,34 @@
 #define XGMAC_TPS  BIT(1)
 #define XGMAC_TI   BIT(0)
 
+/* Descriptors */
+#define XGMAC_TDES2_IOCBIT(31)
+#define XGMAC_TDES2_TTSE   BIT(30)
+#define XGMAC_TDES2_B2LGENMASK(29, 16)
+#define XGMAC_TDES2_B2L_SHIFT  16
+#define XGMAC_TDES2_B1LGENMASK(13, 0)
+#define XGMAC_TDES3_OWNBIT(31)
+#define XGMAC_TDES3_CTXT   BIT(30)
+#define XGMAC_TDES3_FD BIT(29)
+#define XGMAC_TDES3_LD BIT(28)
+#define XGMAC_TDES3_CPCGENMASK(27, 26)
+#define XGMAC_TDES3_CPC_SHIFT  26
+#define XGMAC_TDES3_TCMSSV BIT(26)
+#define XGMAC_TDES3_THLGENMASK(22, 19)
+#define XGMAC_TDES3_THL_SHIFT  19
+#define XGMAC_TDES3_TSEBIT(18)
+#define XGMAC_TDES3_CICGENMASK(17, 16)
+#define XGMAC_TDES3_CIC_SHIFT  16
+#define XGMAC_TDES3_TPLGENMASK(17, 0)
+#define XGMAC_TDES3_FL GENMASK(14, 0)
+#define XGMAC_RDES3_OWNBIT(31)
+#define XGMAC_RDES3_CTXT   BIT(30)
+#define XGMAC_RDES3_IOCBIT(30)
+#define XGMAC_RDES3_LD BIT(28)
+#define XGMAC_RDES3_CDABIT(27)
+#define XGMAC_RDES3_ES BIT(15)
+#define XGMAC_RDES3_PL GENMASK(13, 0)
+#define XGMAC_RDES3_TSDBIT(6)
+#define XGMAC_RDES3_TSABIT(4)
+
 #endif /* __STMMAC_DWXGMAC2_H__ */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c 
b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c
new file mode 100644
index ..1d858fdec997
--- /dev/null
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c
@@ -0,0 +1,280 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+/*
+ * Copyright (c) 2018 Synopsys, Inc. and/or its affiliates.
+ * stmmac XGMAC support.
+ */
+
+#include 
+#include "common.h"
+#include "dwxgmac2.h"
+
+static int dwxgmac2_get_tx_status(void *data, struct stmmac_extra_stats *x,
+ struct dma_desc *p, void __iomem *ioaddr)
+{
+   unsigned int tdes3 = le32_to_cpu(p->des3);
+   int ret = tx_done;
+
+   if (unlikely(tdes3 & XGMAC_TDES3_OWN))
+   return tx_dma_own;
+   if (likely(!(tdes3 & XGMAC_TDES3_LD)))
+   return tx_not_ls;
+
+   return ret;
+}
+
+static int dwxgmac2_get_rx_status(void *data, struct stmmac_extra_stats *x,
+ struct dma_desc *p)
+{
+   unsigned int rdes3 = le32_to_cpu(p->des3);
+   int ret = good_frame;
+
+   if (unlikely(rdes3 & XGMAC_RDES3_OWN))
+   return dma_own;
+   if (likely(!(rdes3 & XGMAC_RDES3_LD)))
+   return discard_frame;
+   if (unlikely(rdes3 & XGMAC_RDES3_ES))
+   ret = discard_frame;
+
+   return ret;
+}
+
+static int dwxgmac2_get_tx_len(struct dma_desc *p)
+{
+   return (le32_to_cpu(p->des2) & XGMAC_TDES2_B1L);
+}
+
+static int dwxgmac2_get_tx_owner(struct dma_desc *p)
+{
+   return (le32_to_cpu(p->des3) & 

[PATCH net-next 2/9] net: stmmac: Add MAC related callbacks for XGMAC2

2018-08-01 Thread Jose Abreu
Add the MAC related callbacks for the new IP block XGMAC2.

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
---
 drivers/net/ethernet/stmicro/stmmac/Makefile   |   2 +-
 drivers/net/ethernet/stmicro/stmmac/common.h   |   3 +
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h | 140 
 .../net/ethernet/stmicro/stmmac/dwxgmac2_core.c| 371 +
 drivers/net/ethernet/stmicro/stmmac/hwif.c |   4 +-
 drivers/net/ethernet/stmicro/stmmac/hwif.h |   1 +
 6 files changed, 518 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c

diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile 
b/drivers/net/ethernet/stmicro/stmmac/Makefile
index 68e9e2640c62..a6cf632c9592 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Makefile
+++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
@@ -5,7 +5,7 @@ stmmac-objs:= stmmac_main.o stmmac_ethtool.o stmmac_mdio.o 
ring_mode.o  \
  dwmac100_core.o dwmac100_dma.o enh_desc.o norm_desc.o \
  mmc_core.o stmmac_hwtstamp.o stmmac_ptp.o dwmac4_descs.o  \
  dwmac4_dma.o dwmac4_lib.o dwmac4_core.o dwmac5.o hwif.o \
- stmmac_tc.o $(stmmac-y)
+ stmmac_tc.o dwxgmac2_core.o $(stmmac-y)
 
 # Ordering matters. Generic driver must be last.
 obj-$(CONFIG_STMMAC_PLATFORM)  += stmmac-platform.o
diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h 
b/drivers/net/ethernet/stmicro/stmmac/common.h
index 3fb81acbd274..1854f270ad66 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -400,6 +400,8 @@ struct mac_link {
u32 speed10;
u32 speed100;
u32 speed1000;
+   u32 speed2500;
+   u32 speed1;
u32 duplex;
 };
 
@@ -441,6 +443,7 @@ struct stmmac_rx_routing {
 int dwmac100_setup(struct stmmac_priv *priv);
 int dwmac1000_setup(struct stmmac_priv *priv);
 int dwmac4_setup(struct stmmac_priv *priv);
+int dwxgmac2_setup(struct stmmac_priv *priv);
 
 void stmmac_set_mac_addr(void __iomem *ioaddr, u8 addr[6],
 unsigned int high, unsigned int low);
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h 
b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
new file mode 100644
index ..ef63a62a699d
--- /dev/null
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
@@ -0,0 +1,140 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+/*
+ * Copyright (c) 2018 Synopsys, Inc. and/or its affiliates.
+ * stmmac XGMAC definitions.
+ */
+
+#ifndef __STMMAC_DWXGMAC2_H__
+#define __STMMAC_DWXGMAC2_H__
+
+#include "common.h"
+
+/* Misc */
+#define XGMAC_JUMBO_LEN16368
+
+/* MAC Registers */
+#define XGMAC_TX_CONFIG0x
+#define XGMAC_CONFIG_SS_OFF29
+#define XGMAC_CONFIG_SS_MASK   GENMASK(30, 29)
+#define XGMAC_CONFIG_SS_1  (0x0 << XGMAC_CONFIG_SS_OFF)
+#define XGMAC_CONFIG_SS_2500   (0x2 << XGMAC_CONFIG_SS_OFF)
+#define XGMAC_CONFIG_SS_1000   (0x3 << XGMAC_CONFIG_SS_OFF)
+#define XGMAC_CONFIG_SARC  GENMASK(22, 20)
+#define XGMAC_CONFIG_SARC_SHIFT20
+#define XGMAC_CONFIG_JDBIT(16)
+#define XGMAC_CONFIG_TEBIT(0)
+#define XGMAC_CORE_INIT_TX (XGMAC_CONFIG_JD)
+#define XGMAC_RX_CONFIG0x0004
+#define XGMAC_CONFIG_ARPEN BIT(31)
+#define XGMAC_CONFIG_GPSL  GENMASK(29, 16)
+#define XGMAC_CONFIG_GPSL_SHIFT16
+#define XGMAC_CONFIG_S2KP  BIT(11)
+#define XGMAC_CONFIG_IPC   BIT(9)
+#define XGMAC_CONFIG_JEBIT(8)
+#define XGMAC_CONFIG_WDBIT(7)
+#define XGMAC_CONFIG_GPSLCEBIT(6)
+#define XGMAC_CONFIG_CST   BIT(2)
+#define XGMAC_CONFIG_ACS   BIT(1)
+#define XGMAC_CONFIG_REBIT(0)
+#define XGMAC_CORE_INIT_RX 0
+#define XGMAC_PACKET_FILTER0x0008
+#define XGMAC_FILTER_RABIT(31)
+#define XGMAC_FILTER_PMBIT(4)
+#define XGMAC_FILTER_HMC   BIT(2)
+#define XGMAC_FILTER_PRBIT(0)
+#define XGMAC_HASH_TABLE(x)(0x0010 + (x) * 4)
+#define XGMAC_RXQ_CTRL00x00a0
+#define XGMAC_RXQEN(x) GENMASK((x) * 2 + 1, (x) * 2)
+#define XGMAC_RXQEN_SHIFT(x)   ((x) * 2)
+#define XGMAC_RXQ_CTRL20x00a8
+#define XGMAC_RXQ_CTRL30x00ac
+#define XGMAC_PSRQ(x)  GENMASK((x) * 8 + 7, (x) * 8)
+#define XGMAC_PSRQ_SHIFT(x)((x) * 8)
+#define XGMAC_INT_STATUS   0x00b0
+#define XGMAC_PMTIS  

[PATCH net-next 0/9] Add 10GbE support in stmmac using XGMAC2

2018-08-01 Thread Jose Abreu
This series adds support for 10Gigabit IP in stmmac. The IP is called XGMAC2
and has many similarities with GMAC4. Due to this, its relatively easy to
incorporate this new IP into stmmac driver by adding a new block and
filling the necessary callbacks.

The functionality added by this series is still reduced but its only a
starting point which will later be expanded.

I splitted the patches into funcionality and to ease the review. Only the
patch 8/9 really enables the XGMAC2 block by adding a new compatible string.

Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 

Jose Abreu (9):
  net: stmmac: Add XGMAC 2.10 HWIF entry
  net: stmmac: Add MAC related callbacks for XGMAC2
  net: stmmac: Add DMA related callbacks for XGMAC2
  net: stmmac: Add descriptor related callbacks for XGMAC2
  net: stmmac: Add MDIO related functions for XGMAC2
  net: stmmac: Add PTP support for XGMAC2
  net: stmmac: Integrate XGMAC into main driver flow
  net: stmmac: Add the bindings parsing for XGMAC2
  bindings: net: stmmac: Add the bindings documentation for XGMAC2.

 Documentation/devicetree/bindings/net/stmmac.txt   |   3 +-
 drivers/net/ethernet/stmicro/stmmac/Makefile   |   3 +-
 drivers/net/ethernet/stmicro/stmmac/common.h   |  17 +-
 .../net/ethernet/stmicro/stmmac/dwmac-generic.c|   2 +
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h | 226 
 .../net/ethernet/stmicro/stmmac/dwxgmac2_core.c| 371 +++
 .../net/ethernet/stmicro/stmmac/dwxgmac2_descs.c   | 280 ++
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c | 410 +
 drivers/net/ethernet/stmicro/stmmac/hwif.c |  31 +-
 drivers/net/ethernet/stmicro/stmmac/hwif.h |   3 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |  55 ++-
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c  |  88 +
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |   6 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c   |   6 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h   |   1 +
 include/linux/stmmac.h |   1 +
 16 files changed, 1481 insertions(+), 22 deletions(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c

-- 
2.7.4




[PATCH net-next 1/9] net: stmmac: Add XGMAC 2.10 HWIF entry

2018-08-01 Thread Jose Abreu
Add a new entry to HWIF table for XGMAC 2.10. For now we fill it with
empty callbacks which will be added in posterior patches.

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
---
 drivers/net/ethernet/stmicro/stmmac/common.h | 14 +++--
 drivers/net/ethernet/stmicro/stmmac/hwif.c   | 31 ++--
 include/linux/stmmac.h   |  1 +
 3 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h 
b/drivers/net/ethernet/stmicro/stmmac/common.h
index 78fd0f8b8e81..3fb81acbd274 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -36,12 +36,14 @@
 #include "mmc.h"
 
 /* Synopsys Core versions */
-#defineDWMAC_CORE_3_40 0x34
-#defineDWMAC_CORE_3_50 0x35
-#defineDWMAC_CORE_4_00 0x40
-#define DWMAC_CORE_4_100x41
-#define DWMAC_CORE_5_00 0x50
-#define DWMAC_CORE_5_10 0x51
+#defineDWMAC_CORE_3_40 0x34
+#defineDWMAC_CORE_3_50 0x35
+#defineDWMAC_CORE_4_00 0x40
+#define DWMAC_CORE_4_100x41
+#define DWMAC_CORE_5_000x50
+#define DWMAC_CORE_5_100x51
+#define DWXGMAC_CORE_2_10  0x21
+
 #define STMMAC_CHAN0   0   /* Always supported and default for all chips */
 
 /* These need to be power of two, and >= 4 */
diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.c 
b/drivers/net/ethernet/stmicro/stmmac/hwif.c
index 1f50e83cafb2..24f5ff175aa4 100644
--- a/drivers/net/ethernet/stmicro/stmmac/hwif.c
+++ b/drivers/net/ethernet/stmicro/stmmac/hwif.c
@@ -72,6 +72,7 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
 static const struct stmmac_hwif_entry {
bool gmac;
bool gmac4;
+   bool xgmac;
u32 min_id;
const struct stmmac_regs_off regs;
const void *desc;
@@ -87,6 +88,7 @@ static const struct stmmac_hwif_entry {
{
.gmac = false,
.gmac4 = false,
+   .xgmac = false,
.min_id = 0,
.regs = {
.ptp_off = PTP_GMAC3_X_OFFSET,
@@ -103,6 +105,7 @@ static const struct stmmac_hwif_entry {
}, {
.gmac = true,
.gmac4 = false,
+   .xgmac = false,
.min_id = 0,
.regs = {
.ptp_off = PTP_GMAC3_X_OFFSET,
@@ -119,6 +122,7 @@ static const struct stmmac_hwif_entry {
}, {
.gmac = false,
.gmac4 = true,
+   .xgmac = false,
.min_id = 0,
.regs = {
.ptp_off = PTP_GMAC4_OFFSET,
@@ -135,6 +139,7 @@ static const struct stmmac_hwif_entry {
}, {
.gmac = false,
.gmac4 = true,
+   .xgmac = false,
.min_id = DWMAC_CORE_4_00,
.regs = {
.ptp_off = PTP_GMAC4_OFFSET,
@@ -151,6 +156,7 @@ static const struct stmmac_hwif_entry {
}, {
.gmac = false,
.gmac4 = true,
+   .xgmac = false,
.min_id = DWMAC_CORE_4_10,
.regs = {
.ptp_off = PTP_GMAC4_OFFSET,
@@ -167,6 +173,7 @@ static const struct stmmac_hwif_entry {
}, {
.gmac = false,
.gmac4 = true,
+   .xgmac = false,
.min_id = DWMAC_CORE_5_10,
.regs = {
.ptp_off = PTP_GMAC4_OFFSET,
@@ -180,11 +187,29 @@ static const struct stmmac_hwif_entry {
.tc = _tc_ops,
.setup = dwmac4_setup,
.quirks = NULL,
-   }
+   }, {
+   .gmac = false,
+   .gmac4 = false,
+   .xgmac = true,
+   .min_id = DWXGMAC_CORE_2_10,
+   .regs = {
+   .ptp_off = 0,
+   .mmc_off = 0,
+   },
+   .desc = NULL,
+   .dma = NULL,
+   .mac = NULL,
+   .hwtimestamp = NULL,
+   .mode = NULL,
+   .tc = NULL,
+   .setup = NULL,
+   .quirks = NULL,
+   },
 };
 
 int stmmac_hwif_init(struct stmmac_priv *priv)
 {
+   bool needs_xgmac = priv->plat->has_xgmac;
bool needs_gmac4 = priv->plat->has_gmac4;
bool needs_gmac = priv->plat->has_gmac;
const struct stmmac_hwif_entry *entry;
@@ -195,7 +220,7 @@ int stmmac_hwif_init(struct stmmac_priv *priv)
 
if (needs_gmac) {
id = stmmac_get_id(priv, GMAC_VERSION);
-   } else if (needs_gmac4) {
+   } else if (needs_gmac4 || needs_xgmac) {
id = stmmac_get_id(priv, GMAC4_VERSION);
} else {
id = 0;
@@ -229,6 +254,8 @@ int stmmac_hwif_init(struct 

[patch net-next v2 3/3] net: sched: make tcf_chain_{get,put}() static

2018-08-01 Thread Jiri Pirko
From: Jiri Pirko 

These are no longer used outside of cls_api.c so make them static.
Move tcf_chain_flush() to avoid fwd declaration of tcf_chain_put().

Signed-off-by: Jiri Pirko 

v1->v2:
- new patch
---
 include/net/pkt_cls.h |  3 ---
 net/sched/cls_api.c   | 34 --
 2 files changed, 16 insertions(+), 21 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 22bfc3a13c25..ef727f71336e 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -40,11 +40,8 @@ struct tcf_block_cb;
 bool tcf_queue_work(struct rcu_work *rwork, work_func_t func);
 
 #ifdef CONFIG_NET_CLS
-struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
-   bool create);
 struct tcf_chain *tcf_chain_get_by_act(struct tcf_block *block,
   u32 chain_index);
-void tcf_chain_put(struct tcf_chain *chain);
 void tcf_chain_put_by_act(struct tcf_chain *chain);
 void tcf_block_netif_keep_dst(struct tcf_block *block);
 int tcf_block_get(struct tcf_block **p_block,
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index b194a5abfc6a..e8b0bbd0883f 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -232,19 +232,6 @@ static void tcf_chain0_head_change(struct tcf_chain *chain,
tcf_chain_head_change_item(item, tp_head);
 }
 
-static void tcf_chain_flush(struct tcf_chain *chain)
-{
-   struct tcf_proto *tp = rtnl_dereference(chain->filter_chain);
-
-   tcf_chain0_head_change(chain, NULL);
-   while (tp) {
-   RCU_INIT_POINTER(chain->filter_chain, tp->next);
-   tcf_proto_destroy(tp, NULL);
-   tp = rtnl_dereference(chain->filter_chain);
-   tcf_chain_put(chain);
-   }
-}
-
 static void tcf_chain_destroy(struct tcf_chain *chain)
 {
struct tcf_block *block = chain->block;
@@ -316,12 +303,11 @@ static struct tcf_chain *__tcf_chain_get(struct tcf_block 
*block,
return chain;
 }
 
-struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
-   bool create)
+static struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 
chain_index,
+  bool create)
 {
return __tcf_chain_get(block, chain_index, create, false);
 }
-EXPORT_SYMBOL(tcf_chain_get);
 
 struct tcf_chain *tcf_chain_get_by_act(struct tcf_block *block, u32 
chain_index)
 {
@@ -347,11 +333,10 @@ static void __tcf_chain_put(struct tcf_chain *chain, bool 
by_act)
}
 }
 
-void tcf_chain_put(struct tcf_chain *chain)
+static void tcf_chain_put(struct tcf_chain *chain)
 {
__tcf_chain_put(chain, false);
 }
-EXPORT_SYMBOL(tcf_chain_put);
 
 void tcf_chain_put_by_act(struct tcf_chain *chain)
 {
@@ -365,6 +350,19 @@ static void tcf_chain_put_explicitly_created(struct 
tcf_chain *chain)
tcf_chain_put(chain);
 }
 
+static void tcf_chain_flush(struct tcf_chain *chain)
+{
+   struct tcf_proto *tp = rtnl_dereference(chain->filter_chain);
+
+   tcf_chain0_head_change(chain, NULL);
+   while (tp) {
+   RCU_INIT_POINTER(chain->filter_chain, tp->next);
+   tcf_proto_destroy(tp, NULL);
+   tp = rtnl_dereference(chain->filter_chain);
+   tcf_chain_put(chain);
+   }
+}
+
 static bool tcf_block_offload_in_use(struct tcf_block *block)
 {
return block->offloadcnt;
-- 
2.14.4



[patch net-next v2 1/3] net: sched: change name of zombie chain to "held_by_acts_only"

2018-08-01 Thread Jiri Pirko
From: Jiri Pirko 

As mentioned by Cong and Jakub during the review process, it is a bit
odd to sometimes (act flow) create a new chain which would be
immediately a "zombie". So just rename it to "held_by_acts_only".

Signed-off-by: Jiri Pirko 
Suggested-by: Cong Wang 
Suggested-by: Jakub Kicinski 
Acked-by: Cong Wang 
---
 net/sched/cls_api.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index e20aad1987b8..2f78341f2888 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -272,11 +272,10 @@ static void tcf_chain_release_by_act(struct tcf_chain 
*chain)
--chain->action_refcnt;
 }
 
-static bool tcf_chain_is_zombie(struct tcf_chain *chain)
+static bool tcf_chain_held_by_acts_only(struct tcf_chain *chain)
 {
/* In case all the references are action references, this
-* chain is a zombie and should not be listed in the chain
-* dump list.
+* chain should not be shown to the user.
 */
return chain->refcnt == chain->action_refcnt;
 }
@@ -1838,10 +1837,9 @@ static int tc_ctl_chain(struct sk_buff *skb, struct 
nlmsghdr *n,
chain = tcf_chain_lookup(block, chain_index);
if (n->nlmsg_type == RTM_NEWCHAIN) {
if (chain) {
-   if (tcf_chain_is_zombie(chain)) {
+   if (tcf_chain_held_by_acts_only(chain)) {
/* The chain exists only because there is
-* some action referencing it, meaning it
-* is a zombie.
+* some action referencing it.
 */
tcf_chain_hold(chain);
} else {
@@ -1860,7 +1858,7 @@ static int tc_ctl_chain(struct sk_buff *skb, struct 
nlmsghdr *n,
}
}
} else {
-   if (!chain || tcf_chain_is_zombie(chain)) {
+   if (!chain || tcf_chain_held_by_acts_only(chain)) {
NL_SET_ERR_MSG(extack, "Cannot find specified filter 
chain");
return -EINVAL;
}
@@ -1988,7 +1986,7 @@ static int tc_dump_chain(struct sk_buff *skb, struct 
netlink_callback *cb)
index++;
continue;
}
-   if (tcf_chain_is_zombie(chain))
+   if (tcf_chain_held_by_acts_only(chain))
continue;
err = tc_chain_fill_node(chain, net, skb, block,
 NETLINK_CB(cb->skb).portid,
-- 
2.14.4



[patch net-next v2 2/3] net: sched: fix notifications for action-held chains

2018-08-01 Thread Jiri Pirko
From: Jiri Pirko 

Chains that only have action references serve as placeholders.
Until a non-action reference is created, user should not be aware
of the chain. Also he should not receive any notifications about it.
So send notifications for the new chain only in case the chain gets
the first non-action reference. Symmetrically to that, when
the last non-action reference is dropped, send the notification about
deleted chain.

Reported-by: Cong Wang 
Signed-off-by: Jiri Pirko 
Acked-by: Cong Wang 

v1->v2:
- made __tcf_chain_{get,put}() static as suggested by Cong
---
 net/sched/cls_api.c | 71 -
 1 file changed, 43 insertions(+), 28 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 2f78341f2888..b194a5abfc6a 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -262,16 +262,6 @@ static void tcf_chain_hold(struct tcf_chain *chain)
++chain->refcnt;
 }
 
-static void tcf_chain_hold_by_act(struct tcf_chain *chain)
-{
-   ++chain->action_refcnt;
-}
-
-static void tcf_chain_release_by_act(struct tcf_chain *chain)
-{
-   --chain->action_refcnt;
-}
-
 static bool tcf_chain_held_by_acts_only(struct tcf_chain *chain)
 {
/* In case all the references are action references, this
@@ -295,52 +285,77 @@ static struct tcf_chain *tcf_chain_lookup(struct 
tcf_block *block,
 static int tc_chain_notify(struct tcf_chain *chain, struct sk_buff *oskb,
   u32 seq, u16 flags, int event, bool unicast);
 
-struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
-   bool create)
+static struct tcf_chain *__tcf_chain_get(struct tcf_block *block,
+u32 chain_index, bool create,
+bool by_act)
 {
struct tcf_chain *chain = tcf_chain_lookup(block, chain_index);
 
if (chain) {
tcf_chain_hold(chain);
-   return chain;
+   } else {
+   if (!create)
+   return NULL;
+   chain = tcf_chain_create(block, chain_index);
+   if (!chain)
+   return NULL;
}
 
-   if (!create)
-   return NULL;
-   chain = tcf_chain_create(block, chain_index);
-   if (!chain)
-   return NULL;
-   tc_chain_notify(chain, NULL, 0, NLM_F_CREATE | NLM_F_EXCL,
-   RTM_NEWCHAIN, false);
+   if (by_act)
+   ++chain->action_refcnt;
+
+   /* Send notification only in case we got the first
+* non-action reference. Until then, the chain acts only as
+* a placeholder for actions pointing to it and user ought
+* not know about them.
+*/
+   if (chain->refcnt - chain->action_refcnt == 1 && !by_act)
+   tc_chain_notify(chain, NULL, 0, NLM_F_CREATE | NLM_F_EXCL,
+   RTM_NEWCHAIN, false);
+
return chain;
 }
+
+struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
+   bool create)
+{
+   return __tcf_chain_get(block, chain_index, create, false);
+}
 EXPORT_SYMBOL(tcf_chain_get);
 
 struct tcf_chain *tcf_chain_get_by_act(struct tcf_block *block, u32 
chain_index)
 {
-   struct tcf_chain *chain = tcf_chain_get(block, chain_index, true);
-
-   tcf_chain_hold_by_act(chain);
-   return chain;
+   return __tcf_chain_get(block, chain_index, true, true);
 }
 EXPORT_SYMBOL(tcf_chain_get_by_act);
 
 static void tc_chain_tmplt_del(struct tcf_chain *chain);
 
-void tcf_chain_put(struct tcf_chain *chain)
+static void __tcf_chain_put(struct tcf_chain *chain, bool by_act)
 {
-   if (--chain->refcnt == 0) {
+   if (by_act)
+   chain->action_refcnt--;
+   chain->refcnt--;
+
+   /* The last dropped non-action reference will trigger notification. */
+   if (chain->refcnt - chain->action_refcnt == 0 && !by_act)
tc_chain_notify(chain, NULL, 0, 0, RTM_DELCHAIN, false);
+
+   if (chain->refcnt == 0) {
tc_chain_tmplt_del(chain);
tcf_chain_destroy(chain);
}
 }
+
+void tcf_chain_put(struct tcf_chain *chain)
+{
+   __tcf_chain_put(chain, false);
+}
 EXPORT_SYMBOL(tcf_chain_put);
 
 void tcf_chain_put_by_act(struct tcf_chain *chain)
 {
-   tcf_chain_release_by_act(chain);
-   tcf_chain_put(chain);
+   __tcf_chain_put(chain, true);
 }
 EXPORT_SYMBOL(tcf_chain_put_by_act);
 
-- 
2.14.4



[patch net-next v2 0/3] net: sched: couple of adjustments/fixes

2018-08-01 Thread Jiri Pirko
From: Jiri Pirko 

Jiri Pirko (3):
  net: sched: change name of zombie chain to "held_by_acts_only"
  net: sched: fix notifications for action-held chains
  net: sched: make tcf_chain_{get,put}() static

 include/net/pkt_cls.h |   3 --
 net/sched/cls_api.c   | 113 +++---
 2 files changed, 62 insertions(+), 54 deletions(-)

-- 
2.14.4



答复: [Patch net-next] net: hns3: fix return value error while hclge_cmd_csq_clean failed

2018-08-01 Thread tanhuazhong
Sorry, please ignore this patch. I will resend it.

-邮件原件-
发件人: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] 代表 
Huazhong Tan
发送时间: 2018年8月1日 17:53
收件人: da...@davemloft.net
抄送: netdev@vger.kernel.org; Linuxarm 
主题: [Patch net-next] net: hns3: fix return value error while 
hclge_cmd_csq_clean failed

From: fredalu 

While cleaning the command queue, the value of the HEAD register is not in the 
range of next_to_clean and next_to_use, meaning that this value is invalid. 
This also means that there is a hardware error and the hardware will trigger a 
reset soon. At this time we should return an error code instead of 0, and 
HCLGE_STATE_CMD_DISABLE needs to be set to prevent sending command again.

Fixes: 3ff504908f95 ("net: hns3: fix a dead loop in hclge_cmd_csq_clean")
Signed-off-by: Huazhong Tan wq
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
index 165c3d5..ac13cb2 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
@@ -147,7 +147,12 @@ static int hclge_cmd_csq_clean(struct hclge_hw *hw)
if (!is_valid_csq_clean_head(csq, head)) {
dev_warn(>pdev->dev, "wrong cmd head (%d, %d-%d)\n", head,
 csq->next_to_use, csq->next_to_clean);
-   return 0;
+   dev_warn(>pdev->dev,
+"Disabling any further commands to IMP firmware\n");
+   set_bit(HCLGE_STATE_CMD_DISABLE, >state);
+   dev_warn(>pdev->dev,
+"IMP firmware watchdog reset soon expected!\n");
+   return -EIO;
}
 
clean = (head - csq->next_to_clean + csq->desc_num) % csq->desc_num; @@ 
-267,10 +272,11 @@ int hclge_cmd_send(struct hclge_hw *hw, struct hclge_desc 
*desc, int num)
 
/* Clean the command send queue */
handle = hclge_cmd_csq_clean(hw);
-   if (handle != num) {
+   if (handle < 0)
+   retval = handle;
+   else if (handle != num)
dev_warn(>pdev->dev,
 "cleaned %d, need to clean %d\n", handle, num);
-   }
 
spin_unlock_bh(>cmq.csq.lock);
 
--
2.7.4



[Patch net-next V2] net: hns3: fix return value error while hclge_cmd_csq_clean failed

2018-08-01 Thread Huazhong Tan
While cleaning the command queue, the value of the HEAD register is not
in the range of next_to_clean and next_to_use, meaning that this value
is invalid. This also means that there is a hardware error and the
hardware will trigger a reset soon. At this time we should return an
error code instead of 0, and HCLGE_STATE_CMD_DISABLE needs to be set to
prevent sending command again.

Fixes: 3ff504908f95 ("net: hns3: fix a dead loop in hclge_cmd_csq_clean")
Signed-off-by: Huazhong Tan 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
index 165c3d5..ac13cb2 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
@@ -147,7 +147,12 @@ static int hclge_cmd_csq_clean(struct hclge_hw *hw)
if (!is_valid_csq_clean_head(csq, head)) {
dev_warn(>pdev->dev, "wrong cmd head (%d, %d-%d)\n", head,
 csq->next_to_use, csq->next_to_clean);
-   return 0;
+   dev_warn(>pdev->dev,
+"Disabling any further commands to IMP firmware\n");
+   set_bit(HCLGE_STATE_CMD_DISABLE, >state);
+   dev_warn(>pdev->dev,
+"IMP firmware watchdog reset soon expected!\n");
+   return -EIO;
}
 
clean = (head - csq->next_to_clean + csq->desc_num) % csq->desc_num;
@@ -267,10 +272,11 @@ int hclge_cmd_send(struct hclge_hw *hw, struct hclge_desc 
*desc, int num)
 
/* Clean the command send queue */
handle = hclge_cmd_csq_clean(hw);
-   if (handle != num) {
+   if (handle < 0)
+   retval = handle;
+   else if (handle != num)
dev_warn(>pdev->dev,
 "cleaned %d, need to clean %d\n", handle, num);
-   }
 
spin_unlock_bh(>cmq.csq.lock);
 
-- 
2.7.4



[Patch net-next] net: hns3: fix return value error while hclge_cmd_csq_clean failed

2018-08-01 Thread Huazhong Tan
From: fredalu 

While cleaning the command queue, the value of the HEAD register is not
in the range of next_to_clean and next_to_use, meaning that this value
is invalid. This also means that there is a hardware error and the
hardware will trigger a reset soon. At this time we should return an
error code instead of 0, and HCLGE_STATE_CMD_DISABLE needs to be set to
prevent sending command again.

Fixes: 3ff504908f95 ("net: hns3: fix a dead loop in hclge_cmd_csq_clean")
Signed-off-by: Huazhong Tan wq
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
index 165c3d5..ac13cb2 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
@@ -147,7 +147,12 @@ static int hclge_cmd_csq_clean(struct hclge_hw *hw)
if (!is_valid_csq_clean_head(csq, head)) {
dev_warn(>pdev->dev, "wrong cmd head (%d, %d-%d)\n", head,
 csq->next_to_use, csq->next_to_clean);
-   return 0;
+   dev_warn(>pdev->dev,
+"Disabling any further commands to IMP firmware\n");
+   set_bit(HCLGE_STATE_CMD_DISABLE, >state);
+   dev_warn(>pdev->dev,
+"IMP firmware watchdog reset soon expected!\n");
+   return -EIO;
}
 
clean = (head - csq->next_to_clean + csq->desc_num) % csq->desc_num;
@@ -267,10 +272,11 @@ int hclge_cmd_send(struct hclge_hw *hw, struct hclge_desc 
*desc, int num)
 
/* Clean the command send queue */
handle = hclge_cmd_csq_clean(hw);
-   if (handle != num) {
+   if (handle < 0)
+   retval = handle;
+   else if (handle != num)
dev_warn(>pdev->dev,
 "cleaned %d, need to clean %d\n", handle, num);
-   }
 
spin_unlock_bh(>cmq.csq.lock);
 
-- 
2.7.4



Re: [patch net-next 2/2] net: sched: fix notifications for action-held chains

2018-08-01 Thread Jiri Pirko
Wed, Aug 01, 2018 at 06:27:28AM CEST, xiyou.wangc...@gmail.com wrote:
>On Tue, Jul 31, 2018 at 5:10 AM Jiri Pirko  wrote:
>>
>> From: Jiri Pirko 
>>
>> Chains that only have action references serve as placeholders.
>> Until a non-action reference is created, user should not be aware
>> of the chain. Also he should not receive any notifications about it.
>> So send notifications for the new chain only in case the chain gets
>> the first non-action reference. Symmetrically to that, when
>> the last non-action reference is dropped, send the notification about
>> deleted chain.
>>
>> Reported-by: Cong Wang 
>> Signed-off-by: Jiri Pirko 
>
>I think __tcf_chain_{get,put}() can be static.

In fact, tcf_chain_get/put could be now static too. Will send v2.
Thanks!

>
>Other than that,
>
>Acked-by: Cong Wang 
>
>Thanks.


Re: [net-next V2 12/12] net/mlx5e: Use PARTIAL_GSO for UDP segmentation

2018-08-01 Thread Boris Pismenny




On 7/24/2018 8:35 PM, Saeed Mahameed wrote:

On Tue, Jul 24, 2018 at 7:53 AM, Alexander Duyck
 wrote:

On Mon, Jul 23, 2018 at 3:11 PM, Saeed Mahameed  wrote:

From: Boris Pismenny 

This patch removes the splitting of UDP_GSO_L4 packets in the driver,
and exposes UDP_GSO_L4 as a PARTIAL_GSO feature. Thus, the network stack
is not responsible for splitting the packet into two.

Signed-off-by: Boris Pismenny 
Signed-off-by: Saeed Mahameed 
---
  .../net/ethernet/mellanox/mlx5/core/Makefile  |   4 +-
  .../mellanox/mlx5/core/en_accel/en_accel.h|  27 +++--
  .../mellanox/mlx5/core/en_accel/rxtx.c| 109 --
  .../mellanox/mlx5/core/en_accel/rxtx.h|  14 ---
  .../net/ethernet/mellanox/mlx5/core/en_main.c |   9 +-
  5 files changed, 23 insertions(+), 140 deletions(-)
  delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/rxtx.c
  delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/rxtx.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 55d5a5c2e9d8..fa7fcca5dc78 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -14,8 +14,8 @@ mlx5_core-$(CONFIG_MLX5_FPGA) += fpga/cmd.o fpga/core.o 
fpga/conn.o fpga/sdk.o \
 fpga/ipsec.o fpga/tls.o

  mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o 
en_ethtool.o \
-   en_tx.o en_rx.o en_dim.o en_txrx.o en_accel/rxtx.o en_stats.o  \
-   vxlan.o en_arfs.o en_fs_ethtool.o en_selftest.o en/port.o
+   en_tx.o en_rx.o en_dim.o en_txrx.o en_stats.o vxlan.o  \
+   en_arfs.o en_fs_ethtool.o en_selftest.o en/port.o

  mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h
index 39a5d13ba459..1dd225380a66 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h
@@ -38,14 +38,22 @@
  #include 
  #include "en_accel/ipsec_rxtx.h"
  #include "en_accel/tls_rxtx.h"
-#include "en_accel/rxtx.h"
  #include "en.h"

-static inline struct sk_buff *mlx5e_accel_handle_tx(struct sk_buff *skb,
-   struct mlx5e_txqsq *sq,
-   struct net_device *dev,
-   struct mlx5e_tx_wqe **wqe,
-   u16 *pi)
+static inline void
+mlx5e_udp_gso_handle_tx_skb(struct sk_buff *skb)
+{
+   int payload_len = skb_shinfo(skb)->gso_size + sizeof(struct udphdr);
+
+   udp_hdr(skb)->len = htons(payload_len);
+}
+


So it looks like you decided to just update the length here. Do you
still have plans to update GSO_PARTIAL to set the length this way or
have you decided to just leave it as it is?



I don't know what are Boris's plans regarding this, I will let him answer that.
But what is your take on this ?



Sorry for the late response. I'm currently busy with other things. I'd 
rather let it be like this for now, and fix it later. If you'd like to 
change it, I'll be happy to test.


Thanks