[PATCH] Revert "net/ipv6: fix metrics leak"
This reverts commit df18b50448fab1dff093731dfd0e25e77e1afcd1. This change causes other problems and use-after-free situations as found by syzbot. Signed-off-by: David S. Miller --- net/ipv6/ip6_fib.c | 18 -- 1 file changed, 4 insertions(+), 14 deletions(-) diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 211a2d437b56..d212738e9d10 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -167,22 +167,11 @@ struct fib6_info *fib6_info_alloc(gfp_t gfp_flags) return f6i; } -static void fib6_metrics_release(struct fib6_info *f6i) -{ - struct dst_metrics *m; - - if (!f6i) - return; - - m = f6i->fib6_metrics; - if (m != _default_metrics && refcount_dec_and_test(>refcnt)) - kfree(m); -} - void fib6_info_destroy_rcu(struct rcu_head *head) { struct fib6_info *f6i = container_of(head, struct fib6_info, rcu); struct rt6_exception_bucket *bucket; + struct dst_metrics *m; WARN_ON(f6i->fib6_node); @@ -212,7 +201,9 @@ void fib6_info_destroy_rcu(struct rcu_head *head) if (f6i->fib6_nh.nh_dev) dev_put(f6i->fib6_nh.nh_dev); - fib6_metrics_release(f6i); + m = f6i->fib6_metrics; + if (m != _default_metrics && refcount_dec_and_test(>refcnt)) + kfree(m); kfree(f6i); } @@ -896,7 +887,6 @@ static void fib6_drop_pcpu_from(struct fib6_info *f6i, from = rcu_dereference_protected(pcpu_rt->from, lockdep_is_held(>tb6_lock)); - fib6_metrics_release(from); rcu_assign_pointer(pcpu_rt->from, NULL); fib6_info_release(from); } -- 2.17.1
[PATCH net-next] rxrpc: Remove set but not used variable 'nowj'
Fixes gcc '-Wunused-but-set-variable' warning: net/rxrpc/proc.c: In function 'rxrpc_call_seq_show': net/rxrpc/proc.c:66:29: warning: variable 'nowj' set but not used [-Wunused-but-set-variable] unsigned long timeout = 0, nowj; ^ Signed-off-by: Wei Yongjun --- net/rxrpc/proc.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/rxrpc/proc.c b/net/rxrpc/proc.c index 163d05d..9805e3b 100644 --- a/net/rxrpc/proc.c +++ b/net/rxrpc/proc.c @@ -63,7 +63,7 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v) struct rxrpc_peer *peer; struct rxrpc_call *call; struct rxrpc_net *rxnet = rxrpc_net(seq_file_net(seq)); - unsigned long timeout = 0, nowj; + unsigned long timeout = 0; rxrpc_seq_t tx_hard_ack, rx_hard_ack; char lbuff[50], rbuff[50]; @@ -97,7 +97,6 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v) if (call->state != RXRPC_CALL_SERVER_PREALLOC) { timeout = READ_ONCE(call->expect_rx_by); - nowj = jiffies; timeout -= jiffies; }
Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
From: Jakub Kicinski Date: Wed, 1 Aug 2018 17:00:47 -0700 > On Wed, 1 Aug 2018 14:52:45 -0700, Saeed Mahameed wrote: >> - According to the discussion outcome, we are keeping the congestion control >> setting as mlx5 device specific for the current HW generation. > > I still see queuing and marking based on queue level. You want to add > a Qdisc that will mirror your HW's behaviour to offload, if you really > believe this is not a subset of RED, why not... But devlink params? I totally agree, devlink seems like absolutely to wrong level and set of interfaces to be doing this stuff. I will not pull these changes in and I probably should have not accepted the DCB changes from the other day and they were sneakily leading up to this crap. Sorry, please follow Jakub's lead as I think his approach makes much more technical sense than using devlink for this. Thanks.
Re: [Patch net-next] net: hns3: fix return value error while hclge_cmd_csq_clean failed
On 2018/8/2 1:03, David Miller wrote: From: Huazhong Tan Date: Wed, 1 Aug 2018 17:53:28 +0800 From: fredalu While cleaning the command queue, the value of the HEAD register is not in the range of next_to_clean and next_to_use, meaning that this value is invalid. This also means that there is a hardware error and the hardware will trigger a reset soon. At this time we should return an error code instead of 0, and HCLGE_STATE_CMD_DISABLE needs to be set to prevent sending command again. Fixes: 3ff504908f95 ("net: hns3: fix a dead loop in hclge_cmd_csq_clean") Signed-off-by: Huazhong Tan wq ---> ^^^ I think I know what text editor you use... :-) Applied with 'wq' removed from your signoff... Thanks. . Sorry about this, I have sent V2 to fix it. Thanks.:)
Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
On Wed, Aug 1, 2018 at 4:13 PM, Saeed Mahameed wrote: > On Wed, Aug 1, 2018 at 3:34 PM, Alexander Duyck > wrote: >> On Wed, Aug 1, 2018 at 2:52 PM, Saeed Mahameed wrote: >>> Hi Dave, >>> >>> This series provides devlink parameters updates to both devlink API and >>> mlx5 driver, it is a 2nd iteration of the dropped patches sent in a previous >>> mlx5 submission "net/mlx5: Support PCIe buffer congestion handling via >>> Devlink" to address review comments [1]. >>> >>> Changes from the original series: >>> - According to the discussion outcome, we are keeping the congestion control >>> setting as mlx5 device specific for the current HW generation. >>> - Changed the congestion_mode and congestion action param type to string >>> - Added patches to fix devlink handling of param type string >>> - Added a patch which adds extack messages support for param set. >>> - At the end of this series, I've added yet another mlx5 devlink related >>> feature, firmware snapshot support. >>> >>> For more information please see tag log below. >>> >>> Please pull and let me know if there's any problem. >>> >>> [1] https://patchwork.ozlabs.org/patch/945996/ >>> >>> Thanks, >>> Saeed. >>> >>> --- >>> >>> The following changes since commit e6476c21447c4b17c47e476aade6facf050f31e8: >>> >>> net: remove bogus RCU annotations on socket.wq (2018-07-31 12:40:22 -0700) >>> >>> are available in the Git repository at: >>> >>> git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git >>> tags/mlx5-updates-2018-08-01 >>> >>> for you to fetch changes up to 2ac6108c65ffcb1e5eab1fba1fd59272604d1c32: >>> >>> net/mlx5: Use devlink region_snapshot parameter (2018-08-01 14:49:09 >>> -0700) >>> >>> >>> mlx5-updates-2018-08-01 >>> >>> This series provides devlink parameters updates to both devlink API and >>> mlx5 driver, >>> >>> 1) Devlink changes: (Moshe Shemesh) >>> The first two patches fix devlink param infrastructure for string type >>> params. >>> The third patch adds a devlink helper function to safely copy string from >>> driver to devlink. >>> The forth patch adds extack support for param set. >>> >>> 2) mlx5 specific congestion parameters: (Eran Ben Elisha) >>> Next three patches add new devlink driver specific params for controlling >>> congestion action and mode, using string type params and extack messages >>> support. >>> >>> This congestion mode enables hw workaround in specific devices which is >>> controlled by devlink driver-specific params. The workaround is device >>> specific for this NIC generation, the next NIC will not need it. >>> >>> Congestion parameters: >>> - Congestion action >>> HW W/A mechanism in the PCIe buffer which monitors the amount of >>> consumed PCIe buffer per host. This mechanism supports the >>> following actions in case of threshold overflow: >>> - Disabled - NOP (Default) >>> - Drop >>> - Mark - Mark CE bit in the CQE of received packet >>> - Congestion mode >>> - Aggressive - Aggressive static trigger threshold (Default) >>> - Dynamic - Dynamically change the trigger threshold >>> >>> 3) mlx5 firmware snapshot support via devlink: (Alex Vesker) >>> Last three patches, add the support for capturing region snapshot of the >>> firmware crspace during critical errors, using devlink region_snapshot >>> parameter. >>> >>> -Saeed. >>> >>> >>> Alex Vesker (3): >>> net/mlx5: Add Vendor Specific Capability access gateway >>> net/mlx5: Add Crdump FW snapshot support >>> net/mlx5: Use devlink region_snapshot parameter >>> >>> Eran Ben Elisha (3): >>> net/mlx5: Move all devlink related functions calls to devlink.c >>> net/mlx5: Add MPEGC register configuration functionality >>> net/mlx5: Enable PCIe buffer congestion handling workaround via >>> devlink >>> >>> Moshe Shemesh (4): >>> devlink: Fix param set handling for string type >>> devlink: Fix param cmode driverinit for string type >>> devlink: Add helper function for safely copy string param >>> devlink: Add extack messages support to param set >>> >>> drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 3 +- >>> drivers/net/ethernet/mellanox/mlx4/main.c | 6 +- >>> drivers/net/ethernet/mellanox/mlx5/core/Makefile | 3 +- >>> drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 388 >>> + >>> drivers/net/ethernet/mellanox/mlx5/core/devlink.h | 13 + >>> .../net/ethernet/mellanox/mlx5/core/diag/crdump.c | 223 >>> drivers/net/ethernet/mellanox/mlx5/core/health.c | 3 + >>> drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h | 4 + >>> .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c | 320 + >>> .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h | 56 +++ >>>
Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
On Wed, 1 Aug 2018 14:52:45 -0700, Saeed Mahameed wrote: > - According to the discussion outcome, we are keeping the congestion control > setting as mlx5 device specific for the current HW generation. I still see queuing and marking based on queue level. You want to add a Qdisc that will mirror your HW's behaviour to offload, if you really believe this is not a subset of RED, why not... But devlink params?
re: hello
-- Congratulations, You Have won SIX HUNDRED AND FIFTY THOUSAND EURO in the monthly Euro/ Google draws on July 1, 2018. reply to fill your claim form. Erika Hermann Online Coordinator Desk038984EU
Re: [PATCH net-next] virtio_net: force_napi_tx module param.
On Wed, Aug 1, 2018 at 6:48 PM Michael S. Tsirkin wrote: > > On Wed, Aug 01, 2018 at 06:43:53PM -0400, Willem de Bruijn wrote: > > > > So e.g. we could set an affinity hint to a group of CPUs that > > > > > might transmit to this queue. > > > > > > > > We also want to set the xps mask for all cpus in the group to this > > > > queue. > > > > > > > > Is there a benefit over explicitly choosing one cpu from the set, btw? > > > > > > If only one CPU actually transmits on this queue then probably yes. > > > And virtio doesn't know whether that's the case. > > > > Oh, yes, good example. > > > > To plumb that through, vp_set_vq_affinity would have to take a cpu > > mask like irq_set_affinity_hint, instead of a single value. > > > > Though I don't suggest doing that as part of the core > > virtnet_set_affinity changes. > > Why not? It seems a reasonable thing to do. Well, at least let's split it into separate patches.
Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
On Wed, Aug 1, 2018 at 3:34 PM, Alexander Duyck wrote: > On Wed, Aug 1, 2018 at 2:52 PM, Saeed Mahameed wrote: >> Hi Dave, >> >> This series provides devlink parameters updates to both devlink API and >> mlx5 driver, it is a 2nd iteration of the dropped patches sent in a previous >> mlx5 submission "net/mlx5: Support PCIe buffer congestion handling via >> Devlink" to address review comments [1]. >> >> Changes from the original series: >> - According to the discussion outcome, we are keeping the congestion control >> setting as mlx5 device specific for the current HW generation. >> - Changed the congestion_mode and congestion action param type to string >> - Added patches to fix devlink handling of param type string >> - Added a patch which adds extack messages support for param set. >> - At the end of this series, I've added yet another mlx5 devlink related >> feature, firmware snapshot support. >> >> For more information please see tag log below. >> >> Please pull and let me know if there's any problem. >> >> [1] https://patchwork.ozlabs.org/patch/945996/ >> >> Thanks, >> Saeed. >> >> --- >> >> The following changes since commit e6476c21447c4b17c47e476aade6facf050f31e8: >> >> net: remove bogus RCU annotations on socket.wq (2018-07-31 12:40:22 -0700) >> >> are available in the Git repository at: >> >> git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git >> tags/mlx5-updates-2018-08-01 >> >> for you to fetch changes up to 2ac6108c65ffcb1e5eab1fba1fd59272604d1c32: >> >> net/mlx5: Use devlink region_snapshot parameter (2018-08-01 14:49:09 -0700) >> >> >> mlx5-updates-2018-08-01 >> >> This series provides devlink parameters updates to both devlink API and >> mlx5 driver, >> >> 1) Devlink changes: (Moshe Shemesh) >> The first two patches fix devlink param infrastructure for string type >> params. >> The third patch adds a devlink helper function to safely copy string from >> driver to devlink. >> The forth patch adds extack support for param set. >> >> 2) mlx5 specific congestion parameters: (Eran Ben Elisha) >> Next three patches add new devlink driver specific params for controlling >> congestion action and mode, using string type params and extack messages >> support. >> >> This congestion mode enables hw workaround in specific devices which is >> controlled by devlink driver-specific params. The workaround is device >> specific for this NIC generation, the next NIC will not need it. >> >> Congestion parameters: >> - Congestion action >> HW W/A mechanism in the PCIe buffer which monitors the amount of >> consumed PCIe buffer per host. This mechanism supports the >> following actions in case of threshold overflow: >> - Disabled - NOP (Default) >> - Drop >> - Mark - Mark CE bit in the CQE of received packet >> - Congestion mode >> - Aggressive - Aggressive static trigger threshold (Default) >> - Dynamic - Dynamically change the trigger threshold >> >> 3) mlx5 firmware snapshot support via devlink: (Alex Vesker) >> Last three patches, add the support for capturing region snapshot of the >> firmware crspace during critical errors, using devlink region_snapshot >> parameter. >> >> -Saeed. >> >> >> Alex Vesker (3): >> net/mlx5: Add Vendor Specific Capability access gateway >> net/mlx5: Add Crdump FW snapshot support >> net/mlx5: Use devlink region_snapshot parameter >> >> Eran Ben Elisha (3): >> net/mlx5: Move all devlink related functions calls to devlink.c >> net/mlx5: Add MPEGC register configuration functionality >> net/mlx5: Enable PCIe buffer congestion handling workaround via devlink >> >> Moshe Shemesh (4): >> devlink: Fix param set handling for string type >> devlink: Fix param cmode driverinit for string type >> devlink: Add helper function for safely copy string param >> devlink: Add extack messages support to param set >> >> drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 3 +- >> drivers/net/ethernet/mellanox/mlx4/main.c | 6 +- >> drivers/net/ethernet/mellanox/mlx5/core/Makefile | 3 +- >> drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 388 >> + >> drivers/net/ethernet/mellanox/mlx5/core/devlink.h | 13 + >> .../net/ethernet/mellanox/mlx5/core/diag/crdump.c | 223 >> drivers/net/ethernet/mellanox/mlx5/core/health.c | 3 + >> drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h | 4 + >> .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c | 320 + >> .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h | 56 +++ >> drivers/net/ethernet/mellanox/mlx5/core/main.c | 10 +- >> include/linux/mlx5/driver.h| 5 + >> include/net/devlink.h | 15 +- >>
Re: [PATCH net-next] virtio_net: force_napi_tx module param.
On Wed, Aug 01, 2018 at 06:43:53PM -0400, Willem de Bruijn wrote: > > > So e.g. we could set an affinity hint to a group of CPUs that > > > > might transmit to this queue. > > > > > > We also want to set the xps mask for all cpus in the group to this queue. > > > > > > Is there a benefit over explicitly choosing one cpu from the set, btw? > > > > If only one CPU actually transmits on this queue then probably yes. > > And virtio doesn't know whether that's the case. > > Oh, yes, good example. > > To plumb that through, vp_set_vq_affinity would have to take a cpu > mask like irq_set_affinity_hint, instead of a single value. > > Though I don't suggest doing that as part of the core > virtnet_set_affinity changes. Why not? It seems a reasonable thing to do. -- MST
Re: [PATCH net-next] virtio_net: force_napi_tx module param.
> > So e.g. we could set an affinity hint to a group of CPUs that > > > might transmit to this queue. > > > > We also want to set the xps mask for all cpus in the group to this queue. > > > > Is there a benefit over explicitly choosing one cpu from the set, btw? > > If only one CPU actually transmits on this queue then probably yes. > And virtio doesn't know whether that's the case. Oh, yes, good example. To plumb that through, vp_set_vq_affinity would have to take a cpu mask like irq_set_affinity_hint, instead of a single value. Though I don't suggest doing that as part of the core virtnet_set_affinity changes.
Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
On Wed, Aug 1, 2018 at 2:52 PM, Saeed Mahameed wrote: > Hi Dave, > > This series provides devlink parameters updates to both devlink API and > mlx5 driver, it is a 2nd iteration of the dropped patches sent in a previous > mlx5 submission "net/mlx5: Support PCIe buffer congestion handling via > Devlink" to address review comments [1]. > > Changes from the original series: > - According to the discussion outcome, we are keeping the congestion control > setting as mlx5 device specific for the current HW generation. > - Changed the congestion_mode and congestion action param type to string > - Added patches to fix devlink handling of param type string > - Added a patch which adds extack messages support for param set. > - At the end of this series, I've added yet another mlx5 devlink related > feature, firmware snapshot support. > > For more information please see tag log below. > > Please pull and let me know if there's any problem. > > [1] https://patchwork.ozlabs.org/patch/945996/ > > Thanks, > Saeed. > > --- > > The following changes since commit e6476c21447c4b17c47e476aade6facf050f31e8: > > net: remove bogus RCU annotations on socket.wq (2018-07-31 12:40:22 -0700) > > are available in the Git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git > tags/mlx5-updates-2018-08-01 > > for you to fetch changes up to 2ac6108c65ffcb1e5eab1fba1fd59272604d1c32: > > net/mlx5: Use devlink region_snapshot parameter (2018-08-01 14:49:09 -0700) > > > mlx5-updates-2018-08-01 > > This series provides devlink parameters updates to both devlink API and > mlx5 driver, > > 1) Devlink changes: (Moshe Shemesh) > The first two patches fix devlink param infrastructure for string type > params. > The third patch adds a devlink helper function to safely copy string from > driver to devlink. > The forth patch adds extack support for param set. > > 2) mlx5 specific congestion parameters: (Eran Ben Elisha) > Next three patches add new devlink driver specific params for controlling > congestion action and mode, using string type params and extack messages > support. > > This congestion mode enables hw workaround in specific devices which is > controlled by devlink driver-specific params. The workaround is device > specific for this NIC generation, the next NIC will not need it. > > Congestion parameters: > - Congestion action > HW W/A mechanism in the PCIe buffer which monitors the amount of > consumed PCIe buffer per host. This mechanism supports the > following actions in case of threshold overflow: > - Disabled - NOP (Default) > - Drop > - Mark - Mark CE bit in the CQE of received packet > - Congestion mode > - Aggressive - Aggressive static trigger threshold (Default) > - Dynamic - Dynamically change the trigger threshold > > 3) mlx5 firmware snapshot support via devlink: (Alex Vesker) > Last three patches, add the support for capturing region snapshot of the > firmware crspace during critical errors, using devlink region_snapshot > parameter. > > -Saeed. > > > Alex Vesker (3): > net/mlx5: Add Vendor Specific Capability access gateway > net/mlx5: Add Crdump FW snapshot support > net/mlx5: Use devlink region_snapshot parameter > > Eran Ben Elisha (3): > net/mlx5: Move all devlink related functions calls to devlink.c > net/mlx5: Add MPEGC register configuration functionality > net/mlx5: Enable PCIe buffer congestion handling workaround via devlink > > Moshe Shemesh (4): > devlink: Fix param set handling for string type > devlink: Fix param cmode driverinit for string type > devlink: Add helper function for safely copy string param > devlink: Add extack messages support to param set > > drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 3 +- > drivers/net/ethernet/mellanox/mlx4/main.c | 6 +- > drivers/net/ethernet/mellanox/mlx5/core/Makefile | 3 +- > drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 388 > + > drivers/net/ethernet/mellanox/mlx5/core/devlink.h | 13 + > .../net/ethernet/mellanox/mlx5/core/diag/crdump.c | 223 > drivers/net/ethernet/mellanox/mlx5/core/health.c | 3 + > drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h | 4 + > .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c | 320 + > .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h | 56 +++ > drivers/net/ethernet/mellanox/mlx5/core/main.c | 10 +- > include/linux/mlx5/driver.h| 5 + > include/net/devlink.h | 15 +- > net/core/devlink.c | 44 ++- > 14 files changed, 1076 insertions(+), 17 deletions(-) > create mode 100644
Re: [net-next 01/10] devlink: Fix param set handling for string type
On Wed, 1 Aug 2018 14:52:46 -0700, Saeed Mahameed wrote: > From: Moshe Shemesh > > In case devlink param type is string, it needs to copy the string value > it got from the input to devlink_param_value. > > Fixes: e3b7ca18ad7b ("devlink: Add param set command") > Signed-off-by: Moshe Shemesh > Acked-by: Jiri Pirko > Signed-off-by: Saeed Mahameed > --- > include/net/devlink.h | 2 +- > net/core/devlink.c| 3 ++- > 2 files changed, 3 insertions(+), 2 deletions(-) > > diff --git a/include/net/devlink.h b/include/net/devlink.h > index b9b89d6604d4..b0e17c025fdc 100644 > --- a/include/net/devlink.h > +++ b/include/net/devlink.h > @@ -311,7 +311,7 @@ union devlink_param_value { > u8 vu8; > u16 vu16; > u32 vu32; > - const char *vstr; > + char vstr[DEVLINK_PARAM_MAX_STRING_VALUE]; > bool vbool; > }; > > diff --git a/net/core/devlink.c b/net/core/devlink.c > index 65fc366a78a4..61e126c28526 100644 > --- a/net/core/devlink.c > +++ b/net/core/devlink.c > @@ -3014,7 +3014,8 @@ devlink_param_value_get_from_info(const struct > devlink_param *param, > if (nla_len(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]) > > DEVLINK_PARAM_MAX_STRING_VALUE) > return -EINVAL; > - value->vstr = > nla_data(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]); > + strcpy(value->vstr, > +nla_data(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA])); DEVLINK_ATTR_PARAM_VALUE_DATA is a type mux, I'm struggling to find in the code where it's checked for being null-terminated for strings. > break; > case DEVLINK_PARAM_TYPE_BOOL: > value->vbool = info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA] ?
Re: [PATCH net-next] virtio_net: force_napi_tx module param.
On Mon, Jul 30, 2018 at 02:06:50PM +0800, Jason Wang wrote: > > > On 2018年07月25日 08:17, Jon Olson wrote: > > On Tue, Jul 24, 2018 at 3:46 PM Michael S. Tsirkin wrote: > > > On Tue, Jul 24, 2018 at 06:31:54PM -0400, Willem de Bruijn wrote: > > > > On Tue, Jul 24, 2018 at 6:23 PM Michael S. Tsirkin > > > > wrote: > > > > > On Tue, Jul 24, 2018 at 04:52:53PM -0400, Willem de Bruijn wrote: > > > > > > >From the above linked patch, I understand that there are yet > > > > > > other special cases in production, such as a hard cap on #tx queues > > > > > > to > > > > > > 32 regardless of number of vcpus. > > > > > I don't think upstream kernels have this limit - we can > > > > > now use vmalloc for higher number of queues. > > > > Yes. that patch* mentioned it as a google compute engine imposed > > > > limit. It is exactly such cloud provider imposed rules that I'm > > > > concerned about working around in upstream drivers. > > > > > > > > * for reference, I mean https://patchwork.ozlabs.org/patch/725249/ > > > Yea. Why does GCE do it btw? > > There are a few reasons for the limit, some historical, some current. > > > > Historically we did this because of a kernel limit on the number of > > TAP queues (in Montreal I thought this limit was 32). To my chagrin, > > the limit upstream at the time we did it was actually eight. We had > > increased the limit from eight to 32 internally, and it appears in > > upstream it has subsequently increased upstream to 256. We no longer > > use TAP for networking, so that constraint no longer applies for us, > > but when looking at removing/raising the limit we discovered no > > workloads that clearly benefited from lifting it, and it also placed > > more pressure on our virtual networking stack particularly on the Tx > > side. We left it as-is. > > > > In terms of current reasons there are really two. One is memory usage. > > As you know, virtio-net uses rx/tx pairs, so there's an expectation > > that the guest will have an Rx queue for every Tx queue. We run our > > individual virtqueues fairly deep (4096 entries) to give guests a wide > > time window for re-posting Rx buffers and avoiding starvation on > > packet delivery. Filling an Rx vring with max-sized mergeable buffers > > (4096 bytes) is 16MB of GFP_ATOMIC allocations. At 32 queues this can > > be up to 512MB of memory posted for network buffers. Scaling this to > > the largest VM GCE offers today (160 VCPUs -- n1-ultramem-160) keeping > > all of the Rx rings full would (in the large average Rx packet size > > case) consume up to 2.5 GB(!) of guest RAM. Now, those VMs have 3.8T > > of RAM available, but I don't believe we've observed a situation where > > they would have benefited from having 2.5 gigs of buffers posted for > > incoming network traffic :) > > We can work to have async txq and rxq instead of paris if there's a strong > requirement. I think the reason we don't is because RX queueing is programmed by TX packets. It might make sense if we support RX queueing policy that isn't dependent on TX. > > > > The second reason is interrupt related -- as I mentioned above, we > > have found no workloads that clearly benefit from so many queues, but > > we have found workloads that degrade. In particular workloads that do > > a lot of small packet processing but which aren't extremely latency > > sensitive can achieve higher PPS by taking fewer interrupt across > > fewer VCPUs due to better batching (this also incurs higher latency, > > but at the limit the "busy" cores end up suppressing most interrupts > > and spending most of their cycles farming out work). Memcache is a > > good example here, particularly if the latency targets for request > > completion are in the ~milliseconds range (rather than the > > microseconds we typically strive for with TCP_RR-style workloads). > > > > All of that said, we haven't been forthcoming with data (and > > unfortunately I don't have it handy in a useful form, otherwise I'd > > simply post it here), so I understand the hesitation to simply run > > with napi_tx across the board. As Willem said, this patch seemed like > > the least disruptive way to allow us to continue down the road of > > "universal" NAPI Tx and to hopefully get data across enough workloads > > (with VMs small, large, and absurdly large :) to present a compelling > > argument in one direction or another. As far as I know there aren't > > currently any NAPI related ethtool commands (based on a quick perusal > > of ethtool.h) > > As I suggest before, maybe we can (ab)use tx-frames-irq. > > Thanks > > > -- it seems like it would be fairly involved/heavyweight > > to plumb one solely for this unless NAPI Tx is something many users > > will want to tune (and for which other drivers would support tuning). > > > > -- > > Jon Olson
Re: [PATCH net-next] virtio_net: force_napi_tx module param.
On Wed, Aug 01, 2018 at 11:46:14AM -0400, Willem de Bruijn wrote: > On Tue, Jul 31, 2018 at 8:34 AM Michael S. Tsirkin wrote: > > > > On Sun, Jul 29, 2018 at 05:32:56PM -0400, Willem de Bruijn wrote: > > > On Sun, Jul 29, 2018 at 12:01 PM David Miller wrote: > > > > > > > > From: Caleb Raitto > > > > Date: Mon, 23 Jul 2018 16:11:19 -0700 > > > > > > > > > From: Caleb Raitto > > > > > > > > > > The driver disables tx napi if it's not certain that completions will > > > > > be processed affine with tx service. > > > > > > > > > > Its heuristic doesn't account for some scenarios where it is, such as > > > > > when the queue pair count matches the core but not hyperthread count. > > > > > > > > > > Allow userspace to override the heuristic. This is an alternative > > > > > solution to that in the linked patch. That added more logic in the > > > > > kernel for these cases, but the agreement was that this was better > > > > > left > > > > > to user control. > > > > > > > > > > Do not expand the existing napi_tx variable to a ternary value, > > > > > because doing so can break user applications that expect > > > > > boolean ('Y'/'N') instead of integer output. Add a new param instead. > > > > > > > > > > Link: https://patchwork.ozlabs.org/patch/725249/ > > > > > Acked-by: Willem de Bruijn > > > > > Acked-by: Jon Olson > > > > > Signed-off-by: Caleb Raitto > > > > > > > > So I looked into the history surrounding these issues. > > > > > > > > First of all, it's always ends up turning out crummy when drivers start > > > > to set affinities themselves. The worst possible case is to do it > > > > _conditionally_, and that is exactly what virtio_net is doing. > > > > > > > > From the user's perspective, this provides a really bad experience. > > > > > > > > So if I have a 32-queue device and there are 32 cpus, you'll do all > > > > the affinity settings, stopping Irqbalanced from doing anything > > > > right? > > > > > > > > So if I add one more cpu, you'll say "oops, no idea what to do in > > > > this situation" and not touch the affinities at all? > > > > > > > > That makes no sense at all. > > > > > > > > If the driver is going to set affinities at all, OWN that decision > > > > and set it all the time to something reasonable. > > > > > > > > Or accept that you shouldn't be touching this stuff in the first place > > > > and leave the affinities alone. > > > > > > > > Right now we're kinda in a situation where the driver has been setting > > > > affinities in the ncpus==nqueues cases for some time, so we can't stop > > > > doing it. > > > > > > > > Which means we have to set them in all cases to make the user > > > > experience sane again. > > > > > > > > I looked at the linked to patch again: > > > > > > > > https://patchwork.ozlabs.org/patch/725249/ > > > > > > > > And I think the strategy should be made more generic, to get rid of > > > > the hyperthreading assumptions. I also agree that the "assign > > > > to first N cpus" logic doesn't make much sense either. > > > > > > > > Just distribute across the available cpus evenly, and be done with it. > > > > > > Sounds good to me. > > > > So e.g. we could set an affinity hint to a group of CPUs that > > might transmit to this queue. > > We also want to set the xps mask for all cpus in the group to this queue. > > Is there a benefit over explicitly choosing one cpu from the set, btw? If only one CPU actually transmits on this queue then probably yes. And virtio doesn't know whether that's the case. > I assumed striping. Something along the lines of > > int stripe = max_t(int, num_online_cpus() / vi->curr_queue_pairs, 1); > int vq = 0; > > cpumask_clear(xps_mask); > > for_each_online_cpu(cpu) { > cpumask_set_cpu(cpu, xps_mask); > > if ((i + 1) % stripe == 0) { > virtqueue_set_affinity(vi->rq[vq].vq, cpu); > virtqueue_set_affinity(vi->sq[vq].vq, cpu); > netif_set_xps_queue(vi->dev, xps_mask, vq); > cpumask_clear(xps_mask); > vq++; > } > i++; > }
Re: [net-next 07/10] net/mlx5: Enable PCIe buffer congestion handling workaround via devlink
On Wed, Aug 1, 2018 at 2:52 PM, Saeed Mahameed wrote: > From: Eran Ben Elisha > > Add support for two driver parameters via devlink params interface: > - Congestion action > HW mechanism in the PCIe buffer which monitors the amount of > consumed PCIe buffer per host. This mechanism supports the > following actions in case of threshold overflow: > - disabled - NOP (Default) > - drop > - mark - Mark CE bit in the CQE of received packet Any chance you could clarify the differences between "disabled" and "drop"? I am assuming the "drop" is a head-of-line drop versus the "disabled" being a incoming packet drop? Also I still don't see this as necessarily being all that unique of a feature/issue. Basically being PCIe bus limited is not all that uncommon of a thing and has existed since the early days of PCI. In the case of the Intel NICs we just throw a warning and end up dropping the incoming packets instead of providing the two other options you have listed. > - Congestion mode > - aggressive - Aggressive static trigger threshold (Default) > - dynamic - Dynamically change the trigger threshold > > These driver-specific params enable the NIC HW workaround to handle > buffer congestion on the current NIC generation. Is there any documentation anywhere for any of these features? In the patch set I see you adding interfaces, but I don't see them documented anywhere. > Signed-off-by: Eran Ben Elisha > Reviewed-by: Moshe Shemesh > Reviewed-by: Jiri Pirko > Signed-off-by: Saeed Mahameed > --- > .../net/ethernet/mellanox/mlx5/core/devlink.c | 204 +- > 1 file changed, 203 insertions(+), 1 deletion(-) Ideally there should be some documentation going into the kernel when you extend the devlink interface at least so that I know how to use your new interfaces when you define them. Just updating devlink.c seems like a messy way to do things. - Alex
[net-next 09/10] net/mlx5: Add Crdump FW snapshot support
From: Alex Vesker Crdump allows the driver to create a snapshot of the FW PCI crspace. This is useful in case of catastrophic issues which require FW reset. The snapshot can be used for later debug. The snapshot is exposed using devlink, cr-space address regions are registered on init and snapshots are attached once a new snapshot is collected by the driver. Signed-off-by: Alex Vesker Signed-off-by: Saeed Mahameed --- .../net/ethernet/mellanox/mlx5/core/Makefile | 3 +- .../ethernet/mellanox/mlx5/core/diag/crdump.c | 201 ++ .../net/ethernet/mellanox/mlx5/core/health.c | 1 + .../ethernet/mellanox/mlx5/core/lib/mlx5.h| 2 + .../net/ethernet/mellanox/mlx5/core/main.c| 5 + include/linux/mlx5/driver.h | 4 + 6 files changed, 215 insertions(+), 1 deletion(-) create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile index 15f6916efe1b..6ea9e9462c77 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile @@ -6,7 +6,8 @@ mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \ health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o \ mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \ fs_counters.o rl.o lag.o dev.o wq.o lib/gid.o lib/clock.o \ - lib/pci_vsc.o diag/fs_tracepoint.o diag/fw_tracer.o devlink.o + lib/pci_vsc.o diag/fs_tracepoint.o diag/fw_tracer.o diag/crdump.o \ + devlink.o mlx5_core-$(CONFIG_MLX5_ACCEL) += accel/ipsec.o accel/tls.o diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c new file mode 100644 index ..fe779e62fc70 --- /dev/null +++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c @@ -0,0 +1,201 @@ +/* + * Copyright (c) 2018, Mellanox Technologies. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *copyright notice, this list of conditions and the following + *disclaimer. + * + * - Redistributions in binary form must reproduce the above + *copyright notice, this list of conditions and the following + *disclaimer in the documentation and/or other materials + *provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include +#include "mlx5_core.h" +#include "lib/pci_vsc.h" + +#define BAD_ACCESS 0xBADACCE5 +#define MLX5_PROTECTED_CR_SCAN_CRSPACE 0x7 +#define MAX_NUM_OF_DUMPS_TO_STORE (8) + +static const char *region_cr_space_str = "cr-space"; + +struct mlx5_fw_crdump { + u32 size; + struct devlink_region *region_crspace; +}; + +bool mlx5_crdump_enbaled(struct mlx5_core_dev *dev) +{ + struct mlx5_priv *priv = >priv; + + return (!!priv->health.crdump); +} + +static int mlx5_crdump_fill(struct mlx5_core_dev *dev) +{ + struct devlink *devlink = priv_to_devlink(dev); + struct mlx5_priv *priv = >priv; + struct mlx5_fw_crdump *crdump = priv->health.crdump; + int i, ret = 0; + u32 *cr_data; + u32 id; + + cr_data = kvmalloc(crdump->size, GFP_KERNEL); + if (!cr_data) + return -ENOMEM; + + for (i = 0; i < (crdump->size / 4); i++) + cr_data[i] = BAD_ACCESS; + + ret = mlx5_vsc_gw_read_block_fast(dev, cr_data, crdump->size); + if (ret <= 0) + goto free_data; + + if (crdump->size != ret) { + mlx5_core_warn(dev, "failed to read full dump, read %d out of %u\n", + ret, crdump->size); + ret = -EINVAL; + goto free_data; + } + + /* Get the available snapshot ID for the dumps */ + id = devlink_region_shapshot_id_get(devlink); + ret =
[net-next 10/10] net/mlx5: Use devlink region_snapshot parameter
From: Alex Vesker This parameter enables capturing region snapshot of the crspace during critical errors. The default value of this parameter is disabled, it can be enabled using devlink param commands. It is possible to configure during runtime and also driver init. Signed-off-by: Alex Vesker Signed-off-by: Saeed Mahameed --- .../net/ethernet/mellanox/mlx5/core/devlink.c | 49 +++ .../ethernet/mellanox/mlx5/core/diag/crdump.c | 22 + .../ethernet/mellanox/mlx5/core/lib/mlx5.h| 2 + 3 files changed, 73 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c index ec0ca690839e..4e33049096d0 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c @@ -2,6 +2,7 @@ /* Copyright (c) 2018, Mellanox Technologies inc. All rights reserved. */ #include +#include "lib/mlx5.h" enum { MLX5_DEVLINK_MPEGC_FIELD_SELECT_TX_OVERFLOW_DROP_EN = BIT(0), @@ -288,6 +289,24 @@ static int mlx5_devlink_get_congestion_mode(struct devlink *devlink, u32 id, return 0; } +static int mlx5_devlink_get_crdump_snapshot(struct devlink *devlink, u32 id, + struct devlink_param_gset_ctx *ctx) +{ + struct mlx5_core_dev *dev = devlink_priv(devlink); + + ctx->val.vbool = mlx5_crdump_is_snapshot_enabled(dev); + return 0; +} + +static int mlx5_devlink_set_crdump_snapshot(struct devlink *devlink, u32 id, + struct devlink_param_gset_ctx *ctx, + struct netlink_ext_ack *extack) +{ + struct mlx5_core_dev *dev = devlink_priv(devlink); + + return mlx5_crdump_set_snapshot_enabled(dev, ctx->val.vbool); +} + enum mlx5_devlink_param_id { MLX5_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX, MLX5_DEVLINK_PARAM_ID_CONGESTION_ACTION, @@ -295,6 +314,11 @@ enum mlx5_devlink_param_id { }; static const struct devlink_param mlx5_devlink_params[] = { + DEVLINK_PARAM_GENERIC(REGION_SNAPSHOT, + BIT(DEVLINK_PARAM_CMODE_RUNTIME) | + BIT(DEVLINK_PARAM_CMODE_DRIVERINIT), + mlx5_devlink_get_crdump_snapshot, + mlx5_devlink_set_crdump_snapshot, NULL), DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_CONGESTION_ACTION, "congestion_action", DEVLINK_PARAM_TYPE_STRING, @@ -309,6 +333,29 @@ static const struct devlink_param mlx5_devlink_params[] = { mlx5_devlink_set_congestion_mode, NULL), }; +static void mlx5_devlink_set_init_value(struct devlink *devlink, u32 param_id, + union devlink_param_value init_val) +{ + struct mlx5_core_dev *dev = devlink_priv(devlink); + int err; + + err = devlink_param_driverinit_value_set(devlink, param_id, init_val); + if (err) + dev_warn(>pdev->dev, +"devlink set parameter %u value failed (err = %d)", +param_id, err); +} + +static void mlx5_devlink_set_params_init_values(struct devlink *devlink) +{ + union devlink_param_value value; + + value.vbool = false; + mlx5_devlink_set_init_value(devlink, + DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT, + value); +} + int mlx5_devlink_register(struct devlink *devlink, struct device *dev) { int err; @@ -324,6 +371,8 @@ int mlx5_devlink_register(struct devlink *devlink, struct device *dev) goto unregister; } + mlx5_devlink_set_params_init_values(devlink); + return 0; unregister: diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c index fe779e62fc70..94b74b256eaa 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c @@ -44,6 +44,7 @@ static const char *region_cr_space_str = "cr-space"; struct mlx5_fw_crdump { u32 size; + boolsnapshot_enable; struct devlink_region *region_crspace; }; @@ -125,6 +126,27 @@ int mlx5_crdump_collect(struct mlx5_core_dev *dev) return ret; } +bool mlx5_crdump_is_snapshot_enabled(struct mlx5_core_dev *dev) +{ + struct mlx5_priv *priv = >priv; + + if (mlx5_crdump_enbaled(dev)) + return priv->health.crdump->snapshot_enable; + + return false; +} + +int mlx5_crdump_set_snapshot_enabled(struct mlx5_core_dev *dev, bool value) +{ + struct mlx5_priv *priv = >priv; + + if (!mlx5_crdump_enbaled(dev)) + return -ENODEV; + + priv->health.crdump->snapshot_enable = value; +
[net-next 07/10] net/mlx5: Enable PCIe buffer congestion handling workaround via devlink
From: Eran Ben Elisha Add support for two driver parameters via devlink params interface: - Congestion action HW mechanism in the PCIe buffer which monitors the amount of consumed PCIe buffer per host. This mechanism supports the following actions in case of threshold overflow: - disabled - NOP (Default) - drop - mark - Mark CE bit in the CQE of received packet - Congestion mode - aggressive - Aggressive static trigger threshold (Default) - dynamic - Dynamically change the trigger threshold These driver-specific params enable the NIC HW workaround to handle buffer congestion on the current NIC generation. Signed-off-by: Eran Ben Elisha Reviewed-by: Moshe Shemesh Reviewed-by: Jiri Pirko Signed-off-by: Saeed Mahameed --- .../net/ethernet/mellanox/mlx5/core/devlink.c | 204 +- 1 file changed, 203 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c index e3a5de6b4ee7..ec0ca690839e 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c @@ -126,12 +126,214 @@ static int mlx5_devlink_query_tx_overflow_sense(struct mlx5_core_dev *mdev, return 0; } +static const char *const action_to_str[] = { + [MLX5_DEVLINK_CONGESTION_ACTION_DISABLED] = "disabled", + [MLX5_DEVLINK_CONGESTION_ACTION_DROP] = "drop", + [MLX5_DEVLINK_CONGESTION_ACTION_MARK] = "mark" +}; + +static const char *mlx5_devlink_congestion_action_to_str(int action) +{ + if (action > MLX5_DEVLINK_CONGESTION_ACTION_MAX) { + WARN_ON(1); + return ERR_PTR(-EINVAL); + } + + return action_to_str[action]; +} + +static int mlx5_devlink_str_to_congestion_action(const char *str, u8 *action) +{ + int i; + + for (i = 0; i <= MLX5_DEVLINK_CONGESTION_ACTION_MAX; i++) { + if (!strcmp(str, action_to_str[i])) { + *action = i; + return 0; + } + } + + return -EINVAL; +} + +static int mlx5_devlink_set_congestion_action(struct devlink *devlink, u32 id, + struct devlink_param_gset_ctx *ctx, + struct netlink_ext_ack *extack) +{ + struct mlx5_core_dev *dev = devlink_priv(devlink); + u8 max = MLX5_DEVLINK_CONGESTION_ACTION_MAX; + u8 congestion_action; + u8 sense; + int err; + + if (!MLX5_CAP_MCAM_FEATURE(dev, mark_tx_action_cqe) && + !MLX5_CAP_MCAM_FEATURE(dev, mark_tx_action_cnp)) + max = MLX5_DEVLINK_CONGESTION_ACTION_MARK - 1; + + err = mlx5_devlink_str_to_congestion_action(ctx->val.vstr, + _action); + if (err) + return err; + + if (congestion_action > max) { + NL_SET_ERR_MSG(extack, "Requested congestion action is not supported on current device/FW"); + return -EINVAL; + } + + err = mlx5_devlink_query_tx_overflow_sense(dev, ); + if (err) + return err; + + if (congestion_action == MLX5_DEVLINK_CONGESTION_ACTION_DISABLED && + sense != MLX5_DEVLINK_CONGESTION_MODE_AGGRESSIVE) { + NL_SET_ERR_MSG(extack, "Congestion action \"disabled\" is allowed only while mode is configured to aggressive"); + return -EINVAL; + } + + return mlx5_devlink_set_tx_lossy_overflow(dev, congestion_action); +} + +static int mlx5_devlink_get_congestion_action(struct devlink *devlink, u32 id, + struct devlink_param_gset_ctx *ctx) +{ + struct mlx5_core_dev *dev = devlink_priv(devlink); + u8 congestion_action; + const char *val; + int err; + + err = mlx5_devlink_query_tx_lossy_overflow(dev, _action); + if (err) + return err; + + val = mlx5_devlink_congestion_action_to_str(congestion_action); + if (IS_ERR(val)) + return PTR_ERR(val); + + devlink_param_value_str_fill(>val, val); + return 0; +} + +static const char *const mode_to_str[] = { + [MLX5_DEVLINK_CONGESTION_MODE_AGGRESSIVE] = "aggressive", + [MLX5_DEVLINK_CONGESTION_MODE_DYNAMIC_ADJUSTMENT] = "dynamic" +}; + +static const char *mlx5_devlink_congestion_mode_to_str(int mode) +{ + if (mode > MLX5_DEVLINK_CONGESTION_MODE_MAX) { + WARN_ON(1); + return ERR_PTR(-EINVAL); + } + + return mode_to_str[mode]; +} + +static int mlx5_devlink_str_to_congestion_mode(const char *str, u8 *mode) +{ + int i; + + for (i = 0; i <= MLX5_DEVLINK_CONGESTION_MODE_MAX; i++) { + if (!strcmp(str, mode_to_str[i])) { + *mode = i; + return 0; + } + } + +
[net-next 08/10] net/mlx5: Add Vendor Specific Capability access gateway
From: Alex Vesker The Vendor Specific Capability (VSC) is used to activate a gateway interfacing with the device. The gateway is used to read or write device configurations, which are organized in different domains (spaces). A configuration access may result in multiple actions, reads, writes. Example usages are accessing the Crspace domain to read the crspace or locking a device semaphore using the Semaphore domain. Signed-off-by: Alex Vesker Signed-off-by: Saeed Mahameed --- .../net/ethernet/mellanox/mlx5/core/Makefile | 2 +- .../net/ethernet/mellanox/mlx5/core/health.c | 2 + .../ethernet/mellanox/mlx5/core/lib/pci_vsc.c | 320 ++ .../ethernet/mellanox/mlx5/core/lib/pci_vsc.h | 56 +++ include/linux/mlx5/driver.h | 1 + 5 files changed, 380 insertions(+), 1 deletion(-) create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile index 0be5bf93f33b..15f6916efe1b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile @@ -6,7 +6,7 @@ mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \ health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o \ mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \ fs_counters.o rl.o lag.o dev.o wq.o lib/gid.o lib/clock.o \ - diag/fs_tracepoint.o diag/fw_tracer.o devlink.o + lib/pci_vsc.o diag/fs_tracepoint.o diag/fw_tracer.o devlink.o mlx5_core-$(CONFIG_MLX5_ACCEL) += accel/ipsec.o accel/tls.o diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c index d39b0b7011b2..db9e39fdc33e 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c @@ -38,6 +38,7 @@ #include #include #include "mlx5_core.h" +#include "lib/pci_vsc.h" enum { MLX5_HEALTH_POLL_INTERVAL = 2 * HZ, @@ -388,6 +389,7 @@ int mlx5_health_init(struct mlx5_core_dev *dev) spin_lock_init(>wq_lock); INIT_WORK(>work, health_care); INIT_DELAYED_WORK(>recover_work, health_recover); + mlx5_vsc_init(dev); return 0; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c new file mode 100644 index ..cfc2eebbf7c5 --- /dev/null +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c @@ -0,0 +1,320 @@ +/* + * Copyright (c) 2018, Mellanox Technologies. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *copyright notice, this list of conditions and the following + *disclaimer. + * + * - Redistributions in binary form must reproduce the above + *copyright notice, this list of conditions and the following + *disclaimer in the documentation and/or other materials + *provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include "mlx5_core.h" +#include "pci_vsc.h" + +#define MLX5_EXTRACT_C(source, offset, size) \ + unsigned)(source)) >> (offset)) & MLX5_ONES32(size)) +#define MLX5_EXTRACT(src, start, len) \ + (((len) == 32) ? (src) : MLX5_EXTRACT_C(src, start, len)) +#define MLX5_ONES32(size) \ + ((size) ? (0x >> (32 - (size))) : 0) +#define MLX5_MASK32(offset, size) \ + (MLX5_ONES32(size) << (offset)) +#define MLX5_MERGE_C(rsrc1, rsrc2, start, len) \ + rsrc2) << (start)) & (MLX5_MASK32((start), (len | \ + ((rsrc1) & (~MLX5_MASK32((start), (len) +#define MLX5_MERGE(rsrc1, rsrc2, start, len) \ + (((len) == 32) ? (rsrc2) : MLX5_MERGE_C(rsrc1, rsrc2, start, len)) + +#define VSC_MAX_RETRIES 2048 + +enum { +
[net-next 06/10] net/mlx5: Add MPEGC register configuration functionality
From: Eran Ben Elisha MPEGC register is used to configure and access the PCIe general configuration. Expose set/get for TX lossy overflow and TX overflow sense which use the MPEGC register. These will be used in a downstream patch via devlink params. Signed-off-by: Eran Ben Elisha Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed --- .../net/ethernet/mellanox/mlx5/core/devlink.c | 123 ++ .../net/ethernet/mellanox/mlx5/core/devlink.h | 1 + 2 files changed, 124 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c index 2daf686bcc98..e3a5de6b4ee7 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c @@ -3,6 +3,129 @@ #include +enum { + MLX5_DEVLINK_MPEGC_FIELD_SELECT_TX_OVERFLOW_DROP_EN = BIT(0), + MLX5_DEVLINK_MPEGC_FIELD_SELECT_TX_OVERFLOW_SENSE = BIT(3), + MLX5_DEVLINK_MPEGC_FIELD_SELECT_MARK_TX_ACTION_CQE = BIT(4), + MLX5_DEVLINK_MPEGC_FIELD_SELECT_MARK_TX_ACTION_CNP = BIT(5), +}; + +enum { + MLX5_DEVLINK_CONGESTION_ACTION_DISABLED, + MLX5_DEVLINK_CONGESTION_ACTION_DROP, + MLX5_DEVLINK_CONGESTION_ACTION_MARK, + __MLX5_DEVLINK_CONGESTION_ACTION_MAX, + MLX5_DEVLINK_CONGESTION_ACTION_MAX = __MLX5_DEVLINK_CONGESTION_ACTION_MAX - 1, +}; + +enum { + MLX5_DEVLINK_CONGESTION_MODE_AGGRESSIVE, + MLX5_DEVLINK_CONGESTION_MODE_DYNAMIC_ADJUSTMENT, + __MLX5_DEVLINK_CONGESTION_MODE_MAX, + MLX5_DEVLINK_CONGESTION_MODE_MAX = __MLX5_DEVLINK_CONGESTION_MODE_MAX - 1, +}; + +static int mlx5_devlink_set_mpegc(struct mlx5_core_dev *mdev, u32 *in, + int size_in) +{ + u32 out[MLX5_ST_SZ_DW(mpegc_reg)] = {0}; + + if (!MLX5_CAP_MCAM_REG(mdev, mpegc)) + return -EOPNOTSUPP; + + return mlx5_core_access_reg(mdev, in, size_in, out, + sizeof(out), MLX5_REG_MPEGC, 0, 1); +} + +static int mlx5_devlink_set_tx_lossy_overflow(struct mlx5_core_dev *mdev, + u8 tx_lossy_overflow) +{ + u32 in[MLX5_ST_SZ_DW(mpegc_reg)] = {0}; + u8 field_select = 0; + + if (tx_lossy_overflow == MLX5_DEVLINK_CONGESTION_ACTION_MARK) { + if (MLX5_CAP_MCAM_FEATURE(mdev, mark_tx_action_cqe)) + field_select |= + MLX5_DEVLINK_MPEGC_FIELD_SELECT_MARK_TX_ACTION_CQE; + + if (MLX5_CAP_MCAM_FEATURE(mdev, mark_tx_action_cnp)) + field_select |= + MLX5_DEVLINK_MPEGC_FIELD_SELECT_MARK_TX_ACTION_CNP; + + if (!field_select) + return -EOPNOTSUPP; + } + + MLX5_SET(mpegc_reg, in, field_select, +field_select | +MLX5_DEVLINK_MPEGC_FIELD_SELECT_TX_OVERFLOW_DROP_EN); + MLX5_SET(mpegc_reg, in, tx_lossy_overflow_oper, tx_lossy_overflow); + MLX5_SET(mpegc_reg, in, mark_cqe, 0x1); + MLX5_SET(mpegc_reg, in, mark_cnp, 0x1); + + return mlx5_devlink_set_mpegc(mdev, in, sizeof(in)); +} + +static int mlx5_devlink_set_tx_overflow_sense(struct mlx5_core_dev *mdev, + u8 tx_overflow_sense) +{ + u32 in[MLX5_ST_SZ_DW(mpegc_reg)] = {0}; + + if (!MLX5_CAP_MCAM_FEATURE(mdev, dynamic_tx_overflow)) + return -EOPNOTSUPP; + + MLX5_SET(mpegc_reg, in, field_select, +MLX5_DEVLINK_MPEGC_FIELD_SELECT_TX_OVERFLOW_SENSE); + MLX5_SET(mpegc_reg, in, tx_overflow_sense, tx_overflow_sense); + + return mlx5_devlink_set_mpegc(mdev, in, sizeof(in)); +} + +static int mlx5_devlink_query_mpegc(struct mlx5_core_dev *mdev, u32 *out, + int size_out) +{ + u32 in[MLX5_ST_SZ_DW(mpegc_reg)] = {0}; + + if (!MLX5_CAP_MCAM_REG(mdev, mpegc)) + return -EOPNOTSUPP; + + return mlx5_core_access_reg(mdev, in, sizeof(in), out, + size_out, MLX5_REG_MPEGC, 0, 0); +} + +static int mlx5_devlink_query_tx_lossy_overflow(struct mlx5_core_dev *mdev, + u8 *tx_lossy_overflow) +{ + u32 out[MLX5_ST_SZ_DW(mpegc_reg)] = {0}; + int err; + + err = mlx5_devlink_query_mpegc(mdev, out, sizeof(out)); + if (err) + return err; + + *tx_lossy_overflow = MLX5_GET(mpegc_reg, out, tx_lossy_overflow_oper); + + return 0; +} + +static int mlx5_devlink_query_tx_overflow_sense(struct mlx5_core_dev *mdev, + u8 *tx_overflow_sense) +{ + u32 out[MLX5_ST_SZ_DW(mpegc_reg)] = {0}; + int err; + + if (!MLX5_CAP_MCAM_FEATURE(mdev, dynamic_tx_overflow)) + return -EOPNOTSUPP; + + err = mlx5_devlink_query_mpegc(mdev, out, sizeof(out)); + if (err) +
[net-next 05/10] net/mlx5: Move all devlink related functions calls to devlink.c
From: Eran Ben Elisha Centralize all devlink related callbacks in one file. In the downstream patch, some more functionality will be added, this patch is preparing the driver infrastructure for it. Currently, move devlink un/register functions calls into this file. Signed-off-by: Eran Ben Elisha Reviewed-by: Moshe Shemesh Reviewed-by: Jiri Pirko Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/Makefile | 2 +- drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 14 ++ drivers/net/ethernet/mellanox/mlx5/core/devlink.h | 12 drivers/net/ethernet/mellanox/mlx5/core/main.c| 5 +++-- 4 files changed, 30 insertions(+), 3 deletions(-) create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.c create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.h diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile index f20fda1ced4f..0be5bf93f33b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile @@ -6,7 +6,7 @@ mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \ health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o \ mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \ fs_counters.o rl.o lag.o dev.o wq.o lib/gid.o lib/clock.o \ - diag/fs_tracepoint.o diag/fw_tracer.o + diag/fs_tracepoint.o diag/fw_tracer.o devlink.o mlx5_core-$(CONFIG_MLX5_ACCEL) += accel/ipsec.o accel/tls.o diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c new file mode 100644 index ..2daf686bcc98 --- /dev/null +++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c @@ -0,0 +1,14 @@ +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB +/* Copyright (c) 2018, Mellanox Technologies inc. All rights reserved. */ + +#include + +int mlx5_devlink_register(struct devlink *devlink, struct device *dev) +{ + return devlink_register(devlink, dev); +} + +void mlx5_devlink_unregister(struct devlink *devlink) +{ + devlink_unregister(devlink); +} diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h new file mode 100644 index ..455bfa4e89c0 --- /dev/null +++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h @@ -0,0 +1,12 @@ +/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */ +/* Copyright (c) 2018, Mellanox Technologies inc. All rights reserved. */ + +#ifndef __MLX5_DEVLINK_H__ +#define __MLX5_DEVLINK_H__ + +#include + +int mlx5_devlink_register(struct devlink *devlink, struct device *dev); +void mlx5_devlink_unregister(struct devlink *devlink); + +#endif /* __MLX5_DEVLINK_H__ */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index 03b9c6733eed..7a1ddf96f065 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -56,6 +56,7 @@ #include "fs_core.h" #include "lib/mpfs.h" #include "eswitch.h" +#include "devlink.h" #include "lib/mlx5.h" #include "fpga/core.h" #include "fpga/ipsec.h" @@ -1445,7 +1446,7 @@ static int init_one(struct pci_dev *pdev, request_module_nowait(MLX5_IB_MOD); - err = devlink_register(devlink, >dev); + err = mlx5_devlink_register(devlink, >dev); if (err) goto clean_load; @@ -1475,7 +1476,7 @@ static void remove_one(struct pci_dev *pdev) struct devlink *devlink = priv_to_devlink(dev); struct mlx5_priv *priv = >priv; - devlink_unregister(devlink); + mlx5_devlink_unregister(devlink); mlx5_unregister_device(dev); if (mlx5_unload_one(dev, priv, true)) { -- 2.17.0
[net-next 03/10] devlink: Add helper function for safely copy string param
From: Moshe Shemesh Devlink string param buffer is allocated at the size of DEVLINK_PARAM_MAX_STRING_VALUE. Add helper function which makes sure this size is not exceeded. Renamed DEVLINK_PARAM_MAX_STRING_VALUE to __DEVLINK_PARAM_MAX_STRING_VALUE to emphasize that it should be used by devlink only. The driver should use the helper function instead to verify it doesn't exceed the allowed length. Signed-off-by: Moshe Shemesh Acked-by: Jiri Pirko Signed-off-by: Saeed Mahameed --- include/net/devlink.h | 12 ++-- net/core/devlink.c| 19 ++- 2 files changed, 28 insertions(+), 3 deletions(-) diff --git a/include/net/devlink.h b/include/net/devlink.h index b0e17c025fdc..99efc156a309 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -298,7 +298,7 @@ struct devlink_resource { #define DEVLINK_RESOURCE_ID_PARENT_TOP 0 -#define DEVLINK_PARAM_MAX_STRING_VALUE 32 +#define __DEVLINK_PARAM_MAX_STRING_VALUE 32 enum devlink_param_type { DEVLINK_PARAM_TYPE_U8, DEVLINK_PARAM_TYPE_U16, @@ -311,7 +311,7 @@ union devlink_param_value { u8 vu8; u16 vu16; u32 vu32; - char vstr[DEVLINK_PARAM_MAX_STRING_VALUE]; + char vstr[__DEVLINK_PARAM_MAX_STRING_VALUE]; bool vbool; }; @@ -553,6 +553,8 @@ int devlink_param_driverinit_value_get(struct devlink *devlink, u32 param_id, int devlink_param_driverinit_value_set(struct devlink *devlink, u32 param_id, union devlink_param_value init_val); void devlink_param_value_changed(struct devlink *devlink, u32 param_id); +void devlink_param_value_str_fill(union devlink_param_value *dst_val, + const char *src); struct devlink_region *devlink_region_create(struct devlink *devlink, const char *region_name, u32 region_max_snapshots, @@ -789,6 +791,12 @@ devlink_param_value_changed(struct devlink *devlink, u32 param_id) { } +static inline void +devlink_param_value_str_fill(union devlink_param_value *dst_val, +const char *src) +{ +} + static inline struct devlink_region * devlink_region_create(struct devlink *devlink, const char *region_name, diff --git a/net/core/devlink.c b/net/core/devlink.c index 99ad53e8297f..367564099357 100644 --- a/net/core/devlink.c +++ b/net/core/devlink.c @@ -3012,7 +3012,7 @@ devlink_param_value_get_from_info(const struct devlink_param *param, break; case DEVLINK_PARAM_TYPE_STRING: if (nla_len(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]) > - DEVLINK_PARAM_MAX_STRING_VALUE) + __DEVLINK_PARAM_MAX_STRING_VALUE) return -EINVAL; strcpy(value->vstr, nla_data(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA])); @@ -4614,6 +4614,23 @@ void devlink_param_value_changed(struct devlink *devlink, u32 param_id) } EXPORT_SYMBOL_GPL(devlink_param_value_changed); +/** + * devlink_param_value_str_fill - Safely fill-up the string preventing + *from overflow of the preallocated buffer + * + * @dst_val: destination devlink_param_value + * @src: source buffer + */ +void devlink_param_value_str_fill(union devlink_param_value *dst_val, + const char *src) +{ + size_t len; + + len = strlcpy(dst_val->vstr, src, __DEVLINK_PARAM_MAX_STRING_VALUE); + WARN_ON(len >= __DEVLINK_PARAM_MAX_STRING_VALUE); +} +EXPORT_SYMBOL_GPL(devlink_param_value_str_fill); + /** * devlink_region_create - create a new address region * -- 2.17.0
[net-next 04/10] devlink: Add extack messages support to param set
From: Moshe Shemesh Add param extack messages support to param->set() function. This will enable providing clear reasoning to user on setting new value failure. Note that param->validate() already supports extack messages, but it is not enough as input validation can pass or even not needed and still setting new value may fail for some reason, such as device state, FW support, etc. Signed-off-by: Moshe Shemesh Signed-off-by: Jiri Pirko Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 3 ++- drivers/net/ethernet/mellanox/mlx4/main.c | 6 -- include/net/devlink.h | 3 ++- net/core/devlink.c| 7 --- 4 files changed, 12 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c index 7bd96ab4f7c5..97b987f31881 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c @@ -91,7 +91,8 @@ static int bnxt_dl_nvm_param_get(struct devlink *dl, u32 id, } static int bnxt_dl_nvm_param_set(struct devlink *dl, u32 id, -struct devlink_param_gset_ctx *ctx) +struct devlink_param_gset_ctx *ctx, +struct netlink_ext_ack *extack) { struct hwrm_nvm_set_variable_input req = {0}; struct bnxt *bp = bnxt_get_bp_from_dl(dl); diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c index d2d59444f562..629a5c2a4971 100644 --- a/drivers/net/ethernet/mellanox/mlx4/main.c +++ b/drivers/net/ethernet/mellanox/mlx4/main.c @@ -186,7 +186,8 @@ static int mlx4_devlink_ierr_reset_get(struct devlink *devlink, u32 id, } static int mlx4_devlink_ierr_reset_set(struct devlink *devlink, u32 id, - struct devlink_param_gset_ctx *ctx) + struct devlink_param_gset_ctx *ctx, + struct netlink_ext_ack *extack) { mlx4_internal_err_reset = ctx->val.vbool; return 0; @@ -203,7 +204,8 @@ static int mlx4_devlink_crdump_snapshot_get(struct devlink *devlink, u32 id, } static int mlx4_devlink_crdump_snapshot_set(struct devlink *devlink, u32 id, - struct devlink_param_gset_ctx *ctx) + struct devlink_param_gset_ctx *ctx, + struct netlink_ext_ack *extack) { struct mlx4_priv *priv = devlink_priv(devlink); struct mlx4_dev *dev = >dev; diff --git a/include/net/devlink.h b/include/net/devlink.h index 99efc156a309..84294f4c2580 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -344,7 +344,8 @@ struct devlink_param { int (*get)(struct devlink *devlink, u32 id, struct devlink_param_gset_ctx *ctx); int (*set)(struct devlink *devlink, u32 id, - struct devlink_param_gset_ctx *ctx); + struct devlink_param_gset_ctx *ctx, + struct netlink_ext_ack *extack); int (*validate)(struct devlink *devlink, u32 id, union devlink_param_value val, struct netlink_ext_ack *extack); diff --git a/net/core/devlink.c b/net/core/devlink.c index 367564099357..54ba7e55b1de 100644 --- a/net/core/devlink.c +++ b/net/core/devlink.c @@ -2746,11 +2746,12 @@ static int devlink_param_get(struct devlink *devlink, static int devlink_param_set(struct devlink *devlink, const struct devlink_param *param, -struct devlink_param_gset_ctx *ctx) +struct devlink_param_gset_ctx *ctx, +struct netlink_ext_ack *extack) { if (!param->set) return -EOPNOTSUPP; - return param->set(devlink, param->id, ctx); + return param->set(devlink, param->id, ctx, extack); } static int @@ -3112,7 +3113,7 @@ static int devlink_nl_cmd_param_set_doit(struct sk_buff *skb, return -EOPNOTSUPP; ctx.val = value; ctx.cmode = cmode; - err = devlink_param_set(devlink, param, ); + err = devlink_param_set(devlink, param, , info->extack); if (err) return err; } -- 2.17.0
[net-next 01/10] devlink: Fix param set handling for string type
From: Moshe Shemesh In case devlink param type is string, it needs to copy the string value it got from the input to devlink_param_value. Fixes: e3b7ca18ad7b ("devlink: Add param set command") Signed-off-by: Moshe Shemesh Acked-by: Jiri Pirko Signed-off-by: Saeed Mahameed --- include/net/devlink.h | 2 +- net/core/devlink.c| 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/include/net/devlink.h b/include/net/devlink.h index b9b89d6604d4..b0e17c025fdc 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -311,7 +311,7 @@ union devlink_param_value { u8 vu8; u16 vu16; u32 vu32; - const char *vstr; + char vstr[DEVLINK_PARAM_MAX_STRING_VALUE]; bool vbool; }; diff --git a/net/core/devlink.c b/net/core/devlink.c index 65fc366a78a4..61e126c28526 100644 --- a/net/core/devlink.c +++ b/net/core/devlink.c @@ -3014,7 +3014,8 @@ devlink_param_value_get_from_info(const struct devlink_param *param, if (nla_len(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]) > DEVLINK_PARAM_MAX_STRING_VALUE) return -EINVAL; - value->vstr = nla_data(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]); + strcpy(value->vstr, + nla_data(info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA])); break; case DEVLINK_PARAM_TYPE_BOOL: value->vbool = info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA] ? -- 2.17.0
[net-next 02/10] devlink: Fix param cmode driverinit for string type
From: Moshe Shemesh Driverinit configuration mode value is held by devlink to enable the driver fetch the value after reload command. In case the param type is string devlink should copy the value from driver string buffer to devlink string buffer on devlink_param_driverinit_value_set() and vice-versa on devlink_param_driverinit_value_get(). Fixes: ec01aeb1803e ("devlink: Add support for get/set driverinit value") Signed-off-by: Moshe Shemesh Acked-by: Jiri Pirko Signed-off-by: Saeed Mahameed --- net/core/devlink.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/net/core/devlink.c b/net/core/devlink.c index 61e126c28526..99ad53e8297f 100644 --- a/net/core/devlink.c +++ b/net/core/devlink.c @@ -3102,7 +3102,10 @@ static int devlink_nl_cmd_param_set_doit(struct sk_buff *skb, return -EOPNOTSUPP; if (cmode == DEVLINK_PARAM_CMODE_DRIVERINIT) { - param_item->driverinit_value = value; + if (param->type == DEVLINK_PARAM_TYPE_STRING) + strcpy(param_item->driverinit_value.vstr, value.vstr); + else + param_item->driverinit_value = value; param_item->driverinit_value_valid = true; } else { if (!param->set) @@ -4542,7 +4545,10 @@ int devlink_param_driverinit_value_get(struct devlink *devlink, u32 param_id, DEVLINK_PARAM_CMODE_DRIVERINIT)) return -EOPNOTSUPP; - *init_val = param_item->driverinit_value; + if (param_item->param->type == DEVLINK_PARAM_TYPE_STRING) + strcpy(init_val->vstr, param_item->driverinit_value.vstr); + else + *init_val = param_item->driverinit_value; return 0; } @@ -4573,7 +4579,10 @@ int devlink_param_driverinit_value_set(struct devlink *devlink, u32 param_id, DEVLINK_PARAM_CMODE_DRIVERINIT)) return -EOPNOTSUPP; - param_item->driverinit_value = init_val; + if (param_item->param->type == DEVLINK_PARAM_TYPE_STRING) + strcpy(param_item->driverinit_value.vstr, init_val.vstr); + else + param_item->driverinit_value = init_val; param_item->driverinit_value_valid = true; devlink_param_notify(devlink, param_item, DEVLINK_CMD_PARAM_NEW); -- 2.17.0
[pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
Hi Dave, This series provides devlink parameters updates to both devlink API and mlx5 driver, it is a 2nd iteration of the dropped patches sent in a previous mlx5 submission "net/mlx5: Support PCIe buffer congestion handling via Devlink" to address review comments [1]. Changes from the original series: - According to the discussion outcome, we are keeping the congestion control setting as mlx5 device specific for the current HW generation. - Changed the congestion_mode and congestion action param type to string - Added patches to fix devlink handling of param type string - Added a patch which adds extack messages support for param set. - At the end of this series, I've added yet another mlx5 devlink related feature, firmware snapshot support. For more information please see tag log below. Please pull and let me know if there's any problem. [1] https://patchwork.ozlabs.org/patch/945996/ Thanks, Saeed. --- The following changes since commit e6476c21447c4b17c47e476aade6facf050f31e8: net: remove bogus RCU annotations on socket.wq (2018-07-31 12:40:22 -0700) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2018-08-01 for you to fetch changes up to 2ac6108c65ffcb1e5eab1fba1fd59272604d1c32: net/mlx5: Use devlink region_snapshot parameter (2018-08-01 14:49:09 -0700) mlx5-updates-2018-08-01 This series provides devlink parameters updates to both devlink API and mlx5 driver, 1) Devlink changes: (Moshe Shemesh) The first two patches fix devlink param infrastructure for string type params. The third patch adds a devlink helper function to safely copy string from driver to devlink. The forth patch adds extack support for param set. 2) mlx5 specific congestion parameters: (Eran Ben Elisha) Next three patches add new devlink driver specific params for controlling congestion action and mode, using string type params and extack messages support. This congestion mode enables hw workaround in specific devices which is controlled by devlink driver-specific params. The workaround is device specific for this NIC generation, the next NIC will not need it. Congestion parameters: - Congestion action HW W/A mechanism in the PCIe buffer which monitors the amount of consumed PCIe buffer per host. This mechanism supports the following actions in case of threshold overflow: - Disabled - NOP (Default) - Drop - Mark - Mark CE bit in the CQE of received packet - Congestion mode - Aggressive - Aggressive static trigger threshold (Default) - Dynamic - Dynamically change the trigger threshold 3) mlx5 firmware snapshot support via devlink: (Alex Vesker) Last three patches, add the support for capturing region snapshot of the firmware crspace during critical errors, using devlink region_snapshot parameter. -Saeed. Alex Vesker (3): net/mlx5: Add Vendor Specific Capability access gateway net/mlx5: Add Crdump FW snapshot support net/mlx5: Use devlink region_snapshot parameter Eran Ben Elisha (3): net/mlx5: Move all devlink related functions calls to devlink.c net/mlx5: Add MPEGC register configuration functionality net/mlx5: Enable PCIe buffer congestion handling workaround via devlink Moshe Shemesh (4): devlink: Fix param set handling for string type devlink: Fix param cmode driverinit for string type devlink: Add helper function for safely copy string param devlink: Add extack messages support to param set drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 3 +- drivers/net/ethernet/mellanox/mlx4/main.c | 6 +- drivers/net/ethernet/mellanox/mlx5/core/Makefile | 3 +- drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 388 + drivers/net/ethernet/mellanox/mlx5/core/devlink.h | 13 + .../net/ethernet/mellanox/mlx5/core/diag/crdump.c | 223 drivers/net/ethernet/mellanox/mlx5/core/health.c | 3 + drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h | 4 + .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c | 320 + .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.h | 56 +++ drivers/net/ethernet/mellanox/mlx5/core/main.c | 10 +- include/linux/mlx5/driver.h| 5 + include/net/devlink.h | 15 +- net/core/devlink.c | 44 ++- 14 files changed, 1076 insertions(+), 17 deletions(-) create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.c create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/devlink.h create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/diag/crdump.c create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c create mode 100644
Re: how PHY driver is notified that cable is unplugged? (possibly related to IFF_RUNNING flag)
On Wed, 1 Aug 2018, Florian Fainelli wrote: > On 08/01/2018 05:58 AM, rpj...@crashcourse.ca wrote: > > > > (warning that i have a few questions that are probably trivial > > until i get up to speed with networking code.) > > > > a colleague asked for advice about the following -- apparently a > > new PHY driver works properly when being brought up with "ifconfig > > up", part of that process apparently setting the > > IFF_RUNNING net_device flags bit. so far, so good. > > > > now, when the cable is physically unplugged, the claim is that > > there is no obvious status change for that port, accompanied by > > the suggestion that it is that IFF_RUNNING flag bit that is not > > being unset. > > > > asking a more general question, where can i read up on the > > proper protocol for a driver being notified of, and properly > > handling, physical disconnection on that port? and, of course, the > > cable being plugged back in. > > The basic mechanism used by the PHY library is to read the standard > Basic Mode Status Register and check the Link status bit to > determine what the state of the link is set. This event can be > triggered either through polling, or the use of an interrupt that > the PHY is generating. > > Once the link state is determined, because the PHY device is > "connected" to a network device, the PHY library can call > netif_carrier_{on,off} against the network device attached to the > PHY and that propagates through the networking stack and sets the > appropriate bits, including IFF_RUNNING. yup, that agrees with the code i was perusing, thanks for clarifying. more questions coming shortly, hopefully tougher. rday
[net-next:master 408/425] net/rds/connection.c:71:31: sparse: incorrect type in argument 1 (different base types)
tree: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master head: b69ab96ab13ddaa28a506a2433ae5b12669615e8 commit: e65d4d96334e3ff4fe0064612a93a51c63de08de [408/425] rds: Remove IPv6 dependency reproduce: # apt-get install sparse git checkout e65d4d96334e3ff4fe0064612a93a51c63de08de make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__ sparse warnings: (new ones prefixed by >>) >> net/rds/connection.c:71:31: sparse: incorrect type in argument 1 (different >> base types) @@expected restricted __be32 const [usertype] laddr @@ >> got unsigned inrestricted __be32 const [usertype] laddr @@ net/rds/connection.c:71:31:expected restricted __be32 const [usertype] laddr net/rds/connection.c:71:31:got unsigned int [unsigned] [assigned] [usertype] lhash >> net/rds/connection.c:71:41: sparse: incorrect type in argument 3 (different >> base types) @@expected restricted __be32 const [usertype] faddr @@ >> got unsigned inrestricted __be32 const [usertype] faddr @@ net/rds/connection.c:71:41:expected restricted __be32 const [usertype] faddr net/rds/connection.c:71:41:got unsigned int [unsigned] [assigned] [usertype] fhash include/linux/slab.h:631:13: sparse: undefined identifier '__builtin_mul_overflow' include/linux/slab.h:631:13: sparse: call with no type! vim +71 net/rds/connection.c 53 54 static struct hlist_head *rds_conn_bucket(const struct in6_addr *laddr, 55const struct in6_addr *faddr) 56 { 57 static u32 rds6_hash_secret __read_mostly; 58 static u32 rds_hash_secret __read_mostly; 59 60 u32 lhash, fhash, hash; 61 62 net_get_random_once(_hash_secret, sizeof(rds_hash_secret)); 63 net_get_random_once(_hash_secret, sizeof(rds6_hash_secret)); 64 65 lhash = (__force u32)laddr->s6_addr32[3]; 66 #if IS_ENABLED(CONFIG_IPV6) 67 fhash = __ipv6_addr_jhash(faddr, rds6_hash_secret); 68 #else 69 fhash = (__force u32)faddr->s6_addr32[3]; 70 #endif > 71 hash = __inet_ehashfn(lhash, 0, fhash, 0, rds_hash_secret); 72 73 return _conn_hash[hash & RDS_CONNECTION_HASH_MASK]; 74 } 75 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
Re: [PATCH net-next v1] net: don't declare IPv6 non-local bind helper if CONFIG_IPV6 undefined
From: Vincent Bernat Date: Wed, 1 Aug 2018 22:05:10 +0200 > Fixes: 83ba4645152d ("net: add helpers checking if socket can be bound to > nonlocal address") > Signed-off-by: Vincent Bernat Applied, thanks Vincent.
Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.
On 08/01/18 01:49 PM, Vakul Garg wrote: > > I don't think this patch is safe as-is. sgin_arr is a stack array of size > > MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is that it > > walks the whole chain of skbs from skb->next, and can return any number of > > segments. Therefore we need to heap allocate. I think I copied the IPSEC > > code here. > > > > For perf though, we could use the stack array if skb_cow_data returns <= > > MAX_SKB_FRAGS. > > skb_cow_data() is being called only when caller passes sgout=NULL (i.e. > non-zero copy case). In case of zero-copy, we do not call skb_cow_data() > and just assume that MAX_SKB_FRAGS+2 sized scatterlist array sgin_arr[] > is sufficient. This assumption could be wrong. So skb_cow_data() should be > called unconditionally to determine number of scatterlist entries required > for skb. I agree it is best to unify them. I was originally worried about perf with the extra allocation (which you proposed fixing by merging with the crypto allocation, which would be great), and the perf of skb_cow_data(). Zerocopy doesn't require skb_cow_data(), but we do have to walk the skbs to calculate nsg correctly. However skb_cow_data perf might be fine after your fix "strparser: Call skb_unclone conditionally".
Re: Security enhancement proposal for kernel TLS
On 07/31/18 10:45 AM, Vakul Garg wrote: > > > IIUC, with the upstream implementation of tls record layer in kernel, > > > the decryption of tls FINISHED message happens in kernel. Therefore > > > the keys are already being sent to kernel tls socket before handshake is > > completed. > > > > This is incorrect. > > Let us first reach a common ground on this. > > The kernel TLS implementation can decrypt only after setting the keys on the > socket. > The TLS message 'finished' (which is encrypted) is received after receiving > 'CCS' > message. After the user space TLS library receives CCS message, it sets the > keys > on kernel TLS socket. Therefore, the next message in the socket receive queue > which is TLS finished gets decrypted in kernel only. > > Please refer to following Boris's patch on openssl. The commit log says: > " We choose to set this option at the earliest - just after CCS is complete". I agree that Boris' patch does what you say it does - it sets keys immediately after CCS instead of after FINISHED message. I disagree that the kernel tls implementation currently requires that specific ordering, nor do I think that it should require that ordering.
Re: [PATCH bpf] xdp: add NULL pointer check in __xdp_return()
On 08/01/2018 04:43 PM, Björn Töpel wrote: > Den ons 1 aug. 2018 kl 16:14 skrev Jesper Dangaard Brouer : >> On Mon, 23 Jul 2018 11:41:02 +0200 >> Björn Töpel wrote: >> >> diff --git a/net/core/xdp.c b/net/core/xdp.c >> index 9d1f220..1c12bc7 100644 >> --- a/net/core/xdp.c >> +++ b/net/core/xdp.c >> @@ -345,7 +345,8 @@ static void __xdp_return(void *data, struct >> xdp_mem_info *mem, bool napi_direct, >> rcu_read_lock(); >> /* mem->id is valid, checked in >> xdp_rxq_info_reg_mem_model() */ >> xa = rhashtable_lookup(mem_id_ht, >id, >> mem_id_rht_params); >> - xa->zc_alloc->free(xa->zc_alloc, handle); >> + if (xa) >> + xa->zc_alloc->free(xa->zc_alloc, handle); > hmm...It is not clear to me the "!xa" case don't have to be handled? Thank you for reviewing! Returning NULL pointer is bug case such as calling after use xdp_rxq_info_unreg(). so that, I think it can't handle at that moment. we can make __xdp_return to add WARN_ON_ONCE() or add return error code to driver. But I'm not sure if these is useful information. I might have misunderstood scenario of MEM_TYPE_ZERO_COPY because there is no use case of MEM_TYPE_ZERO_COPY yet. >>> >>> Taehee, again, sorry for the slow response and thanks for patch! >>> >>> If xa is NULL, the driver has a buggy/broken implementation. What >>> would be a proper way of dealing with this? BUG? >> >> Hmm... I don't like these kind of changes to the hot-path code! >> >> You might not realize this, but adding BUG() and WARN_ON() to the code >> affect performance in ways you might not realize! These macros gets >> compiled and uses an asm instruction called "ud2". Seeing the "ud2" >> instruction causes the CPUs instruction cache prefetcher to stop. >> Thus, if some code ends up below this instruction, this will cause more >> i-cache-misses. >> >> I don't know if xa==NULL is even possible, but if it is, then I think >> this is a result of a driver mem_reg API usage bug. And the mem-reg >> API is full of WARN's and error messages, exactly to push these kind of >> checks out of the fast-path. There is no need for a BUG() call, as >> deref a NULL pointer will case an OOPS, that is easy to read and >> understand. > > Jesper, thanks for having a look! So, you're right that if xa==NULL > the driver is "broken/buggy" (as stated earlier!). I agree that > OOPSing on a NULL pointer is as good as a BUG! > > The applied patch adds a WARN_ON_ONCE, and I thought best practice was > that a buggy driver shouldn't crash the kernel... What is considered > best practices in these scenarios? *I'd* prefer an OOPS instead of > WARN_ON_ONCE, to catch that buggy driver. Again, that's me. I thought > that most people prefer not crashing, hence the patch. :-) In that case, lets send a revert for the patch with a proper analysis of why it is safe to omit the NULL check which should be placed as a comment right near the rhashtable_lookup(). Thanks, Daniel
[PATCH net-next v1] net: don't declare IPv6 non-local bind helper if CONFIG_IPV6 undefined
Fixes: 83ba4645152d ("net: add helpers checking if socket can be bound to nonlocal address") Signed-off-by: Vincent Bernat --- include/net/ipv6.h | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/include/net/ipv6.h b/include/net/ipv6.h index 82deb684ba73..ff33f498c137 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -766,13 +766,6 @@ static inline int ip6_sk_dst_hoplimit(struct ipv6_pinfo *np, struct flowi6 *fl6, return hlimit; } -static inline bool ipv6_can_nonlocal_bind(struct net *net, - struct inet_sock *inet) -{ - return net->ipv6.sysctl.ip_nonlocal_bind || - inet->freebind || inet->transparent; -} - /* copy IPv6 saddr & daddr to flow_keys, possibly using 64bit load/store * Equivalent to : flow->v6addrs.src = iph->saddr; * flow->v6addrs.dst = iph->daddr; @@ -789,6 +782,13 @@ static inline void iph_to_flow_copy_v6addrs(struct flow_keys *flow, #if IS_ENABLED(CONFIG_IPV6) +static inline bool ipv6_can_nonlocal_bind(struct net *net, + struct inet_sock *inet) +{ + return net->ipv6.sysctl.ip_nonlocal_bind || + inet->freebind || inet->transparent; +} + /* Sysctl settings for net ipv6.auto_flowlabels */ #define IP6_AUTO_FLOW_LABEL_OFF0 #define IP6_AUTO_FLOW_LABEL_OPTOUT 1 -- 2.18.0
[net-next:master 415/424] include/net/ipv6.h:772:14: error: 'struct net' has no member named 'ipv6'; did you mean 'ipv4'?
tree: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master head: 90d4c5bb98bf6665266917edf0e16ccd35f9 commit: 83ba4645152d1177c161750e1064e3a8e7cee19b [415/424] net: add helpers checking if socket can be bound to nonlocal address config: i386-tinyconfig (attached as .config) compiler: gcc-7 (Debian 7.3.0-16) 7.3.0 reproduce: git checkout 83ba4645152d1177c161750e1064e3a8e7cee19b # save the attached .config to linux build tree make ARCH=i386 All errors (new ones prefixed by >>): In file included from include/net/addrconf.h:54:0, from lib/vsprintf.c:35: include/net/ipv6.h: In function 'ipv6_can_nonlocal_bind': >> include/net/ipv6.h:772:14: error: 'struct net' has no member named 'ipv6'; >> did you mean 'ipv4'? return net->ipv6.sysctl.ip_nonlocal_bind || ^~~~ ipv4 vim +772 include/net/ipv6.h 768 769 static inline bool ipv6_can_nonlocal_bind(struct net *net, 770struct inet_sock *inet) 771 { > 772 return net->ipv6.sysctl.ip_nonlocal_bind || 773 inet->freebind || inet->transparent; 774 } 775 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [net-next 10/16] net/mlx5: Support PCIe buffer congestion handling via Devlink
On Tue, Jul 31, 2018 at 2:06 PM, Bjorn Helgaas wrote: > On Mon, Jul 30, 2018 at 08:19:50PM -0700, Alexander Duyck wrote: > > On Mon, Jul 30, 2018 at 7:33 PM, Bjorn Helgaas > wrote: > > > On Mon, Jul 30, 2018 at 08:02:48AM -0700, Alexander Duyck wrote: > > >> On Mon, Jul 30, 2018 at 7:07 AM, Bjorn Helgaas > wrote: > > >> > On Sun, Jul 29, 2018 at 03:00:28PM -0700, Alexander Duyck wrote: > > >> >> On Sun, Jul 29, 2018 at 2:23 AM, Moshe Shemesh < > moshes20...@gmail.com> wrote: > > >> >> > On Sat, Jul 28, 2018 at 7:06 PM, Bjorn Helgaas < > helg...@kernel.org> wrote: > > >> >> >> On Thu, Jul 26, 2018 at 07:00:20AM -0700, Alexander Duyck wrote: > > >> >> >> > On Thu, Jul 26, 2018 at 12:14 AM, Jiri Pirko < > j...@resnulli.us> wrote: > > >> >> >> > > Thu, Jul 26, 2018 at 02:43:59AM CEST, > jakub.kicin...@netronome.com > > >> >> >> > > wrote: > > >> >> >> > >>On Wed, 25 Jul 2018 08:23:26 -0700, Alexander Duyck wrote: > > >> >> >> > >>> On Wed, Jul 25, 2018 at 5:31 AM, Eran Ben Elisha wrote: > > >> >> >> > >>> > On 7/24/2018 10:51 PM, Jakub Kicinski wrote: > > >> >> >> > >>> The devlink params haven't been upstream even for a > full cycle > > >> >> >> > >>> and > > >> >> >> > >>> already you guys are starting to use them to > configure standard > > >> >> >> > >>> features like queuing. > > >> >> >> > >>> >>> > > >> >> >> > >>> >>> We developed the devlink params in order to support > non-standard > > >> >> >> > >>> >>> configuration only. And for non-standard, there are > generic and > > >> >> >> > >>> >>> vendor > > >> >> >> > >>> >>> specific options. > > >> >> >> > >>> >> > > >> >> >> > >>> >> I thought it was developed for performing non-standard > and > > >> >> >> > >>> >> possibly > > >> >> >> > >>> >> vendor specific configuration. Look at > DEVLINK_PARAM_GENERIC_* > > >> >> >> > >>> >> for > > >> >> >> > >>> >> examples of well justified generic options for which > we have no > > >> >> >> > >>> >> other API. The vendor mlx4 options look fairly vendor > specific > > >> >> >> > >>> >> if you > > >> >> >> > >>> >> ask me, too. > > >> >> >> > >>> >> > > >> >> >> > >>> >> Configuring queuing has an API. The question is it > acceptable to > > >> >> >> > >>> >> enter > > >> >> >> > >>> >> into the risky territory of controlling offloads via > devlink > > >> >> >> > >>> >> parameters > > >> >> >> > >>> >> or would we rather make vendors take the time and > effort to model > > >> >> >> > >>> >> things to (a subset) of existing APIs. The HW never > fits the > > >> >> >> > >>> >> APIs > > >> >> >> > >>> >> perfectly. > > >> >> >> > >>> > > > >> >> >> > >>> > I understand what you meant here, I would like to > highlight that > > >> >> >> > >>> > this > > >> >> >> > >>> > mechanism was not meant to handle SRIOV, Representors, > etc. > > >> >> >> > >>> > The vendor specific configuration suggested here is to > handle a > > >> >> >> > >>> > congestion > > >> >> >> > >>> > state in Multi Host environment (which includes PF and > multiple > > >> >> >> > >>> > VFs per > > >> >> >> > >>> > host), where one host is not aware to the other hosts, > and each is > > >> >> >> > >>> > running > > >> >> >> > >>> > on its own pci/driver. It is a device working mode > configuration. > > >> >> >> > >>> > > > >> >> >> > >>> > This couldn't fit into any existing API, thus creating > this > > >> >> >> > >>> > vendor specific > > >> >> >> > >>> > unique API is needed. > > >> >> >> > >>> > > >> >> >> > >>> If we are just going to start creating devlink interfaces > in for > > >> >> >> > >>> every > > >> >> >> > >>> one-off option a device wants to add why did we even > bother with > > >> >> >> > >>> trying to prevent drivers from using sysfs? This just > feels like we > > >> >> >> > >>> are back to the same arguments we had back in the day > with it. > > >> >> >> > >>> > > >> >> >> > >>> I feel like the bigger question here is if devlink is how > we are > > >> >> >> > >>> going > > >> >> >> > >>> to deal with all PCIe related features going forward, or > should we > > >> >> >> > >>> start looking at creating a new interface/tool for > PCI/PCIe related > > >> >> >> > >>> features? My concern is that we have already had features > such as > > >> >> >> > >>> DMA > > >> >> >> > >>> Coalescing that didn't really fit into anything and now > we are > > >> >> >> > >>> starting to see other things related to DMA and PCIe bus > credits. > > >> >> >> > >>> I'm > > >> >> >> > >>> wondering if we shouldn't start looking at a > tool/interface to > > >> >> >> > >>> configure all the PCIe related features such as > interrupts, error > > >> >> >> > >>> reporting, DMA configuration, power management, etc. > Maybe we could > > >> >> >> > >>> even look at sharing it across subsystems and include > things like > > >> >> >> > >>> storage, graphics, and other subsystems in the > conversation. > > >> >> >> > >> > > >> >> >> > >>Agreed, for actual PCIe configuration (i.e. not ECN > marking) we do > > >> >> >> > >> need > > >> >> >> >
Re: [Patch net-next] net: hns3: fix return value error while hclge_cmd_csq_clean failed
From: Huazhong Tan Date: Wed, 1 Aug 2018 17:53:28 +0800 > From: fredalu > > While cleaning the command queue, the value of the HEAD register is not > in the range of next_to_clean and next_to_use, meaning that this value > is invalid. This also means that there is a hardware error and the > hardware will trigger a reset soon. At this time we should return an > error code instead of 0, and HCLGE_STATE_CMD_DISABLE needs to be set to > prevent sending command again. > > Fixes: 3ff504908f95 ("net: hns3: fix a dead loop in hclge_cmd_csq_clean") > Signed-off-by: Huazhong Tan wq ---> ^^^ I think I know what text editor you use... :-) Applied with 'wq' removed from your signoff... Thanks.
Re: [PATCH net-next] cxgb4: fix endian to test F_FW_PORT_CMD_DCBXDIS32
From: Ganesh Goudar Date: Wed, 1 Aug 2018 18:15:32 +0530 > For FW_PORT_ACTION_GET_PORT_INFO32 messages, the > u.info32.lstatus32_to_cbllen32 is 32-bit Big Endian. > We need to translate that to CPU Endian in order to > test F_FW_PORT_CMD_DCBXDIS32. > > Signed-off-by: Casey Leedom > Signed-off-by: Ganesh Goudar Applied, thank you.
Re: [PATCH net-next] strparser: remove redundant variable 'rd_desc'
From: YueHaibing Date: Wed, 1 Aug 2018 15:10:37 +0800 > Variable 'rd_desc' is being assigned but never used, > so can be removed. > > fix this clang warning: > net/strparser/strparser.c:411:20: warning: variable ‘rd_desc’ set but not > used [-Wunused-but-set-variable] > > Signed-off-by: YueHaibing Applied.
Re: [patch net-next v2 0/3] net: sched: couple of adjustments/fixes
From: Jiri Pirko Date: Wed, 1 Aug 2018 12:36:54 +0200 > From: Jiri Pirko > > Jiri Pirko (3): > net: sched: change name of zombie chain to "held_by_acts_only" > net: sched: fix notifications for action-held chains > net: sched: make tcf_chain_{get,put}() static Series applied, thanks Jiri.
Re: [PATCH net-next v1] net: add helpers checking if socket can be bound to nonlocal address
From: Vincent Bernat Date: Tue, 31 Jul 2018 21:18:11 +0200 > The construction "net->ipv4.sysctl_ip_nonlocal_bind || inet->freebind > || inet->transparent" is present three times and its IPv6 counterpart > is also present three times. We introduce two small helpers to > characterize these tests uniformly. > > Signed-off-by: Vincent Bernat Looks great, thanks for doing this. Applied.
Re: how PHY driver is notified that cable is unplugged? (possibly related to IFF_RUNNING flag)
On 08/01/2018 05:58 AM, rpj...@crashcourse.ca wrote: > > (warning that i have a few questions that are probably trivial until i > get up to speed > with networking code.) > > a colleague asked for advice about the following -- apparently a new > PHY driver works > properly when being brought up with "ifconfig up", part of that > process > apparently setting the IFF_RUNNING net_device flags bit. so far, so good. > > now, when the cable is physically unplugged, the claim is that there > is no obvious > status change for that port, accompanied by the suggestion that it is > that IFF_RUNNING > flag bit that is not being unset. > > asking a more general question, where can i read up on the proper > protocol for > a driver being notified of, and properly handling, physical > disconnection on that > port? and, of course, the cable being plugged back in. The basic mechanism used by the PHY library is to read the standard Basic Mode Status Register and check the Link status bit to determine what the state of the link is set. This event can be triggered either through polling, or the use of an interrupt that the PHY is generating. Once the link state is determined, because the PHY device is "connected" to a network device, the PHY library can call netif_carrier_{on,off} against the network device attached to the PHY and that propagates through the networking stack and sets the appropriate bits, including IFF_RUNNING. Hope this helps. -- Florian
Re: [PATCH v2 net-next 0/5] tcp: add 4 new stats
From: Wei Wang Date: Tue, 31 Jul 2018 17:46:19 -0700 > From: Wei Wang > > This patch series adds 3 RFC4898 stats: > 1. tcpEStatsPerfHCDataOctetsOut > 2. tcpEStatsPerfOctetsRetrans > 3. tcpEStatsStackDSACKDups > and an addtional stat to record the number of data packet reordering > events seen: > 4. tcp_reord_seen > > Together with the existing stats, application can use them to measure > the retransmission rate in bytes, exclude spurious retransmissions > reflected by DSACK, and keep track of the reordering events on live > connections. > In particular the networks with different MTUs make bytes-based loss stats > more useful. Google servers have been using these stats for many years to > instrument transport and network performance. > > Note: The first patch is a refactor to add a helper to calculate > opt_stats size in order to make later changes cleaner. Series applied, thank you.
Re: [PATCH net-next] tcp: remove set but not used variable 'skb_size'
From: Wei Yongjun Date: Wed, 1 Aug 2018 01:59:56 + > Fixes gcc '-Wunused-but-set-variable' warning: > > net/ipv4/tcp_output.c: In function 'tcp_collapse_retrans': > net/ipv4/tcp_output.c:2700:6: warning: > variable 'skb_size' set but not used [-Wunused-but-set-variable] > int skb_size, next_skb_size; > ^ > > Signed-off-by: Wei Yongjun Applied, thanks.
Re: [PATCH V2 net-next 0/2] be2net: patch-set
From: Suresh Reddy Date: Tue, 31 Jul 2018 11:39:41 -0400 > v1->v2 : Modified the subject line and commit log. > > Please consider applying these two patches to net-next. Series applied, thanks.
Re: [PATCH net-next 1/2] rds: rds_ib_recv_alloc_cache() should call alloc_percpu_gfp() instead
From: Ka-Cheong Poon Date: Mon, 30 Jul 2018 22:48:41 -0700 > Currently, rds_ib_conn_alloc() calls rds_ib_recv_alloc_caches() > without passing along the gfp_t flag. But rds_ib_recv_alloc_caches() > and rds_ib_recv_alloc_cache() should take a gfp_t parameter so that > rds_ib_recv_alloc_cache() can call alloc_percpu_gfp() using the > correct flag instead of calling alloc_percpu(). > > Signed-off-by: Ka-Cheong Poon Applied.
Re: [PATCH net] net: stmmac: Fix WoL for PCI-based setups
From: Jose Abreu Date: Tue, 31 Jul 2018 15:08:20 +0100 > WoL won't work in PCI-based setups because we are not saving the PCI EP > state before entering suspend state and not allowing D3 wake. > > Fix this by using a wrapper around stmmac_{suspend/resume} which > correctly sets the PCI EP state. > > Signed-off-by: Jose Abreu Applied and queued up for -stable, thanks.
Re: [PATCH v2 net] bonding: avoid lockdep confusion in bond_get_stats()
From: Eric Dumazet Date: Tue, 31 Jul 2018 06:30:54 -0700 > syzbot found that the following sequence produces a LOCKDEP splat [1] ... > To fix this, we can use the already provided nest_level. > > This patch also provides correct nesting for dev->addr_list_lock ... > Signed-off-by: Eric Dumazet Applied, thanks Eric.
Re: [PATCH ethtool] ethtool: Add support for WAKE_FILTER
From: Florian Fainelli Date: Mon, 30 Jul 2018 15:26:24 -0700 > On 07/17/2018 08:36 AM, Florian Fainelli wrote: >> Allow re-purposing the wol->sopass storage area to specify a bitmask of >> filters >> (programmed previously via ethtool::rxnfc) to be used as wake-up patterns. > > John, David, can you provide some feedback if the approach is > acceptable? I will address Andrew's comment about the user friendliness > and allow providing a comma separate list of filter identifiers. > > One usability issue with this approach is that one cannot specify > wake-on-LAN using WAKE_MAGICSECURE *and* WAKE_FILTER at the same time, > since it uses the same location in the ioctl() structure that is being > passed. Do you see this as a problem? Once again we are stuck in this weird situation, a sort of limbo. On the one hand, I don't want to block your work on the ethtool netlink stuff being done. However it is clear that by using netlink attributes, it would be so much cleaner. I honestly don't know what to say at this time. I wish I had a clear piece of advice and a way for everyone to move forward, and usually I do, but this time I really don't :-/
Re: [PATCH net-next 2/2] rds: Remove IPv6 dependency
From: Ka-Cheong Poon Date: Mon, 30 Jul 2018 22:48:42 -0700 > This patch removes the IPv6 dependency from RDS. > > Signed-off-by: Ka-Cheong Poon Applied.
Re: [PATCH net-next] virtio_net: force_napi_tx module param.
> > > > Just distribute across the available cpus evenly, and be done with it. > > > > > > Sounds good to me. > > > > So e.g. we could set an affinity hint to a group of CPUs that > > might transmit to this queue. > > We also want to set the xps mask for all cpus in the group to this queue. > > Is there a benefit over explicitly choosing one cpu from the set, btw? > I assumed striping. Something along the lines of > > int stripe = max_t(int, num_online_cpus() / vi->curr_queue_pairs, 1); > int vq = 0; > > cpumask_clear(xps_mask); > > for_each_online_cpu(cpu) { > cpumask_set_cpu(cpu, xps_mask); > > if ((i + 1) % stripe == 0) { > virtqueue_set_affinity(vi->rq[vq].vq, cpu); > virtqueue_set_affinity(vi->sq[vq].vq, cpu); > netif_set_xps_queue(vi->dev, xps_mask, vq); > cpumask_clear(xps_mask); > vq++; > } > i++; > } .. but handling edge cases correctly, such as #cpu not being a perfect multiple of #vq.
Re: [PATCH net-next] virtio_net: force_napi_tx module param.
On Tue, Jul 31, 2018 at 8:34 AM Michael S. Tsirkin wrote: > > On Sun, Jul 29, 2018 at 05:32:56PM -0400, Willem de Bruijn wrote: > > On Sun, Jul 29, 2018 at 12:01 PM David Miller wrote: > > > > > > From: Caleb Raitto > > > Date: Mon, 23 Jul 2018 16:11:19 -0700 > > > > > > > From: Caleb Raitto > > > > > > > > The driver disables tx napi if it's not certain that completions will > > > > be processed affine with tx service. > > > > > > > > Its heuristic doesn't account for some scenarios where it is, such as > > > > when the queue pair count matches the core but not hyperthread count. > > > > > > > > Allow userspace to override the heuristic. This is an alternative > > > > solution to that in the linked patch. That added more logic in the > > > > kernel for these cases, but the agreement was that this was better left > > > > to user control. > > > > > > > > Do not expand the existing napi_tx variable to a ternary value, > > > > because doing so can break user applications that expect > > > > boolean ('Y'/'N') instead of integer output. Add a new param instead. > > > > > > > > Link: https://patchwork.ozlabs.org/patch/725249/ > > > > Acked-by: Willem de Bruijn > > > > Acked-by: Jon Olson > > > > Signed-off-by: Caleb Raitto > > > > > > So I looked into the history surrounding these issues. > > > > > > First of all, it's always ends up turning out crummy when drivers start > > > to set affinities themselves. The worst possible case is to do it > > > _conditionally_, and that is exactly what virtio_net is doing. > > > > > > From the user's perspective, this provides a really bad experience. > > > > > > So if I have a 32-queue device and there are 32 cpus, you'll do all > > > the affinity settings, stopping Irqbalanced from doing anything > > > right? > > > > > > So if I add one more cpu, you'll say "oops, no idea what to do in > > > this situation" and not touch the affinities at all? > > > > > > That makes no sense at all. > > > > > > If the driver is going to set affinities at all, OWN that decision > > > and set it all the time to something reasonable. > > > > > > Or accept that you shouldn't be touching this stuff in the first place > > > and leave the affinities alone. > > > > > > Right now we're kinda in a situation where the driver has been setting > > > affinities in the ncpus==nqueues cases for some time, so we can't stop > > > doing it. > > > > > > Which means we have to set them in all cases to make the user > > > experience sane again. > > > > > > I looked at the linked to patch again: > > > > > > https://patchwork.ozlabs.org/patch/725249/ > > > > > > And I think the strategy should be made more generic, to get rid of > > > the hyperthreading assumptions. I also agree that the "assign > > > to first N cpus" logic doesn't make much sense either. > > > > > > Just distribute across the available cpus evenly, and be done with it. > > > > Sounds good to me. > > So e.g. we could set an affinity hint to a group of CPUs that > might transmit to this queue. We also want to set the xps mask for all cpus in the group to this queue. Is there a benefit over explicitly choosing one cpu from the set, btw? I assumed striping. Something along the lines of int stripe = max_t(int, num_online_cpus() / vi->curr_queue_pairs, 1); int vq = 0; cpumask_clear(xps_mask); for_each_online_cpu(cpu) { cpumask_set_cpu(cpu, xps_mask); if ((i + 1) % stripe == 0) { virtqueue_set_affinity(vi->rq[vq].vq, cpu); virtqueue_set_affinity(vi->sq[vq].vq, cpu); netif_set_xps_queue(vi->dev, xps_mask, vq); cpumask_clear(xps_mask); vq++; } i++; }
Re: [PATCH net-next 7/9] net: stmmac: Integrate XGMAC into main driver flow
> @@ -842,6 +863,12 @@ static void stmmac_adjust_link(struct net_device *dev) > new_state = true; > ctrl &= ~priv->hw->link.speed_mask; > switch (phydev->speed) { > + case SPEED_1: > + ctrl |= priv->hw->link.speed1; > + break; > + case SPEED_2500: > + ctrl |= priv->hw->link.speed2500; > + break; > case SPEED_1000: > ctrl |= priv->hw->link.speed1000; > break; Hi Jose What PHY did you test this with? 10G phys change the interface mode when the speed change. In general, 10/100/1000G copper uses SGMII. A 1G SFP optical module generally wants 1000Base-X. 2.5G wants 2500Base-X, 10G copper wants 10GKR, etc. So your adjust link callback needs to look at phydev->interface and reconfigure the MAC as requested. You might also want to consider moving from phylib to phylink. It has a better interface for things like this, and makes support for SFP interfaces much easier. A MAC which supports 10G is likely to be used with SFPs... Andrew
Re: [PATCH bpf] selftests/bpf: update test_lwt_seg6local.sh according to iproute2
On Wed, Aug 1, 2018 at 8:34 AM, Mathieu Xhonneux wrote: > The shell file for test_lwt_seg6local contains an early iproute2 syntax > for installing a seg6local End.BPF route. iproute2 support for this > feature has recently been upstreamed, but with an additional keyword > required. This patch updates test_lwt_seg6local.sh to the definitive > iproute2 syntax > > Signed-off-by: Mathieu Xhonneux Acked-by: Yonghong Song
Re: [PATCH v6 bpf-next 4/9] veth: Handle xdp_frames in xdp napi ring
On Wed, 1 Aug 2018 14:41:08 +0900 Toshiaki Makita wrote: > On 2018/07/31 21:46, Jesper Dangaard Brouer wrote: > > On Tue, 31 Jul 2018 19:40:08 +0900 > > Toshiaki Makita wrote: > > > >> On 2018/07/31 19:26, Jesper Dangaard Brouer wrote: > >>> > >>> Context needed from: [PATCH v6 bpf-next 2/9] veth: Add driver XDP > >>> > >>> On Mon, 30 Jul 2018 19:43:44 +0900 > >>> Toshiaki Makita wrote: > >>> [...] > >>> > >>> Here you are adding an assumption that struct xdp_frame is always > >>> located in-the-top of the packet-data area. I tried hard not to add > >>> such a dependency! You can calculate the beginning of the frame from > >>> the xdp_frame->data pointer. > >>> > >>> Why not add such a dependency? Because for AF_XDP zero-copy, we cannot > >>> make such an assumption. > >>> > >>> Currently, when an RX-queue is in AF-XDP-ZC mode (MEM_TYPE_ZERO_COPY) > >>> the packet will get dropped when calling convert_to_xdp_frame(), but as > >>> the TODO comment indicated in convert_to_xdp_frame() this is not the > >>> end-goal. > >>> > >>> The comment in convert_to_xdp_frame(), indicate we need a full > >>> alloc+copy, but that is actually not necessary, if we can just use > >>> another memory area for struct xdp_frame, and a pointer to data. Thus, > >>> allowing devmap-redir to work-ZC and allow cpumap-redir to do the copy > >>> on the remote CPU. > >> > >> Thanks for pointing this out. > >> Seems you are saying xdp_frame area is not reusable. That means we > >> reduce usable headroom on every REDIRECT. I wanted to avoid this but > >> actually it is impossible, right? > > > > I'm not sure I understand fully... has this something to do, with the > > below memset? > > Sorry for not being so clear... > It has something to do with the memset as well but mainly I was talking > about XDP_TX and REDIRECT introduced in patch 8. On REDIRECT, > dev_map_enqueue() calls convert_to_xdp_frame() so we use the headroom > for struct xdp_frame on REDIRECT. If we don't reuse xdp_frame region of > the original xdp packet, we reduce the headroom size each time on > REDIRECT. When ZC is used, in the future xdp_frame can be non-contiguous > to the buffer, so we cannot reuse the xdp_frame region in > convert_to_xdp_frame()? But current convert_to_xdp_frame() > implementation requires xdp_frame region in headroom so I think I cannot > avoid this dependency now. > > SKB has a similar problem if we cannot reuse it. It can be passed to a > bridge and redirected to another veth which has driver XDP. In that case > we need to reallocate the page if we have reduced the headroom because > sufficient headroom is required for XDP processing for now (can we > remove this requirement actually?). Okay, now I understand. Your changes allow multiple levels of XDP_REDIRECT between/into other veth net_devices. This is very interesting and exciting stuff, but also a bit scary, when thinking about if we got he life-time correct for the different memory objects. You have convinced me. We should not sacrifice/reduce the headroom this way. I'll also fix up cpumap. To avoid the performance penalty of the memset, I propose that we just clear the xdp_frame->data pointer. But lets implement it via a common sanitize/scrub function. > > When cpumap generate an SKB for the netstack, then we sacrifice/reduce > > the SKB headroom available, by in convert_to_xdp_frame() reducing the > > headroom by xdp_frame size. > > > > xdp_frame->headroom = headroom - sizeof(*xdp_frame) > > > > In-order to avoid doing such memset of this area. We are actually only > > worried about exposing the 'data' pointer, thus we could just clear > > that. (See commit 6dfb970d3dbd, this is because Alexei is planing to > > move from CAP_SYS_ADMIN to lesser privileged mode CAP_NET_ADMIN) > > > > See commits: > > 97e19cce05e5 ("bpf: reserve xdp_frame size in xdp headroom") > > 6dfb970d3dbd ("xdp: avoid leaking info stored in frame data on page > > reuse") > > We have talked about that... > https://patchwork.ozlabs.org/patch/903536/ > > The memset is introduced as per your feedback, but I'm still not sure if > we need this. In general the headroom is not cleared after allocation in > drivers, so anyway unprivileged users should not see it no matter if it > contains xdp_frame or not... I actually got this request from Alexei. That is why I implemented it. Personally I don't think this clearing is really needed, until someone actually makes the TC/cls_act BPF hook CAP_NET_ADMIN. -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH net-next 5/9] net: stmmac: Add MDIO related functions for XGMAC2
Hi Jose > +static int stmmac_xgmac2_mdio_read(struct stmmac_priv *priv, int phyaddr, > +int phyreg) > +{ > + unsigned int mii_address = priv->hw->mii.addr; > + unsigned int mii_data = priv->hw->mii.data; > + u32 tmp, addr, value = MII_XGMAC_BUSY; > + int data; > + > + if (phyreg & MII_ADDR_C45) { > + addr = ((phyreg >> 16) & 0x1f) << 21; > + addr |= (phyaddr << 16) | (phyreg & 0x); Do you need to tell the hardware this is a C45 transfer? Normally an extra bit needs setting somewhere. > + } else { > + if (phyaddr >= 4) > + return -ENODEV; Can the MDIO bus be external? If so, is there a reason why there cannot be a PHY at addresses > 4. So maybe there is an Ethernet switch, which needs lots of addresses? And C45 can have devices > 4 but C22 cannot? > + writel(~0x0, priv->ioaddr + 0x220); > + addr = (phyaddr << 16) | (phyreg & 0x1f); > + } > + > + value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift) > + & priv->hw->mii.clk_csr_mask; > + value |= BIT(18); Please add a #define for this bit. > + value |= MII_XGMAC_READ; > + > + if (readl_poll_timeout(priv->ioaddr + mii_data, tmp, > +!(tmp & MII_XGMAC_BUSY), 100, 1)) > + return -EBUSY; > + > + writel(addr, priv->ioaddr + mii_address); > + writel(value, priv->ioaddr + mii_data); > + > + if (readl_poll_timeout(priv->ioaddr + mii_data, tmp, > +!(tmp & MII_XGMAC_BUSY), 100, 1)) > + return -EBUSY; > + > + /* Read the data from the MII data register */ > + data = (int)readl(priv->ioaddr + mii_data) & GENMASK(15, 0); Is the cast needed? And why use GENMASK here, but not in all the other places you have masks in this code? > /** > * stmmac_mdio_read > * @bus: points to the mii_bus structure > @@ -59,6 +141,9 @@ static int stmmac_mdio_read(struct mii_bus *bus, int > phyaddr, int phyreg) > int data; > u32 value = MII_BUSY; > > + if (priv->plat->has_xgmac) > + return stmmac_xgmac2_mdio_read(priv, phyaddr, phyreg); It would be cleaner to instead do this in stmmac_mdio_register() when setting new_bus->read. Andrew
Re: [PATCH net-next 9/9] bindings: net: stmmac: Add the bindings documentation for XGMAC2.
Hello! On 08/01/2018 03:10 PM, Jose Abreu wrote: > Adds the documentation for XGMAC2 DT bindings. > > Signed-off-by: Jose Abreu > Cc: David S. Miller > Cc: Joao Pinto > Cc: Giuseppe Cavallaro > Cc: Alexandre Torgue > --- > Documentation/devicetree/bindings/net/stmmac.txt | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/Documentation/devicetree/bindings/net/stmmac.txt > b/Documentation/devicetree/bindings/net/stmmac.txt > index 3a28a5d8857d..525425beb6e7 100644 > --- a/Documentation/devicetree/bindings/net/stmmac.txt > +++ b/Documentation/devicetree/bindings/net/stmmac.txt > @@ -1,7 +1,8 @@ > * STMicroelectronics 10/100/1000 Ethernet driver (GMAC) > > Required properties: > -- compatible: Should be "snps,dwmac-", "snps,dwmac" > +- compatible: Should be "snps,dwmac-", "snps,dwmac" or > + "snps,dwxgmac- [...] MBR, Sergei
Re: [PATCH bpf] xdp: add NULL pointer check in __xdp_return()
Den ons 1 aug. 2018 kl 16:14 skrev Jesper Dangaard Brouer : > > On Mon, 23 Jul 2018 11:41:02 +0200 > Björn Töpel wrote: > > > > >> diff --git a/net/core/xdp.c b/net/core/xdp.c > > > >> index 9d1f220..1c12bc7 100644 > > > >> --- a/net/core/xdp.c > > > >> +++ b/net/core/xdp.c > > > >> @@ -345,7 +345,8 @@ static void __xdp_return(void *data, struct > > > >> xdp_mem_info *mem, bool napi_direct, > > > >> rcu_read_lock(); > > > >> /* mem->id is valid, checked in > > > >> xdp_rxq_info_reg_mem_model() */ > > > >> xa = rhashtable_lookup(mem_id_ht, >id, > > > >> mem_id_rht_params); > > > >> - xa->zc_alloc->free(xa->zc_alloc, handle); > > > >> + if (xa) > > > >> + xa->zc_alloc->free(xa->zc_alloc, handle); > > > > hmm...It is not clear to me the "!xa" case don't have to be handled? > > > > > > Thank you for reviewing! > > > > > > Returning NULL pointer is bug case such as calling after use > > > xdp_rxq_info_unreg(). > > > so that, I think it can't handle at that moment. > > > we can make __xdp_return to add WARN_ON_ONCE() or > > > add return error code to driver. > > > But I'm not sure if these is useful information. > > > > > > I might have misunderstood scenario of MEM_TYPE_ZERO_COPY > > > because there is no use case of MEM_TYPE_ZERO_COPY yet. > > > > > > > Taehee, again, sorry for the slow response and thanks for patch! > > > > If xa is NULL, the driver has a buggy/broken implementation. What > > would be a proper way of dealing with this? BUG? > > Hmm... I don't like these kind of changes to the hot-path code! > > You might not realize this, but adding BUG() and WARN_ON() to the code > affect performance in ways you might not realize! These macros gets > compiled and uses an asm instruction called "ud2". Seeing the "ud2" > instruction causes the CPUs instruction cache prefetcher to stop. > Thus, if some code ends up below this instruction, this will cause more > i-cache-misses. > > I don't know if xa==NULL is even possible, but if it is, then I think > this is a result of a driver mem_reg API usage bug. And the mem-reg > API is full of WARN's and error messages, exactly to push these kind of > checks out of the fast-path. There is no need for a BUG() call, as > deref a NULL pointer will case an OOPS, that is easy to read and > understand. > Jesper, thanks for having a look! So, you're right that if xa==NULL the driver is "broken/buggy" (as stated earlier!). I agree that OOPSing on a NULL pointer is as good as a BUG! The applied patch adds a WARN_ON_ONCE, and I thought best practice was that a buggy driver shouldn't crash the kernel... What is considered best practices in these scenarios? *I'd* prefer an OOPS instead of WARN_ON_ONCE, to catch that buggy driver. Again, that's me. I thought that most people prefer not crashing, hence the patch. :-) Björn > -- > Best regards, > Jesper Dangaard Brouer > MSc.CS, Principal Kernel Engineer at Red Hat > LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH net-next v5 1/4] net/sched: user-space can't set unknown tcfa_action values
On 31/07/18 10:40 AM, Paolo Abeni wrote: If we choose to reject unknown opcodes, such user-space configuration will fail. I think that is a good thing. The kernel should not be accepting things it doesnt understand. This is a good opportunity to enforce that. What would happen before this patch is that configurations using such TC_ACT_ value would be successful. This is why I proposed to keep the fixup. Note: Such behavior can only occur if tc(user space) allows you to pass illegitimate values which today can only happen when you have a new user space but older kernel (with "old" starting with your current changes). iow, fixing a policy in a kernel which has no support for TC_ACT_ to translate intent to be TC_ACT_OK/PIPE is problematic (as i was showing earlier). I initially thought the kernel behavior in the above scenario would match exactly TC_ACT_UNSPEC processing, but as you noted with the example in your previous email, TC_ACT_UNSPEC processing is actually a bit different. I worry: I dont think we can get a good default for most use cases. No point in maintaining faulty expectations (because IMO: the user will - eventually - fix their scripts if they dont see expected behavior). cheers, jamal
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask
On 8/1/2018 8:12 AM, Sagi Grimberg wrote: Hi Max, Hi, Yes, since nvmf is the only user of this function. Still waiting for comments on the suggested patch :) Sorry for the late response (but I'm on vacation so I have an excuse ;)) NP :) currently the code works.. I'm thinking that we should avoid trying to find an assignment when stuff like irqbalance daemon is running and changing the affinitization. but this is exactly what Steve complained and Leon try to fix (and break the connection establishment). If this is the case and we all agree then we're good without Leon's patch and without our suggestions. This extension was made to apply optimal affinity assignment when the device irq affinity is lined up in a vector per core. I'm thinking that when we identify this is not the case, we immediately fallback to the default mapping. 1. when we get a mask, if its weight != 1, we fallback. 2. if a queue was left unmapped, we fallback. Maybe something like the following: did you test it ? I think it will not work since you need to map all the queues and all the CPUs. -- diff --git a/block/blk-mq-rdma.c b/block/blk-mq-rdma.c index 996167f1de18..1ada6211c55e 100644 --- a/block/blk-mq-rdma.c +++ b/block/blk-mq-rdma.c @@ -35,17 +35,26 @@ int blk_mq_rdma_map_queues(struct blk_mq_tag_set *set, const struct cpumask *mask; unsigned int queue, cpu; + /* reset all CPUs mapping */ + for_each_possible_cpu(cpu) + set->mq_map[cpu] = UINT_MAX; + for (queue = 0; queue < set->nr_hw_queues; queue++) { mask = ib_get_vector_affinity(dev, first_vec + queue); if (!mask) goto fallback; - for_each_cpu(cpu, mask) - set->mq_map[cpu] = queue; + if (cpumask_weight(mask) != 1) + goto fallback; + + cpu = cpumask_first(mask); + if (set->mq_map[cpu] != UINT_MAX) + goto fallback; + + set->mq_map[cpu] = queue; } return 0; - fallback: return blk_mq_map_queues(set); } -- see attached another algorithem that can improve the mapping (although it's not a short one)... it will try to map according to affinity mask, and also in case the mask weight > 1 it will try to be better than the naive mapping I suggest in the previous email. From 007d773af7b65a1f1ca543f031ca58b3afa5b7d9 Mon Sep 17 00:00:00 2001 From: Max Gurtovoy Date: Thu, 19 Jul 2018 12:42:00 + Subject: [PATCH 1/1] blk-mq: fix RDMA queue/cpu mappings assignments for mq Signed-off-by: Max Gurtovoy Signed-off-by: Israel Rukshin --- block/blk-mq-cpumap.c | 41 ++-- block/blk-mq-rdma.c| 84 ++ include/linux/blk-mq.h | 1 + 3 files changed, 103 insertions(+), 23 deletions(-) diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c index 3eb169f..02b888f 100644 --- a/block/blk-mq-cpumap.c +++ b/block/blk-mq-cpumap.c @@ -30,29 +30,36 @@ static int get_first_sibling(unsigned int cpu) return cpu; } -int blk_mq_map_queues(struct blk_mq_tag_set *set) +void blk_mq_map_queue_to_cpu(struct blk_mq_tag_set *set, unsigned int cpu) { unsigned int *map = set->mq_map; unsigned int nr_queues = set->nr_hw_queues; - unsigned int cpu, first_sibling; + unsigned int first_sibling; - for_each_possible_cpu(cpu) { - /* -* First do sequential mapping between CPUs and queues. -* In case we still have CPUs to map, and we have some number of -* threads per cores then map sibling threads to the same queue for -* performace optimizations. -*/ - if (cpu < nr_queues) { + /* +* First do sequential mapping between CPUs and queues. +* In case we still have CPUs to map, and we have some number of +* threads per cores then map sibling threads to the same queue for +* performace optimizations. +*/ + if (cpu < nr_queues) { + map[cpu] = cpu_to_queue_index(nr_queues, cpu); + } else { + first_sibling = get_first_sibling(cpu); + if (first_sibling == cpu) map[cpu] = cpu_to_queue_index(nr_queues, cpu); - } else { - first_sibling = get_first_sibling(cpu); - if (first_sibling == cpu) - map[cpu] = cpu_to_queue_index(nr_queues, cpu); - else - map[cpu] = map[first_sibling]; - } + else + map[cpu] = map[first_sibling]; } +} +EXPORT_SYMBOL_GPL(blk_mq_map_queue_to_cpu); + +int blk_mq_map_queues(struct blk_mq_tag_set *set) +{ + unsigned int cpu; + +
Re: [PATCH bpf] xdp: add NULL pointer check in __xdp_return()
On Mon, 23 Jul 2018 11:41:02 +0200 Björn Töpel wrote: > > >> diff --git a/net/core/xdp.c b/net/core/xdp.c > > >> index 9d1f220..1c12bc7 100644 > > >> --- a/net/core/xdp.c > > >> +++ b/net/core/xdp.c > > >> @@ -345,7 +345,8 @@ static void __xdp_return(void *data, struct > > >> xdp_mem_info *mem, bool napi_direct, > > >> rcu_read_lock(); > > >> /* mem->id is valid, checked in > > >> xdp_rxq_info_reg_mem_model() */ > > >> xa = rhashtable_lookup(mem_id_ht, >id, > > >> mem_id_rht_params); > > >> - xa->zc_alloc->free(xa->zc_alloc, handle); > > >> + if (xa) > > >> + xa->zc_alloc->free(xa->zc_alloc, handle); > > > hmm...It is not clear to me the "!xa" case don't have to be handled? > > > > Thank you for reviewing! > > > > Returning NULL pointer is bug case such as calling after use > > xdp_rxq_info_unreg(). > > so that, I think it can't handle at that moment. > > we can make __xdp_return to add WARN_ON_ONCE() or > > add return error code to driver. > > But I'm not sure if these is useful information. > > > > I might have misunderstood scenario of MEM_TYPE_ZERO_COPY > > because there is no use case of MEM_TYPE_ZERO_COPY yet. > > > > Taehee, again, sorry for the slow response and thanks for patch! > > If xa is NULL, the driver has a buggy/broken implementation. What > would be a proper way of dealing with this? BUG? Hmm... I don't like these kind of changes to the hot-path code! You might not realize this, but adding BUG() and WARN_ON() to the code affect performance in ways you might not realize! These macros gets compiled and uses an asm instruction called "ud2". Seeing the "ud2" instruction causes the CPUs instruction cache prefetcher to stop. Thus, if some code ends up below this instruction, this will cause more i-cache-misses. I don't know if xa==NULL is even possible, but if it is, then I think this is a result of a driver mem_reg API usage bug. And the mem-reg API is full of WARN's and error messages, exactly to push these kind of checks out of the fast-path. There is no need for a BUG() call, as deref a NULL pointer will case an OOPS, that is easy to read and understand. -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
RE: [net-next v5 3/3] net/tls: Remove redundant array allocation.
> -Original Message- > From: Dave Watson [mailto:davejwat...@fb.com] > Sent: Monday, July 23, 2018 10:05 PM > To: David Miller > Cc: Vakul Garg ; netdev@vger.kernel.org; > bor...@mellanox.com; avia...@mellanox.com; Doron Roberts-Kedes > > Subject: Re: [net-next v5 3/3] net/tls: Remove redundant array allocation. > > On 07/21/18 07:25 PM, David Miller wrote: > > From: Vakul Garg > > Date: Thu, 19 Jul 2018 21:56:13 +0530 > > > > > In function decrypt_skb(), array allocation in case when sgout is > > > NULL is unnecessary. Instead, local variable sgin_arr[] can be used. > > > > > > Signed-off-by: Vakul Garg > > > > Hmmm... > > > > Dave, can you take a look at this? Do you think there might have been > > a reason you felt that you needed to dynamically allocate the > > scatterlists when you COW and skb and do in-place decryption? > > > > I guess this change is ok, nsg can only get smaller when the SKB is > > COW'd. > > > > > > memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE); > > > if (!sgout) { > > > nsg = skb_cow_data(skb, 0, ) + 1; > > > - sgin = kmalloc_array(nsg, sizeof(*sgin), > > > sk->sk_allocation); > > > sgout = sgin; > > > } > > I don't think this patch is safe as-is. sgin_arr is a stack array of size > MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is that it > walks the whole chain of skbs from skb->next, and can return any number of > segments. Therefore we need to heap allocate. I think I copied the IPSEC > code here. > > For perf though, we could use the stack array if skb_cow_data returns <= > MAX_SKB_FRAGS. skb_cow_data() is being called only when caller passes sgout=NULL (i.e. non-zero copy case). In case of zero-copy, we do not call skb_cow_data() and just assume that MAX_SKB_FRAGS+2 sized scatterlist array sgin_arr[] is sufficient. This assumption could be wrong. So skb_cow_data() should be called unconditionally to determine number of scatterlist entries required for skb. > > This code is slightly confusing though, since we don't heap allocate in the > zerocopy case - what happens is that skb_to_sgvec returns -EMSGSIZE, and > we fall back to the non-zerocopy case, and return again to this function, > where we then hit the skb_cow_data path and heap allocate. If skb_to_sgvec return -EMSGSIZE, decrypt_skb() would return failure, resulting in abort of TLS session.
[PATCH bpf] selftests/bpf: update test_lwt_seg6local.sh according to iproute2
The shell file for test_lwt_seg6local contains an early iproute2 syntax for installing a seg6local End.BPF route. iproute2 support for this feature has recently been upstreamed, but with an additional keyword required. This patch updates test_lwt_seg6local.sh to the definitive iproute2 syntax Signed-off-by: Mathieu Xhonneux --- tools/testing/selftests/bpf/test_lwt_seg6local.sh | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/bpf/test_lwt_seg6local.sh b/tools/testing/selftests/bpf/test_lwt_seg6local.sh index 270fa8f49573..785eabf2a593 100755 --- a/tools/testing/selftests/bpf/test_lwt_seg6local.sh +++ b/tools/testing/selftests/bpf/test_lwt_seg6local.sh @@ -115,14 +115,14 @@ ip netns exec ns2 ip -6 route add fb00::6 encap bpf in obj test_lwt_seg6local.o ip netns exec ns2 ip -6 route add fd00::1 dev veth3 via fb00::43 scope link ip netns exec ns3 ip -6 route add fc42::1 dev veth5 via fb00::65 -ip netns exec ns3 ip -6 route add fd00::1 encap seg6local action End.BPF obj test_lwt_seg6local.o sec add_egr_x dev veth4 +ip netns exec ns3 ip -6 route add fd00::1 encap seg6local action End.BPF endpoint obj test_lwt_seg6local.o sec add_egr_x dev veth4 -ip netns exec ns4 ip -6 route add fd00::2 encap seg6local action End.BPF obj test_lwt_seg6local.o sec pop_egr dev veth6 +ip netns exec ns4 ip -6 route add fd00::2 encap seg6local action End.BPF endpoint obj test_lwt_seg6local.o sec pop_egr dev veth6 ip netns exec ns4 ip -6 addr add fc42::1 dev lo ip netns exec ns4 ip -6 route add fd00::3 dev veth7 via fb00::87 ip netns exec ns5 ip -6 route add fd00::4 table 117 dev veth9 via fb00::109 -ip netns exec ns5 ip -6 route add fd00::3 encap seg6local action End.BPF obj test_lwt_seg6local.o sec inspect_t dev veth8 +ip netns exec ns5 ip -6 route add fd00::3 encap seg6local action End.BPF endpoint obj test_lwt_seg6local.o sec inspect_t dev veth8 ip netns exec ns6 ip -6 addr add fb00::6/16 dev lo ip netns exec ns6 ip -6 addr add fd00::4/16 dev lo -- 2.16.1
how PHY driver is notified that cable is unplugged? (possibly related to IFF_RUNNING flag)
(warning that i have a few questions that are probably trivial until i get up to speed with networking code.) a colleague asked for advice about the following -- apparently a new PHY driver works properly when being brought up with "ifconfig up", part of that process apparently setting the IFF_RUNNING net_device flags bit. so far, so good. now, when the cable is physically unplugged, the claim is that there is no obvious status change for that port, accompanied by the suggestion that it is that IFF_RUNNING flag bit that is not being unset. asking a more general question, where can i read up on the proper protocol for a driver being notified of, and properly handling, physical disconnection on that port? and, of course, the cable being plugged back in. thanks in advance for any pointers. rday
Re: [RFC PATCH 0/2] net: macb: Disable TX checksum offloading on all Zynq
Hi Jennifer, On Tue, Jun 5, 2018 at 10:21 AM, Harini Katakam wrote: > Hi Jeniffer, > > On Mon, Jun 4, 2018 at 8:35 PM, Nicolas Ferre > wrote: >> Jennifer, >> >> On 25/05/2018 at 23:44, Jennifer Dahm wrote: >>> >>> During testing, I discovered that the Zynq GEM hardware overwrites all >>> outgoing UDP packet checksums, which is illegal in packet forwarding >>> cases. This happens both with and without the checksum-zeroing >>> behavior introduced in 007e4ba3ee137f4700f39aa6dbaf01a71047c5f6 >>> ("net: macb: initialize checksum when using checksum offloading"). The >>> only solution to both the small packet bug and the packet forwarding >>> bug that I can find is to disable TX checksum offloading entirely. >> >> > > Thanks for the extensive testing. > I'll try to reproduce and see if it is something to be fixed in the driver. > >> Are the bugs listed above present in all revisions of the GEM IP, only for >> some revisions? >> Is there an errata that describe this issue for the Zynq GEM? > > @Nicolas, AFAIK, there is no errata for this in either Cadence or > Zynq documentation. I was unable to reproduce this issue on Zynq. Although I do not have HW with two GEM ports, I tried by routing one GEM via PL and another via on board RGMII. Since there was no specific errata related to this, I also tried on subsequent ZynqMP versions with multiple GEM ports but dint find any checksum issues. I discussed the same with cadence and they tried the test with 2 bytes of UDP payload on the Zynq GEM IP version in their regressions and did not hit any issue either. I tried to reach out earlier to see if you can share your exact application. Could you please let me know if you have any further updates? Regards, Harini
[PATCH net-next] cxgb4: fix endian to test F_FW_PORT_CMD_DCBXDIS32
For FW_PORT_ACTION_GET_PORT_INFO32 messages, the u.info32.lstatus32_to_cbllen32 is 32-bit Big Endian. We need to translate that to CPU Endian in order to test F_FW_PORT_CMD_DCBXDIS32. Signed-off-by: Casey Leedom Signed-off-by: Ganesh Goudar --- drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c index 40cf8dc..674997d 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c @@ -554,10 +554,9 @@ static int fwevtq_handler(struct sge_rspq *q, const __be64 *rsp, dev = q->adap->port[q->adap->chan_map[port]]; dcbxdis = (action == FW_PORT_ACTION_GET_PORT_INFO - ? !!(pcmd->u.info.dcbxdis_pkd & - FW_PORT_CMD_DCBXDIS_F) - : !!(pcmd->u.info32.lstatus32_to_cbllen32 & - FW_PORT_CMD_DCBXDIS32_F)); + ? !!(pcmd->u.info.dcbxdis_pkd & FW_PORT_CMD_DCBXDIS_F) + : !!(be32_to_cpu(pcmd->u.info32.lstatus32_to_cbllen32) + & FW_PORT_CMD_DCBXDIS32_F)); state_input = (dcbxdis ? CXGB4_DCB_INPUT_FW_DISABLED : CXGB4_DCB_INPUT_FW_ENABLED); -- 2.1.0
[PATCH net-next 6/9] net: stmmac: Add PTP support for XGMAC2
XGMAC2 uses the same engine of timestamping as GMAC4. Let's use the same callbacks. Signed-off-by: Jose Abreu Cc: David S. Miller Cc: Joao Pinto Cc: Giuseppe Cavallaro Cc: Alexandre Torgue --- drivers/net/ethernet/stmicro/stmmac/hwif.c | 4 ++-- drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c | 6 -- drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h | 1 + 3 files changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.c b/drivers/net/ethernet/stmicro/stmmac/hwif.c index 4b4ba1c8bad5..357309a6d6a5 100644 --- a/drivers/net/ethernet/stmicro/stmmac/hwif.c +++ b/drivers/net/ethernet/stmicro/stmmac/hwif.c @@ -193,13 +193,13 @@ static const struct stmmac_hwif_entry { .xgmac = true, .min_id = DWXGMAC_CORE_2_10, .regs = { - .ptp_off = 0, + .ptp_off = PTP_XGMAC_OFFSET, .mmc_off = 0, }, .desc = _desc_ops, .dma = _dma_ops, .mac = _ops, - .hwtimestamp = NULL, + .hwtimestamp = _ptp, .mode = NULL, .tc = NULL, .setup = dwxgmac2_setup, diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c index 0cb0e39a2be9..2293e21f789f 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c @@ -71,6 +71,9 @@ static int stmmac_adjust_time(struct ptp_clock_info *ptp, s64 delta) u32 sec, nsec; u32 quotient, reminder; int neg_adj = 0; + bool xmac; + + xmac = priv->plat->has_gmac4 || priv->plat->has_xgmac; if (delta < 0) { neg_adj = 1; @@ -82,8 +85,7 @@ static int stmmac_adjust_time(struct ptp_clock_info *ptp, s64 delta) nsec = reminder; spin_lock_irqsave(>ptp_lock, flags); - stmmac_adjust_systime(priv, priv->ptpaddr, sec, nsec, neg_adj, - priv->plat->has_gmac4); + stmmac_adjust_systime(priv, priv->ptpaddr, sec, nsec, neg_adj, xmac); spin_unlock_irqrestore(>ptp_lock, flags); return 0; diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h index f4b31d69f60e..ecccf895fd7e 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h @@ -21,6 +21,7 @@ #ifndef__STMMAC_PTP_H__ #define__STMMAC_PTP_H__ +#define PTP_XGMAC_OFFSET 0xd00 #definePTP_GMAC4_OFFSET0xb00 #definePTP_GMAC3_X_OFFSET 0x700 -- 2.7.4
[PATCH net-next 7/9] net: stmmac: Integrate XGMAC into main driver flow
Now that we have all the XGMAC related callbacks, lets start integrating this IP block into main driver. Signed-off-by: Jose Abreu Cc: David S. Miller Cc: Joao Pinto Cc: Giuseppe Cavallaro Cc: Alexandre Torgue --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 55 ++- 1 file changed, 45 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index d9e60cfd8a85..2c7a571140bb 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -51,6 +51,7 @@ #include #include #include "dwmac1000.h" +#include "dwxgmac2.h" #include "hwif.h" #define STMMAC_ALIGN(x)L1_CACHE_ALIGN(x) @@ -262,6 +263,21 @@ static void stmmac_clk_csr_set(struct stmmac_priv *priv) else priv->clk_csr = 0; } + + if (priv->plat->has_xgmac) { + if (clk_rate > 4) + priv->clk_csr = 0x5; + else if (clk_rate > 35000) + priv->clk_csr = 0x4; + else if (clk_rate > 3) + priv->clk_csr = 0x3; + else if (clk_rate > 25000) + priv->clk_csr = 0x2; + else if (clk_rate > 15000) + priv->clk_csr = 0x1; + else + priv->clk_csr = 0x0; + } } static void print_pkt(unsigned char *buf, int len) @@ -498,7 +514,7 @@ static void stmmac_get_rx_hwtstamp(struct stmmac_priv *priv, struct dma_desc *p, if (!priv->hwts_rx_en) return; /* For GMAC4, the valid timestamp is from CTX next desc. */ - if (priv->plat->has_gmac4) + if (priv->plat->has_gmac4 || priv->plat->has_xgmac) desc = np; /* Check if timestamp is available */ @@ -540,6 +556,9 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, struct ifreq *ifr) u32 ts_event_en = 0; u32 value = 0; u32 sec_inc; + bool xmac; + + xmac = priv->plat->has_gmac4 || priv->plat->has_xgmac; if (!(priv->dma_cap.time_stamp || priv->adv_ts)) { netdev_alert(priv->dev, "No support for HW time stamping\n"); @@ -575,7 +594,7 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, struct ifreq *ifr) /* PTP v1, UDP, any kind of event packet */ config.rx_filter = HWTSTAMP_FILTER_PTP_V1_L4_EVENT; /* take time stamp for all event messages */ - if (priv->plat->has_gmac4) + if (xmac) snap_type_sel = PTP_GMAC4_TCR_SNAPTYPSEL_1; else snap_type_sel = PTP_TCR_SNAPTYPSEL_1; @@ -610,7 +629,7 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, struct ifreq *ifr) config.rx_filter = HWTSTAMP_FILTER_PTP_V2_L4_EVENT; ptp_v2 = PTP_TCR_TSVER2ENA; /* take time stamp for all event messages */ - if (priv->plat->has_gmac4) + if (xmac) snap_type_sel = PTP_GMAC4_TCR_SNAPTYPSEL_1; else snap_type_sel = PTP_TCR_SNAPTYPSEL_1; @@ -647,7 +666,7 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, struct ifreq *ifr) config.rx_filter = HWTSTAMP_FILTER_PTP_V2_EVENT; ptp_v2 = PTP_TCR_TSVER2ENA; /* take time stamp for all event messages */ - if (priv->plat->has_gmac4) + if (xmac) snap_type_sel = PTP_GMAC4_TCR_SNAPTYPSEL_1; else snap_type_sel = PTP_TCR_SNAPTYPSEL_1; @@ -718,7 +737,7 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, struct ifreq *ifr) /* program Sub Second Increment reg */ stmmac_config_sub_second_increment(priv, priv->ptpaddr, priv->plat->clk_ptp_rate, - priv->plat->has_gmac4, _inc); + xmac, _inc); temp = div_u64(10ULL, sec_inc); /* Store sub second increment and flags for later use */ @@ -755,12 +774,14 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, struct ifreq *ifr) */ static int stmmac_init_ptp(struct stmmac_priv *priv) { + bool xmac = priv->plat->has_gmac4 || priv->plat->has_xgmac; + if (!(priv->dma_cap.time_stamp || priv->dma_cap.atime_stamp)) return -EOPNOTSUPP; priv->adv_ts = 0; - /* Check if adv_ts can be enabled for dwmac 4.x core */ - if
[PATCH net-next 9/9] bindings: net: stmmac: Add the bindings documentation for XGMAC2.
Adds the documentation for XGMAC2 DT bindings. Signed-off-by: Jose Abreu Cc: David S. Miller Cc: Joao Pinto Cc: Giuseppe Cavallaro Cc: Alexandre Torgue --- Documentation/devicetree/bindings/net/stmmac.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/net/stmmac.txt b/Documentation/devicetree/bindings/net/stmmac.txt index 3a28a5d8857d..525425beb6e7 100644 --- a/Documentation/devicetree/bindings/net/stmmac.txt +++ b/Documentation/devicetree/bindings/net/stmmac.txt @@ -1,7 +1,8 @@ * STMicroelectronics 10/100/1000 Ethernet driver (GMAC) Required properties: -- compatible: Should be "snps,dwmac-", "snps,dwmac" +- compatible: Should be "snps,dwmac-", "snps,dwmac" or + "snps,dwxgmac-
[PATCH net-next 8/9] net: stmmac: Add the bindings parsing for XGMAC2
Add the bindings parsing for XGMAC2 IP block. Signed-off-by: Jose Abreu Cc: David S. Miller Cc: Joao Pinto Cc: Giuseppe Cavallaro Cc: Alexandre Torgue --- drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c | 2 ++ drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 6 ++ 2 files changed, 8 insertions(+) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c index 3304095c934c..fad503820e04 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c @@ -78,6 +78,8 @@ static const struct of_device_id dwmac_generic_match[] = { { .compatible = "snps,dwmac-4.00"}, { .compatible = "snps,dwmac-4.10a"}, { .compatible = "snps,dwmac"}, + { .compatible = "snps,dwxgmac-2.10"}, + { .compatible = "snps,dwxgmac"}, { } }; MODULE_DEVICE_TABLE(of, dwmac_generic_match); diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c index 72da77b94ecd..3609c7b696c7 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c @@ -486,6 +486,12 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac) plat->force_sf_dma_mode = 1; } + if (of_device_is_compatible(np, "snps,dwxgmac")) { + plat->has_xgmac = 1; + plat->pmt = 1; + plat->tso_en = of_property_read_bool(np, "snps,tso"); + } + dma_cfg = devm_kzalloc(>dev, sizeof(*dma_cfg), GFP_KERNEL); if (!dma_cfg) { -- 2.7.4
[PATCH net-next 3/9] net: stmmac: Add DMA related callbacks for XGMAC2
Add the DMA related callbacks for the new IP block XGMAC2. Signed-off-by: Jose Abreu Cc: David S. Miller Cc: Joao Pinto Cc: Giuseppe Cavallaro Cc: Alexandre Torgue --- drivers/net/ethernet/stmicro/stmmac/Makefile | 2 +- drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h | 56 +++ drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c | 410 + drivers/net/ethernet/stmicro/stmmac/hwif.c | 2 +- drivers/net/ethernet/stmicro/stmmac/hwif.h | 1 + 5 files changed, 469 insertions(+), 2 deletions(-) create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile b/drivers/net/ethernet/stmicro/stmmac/Makefile index a6cf632c9592..da40d3bba037 100644 --- a/drivers/net/ethernet/stmicro/stmmac/Makefile +++ b/drivers/net/ethernet/stmicro/stmmac/Makefile @@ -5,7 +5,7 @@ stmmac-objs:= stmmac_main.o stmmac_ethtool.o stmmac_mdio.o ring_mode.o \ dwmac100_core.o dwmac100_dma.o enh_desc.o norm_desc.o \ mmc_core.o stmmac_hwtstamp.o stmmac_ptp.o dwmac4_descs.o \ dwmac4_dma.o dwmac4_lib.o dwmac4_core.o dwmac5.o hwif.o \ - stmmac_tc.o dwxgmac2_core.o $(stmmac-y) + stmmac_tc.o dwxgmac2_core.o dwxgmac2_dma.o $(stmmac-y) # Ordering matters. Generic driver must be last. obj-$(CONFIG_STMMAC_PLATFORM) += stmmac-platform.o diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h index ef63a62a699d..69d225a9c890 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h +++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h @@ -137,4 +137,60 @@ #define XGMAC_ABPSIS BIT(1) #define XGMAC_TXUNFIS BIT(0) +/* DMA Registers */ +#define XGMAC_DMA_MODE 0x3000 +#define XGMAC_SWR BIT(0) +#define XGMAC_DMA_SYSBUS_MODE 0x3004 +#define XGMAC_WR_OSR_LMT GENMASK(29, 24) +#define XGMAC_WR_OSR_LMT_SHIFT 24 +#define XGMAC_RD_OSR_LMT GENMASK(21, 16) +#define XGMAC_RD_OSR_LMT_SHIFT 16 +#define XGMAC_EN_LPI BIT(15) +#define XGMAC_LPI_XIT_PKT BIT(14) +#define XGMAC_AAL BIT(12) +#define XGMAC_BLEN256 BIT(7) +#define XGMAC_BLEN128 BIT(6) +#define XGMAC_BLEN64 BIT(5) +#define XGMAC_BLEN32 BIT(4) +#define XGMAC_BLEN16 BIT(3) +#define XGMAC_BLEN8BIT(2) +#define XGMAC_BLEN4BIT(1) +#define XGMAC_UNDEFBIT(0) +#define XGMAC_DMA_CH_CONTROL(x)(0x3100 + (0x80 * (x))) +#define XGMAC_PBLx8BIT(16) +#define XGMAC_DMA_CH_TX_CONTROL(x) (0x3104 + (0x80 * (x))) +#define XGMAC_TxPBLGENMASK(21, 16) +#define XGMAC_TxPBL_SHIFT 16 +#define XGMAC_TSE BIT(12) +#define XGMAC_OSP BIT(4) +#define XGMAC_TXST BIT(0) +#define XGMAC_DMA_CH_RX_CONTROL(x) (0x3108 + (0x80 * (x))) +#define XGMAC_RxPBLGENMASK(21, 16) +#define XGMAC_RxPBL_SHIFT 16 +#define XGMAC_RXST BIT(0) +#define XGMAC_DMA_CH_TxDESC_LADDR(x) (0x3114 + (0x80 * (x))) +#define XGMAC_DMA_CH_RxDESC_LADDR(x) (0x311c + (0x80 * (x))) +#define XGMAC_DMA_CH_TxDESC_TAIL_LPTR(x) (0x3124 + (0x80 * (x))) +#define XGMAC_DMA_CH_RxDESC_TAIL_LPTR(x) (0x312c + (0x80 * (x))) +#define XGMAC_DMA_CH_TxDESC_RING_LEN(x)(0x3130 + (0x80 * (x))) +#define XGMAC_DMA_CH_RxDESC_RING_LEN(x)(0x3134 + (0x80 * (x))) +#define XGMAC_DMA_CH_INT_EN(x) (0x3138 + (0x80 * (x))) +#define XGMAC_NIE BIT(15) +#define XGMAC_AIE BIT(14) +#define XGMAC_RBUE BIT(7) +#define XGMAC_RIE BIT(6) +#define XGMAC_TIE BIT(0) +#define XGMAC_DMA_INT_DEFAULT_EN (XGMAC_NIE | XGMAC_AIE | XGMAC_RBUE | \ + XGMAC_RIE | XGMAC_TIE) +#define XGMAC_DMA_CH_Rx_WATCHDOG(x)(0x313c + (0x80 * (x))) +#define XGMAC_RWT GENMASK(7, 0) +#define XGMAC_DMA_CH_STATUS(x) (0x3160 + (0x80 * (x))) +#define XGMAC_NIS BIT(15) +#define XGMAC_AIS BIT(14) +#define XGMAC_FBE BIT(12) +#define XGMAC_RBU BIT(7) +#define XGMAC_RI BIT(6) +#define XGMAC_TPS BIT(1) +#define XGMAC_TI BIT(0) + #endif /* __STMMAC_DWXGMAC2_H__ */ diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c new file mode 100644 index ..50d9fffc32b5 --- /dev/null +++
[PATCH net-next 5/9] net: stmmac: Add MDIO related functions for XGMAC2
Add the MDIO related funcionalities for the new IP block XGMAC2. Signed-off-by: Jose Abreu Cc: David S. Miller Cc: Joao Pinto Cc: Giuseppe Cavallaro Cc: Alexandre Torgue --- drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 88 +++ 1 file changed, 88 insertions(+) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c index 5df1a608e566..e6bf86dc14eb 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c @@ -39,6 +39,88 @@ #define MII_GMAC4_WRITE(1 << MII_GMAC4_GOC_SHIFT) #define MII_GMAC4_READ (3 << MII_GMAC4_GOC_SHIFT) +/* XGMAC defines */ +#define MII_XGMAC_CMD_SHIFT16 +#define MII_XGMAC_WRITE(1 << MII_XGMAC_CMD_SHIFT) +#define MII_XGMAC_READ (3 << MII_XGMAC_CMD_SHIFT) +#define MII_XGMAC_BUSY BIT(22) + +static int stmmac_xgmac2_mdio_read(struct stmmac_priv *priv, int phyaddr, + int phyreg) +{ + unsigned int mii_address = priv->hw->mii.addr; + unsigned int mii_data = priv->hw->mii.data; + u32 tmp, addr, value = MII_XGMAC_BUSY; + int data; + + if (phyreg & MII_ADDR_C45) { + addr = ((phyreg >> 16) & 0x1f) << 21; + addr |= (phyaddr << 16) | (phyreg & 0x); + } else { + if (phyaddr >= 4) + return -ENODEV; + writel(~0x0, priv->ioaddr + 0x220); + addr = (phyaddr << 16) | (phyreg & 0x1f); + } + + value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift) + & priv->hw->mii.clk_csr_mask; + value |= BIT(18); + value |= MII_XGMAC_READ; + + if (readl_poll_timeout(priv->ioaddr + mii_data, tmp, + !(tmp & MII_XGMAC_BUSY), 100, 1)) + return -EBUSY; + + writel(addr, priv->ioaddr + mii_address); + writel(value, priv->ioaddr + mii_data); + + if (readl_poll_timeout(priv->ioaddr + mii_data, tmp, + !(tmp & MII_XGMAC_BUSY), 100, 1)) + return -EBUSY; + + /* Read the data from the MII data register */ + data = (int)readl(priv->ioaddr + mii_data) & GENMASK(15, 0); + + return data; +} + +static int stmmac_xgmac2_mdio_write(struct stmmac_priv *priv, int phyaddr, + int phyreg, u16 phydata) +{ + unsigned int mii_address = priv->hw->mii.addr; + unsigned int mii_data = priv->hw->mii.data; + u32 addr, tmp, value = MII_XGMAC_BUSY; + + if (phyreg & MII_ADDR_C45) { + addr = ((phyreg >> 16) & 0x1f) << 21; + addr |= (phyaddr << 16) | (phyreg & 0x); + } else { + if (phyaddr >= 4) + return -ENODEV; + writel(~0x0, priv->ioaddr + 0x220); + addr = (phyaddr << 16) | (phyreg & 0x1f); + } + + value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift) + & priv->hw->mii.clk_csr_mask; + value |= phydata | BIT(18); + value |= MII_XGMAC_WRITE; + + /* Wait until any existing MII operation is complete */ + if (readl_poll_timeout(priv->ioaddr + mii_data, tmp, + !(tmp & MII_XGMAC_BUSY), 100, 1)) + return -EBUSY; + + /* Set the MII address register to write */ + writel(addr, priv->ioaddr + mii_address); + writel(value, priv->ioaddr + mii_data); + + /* Wait until any existing MII operation is complete */ + return readl_poll_timeout(priv->ioaddr + mii_data, tmp, + !(tmp & MII_XGMAC_BUSY), 100, 1); +} + /** * stmmac_mdio_read * @bus: points to the mii_bus structure @@ -59,6 +141,9 @@ static int stmmac_mdio_read(struct mii_bus *bus, int phyaddr, int phyreg) int data; u32 value = MII_BUSY; + if (priv->plat->has_xgmac) + return stmmac_xgmac2_mdio_read(priv, phyaddr, phyreg); + value |= (phyaddr << priv->hw->mii.addr_shift) & priv->hw->mii.addr_mask; value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask; @@ -101,6 +186,9 @@ static int stmmac_mdio_write(struct mii_bus *bus, int phyaddr, int phyreg, u32 v; u32 value = MII_BUSY; + if (priv->plat->has_xgmac) + return stmmac_xgmac2_mdio_write(priv, phyaddr, phyreg, phydata); + value |= (phyaddr << priv->hw->mii.addr_shift) & priv->hw->mii.addr_mask; value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask; -- 2.7.4
[PATCH net-next 4/9] net: stmmac: Add descriptor related callbacks for XGMAC2
Add the descriptor related callbacks for the new IP block XGMAC2. Signed-off-by: Jose Abreu Cc: David S. Miller Cc: Joao Pinto Cc: Giuseppe Cavallaro Cc: Alexandre Torgue --- drivers/net/ethernet/stmicro/stmmac/Makefile | 3 +- drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h | 30 +++ .../net/ethernet/stmicro/stmmac/dwxgmac2_descs.c | 280 + drivers/net/ethernet/stmicro/stmmac/hwif.c | 2 +- drivers/net/ethernet/stmicro/stmmac/hwif.h | 1 + 5 files changed, 314 insertions(+), 2 deletions(-) create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile b/drivers/net/ethernet/stmicro/stmmac/Makefile index da40d3bba037..99967a80a8c8 100644 --- a/drivers/net/ethernet/stmicro/stmmac/Makefile +++ b/drivers/net/ethernet/stmicro/stmmac/Makefile @@ -5,7 +5,8 @@ stmmac-objs:= stmmac_main.o stmmac_ethtool.o stmmac_mdio.o ring_mode.o \ dwmac100_core.o dwmac100_dma.o enh_desc.o norm_desc.o \ mmc_core.o stmmac_hwtstamp.o stmmac_ptp.o dwmac4_descs.o \ dwmac4_dma.o dwmac4_lib.o dwmac4_core.o dwmac5.o hwif.o \ - stmmac_tc.o dwxgmac2_core.o dwxgmac2_dma.o $(stmmac-y) + stmmac_tc.o dwxgmac2_core.o dwxgmac2_dma.o dwxgmac2_descs.o \ + $(stmmac-y) # Ordering matters. Generic driver must be last. obj-$(CONFIG_STMMAC_PLATFORM) += stmmac-platform.o diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h index 69d225a9c890..7d9f4428d8d5 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h +++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h @@ -193,4 +193,34 @@ #define XGMAC_TPS BIT(1) #define XGMAC_TI BIT(0) +/* Descriptors */ +#define XGMAC_TDES2_IOCBIT(31) +#define XGMAC_TDES2_TTSE BIT(30) +#define XGMAC_TDES2_B2LGENMASK(29, 16) +#define XGMAC_TDES2_B2L_SHIFT 16 +#define XGMAC_TDES2_B1LGENMASK(13, 0) +#define XGMAC_TDES3_OWNBIT(31) +#define XGMAC_TDES3_CTXT BIT(30) +#define XGMAC_TDES3_FD BIT(29) +#define XGMAC_TDES3_LD BIT(28) +#define XGMAC_TDES3_CPCGENMASK(27, 26) +#define XGMAC_TDES3_CPC_SHIFT 26 +#define XGMAC_TDES3_TCMSSV BIT(26) +#define XGMAC_TDES3_THLGENMASK(22, 19) +#define XGMAC_TDES3_THL_SHIFT 19 +#define XGMAC_TDES3_TSEBIT(18) +#define XGMAC_TDES3_CICGENMASK(17, 16) +#define XGMAC_TDES3_CIC_SHIFT 16 +#define XGMAC_TDES3_TPLGENMASK(17, 0) +#define XGMAC_TDES3_FL GENMASK(14, 0) +#define XGMAC_RDES3_OWNBIT(31) +#define XGMAC_RDES3_CTXT BIT(30) +#define XGMAC_RDES3_IOCBIT(30) +#define XGMAC_RDES3_LD BIT(28) +#define XGMAC_RDES3_CDABIT(27) +#define XGMAC_RDES3_ES BIT(15) +#define XGMAC_RDES3_PL GENMASK(13, 0) +#define XGMAC_RDES3_TSDBIT(6) +#define XGMAC_RDES3_TSABIT(4) + #endif /* __STMMAC_DWXGMAC2_H__ */ diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c new file mode 100644 index ..1d858fdec997 --- /dev/null +++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c @@ -0,0 +1,280 @@ +// SPDX-License-Identifier: (GPL-2.0 OR MIT) +/* + * Copyright (c) 2018 Synopsys, Inc. and/or its affiliates. + * stmmac XGMAC support. + */ + +#include +#include "common.h" +#include "dwxgmac2.h" + +static int dwxgmac2_get_tx_status(void *data, struct stmmac_extra_stats *x, + struct dma_desc *p, void __iomem *ioaddr) +{ + unsigned int tdes3 = le32_to_cpu(p->des3); + int ret = tx_done; + + if (unlikely(tdes3 & XGMAC_TDES3_OWN)) + return tx_dma_own; + if (likely(!(tdes3 & XGMAC_TDES3_LD))) + return tx_not_ls; + + return ret; +} + +static int dwxgmac2_get_rx_status(void *data, struct stmmac_extra_stats *x, + struct dma_desc *p) +{ + unsigned int rdes3 = le32_to_cpu(p->des3); + int ret = good_frame; + + if (unlikely(rdes3 & XGMAC_RDES3_OWN)) + return dma_own; + if (likely(!(rdes3 & XGMAC_RDES3_LD))) + return discard_frame; + if (unlikely(rdes3 & XGMAC_RDES3_ES)) + ret = discard_frame; + + return ret; +} + +static int dwxgmac2_get_tx_len(struct dma_desc *p) +{ + return (le32_to_cpu(p->des2) & XGMAC_TDES2_B1L); +} + +static int dwxgmac2_get_tx_owner(struct dma_desc *p) +{ + return (le32_to_cpu(p->des3) &
[PATCH net-next 2/9] net: stmmac: Add MAC related callbacks for XGMAC2
Add the MAC related callbacks for the new IP block XGMAC2. Signed-off-by: Jose Abreu Cc: David S. Miller Cc: Joao Pinto Cc: Giuseppe Cavallaro Cc: Alexandre Torgue --- drivers/net/ethernet/stmicro/stmmac/Makefile | 2 +- drivers/net/ethernet/stmicro/stmmac/common.h | 3 + drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h | 140 .../net/ethernet/stmicro/stmmac/dwxgmac2_core.c| 371 + drivers/net/ethernet/stmicro/stmmac/hwif.c | 4 +- drivers/net/ethernet/stmicro/stmmac/hwif.h | 1 + 6 files changed, 518 insertions(+), 3 deletions(-) create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile b/drivers/net/ethernet/stmicro/stmmac/Makefile index 68e9e2640c62..a6cf632c9592 100644 --- a/drivers/net/ethernet/stmicro/stmmac/Makefile +++ b/drivers/net/ethernet/stmicro/stmmac/Makefile @@ -5,7 +5,7 @@ stmmac-objs:= stmmac_main.o stmmac_ethtool.o stmmac_mdio.o ring_mode.o \ dwmac100_core.o dwmac100_dma.o enh_desc.o norm_desc.o \ mmc_core.o stmmac_hwtstamp.o stmmac_ptp.o dwmac4_descs.o \ dwmac4_dma.o dwmac4_lib.o dwmac4_core.o dwmac5.o hwif.o \ - stmmac_tc.o $(stmmac-y) + stmmac_tc.o dwxgmac2_core.o $(stmmac-y) # Ordering matters. Generic driver must be last. obj-$(CONFIG_STMMAC_PLATFORM) += stmmac-platform.o diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h index 3fb81acbd274..1854f270ad66 100644 --- a/drivers/net/ethernet/stmicro/stmmac/common.h +++ b/drivers/net/ethernet/stmicro/stmmac/common.h @@ -400,6 +400,8 @@ struct mac_link { u32 speed10; u32 speed100; u32 speed1000; + u32 speed2500; + u32 speed1; u32 duplex; }; @@ -441,6 +443,7 @@ struct stmmac_rx_routing { int dwmac100_setup(struct stmmac_priv *priv); int dwmac1000_setup(struct stmmac_priv *priv); int dwmac4_setup(struct stmmac_priv *priv); +int dwxgmac2_setup(struct stmmac_priv *priv); void stmmac_set_mac_addr(void __iomem *ioaddr, u8 addr[6], unsigned int high, unsigned int low); diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h new file mode 100644 index ..ef63a62a699d --- /dev/null +++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h @@ -0,0 +1,140 @@ +// SPDX-License-Identifier: (GPL-2.0 OR MIT) +/* + * Copyright (c) 2018 Synopsys, Inc. and/or its affiliates. + * stmmac XGMAC definitions. + */ + +#ifndef __STMMAC_DWXGMAC2_H__ +#define __STMMAC_DWXGMAC2_H__ + +#include "common.h" + +/* Misc */ +#define XGMAC_JUMBO_LEN16368 + +/* MAC Registers */ +#define XGMAC_TX_CONFIG0x +#define XGMAC_CONFIG_SS_OFF29 +#define XGMAC_CONFIG_SS_MASK GENMASK(30, 29) +#define XGMAC_CONFIG_SS_1 (0x0 << XGMAC_CONFIG_SS_OFF) +#define XGMAC_CONFIG_SS_2500 (0x2 << XGMAC_CONFIG_SS_OFF) +#define XGMAC_CONFIG_SS_1000 (0x3 << XGMAC_CONFIG_SS_OFF) +#define XGMAC_CONFIG_SARC GENMASK(22, 20) +#define XGMAC_CONFIG_SARC_SHIFT20 +#define XGMAC_CONFIG_JDBIT(16) +#define XGMAC_CONFIG_TEBIT(0) +#define XGMAC_CORE_INIT_TX (XGMAC_CONFIG_JD) +#define XGMAC_RX_CONFIG0x0004 +#define XGMAC_CONFIG_ARPEN BIT(31) +#define XGMAC_CONFIG_GPSL GENMASK(29, 16) +#define XGMAC_CONFIG_GPSL_SHIFT16 +#define XGMAC_CONFIG_S2KP BIT(11) +#define XGMAC_CONFIG_IPC BIT(9) +#define XGMAC_CONFIG_JEBIT(8) +#define XGMAC_CONFIG_WDBIT(7) +#define XGMAC_CONFIG_GPSLCEBIT(6) +#define XGMAC_CONFIG_CST BIT(2) +#define XGMAC_CONFIG_ACS BIT(1) +#define XGMAC_CONFIG_REBIT(0) +#define XGMAC_CORE_INIT_RX 0 +#define XGMAC_PACKET_FILTER0x0008 +#define XGMAC_FILTER_RABIT(31) +#define XGMAC_FILTER_PMBIT(4) +#define XGMAC_FILTER_HMC BIT(2) +#define XGMAC_FILTER_PRBIT(0) +#define XGMAC_HASH_TABLE(x)(0x0010 + (x) * 4) +#define XGMAC_RXQ_CTRL00x00a0 +#define XGMAC_RXQEN(x) GENMASK((x) * 2 + 1, (x) * 2) +#define XGMAC_RXQEN_SHIFT(x) ((x) * 2) +#define XGMAC_RXQ_CTRL20x00a8 +#define XGMAC_RXQ_CTRL30x00ac +#define XGMAC_PSRQ(x) GENMASK((x) * 8 + 7, (x) * 8) +#define XGMAC_PSRQ_SHIFT(x)((x) * 8) +#define XGMAC_INT_STATUS 0x00b0 +#define XGMAC_PMTIS
[PATCH net-next 0/9] Add 10GbE support in stmmac using XGMAC2
This series adds support for 10Gigabit IP in stmmac. The IP is called XGMAC2 and has many similarities with GMAC4. Due to this, its relatively easy to incorporate this new IP into stmmac driver by adding a new block and filling the necessary callbacks. The functionality added by this series is still reduced but its only a starting point which will later be expanded. I splitted the patches into funcionality and to ease the review. Only the patch 8/9 really enables the XGMAC2 block by adding a new compatible string. Cc: David S. Miller Cc: Joao Pinto Cc: Giuseppe Cavallaro Cc: Alexandre Torgue Jose Abreu (9): net: stmmac: Add XGMAC 2.10 HWIF entry net: stmmac: Add MAC related callbacks for XGMAC2 net: stmmac: Add DMA related callbacks for XGMAC2 net: stmmac: Add descriptor related callbacks for XGMAC2 net: stmmac: Add MDIO related functions for XGMAC2 net: stmmac: Add PTP support for XGMAC2 net: stmmac: Integrate XGMAC into main driver flow net: stmmac: Add the bindings parsing for XGMAC2 bindings: net: stmmac: Add the bindings documentation for XGMAC2. Documentation/devicetree/bindings/net/stmmac.txt | 3 +- drivers/net/ethernet/stmicro/stmmac/Makefile | 3 +- drivers/net/ethernet/stmicro/stmmac/common.h | 17 +- .../net/ethernet/stmicro/stmmac/dwmac-generic.c| 2 + drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h | 226 .../net/ethernet/stmicro/stmmac/dwxgmac2_core.c| 371 +++ .../net/ethernet/stmicro/stmmac/dwxgmac2_descs.c | 280 ++ drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c | 410 + drivers/net/ethernet/stmicro/stmmac/hwif.c | 31 +- drivers/net/ethernet/stmicro/stmmac/hwif.h | 3 + drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 55 ++- drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 88 + .../net/ethernet/stmicro/stmmac/stmmac_platform.c | 6 + drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c | 6 +- drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h | 1 + include/linux/stmmac.h | 1 + 16 files changed, 1481 insertions(+), 22 deletions(-) create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_descs.c create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c -- 2.7.4
[PATCH net-next 1/9] net: stmmac: Add XGMAC 2.10 HWIF entry
Add a new entry to HWIF table for XGMAC 2.10. For now we fill it with empty callbacks which will be added in posterior patches. Signed-off-by: Jose Abreu Cc: David S. Miller Cc: Joao Pinto Cc: Giuseppe Cavallaro Cc: Alexandre Torgue --- drivers/net/ethernet/stmicro/stmmac/common.h | 14 +++-- drivers/net/ethernet/stmicro/stmmac/hwif.c | 31 ++-- include/linux/stmmac.h | 1 + 3 files changed, 38 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h index 78fd0f8b8e81..3fb81acbd274 100644 --- a/drivers/net/ethernet/stmicro/stmmac/common.h +++ b/drivers/net/ethernet/stmicro/stmmac/common.h @@ -36,12 +36,14 @@ #include "mmc.h" /* Synopsys Core versions */ -#defineDWMAC_CORE_3_40 0x34 -#defineDWMAC_CORE_3_50 0x35 -#defineDWMAC_CORE_4_00 0x40 -#define DWMAC_CORE_4_100x41 -#define DWMAC_CORE_5_00 0x50 -#define DWMAC_CORE_5_10 0x51 +#defineDWMAC_CORE_3_40 0x34 +#defineDWMAC_CORE_3_50 0x35 +#defineDWMAC_CORE_4_00 0x40 +#define DWMAC_CORE_4_100x41 +#define DWMAC_CORE_5_000x50 +#define DWMAC_CORE_5_100x51 +#define DWXGMAC_CORE_2_10 0x21 + #define STMMAC_CHAN0 0 /* Always supported and default for all chips */ /* These need to be power of two, and >= 4 */ diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.c b/drivers/net/ethernet/stmicro/stmmac/hwif.c index 1f50e83cafb2..24f5ff175aa4 100644 --- a/drivers/net/ethernet/stmicro/stmmac/hwif.c +++ b/drivers/net/ethernet/stmicro/stmmac/hwif.c @@ -72,6 +72,7 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv) static const struct stmmac_hwif_entry { bool gmac; bool gmac4; + bool xgmac; u32 min_id; const struct stmmac_regs_off regs; const void *desc; @@ -87,6 +88,7 @@ static const struct stmmac_hwif_entry { { .gmac = false, .gmac4 = false, + .xgmac = false, .min_id = 0, .regs = { .ptp_off = PTP_GMAC3_X_OFFSET, @@ -103,6 +105,7 @@ static const struct stmmac_hwif_entry { }, { .gmac = true, .gmac4 = false, + .xgmac = false, .min_id = 0, .regs = { .ptp_off = PTP_GMAC3_X_OFFSET, @@ -119,6 +122,7 @@ static const struct stmmac_hwif_entry { }, { .gmac = false, .gmac4 = true, + .xgmac = false, .min_id = 0, .regs = { .ptp_off = PTP_GMAC4_OFFSET, @@ -135,6 +139,7 @@ static const struct stmmac_hwif_entry { }, { .gmac = false, .gmac4 = true, + .xgmac = false, .min_id = DWMAC_CORE_4_00, .regs = { .ptp_off = PTP_GMAC4_OFFSET, @@ -151,6 +156,7 @@ static const struct stmmac_hwif_entry { }, { .gmac = false, .gmac4 = true, + .xgmac = false, .min_id = DWMAC_CORE_4_10, .regs = { .ptp_off = PTP_GMAC4_OFFSET, @@ -167,6 +173,7 @@ static const struct stmmac_hwif_entry { }, { .gmac = false, .gmac4 = true, + .xgmac = false, .min_id = DWMAC_CORE_5_10, .regs = { .ptp_off = PTP_GMAC4_OFFSET, @@ -180,11 +187,29 @@ static const struct stmmac_hwif_entry { .tc = _tc_ops, .setup = dwmac4_setup, .quirks = NULL, - } + }, { + .gmac = false, + .gmac4 = false, + .xgmac = true, + .min_id = DWXGMAC_CORE_2_10, + .regs = { + .ptp_off = 0, + .mmc_off = 0, + }, + .desc = NULL, + .dma = NULL, + .mac = NULL, + .hwtimestamp = NULL, + .mode = NULL, + .tc = NULL, + .setup = NULL, + .quirks = NULL, + }, }; int stmmac_hwif_init(struct stmmac_priv *priv) { + bool needs_xgmac = priv->plat->has_xgmac; bool needs_gmac4 = priv->plat->has_gmac4; bool needs_gmac = priv->plat->has_gmac; const struct stmmac_hwif_entry *entry; @@ -195,7 +220,7 @@ int stmmac_hwif_init(struct stmmac_priv *priv) if (needs_gmac) { id = stmmac_get_id(priv, GMAC_VERSION); - } else if (needs_gmac4) { + } else if (needs_gmac4 || needs_xgmac) { id = stmmac_get_id(priv, GMAC4_VERSION); } else { id = 0; @@ -229,6 +254,8 @@ int stmmac_hwif_init(struct
[patch net-next v2 3/3] net: sched: make tcf_chain_{get,put}() static
From: Jiri Pirko These are no longer used outside of cls_api.c so make them static. Move tcf_chain_flush() to avoid fwd declaration of tcf_chain_put(). Signed-off-by: Jiri Pirko v1->v2: - new patch --- include/net/pkt_cls.h | 3 --- net/sched/cls_api.c | 34 -- 2 files changed, 16 insertions(+), 21 deletions(-) diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index 22bfc3a13c25..ef727f71336e 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -40,11 +40,8 @@ struct tcf_block_cb; bool tcf_queue_work(struct rcu_work *rwork, work_func_t func); #ifdef CONFIG_NET_CLS -struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index, - bool create); struct tcf_chain *tcf_chain_get_by_act(struct tcf_block *block, u32 chain_index); -void tcf_chain_put(struct tcf_chain *chain); void tcf_chain_put_by_act(struct tcf_chain *chain); void tcf_block_netif_keep_dst(struct tcf_block *block); int tcf_block_get(struct tcf_block **p_block, diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index b194a5abfc6a..e8b0bbd0883f 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -232,19 +232,6 @@ static void tcf_chain0_head_change(struct tcf_chain *chain, tcf_chain_head_change_item(item, tp_head); } -static void tcf_chain_flush(struct tcf_chain *chain) -{ - struct tcf_proto *tp = rtnl_dereference(chain->filter_chain); - - tcf_chain0_head_change(chain, NULL); - while (tp) { - RCU_INIT_POINTER(chain->filter_chain, tp->next); - tcf_proto_destroy(tp, NULL); - tp = rtnl_dereference(chain->filter_chain); - tcf_chain_put(chain); - } -} - static void tcf_chain_destroy(struct tcf_chain *chain) { struct tcf_block *block = chain->block; @@ -316,12 +303,11 @@ static struct tcf_chain *__tcf_chain_get(struct tcf_block *block, return chain; } -struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index, - bool create) +static struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index, + bool create) { return __tcf_chain_get(block, chain_index, create, false); } -EXPORT_SYMBOL(tcf_chain_get); struct tcf_chain *tcf_chain_get_by_act(struct tcf_block *block, u32 chain_index) { @@ -347,11 +333,10 @@ static void __tcf_chain_put(struct tcf_chain *chain, bool by_act) } } -void tcf_chain_put(struct tcf_chain *chain) +static void tcf_chain_put(struct tcf_chain *chain) { __tcf_chain_put(chain, false); } -EXPORT_SYMBOL(tcf_chain_put); void tcf_chain_put_by_act(struct tcf_chain *chain) { @@ -365,6 +350,19 @@ static void tcf_chain_put_explicitly_created(struct tcf_chain *chain) tcf_chain_put(chain); } +static void tcf_chain_flush(struct tcf_chain *chain) +{ + struct tcf_proto *tp = rtnl_dereference(chain->filter_chain); + + tcf_chain0_head_change(chain, NULL); + while (tp) { + RCU_INIT_POINTER(chain->filter_chain, tp->next); + tcf_proto_destroy(tp, NULL); + tp = rtnl_dereference(chain->filter_chain); + tcf_chain_put(chain); + } +} + static bool tcf_block_offload_in_use(struct tcf_block *block) { return block->offloadcnt; -- 2.14.4
[patch net-next v2 1/3] net: sched: change name of zombie chain to "held_by_acts_only"
From: Jiri Pirko As mentioned by Cong and Jakub during the review process, it is a bit odd to sometimes (act flow) create a new chain which would be immediately a "zombie". So just rename it to "held_by_acts_only". Signed-off-by: Jiri Pirko Suggested-by: Cong Wang Suggested-by: Jakub Kicinski Acked-by: Cong Wang --- net/sched/cls_api.c | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index e20aad1987b8..2f78341f2888 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -272,11 +272,10 @@ static void tcf_chain_release_by_act(struct tcf_chain *chain) --chain->action_refcnt; } -static bool tcf_chain_is_zombie(struct tcf_chain *chain) +static bool tcf_chain_held_by_acts_only(struct tcf_chain *chain) { /* In case all the references are action references, this -* chain is a zombie and should not be listed in the chain -* dump list. +* chain should not be shown to the user. */ return chain->refcnt == chain->action_refcnt; } @@ -1838,10 +1837,9 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n, chain = tcf_chain_lookup(block, chain_index); if (n->nlmsg_type == RTM_NEWCHAIN) { if (chain) { - if (tcf_chain_is_zombie(chain)) { + if (tcf_chain_held_by_acts_only(chain)) { /* The chain exists only because there is -* some action referencing it, meaning it -* is a zombie. +* some action referencing it. */ tcf_chain_hold(chain); } else { @@ -1860,7 +1858,7 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n, } } } else { - if (!chain || tcf_chain_is_zombie(chain)) { + if (!chain || tcf_chain_held_by_acts_only(chain)) { NL_SET_ERR_MSG(extack, "Cannot find specified filter chain"); return -EINVAL; } @@ -1988,7 +1986,7 @@ static int tc_dump_chain(struct sk_buff *skb, struct netlink_callback *cb) index++; continue; } - if (tcf_chain_is_zombie(chain)) + if (tcf_chain_held_by_acts_only(chain)) continue; err = tc_chain_fill_node(chain, net, skb, block, NETLINK_CB(cb->skb).portid, -- 2.14.4
[patch net-next v2 2/3] net: sched: fix notifications for action-held chains
From: Jiri Pirko Chains that only have action references serve as placeholders. Until a non-action reference is created, user should not be aware of the chain. Also he should not receive any notifications about it. So send notifications for the new chain only in case the chain gets the first non-action reference. Symmetrically to that, when the last non-action reference is dropped, send the notification about deleted chain. Reported-by: Cong Wang Signed-off-by: Jiri Pirko Acked-by: Cong Wang v1->v2: - made __tcf_chain_{get,put}() static as suggested by Cong --- net/sched/cls_api.c | 71 - 1 file changed, 43 insertions(+), 28 deletions(-) diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 2f78341f2888..b194a5abfc6a 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -262,16 +262,6 @@ static void tcf_chain_hold(struct tcf_chain *chain) ++chain->refcnt; } -static void tcf_chain_hold_by_act(struct tcf_chain *chain) -{ - ++chain->action_refcnt; -} - -static void tcf_chain_release_by_act(struct tcf_chain *chain) -{ - --chain->action_refcnt; -} - static bool tcf_chain_held_by_acts_only(struct tcf_chain *chain) { /* In case all the references are action references, this @@ -295,52 +285,77 @@ static struct tcf_chain *tcf_chain_lookup(struct tcf_block *block, static int tc_chain_notify(struct tcf_chain *chain, struct sk_buff *oskb, u32 seq, u16 flags, int event, bool unicast); -struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index, - bool create) +static struct tcf_chain *__tcf_chain_get(struct tcf_block *block, +u32 chain_index, bool create, +bool by_act) { struct tcf_chain *chain = tcf_chain_lookup(block, chain_index); if (chain) { tcf_chain_hold(chain); - return chain; + } else { + if (!create) + return NULL; + chain = tcf_chain_create(block, chain_index); + if (!chain) + return NULL; } - if (!create) - return NULL; - chain = tcf_chain_create(block, chain_index); - if (!chain) - return NULL; - tc_chain_notify(chain, NULL, 0, NLM_F_CREATE | NLM_F_EXCL, - RTM_NEWCHAIN, false); + if (by_act) + ++chain->action_refcnt; + + /* Send notification only in case we got the first +* non-action reference. Until then, the chain acts only as +* a placeholder for actions pointing to it and user ought +* not know about them. +*/ + if (chain->refcnt - chain->action_refcnt == 1 && !by_act) + tc_chain_notify(chain, NULL, 0, NLM_F_CREATE | NLM_F_EXCL, + RTM_NEWCHAIN, false); + return chain; } + +struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index, + bool create) +{ + return __tcf_chain_get(block, chain_index, create, false); +} EXPORT_SYMBOL(tcf_chain_get); struct tcf_chain *tcf_chain_get_by_act(struct tcf_block *block, u32 chain_index) { - struct tcf_chain *chain = tcf_chain_get(block, chain_index, true); - - tcf_chain_hold_by_act(chain); - return chain; + return __tcf_chain_get(block, chain_index, true, true); } EXPORT_SYMBOL(tcf_chain_get_by_act); static void tc_chain_tmplt_del(struct tcf_chain *chain); -void tcf_chain_put(struct tcf_chain *chain) +static void __tcf_chain_put(struct tcf_chain *chain, bool by_act) { - if (--chain->refcnt == 0) { + if (by_act) + chain->action_refcnt--; + chain->refcnt--; + + /* The last dropped non-action reference will trigger notification. */ + if (chain->refcnt - chain->action_refcnt == 0 && !by_act) tc_chain_notify(chain, NULL, 0, 0, RTM_DELCHAIN, false); + + if (chain->refcnt == 0) { tc_chain_tmplt_del(chain); tcf_chain_destroy(chain); } } + +void tcf_chain_put(struct tcf_chain *chain) +{ + __tcf_chain_put(chain, false); +} EXPORT_SYMBOL(tcf_chain_put); void tcf_chain_put_by_act(struct tcf_chain *chain) { - tcf_chain_release_by_act(chain); - tcf_chain_put(chain); + __tcf_chain_put(chain, true); } EXPORT_SYMBOL(tcf_chain_put_by_act); -- 2.14.4
[patch net-next v2 0/3] net: sched: couple of adjustments/fixes
From: Jiri Pirko Jiri Pirko (3): net: sched: change name of zombie chain to "held_by_acts_only" net: sched: fix notifications for action-held chains net: sched: make tcf_chain_{get,put}() static include/net/pkt_cls.h | 3 -- net/sched/cls_api.c | 113 +++--- 2 files changed, 62 insertions(+), 54 deletions(-) -- 2.14.4
答复: [Patch net-next] net: hns3: fix return value error while hclge_cmd_csq_clean failed
Sorry, please ignore this patch. I will resend it. -邮件原件- 发件人: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] 代表 Huazhong Tan 发送时间: 2018年8月1日 17:53 收件人: da...@davemloft.net 抄送: netdev@vger.kernel.org; Linuxarm 主题: [Patch net-next] net: hns3: fix return value error while hclge_cmd_csq_clean failed From: fredalu While cleaning the command queue, the value of the HEAD register is not in the range of next_to_clean and next_to_use, meaning that this value is invalid. This also means that there is a hardware error and the hardware will trigger a reset soon. At this time we should return an error code instead of 0, and HCLGE_STATE_CMD_DISABLE needs to be set to prevent sending command again. Fixes: 3ff504908f95 ("net: hns3: fix a dead loop in hclge_cmd_csq_clean") Signed-off-by: Huazhong Tan wq --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c index 165c3d5..ac13cb2 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c @@ -147,7 +147,12 @@ static int hclge_cmd_csq_clean(struct hclge_hw *hw) if (!is_valid_csq_clean_head(csq, head)) { dev_warn(>pdev->dev, "wrong cmd head (%d, %d-%d)\n", head, csq->next_to_use, csq->next_to_clean); - return 0; + dev_warn(>pdev->dev, +"Disabling any further commands to IMP firmware\n"); + set_bit(HCLGE_STATE_CMD_DISABLE, >state); + dev_warn(>pdev->dev, +"IMP firmware watchdog reset soon expected!\n"); + return -EIO; } clean = (head - csq->next_to_clean + csq->desc_num) % csq->desc_num; @@ -267,10 +272,11 @@ int hclge_cmd_send(struct hclge_hw *hw, struct hclge_desc *desc, int num) /* Clean the command send queue */ handle = hclge_cmd_csq_clean(hw); - if (handle != num) { + if (handle < 0) + retval = handle; + else if (handle != num) dev_warn(>pdev->dev, "cleaned %d, need to clean %d\n", handle, num); - } spin_unlock_bh(>cmq.csq.lock); -- 2.7.4
[Patch net-next V2] net: hns3: fix return value error while hclge_cmd_csq_clean failed
While cleaning the command queue, the value of the HEAD register is not in the range of next_to_clean and next_to_use, meaning that this value is invalid. This also means that there is a hardware error and the hardware will trigger a reset soon. At this time we should return an error code instead of 0, and HCLGE_STATE_CMD_DISABLE needs to be set to prevent sending command again. Fixes: 3ff504908f95 ("net: hns3: fix a dead loop in hclge_cmd_csq_clean") Signed-off-by: Huazhong Tan Signed-off-by: Salil Mehta --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c index 165c3d5..ac13cb2 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c @@ -147,7 +147,12 @@ static int hclge_cmd_csq_clean(struct hclge_hw *hw) if (!is_valid_csq_clean_head(csq, head)) { dev_warn(>pdev->dev, "wrong cmd head (%d, %d-%d)\n", head, csq->next_to_use, csq->next_to_clean); - return 0; + dev_warn(>pdev->dev, +"Disabling any further commands to IMP firmware\n"); + set_bit(HCLGE_STATE_CMD_DISABLE, >state); + dev_warn(>pdev->dev, +"IMP firmware watchdog reset soon expected!\n"); + return -EIO; } clean = (head - csq->next_to_clean + csq->desc_num) % csq->desc_num; @@ -267,10 +272,11 @@ int hclge_cmd_send(struct hclge_hw *hw, struct hclge_desc *desc, int num) /* Clean the command send queue */ handle = hclge_cmd_csq_clean(hw); - if (handle != num) { + if (handle < 0) + retval = handle; + else if (handle != num) dev_warn(>pdev->dev, "cleaned %d, need to clean %d\n", handle, num); - } spin_unlock_bh(>cmq.csq.lock); -- 2.7.4
[Patch net-next] net: hns3: fix return value error while hclge_cmd_csq_clean failed
From: fredalu While cleaning the command queue, the value of the HEAD register is not in the range of next_to_clean and next_to_use, meaning that this value is invalid. This also means that there is a hardware error and the hardware will trigger a reset soon. At this time we should return an error code instead of 0, and HCLGE_STATE_CMD_DISABLE needs to be set to prevent sending command again. Fixes: 3ff504908f95 ("net: hns3: fix a dead loop in hclge_cmd_csq_clean") Signed-off-by: Huazhong Tan wq --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c index 165c3d5..ac13cb2 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c @@ -147,7 +147,12 @@ static int hclge_cmd_csq_clean(struct hclge_hw *hw) if (!is_valid_csq_clean_head(csq, head)) { dev_warn(>pdev->dev, "wrong cmd head (%d, %d-%d)\n", head, csq->next_to_use, csq->next_to_clean); - return 0; + dev_warn(>pdev->dev, +"Disabling any further commands to IMP firmware\n"); + set_bit(HCLGE_STATE_CMD_DISABLE, >state); + dev_warn(>pdev->dev, +"IMP firmware watchdog reset soon expected!\n"); + return -EIO; } clean = (head - csq->next_to_clean + csq->desc_num) % csq->desc_num; @@ -267,10 +272,11 @@ int hclge_cmd_send(struct hclge_hw *hw, struct hclge_desc *desc, int num) /* Clean the command send queue */ handle = hclge_cmd_csq_clean(hw); - if (handle != num) { + if (handle < 0) + retval = handle; + else if (handle != num) dev_warn(>pdev->dev, "cleaned %d, need to clean %d\n", handle, num); - } spin_unlock_bh(>cmq.csq.lock); -- 2.7.4
Re: [patch net-next 2/2] net: sched: fix notifications for action-held chains
Wed, Aug 01, 2018 at 06:27:28AM CEST, xiyou.wangc...@gmail.com wrote: >On Tue, Jul 31, 2018 at 5:10 AM Jiri Pirko wrote: >> >> From: Jiri Pirko >> >> Chains that only have action references serve as placeholders. >> Until a non-action reference is created, user should not be aware >> of the chain. Also he should not receive any notifications about it. >> So send notifications for the new chain only in case the chain gets >> the first non-action reference. Symmetrically to that, when >> the last non-action reference is dropped, send the notification about >> deleted chain. >> >> Reported-by: Cong Wang >> Signed-off-by: Jiri Pirko > >I think __tcf_chain_{get,put}() can be static. In fact, tcf_chain_get/put could be now static too. Will send v2. Thanks! > >Other than that, > >Acked-by: Cong Wang > >Thanks.
Re: [net-next V2 12/12] net/mlx5e: Use PARTIAL_GSO for UDP segmentation
On 7/24/2018 8:35 PM, Saeed Mahameed wrote: On Tue, Jul 24, 2018 at 7:53 AM, Alexander Duyck wrote: On Mon, Jul 23, 2018 at 3:11 PM, Saeed Mahameed wrote: From: Boris Pismenny This patch removes the splitting of UDP_GSO_L4 packets in the driver, and exposes UDP_GSO_L4 as a PARTIAL_GSO feature. Thus, the network stack is not responsible for splitting the packet into two. Signed-off-by: Boris Pismenny Signed-off-by: Saeed Mahameed --- .../net/ethernet/mellanox/mlx5/core/Makefile | 4 +- .../mellanox/mlx5/core/en_accel/en_accel.h| 27 +++-- .../mellanox/mlx5/core/en_accel/rxtx.c| 109 -- .../mellanox/mlx5/core/en_accel/rxtx.h| 14 --- .../net/ethernet/mellanox/mlx5/core/en_main.c | 9 +- 5 files changed, 23 insertions(+), 140 deletions(-) delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/rxtx.c delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/rxtx.h diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile index 55d5a5c2e9d8..fa7fcca5dc78 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile @@ -14,8 +14,8 @@ mlx5_core-$(CONFIG_MLX5_FPGA) += fpga/cmd.o fpga/core.o fpga/conn.o fpga/sdk.o \ fpga/ipsec.o fpga/tls.o mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \ - en_tx.o en_rx.o en_dim.o en_txrx.o en_accel/rxtx.o en_stats.o \ - vxlan.o en_arfs.o en_fs_ethtool.o en_selftest.o en/port.o + en_tx.o en_rx.o en_dim.o en_txrx.o en_stats.o vxlan.o \ + en_arfs.o en_fs_ethtool.o en_selftest.o en/port.o mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h index 39a5d13ba459..1dd225380a66 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h @@ -38,14 +38,22 @@ #include #include "en_accel/ipsec_rxtx.h" #include "en_accel/tls_rxtx.h" -#include "en_accel/rxtx.h" #include "en.h" -static inline struct sk_buff *mlx5e_accel_handle_tx(struct sk_buff *skb, - struct mlx5e_txqsq *sq, - struct net_device *dev, - struct mlx5e_tx_wqe **wqe, - u16 *pi) +static inline void +mlx5e_udp_gso_handle_tx_skb(struct sk_buff *skb) +{ + int payload_len = skb_shinfo(skb)->gso_size + sizeof(struct udphdr); + + udp_hdr(skb)->len = htons(payload_len); +} + So it looks like you decided to just update the length here. Do you still have plans to update GSO_PARTIAL to set the length this way or have you decided to just leave it as it is? I don't know what are Boris's plans regarding this, I will let him answer that. But what is your take on this ? Sorry for the late response. I'm currently busy with other things. I'd rather let it be like this for now, and fix it later. If you'd like to change it, I'll be happy to test. Thanks