Re: [PATCH net-next,v5 11/12] qede: place ethtool_rx_flow_spec after code after TC flower codebase

2018-12-08 Thread Jiri Pirko
Thu, Dec 06, 2018 at 11:40:01PM CET, pa...@netfilter.org wrote:
>This is a preparation patch to reuse the existing TC flower codebase
>from ethtool_rx_flow_spec.
>
>This patch is merely moving the core ethtool_rx_flow_spec parser after
>tc flower offload driver code so we can skip a few forward function
>declarations in the follow up patch.
>
>Signed-off-by: Pablo Neira Ayuso 

Acked-by: Jiri Pirko 


Re: [PATCH net-next,v5 10/12] dsa: bcm_sf2: use flow_rule infrastructure

2018-12-08 Thread Jiri Pirko
Thu, Dec 06, 2018 at 11:40:00PM CET, pa...@netfilter.org wrote:
>Update this driver to use the flow_rule infrastructure, hence we can use
>the same code to populate hardware IR from ethtool_rx_flow and the
>cls_flower interfaces.
>
>Signed-off-by: Pablo Neira Ayuso 

Acked-by: Jiri Pirko 


Re: [PATCH net-next,v5 09/12] ethtool: add ethtool_rx_flow_spec to flow_rule structure translator

2018-12-08 Thread Jiri Pirko
Thu, Dec 06, 2018 at 11:39:59PM CET, pa...@netfilter.org wrote:
>This patch adds a function to translate the ethtool_rx_flow_spec
>structure to the flow_rule representation.
>
>This allows us to reuse code from the driver side given that both flower
>and ethtool_rx_flow interfaces use the same representation.
>
>This patch also includes support for the flow type flags FLOW_EXT,
>FLOW_MAC_EXT and FLOW_RSS.
>
>The ethtool_rx_flow_spec_input wrapper structure is used to convey the
>rss_context field, that is away from the ethtool_rx_flow_spec structure,
>and the ethtool_rx_flow_spec structure.
>
>Signed-off-by: Pablo Neira Ayuso 

Acked-by: Jiri Pirko 


Re: [PATCH net-next,v5 08/12] flow_offload: add wake-up-on-lan and queue to flow_action

2018-12-08 Thread Jiri Pirko
Thu, Dec 06, 2018 at 11:39:58PM CET, pa...@netfilter.org wrote:
>These actions need to be added to support the ethtool_rx_flow interface.
>The queue action includes a field to specify the RSS context, that is
>set via FLOW_RSS flow type flag and the rss_context field in struct
>ethtool_rxnfc, plus the corresponding queue index. FLOW_RSS implies that
>rss_context is non-zero, therefore, queue.ctx == 0 means that FLOW_RSS
>was not set.
>
>Signed-off-by: Pablo Neira Ayuso 

Acked-by: Jiri Pirko 


Re: [PATCH net-next,v5 06/12] drivers: net: use flow action infrastructure

2018-12-08 Thread Jiri Pirko
Thu, Dec 06, 2018 at 11:39:56PM CET, pa...@netfilter.org wrote:
>This patch updates drivers to use the new flow action infrastructure.
>
>Signed-off-by: Pablo Neira Ayuso 

Acked-by: Jiri Pirko 


Re: [PATCH net-next,v5 07/12] cls_flower: don't expose TC actions to drivers anymore

2018-12-08 Thread Jiri Pirko
Thu, Dec 06, 2018 at 11:39:57PM CET, pa...@netfilter.org wrote:
>Now that drivers have been converted to use the flow action
>infrastructure, remove this field from the tc_cls_flower_offload
>structure.
>
>Signed-off-by: Pablo Neira Ayuso 

Acked-by: Jiri Pirko 


Re: [PATCH net-next,v5 05/12] flow_offload: add statistics retrieval infrastructure and use it

2018-12-08 Thread Jiri Pirko
Thu, Dec 06, 2018 at 11:39:55PM CET, pa...@netfilter.org wrote:
>This patch provides the flow_stats structure that acts as container for
>tc_cls_flower_offload, then we can use to restore the statistics on the
>existing TC actions. Hence, tcf_exts_stats_update() is not used from
>drivers anymore.
>
>Signed-off-by: Pablo Neira Ayuso 

Acked-by: Jiri Pirko 


Re: [PATCH net-next,v5 04/12] cls_api: add translator to flow_action representation

2018-12-08 Thread Jiri Pirko
Thu, Dec 06, 2018 at 11:39:54PM CET, pa...@netfilter.org wrote:
>This patch implements a new function to translate from native TC action
>to the new flow_action representation. Moreover, this patch also updates
>cls_flower to use this new function.
>
>Signed-off-by: Pablo Neira Ayuso 

Acked-by: Jiri Pirko 


Re: [PATCH net-next,v5 03/12] flow_offload: add flow action infrastructure

2018-12-08 Thread Jiri Pirko
Thu, Dec 06, 2018 at 11:39:53PM CET, pa...@netfilter.org wrote:
>This new infrastructure defines the nic actions that you can perform
>from existing network drivers. This infrastructure allows us to avoid a
>direct dependency with the native software TC action representation.
>
>Signed-off-by: Pablo Neira Ayuso 

Acked-by: Jiri Pirko 


Re: [PATCH net-next,v5 02/12] net/mlx5e: support for two independent packet edit actions

2018-12-08 Thread Jiri Pirko
Thu, Dec 06, 2018 at 11:39:52PM CET, pa...@netfilter.org wrote:
>This patch adds pedit_headers_action structure to store the result of
>parsing tc pedit actions. Then, it calls alloc_tc_pedit_action() to
>populate the mlx5e hardware intermediate representation once all actions
>have been parsed.
>
>This patch comes in preparation for the new flow_action infrastructure,
>where each packet mangling comes in an separated action, ie. not packed
>as in tc pedit.
>
>Signed-off-by: Pablo Neira Ayuso 

Acked-by: Jiri Pirko 


Re: [PATCH net-next,v5 01/12] flow_offload: add flow_rule and flow_match structures and use them

2018-12-08 Thread Jiri Pirko
Thu, Dec 06, 2018 at 11:39:51PM CET, pa...@netfilter.org wrote:
>This patch wraps the dissector key and mask - that flower uses to
>represent the matching side - around the flow_match structure.
>
>To avoid a follow up patch that would edit the same LoCs in the drivers,
>this patch also wraps this new flow match structure around the flow rule
>object. This new structure will also contain the flow actions in follow
>up patches.
>
>This introduces two new interfaces:
>
>   bool flow_rule_match_key(rule, dissector_id)
>
>that returns true if a given matching key is set on, and:
>
>   flow_rule_match_XYZ(rule, );
>
>To fetch the matching side XYZ into the match container structure, to
>retrieve the key and the mask with one single call.
>
>Signed-off-by: Pablo Neira Ayuso 

Acked-by: Jiri Pirko 


Re: [PATCH net-next RFC 7/7] bnxt_en: Add bnxt_en initial port params table and register it

2018-12-06 Thread Jiri Pirko
Thu, Dec 06, 2018 at 11:31:26AM CET, vasundhara-v.vo...@broadcom.com wrote:
>On Thu, Dec 6, 2018 at 2:41 PM Jiri Pirko  wrote:
>>
>> Thu, Dec 06, 2018 at 09:57:05AM CET, michael.c...@broadcom.com wrote:
>> >On Wed, Dec 5, 2018 at 11:11 PM Jakub Kicinski
>> > wrote:
>> >>
>> >> On Wed, 5 Dec 2018 22:41:43 -0800, Michael Chan wrote:
>> >> >
>> >> > It will be in the BIOS only for a LOM, I think.  For a NIC, it should
>> >> > be in the NIC's NVRAM.
>> >>
>> >> This is all vague.  Could you please clearly state the use case.
>> >>
>> >Well, the WoL setting's use case should be quite simple, right?  If
>> >the card's NVRAM WoL setting is ON, when you plug the card in a slot
>> >that has Vaux power, it will assert PME# when a magic packet is
>> >received.  Again, the WoL setting in this context is similar to other
>> >power up settings such as PCIe Gen2 or Gen3.
>> >
>> >Let's say the power up setting is ON and it boots up to Linux for the
>> >first time after receiving a magic packet.  The Linux user can then
>> >run ethtool -s to set the driver's non persistent WoL setting.  It can
>> >be the same as the NVRAM's power up setting, or different.  Ethtool
>> >may support additional WoL packet types that the power up setting does
>>
>> Wouldn't it make sense to also support multiple wol packet types in
>> devlink param, not just true/false? Your device may not support that but
>> others may. So enum instead of bool.
>There is no type enum in devlink types as of now. Instead it can be
>defined as u8 and
>a enum structure can be defined for wol types.

Yes.

>>
>>
>> >not support.  Let's say the Linux user sets the ethtool WoL setting to
>> >OFF and shuts down the system.  That card now will not wake up the
>> >system.  But if there is a power failure and power comes back on
>> >later, the card will lose the ethtool setting and go back to the power
>> >up WoL setting, which is ON in this example.


Re: [PATCH net-next RFC 7/7] bnxt_en: Add bnxt_en initial port params table and register it

2018-12-06 Thread Jiri Pirko
Thu, Dec 06, 2018 at 09:57:05AM CET, michael.c...@broadcom.com wrote:
>On Wed, Dec 5, 2018 at 11:11 PM Jakub Kicinski
> wrote:
>>
>> On Wed, 5 Dec 2018 22:41:43 -0800, Michael Chan wrote:
>> >
>> > It will be in the BIOS only for a LOM, I think.  For a NIC, it should
>> > be in the NIC's NVRAM.
>>
>> This is all vague.  Could you please clearly state the use case.
>>
>Well, the WoL setting's use case should be quite simple, right?  If
>the card's NVRAM WoL setting is ON, when you plug the card in a slot
>that has Vaux power, it will assert PME# when a magic packet is
>received.  Again, the WoL setting in this context is similar to other
>power up settings such as PCIe Gen2 or Gen3.
>
>Let's say the power up setting is ON and it boots up to Linux for the
>first time after receiving a magic packet.  The Linux user can then
>run ethtool -s to set the driver's non persistent WoL setting.  It can
>be the same as the NVRAM's power up setting, or different.  Ethtool
>may support additional WoL packet types that the power up setting does

Wouldn't it make sense to also support multiple wol packet types in
devlink param, not just true/false? Your device may not support that but
others may. So enum instead of bool.


>not support.  Let's say the Linux user sets the ethtool WoL setting to
>OFF and shuts down the system.  That card now will not wake up the
>system.  But if there is a power failure and power comes back on
>later, the card will lose the ethtool setting and go back to the power
>up WoL setting, which is ON in this example.


Re: [PATCH net-next RFC 1/7] devlink: Add devlink_param for port register and unregister

2018-12-05 Thread Jiri Pirko
Thu, Dec 06, 2018 at 07:02:59AM CET, vasundhara-v.vo...@broadcom.com wrote:
>Thank you reviewing the patches.
>
>On Wed, Dec 5, 2018 at 5:24 PM Jiri Pirko  wrote:
>>
>> Wed, Dec 05, 2018 at 06:56:54AM CET, vasundhara-v.vo...@broadcom.com wrote:
>> >Add functions to register and unregister for the driver supported
>> >configuration parameters table per port.
>> >
>> >Cc: Jiri Pirko 
>> >Signed-off-by: Vasundhara Volam 
>> >---
>> > include/net/devlink.h |  29 +++
>> > net/core/devlink.c| 130 
>> > ++
>> > 2 files changed, 151 insertions(+), 8 deletions(-)
>> >
>> >diff --git a/include/net/devlink.h b/include/net/devlink.h
>> >index 67f4293..9b4d80b 100644
>> >--- a/include/net/devlink.h
>> >+++ b/include/net/devlink.h
>> >@@ -48,6 +48,7 @@ struct devlink_port_attrs {
>> >
>> > struct devlink_port {
>> >   struct list_head list;
>> >+  struct list_head param_list;
>> >   struct devlink *devlink;
>> >   unsigned index;
>> >   bool registered;
>> >@@ -419,6 +420,13 @@ enum devlink_param_generic_id {
>> >   .validate = _validate,  \
>> > }
>> >
>> >+enum devlink_port_param_generic_id {
>> >+  /* add new param generic ids above here */
>> >+  __DEVLINK_PORT_PARAM_GENERIC_ID_MAX,
>> >+  DEVLINK_PORT_PARAM_GENERIC_ID_MAX =
>> >+ __DEVLINK_PORT_PARAM_GENERIC_ID_MAX - 1,
>> >+};
>>
>> I don't see the need for enum just for per-port params. The existing
>> params enum should be enough.
>In that case there won't be any differentiation between generic device
>and port params.

Yes, you don't need that.

>Also, if enum is not needed, then separate struct
>devlink_port_param_generic is also not needed.

Yep.

>
>Based on this, entire patchset needs a change.

[...]


Re: [PATCH net-next RFC 7/7] bnxt_en: Add bnxt_en initial port params table and register it

2018-12-05 Thread Jiri Pirko
Wed, Dec 05, 2018 at 06:57:00AM CET, vasundhara-v.vo...@broadcom.com wrote:
>Register devlink_port with devlink and create initial port params
>table for bnxt_en. The table consists of a generic parameter:
>
>wake-on-lan: Enables Wake on Lan for this port when magic packet
>is received with this port's MAC address using ACPI pattern.
>If enabled, the controller asserts a wake pin upon reception of
>WoL packet.  ACPI (Advanced Configuration and Power Interface) is
>an industry specification for the efficient handling of power
>consumption in desktop and mobile computers.
>
>Cc: Michael Chan 
>Signed-off-by: Vasundhara Volam 
>---
> drivers/net/ethernet/broadcom/bnxt/bnxt.h |  1 +
> drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 35 +++
> drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h |  1 +
> 3 files changed, 37 insertions(+)
>
>diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h 
>b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
>index 9e99d4a..0ba62b3 100644
>--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
>+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
>@@ -1585,6 +1585,7 @@ struct bnxt {
> 
>   /* devlink interface and vf-rep structs */
>   struct devlink  *dl;
>+  struct devlink_port dl_port;
>   enum devlink_eswitch_mode eswitch_mode;
>   struct bnxt_vf_rep  **vf_reps; /* array of vf-rep ptrs */
>   u16 *cfa_code_map; /* cfa_code -> vf_idx map */
>diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c 
>b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
>index 140dbd6..a2930d5 100644
>--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
>+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
>@@ -26,6 +26,10 @@ enum bnxt_dl_param_id {
>   BNXT_DEVLINK_PARAM_ID_GRE_VER_CHECK,
> };
> 
>+enum bnxt_dl_port_param_id {
>+  BNXT_DEVLINK_PORT_PARAM_ID_BASE = DEVLINK_PORT_PARAM_GENERIC_ID_MAX,
>+};
>+
> static const struct bnxt_dl_nvm_param nvm_params[] = {
>   {DEVLINK_PARAM_GENERIC_ID_ENABLE_SRIOV, NVM_OFF_ENABLE_SRIOV,
>BNXT_NVM_SHARED_CFG, 1},
>@@ -37,6 +41,8 @@ enum bnxt_dl_param_id {
>NVM_OFF_MSIX_VEC_PER_PF_MIN, BNXT_NVM_SHARED_CFG, 7},
>   {BNXT_DEVLINK_PARAM_ID_GRE_VER_CHECK, NVM_OFF_DIS_GRE_VER_CHECK,
>BNXT_NVM_SHARED_CFG, 1},
>+
>+  {DEVLINK_PORT_PARAM_GENERIC_ID_WOL, NVM_OFF_WOL, BNXT_NVM_PORT_CFG, 1},
> };
> 
> static int bnxt_hwrm_nvm_req(struct bnxt *bp, u32 param_id, void *msg,
>@@ -188,6 +194,12 @@ static int bnxt_dl_msix_validate(struct devlink *dl, u32 
>id,
>NULL),
> };
> 
>+static const struct devlink_param bnxt_dl_port_params[] = {
>+  DEVLINK_PORT_PARAM_GENERIC(WOL, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
>+ bnxt_dl_nvm_param_get, bnxt_dl_nvm_param_set,
>+ NULL),
>+};
>+
> int bnxt_dl_register(struct bnxt *bp)
> {
>   struct devlink *dl;
>@@ -225,8 +237,28 @@ int bnxt_dl_register(struct bnxt *bp)
>   goto err_dl_unreg;
>   }
> 
>+  rc = devlink_port_register(dl, >dl_port, bp->pf.port_id);
>+  if (rc) {
>+  netdev_warn(bp->dev, "devlink_port_register failed. rc=%d",
>+  rc);
>+  goto err_dl_param_unreg;
>+  }
>+  bp->dl_port.type = DEVLINK_PORT_TYPE_ETH;

Please use devlink_port_type_set()


>+
>+  rc = devlink_port_params_register(>dl_port, bnxt_dl_port_params,
>+ARRAY_SIZE(bnxt_dl_port_params));
>+  if (rc) {
>+  netdev_warn(bp->dev, "devlink_port_params_register failed. 
>rc=%d",
>+  rc);
>+  goto err_dl_port_unreg;
>+  }
>   return 0;
> 
>+err_dl_port_unreg:
>+  devlink_port_unregister(>dl_port);
>+err_dl_param_unreg:
>+  devlink_params_unregister(dl, bnxt_dl_params,
>+ARRAY_SIZE(bnxt_dl_params));
> err_dl_unreg:
>   devlink_unregister(dl);
> err_dl_free:
>@@ -242,6 +274,9 @@ void bnxt_dl_unregister(struct bnxt *bp)
>   if (!dl)
>   return;
> 
>+  devlink_port_params_unregister(>dl_port, bnxt_dl_port_params,
>+ ARRAY_SIZE(bnxt_dl_port_params));
>+  devlink_port_unregister(>dl_port);
>   devlink_params_unregister(dl, bnxt_dl_params,
> ARRAY_SIZE(bnxt_dl_params));
>   devlink_unregister(dl);
>diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h 
>b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
>index 5b6b2c7..da065ca 100644
>--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
>+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
>@@ -35,6 +35,7 @@ static inline void bnxt_link_bp_to_dl(struct bnxt *bp, 
>struct devlink *dl)
> 
> #define NVM_OFF_MSIX_VEC_PER_PF_MAX   108
> #define NVM_OFF_MSIX_VEC_PER_PF_MIN   114
>+#define NVM_OFF_WOL   152
> #define NVM_OFF_IGNORE_ARI164
> 

Re: [PATCH net-next RFC 6/7] devlink: Add a boolean generic port parameter

2018-12-05 Thread Jiri Pirko
Wed, Dec 05, 2018 at 06:56:59AM CET, vasundhara-v.vo...@broadcom.com wrote:
>wake-on-lan - Enables Wake on Lan for this port when magic packet
>is received with this port's MAC address using ACPI pattern.
>If enabled, the controller asserts a wake pin upon reception of
>WoL packet.  ACPI (Advanced Configuration and Power Interface)
>is an industry specification for the efficient handling of power
>consumption in desktop and mobile computers.
>
>Cc: Jiri Pirko 
>Signed-off-by: Vasundhara Volam 
>---
> include/net/devlink.h | 17 +
> net/core/devlink.c|  8 +++-
> 2 files changed, 24 insertions(+), 1 deletion(-)
>
>diff --git a/include/net/devlink.h b/include/net/devlink.h
>index abb5b3a..7409076 100644
>--- a/include/net/devlink.h
>+++ b/include/net/devlink.h
>@@ -421,12 +421,29 @@ enum devlink_param_generic_id {
> }
> 
> enum devlink_port_param_generic_id {
>+  DEVLINK_PORT_PARAM_GENERIC_ID_WOL,
>+
>   /* add new param generic ids above here */
>   __DEVLINK_PORT_PARAM_GENERIC_ID_MAX,
>   DEVLINK_PORT_PARAM_GENERIC_ID_MAX =
>  __DEVLINK_PORT_PARAM_GENERIC_ID_MAX - 1,
> };
> 
>+#define DEVLINK_PORT_PARAM_GENERIC_WOL_NAME "wake-on-lan"
>+#define DEVLINK_PORT_PARAM_GENERIC_WOL_TYPE DEVLINK_PARAM_TYPE_BOOL
>+
>+#define DEVLINK_PORT_PARAM_GENERIC(_id, _cmodes, _get, _set, _validate)   
>\
>+{ \
>+  .id = DEVLINK_PORT_PARAM_GENERIC_ID_##_id,  \
>+  .name = DEVLINK_PORT_PARAM_GENERIC_##_id##_NAME,\
>+  .type = DEVLINK_PORT_PARAM_GENERIC_##_id##_TYPE,\
>+  .generic = true,\
>+  .supported_cmodes = _cmodes,\
>+  .get = _get,\
>+  .set = _set,\
>+  .validate = _validate,  \
>+}

Just use DEVLINK_PARAM_GENERIC


>+
> struct devlink_region;
> 
> typedef void devlink_snapshot_data_dest_t(const void *data);
>diff --git a/net/core/devlink.c b/net/core/devlink.c
>index 7598da1..24c2b9e 100644
>--- a/net/core/devlink.c
>+++ b/net/core/devlink.c
>@@ -3210,7 +3210,13 @@ static void devlink_param_unregister_one(struct devlink 
>*devlink,
>   kfree(param_item);
> }
> 
>-static const struct devlink_param devlink_port_param_generic[] = {};
>+static const struct devlink_param devlink_port_param_generic[] = {
>+  {
>+  .id = DEVLINK_PORT_PARAM_GENERIC_ID_WOL,
>+  .name = DEVLINK_PORT_PARAM_GENERIC_WOL_NAME,
>+  .type = DEVLINK_PORT_PARAM_GENERIC_WOL_TYPE,
>+  },
>+};
> 
> static int devlink_port_param_generic_verify(const struct devlink_param 
> *param)
> {
>-- 
>1.8.3.1
>


Re: [PATCH net-next RFC 5/7] devlink: Add devlink notifications support for port params

2018-12-05 Thread Jiri Pirko
Wed, Dec 05, 2018 at 06:56:58AM CET, vasundhara-v.vo...@broadcom.com wrote:
>Add notification call for devlink port param set, register and unregister
>functions.
>Add devlink_port_param_value_changed() function to enable the driver notify
>devlink on value change. Driver should use this function after value was
>changed on any configuration mode part to driverinit.
>
>Cc: Jiri Pirko 
>Signed-off-by: Vasundhara Volam 

looks fine


Re: [PATCH net-next RFC 4/7] devlink: Add support for get/set driverinit value for devlink_port

2018-12-05 Thread Jiri Pirko
Wed, Dec 05, 2018 at 06:56:57AM CET, vasundhara-v.vo...@broadcom.com wrote:
>Add support for "driverinit" configuration mode value for devlink_port
>configuration parameters. Two additional functions added to help the
>driver get/set the value from/to devlink:
>devlink_port_param_driverinit_value_set() and
>devlink_port_param_driverinit_value_get().
>
>Also, move the common code to __devlink_param_driverinit_value_get/set()
>to be used by both device and port params.
>
>Cc: Jiri Pirko 
>Signed-off-by: Vasundhara Volam 
>---
> include/net/devlink.h |  19 
> net/core/devlink.c| 118 +++---
> 2 files changed, 112 insertions(+), 25 deletions(-)
>
>diff --git a/include/net/devlink.h b/include/net/devlink.h
>index 9b4d80b..c22b195 100644
>--- a/include/net/devlink.h
>+++ b/include/net/devlink.h
>@@ -588,6 +588,9 @@ int devlink_port_params_register(struct devlink_port 
>*devlink_port,
> void devlink_port_params_unregister(struct devlink_port *devlink_port,
>   const struct devlink_param *params,
>   size_t params_count);
>+int devlink_port_param_driverinit_value_set(struct devlink_port *devlink_port,
>+  u32 param_id,
>+  union devlink_param_value init_val);
> struct devlink_region *devlink_region_create(struct devlink *devlink,
>const char *region_name,
>u32 region_max_snapshots,
>@@ -845,6 +848,22 @@ static inline bool 
>devlink_dpipe_table_counter_enabled(struct devlink *devlink,
> {
> }
> 
>+static inline int
>+devlink_port_param_driverinit_value_get(struct devlink_port *devlink_port,
>+  u32 param_id,
>+  union devlink_param_value *init_val)
>+{
>+  return -EOPNOTSUPP;
>+}
>+
>+static inline int
>+devlink_port_param_driverinit_value_set(struct devlink_port *devlink_port,
>+  u32 param_id,
>+  union devlink_param_value init_val)
>+{
>+  return -EOPNOTSUPP;
>+}
>+
> static inline struct devlink_region *
> devlink_region_create(struct devlink *devlink,
> const char *region_name,
>diff --git a/net/core/devlink.c b/net/core/devlink.c
>index 10e1c45..4f0052d 100644
>--- a/net/core/devlink.c
>+++ b/net/core/devlink.c
>@@ -4690,6 +4690,29 @@ void devlink_params_unregister(struct devlink *devlink,
> }
> EXPORT_SYMBOL_GPL(devlink_params_unregister);
> 
>+static int
>+__devlink_param_driverinit_value_get(struct list_head *param_list, u32 
>param_id,
>+   union devlink_param_value *init_val)
>+{
>+  struct devlink_param_item *param_item;
>+
>+  param_item = devlink_param_find_by_id(param_list, param_id);
>+  if (!param_item)
>+  return -EINVAL;
>+
>+  if (!param_item->driverinit_value_valid ||
>+  !devlink_param_cmode_is_supported(param_item->param,
>+DEVLINK_PARAM_CMODE_DRIVERINIT))
>+  return -EOPNOTSUPP;
>+
>+  if (param_item->param->type == DEVLINK_PARAM_TYPE_STRING)
>+  strcpy(init_val->vstr, param_item->driverinit_value.vstr);
>+  else
>+  *init_val = param_item->driverinit_value;
>+
>+  return 0;
>+}
>+
> /**
>  *devlink_param_driverinit_value_get - get configuration parameter
>  * value for driver initializing
>@@ -4704,28 +4727,40 @@ void devlink_params_unregister(struct devlink *devlink,
> int devlink_param_driverinit_value_get(struct devlink *devlink, u32 param_id,
>  union devlink_param_value *init_val)
> {
>-  struct devlink_param_item *param_item;
>-
>   if (!devlink->ops || !devlink->ops->reload)
>   return -EOPNOTSUPP;
> 
>+  return __devlink_param_driverinit_value_get(>param_list,
>+  param_id, init_val);
>+}
>+EXPORT_SYMBOL_GPL(devlink_param_driverinit_value_get);
>+
>+static int
>+__devlink_param_driverinit_value_set(struct devlink *devlink,
>+   struct list_head *param_list,
>+   u32 param_id,
>+   union devlink_param_value init_val,
>+   enum devlink_command cmd)
>+{
>+  struct devlink_param_item *param_item;
>+
> 

Re: [PATCH net-next RFC 3/7] devlink: Add port param set command

2018-12-05 Thread Jiri Pirko
Wed, Dec 05, 2018 at 06:56:56AM CET, vasundhara-v.vo...@broadcom.com wrote:
>Add port param set command to set the value for a parameter.
>Value can be set to any of the supported configuration modes.
>
>Cc: Jiri Pirko 
>Signed-off-by: Vasundhara Volam 
>---
> include/uapi/linux/devlink.h |  1 +
> net/core/devlink.c   | 41 ++---
> 2 files changed, 35 insertions(+), 7 deletions(-)
>
>diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
>index f96e052..8f3c5dd 100644
>--- a/include/uapi/linux/devlink.h
>+++ b/include/uapi/linux/devlink.h
>@@ -84,6 +84,7 @@ enum devlink_command {
>   DEVLINK_CMD_PARAM_DEL,
> 
>   DEVLINK_CMD_PORT_PARAM_GET, /* can dump */
>+  DEVLINK_CMD_PORT_PARAM_SET,

Same note as for the previous patch.

> 
>   DEVLINK_CMD_REGION_GET,
>   DEVLINK_CMD_REGION_SET,
>diff --git a/net/core/devlink.c b/net/core/devlink.c
>index 8653fb5..10e1c45 100644
>--- a/net/core/devlink.c
>+++ b/net/core/devlink.c
>@@ -3096,19 +3096,20 @@ static int devlink_nl_cmd_param_get_doit(struct 
>sk_buff *skb,
>   return genlmsg_reply(msg, info);
> }
> 
>-static int devlink_nl_cmd_param_set_doit(struct sk_buff *skb,
>-   struct genl_info *info)
>+static int __devlink_nl_cmd_param_set_doit(struct devlink *devlink,
>+ struct list_head *param_list,
>+ struct genl_info *info,
>+ enum devlink_command cmd)
> {
>-  struct devlink *devlink = info->user_ptr[0];
>+  struct devlink_param_item *param_item;
>   enum devlink_param_type param_type;
>   struct devlink_param_gset_ctx ctx;
>-  enum devlink_param_cmode cmode;
>-  struct devlink_param_item *param_item;
>   const struct devlink_param *param;
>   union devlink_param_value value;
>+  enum devlink_param_cmode cmode;

Don't move the things around for no good reason. In case you want to
make our reverse-Christmas-tree obsession satisfied, do it in
a separate patch :)

[...]


Re: [PATCH net-next RFC 2/7] devlink: Add port param get command

2018-12-05 Thread Jiri Pirko
Wed, Dec 05, 2018 at 06:56:55AM CET, vasundhara-v.vo...@broadcom.com wrote:
>Add port param get command which gets data per parameter.
>It also has option to dump the parameters data per port.
>
>Cc: Jiri Pirko 
>Signed-off-by: Vasundhara Volam 
>---
> include/uapi/linux/devlink.h |   2 +
> net/core/devlink.c   | 102 ---
> 2 files changed, 97 insertions(+), 7 deletions(-)
>
>diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
>index 6e52d36..f96e052 100644
>--- a/include/uapi/linux/devlink.h
>+++ b/include/uapi/linux/devlink.h
>@@ -83,6 +83,8 @@ enum devlink_command {
>   DEVLINK_CMD_PARAM_NEW,
>   DEVLINK_CMD_PARAM_DEL,
> 
>+  DEVLINK_CMD_PORT_PARAM_GET, /* can dump */

You need to add this to the end, otherwise you would break uapi.


>+
>   DEVLINK_CMD_REGION_GET,
>   DEVLINK_CMD_REGION_SET,
>   DEVLINK_CMD_REGION_NEW,

[...]

The rest of the patch looks fine.


Re: [PATCH net-next RFC 1/7] devlink: Add devlink_param for port register and unregister

2018-12-05 Thread Jiri Pirko
Wed, Dec 05, 2018 at 06:56:54AM CET, vasundhara-v.vo...@broadcom.com wrote:
>Add functions to register and unregister for the driver supported
>configuration parameters table per port.
>
>Cc: Jiri Pirko 
>Signed-off-by: Vasundhara Volam 
>---
> include/net/devlink.h |  29 +++
> net/core/devlink.c| 130 ++
> 2 files changed, 151 insertions(+), 8 deletions(-)
>
>diff --git a/include/net/devlink.h b/include/net/devlink.h
>index 67f4293..9b4d80b 100644
>--- a/include/net/devlink.h
>+++ b/include/net/devlink.h
>@@ -48,6 +48,7 @@ struct devlink_port_attrs {
> 
> struct devlink_port {
>   struct list_head list;
>+  struct list_head param_list;
>   struct devlink *devlink;
>   unsigned index;
>   bool registered;
>@@ -419,6 +420,13 @@ enum devlink_param_generic_id {
>   .validate = _validate,  \
> }
> 
>+enum devlink_port_param_generic_id {
>+  /* add new param generic ids above here */
>+  __DEVLINK_PORT_PARAM_GENERIC_ID_MAX,
>+  DEVLINK_PORT_PARAM_GENERIC_ID_MAX =
>+ __DEVLINK_PORT_PARAM_GENERIC_ID_MAX - 1,
>+};

I don't see the need for enum just for per-port params. The existing
params enum should be enough.


>+
> struct devlink_region;
> 
> typedef void devlink_snapshot_data_dest_t(const void *data);
>@@ -574,6 +582,12 @@ int devlink_param_driverinit_value_set(struct devlink 
>*devlink, u32 param_id,
> void devlink_param_value_changed(struct devlink *devlink, u32 param_id);
> void devlink_param_value_str_fill(union devlink_param_value *dst_val,
> const char *src);
>+int devlink_port_params_register(struct devlink_port *devlink_port,
>+   const struct devlink_param *params,
>+   size_t params_count);
>+void devlink_port_params_unregister(struct devlink_port *devlink_port,
>+  const struct devlink_param *params,
>+  size_t params_count);
> struct devlink_region *devlink_region_create(struct devlink *devlink,
>const char *region_name,
>u32 region_max_snapshots,
>@@ -816,6 +830,21 @@ static inline bool 
>devlink_dpipe_table_counter_enabled(struct devlink *devlink,
> {
> }
> 
>+static inline int
>+devlink_port_params_register(struct devlink_port *devlink_port,
>+   const struct devlink_param *params,
>+   size_t params_count)
>+{
>+  return 0;
>+}
>+
>+static inline void
>+devlink_port_params_unregister(struct devlink_port *devlink_port,
>+ const struct devlink_param *params,
>+ size_t params_count)
>+{
>+}
>+
> static inline struct devlink_region *
> devlink_region_create(struct devlink *devlink,
> const char *region_name,
>diff --git a/net/core/devlink.c b/net/core/devlink.c
>index abb0da9..e194913 100644
>--- a/net/core/devlink.c
>+++ b/net/core/devlink.c
>@@ -3147,12 +3147,12 @@ static int devlink_nl_cmd_param_set_doit(struct 
>sk_buff *skb,
> }
> 
> static int devlink_param_register_one(struct devlink *devlink,
>+struct list_head *param_list,
> const struct devlink_param *param)
> {
>   struct devlink_param_item *param_item;
> 
>-  if (devlink_param_find_by_name(>param_list,
>- param->name))
>+  if (devlink_param_find_by_name(param_list, param->name))
>   return -EEXIST;
> 
>   if (param->supported_cmodes == BIT(DEVLINK_PARAM_CMODE_DRIVERINIT))
>@@ -3165,24 +3165,54 @@ static int devlink_param_register_one(struct devlink 
>*devlink,
>   return -ENOMEM;
>   param_item->param = param;
> 
>-  list_add_tail(_item->list, >param_list);
>+  list_add_tail(_item->list, param_list);
>   devlink_param_notify(devlink, param_item, DEVLINK_CMD_PARAM_NEW);
>   return 0;
> }
> 
> static void devlink_param_unregister_one(struct devlink *devlink,
>+   struct list_head *param_list,
>const struct devlink_param *param)
> {
>   struct devlink_param_item *param_item;
> 
>-  param_item = devlink_param_find_by_name(>param_list,
>-  param->name);
>+  param_item = devlink_param_find_by_name(param_list, param->name);
>   WARN_ON(!param_item);
>   d

Re: [PATCH iproute2] tc: add a missing space between rate estimator and backlog

2018-12-01 Thread Jiri Pirko
Fri, Nov 30, 2018 at 02:57:02PM CET, eduma...@google.com wrote:
>When a rate estimator is active, "tc -s qd" displays
>something like :
>
>rate 12616bit 11ppsbacklog 0b 0p requeues 2
>
>instead of :
>
>rate 12616bit 11pps backlog 0b 0p requeues 2
>
>Fixes: 4fcec7f3665b ("tc: jsonify stats2")
>Signed-off-by: Eric Dumazet 
>Cc: Jiri Pirko 

Reviewed-by: Jiri Pirko 


Re: [PATCH net-next,v4 00/12] add flow_rule infrastructure

2018-11-29 Thread Jiri Pirko
Thu, Nov 29, 2018 at 04:47:07PM CET, john.fastab...@gmail.com wrote:
>On 11/28/18 6:22 PM, Pablo Neira Ayuso wrote:
>> Hi,
>> 
>> This patchset is another iteration to introduce an in-kernel intermediate
>> representation (IR) to express ACL hardware offloads [1] [2] [3].
>> 
>
>Hi,
>
>Also wanted to add. In an earlier thread it was mentioned this could be
>used for other offload rule infrastructures specifically u32 was
>mentioned. I don't think this is actually possible on the flow_rule
>side. This set uses basically an enum based key system where enums
>such as FLOW_DISSECTOR_KEY_* identify the field in the packet. For
>every field we want to match a new key is needed. But the u32 classifier
>defines fields using offset/mask and a parse graph. They do not seem
>compatible to me so in the end this unifies ethtool and flower only.
>Did I get this right?
>
>So would it be better to simply map ethtool onto flower vs defining
>a new one? Patch 1 seems to be pretty light-weight so maybe rather
>than calling it a new IR we just need some helper routines for
>drivers to work with.

You don't really want to create TC internal structures in order to
offload ethtool stuff. So you need something generic and subsystem
agnostic so you can use it by both (and possibly more). And that is
exactly what this patchset is doing.

U32 is on the side. You still offload u32 via existing structs. No
change there. Nobody is pushing u32 to use any other structs.
If later on (I don't see it likely) there will be another subsystem
using u32-like structures, we can find the generic structs for that too.


>
>Probably a more detailed cover letter explaining motivation
>and any future work would help (me at least) understand the direction.
>I see netfilter offload was mentioned at one point so maybe that is
>the motivation that makes it more clear why flower API today is
>insufficient. Mostly curious at this point I see Jiri and Florian
>both reviewed it already.
>
>Thanks,
>John


Re: [PATCH net-next,v3 09/12] flow_dissector: add basic ethtool_rx_flow_spec to flow_rule structure translator

2018-11-21 Thread Jiri Pirko
Thu, Nov 22, 2018 at 05:59:27AM CET, f.faine...@gmail.com wrote:
>
>
>On 11/20/2018 6:51 PM, Pablo Neira Ayuso wrote:
>> This patch adds a function to translate the ethtool_rx_flow_spec
>> structure to the flow_rule representation.
>> 
>> This allows us to reuse code from the driver side given that both flower
>> and ethtool_rx_flow interfaces use the same representation.
>> 
>> Signed-off-by: Pablo Neira Ayuso 
>> ---
>> v3: Suggested by Jiri Pirko:
>> - Add struct ethtool_rx_flow_rule, keep placeholder to private
>>   dissector information.
>> Reported by Manish Chopra:
>>  - Fix incorrect dissector user_keys flags.
>> 
>>  include/linux/ethtool.h |  10 +++
>>  net/core/ethtool.c  | 189 
>> 
>>  2 files changed, 199 insertions(+)
>> 
>> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
>> index afd9596ce636..99849e0858b2 100644
>> --- a/include/linux/ethtool.h
>> +++ b/include/linux/ethtool.h
>> @@ -400,4 +400,14 @@ struct ethtool_ops {
>>  void(*get_ethtool_phy_stats)(struct net_device *,
>>   struct ethtool_stats *, u64 *);
>>  };
>> +
>> +struct ethtool_rx_flow_rule {
>> +struct flow_rule*rule;
>> +unsigned long   priv[0];
>
>This forces you to cast to/from that member, any reason this is just not
>a void *priv here?

The reason is single alloc call for rule and its priv. Quite usual along
the code.


>-- 
>Florian


Re: [PATCH net-next,v3 09/12] flow_dissector: add basic ethtool_rx_flow_spec to flow_rule structure translator

2018-11-21 Thread Jiri Pirko
Thu, Nov 22, 2018 at 05:57:31AM CET, f.faine...@gmail.com wrote:
>
>
>On 11/20/2018 6:51 PM, Pablo Neira Ayuso wrote:
>> This patch adds a function to translate the ethtool_rx_flow_spec
>> structure to the flow_rule representation.
>> 
>> This allows us to reuse code from the driver side given that both flower
>> and ethtool_rx_flow interfaces use the same representation.
>> 
>> Signed-off-by: Pablo Neira Ayuso 
>> ---
>> v3: Suggested by Jiri Pirko:
>> - Add struct ethtool_rx_flow_rule, keep placeholder to private
>>   dissector information.
>> Reported by Manish Chopra:
>>  - Fix incorrect dissector user_keys flags.
>> 
>>  include/linux/ethtool.h |  10 +++
>>  net/core/ethtool.c  | 189 
>> 
>>  2 files changed, 199 insertions(+)
>> 
>> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
>> index afd9596ce636..99849e0858b2 100644
>> --- a/include/linux/ethtool.h
>> +++ b/include/linux/ethtool.h
>> @@ -400,4 +400,14 @@ struct ethtool_ops {
>>  void(*get_ethtool_phy_stats)(struct net_device *,
>>   struct ethtool_stats *, u64 *);
>>  };
>> +
>> +struct ethtool_rx_flow_rule {
>> +struct flow_rule*rule;
>> +unsigned long   priv[0];
>> +};
>> +
>> +struct ethtool_rx_flow_rule *
>> +ethtool_rx_flow_rule_alloc(const struct ethtool_rx_flow_spec *fs);
>> +void ethtool_rx_flow_rule_free(struct ethtool_rx_flow_rule *rule);
>> +
>>  #endif /* _LINUX_ETHTOOL_H */
>> diff --git a/net/core/ethtool.c b/net/core/ethtool.c
>> index d05402868575..e679d6478371 100644
>> --- a/net/core/ethtool.c
>> +++ b/net/core/ethtool.c
>> @@ -28,6 +28,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  /*
>>   * Some useful ethtool_ops methods that're device independent.
>> @@ -2808,3 +2809,191 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
>>  
>>  return rc;
>>  }
>> +
>> +struct ethtool_rx_flow_key {
>> +struct flow_dissector_key_basic basic;
>> +union {
>> +struct flow_dissector_key_ipv4_addrsipv4;
>> +struct flow_dissector_key_ipv6_addrsipv6;
>> +};
>> +struct flow_dissector_key_ports tp;
>> +struct flow_dissector_key_ipip;
>> +} __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as 
>> longs. */
>> +
>> +struct ethtool_rx_flow_match {
>> +struct flow_dissector   dissector;
>> +struct ethtool_rx_flow_key  key;
>> +struct ethtool_rx_flow_key  mask;
>> +};
>> +
>> +struct ethtool_rx_flow_rule *
>> +ethtool_rx_flow_rule_alloc(const struct ethtool_rx_flow_spec *fs)
>
>This is more than alloc, it's allocate and map, no reason to split the
>two operations AFAICT, but the name could be improved, how about
>alloc_from()?

Or ethtool_rx_flow_rule_create() and ethtool_rx_flow_rule_destroy()


>
>> +{
>> +static struct in6_addr zero_addr = {};
>> +struct ethtool_rx_flow_match *match;
>> +struct ethtool_rx_flow_rule *flow;
>> +struct flow_action_entry *act;
>> +
>> +flow = kzalloc(sizeof(struct ethtool_rx_flow_rule) +
>> +   sizeof(struct ethtool_rx_flow_match), GFP_KERNEL);
>> +if (!flow)
>> +return NULL;
>> +
>> +/* ethtool_rx supports only one single action per rule. */
>> +flow->rule = flow_rule_alloc(1);
>> +if (!flow->rule) {
>> +kfree(flow);
>> +return NULL;
>> +}
>> +
>> +match = (struct ethtool_rx_flow_match *)flow->priv;
>> +flow->rule->match.dissector = >dissector;
>> +flow->rule->match.mask  = >mask;
>> +flow->rule->match.key   = >key;
>> +
>> +match->mask.basic.n_proto = 0x;
>> +
>> +switch (fs->flow_type & ~FLOW_EXT) {
>> +case TCP_V4_FLOW:
>> +case UDP_V4_FLOW: {
>> +const struct ethtool_tcpip4_spec *v4_spec, *v4_m_spec;
>> +
>> +match->key.basic.n_proto = htons(ETH_P_IP);
>> +
>> +v4_spec = >h_u.tcp_ip4_spec;
>> +v4_m_spec = >m_u.tcp_ip4_spec;
>> +
>> +if (v4_m_spec->ip4src) {
>> +match->key.ipv4.src = v4_spec-&

Re: [PATCH net] net/sched: act_police: add missing spinlock initialization

2018-11-21 Thread Jiri Pirko
Wed, Nov 21, 2018 at 06:23:53PM CET, dcara...@redhat.com wrote:
>commit f2cbd4852820 ("net/sched: act_police: fix race condition on state
>variables") introduces a new spinlock, but forgets its initialization.
>Ensure that tcf_police_init() initializes 'tcfp_lock' every time a 'police'
>action is newly created, to avoid the following lockdep splat:
>
> INFO: trying to register non-static key.
> the code is fine but needs lockdep annotation.
> turning off the locking correctness validator.
> <...>
> Call Trace:
>  dump_stack+0x85/0xcb
>  register_lock_class+0x581/0x590
>  __lock_acquire+0xd4/0x1330
>  ? tcf_police_init+0x2fa/0x650 [act_police]
>  ? lock_acquire+0x9e/0x1a0
>  lock_acquire+0x9e/0x1a0
>  ? tcf_police_init+0x2fa/0x650 [act_police]
>  ? tcf_police_init+0x55a/0x650 [act_police]
>  _raw_spin_lock_bh+0x34/0x40
>  ? tcf_police_init+0x2fa/0x650 [act_police]
>  tcf_police_init+0x2fa/0x650 [act_police]
>  tcf_action_init_1+0x384/0x4c0
>  tcf_action_init+0xf6/0x160
>  tcf_action_add+0x73/0x170
>  tc_ctl_action+0x122/0x160
>  rtnetlink_rcv_msg+0x2a4/0x490
>  ? netlink_deliver_tap+0x99/0x400
>  ? validate_linkmsg+0x370/0x370
>  netlink_rcv_skb+0x4d/0x130
>  netlink_unicast+0x196/0x230
>  netlink_sendmsg+0x2e5/0x3e0
>  sock_sendmsg+0x36/0x40
>  ___sys_sendmsg+0x280/0x2f0
>  ? _raw_spin_unlock+0x24/0x30
>  ? handle_pte_fault+0xafe/0xf30
>  ? find_held_lock+0x2d/0x90
>  ? syscall_trace_enter+0x1df/0x360
>  ? __sys_sendmsg+0x5e/0xa0
>  __sys_sendmsg+0x5e/0xa0
>  do_syscall_64+0x60/0x210
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x7f1841c7cf10
> Code: c3 48 8b 05 82 6f 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 
> 00 00 83 3d 8d d0 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 
> 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24
> RSP: 002b:7ffcf9df4d68 EFLAGS: 0246 ORIG_RAX: 002e
> RAX: ffda RBX: 0001 RCX: 7f1841c7cf10
> RDX:  RSI: 7ffcf9df4dc0 RDI: 0003
> RBP: 5bf56105 R08: 0002 R09: 7ffcf9df8edc
> R10: 7ffcf9df47e0 R11: 0246 R12: 00671be0
> R13: 7ffcf9df4e84 R14: 0008 R15: 0000
>
>Fixes: f2cbd4852820 ("net/sched: act_police: fix race condition on state 
>variables")
>Reported-by: Cong Wang 
>Signed-off-by: Davide Caratti 

Acked-by: Jiri Pirko 


Re: [iproute2-next PATCH v3 2/2] man: tc-flower: Add explanation for range option

2018-11-21 Thread Jiri Pirko
Wed, Nov 21, 2018 at 05:59:45AM CET, dsah...@gmail.com wrote:
>On 11/20/18 9:59 PM, Nambiar, Amritha wrote:
>> Oops, submitted the v2 patch for man changes too soon, without seeing
>> this. So, in this case, should I re-submit the iproute2-flower patch
>> that was accepted removing the 'range' keyword?
>
>I think so. Consistency across commands is a good thing.

+1


Re: [PATCH 00/12 net-next,v2] add flow_rule infrastructure

2018-11-20 Thread Jiri Pirko
Tue, Nov 20, 2018 at 06:16:40PM CET, da...@davemloft.net wrote:
>From: Jiri Pirko 
>Date: Tue, 20 Nov 2018 08:39:12 +0100
>
>> If later on the netfilter code will use it, through another
>> ndo/notifier/whatever, that is side a nice side-effect in my
>> opinion.
>
>Netfilter HW offloading is the main motivation of these changes.
>
>You can try to spin it any way you like, but I think this is pretty
>clear.
>
>Would the author of these changes be even be remotely interested in
>this "cleanup" in areas of code he has never been involved in if that
>were not the case?

No, but of course. I'm just saying that the cleanup is nice and handy
even if the code would never be used by netfilter. Therefore I think
the info is irrelevant for the review. Anyway, I get your point.


>
>I think it is very dishonest to portray the situation differently.
>
>Thank you.


Re: [PATCH net] net: skb_scrub_packet(): Scrub offload_fwd_mark

2018-11-20 Thread Jiri Pirko
Tue, Nov 20, 2018 at 12:39:56PM CET, pe...@mellanox.com wrote:
>When a packet is trapped and the corresponding SKB marked as
>already-forwarded, it retains this marking even after it is forwarded
>across veth links into another bridge. There, since it ingresses the
>bridge over veth, which doesn't have offload_fwd_mark, it triggers a
>warning in nbp_switchdev_frame_mark().
>
>Then nbp_switchdev_allowed_egress() decides not to allow egress from
>this bridge through another veth, because the SKB is already marked, and
>the mark (of 0) of course matches. Thus the packet is incorrectly
>blocked.
>
>Solve by resetting offload_fwd_mark() in skb_scrub_packet(). That
>function is called from tunnels and also from veth, and thus catches the
>cases where traffic is forwarded between bridges and transformed in a
>way that invalidates the marking.
>
>Fixes: 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for stacked 
>devices")
>Fixes: abf4bb6b63d0 ("skbuff: Add the offload_mr_fwd_mark field")
>Signed-off-by: Petr Machata 
>Suggested-by: Ido Schimmel 

Acked-by: Jiri Pirko 


Re: [PATCH 00/12 net-next,v2] add flow_rule infrastructure

2018-11-19 Thread Jiri Pirko
Mon, Nov 19, 2018 at 09:12:29PM CET, da...@davemloft.net wrote:
>From: Pablo Neira Ayuso 
>Date: Mon, 19 Nov 2018 01:15:07 +0100
>
>> This patchset introduces a kernel intermediate representation (IR) to
>> express ACL hardware offloads, as already described in previous RFC and
>> v1 patchset [1] [2]. The idea is to normalize the frontend U/APIs to use
>> the flow dissectors and the flow actions so drivers can reuse the
>> existing TC offload driver codebase - that has been converted to use the
>> flow_rule infrastructure.
>
>I'm go to bring up the elephant in the room.
>
>I think the real motivation here is to offload netfilter rules to HW,
>and you should be completely honest about that.

Sure, but this patchset is mainly about making the parsing code in
drivers common no matter from where the "flow rule" comes. If later on
the netfilter code will use it, through another ndo/notifier/whatever,
that is side a nice side-effect in my opinion.



Re: [PATCH net-next,v2 03/12] flow_dissector: add flow action infrastructure

2018-11-19 Thread Jiri Pirko
Mon, Nov 19, 2018 at 05:26:04PM CET, pa...@netfilter.org wrote:
>On Mon, Nov 19, 2018 at 03:03:22PM +0100, Jiri Pirko wrote:
>[...]
>> Maybe you can push this and related stuff into new files include/net/flow.h
>> and net/core/flow.c.
>
>Quick notice: These files already exist. Since you refer to _new_
>files, not the existing one, I propose to use net/core/flow_offload.c
>and include/net/flow_offload.h

Ok.


Re: [PATCH net-next 12/12] qede: use ethtool_rx_flow_rule() to remove duplicated parser code

2018-11-19 Thread Jiri Pirko
Mon, Nov 19, 2018 at 01:15:19AM CET, pa...@netfilter.org wrote:
>The qede driver supports for ethtool_rx_flow_spec and flower, both
>codebases look very similar.
>
>This patch uses the ethtool_rx_flow_rule() infrastructure to remove the
>duplicated ethtool_rx_flow_spec parser and consolidate ACL offload
>support around the flow_rule infrastructure.
>
>Furthermore, more code can be consolidated by merging
>qede_add_cls_rule() and qede_add_tc_flower_fltr(), these two functions
>also look very similar.
>
>This driver currently provides simple ACL support, such as 5-tuple
>matching, drop policy and queue to CPU.
>
>Drivers that support more features can benefit from this infrastructure
>to save even more redundant codebase.
>
>Signed-off-by: Pablo Neira Ayuso 
>---
>Note that, after this patch, qede_add_cls_rule() and
>qede_add_tc_flower_fltr() can be also consolidated since their code is
>redundant.
>
> drivers/net/ethernet/qlogic/qede/qede_filter.c | 246 ++---
> 1 file changed, 53 insertions(+), 193 deletions(-)
>
>diff --git a/drivers/net/ethernet/qlogic/qede/qede_filter.c 
>b/drivers/net/ethernet/qlogic/qede/qede_filter.c
>index aca302c3261b..f82b26ba8f80 100644
>--- a/drivers/net/ethernet/qlogic/qede/qede_filter.c
>+++ b/drivers/net/ethernet/qlogic/qede/qede_filter.c
>@@ -1578,30 +1578,6 @@ static void qede_flow_build_ipv6_hdr(struct 
>qede_arfs_tuple *t,
>   ports[1] = t->dst_port;
> }
> 
>-/* Validate fields which are set and not accepted by the driver */
>-static int qede_flow_spec_validate_unused(struct qede_dev *edev,
>-struct ethtool_rx_flow_spec *fs)
>-{
>-  if (fs->flow_type & FLOW_MAC_EXT) {
>-  DP_INFO(edev, "Don't support MAC extensions\n");
>-  return -EOPNOTSUPP;
>-  }
>-
>-  if ((fs->flow_type & FLOW_EXT) &&
>-  (fs->h_ext.vlan_etype || fs->h_ext.vlan_tci)) {
>-  DP_INFO(edev, "Don't support vlan-based classification\n");
>-  return -EOPNOTSUPP;
>-  }
>-
>-  if ((fs->flow_type & FLOW_EXT) &&
>-  (fs->h_ext.data[0] || fs->h_ext.data[1])) {
>-  DP_INFO(edev, "Don't support user defined data\n");
>-  return -EOPNOTSUPP;
>-  }
>-
>-  return 0;
>-}
>-
> static int qede_set_v4_tuple_to_profile(struct qede_dev *edev,
>   struct qede_arfs_tuple *t)
> {
>@@ -1665,132 +1641,6 @@ static int qede_set_v6_tuple_to_profile(struct 
>qede_dev *edev,
>   return 0;
> }
> 
>-static int qede_flow_spec_to_tuple_ipv4_common(struct qede_dev *edev,
>- struct qede_arfs_tuple *t,
>- struct ethtool_rx_flow_spec *fs)
>-{
>-  if ((fs->h_u.tcp_ip4_spec.ip4src &
>-   fs->m_u.tcp_ip4_spec.ip4src) != fs->h_u.tcp_ip4_spec.ip4src) {
>-  DP_INFO(edev, "Don't support IP-masks\n");
>-  return -EOPNOTSUPP;
>-  }
>-
>-  if ((fs->h_u.tcp_ip4_spec.ip4dst &
>-   fs->m_u.tcp_ip4_spec.ip4dst) != fs->h_u.tcp_ip4_spec.ip4dst) {
>-  DP_INFO(edev, "Don't support IP-masks\n");
>-  return -EOPNOTSUPP;
>-  }
>-
>-  if ((fs->h_u.tcp_ip4_spec.psrc &
>-   fs->m_u.tcp_ip4_spec.psrc) != fs->h_u.tcp_ip4_spec.psrc) {
>-  DP_INFO(edev, "Don't support port-masks\n");
>-  return -EOPNOTSUPP;
>-  }
>-
>-  if ((fs->h_u.tcp_ip4_spec.pdst &
>-   fs->m_u.tcp_ip4_spec.pdst) != fs->h_u.tcp_ip4_spec.pdst) {
>-  DP_INFO(edev, "Don't support port-masks\n");
>-  return -EOPNOTSUPP;
>-  }
>-
>-  if (fs->h_u.tcp_ip4_spec.tos) {
>-  DP_INFO(edev, "Don't support tos\n");
>-  return -EOPNOTSUPP;
>-  }
>-
>-  t->eth_proto = htons(ETH_P_IP);
>-  t->src_ipv4 = fs->h_u.tcp_ip4_spec.ip4src;
>-  t->dst_ipv4 = fs->h_u.tcp_ip4_spec.ip4dst;
>-  t->src_port = fs->h_u.tcp_ip4_spec.psrc;
>-  t->dst_port = fs->h_u.tcp_ip4_spec.pdst;
>-
>-  return qede_set_v4_tuple_to_profile(edev, t);
>-}
>-
>-static int qede_flow_spec_to_tuple_tcpv4(struct qede_dev *edev,
>-   struct qede_arfs_tuple *t,
>-   struct ethtool_rx_flow_spec *fs)
>-{
>-  t->ip_proto = IPPROTO_TCP;
>-
>-  if (qede_flow_spec_to_tuple_ipv4_common(edev, t, fs))
>-  return -EINVAL;
>-
>-  return 0;
>-}
>-
>-static int qede_flow_spec_to_tuple_udpv4(struct qede_dev *edev,
>-   struct qede_arfs_tuple *t,
>-   struct ethtool_rx_flow_spec *fs)
>-{
>-  t->ip_proto = IPPROTO_UDP;
>-
>-  if (qede_flow_spec_to_tuple_ipv4_common(edev, t, fs))
>-  return -EINVAL;
>-
>-  return 0;
>-}
>-
>-static int qede_flow_spec_to_tuple_ipv6_common(struct qede_dev *edev,
>- struct qede_arfs_tuple *t,
>- 

Re: [PATCH net-next,v2 05/12] cls_flower: add statistics retrieval infrastructure and use it

2018-11-19 Thread Jiri Pirko
Mon, Nov 19, 2018 at 03:48:50PM CET, pa...@netfilter.org wrote:
>On Mon, Nov 19, 2018 at 02:57:05PM +0100, Jiri Pirko wrote:
>> Mon, Nov 19, 2018 at 01:15:12AM CET, pa...@netfilter.org wrote:
>[...]
>> >diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>> >index 7d7aefa5fcd2..7f9a8d5ca945 100644
>> >--- a/include/net/pkt_cls.h
>> >+++ b/include/net/pkt_cls.h
>> >@@ -758,6 +758,12 @@ enum tc_fl_command {
>> >TC_CLSFLOWER_TMPLT_DESTROY,
>> > };
>> > 
>> >+struct tc_cls_flower_stats {
>> >+   u64 pkts;
>> >+   u64 bytes;
>> >+   u64 lastused;
>> >+};
>> >+
>> > struct tc_cls_flower_offload {
>> >struct tc_cls_common_offload common;
>> >enum tc_fl_command command;
>> >@@ -765,6 +771,7 @@ struct tc_cls_flower_offload {
>> >struct flow_rule rule;
>> >struct tcf_exts *exts;
>> >u32 classid;
>> >+   struct tc_cls_flower_stats stats;
>> > };
>> > 
>> > static inline struct flow_rule *
>> >@@ -773,6 +780,14 @@ tc_cls_flower_offload_flow_rule(struct 
>> >tc_cls_flower_offload *tc_flow_cmd)
>> >return _flow_cmd->rule;
>> > }
>> > 
>> >+static inline void tc_cls_flower_stats_update(struct tc_cls_flower_offload 
>> >*cls_flower,
>> >+ u64 pkts, u64 bytes, u64 lastused)
>> >+{
>> >+   cls_flower->stats.pkts  = pkts;
>> >+   cls_flower->stats.bytes = bytes;
>> >+   cls_flower->stats.lastused  = lastused;
>> 
>> Why do you need to store the values here in struct tc_cls_flower_offload?
>> Why don't you just call tcf_exts_stats_update()? Basically,
>> tc_cls_flower_stats_update() should be just wrapper around
>> tcf_exts_stats_update() so that drivers wouldn't use ->exts directly, as
>> you will remove them in follow-up patches, no?
>
>Patch 07/12 stops exposing tc action exts to drivers, so we need a
>structure (struct tc_cls_flower_stats) to convey this statistics back
>to the cls_flower frontend.

Hmm, shouldn't these stats be rather flow_rule related than flower
related?



Re: [PATCH net-next,v2 09/12] flow_dissector: add basic ethtool_rx_flow_spec to flow_rule structure translator

2018-11-19 Thread Jiri Pirko
Mon, Nov 19, 2018 at 01:15:16AM CET, pa...@netfilter.org wrote:
>This patch adds a function to translate the ethtool_rx_flow_spec
>structure to the flow_rule representation.
>
>This allows us to reuse code from the driver side given that both flower
>and ethtool_rx_flow interfaces use the same representation.
>
>Signed-off-by: Pablo Neira Ayuso 
>---
>v2: no changes.
>
> include/net/flow_dissector.h |   5 ++
> net/core/flow_dissector.c| 190 +++
> 2 files changed, 195 insertions(+)
>
>diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
>index 7a4683646d5a..ec9036232538 100644
>--- a/include/net/flow_dissector.h
>+++ b/include/net/flow_dissector.h
>@@ -485,4 +485,9 @@ static inline bool flow_rule_match_key(const struct 
>flow_rule *rule,
>   return dissector_uses_key(rule->match.dissector, key);
> }
> 
>+struct ethtool_rx_flow_spec;
>+
>+struct flow_rule *ethtool_rx_flow_rule(const struct ethtool_rx_flow_spec *fs);
>+void ethtool_rx_flow_rule_free(struct flow_rule *rule);
>+
> #endif
>diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
>index b9368349f0f7..ef5bdb62620c 100644
>--- a/net/core/flow_dissector.c
>+++ b/net/core/flow_dissector.c
>@@ -17,6 +17,7 @@
> #include 
> #include 
> #include 
>+#include 
> #include 
> #include 
> #include 
>@@ -276,6 +277,195 @@ void flow_action_free(struct flow_action *flow_action)
> }
> EXPORT_SYMBOL(flow_action_free);
> 
>+struct ethtool_rx_flow_key {
>+  struct flow_dissector_key_basic basic;
>+  union {
>+  struct flow_dissector_key_ipv4_addrsipv4;
>+  struct flow_dissector_key_ipv6_addrsipv6;
>+  };
>+  struct flow_dissector_key_ports tp;
>+  struct flow_dissector_key_ipip;
>+} __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as 
>longs. */
>+
>+struct ethtool_rx_flow_match {
>+  struct flow_dissector   dissector;
>+  struct ethtool_rx_flow_key  key;
>+  struct ethtool_rx_flow_key  mask;
>+};
>+
>+struct flow_rule *ethtool_rx_flow_rule(const struct ethtool_rx_flow_spec *fs)
>+{
>+  static struct in6_addr zero_addr = {};
>+  struct ethtool_rx_flow_match *match;
>+  struct flow_action_key *act;
>+  struct flow_rule *rule;
>+
>+  rule = kmalloc(sizeof(struct flow_rule), GFP_KERNEL);

Allocating struct flow_rule manually here seems wrong. There should
be some helpers. Please see below.***


>+  if (!rule)
>+  return NULL;
>+
>+  match = kzalloc(sizeof(struct ethtool_rx_flow_match), GFP_KERNEL);
>+  if (!match)
>+  goto err_match;
>+
>+  rule->match.dissector   = >dissector;
>+  rule->match.mask= >mask;
>+  rule->match.key = >key;
>+
>+  match->mask.basic.n_proto = 0x;
>+
>+  switch (fs->flow_type & ~FLOW_EXT) {
>+  case TCP_V4_FLOW:
>+  case UDP_V4_FLOW: {
>+  const struct ethtool_tcpip4_spec *v4_spec, *v4_m_spec;
>+
>+  match->key.basic.n_proto = htons(ETH_P_IP);
>+
>+  v4_spec = >h_u.tcp_ip4_spec;
>+  v4_m_spec = >m_u.tcp_ip4_spec;
>+
>+  if (v4_m_spec->ip4src) {
>+  match->key.ipv4.src = v4_spec->ip4src;
>+  match->mask.ipv4.src = v4_m_spec->ip4src;
>+  }
>+  if (v4_m_spec->ip4dst) {
>+  match->key.ipv4.dst = v4_spec->ip4dst;
>+  match->mask.ipv4.dst = v4_m_spec->ip4dst;
>+  }
>+  if (v4_m_spec->ip4src ||
>+  v4_m_spec->ip4dst) {
>+  match->dissector.used_keys |=
>+  FLOW_DISSECTOR_KEY_IPV4_ADDRS;
>+  match->dissector.offset[FLOW_DISSECTOR_KEY_IPV4_ADDRS] =
>+  offsetof(struct ethtool_rx_flow_key, ipv4);
>+  }
>+  if (v4_m_spec->psrc) {
>+  match->key.tp.src = v4_spec->psrc;
>+  match->mask.tp.src = v4_m_spec->psrc;
>+  }
>+  if (v4_m_spec->pdst) {
>+  match->key.tp.dst = v4_spec->pdst;
>+  match->mask.tp.dst = v4_m_spec->pdst;
>+  }
>+  if (v4_m_spec->psrc ||
>+  v4_m_spec->pdst) {
>+  match->dissector.used_keys |= FLOW_DISSECTOR_KEY_PORTS;
>+  match->dissector.offset[FLOW_DISSECTOR_KEY_PORTS] =
>+  offsetof(struct ethtool_rx_flow_key, tp);
>+  }
>+  if (v4_m_spec->tos) {
>+  match->key.ip.tos = v4_spec->pdst;
>+  match->mask.ip.tos = v4_m_spec->pdst;
>+  match->dissector.used_keys |= FLOW_DISSECTOR_KEY_IP;
>+  match->dissector.offset[FLOW_DISSECTOR_KEY_IP] =
>+  offsetof(struct 

Re: [PATCH net-next,v2 09/12] flow_dissector: add basic ethtool_rx_flow_spec to flow_rule structure translator

2018-11-19 Thread Jiri Pirko
Mon, Nov 19, 2018 at 01:15:16AM CET, pa...@netfilter.org wrote:
>This patch adds a function to translate the ethtool_rx_flow_spec
>structure to the flow_rule representation.
>
>This allows us to reuse code from the driver side given that both flower
>and ethtool_rx_flow interfaces use the same representation.
>
>Signed-off-by: Pablo Neira Ayuso 
>---
>v2: no changes.
>
> include/net/flow_dissector.h |   5 ++
> net/core/flow_dissector.c| 190 +++
> 2 files changed, 195 insertions(+)
>
>diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
>index 7a4683646d5a..ec9036232538 100644
>--- a/include/net/flow_dissector.h
>+++ b/include/net/flow_dissector.h
>@@ -485,4 +485,9 @@ static inline bool flow_rule_match_key(const struct 
>flow_rule *rule,
>   return dissector_uses_key(rule->match.dissector, key);
> }
> 
>+struct ethtool_rx_flow_spec;
>+
>+struct flow_rule *ethtool_rx_flow_rule(const struct ethtool_rx_flow_spec *fs);
>+void ethtool_rx_flow_rule_free(struct flow_rule *rule);
>+
> #endif
>diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
>index b9368349f0f7..ef5bdb62620c 100644
>--- a/net/core/flow_dissector.c
>+++ b/net/core/flow_dissector.c
>@@ -17,6 +17,7 @@
> #include 
> #include 
> #include 
>+#include 
> #include 
> #include 
> #include 
>@@ -276,6 +277,195 @@ void flow_action_free(struct flow_action *flow_action)
> }
> EXPORT_SYMBOL(flow_action_free);
> 
>+struct ethtool_rx_flow_key {
>+  struct flow_dissector_key_basic basic;
>+  union {
>+  struct flow_dissector_key_ipv4_addrsipv4;
>+  struct flow_dissector_key_ipv6_addrsipv6;
>+  };
>+  struct flow_dissector_key_ports tp;
>+  struct flow_dissector_key_ipip;
>+} __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as 
>longs. */
>+
>+struct ethtool_rx_flow_match {
>+  struct flow_dissector   dissector;
>+  struct ethtool_rx_flow_key  key;
>+  struct ethtool_rx_flow_key  mask;
>+};
>+
>+struct flow_rule *ethtool_rx_flow_rule(const struct ethtool_rx_flow_spec *fs)
>+{
>+  static struct in6_addr zero_addr = {};
>+  struct ethtool_rx_flow_match *match;
>+  struct flow_action_key *act;
>+  struct flow_rule *rule;
>+
>+  rule = kmalloc(sizeof(struct flow_rule), GFP_KERNEL);
>+  if (!rule)
>+  return NULL;
>+
>+  match = kzalloc(sizeof(struct ethtool_rx_flow_match), GFP_KERNEL);
>+  if (!match)
>+  goto err_match;
>+
>+  rule->match.dissector   = >dissector;
>+  rule->match.mask= >mask;
>+  rule->match.key = >key;
>+
>+  match->mask.basic.n_proto = 0x;
>+
>+  switch (fs->flow_type & ~FLOW_EXT) {
>+  case TCP_V4_FLOW:
>+  case UDP_V4_FLOW: {
>+  const struct ethtool_tcpip4_spec *v4_spec, *v4_m_spec;
>+
>+  match->key.basic.n_proto = htons(ETH_P_IP);
>+
>+  v4_spec = >h_u.tcp_ip4_spec;
>+  v4_m_spec = >m_u.tcp_ip4_spec;
>+
>+  if (v4_m_spec->ip4src) {
>+  match->key.ipv4.src = v4_spec->ip4src;
>+  match->mask.ipv4.src = v4_m_spec->ip4src;
>+  }
>+  if (v4_m_spec->ip4dst) {
>+  match->key.ipv4.dst = v4_spec->ip4dst;
>+  match->mask.ipv4.dst = v4_m_spec->ip4dst;
>+  }
>+  if (v4_m_spec->ip4src ||
>+  v4_m_spec->ip4dst) {
>+  match->dissector.used_keys |=
>+  FLOW_DISSECTOR_KEY_IPV4_ADDRS;
>+  match->dissector.offset[FLOW_DISSECTOR_KEY_IPV4_ADDRS] =
>+  offsetof(struct ethtool_rx_flow_key, ipv4);
>+  }
>+  if (v4_m_spec->psrc) {
>+  match->key.tp.src = v4_spec->psrc;
>+  match->mask.tp.src = v4_m_spec->psrc;
>+  }
>+  if (v4_m_spec->pdst) {
>+  match->key.tp.dst = v4_spec->pdst;
>+  match->mask.tp.dst = v4_m_spec->pdst;
>+  }
>+  if (v4_m_spec->psrc ||
>+  v4_m_spec->pdst) {
>+  match->dissector.used_keys |= FLOW_DISSECTOR_KEY_PORTS;
>+  match->dissector.offset[FLOW_DISSECTOR_KEY_PORTS] =
>+  offsetof(struct ethtool_rx_flow_key, tp);
>+  }
>+  if (v4_m_spec->tos) {
>+  match->key.ip.tos = v4_spec->pdst;
>+  match->mask.ip.tos = v4_m_spec->pdst;
>+  match->dissector.used_keys |= FLOW_DISSECTOR_KEY_IP;
>+  match->dissector.offset[FLOW_DISSECTOR_KEY_IP] =
>+  offsetof(struct ethtool_rx_flow_key, ip);
>+  }
>+  }
>+  break;
>+  case 

Re: [PATCH net-next,v2 03/12] flow_dissector: add flow action infrastructure

2018-11-19 Thread Jiri Pirko
Mon, Nov 19, 2018 at 01:15:10AM CET, pa...@netfilter.org wrote:
>This new infrastructure defines the nic actions that you can perform
>from existing network drivers. This infrastructure allows us to avoid a
>direct dependency with the native software TC action representation.
>
>Signed-off-by: Pablo Neira Ayuso 
>---
>v2: no changes.
>
> include/net/flow_dissector.h | 70 
> net/core/flow_dissector.c| 18 
> 2 files changed, 88 insertions(+)
>
>diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
>index 965a82b8d881..925c208816f1 100644
>--- a/include/net/flow_dissector.h
>+++ b/include/net/flow_dissector.h
>@@ -402,8 +402,78 @@ void flow_rule_match_enc_keyid(const struct flow_rule 
>*rule,
> void flow_rule_match_enc_opts(const struct flow_rule *rule,
> struct flow_match_enc_opts *out);
> 
>+enum flow_action_key_id {
>+  FLOW_ACTION_KEY_ACCEPT  = 0,
>+  FLOW_ACTION_KEY_DROP,
>+  FLOW_ACTION_KEY_TRAP,
>+  FLOW_ACTION_KEY_GOTO,
>+  FLOW_ACTION_KEY_REDIRECT,
>+  FLOW_ACTION_KEY_MIRRED,
>+  FLOW_ACTION_KEY_VLAN_PUSH,
>+  FLOW_ACTION_KEY_VLAN_POP,
>+  FLOW_ACTION_KEY_VLAN_MANGLE,
>+  FLOW_ACTION_KEY_TUNNEL_ENCAP,
>+  FLOW_ACTION_KEY_TUNNEL_DECAP,
>+  FLOW_ACTION_KEY_MANGLE,
>+  FLOW_ACTION_KEY_ADD,
>+  FLOW_ACTION_KEY_CSUM,
>+  FLOW_ACTION_KEY_MARK,
>+};
>+
>+/* This is mirroring enum pedit_header_type definition for easy mapping 
>between
>+ * tc pedit action. Legacy TCA_PEDIT_KEY_EX_HDR_TYPE_NETWORK is mapped to
>+ * FLOW_ACT_MANGLE_UNSPEC, which is supported by no driver.
>+ */
>+enum flow_act_mangle_base {
>+  FLOW_ACT_MANGLE_UNSPEC  = 0,
>+  FLOW_ACT_MANGLE_HDR_TYPE_ETH,
>+  FLOW_ACT_MANGLE_HDR_TYPE_IP4,
>+  FLOW_ACT_MANGLE_HDR_TYPE_IP6,
>+  FLOW_ACT_MANGLE_HDR_TYPE_TCP,
>+  FLOW_ACT_MANGLE_HDR_TYPE_UDP,
>+};
>+
>+struct flow_action_key {
>+  enum flow_action_key_id id;
>+  union {
>+  u32 chain_index;/* FLOW_ACTION_KEY_GOTO 
>*/
>+  struct net_device   *dev;   /* 
>FLOW_ACTION_KEY_REDIRECT */
>+  struct {/* FLOW_ACTION_KEY_VLAN 
>*/
>+  u16 vid;
>+  __be16  proto;
>+  u8  prio;
>+  } vlan;
>+  struct {/* 
>FLOW_ACTION_KEY_PACKET_EDIT */
>+  enum flow_act_mangle_base htype;
>+  u32 offset;
>+  u32 mask;
>+  u32 val;
>+  } mangle;
>+  const struct ip_tunnel_info *tunnel;/* 
>FLOW_ACTION_KEY_TUNNEL_ENCAP */
>+  u32 csum_flags; /* FLOW_ACTION_KEY_CSUM 
>*/
>+  u32 mark;   /* FLOW_ACTION_KEY_MARK 
>*/
>+  };
>+};
>+
>+struct flow_action {

Hmm, thinking about it a bit more, none of this is is related to flow
dissector so it is misleading to put the code in flow_dissector.[hc].

Maybe you can push this and related stuff into new files include/net/flow.h
and net/core/flow.c.



>+  int num_keys;
>+  struct flow_action_key  *keys;
>+};
>+
>+int flow_action_init(struct flow_action *flow_action, int num_acts);
>+void flow_action_free(struct flow_action *flow_action);
>+
>+static inline bool flow_action_has_keys(const struct flow_action *action)
>+{
>+  return action->num_keys;
>+}
>+
>+#define flow_action_for_each(__i, __act, __actions)   \
>+for (__i = 0, __act = &(__actions)->keys[0]; __i < 
>(__actions)->num_keys; __act = &(__actions)->keys[++__i])
>+
> struct flow_rule {
>   struct flow_match   match;
>+  struct flow_action  action;
> };
> 
> static inline bool flow_rule_match_key(const struct flow_rule *rule,
>diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
>index 186089b8d852..b9368349f0f7 100644
>--- a/net/core/flow_dissector.c
>+++ b/net/core/flow_dissector.c
>@@ -258,6 +258,24 @@ void flow_rule_match_enc_opts(const struct flow_rule 
>*rule,
> }
> EXPORT_SYMBOL(flow_rule_match_enc_opts);
> 
>+int flow_action_init(struct flow_action *flow_action, int num_acts)
>+{
>+  flow_action->keys = kmalloc(sizeof(struct flow_action_key) * num_acts,
>+  GFP_KERNEL);
>+  if (!flow_action->keys)
>+  return -ENOMEM;
>+
>+  flow_action->num_keys = num_acts;
>+  return 0;
>+}
>+EXPORT_SYMBOL(flow_action_init);
>+
>+void flow_action_free(struct flow_action *flow_action)
>+{
>+  kfree(flow_action->keys);
>+}
>+EXPORT_SYMBOL(flow_action_free);
>+
> /**
>  * __skb_flow_get_ports - extract the upper layer ports and return them
>  * @skb: sk_buff to extract the ports from
>-- 
>2.11.0
>


Re: [PATCH net-next,v2 08/12] flow_dissector: add wake-up-on-lan and queue to flow_action

2018-11-19 Thread Jiri Pirko
Mon, Nov 19, 2018 at 01:15:15AM CET, pa...@netfilter.org wrote:
>These actions need to be added to support bcm sf2 features available
>through the ethtool_rx_flow interface.
>
>Reviewed-by: Florian Fainelli 
>Signed-off-by: Pablo Neira Ayuso 

For me it would be nicer to have this as 2 patches, one per action type.
But up to you.


>---
>v2: no changes.
>
> include/net/flow_dissector.h | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
>index 925c208816f1..7a4683646d5a 100644
>--- a/include/net/flow_dissector.h
>+++ b/include/net/flow_dissector.h
>@@ -418,6 +418,8 @@ enum flow_action_key_id {
>   FLOW_ACTION_KEY_ADD,
>   FLOW_ACTION_KEY_CSUM,
>   FLOW_ACTION_KEY_MARK,
>+  FLOW_ACTION_KEY_WAKE,
>+  FLOW_ACTION_KEY_QUEUE,
> };
> 
> /* This is mirroring enum pedit_header_type definition for easy mapping 
> between
>@@ -452,6 +454,7 @@ struct flow_action_key {
>   const struct ip_tunnel_info *tunnel;/* 
> FLOW_ACTION_KEY_TUNNEL_ENCAP */
>   u32 csum_flags; /* FLOW_ACTION_KEY_CSUM 
> */
>   u32 mark;   /* FLOW_ACTION_KEY_MARK 
> */
>+  u32 queue_index;/* 
>FLOW_ACTION_KEY_QUEUE */
>   };
> };
> 
>-- 
>2.11.0
>


Re: [PATCH net-next,v2 05/12] cls_flower: add statistics retrieval infrastructure and use it

2018-11-19 Thread Jiri Pirko
Mon, Nov 19, 2018 at 01:15:12AM CET, pa...@netfilter.org wrote:
>This patch provides a tc_cls_flower_stats structure that acts as
>container for tc_cls_flower_offload, then we can use to restore the
>statistics on the existing TC actions. Hence, tcf_exts_stats_update() is
>not used from drivers.
>
>Signed-off-by: Pablo Neira Ayuso 
>---
>v2: no changes.
>
> drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c  |  4 ++--
> drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c  |  6 +++---
> drivers/net/ethernet/mellanox/mlx5/core/en_tc.c   |  2 +-
> drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c |  2 +-
> drivers/net/ethernet/netronome/nfp/flower/offload.c   |  6 +++---
> include/net/pkt_cls.h | 15 +++
> net/sched/cls_flower.c|  4 
> 7 files changed, 29 insertions(+), 10 deletions(-)
>
>diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c 
>b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
>index b82143d6cdde..3d71b2530d67 100644
>--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
>+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
>@@ -1366,8 +1366,8 @@ static int bnxt_tc_get_flow_stats(struct bnxt *bp,
>   lastused = flow->lastused;
>   spin_unlock(>stats_lock);
> 
>-  tcf_exts_stats_update(tc_flow_cmd->exts, stats.bytes, stats.packets,
>-lastused);
>+  tc_cls_flower_stats_update(tc_flow_cmd, stats.bytes, stats.packets,
>+ lastused);
>   return 0;
> }
> 
>diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c 
>b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
>index 39c5af5dad3d..2c7d1aebe214 100644
>--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
>+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
>@@ -807,9 +807,9 @@ int cxgb4_tc_flower_stats(struct net_device *dev,
>   if (ofld_stats->packet_count != packets) {
>   if (ofld_stats->prev_packet_count != packets)
>   ofld_stats->last_used = jiffies;
>-  tcf_exts_stats_update(cls->exts, bytes - ofld_stats->byte_count,
>-packets - ofld_stats->packet_count,
>-ofld_stats->last_used);
>+  tc_cls_flower_stats_update(cls, bytes - ofld_stats->byte_count,
>+ packets - ofld_stats->packet_count,
>+ ofld_stats->last_used);
> 
>   ofld_stats->packet_count = packets;
>   ofld_stats->byte_count = bytes;
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
>b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
>index 2645e5d1e790..c5f0b826fa91 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
>@@ -3224,7 +3224,7 @@ int mlx5e_stats_flower(struct mlx5e_priv *priv,
> 
>   mlx5_fc_query_cached(counter, , , );
> 
>-  tcf_exts_stats_update(f->exts, bytes, packets, lastuse);
>+  tc_cls_flower_stats_update(f, bytes, packets, lastuse);
> 
>   return 0;
> }
>diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c 
>b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
>index 193a6f9acf79..3398984ffb2a 100644
>--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
>+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
>@@ -460,7 +460,7 @@ int mlxsw_sp_flower_stats(struct mlxsw_sp *mlxsw_sp,
>   if (err)
>   goto err_rule_get_stats;
> 
>-  tcf_exts_stats_update(f->exts, bytes, packets, lastuse);
>+  tc_cls_flower_stats_update(f, bytes, packets, lastuse);
> 
>   mlxsw_sp_acl_ruleset_put(mlxsw_sp, ruleset);
>   return 0;
>diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c 
>b/drivers/net/ethernet/netronome/nfp/flower/offload.c
>index 708331234908..bec74d84756c 100644
>--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
>+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
>@@ -532,9 +532,9 @@ nfp_flower_get_stats(struct nfp_app *app, struct 
>net_device *netdev,
>   ctx_id = be32_to_cpu(nfp_flow->meta.host_ctx_id);
> 
>   spin_lock_bh(>stats_lock);
>-  tcf_exts_stats_update(flow->exts, priv->stats[ctx_id].bytes,
>-priv->stats[ctx_id].pkts,
>-priv->stats[ctx_id].used);
>+  tc_cls_flower_stats_update(flow, priv->stats[ctx_id].bytes,
>+ priv->stats[ctx_id].pkts,
>+ priv->stats[ctx_id].used);
> 
>   priv->stats[ctx_id].pkts = 0;
>   priv->stats[ctx_id].bytes = 0;
>diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>index 7d7aefa5fcd2..7f9a8d5ca945 100644
>--- a/include/net/pkt_cls.h
>+++ b/include/net/pkt_cls.h
>@@ -758,6 +758,12 @@ enum tc_fl_command {
>   TC_CLSFLOWER_TMPLT_DESTROY,
> };
> 
>+struct tc_cls_flower_stats {
>+  

Re: [PATCH net-next,v2 04/12] cls_api: add translator to flow_action representation

2018-11-19 Thread Jiri Pirko
Mon, Nov 19, 2018 at 02:21:41PM CET, pa...@netfilter.org wrote:
>On Mon, Nov 19, 2018 at 01:12:51PM +0100, Jiri Pirko wrote:
>> Mon, Nov 19, 2018 at 01:15:11AM CET, pa...@netfilter.org wrote:
>> >@@ -2567,6 +2575,111 @@ int tc_setup_cb_call(struct tcf_block *block, 
>> >struct tcf_exts *exts,
>> > }
>> > EXPORT_SYMBOL(tc_setup_cb_call);
>> > 
>> >+int tc_setup_flow_action(struct flow_action *flow_action,
>> >+const struct tcf_exts *exts)
>> >+{
>> >+   const struct tc_action *act;
>> >+   int num_acts = 0, i, j, k;
>> >+
>> >+   if (!exts)
>> >+   return 0;
>> >+
>> >+   tcf_exts_for_each_action(i, act, exts) {
>> >+   if (is_tcf_pedit(act))
>> >+   num_acts += tcf_pedit_nkeys(act);
>> >+   else
>> >+   num_acts++;
>> >+   }
>> >+   if (!num_acts)
>> >+   return 0;
>> >+
>> >+   if (flow_action_init(flow_action, num_acts) < 0)
>> 
>> This is actually a "alloc" function. And the counterpart is "free".
>
>I can rename it to _alloc() if you prefer.
>
>> How about to allocate the container struct which would have the [0]
>> trick for the array of action?
>
>You mean turn *keys into keys[0] stub in struct flow_action? This is
>embedded into struct tc_cls_flower_offload, I may need to make a
>second look but I think it won't fly.
>
>BTW, side note: I will rename keys to "array" given keys is not
>semantically appropriate as you mentioned, BTW.

What I suggest is this:

struct flow_actions {
   unsinged int action_count;
   struct flow_action action[0];
};


And then to have 
struct flow_actions *flow_actions_alloc(unsigned int action_count)
{
return kzalloc(sizeof(struct flow_actions) + sizeof(struct flow_action) 
* action_count, ..);
}

Something like this.


>
>Thanks!


Re: [PATCH net-next,v2 03/12] flow_dissector: add flow action infrastructure

2018-11-19 Thread Jiri Pirko
Mon, Nov 19, 2018 at 01:35:48PM CET, pa...@netfilter.org wrote:
>On Mon, Nov 19, 2018 at 12:56:23PM +0100, Jiri Pirko wrote:
>> Mon, Nov 19, 2018 at 01:15:10AM CET, pa...@netfilter.org wrote:
>> >This new infrastructure defines the nic actions that you can perform
>> >from existing network drivers. This infrastructure allows us to avoid a
>> >direct dependency with the native software TC action representation.
>> >
>> >Signed-off-by: Pablo Neira Ayuso 
>> >---
>> >v2: no changes.
>> >
>> > include/net/flow_dissector.h | 70 
>> > 
>> > net/core/flow_dissector.c| 18 
>> > 2 files changed, 88 insertions(+)
>> >
>> >diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
>> >index 965a82b8d881..925c208816f1 100644
>> >--- a/include/net/flow_dissector.h
>> >+++ b/include/net/flow_dissector.h
>> >@@ -402,8 +402,78 @@ void flow_rule_match_enc_keyid(const struct flow_rule 
>> >*rule,
>> > void flow_rule_match_enc_opts(const struct flow_rule *rule,
>> >  struct flow_match_enc_opts *out);
>> > 
>> >+enum flow_action_key_id {
>> 
>> Why "key"? Why not just "flow_action_id"
>
>Sure, will rename this.
>
>> >+   FLOW_ACTION_KEY_ACCEPT  = 0,
>> >+   FLOW_ACTION_KEY_DROP,
>> >+   FLOW_ACTION_KEY_TRAP,
>> >+   FLOW_ACTION_KEY_GOTO,
>> >+   FLOW_ACTION_KEY_REDIRECT,
>> >+   FLOW_ACTION_KEY_MIRRED,
>> >+   FLOW_ACTION_KEY_VLAN_PUSH,
>> >+   FLOW_ACTION_KEY_VLAN_POP,
>> >+   FLOW_ACTION_KEY_VLAN_MANGLE,
>> >+   FLOW_ACTION_KEY_TUNNEL_ENCAP,
>> >+   FLOW_ACTION_KEY_TUNNEL_DECAP,
>> >+   FLOW_ACTION_KEY_MANGLE,
>> >+   FLOW_ACTION_KEY_ADD,
>> >+   FLOW_ACTION_KEY_CSUM,
>> >+   FLOW_ACTION_KEY_MARK,
>
>I assume I should remove _KEY_ from this enum definitions too.

Sure.

>
>> >+};
>> >+
>> >+/* This is mirroring enum pedit_header_type definition for easy mapping 
>> >between
>> >+ * tc pedit action. Legacy TCA_PEDIT_KEY_EX_HDR_TYPE_NETWORK is mapped to
>> >+ * FLOW_ACT_MANGLE_UNSPEC, which is supported by no driver.
>> >+ */
>> >+enum flow_act_mangle_base {
>> 
>> Please be consistent in naming: "act" vs "action"
>
>OK.
>
>> >+   FLOW_ACT_MANGLE_UNSPEC  = 0,
>> >+   FLOW_ACT_MANGLE_HDR_TYPE_ETH,
>> >+   FLOW_ACT_MANGLE_HDR_TYPE_IP4,
>> >+   FLOW_ACT_MANGLE_HDR_TYPE_IP6,
>> >+   FLOW_ACT_MANGLE_HDR_TYPE_TCP,
>> >+   FLOW_ACT_MANGLE_HDR_TYPE_UDP,
>> >+};
>> >+
>> >+struct flow_action_key {
>> 
>> And here "struct flow_action"
>
>OK.
>
>> >+   enum flow_action_key_id id;
>> >+   union {
>> >+   u32 chain_index;/* FLOW_ACTION_KEY_GOTO 
>> >*/
>> >+   struct net_device   *dev;   /* 
>> >FLOW_ACTION_KEY_REDIRECT */
>> >+   struct {/* FLOW_ACTION_KEY_VLAN 
>> >*/
>> >+   u16 vid;
>> >+   __be16  proto;
>> >+   u8  prio;
>> >+   } vlan;
>> >+   struct {/* 
>> >FLOW_ACTION_KEY_PACKET_EDIT */
>> >+   enum flow_act_mangle_base htype;
>> >+   u32 offset;
>> >+   u32 mask;
>> >+   u32 val;
>> >+   } mangle;
>> >+   const struct ip_tunnel_info *tunnel;/* 
>> >FLOW_ACTION_KEY_TUNNEL_ENCAP */
>> >+   u32 csum_flags; /* FLOW_ACTION_KEY_CSUM 
>> >*/
>> >+   u32 mark;   /* FLOW_ACTION_KEY_MARK 
>> >*/
>> >+   };
>> >+};
>> >+
>> >+struct flow_action {
>> 
>> And here "struct flow_actions"
>> 
>> 
>> >+   int num_keys;
>> 
>> unsigned int;
>
>OK.


Re: [PATCH net-next,v2 04/12] cls_api: add translator to flow_action representation

2018-11-19 Thread Jiri Pirko
Mon, Nov 19, 2018 at 01:15:11AM CET, pa...@netfilter.org wrote:
>This patch implements a new function to translate from native TC action
>to the new flow_action representation. Moreover, this patch also updates
>cls_flower to use this new function.
>
>Signed-off-by: Pablo Neira Ayuso 
>---
>v2: no changes.
>
> include/net/pkt_cls.h  |   3 ++
> net/sched/cls_api.c| 113 +
> net/sched/cls_flower.c |  15 ++-
> 3 files changed, 130 insertions(+), 1 deletion(-)
>
>diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>index 8b79a1a3a5c7..7d7aefa5fcd2 100644
>--- a/include/net/pkt_cls.h
>+++ b/include/net/pkt_cls.h
>@@ -619,6 +619,9 @@ tcf_match_indev(struct sk_buff *skb, int ifindex)
> }
> #endif /* CONFIG_NET_CLS_IND */
> 
>+int tc_setup_flow_action(struct flow_action *flow_action,
>+   const struct tcf_exts *exts);
>+
> int tc_setup_cb_call(struct tcf_block *block, struct tcf_exts *exts,
>enum tc_setup_type type, void *type_data, bool err_stop);
> 
>diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>index d92f44ac4c39..6ab44e650f43 100644
>--- a/net/sched/cls_api.c
>+++ b/net/sched/cls_api.c
>@@ -31,6 +31,14 @@
> #include 
> #include 
> #include 
>+#include 
>+#include 
>+#include 
>+#include 
>+#include 
>+#include 
>+#include 
>+#include 
> 
> extern const struct nla_policy rtm_tca_policy[TCA_MAX + 1];
> 
>@@ -2567,6 +2575,111 @@ int tc_setup_cb_call(struct tcf_block *block, struct 
>tcf_exts *exts,
> }
> EXPORT_SYMBOL(tc_setup_cb_call);
> 
>+int tc_setup_flow_action(struct flow_action *flow_action,
>+   const struct tcf_exts *exts)
>+{
>+  const struct tc_action *act;
>+  int num_acts = 0, i, j, k;
>+
>+  if (!exts)
>+  return 0;
>+
>+  tcf_exts_for_each_action(i, act, exts) {
>+  if (is_tcf_pedit(act))
>+  num_acts += tcf_pedit_nkeys(act);
>+  else
>+  num_acts++;
>+  }
>+  if (!num_acts)
>+  return 0;
>+
>+  if (flow_action_init(flow_action, num_acts) < 0)
>+  return -ENOMEM;
>+
>+  j = 0;
>+  tcf_exts_for_each_action(i, act, exts) {
>+  struct flow_action_key *key;
>+
>+  key = _action->keys[j];
>+  if (is_tcf_gact_ok(act)) {
>+  key->id = FLOW_ACTION_KEY_ACCEPT;
>+  } else if (is_tcf_gact_shot(act)) {
>+  key->id = FLOW_ACTION_KEY_DROP;
>+  } else if (is_tcf_gact_trap(act)) {
>+  key->id = FLOW_ACTION_KEY_TRAP;
>+  } else if (is_tcf_gact_goto_chain(act)) {
>+  key->id = FLOW_ACTION_KEY_GOTO;
>+  key->chain_index = tcf_gact_goto_chain_index(act);
>+  } else if (is_tcf_mirred_egress_redirect(act)) {
>+  key->id = FLOW_ACTION_KEY_REDIRECT;
>+  key->dev = tcf_mirred_dev(act);
>+  } else if (is_tcf_mirred_egress_mirror(act)) {
>+  key->id = FLOW_ACTION_KEY_MIRRED;
>+  key->dev = tcf_mirred_dev(act);
>+  } else if (is_tcf_vlan(act)) {
>+  switch (tcf_vlan_action(act)) {
>+  case TCA_VLAN_ACT_PUSH:
>+  key->id = FLOW_ACTION_KEY_VLAN_PUSH;
>+  key->vlan.vid = tcf_vlan_push_vid(act);
>+  key->vlan.proto = tcf_vlan_push_proto(act);
>+  key->vlan.prio = tcf_vlan_push_prio(act);
>+  break;
>+  case TCA_VLAN_ACT_POP:
>+  key->id = FLOW_ACTION_KEY_VLAN_POP;
>+  break;
>+  case TCA_VLAN_ACT_MODIFY:
>+  key->id = FLOW_ACTION_KEY_VLAN_MANGLE;
>+  key->vlan.vid = tcf_vlan_push_vid(act);
>+  key->vlan.proto = tcf_vlan_push_proto(act);
>+  key->vlan.prio = tcf_vlan_push_prio(act);
>+  break;
>+  default:
>+  goto err_out;
>+  }
>+  } else if (is_tcf_tunnel_set(act)) {
>+  key->id = FLOW_ACTION_KEY_TUNNEL_ENCAP;
>+  key->tunnel = tcf_tunnel_info(act);
>+  } else if (is_tcf_tunnel_release(act)) {
>+  key->id = FLOW_ACTION_KEY_TUNNEL_DECAP;
>+  key->tunnel = tcf_tunnel_info(act);
>+  } else if (is_tcf_pedit(act)) {
>+  for (k = 0; k < tcf_pedit_nkeys(act); k++) {
>+  switch (tcf_pedit_cmd(act, k)) {
>+  case TCA_PEDIT_KEY_EX_CMD_SET:
>+  key->id = FLOW_ACTION_KEY_MANGLE;
>+   

Re: [PATCH net-next,v2 04/12] cls_api: add translator to flow_action representation

2018-11-19 Thread Jiri Pirko
Mon, Nov 19, 2018 at 01:15:11AM CET, pa...@netfilter.org wrote:
>This patch implements a new function to translate from native TC action
>to the new flow_action representation. Moreover, this patch also updates
>cls_flower to use this new function.
>
>Signed-off-by: Pablo Neira Ayuso 
>---
>v2: no changes.
>
> include/net/pkt_cls.h  |   3 ++
> net/sched/cls_api.c| 113 +
> net/sched/cls_flower.c |  15 ++-
> 3 files changed, 130 insertions(+), 1 deletion(-)
>
>diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>index 8b79a1a3a5c7..7d7aefa5fcd2 100644
>--- a/include/net/pkt_cls.h
>+++ b/include/net/pkt_cls.h
>@@ -619,6 +619,9 @@ tcf_match_indev(struct sk_buff *skb, int ifindex)
> }
> #endif /* CONFIG_NET_CLS_IND */
> 
>+int tc_setup_flow_action(struct flow_action *flow_action,
>+   const struct tcf_exts *exts);
>+
> int tc_setup_cb_call(struct tcf_block *block, struct tcf_exts *exts,
>enum tc_setup_type type, void *type_data, bool err_stop);
> 
>diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>index d92f44ac4c39..6ab44e650f43 100644
>--- a/net/sched/cls_api.c
>+++ b/net/sched/cls_api.c
>@@ -31,6 +31,14 @@
> #include 
> #include 
> #include 
>+#include 
>+#include 
>+#include 
>+#include 
>+#include 
>+#include 
>+#include 
>+#include 
> 
> extern const struct nla_policy rtm_tca_policy[TCA_MAX + 1];
> 
>@@ -2567,6 +2575,111 @@ int tc_setup_cb_call(struct tcf_block *block, struct 
>tcf_exts *exts,
> }
> EXPORT_SYMBOL(tc_setup_cb_call);
> 
>+int tc_setup_flow_action(struct flow_action *flow_action,
>+   const struct tcf_exts *exts)
>+{
>+  const struct tc_action *act;
>+  int num_acts = 0, i, j, k;
>+
>+  if (!exts)
>+  return 0;
>+
>+  tcf_exts_for_each_action(i, act, exts) {
>+  if (is_tcf_pedit(act))
>+  num_acts += tcf_pedit_nkeys(act);
>+  else
>+  num_acts++;
>+  }
>+  if (!num_acts)
>+  return 0;
>+
>+  if (flow_action_init(flow_action, num_acts) < 0)

This is actually a "alloc" function. And the counterpart is "free".
How about to allocate the container struct which would have the [0]
trick for the array of action?

[...]


Re: [PATCH net-next,v2 03/12] flow_dissector: add flow action infrastructure

2018-11-19 Thread Jiri Pirko
Mon, Nov 19, 2018 at 01:15:10AM CET, pa...@netfilter.org wrote:
>This new infrastructure defines the nic actions that you can perform
>from existing network drivers. This infrastructure allows us to avoid a
>direct dependency with the native software TC action representation.
>
>Signed-off-by: Pablo Neira Ayuso 
>---
>v2: no changes.
>
> include/net/flow_dissector.h | 70 
> net/core/flow_dissector.c| 18 
> 2 files changed, 88 insertions(+)
>
>diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
>index 965a82b8d881..925c208816f1 100644
>--- a/include/net/flow_dissector.h
>+++ b/include/net/flow_dissector.h
>@@ -402,8 +402,78 @@ void flow_rule_match_enc_keyid(const struct flow_rule 
>*rule,
> void flow_rule_match_enc_opts(const struct flow_rule *rule,
> struct flow_match_enc_opts *out);
> 
>+enum flow_action_key_id {

Why "key"? Why not just "flow_action_id"


>+  FLOW_ACTION_KEY_ACCEPT  = 0,
>+  FLOW_ACTION_KEY_DROP,
>+  FLOW_ACTION_KEY_TRAP,
>+  FLOW_ACTION_KEY_GOTO,
>+  FLOW_ACTION_KEY_REDIRECT,
>+  FLOW_ACTION_KEY_MIRRED,
>+  FLOW_ACTION_KEY_VLAN_PUSH,
>+  FLOW_ACTION_KEY_VLAN_POP,
>+  FLOW_ACTION_KEY_VLAN_MANGLE,
>+  FLOW_ACTION_KEY_TUNNEL_ENCAP,
>+  FLOW_ACTION_KEY_TUNNEL_DECAP,
>+  FLOW_ACTION_KEY_MANGLE,
>+  FLOW_ACTION_KEY_ADD,
>+  FLOW_ACTION_KEY_CSUM,
>+  FLOW_ACTION_KEY_MARK,
>+};
>+
>+/* This is mirroring enum pedit_header_type definition for easy mapping 
>between
>+ * tc pedit action. Legacy TCA_PEDIT_KEY_EX_HDR_TYPE_NETWORK is mapped to
>+ * FLOW_ACT_MANGLE_UNSPEC, which is supported by no driver.
>+ */
>+enum flow_act_mangle_base {

Please be consistent in naming: "act" vs "action"


>+  FLOW_ACT_MANGLE_UNSPEC  = 0,
>+  FLOW_ACT_MANGLE_HDR_TYPE_ETH,
>+  FLOW_ACT_MANGLE_HDR_TYPE_IP4,
>+  FLOW_ACT_MANGLE_HDR_TYPE_IP6,
>+  FLOW_ACT_MANGLE_HDR_TYPE_TCP,
>+  FLOW_ACT_MANGLE_HDR_TYPE_UDP,
>+};
>+
>+struct flow_action_key {

And here "struct flow_action"


>+  enum flow_action_key_id id;
>+  union {
>+  u32 chain_index;/* FLOW_ACTION_KEY_GOTO 
>*/
>+  struct net_device   *dev;   /* 
>FLOW_ACTION_KEY_REDIRECT */
>+  struct {/* FLOW_ACTION_KEY_VLAN 
>*/
>+  u16 vid;
>+  __be16  proto;
>+  u8  prio;
>+  } vlan;
>+  struct {/* 
>FLOW_ACTION_KEY_PACKET_EDIT */
>+  enum flow_act_mangle_base htype;
>+  u32 offset;
>+  u32 mask;
>+  u32 val;
>+  } mangle;
>+  const struct ip_tunnel_info *tunnel;/* 
>FLOW_ACTION_KEY_TUNNEL_ENCAP */
>+  u32 csum_flags; /* FLOW_ACTION_KEY_CSUM 
>*/
>+  u32 mark;   /* FLOW_ACTION_KEY_MARK 
>*/
>+  };
>+};
>+
>+struct flow_action {

And here "struct flow_actions"


>+  int num_keys;

unsigned int;


>+  struct flow_action_key  *keys;
>+};
>+
>+int flow_action_init(struct flow_action *flow_action, int num_acts);
>+void flow_action_free(struct flow_action *flow_action);
>+
>+static inline bool flow_action_has_keys(const struct flow_action *action)
>+{
>+  return action->num_keys;
>+}
>+
>+#define flow_action_for_each(__i, __act, __actions)   \
>+for (__i = 0, __act = &(__actions)->keys[0]; __i < 
>(__actions)->num_keys; __act = &(__actions)->keys[++__i])
>+
> struct flow_rule {
>   struct flow_match   match;
>+  struct flow_action  action;
> };
> 
> static inline bool flow_rule_match_key(const struct flow_rule *rule,
>diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
>index 186089b8d852..b9368349f0f7 100644
>--- a/net/core/flow_dissector.c
>+++ b/net/core/flow_dissector.c
>@@ -258,6 +258,24 @@ void flow_rule_match_enc_opts(const struct flow_rule 
>*rule,
> }
> EXPORT_SYMBOL(flow_rule_match_enc_opts);
> 
>+int flow_action_init(struct flow_action *flow_action, int num_acts)
>+{
>+  flow_action->keys = kmalloc(sizeof(struct flow_action_key) * num_acts,
>+  GFP_KERNEL);
>+  if (!flow_action->keys)
>+  return -ENOMEM;
>+
>+  flow_action->num_keys = num_acts;
>+  return 0;
>+}
>+EXPORT_SYMBOL(flow_action_init);
>+
>+void flow_action_free(struct flow_action *flow_action)
>+{
>+  kfree(flow_action->keys);
>+}
>+EXPORT_SYMBOL(flow_action_free);
>+
> /**
>  * __skb_flow_get_ports - extract the upper layer ports and return them
>  * @skb: sk_buff to extract the ports from
>-- 
>2.11.0
>


Re: [PATCH 00/10] add flow_rule infrastructure

2018-11-19 Thread Jiri Pirko
Mon, Nov 19, 2018 at 10:19:28AM CET, manish.cho...@cavium.com wrote:
>> -Original Message-
>> From: netdev-ow...@vger.kernel.org  On
>> Behalf Of Pablo Neira Ayuso
>> Sent: Friday, November 16, 2018 7:11 AM
>> To: netdev@vger.kernel.org
>> Cc: da...@davemloft.net; thomas.lenda...@amd.com;
>> f.faine...@gmail.com; Elior, Ariel ;
>> michael.c...@broadcom.com; sant...@chelsio.com;
>> madalin.bu...@nxp.com; yisen.zhu...@huawei.com;
>> salil.me...@huawei.com; jeffrey.t.kirs...@intel.com; tar...@mellanox.com;
>> sae...@mellanox.com; j...@mellanox.com; ido...@mellanox.com;
>> jakub.kicin...@netronome.com; peppe.cavall...@st.com;
>> grygorii.stras...@ti.com; and...@lunn.ch;
>> vivien.dide...@savoirfairelinux.com; alexandre.tor...@st.com;
>> joab...@synopsys.com; linux-net-driv...@solarflare.com;
>> ganes...@chelsio.com
>> Subject: [PATCH 00/10] add flow_rule infrastructure
>> 
>> External Email
>> 
>> This patchset introduces a kernel intermediate representation (IR) to express
>> ACL hardware offloads, this is heavily based on the existing flow dissector
>> infrastructure and the TC actions. This IR can be used by different frontend
>> ACL interfaces such as ethtool_rxnfc and tc to represent ACL hardware
>> offloads. Main goal is to simplify the development of ACL hardware offloads
>> for the existing frontend interfaces, the idea is that driver developers do 
>> not
>> need to add one specific parser for each ACL frontend, instead each frontend
>> can just generate this flow_rule IR and pass it to drivers to populate the
>> hardware IR.
>> 
>> .   ethtool_rxnfc   tc
>>|   (ioctl)(netlink)
>>|  | | translate native
>>   Frontend |  | |  interface representation
>>|  | |  to flow_rule IR
>>|  | |
>> . \/\/
>> . flow_rule IR
>>||
>>Drivers || parsing of flow_rule IR
>>||  to populate hardware IR
>>|   \/
>> .  hardware IR (driver)
>> 
>> For design and implementation details, please have a look at:
>> 
>> https://lwn.net/Articles/766695/
>> 
>> As an example, with this patchset, it should be possible to simplify the
>> existing net/qede driver which already has two parsers to populate the
>> hardware IR, one for ethtool_rxnfc interface and another for tc.
>> 
>> This batch is composed of 10 patches:
>> 
>> Patch #1 adds the flow_match structure, this includes the
>>  flow_rule_match_key() interface to check for existing selectors
>>  that are in used in the rule and the flow_rule_match_*()
>>  functions to fetch the selector value and the mask. This
>>  also introduces the initial flow_rule structure skeleton to
>>  avoid a follow up patch that would update the same LoCs.
>> 
>> Patch #2 makes changes to packet edit parser of mlx5e driver, to prepare
>>  introduction of the new flow_action to mangle packets.
>> 
>> Patch #3 Introduce flow_action infrastructure. This infrastructure is
>>  based on the TC actions. Patch #8 extends it so it also
>>  supports two new actions that are only available through the
>>  ethtool_rxnfc interface.
>> 
>> Patch #4 Add function to translate TC action to flow_action from
>>  cls_flower.
>> 
>> Patch #5 Add infrastructure to fetch statistics into container structure
>>  and synchronize them to TC actions from cls_flower. Another
>>  preparation patch before patch #7, so we can stop exposing the
>>  TC action native layout to the drivers.
>> 
>> Patch #6 Use flow_action infrastructure from drivers.
>> 
>> Patch #7 Do not expose TC actions to drivers anymore, now that drivers
>>  have been converted to use the flow_action infrastructure after
>>  patch #5.
>> 
>> Patch #8 Support to wake-up-on-lan and queue actions for the flow_action
>>  infrastructure, two actions supported by NICs. This is used by
>>  the ethtool_rx_flow interface.
>> 
>> Patch #9 Add a function to translate from ethtool_rx_flow_spec structure
>>  to the flow_action structure. This is a simple enough for its
>>  first client: the ethtool_rxnfc interface in the bcm_sf2 driver.
>> 
>> Patch #10 Update bcm_sf2 to use this new translator function and
>>   update codebase to configure hardware IR using the
>>   flow_action representation. This will allow later development
>>   of cls_flower using the same codebase from the driver.
>> 
>> This patchset has passed here functional tests of the codepath that generates
>> the flow_rule structure and the functions to implement the parsers that
>> populate the hardware IR.
>> 
>> Thanks.
>> 
>> Pablo Neira Ayuso (10):
>>   

Re: [iproute2-next PATCH v3 1/2] tc: flower: Classify packets based port ranges

2018-11-18 Thread Jiri Pirko
Fri, Nov 16, 2018 at 01:55:13AM CET, amritha.namb...@intel.com wrote:
>Added support for filtering based on port ranges.
>UAPI changes have been accepted into net-next.
>
>Example:
>1. Match on a port range:
>-
>$ tc filter add dev enp4s0 protocol ip parent :\
>  prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\
>  action drop
>
>$ tc -s filter show dev enp4s0 parent :
>filter protocol ip pref 1 flower chain 0
>filter protocol ip pref 1 flower chain 0 handle 0x1
>  eth_type ipv4
>  ip_proto tcp
>  dst_port range 20-30
>  skip_hw
>  not_in_hw
>action order 1: gact action drop
> random type none pass val 0
> index 1 ref 1 bind 1 installed 85 sec used 3 sec
>Action statistics:
>Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
>backlog 0b 0p requeues 0
>
>2. Match on IP address and port range:
>--
>$ tc filter add dev enp4s0 protocol ip parent :\
>  prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\
>  skip_hw action drop
>
>$ tc -s filter show dev enp4s0 parent :
>filter protocol ip pref 1 flower chain 0 handle 0x2
>  eth_type ipv4
>  ip_proto tcp
>  dst_ip 192.168.1.1
>  dst_port range 100-200
>  skip_hw
>  not_in_hw
>action order 1: gact action drop
> random type none pass val 0
> index 2 ref 1 bind 1 installed 58 sec used 2 sec
>Action statistics:
>Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
>backlog 0b 0p requeues 0
>
>v3:
>Modified flower_port_range_attr_type calls.
>
>v2:
>Addressed Jiri's comment to sync output format with input
>
>Signed-off-by: Amritha Nambiar 

Looks ok. But why do you have man changes in a separate patch ?
I think it should be in this one. Anyway

Acked-by: Jiri Pirko 



Re: [PATCH net-next 17/17] net: sched: unlock rules update API

2018-11-15 Thread Jiri Pirko
Wed, Nov 14, 2018 at 05:45:34PM CET, vla...@mellanox.com wrote:
>
>On Wed 14 Nov 2018 at 06:44, Jiri Pirko  wrote:
>> Tue, Nov 13, 2018 at 02:46:54PM CET, vla...@mellanox.com wrote:
>>>On Mon 12 Nov 2018 at 17:30, David Miller  wrote:
>>>> From: Vlad Buslov 
>>>> Date: Mon, 12 Nov 2018 09:55:46 +0200
>>>>
>>>>> Register netlink protocol handlers for message types RTM_NEWTFILTER,
>>>>> RTM_DELTFILTER, RTM_GETTFILTER as unlocked. Set rtnl_held variable that
>>>>> tracks rtnl mutex state to be false by default.
>>>>
>>>> This whole conditional locking mechanism is really not clean and makes
>>>> this code so much harder to understand and audit.
>>>>
>>>> Please improve the code so that this kind of construct is not needed.
>>>>
>>>> Thank you.
>>>
>>>Hi David,
>>>
>>>I considered several approaches to this problem and decided that this
>>>one is most straightforward to implement. I understand your concern and
>>>agree that this code is not easiest to understand and can suggest
>>>several possible solutions that do not require this kind of elaborate
>>>locking mechanism in cls API, but have their own drawbacks:
>>>
>>>1. Convert all qdiscs and classifiers to support unlocked execution,
>>>like we did for actions. However, according to my experience with
>>>converting flower classifier, these require much more code than actions.
>>>I would estimate it to be more work than whole current unlocking effort
>>>(hundred+ patches). Also, authors of some of them might be unhappy with
>>>such intrusive changes. I don't think this approach is realistic.
>>>
>>>2. Somehow determine if rtnl is needed at the beginning of cls API rule
>>>update functions. Currently, this is not possible because locking
>>>requirements are determined by qdisc_class_ops and tcf_proto_ops 'flags'
>>>field, which requires code to first do whole ops lookup sequence.
>>>However, instead of class field I can put 'flags' in some kind of hash
>>>table or array that will map qdisc/classifier type string to flags, so
>>>it will be possible to determine locking requirements by just parsing
>>>netlink message and obtaining flags by qdisc/classifier type. I do not
>>>consider it pretty solution either, but maybe you have different
>>>opinion.
>>
>> I think you will have to do 2. or some modification. Can't you just
>> check for cls ability to run unlocked early on in tc_new_tfilter()?
>> You would call tcf_proto_locking_check(nla_data(tca[TCA_KIND]), ...),
>> which would do tcf_proto_lookup_ops() for ops and check the flags?
>
>I guess that would work. However, such solution requires calling
>tcf_proto_lookup_ops(), which iterates over tcf_proto_base list and
>calls strcmp() for each proto, for every rule update call. That is why I
>suggested to use some kind of optimized data structure for that purpose
>in my first reply. Dunno if such solution will significantly impact rule
>update performance. We don't have that many classifiers and their names
>are short, so I guess not?

Let's do it like this for unlocked first, then we can optimize if
necessary.


>
>>
>>
>>>
>>>3. Anything you can suggest? I might be missing something simple that
>>>you would consider more elegant solution to this problem.
>>>
>>>Thanks,
>>>Vlad
>>>
>


Re: [PATCH net-next 17/17] net: sched: unlock rules update API

2018-11-13 Thread Jiri Pirko
Tue, Nov 13, 2018 at 02:46:54PM CET, vla...@mellanox.com wrote:
>On Mon 12 Nov 2018 at 17:30, David Miller  wrote:
>> From: Vlad Buslov 
>> Date: Mon, 12 Nov 2018 09:55:46 +0200
>>
>>> Register netlink protocol handlers for message types RTM_NEWTFILTER,
>>> RTM_DELTFILTER, RTM_GETTFILTER as unlocked. Set rtnl_held variable that
>>> tracks rtnl mutex state to be false by default.
>>
>> This whole conditional locking mechanism is really not clean and makes
>> this code so much harder to understand and audit.
>>
>> Please improve the code so that this kind of construct is not needed.
>>
>> Thank you.
>
>Hi David,
>
>I considered several approaches to this problem and decided that this
>one is most straightforward to implement. I understand your concern and
>agree that this code is not easiest to understand and can suggest
>several possible solutions that do not require this kind of elaborate
>locking mechanism in cls API, but have their own drawbacks:
>
>1. Convert all qdiscs and classifiers to support unlocked execution,
>like we did for actions. However, according to my experience with
>converting flower classifier, these require much more code than actions.
>I would estimate it to be more work than whole current unlocking effort
>(hundred+ patches). Also, authors of some of them might be unhappy with
>such intrusive changes. I don't think this approach is realistic.
>
>2. Somehow determine if rtnl is needed at the beginning of cls API rule
>update functions. Currently, this is not possible because locking
>requirements are determined by qdisc_class_ops and tcf_proto_ops 'flags'
>field, which requires code to first do whole ops lookup sequence.
>However, instead of class field I can put 'flags' in some kind of hash
>table or array that will map qdisc/classifier type string to flags, so
>it will be possible to determine locking requirements by just parsing
>netlink message and obtaining flags by qdisc/classifier type. I do not
>consider it pretty solution either, but maybe you have different
>opinion.

I think you will have to do 2. or some modification. Can't you just
check for cls ability to run unlocked early on in tc_new_tfilter()?
You would call tcf_proto_locking_check(nla_data(tca[TCA_KIND]), ...),
which would do tcf_proto_lookup_ops() for ops and check the flags?


>
>3. Anything you can suggest? I might be missing something simple that
>you would consider more elegant solution to this problem.
>
>Thanks,
>Vlad
>


[patch net-next] net: 8021q: move vlan offload registrations into vlan_core

2018-11-13 Thread Jiri Pirko
From: Jiri Pirko 

Currently, the vlan packet offloads are registered only upon 8021q module
load. However, even without this module loaded, the offloads could be
utilized, for example by openvswitch datapath. As reported by Michael,
that causes 2x to 5x performance improvement, depending on a testcase.

So move the vlan offload registrations into vlan_core and make this
available even without 8021q module loaded.

Reported-by: Michael Shteinbok 
Signed-off-by: Jiri Pirko 
Tested-by: Michael Shteinbok 
---
 net/8021q/vlan.c  | 96 -
 net/8021q/vlan_core.c | 99 +++
 2 files changed, 99 insertions(+), 96 deletions(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 1b7a375c6616..aef1a977279c 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -648,93 +648,6 @@ static int vlan_ioctl_handler(struct net *net, void __user 
*arg)
return err;
 }
 
-static struct sk_buff *vlan_gro_receive(struct list_head *head,
-   struct sk_buff *skb)
-{
-   const struct packet_offload *ptype;
-   unsigned int hlen, off_vlan;
-   struct sk_buff *pp = NULL;
-   struct vlan_hdr *vhdr;
-   struct sk_buff *p;
-   __be16 type;
-   int flush = 1;
-
-   off_vlan = skb_gro_offset(skb);
-   hlen = off_vlan + sizeof(*vhdr);
-   vhdr = skb_gro_header_fast(skb, off_vlan);
-   if (skb_gro_header_hard(skb, hlen)) {
-   vhdr = skb_gro_header_slow(skb, hlen, off_vlan);
-   if (unlikely(!vhdr))
-   goto out;
-   }
-
-   type = vhdr->h_vlan_encapsulated_proto;
-
-   rcu_read_lock();
-   ptype = gro_find_receive_by_type(type);
-   if (!ptype)
-   goto out_unlock;
-
-   flush = 0;
-
-   list_for_each_entry(p, head, list) {
-   struct vlan_hdr *vhdr2;
-
-   if (!NAPI_GRO_CB(p)->same_flow)
-   continue;
-
-   vhdr2 = (struct vlan_hdr *)(p->data + off_vlan);
-   if (compare_vlan_header(vhdr, vhdr2))
-   NAPI_GRO_CB(p)->same_flow = 0;
-   }
-
-   skb_gro_pull(skb, sizeof(*vhdr));
-   skb_gro_postpull_rcsum(skb, vhdr, sizeof(*vhdr));
-   pp = call_gro_receive(ptype->callbacks.gro_receive, head, skb);
-
-out_unlock:
-   rcu_read_unlock();
-out:
-   skb_gro_flush_final(skb, pp, flush);
-
-   return pp;
-}
-
-static int vlan_gro_complete(struct sk_buff *skb, int nhoff)
-{
-   struct vlan_hdr *vhdr = (struct vlan_hdr *)(skb->data + nhoff);
-   __be16 type = vhdr->h_vlan_encapsulated_proto;
-   struct packet_offload *ptype;
-   int err = -ENOENT;
-
-   rcu_read_lock();
-   ptype = gro_find_complete_by_type(type);
-   if (ptype)
-   err = ptype->callbacks.gro_complete(skb, nhoff + sizeof(*vhdr));
-
-   rcu_read_unlock();
-   return err;
-}
-
-static struct packet_offload vlan_packet_offloads[] __read_mostly = {
-   {
-   .type = cpu_to_be16(ETH_P_8021Q),
-   .priority = 10,
-   .callbacks = {
-   .gro_receive = vlan_gro_receive,
-   .gro_complete = vlan_gro_complete,
-   },
-   },
-   {
-   .type = cpu_to_be16(ETH_P_8021AD),
-   .priority = 10,
-   .callbacks = {
-   .gro_receive = vlan_gro_receive,
-   .gro_complete = vlan_gro_complete,
-   },
-   },
-};
-
 static int __net_init vlan_init_net(struct net *net)
 {
struct vlan_net *vn = net_generic(net, vlan_net_id);
@@ -762,7 +675,6 @@ static struct pernet_operations vlan_net_ops = {
 static int __init vlan_proto_init(void)
 {
int err;
-   unsigned int i;
 
pr_info("%s v%s\n", vlan_fullname, vlan_version);
 
@@ -786,9 +698,6 @@ static int __init vlan_proto_init(void)
if (err < 0)
goto err5;
 
-   for (i = 0; i < ARRAY_SIZE(vlan_packet_offloads); i++)
-   dev_add_offload(_packet_offloads[i]);
-
vlan_ioctl_set(vlan_ioctl_handler);
return 0;
 
@@ -806,13 +715,8 @@ static int __init vlan_proto_init(void)
 
 static void __exit vlan_cleanup_module(void)
 {
-   unsigned int i;
-
vlan_ioctl_set(NULL);
 
-   for (i = 0; i < ARRAY_SIZE(vlan_packet_offloads); i++)
-   dev_remove_offload(_packet_offloads[i]);
-
vlan_netlink_fini();
 
unregister_netdevice_notifier(_notifier_block);
diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index 57425049faf2..a313165e7a67 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -453,3 +453,102 @@ bool vlan_uses_dev(const struct net_device *dev)
return vlan_info->grp.nr_vlan_devs ? true : false;
 }
 EXPORT_SYMBOL(vlan_uses_dev);
+
+static struct sk

Re: [PATCH net-next 1/6] net: sched: register callbacks for indirect tc block binds

2018-11-11 Thread Jiri Pirko
Sat, Nov 10, 2018 at 06:21:26AM CET, jakub.kicin...@netronome.com wrote:
>From: John Hurley 
>
>Currently drivers can register to receive TC block bind/unbind callbacks
>by implementing the setup_tc ndo in any of their given netdevs. However,
>drivers may also be interested in binds to higher level devices (e.g.
>tunnel drivers) to potentially offload filters applied to them.
>
>Introduce indirect block devs which allows drivers to register callbacks
>for block binds on other devices. The callback is triggered when the
>device is bound to a block, allowing the driver to register for rules
>applied to that block using already available functions.
>
>Freeing an indirect block callback will trigger an unbind event (if
>necessary) to direct the driver to remove any offloaded rules and unreg
>any block rule callbacks. It is the responsibility of the implementing
>driver to clean any registered indirect block callbacks before exiting,
>if the block it still active at such a time.
>
>Allow registering an indirect block dev callback for a device that is
>already bound to a block. In this case (if it is an ingress block),
>register and also trigger the callback meaning that any already installed
>rules can be replayed to the calling driver.
>
>Signed-off-by: John Hurley 
>Signed-off-by: Jakub Kicinski 

Acked-by: Jiri Pirko 


Re: [PATCH net v2] net: sched: cls_flower: validate nested enc_opts_policy to avoid warning

2018-11-09 Thread Jiri Pirko
Sat, Nov 10, 2018 at 06:06:26AM CET, jakub.kicin...@netronome.com wrote:
>TCA_FLOWER_KEY_ENC_OPTS and TCA_FLOWER_KEY_ENC_OPTS_MASK can only
>currently contain further nested attributes, which are parsed by
>hand, so the policy is never actually used resulting in a W=1
>build warning:
>
>net/sched/cls_flower.c:492:1: warning: ‘enc_opts_policy’ defined but not used 
>[-Wunused-const-variable=]
> enc_opts_policy[TCA_FLOWER_KEY_ENC_OPTS_MAX + 1] = {
>
>Add the validation anyway to avoid potential bugs when other
>attributes are added and to make the attribute structure slightly
>more clear.  Validation will also set extact to point to bad
>attribute on error.
>
>Signed-off-by: Jakub Kicinski 
>Acked-by: Simon Horman 

Fixes: 0a6e77784f49 ("net/sched: allow flower to match tunnel options")
Acked-by: Jiri Pirko 


Re: [net-next PATCH v3] net: sched: cls_flower: Classify packets using port ranges

2018-11-09 Thread Jiri Pirko
Sat, Nov 10, 2018 at 01:11:10AM CET, amritha.namb...@intel.com wrote:

[...]

>@@ -1026,8 +1122,7 @@ static void fl_init_dissector(struct flow_dissector 
>*dissector,
>FLOW_DISSECTOR_KEY_IPV4_ADDRS, ipv4);
>   FL_KEY_SET_IF_MASKED(mask, keys, cnt,
>FLOW_DISSECTOR_KEY_IPV6_ADDRS, ipv6);
>-  FL_KEY_SET_IF_MASKED(mask, keys, cnt,
>-   FLOW_DISSECTOR_KEY_PORTS, tp);
>+  FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_PORTS, tp);

You still need to set the key under a condition. Something like:
if (FL_KEY_IS_MASKED(mask, tp) ||
FL_KEY_IS_MASKED(mask, tp_min) ||
FL_KEY_IS_MASKED(mask, tp_max)
FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_PORTS, tp);


>   FL_KEY_SET_IF_MASKED(mask, keys, cnt,
>FLOW_DISSECTOR_KEY_IP, ip);
>   FL_KEY_SET_IF_MASKED(mask, keys, cnt,

[...]


Re: [PATCH iproute2-next] devlink: Add missing region option to devlink man page

2018-11-09 Thread Jiri Pirko
Thu, Nov 08, 2018 at 10:14:13AM CET, va...@mellanox.com wrote:
>The region field was not added to the devlink man page.
>
>Fixes: 8b4fbf0bed8e6 ("devlink: Add support for devlink-region access")
>Signed-off-by: Alex Vesker 

Acked-by: Jiri Pirko 



Re: [net-next PATCH v2] net: sched: cls_flower: Classify packets using port ranges

2018-11-09 Thread Jiri Pirko
Wed, Nov 07, 2018 at 10:22:42PM CET, amritha.namb...@intel.com wrote:
>Added support in tc flower for filtering based on port ranges.
>
>Example:
>1. Match on a port range:
>-
>$ tc filter add dev enp4s0 protocol ip parent :\
>  prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\
>  action drop
>
>$ tc -s filter show dev enp4s0 parent :
>filter protocol ip pref 1 flower chain 0
>filter protocol ip pref 1 flower chain 0 handle 0x1
>  eth_type ipv4
>  ip_proto tcp
>  dst_port range 20-30
>  skip_hw
>  not_in_hw
>action order 1: gact action drop
> random type none pass val 0
> index 1 ref 1 bind 1 installed 85 sec used 3 sec
>Action statistics:
>Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
>backlog 0b 0p requeues 0
>
>2. Match on IP address and port range:
>--
>$ tc filter add dev enp4s0 protocol ip parent :\
>  prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\
>  skip_hw action drop
>
>$ tc -s filter show dev enp4s0 parent :
>filter protocol ip pref 1 flower chain 0 handle 0x2
>  eth_type ipv4
>  ip_proto tcp
>  dst_ip 192.168.1.1
>  dst_port range 100-200
>  skip_hw
>  not_in_hw
>action order 1: gact action drop
> random type none pass val 0
> index 2 ref 1 bind 1 installed 58 sec used 2 sec
>Action statistics:
>Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
>backlog 0b 0p requeues 0
>
>v2:
>Addressed Jiri's comments:
>1. Added separate functions for dst and src comparisons.
>2. Removed endpoint enum.
>3. Added new bit TCA_FLOWER_FLAGS_RANGE to decide normal/range
>  lookup.
>4. Cleaned up fl_lookup function.
>
>Signed-off-by: Amritha Nambiar 
>---
> include/uapi/linux/pkt_cls.h |7 ++
> net/sched/cls_flower.c   |  133 --
> 2 files changed, 134 insertions(+), 6 deletions(-)
>
>diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>index 401d0c1..b63c3cf 100644
>--- a/include/uapi/linux/pkt_cls.h
>+++ b/include/uapi/linux/pkt_cls.h
>@@ -405,6 +405,11 @@ enum {
>   TCA_FLOWER_KEY_UDP_SRC, /* be16 */
>   TCA_FLOWER_KEY_UDP_DST, /* be16 */
> 
>+  TCA_FLOWER_KEY_PORT_SRC_MIN,/* be16 */
>+  TCA_FLOWER_KEY_PORT_SRC_MAX,/* be16 */
>+  TCA_FLOWER_KEY_PORT_DST_MIN,/* be16 */
>+  TCA_FLOWER_KEY_PORT_DST_MAX,/* be16 */
>+

Please put it at the end of the enum, as David mentioned.


>   TCA_FLOWER_FLAGS,
>   TCA_FLOWER_KEY_VLAN_ID, /* be16 */
>   TCA_FLOWER_KEY_VLAN_PRIO,   /* u8   */
>@@ -518,6 +523,8 @@ enum {
>   TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST = (1 << 1),
> };
> 
>+#define TCA_FLOWER_MASK_FLAGS_RANGE   (1 << 0) /* Range-based match */
>+
> /* Match-all classifier */
> 
> enum {
>diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>index 9aada2d..9d2582d 100644
>--- a/net/sched/cls_flower.c
>+++ b/net/sched/cls_flower.c
>@@ -55,6 +55,9 @@ struct fl_flow_key {
>   struct flow_dissector_key_ip ip;
>   struct flow_dissector_key_ip enc_ip;
>   struct flow_dissector_key_enc_opts enc_opts;
>+

No need for an empty line.


>+  struct flow_dissector_key_ports tp_min;
>+  struct flow_dissector_key_ports tp_max;
> } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as 
> longs. */
> 
> struct fl_flow_mask_range {
>@@ -65,6 +68,7 @@ struct fl_flow_mask_range {
> struct fl_flow_mask {
>   struct fl_flow_key key;
>   struct fl_flow_mask_range range;
>+  u32 flags;
>   struct rhash_head ht_node;
>   struct rhashtable ht;
>   struct rhashtable_params filter_ht_params;
>@@ -179,13 +183,89 @@ static void fl_clear_masked_range(struct fl_flow_key 
>*key,
>   memset(fl_key_get_start(key, mask), 0, fl_mask_range(mask));
> }
> 
>-static struct cls_fl_filter *fl_lookup(struct fl_flow_mask *mask,
>- struct fl_flow_key *mkey)
>+static bool fl_range_port_dst_cmp(struct cls_fl_filter *filter,
>+struct fl_flow_key *key,
>+struct fl_flow_key *mkey)
>+{
>+  __be16 min_mask, max_mask, min_val, max_val;
>+
>+  min_mask = htons(filter->mask->key.tp_min.dst);
>+  max_mask = htons(filter->mask->key.tp_max.dst);
>+  min_val = htons(filter->key.tp_min.dst);
>+  max_val = htons(filter->key.tp_max.dst);
>+
>+  if (min_mask && max_mask) {
>+  if (htons(key->tp.dst) < min_val ||
>+  htons(key->tp.dst) > max_val)
>+  return false;
>+
>+  /* skb does not have min and max values */
>+  mkey->tp_min.dst = filter->mkey.tp_min.dst;
>+  mkey->tp_max.dst = filter->mkey.tp_max.dst;
>+  }
>+  return true;
>+}
>+
>+static bool fl_range_port_src_cmp(struct cls_fl_filter *filter,
>+

Re: [iproute2 PATCH v2] tc: flower: Classify packets based port ranges

2018-11-09 Thread Jiri Pirko
Wed, Nov 07, 2018 at 10:22:50PM CET, amritha.namb...@intel.com wrote:
>Added support for filtering based on port ranges.
>
>Example:
>1. Match on a port range:
>-
>$ tc filter add dev enp4s0 protocol ip parent :\
>  prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\
>  action drop
>
>$ tc -s filter show dev enp4s0 parent :
>filter protocol ip pref 1 flower chain 0
>filter protocol ip pref 1 flower chain 0 handle 0x1
>  eth_type ipv4
>  ip_proto tcp
>  dst_port range 20-30
>  skip_hw
>  not_in_hw
>action order 1: gact action drop
> random type none pass val 0
> index 1 ref 1 bind 1 installed 85 sec used 3 sec
>Action statistics:
>Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
>backlog 0b 0p requeues 0
>
>2. Match on IP address and port range:
>--
>$ tc filter add dev enp4s0 protocol ip parent :\
>  prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\
>  skip_hw action drop
>
>$ tc -s filter show dev enp4s0 parent :
>filter protocol ip pref 1 flower chain 0 handle 0x2
>  eth_type ipv4
>  ip_proto tcp
>  dst_ip 192.168.1.1
>  dst_port range 100-200
>  skip_hw
>  not_in_hw
>action order 1: gact action drop
> random type none pass val 0
> index 2 ref 1 bind 1 installed 58 sec used 2 sec
>Action statistics:
>Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
>backlog 0b 0p requeues 0
>
>v2:
>Addressed Jiri's comment to sync output format with input
>
>Signed-off-by: Amritha Nambiar 
>---
> include/uapi/linux/pkt_cls.h |7 ++
> tc/f_flower.c|  145 +++---
> 2 files changed, 142 insertions(+), 10 deletions(-)
>
>diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>index 401d0c1..b63c3cf 100644
>--- a/include/uapi/linux/pkt_cls.h
>+++ b/include/uapi/linux/pkt_cls.h
>@@ -405,6 +405,11 @@ enum {
>   TCA_FLOWER_KEY_UDP_SRC, /* be16 */
>   TCA_FLOWER_KEY_UDP_DST, /* be16 */
> 
>+  TCA_FLOWER_KEY_PORT_SRC_MIN,/* be16 */
>+  TCA_FLOWER_KEY_PORT_SRC_MAX,/* be16 */
>+  TCA_FLOWER_KEY_PORT_DST_MIN,/* be16 */
>+  TCA_FLOWER_KEY_PORT_DST_MAX,/* be16 */
>+
>   TCA_FLOWER_FLAGS,
>   TCA_FLOWER_KEY_VLAN_ID, /* be16 */
>   TCA_FLOWER_KEY_VLAN_PRIO,   /* u8   */
>@@ -518,6 +523,8 @@ enum {
>   TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST = (1 << 1),
> };
> 
>+#define TCA_FLOWER_MASK_FLAGS_RANGE   (1 << 0) /* Range-based match */
>+
> /* Match-all classifier */
> 
> enum {
>diff --git a/tc/f_flower.c b/tc/f_flower.c
>index 65fca04..7724a1d 100644
>--- a/tc/f_flower.c
>+++ b/tc/f_flower.c
>@@ -494,6 +494,66 @@ static int flower_parse_port(char *str, __u8 ip_proto,
>   return 0;
> }
> 
>+static int flower_port_range_attr_type(__u8 ip_proto, enum flower_endpoint 
>type,
>+ __be16 *min_port_type,
>+ __be16 *max_port_type)
>+{
>+  if (ip_proto == IPPROTO_TCP || ip_proto == IPPROTO_UDP ||
>+  ip_proto == IPPROTO_SCTP) {
>+  if (type == FLOWER_ENDPOINT_SRC) {
>+  *min_port_type = TCA_FLOWER_KEY_PORT_SRC_MIN;
>+  *max_port_type = TCA_FLOWER_KEY_PORT_SRC_MAX;
>+  } else {
>+  *min_port_type = TCA_FLOWER_KEY_PORT_DST_MIN;
>+  *max_port_type = TCA_FLOWER_KEY_PORT_DST_MAX;
>+  }
>+  } else {
>+  return -1;
>+  }
>+
>+  return 0;
>+}
>+
>+static int flower_parse_port_range(__be16 *min, __be16 *max, __u8 ip_proto,
>+ enum flower_endpoint endpoint,
>+ struct nlmsghdr *n)
>+{
>+  __be16 min_port_type, max_port_type;
>+
>+  flower_port_range_attr_type(ip_proto, endpoint, _port_type,
>+  _port_type);
>+  addattr16(n, MAX_MSG, min_port_type, *min);
>+  addattr16(n, MAX_MSG, max_port_type, *max);
>+
>+  return 0;
>+}
>+
>+static int get_range(__be16 *min, __be16 *max, char *argv)
>+{
>+  char *r;
>+
>+  r = strchr(argv, '-');
>+  if (r) {
>+  *r = '\0';
>+  if (get_be16(min, argv, 10)) {
>+  fprintf(stderr, "invalid min range\n");
>+  return -1;
>+  }
>+  if (get_be16(max, r + 1, 10)) {
>+  fprintf(stderr, "invalid max range\n");
>+  return -1;
>+  }
>+  if (htons(*max) <= htons(*min)) {
>+  fprintf(stderr, "max value should be greater than min 
>value\n");
>+  return -1;
>+  }
>+  } else {
>+  fprintf(stderr, "Illegal range format\n");
>+  return -1;
>+  }
>+  return 0;
>+}
>+
> #define TCP_FLAGS_MAX_MASK 

Re: [PATCH net] net: sched: Remove TCA_OPTIONS from policy

2018-10-26 Thread Jiri Pirko
Fri, Oct 26, 2018 at 06:02:01PM CEST, dsah...@gmail.com wrote:
>On 10/25/18 12:31 AM, Jiri Pirko wrote:
>> Wed, Oct 24, 2018 at 05:32:49PM CEST, dsah...@kernel.org wrote:
>>> From: David Ahern 
>>>
>>> Marco reported an error with hfsc:
>>> root@Calimero:~# tc qdisc add dev eth0 root handle 1:0 hfsc default 1
>>> Error: Attribute failed policy validation.
>>>
>>> Apparently a few implementations pass TCA_OPTIONS as a binary instead
>>> of nested attribute, so drop TCA_OPTIONS from the policy.
>> 
>> Yeah, this is nice example of a case, where I think it wouldn't hurt to
>> be a bit more strict. Apparently, the userspace app is buggy. It should
>> be fixed. Note that I'm aware of the bw compatibility.
>
>Kernel side for hfsc expects TCA_OPTIONS as a binary as well - a struct
>tc_hfsc_qopt. Nothing that can be done.

:(


Re: [PATCH net-next] net/ipv6: Block IPv6 addrconf on team ports

2018-10-26 Thread Jiri Pirko
Fri, Oct 26, 2018 at 12:40:31AM CEST, jay.vosbu...@canonical.com wrote:
>Chas Williams <3ch...@gmail.com> wrote:
>
>>On 10/25/2018 05:59 PM, Jay Vosburgh wrote:
>>> Chas Williams <3ch...@gmail.com> wrote:
>>>
 netif_is_lag_port should be used to identify link aggregation ports.
 For this to work, we need to reorganize the bonding and team drivers
 so that the necessary flags are set before dev_open is called.

 commit 31e77c93e432 ("sched/fair: Update blocked load when newly idle")
 made this decision originally based on the IFF_SLAVE flag which isn't
 used by the team driver.  Note, we do need to retain the IFF_SLAVE
 check for the eql driver.
>>>
>>> Is 31e77c93e432 the correct commit reference?  I don't see
>>> anything in there about IFF_SLAVE or bonding; it's a patch to the
>>> process scheduler.
>>
>>No, that's wrong.  It should be c2edacf80e155.
>>
>>> And, as Jiri said, the subject doesn't mention bonding.
>>
>>The behavior of bonding wasn't changed.  The intent of the patch
>>is to add team slaves to the interfaces that don't get automatic
>>IPv6 addresses.  The body discusses why bonding had to change as
>>well.
>
>   Sure, but the bonding code has changed, and the current
>presentation makes it harder for reviewers to follow (or perhaps even
>notice).
>
>>I was under the impression that the subject needs to kept short.
>>If there a better way to phrase what I want to do?
>
>   I'd suggest splitting this into three patches: A first patch
>that adds the new IPv6 functionality, then one patch each for team and
>bonding to take advantage of that new functionality.  Each of the three
>would then be very straightforward, change just one thing, and should be
>clearer all around.

+1


Re: [PATCH net-next] net/ipv6: Block IPv6 addrconf on team ports

2018-10-25 Thread Jiri Pirko
Thu, Oct 25, 2018 at 11:02:27PM CEST, 3ch...@gmail.com wrote:
>netif_is_lag_port should be used to identify link aggregation ports.
>For this to work, we need to reorganize the bonding and team drivers
>so that the necessary flags are set before dev_open is called.
>
>commit 31e77c93e432 ("sched/fair: Update blocked load when newly idle")
>made this decision originally based on the IFF_SLAVE flag which isn't
>used by the team driver.  Note, we do need to retain the IFF_SLAVE
>check for the eql driver.
>
>Signed-off-by: Chas Williams <3ch...@gmail.com>
>---
> drivers/net/bonding/bond_main.c | 4 ++--
> drivers/net/team/team.c | 7 +--
> net/ipv6/addrconf.c | 2 +-

Subject talks about "team" yet you modify bond and team. Confusing..


Re: [RFC net-next v2 2/8] net: add netif_is_geneve()

2018-10-25 Thread Jiri Pirko
Thu, Oct 25, 2018 at 02:26:51PM CEST, john.hur...@netronome.com wrote:
>Add a helper function to determine if the type of a netdev is geneve based
>on its rtnl_link_ops. This allows drivers that may wish to ofload tunnels
>to check the underlying type of the device.
>
>A recent patch added a similar helper to vxlan.h
>
>Signed-off-by: John Hurley 
>Reviewed-by: Jakub Kicinski 

I don't understand why this and the next patch are part of this
patchset. They don't seem directly related.


Re: [RFC net-next v2 0/8] indirect tc block cb registration

2018-10-25 Thread Jiri Pirko
Thu, Oct 25, 2018 at 02:26:49PM CEST, john.hur...@netronome.com wrote:
>This patchset introduces an alternative to egdev offload by allowing a
>driver to register for block updates when an external device (e.g. tunnel
>netdev) is bound to a TC block. Drivers can track new netdevs or register
>to existing ones to receive information on such events. Based on this,
>they may register for block offload rules using already existing
>functions.
>
>The patchset also implements this new indirect block registration in the
>NFP driver to allow the offloading of tunnel rules. The use of egdev
>offload (which is currently only used for tunnel offload) is subsequently
>removed.

John, I'm missing v1->v2 changelog. Could you please add it?

Thanks!


Re: [PATCH net] net: sched: Remove TCA_OPTIONS from policy

2018-10-25 Thread Jiri Pirko
Wed, Oct 24, 2018 at 05:32:49PM CEST, dsah...@kernel.org wrote:
>From: David Ahern 
>
>Marco reported an error with hfsc:
>root@Calimero:~# tc qdisc add dev eth0 root handle 1:0 hfsc default 1
>Error: Attribute failed policy validation.
>
>Apparently a few implementations pass TCA_OPTIONS as a binary instead
>of nested attribute, so drop TCA_OPTIONS from the policy.

Yeah, this is nice example of a case, where I think it wouldn't hurt to
be a bit more strict. Apparently, the userspace app is buggy. It should
be fixed. Note that I'm aware of the bw compatibility.


>
>Fixes: 8b4c3cdd9dd8 ("net: sched: Add policy validation for tc attributes")
>Reported-by: Marco Berizzi 
>Signed-off-by: David Ahern 
>---
> net/sched/sch_api.c | 1 -
> 1 file changed, 1 deletion(-)
>
>diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
>index 3dc0acf54245..be7cd140b2a3 100644
>--- a/net/sched/sch_api.c
>+++ b/net/sched/sch_api.c
>@@ -1309,7 +1309,6 @@ check_loop_fn(struct Qdisc *q, unsigned long cl, struct 
>qdisc_walker *w)
> 
> const struct nla_policy rtm_tca_policy[TCA_MAX + 1] = {
>   [TCA_KIND]  = { .type = NLA_STRING },
>-  [TCA_OPTIONS]   = { .type = NLA_NESTED },

A future developer might add this again. A comment why not would be good
here.


>   [TCA_RATE]  = { .type = NLA_BINARY,
>   .len = sizeof(struct tc_estimator) },
>   [TCA_STAB]  = { .type = NLA_NESTED },
>-- 
>2.11.0
>


Re: [PATCH net 2/4] net/sched: act_police: disallow 'goto chain' on fallback control action

2018-10-22 Thread Jiri Pirko
Sat, Oct 20, 2018 at 11:33:08PM CEST, dcara...@redhat.com wrote:
>in the following command:
>
> # tc action add action police rate  burst  conform-exceed /
>
>'goto chain x' is allowed only for c1: setting it for c2 makes the kernel
>crash with NULL pointer dereference, since TC core doesn't initialize the
>chain handle.
>
>Signed-off-by: Davide Caratti 
>---
> net/sched/act_police.c | 12 ++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
>diff --git a/net/sched/act_police.c b/net/sched/act_police.c
>index 5d8bfa878477..3b793393efd1 100644
>--- a/net/sched/act_police.c
>+++ b/net/sched/act_police.c
>@@ -150,6 +150,16 @@ static int tcf_police_init(struct net *net, struct nlattr 
>*nla,
>   goto failure;
>   }
> 
>+  if (tb[TCA_POLICE_RESULT]) {
>+  police->tcfp_result = nla_get_u32(tb[TCA_POLICE_RESULT]);
>+  if (TC_ACT_EXT_CMP(police->tcfp_result, TC_ACT_GOTO_CHAIN)) {
>+  NL_SET_ERR_MSG(extack,
>+     "goto chain not allowed on fallback");

Also, no need for line-wrap

Acked-by: Jiri Pirko 



>+  err = -EINVAL;
>+  goto failure;
>+  }
>+  }
>+
>   spin_lock_bh(>tcf_lock);
>   /* No failure allowed after this point */
>   police->tcfp_mtu = parm->mtu;
>@@ -173,8 +183,6 @@ static int tcf_police_init(struct net *net, struct nlattr 
>*nla,
>   police->peak_present = false;
>   }
> 
>-  if (tb[TCA_POLICE_RESULT])
>-  police->tcfp_result = nla_get_u32(tb[TCA_POLICE_RESULT]);
>   police->tcfp_burst = PSCHED_TICKS2NS(parm->burst);
>   police->tcfp_toks = police->tcfp_burst;
>   if (police->peak_present) {
>-- 
>2.17.1
>


Re: [PATCH net 1/4] net/sched: act_gact: disallow 'goto chain' on fallback control action

2018-10-22 Thread Jiri Pirko
Sat, Oct 20, 2018 at 11:33:07PM CEST, dcara...@redhat.com wrote:
>in the following command:
>
> # tc action add action  random   
>
>'goto chain x' is allowed only for c1: setting it for c2 makes the kernel
>crash with NULL pointer dereference, since TC core doesn't initialize the
>chain handle.
>
>Signed-off-by: Davide Caratti 
>---
> net/sched/act_gact.c | 5 +
> 1 file changed, 5 insertions(+)
>
>diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
>index cd1d9bd32ef9..505138047e5c 100644
>--- a/net/sched/act_gact.c
>+++ b/net/sched/act_gact.c
>@@ -88,6 +88,11 @@ static int tcf_gact_init(struct net *net, struct nlattr 
>*nla,
>   p_parm = nla_data(tb[TCA_GACT_PROB]);
>   if (p_parm->ptype >= MAX_RAND)
>   return -EINVAL;
>+  if (TC_ACT_EXT_CMP(p_parm->paction, TC_ACT_GOTO_CHAIN)) {
>+  NL_SET_ERR_MSG(extack,
>+ "goto chain not allowed on fallback");

No need for a line-wrap. Otherwise
Acked-by: Jiri Pirko 


Re: [PATCH net-next] rocker: Drop pointless static qualifier

2018-10-19 Thread Jiri Pirko
Fri, Oct 19, 2018 at 02:02:59PM CEST, yuehaib...@huawei.com wrote:
>There is no need to have the 'struct rocker_desc_info *desc_info'
>variable static since new value always be assigned before use it.
>
>Signed-off-by: YueHaibing 

Acked-by: Jiri Pirko 



Re: [net-next PATCH] net: sched: cls_flower: Classify packets using port ranges

2018-10-19 Thread Jiri Pirko
Thu, Oct 18, 2018 at 08:24:44PM CEST, amritha.namb...@intel.com wrote:
>On 10/18/2018 5:17 AM, Jiri Pirko wrote:
>> Fri, Oct 12, 2018 at 03:53:30PM CEST, amritha.namb...@intel.com wrote:
>>> Added support in tc flower for filtering based on port ranges.
>>> This is a rework of the RFC patch at:
>>> https://patchwork.ozlabs.org/patch/969595/
>>>
>>> Example:
>>> 1. Match on a port range:
>>> -
>>> $ tc filter add dev enp4s0 protocol ip parent :\
>>>  prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\
>>>  action drop
>>>
>>> $ tc -s filter show dev enp4s0 parent :
>>> filter protocol ip pref 1 flower chain 0
>>> filter protocol ip pref 1 flower chain 0 handle 0x1
>>>  eth_type ipv4
>>>  ip_proto tcp
>>>  dst_port_min 20
>>>  dst_port_max 30
>>>  skip_hw
>>>  not_in_hw
>>>action order 1: gact action drop
>>> random type none pass val 0
>>> index 1 ref 1 bind 1 installed 181 sec used 5 sec
>>>Action statistics:
>>>Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
>>>backlog 0b 0p requeues 0
>>>
>>> 2. Match on IP address and port range:
>>> --
>>> $ tc filter add dev enp4s0 protocol ip parent :\
>>>  prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\
>>>  skip_hw action drop
>>>
>>> $ tc -s filter show dev enp4s0 parent :
>>> filter protocol ip pref 1 flower chain 0 handle 0x2
>>>  eth_type ipv4
>>>  ip_proto tcp
>>>  dst_ip 192.168.1.1
>>>  dst_port_min 100
>>>  dst_port_max 200
>>>  skip_hw
>>>  not_in_hw
>>>action order 1: gact action drop
>>> random type none pass val 0
>>> index 2 ref 1 bind 1 installed 28 sec used 6 sec
>>>Action statistics:
>>>Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
>>>backlog 0b 0p requeues 0
>>>
>>> Signed-off-by: Amritha Nambiar 
>>> ---
>>> include/uapi/linux/pkt_cls.h |5 ++
>>> net/sched/cls_flower.c   |  134 
>>> --
>>> 2 files changed, 132 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>>> index 401d0c1..b569308 100644
>>> --- a/include/uapi/linux/pkt_cls.h
>>> +++ b/include/uapi/linux/pkt_cls.h
>>> @@ -405,6 +405,11 @@ enum {
>>> TCA_FLOWER_KEY_UDP_SRC, /* be16 */
>>> TCA_FLOWER_KEY_UDP_DST, /* be16 */
>>>
>>> +   TCA_FLOWER_KEY_PORT_SRC_MIN,/* be16 */
>>> +   TCA_FLOWER_KEY_PORT_SRC_MAX,/* be16 */
>>> +   TCA_FLOWER_KEY_PORT_DST_MIN,/* be16 */
>>> +   TCA_FLOWER_KEY_PORT_DST_MAX,/* be16 */
>>> +
>>> TCA_FLOWER_FLAGS,
>>> TCA_FLOWER_KEY_VLAN_ID, /* be16 */
>>> TCA_FLOWER_KEY_VLAN_PRIO,   /* u8   */
>>> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>>> index 9aada2d..5f135f0 100644
>>> --- a/net/sched/cls_flower.c
>>> +++ b/net/sched/cls_flower.c
>>> @@ -55,6 +55,9 @@ struct fl_flow_key {
>>> struct flow_dissector_key_ip ip;
>>> struct flow_dissector_key_ip enc_ip;
>>> struct flow_dissector_key_enc_opts enc_opts;
>>> +
>>> +   struct flow_dissector_key_ports tp_min;
>>> +   struct flow_dissector_key_ports tp_max;
>>> } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as 
>>> longs. */
>>>
>>> struct fl_flow_mask_range {
>>> @@ -103,6 +106,11 @@ struct cls_fl_filter {
>>> struct net_device *hw_dev;
>>> };
>>>
>>> +enum fl_endpoint {
>>> +   FLOWER_ENDPOINT_DST,
>>> +   FLOWER_ENDPOINT_SRC
>>> +};
>>> +
>>> static const struct rhashtable_params mask_ht_params = {
>>> .key_offset = offsetof(struct fl_flow_mask, key),
>>> .key_len = sizeof(struct fl_flow_key),
>>> @@ -179,11 +187,86 @@ static void fl_clear_masked_range(struct fl_flow_key 
>>> *key,
>>> memset(fl_key_get_start(key, mask), 0, fl_mask_range(mask));
>>> }
>>>
>>> +static int fl_range_compare_params(struct cls_fl_filter *filter,
>>> +  struct fl_flow_key *key,
>>> + 

Re: [net PATCH] net: sched: Fix for duplicate class dump

2018-10-18 Thread Jiri Pirko
Thu, Oct 18, 2018 at 10:34:26AM CEST, p...@nwl.cc wrote:
>When dumping classes by parent, kernel would return classes twice:
>
>| # tc qdisc add dev lo root prio
>| # tc class show dev lo
>| class prio 8001:1 parent 8001:
>| class prio 8001:2 parent 8001:
>| class prio 8001:3 parent 8001:
>| # tc class show dev lo parent 8001:
>| class prio 8001:1 parent 8001:
>| class prio 8001:2 parent 8001:
>| class prio 8001:3 parent 8001:
>| class prio 8001:1 parent 8001:
>| class prio 8001:2 parent 8001:
>| class prio 8001:3 parent 8001:
>
>This comes from qdisc_match_from_root() potentially returning the root
>qdisc itself if its handle matched. Though in that case, root's classes
>were already dumped a few lines above.
>
>Fixes: cb395b2010879 ("net: sched: optimize class dumps")
>Signed-off-by: Phil Sutter 

Reviewed-by: Jiri Pirko 


Re: [iproute PATCH] devlink: Fix error reporting in cmd_resource_set()

2018-10-18 Thread Jiri Pirko
Thu, Oct 18, 2018 at 01:28:23PM CEST, p...@nwl.cc wrote:
>resource_path_parse() returns either zero or a negative error code,
>hence the negated value must be passed to strerror().
>
>Fixes: 8cd644095842a ("devlink: Add support for devlink resource abstraction")
>Signed-off-by: Phil Sutter 

Acked-by: Jiri Pirko 


Re: [iproute2 PATCH] tc: flower: Classify packets based port ranges

2018-10-18 Thread Jiri Pirko
Fri, Oct 12, 2018 at 03:54:42PM CEST, amritha.namb...@intel.com wrote:

[...]

>@@ -1516,6 +1625,22 @@ static int flower_print_opt(struct filter_util *qu, 
>FILE *f,
>   if (nl_type >= 0)
>   flower_print_port("src_port", tb[nl_type]);
> 
>+  if (flower_port_range_attr_type(ip_proto, FLOWER_ENDPOINT_DST, )
>+  == 0) {
>+  flower_print_port_range("dst_port_min",
>+  tb[range.min_port_type]);
>+  flower_print_port_range("dst_port_max",
>+  tb[range.max_port_type]);

The input and output of iproute2 utils, tc included should be in sync.
So you need to print "range x-y" here.


>+  }
>+
>+  if (flower_port_range_attr_type(ip_proto, FLOWER_ENDPOINT_SRC, )
>+  == 0) {
>+  flower_print_port_range("src_port_min",
>+  tb[range.min_port_type]);
>+  flower_print_port_range("src_port_max",
>+  tb[range.max_port_type]);
>+  }
>+
>   flower_print_tcp_flags("tcp_flags", tb[TCA_FLOWER_KEY_TCP_FLAGS],
>  tb[TCA_FLOWER_KEY_TCP_FLAGS_MASK]);
> 
>


Re: [net-next PATCH] net: sched: cls_flower: Classify packets using port ranges

2018-10-18 Thread Jiri Pirko
Fri, Oct 12, 2018 at 03:53:30PM CEST, amritha.namb...@intel.com wrote:
>Added support in tc flower for filtering based on port ranges.
>This is a rework of the RFC patch at:
>https://patchwork.ozlabs.org/patch/969595/
>
>Example:
>1. Match on a port range:
>-
>$ tc filter add dev enp4s0 protocol ip parent :\
>  prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\
>  action drop
>
>$ tc -s filter show dev enp4s0 parent :
>filter protocol ip pref 1 flower chain 0
>filter protocol ip pref 1 flower chain 0 handle 0x1
>  eth_type ipv4
>  ip_proto tcp
>  dst_port_min 20
>  dst_port_max 30
>  skip_hw
>  not_in_hw
>action order 1: gact action drop
> random type none pass val 0
> index 1 ref 1 bind 1 installed 181 sec used 5 sec
>Action statistics:
>Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
>backlog 0b 0p requeues 0
>
>2. Match on IP address and port range:
>--
>$ tc filter add dev enp4s0 protocol ip parent :\
>  prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\
>  skip_hw action drop
>
>$ tc -s filter show dev enp4s0 parent :
>filter protocol ip pref 1 flower chain 0 handle 0x2
>  eth_type ipv4
>  ip_proto tcp
>  dst_ip 192.168.1.1
>  dst_port_min 100
>  dst_port_max 200
>  skip_hw
>  not_in_hw
>action order 1: gact action drop
> random type none pass val 0
> index 2 ref 1 bind 1 installed 28 sec used 6 sec
>Action statistics:
>Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
>backlog 0b 0p requeues 0
>
>Signed-off-by: Amritha Nambiar 
>---
> include/uapi/linux/pkt_cls.h |5 ++
> net/sched/cls_flower.c   |  134 --
> 2 files changed, 132 insertions(+), 7 deletions(-)
>
>diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>index 401d0c1..b569308 100644
>--- a/include/uapi/linux/pkt_cls.h
>+++ b/include/uapi/linux/pkt_cls.h
>@@ -405,6 +405,11 @@ enum {
>   TCA_FLOWER_KEY_UDP_SRC, /* be16 */
>   TCA_FLOWER_KEY_UDP_DST, /* be16 */
> 
>+  TCA_FLOWER_KEY_PORT_SRC_MIN,/* be16 */
>+  TCA_FLOWER_KEY_PORT_SRC_MAX,/* be16 */
>+  TCA_FLOWER_KEY_PORT_DST_MIN,/* be16 */
>+  TCA_FLOWER_KEY_PORT_DST_MAX,/* be16 */
>+
>   TCA_FLOWER_FLAGS,
>   TCA_FLOWER_KEY_VLAN_ID, /* be16 */
>   TCA_FLOWER_KEY_VLAN_PRIO,   /* u8   */
>diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>index 9aada2d..5f135f0 100644
>--- a/net/sched/cls_flower.c
>+++ b/net/sched/cls_flower.c
>@@ -55,6 +55,9 @@ struct fl_flow_key {
>   struct flow_dissector_key_ip ip;
>   struct flow_dissector_key_ip enc_ip;
>   struct flow_dissector_key_enc_opts enc_opts;
>+
>+  struct flow_dissector_key_ports tp_min;
>+  struct flow_dissector_key_ports tp_max;
> } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as 
> longs. */
> 
> struct fl_flow_mask_range {
>@@ -103,6 +106,11 @@ struct cls_fl_filter {
>   struct net_device *hw_dev;
> };
> 
>+enum fl_endpoint {
>+  FLOWER_ENDPOINT_DST,
>+  FLOWER_ENDPOINT_SRC
>+};
>+
> static const struct rhashtable_params mask_ht_params = {
>   .key_offset = offsetof(struct fl_flow_mask, key),
>   .key_len = sizeof(struct fl_flow_key),
>@@ -179,11 +187,86 @@ static void fl_clear_masked_range(struct fl_flow_key 
>*key,
>   memset(fl_key_get_start(key, mask), 0, fl_mask_range(mask));
> }
> 
>+static int fl_range_compare_params(struct cls_fl_filter *filter,
>+ struct fl_flow_key *key,
>+ struct fl_flow_key *mkey,
>+ enum fl_endpoint endpoint)
>+{
>+  __be16 min_mask, max_mask, min_val, max_val;
>+
>+  if (endpoint == FLOWER_ENDPOINT_DST) {
>+  min_mask = htons(filter->mask->key.tp_min.dst);
>+  max_mask = htons(filter->mask->key.tp_max.dst);
>+  min_val = htons(filter->key.tp_min.dst);
>+  max_val = htons(filter->key.tp_max.dst);
>+
>+  if (min_mask && max_mask) {
>+  if (htons(key->tp.dst) < min_val ||
>+  htons(key->tp.dst) > max_val)
>+  return -1;
>+
>+  /* skb does not have min and max values */
>+  mkey->tp_min.dst = filter->mkey.tp_min.dst;
>+  mkey->tp_max.dst = filter->mkey.tp_max.dst;
>+  }
>+  } else {
>+  min_mask = htons(filter->mask->key.tp_min.src);
>+  max_mask = htons(filter->mask->key.tp_max.src);
>+  min_val = htons(filter->key.tp_min.src);
>+  max_val = htons(filter->key.tp_max.src);
>+
>+  if (min_mask && max_mask) {
>+  if (htons(key->tp.src) < min_val ||
>+  htons(key->tp.src) 

Re: [PATCH net-next v7 28/28] net: WireGuard secure network tunnel

2018-10-10 Thread Jiri Pirko
Wed, Oct 10, 2018 at 10:27:46PM CEST, ja...@zx2c4.com wrote:
>Hey Jiri,
>
>Actually, in the end I went with the suggestion from Andrew and Lukas,
>which is to follow Dan's guideline:
>https://lkml.org/lkml/2016/8/22/374 . It looks like this:
>
>https://git.kernel.org/pub/scm/linux/kernel/git/zx2c4/linux.git/tree/drivers/net/wireguard/device.c?h=jd/wireguard#n280

I prefer:
 err = do_something();
 if (err)
 goto err_do_something;

But your style is also quite common. Up to you, I guess.


>
>Jason


Re: [PATCH v4 net-next 0/9] bnxt_en: devlink param updates

2018-10-04 Thread Jiri Pirko
Thu, Oct 04, 2018 at 07:43:43AM CEST, vasundhara-v.vo...@broadcom.com wrote:
>This patchset adds support for 3 generic and 1 driver-specific devlink
>parameters. Add documentation for these configuration parameters.
>
>Also, this patchset adds support to return proper error code if
>HWRM_NVM_GET/SET_VARIABLE commands return error code
>HWRM_ERR_CODE_RESOURCE_ACCESS_DENIED.
>
>v3->v4:
>-Remove extra definition of NVM_OFF_HW_TC_OFFLOAD from bnxt_devlink.h
>-Remove type information for generic parameters from
>devlink-params-bnxt.txt
>
>v2->v3:
>-Remove description of generic parameters from devlink-params-bnxt.txt
>
>v1->v2:
>-Remove hw_tc_offload parameter.
>-Update all patches with Cc of MAINTAINERS.
>-Add more description in commit message for device specific parameter.
>-Add a new Documentation/networking/devlink-params.txt with some
>generic devlink parameters information.
>-Add a new Documentation/networking/devlink-params-bnxt.txt with devlink
>parameters information that are supported by bnxt_en driver.
>
>Vasundhara Volam (9):
>  devlink: Add generic parameter ignore_ari
>  devlink: Add generic parameter msix_vec_per_pf_max
>  devlink: Add generic parameter msix_vec_per_pf_min
>  bnxt_en: Use ignore_ari devlink parameter
>  bnxt_en: return proper error when FW returns
>HWRM_ERR_CODE_RESOURCE_ACCESS_DENIED
>  bnxt_en: Use msix_vec_per_pf_max and msix_vec_per_pf_min devlink
>params.
>  bnxt_en: Add a driver specific gre_ver_check devlink parameter.
>  devlink: Add Documentation/networking/devlink-params.txt
>  devlink: Add Documentation/networking/devlink-params-bnxt.txt

Looks good to me. For the whole set:
Acked-by: Jiri Pirko 


Re: [PATCH v3 net-next 9/9] devlink: Add Documentation/networking/devlink-params-bnxt.txt

2018-10-01 Thread Jiri Pirko
Mon, Oct 01, 2018 at 06:04:08AM CEST, vasundhara-v.vo...@broadcom.com wrote:
>On Sat, Sep 29, 2018 at 6:27 PM Jiri Pirko  wrote:
>>
>> Fri, Sep 28, 2018 at 08:28:23AM CEST, vasundhara-v.vo...@broadcom.com wrote:
>> >This patch adds a new file to add information about configuration
>> >parameters that are supported by bnxt_en driver via devlink.
>> >
>> >Cc: "David S. Miller" 
>> >Cc: Jonathan Corbet 
>> >Cc: linux-...@vger.kernel.org
>> >Cc: Jiri Pirko 
>> >Cc: Michael Chan 
>> >Signed-off-by: Vasundhara Volam 
>> >---
>> > Documentation/networking/devlink-params-bnxt.txt | 22 
>> > ++
>> > 1 file changed, 22 insertions(+)
>> > create mode 100644 Documentation/networking/devlink-params-bnxt.txt
>> >
>> >diff --git a/Documentation/networking/devlink-params-bnxt.txt 
>> >b/Documentation/networking/devlink-params-bnxt.txt
>> >new file mode 100644
>> >index 000..c7bc9d8
>> >--- /dev/null
>> >+++ b/Documentation/networking/devlink-params-bnxt.txt
>> >@@ -0,0 +1,22 @@
>> >+enable_sriov  [DEVICE, GENERIC]
>> >+  Type: Boolean
>>
>> No need to list "Type" here. You have it in devlink-params.txt for
>> generic params already.
>Ok, I thought type will give clarity when configuration mode is listed and
>when user checks out only driver specific readme file.
>I can remove it in next version of patchset.

Yes please. The thing is, once this info is on 2 places, I'm pretty sure
what eventually there would be different types on different places.
Leaving the type only on one place reduces need for sync.

Thanks.

>>
>>
>> >+  Configuration mode: Permanent
>> >+
>> >+ignore_ari[DEVICE, GENERIC]
>> >+  Type: Boolean
>> >+  Configuration mode: Permanent
>> >+
>> >+msix_vec_per_pf_max   [DEVICE, GENERIC]
>> >+  Type: u32
>> >+  Configuration mode: Permanent
>> >+
>> >+msix_vec_per_pf_min   [DEVICE, GENERIC]
>> >+  Type: u32
>> >+  Configuration mode: Permanent
>> >+
>> >+gre_ver_check [DEVICE, DRIVER-SPECIFIC]
>> >+  Generic Routing Encapsulation (GRE) version check 
>> >will
>> >+  be enabled in the device. If disabled, device skips
>> >+  version checking for incoming packets.
>> >+  Type: Boolean
>> >+  Configuration mode: Permanent
>> >--
>> >1.8.3.1
>> >


Re: [PATCH v3 net-next 9/9] devlink: Add Documentation/networking/devlink-params-bnxt.txt

2018-09-29 Thread Jiri Pirko
Fri, Sep 28, 2018 at 08:28:23AM CEST, vasundhara-v.vo...@broadcom.com wrote:
>This patch adds a new file to add information about configuration
>parameters that are supported by bnxt_en driver via devlink.
>
>Cc: "David S. Miller" 
>Cc: Jonathan Corbet 
>Cc: linux-...@vger.kernel.org
>Cc: Jiri Pirko 
>Cc: Michael Chan 
>Signed-off-by: Vasundhara Volam 
>---
> Documentation/networking/devlink-params-bnxt.txt | 22 ++
> 1 file changed, 22 insertions(+)
> create mode 100644 Documentation/networking/devlink-params-bnxt.txt
>
>diff --git a/Documentation/networking/devlink-params-bnxt.txt 
>b/Documentation/networking/devlink-params-bnxt.txt
>new file mode 100644
>index 000..c7bc9d8
>--- /dev/null
>+++ b/Documentation/networking/devlink-params-bnxt.txt
>@@ -0,0 +1,22 @@
>+enable_sriov  [DEVICE, GENERIC]
>+  Type: Boolean

No need to list "Type" here. You have it in devlink-params.txt for
generic params already.


>+  Configuration mode: Permanent
>+
>+ignore_ari[DEVICE, GENERIC]
>+  Type: Boolean
>+  Configuration mode: Permanent
>+
>+msix_vec_per_pf_max   [DEVICE, GENERIC]
>+  Type: u32
>+  Configuration mode: Permanent
>+
>+msix_vec_per_pf_min   [DEVICE, GENERIC]
>+  Type: u32
>+  Configuration mode: Permanent
>+
>+gre_ver_check [DEVICE, DRIVER-SPECIFIC]
>+  Generic Routing Encapsulation (GRE) version check will
>+  be enabled in the device. If disabled, device skips
>+  version checking for incoming packets.
>+  Type: Boolean
>+  Configuration mode: Permanent
>-- 
>1.8.3.1
>


Re: [RFC PATCH iproute2-next V2] System specification exception API

2018-09-27 Thread Jiri Pirko
Thu, Sep 27, 2018 at 04:02:48PM CEST, era...@mellanox.com wrote:
>
>
>On 9/27/2018 3:47 PM, Jiri Pirko wrote:
>> Wed, Sep 26, 2018 at 01:52:58PM CEST, era...@mellanox.com wrote:
>> > The exception spec is targeted for Real Time Alerting, in order to know 
>> > when
>> > something bad had happened to a PCI device
>> > - Provide alert debug information
>> > - Self healing
>> > - If problem needs vendor support, provide a way to gather all needed 
>> > debugging
>> >   information.
>> > 
>> > The exception mechanism contains condition checkers which sense for 
>> > malfunction. Upon a condition hit,
>> > actions such as logs and correction can be taken.
>> > 
>> > The condition checkers are divided into the following groups
>> > - Hardware - a checker which is triggered by the device due to
>> >   malfunction.
>> > - Software - a checker which is triggered by the software due to
>> >   malfunction.
>> 
>> What do you mean by a "software malfunction", a "FW malfunction"?
>> Also, I don't see this 2 groups in the man.
>
>Software malfunction can be a Transmit error (caused by bad send request).

Sorry, but I still don't undestand what "software malfuntion" are you
talking about. Could you be more specific please?


>FW/HW malfunction can be any catastrophic error report (the ones that should
>be exposed to driver).
>The comment here was to highlight that we can support different kinds of
>condition groups.
>If for a specific condition, we will need to highlight it is SW/HW, we can
>concatenate it to its name.
>
>Eran
>
>> 
>> 
>> > Both groups of condition checkers can be triggered due to error event or 
>> > due to a periodic check.
>> > 
>> > Actions are the way to handle those events. Action can be in one of the
>> > following groups:
>> > - Dump -  SW trace, SW dump, HW trace, HW dump
>> > - Reset - Surgical correction (e.g. modify Q, flush Q, reset of device, 
>> > etc)
>> > Actions can be performed by SW or HW.
>> > 
>> > User is allowed to enable or disable condition checkers and its action 
>> > mapping.
>> > 
>> > This RFC man page patch describes the suggested API of devlink-exception 
>> > in order
>> > to control conditions and actions.
>> > 
>> > V2:
>> > * Renaming terms:
>> >health -> exception
>> >sensor -> condition
>> > * Remove reinit command and merge with action command.
>> > * Consmetics in grammer.
>> > 
>> > Eran Ben Elisha (1):
>> >   man: Add devlink exception man page
>> > 
>> > man/man8/devlink-exception.8 | 158 
>> > +++
>> > 1 file changed, 158 insertions(+)
>> > create mode 100644 man/man8/devlink-exception.8
>> > 
>> > -- 
>> > 1.8.3.1
>> > 


Re: [RFC PATCH iproute2-next V2] man: Add devlink exception man page

2018-09-27 Thread Jiri Pirko
Wed, Sep 26, 2018 at 01:52:59PM CEST, era...@mellanox.com wrote:
>Add devlink-exception man page. Devlink-exception tool will control device
>exception attributes, conditions, actions and logging.
>
>Signed-off-by: Eran Ben Elisha 
>
>---
>Copy paste man output to here for easier review process of the RFC.
>
>DEVLINK-EXCEPTION(8)   
> Linux 
>  DEVLINK-EXCEPTION(8)
>
>NAME
>   devlink-exception - devlink exception configuration
>
>SYNOPSIS
>   devlink [ OPTIONS ] exception  { COMMAND | help }
>
>   OPTIONS := { -V[ersion] | -n[no-nice-names] }
>
>   devlink exception show [ DEV ] [ condition NAME ] [ action NAME ]
>
>   devlink exception condition set DEV name NAME [ action NAME { active | 
> inactive } ]
>
>   devlink exception action set DEV name NAME period PERIOD count COUNT 
> fail { ignore | down }
>
>   devlink exception help
>
>DESCRIPTION
>   devlink-exception tool allows user to configure the way driver treats 
> unexpected status. The tool allows configuration of the conditions that can 
> trigger exception activity. Set for each condition the follow up opera‐
>   tions, such as, reset and dump of info. In addition, set the exception 
> activity termination action.
>
>   devlink exception show - Display devlink exception conditions and actions 
> attributes
>   DEVSpecifies the devlink device to show.
>
>   condition NAME
>  Specifies the devlink condition to show.
>
>   action NAME
>  Specifies the devlink action to show.
>
>   devlink exception condition set - sets devlink exception condition 
> attributes
>   DEVSpecifies the devlink device to set.
>
>   name NAME
>  Name of the condition to set.
>
>   action NAME { active | inactive }
>  Specify which actions to activate and which to deactivate 
> once a condition was triggered. Actions can be dump, reset, etc.
>
>   devlink exception action set - sets devlink action attributes.
>   Once this command is launched, period and count measurement will be 
> reset.
>
>   DEVSpecifies the devlink device to set.
>
>   name NAME
>  Specifies the devlink action to set.
>
>   period PERIOD
>  The period on which we limit the amount of performed actions, 
> measured in seconds.
>
>   count COUNT
>  The maximum number of actions performed in a limited time frame.
>
>   fail   { ignore | down }
>  Specify the behavior once count limit was reached.
>
>  ignore - Skip triggering this action.
>
>  down - Driver will remain in nonoperational state.
>
>EXAMPLES
>   devlink exception show
>   Shows the exception state of all devlink devices on the system.
>
>   devlink exception show pci/:01:00.0
>   Shows the exception state of specified devlink device.
>
>   devlink exception condition set pci/:01:00.0 name TX_COMP_ERROR 
> action reset off action dump on
>   Sets TX_COMP_ERROR condition parameters for a specific device.
>
>   devlink exception action set pci/:01:00.0 name reset period 3600 
> count 5 fail ignore
>   Sets exception attributes for reset action. Period timer and 
> counter are being reset.

Looks good to me. But still, I need the code so I can play with it, to
see the outputs etc.

Thanks!


>
>SEE ALSO
>   devlink(8), devlink-port(8), devlink-sb(8), devlink-monitor(8), 
> devlink-dev(8),
>
>AUTHOR
>   Eran ben Elisha 
>
>iproute2   
>  15 Aug 2018  
>  DEVLINK-EXCEPTION(8)
>
>---
> man/man8/devlink-exception.8 | 158 +++
> 1 file changed, 158 insertions(+)
> create mode 100644 man/man8/devlink-exception.8
>
>diff --git a/man/man8/devlink-exception.8 b/man/man8/devlink-exception.8
>new file mode 100644
>index ..03f24b32cc98
>--- /dev/null
>+++ b/man/man8/devlink-exception.8
>@@ -0,0 +1,158 @@
>+.TH DEVLINK\-EXCEPTION 8 "15 Aug 2018" "iproute2" "Linux"
>+.SH NAME
>+devlink-exception \- devlink exception configuration
>+.SH SYNOPSIS
>+.sp
>+.ad l
>+.in +8
>+.ti -8
>+.B devlink
>+.RI "[ " OPTIONS " ]"
>+.BR exception
>+.RI  " { " COMMAND " | "
>+.BR help " }"
>+.sp
>+
>+.ti -8
>+.IR OPTIONS " := { "
>+\fB\-V\fR[\fIersion\fR] |
>+\fB\-n\fR[\fIno-nice-names\fR] }
>+
>+.ti -8
>+.B devlink exception show
>+.RI "[ " DEV " ]"
>+.RI "[ "
>+.B condition
>+.IR NAME
>+.RI "]"
>+.RI "[ "
>+.B action
>+.IR NAME
>+.RI "]"
>+
>+.ti -8
>+.B devlink exception condition set
>+.IR DEV
>+.B name
>+.IR NAME
>+.RI "[ "

Re: [RFC PATCH iproute2-next V2] System specification exception API

2018-09-27 Thread Jiri Pirko
Wed, Sep 26, 2018 at 01:52:58PM CEST, era...@mellanox.com wrote:
>The exception spec is targeted for Real Time Alerting, in order to know when
>something bad had happened to a PCI device
>- Provide alert debug information
>- Self healing
>- If problem needs vendor support, provide a way to gather all needed debugging
>  information.
>
>The exception mechanism contains condition checkers which sense for 
>malfunction. Upon a condition hit,
>actions such as logs and correction can be taken.
>
>The condition checkers are divided into the following groups
>- Hardware - a checker which is triggered by the device due to
>  malfunction.
>- Software - a checker which is triggered by the software due to
>  malfunction.

What do you mean by a "software malfunction", a "FW malfunction"?
Also, I don't see this 2 groups in the man.


>Both groups of condition checkers can be triggered due to error event or due 
>to a periodic check.
>
>Actions are the way to handle those events. Action can be in one of the
>following groups:
>- Dump -  SW trace, SW dump, HW trace, HW dump
>- Reset - Surgical correction (e.g. modify Q, flush Q, reset of device, etc)
>Actions can be performed by SW or HW.
>
>User is allowed to enable or disable condition checkers and its action mapping.
>
>This RFC man page patch describes the suggested API of devlink-exception in 
>order
>to control conditions and actions.
>
>V2:
>* Renaming terms:
>   health -> exception
>   sensor -> condition
>* Remove reinit command and merge with action command.
>* Consmetics in grammer.
>
>Eran Ben Elisha (1):
>  man: Add devlink exception man page
>
> man/man8/devlink-exception.8 | 158 +++
> 1 file changed, 158 insertions(+)
> create mode 100644 man/man8/devlink-exception.8
>
>-- 
>1.8.3.1
>


Re: [PATCH net] devlink: double free in devlink_resource_fill()

2018-09-23 Thread Jiri Pirko
Fri, Sep 21, 2018 at 10:07:55AM CEST, dan.carpen...@oracle.com wrote:
>Smatch reports that devlink_dpipe_send_and_alloc_skb() frees the skb
>on error so this is a double free.  We fixed a bunch of these bugs in
>commit 7fe4d6dcbcb4 ("devlink: Remove redundant free on error path") but
>we accidentally overlooked this one.
>
>Fixes: d9f9b9a4d05f ("devlink: Add support for resource abstraction")
>Signed-off-by: Dan Carpenter 

Acked-by: Jiri Pirko 


Re: [PATCH net-next 08/13] net: sched: rename tcf_block_get{_ext}() and tcf_block_put{_ext}()

2018-09-14 Thread Jiri Pirko
Fri, Sep 14, 2018 at 12:38:08PM CEST, vla...@mellanox.com wrote:
>
>On Thu 13 Sep 2018 at 17:21, Cong Wang  wrote:
>> On Wed, Sep 12, 2018 at 1:24 AM Vlad Buslov  wrote:
>>>
>>>
>>> On Fri 07 Sep 2018 at 20:09, Cong Wang  wrote:
>>> > On Thu, Sep 6, 2018 at 12:59 AM Vlad Buslov  wrote:
>>> >>
>>> >> Functions tcf_block_get{_ext}() and tcf_block_put{_ext}() actually
>>> >> attach/detach block to specific Qdisc besides just taking/putting
>>> >> reference. Rename them according to their purpose.
>>> >
>>> > Where exactly does it attach to?
>>> >
>>> > Each qdisc provides a pointer to a pointer of a block, like
>>> > >block. It is where the result is saved to. It takes a parameter
>>> > of Qdisc* merely for read-only purpose.
>>>
>>> tcf_block_attach_ext() passes qdisc parameter to tcf_block_owner_add()
>>> which saves qdisc to new tcf_block_owner_item and adds the item to
>>> block's owner list. I proposed several naming options for these
>>> functions to Jiri on internal review and he suggested "attach" as better
>>> option.
>>
>> But that is merely item->q = q, this is why I said it is read-only,
>> hard to claim this is attaching.
>>
>>
>>>
>>> >
>>> > So, renaming it to *attach() is even confusing, at least not
>>> > any better. Please find other names or leave them as they are.
>>>
>>> What would you recommend?
>>
>> I don't know, perhaps "acquire"?
>>
>> Or, leaving tcf_block_get() as it is but rename your refcnt
>> increment function to be something like tcf_block_refcnt_get()?
>
>Cong, I'm okay with both options.
>
>Jiri, which naming would you prefer?

Maybe tcf_block_refcnt_get() is better.


Re: [PATCH net] net/sched: act_sample: fix NULL dereference in the data path

2018-09-14 Thread Jiri Pirko
Fri, Sep 14, 2018 at 12:03:18PM CEST, dcara...@redhat.com wrote:
>Matteo reported the following splat, testing the datapath of TC 'sample':
>
> BUG: KASAN: null-ptr-deref in tcf_sample_act+0xc4/0x310
> Read of size 8 at addr  by task nc/433
>
> CPU: 0 PID: 433 Comm: nc Not tainted 4.19.0-rc3-kvm #17
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
> Call Trace:
>  kasan_report.cold.6+0x6c/0x2fa
>  tcf_sample_act+0xc4/0x310
>  ? dev_hard_start_xmit+0x117/0x180
>  tcf_action_exec+0xa3/0x160
>  tcf_classify+0xdd/0x1d0
>  htb_enqueue+0x18e/0x6b0
>  ? deref_stack_reg+0x7a/0xb0
>  ? htb_delete+0x4b0/0x4b0
>  ? unwind_next_frame+0x819/0x8f0
>  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
>  __dev_queue_xmit+0x722/0xca0
>  ? unwind_get_return_address_ptr+0x50/0x50
>  ? netdev_pick_tx+0xe0/0xe0
>  ? save_stack+0x8c/0xb0
>  ? kasan_kmalloc+0xbe/0xd0
>  ? __kmalloc_track_caller+0xe4/0x1c0
>  ? __kmalloc_reserve.isra.45+0x24/0x70
>  ? __alloc_skb+0xdd/0x2e0
>  ? sk_stream_alloc_skb+0x91/0x3b0
>  ? tcp_sendmsg_locked+0x71b/0x15a0
>  ? tcp_sendmsg+0x22/0x40
>  ? __sys_sendto+0x1b0/0x250
>  ? __x64_sys_sendto+0x6f/0x80
>  ? do_syscall_64+0x5d/0x150
>  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
>  ? __sys_sendto+0x1b0/0x250
>  ? __x64_sys_sendto+0x6f/0x80
>  ? do_syscall_64+0x5d/0x150
>  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
>  ip_finish_output2+0x495/0x590
>  ? ip_copy_metadata+0x2e0/0x2e0
>  ? skb_gso_validate_network_len+0x6f/0x110
>  ? ip_finish_output+0x174/0x280
>  __tcp_transmit_skb+0xb17/0x12b0
>  ? __tcp_select_window+0x380/0x380
>  tcp_write_xmit+0x913/0x1de0
>  ? __sk_mem_schedule+0x50/0x80
>  tcp_sendmsg_locked+0x49d/0x15a0
>  ? tcp_rcv_established+0x8da/0xa30
>  ? tcp_set_state+0x220/0x220
>  ? clear_user+0x1f/0x50
>  ? iov_iter_zero+0x1ae/0x590
>  ? __fget_light+0xa0/0xe0
>  tcp_sendmsg+0x22/0x40
>  __sys_sendto+0x1b0/0x250
>  ? __ia32_sys_getpeername+0x40/0x40
>  ? _copy_to_user+0x58/0x70
>  ? poll_select_copy_remaining+0x176/0x200
>  ? __pollwait+0x1c0/0x1c0
>  ? ktime_get_ts64+0x11f/0x140
>  ? kern_select+0x108/0x150
>  ? core_sys_select+0x360/0x360
>  ? vfs_read+0x127/0x150
>  ? kernel_write+0x90/0x90
>  __x64_sys_sendto+0x6f/0x80
>  do_syscall_64+0x5d/0x150
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x7fefef2b129d
> Code: ff ff ff ff eb b6 0f 1f 80 00 00 00 00 48 8d 05 51 37 0c 00 41 89 ca 8b 
> 00 85 c0 75 20 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 
> 6b f3 c3 66 0f 1f 84 00 00 00 00 00 41 56 41
> RSP: 002b:7fff2f5350c8 EFLAGS: 0246 ORIG_RAX: 002c
> RAX: ffda RBX: 56118d60c120 RCX: 7fefef2b129d
> RDX: 2000 RSI: 56118d629320 RDI: 0003
> RBP: 56118d530370 R08:  R09: 
> R10:  R11: 0246 R12: 2000
> R13: 56118d5c2a10 R14: 56118d5c2a10 R15: 56118d5303b8
>
>tcf_sample_act() tried to update its per-cpu stats, but tcf_sample_init()
>forgot to allocate them, because tcf_idr_create() was called with a wrong
>value of 'cpustats'. Setting it to true proved to fix the reported crash.
>
>Reported-by: Matteo Croce 
>Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use 
>IDR")
>Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action")
>Tested-by: Matteo Croce 
>Signed-off-by: Davide Caratti 

Acked-by: Jiri Pirko 



Re: [PATCH net-next 0/8] bnxt_en: devlink param updates

2018-09-14 Thread Jiri Pirko
Fri, Sep 14, 2018 at 06:17:07AM CEST, vasundhara-v.vo...@broadcom.com wrote:
>On Wed, Sep 12, 2018 at 3:20 PM Jakub Kicinski
> wrote:
>>
>> On Wed, 12 Sep 2018 12:09:37 +0530, Vasundhara Volam wrote:
>> > On Tue, Sep 11, 2018 at 5:04 PM Jakub Kicinski wrote:
>> > > On Tue, 11 Sep 2018 14:14:57 +0530, Vasundhara Volam wrote:
>> > > > This patchset adds support for 4 generic and 1 driver-specific devlink
>> > > > parameters.
>> > > >
>> > > > Also, this patchset adds support to return proper error code if
>> > > > HWRM_NVM_GET/SET_VARIABLE commands return error code
>> > > > HWRM_ERR_CODE_RESOURCE_ACCESS_DENIED.
>> > > >
>> > > > Vasundhara Volam (8):
>> > > >   devlink: Add generic parameter hw_tc_offload
>> > >
>> > > Much like Jiri, I can't help but wonder why do you need this?
>> >
>> > There is a request from our customer for a way to toggle tc_offload
>> > feature in our adapter.
>>
>> Vasundhara, again, we don't need to know who asked you to do this, but
>> _why_.  What problem are you solving?  What is the customer trying to
>> achieve?
>For Brand new big features like TC_offload, few customers are not willing
>to enable it by default in the adapter(Firmware). This was a subjective 
>decision
>to disable TC_offload by default in the adapter.

Again, why? Why it cannot be enabled in FW and just enabled/disabled by
ethtool flag? Don't say that "customers want it" please...


>>
>> > > >   devlink: Add generic parameter ignore_ari
>> > > >   devlink: Add generic parameter msix_vec_per_pf_max
>> > > >   devlink: Add generic parameter msix_vec_per_pf_min
>> > >
>> > > IMHO more structured API would be preferable if possible.  The string
>> > > keys won't scale if you want to set the parameters per PF, and
>> > > creating more structured API for PCIe which is a relatively slow
>> > > moving HW spec seems tractable.
>> >
>> > Sorry, could you please suggest an example? We will try to adapt.
>>
>> My thinking was that the same way devlink device has ports, it should
>> have PCIe functions as objects which then have attributes.  Instead of
>> making everything a string-identified device attribute.  But I'm not
>> dead set on this if others don't think its a good idea.
>Actually this parameters are for the port but the value given to this param
>is applicable for individual PF. That's the reason I have added "per_pf" 
>string.
>If you think this is not a good idea, I can move this params to 
>driver-specific.


Re: [net-next, RFC PATCH] net: sched: cls_range: Introduce Range classifier

2018-09-14 Thread Jiri Pirko
Thu, Sep 13, 2018 at 10:52:06PM CEST, amritha.namb...@intel.com wrote:

[...]

>+static struct cls_range_filter *range_lookup(struct cls_range_head *head,
>+   struct range_flow_key *key,
>+   struct range_flow_key *mkey,
>+   bool is_skb)
>+{
>+  struct cls_range_filter *filter, *next_filter;
>+  struct range_params range;
>+  int ret;
>+  size_t cmp_size;
>+
>+  list_for_each_entry_safe(filter, next_filter, >filters, flist) {

This really should be list_for_each_entry_rcu()

also, as I wrote in the previous email, this should be done in
cls_flower. Look at fl_lookup() it looks-up hashtable. You just need to
add linked list traversal and range comparison to that function for the
hit in the hashtable.


>+  if (!is_skb) {
>+  /* Existing filter comparison */
>+  cmp_size = sizeof(filter->mkey);
>+  } else {
>+  /* skb classification */
>+  ret = range_compare_params(, filter, key,
>+ RANGE_PORT_DST);
>+  if (ret < 0)
>+  continue;
>+
>+  ret = range_compare_params(, filter, key,
>+ RANGE_PORT_SRC);
>+  if (ret < 0)
>+  continue;
>+
>+  /* skb does not have min and max values */
>+  cmp_size = RANGE_KEY_MEMBER_OFFSET(tp_min);
>+  }
>+  if (!memcmp(mkey, >mkey, cmp_size))
>+  return filter;
>+  }
>+  return NULL;

[...]


Re: [net-next,RFC PATCH] Introduce TC Range classifier

2018-09-14 Thread Jiri Pirko
Thu, Sep 13, 2018 at 10:52:01PM CEST, amritha.namb...@intel.com wrote:
>This patch introduces a TC range classifier to support filtering based
>on ranges. Only port-range filters are supported currently. This can
>be combined with flower classifier to support filters that are a
>combination of port-ranges and other parameters based on existing
>fields supported by cls_flower. The 'goto chain' action can be used to
>combine the flower and range filter.
>The filter precedence is decided based on the 'prio' value.

For example Spectrum ASIC supports mask-based and range-based matching
in a single TCAM rule. No chains needed. Also, I don't really understand
why is this a separate cls. I believe that this functionality should be
put as an extension of existing cls_flower.


Re: [PATCH net-next 1/8] devlink: Add generic parameter hw_tc_offload

2018-09-13 Thread Jiri Pirko
Wed, Sep 12, 2018 at 08:34:22AM CEST, jakub.kicin...@netronome.com wrote:
>On Wed, 12 Sep 2018 11:47:59 +0530, Vasundhara Volam wrote:
>> On Tue, Sep 11, 2018 at 3:25 PM Jiri Pirko  wrote:
>> >
>> > Tue, Sep 11, 2018 at 10:44:58AM CEST, vasundhara-v.vo...@broadcom.com 
>> > wrote:  
>> > >hw_tc_offload - Enable/Disable TC flower offload in the device.
>> > >
>> > >Signed-off-by: Vasundhara Volam 
>> > >---
>> > > include/net/devlink.h | 4 
>> > > net/core/devlink.c| 5 +
>> > > 2 files changed, 9 insertions(+)
>> > >
>> > >diff --git a/include/net/devlink.h b/include/net/devlink.h
>> > >index b9b89d6..a0e9ce9 100644
>> > >--- a/include/net/devlink.h
>> > >+++ b/include/net/devlink.h
>> > >@@ -362,6 +362,7 @@ enum devlink_param_generic_id {
>> > >   DEVLINK_PARAM_GENERIC_ID_MAX_MACS,
>> > >   DEVLINK_PARAM_GENERIC_ID_ENABLE_SRIOV,
>> > >   DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT,
>> > >+  DEVLINK_PARAM_GENERIC_ID_HW_TC_OFFLOAD,  
>> >
>> > Could you please describe why do you need this here and why the
>> > tc_offload flag in ethtool is not enough. How do you imagine the user
>> > should use them together?  
>> Jiri, tc_offload flag in ethtool will modify feature in driver at
>> runtime. But I am adding
>> tc_offload param here to toggle this feature in NVM Config of our
>> adapter, whose configuration
>> mode is permanent and will be effective only with a reboot of the server.
>> 
>> User has to turn on tc_offload feature in NVM config of the adapter and then
>> enabling the tc_offload flag in ethtool will completely enable the
>> feature in driver.
>
>Thanks for explaining, however, I don't think we have trouble
>understanding *what* you are doing, but rather *why* you're doing it.

Exactly. Why ethtool tc_offload flag is not enough?


Re: [PATCH net-next 0/8] bnxt_en: devlink param updates

2018-09-11 Thread Jiri Pirko
Tue, Sep 11, 2018 at 01:33:51PM CEST, jakub.kicin...@netronome.com wrote:
>On Tue, 11 Sep 2018 14:14:57 +0530, Vasundhara Volam wrote:
>> This patchset adds support for 4 generic and 1 driver-specific devlink
>> parameters.
>> 
>> Also, this patchset adds support to return proper error code if
>> HWRM_NVM_GET/SET_VARIABLE commands return error code
>> HWRM_ERR_CODE_RESOURCE_ACCESS_DENIED.
>> 
>> Vasundhara Volam (8):
>>   devlink: Add generic parameter hw_tc_offload
>
>Much like Jiri, I can't help but wonder why do you need this?
>
>>   devlink: Add generic parameter ignore_ari
>>   devlink: Add generic parameter msix_vec_per_pf_max
>>   devlink: Add generic parameter msix_vec_per_pf_min
>
>IMHO more structured API would be preferable if possible.  The string
>keys won't scale if you want to set the parameters per PF, and
>creating more structured API for PCIe which is a relatively slow
>moving HW spec seems tractable.
>
>Not to mention the question Alex posed before about where this knobs
>should actually live.  I'm personally fine with devlink.
>
>>   bnxt_en: Use hw_tc_offload and ignore_ari devlink parameters
>>   bnxt_en: return proper error when FW returns
>> HWRM_ERR_CODE_RESOURCE_ACCESS_DENIED
>>   bnxt_en: Use msix_vec_per_pf_max and msix_vec_per_pf_min devlink
>> params.
>>   bnxt_en: Add a driver specific devlink parameter.
>
>The details about the device specific devlink parameter are very much
>lacking.  Why does the patch subject not mention any specifics?  What's
>GRE in the first place?
>
>You really need to provide more details and docs for all these
>parameters.

We really need Documentation/networking/devlink-params.txt to describe
these.



Re: [PATCH net-next 1/8] devlink: Add generic parameter hw_tc_offload

2018-09-11 Thread Jiri Pirko
Tue, Sep 11, 2018 at 10:44:58AM CEST, vasundhara-v.vo...@broadcom.com wrote:
>hw_tc_offload - Enable/Disable TC flower offload in the device.
>
>Signed-off-by: Vasundhara Volam 
>---
> include/net/devlink.h | 4 
> net/core/devlink.c| 5 +
> 2 files changed, 9 insertions(+)
>
>diff --git a/include/net/devlink.h b/include/net/devlink.h
>index b9b89d6..a0e9ce9 100644
>--- a/include/net/devlink.h
>+++ b/include/net/devlink.h
>@@ -362,6 +362,7 @@ enum devlink_param_generic_id {
>   DEVLINK_PARAM_GENERIC_ID_MAX_MACS,
>   DEVLINK_PARAM_GENERIC_ID_ENABLE_SRIOV,
>   DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT,
>+  DEVLINK_PARAM_GENERIC_ID_HW_TC_OFFLOAD,

Could you please describe why do you need this here and why the
tc_offload flag in ethtool is not enough. How do you imagine the user
should use them together?


Re: [PATCH] rtnetlink: expose value from SET_NETDEV_DEVTYPE via IFLA_DEVTYPE attribute

2018-08-30 Thread Jiri Pirko
Thu, Aug 30, 2018 at 12:13:40PM CEST, mar...@holtmann.org wrote:
>Hi Jiri,
>
> The name value from SET_NETDEV_DEVTYPE only ended up in the uevent sysfs
> file as DEVTYPE= information. To avoid any kind of race conditions
> between netlink messages and reading from sysfs, it is useful to add the
> same string as new IFLA_DEVTYPE attribute included in the RTM_NEWLINK
> messages.
> 
> For network managing daemons that have to classify ARPHRD_ETHER network
> devices into different types (like Wireless LAN, Bluetooth etc.), this
> avoids the extra round trip to sysfs and parsing of the uevent file.
> 
> Signed-off-by: Marcel Holtmann 
> ---
> include/uapi/linux/if_link.h |  2 ++
> net/core/rtnetlink.c | 12 
> 2 files changed, 14 insertions(+)
> 
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index 43391e2d1153..781294972bb4 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -166,6 +166,8 @@ enum {
>   IFLA_NEW_IFINDEX,
>   IFLA_MIN_MTU,
>   IFLA_MAX_MTU,
> + IFLA_DEVTYPE,   /* Name value from SET_NETDEV_DEVTYPE */
 
 This is not something netdev-related. dev->dev.type is struct device_type.
 This is a generic "device" thing. Incorrect to expose over
 netdev-specific API. Please use "device" API for this.
>>> 
>>> it is not just "device" related since this is a sub-classification of 
>>> ARPHRD_ETHER type as a wrote above. Don't get hang up that this information 
>>> is part of struct device. The dev->dev.type contains strings like "wlan", 
>>> "bluetooth", "wimax", "gadget" etc. so that you can tell what kind of 
>>> Ethernet card you have.
>> 
>> I understand. But using dev->dev.type for that purpose seems like an
>> abuse. If this is subtype of netdev, it should not be stored in parent
>> device. I believe that you need to re-introduce this as a part of struct
>> net_device - preferably an enum - and that you can expose over rtnl (and
>> net-sysfs).
>
>it is not stored in the parent device. It is stored in the device of the 
>netdev.

Yeah. That is what I ment. "Device struct" associated with the
netdevice.

>
>As Stephen said, there is no device API. And why would I add the same info 
>again and bloat netdev?
>
>drivers/net/bonding/bond_main.c:SET_NETDEV_DEVTYPE(bond_dev, 
>_type);
>drivers/net/geneve.c:   SET_NETDEV_DEVTYPE(dev, _type);
>drivers/net/macsec.c:   SET_NETDEV_DEVTYPE(dev, _type);
>drivers/net/ppp/ppp_generic.c:  SET_NETDEV_DEVTYPE(dev, _type);
>drivers/net/usb/hso.c:  SET_NETDEV_DEVTYPE(net, _type);
>drivers/net/usb/usbnet.c:   SET_NETDEV_DEVTYPE(net, _type);
>drivers/net/usb/usbnet.c:   SET_NETDEV_DEVTYPE(net, _type);
>drivers/net/vxlan.c:SET_NETDEV_DEVTYPE(dev, _type);
>drivers/net/wimax/i2400m/usb.c: SET_NETDEV_DEVTYPE(net_dev, _type);
>drivers/net/wireless/intersil/prism54/islpci_dev.c: 
>SET_NETDEV_DEVTYPE(ndev, _type);
>drivers/staging/gdm724x/gdm_lte.c:  SET_NETDEV_DEVTYPE(net, 
>_type);
>drivers/usb/gadget/function/u_ether.c:  SET_NETDEV_DEVTYPE(net, _type);
>drivers/usb/gadget/function/u_ether.c:  SET_NETDEV_DEVTYPE(net, _type);
>net/8021q/vlan_dev.c:   SET_NETDEV_DEVTYPE(dev, _type);
>net/bluetooth/6lowpan.c:SET_NETDEV_DEVTYPE(netdev, _type);
>net/bluetooth/bnep/core.c:  SET_NETDEV_DEVTYPE(dev, _type);
>net/bridge/br_device.c: SET_NETDEV_DEVTYPE(dev, _type);
>net/dsa/slave.c:SET_NETDEV_DEVTYPE(slave_dev, _type);
>net/hsr/hsr_device.c:   SET_NETDEV_DEVTYPE(dev, _type);
>net/l2tp/l2tp_eth.c:SET_NETDEV_DEVTYPE(dev, _type);
>net/wireless/core.c:SET_NETDEV_DEVTYPE(dev, _type);
>
>So for all these devices we have to add duplicate information now?

The fact the things are done now it a wrong way does not justify
extending UAPI in the same wrong way. So yes, change them all if you
need it. The "ethernet-subtype" should not an attribute of "struct
device" but rather "struct net_device".

>
>I actually don't like that idea. I can rename it to IFLA_FOO or any other 
>proposal, but adding the same information twice is pointless.

So don't expose the "struct device" attribute over rtnetlink. Use
"struct device"-specific interface for that.

>
>>> We can revive the patches for adding this information as IFLA_INFO_KIND, 
>>> but last time they where reverted since NetworkManager did all the wrong 
>>> decisions based on that. That part is fixed now and we can put that back 
>>> and declare the sub-type classification of Ethernet device in two places. 
>>> Or we just take the information that we added a long time ago for exactly 
>>> this sub-classification and provide them to userspace via RTNL.
>> 
>> IFLA_INFO_KIND serves for different purpose. It is for drivers that
>> register rtnl_link_ops. I don't think it would be wise to mix it with
>> this.
>
>We could register rtnl_link_ops for these Ethernet 

Re: [PATCH] rtnetlink: expose value from SET_NETDEV_DEVTYPE via IFLA_DEVTYPE attribute

2018-08-29 Thread Jiri Pirko
Wed, Aug 29, 2018 at 05:24:28PM CEST, step...@networkplumber.org wrote:
>On Wed, 29 Aug 2018 09:18:55 +0200
>Jiri Pirko  wrote:
>
>> Tue, Aug 28, 2018 at 10:58:11PM CEST, mar...@holtmann.org wrote:
>> >The name value from SET_NETDEV_DEVTYPE only ended up in the uevent sysfs
>> >file as DEVTYPE= information. To avoid any kind of race conditions
>> >between netlink messages and reading from sysfs, it is useful to add the
>> >same string as new IFLA_DEVTYPE attribute included in the RTM_NEWLINK
>> >messages.
>> >
>> >For network managing daemons that have to classify ARPHRD_ETHER network
>> >devices into different types (like Wireless LAN, Bluetooth etc.), this
>> >avoids the extra round trip to sysfs and parsing of the uevent file.
>> >
>> >Signed-off-by: Marcel Holtmann 
>> >---
>> > include/uapi/linux/if_link.h |  2 ++
>> > net/core/rtnetlink.c | 12 
>> > 2 files changed, 14 insertions(+)
>> >
>> >diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
>> >index 43391e2d1153..781294972bb4 100644
>> >--- a/include/uapi/linux/if_link.h
>> >+++ b/include/uapi/linux/if_link.h
>> >@@ -166,6 +166,8 @@ enum {
>> >IFLA_NEW_IFINDEX,
>> >IFLA_MIN_MTU,
>> >IFLA_MAX_MTU,
>> >+   IFLA_DEVTYPE,   /* Name value from SET_NETDEV_DEVTYPE */  
>> 
>> This is not something netdev-related. dev->dev.type is struct device_type.
>> This is a generic "device" thing. Incorrect to expose over
>> netdev-specific API. Please use "device" API for this.
>
>There is no device API in netlink. The whole point of this patch is to
>do it in one message. It might be a performance optimization, but I can't
>see how it could be a race condition. Devices set type before registering.

I understand. My point is to avoid exposing generic "struct device"
values over rtnetlink. It mixes apples and oranges.


Re: [PATCH] rtnetlink: expose value from SET_NETDEV_DEVTYPE via IFLA_DEVTYPE attribute

2018-08-29 Thread Jiri Pirko
Wed, Aug 29, 2018 at 05:23:58PM CEST, mar...@holtmann.org wrote:
>Hi Jiri,
>
>>> The name value from SET_NETDEV_DEVTYPE only ended up in the uevent sysfs
>>> file as DEVTYPE= information. To avoid any kind of race conditions
>>> between netlink messages and reading from sysfs, it is useful to add the
>>> same string as new IFLA_DEVTYPE attribute included in the RTM_NEWLINK
>>> messages.
>>> 
>>> For network managing daemons that have to classify ARPHRD_ETHER network
>>> devices into different types (like Wireless LAN, Bluetooth etc.), this
>>> avoids the extra round trip to sysfs and parsing of the uevent file.
>>> 
>>> Signed-off-by: Marcel Holtmann 
>>> ---
>>> include/uapi/linux/if_link.h |  2 ++
>>> net/core/rtnetlink.c | 12 
>>> 2 files changed, 14 insertions(+)
>>> 
>>> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
>>> index 43391e2d1153..781294972bb4 100644
>>> --- a/include/uapi/linux/if_link.h
>>> +++ b/include/uapi/linux/if_link.h
>>> @@ -166,6 +166,8 @@ enum {
>>> IFLA_NEW_IFINDEX,
>>> IFLA_MIN_MTU,
>>> IFLA_MAX_MTU,
>>> +   IFLA_DEVTYPE,   /* Name value from SET_NETDEV_DEVTYPE */
>> 
>> This is not something netdev-related. dev->dev.type is struct device_type.
>> This is a generic "device" thing. Incorrect to expose over
>> netdev-specific API. Please use "device" API for this.
>
>it is not just "device" related since this is a sub-classification of 
>ARPHRD_ETHER type as a wrote above. Don't get hang up that this information is 
>part of struct device. The dev->dev.type contains strings like "wlan", 
>"bluetooth", "wimax", "gadget" etc. so that you can tell what kind of Ethernet 
>card you have.

I understand. But using dev->dev.type for that purpose seems like an
abuse. If this is subtype of netdev, it should not be stored in parent
device. I believe that you need to re-introduce this as a part of struct
net_device - preferably an enum - and that you can expose over rtnl (and
net-sysfs).

>
>We can revive the patches for adding this information as IFLA_INFO_KIND, but 
>last time they where reverted since NetworkManager did all the wrong decisions 
>based on that. That part is fixed now and we can put that back and declare the 
>sub-type classification of Ethernet device in two places. Or we just take the 
>information that we added a long time ago for exactly this sub-classification 
>and provide them to userspace via RTNL.

IFLA_INFO_KIND serves for different purpose. It is for drivers that
register rtnl_link_ops. I don't think it would be wise to mix it with
this.

Btw, could you wrap the sentences at 72 cols? Would be easier to read
that way.


>
>Regards
>
>Marcel
>


Re: [PATCH net v2 1/2] net_sched: reject unknown tcfa_action values

2018-08-29 Thread Jiri Pirko
Wed, Aug 29, 2018 at 10:22:33AM CEST, pab...@redhat.com wrote:
>After the commit 802bfb19152c ("net/sched: user-space can't set
>unknown tcfa_action values"), unknown tcfa_action values are
>converted to TC_ACT_UNSPEC, but the common agreement is instead
>rejecting such configurations.
>
>This change also introduces a helper to simplify the destruction
>of a single action, avoiding code duplication.
>
>v1 -> v2:
> - helper is now static and renamed according to act_* convention
> - updated extack message, according to the new behavior
>
>Fixes: 802bfb19152c ("net/sched: user-space can't set unknown tcfa_action 
>values")
>Signed-off-by: Paolo Abeni 

Acked-by: Jiri Pirko 


Re: [PATCH] rtnetlink: expose value from SET_NETDEV_DEVTYPE via IFLA_DEVTYPE attribute

2018-08-29 Thread Jiri Pirko
Tue, Aug 28, 2018 at 10:58:11PM CEST, mar...@holtmann.org wrote:
>The name value from SET_NETDEV_DEVTYPE only ended up in the uevent sysfs
>file as DEVTYPE= information. To avoid any kind of race conditions
>between netlink messages and reading from sysfs, it is useful to add the
>same string as new IFLA_DEVTYPE attribute included in the RTM_NEWLINK
>messages.
>
>For network managing daemons that have to classify ARPHRD_ETHER network
>devices into different types (like Wireless LAN, Bluetooth etc.), this
>avoids the extra round trip to sysfs and parsing of the uevent file.
>
>Signed-off-by: Marcel Holtmann 
>---
> include/uapi/linux/if_link.h |  2 ++
> net/core/rtnetlink.c | 12 
> 2 files changed, 14 insertions(+)
>
>diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
>index 43391e2d1153..781294972bb4 100644
>--- a/include/uapi/linux/if_link.h
>+++ b/include/uapi/linux/if_link.h
>@@ -166,6 +166,8 @@ enum {
>   IFLA_NEW_IFINDEX,
>   IFLA_MIN_MTU,
>   IFLA_MAX_MTU,
>+  IFLA_DEVTYPE,   /* Name value from SET_NETDEV_DEVTYPE */

This is not something netdev-related. dev->dev.type is struct device_type.
This is a generic "device" thing. Incorrect to expose over
netdev-specific API. Please use "device" API for this.


[patch net 1/2] net: sched: fix extack error message when chain is failed to be created

2018-08-27 Thread Jiri Pirko
From: Jiri Pirko 

Instead "Cannot find" say "Cannot create".

Fixes: c35a4acc2985 ("net: sched: cls_api: handle generic cls errors")
Signed-off-by: Jiri Pirko 
---
 net/sched/cls_api.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 31bd1439cf60..2d41c5b21b48 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1252,7 +1252,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n,
}
chain = tcf_chain_get(block, chain_index, true);
if (!chain) {
-   NL_SET_ERR_MSG(extack, "Cannot find specified filter chain");
+   NL_SET_ERR_MSG(extack, "Cannot create specified filter chain");
err = -ENOMEM;
goto errout;
}
-- 
2.14.4



[patch net 2/2] net: sched: return -ENOENT when trying to remove filter from non-existent chain

2018-08-27 Thread Jiri Pirko
From: Jiri Pirko 

When chain 0 was implicitly created, removal of non-existent filter from
chain 0 gave -ENOENT. Once chain 0 became non-implicit, the same call is
giving -EINVAL. Fix this by returning -ENOENT in that case.

Reported-by: Roman Mashak 
Fixes: f71e0ca4db18 ("net: sched: Avoid implicit chain 0 creation")
Signed-off-by: Jiri Pirko 
---
 net/sched/cls_api.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 2d41c5b21b48..1a67af8a6e8c 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1399,7 +1399,7 @@ static int tc_del_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n,
goto errout;
}
NL_SET_ERR_MSG(extack, "Cannot find specified filter chain");
-   err = -EINVAL;
+   err = -ENOENT;
goto errout;
}
 
-- 
2.14.4



[patch net 0/2] net: sched: couple of small fixes

2018-08-27 Thread Jiri Pirko
From: Jiri Pirko 

Jiri Pirko (2):
  net: sched: fix extack error message when chain is failed to be
created
  net: sched: return -ENOENT when trying to remove filter from
non-existent chain

 net/sched/cls_api.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

-- 
2.14.4



Re: broken behaviour of TC filter delete

2018-08-25 Thread Jiri Pirko
Fri, Aug 24, 2018 at 08:11:07PM CEST, xiyou.wangc...@gmail.com wrote:
>On Fri, Aug 24, 2018 at 9:21 AM Roman Mashak  wrote:
>>
>> So _before_ commit f71e0ca4db187af7c44987e9d21e9042c3046070 step 6 would
>> return -ENOENT with "Error: Filter with specified priority/protocol not
>> found." and _after_ the commit it returns -EINVAL (Error: Cannot find
>> specified filter chain.)
>>
>> ENOENT seems to be more logical to return when there's no more filter to 
>> delete.
>
>Yeah, at least we should keep ENOENT for compatibility.
>
>The bug here is chain 0 is gone after the last filter is gone,
>so when you delete the filter again, it treats it as you specify
>chain 0 which does not exist, so it hits EINVAL before ENOENT.

I understand. My concern is about consistency with other chains. Perhaps
-ENOENT for all chains in this case would be doable. What do you think?

>
>I am not sure how to fix this properly.


Re: broken behaviour of TC filter delete

2018-08-24 Thread Jiri Pirko
Thu, Aug 23, 2018 at 11:39:22PM CEST, m...@mojatatu.com wrote:
>
>
>It appears that the following commit changed the behaviour of scenario where a
>filter is deleted twice:
>
>commit f71e0ca4db187af7c44987e9d21e9042c3046070
>Author: Jiri Pirko 
>Date:   Mon Jul 23 09:23:05 2018 +0200
>
>net: sched: Avoid implicit chain 0 creation
>
>
>Steps to reproduce :
>
>1) create dummy device
>   $ ip link add dev dummy0 type dummy
>
>2) create qdisc
>   $ tc qdisc add dev dummy0 ingress
>
>3) create simple u32 filter with action attached
>   $ tc filter add dev dummy0 parent : protocol ip prio 1 u32 match ip src 
> 10.10.10.1/32 action ok
>
>4) list the filter
>   $ tc filter ls dev dummy0 parent :
>
>5) delete the filter with the given protocol and priority
>   $ tc filter del dev dummy0 parent : protocol ip prio 1
>
>6) repeat step 5, this will return -ENOENT ("Error: Filter with specified 
>priority/protocol not found.")
>However, before the change at step 6 we would get -EINVAL (Error: Cannot find 
>specified filter chain.)
>and that makes sense.

Wait, this now returns:
Error: Cannot find specified filter chain.
So you want it to return -EINVAL (Error: Cannot find specified filter chain.) ?
How about for other chains?


>
>The change breaks a number of our internal TC tests.


[patch net-next] net: sched: fix block->refcnt decrement

2018-08-08 Thread Jiri Pirko
From: Jiri Pirko 

Currently the refcnt is never decremented in case the value is not 1.
Fix it by adding decrement in case the refcnt is not 1.

Reported-by: Vlad Buslov 
Fixes: f71e0ca4db18 ("net: sched: Avoid implicit chain 0 creation")
Signed-off-by: Jiri Pirko 
---
 net/sched/cls_api.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 194c2e0b2737..f922ce27ed5e 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -780,6 +780,8 @@ void tcf_block_put_ext(struct tcf_block *block, struct 
Qdisc *q,
block->refcnt--;
if (list_empty(>chain_list))
kfree(block);
+   } else {
+   block->refcnt--;
}
 }
 EXPORT_SYMBOL(tcf_block_put_ext);
-- 
2.14.4



Re: [PATCH net-next 12/14] net: sched: act_mirred: remove dependency on rtnl lock

2018-08-08 Thread Jiri Pirko
Wed, Aug 08, 2018 at 10:47:04AM CEST, vla...@mellanox.com wrote:
>
>On Wed 08 Aug 2018 at 08:03, Jiri Pirko  wrote:
>> Wed, Aug 08, 2018 at 09:40:35AM CEST, vla...@mellanox.com wrote:
>>>
>>>On Tue 07 Aug 2018 at 16:36, Jiri Pirko  wrote:
>>>> Mon, Aug 06, 2018 at 08:54:23AM CEST, vla...@mellanox.com wrote:
>>>>
>>>> [...]
>>>>
>>>>>diff --git a/include/net/tc_act/tc_tunnel_key.h 
>>>>>b/include/net/tc_act/tc_tunnel_key.h
>>>>>index 46b8c7f1c8d5..e6e475d788c6 100644
>>>>>--- a/include/net/tc_act/tc_tunnel_key.h
>>>>>+++ b/include/net/tc_act/tc_tunnel_key.h
>>>>>@@ -30,26 +30,47 @@ struct tcf_tunnel_key {
>>>>> 
>>>>> static inline bool is_tcf_tunnel_set(const struct tc_action *a)
>>>>> {
>>>>>+  bool ret = false;
>>>>> #ifdef CONFIG_NET_CLS_ACT
>>>>>   struct tcf_tunnel_key *t = to_tunnel_key(a);
>>>>>-  struct tcf_tunnel_key_params *params = rtnl_dereference(t->params);
>>>>>+  struct tcf_tunnel_key_params *params;
>>>>> 
>>>>>+  rcu_read_lock();
>>>>>+  params = rcu_dereference(t->params);
>>>>>   if (a->ops && a->ops->type == TCA_ACT_TUNNEL_KEY)
>>>>>-  return params->tcft_action == TCA_TUNNEL_KEY_ACT_SET;
>>>>>+  ret = params->tcft_action == TCA_TUNNEL_KEY_ACT_SET;
>>>>>+  rcu_read_unlock();
>>>>> #endif
>>>>>-  return false;
>>>>>+  return ret;
>>>>> }
>>>>> 
>>>>> static inline bool is_tcf_tunnel_release(const struct tc_action *a)
>>>>
>>>> Why are these tunnel things in a mirred patch?
>>>
>>>Mistake during re-slit. Will move those to tunnel_key patch.
>>
>> Are you sure that the changes are safe? I just quickly looked over it
>> and it smells:
>> drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:
>> if (is_tcf_tunnel_set(a)) {
>> info = tcf_tunnel_info(a);
>>
>> Why the "t->params" can't be nulled in the middle?
>
>First of all, no API is actually "unlocked" with this patch. It is a
>preparation, rtnl mutex is still in use.
>
>Callers of these functions will have to be updated, for example, to use
>their _rcu version while holding rcu_read_lock.

I don't see any rcu version of these. I think that it would be good to
convert the callers to rcu and you can avoid these changes.

>
>
>
>


Re: [PATCH net-next 12/14] net: sched: act_mirred: remove dependency on rtnl lock

2018-08-08 Thread Jiri Pirko
Wed, Aug 08, 2018 at 09:40:35AM CEST, vla...@mellanox.com wrote:
>
>On Tue 07 Aug 2018 at 16:36, Jiri Pirko  wrote:
>> Mon, Aug 06, 2018 at 08:54:23AM CEST, vla...@mellanox.com wrote:
>>
>> [...]
>>
>>>diff --git a/include/net/tc_act/tc_tunnel_key.h 
>>>b/include/net/tc_act/tc_tunnel_key.h
>>>index 46b8c7f1c8d5..e6e475d788c6 100644
>>>--- a/include/net/tc_act/tc_tunnel_key.h
>>>+++ b/include/net/tc_act/tc_tunnel_key.h
>>>@@ -30,26 +30,47 @@ struct tcf_tunnel_key {
>>> 
>>> static inline bool is_tcf_tunnel_set(const struct tc_action *a)
>>> {
>>>+bool ret = false;
>>> #ifdef CONFIG_NET_CLS_ACT
>>> struct tcf_tunnel_key *t = to_tunnel_key(a);
>>>-struct tcf_tunnel_key_params *params = rtnl_dereference(t->params);
>>>+struct tcf_tunnel_key_params *params;
>>> 
>>>+rcu_read_lock();
>>>+params = rcu_dereference(t->params);
>>> if (a->ops && a->ops->type == TCA_ACT_TUNNEL_KEY)
>>>-return params->tcft_action == TCA_TUNNEL_KEY_ACT_SET;
>>>+ret = params->tcft_action == TCA_TUNNEL_KEY_ACT_SET;
>>>+rcu_read_unlock();
>>> #endif
>>>-return false;
>>>+return ret;
>>> }
>>> 
>>> static inline bool is_tcf_tunnel_release(const struct tc_action *a)
>>
>> Why are these tunnel things in a mirred patch?
>
>Mistake during re-slit. Will move those to tunnel_key patch.

Are you sure that the changes are safe? I just quickly looked over it
and it smells:
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:
if (is_tcf_tunnel_set(a)) {
info = tcf_tunnel_info(a);

Why the "t->params" can't be nulled in the middle?


Re: [PATCH net-next 12/14] net: sched: act_mirred: remove dependency on rtnl lock

2018-08-07 Thread Jiri Pirko
Mon, Aug 06, 2018 at 08:54:23AM CEST, vla...@mellanox.com wrote:

[...]

>diff --git a/include/net/tc_act/tc_tunnel_key.h 
>b/include/net/tc_act/tc_tunnel_key.h
>index 46b8c7f1c8d5..e6e475d788c6 100644
>--- a/include/net/tc_act/tc_tunnel_key.h
>+++ b/include/net/tc_act/tc_tunnel_key.h
>@@ -30,26 +30,47 @@ struct tcf_tunnel_key {
> 
> static inline bool is_tcf_tunnel_set(const struct tc_action *a)
> {
>+  bool ret = false;
> #ifdef CONFIG_NET_CLS_ACT
>   struct tcf_tunnel_key *t = to_tunnel_key(a);
>-  struct tcf_tunnel_key_params *params = rtnl_dereference(t->params);
>+  struct tcf_tunnel_key_params *params;
> 
>+  rcu_read_lock();
>+  params = rcu_dereference(t->params);
>   if (a->ops && a->ops->type == TCA_ACT_TUNNEL_KEY)
>-  return params->tcft_action == TCA_TUNNEL_KEY_ACT_SET;
>+  ret = params->tcft_action == TCA_TUNNEL_KEY_ACT_SET;
>+  rcu_read_unlock();
> #endif
>-  return false;
>+  return ret;
> }
> 
> static inline bool is_tcf_tunnel_release(const struct tc_action *a)

Why are these tunnel things in a mirred patch?


[...]


> static struct net_device *tcf_mirred_get_dev(const struct tc_action *a)
> {
>   struct tcf_mirred *m = to_mirred(a);
>+  struct net_device *dev;
>+
>+  rcu_read_lock();
>+  dev = rcu_dereference(m->tcfm_dev);
>+  if (dev)
>+  dev_hold(dev);
>+  rcu_read_unlock();
> 
>-  return rtnl_dereference(m->tcfm_dev);
>+  return dev;
> }
> 
> static int tcf_mirred_delete(struct net *net, u32 index)
>diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>index e8b0bbd0883f..0cce0eadc28b 100644
>--- a/net/sched/cls_api.c
>+++ b/net/sched/cls_api.c
>@@ -2167,6 +2167,7 @@ static int tc_exts_setup_cb_egdev_call(struct tcf_exts 
>*exts,
>   if (!dev)
>   continue;
>   ret = tc_setup_cb_egdev_call(dev, type, type_data, err_stop);
>+  dev_put(dev);

This looks really odd. a->ops->put_dev is needed.


>   if (ret < 0)
>   return ret;
>   ok_count += ret;
>-- 
>2.7.5
>


Re: consequences of setting net_device_ops ndo_change_carrier()?

2018-08-04 Thread Jiri Pirko
Sat, Aug 04, 2018 at 01:06:58PM CEST, rpj...@crashcourse.ca wrote:
>
>  i'll try to keep this (relatively) short as there may be a simple
>answer to this, or it could just be a stupid question -- sort of
>related to previous question (thank you, florian).
>
>  currently messing with networking device involving FPGA and some
>quad-port transceivers, and noticed that, when one unplugs or plugs a
>device into one of the ports, there is no change in the contents of
>the corresponding sysfs files /sys/class/net//carrier (or
>operstate, for that matter, which might be related to this as well).
>doing this with a "regular" port on my linux laptop certainly
>confirmed that the carrier file would switch between 0 and 1, and
>operstate would switch between up and down, so i know what behaviour i
>was *expecting* if things were ostensibly working properly.
>
>  long story short, i pawed through the driver code only to stumble

What driver? Has to be out of tree as I don't see any in the existing
kernel using .ndo_change_carrier (aside of team and dummy)


>over this in the ethernet driver for the device:
>
>  static const struct net_device_ops netdev_netdev_ops = {
>  ... snip ...
>.ndo_change_carrier = netdev_change_carrier,
>  ... snip ...
>  };
>
>and
>
>  static int
>  netdev_change_carrier(struct net_device *dev, bool new_carrier)
>  {
>if (new_carrier)
>netif_carrier_on(dev);
>else
>netif_carrier_off(dev);
>return 0;
>  }
>
>as i mentioned before, i am really new to kernel networking code, so i
>did a quick search and found this in netdevice.h:
>
>* int (*ndo_change_carrier)(struct net_device *dev, bool new_carrier);
> *  Called to change device carrier. Soft-devices (like dummy, team, etc)
> *  which do not represent real hardware may define this to allow their
> *  userspace components to manage their virtual carrier state. Devices
> *  that determine carrier state from physical hardware properties (eg
> *  network cables) or protocol-dependent mechanisms (eg
> *  USB_CDC_NOTIFY_NETWORK_CONNECTION) should NOT implement this function.
> *
>
>although i still don't fully understand the purpose of that field, it
>makes me *very* nervous to read that that routine is for "soft"
>devices, and ***not*** for devices that attempt to determine carrier
>state from physical hardware properties. i searched the kernel code
>base for other drivers that set that field, and found only what is
>mentioned in that comment -- dummy.c, of_dummy_mac.c and team.c.
>
>  the testers for this unit are complaining that they are somehow not
>being notified when they plug and unplug devices from the ports -- is
>this why? what would be the purpose of assigning a routine to that
>field? as i read it (and i could be wrong), my impression is that you
>can have the driver *either* determine the carrier state from physical
>properties, *or* allow userspace control, but not both, is that
>correct?

Correct. Your device is physical device, it knows how to get the state
of the carrier itself.


>
>  i'm about to ask the original authors why they did the above, but

I guess that the reason is that they had no clue what they are doing :)


>i'd like to feel that it's not a stupid question if there's something
>really clever going on here. is this just a development debugging
>feature that would normally be removed at production? or what?
>
>rday
>
>-- 
>
>
>Robert P. J. Day Ottawa, Ontario, CANADA
>  http://crashcourse.ca/dokuwiki
>
>Twitter:   http://twitter.com/rpjday
>LinkedIn:   http://ca.linkedin.com/in/rpjday
>


[patch net-next] net: sched: fix flush on non-existing chain

2018-08-03 Thread Jiri Pirko
From: Jiri Pirko 

User was able to perform filter flush on chain 0 even if it didn't have
any filters in it. With the patch that avoided implicit chain 0
creation, this changed. So in case user wants filter flush on chain
which does not exist, just return success. There's no reason for non-0
chains to behave differently than chain 0, so do the same for them.

Reported-by: Ido Schimmel 
Fixes: f71e0ca4db18 ("net: sched: Avoid implicit chain 0 creation")
Signed-off-by: Jiri Pirko 
---
 net/sched/cls_api.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index e8b0bbd0883f..194c2e0b2737 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1389,6 +1389,13 @@ static int tc_del_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n,
}
chain = tcf_chain_get(block, chain_index, false);
if (!chain) {
+   /* User requested flush on non-existent chain. Nothing to do,
+* so just return success.
+*/
+   if (prio == 0) {
+   err = 0;
+   goto errout;
+   }
NL_SET_ERR_MSG(extack, "Cannot find specified filter chain");
err = -EINVAL;
goto errout;
-- 
2.14.4



  1   2   3   4   5   6   7   8   9   10   >