[PATCH net-next] add documents for snmp counters
Add explanation of the following counters: TcpExtTCPRcvCoalesce TcpExtTCPAutoCorking TcpExtTCPOrigDataSent TCPSynRetrans TCPFastOpenActiveFail TcpExtListenOverflows TcpExtListenDrops TcpExtTCPHystartTrainDetect TcpExtTCPHystartTrainCwnd TcpExtTCPHystartDelayDetect TcpExtTCPHystartDelayCwnd Signed-off-by: yupeng --- Documentation/networking/snmp_counter.rst | 202 ++ 1 file changed, 202 insertions(+) diff --git a/Documentation/networking/snmp_counter.rst b/Documentation/networking/snmp_counter.rst index a262d32ed710..918a1374af30 100644 --- a/Documentation/networking/snmp_counter.rst +++ b/Documentation/networking/snmp_counter.rst @@ -220,6 +220,68 @@ Defined in `RFC1213 tcpPassiveOpens`_ It means the TCP layer receives a SYN, replies a SYN+ACK, come into the SYN-RCVD state. +* TcpExtTCPRcvCoalesce +When packets are received by the TCP layer and are not read by the +application, the TCP layer will try to merge them. This counter +indicates how many packets were merged in such situations. If GRO is +enabled, lots of packets will be merged by GRO, and these packets +are not counted in TcpExtTCPRcvCoalesce. + +* TcpExtTCPAutoCorking +When sending packets, the TCP layer will try to merge small packets into +a bigger one. This counter increases by 1 for every packet merged in such +a situation. Please refer to the LWN article for more details: +https://lwn.net/Articles/576263/ + +* TcpExtTCPOrigDataSent +This counter is explained by `kernel commit f19c29e3e391`_; the +explanation is quoted below:: + + TCPOrigDataSent: number of outgoing packets with original data (excluding + retransmission but including data-in-SYN). This counter is different from + TcpOutSegs because TcpOutSegs also tracks pure ACKs. TCPOrigDataSent is + more useful to track the TCP retransmission rate.
+ +* TCPSynRetrans +This counter is explained by `kernel commit f19c29e3e391`_; the +explanation is quoted below:: + + TCPSynRetrans: number of SYN and SYN/ACK retransmits to break down + retransmissions into SYN, fast-retransmits, timeout retransmits, etc. + +* TCPFastOpenActiveFail +This counter is explained by `kernel commit f19c29e3e391`_; the +explanation is quoted below:: + + TCPFastOpenActiveFail: Fast Open attempts (SYN/data) failed because + the remote does not accept it or the attempts timed out. + +.. _kernel commit f19c29e3e391: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f19c29e3e391a66a273e9afebaf01917245148cd + +* TcpExtListenOverflows and TcpExtListenDrops +When the kernel receives a SYN from a client and the TCP accept queue +is full, the kernel will drop the SYN and add 1 to TcpExtListenOverflows. +At the same time the kernel will also add 1 to TcpExtListenDrops. When a +TCP socket is in the LISTEN state and the kernel needs to drop a packet, +it will always add 1 to TcpExtListenDrops. So an increase of +TcpExtListenOverflows is always accompanied by an increase of +TcpExtListenDrops, but TcpExtListenDrops can also increase without +TcpExtListenOverflows increasing; e.g. a memory allocation failure will +also increase TcpExtListenDrops. + +Note: the above explanation is based on kernel 4.10 or later; older +kernels behave differently when the TCP accept queue is full. On an old +kernel, the TCP stack won't drop the SYN; it will complete the 3-way +handshake. As the accept queue is full, the TCP +stack will keep the socket in the TCP half-open queue.
As the socket is in the +half-open queue, the TCP stack will send SYN+ACK on an exponential backoff +timer. After the client replies with an ACK, the TCP stack checks whether the accept +queue is still full: if it is not full, the socket is moved to the accept +queue; if it is full, the socket is kept in the half-open queue and will +get another chance to move to the accept queue the next time the client +replies with an ACK. + + TCP Fast Open When kernel receives a TCP packet, it has two paths to handler the @@ -331,6 +393,38 @@ TcpExtTCPAbortFailed will be increased. .. _RFC2525 2.17 section: https://tools.ietf.org/html/rfc2525#page-50 +TCP Hybrid Slow Start + +The Hybrid Slow Start algorithm is an enhancement of the traditional +TCP congestion window Slow Start algorithm. It uses two pieces of +information to detect whether the maximum bandwidth of the TCP path has +been approached: the ACK train length and the increase in packet +delay. For detailed information, please refer to the +`Hybrid Slow Start paper`_. When either the ACK train length or the packet +delay hits a specific threshold, the congestion control algorithm enters +the Congestion Avoidance state. As of v4.20, two congestion +control algorithms use Hybrid Slow Start: cubic (the +default congestion control algorithm) and cdg. Four SNMP counters +relate to the Hybrid Slow Start algorithm. + +.. _Hybrid Slow Start paper: https://pdfs.semanticscholar.org/25e9/ef3f03315782c7f1cbcd31b587857adae7d1.pdf + +*
RE: [PATCH net-next 1/8] dpaa2-eth: Add basic XDP support
> -Original Message- > From: David Ahern > Sent: Saturday, November 24, 2018 11:49 PM > To: Ioana Ciocoi Radulescu ; > netdev@vger.kernel.org; da...@davemloft.net > Cc: Ioana Ciornei > Subject: Re: [PATCH net-next 1/8] dpaa2-eth: Add basic XDP support > > On 11/23/18 9:56 AM, Ioana Ciocoi Radulescu wrote: > > @@ -215,6 +255,7 @@ static void dpaa2_eth_rx(struct dpaa2_eth_priv > *priv, > > struct dpaa2_fas *fas; > > void *buf_data; > > u32 status = 0; > > + u32 xdp_act; > > > > /* Tracing point */ > > trace_dpaa2_rx_fd(priv->net_dev, fd); > > @@ -231,8 +272,14 @@ static void dpaa2_eth_rx(struct dpaa2_eth_priv > *priv, > > percpu_extras = this_cpu_ptr(priv->percpu_extras); > > > > if (fd_format == dpaa2_fd_single) { > > + xdp_act = run_xdp(priv, ch, (struct dpaa2_fd *)fd, vaddr); > > + if (xdp_act != XDP_PASS) > > + return; > > please bump the rx counters (packets and bytes) regardless of what XDP > outcome is. > > Same for Tx; packets and bytes counter should be bumped for packets > redirected by XDP. Thanks for the feedback, I wasn't sure whether I should count them as regular packets or not. I'll make the change in v2. Ioana
[PATCH net] sctp: increase sk_wmem_alloc when head->truesize is increased
I changed sk_wmem_alloc to be counted by skb truesize instead of by 1 to fix the sk_wmem_alloc leak caused by the later truesize change in xfrm in commit 02968ccf0125 ("sctp: count sk_wmem_alloc by skb truesize in sctp_packet_transmit"). But I should have also increased sk_wmem_alloc when head->truesize is increased in sctp_packet_gso_append(), as xfrm does. Otherwise, sctp gso packets will cause a sk_wmem_alloc underflow. Fixes: 02968ccf0125 ("sctp: count sk_wmem_alloc by skb truesize in sctp_packet_transmit") Signed-off-by: Xin Long --- net/sctp/output.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/sctp/output.c b/net/sctp/output.c index b0e74a3..025f48e 100644 --- a/net/sctp/output.c +++ b/net/sctp/output.c @@ -410,6 +410,7 @@ static void sctp_packet_gso_append(struct sk_buff *head, struct sk_buff *skb) head->truesize += skb->truesize; head->data_len += skb->len; head->len += skb->len; + refcount_add(skb->truesize, &head->sk->sk_wmem_alloc); __skb_header_release(skb); } -- 2.1.0
[PATCH rdma-next 6/7] IB/mlx5: Update the supported DEVX commands
From: Yishai Hadas Update the supported DEVX commands, it includes adding to the query/modify command's list and to the encoding handling. In addition, a valid range for general commands was added to be used for future commands. Signed-off-by: Yishai Hadas Reviewed-by: Artemy Kovalyov Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/devx.c | 17 + include/linux/mlx5/mlx5_ifc.h | 10 ++ 2 files changed, 27 insertions(+) diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c index 80053324dd31..5271469aad10 100644 --- a/drivers/infiniband/hw/mlx5/devx.c +++ b/drivers/infiniband/hw/mlx5/devx.c @@ -314,6 +314,8 @@ static u64 devx_get_obj_id(const void *in) MLX5_GET(query_dct_in, in, dctn)); break; case MLX5_CMD_OP_QUERY_XRQ: + case MLX5_CMD_OP_QUERY_XRQ_DC_PARAMS_ENTRY: + case MLX5_CMD_OP_QUERY_XRQ_ERROR_PARAMS: obj_id = get_enc_obj_id(MLX5_CMD_OP_CREATE_XRQ, MLX5_GET(query_xrq_in, in, xrqn)); break; @@ -340,9 +342,16 @@ static u64 devx_get_obj_id(const void *in) MLX5_GET(drain_dct_in, in, dctn)); break; case MLX5_CMD_OP_ARM_XRQ: + case MLX5_CMD_OP_SET_XRQ_DC_PARAMS_ENTRY: obj_id = get_enc_obj_id(MLX5_CMD_OP_CREATE_XRQ, MLX5_GET(arm_xrq_in, in, xrqn)); break; + case MLX5_CMD_OP_QUERY_PACKET_REFORMAT_CONTEXT: + obj_id = get_enc_obj_id + (MLX5_CMD_OP_ALLOC_PACKET_REFORMAT_CONTEXT, +MLX5_GET(query_packet_reformat_context_in, + in, packet_reformat_id)); + break; default: obj_id = 0; } @@ -601,6 +610,7 @@ static bool devx_is_obj_modify_cmd(const void *in) case MLX5_CMD_OP_DRAIN_DCT: case MLX5_CMD_OP_ARM_DCT_FOR_KEY_VIOLATION: case MLX5_CMD_OP_ARM_XRQ: + case MLX5_CMD_OP_SET_XRQ_DC_PARAMS_ENTRY: return true; case MLX5_CMD_OP_SET_FLOW_TABLE_ENTRY: { @@ -642,6 +652,9 @@ static bool devx_is_obj_query_cmd(const void *in) case MLX5_CMD_OP_QUERY_XRC_SRQ: case MLX5_CMD_OP_QUERY_DCT: case MLX5_CMD_OP_QUERY_XRQ: + case MLX5_CMD_OP_QUERY_XRQ_DC_PARAMS_ENTRY: + case MLX5_CMD_OP_QUERY_XRQ_ERROR_PARAMS: + case 
MLX5_CMD_OP_QUERY_PACKET_REFORMAT_CONTEXT: return true; default: return false; @@ -685,6 +698,10 @@ static bool devx_is_general_cmd(void *in) { u16 opcode = MLX5_GET(general_obj_in_cmd_hdr, in, opcode); + if (opcode >= MLX5_CMD_OP_GENERAL_START && + opcode < MLX5_CMD_OP_GENERAL_END) + return true; + switch (opcode) { case MLX5_CMD_OP_QUERY_HCA_CAP: case MLX5_CMD_OP_QUERY_HCA_VPORT_CONTEXT: diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index ece1b606c909..171d68663640 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -144,6 +144,9 @@ enum { MLX5_CMD_OP_DESTROY_XRQ = 0x718, MLX5_CMD_OP_QUERY_XRQ = 0x719, MLX5_CMD_OP_ARM_XRQ = 0x71a, + MLX5_CMD_OP_QUERY_XRQ_DC_PARAMS_ENTRY = 0x725, + MLX5_CMD_OP_SET_XRQ_DC_PARAMS_ENTRY = 0x726, + MLX5_CMD_OP_QUERY_XRQ_ERROR_PARAMS= 0x727, MLX5_CMD_OP_QUERY_VPORT_STATE = 0x750, MLX5_CMD_OP_MODIFY_VPORT_STATE= 0x751, MLX5_CMD_OP_QUERY_ESW_VPORT_CONTEXT = 0x752, @@ -245,6 +248,7 @@ enum { MLX5_CMD_OP_MODIFY_FLOW_TABLE = 0x93c, MLX5_CMD_OP_ALLOC_PACKET_REFORMAT_CONTEXT = 0x93d, MLX5_CMD_OP_DEALLOC_PACKET_REFORMAT_CONTEXT = 0x93e, + MLX5_CMD_OP_QUERY_PACKET_REFORMAT_CONTEXT = 0x93f, MLX5_CMD_OP_ALLOC_MODIFY_HEADER_CONTEXT = 0x940, MLX5_CMD_OP_DEALLOC_MODIFY_HEADER_CONTEXT = 0x941, MLX5_CMD_OP_QUERY_MODIFY_HEADER_CONTEXT = 0x942, @@ -260,6 +264,12 @@ enum { MLX5_CMD_OP_MAX }; +/* Valid range for general commands that don't work over an object */ +enum { + MLX5_CMD_OP_GENERAL_START = 0xb00, + MLX5_CMD_OP_GENERAL_END = 0xd00, +}; + struct mlx5_ifc_flow_table_fields_supported_bits { u8 outer_dmac[0x1]; u8 outer_smac[0x1]; -- 2.19.1
[PATCH rdma-next 7/7] IB/mlx5: Allow XRC usage via verbs in DEVX context
From: Yishai Hadas Allow XRC usage from the verbs flow in a DEVX context. As the XRCD is a kernel resource shared between processes, it should be created with UID=0 to reflect that. As a result, once XRC QPs/SRQs are created, they must also be used with UID=0 so that the firmware will allow the XRCD usage. Signed-off-by: Yishai Hadas Reviewed-by: Artemy Kovalyov Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/mlx5_ib.h | 1 - drivers/infiniband/hw/mlx5/qp.c | 12 +--- drivers/infiniband/hw/mlx5/srq.c | 2 +- 3 files changed, 6 insertions(+), 9 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index 4d33965369cc..24cb2f793210 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -543,7 +543,6 @@ struct mlx5_ib_srq { struct mlx5_ib_xrcd { struct ib_xrcd ibxrcd; u32 xrcdn; - u16 uid; }; enum mlx5_ib_mtt_access_flags { diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index 52ffc6af3c20..369db954edbe 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -775,6 +775,7 @@ static int create_user_qp(struct mlx5_ib_dev *dev, struct ib_pd *pd, __be64 *pas; void *qpc; int err; + u16 uid; err = ib_copy_from_udata(&ucmd, udata, sizeof(ucmd)); if (err) { @@ -836,7 +837,8 @@ static int create_user_qp(struct mlx5_ib_dev *dev, struct ib_pd *pd, goto err_umem; } - MLX5_SET(create_qp_in, *in, uid, to_mpd(pd)->uid); + uid = (attr->qp_type != IB_QPT_XRC_TGT) ?
to_mpd(pd)->uid : 0; + MLX5_SET(create_qp_in, *in, uid, uid); pas = (__be64 *)MLX5_ADDR_OF(create_qp_in, *in, pas); if (ubuffer->umem) mlx5_ib_populate_pas(dev, ubuffer->umem, page_shift, pas, 0); @@ -5513,7 +5515,6 @@ struct ib_xrcd *mlx5_ib_alloc_xrcd(struct ib_device *ibdev, struct mlx5_ib_dev *dev = to_mdev(ibdev); struct mlx5_ib_xrcd *xrcd; int err; - u16 uid; if (!MLX5_CAP_GEN(dev->mdev, xrc)) return ERR_PTR(-ENOSYS); @@ -5522,14 +5523,12 @@ struct ib_xrcd *mlx5_ib_alloc_xrcd(struct ib_device *ibdev, if (!xrcd) return ERR_PTR(-ENOMEM); - uid = context ? to_mucontext(context)->devx_uid : 0; - err = mlx5_cmd_xrcd_alloc(dev->mdev, &xrcd->xrcdn, uid); + err = mlx5_cmd_xrcd_alloc(dev->mdev, &xrcd->xrcdn, 0); if (err) { kfree(xrcd); return ERR_PTR(-ENOMEM); } - xrcd->uid = uid; return &xrcd->ibxrcd; } @@ -5537,10 +5536,9 @@ int mlx5_ib_dealloc_xrcd(struct ib_xrcd *xrcd) { struct mlx5_ib_dev *dev = to_mdev(xrcd->device); u32 xrcdn = to_mxrcd(xrcd)->xrcdn; - u16 uid = to_mxrcd(xrcd)->uid; int err; - err = mlx5_cmd_xrcd_dealloc(dev->mdev, xrcdn, uid); + err = mlx5_cmd_xrcd_dealloc(dev->mdev, xrcdn, 0); if (err) mlx5_ib_warn(dev, "failed to dealloc xrcdn 0x%x\n", xrcdn); diff --git a/drivers/infiniband/hw/mlx5/srq.c b/drivers/infiniband/hw/mlx5/srq.c index b3aef0eb39cb..0413b10dea71 100644 --- a/drivers/infiniband/hw/mlx5/srq.c +++ b/drivers/infiniband/hw/mlx5/srq.c @@ -113,7 +113,7 @@ static int create_srq_user(struct ib_pd *pd, struct mlx5_ib_srq *srq, in->log_page_size = page_shift - MLX5_ADAPTER_PAGE_SHIFT; in->page_offset = offset; - in->uid = to_mpd(pd)->uid; + in->uid = (in->type != IB_SRQT_XRC) ? to_mpd(pd)->uid : 0; if (MLX5_CAP_GEN(dev->mdev, cqe_version) == MLX5_CQE_VERSION_V1 && in->type != IB_SRQT_BASIC) in->user_index = uidx; -- 2.19.1
[PATCH rdma-next 5/7] IB/mlx5: Enforce DEVX privilege by firmware
From: Yishai Hadas Enforce DEVX privileges in firmware; this enables future device functionality without the need for driver changes, unless a new privilege type is introduced. Signed-off-by: Yishai Hadas Reviewed-by: Artemy Kovalyov Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/devx.c| 17 + drivers/infiniband/hw/mlx5/main.c| 4 ++-- drivers/infiniband/hw/mlx5/mlx5_ib.h | 5 +++-- 3 files changed, 14 insertions(+), 12 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c index f80b78aab4da..80053324dd31 100644 --- a/drivers/infiniband/hw/mlx5/devx.c +++ b/drivers/infiniband/hw/mlx5/devx.c @@ -47,24 +47,31 @@ devx_ufile2uctx(const struct uverbs_attr_bundle *attrs) return to_mucontext(ib_uverbs_get_ucontext(attrs)); } -int mlx5_ib_devx_create(struct mlx5_ib_dev *dev) +int mlx5_ib_devx_create(struct mlx5_ib_dev *dev, bool is_user) { u32 in[MLX5_ST_SZ_DW(create_uctx_in)] = {0}; u32 out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)] = {0}; u64 general_obj_types; - void *hdr; + void *hdr, *uctx; int err; u16 uid; + u32 cap = 0; hdr = MLX5_ADDR_OF(create_uctx_in, in, hdr); + uctx = MLX5_ADDR_OF(create_uctx_in, in, uctx); general_obj_types = MLX5_CAP_GEN_64(dev->mdev, general_obj_types); if (!(general_obj_types & MLX5_GENERAL_OBJ_TYPES_CAP_UCTX) || !(general_obj_types & MLX5_GENERAL_OBJ_TYPES_CAP_UMEM)) return -EINVAL; + if (is_user && capable(CAP_NET_RAW) && + (MLX5_CAP_GEN(dev->mdev, uctx_cap) & MLX5_UCTX_CAP_RAW_TX)) + cap |= MLX5_UCTX_CAP_RAW_TX; + MLX5_SET(general_obj_in_cmd_hdr, hdr, opcode, MLX5_CMD_OP_CREATE_GENERAL_OBJECT); MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type, MLX5_OBJ_TYPE_UCTX); + MLX5_SET(uctx, uctx, cap, cap); err = mlx5_cmd_exec(dev->mdev, in, sizeof(in), out, sizeof(out)); if (err) @@ -672,9 +679,6 @@ static int devx_get_uid(struct mlx5_ib_ucontext *c, void *cmd_in) if (!c->devx_uid) return -EINVAL; - if (!capable(CAP_NET_RAW)) - return -EPERM; - return c->devx_uid; } static bool
devx_is_general_cmd(void *in) @@ -1239,9 +1243,6 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_UMEM_REG)( if (!c->devx_uid) return -EINVAL; - if (!capable(CAP_NET_RAW)) - return -EPERM; - obj = kzalloc(sizeof(struct devx_umem), GFP_KERNEL); if (!obj) return -ENOMEM; diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index b3986bc961ca..2b09e6896e5a 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -1763,7 +1763,7 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev, #endif if (req.flags & MLX5_IB_ALLOC_UCTX_DEVX) { - err = mlx5_ib_devx_create(dev); + err = mlx5_ib_devx_create(dev, true); if (err < 0) goto out_uars; context->devx_uid = err; @@ -6234,7 +6234,7 @@ static int mlx5_ib_stage_devx_init(struct mlx5_ib_dev *dev) { int uid; - uid = mlx5_ib_devx_create(dev); + uid = mlx5_ib_devx_create(dev, false); if (uid > 0) dev->devx_whitelist_uid = uid; diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index 59e1664a107f..4d33965369cc 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -1268,7 +1268,7 @@ void mlx5_ib_put_native_port_mdev(struct mlx5_ib_dev *dev, u8 port_num); #if IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS) -int mlx5_ib_devx_create(struct mlx5_ib_dev *dev); +int mlx5_ib_devx_create(struct mlx5_ib_dev *dev, bool is_user); void mlx5_ib_devx_destroy(struct mlx5_ib_dev *dev, u16 uid); const struct uverbs_object_tree_def *mlx5_ib_get_devx_tree(void); extern const struct uapi_definition mlx5_ib_devx_defs[]; @@ -1283,7 +1283,8 @@ int mlx5_ib_get_flow_trees(const struct uverbs_object_tree_def **root); void mlx5_ib_destroy_flow_action_raw(struct mlx5_ib_flow_action *maction); #else static inline int -mlx5_ib_devx_create(struct mlx5_ib_dev *dev) { return -EOPNOTSUPP; }; +mlx5_ib_devx_create(struct mlx5_ib_dev *dev, + bool is_user) { return -EOPNOTSUPP; } static inline void 
mlx5_ib_devx_destroy(struct mlx5_ib_dev *dev, u16 uid) {} static inline bool mlx5_ib_devx_is_flow_dest(void *obj, int *dest_id, int *dest_type) -- 2.19.1
[PATCH mlx5-next 1/7] net/mlx5: Update mlx5_ifc with DEVX UCTX capabilities bits
From: Yishai Hadas Expose device capabilities for the DEVX user context; this includes which caps the device supports and a matching bit to set as part of user context creation. Signed-off-by: Yishai Hadas Reviewed-by: Artemy Kovalyov Signed-off-by: Leon Romanovsky --- include/linux/mlx5/mlx5_ifc.h | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 6f64e814cc10..ece1b606c909 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -883,6 +883,10 @@ enum { MLX5_CAP_UMR_FENCE_NONE = 0x2, }; +enum { + MLX5_UCTX_CAP_RAW_TX = 1UL << 0, +}; + struct mlx5_ifc_cmd_hca_cap_bits { u8 reserved_at_0[0x30]; u8 vhca_id[0x10]; @@ -1193,7 +1197,13 @@ struct mlx5_ifc_cmd_hca_cap_bits { u8 num_vhca_ports[0x8]; u8 reserved_at_618[0x6]; u8 sw_owner_id[0x1]; - u8 reserved_at_61f[0x1e1]; + u8 reserved_at_61f[0x1]; + + u8 reserved_at_620[0x80]; + + u8 uctx_cap[0x20]; + + u8 reserved_at_6c0[0x140]; }; enum mlx5_flow_destination_type { @@ -9276,7 +9286,9 @@ struct mlx5_ifc_umem_bits { struct mlx5_ifc_uctx_bits { u8 modify_field_select[0x40]; - u8 reserved_at_40[0x1c0]; + u8 cap[0x20]; + + u8 reserved_at_60[0x1a0]; }; struct mlx5_ifc_create_umem_in_bits { -- 2.19.1
[PATCH rdma-next 0/7] Enrich DEVX support
From: Leon Romanovsky From Yishai, --- This series enriches DEVX support in a few aspects: it enables interoperability between DEVX and verbs and improves the mechanism for controlling privileged DEVX commands. The first patch updates the mlx5 ifc file. The next 3 patches enable modifying and querying verbs objects via the DEVX interface. To achieve that, the core layer introduced the 'UVERBS_IDR_ANY_OBJECT' type to match any IDR object. Once it's used by some driver's method, the infrastructure skips checking for the IDR type and it becomes the driver handler's responsibility. The DEVX methods of modify and query were changed to get any object type via the 'UVERBS_IDR_ANY_OBJECT' mechanism. The type checking is done per object as part of the driver code. The next 3 patches introduce a more robust mechanism for controlling privileged DEVX commands. The responsibility to block/allow each command was moved into the firmware, based on the UID credentials that the driver reports upon user context creation. This enables more granularity per command based on the device security model and the user credentials. In addition, by introducing a valid range for 'general commands' we avoid the need to touch the driver's code any time a new command is added in the future. The last patch fixes the XRC verbs flow once a DEVX context is used. This is needed as the XRCD is a shared kernel resource and, as such, a kernel UID (=0) should be used for its related resources.
Thanks Yishai Hadas (7): net/mlx5: Update mlx5_ifc with DEVX UCTX capabilities bits IB/core: Introduce UVERBS_IDR_ANY_OBJECT IB/core: Enable getting an object type from a given uobject IB/mlx5: Enable modify and query verbs objects via DEVX IB/mlx5: Enforce DEVX privilege by firmware IB/mlx5: Update the supported DEVX commands IB/mlx5: Allow XRC usage via verbs in DEVX context drivers/infiniband/core/rdma_core.c | 27 +++-- drivers/infiniband/core/rdma_core.h | 21 ++-- drivers/infiniband/core/uverbs_uapi.c | 10 +- drivers/infiniband/hw/mlx5/devx.c | 142 ++ drivers/infiniband/hw/mlx5/main.c | 4 +- drivers/infiniband/hw/mlx5/mlx5_ib.h | 6 +- drivers/infiniband/hw/mlx5/qp.c | 12 +-- drivers/infiniband/hw/mlx5/srq.c | 2 +- include/linux/mlx5/mlx5_ifc.h | 26 - include/rdma/uverbs_ioctl.h | 6 ++ include/rdma/uverbs_std_types.h | 12 +++ 11 files changed, 215 insertions(+), 53 deletions(-) -- 2.19.1
[PATCH rdma-next 3/7] IB/core: Enable getting an object type from a given uobject
From: Yishai Hadas Enable getting an object type from a given uobject; the type is saved upon tree merging and is returned by a new helper function. Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/rdma_core.h | 5 - drivers/infiniband/core/uverbs_uapi.c | 1 + include/rdma/uverbs_std_types.h | 12 3 files changed, 13 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/core/rdma_core.h b/drivers/infiniband/core/rdma_core.h index 8aec28037c48..b3ca7457ac42 100644 --- a/drivers/infiniband/core/rdma_core.h +++ b/drivers/infiniband/core/rdma_core.h @@ -118,11 +118,6 @@ void release_ufile_idr_uobject(struct ib_uverbs_file *ufile); * Depending on ID the slot pointer in the radix tree points at one of these * structs. */ -struct uverbs_api_object { - const struct uverbs_obj_type *type_attrs; - const struct uverbs_obj_type_class *type_class; - u8 disabled:1; -}; struct uverbs_api_ioctl_method { int(__rcu *handler)(struct uverbs_attr_bundle *attrs); diff --git a/drivers/infiniband/core/uverbs_uapi.c b/drivers/infiniband/core/uverbs_uapi.c index faac225184a6..0136c1d78a0f 100644 --- a/drivers/infiniband/core/uverbs_uapi.c +++ b/drivers/infiniband/core/uverbs_uapi.c @@ -184,6 +184,7 @@ static int uapi_merge_obj_tree(struct uverbs_api *uapi, if (WARN_ON(obj_elm->type_attrs)) return -EINVAL; + obj_elm->id = obj->id; obj_elm->type_attrs = obj->type_attrs; obj_elm->type_class = obj->type_attrs->type_class; /* diff --git a/include/rdma/uverbs_std_types.h b/include/rdma/uverbs_std_types.h index df878ce02c94..883abcf6d36e 100644 --- a/include/rdma/uverbs_std_types.h +++ b/include/rdma/uverbs_std_types.h @@ -182,5 +182,17 @@ static inline void ib_set_flow(struct ib_uobject *uobj, struct ib_flow *ibflow, uflow->resources = uflow_res; } +struct uverbs_api_object { + const struct uverbs_obj_type *type_attrs; + const struct uverbs_obj_type_class *type_class; + u8 disabled:1; + u32 id; +}; + +static inline u32 uobj_get_object_id(struct
ib_uobject *uobj) +{ + return uobj->uapi_object->id; +} + #endif -- 2.19.1
[PATCH rdma-next 2/7] IB/core: Introduce UVERBS_IDR_ANY_OBJECT
From: Yishai Hadas Introduce the UVERBS_IDR_ANY_OBJECT type to match any IDR object. Once it is used, the infrastructure skips checking for the IDR type; it becomes the driver handler's responsibility. This enables a driver method to get an object of one of several types. Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/rdma_core.c | 27 +-- drivers/infiniband/core/rdma_core.h | 16 +++- drivers/infiniband/core/uverbs_uapi.c | 9 +++-- include/rdma/uverbs_ioctl.h | 6 ++ 4 files changed, 45 insertions(+), 13 deletions(-) diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c index efa292489271..d160ed23065e 100644 --- a/drivers/infiniband/core/rdma_core.c +++ b/drivers/infiniband/core/rdma_core.c @@ -398,16 +398,23 @@ struct ib_uobject *rdma_lookup_get_uobject(const struct uverbs_api_object *obj, struct ib_uobject *uobj; int ret; - if (!obj) - return ERR_PTR(-EINVAL); + if (IS_ERR(obj) && PTR_ERR(obj) == -ENOMSG) { + /* must be UVERBS_IDR_ANY_OBJECT, see uapi_get_object() */ + uobj = lookup_get_idr_uobject(NULL, ufile, id, mode); + if (IS_ERR(uobj)) + return uobj; + } else { + if (IS_ERR(obj)) + return ERR_PTR(-EINVAL); - uobj = obj->type_class->lookup_get(obj, ufile, id, mode); - if (IS_ERR(uobj)) - return uobj; + uobj = obj->type_class->lookup_get(obj, ufile, id, mode); + if (IS_ERR(uobj)) + return uobj; - if (uobj->uapi_object != obj) { - ret = -EINVAL; - goto free; + if (uobj->uapi_object != obj) { + ret = -EINVAL; + goto free; + } } /* @@ -427,7 +434,7 @@ struct ib_uobject *rdma_lookup_get_uobject(const struct uverbs_api_object *obj, return uobj; free: - obj->type_class->lookup_put(uobj, mode); + uobj->uapi_object->type_class->lookup_put(uobj, mode); uverbs_uobject_put(uobj); return ERR_PTR(ret); } @@ -491,7 +498,7 @@ struct ib_uobject *rdma_alloc_begin_uobject(const struct uverbs_api_object *obj, { struct ib_uobject *ret; - if (!obj) + if (IS_ERR(obj)) return ERR_PTR(-EINVAL); /* diff --git
a/drivers/infiniband/core/rdma_core.h b/drivers/infiniband/core/rdma_core.h index bac484d6753a..8aec28037c48 100644 --- a/drivers/infiniband/core/rdma_core.h +++ b/drivers/infiniband/core/rdma_core.h @@ -162,10 +162,24 @@ struct uverbs_api { const struct uverbs_api_write_method **write_ex_methods; }; +/* + * Get an uverbs_api_object that corresponds to the given object_id. + * Note: + * -ENOMSG means that any object is allowed to match during lookup. + */ static inline const struct uverbs_api_object * uapi_get_object(struct uverbs_api *uapi, u16 object_id) { - return radix_tree_lookup(&uapi->radix, uapi_key_obj(object_id)); + const struct uverbs_api_object *res; + + if (object_id == UVERBS_IDR_ANY_OBJECT) + return ERR_PTR(-ENOMSG); + + res = radix_tree_lookup(&uapi->radix, uapi_key_obj(object_id)); + if (!res) + return ERR_PTR(-ENOENT); + + return res; } char *uapi_key_format(char *S, unsigned int key); diff --git a/drivers/infiniband/core/uverbs_uapi.c b/drivers/infiniband/core/uverbs_uapi.c index 19ae4b19b2ef..faac225184a6 100644 --- a/drivers/infiniband/core/uverbs_uapi.c +++ b/drivers/infiniband/core/uverbs_uapi.c @@ -580,8 +580,13 @@ static void uapi_finalize_disable(struct uverbs_api *uapi) if (obj_key == UVERBS_API_KEY_ERR) continue; tmp_obj = uapi_get_object(uapi, obj_key); - if (tmp_obj && !tmp_obj->disabled) - continue; + if (IS_ERR(tmp_obj)) { + if (PTR_ERR(tmp_obj) == -ENOMSG) + continue; + } else { + if (!tmp_obj->disabled) + continue; + } starting_key = iter.index; uapi_remove_method( diff --git a/include/rdma/uverbs_ioctl.h b/include/rdma/uverbs_ioctl.h index 7f4ace93e502..2f56844fb7da 100644 --- a/include/rdma/uverbs_ioctl.h +++ b/include/rdma/uverbs_ioctl.h @@ -524,6 +524,12 @@ struct uapi_definition { .u2.objs_arr.max_len = _max_len, \ __VA_ARGS__ } }) +/* + * Only for use with UVERBS_ATTR_IDR, allows any uobject type to be accepted, + * the user must validate the type of the uobject instead. + */
[PATCH rdma-next 4/7] IB/mlx5: Enable modify and query verbs objects via DEVX
From: Yishai Hadas Enables modify and query verbs objects via the DEVX interface. To support this the above DEVX handlers were changed to get any object type via the UVERBS_IDR_ANY_OBJECT mechanism. The type checking and handling is done per object as part of the driver code. Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mlx5/devx.c | 108 ++ 1 file changed, 96 insertions(+), 12 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c index 0aa2ee732eaa..f80b78aab4da 100644 --- a/drivers/infiniband/hw/mlx5/devx.c +++ b/drivers/infiniband/hw/mlx5/devx.c @@ -9,6 +9,7 @@ #include #include #include +#include #include #include #include "mlx5_ib.h" @@ -132,7 +133,7 @@ static u64 get_enc_obj_id(u16 opcode, u32 obj_id) return ((u64)opcode << 32) | obj_id; } -static int devx_is_valid_obj_id(struct devx_obj *obj, const void *in) +static u64 devx_get_obj_id(const void *in) { u16 opcode = MLX5_GET(general_obj_in_cmd_hdr, in, opcode); u64 obj_id; @@ -336,13 +337,96 @@ static int devx_is_valid_obj_id(struct devx_obj *obj, const void *in) MLX5_GET(arm_xrq_in, in, xrqn)); break; default: + obj_id = 0; + } + + return obj_id; +} + +static bool devx_is_valid_obj_id(struct ib_uobject *uobj, const void *in) +{ + u64 obj_id = devx_get_obj_id(in); + + if (!obj_id) return false; + + switch (uobj_get_object_id(uobj)) { + case UVERBS_OBJECT_CQ: + return get_enc_obj_id(MLX5_CMD_OP_CREATE_CQ, + to_mcq(uobj->object)->mcq.cqn) == + obj_id; + + case UVERBS_OBJECT_SRQ: + { + struct mlx5_core_srq *srq = &(to_msrq(uobj->object)->msrq); + struct mlx5_ib_dev *dev = to_mdev(uobj->context->device); + u16 opcode; + + switch (srq->common.res) { + case MLX5_RES_XSRQ: + opcode = MLX5_CMD_OP_CREATE_XRC_SRQ; + break; + case MLX5_RES_XRQ: + opcode = MLX5_CMD_OP_CREATE_XRQ; + break; + default: + if (!dev->mdev->issi) + opcode = MLX5_CMD_OP_CREATE_SRQ; + else + opcode = MLX5_CMD_OP_CREATE_RMP; + } + + return get_enc_obj_id(opcode, + 
to_msrq(uobj->object)->msrq.srqn) == + obj_id; } - if (obj_id == obj->obj_id) - return true; + case UVERBS_OBJECT_QP: + { + struct mlx5_ib_qp *qp = to_mqp(uobj->object); + enum ib_qp_type qp_type = qp->ibqp.qp_type; + + if (qp_type == IB_QPT_RAW_PACKET || + (qp->flags & MLX5_IB_QP_UNDERLAY)) { + struct mlx5_ib_raw_packet_qp *raw_packet_qp = +&qp->raw_packet_qp; + struct mlx5_ib_rq *rq = &raw_packet_qp->rq; + struct mlx5_ib_sq *sq = &raw_packet_qp->sq; + + return (get_enc_obj_id(MLX5_CMD_OP_CREATE_RQ, + rq->base.mqp.qpn) == obj_id || + get_enc_obj_id(MLX5_CMD_OP_CREATE_SQ, + sq->base.mqp.qpn) == obj_id || + get_enc_obj_id(MLX5_CMD_OP_CREATE_TIR, + rq->tirn) == obj_id || + get_enc_obj_id(MLX5_CMD_OP_CREATE_TIS, + sq->tisn) == obj_id); + } + + if (qp_type == MLX5_IB_QPT_DCT) + return get_enc_obj_id(MLX5_CMD_OP_CREATE_DCT, + qp->dct.mdct.mqp.qpn) == obj_id; + + return get_enc_obj_id(MLX5_CMD_OP_CREATE_QP, + qp->ibqp.qp_num) == obj_id; + } - return false; + case UVERBS_OBJECT_WQ: + return get_enc_obj_id(MLX5_CMD_OP_CREATE_RQ, + to_mrwq(uobj->object)->core_qp.qpn) == + obj_id; + + case UVERBS_OBJECT_RWQ_IND_TBL: + return get_enc_obj_id(MLX5_CMD_OP_CREATE_RQT, + to_mrwq_ind_table(uobj->object)->rqtn) == + obj_id; + + case MLX5_IB_OBJECT_DEVX_OBJ: + return ((struct devx_obj *)uobj->object)->obj_id == obj_id; + + default: + return false; + } } static void devx_set_umem_valid(const void *in) @@
Re: [PATCH net-next] net: remove unsafe skb_insert()
On 11/25/2018 07:52 PM, David Miller wrote: > > I fixed up the build in your original patch and am about to push that > out. Thanks David, sorry for this, I should have compiled the damn thing :/
Re: pull-request: bpf 2018-11-25
From: Daniel Borkmann Date: Mon, 26 Nov 2018 01:16:51 +0100 > The following pull-request contains BPF updates for your *net* tree. Pulled, thanks.
Re: [PATCH net-next] net: remove unsafe skb_insert()
From: Eric Dumazet Date: Sun, 25 Nov 2018 15:37:43 -0800 > On Sun, Nov 25, 2018 at 10:29 AM David Miller wrote: >> >> From: Eric Dumazet >> Date: Sun, 25 Nov 2018 08:26:23 -0800 >> >> > I do not see how one can effectively use skb_insert() without holding >> > some kind of lock. Otherwise other cpus could have changed the list >> > right before we have a chance of acquiring list->lock. >> > >> > Only existing user is in drivers/infiniband/hw/nes/nes_mgt.c and this >> > one probably meant to use __skb_insert() since it appears nesqp->pau_list >> > is protected by nesqp->pau_lock. This looks like nesqp->pau_lock >> > could be removed, since nesqp->pau_list.lock could be used instead. >> > >> > Signed-off-by: Eric Dumazet >> >> Good find. >> >> Indeed, any of the queue SKB manipulation functions that take two SKBs >> as an argument are suspect in this manner. >> >> Applied, thanks Eric. > > Oh well, this does not build. > > Since you have not pushed your tree yet, maybe we can replace this > with a version that actually compiles. > > Please let me know if a relative patch or a v2 is needed, thanks. I fixed up the build in your original patch and am about to push that out.
Re: [PATCH net-next] net: phy: fix two issues with linkmode bitmaps
> Because function naming is the same I'm afraid they easily can be used > incorrectly (the bugs we just discuss are good examples). Maybe it > could be an option to reflect the semantics in the name like this > (better suited proposals welcome): > > case 1: mii_xxx_to_linkmode_yyy > case 2: mii_xxx_or_linkmode_yyy > case 3: mii_xxx_mod_linkmode_yyy Hi Heiner I started a patchset using this idea, and reworks your fix. Lets work on that, rather than merge this patch. Andrew
Re: consistency for statistics with XDP mode
On 2018/11/23 1:43, David Ahern wrote: > On 11/21/18 5:53 PM, Toshiaki Makita wrote: >>> We really need consistency in the counters and at a minimum, users >>> should be able to track packet and byte counters for both Rx and Tx >>> including XDP. >>> >>> It seems to me the Rx and Tx packet, byte and dropped counters returned >>> for the standard device stats (/proc/net/dev, ip -s li show, ...) should >>> include all packets managed by the driver regardless of whether they are >>> forwarded / dropped in XDP or go up the Linux stack. This also aligns >> >> Agreed. When I introduced virtio_net XDP counters, I just forgot to >> update tx packets/bytes counters on ndo_xdp_xmit. Probably I thought it >> is handled by free_old_xmit_skbs. > > Do you have some time to look at adding the Tx counters to virtio_net? hoping I can make some time within a couple of days. -- Toshiaki Makita
Re: [PATCH bpf-next 1/3] bpf: helper to pop data from messages
On 11/23/2018 02:38 AM, John Fastabend wrote: > This adds a BPF SK_MSG program helper so that we can pop data from a > msg. We use this to pop metadata from a previous push data call. > > Signed-off-by: John Fastabend > --- > include/uapi/linux/bpf.h | 13 +++- > net/core/filter.c | 169 > +++ > net/ipv4/tcp_bpf.c | 14 +++- > 3 files changed, 192 insertions(+), 4 deletions(-) > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index c1554aa..64681f8 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -2268,6 +2268,16 @@ union bpf_attr { > * > * Return > * 0 on success, or a negative error in case of failure. > + * > + * int bpf_msg_pop_data(struct sk_msg_buff *msg, u32 start, u32 pop, u64 > flags) > + * Description > + * Will remove 'pop' bytes from a msg starting at byte 'start'. > + * This can result in ENOMEM errors under certain situations where > + * an allocation and copy are required due to a full ring buffer. > + * However, the helper will try to avoid doing the allocation > + * if possible. Other errors can occur if input parameters are > + * invalid, either due to the start byte not being a valid part of > + * the msg payload and/or the pop value being too large. 
> */ > #define __BPF_FUNC_MAPPER(FN)\ > FN(unspec), \ > @@ -2360,7 +2370,8 @@ union bpf_attr { > FN(map_push_elem), \ > FN(map_pop_elem), \ > FN(map_peek_elem), \ > - FN(msg_push_data), > + FN(msg_push_data), \ > + FN(msg_pop_data), > > /* integer value in 'imm' field of BPF_CALL instruction selects which helper > * function eBPF program intends to call > diff --git a/net/core/filter.c b/net/core/filter.c > index f6ca38a..c6b35b5 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -2428,6 +2428,173 @@ static const struct bpf_func_proto > bpf_msg_push_data_proto = { > .arg4_type = ARG_ANYTHING, > }; > > +static void sk_msg_shift_left(struct sk_msg *msg, int i) > +{ > + int prev; > + > + do { > + prev = i; > + sk_msg_iter_var_next(i); > + msg->sg.data[prev] = msg->sg.data[i]; > + } while (i != msg->sg.end); > + > + sk_msg_iter_prev(msg, end); > +} > + > +static void sk_msg_shift_right(struct sk_msg *msg, int i) > +{ > + struct scatterlist tmp, sge; > + > + sk_msg_iter_next(msg, end); > + sge = sk_msg_elem_cpy(msg, i); > + sk_msg_iter_var_next(i); > + tmp = sk_msg_elem_cpy(msg, i); > + > + while (i != msg->sg.end) { > + msg->sg.data[i] = sge; > + sk_msg_iter_var_next(i); > + sge = tmp; > + tmp = sk_msg_elem_cpy(msg, i); > + } > +} > + > +BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start, > +u32, len, u64, flags) > +{ > + u32 i = 0, l, space, offset = 0; > + u64 last = start + len; > + int pop; > + > + if (unlikely(flags)) > + return -EINVAL; > + > + /* First find the starting scatterlist element */ > + i = msg->sg.start; > + do { > + l = sk_msg_elem(msg, i)->length; > + > + if (start < offset + l) > + break; > + offset += l; > + sk_msg_iter_var_next(i); > + } while (i != msg->sg.end); > + > + /* Bounds checks: start and pop must be inside message */ > + if (start >= offset + l || last >= msg->sg.size) > + return -EINVAL; > + > + space = MAX_MSG_FRAGS - sk_msg_elem_used(msg); > + > + pop = len; > + /* --| offset > + * -| start |--- len --| > + * 
> + * |- a | pop ---|- b | > + * |__| length > + * > + * > + * a: region at front of scatter element to save > + * b: region at back of scatter element to save when length > A + pop > + * pop: region to pop from element, same as input 'pop'; here it will be > + * decremented below per iteration. > + * > + * Two top-level cases to handle when start != offset: first, B is > + * non-zero, and second, B is zero, corresponding to when a pop includes > + * more than one element. > + * > + * Then if B is non-zero AND there is no space, allocate space and > + * compact the A, B regions into a page. If there is space, shift the > + * ring to the right, freeing the next element in the ring to place B, > + * leaving A untouched except to reduce its length. > + */ > + if (start != offset) { > + struct scatterlist *nsge, *sge = sk_msg_elem(msg, i); > + int a = start; > + int b = sge->length - pop - a; > + > + sk_msg_iter_var_next(i); > + > + if (pop <
Re: [PATCH bpf-next 0/3] bpf: add sk_msg helper sk_msg_pop_data
On 11/23/2018 02:38 AM, John Fastabend wrote: > After being able to add metadata to messages with sk_msg_push_data we > have also found it useful to be able to "pop" this metadata off before > sending it to applications in some cases. This series adds a new helper > sk_msg_pop_data() and the associated patches to add tests and tools/lib > support. > > Thanks! > > John Fastabend (3): > bpf: helper to pop data from messages > bpf: add msg_pop_data helper to tools > bpf: test_sockmap, add options for msg_pop_data() helper usage > > include/uapi/linux/bpf.h| 13 +- > net/core/filter.c | 169 > > net/ipv4/tcp_bpf.c | 14 +- > tools/include/uapi/linux/bpf.h | 13 +- > tools/testing/selftests/bpf/bpf_helpers.h | 2 + > tools/testing/selftests/bpf/test_sockmap.c | 127 +- > tools/testing/selftests/bpf/test_sockmap_kern.h | 70 -- > 7 files changed, 386 insertions(+), 22 deletions(-) > Applied to bpf-next, thanks.
Re: [PATCH bpf-next] bpf: align map type names formatting.
On 11/24/2018 12:58 AM, David Calavera wrote: > Make the formatting for map_type_name array consistent. > > Signed-off-by: David Calavera Applied, thanks!
Re: [PATCH] tags: Fix DEFINE_PER_CPU expansion
On 11/24/2018 12:48 AM, Rustam Kovhaev wrote: > Building tags produces warning: > ctags: Warning: kernel/bpf/local_storage.c:10: null expansion of name > pattern "\1" > > Let's use the same fix as in commit <25528213fe9f75f4>, even though it > violates the usual code style. > > Signed-off-by: Rustam Kovhaev Applied to bpf-next, thanks!
pull-request: bpf 2018-11-25
Hi David, The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix an off-by-one bug when adjusting subprog start offsets after patching, from Edward. 2) Fix several bugs such as overflow in size allocation in queue / stack map creation, from Alexei. 3) Fix wrong IPv6 destination port byte order in bpf_sk_lookup_udp helper, from Andrey. 4) Fix several bugs in bpftool such as preventing an infinite loop in get_fdinfo, error handling and man page references, from Quentin. 5) Fix a warning in bpf_trace_printk() that wasn't catching an invalid format string, from Martynas. 6) Fix a bug in BPF cgroup local storage where non-atomic allocation was used in atomic context, from Roman. 7) Fix a NULL pointer dereference bug in bpftool from reallocarray() error handling, from Jakub and Wen. 8) Add a copy of pkt_cls.h and tc_bpf.h uapi headers to the tools include infrastructure so that bpftool compiles on older RHEL7-like user space which does not ship these headers, from Yonghong. 9) Fix BPF kselftests so the user space ping test works with both ping6 and ping -6, from Li. Please consider pulling these changes from: git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git Thanks a lot! 
The following changes since commit 85b18b0237ce9986a81a1b9534b5e2ee116f5504: net: smsc95xx: Fix MTU range (2018-11-08 19:54:49 -0800) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git for you to fetch changes up to 1efb6ee3edea57f57f9fb05dba8dcb3f7333f61f: bpf: fix check of allowed specifiers in bpf_trace_printk (2018-11-23 21:54:14 +0100) Alexei Starovoitov (1): bpf: fix integer overflow in queue_stack_map Andrey Ignatov (1): bpf: Fix IPv6 dport byte order in bpf_sk_lookup_udp Edward Cree (1): bpf: fix off-by-one error in adjust_subprog_starts Jakub Kicinski (1): tools: bpftool: fix potential NULL pointer dereference in do_load Li Zhijian (1): kselftests/bpf: use ping6 as the default ipv6 ping binary when it exists Martynas Pumputis (1): bpf: fix check of allowed specifiers in bpf_trace_printk Quentin Monnet (4): tools: bpftool: prevent infinite loop in get_fdinfo() tools: bpftool: fix plain output and doc for --bpffs option tools: bpftool: pass an argument to silence open_obj_pinned() tools: bpftool: update references to other man pages in documentation Roman Gushchin (1): bpf: allocate local storage buffers using GFP_ATOMIC Yonghong Song (1): tools/bpftool: copy a few net uapi headers to tools directory kernel/bpf/local_storage.c | 3 +- kernel/bpf/queue_stack_maps.c | 16 +- kernel/bpf/verifier.c | 2 +- kernel/trace/bpf_trace.c | 8 +- net/core/filter.c | 5 +- tools/bpf/bpftool/Documentation/bpftool-cgroup.rst | 8 +- tools/bpf/bpftool/Documentation/bpftool-map.rst| 8 +- tools/bpf/bpftool/Documentation/bpftool-net.rst| 8 +- tools/bpf/bpftool/Documentation/bpftool-perf.rst | 8 +- tools/bpf/bpftool/Documentation/bpftool-prog.rst | 11 +- tools/bpf/bpftool/Documentation/bpftool.rst| 9 +- tools/bpf/bpftool/common.c | 17 +- tools/bpf/bpftool/main.h | 2 +- tools/bpf/bpftool/prog.c | 13 +- tools/include/uapi/linux/pkt_cls.h | 612 + tools/include/uapi/linux/tc_act/tc_bpf.h | 37 ++ tools/testing/selftests/bpf/test_netcnt.c | 5 
+- tools/testing/selftests/bpf/test_verifier.c| 19 + 18 files changed, 752 insertions(+), 39 deletions(-) create mode 100644 tools/include/uapi/linux/pkt_cls.h create mode 100644 tools/include/uapi/linux/tc_act/tc_bpf.h
[PATCH linux-next 05/10] ARM: dts: dra7: switch to use phy-gmii-sel
Switch to use phy-gmii-sel PHY instead of cpsw-phy-sel. Cc: Kishon Vijay Abraham I Cc: Tony Lindgren Signed-off-by: Grygorii Strashko --- arch/arm/boot/dts/dra7-l4.dtsi | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/arch/arm/boot/dts/dra7-l4.dtsi b/arch/arm/boot/dts/dra7-l4.dtsi index 7e5c0d4f..7070095 100644 --- a/arch/arm/boot/dts/dra7-l4.dtsi +++ b/arch/arm/boot/dts/dra7-l4.dtsi @@ -77,18 +77,18 @@ }; }; + phy_gmii_sel: phy-gmii-sel { + compatible = "ti,dra7xx-phy-gmii-sel"; + reg = <0x554 0x4>; + #phy-cells = <1>; + }; + scm_conf_clocks: clocks { #address-cells = <1>; #size-cells = <0>; }; }; - phy_sel: cpsw-phy-sel@554 { - compatible = "ti,dra7xx-cpsw-phy-sel"; - reg = <0x554 0x4>; - reg-names = "gmii-sel"; - }; - dra7_pmx_core: pinmux@1400 { compatible = "ti,dra7-padconf", "pinctrl-single"; @@ -3060,7 +3060,6 @@ ; ranges = <0 0 0x4000>; syscon = <&scm_conf>; - cpsw-phy-sel = <&phy_sel>; status = "disabled"; davinci_mdio: mdio@1000 { @@ -3075,11 +3074,13 @@ cpsw_emac0: slave@200 { /* Filled in by U-Boot */ mac-address = [ 00 00 00 00 00 00 ]; + phys = <&phy_gmii_sel 1>; }; cpsw_emac1: slave@300 { /* Filled in by U-Boot */ mac-address = [ 00 00 00 00 00 00 ]; + phys = <&phy_gmii_sel 2>; }; }; }; -- 2.10.5
[PATCH linux-next 09/10] dt-bindings: net: ti: deprecate cpsw-phy-sel bindings
The cpsw-phy-sel driver was replaced with new PHY driver phy-gmii-sel, so deprecate cpsw-phy-sel bindings. Cc: Kishon Vijay Abraham I Cc: Tony Lindgren Signed-off-by: Grygorii Strashko --- Documentation/devicetree/bindings/net/cpsw-phy-sel.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/net/cpsw-phy-sel.txt b/Documentation/devicetree/bindings/net/cpsw-phy-sel.txt index 764c0c7..5d76f99 100644 --- a/Documentation/devicetree/bindings/net/cpsw-phy-sel.txt +++ b/Documentation/devicetree/bindings/net/cpsw-phy-sel.txt @@ -1,4 +1,4 @@ -TI CPSW Phy mode Selection Device Tree Bindings +TI CPSW Phy mode Selection Device Tree Bindings (DEPRECATED) --- Required properties: -- 2.10.5
[PATCH linux-next 02/10] phy: ti: introduce phy-gmii-sel driver
TI am335x/am437x/dra7(am5)/dm814x CPSW3G Ethernet Subsystem supports two 10/100/1000 Ethernet ports with selectable G/MII, RMII, and RGMII interfaces. The interface mode is selected by configuring the MII mode selection register(s) (GMII_SEL) in the System Control Module chapter (SCM). GMII_SEL register(s) and bit fields placement in SCM are different between SoCs while fields meaning is the same. Historically CPSW external Port's interface mode selection configuration was introduced using custom API and driver cpsw-phy-sel.c. This leads to unnecessary driver, DT binding and custom API support effort. This patch introduces CPSW Port's PHY Interface Mode selection Driver (phy-gmii-sel) which implements standard Linux PHY interface and used as a replacement for TI's specific driver cpsw-phy-sel.c and corresponding custom API. Cc: Kishon Vijay Abraham I Cc: Tony Lindgren Signed-off-by: Grygorii Strashko --- drivers/phy/ti/Kconfig| 10 ++ drivers/phy/ti/Makefile | 1 + drivers/phy/ti/phy-gmii-sel.c | 349 ++ 3 files changed, 360 insertions(+) create mode 100644 drivers/phy/ti/phy-gmii-sel.c diff --git a/drivers/phy/ti/Kconfig b/drivers/phy/ti/Kconfig index 2050356..f137e01 100644 --- a/drivers/phy/ti/Kconfig +++ b/drivers/phy/ti/Kconfig @@ -76,3 +76,13 @@ config TWL4030_USB family chips (including the TWL5030 and TPS659x0 devices). This transceiver supports high and full speed devices plus, in host mode, low speed. + +config PHY_TI_GMII_SEL + tristate + default y if TI_CPSW=y + depends on TI_CPSW || COMPILE_TEST + select GENERIC_PHY + default m + help + This driver supports configuring of the TI CPSW Port mode depending on + the Ethernet PHY connected to the CPSW Port. 
diff --git a/drivers/phy/ti/Makefile b/drivers/phy/ti/Makefile index 9f36175..bea8f25 100644 --- a/drivers/phy/ti/Makefile +++ b/drivers/phy/ti/Makefile @@ -6,3 +6,4 @@ obj-$(CONFIG_OMAP_USB2) += phy-omap-usb2.o obj-$(CONFIG_TI_PIPE3) += phy-ti-pipe3.o obj-$(CONFIG_PHY_TUSB1210) += phy-tusb1210.o obj-$(CONFIG_TWL4030_USB) += phy-twl4030-usb.o +obj-$(CONFIG_PHY_TI_GMII_SEL) += phy-gmii-sel.o diff --git a/drivers/phy/ti/phy-gmii-sel.c b/drivers/phy/ti/phy-gmii-sel.c new file mode 100644 index 000..04ebf53 --- /dev/null +++ b/drivers/phy/ti/phy-gmii-sel.c @@ -0,0 +1,349 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Texas Instruments CPSW Port's PHY Interface Mode selection Driver + * + * Copyright (C) 2018 Texas Instruments Incorporated - http://www.ti.com/ + * + * Based on cpsw-phy-sel.c driver created by Mugunthan V N + */ + +#include <linux/err.h> +#include <linux/mfd/syscon.h> +#include <linux/module.h> +#include <linux/of.h> +#include <linux/of_device.h> +#include <linux/phy.h> +#include <linux/phy/phy.h> +#include <linux/regmap.h> + +/* AM33xx SoC specific definitions for the CONTROL port */ +#define AM33XX_GMII_SEL_MODE_MII 0 +#define AM33XX_GMII_SEL_MODE_RMII 1 +#define AM33XX_GMII_SEL_MODE_RGMII 2 + +enum { + PHY_GMII_SEL_PORT_MODE, + PHY_GMII_SEL_RGMII_ID_MODE, + PHY_GMII_SEL_RMII_IO_CLK_EN, + PHY_GMII_SEL_LAST, +}; + +struct phy_gmii_sel_phy_priv { + struct phy_gmii_sel_priv *priv; + u32 id; + struct phy *if_phy; + int rmii_clock_external; + int phy_if_mode; + struct regmap_field *fields[PHY_GMII_SEL_LAST]; +}; + +struct phy_gmii_sel_soc_data { + u32 num_ports; + u32 features; + const struct reg_field (*regfields)[PHY_GMII_SEL_LAST]; +}; + +struct phy_gmii_sel_priv { + struct device *dev; + const struct phy_gmii_sel_soc_data *soc_data; + struct regmap *regmap; + struct phy_provider *phy_provider; + struct phy_gmii_sel_phy_priv *if_phys; +}; + +static int phy_gmii_sel_mode(struct phy *phy, enum phy_mode mode, int submode) +{ + struct phy_gmii_sel_phy_priv *if_phy = phy_get_drvdata(phy); + const struct phy_gmii_sel_soc_data *soc_data = if_phy->priv->soc_data; + struct device *dev = 
if_phy->priv->dev; + struct regmap_field *regfield; + int ret, rgmii_id = 0; + u32 gmii_sel_mode = 0; + + if (mode != PHY_MODE_ETHERNET) + return -EINVAL; + + switch (submode) { + case PHY_INTERFACE_MODE_RMII: + gmii_sel_mode = AM33XX_GMII_SEL_MODE_RMII; + break; + + case PHY_INTERFACE_MODE_RGMII: + gmii_sel_mode = AM33XX_GMII_SEL_MODE_RGMII; + break; + + case PHY_INTERFACE_MODE_RGMII_ID: + case PHY_INTERFACE_MODE_RGMII_RXID: + case PHY_INTERFACE_MODE_RGMII_TXID: + gmii_sel_mode = AM33XX_GMII_SEL_MODE_RGMII; + rgmii_id = 1; + break; + + case PHY_INTERFACE_MODE_MII: + mode = AM33XX_GMII_SEL_MODE_MII; + break; + + default: + dev_warn(dev, +
[PATCH v2 net-next] net: remove unsafe skb_insert()
I do not see how one can effectively use skb_insert() without holding some kind of lock. Otherwise other cpus could have changed the list right before we have a chance of acquiring list->lock. Only existing user is in drivers/infiniband/hw/nes/nes_mgt.c and this one probably meant to use __skb_insert() since it appears nesqp->pau_list is protected by nesqp->pau_lock. This looks like nesqp->pau_lock could be removed, since nesqp->pau_list.lock could be used instead. Signed-off-by: Eric Dumazet Cc: Faisal Latif Cc: Doug Ledford Cc: Jason Gunthorpe Cc: linux-rdma --- drivers/infiniband/hw/nes/nes_mgt.c | 4 ++-- include/linux/skbuff.h | 2 -- net/core/skbuff.c | 22 -- 3 files changed, 2 insertions(+), 26 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_mgt.c b/drivers/infiniband/hw/nes/nes_mgt.c index fc0c191014e908eea32d752f3499295ef143aa0a..cc4dce5c3e5f6d99fc44fcde7334e70ac7a33002 100644 --- a/drivers/infiniband/hw/nes/nes_mgt.c +++ b/drivers/infiniband/hw/nes/nes_mgt.c @@ -551,14 +551,14 @@ static void queue_fpdus(struct sk_buff *skb, struct nes_vnic *nesvnic, struct ne /* Queue skb by sequence number */ if (skb_queue_len(&nesqp->pau_list) == 0) { - skb_queue_head(&nesqp->pau_list, skb); + __skb_queue_head(&nesqp->pau_list, skb); } else { skb_queue_walk(&nesqp->pau_list, tmpskb) { cb = (struct nes_rskb_cb *)&tmpskb->cb[0]; if (before(seqnum, cb->seqnum)) break; } - skb_insert(tmpskb, skb, &nesqp->pau_list); + __skb_insert(skb, tmpskb->prev, tmpskb, &nesqp->pau_list); } if (nesqp->pau_state == PAU_READY) process_it = true; diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index f17a7452ac7bf47ef4bcf89840bba165cee6f50a..73902acf2b71c8800d81b744a936a7420f33b459 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1749,8 +1749,6 @@ static inline void skb_queue_head_init_class(struct sk_buff_head *list, * The "__skb_()" functions are the non-atomic ones that * can only be called with interrupts disabled. 
*/ -void skb_insert(struct sk_buff *old, struct sk_buff *newsk, - struct sk_buff_head *list); static inline void __skb_insert(struct sk_buff *newsk, struct sk_buff *prev, struct sk_buff *next, struct sk_buff_head *list) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 9a8a72cefe9b94d3821b9cc5ba5bba647ae51267..02cd7ae3d0fb26ef0a8b006390154fdefd0d292f 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -2990,28 +2990,6 @@ void skb_append(struct sk_buff *old, struct sk_buff *newsk, struct sk_buff_head } EXPORT_SYMBOL(skb_append); -/** - * skb_insert - insert a buffer - * @old: buffer to insert before - * @newsk: buffer to insert - * @list: list to use - * - * Place a packet before a given packet in a list. The list locks are - * taken and this function is atomic with respect to other list locked - * calls. - * - * A buffer cannot be placed on two lists at the same time. - */ -void skb_insert(struct sk_buff *old, struct sk_buff *newsk, struct sk_buff_head *list) -{ - unsigned long flags; - - spin_lock_irqsave(&list->lock, flags); - __skb_insert(newsk, old->prev, old, list); - spin_unlock_irqrestore(&list->lock, flags); -} -EXPORT_SYMBOL(skb_insert); - static inline void skb_split_inside_header(struct sk_buff *skb, struct sk_buff* skb1, const u32 len, const int pos) -- 2.20.0.rc0.387.gc7a69e6b6c-goog
Re: [PATCH net-next] net: remove unsafe skb_insert()
On Sun, Nov 25, 2018 at 10:29 AM David Miller wrote: > > From: Eric Dumazet > Date: Sun, 25 Nov 2018 08:26:23 -0800 > > > I do not see how one can effectively use skb_insert() without holding > > some kind of lock. Otherwise other cpus could have changed the list > > right before we have a chance of acquiring list->lock. > > > > Only existing user is in drivers/infiniband/hw/nes/nes_mgt.c and this > > one probably meant to use __skb_insert() since it appears nesqp->pau_list > > is protected by nesqp->pau_lock. This looks like nesqp->pau_lock > > could be removed, since nesqp->pau_list.lock could be used instead. > > > > Signed-off-by: Eric Dumazet > > Good find. > > Indeed, any of the queue SKB manipulation functions that take two SKBs > as an argument are suspect in this manner. > > Applied, thanks Eric. Oh well, this does not build. Since you have not pushed your tree yet, maybe we can replace this with a version that actually compiles. Please let me know if a relative patch or a v2 is needed, thanks.
Re: [PATCH net-next] net: remove unsafe skb_insert()
Hi Eric, I love your patch! Yet something to improve: [auto build test ERROR on net-next/master] url: https://github.com/0day-ci/linux/commits/Eric-Dumazet/net-remove-unsafe-skb_insert/20181126-061342 config: x86_64-randconfig-x009-201847 (attached as .config) compiler: gcc-7 (Debian 7.3.0-1) 7.3.0 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All errors (new ones prefixed by >>): drivers/infiniband//hw/nes/nes_mgt.c: In function 'queue_fpdus': >> drivers/infiniband//hw/nes/nes_mgt.c:561:29: error: passing argument 3 of >> '__skb_insert' from incompatible pointer type >> [-Werror=incompatible-pointer-types] __skb_insert(tmpskb, skb, &nesqp->pau_list); ^ In file included from drivers/infiniband//hw/nes/nes_mgt.c:34:0: include/linux/skbuff.h:1752:20: note: expected 'struct sk_buff *' but argument is of type 'struct sk_buff_head *' static inline void __skb_insert(struct sk_buff *newsk, ^~~~ >> drivers/infiniband//hw/nes/nes_mgt.c:561:3: error: too few arguments to >> function '__skb_insert' __skb_insert(tmpskb, skb, &nesqp->pau_list); ^~~~ In file included from drivers/infiniband//hw/nes/nes_mgt.c:34:0: include/linux/skbuff.h:1752:20: note: declared here static inline void __skb_insert(struct sk_buff *newsk, ^~~~ cc1: some warnings being treated as errors vim +/__skb_insert +561 drivers/infiniband//hw/nes/nes_mgt.c 503 504 /** 505 * queue_fpdus - Handle fpdu's that hw passed up to sw 506 */ 507 static void queue_fpdus(struct sk_buff *skb, struct nes_vnic *nesvnic, struct nes_qp *nesqp) 508 { 509 struct sk_buff *tmpskb; 510 struct nes_rskb_cb *cb; 511 struct iphdr *iph; 512 struct tcphdr *tcph; 513 unsigned char *tcph_end; 514 u32 rcv_nxt; 515 u32 rcv_wnd; 516 u32 seqnum; 517 u32 len; 518 bool process_it = false; 519 unsigned long flags; 520 521 /* Move data ptr to after tcp header */ 522 iph = (struct iphdr *)skb->data; 523 tcph = (struct tcphdr *)(((char *)iph) + (4 * iph->ihl)); 524 seqnum = be32_to_cpu(tcph->seq); 525 tcph_end = (((char *)tcph) + (4 
* tcph->doff)); 526 527 len = be16_to_cpu(iph->tot_len); 528 if (skb->len > len) 529 skb_trim(skb, len); 530 skb_pull(skb, tcph_end - skb->data); 531 532 /* Initialize tracking values */ 533 cb = (struct nes_rskb_cb *)&skb->cb[0]; 534 cb->seqnum = seqnum; 535 536 /* Make sure data is in the receive window */ 537 rcv_nxt = nesqp->pau_rcv_nxt; 538 rcv_wnd = le32_to_cpu(nesqp->nesqp_context->rcv_wnd); 539 if (!between(seqnum, rcv_nxt, (rcv_nxt + rcv_wnd))) { 540 nes_mgt_free_skb(nesvnic->nesdev, skb, PCI_DMA_TODEVICE); 541 nes_rem_ref_cm_node(nesqp->cm_node); 542 return; 543 } 544 545 spin_lock_irqsave(&nesqp->pau_lock, flags); 546 547 if (nesqp->pau_busy) 548 nesqp->pau_pending = 1; 549 else 550 nesqp->pau_busy = 1; 551 552 /* Queue skb by sequence number */ 553 if (skb_queue_len(&nesqp->pau_list) == 0) { 554 __skb_queue_head(&nesqp->pau_list, skb); 555 } else { 556 skb_queue_walk(&nesqp->pau_list, tmpskb) { 557 cb = (struct nes_rskb_cb *)&tmpskb->cb[0]; 558 if (before(seqnum, cb->seqnum)) 559 break; 560 } > 561 __skb_insert(tmpskb, skb, &nesqp->pau_list); 562 } 563 if (nesqp->pau_state == PAU_READY) 564 process_it = true; 565 spin_unlock_irqrestore(&nesqp->pau_lock, flags); 566 567 if (process_it) 568 process_fpdus(nesvnic, nesqp); 569 570 return; 571 } 572 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH net-next] net: phy: fix two issues with linkmode bitmaps
> Eventually we'd have three types of mii_xxx_to_linkmode_yyy functions: > > 1. Function first zeroes the destination linkmode bitmap > 2. Function sets bits in the linkmode bitmap but doesn't clear bits >if condition isn't met > 3. Function clears / sets bits it's responsible for > > example case 1: mmd_eee_adv_to_linkmode > example case 2: mii_stat1000_to_linkmode_lpa_t > example case 3: what you just proposed as fix for > mii_adv_to_linkmode_adv_t > > Because function naming is the same I'm afraid they easily can be used > incorrectly (the bugs we just discuss are good examples). Maybe it > could be an option to reflect the semantics in the name like this > (better suited proposals welcome): > > case 1: mii_xxx_to_linkmode_yyy > case 2: mii_xxx_or_linkmode_yyy > case 3: mii_xxx_mod_linkmode_yyy Hi Heiner That is a good idea. We should probably do this first, it will help find the bugs. Andrew
Re: Can decnet be deprecated?
From: Bjørn Mork Date: Sun, 25 Nov 2018 12:30:26 +0100 > David Miller writes: >> From: David Ahern >> Date: Sat, 24 Nov 2018 17:12:48 -0700 >> >>> IPX was moved to staging at the end of last year. Can decnet follow >>> suit? git log seems to indicate no active development in a very long time. >> >> Last time I tried to do that someone immediately said on the list >> "Don't, we're using that!" > > Not sure about that. What I can see is a claim that it has no bugs: > http://patchwork.ozlabs.org/patch/837484/ > > The V1 received only support for removal: > http://patchwork.ozlabs.org/patch/837261/ > > But no one claimed they were using decnet. Ok, if people want to try and deprecate it again we can try.
Re: [PATCH net-next] net: remove unsafe skb_insert()
From: Eric Dumazet Date: Sun, 25 Nov 2018 08:26:23 -0800 > I do not see how one can effectively use skb_insert() without holding > some kind of lock. Otherwise other cpus could have changed the list > right before we have a chance of acquiring list->lock. > > Only existing user is in drivers/infiniband/hw/nes/nes_mgt.c and this > one probably meant to use __skb_insert() since it appears nesqp->pau_list > is protected by nesqp->pau_lock. This looks like nesqp->pau_lock > could be removed, since nesqp->pau_list.lock could be used instead. > > Signed-off-by: Eric Dumazet Good find. Indeed, any of the queue SKB manipulation functions that take two SKBs as an argument are suspect in this manner. Applied, thanks Eric.
Re: [PATCH net-next v2 0/2] r8169: make use of xmit_more and __netdev_sent_queue
From: Heiner Kallweit Date: Sun, 25 Nov 2018 14:29:22 +0100 > This series adds helper __netdev_sent_queue to the core and makes use > of it in the r8169 driver. > > Heiner Kallweit (2): > net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue > r8169: make use of xmit_more and __netdev_sent_queue > > v2: > - fix minor style issue Series applied.
Re: [PATCH iproute2-next] man: tc: update man page for fq packet scheduler
On 11/24/18 6:44 PM, Eric Dumazet wrote: > Signed-off-by: Eric Dumazet > --- > man/man8/tc-fq.8 | 37 ++--- > 1 file changed, 26 insertions(+), 11 deletions(-) > applied to iproute2-next. Thanks, Eric.
Re: [PATCH net-next] net: phy: fix two issues with linkmode bitmaps
On 25.11.2018 17:45, Andrew Lunn wrote: > On Sun, Nov 25, 2018 at 03:23:42PM +0100, Heiner Kallweit wrote: >> I wondered why ethtool suddenly reports that link partner doesn't >> support aneg and GBit modes. It turned out that this is caused by two >> bugs in conversion to linkmode bitmaps. >> >> 1. In genphy_read_status the value of phydev->lp_advertising is >>overwritten, thus GBit modes aren't reported any longer. >> 2. In mii_lpa_to_linkmode_lpa_t the aneg bit was overwritten by the >>call to mii_adv_to_linkmode_adv_t. > > Hi Heiner > > Thanks for looking into this. > > There are more bugs :-( > > static inline void mii_lpa_to_linkmode_lpa_t(unsigned long *lp_advertising, > u32 lpa) > { > if (lpa & LPA_LPACK) > linkmode_set_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, > lp_advertising); > > mii_adv_to_linkmode_adv_t(lp_advertising, lpa); > } > > But > > static inline void mii_adv_to_linkmode_adv_t(unsigned long *advertising, > u32 adv) > { > linkmode_zero(advertising); > > if (adv & ADVERTISE_10HALF) > linkmode_set_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT, > advertising); > > So the Autoneg_BIT gets cleared. > > I think the better fix is to take the linkmode_zero() out from here. > > Then: > > if (adv & ADVERTISE_10HALF) >linkmode_set_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT, > advertising); > + else > + linkmode_clear_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT, > + advertising); > > for all the bits mii_adv_to_linkmode_adv_t() looks at. > > So mii_adv_to_linkmode_adv_t() only modifies bits it is responsible > for, and leaves the others alone. > > Andrew > mii_adv_to_linkmode_adv_t() is used also in phy_mii_ioctl(), and I'm not sure the proposed change is safe there. Eventually we'd have three types of mii_xxx_to_linkmode_yyy functions: 1. Function first zeroes the destination linkmode bitmap 2. Function sets bits in the linkmode bitmap but doesn't clear bits if condition isn't met 3. 
Function clears / sets bits it's responsible for example case 1: mmd_eee_adv_to_linkmode example case 2: mii_stat1000_to_linkmode_lpa_t example case 3: what you just proposed as fix for mii_adv_to_linkmode_adv_t Because function naming is the same I'm afraid they easily can be used incorrectly (the bugs we just discuss are good examples). Maybe it could be an option to reflect the semantics in the name like this (better suited proposals welcome): case 1: mii_xxx_to_linkmode_yyy case 2: mii_xxx_or_linkmode_yyy case 3: mii_xxx_mod_linkmode_yyy Heiner
Re: [PATCH net-next] net: phy: fix two issues with linkmode bitmaps
On Sun, Nov 25, 2018 at 03:23:42PM +0100, Heiner Kallweit wrote: > I wondered why ethtool suddenly reports that link partner doesn't > support aneg and GBit modes. It turned out that this is caused by two > bugs in conversion to linkmode bitmaps. > > 1. In genphy_read_status the value of phydev->lp_advertising is >overwritten, thus GBit modes aren't reported any longer. > 2. In mii_lpa_to_linkmode_lpa_t the aneg bit was overwritten by the >call to mii_adv_to_linkmode_adv_t. Hi Heiner Thanks for looking into this. There are more bugs :-( static inline void mii_lpa_to_linkmode_lpa_t(unsigned long *lp_advertising, u32 lpa) { if (lpa & LPA_LPACK) linkmode_set_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, lp_advertising); mii_adv_to_linkmode_adv_t(lp_advertising, lpa); } But static inline void mii_adv_to_linkmode_adv_t(unsigned long *advertising, u32 adv) { linkmode_zero(advertising); if (adv & ADVERTISE_10HALF) linkmode_set_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT, advertising); So the Autoneg_BIT gets cleared. I think the better fix is to take the linkmode_zero() out from here. Then: if (adv & ADVERTISE_10HALF) linkmode_set_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT, advertising); + else + linkmode_clear_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT, + advertising); for all the bits mii_adv_to_linkmode_adv_t() looks at. So mii_adv_to_linkmode_adv_t() only modifies bits it is responsible for, and leaves the others alone. Andrew
[PATCH net-next] net: remove unsafe skb_insert()
I do not see how one can effectively use skb_insert() without holding some kind of lock. Otherwise other cpus could have changed the list right before we have a chance of acquiring list->lock. The only existing user is in drivers/infiniband/hw/nes/nes_mgt.c and this one probably meant to use __skb_insert() since it appears nesqp->pau_list is protected by nesqp->pau_lock. This looks like nesqp->pau_lock could be removed, since nesqp->pau_list.lock could be used instead. Signed-off-by: Eric Dumazet Cc: Faisal Latif Cc: Doug Ledford Cc: Jason Gunthorpe Cc: linux-rdma --- drivers/infiniband/hw/nes/nes_mgt.c | 4 ++-- include/linux/skbuff.h | 2 -- net/core/skbuff.c | 22 -- 3 files changed, 2 insertions(+), 26 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_mgt.c b/drivers/infiniband/hw/nes/nes_mgt.c index fc0c191014e908eea32d752f3499295ef143aa0a..abb54d30d35dd53fa983ee437506933eeba72746 100644 --- a/drivers/infiniband/hw/nes/nes_mgt.c +++ b/drivers/infiniband/hw/nes/nes_mgt.c @@ -551,14 +551,14 @@ static void queue_fpdus(struct sk_buff *skb, struct nes_vnic *nesvnic, struct ne /* Queue skb by sequence number */ if (skb_queue_len(&nesqp->pau_list) == 0) { - skb_queue_head(&nesqp->pau_list, skb); + __skb_queue_head(&nesqp->pau_list, skb); } else { skb_queue_walk(&nesqp->pau_list, tmpskb) { cb = (struct nes_rskb_cb *)&tmpskb->cb[0]; if (before(seqnum, cb->seqnum)) break; } - skb_insert(tmpskb, skb, &nesqp->pau_list); + __skb_insert(tmpskb, skb, &nesqp->pau_list); } if (nesqp->pau_state == PAU_READY) process_it = true; diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index f17a7452ac7bf47ef4bcf89840bba165cee6f50a..73902acf2b71c8800d81b744a936a7420f33b459 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1749,8 +1749,6 @@ static inline void skb_queue_head_init_class(struct sk_buff_head *list, * The "__skb_xxx()" functions are the non-atomic ones that * can only be called with interrupts disabled. 
*/ -void skb_insert(struct sk_buff *old, struct sk_buff *newsk, - struct sk_buff_head *list); static inline void __skb_insert(struct sk_buff *newsk, struct sk_buff *prev, struct sk_buff *next, struct sk_buff_head *list) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 9a8a72cefe9b94d3821b9cc5ba5bba647ae51267..02cd7ae3d0fb26ef0a8b006390154fdefd0d292f 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -2990,28 +2990,6 @@ void skb_append(struct sk_buff *old, struct sk_buff *newsk, struct sk_buff_head } EXPORT_SYMBOL(skb_append); -/** - * skb_insert - insert a buffer - * @old: buffer to insert before - * @newsk: buffer to insert - * @list: list to use - * - * Place a packet before a given packet in a list. The list locks are - * taken and this function is atomic with respect to other list locked - * calls. - * - * A buffer cannot be placed on two lists at the same time. - */ -void skb_insert(struct sk_buff *old, struct sk_buff *newsk, struct sk_buff_head *list) -{ - unsigned long flags; - - spin_lock_irqsave(&list->lock, flags); - __skb_insert(newsk, old->prev, old, list); - spin_unlock_irqrestore(&list->lock, flags); -} -EXPORT_SYMBOL(skb_insert); - static inline void skb_split_inside_header(struct sk_buff *skb, struct sk_buff* skb1, const u32 len, const int pos) -- 2.20.0.rc0.387.gc7a69e6b6c-goog
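The race being removed can be sketched in a few lines: the queue walk that finds the insertion point and the insert itself must sit under one critical section, which is why an skb_insert() that only takes list->lock around the splice is not enough. A minimal standalone Python model of the correct nes_mgt.c pattern (names are illustrative, not the kernel API):

```python
import threading

class SkbQueue:
    """Toy stand-in for sk_buff_head: a list plus its lock."""
    def __init__(self):
        self.lock = threading.Lock()   # plays the role of list->lock
        self.items = []                # plays the role of the skb chain

def queue_by_seqnum(queue, seqnum):
    # One critical section covers both the walk and the insert; if the
    # lock were taken only inside the insert (as skb_insert() does),
    # another CPU could change the list after the walk found 'pos'.
    with queue.lock:
        pos = 0
        while pos < len(queue.items) and queue.items[pos] <= seqnum:
            pos += 1                     # the "skb_queue_walk" part
        queue.items.insert(pos, seqnum)  # the "__skb_insert" part

q = SkbQueue()
for seq in (30, 10, 20):
    queue_by_seqnum(q, seq)
# q.items is now sorted by sequence number: [10, 20, 30]
```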
[PATCH net-next] net: phy: fix two issues with linkmode bitmaps
I wondered why ethtool suddenly reports that link partner doesn't support aneg and GBit modes. It turned out that this is caused by two bugs in conversion to linkmode bitmaps. 1. In genphy_read_status the value of phydev->lp_advertising is overwritten, thus GBit modes aren't reported any longer. 2. In mii_lpa_to_linkmode_lpa_t the aneg bit was overwritten by the call to mii_adv_to_linkmode_adv_t. Fixes: c0ec3c273677 ("net: phy: Convert u32 phydev->lp_advertising to linkmode") Signed-off-by: Heiner Kallweit --- drivers/net/phy/phy_device.c | 5 - include/linux/mii.h | 4 ++-- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 0904002b1..94f60c08b 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -1696,6 +1696,7 @@ int genphy_read_status(struct phy_device *phydev) int lpagb = 0; int common_adv; int common_adv_gb = 0; + __ETHTOOL_DECLARE_LINK_MODE_MASK(lpa_tmp); /* Update the link, but return if there was an error */ err = genphy_update_link(phydev); @@ -1734,7 +1735,9 @@ int genphy_read_status(struct phy_device *phydev) if (lpa < 0) return lpa; - mii_lpa_to_linkmode_lpa_t(phydev->lp_advertising, lpa); + mii_lpa_to_linkmode_lpa_t(lpa_tmp, lpa); + linkmode_or(phydev->lp_advertising, phydev->lp_advertising, + lpa_tmp); adv = phy_read(phydev, MII_ADVERTISE); if (adv < 0) diff --git a/include/linux/mii.h b/include/linux/mii.h index fb7ae4ae8..08450609d 100644 --- a/include/linux/mii.h +++ b/include/linux/mii.h @@ -413,11 +413,11 @@ static inline void mii_adv_to_linkmode_adv_t(unsigned long *advertising, static inline void mii_lpa_to_linkmode_lpa_t(unsigned long *lp_advertising, u32 lpa) { + mii_adv_to_linkmode_adv_t(lp_advertising, lpa); + if (lpa & LPA_LPACK) linkmode_set_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, lp_advertising); - - mii_adv_to_linkmode_adv_t(lp_advertising, lpa); } /** -- 2.19.2
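The first bug fixed above can be reduced to a few lines: the LPA decoder zeroes its destination bitmap, so decoding straight into phydev->lp_advertising wipes the GBit bits decoded from MII_STAT1000 just before. A standalone Python model (plain integer bitmasks with illustrative bit positions, not the kernel representation):

```python
# Illustrative bit positions, not the real ETHTOOL_LINK_MODE_* values.
AUTONEG, HALF_10, FULL_1000 = 1 << 0, 1 << 1, 1 << 2

def decode_lpa(lpa_has_half10, lpa_has_aneg):
    """Models mii_lpa_to_linkmode_lpa_t: builds a *fresh* bitmap from LPA,
    i.e. it implicitly zeroes everything it does not set."""
    bitmap = 0
    if lpa_has_half10:
        bitmap |= HALF_10
    if lpa_has_aneg:
        bitmap |= AUTONEG
    return bitmap

# GBit modes were already decoded from MII_STAT1000:
lp_advertising = FULL_1000

# Buggy flow: decode directly into the destination -> FULL_1000 is lost.
buggy = decode_lpa(True, True)

# Fixed flow (as in the patch): decode into a temporary, then OR it in,
# the moral equivalent of lpa_tmp + linkmode_or().
lp_advertising |= decode_lpa(True, True)

assert not (buggy & FULL_1000)                              # bug: GBit gone
assert lp_advertising & FULL_1000 and lp_advertising & AUTONEG  # fix: kept
```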
[PATCH net-next v2 2/2] r8169: make use of xmit_more and __netdev_sent_queue
Make use of xmit_more and add the functionality introduced with 3e59020abf0f ("net: bql: add __netdev_tx_sent_queue()"). I used the mlx4 driver as template. Signed-off-by: Heiner Kallweit --- v2: - fix minor style issue --- drivers/net/ethernet/realtek/r8169.c | 19 +-- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 5ee684f9e..4114c2712 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -6069,6 +6069,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb, struct device *d = tp_to_dev(tp); dma_addr_t mapping; u32 opts[2], len; + bool stop_queue; int frags; if (unlikely(!rtl_tx_slots_avail(tp, skb_shinfo(skb)->nr_frags))) { @@ -6110,8 +6111,6 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb, txd->opts2 = cpu_to_le32(opts[1]); - netdev_sent_queue(dev, skb->len); - skb_tx_timestamp(skb); /* Force memory writes to complete before releasing descriptor */ @@ -6124,16 +6123,16 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb, tp->cur_tx += frags + 1; - RTL_W8(tp, TxPoll, NPQ); + stop_queue = !rtl_tx_slots_avail(tp, MAX_SKB_FRAGS); + if (unlikely(stop_queue)) + netif_stop_queue(dev); - mmiowb(); + if (__netdev_sent_queue(dev, skb->len, skb->xmit_more)) { + RTL_W8(tp, TxPoll, NPQ); + mmiowb(); + } - if (!rtl_tx_slots_avail(tp, MAX_SKB_FRAGS)) { - /* Avoid wrongly optimistic queue wake-up: rtl_tx thread must -* not miss a ring update when it notices a stopped queue. -*/ - smp_wmb(); - netif_stop_queue(dev); + if (unlikely(stop_queue)) { /* Sync with rtl_tx: * - publish queue status and cur_tx ring index (write barrier) * - refresh dirty_tx ring index (read barrier). -- 2.19.2
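The control flow the patch adopts can be sketched independently of the hardware: the sent-queue helper reports whether the doorbell (the TxPoll MMIO write) must ring now, so with xmit_more set, several descriptors can be published with a single write. A simplified standalone Python model (the real __netdev_tx_sent_queue() also handles BQL accounting details and memory barriers; names and the limit handling here are illustrative):

```python
def netdev_sent_queue(queued_bytes, skb_len, limit, xmit_more):
    """Return (new_queued_bytes, ring_doorbell).

    ring_doorbell is True when the kick must not be deferred: either the
    stack has no further packet queued (xmit_more is False) or BQL says
    enough bytes are in flight."""
    queued_bytes += skb_len
    over_limit = queued_bytes >= limit
    return queued_bytes, (not xmit_more) or over_limit

doorbells = 0
queued = 0
# Three back-to-back packets; only the last one has xmit_more == False.
for skb_len, more in [(100, True), (100, True), (100, False)]:
    queued, kick = netdev_sent_queue(queued, skb_len, limit=10_000, xmit_more=more)
    if kick:
        doorbells += 1   # corresponds to RTL_W8(tp, TxPoll, NPQ) + mmiowb()

# Three packets were queued, but the doorbell rang only once.
```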
[PATCH net-next v2 1/2] net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue
Similar to netdev_sent_queue add helper __netdev_sent_queue as variant of __netdev_tx_sent_queue. Signed-off-by: Heiner Kallweit --- v2: - no changes --- include/linux/netdevice.h | 8 1 file changed, 8 insertions(+) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 1dcc0628b..a417fa501 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3214,6 +3214,14 @@ static inline void netdev_sent_queue(struct net_device *dev, unsigned int bytes) netdev_tx_sent_queue(netdev_get_tx_queue(dev, 0), bytes); } +static inline bool __netdev_sent_queue(struct net_device *dev, + unsigned int bytes, + bool xmit_more) +{ + return __netdev_tx_sent_queue(netdev_get_tx_queue(dev, 0), bytes, + xmit_more); +} + static inline void netdev_tx_completed_queue(struct netdev_queue *dev_queue, unsigned int pkts, unsigned int bytes) { -- 2.19.1
[PATCH net-next v2 0/2] r8169: make use of xmit_more and __netdev_sent_queue
This series adds helper __netdev_sent_queue to the core and makes use of it in the r8169 driver. Heiner Kallweit (2): net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue r8169: make use of xmit_more and __netdev_sent_queue v2: - fix minor style issue drivers/net/ethernet/realtek/r8169.c | 19 +-- include/linux/netdevice.h| 8 2 files changed, 17 insertions(+), 10 deletions(-) -- 2.19.1
Re: Can decnet be deprecated?
On Sun, Nov 25, 2018 at 4:14 AM David Ahern wrote: > > IPX was moved to staging at the end of last year. Can decnet follow > suit? git log seems to indicate no active development in a very long time. > > David Kill it :)
Re: Can decnet be deprecated?
David Miller writes: > From: David Ahern > Date: Sat, 24 Nov 2018 17:12:48 -0700 > >> IPX was moved to staging at the end of last year. Can decnet follow >> suit? git log seems to indicate no active development in a very long time. > > Last time I tried to do that someone immediately said on the list > "Don't, we're using that!" Not sure about that. What I can see is a claim that it has no bugs: http://patchwork.ozlabs.org/patch/837484/ The V1 received only support for removal: http://patchwork.ozlabs.org/patch/837261/ But no one claimed they were using decnet. Bjørn
[PATCH net-next 4/5] mlxsw: spectrum_router: Introduce emulated VLAN RIFs
Router interfaces (RIFs) constructed on top of VLAN-aware bridges are of "VLAN" type, whereas RIFs constructed on top of VLAN-unaware bridges of "FID" type. In other words, the RIF type is derived from the underlying FID type. VLAN RIFs are used on top of 802.1Q FIDs, whereas FID RIFs are used on top of 802.1D FIDs. Since the previous patch emulated 802.1Q FIDs using 802.1D FIDs, this patch emulates VLAN RIFs using FID RIFs. Signed-off-by: Ido Schimmel Reviewed-by: Petr Machata --- drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c index 9e9bb57134f2..5cdd4ceee7a9 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c @@ -7296,6 +7296,15 @@ static const struct mlxsw_sp_rif_ops mlxsw_sp_rif_fid_ops = { .fdb_del= mlxsw_sp_rif_fid_fdb_del, }; +static const struct mlxsw_sp_rif_ops mlxsw_sp_rif_vlan_emu_ops = { + .type = MLXSW_SP_RIF_TYPE_VLAN, + .rif_size = sizeof(struct mlxsw_sp_rif), + .configure = mlxsw_sp_rif_fid_configure, + .deconfigure= mlxsw_sp_rif_fid_deconfigure, + .fid_get= mlxsw_sp_rif_vlan_fid_get, + .fdb_del= mlxsw_sp_rif_vlan_fdb_del, +}; + static struct mlxsw_sp_rif_ipip_lb * mlxsw_sp_rif_ipip_lb_rif(struct mlxsw_sp_rif *rif) { -- 2.19.1
[PATCH net-next 2/5] mlxsw: spectrum_fid: Make flood index calculation more robust
802.1D FIDs use a per-FID flood table, where the flood index into the table is calculated by subtracting 4K from the FID's index. Currently, 802.1D FIDs start at 4K, so the calculation is correct, but if that were ever to change, the calculation would no longer be correct. In addition, this change will allow us to reuse the flood index calculation function in the next patch, where we are going to emulate 802.1Q FIDs using 802.1D FIDs. Signed-off-by: Ido Schimmel Reviewed-by: Petr Machata --- drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c index 5008bf63d73b..e1739cda25cb 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c @@ -607,7 +607,7 @@ mlxsw_sp_fid_8021d_compare(const struct mlxsw_sp_fid *fid, const void *arg) static u16 mlxsw_sp_fid_8021d_flood_index(const struct mlxsw_sp_fid *fid) { - return fid->fid_index - fid->fid_family->start_index; + return fid->fid_index - VLAN_N_VID; } static int mlxsw_sp_port_vp_mode_trans(struct mlxsw_sp_port *mlxsw_sp_port) -- 2.19.1
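The arithmetic being made robust is small enough to check numerically: once the next patch reuses this function, 802.1D FIDs occupy flood indices starting at 0 and the emulated 802.1Q FIDs continue right after them in the same per-FID table, which only works if the index is always an offset from VLAN_N_VID rather than from a family's start_index. A quick standalone sketch (MLXSW_SP_FID_8021D_MAX is an illustrative value, not the real device limit):

```python
VLAN_N_VID = 4096          # 802.1D FIDs currently start at 4K
FID_8021D_MAX = 1024       # illustrative, not the real MLXSW_SP_FID_8021D_MAX

def flood_index(fid_index):
    # Models the patched mlxsw_sp_fid_8021d_flood_index(): always an
    # offset from VLAN_N_VID, shared by both FID families.
    return fid_index - VLAN_N_VID

# First 802.1D FID maps to the first flood table entry ...
assert flood_index(VLAN_N_VID) == 0
# ... and the first emulated 802.1Q FID lands right after the 802.1D range.
first_emu_8021q = VLAN_N_VID + FID_8021D_MAX
assert flood_index(first_emu_8021q) == FID_8021D_MAX
```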
[PATCH net-next 0/5] mlxsw: Prepare for VLAN-aware bridge w/VxLAN
The driver is using 802.1Q filtering identifiers (FIDs) to represent the different VLANs in the VLAN-aware bridge (only one is supported). However, the device cannot assign a VNI to such FIDs, which prevents the driver from supporting the enslavement of VxLAN devices to the VLAN-aware bridge. This patchset works around this limitation by emulating 802.1Q FIDs using 802.1D FIDs, which can be assigned a VNI and so far have only been used in conjunction with VLAN-unaware bridges. The downside of this approach is that multiple {Port,VID}->FID entries are required, whereas a single VID->FID entry is required with "true" 802.1Q FIDs. First four patches introduce the new FID family of emulated 802.1Q FIDs and the associated type of router interfaces (RIFs). Last patch flips the driver to use this new FID family. The diff is relatively small because the internal implementation of each FID family is contained and hidden in spectrum_fid.c. Different internal users (e.g., bridge, router) are aware of the different FID types, but do not care about their internal implementation. This makes it trivial to swap the current implementation of 802.1Q FIDs with the new one, using 802.1D FIDs. Ido Schimmel (5): mlxsw: spectrum_switchdev: Do not set field when it is reserved mlxsw: spectrum_fid: Make flood index calculation more robust mlxsw: spectrum_fid: Introduce emulated 802.1Q FIDs mlxsw: spectrum_router: Introduce emulated VLAN RIFs mlxsw: spectrum: Flip driver to use emulated 802.1Q FIDs .../net/ethernet/mellanox/mlxsw/spectrum.c| 14 +++--- .../net/ethernet/mellanox/mlxsw/spectrum.h| 1 + .../ethernet/mellanox/mlxsw/spectrum_fid.c| 44 ++- .../ethernet/mellanox/mlxsw/spectrum_router.c | 11 - .../mellanox/mlxsw/spectrum_switchdev.c | 3 +- 5 files changed, 63 insertions(+), 10 deletions(-) -- 2.19.1
[PATCH net-next 3/5] mlxsw: spectrum_fid: Introduce emulated 802.1Q FIDs
The driver uses 802.1Q FIDs when offloading a VLAN-aware bridge. Unfortunately, it is not possible to assign a VNI to such FIDs, which prompts the driver to forbid the enslavement of VxLAN devices to a VLAN-aware bridge. Workaround this hardware limitation by creating a new family of FIDs, emulated 802.1Q FIDs. These FIDs are emulated using 802.1D FIDs, which can be assigned a VNI. The downside of this approach is that multiple {Port, VID}->FID entries are required, whereas only a single VID->FID is required with "true" 802.1Q FIDs. Signed-off-by: Ido Schimmel Reviewed-by: Petr Machata --- .../ethernet/mellanox/mlxsw/spectrum_fid.c| 33 +++ 1 file changed, 33 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c index e1739cda25cb..99ccb11405a5 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c @@ -801,6 +801,39 @@ static const struct mlxsw_sp_fid_family mlxsw_sp_fid_8021d_family = { .lag_vid_valid = 1, }; +static const struct mlxsw_sp_fid_ops mlxsw_sp_fid_8021q_emu_ops = { + .setup = mlxsw_sp_fid_8021q_setup, + .configure = mlxsw_sp_fid_8021d_configure, + .deconfigure= mlxsw_sp_fid_8021d_deconfigure, + .index_alloc= mlxsw_sp_fid_8021d_index_alloc, + .compare= mlxsw_sp_fid_8021q_compare, + .flood_index= mlxsw_sp_fid_8021d_flood_index, + .port_vid_map = mlxsw_sp_fid_8021d_port_vid_map, + .port_vid_unmap = mlxsw_sp_fid_8021d_port_vid_unmap, + .vni_set= mlxsw_sp_fid_8021d_vni_set, + .vni_clear = mlxsw_sp_fid_8021d_vni_clear, + .nve_flood_index_set= mlxsw_sp_fid_8021d_nve_flood_index_set, + .nve_flood_index_clear = mlxsw_sp_fid_8021d_nve_flood_index_clear, +}; + +/* There are 4K-2 emulated 802.1Q FIDs, starting right after the 802.1D FIDs */ +#define MLXSW_SP_FID_8021Q_EMU_START (VLAN_N_VID + MLXSW_SP_FID_8021D_MAX) +#define MLXSW_SP_FID_8021Q_EMU_END (MLXSW_SP_FID_8021Q_EMU_START + \ +VLAN_VID_MASK - 2) + +/* Range and flood 
configuration must match mlxsw_config_profile */ +static const struct mlxsw_sp_fid_family mlxsw_sp_fid_8021q_emu_family = { + .type = MLXSW_SP_FID_TYPE_8021Q, + .fid_size = sizeof(struct mlxsw_sp_fid_8021q), + .start_index= MLXSW_SP_FID_8021Q_EMU_START, + .end_index = MLXSW_SP_FID_8021Q_EMU_END, + .flood_tables = mlxsw_sp_fid_8021d_flood_tables, + .nr_flood_tables= ARRAY_SIZE(mlxsw_sp_fid_8021d_flood_tables), + .rif_type = MLXSW_SP_RIF_TYPE_VLAN, + .ops= &mlxsw_sp_fid_8021q_emu_ops, + .lag_vid_valid = 1, +}; + static int mlxsw_sp_fid_rfid_configure(struct mlxsw_sp_fid *fid) { /* rFIDs are allocated by the device during init */ -- 2.19.1
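The range arithmetic in the new defines can be verified directly: with VLAN_N_VID = 4096 and VLAN_VID_MASK = 0xFFF, the emulated 802.1Q family spans exactly the 4K-2 FIDs the comment promises, one per usable VID (excluding VID 0 and 4095). A standalone check (MLXSW_SP_FID_8021D_MAX is illustrative, not the real device limit):

```python
VLAN_N_VID, VLAN_VID_MASK = 4096, 0xFFF
MLXSW_SP_FID_8021D_MAX = 1024  # illustrative value

# Mirrors the two new defines in the patch.
MLXSW_SP_FID_8021Q_EMU_START = VLAN_N_VID + MLXSW_SP_FID_8021D_MAX
MLXSW_SP_FID_8021Q_EMU_END = MLXSW_SP_FID_8021Q_EMU_START + VLAN_VID_MASK - 2

# Inclusive range size: 4095 - 2 + 1 = 4094 = 4K-2 emulated 802.1Q FIDs,
# starting right after the 802.1D FIDs.
fid_count = MLXSW_SP_FID_8021Q_EMU_END - MLXSW_SP_FID_8021Q_EMU_START + 1
assert fid_count == 4094
```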
[PATCH net-next 1/5] mlxsw: spectrum_switchdev: Do not set field when it is reserved
When configuring an FDB entry pointing to a LAG netdev (or its upper), the driver should only set the 'lag_vid' field when the FID (filtering identifier) is of 802.1D type. Extend the 802.1D FID family with an attribute indicating whether this field should be set and based on its value set the field or leave it blank. Signed-off-by: Ido Schimmel Reviewed-by: Petr Machata --- drivers/net/ethernet/mellanox/mlxsw/spectrum.h | 1 + drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c | 7 +++ drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c | 3 ++- 3 files changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h index 973a9d2901f7..244972bf8b0a 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h @@ -721,6 +721,7 @@ int mlxsw_sp_setup_tc_prio(struct mlxsw_sp_port *mlxsw_sp_port, struct tc_prio_qopt_offload *p); /* spectrum_fid.c */ +bool mlxsw_sp_fid_lag_vid_valid(const struct mlxsw_sp_fid *fid); struct mlxsw_sp_fid *mlxsw_sp_fid_lookup_by_index(struct mlxsw_sp *mlxsw_sp, u16 fid_index); int mlxsw_sp_fid_nve_ifindex(const struct mlxsw_sp_fid *fid, int *nve_ifindex); diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c index 71b2d20afcc2..5008bf63d73b 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c @@ -98,6 +98,7 @@ struct mlxsw_sp_fid_family { enum mlxsw_sp_rif_type rif_type; const struct mlxsw_sp_fid_ops *ops; struct mlxsw_sp *mlxsw_sp; + u8 lag_vid_valid:1; }; static const int mlxsw_sp_sfgc_uc_packet_types[MLXSW_REG_SFGC_TYPE_MAX] = { @@ -122,6 +123,11 @@ static const int *mlxsw_sp_packet_type_sfgc_types[] = { [MLXSW_SP_FLOOD_TYPE_MC]= mlxsw_sp_sfgc_mc_packet_types, }; +bool mlxsw_sp_fid_lag_vid_valid(const struct mlxsw_sp_fid *fid) +{ + return fid->fid_family->lag_vid_valid; +} 
+ struct mlxsw_sp_fid *mlxsw_sp_fid_lookup_by_index(struct mlxsw_sp *mlxsw_sp, u16 fid_index) { @@ -792,6 +798,7 @@ static const struct mlxsw_sp_fid_family mlxsw_sp_fid_8021d_family = { .nr_flood_tables= ARRAY_SIZE(mlxsw_sp_fid_8021d_flood_tables), .rif_type = MLXSW_SP_RIF_TYPE_FID, .ops= &mlxsw_sp_fid_8021d_ops, + .lag_vid_valid = 1, }; static int mlxsw_sp_fid_rfid_configure(struct mlxsw_sp_fid *fid) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c index 73e5db176d7e..3c2428404b2e 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c @@ -2482,7 +2482,8 @@ static void mlxsw_sp_fdb_notify_mac_lag_process(struct mlxsw_sp *mlxsw_sp, bridge_device = bridge_port->bridge_device; vid = bridge_device->vlan_enabled ? mlxsw_sp_port_vlan->vid : 0; - lag_vid = mlxsw_sp_port_vlan->vid; + lag_vid = mlxsw_sp_fid_lag_vid_valid(mlxsw_sp_port_vlan->fid) ? + mlxsw_sp_port_vlan->vid : 0; do_fdb_op: err = mlxsw_sp_port_fdb_uc_lag_op(mlxsw_sp, lag_id, mac, fid, lag_vid, -- 2.19.1
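The behavioral change in mlxsw_sp_fdb_notify_mac_lag_process() boils down to one conditional: the VID is only written into the 'lag_vid' field when the FID family marks it as valid (currently only 802.1D), and left 0 otherwise, since the field is reserved for other FID types. A tiny standalone Python model of that selection (names are illustrative):

```python
def pick_lag_vid(fid_lag_vid_valid, port_vlan_vid):
    """Models the patched lag_vid assignment: use the port-VLAN's VID
    only when the FID family says the field is meaningful, else leave
    the reserved field as 0."""
    return port_vlan_vid if fid_lag_vid_valid else 0

assert pick_lag_vid(True, 10) == 10   # 802.1D FID: field is set
assert pick_lag_vid(False, 10) == 0   # field reserved: must stay 0
```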
[PATCH net-next 5/5] mlxsw: spectrum: Flip driver to use emulated 802.1Q FIDs
Replace 802.1Q FIDs and VLAN RIFs with their emulated counterparts. The emulated 802.1Q FIDs are actually 802.1D FIDs and thus use the same flood tables, of per-FID type. Therefore, add 4K-1 entries to the per-FID flood tables for the new FIDs and get rid of the FID-offset flood tables that were used by the old 802.1Q FIDs. Signed-off-by: Ido Schimmel Reviewed-by: Petr Machata --- drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 14 -- drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c | 2 +- .../net/ethernet/mellanox/mlxsw/spectrum_router.c | 2 +- 3 files changed, 10 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c index 637e2ef76abe..93378d507962 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c @@ -4111,16 +4111,20 @@ static void mlxsw_sp_fini(struct mlxsw_core *mlxsw_core) mlxsw_sp_kvdl_fini(mlxsw_sp); } +/* Per-FID flood tables are used for both "true" 802.1D FIDs and emulated + * 802.1Q FIDs + */ +#define MLXSW_SP_FID_FLOOD_TABLE_SIZE (MLXSW_SP_FID_8021D_MAX + \ +VLAN_VID_MASK - 1) + static const struct mlxsw_config_profile mlxsw_sp1_config_profile = { .used_max_mid = 1, .max_mid= MLXSW_SP_MID_MAX, .used_flood_tables = 1, .used_flood_mode= 1, .flood_mode = 3, - .max_fid_offset_flood_tables= 3, - .fid_offset_flood_table_size= VLAN_N_VID - 1, .max_fid_flood_tables = 3, - .fid_flood_table_size = MLXSW_SP_FID_8021D_MAX, + .fid_flood_table_size = MLXSW_SP_FID_FLOOD_TABLE_SIZE, .used_max_ib_mc = 1, .max_ib_mc = 0, .used_max_pkey = 1, @@ -4143,10 +4147,8 @@ static const struct mlxsw_config_profile mlxsw_sp2_config_profile = { .used_flood_tables = 1, .used_flood_mode= 1, .flood_mode = 3, - .max_fid_offset_flood_tables= 3, - .fid_offset_flood_table_size= VLAN_N_VID - 1, .max_fid_flood_tables = 3, - .fid_flood_table_size = MLXSW_SP_FID_8021D_MAX, + .fid_flood_table_size = MLXSW_SP_FID_FLOOD_TABLE_SIZE, .used_max_ib_mc = 
1, .max_ib_mc = 0, .used_max_pkey = 1, diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c index 99ccb11405a5..6830e79aed93 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c @@ -961,7 +961,7 @@ static const struct mlxsw_sp_fid_family mlxsw_sp_fid_dummy_family = { }; static const struct mlxsw_sp_fid_family *mlxsw_sp_fid_family_arr[] = { - [MLXSW_SP_FID_TYPE_8021Q] = &mlxsw_sp_fid_8021q_family, + [MLXSW_SP_FID_TYPE_8021Q] = &mlxsw_sp_fid_8021q_emu_family, [MLXSW_SP_FID_TYPE_8021D] = &mlxsw_sp_fid_8021d_family, [MLXSW_SP_FID_TYPE_RFID]= &mlxsw_sp_fid_rfid_family, [MLXSW_SP_FID_TYPE_DUMMY] = &mlxsw_sp_fid_dummy_family, }; diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c index 5cdd4ceee7a9..1557c5fc6d10 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c @@ -7373,7 +7373,7 @@ static const struct mlxsw_sp_rif_ops mlxsw_sp_rif_ipip_lb_ops = { static const struct mlxsw_sp_rif_ops *mlxsw_sp_rif_ops_arr[] = { [MLXSW_SP_RIF_TYPE_SUBPORT] = &mlxsw_sp_rif_subport_ops, - [MLXSW_SP_RIF_TYPE_VLAN]= &mlxsw_sp_rif_vlan_ops, + [MLXSW_SP_RIF_TYPE_VLAN]= &mlxsw_sp_rif_vlan_emu_ops, [MLXSW_SP_RIF_TYPE_FID] = &mlxsw_sp_rif_fid_ops, [MLXSW_SP_RIF_TYPE_IPIP_LB] = &mlxsw_sp_rif_ipip_lb_ops, }; -- 2.19.1