[PATCH net-next] add documents for snmp counters

2018-11-25 Thread yupeng
Add explanations of the below counters:
TcpExtTCPRcvCoalesce
TcpExtTCPAutoCorking
TcpExtTCPOrigDataSent
TCPSynRetrans
TCPFastOpenActiveFail
TcpExtListenOverflows
TcpExtListenDrops
TcpExtTCPHystartTrainDetect
TcpExtTCPHystartTrainCwnd
TcpExtTCPHystartDelayDetect
TcpExtTCPHystartDelayCwnd

Signed-off-by: yupeng 
---
 Documentation/networking/snmp_counter.rst | 202 ++
 1 file changed, 202 insertions(+)

diff --git a/Documentation/networking/snmp_counter.rst 
b/Documentation/networking/snmp_counter.rst
index a262d32ed710..918a1374af30 100644
--- a/Documentation/networking/snmp_counter.rst
+++ b/Documentation/networking/snmp_counter.rst
@@ -220,6 +220,68 @@ Defined in `RFC1213 tcpPassiveOpens`_
It means the TCP layer receives a SYN, replies with a SYN+ACK, and comes
into the SYN-RCVD state.
 
+* TcpExtTCPRcvCoalesce
+When packets are received by the TCP layer and are not read by the
+application, the TCP layer will try to merge them. This counter
+indicates how many packets are merged in such situations. If GRO is
+enabled, lots of packets would be merged by GRO, and these packets
+wouldn't be counted in TcpExtTCPRcvCoalesce.
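+
+A much simplified sketch of the idea (an illustration only, not the
+exact kernel code; in the kernel, tcp_try_coalesce() does the real
+work)::
+
+  /* receive path, before queuing a new skb to the socket */
+  tail = skb_peek_tail(&sk->sk_receive_queue);
+  if (tail && tcp_try_coalesce(sk, tail, skb, &fragstolen)) {
+          /* the data was glued onto the previous, still unread skb */
+          NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPRCVCOALESCE);
+          kfree_skb_partial(skb, fragstolen);
+  } else {
+          __skb_queue_tail(&sk->sk_receive_queue, skb);
+  }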
+
+* TcpExtTCPAutoCorking
+When sending packets, the TCP layer will try to merge small packets
+into a bigger one. This counter increases by 1 for every packet merged
+in such a situation. Please refer to the LWN article for more details:
+https://lwn.net/Articles/576263/
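+
+A much simplified sketch of the auto corking decision (an illustration
+only, not the exact kernel code; size_goal stands for the full segment
+size the stack is aiming for)::
+
+  /* transmit path: should this small write be held back? */
+  if (skb->len < size_goal &&          /* segment is not full yet */
+      refcount_read(&sk->sk_wmem_alloc) > skb->truesize) {
+          /* older data from this socket is still queued in the qdisc
+           * or driver, so hold the skb back and let it grow */
+          NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPAUTOCORKING);
+          autocork = true;
+  }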
+
+* TcpExtTCPOrigDataSent
+This counter is explained by `kernel commit f19c29e3e391`_; the
+explanation is quoted below::
+
+  TCPOrigDataSent: number of outgoing packets with original data (excluding
+  retransmission but including data-in-SYN). This counter is different from
+  TcpOutSegs because TcpOutSegs also tracks pure ACKs. TCPOrigDataSent is
+  more useful to track the TCP retransmission rate.
+
+* TCPSynRetrans
+This counter is explained by `kernel commit f19c29e3e391`_; the
+explanation is quoted below::
+
+  TCPSynRetrans: number of SYN and SYN/ACK retransmits to break down
+  retransmissions into SYN, fast-retransmits, timeout retransmits, etc.
+
+* TCPFastOpenActiveFail
+This counter is explained by `kernel commit f19c29e3e391`_; the
+explanation is quoted below::
+
+  TCPFastOpenActiveFail: Fast Open attempts (SYN/data) failed because
+  the remote does not accept it or the attempts timed out.
+
+.. _kernel commit f19c29e3e391: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f19c29e3e391a66a273e9afebaf01917245148cd
+
+* TcpExtListenOverflows and TcpExtListenDrops
+When the kernel receives a SYN from a client and the TCP accept queue
+is full, the kernel will drop the SYN and add 1 to
+TcpExtListenOverflows. At the same time, the kernel will also add 1 to
+TcpExtListenDrops. When a TCP socket is in LISTEN state and the kernel
+needs to drop a packet, the kernel always adds 1 to TcpExtListenDrops.
+So an increase of TcpExtListenOverflows means that TcpExtListenDrops
+increases at the same time, but TcpExtListenDrops can also increase
+without TcpExtListenOverflows increasing, e.g. a memory allocation
+failure would also increase TcpExtListenDrops.
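+
+A much simplified sketch of the check (an illustration only, not the
+exact kernel code)::
+
+  /* LISTEN socket input path, on an incoming SYN */
+  if (sk_acceptq_is_full(sk)) {
+          NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
+          NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENDROPS);
+          goto drop;      /* the SYN is not answered */
+  }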
+
+Note: The above explanation is based on kernel 4.10 or later; older
+kernels behave differently when the TCP accept queue is full. On an
+old kernel, the TCP stack won't drop the SYN; it completes the 3-way
+handshake. As the accept queue is full, the TCP stack keeps the socket
+in the TCP half-open queue. Since it is in the half-open queue, the
+TCP stack will send SYN+ACK on an exponential backoff timer. After the
+client replies with ACK, the TCP stack checks whether the accept queue
+is still full. If it is not full, the socket is moved to the accept
+queue. If it is still full, the socket is kept in the half-open queue,
+and the next time the client replies with ACK, this socket will get
+another chance to move to the accept queue.
+
+
 TCP Fast Open
 
When the kernel receives a TCP packet, it has two paths to handle the
@@ -331,6 +393,38 @@ TcpExtTCPAbortFailed will be increased.
 
 .. _RFC2525 2.17 section: https://tools.ietf.org/html/rfc2525#page-50
 
+TCP Hybrid Slow Start
+
+The Hybrid Slow Start algorithm is an enhancement of the traditional
+TCP congestion window Slow Start algorithm. It uses two pieces of
+information to detect whether the max bandwidth of the TCP path is
+being approached: the ACK train length and the increase in packet
+delay. For detailed information, please refer to the
+`Hybrid Slow Start paper`_. When either the ACK train length or the
+packet delay hits a specific threshold, the congestion control
+algorithm will enter the Congestion Avoidance state. As of v4.20, two
+congestion control algorithms use Hybrid Slow Start: cubic (the
+default congestion control algorithm) and cdg. Four snmp counters
+relate to the Hybrid Slow Start algorithm.
+
+.. _Hybrid Slow Start paper: 
https://pdfs.semanticscholar.org/25e9/ef3f03315782c7f1cbcd31b587857adae7d1.pdf
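+
+A much simplified sketch of the two detectors (an illustration only,
+not the exact cubic code; the threshold and rtt variables stand in for
+cubic's internal state)::
+
+  /* run on each ACK while the connection is still in slow start */
+  if (ack_train_length > train_threshold)           /* ACK train check */
+          found |= HYSTART_ACK_TRAIN;
+  if (round_min_rtt > delay_min + delay_threshold)  /* delay check */
+          found |= HYSTART_DELAY;
+  if (found) {
+          /* exit slow start: the matching *Detect counter increases
+           * by 1 and the matching *Cwnd counter increases by the
+           * current congestion window */
+          tp->snd_ssthresh = tp->snd_cwnd;
+  }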
+
+* 

RE: [PATCH net-next 1/8] dpaa2-eth: Add basic XDP support

2018-11-25 Thread Ioana Ciocoi Radulescu
> -Original Message-
> From: David Ahern 
> Sent: Saturday, November 24, 2018 11:49 PM
> To: Ioana Ciocoi Radulescu ;
> netdev@vger.kernel.org; da...@davemloft.net
> Cc: Ioana Ciornei 
> Subject: Re: [PATCH net-next 1/8] dpaa2-eth: Add basic XDP support
> 
> On 11/23/18 9:56 AM, Ioana Ciocoi Radulescu wrote:
> > @@ -215,6 +255,7 @@ static void dpaa2_eth_rx(struct dpaa2_eth_priv
> *priv,
> > struct dpaa2_fas *fas;
> > void *buf_data;
> > u32 status = 0;
> > +   u32 xdp_act;
> >
> > /* Tracing point */
> > trace_dpaa2_rx_fd(priv->net_dev, fd);
> > @@ -231,8 +272,14 @@ static void dpaa2_eth_rx(struct dpaa2_eth_priv
> *priv,
> > percpu_extras = this_cpu_ptr(priv->percpu_extras);
> >
> > if (fd_format == dpaa2_fd_single) {
> > +   xdp_act = run_xdp(priv, ch, (struct dpaa2_fd *)fd, vaddr);
> > +   if (xdp_act != XDP_PASS)
> > +   return;
> 
> please bump the rx counters (packets and bytes) regardless of what XDP
> outcome is.
> 
> Same for Tx; packets and bytes counter should be bumped for packets
> redirected by XDP.

Thanks for the feedback, I wasn't sure whether I should count them
as regular packets or not. I'll make the change in v2.

Ioana



[PATCH net] sctp: increase sk_wmem_alloc when head->truesize is increased

2018-11-25 Thread Xin Long
In commit 02968ccf0125 ("sctp: count sk_wmem_alloc by skb truesize in
sctp_packet_transmit") I changed sk_wmem_alloc to be counted by skb
truesize instead of 1, to fix the sk_wmem_alloc leak caused by the
later truesize change in xfrm.

But I should have also increased sk_wmem_alloc when head->truesize
is increased in sctp_packet_gso_append(), as xfrm does. Otherwise,
sctp gso packets will cause a sk_wmem_alloc underflow.

Fixes: 02968ccf0125 ("sctp: count sk_wmem_alloc by skb truesize in 
sctp_packet_transmit")
Signed-off-by: Xin Long 
---
 net/sctp/output.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/sctp/output.c b/net/sctp/output.c
index b0e74a3..025f48e 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -410,6 +410,7 @@ static void sctp_packet_gso_append(struct sk_buff *head, 
struct sk_buff *skb)
head->truesize += skb->truesize;
head->data_len += skb->len;
head->len += skb->len;
+   refcount_add(skb->truesize, &head->sk->sk_wmem_alloc);
 
__skb_header_release(skb);
 }
-- 
2.1.0



[PATCH rdma-next 6/7] IB/mlx5: Update the supported DEVX commands

2018-11-25 Thread Leon Romanovsky
From: Yishai Hadas 

Update the supported DEVX commands: add to the query/modify command
lists and to the encoding handling.

In addition, a valid range for general commands was added, to be used
for future commands.

Signed-off-by: Yishai Hadas 
Reviewed-by: Artemy Kovalyov 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/devx.c | 17 +
 include/linux/mlx5/mlx5_ifc.h | 10 ++
 2 files changed, 27 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/devx.c 
b/drivers/infiniband/hw/mlx5/devx.c
index 80053324dd31..5271469aad10 100644
--- a/drivers/infiniband/hw/mlx5/devx.c
+++ b/drivers/infiniband/hw/mlx5/devx.c
@@ -314,6 +314,8 @@ static u64 devx_get_obj_id(const void *in)
MLX5_GET(query_dct_in, in, dctn));
break;
case MLX5_CMD_OP_QUERY_XRQ:
+   case MLX5_CMD_OP_QUERY_XRQ_DC_PARAMS_ENTRY:
+   case MLX5_CMD_OP_QUERY_XRQ_ERROR_PARAMS:
obj_id = get_enc_obj_id(MLX5_CMD_OP_CREATE_XRQ,
MLX5_GET(query_xrq_in, in, xrqn));
break;
@@ -340,9 +342,16 @@ static u64 devx_get_obj_id(const void *in)
MLX5_GET(drain_dct_in, in, dctn));
break;
case MLX5_CMD_OP_ARM_XRQ:
+   case MLX5_CMD_OP_SET_XRQ_DC_PARAMS_ENTRY:
obj_id = get_enc_obj_id(MLX5_CMD_OP_CREATE_XRQ,
MLX5_GET(arm_xrq_in, in, xrqn));
break;
+   case MLX5_CMD_OP_QUERY_PACKET_REFORMAT_CONTEXT:
+   obj_id = get_enc_obj_id
+   (MLX5_CMD_OP_ALLOC_PACKET_REFORMAT_CONTEXT,
+MLX5_GET(query_packet_reformat_context_in,
+ in, packet_reformat_id));
+   break;
default:
obj_id = 0;
}
@@ -601,6 +610,7 @@ static bool devx_is_obj_modify_cmd(const void *in)
case MLX5_CMD_OP_DRAIN_DCT:
case MLX5_CMD_OP_ARM_DCT_FOR_KEY_VIOLATION:
case MLX5_CMD_OP_ARM_XRQ:
+   case MLX5_CMD_OP_SET_XRQ_DC_PARAMS_ENTRY:
return true;
case MLX5_CMD_OP_SET_FLOW_TABLE_ENTRY:
{
@@ -642,6 +652,9 @@ static bool devx_is_obj_query_cmd(const void *in)
case MLX5_CMD_OP_QUERY_XRC_SRQ:
case MLX5_CMD_OP_QUERY_DCT:
case MLX5_CMD_OP_QUERY_XRQ:
+   case MLX5_CMD_OP_QUERY_XRQ_DC_PARAMS_ENTRY:
+   case MLX5_CMD_OP_QUERY_XRQ_ERROR_PARAMS:
+   case MLX5_CMD_OP_QUERY_PACKET_REFORMAT_CONTEXT:
return true;
default:
return false;
@@ -685,6 +698,10 @@ static bool devx_is_general_cmd(void *in)
 {
u16 opcode = MLX5_GET(general_obj_in_cmd_hdr, in, opcode);

+   if (opcode >= MLX5_CMD_OP_GENERAL_START &&
+   opcode < MLX5_CMD_OP_GENERAL_END)
+   return true;
+
switch (opcode) {
case MLX5_CMD_OP_QUERY_HCA_CAP:
case MLX5_CMD_OP_QUERY_HCA_VPORT_CONTEXT:
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index ece1b606c909..171d68663640 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -144,6 +144,9 @@ enum {
MLX5_CMD_OP_DESTROY_XRQ   = 0x718,
MLX5_CMD_OP_QUERY_XRQ = 0x719,
MLX5_CMD_OP_ARM_XRQ   = 0x71a,
+   MLX5_CMD_OP_QUERY_XRQ_DC_PARAMS_ENTRY = 0x725,
+   MLX5_CMD_OP_SET_XRQ_DC_PARAMS_ENTRY   = 0x726,
+   MLX5_CMD_OP_QUERY_XRQ_ERROR_PARAMS= 0x727,
MLX5_CMD_OP_QUERY_VPORT_STATE = 0x750,
MLX5_CMD_OP_MODIFY_VPORT_STATE= 0x751,
MLX5_CMD_OP_QUERY_ESW_VPORT_CONTEXT   = 0x752,
@@ -245,6 +248,7 @@ enum {
MLX5_CMD_OP_MODIFY_FLOW_TABLE = 0x93c,
MLX5_CMD_OP_ALLOC_PACKET_REFORMAT_CONTEXT = 0x93d,
MLX5_CMD_OP_DEALLOC_PACKET_REFORMAT_CONTEXT = 0x93e,
+   MLX5_CMD_OP_QUERY_PACKET_REFORMAT_CONTEXT = 0x93f,
MLX5_CMD_OP_ALLOC_MODIFY_HEADER_CONTEXT   = 0x940,
MLX5_CMD_OP_DEALLOC_MODIFY_HEADER_CONTEXT = 0x941,
MLX5_CMD_OP_QUERY_MODIFY_HEADER_CONTEXT   = 0x942,
@@ -260,6 +264,12 @@ enum {
MLX5_CMD_OP_MAX
 };

+/* Valid range for general commands that don't work over an object */
+enum {
+   MLX5_CMD_OP_GENERAL_START = 0xb00,
+   MLX5_CMD_OP_GENERAL_END = 0xd00,
+};
+
 struct mlx5_ifc_flow_table_fields_supported_bits {
u8 outer_dmac[0x1];
u8 outer_smac[0x1];
--
2.19.1



[PATCH rdma-next 7/7] IB/mlx5: Allow XRC usage via verbs in DEVX context

2018-11-25 Thread Leon Romanovsky
From: Yishai Hadas 

Allows XRC usage from the verbs flow in a DEVX context.
As an XRCD is a kernel resource shared between processes, it should be
created with UID=0 to reflect that.

As a result, once XRC QP/SRQ are created, they must also be used with
UID=0 so that the firmware will allow the XRCD usage.

Signed-off-by: Yishai Hadas 
Reviewed-by: Artemy Kovalyov 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  1 -
 drivers/infiniband/hw/mlx5/qp.c  | 12 +---
 drivers/infiniband/hw/mlx5/srq.c |  2 +-
 3 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 4d33965369cc..24cb2f793210 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -543,7 +543,6 @@ struct mlx5_ib_srq {
 struct mlx5_ib_xrcd {
struct ib_xrcd  ibxrcd;
u32 xrcdn;
-   u16 uid;
 };
 
 enum mlx5_ib_mtt_access_flags {
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 52ffc6af3c20..369db954edbe 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -775,6 +775,7 @@ static int create_user_qp(struct mlx5_ib_dev *dev, struct 
ib_pd *pd,
__be64 *pas;
void *qpc;
int err;
+   u16 uid;
 
	err = ib_copy_from_udata(&ucmd, udata, sizeof(ucmd));
if (err) {
@@ -836,7 +837,8 @@ static int create_user_qp(struct mlx5_ib_dev *dev, struct 
ib_pd *pd,
goto err_umem;
}
 
-   MLX5_SET(create_qp_in, *in, uid, to_mpd(pd)->uid);
+   uid = (attr->qp_type != IB_QPT_XRC_TGT) ? to_mpd(pd)->uid : 0;
+   MLX5_SET(create_qp_in, *in, uid, uid);
pas = (__be64 *)MLX5_ADDR_OF(create_qp_in, *in, pas);
if (ubuffer->umem)
mlx5_ib_populate_pas(dev, ubuffer->umem, page_shift, pas, 0);
@@ -5513,7 +5515,6 @@ struct ib_xrcd *mlx5_ib_alloc_xrcd(struct ib_device 
*ibdev,
struct mlx5_ib_dev *dev = to_mdev(ibdev);
struct mlx5_ib_xrcd *xrcd;
int err;
-   u16 uid;
 
if (!MLX5_CAP_GEN(dev->mdev, xrc))
return ERR_PTR(-ENOSYS);
@@ -5522,14 +5523,12 @@ struct ib_xrcd *mlx5_ib_alloc_xrcd(struct ib_device 
*ibdev,
if (!xrcd)
return ERR_PTR(-ENOMEM);
 
-   uid = context ? to_mucontext(context)->devx_uid : 0;
-   err = mlx5_cmd_xrcd_alloc(dev->mdev, &xrcd->xrcdn, uid);
+   err = mlx5_cmd_xrcd_alloc(dev->mdev, &xrcd->xrcdn, 0);
if (err) {
kfree(xrcd);
return ERR_PTR(-ENOMEM);
}
 
-   xrcd->uid = uid;
-   return &xrcd->ibxrcd;
 }
 
@@ -5537,10 +5536,9 @@ int mlx5_ib_dealloc_xrcd(struct ib_xrcd *xrcd)
 {
struct mlx5_ib_dev *dev = to_mdev(xrcd->device);
u32 xrcdn = to_mxrcd(xrcd)->xrcdn;
-   u16 uid =  to_mxrcd(xrcd)->uid;
int err;
 
-   err = mlx5_cmd_xrcd_dealloc(dev->mdev, xrcdn, uid);
+   err = mlx5_cmd_xrcd_dealloc(dev->mdev, xrcdn, 0);
if (err)
mlx5_ib_warn(dev, "failed to dealloc xrcdn 0x%x\n", xrcdn);
 
diff --git a/drivers/infiniband/hw/mlx5/srq.c b/drivers/infiniband/hw/mlx5/srq.c
index b3aef0eb39cb..0413b10dea71 100644
--- a/drivers/infiniband/hw/mlx5/srq.c
+++ b/drivers/infiniband/hw/mlx5/srq.c
@@ -113,7 +113,7 @@ static int create_srq_user(struct ib_pd *pd, struct 
mlx5_ib_srq *srq,
 
in->log_page_size = page_shift - MLX5_ADAPTER_PAGE_SHIFT;
in->page_offset = offset;
-   in->uid = to_mpd(pd)->uid;
+   in->uid = (in->type != IB_SRQT_XRC) ?  to_mpd(pd)->uid : 0;
if (MLX5_CAP_GEN(dev->mdev, cqe_version) == MLX5_CQE_VERSION_V1 &&
in->type != IB_SRQT_BASIC)
in->user_index = uidx;
-- 
2.19.1



[PATCH rdma-next 5/7] IB/mlx5: Enforce DEVX privilege by firmware

2018-11-25 Thread Leon Romanovsky
From: Yishai Hadas 

Enforce DEVX privilege by firmware; this enables future device
functionality without the need to make driver changes unless a new
privilege type is introduced.

Signed-off-by: Yishai Hadas 
Reviewed-by: Artemy Kovalyov 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/devx.c| 17 +
 drivers/infiniband/hw/mlx5/main.c|  4 ++--
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  5 +++--
 3 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/devx.c 
b/drivers/infiniband/hw/mlx5/devx.c
index f80b78aab4da..80053324dd31 100644
--- a/drivers/infiniband/hw/mlx5/devx.c
+++ b/drivers/infiniband/hw/mlx5/devx.c
@@ -47,24 +47,31 @@ devx_ufile2uctx(const struct uverbs_attr_bundle *attrs)
return to_mucontext(ib_uverbs_get_ucontext(attrs));
 }
 
-int mlx5_ib_devx_create(struct mlx5_ib_dev *dev)
+int mlx5_ib_devx_create(struct mlx5_ib_dev *dev, bool is_user)
 {
u32 in[MLX5_ST_SZ_DW(create_uctx_in)] = {0};
u32 out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)] = {0};
u64 general_obj_types;
-   void *hdr;
+   void *hdr, *uctx;
int err;
u16 uid;
+   u32 cap = 0;
 
hdr = MLX5_ADDR_OF(create_uctx_in, in, hdr);
+   uctx = MLX5_ADDR_OF(create_uctx_in, in, uctx);
 
general_obj_types = MLX5_CAP_GEN_64(dev->mdev, general_obj_types);
if (!(general_obj_types & MLX5_GENERAL_OBJ_TYPES_CAP_UCTX) ||
!(general_obj_types & MLX5_GENERAL_OBJ_TYPES_CAP_UMEM))
return -EINVAL;
 
+   if (is_user && capable(CAP_NET_RAW) &&
+   (MLX5_CAP_GEN(dev->mdev, uctx_cap) & MLX5_UCTX_CAP_RAW_TX))
+   cap |= MLX5_UCTX_CAP_RAW_TX;
+
MLX5_SET(general_obj_in_cmd_hdr, hdr, opcode, 
MLX5_CMD_OP_CREATE_GENERAL_OBJECT);
MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type, MLX5_OBJ_TYPE_UCTX);
+   MLX5_SET(uctx, uctx, cap, cap);
 
err = mlx5_cmd_exec(dev->mdev, in, sizeof(in), out, sizeof(out));
if (err)
@@ -672,9 +679,6 @@ static int devx_get_uid(struct mlx5_ib_ucontext *c, void 
*cmd_in)
if (!c->devx_uid)
return -EINVAL;
 
-   if (!capable(CAP_NET_RAW))
-   return -EPERM;
-
return c->devx_uid;
 }
 static bool devx_is_general_cmd(void *in)
@@ -1239,9 +1243,6 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DEVX_UMEM_REG)(
if (!c->devx_uid)
return -EINVAL;
 
-   if (!capable(CAP_NET_RAW))
-   return -EPERM;
-
obj = kzalloc(sizeof(struct devx_umem), GFP_KERNEL);
if (!obj)
return -ENOMEM;
diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index b3986bc961ca..2b09e6896e5a 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1763,7 +1763,7 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct 
ib_device *ibdev,
 #endif
 
if (req.flags & MLX5_IB_ALLOC_UCTX_DEVX) {
-   err = mlx5_ib_devx_create(dev);
+   err = mlx5_ib_devx_create(dev, true);
if (err < 0)
goto out_uars;
context->devx_uid = err;
@@ -6234,7 +6234,7 @@ static int mlx5_ib_stage_devx_init(struct mlx5_ib_dev 
*dev)
 {
int uid;
 
-   uid = mlx5_ib_devx_create(dev);
+   uid = mlx5_ib_devx_create(dev, false);
if (uid > 0)
dev->devx_whitelist_uid = uid;
 
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h 
b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 59e1664a107f..4d33965369cc 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1268,7 +1268,7 @@ void mlx5_ib_put_native_port_mdev(struct mlx5_ib_dev *dev,
  u8 port_num);
 
 #if IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS)
-int mlx5_ib_devx_create(struct mlx5_ib_dev *dev);
+int mlx5_ib_devx_create(struct mlx5_ib_dev *dev, bool is_user);
 void mlx5_ib_devx_destroy(struct mlx5_ib_dev *dev, u16 uid);
 const struct uverbs_object_tree_def *mlx5_ib_get_devx_tree(void);
 extern const struct uapi_definition mlx5_ib_devx_defs[];
@@ -1283,7 +1283,8 @@ int mlx5_ib_get_flow_trees(const struct 
uverbs_object_tree_def **root);
 void mlx5_ib_destroy_flow_action_raw(struct mlx5_ib_flow_action *maction);
 #else
 static inline int
-mlx5_ib_devx_create(struct mlx5_ib_dev *dev) { return -EOPNOTSUPP; };
+mlx5_ib_devx_create(struct mlx5_ib_dev *dev,
+  bool is_user) { return -EOPNOTSUPP; }
 static inline void mlx5_ib_devx_destroy(struct mlx5_ib_dev *dev, u16 uid) {}
 static inline bool mlx5_ib_devx_is_flow_dest(void *obj, int *dest_id,
 int *dest_type)
-- 
2.19.1



[PATCH mlx5-next 1/7] net/mlx5: Update mlx5_ifc with DEVX UCTX capabilities bits

2018-11-25 Thread Leon Romanovsky
From: Yishai Hadas 

Expose device capabilities for the DEVX user context: which caps the
device supports and a matching bit to set as part of user context
creation.

Signed-off-by: Yishai Hadas 
Reviewed-by: Artemy Kovalyov 
Signed-off-by: Leon Romanovsky 
---
 include/linux/mlx5/mlx5_ifc.h | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 6f64e814cc10..ece1b606c909 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -883,6 +883,10 @@ enum {
MLX5_CAP_UMR_FENCE_NONE = 0x2,
 };
 
+enum {
+   MLX5_UCTX_CAP_RAW_TX = 1UL << 0,
+};
+
 struct mlx5_ifc_cmd_hca_cap_bits {
u8 reserved_at_0[0x30];
u8 vhca_id[0x10];
@@ -1193,7 +1197,13 @@ struct mlx5_ifc_cmd_hca_cap_bits {
u8 num_vhca_ports[0x8];
u8 reserved_at_618[0x6];
u8 sw_owner_id[0x1];
-   u8 reserved_at_61f[0x1e1];
+   u8 reserved_at_61f[0x1];
+
+   u8 reserved_at_620[0x80];
+
+   u8 uctx_cap[0x20];
+
+   u8 reserved_at_6c0[0x140];
 };
 
 enum mlx5_flow_destination_type {
@@ -9276,7 +9286,9 @@ struct mlx5_ifc_umem_bits {
 struct mlx5_ifc_uctx_bits {
u8 modify_field_select[0x40];
 
-   u8 reserved_at_40[0x1c0];
+   u8 cap[0x20];
+
+   u8 reserved_at_60[0x1a0];
 };
 
 struct mlx5_ifc_create_umem_in_bits {
-- 
2.19.1



[PATCH rdma-next 0/7] Enrich DEVX support

2018-11-25 Thread Leon Romanovsky
From: Leon Romanovsky 

From Yishai,
---
This series enriches DEVX support in a few aspects: it enables
interoperability between DEVX and verbs and improves the mechanism for
controlling privileged DEVX commands.

The first patch updates mlx5 ifc file.

Next 3 patches enable modifying and querying verbs objects via the DEVX
interface.

To achieve that, the core layer introduces the 'UVERBS_IDR_ANY_OBJECT' type
to match any IDR object. Once it's used by some driver's method, the
infrastructure skips checking for the IDR type and it becomes the driver
handler's responsibility.

The DEVX methods of modify and query were changed to get any object type via
the 'UVERBS_IDR_ANY_OBJECT' mechanism. The type checking is done per object as
part of the driver code.

The next 3 patches introduce a more robust mechanism for controlling
privileged DEVX commands. The responsibility to block/allow each command was
moved to the firmware, based on the UID credentials that the driver reports
upon user context creation. This enables more granularity per command based
on the device security model and the user credentials.

In addition, by introducing a valid range for 'general commands' we prevent
the need to touch the driver's code any time a new command is added in the
future.

The last patch fixes the XRC verbs flow when a DEVX context is used. This is
needed as the XRCD is a shared kernel resource and, as such, a kernel UID (=0)
should be used for its related resources.

Thanks

Yishai Hadas (7):
  net/mlx5: Update mlx5_ifc with DEVX UCTX capabilities bits
  IB/core: Introduce UVERBS_IDR_ANY_OBJECT
  IB/core: Enable getting an object type from a given uobject
  IB/mlx5: Enable modify and query verbs objects via DEVX
  IB/mlx5: Enforce DEVX privilege by firmware
  IB/mlx5: Update the supported DEVX commands
  IB/mlx5: Allow XRC usage via verbs in DEVX context

 drivers/infiniband/core/rdma_core.c   |  27 +++--
 drivers/infiniband/core/rdma_core.h   |  21 ++--
 drivers/infiniband/core/uverbs_uapi.c |  10 +-
 drivers/infiniband/hw/mlx5/devx.c | 142 ++
 drivers/infiniband/hw/mlx5/main.c |   4 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h  |   6 +-
 drivers/infiniband/hw/mlx5/qp.c   |  12 +--
 drivers/infiniband/hw/mlx5/srq.c  |   2 +-
 include/linux/mlx5/mlx5_ifc.h |  26 -
 include/rdma/uverbs_ioctl.h   |   6 ++
 include/rdma/uverbs_std_types.h   |  12 +++
 11 files changed, 215 insertions(+), 53 deletions(-)

--
2.19.1



[PATCH rdma-next 3/7] IB/core: Enable getting an object type from a given uobject

2018-11-25 Thread Leon Romanovsky
From: Yishai Hadas 

Enable getting an object type from a given uobject. The type is saved
upon tree merging and is returned by a new helper function.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/core/rdma_core.h   |  5 -
 drivers/infiniband/core/uverbs_uapi.c |  1 +
 include/rdma/uverbs_std_types.h   | 12 
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/rdma_core.h 
b/drivers/infiniband/core/rdma_core.h
index 8aec28037c48..b3ca7457ac42 100644
--- a/drivers/infiniband/core/rdma_core.h
+++ b/drivers/infiniband/core/rdma_core.h
@@ -118,11 +118,6 @@ void release_ufile_idr_uobject(struct ib_uverbs_file 
*ufile);
  * Depending on ID the slot pointer in the radix tree points at one of these
  * structs.
  */
-struct uverbs_api_object {
-   const struct uverbs_obj_type *type_attrs;
-   const struct uverbs_obj_type_class *type_class;
-   u8 disabled:1;
-};
 
 struct uverbs_api_ioctl_method {
int(__rcu *handler)(struct uverbs_attr_bundle *attrs);
diff --git a/drivers/infiniband/core/uverbs_uapi.c 
b/drivers/infiniband/core/uverbs_uapi.c
index faac225184a6..0136c1d78a0f 100644
--- a/drivers/infiniband/core/uverbs_uapi.c
+++ b/drivers/infiniband/core/uverbs_uapi.c
@@ -184,6 +184,7 @@ static int uapi_merge_obj_tree(struct uverbs_api *uapi,
if (WARN_ON(obj_elm->type_attrs))
return -EINVAL;
 
+   obj_elm->id = obj->id;
obj_elm->type_attrs = obj->type_attrs;
obj_elm->type_class = obj->type_attrs->type_class;
/*
diff --git a/include/rdma/uverbs_std_types.h b/include/rdma/uverbs_std_types.h
index df878ce02c94..883abcf6d36e 100644
--- a/include/rdma/uverbs_std_types.h
+++ b/include/rdma/uverbs_std_types.h
@@ -182,5 +182,17 @@ static inline void ib_set_flow(struct ib_uobject *uobj, 
struct ib_flow *ibflow,
uflow->resources = uflow_res;
 }
 
+struct uverbs_api_object {
+   const struct uverbs_obj_type *type_attrs;
+   const struct uverbs_obj_type_class *type_class;
+   u8 disabled:1;
+   u32 id;
+};
+
+static inline u32 uobj_get_object_id(struct ib_uobject *uobj)
+{
+   return uobj->uapi_object->id;
+}
+
 #endif
 
-- 
2.19.1



[PATCH rdma-next 2/7] IB/core: Introduce UVERBS_IDR_ANY_OBJECT

2018-11-25 Thread Leon Romanovsky
From: Yishai Hadas 

Introduce the UVERBS_IDR_ANY_OBJECT type to match any IDR object.

Once it is used, the infrastructure skips checking for the IDR type; it
becomes the driver handler's responsibility.

This enables drivers to get, in a given method, an object of various
types.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/core/rdma_core.c   | 27 +--
 drivers/infiniband/core/rdma_core.h   | 16 +++-
 drivers/infiniband/core/uverbs_uapi.c |  9 +++--
 include/rdma/uverbs_ioctl.h   |  6 ++
 4 files changed, 45 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/core/rdma_core.c 
b/drivers/infiniband/core/rdma_core.c
index efa292489271..d160ed23065e 100644
--- a/drivers/infiniband/core/rdma_core.c
+++ b/drivers/infiniband/core/rdma_core.c
@@ -398,16 +398,23 @@ struct ib_uobject *rdma_lookup_get_uobject(const struct 
uverbs_api_object *obj,
struct ib_uobject *uobj;
int ret;
 
-   if (!obj)
-   return ERR_PTR(-EINVAL);
+   if (IS_ERR(obj) && PTR_ERR(obj) == -ENOMSG) {
+   /* must be UVERBS_IDR_ANY_OBJECT, see uapi_get_object() */
+   uobj = lookup_get_idr_uobject(NULL, ufile, id, mode);
+   if (IS_ERR(uobj))
+   return uobj;
+   } else {
+   if (IS_ERR(obj))
+   return ERR_PTR(-EINVAL);
 
-   uobj = obj->type_class->lookup_get(obj, ufile, id, mode);
-   if (IS_ERR(uobj))
-   return uobj;
+   uobj = obj->type_class->lookup_get(obj, ufile, id, mode);
+   if (IS_ERR(uobj))
+   return uobj;
 
-   if (uobj->uapi_object != obj) {
-   ret = -EINVAL;
-   goto free;
+   if (uobj->uapi_object != obj) {
+   ret = -EINVAL;
+   goto free;
+   }
}
 
/*
@@ -427,7 +434,7 @@ struct ib_uobject *rdma_lookup_get_uobject(const struct 
uverbs_api_object *obj,
 
return uobj;
 free:
-   obj->type_class->lookup_put(uobj, mode);
+   uobj->uapi_object->type_class->lookup_put(uobj, mode);
uverbs_uobject_put(uobj);
return ERR_PTR(ret);
 }
@@ -491,7 +498,7 @@ struct ib_uobject *rdma_alloc_begin_uobject(const struct 
uverbs_api_object *obj,
 {
struct ib_uobject *ret;
 
-   if (!obj)
+   if (IS_ERR(obj))
return ERR_PTR(-EINVAL);
 
/*
diff --git a/drivers/infiniband/core/rdma_core.h 
b/drivers/infiniband/core/rdma_core.h
index bac484d6753a..8aec28037c48 100644
--- a/drivers/infiniband/core/rdma_core.h
+++ b/drivers/infiniband/core/rdma_core.h
@@ -162,10 +162,24 @@ struct uverbs_api {
const struct uverbs_api_write_method **write_ex_methods;
 };
 
+/*
+ * Get an uverbs_api_object that corresponds to the given object_id.
+ * Note:
+ * -ENOMSG means that any object is allowed to match during lookup.
+ */
 static inline const struct uverbs_api_object *
 uapi_get_object(struct uverbs_api *uapi, u16 object_id)
 {
-   return radix_tree_lookup(&uapi->radix, uapi_key_obj(object_id));
+   const struct uverbs_api_object *res;
+
+   if (object_id == UVERBS_IDR_ANY_OBJECT)
+   return ERR_PTR(-ENOMSG);
+
+   res = radix_tree_lookup(&uapi->radix, uapi_key_obj(object_id));
+   if (!res)
+   return ERR_PTR(-ENOENT);
+
+   return res;
 }
 
 char *uapi_key_format(char *S, unsigned int key);
diff --git a/drivers/infiniband/core/uverbs_uapi.c 
b/drivers/infiniband/core/uverbs_uapi.c
index 19ae4b19b2ef..faac225184a6 100644
--- a/drivers/infiniband/core/uverbs_uapi.c
+++ b/drivers/infiniband/core/uverbs_uapi.c
@@ -580,8 +580,13 @@ static void uapi_finalize_disable(struct uverbs_api *uapi)
if (obj_key == UVERBS_API_KEY_ERR)
continue;
tmp_obj = uapi_get_object(uapi, obj_key);
-   if (tmp_obj && !tmp_obj->disabled)
-   continue;
+   if (IS_ERR(tmp_obj)) {
+   if (PTR_ERR(tmp_obj) == -ENOMSG)
+   continue;
+   } else {
+   if (!tmp_obj->disabled)
+   continue;
+   }
 
starting_key = iter.index;
uapi_remove_method(
diff --git a/include/rdma/uverbs_ioctl.h b/include/rdma/uverbs_ioctl.h
index 7f4ace93e502..2f56844fb7da 100644
--- a/include/rdma/uverbs_ioctl.h
+++ b/include/rdma/uverbs_ioctl.h
@@ -524,6 +524,12 @@ struct uapi_definition {
  .u2.objs_arr.max_len = _max_len, \
  __VA_ARGS__ } })
 
+/*
+ * Only for use with UVERBS_ATTR_IDR, allows any uobject type to be accepted,
+ * the user must validate the type of the uobject instead.
+ */

[PATCH rdma-next 4/7] IB/mlx5: Enable modify and query verbs objects via DEVX

2018-11-25 Thread Leon Romanovsky
From: Yishai Hadas 

Enables modifying and querying verbs objects via the DEVX interface.
To support this, the DEVX modify and query handlers were changed to
accept any object type via the UVERBS_IDR_ANY_OBJECT mechanism.

The type checking and handling is done per object as part of the
driver code.

Signed-off-by: Yishai Hadas 
Signed-off-by: Leon Romanovsky 
---
 drivers/infiniband/hw/mlx5/devx.c | 108 ++
 1 file changed, 96 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/devx.c 
b/drivers/infiniband/hw/mlx5/devx.c
index 0aa2ee732eaa..f80b78aab4da 100644
--- a/drivers/infiniband/hw/mlx5/devx.c
+++ b/drivers/infiniband/hw/mlx5/devx.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include "mlx5_ib.h"
@@ -132,7 +133,7 @@ static u64 get_enc_obj_id(u16 opcode, u32 obj_id)
return ((u64)opcode << 32) | obj_id;
 }
 
-static int devx_is_valid_obj_id(struct devx_obj *obj, const void *in)
+static u64 devx_get_obj_id(const void *in)
 {
u16 opcode = MLX5_GET(general_obj_in_cmd_hdr, in, opcode);
u64 obj_id;
@@ -336,13 +337,96 @@ static int devx_is_valid_obj_id(struct devx_obj *obj, 
const void *in)
MLX5_GET(arm_xrq_in, in, xrqn));
break;
default:
+   obj_id = 0;
+   }
+
+   return obj_id;
+}
+
+static bool devx_is_valid_obj_id(struct ib_uobject *uobj, const void *in)
+{
+   u64 obj_id = devx_get_obj_id(in);
+
+   if (!obj_id)
return false;
+
+   switch (uobj_get_object_id(uobj)) {
+   case UVERBS_OBJECT_CQ:
+   return get_enc_obj_id(MLX5_CMD_OP_CREATE_CQ,
+ to_mcq(uobj->object)->mcq.cqn) ==
+ obj_id;
+
+   case UVERBS_OBJECT_SRQ:
+   {
+   struct mlx5_core_srq *srq = &(to_msrq(uobj->object)->msrq);
+   struct mlx5_ib_dev *dev = to_mdev(uobj->context->device);
+   u16 opcode;
+
+   switch (srq->common.res) {
+   case MLX5_RES_XSRQ:
+   opcode = MLX5_CMD_OP_CREATE_XRC_SRQ;
+   break;
+   case MLX5_RES_XRQ:
+   opcode = MLX5_CMD_OP_CREATE_XRQ;
+   break;
+   default:
+   if (!dev->mdev->issi)
+   opcode = MLX5_CMD_OP_CREATE_SRQ;
+   else
+   opcode = MLX5_CMD_OP_CREATE_RMP;
+   }
+
+   return get_enc_obj_id(opcode,
+ to_msrq(uobj->object)->msrq.srqn) ==
+ obj_id;
}
 
-   if (obj_id == obj->obj_id)
-   return true;
+   case UVERBS_OBJECT_QP:
+   {
+   struct mlx5_ib_qp *qp = to_mqp(uobj->object);
+   enum ib_qp_type qp_type = qp->ibqp.qp_type;
+
+   if (qp_type == IB_QPT_RAW_PACKET ||
+   (qp->flags & MLX5_IB_QP_UNDERLAY)) {
+   struct mlx5_ib_raw_packet_qp *raw_packet_qp =
+   &qp->raw_packet_qp;
+   struct mlx5_ib_rq *rq = &raw_packet_qp->rq;
+   struct mlx5_ib_sq *sq = &raw_packet_qp->sq;
+
+   return (get_enc_obj_id(MLX5_CMD_OP_CREATE_RQ,
+  rq->base.mqp.qpn) == obj_id ||
+   get_enc_obj_id(MLX5_CMD_OP_CREATE_SQ,
+  sq->base.mqp.qpn) == obj_id ||
+   get_enc_obj_id(MLX5_CMD_OP_CREATE_TIR,
+  rq->tirn) == obj_id ||
+   get_enc_obj_id(MLX5_CMD_OP_CREATE_TIS,
+  sq->tisn) == obj_id);
+   }
+
+   if (qp_type == MLX5_IB_QPT_DCT)
+   return get_enc_obj_id(MLX5_CMD_OP_CREATE_DCT,
+ qp->dct.mdct.mqp.qpn) == obj_id;
+
+   return get_enc_obj_id(MLX5_CMD_OP_CREATE_QP,
+ qp->ibqp.qp_num) == obj_id;
+   }
 
-   return false;
+   case UVERBS_OBJECT_WQ:
+   return get_enc_obj_id(MLX5_CMD_OP_CREATE_RQ,
+ to_mrwq(uobj->object)->core_qp.qpn) ==
+ obj_id;
+
+   case UVERBS_OBJECT_RWQ_IND_TBL:
+   return get_enc_obj_id(MLX5_CMD_OP_CREATE_RQT,
+ to_mrwq_ind_table(uobj->object)->rqtn) ==
+ obj_id;
+
+   case MLX5_IB_OBJECT_DEVX_OBJ:
+   return ((struct devx_obj *)uobj->object)->obj_id == obj_id;
+
+   default:
+   return false;
+   }
 }
 
 static void devx_set_umem_valid(const void *in)
@@ 

Re: [PATCH net-next] net: remove unsafe skb_insert()

2018-11-25 Thread Eric Dumazet



On 11/25/2018 07:52 PM, David Miller wrote:
> 
> I fixed up the build in your original patch and am about to push that
> out.

Thanks David, sorry for this, I should have compiled the damn thing :/



Re: pull-request: bpf 2018-11-25

2018-11-25 Thread David Miller
From: Daniel Borkmann 
Date: Mon, 26 Nov 2018 01:16:51 +0100

> The following pull-request contains BPF updates for your *net* tree.

Pulled, thanks.


Re: [PATCH net-next] net: remove unsafe skb_insert()

2018-11-25 Thread David Miller
From: Eric Dumazet 
Date: Sun, 25 Nov 2018 15:37:43 -0800

> On Sun, Nov 25, 2018 at 10:29 AM David Miller  wrote:
>>
>> From: Eric Dumazet 
>> Date: Sun, 25 Nov 2018 08:26:23 -0800
>>
>> > I do not see how one can effectively use skb_insert() without holding
>> > some kind of lock. Otherwise other cpus could have changed the list
>> > right before we have a chance of acquiring list->lock.
>> >
>> > Only existing user is in drivers/infiniband/hw/nes/nes_mgt.c and this
>> > one probably meant to use __skb_insert() since it appears nesqp->pau_list
>> > is protected by nesqp->pau_lock. This looks like nesqp->pau_lock
>> > could be removed, since nesqp->pau_list.lock could be used instead.
>> >
>> > Signed-off-by: Eric Dumazet 
>>
>> Good find.
>>
>> Indeed, any of the queue SKB manipulation functions that take two SKBs
>> as an argument are suspect in this manner.
>>
>> Applied, thanks Eric.
> 
> Oh well, this does not build.
> 
> Since you have not pushed your tree yet, maybe we can replace this
> with a version that actually compiles.
> 
> Please let me know if a relative patch or a v2 is needed, thanks.

I fixed up the build in your original patch and am about to push that
out.


Re: [PATCH net-next] net: phy: fix two issues with linkmode bitmaps

2018-11-25 Thread Andrew Lunn
> Because the function naming is the same, I'm afraid they can easily be used
> incorrectly (the bugs we just discussed are good examples). Maybe it
> could be an option to reflect the semantics in the name like this
> (better suited proposals welcome):
> 
> case 1: mii_xxx_to_linkmode_yyy
> case 2: mii_xxx_or_linkmode_yyy
> case 3: mii_xxx_mod_linkmode_yyy
 
Hi Heiner

I started a patchset using this idea, and reworks your fix. Lets work
on that, rather than merge this patch.

   Andrew


Re: consistency for statistics with XDP mode

2018-11-25 Thread Toshiaki Makita
On 2018/11/23 1:43, David Ahern wrote:
> On 11/21/18 5:53 PM, Toshiaki Makita wrote:
>>> We really need consistency in the counters and at a minimum, users
>>> should be able to track packet and byte counters for both Rx and Tx
>>> including XDP.
>>>
>>> It seems to me the Rx and Tx packet, byte and dropped counters returned
>>> for the standard device stats (/proc/net/dev, ip -s li show, ...) should
>>> include all packets managed by the driver regardless of whether they are
>>> forwarded / dropped in XDP or go up the Linux stack. This also aligns
>>
>> Agreed. When I introduced virtio_net XDP counters, I just forgot to
>> update tx packets/bytes counters on ndo_xdp_xmit. Probably I thought it
>> was handled by free_old_xmit_skbs.
> 
> Do you have some time to look at adding the Tx counters to virtio_net?

hoping I can make some time within a couple of days.

-- 
Toshiaki Makita



Re: [PATCH bpf-next 1/3] bpf: helper to pop data from messages

2018-11-25 Thread Daniel Borkmann
On 11/23/2018 02:38 AM, John Fastabend wrote:
> This adds a BPF SK_MSG program helper so that we can pop data from a
> msg. We use this to pop metadata from a previous push data call.
> 
> Signed-off-by: John Fastabend 
> ---
>  include/uapi/linux/bpf.h |  13 +++-
>  net/core/filter.c| 169 
> +++
>  net/ipv4/tcp_bpf.c   |  14 +++-
>  3 files changed, 192 insertions(+), 4 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index c1554aa..64681f8 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2268,6 +2268,16 @@ union bpf_attr {
>   *
>   *   Return
>   *   0 on success, or a negative error in case of failure.
> + *
> + * int bpf_msg_pop_data(struct sk_msg_buff *msg, u32 start, u32 pop, u64 
> flags)
> + *Description
> + *   Will remove 'pop' bytes from a msg starting at byte 'start'.
> + *   This can result in ENOMEM errors under certain situations where
> + *   an allocation and copy are required due to a full ring buffer.
> + *   However, the helper will try to avoid doing the allocation
> + *   if possible. Other errors can occur if the input parameters are
> + *   invalid, either due to the start byte not being a valid part of
> + *   the msg payload and/or the pop value being too large.
>   */
>  #define __BPF_FUNC_MAPPER(FN)\
>   FN(unspec), \
> @@ -2360,7 +2370,8 @@ union bpf_attr {
>   FN(map_push_elem),  \
>   FN(map_pop_elem),   \
>   FN(map_peek_elem),  \
> - FN(msg_push_data),
> + FN(msg_push_data),  \
> + FN(msg_pop_data),
>  
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> diff --git a/net/core/filter.c b/net/core/filter.c
> index f6ca38a..c6b35b5 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2428,6 +2428,173 @@ static const struct bpf_func_proto 
> bpf_msg_push_data_proto = {
>   .arg4_type  = ARG_ANYTHING,
>  };
>  
> +static void sk_msg_shift_left(struct sk_msg *msg, int i)
> +{
> + int prev;
> +
> + do {
> + prev = i;
> + sk_msg_iter_var_next(i);
> + msg->sg.data[prev] = msg->sg.data[i];
> + } while (i != msg->sg.end);
> +
> + sk_msg_iter_prev(msg, end);
> +}
> +
> +static void sk_msg_shift_right(struct sk_msg *msg, int i)
> +{
> + struct scatterlist tmp, sge;
> +
> + sk_msg_iter_next(msg, end);
> + sge = sk_msg_elem_cpy(msg, i);
> + sk_msg_iter_var_next(i);
> + tmp = sk_msg_elem_cpy(msg, i);
> +
> + while (i != msg->sg.end) {
> + msg->sg.data[i] = sge;
> + sk_msg_iter_var_next(i);
> + sge = tmp;
> + tmp = sk_msg_elem_cpy(msg, i);
> + }
> +}
> +
> +BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start,
> +u32, len, u64, flags)
> +{
> + u32 i = 0, l, space, offset = 0;
> + u64 last = start + len;
> + int pop;
> +
> + if (unlikely(flags))
> + return -EINVAL;
> +
> + /* First find the starting scatterlist element */
> + i = msg->sg.start;
> + do {
> + l = sk_msg_elem(msg, i)->length;
> +
> + if (start < offset + l)
> + break;
> + offset += l;
> + sk_msg_iter_var_next(i);
> + } while (i != msg->sg.end);
> +
> + /* Bounds checks: start and pop must be inside message */
> + if (start >= offset + l || last >= msg->sg.size)
> + return -EINVAL;
> +
> + space = MAX_MSG_FRAGS - sk_msg_elem_used(msg);
> +
> + pop = len;
> + /* --| offset
> +  * -| start  |--- len --|
> +  *
> +  *  |- a | pop ---|- b |
> +  *  |__| length
> +  *
> +  *
> +  * a:   region at front of scatter element to save
> +  * b:   region at back of scatter element to save when length > A + pop
> +  * pop: region to pop from element, same as input 'pop' here will be
> +  *  decremented below per iteration.
> +  *
> +  * Two top-level cases to handle when start != offset, first B is non
> +  * zero and second B is zero corresponding to when a pop includes more
> +  * than one element.
> +  *
> +  * Then if B is non-zero AND there is no space, allocate space and
> +  * compact A, B regions into page. If there is space, shift the ring to
> +  * the right, freeing the next element in the ring to place B, leaving
> +  * A untouched except to reduce length.
> +  */
> + if (start != offset) {
> + struct scatterlist *nsge, *sge = sk_msg_elem(msg, i);
> + int a = start;
> + int b = sge->length - pop - a;
> +
> + sk_msg_iter_var_next(i);
> +
> + if (pop < 
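
For context, a minimal SK_MSG program using the helper described above
could look like the sketch below. This is an illustration only, not part
of the patch; it assumes the selftests' bpf_helpers.h (extended by the
later patches in this series) declares bpf_msg_pop_data(), and the 4-byte
header it strips is purely hypothetical.

    #include <linux/bpf.h>
    #include "bpf_helpers.h"

    SEC("sk_msg")
    int pop_metadata(struct sk_msg_md *msg)
    {
            /* remove 4 bytes starting at byte 0; flags must be 0 */
            if (bpf_msg_pop_data(msg, 0, 4, 0))
                    return SK_DROP;
            return SK_PASS;
    }

    char _license[] SEC("license") = "GPL";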

Re: [PATCH bpf-next 0/3] bpf: add sk_msg helper sk_msg_pop_data

2018-11-25 Thread Daniel Borkmann
On 11/23/2018 02:38 AM, John Fastabend wrote:
> After being able to add metadata to messages with sk_msg_push_data we
> have also found it useful to be able to "pop" this metadata off before
> sending it to applications in some cases. This series adds a new helper
> sk_msg_pop_data() and the associated patches to add tests and tools/lib
> support.
> 
> Thanks!
> 
> John Fastabend (3):
>   bpf: helper to pop data from messages
>   bpf: add msg_pop_data helper to tools
>   bpf: test_sockmap, add options for msg_pop_data() helper usage
> 
>  include/uapi/linux/bpf.h|  13 +-
>  net/core/filter.c   | 169 
> 
>  net/ipv4/tcp_bpf.c  |  14 +-
>  tools/include/uapi/linux/bpf.h  |  13 +-
>  tools/testing/selftests/bpf/bpf_helpers.h   |   2 +
>  tools/testing/selftests/bpf/test_sockmap.c  | 127 +-
>  tools/testing/selftests/bpf/test_sockmap_kern.h |  70 --
>  7 files changed, 386 insertions(+), 22 deletions(-)
> 

Applied to bpf-next, thanks.


Re: [PATCH bpf-next] bpf: align map type names formatting.

2018-11-25 Thread Daniel Borkmann
On 11/24/2018 12:58 AM, David Calavera wrote:
> Make the formatting for map_type_name array consistent.
> 
> Signed-off-by: David Calavera 

Applied, thanks!


Re: [PATCH] tags: Fix DEFINE_PER_CPU expansion

2018-11-25 Thread Daniel Borkmann
On 11/24/2018 12:48 AM, Rustam Kovhaev wrote:
> Building tags produces warning:
>  ctags: Warning: kernel/bpf/local_storage.c:10: null expansion of name 
> pattern "\1"
> 
> Let's use the same fix as in commit <25528213fe9f75f4>, even though it
> violates the usual code style.
> 
> Signed-off-by: Rustam Kovhaev 

Applied to bpf-next, thanks!


pull-request: bpf 2018-11-25

2018-11-25 Thread Daniel Borkmann
Hi David,

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix an off-by-one bug when adjusting subprog start offsets after
   patching, from Edward.

2) Fix several bugs such as overflow in size allocation in queue /
   stack map creation, from Alexei.

3) Fix wrong IPv6 destination port byte order in bpf_sk_lookup_udp
   helper, from Andrey.

4) Fix several bugs in bpftool such as preventing an infinite loop
   in get_fdinfo, error handling and man page references, from Quentin.

5) Fix a warning in bpf_trace_printk() that wasn't catching an
   invalid format string, from Martynas.

6) Fix a bug in BPF cgroup local storage where non-atomic allocation
   was used in atomic context, from Roman.

7) Fix a NULL pointer dereference bug in bpftool from reallocarray()
   error handling, from Jakub and Wen.

8) Add a copy of pkt_cls.h and tc_bpf.h uapi headers to the tools
   include infrastructure so that bpftool compiles on older RHEL7-like
   user space which does not ship these headers, from Yonghong.

9) Fix BPF kselftests for user space where to get ping test working
   with ping6 and ping -6, from Li.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Thanks a lot!



The following changes since commit 85b18b0237ce9986a81a1b9534b5e2ee116f5504:

  net: smsc95xx: Fix MTU range (2018-11-08 19:54:49 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git 

for you to fetch changes up to 1efb6ee3edea57f57f9fb05dba8dcb3f7333f61f:

  bpf: fix check of allowed specifiers in bpf_trace_printk (2018-11-23 21:54:14 
+0100)


Alexei Starovoitov (1):
  bpf: fix integer overflow in queue_stack_map

Andrey Ignatov (1):
  bpf: Fix IPv6 dport byte order in bpf_sk_lookup_udp

Edward Cree (1):
  bpf: fix off-by-one error in adjust_subprog_starts

Jakub Kicinski (1):
  tools: bpftool: fix potential NULL pointer dereference in do_load

Li Zhijian (1):
  kselftests/bpf: use ping6 as the default ipv6 ping binary when it exists

Martynas Pumputis (1):
  bpf: fix check of allowed specifiers in bpf_trace_printk

Quentin Monnet (4):
  tools: bpftool: prevent infinite loop in get_fdinfo()
  tools: bpftool: fix plain output and doc for --bpffs option
  tools: bpftool: pass an argument to silence open_obj_pinned()
  tools: bpftool: update references to other man pages in documentation

Roman Gushchin (1):
  bpf: allocate local storage buffers using GFP_ATOMIC

Yonghong Song (1):
  tools/bpftool: copy a few net uapi headers to tools directory

 kernel/bpf/local_storage.c |   3 +-
 kernel/bpf/queue_stack_maps.c  |  16 +-
 kernel/bpf/verifier.c  |   2 +-
 kernel/trace/bpf_trace.c   |   8 +-
 net/core/filter.c  |   5 +-
 tools/bpf/bpftool/Documentation/bpftool-cgroup.rst |   8 +-
 tools/bpf/bpftool/Documentation/bpftool-map.rst|   8 +-
 tools/bpf/bpftool/Documentation/bpftool-net.rst|   8 +-
 tools/bpf/bpftool/Documentation/bpftool-perf.rst   |   8 +-
 tools/bpf/bpftool/Documentation/bpftool-prog.rst   |  11 +-
 tools/bpf/bpftool/Documentation/bpftool.rst|   9 +-
 tools/bpf/bpftool/common.c |  17 +-
 tools/bpf/bpftool/main.h   |   2 +-
 tools/bpf/bpftool/prog.c   |  13 +-
 tools/include/uapi/linux/pkt_cls.h | 612 +
 tools/include/uapi/linux/tc_act/tc_bpf.h   |  37 ++
 tools/testing/selftests/bpf/test_netcnt.c  |   5 +-
 tools/testing/selftests/bpf/test_verifier.c|  19 +
 18 files changed, 752 insertions(+), 39 deletions(-)
 create mode 100644 tools/include/uapi/linux/pkt_cls.h
 create mode 100644 tools/include/uapi/linux/tc_act/tc_bpf.h


[PATCH linux-next 05/10] ARM: dts: dra7: switch to use phy-gmii-sel

2018-11-25 Thread Grygorii Strashko
Switch to use phy-gmii-sel PHY instead of cpsw-phy-sel.

Cc: Kishon Vijay Abraham I 
Cc: Tony Lindgren 
Signed-off-by: Grygorii Strashko 
---
 arch/arm/boot/dts/dra7-l4.dtsi | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/arm/boot/dts/dra7-l4.dtsi b/arch/arm/boot/dts/dra7-l4.dtsi
index 7e5c0d4f..7070095 100644
--- a/arch/arm/boot/dts/dra7-l4.dtsi
+++ b/arch/arm/boot/dts/dra7-l4.dtsi
@@ -77,18 +77,18 @@
};
};
 
+   phy_gmii_sel: phy-gmii-sel {
+   compatible = 
"ti,dra7xx-phy-gmii-sel";
+   reg = <0x554 0x4>;
+   #phy-cells = <1>;
+   };
+
scm_conf_clocks: clocks {
#address-cells = <1>;
#size-cells = <0>;
};
};
 
-   phy_sel: cpsw-phy-sel@554 {
-   compatible = "ti,dra7xx-cpsw-phy-sel";
-   reg= <0x554 0x4>;
-   reg-names = "gmii-sel";
-   };
-
dra7_pmx_core: pinmux@1400 {
compatible = "ti,dra7-padconf",
 "pinctrl-single";
@@ -3060,7 +3060,6 @@
 ;
ranges = <0 0 0x4000>;
syscon = <&scm_conf>;
-   cpsw-phy-sel = <&phy_sel>;
status = "disabled";
 
davinci_mdio: mdio@1000 {
@@ -3075,11 +3074,13 @@
cpsw_emac0: slave@200 {
/* Filled in by U-Boot */
mac-address = [ 00 00 00 00 00 00 ];
+   phys = <&phy_gmii_sel 1>;
};
 
cpsw_emac1: slave@300 {
/* Filled in by U-Boot */
mac-address = [ 00 00 00 00 00 00 ];
+   phys = <&phy_gmii_sel 2>;
};
};
};
-- 
2.10.5



[PATCH linux-next 09/10] dt-bindings: net: ti: deprecate cpsw-phy-sel bindings

2018-11-25 Thread Grygorii Strashko
The cpsw-phy-sel driver was replaced with the new PHY driver phy-gmii-sel,
so deprecate the cpsw-phy-sel bindings.

Cc: Kishon Vijay Abraham I 
Cc: Tony Lindgren 
Signed-off-by: Grygorii Strashko 
---
 Documentation/devicetree/bindings/net/cpsw-phy-sel.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/cpsw-phy-sel.txt 
b/Documentation/devicetree/bindings/net/cpsw-phy-sel.txt
index 764c0c7..5d76f99 100644
--- a/Documentation/devicetree/bindings/net/cpsw-phy-sel.txt
+++ b/Documentation/devicetree/bindings/net/cpsw-phy-sel.txt
@@ -1,4 +1,4 @@
-TI CPSW Phy mode Selection Device Tree Bindings
+TI CPSW Phy mode Selection Device Tree Bindings (DEPRECATED)
 ---
 
 Required properties:
-- 
2.10.5



[PATCH linux-next 02/10] phy: ti: introduce phy-gmii-sel driver

2018-11-25 Thread Grygorii Strashko
TI am335x/am437x/dra7(am5)/dm814x CPSW3G Ethernet Subsystem supports two
10/100/1000 Ethernet ports with selectable G/MII, RMII, and RGMII
interfaces. The interface mode is selected by configuring the MII mode
selection register(s) (GMII_SEL) in the System Control Module (SCM).
The GMII_SEL register(s) and bit field placement in the SCM differ
between SoCs, while the fields' meaning is the same.

Historically, the interface mode selection of the CPSW external ports was
configured through a custom API and the cpsw-phy-sel.c driver. This leads
to unnecessary driver, DT binding and custom API support effort.

This patch introduces the CPSW Port's PHY Interface Mode selection driver
(phy-gmii-sel), which implements the standard Linux generic PHY interface
and is used as a replacement for TI's specific cpsw-phy-sel.c driver and
the corresponding custom API.
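
For reference, the consumer side (done for cpsw later in this series)
boils down to something like the sketch below; phy_set_mode_ext() and
the exact lookup arguments are assumptions based on the rest of the
series, not guaranteed by this patch alone:

	/* in the MAC driver's port init path */
	struct phy *ifphy;

	ifphy = devm_of_phy_get(dev, port_np, NULL);
	if (!IS_ERR(ifphy))
		phy_set_mode_ext(ifphy, PHY_MODE_ETHERNET,
				 PHY_INTERFACE_MODE_RGMII_ID);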

Cc: Kishon Vijay Abraham I 
Cc: Tony Lindgren 
Signed-off-by: Grygorii Strashko 
---
 drivers/phy/ti/Kconfig|  10 ++
 drivers/phy/ti/Makefile   |   1 +
 drivers/phy/ti/phy-gmii-sel.c | 349 ++
 3 files changed, 360 insertions(+)
 create mode 100644 drivers/phy/ti/phy-gmii-sel.c

diff --git a/drivers/phy/ti/Kconfig b/drivers/phy/ti/Kconfig
index 2050356..f137e01 100644
--- a/drivers/phy/ti/Kconfig
+++ b/drivers/phy/ti/Kconfig
@@ -76,3 +76,13 @@ config TWL4030_USB
  family chips (including the TWL5030 and TPS659x0 devices).
  This transceiver supports high and full speed devices plus,
  in host mode, low speed.
+
+config PHY_TI_GMII_SEL
+   tristate
+   default y if TI_CPSW=y
+   depends on TI_CPSW || COMPILE_TEST
+   select GENERIC_PHY
+   default m
+   help
+ This driver supports configuring the TI CPSW Port mode depending on
+ the Ethernet PHY connected to the CPSW Port.
diff --git a/drivers/phy/ti/Makefile b/drivers/phy/ti/Makefile
index 9f36175..bea8f25 100644
--- a/drivers/phy/ti/Makefile
+++ b/drivers/phy/ti/Makefile
@@ -6,3 +6,4 @@ obj-$(CONFIG_OMAP_USB2) += phy-omap-usb2.o
 obj-$(CONFIG_TI_PIPE3) += phy-ti-pipe3.o
 obj-$(CONFIG_PHY_TUSB1210) += phy-tusb1210.o
 obj-$(CONFIG_TWL4030_USB)  += phy-twl4030-usb.o
+obj-$(CONFIG_PHY_TI_GMII_SEL)  += phy-gmii-sel.o
diff --git a/drivers/phy/ti/phy-gmii-sel.c b/drivers/phy/ti/phy-gmii-sel.c
new file mode 100644
index 000..04ebf53
--- /dev/null
+++ b/drivers/phy/ti/phy-gmii-sel.c
@@ -0,0 +1,349 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Texas Instruments CPSW Port's PHY Interface Mode selection Driver
+ *
+ * Copyright (C) 2018 Texas Instruments Incorporated - http://www.ti.com/
+ *
+ * Based on cpsw-phy-sel.c driver created by Mugunthan V N 

+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* AM33xx SoC specific definitions for the CONTROL port */
+#define AM33XX_GMII_SEL_MODE_MII   0
+#define AM33XX_GMII_SEL_MODE_RMII  1
+#define AM33XX_GMII_SEL_MODE_RGMII 2
+
+enum {
+   PHY_GMII_SEL_PORT_MODE,
+   PHY_GMII_SEL_RGMII_ID_MODE,
+   PHY_GMII_SEL_RMII_IO_CLK_EN,
+   PHY_GMII_SEL_LAST,
+};
+
+struct phy_gmii_sel_phy_priv {
+   struct phy_gmii_sel_priv *priv;
+   u32 id;
+   struct phy  *if_phy;
+   int rmii_clock_external;
+   int phy_if_mode;
+   struct regmap_field *fields[PHY_GMII_SEL_LAST];
+};
+
+struct phy_gmii_sel_soc_data {
+   u32 num_ports;
+   u32 features;
+   const struct reg_field (*regfields)[PHY_GMII_SEL_LAST];
+};
+
+struct phy_gmii_sel_priv {
+   struct device *dev;
+   const struct phy_gmii_sel_soc_data *soc_data;
+   struct regmap *regmap;
+   struct phy_provider *phy_provider;
+   struct phy_gmii_sel_phy_priv *if_phys;
+};
+
+static int phy_gmii_sel_mode(struct phy *phy, enum phy_mode mode, int submode)
+{
+   struct phy_gmii_sel_phy_priv *if_phy = phy_get_drvdata(phy);
+   const struct phy_gmii_sel_soc_data *soc_data = if_phy->priv->soc_data;
+   struct device *dev = if_phy->priv->dev;
+   struct regmap_field *regfield;
+   int ret, rgmii_id = 0;
+   u32 gmii_sel_mode = 0;
+
+   if (mode != PHY_MODE_ETHERNET)
+   return -EINVAL;
+
+   switch (submode) {
+   case PHY_INTERFACE_MODE_RMII:
+   gmii_sel_mode = AM33XX_GMII_SEL_MODE_RMII;
+   break;
+
+   case PHY_INTERFACE_MODE_RGMII:
+   gmii_sel_mode = AM33XX_GMII_SEL_MODE_RGMII;
+   break;
+
+   case PHY_INTERFACE_MODE_RGMII_ID:
+   case PHY_INTERFACE_MODE_RGMII_RXID:
+   case PHY_INTERFACE_MODE_RGMII_TXID:
+   gmii_sel_mode = AM33XX_GMII_SEL_MODE_RGMII;
+   rgmii_id = 1;
+   break;
+
+   case PHY_INTERFACE_MODE_MII:
+   gmii_sel_mode = AM33XX_GMII_SEL_MODE_MII;
+   break;
+
+   default:
+   dev_warn(dev,
+

[PATCH v2 net-next] net: remove unsafe skb_insert()

2018-11-25 Thread Eric Dumazet
I do not see how one can effectively use skb_insert() without holding
some kind of lock. Otherwise other cpus could have changed the list
right before we have a chance of acquiring list->lock.

Only existing user is in drivers/infiniband/hw/nes/nes_mgt.c and this
one probably meant to use __skb_insert() since it appears nesqp->pau_list
is protected by nesqp->pau_lock. This looks like nesqp->pau_lock
could be removed, since nesqp->pau_list.lock could be used instead.

Signed-off-by: Eric Dumazet 
Cc: Faisal Latif 
Cc: Doug Ledford 
Cc: Jason Gunthorpe 
Cc: linux-rdma 
---
 drivers/infiniband/hw/nes/nes_mgt.c |  4 ++--
 include/linux/skbuff.h  |  2 --
 net/core/skbuff.c   | 22 --
 3 files changed, 2 insertions(+), 26 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes_mgt.c 
b/drivers/infiniband/hw/nes/nes_mgt.c
index 
fc0c191014e908eea32d752f3499295ef143aa0a..cc4dce5c3e5f6d99fc44fcde7334e70ac7a33002
 100644
--- a/drivers/infiniband/hw/nes/nes_mgt.c
+++ b/drivers/infiniband/hw/nes/nes_mgt.c
@@ -551,14 +551,14 @@ static void queue_fpdus(struct sk_buff *skb, struct 
nes_vnic *nesvnic, struct ne
 
/* Queue skb by sequence number */
	if (skb_queue_len(&nesqp->pau_list) == 0) {
-   skb_queue_head(&nesqp->pau_list, skb);
+   __skb_queue_head(&nesqp->pau_list, skb);
	} else {
		skb_queue_walk(&nesqp->pau_list, tmpskb) {
			cb = (struct nes_rskb_cb *)&tmpskb->cb[0];
			if (before(seqnum, cb->seqnum))
				break;
		}
-   skb_insert(tmpskb, skb, &nesqp->pau_list);
+   __skb_insert(skb, tmpskb->prev, tmpskb, &nesqp->pau_list);
}
if (nesqp->pau_state == PAU_READY)
process_it = true;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 
f17a7452ac7bf47ef4bcf89840bba165cee6f50a..73902acf2b71c8800d81b744a936a7420f33b459
 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1749,8 +1749,6 @@ static inline void skb_queue_head_init_class(struct 
sk_buff_head *list,
  * The "__skb_()" functions are the non-atomic ones that
  * can only be called with interrupts disabled.
  */
-void skb_insert(struct sk_buff *old, struct sk_buff *newsk,
-   struct sk_buff_head *list);
 static inline void __skb_insert(struct sk_buff *newsk,
struct sk_buff *prev, struct sk_buff *next,
struct sk_buff_head *list)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 
9a8a72cefe9b94d3821b9cc5ba5bba647ae51267..02cd7ae3d0fb26ef0a8b006390154fdefd0d292f
 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2990,28 +2990,6 @@ void skb_append(struct sk_buff *old, struct sk_buff 
*newsk, struct sk_buff_head
 }
 EXPORT_SYMBOL(skb_append);
 
-/**
- * skb_insert  -   insert a buffer
- * @old: buffer to insert before
- * @newsk: buffer to insert
- * @list: list to use
- *
- * Place a packet before a given packet in a list. The list locks are
- * taken and this function is atomic with respect to other list locked
- * calls.
- *
- * A buffer cannot be placed on two lists at the same time.
- */
-void skb_insert(struct sk_buff *old, struct sk_buff *newsk, struct 
sk_buff_head *list)
-{
-   unsigned long flags;
-
-   spin_lock_irqsave(&list->lock, flags);
-   __skb_insert(newsk, old->prev, old, list);
-   spin_unlock_irqrestore(&list->lock, flags);
-}
-EXPORT_SYMBOL(skb_insert);
-
 static inline void skb_split_inside_header(struct sk_buff *skb,
   struct sk_buff* skb1,
   const u32 len, const int pos)
-- 
2.20.0.rc0.387.gc7a69e6b6c-goog



Re: [PATCH net-next] net: remove unsafe skb_insert()

2018-11-25 Thread Eric Dumazet
On Sun, Nov 25, 2018 at 10:29 AM David Miller  wrote:
>
> From: Eric Dumazet 
> Date: Sun, 25 Nov 2018 08:26:23 -0800
>
> > I do not see how one can effectively use skb_insert() without holding
> > some kind of lock. Otherwise other cpus could have changed the list
> > right before we have a chance of acquiring list->lock.
> >
> > Only existing user is in drivers/infiniband/hw/nes/nes_mgt.c and this
> > one probably meant to use __skb_insert() since it appears nesqp->pau_list
> > is protected by nesqp->pau_lock. This looks like nesqp->pau_lock
> > could be removed, since nesqp->pau_list.lock could be used instead.
> >
> > Signed-off-by: Eric Dumazet 
>
> Good find.
>
> Indeed, any of the queue SKB manipulation functions that take two SKBs
> as an argument are suspect in this manner.
>
> Applied, thanks Eric.

Oh well, this does not build.

Since you have not pushed your tree yet, maybe we can replace this
with a version that actually compiles.

Please let me know if a relative patch or a v2 is needed, thanks.


Re: [PATCH net-next] net: remove unsafe skb_insert()

2018-11-25 Thread kbuild test robot
Hi Eric,

I love your patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Eric-Dumazet/net-remove-unsafe-skb_insert/20181126-061342
config: x86_64-randconfig-x009-201847 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/infiniband//hw/nes/nes_mgt.c: In function 'queue_fpdus':
>> drivers/infiniband//hw/nes/nes_mgt.c:561:29: error: passing argument 3 of 
>> '__skb_insert' from incompatible pointer type 
>> [-Werror=incompatible-pointer-types]
  __skb_insert(tmpskb, skb, &nesqp->pau_list);
^
   In file included from drivers/infiniband//hw/nes/nes_mgt.c:34:0:
   include/linux/skbuff.h:1752:20: note: expected 'struct sk_buff *' but 
argument is of type 'struct sk_buff_head *'
static inline void __skb_insert(struct sk_buff *newsk,
   ^~~~
>> drivers/infiniband//hw/nes/nes_mgt.c:561:3: error: too few arguments to 
>> function '__skb_insert'
  __skb_insert(tmpskb, skb, &nesqp->pau_list);
  ^~~~
   In file included from drivers/infiniband//hw/nes/nes_mgt.c:34:0:
   include/linux/skbuff.h:1752:20: note: declared here
static inline void __skb_insert(struct sk_buff *newsk,
   ^~~~
   cc1: some warnings being treated as errors

vim +/__skb_insert +561 drivers/infiniband//hw/nes/nes_mgt.c

   503  
   504  /**
   505   * queue_fpdus - Handle fpdu's that hw passed up to sw
   506   */
   507  static void queue_fpdus(struct sk_buff *skb, struct nes_vnic *nesvnic, 
struct nes_qp *nesqp)
   508  {
   509  struct sk_buff *tmpskb;
   510  struct nes_rskb_cb *cb;
   511  struct iphdr *iph;
   512  struct tcphdr *tcph;
   513  unsigned char *tcph_end;
   514  u32 rcv_nxt;
   515  u32 rcv_wnd;
   516  u32 seqnum;
   517  u32 len;
   518  bool process_it = false;
   519  unsigned long flags;
   520  
   521  /* Move data ptr to after tcp header */
   522  iph = (struct iphdr *)skb->data;
   523  tcph = (struct tcphdr *)(((char *)iph) + (4 * iph->ihl));
   524  seqnum = be32_to_cpu(tcph->seq);
   525  tcph_end = (((char *)tcph) + (4 * tcph->doff));
   526  
   527  len = be16_to_cpu(iph->tot_len);
   528  if (skb->len > len)
   529  skb_trim(skb, len);
   530  skb_pull(skb, tcph_end - skb->data);
   531  
   532  /* Initialize tracking values */
   533  cb = (struct nes_rskb_cb *)&skb->cb[0];
   534  cb->seqnum = seqnum;
   535  
   536  /* Make sure data is in the receive window */
   537  rcv_nxt = nesqp->pau_rcv_nxt;
   538  rcv_wnd = le32_to_cpu(nesqp->nesqp_context->rcv_wnd);
   539  if (!between(seqnum, rcv_nxt, (rcv_nxt + rcv_wnd))) {
   540  nes_mgt_free_skb(nesvnic->nesdev, skb, 
PCI_DMA_TODEVICE);
   541  nes_rem_ref_cm_node(nesqp->cm_node);
   542  return;
   543  }
   544  
   545  spin_lock_irqsave(&nesqp->pau_lock, flags);
   546  
   547  if (nesqp->pau_busy)
   548  nesqp->pau_pending = 1;
   549  else
   550  nesqp->pau_busy = 1;
   551  
   552  /* Queue skb by sequence number */
   553  if (skb_queue_len(&nesqp->pau_list) == 0) {
   554  __skb_queue_head(&nesqp->pau_list, skb);
   555  } else {
   556  skb_queue_walk(&nesqp->pau_list, tmpskb) {
   557  cb = (struct nes_rskb_cb *)&tmpskb->cb[0];
   558  if (before(seqnum, cb->seqnum))
   559  break;
   560  }
 > 561  __skb_insert(tmpskb, skb, &nesqp->pau_list);
   562  }
   563  if (nesqp->pau_state == PAU_READY)
   564  process_it = true;
   565  spin_unlock_irqrestore(&nesqp->pau_lock, flags);
   566  
   567  if (process_it)
   568  process_fpdus(nesvnic, nesqp);
   569  
   570  return;
   571  }
   572  

---
0-DAY kernel test infrastructure            Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: [PATCH net-next] net: phy: fix two issues with linkmode bitmaps

2018-11-25 Thread Andrew Lunn
> Eventually we'd have three types of mii_xxx_to_linkmode_yyy functions:
> 
> 1. Function first zeroes the destination linkmode bitmap
> 2. Function sets bits in the linkmode bitmap but doesn't clear bits
>if condition isn't met
> 3. Function clears / sets bits it's responsible for
> 
> example case 1: mmd_eee_adv_to_linkmode
> example case 2: mii_stat1000_to_linkmode_lpa_t
> example case 3: what you just proposed as fix for
> mii_adv_to_linkmode_adv_t
> 
> Because function naming is the same I'm afraid they easily can be used
> incorrectly (the bugs we just discussed are good examples). Maybe it
> could be an option to reflect the semantics in the name like this
> (better suited proposals welcome):
> 
> case 1: mii_xxx_to_linkmode_yyy
> case 2: mii_xxx_or_linkmode_yyy
> case 3: mii_xxx_mod_linkmode_yyy

Hi Heiner

That is a good idea. We should probably do this first, it will help
find the bugs.

 Andrew


Re: Can decnet be deprecated?

2018-11-25 Thread David Miller
From: Bjørn Mork 
Date: Sun, 25 Nov 2018 12:30:26 +0100

> David Miller  writes:
>> From: David Ahern 
>> Date: Sat, 24 Nov 2018 17:12:48 -0700
>>
>>> IPX was moved to staging at the end of last year. Can decnet follow
>>> suit? git log seems to indicate no active development in a very long time.
>>
>> Last time I tried to do that someone immediately said on the list
>> "Don't, we're using that!"
> 
> Not sure about that.  What I can see is a claim that it has no bugs:
> http://patchwork.ozlabs.org/patch/837484/
> 
> The V1 received only support for removal:
> http://patchwork.ozlabs.org/patch/837261/
> 
> But no one claimed they were using decnet.

Ok, if people want to try and deprecate it again we can try.


Re: [PATCH net-next] net: remove unsafe skb_insert()

2018-11-25 Thread David Miller
From: Eric Dumazet 
Date: Sun, 25 Nov 2018 08:26:23 -0800

> I do not see how one can effectively use skb_insert() without holding
> some kind of lock. Otherwise other cpus could have changed the list
> right before we have a chance of acquiring list->lock.
> 
> Only existing user is in drivers/infiniband/hw/nes/nes_mgt.c and this
> one probably meant to use __skb_insert() since it appears nesqp->pau_list
> is protected by nesqp->pau_lock. This looks like nesqp->pau_lock
> could be removed, since nesqp->pau_list.lock could be used instead.
> 
> Signed-off-by: Eric Dumazet 

Good find.

Indeed, any of the queue SKB manipulation functions that take two SKBs
as an argument are suspect in this manner.

Applied, thanks Eric.


Re: [PATCH net-next v2 0/2] r8169: make use of xmit_more and __netdev_sent_queue

2018-11-25 Thread David Miller
From: Heiner Kallweit 
Date: Sun, 25 Nov 2018 14:29:22 +0100

> This series adds helper __netdev_sent_queue to the core and makes use
> of it in the r8169 driver.
> 
> Heiner Kallweit (2):
>   net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue
>   r8169: make use of xmit_more and __netdev_sent_queue
> 
> v2:
> - fix minor style issue

Series applied.


Re: [PATCH iproute2-next] man: tc: update man page for fq packet scheduler

2018-11-25 Thread David Ahern
On 11/24/18 6:44 PM, Eric Dumazet wrote:
> Signed-off-by: Eric Dumazet 
> ---
>  man/man8/tc-fq.8 | 37 ++---
>  1 file changed, 26 insertions(+), 11 deletions(-)
> 

applied to iproute2-next. Thanks, Eric.



Re: [PATCH net-next] net: phy: fix two issues with linkmode bitmaps

2018-11-25 Thread Heiner Kallweit
On 25.11.2018 17:45, Andrew Lunn wrote:
> On Sun, Nov 25, 2018 at 03:23:42PM +0100, Heiner Kallweit wrote:
>> I wondered why ethtool suddenly reports that link partner doesn't
>> support aneg and GBit modes. It turned out that this is caused by two
>> bugs in conversion to linkmode bitmaps.
>>
>> 1. In genphy_read_status the value of phydev->lp_advertising is
>>overwritten, thus GBit modes aren't reported any longer.
>> 2. In mii_lpa_to_linkmode_lpa_t the aneg bit was overwritten by the
>>call to mii_adv_to_linkmode_adv_t.
> 
> Hi Heiner
> 
> Thanks for looking into this.
> 
> There are more bugs :-(
> 
> static inline void mii_lpa_to_linkmode_lpa_t(unsigned long *lp_advertising,
>  u32 lpa)
> {
> if (lpa & LPA_LPACK)
> linkmode_set_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
>  lp_advertising);
> 
> mii_adv_to_linkmode_adv_t(lp_advertising, lpa);
> }
> 
> But
> 
> static inline void mii_adv_to_linkmode_adv_t(unsigned long *advertising,
>  u32 adv)
> {
> linkmode_zero(advertising);
> 
> if (adv & ADVERTISE_10HALF)
> linkmode_set_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT,
>  advertising);
>  
> So the Autoneg_BIT gets cleared.
> 
> I think the better fix is to take the linkmode_zero() out from here.
> 
> Then:
> 
> if (adv & ADVERTISE_10HALF)
>linkmode_set_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT,
> advertising);
> + else
> +  linkmode_clear_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT,
> + advertising);
> 
> for all the bits mii_adv_to_linkmode_adv_t() looks at.
> 
> So mii_adv_to_linkmode_adv_t() only modifies bits it is responsible
> for, and leaves the others alone.
> 
> Andrew
> 

mii_adv_to_linkmode_adv_t() is used also in phy_mii_ioctl(), and I'm
not sure the proposed change is safe there.

Eventually we'd have three types of mii_xxx_to_linkmode_yyy functions:

1. Function first zeroes the destination linkmode bitmap
2. Function sets bits in the linkmode bitmap but doesn't clear bits
   if condition isn't met
3. Function clears / sets bits it's responsible for

example case 1: mmd_eee_adv_to_linkmode
example case 2: mii_stat1000_to_linkmode_lpa_t
example case 3: what you just proposed as fix for
mii_adv_to_linkmode_adv_t

Because function naming is the same I'm afraid they easily can be used
incorrectly (the bugs we just discussed are good examples). Maybe it
could be an option to reflect the semantics in the name like this
(better suited proposals welcome):

case 1: mii_xxx_to_linkmode_yyy
case 2: mii_xxx_or_linkmode_yyy
case 3: mii_xxx_mod_linkmode_yyy

Heiner
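
The difference between the three cases is easier to see on a toy bitmap. Below
is a small plain-C sketch (a single unsigned int instead of the real linkmode
bitmap; the _to_/_or_/_mod_ helper names only mirror the naming proposal above
and are not existing kernel functions):

#include <stdio.h>

#define MODE_10_HALF	(1u << 0)
#define MODE_AUTONEG	(1u << 1)
#define ADVERTISE_10HALF	0x0020u

/* case 1: destination is zeroed first, anything set earlier is lost */
static void adv_to_linkmode(unsigned int *dst, unsigned int adv)
{
	*dst = 0;
	if (adv & ADVERTISE_10HALF)
		*dst |= MODE_10_HALF;
}

/* case 2: only sets bits, never clears stale ones */
static void adv_or_linkmode(unsigned int *dst, unsigned int adv)
{
	if (adv & ADVERTISE_10HALF)
		*dst |= MODE_10_HALF;
}

/* case 3: sets or clears exactly the bits it is responsible for */
static void adv_mod_linkmode(unsigned int *dst, unsigned int adv)
{
	if (adv & ADVERTISE_10HALF)
		*dst |= MODE_10_HALF;
	else
		*dst &= ~MODE_10_HALF;
}

int main(void)
{
	unsigned int dst = MODE_AUTONEG;	/* bit set by an earlier caller */

	adv_to_linkmode(&dst, ADVERTISE_10HALF);
	printf("to : autoneg %s\n", (dst & MODE_AUTONEG) ? "kept" : "lost");

	dst = MODE_AUTONEG | MODE_10_HALF;
	adv_or_linkmode(&dst, 0);		/* partner stopped advertising */
	printf("or : stale 10/half %s\n", (dst & MODE_10_HALF) ? "kept" : "cleared");

	dst = MODE_AUTONEG | MODE_10_HALF;
	adv_mod_linkmode(&dst, 0);
	printf("mod: autoneg %s, 10/half %s\n",
	       (dst & MODE_AUTONEG) ? "kept" : "lost",
	       (dst & MODE_10_HALF) ? "kept" : "cleared");
	return 0;
}

Case 1 is how the aneg bit gets wiped in the bug discussed above, case 2 can
leave stale bits behind when the partner stops advertising a mode, and case 3
is the set-or-clear behaviour Andrew proposed.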


Re: [PATCH net-next] net: phy: fix two issues with linkmode bitmaps

2018-11-25 Thread Andrew Lunn
On Sun, Nov 25, 2018 at 03:23:42PM +0100, Heiner Kallweit wrote:
> I wondered why ethtool suddenly reports that link partner doesn't
> support aneg and GBit modes. It turned out that this is caused by two
> bugs in conversion to linkmode bitmaps.
> 
> 1. In genphy_read_status the value of phydev->lp_advertising is
>overwritten, thus GBit modes aren't reported any longer.
> 2. In mii_lpa_to_linkmode_lpa_t the aneg bit was overwritten by the
>call to mii_adv_to_linkmode_adv_t.

Hi Heiner

Thanks for looking into this.

There are more bugs :-(

static inline void mii_lpa_to_linkmode_lpa_t(unsigned long *lp_advertising,
 u32 lpa)
{
if (lpa & LPA_LPACK)
linkmode_set_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
 lp_advertising);

mii_adv_to_linkmode_adv_t(lp_advertising, lpa);
}

But

static inline void mii_adv_to_linkmode_adv_t(unsigned long *advertising,
 u32 adv)
{
linkmode_zero(advertising);

if (adv & ADVERTISE_10HALF)
linkmode_set_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT,
 advertising);
 
So the Autoneg_BIT gets cleared.

I think the better fix is to take the linkmode_zero() out from here.

Then:

if (adv & ADVERTISE_10HALF)
   linkmode_set_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT,
advertising);
+   else
+  linkmode_clear_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT,
+ advertising);

for all the bits mii_adv_to_linkmode_adv_t() looks at.

So mii_adv_to_linkmode_adv_t() only modifies bits it is responsible
for, and leaves the others alone.

Andrew


[PATCH net-next] net: remove unsafe skb_insert()

2018-11-25 Thread Eric Dumazet
I do not see how one can effectively use skb_insert() without holding
some kind of lock. Otherwise other cpus could have changed the list
right before we have a chance of acquiring list->lock.

Only existing user is in drivers/infiniband/hw/nes/nes_mgt.c and this
one probably meant to use __skb_insert() since it appears nesqp->pau_list
is protected by nesqp->pau_lock. This looks like nesqp->pau_lock
could be removed, since nesqp->pau_list.lock could be used instead.
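
The same point can be made with a small userspace sketch in plain C (pthreads
and the struct/function names below are stand-ins invented for illustration,
not the kernel API): the position found by the walk is only meaningful while
the lock held during the walk is still held, so the walk and the link step must
share one critical section, which is exactly what calling the unlocked
__skb_insert() under the already-held pau_lock (or the list's own lock) gives
the driver.

#include <pthread.h>
#include <stdio.h>

struct node { struct node *prev, *next; unsigned int seq; };

struct list_head {
	struct node head;		/* circular list, like sk_buff_head */
	pthread_mutex_t lock;		/* plays the role of list->lock */
};

static void list_init(struct list_head *l)
{
	l->head.prev = l->head.next = &l->head;
	pthread_mutex_init(&l->lock, NULL);
}

/* like __skb_insert(): link newsk between prev and next, caller holds lock */
static void __insert(struct node *newsk, struct node *prev, struct node *next)
{
	newsk->prev = prev;
	newsk->next = next;
	prev->next = newsk;
	next->prev = newsk;
}

/* safe pattern: walk and insert under one critical section */
static void insert_by_seq(struct list_head *l, struct node *newsk)
{
	struct node *pos;

	pthread_mutex_lock(&l->lock);
	for (pos = l->head.next; pos != &l->head; pos = pos->next)
		if (newsk->seq < pos->seq)
			break;
	__insert(newsk, pos->prev, pos);	/* note the (newsk, prev, next) order */
	pthread_mutex_unlock(&l->lock);
}

int main(void)
{
	struct list_head l;
	unsigned int seqs[] = { 30, 10, 20 };
	struct node *n, nodes[3];

	list_init(&l);
	for (int i = 0; i < 3; i++) {
		nodes[i].seq = seqs[i];
		insert_by_seq(&l, &nodes[i]);
	}
	for (n = l.head.next; n != &l.head; n = n->next)
		printf("%u\n", n->seq);		/* prints 10 20 30 */
	return 0;
}

Built with gcc -pthread, the program prints the sequence numbers in order; the
(newsk, prev, next) argument order of __insert() mirrors the __skb_insert()
signature quoted by the build robot earlier in this digest.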

Signed-off-by: Eric Dumazet 
Cc: Faisal Latif 
Cc: Doug Ledford 
Cc: Jason Gunthorpe 
Cc: linux-rdma 
---
 drivers/infiniband/hw/nes/nes_mgt.c |  4 ++--
 include/linux/skbuff.h  |  2 --
 net/core/skbuff.c   | 22 --
 3 files changed, 2 insertions(+), 26 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes_mgt.c 
b/drivers/infiniband/hw/nes/nes_mgt.c
index 
fc0c191014e908eea32d752f3499295ef143aa0a..abb54d30d35dd53fa983ee437506933eeba72746
 100644
--- a/drivers/infiniband/hw/nes/nes_mgt.c
+++ b/drivers/infiniband/hw/nes/nes_mgt.c
@@ -551,14 +551,14 @@ static void queue_fpdus(struct sk_buff *skb, struct 
nes_vnic *nesvnic, struct ne
 
/* Queue skb by sequence number */
 	if (skb_queue_len(&nesqp->pau_list) == 0) {
-		skb_queue_head(&nesqp->pau_list, skb);
+		__skb_queue_head(&nesqp->pau_list, skb);
 	} else {
 		skb_queue_walk(&nesqp->pau_list, tmpskb) {
 			cb = (struct nes_rskb_cb *)&tmpskb->cb[0];
 			if (before(seqnum, cb->seqnum))
 				break;
 		}
-		skb_insert(tmpskb, skb, &nesqp->pau_list);
+		__skb_insert(tmpskb, skb, &nesqp->pau_list);
}
if (nesqp->pau_state == PAU_READY)
process_it = true;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 
f17a7452ac7bf47ef4bcf89840bba165cee6f50a..73902acf2b71c8800d81b744a936a7420f33b459
 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1749,8 +1749,6 @@ static inline void skb_queue_head_init_class(struct 
sk_buff_head *list,
  * The "__skb_()" functions are the non-atomic ones that
  * can only be called with interrupts disabled.
  */
-void skb_insert(struct sk_buff *old, struct sk_buff *newsk,
-   struct sk_buff_head *list);
 static inline void __skb_insert(struct sk_buff *newsk,
struct sk_buff *prev, struct sk_buff *next,
struct sk_buff_head *list)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 
9a8a72cefe9b94d3821b9cc5ba5bba647ae51267..02cd7ae3d0fb26ef0a8b006390154fdefd0d292f
 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2990,28 +2990,6 @@ void skb_append(struct sk_buff *old, struct sk_buff 
*newsk, struct sk_buff_head
 }
 EXPORT_SYMBOL(skb_append);
 
-/**
- * skb_insert  -   insert a buffer
- * @old: buffer to insert before
- * @newsk: buffer to insert
- * @list: list to use
- *
- * Place a packet before a given packet in a list. The list locks are
- * taken and this function is atomic with respect to other list locked
- * calls.
- *
- * A buffer cannot be placed on two lists at the same time.
- */
-void skb_insert(struct sk_buff *old, struct sk_buff *newsk, struct 
sk_buff_head *list)
-{
-   unsigned long flags;
-
-	spin_lock_irqsave(&list->lock, flags);
-	__skb_insert(newsk, old->prev, old, list);
-	spin_unlock_irqrestore(&list->lock, flags);
-}
-EXPORT_SYMBOL(skb_insert);
-
 static inline void skb_split_inside_header(struct sk_buff *skb,
   struct sk_buff* skb1,
   const u32 len, const int pos)
-- 
2.20.0.rc0.387.gc7a69e6b6c-goog



[PATCH net-next] net: phy: fix two issues with linkmode bitmaps

2018-11-25 Thread Heiner Kallweit
I wondered why ethtool suddenly reports that link partner doesn't
support aneg and GBit modes. It turned out that this is caused by two
bugs in conversion to linkmode bitmaps.

1. In genphy_read_status the value of phydev->lp_advertising is
   overwritten, thus GBit modes aren't reported any longer.
2. In mii_lpa_to_linkmode_lpa_t the aneg bit was overwritten by the
   call to mii_adv_to_linkmode_adv_t.

Fixes: c0ec3c273677 ("net: phy: Convert u32 phydev->lp_advertising to linkmode")
Signed-off-by: Heiner Kallweit 
---
 drivers/net/phy/phy_device.c | 5 -
 include/linux/mii.h  | 4 ++--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 0904002b1..94f60c08b 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1696,6 +1696,7 @@ int genphy_read_status(struct phy_device *phydev)
int lpagb = 0;
int common_adv;
int common_adv_gb = 0;
+   __ETHTOOL_DECLARE_LINK_MODE_MASK(lpa_tmp);
 
/* Update the link, but return if there was an error */
err = genphy_update_link(phydev);
@@ -1734,7 +1735,9 @@ int genphy_read_status(struct phy_device *phydev)
if (lpa < 0)
return lpa;
 
-   mii_lpa_to_linkmode_lpa_t(phydev->lp_advertising, lpa);
+   mii_lpa_to_linkmode_lpa_t(lpa_tmp, lpa);
+   linkmode_or(phydev->lp_advertising, phydev->lp_advertising,
+   lpa_tmp);
 
adv = phy_read(phydev, MII_ADVERTISE);
if (adv < 0)
diff --git a/include/linux/mii.h b/include/linux/mii.h
index fb7ae4ae8..08450609d 100644
--- a/include/linux/mii.h
+++ b/include/linux/mii.h
@@ -413,11 +413,11 @@ static inline void mii_adv_to_linkmode_adv_t(unsigned 
long *advertising,
 static inline void mii_lpa_to_linkmode_lpa_t(unsigned long *lp_advertising,
 u32 lpa)
 {
+   mii_adv_to_linkmode_adv_t(lp_advertising, lpa);
+
if (lpa & LPA_LPACK)
linkmode_set_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
 lp_advertising);
-
-   mii_adv_to_linkmode_adv_t(lp_advertising, lpa);
 }
 
 /**
-- 
2.19.2



[PATCH net-next v2 2/2] r8169: make use of xmit_more and __netdev_sent_queue

2018-11-25 Thread Heiner Kallweit
Make use of xmit_more and add the functionality introduced with
3e59020abf0f ("net: bql: add __netdev_tx_sent_queue()").
I used the mlx4 driver as a template.

Signed-off-by: Heiner Kallweit 
---
v2:
- fix minor style issue
---
 drivers/net/ethernet/realtek/r8169.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index 5ee684f9e..4114c2712 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6069,6 +6069,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
struct device *d = tp_to_dev(tp);
dma_addr_t mapping;
u32 opts[2], len;
+   bool stop_queue;
int frags;
 
if (unlikely(!rtl_tx_slots_avail(tp, skb_shinfo(skb)->nr_frags))) {
@@ -6110,8 +6111,6 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 
txd->opts2 = cpu_to_le32(opts[1]);
 
-   netdev_sent_queue(dev, skb->len);
-
skb_tx_timestamp(skb);
 
/* Force memory writes to complete before releasing descriptor */
@@ -6124,16 +6123,16 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff 
*skb,
 
tp->cur_tx += frags + 1;
 
-   RTL_W8(tp, TxPoll, NPQ);
+   stop_queue = !rtl_tx_slots_avail(tp, MAX_SKB_FRAGS);
+   if (unlikely(stop_queue))
+   netif_stop_queue(dev);
 
-   mmiowb();
+   if (__netdev_sent_queue(dev, skb->len, skb->xmit_more)) {
+   RTL_W8(tp, TxPoll, NPQ);
+   mmiowb();
+   }
 
-   if (!rtl_tx_slots_avail(tp, MAX_SKB_FRAGS)) {
-   /* Avoid wrongly optimistic queue wake-up: rtl_tx thread must
-* not miss a ring update when it notices a stopped queue.
-*/
-   smp_wmb();
-   netif_stop_queue(dev);
+   if (unlikely(stop_queue)) {
/* Sync with rtl_tx:
 * - publish queue status and cur_tx ring index (write barrier)
 * - refresh dirty_tx ring index (read barrier).
-- 
2.19.2
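
For readers unfamiliar with the xmit_more/BQL interplay the patch relies on,
here is a rough userspace model in plain C (struct txq, tx_sent_queue() and
flush_limit are invented stand-ins, not the netdev API): the doorbell, i.e. the
TxPoll write in r8169, is only issued when the stack signals that no further
packet follows, or when enough bytes are pending that the queue should be
flushed anyway.

#include <stdbool.h>
#include <stdio.h>

struct txq {
	unsigned int pending_bytes;
	unsigned int flush_limit;	/* stand-in for the BQL limit */
};

/* roughly the decision __netdev_tx_sent_queue() makes: returns true when
 * the doorbell must be written now */
static bool tx_sent_queue(struct txq *q, unsigned int bytes, bool xmit_more)
{
	q->pending_bytes += bytes;
	return !xmit_more || q->pending_bytes >= q->flush_limit;
}

int main(void)
{
	struct txq q = { .pending_bytes = 0, .flush_limit = 4000 };
	unsigned int lens[] = { 1500, 1500, 60 };

	for (int i = 0; i < 3; i++) {
		/* the stack sets xmit_more while more skbs wait in the qdisc */
		bool more = (i < 2);

		if (tx_sent_queue(&q, lens[i], more)) {
			printf("pkt %d: ring doorbell, %u bytes batched\n",
			       i, q.pending_bytes);
			q.pending_bytes = 0;
		} else {
			printf("pkt %d: doorbell deferred\n", i);
		}
	}
	return 0;
}

The payoff is fewer expensive MMIO writes when several packets are submitted
back to back.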




[PATCH net-next v2 1/2] net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue

2018-11-25 Thread Heiner Kallweit
Similar to netdev_sent_queue add helper __netdev_sent_queue as variant
of __netdev_tx_sent_queue.

Signed-off-by: Heiner Kallweit 
---
v2:
- no changes
---
 include/linux/netdevice.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 1dcc0628b..a417fa501 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3214,6 +3214,14 @@ static inline void netdev_sent_queue(struct net_device 
*dev, unsigned int bytes)
netdev_tx_sent_queue(netdev_get_tx_queue(dev, 0), bytes);
 }
 
+static inline bool __netdev_sent_queue(struct net_device *dev,
+  unsigned int bytes,
+  bool xmit_more)
+{
+   return __netdev_tx_sent_queue(netdev_get_tx_queue(dev, 0), bytes,
+ xmit_more);
+}
+
 static inline void netdev_tx_completed_queue(struct netdev_queue *dev_queue,
 unsigned int pkts, unsigned int 
bytes)
 {
-- 
2.19.1





[PATCH net-next v2 0/2] r8169: make use of xmit_more and __netdev_sent_queue

2018-11-25 Thread Heiner Kallweit
This series adds helper __netdev_sent_queue to the core and makes use
of it in the r8169 driver.

Heiner Kallweit (2):
  net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue
  r8169: make use of xmit_more and __netdev_sent_queue

v2:
- fix minor style issue

 drivers/net/ethernet/realtek/r8169.c | 19 +--
 include/linux/netdevice.h|  8 
 2 files changed, 17 insertions(+), 10 deletions(-)

-- 
2.19.1



Re: Can decnet be deprecated?

2018-11-25 Thread Loganaden Velvindron
On Sun, Nov 25, 2018 at 4:14 AM David Ahern  wrote:
>
> IPX was moved to staging at the end of last year. Can decnet follow
> suit? git log seems to indicate no active development in a very long time.
>
> David

Kill it :)


Re: Can decnet be deprecated?

2018-11-25 Thread Bjørn Mork
David Miller  writes:
> From: David Ahern 
> Date: Sat, 24 Nov 2018 17:12:48 -0700
>
>> IPX was moved to staging at the end of last year. Can decnet follow
>> suit? git log seems to indicate no active development in a very long time.
>
> Last time I tried to do that someone immediately said on the list
> "Don't, we're using that!"

Not sure about that.  What I can see is a claim that it has no bugs:
http://patchwork.ozlabs.org/patch/837484/

The V1 received only support for removal:
http://patchwork.ozlabs.org/patch/837261/

But no one claimed they were using decnet.


Bjørn


[PATCH net-next 4/5] mlxsw: spectrum_router: Introduce emulated VLAN RIFs

2018-11-25 Thread Ido Schimmel
Router interfaces (RIFs) constructed on top of VLAN-aware bridges are of
"VLAN" type, whereas RIFs constructed on top of VLAN-unaware bridges are
of "FID" type.

In other words, the RIF type is derived from the underlying FID type.
VLAN RIFs are used on top of 802.1Q FIDs, whereas FID RIFs are used on
top of 802.1D FIDs.

Since the previous patch emulated 802.1Q FIDs using 802.1D FIDs, this
patch emulates VLAN RIFs using FID RIFs.

Signed-off-by: Ido Schimmel 
Reviewed-by: Petr Machata 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 9e9bb57134f2..5cdd4ceee7a9 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -7296,6 +7296,15 @@ static const struct mlxsw_sp_rif_ops 
mlxsw_sp_rif_fid_ops = {
.fdb_del= mlxsw_sp_rif_fid_fdb_del,
 };
 
+static const struct mlxsw_sp_rif_ops mlxsw_sp_rif_vlan_emu_ops = {
+   .type   = MLXSW_SP_RIF_TYPE_VLAN,
+   .rif_size   = sizeof(struct mlxsw_sp_rif),
+   .configure  = mlxsw_sp_rif_fid_configure,
+   .deconfigure= mlxsw_sp_rif_fid_deconfigure,
+   .fid_get= mlxsw_sp_rif_vlan_fid_get,
+   .fdb_del= mlxsw_sp_rif_vlan_fdb_del,
+};
+
 static struct mlxsw_sp_rif_ipip_lb *
 mlxsw_sp_rif_ipip_lb_rif(struct mlxsw_sp_rif *rif)
 {
-- 
2.19.1



[PATCH net-next 2/5] mlxsw: spectrum_fid: Make flood index calculation more robust

2018-11-25 Thread Ido Schimmel
802.1D FIDs use a per-FID flood table, where the flood index into the
table is calculated by subtracting 4K from the FID's index.

Currently, 802.1D FIDs start at 4K, so the calculation is correct, but
if this were ever to change, the calculation would no longer be correct.

In addition, this change will allow us to reuse the flood index
calculation function in the next patch, where we are going to emulate
802.1Q FIDs using 802.1D FIDs.

Signed-off-by: Ido Schimmel 
Reviewed-by: Petr Machata 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
index 5008bf63d73b..e1739cda25cb 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
@@ -607,7 +607,7 @@ mlxsw_sp_fid_8021d_compare(const struct mlxsw_sp_fid *fid, 
const void *arg)
 
 static u16 mlxsw_sp_fid_8021d_flood_index(const struct mlxsw_sp_fid *fid)
 {
-   return fid->fid_index - fid->fid_family->start_index;
+   return fid->fid_index - VLAN_N_VID;
 }
 
 static int mlxsw_sp_port_vp_mode_trans(struct mlxsw_sp_port *mlxsw_sp_port)
-- 
2.19.1
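
A worked example of the index change, as a stand-alone C snippet (VLAN_N_VID
matches the kernel's 4096; the family start values, including the assumption
that there are 1024 802.1D FIDs, are illustrative only):

#include <stdio.h>

#define VLAN_N_VID	4096	/* 802.1D FIDs start right after the 4K VIDs */

struct fid {
	unsigned int fid_index;
	unsigned int family_start;	/* fid_family->start_index */
};

/* old: flood index relative to the family's own start index */
static unsigned int flood_index_old(const struct fid *f)
{
	return f->fid_index - f->family_start;
}

/* new: always relative to VLAN_N_VID, so any family sharing the per-FID
 * flood table gets a non-overlapping slice of it */
static unsigned int flood_index_new(const struct fid *f)
{
	return f->fid_index - VLAN_N_VID;
}

int main(void)
{
	/* first 802.1D FID, and the first emulated 802.1Q FID assuming
	 * 1024 802.1D FIDs (an illustrative value, not from the driver) */
	struct fid d     = { .fid_index = 4096, .family_start = 4096 };
	struct fid q_emu = { .fid_index = 5120, .family_start = 5120 };

	printf("802.1D FID %u:          old %u, new %u\n",
	       d.fid_index, flood_index_old(&d), flood_index_new(&d));
	printf("emulated 802.1Q FID %u: old %u, new %u\n",
	       q_emu.fid_index, flood_index_old(&q_emu), flood_index_new(&q_emu));
	return 0;
}

With the start-relative calculation both families would map their first FID to
flood index 0; anchoring the calculation at VLAN_N_VID keeps the emulated
802.1Q FIDs (added in the next patch) in a distinct part of the same per-FID
flood table.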



[PATCH net-next 0/5] mlxsw: Prepare for VLAN-aware bridge w/VxLAN

2018-11-25 Thread Ido Schimmel
The driver is using 802.1Q filtering identifiers (FIDs) to represent the
different VLANs in the VLAN-aware bridge (only one is supported).

However, the device cannot assign a VNI to such FIDs, which prevents the
driver from supporting the enslavement of VxLAN devices to the
VLAN-aware bridge.

This patchset works around this limitation by emulating 802.1Q FIDs
using 802.1D FIDs, which can be assigned a VNI and so far have only been
used in conjunction with VLAN-unaware bridges.

The downside of this approach is that multiple {Port,VID}->FID entries
are required, whereas a single VID->FID entry is required with "true"
802.1Q FIDs.

First four patches introduce the new FID family of emulated 802.1Q FIDs
and the associated type of router interfaces (RIFs). Last patch flips
the driver to use this new FID family.

The diff is relatively small because the internal implementation of each
FID family is contained and hidden in spectrum_fid.c. Different internal
users (e.g., bridge, router) are aware of the different FID types, but
do not care about their internal implementation. This makes it trivial
to swap the current implementation of 802.1Q FIDs with the new one,
using 802.1D FIDs.

Ido Schimmel (5):
  mlxsw: spectrum_switchdev: Do not set field when it is reserved
  mlxsw: spectrum_fid: Make flood index calculation more robust
  mlxsw: spectrum_fid: Introduce emulated 802.1Q FIDs
  mlxsw: spectrum_router: Introduce emulated VLAN RIFs
  mlxsw: spectrum: Flip driver to use emulated 802.1Q FIDs

 .../net/ethernet/mellanox/mlxsw/spectrum.c| 14 +++---
 .../net/ethernet/mellanox/mlxsw/spectrum.h|  1 +
 .../ethernet/mellanox/mlxsw/spectrum_fid.c| 44 ++-
 .../ethernet/mellanox/mlxsw/spectrum_router.c | 11 -
 .../mellanox/mlxsw/spectrum_switchdev.c   |  3 +-
 5 files changed, 63 insertions(+), 10 deletions(-)

-- 
2.19.1
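
The entry-count downside mentioned in the cover letter can be quantified with a
trivial example (plain C; the numbers are made up for the illustration):

#include <stdio.h>

int main(void)
{
	unsigned int bridge_vids = 10;	/* VLANs configured on the bridge */
	unsigned int member_ports = 32;	/* ports enslaved to the bridge */

	printf("true 802.1Q FIDs    : %u VID->FID entries\n", bridge_vids);
	printf("emulated 802.1Q FIDs: %u {Port,VID}->FID entries\n",
	       bridge_vids * member_ports);
	return 0;
}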



[PATCH net-next 3/5] mlxsw: spectrum_fid: Introduce emulated 802.1Q FIDs

2018-11-25 Thread Ido Schimmel
The driver uses 802.1Q FIDs when offloading a VLAN-aware bridge.
Unfortunately, it is not possible to assign a VNI to such FIDs, which
prompts the driver to forbid the enslavement of VxLAN devices to a
VLAN-aware bridge.

Work around this hardware limitation by creating a new family of FIDs,
emulated 802.1Q FIDs. These FIDs are emulated using 802.1D FIDs, which
can be assigned a VNI.

The downside of this approach is that multiple {Port, VID}->FID entries
are required, whereas only a single VID->FID is required with "true"
802.1Q FIDs.

Signed-off-by: Ido Schimmel 
Reviewed-by: Petr Machata 
---
 .../ethernet/mellanox/mlxsw/spectrum_fid.c| 33 +++
 1 file changed, 33 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
index e1739cda25cb..99ccb11405a5 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
@@ -801,6 +801,39 @@ static const struct mlxsw_sp_fid_family 
mlxsw_sp_fid_8021d_family = {
.lag_vid_valid  = 1,
 };
 
+static const struct mlxsw_sp_fid_ops mlxsw_sp_fid_8021q_emu_ops = {
+   .setup  = mlxsw_sp_fid_8021q_setup,
+   .configure  = mlxsw_sp_fid_8021d_configure,
+   .deconfigure= mlxsw_sp_fid_8021d_deconfigure,
+   .index_alloc= mlxsw_sp_fid_8021d_index_alloc,
+   .compare= mlxsw_sp_fid_8021q_compare,
+   .flood_index= mlxsw_sp_fid_8021d_flood_index,
+   .port_vid_map   = mlxsw_sp_fid_8021d_port_vid_map,
+   .port_vid_unmap = mlxsw_sp_fid_8021d_port_vid_unmap,
+   .vni_set= mlxsw_sp_fid_8021d_vni_set,
+   .vni_clear  = mlxsw_sp_fid_8021d_vni_clear,
+   .nve_flood_index_set= mlxsw_sp_fid_8021d_nve_flood_index_set,
+   .nve_flood_index_clear  = mlxsw_sp_fid_8021d_nve_flood_index_clear,
+};
+
+/* There are 4K-2 emulated 802.1Q FIDs, starting right after the 802.1D FIDs */
+#define MLXSW_SP_FID_8021Q_EMU_START   (VLAN_N_VID + MLXSW_SP_FID_8021D_MAX)
+#define MLXSW_SP_FID_8021Q_EMU_END (MLXSW_SP_FID_8021Q_EMU_START + \
+VLAN_VID_MASK - 2)
+
+/* Range and flood configuration must match mlxsw_config_profile */
+static const struct mlxsw_sp_fid_family mlxsw_sp_fid_8021q_emu_family = {
+   .type   = MLXSW_SP_FID_TYPE_8021Q,
+   .fid_size   = sizeof(struct mlxsw_sp_fid_8021q),
+   .start_index= MLXSW_SP_FID_8021Q_EMU_START,
+   .end_index  = MLXSW_SP_FID_8021Q_EMU_END,
+   .flood_tables   = mlxsw_sp_fid_8021d_flood_tables,
+   .nr_flood_tables= ARRAY_SIZE(mlxsw_sp_fid_8021d_flood_tables),
+   .rif_type   = MLXSW_SP_RIF_TYPE_VLAN,
+   .ops    = &mlxsw_sp_fid_8021q_emu_ops,
+   .lag_vid_valid  = 1,
+};
+
 static int mlxsw_sp_fid_rfid_configure(struct mlxsw_sp_fid *fid)
 {
/* rFIDs are allocated by the device during init */
-- 
2.19.1
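
Plugging numbers into the range defined above (VLAN_N_VID and VLAN_VID_MASK as
in include/linux/if_vlan.h; MLXSW_SP_FID_8021D_MAX is assumed to be 1024 here,
purely for illustration):

#include <stdio.h>

#define VLAN_N_VID			4096
#define VLAN_VID_MASK			0xfff	/* 4095 */
#define MLXSW_SP_FID_8021D_MAX		1024	/* assumption for illustration */

#define MLXSW_SP_FID_8021Q_EMU_START	(VLAN_N_VID + MLXSW_SP_FID_8021D_MAX)
#define MLXSW_SP_FID_8021Q_EMU_END	(MLXSW_SP_FID_8021Q_EMU_START + \
					 VLAN_VID_MASK - 2)

int main(void)
{
	printf("emulated 802.1Q FIDs: %d..%d, %d entries\n",
	       MLXSW_SP_FID_8021Q_EMU_START, MLXSW_SP_FID_8021Q_EMU_END,
	       MLXSW_SP_FID_8021Q_EMU_END - MLXSW_SP_FID_8021Q_EMU_START + 1);
	return 0;
}

The count works out to VLAN_VID_MASK - 1 = 4094 entries, i.e. the 4K-2 emulated
FIDs the comment in the patch mentions.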



[PATCH net-next 1/5] mlxsw: spectrum_switchdev: Do not set field when it is reserved

2018-11-25 Thread Ido Schimmel
When configuring an FDB entry pointing to a LAG netdev (or its upper),
the driver should only set the 'lag_vid' field when the FID (filtering
identifier) is of 802.1D type.

Extend the 802.1D FID family with an attribute indicating whether this
field should be set and based on its value set the field or leave it
blank.

Signed-off-by: Ido Schimmel 
Reviewed-by: Petr Machata 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h   | 1 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c   | 7 +++
 drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c | 3 ++-
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 973a9d2901f7..244972bf8b0a 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -721,6 +721,7 @@ int mlxsw_sp_setup_tc_prio(struct mlxsw_sp_port 
*mlxsw_sp_port,
   struct tc_prio_qopt_offload *p);
 
 /* spectrum_fid.c */
+bool mlxsw_sp_fid_lag_vid_valid(const struct mlxsw_sp_fid *fid);
 struct mlxsw_sp_fid *mlxsw_sp_fid_lookup_by_index(struct mlxsw_sp *mlxsw_sp,
  u16 fid_index);
 int mlxsw_sp_fid_nve_ifindex(const struct mlxsw_sp_fid *fid, int *nve_ifindex);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
index 71b2d20afcc2..5008bf63d73b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
@@ -98,6 +98,7 @@ struct mlxsw_sp_fid_family {
enum mlxsw_sp_rif_type rif_type;
const struct mlxsw_sp_fid_ops *ops;
struct mlxsw_sp *mlxsw_sp;
+   u8 lag_vid_valid:1;
 };
 
 static const int mlxsw_sp_sfgc_uc_packet_types[MLXSW_REG_SFGC_TYPE_MAX] = {
@@ -122,6 +123,11 @@ static const int *mlxsw_sp_packet_type_sfgc_types[] = {
[MLXSW_SP_FLOOD_TYPE_MC]= mlxsw_sp_sfgc_mc_packet_types,
 };
 
+bool mlxsw_sp_fid_lag_vid_valid(const struct mlxsw_sp_fid *fid)
+{
+   return fid->fid_family->lag_vid_valid;
+}
+
 struct mlxsw_sp_fid *mlxsw_sp_fid_lookup_by_index(struct mlxsw_sp *mlxsw_sp,
  u16 fid_index)
 {
@@ -792,6 +798,7 @@ static const struct mlxsw_sp_fid_family 
mlxsw_sp_fid_8021d_family = {
.nr_flood_tables= ARRAY_SIZE(mlxsw_sp_fid_8021d_flood_tables),
.rif_type   = MLXSW_SP_RIF_TYPE_FID,
    .ops    = &mlxsw_sp_fid_8021d_ops,
+   .lag_vid_valid  = 1,
 };
 
 static int mlxsw_sp_fid_rfid_configure(struct mlxsw_sp_fid *fid)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 73e5db176d7e..3c2428404b2e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -2482,7 +2482,8 @@ static void mlxsw_sp_fdb_notify_mac_lag_process(struct 
mlxsw_sp *mlxsw_sp,
 
bridge_device = bridge_port->bridge_device;
vid = bridge_device->vlan_enabled ? mlxsw_sp_port_vlan->vid : 0;
-   lag_vid = mlxsw_sp_port_vlan->vid;
+   lag_vid = mlxsw_sp_fid_lag_vid_valid(mlxsw_sp_port_vlan->fid) ?
+ mlxsw_sp_port_vlan->vid : 0;
 
 do_fdb_op:
err = mlxsw_sp_port_fdb_uc_lag_op(mlxsw_sp, lag_id, mac, fid, lag_vid,
-- 
2.19.1



[PATCH net-next 5/5] mlxsw: spectrum: Flip driver to use emulated 802.1Q FIDs

2018-11-25 Thread Ido Schimmel
Replace 802.1Q FIDs and VLAN RIFs with their emulated counterparts.

The emulated 802.1Q FIDs are actually 802.1D FIDs and thus use the same
flood tables, of per-FID type. Therefore, add 4K-1 entries to the
per-FID flood tables for the new FIDs and get rid of the FID-offset
flood tables that were used by the old 802.1Q FIDs.

Signed-off-by: Ido Schimmel 
Reviewed-by: Petr Machata 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 14 --
 drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c |  2 +-
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  |  2 +-
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 637e2ef76abe..93378d507962 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -4111,16 +4111,20 @@ static void mlxsw_sp_fini(struct mlxsw_core *mlxsw_core)
mlxsw_sp_kvdl_fini(mlxsw_sp);
 }
 
+/* Per-FID flood tables are used for both "true" 802.1D FIDs and emulated
+ * 802.1Q FIDs
+ */
+#define MLXSW_SP_FID_FLOOD_TABLE_SIZE  (MLXSW_SP_FID_8021D_MAX + \
+VLAN_VID_MASK - 1)
+
 static const struct mlxsw_config_profile mlxsw_sp1_config_profile = {
.used_max_mid   = 1,
.max_mid= MLXSW_SP_MID_MAX,
.used_flood_tables  = 1,
.used_flood_mode= 1,
.flood_mode = 3,
-   .max_fid_offset_flood_tables= 3,
-   .fid_offset_flood_table_size= VLAN_N_VID - 1,
.max_fid_flood_tables   = 3,
-   .fid_flood_table_size   = MLXSW_SP_FID_8021D_MAX,
+   .fid_flood_table_size   = MLXSW_SP_FID_FLOOD_TABLE_SIZE,
.used_max_ib_mc = 1,
.max_ib_mc  = 0,
.used_max_pkey  = 1,
@@ -4143,10 +4147,8 @@ static const struct mlxsw_config_profile 
mlxsw_sp2_config_profile = {
.used_flood_tables  = 1,
.used_flood_mode= 1,
.flood_mode = 3,
-   .max_fid_offset_flood_tables= 3,
-   .fid_offset_flood_table_size= VLAN_N_VID - 1,
.max_fid_flood_tables   = 3,
-   .fid_flood_table_size   = MLXSW_SP_FID_8021D_MAX,
+   .fid_flood_table_size   = MLXSW_SP_FID_FLOOD_TABLE_SIZE,
.used_max_ib_mc = 1,
.max_ib_mc  = 0,
.used_max_pkey  = 1,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
index 99ccb11405a5..6830e79aed93 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
@@ -961,7 +961,7 @@ static const struct mlxsw_sp_fid_family 
mlxsw_sp_fid_dummy_family = {
 };
 
 static const struct mlxsw_sp_fid_family *mlxsw_sp_fid_family_arr[] = {
-   [MLXSW_SP_FID_TYPE_8021Q]   = &mlxsw_sp_fid_8021q_family,
+   [MLXSW_SP_FID_TYPE_8021Q]   = &mlxsw_sp_fid_8021q_emu_family,
    [MLXSW_SP_FID_TYPE_8021D]   = &mlxsw_sp_fid_8021d_family,
    [MLXSW_SP_FID_TYPE_RFID]    = &mlxsw_sp_fid_rfid_family,
    [MLXSW_SP_FID_TYPE_DUMMY]   = &mlxsw_sp_fid_dummy_family,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 5cdd4ceee7a9..1557c5fc6d10 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -7373,7 +7373,7 @@ static const struct mlxsw_sp_rif_ops 
mlxsw_sp_rif_ipip_lb_ops = {
 
 static const struct mlxsw_sp_rif_ops *mlxsw_sp_rif_ops_arr[] = {
    [MLXSW_SP_RIF_TYPE_SUBPORT] = &mlxsw_sp_rif_subport_ops,
-   [MLXSW_SP_RIF_TYPE_VLAN]    = &mlxsw_sp_rif_vlan_ops,
+   [MLXSW_SP_RIF_TYPE_VLAN]    = &mlxsw_sp_rif_vlan_emu_ops,
    [MLXSW_SP_RIF_TYPE_FID]     = &mlxsw_sp_rif_fid_ops,
    [MLXSW_SP_RIF_TYPE_IPIP_LB] = &mlxsw_sp_rif_ipip_lb_ops,
 };
-- 
2.19.1