RE: [PATCH net-next v1] net/tls: Add support for async decryption of tls records

2018-08-17 Thread Vakul Garg



> -----Original Message-----
> From: Dave Watson 
> Sent: Saturday, August 18, 2018 3:43 AM
> To: Vakul Garg 
> Cc: netdev@vger.kernel.org; bor...@mellanox.com;
> avia...@mellanox.com; da...@davemloft.net
> Subject: Re: [PATCH net-next v1] net/tls: Add support for async decryption of
> tls records
> 
> On 08/16/18 08:49 PM, Vakul Garg wrote:
> > Changes since RFC version:
> > 1) Improved commit message.
> > 2) Fixed dequeued record offset handling because of which few of
> >tls selftests 'recv_partial, recv_peek, recv_peek_multiple' were
> failing.
> 
> Thanks! Commit message much more clear, tests work great for me also,
> only minor comments on clarity
> 
> > -   if (tls_sw_advance_skb(sk, skb, chunk)) {
> > +   if (async) {
> > +   /* Finished with current record, pick up next
> */
> > +   ctx->recv_pkt = NULL;
> > +   __strp_unpause(&ctx->strp);
> > +   goto mark_eor_chk_ctrl;
> 
> Control flow is a little hard to follow here, maybe just pass an async flag to
> tls_sw_advance_skb?  It already does strp_unpause and recv_pkt = NULL.
> 

I improved it but in a slightly different way. Please see in v2.
As net-next is closed right now, I would send the patch to you privately &
later post it on list when David gives a green signal.
Is it ok?


> > +   } else if (tls_sw_advance_skb(sk, skb, chunk)) {
> > /* Return full control message to
> >  * userspace before trying to parse
> >  * another message type
> >  */
> > +mark_eor_chk_ctrl:
> > msg->msg_flags |= MSG_EOR;
> > if (control != TLS_RECORD_TYPE_DATA)
> > goto recv_end;
> > +   } else {
> > +   break;
> 
> I don't see the need for the else { break; }, isn't this already covered by
> while(len); below as before?
 
When tls_sw_advance_skb() returns false, it is certain that we cannot 
continue in the loop. So putting a break here avoids having to execute
'if' checks and while (len) checks down below.


[PATCH] fm10k_main.c: fix missing return value check of alloc_skb()

2018-08-17 Thread Jiecheng Wu
Function fm10k_init_module() defined in 
drivers/net/ethernet/intel/fm10k/fm10k_main.c
 calls alloc_workqueue() to allocate memory for struct workqueue_struct which 
is 
 dereferenced immediately. As alloc_workqueue() may return NULL on failure, 
 this code piece may cause NULL pointer dereference bug.
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 3f53654..78a43d6 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -41,6 +41,8 @@ static int __init fm10k_init_module(void)
/* create driver workqueue */
fm10k_workqueue = alloc_workqueue("%s", WQ_MEM_RECLAIM, 0,
  fm10k_driver_name);
+   if (!fm10k_workqueue)
+   return -ENOMEM;
 
fm10k_dbg_init();
 
-- 
2.6.4



Re: [PATCH] fm10k_main.c: fix missing return value check of alloc_skb()

2018-08-17 Thread Andrew Lunn
On Sat, Aug 18, 2018 at 10:00:58AM +0800, Jiecheng Wu wrote:
> Function fm10k_init_module() defined in 
> drivers/net/ethernet/intel/fm10k/fm10k_main.c calls alloc_workqueue() to 
> allocate memory for struct workqueue_struct which is dereferenced 
> immediately. As alloc_workqueue() may return NULL on failure, this code piece 
> may cause NULL pointer dereference bug.

Hi Jiecheng

Please wrap your commit message to around 80 characters wide.

You also need to add a Signed-off-by to your patch. Please see 

https://www.kernel.org/doc/html/v4.17/process/submitting-patches.html#developer-s-certificate-of-origin-1-1

scripts/checkpatch.pl will help you get these things right.

Andrew


[PATCH] fm10k_main.c: fix missing return value check of alloc_skb()

2018-08-17 Thread Jiecheng Wu
Function fm10k_init_module() defined in 
drivers/net/ethernet/intel/fm10k/fm10k_main.c calls alloc_workqueue() to 
allocate memory for struct workqueue_struct which is dereferenced immediately. 
As alloc_workqueue() may return NULL on failure, this code piece may cause NULL 
pointer dereference bug.
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 3f53654..78a43d6 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -41,6 +41,8 @@ static int __init fm10k_init_module(void)
/* create driver workqueue */
fm10k_workqueue = alloc_workqueue("%s", WQ_MEM_RECLAIM, 0,
  fm10k_driver_name);
+   if (!fm10k_workqueue)
+   return -ENOMEM;
 
fm10k_dbg_init();
 
-- 
2.6.4



Re: Bug in FIB insert

2018-08-17 Thread David Ahern
On 8/16/18 6:59 PM, Md. Islam wrote:
> There is a bug in fib_table_insert(). If I add following routes,
> 
> 23.20.0.0/14     veth1
> 23.20.0.0/15     veth2
> 
> FIB lookup on 23.22.111.212  results veth1, not veth2.
> 

veth1 is the correct lookup response. '22' toggles the 15th bit which
means the compare to 23.20/15 fails.
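
The bit arithmetic can be checked with a small user-space sketch. This is
not the kernel's fib_trie code; prefix_match() and ip4() are hypothetical
helpers written only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper, not a kernel API: is addr inside net/len? */
static int prefix_match(uint32_t addr, uint32_t net, unsigned int len)
{
	/* A zero-length prefix matches everything; avoid a 32-bit shift. */
	uint32_t mask = len ? ~(uint32_t)0 << (32 - len) : 0;

	return (addr & mask) == (net & mask);
}

/* Hypothetical helper: pack four octets into a host-order IPv4 address. */
static uint32_t ip4(unsigned int a, unsigned int b,
		    unsigned int c, unsigned int d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}
```

For 23.22.111.212, prefix_match() reports a match against 23.20.0.0/14 and
no match against 23.20.0.0/15, so the /14 route is the only candidate and
the lookup resolves to veth1.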


pull-request: bpf 2018-08-18

2018-08-17 Thread Daniel Borkmann
Hi David,

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix a BPF selftest failure in test_cgroup_storage due to rlimit
   restrictions, from Yonghong.

2) Fix a suspicious RCU rcu_dereference_check() warning triggered
   from removing a device's XDP memory allocator by using the correct
   rhashtable lookup function, from Tariq.

3) A batch of BPF sockmap and ULP fixes mainly fixing leaks and races
   as well as enforcing module aliases for ULPs. Another fix for BPF
   map redirect to make them work again with tail calls, from Daniel.

4) Fix XDP BPF samples to unload their programs upon SIGTERM, from Jesper.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Thanks a lot!



The following changes since commit 9a76aba02a37718242d7cdc294f0a3901928aa57:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next (2018-08-15 15:04:25 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git 

for you to fetch changes up to f6069b9aa9934ede26f41ac0781fce241279ad43:

  bpf: fix redirect to map under tail calls (2018-08-17 15:56:23 -0700)


Alexei Starovoitov (1):
  Merge branch 'sockmap-ulp-fixes'

Daniel Borkmann (6):
  tcp, ulp: add alias for all ulp modules
  tcp, ulp: fix leftover icsk_ulp_ops preventing sock from reattach
  bpf, sockmap: fix leakage of smap_psock_map_entry
  bpf, sockmap: fix map elem deletion race with smap_stop_sock
  bpf, sockmap: fix sock_map_ctx_update_elem race with exist/noexist
  bpf: fix redirect to map under tail calls

Jesper Dangaard Brouer (1):
  samples/bpf: all XDP samples should unload xdp/bpf prog on SIGTERM

Tariq Toukan (1):
  net/xdp: Fix suspicious RCU usage warning

Yonghong Song (2):
  bpf: fix a rcu usage warning in bpf_prog_array_copy_core()
  tools/bpf: fix bpf selftest test_cgroup_storage failure

 include/linux/filter.h                            |   3 +-
 include/net/tcp.h                                 |   4 +
 include/trace/events/xdp.h                        |   5 +-
 kernel/bpf/core.c                                 |   2 +-
 kernel/bpf/cpumap.c                               |   2 +
 kernel/bpf/devmap.c                               |   1 +
 kernel/bpf/sockmap.c                              | 120 --
 kernel/bpf/verifier.c                             |  21
 kernel/bpf/xskmap.c                               |   1 +
 net/core/filter.c                                 |  68 ++--
 net/core/xdp.c                                    |  14 +--
 net/ipv4/tcp_ulp.c                                |   4 +-
 net/tls/tls_main.c                                |   1 +
 samples/bpf/xdp_redirect_cpu_user.c               |   3 +-
 samples/bpf/xdp_rxq_info_user.c                   |   3 +-
 tools/testing/selftests/bpf/test_cgroup_storage.c |   1 +
 16 files changed, 123 insertions(+), 130 deletions(-)


Re: [Intel-wired-lan] [PATCH next-queue 0/8] ixgbe/ixgbevf: IPsec offload support for VFs

2018-08-17 Thread Alexander Duyck
On Fri, Aug 17, 2018 at 4:19 PM Shannon Nelson
 wrote:
>
>
> On 8/16/2018 2:36 PM, Shannon Nelson wrote:
> > On 8/16/2018 2:15 PM, Alexander Duyck wrote:
> >> On Tue, Aug 14, 2018 at 10:10 AM Shannon Nelson
> >>  wrote:
> >>>
> >>> On 8/14/2018 8:30 AM, Alexander Duyck wrote:
> >>>> On Mon, Aug 13, 2018 at 11:43 AM Shannon Nelson
> >>>>  wrote:
> >>>>>
> >>>>> This set of patches implements IPsec hardware offload for VF
> >>>>> devices in
> >>>>> Intel's 10Gbe x540 family of Ethernet devices.
> >>>
> >>> [...]
> >>>
> >>>>
> >>>> So the one question I would have about this patch set is what happens
> >>>> if you are setting up a ipsec connection between the PF and one of the
> >>>> VFs on the same port/function? Do the ipsec offloads get translated
> >>>> across the Tx loopback or do they end up causing issues? Specifically
> >>>> I would be interested in seeing the results of a test either between
> >>>> two VFs, or the PF and one of the VFs on the same port.
> >>>>
> >>>> - Alex
> >>>>
> >>>
> >>> There is definitely something funky in the internal switch connection,
> >>> as messages going from PF to VF with an offloaded encryption don't seem
> >>> to get received by the VF, at least when in a VEB setup.  If I only set
> >>> up offloads on the Rx on both PF and VF, and don't offload the Tx, then
> >>> things work.
> >>>
> >>> I don't have a setup to test this, but I suspect that in a VEPA
> >>> configuration, with packets going out to the switch and turned around
> >>> back in, the Tx encryption offload would happen as expected.
> >>>
> >>> sln
> >>
> >> We should probably look at adding at least one patch to the set then
> >> that disables IPsec Tx offload if SR-IOV is enabled with VEB so that
> >> we don't end up breaking connections should a VF be migrated from a
> >> remote system to a local one that it is connected to.
> >>
> >> - Alex
> >>
> >
> > The problem with this is that someone could set up an IPsec connection
> > on the PF for Tx and Rx use, then set num_vfs, start some VFs, and we
> > still can end up in the same place.  I don't think we want to disallow
> > all Tx IPsec offload.
> >
> > Maybe we can catch it in ixgbe_ipsec_offload_ok()?  If it can find that
> > the dest mac is on the internal switch, perhaps it can NAK the Tx
> > offload?  That would force the XFRM xmit code to do a regular SW encrypt
> > before sending the packet.  I'll look into this.
> >
> > sln
>
> This would be a great idea, but the xdo_state_offload_ok() callback
> happens in the network stack before routing has happened, so there is no
> mac address yet in the skb.  We may be stuck with NAKing *all* Tx
> offloads when num_vfs != 0.  It works, and it is better than no offload
> at all, but it sure harshes the vibe.  Blech.
>
> sln

You can probably just think of the Tx offload as being lumped in with
all the other offloads that don't work when SR-IOV is enabled such as
ATR and RSC.

- Alex
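
The fallback policy discussed above could be sketched as below. This is a
stand-alone model, not ixgbe code: struct adapter_model, its num_vfs field
and tx_offload_ok_model() are hypothetical names, and the driver's real
offload_ok callback takes different arguments.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's per-adapter state. */
struct adapter_model {
	unsigned int num_vfs;	/* number of enabled SR-IOV VFs */
};

/*
 * Model of the policy from this thread: refuse every Tx IPsec offload
 * while any VF is enabled, so the stack falls back to software
 * encryption. No destination MAC is consulted, because none is
 * available at the point where the callback runs.
 */
static bool tx_offload_ok_model(const struct adapter_model *adapter)
{
	return adapter->num_vfs == 0;
}
```

Rx offload is not affected by this check; only the Tx decision is gated on
the VF count.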


Re: [Intel-wired-lan] [PATCH next-queue 0/8] ixgbe/ixgbevf: IPsec offload support for VFs

2018-08-17 Thread Shannon Nelson



On 8/16/2018 2:36 PM, Shannon Nelson wrote:
> On 8/16/2018 2:15 PM, Alexander Duyck wrote:
>> On Tue, Aug 14, 2018 at 10:10 AM Shannon Nelson
>>  wrote:
>>>
>>> On 8/14/2018 8:30 AM, Alexander Duyck wrote:
>>>> On Mon, Aug 13, 2018 at 11:43 AM Shannon Nelson
>>>>  wrote:
>>>>>
>>>>> This set of patches implements IPsec hardware offload for VF
>>>>> devices in
>>>>> Intel's 10Gbe x540 family of Ethernet devices.
>>>
>>> [...]
>>>
>>>>
>>>> So the one question I would have about this patch set is what happens
>>>> if you are setting up a ipsec connection between the PF and one of the
>>>> VFs on the same port/function? Do the ipsec offloads get translated
>>>> across the Tx loopback or do they end up causing issues? Specifically
>>>> I would be interested in seeing the results of a test either between
>>>> two VFs, or the PF and one of the VFs on the same port.
>>>>
>>>> - Alex
>>>>
>>>
>>> There is definitely something funky in the internal switch connection,
>>> as messages going from PF to VF with an offloaded encryption don't seem
>>> to get received by the VF, at least when in a VEB setup.  If I only set
>>> up offloads on the Rx on both PF and VF, and don't offload the Tx, then
>>> things work.
>>>
>>> I don't have a setup to test this, but I suspect that in a VEPA
>>> configuration, with packets going out to the switch and turned around
>>> back in, the Tx encryption offload would happen as expected.
>>>
>>> sln
>>
>> We should probably look at adding at least one patch to the set then
>> that disables IPsec Tx offload if SR-IOV is enabled with VEB so that
>> we don't end up breaking connections should a VF be migrated from a
>> remote system to a local one that it is connected to.
>>
>> - Alex
>>
>
> The problem with this is that someone could set up an IPsec connection
> on the PF for Tx and Rx use, then set num_vfs, start some VFs, and we
> still can end up in the same place.  I don't think we want to disallow
> all Tx IPsec offload.
>
> Maybe we can catch it in ixgbe_ipsec_offload_ok()?  If it can find that
> the dest mac is on the internal switch, perhaps it can NAK the Tx
> offload?  That would force the XFRM xmit code to do a regular SW encrypt
> before sending the packet.  I'll look into this.
>
> sln


This would be a great idea, but the xdo_state_offload_ok() callback 
happens in the network stack before routing has happened, so there is no 
mac address yet in the skb.  We may be stuck with NAKing *all* Tx 
offloads when num_vfs != 0.  It works, and it is better than no offload 
at all, but it sure harshes the vibe.  Blech.


sln



[RFC v3 net-next 4/5] rds: invoke socket sg filter attached to rds socket

2018-08-17 Thread Tushar Dave
The RDS module sits on top of TCP (rds_tcp) and IB (rds_rdma), so messages
arrive in the form of an skb (over TCP) and a scatterlist (over IB/RDMA).
However, because the socket filter only deals with skbs (e.g. struct skb as
bpf context), we can only use the socket filter for rds_tcp and not for
rds_rdma.

Considering one filtering solution for RDS, it seems that the common
denominator between sk_buff and scatterlist is scatterlist. Therefore,
this patch converts the skb to an sgvec and invokes sg_filter_run for
rds_tcp, and simply invokes sg_filter_run for IB/rds_rdma.

Signed-off-by: Tushar Dave 
Reviewed-by: Sowmini Varadhan 
---
 net/rds/ib.c   |  1 +
 net/rds/ib.h   |  1 +
 net/rds/ib_recv.c  | 12 
 net/rds/rds.h  |  2 ++
 net/rds/recv.c | 17 +
 net/rds/tcp.c  |  2 ++
 net/rds/tcp.h  |  2 ++
 net/rds/tcp_recv.c | 38 ++
 8 files changed, 75 insertions(+)

diff --git a/net/rds/ib.c b/net/rds/ib.c
index 89c6333..6ba1f75 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -532,6 +532,7 @@ struct rds_transport rds_ib_transport = {
.conn_path_shutdown = rds_ib_conn_path_shutdown,
.inc_copy_to_user   = rds_ib_inc_copy_to_user,
.inc_free   = rds_ib_inc_free,
+   .inc_to_sg_get  = rds_ib_inc_to_sg_get,
.cm_initiate_connect= rds_ib_cm_initiate_connect,
.cm_handle_connect  = rds_ib_cm_handle_connect,
.cm_connect_complete= rds_ib_cm_connect_complete,
diff --git a/net/rds/ib.h b/net/rds/ib.h
index 73427ff..0a12b41 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -404,6 +404,7 @@ int rds_ib_update_ipaddr(struct rds_ib_device *rds_ibdev,
 void rds_ib_recv_free_caches(struct rds_ib_connection *ic);
 void rds_ib_recv_refill(struct rds_connection *conn, int prefill, gfp_t gfp);
 void rds_ib_inc_free(struct rds_incoming *inc);
+int rds_ib_inc_to_sg_get(struct rds_incoming *inc, struct scatterlist **sg);
 int rds_ib_inc_copy_to_user(struct rds_incoming *inc, struct iov_iter *to);
 void rds_ib_recv_cqe_handler(struct rds_ib_connection *ic, struct ib_wc *wc,
 struct rds_ib_ack_state *state);
diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c
index d300186..2f76a91 100644
--- a/net/rds/ib_recv.c
+++ b/net/rds/ib_recv.c
@@ -219,6 +219,18 @@ void rds_ib_inc_free(struct rds_incoming *inc)
rds_ib_recv_cache_put(&ibinc->ii_cache_entry, &ic->i_cache_incs);
 }
 
+int rds_ib_inc_to_sg_get(struct rds_incoming *inc, struct scatterlist **sg)
+{
+   struct rds_ib_incoming *ibinc;
+   struct rds_page_frag *frag;
+
+   ibinc = container_of(inc, struct rds_ib_incoming, ii_inc);
+   frag = list_entry(ibinc->ii_frags.next, struct rds_page_frag, f_item);
+   *sg = &frag->f_sg;
+
+   return 0;
+}
+
 static void rds_ib_recv_clear_one(struct rds_ib_connection *ic,
  struct rds_ib_recv_work *recv)
 {
diff --git a/net/rds/rds.h b/net/rds/rds.h
index c4dcf65..abcd5ce 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -542,6 +542,8 @@ struct rds_transport {
int (*recv_path)(struct rds_conn_path *cp);
int (*inc_copy_to_user)(struct rds_incoming *inc, struct iov_iter *to);
void (*inc_free)(struct rds_incoming *inc);
+   int (*inc_to_sg_get)(struct rds_incoming *inc, struct scatterlist **sg);
+   void (*inc_to_sg_put)(struct scatterlist **sg);
 
int (*cm_handle_connect)(struct rdma_cm_id *cm_id,
 struct rdma_cm_event *event, bool isv6);
diff --git a/net/rds/recv.c b/net/rds/recv.c
index 504cd6b..261904c 100644
--- a/net/rds/recv.c
+++ b/net/rds/recv.c
@@ -292,6 +292,8 @@ void rds_recv_incoming(struct rds_connection *conn, struct 
in6_addr *saddr,
struct sock *sk;
unsigned long flags;
struct rds_conn_path *cp;
+   struct sk_filter *filter;
+   int result = __SOCKSG_PASS;
 
inc->i_conn = conn;
inc->i_rx_jiffies = jiffies;
@@ -376,6 +378,21 @@ void rds_recv_incoming(struct rds_connection *conn, struct 
in6_addr *saddr,
/* We can be racing with rds_release() which marks the socket dead. */
sk = rds_rs_to_sk(rs);
 
+   rcu_read_lock();
+   filter = rcu_dereference(sk->sk_filter);
+   if (filter) {
+   if (conn->c_trans->inc_to_sg_get) {
+   struct scatterlist *sg;
+
+   if (conn->c_trans->inc_to_sg_get(inc, &sg) == 0) {
+   result = sg_filter_run(sk, sg);
+   if (conn->c_trans->inc_to_sg_put)
+   conn->c_trans->inc_to_sg_put(&sg);
+   }
+   }
+   }
+   rcu_read_unlock();
+
/* serialize with rds_release -> sock_orphan */
write_lock_irqsave(&rs->rs_recv_lock, flags);
if (!sock_flag(sk, SOCK_DEAD)) {
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 2c7b7c3..35454c7 100644
--- a/net/rds/tcp.c
+++ 

[RFC v3 net-next 5/5] ebpf: Add sample ebpf program for SOCKET_SG_FILTER

2018-08-17 Thread Tushar Dave
Add a sample program that shows how socksg program is used and attached
to socket filter. The kernel sample program deals with struct
scatterlist that is passed as bpf context.

When run in server mode, the sample RDS program opens PF_RDS socket,
attaches eBPF program to RDS socket which then uses bpf_msg_pull_data
helper to inspect packet data contained in struct scatterlist and
returns appropriate action code back to kernel.

To ease testing, RDS client functionality is also added so that users
can generate RDS packets.

Server:
[root@lab71 bpf]# ./rds_filter -s 192.168.3.71 -t tcp
running server in a loop
transport tcp
server bound to address: 192.168.3.71 port 4000
server listening on 192.168.3.71

Client:
[root@lab70 bpf]# ./rds_filter -s 192.168.3.71 -c 192.168.3.70 -t tcp
transport tcp
client bound to address: 192.168.3.70 port 25278
client sending 8192 byte message  from 192.168.3.70 to 192.168.3.71 on
port 25278
payload contains:30 31 32 33 34 35 36 37 38 39 ...

Server output:
192.168.3.71 received a packet from 192.168.3.71 of len 8192 cmsg len 0,
on port 25278
payload contains:30 31 32 33 34 35 36 37 38 39 ...
server listening on 192.168.3.71

[root@lab71 tushar]# cat /sys/kernel/debug/tracing/trace_pipe
  <idle>-0 [038] ..s.   146.947362: 0: 30 31 32
  <idle>-0 [038] ..s.   146.947364: 0: 33 34 35

Similarly specifying '-t ib' will run this on IB link.

Signed-off-by: Tushar Dave 
Acked-by: Sowmini Varadhan 
---
 samples/bpf/Makefile  |   3 +
 samples/bpf/rds_filter_kern.c |  42 ++
 samples/bpf/rds_filter_user.c | 339 ++
 3 files changed, 384 insertions(+)
 create mode 100644 samples/bpf/rds_filter_kern.c
 create mode 100644 samples/bpf/rds_filter_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 36f9f41..dbb30d0 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -53,6 +53,7 @@ hostprogs-y += xdpsock
 hostprogs-y += xdp_fwd
 hostprogs-y += task_fd_query
 hostprogs-y += xdp_sample_pkts
+hostprogs-y += rds_filter
 
 # Libbpf dependencies
 LIBBPF = $(TOOLS_PATH)/lib/bpf/libbpf.a
@@ -109,6 +110,7 @@ xdpsock-objs := xdpsock_user.o
 xdp_fwd-objs := xdp_fwd_user.o
 task_fd_query-objs := bpf_load.o task_fd_query_user.o $(TRACE_HELPERS)
 xdp_sample_pkts-objs := xdp_sample_pkts_user.o $(TRACE_HELPERS)
+rds_filter-objs := bpf_load.o rds_filter_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -166,6 +168,7 @@ always += xdpsock_kern.o
 always += xdp_fwd_kern.o
 always += task_fd_query_kern.o
 always += xdp_sample_pkts_kern.o
+always += rds_filter_kern.o
 
 KBUILD_HOSTCFLAGS += -I$(objtree)/usr/include
 KBUILD_HOSTCFLAGS += -I$(srctree)/tools/lib/
diff --git a/samples/bpf/rds_filter_kern.c b/samples/bpf/rds_filter_kern.c
new file mode 100644
index 0000000..633e687
--- /dev/null
+++ b/samples/bpf/rds_filter_kern.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+#define bpf_printk(fmt, ...)   \
+({ \
+   char ____fmt[] = fmt;   \
+   bpf_trace_printk(____fmt, sizeof(____fmt),  \
+   ##__VA_ARGS__); \
+})
+
+SEC("socksg")
+int main_prog(struct sk_msg_md *msg)
+{
+   int start, end, err;
+   unsigned char *d;
+
+   start = 0;
+   end = 6;
+
+   err = bpf_msg_pull_data(msg, start, end, 0);
+   if (err) {
+   bpf_printk("socksg: pull_data err %i\n", err);
+   return SOCKSG_PASS;
+   }
+
+   if (msg->data + 6 > msg->data_end)
+   return SOCKSG_PASS;
+
+   d = (unsigned char *)msg->data;
+   bpf_printk("%x %x %x\n", d[0], d[1], d[2]);
+   bpf_printk("%x %x %x\n", d[3], d[4], d[5]);
+
+   return SOCKSG_PASS;
+}
+
+char _license[] SEC("license") = "GPL";
+u32 _version SEC("version") = LINUX_VERSION_CODE;
diff --git a/samples/bpf/rds_filter_user.c b/samples/bpf/rds_filter_user.c
new file mode 100644
index 0000000..1186345
--- /dev/null
+++ b/samples/bpf/rds_filter_user.c
@@ -0,0 +1,339 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include "bpf_load.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define TESTPORT   4000
+#define BUFSIZE    8192
+
+int transport = -1;
+
+static int str2trans(const char *trans)
+{
+   if (strcmp(trans, "tcp") == 0)
+   return RDS_TRANS_TCP;
+   if (strcmp(trans, "ib") == 0)
+   return RDS_TRANS_IB;
+   return (RDS_TRANS_NONE);
+}
+
+static const char *trans2str(int trans)
+{
+   switch (trans) {
+   case RDS_TRANS_TCP:
+   return ("tcp");
+   case RDS_TRANS_IB:
+   return ("ib");
+   case RDS_TRANS_NONE:
+  

[RFC v3 net-next 2/5] ebpf: Add sg_filter_run()

2018-08-17 Thread Tushar Dave
When sg_filter_run() is invoked it runs the attached eBPF
prog of type BPF_PROG_TYPE_SOCKET_SG_FILTER which deals with
struct scatterlist.
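
A minimal user-space model of this dispatch is sketched below. It is not
the kernel function: filter_fn, run_filter_model(), drop_all() and the
VERDICT_* names are hypothetical. The default of "pass" when no filter is
attached is an assumption of the sketch, chosen to mirror the
__SOCKSG_PASS initialiser used in rds_recv_incoming() in patch 4;
sg_filter_run() as posted does not assign result on that path.

```c
#include <stddef.h>

/* Hypothetical verdicts, laid out like the socksg_action enum. */
enum verdict_model { VERDICT_DROP = 0, VERDICT_PASS, VERDICT_REDIRECT };

/* Hypothetical filter signature: takes an opaque context, returns a verdict. */
typedef int (*filter_fn)(const void *ctx);

/* Example filter for the model: rejects everything. */
static int drop_all(const void *ctx)
{
	(void)ctx;
	return VERDICT_DROP;
}

static int run_filter_model(filter_fn filter, const void *ctx)
{
	/* Assumed default: pass the message when nothing is attached. */
	int result = VERDICT_PASS;

	if (filter)
		result = filter(ctx);

	return result;
}
```

Calling run_filter_model(NULL, ctx) yields VERDICT_PASS, while
run_filter_model(drop_all, ctx) yields VERDICT_DROP.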

Signed-off-by: Tushar Dave 
Acked-by: Sowmini Varadhan 
---
 include/linux/filter.h |  8 
 include/uapi/linux/bpf.h   |  6 ++
 net/core/filter.c  | 24 
 tools/include/uapi/linux/bpf.h |  6 ++
 4 files changed, 44 insertions(+)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 5d565c5..9f1f7c1 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1112,4 +1112,12 @@ struct bpf_sock_ops_kern {
 */
 };
 
+enum __socksg_action {
+   __SOCKSG_DROP = 0,
+   __SOCKSG_PASS,
+   __SOCKSG_REDIRECT,
+};
+
+int sg_filter_run(struct sock *sk, struct scatterlist *sg);
+
 #endif /* __LINUX_FILTER_H__ */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6ec1e32..d1d0ceb 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2428,6 +2428,12 @@ enum sk_action {
SK_PASS,
 };
 
+enum socksg_action {
+   SOCKSG_DROP = 0,
+   SOCKSG_PASS,
+   SOCKSG_REDIRECT,
+};
+
 /* user accessible metadata for SK_MSG packet hook, new fields must
  * be added to the end of this structure
  */
diff --git a/net/core/filter.c b/net/core/filter.c
index cec3807..e427c8e 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -121,6 +121,30 @@ int sk_filter_trim_cap(struct sock *sk, struct sk_buff 
*skb, unsigned int cap)
 }
 EXPORT_SYMBOL(sk_filter_trim_cap);
 
+int sg_filter_run(struct sock *sk, struct scatterlist *sg)
+{
+   struct sk_filter *filter;
+   int result;
+
+   rcu_read_lock();
+   filter = rcu_dereference(sk->sk_filter);
+   if (filter) {
+   struct sk_msg_buff mb = {0};
+
+   memcpy(mb.sg_data, sg, sizeof(*sg) * MAX_SKB_FRAGS);
+   mb.sg_start = 0;
+   mb.sg_end = sg_nents(sg) - 1;
+   mb.data = sg_virt(sg);
+   mb.data_end = mb.data + sg->length;
+   mb.sg_copy[mb.sg_end] = true;
+   result = BPF_PROG_RUN(filter->prog, &mb);
+   }
+   rcu_read_unlock();
+
+   return result;
+}
+EXPORT_SYMBOL(sg_filter_run);
+
 BPF_CALL_1(bpf_skb_get_pay_offset, struct sk_buff *, skb)
 {
return skb_get_poff(skb);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 6ec1e32..d1d0ceb 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2428,6 +2428,12 @@ enum sk_action {
SK_PASS,
 };
 
+enum socksg_action {
+   SOCKSG_DROP = 0,
+   SOCKSG_PASS,
+   SOCKSG_REDIRECT,
+};
+
 /* user accessible metadata for SK_MSG packet hook, new fields must
  * be added to the end of this structure
  */
-- 
1.8.3.1



[RFC v3 net-next 3/5] ebpf: fix bpf_msg_pull_data

2018-08-17 Thread Tushar Dave
Like sockmap (sk_msg), socksg also deals with struct scatterlist,
therefore socksg programs can use the existing bpf helper bpf_msg_pull_data
to access packet data contained in struct scatterlist. While doing some
preliminary testing, a couple of issues were found with
bpf_msg_pull_data; they are fixed in this patch.

Also, there cannot be more than MAX_SKB_FRAGS entries in sg_data,
therefore any checks for sg entries beyond MAX_SKB_FRAGS in
bpf_msg_pull_data() are removed.

Besides that, I also ran into issues while put_page() is invoked.
e.g.
[ 450.568723] BUG: Bad page state in process swapper/10 pfn:2021540
[ 450.575632] page:ea0080855000 count:0 mapcount:0
mapping:88103d006840 index:0x88202154 compound_mapcount: 0
[ 450.588069] flags: 0x6f80008100(slab|head)
[ 450.593033] raw: 006f80008100 dead0100 dead0200
88103d006840
[ 450.601683] raw: 88202154 80080007 

[ 450.610337] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[ 450.617530] bad because of flags: 0x100(slab)

To avoid above issue, currently put_page() is disabled in this patch
temporarily. I am working on alternatives so that page allocated via
slab (in this case) can be freed without any issue.

Signed-off-by: Tushar Dave 
Acked-by: Sowmini Varadhan 
---
 net/core/filter.c | 61 +--
 1 file changed, 32 insertions(+), 29 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index e427c8e..cc52baa 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2316,7 +2316,7 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
 BPF_CALL_4(bpf_msg_pull_data,
   struct sk_msg_buff *, msg, u32, start, u32, end, u64, flags)
 {
-   unsigned int len = 0, offset = 0, copy = 0;
+   unsigned int len = 0, offset = 0, copy = 0, off = 0;
struct scatterlist *sg = msg->sg_data;
int first_sg, last_sg, i, shift;
unsigned char *p, *to, *from;
@@ -2330,22 +2330,28 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff 
*msg)
i = msg->sg_start;
do {
len = sg[i].length;
-   offset += len;
if (start < offset + len)
break;
+   offset += len;
i++;
-   if (i == MAX_SKB_FRAGS)
-   i = 0;
-   } while (i != msg->sg_end);
+   } while (i <= msg->sg_end);
 
+   /* return error if start is out of range */
if (unlikely(start >= offset + len))
return -EINVAL;
 
-   if (!msg->sg_copy[i] && bytes <= len)
-   goto out;
+   /* return error if i is last entry in sglist and end is out of range */
+   if (msg->sg_copy[i] && end > offset + len)
+   return -EINVAL;
 
first_sg = i;
 
+   /* if i is not last entry in sg list and end (i.e start + bytes) is
+* within this sg[i] then goto out and calculate data and data_end
+*/
+   if (!msg->sg_copy[i] && end <= offset + len)
+   goto out;
+
/* At this point we need to linearize multiple scatterlist
 * elements or a single shared page. Either way we need to
 * copy into a linear buffer exclusively owned by BPF. Then
@@ -2359,11 +2365,14 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff 
*msg)
do {
copy += sg[i].length;
i++;
-   if (i == MAX_SKB_FRAGS)
-   i = 0;
-   if (bytes < copy)
+   if (end < copy)
break;
-   } while (i != msg->sg_end);
+   } while (i <= msg->sg_end);
+
+   /* return error if i is last entry in sglist and end is out of range */
+   if (i > msg->sg_end && end > offset + copy)
+   return -EINVAL;
+
last_sg = i;
 
if (unlikely(copy < end - start))
@@ -2373,23 +2382,25 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff 
*msg)
if (unlikely(!page))
return -ENOMEM;
p = page_address(page);
-   offset = 0;
 
i = first_sg;
do {
from = sg_virt(&sg[i]);
len = sg[i].length;
-   to = p + offset;
+   to = p + off;
 
memcpy(to, from, len);
-   offset += len;
+   off += len;
sg[i].length = 0;
-   put_page(sg_page(&sg[i]));
+   /* if original page is allocated via slab then put_page
+* causes error BUG: Bad page state in process. So temporarily
+* disabled put_page.
+* Todo: fix it
+*/
+   //put_page(sg_page(&sg[i]));
 
i++;
-   if (i == MAX_SKB_FRAGS)
-   i = 0;
-   } while (i != last_sg);
+   } while (i < last_sg);
 
sg[first_sg].length = copy;

[RFC v3 net-next 0/5] eBPF and struct scatterlist

2018-08-17 Thread Tushar Dave
This is v3 of the RFC sent earlier,
(https://patchwork.ozlabs.org/cover/931785/).

v2->v3:
- As per the review feedback received, this patchset reuses as much code
as possible from sockmap/sk_msg. e.g. it uses existing struct
sk_msg_buff, struct sk_msg_md, sk_msg_convert_ctx_access and part of
code from sk_msg_convert_ctx_access.

- bpf helper bpf_msg_pull_data() is used to access packet data. Some
issues found with bpf_msg_pull_data() are therefore fixed in patch 3.

- Feedback was given that an unprivileged user can attach a new
BPF_PROG_TYPE_SOCKET_SG_FILTER program to a non-rds socket (e.g. normal
tcp/udp) through the SO_ATTACH_BPF sockopt, where the input context is an
skb instead of an sg list, which can cause issues. However, I found that an
unprivileged user can attach any kind of eBPF program to a socket using
SO_ATTACH_BPF, not only socksg. If the eBPF program is faulty, the kernel
BPF verifier takes care of it: it invalidates any access to kernel data and
doesn't let the eBPF program run.

- socksg programs now return an action code (e.g. SOCKSG_PASS).


Background:
The motivation for this work is to allow eBPF based firewalling for
kernel modules that do not always get their packet as an sk_buff from
their downlink drivers. One such instance of this use-case is RDS, which
can be run both over IB (driver RDMA's a scatterlist to the RDS module)
or over TCP (TCP passes an sk_buff to the RDS module).

This patchset uses the existing socket filter infrastructure and extends it
with a new eBPF program type that deals with struct scatterlist.
The existing bpf helper bpf_msg_pull_data() is used to inspect packet data
that is in the form of struct scatterlist. For RDS, the integrated approach
treats the scatterlist as the common denominator, and allows the
application to write a filter for processing a scatterlist.


Details:
Patch 1 adds new eBPF prog type BPF_PROG_TYPE_SOCKET_SG_FILTER which
uses the existing socket filter infrastructure for bpf program attach
and load. An eBPF program of type BPF_PROG_TYPE_SOCKET_SG_FILTER deals with
struct scatterlist as bpf context, in contrast to
BPF_PROG_TYPE_SOCKET_FILTER which deals with struct skb. This new eBPF
program type allows a socket filter to run on packet data that is in the
form of struct scatterlist.

Patch 2 adds sg_filter_run() that runs BPF_PROG_TYPE_SOCKET_SG_FILTER.

Patch 3 fixes bugs in bpf_msg_pull_data() that were found while
experimenting with packets of different sizes.

Patch 4 allows rds_recv_incoming to invoke a socket filter program which
deals with struct scatterlist.

Patch 5 adds socket filter eBPF sample program that uses patches 1 to 4.
The sample program opens an rds socket, attaches an ebpf program
(socksg i.e. BPF_PROG_TYPE_SOCKET_SG_FILTER) to the rds socket and uses
the bpf_msg_pull_data() helper to inspect RDS packet data. For testing, the
current sample program only prints the first few bytes of packet data.


Testing:
To confirm data accuracy and results, RDS packets of various sizes have
been tested with the socksg program, along with various start and end
values for bpf_msg_pull_data(). All such tests show accurate results.

Thanks.

-Tushar



Tushar Dave (5):
  eBPF: Add new eBPF prog type BPF_PROG_TYPE_SOCKET_SG_FILTER
  ebpf: Add sg_filter_run()
  ebpf: fix bpf_msg_pull_data
  rds: invoke socket sg filter attached to rds socket
  ebpf: Add sample ebpf program for SOCKET_SG_FILTER

 include/linux/bpf_types.h  |   1 +
 include/linux/filter.h |   8 +
 include/uapi/linux/bpf.h   |   7 +
 kernel/bpf/syscall.c   |   1 +
 kernel/bpf/verifier.c  |   1 +
 net/core/filter.c  | 140 +
 net/rds/ib.c   |   1 +
 net/rds/ib.h   |   1 +
 net/rds/ib_recv.c  |  12 ++
 net/rds/rds.h  |   2 +
 net/rds/recv.c |  17 +++
 net/rds/tcp.c  |   2 +
 net/rds/tcp.h  |   2 +
 net/rds/tcp_recv.c |  38 +
 samples/bpf/Makefile   |   3 +
 samples/bpf/bpf_load.c |  11 +-
 samples/bpf/rds_filter_kern.c  |  42 +
 samples/bpf/rds_filter_user.c  | 339 +
 tools/bpf/bpftool/prog.c   |   1 +
 tools/include/uapi/linux/bpf.h |   7 +
 tools/lib/bpf/libbpf.c |   3 +
 tools/lib/bpf/libbpf.h |   2 +
 22 files changed, 607 insertions(+), 34 deletions(-)
 create mode 100644 samples/bpf/rds_filter_kern.c
 create mode 100644 samples/bpf/rds_filter_user.c

-- 
1.8.3.1



[RFC v3 net-next 1/5] eBPF: Add new eBPF prog type BPF_PROG_TYPE_SOCKET_SG_FILTER

2018-08-17 Thread Tushar Dave
Add new eBPF prog type BPF_PROG_TYPE_SOCKET_SG_FILTER which uses the
existing socket filter infrastructure for bpf program attach and load.
A SOCKET_SG_FILTER eBPF program receives a struct scatterlist as its bpf
context, in contrast to SOCKET_FILTER, which deals with struct sk_buff.
This is useful for kernel entities that don't have an skb to represent
packet data but want to run an eBPF socket filter on packet data that is
in the form of a struct scatterlist, e.g. IB/RDMA.

Signed-off-by: Tushar Dave 
Acked-by: Sowmini Varadhan 
---
 include/linux/bpf_types.h  |  1 +
 include/uapi/linux/bpf.h   |  1 +
 kernel/bpf/syscall.c   |  1 +
 kernel/bpf/verifier.c  |  1 +
 net/core/filter.c  | 55 --
 samples/bpf/bpf_load.c | 11 ++---
 tools/bpf/bpftool/prog.c   |  1 +
 tools/include/uapi/linux/bpf.h |  1 +
 tools/lib/bpf/libbpf.c |  3 +++
 tools/lib/bpf/libbpf.h |  2 ++
 10 files changed, 72 insertions(+), 5 deletions(-)

diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index cd26c09..7dc1503 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -16,6 +16,7 @@
 BPF_PROG_TYPE(BPF_PROG_TYPE_SOCK_OPS, sock_ops)
 BPF_PROG_TYPE(BPF_PROG_TYPE_SK_SKB, sk_skb)
 BPF_PROG_TYPE(BPF_PROG_TYPE_SK_MSG, sk_msg)
+BPF_PROG_TYPE(BPF_PROG_TYPE_SOCKET_SG_FILTER, socksg_filter)
 #endif
 #ifdef CONFIG_BPF_EVENTS
 BPF_PROG_TYPE(BPF_PROG_TYPE_KPROBE, kprobe)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 66917a4..6ec1e32 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -152,6 +152,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_LWT_SEG6LOCAL,
BPF_PROG_TYPE_LIRC_MODE2,
BPF_PROG_TYPE_SK_REUSEPORT,
+   BPF_PROG_TYPE_SOCKET_SG_FILTER,
 };
 
 enum bpf_attach_type {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 8339d81..160cdb2 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1362,6 +1362,7 @@ static int bpf_prog_load(union bpf_attr *attr)
 
if (type != BPF_PROG_TYPE_SOCKET_FILTER &&
type != BPF_PROG_TYPE_CGROUP_SKB &&
+   type != BPF_PROG_TYPE_SOCKET_SG_FILTER &&
!capable(CAP_SYS_ADMIN))
return -EPERM;
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ca90679..5abc788 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1321,6 +1321,7 @@ static bool may_access_direct_pkt_data(struct 
bpf_verifier_env *env,
case BPF_PROG_TYPE_LWT_XMIT:
case BPF_PROG_TYPE_SK_SKB:
case BPF_PROG_TYPE_SK_MSG:
+   case BPF_PROG_TYPE_SOCKET_SG_FILTER:
if (meta)
return meta->pkt_access;
 
diff --git a/net/core/filter.c b/net/core/filter.c
index fd423ce..cec3807 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1140,7 +1140,8 @@ static void bpf_release_orig_filter(struct bpf_prog *fp)
 
 static void __bpf_prog_release(struct bpf_prog *prog)
 {
-   if (prog->type == BPF_PROG_TYPE_SOCKET_FILTER) {
+   if (prog->type == BPF_PROG_TYPE_SOCKET_FILTER ||
+   prog->type == BPF_PROG_TYPE_SOCKET_SG_FILTER) {
bpf_prog_put(prog);
} else {
bpf_release_orig_filter(prog);
@@ -1539,10 +1540,16 @@ int sk_reuseport_attach_filter(struct sock_fprog 
*fprog, struct sock *sk)
 
 static struct bpf_prog *__get_bpf(u32 ufd, struct sock *sk)
 {
+   struct bpf_prog *prog;
+
if (sock_flag(sk, SOCK_FILTER_LOCKED))
return ERR_PTR(-EPERM);
 
-   return bpf_prog_get_type(ufd, BPF_PROG_TYPE_SOCKET_FILTER);
+   prog = bpf_prog_get_type(ufd, BPF_PROG_TYPE_SOCKET_FILTER);
+   if (IS_ERR(prog))
+   prog = bpf_prog_get_type(ufd, BPF_PROG_TYPE_SOCKET_SG_FILTER);
+
+   return prog;
 }
 
 int sk_attach_bpf(u32 ufd, struct sock *sk)
@@ -4920,6 +4927,17 @@ bool bpf_helper_changes_pkt_data(void *func)
 }
 
 static const struct bpf_func_proto *
+socksg_filter_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+   switch (func_id) {
+   case BPF_FUNC_msg_pull_data:
+   return &bpf_msg_pull_data_proto;
+   default:
+   return bpf_base_func_proto(func_id);
+   }
+}
+
+static const struct bpf_func_proto *
 tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
switch (func_id) {
@@ -6738,6 +6756,30 @@ static u32 sk_skb_convert_ctx_access(enum 
bpf_access_type type,
return insn - insn_buf;
 }
 
+static u32 socksg_filter_convert_ctx_access(enum bpf_access_type type,
+   const struct bpf_insn *si,
+   struct bpf_insn *insn_buf,
+   struct bpf_prog *prog,
+   u32 *target_size)
+{
+   struct bpf_insn *insn = insn_buf;
+
+   switch (si->off) {
+   case offsetof(struct 

Re: [PATCH bpf] bpf: fix redirect to map under tail calls

2018-08-17 Thread Alexei Starovoitov
On Fri, Aug 17, 2018 at 11:26:14PM +0200, Daniel Borkmann wrote:
> Commits 109980b894e9 ("bpf: don't select potentially stale ri->map
> from buggy xdp progs") and 7c3001313396 ("bpf: fix ri->map_owner
> pointer on bpf_prog_realloc") tried to mitigate that buggy programs
> using bpf_redirect_map() helper call do not leave stale maps behind.
> Idea was to add a map_owner cookie into the per CPU struct redirect_info
> which was set to prog->aux by the prog making the helper call as a
> proof that the map is not stale since the prog is implicitly holding
> a reference to it. This owner cookie could later on get compared with
> the program calling into BPF whether they match and therefore the
> redirect could proceed with processing the map safely.
> 
> In (obvious) hindsight, this approach breaks down when tail calls are
> involved since the original caller's prog->aux pointer does not have
> to match the one from one of the progs out of the tail call chain,
> and therefore the xdp buffer will be dropped instead of redirected.
> A way around that would be to fix the issue differently (which also
> allows removing related work in the fast path at the same time): once
> the life-time of a redirect map has come to its end we use its map
> free callback where we need to wait on synchronize_rcu() for current
> outstanding xdp buffers and remove such a map pointer from the
> redirect info if found to be present. At that time no program is
> using this map anymore so we simply invalidate the map pointers to
> NULL iff they previously pointed to that instance while making sure
> that the redirect path only reads out the map once.
> 
> Fixes: 97f91a7cf04f ("bpf: add bpf_redirect_map helper routine")
> Fixes: 109980b894e9 ("bpf: don't select potentially stale ri->map from buggy 
> xdp progs")
> Reported-by: Sebastiano Miano 
> Signed-off-by: Daniel Borkmann 
> Acked-by: John Fastabend 

Applied, Thanks



Re: [PATCH net-next v1] net/tls: Add support for async decryption of tls records

2018-08-17 Thread Dave Watson
On 08/16/18 08:49 PM, Vakul Garg wrote:
> Changes since RFC version:
>   1) Improved commit message.
>   2) Fixed dequeued record offset handling because of which few of
>  tls selftests 'recv_partial, recv_peek, recv_peek_multiple' were 
> failing.

Thanks! Commit message much more clear, tests work great for me also,
only minor comments on clarity

> - if (tls_sw_advance_skb(sk, skb, chunk)) {
> + if (async) {
> + /* Finished with current record, pick up next */
> + ctx->recv_pkt = NULL;
> + __strp_unpause(&ctx->strp);
> + goto mark_eor_chk_ctrl;

Control flow is a little hard to follow here, maybe just pass an async
flag to tls_sw_advance_skb?  It already does strp_unpause and recv_pkt
= NULL.  

> + } else if (tls_sw_advance_skb(sk, skb, chunk)) {
>   /* Return full control message to
>* userspace before trying to parse
>* another message type
>*/
> +mark_eor_chk_ctrl:
>   msg->msg_flags |= MSG_EOR;
>   if (control != TLS_RECORD_TYPE_DATA)
>   goto recv_end;
> + } else {
> + break;

I don't see the need for the else { break; }, isn't this already
covered by while(len); below as before?


Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-08-17 Thread Daniel Borkmann
On 08/17/2018 11:13 PM, Peter Robinson wrote:
> On Fri, Aug 17, 2018 at 7:30 PM, Daniel Borkmann  wrote:
>> On 08/17/2018 06:17 PM, Russell King - ARM Linux wrote:
>>> On Fri, Aug 17, 2018 at 02:40:19PM +0200, Daniel Borkmann wrote:
>>>> I'd have one potential bug suspicion, for the 4.18 one you were trying,
>>>> could you run with the below patch to see whether it would help?
>>>
>>> I think this is almost certainly the problem - looking at the history,
>>> it seems that the "-4" was assumed to be part of the scratch stuff in
>>> commit 38ca93060163 ("bpf, arm32: save 4 bytes of unneeded stack space")
>>> but it isn't - it's because "off" of zero refers to the top word in the
>>> stack (iow at STACK_SIZE-4).
>>
>> Yeah agree, my thinking as well (albeit bit late, sigh, sorry about that).
>> Waiting for Peter to get back with results for definite confirmation. Your
>> rework in 1c35ba122d4a ("ARM: net: bpf: use negative numbers for stacked
>> registers") and 96cced4e774a ("ARM: net: bpf: access eBPF scratch space using
>> ARM FP register") fixes this in mainline, so unless I'm missing something 
>> this
>> would only need a stand-alone fix for 4.18/stable which I can cook up and
>> submit then.
> 
> I can confirm that fixes the problems I was seeing on Fedora 29.
> 
> Feel free to add a tested by from me:
> 
> Tested-by: Peter Robinson 

Great, thanks everyone! Will get it out asap.


RE: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask

2018-08-17 Thread Steve Wise
> 
> 
> > Hey Sagi,
> >
> > The patch works allowing connections for the various affinity mappings
> below:
> >
> > One comp_vector per core across all cores, starting with numa-local cores:
> 
> Thanks Steve, is this your "Tested by:" tag?

Sure:

Tested-by: Steve Wise 



[PATCH bpf] bpf: fix redirect to map under tail calls

2018-08-17 Thread Daniel Borkmann
Commits 109980b894e9 ("bpf: don't select potentially stale ri->map
from buggy xdp progs") and 7c3001313396 ("bpf: fix ri->map_owner
pointer on bpf_prog_realloc") tried to mitigate that buggy programs
using bpf_redirect_map() helper call do not leave stale maps behind.
Idea was to add a map_owner cookie into the per CPU struct redirect_info
which was set to prog->aux by the prog making the helper call as a
proof that the map is not stale since the prog is implicitly holding
a reference to it. This owner cookie could later on get compared with
the program calling into BPF whether they match and therefore the
redirect could proceed with processing the map safely.

In (obvious) hindsight, this approach breaks down when tail calls are
involved since the original caller's prog->aux pointer does not have
to match the one from one of the progs out of the tail call chain,
and therefore the xdp buffer will be dropped instead of redirected.
A way around that would be to fix the issue differently (which also
allows removing related work in the fast path at the same time): once
the life-time of a redirect map has come to its end we use its map
free callback where we need to wait on synchronize_rcu() for current
outstanding xdp buffers and remove such a map pointer from the
redirect info if found to be present. At that time no program is
using this map anymore so we simply invalidate the map pointers to
NULL iff they previously pointed to that instance while making sure
that the redirect path only reads out the map once.
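
As a rough illustration of the approach described above, here is a
minimal userspace model using C11 atomics. It is a sketch under stated
assumptions, not the kernel code: the per-CPU data is modelled as a
plain array, NR_CPUS_MODEL and all *_model names are made up for the
example, and the wait on synchronize_rcu() is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS_MODEL 4		/* hypothetical CPU count for the model */

/* Userspace stand-in for the per-CPU redirect info (one slot per CPU). */
struct redirect_info_model {
	_Atomic(void *) map;
};

static struct redirect_info_model redirect_info[NR_CPUS_MODEL];

/*
 * Called from the map's free path: reset every per-CPU pointer that
 * still refers to this map and leave pointers to other maps alone.
 */
static void clear_redirect_map_model(void *map)
{
	int cpu;

	assert(map != NULL);
	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		void *expected = map;

		/* Only swaps in NULL if the slot still holds this map. */
		atomic_compare_exchange_strong(&redirect_info[cpu].map,
					       &expected, (void *)NULL);
	}
}

/* Redirect path: read the pointer once and use only that snapshot. */
static void *redirect_read_map_once(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	return atomic_load(&redirect_info[cpu].map);
}
```

The compare-and-exchange is what implements the "iff they previously
pointed to that instance" condition: a slot that has meanwhile been
pointed at a different map is not touched.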

Fixes: 97f91a7cf04f ("bpf: add bpf_redirect_map helper routine")
Fixes: 109980b894e9 ("bpf: don't select potentially stale ri->map from buggy 
xdp progs")
Reported-by: Sebastiano Miano 
Signed-off-by: Daniel Borkmann 
Acked-by: John Fastabend 
---
 include/linux/filter.h |  3 +-
 include/trace/events/xdp.h |  5 ++--
 kernel/bpf/cpumap.c|  2 ++
 kernel/bpf/devmap.c|  1 +
 kernel/bpf/verifier.c  | 21 --
 kernel/bpf/xskmap.c|  1 +
 net/core/filter.c  | 68 --
 7 files changed, 38 insertions(+), 63 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 5d565c5..6791a0a 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -543,7 +543,6 @@ struct bpf_redirect_info {
u32 flags;
struct bpf_map *map;
struct bpf_map *map_to_flush;
-   unsigned long   map_owner;
u32 kern_flags;
 };
 
@@ -781,6 +780,8 @@ static inline bool bpf_dump_raw_ok(void)
 struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
   const struct bpf_insn *patch, u32 len);
 
+void bpf_clear_redirect_map(struct bpf_map *map);
+
 static inline bool xdp_return_frame_no_direct(void)
 {
struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index 1ecf4c6..e95cb86 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -147,9 +147,8 @@ struct _bpf_dtab_netdev {
 
 #define devmap_ifindex(fwd, map)   \
(!fwd ? 0 : \
-(!map ? 0 :\
- ((map->map_type == BPF_MAP_TYPE_DEVMAP) ? \
-  ((struct _bpf_dtab_netdev *)fwd)->dev->ifindex : 0)))
+((map->map_type == BPF_MAP_TYPE_DEVMAP) ?  \
+ ((struct _bpf_dtab_netdev *)fwd)->dev->ifindex : 0))
 
 #define _trace_xdp_redirect_map(dev, xdp, fwd, map, idx)   \
 trace_xdp_redirect_map(dev, xdp, devmap_ifindex(fwd, map), \
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 620bc50..24aac0d 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -479,6 +479,8 @@ static void cpu_map_free(struct bpf_map *map)
 * It does __not__ ensure pending flush operations (if any) are
 * complete.
 */
+
+   bpf_clear_redirect_map(map);
synchronize_rcu();
 
/* To ensure all pending flush operations have completed wait for flush
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index ac1df79..141710b 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -161,6 +161,7 @@ static void dev_map_free(struct bpf_map *map)
list_del_rcu(&dtab->list);
spin_unlock(&dev_map_lock);
 
+   bpf_clear_redirect_map(map);
synchronize_rcu();
 
/* To ensure all pending flush operations have completed wait for flush
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ca90679..9224611 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5844,27 +5844,6 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
goto patch_call_imm;
}
 
-   if (insn->imm == BPF_FUNC_redirect_map) {
-   /* Note, we cannot use prog directly as imm as 

Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-08-17 Thread Peter Robinson
Hi Stefan,

>> On 08/17/2018 06:17 PM, Russell King - ARM Linux wrote:
>> > On Fri, Aug 17, 2018 at 02:40:19PM +0200, Daniel Borkmann wrote:
>> >> I'd have one potential bug suspicion, for the 4.18 one you were trying,
>> >> could you run with the below patch to see whether it would help?
>> >
>> > I think this is almost certainly the problem - looking at the history,
>> > it seems that the "-4" was assumed to be part of the scratch stuff in
>> > commit 38ca93060163 ("bpf, arm32: save 4 bytes of unneeded stack space")
>> > but it isn't - it's because "off" of zero refers to the top word in the
>> > stack (iow at STACK_SIZE-4).
>>
>> Yeah agree, my thinking as well (albeit bit late, sigh, sorry about that).
>> Waiting for Peter to get back with results for definite confirmation. Your
>> rework in 1c35ba122d4a ("ARM: net: bpf: use negative numbers for stacked
>> registers") and 96cced4e774a ("ARM: net: bpf: access eBPF scratch space using
>> ARM FP register") fixes this in mainline, so unless I'm missing something 
>> this
>> would only need a stand-alone fix for 4.18/stable which I can cook up and
>> submit then.
>
> I was able to reproduce this issue on RPi 3 with Linux 4.18.1 + 
> multi_v7_defconfig and the following  config changes:
>
>  --- a/arch/arm/configs/multi_v7_defconfig
> +++ b/arch/arm/configs/multi_v7_defconfig
> @@ -2,7 +2,10 @@ CONFIG_SYSVIPC=y
>  CONFIG_NO_HZ=y
>  CONFIG_HIGH_RES_TIMERS=y
>  CONFIG_CGROUPS=y
> +CONFIG_CGROUP_BPF=y
>  CONFIG_BLK_DEV_INITRD=y
> +CONFIG_BPF_SYSCALL=y
> +CONFIG_BPF_JIT_ALWAYS_ON=y
>  CONFIG_EMBEDDED=y
>  CONFIG_PERF_EVENTS=y
>  CONFIG_MODULES=y
> @@ -153,6 +156,8 @@ CONFIG_IPV6_MIP6=m
>  CONFIG_IPV6_TUNNEL=m
>  CONFIG_IPV6_MULTIPLE_TABLES=y
>  CONFIG_NET_DSA=m
> +CONFIG_BPF_JIT=y
> +CONFIG_BPF_STREAM_PARSER=y
>  CONFIG_CAN=y
>  CONFIG_CAN_AT91=m
>  CONFIG_CAN_FLEXCAN=m
>
> After applying the "-4" patch the oopses don't appear during boot anymore.

Would be fab to get that into the kernel so this is widely tested
moving forward.

Peter


Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-08-17 Thread Peter Robinson
On Fri, Aug 17, 2018 at 7:30 PM, Daniel Borkmann  wrote:
> On 08/17/2018 06:17 PM, Russell King - ARM Linux wrote:
>> On Fri, Aug 17, 2018 at 02:40:19PM +0200, Daniel Borkmann wrote:
>>> I'd have one potential bug suspicion, for the 4.18 one you were trying,
>>> could you run with the below patch to see whether it would help?
>>
>> I think this is almost certainly the problem - looking at the history,
>> it seems that the "-4" was assumed to be part of the scratch stuff in
>> commit 38ca93060163 ("bpf, arm32: save 4 bytes of unneeded stack space")
>> but it isn't - it's because "off" of zero refers to the top word in the
>> stack (iow at STACK_SIZE-4).
>
> Yeah agree, my thinking as well (albeit bit late, sigh, sorry about that).
> Waiting for Peter to get back with results for definite confirmation. Your
> rework in 1c35ba122d4a ("ARM: net: bpf: use negative numbers for stacked
> registers") and 96cced4e774a ("ARM: net: bpf: access eBPF scratch space using
> ARM FP register") fixes this in mainline, so unless I'm missing something this
> would only need a stand-alone fix for 4.18/stable which I can cook up and
> submit then.

I can confirm that fixes the problems I was seeing on Fedora 29.

Feel free to add a tested by from me:

Tested-by: Peter Robinson 


Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-08-17 Thread Peter Robinson
On Fri, Aug 17, 2018 at 5:17 PM, Russell King - ARM Linux
 wrote:
> On Fri, Aug 17, 2018 at 02:40:19PM +0200, Daniel Borkmann wrote:
>> I'd have one potential bug suspicion, for the 4.18 one you were trying,
>> could you run with the below patch to see whether it would help?
>
> I think this is almost certainly the problem - looking at the history,
> it seems that the "-4" was assumed to be part of the scratch stuff in
> commit 38ca93060163 ("bpf, arm32: save 4 bytes of unneeded stack space")
> but it isn't - it's because "off" of zero refers to the top word in the
> stack (iow at STACK_SIZE-4).

I can confirm that patch fixes the problem I was seeing.
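
For anyone reading along, the off-by-one-word issue Russell describes
can be pictured with a tiny model. This is only a sketch of the
arithmetic, not the JIT's actual macros: STACK_SIZE_MODEL, WORD_SIZE and
the function names are hypothetical, chosen just to show why offset 0
has to map to STACK_SIZE-4.

```c
#include <assert.h>

#define STACK_SIZE_MODEL 64	/* hypothetical frame size in bytes */
#define WORD_SIZE 4

/* Offset 0 names the top word, which starts at STACK_SIZE - 4. */
static int stack_var_ok(int off)
{
	assert(off >= 0 && off % WORD_SIZE == 0);
	return STACK_SIZE_MODEL - WORD_SIZE - off;
}

/* Same mapping with the "-4" dropped: offset 0 lands at STACK_SIZE. */
static int stack_var_buggy(int off)
{
	assert(off >= 0 && off % WORD_SIZE == 0);
	return STACK_SIZE_MODEL - off;
}

/* A 4-byte access at addr is valid only inside [0, STACK_SIZE). */
static int word_in_frame(int addr)
{
	return addr >= 0 && addr + WORD_SIZE <= STACK_SIZE_MODEL;
}
```

With the "-4" in place the top word is the last valid word of the frame;
without it, the access for offset 0 starts one word past the frame.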

Peter


Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask

2018-08-17 Thread Sagi Grimberg

Hi Jason,


> The new patchworks doesn't grab patches inlined in messages, so you
> will need to resend it.

Yes, just wanted to add Steve's Tested-by as it's going to
lists that did not follow this thread.

> Also, can someone remind me what the outcome is here? Does it
> supersede Leon's patch:
>
> https://patchwork.kernel.org/patch/10526167/


Leon's patch is exposing the breakage so I think it would be
wise to have it go in after this lands mainline.


Re: [iproute PATCH v5 1/2] Make colored output configurable

2018-08-17 Thread David Ahern
On 8/17/18 10:38 AM, Phil Sutter wrote:
> Allow for -color={never,auto,always} to have colored output disabled,
> enabled only if stdout is a terminal or enabled regardless of stdout
> state.
> 
> Signed-off-by: Phil Sutter 
> ---
> Changes since v1:
> - Allow to override isatty() check by specifying '-color' flag more than
>   once.
> - Document new behaviour in man pages.
> 
> Changes since v2:
> - Implement new -color=foo syntax.
> - Update commit message and man page texts accordingly.
> 
> Changes since v3:
> - Fix typo in tc/tc.c causing compile error.
> 
> Changes since v4:
> - Make matches_color() return boolean.
> ---
>  bridge/bridge.c   |  3 +--
>  include/color.h   |  9 +
>  ip/ip.c   |  3 +--
>  lib/color.c   | 33 -
>  man/man8/bridge.8 | 13 +++--
>  man/man8/ip.8 | 13 +++--
>  man/man8/tc.8 | 13 +++--
>  tc/tc.c   |  3 +--
>  8 files changed, 77 insertions(+), 13 deletions(-)
> 

LGTM.

Reviewed-by: David Ahern 
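
For readers skimming the thread, the three-way behaviour described in
the commit message can be sketched as below. This is an illustration
only and not the code from the patch: the enum, parse_color_mode() and
color_enabled() are hypothetical names, and the caller is assumed to
pass in the result of its own isatty() check.

```c
#include <assert.h>
#include <string.h>

enum color_mode { COLOR_NEVER, COLOR_AUTO, COLOR_ALWAYS, COLOR_INVALID };

/* Parse the value after "-color="; the names here are illustrative. */
static enum color_mode parse_color_mode(const char *val)
{
	assert(val != NULL);
	if (strcmp(val, "never") == 0)
		return COLOR_NEVER;
	if (strcmp(val, "auto") == 0)
		return COLOR_AUTO;
	if (strcmp(val, "always") == 0)
		return COLOR_ALWAYS;
	return COLOR_INVALID;
}

/* stdout_is_tty would come from isatty(fileno(stdout)) in a real tool. */
static int color_enabled(enum color_mode mode, int stdout_is_tty)
{
	switch (mode) {
	case COLOR_ALWAYS:
		return 1;			/* regardless of stdout state */
	case COLOR_AUTO:
		return stdout_is_tty != 0;	/* only on a terminal */
	default:
		return 0;			/* never, or invalid input */
	}
}
```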



Re: [PATCH] sunhme: convert printk to pr_cont

2018-08-17 Thread David Miller
From: Mikulas Patocka 
Date: Fri, 17 Aug 2018 16:08:49 -0400 (EDT)

> I'm not an expert on networking code - you can change it if it is more 
> appropriate this way.

What Stephen is asking of you doesn't require networking expertise
and he even gave you an example of how to do it.  All you would need
to do is test his suggestion and make sure it works properly.


Re: [PATCH iproute2-next] iproute_lwtunnel: allow specifying 'src' for 'encap ip' / 'encap ip6'

2018-08-17 Thread David Ahern
On 8/17/18 1:31 AM, Shmulik Ladkani wrote:
> This allows the user to specify the LWTUNNEL_IP_SRC/LWTUNNEL_IP6_SRC
> when setting an lwtunnel encapsulation route.
> 
> Signed-off-by: Shmulik Ladkani 
> ---
>  ip/iproute_lwtunnel.c | 22 --
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 

applied to iproute2-next. Thanks



Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask

2018-08-17 Thread Jason Gunthorpe
On Fri, Aug 17, 2018 at 01:03:20PM -0700, Sagi Grimberg wrote:
> 
> > Hey Sagi,
> > 
> > The patch works allowing connections for the various affinity mappings 
> > below:
> > 
> > One comp_vector per core across all cores, starting with numa-local cores:
> 
> Thanks Steve, is this your "Tested by:" tag?

The new patchworks doesn't grab patches inlined in messages, so you
will need to resend it.

Also, can someone remind me what the outcome is here? Does it
supersede Leon's patch:

https://patchwork.kernel.org/patch/10526167/

?

Thanks,
Jason


Re: [PATCH] sunhme: convert printk to pr_cont

2018-08-17 Thread Mikulas Patocka



On Fri, 17 Aug 2018, Stephen Hemminger wrote:

> On Fri, 17 Aug 2018 15:12:22 -0400 (EDT)
> Mikulas Patocka  wrote:
> 
> > ===
> > --- linux-stable.orig/drivers/net/ethernet/sun/sunhme.c 2018-04-20 
> > 18:11:00.0 +0200
> > +++ linux-stable/drivers/net/ethernet/sun/sunhme.c  2018-08-13 
> > 22:01:08.0 +0200
> > @@ -572,21 +572,21 @@ static void display_link_mode(struct hap
> >  {
> > printk(KERN_INFO "%s: Link is up using ", hp->dev->name);
> > if (hp->tcvr_type == external)
> > -   printk("external ");
> > +   pr_cont("external ");
> > else
> > -   printk("internal ");
> > -   printk("transceiver at ");
> > +   pr_cont("internal ");
> > +   pr_cont("transceiver at ");
> > hp->sw_lpa = happy_meal_tcvr_read(hp, tregs, MII_LPA);
> > if (hp->sw_lpa & (LPA_100HALF | LPA_100FULL)) {
> > if (hp->sw_lpa & LPA_100FULL)
> > -   printk("100Mb/s, Full Duplex.\n");
> > +   pr_cont("100Mb/s, Full Duplex.\n");
> > else
> > -   printk("100Mb/s, Half Duplex.\n");
> > +   pr_cont("100Mb/s, Half Duplex.\n");
> > } else {
> > if (hp->sw_lpa & LPA_10FULL)
> > -   printk("10Mb/s, Full Duplex.\n");
> > +   pr_cont("10Mb/s, Full Duplex.\n");
> > else
> > -   printk("10Mb/s, Half Duplex.\n");
> > +   pr_cont("10Mb/s, Half Duplex.\n");
> > }
> >  }
> 
> Why not just use a single netdev_info() (or drop the useless message
> altogether)?
> 
> I.e.
>   netdev_info(hp->dev, "Link is up using %s transceiver at %dMb/s %s Duplex\n",
>   (hp->tcvr_type == external) ? "external" : "internal",
>   (hp->sw_lpa & (LPA_100HALF | LPA_100FULL)) ? 100 : 10,
>   (hp->sw_lpa & (LPA_100FULL | LPA_10FULL)) ? "Full" : "Half");

I'm not an expert on networking code - you can change it if it is more 
appropriate this way.

Mikulas


Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask

2018-08-17 Thread Sagi Grimberg




> Hey Sagi,
>
> The patch works allowing connections for the various affinity mappings below:
>
> One comp_vector per core across all cores, starting with numa-local cores:


Thanks Steve, is this your "Tested by:" tag?


Re: [PATCH] sunhme: convert printk to pr_cont

2018-08-17 Thread Stephen Hemminger
On Fri, 17 Aug 2018 15:12:22 -0400 (EDT)
Mikulas Patocka  wrote:

> ===
> --- linux-stable.orig/drivers/net/ethernet/sun/sunhme.c   2018-04-20 
> 18:11:00.0 +0200
> +++ linux-stable/drivers/net/ethernet/sun/sunhme.c2018-08-13 
> 22:01:08.0 +0200
> @@ -572,21 +572,21 @@ static void display_link_mode(struct hap
>  {
>   printk(KERN_INFO "%s: Link is up using ", hp->dev->name);
>   if (hp->tcvr_type == external)
> - printk("external ");
> + pr_cont("external ");
>   else
> - printk("internal ");
> - printk("transceiver at ");
> + pr_cont("internal ");
> + pr_cont("transceiver at ");
>   hp->sw_lpa = happy_meal_tcvr_read(hp, tregs, MII_LPA);
>   if (hp->sw_lpa & (LPA_100HALF | LPA_100FULL)) {
>   if (hp->sw_lpa & LPA_100FULL)
> - printk("100Mb/s, Full Duplex.\n");
> + pr_cont("100Mb/s, Full Duplex.\n");
>   else
> - printk("100Mb/s, Half Duplex.\n");
> + pr_cont("100Mb/s, Half Duplex.\n");
>   } else {
>   if (hp->sw_lpa & LPA_10FULL)
> - printk("10Mb/s, Full Duplex.\n");
> + pr_cont("10Mb/s, Full Duplex.\n");
>   else
> - printk("10Mb/s, Half Duplex.\n");
> + pr_cont("10Mb/s, Half Duplex.\n");
>   }
>  }

Why not just use a single netdev_info() (or drop the useless message altogether)?

I.e.
netdev_info(hp->dev, "Link is up using %s transceiver at %dMb/s %s Duplex\n",
(hp->tcvr_type == external) ? "external" : "internal",
(hp->sw_lpa & (LPA_100HALF | LPA_100FULL)) ? 100 : 10,
(hp->sw_lpa & (LPA_100FULL | LPA_10FULL)) ? "Full" : "Half");
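
To make the single-call idea easy to try outside the kernel, here is a
small userspace sketch that builds the whole message in one go with
snprintf(). format_link_mode() is a hypothetical helper, not driver
code, and the LPA_* values are copied from <linux/mii.h>. Unlike the
one-liner above, it keeps the original function's duplex selection
(100Mb/s bits decide first), so a partner advertising 100HALF plus
10FULL is still reported as half duplex.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Link partner ability bits, values as in <linux/mii.h>. */
#define LPA_10HALF  0x0020
#define LPA_10FULL  0x0040
#define LPA_100HALF 0x0080
#define LPA_100FULL 0x0100

/* Format the link message in one call; returns snprintf()'s result. */
static int format_link_mode(char *buf, size_t len, int external,
			    unsigned int lpa)
{
	int is_100 = (lpa & (LPA_100HALF | LPA_100FULL)) != 0;
	int full = is_100 ? (lpa & LPA_100FULL) != 0
			  : (lpa & LPA_10FULL) != 0;

	assert(buf != NULL && len > 0);
	return snprintf(buf, len,
			"Link is up using %s transceiver at %dMb/s, %s Duplex.",
			external ? "external" : "internal",
			is_100 ? 100 : 10,
			full ? "Full" : "Half");
}
```

Because the message is emitted by one call, there are no continuation
fragments for the log code to split across lines.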


Re: [endianness bug] cxgb4: mk_act_open_req() buggers ->{local,peer}_ip on big-endian hosts

2018-08-17 Thread Al Viro
On Fri, Aug 17, 2018 at 07:59:44PM +0100, Al Viro wrote:
> On Fri, Aug 17, 2018 at 07:58:41PM +0100, Al Viro wrote:
> > On Fri, Aug 17, 2018 at 07:09:49PM +0100, Al Viro wrote:
> > 
> > > Re that code - are you sure it doesn't need le64_to_cpu(*src)?  Because 
> > > from what
> > > I understand about PCI (which matches just fine to the comments in the 
> > > same driver),
> > > you probably do need that.  Again, the only real way to find out is to 
> > > test on
> > > big-endian host...
> > 
> > BTW, would that, by any chance, be an open-coded
> > _iowrite64_copy(dst, src, EQ_UNIT/sizeof(u64))
> 
> __iowrite64_copy, even...

FWIW, it looks like the confusion had been between the endianness of the data 
structures
(b-e both on host and NIC side) and the fact that PCI is l-e.  *IF* that code 
wants to
copy data from host data structures to iomem as-is, it needs to use 
__raw_writeq() and
its ilk or writeq(le64_to_cpu(...)) to compensate.  The latter would, indeed, 
confuse
sparse - we are accessing b-e data as if it was l-e.
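
To spell out the byte-order argument, here is a small userspace model.
It is only a sketch: the *_model functions are made-up stand-ins for
writeq() and le64_to_cpu(), and host endianness is passed in as a
parameter so that both cases can be checked on any machine. It shows
that writeq(le64_to_cpu(x)) delivers the bytes to the device in the same
order they had in host memory, while a plain writeq(x) reverses them on
a b-e host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load 8 bytes of memory the way a host of the given endianness would. */
static uint64_t load_host64(const unsigned char *p, int host_is_be)
{
	uint64_t v = 0;
	int i;

	assert(p != NULL);
	for (i = 0; i < 8; i++) {
		int shift = host_is_be ? 8 * (7 - i) : 8 * i;

		v |= (uint64_t)p[i] << shift;
	}
	return v;
}

/* Unconditional 64-bit byte swap. */
static uint64_t swab64_model(uint64_t v)
{
	uint64_t r = 0;
	int i;

	for (i = 0; i < 8; i++)
		r |= ((v >> (8 * i)) & 0xff) << (8 * (7 - i));
	return r;
}

/* le64_to_cpu(): a no-op on l-e hosts, a byte swap on b-e hosts. */
static uint64_t le64_to_cpu_model(uint64_t v, int host_is_be)
{
	return host_is_be ? swab64_model(v) : v;
}

/* writeq(): the device always receives the value in l-e byte order. */
static void writeq_model(uint64_t v, unsigned char *dev)
{
	int i;

	assert(dev != NULL);
	for (i = 0; i < 8; i++)
		dev[i] = (unsigned char)(v >> (8 * i));
}
```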

If we want copying that wouldn't affect the endianness, we need memcpy_toio() 
or similar
beasts.  And AFAICS that code is very close to
/* If we're only writing a single Egress Unit and the BAR2
 * Queue ID is 0, we can use the Write Combining Doorbell
 * Gather Buffer; otherwise we use the simple doorbell.
 */
if (n == 1 && tq->bar2_qid == 0) {
unsigned int index = (tq->pidx ?: tq->size) - 1;
/* Copy the TX Descriptor in a tight loop in order to
 * try to get it to the adapter in a single Write
 * Combined transfer on the PCI-E Bus.  If the Write
 * Combine fails (say because of an interrupt, etc.)
 * the hardware will simply take the last write as a
 * simple doorbell write with a PIDX Increment of 1
 * and will fetch the TX Descriptor from memory via
 * DMA.
 */
__iowrite64_copy(tq->bar2_addr + SGE_UDB_WCDOORBELL,
 &tq->desc[index], EQ_UNIT/sizeof(u64))
} else {
writel(val | QID_V(tq->bar2_qid),
   tq->bar2_addr + SGE_UDB_KDOORBELL);
}
/* This Write Memory Barrier will force the write to the User
 * Doorbell area to be flushed.  This is needed to prevent
 * writes on different CPUs for the same queue from hitting
 * the adapter out of order.  This is required when some Work
 * Requests take the Write Combine Gather Buffer path (user
 * doorbell area offset [SGE_UDB_WCDOORBELL..+63]) and some
 * take the traditional path where we simply increment the
 * PIDX (User Doorbell area SGE_UDB_KDOORBELL) and have the
 * hardware DMA read the actual Work Request.
 */
wmb();

which wouldn't have looked unusual...  Again, that really needs review from
the folks familiar with the hardware in question, as well as testing - I'm
not trying to push patches like that.  If the current mainline variant
really works on b-e, I'd like to understand how does it manage that, though...


Re: [PATCH] sunhme: convert printk to pr_cont

2018-08-17 Thread David Miller
From: Mikulas Patocka 
Date: Fri, 17 Aug 2018 15:12:22 -0400 (EDT)

> The kernel adds newlines automatically unless pr_cont is used. This patch
> converts sunhme to use pr_cont, so that the messages are not broken to
> multiple lines.
> 
> The patch also adds "\n" to a few strings that were missing it.
> 
> Signed-off-by: Mikulas Patocka 
> Cc: sta...@vger.kernel.org

"stable", are you sure?  What crash or memory corruption do these
added newlines in the kernel log cause?

I don't think this is appropriate for -stable, sorry.

At best this is net-next material, and that tree is closed right now.

Please resubmit this when the net-next tree opens back up again,
thanks.


Re: [PATCH bpf] tools/bpf: fix bpf selftest test_cgroup_storage failure

2018-08-17 Thread Alexei Starovoitov
On Fri, Aug 17, 2018 at 08:54:15AM -0700, Yonghong Song wrote:
> The bpf selftest test_cgroup_storage failed in one of
> our production test servers.
>   # sudo ./test_cgroup_storage
>   Failed to create map: Operation not permitted
> 
> It turns out this is due to insufficient locked memory
> with system default 16KB.
> 
> Similar to other self tests, let us arm the process
> with unlimited locked memory. With this change,
> the test passed.
>   # sudo ./test_cgroup_storage
>   test_cgroup_storage:PASS
> 
> Fixes: 68cfa3ac6b8d ("selftests/bpf: add a cgroup storage test")
> Cc: Roman Gushchin 
> Signed-off-by: Yonghong Song 

Applied, Thanks
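
For reference, "arming the process with unlimited locked memory" usually
comes down to a setrlimit() call along the lines below. This is a
minimal sketch rather than the patch itself; raise_memlock_limit() is a
hypothetical wrapper name, and raising the limit is expected to fail
with EPERM for an unprivileged process.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>

/* Returns 0 on success or a negative errno value on failure. */
static int raise_memlock_limit(void)
{
	/* Soft and hard limit both set to "no limit". */
	struct rlimit r = { RLIM_INFINITY, RLIM_INFINITY };

	if (setrlimit(RLIMIT_MEMLOCK, &r) != 0)
		return -errno;
	return 0;
}
```

A test would call this once at the start of main(), before creating any
maps.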



Re: [PATCH net-next] r8169: add missing Kconfig dependency

2018-08-17 Thread David Miller
From: Florian Fainelli 
Date: Fri, 17 Aug 2018 11:45:57 -0700

> On 08/17/2018 11:42 AM, Heiner Kallweit wrote:
>> Now that we switched the r8169 driver to use phylib, there's a
>> dependency on the Realtek PHY drivers. This dependency was missing
>> in Kconfig.
>> 
>> Reported-by: Jouni Mettälä 
>> Fixes: f1e911d5d0df ("r8169: add basic phylib support")
>> Signed-off-by: Heiner Kallweit 
> 
> This is probably targeting 'net' now that the changes landed in Linus' tree:

Right.

> Acked-by: Florian Fainelli 

Applied, thanks everyone.


[PATCH] sunhme: convert printk to pr_cont

2018-08-17 Thread Mikulas Patocka
The kernel adds newlines automatically unless pr_cont is used. This patch
converts sunhme to use pr_cont, so that the messages are not broken into
multiple lines.

The patch also adds "\n" to a few strings that were missing it.

Signed-off-by: Mikulas Patocka 
Cc: sta...@vger.kernel.org

---
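A userspace sketch of the behaviour the commit message describes, assuming a
simplified model in which every plain log call starts a new record and only a
continuation call appends to the current one. The names struct log_buf,
log_start() and log_cont() are hypothetical and exist only for illustration;
they are not kernel APIs and this is not how printk is implemented.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical userspace model of a log; not the kernel's printk machinery. */
struct log_buf {
	char text[256];
	size_t len;
};

/* Append a string to the buffer; the sketch simply asserts that it fits. */
static void log_append(struct log_buf *lb, const char *s)
{
	size_t n = strlen(s);

	assert(lb->len + n < sizeof(lb->text));
	memcpy(lb->text + lb->len, s, n + 1);
	lb->len += n;
}

/* Models a plain printk(): end any unfinished record, then start a new one. */
static void log_start(struct log_buf *lb, const char *s)
{
	if (lb->len && lb->text[lb->len - 1] != '\n')
		log_append(lb, "\n");
	log_append(lb, s);
}

/* Models pr_cont(): continue the current record without a line break. */
static void log_cont(struct log_buf *lb, const char *s)
{
	log_append(lb, s);
}
```

In this model, three log_start() calls for the pieces of the link message end
up on three lines, while one log_start() followed by two log_cont() calls
yields a single line.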
 drivers/net/ethernet/sun/sunhme.c |   70 +++---
 1 file changed, 35 insertions(+), 35 deletions(-)

Index: linux-stable/drivers/net/ethernet/sun/sunhme.c
===
--- linux-stable.orig/drivers/net/ethernet/sun/sunhme.c 2018-04-20 18:11:00.0 +0200
+++ linux-stable/drivers/net/ethernet/sun/sunhme.c  2018-08-13 22:01:08.0 +0200
@@ -572,21 +572,21 @@ static void display_link_mode(struct hap
 {
printk(KERN_INFO "%s: Link is up using ", hp->dev->name);
if (hp->tcvr_type == external)
-   printk("external ");
+   pr_cont("external ");
else
-   printk("internal ");
-   printk("transceiver at ");
+   pr_cont("internal ");
+   pr_cont("transceiver at ");
hp->sw_lpa = happy_meal_tcvr_read(hp, tregs, MII_LPA);
if (hp->sw_lpa & (LPA_100HALF | LPA_100FULL)) {
if (hp->sw_lpa & LPA_100FULL)
-   printk("100Mb/s, Full Duplex.\n");
+   pr_cont("100Mb/s, Full Duplex.\n");
else
-   printk("100Mb/s, Half Duplex.\n");
+   pr_cont("100Mb/s, Half Duplex.\n");
} else {
if (hp->sw_lpa & LPA_10FULL)
-   printk("10Mb/s, Full Duplex.\n");
+   pr_cont("10Mb/s, Full Duplex.\n");
else
-   printk("10Mb/s, Half Duplex.\n");
+   pr_cont("10Mb/s, Half Duplex.\n");
}
 }
 
@@ -594,19 +594,19 @@ static void display_forced_link_mode(str
 {
printk(KERN_INFO "%s: Link has been forced up using ", hp->dev->name);
if (hp->tcvr_type == external)
-   printk("external ");
+   pr_cont("external ");
else
-   printk("internal ");
-   printk("transceiver at ");
+   pr_cont("internal ");
+   pr_cont("transceiver at ");
hp->sw_bmcr = happy_meal_tcvr_read(hp, tregs, MII_BMCR);
if (hp->sw_bmcr & BMCR_SPEED100)
-   printk("100Mb/s, ");
+   pr_cont("100Mb/s, ");
else
-   printk("10Mb/s, ");
+   pr_cont("10Mb/s, ");
if (hp->sw_bmcr & BMCR_FULLDPLX)
-   printk("Full Duplex.\n");
+   pr_cont("Full Duplex.\n");
else
-   printk("Half Duplex.\n");
+   pr_cont("Half Duplex.\n");
 }
 
 static int set_happy_link_modes(struct happy_meal *hp, void __iomem *tregs)
@@ -883,7 +883,7 @@ static void happy_meal_tx_reset(struct h
 
/* Lettuce, tomato, buggy hardware (no extra charge)? */
if (!tries)
-   printk(KERN_ERR "happy meal: Transceiver BigMac ATTACK!");
+   printk(KERN_ERR "happy meal: Transceiver BigMac ATTACK!\n");
 
/* Take care. */
HMD(("done\n"));
@@ -903,7 +903,7 @@ static void happy_meal_rx_reset(struct h
 
/* Will that be all? */
if (!tries)
-   printk(KERN_ERR "happy meal: Receiver BigMac ATTACK!");
+   printk(KERN_ERR "happy meal: Receiver BigMac ATTACK!\n");
 
/* Don't forget your vik_1137125_wa.  Have a nice day. */
HMD(("done\n"));
@@ -925,7 +925,7 @@ static void happy_meal_stop(struct happy
 
/* Come back next week when we are "Sun Microelectronics". */
if (!tries)
-   printk(KERN_ERR "happy meal: Fry guys.");
+   printk(KERN_ERR "happy meal: Fry guys.\n");
 
/* Remember: "Different name, same old buggy as shit hardware." */
HMD(("done\n"));
@@ -1143,7 +1143,7 @@ static void happy_meal_transceiver_check
hp->tcvr_type = internal;
ASD(("\n"));
} else {
-   printk(KERN_ERR "happy meal: Transceiver and a coke please.");
+   printk(KERN_ERR "happy meal: Transceiver and a coke please.\n");
hp->tcvr_type = none; /* Grrr... */
ASD(("\n"));
}
@@ -1824,12 +1824,12 @@ static int happy_meal_is_not_so_happy(st
/* All sorts of DMA receive errors. */
printk(KERN_ERR "%s: Happy Meal rx DMA errors [ ", hp->dev->name);
if (status & GREG_STAT_RXERR)
-   printk("GenericError ");
+   pr_cont("GenericError ");
if (status & GREG_STAT_RXPERR)
-   printk("ParityError ");
+   pr_cont("ParityError ");
   

Re: virtio_net failover and initramfs

2018-08-17 Thread Samudrala, Sridhar

On 8/17/2018 2:56 AM, Harald Hoyer wrote:

On 17.08.2018 11:51, Harald Hoyer wrote:

On 16.08.2018 00:17, Siwei Liu wrote:

On Wed, Aug 15, 2018 at 12:05 PM, Samudrala, Sridhar
 wrote:

On 8/14/2018 5:03 PM, Siwei Liu wrote:

Are we sure all userspace apps skip and ignore slave interfaces by
just looking at "IFLA_MASTER" attribute?

When STANDBY is enabled on virtio-net, a failover master interface
will appear, which automatically enslaves the virtio device. But it
turns out that iSCSI (or any network boot) cannot bootstrap over the
new failover interface together with a standby virtio (without any VF
or PT device in place).

Dracut (initramfs) ends up timing out and dropping into the emergency shell:

[  228.170425] dracut-initqueue[377]: Warning: dracut-initqueue
timeout - starting timeout scripts
[  228.171788] dracut-initqueue[377]: Warning: Could not boot.
   Starting Dracut Emergency Shell...
Generating "/run/initramfs/rdsosreport.txt"
Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or
/boot
after mounting them and attach it to a bug report.
dracut:/# ip l sh
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN
mode DEFAULT group default qlen 1000
  link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0:  mtu 1500 qdisc noqueue
state UP mode DEFAULT group default qlen 1000
  link/ether 9a:46:22:ae:33:54 brd ff:ff:ff:ff:ff:ff
3: eth1:  mtu 1500 qdisc pfifo_fast
master eth0 state UP mode DEFAULT group default qlen 1000
  link/ether 9a:46:22:ae:33:54 brd ff:ff:ff:ff:ff:ff
dracut:/#

If changing dracut code to ignore eth1 (with IFLA_MASTER attr),
network boot starts to work.


Does dracut by default try to use all the interfaces that are UP?


Yes. The specific dracut cmdline in our case is "ip=dhcp
netroot=iscsi:... ", but it's not specific to iSCSI boot. Because the
failover and standby interfaces share the same MAC address, dracut, which
tries to run DHCP on all interfaces that are up, eventually gets the same
route for each interface. Those conflicting route entries kill off the
network connection.


The reason is that dracut has its own means to differentiate virtual
interfaces for network boot: it does not use IFLA_MASTER to detect and
ignore slave interfaces. Instead, users have to provide an explicit
option, e.g. bond=eth0,eth1, on the boot line; dracut then knows
the config and ignores the slave interfaces.


Isn't it possible to specify the interface that should be used for network
boot?

As I understand it, one can only specify interface name for running
DHCP but not select interface for network boot.  We want DHCP to run
on every NIC that is up (excluding the enslaved interfaces), and only
one of them can get a route entry to the network boot server (e.g.
iSCSI target).




However, with automatic creation of the failover interface that assumption
is no longer true. Can we change dracut to ignore all slave interfaces
by checking IFLA_MASTER? I don't think so. It would have a large impact on
existing configs.


What is the issue with checking for IFLA_MASTER? I guess this is used with
team/bonding setups.

That should be discussed within and determined by the dracut
community. But the current dracut code doesn't check IFLA_MASTER for
team or bonding specifically. I guess this change might have broader
impact to existing userspace that might be already relying on the
current behaviour.

Thanks,
-Siwei

Is there a sysfs flag for IFF_SLAVE? Or any "ip" output I can use to detect
that it is an IFF_SLAVE?


Oh, it's the other way around.. dracut should ignore "master" (eth1).
In the above example eth0 is the net_failover device and eth1 is the 
lower virtio_net device.
"ip" output of eth1 shows "master eth0". It indicates that eth0 is its 
upper/master device.
This information can also be obtained via sysfs:
/sys/class/net/eth1/upper_eth0
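
A minimal sketch of the sysfs check mentioned above, assuming that an
enslaved device's directory contains an entry named "upper_<master>" as in the
/sys/class/net/eth1/upper_eth0 example. The helper names is_upper_link() and
has_upper_dev() are hypothetical; this is not dracut code, only an
illustration of how a userspace tool might decide to skip such a device.

```c
#include <assert.h>
#include <dirent.h>
#include <stdbool.h>
#include <string.h>

/* True if a directory entry name looks like an "upper_<name>" link. */
static bool is_upper_link(const char *name)
{
	return strncmp(name, "upper_", 6) == 0;
}

/*
 * Return true if the device directory (for example /sys/class/net/eth1)
 * contains an "upper_<name>" entry, meaning the device has an upper/master
 * device. Returns false if the directory cannot be opened.
 */
static bool has_upper_dev(const char *dev_dir)
{
	DIR *d;
	struct dirent *e;
	bool found = false;

	assert(dev_dir != NULL);
	d = opendir(dev_dir);
	if (!d)
		return false;
	while ((e = readdir(d)) != NULL) {
		if (is_upper_link(e->d_name)) {
			found = true;
			break;
		}
	}
	closedir(d);
	return found;
}
```

With the setup shown in the "ip l sh" output, has_upper_dev("/sys/class/net/eth1")
would be expected to return true and the same call for eth0 false, so a tool
could run DHCP on eth0 only.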


Can the master enslave the "eth0", if it is already "UP" and busy later on?
eth0 is the master/failover device and eth1 gets registered as its slave 
via NETDEV_REGISTER event.

dracut should ignore eth1 in this setup.


Re: [endianness bug] cxgb4: mk_act_open_req() buggers ->{local,peer}_ip on big-endian hosts

2018-08-17 Thread Al Viro
On Fri, Aug 17, 2018 at 07:58:41PM +0100, Al Viro wrote:
> On Fri, Aug 17, 2018 at 07:09:49PM +0100, Al Viro wrote:
> 
> > Re that code - are you sure it doesn't need le64_to_cpu(*src)?  Because from what
> > I understand about PCI (which matches just fine to the comments in the same driver),
> > you probably do need that.  Again, the only real way to find out is to test on
> > big-endian host...
> 
> BTW, would that, by any chance, be an open-coded
>   _iowrite64_copy(dst, src, EQ_UNIT/sizeof(u64))

__iowrite64_copy, even...


Re: [endianness bug] cxgb4: mk_act_open_req() buggers ->{local,peer}_ip on big-endian hosts

2018-08-17 Thread Al Viro
On Fri, Aug 17, 2018 at 07:09:49PM +0100, Al Viro wrote:

> Re that code - are you sure it doesn't need le64_to_cpu(*src)?  Because from what
> I understand about PCI (which matches just fine to the comments in the same driver),
> you probably do need that.  Again, the only real way to find out is to test on
> big-endian host...

BTW, would that, by any chance, be an open-coded
_iowrite64_copy(dst, src, EQ_UNIT/sizeof(u64))


Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-08-17 Thread Stefan Wahren
Hi Daniel,

> Daniel Borkmann wrote on 17 August 2018 at 20:30:
> 
> 
> On 08/17/2018 06:17 PM, Russell King - ARM Linux wrote:
> > On Fri, Aug 17, 2018 at 02:40:19PM +0200, Daniel Borkmann wrote:
> >> I'd have one potential bug suspicion, for the 4.18 one you were trying,
> >> could you run with the below patch to see whether it would help?
> > 
> > I think this is almost certainly the problem - looking at the history,
> > it seems that the "-4" was assumed to be part of the scratch stuff in
> > commit 38ca93060163 ("bpf, arm32: save 4 bytes of unneeded stack space")
> > but it isn't - it's because "off" of zero refers to the top word in the
> > stack (iow at STACK_SIZE-4).
> 
> Yeah agree, my thinking as well (albeit bit late, sigh, sorry about that).
> Waiting for Peter to get back with results for definite confirmation. Your
> rework in 1c35ba122d4a ("ARM: net: bpf: use negative numbers for stacked
> registers") and 96cced4e774a ("ARM: net: bpf: access eBPF scratch space using
> ARM FP register") fixes this in mainline, so unless I'm missing something this
> would only need a stand-alone fix for 4.18/stable which I can cook up and
> submit then.

I was able to reproduce this issue on RPi 3 with Linux 4.18.1 +
multi_v7_defconfig and the following config changes:

--- a/arch/arm/configs/multi_v7_defconfig
+++ b/arch/arm/configs/multi_v7_defconfig
@@ -2,7 +2,10 @@ CONFIG_SYSVIPC=y
 CONFIG_NO_HZ=y
 CONFIG_HIGH_RES_TIMERS=y
 CONFIG_CGROUPS=y
+CONFIG_CGROUP_BPF=y
 CONFIG_BLK_DEV_INITRD=y
+CONFIG_BPF_SYSCALL=y
+CONFIG_BPF_JIT_ALWAYS_ON=y
 CONFIG_EMBEDDED=y
 CONFIG_PERF_EVENTS=y
 CONFIG_MODULES=y
@@ -153,6 +156,8 @@ CONFIG_IPV6_MIP6=m
 CONFIG_IPV6_TUNNEL=m
 CONFIG_IPV6_MULTIPLE_TABLES=y
 CONFIG_NET_DSA=m
+CONFIG_BPF_JIT=y
+CONFIG_BPF_STREAM_PARSER=y
 CONFIG_CAN=y
 CONFIG_CAN_AT91=m
 CONFIG_CAN_FLEXCAN=m

After applying the "-4" patch, the oopses no longer appear during boot.

Stefan

> 
> Thanks,
> Daniel
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


Re: [PATCH net-next] r8169: add missing Kconfig dependency

2018-08-17 Thread Florian Fainelli
On 08/17/2018 11:42 AM, Heiner Kallweit wrote:
> Now that we switched the r8169 driver to use phylib, there's a
> dependency on the Realtek PHY drivers. This dependency was missing
> in Kconfig.
> 
> Reported-by: Jouni Mettälä 
> Fixes: f1e911d5d0df ("r8169: add basic phylib support")
> Signed-off-by: Heiner Kallweit 

This is probably targeting 'net' now that the changes landed in Linus' tree:

Acked-by: Florian Fainelli 
-- 
Florian


[PATCH net-next] r8169: add missing Kconfig dependency

2018-08-17 Thread Heiner Kallweit
Now that we switched the r8169 driver to use phylib, there's a
dependency on the Realtek PHY drivers. This dependency was missing
in Kconfig.

Reported-by: Jouni Mettälä 
Fixes: f1e911d5d0df ("r8169: add basic phylib support")
Signed-off-by: Heiner Kallweit 
---
 drivers/net/ethernet/realtek/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/realtek/Kconfig b/drivers/net/ethernet/realtek/Kconfig
index e1cd934c2..96d1b9c08 100644
--- a/drivers/net/ethernet/realtek/Kconfig
+++ b/drivers/net/ethernet/realtek/Kconfig
@@ -100,6 +100,7 @@ config R8169
select FW_LOADER
select CRC32
select PHYLIB
+   select REALTEK_PHY
---help---
  Say Y here if you have a Realtek 8169 PCI Gigabit Ethernet adapter.
 
-- 
2.18.0



Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-08-17 Thread Daniel Borkmann
On 08/17/2018 06:17 PM, Russell King - ARM Linux wrote:
> On Fri, Aug 17, 2018 at 02:40:19PM +0200, Daniel Borkmann wrote:
>> I'd have one potential bug suspicion, for the 4.18 one you were trying,
>> could you run with the below patch to see whether it would help?
> 
> I think this is almost certainly the problem - looking at the history,
> it seems that the "-4" was assumed to be part of the scratch stuff in
> commit 38ca93060163 ("bpf, arm32: save 4 bytes of unneeded stack space")
> but it isn't - it's because "off" of zero refers to the top word in the
> stack (iow at STACK_SIZE-4).

Yeah agree, my thinking as well (albeit bit late, sigh, sorry about that).
Waiting for Peter to get back with results for definite confirmation. Your
rework in 1c35ba122d4a ("ARM: net: bpf: use negative numbers for stacked
registers") and 96cced4e774a ("ARM: net: bpf: access eBPF scratch space using
ARM FP register") fixes this in mainline, so unless I'm missing something this
would only need a stand-alone fix for 4.18/stable which I can cook up and
submit then.

Thanks,
Daniel


Re: [regression] r8169 without realtek_phy

2018-08-17 Thread Heiner Kallweit
On 17.08.2018 19:39, Florian Fainelli wrote:
> +Heiner,
> 
> On 08/17/2018 01:33 AM, Jouni Mettälä wrote:
>> There is a network regression for me. 4.18 was good. 4.18+ is bad. There
>> were some phy changes in the r8169 driver. Fortunately adding
>> CONFIG_REALTEK_PHY=m to the kernel config fixed the regression.
>>
>> Should r8169 depend on realtek_phy? Does that break something else?
> 
> That would be reasonable given that there is now a hard dependency on
> the r8169 driver having the right PHY driver, since the Generic PHY
> driver likely won't do all the workarounds. Heiner, what do you think?
> 
Good catch, indeed we need a Kconfig dependency now. I missed this
because I had Realtek PHY support enabled anyway. I will submit a
fix adding this dependency.

Heiner
 
>
>> Network doesn't work with Generic PHY (output of dmesg)
>> Generic PHY r8169-300:00: attached PHY driver [Generic PHY]
>> (mii_bus:phy_addr=r8169-300:00, irq=IGNORE)
>>
>> When realtek_phy is compiled, r8169 automatically uses it.
>> RTL8211B Gigabit Ethernet r8169-300:00: attached PHY driver [RTL8211B
>> Gigabit Ethernet] (mii_bus:phy_addr=r8169-300:00, irq=IGNORE)
>>
>> Here is Ethernet controller's lspci for reference:
>> 03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168]
>> (rev 01)
>>  Subsystem: ABIT Computer Corp. RTL8111/8168/8411 PCI Express
>> Gigabit Ethernet Controller [147b:1078]
>>  Flags: bus master, fast devsel, latency 0, IRQ 27
>>  I/O ports at ce00 [size=256]
>>  Memory at fddff000 (64-bit, non-prefetchable) [size=4K]
>>  [virtual] Expansion ROM at fdd0 [disabled] [size=64K]
>>  Capabilities: [40] Power Management version 2
>>  Capabilities: [48] Vital Product Data
>>  Capabilities: [50] MSI: Enable+ Count=1/2 Maskable- 64bit+
>>  Capabilities: [60] Express Endpoint, MSI 00
>>  Capabilities: [84] Vendor Specific Information: Len=4c 
>>  Capabilities: [100] Advanced Error Reporting
>>  Capabilities: [12c] Virtual Channel
>>  Capabilities: [148] Device Serial Number 28-00-00-00-00-00-00-
>> 00
>>  Capabilities: [154] Power Budgeting 
>>  Kernel driver in use: r8169
>>  Kernel modules: r8169
>>
> 
> 



Re: [endianness bug] cxgb4: mk_act_open_req() buggers ->{local,peer}_ip on big-endian hosts

2018-08-17 Thread Al Viro
On Fri, Aug 17, 2018 at 04:43:20PM +0100, Al Viro wrote:
> On Fri, Aug 17, 2018 at 06:35:41PM +0530, Ganesh Goudar wrote:
> > Thanks, Al. The patch looks good to me but it does not seem
> > to be showing up in patchwork, should I resend the patch on
> > your behalf to net tree ?
> 
> Umm...  I thought net-next had been closed until -rc1, hadn't
> it?
> 
> Anyway, endianness cleanups and fixes of drivers/net/ethernet/chelsio
> can be found in vfs.git #net-endian.chelsio; I was planning to post
> that stuff on netdev after -rc1, but I would certainly appreciate
> a look from somebody familiar with the hardware prior to that, assuming
> you have time for that at the moment...  The stuff in there (it's
> based off net/master):
>   struct cxgb4_next_header .match_val/.match_mask/mask should be net-endian
>   cxgb4: fix TC-PEDIT-related parts of cxgb4_tc_flower on big-endian
>   cxgb4_tc_u32: trivial endianness annotations
>   cxgb4: trivial endianness annotations
>   libcxgb: trivial __percpu annotations
>   libcxgb: trivial endianness annotations
>   cxgb3: trivial endianness annotations
>   cxgb3: don't get cute with !! and shifts in t3_prep_adapter()...
>   [investigate][endianness bug] cxgb3: assigning htonl(11-bit value) to __be16 field is wrong
>   cxgb: trivial endianness annotations

... and updated for some of today's catch (== stuff caught while finding the
reply).  Another very likely bug hidden by force-casts: in t4_fwcache()
you put host-endian 0 or 1 into __be32 field, hiding that by force-cast.
Then feed that to hardware, which, judging by everything else nearby
expects big-endian there.  The only reason it works, AFAICS, is that you
only pass it FW_PARAM_DEV_FWCACHE_FLUSH (== 0); as soon as you get
a caller passing FW_PARAM_DEV_FWCACHE_FLUSHINV, you'll get breakage.
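
Why a force-cast can hide this only for the value 0 can be sketched in
userspace. This is an illustration under stated assumptions, not the driver
code: put_be32() is a hypothetical portable stand-in for storing a value in a
big-endian field, put_host32() models storing the host representation
unchanged (what the force-cast amounts to), and host_matches_be32() compares
the two byte patterns.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store v in big-endian byte order, as a big-endian 32-bit field holds it. */
static void put_be32(uint8_t out[4], uint32_t v)
{
	assert(out != NULL);
	out[0] = (uint8_t)(v >> 24);
	out[1] = (uint8_t)(v >> 16);
	out[2] = (uint8_t)(v >> 8);
	out[3] = (uint8_t)v;
}

/* Model of the force-cast: copy the host representation unchanged. */
static void put_host32(uint8_t out[4], uint32_t v)
{
	assert(out != NULL);
	memcpy(out, &v, sizeof(v));
}

/* True if the raw host bytes already equal the big-endian encoding of v. */
static int host_matches_be32(uint32_t v)
{
	uint8_t be[4], host[4];

	put_be32(be, v);
	put_host32(host, v);
	return memcmp(be, host, sizeof(be)) == 0;
}
```

host_matches_be32(0) holds on any host because all four bytes are zero. For 1
the result depends on the host: on a little-endian machine the raw bytes are
01 00 00 00 while the big-endian encoding is 00 00 00 01, so the two differ.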

And frankly, comments like
while (count) {
/* the (__force u64) is because the compiler
 * doesn't understand the endian swizzling
 * going on
 */
writeq((__force u64)*src, dst);
src++;
dst++;
count--;
}
are more than slightly terrifying - piss on compiler, what about reviewers?  And
the authors themselves, for that matter...  FWIW, see the dmr's comments on
"you are not expected to understand that" story (bell labs site is buggered,
but wayback machine has it on
https://web.archive.org/web/20140724213028/http://cm.bell-labs.com/cm/cs/who/dmr/odd.html)
Especially the "The real problem is that we didn't understand what was going on
either" part...

Re that code - are you sure it doesn't need le64_to_cpu(*src)?  Because from what
I understand about PCI (which matches just fine to the comments in the same driver),
you probably do need that.  Again, the only real way to find out is to test on
big-endian host...


Re: [PATCH ethtool] ethtool: document WoL filters option also in help message

2018-08-17 Thread Florian Fainelli
On 08/17/2018 06:21 AM, Michal Kubecek wrote:
> Commit eff0bb337223 ("ethtool: Add support for WAKE_FILTER (WoL using
> filters)") added option "f" for wake on lan and documented it in man page
> but not in the output of "ethtool --help".
> 
> Signed-off-by: Michal Kubecek 

Acked-by: Florian Fainelli 

Thanks Michal!
-- 
Florian


Re: [regression] r8169 without realtek_phy

2018-08-17 Thread Florian Fainelli
+Heiner,

On 08/17/2018 01:33 AM, Jouni Mettälä wrote:
> There is a network regression for me. 4.18 was good. 4.18+ is bad. There
> were some phy changes in the r8169 driver. Fortunately adding
> CONFIG_REALTEK_PHY=m to the kernel config fixed the regression.
>
> Should r8169 depend on realtek_phy? Does that break something else?

That would be reasonable given that there is now a hard dependency on
the r8169 driver having the right PHY driver, since the Generic PHY
driver likely won't do all the workarounds. Heiner, what do you think?

> 
> Network doesn't work with Generic PHY (output of dmesg)
> Generic PHY r8169-300:00: attached PHY driver [Generic PHY]
> (mii_bus:phy_addr=r8169-300:00, irq=IGNORE)
> 
> When realtek_phy is compiled, r8169 automatically uses it.
> RTL8211B Gigabit Ethernet r8169-300:00: attached PHY driver [RTL8211B
> Gigabit Ethernet] (mii_bus:phy_addr=r8169-300:00, irq=IGNORE)
> 
> Here is Ethernet controller's lspci for reference:
> 03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168]
> (rev 01)
>   Subsystem: ABIT Computer Corp. RTL8111/8168/8411 PCI Express
> Gigabit Ethernet Controller [147b:1078]
>   Flags: bus master, fast devsel, latency 0, IRQ 27
>   I/O ports at ce00 [size=256]
>   Memory at fddff000 (64-bit, non-prefetchable) [size=4K]
>   [virtual] Expansion ROM at fdd0 [disabled] [size=64K]
>   Capabilities: [40] Power Management version 2
>   Capabilities: [48] Vital Product Data
>   Capabilities: [50] MSI: Enable+ Count=1/2 Maskable- 64bit+
>   Capabilities: [60] Express Endpoint, MSI 00
>   Capabilities: [84] Vendor Specific Information: Len=4c 
>   Capabilities: [100] Advanced Error Reporting
>   Capabilities: [12c] Virtual Channel
>   Capabilities: [148] Device Serial Number 28-00-00-00-00-00-00-
> 00
>   Capabilities: [154] Power Budgeting 
>   Kernel driver in use: r8169
>   Kernel modules: r8169
> 


-- 
Florian


Re: [PATCH iproute2-next] iproute_lwtunnel: allow specifying 'src' for 'encap ip' / 'encap ip6'

2018-08-17 Thread Shmulik Ladkani
Hi,

On Fri, 17 Aug 2018 08:00:22 -0700
Stephen Hemminger  wrote:

> If you accept an attribute on input you need to parse it and display it the
> same way in the show command.

Note print_encap_ip and print_encap_ip6 already handle LWTUNNEL_IP_SRC
and LWTUNNEL_IP6_SRC (since long ago, 1e5293056 and d95cdcf52).

The only missing part is treatment in parse_encap_ip and
parse_encap_ip6, as suggested by this patch.

Best,
Shmulik


Re: [PATCH iproute2] ipmaddr: use preferred_family when given

2018-08-17 Thread महेश बंडेवार
On Fri, Aug 17, 2018 at 9:29 AM, Stephen Hemminger
 wrote:
> On Wed, 15 Aug 2018 16:08:55 -0700
> Mahesh Bandewar  wrote:
>
>> From: Mahesh Bandewar 
>>
>> When creating socket() AF_INET is used irrespective of the family
>> that is given at the command-line (with -4, -6, or -0). This change
>> will open the socket with the preferred family.
>>
>> Signed-off-by: Mahesh Bandewar 
>> ---
>>  ip/ipmaddr.c | 13 -
>>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> What is the impact of this? Do ip multicast address changes not work on IPv6?
> Or is it just doing the right thing?
Essentially a no-op. Just doing the right thing. :)


[iproute PATCH v5 1/2] Make colored output configurable

2018-08-17 Thread Phil Sutter
Allow for -color={never,auto,always} to have colored output disabled,
enabled only if stdout is a terminal or enabled regardless of stdout
state.

Signed-off-by: Phil Sutter 
---
Changes since v1:
- Allow to override isatty() check by specifying '-color' flag more than
  once.
- Document new behaviour in man pages.

Changes since v2:
- Implement new -color=foo syntax.
- Update commit message and man page texts accordingly.

Changes since v3:
- Fix typo in tc/tc.c causing compile error.

Changes since v4:
- Make matches_color() return boolean.
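
The three settings named in the commit message can be sketched as a pure
decision function. This is an illustration rather than the code in the patch:
want_color() is a hypothetical name, and its is_tty parameter stands in for
the isatty(fileno(stdout)) check so that the decision can be exercised without
a terminal. The enum values are the ones the patch adds to include/color.h.

```c
#include <assert.h>
#include <stdbool.h>

enum color_opt {
	COLOR_OPT_NEVER = 0,
	COLOR_OPT_AUTO = 1,
	COLOR_OPT_ALWAYS = 2
};

/*
 * Decide whether colored output should be enabled.
 * json:   true if JSON output was requested, which always disables color.
 * is_tty: assumption standing in for isatty(fileno(stdout)).
 */
static bool want_color(enum color_opt opt, bool json, bool is_tty)
{
	assert((int)opt >= COLOR_OPT_NEVER && (int)opt <= COLOR_OPT_ALWAYS);

	if (json || opt == COLOR_OPT_NEVER)
		return false;
	return opt == COLOR_OPT_ALWAYS || is_tty;
}
```

With "auto" the result follows is_tty, with "always" it ignores it, and JSON
output wins over both.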
---
 bridge/bridge.c   |  3 +--
 include/color.h   |  9 +
 ip/ip.c   |  3 +--
 lib/color.c   | 33 -
 man/man8/bridge.8 | 13 +++--
 man/man8/ip.8 | 13 +++--
 man/man8/tc.8 | 13 +++--
 tc/tc.c   |  3 +--
 8 files changed, 77 insertions(+), 13 deletions(-)

diff --git a/bridge/bridge.c b/bridge/bridge.c
index b3cab717ead30..663a35b2b2e46 100644
--- a/bridge/bridge.c
+++ b/bridge/bridge.c
@@ -173,8 +173,7 @@ main(int argc, char **argv)
NEXT_ARG();
if (netns_switch(argv[1]))
exit(-1);
-   } else if (matches(opt, "-color") == 0) {
-   ++color;
+   } else if (matches_color(opt, &color)) {
} else if (matches(opt, "-compressvlans") == 0) {
++compress_vlans;
} else if (matches(opt, "-force") == 0) {
diff --git a/include/color.h b/include/color.h
index 4f2c918db7e43..a22a00c2277e0 100644
--- a/include/color.h
+++ b/include/color.h
@@ -2,6 +2,8 @@
 #ifndef __COLOR_H__
 #define __COLOR_H__ 1
 
+#include <stdbool.h>
+
 enum color_attr {
COLOR_IFNAME,
COLOR_MAC,
@@ -12,8 +14,15 @@ enum color_attr {
COLOR_NONE
 };
 
+enum color_opt {
+   COLOR_OPT_NEVER = 0,
+   COLOR_OPT_AUTO = 1,
+   COLOR_OPT_ALWAYS = 2
+};
+
 void enable_color(void);
 int check_enable_color(int color, int json);
+bool matches_color(const char *arg, int *val);
 void set_color_palette(void);
 int color_fprintf(FILE *fp, enum color_attr attr, const char *fmt, ...);
 enum color_attr ifa_family_color(__u8 ifa_family);
diff --git a/ip/ip.c b/ip/ip.c
index 72e858eed50d5..58c643df8a366 100644
--- a/ip/ip.c
+++ b/ip/ip.c
@@ -283,8 +283,7 @@ int main(int argc, char **argv)
exit(-1);
}
rcvbuf = size;
-   } else if (matches(opt, "-color") == 0) {
-   ++color;
+   } else if (matches_color(opt, &color)) {
} else if (matches(opt, "-help") == 0) {
usage();
} else if (matches(opt, "-netns") == 0) {
diff --git a/lib/color.c b/lib/color.c
index edf96e5c6ecd7..9c9023587748f 100644
--- a/lib/color.c
+++ b/lib/color.c
@@ -3,11 +3,13 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 
 #include "color.h"
+#include "utils.h"
 
 enum color {
C_RED,
@@ -79,13 +81,42 @@ void enable_color(void)
 
 int check_enable_color(int color, int json)
 {
-   if (color && !json) {
+   if (json || color == COLOR_OPT_NEVER)
+   return 1;
+
+   if (color == COLOR_OPT_ALWAYS || isatty(fileno(stdout))) {
enable_color();
return 0;
}
return 1;
 }
 
+bool matches_color(const char *arg, int *val)
+{
+   char *dup, *p;
+
+   if (!val)
+   return false;
+
+   dup = strdupa(arg);
+   p = strchrnul(dup, '=');
+   if (*p)
+   *(p++) = '\0';
+
+   if (matches(dup, "-color"))
+   return false;
+
+   if (*p == '\0' || !strcmp(p, "always"))
+   *val = COLOR_OPT_ALWAYS;
+   else if (!strcmp(p, "auto"))
+   *val = COLOR_OPT_AUTO;
+   else if (!strcmp(p, "never"))
+   *val = COLOR_OPT_NEVER;
+   else
+   return false;
+   return true;
+}
+
 void set_color_palette(void)
 {
char *p = getenv("COLORFGBG");
diff --git a/man/man8/bridge.8 b/man/man8/bridge.8
index 1d10cb2b6a72c..53cd3d0a3d933 100644
--- a/man/man8/bridge.8
+++ b/man/man8/bridge.8
@@ -172,8 +172,17 @@ If there were any errors during execution of the commands, the application
 return code will be non zero.
 
 .TP
-.BR "\-c" , " -color"
-Use color output.
+.BR \-c [ color ][ = { always | auto | never }
+Configure color output. If parameter is omitted or
+.BR always ,
+color output is enabled regardless of stdout state. If parameter is
+.BR auto ,
+stdout is checked to be a terminal before enabling color output. If parameter is
+.BR never ,
+color output is disabled. If specified multiple times, the last one takes
+precedence. This flag is ignored if
+.B \-json
+is also given.
 
 .TP
 .BR "\-j", " \-json"
diff --git a/man/man8/ip.8 b/man/man8/ip.8
index 0087d18b74706..1d358879ec39c 100644
--- a/man/man8/ip.8
+++ 

[iproute PATCH 2/2] lib: Make check_enable_color() return boolean

2018-08-17 Thread Phil Sutter
As suggested, turn return code into true/false although it's not checked
anywhere yet.

Fixes: 4d829626a ("Merge common code for conditionally colored output")
Signed-off-by: Phil Sutter 
---
 include/color.h | 2 +-
 lib/color.c | 8 
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/color.h b/include/color.h
index a22a00c2277e0..e30f28c51c844 100644
--- a/include/color.h
+++ b/include/color.h
@@ -21,7 +21,7 @@ enum color_opt {
 };
 
 void enable_color(void);
-int check_enable_color(int color, int json);
+bool check_enable_color(int color, int json);
 bool matches_color(const char *arg, int *val);
 void set_color_palette(void);
 int color_fprintf(FILE *fp, enum color_attr attr, const char *fmt, ...);
diff --git a/lib/color.c b/lib/color.c
index 9c9023587748f..eaf69e74d673a 100644
--- a/lib/color.c
+++ b/lib/color.c
@@ -79,16 +79,16 @@ void enable_color(void)
set_color_palette();
 }
 
-int check_enable_color(int color, int json)
+bool check_enable_color(int color, int json)
 {
if (json || color == COLOR_OPT_NEVER)
-   return 1;
+   return false;
 
if (color == COLOR_OPT_ALWAYS || isatty(fileno(stdout))) {
enable_color();
-   return 0;
+   return true;
}
-   return 1;
+   return false;
 }
 
 bool matches_color(const char *arg, int *val)
-- 
2.18.0



Re: [endianness bug] cxgb4: mk_act_open_req() buggers ->{local,peer}_ip on big-endian hosts

2018-08-17 Thread David Miller
From: Al Viro 
Date: Fri, 17 Aug 2018 16:43:20 +0100

> Umm...  I thought net-next had been closed until -rc1, hadn't
> it?

That's correct.


Re: [PATCH iproute2] ipmaddr: use preferred_family when given

2018-08-17 Thread Stephen Hemminger
On Wed, 15 Aug 2018 16:08:55 -0700
Mahesh Bandewar  wrote:

> From: Mahesh Bandewar 
> 
> When creating socket() AF_INET is used irrespective of the family
> that is given at the command-line (with -4, -6, or -0). This change
> will open the socket with the preferred family.
> 
> Signed-off-by: Mahesh Bandewar 
> ---
>  ip/ipmaddr.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)

What is the impact of this? Do ip multicast address changes not work on IPv6?
Or is it just doing the right thing?


Re: [PATCH net-next] net: dsa: mv88e6xxx: Share main switch IRQ

2018-08-17 Thread David Miller
From: Marek Behún 
Date: Fri, 17 Aug 2018 12:09:49 +0200

> On some boards the interrupt can be shared between multiple devices.
> For example on Turris Mox the interrupt is shared between all switches.
> 
> Signed-off-by: Marek Behun 

The net-next tree is closed, please resubmit when it opens back
up.


Re: mv88e6xxx: question: can switch irq be shared?

2018-08-17 Thread David Miller
From: Marek Behún 
Date: Fri, 17 Aug 2018 11:30:55 +0200

> -IRQF_ONESHOT | IRQF_TRIGGER_FALLING,
> +IRQF_ONESHOT | IRQF_TRIGGER_FALLING
> +| IRQF_SHARED,

The "|" operator should end a line, not start one.


Re: [iproute PATCH v4] Make colored output configurable

2018-08-17 Thread Stephen Hemminger
On Thu, 16 Aug 2018 11:37:03 +0200
Phil Sutter  wrote:

> Allow for -color={never,auto,always} to have colored output disabled,
> enabled only if stdout is a terminal or enabled regardless of stdout
> state.
> 
> Signed-off-by: Phil Sutter 
> ---
> Changes since v1:
> - Allow to override isatty() check by specifying '-color' flag more than
>   once.
> - Document new behaviour in man pages.
> 
> Changes since v2:
> - Implement new -color=foo syntax.
> - Update commit message and man page texts accordingly.
> 
> Changes since v3:
> - Fix typo in tc/tc.c causing compile error.
> ---
>  bridge/bridge.c   |  3 +--
>  include/color.h   |  7 +++
>  ip/ip.c   |  3 +--
>  lib/color.c   | 33 -
>  man/man8/bridge.8 | 13 +++--
>  man/man8/ip.8 | 13 +++--
>  man/man8/tc.8 | 13 +++--
>  tc/tc.c   |  3 +--
>  8 files changed, 75 insertions(+), 13 deletions(-)
> 
> diff --git a/bridge/bridge.c b/bridge/bridge.c
> index 451d684e0bcfd..e35e5bdf7fb30 100644
> --- a/bridge/bridge.c
> +++ b/bridge/bridge.c
> @@ -173,8 +173,7 @@ main(int argc, char **argv)
>   NEXT_ARG();
>   if (netns_switch(argv[1]))
>   exit(-1);
> - } else if (matches(opt, "-color") == 0) {
> - ++color;
> + } else if (matches_color(opt, &color) == 0) {
>   } else if (matches(opt, "-compressvlans") == 0) {
>   ++compress_vlans;
>   } else if (matches(opt, "-force") == 0) {
> diff --git a/include/color.h b/include/color.h
> index 4f2c918db7e43..42038dc2e7f87 100644
> --- a/include/color.h
> +++ b/include/color.h
> @@ -12,8 +12,15 @@ enum color_attr {
>   COLOR_NONE
>  };
>  
> +enum color_opt {
> + COLOR_OPT_NEVER = 0,
> + COLOR_OPT_AUTO = 1,
> + COLOR_OPT_ALWAYS = 2
> +};
> +
>  void enable_color(void);
>  int check_enable_color(int color, int json);
> +int matches_color(const char *arg, int *val);
>  void set_color_palette(void);
>  int color_fprintf(FILE *fp, enum color_attr attr, const char *fmt, ...);
>  enum color_attr ifa_family_color(__u8 ifa_family);
> diff --git a/ip/ip.c b/ip/ip.c
> index 38eac5ec1e17d..893c3c43ef99a 100644
> --- a/ip/ip.c
> +++ b/ip/ip.c
> @@ -283,8 +283,7 @@ int main(int argc, char **argv)
>   exit(-1);
>   }
>   rcvbuf = size;
> - } else if (matches(opt, "-color") == 0) {
> - ++color;
> + } else if (matches_color(opt, &color) == 0) {
>   } else if (matches(opt, "-help") == 0) {
>   usage();
>   } else if (matches(opt, "-netns") == 0) {
> diff --git a/lib/color.c b/lib/color.c
> index edf96e5c6ecd7..3ad1d6d647722 100644
> --- a/lib/color.c
> +++ b/lib/color.c
> @@ -3,11 +3,13 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
>  
>  #include "color.h"
> +#include "utils.h"
>  
>  enum color {
>   C_RED,
> @@ -79,13 +81,42 @@ void enable_color(void)
>  
>  int check_enable_color(int color, int json)
>  {
> - if (color && !json) {
> + if (json || color == COLOR_OPT_NEVER)
> + return 1;
> +
> + if (color == COLOR_OPT_ALWAYS || isatty(fileno(stdout))) {
>   enable_color();
>   return 0;
>   }
>   return 1;
>  }
>  
> +int matches_color(const char *arg, int *val)
> +{
> + char *dup, *p;
> +
> + if (!val)
> + return 1;

I am fine with this even for the current version (not next).
Minor nit: these functions should return bool.
Most of iproute2 uses int for booleans for historical reasons,
but let's try to use bool for new code.
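
To illustrate the bool suggestion, here is a minimal sketch of an option
matcher that returns bool instead of the 0-on-match int convention. It is
not the code from the patch: color_arg_matches() is a hypothetical name, it
does not accept abbreviations the way matches() does, and treating a bare
"-color" as "always" is an assumption made only for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Values mirror enum color_opt from the quoted patch. */
enum color_opt {
	COLOR_OPT_NEVER = 0,
	COLOR_OPT_AUTO = 1,
	COLOR_OPT_ALWAYS = 2
};

/*
 * Hypothetical bool variant: returns true and stores the mode in *val
 * when arg is "-color" or "-color=<mode>", false otherwise.
 */
bool color_arg_matches(const char *arg, int *val)
{
	static const char opt[] = "-color";
	const size_t optlen = sizeof(opt) - 1;
	const char *mode;

	if (!arg || !val || strncmp(arg, opt, optlen) != 0)
		return false;

	mode = arg + optlen;
	if (*mode == '\0') {
		/* Bare "-color": assumed here to mean "always". */
		*val = COLOR_OPT_ALWAYS;
		return true;
	}
	if (*mode != '=')
		return false;
	mode++;

	if (strcmp(mode, "never") == 0)
		*val = COLOR_OPT_NEVER;
	else if (strcmp(mode, "auto") == 0)
		*val = COLOR_OPT_AUTO;
	else if (strcmp(mode, "always") == 0)
		*val = COLOR_OPT_ALWAYS;
	else
		return false;
	return true;
}
```

With a bool return, a call site would read "if (color_arg_matches(opt, &color))"
rather than comparing the result against 0.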


Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-08-17 Thread Russell King - ARM Linux
On Fri, Aug 17, 2018 at 02:40:19PM +0200, Daniel Borkmann wrote:
> I'd have one potential bug suspicion, for the 4.18 one you were trying,
> could you run with the below patch to see whether it would help?

I think this is almost certainly the problem - looking at the history,
it seems that the "-4" was assumed to be part of the scratch stuff in
commit 38ca93060163 ("bpf, arm32: save 4 bytes of unneeded stack space")
but it isn't - it's because "off" of zero refers to the top word in the
stack (iow at STACK_SIZE-4).
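
The arithmetic can be sketched as follows. This is only an illustration:
SKETCH_STACK_SIZE is a made-up value (the real STACK_SIZE is computed in
bpf_jit_32.c), and the sketch assumes the slots are 4-byte words living at
byte offsets 0 .. STACK_SIZE-4 of the frame.

```c
/* Illustrative value only; not the real STACK_SIZE. */
#define SKETCH_STACK_SIZE 64

/* Without the "-4": offset 0 lands at STACK_SIZE, one word past the top slot. */
#define STACK_VAR_OLD(off) (SKETCH_STACK_SIZE - (off))
/* With the "-4": offset 0 lands on the top word, at STACK_SIZE - 4. */
#define STACK_VAR_NEW(off) (SKETCH_STACK_SIZE - (off) - 4)

/* A 4-byte slot at byte position pos fits iff it ends within the frame. */
int slot_in_bounds(int pos)
{
	return pos >= 0 && pos + 4 <= SKETCH_STACK_SIZE;
}
```

Under these assumptions STACK_VAR_OLD(0) points outside the frame, while
STACK_VAR_NEW(0) is the top word.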

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up


RE: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask

2018-08-17 Thread Steve Wise
> On 8/16/2018 1:26 PM, Sagi Grimberg wrote:
> >
> >> Let me know if you want me to try this or any particular fix.
> >
> > Steve, can you test this one?
> 
> Yes!  I'll try it out tomorrow.
> 
> Stevo
> 

Hey Sagi,

The patch works, allowing connections for the various affinity mappings below:

One comp_vector per core across all cores, starting with numa-local cores:

[ 3798.494963] iw_cxgb4: comp_vector 0, irq 203 mask 0x100
[ 3798.500717] iw_cxgb4: comp_vector 1, irq 204 mask 0x200
[ 3798.506396] iw_cxgb4: comp_vector 2, irq 205 mask 0x400
[ 3798.512043] iw_cxgb4: comp_vector 3, irq 206 mask 0x800
[ 3798.517675] iw_cxgb4: comp_vector 4, irq 207 mask 0x1000
[ 3798.523382] iw_cxgb4: comp_vector 5, irq 208 mask 0x2000
[ 3798.529075] iw_cxgb4: comp_vector 6, irq 209 mask 0x4000
[ 3798.534754] iw_cxgb4: comp_vector 7, irq 210 mask 0x8000
[ 3798.540425] iw_cxgb4: comp_vector 8, irq 211 mask 0x1
[ 3798.545825] iw_cxgb4: comp_vector 9, irq 212 mask 0x2
[ 3798.551231] iw_cxgb4: comp_vector 10, irq 213 mask 0x4
[ 3798.556713] iw_cxgb4: comp_vector 11, irq 214 mask 0x8
[ 3798.562189] iw_cxgb4: comp_vector 12, irq 215 mask 0x10
[ 3798.567755] iw_cxgb4: comp_vector 13, irq 216 mask 0x20
[ 3798.573312] iw_cxgb4: comp_vector 14, irq 217 mask 0x40
[ 3798.578855] iw_cxgb4: comp_vector 15, irq 218 mask 0x80
[ 3798.584384] set->mq_map[0] queue 8 vector 8
[ 3798.588879] set->mq_map[1] queue 9 vector 9
[ 3798.593358] set->mq_map[2] queue 10 vector 10
[ 3798.598008] set->mq_map[3] queue 11 vector 11
[ 3798.602633] set->mq_map[4] queue 12 vector 12
[ 3798.607260] set->mq_map[5] queue 13 vector 13
[ 3798.611872] set->mq_map[6] queue 14 vector 14
[ 3798.616470] set->mq_map[7] queue 15 vector 15
[ 3798.621059] set->mq_map[8] queue 0 vector 0
[ 3798.625460] set->mq_map[9] queue 1 vector 1
[ 3798.629852] set->mq_map[10] queue 2 vector 2
[ 3798.634331] set->mq_map[11] queue 3 vector 3
[ 3798.638796] set->mq_map[12] queue 4 vector 4
[ 3798.643263] set->mq_map[13] queue 5 vector 5
[ 3798.647727] set->mq_map[14] queue 6 vector 6
[ 3798.652197] set->mq_map[15] queue 7 vector 7
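
For reading the masks in the log above, a small sketch that decodes a mask
into CPU numbers may help. It assumes the printed value is a plain bitmask
with bit N standing for CPU N; mask_to_cpus() is a hypothetical helper, not
something from the driver.

```c
#include <stddef.h>

/*
 * Store the CPU numbers whose bits are set in mask into cpus[] (at most
 * max entries) and return how many bits were set in total.
 */
size_t mask_to_cpus(unsigned long mask, int *cpus, size_t max)
{
	size_t n = 0;
	int cpu;

	for (cpu = 0; mask != 0; cpu++, mask >>= 1) {
		if (!(mask & 1UL))
			continue;
		if (n < max)
			cpus[n] = cpu;
		n++;
	}
	return n;
}
```

Under that assumption, mask 0x100 on comp_vector 0 is CPU 8, which lines up
with "set->mq_map[8] queue 0 vector 0" in the same log.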

One comp_vector per core, but only numa-local cores:

[ 3855.406027] iw_cxgb4: comp_vector 0, irq 203 mask 0x400
[ 3855.411577] iw_cxgb4: comp_vector 1, irq 204 mask 0x800
[ 3855.417057] iw_cxgb4: comp_vector 2, irq 205 mask 0x1000
[ 3855.422618] iw_cxgb4: comp_vector 3, irq 206 mask 0x2000
[ 3855.428176] iw_cxgb4: comp_vector 4, irq 207 mask 0x4000
[ 3855.433731] iw_cxgb4: comp_vector 5, irq 208 mask 0x8000
[ 3855.439293] iw_cxgb4: comp_vector 6, irq 209 mask 0x100
[ 3855.444770] iw_cxgb4: comp_vector 7, irq 210 mask 0x200
[ 3855.450231] iw_cxgb4: comp_vector 8, irq 211 mask 0x400
[ 3855.455691] iw_cxgb4: comp_vector 9, irq 212 mask 0x800
[ 3855.461144] iw_cxgb4: comp_vector 10, irq 213 mask 0x1000
[ 3855.466768] iw_cxgb4: comp_vector 11, irq 214 mask 0x2000
[ 3855.472379] iw_cxgb4: comp_vector 12, irq 215 mask 0x4000
[ 3855.477992] iw_cxgb4: comp_vector 13, irq 216 mask 0x8000
[ 3855.483599] iw_cxgb4: comp_vector 14, irq 217 mask 0x100
[ 3855.489116] iw_cxgb4: comp_vector 15, irq 218 mask 0x200
[ 3855.494644] set->mq_map[0] queue 8 vector 8
[ 3855.499046] set->mq_map[1] queue 9 vector 9
[ 3855.503445] set->mq_map[2] queue 10 vector 10
[ 3855.508025] set->mq_map[3] queue 11 vector 11
[ 3855.512600] set->mq_map[4] queue 12 vector 12
[ 3855.517176] set->mq_map[5] queue 13 vector 13
[ 3855.521750] set->mq_map[6] queue 14 vector 14
[ 3855.526325] set->mq_map[7] queue 15 vector 15
[ 3855.530902] set->mq_map[8] queue 6 vector 6
[ 3855.535306] set->mq_map[9] queue 7 vector 7
[ 3855.539703] set->mq_map[10] queue 0 vector 0
[ 3855.544197] set->mq_map[11] queue 1 vector 1
[ 3855.548670] set->mq_map[12] queue 2 vector 2
[ 3855.553144] set->mq_map[13] queue 3 vector 3
[ 3855.557630] set->mq_map[14] queue 4 vector 4
[ 3855.562105] set->mq_map[15] queue 5 vector 5

Each comp_vector has affinity to all numa-local cores:

[ 4010.002954] iw_cxgb4: comp_vector 0, irq 203 mask 0xff00
[ 4010.008606] iw_cxgb4: comp_vector 1, irq 204 mask 0xff00
[ 4010.014179] iw_cxgb4: comp_vector 2, irq 205 mask 0xff00
[ 4010.019741] iw_cxgb4: comp_vector 3, irq 206 mask 0xff00
[ 4010.025310] iw_cxgb4: comp_vector 4, irq 207 mask 0xff00
[ 4010.030881] iw_cxgb4: comp_vector 5, irq 208 mask 0xff00
[ 4010.036448] iw_cxgb4: comp_vector 6, irq 209 mask 0xff00
[ 4010.042012] iw_cxgb4: comp_vector 7, irq 210 mask 0xff00
[ 4010.047562] iw_cxgb4: comp_vector 8, irq 211 mask 0xff00
[ 4010.053103] iw_cxgb4: comp_vector 9, irq 212 mask 0xff00
[ 4010.058632] iw_cxgb4: comp_vector 10, irq 213 mask 0xff00
[ 4010.064248] iw_cxgb4: comp_vector 11, irq 214 mask 0xff00
[ 4010.069863] iw_cxgb4: comp_vector 12, irq 215 mask 0xff00
[ 4010.075462] iw_cxgb4: comp_vector 13, irq 216 mask 0xff00
[ 4010.081066] iw_cxgb4: comp_vector 14, irq 217 mask 0xff00
[ 4010.086676] iw_cxgb4: comp_vector 15, irq 218 mask 0xff00
[ 4010.092283] set->mq_map[0] queue 8 vector 8
[ 4010.096683] set->mq_map[1] queue 9 vector 9
[ 4010.101085] 

Re: [PATCH bpf] tools/bpf: fix bpf selftest test_cgroup_storage failure

2018-08-17 Thread Roman Gushchin
On Fri, Aug 17, 2018 at 08:54:15AM -0700, Yonghong Song wrote:
> The bpf selftest test_cgroup_storage failed in one of
> our production test servers.
>   # sudo ./test_cgroup_storage
>   Failed to create map: Operation not permitted
> 
> It turns out this is due to insufficient locked memory
> with system default 16KB.
> 
> Similar to other self tests, let us arm the process
> with unlimited locked memory. With this change,
> the test passed.
>   # sudo ./test_cgroup_storage
>   test_cgroup_storage:PASS
> 
> Fixes: 68cfa3ac6b8d ("selftests/bpf: add a cgroup storage test")
> Cc: Roman Gushchin 
> Signed-off-by: Yonghong Song 
> ---
>  tools/testing/selftests/bpf/test_cgroup_storage.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/testing/selftests/bpf/test_cgroup_storage.c 
> b/tools/testing/selftests/bpf/test_cgroup_storage.c
> index dc83fb2d3f27..4e196e3bfecf 100644
> --- a/tools/testing/selftests/bpf/test_cgroup_storage.c
> +++ b/tools/testing/selftests/bpf/test_cgroup_storage.c
> @@ -5,6 +5,7 @@
>  #include 
>  #include 
>  
> +#include "bpf_rlimit.h"
>  #include "cgroup_helpers.h"
>  
>  char bpf_log_buf[BPF_LOG_BUF_SIZE];
> -- 
> 2.17.1
> 

Acked-by: Roman Gushchin 

Thank you, Yonghong!


[PATCH bpf] tools/bpf: fix bpf selftest test_cgroup_storage failure

2018-08-17 Thread Yonghong Song
The bpf selftest test_cgroup_storage failed in one of
our production test servers.
  # sudo ./test_cgroup_storage
  Failed to create map: Operation not permitted

It turns out this is due to insufficient locked memory
with system default 16KB.

Similar to other self tests, let us arm the process
with unlimited locked memory. With this change,
the test passed.
  # sudo ./test_cgroup_storage
  test_cgroup_storage:PASS

Fixes: 68cfa3ac6b8d ("selftests/bpf: add a cgroup storage test")
Cc: Roman Gushchin 
Signed-off-by: Yonghong Song 
---
 tools/testing/selftests/bpf/test_cgroup_storage.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/bpf/test_cgroup_storage.c 
b/tools/testing/selftests/bpf/test_cgroup_storage.c
index dc83fb2d3f27..4e196e3bfecf 100644
--- a/tools/testing/selftests/bpf/test_cgroup_storage.c
+++ b/tools/testing/selftests/bpf/test_cgroup_storage.c
@@ -5,6 +5,7 @@
 #include 
 #include 
 
+#include "bpf_rlimit.h"
 #include "cgroup_helpers.h"
 
 char bpf_log_buf[BPF_LOG_BUF_SIZE];
-- 
2.17.1
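
For readers unfamiliar with the mechanism: the sketch below shows the kind of
call that lifts the locked-memory limit for a process. It is an illustration
under stated assumptions, not the contents of bpf_rlimit.h; the function
names are hypothetical and only setrlimit() and RLIMIT_MEMLOCK are real
interfaces.

```c
#include <stdbool.h>
#include <sys/resource.h>

/* True if a locked-memory limit is large enough for need bytes. */
bool memlock_limit_covers(rlim_t limit, rlim_t need)
{
	return limit == RLIM_INFINITY || limit >= need;
}

/*
 * Try to lift RLIMIT_MEMLOCK to unlimited for this process.
 * Returns 0 on success, -1 on failure (errno set by setrlimit());
 * raising the hard limit normally needs root or CAP_SYS_RESOURCE.
 */
int raise_memlock_limit(void)
{
	struct rlimit r = { RLIM_INFINITY, RLIM_INFINITY };

	return setrlimit(RLIMIT_MEMLOCK, &r);
}
```

A 16KB limit fails the memlock_limit_covers() check for anything larger than
16KB of locked memory, which would be consistent with the "Operation not
permitted" error shown in the commit message.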



Re: [endianness bug] cxgb4: mk_act_open_req() buggers ->{local,peer}_ip on big-endian hosts

2018-08-17 Thread Al Viro
On Fri, Aug 17, 2018 at 06:35:41PM +0530, Ganesh Goudar wrote:
> Thanks, Al. The patch looks good to me but it does not seem
> to be showing up in patchwork, should I resend the patch on
> your behalf to net tree ?

Umm...  I thought net-next had been closed until -rc1, hadn't
it?

Anyway, endianness cleanups and fixes of drivers/net/ethernet/chelsio
can be found in vfs.git #net-endian.chelsio; I was planning to post
that stuff on netdev after -rc1, but I would certainly appreciate
a look from somebody familiar with the hardware prior to that, assuming
you have time for that at the moment...  The stuff in there (it's
based off net/master):
  struct cxgb4_next_header .match_val/.match_mask/mask should be net-endian
  cxgb4: fix TC-PEDIT-related parts of cxgb4_tc_flower on big-endian
  cxgb4_tc_u32: trivial endianness annotations
  cxgb4: trivial endianness annotations
  libcxgb: trivial __percpu annotations
  libcxgb: trivial endianness annotations
  cxgb3: trivial endianness annotations
  cxgb3: don't get cute with !! and shifts in t3_prep_adapter()...
  [investigate][endianness bug] cxgb3: assigning htonl(11-bit value) to 
__be16 field is wrong
  cxgb: trivial endianness annotations

The first two are fixes (the second being the patch you've just replied to),
next-to-last might or might not be (see "[RFC] weirdness in
cxgb3_main.c:init_tp_parity()" posted to netdev a couple of weeks
ago), the rest is pure annotations.  Result is not entirely sparse-clean, but
fairly close to that:

drivers/net/ethernet/chelsio/cxgb3/t3_hw.c:681:67: warning: incorrect type in 
argument 3 (different base types)
drivers/net/ethernet/chelsio/cxgb3/t3_hw.c:681:67:expected restricted 
__le32 [usertype] data
drivers/net/ethernet/chelsio/cxgb3/t3_hw.c:681:67:got int
drivers/net/ethernet/chelsio/cxgb3/t3_hw.c:898:31: warning: incorrect type in 
assignment (different base types)
drivers/net/ethernet/chelsio/cxgb3/t3_hw.c:898:31:expected unsigned int 
[unsigned] [usertype] 
drivers/net/ethernet/chelsio/cxgb3/t3_hw.c:898:31:got restricted __be32 
[usertype] 
drivers/net/ethernet/chelsio/cxgb3/sge.c:2435:43: warning: incorrect type in 
assignment (different base types)
drivers/net/ethernet/chelsio/cxgb3/sge.c:2435:43:expected restricted __wsum 
[usertype] csum
drivers/net/ethernet/chelsio/cxgb3/sge.c:2435:43:got restricted __be32 
[assigned] [usertype] rss_hi
drivers/net/ethernet/chelsio/cxgb3/sge.c:2436:47: warning: incorrect type in 
assignment (different base types)
drivers/net/ethernet/chelsio/cxgb3/sge.c:2436:47:expected unsigned int 
[unsigned] [usertype] priority
drivers/net/ethernet/chelsio/cxgb3/sge.c:2436:47:got restricted __be32 
[assigned] [usertype] rss_lo
drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c:419:38: warning: incorrect 
type in assignment (different base types)
drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c:419:38:expected 
restricted __be32 [usertype] mask
drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c:419:38:got unsigned int
drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c:420:37: warning: incorrect 
type in assignment (different base types)
drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c:420:37:expected 
restricted __be32 [usertype] val
drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c:420:37:got unsigned int
drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c:449:22: warning: incorrect 
type in assignment (different base types)
drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c:449:22:expected 
restricted __be32 [usertype] mask
drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c:449:22:got unsigned int

Remaining tc_flower warnings are misannotated tcf_pedit_mask()/tcf_pedit_val()
(both are actually returning __be32)

sge.c ones are
/* Preserve the RSS info in csum & priority */
skb->csum = rss_hi;
skb->priority = rss_lo;
and I've no idea whether it's a problem or not - ->csum is almost
certainly being abused there, and it looks like ->priority also
is...  I don't know - it's certainly used as host-endian in the
same driver (see e.g. queue_set() and is_ctrl_pkt()), but those are
on TX path and this is RX...  I'm not familiar enough with the
code to tell what's going on here - what *is* the consumer of
the data stored there?  cxgb3_offload.c get_hwtid() and get_opcode()?
If so (and if that's all there is), why not simply something like
skb->priority = (ntohl(rss_lo) & 0x00) | G_OPCODE(ntohl(rss_hi));
before passing skb to rx_offload(), with
static inline u32 get_hwtid(struct sk_buff *skb)
{
return skb->priority >> 8;
}
static inline u32 get_opcode(struct sk_buff *skb)
{
return skb->priority & 0xff;
}
and don't bother touching ->csum...  Again, I'm not familiar with hardware
*or* the driver; I've no idea how much is e.g. shared with the infiniband
or scsi 

Re: [PATCH] net: dsa: add support for ksz9897 ethernet switch

2018-08-17 Thread Rob Herring
Hi, this email is from Rob's (experimental) review bot. I found a couple
of common problems with your patch. Please see below.

On Wed, 15 Aug 2018 16:51:23 +0100, Lad Prabhakar wrote:
> From: "Lad, Prabhakar" 
> 
> ksz9477 is superset of ksz9xx series, driver just works
> out of the box for ksz9897 chip with this patch.
> 
> Signed-off-by: Lad, Prabhakar 

The preferred subject prefix is "dt-bindings: <binding dir>: ...".

> ---
>  Documentation/devicetree/bindings/net/dsa/ksz.txt | 4 +++-
>  drivers/net/dsa/microchip/ksz_common.c| 9 +
>  drivers/net/dsa/microchip/ksz_spi.c   | 1 +
>  3 files changed, 13 insertions(+), 1 deletion(-)
> 

DT bindings (including binding headers) should be a separate patch. See
Documentation/devicetree/bindings/submitting-patches.txt.


Re: [PATCH iproute2-next] iproute_lwtunnel: allow specifying 'src' for 'encap ip' / 'encap ip6'

2018-08-17 Thread Stephen Hemminger
On Fri, 17 Aug 2018 10:31:34 +0300
Shmulik Ladkani  wrote:

> This allows the user to specify the LWTUNNEL_IP_SRC/LWTUNNEL_IP6_SRC
> when setting an lwtunnel encapsulation route.
> 
> Signed-off-by: Shmulik Ladkani 
> ---
>  ip/iproute_lwtunnel.c | 22 --
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/ip/iproute_lwtunnel.c b/ip/iproute_lwtunnel.c
> index 740da7c6..20d5545c 100644
> --- a/ip/iproute_lwtunnel.c
> +++ b/ip/iproute_lwtunnel.c
> @@ -671,7 +671,7 @@ static int parse_encap_mpls(struct rtattr *rta, size_t 
> len,
>  static int parse_encap_ip(struct rtattr *rta, size_t len,
> int *argcp, char ***argvp)
>  {
> - int id_ok = 0, dst_ok = 0, tos_ok = 0, ttl_ok = 0;
> + int id_ok = 0, dst_ok = 0, src_ok = 0, tos_ok = 0, ttl_ok = 0;
>   char **argv = *argvp;
>   int argc = *argcp;
>  
> @@ -694,6 +694,15 @@ static int parse_encap_ip(struct rtattr *rta, size_t len,
>   get_addr(&addr, *argv, AF_INET);
>   rta_addattr_l(rta, len, LWTUNNEL_IP_DST,
> &addr.data, addr.bytelen);
> + } else if (strcmp(*argv, "src") == 0) {
> + inet_prefix addr;
> +
> + NEXT_ARG();
> + if (src_ok++)
> + duparg2("src", *argv);
> + get_addr(&addr, *argv, AF_INET);
> + rta_addattr_l(rta, len, LWTUNNEL_IP_SRC,
> +   &addr.data, addr.bytelen);
>   } else if (strcmp(*argv, "tos") == 0) {
>   __u32 tos;
>  
> @@ -805,7 +814,7 @@ static int parse_encap_ila(struct rtattr *rta, size_t len,
>  static int parse_encap_ip6(struct rtattr *rta, size_t len,
>  int *argcp, char ***argvp)
>  {
> - int id_ok = 0, dst_ok = 0, tos_ok = 0, ttl_ok = 0;
> + int id_ok = 0, dst_ok = 0, src_ok = 0, tos_ok = 0, ttl_ok = 0;
>   char **argv = *argvp;
>   int argc = *argcp;
>  
> @@ -828,6 +837,15 @@ static int parse_encap_ip6(struct rtattr *rta, size_t 
> len,
>   get_addr(&addr, *argv, AF_INET6);
>   rta_addattr_l(rta, len, LWTUNNEL_IP6_DST,
> &addr.data, addr.bytelen);
> + } else if (strcmp(*argv, "src") == 0) {
> + inet_prefix addr;
> +
> + NEXT_ARG();
> + if (src_ok++)
> + duparg2("src", *argv);
> + get_addr(&addr, *argv, AF_INET6);
> + rta_addattr_l(rta, len, LWTUNNEL_IP6_SRC,
> +   &addr.data, addr.bytelen);
>   } else if (strcmp(*argv, "tc") == 0) {
>   __u32 tc;
>  

If you accept an attribute on input you need to parse it and display it the
same way in the show command.
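
As a rough idea of what the display side has to produce, here is a minimal
sketch that turns raw address bytes back into the "src <addr>" form accepted
on input. format_encap_src() is a hypothetical helper for illustration only;
the real show path would parse the netlink attributes and use iproute2's own
print helpers, which are not shown here.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/*
 * Format a tunnel source address as "src <addr> " into buf.
 * family is AF_INET or AF_INET6; addr points at the raw address bytes.
 * Returns 0 on success, -1 if the address cannot be formatted or
 * buf is too small.
 */
int format_encap_src(int family, const void *addr, char *buf, size_t len)
{
	char text[INET6_ADDRSTRLEN];
	int n;

	if (!inet_ntop(family, addr, text, sizeof(text)))
		return -1;
	n = snprintf(buf, len, "src %s ", text);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```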


Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-08-17 Thread Peter Robinson
On Fri, Aug 17, 2018 at 1:40 PM, Daniel Borkmann  wrote:
> On 08/17/2018 02:25 PM, Peter Robinson wrote:
>> On Thu, Aug 16, 2018 at 11:58 PM, Russell King - ARM Linux
>>  wrote:
>>> On Thu, Aug 16, 2018 at 10:35:16PM +0200, Marc Haber wrote:
>>>> On Mon, Jun 25, 2018 at 05:41:27PM +0100, Peter Robinson wrote:
>>>>> So with that and the other fix there was no improvement, with those
>>>>> and the BPF JIT disabled it works, I'm not sure if the two patches
>>>>> have any effect with the JIT disabled though.
>>>>
>>>> I can confirm the crash with the released 4.18.1 on Banana Pi, and I can
>>>> also confirm that disabling BPF JIT makes the Banana Pi work again.,
>>>
>>> I'm afraid that the information in the crash dumps is insufficient
>>> to be able to work very much out about these crashes.
>>>
>>> We need a recipe (kernel configuration and what userspace is doing)
>>> so that it's possible to recreate the crash, or we need responses
>>> to requests for information - I requested the disassembly of
>>> sk_filter_trim_cap and the BPF code dump via setting a sysctl back
>>> in early July.  Without this, as I say, I don't see how this problem
>>> can be progressed.
>>
>> I can provide a kernel config [1] but I've not had enough time to sit
>> down and get the rest of the stuff and debug it due to a combination
>> of travel and other priorities.
>
> Did you get a chance to try latest kernel from Linus' tree [1] from last
> few days to see whether the issue is still persistent? There have been
> a number of improvements, bit strange why e.g. Russell didn't run into
> it while others have, hmm. Perhaps due to EABI vs non EABI.

I haven't had a chance to try anything from the 4.19 merge window as
yet, I'm traveling this week so it was on the list for next week to
try.

> [1] git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>
>>> If the problem is at boot, one way to set the sysctl would be to
>>> hack the kernel and explicitly initialise the sysctl to '2', or
>>> boot with init=/bin/sh, then manually mount /proc, set the sysctl,
>>> and then "exec /sbin/init" from that shell.  (Remember there's no
>>> job control in that shell, so ^z, ^c, etc do not work.)
>>
>> It starts to happen in the early kernel boot long before we get to any
>> userspace across a number of ARMv7 devices (RPi2/3, BeagleBone and
>> AllWinner H3 based devices at least).
>>
>> [1] https://pbrobinson.fedorapeople.org/kernel-armv7hl.config
>
> I'd have one potential bug suspicion, for the 4.18 one you were trying,
> could you run with the below patch to see whether it would help?

I will try and get someone to test that today, thanks

> Thanks,
> Daniel
>
> diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
> index f6a62ae..c864f6b 100644
> --- a/arch/arm/net/bpf_jit_32.c
> +++ b/arch/arm/net/bpf_jit_32.c
> @@ -238,7 +238,7 @@ static void jit_fill_hole(void *area, unsigned int size)
>  #define STACK_SIZE ALIGN(_STACK_SIZE, STACK_ALIGNMENT)
>
>  /* Get the offset of eBPF REGISTERs stored on scratch space. */
> -#define STACK_VAR(off) (STACK_SIZE - off)
> +#define STACK_VAR(off) (STACK_SIZE - off - 4)
>
>  #if __LINUX_ARM_ARCH__ < 7
>


Re: mv88e6xxx: question: can switch irq be shared?

2018-08-17 Thread Andrew Lunn
On Fri, Aug 17, 2018 at 11:30:55AM +0200, Marek Behún wrote:
> Hello, I am wondering if the main device irq in
> dsa/mv88e6xxx/chip.c can be requested as shared (see patch below).

This probably works O.K., but it's not something anybody else has
done. So there could be some hidden issues.

  Andrew


[PATCH ethtool] ethtool: document WoL filters option also in help message

2018-08-17 Thread Michal Kubecek
Commit eff0bb337223 ("ethtool: Add support for WAKE_FILTER (WoL using
filters)") added option "f" for wake on lan and documented it in man page
but not in the output of "ethtool --help".

Signed-off-by: Michal Kubecek 
---
 ethtool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ethtool.c b/ethtool.c
index aa2bbe9e4c65..e8b7703293d2 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -5069,7 +5069,7 @@ static const struct option {
  " [ advertise %x ]\n"
  " [ phyad %d ]\n"
  " [ xcvr internal|external ]\n"
- " [ wol p|u|m|b|a|g|s|d... ]\n"
+ " [ wol p|u|m|b|a|g|s|f|d... ]\n"
  " [ sopass %x:%x:%x:%x:%x:%x ]\n"
  " [ msglvl %d | msglvl type on|off ... ]\n" },
{ "-a|--show-pause", 1, do_gpause, "Show pause options" },
-- 
2.18.0
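
For context, the letters in the "wol" help string select wake-up flags. The
sketch below shows one way such a string could be turned into a bitmask. It
is not ethtool's parser: parse_wol_letters() and the SK_ prefixed names are
made up for this example, with bit values chosen to match the WAKE_* flags in
linux/ethtool.h as I understand them.

```c
/* Bit values intended to match the WAKE_* flags in linux/ethtool.h. */
#define SK_WAKE_PHY         (1u << 0)	/* p */
#define SK_WAKE_UCAST       (1u << 1)	/* u */
#define SK_WAKE_MCAST       (1u << 2)	/* m */
#define SK_WAKE_BCAST       (1u << 3)	/* b */
#define SK_WAKE_ARP         (1u << 4)	/* a */
#define SK_WAKE_MAGIC       (1u << 5)	/* g */
#define SK_WAKE_MAGICSECURE (1u << 6)	/* s */
#define SK_WAKE_FILTER      (1u << 7)	/* f */

/*
 * Parse a string such as "ug" or "f" into a wake-up bitmask.
 * 'd' clears all bits collected so far. Returns 0 on success and
 * stores the result in *mask; returns -1 on an unknown letter and
 * leaves *mask untouched.
 */
int parse_wol_letters(const char *s, unsigned int *mask)
{
	unsigned int m = 0;

	for (; *s; s++) {
		switch (*s) {
		case 'p': m |= SK_WAKE_PHY; break;
		case 'u': m |= SK_WAKE_UCAST; break;
		case 'm': m |= SK_WAKE_MCAST; break;
		case 'b': m |= SK_WAKE_BCAST; break;
		case 'a': m |= SK_WAKE_ARP; break;
		case 'g': m |= SK_WAKE_MAGIC; break;
		case 's': m |= SK_WAKE_MAGICSECURE; break;
		case 'f': m |= SK_WAKE_FILTER; break;
		case 'd': m = 0; break;
		default: return -1;
		}
	}
	*mask = m;
	return 0;
}
```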



Re: [endianness bug] cxgb4: mk_act_open_req() buggers ->{local,peer}_ip on big-endian hosts

2018-08-17 Thread Ganesh Goudar
Thanks, Al. The patch looks good to me but it does not seem
to be showing up in patchwork, should I resend the patch on
your behalf to net tree ?

Ganesh


Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-08-17 Thread Daniel Borkmann
On 08/17/2018 02:25 PM, Peter Robinson wrote:
> On Thu, Aug 16, 2018 at 11:58 PM, Russell King - ARM Linux
>  wrote:
>> On Thu, Aug 16, 2018 at 10:35:16PM +0200, Marc Haber wrote:
>>> On Mon, Jun 25, 2018 at 05:41:27PM +0100, Peter Robinson wrote:
>>>> So with that and the other fix there was no improvement, with those
>>>> and the BPF JIT disabled it works, I'm not sure if the two patches
>>>> have any effect with the JIT disabled though.
>>>
>>> I can confirm the crash with the released 4.18.1 on Banana Pi, and I can
>>> also confirm that disabling BPF JIT makes the Banana Pi work again.,
>>
>> I'm afraid that the information in the crash dumps is insufficient
>> to be able to work very much out about these crashes.
>>
>> We need a recipe (kernel configuration and what userspace is doing)
>> so that it's possible to recreate the crash, or we need responses
>> to requests for information - I requested the disassembly of
>> sk_filter_trim_cap and the BPF code dump via setting a sysctl back
>> in early July.  Without this, as I say, I don't see how this problem
>> can be progressed.
> 
> I can provide a kernel config [1] but I've not had enough time to sit
> down and get the rest of the stuff and debug it due to a combination
> of travel and other priorities.

Did you get a chance to try latest kernel from Linus' tree [1] from last
few days to see whether the issue is still persistent? There have been
a number of improvements, bit strange why e.g. Russell didn't run into
it while others have, hmm. Perhaps due to EABI vs non EABI.

[1] git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

>> If the problem is at boot, one way to set the sysctl would be to
>> hack the kernel and explicitly initialise the sysctl to '2', or
>> boot with init=/bin/sh, then manually mount /proc, set the sysctl,
>> and then "exec /sbin/init" from that shell.  (Remember there's no
>> job control in that shell, so ^z, ^c, etc do not work.)
> 
> It starts to happen in the early kernel boot long before we get to any
> userspace across a number of ARMv7 devices (RPi2/3, BeagleBone and
> AllWinner H3 based devices at least).
> 
> [1] https://pbrobinson.fedorapeople.org/kernel-armv7hl.config

I'd have one potential bug suspicion, for the 4.18 one you were trying,
could you run with the below patch to see whether it would help?

Thanks,
Daniel

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index f6a62ae..c864f6b 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -238,7 +238,7 @@ static void jit_fill_hole(void *area, unsigned int size)
 #define STACK_SIZE ALIGN(_STACK_SIZE, STACK_ALIGNMENT)

 /* Get the offset of eBPF REGISTERs stored on scratch space. */
-#define STACK_VAR(off) (STACK_SIZE - off)
+#define STACK_VAR(off) (STACK_SIZE - off - 4)

 #if __LINUX_ARM_ARCH__ < 7



[PATCH]ipv6: multicast: In mld_send_cr function moving read lock to second for loop

2018-08-17 Thread Guruswamy Basavaiah
In mld_send_cr(), the first loop is already protected by
idev->mc_lock and does not need the idev->lock read lock, so take the
read lock only before the second for loop.

Signed-off-by: Guruswamy Basavaiah 
---
 net/ipv6/mcast.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index d64ee7e..d8e7e15 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -1860,7 +1860,6 @@ static void mld_send_cr(struct inet6_dev *idev)
 struct sk_buff *skb = NULL;
 int type, dtype;

-read_lock_bh(&idev->lock);
 spin_lock(&idev->mc_lock);

 /* deleted MCA's */
@@ -1897,6 +1896,7 @@ static void mld_send_cr(struct inet6_dev *idev)
 }
 spin_unlock(&idev->mc_lock);

+read_lock_bh(&idev->lock);
 /* change recs */
 for (pmc = idev->mc_list; pmc; pmc = pmc->next) {
 spin_lock_bh(&pmc->mca_lock);
-- 
2.9.5


Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1

2018-08-17 Thread Peter Robinson
On Thu, Aug 16, 2018 at 11:58 PM, Russell King - ARM Linux
 wrote:
> On Thu, Aug 16, 2018 at 10:35:16PM +0200, Marc Haber wrote:
>> On Mon, Jun 25, 2018 at 05:41:27PM +0100, Peter Robinson wrote:
>> > So with that and the other fix there was no improvement, with those
>> > and the BPF JIT disabled it works, I'm not sure if the two patches
>> > have any effect with the JIT disabled though.
>>
>> I can confirm the crash with the released 4.18.1 on Banana Pi, and I can
>> also confirm that disabling BPF JIT makes the Banana Pi work again.,
>
> Hi,
>
> I'm afraid that the information in the crash dumps is insufficient
> to be able to work very much out about these crashes.
>
> We need a recipe (kernel configuration and what userspace is doing)
> so that it's possible to recreate the crash, or we need responses
> to requests for information - I requested the disassembly of
> sk_filter_trim_cap and the BPF code dump via setting a sysctl back
> in early July.  Without this, as I say, I don't see how this problem
> can be progressed.

I can provide a kernel config [1] but I've not had enough time to sit
down and get the rest of the stuff and debug it due to a combination
of travel and other priorities.

> If the problem is at boot, one way to set the sysctl would be to
> hack the kernel and explicitly initialise the sysctl to '2', or
> boot with init=/bin/sh, then manually mount /proc, set the sysctl,
> and then "exec /sbin/init" from that shell.  (Remember there's no
> job control in that shell, so ^z, ^c, etc do not work.)

It starts to happen in the early kernel boot long before we get to any
userspace across a number of ARMv7 devices (RPi2/3, BeagleBone and
AllWinner H3 based devices at least).

[1] https://pbrobinson.fedorapeople.org/kernel-armv7hl.config


Re: [PATCH 0/2] net/sched: Add hardware specific counters to TC actions

2018-08-17 Thread Jakub Kicinski
On Thu, 16 Aug 2018 14:02:44 +0200, Eelco Chaudron wrote:
> On 11 Aug 2018, at 21:06, David Miller wrote:
> 
> > From: Jakub Kicinski 
> > Date: Thu, 9 Aug 2018 20:26:08 -0700
> >  
> >> It is not immediately clear why this is needed.  The memory and
> >> updating two sets of counters won't come for free, so perhaps a
> >> stronger justification than troubleshooting is due? :S
> >>
> >> Netdev has counters for fallback vs forwarded traffic, so you'd know
> >> that traffic hits the SW datapath, plus the rules which are in_hw 
> >> will
> >> most likely not match as of today for flower (assuming correctness).  
> 
> I strongly believe that these counters are a requirement for a mixed 
> software/hardware (flow) based forwarding environment. The global 
> counters will not help much here as you might have chosen to have 
> certain traffic forwarded by software.
> 
> These counters are probably the only option you have to figure out why 
> forwarding is not as fast as expected, and you want to blame the TC 
> offload NIC.

The suggested debugging flow would be:
 (1) check whether the global counters for fallback are incrementing;
 (2) find a flow with high stats but no in_hw flag set.

The in_hw indication should be sufficient in most cases (unless there
are shared blocks between netdevs of different ASICs...).

> >> I'm slightly concerned about potential performance impact, would you
> >> be able to share some numbers for non-trivial number of flows (100k
> >> active?)?  
> >
> > Agreed, features used for diagnostics cannot have a harmful penalty 
> > for fast path performance.  
> 
> Fast path performance is not affected as these counters are not 
> incremented there. They are only incremented by the nic driver when they 
> gather their statistics from hardware.

Not by much, you are adding state to performance-critical structures,
though, for what is effectively debugging purposes.  

I was mostly talking about the HW offload stat updates (sorry for not
being clear).

We can have some hundreds of thousands active offloaded flows, each of
them can have multiple actions, and stats have to be updated multiple
times per second and dumped probably around once a second, too.  On a
busy system the stats will get evicted from cache between each round.  

But I'm speculating; let's see if I can get some numbers on it (if you
could get some too, that would be great!).

> However, the flow creation is effected, as this is where the extra 
> memory gets allocated. I had done some 40K flow tests before and did not 
> see any noticeable change in flow insertion performance. As requested by 
> Jakub I did it again for 100K (and threw a Netronome blade in the mix 
> ;). I used Marcelo’s test tool, 
> https://github.com/marceloleitner/perf-flower.git.
> 
> Here are the numbers (time in seconds) for 10 runs in sorted order:
> 
> +-------------+----------------+
> | Base_kernel | Change_applied |
> +-------------+----------------+
> |    5.684019 |       5.656388 |
> |    5.699658 |       5.674974 |
> |    5.725220 |       5.722107 |
> |    5.739285 |       5.839855 |
> |    5.748088 |       5.865238 |
> |    5.766231 |       5.873913 |
> |    5.842264 |       5.909259 |
> |    5.902202 |       5.912685 |
> |    5.905391 |       5.947138 |
> |    6.032997 |       5.997779 |
> +-------------+----------------+
> 
> I guess the deviation is in the userspace part, which is where in real 
> life flows get added anyway.
> 
> Let me know if more is unclear.
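
To put the ten runs in the table above into one number, the means of both
columns can be compared. The sketch below only does that arithmetic on the
quoted values; the helper names are made up for illustration.

```c
#include <stddef.h>

/* Arithmetic mean of n samples (0.0 for an empty set). */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return n ? sum / (double)n : 0.0;
}

/* Insertion times in seconds, copied from the table above. */
static const double base_kernel[10] = {
	5.684019, 5.699658, 5.725220, 5.739285, 5.748088,
	5.766231, 5.842264, 5.902202, 5.905391, 6.032997,
};
static const double change_applied[10] = {
	5.656388, 5.674974, 5.722107, 5.839855, 5.865238,
	5.873913, 5.909259, 5.912685, 5.947138, 5.997779,
};

/* Relative change of the mean in percent (positive means slower). */
static double mean_delta_percent(void)
{
	double b = mean(base_kernel, 10);
	double c = mean(change_applied, 10);

	return (c - b) / b * 100.0;
}
```

The means come out at roughly 5.80 s and 5.84 s, a difference of about 0.6%,
which is small compared with the spread between individual runs.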


Re: [iproute PATCH 07/10] ip: Add missing -M flag to help text

2018-08-17 Thread Phil Sutter
Hi Stephen,
On Thu, Aug 16, 2018 at 12:27:59PM +0200, Phil Sutter wrote:
> Signed-off-by: Phil Sutter 
> ---
>  ip/ip.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Seems like this patch wasn't applied. Did you perhaps drop it by
accident?

Cheers, Phil


[PATCH net-next] net: dsa: mv88e6xxx: Share main switch IRQ

2018-08-17 Thread Marek Behún
On some boards the interrupt can be shared between multiple devices.
For example on Turris Mox the interrupt is shared between all switches.

Signed-off-by: Marek Behun 
---
 drivers/net/dsa/mv88e6xxx/chip.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 8da3d39e3218..b57f5403982a 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -434,7 +434,7 @@ static int mv88e6xxx_g1_irq_setup(struct mv88e6xxx_chip *chip)
 
err = request_threaded_irq(chip->irq, NULL,
   mv88e6xxx_g1_irq_thread_fn,
-  IRQF_ONESHOT,
+  IRQF_ONESHOT | IRQF_SHARED,
   dev_name(chip->dev), chip);
if (err)
mv88e6xxx_g1_irq_free_common(chip);
-- 
2.16.4
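
When a line is shared, each handler on it has to decide whether its own device
raised the interrupt. The following is a userspace model of that idea, not
kernel code: the enum stands in for the kernel's irqreturn_t values and the
chip struct is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the kernel's irqreturn_t values (model only). */
enum model_irqreturn { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

/* Hypothetical switch chip state. */
struct model_chip {
	bool irq_pending;	/* true while this chip asserts the line */
	unsigned int handled;	/* number of interrupts serviced */
};

/* Handler for one chip: claim the interrupt only if this chip raised it. */
static enum model_irqreturn model_chip_irq(struct model_chip *chip)
{
	if (!chip->irq_pending)
		return MODEL_IRQ_NONE;	/* another device on the line */
	chip->irq_pending = false;	/* acknowledge the source */
	chip->handled++;
	return MODEL_IRQ_HANDLED;
}

/* Model of a shared line: run every handler, count how many claimed it. */
static unsigned int model_dispatch_shared(struct model_chip *chips, size_t n)
{
	unsigned int claimed = 0;

	for (size_t i = 0; i < n; i++)
		if (model_chip_irq(&chips[i]) == MODEL_IRQ_HANDLED)
			claimed++;
	return claimed;
}
```

With three switches on one line and two of them pending, one dispatch round
services exactly those two and leaves the third untouched.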



Re: virtio_net failover and initramfs (was: Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework)

2018-08-17 Thread Harald Hoyer
On 17.08.2018 11:51, Harald Hoyer wrote:
> On 16.08.2018 00:17, Siwei Liu wrote:
>> On Wed, Aug 15, 2018 at 12:05 PM, Samudrala, Sridhar
>>  wrote:
>>> On 8/14/2018 5:03 PM, Siwei Liu wrote:
>>>>
>>>> Are we sure all userspace apps skip and ignore slave interfaces by
>>>> just looking at "IFLA_MASTER" attribute?
>>>>
>>>> When STANDBY is enabled on virtio-net, a failover master interface
>>>> will appear, which automatically enslaves the virtio device. But it is
>>>> found out that iSCSI (or any network boot) cannot boot strap over the
>>>> new failover interface together with a standby virtio (without any VF
>>>> or PT device in place).
>>>>
>>>> Dracut (initramfs) ends up with timeout and dropping into emergency shell:
>>>>
>>>> [  228.170425] dracut-initqueue[377]: Warning: dracut-initqueue
>>>> timeout - starting timeout scripts
>>>> [  228.171788] dracut-initqueue[377]: Warning: Could not boot.
>>>>   Starting Dracut Emergency Shell...
>>>> Generating "/run/initramfs/rdsosreport.txt"
>>>> Entering emergency mode. Exit the shell to continue.
>>>> Type "journalctl" to view system logs.
>>>> You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or
>>>> /boot
>>>> after mounting them and attach it to a bug report.
>>>> dracut:/# ip l sh
>>>> 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN
>>>> mode DEFAULT group default qlen 1000
>>>>  link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>>> 2: eth0:  mtu 1500 qdisc noqueue
>>>> state UP mode DEFAULT group default qlen 1000
>>>>  link/ether 9a:46:22:ae:33:54 brd ff:ff:ff:ff:ff:ff\
>>>> 3: eth1:  mtu 1500 qdisc pfifo_fast
>>>> master eth0 state UP mode DEFAULT group default qlen 1000
>>>>  link/ether 9a:46:22:ae:33:54 brd ff:ff:ff:ff:ff:ff
>>>> dracut:/#
>>>>
>>>> If changing dracut code to ignore eth1 (with IFLA_MASTER attr),
>>>> network boot starts to work.
>>>
>>>
>>> Does dracut by default try to use all the interfaces that are UP?
>>>
>> Yes. The specific dracut cmdline of our case is "ip=dhcp
>> netroot=iscsi:... ", but it's not specific to iscsi boot. And because
>> of same MAC address for failover and standby, while dracut tries to
>> run DHCP on all interfaces that are up it eventually gets same route
>> for each interface. Those conflict route entries kill off the network
>> connection.
>>
>>>
>>>>
>>>> The reason is that dracut has its own means to differentiate virtual
>>>> interfaces for network boot: it does not look at IFLA_MASTER and
>>>> ignores slave interfaces. Instead, users have to provide explicit
>>>> option e.g. bond=eth0,eth1 in the boot line, then dracut would know
>>>> the config and ignore the slave interfaces.
>>>
>>>
>>> Isn't it possible to specify the interface that should be used for network
>>> boot?
>> As I understand it, one can only specify interface name for running
>> DHCP but not select interface for network boot.  We want DHCP to run
>> on every NIC that is up (excluding the enslaved interfaces), and only
>> one of them can get a route entry to the network boot server (e.g.
>> iSCSI target).
>>
>>>
>>>
>>>>
>>>> However, with automatic creation of failover interface that assumption
>>>> is no longer true. Can we change dracut to ignore all slave interface
>>>> by checking  IFLA_MASTER? I don't think so. It has a large impact to
>>>> existing configs.
>>>
>>>
>>> What is the issue with checking for IFLA_MASTER? I guess this is used with
>>> team/bonding setups.
>> That should be discussed within and determined by the dracut
>> community. But the current dracut code doesn't check IFLA_MASTER for
>> team or bonding specifically. I guess this change might have broader
>> impact to existing userspace that might be already relying on the
>> current behaviour.
>>
>> Thanks,
>> -Siwei
> 
> Is there a sysfs flag for IFF_SLAVE? Or any "ip" output I can use to detect, 
> that it is a IFF_SLAVE?
> 

Oh, it's the other way around.. dracut should ignore "master" (eth1).

Can the master enslave the "eth0", if it is already "UP" and busy later on?



Re: virtio_net failover and initramfs (was: Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework)

2018-08-17 Thread Harald Hoyer
On 16.08.2018 00:17, Siwei Liu wrote:
> On Wed, Aug 15, 2018 at 12:05 PM, Samudrala, Sridhar
>  wrote:
>> On 8/14/2018 5:03 PM, Siwei Liu wrote:
>>>
>>> Are we sure all userspace apps skip and ignore slave interfaces by
>>> just looking at "IFLA_MASTER" attribute?
>>>
>>> When STANDBY is enabled on virtio-net, a failover master interface
>>> will appear, which automatically enslaves the virtio device. But it is
>>> found out that iSCSI (or any network boot) cannot boot strap over the
>>> new failover interface together with a standby virtio (without any VF
>>> or PT device in place).
>>>
>>> Dracut (initramfs) ends up with timeout and dropping into emergency shell:
>>>
>>> [  228.170425] dracut-initqueue[377]: Warning: dracut-initqueue
>>> timeout - starting timeout scripts
>>> [  228.171788] dracut-initqueue[377]: Warning: Could not boot.
>>>   Starting Dracut Emergency Shell...
>>> Generating "/run/initramfs/rdsosreport.txt"
>>> Entering emergency mode. Exit the shell to continue.
>>> Type "journalctl" to view system logs.
>>> You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or
>>> /boot
>>> after mounting them and attach it to a bug report.
>>> dracut:/# ip l sh
>>> 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN
>>> mode DEFAULT group default qlen 1000
>>>  link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>> 2: eth0:  mtu 1500 qdisc noqueue
>>> state UP mode DEFAULT group default qlen 1000
>>>  link/ether 9a:46:22:ae:33:54 brd ff:ff:ff:ff:ff:ff\
>>> 3: eth1:  mtu 1500 qdisc pfifo_fast
>>> master eth0 state UP mode DEFAULT group default qlen 1000
>>>  link/ether 9a:46:22:ae:33:54 brd ff:ff:ff:ff:ff:ff
>>> dracut:/#
>>>
>>> If changing dracut code to ignore eth1 (with IFLA_MASTER attr),
>>> network boot starts to work.
>>
>>
>> Does dracut by default try to use all the interfaces that are UP?
>>
> Yes. The specific dracut cmdline of our case is "ip=dhcp
> netroot=iscsi:... ", but it's not specific to iscsi boot. And because
> of same MAC address for failover and standby, while dracut tries to
> run DHCP on all interfaces that are up it eventually gets same route
> for each interface. Those conflict route entries kill off the network
> connection.
> 
>>
>>>
>>> The reason is that dracut has its own means to differentiate virtual
>>> interfaces for network boot: it does not look at IFLA_MASTER and
>>> ignores slave interfaces. Instead, users have to provide explicit
>>> option e.g. bond=eth0,eth1 in the boot line, then dracut would know
>>> the config and ignore the slave interfaces.
>>
>>
>> Isn't it possible to specify the interface that should be used for network
>> boot?
> As I understand it, one can only specify interface name for running
> DHCP but not select interface for network boot.  We want DHCP to run
> on every NIC that is up (excluding the enslaved interfaces), and only
> one of them can get a route entry to the network boot server (e.g.
> iSCSI target).
> 
>>
>>
>>>
>>> However, with automatic creation of failover interface that assumption
>>> is no longer true. Can we change dracut to ignore all slave interface
>>> by checking  IFLA_MASTER? I don't think so. It has a large impact to
>>> existing configs.
>>
>>
>> What is the issue with checking for IFLA_MASTER? I guess this is used with
>> team/bonding setups.
> That should be discussed within and determined by the dracut
> community. But the current dracut code doesn't check IFLA_MASTER for
> team or bonding specifically. I guess this change might have broader
> impact to existing userspace that might be already relying on the
> current behaviour.
> 
> Thanks,
> -Siwei

Is there a sysfs flag for IFF_SLAVE? Or any "ip" output I can use to detect, 
that it is a IFF_SLAVE?
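
On the sysfs question: each netdev exposes its interface flags as a hex string
in /sys/class/net/<ifname>/flags, so the IFF_SLAVE bit can be tested from
there. The sketch below assumes that layout and defines the bit value locally
(0x800, as in linux/if.h). Whether the failover code actually sets IFF_SLAVE
on the standby device is not established in this thread, so treat this as one
possible check rather than a confirmed answer. The function names are made up.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Value of IFF_SLAVE in linux/if.h, defined locally for the sketch. */
#define SKETCH_IFF_SLAVE 0x800UL

/*
 * Parse the text of a sysfs "flags" file (hex, e.g. "0x1903") and test
 * the slave bit. Returns 1 if set, 0 if clear, -1 on parse error.
 */
static int flags_text_is_slave(const char *text)
{
	char *end;
	unsigned long flags;

	errno = 0;
	flags = strtoul(text, &end, 16);
	if (errno || end == text)
		return -1;
	return (flags & SKETCH_IFF_SLAVE) ? 1 : 0;
}

/* Read /sys/class/net/<ifname>/flags; returns 1, 0, or -1 if unreadable. */
static int iface_is_slave(const char *ifname)
{
	char path[128], buf[32];
	FILE *f;
	int len, ret = -1;

	len = snprintf(path, sizeof(path), "/sys/class/net/%s/flags", ifname);
	if (len < 0 || (size_t)len >= sizeof(path))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = flags_text_is_slave(buf);
	fclose(f);
	return ret;
}
```

An alternative that matches the "master eth0" field in the ip output quoted
above would be to test for a "master" symlink in the same sysfs directory;
that is likewise an assumption to verify on the target kernel.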


mv88e6xxx: question: can switch irq be shared?

2018-08-17 Thread Marek Behún
Hello, I am wondering if the main device irq in
dsa/mv88e6xxx/chip.c can be requested as shared (see patch below).

The reason is that our board is wired so that irqs from all switches
come to the same gpio.

Marek
---
 drivers/net/dsa/mv88e6xxx/chip.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index d90656e415b0..1caaa09e391e 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -424,7 +424,8 @@ static int mv88e6xxx_g1_irq_setup(struct mv88e6xxx_chip *chip)
 
err = request_threaded_irq(chip->irq, NULL,
   mv88e6xxx_g1_irq_thread_fn,
-  IRQF_ONESHOT | IRQF_TRIGGER_FALLING,
+  IRQF_ONESHOT | IRQF_TRIGGER_FALLING
+  | IRQF_SHARED,
   dev_name(chip->dev), chip);
if (err)
mv88e6xxx_g1_irq_free_common(chip);



[regression] r8169 without realtek_phy

2018-08-17 Thread Jouni Mettälä
There is a network regression for me: 4.18 was good, 4.18+ is bad. There
were some PHY changes in the r8169 driver. Fortunately, adding
CONFIG_REALTEK_PHY=m to the kernel config fixed the regression.

Should r8169 depend on realtek_phy? Does that break something else?

Network doesn't work with Generic PHY (output of dmesg)
Generic PHY r8169-300:00: attached PHY driver [Generic PHY]
(mii_bus:phy_addr=r8169-300:00, irq=IGNORE)

When realtek_phy is compiled, r8169 automatically uses it.
RTL8211B Gigabit Ethernet r8169-300:00: attached PHY driver [RTL8211B
Gigabit Ethernet] (mii_bus:phy_addr=r8169-300:00, irq=IGNORE)

Here is Ethernet controller's lspci for reference:
03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168]
(rev 01)
Subsystem: ABIT Computer Corp. RTL8111/8168/8411 PCI Express
Gigabit Ethernet Controller [147b:1078]
Flags: bus master, fast devsel, latency 0, IRQ 27
I/O ports at ce00 [size=256]
Memory at fddff000 (64-bit, non-prefetchable) [size=4K]
[virtual] Expansion ROM at fdd0 [disabled] [size=64K]
Capabilities: [40] Power Management version 2
Capabilities: [48] Vital Product Data
Capabilities: [50] MSI: Enable+ Count=1/2 Maskable- 64bit+
Capabilities: [60] Express Endpoint, MSI 00
Capabilities: [84] Vendor Specific Information: Len=4c 
Capabilities: [100] Advanced Error Reporting
Capabilities: [12c] Virtual Channel
Capabilities: [148] Device Serial Number 28-00-00-00-00-00-00-00
Capabilities: [154] Power Budgeting 
Kernel driver in use: r8169
Kernel modules: r8169



[PATCH 1/1] tap: RCU usage and comment fixes

2018-08-17 Thread Wang Jian
The tap_queue and the 'tap_dev' are loosely coupled, not 'macvlan_dev'.

Taking rcu_read_lock a little later should slightly shorten the RCU read-side
critical section.

Signed-off-by: Wang Jian 
---
 drivers/net/tap.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index f0f7cd977667..e5e5a8e4a60d 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -125,7 +125,7 @@ static struct tap_dev *tap_dev_get_rcu(const struct net_device *dev)
 
 /*
  * RCU usage:
- * The tap_queue and the macvlan_dev are loosely coupled, the
+ * The tap_queue and the tap_dev are loosely coupled, the
  * pointers from one to the other can only be read while rcu_read_lock
  * or rtnl is held.
  *
@@ -720,8 +720,6 @@ static ssize_t tap_get_user(struct tap_queue *q, struct msghdr *m,
__vlan_get_protocol(skb, skb->protocol, &depth) != 0)
skb_set_network_header(skb, depth);
 
-   rcu_read_lock();
-   tap = rcu_dereference(q->tap);
/* copy skb_ubuf_info for callback when skb has no error */
if (zerocopy) {
skb_shinfo(skb)->destructor_arg = m->msg_control;
@@ -732,6 +730,8 @@ static ssize_t tap_get_user(struct tap_queue *q, struct msghdr *m,
uarg->callback(uarg, false);
}
 
+   rcu_read_lock();
+   tap = rcu_dereference(q->tap);
if (tap) {
skb->dev = tap->dev;
dev_queue_xmit(skb);
-- 
2.17.1



[PATCH] datapath.c: fix missing return value check of nla_nest_start()

2018-08-17 Thread Jiecheng Wu
Function queue_userspace_packet() defined in net/openvswitch/datapath.c calls 
nla_nest_start() to allocate memory for a struct nlattr which is dereferenced 
immediately. As nla_nest_start() may return NULL on failure, this code may 
cause a NULL pointer dereference.
---
 net/openvswitch/datapath.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 0f5ce77..ff4457d 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -460,6 +460,8 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
 
if (upcall_info->egress_tun_info) {
nla = nla_nest_start(user_skb, OVS_PACKET_ATTR_EGRESS_TUN_KEY);
+   if (!nla)
+   return -EMSGSIZE;
err = ovs_nla_put_tunnel_info(user_skb,
  upcall_info->egress_tun_info);
BUG_ON(err);
@@ -468,6 +470,8 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
 
if (upcall_info->actions_len) {
nla = nla_nest_start(user_skb, OVS_PACKET_ATTR_ACTIONS);
+   if (!nla)
+   return -EMSGSIZE;
err = ovs_nla_put_actions(upcall_info->actions,
  upcall_info->actions_len,
  user_skb);
-- 
2.6.4
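
The pattern the patch description refers to (check the result of a helper that
can return NULL before using it) can be shown with a small userspace model.
Everything here is hypothetical: the buffer, the attribute struct and both
functions only imitate the shape of the problem and are not the netlink API.
The model also routes the error through a common cleanup label, since a caller
that owns buffers usually has to release them on this path too.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical nested-attribute header. */
struct sketch_attr {
	unsigned short len;
	unsigned short type;
};

/* Hypothetical fixed-size message buffer with room for four attributes. */
struct sketch_msg {
	struct sketch_attr attrs[4];
	size_t used;
};

/* Reserve one attribute slot; returns NULL when the message is full. */
static struct sketch_attr *sketch_nest_start(struct sketch_msg *m,
					     unsigned short type)
{
	struct sketch_attr *a;

	if (m->used >= sizeof(m->attrs) / sizeof(m->attrs[0]))
		return NULL;
	a = &m->attrs[m->used++];
	a->type = type;
	a->len = 0;
	return a;
}

/* Build a message with nattrs attributes; 0 on success, -errno on failure. */
static int sketch_queue_packet(size_t nattrs)
{
	struct sketch_msg *m = calloc(1, sizeof(*m));
	int err = 0;

	if (!m)
		return -ENOMEM;
	for (size_t i = 0; i < nattrs; i++) {
		struct sketch_attr *nla = sketch_nest_start(m, (unsigned short)i);

		if (!nla) {		/* check before dereferencing */
			err = -EMSGSIZE;
			goto out;
		}
		nla->len = sizeof(*nla);
	}
out:
	free(m);			/* released on success and on error */
	return err;
}
```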



Re: [v3, net-next, 02/12] net: stmmac: Do not keep rearming the coalesce timer in stmmac_xmit

2018-08-17 Thread Jerome Brunet
On Fri, 2018-05-18 at 14:55 +0100, Jose Abreu wrote:
> This is cutting down performance. Once the timer is armed it should run
> after the time expires for the first packet sent and not the last one.
> 
> After this change, running iperf, the performance gain is +/- 24%.

Hi Guys,

Since v4.18, we are getting a serious regression on Amlogic based SoCs.
I have tested this on amlogic's: 
* gxbb S905 p200 (Micrel KSZ9031 - 1GBps)
* axg A113 s400 (Realtek RTL8211F - 1GBps)

Both SoCs use the synopsys gmac with stmmac driver.

I first noticed that running an NFS root filesystem became unstable, but I could not
understand why. Then, running a download as a simple test with iperf3 (from an
initramfs) broke the 'network' in a matter of seconds.

I don't know exactly what breaks, but bisect clearly assigns the blame to this
change. Reverting the change solves the problem.

I'll be happy to make more tests to help understand what is happening here.

In the meantime, should we consider reverting this patch ?

Best Regards
Jerome

> 
> Signed-off-by: Jose Abreu 
> Cc: David S. Miller 
> Cc: Joao Pinto 
> Cc: Vitor Soares 
> Cc: Giuseppe Cavallaro 
> Cc: Alexandre Torgue 
> ---
>  drivers/net/ethernet/stmicro/stmmac/stmmac.h  |1 +
>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |5 ++++-
>  2 files changed, 5 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> index 42fc76e..4d425b1 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> @@ -105,6 +105,7 @@ struct stmmac_priv {
>   u32 tx_count_frames;
>   u32 tx_coal_frames;
>   u32 tx_coal_timer;
> + bool tx_timer_armed;
>  
>   int tx_coalesce;
>   int hwts_tx_en;
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index d9dbe13..789bc22 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -3158,13 +3158,16 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
>* element in case of no SG.
>*/
>   priv->tx_count_frames += nfrags + 1;
> - if (likely(priv->tx_coal_frames > priv->tx_count_frames)) {
> + if (likely(priv->tx_coal_frames > priv->tx_count_frames) &&
> + !priv->tx_timer_armed) {
>   mod_timer(&priv->txtimer,
> STMMAC_COAL_TIMER(priv->tx_coal_timer));
> + priv->tx_timer_armed = true;
>   } else {
>   priv->tx_count_frames = 0;
>   stmmac_set_tx_ic(priv, desc);
>   priv->xstats.tx_set_ic_bit++;
> + priv->tx_timer_armed = false;
>   }
>  
>   skb_tx_timestamp(skb);
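
The intent stated in the quoted commit message (the timer should expire
relative to the first packet sent, not the last) can be modelled in a few
lines of userspace C. This models the stated intent only, not the exact diff:
the struct and function names are hypothetical, and clearing the flag when the
timer fires is an assumption of the model.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical coalescing state; time is in arbitrary ticks. */
struct coal_model {
	bool timer_armed;
	uint64_t deadline;	/* valid only while timer_armed is true */
	uint64_t timeout;	/* coalescing interval */
};

/* Transmit path: arm the timer for the first packet only, never re-arm. */
static void coal_model_xmit(struct coal_model *c, uint64_t now)
{
	if (!c->timer_armed) {
		c->deadline = now + c->timeout;
		c->timer_armed = true;
	}
}

/* Timer check: returns true once when the deadline has been reached. */
static bool coal_model_poll(struct coal_model *c, uint64_t now)
{
	if (!c->timer_armed || now < c->deadline)
		return false;
	c->timer_armed = false;	/* next packet may arm the timer again */
	return true;
}
```

With a timeout of 40 ticks and packets sent at ticks 0, 10 and 20, the
deadline stays at 40. Re-arming on every packet would have pushed it to 60.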




[PATCH iproute2-next] iproute_lwtunnel: allow specifying 'src' for 'encap ip' / 'encap ip6'

2018-08-17 Thread Shmulik Ladkani
This allows the user to specify the LWTUNNEL_IP_SRC/LWTUNNEL_IP6_SRC
when setting an lwtunnel encapsulation route.

Signed-off-by: Shmulik Ladkani 
---
 ip/iproute_lwtunnel.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/ip/iproute_lwtunnel.c b/ip/iproute_lwtunnel.c
index 740da7c6..20d5545c 100644
--- a/ip/iproute_lwtunnel.c
+++ b/ip/iproute_lwtunnel.c
@@ -671,7 +671,7 @@ static int parse_encap_mpls(struct rtattr *rta, size_t len,
 static int parse_encap_ip(struct rtattr *rta, size_t len,
  int *argcp, char ***argvp)
 {
-   int id_ok = 0, dst_ok = 0, tos_ok = 0, ttl_ok = 0;
+   int id_ok = 0, dst_ok = 0, src_ok = 0, tos_ok = 0, ttl_ok = 0;
char **argv = *argvp;
int argc = *argcp;
 
@@ -694,6 +694,15 @@ static int parse_encap_ip(struct rtattr *rta, size_t len,
get_addr(&addr, *argv, AF_INET);
rta_addattr_l(rta, len, LWTUNNEL_IP_DST,
  &addr.data, addr.bytelen);
+   } else if (strcmp(*argv, "src") == 0) {
+   inet_prefix addr;
+
+   NEXT_ARG();
+   if (src_ok++)
+   duparg2("src", *argv);
+   get_addr(&addr, *argv, AF_INET);
+   rta_addattr_l(rta, len, LWTUNNEL_IP_SRC,
+ &addr.data, addr.bytelen);
} else if (strcmp(*argv, "tos") == 0) {
__u32 tos;
 
@@ -805,7 +814,7 @@ static int parse_encap_ila(struct rtattr *rta, size_t len,
 static int parse_encap_ip6(struct rtattr *rta, size_t len,
   int *argcp, char ***argvp)
 {
-   int id_ok = 0, dst_ok = 0, tos_ok = 0, ttl_ok = 0;
+   int id_ok = 0, dst_ok = 0, src_ok = 0, tos_ok = 0, ttl_ok = 0;
char **argv = *argvp;
int argc = *argcp;
 
@@ -828,6 +837,15 @@ static int parse_encap_ip6(struct rtattr *rta, size_t len,
get_addr(&addr, *argv, AF_INET6);
rta_addattr_l(rta, len, LWTUNNEL_IP6_DST,
  &addr.data, addr.bytelen);
+   } else if (strcmp(*argv, "src") == 0) {
+   inet_prefix addr;
+
+   NEXT_ARG();
+   if (src_ok++)
+   duparg2("src", *argv);
+   get_addr(&addr, *argv, AF_INET6);
+   rta_addattr_l(rta, len, LWTUNNEL_IP6_SRC,
+ &addr.data, addr.bytelen);
} else if (strcmp(*argv, "tc") == 0) {
__u32 tc;
 
-- 
2.18.0
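
The keyword handling added by this patch follows a simple shape: match the
keyword, reject a second occurrence, then parse the address that follows. A
standalone sketch of that shape is below. It does not use iproute2's helpers
(NEXT_ARG, duparg2, get_addr); it uses inet_pton() instead, all names are
hypothetical, and unlike the real parser it treats an unknown keyword as an
error.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical result of parsing "dst ADDR" / "src ADDR" pairs. */
struct sketch_encap_ip {
	struct in_addr dst, src;
	int have_dst, have_src;
};

/*
 * Parse keyword/address pairs. Returns 0 on success, -1 on a duplicate
 * keyword, a missing value, a bad address, or an unknown keyword.
 */
static int sketch_parse_encap_ip(int argc, char **argv,
				 struct sketch_encap_ip *out)
{
	memset(out, 0, sizeof(*out));
	for (int i = 0; i < argc; i++) {
		struct in_addr *slot;
		int *seen;

		if (strcmp(argv[i], "dst") == 0) {
			slot = &out->dst;
			seen = &out->have_dst;
		} else if (strcmp(argv[i], "src") == 0) {
			slot = &out->src;
			seen = &out->have_src;
		} else {
			return -1;
		}
		if ((*seen)++)		/* keyword given twice */
			return -1;
		if (++i >= argc)	/* keyword without a value */
			return -1;
		if (inet_pton(AF_INET, argv[i], slot) != 1)
			return -1;
	}
	return 0;
}
```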