[PATCH V2 bpf-next 0/2] Perf-based event notification for sock_ops

2018-11-07 Thread Sowmini Varadhan
-event notification based on the verdict from the filter. The uspace component can use these perf-event notifications to either read any state managed by the eBPF kernel module, or issue a TCP_INFO netlink call if desired. Patch 2 provides a simple example that shows how to use this infra (and also p

[PATCH V2 bpf-next 2/2] selftests/bpf: add a test case for sock_ops perf-event notification

2018-11-07 Thread Sowmini Varadhan
-by: Sowmini Varadhan --- V2: inline call to sys_perf_event_open() following the style of existing code in kselftests/bpf tools/testing/selftests/bpf/Makefile |4 +- tools/testing/selftests/bpf/test_tcpnotify.h | 19 ++ tools/testing/selftests/bpf/test_tcpnotify_kern.c | 95

[PATCH V2 bpf-next 1/2] bpf: add perf-event notificaton support for sock_ops

2018-11-07 Thread Sowmini Varadhan
This patch allows eBPF programs that use sock_ops to send perf-based event notifications using bpf_perf_event_output() Signed-off-by: Sowmini Varadhan --- net/core/filter.c | 19 +++ 1 files changed, 19 insertions(+), 0 deletions(-) diff --git a/net/core/filter.c b/net/core

[PATCH bpf-next 0/2] TCP-BPF event notification support

2018-11-06 Thread Sowmini Varadhan
tifications to either read any state managed by the eBPF kernel module, or issue a TCP_INFO netlink call if desired. Patch 2 provides a simple example that shows how to use this infra (and also provides a test case for it) Sowmini Varadhan (2): bpf: add perf-event notificaton support for sock_ops

[PATCH bpf-next 2/2] selftests/bpf: add a test case for sock_ops perf-event notification

2018-11-06 Thread Sowmini Varadhan
-by: Sowmini Varadhan --- tools/testing/selftests/bpf/Makefile |4 +- tools/testing/selftests/bpf/perf-sys.h| 74 tools/testing/selftests/bpf/test_tcpnotify.h | 19 ++ tools/testing/selftests/bpf/test_tcpnotify_kern.c | 95 +++ tools/testing

[PATCH bpf-next 1/2] bpf: add perf-event notificaton support for sock_ops

2018-11-06 Thread Sowmini Varadhan
This patch allows eBPF programs that use sock_ops to send perf-based event notifications using bpf_perf_event_output() Signed-off-by: Sowmini Varadhan --- net/core/filter.c | 19 +++ 1 files changed, 19 insertions(+), 0 deletions(-) diff --git a/net/core/filter.c b/net/core

[PATCH RFC net-next 3/3] bpf: Added a sample for tcp_info_notify callback

2018-10-22 Thread Sowmini Varadhan
Simple Proof-Of-Concept test program for BPF_TCP_INFO_NOTIFY (will move this to testing/selftests/net later) Signed-off-by: Sowmini Varadhan --- samples/bpf/Makefile |1 + samples/bpf/tcp_notify_kern.c | 73 + 2 files changed, 74 insertions

[PATCH RFC net-next 1/3] sock_diag: Refactor inet_sock_diag_destroy code

2018-10-22 Thread Sowmini Varadhan
We want to use the inet_sock_diag_destroy code to send notifications for more types of TCP events than just socket_close(), so refactor the code to allow this. Signed-off-by: Sowmini Varadhan --- include/linux/sock_diag.h | 18 +- include/uapi/linux/sock_diag.h |2

[PATCH RFC net-next 0/3] Extensions to allow asynchronous TCP_INFO notifications based on congestion parameters

2018-10-22 Thread Sowmini Varadhan
notification for an iperf connection if the number of retransmits exceeds 16. Sowmini Varadhan (3): sock_diag: Refactor inet_sock_diag_destroy code tcp: BPF_TCP_INFO_NOTIFY support bpf: Added a sample for tcp_info_notify callback include/linux/sock_diag.h | 18 +++--- includ

[PATCH RFC net-next 2/3] tcp: BPF_TCP_INFO_NOTIFY support

2018-10-22 Thread Sowmini Varadhan
eturn status is used by the caller to queue up a tcp_info notification for the application. Signed-off-by: Sowmini Varadhan --- include/net/tcp.h| 15 +-- include/uapi/linux/bpf.h |4 2 files changed, 17 insertions(+), 2 deletions(-) diff --git a/include/net/tcp.h b/inc

Re: [PATCH net-next 0/9] net: Kernel side filtering for route dumps

2018-10-11 Thread Sowmini Varadhan
Without getting into Ahern's patchset, which he obviously feels quite passionately about.. On (10/11/18 12:28), David Miller wrote: > > Once you've composed the message, the whole point of filtering is lost. it would be nice to apply the filter *before* constructing the skb, but afaict most

Re: [PATCH net-next 0/9] net: Kernel side filtering for route dumps

2018-10-11 Thread Sowmini Varadhan
On (10/11/18 09:33), Roopa Prabhu wrote: > 3. All networking subsystems already have this type of netlink > attribute filtering that apps rely on. This series > just makes it consistent for route dumps. Apps use such mechanism > already when requesting dumps. > Like everywhere else, BPF hook can

Re: [PATCH net-next 0/9] net: Kernel side filtering for route dumps

2018-10-11 Thread Sowmini Varadhan
On (10/11/18 09:32), David Ahern wrote: > > Route dumps are done for the entire FIB for each address family. As we > approach internet routing tables (700k+ routes for IPv4, currently > around 55k for IPv6) with many VRFs dumping the entire table is grossly > inefficient when for example only a

Re: [PATCH net-next 0/9] net: Kernel side filtering for route dumps

2018-10-11 Thread Sowmini Varadhan
On (10/11/18 08:26), Stephen Hemminger wrote: > You can do the something like this already with BPF socket filters. > But writing BPF for multi-part messages is hard. Indeed. And I was just experimenting with this for ARP just last week. So to handle the case of "ip neigh show a.b.c.d" without

Re: [PATCH net-next 5/5] ebpf: Add sample ebpf program for SOCKET_SG_FILTER

2018-09-17 Thread Sowmini Varadhan
On (09/17/18 16:15), Alexei Starovoitov wrote: > > if the goal is to add firewall ability to RDS then the patch set > is going in the wrong direction. The goal is to add the ability to process scatterlist directly, just like we process skb's today. Your main objection was that you wanted a test

Re: [PATCH net-next 5/5] ebpf: Add sample ebpf program for SOCKET_SG_FILTER

2018-09-13 Thread Sowmini Varadhan
On (09/12/18 19:07), Alexei Starovoitov wrote: > > I didn't know that. The way I understand your statement that > this new program type, new sg logic, and all the complexity > are only applicable to RDMA capable hw and RDS. I dont know if you have been following the RFC series at all (and

Re: [PATCH net-next 5/5] ebpf: Add sample ebpf program for SOCKET_SG_FILTER

2018-09-12 Thread Sowmini Varadhan
> On 09/11/2018 09:00 PM, Alexei Starovoitov wrote: > >please no samples. > >Add this as proper test to tools/testing/selftests/bpf > >that reports PASS/FAIL and can be run automatically. > >samples/bpf is effectively dead code. Just a second. You do realize that RDS is doing real networking, so

Re: [Patch net] rds: mark bound socket with SOCK_RCU_FREE

2018-09-10 Thread Sowmini Varadhan
On (09/10/18 17:16), Cong Wang wrote: > > > > On (09/10/18 16:51), Cong Wang wrote: > > > > > > __rds_create_bind_key(key, addr, port, scope_id); > > > - rs = rhashtable_lookup_fast(_hash_table, key, ht_parms); > > > + rcu_read_lock(); > > > + rs =

Re: [Patch net] rds: mark bound socket with SOCK_RCU_FREE

2018-09-10 Thread Sowmini Varadhan
On (09/10/18 16:51), Cong Wang wrote: > > __rds_create_bind_key(key, addr, port, scope_id); > - rs = rhashtable_lookup_fast(_hash_table, key, ht_parms); > + rcu_read_lock(); > + rs = rhashtable_lookup(_hash_table, key, ht_parms); > if (rs &&

Re: [Patch net] rds: mark bound socket with SOCK_RCU_FREE

2018-09-10 Thread Sowmini Varadhan
On (09/10/18 15:43), Santosh Shilimkar wrote: > On 9/10/2018 3:24 PM, Cong Wang wrote: > >When a rds sock is bound, it is inserted into the bind_hash_table > >which is protected by RCU. But when releasing rd sock, after it > >is removed from this hash table, it is freed immediately without >

Re: [Patch net] rds: mark bound socket with SOCK_RCU_FREE

2018-09-10 Thread Sowmini Varadhan
On (09/10/18 15:24), Cong Wang wrote: > > When a rds sock is bound, it is inserted into the bind_hash_table > which is protected by RCU. But when releasing rd sock, after it > is removed from this hash table, it is freed immediately without > respecting RCU grace period. This could cause some

Re: [PATCH RFC net-next 00/11] udp gso

2018-09-03 Thread Sowmini Varadhan
On (09/03/18 10:02), Steffen Klassert wrote: > I'm working on patches that builds such skb lists. The list is chained > at the frag_list pointer of the first skb, all subsequent skbs are linked > to the next pointer of the skb. It looks like this: there are some risks to using the frag_list

[PATCH V2 ipsec-next 2/2] xfrm: reset crypto_done when iterating over multiple input xfrms

2018-09-03 Thread Sowmini Varadhan
e612a0 ("xfrm: Add an IPsec hardware offloading API") Signed-off-by: Sowmini Varadhan --- v2: added "Fixes" tag net/xfrm/xfrm_input.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c index b89c9c7..be3

[PATCH V2 ipsec-next 0/2] xfrm: bug fixes when processing multiple transforms

2018-09-03 Thread Sowmini Varadhan
esp=aes_gcm_c-256-null. Each patch has a technical description of the contents of the fix. V2: added Fixes tag so that it can be backported to the stable trees. Sowmini Varadhan (2): xfrm: reset transport header back to network header after all input transforms ahave been applied xfrm

[PATCH V2 ipsec-next 1/2] xfrm: reset transport header back to network header after all input transforms ahave been applied

2018-09-03 Thread Sowmini Varadhan
back to network header only after the last transformation so that subsequent xfrms can find the correct transport header. Fixes: 7785bba299a8 ("esp: Add a software GRO codepath") Suggested-by: Steffen Klassert Signed-off-by: Sowmini Varadhan --- v2: added "Fixes" tag ne

[PATCH ipsec-next 2/2] xfrm: reset crypto_done when iterating over multiple input xfrms

2018-09-02 Thread Sowmini Varadhan
ed-off-by: Sowmini Varadhan --- net/xfrm/xfrm_input.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c index b89c9c7..be3520e 100644 --- a/net/xfrm/xfrm_input.c +++ b/net/xfrm/xfrm_input.c @@ -458,6 +458,7 @@ int xfrm_input(struct s

[PATCH ipsec-next 1/2] xfrm: reset transport header back to network header after all input transforms ahave been applied

2018-09-02 Thread Sowmini Varadhan
back to network header only after the last transformation so that subsequent xfrms can find the correct transport header. Suggested-by: Steffen Klassert Signed-off-by: Sowmini Varadhan --- net/ipv4/xfrm4_input.c |1 + net/ipv4/xfrm4_mode_transport.c |4 +--- net/ipv6/xfrm6_input.c

[PATCH ipsec-next 0/2] xfrm: bug fixes when processing multiple transforms

2018-09-02 Thread Sowmini Varadhan
esp=aes_gcm_c-256-null. Each patch has a technical description of the contents of the fix. Sowmini Varadhan (2): xfrm: reset transport header back to network header after all input transforms ahave been applied xfrm: reset crypto_done when iterating over multiple input xfrms net/ipv4

Re: [PATCH] rds: tcp: remove duplicated include from tcp.c

2018-08-21 Thread Sowmini Varadhan
On (08/21/18 14:05), Yue Haibing wrote: > Remove duplicated include. > > Signed-off-by: Yue Haibing Acked-by: Sowmini Varadhan

Re: [PATCH net-next] rds: avoid lock hierarchy violation between m_rs_lock and rs_recv_lock

2018-08-08 Thread Sowmini Varadhan
On (08/08/18 14:51), Santosh Shilimkar wrote: > This bug doesn't make sense since two different transports are using > same socket (Loop and rds_tcp) and running together. > For same transport, such race can't happen with MSG_ON_SOCK flag. > CPU1-> rds_loop_inc_free > CPU0 -> rds_tcp_cork ... >

[PATCH net-next] rds: avoid lock hierarchy violation between m_rs_lock and rs_recv_lock

2018-08-08 Thread Sowmini Varadhan
the tmp_list (potentially resulting in rds_message_purge()) after dropping the rs_recv_lock. The same lock hierarchy violation also exists in rds_still_queued() and should be avoided in a similar manner Signed-off-by: Sowmini Varadhan Reported-by: syzbot+52140d69ac6dc6b92...@syzkaller.appspotmail.com

Re: [PATCH] rds: send: Fix dead code in rds_sendmsg

2018-07-25 Thread Sowmini Varadhan
"Structurally dead code") > Fixes: 1e2b44e78eea ("rds: Enable RDS IPv6 support") > Signed-off-by: Gustavo A. R. Silva Acked-by: Sowmini Varadhan

Re: [PATCH v3 net-next 0/3] rds: IPv6 support

2018-07-18 Thread Sowmini Varadhan
On (07/18/18 15:19), Ka-Cheong Poon wrote: > >bind() and connect() are using the sa_family/ss_family to have > >the application signal to the kernel about whether ipv4 or ipv6 is > >desired. (and bind and connect are doing the right thing for > >v4mapped, so that doesnt seem to be a problem there)

Re: [PATCH v3 net-next 0/3] rds: IPv6 support

2018-07-17 Thread Sowmini Varadhan
On (07/17/18 13:32), Ka-Cheong Poon wrote: > > The app can use either structures to make the call. When the > app fills in the structure, it knows what it is filling in, > either sockaddr_in or sockaddr_in6. So it knows the right size > to use. The app can also use IPv4 mapped address in a

Re: [PATCH v3 net-next 0/3] rds: IPv6 support

2018-07-16 Thread Sowmini Varadhan
- Looks like rds_connect() is checking things in the right order (thanks) However, rds_cancel_sent_to is still looking at the len to figure out the family.. as we move to ipv6, it would be better if we allow the caller to specify struct sockaddr_storage, or even a union of

Re: [PATCH v2 net-next 1/3] rds: Changing IP address internal representation to struct in6_addr

2018-07-06 Thread Sowmini Varadhan
On (07/06/18 23:08), Ka-Cheong Poon wrote: > > As mentioned in a previous mail, it is unclear why the > port number is transport specific. Most Internet services > use the same port number running over TCP/UDP as shown > in the IANA database. And the IANA RDS registration is > the same. What

Re: [PATCH v2 net-next 0/3] rds: IPv6 support

2018-07-06 Thread Sowmini Varadhan
On (07/06/18 22:36), Ka-Cheong Poon wrote: > This patch series does not change existing behavior. But > I think this is a strange RDS semantics as it differs from > other types of socket. But this is not about IPv6 support > and can be dealt with later. sure, > > Since we are choosing to

Re: [PATCH v2 net-next 1/3] rds: Changing IP address internal representation to struct in6_addr

2018-07-06 Thread Sowmini Varadhan
On (07/06/18 17:08), Ka-Cheong Poon wrote: > >Hmm. Why do you need to include tcp header in ib transport > >code ? If there is any common function just move to core > >common file and use it. > > I think it can be removed as it is left over from earlier > changes when the IB IPv6 listener port

Re: [PATCH v2 net-next 0/3] rds: IPv6 support

2018-07-05 Thread Sowmini Varadhan
Some additional comments on this patchset (consolidated here, please tease this apart into patch1/patch2/patch3 as appropriate) I looked at most of rds-core, and the rds-tcp changes. Please make sure santosh looks at these carefully, especially. - RDS bind key changes -

[BISECTED] [4.17.0-rc6] IPv6 link-local address not getting added

2018-06-27 Thread Sowmini Varadhan
Hi David, An IPv6 regression has been introduced in 4.17.0-rc6 by 8308f3f net/ipv6: Add support for specifying metric of connected routes The regression is that some interfaces on my test machine come up with link-local addrs but the fe80 prefix is missing. After this bug, I cannot send any

Re: [PATCH net-next 2/3] rds: Enable RDS IPv6 support

2018-06-27 Thread Sowmini Varadhan
On (06/27/18 18:07), Ka-Cheong Poon wrote: > > There is a reason for that. It is the way folks expect > how IPv6 addresses are being used. have you tried "traceroute6 -s abc::2 fe80::2" on linux? > It is not just forwarding. The simple case is that one > picks a global address in a different

Re: [rds-devel] [PATCH net-next] rds: clean up loopback rds_connections on netns deletion

2018-06-26 Thread Sowmini Varadhan
On (06/26/18 10:53), Sowmini Varadhan wrote: > Date: Tue, 26 Jun 2018 10:53:23 -0400 > From: Sowmini Varadhan > To: David Miller > Cc: netdev@vger.kernel.org, rds-de...@oss.oracle.com > Subject: Re: [rds-devel] [PATCH net-next] rds: clean up loopback > > and just t

Re: [PATCH net-next] rds: clean up loopback rds_connections on netns deletion

2018-06-26 Thread Sowmini Varadhan
On (06/26/18 16:48), Dmitry Vyukov wrote: > it probably hit the race by a pure luck of the large program, but then > never had the same luck when tried to remove any syscalls. > So it can make sense to submit several test requests to get more testing. How does one submit test requests by email?

Re: [PATCH net-next] rds: clean up loopback rds_connections on netns deletion

2018-06-26 Thread Sowmini Varadhan
On (06/26/18 23:29), David Miller wrote: > >> > >> Since this probably fixes syzbot reports, this can be targetted > >> at 'net' instead? > > > > that thought occurred to me but I wanted to be conservative and have > > it in net-next first, have the syzkaller-bugs team confirm the > > the fixes

Re: [PATCH net-next] rds: clean up loopback rds_connections on netns deletion

2018-06-26 Thread Sowmini Varadhan
On (06/26/18 23:29), David Miller wrote: > > I think there is a way to ask syzbot to test a patch in an > email. Dmitry/syzkaller-bugs, can you clarify? This is for the cluster of dup reports like https://groups.google.com/forum/#!topic/syzkaller-bugs/zBph8Vu-q2U and (most recently)

Re: [PATCH net-next] rds: clean up loopback rds_connections on netns deletion

2018-06-26 Thread Sowmini Varadhan
On (06/26/18 22:23), David Miller wrote: > > Since this probably fixes syzbot reports, this can be targetted > at 'net' instead? that thought occurred to me but I wanted to be conservative and have it in net-next first, have the syzkaller-bugs team confirm the fixes and then backport to

Re: [PATCH net-next 2/3] rds: Enable RDS IPv6 support

2018-06-26 Thread Sowmini Varadhan
On (06/26/18 21:02), Ka-Cheong Poon wrote: > > In this case, RFC 6724 prefers link local address as source. the keyword is "prefers". > While using non-link local address (say ULA) is not forbidden, > doing this can easily cause inter-operability issues (does the > app really know that the

Re: [PATCH net-next 2/3] rds: Enable RDS IPv6 support

2018-06-26 Thread Sowmini Varadhan
On (06/26/18 13:30), Ka-Cheong Poon wrote: > > My answer to this is that if a socket is not bound to a link > local address (meaning it is bound to a non-link local address) > and it is used to send to a link local peer, I think it should > fail. Hmm, I'm not sure I agree. I dont think this is

Re: [PATCH net-next 2/3] rds: Enable RDS IPv6 support

2018-06-25 Thread Sowmini Varadhan
On (06/26/18 01:43), Ka-Cheong Poon wrote: > > Yes, I think if the socket is bound, it should check the scope_id > in msg_name (if not NULL) to make sure that they match. A bound > RDS socket can send to multiple peers. But if the bound local > address is link local, it should only be allowed

Re: [PATCH net-next 2/3] rds: Enable RDS IPv6 support

2018-06-25 Thread Sowmini Varadhan
On (06/25/18 03:38), Ka-Cheong Poon wrote: > @@ -1105,8 +1105,27 @@ int rds_sendmsg(struct socket *sock, struct msghdr > *msg, size_t payload_len) > break; > > case sizeof(*sin6): { > - ret = -EPROTONOSUPPORT; > - goto

Re: [PATCH net-next] rds: clean up loopback rds_connections on netns deletion

2018-06-25 Thread Sowmini Varadhan
On (06/25/18 06:41), Sowmini Varadhan wrote: : > Add the changes aligned with the changes from > commit ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize > netns/module teardown and rds connection/workq management") for > rds_loop_transport

[PATCH net-next] rds: clean up loopback rds_connections on netns deletion

2018-06-25 Thread Sowmini Varadhan
with the changes from commit ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and rds connection/workq management") for rds_loop_transport Acked-by: Santosh Shilimkar Signed-off-by: Sowmini Varadhan --- net/rds/connection.c | 11 +- net/

Re: KASAN: out-of-bounds Read in rds_cong_queue_updates (2)

2018-06-13 Thread Sowmini Varadhan
On (06/13/18 09:52), Dmitry Vyukov wrote: > I think this is: > > #syz dup: KASAN: use-after-free Read in rds_cong_queue_updates Indeed. We'd had a discussion about getting a dump of threads using sysrq (or similar), given the challenges around actually getting a crash dump, is that now possible?

Re: [rds-devel] KASAN: null-ptr-deref Read in rds_ib_get_mr

2018-05-11 Thread Sowmini Varadhan
On (05/11/18 15:48), Yanjun Zhu wrote: > diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c > index e678699..2228b50 100644 > --- a/net/rds/ib_rdma.c > +++ b/net/rds/ib_rdma.c > @@ -539,11 +539,17 @@ void rds_ib_flush_mrs(void) >void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,

Re: [PATCH RFC net-next 00/11] udp gso

2018-04-18 Thread Sowmini Varadhan
On (04/18/18 06:35), Eric Dumazet wrote: > > There is no change at all. > > This will only be used as a mechanism to send X packets of same size. > > So instead of X system calls , one system call. > > One traversal of some expensive part of the host stack. > > The content on the wire should

Re: [PATCH RFC net-next 00/11] udp gso

2018-04-18 Thread Sowmini Varadhan
I went through the patch set and the code looks fine- it extends existing infra for TCP/GSO to UDP. One thing that was not clear to me about the API: shouldn't UDP_SEGMENT just be automatically determined in the stack from the pmtu? Whats the motivation for the socket option for this? also AIUI

Re: [PATCH RFC net-next 00/11] udp gso

2018-04-17 Thread Sowmini Varadhan
On (04/17/18 16:23), Willem de Bruijn wrote: > > Assuming IPv4 with an MTU of 1500 and the maximum segment > size of 1472, the receiver will see three datagrams with MSS of > 1472B, 528B and 512B. so the recvmsg will also pass up 1472, 528, 512, right? If yes, how will the recvmsg differentiate

Re: [PATCH RFC net-next 00/11] udp gso

2018-04-17 Thread Sowmini Varadhan
On (04/17/18 16:00), Willem de Bruijn wrote: > > This patchset implements GSO for UDP. A process can concatenate and > submit multiple datagrams to the same destination in one send call > by setting socket option SOL_UDP/UDP_SEGMENT with the segment size, > or passing an analogous cmsg at send

Re: KASAN: use-after-free Read in inet_create

2018-04-08 Thread Sowmini Varadhan
#syz dup: KASAN: use-after-free Read in rds_cong_queue_updates There are a number of manifestations of this bug, basically all suggest that the connect/reconnect etc workqs are somehow being scheduled after the netns is deleted, despite the code refactoring in Commit 3db6e0d172c (and looks

Re: [rds-devel] [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event() (then kill NETDEV_UNREGISTER_FINAL)

2018-03-20 Thread Sowmini Varadhan
On (03/20/18 12:37), Håkon Bugge wrote: > > A little nit below. And some spelling issues in existing commentary > you can consider fixing, since you reshuffle this file considerable. > > + if (net != _net && rtn->ctl_table) > > + kfree(rtn->ctl_table); > > Well, this comes from the

[PATCH net-next] rds: tcp: remove register_netdevice_notifier infrastructure.

2018-03-19 Thread Sowmini Varadhan
to netdevice notifiers and refactors all the code needed to dismantle rds_tcp state into a ->exit callback for the pernet_operations used with register_pernet_device(). Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- net/rds/tcp

Re: KASAN: slab-out-of-bounds Read in rds_cong_queue_updates

2018-03-19 Thread Sowmini Varadhan
On (03/19/18 09:29), Dmitry Vyukov wrote: > > This looks the same as: > > #syz dup: KASAN: use-after-free Read in rds_cong_queue_updates correct, seems like the rds_destroy_pending() fixes did not seal this race condition. I need to look at this more carefully to see what race I missed.. no

Re: [rds-devel] [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event() (then kill NETDEV_UNREGISTER_FINAL)

2018-03-18 Thread Sowmini Varadhan
On (03/18/18 00:55), Kirill Tkhai wrote: > > I just want to make rds not using NETDEV_UNREGISTER_FINAL. If there is > another solution to do that, I'm not against that. The patch below takes care of this. I've done some preliminary testing, and I'll send it upstream after doing additional

Re: [rds-devel] [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event() (then kill NETDEV_UNREGISTER_FINAL)

2018-03-17 Thread Sowmini Varadhan
On (03/17/18 10:15), Sowmini Varadhan wrote: > To solve the scaling problem why not just have a well-defined > callback to modules when devices are quiesced, instead of > overloading the pernet_device registration in this obscure way? I thought about this a bit, and maybe I missed your

Re: [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event() (then kill NETDEV_UNREGISTER_FINAL)

2018-03-17 Thread Sowmini Varadhan
I spent a long time staring at both v1 and v2 of your patch. I understand the overall goal, but I am afraid to say that these patches are complete hacks. I was trying to understand why patchv1 blows with a null rtn in rds_tcp_init_net, but v2 does not, and the analysis is ugly. I'm going to

Re: [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event() (then kill NETDEV_UNREGISTER_FINAL)

2018-03-16 Thread Sowmini Varadhan
On (03/16/18 21:48), Kirill Tkhai wrote: > > > Thus I have to spend some time reviewing your patch, > > and I cannot give you an answer in the next 5 minutes. > > No problem, 5 minutes response never required. Thanks for > your review. thank you. I would like to take some time this weekend to

Re: [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event() (then kill NETDEV_UNREGISTER_FINAL)

2018-03-16 Thread Sowmini Varadhan
On (03/16/18 21:14), Kirill Tkhai wrote: > > I did the second version and sent you. Have you tried it? I tried it briefly, and it works for the handful of testcases that I tried, but I still think it's very weird to register as both a device and a subsys, esp in the light of the warning in

Re: [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event() (then kill NETDEV_UNREGISTER_FINAL)

2018-03-16 Thread Sowmini Varadhan
I had taken some of this offline, but it occurs to me that some of these notes should be saved to the netdev archives, in case this question pops up again in a few years. When I run your patch, I get a repeatable panic by doing modprobe rds-tcp ip netns create blue the panic is because we

Re: [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event() (then kill NETDEV_UNREGISTER_FINAL)

2018-03-16 Thread Sowmini Varadhan
Found my previous question: https://www.mail-archive.com/netdev@vger.kernel.org/msg72330.html (see section about "Comments are specifically ivinted.." > This is not a problem, and rds-tcp is not the only pernet_subsys registering > a socket. It's OK to close it from .exit method. There are

Re: [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event() (then kill NETDEV_UNREGISTER_FINAL)

2018-03-16 Thread Sowmini Varadhan
On (03/16/18 15:38), Kirill Tkhai wrote: > > 467fa15356acf by Sowmini Varadhan added NETDEV_UNREGISTER_FINAL dependence > with the commentary: > > /* rds-tcp registers as a pernet subys, so the ->exit will only >* get invoked after network acitivity

[PATCH net-next] rds: tcp: must use spin_lock_irq* and not spin_lock_bh with rds_tcp_conn_lock

2018-03-15 Thread Sowmini Varadhan
use rds_destroy_pending() correctly. Reported-by: syzbot+c68e51bb5e699d3f8...@syzkaller.appspotmail.com Fixes: ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and rds connection/workq management") Signed-off-by: Sowmini Varadhan <

Re: WARNING in __local_bh_enable_ip (2)

2018-03-14 Thread Sowmini Varadhan
On (03/14/18 14:28), Eric Dumazet wrote: > > > spin_lock_bh(_tcp_conn_lock);/spin_unlock_bh(_tcp_conn_lock); in > rds_tcp_conn_free() > > is in conflict with the spin_lock_irqsave(_conn_lock, flags); > in __rds_conn_create() yes I was going to look into this and fix it later. > Hard to

Re: [PATCH][rds-next] rds: make functions rds_info_from_znotifier and rds_message_zcopy_from_user static

2018-03-11 Thread Sowmini Varadhan
On (03/11/18 18:03), Colin King wrote: > From: Colin Ian King > > Functions rds_info_from_znotifier and rds_message_zcopy_from_user are > local to the source and do not need to be in global scope, so make them > static. the rds_message_zcopy_from_user warning was

Re: [rds-devel] [PATCH][rds-next] rds: remove redundant variable 'sg_off'

2018-03-11 Thread Sowmini Varadhan
On (03/11/18 17:27), Colin King wrote: > Variable sg_off is assigned a value but it is never read, hence it is > redundant and can be removed. > Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com>

Re: [RFC PATCH linux-next] rds: rds_message_zcopy_from_user() can be static

2018-03-08 Thread Sowmini Varadhan
On (03/08/18 18:56), kbuild test robot wrote: > > Fixes: d40a126b16ea ("rds: refactor zcopy code into > rds_message_zcopy_from_user") > Signed-off-by: Fengguang Wu <fengguang...@intel.com> Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com> (do I need to separately submit a non-RFC patch for this?)

Re: [PATCH net-next] sock: Fix SO_ZEROCOPY switch case

2018-03-07 Thread Sowmini Varadhan
On (03/07/18 09:40), Jesus Sanchez-Palencia wrote: > Fix the SO_ZEROCOPY switch case on sock_setsockopt() avoiding the > ret values to be overwritten by the one set on the default case. Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com>

[PATCH net-next 1/2] rds: refactor zcopy code into rds_message_zcopy_from_user

2018-03-06 Thread Sowmini Varadhan
Move the large block of code predicated on zcopy from rds_message_copy_from_user into a new function, rds_message_zcopy_from_user() Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- net/rds/message.c | 108 +--- 1 files chang

[PATCH net-next 0/2] RDS: zerocopy code enhancements

2018-03-06 Thread Sowmini Varadhan
) Sowmini Varadhan (2): rds: refactor zcopy code into rds_message_zcopy_from_user rds: use list structure to track information for zerocopy completion notification

[PATCH net-next 2/2] rds: use list structure to track information for zerocopy completion notification

2018-03-06 Thread Sowmini Varadhan
okie_queue by a simpler list that results in a smaller memory footprint as well as more efficient memory_allocation time. Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- net/rds/af_rds.c |6 ++-- net/rds/message.c | 77 +

Re: [PATCH v2 net] rds: Incorrect reference counting in TCP socket creation

2018-03-02 Thread Sowmini Varadhan
figured as a kernel module. Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com>

Re: [PATCH net] rds: Incorrect reference counting in TCP socket creation

2018-03-01 Thread Sowmini Varadhan
On (03/01/18 20:19), Ka-Cheong Poon wrote: > >> > >>- new_sock->type = sock->type; > >>- new_sock->ops = sock->ops; > >> ret = sock->ops->accept(sock, new_sock, O_NONBLOCK, true); > >> if (ret < 0) > >> goto out; > >> > >>+ new_sock->ops =

Re: [PATCH net] rds: Incorrect reference counting in TCP socket creation

2018-03-01 Thread Sowmini Varadhan
On Wed, Feb 28, 2018 at 11:44 PM, Ka-Cheong Poon wrote: > Commit 0933a578cd55 ("rds: tcp: use sock_create_lite() to create the > accept socket") has a reference counting issue in TCP socket creation > when accepting a new connection. The code uses sock_create_lite() to

[PATCH RESEND V3 net-next 3/3] selftests/net: reap zerocopy completions passed up as ancillary data.

2018-02-27 Thread Sowmini Varadhan
PF_RDS sockets pass up cookies for zerocopy completion as ancillary data. Update msg_zerocopy to reap this information. Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> Acked-by: Willem de Bruijn <will...@google.com> Acked-by: Santosh Shilimkar <santosh.shilim...@ora

[PATCH RESEND V3 net-next 0/3] RDS: optimized notification for zerocopy completion

2018-02-27 Thread Sowmini Varadhan
to remove the sk_error_queue related paths in RDS. Both of these goals are implemented in this series. v2: removed sk_error_queue support v3: incorporated additional code review comments (details in each patch) Sowmini Varadhan (3): selftests/net: revert the zerocopy Rx path for PF_RDS rds: deliver

[PATCH RESEND V3 net-next 1/3] selftests/net: revert the zerocopy Rx path for PF_RDS

2018-02-27 Thread Sowmini Varadhan
In preparation for optimized reception of zerocopy completion, revert the Rx side changes introduced by Commit dfb8434b0a94 ("selftests/net: add zerocopy support for PF_RDS test case") Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> Acked-by: Willem de Bruijn <

[PATCH RESEND V3 net-next 2/3] rds: deliver zerocopy completion notification with data

2018-02-27 Thread Sowmini Varadhan
es support for zerocopy completion notification on MSG_ERRQUEUE for PF_RDS sockets. Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> Acked-by: Willem de Bruijn <will...@google.com> Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com> --- v2: remove sk_error_q

Re: [PATCH net-next 0/2] ntuple filters with RSS

2018-02-27 Thread Sowmini Varadhan
On (02/27/18 12:40), David Miller wrote: > > I'm expecting a V3 respin of your zerocopy notification changes > if that is what you're talking about, and I therefore marked > the most recent spin as "changes requested". sorry, I'm confused - you are waiting for V4? I am not seeing v3 on

Re: [PATCH net-next 0/2] ntuple filters with RSS

2018-02-27 Thread Sowmini Varadhan
On (02/27/18 11:49), David Miller wrote: > > Do I need to resend? > > Yes, see my other email. do we need to resend patches not showing up in patchwork? I recall seeing email about picking things manually from inbox but missed this. --Sowmini

Re: [PATCH V3 net-next 2/3] rds: deliver zerocopy completion notification with data

2018-02-26 Thread Sowmini Varadhan
On (02/26/18 09:07), Santosh Shilimkar wrote: > Just in case you haven't seen yet, Dan Carpenter reported skb deref > warning on previous version of the patch. Not sure why it wasn't sent > on netdev. yes I saw it, but that was for the previous version of the patch around code that delivers

[PATCH V3 net-next 1/3] selftests/net: revert the zerocopy Rx path for PF_RDS

2018-02-25 Thread Sowmini Varadhan
In preparation for optimized reception of zerocopy completion, revert the Rx side changes introduced by Commit dfb8434b0a94 ("selftests/net: add zerocopy support for PF_RDS test case") Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- v2: prepare to remove sk_e

[PATCH V3 net-next 3/3] selftests/net: reap zerocopy completions passed up as ancillary data.

2018-02-25 Thread Sowmini Varadhan
PF_RDS sockets pass up cookies for zerocopy completion as ancillary data. Update msg_zerocopy to reap this information. Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- v2: receive zerocopy completion notification as POLLIN v3: drop ncookies arg in do_process_zerocopy_c

[PATCH V3 net-next 2/3] rds: deliver zerocopy completion notification with data

2018-02-25 Thread Sowmini Varadhan
es support for zerocopy completion notification on MSG_ERRQUEUE for PF_RDS sockets. Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- v2: remove sk_error_queue path; lot of cautionary checks rds_recvmsg_zcookie() and callers to make sure we don't remove cookie

[PATCH V3 net-next 0/3] RDS: optimized notification for zerocopy completion

2018-02-25 Thread Sowmini Varadhan
sk_error_queue support v3: incorporated additional code review comments (details in each patch) Sowmini Varadhan (3): selftests/net: revert the zerocopy Rx path for PF_RDS rds: deliver zerocopy completion notification with data selftests/net: reap zerocopy completions passed up as ancillary data

Re: [PATCH V2 net-next 2/3] rds: deliver zerocopy completion notification with data

2018-02-25 Thread Sowmini Varadhan
On (02/25/18 10:56), Willem de Bruijn wrote: > > @@ -91,22 +85,19 @@ static void rds_rm_zerocopy_callback(struct rds_sock > > *rs, > > spin_unlock_irqrestore(>lock, flags); > > mm_unaccount_pinned_pages(>z_mmp); > >

[PATCH V2 net-next 0/3] RDS: optimized notification for zerocopy completion

2018-02-23 Thread Sowmini Varadhan
pointed out that socket functions block if sk_err is non-zero, thus if the RDS code does not plan/need to use sk_error_queue path for completion notification, it is preferable to remove the sk_error_queue related paths in RDS. Both of these goals are implemented in this series. Sowmini Varadhan (3

[PATCH V2 net-next 3/3] selftests/net: reap zerocopy completions passed up as ancillary data.

2018-02-23 Thread Sowmini Varadhan
PF_RDS sockets pass up cookies for zerocopy completion as ancillary data. Update msg_zerocopy to reap this information. Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- v2: receive zerocopy completion notification as POLLIN tools/testing/selftests/net/msg_zerocopy.c

[PATCH V2 net-next 1/3] selftests/net: revert the zerocopy Rx path for PF_RDS

2018-02-23 Thread Sowmini Varadhan
In preparation for optimized reception of zerocopy completion, revert the Rx side changes introduced by Commit dfb8434b0a94 ("selftests/net: add zerocopy support for PF_RDS test case") Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- v2: prepare to remove sk_e

[PATCH V2 net-next 2/3] rds: deliver zerocopy completion notification with data

2018-02-23 Thread Sowmini Varadhan
es support for zerocopy completion notification on MSG_ERRQUEUE for PF_RDS sockets. Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- v2: remove sk_error_queue path; lot of cautionary checks rds_recvmsg_zcookie() and callers to make sure we don't remove cookie

[PATCH net-next] rds: rds_msg_zcopy should return error of null rm->data.op_mmp_znotifier

2018-02-22 Thread Sowmini Varadhan
bot+f893ae7bb2f6456df...@syzkaller.appspotmail.com Fixes: 0cebaccef3ac ("rds: zerocopy Tx support.") Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- net/rds/send.c |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/net/rds/send.c b/net/rds/send.c ind

Re: [PATCH net-next] RDS: deliver zerocopy completion notification with data as an optimization

2018-02-22 Thread Sowmini Varadhan
On (02/21/18 19:39), Willem de Bruijn wrote: > >> By the way, the put_cmsg is unconditional even if the caller did > >> not supply msg_control. So it is basically no longer safe to ever > >> call read, recv or recvfrom on a socket if zerocopy notifications > >> are outstanding. > > > > Wait, I
