Re: [PATCH net] rds: RDS (tcp) hangs on sendto() to unresponding address
10/8/2018 9:17 AM, Ka-Cheong Poon wrote: In rds_send_mprds_hash(), if the calculated hash value is non-zero and the MPRDS connections are not yet up, it will wait. But it should not wait if the send is non-blocking. In this case, it should just use the base c_path for sending the message. Signed-off-by: Ka-Cheong Poon --- Thanks for getting this out on the list. Acked-by: Santosh Shilimkar
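The non-blocking fallback described above can be sketched in plain userspace C. This is an illustrative model only, not the kernel code: the struct, the path count, and mprds_pick_path() are made-up stand-ins for rds_send_mprds_hash() and its connection paths.

```c
#include <stdbool.h>

#define RDS_MPATH_WORKERS 8   /* illustrative path count */

/* Toy model of a multipath RDS connection: one "up" flag per c_path. */
struct conn {
	bool path_up[RDS_MPATH_WORKERS];
};

/* Sketch of the fix: a non-blocking sender whose hashed path is not
 * yet up falls back to the base path (index 0) instead of sleeping. */
static int mprds_pick_path(const struct conn *c, unsigned int hash, bool nonblock)
{
	if (hash != 0 && !c->path_up[hash]) {
		if (nonblock)
			return 0;   /* use the base c_path, don't wait */
		/* a blocking sender would sleep here until the path is up */
	}
	return (int)hash;
}
```

The key point of the patch is only the `nonblock` branch: the hash computation and the wait itself are unchanged.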
Re: [PATCH 1/1] net: rds: use memset to optimize the recv
On 9/14/2018 1:45 AM, Zhu Yanjun wrote: The function rds_inc_init is in the recv path. Using memset can optimize the function rds_inc_init. The test result: Before: 1) + 24.950 us | rds_inc_init [rds](); After: 1) + 10.990 us | rds_inc_init [rds](); Signed-off-by: Zhu Yanjun --- Looks good. Thanks !! Acked-by: Santosh Shilimkar
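The optimization being acked replaces several field-by-field zero stores with one memset over the contiguous tail of the structure. A hedged userspace sketch of the idea (the struct and its fields are illustrative, not the real struct rds_incoming):

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct rds_incoming; field names and layout
 * are illustrative only. */
struct inc {
	long i_refcount;
	void *i_conn;
	unsigned long i_rx_jiffies;     /* start of the zeroed tail */
	unsigned long i_usercopy[3];
};

/* Old pattern: zero each field with an individual store. */
static void inc_init_fields(struct inc *i)
{
	i->i_rx_jiffies = 0;
	i->i_usercopy[0] = 0;
	i->i_usercopy[1] = 0;
	i->i_usercopy[2] = 0;
}

/* Optimized pattern: one memset over the contiguous tail, which the
 * compiler/libc can turn into a few wide stores. */
static void inc_init_memset(struct inc *i)
{
	memset(&i->i_rx_jiffies, 0,
	       sizeof(*i) - offsetof(struct inc, i_rx_jiffies));
}
```

Both produce the same zeroed region; the memset form is what the quoted ftrace numbers (24.950 us down to 10.990 us) are measuring in the real function.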
Re: [Patch net v2] rds: fix two RCU related problems
On 9/10/2018 6:27 PM, Cong Wang wrote: When a rds sock is bound, it is inserted into the bind_hash_table which is protected by RCU. But when releasing rds sock, after it is removed from this hash table, it is freed immediately without respecting RCU grace period. This could cause some use-after-free as reported by syzbot. Mark the rds sock with SOCK_RCU_FREE before inserting it into the bind_hash_table, so that it would be always freed after a RCU grace period. The other problem is in rds_find_bound(), the rds sock could be freed in between rhashtable_lookup_fast() and rds_sock_addref(), so we need to extend RCU read lock protection in rds_find_bound() to close this race condition. Reported-and-tested-by: syzbot+8967084bcac563795...@syzkaller.appspotmail.com Reported-by: syzbot+93a5839deb3555374...@syzkaller.appspotmail.com Cc: Sowmini Varadhan Cc: Santosh Shilimkar Cc: rds-de...@oss.oracle.com Signed-off-by: Cong Wang --- Thank you !! Acked-by: Santosh Shilimkar
Re: [Patch net] rds: mark bound socket with SOCK_RCU_FREE
On 9/10/2018 5:45 PM, Cong Wang wrote: On Mon, Sep 10, 2018 at 5:26 PM Santosh Shilimkar wrote: Would you mind posting an updated patch please with call_rcu and above extended RCU grace period with rcu_read_lock. Thanks !! If you prefer to fix _two_ problems in one patch, sure. For the record, the bug this patch fixes is NOT same with the one in rds_find_bound(), because there is no rds_find_bound() in the backtrace. If you want to see the full backtrace, here it is: https://marc.info/?l=linux-netdev=15364808962=2 This is why I believe they are two problems. Whether fixing two problems in one patch or two patches is merely a preference, I leave it up to you. I had a partial fix as well in past but since it wasn't covering complete window, it was abandoned. Since we know the other race, it would be best to address it together and hence requested a combine patch. Thanks for help !! Regards, Santosh
Re: [Patch net] rds: mark bound socket with SOCK_RCU_FREE
On 9/10/2018 5:16 PM, Cong Wang wrote: On Mon, Sep 10, 2018 at 5:04 PM Sowmini Varadhan wrote: On (09/10/18 16:51), Cong Wang wrote:

	__rds_create_bind_key(key, addr, port, scope_id);
-	rs = rhashtable_lookup_fast(&bind_hash_table, key, ht_parms);
+	rcu_read_lock();
+	rs = rhashtable_lookup(&bind_hash_table, key, ht_parms);
	if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
		rds_sock_addref(rs);
	else
		rs = NULL;
+	rcu_read_unlock();

aiui, the rcu_read lock/unlock is only useful if the write side doing destructive operations does something to make sure readers are done before doing the destructive operation. AFAIK, that does not exist for rds socket management today. That is exactly why we need it here, right? As you mentioned previously, the sock could be freed after rhashtable_lookup_fast() but before rds_sock_addref(), so extending the RCU read section past rds_sock_addref() exactly solves the problem here. That's the case really. Or is there any other destructive problem you didn't mention? Mind being specific? Although the sock release path is not a very hot path, blocking it isn't a good idea either, especially when you can use call_rcu(), which has the same effect. I don't see any reason we should prefer synchronize_rcu() here. Usually correctness (making sure all readers are done, before nuking a data structure) is a little bit more important than performance, aka "safety before speed" is what I've always been taught. Hmm, so you are saying synchronize_rcu() is kinda more correct than call_rcu()?? I never heard this before, would like to know why. To my best knowledge, the only difference between them is the context, one is blocking, the other is non-blocking. Their correctness must be equivalent. We have burnt our hands with blocking synchronize_rcu() and that was actually the main reason we moved to rw locks from rcu to localise the cost. call_rcu() should be just fine as long as it plugs the hole.
I don't want to add blocking in this path since this slows down the connection shutdown path and at times leads to soft dumps. Would you mind posting an updated patch please with call_rcu and the above extended RCU grace period with rcu_read_lock. Thanks !! Regards, Santosh
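The fix being agreed on has two independent parts: free the bound socket only after an RCU grace period (SOCK_RCU_FREE / call_rcu), and keep the lookup and the refcount grab inside one read-side critical section. The second part can be sketched in userspace C; a pthread mutex stands in for rcu_read_lock()/rcu_read_unlock(), and all names are illustrative stand-ins, not the kernel API.

```c
#include <pthread.h>
#include <stddef.h>

/* Toy socket: refcount plus a "dead" flag standing in for SOCK_DEAD. */
struct fake_sock {
	int refcount;
	int dead;
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for rhashtable_lookup(): just hands back the candidate. */
static struct fake_sock *lookup(struct fake_sock *table_entry)
{
	return table_entry;
}

/* Sketch of the fixed rds_find_bound() shape: lookup AND addref happen
 * inside one critical section, so the entry cannot be freed between
 * finding it and pinning it. */
static struct fake_sock *find_bound(struct fake_sock *table_entry)
{
	struct fake_sock *rs;

	pthread_mutex_lock(&table_lock);       /* rcu_read_lock() */
	rs = lookup(table_entry);
	if (rs && !rs->dead)
		rs->refcount++;                /* rds_sock_addref() */
	else
		rs = NULL;
	pthread_mutex_unlock(&table_lock);     /* rcu_read_unlock() */
	return rs;
}
```

The writer side (the mutex here, a grace period in the kernel) guarantees no reader is between lookup and addref when the object is torn down.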
Re: [Patch net] rds: mark bound socket with SOCK_RCU_FREE
On 9/10/2018 3:24 PM, Cong Wang wrote: When a rds sock is bound, it is inserted into the bind_hash_table which is protected by RCU. But when releasing a rds sock, after it is removed from this hash table, it is freed immediately without respecting the RCU grace period. This could cause some use-after-free as reported by syzbot. Indeed. Mark the rds sock as SOCK_RCU_FREE before inserting it into the bind_hash_table, so that it would always be freed after a RCU grace period. Reported-and-tested-by: syzbot+8967084bcac563795...@syzkaller.appspotmail.com Cc: Sowmini Varadhan Cc: Santosh Shilimkar Cc: rds-de...@oss.oracle.com Signed-off-by: Cong Wang ---

 net/rds/bind.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/rds/bind.c b/net/rds/bind.c
index 3ab55784b637..2281b34415b9 100644
--- a/net/rds/bind.c
+++ b/net/rds/bind.c
@@ -235,6 +235,7 @@ int rds_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 		goto out;
 	}

+	sock_set_flag(sk, SOCK_RCU_FREE);
 	ret = rds_add_bound(rs, binding_addr, &port, scope_id);
 	if (ret)
 		goto out;

I wasn't aware of this "SOCK_RCU_FREE" so really thanks for this patch. Have been scratching my head over this for a while thinking about a generic provision at sk level to synchronize. This is much better than adding the sync at the upper layer. It does have the tax of slowing down RDS sock freeing like other kernel components using rcu sync, but anyway this hole needs to be plugged so am fine to go ahead with this change. Thanks for the patch. FWIW, Acked-by: Santosh Shilimkar
Re: [PATCH] rds: tcp: remove duplicated include from tcp.c
On 8/21/2018 7:05 AM, Yue Haibing wrote: Remove duplicated include. Signed-off-by: Yue Haibing --- Looks fine. Acked-by: Santosh Shilimkar
Re: [RFC v3 net-next 4/5] rds: invoke socket sg filter attached to rds socket
On 8/17/2018 4:08 PM, Tushar Dave wrote: RDS module sits on top of TCP (rds_tcp) and IB (rds_rdma), so messages arrive in the form of skb (over TCP) and scatterlist (over IB/RDMA). However, because the socket filter only deals with skb (e.g. struct skb as bpf context) we can only use the socket filter for rds_tcp and not for rds_rdma. Considering one filtering solution for RDS, it seems that the common denominator between sk_buff and scatterlist is scatterlist. Therefore, this patch converts skb to sgvec and invokes sg_filter_run for rds_tcp, and simply invokes sg_filter_run for IB/rds_rdma. Signed-off-by: Tushar Dave Reviewed-by: Sowmini Varadhan --- Looks good Tushar !! Acked-by: Santosh Shilimkar
Re: [PATCH net-next] rds: avoid lock hierarchy violation between m_rs_lock and rs_recv_lock
On 8/8/2018 3:18 PM, Sowmini Varadhan wrote: On (08/08/18 14:51), Santosh Shilimkar wrote: This bug doesn't make sense since two different transports are using the same socket (Loop and rds_tcp) and running together. For the same transport, such a race can't happen with the MSG_ON_SOCK flag. CPU1 -> rds_loop_inc_free, CPU0 -> rds_tcp_cork ... The test is just reporting a lock hierarchy violation. As far as I can tell, this wasn't an actual deadlock that happened because, as you point out, either a socket has the rds_tcp transport or the rds_loop transport, so this particular pair of stack traces would not happen with the code as it exists today. Exactly. But there is a valid lock hierarchy violation here, and imho it's a good idea to get that cleaned up. The lock hierarchy violation is protected for the same transport. I don't see this violation possible for legitimate use, hence the comment. If we start supporting two different transports on the same socket then we have many more cases to fix, and such a lock violation will be just one of those. Loop transport seems to keep throwing surprises. Need to confirm, but it looks like it can co-exist with another transport on the same socket if those traces are to be believed. If that's the case, then definitely that needs to be plugged. Regards, Santosh
Re: [PATCH net-next] rds: avoid lock hierarchy violation between m_rs_lock and rs_recv_lock
On 8/8/2018 1:57 PM, Sowmini Varadhan wrote: The following deadlock, reported by syzbot, can occur if CPU0 is in rds_send_remove_from_sock() while CPU1 is in rds_clear_recv_queue()

       CPU0                                 CPU1
       lock(&(&rm->m_rs_lock)->rlock);
                                            lock(&rs->rs_recv_lock);
                                            lock(&(&rm->m_rs_lock)->rlock);
       lock(&rs->rs_recv_lock);

The deadlock should be avoided by moving the messages from the rs_recv_queue into a tmp_list in rds_clear_recv_queue() under the rs_recv_lock, and then dropping the refcnt on the messages in the tmp_list (potentially resulting in rds_message_purge()) after dropping the rs_recv_lock. The same lock hierarchy violation also exists in rds_still_queued() and should be avoided in a similar manner. Signed-off-by: Sowmini Varadhan Reported-by: syzbot+52140d69ac6dc6b92...@syzkaller.appspotmail.com --- This bug doesn't make sense since two different transports are using the same socket (Loop and rds_tcp) and running together. For the same transport, such a race can't happen with the MSG_ON_SOCK flag. CPU1 -> rds_loop_inc_free, CPU0 -> rds_tcp_cork ... I need to understand this test better. Regards, Santosh
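The fix pattern in the commit message, detach the queue under the lock and drop message references only after unlocking, can be sketched in userspace C. The list type and the names here are illustrative stand-ins for rds_clear_recv_queue() and the message refcount drop, not the kernel list API.

```c
#include <pthread.h>
#include <stdlib.h>

struct msg {
	struct msg *next;
};

struct recv_queue {
	pthread_mutex_t lock;       /* stands in for rs_recv_lock */
	struct msg *head;
};

static int freed;                   /* counts "refcount drops" for the demo */

/* Stand-in for dropping the last ref (which may run rds_message_purge()
 * and take other locks, so it must not run under the queue lock). */
static void msg_put(struct msg *m)
{
	free(m);
	freed++;
}

static void clear_recv_queue(struct recv_queue *q)
{
	struct msg *tmp;

	pthread_mutex_lock(&q->lock);   /* only splice under the lock */
	tmp = q->head;
	q->head = NULL;
	pthread_mutex_unlock(&q->lock);

	while (tmp) {                   /* lock dropped: safe to purge */
		struct msg *next = tmp->next;
		msg_put(tmp);
		tmp = next;
	}
}
```

Because msg_put() runs with the queue lock released, the two locks are never held in the conflicting order shown in the syzbot trace.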
Re: [PATCH net-next 2/2] rds: Remove IPv6 dependency
On 7/30/2018 10:48 PM, Ka-Cheong Poon wrote: This patch removes the IPv6 dependency from RDS. Signed-off-by: Ka-Cheong Poon ---

 net/rds/Kconfig          |  2 +-
 net/rds/af_rds.c         | 32 +++-
 net/rds/bind.c           |  4 +++-
 net/rds/connection.c     | 26 --
 net/rds/ib.c             | 31 ++-
 net/rds/ib_cm.c          |  9 +
 net/rds/ib_rdma.c        |  2 ++
 net/rds/rdma_transport.c | 10 ++
 net/rds/recv.c           |  2 ++
 net/rds/send.c           |  2 ++
 net/rds/tcp.c            | 25 +
 net/rds/tcp_listen.c     | 21 +
 12 files changed, 140 insertions(+), 26 deletions(-)

diff --git a/net/rds/Kconfig b/net/rds/Kconfig
index 4c7f259..41f7556 100644
--- a/net/rds/Kconfig
+++ b/net/rds/Kconfig
@@ -1,7 +1,7 @@
 config RDS
 	tristate "The RDS Protocol"
-	depends on INET && IPV6
+	depends on INET

Thanks for followup Ka-Cheong.

diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
index fc5c48b..65387e1 100644
--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -156,18 +156,20 @@ static int rds_getname(struct socket *sock, struct sockaddr *uaddr,
 		return sizeof(*sin);
 	}
-	if (ipv6_addr_type(&rs->rs_conn_addr) &
-	    IPV6_ADDR_MAPPED) {
-		sin = (struct sockaddr_in *)uaddr;
-		memset(sin, 0, sizeof(*sin));
-		sin->sin_family = AF_INET;
-		return sizeof(*sin);
+#if IS_ENABLED(CONFIG_IPV6)
+	if (!(ipv6_addr_type(&rs->rs_conn_addr) &
+	      IPV6_ADDR_MAPPED)) {
+		sin6 = (struct sockaddr_in6 *)uaddr;
+		memset(sin6, 0, sizeof(*sin6));
+		sin6->sin6_family = AF_INET6;
+		return sizeof(*sin6);
 	}
+#endif

I don't like this #ifdef spaghetti all over in the middle of functions, but it seems like that is the best option considering the involved changes from the previous series. FWIW, Acked-by: Santosh Shilimkar
Re: [PATCH net-next 1/2] rds: rds_ib_recv_alloc_cache() should call alloc_percpu_gfp() instead
On 7/30/2018 10:48 PM, Ka-Cheong Poon wrote: Currently, rds_ib_conn_alloc() calls rds_ib_recv_alloc_caches() without passing along the gfp_t flag. But rds_ib_recv_alloc_caches() and rds_ib_recv_alloc_cache() should take a gfp_t parameter so that rds_ib_recv_alloc_cache() can call alloc_percpu_gfp() using the correct flag instead of calling alloc_percpu(). Signed-off-by: Ka-Cheong Poon --- Looks good !! Acked-by: Santosh Shilimkar
Re: [PATCH] rds: send: Fix dead code in rds_sendmsg
On 7/25/2018 8:22 AM, Gustavo A. R. Silva wrote: Currently, code at label *out* is unreachable. Fix this by updating variable *ret* with -EINVAL, so the jump to *out* can be properly executed instead of directly returning from function. Addresses-Coverity-ID: 1472059 ("Structurally dead code") Fixes: 1e2b44e78eea ("rds: Enable RDS IPv6 support") Signed-off-by: Gustavo A. R. Silva --- Looks fine. Acked-by: Santosh Shilimkar
Re: [PATCH v4 net-next 3/3] rds: Extend RDS API for IPv6 support
On 7/23/2018 7:16 AM, Ka-Cheong Poon wrote: There are many data structures (RDS socket options) used by RDS apps which use a 32 bit integer to store IP address. To support IPv6, struct in6_addr needs to be used. To ensure backward compatibility, a new data structure is introduced for each of those data structures which use a 32 bit integer to represent an IP address. And new socket options are introduced to use those new structures. This means that existing apps should work without a problem with the new RDS module. For apps which want to use IPv6, those new data structures and socket options can be used. IPv4 mapped address is used to represent IPv4 address in the new data structures. v4: Revert changes to SO_RDS_TRANSPORT Signed-off-by: Ka-Cheong Poon --- Looks good to me now. Thanks !! Acked-by: Santosh Shilimkar
Re: [PATCH v4 net-next 2/3] rds: Enable RDS IPv6 support
On 7/23/2018 7:16 AM, Ka-Cheong Poon wrote: This patch enables RDS to use IPv6 addresses. For RDS/TCP, the listener is now an IPv6 endpoint which accepts both IPv4 and IPv6 connection requests. RDS/RDMA/IB uses a private data (struct rds_ib_connect_private) exchange between endpoints at RDS connection establishment time to support RDMA. This private data exchange uses a 32 bit integer to represent an IP address. This needs to be changed in order to support IPv6. A new private data struct rds6_ib_connect_private is introduced to handle this. To ensure backward compatibility, an IPv6 capable RDS stack uses another RDMA listener port (RDS_CM_PORT) to accept IPv6 connection. And it continues to use the original RDS_PORT for IPv4 RDS connections. When it needs to communicate with an IPv6 peer, it uses the RDS_CM_PORT to send the connection set up request. v4: Changed port history comments in rds.h (Sowmini Varadhan). v3: Added support to set up IPv4 connection using mapped address (David Miller). Added support to set up connection between link local and non-link addresses. Various review comments from Santosh Shilimkar and Sowmini Varadhan. v2: Fixed bound and peer address scope mismatched issue. Added back rds_connect() IPv6 changes. Signed-off-by: Ka-Cheong Poon --- Looks good to me now. Thanks !! Acked-by: Santosh Shilimkar
Re: [PATCH v4 net-next 1/3] rds: Changing IP address internal representation to struct in6_addr
On 7/23/2018 7:16 AM, Ka-Cheong Poon wrote: This patch changes the internal representation of an IP address to use struct in6_addr. IPv4 address is stored as an IPv4 mapped address. All the functions which take an IP address as argument are also changed to use struct in6_addr. But RDS socket layer is not modified such that it still does not accept IPv6 address from an application. And RDS layer does not accept nor initiate IPv6 connections. v2: Fixed sparse warnings. Signed-off-by: Ka-Cheong Poon --- Looks good to me now. Thanks !! Acked-by: Santosh Shilimkar
Re: [PATCH v3 net-next 0/3] rds: IPv6 support
On 7/13/2018 4:02 AM, Ka-Cheong Poon wrote: This patch set adds IPv6 support to the kernel RDS and related modules. Existing RDS apps using IPv4 address continue to run without any problem. New RDS apps which want to use IPv6 address can do so by passing the address in struct sockaddr_in6 to bind(), connect() or sendmsg(). And those apps also need to use the new IPv6 equivalents of some of the existing socket options as the existing options use a 32 bit integer to store IP address. [...] Ka-Cheong Poon (3): rds: Changing IP address internal representation to struct in6_addr rds: Enable RDS IPv6 support rds: Extend RDS API for IPv6 support Could you please post v4 and as aligned, please drop the SO_TRANSPORT change from the set. regards, Santosh
Re: [PATCH v3 net-next 3/3] rds: Extend RDS API for IPv6 support
On 7/13/2018 4:27 PM, David Miller wrote: From: Santosh Shilimkar Date: Fri, 13 Jul 2018 15:00:59 -0700 Of course any application built using the upstream header and using SO_RDS_TRANSPORT will break, but since this particular option was added for a special case (application wants to upfront select a transport instead of letting bind figure it out), our hope is it's not used by other application(s). We can't let people have different UAPIs from upstream on a whim like this then change the already released upstream UAPI to match. Please take this into consideration when making changes in the future. It will not be repeated in the future. I'm not allowing this upstream UAPI break, sorry. Ok Dave !! Regards, Santosh
Re: [PATCH v3 net-next 3/3] rds: Extend RDS API for IPv6 support
Hi Dave, On 7/13/2018 2:25 PM, David Miller wrote: From: Ka-Cheong Poon Date: Fri, 13 Jul 2018 04:02:59 -0700

@@ -52,7 +52,7 @@
 #define RDS_RECVERR			5
 #define RDS_CONG_MONITOR		6
 #define RDS_GET_MR_FOR_DEST		7
-#define SO_RDS_TRANSPORT		8
+#define SO_RDS_TRANSPORT		9

There is no way you can change this value without breaking applications. Downstream Oracle shipping applications have been built with 9 as SO_RDS_TRANSPORT from the beginning. 8 is used for RDS_CONN_RESET but support for it doesn't exist upstream yet. Ka-Cheong was aligning them so that we have the same number upstream as well as in shipping products. Of course any application built using the upstream header and using SO_RDS_TRANSPORT will break, but since this particular option was added for a special case (application wants to upfront select a transport instead of letting bind figure it out), our hope is it's not used by other application(s). Regards, Santosh
Re: [PATCH v2 net-next 1/3] rds: Changing IP address internal representation to struct in6_addr
Hi Ka-Cheong, On 7/6/2018 8:25 AM, Sowmini Varadhan wrote: On (07/06/18 23:08), Ka-Cheong Poon wrote: As mentioned in a previous mail, it is unclear why the port number is transport specific. Most Internet services use the same port number running over TCP/UDP as shown in the IANA database. And the IANA RDS registration is the same. What is the rationale for having a transport specific number in the RDS implementation? Because not every transport may need a port number. Let's keep separate ports for the RDMA and TCP transports. This has already been useful for the wireshark dissector and can also help eBPF-like external tooling. The fragment format and re-assembly are different across transports. I do see your point and also agree that the port number isn't transport specific, and in case we need to add another transport, what port to use. But maybe till then let's keep the existing behavior. This port switch is not related to IPv6 support as such, so let's deal with it separately. Regards, Santosh
Re: [PATCH v2 net-next 1/3] rds: Changing IP address internal representation to struct in6_addr
On 6/27/2018 3:23 AM, Ka-Cheong Poon wrote: This patch changes the internal representation of an IP address to use struct in6_addr. IPv4 address is stored as an IPv4 mapped address. All the functions which take an IP address as argument are also changed to use struct in6_addr. But RDS socket layer is not modified such that it still does not accept IPv6 address from an application. And RDS layer does not accept nor initiate IPv6 connections. v2: Fixed sparse warnings. Signed-off-by: Ka-Cheong Poon ---

 net/rds/af_rds.c         | 138 --
 net/rds/bind.c           |  91 +-
 net/rds/cong.c           |  23 ++--
 net/rds/connection.c     | 132 +
 net/rds/ib.c             |  17 +--
 net/rds/ib.h             |  45 +--
 net/rds/ib_cm.c          | 300 +++
 net/rds/ib_rdma.c        |  15 +--
 net/rds/ib_recv.c        |  18 +--
 net/rds/ib_send.c        |  10 +-
 net/rds/loop.c           |   7 +-
 net/rds/rdma.c           |   6 +-
 net/rds/rdma_transport.c |  56 ++---
 net/rds/rds.h            |  69 +++
 net/rds/recv.c           |  51 +---
 net/rds/send.c           |  67 ---
 net/rds/tcp.c            |  32 -
 net/rds/tcp_connect.c    |  34 +++---
 net/rds/tcp_listen.c     |  18 +--
 net/rds/tcp_recv.c       |   9 +-
 net/rds/tcp_send.c       |   4 +-
 net/rds/threads.c        |  69 +--
 net/rds/transport.c      |  15 ++-
 23 files changed, 857 insertions(+), 369 deletions(-)

diff --git a/net/rds/bind.c b/net/rds/bind.c
index 5aa3a64..3822886 100644
--- a/net/rds/bind.c
+++ b/net/rds/bind.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 Oracle. All rights reserved.
+ * Copyright (c) 2006, 2018 Oracle and/or its affiliates. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
@@ -33,6 +33,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -42,42 +43,58 @@
 static const struct rhashtable_params ht_parms = {
 	.nelem_hint = 768,
-	.key_len = sizeof(u64),
+	.key_len = RDS_BOUND_KEY_LEN,

Do we really need the scope_id to be part of the key? With link local/global, do you see any collisions? Please educate me on the actual use case. This could avoid a bunch of changes, hence the question.
@@ -114,7 +132,7 @@ static int rds_add_bound(struct rds_sock *rs, __be32 addr, __be16 *port)
 				 rs, , (int)ntohs(*port));
 			break;
 		} else {
-			rs->rs_bound_addr = 0;
+			rs->rs_bound_addr = in6addr_any;

Can you elaborate why 0 is not ok?

 			rds_sock_put(rs);
 			ret = -ENOMEM;
 			break;
@@ -127,44 +145,61 @@
 void rds_remove_bound(struct rds_sock *rs)
 {
-	if (!rs->rs_bound_addr)
+	if (ipv6_addr_any(&rs->rs_bound_addr))
 		return;

-	rdsdebug("rs %p unbinding from %pI4:%d\n",
+	rdsdebug("rs %p unbinding from %pI6c:%d\n",
 		 rs, &rs->rs_bound_addr, ntohs(rs->rs_bound_port));

 	rhashtable_remove_fast(&bind_hash_table, &rs->rs_bound_node, ht_parms);
 	rds_sock_put(rs);
-	rs->rs_bound_addr = 0;
+	rs->rs_bound_addr = in6addr_any;
 }

 int rds_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 {
 	struct sock *sk = sock->sk;
-	struct sockaddr_in *sin = (struct sockaddr_in *)uaddr;
 	struct rds_sock *rs = rds_sk_to_rs(sk);
+	struct in6_addr v6addr, *binding_addr;
 	struct rds_transport *trans;
+	__u32 scope_id = 0;
 	int ret = 0;
+	__be16 port;

+	/* We only allow an RDS socket to be bound to and IPv4 address. IPv6

s/'bound to and IPv4'/'bound to an IPv4'

+	 * address support will be added later.
+	 */
+	if (addr_len == sizeof(struct sockaddr_in)) {
+		struct sockaddr_in *sin = (struct sockaddr_in *)uaddr;
+
+		if (sin->sin_family != AF_INET ||
+		    sin->sin_addr.s_addr == htonl(INADDR_ANY))
+			return -EINVAL;
+		ipv6_addr_set_v4mapped(sin->sin_addr.s_addr, &v6addr);
+		binding_addr = &v6addr;
+		port = sin->sin_port;
+	} else if (addr_len == sizeof(struct sockaddr_in6)) {
+		return -EPROTONOSUPPORT;
+	} else {
+		return -EINVAL;
+	}
 	lock_sock(sk);
-	if (addr_len != sizeof(struct sockaddr_in) ||
-	    sin->sin_family != AF_INET ||
-	    rs->rs_bound_addr ||
-	    sin->sin_addr.s_addr == htonl(INADDR_ANY)) {
+	/* RDS socket does not allow re-binding. */
+	if (!ipv6_addr_any(&rs->rs_bound_addr)) {
 		ret = -EINVAL;
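The representation change under review stores IPv4 bind addresses as IPv4-mapped IPv6 addresses and tests "unbound" with ipv6_addr_any() instead of comparing against 0. A userspace analogue using the standard socket headers; v4_to_mapped() and addr_is_any() are illustrative helpers mirroring the kernel's ipv6_addr_set_v4mapped() and ipv6_addr_any(), not kernel functions.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Build an IPv4-mapped IPv6 address (::ffff:a.b.c.d) from a v4 address
 * given in network byte order. */
static struct in6_addr v4_to_mapped(uint32_t v4_be)
{
	struct in6_addr a;

	memset(&a, 0, sizeof(a));
	a.s6_addr[10] = 0xff;
	a.s6_addr[11] = 0xff;
	memcpy(&a.s6_addr[12], &v4_be, 4);   /* v4 address occupies the tail */
	return a;
}

/* Analogue of the kernel's ipv6_addr_any() check that the patch uses in
 * place of "rs_bound_addr == 0". */
static int addr_is_any(const struct in6_addr *a)
{
	static const struct in6_addr any = IN6ADDR_ANY_INIT;

	return memcmp(a, &any, sizeof(any)) == 0;
}
```

With this encoding a single in6_addr field carries both address families, which is why the patch can replace every `__be32` address with `struct in6_addr` internally while the socket layer still only accepts IPv4.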
Re: [PATCH net-next 2/3] rds: Enable RDS IPv6 support
On 6/27/2018 3:07 AM, Ka-Cheong Poon wrote: On 06/26/2018 09:08 PM, Sowmini Varadhan wrote: On (06/26/18 21:02), Ka-Cheong Poon wrote: [...] I don't expect RDS apps will want to use link local address in the first place. In fact, most normal network apps don't. This is not true. You're not doing this for IPv4 and RDS today (you don't have to do this for UDP, afaik). Do you know of any IPv4 RDS app which uses IPv4 link local address? In fact, IPv4 link local address is explicitly disallowed for active-active bonding. Yes. Cluster-ware HAIP makes use of link local addresses. That check was mainly because of RDMA CM issues, but that only means active-active isn't used. The bonding works just fine and if needed cluster-ware can also use the TCP transport. Let's not add this new behavior for link local; it's actually not really relevant to v6 addressing support. Regards, Santosh
Re: [PATCH net-next 2/3] rds: Enable RDS IPv6 support
On 6/25/2018 10:50 AM, Sowmini Varadhan wrote: On (06/26/18 01:43), Ka-Cheong Poon wrote: Yes, I think if the socket is bound, it should check the scope_id in msg_name (if not NULL) to make sure that they match. A bound RDS socket can send to multiple peers. But if the bound local address is link local, it should only be allowed to send to peers on the same link. agree. Yep. It's in line with RDS bind behavior. If a socket is bound, I guess the scope_id should be used. So if a socket is not bound to a link local address and the socket is used to send to a link local peer, it should fail. PF_RDS sockets *MUST* always be bound. See Documentation/networking/rds.txt: "Sockets must be bound before you can send or receive data. This is needed because binding also selects a transport and attaches it to the socket. Once bound, the transport assignment does not change." In any case link local or not, the socket needs to be bound before any data can be sent, as documented. The send path already enforces it. Also, why is there no IPv6 support in rds_connect? Oops, I missed this when I ported the internal version to the net-next version. Will add it back. So the net-next version wasn't tested? IPv6 connections themselves wouldn't be formed with this missing. As mentioned already, please test v2 before posting on the list. Regards, Santosh
[PATCH] rds: avoid unenecessary cong_update in loop transport
For the loop transport, which is a self loopback, the remote port congestion update isn't relevant. In fact, the xmit path already ignores it. The receive path needs to do the same. Reported-by: syzbot+4c20b3866171ce844...@syzkaller.appspotmail.com Reviewed-by: Sowmini Varadhan Signed-off-by: Santosh Shilimkar ---

 net/rds/loop.c | 1 +
 net/rds/rds.h  | 5 +
 net/rds/recv.c | 5 +
 3 files changed, 11 insertions(+)

diff --git a/net/rds/loop.c b/net/rds/loop.c
index f2bf78d..dac6218 100644
--- a/net/rds/loop.c
+++ b/net/rds/loop.c
@@ -193,4 +193,5 @@ struct rds_transport rds_loop_transport = {
 	.inc_copy_to_user	= rds_message_inc_copy_to_user,
 	.inc_free		= rds_loop_inc_free,
 	.t_name			= "loopback",
+	.t_type			= RDS_TRANS_LOOP,
 };
diff --git a/net/rds/rds.h b/net/rds/rds.h
index b04c333..f2272fb 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -479,6 +479,11 @@ struct rds_notifier {
 	int n_status;
 };

+/* Available as part of RDS core, so doesn't need to participate
+ * in get_preferred transport etc
+ */
+#define RDS_TRANS_LOOP	3
+
 /**
  * struct rds_transport - transport specific behavioural hooks
  *
diff --git a/net/rds/recv.c b/net/rds/recv.c
index dc67458..192ac6f 100644
--- a/net/rds/recv.c
+++ b/net/rds/recv.c
@@ -103,6 +103,11 @@ static void rds_recv_rcvbuf_delta(struct rds_sock *rs, struct sock *sk,
 		rds_stats_add(s_recv_bytes_added_to_socket, delta);
 	else
 		rds_stats_add(s_recv_bytes_removed_from_socket, -delta);
+
+	/* loop transport doesn't send/recv congestion updates */
+	if (rs->rs_transport->t_type == RDS_TRANS_LOOP)
+		return;
+
 	now_congested = rs->rs_rcv_bytes > rds_sk_rcvbuf(rs);

 	rdsdebug("rs %p (%pI4:%u) recv bytes %d buf %d "
--
1.9.1
Re: KASAN: null-ptr-deref Read in rds_ib_get_mr
On 5/11/2018 12:48 AM, Yanjun Zhu wrote: On 2018/5/11 13:20, DaeRyong Jeong wrote: We report the crash: KASAN: null-ptr-deref Read in rds_ib_get_mr

Note that this bug was previously reported by syzkaller. https://syzkaller.appspot.com/bug?id=0bb56a5a48b000b52aa2b0d8dd20b1f545214d91 Nonetheless, this bug has not been fixed yet, and we hope that this report and our analysis, which gets help from the RaceFuzzer's feature, will be helpful to fix the crash. This crash has been found in v4.17-rc1 using RaceFuzzer (a modified version of Syzkaller), which we describe more at the end of this report. Our analysis shows that the race occurs when invoking two syscalls concurrently, bind$rds and setsockopt$RDS_GET_MR.

Analysis: We think the concurrent execution of __rds_rdma_map() and rds_bind() causes the problem. __rds_rdma_map() checks whether rs->rs_bound_addr is 0 or not. But the concurrent execution with rds_bind() can by-pass this check. Therefore, __rds_rdma_map() calls rs->rs_transport->get_mr() and rds_ib_get_mr() causes the null deref at ib_rdma.c:544 in v4.17-rc1, when dereferencing rs_conn.

Thread interleaving:
CPU0 (__rds_rdma_map)                           CPU1 (rds_bind)
                                                // rds_add_bound() sets rs->bound_addr as non 0
                                                ret = rds_add_bound(rs, sin->sin_addr.s_addr, &sin->sin_port);
if (rs->rs_bound_addr == 0 || !rs->rs_transport) {
	ret = -ENOTCONN; /* XXX not a great errno */
	goto out;
}
                                                if (rs->rs_transport) { /* previously bound */
                                                	trans = rs->rs_transport;
                                                	if (trans->laddr_check(sock_net(sock->sk),
                                                	    sin->sin_addr.s_addr) != 0) {
                                                		ret = -ENOPROTOOPT;
                                                		// rds_remove_bound() sets rs->bound_addr as 0
                                                		rds_remove_bound(rs);
...
trans_private = rs->rs_transport->get_mr(sg, nents, rs, &mr->r_key);
(in rds_ib_get_mr())
struct rds_ib_connection *ic = rs->rs_conn->c_transport_data;

Call sequence (v4.17-rc1):
CPU0
	rds_setsockopt
		rds_get_mr
			__rds_rdma_map
				rds_ib_get_mr

CPU1
	rds_bind
		rds_add_bound
		...
		rds_remove_bound

Crash log:
==================================================================
BUG: KASAN: null-ptr-deref in rds_ib_get_mr+0x3a/0x150 net/rds/ib_rdma.c:544
Read of size 8 at addr 0068 by task syz-executor0/32067

CPU: 0 PID: 32067 Comm: syz-executor0 Not tainted 4.17.0-rc1 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x166/0x21c lib/dump_stack.c:113
 kasan_report_error mm/kasan/report.c:352 [inline]
 kasan_report+0x140/0x360 mm/kasan/report.c:412
 check_memory_region_inline mm/kasan/kasan.c:260 [inline]
 __asan_load8+0x54/0x90 mm/kasan/kasan.c:699
 rds_ib_get_mr+0x3a/0x150 net/rds/ib_rdma.c:544
 __rds_rdma_map+0x521/0x9d0 net/rds/rdma.c:271
 rds_get_mr+0xad/0xf0 net/rds/rdma.c:333
 rds_setsockopt+0x57f/0x720 net/rds/af_rds.c:347
 __sys_setsockopt+0x147/0x230 net/socket.c:1903
 __do_sys_setsockopt net/socket.c:1914 [inline]
 __se_sys_setsockopt net/socket.c:1911 [inline]
 __x64_sys_setsockopt+0x67/0x80 net/socket.c:1911
 do_syscall_64+0x15f/0x4a0 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4563f9
RSP: 002b:7f6a2b3c2b28 EFLAGS: 0246 ORIG_RAX: 0036
RAX: ffda RBX: 0072bee0 RCX: 004563f9
RDX: 0002 RSI: 0114 RDI: 0015
RBP: 0575 R08: 0020 R09:
R10: 2140 R11: 0246 R12: 7f6a2b3c36d4
R13: R14: 006fd398 R15:
==================================================================

diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
index e678699..2228b50 100644
--- a/net/rds/ib_rdma.c
+++ b/net/rds/ib_rdma.c
@@ -539,11 +539,17 @@ void rds_ib_flush_mrs(void)
 void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
 		    struct rds_sock *rs, u32 *key_ret)
 {
-	struct rds_ib_device *rds_ibdev;
+	struct rds_ib_device *rds_ibdev = NULL;
 	struct rds_ib_mr *ibmr = NULL;
-	struct rds_ib_connection *ic = rs->rs_conn->c_transport_data;
+	struct rds_ib_connection *ic = NULL;
 	int ret;

+	if (rs->rs_bound_addr == 0) {
+		ret = -EPERM;
+		goto out;
+	}
+

No, you can't return such an error for this API, and the socket related checks need to be done at the core layer. I remember fixing this race but probably never pushed the fix upstream. The MR code is due for an update with the optimized FRWR code, which is now stable enough. We will address this issue as well as part of that patchset.
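The race being analyzed is a classic check-then-use window: rds_bind()/rds_remove_bound() can change rs_bound_addr between __rds_rdma_map()'s check and the get_mr() call. One way such a window is closed, sketched here in userspace C with illustrative names (this is not the actual kernel fix, which per the reply above is deferred to the FRWR rework), is to perform the check and the use under the same lock the bind path takes:

```c
#include <pthread.h>

/* Toy socket: the lock must be the one the bind/unbind path also takes. */
struct sockish {
	pthread_mutex_t lock;
	unsigned int bound_addr;   /* 0 means unbound */
	int mr_count;              /* stands in for get_mr() side effects */
};

/* Sketch: check bound state and act on it in one critical section, so a
 * concurrent unbind cannot slip in between the check and the use. */
static int rdma_map(struct sockish *rs)
{
	int ret = 0;

	pthread_mutex_lock(&rs->lock);
	if (rs->bound_addr == 0)
		ret = -1;          /* -ENOTCONN in the kernel */
	else
		rs->mr_count++;    /* safe: binding can't change under us */
	pthread_mutex_unlock(&rs->lock);
	return ret;
}
```

The unlocked kernel check (`if (rs->rs_bound_addr == 0)`) is exactly what RaceFuzzer's interleaving bypasses; any real fix has to order the check against rds_remove_bound() rather than just repeat it.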
Re: [PATCH] rds: ib: Fix missing call to rds_ib_dev_put in rds_ib_setup_qp
On 4/25/2018 4:22 AM, Dag Moxnes wrote: The function rds_ib_setup_qp is calling rds_ib_get_client_data and should correspondingly call rds_ib_dev_put. This call was lost in the non-error path with the introduction of error handling done in commit 3b12f73a5c29 ("rds: ib: add error handle") Signed-off-by: Dag Moxnes <dag.mox...@oracle.com> --- Thanks Dag for following it up. Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: [PATCH 1/1] Revert "rds: ib: add error handle"
On 4/24/2018 4:25 AM, Dag Moxnes wrote: I was going to suggest the following correction: If all agree that this is the correct way of doing it, I can go ahead and post it. Yes please. Go ahead and post your fix. Regards, Santosh P.S.: Avoid top posting, please.
Re: [PATCH net-next] rds: tcp: must use spin_lock_irq* and not spin_lock_bh with rds_tcp_conn_lock
On 3/15/2018 3:54 AM, Sowmini Varadhan wrote: rds_tcp_connection allocation/free management has the potential to be called from __rds_conn_create after IRQs have been disabled, so spin_[un]lock_bh cannot be used with rds_tcp_conn_lock. Bottom-halves that need to synchronize for critical sections protected by rds_tcp_conn_lock should instead use rds_destroy_pending() correctly. Reported-by: syzbot+c68e51bb5e699d3f8...@syzkaller.appspotmail.com Fixes: ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and rds connection/workq management") Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- Thanks Sowmini for the WARN_ON() discussion off-list. Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
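The locking rule being applied, as a hedged sketch (names simplified): because __rds_conn_create() can be entered with IRQs already disabled, the _bh lock variants are unsafe on rds_tcp_conn_lock — spin_unlock_bh() could re-enable bottom halves in a context that must keep them off. Saving and restoring the IRQ state works from any context.

```c
static LIST_HEAD(rds_tcp_conn_list);
static DEFINE_SPINLOCK(rds_tcp_conn_lock);

/* Illustrative only: add a connection to the global list. */
static void rds_tcp_conn_track(struct rds_tcp_connection *tc)
{
	unsigned long flags;

	/* spin_lock_irqsave(), not spin_lock_bh(): safe even when the
	 * caller already runs with IRQs disabled. */
	spin_lock_irqsave(&rds_tcp_conn_lock, flags);
	list_add_tail(&tc->t_tcp_node, &rds_tcp_conn_list);
	spin_unlock_irqrestore(&rds_tcp_conn_lock, flags);
}
```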
Re: [PATCH v2 net] rds: Incorrect reference counting in TCP socket creation
On 3/2/2018 6:42 AM, David Miller wrote: From: "santosh.shilim...@oracle.com" Date: Thu, 1 Mar 2018 22:22:07 -0800 Versioning comments typically go below the "---" and are not part of the commit message. I like them to be in the commit message most of the time, especially for patch series header postings. For series headers I have seen that followed, but not in patch commits, since those go into the kernel git history. Later, if someone reviews the patch and asks "why didn't they do X", and the version history text says "don't do X based upon feedback from developer Y", that helps a lot. I agree the versioning info helps, but I didn't know that you like it to be in the commit message of patches as well. Will keep that in mind for the future on the netdev list. Regards, Santosh
Re: [PATCH V3 net-next 2/3] rds: deliver zerocopy completion notification with data
On 2/26/2018 9:11 AM, Sowmini Varadhan wrote: On (02/26/18 09:07), Santosh Shilimkar wrote: Just in case you haven't seen it yet, Dan Carpenter reported an skb deref warning on a previous version of the patch. Not sure why it wasn't sent on netdev. Yes, I saw it, but that was for the previous version of the patch, around code that delivers notifications on sk_error_queue. This patch series removes the sk_error_queue support, so the smatch warning is not applicable after this patch. I see it now. Thanks !! Regards, Santosh
Re: [PATCH V3 net-next 2/3] rds: deliver zerocopy completion notification with data
On 2/25/2018 3:21 PM, Sowmini Varadhan wrote: This commit is an optimization over commit 01883eda72bd ("rds: support for zcopy completion notification") for PF_RDS sockets. RDS applications are predominantly request-response transactions, so it is more efficient to reduce the number of system calls and have zerocopy completion notification delivered as ancillary data on the POLLIN channel. Cookies are passed up as ancillary data (at level SOL_RDS) in a struct rds_zcopy_cookies when the returned value of recvmsg() is greater than, or equal to, 0. A max of RDS_MAX_ZCOOKIES may be passed with each message. This commit removes support for zerocopy completion notification on MSG_ERRQUEUE for PF_RDS sockets. Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- v2: remove sk_error_queue path; lots of cautionary checks in rds_recvmsg_zcookie() and callers to make sure we don't remove cookies from the queue and then fail to pass them up to the caller v3: - bounds check on skb->cb to make sure there is enough room for struct rds_zcopy_cookies as well as the rds_znotifier; - Refactor cautionary checks in rds_recvmsg_zcookie: if no msg_control has been passed, or if there is not enough msg_controllen for a rds_zcopy_cookies, return silently (do not return an error, as the caller may have wanted other ancillary data which may happen to fit in the space provided) - return bool from rds_recvmsg_zcookie, some other code cleanup Just in case you haven't seen it yet, Dan Carpenter reported an skb deref warning on a previous version of the patch. Not sure why it wasn't sent on netdev. smatch warnings: net/rds/recv.c:605 rds_recvmsg_zcookie() warn: variable dereferenced before check 'skb' (see line 596) With that addressed, Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: [for-next 7/7] IB/mlx5: Implement fragmented completion queue (CQ)
On 2/24/2018 1:40 AM, Majd Dibbiny wrote: On Feb 23, 2018, at 9:13 PM, Saeed Mahameed <sae...@mellanox.com> wrote: On Thu, 2018-02-22 at 16:04 -0800, Santosh Shilimkar wrote: Hi Saeed On 2/21/2018 12:13 PM, Saeed Mahameed wrote: [...] Jason mentioned this patch to me off-list. We were seeing a similar issue with SRQs & QPs, so I am wondering whether you have any plans to do a similar change for other resources too, so that they don't rely on higher-order page allocation for ICM tables. Hi Santosh, Adding Majd. Which ULP is in question? How big are the QPs/SRQs you create that lead to this problem? For ICM tables we already allocate only order-0 pages: see alloc_system_page() in drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c. But for kernel RDMA SRQ and QP buffers there is a place for improvement. Majd, do you know if we have any near-future plans for this? It's in our plans to move all the buffers to use 0-order pages. Santosh, is this RDS? Do you have a persistent failure with some configuration? Can you please share more information? No, the issue was seen with user verbs, and actually with the MLX4 driver. My last question was more about ICM allocation for all the resources in both the MLX4 and MLX5 drivers. With the MLX4 driver, we have seen corruption issues with MLX4_NO_RR while recycling the resources, so we ended up switching back to round-robin bitmap allocation as it was before it was changed by one of Jack's commits, 7c6d74d23 ("mlx4_core: Roll back round robin bitmap allocation commit for CQs, SRQs, and MPTs"). With the default round robin, the corruption issue went away, but its undesired effect of bloating the ICM tables until you hit the resource limit means more memory fragmentation. Since these resources make use of higher-order allocations, in fragmented-memory scenarios we see contention on the mm lock for seconds, since the compaction layer is trying to stitch pages together, which takes time. If these allocations don't make use of higher-order pages, the issue can certainly be avoided, hence the reason behind the question. Of course, we wouldn't have ended up with this issue if 'MLX4_NO_RR' worked without corruption :-) Regards, Santosh
Re: [for-next 7/7] IB/mlx5: Implement fragmented completion queue (CQ)
Hi Saeed On 2/21/2018 12:13 PM, Saeed Mahameed wrote: From: Yonatan Cohen The current implementation of create CQ requires contiguous memory; such a requirement is problematic once the memory is fragmented or the system is low on memory, and it causes failures in dma_zalloc_coherent(). This patch implements a new scheme of fragmented CQs to overcome this issue by introducing a new type, 'struct mlx5_frag_buf_ctrl', to allocate fragmented buffers rather than contiguous ones. Base the Completion Queues (CQs) on this new fragmented buffer. It fixes the following crashes: kworker/29:0: page allocation failure: order:6, mode:0x80d0 CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0 Workqueue: ib_cm cm_work_handler [ib_cm] Call Trace: [<>] dump_stack+0x19/0x1b [<>] warn_alloc_failed+0x110/0x180 [<>] __alloc_pages_slowpath+0x6b7/0x725 [<>] __alloc_pages_nodemask+0x405/0x420 [<>] dma_generic_alloc_coherent+0x8f/0x140 [<>] x86_swiotlb_alloc_coherent+0x21/0x50 [<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core] [<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core] [<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core] [<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core] [<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib] [<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib] Signed-off-by: Yonatan Cohen Reviewed-by: Tariq Toukan Signed-off-by: Leon Romanovsky Signed-off-by: Saeed Mahameed --- Jason mentioned this patch to me off-list. We were seeing a similar issue with SRQs & QPs, so I am wondering whether you have any plans to do a similar change for other resources too, so that they don't rely on higher-order page allocation for ICM tables. Regards, Santosh
Re: [PATCH net-next] rds: rds_msg_zcopy should return error of null rm->data.op_mmp_znotifier
On 2/22/2018 1:40 PM, Sowmini Varadhan wrote: if either or both of MSG_ZEROCOPY and SOCK_ZEROCOPY have not been specified, the rm->data.op_mmp_znotifier allocation will be skipped. In this case, it is invalid to pass down a cmsghdr with RDS_CMSG_ZCOPY_COOKIE, so return EINVAL from rds_msg_zcopy for this case. Reported-by: syzbot+f893ae7bb2f6456df...@syzkaller.appspotmail.com Fixes: 0cebaccef3ac ("rds: zerocopy Tx support.") Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: [PATCH net-next] RDS: deliver zerocopy completion notification with data as an optimization
On 2/21/2018 12:19 PM, Sowmini Varadhan wrote: This commit is an optimization that builds on top of commit 01883eda72bd ("rds: support for zcopy completion notification") for PF_RDS sockets. Cookies associated with zerocopy completion are passed up on the POLLIN channel, piggybacked with data whereever possible. Such cookies are passed s/whereever/wherever up as ancillary data (at level SOL_RDS) in a struct rds_zcopy_cookies when the returned value of recvmsg() is >= 0. A max of SO_EE_ORIGIN_MAX_ZCOOKIES may be passed with each message. Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: [PATCH] rds: send: mark expected switch fall-through in rds_rm_size
On 2/20/2018 10:05 AM, Gustavo A. R. Silva wrote: Hi Santosh, On 02/20/2018 11:54 AM, Santosh Shilimkar wrote: Hi, On 2/19/2018 10:10 AM, Gustavo A. R. Silva wrote: In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Addresses-Coverity-ID: 1465362 ("Missing break in switch") Signed-off-by: Gustavo A. R. Silva <gust...@embeddedor.com> --- net/rds/send.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/rds/send.c b/net/rds/send.c index 028ab59..79d158b 100644 --- a/net/rds/send.c +++ b/net/rds/send.c @@ -902,6 +902,8 @@ static int rds_rm_size(struct msghdr *msg, int num_sgs) case RDS_CMSG_ZCOPY_COOKIE: zcopy_cookie = true; + /* fall through */ + case RDS_CMSG_RDMA_DEST: case RDS_CMSG_RDMA_MAP: cmsg_groups |= 2; So Coverity greps for the comment "fall through" for the -Wimplicit-fallthrough build? No. Basically, Coverity only reports cases in which a break, return or continue statement is missing. Now, if the statements I mention above are missing and you add the following line to your Makefile: KBUILD_CFLAGS += $(call cc-option,-Wimplicit-fallthrough) you will get a warning if a fall-through comment is missing. That makes sense. Adding that comment itself is fine, but I was curious: what happens if someone makes a spelling error in this comment? ;-) In this case, Coverity would still report the same "Missing break in switch" error, but you'll get a GCC warning. Got it. Thanks !!
Re: [PATCH] rds: send: mark expected switch fall-through in rds_rm_size
On 2/20/2018 10:01 AM, David Miller wrote: From: Santosh Shilimkar <santosh.shilim...@oracle.com> Date: Tue, 20 Feb 2018 09:54:09 -0800 So Coverity greps for the comment "fall through" for the -Wimplicit-fallthrough build? From what I understand, 'gcc' does in the latest versions. Coverity might as well, I don't know. Good to know about 'gcc' adding such an option. Thanks !!
Re: [PATCH] rds: send: mark expected switch fall-through in rds_rm_size
Hi, On 2/19/2018 10:10 AM, Gustavo A. R. Silva wrote: In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Addresses-Coverity-ID: 1465362 ("Missing break in switch") Signed-off-by: Gustavo A. R. Silva <gust...@embeddedor.com> --- net/rds/send.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/rds/send.c b/net/rds/send.c index 028ab59..79d158b 100644 --- a/net/rds/send.c +++ b/net/rds/send.c @@ -902,6 +902,8 @@ static int rds_rm_size(struct msghdr *msg, int num_sgs) case RDS_CMSG_ZCOPY_COOKIE: zcopy_cookie = true; + /* fall through */ + case RDS_CMSG_RDMA_DEST: case RDS_CMSG_RDMA_MAP: cmsg_groups |= 2; So Coverity greps for the comment "fall through" for the -Wimplicit-fallthrough build? Adding that comment itself is fine, but I was curious: what happens if someone makes a spelling error in this comment? ;-) For the patch itself, Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: [PATCH v3 net-next 5/7] rds: zerocopy Tx support.
On 2/15/2018 10:49 AM, Sowmini Varadhan wrote: If the MSG_ZEROCOPY flag is specified with rds_sendmsg(), and, if the SO_ZEROCOPY socket option has been set on the PF_RDS socket, application pages sent down with rds_sendmsg() are pinned. The pinning uses the accounting infrastructure added by Commit a91dbff551a6 ("sock: ulimit on MSG_ZEROCOPY pages") The payload bytes in the message may not be modified for the duration that the message has been pinned. A multi-threaded application using this infrastructure may thus need to be notified about send-completion so that it can free/reuse the buffers passed to rds_sendmsg(). Notification of send-completion will identify each message-buffer by a cookie that the application must specify as ancillary data to rds_sendmsg(). The ancillary data in this case has cmsg_level == SOL_RDS and cmsg_type == RDS_CMSG_ZCOPY_COOKIE. Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: [PATCH v3 net-next 4/7] rds: support for zcopy completion notification
On 2/15/2018 10:49 AM, Sowmini Varadhan wrote: RDS removes a datagram (rds_message) from the retransmit queue when an ACK is received. The ACK indicates that the receiver has queued the RDS datagram, so that the sender can safely forget the datagram. When all references to the rds_message are quiesced, rds_message_purge is called to release resources used by the rds_message If the datagram to be removed had pinned pages set up, add an entry to the rs->rs_znotify_queue so that the notifcation will be sent up via rds_rm_zerocopy_callback() when the rds_message is eventually freed by rds_message_purge. rds_rm_zerocopy_callback() attempts to batch the number of cookies sent with each notification to a max of SO_EE_ORIGIN_MAX_ZCOOKIES. This is achieved by checking the tail skb in the sk_error_queue: if this has room for one more cookie, the cookie from the current notification is added; else a new skb is added to the sk_error_queue. Every invocation of rds_rm_zerocopy_callback() will trigger a ->sk_error_report to notify the application. Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: [PATCH v3 net-next 2/7] rds: hold a sock ref from rds_message to the rds_sock
On 2/15/2018 10:49 AM, Sowmini Varadhan wrote: The existing model holds a reference from the rds_sock to the rds_message, but the rds_message does not itself hold a sock_put() on the rds_sock. Instead the m_rs field in the rds_message is assigned when the message is queued on the sock, and nulled when the message is dequeued from the sock. We want to be able to notify userspace when the rds_message is actually freed (from rds_message_purge(), after the refcounts to the rds_message go to 0). At the time that rds_message_purge() is called, the message is no longer on the rds_sock retransmit queue. Thus the explicit reference for the m_rs is needed to send a notification that will signal to userspace that it is now safe to free/reuse any pages that may have been pinned down for zerocopy. This patch manages the m_rs assignment in the rds_message with the necessary refcount book-keeping. Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: [PATCH V2 net-next 4/7] rds: support for zcopy completion notification
On 2/14/2018 2:26 PM, Willem de Bruijn wrote: On Wed, Feb 14, 2018 at 1:50 PM, Santosh Shilimkar [...] This error change might need to go through another subsystem tree. Maybe you can separate it and also copy "linux-...@vger.kernel.org" Previous changes to this file also went in through net-next, so this is fine. The error queue is a socket, so network, data structure. OK, if that is acceptable then there is no need to split the change. As for naming, zerocopy is unfortunately an overused term. I did not help matters when adding msg_zerocopy. :-) That said, let's try to keep consistent naming across socket families, so no zmsg prefix only in RDS. Fair enough. Regards, Santosh
Re: [PATCH V2 net-next 4/7] rds: support for zcopy completion notification
On 2/14/2018 1:25 PM, Sowmini Varadhan wrote: On (02/14/18 13:10), Santosh Shilimkar wrote: RDS supports true zero copy already with the RDMA transport, so some of this code can easily get confused. btw, another way to solve this is to have the RDMA code use the suffix "rdma" (which is what it really is) as needed. And in the same breath, here zcopy is No message from user ;-) ZCOPY is otherwise often tied to RDMA directly. Renaming churns are not that useful, and I definitely agree with what Dave said if it were a renaming change to the existing code, like what you are suggesting with 'rdma'. Anyway, I don't want to contest this too much since I can follow that code and know what each does and means :-) The comment was from a long-term readability perspective, for someone completely new reading the code and needing to distinguish the different modes. Regards, Santosh
Re: [PATCH V2 net-next 5/7] rds: zerocopy Tx support.
On 2/14/2018 11:49 AM, Sowmini Varadhan wrote: On (02/14/18 11:10), Santosh Shilimkar wrote: s/RDS_CMSG_ZCOPY_COOKIE/RDS_CMSG_ZMSGCOPY_COOKIE Please see https://www.spinics.net/lists/netdev/msg483627.html Just saw it and responded to Dave. @@ -356,6 +358,53 @@ int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from) sg = rm->data.op_sg; sg_off = 0; /* Dear gcc, sg->page will be null from kzalloc. */ + if (zcopy) { + int total_copied = 0; + struct sk_buff *skb; + + skb = alloc_skb(SO_EE_ORIGIN_MAX_ZCOOKIES * sizeof(u32), + GFP_KERNEL); This can sleep, so you might want to check whether you want to use the ATOMIC version here. I think it should be fine: rds_message_copy_from_user() is called in process context, and if you notice, the calling function rds_sendmsg() also has this 1100 rm = rds_message_alloc(ret, GFP_KERNEL); 1101 if (!rm) { 1102 ret = -ENOMEM; 1103 goto out; 1104 } : 1106 /* Attach data to the rm */ : 1113 ret = rds_message_copy_from_user(rm, &msg->msg_iter); So using GFP_KERNEL is as safe as the call at line 1100. I was just asking you to check if it is safe. The path already does that, so we are good. + return -ENOMEM; + } -ENOMEM is a new application-visible change, but probably the right one for this particular case. We just need to make sure the application can handle this error. I think the application already handles this correctly (see line 1102 above) Indeed. Thanks for checking. Regards, Santosh
Re: [PATCH V2 net-next 4/7] rds: support for zcopy completion notification
On 2/14/2018 11:02 AM, David Miller wrote: From: Sowmini Varadhan <sowmini.varad...@oracle.com> Date: Wed, 14 Feb 2018 14:01:10 -0500 On (02/14/18 10:50), Santosh Shilimkar wrote: generic comment and please update it where it is applicable in terms of variable names, notifiers etc. RDS supports true zero copy already with the RDMA transport, so some of this code can easily get confused. So I suggest something like below. s/zerocopy/zeromsgcopy s/zcopy/zmsgcopy s/zcookie/zmsgcpycookie s/znotifier/zmsgcpynotifier I'd like to hear some additional opinions from the list on this: the existing socket API for TCP etc. already uses ZEROCOPY, and other than extending variable names (and putting me at risk of violating the "fit within 80 chars per line" requirement, leading to not-so-pretty line wraps), I'm not seeing much value in this. I agree, this name change request seems pointless. Just keep the names the way they are, thank you. Dave I understand your point for generic code, and I was not at all asking to rename anything in generic code. My point was about new code getting added to RDS to support this zero copy message option. Within RDS, the core code is common, and all the infrastructure with it, and it will be hard to follow the code if that distinction is not maintained. Of course you have the last say on this list, so if you want to overrule the comment, I will step back :-) Regards, Santosh
Re: [PATCH V2 net-next 4/7] rds: support for zcopy completion notification
On 2/14/2018 11:01 AM, Sowmini Varadhan wrote: On (02/14/18 10:50), Santosh Shilimkar wrote: generic comment and please update it where it is applicable in terms of variable names, notifiers etc. RDS supports true zero copy already with the RDMA transport, so some of this code can easily get confused. So I suggest something like below. s/zerocopy/zeromsgcopy s/zcopy/zmsgcopy s/zcookie/zmsgcpycookie s/znotifier/zmsgcpynotifier I'd like to hear some additional opinions from the list on this: the existing socket API for TCP etc. already uses ZEROCOPY, and other than extending variable names (and putting me at risk of violating the "fit within 80 chars per line" requirement, leading to not-so-pretty line wraps), I'm not seeing much value in this. What do you mean by not seeing value? It has value for the RDS code, which already supports zero copy via the RDMA transport. For TCP sockets in isolation, ZCOPY may be alright. The patch under discussion is for RDS, hence the comment. Regards, Santosh
Re: [PATCH V2 net-next 5/7] rds: zerocopy Tx support.
On 2/14/2018 2:28 AM, Sowmini Varadhan wrote: If the MSG_ZEROCOPY flag is specified with rds_sendmsg(), and, if the SO_ZEROCOPY socket option has been set on the PF_RDS socket, application pages sent down with rds_sendmsg() are pinned. The pinning uses the accounting infrastructure added by Commit a91dbff551a6 ("sock: ulimit on MSG_ZEROCOPY pages") The payload bytes in the message may not be modified for the duration that the message has been pinned. A multi-threaded application using this infrastructure may thus need to be notified about send-completion so that it can free/reuse the buffers passed to rds_sendmsg(). Notification of send-completion will identify each message-buffer by a cookie that the application must specify as ancillary data to rds_sendmsg(). The ancillary data in this case has cmsg_level == SOL_RDS and cmsg_type == RDS_CMSG_ZCOPY_COOKIE. Signed-off-by: Sowmini Varadhan --- v2: - remove unused data_len argument to rds_rm_size; - unmap as necessary if we fail in the middle of zerocopy setup include/uapi/linux/rds.h |1 + net/rds/message.c| 51 +- net/rds/rds.h|3 +- net/rds/send.c | 44 ++- 4 files changed, 91 insertions(+), 8 deletions(-) diff --git a/include/uapi/linux/rds.h b/include/uapi/linux/rds.h index e71d449..12e3bca 100644 --- a/include/uapi/linux/rds.h +++ b/include/uapi/linux/rds.h @@ -103,6 +103,7 @@ #define RDS_CMSG_MASKED_ATOMIC_FADD 8 #define RDS_CMSG_MASKED_ATOMIC_CSWP 9 #define RDS_CMSG_RXPATH_LATENCY 11 +#define RDS_CMSG_ZCOPY_COOKIE 12 s/RDS_CMSG_ZCOPY_COOKIE/RDS_CMSG_ZMSGCOPY_COOKIE #define RDS_INFO_FIRST1 #define RDS_INFO_COUNTERS 1 diff --git a/net/rds/message.c b/net/rds/message.c index d874b74..e499566 100644 --- a/net/rds/message.c +++ b/net/rds/message.c @@ -341,12 +341,14 @@ struct rds_message *rds_message_map_pages(unsigned long *page_addrs, unsigned in return rm; } -int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from) +int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from, + bool zcopy) { unsigned long to_copy, nbytes; unsigned long sg_off; struct scatterlist *sg; int ret = 0; + int length = iov_iter_count(from); rm->m_inc.i_hdr.h_len = cpu_to_be32(iov_iter_count(from)); @@ -356,6 +358,53 @@ int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from) sg = rm->data.op_sg; sg_off = 0; /* Dear gcc, sg->page will be null from kzalloc. */ + if (zcopy) { + int total_copied = 0; + struct sk_buff *skb; + + skb = alloc_skb(SO_EE_ORIGIN_MAX_ZCOOKIES * sizeof(u32), + GFP_KERNEL); This can sleep, so you might want to check whether you want to use the ATOMIC version here. + if (!skb) + return -ENOMEM; + rm->data.op_mmp_znotifier = RDS_ZCOPY_SKB(skb); + memset(rm->data.op_mmp_znotifier, 0, + sizeof(*rm->data.op_mmp_znotifier)); + if (mm_account_pinned_pages(&rm->data.op_mmp_znotifier->z_mmp, + length)) { + consume_skb(skb); + rm->data.op_mmp_znotifier = NULL; + return -ENOMEM; + } -ENOMEM is a new application-visible change, but probably the right one for this particular case. We just need to make sure the application can handle this error. diff --git a/net/rds/rds.h b/net/rds/rds.h index 6e8fc4c..dfdc9b3 100644 --- a/net/rds/rds.h +++ b/net/rds/rds.h @@ -784,7 +784,8 @@ void rds_for_each_conn_info(struct socket *sock, unsigned int len, /* message.c */ struct rds_message *rds_message_alloc(unsigned int nents, gfp_t gfp); struct scatterlist *rds_message_alloc_sgs(struct rds_message *rm, int nents); -int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from); +int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from, + bool zcopy); struct rds_message *rds_message_map_pages(unsigned long *page_addrs, unsigned int total_len); void rds_message_populate_header(struct rds_header *hdr, __be16 sport, __be16 dport, u64 seq); diff --git a/net/rds/send.c b/net/rds/send.c index 5ac0925..80171cf 100644 --- a/net/rds/send.c +++ b/net/rds/send.c [...] @@ -1087,8 +1112,15 @@ int rds_sendmsg(struct socket *sock, struct msghdr *msg, size_t payload_len) goto out; } + if (zcopy) { + if (rs->rs_transport->t_type != RDS_TRANS_TCP) { + ret = -EOPNOTSUPP; + goto out; + } +
Re: [PATCH V2 net-next 4/7] rds: support for zcopy completion notification
On 2/14/2018 2:28 AM, Sowmini Varadhan wrote: RDS removes a datagram (rds_message) from the retransmit queue when an ACK is received. The ACK indicates that the receiver has queued the RDS datagram, so that the sender can safely forget the datagram. When all references to the rds_message are quiesced, rds_message_purge is called to release resources used by the rds_message If the datagram to be removed had pinned pages set up, add an entry to the rs->rs_znotify_queue so that the notification will be sent up via rds_rm_zerocopy_callback() when the rds_message is eventually freed by rds_message_purge. rds_rm_zerocopy_callback() attempts to batch the number of cookies sent with each notification to a max of SO_EE_ORIGIN_MAX_ZCOOKIES. This is achieved by checking the tail skb in the sk_error_queue: if this has room for one more cookie, the cookie from the current notification is added; else a new skb is added to the sk_error_queue. Every invocation of rds_rm_zerocopy_callback() will trigger a ->sk_error_report to notify the application. Signed-off-by: Sowmini Varadhan --- v2: - make sure to always sock_put m_rs even if there is no znotifier. - major rewrite of notification, resulting in much simplification. include/uapi/linux/errqueue.h |2 + net/rds/af_rds.c |2 + net/rds/message.c | 83 +--- net/rds/rds.h | 14 +++ net/rds/recv.c|2 + 5 files changed, 96 insertions(+), 7 deletions(-) Generic comment: please update it where applicable in terms of variable names, notifiers, etc. RDS supports true zero copy already with the RDMA transport, so some of this code can easily get confused. So I suggest something like below.
s/zerocopy/zeromsgcopy s/zcopy/zmsgcopy s/zcookie/zmsgcpycookie s/znotifier/zmsgcpynotifier diff --git a/include/uapi/linux/errqueue.h b/include/uapi/linux/errqueue.h index dc64cfa..28812ed 100644 --- a/include/uapi/linux/errqueue.h +++ b/include/uapi/linux/errqueue.h @@ -20,11 +20,13 @@ struct sock_extended_err { #define SO_EE_ORIGIN_ICMP6 3 #define SO_EE_ORIGIN_TXSTATUS 4 #define SO_EE_ORIGIN_ZEROCOPY 5 +#define SO_EE_ORIGIN_ZCOOKIE 6 #define SO_EE_ORIGIN_TIMESTAMPING SO_EE_ORIGIN_TXSTATUS #define SO_EE_OFFENDER(ee) ((struct sockaddr*)((ee)+1)) #define SO_EE_CODE_ZEROCOPY_COPIED 1 +#define SO_EE_ORIGIN_MAX_ZCOOKIES 8 This error change might need to go through another subsystem tree. Maybe you can separate it and also copy "linux-...@vger.kernel.org" Otherwise the changes look fine to me. Regards, Santosh
Re: [PATCH V2 net-next 2/7] rds: hold a sock ref from rds_message to the rds_sock
On 2/14/2018 2:28 AM, Sowmini Varadhan wrote: The existing model holds a reference from the rds_sock to the rds_message, but the rds_message does not itself hold a sock_put() on the rds_sock. Instead the m_rs field in the rds_message is assigned when the message is queued on the sock, and nulled when the message is dequeued from the sock. We want to be able to notify userspace when the rds_message is actually freed (from rds_message_purge(), after the refcounts to the rds_message go to 0). At the time that rds_message_purge() is called, the message is no longer on the rds_sock retransmit queue. Thus the explicit reference for the m_rs is needed to send a notification that will signal to userspace that it is now safe to free/reuse any pages that may have been pinned down for zerocopy. This patch manages the m_rs assignment in the rds_message with the necessary refcount book-keeping. Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- [...] @@ -756,9 +755,6 @@ void rds_send_drop_to(struct rds_sock *rs, struct sockaddr_in *dest) */ if (!test_and_clear_bit(RDS_MSG_ON_CONN, &rm->m_flags)) { spin_unlock_irqrestore(&cp->cp_lock, flags); - spin_lock_irqsave(&rm->m_rs_lock, flags); - rm->m_rs = NULL; - spin_unlock_irqrestore(&rm->m_rs_lock, flags); continue; } list_del_init(&rm->m_conn_item); This hunk was clearly wrong, so it is good that you got rid of it as well. The patch looks fine to me. Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: KASAN: use-after-free Read in rds_find_bound
On 2/14/2018 9:52 AM, Dmitry Vyukov wrote: On Wed, Feb 14, 2018 at 6:35 PM, Santosh Shilimkar <santosh.shilim...@oracle.com> wrote: Hi Santosh, What is that fix? You forgot to provide any link/reference. I also don't see any patches from you at around that date... Fix [1] was later not added since there was a still a race. Wanted to see if the issue re-appears after recent netns fix [2]. We will not see if the bug re-appears or not until this bug is closed. Please see this recent discussion about another rds bug: https://groups.google.com/d/msg/syzkaller-bugs/3XjmOzr5jRU/g7pXIsY1BgAJ In the current state syzbot will never report bugs in these functions again. OK. Can you close that one then in that case ? Anybody can do this: I see. #syz fix: rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and rds connection/workq management syzbot provides full self-service, see first email and in particular this: https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot ok will have a look. Regards, Santosh
Re: KASAN: use-after-free Read in rds_find_bound
On 2/14/2018 9:14 AM, Dmitry Vyukov wrote: On Wed, Feb 14, 2018 at 5:53 PM, Santosh Shilimkar <santosh.shilim...@oracle.com> wrote: [...] Hi Santosh, What is that fix? You forgot to provide any link/reference. I also don't see any patches from you at around that date... Fix [1] was later not added since there was still a race. Wanted to see if the issue re-appears after the recent netns fix [2]. We will not see if the bug re-appears or not until this bug is closed. Please see this recent discussion about another rds bug: https://groups.google.com/d/msg/syzkaller-bugs/3XjmOzr5jRU/g7pXIsY1BgAJ In the current state syzbot will never report bugs in these functions again. OK. Can you close that one in that case?
Re: KASAN: use-after-free Read in rds_find_bound
On 2/13/2018 12:12 PM, Dmitry Vyukov wrote: On Sat, Dec 30, 2017 at 8:41 PM, santosh.shilim...@oracle.com wrote: On 12/30/17 1:17 AM, syzbot wrote: Hello, syzkaller hit the following crash on fba961ab29e5ffb055592442808bb0f7962e05da git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master compiler: gcc (GCC) 7.1.1 20170620 .config is attached Raw console output is attached. Unfortunately, I don't have any reproducer for this bug yet. IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+93a5839deb3555374...@syzkaller.appspotmail.com Posted a fix [1] for the above issue. Didn't test it, but it looks straightforward. Hi Santosh, What is that fix? You forgot to provide any link/reference. I also don't see any patches from you at around that date... Fix [1] was later not added since there was still a race. Wanted to see if the issue re-appears after the recent netns fix [2]. Regards, Santosh [1] https://patchwork.kernel.org/patch/10137901/ [2] https://patchwork.ozlabs.org/patch/868902/
Re: [PATCH V2 net-next] rds: do not call ->conn_alloc with GFP_KERNEL
On 2/13/2018 9:05 AM, Sowmini Varadhan wrote: Commit ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and rds connection/workq management") adds an rcu read critical section to __rds_conn_create. The memory allocations in that critical section need to use GFP_ATOMIC to avoid sleeping. This patch was verified with the syzkaller reproducer. Fixes: ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and rds connection/workq management") Reported-by: syzbot+a0564419941aaae3f...@syzkaller.appspotmail.com Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: [PATCH V2 net-next] rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and rds connection/workq management
On 2/3/2018 4:26 AM, Sowmini Varadhan wrote: An rds_connection can get added during netns deletion between lines 528 and 529 of 506 static void rds_tcp_kill_sock(struct net *net) : /* code to pull out all the rds_connections that should be destroyed */ : 528 spin_unlock_irq(&rds_tcp_conn_lock); 529 list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node) 530 rds_conn_destroy(tc->t_cpath->cp_conn); Such an rds_connection would miss out the rds_conn_destroy() loop (that cancels all pending work) and (if it was scheduled after netns deletion) could trigger the use-after-free. A similar race-window exists for the module unload path in rds_tcp_exit -> rds_tcp_destroy_conns Concurrency with netns deletion (rds_tcp_kill_sock()) must be handled by checking check_net() before enqueuing new work or adding new connections. Concurrency with module-unload is handled by maintaining a module specific flag that is set at the start of the module exit function, and must be checked before enqueuing new work or adding new connections. This commit refactors existing RDS_DESTROY_PENDING checks added by commit 3db6e0d172c9 ("rds: use RCU to synchronize work-enqueue with connection teardown") and consolidates all the concurrency checks listed above into the function rds_destroy_pending(). Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- v2: use check_net() for the netns delete case, as recommended on the list. refactor RDS_DESTROY_PENDING checks and consolidate into rds_destroy_pending() Thanks for the update. It looks in line with the off-list chat. Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: KASAN: stack-out-of-bounds Read in rds_sendmsg
On 1/30/2018 6:16 PM, Eric Biggers wrote: On Thu, Dec 21, 2017 at 08:44:32AM -0800, Santosh Shilimkar wrote: +Avinash On 12/21/2017 1:10 AM, syzbot wrote: syzkaller has found reproducer for the following crash on [..] audit: type=1400 audit(1513847224.110:7): avc: denied { map } for pid=3157 comm="syzkaller455006" path="/root/syzkaller455006870" dev="sda1" ino=16481 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023 tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1 == BUG: KASAN: stack-out-of-bounds in rds_rdma_bytes net/rds/send.c:1013 [inline] Could you please post the discussed fix if you are ready with it? This new report is the same as the last one, and the cmsg length check should address it. [...] This crash seems to have stopped occurring. I assume it was fixed by commit 14e138a86f63 (thanks Avinash!), so let's tell syzbot so that it can start reporting crashes in the same place again: #syz fix: RDS: Check cmsg_len before dereferencing CMSG_DATA Thanks Eric for confirmation !! Regards, Santosh
Re: [PATCH net-next 0/7] RDS zerocopy support
Hi Sowmini, On 1/24/2018 3:45 AM, Sowmini Varadhan wrote: This patch series follows up on the RFC and subsequent review comments at https://patchwork.ozlabs.org/cover/862248/ Review comments addressed are - drop MSG_PEEK change for sk_error_queue - (patch4) batch of SO_EE_ORIGIN_MAX_ZCOOKIES (#defined to 8) is sent up as part of the data in the error notification. The ancillary data with this notification specifies the number of cookies in ee_data, with ee_origin set to SO_EE_ORIGIN_ZCOOKIE - (patch4, patch5) allocate the skb to be used for error notification up-front (in rds_sendmsg()) so that we never have to fail due to skb allocation failure in the callback routine. - other minor review fixes around refactoring code for the setsockopt of ZEROCOPY, use of iov_iter_npages() etc. This patch series also updates selftests/net/msg_zerocopy.c to support PF_RDS sockets (both with and without zerocopy) The RDS changes look largely good, but I need some time to look at the completion notification and send side changes. Will try to provide feedback in the next few days. regards, Santosh
Re: [PATCH net-next] rds: tcp: per-netns flag to stop new connection creation when rds-tcp is being dismantled
On 1/24/2018 1:03 PM, Sowmini Varadhan wrote: An rds_connection can get added during netns deletion between lines 528 and 529 of 506 static void rds_tcp_kill_sock(struct net *net) : /* code to pull out all the rds_connections that should be destroyed */ : 528 spin_unlock_irq(&rds_tcp_conn_lock); 529 list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node) 530 rds_conn_destroy(tc->t_cpath->cp_conn); Such an rds_connection would miss out the rds_conn_destroy() loop (that cancels all pending work) and (if it was scheduled after netns deletion) could trigger the use-after-free. A similar race-window exists for the module unload path in rds_tcp_exit -> rds_tcp_destroy_conns To avoid the addition of new rds_connections during kill_sock or netns_delete, this patch introduces a per-netns flag, RTN_DELETE_PENDING, that will cause RDS connection creation to fail. RCU is used to make sure that we wait for the critical section of __rds_conn_create threads (that may have started before the setting of RTN_DELETE_PENDING) to complete before starting the connection destruction. Reported-by: syzbot+bbd8e9a06452cc480...@syzkaller.appspotmail.com Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- net/rds/connection.c |3 ++ net/rds/tcp.c| 82 - net/rds/tcp.h|1 + 3 files changed, 57 insertions(+), 29 deletions(-) FWIW, Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com> Just for the archives, summarizing the off-list discussion. Netns destroy now makes use of conn_destroy, which in the past was used only for module unload, and that path is racy. It's not possible to make it race-free with flags alone; it needs an RCU-sync kind of mechanism. RDS being sensitive to brownouts on reconnects, RCU usage has been minimised. Netns delete is expected to be a non-frequent operation, and hence the usage of RCU as done in this patch is probably ok. If needed it will be revisited in the future for optimization. regards, Santosh
Re: [PATCH] RDS: Fix rds-ping inducing kernel panic
On 1/22/2018 2:17 PM, Kees Cook wrote: On Tue, Jan 23, 2018 at 4:01 AM, Santosh Shilimkar <santosh.shilim...@oracle.com> wrote: On 1/22/2018 3:24 AM, Kees Cook wrote: As described in: https://bugzilla.redhat.com/show_bug.cgi?id=822754 Attempting an RDS connection from the IP address of an IPoIB interface to itself causes a kernel panic due to a BUG_ON() being triggered. Making the test less strict allows rds-ping to work without crashing the machine. A local unprivileged user could use this flaw to crash the system. Are you able to reproduce this issue on a mainline kernel? IIRC, this shouldn't happen anymore, but if you see it, please let me know. Will try it as well. rds-ping on the self loopback device is often tested, and is also used for monitoring services in production. I don't have an RDS test setup, no. But it sounds like kernels without this patch aren't seeing the problem. Yep. That's what I thought and hence asked. I am not sure if it's applicable anymore. In fact the issue with the loopback device was due to a congestion update, and that's already been addressed with commit '18fc25c94: {rds: prevent BUG_ON triggered on congestion update to loopback}' That looks very much like it was fixed there. Thanks! Yeah. Thanks Kees !! Regards, Santosh
Re: [PATCH] RDS: Fix rds-ping inducing kernel panic
On 1/22/2018 7:47 AM, David Miller wrote: From: Leon Romanovsky Date: Mon, 22 Jan 2018 17:10:54 +0200 On Mon, Jan 22, 2018 at 03:24:15AM -0800, Kees Cook wrote: diff --git a/net/rds/ib_send.c b/net/rds/ib_send.c index 8557a1cae041..5fbf635d17cb 100644 --- a/net/rds/ib_send.c +++ b/net/rds/ib_send.c @@ -506,7 +506,7 @@ int rds_ib_xmit(struct rds_connection *conn, struct rds_message *rm, int flow_controlled = 0; int nr_sig = 0; - BUG_ON(off % RDS_FRAG_SIZE); + BUG_ON(!conn->c_loopback && off % RDS_FRAG_SIZE); BUG_ON(hdr_off != 0 && hdr_off != sizeof(struct rds_header)); To be honest this function is full of BUG_ONs, and it looks fishy to have them there. Why don't we return EINVAL instead of crashing the system? I completely agree that these assertions should just cause an error-out rather than trigger a BUG(). Andy did remove a bunch of them, but there are still a few leftovers. Will have a look at the remainder, since most of them were added during early development and remained there. Thanks Dave/Leon. Regards, Santosh
Re: [PATCH] RDS: Fix rds-ping inducing kernel panic
On 1/22/2018 3:24 AM, Kees Cook wrote: As described in: https://bugzilla.redhat.com/show_bug.cgi?id=822754 Attempting an RDS connection from the IP address of an IPoIB interface to itself causes a kernel panic due to a BUG_ON() being triggered. Making the test less strict allows rds-ping to work without crashing the machine. A local unprivileged user could use this flaw to crash the system. Are you able to reproduce this issue on a mainline kernel? IIRC, this shouldn't happen anymore, but if you see it, please let me know. Will try it as well. rds-ping on the self loopback device is often tested, and is also used for monitoring services in production. I think this fix was written by Jay Fenlason, and extracted from the RedHat kernel patches here: https://oss.oracle.com/git/gitweb.cgi?p=redpatch.git;a=commitdiff;h=c7b6a0a1d8d636852be130fa15fa8be10d4704e8 It was part of the Red Hat patched kernel but not carried in shipping Oracle UEK kernels, at least afaik. This fix appears to have been carried by at least RedHat, Oracle, and Ubuntu for several years. CVE-2012-2372 Reported-by: Honggang Li Cc: sta...@vger.kernel.org Signed-off-by: Kees Cook --- This is what I get for researching CVE lifetimes... I am not sure if it's applicable anymore. In fact the issue with the loopback device was due to a congestion update, and that's already been addressed with commit '18fc25c94: {rds: prevent BUG_ON triggered on congestion update to loopback}' Regards, Santosh
Re: [PATCH] RDS: null pointer dereference in rds_atomic_free_op
On 1/3/2018 1:06 PM, simo.ghan...@gmail.com wrote: From: Mohamed Ghannam <simo.ghan...@gmail.com> set rm->atomic.op_active to 0 when rds_pin_pages() fails or the user supplied address is invalid, this prevents a NULL pointer usage in rds_atomic_free_op() Signed-off-by: Mohamed Ghannam <simo.ghan...@gmail.com> --- Good catch !! Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: WARNING in rds_cmsg_rdma_args
On 1/3/2018 10:51 AM, Mohamed Ghannam wrote: The fix : https://patchwork.ozlabs.org/patch/854723/ Cool. Thanks Mohamed for following it up. Regards, Santosh
Re: WARNING in rds_cmsg_rdma_args
On 1/3/2018 12:58 AM, syzbot wrote: Hello, syzkaller hit the following crash on ad036b63ee57df9ab802a4eb20cbbbec66aa4520 git://git.cmpxchg.org/linux-mmots.git/master compiler: gcc (GCC) 7.1.1 20170620 .config is attached Raw console output is attached. C reproducer is attached syzkaller reproducer is attached. See for information about syzkaller reproducers IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+ef175b5825285531e...@syzkaller.appspotmail.com It will help syzbot understand when the bug is fixed. See footer for details. If you forward the report, please keep this part and the footer. audit: type=1400 audit(1514947982.760:7): avc: denied { map } for pid=3468 comm="syzkaller284818" path="/root/syzkaller284818499" dev="sda1" ino=16481 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023 tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1 WARNING: CPU: 1 PID: 3468 at net/rds/rdma.c:617 rds_cmsg_rdma_args+0xe96/0x1360 net/rds/rdma.c:617 Kernel panic - not syncing: panic_on_warn set ... +Mohamed Ghannam, who was discussing a similar issue and was going to post a fix for it. Checking args->nr_local against 0 upfront would address this issue as well. Regards, Santosh
[PATCH] rds: fix use-after-free read in rds_find_bound
The socket can get freed as part of the sock_close() callback, so check the validity of the underlying socket before taking a reference. Reported-by: syzbot+93a5839deb3555374...@syzkaller.appspotmail.com Signed-off-by: Santosh Shilimkar <santosh.shilim...@oracle.com> --- net/rds/bind.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/rds/bind.c b/net/rds/bind.c index 75d43dc..8dec06e 100644 --- a/net/rds/bind.c +++ b/net/rds/bind.c @@ -61,7 +61,7 @@ struct rds_sock *rds_find_bound(__be32 addr, __be16 port) struct rds_sock *rs; rs = rhashtable_lookup_fast(&bind_hash_table, &key, ht_parms); - if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD)) + if (rs && rds_rs_to_sk(rs) && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD)) rds_sock_addref(rs); else rs = NULL; -- 1.9.1
Re: [PATCH V2 net-next 3/3] rds: tcp: cleanup if kmem_cache_alloc fails in rds_tcp_conn_alloc()
On 12/22/2017 9:39 AM, Sowmini Varadhan wrote: If kmem_cache_alloc() fails in the middle of the for() loop, cleanup anything that might have been allocated so far. Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- v2: target net-next, not net Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: [PATCH V2 net-next 2/3] rds: tcp: initialize t_tcp_detached to false
On 12/22/2017 9:39 AM, Sowmini Varadhan wrote: Commit f10b4cff98c6 ("rds: tcp: atomically purge entries from rds_tcp_conn_list during netns delete") adds the field t_tcp_detached, but this needs to be initialized explicitly to false. Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- v2: target net-next, not net Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: [PATCH V2 net-next 1/3] rds: Reset rs->rs_bound_addr in rds_add_bound() failure path
On 12/22/2017 9:38 AM, Sowmini Varadhan wrote: If the rds_sock is not added to the bind_hash_table, we must reset rs_bound_addr so that rds_remove_bound will not trip on this rds_sock. rds_add_bound() does a rds_sock_put() in this failure path, so failing to reset rs_bound_addr will result in a socket refcount bug, and will trigger a WARN_ON with the stack shown below when the application subsequently tries to close the PF_RDS socket. WARNING: CPU: 20 PID: 19499 at net/rds/af_rds.c:496 \ rds_sock_destruct+0x15/0x30 [rds] : __sk_destruct+0x21/0x190 rds_remove_bound.part.13+0xb6/0x140 [rds] rds_release+0x71/0x120 [rds] sock_release+0x1a/0x70 sock_close+0xe/0x20 __fput+0xd5/0x210 task_work_run+0x82/0xa0 do_exit+0x2ce/0xb30 ? syscall_trace_enter+0x1cc/0x2b0 do_group_exit+0x39/0xa0 SyS_exit_group+0x10/0x10 do_syscall_64+0x61/0x1a0 Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- v2: target net-next, not net Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: KASAN: stack-out-of-bounds Read in rds_sendmsg
+Avinash On 12/21/2017 1:10 AM, syzbot wrote: syzkaller has found reproducer for the following crash on [..] audit: type=1400 audit(1513847224.110:7): avc: denied { map } for pid=3157 comm="syzkaller455006" path="/root/syzkaller455006870" dev="sda1" ino=16481 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023 tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1 == BUG: KASAN: stack-out-of-bounds in rds_rdma_bytes net/rds/send.c:1013 [inline] Could you please post the discussed fix if you are ready with it? This new report is the same as the last one, and the cmsg length check should address it. Regards, Santosh
Re: [PATCH v2 4/5] rds: Add runchecks.cfg for net/rds
On 12/16/2017 6:02 PM, Knut Omang wrote: On Sat, 2017-12-16 at 12:00 -0800, santosh.shilim...@oracle.com wrote: On 12/16/17 10:24 AM, Joe Perches wrote: [...] Most of these existing messages from checkpatch should probably be inspected and corrected where possible to minimize the style differences between this subsystem and the rest of the kernel. For instance, here's a trivial patch to substitute pr_ for printks and a couple braces next to these substitutions. Thanks Joe. I actually had a similar patch a while back, but since it was a lot of churn and the code was already merged, I never submitted it and then later forgot about it. Will look into it. Please look at my set here first - I have already spent considerable time cleaning up stuff while working on this: Just closing the loop. As discussed, I can use your patches without any new tool dependency, since the existing checkpatch.pl already gives those warnings. I started picking up Joe's patch, but since you have changes, I can use them instead once you untie them from runcheck. Regarding the $subject, just re-iterating that I don't want any custom script for RDS and want to just follow the generic guidelines followed by netdev for all net/* code. Regards, Santosh
Re: BUG: unable to handle kernel NULL pointer dereference in rds_send_xmit
On 12/18/2017 9:12 AM, David Miller wrote: From: Santosh Shilimkar <santosh.shilim...@oracle.com> Date: Mon, 18 Dec 2017 08:28:05 -0800 On 12/18/2017 12:43 AM, syzbot wrote: Hello, syzkaller hit the following crash on 6084b576dca2e898f5c101baef151f7bfdbb606d git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master compiler: gcc (GCC) 7.1.1 20170620 .config is attached Raw console output is attached. Unfortunately, I don't have any reproducer for this bug yet. BUG: unable to handle kernel NULL pointer dereference at 0028 program syz-executor6 is using a deprecated SCSI ioctl, please convert it to SG_IO IP: rds_send_xmit+0x80/0x930 net/rds/send.c:186 Looks like another one tripping on an empty transport. Mostly the change below should address it, but we will test to confirm it does. diff --git a/net/rds/send.c b/net/rds/send.c index 7244d2e..e2d0eaa 100644 --- a/net/rds/send.c +++ b/net/rds/send.c @@ -183,7 +183,7 @@ int rds_send_xmit(struct rds_conn_path *cp) goto out; } - if (conn->c_trans->xmit_path_prepare) + if (conn->c_trans && conn->c_trans->xmit_path_prepare) conn->c_trans->xmit_path_prepare(cp); We seem to be accumulating a lot of checks like this, maybe there is a more general way to deal with this problem? Agree. Some of these additional transport hooks got added later to the specific transports which need them. Will review this overall and see if it can be addressed generically. Regards, Santosh
Re: BUG: spinlock bad magic (2)
On 12/18/2017 4:36 AM, syzbot wrote: Hello, syzkaller hit the following crash on 6084b576dca2e898f5c101baef151f7bfdbb606d git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master compiler: gcc (GCC) 7.1.1 20170620 .config is attached Raw console output is attached. Unfortunately, I don't have any reproducer for this bug yet. [...] BUG: unable to handle kernel NULL pointer dereference at 0028 IP: rds_send_xmit+0x80/0x930 net/rds/send.c:186 This one seems to be the same bug as reported below. BUG: unable to handle kernel NULL pointer dereference in rds_send_xmit
Re: BUG: unable to handle kernel NULL pointer dereference in rds_send_xmit
On 12/18/2017 12:43 AM, syzbot wrote: Hello, syzkaller hit the following crash on 6084b576dca2e898f5c101baef151f7bfdbb606d git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master compiler: gcc (GCC) 7.1.1 20170620 .config is attached Raw console output is attached. Unfortunately, I don't have any reproducer for this bug yet. BUG: unable to handle kernel NULL pointer dereference at 0028 program syz-executor6 is using a deprecated SCSI ioctl, please convert it to SG_IO IP: rds_send_xmit+0x80/0x930 net/rds/send.c:186 Looks like another one tripping on an empty transport. Mostly the change below should address it, but we will test to confirm it does. diff --git a/net/rds/send.c b/net/rds/send.c index 7244d2e..e2d0eaa 100644 --- a/net/rds/send.c +++ b/net/rds/send.c @@ -183,7 +183,7 @@ int rds_send_xmit(struct rds_conn_path *cp) goto out; } - if (conn->c_trans->xmit_path_prepare) + if (conn->c_trans && conn->c_trans->xmit_path_prepare) conn->c_trans->xmit_path_prepare(cp);
Re: [PATCH net-next] rds: debug: fix null check on static array
On 12/6/2017 3:32 AM, Sowmini Varadhan wrote: On (12/06/17 10:47), Prashant Bhole wrote: t_name cannot be NULL since it is an array field of a struct. Replacing null check on static array with string length check using strnlen() t_name is always initialized for all rds transports today, and would be all zeros unless someone stomped over that memory (in which case there could be more serious issues than a busted debug string) but it's good to be safe and check I suppose. Thanks Sowmini !! Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: [PATCH net] rds: Fix NULL pointer dereference in __rds_rdma_map
On 12/6/2017 8:18 AM, Håkon Bugge wrote: This is a fix for syzkaller719569, where memory registration was attempted without any underlying transport being loaded. Analysis of the case reveals that it is the setsockopt() RDS_GET_MR (2) and RDS_GET_MR_FOR_DEST (7) that are vulnerable. Here is an example stack trace when the bug is hit: BUG: unable to handle kernel NULL pointer dereference at 00c0 IP: __rds_rdma_map+0x36/0x440 [rds] Oops: [#1] SMP [...] Call Trace: rds_get_mr+0x56/0x80 [rds] rds_setsockopt+0x172/0x340 [rds] ? __fget_light+0x25/0x60 ? __fdget+0x13/0x20 SyS_setsockopt+0x80/0xe0 do_syscall_64+0x67/0x1b0 entry_SYSCALL64_slow_path+0x25/0x25 [...] The fix is to check the existence of an underlying transport in __rds_rdma_map(). Signed-off-by: Håkon Bugge <haakon.bu...@oracle.com> Reported-by: syzbot <syzkal...@googlegroups.com> --- Thanks Haakon for testing and posting out the discussed change. Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: [PATCH net-next 3/3] rds: tcp: atomically purge entries from rds_tcp_conn_list during netns delete
On 11/30/2017 11:11 AM, Sowmini Varadhan wrote: The rds_tcp_kill_sock() function parses the rds_tcp_conn_list to find the rds_connection entries marked for deletion as part of the netns deletion under the protection of the rds_tcp_conn_lock. Since the rds_tcp_conn_list tracks rds_tcp_connections (which have a 1:1 mapping with rds_conn_path), multiple tc entries in the rds_tcp_conn_list will map to a single rds_connection, and will be deleted as part of the rds_conn_destroy() operation that is done outside the rds_tcp_conn_lock. The rds_tcp_conn_list traversal done under the protection of rds_tcp_conn_lock should not leave any doomed tc entries in the list after the rds_tcp_conn_lock is released, else another concurrently executing netns delete (for a different netns) thread may trip on these entries. Reported-by: syzbot <syzkal...@googlegroups.com> Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: [PATCH net-next 2/3] rds: tcp: correctly sequence cleanup on netns deletion.
On 11/30/2017 11:11 AM, Sowmini Varadhan wrote: Commit 8edc3affc077 ("rds: tcp: Take explicit refcounts on struct net") introduces a regression in rds-tcp netns cleanup. The cleanup_net(), (and thus rds_tcp_dev_event notification) is only called from put_net() when all netns refcounts go to 0, but this cannot happen if the rds_connection itself is holding a c_net ref that it expects to release in rds_tcp_kill_sock. Instead, the rds_tcp_kill_sock callback should make sure to tear down state carefully, ensuring that the socket teardown is only done after all data-structures and workqs that depend on it are quiesced. The original motivation for commit 8edc3affc077 ("rds: tcp: Take explicit refcounts on struct net") was to resolve a race condition reported by syzkaller where workqs for tx/rx/connect were triggered after the namespace was deleted. Those worker threads should have been cancelled/flushed before socket tear-down and indeed, rds_conn_path_destroy() does try to sequence this by doing /* cancel cp_send_w */ /* cancel cp_recv_w */ /* flush cp_down_w */ /* free data structures */ Here the "flush cp_down_w" will trigger rds_conn_shutdown and thus invoke rds_tcp_conn_path_shutdown() to close the tcp socket, so that we ought to have satisfied the requirement that "socket-close is done after all other dependent state is quiesced". However, rds_conn_shutdown has a bug in that it *always* triggers the reconnect workq (and if connection is successful, we always restart tx/rx workqs so with the right timing, we risk the race conditions reported by syzkaller). Netns deletion is like module teardown- no need to restart a reconnect in this case. We can use the c_destroy_in_prog bit to avoid restarting the reconnect. 
Fixes: 8edc3affc077 ("rds: tcp: Take explicit refcounts on struct net") Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- net/rds/connection.c |3 ++- net/rds/rds.h|6 +++--- net/rds/tcp.c|4 ++-- 3 files changed, 7 insertions(+), 6 deletions(-) diff --git a/net/rds/connection.c b/net/rds/connection.c index 7ee2d5d..9efc82c 100644 --- a/net/rds/connection.c +++ b/net/rds/connection.c @@ -366,6 +366,8 @@ void rds_conn_shutdown(struct rds_conn_path *cp) * to the conn hash, so we never trigger a reconnect on this * conn - the reconnect is always triggered by the active peer. */ cancel_delayed_work_sync(&cp->cp_conn_w); + if (conn->c_destroy_in_prog) + return; Not related to this patch, but it would be safer to use cp_flags or, if needed, add a flag at the conn level for the bundle, and use bit-wise ops to avoid possible races when setting c_destroy_in_prog. Something similar to RDS_DESTROY_PENDING etc. The patch itself looks good to me in terms of netns ref counting. Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: [PATCH net-next 1/3] rds: tcp: remove redundant function rds_tcp_conn_paths_destroy()
On 11/30/2017 11:11 AM, Sowmini Varadhan wrote: A side-effect of Commit c14b0366813a ("rds: tcp: set linger to 1 when unloading a rds-tcp") is that we always send a RST on the tcp connection for rds_conn_destroy(), so rds_tcp_conn_paths_destroy() is not needed any more and is removed in this patch. Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- Looks good. Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: general protection fault in __rds_rdma_map
On 11/27/2017 10:30 AM, syzbot wrote: Hello, syzkaller hit the following crash on e1d1ea549b57790a3d8cf6300e6ef86118d692a3 git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master compiler: gcc (GCC) 7.1.1 20170620 .config is attached Raw console output is attached. C reproducer is attached syzkaller reproducer is attached. See https://goo.gl/kgGztJ for information about syzkaller reproducers kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: [#1] SMP KASAN RDS: rds_bind could not find a transport for 224.0.0.2, load rds_tcp or rds_rdma? Seems like the RDMA operation got triggered on a non-RDMA transport, leading to a non-populated rs->rs_transport->get_mr(). Also seems like the test was running in a namespace, and the RDMA transport doesn't yet support that. Thanks for reporting. Will look into a fix internally. Regards, Santosh
Re: [lldp-devel] Fwd: [PATCH RESEND] Fix segfault on "lldptool -t -i eth2 -V PFC -c enabled"
On 11/8/2017 1:31 AM, Valentin Vidic wrote: On Wed, Nov 08, 2017 at 04:24:48AM -0500, Sowmini Varadhan wrote: I just tried setting up a git pull-request for this to Valentin's repo, I'm not sure if I followed procedures correctly (sending text patches to a list comes more naturally to me, and I may have fat-fingered something) To whom should I send this patch? We also have a couple of other patches in the pipeline that we are testing, so setting up a mailing list would be welcomed! Right, but my repo is just for Debian packaging :) We would need a SUSE repo or email address where to send patches - I also have a few small fixes for lldpad and fcoe waiting. I also heard the Mellanox folks have a few patches, so looping them in as well. IMO email alone will not be enough, and a public repo is needed so that the latest sources can be available for people developing/using it. Hannes, you mentioned that you are taking fixes. Is the repo where you are applying these fixes public? Sorry if I missed that part. Regards, Santosh
Re: [PATCH net] rds: ib: Fix NULL pointer dereference in debug code
_DEBUG -DDEBUG" and inserting an artificial delay between the rdsdebug() and ib_post_recv() statements:

	/* XXX when can this fail? */
	ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
+	if (can_wait)
+		usleep_range(1000, 5000);
	rdsdebug("recv %p ibinc %p page %p addr %lu ret %d\n", recv,
		 recv->r_ibinc, sg_page(&recv->r_frag->f_sg),
		 (long) ib_sg_dma_address(

The fix is simply to move the rdsdebug() statement up before the ib_post_recv() and remove the printing of ret, which is taken care of anyway by the non-debug code. Signed-off-by: Håkon Bugge <haakon.bu...@oracle.com> Reviewed-by: Knut Omang <knut.om...@oracle.com> Reviewed-by: Wei Lin Guay <wei.lin.g...@oracle.com> --- net/rds/ib_recv.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: [PATCH] rds: Fix inaccurate accounting of unsignaled wrs
On 10/24/2017 9:15 AM, Håkon Bugge wrote: On 24 Oct 2017, at 18:05, Santosh Shilimkar <santosh.shilim...@oracle.com <mailto:santosh.shilim...@oracle.com>> wrote: [...] Instead of partially doing changes inside/outside the helper, you could also add an inline helper for the solicited state, like rds_ib_set_wr_solicited_state(), and use that along with this change. Why? There is no book-keeping associated with setting send-solicited. It is set on the last fragment of a message and on the last fragment sent before throttling due to flow-control. Creating a function to perform: FOO |= BAR; seems like overkill to me. It's just an inline helper, and it keeps the code consistent for flag manipulation. The compiler output will be the same as "FOO |= BAR;" :-) That being said, in my opinion the fragments of a (large) send should be scattered with send-solicited. But that is another commit. But with such a commit, I agree with you, a helper function is required. We already talked about it, so let's leave it there. Regards, Santosh
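For what it's worth, the helper under discussion really would be a one-liner. A sketch of the suggested rds_ib_set_wr_solicited_state(); the name, struct, and flag value below are placeholders for illustration, not the kernel's definitions:

```c
/* Illustrative only: a trivial inline helper that keeps flag
 * manipulation behind one name, as suggested in the thread above.
 * The real code would set IB_SEND_SOLICITED on the WR's send_flags. */
#define IB_SEND_SOLICITED_SKETCH (1u << 1)	/* placeholder bit */

struct send_wr_sketch {
	unsigned int send_flags;
};

static inline void set_wr_solicited_state(struct send_wr_sketch *wr)
{
	wr->send_flags |= IB_SEND_SOLICITED_SKETCH;	/* FOO |= BAR */
}
```

As Santosh notes, the generated code is identical to writing the OR inline; the helper only buys naming consistency.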
Re: [PATCH] rds: Fix inaccurate accounting of unsignaled wrs
On 10/24/2017 7:16 AM, Håkon Bugge wrote: The number of unsignaled work-requests posted to the IB send queue is tracked by a counter in the rds_ib_connection struct. When it reaches zero, or the caller explicitly asks for it, the send-signaled bit is set in send_flags and the counter is reset. This is performed by the rds_ib_set_wr_signal_state() function. However, this function is not always used, which yields inaccurate accounting. This commit fixes this, re-factors some related code bloat, and makes the actual parameter type to the function consistent. Signed-off-by: Håkon Bugge --- Instead of partially doing changes inside/outside the helper, you could also add an inline helper for the solicited state, like rds_ib_set_wr_solicited_state(), and use that along with this change. Regards, Santosh
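The accounting the patch centralizes can be sketched as follows. This is a hedged user-space approximation, not the kernel's rds_ib_set_wr_signal_state() verbatim: the struct, flag value, and the exact decrement/short-circuit behavior are simplified for illustration.

```c
#define IB_SEND_SIGNALED_SKETCH (1u << 0)	/* placeholder bit */

/* Sketch of the pattern: count down a budget of unsignaled WRs and
 * force the signaled bit when the budget is exhausted, or when the
 * caller explicitly wants a completion, then reset the budget. */
struct ib_conn_sketch {
	unsigned int i_unsignaled_wrs;
};

static int set_wr_signal_state(struct ib_conn_sketch *ic,
			       unsigned int *send_flags, int notify,
			       unsigned int max_unsig_wrs)
{
	if (notify || --ic->i_unsignaled_wrs == 0) {
		ic->i_unsignaled_wrs = max_unsig_wrs;
		*send_flags |= IB_SEND_SIGNALED_SKETCH;
		return 1;	/* this WR will generate a completion */
	}
	return 0;
}
```

The bug the commit fixes is exactly what happens when some call sites set the signaled bit by hand instead of going through one helper: the counter drifts and completions are batched incorrectly.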
Re: [PATCH] rds: Fix uninitialized variable
$subject s/rds:/rds: ib: On 10/24/2017 8:02 AM, Håkon Bugge wrote: send_flags needs to be initialized before calling rds_ib_set_wr_signal_state(). Signed-off-by: Håkon Bugge <haakon.bu...@oracle.com> --- Looks fine otherwise. Please re-post with subject fixed. Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: [PATCH net] RDS: IB: Initialize max_items based on underlying device attributes
Hi Avinash, On 10/3/2017 5:50 PM, Avinash Repaka wrote: Use max_1m_mrs/max_8k_mrs while setting max_items, as the former variables are set based on the underlying device attricutes. s/attricutes/attributes Signed-off-by: Avinash Repaka <avinash.rep...@oracle.com> --- net/rds/ib_rdma.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) This patch is fine, but it would be nice if you could collect these changes into a series since you are updating the FRWR support, like this patch and the other cleanup patch you posted yesterday. With that log fixed, Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: [PATCH net] RDS: IB: Limit the scope of has_fr/has_fmr variables
Hi Dave, On 10/2/2017 1:30 PM, Santosh Shilimkar wrote: On 10/1/2017 10:56 PM, David Miller wrote: From: David Miller <da...@davemloft.net> [...] Actually, reverted, this breaks the build. net/rds/rdma_transport.c:38:10: fatal error: ib.h: No such file or directory #include "ib.h" Although I can't see how in the world this patch is causing such an error. Weird indeed. Will sort this out with Avinash. Thanks Dave. I tried a few build combinations on net-next but couldn't reproduce the build failure. AFAIU, the build error can only happen if for some reason the ib.h file got deleted accidentally. I did delete the ib.h file and saw the error below: CC [M] net/rds/rdma_transport.o net/rds/rdma_transport.c:38:16: error: ib.h: No such file or directory Could it be that the ib.h file got deleted or mangled while applying the $subject patch? Regards, Santosh
Re: [PATCH net] RDS: IB: Limit the scope of has_fr/has_fmr variables
On 10/1/2017 10:56 PM, David Miller wrote: From: David Miller Date: Sun, 01 Oct 2017 22:54:19 -0700 (PDT) From: Avinash Repaka Date: Fri, 29 Sep 2017 18:13:50 -0700 This patch fixes the scope of has_fr and has_fmr variables as they are needed only in rds_ib_add_one(). Signed-off-by: Avinash Repaka Applied. Actually, reverted, this breaks the build. net/rds/rdma_transport.c:38:10: fatal error: ib.h: No such file or directory #include "ib.h" Although I can't see how in the world this patch is causing such an error. Weird indeed. Will sort this out with Avinash. Thanks Dave. Regards, Santosh
Re: [PATCH net] RDS: IB: Limit the scope of has_fr/has_fmr variables
On 9/29/2017 6:13 PM, Avinash Repaka wrote: This patch fixes the scope of has_fr and has_fmr variables as they are needed only in rds_ib_add_one(). Signed-off-by: Avinash Repaka <avinash.rep...@oracle.com> --- Indeed the final merge version actually didn't need those across files. Change looks good to me. Thanks !! Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: [PATCH net v2] rds: Fix incorrect statistics counting
On 9/6/2017 9:35 AM, Håkon Bugge wrote: In rds_send_xmit() there is logic to batch the sends. However, if another thread has acquired the lock and has incremented the send_gen, it is considered a race and we yield. The code incrementing the s_send_lock_queue_raced statistics counter did not count this event correctly. This commit counts the race condition correctly. Changes from v1: - Removed check for *someone_on_xmit()* - Fixed incorrect indentation Signed-off-by: Håkon Bugge <haakon.bu...@oracle.com> Reviewed-by: Knut Omang <knut.om...@oracle.com> --- Thanks for the update. Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
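The race being counted can be sketched with C11 atomics. This is a user-space approximation of the pattern (the names in_xmit, s_send_lock_queue_raced_sketch, and send_xmit_sketch are illustrative stand-ins, not the kernel's symbols): one thread owns the transmit path via a test-and-set flag, and a thread that loses that race yields after bumping the raced counter exactly once.

```c
#include <stdatomic.h>

/* Sketch of acquire_in_xmit()-style ownership plus correct raced
 * accounting: the losing thread increments the counter once and
 * yields; the winner transmits and releases the flag. */
static atomic_flag in_xmit = ATOMIC_FLAG_INIT;
static atomic_uint s_send_lock_queue_raced_sketch;

static int try_acquire_in_xmit(void)
{
	/* atomic_flag_test_and_set() returns the previous value;
	 * a set previous value means another thread holds the path. */
	return !atomic_flag_test_and_set(&in_xmit);
}

static void release_in_xmit(void)
{
	atomic_flag_clear(&in_xmit);
}

static void send_xmit_sketch(void)
{
	if (!try_acquire_in_xmit()) {
		/* raced with another sender: count it exactly once */
		atomic_fetch_add(&s_send_lock_queue_raced_sketch, 1);
		return;
	}
	/* ... batch and transmit queued messages ... */
	release_in_xmit();
}
```

The v1 thread's concern applies here too: the counter read is only statistical, so nothing in the hot path should pay extra cycles just to make it precise.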
Re: [PATCH net] rds: Fix incorrect statistics counting
On 9/6/2017 9:12 AM, Håkon Bugge wrote: [...] Hi Santosh, Yes, I agree about the accuracy of s_send_lock_queue_raced. But the main point is that the existing code counts some partial share of the cases when it is _not_ raced. So, in the critical path, my patch adds one test_bit(), which hits the local CPU cache, if not raced. If raced, some other thread is in control, so I would not think the added cycles would make any big difference. Cycles added for no good reason is the point. I can send a v2 where the race tightening is removed if you like. Yes please. Regards, Santosh
Re: [PATCH net] rds: Fix incorrect statistics counting
On 9/6/2017 8:29 AM, Håkon Bugge wrote: In rds_send_xmit() there is logic to batch the sends. However, if another thread has acquired the lock, it is considered a race and we yield. The code incrementing the s_send_lock_queue_raced statistics counter did not count this event correctly. This commit removes a small race in determining the race and increments the statistics counter correctly. Signed-off-by: Håkon Bugge Reviewed-by: Knut Omang --- net/rds/send.c | 16 +--- 1 file changed, 13 insertions(+), 3 deletions(-) Those counters are not really meant to be that accurate, so I am not very keen to add additional cycles in the send path and additional code. Have you seen any real issue, or is this just an observation? The s_send_lock_queue_raced counter is never used to check for small increments, hence the question. Regards, Santosh
Re: [PATCH net] rds: Fix non-atomic operation on shared flag variable
On 9/5/2017 8:42 AM, Håkon Bugge wrote: The bits in m_flags in struct rds_message are used for a plurality of reasons, and from different contexts. To avoid any missing updates to m_flags, use the atomic set_bit() instead of the non-atomic equivalent. Signed-off-by: Håkon Bugge <haakon.bu...@oracle.com> Reviewed-by: Knut Omang <knut.om...@oracle.com> Reviewed-by: Wei Lin Guay <wei.lin.g...@oracle.com> --- net/rds/send.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/rds/send.c b/net/rds/send.c
index 41b9f0f..058a407 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -273,7 +273,7 @@ int rds_send_xmit(struct rds_conn_path *cp)
 		len = ntohl(rm->m_inc.i_hdr.h_len);
 		if (cp->cp_unacked_packets == 0 ||
 		    cp->cp_unacked_bytes < len) {
-			__set_bit(RDS_MSG_ACK_REQUIRED, &rm->m_flags);
+			set_bit(RDS_MSG_ACK_REQUIRED, &rm->m_flags);
 			cp->cp_unacked_packets =
 				rds_sysctl_max_unacked_packets;
@@ -829,7 +829,7 @@ static int rds_send_queue_rm(struct rds_sock *rs, struct rds_connection *conn,
 	 * throughput hits a certain threshold.
 	 */
 	if (rs->rs_snd_bytes >= rds_sk_sndbuf(rs) / 2)
-		__set_bit(RDS_MSG_ACK_REQUIRED, &rm->m_flags);
+		set_bit(RDS_MSG_ACK_REQUIRED, &rm->m_flags);
 	list_add_tail(&rm->m_sock_item, &rs->rs_send_queue);
 	set_bit(RDS_MSG_ON_SOCK, &rm->m_flags);

Indeed, these couple of instances remained for the m_flags. Patch looks good. Thanks !! Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
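For readers wondering why the atomicity matters, a user-space analogue: set_bit() is a single atomic read-modify-write, while __set_bit() compiles to a plain load, OR, store that can overwrite a concurrent update to a neighbouring bit in the same word. The bit numbers and names below are placeholders, not the RDS flag values.

```c
#include <stdatomic.h>

/* m_flags stand-in: the atomic setter cannot lose a concurrent
 * fetch_or to another bit in the same word; the non-atomic one can,
 * because its load + or + store may clobber a store made in between. */
static void set_bit_sketch(unsigned int nr, atomic_ulong *addr)
{
	atomic_fetch_or(addr, 1UL << nr);	/* like set_bit() */
}

static void nonatomic_set_bit_sketch(unsigned int nr, unsigned long *addr)
{
	*addr |= 1UL << nr;			/* like __set_bit(): racy */
}
```

The non-atomic form is still fine when the word is not yet shared (e.g. before the message is queued); the patch targets exactly the call sites where other contexts can already see m_flags.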
Re: [PATCH 5/7] RDS: make rhashtable_params const
On 8/25/2017 7:21 AM, Bhumika Goyal wrote: Make this const as it is either used during a copy operation or passed to a const argument of the function rhltable_init. Signed-off-by: Bhumika Goyal <bhumi...@gmail.com> --- Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: [PATCH net] rds: Reintroduce statistics counting
On 8/8/2017 2:13 AM, Håkon Bugge wrote: In commit 7e3f2952eeb1 ("rds: don't let RDS shutdown a connection while senders are present"), refilling the receive queue was removed from rds_ib_recv(), along with the increment of s_ib_rx_refill_from_thread. Commit 73ce4317bf98 ("RDS: make sure we post recv buffers") re-introduces filling the receive queue from rds_ib_recv(), but does not add the statistics counter. rds_ib_recv() was later renamed to rds_ib_recv_path(). This commit reintroduces the statistics counting of s_ib_rx_refill_from_thread and s_ib_rx_refill_from_cq. Signed-off-by: Håkon Bugge <haakon.bu...@oracle.com> Reviewed-by: Knut Omang <knut.om...@oracle.com> Reviewed-by: Wei Lin Guay <wei.lin.g...@oracle.com> --- Looks fine. Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: [PATCH 0/3] ARM: dts: keystone-k2g: Add DCAN instances to 66AK2G
Hi Franklin, On 8/2/2017 1:18 PM, Franklin S Cooper Jr wrote: Add DCAN nodes to the 66AK2G based SoC dtsi. Franklin S Cooper Jr (2): dt-bindings: net: c_can: Update binding for clock and power-domains property ARM: configs: keystone: Enable D_CAN driver Lokesh Vutla (1): ARM: dts: k2g: Add DCAN nodes Is there any DCAN driver dependency in this patchset? If not, I can queue this up, so do let me know. Regards, Santosh
Re: [PATCH net] rds: Make sure updates to cp_send_gen can be observed
On 7/20/2017 3:28 AM, Håkon Bugge wrote: cp->cp_send_gen is treated as a normal variable, although it may be used by different threads. This is fixed by using {READ,WRITE}_ONCE when it is incremented and READ_ONCE when it is read outside the {acquire,release}_in_xmit protection. There is an explicit memory barrier before the value is read outside {acquire,release}_in_xmit, so that takes care of the load/store synchronization. Normative reference from the Linux-Kernel Memory Model: Loads from and stores to shared (but non-atomic) variables should be protected with the READ_ONCE(), WRITE_ONCE(), and ACCESS_ONCE(). Clause 5.1.2.4/25 in the C standard is also relevant. Signed-off-by: Håkon Bugge <haakon.bu...@oracle.com> Reviewed-by: Knut Omang <knut.om...@oracle.com> --- Having said that, the {READ,WRITE}_ONCE usage makes it clear and explicit, so it's fine with me. Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
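A rough user-space approximation of what these macros buy, using the GCC/clang __typeof__ extension (the _SKETCH names and the bump_send_gen helper are illustrative, not the kernel's definitions): the volatile cast forces the compiler to emit exactly one load or store of the shared variable and prevents it from tearing, fusing, or refetching the access.

```c
/* User-space sketch of the kernel's READ_ONCE/WRITE_ONCE: a volatile
 * access pins down one load/one store per source-level access. */
#define READ_ONCE_SKETCH(x)	(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

static unsigned int cp_send_gen_sketch;	/* stand-in for cp->cp_send_gen */

static unsigned int bump_send_gen(void)
{
	/* the increment pattern from rds_send_xmit(), sketched: done
	 * under the in_xmit ownership, so a plain RMW race is excluded;
	 * the ONCE accessors only constrain the compiler. */
	unsigned int gen = READ_ONCE_SKETCH(cp_send_gen_sketch) + 1;

	WRITE_ONCE_SKETCH(cp_send_gen_sketch, gen);
	return gen;
}
```

Note these macros do not add ordering against other variables; as the thread says, the existing explicit memory barrier still does that job.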
Re: [PATCH net 1/2] rds: tcp: use sock_create_lite() to create the accept socket
On 7/6/2017 8:15 AM, Sowmini Varadhan wrote: There are two problems with calling sock_create_kern() from rds_tcp_accept_one(): 1. it sets up a new_sock->sk, which is wasteful, because this ->sk is going to get replaced by inet_accept() in the subsequent ->accept() 2. the new_sock->sk is a leaked reference in sock_graft(), which expects to find a null parent->sk Avoid these problems by calling sock_create_lite(). Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: [PATCH 1/3] RDS: IB: Delete an error message for a failed memory allocation in rds_ib_add_one()
On 5/22/2017 7:11 AM, SF Markus Elfring wrote: From: Markus Elfring <elfr...@users.sourceforge.net> Date: Mon, 22 May 2017 15:34:28 +0200 Omit an extra message for a memory allocation failure in this function. This issue was detected by using the Coccinelle software. Link: http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf Signed-off-by: Markus Elfring <elfr...@users.sourceforge.net> --- net/rds/ib.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/rds/ib.c b/net/rds/ib.c
index 7a64c8db81ab..c5514d058171 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -166,8 +166,5 @@ static void rds_ib_add_one(struct ib_device *device)
-	if (!rds_ibdev->vector_load) {
-		pr_err("RDS/IB: %s failed to allocate vector memory\n",
-			__func__);
+	if (!rds_ibdev->vector_load)
 		goto put_dev;
-	}

Well, the ENOMEM is not carried here, so the message was useful, but it's not critical, so it's fine to clean that up. Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
Re: [PATCH 2/3] RDS: IB: Improve a size determination in rds_ib_add_one()
On 5/22/2017 7:12 AM, SF Markus Elfring wrote: From: Markus Elfring <elfr...@users.sourceforge.net> Date: Mon, 22 May 2017 15:40:21 +0200 Replace the specification of a data structure by a pointer dereference as the parameter for the operator "sizeof" to make the corresponding size determination a bit safer according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfr...@users.sourceforge.net> --- Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
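The convention this patch applies, sketched with calloc() standing in for kzalloc() (struct rds_ibdev_sketch and alloc_ibdev_sketch are hypothetical names for illustration):

```c
#include <stdlib.h>

/* sizeof(*ptr) tracks the pointee's type automatically, so the
 * allocation size stays correct if the struct type is later renamed
 * or the pointer's type changes; repeating the struct name does not. */
struct rds_ibdev_sketch {
	int vector_load;
};

static struct rds_ibdev_sketch *alloc_ibdev_sketch(void)
{
	struct rds_ibdev_sketch *rds_ibdev;

	/* preferred: sizeof(*rds_ibdev), not sizeof(struct rds_ibdev_sketch) */
	rds_ibdev = calloc(1, sizeof(*rds_ibdev));
	return rds_ibdev;
}
```

This is the "a bit safer" property the commit message refers to: the compiler, not the programmer, keeps the size in sync with the type.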
Re: [PATCH] RDS: IB: ensure an initialized ret is printed in pr_warn message
On 4/7/2017 7:24 AM, Dan Carpenter wrote: Setting it to zero doesn't seem like the right thing; it should be an error code. Oh, heh... Smatch parses this one correctly. "ret" is always initialized but the code is probably buggy in a different way: [...] 561 ibmr = rds_ib_reg_fmr(rds_ibdev, sg, nents, key_ret); 562 if (ibmr) This condition is always true because those functions return ERR_PTRs not NULLs. Thanks Dan. I will fix that up. Regards, Santosh
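To see why Dan's `if (ibmr)` observation holds, here is a minimal user-space re-creation of the kernel's ERR_PTR convention (the _sketch names mirror include/linux/err.h loosely; this is illustrative, not the kernel headers): an error return is a small negative errno cast to a pointer, which is non-NULL, so a plain NULL check always passes and the right test is IS_ERR().

```c
#include <errno.h>
#include <stdint.h>

/* Error codes live in the top MAX_ERRNO bytes of the address space,
 * a range no valid pointer ever occupies. */
#define MAX_ERRNO_SKETCH 4095

static inline void *ERR_PTR_sketch(long error)
{
	return (void *)error;		/* e.g. -ENOMEM -> 0xffff...f4 */
}

static inline int IS_ERR_sketch(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}
```

So the buggy pattern `ibmr = reg(...); if (ibmr)` should instead be `if (!IS_ERR(ibmr))` (or the call should be changed to return NULL on failure).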