Re: [PATCH net-next] RDS: deliver zerocopy completion notification with data as an optimization

2018-02-22 Thread Sowmini Varadhan
On (02/21/18 19:39), Willem de Bruijn wrote:
> >> By the way, the put_cmsg is unconditional even if the caller did
> >> not supply msg_control. So it is basically no longer safe to ever
> >> call read, recv or recvfrom on a socket if zerocopy notifications
> >> are outstanding.
> >
> > Wait, I thought put_cmsg already checks for these things.
> 
> It does, and sets MSG_CTRUNC to signal that it was unable to
> write all control data. But by then the notifications have already
> been dequeued.

Putting hyperbole about "no longer safe to ever call read etc" aside,

put_cmsg can also return EFAULT if uspace provides a bogus cmsghdr,
(i.e., copy_to_user fails). So the only thing you can do to really
protect against every possible thing is to requeue the notification 
if put_cmsg fails.


Re: [PATCH net-next] RDS: deliver zerocopy completion notification with data as an optimization

2018-02-21 Thread Willem de Bruijn
On Wed, Feb 21, 2018 at 7:26 PM, Sowmini Varadhan
 wrote:
> On (02/21/18 18:45), Willem de Bruijn wrote:
>>
>> I do mean returning 0 instead of -EAGAIN if control data is ready.
>> Something like
>>
>> @@ -611,7 +611,8 @@ int rds_recvmsg(struct socket *sock, struct msghdr
>> *msg, size_t size,
>>
>> if (!rds_next_incoming(rs, )) {
>> if (nonblock) {
>> -   ret = -EAGAIN;
>> +   ncookies = rds_recvmsg_zcookie(rs, msg);
>> +   ret = ncookies ? 0 : -EAGAIN;
>> break;
>> }
>
> Yes, but you now have an implicit branch based on ncookies, so I'm
> not sure it saved all that much?

At least it removes the extra list empty check in the hot path and
relegates this to the relatively rare branch when the queue is empty
and the syscall is non-blocking.

> like I said let me revisit this

Okay. I won't harp on this further.

>> By the way, the put_cmsg is unconditional even if the caller did
>> not supply msg_control. So it is basically no longer safe to ever
>> call read, recv or recvfrom on a socket if zerocopy notifications
>> are outstanding.
>
> Wait, I thought put_cmsg already checks for these things.

It does, and sets MSG_CTRUNC to signal that it was unable to
write all control data. But by then the notifications have already
been dequeued.

>> It is possible to check msg_controllen before even deciding whether
>> to try to dequeue notifications (and take the lock). I see that this is
>> not common. But RDS of all cases seems to do this, in
>> rds_notify_queue_get:
>
> yes the comment above that code suggests that this bit of code
> was done to avoid calling put_cmsg while holding the rs_lock.
>
> One bit of administrivia though- if I now drop sk_error_queue for
> PF_RDS, I'll have to fix selftests in the same patch too, so the
> patch will get a bit bulky (and thus a bit more difficult to review).

Understood. It might be cleanest to split into three patches. One
revert of the error queue code, one new feature and one update
to the test.


Re: [PATCH net-next] RDS: deliver zerocopy completion notification with data as an optimization

2018-02-21 Thread Sowmini Varadhan
On (02/21/18 18:45), Willem de Bruijn wrote:
> 
> I do mean returning 0 instead of -EAGAIN if control data is ready.
> Something like
> 
> @@ -611,7 +611,8 @@ int rds_recvmsg(struct socket *sock, struct msghdr
> *msg, size_t size,
> 
> if (!rds_next_incoming(rs, )) {
> if (nonblock) {
> -   ret = -EAGAIN;
> +   ncookies = rds_recvmsg_zcookie(rs, msg);
> +   ret = ncookies ? 0 : -EAGAIN;
> break;
> }

Yes, but you now have an implicit branch based on ncookies, so I'm
not sure it saved all that much? like I said let me revisit this

> By the way, the put_cmsg is unconditional even if the caller did
> not supply msg_control. So it is basically no longer safe to ever
> call read, recv or recvfrom on a socket if zerocopy notifications
> are outstanding.

Wait, I thought put_cmsg already checks for these things. 

> It is possible to check msg_controllen before even deciding whether
> to try to dequeue notifications (and take the lock). I see that this is
> not common. But RDS of all cases seems to do this, in
> rds_notify_queue_get:

yes the comment above that code suggests that this bit of code
was done to avoid calling put_cmsg while holding the rs_lock.

One bit of administrivia though- if I now drop sk_error_queue for
PF_RDS, I'll have to fix selftests in the same patch too, so the
patch will get a bit bulky (and thus a bit more difficult to review).



Re: [PATCH net-next] RDS: deliver zerocopy completion notification with data as an optimization

2018-02-21 Thread Willem de Bruijn
>> Okay. If callers must already handle 0 as a valid return code, then
>> it is fine to add another case that does this.
>>
>> The extra branch in the hot path is still rather unfortunately. Could
>> this be integrated in the existing if (nonblock) branch below?
>
> that's where I first started. it got even hairier because the
> callers expect a retval of 0 (-EAGAIN threw rds-stress into an
> infinite loop of continulally trying to recv) and the end result
> was just confusing code with the same number of branches..

I do mean returning 0 instead of -EAGAIN if control data is ready.
Something like

@@ -611,7 +611,8 @@ int rds_recvmsg(struct socket *sock, struct msghdr
*msg, size_t size,

if (!rds_next_incoming(rs, )) {
if (nonblock) {
-   ret = -EAGAIN;
+   ncookies = rds_recvmsg_zcookie(rs, msg);
+   ret = ncookies ? 0 : -EAGAIN;
break;
}

By the way, the put_cmsg is unconditional even if the caller did
not supply msg_control. So it is basically no longer safe to ever
call read, recv or recvfrom on a socket if zerocopy notifications
are outstanding.

It is possible to check msg_controllen before even deciding whether
to try to dequeue notifications (and take the lock). I see that this is
not common. But RDS of all cases seems to do this, in
rds_notify_queue_get:

max_messages = msghdr->msg_controllen /
CMSG_SPACE(sizeof(cmsg));


Re: [PATCH net-next] RDS: deliver zerocopy completion notification with data as an optimization

2018-02-21 Thread Sowmini Varadhan
On (02/21/18 17:50), Willem de Bruijn wrote:
> 
> In the common case no more than one notification will be outstanding,
> but with a fixed number of notifications per packet, in edge cases this
> list may be long.
   :
> Socket functions block if sk_err is non-zero. See for instance
> tcp_sendmsg_locked. It is set by most code that also calls
> sock_queue_err_skb and also on dequeue from err skb.
> 
> This is the main reason that I would consider dropping error
> queue completely if you expect all users of RDS to use the
> cmsg on regular read to get these notifications.

I see. That's a good point, and maybe it makes sense to just have
a struct sk_buff_head rs_zcookie_quese on the rds_sock, and
have rds_rm_zerocopy_callback chain cookies ot this  rs_zcookie_queue.

[discussion regarding rds_recvmsg return values elided]

> Okay. If callers must already handle 0 as a valid return code, then
> it is fine to add another case that does this.
> 
> The extra branch in the hot path is still rather unfortunately. Could
> this be integrated in the existing if (nonblock) branch below?

that's where I first started. it got even hairier because the
callers expect a retval of 0 (-EAGAIN threw rds-stress into an
infinite loop of continulally trying to recv) and the end result
was just confusing code with the same number of branches.. 
let me revisit this when I spin out V2 without the sk_error_queue..

--Sowmini


Re: [PATCH net-next] RDS: deliver zerocopy completion notification with data as an optimization

2018-02-21 Thread Willem de Bruijn
On Wed, Feb 21, 2018 at 5:14 PM, Sowmini Varadhan
 wrote:
> On (02/21/18 16:54), Willem de Bruijn wrote:
>>
>> I'd put this optimization behind a socket option.
>
> Yes, that thought occurred to me as well- I think RDS applications
> are unlikely to use the error_queue path because, as I mentioned
> before, these are heavily request-response based, so you're
> going to be getting something back for each sendmsg anyway.
>
> So if I had a sockopt, it would be something that would be
> "piggyback completion" most (all?) of the time anyway, but I suppose
> making it explicit is useful to have.
>
>> Either that, or always store cookies on an RDS specific queue, return as recv
>> cmsg and then remove the alternative error queue path completely.
>
> I think the error queue path is good to have, if only to be aligned
> with other socket families.
>
>> > +   spin_lock_irqsave(>lock, flags);
>>
>> This seems expensive on every recvmsg. Even if zerocopy is not enabled.
>
> noted, I'll predicate it on both zcopy and the sockopt suggestion.
>
>> > +   if (skb_queue_empty(q)) {
>> > +   spin_unlock_irqrestore(>lock, flags);
>> > +   return 0;
>> > +   }
>> > +   skb_queue_walk_safe(q, skb, tmp) {
>>
>> This too. If anything, just peek at the queue head and skip otherwise.
>> Anything else will cause an error, which blocks regular read and write
>> on the socket.
>
> but that would be technically incorrect, because you could have a
> mix/match of the zcopy notification with other sk_error_queue messages
> (given that rds_rm_zerocopy_callback looks at the tail, and
> creates a new skb if the tail is not a SO_EE_ORIGIN_ZCOOKIE).
>
> Of course, its true that *today* the only thing in the rds socket
> error queue is the cookie list, so this queue walk is a bit of overkill..
>
> but maybe I am missing something you are concerned about? The queue walk is
> intended to pull out the first SO_EE_ORIGIN_ZCOOKIE (if it exists)
> and return quietly otherwise

Yes, avoiding out of order completions is sensible.

In the common case no more than one notification will be outstanding,
but with a fixed number of notifications per packet, in edge cases this
list may be long.

Reading using a reverse walk avoids that problem.

> (leaving err_queue unchanged)- where
> do you see the blocking error?

Socket functions block if sk_err is non-zero. See for instance
tcp_sendmsg_locked. It is set by most code that also calls
sock_queue_err_skb and also on dequeue from err skb.

This is the main reason that I would consider dropping error
queue completely if you expect all users of RDS to use the
cmsg on regular read to get these notifications.

>> > +   if (list_empty(>rs_recv_queue) && nonblock) {
>> > +   ncookies = rds_recvmsg_zcookie(rs, msg);
>> > +   if (ncookies) {
>> > +   ret = 0;
>>
>> This probably changes semantics for MSG_DONTWAIT. Will it ever return 0 now?
>
> The only time MSG_DONTWAIT was previously returning 0 for PF_RDS
> was when there was a congestion notification or rdma completion
> i.e., the checks for rs_notify_queue and rs_cong_notify earlier
> in the function, and in those 2 cases it was not returning data
> (i.e. ret was always 0). In all other cases  rds_recvmsg would
> always return either
> - ret == -1, with errno set to EAGAIN or ETIMEDOUT, depending on the value
>   of MSG_DONTWAIT, or,
> - ret > 0 with data bytes.
>
> That behavior (as well existing behavior for congestion notification
> and rdma completion) has not changed. The only thing that changed is that
> if there is no data to pass up, the ret will be 0, errno will be 0,
> and the ancillary data may have completion cookies.

Okay. If callers must already handle 0 as a valid return code, then
it is fine to add another case that does this.

The extra branch in the hot path is still rather unfortunately. Could
this be integrated in the existing if (nonblock) branch below?


Re: [PATCH net-next] RDS: deliver zerocopy completion notification with data as an optimization

2018-02-21 Thread Sowmini Varadhan
On (02/21/18 16:54), Willem de Bruijn wrote:
> 
> I'd put this optimization behind a socket option.

Yes, that thought occurred to me as well- I think RDS applications
are unlikely to use the error_queue path because, as I mentioned
before, these are heavily request-response based, so you're 
going to be getting something back for each sendmsg anyway.

So if I had a sockopt, it would be something that would be
"piggyback completion" most (all?) of the time anyway, but I suppose
making it explicit is useful to have.

> Either that, or always store cookies on an RDS specific queue, return as recv
> cmsg and then remove the alternative error queue path completely.

I think the error queue path is good to have, if only to be aligned
with other socket families.

> > +   spin_lock_irqsave(>lock, flags);
> 
> This seems expensive on every recvmsg. Even if zerocopy is not enabled.

noted, I'll predicate it on both zcopy and the sockopt suggestion.

> > +   if (skb_queue_empty(q)) {
> > +   spin_unlock_irqrestore(>lock, flags);
> > +   return 0;
> > +   }
> > +   skb_queue_walk_safe(q, skb, tmp) {
> 
> This too. If anything, just peek at the queue head and skip otherwise.
> Anything else will cause an error, which blocks regular read and write
> on the socket.

but that would be technically incorrect, because you could have a 
mix/match of the zcopy notification with other sk_error_queue messages
(given that rds_rm_zerocopy_callback looks at the tail, and
creates a new skb if the tail is not a SO_EE_ORIGIN_ZCOOKIE).

Of course, its true that *today* the only thing in the rds socket
error queue is the cookie list, so this queue walk is a bit of overkill..

but maybe I am missing something you are concerned about? The queue walk is
intended to pull out the first SO_EE_ORIGIN_ZCOOKIE (if it exists)
and return quietly otherwise (leaving err_queue unchanged)- where
do you see the blocking error?


> > +   if (list_empty(>rs_recv_queue) && nonblock) {
> > +   ncookies = rds_recvmsg_zcookie(rs, msg);
> > +   if (ncookies) {
> > +   ret = 0;
> 
> This probably changes semantics for MSG_DONTWAIT. Will it ever return 0 now?

The only time MSG_DONTWAIT was previously returning 0 for PF_RDS
was when there was a congestion notification or rdma completion
i.e., the checks for rs_notify_queue and rs_cong_notify earlier
in the function, and in those 2 cases it was not returning data
(i.e. ret was always 0). In all other cases  rds_recvmsg would
always return either
- ret == -1, with errno set to EAGAIN or ETIMEDOUT, depending on the value
  of MSG_DONTWAIT, or,
- ret > 0 with data bytes.

That behavior (as well existing behavior for congestion notification
and rdma completion) has not changed. The only thing that changed is that
if there is no data to pass up, the ret will be 0, errno will be 0,
and the ancillary data may have completion cookies.

--Sowmini
  


Re: [PATCH net-next] RDS: deliver zerocopy completion notification with data as an optimization

2018-02-21 Thread Willem de Bruijn
On Wed, Feb 21, 2018 at 3:19 PM, Sowmini Varadhan
 wrote:
> This commit is an optimization that builds on top of commit 01883eda72bd
> ("rds: support for zcopy completion notification") for PF_RDS sockets.
>
> Cookies associated with zerocopy completion are passed up on the POLLIN
> channel, piggybacked with data whereever possible. Such cookies are passed
> up as ancillary data (at level SOL_RDS) in a struct rds_zcopy_cookies when
> the returned value of recvmsg() is >= 0. A max of SO_EE_ORIGIN_MAX_ZCOOKIES
> may be passed with each message.

I'd put this optimization behind a socket option.

Either that, or always store cookies on an RDS specific queue, return as recv
cmsg and then remove the alternative error queue path completely.

Having both is confusing, also in terms of system behavior (e.g., see signaling
and sk_err handling in sock_dequeue_err_skb).

> +static int rds_recvmsg_zcookie(struct rds_sock *rs, struct msghdr *msg)
> +{
> +   struct sk_buff *skb, *tmp;
> +   struct sock_exterr_skb *serr;
> +   struct sock *sk = rds_rs_to_sk(rs);
> +   struct sk_buff_head *q = >sk_error_queue;
> +   struct rds_zcopy_cookies done;
> +   u32 *ptr;
> +   int i;
> +   unsigned long flags;
> +
> +   spin_lock_irqsave(>lock, flags);

This seems expensive on every recvmsg. Even if zerocopy is not enabled.

> +   if (skb_queue_empty(q)) {
> +   spin_unlock_irqrestore(>lock, flags);
> +   return 0;
> +   }
> +   skb_queue_walk_safe(q, skb, tmp) {

This too. If anything, just peek at the queue head and skip otherwise.
Anything else will cause an error, which blocks regular read and write
on the socket.

>  int rds_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
> int msg_flags)
>  {
> @@ -586,6 +623,7 @@ int rds_recvmsg(struct socket *sock, struct msghdr *msg, 
> size_t size,
> int ret = 0, nonblock = msg_flags & MSG_DONTWAIT;
> DECLARE_SOCKADDR(struct sockaddr_in *, sin, msg->msg_name);
> struct rds_incoming *inc = NULL;
> +   int ncookies;
>
> /* udp_recvmsg()->sock_recvtimeo() gets away without locking too.. */
> timeo = sock_rcvtimeo(sk, nonblock);
> @@ -609,6 +647,14 @@ int rds_recvmsg(struct socket *sock, struct msghdr *msg, 
> size_t size,
> break;
> }
>
> +   if (list_empty(>rs_recv_queue) && nonblock) {
> +   ncookies = rds_recvmsg_zcookie(rs, msg);
> +   if (ncookies) {
> +   ret = 0;

This probably changes semantics for MSG_DONTWAIT. Will it ever return 0 now?


Re: [PATCH net-next] RDS: deliver zerocopy completion notification with data as an optimization

2018-02-21 Thread Santosh Shilimkar

On 2/21/2018 12:19 PM, Sowmini Varadhan wrote:

This commit is an optimization that builds on top of commit 01883eda72bd
("rds: support for zcopy completion notification") for PF_RDS sockets.

Cookies associated with zerocopy completion are passed up on the POLLIN
channel, piggybacked with data whereever possible. Such cookies are passed

s/whereever/wherever


up as ancillary data (at level SOL_RDS) in a struct rds_zcopy_cookies when
the returned value of recvmsg() is >= 0. A max of SO_EE_ORIGIN_MAX_ZCOOKIES
may be passed with each message.

Signed-off-by: Sowmini Varadhan 
---

Acked-by: Santosh Shilimkar