Re: [PATCH V2 net-next 5/7] rds: zerocopy Tx support.
On Wed, Feb 14, 2018 at 7:09 PM, Sowmini Varadhan wrote: > On (02/14/18 18:48), Willem de Bruijn wrote: >> >> If the missing break is intentional, no need to respin just for the other >> minor comments. > > yes the missing break is intentional- the function returns the > size of the scatterlist needed for RDMA, and RDS_CMSG_ZCOPY_COOKIE > (like RDMA_DEST and RDMA_MAP) is meta-data that does not change > that size. > > I expect to be in the neighborhood of this code pretty soon, to get > the additional opimization of passing up the zcopy completion > as part of recvmsg (see the discussion in > https://www.mail-archive.com/netdev@vger.kernel.org/msg212788.html) > > I can take care of the other code-cleanup comment suggestions in > here at that time.. Sounds good.
Re: [PATCH V2 net-next 5/7] rds: zerocopy Tx support.
On (02/14/18 18:48), Willem de Bruijn wrote: > > If the missing break is intentional, no need to respin just for the other > minor comments. yes the missing break is intentional- the function returns the size of the scatterlist needed for RDMA, and RDS_CMSG_ZCOPY_COOKIE (like RDMA_DEST and RDMA_MAP) is meta-data that does not change that size. I expect to be in the neighborhood of this code pretty soon, to get the additional opimization of passing up the zcopy completion as part of recvmsg (see the discussion in https://www.mail-archive.com/netdev@vger.kernel.org/msg212788.html) I can take care of the other code-cleanup comment suggestions in here at that time.. --Sowmini
Re: [PATCH V2 net-next 5/7] rds: zerocopy Tx support.
On Wed, Feb 14, 2018 at 5:28 AM, Sowmini Varadhan wrote: > If the MSG_ZEROCOPY flag is specified with rds_sendmsg(), and, > if the SO_ZEROCOPY socket option has been set on the PF_RDS socket, > application pages sent down with rds_sendmsg() are pinned. > > The pinning uses the accounting infrastructure added by > Commit a91dbff551a6 ("sock: ulimit on MSG_ZEROCOPY pages") > > The payload bytes in the message may not be modified for the > duration that the message has been pinned. A multi-threaded > application using this infrastructure may thus need to be notified > about send-completion so that it can free/reuse the buffers > passed to rds_sendmsg(). Notification of send-completion will > identify each message-buffer by a cookie that the application > must specify as ancillary data to rds_sendmsg(). > The ancillary data in this case has cmsg_level == SOL_RDS > and cmsg_type == RDS_CMSG_ZCOPY_COOKIE. > > Signed-off-by: Sowmini Varadhan > --- If the missing break is intentional, no need to respin just for the other minor comments. > @@ -341,12 +341,14 @@ struct rds_message *rds_message_map_pages(unsigned long > *page_addrs, unsigned in > return rm; > } > > -int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from) > +int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from, > + bool zcopy) > { > unsigned long to_copy, nbytes; > unsigned long sg_off; > struct scatterlist *sg; > int ret = 0; > + int length = iov_iter_count(from); > > rm->m_inc.i_hdr.h_len = cpu_to_be32(iov_iter_count(from)); > > @@ -356,6 +358,53 @@ int rds_message_copy_from_user(struct rds_message *rm, > struct iov_iter *from) > sg = rm->data.op_sg; > sg_off = 0; /* Dear gcc, sg->page will be null from kzalloc. */ > > + if (zcopy) { > + int total_copied = 0; > + struct sk_buff *skb; > + > + skb = alloc_skb(SO_EE_ORIGIN_MAX_ZCOOKIES * sizeof(u32), > + GFP_KERNEL); > + if (!skb) > + return -ENOMEM; > + rm->data.op_mmp_znotifier = RDS_ZCOPY_SKB(skb); > + memset(rm->data.op_mmp_znotifier, 0, > + sizeof(*rm->data.op_mmp_znotifier)); Not strictly needed, as alloc_skb clears skb->cb[] > + if (mm_account_pinned_pages(&rm->data.op_mmp_znotifier->z_mmp, > + length)) { > + consume_skb(skb); > + rm->data.op_mmp_znotifier = NULL; > + return -ENOMEM; > + } One less action to revert if moving the mm_account_pinned_pages check before assigning op_mmp_znotifier. Conversely, move to an err: label at the end to be able to deduplicate with the error branch introduced below. > @@ -875,12 +875,13 @@ static int rds_send_queue_rm(struct rds_sock *rs, > struct rds_connection *conn, > * rds_message is getting to be quite complicated, and we'd like to allocate > * it all in one go. This figures out how big it needs to be up front. > */ > -static int rds_rm_size(struct msghdr *msg, int data_len) > +static int rds_rm_size(struct msghdr *msg, int num_sgs) > { > struct cmsghdr *cmsg; > int size = 0; > int cmsg_groups = 0; > int retval; > + bool zcopy_cookie = false; > > for_each_cmsghdr(cmsg, msg) { > if (!CMSG_OK(msg, cmsg)) > @@ -899,6 +900,8 @@ static int rds_rm_size(struct msghdr *msg, int data_len) > > break; > > + case RDS_CMSG_ZCOPY_COOKIE: > + zcopy_cookie = true; break, or if intended to fall through, please label as such. > case RDS_CMSG_RDMA_DEST: > case RDS_CMSG_RDMA_MAP: > cmsg_groups |= 2;
Re: [PATCH V2 net-next 5/7] rds: zerocopy Tx support.
On 2/14/2018 11:49 AM, Sowmini Varadhan wrote: On (02/14/18 11:10), Santosh Shilimkar wrote: s/RDS_CMSG_ZCOPY_COOKIE/RDS_CMSG_ZMSGCOPY_COOKIE Please see https://www.spinics.net/lists/netdev/msg483627.html Just saw it and responded to Dave. @@ -356,6 +358,53 @@ int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from) sg = rm->data.op_sg; sg_off = 0; /* Dear gcc, sg->page will be null from kzalloc. */ + if (zcopy) { + int total_copied = 0; + struct sk_buff *skb; + + skb = alloc_skb(SO_EE_ORIGIN_MAX_ZCOOKIES * sizeof(u32), + GFP_KERNEL); This can sleep so you might want to check if you want to use ATOMIC version here. I think it should be fine: rds_message_copy_from_user() is called in process context, and if you notice, the calling function rds_sendmsg() also has this 1100 rm = rds_message_alloc(ret, GFP_KERNEL); 1101 if (!rm) { 1102 ret = -ENOMEM; 1103 goto out; 1104 } : 1106 /* Attach data to the rm */ : 1113 ret = rds_message_copy_from_user(rm, &msg->msg_iter); So using GFP_KERNEL is as safe as the call at line 1100. Was just asking you to check if it is safe. The path already does that so we are good. + return -ENOMEM; + } NOMEM new application visible change but probably the right one for this particular case. Just need to make sure application can handle this error. I think the application already handles this correctly (see line 1102 above) Indeed. Thanks for checking. Regards, Santosh
Re: [PATCH V2 net-next 5/7] rds: zerocopy Tx support.
On (02/14/18 11:10), Santosh Shilimkar wrote: > s/RDS_CMSG_ZCOPY_COOKIE/RDS_CMSG_ZMSGCOPY_COOKIE > Please see https://www.spinics.net/lists/netdev/msg483627.html > >@@ -356,6 +358,53 @@ int rds_message_copy_from_user(struct rds_message *rm, > >struct iov_iter *from) > > sg = rm->data.op_sg; > > sg_off = 0; /* Dear gcc, sg->page will be null from kzalloc. */ > >+if (zcopy) { > >+int total_copied = 0; > >+struct sk_buff *skb; > >+ > >+skb = alloc_skb(SO_EE_ORIGIN_MAX_ZCOOKIES * sizeof(u32), > >+GFP_KERNEL); > This can sleep so you might want to check if you want to use ATOMIC version > here. I think it should be fine: rds_message_copy_from_user() is called in process context, and if you notice, the calling function rds_sendmsg() also has this 1100 rm = rds_message_alloc(ret, GFP_KERNEL); 1101 if (!rm) { 1102 ret = -ENOMEM; 1103 goto out; 1104 } : 1106 /* Attach data to the rm */ : 1113 ret = rds_message_copy_from_user(rm, &msg->msg_iter); So using GFP_KERNEL is as safe as the call at line 1100. > >+return -ENOMEM; > >+} > NOMEM new application visible change but probably the right one for this > particular case. Just need to make sure application can handle this > error. I think the application already handles this correctly (see line 1102 above) Thanks for taking a look. --Sowmini
Re: [PATCH V2 net-next 5/7] rds: zerocopy Tx support.
On 2/14/2018 2:28 AM, Sowmini Varadhan wrote: If the MSG_ZEROCOPY flag is specified with rds_sendmsg(), and, if the SO_ZEROCOPY socket option has been set on the PF_RDS socket, application pages sent down with rds_sendmsg() are pinned. The pinning uses the accounting infrastructure added by Commit a91dbff551a6 ("sock: ulimit on MSG_ZEROCOPY pages") The payload bytes in the message may not be modified for the duration that the message has been pinned. A multi-threaded application using this infrastructure may thus need to be notified about send-completion so that it can free/reuse the buffers passed to rds_sendmsg(). Notification of send-completion will identify each message-buffer by a cookie that the application must specify as ancillary data to rds_sendmsg(). The ancillary data in this case has cmsg_level == SOL_RDS and cmsg_type == RDS_CMSG_ZCOPY_COOKIE. Signed-off-by: Sowmini Varadhan --- v2: - remove unused data_len argument to rds_rm_size; - unmap as necessary if we fail in the middle of zerocopy setup include/uapi/linux/rds.h |1 + net/rds/message.c| 51 +- net/rds/rds.h|3 +- net/rds/send.c | 44 ++- 4 files changed, 91 insertions(+), 8 deletions(-) diff --git a/include/uapi/linux/rds.h b/include/uapi/linux/rds.h index e71d449..12e3bca 100644 --- a/include/uapi/linux/rds.h +++ b/include/uapi/linux/rds.h @@ -103,6 +103,7 @@ #define RDS_CMSG_MASKED_ATOMIC_FADD 8 #define RDS_CMSG_MASKED_ATOMIC_CSWP 9 #define RDS_CMSG_RXPATH_LATENCY 11 +#defineRDS_CMSG_ZCOPY_COOKIE 12 s/RDS_CMSG_ZCOPY_COOKIE/RDS_CMSG_ZMSGCOPY_COOKIE #define RDS_INFO_FIRST1 #define RDS_INFO_COUNTERS 1 diff --git a/net/rds/message.c b/net/rds/message.c index d874b74..e499566 100644 --- a/net/rds/message.c +++ b/net/rds/message.c @@ -341,12 +341,14 @@ struct rds_message *rds_message_map_pages(unsigned long *page_addrs, unsigned in return rm; } -int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from) +int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from, + bool zcopy) { unsigned long to_copy, nbytes; unsigned long sg_off; struct scatterlist *sg; int ret = 0; + int length = iov_iter_count(from); rm->m_inc.i_hdr.h_len = cpu_to_be32(iov_iter_count(from)); @@ -356,6 +358,53 @@ int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from) sg = rm->data.op_sg; sg_off = 0; /* Dear gcc, sg->page will be null from kzalloc. */ + if (zcopy) { + int total_copied = 0; + struct sk_buff *skb; + + skb = alloc_skb(SO_EE_ORIGIN_MAX_ZCOOKIES * sizeof(u32), + GFP_KERNEL); This can sleep so you might want to check if you want to use ATOMIC version here. + if (!skb) + return -ENOMEM; + rm->data.op_mmp_znotifier = RDS_ZCOPY_SKB(skb); + memset(rm->data.op_mmp_znotifier, 0, + sizeof(*rm->data.op_mmp_znotifier)); + if (mm_account_pinned_pages(&rm->data.op_mmp_znotifier->z_mmp, + length)) { + consume_skb(skb); + rm->data.op_mmp_znotifier = NULL; + return -ENOMEM; + } NOMEM new application visible change but probably the right one for this particular case. Just need to make sure application can handle this error. diff --git a/net/rds/rds.h b/net/rds/rds.h index 6e8fc4c..dfdc9b3 100644 --- a/net/rds/rds.h +++ b/net/rds/rds.h @@ -784,7 +784,8 @@ void rds_for_each_conn_info(struct socket *sock, unsigned int len, /* message.c */ struct rds_message *rds_message_alloc(unsigned int nents, gfp_t gfp); struct scatterlist *rds_message_alloc_sgs(struct rds_message *rm, int nents); -int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from); +int rds_message_copy_from_user(struct rds_message *rm, struct iov_iter *from, + bool zcopy); struct rds_message *rds_message_map_pages(unsigned long *page_addrs, unsigned int total_len); void rds_message_populate_header(struct rds_header *hdr, __be16 sport, __be16 dport, u64 seq); diff --git a/net/rds/send.c b/net/rds/send.c index 5ac0925..80171cf 100644 --- a/net/rds/send.c +++ b/net/rds/send.c [...] @@ -1087,8 +1112,15 @@ int rds_sendmsg(struct socket *sock, struct msghdr *msg, size_t payload_len) goto out; } + if (zcopy) { + if (rs->rs_transport->t_type != RDS_TRANS_TCP) { + ret = -EOPNOTSUPP; + goto out; + } + num_sgs = i