Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.
On 08/01/18 01:49 PM, Vakul Garg wrote: > > I don't think this patch is safe as-is. sgin_arr is a stack array of size > > MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is that it > > walks the whole chain of skbs from skb->next, and can return any number of > > segments. Therefore we need to heap allocate. I think I copied the IPSEC > > code here. > > > > For perf though, we could use the stack array if skb_cow_data returns <= > > MAX_SKB_FRAGS. > > skb_cow_data() is being called only when caller passes sgout=NULL (i.e. > non-zero copy case). In case of zero-copy, we do not call skb_cow_data() > and just assume that MAX_SKB_FRAGS+2 sized scatterlist array sgin_arr[] > is sufficient. This assumption could be wrong. So skb_cow_data() should be > called unconditionally to determine number of scatterlist entries required > for skb. I agree it is best to unify them. I was originally worried about perf with the extra allocation (which you proposed fixing by merging with the crypto allocation, which would be great), and the perf of skb_cow_data(). Zerocopy doesn't require skb_cow_data(), but we do have to walk the skbs to calculate nsg correctly. However skb_cow_data perf might be fine after your fix "strparser: Call skb_unclone conditionally".
RE: [net-next v5 3/3] net/tls: Remove redundant array allocation.
> -Original Message- > From: Dave Watson [mailto:davejwat...@fb.com] > Sent: Monday, July 23, 2018 10:05 PM > To: David Miller > Cc: Vakul Garg ; netdev@vger.kernel.org; > bor...@mellanox.com; avia...@mellanox.com; Doron Roberts-Kedes > > Subject: Re: [net-next v5 3/3] net/tls: Remove redundant array allocation. > > On 07/21/18 07:25 PM, David Miller wrote: > > From: Vakul Garg > > Date: Thu, 19 Jul 2018 21:56:13 +0530 > > > > > In function decrypt_skb(), array allocation in case when sgout is > > > NULL is unnecessary. Instead, local variable sgin_arr[] can be used. > > > > > > Signed-off-by: Vakul Garg > > > > Hmmm... > > > > Dave, can you take a look at this? Do you think there might have been > > a reason you felt that you needed to dynamically allocate the > > scatterlists when you COW and skb and do in-place decryption? > > > > I guess this change is ok, nsg can only get smaller when the SKB is > > COW'd. > > > > > > memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE); > > > if (!sgout) { > > > nsg = skb_cow_data(skb, 0, &unused) + 1; > > > - sgin = kmalloc_array(nsg, sizeof(*sgin), > > > sk->sk_allocation); > > > sgout = sgin; > > > } > > I don't think this patch is safe as-is. sgin_arr is a stack array of size > MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is that it > walks the whole chain of skbs from skb->next, and can return any number of > segments. Therefore we need to heap allocate. I think I copied the IPSEC > code here. > > For perf though, we could use the stack array if skb_cow_data returns <= > MAX_SKB_FRAGS. skb_cow_data() is being called only when caller passes sgout=NULL (i.e. non-zero copy case). In case of zero-copy, we do not call skb_cow_data() and just assume that MAX_SKB_FRAGS+2 sized scatterlist array sgin_arr[] is sufficient. This assumption could be wrong. So skb_cow_data() should be called unconditionally to determine number of scatterlist entries required for skb. > > This code is slightly confusing though, since we don't heap allocate in the > zerocopy case - what happens is that skb_to_sgvec returns -EMSGSIZE, and > we fall back to the non-zerocopy case, and return again to this function, > where we then hit the skb_cow_data path and heap allocate. If skb_to_sgvec return -EMSGSIZE, decrypt_skb() would return failure, resulting in abort of TLS session.
Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.
On 07/27/18 09:34 AM, Vakul Garg wrote: > > > > -Original Message- > > From: Dave Watson [mailto:davejwat...@fb.com] > > Sent: Thursday, July 26, 2018 2:31 AM > > To: Vakul Garg > > Cc: David Miller ; netdev@vger.kernel.org; > > bor...@mellanox.com; avia...@mellanox.com; Doron Roberts-Kedes > > > > Subject: Re: [net-next v5 3/3] net/tls: Remove redundant array allocation. > > > > On 07/24/18 08:22 AM, Vakul Garg wrote: > > > Will it be a bad idea to get rid of array 'sgin' on stack and simply > > > kmalloc 'sgin' for whatever the number the number of pages returned by > > iov_iter_npages()? > > > We can allocate for sgout too with the same kmalloc(). > > > > > > (Using a local array based 'sgin' is coming in the way to achieve > > > sending multiple async record decryption requests to the accelerator > > > without waiting for previous one to complete.) > > > > Yes we could do this, and yes we would need to heap allocate if you want to > > support multiple outstanding decryption requests. I think async crypto > > prevents any sort of zerocopy-fastpath, however. > > We already do a aead_request_alloc (which internally does kmalloc). > To mitigate the cost of kmalloc/kfree for sg lists and aad, I am allocating a > combined memory chunk for all of these and then segmenting it into > aead_req, aad, sgin, sgout. This way there should be no extra cost for > memory allocations in non-async. Makes sense, sounds good to me.
RE: [net-next v5 3/3] net/tls: Remove redundant array allocation.
> -Original Message- > From: Dave Watson [mailto:davejwat...@fb.com] > Sent: Thursday, July 26, 2018 2:31 AM > To: Vakul Garg > Cc: David Miller ; netdev@vger.kernel.org; > bor...@mellanox.com; avia...@mellanox.com; Doron Roberts-Kedes > > Subject: Re: [net-next v5 3/3] net/tls: Remove redundant array allocation. > > On 07/24/18 08:22 AM, Vakul Garg wrote: > > > I don't think this patch is safe as-is. sgin_arr is a stack array > > > of size MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is > > > that it walks the whole chain of skbs from skb->next, and can return > > > any number of segments. Therefore we need to heap allocate. I > > > think I copied the IPSEC code here. > > > > > > For perf though, we could use the stack array if skb_cow_data > > > returns <= MAX_SKB_FRAGS. > > > > > > This code is slightly confusing though, since we don't heap allocate > > > in the zerocopy case - what happens is that skb_to_sgvec returns > > > -EMSGSIZE, and we fall back to the non-zerocopy case, and return > > > again to this function, where we then hit the skb_cow_data path and > heap allocate. > > > > Thanks for explaining. > > I am missing the point why MAX_SKB_FRAGS sized local array sgin has > > been used in tls_sw_recvmsg(). What is special about MAX_SKB_FRAGS so > > that we used it as a size factor for 'sgin'? > > There is nothing special about it, in the zerocopy-fastpath if we happen to > fit > in MAX_SKB_FRAGS we avoid any kmalloc whatsoever though. > It could be renamed MAX_SC_FOR_FASTPATH or something. > > > Will it be a bad idea to get rid of array 'sgin' on stack and simply > > kmalloc 'sgin' for whatever the number the number of pages returned by > iov_iter_npages()? > > We can allocate for sgout too with the same kmalloc(). > > > > (Using a local array based 'sgin' is coming in the way to achieve > > sending multiple async record decryption requests to the accelerator > > without waiting for previous one to complete.) > > Yes we could do this, and yes we would need to heap allocate if you want to > support multiple outstanding decryption requests. I think async crypto > prevents any sort of zerocopy-fastpath, however. We already do a aead_request_alloc (which internally does kmalloc). To mitigate the cost of kmalloc/kfree for sg lists and aad, I am allocating a combined memory chunk for all of these and then segmenting it into aead_req, aad, sgin, sgout. This way there should be no extra cost for memory allocations in non-async.
Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.
On 07/24/18 08:22 AM, Vakul Garg wrote: > > I don't think this patch is safe as-is. sgin_arr is a stack array of size > > MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is that it > > walks the whole chain of skbs from skb->next, and can return any number of > > segments. Therefore we need to heap allocate. I think I copied the IPSEC > > code here. > > > > For perf though, we could use the stack array if skb_cow_data returns <= > > MAX_SKB_FRAGS. > > > > This code is slightly confusing though, since we don't heap allocate in the > > zerocopy case - what happens is that skb_to_sgvec returns -EMSGSIZE, and > > we fall back to the non-zerocopy case, and return again to this function, > > where we then hit the skb_cow_data path and heap allocate. > > Thanks for explaining. > I am missing the point why MAX_SKB_FRAGS sized local array > sgin has been used in tls_sw_recvmsg(). What is special about MAX_SKB_FRAGS so > that we used it as a size factor for 'sgin'? There is nothing special about it, in the zerocopy-fastpath if we happen to fit in MAX_SKB_FRAGS we avoid any kmalloc whatsoever though. It could be renamed MAX_SC_FOR_FASTPATH or something. > Will it be a bad idea to get rid of array 'sgin' on stack and simply kmalloc > 'sgin' for > whatever the number the number of pages returned by iov_iter_npages()? > We can allocate for sgout too with the same kmalloc(). > > (Using a local array based 'sgin' is coming in the way to achieve sending > multiple async > record decryption requests to the accelerator without waiting for previous > one to complete.) Yes we could do this, and yes we would need to heap allocate if you want to support multiple outstanding decryption requests. I think async crypto prevents any sort of zerocopy-fastpath, however.
RE: [net-next v5 3/3] net/tls: Remove redundant array allocation.
> -Original Message- > From: Dave Watson [mailto:davejwat...@fb.com] > Sent: Monday, July 23, 2018 10:05 PM > To: David Miller > Cc: Vakul Garg ; netdev@vger.kernel.org; > bor...@mellanox.com; avia...@mellanox.com; Doron Roberts-Kedes > > Subject: Re: [net-next v5 3/3] net/tls: Remove redundant array allocation. > > On 07/21/18 07:25 PM, David Miller wrote: > > From: Vakul Garg > > Date: Thu, 19 Jul 2018 21:56:13 +0530 > > > > > In function decrypt_skb(), array allocation in case when sgout is > > > NULL is unnecessary. Instead, local variable sgin_arr[] can be used. > > > > > > Signed-off-by: Vakul Garg > > > > Hmmm... > > > > Dave, can you take a look at this? Do you think there might have been > > a reason you felt that you needed to dynamically allocate the > > scatterlists when you COW and skb and do in-place decryption? > > > > I guess this change is ok, nsg can only get smaller when the SKB is > > COW'd. > > > > > > memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE); > > > if (!sgout) { > > > nsg = skb_cow_data(skb, 0, &unused) + 1; > > > - sgin = kmalloc_array(nsg, sizeof(*sgin), > > > sk->sk_allocation); > > > sgout = sgin; > > > } > > I don't think this patch is safe as-is. sgin_arr is a stack array of size > MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is that it > walks the whole chain of skbs from skb->next, and can return any number of > segments. Therefore we need to heap allocate. I think I copied the IPSEC > code here. > > For perf though, we could use the stack array if skb_cow_data returns <= > MAX_SKB_FRAGS. > > This code is slightly confusing though, since we don't heap allocate in the > zerocopy case - what happens is that skb_to_sgvec returns -EMSGSIZE, and > we fall back to the non-zerocopy case, and return again to this function, > where we then hit the skb_cow_data path and heap allocate. Thanks for explaining. I am missing the point why MAX_SKB_FRAGS sized local array sgin has been used in tls_sw_recvmsg(). What is special about MAX_SKB_FRAGS so that we used it as a size factor for 'sgin'? Will it be a bad idea to get rid of array 'sgin' on stack and simply kmalloc 'sgin' for whatever the number the number of pages returned by iov_iter_npages()? We can allocate for sgout too with the same kmalloc(). (Using a local array based 'sgin' is coming in the way to achieve sending multiple async record decryption requests to the accelerator without waiting for previous one to complete.)
Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.
From: Vakul Garg Date: Tue, 24 Jul 2018 04:43:55 + > Can you still apply the rest of two patches in the series or do I > need to send them again separately? When a change of any kind needs to be made to a patch series, you must always resubmit the entire series. Thank you.
Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.
Hi Dave Can you still apply the rest of two patches in the series or do I need to send them again separately? Regards Vakul From: netdev-ow...@vger.kernel.org on behalf of David Miller Sent: Tuesday, July 24, 2018 10:11:09 AM To: davejwat...@fb.com Cc: Vakul Garg; netdev@vger.kernel.org; bor...@mellanox.com; avia...@mellanox.com; doro...@fb.com Subject: Re: [net-next v5 3/3] net/tls: Remove redundant array allocation. From: Dave Watson Date: Mon, 23 Jul 2018 09:35:09 -0700 > I don't think this patch is safe as-is. sgin_arr is a stack array of > size MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is that > it walks the whole chain of skbs from skb->next, and can return any > number of segments. Therefore we need to heap allocate. I think I > copied the IPSEC code here. Ok I see what you are saying. So it means that, when a non-NULL sgout is passed into decrypt_skb(), via decrypt_skb_update(), via tls_sw_recvmsg() it means that it is the zerocopy case and you know that you only have page frags and no SKB frag list, right? I agree with you that this change is therefore incorrect.
Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.
From: Dave Watson Date: Mon, 23 Jul 2018 09:35:09 -0700 > I don't think this patch is safe as-is. sgin_arr is a stack array of > size MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is that > it walks the whole chain of skbs from skb->next, and can return any > number of segments. Therefore we need to heap allocate. I think I > copied the IPSEC code here. Ok I see what you are saying. So it means that, when a non-NULL sgout is passed into decrypt_skb(), via decrypt_skb_update(), via tls_sw_recvmsg() it means that it is the zerocopy case and you know that you only have page frags and no SKB frag list, right? I agree with you that this change is therefore incorrect.
Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.
On 07/21/18 07:25 PM, David Miller wrote: > From: Vakul Garg > Date: Thu, 19 Jul 2018 21:56:13 +0530 > > > In function decrypt_skb(), array allocation in case when sgout is NULL > > is unnecessary. Instead, local variable sgin_arr[] can be used. > > > > Signed-off-by: Vakul Garg > > Hmmm... > > Dave, can you take a look at this? Do you think there might have > been a reason you felt that you needed to dynamically allocate > the scatterlists when you COW and skb and do in-place decryption? > > I guess this change is ok, nsg can only get smaller when the SKB > is COW'd. > > > memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE); > > if (!sgout) { > > nsg = skb_cow_data(skb, 0, &unused) + 1; > > - sgin = kmalloc_array(nsg, sizeof(*sgin), sk->sk_allocation); > > sgout = sgin; > > } I don't think this patch is safe as-is. sgin_arr is a stack array of size MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is that it walks the whole chain of skbs from skb->next, and can return any number of segments. Therefore we need to heap allocate. I think I copied the IPSEC code here. For perf though, we could use the stack array if skb_cow_data returns <= MAX_SKB_FRAGS. This code is slightly confusing though, since we don't heap allocate in the zerocopy case - what happens is that skb_to_sgvec returns -EMSGSIZE, and we fall back to the non-zerocopy case, and return again to this function, where we then hit the skb_cow_data path and heap allocate.
Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.
From: Vakul Garg Date: Thu, 19 Jul 2018 21:56:13 +0530 > In function decrypt_skb(), array allocation in case when sgout is NULL > is unnecessary. Instead, local variable sgin_arr[] can be used. > > Signed-off-by: Vakul Garg Hmmm... Dave, can you take a look at this? Do you think there might have been a reason you felt that you needed to dynamically allocate the scatterlists when you COW and skb and do in-place decryption? I guess this change is ok, nsg can only get smaller when the SKB is COW'd.
[net-next v5 3/3] net/tls: Remove redundant array allocation.
In function decrypt_skb(), array allocation in case when sgout is NULL is unnecessary. Instead, local variable sgin_arr[] can be used. Signed-off-by: Vakul Garg --- net/tls/tls_sw.c | 4 1 file changed, 4 deletions(-) diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index e15ace0ebd79..1aa2d46713d7 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -704,7 +704,6 @@ int decrypt_skb(struct sock *sk, struct sk_buff *skb, memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE); if (!sgout) { nsg = skb_cow_data(skb, 0, &unused) + 1; - sgin = kmalloc_array(nsg, sizeof(*sgin), sk->sk_allocation); sgout = sgin; } @@ -725,9 +724,6 @@ int decrypt_skb(struct sock *sk, struct sk_buff *skb, rxm->full_len - tls_ctx->rx.overhead_size, skb, sk->sk_allocation); - if (sgin != &sgin_arr[0]) - kfree(sgin); - return ret; } -- 2.13.6
[net-next v5 3/3] net/tls: Remove redundant array allocation.
In function decrypt_skb(), array allocation in case when sgout is NULL is unnecessary. Instead, local variable sgin_arr[] can be used. Signed-off-by: Vakul Garg --- net/tls/tls_sw.c | 4 1 file changed, 4 deletions(-) diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index e15ace0ebd79..1aa2d46713d7 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -704,7 +704,6 @@ int decrypt_skb(struct sock *sk, struct sk_buff *skb, memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE); if (!sgout) { nsg = skb_cow_data(skb, 0, &unused) + 1; - sgin = kmalloc_array(nsg, sizeof(*sgin), sk->sk_allocation); sgout = sgin; } @@ -725,9 +724,6 @@ int decrypt_skb(struct sock *sk, struct sk_buff *skb, rxm->full_len - tls_ctx->rx.overhead_size, skb, sk->sk_allocation); - if (sgin != &sgin_arr[0]) - kfree(sgin); - return ret; } -- 2.13.6