Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.

2018-08-01 Thread Dave Watson
On 08/01/18 01:49 PM, Vakul Garg wrote:
> > I don't think this patch is safe as-is.  sgin_arr is a stack array of size
> > MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is that it
> > walks the whole chain of skbs from skb->next, and can return any number of
> > segments.  Therefore we need to heap allocate.  I think I copied the IPSEC
> > code here.
> > 
> > For perf though, we could use the stack array if skb_cow_data returns <=
> > MAX_SKB_FRAGS.
> 
> skb_cow_data() is being called only when caller passes sgout=NULL (i.e.
> non-zero copy case). In case of zero-copy, we do not call skb_cow_data()
> and just assume that MAX_SKB_FRAGS+2 sized scatterlist array sgin_arr[]
> is sufficient. This assumption could be wrong. So skb_cow_data() should be
> called unconditionally to determine number of scatterlist entries required
> for skb.

I agree it is best to unify them.  I was originally worried about perf
with the extra allocation (which you proposed fixing by merging with
the crypto allocation, which would be great), and the perf of
skb_cow_data().  Zerocopy doesn't require skb_cow_data(), but we do
have to walk the skbs to calculate nsg correctly.

However skb_cow_data perf might be fine after your fix "strparser: Call
skb_unclone conditionally".


RE: [net-next v5 3/3] net/tls: Remove redundant array allocation.

2018-08-01 Thread Vakul Garg



> -Original Message-
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Monday, July 23, 2018 10:05 PM
> To: David Miller 
> Cc: Vakul Garg ; netdev@vger.kernel.org;
> bor...@mellanox.com; avia...@mellanox.com; Doron Roberts-Kedes
> 
> Subject: Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.
> 
> On 07/21/18 07:25 PM, David Miller wrote:
> > From: Vakul Garg 
> > Date: Thu, 19 Jul 2018 21:56:13 +0530
> >
> > > In function decrypt_skb(), array allocation in case when sgout is
> > > NULL is unnecessary. Instead, local variable sgin_arr[] can be used.
> > >
> > > Signed-off-by: Vakul Garg 
> >
> > Hmmm...
> >
> > Dave, can you take a look at this?  Do you think there might have been
> > a reason you felt that you needed to dynamically allocate the
> > scatterlists when you COW and skb and do in-place decryption?
> >
> > I guess this change is ok, nsg can only get smaller when the SKB is
> > COW'd.
> >
> 
> > > memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
> > > if (!sgout) {
> > > nsg = skb_cow_data(skb, 0, ) + 1;
> > > -   sgin = kmalloc_array(nsg, sizeof(*sgin), 
> > > sk->sk_allocation);
> > > sgout = sgin;
> > > }
> 
> I don't think this patch is safe as-is.  sgin_arr is a stack array of size
> MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is that it
> walks the whole chain of skbs from skb->next, and can return any number of
> segments.  Therefore we need to heap allocate.  I think I copied the IPSEC
> code here.
> 
> For perf though, we could use the stack array if skb_cow_data returns <=
> MAX_SKB_FRAGS.

skb_cow_data() is being called only when caller passes sgout=NULL (i.e.
non-zero copy case). In case of zero-copy, we do not call skb_cow_data()
and just assume that MAX_SKB_FRAGS+2 sized scatterlist array sgin_arr[]
is sufficient. This assumption could be wrong. So skb_cow_data() should be
called unconditionally to determine number of scatterlist entries required
for skb.

> 
> This code is slightly confusing though, since we don't heap allocate in the
> zerocopy case - what happens is that skb_to_sgvec returns -EMSGSIZE, and
> we fall back to the non-zerocopy case, and return again to this function,
> where we then hit the skb_cow_data path and heap allocate.

If skb_to_sgvec return -EMSGSIZE, decrypt_skb() would return failure, 
resulting in abort of TLS session.



Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.

2018-07-27 Thread Dave Watson
On 07/27/18 09:34 AM, Vakul Garg wrote:
> 
> 
> > -Original Message-
> > From: Dave Watson [mailto:davejwat...@fb.com]
> > Sent: Thursday, July 26, 2018 2:31 AM
> > To: Vakul Garg 
> > Cc: David Miller ; netdev@vger.kernel.org;
> > bor...@mellanox.com; avia...@mellanox.com; Doron Roberts-Kedes
> > 
> > Subject: Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.
> > 
> > On 07/24/18 08:22 AM, Vakul Garg wrote:
> > > Will it be a bad idea to get rid of array 'sgin' on stack and simply
> > > kmalloc 'sgin' for whatever the number the number of pages returned by
> > iov_iter_npages()?
> > > We can allocate for sgout too with the same kmalloc().
> > >
> > > (Using a local array based 'sgin' is coming in the way to achieve
> > > sending multiple async record decryption requests to the accelerator
> > > without waiting for previous one to complete.)
> > 
> > Yes we could do this, and yes we would need to heap allocate if you want to
> > support multiple outstanding decryption requests.  I think async crypto
> > prevents any sort of zerocopy-fastpath, however.
> 
> We already do a aead_request_alloc (which internally does kmalloc).
> To mitigate the cost of kmalloc/kfree for sg lists and aad, I am allocating a 
> combined memory chunk for all of these and then segmenting it into
> aead_req, aad, sgin, sgout. This way there should be no extra cost for
> memory allocations in non-async.

Makes sense, sounds good to me. 


RE: [net-next v5 3/3] net/tls: Remove redundant array allocation.

2018-07-27 Thread Vakul Garg



> -Original Message-
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Thursday, July 26, 2018 2:31 AM
> To: Vakul Garg 
> Cc: David Miller ; netdev@vger.kernel.org;
> bor...@mellanox.com; avia...@mellanox.com; Doron Roberts-Kedes
> 
> Subject: Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.
> 
> On 07/24/18 08:22 AM, Vakul Garg wrote:
> > > I don't think this patch is safe as-is.  sgin_arr is a stack array
> > > of size MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is
> > > that it walks the whole chain of skbs from skb->next, and can return
> > > any number of segments.  Therefore we need to heap allocate.  I
> > > think I copied the IPSEC code here.
> > >
> > > For perf though, we could use the stack array if skb_cow_data
> > > returns <= MAX_SKB_FRAGS.
> > >
> > > This code is slightly confusing though, since we don't heap allocate
> > > in the zerocopy case - what happens is that skb_to_sgvec returns
> > > -EMSGSIZE, and we fall back to the non-zerocopy case, and return
> > > again to this function, where we then hit the skb_cow_data path and
> heap allocate.
> >
> > Thanks for explaining.
> > I am missing the point why MAX_SKB_FRAGS sized local array sgin has
> > been used in tls_sw_recvmsg(). What is special about MAX_SKB_FRAGS so
> > that we used it as a size factor for 'sgin'?
> 
> There is nothing special about it, in the zerocopy-fastpath if we happen to 
> fit
> in MAX_SKB_FRAGS we avoid any kmalloc whatsoever though.
> It could be renamed MAX_SC_FOR_FASTPATH or something.
> 
> > Will it be a bad idea to get rid of array 'sgin' on stack and simply
> > kmalloc 'sgin' for whatever the number the number of pages returned by
> iov_iter_npages()?
> > We can allocate for sgout too with the same kmalloc().
> >
> > (Using a local array based 'sgin' is coming in the way to achieve
> > sending multiple async record decryption requests to the accelerator
> > without waiting for previous one to complete.)
> 
> Yes we could do this, and yes we would need to heap allocate if you want to
> support multiple outstanding decryption requests.  I think async crypto
> prevents any sort of zerocopy-fastpath, however.

We already do a aead_request_alloc (which internally does kmalloc).
To mitigate the cost of kmalloc/kfree for sg lists and aad, I am allocating a 
combined memory chunk for all of these and then segmenting it into
aead_req, aad, sgin, sgout. This way there should be no extra cost for
memory allocations in non-async.
 



Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.

2018-07-25 Thread Dave Watson
On 07/24/18 08:22 AM, Vakul Garg wrote:
> > I don't think this patch is safe as-is.  sgin_arr is a stack array of size
> > MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is that it
> > walks the whole chain of skbs from skb->next, and can return any number of
> > segments.  Therefore we need to heap allocate.  I think I copied the IPSEC
> > code here.
> > 
> > For perf though, we could use the stack array if skb_cow_data returns <=
> > MAX_SKB_FRAGS.
> > 
> > This code is slightly confusing though, since we don't heap allocate in the
> > zerocopy case - what happens is that skb_to_sgvec returns -EMSGSIZE, and
> > we fall back to the non-zerocopy case, and return again to this function,
> > where we then hit the skb_cow_data path and heap allocate.
> 
> Thanks for explaining. 
> I am missing the point why MAX_SKB_FRAGS sized local array 
> sgin has been used in tls_sw_recvmsg(). What is special about MAX_SKB_FRAGS so
> that we used it as a size factor for 'sgin'?

There is nothing special about it, in the zerocopy-fastpath if we
happen to fit in MAX_SKB_FRAGS we avoid any kmalloc whatsoever though.
It could be renamed MAX_SC_FOR_FASTPATH or something.

> Will it be a bad idea to get rid of array 'sgin' on stack and simply kmalloc 
> 'sgin' for 
> whatever the number the number of pages returned by iov_iter_npages()?
> We can allocate for sgout too with the same kmalloc().
> 
> (Using a local array based 'sgin' is coming in the way to achieve sending 
> multiple async
> record decryption requests to the accelerator without waiting for previous 
> one to complete.)

Yes we could do this, and yes we would need to heap allocate if you
want to support multiple outstanding decryption requests.  I think
async crypto prevents any sort of zerocopy-fastpath, however.  


RE: [net-next v5 3/3] net/tls: Remove redundant array allocation.

2018-07-24 Thread Vakul Garg



> -Original Message-
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Monday, July 23, 2018 10:05 PM
> To: David Miller 
> Cc: Vakul Garg ; netdev@vger.kernel.org;
> bor...@mellanox.com; avia...@mellanox.com; Doron Roberts-Kedes
> 
> Subject: Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.
> 
> On 07/21/18 07:25 PM, David Miller wrote:
> > From: Vakul Garg 
> > Date: Thu, 19 Jul 2018 21:56:13 +0530
> >
> > > In function decrypt_skb(), array allocation in case when sgout is
> > > NULL is unnecessary. Instead, local variable sgin_arr[] can be used.
> > >
> > > Signed-off-by: Vakul Garg 
> >
> > Hmmm...
> >
> > Dave, can you take a look at this?  Do you think there might have been
> > a reason you felt that you needed to dynamically allocate the
> > scatterlists when you COW and skb and do in-place decryption?
> >
> > I guess this change is ok, nsg can only get smaller when the SKB is
> > COW'd.
> >
> 
> > > memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
> > > if (!sgout) {
> > > nsg = skb_cow_data(skb, 0, ) + 1;
> > > -   sgin = kmalloc_array(nsg, sizeof(*sgin), 
> > > sk->sk_allocation);
> > > sgout = sgin;
> > > }
> 
> I don't think this patch is safe as-is.  sgin_arr is a stack array of size
> MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is that it
> walks the whole chain of skbs from skb->next, and can return any number of
> segments.  Therefore we need to heap allocate.  I think I copied the IPSEC
> code here.
> 
> For perf though, we could use the stack array if skb_cow_data returns <=
> MAX_SKB_FRAGS.
> 
> This code is slightly confusing though, since we don't heap allocate in the
> zerocopy case - what happens is that skb_to_sgvec returns -EMSGSIZE, and
> we fall back to the non-zerocopy case, and return again to this function,
> where we then hit the skb_cow_data path and heap allocate.

Thanks for explaining. 
I am missing the point why MAX_SKB_FRAGS sized local array 
sgin has been used in tls_sw_recvmsg(). What is special about MAX_SKB_FRAGS so
that we used it as a size factor for 'sgin'?

Will it be a bad idea to get rid of array 'sgin' on stack and simply kmalloc 
'sgin' for 
whatever the number the number of pages returned by iov_iter_npages()?
We can allocate for sgout too with the same kmalloc().

(Using a local array based 'sgin' is coming in the way to achieve sending 
multiple async
record decryption requests to the accelerator without waiting for previous one 
to complete.)
 



Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.

2018-07-23 Thread David Miller
From: Vakul Garg 
Date: Tue, 24 Jul 2018 04:43:55 +

> Can you still apply the rest of two patches in the series or do I
> need to send them again separately?

When a change of any kind needs to be made to a patch series, you must
always resubmit the entire series.

Thank you.


Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.

2018-07-23 Thread Vakul Garg
Hi Dave

Can you still apply the rest of two patches in the series or do I need to send 
them again separately?

Regards

Vakul



From: netdev-ow...@vger.kernel.org  on behalf of 
David Miller 
Sent: Tuesday, July 24, 2018 10:11:09 AM
To: davejwat...@fb.com
Cc: Vakul Garg; netdev@vger.kernel.org; bor...@mellanox.com; 
avia...@mellanox.com; doro...@fb.com
Subject: Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.

From: Dave Watson 
Date: Mon, 23 Jul 2018 09:35:09 -0700

> I don't think this patch is safe as-is.  sgin_arr is a stack array of
> size MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is that
> it walks the whole chain of skbs from skb->next, and can return any
> number of segments.  Therefore we need to heap allocate.  I think I
> copied the IPSEC code here.

Ok I see what you are saying.

So it means that, when a non-NULL sgout is passed into decrypt_skb(),
via decrypt_skb_update(), via tls_sw_recvmsg() it means that it is the
zerocopy case and you know that you only have page frags and no SKB
frag list, right?

I agree with you that this change is therefore incorrect.


Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.

2018-07-23 Thread David Miller
From: Dave Watson 
Date: Mon, 23 Jul 2018 09:35:09 -0700

> I don't think this patch is safe as-is.  sgin_arr is a stack array of
> size MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is that
> it walks the whole chain of skbs from skb->next, and can return any
> number of segments.  Therefore we need to heap allocate.  I think I
> copied the IPSEC code here.

Ok I see what you are saying.

So it means that, when a non-NULL sgout is passed into decrypt_skb(),
via decrypt_skb_update(), via tls_sw_recvmsg() it means that it is the
zerocopy case and you know that you only have page frags and no SKB
frag list, right?

I agree with you that this change is therefore incorrect.


Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.

2018-07-23 Thread Dave Watson
On 07/21/18 07:25 PM, David Miller wrote:
> From: Vakul Garg 
> Date: Thu, 19 Jul 2018 21:56:13 +0530
> 
> > In function decrypt_skb(), array allocation in case when sgout is NULL
> > is unnecessary. Instead, local variable sgin_arr[] can be used.
> > 
> > Signed-off-by: Vakul Garg 
> 
> Hmmm...
> 
> Dave, can you take a look at this?  Do you think there might have
> been a reason you felt that you needed to dynamically allocate
> the scatterlists when you COW and skb and do in-place decryption?
> 
> I guess this change is ok, nsg can only get smaller when the SKB
> is COW'd.
> 

> > memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
> > if (!sgout) {
> > nsg = skb_cow_data(skb, 0, ) + 1;
> > -   sgin = kmalloc_array(nsg, sizeof(*sgin), sk->sk_allocation);
> > sgout = sgin;
> > }

I don't think this patch is safe as-is.  sgin_arr is a stack array of
size MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is that
it walks the whole chain of skbs from skb->next, and can return any
number of segments.  Therefore we need to heap allocate.  I think I
copied the IPSEC code here.

For perf though, we could use the stack array if skb_cow_data returns
<= MAX_SKB_FRAGS.

This code is slightly confusing though, since we don't heap allocate
in the zerocopy case - what happens is that skb_to_sgvec returns
-EMSGSIZE, and we fall back to the non-zerocopy case, and return again
to this function, where we then hit the skb_cow_data path and heap
allocate.


Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.

2018-07-21 Thread David Miller
From: Vakul Garg 
Date: Thu, 19 Jul 2018 21:56:13 +0530

> In function decrypt_skb(), array allocation in case when sgout is NULL
> is unnecessary. Instead, local variable sgin_arr[] can be used.
> 
> Signed-off-by: Vakul Garg 

Hmmm...

Dave, can you take a look at this?  Do you think there might have
been a reason you felt that you needed to dynamically allocate
the scatterlists when you COW and skb and do in-place decryption?

I guess this change is ok, nsg can only get smaller when the SKB
is COW'd.



[net-next v5 3/3] net/tls: Remove redundant array allocation.

2018-07-19 Thread Vakul Garg
In function decrypt_skb(), array allocation in case when sgout is NULL
is unnecessary. Instead, local variable sgin_arr[] can be used.

Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index e15ace0ebd79..1aa2d46713d7 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -704,7 +704,6 @@ int decrypt_skb(struct sock *sk, struct sk_buff *skb,
memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
if (!sgout) {
nsg = skb_cow_data(skb, 0, ) + 1;
-   sgin = kmalloc_array(nsg, sizeof(*sgin), sk->sk_allocation);
sgout = sgin;
}
 
@@ -725,9 +724,6 @@ int decrypt_skb(struct sock *sk, struct sk_buff *skb,
rxm->full_len - tls_ctx->rx.overhead_size,
skb, sk->sk_allocation);
 
-   if (sgin != _arr[0])
-   kfree(sgin);
-
return ret;
 }
 
-- 
2.13.6



[net-next v5 3/3] net/tls: Remove redundant array allocation.

2018-07-19 Thread Vakul Garg
In function decrypt_skb(), array allocation in case when sgout is NULL
is unnecessary. Instead, local variable sgin_arr[] can be used.

Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index e15ace0ebd79..1aa2d46713d7 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -704,7 +704,6 @@ int decrypt_skb(struct sock *sk, struct sk_buff *skb,
memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
if (!sgout) {
nsg = skb_cow_data(skb, 0, ) + 1;
-   sgin = kmalloc_array(nsg, sizeof(*sgin), sk->sk_allocation);
sgout = sgin;
}
 
@@ -725,9 +724,6 @@ int decrypt_skb(struct sock *sk, struct sk_buff *skb,
rxm->full_len - tls_ctx->rx.overhead_size,
skb, sk->sk_allocation);
 
-   if (sgin != _arr[0])
-   kfree(sgin);
-
return ret;
 }
 
-- 
2.13.6