[PATCH] tls: Add support for encryption using async offload accelerator

2018-01-30 Thread Vakul Garg
Async crypto accelerators (e.g. drivers/crypto/caam) support offloading the
GCM operation. If they are enabled, crypto_aead_encrypt() returns the error
code -EINPROGRESS. In this case tls_do_encryption() needs to wait on a
completion until the response for the crypto offload request is received.

Signed-off-by: Vakul Garg <vakul.g...@nxp.com>
---
 net/tls/tls_sw.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 73d19210dd49..390e6dc7b135 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -39,6 +39,11 @@
 
 #include 
 
+struct crypto_completion {
+   struct completion comp;
+   int result;
+};
+
 static void trim_sg(struct sock *sk, struct scatterlist *sg,
int *sg_num_elem, unsigned int *sg_size, int target_size)
 {
@@ -194,6 +199,14 @@ static void tls_free_both_sg(struct sock *sk)
&ctx->sg_plaintext_size);
 }
 
+static void crypto_complete(struct crypto_async_request *req, int err)
+{
+   struct crypto_completion *x = req->data;
+
+   x->result = err;
+   complete(&x->comp);
+}
+
 static int tls_do_encryption(struct tls_context *tls_ctx,
 struct tls_sw_context *ctx, size_t data_len,
 gfp_t flags)
@@ -202,6 +215,7 @@ static int tls_do_encryption(struct tls_context *tls_ctx,
crypto_aead_reqsize(ctx->aead_send);
struct aead_request *aead_req;
int rc;
+   struct crypto_completion crypto_done;
 
aead_req = kzalloc(req_size, flags);
if (!aead_req)
@@ -214,7 +228,15 @@ static int tls_do_encryption(struct tls_context *tls_ctx,
aead_request_set_ad(aead_req, TLS_AAD_SPACE_SIZE);
aead_request_set_crypt(aead_req, ctx->sg_aead_in, ctx->sg_aead_out,
   data_len, tls_ctx->iv);
+   aead_request_set_callback(aead_req, 0, crypto_complete, &crypto_done);
+
+   init_completion(&crypto_done.comp);
+
rc = crypto_aead_encrypt(aead_req);
+   if (rc == -EINPROGRESS) {
+   wait_for_completion(&crypto_done.comp);
+   rc = crypto_done.result;
+   }
 
ctx->sg_encrypted_data[0].offset -= tls_ctx->prepend_size;
ctx->sg_encrypted_data[0].length += tls_ctx->prepend_size;
-- 
2.13.6



RE: [PATCHv2] tls: Add support for encryption using async offload accelerator

2018-01-31 Thread Vakul Garg


> -Original Message-
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Wednesday, January 31, 2018 8:52 PM
> To: Vakul Garg <vakul.g...@nxp.com>
> Cc: linux-cry...@vger.kernel.org; il...@mellanox.com;
> avia...@mellanox.com; da...@davemloft.net; netdev@vger.kernel.org;
> Gilad Ben-Yossef <gi...@benyossef.com>
> Subject: Re: [PATCHv2] tls: Add support for encryption using async offload
> accelerator
> 
> On 01/31/18 09:34 PM, Vakul Garg wrote:
> > Async crypto accelerators (e.g. drivers/crypto/caam) support
> > offloading GCM operation. If they are enabled, crypto_aead_encrypt()
> > return error code -EINPROGRESS. In this case tls_do_encryption() needs
> > to wait on a completion till the time the response for crypto offload
> > request is received.
> 
> Comments from V1
> > On Wed, Jan 31, 2018 at 8:10 AM, Gilad Ben-Yossef
> <gi...@benyossef.com> wrote:
> >> Hi Vakul,
> >>
> >> On Wed, Jan 31, 2018 at 12:36 PM, Vakul Garg <vakul.g...@nxp.com>
> wrote:
> >>> Async crypto accelerators (e.g. drivers/crypto/caam) support
> >>> offloading GCM operation. If they are enabled, crypto_aead_encrypt()
> >>> return error code -EINPROGRESS. In this case tls_do_encryption()
> >>> needs to wait on a completion till the time the response for crypto
> >>> offload request is received.
> >>>
> >>
> >> Thank you for this patch. I think it is actually a bug fix and should
> >> probably go into stable
> >
> > On second though in stable we should probably just disable async tfm
> > allocations.
> > It's simpler. But this approach is still good for -next
> >
> >
> > Gilad
> 
> I agree with Gilad, just disable async for now.
> 

How do I do that? Can you help with the API name?

> If the flag MSG_DONTWAIT is set, we should be returning -EINPROGRESS and
> not wait for a response.  I had started working on a patch for that, but it's
> pretty tricky to get right.

Can you point me to your WIP code branch for this?

If MSG_DONTWAIT is not used, would it be sane to enqueue the crypto request to
the accelerator and return to user space, so that user space can send more
plaintext data while the crypto accelerator works in parallel?
 


RE: [PATCHv2] tls: Add support for encryption using async offload accelerator

2018-01-31 Thread Vakul Garg


> -Original Message-
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Wednesday, January 31, 2018 11:05 PM
> To: Vakul Garg <vakul.g...@nxp.com>
> Cc: linux-cry...@vger.kernel.org; il...@mellanox.com;
> avia...@mellanox.com; da...@davemloft.net; netdev@vger.kernel.org;
> Gilad Ben-Yossef <gi...@benyossef.com>
> Subject: Re: [PATCHv2] tls: Add support for encryption using async offload
> accelerator
> 
> On 01/31/18 05:22 PM, Vakul Garg wrote:
> > > > On second though in stable we should probably just disable async
> > > > tfm allocations.
> > > > It's simpler. But this approach is still good for -next
> > > >
> > > >
> > > > Gilad
> > >
> > > I agree with Gilad, just disable async for now.
> > >
> >
> > How to do it? Can you help with the api name?
> 
> *aead = crypto_alloc_aead("gcm(aes)", 0, CRYPTO_ALG_ASYNC);
> 
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit
> hub.com%2Fktls%2Fnet_next_ktls%2Fcommit%2Ff3b9b402e755e4b0623fa8
> 3f88137173fc249f2d=02%7C01%7Cvakul.garg%40nxp.com%7Cf1c707
> 9af97e48c9e89308d568d1633a%7C686ea1d3bc2b4c6fa92cd99c5c301635%
> 7C0%7C0%7C636530170887502735=qTdwbuDsCRmK1UHAYiOcwfPC
> U%2FPXgKg%2BICh%2F2N0b9Nw%3D=0
> 
 
The above patch you authored works fine; I tested it on my test platform.
I have nothing to contribute here.
Would you send it to stable yourself, or do you still want me to do it?
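
For context, a minimal sketch of the allocation suggested in the quoted reply
above. Passing CRYPTO_ALG_ASYNC in the mask argument (with the type bits left
clear) restricts the lookup to synchronous implementations, so
crypto_aead_encrypt() on such a tfm will not return -EINPROGRESS; the error
handling shown here is only illustrative.

	struct crypto_aead *aead;

	aead = crypto_alloc_aead("gcm(aes)", 0, CRYPTO_ALG_ASYNC);
	if (IS_ERR(aead))
		return PTR_ERR(aead);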


> > > If the flag MSG_DONTWAIT is set, we should be returning -EINPROGRESS
> > > and not wait for a response.  I had started working on a patch for
> > > that, but it's pretty tricky to get right.
> >
> > Can you point me to your WIP code branch for this?
> 
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit
> hub.com%2Fktls%2Fnet_next_ktls%2Fcommit%2F9cc839aa551ed972d148ec
> ebf353b25ee93543b9=02%7C01%7Cvakul.garg%40nxp.com%7Cf1c70
> 79af97e48c9e89308d568d1633a%7C686ea1d3bc2b4c6fa92cd99c5c301635
> %7C0%7C0%7C636530170887502735=0ASwmPPXXSGCXpBGes9vBma
> y8Gojv0wSUGAOOOjBExc%3D=0
> 
> > If MSG_DONTWAIT is not used, will it be sane if enqueue the crypto
> > request to accelerator and return to user space back so that user
> > space can send more plaintext data while crypto accelerator is working in
> parallel?
> 
> Right, that's roughly what the above does.  I believe the tricky unfinished 
> part
> was getting poll() to work correctly if there is an async crypto request
> outstanding.  Currently the tls poll() just relies on backpressure from
> do_tcp_sendpages.
 
I wanted to add async accelerator support for ktls where multiple crypto
requests can be pipelined. But since you are already doing it, I won't
duplicate the effort. I will leverage your work on my platform, test it and
try to contribute from there.



[PATCH] tls: Add support for encryption using async offload accelerator

2018-01-31 Thread Vakul Garg
Async crypto accelerators (e.g. drivers/crypto/caam) support offloading the
GCM operation. If they are enabled, crypto_aead_encrypt() returns the error
code -EINPROGRESS. In this case tls_do_encryption() needs to wait on a
completion until the response for the crypto offload request is received.

Signed-off-by: Vakul Garg <vakul.g...@nxp.com>
---
 net/tls/tls_sw.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 73d19210dd49..390e6dc7b135 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -39,6 +39,11 @@
 
 #include 
 
+struct crypto_completion {
+   struct completion comp;
+   int result;
+};
+
 static void trim_sg(struct sock *sk, struct scatterlist *sg,
int *sg_num_elem, unsigned int *sg_size, int target_size)
 {
@@ -194,6 +199,14 @@ static void tls_free_both_sg(struct sock *sk)
&ctx->sg_plaintext_size);
 }
 
+static void crypto_complete(struct crypto_async_request *req, int err)
+{
+   struct crypto_completion *x = req->data;
+
+   x->result = err;
+   complete(&x->comp);
+}
+
 static int tls_do_encryption(struct tls_context *tls_ctx,
 struct tls_sw_context *ctx, size_t data_len,
 gfp_t flags)
@@ -202,6 +215,7 @@ static int tls_do_encryption(struct tls_context *tls_ctx,
crypto_aead_reqsize(ctx->aead_send);
struct aead_request *aead_req;
int rc;
+   struct crypto_completion crypto_done;
 
aead_req = kzalloc(req_size, flags);
if (!aead_req)
@@ -214,7 +228,15 @@ static int tls_do_encryption(struct tls_context *tls_ctx,
aead_request_set_ad(aead_req, TLS_AAD_SPACE_SIZE);
aead_request_set_crypt(aead_req, ctx->sg_aead_in, ctx->sg_aead_out,
   data_len, tls_ctx->iv);
+   aead_request_set_callback(aead_req, 0, crypto_complete, &crypto_done);
+
+   init_completion(&crypto_done.comp);
+
rc = crypto_aead_encrypt(aead_req);
+   if (rc == -EINPROGRESS) {
+   wait_for_completion(&crypto_done.comp);
+   rc = crypto_done.result;
+   }
 
ctx->sg_encrypted_data[0].offset -= tls_ctx->prepend_size;
ctx->sg_encrypted_data[0].length += tls_ctx->prepend_size;
-- 
2.13.6



[PATCHv2] tls: Add support for encryption using async offload accelerator

2018-01-31 Thread Vakul Garg
Async crypto accelerators (e.g. drivers/crypto/caam) support offloading the
GCM operation. If they are enabled, crypto_aead_encrypt() returns the error
code -EINPROGRESS. In this case tls_do_encryption() needs to wait on a
completion until the response for the crypto offload request is received.

Signed-off-by: Vakul Garg <vakul.g...@nxp.com>
---
v1-v2:
 - Used crypto_wait_req() to wait for async operation completion
 - Passed CRYPTO_TFM_REQ_MAY_BACKLOG to crypto_aead_encrypt

 include/net/tls.h | 2 ++
 net/tls/tls_sw.c  | 8 +++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 936cfc5cab7d..8a8e1256caee 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -36,6 +36,7 @@
 
 #include 
 #include 
+#include <linux/crypto.h>
 #include 
 #include 
 #include 
@@ -57,6 +58,7 @@
 
 struct tls_sw_context {
struct crypto_aead *aead_send;
+   struct crypto_wait async_wait;
 
/* Sending context */
char aad_space[TLS_AAD_SPACE_SIZE];
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 73d19210dd49..31a9f0ef8af6 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -214,7 +214,11 @@ static int tls_do_encryption(struct tls_context *tls_ctx,
aead_request_set_ad(aead_req, TLS_AAD_SPACE_SIZE);
aead_request_set_crypt(aead_req, ctx->sg_aead_in, ctx->sg_aead_out,
   data_len, tls_ctx->iv);
-   rc = crypto_aead_encrypt(aead_req);
+
+   aead_request_set_callback(aead_req, CRYPTO_TFM_REQ_MAY_BACKLOG,
+ crypto_req_done, &ctx->async_wait);
+
+   rc = crypto_wait_req(crypto_aead_encrypt(aead_req), &ctx->async_wait);
 
ctx->sg_encrypted_data[0].offset -= tls_ctx->prepend_size;
ctx->sg_encrypted_data[0].length += tls_ctx->prepend_size;
@@ -663,6 +667,8 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx)
goto out;
}
 
+   crypto_init_wait(&sw_ctx->async_wait);
+
ctx->priv_ctx = (struct tls_offload_context *)sw_ctx;
 
crypto_info = &ctx->crypto_send;
-- 
2.13.6
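
For readers unfamiliar with the helpers used in v2, the generic pattern behind
crypto_init_wait()/crypto_req_done()/crypto_wait_req() from linux/crypto.h is
roughly the following standalone sketch; the request and variable names here
are illustrative, not taken from the TLS code:

	DECLARE_CRYPTO_WAIT(wait);	/* struct crypto_wait with an on-stack completion */

	aead_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
				  crypto_req_done, &wait);

	/* If the encrypt call returns -EINPROGRESS or -EBUSY, crypto_wait_req()
	 * sleeps on the embedded completion and returns the status reported by
	 * the driver's completion callback; otherwise it passes the synchronous
	 * return code straight through.
	 */
	err = crypto_wait_req(crypto_aead_encrypt(req), &wait);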



[PATCH] tls: Add support for encryption using async offload accelerator

2018-01-31 Thread Vakul Garg
Async crypto accelerators (e.g. drivers/crypto/caam) support offloading the
GCM operation. If they are enabled, crypto_aead_encrypt() returns the error
code -EINPROGRESS. In this case tls_do_encryption() needs to wait on a
completion until the response for the crypto offload request is received.

Signed-off-by: Vakul Garg <vakul.g...@nxp.com>
---
v1-v2:
 - Used crypto_wait_req() to wait for async operation completion
 - Passed CRYPTO_TFM_REQ_MAY_BACKLOG to crypto_aead_encrypt

 include/net/tls.h | 2 ++
 net/tls/tls_sw.c  | 8 +++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 936cfc5cab7d..8a8e1256caee 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -36,6 +36,7 @@
 
 #include 
 #include 
+#include <linux/crypto.h>
 #include 
 #include 
 #include 
@@ -57,6 +58,7 @@
 
 struct tls_sw_context {
struct crypto_aead *aead_send;
+   struct crypto_wait async_wait;
 
/* Sending context */
char aad_space[TLS_AAD_SPACE_SIZE];
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 73d19210dd49..31a9f0ef8af6 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -214,7 +214,11 @@ static int tls_do_encryption(struct tls_context *tls_ctx,
aead_request_set_ad(aead_req, TLS_AAD_SPACE_SIZE);
aead_request_set_crypt(aead_req, ctx->sg_aead_in, ctx->sg_aead_out,
   data_len, tls_ctx->iv);
-   rc = crypto_aead_encrypt(aead_req);
+
+   aead_request_set_callback(aead_req, CRYPTO_TFM_REQ_MAY_BACKLOG,
+ crypto_req_done, &ctx->async_wait);
+
+   rc = crypto_wait_req(crypto_aead_encrypt(aead_req), &ctx->async_wait);
 
ctx->sg_encrypted_data[0].offset -= tls_ctx->prepend_size;
ctx->sg_encrypted_data[0].length += tls_ctx->prepend_size;
@@ -663,6 +667,8 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx)
goto out;
}
 
+   crypto_init_wait(&sw_ctx->async_wait);
+
ctx->priv_ctx = (struct tls_offload_context *)sw_ctx;
 
crypto_info = &ctx->crypto_send;
-- 
2.13.6



RE: [PATCH] tls: Add support for encryption using async offload accelerator

2018-01-31 Thread Vakul Garg
I forgot to add 'v2' to the subject line.
I will resend.

-Original Message-
From: Vakul Garg 
Sent: Wednesday, January 31, 2018 9:29 PM
To: linux-cry...@vger.kernel.org
Cc: il...@mellanox.com; avia...@mellanox.com; davejwat...@fb.com; 
da...@davemloft.net; netdev@vger.kernel.org; Vakul Garg <vakul.g...@nxp.com>
Subject: [PATCH] tls: Add support for encryption using async offload accelerator

Async crypto accelerators (e.g. drivers/crypto/caam) support offloading the
GCM operation. If they are enabled, crypto_aead_encrypt() returns the error
code -EINPROGRESS. In this case tls_do_encryption() needs to wait on a
completion until the response for the crypto offload request is received.

Signed-off-by: Vakul Garg <vakul.g...@nxp.com>
---
v1-v2:
 - Used crypto_wait_req() to wait for async operation completion
 - Passed CRYPTO_TFM_REQ_MAY_BACKLOG to crypto_aead_encrypt

 include/net/tls.h | 2 ++
 net/tls/tls_sw.c  | 8 +++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 936cfc5cab7d..8a8e1256caee 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -36,6 +36,7 @@
 
 #include 
 #include 
+#include <linux/crypto.h>
 #include 
 #include 
 #include 
@@ -57,6 +58,7 @@
 
 struct tls_sw_context {
struct crypto_aead *aead_send;
+   struct crypto_wait async_wait;
 
/* Sending context */
char aad_space[TLS_AAD_SPACE_SIZE];
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 73d19210dd49..31a9f0ef8af6 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -214,7 +214,11 @@ static int tls_do_encryption(struct tls_context *tls_ctx,
aead_request_set_ad(aead_req, TLS_AAD_SPACE_SIZE);
aead_request_set_crypt(aead_req, ctx->sg_aead_in, ctx->sg_aead_out,
   data_len, tls_ctx->iv);
-   rc = crypto_aead_encrypt(aead_req);
+
+   aead_request_set_callback(aead_req, CRYPTO_TFM_REQ_MAY_BACKLOG,
+ crypto_req_done, &ctx->async_wait);
+
+   rc = crypto_wait_req(crypto_aead_encrypt(aead_req), &ctx->async_wait);
 
ctx->sg_encrypted_data[0].offset -= tls_ctx->prepend_size;
ctx->sg_encrypted_data[0].length += tls_ctx->prepend_size;
@@ -663,6 +667,8 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx)
goto out;
}
 
+   crypto_init_wait(&sw_ctx->async_wait);
+
ctx->priv_ctx = (struct tls_offload_context *)sw_ctx;
 
crypto_info = &ctx->crypto_send;
--
2.13.6



RE: [RFC crypto v3 8/9] chtls: Register the ULP

2018-02-08 Thread Vakul Garg


> -Original Message-
> From: linux-crypto-ow...@vger.kernel.org [mailto:linux-crypto-
> ow...@vger.kernel.org] On Behalf Of Atul Gupta
> Sent: Thursday, February 8, 2018 3:56 PM
> To: Dave Watson 
> Cc: s...@queasysnail.net; herb...@gondor.apana.org.au; linux-
> cry...@vger.kernel.org; ganes...@chelsio.co; netdev@vger.kernel.org;
> da...@davemloft.net; Boris Pismenny ; Ilya
> Lesokhin 
> Subject: RE: [RFC crypto v3 8/9] chtls: Register the ULP
> 
> I thought about this and approach below can avoid new ulp type:
> 
> 1. Register Inline TLS driver to net TLS
> 2. enable ethtool -K  tls-hw-record-offload on
> 3. Issue setsockopt(fd, SOL_TCP, TCP_ULP, "tls", sizeof("tls")) after bind();
> this will enable fetching the net_device corresponding to the IP address bound
> to the interface. If the dev found is the one registered and record offload is
> enabled, program sk->sk_prot as required.
 
What happens in the case of TLS clients which do not explicitly call bind() and
rely on the kernel to choose an ephemeral port for the socket?
Does calling setsockopt after the connection is established fix the problem?

> 4. fallback to SW TLS for any other case, bind to inaddr_any falls in this
> category and need proper handling?
> 
> tls-hw-record-offload is TLS record offload to HW, which does tx/rx and
> record creation Inline.
> 
> enum {
> TLS_BASE_TX,
> TLS_SW_TX,
> TLS_RECORD_HW, /* TLS record processed Inline */
> TLS_NUM_CONFIG,
> };
> 
> -Original Message-
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Wednesday, January 31, 2018 10:14 PM
> To: Atul Gupta 
> Cc: s...@queasysnail.net; herb...@gondor.apana.org.au; linux-
> cry...@vger.kernel.org; ganes...@chelsio.co; netdev@vger.kernel.org;
> da...@davemloft.net; Boris Pismenny ; Ilya
> Lesokhin 
> Subject: Re: [RFC crypto v3 8/9] chtls: Register the ULP
> 
> On 01/31/18 04:14 PM, Atul Gupta wrote:
> >
> >
> > On Tuesday 30 January 2018 10:41 PM, Dave Watson wrote:
> > > On 01/30/18 06:51 AM, Atul Gupta wrote:
> > >
> > > > What I was referring is that passing "tls" ulp type in setsockopt
> > > > may be insufficient to make the decision when multi HW assist
> > > > Inline TLS solution exists.
> > > Setting the ULP doesn't choose HW or SW implementation, I think that
> > > should be done later when setting up crypto with
> > >
> > > setsockopt(SOL_TLS, TLS_TX, struct crypto_info).
> > setsockpot [mentioned above] is quite late for driver to enable HW
> > implementation, we require something as early as tls_init
> > [setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"))], for driver
> > to set HW prot and offload connection beside Inline Tx/Rx.
> > >
> > > Any reason we can't use ethtool to choose HW vs SW implementation,
> > > if available on the device?
> > Thought about it,  the interface index is not available to fetch
> > netdev and caps check to set HW prot eg. bind [prot.hash] --> tls_hash to
> program HW.
> 
> Perhaps this is the part I don't follow - why do you need to override hash and
> check for LISTEN?  I briefly looked through the patch named "CPL handler
> definition", this looks like it is a full TCP offload?
> 
> Yes, this is connection and record layer offload, and the reason I used
> different ulp type, need to see what additional info or check can help setup
> the required sk prot.


RE: [PATCH net-next] net/tls: Mark the end in scatterlist table

2018-08-02 Thread Vakul Garg



> -Original Message-
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Thursday, August 2, 2018 10:17 PM
> To: Vakul Garg 
> Cc: netdev@vger.kernel.org; bor...@mellanox.com;
> avia...@mellanox.com; da...@davemloft.net
> Subject: Re: [PATCH net-next] net/tls: Mark the end in scatterlist table
> 
> On 08/02/18 08:43 PM, Vakul Garg wrote:
> > Function zerocopy_from_iter() unmarks the 'end' in input sgtable while
> > adding new entries in it. The last entry in sgtable remained unmarked.
> > This results in KASAN error report on using apis like sg_nents().
> > Before returning, the function needs to mark the 'end' in the last
> > entry it adds.
> >
> > Signed-off-by: Vakul Garg 
> 
> Looks good to me, it looks like the fallback path should unmark the end
> appropriately.
> 
In case zerocopy_from_iter() fails, the 'end' does not get marked, so the
fallback path is fine as-is.

> Which codepath is calling sg_nents()?

While testing my WIP implementation of a combined dynamic memory allocation for
(aead_req || sgin || sgout || aad || iv), I was getting random kernel crashes.
To debug it, I had inserted sg_nents() in my code. KASAN then immediately
complained that sg_nents() went beyond the memory allocated for the scatterlist.
This led me to find that the scatterlist table end had not been marked.

> Acked-by: Dave Watson 



RE: [PATCH net-next] net/tls: Mark the end in scatterlist table

2018-08-02 Thread Vakul Garg



> -Original Message-
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Thursday, August 2, 2018 10:47 PM
> To: Vakul Garg 
> Cc: netdev@vger.kernel.org; bor...@mellanox.com;
> avia...@mellanox.com; da...@davemloft.net
> Subject: Re: [PATCH net-next] net/tls: Mark the end in scatterlist table
> 
> On 08/02/18 05:05 PM, Vakul Garg wrote:
> > In case zerocopy_from_iter() fails, 'end' won't get marked.
> > So fallback path is fine.
> >
> > > Which codepath is calling sg_nents()?
> >
> > While testing my WIP implementation of combined dynamic memory
> > allocation for (aead_req || sgin || sgout || aad || iv), I was getting
> random kernel crashes.
> > To debug it I had inserted sg_nents() in my code. The KASAN then
> > immediately complained that sg_nents() went beyond the allocated
> memory for scatterlist.
> > This led me to find that scatterlist table end has not been marked.
> >
> 
> If this isn't causing KASAN issues for the existing code, it probably makes
> more sense to put in a future series with that WIP work then.

Isn't using an sg table with an unmarked end already bad and worth fixing?
Crypto hardware accelerator drivers could break when given sg lists with an
unmarked end.
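
To make the failure mode concrete, a small illustrative sketch (not taken from
the patch; the buffer names are placeholders). Helpers such as sg_nents() walk
the table until they find an entry with the end marker set, so a list whose
last entry is left unmarked gets walked past the end of its allocation.

	struct scatterlist sg[2];

	sg_init_table(sg, 2);			/* zeroes the table, marks sg[1] as end */
	sg_set_buf(&sg[0], buf_a, len_a);
	sg_set_buf(&sg[1], buf_b, len_b);

	/* zerocopy_from_iter() clears the previous end mark (sg_unmark_end())
	 * as it appends entries; unless the new last entry is re-terminated,
	 * sg_nents() keeps chaining into whatever memory follows -- which is
	 * what KASAN reported here.
	 */
	sg_mark_end(&sg[1]);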


RE: Security enhancement proposal for kernel TLS

2018-08-02 Thread Vakul Garg



> -Original Message-
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Thursday, August 2, 2018 2:17 AM
> To: Vakul Garg 
> Cc: netdev@vger.kernel.org; Peter Doliwa ; Boris
> Pismenny 
> Subject: Re: Security enhancement proposal for kernel TLS
> 
> On 07/31/18 10:45 AM, Vakul Garg wrote:
> > > > IIUC, with the upstream implementation of tls record layer in
> > > > kernel, the decryption of tls FINISHED message happens in kernel.
> > > > Therefore the keys are already being sent to kernel tls socket
> > > > before handshake is
> > > completed.
> > >
> > > This is incorrect.
> >
> > Let us first reach a common ground on this.
> >
> >  The kernel TLS implementation can decrypt only after setting the keys on
> the socket.
> > The TLS message 'finished' (which is encrypted) is received after receiving
> 'CCS'
> > message. After the user space  TLS library receives CCS message, it
> > sets the keys on kernel TLS socket. Therefore, the next message in the
> > socket receive queue which is TLS finished gets decrypted in kernel only.
> >
> > Please refer to following Boris's patch on openssl. The  commit log says:
> > " We choose to set this option at the earliest - just after CCS is 
> > complete".
> 
> I agree that Boris' patch does what you say it does - it sets keys immediately
> after CCS instead of after FINISHED message.  I disagree that the kernel tls
> implementation currently requires that specific ordering, nor do I think that 
> it
> should require that ordering.

The current kernel implementation assumes the record sequence number starts
from '0'. If keys have to be set after the FINISHED message, then the record
sequence number needs to be communicated from the user-space TLS stack to the
kernel. IIRC, the sequence number is not part of the interface through which
the key is transferred.
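
For reference, the uapi structure through which the key material is handed to
the kernel (include/uapi/linux/tls.h) is roughly the following; note that it
does carry a rec_seq field, so it may be worth re-checking whether the starting
record sequence number can be conveyed through the same setsockopt() call:

	struct tls12_crypto_info_aes_gcm_128 {
		struct tls_crypto_info info;
		unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
		unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
		unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE];
		unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE];
	};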



RE: [net-next v5 3/3] net/tls: Remove redundant array allocation.

2018-08-01 Thread Vakul Garg



> -Original Message-
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Monday, July 23, 2018 10:05 PM
> To: David Miller 
> Cc: Vakul Garg ; netdev@vger.kernel.org;
> bor...@mellanox.com; avia...@mellanox.com; Doron Roberts-Kedes
> 
> Subject: Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.
> 
> On 07/21/18 07:25 PM, David Miller wrote:
> > From: Vakul Garg 
> > Date: Thu, 19 Jul 2018 21:56:13 +0530
> >
> > > In function decrypt_skb(), array allocation in case when sgout is
> > > NULL is unnecessary. Instead, local variable sgin_arr[] can be used.
> > >
> > > Signed-off-by: Vakul Garg 
> >
> > Hmmm...
> >
> > Dave, can you take a look at this?  Do you think there might have been
> > a reason you felt that you needed to dynamically allocate the
> > scatterlists when you COW and skb and do in-place decryption?
> >
> > I guess this change is ok, nsg can only get smaller when the SKB is
> > COW'd.
> >
> 
> > > memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
> > > if (!sgout) {
> > > nsg = skb_cow_data(skb, 0, &unused) + 1;
> > > -   sgin = kmalloc_array(nsg, sizeof(*sgin), 
> > > sk->sk_allocation);
> > > sgout = sgin;
> > > }
> 
> I don't think this patch is safe as-is.  sgin_arr is a stack array of size
> MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is that it
> walks the whole chain of skbs from skb->next, and can return any number of
> segments.  Therefore we need to heap allocate.  I think I copied the IPSEC
> code here.
> 
> For perf though, we could use the stack array if skb_cow_data returns <=
> MAX_SKB_FRAGS.

skb_cow_data() is being called only when the caller passes sgout=NULL (i.e. the
non-zero-copy case). In the zero-copy case, we do not call skb_cow_data() and
just assume that the MAX_SKB_FRAGS+2 sized scatterlist array sgin_arr[] is
sufficient. This assumption could be wrong. So skb_cow_data() should be called
unconditionally to determine the number of scatterlist entries required for
the skb.

> 
> This code is slightly confusing though, since we don't heap allocate in the
> zerocopy case - what happens is that skb_to_sgvec returns -EMSGSIZE, and
> we fall back to the non-zerocopy case, and return again to this function,
> where we then hit the skb_cow_data path and heap allocate.

If skb_to_sgvec returns -EMSGSIZE, decrypt_skb() would return failure,
resulting in an abort of the TLS session.
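
A minimal sketch of the approach discussed above (call skb_cow_data()
unconditionally, use the stack array only when the count fits, otherwise fall
back to the heap); this is essentially what the follow-up patch further down
this thread implements:

	struct scatterlist sgin_arr[MAX_SKB_FRAGS + 2];
	struct scatterlist *sgin = sgin_arr;
	struct sk_buff *unused;
	int nsg;

	nsg = skb_cow_data(skb, 0, &unused) + 1;	/* +1 for the AAD entry */
	if (nsg > ARRAY_SIZE(sgin_arr)) {
		sgin = kmalloc_array(nsg, sizeof(*sgin), sk->sk_allocation);
		if (!sgin)
			return -ENOMEM;
	}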



RE: [PATCH net-next] net/tls: Calculate nsg for zerocopy path without skb_cow_data.

2018-08-02 Thread Vakul Garg



> -Original Message-
> From: Doron Roberts-Kedes [mailto:doro...@fb.com]
> Sent: Friday, August 3, 2018 6:00 AM
> To: David S . Miller 
> Cc: Dave Watson ; Vakul Garg
> ; Boris Pismenny ; Aviad
> Yehezkel ; netdev@vger.kernel.org; Doron
> Roberts-Kedes 
> Subject: [PATCH net-next] net/tls: Calculate nsg for zerocopy path without
> skb_cow_data.
> 
> decrypt_skb fails if the number of sg elements required to map is greater
> than MAX_SKB_FRAGS. As noted by Vakul Garg, nsg must always be
> calculated, but skb_cow_data adds unnecessary memcpy's for the zerocopy
> case.
> 
> The new function skb_nsg calculates the number of scatterlist elements
> required to map the skb without the extra overhead of skb_cow_data. This
> function mimics the structure of skb_to_sgvec.
> 
> Fixes: c46234ebb4d1 ("tls: RX path for ktls")
> Signed-off-by: Doron Roberts-Kedes 
> ---
>  net/tls/tls_sw.c | 89
> ++--
>  1 file changed, 86 insertions(+), 3 deletions(-)
> 
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index
> ff3a6904a722..c62793601cfc 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -43,6 +43,76 @@
> 
>  #define MAX_IV_SIZE  TLS_CIPHER_AES_GCM_128_IV_SIZE
> 
> +static int __skb_nsg(struct sk_buff *skb, int offset, int len,
> +unsigned int recursion_level)
> +{
> +int start = skb_headlen(skb);
> +int i, copy = start - offset;
> +struct sk_buff *frag_iter;
> +int elt = 0;
> +
> +if (unlikely(recursion_level >= 24))
> +return -EMSGSIZE;
> +
> +if (copy > 0) {
> +if (copy > len)
> +copy = len;
> +elt++;
> +if ((len -= copy) == 0)
> +return elt;
> +offset += copy;
> +}
> +
> +for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
> +int end;
> +
> +WARN_ON(start > offset + len);
> +
> +end = start + skb_frag_size(&skb_shinfo(skb)->frags[i]);
> +if ((copy = end - offset) > 0) {
> +if (copy > len)
> +copy = len;
> +elt++;
> +if (!(len -= copy))
> +return elt;
> +offset += copy;
> +}
> +start = end;
> +}
> +
> +skb_walk_frags(skb, frag_iter) {
> +int end, ret;
> +
> +WARN_ON(start > offset + len);
> +
> +end = start + frag_iter->len;
> +if ((copy = end - offset) > 0) {
> +
> +if (copy > len)
> +copy = len;
> +ret = __skb_nsg(frag_iter, offset - start, copy,
> + recursion_level + 1);
> +if (unlikely(ret < 0))
> +return ret;
> +elt += ret;
> +if ((len -= copy) == 0)
> +return elt;
> +offset += copy;
> +}
> +start = end;
> +}
> +BUG_ON(len);
> +return elt;
> +}
> +
> +/* Return the number of scatterlist elements required to completely map
> +the
> + * skb, or -EMSGSIZE if the recursion depth is exceeded.
> + */
> +static int skb_nsg(struct sk_buff *skb, int offset, int len) {
> + return __skb_nsg(skb, offset, len, 0); }
> +

These are generic functions and useful elsewhere too.
Should the above two functions be exported by skbuff.c?

>  static int tls_do_decryption(struct sock *sk,
>struct scatterlist *sgin,
>struct scatterlist *sgout,
> @@ -693,7 +763,7 @@ int decrypt_skb(struct sock *sk, struct sk_buff *skb,
>   struct scatterlist sgin_arr[MAX_SKB_FRAGS + 2];
>   struct scatterlist *sgin = &sgin_arr[0];
>   struct strp_msg *rxm = strp_msg(skb);
> - int ret, nsg = ARRAY_SIZE(sgin_arr);
> + int ret, nsg;
>   struct sk_buff *unused;
> 
>   ret = skb_copy_bits(skb, rxm->offset + TLS_HEADER_SIZE,
> @@ -704,10 +774,23 @@ int decrypt_skb(struct sock *sk, struct sk_buff *skb,
> 
>   memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
>   if (!sgout) {
> - nsg = skb_cow_data(skb, 0, &unused) + 1;
> + nsg = skb_cow_data(skb, 0, &unused);
> + } else {
> + nsg = skb_n

RE: [net-next v6 1/2] net/tls: Use socket data_ready callback on record availability

2018-07-29 Thread Vakul Garg
Hi David

Could you please correct me if my counter-reasoning behind changing the socket 
callback is wrong?

Thanks & Regards

Vakul

> -Original Message-
> From: Vakul Garg
> Sent: Wednesday, July 25, 2018 11:22 AM
> To: David Miller 
> Cc: netdev@vger.kernel.org; bor...@mellanox.com;
> avia...@mellanox.com; davejwat...@fb.com
> Subject: RE: [net-next v6 1/2] net/tls: Use socket data_ready callback on
> record availability
> 
> 
> 
> > -Original Message-
> > From: David Miller [mailto:da...@davemloft.net]
> > Sent: Wednesday, July 25, 2018 1:43 AM
> > To: Vakul Garg 
> > Cc: netdev@vger.kernel.org; bor...@mellanox.com;
> avia...@mellanox.com;
> > davejwat...@fb.com
> > Subject: Re: [net-next v6 1/2] net/tls: Use socket data_ready callback
> > on record availability
> >
> > From: Vakul Garg 
> > Date: Tue, 24 Jul 2018 15:44:02 +0530
> >
> > > On receipt of a complete tls record, use socket's saved data_ready
> > > callback instead of state_change callback.
> > >
> > > Signed-off-by: Vakul Garg 
> >
> > I don't think this is correct.
> >
> > Here, the stream parser has given us a complete TLS record.
> >
> > But we haven't decrypted this packet yet.  It sits on the stream
> > parser's queue to be processed by tls_sw_recvmsg(), not the saved
> > socket's receive queue.
> 
> I understand that at this point in the code, the TLS record is still queued in
> encrypted state. But the decryption happens inline when tls_sw_recvmsg()
> gets invoked.
> So it should be ok to notify the waiting context about the availability of
> data as soon as we have collected a full TLS record.
> 
> For new data availability notification, the sk_data_ready callback should be
> more appropriate. It points to sock_def_readable(), which wakes up
> specifically for the EPOLLIN event.
> 
> This is in contrast to the socket callback sk_state_change, which points to
> sock_def_wakeup() and issues a wakeup unconditionally (without an event
> mask).


RE: [net-next v6 1/2] net/tls: Use socket data_ready callback on record availability

2018-07-29 Thread Vakul Garg



> -Original Message-
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Sunday, July 29, 2018 11:48 AM
> To: Vakul Garg 
> Cc: netdev@vger.kernel.org; bor...@mellanox.com;
> avia...@mellanox.com; davejwat...@fb.com
> Subject: Re: [net-next v6 1/2] net/tls: Use socket data_ready callback on
> record availability
> 
> From: Vakul Garg 
> Date: Sun, 29 Jul 2018 06:01:29 +
> 
> > Could you please correct me if my counter-reasoning behind changing
> > the socket callback is wrong?
> 
> Ok, after studying the code a bit I agree with your analysis.

Thanks.
Kindly advise if I need to resubmit/rebase the patch.
 



RE: [PATCH net-next] net/tls: Corrected enabling of zero-copy mode

2018-07-26 Thread Vakul Garg



> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-
> ow...@vger.kernel.org] On Behalf Of David Miller
> Sent: Thursday, July 26, 2018 1:59 AM
> To: Vakul Garg 
> Cc: netdev@vger.kernel.org; bor...@mellanox.com;
> avia...@mellanox.com; davejwat...@fb.com
> Subject: Re: [PATCH net-next] net/tls: Corrected enabling of zero-copy mode
> 
> From: Vakul Garg 
> Date: Mon, 23 Jul 2018 21:00:06 +0530
> 
> > @@ -787,7 +787,7 @@ int tls_sw_recvmsg(struct sock *sk,
> > target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
> > timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
> > do {
> > -   bool zc = false;
> > +   bool zc;
> > int chunk = 0;
> >
> > skb = tls_wait_data(sk, flags, timeo, &err);
>  ...
> > @@ -836,6 +835,7 @@ int tls_sw_recvmsg(struct sock *sk,
> > if (err < 0)
> > goto fallback_to_reg_recv;
> >
> > +   zc = true;
> > err = decrypt_skb_update(sk, skb, sgin, &chunk);
> > for (; pages > 0; pages--)
> > put_page(sg_page(&sgin[pages]));
> > @@ -845,6 +845,7 @@ int tls_sw_recvmsg(struct sock *sk,
> > }
> > } else {
> >  fallback_to_reg_recv:
> > +   zc = false;
> > err = decrypt_skb_update(sk, skb, NULL, &chunk);
> > if (err < 0) {
> > tls_err_abort(sk, EBADMSG);
> > --
> > 2.13.6
> >
> 
> This will leave a code path where 'zc' is evaluated but not initialized to
> any value.
> 
> And that's the path taken when ctx->decrypted is true.  The code after
> your changes looks like:
> 
>   bool zc;
>  ...
>   if (!ctx->decrypted) {
> 
>  ... assignments to 'zc' happen in this code block
> 
>   ctx->decrypted = true;
>   }
> 
>   if (!zc) {
> 
> So when ctx->decrypted it true, the if(!zc) condition runs on an
> uninitialized value.
> 
> I have to say that your TLS changes are becoming quite a time sink
> for two reasons.
> 
> First, you are making a lot of changes that seem not so needed, and
> whose value is purely determined by taste.  I'd put the
> msg_data_left() multiple evaluation patch into this category.
> 
> The rest require deep review and understanding of the complicated
> details of the TLS code, and many of them turn out to be incorrect.

My apologies for sending incorrect patches. I will be more careful next time.


> 
> As I find more errors in your submissions, I begin to scrutinize your
> patches even more.  Thus, review of your changes takes even more time.
> 
> And it isn't helping that there are not a lot of other developers
> helping actively to review your changes.
> 
> I would like to just make a small request to you, that you concentrate
> on fixing clear bugs and clear issues that need to be resolved.
> 
> Thank you.


RE: [net-next v5 3/3] net/tls: Remove redundant array allocation.

2018-07-27 Thread Vakul Garg



> -Original Message-
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Thursday, July 26, 2018 2:31 AM
> To: Vakul Garg 
> Cc: David Miller ; netdev@vger.kernel.org;
> bor...@mellanox.com; avia...@mellanox.com; Doron Roberts-Kedes
> 
> Subject: Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.
> 
> On 07/24/18 08:22 AM, Vakul Garg wrote:
> > > I don't think this patch is safe as-is.  sgin_arr is a stack array
> > > of size MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is
> > > that it walks the whole chain of skbs from skb->next, and can return
> > > any number of segments.  Therefore we need to heap allocate.  I
> > > think I copied the IPSEC code here.
> > >
> > > For perf though, we could use the stack array if skb_cow_data
> > > returns <= MAX_SKB_FRAGS.
> > >
> > > This code is slightly confusing though, since we don't heap allocate
> > > in the zerocopy case - what happens is that skb_to_sgvec returns
> > > -EMSGSIZE, and we fall back to the non-zerocopy case, and return
> > > again to this function, where we then hit the skb_cow_data path and
> heap allocate.
> >
> > Thanks for explaining.
> > I am missing the point why MAX_SKB_FRAGS sized local array sgin has
> > been used in tls_sw_recvmsg(). What is special about MAX_SKB_FRAGS so
> > that we used it as a size factor for 'sgin'?
> 
> There is nothing special about it, in the zerocopy-fastpath if we happen to 
> fit
> in MAX_SKB_FRAGS we avoid any kmalloc whatsoever though.
> It could be renamed MAX_SC_FOR_FASTPATH or something.
> 
> > Will it be a bad idea to get rid of array 'sgin' on stack and simply
> > kmalloc 'sgin' for whatever the number the number of pages returned by
> iov_iter_npages()?
> > We can allocate for sgout too with the same kmalloc().
> >
> > (Using a local array based 'sgin' is coming in the way to achieve
> > sending multiple async record decryption requests to the accelerator
> > without waiting for previous one to complete.)
> 
> Yes we could do this, and yes we would need to heap allocate if you want to
> support multiple outstanding decryption requests.  I think async crypto
> prevents any sort of zerocopy-fastpath, however.

We already do an aead_request_alloc() (which internally does a kmalloc).
To mitigate the cost of kmalloc/kfree for the sg lists and aad, I am allocating
a combined memory chunk for all of these and then segmenting it into aead_req,
aad, sgin and sgout. This way there should be no extra cost for memory
allocations in the non-async case.
 



RE: Security enhancement proposal for kernel TLS

2018-07-31 Thread Vakul Garg



> -Original Message-
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Tuesday, July 31, 2018 2:46 AM
> To: Vakul Garg 
> Cc: netdev@vger.kernel.org; Peter Doliwa ; Boris
> Pismenny 
> Subject: Re: Security enhancement proposal for kernel TLS
> 
> On 07/30/18 06:31 AM, Vakul Garg wrote:
> > > It's not entirely clear how your TLS handshake daemon works -   Why is
> > > it necessary to set the keys in the kernel tls socket before the
> > > handshake is completed?
> >
> > IIUC, with the upstream implementation of tls record layer in kernel,
> > the decryption of tls FINISHED message happens in kernel. Therefore
> > the keys are already being sent to kernel tls socket before handshake is
> completed.
> 
> This is incorrect.  

Let us first reach a common ground on this.

The kernel TLS implementation can decrypt only after the keys have been set on
the socket. The TLS 'finished' message (which is encrypted) is received after
the 'CCS' message. After the user-space TLS library receives the CCS message,
it sets the keys on the kernel TLS socket. Therefore, the next message in the
socket receive queue, which is the TLS finished message, gets decrypted in the
kernel.

Please refer to following Boris's patch on openssl. The  commit log says:
" We choose to set this option at the earliest - just after CCS is complete".

--
commit a01dd062a32c687630b2a860b4bb053008f09ff5
Author: Boris Pismenny 
Date:   Sun Mar 11 16:18:27 2018 +0200

ssl: Linux TLS Rx Offload

This patch adds support for the Linux TLS Rx socket option.
It completes the previous patch for TLS Tx offload.
If the socket option is successful, then the receive data-path of the TCP
socket is implemented by the kernel.
We choose to set this option at the earliest - just after CCS is complete.
--

The fact that the keys are handed over to the kernel TLS socket can also be
verified by adding a log in tls_sw_recvmsg().

I will stop here for you to confirm my observation first.
Regards, Vakul


 > Currently the kernel TLS implementation decrypts
> everything after you set the keys on the socket.  I'm suggesting that you
> don't set the keys on the socket until after the FINISHED message.
> 
> > > Or, why do you need to hand off the fd to the client program before
> > > the handshake is completed?
> >
> > The fd is always owned by the client program..
> >
> > In my proposal, the applications poll their own tcp socket using
> read/recvmsg etc.
> > If they get handshake record, they forward it to the entity running
> handshake agent.
> > The handshake agent could be a linux daemon or could run on a separate
> > security processor like 'Secure element' or say arm trustzone etc. The
> > applications forward any handshake message it gets backs from
> > handshake agent to the connected tcp socket. Therefore, the
> > applications act as a forwarder of the handshake messages between the
> peer tls endpoint and handshake agent.
> > The received data messages are absorbed by the applications themselves
> > (bypassing ssl stack completely). Similarly, the applications tx data 
> > directly
> by writing on their socket.
> >
> > > Waiting until after handshake solves both of these issues.
> >
> > The security sensitive check which is 'Wait for handshake to finish
> > completely before accepting data' should not be the onus of the
> > application. We have enough examples in past where application
> > programmers made mistakes in setting up tls correctly. The idea is to
> isolate tls session setting up from the applications.
> 
> It's not clear to me what you gain by putting this 'handshake finished'
> notification in the kernel instead of in the client's tls library - you're 
> already
> forwarding the handshake start notification to the daemon, why can't the
> daemon notify them back in userspace that
> the handshake is finished?
> 
> If you did want to put the notification in the kernel, how would you handle
> poll on the socket, since probably both the handshake daemon and client
> might be polling the socket, but one for control messages and one for data?
> 
> The original kernel TLS RFC did split these to two separate sockets, but we
> decided it was too complicated, and that's not how userspace TLS clients
> function today.
> 
> Do you have an implementation of this?  There are a bunch of tricky corner
> cases here, it might make more sense to have something concrete to discuss.
> 
> > Further, as per tls RFC it is ok to piggyback the data records after
> > the finished handshake message. This is called early data

[PATCH net-next] net/tls: Always get number of sg entries for skb to be decrypted

2018-08-02 Thread Vakul Garg
Function decrypt_skb() made the bad assumption that the number of sg entries
required to map the skb to be decrypted would always be less than
MAX_SKB_FRAGS. The required count of sg entries for the skb should always be
calculated. If they cannot fit in the local array sgin_arr[], allocate them
from the heap irrespective of whether it is the zero-copy case or not. The
change also benefits the non-zero-copy case, as we can use sgin_arr[] instead
of always allocating sg entries from the heap.

Signed-off-by: Vakul Garg 
---

The said problem has been discussed with Dave Watson over mail list.

 net/tls/tls_sw.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index ff3a6904a722..e2cf7aebb877 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -693,7 +693,7 @@ int decrypt_skb(struct sock *sk, struct sk_buff *skb,
struct scatterlist sgin_arr[MAX_SKB_FRAGS + 2];
struct scatterlist *sgin = &sgin_arr[0];
struct strp_msg *rxm = strp_msg(skb);
-   int ret, nsg = ARRAY_SIZE(sgin_arr);
+   int ret, nsg;
struct sk_buff *unused;
 
ret = skb_copy_bits(skb, rxm->offset + TLS_HEADER_SIZE,
@@ -703,12 +703,20 @@ int decrypt_skb(struct sock *sk, struct sk_buff *skb,
return ret;
 
memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
-   if (!sgout) {
-   nsg = skb_cow_data(skb, 0, &unused) + 1;
+
+   /* If required number of SG entries for skb are more than
+* sgin_arr elements, then dynamically allocate sg table.
+*/
+   nsg = skb_cow_data(skb, 0, &unused) + 1;
+   if (nsg > ARRAY_SIZE(sgin_arr)) {
sgin = kmalloc_array(nsg, sizeof(*sgin), sk->sk_allocation);
-   sgout = sgin;
+   if (!sgin)
+   return -ENOMEM;
}
 
+   if (!sgout)
+   sgout = sgin;
+
sg_init_table(sgin, nsg);
sg_set_buf(&sgin[0], ctx->rx_aad_ciphertext, TLS_AAD_SPACE_SIZE);
 
-- 
2.13.6



[PATCH net-next] net/tls: Mark the end in scatterlist table

2018-08-02 Thread Vakul Garg
Function zerocopy_from_iter() unmarks the 'end' in the input sg table while
adding new entries to it, but the last entry it adds remains unmarked. This
results in a KASAN error report when using APIs like sg_nents(). Before
returning, the function needs to mark the 'end' in the last entry it adds.

Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index ff3a6904a722..83d67df33f0c 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -311,6 +311,9 @@ static int zerocopy_from_iter(struct sock *sk, struct iov_iter *from,
}
}
 
+   /* Mark the end in the last sg entry if newly added */
+   if (num_elem > *pages_used)
+   sg_mark_end(&to[num_elem - 1]);
 out:
if (rc)
iov_iter_revert(from, size - *size_used);
-- 
2.13.6



[PATCH net-next v7] net/tls: Use socket data_ready callback on record availability

2018-07-29 Thread Vakul Garg
On receipt of a complete tls record, use the socket's saved data_ready
callback instead of the state_change callback. In function tls_queue(),
the TLS record is queued in encrypted state. But the decryption happens
inline when tls_sw_recvmsg() or tls_sw_splice_read() gets invoked.
So it should be ok to notify the waiting context about the availability
of data as soon as we have collected a full TLS record. For new data
availability notification, the sk_data_ready callback is more appropriate.
It points to sock_def_readable(), which wakes up specifically for the
EPOLLIN event. This is in contrast to the socket callback sk_state_change,
which points to sock_def_wakeup() and issues a wakeup unconditionally
(without an event mask).

Signed-off-by: Vakul Garg 
---
v6->v7: Improved the commit message to contain the detailed reasoning.
(The same analysis was shared on the mail list.)

 net/tls/tls_sw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 6deceb7c56ba..33838f11fafa 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1028,7 +1028,7 @@ static void tls_queue(struct strparser *strp, struct sk_buff *skb)
ctx->recv_pkt = skb;
strp_pause(strp);
 
-   strp->sk->sk_state_change(strp->sk);
+   ctx->saved_data_ready(strp->sk);
 }
 
 static void tls_data_ready(struct sock *sk)
-- 
2.13.6
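
For context, ctx->saved_data_ready used above is the socket's original
data_ready callback, captured when software TLS is set up. A rough sketch of
that part of tls_set_sw_offload() (not the exact upstream hunk):

	sw_ctx->saved_data_ready = sk->sk_data_ready;
	sk->sk_data_ready = tls_data_ready;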



RE: Security enhancement proposal for kernel TLS

2018-07-30 Thread Vakul Garg
Sorry for a delayed response.
Kindly see inline.

> -Original Message-
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Wednesday, July 25, 2018 9:30 PM
> To: Vakul Garg 
> Cc: netdev@vger.kernel.org; Peter Doliwa ; Boris
> Pismenny 
> Subject: Re: Security enhancement proposal for kernel TLS
> 
> You would probably get more responses if you cc the relevant people.
> Comments inline
> 
> On 07/22/18 12:49 PM, Vakul Garg wrote:
> > The kernel based TLS record layer allows the user space world to use a
> decoupled TLS implementation.
> > The applications need not be linked with TLS stack.
> > The TLS handshake can be done by a TLS daemon on the behalf of
> applications.
> >
> > Presently, as soon as the handshake process derives keys, it pushes the
> negotiated keys to kernel TLS .
> > Thereafter the applications can directly read and write data on their TCP
> socket (without having to use SSL apis).
> >
> > With the current kernel TLS implementation, there is a security problem.
> > Since the kernel TLS socket does not have information about the state
> > of handshake, it allows applications to be able to receive data from the
> peer TLS endpoint even when the handshake verification has not been
> completed by the SSL daemon.
> > It is a security problem if applications can receive data if verification 
> > of the
> handshake transcript is not completed (done with processing tls FINISHED
> message).
> >
> > My proposal:
> > - Kernel TLS should maintain state of handshake (verified or
> unverified).
> > In un-verified state, data records should not be allowed pass through
> to the applications.
> >
> > - Add a new control interface using which that the user space SSL
> stack can tell the TLS socket that handshake has been verified and DATA
> records can flow.
> > In 'unverified' state, only control records should be allowed to pass
> and reception DATA record should be pause the receive side record
> decryption.
> 
> It's not entirely clear how your TLS handshake daemon works -   Why is
> it necessary to set the keys in the kernel tls socket before the handshake is
> completed? 

IIUC, with the upstream implementation of the TLS record layer in the kernel,
the decryption of the TLS FINISHED message happens in the kernel. Therefore
the keys are already being sent to the kernel TLS socket before the handshake
is completed.

> Or, why do you need to hand off the fd to the client program
> before the handshake is completed?
  
The fd is always owned by the client program.
The client program opens the socket, binds/connects it over TCP and then
hands it over to the SSL stack as a transport handle for exchanging handshake
messages. This is how it works today whether we use kernel TLS or not.
I do not propose to change it.

In my proposal, the applications poll their own TCP socket using read/recvmsg
etc. If they get a handshake record, they forward it to the entity running the
handshake agent. The handshake agent could be a Linux daemon or could run on a
separate security processor like a 'secure element' or, say, Arm TrustZone.
The applications forward any handshake message they get back from the handshake
agent to the connected TCP socket. Therefore, the applications act as a
forwarder of the handshake messages between the peer TLS endpoint and the
handshake agent. The received data messages are absorbed by the applications
themselves (bypassing the SSL stack completely). Similarly, the applications
transmit data directly by writing on their socket.

> Waiting until after handshake solves both of these issues.
 
The security-sensitive check 'wait for the handshake to finish completely
before accepting data' should not be the onus of the application. We have
enough examples in the past where application programmers made mistakes in
setting up TLS correctly. The idea is to isolate TLS session setup from the
applications.

> 
> I'm not aware of any tls libraries that send data before the finished message,
> is there any reason you need to support this?

Sending data records before sending the finished message is a protocol error.
A good TLS library never does that. But an attacker can exploit it if
applications can receive data records before the handshake is finished. With
the current kernel TLS, it is possible to do so.

Further, as per the TLS RFC it is ok to piggyback data records after the
finished handshake message. This is called early data. But then it is the
responsibility of the applications to first complete finished-message
processing before accepting the data records.

The proposal is to prevent the application world from seeing data records
before the handshake finishes.
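
The control interface proposed here does not exist in the kernel; a purely
hypothetical user-space sketch of what it could look like, where
TLS_HANDSHAKE_COMPLETE is an invented option name used only for illustration:

	int one = 1;

	/* Hypothetical option: tell the kernel TLS socket that the handshake
	 * transcript has been verified and DATA records may now flow.
	 */
	if (setsockopt(fd, SOL_TLS, TLS_HANDSHAKE_COMPLETE, &one, sizeof(one)) < 0)
		perror("setsockopt(TLS_HANDSHAKE_COMPLETE)");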

> 
> >
> > - The handshake state should fallback to 'unverified' in case a control
> record is seen again by k

RE: [PATCH net-next] net/tls: Calculate nsg for zerocopy path without skb_cow_data.

2018-08-06 Thread Vakul Garg



> -Original Message-
> From: Doron Roberts-Kedes [mailto:doro...@fb.com]
> Sent: Tuesday, August 7, 2018 12:02 AM
> To: Vakul Garg 
> Cc: David S . Miller ; Dave Watson
> ; Boris Pismenny ; Aviad
> Yehezkel ; netdev@vger.kernel.org
> Subject: Re: [PATCH net-next] net/tls: Calculate nsg for zerocopy path
> without skb_cow_data.
> 
> On Fri, Aug 03, 2018 at 11:49:02AM -0700, Doron Roberts-Kedes wrote:
> > On Fri, Aug 03, 2018 at 01:23:33AM +, Vakul Garg wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: Doron Roberts-Kedes [mailto:doro...@fb.com]
> > > > Sent: Friday, August 3, 2018 6:00 AM
> > > > To: David S . Miller 
> > > > Cc: Dave Watson ; Vakul Garg
> > > > ; Boris Pismenny ;
> Aviad
> > > > Yehezkel ; netdev@vger.kernel.org; Doron
> > > > Roberts-Kedes 
> > > > Subject: [PATCH net-next] net/tls: Calculate nsg for zerocopy path
> > > > without skb_cow_data.
> > > >
> > > > decrypt_skb fails if the number of sg elements required to map is
> > > > greater than MAX_SKB_FRAGS. As noted by Vakul Garg, nsg must
> > > > always be calculated, but skb_cow_data adds unnecessary memcpy's
> > > > for the zerocopy case.
> > > >
> > > > The new function skb_nsg calculates the number of scatterlist
> > > > elements required to map the skb without the extra overhead of
> > > > skb_cow_data. This function mimics the structure of skb_to_sgvec.
> > > >
> > > > Fixes: c46234ebb4d1 ("tls: RX path for ktls")
> > > > Signed-off-by: Doron Roberts-Kedes 
> > > > ---
> > > >  net/tls/tls_sw.c | 89
> > > > ++--
> > > >  1 file changed, 86 insertions(+), 3 deletions(-)
> > > >
> > > > memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
> > > > if (!sgout) {
> > > > -   nsg = skb_cow_data(skb, 0, &unused) + 1;
> > > > +   nsg = skb_cow_data(skb, 0, &unused);
> > > > +   } else {
> > > > +   nsg = skb_nsg(skb,
> > > > + rxm->offset + tls_ctx->rx.prepend_size,
> > > > + rxm->full_len - tls_ctx->rx.prepend_size);
> > > > +   if (nsg <= 0)
> > > > +   return nsg;
> > > Comparison should be (nsg < 1). TLS forbids '0' sized records.
> >
> > Yes true, v2 incoming
> >
> 
> Glancing at this a second time, I actually don't believe this should be
> changed. nsg <= 0 is equivalent to nsg < 1. 

Correct.


> Returning 0 if the record is
> 0 sized is the proper behavior here, since decrypting a zero-length skb is a 
> no-
> op. It is true that zero sized TLS records are forbidden, but it is confusing 
> to
> enforce that in this part of the code. I would be surpised to learn that
> tls_sw_recvmsg could be invoked with a len equal to 0, but if I'm wrong, and
> that case does need to be handled, then it should be in a different patch.


[PATCH net-next v1 0/1] net/tls: Combined memory allocation for decryption request

2018-08-08 Thread Vakul Garg
This patch does a combined memory allocation from the heap for the
scatterlists, aead_request, aad and iv for the tls record decryption path. In
the present code, the aead_request is allocated from the heap, while the
scatterlists are conditionally allocated either on the heap or on the stack.
This is inefficient as it may require multiple kmalloc/kfree calls.

The initialization vector passed in the decryption request is allocated on
the stack. This is a problem since stack memory is not dma-able from crypto
accelerators.

Doing one combined memory allocation for each decryption request fixes both
of the above issues. It also paves the way to submitting multiple async
decryption requests while the previous one is still pending, i.e. being
processed or queued.

This patch needs to be applied over Doron Roberts-Kedes's patch.
net/tls: Calculate nsg for zerocopy path without skb_cow_data.

Vakul Garg (1):
  net/tls: Combined memory allocation for decryption request

 include/net/tls.h |   4 -
 net/tls/tls_sw.c  | 258 ++
 2 files changed, 146 insertions(+), 116 deletions(-)

-- 
2.13.6



[PATCH net-next v1 1/1] net/tls: Combined memory allocation for decryption request

2018-08-08 Thread Vakul Garg
For preparing a decryption request, several memory chunks are required
(aead_req, sgin, sgout, iv, aad). For submitting the decrypt request to
an accelerator, it is required that the buffers which are read by the
accelerator are dma-able and do not come from the stack. The buffers for
aad and iv could each be kmalloced separately, but that is inefficient.
This patch does a combined allocation for preparing the decryption request
and then segments it into aead_req || sgin || sgout || iv || aad.

Signed-off-by: Vakul Garg 
---

This patch needs to be applied over Doron Roberts-Kedes's patch.
net/tls: Calculate nsg for zerocopy path without skb_cow_data.

Changes since RFC version of patch:
- Addressed Dave Watson's comment about removing a redundant 'else'.

 include/net/tls.h |   4 -
 net/tls/tls_sw.c  | 258 ++
 2 files changed, 146 insertions(+), 116 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index d8b3b6578c01..d5c683e8bb22 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -124,10 +124,6 @@ struct tls_sw_context_rx {
struct sk_buff *recv_pkt;
u8 control;
bool decrypted;
-
-   char rx_aad_ciphertext[TLS_AAD_SPACE_SIZE];
-   char rx_aad_plaintext[TLS_AAD_SPACE_SIZE];
-
 };
 
 struct tls_record_info {
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 55eb04415bb3..b148da3c0bf6 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -122,19 +122,13 @@ static int tls_do_decryption(struct sock *sk,
 struct scatterlist *sgout,
 char *iv_recv,
 size_t data_len,
-struct sk_buff *skb,
-gfp_t flags)
+struct aead_request *aead_req)
 {
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
-   struct aead_request *aead_req;
-
int ret;
 
-   aead_req = aead_request_alloc(ctx->aead_recv, flags);
-   if (!aead_req)
-   return -ENOMEM;
-
+   aead_request_set_tfm(aead_req, ctx->aead_recv);
aead_request_set_ad(aead_req, TLS_AAD_SPACE_SIZE);
aead_request_set_crypt(aead_req, sgin, sgout,
   data_len + tls_ctx->rx.tag_size,
@@ -143,8 +137,6 @@ static int tls_do_decryption(struct sock *sk,
  crypto_req_done, &ctx->async_wait);
 
	ret = crypto_wait_req(crypto_aead_decrypt(aead_req), &ctx->async_wait);
-
-   aead_request_free(aead_req);
return ret;
 }
 
@@ -731,8 +723,136 @@ static struct sk_buff *tls_wait_data(struct sock *sk, int 
flags,
return skb;
 }
 
+/* This function decrypts the input skb into either out_iov or in out_sg
+ * or in skb buffers itself. The input parameter 'zc' indicates if
+ * zero-copy mode needs to be tried or not. With zero-copy mode, either
+ * out_iov or out_sg must be non-NULL. In case both out_iov and out_sg are
+ * NULL, then the decryption happens inside skb buffers itself, i.e.
+ * zero-copy gets disabled and 'zc' is updated.
+ */
+
+static int decrypt_internal(struct sock *sk, struct sk_buff *skb,
+   struct iov_iter *out_iov,
+   struct scatterlist *out_sg,
+   int *chunk, bool *zc)
+{
+   struct tls_context *tls_ctx = tls_get_ctx(sk);
+   struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
+   struct strp_msg *rxm = strp_msg(skb);
+   int n_sgin, n_sgout, nsg, mem_size, aead_size, err, pages = 0;
+   struct aead_request *aead_req;
+   struct sk_buff *unused;
+   u8 *aad, *iv, *mem = NULL;
+   struct scatterlist *sgin = NULL;
+   struct scatterlist *sgout = NULL;
+   const int data_len = rxm->full_len - tls_ctx->rx.overhead_size;
+
+   if (*zc && (out_iov || out_sg)) {
+   if (out_iov)
+   n_sgout = iov_iter_npages(out_iov, INT_MAX) + 1;
+   else
+   n_sgout = sg_nents(out_sg);
+
+   n_sgin = skb_nsg(skb, rxm->offset + tls_ctx->rx.prepend_size,
+rxm->full_len - tls_ctx->rx.prepend_size);
+   } else {
+no_zerocopy:
+   n_sgout = 0;
+   *zc = false;
+   n_sgin = skb_cow_data(skb, 0, &unused);
+   }
+
+   if (n_sgin < 1)
+   return -EBADMSG;
+
+   /* Increment to accommodate AAD */
+   n_sgin = n_sgin + 1;
+
+   nsg = n_sgin + n_sgout;
+
+   aead_size = sizeof(*aead_req) + crypto_aead_reqsize(ctx->aead_recv);
+   mem_size = aead_size + (nsg * sizeof(struct scatterlist));
+   mem_size = mem_size + TLS_AAD_SPACE_SIZE;
+   mem_size = mem_size + crypto_aead_ivsize(ctx->aead_recv);
+
+   /* Allocate a single block of memory which contains
+* aead_req || sgin[] || sg

RE: [PATCH net-next][RFC] net/tls: Add support for async decryption of tls records

2018-08-15 Thread Vakul Garg


> -Original Message-
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Wednesday, August 15, 2018 10:26 PM
> To: Vakul Garg 
> Cc: netdev@vger.kernel.org; bor...@mellanox.com;
> avia...@mellanox.com; da...@davemloft.net
> Subject: Re: [PATCH net-next][RFC] net/tls: Add support for async decryption
> of tls records
> 
> On 08/14/18 07:47 PM, Vakul Garg wrote:
> > Incoming TLS records which are directly decrypted into user space
> > application buffer i.e. records which are decrypted in zero-copy mode
> > are submitted for async decryption. When the decryption cryptoapi
> > returns -EINPROGRESS, the next tls record is parsed and then submitted
> > for decryption. The references to records which has been sent for
> > async decryption are dropped. This happens in a loop for all the
> > records that can be decrypted in zero-copy mode. For records for which
> > decryption is not possible in zero-copy mode, asynchronous decryption
> > is not used and we wait for decryption crypto api to complete.
> >
> > For crypto requests executing in async fashion, the memory for
> > aead_request, sglists and skb etc is freed from the decryption
> > completion handler. The decryption completion handler wakesup the
> > sleeping user context. This happens when the user context is done
> > enqueueing all the crypto requests and is waiting for all the async
> > operations to finish. Since the splice() operation does not use
> > zero-copy decryption, async remains disabled for splice().
> 
> I found it a little hard to understand what you are trying to do based on the
> commit message.  
 
Ok, I will rewrite it. 

> Reading the code, it looks like if the recv() spans multiple
> TLS records, we will start decryption on all of them, and only wait right
> before recv() returns, yes?  This approach sounds great to me.
> 

Yes, that's the idea. I am firing as many decryption jobs as possible till I run
out of user application provided buffer space.

> Three of the selftests are failing for me:
> 
> [ FAIL ] tls.recv_partial
> [ FAIL ] tls.recv_peek
> [ FAIL ] tls.recv_peek_multiple
 
Will look into it.
Thanks for spending time in review my patch.
The patch is showing good performance benefits.


[PATCH net-next v1] net/tls: Add support for async decryption of tls records

2018-08-16 Thread Vakul Garg
When tls records are decrypted using asynchronous accelerators such as
the NXP CAAM engine, the crypto APIs return -EINPROGRESS. Presently, on
getting -EINPROGRESS, tls record processing stops until the crypto
accelerator finishes and returns the result. This incurs a context
switch and is not an efficient way of using the crypto accelerators.
Crypto accelerators work efficiently when they are queued with multiple
crypto jobs without having to wait for the previous ones to complete.

The patch submits multiple crypto requests without waiting for the
previous ones to complete. This has been implemented for records which
are decrypted in zero-copy mode. At the end of recvmsg(), we wait for
all the asynchronous decryption requests to complete.

The references to records which have been sent for async decryption are
dropped. For cases where record decryption is not possible in zero-copy
mode, asynchronous decryption is not used and we wait for the decryption
crypto API to complete.

For crypto requests executing in async fashion, the memory for the
aead_request, sglists, skb etc. is freed from the decryption
completion handler. The decryption completion handler wakes up the
sleeping user context when recvmsg() flags that it is done sending
all the decryption requests and there are no more decryption requests
pending to be completed.
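
As a rough illustration of this submit-many/wait-once bookkeeping (the
struct and function names below are invented; this mirrors the idea rather
than the exact patch code, and setup such as init_completion() and the
atomic_inc() per submitted request is assumed to happen elsewhere):

/* Editor's sketch of the pending-counter / completion pattern. */
#include <linux/atomic.h>
#include <linux/completion.h>
#include <linux/compiler.h>

struct rx_async_wait {
	atomic_t decrypt_pending;	/* async requests still in flight */
	bool async_notify;		/* recvmsg() has finished submitting */
	struct completion done;
};

/* Runs from the crypto completion callback, once per finished request. */
static void one_decrypt_finished(struct rx_async_wait *w)
{
	if (atomic_dec_return(&w->decrypt_pending) == 0 &&
	    READ_ONCE(w->async_notify))
		complete(&w->done);
}

/* Called at the end of recvmsg(): no more requests will be queued, so
 * sleep until the last outstanding one completes.
 */
static void wait_for_all_decrypts(struct rx_async_wait *w)
{
	WRITE_ONCE(w->async_notify, true);
	if (atomic_read(&w->decrypt_pending))
		wait_for_completion(&w->done);
}

The actual patch additionally records any error from the callback in the
wait context so that recvmsg() can report it after waking up.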

Signed-off-by: Vakul Garg 
---
Changes since RFC version:
1) Improved commit message.
2) Fixed dequeued record offset handling, because of which a few of the
   tls selftests ('recv_partial', 'recv_peek', 'recv_peek_multiple')
   were failing.

With this patch version, all tls selftests pass with both software-based
synchronous crypto and hardware-based asynchronous crypto.
Also, I verified the integrity of file contents transferred over a tls
session backed by the kernel-based record layer.

 include/net/tls.h |   6 +++
 net/tls/tls_sw.c  | 124 ++
 2 files changed, 122 insertions(+), 8 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index d5c683e8bb22..cd0a65bd92f9 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -124,6 +124,12 @@ struct tls_sw_context_rx {
struct sk_buff *recv_pkt;
u8 control;
bool decrypted;
+   atomic_t decrypt_pending;
+   bool async_notify;
+};
+
+struct decrypt_req_ctx {
+   struct sock *sk;
 };
 
 struct tls_record_info {
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 52fbe727d7c1..5d1e7c336807 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -43,12 +43,50 @@
 
 #define MAX_IV_SIZE  TLS_CIPHER_AES_GCM_128_IV_SIZE
 
+static void tls_decrypt_done(struct crypto_async_request *req, int err)
+{
+   struct aead_request *aead_req = (struct aead_request *)req;
+   struct decrypt_req_ctx *req_ctx =
+   (struct decrypt_req_ctx *)(aead_req + 1);
+
+   struct scatterlist *sgout = aead_req->dst;
+
+   struct tls_context *tls_ctx = tls_get_ctx(req_ctx->sk);
+   struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
+   int pending = atomic_dec_return(&ctx->decrypt_pending);
+   struct scatterlist *sg;
+   unsigned int pages;
+
+   /* Propagate if there was an err */
+   if (err) {
+   ctx->async_wait.err = err;
+   tls_err_abort(req_ctx->sk, err);
+   }
+
+   /* Release the skb, pages and memory allocated for crypto req */
+   kfree_skb(req->data);
+
+   /* Skip the first S/G entry as it points to AAD */
+   for_each_sg(sg_next(sgout), sg, UINT_MAX, pages) {
+   if (!sg)
+   break;
+   put_page(sg_page(sg));
+   }
+
+   kfree(aead_req);
+
+   if (!pending && READ_ONCE(ctx->async_notify))
+   complete(&ctx->async_wait.completion);
+}
+
 static int tls_do_decryption(struct sock *sk,
+struct sk_buff *skb,
 struct scatterlist *sgin,
 struct scatterlist *sgout,
 char *iv_recv,
 size_t data_len,
-struct aead_request *aead_req)
+struct aead_request *aead_req,
+bool async)
 {
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
@@ -59,10 +97,34 @@ static int tls_do_decryption(struct sock *sk,
aead_request_set_crypt(aead_req, sgin, sgout,
   data_len + tls_ctx->rx.tag_size,
   (u8 *)iv_recv);
-   aead_request_set_callback(aead_req, CRYPTO_TFM_REQ_MAY_BACKLOG,
- crypto_req_done, &ctx->async_wait);
 
-   ret = crypto_wait_req(crypto_aead_decrypt(aead_req), &ctx->async_wait);
+   if (async) {
+   struct decrypt_req_ctx *req_ctx;
+
+   

[PATCH net-next][RFC] net/tls: Add support for async decryption of tls records

2018-08-14 Thread Vakul Garg
Incoming TLS records which are decrypted directly into the user space
application buffer, i.e. records which are decrypted in zero-copy mode,
are submitted for async decryption. When the decryption crypto API
returns -EINPROGRESS, the next tls record is parsed and then submitted
for decryption. The references to records which have been sent for async
decryption are dropped. This happens in a loop for all the records that
can be decrypted in zero-copy mode. For records for which decryption is
not possible in zero-copy mode, asynchronous decryption is not used and
we wait for the decryption crypto API to complete.

For crypto requests executing in async fashion, the memory for the
aead_request, sglists, skb etc. is freed from the decryption
completion handler. The decryption completion handler wakes up the
sleeping user context. This happens when the user context is done
enqueueing all the crypto requests and is waiting for all the async
operations to finish. Since the splice() operation does not use
zero-copy decryption, async remains disabled for splice().

Signed-off-by: Vakul Garg 
---
 include/net/tls.h |   6 +++
 net/tls/tls_sw.c  | 134 +-
 2 files changed, 129 insertions(+), 11 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index d5c683e8bb22..cd0a65bd92f9 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -124,6 +124,12 @@ struct tls_sw_context_rx {
struct sk_buff *recv_pkt;
u8 control;
bool decrypted;
+   atomic_t decrypt_pending;
+   bool async_notify;
+};
+
+struct decrypt_req_ctx {
+   struct sock *sk;
 };
 
 struct tls_record_info {
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 52fbe727d7c1..e2f0df18b6cf 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -43,12 +43,50 @@
 
 #define MAX_IV_SIZE  TLS_CIPHER_AES_GCM_128_IV_SIZE
 
+static void tls_decrypt_done(struct crypto_async_request *req, int err)
+{
+   struct aead_request *aead_req = (struct aead_request *)req;
+   struct decrypt_req_ctx *req_ctx =
+   (struct decrypt_req_ctx *)(aead_req + 1);
+
+   struct scatterlist *sgout = aead_req->dst;
+
+   struct tls_context *tls_ctx = tls_get_ctx(req_ctx->sk);
+   struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
+   int pending = atomic_dec_return(&ctx->decrypt_pending);
+   struct scatterlist *sg;
+   unsigned int pages;
+
+   /* Propagate if there was an err */
+   if (err) {
+   ctx->async_wait.err = err;
+   tls_err_abort(req_ctx->sk, err);
+   }
+
+   /* Release the skb, pages and memory allocated for crypto req */
+   kfree_skb(req->data);
+
+   /* Skip the first S/G entry as it points to AAD */
+   for_each_sg(sg_next(sgout), sg, UINT_MAX, pages) {
+   if (!sg)
+   break;
+   put_page(sg_page(sg));
+   }
+
+   kfree(aead_req);
+
+   if (!pending && READ_ONCE(ctx->async_notify))
+   complete(&ctx->async_wait.completion);
+}
+
 static int tls_do_decryption(struct sock *sk,
+struct sk_buff *skb,
 struct scatterlist *sgin,
 struct scatterlist *sgout,
 char *iv_recv,
 size_t data_len,
-struct aead_request *aead_req)
+struct aead_request *aead_req,
+bool async)
 {
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
@@ -59,10 +97,34 @@ static int tls_do_decryption(struct sock *sk,
aead_request_set_crypt(aead_req, sgin, sgout,
   data_len + tls_ctx->rx.tag_size,
   (u8 *)iv_recv);
-   aead_request_set_callback(aead_req, CRYPTO_TFM_REQ_MAY_BACKLOG,
- crypto_req_done, &ctx->async_wait);
 
-   ret = crypto_wait_req(crypto_aead_decrypt(aead_req), &ctx->async_wait);
+   if (async) {
+   struct decrypt_req_ctx *req_ctx;
+
+   req_ctx = (struct decrypt_req_ctx *)(aead_req + 1);
+   req_ctx->sk = sk;
+
+   aead_request_set_callback(aead_req,
+ CRYPTO_TFM_REQ_MAY_BACKLOG,
+ tls_decrypt_done, skb);
+   atomic_inc(&ctx->decrypt_pending);
+   } else {
+   aead_request_set_callback(aead_req,
+ CRYPTO_TFM_REQ_MAY_BACKLOG,
+ crypto_req_done, &ctx->async_wait);
+   }
+
+   ret = crypto_aead_decrypt(aead_req);
+   if (ret == -EINPROGRESS) {
+   if (async)
+   return ret;
+
+   ret = crypto_wait_req(ret, &ctx->async_wait);
+   

[PATCH RFC net-next 0/1] net/tls: Combined memory allocation for decryption request

2018-08-06 Thread Vakul Garg
This patch does a combined memory allocation from the heap for the
scatterlists, aead_request, aad and iv on the tls record decryption path.
In the present code, aead_request is allocated from the heap, while the
scatterlists are conditionally allocated either on the heap or on the
stack. This is inefficient as it may require multiple kmalloc/kfree calls.

The initialization vector passed in the decryption request is allocated on
the stack. This is a problem since stack memory is not dma-able from
crypto accelerators.

Doing one combined memory allocation for each decryption request fixes
both of the above issues. It also paves the way to submitting multiple
async decryption requests while the previous one is still pending, i.e.
being processed or queued.

This patch has been built over Doron Roberts-Kedes's patch:

"net/tls: Calculate nsg for zerocopy path without skb_cow_data"


Vakul Garg (1):
  net/tls: Combined memory allocation for decryption request

 include/net/tls.h |   4 -
 net/tls/tls_sw.c  | 256 +++---
 2 files changed, 147 insertions(+), 113 deletions(-)

-- 
2.13.6



[PATCH RFC net-next 1/1] net/tls: Combined memory allocation for decryption request

2018-08-06 Thread Vakul Garg
For preparing a decryption request, several memory chunks are required
(aead_req, sgin, sgout, iv, aad). For submitting the decrypt request to
an accelerator, the buffers which are read by the accelerator must be
dma-able and must not come from the stack. The buffers for aad and iv
could each be kmalloced separately, but that is inefficient.
This patch does one combined allocation for preparing the decryption
request and then segments it into aead_req || sgin || sgout || iv || aad.

Signed-off-by: Vakul Garg 
---
 include/net/tls.h |   4 -
 net/tls/tls_sw.c  | 257 +++---
 2 files changed, 148 insertions(+), 113 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index d8b3b6578c01..d5c683e8bb22 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -124,10 +124,6 @@ struct tls_sw_context_rx {
struct sk_buff *recv_pkt;
u8 control;
bool decrypted;
-
-   char rx_aad_ciphertext[TLS_AAD_SPACE_SIZE];
-   char rx_aad_plaintext[TLS_AAD_SPACE_SIZE];
-
 };
 
 struct tls_record_info {
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index cd5ed2d1dbe8..a478b06fc015 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -118,19 +118,13 @@ static int tls_do_decryption(struct sock *sk,
 struct scatterlist *sgout,
 char *iv_recv,
 size_t data_len,
-struct sk_buff *skb,
-gfp_t flags)
+struct aead_request *aead_req)
 {
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
-   struct aead_request *aead_req;
-
int ret;
 
-   aead_req = aead_request_alloc(ctx->aead_recv, flags);
-   if (!aead_req)
-   return -ENOMEM;
-
+   aead_request_set_tfm(aead_req, ctx->aead_recv);
aead_request_set_ad(aead_req, TLS_AAD_SPACE_SIZE);
aead_request_set_crypt(aead_req, sgin, sgout,
   data_len + tls_ctx->rx.tag_size,
@@ -139,8 +133,6 @@ static int tls_do_decryption(struct sock *sk,
  crypto_req_done, &ctx->async_wait);
 
	ret = crypto_wait_req(crypto_aead_decrypt(aead_req), &ctx->async_wait);
-
-   aead_request_free(aead_req);
return ret;
 }
 
@@ -727,8 +719,138 @@ static struct sk_buff *tls_wait_data(struct sock *sk, int 
flags,
return skb;
 }
 
+/* This function decrypts the input skb into either out_iov or in out_sg
+ * or in skb buffers itself. The input parameter 'zc' indicates if
+ * zero-copy mode needs to be tried or not. With zero-copy mode, either
+ * out_iov or out_sg must be non-NULL. In case both out_iov and out_sg are
+ * NULL, then the decryption happens inside skb buffers itself, i.e.
+ * zero-copy gets disabled and 'zc' is updated.
+ */
+
+static int decrypt_internal(struct sock *sk, struct sk_buff *skb,
+   struct iov_iter *out_iov,
+   struct scatterlist *out_sg,
+   int *chunk, bool *zc)
+{
+   struct tls_context *tls_ctx = tls_get_ctx(sk);
+   struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
+   struct strp_msg *rxm = strp_msg(skb);
+   int n_sgin, n_sgout, nsg, mem_size, aead_size, err, pages = 0;
+   struct aead_request *aead_req;
+   struct sk_buff *unused;
+   u8 *aad, *iv, *mem = NULL;
+   struct scatterlist *sgin = NULL;
+   struct scatterlist *sgout = NULL;
+   const int data_len = rxm->full_len - tls_ctx->rx.overhead_size;
+
+   if (*zc && (out_iov || out_sg)) {
+   if (out_iov)
+   n_sgout = iov_iter_npages(out_iov, INT_MAX) + 1;
+   else if (out_sg)
+   n_sgout = sg_nents(out_sg);
+   else
+   goto no_zerocopy;
+
+   n_sgin = skb_nsg(skb, rxm->offset + tls_ctx->rx.prepend_size,
+rxm->full_len - tls_ctx->rx.prepend_size);
+   } else {
+no_zerocopy:
+   n_sgout = 0;
+   *zc = false;
+   n_sgin = skb_cow_data(skb, 0, &unused);
+   }
+
+   if (n_sgin < 1)
+   return -EBADMSG;
+
+   /* Increment to accommodate AAD */
+   n_sgin = n_sgin + 1;
+
+   nsg = n_sgin + n_sgout;
+
+   aead_size = sizeof(*aead_req) + crypto_aead_reqsize(ctx->aead_recv);
+   mem_size = aead_size + (nsg * sizeof(struct scatterlist));
+   mem_size = mem_size + TLS_AAD_SPACE_SIZE;
+   mem_size = mem_size + crypto_aead_ivsize(ctx->aead_recv);
+
+   /* Allocate a single block of memory which contains
+* aead_req || sgin[] || sgout[] || aad || iv.
+* This order achieves correct alignment for aead_req, sgin, sgout.
+*/
+   mem = kmalloc(mem_size, sk->sk_allocati

RE: [PATCH RFC net-next 1/1] net/tls: Combined memory allocation for decryption request

2018-08-07 Thread Vakul Garg



> -Original Message-
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Tuesday, August 7, 2018 8:26 PM
> To: Vakul Garg 
> Cc: netdev@vger.kernel.org; bor...@mellanox.com;
> avia...@mellanox.com; da...@davemloft.net
> Subject: Re: [PATCH RFC net-next 1/1] net/tls: Combined memory allocation
> for decryption request
> 
> Hi Vakul,
> 
> Only minor comments, mostly looks good to me.  Thanks
> 
> > +/* This function decrypts the input skb into either out_iov or in
> > +out_sg
> > + * or in skb buffers itself. The input parameter 'zc' indicates if
> > + * zero-copy mode needs to be tried or not. With zero-copy mode,
> > +either
> > + * out_iov or out_sg must be non-NULL. In case both out_iov and
> > +out_sg are
> > + * NULL, then the decryption happens inside skb buffers itself, i.e.
> > + * zero-copy gets disabled and 'zc' is updated.
> > + */
> > +
> > +static int decrypt_internal(struct sock *sk, struct sk_buff *skb,
> > +   struct iov_iter *out_iov,
> > +   struct scatterlist *out_sg,
> > +   int *chunk, bool *zc)
> > +{
> > +   struct tls_context *tls_ctx = tls_get_ctx(sk);
> > +   struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
> > +   struct strp_msg *rxm = strp_msg(skb);
> > +   int n_sgin, n_sgout, nsg, mem_size, aead_size, err, pages = 0;
> > +   struct aead_request *aead_req;
> > +   struct sk_buff *unused;
> > +   u8 *aad, *iv, *mem = NULL;
> > +   struct scatterlist *sgin = NULL;
> > +   struct scatterlist *sgout = NULL;
> > +   const int data_len = rxm->full_len - tls_ctx->rx.overhead_size;
> > +
> > +   if (*zc && (out_iov || out_sg)) {
> > +   if (out_iov)
> > +   n_sgout = iov_iter_npages(out_iov, INT_MAX) + 1;
> > +   else if (out_sg)
> > +   n_sgout = sg_nents(out_sg);
> > +   else
> > +   goto no_zerocopy;
> 
> Is the last else necessary?  It looks like the if already checks for out_iov 
> ||
> out_sg.
> 
You are right. I can remove the 'else'. The 'else if' can also be replaced with 'else'.
I will submit v2.


> > struct scatterlist *sgout)
> >  {
> > -   struct tls_context *tls_ctx = tls_get_ctx(sk);
> > -   struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
> > -   char iv[TLS_CIPHER_AES_GCM_128_SALT_SIZE + MAX_IV_SIZE];
> > -   struct scatterlist sgin_arr[MAX_SKB_FRAGS + 2];
> > > -   struct scatterlist *sgin = &sgin_arr[0];
> > -   struct strp_msg *rxm = strp_msg(skb);
> > -   int ret, nsg;
> > -   struct sk_buff *unused;
> > -
> > -   ret = skb_copy_bits(skb, rxm->offset + TLS_HEADER_SIZE,
> > -   iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE,
> > -   tls_ctx->rx.iv_size);
> > -   if (ret < 0)
> > -   return ret;
> > -
> > -   memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
> > -   if (!sgout) {
> > > -   nsg = skb_cow_data(skb, 0, &unused);
> > -   } else {
> > -   nsg = skb_nsg(skb,
> > - rxm->offset + tls_ctx->rx.prepend_size,
> > - rxm->full_len - tls_ctx->rx.prepend_size);
> > -   if (nsg <= 0)
> > -   return nsg;
> > -   }
> > -
> > -   // We need one extra for ctx->rx_aad_ciphertext
> > -   nsg++;
> > -
> > -   if (nsg > ARRAY_SIZE(sgin_arr))
> > -   sgin = kmalloc_array(nsg, sizeof(*sgin), sk->sk_allocation);
> > -
> > -   if (!sgout)
> > -   sgout = sgin;
> > -
> > -   sg_init_table(sgin, nsg);
> > > -   sg_set_buf(&sgin[0], ctx->rx_aad_ciphertext, TLS_AAD_SPACE_SIZE);
> > -
> > > -   nsg = skb_to_sgvec(skb, &sgin[1],
> > -  rxm->offset + tls_ctx->rx.prepend_size,
> > -  rxm->full_len - tls_ctx->rx.prepend_size);
> > -   if (nsg < 0) {
> > -   ret = nsg;
> > -   goto out;
> > -   }
> > -
> > -   tls_make_aad(ctx->rx_aad_ciphertext,
> > -rxm->full_len - tls_ctx->rx.overhead_size,
> > -tls_ctx->rx.rec_seq,
> > -tls_ctx->rx.rec_seq_size,
> > -ctx->control);
> > -
> > -   ret = tls_do_decryption(sk, sgin, sgout, iv,
> > -   rxm->full_len - tls_ctx->rx.overhead_size,
> > -   skb, sk->sk_allocation);
> > -
> > -out:
> &g

RE: [PATCH net-next,v3] net/tls: Calculate nsg for zerocopy path without skb_cow_data.

2018-08-07 Thread Vakul Garg



> -Original Message-
> From: Doron Roberts-Kedes [mailto:doro...@fb.com]
> Sent: Tuesday, August 7, 2018 1:18 AM
> To: David S . Miller 
> Cc: Vakul Garg ; Dave Watson
> ; Boris Pismenny ; Aviad
> Yehezkel ; netdev@vger.kernel.org; Doron
> Roberts-Kedes 
> Subject: [PATCH net-next,v3] net/tls: Calculate nsg for zerocopy path
> without skb_cow_data.
> 
> decrypt_skb fails if the number of sg elements required to map is
> greater than MAX_SKB_FRAGS. As noted by Vakul Garg, nsg must always be
> calculated, but skb_cow_data adds unnecessary memcpy's for the zerocopy
> case.
> 
> The new function skb_nsg calculates the number of scatterlist elements
> required to map the skb without the extra overhead of skb_cow_data. This
> function mimics the structure of skb_to_sgvec.
> 
> Fixes: c46234ebb4d1 ("tls: RX path for ktls")
> Signed-off-by: Doron Roberts-Kedes 
> ---
>  net/tls/tls_sw.c | 96
> ++--
>  1 file changed, 93 insertions(+), 3 deletions(-)
> 
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index ff3a6904a722..eb87f931a0d6 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -43,6 +43,80 @@
> 
>  #define MAX_IV_SIZE  TLS_CIPHER_AES_GCM_128_IV_SIZE
> 
> +static int __skb_nsg(struct sk_buff *skb, int offset, int len,
> +  unsigned int recursion_level)
> +{
> + int start = skb_headlen(skb);
> + int i, copy = start - offset;
> + struct sk_buff *frag_iter;
> + int elt = 0;
> +
> + if (unlikely(recursion_level >= 24))
> + return -EMSGSIZE;
> +
> + if (copy > 0) {
> + if (copy > len)
> + copy = len;
> + elt++;
> + len -= copy;
> + if (len == 0)
> + return elt;
> + offset += copy;
> + }
> +
> + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
> + int end;
> +
> + WARN_ON(start > offset + len);
> +
> + end = start + skb_frag_size(&skb_shinfo(skb)->frags[i]);
> + copy = end - offset;
> + if (copy > 0) {
> + if (copy > len)
> + copy = len;
> + elt++;
> + len -= copy;
> + if (len == 0)
> + return elt;
> + offset += copy;
> + }
> + start = end;
> + }
> +
> + skb_walk_frags(skb, frag_iter) {
> + int end, ret;
> +
> + WARN_ON(start > offset + len);
> +
> + end = start + frag_iter->len;
> + copy = end - offset;
> + if (copy > 0) {
> + if (copy > len)
> + copy = len;
> + ret = __skb_nsg(frag_iter, offset - start, copy,
> + recursion_level + 1);
> + if (unlikely(ret < 0))
> + return ret;
> + elt += ret;
> + len -= copy;
> + if (len == 0)
> + return elt;
> + offset += copy;
> + }
> + start = end;
> + }
> + BUG_ON(len);
> + return elt;
> +}
> +
> +/* Return the number of scatterlist elements required to completely map
> the
> + * skb, or -EMSGSIZE if the recursion depth is exceeded.
> + */
> +static int skb_nsg(struct sk_buff *skb, int offset, int len)
> +{
> + return __skb_nsg(skb, offset, len, 0);
> +}
> +
>  static int tls_do_decryption(struct sock *sk,
>struct scatterlist *sgin,
>struct scatterlist *sgout,
> @@ -693,7 +767,7 @@ int decrypt_skb(struct sock *sk, struct sk_buff *skb,
>   struct scatterlist sgin_arr[MAX_SKB_FRAGS + 2];
>   struct scatterlist *sgin = &sgin_arr[0];
>   struct strp_msg *rxm = strp_msg(skb);
> - int ret, nsg = ARRAY_SIZE(sgin_arr);
> + int ret, nsg;
>   struct sk_buff *unused;
> 
>   ret = skb_copy_bits(skb, rxm->offset + TLS_HEADER_SIZE,
> @@ -704,11 +778,27 @@ int decrypt_skb(struct sock *sk, struct sk_buff
> *skb,
> 
>   memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
>   if (!sgout) {
> - nsg = skb_cow_data(skb, 0, &unused) + 1;
> + nsg = skb_cow_data(skb, 0, &unused);
> + } else {
> + nsg = skb_nsg(skb,
> +   rxm->offset + tls_ctx->rx.prepend_size,
> + 

[PATCH net-next v3 0/1] net/tls: Combined memory allocation for decryption request

2018-08-10 Thread Vakul Garg
This patch does a combined memory allocation from the heap for the
scatterlists, aead_request, aad and iv on the tls record decryption path.
In the present code, aead_request is allocated from the heap, while the
scatterlists are conditionally allocated either on the heap or on the
stack. This is inefficient as it may require multiple kmalloc/kfree calls.

The initialization vector passed in the decryption request is allocated on
the stack. This is a problem since stack memory is not dma-able from
crypto accelerators.

Doing one combined memory allocation for each decryption request fixes
both of the above issues. It also paves the way to submitting multiple
async decryption requests while the previous one is still pending, i.e.
being processed or queued.


Vakul Garg (1):
  net/tls: Combined memory allocation for decryption request

 include/net/tls.h |   4 -
 net/tls/tls_sw.c  | 238 --
 2 files changed, 142 insertions(+), 100 deletions(-)

-- 
2.13.6



[PATCH net-next v3 1/1] net/tls: Combined memory allocation for decryption request

2018-08-10 Thread Vakul Garg
For preparing a decryption request, several memory chunks are required
(aead_req, sgin, sgout, iv, aad). For submitting the decrypt request to
an accelerator, the buffers which are read by the accelerator must be
dma-able and must not come from the stack. The buffers for aad and iv
could each be kmalloced separately, but that is inefficient.
This patch does one combined allocation for preparing the decryption
request and then segments it into aead_req || sgin || sgout || iv || aad.

Signed-off-by: Vakul Garg 
---

Changes since v2:
- Rebased so as not to require the following patch from Doron Roberts-Kedes:
  "net/tls: Calculate nsg for zerocopy path without skb_cow_data."


 include/net/tls.h |   4 -
 net/tls/tls_sw.c  | 238 --
 2 files changed, 142 insertions(+), 100 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index d8b3b6578c01..d5c683e8bb22 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -124,10 +124,6 @@ struct tls_sw_context_rx {
struct sk_buff *recv_pkt;
u8 control;
bool decrypted;
-
-   char rx_aad_ciphertext[TLS_AAD_SPACE_SIZE];
-   char rx_aad_plaintext[TLS_AAD_SPACE_SIZE];
-
 };
 
 struct tls_record_info {
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 83d67df33f0c..52fbe727d7c1 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -48,19 +48,13 @@ static int tls_do_decryption(struct sock *sk,
 struct scatterlist *sgout,
 char *iv_recv,
 size_t data_len,
-struct sk_buff *skb,
-gfp_t flags)
+struct aead_request *aead_req)
 {
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
-   struct aead_request *aead_req;
-
int ret;
 
-   aead_req = aead_request_alloc(ctx->aead_recv, flags);
-   if (!aead_req)
-   return -ENOMEM;
-
+   aead_request_set_tfm(aead_req, ctx->aead_recv);
aead_request_set_ad(aead_req, TLS_AAD_SPACE_SIZE);
aead_request_set_crypt(aead_req, sgin, sgout,
   data_len + tls_ctx->rx.tag_size,
@@ -69,8 +63,6 @@ static int tls_do_decryption(struct sock *sk,
  crypto_req_done, &ctx->async_wait);
 
	ret = crypto_wait_req(crypto_aead_decrypt(aead_req), &ctx->async_wait);
-
-   aead_request_free(aead_req);
return ret;
 }
 
@@ -657,8 +649,132 @@ static struct sk_buff *tls_wait_data(struct sock *sk, int 
flags,
return skb;
 }
 
+/* This function decrypts the input skb into either out_iov or in out_sg
+ * or in skb buffers itself. The input parameter 'zc' indicates if
+ * zero-copy mode needs to be tried or not. With zero-copy mode, either
+ * out_iov or out_sg must be non-NULL. In case both out_iov and out_sg are
+ * NULL, then the decryption happens inside skb buffers itself, i.e.
+ * zero-copy gets disabled and 'zc' is updated.
+ */
+
+static int decrypt_internal(struct sock *sk, struct sk_buff *skb,
+   struct iov_iter *out_iov,
+   struct scatterlist *out_sg,
+   int *chunk, bool *zc)
+{
+   struct tls_context *tls_ctx = tls_get_ctx(sk);
+   struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
+   struct strp_msg *rxm = strp_msg(skb);
+   int n_sgin, n_sgout, nsg, mem_size, aead_size, err, pages = 0;
+   struct aead_request *aead_req;
+   struct sk_buff *unused;
+   u8 *aad, *iv, *mem = NULL;
+   struct scatterlist *sgin = NULL;
+   struct scatterlist *sgout = NULL;
+   const int data_len = rxm->full_len - tls_ctx->rx.overhead_size;
+
+   if (*zc && (out_iov || out_sg)) {
+   if (out_iov)
+   n_sgout = iov_iter_npages(out_iov, INT_MAX) + 1;
+   else
+   n_sgout = sg_nents(out_sg);
+   } else {
+   n_sgout = 0;
+   *zc = false;
+   }
+
+   n_sgin = skb_cow_data(skb, 0, &unused);
+   if (n_sgin < 1)
+   return -EBADMSG;
+
+   /* Increment to accommodate AAD */
+   n_sgin = n_sgin + 1;
+
+   nsg = n_sgin + n_sgout;
+
+   aead_size = sizeof(*aead_req) + crypto_aead_reqsize(ctx->aead_recv);
+   mem_size = aead_size + (nsg * sizeof(struct scatterlist));
+   mem_size = mem_size + TLS_AAD_SPACE_SIZE;
+   mem_size = mem_size + crypto_aead_ivsize(ctx->aead_recv);
+
+   /* Allocate a single block of memory which contains
+* aead_req || sgin[] || sgout[] || aad || iv.
+* This order achieves correct alignment for aead_req, sgin, sgout.
+*/
+   mem = kmalloc(mem_size, sk->sk_allocation);
+   if (!mem)
+   return -ENOMEM;
+
+   /* Segment the allocated memory */
+   

RE: [PATCH net-next v2 1/1] net/tls: Combined memory allocation for decryption request

2018-08-10 Thread Vakul Garg


> -Original Message-
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Thursday, August 9, 2018 9:56 PM
> To: Vakul Garg 
> Cc: netdev@vger.kernel.org; bor...@mellanox.com;
> avia...@mellanox.com; da...@davemloft.net
> Subject: Re: [PATCH net-next v2 1/1] net/tls: Combined memory allocation
> for decryption request
> 
> On 08/09/18 04:56 AM, Vakul Garg wrote:
> > For preparing decryption request, several memory chunks are required
> > (aead_req, sgin, sgout, iv, aad). For submitting the decrypt request
> > to an accelerator, it is required that the buffers which are read by
> > the accelerator must be dma-able and not come from stack. The buffers
> > for aad and iv can be separately kmalloced each, but it is inefficient.
> > This patch does a combined allocation for preparing decryption request
> > and then segments into aead_req || sgin || sgout || iv || aad.
> >
> > Signed-off-by: Vakul Garg 
> 
> Reviewed-by: Dave Watson 
> 
> Thanks, looks good to me now.

I had to rebase the patch so as not to depend on any other patch which is
still under review.
Since Doron's skb_nsg() patch needs rework, I submitted v3 after removing
the reference to the function skb_nsg() and replacing it with
skb_cow_data().
Kindly review.
Kindly review.
 



[PATCH net-next v2 1/1] net/tls: Combined memory allocation for decryption request

2018-08-08 Thread Vakul Garg
For preparing a decryption request, several memory chunks are required
(aead_req, sgin, sgout, iv, aad). For submitting the decrypt request to
an accelerator, the buffers which are read by the accelerator must be
dma-able and must not come from the stack. The buffers for aad and iv
could each be kmalloced separately, but that is inefficient.
This patch does one combined allocation for preparing the decryption
request and then segments it into aead_req || sgin || sgout || iv || aad.

Signed-off-by: Vakul Garg 
---
This patch needs to be applied over Doron Roberts-Kedes's patch:
"net/tls: Calculate nsg for zerocopy path without skb_cow_data."

Changes since patch v1:
- Removed unused label 'no_zerocopy'


 include/net/tls.h |   4 -
 net/tls/tls_sw.c  | 257 ++
 2 files changed, 145 insertions(+), 116 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index d8b3b6578c01..d5c683e8bb22 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -124,10 +124,6 @@ struct tls_sw_context_rx {
struct sk_buff *recv_pkt;
u8 control;
bool decrypted;
-
-   char rx_aad_ciphertext[TLS_AAD_SPACE_SIZE];
-   char rx_aad_plaintext[TLS_AAD_SPACE_SIZE];
-
 };
 
 struct tls_record_info {
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 55eb04415bb3..13e365011347 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -122,19 +122,13 @@ static int tls_do_decryption(struct sock *sk,
 struct scatterlist *sgout,
 char *iv_recv,
 size_t data_len,
-struct sk_buff *skb,
-gfp_t flags)
+struct aead_request *aead_req)
 {
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
-   struct aead_request *aead_req;
-
int ret;
 
-   aead_req = aead_request_alloc(ctx->aead_recv, flags);
-   if (!aead_req)
-   return -ENOMEM;
-
+   aead_request_set_tfm(aead_req, ctx->aead_recv);
aead_request_set_ad(aead_req, TLS_AAD_SPACE_SIZE);
aead_request_set_crypt(aead_req, sgin, sgout,
   data_len + tls_ctx->rx.tag_size,
@@ -143,8 +137,6 @@ static int tls_do_decryption(struct sock *sk,
  crypto_req_done, &ctx->async_wait);
 
	ret = crypto_wait_req(crypto_aead_decrypt(aead_req), &ctx->async_wait);
-
-   aead_request_free(aead_req);
return ret;
 }
 
@@ -731,8 +723,135 @@ static struct sk_buff *tls_wait_data(struct sock *sk, int 
flags,
return skb;
 }
 
+/* This function decrypts the input skb into either out_iov or in out_sg
+ * or in skb buffers itself. The input parameter 'zc' indicates if
+ * zero-copy mode needs to be tried or not. With zero-copy mode, either
+ * out_iov or out_sg must be non-NULL. In case both out_iov and out_sg are
+ * NULL, then the decryption happens inside skb buffers itself, i.e.
+ * zero-copy gets disabled and 'zc' is updated.
+ */
+
+static int decrypt_internal(struct sock *sk, struct sk_buff *skb,
+   struct iov_iter *out_iov,
+   struct scatterlist *out_sg,
+   int *chunk, bool *zc)
+{
+   struct tls_context *tls_ctx = tls_get_ctx(sk);
+   struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
+   struct strp_msg *rxm = strp_msg(skb);
+   int n_sgin, n_sgout, nsg, mem_size, aead_size, err, pages = 0;
+   struct aead_request *aead_req;
+   struct sk_buff *unused;
+   u8 *aad, *iv, *mem = NULL;
+   struct scatterlist *sgin = NULL;
+   struct scatterlist *sgout = NULL;
+   const int data_len = rxm->full_len - tls_ctx->rx.overhead_size;
+
+   if (*zc && (out_iov || out_sg)) {
+   if (out_iov)
+   n_sgout = iov_iter_npages(out_iov, INT_MAX) + 1;
+   else
+   n_sgout = sg_nents(out_sg);
+
+   n_sgin = skb_nsg(skb, rxm->offset + tls_ctx->rx.prepend_size,
+rxm->full_len - tls_ctx->rx.prepend_size);
+   } else {
+   n_sgout = 0;
+   *zc = false;
+   n_sgin = skb_cow_data(skb, 0, &unused);
+   }
+
+   if (n_sgin < 1)
+   return -EBADMSG;
+
+   /* Increment to accommodate AAD */
+   n_sgin = n_sgin + 1;
+
+   nsg = n_sgin + n_sgout;
+
+   aead_size = sizeof(*aead_req) + crypto_aead_reqsize(ctx->aead_recv);
+   mem_size = aead_size + (nsg * sizeof(struct scatterlist));
+   mem_size = mem_size + TLS_AAD_SPACE_SIZE;
+   mem_size = mem_size + crypto_aead_ivsize(ctx->aead_recv);
+
+   /* Allocate a single block of memory which contains
+* aead_req || sgin[] || sgout[] || aad || iv.
+* This order ac

RE: [PATCH net-next v1 1/1] net/tls: Combined memory allocation for decryption request

2018-08-08 Thread Vakul Garg



> -Original Message-
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Wednesday, August 8, 2018 10:37 PM
> To: Vakul Garg 
> Cc: netdev@vger.kernel.org; bor...@mellanox.com;
> avia...@mellanox.com; da...@davemloft.net
> Subject: Re: [PATCH net-next v1 1/1] net/tls: Combined memory allocation
> for decryption request
> 
> On 08/08/18 06:36 PM, Vakul Garg wrote:
> > For preparing decryption request, several memory chunks are required
> > (aead_req, sgin, sgout, iv, aad). For submitting the decrypt request
> > to an accelerator, it is required that the buffers which are read by
> > the accelerator must be dma-able and not come from stack. The buffers
> > for aad and iv can be separately kmalloced each, but it is inefficient.
> > This patch does a combined allocation for preparing decryption request
> > and then segments into aead_req || sgin || sgout || iv || aad.
> >
> > Signed-off-by: Vakul Garg 
> > ---
> > +   n_sgout = sg_nents(out_sg);
> > +
> > +   n_sgin = skb_nsg(skb, rxm->offset + tls_ctx->rx.prepend_size,
> > +rxm->full_len - tls_ctx->rx.prepend_size);
> > +   } else {
> > +no_zerocopy:
> 
> label is unused now

Thanks for reviewing the patch.
I wish the compiler could have warned about the unused label.
I have submitted v2.


[PATCH net-next v2 0/1] net/tls: Combined memory allocation for decryption request

2018-08-08 Thread Vakul Garg
This patch does a combined memory allocation from the heap for the
scatterlists, aead_request, aad and iv on the tls record decryption path.
In the present code, aead_request is allocated from the heap, while the
scatterlists are conditionally allocated either on the heap or on the
stack. This is inefficient as it may require multiple kmalloc/kfree calls.

The initialization vector passed in the decryption request is allocated on
the stack. This is a problem since stack memory is not dma-able from
crypto accelerators.

Doing one combined memory allocation for each decryption request fixes
both of the above issues. It also paves the way to submitting multiple
async decryption requests while the previous one is still pending, i.e.
being processed or queued.

This patch needs to be applied over Doron Roberts-Kedes's patch:
"net/tls: Calculate nsg for zerocopy path without skb_cow_data."



Vakul Garg (1):
  net/tls: Combined memory allocation for decryption request

 include/net/tls.h |   4 -
 net/tls/tls_sw.c  | 257 ++
 2 files changed, 145 insertions(+), 116 deletions(-)

-- 
2.13.6



[PATCH net-next v1] selftests/tls: Add test for recv(PEEK) spanning across multiple records

2018-08-28 Thread Vakul Garg
Added a test case that receives multiple records with a single recvmsg()
operation with the MSG_PEEK flag set.
---
 tools/testing/selftests/net/tls.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/tools/testing/selftests/net/tls.c 
b/tools/testing/selftests/net/tls.c
index b3ebf2646e52..07daff076ce0 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -502,6 +502,28 @@ TEST_F(tls, recv_peek_multiple)
EXPECT_EQ(memcmp(test_str, buf, send_len), 0);
 }
 
+TEST_F(tls, recv_peek_large_buf_mult_recs)
+{
+   char const *test_str = "test_read_peek_mult_recs";
+   char const *test_str_first = "test_read_peek";
+   char const *test_str_second = "_mult_recs";
+   int len;
+   char buf[64];
+
+   len = strlen(test_str_first);
+   EXPECT_EQ(send(self->fd, test_str_first, len, 0), len);
+
+   len = strlen(test_str_second) + 1;
+   EXPECT_EQ(send(self->fd, test_str_second, len, 0), len);
+
+   len = sizeof(buf);
+   memset(buf, 0, len);
+   EXPECT_NE(recv(self->cfd, buf, len, MSG_PEEK), -1);
+
+   len = strlen(test_str) + 1;
+   EXPECT_EQ(memcmp(test_str, buf, len), 0);
+}
+
 TEST_F(tls, pollin)
 {
char const *test_str = "test_poll";
-- 
2.13.6



[PATCH net-next v2] net/tls: Add support for async decryption of tls records

2018-08-28 Thread Vakul Garg
When tls records are decrypted using asynchronous accelerators such as
the NXP CAAM engine, the crypto APIs return -EINPROGRESS. Presently, on
getting -EINPROGRESS, tls record processing stops until the crypto
accelerator finishes and returns the result. This incurs a context
switch and is not an efficient way of using the crypto accelerators.
Crypto accelerators work efficiently when they are queued with multiple
crypto jobs without having to wait for the previous ones to complete.

The patch submits multiple crypto requests without waiting for the
previous ones to complete. This has been implemented for records which
are decrypted in zero-copy mode. At the end of recvmsg(), we wait for
all the asynchronous decryption requests to complete.

The references to records which have been sent for async decryption are
dropped. For cases where record decryption is not possible in zero-copy
mode, asynchronous decryption is not used and we wait for the decryption
crypto API to complete.

For crypto requests executing in async fashion, the memory for the
aead_request, sglists, skb etc. is freed from the decryption
completion handler. The decryption completion handler wakes up the
sleeping user context when recvmsg() flags that it is done sending
all the decryption requests and there are no more decryption requests
pending to be completed.

Signed-off-by: Vakul Garg 
Reviewed-by: Dave Watson 
---

Changes since v1:
- Simplified recvmsg() so as to drop the reference to the skb in case it
  was submitted for async decryption.
- Modified tls_sw_advance_skb() to handle the case when the input skb is
  NULL.

 include/net/tls.h |   6 +++
 net/tls/tls_sw.c  | 134 --
 2 files changed, 127 insertions(+), 13 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index d5c683e8bb22..cd0a65bd92f9 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -124,6 +124,12 @@ struct tls_sw_context_rx {
struct sk_buff *recv_pkt;
u8 control;
bool decrypted;
+   atomic_t decrypt_pending;
+   bool async_notify;
+};
+
+struct decrypt_req_ctx {
+   struct sock *sk;
 };
 
 struct tls_record_info {
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 52fbe727d7c1..9503e5a4c27e 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -43,12 +43,50 @@
 
 #define MAX_IV_SIZE  TLS_CIPHER_AES_GCM_128_IV_SIZE
 
+static void tls_decrypt_done(struct crypto_async_request *req, int err)
+{
+   struct aead_request *aead_req = (struct aead_request *)req;
+   struct decrypt_req_ctx *req_ctx =
+   (struct decrypt_req_ctx *)(aead_req + 1);
+
+   struct scatterlist *sgout = aead_req->dst;
+
+   struct tls_context *tls_ctx = tls_get_ctx(req_ctx->sk);
+   struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
+   int pending = atomic_dec_return(&ctx->decrypt_pending);
+   struct scatterlist *sg;
+   unsigned int pages;
+
+   /* Propagate if there was an err */
+   if (err) {
+   ctx->async_wait.err = err;
+   tls_err_abort(req_ctx->sk, err);
+   }
+
+   /* Release the skb, pages and memory allocated for crypto req */
+   kfree_skb(req->data);
+
+   /* Skip the first S/G entry as it points to AAD */
+   for_each_sg(sg_next(sgout), sg, UINT_MAX, pages) {
+   if (!sg)
+   break;
+   put_page(sg_page(sg));
+   }
+
+   kfree(aead_req);
+
+   if (!pending && READ_ONCE(ctx->async_notify))
+   complete(&ctx->async_wait.completion);
+}
+
 static int tls_do_decryption(struct sock *sk,
+struct sk_buff *skb,
 struct scatterlist *sgin,
 struct scatterlist *sgout,
 char *iv_recv,
 size_t data_len,
-struct aead_request *aead_req)
+struct aead_request *aead_req,
+bool async)
 {
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
@@ -59,10 +97,34 @@ static int tls_do_decryption(struct sock *sk,
aead_request_set_crypt(aead_req, sgin, sgout,
   data_len + tls_ctx->rx.tag_size,
   (u8 *)iv_recv);
-   aead_request_set_callback(aead_req, CRYPTO_TFM_REQ_MAY_BACKLOG,
- crypto_req_done, &ctx->async_wait);
 
-   ret = crypto_wait_req(crypto_aead_decrypt(aead_req), &ctx->async_wait);
+   if (async) {
+   struct decrypt_req_ctx *req_ctx;
+
+   req_ctx = (struct decrypt_req_ctx *)(aead_req + 1);
+   req_ctx->sk = sk;
+
+   aead_request_set_callback(aead_req,
+   

RE: [PATCH net-next v1] net/tls: Add support for async decryption of tls records

2018-08-17 Thread Vakul Garg



> -Original Message-
> From: Dave Watson 
> Sent: Saturday, August 18, 2018 3:43 AM
> To: Vakul Garg 
> Cc: netdev@vger.kernel.org; bor...@mellanox.com;
> avia...@mellanox.com; da...@davemloft.net
> Subject: Re: [PATCH net-next v1] net/tls: Add support for async decryption of
> tls records
> 
> On 08/16/18 08:49 PM, Vakul Garg wrote:
> > Changes since RFC version:
> > 1) Improved commit message.
> > 2) Fixed dequeued record offset handling because of which few of
> >tls selftests 'recv_partial, recv_peek, recv_peek_multiple' were
> failing.
> 
> Thanks! Commit message much more clear, tests work great for me also,
> only minor comments on clarity
> 
> > -   if (tls_sw_advance_skb(sk, skb, chunk)) {
> > +   if (async) {
> > +   /* Finished with current record, pick up next
> */
> > +   ctx->recv_pkt = NULL;
> > +   __strp_unpause(&ctx->strp);
> > +   goto mark_eor_chk_ctrl;
> 
> Control flow is a little hard to follow here, maybe just pass an async flag to
> tls_sw_advance_skb?  It already does strp_unpause and recv_pkt = NULL.
> 

I improved it, but in a slightly different way. Please see v2.
As net-next is closed right now, I will send the patch to you privately and
post it on the list later, when David gives the green signal.
Is that ok?


> > +   } else if (tls_sw_advance_skb(sk, skb, chunk)) {
> > /* Return full control message to
> >  * userspace before trying to parse
> >  * another message type
> >  */
> > +mark_eor_chk_ctrl:
> > msg->msg_flags |= MSG_EOR;
> > if (control != TLS_RECORD_TYPE_DATA)
> > goto recv_end;
> > +   } else {
> > +   break;
> 
> I don't see the need for the else { break; }, isn't this already covered by
> while(len); below as before?
 
When tls_sw_advance_skb() returns false, it is certain that we cannot
continue in the loop. So putting a break here avoids having to execute the
'if' checks and the while (len) check further down.


[net-next v2 1/5] net/tls: Do not enable zero-copy prematurely

2018-07-16 Thread Vakul Garg
Zero-copy mode was left enabled even when zerocopy_from_iter() failed.
Set zero-copy mode only when zerocopy_from_iter() succeeds. This leads
to the removal of the 'zc' argument of decrypt_skb_update().
Function decrypt_skb_update() does not need to check whether
ctx->decrypted is set, since it is never called if ctx->decrypted is
true.

Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 7d194c0cd6cf..e94cb54a6994 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -656,7 +656,7 @@ static struct sk_buff *tls_wait_data(struct sock *sk, int 
flags,
 }
 
 static int decrypt_skb_update(struct sock *sk, struct sk_buff *skb,
- struct scatterlist *sgout, bool *zc)
+ struct scatterlist *sgout)
 {
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
@@ -668,13 +668,9 @@ static int decrypt_skb_update(struct sock *sk, struct 
sk_buff *skb,
if (err < 0)
return err;
 #endif
-   if (!ctx->decrypted) {
-   err = decrypt_skb(sk, skb, sgout);
-   if (err < 0)
-   return err;
-   } else {
-   *zc = false;
-   }
+   err = decrypt_skb(sk, skb, sgout);
+   if (err < 0)
+   return err;
 
rxm->offset += tls_ctx->rx.prepend_size;
rxm->full_len -= tls_ctx->rx.overhead_size;
@@ -819,7 +815,6 @@ int tls_sw_recvmsg(struct sock *sk,
struct scatterlist sgin[MAX_SKB_FRAGS + 1];
int pages = 0;
 
-   zc = true;
sg_init_table(sgin, MAX_SKB_FRAGS + 1);
	sg_set_buf(&sgin[0], ctx->rx_aad_plaintext,
   TLS_AAD_SPACE_SIZE);
@@ -830,8 +825,10 @@ int tls_sw_recvmsg(struct sock *sk,
  MAX_SKB_FRAGS, false, true);
if (err < 0)
goto fallback_to_reg_recv;
+   else
+   zc = true;
 
-   err = decrypt_skb_update(sk, skb, sgin, &zc);
+   err = decrypt_skb_update(sk, skb, sgin);
for (; pages > 0; pages--)
	put_page(sg_page(&sgin[pages]));
if (err < 0) {
@@ -840,7 +837,7 @@ int tls_sw_recvmsg(struct sock *sk,
}
} else {
 fallback_to_reg_recv:
-   err = decrypt_skb_update(sk, skb, NULL, &zc);
+   err = decrypt_skb_update(sk, skb, NULL);
if (err < 0) {
tls_err_abort(sk, EBADMSG);
goto recv_end;
@@ -895,7 +892,6 @@ ssize_t tls_sw_splice_read(struct socket *sock,  loff_t 
*ppos,
int err = 0;
long timeo;
int chunk;
-   bool zc;
 
lock_sock(sk);
 
@@ -912,7 +908,7 @@ ssize_t tls_sw_splice_read(struct socket *sock,  loff_t 
*ppos,
}
 
if (!ctx->decrypted) {
-   err = decrypt_skb_update(sk, skb, NULL, &zc);
+   err = decrypt_skb_update(sk, skb, NULL);
 
if (err < 0) {
tls_err_abort(sk, EBADMSG);
-- 
2.13.6



[net-next v2 4/5] net/tls: Remove redundant array allocation.

2018-07-16 Thread Vakul Garg
In function decrypt_skb(), the array allocation in the case when sgout is
NULL is unnecessary. Instead, the local variable sgin_arr[] can be used.

Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 6eaaa587db71..55960088baea 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -700,7 +700,6 @@ int decrypt_skb(struct sock *sk, struct sk_buff *skb,
memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
if (!sgout) {
	nsg = skb_cow_data(skb, 0, &unused) + 1;
-   sgin = kmalloc_array(nsg, sizeof(*sgin), sk->sk_allocation);
sgout = sgin;
}
 
@@ -721,9 +720,6 @@ int decrypt_skb(struct sock *sk, struct sk_buff *skb,
rxm->full_len - tls_ctx->rx.overhead_size,
skb, sk->sk_allocation);
 
-   if (sgin != &sgin_arr[0])
-   kfree(sgin);
-
return ret;
 }
 
-- 
2.13.6



[net-next v2 5/5] net/tls: Rework error checking after decrypt_skb_update()

2018-07-16 Thread Vakul Garg
The error checking code after invoking decrypt_skb_update() for the
zero-copy and non-zero-copy cases in tls_sw_recvmsg() has been made common.

Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 55960088baea..0e915d4a30bc 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -825,18 +825,16 @@ int tls_sw_recvmsg(struct sock *sk,
err = decrypt_skb_update(sk, skb, sgin);
for (; pages > 0; pages--)
	put_page(sg_page(&sgin[pages]));
-   if (err < 0) {
-   tls_err_abort(sk, EBADMSG);
-   goto recv_end;
-   }
} else {
 fallback_to_reg_recv:
err = decrypt_skb_update(sk, skb, NULL);
-   if (err < 0) {
-   tls_err_abort(sk, EBADMSG);
-   goto recv_end;
-   }
}
+
+   if (err < 0) {
+   tls_err_abort(sk, EBADMSG);
+   goto recv_end;
+   }
+
ctx->decrypted = true;
}
 
-- 
2.13.6



[net-next v2 3/5] net/tls: Remove redundant variable assignments and wakeup

2018-07-16 Thread Vakul Garg
In function decrypt_skb_update(), the assignment to tls receive context
variable 'decrypted' is redundant as the same is being done in function
tls_sw_recvmsg() after calling decrypt_skb_update(). Also calling callback
function to wakeup processes sleeping on socket data availability is
useless as decrypt_skb_update() is invoked from user processes only. This
patch cleans these up.

Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 186152dced25..6eaaa587db71 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -675,8 +675,6 @@ static int decrypt_skb_update(struct sock *sk, struct 
sk_buff *skb,
rxm->offset += tls_ctx->rx.prepend_size;
rxm->full_len -= tls_ctx->rx.overhead_size;
	tls_advance_record_sn(sk, &tls_ctx->rx);
-   ctx->decrypted = true;
-   ctx->saved_data_ready(sk);
 
return err;
 }
-- 
2.13.6



[net-next v2 2/5] net/tls: Use socket data_ready callback on record availability

2018-07-16 Thread Vakul Garg
On receipt of a complete tls record, use the socket's saved data_ready
callback instead of the state_change callback.

Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index e94cb54a6994..186152dced25 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1019,7 +1019,7 @@ static void tls_queue(struct strparser *strp, struct 
sk_buff *skb)
ctx->recv_pkt = skb;
strp_pause(strp);
 
-   strp->sk->sk_state_change(strp->sk);
+   ctx->saved_data_ready(strp->sk);
 }
 
 static void tls_data_ready(struct sock *sk)
-- 
2.13.6



RE: [net-next v5 3/3] net/tls: Remove redundant array allocation.

2018-07-24 Thread Vakul Garg



> -Original Message-
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Monday, July 23, 2018 10:05 PM
> To: David Miller 
> Cc: Vakul Garg ; netdev@vger.kernel.org;
> bor...@mellanox.com; avia...@mellanox.com; Doron Roberts-Kedes
> 
> Subject: Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.
> 
> On 07/21/18 07:25 PM, David Miller wrote:
> > From: Vakul Garg 
> > Date: Thu, 19 Jul 2018 21:56:13 +0530
> >
> > > In function decrypt_skb(), array allocation in case when sgout is
> > > NULL is unnecessary. Instead, local variable sgin_arr[] can be used.
> > >
> > > Signed-off-by: Vakul Garg 
> >
> > Hmmm...
> >
> > Dave, can you take a look at this?  Do you think there might have been
> > a reason you felt that you needed to dynamically allocate the
> > scatterlists when you COW and skb and do in-place decryption?
> >
> > I guess this change is ok, nsg can only get smaller when the SKB is
> > COW'd.
> >
> 
> > > memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
> > > if (!sgout) {
> > > nsg = skb_cow_data(skb, 0, &unused) + 1;
> > > -   sgin = kmalloc_array(nsg, sizeof(*sgin), sk->sk_allocation);
> > > sgout = sgin;
> > > }
> 
> I don't think this patch is safe as-is.  sgin_arr is a stack array of size
> MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is that it
> walks the whole chain of skbs from skb->next, and can return any number of
> segments.  Therefore we need to heap allocate.  I think I copied the IPSEC
> code here.
> 
> For perf though, we could use the stack array if skb_cow_data returns <=
> MAX_SKB_FRAGS.
> 
> This code is slightly confusing though, since we don't heap allocate in the
> zerocopy case - what happens is that skb_to_sgvec returns -EMSGSIZE, and
> we fall back to the non-zerocopy case, and return again to this function,
> where we then hit the skb_cow_data path and heap allocate.

Thanks for explaining.
I am missing the point of why a MAX_SKB_FRAGS-sized local array 'sgin' has
been used in tls_sw_recvmsg(). What is special about MAX_SKB_FRAGS that we
used it as a size factor for 'sgin'?

Would it be a bad idea to get rid of the on-stack 'sgin' array and simply
kmalloc 'sgin' for whatever number of pages is returned by
iov_iter_npages()? We can allocate sgout too with the same kmalloc().

(Using a local array based 'sgin' gets in the way of sending multiple async
record decryption requests to the accelerator without waiting for the
previous one to complete.)
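
To illustrate the suggestion (purely an editor's sketch; the helper name and
the "+1 for AAD" slots are assumptions, not code from any posted patch):

/* Size 'sgin'/'sgout' from iov_iter_npages() and carve both out of a
 * single kmalloc() instead of using a MAX_SKB_FRAGS-sized stack array.
 */
#include <linux/kernel.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>
#include <linux/uio.h>

static struct scatterlist *alloc_sgin_sgout(struct iov_iter *out_iov,
					    int n_sgin, gfp_t gfp,
					    struct scatterlist **sgout)
{
	int n_sgout = iov_iter_npages(out_iov, INT_MAX) + 1; /* +1 for AAD */
	struct scatterlist *sgin;

	n_sgin += 1;					     /* +1 for AAD */
	sgin = kmalloc_array(n_sgin + n_sgout, sizeof(*sgin), gfp);
	if (!sgin)
		return NULL;

	sg_init_table(sgin, n_sgin);
	*sgout = sgin + n_sgin;
	sg_init_table(*sgout, n_sgout);

	return sgin;	/* one kfree(sgin) releases both lists */
}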
 



[PATCH net-next] net/tls: Removed redundant checks for non-NULL

2018-07-24 Thread Vakul Garg
Removed checks against non-NULL before calling kfree_skb() and
crypto_free_aead(). These functions are safe to be called with NULL
as an argument.

Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 0c2d029c9d4c..ef445478239c 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1044,8 +1044,7 @@ void tls_sw_free_resources_tx(struct sock *sk)
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
 
-   if (ctx->aead_send)
-   crypto_free_aead(ctx->aead_send);
+   crypto_free_aead(ctx->aead_send);
tls_free_both_sg(sk);
 
kfree(ctx);
@@ -1057,10 +1056,8 @@ void tls_sw_release_resources_rx(struct sock *sk)
struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
 
if (ctx->aead_recv) {
-   if (ctx->recv_pkt) {
-   kfree_skb(ctx->recv_pkt);
-   ctx->recv_pkt = NULL;
-   }
+   kfree_skb(ctx->recv_pkt);
+   ctx->recv_pkt = NULL;
crypto_free_aead(ctx->aead_recv);
strp_stop(&ctx->strp);
write_lock_bh(&sk->sk_callback_lock);
-- 
2.13.6



[PATCH net-next] net/tls: Do not call msg_data_left() twice

2018-07-23 Thread Vakul Garg
In function tls_sw_sendmsg(), msg_data_left() needs to be called only
once. The second invocation of msg_data_left() for assigning variable
try_to_copy can be removed and merged with the first one.

Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 0c2d029c9d4c..fd51ce65b99c 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -377,7 +377,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t size)
goto send_end;
}
 
-   while (msg_data_left(msg)) {
+   while ((try_to_copy = msg_data_left(msg))) {
if (sk->sk_err) {
ret = -sk->sk_err;
goto send_end;
@@ -385,7 +385,6 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t size)
 
orig_size = ctx->sg_plaintext_size;
full_record = false;
-   try_to_copy = msg_data_left(msg);
record_room = TLS_MAX_PAYLOAD_SIZE - ctx->sg_plaintext_size;
if (try_to_copy >= record_room) {
try_to_copy = record_room;
-- 
2.13.6



[net-next v6 1/2] net/tls: Use socket data_ready callback on record availability

2018-07-23 Thread Vakul Garg
On receipt of a complete tls record, use socket's saved data_ready
callback instead of state_change callback.

Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 0c2d029c9d4c..fee1240eff92 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1028,7 +1028,7 @@ static void tls_queue(struct strparser *strp, struct 
sk_buff *skb)
ctx->recv_pkt = skb;
strp_pause(strp);
 
-   strp->sk->sk_state_change(strp->sk);
+   ctx->saved_data_ready(strp->sk);
 }
 
 static void tls_data_ready(struct sock *sk)
-- 
2.13.6



[net-next v6 2/2] net/tls: Remove redundant variable assignments and wakeup

2018-07-23 Thread Vakul Garg
In function decrypt_skb_update(), the assignment to the tls receive context
variable 'decrypted' is redundant, as the same is done in function
tls_sw_recvmsg() after calling decrypt_skb_update(). Also, calling the
callback to wake up processes sleeping on socket data availability is
useless, as decrypt_skb_update() is invoked from user processes only. This
patch cleans these up.

Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index fee1240eff92..6c71da7b147f 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -679,8 +679,6 @@ static int decrypt_skb_update(struct sock *sk, struct 
sk_buff *skb,
rxm->offset += tls_ctx->rx.prepend_size;
rxm->full_len -= tls_ctx->rx.overhead_size;
tls_advance_record_sn(sk, &tls_ctx->rx);
-   ctx->decrypted = true;
-   ctx->saved_data_ready(sk);
 
return err;
 }
-- 
2.13.6



RE: [PATCH net-next] tls: Fix improper revert in zerocopy_from_iter

2018-07-23 Thread Vakul Garg



> -Original Message-
> From: Doron Roberts-Kedes [mailto:doro...@fb.com]
> Sent: Tuesday, July 24, 2018 3:50 AM
> To: David S . Miller 
> Cc: Dave Watson ; Vakul Garg
> ; Matt Mullins ;
> netdev@vger.kernel.org; Doron Roberts-Kedes 
> Subject: [PATCH net-next] tls: Fix improper revert in zerocopy_from_iter
> 
> The current code is problematic because the iov_iter is reverted and never
> advanced in the non-error case. This patch skips the revert in the non-error
> case. This patch also fixes the amount by which the iov_iter is reverted.
> Currently, iov_iter is reverted by size, which can be greater than the amount
> by which the iter was actually advanced.
> Instead, mimic the tx path which reverts by the difference before and after
> zerocopy_from_iter.
> 
> Fixes: 4718799817c5 ("tls: Fix zerocopy_from_iter iov handling")
> Signed-off-by: Doron Roberts-Kedes 
> ---
>  net/tls/tls_sw.c | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index
> 490f2bcc6313..2ea000baebf8 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -276,7 +276,7 @@ static int zerocopy_from_iter(struct sock *sk, struct
> iov_iter *from,
> int length, int *pages_used,
> unsigned int *size_used,
> struct scatterlist *to, int to_max_pages,
> -   bool charge, bool revert)
> +   bool charge)
>  {
>   struct page *pages[MAX_SKB_FRAGS];
> 
> @@ -327,8 +327,6 @@ static int zerocopy_from_iter(struct sock *sk, struct
> iov_iter *from,
>  out:
>   *size_used = size;
>   *pages_used = num_elem;
> - if (revert)
> - iov_iter_revert(from, size);
> 
>   return rc;
>  }
> @@ -431,7 +429,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr
> *msg, size_t size)
>   &ctx->sg_plaintext_size,
>   ctx->sg_plaintext_data,
>   ARRAY_SIZE(ctx->sg_plaintext_data),
> - true, false);
> + true);
>   if (ret)
>   goto fallback_to_reg_send;
> 
> @@ -811,6 +809,7 @@ int tls_sw_recvmsg(struct sock *sk,
>   likely(!(flags & MSG_PEEK)))  {
>   struct scatterlist sgin[MAX_SKB_FRAGS + 1];
>   int pages = 0;
> + int orig_chunk = chunk;
> 
>   zc = true;
>   sg_init_table(sgin, MAX_SKB_FRAGS + 1);
> @@ -820,9 +819,11 @@ int tls_sw_recvmsg(struct sock *sk,
> 			err = zerocopy_from_iter(sk, &msg->msg_iter,
> 						 to_copy, &pages,
> 						 &chunk, &sgin[1],
> -						 MAX_SKB_FRAGS, false, true);
> -			if (err < 0)
> +						 MAX_SKB_FRAGS, false);
> +			if (err < 0) {
> +				iov_iter_revert(&msg->msg_iter, chunk - orig_chunk);
> 				goto fallback_to_reg_recv;
> +			}

This assumes that msg_iter gets advanced even if zerocopy_from_iter() fails.
From a code readability perspective, it is easier if functions that fail do
not leave side effects for the caller to clean up.
I suggest that iov_iter_revert() be called from zerocopy_from_iter() itself
if it is going to fail.
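Roughly what I mean (a sketch of just the function tail; variable names are
those already in zerocopy_from_iter(), with 'orig_size' added for
illustration):

	/* at entry */
	size_t orig_size = size;

	/* ... existing iov_iter_get_pages() + iov_iter_advance() loop ... */

out:
	*size_used = size;
	*pages_used = num_elem;
	/* Undo our own advance on failure, so the caller sees no side effects
	 * and never needs to call iov_iter_revert() itself.
	 */
	if (rc < 0)
		iov_iter_revert(from, size - orig_size);

	return rc;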

 

> 
>   err = decrypt_skb(sk, skb, sgin);
>   for (; pages > 0; pages--)
> --
> 2.17.1



Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.

2018-07-23 Thread Vakul Garg
Hi Dave

Can you still apply the rest of two patches in the series or do I need to send 
them again separately?

Regards

Vakul



From: netdev-ow...@vger.kernel.org  on behalf of 
David Miller 
Sent: Tuesday, July 24, 2018 10:11:09 AM
To: davejwat...@fb.com
Cc: Vakul Garg; netdev@vger.kernel.org; bor...@mellanox.com; 
avia...@mellanox.com; doro...@fb.com
Subject: Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.

From: Dave Watson 
Date: Mon, 23 Jul 2018 09:35:09 -0700

> I don't think this patch is safe as-is.  sgin_arr is a stack array of
> size MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is that
> it walks the whole chain of skbs from skb->next, and can return any
> number of segments.  Therefore we need to heap allocate.  I think I
> copied the IPSEC code here.

Ok I see what you are saying.

So it means that, when a non-NULL sgout is passed into decrypt_skb(),
via decrypt_skb_update(), via tls_sw_recvmsg() it means that it is the
zerocopy case and you know that you only have page frags and no SKB
frag list, right?

I agree with you that this change is therefore incorrect.


[net-next v6 0/2] Minor code cleanup patches

2018-07-23 Thread Vakul Garg
This patch series improves tls_sw.c code by:

1) Using correct socket callback for flagging data availability.
2) Removing redundant variable assignments and wakeup callbacks.


Vakul Garg (2):
  net/tls: Use socket data_ready callback on record availability
  net/tls: Remove redundant variable assignments and wakeup

 net/tls/tls_sw.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

-- 
2.13.6



Re: [PATCH net-next] net/tls: Do not call msg_data_left() twice

2018-07-24 Thread Vakul Garg



From: Al Viro
Sent: Wednesday, 25 July, 7:19 AM
Subject: Re: [PATCH net-next] net/tls: Do not call msg_data_left() twice
To: Vakul Garg
Cc: netdev@vger.kernel.org, bor...@mellanox.com, avia...@mellanox.com, 
davejwat...@fb.com, da...@davemloft.net


On Tue, Jul 24, 2018 at 04:41:18PM +0530, Vakul Garg wrote:
> In function tls_sw_sendmsg(), msg_data_left() needs to be called only
> once. The second invocation of msg_data_left() for assigning variable
> try_to_copy can be removed and merged with the first one.

You do realize that msg_data_left(msg) is simply msg->msg_iter.count?
So I'm not at all sure that your change will be an overall win...

Agreed, msg_data_left() is inexpensive. It's just that it is pointless to
call it again.
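
(For reference, msg_data_left() is roughly the following inline helper, so the
repeated call is indeed cheap:

	static inline size_t msg_data_left(struct msghdr *msg)
	{
		return iov_iter_count(&msg->msg_iter);
	}

and iov_iter_count() just returns the iterator's 'count' field.)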



Security enhancement proposal for kernel TLS

2018-07-22 Thread Vakul Garg
Hi

The kernel based TLS record layer allows the user space world to use a 
decoupled TLS implementation.
The applications need not be linked with a TLS stack.
The TLS handshake can be done by a TLS daemon on behalf of the applications.

Presently, as soon as the handshake process derives keys, it pushes the
negotiated keys to kernel TLS.
Thereafter the applications can directly read and write data on their TCP
socket (without having to use SSL APIs).

With the current kernel TLS implementation, there is a security problem.
Since the kernel TLS socket has no information about the state of the
handshake, it allows applications to receive data from the peer TLS endpoint
even before the SSL daemon has completed the handshake verification.
Applications should not be able to receive data until verification of the
handshake transcript is complete (i.e. until the tls FINISHED message has
been processed).

My proposal:
- Kernel TLS should maintain the handshake state (verified or
unverified).
In the unverified state, data records should not be allowed to pass
through to the applications.

- Add a new control interface through which the user space SSL stack
can tell the TLS socket that the handshake has been verified and DATA records
can flow (see the sketch after this list).
In the 'unverified' state, only control records should be allowed to pass;
reception of a DATA record should pause the receive side record decryption.

- The handshake state should fall back to 'unverified' in case a control
record is seen again by kernel TLS (e.g. in case of renegotiation, post
handshake client auth etc).
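
To make the idea concrete, the control interface could look roughly like the
sketch below. The option name, its value and the 'handshake_verified' field
are hypothetical and only illustrate the proposal:

	/* User space SSL daemon, after verifying the FINISHED message: */
	#define TLS_RX_HANDSHAKE_VERIFIED 3	/* hypothetical option */
	int one = 1;

	setsockopt(fd, SOL_TLS, TLS_RX_HANDSHAKE_VERIFIED, &one, sizeof(one));

	/* Kernel rx path (sketch): hold back application data until then. */
	if (!ctx->handshake_verified && ctx->control == TLS_RECORD_TYPE_DATA)
		return -EAGAIN;		/* pause DATA record decryption */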

Kindly comment.

Regards

Vakul


Query about tls patch

2018-07-22 Thread Vakul Garg
Hi 

I got a query reading patch https://patchwork.ozlabs.org/patch/943442/ (already 
merged).

[PATCH]: tls: Fix zerocopy_from_iter iov handling

In tls_sw_sendmsg(), if zerocopy_from_iter() fails, we go to
fallback_to_reg_send.
Here we first call iov_iter_revert(), but the iov_iter_advance() didn't happen
in the first place.

Similarly, in zerocopy_from_iter(), if 'revert' is passed as 'true',
iov_iter_revert() is called even if iov_iter_advance() never happened (the
error return case).

In the tls_sw_recvmsg() flow, iov_iter_revert() is always triggered.
With the revert happening, I am missing how multiple records can be decrypted
into the user space buffer within a single invocation of recvmsg().
It seems that successively dequeued tls records would get decrypted at the
same location in the output buffer, each overwriting the previous one.

Regards

Vakul


[PATCH net-next] net/tls: Corrected enabling of zero-copy mode

2018-07-23 Thread Vakul Garg
Zero-copy mode was left enabled even when zerocopy_from_iter() failed.
In this case, control falls back to the regular non zero-copy mode. This
caused the call to the datagram copy function skb_copy_datagram_msg() to be
skipped and required an extra iteration of the record decryption loop for the
decrypted record to be copied into the user space provided buffer. Hence
zero-copy mode should be enabled/disabled as per the success/failure of
zerocopy_from_iter().

Fixes: c46234ebb4d1 ("tls: RX path for ktls")
Signed-off-by: Vakul Garg 
---

The patch does not need to be applied to 'net' branch as it does not fix
any functional bug. The patch achieves better run-time efficiency and
code readability.

 net/tls/tls_sw.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 0c2d029c9d4c..9ae57bec4927 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -787,7 +787,7 @@ int tls_sw_recvmsg(struct sock *sk,
target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
do {
-   bool zc = false;
+   bool zc;
int chunk = 0;
 
skb = tls_wait_data(sk, flags, timeo, &err);
@@ -824,7 +824,6 @@ int tls_sw_recvmsg(struct sock *sk,
struct scatterlist sgin[MAX_SKB_FRAGS + 1];
int pages = 0;
 
-   zc = true;
sg_init_table(sgin, MAX_SKB_FRAGS + 1);
sg_set_buf(&sgin[0], ctx->rx_aad_plaintext,
   TLS_AAD_SPACE_SIZE);
@@ -836,6 +835,7 @@ int tls_sw_recvmsg(struct sock *sk,
if (err < 0)
goto fallback_to_reg_recv;
 
+   zc = true;
err = decrypt_skb_update(sk, skb, sgin, &zc);
for (; pages > 0; pages--)
put_page(sg_page(&sgin[pages]));
@@ -845,6 +845,7 @@ int tls_sw_recvmsg(struct sock *sk,
}
} else {
 fallback_to_reg_recv:
+   zc = false;
err = decrypt_skb_update(sk, skb, NULL, &zc);
if (err < 0) {
tls_err_abort(sk, EBADMSG);
-- 
2.13.6



RE: [net-next v3 1/5] net/tls: Do not enable zero-copy prematurely

2018-07-19 Thread Vakul Garg
Thanks for the comment.
I will take this patch out of the series.

> -Original Message-
> From: Boris Pismenny [mailto:bor...@mellanox.com]
> Sent: Thursday, July 19, 2018 3:58 PM
> To: Vakul Garg ; netdev@vger.kernel.org
> Cc: avia...@mellanox.com; davejwat...@fb.com; da...@davemloft.net
> Subject: Re: [net-next v3 1/5] net/tls: Do not enable zero-copy prematurely
> 
> Hi Vakul,
> 
> On 7/19/2018 7:16 AM, Vakul Garg wrote:
> > Zero-copy mode was left enabled even when zerocopy_from_iter() failed.
> > Set the zero-copy mode only when zerocopy_from_iter() succeeds. This
> > leads to removal of argument 'zc' of function decrypt_skb_update().
> > Function decrypt_skb_update() does not need to check whether
> > ctx->decrypted is set since it is never called if ctx->decrypted is
> > true.
> >
> 
> This patch breaks our tls_device code for the following 2 reasons:
> 1. We need to disable zerocopy if the device decrypted the record, because
> decrypted data has to be copied to user buffers.
> 2. ctx->decrypted must be checked in decrypt_skb_update, because it might
> change after calling tls_device_decrypted.


[net-next v5 3/3] net/tls: Remove redundant array allocation.

2018-07-19 Thread Vakul Garg
In function decrypt_skb(), array allocation in case when sgout is NULL
is unnecessary. Instead, local variable sgin_arr[] can be used.

Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index e15ace0ebd79..1aa2d46713d7 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -704,7 +704,6 @@ int decrypt_skb(struct sock *sk, struct sk_buff *skb,
memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
if (!sgout) {
nsg = skb_cow_data(skb, 0, &unused) + 1;
-   sgin = kmalloc_array(nsg, sizeof(*sgin), sk->sk_allocation);
sgout = sgin;
}
 
@@ -725,9 +724,6 @@ int decrypt_skb(struct sock *sk, struct sk_buff *skb,
rxm->full_len - tls_ctx->rx.overhead_size,
skb, sk->sk_allocation);
 
-   if (sgin != &sgin_arr[0])
-   kfree(sgin);
-
return ret;
 }
 
-- 
2.13.6



[net-next v4 0/3] net/tls: Minor code cleanup patches

2018-07-19 Thread Vakul Garg
This patch series improves tls_sw.c code by:

1) Using correct socket callback for flagging data availability.
2) Removing redundant variable assignments and wakeup callbacks.
3) Removing redundant dynamic array allocation.

The patches do not fix any functional bug. Hence the "Fixes:" tag has not
been used. Compared to patch series v3, this v4 series contains two fewer
patches. They will be submitted separately.

Vakul Garg (3):
  net/tls: Use socket data_ready callback on record availability
  net/tls: Remove redundant variable assignments and wakeup
  net/tls: Remove redundant array allocation.

 net/tls/tls_sw.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

-- 
2.13.6



[net-next v4 3/3] net/tls: Remove redundant array allocation.

2018-07-19 Thread Vakul Garg
In function decrypt_skb(), array allocation in case when sgout is NULL
is unnecessary. Instead, local variable sgin_arr[] can be used.

Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index e62f288fda31..c33ba3f1c408 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -703,7 +703,6 @@ int decrypt_skb(struct sock *sk, struct sk_buff *skb,
memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
if (!sgout) {
nsg = skb_cow_data(skb, 0, &unused) + 1;
-   sgin = kmalloc_array(nsg, sizeof(*sgin), sk->sk_allocation);
sgout = sgin;
}
 
@@ -724,9 +723,6 @@ int decrypt_skb(struct sock *sk, struct sk_buff *skb,
rxm->full_len - tls_ctx->rx.overhead_size,
skb, sk->sk_allocation);
 
-   if (sgin != &sgin_arr[0])
-   kfree(sgin);
-
return ret;
 }
 
-- 
2.13.6



[net-next v4 1/3] net/tls: Use socket data_ready callback on record availability

2018-07-19 Thread Vakul Garg
On receipt of a complete tls record, use socket's saved data_ready
callback instead of state_change callback.

Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 7d194c0cd6cf..a58661c624ec 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1023,7 +1023,7 @@ static void tls_queue(struct strparser *strp, struct 
sk_buff *skb)
ctx->recv_pkt = skb;
strp_pause(strp);
 
-   strp->sk->sk_state_change(strp->sk);
+   ctx->saved_data_ready(strp->sk);
 }
 
 static void tls_data_ready(struct sock *sk)
-- 
2.13.6



[net-next v4 2/3] net/tls: Remove redundant variable assignments and wakeup

2018-07-19 Thread Vakul Garg
In function decrypt_skb_update(), the assignment to the tls receive context
variable 'decrypted' is redundant, as the same is done in function
tls_sw_recvmsg() after calling decrypt_skb_update(). Also, calling the
callback to wake up processes sleeping on socket data availability is
useless, as decrypt_skb_update() is invoked from user processes only. This
patch cleans these up.

Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index a58661c624ec..e62f288fda31 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -659,7 +659,6 @@ static int decrypt_skb_update(struct sock *sk, struct 
sk_buff *skb,
  struct scatterlist *sgout, bool *zc)
 {
struct tls_context *tls_ctx = tls_get_ctx(sk);
-   struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
struct strp_msg *rxm = strp_msg(skb);
int err = 0;
 
@@ -679,8 +678,6 @@ static int decrypt_skb_update(struct sock *sk, struct 
sk_buff *skb,
rxm->offset += tls_ctx->rx.prepend_size;
rxm->full_len -= tls_ctx->rx.overhead_size;
tls_advance_record_sn(sk, &tls_ctx->rx);
-   ctx->decrypted = true;
-   ctx->saved_data_ready(sk);
 
return err;
 }
-- 
2.13.6




[net-next v5 2/3] net/tls: Remove redundant variable assignments and wakeup

2018-07-19 Thread Vakul Garg
In function decrypt_skb_update(), the assignment to the tls receive context
variable 'decrypted' is redundant, as the same is done in function
tls_sw_recvmsg() after calling decrypt_skb_update(). Also, calling the
callback to wake up processes sleeping on socket data availability is
useless, as decrypt_skb_update() is invoked from user processes only. This
patch cleans these up.

Signed-off-by: Vakul Garg 
---

Changes from v4->v5: Fixed compilation issue.

 net/tls/tls_sw.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index a58661c624ec..e15ace0ebd79 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -679,8 +679,6 @@ static int decrypt_skb_update(struct sock *sk, struct 
sk_buff *skb,
rxm->offset += tls_ctx->rx.prepend_size;
rxm->full_len -= tls_ctx->rx.overhead_size;
tls_advance_record_sn(sk, &tls_ctx->rx);
-   ctx->decrypted = true;
-   ctx->saved_data_ready(sk);
 
return err;
 }
-- 
2.13.6





[net-next v5 1/3] net/tls: Use socket data_ready callback on record availability

2018-07-19 Thread Vakul Garg
On receipt of a complete tls record, use socket's saved data_ready
callback instead of state_change callback.

Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 7d194c0cd6cf..a58661c624ec 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1023,7 +1023,7 @@ static void tls_queue(struct strparser *strp, struct 
sk_buff *skb)
ctx->recv_pkt = skb;
strp_pause(strp);
 
-   strp->sk->sk_state_change(strp->sk);
+   ctx->saved_data_ready(strp->sk);
 }
 
 static void tls_data_ready(struct sock *sk)
-- 
2.13.6



[net-next v5 0/3] net/tls: Minor code cleanup patches

2018-07-19 Thread Vakul Garg
This patch series improves tls_sw.c code by:

1) Using correct socket callback for flagging data availability.
2) Removing redundant variable assignments and wakeup callbacks.
3) Removing redundant dynamic array allocation.

The patches do not fix any functional bug. Hence the "Fixes:" tag has not
been used. Compared to patch series v3, this series contains two fewer
patches. They will be submitted separately.

Vakul Garg (3):
  net/tls: Use socket data_ready callback on record availability
  net/tls: Remove redundant variable assignments and wakeup
  net/tls: Remove redundant array allocation.

 net/tls/tls_sw.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

-- 
2.13.6





[net-next v3 3/5] net/tls: Remove redundant variable assignments and wakeup

2018-07-18 Thread Vakul Garg
In function decrypt_skb_update(), the assignment to the tls receive context
variable 'decrypted' is redundant, as the same is done in function
tls_sw_recvmsg() after calling decrypt_skb_update(). Also, calling the
callback to wake up processes sleeping on socket data availability is
useless, as decrypt_skb_update() is invoked from user processes only. This
patch cleans these up.

Signed-off-by: Vakul Garg 
---

v2 -> v3
Removed compilation warning.

 net/tls/tls_sw.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 186152dced25..5dcfbaf33680 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -659,7 +659,6 @@ static int decrypt_skb_update(struct sock *sk, struct 
sk_buff *skb,
  struct scatterlist *sgout)
 {
struct tls_context *tls_ctx = tls_get_ctx(sk);
-   struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
struct strp_msg *rxm = strp_msg(skb);
int err = 0;
 
@@ -675,8 +674,6 @@ static int decrypt_skb_update(struct sock *sk, struct 
sk_buff *skb,
rxm->offset += tls_ctx->rx.prepend_size;
rxm->full_len -= tls_ctx->rx.overhead_size;
tls_advance_record_sn(sk, &tls_ctx->rx);
-   ctx->decrypted = true;
-   ctx->saved_data_ready(sk);
 
return err;
 }
-- 
2.13.6



[net-next v3 2/5] net/tls: Use socket data_ready callback on record availability

2018-07-18 Thread Vakul Garg
On receipt of a complete tls record, use socket's saved data_ready
callback instead of state_change callback.

Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index e94cb54a6994..186152dced25 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1019,7 +1019,7 @@ static void tls_queue(struct strparser *strp, struct 
sk_buff *skb)
ctx->recv_pkt = skb;
strp_pause(strp);
 
-   strp->sk->sk_state_change(strp->sk);
+   ctx->saved_data_ready(strp->sk);
 }
 
 static void tls_data_ready(struct sock *sk)
-- 
2.13.6



[net-next v3 1/5] net/tls: Do not enable zero-copy prematurely

2018-07-18 Thread Vakul Garg
Zero-copy mode was left enabled even when zerocopy_from_iter() failed.
Set the zero-copy mode only when zerocopy_from_iter() succeeds. This
leads to removal of argument 'zc' of function decrypt_skb_update().
Function decrypt_skb_update() does not need to check whether
ctx->decrypted is set since it is never called if ctx->decrypted is
true.

Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 7d194c0cd6cf..e94cb54a6994 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -656,7 +656,7 @@ static struct sk_buff *tls_wait_data(struct sock *sk, int 
flags,
 }
 
 static int decrypt_skb_update(struct sock *sk, struct sk_buff *skb,
- struct scatterlist *sgout, bool *zc)
+ struct scatterlist *sgout)
 {
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
@@ -668,13 +668,9 @@ static int decrypt_skb_update(struct sock *sk, struct 
sk_buff *skb,
if (err < 0)
return err;
 #endif
-   if (!ctx->decrypted) {
-   err = decrypt_skb(sk, skb, sgout);
-   if (err < 0)
-   return err;
-   } else {
-   *zc = false;
-   }
+   err = decrypt_skb(sk, skb, sgout);
+   if (err < 0)
+   return err;
 
rxm->offset += tls_ctx->rx.prepend_size;
rxm->full_len -= tls_ctx->rx.overhead_size;
@@ -819,7 +815,6 @@ int tls_sw_recvmsg(struct sock *sk,
struct scatterlist sgin[MAX_SKB_FRAGS + 1];
int pages = 0;
 
-   zc = true;
sg_init_table(sgin, MAX_SKB_FRAGS + 1);
sg_set_buf(&sgin[0], ctx->rx_aad_plaintext,
   TLS_AAD_SPACE_SIZE);
@@ -830,8 +825,10 @@ int tls_sw_recvmsg(struct sock *sk,
 MAX_SKB_FRAGS, false, 
true);
if (err < 0)
goto fallback_to_reg_recv;
+   else
+   zc = true;
 
-   err = decrypt_skb_update(sk, skb, sgin, &zc);
+   err = decrypt_skb_update(sk, skb, sgin);
for (; pages > 0; pages--)
put_page(sg_page([pages]));
if (err < 0) {
@@ -840,7 +837,7 @@ int tls_sw_recvmsg(struct sock *sk,
}
} else {
 fallback_to_reg_recv:
-   err = decrypt_skb_update(sk, skb, NULL, &zc);
+   err = decrypt_skb_update(sk, skb, NULL);
if (err < 0) {
tls_err_abort(sk, EBADMSG);
goto recv_end;
@@ -895,7 +892,6 @@ ssize_t tls_sw_splice_read(struct socket *sock,  loff_t 
*ppos,
int err = 0;
long timeo;
int chunk;
-   bool zc;
 
lock_sock(sk);
 
@@ -912,7 +908,7 @@ ssize_t tls_sw_splice_read(struct socket *sock,  loff_t 
*ppos,
}
 
if (!ctx->decrypted) {
-   err = decrypt_skb_update(sk, skb, NULL, &zc);
+   err = decrypt_skb_update(sk, skb, NULL);
 
if (err < 0) {
tls_err_abort(sk, EBADMSG);
-- 
2.13.6



[net-next v3 5/5] net/tls: Rework error checking after decrypt_skb_update()

2018-07-18 Thread Vakul Garg
Error checking code after invoking decrypt_skb_update() for zero-copy
and non-zero-copy cases in tls_sw_recvmsg has been made common.

Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index ae0b40d6671b..da3b884bea93 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -824,18 +824,16 @@ int tls_sw_recvmsg(struct sock *sk,
err = decrypt_skb_update(sk, skb, sgin);
for (; pages > 0; pages--)
put_page(sg_page([pages]));
-   if (err < 0) {
-   tls_err_abort(sk, EBADMSG);
-   goto recv_end;
-   }
} else {
 fallback_to_reg_recv:
err = decrypt_skb_update(sk, skb, NULL);
-   if (err < 0) {
-   tls_err_abort(sk, EBADMSG);
-   goto recv_end;
-   }
}
+
+   if (err < 0) {
+   tls_err_abort(sk, EBADMSG);
+   goto recv_end;
+   }
+
ctx->decrypted = true;
}
 
-- 
2.13.6



[net-next v3 0/5] net/tls: Minor code cleanup patches

2018-07-18 Thread Vakul Garg
This patch series improves tls_sw.c code by:

1) Removing redundant function decrypt_skb_update() argument.
2) Using correct socket callback for flagging data availability.
3) Removing redundant variable assignments and wakeup callbacks.
4) Removing redundant dynamic array allocation.
5) Using common error checking code for zero-copy, non zero-copy modes.

The patches do not fix any functional bug. Hence "Fixes:" tag has not
been used.

Vakul Garg (5):
  net/tls: Do not enable zero-copy prematurely
  net/tls: Use socket data_ready callback on record availability
  net/tls: Remove redundant variable assignments and wakeup
  net/tls: Remove redundant array allocation.
  net/tls: Rework error checking after decrypt_skb_update()

 net/tls/tls_sw.c | 45 -
 1 file changed, 16 insertions(+), 29 deletions(-)

-- 
2.13.6



[net-next v3 4/5] net/tls: Remove redundant array allocation.

2018-07-18 Thread Vakul Garg
In function decrypt_skb(), array allocation in case when sgout is NULL
is unnecessary. Instead, local variable sgin_arr[] can be used.

Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 5dcfbaf33680..ae0b40d6671b 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -699,7 +699,6 @@ int decrypt_skb(struct sock *sk, struct sk_buff *skb,
memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
if (!sgout) {
nsg = skb_cow_data(skb, 0, &unused) + 1;
-   sgin = kmalloc_array(nsg, sizeof(*sgin), sk->sk_allocation);
sgout = sgin;
}
 
@@ -720,9 +719,6 @@ int decrypt_skb(struct sock *sk, struct sk_buff *skb,
rxm->full_len - tls_ctx->rx.overhead_size,
skb, sk->sk_allocation);
 
-   if (sgin != &sgin_arr[0])
-   kfree(sgin);
-
return ret;
 }
 
-- 
2.13.6



RE: [net-next v6 1/2] net/tls: Use socket data_ready callback on record availability

2018-07-24 Thread Vakul Garg



> -Original Message-
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Wednesday, July 25, 2018 1:43 AM
> To: Vakul Garg 
> Cc: netdev@vger.kernel.org; bor...@mellanox.com;
> avia...@mellanox.com; davejwat...@fb.com
> Subject: Re: [net-next v6 1/2] net/tls: Use socket data_ready callback on
> record availability
> 
> From: Vakul Garg 
> Date: Tue, 24 Jul 2018 15:44:02 +0530
> 
> > On receipt of a complete tls record, use socket's saved data_ready
> > callback instead of state_change callback.
> >
> > Signed-off-by: Vakul Garg 
> 
> I don't think this is correct.
> 
> Here, the stream parser has given us a complete TLS record.
> 
> But we haven't decrypted this packet yet.  It sits on the stream parser's
> queue to be processed by tls_sw_recvmsg(), not the saved socket's receive
> queue.

I understand that at this point in the code, the TLS record is still queued in
encrypted state. But the decryption happens inline when tls_sw_recvmsg() gets
invoked. So it should be ok to notify the waiting context about the
availability of data as soon as we have collected a full TLS record.

For new data availability notification, the sk_data_ready callback should be
more appropriate. It points to sock_def_readable(), which wakes up waiters
specifically for the EPOLLIN event.

This is in contrast to the socket callback sk_state_change, which points to
sock_def_wakeup() and issues a wakeup unconditionally (without an event mask).
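
(For context, paraphrasing the two default callbacks from net/core/sock.c;
simplified from memory, not verbatim, and the RCU handling of sk_wq is
omitted:

	static void sock_def_wakeup(struct sock *sk)	/* default sk_state_change */
	{
		/* unconditional wakeup of every sleeper, no poll event mask */
		wake_up_interruptible_all(&sk->sk_wq->wait);
	}

	static void sock_def_readable(struct sock *sk)	/* default sk_data_ready */
	{
		/* wake sleepers interested in input and signal EPOLLIN */
		wake_up_interruptible_sync_poll(&sk->sk_wq->wait,
						EPOLLIN | EPOLLPRI |
						EPOLLRDNORM | EPOLLRDBAND);
		sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN);
	}
)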


RE: [PATCH net-next] tls: Fix improper revert in zerocopy_from_iter

2018-07-25 Thread Vakul Garg



> -Original Message-
> From: Doron Roberts-Kedes [mailto:doro...@fb.com]
> Sent: Wednesday, July 25, 2018 1:50 AM
> To: Vakul Garg 
> Cc: David S . Miller ; Dave Watson
> ; Matt Mullins ;
> netdev@vger.kernel.org
> Subject: Re: [PATCH net-next] tls: Fix improper revert in zerocopy_from_iter
> 
> On Tue, Jul 24, 2018 at 05:13:26AM +, Vakul Garg wrote:
> >
> > > -Original Message-
> > > From: Doron Roberts-Kedes [mailto:doro...@fb.com]
> > > Sent: Tuesday, July 24, 2018 3:50 AM
> > > @@ -811,6 +809,7 @@ int tls_sw_recvmsg(struct sock *sk,
> > >   likely(!(flags & MSG_PEEK)))  {
> > >   struct scatterlist sgin[MAX_SKB_FRAGS + 1];
> > >   int pages = 0;
> > > + int orig_chunk = chunk;
> > >
> > >   zc = true;
> > >   sg_init_table(sgin, MAX_SKB_FRAGS + 1);
> > > @@ -820,9 +819,11 @@ int tls_sw_recvmsg(struct sock *sk,
> > > 			err = zerocopy_from_iter(sk, &msg->msg_iter,
> > > 						 to_copy, &pages,
> > > 						 &chunk, &sgin[1],
> > > -						 MAX_SKB_FRAGS, false, true);
> > > -			if (err < 0)
> > > +						 MAX_SKB_FRAGS, false);
> > > +			if (err < 0) {
> > > +				iov_iter_revert(&msg->msg_iter, chunk - orig_chunk);
> > > 				goto fallback_to_reg_recv;
> > > +			}
> >
> > This assumes that msg_iter gets advanced even if zerocopy_from_iter()
> fails.
> 
> Not sure I see what you mean. If msg_iter is not advanced then chunk -
> orig_chunk is 0, and the revert is a no-op.
> 
As I said below, my comment was to improve code readability.
It takes a while to grasp that calling iov_iter_revert() would result in a
no-op if zerocopy_from_iter() fails.

> > It is easier from code readability perspective if functions upon failure do
> not leave any side effects for the caller to clean-up.
> > I suggest that iov_iter_revert() should be called from zerocopy_from_iter()
> itself if it is going to fail.
> 
> Agreed that zerocopy_from_iter() should call iov_iter_revert(). I didn't do
> that because at first glace, the tx path seems to depend on the
> iov_iter_revert() being called as a result of either failed
> zerocopy_from_iter() or tls_push_record(). However, I think the latter path
> cannot actually be taken because tls_push_record appears never to return a
> positive value. This means that the code between the continue and
> fallback_to_reg_send should probably just be replaced with simply, goto
> send_end.
> 
> After that change, its clear that iov_iter_revert() can be safely moved inside
> zerocopy_from_iter() in the error case.
> 
> Pending your input, I'll plan on putting up a 2 part patch addressing each of
> these.

Please submit. 


RE: [PATCH net-next v2] net/tls: Add support for async decryption of tls records

2018-09-01 Thread Vakul Garg


> -Original Message-
> From: David Miller 
> Sent: Saturday, September 1, 2018 6:31 AM
> To: Vakul Garg 
> Cc: netdev@vger.kernel.org; bor...@mellanox.com;
> avia...@mellanox.com; davejwat...@fb.com
> Subject: Re: [PATCH net-next v2] net/tls: Add support for async decryption of
> tls records
> 
> From: Vakul Garg 
> Date: Wed, 29 Aug 2018 15:26:55 +0530
> 
> > When tls records are decrypted using asynchronous acclerators such as
> > NXP CAAM engine, the crypto apis return -EINPROGRESS. Presently, on
> > getting -EINPROGRESS, the tls record processing stops till the time
> > the crypto accelerator finishes off and returns the result. This
> > incurs a context switch and is not an efficient way of accessing the
> > crypto accelerators. Crypto accelerators work efficient when they are
> > queued with multiple crypto jobs without having to wait for the
> > previous ones to complete.
> >
> > The patch submits multiple crypto requests without having to wait for
> > for previous ones to complete. This has been implemented for records
> > which are decrypted in zero-copy mode. At the end of recvmsg(), we
> > wait for all the asynchronous decryption requests to complete.
> >
> > The references to records which have been sent for async decryption
> > are dropped. For cases where record decryption is not possible in
> > zero-copy mode, asynchronous decryption is not used and we wait for
> > decryption crypto api to complete.
> >
> > For crypto requests executing in async fashion, the memory for
> > aead_request, sglists and skb etc is freed from the decryption
> > completion handler. The decryption completion handler wakesup the
> > sleeping user context when recvmsg() flags that it has done sending
> > all the decryption requests and there are no more decryption requests
> > pending to be completed.
> >
> > Signed-off-by: Vakul Garg 
> > Reviewed-by: Dave Watson 
> > ---
> >
> > Changes since v1:
> > - Simplified recvmsg() so to drop reference to skb in case it
> >   was submimtted for async decryption.
> > - Modified tls_sw_advance_skb() to handle case when input skb is
> >   NULL.
> 
> Applied.

I do not find this patch in the tree yet.
Can you please check? Thanks and Regards.
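
For readers following the thread, the bookkeeping described in the changelog
above follows roughly the pattern below (a simplified sketch; the field and
function names are illustrative, not the exact patch):

	/* Completion callback run by the crypto driver when an async
	 * decryption request finishes.
	 */
	static void tls_decrypt_done(struct crypto_async_request *req, int err)
	{
		struct tls_sw_context_rx *ctx = req->data;

		/* free the aead_request, sg lists and skb for this record here */

		if (atomic_dec_and_test(&ctx->decrypt_pending) && ctx->async_notify)
			complete(&ctx->async_wait);	/* recvmsg() is waiting */
	}

	/* End of tls_sw_recvmsg(): no more requests will be queued, so wait
	 * for every outstanding decryption before returning to user space.
	 */
	ctx->async_notify = true;
	if (atomic_read(&ctx->decrypt_pending))
		wait_for_completion(&ctx->async_wait);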


RE: [PATCH net 3/3] tls: zero the crypto information from tls_context before freeing

2018-09-05 Thread Vakul Garg



> -Original Message-
> From: netdev-ow...@vger.kernel.org  On
> Behalf Of Sabrina Dubroca
> Sent: Wednesday, September 5, 2018 6:52 PM
> To: netdev@vger.kernel.org
> Cc: Sabrina Dubroca ; Boris Pismenny
> ; Ilya Lesokhin ; Aviad
> Yehezkel ; Dave Watson 
> Subject: [PATCH net 3/3] tls: zero the crypto information from tls_context
> before freeing
> 
> This contains key material in crypto_send_aes_gcm_128 and
> crypto_recv_aes_gcm_128.
> 
> Fixes: 3c4d7559159b ("tls: kernel TLS support")
> Signed-off-by: Sabrina Dubroca 
> ---
>  include/net/tls.h  |  1 +
>  net/tls/tls_main.c | 14 --
>  2 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/tls.h b/include/net/tls.h index
> d5c683e8bb22..2010d23112f9 100644
> --- a/include/net/tls.h
> +++ b/include/net/tls.h
> @@ -180,6 +180,7 @@ struct tls_context {
>   struct tls_crypto_info crypto_recv;
>   struct tls12_crypto_info_aes_gcm_128
> crypto_recv_aes_gcm_128;
>   };
> + char tls_crypto_ctx_end[0];

This makes the key zeroization dependent upon the position of the unions in
the structure.
Can you zeroize crypto_send and crypto_recv separately, using two
memzero_explicit() calls?
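
For instance, a sketch of that alternative, zeroing each union through its
largest member so the result does not depend on member ordering (based on the
fields named above; not a tested patch):

	static void tls_ctx_free(struct tls_context *ctx)
	{
		if (!ctx)
			return;

		memzero_explicit(&ctx->crypto_send_aes_gcm_128,
				 sizeof(ctx->crypto_send_aes_gcm_128));
		memzero_explicit(&ctx->crypto_recv_aes_gcm_128,
				 sizeof(ctx->crypto_recv_aes_gcm_128));
		kfree(ctx);
	}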

> 
>   struct list_head list;
>   struct net_device *netdev;
> diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c index
> 0d432d025471..d3a57c0b2182 100644
> --- a/net/tls/tls_main.c
> +++ b/net/tls/tls_main.c
> @@ -241,6 +241,16 @@ static void tls_write_space(struct sock *sk)
>   ctx->sk_write_space(sk);
>  }
> 
> +static void tls_ctx_free(struct tls_context *ctx) {
> + if (!ctx)
> + return;
> +
> + memzero_explicit(&ctx->crypto_send,
> +  offsetof(struct tls_context, tls_crypto_ctx_end));
> + kfree(ctx);
> +}
> +
>  static void tls_sk_proto_close(struct sock *sk, long timeout)  {
>   struct tls_context *ctx = tls_get_ctx(sk);
> @@ -294,7 +304,7 @@
> static void tls_sk_proto_close(struct sock *sk, long timeout)  #else
>   {
>  #endif
> - kfree(ctx);
> + tls_ctx_free(ctx);
>   ctx = NULL;
>   }
> 
> @@ -305,7 +315,7 @@ static void tls_sk_proto_close(struct sock *sk, long
> timeout)
>* for sk->sk_prot->unhash [tls_hw_unhash]
>*/
>   if (free_ctx)
> - kfree(ctx);
> + tls_ctx_free(ctx);
>  }
> 
>  static int do_tls_getsockopt_tx(struct sock *sk, char __user *optval,
> --
> 2.18.0



[PATCH net-next, net v2] net/tls: Set count of SG entries if sk_alloc_sg returns -ENOSPC

2018-09-06 Thread Vakul Garg
tls_sw_sendmsg() allocates plaintext and encrypted SG entries using
function sk_alloc_sg(). In case the number of SG entries hit
MAX_SKB_FRAGS, sk_alloc_sg() returns -ENOSPC and sets the variable for
current SG index to '0'. This leads to calling of function
tls_push_record() with 'sg_encrypted_num_elem = 0' and later causes
kernel crash. To fix this, set the number of SG elements to the number
of elements in plaintext/encrypted SG arrays in case sk_alloc_sg()
returns -ENOSPC.

Fixes: 3c4d7559159b ("tls: kernel TLS support")
Signed-off-by: Vakul Garg 
---
Changes since v1:
- Added 'Fixes:' tag.
- Marking that patch applies to both 'net-next' & 'net' branches
- Resending after correcting system time.
- No code changes as such

 net/tls/tls_sw.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index be4f2e990f9f..2dad3dc7be60 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -263,6 +263,9 @@ static int alloc_encrypted_sg(struct sock *sk, int len)
 &ctx->sg_encrypted_num_elem,
 &ctx->sg_encrypted_size, 0);
 
+   if (rc == -ENOSPC)
+   ctx->sg_encrypted_num_elem = ARRAY_SIZE(ctx->sg_encrypted_data);
+
return rc;
 }
 
@@ -276,6 +279,9 @@ static int alloc_plaintext_sg(struct sock *sk, int len)
 &ctx->sg_plaintext_num_elem, &ctx->sg_plaintext_size,
 tls_ctx->pending_open_record_frags);
 
+   if (rc == -ENOSPC)
+   ctx->sg_plaintext_num_elem = ARRAY_SIZE(ctx->sg_plaintext_data);
+
return rc;
 }
 
-- 
2.13.6



[PATCH net-next v1] net/tls: Fixed return value when tls_complete_pending_work() fails

2018-09-10 Thread Vakul Garg
In tls_sw_sendmsg() and tls_sw_sendpage(), the variable 'ret' has
been set to return value of tls_complete_pending_work(). This allows
return of proper error code if tls_complete_pending_work() fails.

Fixes: 3c4d7559159b ("tls: kernel TLS support")
Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index be4f2e990f9f..adab598bd6db 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -486,7 +486,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t size)
 {
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
-   int ret = 0;
+   int ret;
int required_size;
long timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
bool eor = !(msg->msg_flags & MSG_MORE);
@@ -502,7 +502,8 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t size)
 
lock_sock(sk);
 
-   if (tls_complete_pending_work(sk, tls_ctx, msg->msg_flags, &timeo))
+   ret = tls_complete_pending_work(sk, tls_ctx, msg->msg_flags, &timeo);
+   if (ret)
goto send_end;
 
if (unlikely(msg->msg_controllen)) {
@@ -637,7 +638,7 @@ int tls_sw_sendpage(struct sock *sk, struct page *page,
 {
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
-   int ret = 0;
+   int ret;
long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
bool eor;
size_t orig_size = size;
@@ -657,7 +658,8 @@ int tls_sw_sendpage(struct sock *sk, struct page *page,
 
sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
 
-   if (tls_complete_pending_work(sk, tls_ctx, flags, &timeo))
+   ret = tls_complete_pending_work(sk, tls_ctx, flags, &timeo);
+   if (ret)
goto sendpage_end;
 
/* Call the sk_stream functions to manage the sndbuf mem. */
-- 
2.13.6



[PATCH net-next v1] net/tls: Set count of SG entries if sk_alloc_sg returns -ENOSPC

2018-09-05 Thread Vakul Garg
tls_sw_sendmsg() allocates plaintext and encrypted SG entries using
function sk_alloc_sg(). In case the number of SG entries hit
MAX_SKB_FRAGS, sk_alloc_sg() returns -ENOSPC and sets the variable for
current SG index to '0'. This leads to calling of function
tls_push_record() with 'sg_encrypted_num_elem = 0' and later causes
kernel crash. To fix this, set the number of SG elements to the number
of elements in plaintext/encrypted SG arrays in case sk_alloc_sg()
returns -ENOSPC.

Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index be4f2e990f9f..2dad3dc7be60 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -263,6 +263,9 @@ static int alloc_encrypted_sg(struct sock *sk, int len)
 &ctx->sg_encrypted_num_elem,
 &ctx->sg_encrypted_size, 0);
 
+   if (rc == -ENOSPC)
+   ctx->sg_encrypted_num_elem = ARRAY_SIZE(ctx->sg_encrypted_data);
+
return rc;
 }
 
@@ -276,6 +279,9 @@ static int alloc_plaintext_sg(struct sock *sk, int len)
 >sg_plaintext_num_elem, >sg_plaintext_size,
 tls_ctx->pending_open_record_frags);
 
+   if (rc == -ENOSPC)
+   ctx->sg_plaintext_num_elem = ARRAY_SIZE(ctx->sg_plaintext_data);
+
return rc;
 }
 
-- 
2.13.6



[PATCH net-next] net/tls: Removed redundant variable from 'struct tls_sw_context_rx'

2018-07-11 Thread Vakul Garg
The variable 'decrypted' in 'struct tls_sw_context_rx' is redundant and
is being set/unset without purpose. Simplified the code by removing it.

Signed-off-by: Vakul Garg 
---
 include/net/tls.h |  1 -
 net/tls/tls_sw.c  | 87 ---
 2 files changed, 38 insertions(+), 50 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 70c273777fe9..528d0c2d6cc2 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -113,7 +113,6 @@ struct tls_sw_context_rx {
struct poll_table_struct *wait);
struct sk_buff *recv_pkt;
u8 control;
-   bool decrypted;
 
char rx_aad_ciphertext[TLS_AAD_SPACE_SIZE];
char rx_aad_plaintext[TLS_AAD_SPACE_SIZE];
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 0d670c8adf18..e5f2de2c3fd6 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -81,8 +81,6 @@ static int tls_do_decryption(struct sock *sk,
rxm->full_len -= tls_ctx->rx.overhead_size;
tls_advance_record_sn(sk, &tls_ctx->rx);
 
-   ctx->decrypted = true;
-
ctx->saved_data_ready(sk);
 
 out:
@@ -756,6 +754,9 @@ int tls_sw_recvmsg(struct sock *sk,
bool cmsg = false;
int target, err = 0;
long timeo;
+   int page_count;
+   int to_copy;
+
 
flags |= nonblock;
 
@@ -792,46 +793,38 @@ int tls_sw_recvmsg(struct sock *sk,
goto recv_end;
}
 
-   if (!ctx->decrypted) {
-   int page_count;
-   int to_copy;
-
-   page_count = iov_iter_npages(&msg->msg_iter,
-MAX_SKB_FRAGS);
-   to_copy = rxm->full_len - tls_ctx->rx.overhead_size;
-   if (to_copy <= len && page_count < MAX_SKB_FRAGS &&
-   likely(!(flags & MSG_PEEK)))  {
-   struct scatterlist sgin[MAX_SKB_FRAGS + 1];
-   int pages = 0;
-
-   zc = true;
-   sg_init_table(sgin, MAX_SKB_FRAGS + 1);
-   sg_set_buf(&sgin[0], ctx->rx_aad_plaintext,
-  TLS_AAD_SPACE_SIZE);
-
-   err = zerocopy_from_iter(sk, &msg->msg_iter,
-    to_copy, &pages,
-    &chunk, &sgin[1],
-MAX_SKB_FRAGS, false);
-   if (err < 0)
-   goto fallback_to_reg_recv;
-
-   err = decrypt_skb(sk, skb, sgin);
-   for (; pages > 0; pages--)
-   put_page(sg_page([pages]));
-   if (err < 0) {
-   tls_err_abort(sk, EBADMSG);
-   goto recv_end;
-   }
-   } else {
+   page_count = iov_iter_npages(&msg->msg_iter, MAX_SKB_FRAGS);
+   to_copy = rxm->full_len - tls_ctx->rx.overhead_size;
+
+   if (to_copy <= len && page_count < MAX_SKB_FRAGS &&
+   likely(!(flags & MSG_PEEK)))  {
+   struct scatterlist sgin[MAX_SKB_FRAGS + 1];
+   int pages = 0;
+
+   zc = true;
+   sg_init_table(sgin, MAX_SKB_FRAGS + 1);
+   sg_set_buf(&sgin[0], ctx->rx_aad_plaintext,
+  TLS_AAD_SPACE_SIZE);
+   err = zerocopy_from_iter(sk, &msg->msg_iter, to_copy,
+    &pages, &chunk, &sgin[1],
+MAX_SKB_FRAGS, false);
+   if (err < 0)
+   goto fallback_to_reg_recv;
+
+   err = decrypt_skb(sk, skb, sgin);
+   for (; pages > 0; pages--)
+   put_page(sg_page([pages]));
+   if (err < 0) {
+   tls_err_abort(sk, EBADMSG);
+   goto recv_end;
+   }
+   } else {
 fallback_to_reg_recv:
-   err = decrypt_skb(sk, skb, NULL);
-   if (err < 0) {
-   tls_err_abort(sk, EBADMSG);
-   goto recv_end;
-   }
+   err = decrypt_skb(sk, skb, NULL);
+   if (err < 0) {
+   tls_err_abort(sk, EBAD

RE: [PATCH net-next] net/tls: Removed redundant variable from 'struct tls_sw_context_rx'

2018-07-12 Thread Vakul Garg
Hi Boris

Thanks for explaining.
A few questions/observations.

1. Isn't 'ctx->decrypted = true' a redundant statement in tls_do_decryption()?
The same is repeated in tls_sw_recvmsg() after calling decrypt_skb().

2. Similarly, ctx->saved_data_ready(sk) seems not required in
tls_do_decryption().
This is because tls_do_decryption() is already triggered from tls_sw_recvmsg(),
i.e. from the user space app context.

3. In tls_queue(), I think strp->sk->sk_state_change() needs to be replaced
with ctx->saved_data_ready().

Regards

Vakul

> -Original Message-
> From: Boris Pismenny [mailto:bor...@mellanox.com]
> Sent: Thursday, July 12, 2018 4:11 PM
> To: Vakul Garg ; da...@davemloft.net;
> davejwat...@fb.com; netdev@vger.kernel.org
> Cc: avia...@mellanox.com
> Subject: Re: [PATCH net-next] net/tls: Removed redundant variable from
> 'struct tls_sw_context_rx'
> 
> Hi Vakul,
> 
> On 7/12/2018 7:03 AM, Vakul Garg wrote:
> > The variable 'decrypted' in 'struct tls_sw_context_rx' is redundant
> > and is being set/unset without purpose. Simplified the code by removing it.
> >
> 
> AFAIU, this variable has an important use here. It keeps the state whether
> the current record has been decrypted between invocations of the
> recv/splice system calls. Otherwise, some records would be decrypted more
> than once if the entire record was not read.




> 
> > Signed-off-by: Vakul Garg 
> > ---
> >   include/net/tls.h |  1 -
> >   net/tls/tls_sw.c  | 87 
> > 
> ---
> >   2 files changed, 38 insertions(+), 50 deletions(-)
> >
> > diff --git a/include/net/tls.h b/include/net/tls.h index
> > 70c273777fe9..528d0c2d6cc2 100644
> > --- a/include/net/tls.h
> > +++ b/include/net/tls.h
> > @@ -113,7 +113,6 @@ struct tls_sw_context_rx {
> > struct poll_table_struct *wait);
> > struct sk_buff *recv_pkt;
> > u8 control;
> > -   bool decrypted;
> >
> > char rx_aad_ciphertext[TLS_AAD_SPACE_SIZE];
> > char rx_aad_plaintext[TLS_AAD_SPACE_SIZE];
> > diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index
> > 0d670c8adf18..e5f2de2c3fd6 100644
> > --- a/net/tls/tls_sw.c
> > +++ b/net/tls/tls_sw.c
> > @@ -81,8 +81,6 @@ static int tls_do_decryption(struct sock *sk,
> > rxm->full_len -= tls_ctx->rx.overhead_size;
> > tls_advance_record_sn(sk, &tls_ctx->rx);
> >
> > -   ctx->decrypted = true;
> > -
> > ctx->saved_data_ready(sk);
> >
> >   out:
> > @@ -756,6 +754,9 @@ int tls_sw_recvmsg(struct sock *sk,
> > bool cmsg = false;
> > int target, err = 0;
> > long timeo;
> > +   int page_count;
> > +   int to_copy;
> > +
> >
> > flags |= nonblock;
> >
> > @@ -792,46 +793,38 @@ int tls_sw_recvmsg(struct sock *sk,
> > goto recv_end;
> > }
> >
> > -   if (!ctx->decrypted) {
> > -   int page_count;
> > -   int to_copy;
> > -
> > -   page_count = iov_iter_npages(&msg->msg_iter,
> > -MAX_SKB_FRAGS);
> > -   to_copy = rxm->full_len - tls_ctx->rx.overhead_size;
> > -   if (to_copy <= len && page_count < MAX_SKB_FRAGS
> &&
> > -   likely(!(flags & MSG_PEEK)))  {
> > -   struct scatterlist sgin[MAX_SKB_FRAGS + 1];
> > -   int pages = 0;
> > -
> > -   zc = true;
> > -   sg_init_table(sgin, MAX_SKB_FRAGS + 1);
> > -   sg_set_buf(&sgin[0], ctx->rx_aad_plaintext,
> > -  TLS_AAD_SPACE_SIZE);
> > -
> > -   err = zerocopy_from_iter(sk, &msg->msg_iter,
> > -    to_copy, &pages,
> > -    &chunk, &sgin[1],
> > -    MAX_SKB_FRAGS, false);
> > -   if (err < 0)
> > -   goto fallback_to_reg_recv;
> > -
> > -   err = decrypt_skb(sk, skb, sgin);
> > -   for (; pages > 0; pages--)
> > -   put_page(sg_page(&sgin[pages]));
> > -   if (err < 0) {
> > -   tls_err_abo

[PATCH net-next 3/3] net/tls: Remove redundant variable assignments and wakeup

2018-07-13 Thread Vakul Garg
In function tls_do_decryption(), the assignment to the tls receive context
variable 'decrypted' is redundant, as the same is done in function
tls_sw_recvmsg() after calling decrypt_skb(). Also, calling the callback to
wake up processes sleeping on socket data availability is useless, as
tls_do_decryption() is invoked from user processes only. This patch cleans
these up.

Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index a184c70ee0ac..a53fdcc33e31 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -78,10 +78,6 @@ static int tls_do_decryption(struct sock *sk,
rxm->full_len -= tls_ctx->rx.overhead_size;
tls_advance_record_sn(sk, &tls_ctx->rx);
 
-   ctx->decrypted = true;
-
-   ctx->saved_data_ready(sk);
-
 out:
aead_request_free(aead_req);
return ret;
-- 
2.13.6



[PATCH net-next 2/3] net/tls: Use socket data_ready callback on record availability

2018-07-13 Thread Vakul Garg
On receipt of a complete tls record, use socket's saved data_ready
callback instead of state_change callback.

Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index d31dc2487488..a184c70ee0ac 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1000,7 +1000,7 @@ static void tls_queue(struct strparser *strp, struct 
sk_buff *skb)
ctx->recv_pkt = skb;
strp_pause(strp);
 
-   strp->sk->sk_state_change(strp->sk);
+   ctx->saved_data_ready(strp->sk);
 }
 
 static void tls_data_ready(struct sock *sk)
-- 
2.13.6



[PATCH net-next 1/3] net/tls: Do not enable zero-copy prematurely

2018-07-13 Thread Vakul Garg
Zero-copy mode was left enabled even when zerocopy_from_iter() failed.
Set the zero-copy mode only when zerocopy_from_iter() succeeds.

Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 7453f5ae0819..d31dc2487488 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -800,7 +800,6 @@ int tls_sw_recvmsg(struct sock *sk,
struct scatterlist sgin[MAX_SKB_FRAGS + 1];
int pages = 0;
 
-   zc = true;
sg_init_table(sgin, MAX_SKB_FRAGS + 1);
sg_set_buf(&sgin[0], ctx->rx_aad_plaintext,
   TLS_AAD_SPACE_SIZE);
@@ -811,6 +810,8 @@ int tls_sw_recvmsg(struct sock *sk,
 MAX_SKB_FRAGS, false);
if (err < 0)
goto fallback_to_reg_recv;
+   else
+   zc = true;
 
err = decrypt_skb(sk, skb, sgin);
for (; pages > 0; pages--)
-- 
2.13.6



RE: [PATCH v9 crypto 00/12] Chelsio Inline TLS

2018-03-06 Thread Vakul Garg


> -Original Message-
> From: linux-crypto-ow...@vger.kernel.org [mailto:linux-crypto-
> ow...@vger.kernel.org] On Behalf Of Atul Gupta
> Sent: Tuesday, March 6, 2018 9:05 PM
> To: davejwat...@fb.com; da...@davemloft.net;
> herb...@gondor.apana.org.au
> Cc: linux-cry...@vger.kernel.org; netdev@vger.kernel.org;
> ganes...@chelsio.com
> Subject: [PATCH v9 crypto 00/12] Chelsio Inline TLS
> 
> Series for Chelsio Inline TLS driver (chtls)
> 
> Use tls ULP infrastructure to register chtls as Inline TLS driver.
> Chtls use TCP Sockets to transmit and receive TLS record. TCP proto_ops is
> extended to offload TLS record.
> 
> T6 adapter provides the following features:
> -TLS record offload, TLS header, encrypt, digest and transmit
> -TLS record receive and decrypt
> -TLS keys store
> -TCP/IP engine
> -TLS engine
> -GCM crypto engine [support CBC also]
> 
> TLS provides security at the transport layer. It uses TCP to provide reliable
> end-to-end transport of application data. It relies on TCP for any
> retransmission. TLS session comprises of three parts:
> a. TCP/IP connection
> b. TLS handshake
> c. Record layer processing
> 
> TLS handshake state machine is executed in host (refer standard
> implementation eg. OpenSSL).  Setsockopt [SOL_TCP, TCP_ULP] initialize TCP
> proto-ops for Chelsio inline tls support. setsockopt(sock, SOL_TCP, TCP_ULP,
> "tls", sizeof("tls"));
> 
> Tx and Rx Keys are decided during handshake and programmed onto the chip
> after CCS is exchanged.
> struct tls12_crypto_info_aes_gcm_128 crypto_info setsockopt(sock,
> SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info)) Finish is the first
> encrypted/decrypted message tx/rx inline.
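
[ For readers unfamiliar with the kTLS socket API mentioned above, a minimal
  user-space sketch of that sequence is shown below; the key/IV/salt/record
  sequence values are placeholders that would come from the TLS handshake,
  and SOL_TLS may need to be defined (282) with older libc headers. ]

	#include <string.h>
	#include <sys/socket.h>
	#include <netinet/tcp.h>
	#include <linux/tls.h>

	static int enable_ktls_tx(int sock, const unsigned char *key,
				  const unsigned char *iv,
				  const unsigned char *salt,
				  const unsigned char *rec_seq)
	{
		struct tls12_crypto_info_aes_gcm_128 crypto_info = {
			.info.version = TLS_1_2_VERSION,
			.info.cipher_type = TLS_CIPHER_AES_GCM_128,
		};

		/* Handshake-derived material (placeholders). */
		memcpy(crypto_info.key, key, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
		memcpy(crypto_info.iv, iv, TLS_CIPHER_AES_GCM_128_IV_SIZE);
		memcpy(crypto_info.salt, salt, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
		memcpy(crypto_info.rec_seq, rec_seq,
		       TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);

		/* Attach the TLS ULP, then program the Tx key. */
		if (setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls")))
			return -1;
		return setsockopt(sock, SOL_TLS, TLS_TX,
				  &crypto_info, sizeof(crypto_info));
	}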
> 
> On the Tx path TLS engine receive plain text from openssl, insert IV, fetches
> the tx key, create cipher text records and generate MAC. TLS header is added
> to cipher text and forward to TCP/IP engine for transport layer processing
> and transmission on wire.
> TX:
> Application--openssl--chtls---TLS engine---encrypt/auth---TCP/IP engine---
> wire.
> 
> On the Rx side, data received is PDU aligned at record boundaries. TLS
> processes only the complete record. If rx key is programmed on CCS receive,
> data is decrypted and plain text is posted to host.

What happens if the rx key is not programmed on CCS receive?
Does TLS record decryption in the Chelsio NIC device get paused?
 

> RX:
> Wire--cipher-text--TCP/IP engine [PDU align]---TLS engine--- decrypt/auth---
> plain-text--chtls--openssl--application
> 
> v9: corrected __u8 and similar usage
> 
> v8: tls_main.c cleanup comment [Dave Watson]
> 
> v7: func name change, use sk->sk_prot where required
> 
> v6: modify prot only for FULL_HW
>-corrected commit message for patch 11
> 
> v5: set TLS_FULL_HW for registered inline tls drivers
>-set TLS_FULL_HW prot for offload connection else move
> to TLS_SW_TX
>-Case handled for interface with same IP [Dave Miller]
>-Removed Specific IP and INADDR_ANY handling [v4]
> 
> v4: removed chtls ULP type, retained tls ULP
>-registered chtls with net tls
>-defined struct tls_device to register the Inline drivers
>-ethtool interface tls-inline to enable Inline TLS for interface
>-prot update to support inline TLS
> 
> v3: fixed the kbuild test issues
>-made few functions static
>-initialized few variables
> 
> v2: fixed the following based on the review comments of Stephan Mueller,
> Stefano Brivio and Hannes Frederic
> -Added more details in cover letter
> -Fixed indentation and formating issues
> -Using aes instead of aes-generic
> -memset key info after programing the key on chip
> -reordered the patch sequence
> 
> Atul Gupta (12):
>   tls: tls_device struct to register TLS drivers
>   ethtool: enable Inline TLS in HW
>   tls: support for inline tls
>   chtls: structure and macro definiton
>   cxgb4: Inline TLS FW Interface
>   cxgb4: LLD driver changes to enable TLS
>   chcr: Key Macro
>   chtls: Key program
>   chtls: CPL handler definition
>   chtls: Inline crypto request Tx/Rx
>   chtls: Register chtls Inline TLS with net tls
>   Makefile Kconfig
> 
>  drivers/crypto/chelsio/Kconfig  |   11 +
>  drivers/crypto/chelsio/Makefile |1 +
>  drivers/crypto/chelsio/chcr_algo.h  |   42 +
>  drivers/crypto/chelsio/chcr_core.h  |   55 +-
>  drivers/crypto/chelsio/chtls/Makefile   |4 +
>  drivers/crypto/chelsio/chtls/chtls.h|  487 ++
>  drivers/crypto/chelsio/chtls/chtls_cm.c | 2041
> +++
>  drivers/crypto/chelsio/chtls/chtls_cm.h |  202 +++
>  drivers/crypto/chelsio/chtls/chtls_hw.c |  394 +
>  drivers/crypto/chelsio/chtls/chtls_io.c | 1867 +
>  drivers/crypto/chelsio/chtls/chtls_main.c   |  600 +++
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |   32 +-
>  

[PATCH net-next] strparser: Call skb_unclone conditionally

2018-06-29 Thread Vakul Garg
Calling skb_unclone() is expensive as it triggers a memcpy operation.
Instead of calling skb_unclone() unconditionally, call it only when the
skb has a shared frag_list. This improves TLS rx throughput significantly.

Signed-off-by: Vakul Garg 
Suggested-by: Boris Pismenny 
---
 net/strparser/strparser.c | 26 +++---
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
index 373836615c57..4f40a90ca016 100644
--- a/net/strparser/strparser.c
+++ b/net/strparser/strparser.c
@@ -155,11 +155,13 @@ static int __strp_recv(read_descriptor_t *desc, struct 
sk_buff *orig_skb,
/* We are going to append to the frags_list of head.
 * Need to unshare the frag_list.
 */
-   err = skb_unclone(head, GFP_ATOMIC);
-   if (err) {
-   STRP_STATS_INCR(strp->stats.mem_fail);
-   desc->error = err;
-   return 0;
+   if (skb_has_frag_list(head)) {
+   err = skb_unclone(head, GFP_ATOMIC);
+   if (err) {
+   STRP_STATS_INCR(strp->stats.mem_fail);
+   desc->error = err;
+   return 0;
+   }
}
 
if (unlikely(skb_shinfo(head)->frag_list)) {
@@ -216,14 +218,16 @@ static int __strp_recv(read_descriptor_t *desc, struct 
sk_buff *orig_skb,
memset(stm, 0, sizeof(*stm));
stm->strp.offset = orig_offset + eaten;
} else {
-   /* Unclone since we may be appending to an skb that we
+   /* Unclone if we are appending to an skb that we
 * already share a frag_list with.
 */
-   err = skb_unclone(skb, GFP_ATOMIC);
-   if (err) {
-   STRP_STATS_INCR(strp->stats.mem_fail);
-   desc->error = err;
-   break;
+   if (skb_has_frag_list(skb)) {
+   err = skb_unclone(skb, GFP_ATOMIC);
+   if (err) {
+   STRP_STATS_INCR(strp->stats.mem_fail);
+   desc->error = err;
+   break;
+   }
}
 
stm = _strp_msg(head);
-- 
2.13.6



[PATCH net-next] tls: Add support for inplace records encryption

2018-09-29 Thread Vakul Garg
Presently, for the non-zero-copy case, separate pages are allocated for
storing plaintext and encrypted text of records. These pages are stored
in the sg_plaintext_data and sg_encrypted_data scatterlists inside the
record structure. Further, sg_plaintext_data & sg_encrypted_data are
passed to the crypto APIs for record encryption. Allocating separate
pages for plaintext and encrypted text is inefficient from both a memory
and a performance point of view.

This patch adds support for in-place encryption of records. For the
non-zero-copy case, we reuse the pages from the sg_encrypted_data
scatterlist to copy the application's plaintext data. For the movement
of pages from the sg_encrypted_data to the sg_plaintext_data scatterlist,
we introduce a new function move_to_plaintext_sg(). This function adds
pages into sg_plaintext_data from the sg_encrypted_data scatterlist.

tls_do_encryption() is modified to pass the same scatterlist as both
source and destination into aead_request_set_crypt() if in-place crypto
has been enabled. A new variable 'inplace_crypto' has been introduced in
the record structure to signify whether the same scatterlist can be used.
By default, inplace_crypto is enabled in get_rec(). If zero-copy is used
(i.e. plaintext data is not copied), inplace_crypto is set to '0'.
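
The crypto API side of the change is small; a hedged sketch of the
tls_do_encryption() logic described above (structure and field names as used
in this series, surrounding code omitted):

	/* If the plaintext was copied straight into the encrypted-data
	 * pages, hand the same scatterlist to the AEAD as both source
	 * and destination so the cipher runs in place.
	 */
	if (rec->inplace_crypto)
		aead_request_set_crypt(aead_req, rec->sg_encrypted_data,
				       rec->sg_encrypted_data,
				       data_len, tls_ctx->tx.iv);
	else
		aead_request_set_crypt(aead_req, rec->sg_plaintext_data,
				       rec->sg_encrypted_data,
				       data_len, tls_ctx->tx.iv);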

Signed-off-by: Vakul Garg 
---
 include/net/tls.h |  1 +
 net/tls/tls_sw.c  | 91 ---
 2 files changed, 74 insertions(+), 18 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 262420cdad10..5e853835597e 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -101,6 +101,7 @@ struct tls_rec {
struct list_head list;
int tx_ready;
int tx_flags;
+   int inplace_crypto;
 
/* AAD | sg_plaintext_data | sg_tag */
struct scatterlist sg_plaintext_data[MAX_SKB_FRAGS + 1];
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 1d4c354d5516..aa9fdce272b6 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -281,24 +281,72 @@ static int alloc_encrypted_sg(struct sock *sk, int len)
return rc;
 }
 
-static int alloc_plaintext_sg(struct sock *sk, int len)
+static int move_to_plaintext_sg(struct sock *sk, int required_size)
 {
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
struct tls_rec *rec = ctx->open_rec;
-   int rc = 0;
+   struct scatterlist *plain_sg = &rec->sg_plaintext_data[1];
+   struct scatterlist *enc_sg = &rec->sg_encrypted_data[1];
+   int enc_sg_idx = 0;
+   int skip, len;
 
-   rc = sk_alloc_sg(sk, len,
-&rec->sg_plaintext_data[1], 0,
-&rec->sg_plaintext_num_elem,
-&rec->sg_plaintext_size,
-tls_ctx->pending_open_record_frags);
+   if (rec->sg_plaintext_num_elem == MAX_SKB_FRAGS)
+   return -ENOSPC;
 
-   if (rc == -ENOSPC)
-   rec->sg_plaintext_num_elem =
-   ARRAY_SIZE(rec->sg_plaintext_data) - 1;
+   /* We add page references worth len bytes from enc_sg at the
+* end of plain_sg. It is guaranteed that sg_encrypted_data
+* has enough required room (ensured by caller).
+*/
+   len = required_size - rec->sg_plaintext_size;
 
-   return rc;
+   /* Skip initial bytes in sg_encrypted_data to be able
+* to use same offset of both plain and encrypted data.
+*/
+   skip = tls_ctx->tx.prepend_size + rec->sg_plaintext_size;
+
+   while (enc_sg_idx < rec->sg_encrypted_num_elem) {
+   if (enc_sg[enc_sg_idx].length > skip)
+   break;
+
+   skip -= enc_sg[enc_sg_idx].length;
+   enc_sg_idx++;
+   }
+
+   /* unmark the end of plain_sg*/
+   sg_unmark_end(plain_sg + rec->sg_plaintext_num_elem - 1);
+
+   while (len) {
+   struct page *page = sg_page(&enc_sg[enc_sg_idx]);
+   int bytes = enc_sg[enc_sg_idx].length - skip;
+   int offset = enc_sg[enc_sg_idx].offset + skip;
+
+   if (bytes > len)
+   bytes = len;
+   else
+   enc_sg_idx++;
+
+   /* Skipping is required only one time */
+   skip = 0;
+
+   /* Increment page reference */
+   get_page(page);
+
+   sg_set_page(&plain_sg[rec->sg_plaintext_num_elem], page,
+   bytes, offset);
+
+   sk_mem_charge(sk, bytes);
+
+   len -= bytes;
+   rec->sg_plaintext_size += bytes;
+
+   rec->sg_plaintext_num_elem++;
+
+   if (rec->sg_plaintext_num_elem == MAX_SKB_FRAGS)
+   return -ENOSPC;
+   }
+
+   return 0;
 }
 
 static void free_sg(struct sock *sk, struct scatterlist *sg,
@@ -459,16 +507,21 @@ static int tls_do_en

[PATCH net-next v2] net/tls: Add support for async encryption of records for performance

2018-09-20 Thread Vakul Garg
In the current implementation, TLS records are encrypted and transmitted
serially. The implementation waits until the previously submitted user data
has been encrypted and only then starts transmitting the record. This
encrypt-one-record-at-a-time approach is inefficient when asynchronous crypto
accelerators are used: each record incurs the overhead of interrupts, driver
softIRQ scheduling etc., and the crypto accelerator sits idle most of the
time while an encrypted record's pages are handed over to the TCP stack for
transmission.

This patch enables encryption of multiple records in parallel when an
async-capable crypto accelerator is present in the system. This is achieved
by allowing the user space application to send more data using sendmsg()
even while previously issued data is still being processed by the crypto
accelerator. This requires returning control back to the user space
application right after submitting the encryption request to the accelerator.
It also means that the zero-copy mode of encryption cannot be used with an
async accelerator, since we must be done with the user space application's
buffer before returning from sendmsg().

There can be multiple records in flight to/from the accelerator. Each such
record is represented by a 'struct tls_rec', which stores the memory pages
for the record.

After the records are encrypted, they are added to a linked list called
tx_ready_list, which contains encrypted TLS records sorted by TLS sequence
number. The records on tx_ready_list are transmitted using a newly
introduced function called tls_tx_records(). The tx_ready_list is polled for
any record ready to be transmitted in sendmsg() and sendpage() after
initiating encryption of new TLS records. This achieves parallel encryption
and transmission of records when an async accelerator is present.

The crypto accelerator may complete an encryption after the last polling of
tx_ready_list by sendmsg()/sendpage(). Therefore we need a deferred work
context to be able to transmit records from tx_ready_list. The deferred work
context gets scheduled if the application is not sending much data through
the socket. If the application issues sendmsg()/sendpage() calls in quick
succession, the scheduling of tx_work_handler is cancelled, as the
tx_ready_list will be polled from the application's context itself. This
saves the scheduling overhead of deferred work.

The patch also brings a side benefit: we are able to get rid of the concept
of a CLOSED record. This is because records, once closed, are either
encrypted and then placed onto tx_ready_list, or, if encryption fails, the
socket error is set. This simplifies the kernel TLS send path. However,
since tls_device.c still uses the macros, the helper functions for CLOSED
records have been retained.
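
The deferred transmission mentioned above follows the usual delayed-work
pattern; a condensed sketch of the handler (names from this patch, error
handling trimmed):

	static void tx_work_handler(struct work_struct *work)
	{
		struct delayed_work *delayed_work = to_delayed_work(work);
		struct tx_work *tx_work = container_of(delayed_work,
						       struct tx_work, work);
		struct sock *sk = tx_work->sk;
		struct tls_context *tls_ctx = tls_get_ctx(sk);
		struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);

		/* Only run if sendmsg()/sendpage() did not already drain
		 * tx_ready_list from the application's own context.
		 */
		if (!test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))
			return;

		lock_sock(sk);
		tls_tx_records(sk, -1);
		release_sock(sk);
	}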

Signed-off-by: Vakul Garg 
---

Changes since v1: Addressed Dave Miller's comments.
- Removed an extra space between 'inline' and 'bool' in
  'is_tx_ready' declaration.
- Changed order of variable declarations at several places in
  order to follow the 'longest to shortest' line suggestion.

 include/net/tls.h  |  70 +--
 net/tls/tls_main.c |  54 ++---
 net/tls/tls_sw.c   | 586 -
 3 files changed, 522 insertions(+), 188 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 9f3c4ea9ad6f..3aa73e2d8823 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -41,7 +41,7 @@
 #include 
 #include 
 #include 
-
+#include 
 #include 
 
 
@@ -93,24 +93,47 @@ enum {
TLS_NUM_CONFIG,
 };
 
-struct tls_sw_context_tx {
-   struct crypto_aead *aead_send;
-   struct crypto_wait async_wait;
-
-   char aad_space[TLS_AAD_SPACE_SIZE];
-
-   unsigned int sg_plaintext_size;
-   int sg_plaintext_num_elem;
+/* TLS records are maintained in 'struct tls_rec'. It stores the memory pages
+ * allocated or mapped for each TLS record. After encryption, the records are
+ * stored in a linked list.
+ */
+struct tls_rec {
+   struct list_head list;
+   int tx_flags;
struct scatterlist sg_plaintext_data[MAX_SKB_FRAGS];
-
-   unsigned int sg_encrypted_size;
-   int sg_encrypted_num_elem;
struct scatterlist sg_encrypted_data[MAX_SKB_FRAGS];
 
/* AAD | sg_plaintext_data | sg_tag */
struct scatterlist sg_aead_in[2];
/* AAD | sg_encrypted_data (data contain overhead for hdr) */
struct scatterlist sg_aead_out[2];
+
+   unsigned int sg_plaintext_size;
+   unsigned int sg_encrypted_size;
+   int sg_plaintext_num_elem;
+   int sg_encrypted_num_elem;
+
+   char aad_space[TLS_AAD_SPACE_SIZE];
+   struct aead_request aead_req;
+   u8 aead_req_ctx[];
+};
+
+struct tx_work {
+   struct delayed_work work;
+   struct sock *sk;
+};
+
+struct tls_sw_context_tx {
+   struct crypto_aead *aead_send;
+   struct crypto_wait async_wait;
+   struct tx_work tx_work;
+   struct tls_rec *open_rec;
+   st

RE: [PATCH net-next] net/tls: Add support for async encryption of records for performance

2018-09-20 Thread Vakul Garg



> -Original Message-
> From: David Miller 
> Sent: Thursday, September 20, 2018 11:49 PM
> To: Vakul Garg 
> Cc: netdev@vger.kernel.org; bor...@mellanox.com;
> avia...@mellanox.com; davejwat...@fb.com; doro...@fb.com
> Subject: Re: [PATCH net-next] net/tls: Add support for async encryption of
> records for performance
> 
> From: Vakul Garg 
> Date: Wed, 19 Sep 2018 20:51:35 +0530
> 
> > This patch enables encryption of multiple records in parallel when an
> > async capable crypto accelerator is present in system.
> 
> This seems to be trading off zero copy with async support.
> 
> Async crypto device support is not the common case at all, and synchronous
> crypto via cpu crypto acceleration instructions is so much more likely.
> 
> Oh I see, the new logic is only triggered with ASYNC_CAPABLE is set?
> 
> > +static inline  bool is_tx_ready(struct tls_context *tls_ctx,
> > +   struct tls_sw_context_tx *ctx)
> > +{
> 
> Two space between "inline" and "bool", please make it one.
 
Fixed.
Seems checkpatch misses it.

> 
> >  static void tls_write_space(struct sock *sk)  {
> > struct tls_context *ctx = tls_get_ctx(sk);
> > +   struct tls_sw_context_tx *tx_ctx = tls_sw_ctx_tx(ctx);
> 
> Longest to shortest line (reverse christmas tree) ordering for local variable
> declarations please.
 
Can't do this. The second variable assignment depends upon the previous one.


> >
> > +   list_for_each_prev(pos, >tx_ready_list) {
> > +   struct tls_rec *rec = (struct tls_rec *)pos;
> > +   u64 seq = be64_to_cpup((const __be64 *)&rec->aad_space);
> 
> Likewise.
> 

I can split the declaration of 'seq' and its assignment into two separate
lines. But I am not sure that increasing the number of lines just to comply
with reverse Christmas tree ordering is a good thing in this case.

> > -static int tls_do_encryption(struct tls_context *tls_ctx,
> > +int tls_tx_records(struct sock *sk, int flags) {
> > +   struct tls_rec *rec, *tmp;
> > +   struct tls_context *tls_ctx = tls_get_ctx(sk);
> > +   struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
> > +   int rc = 0;
> > +   int tx_flags;
> 
> Likewise.

Could partially address this, since the ctx assignment depends upon the tls_ctx assignment.
 
> 
> > +static void tls_encrypt_done(struct crypto_async_request *req, int
> > +err) {
> > +   struct aead_request *aead_req = (struct aead_request *)req;
> > +   struct sock *sk = req->data;
> > +   struct tls_context *tls_ctx = tls_get_ctx(sk);
> > +   struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
> > +   struct tls_rec *rec;
> > +   int pending;
> > +   bool ready = false;
> 
> Likewise.

Placed 'ready' above 'pending'. Rest unchanged because of dependencies.

> 
> > +static int tls_do_encryption(struct sock *sk,
> > +struct tls_context *tls_ctx,
> >  struct tls_sw_context_tx *ctx,
> >  struct aead_request *aead_req,
> >  size_t data_len)
> >  {
> > int rc;
> > +   struct tls_rec *rec = ctx->open_rec;
> 
> Likewise.
> 
> > @@ -473,11 +630,12 @@ static int memcopy_from_iter(struct sock *sk,
> > struct iov_iter *from,  {
> > struct tls_context *tls_ctx = tls_get_ctx(sk);
> > struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
> > -   struct scatterlist *sg = ctx->sg_plaintext_data;
> > +   struct tls_rec *rec = ctx->open_rec;
> > +   struct scatterlist *sg = rec->sg_plaintext_data;
> > int copy, i, rc = 0;
> 
> Likewise.
 
Can't change because of dependencies.

> 
> > +struct tls_rec *get_rec(struct sock *sk) {
> > +   int mem_size;
> > +   struct tls_context *tls_ctx = tls_get_ctx(sk);
> > +   struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
> > +   struct tls_rec *rec;
> 
> Likewise.
> 
 
Declared 'mem_size' below 'rec'.

> > @@ -510,21 +707,33 @@ int tls_sw_sendmsg(struct sock *sk, struct
> msghdr *msg, size_t size)
> > int record_room;
> > bool full_record;
> > int orig_size;
> > +   struct tls_rec *rec;
> > bool is_kvec = msg->msg_iter.type & ITER_KVEC;
> > +   struct crypto_tfm *tfm = crypto_aead_tfm(ctx->aead_send);
> > +   bool async_capable = tfm->__crt_alg->cra_flags &
> CRYPTO_ALG_ASYNC;
> > +   int num_async = 0;
> > +   int num_zc = 0;
> 
> Likewise.

Fixed

> > @@ -661,6 +904,8 @@ int tls_sw_sendpage(struct sock *sk, struct page
> *page,
> > struct scatterlist *sg;
> > bool full_record;
> > int record_room;
> > +   struct tls_rec *rec;
> > +   int num_async = 0;
> 
> Likewise.
 
Fixed.

Sending v2.


[PATCH net-next] net/tls: Add support for async encryption of records for performance

2018-09-19 Thread Vakul Garg
In the current implementation, TLS records are encrypted and transmitted
serially. The implementation waits until the previously submitted user data
has been encrypted and only then starts transmitting the record. This
encrypt-one-record-at-a-time approach is inefficient when asynchronous crypto
accelerators are used: each record incurs the overhead of interrupts, driver
softIRQ scheduling etc., and the crypto accelerator sits idle most of the
time while an encrypted record's pages are handed over to the TCP stack for
transmission.

This patch enables encryption of multiple records in parallel when an
async-capable crypto accelerator is present in the system. This is achieved
by allowing the user space application to send more data using sendmsg()
even while previously issued data is still being processed by the crypto
accelerator. This requires returning control back to the user space
application right after submitting the encryption request to the accelerator.
It also means that the zero-copy mode of encryption cannot be used with an
async accelerator, since we must be done with the user space application's
buffer before returning from sendmsg().

There can be multiple records in flight to/from the accelerator. Each such
record is represented by a 'struct tls_rec', which stores the memory pages
for the record.

After the records are encrypted, they are added to a linked list called
tx_ready_list, which contains encrypted TLS records sorted by TLS sequence
number. The records on tx_ready_list are transmitted using a newly
introduced function called tls_tx_records(). The tx_ready_list is polled for
any record ready to be transmitted in sendmsg() and sendpage() after
initiating encryption of new TLS records. This achieves parallel encryption
and transmission of records when an async accelerator is present.

The crypto accelerator may complete an encryption after the last polling of
tx_ready_list by sendmsg()/sendpage(). Therefore we need a deferred work
context to be able to transmit records from tx_ready_list. The deferred work
context gets scheduled if the application is not sending much data through
the socket. If the application issues sendmsg()/sendpage() calls in quick
succession, the scheduling of tx_work_handler is cancelled, as the
tx_ready_list will be polled from the application's context itself. This
saves the scheduling overhead of deferred work.

The patch also brings a side benefit: we are able to get rid of the concept
of a CLOSED record. This is because records, once closed, are either
encrypted and then placed onto tx_ready_list, or, if encryption fails, the
socket error is set. This simplifies the kernel TLS send path. However,
since tls_device.c still uses the macros, the helper functions for CLOSED
records have been retained.

Signed-off-by: Vakul Garg 
---
 include/net/tls.h  |  70 +--
 net/tls/tls_main.c |  54 ++---
 net/tls/tls_sw.c   | 569 -
 3 files changed, 515 insertions(+), 178 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 9f3c4ea9ad6f..84756667fc2a 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -41,7 +41,7 @@
 #include 
 #include 
 #include 
-
+#include 
 #include 
 
 
@@ -93,24 +93,47 @@ enum {
TLS_NUM_CONFIG,
 };
 
-struct tls_sw_context_tx {
-   struct crypto_aead *aead_send;
-   struct crypto_wait async_wait;
-
-   char aad_space[TLS_AAD_SPACE_SIZE];
-
-   unsigned int sg_plaintext_size;
-   int sg_plaintext_num_elem;
+/* TLS records are maintained in 'struct tls_rec'. It stores the memory pages
+ * allocated or mapped for each TLS record. After encryption, the records are
+ * stored in a linked list.
+ */
+struct tls_rec {
+   struct list_head list;
+   int tx_flags;
struct scatterlist sg_plaintext_data[MAX_SKB_FRAGS];
-
-   unsigned int sg_encrypted_size;
-   int sg_encrypted_num_elem;
struct scatterlist sg_encrypted_data[MAX_SKB_FRAGS];
 
/* AAD | sg_plaintext_data | sg_tag */
struct scatterlist sg_aead_in[2];
/* AAD | sg_encrypted_data (data contain overhead for hdr) */
struct scatterlist sg_aead_out[2];
+
+   unsigned int sg_plaintext_size;
+   unsigned int sg_encrypted_size;
+   int sg_plaintext_num_elem;
+   int sg_encrypted_num_elem;
+
+   char aad_space[TLS_AAD_SPACE_SIZE];
+   struct aead_request aead_req;
+   u8 aead_req_ctx[];
+};
+
+struct tx_work {
+   struct delayed_work work;
+   struct sock *sk;
+};
+
+struct tls_sw_context_tx {
+   struct crypto_aead *aead_send;
+   struct crypto_wait async_wait;
+   struct tx_work tx_work;
+   struct tls_rec *open_rec;
+   struct list_head tx_ready_list;
+   atomic_t encrypt_pending;
+   int async_notify;
+
+#define BIT_TX_SCHEDULED   0
+   unsigned long tx_bitmask;
 };
 
 struct tls_sw_context_rx {
@@ -197,6 +220,8 @@ struct tls_context {
 
struct scatterlist *partially_sent_re

[PATCH net-next] net/tls: Fixed race condition in async encryption

2018-09-24 Thread Vakul Garg
On processors with multi-engine crypto accelerators, it is possible that
multiple records get encrypted in parallel and their encryption
completions are notified to different cpus of a multicore processor. This
leads to the situation where tls_encrypt_done() starts executing in
parallel on different cores. In the current implementation, encrypted
records are queued to tx_ready_list in tls_encrypt_done(). This requires
the addition to the linked list 'tx_ready_list' to be protected. As
tls_encrypt_done() could be executing in irq context, it is not possible
to protect the linked list addition operation using a lock.

To fix the problem, we remove the linked list addition operation from the
irq context. We do tx_ready_list addition/removal operations from the
application context only, and thereby get rid of concurrent accesses to
the linked list. Before starting encryption on a record, we add it to the
tail of tx_ready_list. To prevent tls_tx_records() from transmitting it
before its encryption has completed, we mark the record with a new flag
'tx_ready' in 'struct tls_rec'. When record encryption completes,
tls_encrypt_done() only has to set the 'tx_ready' flag to true; a linked
list add operation is no longer required there.

The changed logic brings some other side benefits. Since the records are
always submitted in TLS sequence number order for encryption, the
tx_ready_list always remains sorted, and addition of new records to it
does not have to traverse the linked list.

Lastly, we renamed tx_ready_list in 'struct tls_sw_context_tx' to
'tx_list'. This is because now some of the records at the tail may not
yet be ready to transmit.
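
The completion callback therefore stays lock-free; a trimmed sketch consistent
with the scheme described above (error handling and tx_work scheduling
omitted):

	static void tls_encrypt_done(struct crypto_async_request *req, int err)
	{
		struct aead_request *aead_req = (struct aead_request *)req;
		struct sock *sk = req->data;
		struct tls_context *tls_ctx = tls_get_ctx(sk);
		struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
		struct tls_rec *rec;
		int pending;

		rec = container_of(aead_req, struct tls_rec, aead_req);

		/* The record is already linked on tx_list in sequence order;
		 * just mark it ready. tls_tx_records() transmits it later
		 * from application context under lock_sock().
		 */
		WRITE_ONCE(rec->tx_ready, true);

		pending = atomic_dec_return(&ctx->encrypt_pending);
		if (!pending && READ_ONCE(ctx->async_notify))
			complete(&ctx->async_wait.completion);
	}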

Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption")
Signed-off-by: Vakul Garg 
---
 include/net/tls.h  | 16 ---
 net/tls/tls_main.c |  4 +--
 net/tls/tls_sw.c   | 81 --
 3 files changed, 37 insertions(+), 64 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 3aa73e2d8823..1615fb5ea114 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -99,6 +99,7 @@ enum {
  */
 struct tls_rec {
struct list_head list;
+   int tx_ready;
int tx_flags;
struct scatterlist sg_plaintext_data[MAX_SKB_FRAGS];
struct scatterlist sg_encrypted_data[MAX_SKB_FRAGS];
@@ -128,7 +129,7 @@ struct tls_sw_context_tx {
struct crypto_wait async_wait;
struct tx_work tx_work;
struct tls_rec *open_rec;
-   struct list_head tx_ready_list;
+   struct list_head tx_list;
atomic_t encrypt_pending;
int async_notify;
 
@@ -220,7 +221,6 @@ struct tls_context {
 
struct scatterlist *partially_sent_record;
u16 partially_sent_offset;
-   u64 tx_seq_number;  /* Next TLS seqnum to be transmitted */
 
unsigned long flags;
bool in_tcp_sendpages;
@@ -341,21 +341,15 @@ static inline bool tls_is_pending_open_record(struct 
tls_context *tls_ctx)
return tls_ctx->pending_open_record_frags;
 }
 
-static inline bool is_tx_ready(struct tls_context *tls_ctx,
-  struct tls_sw_context_tx *ctx)
+static inline bool is_tx_ready(struct tls_sw_context_tx *ctx)
 {
struct tls_rec *rec;
-   u64 seq;
 
-   rec = list_first_entry(&ctx->tx_ready_list, struct tls_rec, list);
+   rec = list_first_entry(&ctx->tx_list, struct tls_rec, list);
if (!rec)
return false;
 
-   seq = be64_to_cpup((const __be64 *)&rec->aad_space);
-   if (seq == tls_ctx->tx_seq_number)
-   return true;
-   else
-   return false;
+   return READ_ONCE(rec->tx_ready);
 }
 
 struct sk_buff *
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 06094de7a3d9..b428069a1b05 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -212,7 +212,7 @@ int tls_push_pending_closed_record(struct sock *sk,
struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
 
if (tls_is_partially_sent_record(tls_ctx) ||
-   !list_empty(&ctx->tx_ready_list))
+   !list_empty(&ctx->tx_list))
return tls_tx_records(sk, flags);
else
return tls_ctx->push_pending_record(sk, flags);
@@ -233,7 +233,7 @@ static void tls_write_space(struct sock *sk)
}
 
/* Schedule the transmission if tx list is ready */
-   if (is_tx_ready(ctx, tx_ctx) && !sk->sk_write_pending) {
+   if (is_tx_ready(tx_ctx) && !sk->sk_write_pending) {
/* Schedule the transmission */
if (!test_and_set_bit(BIT_TX_SCHEDULED, &tx_ctx->tx_bitmask))
schedule_delayed_work(_ctx->tx_work.work, 0);
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index bcb24c498b84..d30d65bf0753 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -329,29 +329,6 @@ static void tls_free_both_sg(struct sock *sk)
>sg_plaintext_size);
 }
 
-static bool append_tx_ready_list(struct tls_context 

[PATCH net-next] tls: Fixed uninitialised vars warning

2018-09-24 Thread Vakul Garg
In tls_sw_sendmsg() and tls_sw_sendpage(), it is possible that the
uninitialised variable 'ret' gets passed to sk_stream_error(). So
initialise the local variable 'ret' to 0. The warnings were detected by
the 'smatch' tool.

Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption")
Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index bcb24c498b84..102d84bdb2ab 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -709,7 +709,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t size)
int record_room;
int num_zc = 0;
int orig_size;
-   int ret;
+   int ret = 0;
 
if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL))
return -ENOTSUPP;
@@ -901,8 +901,8 @@ int tls_sw_sendpage(struct sock *sk, struct page *page,
int num_async = 0;
bool full_record;
int record_room;
+   int ret = 0;
bool eor;
-   int ret;
 
if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
  MSG_SENDPAGE_NOTLAST))
-- 
2.13.6



[PATCH net-next] tls: Remove redundant vars from tls record structure

2018-09-26 Thread Vakul Garg
The structure 'tls_rec' contains sg_aead_in and sg_aead_out, which point
to aad_space and then chain the scatterlists sg_plaintext_data and
sg_encrypted_data respectively. Rather than using chained scatterlists
for plaintext and encrypted data in the aead_req, it is more efficient to
store aad_space in the first index of sg_plaintext_data and
sg_encrypted_data themselves and get rid of sg_aead_in, sg_aead_out and
the further chaining.

This requires increasing the size of the sg_encrypted_data &
sg_plaintext_data arrays by 1 to accommodate the entry for aad_space. The
code which uses sg_encrypted_data and sg_plaintext_data has been modified
to skip the first index, as it points to aad_space.
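
In other words, each record's two scatterlist arrays now carry the AAD in
slot 0 followed by the data pages. A hedged sketch of how such an array is
set up (mirroring the layout comments in the diff below):

	/* Layout after this patch:
	 *   [0]                 -> aad_space (TLS_AAD_SPACE_SIZE bytes)
	 *   [1..MAX_SKB_FRAGS]  -> plaintext or encrypted data pages
	 */
	sg_init_table(rec->sg_plaintext_data, MAX_SKB_FRAGS + 1);
	sg_set_buf(&rec->sg_plaintext_data[0], rec->aad_space,
		   sizeof(rec->aad_space));

	sg_init_table(rec->sg_encrypted_data, MAX_SKB_FRAGS + 1);
	sg_set_buf(&rec->sg_encrypted_data[0], rec->aad_space,
		   sizeof(rec->aad_space));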

Signed-off-by: Vakul Garg 
---
 include/net/tls.h |  6 ++--
 net/tls/tls_sw.c  | 92 ++-
 2 files changed, 45 insertions(+), 53 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 1615fb5ea114..262420cdad10 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -101,13 +101,11 @@ struct tls_rec {
struct list_head list;
int tx_ready;
int tx_flags;
-   struct scatterlist sg_plaintext_data[MAX_SKB_FRAGS];
-   struct scatterlist sg_encrypted_data[MAX_SKB_FRAGS];
 
/* AAD | sg_plaintext_data | sg_tag */
-   struct scatterlist sg_aead_in[2];
+   struct scatterlist sg_plaintext_data[MAX_SKB_FRAGS + 1];
/* AAD | sg_encrypted_data (data contain overhead for hdr) */
-   struct scatterlist sg_aead_out[2];
+   struct scatterlist sg_encrypted_data[MAX_SKB_FRAGS + 1];
 
unsigned int sg_plaintext_size;
unsigned int sg_encrypted_size;
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 4c18b4dba284..8cf7bef7c5a2 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -248,7 +248,7 @@ static void trim_both_sgl(struct sock *sk, int target_size)
struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
struct tls_rec *rec = ctx->open_rec;
 
-   trim_sg(sk, rec->sg_plaintext_data,
+   trim_sg(sk, &rec->sg_plaintext_data[1],
&rec->sg_plaintext_num_elem,
&rec->sg_plaintext_size,
target_size);
@@ -256,7 +256,7 @@ static void trim_both_sgl(struct sock *sk, int target_size)
if (target_size > 0)
target_size += tls_ctx->tx.overhead_size;
 
-   trim_sg(sk, rec->sg_encrypted_data,
+   trim_sg(sk, &rec->sg_encrypted_data[1],
&rec->sg_encrypted_num_elem,
&rec->sg_encrypted_size,
target_size);
@@ -270,12 +270,13 @@ static int alloc_encrypted_sg(struct sock *sk, int len)
int rc = 0;
 
rc = sk_alloc_sg(sk, len,
-rec->sg_encrypted_data, 0,
+&rec->sg_encrypted_data[1], 0,
 &rec->sg_encrypted_num_elem,
 &rec->sg_encrypted_size, 0);
 
if (rc == -ENOSPC)
-   rec->sg_encrypted_num_elem = ARRAY_SIZE(rec->sg_encrypted_data);
+   rec->sg_encrypted_num_elem =
+   ARRAY_SIZE(rec->sg_encrypted_data) - 1;
 
return rc;
 }
@@ -287,12 +288,15 @@ static int alloc_plaintext_sg(struct sock *sk, int len)
struct tls_rec *rec = ctx->open_rec;
int rc = 0;
 
-   rc = sk_alloc_sg(sk, len, rec->sg_plaintext_data, 0,
-&rec->sg_plaintext_num_elem, &rec->sg_plaintext_size,
+   rc = sk_alloc_sg(sk, len,
+&rec->sg_plaintext_data[1], 0,
+&rec->sg_plaintext_num_elem,
+&rec->sg_plaintext_size,
 tls_ctx->pending_open_record_frags);
 
if (rc == -ENOSPC)
-   rec->sg_plaintext_num_elem = ARRAY_SIZE(rec->sg_plaintext_data);
+   rec->sg_plaintext_num_elem =
+   ARRAY_SIZE(rec->sg_plaintext_data) - 1;
 
return rc;
 }
@@ -320,11 +324,11 @@ static void tls_free_open_rec(struct sock *sk)
if (!rec)
return;
 
-   free_sg(sk, rec->sg_encrypted_data,
+   free_sg(sk, &rec->sg_encrypted_data[1],
&rec->sg_encrypted_num_elem,
&rec->sg_encrypted_size);
 
-   free_sg(sk, rec->sg_plaintext_data,
+   free_sg(sk, &rec->sg_plaintext_data[1],
&rec->sg_plaintext_num_elem,
&rec->sg_plaintext_size);
 
@@ -355,7 +359,7 @@ int tls_tx_records(struct sock *sk, int flags)
 * Remove the head of tx_list
 */
list_del(&rec->list);
-   free_sg(sk, rec->sg_plaintext_data,
+   free_sg(sk, &rec->sg_plaintext_data[1],
&rec->sg_plaintext_num_elem, &rec->sg_plaintext_size);
 
kfree(rec);
@@ -370,13 +374,13 @@ int tls_tx_records(struct sock *sk, int flags)
tx_flags = flags;
 
rc = tls_push_sg(sk

[PATCH net-next] tls: Fix socket mem accounting error under async encryption

2018-09-25 Thread Vakul Garg
The current async encryption implementation sometimes shows a socket
memory accounting error during socket close, which results in a kernel
warning call trace. The root cause of the problem is that the socket
variable sk_forward_alloc gets corrupted because sk_mem_charge() and
sk_mem_uncharge() are invoked from multiple concurrent contexts on a
multicore processor. The APIs sk_mem_charge() and sk_mem_uncharge() are
called from functions such as alloc_plaintext_sg(), free_sg() etc. It is
required that the memory accounting APIs are called under the socket lock.

The plaintext sg data sent for encryption is freed using free_sg() in
tls_encrypt_done(). It is wrong to call free_sg() from this function,
because it may run in irq context and we cannot acquire the socket lock
there.

We remove the call to free_sg() for plaintext data from
tls_encrypt_done() and defer freeing of the plaintext data to the time
when the record is picked up from tx_list and transmitted/freed. When
tls_tx_records() gets called, the socket is already locked and thus there
is no concurrent access problem.
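
With that rule in place, all sk_mem_charge()/sk_mem_uncharge() activity for a
record happens with the socket owned; a short sketch of the deferred-free side
(mirroring the hunks in the diff below):

	/* Runs from sendmsg()/sendpage() or tx_work_handler(), always
	 * under lock_sock(sk), so uncharging the plaintext pages here
	 * cannot race with other accounting updates.
	 */
	list_del(&rec->list);
	free_sg(sk, rec->sg_plaintext_data,
		&rec->sg_plaintext_num_elem, &rec->sg_plaintext_size);
	kfree(rec);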

Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption")
Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index bf03f32aa983..406d3bb98818 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -353,6 +353,9 @@ int tls_tx_records(struct sock *sk, int flags)
 * Remove the head of tx_list
 */
list_del(&rec->list);
+   free_sg(sk, rec->sg_plaintext_data,
+   &rec->sg_plaintext_num_elem, &rec->sg_plaintext_size);
+
kfree(rec);
}
 
@@ -371,6 +374,10 @@ int tls_tx_records(struct sock *sk, int flags)
goto tx_err;
 
list_del(&rec->list);
+   free_sg(sk, rec->sg_plaintext_data,
+   &rec->sg_plaintext_num_elem,
+   &rec->sg_plaintext_size);
+
kfree(rec);
} else {
break;
@@ -399,8 +406,6 @@ static void tls_encrypt_done(struct crypto_async_request 
*req, int err)
rec->sg_encrypted_data[0].offset -= tls_ctx->tx.prepend_size;
rec->sg_encrypted_data[0].length += tls_ctx->tx.prepend_size;
 
-   free_sg(sk, rec->sg_plaintext_data,
-   &rec->sg_plaintext_num_elem, &rec->sg_plaintext_size);
 
/* Free the record if error is previously set on socket */
if (err || sk->sk_err) {
@@ -523,9 +528,6 @@ static int tls_push_record(struct sock *sk, int flags,
if (rc == -EINPROGRESS)
return -EINPROGRESS;
 
-   free_sg(sk, rec->sg_plaintext_data, &rec->sg_plaintext_num_elem,
-   &rec->sg_plaintext_size);
-
if (rc < 0) {
tls_err_abort(sk, EBADMSG);
return rc;
@@ -1566,6 +1568,11 @@ void tls_sw_free_resources_tx(struct sock *sk)
 
rec = list_first_entry(&ctx->tx_list,
   struct tls_rec, list);
+
+   free_sg(sk, rec->sg_plaintext_data,
+   &rec->sg_plaintext_num_elem,
+   &rec->sg_plaintext_size);
+
list_del(&rec->list);
kfree(rec);
}
@@ -1575,6 +1582,10 @@ void tls_sw_free_resources_tx(struct sock *sk)
&rec->sg_encrypted_num_elem,
&rec->sg_encrypted_size);
 
+   free_sg(sk, rec->sg_plaintext_data,
+   &rec->sg_plaintext_num_elem,
+   &rec->sg_plaintext_size);
+
list_del(&rec->list);
kfree(rec);
}
-- 
2.13.6



RE: [PATCH net-next] tls: Fix socket mem accounting error under async encryption

2018-09-25 Thread Vakul Garg



> -Original Message-
> From: David Miller 
> Sent: Tuesday, September 25, 2018 11:14 PM
> To: Vakul Garg 
> Cc: netdev@vger.kernel.org; bor...@mellanox.com;
> avia...@mellanox.com; davejwat...@fb.com; doro...@fb.com
> Subject: Re: [PATCH net-next] tls: Fix socket mem accounting error under
> async encryption
> 
> From: Vakul Garg 
> Date: Tue, 25 Sep 2018 16:26:17 +0530
> 
> > Current async encryption implementation sometimes showed up socket
> > memory accounting error during socket close. This results in kernel
> > warning calltrace. The root cause of the problem is that socket var
> > sk_forward_alloc gets corrupted due to access in sk_mem_charge() and
> > sk_mem_uncharge() being invoked from multiple concurrent contexts in
> > multicore processor. The apis sk_mem_charge() and sk_mem_uncharge()
> > are called from functions alloc_plaintext_sg(), free_sg() etc. It is
> > required that memory accounting apis are called under a socket lock.
> >
> > The plaintext sg data sent for encryption is freed using free_sg() in
> > tls_encryption_done(). It is wrong to call free_sg() from this function.
> > This is because this function may run in irq context. We cannot
> > acquire socket lock in this function.
> >
> > We remove calling of function free_sg() for plaintext data from
> > tls_encryption_done() and defer freeing up of plaintext data to the
> > time when the record is picked up from tx_list and transmitted/freed.
> > When
> > tls_tx_records() gets called, socket is already locked and thus there
> > is no concurrent access problem.
> >
> > Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption")
> > Signed-off-by: Vakul Garg 
> 
> Applied.
 
I don't find this patch, or one other ("tls: Fixed a memory leak during
socket close"), in linux net-next. Could you please check? Regards.

