[PATCH net-next] tls: Add maintainers

2018-10-22 Thread Dave Watson
Add John and Daniel as additional tls co-maintainers to help review
patches and fix syzbot reports.

Acked-by: John Fastabend 
Acked-by: Daniel Borkmann 
Signed-off-by: Dave Watson 
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7f1399ac028e..4265dacab7d7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10223,6 +10223,8 @@ NETWORKING [TLS]
 M: Boris Pismenny 
 M: Aviad Yehezkel 
 M: Dave Watson 
+M: John Fastabend 
+M: Daniel Borkmann 
 L: netdev@vger.kernel.org
 S: Maintained
 F: net/tls/*
-- 
2.17.1



Re: [PATCH bpf-next 4/8] tls: convert to generic sk_msg interface

2018-10-12 Thread Dave Watson
On 10/11/18 02:45 AM, Daniel Borkmann wrote:
> Convert kTLS over to make use of the sk_msg interface for plaintext and
> encrypted scattergather data, so it reuses all the sk_msg helpers
> and data structures, which later on, in a second step, enables gluing
> this to BPF.

Looks very clean, thanks!

> 
> -static int zerocopy_from_iter(struct sock *sk, struct iov_iter *from,
> -   int length, int *pages_used,
> -   unsigned int *size_used,
> -   struct scatterlist *to, int to_max_pages,
> -   bool charge)
> -{

...

> - err = zerocopy_from_iter(sk, out_iov, data_len, &pages,
> -  chunk, &sgout[1],
> -  (n_sgout - 1), false);
> + err = tls_setup_from_iter(sk, out_iov, data_len,
> +   &pages, chunk, &sgout[1],
> +   (n_sgout - 1));

Any reason not to add the 'bool charge' to sk_msg_zerocopy_from_iter?
Then tls_setup_from_iter is not necessary.


Re: [PATCH net-next] tls: Add support for inplace records encryption

2018-10-01 Thread Dave Watson
On 09/30/18 08:04 AM, Vakul Garg wrote:
> Presently, for non-zero copy case, separate pages are allocated for
> storing plaintext and encrypted text of records. These pages are stored
> in sg_plaintext_data and sg_encrypted_data scatterlists inside record
> structure. Further, sg_plaintext_data & sg_encrypted_data are passed
> to cryptoapis for record encryption. Allocating separate pages for
> plaintext and encrypted text is inefficient from both required memory
> and performance point of view.
> 
> This patch adds support of inplace encryption of records. For non-zero
> copy case, we reuse the pages from sg_encrypted_data scatterlist to
> copy the application's plaintext data. For the movement of pages from
> sg_encrypted_data to sg_plaintext_data scatterlists, we introduce a new
> function move_to_plaintext_sg(). This function adds pages into
> sg_plaintext_data from the sg_encrypted_data scatterlist.
> 
> tls_do_encryption() is modified to pass the same scatterlist as both
> source and destination into aead_request_set_crypt() if inplace crypto
> has been enabled. A new variable 'inplace_crypto' has been introduced in
> record structure to signify whether the same scatterlist can be used.
> By default, the inplace_crypto is enabled in get_rec(). If zero-copy is
> used (i.e. plaintext data is not copied), inplace_crypto is set to '0'.
> 
> Signed-off-by: Vakul Garg 

Looks reasonable to me, thanks.

Reviewed-by: Dave Watson 


Re: [bpf PATCH 1/2] tls: possible hang when do_tcp_sendpages hits sndbuf is full case

2018-08-22 Thread Dave Watson
On 08/22/18 08:37 AM, John Fastabend wrote:
> Currently, the lower protocol's sk_write_space handler is not called if
> TLS is sending a scatterlist via tls_push_sg. However, normally
> tls_push_sg calls do_tcp_sendpage, which may be under memory pressure,
> that in turn may trigger a wait via sk_wait_event. Typically, this
> happens when the in-flight bytes exceed the sndbuf size. In the normal
> case when enough ACKs are received sk_write_space() will be called and
> the sk_wait_event will be woken up allowing it to send more data
> and/or return to the user.
> 
> But, in the TLS case because the sk_write_space() handler does not
> wake up the events the above send will wait until the sndtimeo is
> exceeded. By default this is MAX_SCHEDULE_TIMEOUT so it looks like a
> hang to the user (especially this impatient user). To fix this, pass
> the sk_write_space event to the lower layer's sk_write_space handler,
> which in the TCP case will wake any pending events.
> 
> I observed the above while integrating sockmap and ktls. It
> initially appeared as test_sockmap (modified to use ktls) occasionally
> hanging. To reliably reproduce this reduce the sndbuf size and stress
> the tls layer by sending many 1B sends. This results in every byte
> needing a header and each byte individually being sent to the crypto
> layer.
> 
> Signed-off-by: John Fastabend 

Super, thanks!

Acked-by: Dave Watson 


Re: [PATCH net-next v1] net/tls: Add support for async decryption of tls records

2018-08-17 Thread Dave Watson
On 08/16/18 08:49 PM, Vakul Garg wrote:
> Changes since RFC version:
>   1) Improved commit message.
>   2) Fixed dequeued record offset handling because of which a few of
>  the tls selftests 'recv_partial, recv_peek, recv_peek_multiple' were
>  failing.

Thanks! The commit message is much clearer, and the tests work great
for me too; only minor comments on clarity.

> - if (tls_sw_advance_skb(sk, skb, chunk)) {
> + if (async) {
> + /* Finished with current record, pick up next */
> + ctx->recv_pkt = NULL;
> + __strp_unpause(&ctx->strp);
> + goto mark_eor_chk_ctrl;

Control flow is a little hard to follow here, maybe just pass an async
flag to tls_sw_advance_skb?  It already does strp_unpause and recv_pkt
= NULL.  

> + } else if (tls_sw_advance_skb(sk, skb, chunk)) {
>   /* Return full control message to
>* userspace before trying to parse
>* another message type
>*/
> +mark_eor_chk_ctrl:
>   msg->msg_flags |= MSG_EOR;
>   if (control != TLS_RECORD_TYPE_DATA)
>   goto recv_end;
> + } else {
> + break;

I don't see the need for the else { break; }; isn't this already
covered by the while (len); below, as before?


Re: [PATCH net-next][RFC] net/tls: Add support for async decryption of tls records

2018-08-15 Thread Dave Watson
On 08/14/18 07:47 PM, Vakul Garg wrote:
> Incoming TLS records which are directly decrypted into user space
> application buffer i.e. records which are decrypted in zero-copy mode
> are submitted for async decryption. When the decryption cryptoapi
> returns -EINPROGRESS, the next tls record is parsed and then submitted
> for decryption. The references to records which have been sent for async
> decryption are dropped. This happens in a loop for all the records that
> can be decrypted in zero-copy mode. For records for which decryption is
> not possible in zero-copy mode, asynchronous decryption is not used and
> we wait for decryption crypto api to complete.
> 
> For crypto requests executing in async fashion, the memory for
> aead_request, sglists, skb etc. is freed from the decryption
> completion handler. The decryption completion handler wakes up the
> sleeping user context. This happens when the user context is done
> enqueueing all the crypto requests and is waiting for all the async
> operations to finish. Since the splice() operation does not use
> zero-copy decryption, async remains disabled for splice().

I found it a little hard to understand what you are trying to do based
on the commit message.  Reading the code, it looks like if the recv()
spans multiple TLS records, we will start decryption on all of them,
and only wait right before recv() returns, yes?  This approach sounds
great to me.

Three of the selftests are failing for me:

[ FAIL ] tls.recv_partial
[ FAIL ] tls.recv_peek
[ FAIL ] tls.recv_peek_multiple


Re: [PATCH net-next v2 1/1] net/tls: Combined memory allocation for decryption request

2018-08-09 Thread Dave Watson
On 08/09/18 04:56 AM, Vakul Garg wrote:
> For preparing decryption request, several memory chunks are required
> (aead_req, sgin, sgout, iv, aad). For submitting the decrypt request to
> an accelerator, it is required that the buffers which are read by the
> accelerator must be dma-able and not come from stack. The buffers for
> aad and iv can be separately kmalloced each, but it is inefficient.
> This patch does a combined allocation for preparing decryption request
> and then segments into aead_req || sgin || sgout || iv || aad.
> 
> Signed-off-by: Vakul Garg 

Reviewed-by: Dave Watson 

Thanks, looks good to me now.


Re: [PATCH net-next v1 1/1] net/tls: Combined memory allocation for decryption request

2018-08-08 Thread Dave Watson
On 08/08/18 06:36 PM, Vakul Garg wrote:
> For preparing decryption request, several memory chunks are required
> (aead_req, sgin, sgout, iv, aad). For submitting the decrypt request to
> an accelerator, it is required that the buffers which are read by the
> accelerator must be dma-able and not come from stack. The buffers for
> aad and iv can be separately kmalloced each, but it is inefficient.
> This patch does a combined allocation for preparing decryption request
> and then segments into aead_req || sgin || sgout || iv || aad.
> 
> Signed-off-by: Vakul Garg 
> ---
> + n_sgout = sg_nents(out_sg);
> +
> + n_sgin = skb_nsg(skb, rxm->offset + tls_ctx->rx.prepend_size,
> +  rxm->full_len - tls_ctx->rx.prepend_size);
> + } else {
> +no_zerocopy:

label is unused now



Re: [PATCH RFC net-next 1/1] net/tls: Combined memory allocation for decryption request

2018-08-07 Thread Dave Watson
Hi Vakul,

Only minor comments, mostly looks good to me.  Thanks

> +/* This function decrypts the input skb into either out_iov or in out_sg
> + * or in skb buffers itself. The input parameter 'zc' indicates if
> + * zero-copy mode needs to be tried or not. With zero-copy mode, either
> + * out_iov or out_sg must be non-NULL. In case both out_iov and out_sg are
> + * NULL, then the decryption happens inside skb buffers itself, i.e.
> + * zero-copy gets disabled and 'zc' is updated.
> + */
> +
> +static int decrypt_internal(struct sock *sk, struct sk_buff *skb,
> + struct iov_iter *out_iov,
> + struct scatterlist *out_sg,
> + int *chunk, bool *zc)
> +{
> + struct tls_context *tls_ctx = tls_get_ctx(sk);
> + struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
> + struct strp_msg *rxm = strp_msg(skb);
> + int n_sgin, n_sgout, nsg, mem_size, aead_size, err, pages = 0;
> + struct aead_request *aead_req;
> + struct sk_buff *unused;
> + u8 *aad, *iv, *mem = NULL;
> + struct scatterlist *sgin = NULL;
> + struct scatterlist *sgout = NULL;
> + const int data_len = rxm->full_len - tls_ctx->rx.overhead_size;
> +
> + if (*zc && (out_iov || out_sg)) {
> + if (out_iov)
> + n_sgout = iov_iter_npages(out_iov, INT_MAX) + 1;
> + else if (out_sg)
> + n_sgout = sg_nents(out_sg);
> + else
> + goto no_zerocopy;

Is the last else necessary?  It looks like the if already checks for
out_iov || out_sg.

>   struct scatterlist *sgout)
>  {
> - struct tls_context *tls_ctx = tls_get_ctx(sk);
> - struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
> - char iv[TLS_CIPHER_AES_GCM_128_SALT_SIZE + MAX_IV_SIZE];
> - struct scatterlist sgin_arr[MAX_SKB_FRAGS + 2];
> - struct scatterlist *sgin = &sgin_arr[0];
> - struct strp_msg *rxm = strp_msg(skb);
> - int ret, nsg;
> - struct sk_buff *unused;
> -
> - ret = skb_copy_bits(skb, rxm->offset + TLS_HEADER_SIZE,
> - iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE,
> - tls_ctx->rx.iv_size);
> - if (ret < 0)
> - return ret;
> -
> - memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
> - if (!sgout) {
> - nsg = skb_cow_data(skb, 0, &unused);
> - } else {
> - nsg = skb_nsg(skb,
> -   rxm->offset + tls_ctx->rx.prepend_size,
> -   rxm->full_len - tls_ctx->rx.prepend_size);
> - if (nsg <= 0)
> - return nsg;
> - }
> -
> - // We need one extra for ctx->rx_aad_ciphertext
> - nsg++;
> -
> - if (nsg > ARRAY_SIZE(sgin_arr))
> - sgin = kmalloc_array(nsg, sizeof(*sgin), sk->sk_allocation);
> -
> - if (!sgout)
> - sgout = sgin;
> -
> - sg_init_table(sgin, nsg);
> - sg_set_buf(&sgin[0], ctx->rx_aad_ciphertext, TLS_AAD_SPACE_SIZE);
> -
> - nsg = skb_to_sgvec(skb, &sgin[1],
> -rxm->offset + tls_ctx->rx.prepend_size,
> -rxm->full_len - tls_ctx->rx.prepend_size);
> - if (nsg < 0) {
> - ret = nsg;
> - goto out;
> - }
> -
> - tls_make_aad(ctx->rx_aad_ciphertext,
> -  rxm->full_len - tls_ctx->rx.overhead_size,
> -  tls_ctx->rx.rec_seq,
> -  tls_ctx->rx.rec_seq_size,
> -  ctx->control);
> -
> - ret = tls_do_decryption(sk, sgin, sgout, iv,
> - rxm->full_len - tls_ctx->rx.overhead_size,
> - skb, sk->sk_allocation);
> -
> -out:
> - if (sgin != &sgin_arr[0])
> - kfree(sgin);
> + bool zc = true;
> + int chunk;
>  
> - return ret;
> + return decrypt_internal(sk, skb, NULL, sgout, &chunk, &zc);
>  }

Can we merge this function into its callsites?  It's pretty useless now.

>  
>  static bool tls_sw_advance_skb(struct sock *sk, struct sk_buff *skb,
> @@ -899,43 +964,17 @@ int tls_sw_recvmsg(struct sock *sk,
>   }
>  
>   if (!ctx->decrypted) {
> - int page_count;
> - int to_copy;


Re: Security enhancement proposal for kernel TLS

2018-08-03 Thread Dave Watson
On 08/02/18 05:23 PM, Vakul Garg wrote:
> > I agree that Boris' patch does what you say it does - it sets keys 
> > immediately
> > after CCS instead of after FINISHED message.  I disagree that the kernel tls
> > implementation currently requires that specific ordering, nor do I think 
> > that it
> > should require that ordering.
> 
> The current kernel implementation assumes record sequence number to start 
> from '0'.
> If keys have to be set after FINISHED message, then record sequence number 
> need to
> be communicated from user space TLS stack to kernel. IIRC, sequence number is 
> not 
> part of the interface through which key is transferred.

The setsockopt call struct takes the key, iv, salt, and seqno:

struct tls12_crypto_info_aes_gcm_128 {
struct tls_crypto_info info;
unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE];
unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE];
};


Re: [PATCH net-next] net/tls: Mark the end in scatterlist table

2018-08-02 Thread Dave Watson
On 08/02/18 05:05 PM, Vakul Garg wrote:
> In case zerocopy_from_iter() fails, 'end' won't get marked.
> So fallback path is fine.
> 
> > Which codepath is calling sg_nents()?
> 
> While testing my WIP implementation of combined dynamic memory allocation for 
> (aead_req || sgin || sgout || aad || iv), I was getting random kernel crashes.
> To debug it I had inserted sg_nents() in my code. The KASAN then immediately
> complained that sg_nents() went beyond the allocated memory for scatterlist.
> This led me to find that scatterlist table end has not been marked.
> 

If this isn't causing KASAN issues for the existing code, it probably
makes more sense to put this in a future series with that WIP work.


Re: [PATCH net-next] net/tls: Always get number of sg entries for skb to be decrypted

2018-08-02 Thread Dave Watson
On 08/02/18 09:50 PM, Vakul Garg wrote:
> Function decrypt_skb() made a bad assumption that number of sg entries
> required for mapping skb to be decrypted would always be less than
> MAX_SKB_FRAGS. The required count of sg entries for skb should always be
> calculated. If they cannot fit in the local array sgin_arr[], allocate
> them from the heap irrespective of whether it is the zero-copy case or
> not. The change also benefits the non-zero copy case as we could use sgin_arr[]
> instead of always allocating sg entries from heap.
> 
> Signed-off-by: Vakul Garg 

Looks great, thanks.

Reviewed-by: Dave Watson 


Re: [PATCH net-next] net/tls: Mark the end in scatterlist table

2018-08-02 Thread Dave Watson
On 08/02/18 08:43 PM, Vakul Garg wrote:
> Function zerocopy_from_iter() unmarks the 'end' in input sgtable while
> adding new entries in it. The last entry in sgtable remained unmarked.
> This results in KASAN error report on using apis like sg_nents(). Before
> returning, the function needs to mark the 'end' in the last entry it
> adds.
> 
> Signed-off-by: Vakul Garg 

Looks good to me, it looks like the fallback path should unmark the
end appropriately.

Which codepath is calling sg_nents()? 

Acked-by: Dave Watson 



Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.

2018-08-01 Thread Dave Watson
On 08/01/18 01:49 PM, Vakul Garg wrote:
> > I don't think this patch is safe as-is.  sgin_arr is a stack array of size
> > MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is that it
> > walks the whole chain of skbs from skb->next, and can return any number of
> > segments.  Therefore we need to heap allocate.  I think I copied the IPSEC
> > code here.
> > 
> > For perf though, we could use the stack array if skb_cow_data returns <=
> > MAX_SKB_FRAGS.
> 
> skb_cow_data() is being called only when caller passes sgout=NULL (i.e.
> non-zero copy case). In case of zero-copy, we do not call skb_cow_data()
> and just assume that MAX_SKB_FRAGS+2 sized scatterlist array sgin_arr[]
> is sufficient. This assumption could be wrong. So skb_cow_data() should be
> called unconditionally to determine number of scatterlist entries required
> for skb.

I agree it is best to unify them.  I was originally worried about perf
with the extra allocation (which you proposed fixing by merging with
the crypto allocation, which would be great), and the perf of
skb_cow_data().  Zerocopy doesn't require skb_cow_data(), but we do
have to walk the skbs to calculate nsg correctly.

However skb_cow_data perf might be fine after your fix "strparser: Call
skb_unclone conditionally".


Re: Security enhancement proposal for kernel TLS

2018-08-01 Thread Dave Watson
On 07/31/18 10:45 AM, Vakul Garg wrote:
> > > IIUC, with the upstream implementation of tls record layer in kernel,
> > > the decryption of tls FINISHED message happens in kernel. Therefore
> > > the keys are already being sent to kernel tls socket before handshake is
> > completed.
> > 
> > This is incorrect.  
> 
> Let us first reach a common ground on this.
> 
>  The kernel TLS implementation can decrypt only after setting the keys on the 
> socket.
> The TLS message 'finished' (which is encrypted) is received after receiving 
> 'CCS'
> message. After the user space  TLS library receives CCS message, it sets the 
> keys
> on kernel TLS socket. Therefore, the next message in the  socket receive queue
> which is TLS finished gets decrypted in kernel only.
> 
> Please refer to following Boris's patch on openssl. The  commit log says:
> " We choose to set this option at the earliest - just after CCS is complete".

I agree that Boris' patch does what you say it does - it sets keys
immediately after CCS instead of after FINISHED message.  I disagree
that the kernel tls implementation currently requires that specific
ordering, nor do I think that it should require that ordering.


Re: Security enhancement proposal for kernel TLS

2018-07-30 Thread Dave Watson
On 07/30/18 06:31 AM, Vakul Garg wrote:
> > It's not entirely clear how your TLS handshake daemon works.  Why is
> > it necessary to set the keys in the kernel tls socket before the handshake 
> > is
> > completed? 
> 
> IIUC, with the upstream implementation of tls record layer in kernel, the
> decryption of tls FINISHED message happens in kernel. Therefore the keys are
> already being sent to kernel tls socket before handshake is completed.

This is incorrect.  Currently the kernel TLS implementation decrypts
everything after you set the keys on the socket.  I'm suggesting that
you don't set the keys on the socket until after the FINISHED message.

> > Or, why do you need to hand off the fd to the client program
> > before the handshake is completed?
>   
> The fd is always owned by the client program.
> 
> In my proposal, the applications poll their own tcp socket using read/recvmsg 
> etc.
> If they get handshake record, they forward it to the entity running handshake 
> agent.
> The handshake agent could be a linux daemon or could run on a separate 
> security
> processor like 'Secure element' or say arm trustzone etc. The applications
> forward any handshake message it gets backs from handshake agent to the
> connected tcp socket. Therefore, the  applications act as a forwarder of the 
> handshake 
> messages between the peer tls endpoint and handshake agent.
> The received data messages are absorbed by the applications themselves 
> (bypassing ssl stack
> completely). Similarly, the applications tx data directly by writing on their 
> socket.
> 
> > Waiting until after handshake solves both of these issues.
>  
> The security sensitive check which is 'Wait for handshake to finish 
> completely before 
> accepting data' should not be the onus of the application. We have enough 
> examples
> in past where application programmers made mistakes in setting up tls 
> correctly. The idea
> is to isolate tls session setting up from the applications.

It's not clear to me what you gain by putting this 'handshake
finished' notification in the kernel instead of in the client's tls
library - you're already forwarding the handshake start notification
to the daemon, why can't the daemon notify them back in userspace that
the handshake is finished?   

If you did want to put the notification in the kernel, how would you
handle poll on the socket, since probably both the handshake daemon
and client might be polling the socket, but one for control messages
and one for data? 

The original kernel TLS RFC did split these to two separate sockets,
but we decided it was too complicated, and that's not how userspace
TLS clients function today.

Do you have an implementation of this?  There are a bunch of tricky
corner cases here, it might make more sense to have something concrete
to discuss.

> Further, as per tls RFC it is ok to piggyback the data records after the 
> finished handshake
> message. This is called early data. But then it is the responsibility of 
> applications to first
> complete finished message processing before accepting the data records.
> 
> The proposal is to disallow application world seeing data records
> before handshake finishes.

You're talking about the TLS 1.3 0-RTT feature, which is indeed an
interesting case.  For in-process TLS libraries, it's fairly easy to
punt, and don't set the kernel TLS keys until after the 0-RTT data +
handshake message.  For an OOB handshake daemon it might indeed make
more sense to leave the data in kernelspace ... somehow.

> > >   - The handshake state should fallback to 'unverified' in case a control
> > record is seen again by kernel TLS (e.g. in case of renegotiation, post
> > handshake client auth etc).
> > 
> > Currently kernel tls sockets return an error unless you explicitly handle 
> > the
> > control record for exactly this reason.
> 
> IIRC, any kind of handshake message post handshake-completion is a
> problem for kernel tls.
> This includes renegotiation, post handshake client-auth etc.
> 
> Please correct me if I am wrong.

You are correct, but currently kernel TLS sockets return an error
unless you explicitly handle the control message.  This should be
enough already to implement your proposal. 



Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.

2018-07-27 Thread Dave Watson
On 07/27/18 09:34 AM, Vakul Garg wrote:
> 
> 
> > -Original Message-
> > From: Dave Watson [mailto:davejwat...@fb.com]
> > Sent: Thursday, July 26, 2018 2:31 AM
> > To: Vakul Garg 
> > Cc: David Miller ; netdev@vger.kernel.org;
> > bor...@mellanox.com; avia...@mellanox.com; Doron Roberts-Kedes
> > 
> > Subject: Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.
> > 
> > On 07/24/18 08:22 AM, Vakul Garg wrote:
> > > Will it be a bad idea to get rid of array 'sgin' on stack and simply
> > > kmalloc 'sgin' for whatever number of pages is returned by
> > > iov_iter_npages()?
> > > We can allocate for sgout too with the same kmalloc().
> > >
> > > (Using a local array based 'sgin' is coming in the way to achieve
> > > sending multiple async record decryption requests to the accelerator
> > > without waiting for previous one to complete.)
> > 
> > Yes we could do this, and yes we would need to heap allocate if you want to
> > support multiple outstanding decryption requests.  I think async crypto
> > prevents any sort of zerocopy-fastpath, however.
> 
> We already do a aead_request_alloc (which internally does kmalloc).
> To mitigate the cost of kmalloc/kfree for sg lists and aad, I am allocating a 
> combined memory chunk for all of these and then segmenting it into
> aead_req, aad, sgin, sgout. This way there should be no extra cost for
> memory allocations in non-async.

Makes sense, sounds good to me. 


Re: [PATCH net-next] net/tls: Removed redundant checks for non-NULL

2018-07-25 Thread Dave Watson
On 07/24/18 04:54 PM, Vakul Garg wrote:
> Removed checks against non-NULL before calling kfree_skb() and
> crypto_free_aead(). These functions are safe to be called with NULL
> as an argument.
> 
> Signed-off-by: Vakul Garg 

Acked-by: Dave Watson 


Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.

2018-07-25 Thread Dave Watson
On 07/24/18 08:22 AM, Vakul Garg wrote:
> > I don't think this patch is safe as-is.  sgin_arr is a stack array of size
> > MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is that it
> > walks the whole chain of skbs from skb->next, and can return any number of
> > segments.  Therefore we need to heap allocate.  I think I copied the IPSEC
> > code here.
> > 
> > For perf though, we could use the stack array if skb_cow_data returns <=
> > MAX_SKB_FRAGS.
> > 
> > This code is slightly confusing though, since we don't heap allocate in the
> > zerocopy case - what happens is that skb_to_sgvec returns -EMSGSIZE, and
> > we fall back to the non-zerocopy case, and return again to this function,
> > where we then hit the skb_cow_data path and heap allocate.
> 
> Thanks for explaining. 
> I am missing the point why MAX_SKB_FRAGS sized local array 
> sgin has been used in tls_sw_recvmsg(). What is special about MAX_SKB_FRAGS so
> that we used it as a size factor for 'sgin'?

There is nothing special about it, in the zerocopy-fastpath if we
happen to fit in MAX_SKB_FRAGS we avoid any kmalloc whatsoever though.
It could be renamed MAX_SC_FOR_FASTPATH or something.

> Will it be a bad idea to get rid of array 'sgin' on stack and simply
> kmalloc 'sgin' for whatever number of pages is returned by
> iov_iter_npages()?
> We can allocate for sgout too with the same kmalloc().
> 
> (Using a local array based 'sgin' is coming in the way to achieve sending 
> multiple async
> record decryption requests to the accelerator without waiting for previous 
> one to complete.)

Yes we could do this, and yes we would need to heap allocate if you
want to support multiple outstanding decryption requests.  I think
async crypto prevents any sort of zerocopy-fastpath, however.  


Re: Security enhancement proposal for kernel TLS

2018-07-25 Thread Dave Watson
You would probably get more responses if you cc the relevant people.
Comments inline

On 07/22/18 12:49 PM, Vakul Garg wrote:
> The kernel based TLS record layer allows the user space world to use a 
> decoupled TLS implementation.
> The applications need not be linked with TLS stack. 
> The TLS handshake can be done by a TLS daemon on the behalf of applications.
> 
> Presently, as soon as the handshake process derives keys, it pushes the
> negotiated keys to kernel TLS.
> Thereafter the applications can directly read and write data on their TCP 
> socket (without having to use SSL apis).
> 
> With the current kernel TLS implementation, there is a security problem. 
> Since the kernel TLS socket does not have information about the state of 
> handshake, 
> it allows applications to be able to receive data from the peer TLS endpoint 
> even when the handshake verification has not been completed by the SSL 
> daemon. 
> It is a security problem if applications can receive data if verification of 
> the handshake transcript is not completed (done with processing tls FINISHED 
> message).
> 
> My proposal:
>   - Kernel TLS should maintain state of handshake (verified or 
> unverified). 
>   In un-verified state, data records should not be allowed pass through 
> to the applications.
> 
>   - Add a new control interface using which that the user space SSL stack 
> can tell the TLS socket that handshake has been verified and DATA records can 
> flow. 
>   In 'unverified' state, only control records should be allowed to pass 
> and reception DATA record should be pause the receive side record decryption.

It's not entirely clear how your TLS handshake daemon works.  Why is
it necessary to set the keys in the kernel tls socket before the
handshake is completed?  Or, why do you need to hand off the fd to the
client program before the handshake is completed?  

Waiting until after handshake solves both of these issues.

I'm not aware of any tls libraries that send data before the finished
message; is there any reason you need to support this?

> 
>   - The handshake state should fallback to 'unverified' in case a control 
> record is seen again by kernel TLS (e.g. in case of renegotiation, post 
> handshake client auth etc).

Currently kernel tls sockets return an error unless you explicitly
handle the control record for exactly this reason.

If you want an external daemon to handle control messages after
handshake, there definitely might be some synchronization that would
make sense to push in the kernel.  However, with TLS 1.3 removing
renegotiation (and currently reneg is not implemented in kernel tls
anyway), there's much less reason to do so.


Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.

2018-07-23 Thread Dave Watson
On 07/21/18 07:25 PM, David Miller wrote:
> From: Vakul Garg 
> Date: Thu, 19 Jul 2018 21:56:13 +0530
> 
> > In function decrypt_skb(), array allocation in case when sgout is NULL
> > is unnecessary. Instead, local variable sgin_arr[] can be used.
> > 
> > Signed-off-by: Vakul Garg 
> 
> Hmmm...
> 
> Dave, can you take a look at this?  Do you think there might have
> been a reason you felt that you needed to dynamically allocate
> > the scatterlists when you COW an skb and do in-place decryption?
> 
> I guess this change is ok, nsg can only get smaller when the SKB
> is COW'd.
> 

> > memcpy(iv, tls_ctx->rx.iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
> > if (!sgout) {
> > nsg = skb_cow_data(skb, 0, &unused) + 1;
> > -   sgin = kmalloc_array(nsg, sizeof(*sgin), sk->sk_allocation);
> > sgout = sgin;
> > }

I don't think this patch is safe as-is.  sgin_arr is a stack array of
size MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is that
it walks the whole chain of skbs from skb->next, and can return any
number of segments.  Therefore we need to heap allocate.  I think I
copied the IPSEC code here.

For perf though, we could use the stack array if skb_cow_data returns
<= MAX_SKB_FRAGS.

This code is slightly confusing though, since we don't heap allocate
in the zerocopy case - what happens is that skb_to_sgvec returns
-EMSGSIZE, and we fall back to the non-zerocopy case, and return again
to this function, where we then hit the skb_cow_data path and heap
allocate.


[PATCH net-next] selftests: tls: add selftests for TLS sockets

2018-07-12 Thread Dave Watson
Add selftests for tls sockets.  Tests various iov and message options,
poll blocking and nonblocking behavior, partial message sends / receives,
and control message data.  Tests should pass regardless of whether TLS
is enabled in the kernel or not, and print a warning message if not.

Signed-off-by: Dave Watson 
---
 tools/testing/selftests/net/Makefile |   2 +-
 tools/testing/selftests/net/tls.c| 692 +++
 2 files changed, 693 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/net/tls.c

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 663e11e85727..9cca68e440a0 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -13,7 +13,7 @@ TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
 TEST_GEN_FILES += tcp_mmap tcp_inq psock_snd
 TEST_GEN_FILES += udpgso udpgso_bench_tx udpgso_bench_rx
 TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
-TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict
+TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls
 
 include ../lib.mk
 
diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
new file mode 100644
index ..b3ebf2646e52
--- /dev/null
+++ b/tools/testing/selftests/net/tls.c
@@ -0,0 +1,692 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include "../kselftest_harness.h"
+
+#define TLS_PAYLOAD_MAX_LEN 16384
+#define SOL_TLS 282
+
+FIXTURE(tls)
+{
+   int fd, cfd;
+   bool notls;
+};
+
+FIXTURE_SETUP(tls)
+{
+   struct tls12_crypto_info_aes_gcm_128 tls12;
+   struct sockaddr_in addr;
+   socklen_t len;
+   int sfd, ret;
+
+   self->notls = false;
+   len = sizeof(addr);
+
+   memset(&tls12, 0, sizeof(tls12));
+   tls12.info.version = TLS_1_2_VERSION;
+   tls12.info.cipher_type = TLS_CIPHER_AES_GCM_128;
+
+   addr.sin_family = AF_INET;
+   addr.sin_addr.s_addr = htonl(INADDR_ANY);
+   addr.sin_port = 0;
+
+   self->fd = socket(AF_INET, SOCK_STREAM, 0);
+   sfd = socket(AF_INET, SOCK_STREAM, 0);
+
+   ret = bind(sfd, &addr, sizeof(addr));
+   ASSERT_EQ(ret, 0);
+   ret = listen(sfd, 10);
+   ASSERT_EQ(ret, 0);
+
+   ret = getsockname(sfd, &addr, &len);
+   ASSERT_EQ(ret, 0);
+
+   ret = connect(self->fd, &addr, sizeof(addr));
+   ASSERT_EQ(ret, 0);
+
+   ret = setsockopt(self->fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls"));
+   if (ret != 0) {
+   self->notls = true;
+   printf("Failure setting TCP_ULP, testing without tls\n");
+   }
+
+   if (!self->notls) {
+   ret = setsockopt(self->fd, SOL_TLS, TLS_TX, &tls12,
+sizeof(tls12));
+   ASSERT_EQ(ret, 0);
+   }
+
+   self->cfd = accept(sfd, &addr, &len);
+   ASSERT_GE(self->cfd, 0);
+
+   if (!self->notls) {
+   ret = setsockopt(self->cfd, IPPROTO_TCP, TCP_ULP, "tls",
+sizeof("tls"));
+   ASSERT_EQ(ret, 0);
+
+   ret = setsockopt(self->cfd, SOL_TLS, TLS_RX, &tls12,
+sizeof(tls12));
+   ASSERT_EQ(ret, 0);
+   }
+
+   close(sfd);
+}
+
+FIXTURE_TEARDOWN(tls)
+{
+   close(self->fd);
+   close(self->cfd);
+}
+
+TEST_F(tls, sendfile)
+{
+   int filefd = open("/proc/self/exe", O_RDONLY);
+   struct stat st;
+
+   EXPECT_GE(filefd, 0);
+   fstat(filefd, &st);
+   EXPECT_GE(sendfile(self->fd, filefd, 0, st.st_size), 0);
+}
+
+TEST_F(tls, send_then_sendfile)
+{
+   int filefd = open("/proc/self/exe", O_RDONLY);
+   char const *test_str = "test_send";
+   int to_send = strlen(test_str) + 1;
+   char recv_buf[10];
+   struct stat st;
+   char *buf;
+
+   EXPECT_GE(filefd, 0);
+   fstat(filefd, &st);
+   buf = (char *)malloc(st.st_size);
+
+   EXPECT_EQ(send(self->fd, test_str, to_send, 0), to_send);
+   EXPECT_EQ(recv(self->cfd, recv_buf, to_send, 0), to_send);
+   EXPECT_EQ(memcmp(test_str, recv_buf, to_send), 0);
+
+   EXPECT_GE(sendfile(self->fd, filefd, 0, st.st_size), 0);
+   EXPECT_EQ(recv(self->cfd, buf, st.st_size, 0), st.st_size);
+}
+
+TEST_F(tls, recv_max)
+{
+   unsigned int send_len = TLS_PAYLOAD_MAX_LEN;
+   char recv_mem[TLS_PAYLOAD_MAX_LEN];
+   char buf[TLS_PAYLOAD_MAX_LEN];
+
+   EXPECT_GE(send(self->fd, buf, send_len, 0), 0);
+   EXPECT_NE(recv(self->cfd, recv_mem, send_len, 0), -1);
+   EXPECT_EQ(memcmp(buf, recv_mem, send_len), 0);
+}
+
+TEST_F(tls, recv_small)
+{
+   char const *test_str = &qu

Re: [PATCH v3 net-next 00/19] TLS offload rx, netdev & mlx5

2018-07-12 Thread Dave Watson
On 07/11/18 10:54 PM, Boris Pismenny wrote:
> Hi,
> 
> The following series provides TLS RX inline crypto offload.

All the tls patches look good to me except #10

"tls: Fix zerocopy_from_iter iov handling"

which seems to break the non-device zerocopy flow. 

The integration is very clean, thanks!

> 
> v2->v3:
> - Fix typo
> - Adjust cover letter
> - Fix bug in zero copy flows
> - Use network byte order for the record number in resync
> - Adjust the sequence provided in resync
> 
> v1->v2:
> - Fix bisectability problems due to variable name changes
> - Fix potential uninitialized return value
> 


Re: [PATCH v3 net-next 10/19] tls: Fix zerocopy_from_iter iov handling

2018-07-12 Thread Dave Watson
On 07/11/18 10:54 PM, Boris Pismenny wrote:
> zerocopy_from_iter iterates over the message, but it doesn't revert the
> updates made by the iov iteration. This patch fixes it. Now, the iov can
> be used after calling zerocopy_from_iter.

This breaks tests (which I will send up as selftests shortly).  I
believe we are depending on zerocopy_from_iter to advance the iter,
and if zerocopy_from_iter returns a failure, then we revert it.  So
you can revert it here if you want, but you'd have to advance it if we
actually used it instead.


[PATCH net] tls: Stricter error checking in zerocopy sendmsg path

2018-07-12 Thread Dave Watson
In the zerocopy sendmsg() path, there are error checks to revert
the zerocopy if we get any error code.  syzkaller has discovered
that tls_push_record can return -ECONNRESET, which is fatal, and
happens after the point at which it is safe to revert the iter,
as we've already passed the memory to do_tcp_sendpages.

Previously this code could return -ENOMEM and we would want to
revert the iter, but AFAIK this no longer returns ENOMEM after
a447da7d004 ("tls: fix waitall behavior in tls_sw_recvmsg"),
so we fail for all error codes.

Reported-by: syzbot+c226690f7b3126c5e...@syzkaller.appspotmail.com
Reported-by: syzbot+709f2810a6a05f11d...@syzkaller.appspotmail.com
Signed-off-by: Dave Watson 
Fixes: 3c4d7559159b ("tls: kernel TLS support")
---
 net/tls/tls_sw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 7818011fd250..4618f1c31137 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -440,7 +440,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
ret = tls_push_record(sk, msg->msg_flags, record_type);
if (!ret)
continue;
-   if (ret == -EAGAIN)
+   if (ret < 0)
goto send_end;
 
copied -= try_to_copy;
-- 
2.17.1



Re: [PATCH net-next] net/tls: Removed redundant variable from 'struct tls_sw_context_rx'

2018-07-12 Thread Dave Watson
On 07/12/18 11:14 AM, Vakul Garg wrote:
> Hi Boris
> 
> Thanks for explaining.
> Few questions/observations.
> 
> 1. Isn't ' ctx->decrypted = true' a redundant statement in 
> tls_do_decryption()?
> The same has been repeated in tls_recvmsg() after calling decrypt_skb()?
> 
> 2. Similarly, ctx->saved_data_ready(sk) seems not required in 
> tls_do_decryption().
> This is because tls_do_decryption() is already triggered from tls_recvmsg() 
> i.e. from user space app context.
> 
> 3. In tls_queue(), I think strp->sk->sk_state_change() needs to be replaced 
> with ctx->saved_data_ready().

Yes, I think these 3 can all be changed.  #2 would be required if
do_decryption is ever called outside of user context, but that's not
the case currently.


Re: [PATCH net] tls: fix NULL pointer dereference on poll

2018-06-11 Thread Dave Watson
On 06/11/18 11:22 PM, Daniel Borkmann wrote:
> While hacking on kTLS, I ran into the following panic from an
> unprivileged netserver / netperf TCP session:
> 
>   [...]
> Looks like the recent conversion from poll to poll_mask callback started
> in 152524231023 ("net: add support for ->poll_mask in proto_ops") missed
> to eventually convert kTLS, too: TCP's ->poll was converted over to the
> ->poll_mask in commit 2c7d3dacebd4 ("net/tcp: convert to ->poll_mask")
> and therefore kTLS wrongly saved the ->poll old one which is now NULL.
> 
> Convert kTLS over to use ->poll_mask instead. Also instead of POLLIN |
> POLLRDNORM use the proper EPOLLIN | EPOLLRDNORM bits as the case in
> tcp_poll_mask() as well that is mangled here.

Thanks, was just trying to bisect this myself. Works for me. 

Tested-by: Dave Watson 

> Fixes: 2c7d3dacebd4 ("net/tcp: convert to ->poll_mask")
> Signed-off-by: Daniel Borkmann 
> Cc: Christoph Hellwig 
> Cc: Dave Watson 


[PATCH net] net/tls: Don't recursively call push_record during tls_write_space callbacks

2018-05-01 Thread Dave Watson
It is reported that in some cases, write_space may be called in
do_tcp_sendpages, such that we recursively invoke do_tcp_sendpages again:

[  660.468802]  ? do_tcp_sendpages+0x8d/0x580
[  660.468826]  ? tls_push_sg+0x74/0x130 [tls]
[  660.468852]  ? tls_push_record+0x24a/0x390 [tls]
[  660.468880]  ? tls_write_space+0x6a/0x80 [tls]
...

tls_push_sg already does a loop over all sending sg's, so ignore
any tls_write_space notifications until we are done sending.
We then have to call the previous write_space to wake up
poll() waiters after we are done with the send loop.

Reported-by: Andre Tomt <an...@tomt.net>
Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 include/net/tls.h  | 1 +
 net/tls/tls_main.c | 7 +++
 2 files changed, 8 insertions(+)

diff --git a/include/net/tls.h b/include/net/tls.h
index 3da8e13..b400d0bb 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -148,6 +148,7 @@ struct tls_context {
struct scatterlist *partially_sent_record;
u16 partially_sent_offset;
unsigned long flags;
+   bool in_tcp_sendpages;
 
u16 pending_open_record_frags;
int (*push_pending_record)(struct sock *sk, int flags);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 0d37997..cc03e00 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -114,6 +114,7 @@ int tls_push_sg(struct sock *sk,
size = sg->length - offset;
offset += sg->offset;
 
+   ctx->in_tcp_sendpages = true;
while (1) {
if (sg_is_last(sg))
sendpage_flags = flags;
@@ -148,6 +149,8 @@ int tls_push_sg(struct sock *sk,
}
 
clear_bit(TLS_PENDING_CLOSED_RECORD, &ctx->flags);
+   ctx->in_tcp_sendpages = false;
+   ctx->sk_write_space(sk);
 
return 0;
 }
@@ -217,6 +220,10 @@ static void tls_write_space(struct sock *sk)
 {
struct tls_context *ctx = tls_get_ctx(sk);
 
+   /* We are already sending pages, ignore notification */
+   if (ctx->in_tcp_sendpages)
+   return;
+
if (!sk->sk_write_pending && tls_is_pending_closed_record(ctx)) {
gfp_t sk_allocation = sk->sk_allocation;
int rc;
-- 
2.9.5



Re: kTLS in combination with mlx4 is very unstable

2018-05-01 Thread Dave Watson
Hi Andre, 

On 04/24/18 10:01 AM, Dave Watson wrote:
> On 04/22/18 11:21 PM, Andre Tomt wrote:
> > The kernel seems to get increasingly unstable as I load it up with client
> > connections. At about 9Gbps and 700 connections, it is okay at least for a
> > while - it might run fine for say 45 minutes. Once it gets to 20 - 30Gbps,
> > the kernel will usually start spewing OOPSes within minutes and the traffic
> > drops.
> > 
> > Some bad interaction between mlx4 and kTLS?

I tried to repro, but wasn't able to - of course I don't have an mlx4
test setup.  If I manually add a tls_write_space call after
do_tcp_sendpages, I get a similar stack though.

Something like the following should work, can you test?  Thanks

diff --git a/include/net/tls.h b/include/net/tls.h
index 8c56809..ee78f33 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -187,6 +187,7 @@ struct tls_context {
struct scatterlist *partially_sent_record;
u16 partially_sent_offset;
unsigned long flags;
+   bool in_tcp_sendpages;
 
u16 pending_open_record_frags;
int (*push_pending_record)(struct sock *sk, int flags);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 3aafb87..095af65 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -114,6 +114,7 @@ int tls_push_sg(struct sock *sk,
size = sg->length - offset;
offset += sg->offset;
 
+   ctx->in_tcp_sendpages = 1;
while (1) {
if (sg_is_last(sg))
sendpage_flags = flags;
@@ -148,6 +149,8 @@ int tls_push_sg(struct sock *sk,
}
 
clear_bit(TLS_PENDING_CLOSED_RECORD, &ctx->flags);
+   ctx->in_tcp_sendpages = 0;
+   ctx->sk_write_space(sk);
 
return 0;
 }
@@ -217,6 +220,9 @@ static void tls_write_space(struct sock *sk)
 {
struct tls_context *ctx = tls_get_ctx(sk);
 
+   if (ctx->in_tcp_sendpages)
+   return;
+
if (!sk->sk_write_pending && tls_is_pending_closed_record(ctx)) {
gfp_t sk_allocation = sk->sk_allocation;
int rc;


Re: kTLS in combination with mlx4 is very unstable

2018-04-24 Thread Dave Watson
On 04/22/18 11:21 PM, Andre Tomt wrote:
> kTLS looks fun, so I decided to play with it. It is quite spiffy - however
> with mlx4 I get kernel crashes I'm not seeing when testing on ixgbe.
> 
> For testing I'm using a git build of the "stream reflector" cubemap[1]
> configured with kTLS and 8 worker threads running on 4 physical cores,
> loading it up with a ~13Mbps MPEG-TS stream pulled from satelite TV.
> 
> The kernel seems to get increasingly unstable as I load it up with client
> connections. At about 9Gbps and 700 connections, it is okay at least for a
> while - it might run fine for say 45 minutes. Once it gets to 20 - 30Gbps,
> the kernel will usually start spewing OOPSes within minutes and the traffic
> drops.
> 
> Some bad interaction between mlx4 and kTLS?

I'm not familiar with any mlx4 specific issues, but it looks like
there is enough information here to fix the stack overflow from
recursive callbacks. I'll see if I can come up with something.

Thanks for the report.

> 
> [1] 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.sesse.net_-3Fp-3Dcubemap=DwICaQ=5VD0RTtNlTh3ycd41b3MUw=vou6lT5jmE_fWQWZZgNrsMWu4RT87QAB9V07tPHlP5U=nXYfAmb3ozJUT-pI1JGDgMYhxb7Dq4XSorzfyyQeGWk=05SnCOrNbK2DHRub2qPdVxAzXW9e7utxqDMeVaGBd8k=
> 
> First OOPS (from 4.16.3):
> > [  660.467358] BUG: stack guard page was hit at b136e403 (stack is 
> > ded3f179..835ee6c5)
> > [  660.467422] kernel stack overflow (double-fault):  [#1] SMP PTI
> > [  660.467457] Modules linked in: coretemp intel_rapl sb_edac 
> > x86_pkg_temp_thermal intel_powerclamp iTCO_wdt gpio_ich iTCO_vendor_support 
> > kvm_intel mxm_wmi xfs libcrc32c kvm crc32c_generic irqbypass nls_iso8859_1 
> > crct10dif_pclmul crc32_pclmul nls_cp437 ghash_clmulni_intel vfat fat 
> > aesni_intel aes_x86_64 crypto_simd cryptd glue_helper intel_pch_thermal 
> > mei_me sg mei lpc_ich mfd_core evdev ipmi_si ipmi_devintf ipmi_msghandler 
> > wmi acpi_pad tls ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 
> > hid_generic usbhid hid mlx4_ib mlx4_en ib_core sd_mod ast i2c_algo_bit ttm 
> > drm_kms_helper syscopyarea sysfillrect ehci_pci xhci_pci sysimgblt 
> > fb_sys_fops ahci libahci xhci_hcd ehci_hcd libata crc32c_intel nvme drm 
> > usbcore scsi_mod mlx4_core ixgbe i2c_core mdio usb_common devlink hwmon 
> > nvme_core rtc_cmos
> > [  660.467856] CPU: 4 PID: 660 Comm: cubemap Not tainted 4.16.0-1 #1
> > [  660.467890] Hardware name: Supermicro Super Server/X10SDV-4C-TLN2F, BIOS 
> > 1.2c 09/19/2017
> > [  660.467939] RIP: 0010:__kmalloc+0x7/0x1f0
> > [  660.467962] RSP: 0018:abafc27b8000 EFLAGS: 00010206
> > [  660.467992] RAX: 000d RBX: 0010 RCX: 
> > abafc27b8070
> > [  660.468030] RDX: 98a0d0235490 RSI: 01080020 RDI: 
> > 001d
> > [  660.468069] RBP: 000d R08: 98a0d5be4860 R09: 
> > 98a0ec299180
> > [  660.468106] R10: abafc27b80b8 R11: 0010 R12: 
> > 0010
> > [  660.468145] R13: 98a0ec299180 R14: 98a0ec299180 R15: 
> > 
> > [  660.468184] FS:  7f8a35ffb700() GS:98a17fd0() 
> > knlGS:
> > [  660.468227] CS:  0010 DS:  ES:  CR0: 80050033
> > [  660.468258] CR2: abafc27b7ff8 CR3: 0004698ee001 CR4: 
> > 003606e0
> > [  660.468297] DR0:  DR1:  DR2: 
> > 
> > [  660.468334] DR3:  DR6: fffe0ff0 DR7: 
> > 0400
> > [  660.468373] Call Trace:
> > [  660.468401]  gcmaes_encrypt.constprop.5+0x137/0x240 [aesni_intel]
> > [  660.468439]  ? generic_gcmaes_encrypt+0x5f/0x80 [aesni_intel]
> > [  660.468476]  ? gcmaes_wrapper_encrypt+0x36/0x80 [aesni_intel]
> > [  660.468511]  ? tls_push_record+0x1d3/0x390 [tls]
> > [  660.468537]  ? tls_push_record+0x1d3/0x390 [tls]
> > [  660.468565]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.468593]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.468618]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.468643]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.468671]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.468697]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.468722]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.468748]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.468776]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.468802]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.468826]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.468852]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.468880]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.468906]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.468931]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.468957]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.470165]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.471363]  ? do_tcp_sendpages+0x8d/0x580
> > [  660.472555]  ? tls_push_sg+0x74/0x130 [tls]
> > [  660.473713]  ? tls_push_record+0x24a/0x390 [tls]
> > [  660.474838]  ? tls_write_space+0x6a/0x80 [tls]
> > [  660.475927]  

Re: [PATCH] net/tls: Remove VLA usage

2018-04-11 Thread Dave Watson
On 04/10/18 05:52 PM, Kees Cook wrote:
> In the quest to remove VLAs from the kernel[1], this replaces the VLA
> size with the only possible size used in the code, and adds a mechanism
> to double-check future IV sizes.
> 
> [1] 
> https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com
> 
> Signed-off-by: Kees Cook <keesc...@chromium.org>

Thanks

Acked-by: Dave Watson <davejwat...@fb.com>

> ---
>  net/tls/tls_sw.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 4dc766b03f00..71e79597f940 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -41,6 +41,8 @@
>  #include 
>  #include 
>  
> +#define MAX_IV_SIZE  TLS_CIPHER_AES_GCM_128_IV_SIZE
> +
>  static int tls_do_decryption(struct sock *sk,
>struct scatterlist *sgin,
>struct scatterlist *sgout,
> @@ -673,7 +675,7 @@ static int decrypt_skb(struct sock *sk, struct sk_buff *skb,
>  {
>   struct tls_context *tls_ctx = tls_get_ctx(sk);
>   struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
> - char iv[TLS_CIPHER_AES_GCM_128_SALT_SIZE + tls_ctx->rx.iv_size];
> + char iv[TLS_CIPHER_AES_GCM_128_SALT_SIZE + MAX_IV_SIZE];
>   struct scatterlist sgin_arr[MAX_SKB_FRAGS + 2];
>   struct scatterlist *sgin = &sgin_arr[0];
>   struct strp_msg *rxm = strp_msg(skb);
> @@ -1094,6 +1096,12 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx)
>   goto free_priv;
>   }
>  
> + /* Sanity-check the IV size for stack allocations. */
> + if (iv_size > MAX_IV_SIZE) {
> + rc = -EINVAL;
> + goto free_priv;
> + }
> +
>   cctx->prepend_size = TLS_HEADER_SIZE + nonce_size;
>   cctx->tag_size = tag_size;
>   cctx->overhead_size = cctx->prepend_size + cctx->tag_size;
> -- 
> 2.7.4
> 
> 
> -- 
> Kees Cook
> Pixel Security


Re: [PATCH V4 net-next 06/14] net/tls: Add generic NIC offload infrastructure

2018-03-27 Thread Dave Watson
Thanks for doing the merge, it looks good to me.  One issue below,
otherwise all my SW tests still pass.

On 03/27/18 02:51 PM, Saeed Mahameed wrote:
> - if (ctx->conf == TLS_SW_TX ||
> - ctx->conf == TLS_SW_RX ||
> - ctx->conf == TLS_SW_RXTX) {
> - tls_sw_free_resources(sk);
> + if (ctx->rx_conf == TLS_SW) {
> + kfree(ctx->rx.rec_seq);
> + kfree(ctx->rx.iv);
> + tls_sw_free_resources_rx(sk);
>   }
>  
> + if (ctx->tx_conf != TLS_HW)
> + kfree(ctx);

Looks like this needs to be hidden behind CONFIG_TLS_DEVICE, otherwise
it doesn't compile if false because TLS_HW is not defined



[PATCH v2 net] strparser: Fix sign of err codes

2018-03-27 Thread Dave Watson
strp_parser_err is called with a negative code everywhere, which then
calls abort_parser with a negative code.  strp_msg_timeout calls
abort_parser directly with a positive code.  Negate ETIMEDOUT
to match signed-ness of other calls.

The default abort_parser callback, strp_abort_strp, sets
sk->sk_err to err.  Also negate the error here so sk_err always
holds a positive value, as the rest of the net code expects.  Currently
a negative sk_err can result in endless loops, or user code that
thinks it actually sent/received err bytes.

Found while testing net/tls_sw recv path.

Fixes: 43a0c6751a322847 ("strparser: Stream parser for messages")
Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 Documentation/networking/strparser.txt | 5 +++--
 net/strparser/strparser.c  | 8 
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/Documentation/networking/strparser.txt b/Documentation/networking/strparser.txt
index 13081b3..23bf827 100644
--- a/Documentation/networking/strparser.txt
+++ b/Documentation/networking/strparser.txt
@@ -158,7 +158,8 @@ int (*read_sock_done)(struct strparser *strp, int err);
  the TCP socket in receive callback mode. The stream parser may
  read multiple messages in a loop and this function allows cleanup
  to occur when exiting the loop. If the callback is not set (NULL
- in strp_init) a default function is used.
+ in strp_init) a default function is used.  err is a negative
+ error value.
 
 void (*abort_parser)(struct strparser *strp, int err);
 
@@ -166,7 +167,7 @@ void (*abort_parser)(struct strparser *strp, int err);
  in parsing. The default function stops the stream parser and
  sets the error in the socket if the parser is in receive callback
  mode. The default function can be changed by setting the callback
- to non-NULL in strp_init.
+ to non-NULL in strp_init. err is a negative error value.
 
 Statistics
 ==
diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
index 1fdab5c..a82ca09 100644
--- a/net/strparser/strparser.c
+++ b/net/strparser/strparser.c
@@ -44,7 +44,7 @@ static inline struct _strp_msg *_strp_msg(struct sk_buff *skb)
offsetof(struct qdisc_skb_cb, data));
 }
 
-/* Lower lock held */
+/* Lower lock held.  err is a negative error value. */
 static void strp_abort_strp(struct strparser *strp, int err)
 {
/* Unrecoverable error in receive */
@@ -60,7 +60,7 @@ static void strp_abort_strp(struct strparser *strp, int err)
struct sock *sk = strp->sk;
 
/* Report an error on the lower socket */
-   sk->sk_err = err;
+   sk->sk_err = -err;
sk->sk_error_report(sk);
}
 }
@@ -71,7 +71,7 @@ static void strp_start_timer(struct strparser *strp, long timeo)
mod_delayed_work(strp_wq, &strp->msg_timer_work, timeo);
 }
 
-/* Lower lock held */
+/* Lower lock held.  err is a negative error value. */
 static void strp_parser_err(struct strparser *strp, int err,
read_descriptor_t *desc)
 {
@@ -458,7 +458,7 @@ static void strp_msg_timeout(struct work_struct *w)
/* Message assembly timed out */
STRP_STATS_INCR(strp->stats.msg_timeouts);
strp->cb.lock(strp);
-   strp->cb.abort_parser(strp, ETIMEDOUT);
+   strp->cb.abort_parser(strp, -ETIMEDOUT);
strp->cb.unlock(strp);
 }
 
-- 
2.9.5



Re: [PATCH net] strparser: Fix sign of err codes

2018-03-27 Thread Dave Watson
On 03/26/18 01:44 PM, Tom Herbert wrote:
> On Mon, Mar 26, 2018 at 12:31 PM, Dave Watson <davejwat...@fb.com> wrote:
> > strp_parser_err is called with a negative code everywhere, which then
> > calls abort_parser with a negative code.  strp_msg_timeout calls
> > abort_parser directly with a positive code.  Negate ETIMEDOUT
> > to match signed-ness of other calls.
> >
> > The default abort_parser callback, strp_abort_strp, sets
> > sk->sk_err to err.  Also negate the error here so sk_err always
> > holds a positive value, as the rest of the net code expects.  Currently
> > a negative sk_err can result in endless loops, or user code that
> > thinks it actually sent/received err bytes.
> >
> > Found while testing net/tls_sw recv path.
> >
> Nice catch!
> 
> It might be nice to have a comment at strp_parser_err, and the
> abort_parser description in Documentation/networking/strparser.txt
> should also be updated to note that err is a negative error value.

Sure I can update the docs also.


[PATCH net] strparser: Fix sign of err codes

2018-03-26 Thread Dave Watson
strp_parser_err is called with a negative code everywhere, which then
calls abort_parser with a negative code.  strp_msg_timeout calls
abort_parser directly with a positive code.  Negate ETIMEDOUT
to match signed-ness of other calls.

The default abort_parser callback, strp_abort_strp, sets
sk->sk_err to err.  Also negate the error here so sk_err always
holds a positive value, as the rest of the net code expects.  Currently
a negative sk_err can result in endless loops, or user code that
thinks it actually sent/received err bytes.

Found while testing net/tls_sw recv path.

Fixes: 43a0c6751a322847 ("strparser: Stream parser for messages")
Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 net/strparser/strparser.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
index 1fdab5c..b9283ce 100644
--- a/net/strparser/strparser.c
+++ b/net/strparser/strparser.c
@@ -60,7 +60,7 @@ static void strp_abort_strp(struct strparser *strp, int err)
struct sock *sk = strp->sk;
 
/* Report an error on the lower socket */
-   sk->sk_err = err;
+   sk->sk_err = -err;
sk->sk_error_report(sk);
}
 }
@@ -458,7 +458,7 @@ static void strp_msg_timeout(struct work_struct *w)
/* Message assembly timed out */
STRP_STATS_INCR(strp->stats.msg_timeouts);
strp->cb.lock(strp);
-   strp->cb.abort_parser(strp, ETIMEDOUT);
+   strp->cb.abort_parser(strp, -ETIMEDOUT);
strp->cb.unlock(strp);
 }
 
-- 
2.9.5



[PATCH v2 net-next 5/6] tls: RX path for ktls

2018-03-22 Thread Dave Watson
Add rx path for tls software implementation.

recvmsg, splice_read, and poll implemented.

An additional sockopt TLS_RX is added, with the same interface as
TLS_TX.  Either TLS_RX or TLS_TX may be provided separately, or
together (with two different setsockopt calls with appropriate keys).

Control messages are passed via CMSG in a similar way to transmit.
If no cmsg buffer is passed, then only application data records
will be passed to userspace, and EIO is returned for other types of
alerts.

EBADMSG is passed for decryption errors, and EMSGSIZE is passed for
framing too big, and EBADMSG for framing too small (matching openssl
semantics). EINVAL is returned for TLS versions that do not match the
original setsockopt call.  All are unrecoverable.

strparser is used to parse TLS framing.  Decryption is done directly
into userspace buffers if they are large enough to support it; otherwise
skb_cow_data is called (similar to ipsec), and buffers are decrypted in
place and copied.  splice_read always decrypts in place, since no
buffers are provided to decrypt into.

sk_poll is overridden, and only returns POLLIN if a full TLS message is
received.  Otherwise we wait for strparser to finish reading a full frame.
Actual decryption is only done during recvmsg or splice_read calls.

Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 include/net/tls.h|  27 ++-
 include/uapi/linux/tls.h |   2 +
 net/tls/Kconfig  |   1 +
 net/tls/tls_main.c   |  62 -
 net/tls/tls_sw.c | 587 ++-
 5 files changed, 609 insertions(+), 70 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 095b722..437a746 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -58,8 +59,18 @@
 
 struct tls_sw_context {
struct crypto_aead *aead_send;
+   struct crypto_aead *aead_recv;
struct crypto_wait async_wait;
 
+   /* Receive context */
+   struct strparser strp;
+   void (*saved_data_ready)(struct sock *sk);
+   unsigned int (*sk_poll)(struct file *file, struct socket *sock,
+   struct poll_table_struct *wait);
+   struct sk_buff *recv_pkt;
+   u8 control;
+   bool decrypted;
+
/* Sending context */
char aad_space[TLS_AAD_SPACE_SIZE];
 
@@ -96,12 +107,17 @@ struct tls_context {
struct tls_crypto_info crypto_send;
struct tls12_crypto_info_aes_gcm_128 crypto_send_aes_gcm_128;
};
+   union {
+   struct tls_crypto_info crypto_recv;
+   struct tls12_crypto_info_aes_gcm_128 crypto_recv_aes_gcm_128;
+   };
 
void *priv_ctx;
 
u8 conf:2;
 
struct cipher_context tx;
+   struct cipher_context rx;
 
struct scatterlist *partially_sent_record;
u16 partially_sent_offset;
@@ -128,12 +144,19 @@ int tls_sk_attach(struct sock *sk, int optname, char __user *optval,
  unsigned int optlen);
 
 
-int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx);
+int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx);
 int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
 int tls_sw_sendpage(struct sock *sk, struct page *page,
int offset, size_t size, int flags);
 void tls_sw_close(struct sock *sk, long timeout);
-void tls_sw_free_tx_resources(struct sock *sk);
+void tls_sw_free_resources(struct sock *sk);
+int tls_sw_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
+  int nonblock, int flags, int *addr_len);
+unsigned int tls_sw_poll(struct file *file, struct socket *sock,
+struct poll_table_struct *wait);
+ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
+  struct pipe_inode_info *pipe,
+  size_t len, unsigned int flags);
 
 void tls_sk_destruct(struct sock *sk, struct tls_context *ctx);
 void tls_icsk_clean_acked(struct sock *sk);
diff --git a/include/uapi/linux/tls.h b/include/uapi/linux/tls.h
index 293b2cd..c6633e9 100644
--- a/include/uapi/linux/tls.h
+++ b/include/uapi/linux/tls.h
@@ -38,6 +38,7 @@
 
 /* TLS socket options */
 #define TLS_TX 1   /* Set transmit parameters */
+#define TLS_RX 2   /* Set receive parameters */
 
 /* Supported versions */
 #define TLS_VERSION_MINOR(ver) ((ver) & 0xFF)
@@ -59,6 +60,7 @@
 #define TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE8
 
 #define TLS_SET_RECORD_TYPE1
+#define TLS_GET_RECORD_TYPE2
 
 struct tls_crypto_info {
__u16 version;
diff --git a/net/tls/Kconfig b/net/tls/Kconfig
index eb58303..89b8745a 100644
--- a/net/tls/Kconfig
+++ b/net/tls/Kconfig
@@ -7,6 +7,7 @@ config TLS
select CRYPTO
select CRYPTO_AES
select CRYPTO_GCM
+   select STREAM_PARSER
default n
 

[PATCH v2 net-next 1/6] tls: Generalize zerocopy_from_iter

2018-03-22 Thread Dave Watson
Refactor zerocopy_from_iter to take arguments for pages and size,
such that it can be used for both tx and rx. RX will also support
zerocopy direct to output iter, as long as the full message can
be copied at once (a large enough userspace buffer was provided).

Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 net/tls/tls_sw.c | 31 +++
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 057a558..ca1d20d 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -226,23 +226,24 @@ static int tls_sw_push_pending_record(struct sock *sk, int flags)
 }
 
 static int zerocopy_from_iter(struct sock *sk, struct iov_iter *from,
- int length)
+ int length, int *pages_used,
+ unsigned int *size_used,
+ struct scatterlist *to, int to_max_pages,
+ bool charge)
 {
-   struct tls_context *tls_ctx = tls_get_ctx(sk);
-   struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
struct page *pages[MAX_SKB_FRAGS];
 
size_t offset;
ssize_t copied, use;
int i = 0;
-   unsigned int size = ctx->sg_plaintext_size;
-   int num_elem = ctx->sg_plaintext_num_elem;
+   unsigned int size = *size_used;
+   int num_elem = *pages_used;
int rc = 0;
int maxpages;
 
while (length > 0) {
i = 0;
-   maxpages = ARRAY_SIZE(ctx->sg_plaintext_data) - num_elem;
+   maxpages = to_max_pages - num_elem;
if (maxpages == 0) {
rc = -EFAULT;
goto out;
@@ -262,10 +263,11 @@ static int zerocopy_from_iter(struct sock *sk, struct iov_iter *from,
while (copied) {
use = min_t(int, copied, PAGE_SIZE - offset);
 
-   sg_set_page(&ctx->sg_plaintext_data[num_elem],
+   sg_set_page(&to[num_elem],
pages[i], use, offset);
-   sg_unmark_end(&ctx->sg_plaintext_data[num_elem]);
-   sk_mem_charge(sk, use);
+   sg_unmark_end(&to[num_elem]);
+   if (charge)
+   sk_mem_charge(sk, use);
 
offset = 0;
copied -= use;
@@ -276,8 +278,9 @@ static int zerocopy_from_iter(struct sock *sk, struct iov_iter *from,
}
 
 out:
-   ctx->sg_plaintext_size = size;
-   ctx->sg_plaintext_num_elem = num_elem;
+   *size_used = size;
+   *pages_used = num_elem;
+
return rc;
 }
 
@@ -374,7 +377,11 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 
if (full_record || eor) {
ret = zerocopy_from_iter(sk, &msg->msg_iter,
-try_to_copy);
+   try_to_copy, &ctx->sg_plaintext_num_elem,
+   &ctx->sg_plaintext_size,
+   ctx->sg_plaintext_data,
+   ARRAY_SIZE(ctx->sg_plaintext_data),
+   true);
if (ret)
goto fallback_to_reg_send;
 
-- 
2.9.5



[PATCH v2 net-next 6/6] tls: Add receive path documentation

2018-03-22 Thread Dave Watson
Add documentation on rx path setup and cmsg interface.

Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 Documentation/networking/tls.txt | 66 ++--
 1 file changed, 64 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/tls.txt b/Documentation/networking/tls.txt
index 77ed006..58b5ef7 100644
--- a/Documentation/networking/tls.txt
+++ b/Documentation/networking/tls.txt
@@ -48,6 +48,9 @@ the transmit and the receive into the kernel.
 
   setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info));
 
+Transmit and receive are set separately, but the setup is the same, using either
+TLS_TX or TLS_RX.
+
 Sending TLS application data
 
 
@@ -79,6 +82,28 @@ for memory), or the encryption will always succeed.  If send() returns
 -ENOMEM and some data was left on the socket buffer from a previous
 call using MSG_MORE, the MSG_MORE data is left on the socket buffer.
 
+Receiving TLS application data
+--
+
+After setting the TLS_RX socket option, all recv family socket calls
+are decrypted using TLS parameters provided.  A full TLS record must
+be received before decryption can happen.
+
+  char buffer[16384];
+  recv(sock, buffer, 16384);
+
+Received data is decrypted directly in to the user buffer if it is
+large enough, and no additional allocations occur.  If the userspace
+buffer is too small, data is decrypted in the kernel and copied to
+userspace.
+
+EINVAL is returned if the TLS version in the received message does not
+match the version passed in setsockopt.
+
+EMSGSIZE is returned if the received message is too big.
+
+EBADMSG is returned if decryption failed for any other reason.
+
 Send TLS control messages
 -
 
@@ -118,6 +143,43 @@ using a record of type @record_type.
 Control message data should be provided unencrypted, and will be
 encrypted by the kernel.
 
+Receiving TLS control messages
+--
+
+TLS control messages are passed in the userspace buffer, with message
+type passed via cmsg.  If no cmsg buffer is provided, an error is
+returned if a control message is received.  Data messages may be
+received without a cmsg buffer set.
+
+  char buffer[16384];
+  char cmsg[CMSG_SPACE(sizeof(unsigned char))];
+  struct msghdr msg = {0};
+  msg.msg_control = cmsg;
+  msg.msg_controllen = sizeof(cmsg);
+
+  struct iovec msg_iov;
+  msg_iov.iov_base = buffer;
+  msg_iov.iov_len = 16384;
+
+  msg.msg_iov = &msg_iov;
+  msg.msg_iovlen = 1;
+
+  int ret = recvmsg(sock, &msg, 0 /* flags */);
+
+  struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
+  if (cmsg->cmsg_level == SOL_TLS &&
+  cmsg->cmsg_type == TLS_GET_RECORD_TYPE) {
+  int record_type = *((unsigned char *)CMSG_DATA(cmsg));
+  // Do something with record_type, and control message data in
+  // buffer.
+  //
+  // Note that record_type may be == to application data (23).
+  } else {
+  // Buffer contains application data.
+  }
+
+recv will never return data from mixed types of TLS records.
+
 Integrating in to userspace TLS library
 ---
 
@@ -126,10 +188,10 @@ layer of a userspace TLS library.
 
 A patchset to OpenSSL to use ktls as the record layer is here:
 
-https://github.com/Mellanox/tls-openssl
+https://github.com/Mellanox/openssl/commits/tls_rx2
 
 An example of calling send directly after a handshake using
 gnutls.  Since it doesn't implement a full record layer, control
 messages are not supported:
 
-https://github.com/Mellanox/tls-af_ktls_tool
+https://github.com/ktls/af_ktls-tool/commits/RX
-- 
2.9.5



[PATCH v2 net-next 3/6] tls: Pass error code explicitly to tls_err_abort

2018-03-22 Thread Dave Watson
Pass EBADMSG explicitly to tls_err_abort.  Receive path will
pass additional codes - EMSGSIZE if framing is larger than max
TLS record size, EINVAL if TLS version mismatch.

Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 include/net/tls.h | 6 +++---
 net/tls/tls_sw.c  | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 019e52d..6b44875 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -174,9 +174,9 @@ static inline bool tls_is_pending_open_record(struct tls_context *tls_ctx)
return tls_ctx->pending_open_record_frags;
 }
 
-static inline void tls_err_abort(struct sock *sk)
+static inline void tls_err_abort(struct sock *sk, int err)
 {
-   sk->sk_err = EBADMSG;
+   sk->sk_err = err;
sk->sk_error_report(sk);
 }
 
@@ -197,7 +197,7 @@ static inline void tls_advance_record_sn(struct sock *sk,
 struct cipher_context *ctx)
 {
if (tls_bigint_increment(ctx->rec_seq, ctx->rec_seq_size))
-   tls_err_abort(sk);
+   tls_err_abort(sk, EBADMSG);
tls_bigint_increment(ctx->iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE,
 ctx->iv_size);
 }
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 338d743..1c79d9a 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -214,7 +214,7 @@ static int tls_push_record(struct sock *sk, int flags,
/* Only pass through MSG_DONTWAIT and MSG_NOSIGNAL flags */
rc = tls_push_sg(sk, tls_ctx, ctx->sg_encrypted_data, 0, flags);
if (rc < 0 && rc != -EAGAIN)
-   tls_err_abort(sk);
+   tls_err_abort(sk, EBADMSG);
 
tls_advance_record_sn(sk, &tls_ctx->tx);
return rc;
-- 
2.9.5



[PATCH v2 net-next 2/6] tls: Move cipher info to a separate struct

2018-03-22 Thread Dave Watson
Separate tx crypto parameters to a separate cipher_context struct.
The same parameters will be used for rx using the same struct.

tls_advance_record_sn is modified to only take the cipher info.

Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 include/net/tls.h  | 26 +---
 net/tls/tls_main.c |  8 
 net/tls/tls_sw.c   | 58 --
 3 files changed, 49 insertions(+), 43 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 4913430..019e52d 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -81,6 +81,16 @@ enum {
TLS_PENDING_CLOSED_RECORD
 };
 
+struct cipher_context {
+   u16 prepend_size;
+   u16 tag_size;
+   u16 overhead_size;
+   u16 iv_size;
+   char *iv;
+   u16 rec_seq_size;
+   char *rec_seq;
+};
+
 struct tls_context {
union {
struct tls_crypto_info crypto_send;
@@ -91,13 +101,7 @@ struct tls_context {
 
u8 tx_conf:2;
 
-   u16 prepend_size;
-   u16 tag_size;
-   u16 overhead_size;
-   u16 iv_size;
-   char *iv;
-   u16 rec_seq_size;
-   char *rec_seq;
+   struct cipher_context tx;
 
struct scatterlist *partially_sent_record;
u16 partially_sent_offset;
@@ -190,7 +194,7 @@ static inline bool tls_bigint_increment(unsigned char *seq, int len)
 }
 
 static inline void tls_advance_record_sn(struct sock *sk,
-struct tls_context *ctx)
+struct cipher_context *ctx)
 {
if (tls_bigint_increment(ctx->rec_seq, ctx->rec_seq_size))
tls_err_abort(sk);
@@ -203,9 +207,9 @@ static inline void tls_fill_prepend(struct tls_context *ctx,
 size_t plaintext_len,
 unsigned char record_type)
 {
-   size_t pkt_len, iv_size = ctx->iv_size;
+   size_t pkt_len, iv_size = ctx->tx.iv_size;
 
-   pkt_len = plaintext_len + iv_size + ctx->tag_size;
+   pkt_len = plaintext_len + iv_size + ctx->tx.tag_size;
 
/* we cover nonce explicit here as well, so buf should be of
 * size KTLS_DTLS_HEADER_SIZE + KTLS_DTLS_NONCE_EXPLICIT_SIZE
@@ -217,7 +221,7 @@ static inline void tls_fill_prepend(struct tls_context *ctx,
buf[3] = pkt_len >> 8;
buf[4] = pkt_len & 0xFF;
memcpy(buf + TLS_NONCE_OFFSET,
-  ctx->iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE, iv_size);
+  ctx->tx.iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE, iv_size);
 }
 
 static inline void tls_make_aad(char *buf,
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index d824d54..c671560 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -259,8 +259,8 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
}
}
 
-   kfree(ctx->rec_seq);
-   kfree(ctx->iv);
+   kfree(ctx->tx.rec_seq);
+   kfree(ctx->tx.iv);
 
if (ctx->tx_conf == TLS_SW_TX)
tls_sw_free_tx_resources(sk);
@@ -319,9 +319,9 @@ static int do_tls_getsockopt_tx(struct sock *sk, char __user *optval,
}
lock_sock(sk);
memcpy(crypto_info_aes_gcm_128->iv,
-  ctx->iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE,
+  ctx->tx.iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE,
   TLS_CIPHER_AES_GCM_128_IV_SIZE);
-   memcpy(crypto_info_aes_gcm_128->rec_seq, ctx->rec_seq,
+   memcpy(crypto_info_aes_gcm_128->rec_seq, ctx->tx.rec_seq,
   TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
release_sock(sk);
if (copy_to_user(optval,
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index ca1d20d..338d743 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -79,7 +79,7 @@ static void trim_both_sgl(struct sock *sk, int target_size)
target_size);
 
if (target_size > 0)
-   target_size += tls_ctx->overhead_size;
+   target_size += tls_ctx->tx.overhead_size;
 
trim_sg(sk, ctx->sg_encrypted_data,
&ctx->sg_encrypted_num_elem,
@@ -152,21 +152,21 @@ static int tls_do_encryption(struct tls_context *tls_ctx,
if (!aead_req)
return -ENOMEM;
 
-   ctx->sg_encrypted_data[0].offset += tls_ctx->prepend_size;
-   ctx->sg_encrypted_data[0].length -= tls_ctx->prepend_size;
+   ctx->sg_encrypted_data[0].offset += tls_ctx->tx.prepend_size;
+   ctx->sg_encrypted_data[0].length -= tls_ctx->tx.prepend_size;
 
aead_request_set_tfm(aead_req, ctx->aead_send);
aead_request_set_ad(aead_req, TLS_AAD_SPACE_SIZE);
aead_request_set_crypt(aead_req, ctx->sg_aead_in, ctx->sg_aead_out,
-  data_len, tls_ctx->iv);
+  data_len, tls_ctx->tx.iv);

[PATCH v2 net-next 4/6] tls: Refactor variable names

2018-03-22 Thread Dave Watson
Several config variables are prefixed with tx, drop the prefix
since these will be used for both tx and rx.

Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 include/net/tls.h  |  2 +-
 net/tls/tls_main.c | 26 +-
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 6b44875..095b722 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -99,7 +99,7 @@ struct tls_context {
 
void *priv_ctx;
 
-   u8 tx_conf:2;
+   u8 conf:2;
 
struct cipher_context tx;
 
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index c671560..c405bee 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -52,7 +52,7 @@ enum {
 };
 
 enum {
-   TLS_BASE_TX,
+   TLS_BASE,
TLS_SW_TX,
TLS_NUM_CONFIG,
 };
@@ -65,7 +65,7 @@ static inline void update_sk_prot(struct sock *sk, struct tls_context *ctx)
 {
int ip_ver = sk->sk_family == AF_INET6 ? TLSV6 : TLSV4;
 
-   sk->sk_prot = &tls_prots[ip_ver][ctx->tx_conf];
+   sk->sk_prot = &tls_prots[ip_ver][ctx->conf];
 }
 
 int wait_on_pending_writer(struct sock *sk, long *timeo)
@@ -238,7 +238,7 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
lock_sock(sk);
sk_proto_close = ctx->sk_proto_close;
 
-   if (ctx->tx_conf == TLS_BASE_TX) {
+   if (ctx->conf == TLS_BASE) {
kfree(ctx);
goto skip_tx_cleanup;
}
@@ -262,7 +262,7 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
kfree(ctx->tx.rec_seq);
kfree(ctx->tx.iv);
 
-   if (ctx->tx_conf == TLS_SW_TX)
+   if (ctx->conf == TLS_SW_TX)
tls_sw_free_tx_resources(sk);
 
 skip_tx_cleanup:
@@ -371,7 +371,7 @@ static int do_tls_setsockopt_tx(struct sock *sk, char __user *optval,
struct tls_crypto_info *crypto_info;
struct tls_context *ctx = tls_get_ctx(sk);
int rc = 0;
-   int tx_conf;
+   int conf;
 
if (!optval || (optlen < sizeof(*crypto_info))) {
rc = -EINVAL;
@@ -418,11 +418,11 @@ static int do_tls_setsockopt_tx(struct sock *sk, char __user *optval,
 
/* currently SW is default, we will have ethtool in future */
rc = tls_set_sw_offload(sk, ctx);
-   tx_conf = TLS_SW_TX;
+   conf = TLS_SW_TX;
if (rc)
goto err_crypto_info;
 
-   ctx->tx_conf = tx_conf;
+   ctx->conf = conf;
update_sk_prot(sk, ctx);
ctx->sk_write_space = sk->sk_write_space;
sk->sk_write_space = tls_write_space;
@@ -465,12 +465,12 @@ static int tls_setsockopt(struct sock *sk, int level, int optname,
 
 static void build_protos(struct proto *prot, struct proto *base)
 {
-   prot[TLS_BASE_TX] = *base;
-   prot[TLS_BASE_TX].setsockopt= tls_setsockopt;
-   prot[TLS_BASE_TX].getsockopt= tls_getsockopt;
-   prot[TLS_BASE_TX].close = tls_sk_proto_close;
+   prot[TLS_BASE] = *base;
+   prot[TLS_BASE].setsockopt   = tls_setsockopt;
+   prot[TLS_BASE].getsockopt   = tls_getsockopt;
+   prot[TLS_BASE].close= tls_sk_proto_close;
 
-   prot[TLS_SW_TX] = prot[TLS_BASE_TX];
+   prot[TLS_SW_TX] = prot[TLS_BASE];
prot[TLS_SW_TX].sendmsg = tls_sw_sendmsg;
prot[TLS_SW_TX].sendpage= tls_sw_sendpage;
 }
@@ -513,7 +513,7 @@ static int tls_init(struct sock *sk)
mutex_unlock(_prot_mutex);
}
 
-   ctx->tx_conf = TLS_BASE_TX;
+   ctx->conf = TLS_BASE;
update_sk_prot(sk, ctx);
 out:
return rc;
-- 
2.9.5



[PATCH v2 net-next 0/6] TLS Rx

2018-03-22 Thread Dave Watson
TLS tcp socket RX implementation, to match existing TX code.

This patchset completes the software TLS socket, allowing full
bi-directional communication over TLS using normal socket syscalls,
after the handshake has been done in userspace.  Only the symmetric
encryption is done in the kernel.

This allows usage of TLS sockets from within the kernel (for example
with network block device, or from bpf).  Performance can be better
than userspace, with appropriate crypto routines [1].

sk->sk_socket->ops must be overridden to implement splice_read and
poll, but otherwise the interface & implementation match TX closely.
strparser is used to parse TLS framing on receive.

There are Openssl RX patches that work with this interface [2], as
well as a testing tool using the socket interface directly (without
cmsg support) [3].  An example tcp socket setup is:

  // Normal tcp socket connect/accept, and TLS handshake
  // using any TLS library.
  setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));

  struct tls12_crypto_info_aes_gcm_128 crypto_info_rx;
  // Fill in crypto_info based on negotiated keys.

  setsockopt(sock, SOL_TLS, TLS_RX, &crypto_info_rx, sizeof(crypto_info_rx));
  // You can optionally set TLS_TX as well.

  char buffer[16384];
  int ret = recv(sock, buffer, 16384);

  // cmsg can be received using recvmsg and a msg_control 
  // of type TLS_GET_RECORD_TYPE will be set.

V1 -> V2

* For too-small framing errors, return EBADMSG, to match openssl error
  code semantics.  Docs and commit logs about this also updated.

RFC -> V1

* Refactor 'tx' variable names to drop tx
* Error return codes changed per discussion
* Only call skb_cow_data based on in-place decryption, 
  drop unnecessary frag list check.

[1] Recent crypto patchset to remove copies, resulting in optimally
zero copies vs. userspace's one, vs. previous kernel's two.  

https://marc.info/?l=linux-crypto-vger&m=151931242406416&w=2

[2] https://github.com/Mellanox/openssl/commits/tls_rx2

[3] https://github.com/ktls/af_ktls-tool/tree/RX

Dave Watson (6):
  tls: Generalize zerocopy_from_iter
  tls: Move cipher info to a separate struct
  tls: Pass error code explicitly to tls_err_abort
  tls: Refactor variable names
  tls: RX path for ktls
  tls: Add receive path documentation

 Documentation/networking/tls.txt |  66 +++-
 include/net/tls.h|  61 ++--
 include/uapi/linux/tls.h |   2 +
 net/tls/Kconfig  |   1 +
 net/tls/tls_main.c   |  92 --
 net/tls/tls_sw.c | 644 ++-
 6 files changed, 740 insertions(+), 126 deletions(-)

-- 
2.9.5



Re: [PATCH net-next 5/6] tls: RX path for ktls

2018-03-21 Thread Dave Watson
On 03/21/18 07:20 AM, Boris Pismenny wrote:
> 
> 
> On 3/20/2018 7:54 PM, Dave Watson wrote:
> > +   ctx->control = header[0];
> > +
> > +   data_len = ((header[4] & 0xFF) | (header[3] << 8));
> > +
> > +   cipher_overhead = tls_ctx->rx.tag_size + tls_ctx->rx.iv_size;
> > +
> > +   if (data_len > TLS_MAX_PAYLOAD_SIZE + cipher_overhead) {
> > +   ret = -EMSGSIZE;
> > +   goto read_failure;
> > +   }
> > +   if (data_len < cipher_overhead) {
> > +   ret = -EMSGSIZE;
> 
> I think this should be considered EBADMSG, because this error is cipher
> dependent. At least, that's what happens within OpenSSL. Also, EMSGSIZE is
> usually used only for too long messages.

Ah, indeed.  Thanks, will send v2.


Re: [PATCH net-next 06/14] net/tls: Add generic NIC offload infrastructure

2018-03-21 Thread Dave Watson
On 03/19/18 07:45 PM, Saeed Mahameed wrote:
> +#define TLS_OFFLOAD_CONTEXT_SIZE 
>   \
> + (ALIGN(sizeof(struct tls_offload_context), sizeof(void *)) +   \
> +  TLS_DRIVER_STATE_SIZE)
> +
> + pfrag = sk_page_frag(sk);
> +
> + /* KTLS_TLS_HEADER_SIZE is not counted as part of the TLS record, and

I think the define is actually TLS_HEADER_SIZE, no KTLS_ prefix

> + memcpy(ctx->iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE, iv, iv_size);
> +
> + ctx->rec_seq_size = rec_seq_size;
> + /* worst case is:
> +  * MAX_SKB_FRAGS in tls_record_info
> +  * MAX_SKB_FRAGS + 1 in SKB head an frags.

spelling

> +int tls_sw_fallback_init(struct sock *sk,
> +  struct tls_offload_context *offload_ctx,
> +  struct tls_crypto_info *crypto_info)
> +{
> + int rc;
> + const u8 *key;
> +
> + offload_ctx->aead_send =
> + crypto_alloc_aead("gcm(aes)", 0, CRYPTO_ALG_ASYNC);

in tls_sw we went with async + crypto_wait_req, any reason to not do
that here?  Otherwise I think you still get the software gcm on x86
instead of aesni without additional changes.

> diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> index d824d548447e..e0dface33017 100644
> --- a/net/tls/tls_main.c
> +++ b/net/tls/tls_main.c
> @@ -54,6 +54,9 @@ enum {
>  enum {
>   TLS_BASE_TX,
>   TLS_SW_TX,
> +#ifdef CONFIG_TLS_DEVICE
> + TLS_HW_TX,
> +#endif
>   TLS_NUM_CONFIG,
>  };

I have posted SW_RX patches, do you foresee any issues with SW_RX + HW_TX?

Thanks


[PATCH net-next 6/6] tls: Add receive path documentation

2018-03-20 Thread Dave Watson
Add documentation on rx path setup and cmsg interface.

Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 Documentation/networking/tls.txt | 67 ++--
 1 file changed, 65 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/tls.txt b/Documentation/networking/tls.txt
index 77ed006..6c39505 100644
--- a/Documentation/networking/tls.txt
+++ b/Documentation/networking/tls.txt
@@ -48,6 +48,9 @@ the transmit and the receive into the kernel.
 
   setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info));
 
+Transmit and receive are set separately, but the setup is the same, using either
+TLS_TX or TLS_RX.
+
 Sending TLS application data
 
 
@@ -79,6 +82,29 @@ for memory), or the encryption will always succeed.  If send() returns
 -ENOMEM and some data was left on the socket buffer from a previous
 call using MSG_MORE, the MSG_MORE data is left on the socket buffer.
 
+Receiving TLS application data
+--
+
+After setting the TLS_RX socket option, all recv family socket calls
+are decrypted using TLS parameters provided.  A full TLS record must
+be received before decryption can happen.
+
+  char buffer[16384];
+  recv(sock, buffer, 16384);
+
+Received data is decrypted directly in to the user buffer if it is
+large enough, and no additional allocations occur.  If the userspace
+buffer is too small, data is decrypted in the kernel and copied to
+userspace.
+
+EINVAL is returned if the TLS version in the received message does not
+match the version passed in setsockopt.
+
+EMSGSIZE is returned if the received message is too big, or too small
+when crypto overheads are included.
+
+EBADMSG is returned if decryption failed for any other reason.
+
 Send TLS control messages
 -
 
@@ -118,6 +144,43 @@ using a record of type @record_type.
 Control message data should be provided unencrypted, and will be
 encrypted by the kernel.
 
+Receiving TLS control messages
+--
+
+TLS control messages are passed in the userspace buffer, with message
+type passed via cmsg.  If no cmsg buffer is provided, an error is
+returned if a control message is received.  Data messages may be
+received without a cmsg buffer set.
+
+  char buffer[16384];
+  char cmsg[CMSG_SPACE(sizeof(unsigned char))];
+  struct msghdr msg = {0};
+  msg.msg_control = cmsg;
+  msg.msg_controllen = sizeof(cmsg);
+
+  struct iovec msg_iov;
+  msg_iov.iov_base = buffer;
+  msg_iov.iov_len = 16384;
+
+  msg.msg_iov = &msg_iov;
+  msg.msg_iovlen = 1;
+
+  int ret = recvmsg(sock, &msg, 0 /* flags */);
+
+  struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
+  if (cmsg->cmsg_level == SOL_TLS &&
+  cmsg->cmsg_type == TLS_GET_RECORD_TYPE) {
+  int record_type = *((unsigned char *)CMSG_DATA(cmsg));
+  // Do something with record_type, and control message data in
+  // buffer.
+  //
+  // Note that record_type may be == to application data (23).
+  } else {
+  // Buffer contains application data.
+  }
+
+recv will never return data from mixed types of TLS records.
+
 Integrating in to userspace TLS library
 ---
 
@@ -126,10 +189,10 @@ layer of a userspace TLS library.
 
 A patchset to OpenSSL to use ktls as the record layer is here:
 
-https://github.com/Mellanox/tls-openssl
+https://github.com/Mellanox/openssl/commits/tls_rx2
 
 An example of calling send directly after a handshake using
 gnutls.  Since it doesn't implement a full record layer, control
 messages are not supported:
 
-https://github.com/Mellanox/tls-af_ktls_tool
+https://github.com/ktls/af_ktls-tool/commits/RX
-- 
2.9.5



[PATCH net-next 4/6] tls: Refactor variable names

2018-03-20 Thread Dave Watson
Several config variables are prefixed with tx, drop the prefix
since these will be used for both tx and rx.

Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 include/net/tls.h  |  2 +-
 net/tls/tls_main.c | 26 +-
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 6b44875..095b722 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -99,7 +99,7 @@ struct tls_context {
 
void *priv_ctx;
 
-   u8 tx_conf:2;
+   u8 conf:2;
 
struct cipher_context tx;
 
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index c671560..c405bee 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -52,7 +52,7 @@ enum {
 };
 
 enum {
-   TLS_BASE_TX,
+   TLS_BASE,
TLS_SW_TX,
TLS_NUM_CONFIG,
 };
@@ -65,7 +65,7 @@ static inline void update_sk_prot(struct sock *sk, struct tls_context *ctx)
 {
int ip_ver = sk->sk_family == AF_INET6 ? TLSV6 : TLSV4;
 
-   sk->sk_prot = &tls_prots[ip_ver][ctx->tx_conf];
+   sk->sk_prot = &tls_prots[ip_ver][ctx->conf];
 }
 
 int wait_on_pending_writer(struct sock *sk, long *timeo)
@@ -238,7 +238,7 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
lock_sock(sk);
sk_proto_close = ctx->sk_proto_close;
 
-   if (ctx->tx_conf == TLS_BASE_TX) {
+   if (ctx->conf == TLS_BASE) {
kfree(ctx);
goto skip_tx_cleanup;
}
@@ -262,7 +262,7 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
kfree(ctx->tx.rec_seq);
kfree(ctx->tx.iv);
 
-   if (ctx->tx_conf == TLS_SW_TX)
+   if (ctx->conf == TLS_SW_TX)
tls_sw_free_tx_resources(sk);
 
 skip_tx_cleanup:
@@ -371,7 +371,7 @@ static int do_tls_setsockopt_tx(struct sock *sk, char __user *optval,
struct tls_crypto_info *crypto_info;
struct tls_context *ctx = tls_get_ctx(sk);
int rc = 0;
-   int tx_conf;
+   int conf;
 
if (!optval || (optlen < sizeof(*crypto_info))) {
rc = -EINVAL;
@@ -418,11 +418,11 @@ static int do_tls_setsockopt_tx(struct sock *sk, char __user *optval,
 
/* currently SW is default, we will have ethtool in future */
rc = tls_set_sw_offload(sk, ctx);
-   tx_conf = TLS_SW_TX;
+   conf = TLS_SW_TX;
if (rc)
goto err_crypto_info;
 
-   ctx->tx_conf = tx_conf;
+   ctx->conf = conf;
update_sk_prot(sk, ctx);
ctx->sk_write_space = sk->sk_write_space;
sk->sk_write_space = tls_write_space;
@@ -465,12 +465,12 @@ static int tls_setsockopt(struct sock *sk, int level, int optname,
 
 static void build_protos(struct proto *prot, struct proto *base)
 {
-   prot[TLS_BASE_TX] = *base;
-   prot[TLS_BASE_TX].setsockopt= tls_setsockopt;
-   prot[TLS_BASE_TX].getsockopt= tls_getsockopt;
-   prot[TLS_BASE_TX].close = tls_sk_proto_close;
+   prot[TLS_BASE] = *base;
+   prot[TLS_BASE].setsockopt   = tls_setsockopt;
+   prot[TLS_BASE].getsockopt   = tls_getsockopt;
+   prot[TLS_BASE].close= tls_sk_proto_close;
 
-   prot[TLS_SW_TX] = prot[TLS_BASE_TX];
+   prot[TLS_SW_TX] = prot[TLS_BASE];
prot[TLS_SW_TX].sendmsg = tls_sw_sendmsg;
prot[TLS_SW_TX].sendpage= tls_sw_sendpage;
 }
@@ -513,7 +513,7 @@ static int tls_init(struct sock *sk)
mutex_unlock(_prot_mutex);
}
 
-   ctx->tx_conf = TLS_BASE_TX;
+   ctx->conf = TLS_BASE;
update_sk_prot(sk, ctx);
 out:
return rc;
-- 
2.9.5



[PATCH net-next 5/6] tls: RX path for ktls

2018-03-20 Thread Dave Watson
Add rx path for tls software implementation.

recvmsg, splice_read, and poll implemented.

An additional sockopt TLS_RX is added, with the same interface as
TLS_TX.  Either TLS_RX or TLS_TX may be provided separately, or
together (with two different setsockopt calls with appropriate keys).

Control messages are passed via CMSG in a similar way to transmit.
If no cmsg buffer is passed, then only application data records
will be passed to userspace, and EIO is returned for other types of
alerts.

EBADMSG is passed for decryption errors, and EMSGSIZE is passed for
framing errors (either framing too big *or* too small with crypto
overhead). EINVAL is returned for TLS versions that do not match the
original setsockopt call.  All are unrecoverable.

strparser is used to parse TLS framing.   Decryption is done directly
in to userspace buffers if they are large enough to support it, otherwise
skb_cow_data is called (similar to ipsec), and buffers are decrypted in
place and copied.  splice_read always decrypts in place, since no
buffers are provided to decrypt in to.

sk_poll is overridden, and only returns POLLIN if a full TLS message is
received.  Otherwise we wait for strparser to finish reading a full frame.
Actual decryption is only done during recvmsg or splice_read calls.

Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 include/net/tls.h|  27 ++-
 include/uapi/linux/tls.h |   2 +
 net/tls/Kconfig  |   1 +
 net/tls/tls_main.c   |  62 -
 net/tls/tls_sw.c | 587 ++-
 5 files changed, 609 insertions(+), 70 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 095b722..437a746 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -40,6 +40,7 @@
 #include <linux/crypto.h>
 #include <linux/socket.h>
 #include <linux/tcp.h>
+#include <net/strparser.h>
 
 #include <net/tcp.h>
 
@@ -58,8 +59,18 @@
 
 struct tls_sw_context {
struct crypto_aead *aead_send;
+   struct crypto_aead *aead_recv;
struct crypto_wait async_wait;
 
+   /* Receive context */
+   struct strparser strp;
+   void (*saved_data_ready)(struct sock *sk);
+   unsigned int (*sk_poll)(struct file *file, struct socket *sock,
+   struct poll_table_struct *wait);
+   struct sk_buff *recv_pkt;
+   u8 control;
+   bool decrypted;
+
/* Sending context */
char aad_space[TLS_AAD_SPACE_SIZE];
 
@@ -96,12 +107,17 @@ struct tls_context {
struct tls_crypto_info crypto_send;
struct tls12_crypto_info_aes_gcm_128 crypto_send_aes_gcm_128;
};
+   union {
+   struct tls_crypto_info crypto_recv;
+   struct tls12_crypto_info_aes_gcm_128 crypto_recv_aes_gcm_128;
+   };
 
void *priv_ctx;
 
u8 conf:2;
 
struct cipher_context tx;
+   struct cipher_context rx;
 
struct scatterlist *partially_sent_record;
u16 partially_sent_offset;
@@ -128,12 +144,19 @@ int tls_sk_attach(struct sock *sk, int optname, char __user *optval,
  unsigned int optlen);
 
 
-int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx);
+int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx);
 int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
 int tls_sw_sendpage(struct sock *sk, struct page *page,
int offset, size_t size, int flags);
 void tls_sw_close(struct sock *sk, long timeout);
-void tls_sw_free_tx_resources(struct sock *sk);
+void tls_sw_free_resources(struct sock *sk);
+int tls_sw_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
+  int nonblock, int flags, int *addr_len);
+unsigned int tls_sw_poll(struct file *file, struct socket *sock,
+struct poll_table_struct *wait);
+ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
+  struct pipe_inode_info *pipe,
+  size_t len, unsigned int flags);
 
 void tls_sk_destruct(struct sock *sk, struct tls_context *ctx);
 void tls_icsk_clean_acked(struct sock *sk);
diff --git a/include/uapi/linux/tls.h b/include/uapi/linux/tls.h
index 293b2cd..c6633e9 100644
--- a/include/uapi/linux/tls.h
+++ b/include/uapi/linux/tls.h
@@ -38,6 +38,7 @@
 
 /* TLS socket options */
 #define TLS_TX 1   /* Set transmit parameters */
+#define TLS_RX 2   /* Set receive parameters */
 
 /* Supported versions */
 #define TLS_VERSION_MINOR(ver) ((ver) & 0xFF)
@@ -59,6 +60,7 @@
 #define TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE    8
 
 #define TLS_SET_RECORD_TYPE    1
+#define TLS_GET_RECORD_TYPE    2
 
 struct tls_crypto_info {
__u16 version;
diff --git a/net/tls/Kconfig b/net/tls/Kconfig
index eb58303..89b8745a 100644
--- a/net/tls/Kconfig
+++ b/net/tls/Kconfig
@@ -7,6 +7,7 @@ config TLS
select CRYPTO
select CRYPTO_AES
select CRYPTO_GCM
+   select STREAM_PARSER
default n
 

[PATCH net-next 3/6] tls: Pass error code explicitly to tls_err_abort

2018-03-20 Thread Dave Watson
Pass EBADMSG explicitly to tls_err_abort.  Receive path will
pass additional codes - EMSGSIZE if framing is larger than max
TLS record size, EINVAL if TLS version mismatch.

Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 include/net/tls.h | 6 +++---
 net/tls/tls_sw.c  | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 019e52d..6b44875 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -174,9 +174,9 @@ static inline bool tls_is_pending_open_record(struct tls_context *tls_ctx)
return tls_ctx->pending_open_record_frags;
 }
 
-static inline void tls_err_abort(struct sock *sk)
+static inline void tls_err_abort(struct sock *sk, int err)
 {
-   sk->sk_err = EBADMSG;
+   sk->sk_err = err;
sk->sk_error_report(sk);
 }
 
@@ -197,7 +197,7 @@ static inline void tls_advance_record_sn(struct sock *sk,
 struct cipher_context *ctx)
 {
if (tls_bigint_increment(ctx->rec_seq, ctx->rec_seq_size))
-   tls_err_abort(sk);
+   tls_err_abort(sk, EBADMSG);
tls_bigint_increment(ctx->iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE,
 ctx->iv_size);
 }
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index dd4441d..6a0a669 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -269,7 +269,7 @@ static int tls_push_record(struct sock *sk, int flags,
/* Only pass through MSG_DONTWAIT and MSG_NOSIGNAL flags */
rc = tls_push_sg(sk, tls_ctx, ctx->sg_encrypted_data, 0, flags);
if (rc < 0 && rc != -EAGAIN)
-   tls_err_abort(sk);
+   tls_err_abort(sk, EBADMSG);
 
tls_advance_record_sn(sk, &tls_ctx->tx);
return rc;
-- 
2.9.5



[PATCH net-next 1/6] tls: Generalize zerocopy_from_iter

2018-03-20 Thread Dave Watson
Refactor zerocopy_from_iter to take arguments for pages and size,
such that it can be used for both tx and rx. RX will also support
zerocopy direct to output iter, as long as the full message can
be copied at once (a large enough userspace buffer was provided).

Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 net/tls/tls_sw.c | 31 +++
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index f26376e..d58f675 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -281,23 +281,24 @@ static int tls_sw_push_pending_record(struct sock *sk, int flags)
 }
 
 static int zerocopy_from_iter(struct sock *sk, struct iov_iter *from,
- int length)
+ int length, int *pages_used,
+ unsigned int *size_used,
+ struct scatterlist *to, int to_max_pages,
+ bool charge)
 {
-   struct tls_context *tls_ctx = tls_get_ctx(sk);
-   struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
struct page *pages[MAX_SKB_FRAGS];
 
size_t offset;
ssize_t copied, use;
int i = 0;
-   unsigned int size = ctx->sg_plaintext_size;
-   int num_elem = ctx->sg_plaintext_num_elem;
+   unsigned int size = *size_used;
+   int num_elem = *pages_used;
int rc = 0;
int maxpages;
 
while (length > 0) {
i = 0;
-   maxpages = ARRAY_SIZE(ctx->sg_plaintext_data) - num_elem;
+   maxpages = to_max_pages - num_elem;
if (maxpages == 0) {
rc = -EFAULT;
goto out;
@@ -317,10 +318,11 @@ static int zerocopy_from_iter(struct sock *sk, struct 
iov_iter *from,
while (copied) {
use = min_t(int, copied, PAGE_SIZE - offset);
 
-   sg_set_page(&ctx->sg_plaintext_data[num_elem],
+   sg_set_page(&to[num_elem],
    pages[i], use, offset);
-   sg_unmark_end(&ctx->sg_plaintext_data[num_elem]);
-   sk_mem_charge(sk, use);
+   sg_unmark_end(&to[num_elem]);
+   if (charge)
+   sk_mem_charge(sk, use);
 
offset = 0;
copied -= use;
@@ -331,8 +333,9 @@ static int zerocopy_from_iter(struct sock *sk, struct 
iov_iter *from,
}
 
 out:
-   ctx->sg_plaintext_size = size;
-   ctx->sg_plaintext_num_elem = num_elem;
+   *size_used = size;
+   *pages_used = num_elem;
+
return rc;
 }
 
@@ -429,7 +432,11 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t size)
 
if (full_record || eor) {
ret = zerocopy_from_iter(sk, &msg->msg_iter,
-try_to_copy);
+   try_to_copy, &ctx->sg_plaintext_num_elem,
+   &ctx->sg_plaintext_size,
+   ctx->sg_plaintext_data,
+   ARRAY_SIZE(ctx->sg_plaintext_data),
+   true);
if (ret)
goto fallback_to_reg_send;
 
-- 
2.9.5
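
The loop being generalized here splits user data into page-sized scatterlist
entries.  A simplified userspace model of that chunking (struct sg_entry is an
invented stand-in for struct scatterlist, with a fixed 4 KiB page), including
the -EFAULT path taken when the destination runs out of slots:

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_ 4096

struct sg_entry { size_t off, len; };   /* stand-in for struct scatterlist */

/* Sketch of the chunking loop in zerocopy_from_iter: split 'length'
 * bytes, starting at 'offset' within the first page, into at most
 * 'max' page-sized entries.  Returns the number of entries used, or
 * -1 when 'max' is too small (the -EFAULT path above). */
static int chunk_pages(size_t offset, size_t length,
                       struct sg_entry *to, int max)
{
    int n = 0;

    while (length > 0) {
        size_t use = PAGE_SIZE_ - offset;

        if (use > length)
            use = length;
        if (n == max)
            return -1;      /* destination scatterlist is full */
        to[n].off = offset;
        to[n].len = use;
        offset = 0;         /* subsequent chunks start page-aligned */
        length -= use;
        n++;
    }
    return n;
}
```

The patch's point is that this loop no longer hard-codes the tx plaintext
scatterlist: the caller passes the destination and its capacity, so rx can
reuse it.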



[PATCH net-next 2/6] tls: Move cipher info to a separate struct

2018-03-20 Thread Dave Watson
Separate tx crypto parameters to a separate cipher_context struct.
The same parameters will be used for rx using the same struct.

tls_advance_record_sn is modified to only take the cipher info.

Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 include/net/tls.h  | 26 +---
 net/tls/tls_main.c |  8 
 net/tls/tls_sw.c   | 58 --
 3 files changed, 49 insertions(+), 43 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 4913430..019e52d 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -81,6 +81,16 @@ enum {
TLS_PENDING_CLOSED_RECORD
 };
 
+struct cipher_context {
+   u16 prepend_size;
+   u16 tag_size;
+   u16 overhead_size;
+   u16 iv_size;
+   char *iv;
+   u16 rec_seq_size;
+   char *rec_seq;
+};
+
 struct tls_context {
union {
struct tls_crypto_info crypto_send;
@@ -91,13 +101,7 @@ struct tls_context {
 
u8 tx_conf:2;
 
-   u16 prepend_size;
-   u16 tag_size;
-   u16 overhead_size;
-   u16 iv_size;
-   char *iv;
-   u16 rec_seq_size;
-   char *rec_seq;
+   struct cipher_context tx;
 
struct scatterlist *partially_sent_record;
u16 partially_sent_offset;
@@ -190,7 +194,7 @@ static inline bool tls_bigint_increment(unsigned char *seq, 
int len)
 }
 
 static inline void tls_advance_record_sn(struct sock *sk,
-struct tls_context *ctx)
+struct cipher_context *ctx)
 {
if (tls_bigint_increment(ctx->rec_seq, ctx->rec_seq_size))
tls_err_abort(sk);
@@ -203,9 +207,9 @@ static inline void tls_fill_prepend(struct tls_context *ctx,
 size_t plaintext_len,
 unsigned char record_type)
 {
-   size_t pkt_len, iv_size = ctx->iv_size;
+   size_t pkt_len, iv_size = ctx->tx.iv_size;
 
-   pkt_len = plaintext_len + iv_size + ctx->tag_size;
+   pkt_len = plaintext_len + iv_size + ctx->tx.tag_size;
 
/* we cover nonce explicit here as well, so buf should be of
 * size KTLS_DTLS_HEADER_SIZE + KTLS_DTLS_NONCE_EXPLICIT_SIZE
@@ -217,7 +221,7 @@ static inline void tls_fill_prepend(struct tls_context *ctx,
buf[3] = pkt_len >> 8;
buf[4] = pkt_len & 0xFF;
memcpy(buf + TLS_NONCE_OFFSET,
-  ctx->iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE, iv_size);
+  ctx->tx.iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE, iv_size);
 }
 
 static inline void tls_make_aad(char *buf,
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index d824d54..c671560 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -259,8 +259,8 @@ static void tls_sk_proto_close(struct sock *sk, long 
timeout)
}
}
 
-   kfree(ctx->rec_seq);
-   kfree(ctx->iv);
+   kfree(ctx->tx.rec_seq);
+   kfree(ctx->tx.iv);
 
if (ctx->tx_conf == TLS_SW_TX)
tls_sw_free_tx_resources(sk);
@@ -319,9 +319,9 @@ static int do_tls_getsockopt_tx(struct sock *sk, char 
__user *optval,
}
lock_sock(sk);
memcpy(crypto_info_aes_gcm_128->iv,
-  ctx->iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE,
+  ctx->tx.iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE,
   TLS_CIPHER_AES_GCM_128_IV_SIZE);
-   memcpy(crypto_info_aes_gcm_128->rec_seq, ctx->rec_seq,
+   memcpy(crypto_info_aes_gcm_128->rec_seq, ctx->tx.rec_seq,
   TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
release_sock(sk);
if (copy_to_user(optval,
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index d58f675..dd4441d 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -79,7 +79,7 @@ static void trim_both_sgl(struct sock *sk, int target_size)
target_size);
 
if (target_size > 0)
-   target_size += tls_ctx->overhead_size;
+   target_size += tls_ctx->tx.overhead_size;
 
trim_sg(sk, ctx->sg_encrypted_data,
&ctx->sg_encrypted_num_elem,
@@ -207,21 +207,21 @@ static int tls_do_encryption(struct tls_context *tls_ctx,
if (!aead_req)
return -ENOMEM;
 
-   ctx->sg_encrypted_data[0].offset += tls_ctx->prepend_size;
-   ctx->sg_encrypted_data[0].length -= tls_ctx->prepend_size;
+   ctx->sg_encrypted_data[0].offset += tls_ctx->tx.prepend_size;
+   ctx->sg_encrypted_data[0].length -= tls_ctx->tx.prepend_size;
 
aead_request_set_tfm(aead_req, ctx->aead_send);
aead_request_set_ad(aead_req, TLS_AAD_SPACE_SIZE);
aead_request_set_crypt(aead_req, ctx->sg_aead_in, ctx->sg_aead_out,
-  data_len, tls_ctx->iv);
+  data_len, tls_ctx->tx.iv);
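
tls_fill_prepend(), reworked in this patch to read from ctx->tx, emits the
5-byte TLS record header followed by the explicit nonce.  A userspace sketch of
that layout for TLS 1.2 with AES-GCM-128 (constants inlined for illustration;
not the kernel code):

```c
#include <assert.h>
#include <string.h>
#include <stddef.h>

#define TLS_HEADER_SIZE 5
#define GCM_IV_SIZE     8   /* explicit nonce on the wire */
#define GCM_TAG_SIZE    16

/* Fill buf with: type | version 3.3 | length (big endian) | nonce.
 * The on-the-wire length covers nonce + ciphertext + tag, matching
 * the pkt_len computation in the hunk above. */
static void fill_prepend(unsigned char *buf, size_t plaintext_len,
                         unsigned char record_type,
                         const unsigned char *nonce)
{
    size_t pkt_len = plaintext_len + GCM_IV_SIZE + GCM_TAG_SIZE;

    buf[0] = record_type;       /* 23 = application data */
    buf[1] = 0x03;              /* TLS 1.2 is version 3.3 */
    buf[2] = 0x03;
    buf[3] = pkt_len >> 8;      /* length, big endian */
    buf[4] = pkt_len & 0xFF;
    memcpy(buf + TLS_HEADER_SIZE, nonce, GCM_IV_SIZE);
}
```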

[PATCH net-next 0/6] TLS Rx

2018-03-20 Thread Dave Watson
TLS tcp socket RX implementation, to match existing TX code.

This patchset completes the software TLS socket, allowing full
bi-directional communication over TLS using normal socket syscalls,
after the handshake has been done in userspace.  Only the symmetric
encryption is done in the kernel.

This allows usage of TLS sockets from within the kernel (for example
with network block device, or from bpf).  Performance can be better
than userspace, with appropriate crypto routines [1].

sk->sk_socket->ops must be overridden to implement splice_read and
poll, but otherwise the interface & implementation match TX closely.
strparser is used to parse TLS framing on receive.

There are Openssl RX patches that work with this interface [2], as
well as a testing tool using the socket interface directly (without
cmsg support) [3].  An example tcp socket setup is:

  // Normal tcp socket connect/accept, and TLS handshake
  // using any TLS library.
  setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));

  struct tls12_crypto_info_aes_gcm_128 crypto_info_rx;
  // Fill in crypto_info based on negotiated keys.

  setsockopt(sock, SOL_TLS, TLS_RX, &crypto_info_rx, sizeof(crypto_info_rx));
  // You can optionally set TLS_TX as well.

  char buffer[16384];
  int ret = recv(sock, buffer, 16384);

  // cmsg can be received using recvmsg and a msg_control 
  // of type TLS_GET_RECORD_TYPE will be set.

RFC -> V1

* Refactor 'tx' variable names to drop tx
* Error return codes changed per discussion
* Only call skb_cow_data based on in-place decryption, 
  drop unnecessary frag list check.

[1] Recent crypto patchset to remove copies, resulting in optimally
zero copies vs. userspace's one, vs. previous kernel's two.  

https://marc.info/?l=linux-crypto-vger&m=151931242406416&w=2

[2] https://github.com/Mellanox/openssl/commits/tls_rx2

[3] https://github.com/ktls/af_ktls-tool/tree/RX

Dave Watson (6):
  tls: Generalize zerocopy_from_iter
  tls: Move cipher info to a separate struct
  tls: Pass error code explicitly to tls_err_abort
  tls: Refactor variable names
  tls: RX path for ktls
  tls: Add receive path documentation

 Documentation/networking/tls.txt |  67 +++-
 include/net/tls.h|  61 ++--
 include/uapi/linux/tls.h |   2 +
 net/tls/Kconfig  |   1 +
 net/tls/tls_main.c   |  92 --
 net/tls/tls_sw.c | 644 ++-
 6 files changed, 741 insertions(+), 126 deletions(-)

-- 
2.9.5
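
The setsockopt argument shown in the cover letter is struct
tls12_crypto_info_aes_gcm_128 from the uapi header.  Its layout, redeclared
here purely for illustration (real code should include <linux/tls.h>), is a
flat 40-byte blob: a 4-byte version/cipher header followed by the 8-byte IV,
16-byte key, 4-byte salt, and 8-byte record sequence:

```c
#include <assert.h>
#include <stdint.h>

/* Local redeclaration of the uapi layout for illustration only. */
struct tls_crypto_info_ {
    uint16_t version;       /* e.g. 0x0303 for TLS 1.2 */
    uint16_t cipher_type;   /* TLS_CIPHER_AES_GCM_128 */
};

struct tls12_crypto_info_aes_gcm_128_ {
    struct tls_crypto_info_ info;
    unsigned char iv[8];        /* explicit nonce */
    unsigned char key[16];
    unsigned char salt[4];      /* implicit nonce */
    unsigned char rec_seq[8];   /* starting record sequence number */
};
```

Userspace fills iv/key/salt/rec_seq from the negotiated session and passes the
struct to setsockopt(SOL_TLS, TLS_TX) and/or setsockopt(SOL_TLS, TLS_RX).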



Re: [PATCH RFC 4/5] tls: RX path for ktls

2018-03-08 Thread Dave Watson
On 03/08/18 09:48 PM, Boris Pismenny wrote:
> Hi Dave,
> 
> On 03/08/18 18:50, Dave Watson wrote:
> > Add rx path for tls software implementation.
> > 
> > recvmsg, splice_read, and poll implemented.
> > 
> > An additional sockopt TLS_RX is added, with the same interface as
> > TLS_TX.  Either TLS_RX or TLS_TX may be provided separately, or
> > together (with two different setsockopt calls with appropriate keys).
> > 
> > Control messages are passed via CMSG in a similar way to transmit.
> > If no cmsg buffer is passed, then only application data records
> > will be passed to userspace, and EIO is returned for other types of
> > alerts.
> > 
> > EBADMSG is passed for decryption errors, and E2BIG is passed for framing
> > errors.  Both are unrecoverable.
> 
> I think E2BIG is for too long argument list. EMSGSIZE might be more
> appropriate.

Sounds good.

> Also, we must check that the record is not too short (cipher specific).
> For TLS1.2 with AES-GCM the minimum length is 8 (IV) + 16 (TAG).
> The correct error for this case is EBADMSG, like a decryption failure.
> 
> Also, how about bad TLS version (e.g. not TLS1.2)?
> A separate error type is required for bad version, because it triggers a
> unique alert in libraries such as OpenSSL.
> I thought of using EINVAL for bad version. What do you think?

Ah, I did not realize there was a separate alert for that, sounds good.

> 
> I wonder if we should provide a more flexible method of obtaining errors for
> the future.
> Maybe use a special CMSG for errors?
> This CMSG will be triggered only after the socket enters the error state.

I'm not opposed to this in principle, but without a concrete use case I am
hesitant to add it.  I don't know of any other error codes that could
be returned besides the ones discussed above.

> > 
> > +
> > +int tls_sw_recvmsg(struct sock *sk,
> > +  struct msghdr *msg,
> > +  size_t len,
> > +  int nonblock,
> > +  int flags,
> > +  int *addr_len)
> > +{
> > +   struct tls_context *tls_ctx = tls_get_ctx(sk);
> > +   struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
> > +   unsigned char control;
> > +   struct strp_msg *rxm;
> > +   struct sk_buff *skb;
> > +   ssize_t copied = 0;
> > +   bool cmsg = false;
> > +   int err = 0;
> > +   long timeo;
> Maybe try to read from the error queue here?

Sure.
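
The framing checks settled in this exchange can be sketched as one validator.
The error values follow the thread's conclusions (EINVAL for a bad version,
EBADMSG for a record shorter than nonce + tag, EMSGSIZE for oversized framing);
the numeric errno values are Linux's, inlined so the sketch is self-contained,
and the size bound is illustrative:

```c
#include <assert.h>
#include <stddef.h>

#define TLS_HEADER_SIZE  5
#define GCM_IV_SIZE      8
#define GCM_TAG_SIZE     16
#define TLS_MAX_PAYLOAD  16384

/* Validate a TLS 1.2 record header; returns the full record length on
 * success or a negative errno (-22 EINVAL, -74 EBADMSG, -90 EMSGSIZE). */
static long check_tls12_header(const unsigned char *hdr)
{
    size_t len = ((size_t)hdr[3] << 8) | hdr[4];

    if (hdr[1] != 0x03 || hdr[2] != 0x03)
        return -22;                         /* bad version -> EINVAL */
    if (len < GCM_IV_SIZE + GCM_TAG_SIZE)
        return -74;                         /* shorter than IV + tag -> EBADMSG */
    if (len > TLS_MAX_PAYLOAD + GCM_IV_SIZE + GCM_TAG_SIZE + 256)
        return -90;                         /* oversized framing -> EMSGSIZE */
    return (long)(TLS_HEADER_SIZE + len);
}
```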



[PATCH RFC 5/5] tls: Add receive path documentation

2018-03-08 Thread Dave Watson
Add documentation on rx path setup and cmsg interface.

Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 Documentation/networking/tls.txt | 59 ++--
 1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/tls.txt b/Documentation/networking/tls.txt
index 77ed006..d341016 100644
--- a/Documentation/networking/tls.txt
+++ b/Documentation/networking/tls.txt
@@ -48,6 +48,9 @@ the transmit and the receive into the kernel.
 
  setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info));
 
+Transmit and receive are set separately, but the setup is the same, using 
either
+TLS_TX or TLS_RX.
+
 Sending TLS application data
 
 
@@ -79,6 +82,21 @@ for memory), or the encryption will always succeed.  If 
send() returns
 -ENOMEM and some data was left on the socket buffer from a previous
 call using MSG_MORE, the MSG_MORE data is left on the socket buffer.
 
+Receiving TLS application data
+--
+
+After setting the TLS_RX socket option, all recv family socket calls
+are decrypted using TLS parameters provided.  A full TLS record must
+be received before decryption can happen.
+
+  char buffer[16384];
+  recv(sock, buffer, 16384);
+
+Received data is decrypted directly in to the user buffer if it is
+large enough, and no additional allocations occur.  If the userspace
+buffer is too small, data is decrypted in the kernel and copied to
+userspace.
+
 Send TLS control messages
 -
 
@@ -118,6 +136,43 @@ using a record of type @record_type.
 Control message data should be provided unencrypted, and will be
 encrypted by the kernel.
 
+Receiving TLS control messages
+--
+
+TLS control messages are passed in the userspace buffer, with message
+type passed via cmsg.  If no cmsg buffer is provided, an error is
+returned if a control message is received.  Data messages may be
+received without a cmsg buffer set.
+
+  char buffer[16384];
+  char cmsg[CMSG_SPACE(sizeof(unsigned char))];
+  struct msghdr msg = {0};
+  msg.msg_control = cmsg;
+  msg.msg_controllen = sizeof(cmsg);
+
+  struct iovec msg_iov;
+  msg_iov.iov_base = buffer;
+  msg_iov.iov_len = 16384;
+
+  msg.msg_iov = &msg_iov;
+  msg.msg_iovlen = 1;
+
+  int ret = recvmsg(sock, &msg, 0 /* flags */);
+
+  struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
+  if (cmsg->cmsg_level == SOL_TLS &&
+  cmsg->cmsg_type == TLS_GET_RECORD_TYPE) {
+  int record_type = *((unsigned char *)CMSG_DATA(cmsg));
+  // Do something with record_type, and control message data in
+  // buffer.
+  //
+  // Note that record_type may be == to application data (23).
+  } else {
+// Buffer contains application data.
+  }
+
+recv will never return data from mixed types of TLS records.
+
 Integrating in to userspace TLS library
 ---
 
@@ -126,10 +181,10 @@ layer of a userspace TLS library.
 
 A patchset to OpenSSL to use ktls as the record layer is here:
 
-https://github.com/Mellanox/tls-openssl
+https://github.com/Mellanox/openssl/commits/tls_rx
 
 An example of calling send directly after a handshake using
 gnutls.  Since it doesn't implement a full record layer, control
 messages are not supported:
 
-https://github.com/Mellanox/tls-af_ktls_tool
+https://github.com/ktls/af_ktls-tool/commits/RX
-- 
2.9.5
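
The record_type byte surfaced through the TLS_GET_RECORD_TYPE cmsg described in
this documentation is the TLS ContentType from the record header.  A small
lookup for the standard values (an illustrative helper; names per RFC 5246):

```c
#include <assert.h>
#include <string.h>

/* TLS ContentType values (RFC 5246) as delivered via the
 * TLS_GET_RECORD_TYPE cmsg described above. */
static const char *tls_record_type_name(unsigned char type)
{
    switch (type) {
    case 20: return "change_cipher_spec";
    case 21: return "alert";
    case 22: return "handshake";
    case 23: return "application_data";
    default: return "unknown";
    }
}
```

As the documentation notes, record_type may be 23 (application data) even when
a cmsg buffer was supplied, so a receiver should branch on the value rather
than assume a cmsg implies a control record.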



[PATCH RFC 1/5] tls: Generalize zerocopy_from_iter

2018-03-08 Thread Dave Watson
Refactor zerocopy_from_iter to take arguments for pages and size,
such that it can be used for both tx and rx. RX will also support
zerocopy direct to output iter, as long as the full message can
be copied at once (a large enough userspace buffer was provided).

Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 net/tls/tls_sw.c | 31 +++
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index f26376e..d58f675 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -281,23 +281,24 @@ static int tls_sw_push_pending_record(struct sock *sk, 
int flags)
 }
 
 static int zerocopy_from_iter(struct sock *sk, struct iov_iter *from,
- int length)
+ int length, int *pages_used,
+ unsigned int *size_used,
+ struct scatterlist *to, int to_max_pages,
+ bool charge)
 {
-   struct tls_context *tls_ctx = tls_get_ctx(sk);
-   struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
struct page *pages[MAX_SKB_FRAGS];
 
size_t offset;
ssize_t copied, use;
int i = 0;
-   unsigned int size = ctx->sg_plaintext_size;
-   int num_elem = ctx->sg_plaintext_num_elem;
+   unsigned int size = *size_used;
+   int num_elem = *pages_used;
int rc = 0;
int maxpages;
 
while (length > 0) {
i = 0;
-   maxpages = ARRAY_SIZE(ctx->sg_plaintext_data) - num_elem;
+   maxpages = to_max_pages - num_elem;
if (maxpages == 0) {
rc = -EFAULT;
goto out;
@@ -317,10 +318,11 @@ static int zerocopy_from_iter(struct sock *sk, struct 
iov_iter *from,
while (copied) {
use = min_t(int, copied, PAGE_SIZE - offset);
 
-   sg_set_page(&ctx->sg_plaintext_data[num_elem],
+   sg_set_page(&to[num_elem],
    pages[i], use, offset);
-   sg_unmark_end(&ctx->sg_plaintext_data[num_elem]);
-   sk_mem_charge(sk, use);
+   sg_unmark_end(&to[num_elem]);
+   if (charge)
+   sk_mem_charge(sk, use);
 
offset = 0;
copied -= use;
@@ -331,8 +333,9 @@ static int zerocopy_from_iter(struct sock *sk, struct 
iov_iter *from,
}
 
 out:
-   ctx->sg_plaintext_size = size;
-   ctx->sg_plaintext_num_elem = num_elem;
+   *size_used = size;
+   *pages_used = num_elem;
+
return rc;
 }
 
@@ -429,7 +432,11 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t size)
 
if (full_record || eor) {
ret = zerocopy_from_iter(sk, &msg->msg_iter,
-try_to_copy);
+   try_to_copy, &ctx->sg_plaintext_num_elem,
+   &ctx->sg_plaintext_size,
+   ctx->sg_plaintext_data,
+   ARRAY_SIZE(ctx->sg_plaintext_data),
+   true);
if (ret)
goto fallback_to_reg_send;
 
-- 
2.9.5



[PATCH RFC 4/5] tls: RX path for ktls

2018-03-08 Thread Dave Watson
Add rx path for tls software implementation.

recvmsg, splice_read, and poll implemented.

An additional sockopt TLS_RX is added, with the same interface as
TLS_TX.  Either TLS_RX or TLS_TX may be provided separately, or
together (with two different setsockopt calls with appropriate keys).

Control messages are passed via CMSG in a similar way to transmit.
If no cmsg buffer is passed, then only application data records
will be passed to userspace, and EIO is returned for other types of
alerts.

EBADMSG is passed for decryption errors, and E2BIG is passed for framing
errors.  Both are unrecoverable.

strparser is used to parse TLS framing.   Decryption is done directly
into userspace buffers if they are large enough to support it, otherwise
skb_cow_data is called (similar to ipsec), and buffers are decrypted in
place and copied.  splice_read always decrypts in place, since no
buffers are provided to decrypt into.

sk_poll is overridden, and only returns POLLIN if a full TLS message is
received.  Otherwise we wait for strparser to finish reading a full frame.
Actual decryption is only done during recvmsg or splice_read calls.

Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 include/net/tls.h|  27 ++-
 include/uapi/linux/tls.h |   2 +
 net/tls/Kconfig  |   1 +
 net/tls/tls_main.c   |  62 -
 net/tls/tls_sw.c | 574 ++-
 5 files changed, 596 insertions(+), 70 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 6b44875..7202026 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -40,6 +40,7 @@
 #include <linux/socket.h>
 #include <linux/tcp.h>
 #include <net/tcp.h>
+#include <net/strparser.h>
 
 #include <uapi/linux/tls.h>
 
@@ -58,8 +59,18 @@
 
 struct tls_sw_context {
struct crypto_aead *aead_send;
+   struct crypto_aead *aead_recv;
struct crypto_wait async_wait;
 
+   /* Receive context */
+   struct strparser strp;
+   void (*saved_data_ready)(struct sock *sk);
+   unsigned int (*sk_poll)(struct file *file, struct socket *sock,
+   struct poll_table_struct *wait);
+   struct sk_buff *recv_pkt;
+   u8 control;
+   bool decrypted;
+
/* Sending context */
char aad_space[TLS_AAD_SPACE_SIZE];
 
@@ -96,12 +107,17 @@ struct tls_context {
struct tls_crypto_info crypto_send;
struct tls12_crypto_info_aes_gcm_128 crypto_send_aes_gcm_128;
};
+   union {
+   struct tls_crypto_info crypto_recv;
+   struct tls12_crypto_info_aes_gcm_128 crypto_recv_aes_gcm_128;
+   };
 
void *priv_ctx;
 
u8 tx_conf:2;
 
struct cipher_context tx;
+   struct cipher_context rx;
 
struct scatterlist *partially_sent_record;
u16 partially_sent_offset;
@@ -128,12 +144,19 @@ int tls_sk_attach(struct sock *sk, int optname, char 
__user *optval,
  unsigned int optlen);
 
 
-int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx);
+int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx);
 int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
 int tls_sw_sendpage(struct sock *sk, struct page *page,
int offset, size_t size, int flags);
 void tls_sw_close(struct sock *sk, long timeout);
-void tls_sw_free_tx_resources(struct sock *sk);
+void tls_sw_free_resources(struct sock *sk);
+int tls_sw_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
+  int nonblock, int flags, int *addr_len);
+unsigned int tls_sw_poll(struct file *file, struct socket *sock,
+struct poll_table_struct *wait);
+ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
+  struct pipe_inode_info *pipe,
+  size_t len, unsigned int flags);
 
 void tls_sk_destruct(struct sock *sk, struct tls_context *ctx);
 void tls_icsk_clean_acked(struct sock *sk);
diff --git a/include/uapi/linux/tls.h b/include/uapi/linux/tls.h
index 293b2cd..c6633e9 100644
--- a/include/uapi/linux/tls.h
+++ b/include/uapi/linux/tls.h
@@ -38,6 +38,7 @@
 
 /* TLS socket options */
 #define TLS_TX 1   /* Set transmit parameters */
+#define TLS_RX 2   /* Set receive parameters */
 
 /* Supported versions */
 #define TLS_VERSION_MINOR(ver) ((ver) & 0xFF)
@@ -59,6 +60,7 @@
 #define TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE 8
 
 #define TLS_SET_RECORD_TYPE 1
+#define TLS_GET_RECORD_TYPE 2
 
 struct tls_crypto_info {
__u16 version;
diff --git a/net/tls/Kconfig b/net/tls/Kconfig
index eb58303..89b8745a 100644
--- a/net/tls/Kconfig
+++ b/net/tls/Kconfig
@@ -7,6 +7,7 @@ config TLS
select CRYPTO
select CRYPTO_AES
select CRYPTO_GCM
+   select STREAM_PARSER
default n
---help---
Enable kernel support for TLS protocol. This allows symmetric
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index c67156

[PATCH RFC 3/5] tls: Pass error code explicitly to tls_err_abort

2018-03-08 Thread Dave Watson
Pass EBADMSG explicitly to tls_err_abort.  Receive path will
pass additional codes - E2BIG if framing is larger than max
TLS record size.

Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 include/net/tls.h | 6 +++---
 net/tls/tls_sw.c  | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 019e52d..6b44875 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -174,9 +174,9 @@ static inline bool tls_is_pending_open_record(struct 
tls_context *tls_ctx)
return tls_ctx->pending_open_record_frags;
 }
 
-static inline void tls_err_abort(struct sock *sk)
+static inline void tls_err_abort(struct sock *sk, int err)
 {
-   sk->sk_err = EBADMSG;
+   sk->sk_err = err;
sk->sk_error_report(sk);
 }
 
@@ -197,7 +197,7 @@ static inline void tls_advance_record_sn(struct sock *sk,
 struct cipher_context *ctx)
 {
if (tls_bigint_increment(ctx->rec_seq, ctx->rec_seq_size))
-   tls_err_abort(sk);
+   tls_err_abort(sk, EBADMSG);
tls_bigint_increment(ctx->iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE,
 ctx->iv_size);
 }
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index dd4441d..6a0a669 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -269,7 +269,7 @@ static int tls_push_record(struct sock *sk, int flags,
/* Only pass through MSG_DONTWAIT and MSG_NOSIGNAL flags */
rc = tls_push_sg(sk, tls_ctx, ctx->sg_encrypted_data, 0, flags);
if (rc < 0 && rc != -EAGAIN)
-   tls_err_abort(sk);
+   tls_err_abort(sk, EBADMSG);
 
tls_advance_record_sn(sk, &tls_ctx->tx);
return rc;
-- 
2.9.5



[PATCH RFC 2/5] tls: Move cipher info to a separate struct

2018-03-08 Thread Dave Watson
Separate tx crypto parameters to a separate cipher_context struct.
The same parameters will be used for rx using the same struct.

tls_advance_record_sn is modified to only take the cipher info.

Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 include/net/tls.h  | 26 +---
 net/tls/tls_main.c |  8 
 net/tls/tls_sw.c   | 58 --
 3 files changed, 49 insertions(+), 43 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 4913430..019e52d 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -81,6 +81,16 @@ enum {
TLS_PENDING_CLOSED_RECORD
 };
 
+struct cipher_context {
+   u16 prepend_size;
+   u16 tag_size;
+   u16 overhead_size;
+   u16 iv_size;
+   char *iv;
+   u16 rec_seq_size;
+   char *rec_seq;
+};
+
 struct tls_context {
union {
struct tls_crypto_info crypto_send;
@@ -91,13 +101,7 @@ struct tls_context {
 
u8 tx_conf:2;
 
-   u16 prepend_size;
-   u16 tag_size;
-   u16 overhead_size;
-   u16 iv_size;
-   char *iv;
-   u16 rec_seq_size;
-   char *rec_seq;
+   struct cipher_context tx;
 
struct scatterlist *partially_sent_record;
u16 partially_sent_offset;
@@ -190,7 +194,7 @@ static inline bool tls_bigint_increment(unsigned char *seq, 
int len)
 }
 
 static inline void tls_advance_record_sn(struct sock *sk,
-struct tls_context *ctx)
+struct cipher_context *ctx)
 {
if (tls_bigint_increment(ctx->rec_seq, ctx->rec_seq_size))
tls_err_abort(sk);
@@ -203,9 +207,9 @@ static inline void tls_fill_prepend(struct tls_context *ctx,
 size_t plaintext_len,
 unsigned char record_type)
 {
-   size_t pkt_len, iv_size = ctx->iv_size;
+   size_t pkt_len, iv_size = ctx->tx.iv_size;
 
-   pkt_len = plaintext_len + iv_size + ctx->tag_size;
+   pkt_len = plaintext_len + iv_size + ctx->tx.tag_size;
 
/* we cover nonce explicit here as well, so buf should be of
 * size KTLS_DTLS_HEADER_SIZE + KTLS_DTLS_NONCE_EXPLICIT_SIZE
@@ -217,7 +221,7 @@ static inline void tls_fill_prepend(struct tls_context *ctx,
buf[3] = pkt_len >> 8;
buf[4] = pkt_len & 0xFF;
memcpy(buf + TLS_NONCE_OFFSET,
-  ctx->iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE, iv_size);
+  ctx->tx.iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE, iv_size);
 }
 
 static inline void tls_make_aad(char *buf,
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index d824d54..c671560 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -259,8 +259,8 @@ static void tls_sk_proto_close(struct sock *sk, long 
timeout)
}
}
 
-   kfree(ctx->rec_seq);
-   kfree(ctx->iv);
+   kfree(ctx->tx.rec_seq);
+   kfree(ctx->tx.iv);
 
if (ctx->tx_conf == TLS_SW_TX)
tls_sw_free_tx_resources(sk);
@@ -319,9 +319,9 @@ static int do_tls_getsockopt_tx(struct sock *sk, char 
__user *optval,
}
lock_sock(sk);
memcpy(crypto_info_aes_gcm_128->iv,
-  ctx->iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE,
+  ctx->tx.iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE,
   TLS_CIPHER_AES_GCM_128_IV_SIZE);
-   memcpy(crypto_info_aes_gcm_128->rec_seq, ctx->rec_seq,
+   memcpy(crypto_info_aes_gcm_128->rec_seq, ctx->tx.rec_seq,
   TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
release_sock(sk);
if (copy_to_user(optval,
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index d58f675..dd4441d 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -79,7 +79,7 @@ static void trim_both_sgl(struct sock *sk, int target_size)
target_size);
 
if (target_size > 0)
-   target_size += tls_ctx->overhead_size;
+   target_size += tls_ctx->tx.overhead_size;
 
trim_sg(sk, ctx->sg_encrypted_data,
&ctx->sg_encrypted_num_elem,
@@ -207,21 +207,21 @@ static int tls_do_encryption(struct tls_context *tls_ctx,
if (!aead_req)
return -ENOMEM;
 
-   ctx->sg_encrypted_data[0].offset += tls_ctx->prepend_size;
-   ctx->sg_encrypted_data[0].length -= tls_ctx->prepend_size;
+   ctx->sg_encrypted_data[0].offset += tls_ctx->tx.prepend_size;
+   ctx->sg_encrypted_data[0].length -= tls_ctx->tx.prepend_size;
 
aead_request_set_tfm(aead_req, ctx->aead_send);
aead_request_set_ad(aead_req, TLS_AAD_SPACE_SIZE);
aead_request_set_crypt(aead_req, ctx->sg_aead_in, ctx->sg_aead_out,
-  data_len, tls_ctx->iv);
+  data_len, tls_ctx->tx.iv);

[PATCH RFC 0/5] TLS Rx

2018-03-08 Thread Dave Watson
TLS tcp socket RX implementation, to match existing TX code.

This patchset completes the software TLS socket, allowing full
bi-directional communication over TLS using normal socket syscalls,
after the handshake has been done in userspace.  Only the symmetric
encryption is done in the kernel.

This allows usage of TLS sockets from within the kernel (for example
with network block device, or from bpf).  Performance can be better
than userspace, with appropriate crypto routines [1].

sk->sk_socket->ops must be overridden to implement splice_read and
poll, but otherwise the interface & implementation match TX closely.
strparser is used to parse TLS framing on receive.

There are Openssl RX patches that work with this interface [2], as
well as a testing tool using the socket interface directly (without
cmsg support) [3].  An example tcp socket setup is:

  // Normal tcp socket connect/accept, and TLS handshake
  // using any TLS library.
  setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));

  struct tls12_crypto_info_aes_gcm_128 crypto_info_rx;
  // Fill in crypto_info based on negotiated keys.

  setsockopt(sock, SOL_TLS, TLS_RX, &crypto_info_rx, sizeof(crypto_info_rx));
  // You can optionally set TLS_TX as well.

  char buffer[16384];
  int ret = recv(sock, buffer, 16384);

  // cmsg can be received using recvmsg and a msg_control 
  // of type TLS_GET_RECORD_TYPE will be set.

[1] Recent crypto patchset to remove copies, resulting in optimally
zero copies vs. userspace's one, vs. previous kernel's two.  

https://marc.info/?l=linux-crypto-vger&m=151931242406416&w=2

[2] https://github.com/Mellanox/openssl/commits/tls_rx

[3] https://github.com/ktls/af_ktls-tool/tree/RX

Dave Watson (5):
  tls: Generalize zerocopy_from_iter
  tls: Move cipher info to a separate struct
  tls: Pass error code explicitly to tls_err_abort
  tls: RX path for ktls
  tls: Add receive path documentation

 Documentation/networking/tls.txt |  59 +++-
 include/net/tls.h|  59 +++-
 include/uapi/linux/tls.h |   2 +
 net/tls/Kconfig  |   1 +
 net/tls/tls_main.c   |  70 -
 net/tls/tls_sw.c | 631 ++-
 6 files changed, 708 insertions(+), 114 deletions(-)

-- 
2.9.5



Re: [Crypto v7 03/12] tls: support for inline tls

2018-02-23 Thread Dave Watson
On 02/23/18 04:58 PM, Atul Gupta wrote:
> > On 02/22/18 11:21 PM, Atul Gupta wrote:
> > > @@ -403,6 +431,15 @@ static int do_tls_setsockopt_tx(struct sock *sk, 
> > > char __user *optval,
> > >   goto err_crypto_info;
> > >   }
> > >  
> > > + rc = tls_offload_dev_absent(sk);
> > > + if (rc == -EINVAL) {
> > > + goto out;
> > > + } else if (rc == -EEXIST) {
> > > + /* Retain HW unhash for cleanup and move to SW Tx */
> > > + sk->sk_prot[TLS_BASE_TX].unhash =
> > > + sk->sk_prot[TLS_FULL_HW].unhash;
> > 
> > I'm still confused by this, it looks like it is modifying the global 
> > tls_prots without taking a lock?  And modifying it for all sockets, not 
> > just this one?  One way to fix might be to always set an unhash in 
> > TLS_BASE_TX, and then have a function pointer unhash in ctx.
> 
> The code enters do_tls_setsockopt_tx only for offload-capable devices that
> do not define the FULL_HW setsockopt (as chtls does); the unhash prot update
> is required to clean up/revert the setup done in tls_hw_hash. This update
> does not impact the SW or other inline HW paths.

I still don't follow.  If it doesn't impact SW, then what is it doing?
According to the comment, we're moving to SW tx, where sk_prot will be
&tls_prots[TLS_SW_TX], and the unhash function you set here in
TLS_BASE_TX won't be called.
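
The concern in this thread is that sk->sk_prot points at a shared table, so a
write through it is visible to every socket.  A minimal model of that aliasing
problem (all names invented; not the kernel structures):

```c
#include <assert.h>

/* Minimal stand-ins for struct proto and struct sock. */
struct proto_ { int (*unhash)(void); };
struct sock_  { struct proto_ *sk_prot; };

static int sw_unhash(void) { return 1; }
static int hw_unhash(void) { return 2; }

/* One global proto shared by all sockets, like the tls_prots table. */
static struct proto_ base_tx = { sw_unhash };

/* The pattern under review: writing through one socket's sk_prot
 * mutates the shared struct, changing behavior for unrelated sockets. */
static int demo_shared_mutation(void)
{
    struct sock_ a = { &base_tx }, b = { &base_tx };

    a.sk_prot->unhash = hw_unhash;     /* meant to affect only 'a'... */
    return b.sk_prot->unhash();        /* ...but 'b' now sees it too */
}
```

The fix the review suggests amounts to keeping per-connection state (a saved
function pointer in the ctx) instead of writing through the shared table.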



Re: [Crypto v7 03/12] tls: support for inline tls

2018-02-23 Thread Dave Watson
On 02/22/18 11:21 PM, Atul Gupta wrote:
> @@ -403,6 +431,15 @@ static int do_tls_setsockopt_tx(struct sock *sk, char 
> __user *optval,
>   goto err_crypto_info;
>   }
>  
> + rc = tls_offload_dev_absent(sk);
> + if (rc == -EINVAL) {
> + goto out;
> + } else if (rc == -EEXIST) {
> + /* Retain HW unhash for cleanup and move to SW Tx */
> + sk->sk_prot[TLS_BASE_TX].unhash =
> + sk->sk_prot[TLS_FULL_HW].unhash;

I'm still confused by this, it looks like it is modifying the global
tls_prots without taking a lock?  And modifying it for all sockets,
not just this one?  One way to fix might be to always set an unhash in
TLS_BASE_TX, and then have a function pointer unhash in ctx.

> +static void tls_hw_unhash(struct sock *sk)
> +{
> + struct tls_device *dev;
> +
> + mutex_lock(&device_mutex);
> + list_for_each_entry(dev, &device_list, dev_list) {
> + if (dev->unhash)
> + dev->unhash(dev, sk);
> + }
> + mutex_unlock(&device_mutex);
> + sk->sk_prot->unhash(sk);

I would have thought unhash() here was tls_hw_unhash, doesn't the
original callback need to be saved like the other ones
(set/getsockopt, etc) in tls_init?  Similar for hash().

It looks like in patch 11 you directly call tcp_prot.hash/unhash, so
it doesn't have this issue.
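
Saving the original callback before overriding it, as suggested here for
hash/unhash, is the same pattern tls_init uses for the setsockopt callbacks.
A sketch of the save-and-chain idea (names invented; illustrative only):

```c
#include <assert.h>

struct proto_ { int (*unhash)(void); };

/* Saved original callback, kept in per-connection context in the real
 * code rather than a global, so the override can chain to it. */
static int (*saved_unhash)(void);
static int calls;

static int orig_unhash(void) { return ++calls; }

static int tls_hw_unhash_(void)
{
    /* ... device-specific cleanup would go here ... */
    return saved_unhash();     /* chain to the saved original */
}
```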


Re: [Crypto v5 03/12] support for inline tls

2018-02-15 Thread Dave Watson
On 02/15/18 04:10 PM, Atul Gupta wrote:
> > -Original Message-
> > From: Dave Watson [mailto:davejwat...@fb.com] 
> > Sent: Thursday, February 15, 2018 9:22 PM
> > To: Atul Gupta <atul.gu...@chelsio.com>
> > Cc: da...@davemloft.net; herb...@gondor.apana.org.au; s...@queasysnail.net; 
> > linux-cry...@vger.kernel.org; netdev@vger.kernel.org; Ganesh GR 
> > <ganes...@chelsio.com>
> > Subject: Re: [Crypto v5 03/12] support for inline tls
> > 
> > On 02/15/18 12:24 PM, Atul Gupta wrote:
> > > @@ -401,6 +430,15 @@ static int do_tls_setsockopt_tx(struct sock *sk, 
> > > char __user *optval,
> > >   goto out;
> > >   }
> > >  
> > > + rc = get_tls_offload_dev(sk);
> > > + if (rc) {
> > > + goto out;
> > > + } else {
> > > + /* Retain HW unhash for cleanup and move to SW Tx */
> > > + sk->sk_prot[TLS_BASE_TX].unhash =
> > > + sk->sk_prot[TLS_FULL_HW].unhash;
> > 
> > Isn't sk->sk_prot a pointer to a global shared struct here still?  It looks 
> > like this would actually modify the global struct, and not just for this sk.
> Yes, its global. It require add on check to modify only when tls_offload dev 
> list has an entry. I will revisit and correct. 
> 
> Can you look through other changes please?

I looked through 1,2,3,11 (the tls-related ones) and don't have any
other code comments.  Patch 11 commit message still mentions ULP,
could use updating / clarification.



Re: [Crypto v5 03/12] support for inline tls

2018-02-15 Thread Dave Watson
On 02/15/18 12:24 PM, Atul Gupta wrote:
> @@ -401,6 +430,15 @@ static int do_tls_setsockopt_tx(struct sock *sk, char 
> __user *optval,
>   goto out;
>   }
>  
> + rc = get_tls_offload_dev(sk);
> + if (rc) {
> + goto out;
> + } else {
> + /* Retain HW unhash for cleanup and move to SW Tx */
> + sk->sk_prot[TLS_BASE_TX].unhash =
> + sk->sk_prot[TLS_FULL_HW].unhash;

Isn't sk->sk_prot a pointer to a global shared struct here still?  It
looks like this would actually modify the global struct, and not just
for this sk.



Re: [PATCHv2] tls: Add support for encryption using async offload accelerator

2018-01-31 Thread Dave Watson
On 01/31/18 05:22 PM, Vakul Garg wrote:
> > > On second thought, in stable we should probably just disable async tfm
> > > allocations.
> > > It's simpler. But this approach is still good for -next
> > >
> > >
> > > Gilad
> > 
> > I agree with Gilad, just disable async for now.
> > 
> 
> How to do it? Can you help with the api name?

*aead = crypto_alloc_aead("gcm(aes)", 0, CRYPTO_ALG_ASYNC);

https://github.com/ktls/net_next_ktls/commit/f3b9b402e755e4b0623fa83f88137173fc249f2d

> > If the flag MSG_DONTWAIT is set, we should be returning -EINPROGRESS and
> > not wait for a response.  I had started working on a patch for that, but 
> > it's
> > pretty tricky to get right.
> 
> Can you point me to your WIP code branch for this?

https://github.com/ktls/net_next_ktls/commit/9cc839aa551ed972d148ecebf353b25ee93543b9

> If MSG_DONTWAIT is not used, will it be sane if enqueue the crypto request to 
> accelerator and return to user space back so that user space can send more 
> plaintext data while 
> crypto accelerator is working in parallel?

Right, that's roughly what the above does.  I believe the tricky
unfinished part was getting poll() to work correctly if there is an
async crypto request outstanding.  Currently the tls poll() just
relies on backpressure from do_tcp_sendpages.


Re: [RFC crypto v3 8/9] chtls: Register the ULP

2018-01-31 Thread Dave Watson
On 01/31/18 04:14 PM, Atul Gupta wrote:
> 
> 
> On Tuesday 30 January 2018 10:41 PM, Dave Watson wrote:
> > On 01/30/18 06:51 AM, Atul Gupta wrote:
> > 
> > > What I was referring is that passing "tls" ulp type in setsockopt
> > > may be insufficient to make the decision when multi HW assist Inline
> > > TLS solution exists.
> > Setting the ULP doesn't choose HW or SW implementation, I think that
> > should be done later when setting up crypto with
> > 
> > setsockopt(SOL_TLS, TLS_TX, struct crypto_info).
> setsockopt [mentioned above] is quite late for driver to enable HW
> implementation, we require something as early as tls_init [setsockopt(sock,
> SOL_TCP, TCP_ULP, "tls", sizeof("tls"))], for driver to set HW prot and
> offload connection beside Inline Tx/Rx.
> > 
> > Any reason we can't use ethtool to choose HW vs SW implementation, if
> > available on the device?
> Thought about it,  the interface index is not available to fetch netdev and
> caps check to set HW prot eg. bind [prot.hash] --> tls_hash to program HW.

Perhaps this is the part I don't follow - why do you need to override
hash and check for LISTEN?  I briefly looked through the patch named
"CPL handler definition", this looks like it is a full TCP offload?


Re: [PATCHv2] tls: Add support for encryption using async offload accelerator

2018-01-31 Thread Dave Watson
On 01/31/18 09:34 PM, Vakul Garg wrote:
> Async crypto accelerators (e.g. drivers/crypto/caam) support offloading
> GCM operation. If they are enabled, crypto_aead_encrypt() return error
> code -EINPROGRESS. In this case tls_do_encryption() needs to wait on a
> completion till the time the response for crypto offload request is
> received.

Comments from V1
> On Wed, Jan 31, 2018 at 8:10 AM, Gilad Ben-Yossef  wrote:
>> Hi Vakul,
>>
>> On Wed, Jan 31, 2018 at 12:36 PM, Vakul Garg  wrote:
>>> Async crypto accelerators (e.g. drivers/crypto/caam) support offloading
>>> GCM operation. If they are enabled, crypto_aead_encrypt() return error
>>> code -EINPROGRESS. In this case tls_do_encryption() needs to wait on a
>>> completion till the time the response for crypto offload request is
>>> received.
>>>
>>
>> Thank you for this patch. I think it is actually a bug fix and should
>> probably go into stable
>
> On second thought, in stable we should probably just disable async tfm
> allocations.
> It's simpler. But this approach is still good for -next
>
>
> Gilad

I agree with Gilad, just disable async for now. 

If the flag MSG_DONTWAIT is set, we should be returning -EINPROGRESS
and not wait for a response.  I had started working on a patch for
that, but it's pretty tricky to get right.


Re: [RFC crypto v3 8/9] chtls: Register the ULP

2018-01-30 Thread Dave Watson
On 01/30/18 06:51 AM, Atul Gupta wrote:

> What I was referring is that passing "tls" ulp type in setsockopt
> may be insufficient to make the decision when multi HW assist Inline
> TLS solution exists.

Setting the ULP doesn't choose HW or SW implementation, I think that
should be done later when setting up crypto with 

setsockopt(SOL_TLS, TLS_TX, struct crypto_info).

Any reason we can't use ethtool to choose HW vs SW implementation, if
available on the device?

> Some HW may go beyond defining sendmsg/sendpage of the prot and
> require additional info to setup the env? Also, we need to keep
> vendor specific code out of tls_main.c i.e anything other than
> base/sw_tx prot perhaps go to hw driver.

Sure, but I think we can add hooks to tls_main to do this without a
new ULP.


Re: [bpf PATCH 1/3] net: add a UID to use for ULP socket assignment

2018-01-26 Thread Dave Watson
On 01/25/18 04:27 PM, John Fastabend wrote:
> I did not however retest TLS with the small change to ULP layer.
> Mostly because I don't have a TLS setup. I plan/hope to get around
> to writing either a sample or preferably a selftest for this case
> as well (assuming I didn't miss one).

> @Dave Watson can you take a quick look to verify the changes are
> good on TLS ULP side.

Looks reasonable, and passes my test suite.  One comment below

Tested-by: Dave Watson <davejwat...@fb.com>

> Signed-off-by: John Fastabend <john.fastab...@gmail.com>
> ---
>  include/net/tcp.h  |6 ++
>  net/ipv4/tcp_ulp.c |   53 +++-
>  net/tls/tls_main.c |2 ++
>  3 files changed, 56 insertions(+), 5 deletions(-)
> 
> diff --git a/net/ipv4/tcp_ulp.c b/net/ipv4/tcp_ulp.c
> index 6bb9e14..8ef437d 100644
> --- a/net/ipv4/tcp_ulp.c
> +++ b/net/ipv4/tcp_ulp.c
> @@ -133,3 +157,22 @@ int tcp_set_ulp(struct sock *sk, const char *name)
>   icsk->icsk_ulp_ops = ulp_ops;
>   return 0;
>  }
> +
> +int tcp_set_ulp_id(struct sock *sk, int ulp)
> +{
> + struct inet_connection_sock *icsk = inet_csk(sk);
> + const struct tcp_ulp_ops *ulp_ops;
> + int err;
> +
> + if (icsk->icsk_ulp_ops)
> + return -EEXIST;
> +
> + ulp_ops = __tcp_ulp_lookup(ulp);
> + if (!ulp_ops)
> + return -ENOENT;
> +
> + err = ulp_ops->init(sk);
> + if (!err)
> + icsk->icsk_ulp_ops = ulp_ops;

Does this need module_put on error, similar to tcp_set_ulp?

> + return err;
> +}


Re: [RFC crypto v3 8/9] chtls: Register the ULP

2018-01-25 Thread Dave Watson

On 12/20/17 05:08 PM, Atul Gupta wrote:
> +static void __init chtls_init_ulp_ops(void)
> +{
> + chtls_base_prot = tcp_prot;
> + chtls_base_prot.hash= chtls_hash;
> + chtls_base_prot.unhash  = chtls_unhash;
> + chtls_base_prot.close   = chtls_lsk_close;
> +
> + chtls_cpl_prot  = chtls_base_prot;
> + chtls_init_rsk_ops(_cpl_prot, _rsk_ops,
> +_prot, PF_INET);
> + chtls_cpl_prot.close= chtls_close;
> + chtls_cpl_prot.disconnect   = chtls_disconnect;
> + chtls_cpl_prot.destroy  = chtls_destroy_sock;
> + chtls_cpl_prot.shutdown = chtls_shutdown;
> + chtls_cpl_prot.sendmsg  = chtls_sendmsg;
> + chtls_cpl_prot.recvmsg  = chtls_recvmsg;
> + chtls_cpl_prot.sendpage = chtls_sendpage;
> + chtls_cpl_prot.setsockopt   = chtls_setsockopt;
> + chtls_cpl_prot.getsockopt   = chtls_getsockopt;
> +}

Much of this file should go in tls_main.c, reusing as much as
possible. For example it doesn't look like the get/set sockopts have
changed at all for chtls.

> +
> +static int __init chtls_register(void)
> +{
> + chtls_init_ulp_ops();
> + register_listen_notifier(_notifier);
> + cxgb4_register_uld(CXGB4_ULD_TLS, _uld_info);
> + tcp_register_ulp(_chtls_ulp_ops);
> + return 0;
> +}
> +
> +static void __exit chtls_unregister(void)
> +{
> + unregister_listen_notifier(_notifier);
> + tcp_unregister_ulp(_chtls_ulp_ops);
> + chtls_free_all_uld();
> + cxgb4_unregister_uld(CXGB4_ULD_TLS);
> +}

The idea with ULP is that there is one ULP hook per protocol, 
not per driver.  


[PATCH net] tls: Correct length of scatterlist in tls_sw_sendpage

2018-01-19 Thread Dave Watson
The scatterlist is reused by both sendmsg and sendfile.
If a sendmsg of smaller number of pages is followed by a sendfile
of larger number of pages, the scatterlist may be too short, resulting
in a crash in gcm_encrypt.

Add sg_unmark_end to make the list the correct length.

tls_sw_sendmsg already calls sg_unmark_end correctly when it allocates
memory in alloc_sg, or in zerocopy_from_iter.

Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 net/tls/tls_sw.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 61f394d..0a9b72f 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -577,6 +577,8 @@ int tls_sw_sendpage(struct sock *sk, struct page *page,
get_page(page);
sg = ctx->sg_plaintext_data + ctx->sg_plaintext_num_elem;
sg_set_page(sg, page, copy, offset);
+   sg_unmark_end(sg);
+
ctx->sg_plaintext_num_elem++;
 
sk_mem_charge(sk, copy);
-- 
2.9.5



Re: BUG_ON(sg->sg_magic != SG_MAGIC) on tls socket.

2017-08-11 Thread Dave Watson
Hi Dave,

On 08/11/17 02:52 PM, Dave Jones wrote:
> kernel BUG at ./include/linux/scatterlist.h:189!
> invalid opcode:  [#1] SMP KASAN

...

> Call Trace:
>  ? copy_page_to_iter+0x6c0/0x6c0
>  tls_sw_sendmsg+0x6d8/0x9c0
>  ? alloc_sg+0x510/0x510
>  ? cyc2ns_read_end+0x10/0x10
>  ? import_iovec+0xa8/0x1f0
>  ? do_syscall_64+0x1bc/0x3e0
>  ? entry_SYSCALL64_slow_path+0x25/0x25
>  inet_sendmsg+0xce/0x310

...

> 186 static inline void sg_mark_end(struct scatterlist *sg)
> 187 {
> 188 #ifdef CONFIG_DEBUG_SG
> 189 BUG_ON(sg->sg_magic != SG_MAGIC);
> 190 #endif

+added mellanox folks

I haven't seen this one yet.  The specific sequence of
sendmsg/sendpage calls would probably be super helpful to nail it down

Thanks


Re: [PATCH v2 net-next 3/4] tcp: Adjust TCP ULP to defer to sockets ULP

2017-08-02 Thread Dave Watson
On 08/01/17 08:18 PM, Tom Herbert wrote:
>  
> -static int tls_init(struct sock *sk)
> +static int tls_init(struct sock *sk, char __user *optval, int len)
>  {
> - struct inet_connection_sock *icsk = inet_csk(sk);
>   struct tls_context *ctx;
>   int rc = 0;
>  
> @@ -450,7 +449,7 @@ static int tls_init(struct sock *sk)
>   rc = -ENOMEM;
>   goto out;
>   }
> - icsk->icsk_ulp_data = ctx;
> + sk->sk_ulp_data = ctx;
>   ctx->setsockopt = sk->sk_prot->setsockopt;
>   ctx->getsockopt = sk->sk_prot->getsockopt;
> + sk->sk_prot = &tls_base_prot;
> @@ -458,7 +457,7 @@ static int tls_init(struct sock *sk)
>   return rc;
>  }

It looks like tls_init should be checking if this is a tcp socket now
also, and failing if not


Re: [PATCH v2 net-next 3/4] tcp: Adjust TCP ULP to defer to sockets ULP

2017-08-02 Thread Dave Watson
On 08/01/17 08:18 PM, Tom Herbert wrote:
> Fix TCP and TLS to use the newer ULP infrastructure in sockets.
> 
> Signed-off-by: Tom Herbert <t...@quantonium.net>
> ---
>  Documentation/networking/tls.txt   |   6 +-
>  include/net/inet_connection_sock.h |   4 --
>  include/net/tcp.h  |  25 ---
>  include/net/tls.h  |   4 +-
>  net/ipv4/sysctl_net_ipv4.c |   9 ++-
>  net/ipv4/tcp.c |  40 ++-
>  net/ipv4/tcp_ipv4.c|   2 -
>  net/ipv4/tcp_ulp.c | 135 -
>  net/tls/Kconfig|   1 +
>  net/tls/tls_main.c |  21 +++---
>  10 files changed, 47 insertions(+), 200 deletions(-)
>  delete mode 100644 net/ipv4/tcp_ulp.c

It looks like net/ipv4/Makefile needs updating as well.

I ran this through some ktls tests with no issues.  Thanks

Tested-by: Dave Watson <davejwat...@fb.com>


Re: [PATCH v3 net-next 1/4] tcp: ULP infrastructure

2017-07-31 Thread Dave Watson
On 07/29/17 01:12 PM, Tom Herbert wrote:
> On Wed, Jun 14, 2017 at 11:37 AM, Dave Watson <davejwat...@fb.com> wrote:
> > Add the infrastructure for attaching Upper Layer Protocols (ULPs) over TCP
> > sockets. Based on a similar infrastructure in tcp_cong.  The idea is that 
> > any
> > ULP can add its own logic by changing the TCP proto_ops structure to its own
> > methods.
> >
> > Example usage:
> >
> > setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));
> >
> One question: is there a good reason why the ULP infrastructure should
> just be for TCP sockets. For example, I'd really like to be able
> something like:
> 
> setsockopt(sock, SOL_SOCKET, SO_ULP, _param, sizeof(ulp_param));
> 
> Where ulp_param is a structure containing the ULP name as well as some
> ULP specific parameters that are passed to init_ulp. ulp_init could
> determine whether the socket family is appropriate for the ULP being
> requested.

Using SOL_SOCKET instead seems reasonable to me.  I can see how
ulp_params could have some use, perhaps at a slight loss in clarity.
TLS needs its own setsockopts anyway though, for renegotiate for
example.


Re: Kernel TLS in 4.13-rc1

2017-07-31 Thread Dave Watson
On 07/30/17 11:14 PM, David Oberhollenzer wrote:
> On 07/24/2017 11:10 PM, Dave Watson wrote:
> > On 07/23/17 09:39 PM, David Oberhollenzer wrote:
> >> After fixing the benchmark/test tool that the patch description
> >> linked to (https://github.com/Mellanox/tls-af_ktls_tool) to make
> >> sure that the server and client actually *agree* on AES-128-GCM,
> >> I simply ran the client program with the --verify-sendpage option.
> >>
> >> The handshake and setting up of the sockets appears to work but
> >> the program complains that the sent and received page contents
> >> do not match (sent is 0x12 repeated all over and received looks
> >> pretty random).
> > 
> > The --verify functions depend on the RX path as well, which has not
> > been merged.  Any programs / tests using OpenSSL + patches should work
> > fine.
> > 
> > If you want to use the tool, something like this should work, so that
> > the receive path uses gnutls:
> > 
> > ./server --no-echo
> > 
> > ./client --server-port 12345 --sendfile some_file --server-host localhost
> > 
> 
> Thanks! This appears to work as expected (output from the server matches the
> input from the client and the pcap dumps look fine).
> 
> From briefly browsing through the code of the test tool I was initially under
> the impression that it would generate an error message and terminate if an
> attempt was made at configuring ktls for the RX path.
> 
> Anyway, I already read in the patch description that RX wasn't included yet,
> still requires a few cleanups and would follow at some point.
> 
> Is there currently a "not-so-clean" version of the RX patches floating around
> somewhere that we could take a look at?

I dumped the current state here.  Still plenty rough but at least passes
--verify-transmission for me.

https://github.com/ktls/net_next_ktls/tree/tls_recv_net_next

and config changes to af_ktls-tool

https://github.com/ktls/af_ktls-tool/tree/RX



Re: Kernel TLS in 4.13-rc1

2017-07-24 Thread Dave Watson
On 07/23/17 09:39 PM, David Oberhollenzer wrote:
> After fixing the benchmark/test tool that the patch description
> linked to (https://github.com/Mellanox/tls-af_ktls_tool) to make
> sure that the server and client actually *agree* on AES-128-GCM,
> I simply ran the client program with the --verify-sendpage option.
> 
> The handshake and setting up of the sockets appears to work but
> the program complains that the sent and received page contents
> do not match (sent is 0x12 repeated all over and received looks
> pretty random).

The --verify functions depend on the RX path as well, which has not
been merged.  Any programs / tests using OpenSSL + patches should work
fine.

If you want to use the tool, something like this should work, so that
the receive path uses gnutls:

./server --no-echo

./client --server-port 12345 --sendfile some_file --server-host localhost


Re: [PATCH v3 net-next 3/4] tls: kernel TLS support

2017-07-12 Thread Dave Watson
On 07/12/17 09:20 AM, Steffen Klassert wrote:
> On Tue, Jul 11, 2017 at 11:53:11AM -0700, Dave Watson wrote:
> > On 07/11/17 08:29 AM, Steffen Klassert wrote:
> > > Sorry for replying to old mail...
> > > > +int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx)
> > > > +{
> > > 
> > > ...
> > > 
> > > > +
> > > > +   if (!sw_ctx->aead_send) {
> > > > +   sw_ctx->aead_send = crypto_alloc_aead("gcm(aes)", 0, 0);
> > > > +   if (IS_ERR(sw_ctx->aead_send)) {
> > > > +   rc = PTR_ERR(sw_ctx->aead_send);
> > > > +   sw_ctx->aead_send = NULL;
> > > > +   goto free_rec_seq;
> > > > +   }
> > > > +   }
> > > > +
> > > 
> > > When I look on how you allocate the aead transformation, it seems
> > > that you should either register an asynchronous callback with
> > > aead_request_set_callback(), or request for a synchronous algorithm.
> > > 
> > > Otherwise you will crash on an asynchronous crypto return, no?
> > 
> > The intention is for it to be synchronous, and gather directly from
> > userspace buffers.  It looks like calling
> > crypto_alloc_aead("gcm(aes)", 0, CRYPTO_ALG_ASYNC) is the correct way
> > to request synchronous algorithms only?
> 
> Yes, but then you loose the aes-ni based algorithms because they are
> asynchronous. If you want to have good crypto performance, it is
> better to implement the asynchronous callbacks.

Right, the trick is we want both aesni, and to guarantee that we are
done using the input buffers before sendmsg() returns.  For now I can
set a callback, and wait on a completion.  The initial use case of
userspace openssl integration shouldn't hit the aesni async case
anyway (!irq_fpu_usable())


Re: [PATCH v3 net-next 3/4] tls: kernel TLS support

2017-07-11 Thread Dave Watson
On 07/11/17 08:29 AM, Steffen Klassert wrote:
> Sorry for replying to old mail...
> > +int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx)
> > +{
> 
> ...
> 
> > +
> > +   if (!sw_ctx->aead_send) {
> > +   sw_ctx->aead_send = crypto_alloc_aead("gcm(aes)", 0, 0);
> > +   if (IS_ERR(sw_ctx->aead_send)) {
> > +   rc = PTR_ERR(sw_ctx->aead_send);
> > +   sw_ctx->aead_send = NULL;
> > +   goto free_rec_seq;
> > +   }
> > +   }
> > +
> 
> When I look on how you allocate the aead transformation, it seems
> that you should either register an asynchronous callback with
> aead_request_set_callback(), or request for a synchronous algorithm.
> 
> Otherwise you will crash on an asynchronous crypto return, no?

The intention is for it to be synchronous, and gather directly from
userspace buffers.  It looks like calling
crypto_alloc_aead("gcm(aes)", 0, CRYPTO_ALG_ASYNC) is the correct way
to request synchronous algorithms only?

> Also, it seems that you have your scatterlists on a per crypto
> transformation basis instead of per crypto request. Is this intentional?

We hold the socket lock and only one crypto op can happen at a time,
so we reuse the scatterlists.


Re: [PATCH v3 net-next 0/4] kernel TLS

2017-07-06 Thread Dave Watson
Hi Richard, 

On 07/06/17 04:30 PM, Richard Weinberger wrote:
> Dave,
> 
> On Wed, Jun 14, 2017 at 8:36 PM, Dave Watson <davejwat...@fb.com> wrote:
> >  Documentation/networking/tls.txt   | 135 +++
> >  MAINTAINERS|  10 +
> >  include/linux/socket.h |   1 +
> >  include/net/inet_connection_sock.h |   4 +
> >  include/net/tcp.h  |  27 ++
> >  include/net/tls.h  | 237 
> >  include/uapi/linux/tcp.h   |   1 +
> >  include/uapi/linux/tls.h   |  79 
> >  net/Kconfig|   1 +
> >  net/Makefile   |   1 +
> >  net/ipv4/Makefile  |   2 +-
> >  net/ipv4/sysctl_net_ipv4.c |  25 ++
> >  net/ipv4/tcp.c |  33 +-
> >  net/ipv4/tcp_ipv4.c|   2 +
> >  net/ipv4/tcp_rate.c|   1 +
> >  net/ipv4/tcp_ulp.c | 134 +++
> >  net/tls/Kconfig|  12 +
> >  net/tls/Makefile   |   7 +
> >  net/tls/tls_main.c | 487 +++
> >  net/tls/tls_sw.c   | 772 +
> >  20 files changed, 1968 insertions(+), 3 deletions(-)
> >  create mode 100644 Documentation/networking/tls.txt
> >  create mode 100644 include/net/tls.h
> >  create mode 100644 include/uapi/linux/tls.h
> >  create mode 100644 net/ipv4/tcp_ulp.c
> >  create mode 100644 net/tls/Kconfig
> >  create mode 100644 net/tls/Makefile
> >  create mode 100644 net/tls/tls_main.c
> >  create mode 100644 net/tls/tls_sw.c
> 
> Sorry for the late question. Do I miss something or is this IPv4 only?

The hooks it currently overrides / uses from proto_ops (sendmsg, sendpage,
get/setsockopt, close) are the same for ipv4 & ipv6, so it should work
for both.  Our test suites have been passing in both, at least.


[PATCH net-next] tcp: fix null ptr deref in getsockopt(..., TCP_ULP, ...)

2017-06-26 Thread Dave Watson
If icsk_ulp_ops is unset, it dereferences a null ptr.
Add a null ptr check.

BUG: KASAN: null-ptr-deref in copy_to_user include/linux/uaccess.h:168 [inline]
BUG: KASAN: null-ptr-deref in do_tcp_getsockopt.isra.33+0x24f/0x1e30 
net/ipv4/tcp.c:3057
Read of size 4 at addr 0020 by task syz-executor1/15452

Signed-off-by: Dave Watson <davejwat...@fb.com>
Reported-by: "Levin, Alexander (Sasha Levin)" <alexander.le...@verizon.com>
---
 net/ipv4/tcp.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 058f509..4c88d20 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3062,6 +3062,11 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
if (get_user(len, optlen))
return -EFAULT;
len = min_t(unsigned int, len, TCP_ULP_NAME_MAX);
+   if (!icsk->icsk_ulp_ops) {
+   if (put_user(0, optlen))
+   return -EFAULT;
+   return 0;
+   }
if (put_user(len, optlen))
return -EFAULT;
if (copy_to_user(optval, icsk->icsk_ulp_ops->name, len))
-- 
2.9.3



Re: [PATCH v3 net-next 1/4] tcp: ULP infrastructure

2017-06-26 Thread Dave Watson
On 06/25/17 02:42 AM, Levin, Alexander (Sasha Levin) wrote:
> On Wed, Jun 14, 2017 at 11:37:14AM -0700, Dave Watson wrote:
> >Add the infrustructure for attaching Upper Layer Protocols (ULPs) over TCP
> >sockets. Based on a similar infrastructure in tcp_cong.  The idea is that any
> >ULP can add its own logic by changing the TCP proto_ops structure to its own
> >methods.
> >
> >Example usage:
> >
> >setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));
> >
> >modules will call:
> >tcp_register_ulp(_tls_ulp_ops);
> >
> >to register/unregister their ulp, with an init function and name.
> >
> >A list of registered ulps will be returned by tcp_get_available_ulp, which is
> >hooked up to /proc.  Example:
> >
> >$ cat /proc/sys/net/ipv4/tcp_available_ulp
> >tls
> >
> >There is currently no functionality to remove or chain ULPs, but
> >it should be possible to add these in the future if needed.
> >
> >Signed-off-by: Boris Pismenny <bor...@mellanox.com>
> >Signed-off-by: Dave Watson <davejwat...@fb.com>
> 
> Hey Dave,
> 
> I'm seeing the following while fuzzing, which was bisected to this commit:
> 
> ==
> BUG: KASAN: null-ptr-deref in copy_to_user include/linux/uaccess.h:168 
> [inline]
> BUG: KASAN: null-ptr-deref in do_tcp_getsockopt.isra.33+0x24f/0x1e30 
> net/ipv4/tcp.c:3057
> Read of size 4 at addr 0020 by task syz-executor1/15452

At a glance, this looks like it was fixed already by

https://www.mail-archive.com/netdev@vger.kernel.org/msg175226.html

Can you recheck with that patch, or verify that you already have it?
Thanks.


Re: [PATCH -net] tls: return -EFAULT if copy_to_user() fails

2017-06-23 Thread Dave Watson
On 06/23/17 01:15 PM, Dan Carpenter wrote:
> The copy_to_user() function returns the number of bytes remaining but we
> want to return -EFAULT here.
> 
> Fixes: 3c4d7559159b ("tls: kernel TLS support")
> Signed-off-by: Dan Carpenter <dan.carpen...@oracle.com>

Acked-by: Dave Watson <davejwat...@fb.com>

Yes, -EFAULT seems like the correct choice here, the return from
copy_to_user isn't useful.  Thanks

> 
> diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> index 2ebc328bda96..a03130a47b85 100644
> --- a/net/tls/tls_main.c
> +++ b/net/tls/tls_main.c
> @@ -273,7 +273,8 @@ static int do_tls_getsockopt_tx(struct sock *sk, char 
> __user *optval,
>   }
>  
>   if (len == sizeof(crypto_info)) {
> - rc = copy_to_user(optval, crypto_info, sizeof(*crypto_info));
> + if (copy_to_user(optval, crypto_info, sizeof(*crypto_info)))
> + rc = -EFAULT;
>   goto out;
>   }
>  
> @@ -293,9 +294,10 @@ static int do_tls_getsockopt_tx(struct sock *sk, char 
> __user *optval,
>   memcpy(crypto_info_aes_gcm_128->iv, ctx->iv,
>  TLS_CIPHER_AES_GCM_128_IV_SIZE);
>   release_sock(sk);
> - rc = copy_to_user(optval,
> -   crypto_info_aes_gcm_128,
> -   sizeof(*crypto_info_aes_gcm_128));
> + if (copy_to_user(optval,
> +  crypto_info_aes_gcm_128,
> +  sizeof(*crypto_info_aes_gcm_128)))
> + rc = -EFAULT;
>   break;
>   }
>   default:


[PATCH net-next] tls: update Kconfig

2017-06-17 Thread Dave Watson
Missing crypto deps for some platforms.
Default to n for new module.

config: m68k-amcore_defconfig (attached as .config)
compiler: m68k-linux-gcc (GCC) 4.9.0

make.cross ARCH=m68k
All errors (new ones prefixed by >>):

   net/built-in.o: In function `tls_set_sw_offload':
>> (.text+0x732f8): undefined reference to `crypto_alloc_aead'
   net/built-in.o: In function `tls_set_sw_offload':
>> (.text+0x7333c): undefined reference to `crypto_aead_setkey'
   net/built-in.o: In function `tls_set_sw_offload':
>> (.text+0x73354): undefined reference to `crypto_aead_setauthsize'

Reported-by: kbuild test robot <fengguang...@intel.com>
Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 net/tls/Kconfig | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/tls/Kconfig b/net/tls/Kconfig
index b13541f..eb58303 100644
--- a/net/tls/Kconfig
+++ b/net/tls/Kconfig
@@ -4,9 +4,12 @@
 config TLS
tristate "Transport Layer Security support"
depends on INET
-   default m
+   select CRYPTO
+   select CRYPTO_AES
+   select CRYPTO_GCM
+   default n
---help---
Enable kernel support for TLS protocol. This allows symmetric
encryption handling of the TLS protocol to be done in-kernel.
 
-   If unsure, say M.
+   If unsure, say N.
-- 
2.9.3



Re: [PATCH v3 net-next 3/4] tls: kernel TLS support

2017-06-16 Thread Dave Watson
On 06/16/17 01:58 PM, Stephen Hemminger wrote:
> On Wed, 14 Jun 2017 11:37:39 -0700
> Dave Watson <davejwat...@fb.com> wrote:
> 
> > --- /dev/null
> > +++ b/net/tls/Kconfig
> > @@ -0,0 +1,12 @@
> > +#
> > +# TLS configuration
> > +#
> > +config TLS
> > +   tristate "Transport Layer Security support"
> > +   depends on NET
> > +   default m
> > +   ---help---
> > +   Enable kernel support for TLS protocol. This allows symmetric
> > +   encryption handling of the TLS protocol to be done in-kernel.
> > +
> > +   If unsure, say M.
> 
> I understand that this will be useful to lots of people and most distributions
> will enable it. But the defacto policy in kernel configuration has been that
> new features in kernel default to being disabled.

Sure, will send a patch to switch to default n.


Re: [PATCH v3 net-next 0/4] kernel TLS

2017-06-14 Thread Dave Watson
On 06/14/17 01:54 PM, Tom Herbert wrote:
> On Wed, Jun 14, 2017 at 11:36 AM, Dave Watson <davejwat...@fb.com> wrote:
> > This series adds support for kernel TLS encryption over TCP sockets.
> > A standard TCP socket is converted to a TLS socket using a setsockopt.
> > Only symmetric crypto is done in the kernel, as well as TLS record
> > framing.  The handshake remains in userspace, and the negotiated
> > cipher keys/iv are provided to the TCP socket.
> >
> I don't see support for TLS receive path in the kernel, only the send
> path. Am I missing something?

Correct, this is only TX.  Since it sounds likely some hardware might
only be able to offload TX, we decided to configure TX and RX
separately.  Using the OpenSSL patches, it should be transparent to
users even if only one side is offloaded.

The software RX patches exist but haven't been polished up yet.


Re: [PATCH v3 net-next 0/4] kernel TLS

2017-06-14 Thread Dave Watson
Hi Hannes, 

On 06/14/17 10:15 PM, Hannes Frederic Sowa wrote:
> one question for this patch set:
> 
> What is the reason for not allowing key updates for the TX path? I was
> always loud pointing out the problems with TLSv1.2 renegotiation and
> TLSv1.3 key update alerts. This patch set uses encryption in a
> synchronous way directly in the socket layer and thus wouldn't suffer
> from problems regarding updates of the key. My hunch is that you leave
> this option open so you can later on introduce asynchronous crypto which
> might be used on hardware? It also looks doable in the case of MSG_MORE.
> Otherwise by allowing key updates to the data path I would not see any
> problems with key updates in TLS.

I don't currently have any reasons to not support renegotation, we
just don't currently use it, so I didn't add support for it.  I don't
work on the hardware, but yes it looks like it would have to keep the
old keys around until everything sent using them has been acked.

> Anyway, this patch seems easy and maybe with key updates added later on
> doesn't seem to have any problems pointed out by me so far.

Indeed, it would be easy to flush any unencrypted data, and then
change the keys.


[PATCH v3 net-next 4/4] tls: Documentation

2017-06-14 Thread Dave Watson
Add documentation for the tcp ULP tls interface.

Signed-off-by: Boris Pismenny <bor...@mellanox.com>
Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 Documentation/networking/tls.txt | 135 +++
 1 file changed, 135 insertions(+)
 create mode 100644 Documentation/networking/tls.txt

diff --git a/Documentation/networking/tls.txt b/Documentation/networking/tls.txt
new file mode 100644
index 000..77ed006
--- /dev/null
+++ b/Documentation/networking/tls.txt
@@ -0,0 +1,135 @@
+Overview
+
+
+Transport Layer Security (TLS) is an Upper Layer Protocol (ULP) that runs over
+TCP. TLS provides end-to-end data integrity and confidentiality.
+
+User interface
+==
+
+Creating a TLS connection
+-
+
+First create a new TCP socket and set the TLS ULP.
+
+  sock = socket(AF_INET, SOCK_STREAM, 0);
+  setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));
+
+Setting the TLS ULP allows us to set/get TLS socket options. Currently
+only the symmetric encryption is handled in the kernel.  After the TLS
+handshake is complete, we have all the parameters required to move the
+data-path to the kernel. There is a separate socket option for moving
+the transmit and the receive into the kernel.
+
+  /* From linux/tls.h */
+  struct tls_crypto_info {
+  unsigned short version;
+  unsigned short cipher_type;
+  };
+
+  struct tls12_crypto_info_aes_gcm_128 {
+  struct tls_crypto_info info;
+  unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
+  unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
+  unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE];
+  unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE];
+  };
+
+
+  struct tls12_crypto_info_aes_gcm_128 crypto_info;
+
+  crypto_info.info.version = TLS_1_2_VERSION;
+  crypto_info.info.cipher_type = TLS_CIPHER_AES_GCM_128;
+  memcpy(crypto_info.iv, iv_write, TLS_CIPHER_AES_GCM_128_IV_SIZE);
+  memcpy(crypto_info.rec_seq, seq_number_write,
+   TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
+  memcpy(crypto_info.key, cipher_key_write, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
+  memcpy(crypto_info.salt, implicit_iv_write, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
+
+  setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info));
+
+Sending TLS application data
+
+
+After setting the TLS_TX socket option all application data sent over this
+socket is encrypted using TLS and the parameters provided in the socket option.
+For example, we can send an encrypted hello world record as follows:
+
+  const char *msg = "hello world\n";
+  send(sock, msg, strlen(msg));
+
+send() data is directly encrypted from the userspace buffer provided
+to the encrypted kernel send buffer if possible.
+
+The sendfile system call will send the file's data over TLS records of maximum
+length (2^14).
+
+  file = open(filename, O_RDONLY);
+  fstat(file, &stat);
+  sendfile(sock, file, &offset, stat.st_size);
+
+TLS records are created and sent after each send() call, unless
+MSG_MORE is passed.  MSG_MORE will delay creation of a record until
+MSG_MORE is not passed, or the maximum record size is reached.
+
+The kernel will need to allocate a buffer for the encrypted data.
+This buffer is allocated at the time send() is called, such that
+either the entire send() call will return -ENOMEM (or block waiting
+for memory), or the encryption will always succeed.  If send() returns
+-ENOMEM and some data was left on the socket buffer from a previous
+call using MSG_MORE, the MSG_MORE data is left on the socket buffer.
+
+Send TLS control messages
+-
+
+Other than application data, TLS has control messages such as alert
+messages (record type 21) and handshake messages (record type 22), etc.
+These messages can be sent over the socket by providing the TLS record type
+via a CMSG. For example the following function sends @data of @length bytes
+using a record of type @record_type.
+
+/* send TLS control message using record_type */
+  static int ktls_send_ctrl_message(int sock, unsigned char record_type,
+  void *data, size_t length)
+  {
+struct msghdr msg = {0};
+int cmsg_len = sizeof(record_type);
+struct cmsghdr *cmsg;
+char buf[CMSG_SPACE(cmsg_len)];
+struct iovec msg_iov;   /* Vector of data to send/receive into.  */
+
+msg.msg_control = buf;
+msg.msg_controllen = sizeof(buf);
+cmsg = CMSG_FIRSTHDR(&msg);
+cmsg->cmsg_level = SOL_TLS;
+cmsg->cmsg_type = TLS_SET_RECORD_TYPE;
+cmsg->cmsg_len = CMSG_LEN(cmsg_len);
+*CMSG_DATA(cmsg) = record_type;
+msg.msg_controllen = cmsg->cmsg_len;
+
+msg_iov.iov_base = data;
+msg_iov.iov_len = length;
+msg.msg_iov = &msg_iov;
+msg.msg_iovlen = 1;
+
+return sendmsg(sock, &msg, 0);
+  }

[PATCH v3 net-next 3/4] tls: kernel TLS support

2017-06-14 Thread Dave Watson
Software implementation of transport layer security, implemented using ULP
infrastructure.  tcp proto_ops are replaced with tls equivalents of sendmsg and
sendpage.

Only symmetric crypto is done in the kernel, keys are passed by setsockopt
after the handshake is complete.  All control messages are supported via CMSG
data - the actual symmetric encryption is the same, just the message type needs
to be passed separately.

For user API, please see Documentation patch.

Pieces that can be shared between hw and sw implementation
are in tls_main.c

Signed-off-by: Boris Pismenny <bor...@mellanox.com>
Signed-off-by: Ilya Lesokhin <il...@mellanox.com>
Signed-off-by: Aviad Yehezkel <avia...@mellanox.com>
Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 MAINTAINERS  |  10 +
 include/linux/socket.h   |   1 +
 include/net/tls.h| 237 +++
 include/uapi/linux/tls.h |  79 +
 net/Kconfig  |   1 +
 net/Makefile |   1 +
 net/tls/Kconfig  |  12 +
 net/tls/Makefile |   7 +
 net/tls/tls_main.c   | 487 ++
 net/tls/tls_sw.c | 772 +++
 10 files changed, 1607 insertions(+)
 create mode 100644 include/net/tls.h
 create mode 100644 include/uapi/linux/tls.h
 create mode 100644 net/tls/Kconfig
 create mode 100644 net/tls/Makefile
 create mode 100644 net/tls/tls_main.c
 create mode 100644 net/tls/tls_sw.c

diff --git a/MAINTAINERS b/MAINTAINERS
index f4e682c..710af53 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8979,6 +8979,16 @@ F:   net/ipv6/
 F: include/net/ip*
 F: arch/x86/net/*
 
+NETWORKING [TLS]
+M: Ilya Lesokhin <il...@mellanox.com>
+M: Aviad Yehezkel <avia...@mellanox.com>
+M: Dave Watson <davejwat...@fb.com>
+L: netdev@vger.kernel.org
+S: Maintained
+F: net/tls/*
+F: include/uapi/linux/tls.h
+F: include/net/tls.h
+
 NETWORKING [IPSEC]
 M: Steffen Klassert <steffen.klass...@secunet.com>
 M: Herbert Xu <herb...@gondor.apana.org.au>
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 0820274..8b13db5 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -334,6 +334,7 @@ struct ucred {
 #define SOL_ALG         279
 #define SOL_NFC         280
 #define SOL_KCM         281
+#define SOL_TLS         282
 
 /* IPX options */
 #define IPX_TYPE   1
diff --git a/include/net/tls.h b/include/net/tls.h
new file mode 100644
index 000..b89d397
--- /dev/null
+++ b/include/net/tls.h
@@ -0,0 +1,237 @@
+/*
+ * Copyright (c) 2016-2017, Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2016-2017, Dave Watson <davejwat...@fb.com>. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _TLS_OFFLOAD_H
+#define _TLS_OFFLOAD_H
+
+#include 
+
+#include 
+
+
+/* Maximum data size carried in a TLS record */
+#define TLS_MAX_PAYLOAD_SIZE   ((size_t)1 << 14)
+
+#define TLS_HEADER_SIZE 5
+#define TLS_NONCE_OFFSET   TLS_HEADER_SIZE
+
+#define TLS_CRYPTO_INFO_READY(info) ((info)->cipher_type)
+
+#define TLS_RECORD_TYPE_DATA   0x17
+
+#define TLS_AAD_SPACE_SIZE 13
+
+struct tls_sw_context {
+   struct crypto_aead *aead_send;
+
+   /* Sending context */
+   char aad_space[TLS_AAD_SPACE_SIZE];
+
+   unsigned int sg_plaintext_size;
+   int sg_plaintext_num_elem;
+   struct scatterlist sg_plaintext_data[MAX_SKB_FRAGS];
+
+   unsigned int sg_encrypted_size;

[PATCH v3 net-next 2/4] tcp: export do_tcp_sendpages and tcp_rate_check_app_limited functions

2017-06-14 Thread Dave Watson
Export do_tcp_sendpages and tcp_rate_check_app_limited, since tls will need to
sendpages while the socket is already locked.

tcp_sendpage is exported, but requires the socket lock to not be held already.

Signed-off-by: Aviad Yehezkel <avia...@mellanox.com>
Signed-off-by: Ilya Lesokhin <il...@mellanox.com>
Signed-off-by: Boris Pismenny <bor...@mellanox.com>
Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 include/net/tcp.h   | 2 ++
 net/ipv4/tcp.c  | 5 +++--
 net/ipv4/tcp_rate.c | 1 +
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index b439f46..e17ec28 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -350,6 +350,8 @@ int tcp_v4_tw_remember_stamp(struct inet_timewait_sock *tw);
 int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
 int tcp_sendpage(struct sock *sk, struct page *page, int offset, size_t size,
 int flags);
+ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
+size_t size, int flags);
 void tcp_release_cb(struct sock *sk);
 void tcp_wfree(struct sk_buff *skb);
 void tcp_write_timer_handler(struct sock *sk);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index b06ee30..11e4ee2 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -901,8 +901,8 @@ static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
return mss_now;
 }
 
-static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
-   size_t size, int flags)
+ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
+size_t size, int flags)
 {
struct tcp_sock *tp = tcp_sk(sk);
int mss_now, size_goal;
@@ -1032,6 +1032,7 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
}
return sk_stream_error(sk, flags, err);
 }
+EXPORT_SYMBOL_GPL(do_tcp_sendpages);
 
 int tcp_sendpage(struct sock *sk, struct page *page, int offset,
 size_t size, int flags)
diff --git a/net/ipv4/tcp_rate.c b/net/ipv4/tcp_rate.c
index ad99569..3330a37 100644
--- a/net/ipv4/tcp_rate.c
+++ b/net/ipv4/tcp_rate.c
@@ -185,3 +185,4 @@ void tcp_rate_check_app_limited(struct sock *sk)
tp->app_limited =
(tp->delivered + tcp_packets_in_flight(tp)) ? : 1;
 }
+EXPORT_SYMBOL_GPL(tcp_rate_check_app_limited);
-- 
2.9.3



[PATCH v3 net-next 1/4] tcp: ULP infrastructure

2017-06-14 Thread Dave Watson
Add the infrastructure for attaching Upper Layer Protocols (ULPs) over TCP
sockets. Based on a similar infrastructure in tcp_cong.  The idea is that any
ULP can add its own logic by changing the TCP proto_ops structure to its own
methods.

Example usage:

setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));

modules will call:
tcp_register_ulp(&tcp_tls_ulp_ops);

to register/unregister their ulp, with an init function and name.

A list of registered ulps will be returned by tcp_get_available_ulp, which is
hooked up to /proc.  Example:

$ cat /proc/sys/net/ipv4/tcp_available_ulp
tls

There is currently no functionality to remove or chain ULPs, but
it should be possible to add these in the future if needed.

Signed-off-by: Boris Pismenny <bor...@mellanox.com>
Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 include/net/inet_connection_sock.h |   4 ++
 include/net/tcp.h  |  25 +++
 include/uapi/linux/tcp.h   |   1 +
 net/ipv4/Makefile  |   2 +-
 net/ipv4/sysctl_net_ipv4.c |  25 +++
 net/ipv4/tcp.c |  28 
 net/ipv4/tcp_ipv4.c|   2 +
 net/ipv4/tcp_ulp.c | 134 +
 8 files changed, 220 insertions(+), 1 deletion(-)
 create mode 100644 net/ipv4/tcp_ulp.c

diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index c7a5779..13e4c89 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -75,6 +75,8 @@ struct inet_connection_sock_af_ops {
  * @icsk_pmtu_cookie  Last pmtu seen by socket
  * @icsk_ca_ops   Pluggable congestion control hook
  * @icsk_af_ops   Operations which are AF_INET{4,6} specific
+ * @icsk_ulp_ops  Pluggable ULP control hook
+ * @icsk_ulp_data ULP private data
  * @icsk_ca_state:Congestion control state
  * @icsk_retransmits: Number of unrecovered [RTO] timeouts
  * @icsk_pending: Scheduled timer event
@@ -97,6 +99,8 @@ struct inet_connection_sock {
__u32 icsk_pmtu_cookie;
const struct tcp_congestion_ops *icsk_ca_ops;
const struct inet_connection_sock_af_ops *icsk_af_ops;
+   const struct tcp_ulp_ops  *icsk_ulp_ops;
+   void  *icsk_ulp_data;
unsigned int  (*icsk_sync_mss)(struct sock *sk, u32 pmtu);
__u8  icsk_ca_state:6,
  icsk_ca_setsockopt:1,
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 3ab677d..b439f46 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1991,4 +1991,29 @@ static inline void tcp_listendrop(const struct sock *sk)
 
 enum hrtimer_restart tcp_pace_kick(struct hrtimer *timer);
 
+/*
+ * Interface for adding Upper Level Protocols over TCP
+ */
+
+#define TCP_ULP_NAME_MAX   16
+#define TCP_ULP_MAX 128
+#define TCP_ULP_BUF_MAX (TCP_ULP_NAME_MAX*TCP_ULP_MAX)
+
+struct tcp_ulp_ops {
+   struct list_head list;
+
+   /* initialize ulp */
+   int (*init)(struct sock *sk);
+   /* cleanup ulp */
+   void (*release)(struct sock *sk);
+
+   char name[TCP_ULP_NAME_MAX];
+   struct module   *owner;
+};
+int tcp_register_ulp(struct tcp_ulp_ops *type);
+void tcp_unregister_ulp(struct tcp_ulp_ops *type);
+int tcp_set_ulp(struct sock *sk, const char *name);
+void tcp_get_available_ulp(char *buf, size_t len);
+void tcp_cleanup_ulp(struct sock *sk);
+
 #endif /* _TCP_H */
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 38a2b07..8204dce 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -117,6 +117,7 @@ enum {
 #define TCP_SAVED_SYN  28  /* Get SYN headers recorded for connection */
 #define TCP_REPAIR_WINDOW  29  /* Get/set window parameters */
 #define TCP_FASTOPEN_CONNECT   30  /* Attempt FastOpen with connect */
+#define TCP_ULP 31 /* Attach a ULP to a TCP connection */
 
 struct tcp_repair_opt {
__u32   opt_code;
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index f83de23..afcb435 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -8,7 +8,7 @@ obj-y := route.o inetpeer.o protocol.o \
 inet_timewait_sock.o inet_connection_sock.o \
 tcp.o tcp_input.o tcp_output.o tcp_timer.o tcp_ipv4.o \
 tcp_minisocks.o tcp_cong.o tcp_metrics.o tcp_fastopen.o \
-tcp_rate.o tcp_recovery.o \
+tcp_rate.o tcp_recovery.o tcp_ulp.o \
 tcp_offload.o datagram.o raw.o udp.o udplite.o \
 udp_offload.o arp.o icmp.o devinet.o af_inet.o igmp.o \
 fib_frontend.o fib_semantics.o fib_trie.o fib_notifier.o \
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 7065234a..9bf8097 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sys

[PATCH v3 net-next 0/4] kernel TLS

2017-06-14 Thread Dave Watson
This series adds support for kernel TLS encryption over TCP sockets.
A standard TCP socket is converted to a TLS socket using a setsockopt.
Only symmetric crypto is done in the kernel, as well as TLS record
framing.  The handshake remains in userspace, and the negotiated
cipher keys/iv are provided to the TCP socket.

We implemented support for this API in OpenSSL 1.1.0, the code is
available at https://github.com/Mellanox/tls-openssl/tree/master

It should work with any TLS library with similar modifications,
a test tool using gnutls is here: https://github.com/Mellanox/tls-af_ktls_tool

RFC patch to openssl:
https://mta.openssl.org/pipermail/openssl-dev/2017-June/009384.html

Changes from V2:

* EXPORT_SYMBOL_GPL in patch 1
* Ensure cleanup code always called before sk_stream_kill_queues to
  avoid warnings

Changes from V1:

* EXPORT_SYMBOL GPL in patch 2
* Add link to OpenSSL patch & gnutls example in documentation patch.
* sk_write_pending check was rolled in to wait_for_memory path,
  avoids special case and fixes lock inbalance issue.
* Unify flag handling for sendmsg/sendfile

Changes from RFC V2:

* Generic ULP (upper layer protocol) framework instead of TLS specific
  setsockopts
* Dropped Mellanox hardware patches, will come as separate series.
  Framework will work for both.

RFC V2:

http://www.mail-archive.com/netdev@vger.kernel.org/msg160317.html

Changes from RFC V1:

* Socket based on changing TCP proto_ops instead of crypto framework
* Merged code with Mellanox's hardware tls offload
* Zerocopy sendmsg support added - sendpage/sendfile is no longer
  necessary for zerocopy optimization

RFC V1:

http://www.mail-archive.com/netdev@vger.kernel.org/msg88021.html

* Socket based on crypto userspace API framework, required two
  sockets in userspace, one encrypted, one unencrypted.

Paper: https://netdevconf.org/1.2/papers/ktls.pdf

Aviad Yehezkel (1):
  tcp: export do_tcp_sendpages and tcp_rate_check_app_limited functions

Boris Pismenny (2):
  tcp: ULP infrastructure
  tls: Documentation

Ilya Lesokhin (1):
  tls: kernel TLS support

 Documentation/networking/tls.txt   | 135 +++
 MAINTAINERS|  10 +
 include/linux/socket.h |   1 +
 include/net/inet_connection_sock.h |   4 +
 include/net/tcp.h  |  27 ++
 include/net/tls.h  | 237 
 include/uapi/linux/tcp.h   |   1 +
 include/uapi/linux/tls.h   |  79 
 net/Kconfig|   1 +
 net/Makefile   |   1 +
 net/ipv4/Makefile  |   2 +-
 net/ipv4/sysctl_net_ipv4.c |  25 ++
 net/ipv4/tcp.c |  33 +-
 net/ipv4/tcp_ipv4.c|   2 +
 net/ipv4/tcp_rate.c|   1 +
 net/ipv4/tcp_ulp.c | 134 +++
 net/tls/Kconfig|  12 +
 net/tls/Makefile   |   7 +
 net/tls/tls_main.c | 487 +++
 net/tls/tls_sw.c   | 772 +
 20 files changed, 1968 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/networking/tls.txt
 create mode 100644 include/net/tls.h
 create mode 100644 include/uapi/linux/tls.h
 create mode 100644 net/ipv4/tcp_ulp.c
 create mode 100644 net/tls/Kconfig
 create mode 100644 net/tls/Makefile
 create mode 100644 net/tls/tls_main.c
 create mode 100644 net/tls/tls_sw.c

-- 
2.9.3



[PATCH v2 net-next 4/4] tls: Documentation

2017-06-06 Thread Dave Watson
Add documentation for the tcp ULP tls interface.

Signed-off-by: Boris Pismenny <bor...@mellanox.com>
Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 Documentation/networking/tls.txt | 135 +++
 1 file changed, 135 insertions(+)
 create mode 100644 Documentation/networking/tls.txt

diff --git a/Documentation/networking/tls.txt b/Documentation/networking/tls.txt
new file mode 100644
index 000..77ed006
--- /dev/null
+++ b/Documentation/networking/tls.txt
@@ -0,0 +1,135 @@
+Overview
+
+
+Transport Layer Security (TLS) is an Upper Layer Protocol (ULP) that runs over
+TCP. TLS provides end-to-end data integrity and confidentiality.
+
+User interface
+==
+
+Creating a TLS connection
+-
+
+First create a new TCP socket and set the TLS ULP.
+
+  sock = socket(AF_INET, SOCK_STREAM, 0);
+  setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));
+
+Setting the TLS ULP allows us to set/get TLS socket options. Currently
+only the symmetric encryption is handled in the kernel.  After the TLS
+handshake is complete, we have all the parameters required to move the
+data-path to the kernel. There is a separate socket option for moving
+the transmit and the receive into the kernel.
+
+  /* From linux/tls.h */
+  struct tls_crypto_info {
+  unsigned short version;
+  unsigned short cipher_type;
+  };
+
+  struct tls12_crypto_info_aes_gcm_128 {
+  struct tls_crypto_info info;
+  unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
+  unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
+  unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE];
+  unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE];
+  };
+
+
+  struct tls12_crypto_info_aes_gcm_128 crypto_info;
+
+  crypto_info.info.version = TLS_1_2_VERSION;
+  crypto_info.info.cipher_type = TLS_CIPHER_AES_GCM_128;
+  memcpy(crypto_info.iv, iv_write, TLS_CIPHER_AES_GCM_128_IV_SIZE);
+  memcpy(crypto_info.rec_seq, seq_number_write,
+   TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
+  memcpy(crypto_info.key, cipher_key_write, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
+  memcpy(crypto_info.salt, implicit_iv_write, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
+
+  setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info));
+
+Sending TLS application data
+
+
+After setting the TLS_TX socket option all application data sent over this
+socket is encrypted using TLS and the parameters provided in the socket option.
+For example, we can send an encrypted hello world record as follows:
+
+  const char *msg = "hello world\n";
+  send(sock, msg, strlen(msg), 0);
+
+send() data is directly encrypted from the userspace buffer provided
+to the encrypted kernel send buffer if possible.
+
+The sendfile system call will send the file's data over TLS records of maximum
+length (2^14).
+
+  file = open(filename, O_RDONLY);
+  fstat(file, &stat);
+  sendfile(sock, file, &offset, stat.st_size);
+
+TLS records are created and sent after each send() call, unless
+MSG_MORE is passed.  MSG_MORE will delay creation of a record until
+MSG_MORE is not passed, or the maximum record size is reached.
+
+The kernel will need to allocate a buffer for the encrypted data.
+This buffer is allocated at the time send() is called, such that
+either the entire send() call will return -ENOMEM (or block waiting
+for memory), or the encryption will always succeed.  If send() returns
+-ENOMEM and some data was left on the socket buffer from a previous
+call using MSG_MORE, the MSG_MORE data is left on the socket buffer.
+
+Send TLS control messages
+-
+
+Other than application data, TLS has control messages such as alert
+messages (record type 21) and handshake messages (record type 22), etc.
+These messages can be sent over the socket by providing the TLS record type
+via a CMSG. For example the following function sends @data of @length bytes
+using a record of type @record_type.
+
+/* send TLS control message using record_type */
+  static int ktls_send_ctrl_message(int sock, unsigned char record_type,
+  void *data, size_t length)
+  {
+struct msghdr msg = {0};
+int cmsg_len = sizeof(record_type);
+struct cmsghdr *cmsg;
+char buf[CMSG_SPACE(cmsg_len)];
+struct iovec msg_iov;   /* Vector of data to send/receive into.  */
+
+msg.msg_control = buf;
+msg.msg_controllen = sizeof(buf);
+cmsg = CMSG_FIRSTHDR(&msg);
+cmsg->cmsg_level = SOL_TLS;
+cmsg->cmsg_type = TLS_SET_RECORD_TYPE;
+cmsg->cmsg_len = CMSG_LEN(cmsg_len);
+*CMSG_DATA(cmsg) = record_type;
+msg.msg_controllen = cmsg->cmsg_len;
+
+msg_iov.iov_base = data;
+msg_iov.iov_len = length;
+msg.msg_iov = &msg_iov;
+msg.msg_iovlen = 1;
+
+return sendmsg(sock, &msg, 0);
+  }

[PATCH v2 net-next 3/4] tls: kernel TLS support

2017-06-06 Thread Dave Watson
Software implementation of transport layer security, implemented using ULP
infrastructure.  tcp proto_ops are replaced with tls equivalents of sendmsg and
sendpage.

Only symmetric crypto is done in the kernel, keys are passed by setsockopt
after the handshake is complete.  All control messages are supported via CMSG
data - the actual symmetric encryption is the same, just the message type needs
to be passed separately.

For user API, please see Documentation patch.

Pieces that can be shared between hw and sw implementation
are in tls_main.c

Signed-off-by: Boris Pismenny <bor...@mellanox.com>
Signed-off-by: Ilya Lesokhin <il...@mellanox.com>
Signed-off-by: Aviad Yehezkel <avia...@mellanox.com>
Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 MAINTAINERS  |  10 +
 include/linux/socket.h   |   1 +
 include/net/tls.h| 222 +
 include/uapi/linux/tls.h |  79 +
 net/Kconfig  |   1 +
 net/Makefile |   1 +
 net/tls/Kconfig  |  12 +
 net/tls/Makefile |   7 +
 net/tls/tls_main.c   | 485 +
 net/tls/tls_sw.c | 794 +++
 10 files changed, 1612 insertions(+)
 create mode 100644 include/net/tls.h
 create mode 100644 include/uapi/linux/tls.h
 create mode 100644 net/tls/Kconfig
 create mode 100644 net/tls/Makefile
 create mode 100644 net/tls/tls_main.c
 create mode 100644 net/tls/tls_sw.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 6b7625f..246ddd7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8973,6 +8973,16 @@ F:   net/ipv6/
 F: include/net/ip*
 F: arch/x86/net/*
 
+NETWORKING [TLS]
+M: Ilya Lesokhin <il...@mellanox.com>
+M: Aviad Yehezkel <avia...@mellanox.com>
+M: Dave Watson <davejwat...@fb.com>
+L: netdev@vger.kernel.org
+S: Maintained
+F: net/tls/*
+F: include/uapi/linux/tls.h
+F: include/net/tls.h
+
 NETWORKING [IPSEC]
 M: Steffen Klassert <steffen.klass...@secunet.com>
 M: Herbert Xu <herb...@gondor.apana.org.au>
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 0820274..8b13db5 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -334,6 +334,7 @@ struct ucred {
 #define SOL_ALG         279
 #define SOL_NFC         280
 #define SOL_KCM         281
+#define SOL_TLS         282
 
 /* IPX options */
 #define IPX_TYPE   1
diff --git a/include/net/tls.h b/include/net/tls.h
new file mode 100644
index 000..b20fd2f
--- /dev/null
+++ b/include/net/tls.h
@@ -0,0 +1,222 @@
+/*
+ * Copyright (c) 2016-2017, Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2016-2017, Dave Watson <davejwat...@fb.com>. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _TLS_OFFLOAD_H
+#define _TLS_OFFLOAD_H
+
+#include 
+
+#include 
+
+
+/* Maximum data size carried in a TLS record */
+#define TLS_MAX_PAYLOAD_SIZE   ((size_t)1 << 14)
+
+#define TLS_HEADER_SIZE 5
+#define TLS_NONCE_OFFSET   TLS_HEADER_SIZE
+
+#define TLS_CRYPTO_INFO_READY(info) ((info)->cipher_type)
+
+#define TLS_RECORD_TYPE_DATA   0x17
+
+#define TLS_AAD_SPACE_SIZE 13
+
+struct tls_sw_context {
+   struct crypto_aead *aead_send;
+
+   /* Sending context */
+   char aad_space[TLS_AAD_SPACE_SIZE];
+
+   unsigned int sg_plaintext_size;
+   int sg_plaintext_num_elem;
+   struct scatterlist sg_plaintext_data[MAX_SKB_FRAGS];
+
+   unsigned int sg_encrypted_size;

[PATCH v2 net-next 2/4] tcp: export do_tcp_sendpages and tcp_rate_check_app_limited functions

2017-06-06 Thread Dave Watson
Export do_tcp_sendpages and tcp_rate_check_app_limited, since tls will need to
sendpages while the socket is already locked.

tcp_sendpage is exported, but requires the socket lock to not be held already.

Signed-off-by: Aviad Yehezkel <avia...@mellanox.com>
Signed-off-by: Ilya Lesokhin <il...@mellanox.com>
Signed-off-by: Boris Pismenny <bor...@mellanox.com>
Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 include/net/tcp.h   | 2 ++
 net/ipv4/tcp.c  | 5 +++--
 net/ipv4/tcp_rate.c | 1 +
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index fcc39f8..2b35100 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -353,6 +353,8 @@ int tcp_v4_tw_remember_stamp(struct inet_timewait_sock *tw);
 int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
 int tcp_sendpage(struct sock *sk, struct page *page, int offset, size_t size,
 int flags);
+ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
+size_t size, int flags);
 void tcp_release_cb(struct sock *sk);
 void tcp_wfree(struct sk_buff *skb);
 void tcp_write_timer_handler(struct sock *sk);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0aa72cd..70efada 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -882,8 +882,8 @@ static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
return mss_now;
 }
 
-static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
-   size_t size, int flags)
+ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
+size_t size, int flags)
 {
struct tcp_sock *tp = tcp_sk(sk);
int mss_now, size_goal;
@@ -1013,6 +1013,7 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
}
return sk_stream_error(sk, flags, err);
 }
+EXPORT_SYMBOL_GPL(do_tcp_sendpages);
 
 int tcp_sendpage(struct sock *sk, struct page *page, int offset,
 size_t size, int flags)
diff --git a/net/ipv4/tcp_rate.c b/net/ipv4/tcp_rate.c
index ad99569..3330a37 100644
--- a/net/ipv4/tcp_rate.c
+++ b/net/ipv4/tcp_rate.c
@@ -185,3 +185,4 @@ void tcp_rate_check_app_limited(struct sock *sk)
tp->app_limited =
(tp->delivered + tcp_packets_in_flight(tp)) ? : 1;
 }
+EXPORT_SYMBOL_GPL(tcp_rate_check_app_limited);
-- 
2.9.3



[PATCH v2 net-next 1/4] tcp: ULP infrastructure

2017-06-06 Thread Dave Watson
Add the infrastructure for attaching Upper Layer Protocols (ULPs) over TCP
sockets. Based on a similar infrastructure in tcp_cong.  The idea is that any
ULP can add its own logic by changing the TCP proto_ops structure to its own
methods.

Example usage:

setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));

modules will call:
tcp_register_ulp(&tcp_tls_ulp_ops);

to register/unregister their ulp, with an init function and name.

A list of registered ulps will be returned by tcp_get_available_ulp, which is
hooked up to /proc.  Example:

$ cat /proc/sys/net/ipv4/tcp_available_ulp
tls

There is currently no functionality to remove or chain ULPs, but
it should be possible to add these in the future if needed.

Signed-off-by: Boris Pismenny <bor...@mellanox.com>
Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 include/net/inet_connection_sock.h |   4 ++
 include/net/tcp.h  |  25 +++
 include/uapi/linux/tcp.h   |   1 +
 net/ipv4/Makefile  |   2 +-
 net/ipv4/sysctl_net_ipv4.c |  25 +++
 net/ipv4/tcp.c |  28 
 net/ipv4/tcp_ipv4.c|   2 +
 net/ipv4/tcp_ulp.c | 134 +
 8 files changed, 220 insertions(+), 1 deletion(-)
 create mode 100644 net/ipv4/tcp_ulp.c

diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index c7a5779..13e4c89 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -75,6 +75,8 @@ struct inet_connection_sock_af_ops {
  * @icsk_pmtu_cookie  Last pmtu seen by socket
  * @icsk_ca_ops   Pluggable congestion control hook
  * @icsk_af_ops   Operations which are AF_INET{4,6} specific
+ * @icsk_ulp_ops  Pluggable ULP control hook
+ * @icsk_ulp_data ULP private data
  * @icsk_ca_state:Congestion control state
  * @icsk_retransmits: Number of unrecovered [RTO] timeouts
  * @icsk_pending: Scheduled timer event
@@ -97,6 +99,8 @@ struct inet_connection_sock {
__u32 icsk_pmtu_cookie;
const struct tcp_congestion_ops *icsk_ca_ops;
const struct inet_connection_sock_af_ops *icsk_af_ops;
+   const struct tcp_ulp_ops  *icsk_ulp_ops;
+   void  *icsk_ulp_data;
unsigned int  (*icsk_sync_mss)(struct sock *sk, u32 pmtu);
__u8  icsk_ca_state:6,
  icsk_ca_setsockopt:1,
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 82462db..fcc39f8 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1992,4 +1992,29 @@ static inline void tcp_listendrop(const struct sock *sk)
 
 enum hrtimer_restart tcp_pace_kick(struct hrtimer *timer);
 
+/*
+ * Interface for adding Upper Level Protocols over TCP
+ */
+
+#define TCP_ULP_NAME_MAX   16
+#define TCP_ULP_MAX 128
+#define TCP_ULP_BUF_MAX (TCP_ULP_NAME_MAX*TCP_ULP_MAX)
+
+struct tcp_ulp_ops {
+   struct list_head list;
+
+   /* initialize ulp */
+   int (*init)(struct sock *sk);
+   /* cleanup ulp */
+   void (*release)(struct sock *sk);
+
+   char name[TCP_ULP_NAME_MAX];
+   struct module   *owner;
+};
+int tcp_register_ulp(struct tcp_ulp_ops *type);
+void tcp_unregister_ulp(struct tcp_ulp_ops *type);
+int tcp_set_ulp(struct sock *sk, const char *name);
+void tcp_get_available_ulp(char *buf, size_t len);
+void tcp_cleanup_ulp(struct sock *sk);
+
 #endif /* _TCP_H */
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 38a2b07..8204dce 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -117,6 +117,7 @@ enum {
 #define TCP_SAVED_SYN  28  /* Get SYN headers recorded for connection */
 #define TCP_REPAIR_WINDOW  29  /* Get/set window parameters */
 #define TCP_FASTOPEN_CONNECT   30  /* Attempt FastOpen with connect */
+#define TCP_ULP   31  /* Attach a ULP to a TCP connection */
 
 struct tcp_repair_opt {
__u32   opt_code;
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index f83de23..afcb435 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -8,7 +8,7 @@ obj-y := route.o inetpeer.o protocol.o \
 inet_timewait_sock.o inet_connection_sock.o \
 tcp.o tcp_input.o tcp_output.o tcp_timer.o tcp_ipv4.o \
 tcp_minisocks.o tcp_cong.o tcp_metrics.o tcp_fastopen.o \
-tcp_rate.o tcp_recovery.o \
+tcp_rate.o tcp_recovery.o tcp_ulp.o \
 tcp_offload.o datagram.o raw.o udp.o udplite.o \
 udp_offload.o arp.o icmp.o devinet.o af_inet.o igmp.o \
 fib_frontend.o fib_semantics.o fib_trie.o fib_notifier.o \
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 86957e9..6a40837c 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sys

[PATCH v2 net-next 0/4] kernel TLS

2017-06-06 Thread Dave Watson
This series adds support for kernel TLS encryption over TCP sockets.
A standard TCP socket is converted to a TLS socket using a setsockopt.
Only symmetric crypto is done in the kernel, as well as TLS record
framing.  The handshake remains in userspace, and the negotiated
cipher keys/iv are provided to the TCP socket.

We implemented support for this API in OpenSSL 1.1.0, the code is
available at https://github.com/Mellanox/tls-openssl/tree/master

It should work with any TLS library with similar modifications,
a test tool using gnutls is here: https://github.com/Mellanox/tls-af_ktls_tool

Changes from V1:

* EXPORT_SYMBOL GPL in patch 2
* Add link to OpenSSL patch & gnutls example in documentation patch.
* sk_write_pending check was rolled in to wait_for_memory path,
  avoids special case and fixes lock inbalance issue.
* Unify flag handling for sendmsg/sendfile

Changes from RFC V2:

* Generic ULP (upper layer protocol) framework instead of TLS specific
  setsockopts
* Dropped Mellanox hardware patches, will come as separate series.
  Framework will work for both.

RFC V2:

http://www.mail-archive.com/netdev@vger.kernel.org/msg160317.html

Changes from RFC V1:

* Socket based on changing TCP proto_ops instead of crypto framework
* Merged code with Mellanox's hardware tls offload
* Zerocopy sendmsg support added - sendpage/sendfile is no longer
  necessary for zerocopy optimization

RFC V1:

http://www.mail-archive.com/netdev@vger.kernel.org/msg88021.html

* Socket based on crypto userspace API framework, required two
  sockets in userspace, one encrypted, one unencrypted.

Paper: https://netdevconf.org/1.2/papers/ktls.pdf

Aviad Yehezkel (1):
  tcp: export do_tcp_sendpages and tcp_rate_check_app_limited functions

Boris Pismenny (2):
  tcp: ULP infrastructure
  tls: Documentation

Ilya Lesokhin (1):
  tls: kernel TLS support

 Documentation/networking/tls.txt   | 135 +++
 MAINTAINERS|  10 +
 include/linux/socket.h |   1 +
 include/net/inet_connection_sock.h |   4 +
 include/net/tcp.h  |  27 ++
 include/net/tls.h  | 222 +++
 include/uapi/linux/tcp.h   |   1 +
 include/uapi/linux/tls.h   |  79 
 net/Kconfig|   1 +
 net/Makefile   |   1 +
 net/ipv4/Makefile  |   2 +-
 net/ipv4/sysctl_net_ipv4.c |  25 ++
 net/ipv4/tcp.c |  33 +-
 net/ipv4/tcp_ipv4.c|   2 +
 net/ipv4/tcp_rate.c|   1 +
 net/ipv4/tcp_ulp.c | 134 +++
 net/tls/Kconfig|  12 +
 net/tls/Makefile   |   7 +
 net/tls/tls_main.c | 485 ++
 net/tls/tls_sw.c   | 794 +
 20 files changed, 1973 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/networking/tls.txt
 create mode 100644 include/net/tls.h
 create mode 100644 include/uapi/linux/tls.h
 create mode 100644 net/ipv4/tcp_ulp.c
 create mode 100644 net/tls/Kconfig
 create mode 100644 net/tls/Makefile
 create mode 100644 net/tls/tls_main.c
 create mode 100644 net/tls/tls_sw.c

-- 
2.9.3



[PATCH net-next 4/4] tls: Documentation

2017-05-24 Thread Dave Watson
Add documentation for the tcp ULP tls interface.

Signed-off-by: Boris Pismenny <bor...@mellanox.com>
Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 Documentation/networking/tls.txt | 120 +++
 1 file changed, 120 insertions(+)
 create mode 100644 Documentation/networking/tls.txt

diff --git a/Documentation/networking/tls.txt b/Documentation/networking/tls.txt
new file mode 100644
index 000..7bfb256
--- /dev/null
+++ b/Documentation/networking/tls.txt
@@ -0,0 +1,120 @@
+Overview
+========
+
+Transport Layer Security (TLS) is an Upper Layer Protocol (ULP) that runs over
+TCP. TLS provides end-to-end data integrity and confidentiality.
+
+User interface
+==============
+
+Creating a TLS connection
+-
+
+First create a new TCP socket and set the TLS ULP.
+
+  sock = socket(AF_INET, SOCK_STREAM, 0);
+  setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));
+
+Setting the TLS ULP allows us to set/get TLS socket options. Currently
+only the symmetric encryption is handled in the kernel.  After the TLS
+handshake is complete, we have all the parameters required to move the
+data-path to the kernel. There is a separate socket option for moving
+the transmit and the receive into the kernel.
+
+  /* From linux/tls.h */
+  struct tls_crypto_info {
+  unsigned short version;
+  unsigned short cipher_type;
+  };
+
+  struct tls12_crypto_info_aes_gcm_128 {
+  struct tls_crypto_info info;
+  unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
+  unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
+  unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE];
+  unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE];
+  };
+
+
+  struct tls12_crypto_info_aes_gcm_128 crypto_info;
+
+  crypto_info.info.version = TLS_1_2_VERSION;
+  crypto_info.info.cipher_type = TLS_CIPHER_AES_GCM_128;
+  memcpy(crypto_info.iv, iv_write, TLS_CIPHER_AES_GCM_128_IV_SIZE);
+  memcpy(crypto_info.rec_seq, seq_number_write,
+   TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
+  memcpy(crypto_info.key, cipher_key_write, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
+  memcpy(crypto_info.salt, implicit_iv_write, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
+
+  setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info));
+
+Sending TLS application data
+----------------------------
+
+After setting the TLS_TX socket option all application data sent over this
+socket is encrypted using TLS and the parameters provided in the socket option.
+For example, we can send an encrypted hello world record as follows:
+
+  const char *msg = "hello world\n";
+  send(sock, msg, strlen(msg));
+
+send() data is encrypted directly from the userspace buffer provided
+into the kernel's encrypted send buffer, when possible.
+
+The sendfile system call will send the file's data over TLS records of maximum
+length (2^14).
+
+  file = open(filename, O_RDONLY);
+  fstat(file, &stat);
+  sendfile(sock, file, &offset, stat.st_size);
+
+TLS records are created and sent after each send() call, unless
+MSG_MORE is passed.  MSG_MORE will delay creation of a record until
+MSG_MORE is not passed, or the maximum record size is reached.
+
+The kernel will need to allocate a buffer for the encrypted data.
+This buffer is allocated at the time send() is called, such that
+either the entire send() call will return -ENOMEM (or block waiting
+for memory), or the encryption will always succeed.  If send() returns
+-ENOMEM and some data was left on the socket buffer from a previous
+call using MSG_MORE, the MSG_MORE data is left on the socket buffer.
+
+Send TLS control messages
+-------------------------
+
+Other than application data, TLS has control messages such as alert
+messages (record type 21) and handshake messages (record type 22), etc.
+These messages can be sent over the socket by providing the TLS record type
+via a CMSG. For example the following function sends @data of @length bytes
+using a record of type @record_type.
+
+/* send TLS control message using record_type */
+  static int klts_send_ctrl_message(int sock, unsigned char record_type,
+  void *data, size_t length)
+  {
+struct msghdr msg = {0};
+int cmsg_len = sizeof(record_type);
+struct cmsghdr *cmsg;
+char buf[CMSG_SPACE(cmsg_len)];
+struct iovec msg_iov;   /* Vector of data to send/receive into.  */
+
+msg.msg_control = buf;
+msg.msg_controllen = sizeof(buf);
+cmsg = CMSG_FIRSTHDR(&msg);
+cmsg->cmsg_level = SOL_TLS;
+cmsg->cmsg_type = TLS_SET_RECORD_TYPE;
+cmsg->cmsg_len = CMSG_LEN(cmsg_len);
+*CMSG_DATA(cmsg) = record_type;
+msg.msg_controllen = cmsg->cmsg_len;
+
+msg_iov.iov_base = data;
+msg_iov.iov_len = length;
+msg.msg_iov = &msg_iov;
+msg.msg_iovlen = 1;
+
+return s

[PATCH net-next 3/4] tls: kernel TLS support

2017-05-24 Thread Dave Watson
Software implementation of transport layer security, implemented using ULP
infrastructure.  tcp proto_ops are replaced with tls equivalents of sendmsg and
sendpage.

Only symmetric crypto is done in the kernel, keys are passed by setsockopt
after the handshake is complete.  All control messages are supported via CMSG
data - the actual symmetric encryption is the same, just the message type needs
to be passed separately.

For user API, please see Documentation patch.

Pieces that can be shared between hw and sw implementation
are in tls_main.c

Signed-off-by: Boris Pismenny <bor...@mellanox.com>
Signed-off-by: Ilya Lesokhin <il...@mellanox.com>
Signed-off-by: Aviad Yehezkel <avia...@mellanox.com>
Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 MAINTAINERS  |  10 +
 include/linux/socket.h   |   1 +
 include/net/tls.h| 223 ++
 include/uapi/linux/tls.h |  79 +
 net/Kconfig  |   1 +
 net/Makefile |   1 +
 net/tls/Kconfig  |  12 +
 net/tls/Makefile |   7 +
 net/tls/tls_main.c   | 450 +++
 net/tls/tls_sw.c | 788 +++
 10 files changed, 1572 insertions(+)
 create mode 100644 include/net/tls.h
 create mode 100644 include/uapi/linux/tls.h
 create mode 100644 net/tls/Kconfig
 create mode 100644 net/tls/Makefile
 create mode 100644 net/tls/tls_main.c
 create mode 100644 net/tls/tls_sw.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 9e98464..94bdbe8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8944,6 +8944,16 @@ F:   net/ipv6/
 F: include/net/ip*
 F: arch/x86/net/*
 
+NETWORKING [TLS]
+M: Ilya Lesokhin <il...@mellanox.com>
+M: Aviad Yehezkel <avia...@mellanox.com>
+M: Dave Watson <davejwat...@fb.com>
+L: netdev@vger.kernel.org
+S: Maintained
+F: net/tls/*
+F: include/uapi/linux/tls.h
+F: include/net/tls.h
+
 NETWORKING [IPSEC]
 M: Steffen Klassert <steffen.klass...@secunet.com>
 M: Herbert Xu <herb...@gondor.apana.org.au>
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 0820274..8b13db5 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -334,6 +334,7 @@ struct ucred {
 #define SOL_ALG   279
 #define SOL_NFC   280
 #define SOL_KCM   281
+#define SOL_TLS   282
 
 /* IPX options */
 #define IPX_TYPE   1
diff --git a/include/net/tls.h b/include/net/tls.h
new file mode 100644
index 000..eee6ddf
--- /dev/null
+++ b/include/net/tls.h
@@ -0,0 +1,223 @@
+/*
+ * Copyright (c) 2016-2017, Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2016-2017, Dave Watson <davejwat...@fb.com>. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _TLS_OFFLOAD_H
+#define _TLS_OFFLOAD_H
+
+#include 
+
+#include 
+
+
+/* Maximum data size carried in a TLS record */
+#define TLS_MAX_PAYLOAD_SIZE   ((size_t)1 << 14)
+
+#define TLS_HEADER_SIZE5
+#define TLS_NONCE_OFFSET   TLS_HEADER_SIZE
+
+#define TLS_CRYPTO_INFO_READY(info)   ((info)->cipher_type)
+
+#define TLS_RECORD_TYPE_DATA   0x17
+
+#define TLS_AAD_SPACE_SIZE 13
+
+struct tls_sw_context {
+   struct crypto_aead *aead_send;
+
+   /* Sending context */
+   char aad_space[TLS_AAD_SPACE_SIZE];
+
+   unsigned int sg_plaintext_size;
+   int sg_plaintext_num_elem;
+   struct scatterlist sg_plaintext_data[MAX_SKB_FRAGS];
+
+   unsigned int sg_encrypted_size;

[PATCH net-next 2/4] tcp: export do_tcp_sendpages and tcp_rate_check_app_limited functions

2017-05-24 Thread Dave Watson
Export do_tcp_sendpages and tcp_rate_check_app_limited, since tls will need to
sendpages while the socket is already locked.

tcp_sendpage is exported, but requires the socket lock to not be held already.

Signed-off-by: Aviad Yehezkel <avia...@mellanox.com>
Signed-off-by: Ilya Lesokhin <il...@mellanox.com>
Signed-off-by: Boris Pismenny <bor...@mellanox.com>
Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 include/net/tcp.h   | 2 ++
 net/ipv4/tcp.c  | 5 +++--
 net/ipv4/tcp_rate.c | 1 +
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index fcc39f8..2b35100 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -353,6 +353,8 @@ int tcp_v4_tw_remember_stamp(struct inet_timewait_sock *tw);
 int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
 int tcp_sendpage(struct sock *sk, struct page *page, int offset, size_t size,
 int flags);
+ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
+size_t size, int flags);
 void tcp_release_cb(struct sock *sk);
 void tcp_wfree(struct sk_buff *skb);
 void tcp_write_timer_handler(struct sock *sk);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 9f06faa..08a8ef4 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -882,8 +882,8 @@ static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
return mss_now;
 }
 
-static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
-   size_t size, int flags)
+ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
+size_t size, int flags)
 {
struct tcp_sock *tp = tcp_sk(sk);
int mss_now, size_goal;
@@ -1013,6 +1013,7 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
}
return sk_stream_error(sk, flags, err);
 }
+EXPORT_SYMBOL(do_tcp_sendpages);
 
 int tcp_sendpage(struct sock *sk, struct page *page, int offset,
 size_t size, int flags)
diff --git a/net/ipv4/tcp_rate.c b/net/ipv4/tcp_rate.c
index ad99569..62876e4 100644
--- a/net/ipv4/tcp_rate.c
+++ b/net/ipv4/tcp_rate.c
@@ -185,3 +185,4 @@ void tcp_rate_check_app_limited(struct sock *sk)
tp->app_limited =
(tp->delivered + tcp_packets_in_flight(tp)) ? : 1;
 }
+EXPORT_SYMBOL(tcp_rate_check_app_limited);
-- 
2.9.3



[PATCH net-next 1/4] tcp: ULP infrastructure

2017-05-24 Thread Dave Watson
Add the infrastructure for attaching Upper Layer Protocols (ULPs) over TCP
sockets. Based on a similar infrastructure in tcp_cong.  The idea is that any
ULP can add its own logic by changing the TCP proto_ops structure to its own
methods.

Example usage:

setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));

modules will call:
tcp_register_ulp(_tls_ulp_ops);

to register/unregister their ulp, with an init function and name.

A list of registered ulps will be returned by tcp_get_available_ulp, which is
hooked up to /proc.  Example:

$ cat /proc/sys/net/ipv4/tcp_available_ulp
tls

There is currently no functionality to remove or chain ULPs, but
it should be possible to add these in the future if needed.

Signed-off-by: Boris Pismenny <bor...@mellanox.com>
Signed-off-by: Dave Watson <davejwat...@fb.com>
---
 include/net/inet_connection_sock.h |   4 ++
 include/net/tcp.h  |  25 +++
 include/uapi/linux/tcp.h   |   1 +
 net/ipv4/Makefile  |   2 +-
 net/ipv4/sysctl_net_ipv4.c |  25 +++
 net/ipv4/tcp.c |  28 
 net/ipv4/tcp_ipv4.c|   2 +
 net/ipv4/tcp_ulp.c | 134 +
 8 files changed, 220 insertions(+), 1 deletion(-)
 create mode 100644 net/ipv4/tcp_ulp.c

diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index c7a5779..13e4c89 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -75,6 +75,8 @@ struct inet_connection_sock_af_ops {
  * @icsk_pmtu_cookie  Last pmtu seen by socket
  * @icsk_ca_ops   Pluggable congestion control hook
  * @icsk_af_ops   Operations which are AF_INET{4,6} specific
+ * @icsk_ulp_ops  Pluggable ULP control hook
+ * @icsk_ulp_data ULP private data
  * @icsk_ca_state:Congestion control state
  * @icsk_retransmits: Number of unrecovered [RTO] timeouts
  * @icsk_pending: Scheduled timer event
@@ -97,6 +99,8 @@ struct inet_connection_sock {
__u32 icsk_pmtu_cookie;
const struct tcp_congestion_ops *icsk_ca_ops;
const struct inet_connection_sock_af_ops *icsk_af_ops;
+   const struct tcp_ulp_ops  *icsk_ulp_ops;
+   void  *icsk_ulp_data;
unsigned int  (*icsk_sync_mss)(struct sock *sk, u32 pmtu);
__u8  icsk_ca_state:6,
  icsk_ca_setsockopt:1,
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 82462db..fcc39f8 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1992,4 +1992,29 @@ static inline void tcp_listendrop(const struct sock *sk)
 
 enum hrtimer_restart tcp_pace_kick(struct hrtimer *timer);
 
+/*
+ * Interface for adding Upper Level Protocols over TCP
+ */
+
+#define TCP_ULP_NAME_MAX   16
+#define TCP_ULP_MAX   128
+#define TCP_ULP_BUF_MAX   (TCP_ULP_NAME_MAX*TCP_ULP_MAX)
+
+struct tcp_ulp_ops {
+   struct list_headlist;
+
+   /* initialize ulp */
+   int (*init)(struct sock *sk);
+   /* cleanup ulp */
+   void (*release)(struct sock *sk);
+
+   charname[TCP_ULP_NAME_MAX];
+   struct module   *owner;
+};
+int tcp_register_ulp(struct tcp_ulp_ops *type);
+void tcp_unregister_ulp(struct tcp_ulp_ops *type);
+int tcp_set_ulp(struct sock *sk, const char *name);
+void tcp_get_available_ulp(char *buf, size_t len);
+void tcp_cleanup_ulp(struct sock *sk);
+
 #endif /* _TCP_H */
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 38a2b07..8204dce 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -117,6 +117,7 @@ enum {
 #define TCP_SAVED_SYN  28  /* Get SYN headers recorded for connection */
 #define TCP_REPAIR_WINDOW  29  /* Get/set window parameters */
 #define TCP_FASTOPEN_CONNECT   30  /* Attempt FastOpen with connect */
+#define TCP_ULP   31  /* Attach a ULP to a TCP connection */
 
 struct tcp_repair_opt {
__u32   opt_code;
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index f83de23..afcb435 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -8,7 +8,7 @@ obj-y := route.o inetpeer.o protocol.o \
 inet_timewait_sock.o inet_connection_sock.o \
 tcp.o tcp_input.o tcp_output.o tcp_timer.o tcp_ipv4.o \
 tcp_minisocks.o tcp_cong.o tcp_metrics.o tcp_fastopen.o \
-tcp_rate.o tcp_recovery.o \
+tcp_rate.o tcp_recovery.o tcp_ulp.o \
 tcp_offload.o datagram.o raw.o udp.o udplite.o \
 udp_offload.o arp.o icmp.o devinet.o af_inet.o igmp.o \
 fib_frontend.o fib_semantics.o fib_trie.o fib_notifier.o \
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 86957e9..6a40837c 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sys

[PATCH net-next 0/4] kernel TLS

2017-05-24 Thread Dave Watson
This series adds support for kernel TLS encryption over TCP sockets.
A standard TCP socket is converted to a TLS socket using a setsockopt.
Only symmetric crypto is done in the kernel, as well as TLS record
framing.  The handshake remains in userspace, and the negotiated
cipher keys/iv are provided to the TCP socket.

We implemented support for this API in OpenSSL 1.1.0, the code is
available at https://github.com/Mellanox/tls-openssl/tree/master

It should work with any TLS library with similar modifications,
a test tool using gnutls is here: https://github.com/Mellanox/tls-af_ktls_tool

Changes from RFC V2:

* Generic ULP (upper layer protocol) framework instead of TLS specific
  setsockopts
* Dropped Mellanox hardware patches, will come as separate series.
  Framework will work for both.

RFC V2:

http://www.mail-archive.com/netdev@vger.kernel.org/msg160317.html

Changes from RFC V1:

* Socket based on changing TCP proto_ops instead of crypto framework
* Merged code with Mellanox's hardware tls offload
* Zerocopy sendmsg support added - sendpage/sendfile is no longer
  necessary for zerocopy optimization

RFC V1:

http://www.mail-archive.com/netdev@vger.kernel.org/msg88021.html

* Socket based on crypto userspace API framework, required two
  sockets in userspace, one encrypted, one unencrypted.

Paper: https://netdevconf.org/1.2/papers/ktls.pdf

Aviad Yehezkel (1):
  tcp: export do_tcp_sendpages and tcp_rate_check_app_limited functions

Boris Pismenny (2):
  tcp: ULP infrastructure
  tls: Documentation

Ilya Lesokhin (1):
  tls: kernel TLS support

 Documentation/networking/tls.txt   | 120 ++
 MAINTAINERS|  10 +
 include/linux/socket.h |   1 +
 include/net/inet_connection_sock.h |   4 +
 include/net/tcp.h  |  27 ++
 include/net/tls.h  | 223 +++
 include/uapi/linux/tcp.h   |   1 +
 include/uapi/linux/tls.h   |  79 
 net/Kconfig|   1 +
 net/Makefile   |   1 +
 net/ipv4/Makefile  |   2 +-
 net/ipv4/sysctl_net_ipv4.c |  25 ++
 net/ipv4/tcp.c |  33 +-
 net/ipv4/tcp_ipv4.c|   2 +
 net/ipv4/tcp_rate.c|   1 +
 net/ipv4/tcp_ulp.c | 134 +++
 net/tls/Kconfig|  12 +
 net/tls/Makefile   |   7 +
 net/tls/tls_main.c | 450 +
 net/tls/tls_sw.c   | 788 +
 20 files changed, 1918 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/networking/tls.txt
 create mode 100644 include/net/tls.h
 create mode 100644 include/uapi/linux/tls.h
 create mode 100644 net/ipv4/tcp_ulp.c
 create mode 100644 net/tls/Kconfig
 create mode 100644 net/tls/Makefile
 create mode 100644 net/tls/tls_main.c
 create mode 100644 net/tls/tls_sw.c

-- 
2.9.3


