Re: [PATCH net-next 06/14] net/tls: Add generic NIC offload infrastructure

2018-03-22 Thread Kirill Tkhai
On 22.03.2018 15:38, Boris Pismenny wrote:
> ...

Can't we move this check into tls_dev_event() and use it for all types of events?
Then we avoid duplicate code.

>>>
>>> No. Not all events require this check. Also, the result is different for 
>>> different events.
>>
>> No. You always return NOTIFY_DONE, in case of !(netdev->features & 
>> NETIF_F_HW_TLS_TX).
>> See below:
>>
>> static int tls_check_dev_ops(struct net_device *dev)
>> {
>> 	if (!dev->tlsdev_ops)
>> 		return NOTIFY_BAD;
>>
>> 	return NOTIFY_DONE;
>> }
>>
>> static int tls_device_down(struct net_device *netdev)
>> {
>> 	struct tls_context *ctx, *tmp;
>> 	struct list_head list;
>> 	unsigned long flags;
>>
>> 	...
>> 	return NOTIFY_DONE;
>> }
>>
>> static int tls_dev_event(struct notifier_block *this, unsigned long event,
>> 			 void *ptr)
>> {
>> 	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
>>
>> 	if (!(dev->features & NETIF_F_HW_TLS_TX))
>> 		return NOTIFY_DONE;
>>
>> 	switch (event) {
>> 	case NETDEV_REGISTER:
>> 	case NETDEV_FEAT_CHANGE:
>> 		return tls_check_dev_ops(dev);
>> 	case NETDEV_DOWN:
>> 		return tls_device_down(dev);
>> 	}
>> 	return NOTIFY_DONE;
>> }
>>  
> 
> Sure, will fix in V3.
> 
> +
> +	/* Request a write lock to block new offload attempts
> +	 */
> +	percpu_down_write(&device_offload_lock);

What is the reason percpu_rwsem is chosen here? It looks like this primitive
gives more advantages to readers than plain rwsem does, but it also gives
disadvantages to writers. That would be fine if tls_device_down() were not
called with rtnl_lock() held from the netdevice notifier. But since netdevice
notifiers are called with rtnl_lock() held, percpu_rwsem will increase the
time rtnl_lock() is held.
>>> We use an rwsem to allow multiple (reader) invocations of
>>> tls_set_device_offload, which is triggered by the user (presumably) during
>>> the TLS handshake. This might be considered a fast path.
>>>
>>> However, we must block all calls to tls_set_device_offload while we are 
>>> processing NETDEV_DOWN events (writer).
>>>
>>> As you've mentioned, the percpu rwsem is more efficient for readers, 
>>> especially on NUMA systems, where cache-line bouncing occurs during reader 
>>> acquire and reduces performance.
>>
>> Hm, and who are the readers? It's used from do_tls_setsockopt_tx(), but it
>> doesn't seem to be performance critical. Who else?
>>
> 
> It depends on whether you consider the TLS handshake code as critical.
> The readers are TCP connections processing the CCS message of the TLS 
> handshake. They are providing key material to the kernel to start using 
> Kernel TLS.

The thing is, rtnl_lock() is critical for the rest of the system,
while the TLS handshake is a small subset of the actions the system performs.

rtnl_lock() is used almost everywhere, from netlink messages
to netdev ioctls.

Currently, you can't even close a raw socket without the rtnl lock.
So all of this is a big reason to avoid RCU waits under it.

Kirill


 Can't we use plain rwsem here instead?

>>>
>>> It's a performance tradeoff. I'm not certain that the percpu rwsem write
>>> side acquire is significantly worse than using the global rwsem.
>>>
>>> For now, while all of this is experimental, can we agree to focus on the 
>>> performance of readers? We can change it later if it becomes a problem.
>>
>> Same as above.
>>   
> 
> Replaced with rwsem from V2.


Re: [PATCH net-next 06/14] net/tls: Add generic NIC offload infrastructure

2018-03-22 Thread Boris Pismenny

...


Can't we move this check into tls_dev_event() and use it for all types of events?
Then we avoid duplicate code.



No. Not all events require this check. Also, the result is different for 
different events.


No. You always return NOTIFY_DONE, in case of !(netdev->features & 
NETIF_F_HW_TLS_TX).
See below:

static int tls_check_dev_ops(struct net_device *dev)
{
	if (!dev->tlsdev_ops)
		return NOTIFY_BAD;

	return NOTIFY_DONE;
}

static int tls_device_down(struct net_device *netdev)
{
	struct tls_context *ctx, *tmp;
	struct list_head list;
	unsigned long flags;

	...
	return NOTIFY_DONE;
}

static int tls_dev_event(struct notifier_block *this, unsigned long event,
			 void *ptr)
{
	struct net_device *dev = netdev_notifier_info_to_dev(ptr);

	if (!(dev->features & NETIF_F_HW_TLS_TX))
		return NOTIFY_DONE;

	switch (event) {
	case NETDEV_REGISTER:
	case NETDEV_FEAT_CHANGE:
		return tls_check_dev_ops(dev);
	case NETDEV_DOWN:
		return tls_device_down(dev);
	}
	return NOTIFY_DONE;
}
 


Sure, will fix in V3.
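The consolidated dispatch suggested in this exchange can be exercised outside the kernel. The sketch below is a userspace model, not kernel code: `fake_netdev`, `FEATURE_HW_TLS_TX`, and the `EV_*` events are hypothetical stand-ins for `struct net_device`, `NETIF_F_HW_TLS_TX`, and the `NETDEV_*` events, and the two `NOTIFY_*` values are assumed to mirror `include/linux/notifier.h`. It shows that one feature check at the top of the handler returns `NOTIFY_DONE` for every event on a device without TLS offload, so the per-event helpers do not need their own copy of it.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror include/linux/notifier.h; local copies for this sketch. */
enum { NOTIFY_DONE = 0x0000, NOTIFY_BAD = 0x8002 };

/* Hypothetical stand-ins for NETIF_F_HW_TLS_TX and the NETDEV_* events. */
#define FEATURE_HW_TLS_TX (1u << 0)
enum fake_event { EV_REGISTER, EV_FEAT_CHANGE, EV_DOWN, EV_UP };

struct fake_netdev {
    unsigned int features;
    const void *tlsdev_ops;   /* non-NULL when the driver implements TLS offload */
    int down_calls;           /* how often the NETDEV_DOWN handler ran */
};

static int check_dev_ops(const struct fake_netdev *dev)
{
    return dev->tlsdev_ops ? NOTIFY_DONE : NOTIFY_BAD;
}

static int device_down(struct fake_netdev *dev)
{
    dev->down_calls++;        /* the real handler detaches offloaded sockets */
    return NOTIFY_DONE;
}

static int dev_event(struct fake_netdev *dev, enum fake_event event)
{
    assert(dev != NULL);

    /* One early exit shared by every event type. */
    if (!(dev->features & FEATURE_HW_TLS_TX))
        return NOTIFY_DONE;

    switch (event) {
    case EV_REGISTER:
    case EV_FEAT_CHANGE:
        return check_dev_ops(dev);
    case EV_DOWN:
        return device_down(dev);
    default:
        return NOTIFY_DONE;
    }
}
```

Events the handler does not care about fall through to `NOTIFY_DONE` whether or not the feature bit is set, which is why hoisting the check does not change any result.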


+
+	/* Request a write lock to block new offload attempts
+	 */
+	percpu_down_write(&device_offload_lock);


What is the reason percpu_rwsem is chosen here? It looks like this primitive
gives more advantages to readers than plain rwsem does, but it also gives
disadvantages to writers. That would be fine if tls_device_down() were not
called with rtnl_lock() held from the netdevice notifier. But since netdevice
notifiers are called with rtnl_lock() held, percpu_rwsem will increase the
time rtnl_lock() is held.

We use an rwsem to allow multiple (reader) invocations of
tls_set_device_offload, which is triggered by the user (presumably) during the
TLS handshake. This might be considered a fast path.

However, we must block all calls to tls_set_device_offload while we are 
processing NETDEV_DOWN events (writer).

As you've mentioned, the percpu rwsem is more efficient for readers, especially 
on NUMA systems, where cache-line bouncing occurs during reader acquire and 
reduces performance.


Hm, and who are the readers? It's used from do_tls_setsockopt_tx(), but it
doesn't seem to be performance critical. Who else?



It depends on whether you consider the TLS handshake code as critical.
The readers are TCP connections processing the CCS message of the TLS 
handshake. They are providing key material to the kernel to start using 
Kernel TLS.





Can't we use plain rwsem here instead?



It's a performance tradeoff. I'm not certain that the percpu rwsem write side
acquire is significantly worse than using the global rwsem.

For now, while all of this is experimental, can we agree to focus on the 
performance of readers? We can change it later if it becomes a problem.


Same as above.
  


Replaced with rwsem from V2.


Re: [PATCH net-next 06/14] net/tls: Add generic NIC offload infrastructure

2018-03-21 Thread Saeed Mahameed
On Wed, 2018-03-21 at 19:31 +0300, Kirill Tkhai wrote:
> On 21.03.2018 18:53, Boris Pismenny wrote:
> > ...
> > > 
> > > Other patches have two licenses in header. Can I distribute this
> > > file under GPL license terms?
> > > 
> > 
> > Sure, I'll update the license to match other files under net/tls.
> > 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +
> > > > +#include 
> > > > +#include 
> > > > +
> > > > +/* device_offload_lock is used to synchronize tls_dev_add
> > > > + * against NETDEV_DOWN notifications.
> > > > + */
> > > > +DEFINE_STATIC_PERCPU_RWSEM(device_offload_lock);
> > > > +
> > > > +static void tls_device_gc_task(struct work_struct *work);
> > > > +
> > > > +static DECLARE_WORK(tls_device_gc_work, tls_device_gc_task);
> > > > +static LIST_HEAD(tls_device_gc_list);
> > > > +static LIST_HEAD(tls_device_list);
> > > > +static DEFINE_SPINLOCK(tls_device_lock);
> > > > +
> > > > +static void tls_device_free_ctx(struct tls_context *ctx)
> > > > +{
> > > > +struct tls_offload_context *offlad_ctx = tls_offload_ctx(ctx);
> > > > +
> > > > +kfree(offlad_ctx);
> > > > +kfree(ctx);
> > > > +}
> > > > +
> > > > +static void tls_device_gc_task(struct work_struct *work)
> > > > +{
> > > > +struct tls_context *ctx, *tmp;
> > > > +struct list_head gc_list;
> > > > +unsigned long flags;
> > > > +
> > > > +spin_lock_irqsave(&tls_device_lock, flags);
> > > > +INIT_LIST_HEAD(&gc_list);
> > > 
> > > This is a stack variable, and it should be initialized outside of
> > > the global spinlock.
> > > There is the LIST_HEAD() primitive for that in the kernel.
> > > There is one more similar place below.
> > > 
> > 
> > Sure.
> > 
> > > > +list_splice_init(&tls_device_gc_list, &gc_list);
> > > > +spin_unlock_irqrestore(&tls_device_lock, flags);
> > > > +
> > > > +list_for_each_entry_safe(ctx, tmp, &gc_list, list) {
> > > > +struct net_device *netdev = ctx->netdev;
> > > > +
> > > > +if (netdev) {
> > > > +netdev->tlsdev_ops->tls_dev_del(netdev, ctx,
> > > > +TLS_OFFLOAD_CTX_DIR_TX);
> > > > +dev_put(netdev);
> > > > +}
> > > 
> > > How is it possible that we meet a NULL netdev here?
> > 
> > This can happen in tls_device_down. tls_device_down is called
> > whenever a netdev that is used for TLS inline crypto offload goes
> > down. It gets called via the NETDEV_DOWN event of the netdevice
> > notifier.
> > 
> > This flow is somewhat similar to the xfrm_device netdev notifier.
> > However, we do not destroy the socket (as in destroying the
> > xfrm_state in xfrm_device). Instead, we clean up the netdev state
> > and allow software fallback to handle the rest of the traffic.
> > 
> > > > +
> > > > +list_del(&ctx->list);
> > > > +tls_device_free_ctx(ctx);
> > > > +}
> > > > +}
> > > > +
> > > > +static void tls_device_queue_ctx_destruction(struct tls_context *ctx)
> > > > +{
> > > > +unsigned long flags;
> > > > +
> > > > +spin_lock_irqsave(&tls_device_lock, flags);
> > > > +list_move_tail(&ctx->list, &tls_device_gc_list);
> > > > +
> > > > +/* schedule_work inside the spinlock
> > > > + * to make sure tls_device_down waits for that work.
> > > > + */
> > > > +schedule_work(&tls_device_gc_work);
> > > > +
> > > > +spin_unlock_irqrestore(&tls_device_lock, flags);
> > > > +}
> > > > +
> > > > +/* We assume that the socket is already connected */
> > > > +static struct net_device *get_netdev_for_sock(struct sock *sk)
> > > > +{
> > > > +struct inet_sock *inet = inet_sk(sk);
> > > > +struct net_device *netdev = NULL;
> > > > +
> > > > +netdev = dev_get_by_index(sock_net(sk), inet->cork.fl.flowi_oif);
> > > > +
> > > > +return netdev;
> > > > +}
> > > > +
> > > > +static int attach_sock_to_netdev(struct sock *sk, struct net_device *netdev,
> > > > + struct tls_context *ctx)
> > > > +{
> > > > +int rc;
> > > > +
> > > > +rc = netdev->tlsdev_ops->tls_dev_add(netdev, sk, TLS_OFFLOAD_CTX_DIR_TX,
> > > > + &ctx->crypto_send,
> > > > + tcp_sk(sk)->write_seq);
> > > > +if (rc) {
> > > > +pr_err_ratelimited("The netdev has refused to offload this socket\n");
> > > > +goto out;
> > > > +}
> > > > +
> > > > +rc = 0;
> > > > +out:
> > > > +return rc;
> > > > +}
> > > > +
> > > > +static void destroy_record(struct tls_record_info *record)
> > > > +{
> > > > +skb_frag_t *frag;
> > > > +int nr_frags = record->num_frags;
> > > > +
> > > > +while (nr_frags > 0) {
> > > > +frag = &record->frags[nr_frags - 1];
> > > > +__skb_frag_unref(frag);
> > > > +--nr_frags;
> > > > +}
> > > > +kfree(record);
> > > > +}
> > > > +
> > > > +static void delete_all_records(struct tls_offload_context *offload_ctx)
> > > > +{
> > > > +struct tls_record_info *info, *temp;
> > > > +

Re: [PATCH net-next 06/14] net/tls: Add generic NIC offload infrastructure

2018-03-21 Thread Kirill Tkhai
On 21.03.2018 18:53, Boris Pismenny wrote:
> ...
>>
>> Other patches have two licenses in header. Can I distribute this file under 
>> GPL license terms?
>>
> 
> Sure, I'll update the license to match other files under net/tls.
> 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +
>>> +#include 
>>> +#include 
>>> +
>>> +/* device_offload_lock is used to synchronize tls_dev_add
>>> + * against NETDEV_DOWN notifications.
>>> + */
>>> +DEFINE_STATIC_PERCPU_RWSEM(device_offload_lock);
>>> +
>>> +static void tls_device_gc_task(struct work_struct *work);
>>> +
>>> +static DECLARE_WORK(tls_device_gc_work, tls_device_gc_task);
>>> +static LIST_HEAD(tls_device_gc_list);
>>> +static LIST_HEAD(tls_device_list);
>>> +static DEFINE_SPINLOCK(tls_device_lock);
>>> +
>>> +static void tls_device_free_ctx(struct tls_context *ctx)
>>> +{
>>> +    struct tls_offload_context *offlad_ctx = tls_offload_ctx(ctx);
>>> +
>>> +    kfree(offlad_ctx);
>>> +    kfree(ctx);
>>> +}
>>> +
>>> +static void tls_device_gc_task(struct work_struct *work)
>>> +{
>>> +    struct tls_context *ctx, *tmp;
>>> +    struct list_head gc_list;
>>> +    unsigned long flags;
>>> +
>>> +    spin_lock_irqsave(&tls_device_lock, flags);
>>> +    INIT_LIST_HEAD(&gc_list);
>>
>> This is a stack variable, and it should be initialized outside of the global
>> spinlock.
>> There is the LIST_HEAD() primitive for that in the kernel.
>> There is one more similar place below.
>>
> 
> Sure.
> 
>>> +    list_splice_init(&tls_device_gc_list, &gc_list);
>>> +    spin_unlock_irqrestore(&tls_device_lock, flags);
>>> +
>>> +    list_for_each_entry_safe(ctx, tmp, &gc_list, list) {
>>> +    struct net_device *netdev = ctx->netdev;
>>> +
>>> +    if (netdev) {
>>> +    netdev->tlsdev_ops->tls_dev_del(netdev, ctx,
>>> +    TLS_OFFLOAD_CTX_DIR_TX);
>>> +    dev_put(netdev);
>>> +    }
>>
>> How is it possible that we meet a NULL netdev here?
> 
> This can happen in tls_device_down. tls_device_down is called whenever a
> netdev that is used for TLS inline crypto offload goes down. It gets called
> via the NETDEV_DOWN event of the netdevice notifier.
>
> This flow is somewhat similar to the xfrm_device netdev notifier. However, we
> do not destroy the socket (as in destroying the xfrm_state in xfrm_device).
> Instead, we clean up the netdev state and allow software fallback to handle
> the rest of the traffic.
> 
>>> +
>>> +    list_del(&ctx->list);
>>> +    tls_device_free_ctx(ctx);
>>> +    }
>>> +}
>>> +
>>> +static void tls_device_queue_ctx_destruction(struct tls_context *ctx)
>>> +{
>>> +    unsigned long flags;
>>> +
>>> +    spin_lock_irqsave(&tls_device_lock, flags);
>>> +    list_move_tail(&ctx->list, &tls_device_gc_list);
>>> +
>>> +    /* schedule_work inside the spinlock
>>> + * to make sure tls_device_down waits for that work.
>>> + */
>>> +    schedule_work(&tls_device_gc_work);
>>> +
>>> +    spin_unlock_irqrestore(&tls_device_lock, flags);
>>> +}
>>> +
>>> +/* We assume that the socket is already connected */
>>> +static struct net_device *get_netdev_for_sock(struct sock *sk)
>>> +{
>>> +    struct inet_sock *inet = inet_sk(sk);
>>> +    struct net_device *netdev = NULL;
>>> +
>>> +    netdev = dev_get_by_index(sock_net(sk), inet->cork.fl.flowi_oif);
>>> +
>>> +    return netdev;
>>> +}
>>> +
>>> +static int attach_sock_to_netdev(struct sock *sk, struct net_device *netdev,
>>> + struct tls_context *ctx)
>>> +{
>>> +    int rc;
>>> +
>>> +    rc = netdev->tlsdev_ops->tls_dev_add(netdev, sk, TLS_OFFLOAD_CTX_DIR_TX,
>>> + &ctx->crypto_send,
>>> + tcp_sk(sk)->write_seq);
>>> +    if (rc) {
>>> +    pr_err_ratelimited("The netdev has refused to offload this socket\n");
>>> +    goto out;
>>> +    }
>>> +
>>> +    rc = 0;
>>> +out:
>>> +    return rc;
>>> +}
>>> +
>>> +static void destroy_record(struct tls_record_info *record)
>>> +{
>>> +    skb_frag_t *frag;
>>> +    int nr_frags = record->num_frags;
>>> +
>>> +    while (nr_frags > 0) {
>>> +    frag = &record->frags[nr_frags - 1];
>>> +    __skb_frag_unref(frag);
>>> +    --nr_frags;
>>> +    }
>>> +    kfree(record);
>>> +}
>>> +
>>> +static void delete_all_records(struct tls_offload_context *offload_ctx)
>>> +{
>>> +    struct tls_record_info *info, *temp;
>>> +
>>> +    list_for_each_entry_safe(info, temp, &offload_ctx->records_list, list) {
>>> +    list_del(&info->list);
>>> +    destroy_record(info);
>>> +    }
>>> +
>>> +    offload_ctx->retransmit_hint = NULL;
>>> +}
>>> +
>>> +static void tls_icsk_clean_acked(struct sock *sk, u32 acked_seq)
>>> +{
>>> +    struct tls_context *tls_ctx = tls_get_ctx(sk);
>>> +    struct tls_offload_context *ctx;
>>> +    struct tls_record_info *info, *temp;
>>> +    unsigned long flags;
>>> +    u64 deleted_records = 0;
>>> +
>>> +    if (!tls_ctx)
>>> +    return;
>>> +
>>> +    ctx = tls_offload_ctx(tls_ctx);
>>> +
>>> +    

Re: [PATCH net-next 06/14] net/tls: Add generic NIC offload infrastructure

2018-03-21 Thread Boris Pismenny

...


Other patches have two licenses in header. Can I distribute this file under GPL 
license terms?



Sure, I'll update the license to match other files under net/tls.


+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+/* device_offload_lock is used to synchronize tls_dev_add
+ * against NETDEV_DOWN notifications.
+ */
+DEFINE_STATIC_PERCPU_RWSEM(device_offload_lock);
+
+static void tls_device_gc_task(struct work_struct *work);
+
+static DECLARE_WORK(tls_device_gc_work, tls_device_gc_task);
+static LIST_HEAD(tls_device_gc_list);
+static LIST_HEAD(tls_device_list);
+static DEFINE_SPINLOCK(tls_device_lock);
+
+static void tls_device_free_ctx(struct tls_context *ctx)
+{
+   struct tls_offload_context *offlad_ctx = tls_offload_ctx(ctx);
+
+   kfree(offlad_ctx);
+   kfree(ctx);
+}
+
+static void tls_device_gc_task(struct work_struct *work)
+{
+   struct tls_context *ctx, *tmp;
+   struct list_head gc_list;
+   unsigned long flags;
+
+   spin_lock_irqsave(&tls_device_lock, flags);
+   INIT_LIST_HEAD(&gc_list);


This is a stack variable, and it should be initialized outside of the global spinlock.
There is the LIST_HEAD() primitive for that in the kernel.
There is one more similar place below.
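The review suggestion can be sketched with a minimal userspace copy of the `<linux/list.h>` primitives. The `LIST_HEAD()` macro below mirrors the kernel one (a head whose `next` and `prev` point at itself), so the on-stack list is ready before the lock is taken and the critical section shrinks to the splice. `device_lock`, `gc_list_global`, `struct ctx`, and `gc_task()` are hypothetical stand-ins for `tls_device_lock`, `tls_device_gc_list`, `struct tls_context`, and `tls_device_gc_task()`, and a pthread mutex replaces the IRQ-safe spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal circular doubly linked list modelled on <linux/list.h>. */
struct list_head { struct list_head *next, *prev; };

#define LIST_HEAD(name) struct list_head name = { &(name), &(name) }

static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static void list_del(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

/* Move every entry of @from to the front of @to and leave @from empty. */
static void list_splice_init(struct list_head *from, struct list_head *to)
{
    if (list_empty(from))
        return;
    struct list_head *first = from->next, *last = from->prev;
    first->prev = to;
    last->next = to->next;
    to->next->prev = last;
    to->next = first;
    from->next = from->prev = from;
}

struct ctx { struct list_head list; };   /* list must stay the first member */

static pthread_mutex_t device_lock = PTHREAD_MUTEX_INITIALIZER;
static LIST_HEAD(gc_list_global);

static int gc_task(void)
{
    LIST_HEAD(gc_list);          /* stack head initialized before taking the lock */
    int freed = 0;

    pthread_mutex_lock(&device_lock);
    list_splice_init(&gc_list_global, &gc_list);   /* only the splice needs the lock */
    pthread_mutex_unlock(&device_lock);

    while (!list_empty(&gc_list)) {
        struct ctx *c = (struct ctx *)gc_list.next; /* valid: list is first member */
        list_del(&c->list);
        free(c);
        freed++;
    }
    return freed;
}
```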



Sure.


+   list_splice_init(&tls_device_gc_list, &gc_list);
+   spin_unlock_irqrestore(&tls_device_lock, flags);
+
+   list_for_each_entry_safe(ctx, tmp, &gc_list, list) {
+   struct net_device *netdev = ctx->netdev;
+
+   if (netdev) {
+   netdev->tlsdev_ops->tls_dev_del(netdev, ctx,
+   TLS_OFFLOAD_CTX_DIR_TX);
+   dev_put(netdev);
+   }


How is it possible that we meet a NULL netdev here?


This can happen in tls_device_down. tls_device_down is called whenever a
netdev that is used for TLS inline crypto offload goes down. It gets
called via the NETDEV_DOWN event of the netdevice notifier.


This flow is somewhat similar to the xfrm_device netdev notifier.
However, we do not destroy the socket (as in destroying the xfrm_state
in xfrm_device). Instead, we clean up the netdev state and allow software
fallback to handle the rest of the traffic.
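The answer implies an ordering between the two cleanup paths: if NETDEV_DOWN already detached a context, the garbage collector must not call into the driver again. A userspace sketch, assuming (this excerpt does not show it) that `tls_device_down` calls `tls_dev_del()`, drops the device reference, and clears `ctx->netdev` for each affected context; `fake_netdev`, `fake_ctx`, `down_detach_ctx()`, and `gc_ctx()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct net_device and struct tls_context. */
struct fake_netdev { int refcnt; int del_calls; };
struct fake_ctx { struct fake_netdev *netdev; };

/* NETDEV_DOWN path (assumed): driver callback, drop reference, clear pointer. */
static void down_detach_ctx(struct fake_ctx *ctx)
{
    struct fake_netdev *nd = ctx->netdev;

    assert(nd != NULL && nd->refcnt > 0);
    nd->del_calls++;      /* stands in for tlsdev_ops->tls_dev_del() */
    nd->refcnt--;         /* stands in for dev_put() */
    ctx->netdev = NULL;   /* later traffic goes through the software fallback */
}

/* GC path: a NULL netdev means NETDEV_DOWN already did the driver cleanup. */
static void gc_ctx(struct fake_ctx *ctx)
{
    struct fake_netdev *nd = ctx->netdev;

    if (nd) {
        nd->del_calls++;  /* tls_dev_del() */
        nd->refcnt--;     /* dev_put() */
    }
    /* freeing the context would follow here */
}
```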



+
+   list_del(&ctx->list);
+   tls_device_free_ctx(ctx);
+   }
+}
+
+static void tls_device_queue_ctx_destruction(struct tls_context *ctx)
+{
+   unsigned long flags;
+
+   spin_lock_irqsave(&tls_device_lock, flags);
+   list_move_tail(&ctx->list, &tls_device_gc_list);
+
+   /* schedule_work inside the spinlock
+* to make sure tls_device_down waits for that work.
+*/
+   schedule_work(&tls_device_gc_work);
+
+   spin_unlock_irqrestore(&tls_device_lock, flags);
+}
+
+/* We assume that the socket is already connected */
+static struct net_device *get_netdev_for_sock(struct sock *sk)
+{
+   struct inet_sock *inet = inet_sk(sk);
+   struct net_device *netdev = NULL;
+
+   netdev = dev_get_by_index(sock_net(sk), inet->cork.fl.flowi_oif);
+
+   return netdev;
+}
+
+static int attach_sock_to_netdev(struct sock *sk, struct net_device *netdev,
+struct tls_context *ctx)
+{
+   int rc;
+
+   rc = netdev->tlsdev_ops->tls_dev_add(netdev, sk, TLS_OFFLOAD_CTX_DIR_TX,
+					     &ctx->crypto_send,
+					     tcp_sk(sk)->write_seq);
+   if (rc) {
+   pr_err_ratelimited("The netdev has refused to offload this socket\n");
+   goto out;
+   }
+
+   rc = 0;
+out:
+   return rc;
+}
+
+static void destroy_record(struct tls_record_info *record)
+{
+   skb_frag_t *frag;
+   int nr_frags = record->num_frags;
+
+   while (nr_frags > 0) {
+   frag = &record->frags[nr_frags - 1];
+   __skb_frag_unref(frag);
+   --nr_frags;
+   }
+   kfree(record);
+}
+
+static void delete_all_records(struct tls_offload_context *offload_ctx)
+{
+   struct tls_record_info *info, *temp;
+
+   list_for_each_entry_safe(info, temp, &offload_ctx->records_list, list) {
+   list_del(&info->list);
+   destroy_record(info);
+   }
+
+   offload_ctx->retransmit_hint = NULL;
+}
+
+static void tls_icsk_clean_acked(struct sock *sk, u32 acked_seq)
+{
+   struct tls_context *tls_ctx = tls_get_ctx(sk);
+   struct tls_offload_context *ctx;
+   struct tls_record_info *info, *temp;
+   unsigned long flags;
+   u64 deleted_records = 0;
+
+   if (!tls_ctx)
+   return;
+
+   ctx = tls_offload_ctx(tls_ctx);
+
+   spin_lock_irqsave(&ctx->lock, flags);
+   info = ctx->retransmit_hint;
+   if (info && !before(acked_seq, info->end_seq)) {
+   ctx->retransmit_hint = NULL;
+   list_del(&info->list);
+   destroy_record(info);
+   deleted_records++;
+   }
+
+   list_for_each_entry_safe(info, temp, 

Re: [PATCH net-next 06/14] net/tls: Add generic NIC offload infrastructure

2018-03-21 Thread Boris Pismenny



On 3/21/2018 5:08 PM, Dave Watson wrote:

On 03/19/18 07:45 PM, Saeed Mahameed wrote:

+#define TLS_OFFLOAD_CONTEXT_SIZE					\
+   (ALIGN(sizeof(struct tls_offload_context), sizeof(void *)) +   \
+TLS_DRIVER_STATE_SIZE)
+
+   pfrag = sk_page_frag(sk);
+
+   /* KTLS_TLS_HEADER_SIZE is not counted as part of the TLS record, and


I think the define is actually TLS_HEADER_SIZE, no KTLS_ prefix



Fixed. Thanks.


+   memcpy(ctx->iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE, iv, iv_size);
+
+   ctx->rec_seq_size = rec_seq_size;
+   /* worst case is:
+* MAX_SKB_FRAGS in tls_record_info
+* MAX_SKB_FRAGS + 1 in SKB head an frags.


spelling



Fixed. Thanks.


+int tls_sw_fallback_init(struct sock *sk,
+struct tls_offload_context *offload_ctx,
+struct tls_crypto_info *crypto_info)
+{
+   int rc;
+   const u8 *key;
+
+   offload_ctx->aead_send =
+   crypto_alloc_aead("gcm(aes)", 0, CRYPTO_ALG_ASYNC);


in tls_sw we went with async + crypto_wait_req, any reason to not do
that here?  Otherwise I think you still get the software gcm on x86
instead of aesni without additional changes.



Yes, synchronous crypto code runs to handle a software fallback in 
validate_xmit_skb, where waiting is not possible. I know Steffen 
recently added support for calling async crypto from validate_xmit_skb, 
but it wasn't available when we were writing these patches.


I think we could implement async support in the future based on the
infrastructure introduced by Steffen.



diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index d824d548447e..e0dface33017 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -54,6 +54,9 @@ enum {
  enum {
TLS_BASE_TX,
TLS_SW_TX,
+#ifdef CONFIG_TLS_DEVICE
+   TLS_HW_TX,
+#endif
TLS_NUM_CONFIG,
  };


I have posted SW_RX patches, do you foresee any issues with SW_RX + HW_TX?



No, but I haven't tested these patches with the SW_RX patches.
I'll try to rebase your V2 SW_RX patches over this series tomorrow and 
run some tests.



Thanks



Re: [PATCH net-next 06/14] net/tls: Add generic NIC offload infrastructure

2018-03-21 Thread Dave Watson
On 03/19/18 07:45 PM, Saeed Mahameed wrote:
> +#define TLS_OFFLOAD_CONTEXT_SIZE					\
> + (ALIGN(sizeof(struct tls_offload_context), sizeof(void *)) +   \
> +  TLS_DRIVER_STATE_SIZE)
> +
> + pfrag = sk_page_frag(sk);
> +
> + /* KTLS_TLS_HEADER_SIZE is not counted as part of the TLS record, and

I think the define is actually TLS_HEADER_SIZE, no KTLS_ prefix

> + memcpy(ctx->iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE, iv, iv_size);
> +
> + ctx->rec_seq_size = rec_seq_size;
> + /* worst case is:
> +  * MAX_SKB_FRAGS in tls_record_info
> +  * MAX_SKB_FRAGS + 1 in SKB head an frags.

spelling

> +int tls_sw_fallback_init(struct sock *sk,
> +  struct tls_offload_context *offload_ctx,
> +  struct tls_crypto_info *crypto_info)
> +{
> + int rc;
> + const u8 *key;
> +
> + offload_ctx->aead_send =
> + crypto_alloc_aead("gcm(aes)", 0, CRYPTO_ALG_ASYNC);

in tls_sw we went with async + crypto_wait_req, any reason to not do
that here?  Otherwise I think you still get the software gcm on x86
instead of aesni without additional changes.

> diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> index d824d548447e..e0dface33017 100644
> --- a/net/tls/tls_main.c
> +++ b/net/tls/tls_main.c
> @@ -54,6 +54,9 @@ enum {
>  enum {
>   TLS_BASE_TX,
>   TLS_SW_TX,
> +#ifdef CONFIG_TLS_DEVICE
> + TLS_HW_TX,
> +#endif
>   TLS_NUM_CONFIG,
>  };

I have posted SW_RX patches, do you foresee any issues with SW_RX + HW_TX?

Thanks


Re: [PATCH net-next 06/14] net/tls: Add generic NIC offload infrastructure

2018-03-21 Thread Kirill Tkhai
On 20.03.2018 05:45, Saeed Mahameed wrote:
> From: Ilya Lesokhin 
> 
> This patch adds a generic infrastructure to offload TLS crypto to
> network devices. It enables the kernel TLS socket to skip encryption
> and authentication operations on the transmit side of the data path,
> leaving those computationally expensive operations to the NIC.
> 
> The NIC offload infrastructure builds TLS records and pushes them to
> the TCP layer just like the SW KTLS implementation and using the same API.
> TCP segmentation is mostly unaffected. Currently the only exception is
> that we prevent mixed SKBs where only part of the payload requires
> offload. In the future we are likely to add a similar restriction
> following a change cipher spec record.
> 
> The notable differences between SW KTLS and NIC offloaded TLS
> implementations are as follows:
> 1. The offloaded implementation builds "plaintext TLS records"; these
> records contain plaintext instead of ciphertext and placeholder bytes
> instead of authentication tags.
> 2. The offloaded implementation maintains a mapping from TCP sequence
> number to TLS records. Thus given a TCP SKB sent from a NIC offloaded
> TLS socket, we can use the tls NIC offload infrastructure to obtain
> enough context to encrypt the payload of the SKB.
> A TLS record is released when the last byte of the record is ack'ed,
> this is done through the new icsk_clean_acked callback.
> 
> The infrastructure should be extendable to support various NIC offload
> implementations.  However it is currently written with the
> implementation below in mind:
> The NIC assumes that packets from each offloaded stream are sent as
> plaintext and in-order. It keeps track of the TLS records in the TCP
> stream. When a packet marked for offload is transmitted, the NIC
> encrypts the payload in-place and puts authentication tags in the
> relevant placeholders.
> 
> The responsibility for handling out-of-order packets (e.g. TCP
> retransmission, qdisc drops) falls on the netdev driver.
> 
> The netdev driver keeps track of the expected TCP SN from the NIC's
> perspective.  If the next packet to transmit matches the expected TCP
> SN, the driver advances the expected TCP SN, and transmits the packet
> with TLS offload indication.
> 
> If the next packet to transmit does not match the expected TCP SN, the
> driver calls the TLS layer to obtain the TLS record that includes the
> TCP SN of the packet for transmission. Using this TLS record, the driver
> posts a work entry on the transmit queue to reconstruct the NIC TLS
> state required for the offload of the out-of-order packet. It updates
> the expected TCP SN accordingly and transmits the now in-order packet.
> The same queue is used for packet transmission and TLS context
> reconstruction to avoid the need for flushing the transmit queue before
> issuing the context reconstruction request.
> 
> Signed-off-by: Ilya Lesokhin 
> Signed-off-by: Boris Pismenny 
> Signed-off-by: Aviad Yehezkel 
> Signed-off-by: Saeed Mahameed 
> ---
>  include/net/tls.h |  70 +++-
>  net/tls/Kconfig   |  10 +
>  net/tls/Makefile  |   2 +
>  net/tls/tls_device.c  | 804 ++
>  net/tls/tls_device_fallback.c | 419 ++
>  net/tls/tls_main.c|  33 +-
>  6 files changed, 1331 insertions(+), 7 deletions(-)
>  create mode 100644 net/tls/tls_device.c
>  create mode 100644 net/tls/tls_device_fallback.c
> 
> diff --git a/include/net/tls.h b/include/net/tls.h
> index 4913430ab807..ab98a6dc4929 100644
> --- a/include/net/tls.h
> +++ b/include/net/tls.h
> @@ -77,6 +77,37 @@ struct tls_sw_context {
>   struct scatterlist sg_aead_out[2];
>  };
>  
> +struct tls_record_info {
> + struct list_head list;
> + u32 end_seq;
> + int len;
> + int num_frags;
> + skb_frag_t frags[MAX_SKB_FRAGS];
> +};
> +
> +struct tls_offload_context {
> + struct crypto_aead *aead_send;
> + spinlock_t lock;/* protects records list */
> + struct list_head records_list;
> + struct tls_record_info *open_record;
> + struct tls_record_info *retransmit_hint;
> + u64 hint_record_sn;
> + u64 unacked_record_sn;
> +
> + struct scatterlist sg_tx_data[MAX_SKB_FRAGS];
> + void (*sk_destruct)(struct sock *sk);
> + u8 driver_state[];
> + /* The TLS layer reserves room for driver specific state
> +  * Currently the belief is that there is not enough
> +  * driver specific state to justify another layer of indirection
> +  */
> +#define TLS_DRIVER_STATE_SIZE (max_t(size_t, 8, sizeof(void *)))
> +};
> +
> +#define TLS_OFFLOAD_CONTEXT_SIZE					\
> + (ALIGN(sizeof(struct tls_offload_context), sizeof(void *)) +   \
> +  TLS_DRIVER_STATE_SIZE)
> +
>  enum {
>   

[PATCH net-next 06/14] net/tls: Add generic NIC offload infrastructure

2018-03-19 Thread Saeed Mahameed
From: Ilya Lesokhin 

This patch adds a generic infrastructure to offload TLS crypto to
network devices. It enables the kernel TLS socket to skip encryption
and authentication operations on the transmit side of the data path,
leaving those computationally expensive operations to the NIC.

The NIC offload infrastructure builds TLS records and pushes them to
the TCP layer just like the SW KTLS implementation and using the same API.
TCP segmentation is mostly unaffected. Currently the only exception is
that we prevent mixed SKBs where only part of the payload requires
offload. In the future we are likely to add a similar restriction
following a change cipher spec record.

The notable differences between SW KTLS and NIC offloaded TLS
implementations are as follows:
1. The offloaded implementation builds "plaintext TLS records"; these
records contain plaintext instead of ciphertext and placeholder bytes
instead of authentication tags.
2. The offloaded implementation maintains a mapping from TCP sequence
number to TLS records. Thus given a TCP SKB sent from a NIC offloaded
TLS socket, we can use the tls NIC offload infrastructure to obtain
enough context to encrypt the payload of the SKB.
A TLS record is released when the last byte of the record is ack'ed,
this is done through the new icsk_clean_acked callback.
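Point 2 relies on comparing TCP sequence numbers, which wrap at 2^32. The sketch below is a simplified userspace model, not the patch's code: it keeps records in an array ordered by TCP sequence instead of the `records_list`/`retransmit_hint` pair, and `seq_before()`, `record_for_seq()`, and `clean_acked()` are hypothetical names. `seq_before()` uses the same signed-difference comparison as the kernel's `before()`, and a record is dropped once `acked_seq` is not before its `end_seq`, matching the `!before(acked_seq, info->end_seq)` test in `tls_icsk_clean_acked()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wrap-around-safe TCP sequence comparison, like the kernel's before(). */
static int seq_before(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq1 - seq2) < 0;
}

struct record { uint32_t end_seq; uint32_t len; };  /* simplified tls_record_info */

struct records {
    struct record r[16];   /* oldest first, contiguous in TCP sequence space */
    size_t n;
};

/* Return the record whose bytes [end_seq - len, end_seq) contain seq. */
static const struct record *record_for_seq(const struct records *rs, uint32_t seq)
{
    for (size_t i = 0; i < rs->n; i++) {
        uint32_t start = rs->r[i].end_seq - rs->r[i].len;

        if (!seq_before(seq, start) && seq_before(seq, rs->r[i].end_seq))
            return &rs->r[i];
    }
    return NULL;
}

/* Drop every record whose last byte is covered by acked_seq (snd_una). */
static size_t clean_acked(struct records *rs, uint32_t acked_seq)
{
    size_t drop = 0;

    assert(rs->n <= sizeof(rs->r) / sizeof(rs->r[0]));
    while (drop < rs->n && !seq_before(acked_seq, rs->r[drop].end_seq))
        drop++;
    for (size_t i = drop; i < rs->n; i++)
        rs->r[i - drop] = rs->r[i];
    rs->n -= drop;
    return drop;
}
```

In the usage below, the second record straddles the 2^32 wrap (it starts at 0xFFFFFFF0 and ends at 0x50), and both the lookup and the ack processing still treat it as one contiguous range.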

The infrastructure should be extendable to support various NIC offload
implementations.  However it is currently written with the
implementation below in mind:
The NIC assumes that packets from each offloaded stream are sent as
plaintext and in-order. It keeps track of the TLS records in the TCP
stream. When a packet marked for offload is transmitted, the NIC
encrypts the payload in-place and puts authentication tags in the
relevant placeholders.
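A "plaintext TLS record" of the kind the NIC encrypts in place can be pictured as a TLS 1.2 AES-GCM-128 record: a 5-byte header, an 8-byte explicit nonce, the payload, and a 16-byte tag. The sketch builds such a record with the tag area zeroed. Zero is an assumption, since the patch only calls these bytes placeholders, and `build_plaintext_record()` is a hypothetical helper rather than part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TLS_HDR_LEN   5   /* content type(1) + version(2) + length(2) */
#define GCM_NONCE_LEN 8   /* explicit nonce for AES-GCM-128 in TLS 1.2 */
#define GCM_TAG_LEN  16   /* authentication tag written later by the NIC */

/* Build header + nonce + plaintext + zeroed tag placeholder.
 * Returns the total record length, or 0 if it does not fit. */
static size_t build_plaintext_record(uint8_t *out, size_t out_len,
                                     const uint8_t nonce[GCM_NONCE_LEN],
                                     const uint8_t *data, size_t data_len)
{
    size_t body = GCM_NONCE_LEN + data_len + GCM_TAG_LEN;
    size_t total = TLS_HDR_LEN + body;

    assert(data != NULL);
    if (total > out_len || body > 0xFFFF)
        return 0;
    out[0] = 23;                    /* application_data */
    out[1] = 3;                     /* record version 0x0303 (TLS 1.2) */
    out[2] = 3;
    out[3] = (uint8_t)(body >> 8);  /* length covers nonce + data + tag */
    out[4] = (uint8_t)(body & 0xFF);
    memcpy(out + TLS_HDR_LEN, nonce, GCM_NONCE_LEN);
    memcpy(out + TLS_HDR_LEN + GCM_NONCE_LEN, data, data_len);
    memset(out + TLS_HDR_LEN + GCM_NONCE_LEN + data_len, 0, GCM_TAG_LEN);
    return total;
}
```

The header length already counts the 16 tag bytes, so the record sizes seen by TCP are the same with or without offload; only the tag contents differ until the NIC fills them in.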

The responsibility for handling out-of-order packets (e.g. TCP
retransmission, qdisc drops) falls on the netdev driver.

The netdev driver keeps track of the expected TCP SN from the NIC's
perspective.  If the next packet to transmit matches the expected TCP
SN, the driver advances the expected TCP SN, and transmits the packet
with TLS offload indication.

If the next packet to transmit does not match the expected TCP SN, the
driver calls the TLS layer to obtain the TLS record that includes the
TCP SN of the packet for transmission. Using this TLS record, the driver
posts a work entry on the transmit queue to reconstruct the NIC TLS
state required for the offload of the out-of-order packet. It updates
the expected TCP SN accordingly and transmits the now in-order packet.
The same queue is used for packet transmission and TLS context
reconstruction to avoid the need for flushing the transmit queue before
issuing the context reconstruction request.

Signed-off-by: Ilya Lesokhin 
Signed-off-by: Boris Pismenny 
Signed-off-by: Aviad Yehezkel 
Signed-off-by: Saeed Mahameed 
---
 include/net/tls.h |  70 +++-
 net/tls/Kconfig   |  10 +
 net/tls/Makefile  |   2 +
 net/tls/tls_device.c  | 804 ++
 net/tls/tls_device_fallback.c | 419 ++
 net/tls/tls_main.c|  33 +-
 6 files changed, 1331 insertions(+), 7 deletions(-)
 create mode 100644 net/tls/tls_device.c
 create mode 100644 net/tls/tls_device_fallback.c

diff --git a/include/net/tls.h b/include/net/tls.h
index 4913430ab807..ab98a6dc4929 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -77,6 +77,37 @@ struct tls_sw_context {
struct scatterlist sg_aead_out[2];
 };
 
+struct tls_record_info {
+   struct list_head list;
+   u32 end_seq;
+   int len;
+   int num_frags;
+   skb_frag_t frags[MAX_SKB_FRAGS];
+};
+
+struct tls_offload_context {
+   struct crypto_aead *aead_send;
+   spinlock_t lock;/* protects records list */
+   struct list_head records_list;
+   struct tls_record_info *open_record;
+   struct tls_record_info *retransmit_hint;
+   u64 hint_record_sn;
+   u64 unacked_record_sn;
+
+   struct scatterlist sg_tx_data[MAX_SKB_FRAGS];
+   void (*sk_destruct)(struct sock *sk);
+   u8 driver_state[];
+   /* The TLS layer reserves room for driver specific state
+* Currently the belief is that there is not enough
+* driver specific state to justify another layer of indirection
+*/
+#define TLS_DRIVER_STATE_SIZE (max_t(size_t, 8, sizeof(void *)))
+};
+
+#define TLS_OFFLOAD_CONTEXT_SIZE					\
+   (ALIGN(sizeof(struct tls_offload_context), sizeof(void *)) +   \
+TLS_DRIVER_STATE_SIZE)
+
 enum {
TLS_PENDING_CLOSED_RECORD
 };
@@ -87,6 +118,10 @@ struct tls_context {
struct tls12_crypto_info_aes_gcm_128 crypto_send_aes_gcm_128;
};
 
+   struct list_head list;
+   struct net_device *netdev;
+