Re: [PATCH net-next] selftests: forwarding: mirror_gre_nh: Unset rp_filter on host VRF

2018-07-12 Thread David Miller
From: Petr Machata 
Date: Tue, 10 Jul 2018 14:44:26 +0200

> The mirrored packets arrive at $h3 encapsulated in GRE/IPv4, with IP
> address from 192.0.2.128/28 network. However the interface is configured
> as a member of 192.0.2.160/28 and there's no route directing traffic
> from the former network through that interface. Correspondingly, the RP
> filter on the VRF rejects it.
> 
> Therefore turn off the VRF's RP filter.
> 
> Signed-off-by: Petr Machata 

Applied.


Re: [PATCH net-next v2 0/8] be2net: small structures clean-up

2018-07-12 Thread David Miller
From: Ivan Vecera 
Date: Tue, 10 Jul 2018 22:59:40 +0200

> The series:
> - removes unused / unnecessary fields in several be2net structures
> - re-orders fields in some structures to eliminate holes and
>   cache-line crossings
> - as a result, reduces the size of the main struct be_adapter by 4kB

Series applied, thanks.


[CLOSED] LPC Plumbers Networking Track

2018-07-12 Thread David Miller


The submission window for the networking track of this year's Linux
Plumbers Conference is now officially closed.

We are simply overwhelmed by the number of submissions and the overall
quality of the content!

Please do not send any new submissions from this point forward; they
will not be considered, sorry.

The technical committee will now review all of the pending submissions
and notify authors by the specified deadline of August 15th.

Thanks!


Re: pull-request: ieee802154 for net 2018-07-11

2018-07-12 Thread David Miller
From: Stefan Schmidt 
Date: Wed, 11 Jul 2018 11:26:53 -0400

> An update from ieee802154 for your *net* tree.
> 
> Build system fix for a missing include from Arnd Bergmann.
> Setting the IFLA_LINK for the lowpan parent from Lubomir Rintel.
> Fixes for some RX corner cases in adf7242 driver by Michael Hennerich.
> And some small patches to clean up our BUG_ON vs WARN_ON usage.

Pulled, thanks.


[PATCH net-next] net: gro: properly remove skb from list

2018-07-12 Thread Prashant Bhole
The following crash occurs in validate_xmit_skb_list() when the same skb
is iterated multiple times in the loop and consume_skb() is called.

The root cause is that commit d4546c2509b1 calls list_del_init(&skb->list)
without clearing skb->next. list_del_init(&skb->list) sets skb->next to
point to the skb itself. skb->next needs to be cleared because other parts
of the network stack use a different kind of SKB list;
validate_xmit_skb_list() walks such a list.

A similar type of bugfix was reported by Jesper Dangaard Brouer.
https://patchwork.ozlabs.org/patch/942541/

This patch clears skb->next and changes list_del_init() to list_del()
so that list->prev will maintain the list poison.
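
For illustration, a minimal sketch of the difference (simplified, not the
actual kernel code; it only restates what the text above describes about
skb->list overlapping skb->next):

	/* Before: re-initializing the entry leaves skb->next pointing back
	 * at the skb itself, so an skb->next-based walker such as
	 * validate_xmit_skb_list() can visit the same skb twice.
	 */
	list_del_init(&skb->list);

	/* After: unlink while keeping the list poison in ->prev, then clear
	 * ->next explicitly so singly-linked walkers see the end of the chain.
	 */
	list_del(&skb->list);
	skb->next = NULL;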

[  148.185511] 
==
[  148.187865] BUG: KASAN: use-after-free in validate_xmit_skb_list+0x4b/0xa0
[  148.190158] Read of size 8 at addr 8801e52eefc0 by task swapper/1/0
[  148.192940]
[  148.193642] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.18.0-rc3+ #25
[  148.195423] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
[  148.199129] Call Trace:
[  148.200565]  
[  148.201911]  dump_stack+0xc6/0x14c
[  148.203572]  ? dump_stack_print_info.cold.1+0x2f/0x2f
[  148.205083]  ? kmsg_dump_rewind_nolock+0x59/0x59
[  148.206307]  ? validate_xmit_skb+0x2c6/0x560
[  148.207432]  ? debug_show_held_locks+0x30/0x30
[  148.208571]  ? validate_xmit_skb_list+0x4b/0xa0
[  148.211144]  print_address_description+0x6c/0x23c
[  148.212601]  ? validate_xmit_skb_list+0x4b/0xa0
[  148.213782]  kasan_report.cold.6+0x241/0x2fd
[  148.214958]  validate_xmit_skb_list+0x4b/0xa0
[  148.216494]  sch_direct_xmit+0x1b0/0x680
[  148.217601]  ? dev_watchdog+0x4e0/0x4e0
[  148.218675]  ? do_raw_spin_trylock+0x10/0x120
[  148.219818]  ? do_raw_spin_lock+0xe0/0xe0
[  148.221032]  __dev_queue_xmit+0x1167/0x1810
[  148.222155]  ? sched_clock+0x5/0x10
[...]

[  148.474257] Allocated by task 0:
[  148.475363]  kasan_kmalloc+0xbf/0xe0
[  148.476503]  kmem_cache_alloc+0xb4/0x1b0
[  148.477654]  __build_skb+0x91/0x250
[  148.478677]  build_skb+0x67/0x180
[  148.479657]  e1000_clean_rx_irq+0x542/0x8a0
[  148.480757]  e1000_clean+0x652/0xd10
[  148.481772]  net_rx_action+0x4ea/0xc20
[  148.482808]  __do_softirq+0x1f9/0x574
[  148.483831]
[  148.484575] Freed by task 0:
[  148.485504]  __kasan_slab_free+0x12e/0x180
[  148.486589]  kmem_cache_free+0xb4/0x240
[  148.487634]  kfree_skbmem+0xed/0x150
[  148.488648]  consume_skb+0x146/0x250
[  148.489665]  validate_xmit_skb+0x2b7/0x560
[  148.490754]  validate_xmit_skb_list+0x70/0xa0
[  148.491897]  sch_direct_xmit+0x1b0/0x680
[  148.493949]  __dev_queue_xmit+0x1167/0x1810
[  148.495103]  br_dev_queue_push_xmit+0xce/0x250
[  148.496196]  br_forward_finish+0x276/0x280
[  148.497234]  __br_forward+0x44f/0x520
[  148.498260]  br_forward+0x19f/0x1b0
[  148.499264]  br_handle_frame_finish+0x65e/0x980
[  148.500398]  NF_HOOK.constprop.10+0x290/0x2a0
[  148.501522]  br_handle_frame+0x417/0x640
[  148.502582]  __netif_receive_skb_core+0xaac/0x18f0
[  148.503753]  __netif_receive_skb_one_core+0x98/0x120
[  148.504958]  netif_receive_skb_internal+0xe3/0x330
[  148.506154]  napi_gro_complete+0x190/0x2a0
[  148.507243]  dev_gro_receive+0x9f7/0x1100
[  148.508316]  napi_gro_receive+0xcb/0x260
[  148.509387]  e1000_clean_rx_irq+0x2fc/0x8a0
[  148.510501]  e1000_clean+0x652/0xd10
[  148.511523]  net_rx_action+0x4ea/0xc20
[  148.512566]  __do_softirq+0x1f9/0x574
[  148.513598]
[  148.514346] The buggy address belongs to the object at 8801e52eefc0
[  148.514346]  which belongs to the cache skbuff_head_cache of size 232
[  148.517047] The buggy address is located 0 bytes inside of
[  148.517047]  232-byte region [8801e52eefc0, 8801e52ef0a8)
[  148.519549] The buggy address belongs to the page:
[  148.520726] page:ea000794bb00 count:1 mapcount:0 
mapping:880106f4dfc0 index:0x8801e52ee840 compound_mapcount: 0
[  148.524325] flags: 0x17c0008100(slab|head)
[  148.525481] raw: 0017c0008100 880106b938d0 880106b938d0 
880106f4dfc0
[  148.527503] raw: 8801e52ee840 00190011 0001 

[  148.529547] page dumped because: kasan: bad access detected

Fixes: d4546c2509b1 ("net: Convert GRO SKB handling to list_head.")
Signed-off-by: Prashant Bhole 
Reported-by: Tyler Hicks 
---
 net/core/dev.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index d13cddcac41f..08c41941f912 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5169,7 +5169,8 @@ static void __napi_gro_flush_chain(struct napi_struct 
*napi, u32 index,
list_for_each_entry_safe_reverse(skb, p, head, list) {
if (flush_old && NAPI_GRO_CB(skb)->age == jiffies)
return;
-   list_del_init(&skb->list);
+   list_del(&skb->list);
+   skb->next = NULL;
na

Re: [PATCH net] nsh: set mac len based on inner packet

2018-07-12 Thread Jiri Benc
On Wed, 11 Jul 2018 12:00:44 -0400, Willem de Bruijn wrote:
> From: Willem de Bruijn 
> 
> When pulling the NSH header in nsh_gso_segment, set the mac length
> based on the encapsulated packet type.
> 
> skb_reset_mac_len computes an offset to the network header, which
> here still points to the outer packet:
> 
>   > skb_reset_network_header(skb);
>   > [...]
>   > __skb_pull(skb, nsh_len);
>   > skb_reset_mac_header(skb);// now mac hdr starts nsh_len == 8B 
> after net hdr
>   > skb_reset_mac_len(skb);   // mac len = net hdr - mac hdr == (u16) 
> -8 == 65528
>   > [..]
>   > skb_mac_gso_segment(skb, ..)  
> 
> Link: 
> http://lkml.kernel.org/r/CAF=yd-keactson4axiraxl8m7qas8gbbe1w09eziywvpbbu...@mail.gmail.com
> Reported-by: syzbot+7b9ed9872dab8c323...@syzkaller.appspotmail.com
> Fixes: c411ed854584 ("nsh: add GSO support")
> Signed-off-by: Willem de Bruijn 

Acked-by: Jiri Benc 


Re: [BUG net-next] BUG triggered with GRO SKB list_head changes

2018-07-12 Thread Prashant Bhole




On 7/12/2018 7:39 AM, Tyler Hicks wrote:

Starting with the following net-next commit, I see a BUG when starting a
LXD container inside of a KVM guest using virtio-net:

   d4546c2509b1 net: Convert GRO SKB handling to list_head.


Recently I encountered a KASAN use-after-free BUG and git bisect pointed
to the above commit. It looks like this is the same issue, just without
KASAN enabled. I have submitted a bugfix for this BUG with Tyler Hicks in
the Reported-by tag.


-Prashant




Here's what the kernel spits out:

  kernel BUG at /var/scm/kernel/linux/include/linux/skbuff.h:2080!
  invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
  CPU: 0 PID: 1362 Comm: libvirtd Not tainted 4.18.0-rc2+ #69
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Ubuntu-1.8.2-1ubuntu1 04/01/2014
  RIP: 0010:skb_pull+0x36/0x40
  Code: c6 77 24 29 f0 3b 87 84 00 00 00 89 87 80 00 00 00 72 17 89 f6 48 89 f0 48 03 
87 d8 00 00 00 48 89 87 d8 00 00 00 c3 31 c0 c3 <0f> 0b 0f 1f 84 00 00 00
00 00 0f 1f 44 00 00 39 b7 80 00 00 00 76
  RSP: :96737f6039f0 EFLAGS: 00010297
  RAX: 9c66e2f2 RBX:  RCX: 0501
  RDX: 0001 RSI: 000e RDI: 96737f7e3938
  RBP: 967379f40020 R08:  R09: 
  R10: 96737f603988 R11: c0461335 R12: 967379f409e0
  R13: 96737f7e3938 R14:  R15: 967379e96ac0
  FS:  7fc96087e640() GS:96737f60() knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2: 7fc913608aa0 CR3: 5dacc001 CR4: 001606f0
  Call Trace:
   
   br_dev_xmit+0xe1/0x3d0 [bridge]
   dev_hard_start_xmit+0xbc/0x3b0
   __dev_queue_xmit+0xb98/0xc30
   ip_finish_output2+0x3e5/0x670
   ? ip_output+0x7f/0x250
   ip_output+0x7f/0x250
   ? ip_fragment.constprop.5+0x80/0x80
   ip_forward+0x3e2/0x650
   ? ipv4_frags_init_net+0x130/0x130
   ip_rcv+0x2be/0x500
   ? ip_local_deliver_finish+0x3b0/0x3b0
   __netif_receive_skb_core+0x6a8/0xb30
   ? lock_acquire+0xab/0x200
   ? netif_receive_skb_internal+0x2a/0x380
   netif_receive_skb_internal+0x73/0x380
   ? napi_gro_complete+0xcf/0x1b0
   dev_gro_receive+0x374/0x730
   napi_gro_receive+0x4f/0x1d0
   receive_buf+0x4b6/0x1930 [virtio_net]
   ? detach_buf+0x69/0x120 [virtio_ring]
   virtnet_poll+0x122/0x2e0 [virtio_net]
   net_rx_action+0x207/0x450
   __do_softirq+0x149/0x4ea
   irq_exit+0xbf/0xd0
   do_IRQ+0x6c/0x130
   common_interrupt+0xf/0xf
   
  RIP: 0010:__radix_tree_lookup+0x28/0xe0
  Code: 00 00 53 49 89 ca 41 bb 40 00 00 00 4c 8b 47 50 4c 89 c0 83 e0 03 48 83 f8 01 
0f 85 a8 00 00 00 4c 89 c0 48 83 e0 fe 0f b6 08 <4c> 89 d8 48 d3 e0 48 83
e8 01 48 39 c6 76 11 e9 9f 00 00 00 4c 89
  RSP: :ae150048fcc0 EFLAGS: 0282 ORIG_RAX: ffd9
  RAX: 96735d2ef908 RBX: 001f RCX: 0006
  RDX:  RSI: 02e2 RDI: 96735d10b788
  RBP: 02e2 R08: 96735d2ef909 R09: 
  R10:  R11: 0040 R12: 001f
  R13: ec01c15f3a80 R14: 001f R15: ae150048fd18
   __do_page_cache_readahead+0x11f/0x2e0
   filemap_fault+0x408/0x660
   ext4_filemap_fault+0x2f/0x40
   __do_fault+0x1f/0xd0
   __handle_mm_fault+0x915/0xfa0
   handle_mm_fault+0x1c2/0x390
   __do_page_fault+0x2f6/0x580
   ? async_page_fault+0x5/0x20
   async_page_fault+0x1b/0x20
  RIP: 0033:0x7fc913608aa0
  Code: Bad RIP value.
  RSP: 002b:7ffcfa9c7f08 EFLAGS: 00010206
  RAX:  RBX: 0003 RCX: 0080
  RDX: 0006 RSI: 7fc913a74bf8 RDI: 7fc913df9720
  RBP: 0001 R08: 55df45795700 R09: 
  R10: 55df4574c010 R11: 0001 R12: 7ffcfa9c8c38
  R13: 7ffcfa9c8c48 R14: 7fc913dc3d70 R15: 55df4578ab30
  Modules linked in: veth ebtable_filter ebtables ipt_MASQUERADE xt_CHECKSUM 
xt_comment xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 
nf_nat nf_conntrack libcrc32c iptable_mangle iptable_filter bpfilter bridge stp 
llc fuse kvm_intel kvm irqbypass 9pnet_virtio 9pnet virtio_balloon ib_iser 
rdma_cm configfs iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi ip_tables x_tables virtio_net net_failover virtio_blk 
failover crc32_pclmul crc32c_intel pcbc aesni_intel aes_x86_64 crypto_simd 
cryptd glue_helper virtio_pci psmouse virtio_ring virtio

I'm not very familiar with the GRO or IP fragmentation code but I was
able to identify that this change "fixes" the issue:

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7ccc601b55d9..a5cea572a7f1 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -666,6 +666,7 @@ struct sk_buff {
/* These two members must be first. */
struct sk_buff  *next;
struct sk_buff  *prev;
+   struct list_headli

Re: [PATCH] net: convert gro_count to bitmask

2018-07-12 Thread Stefano Brivio
On Thu, 12 Jul 2018 02:31:10 +
"Li,Rongqing"  wrote:

> > -----Original Message-----
> > From: Stefano Brivio [mailto:sbri...@redhat.com]
> > Sent: July 11, 2018 18:52
> > To: Li,Rongqing 
> > Cc: netdev@vger.kernel.org; Eric Dumazet 
> > Subject: Re: [PATCH] net: convert gro_count to bitmask
> > 
> > On Wed, 11 Jul 2018 17:15:53 +0800
> > Li RongQing  wrote:
> >   
> > > @@ -5380,6 +5382,12 @@ static enum gro_result dev_gro_receive(struct  
> > napi_struct *napi, struct sk_buff  
> > >   if (grow > 0)
> > >   gro_pull_from_frag0(skb, grow);
> > >  ok:
> > > + if (napi->gro_hash[hash].count)
> > > + if (!test_bit(hash, &napi->gro_bitmask))
> > > + set_bit(hash, &napi->gro_bitmask);
> > > + else if (test_bit(hash, &napi->gro_bitmask))
> > > + clear_bit(hash, &napi->gro_bitmask);  
> > 
> > This might not do what you want.
> > 
> > --  
> 
> could you show more detail?

$ cat if1.c; gcc -o if1 if1.c
#include <stdio.h>

int main()
{
if (1)
if (0)
;
else if (2)
printf("whoops\n");

return 0;
}
$ ./if1
whoops

$ cat if2.c; gcc -o if2 if2.c
#include <stdio.h>

int main()
{
if (1) {
if (0)
;
} else if (2) {
printf("whoops\n");
}

return 0;
}
$ ./if2
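
In the quoted hunk the trailing "else if" therefore binds to the inner
"if", so when gro_hash[hash].count is non-zero and the bit is already set,
the bit gets cleared instead of left alone, and the "clear when the list is
empty" case is never reached. A sketch of what was presumably intended
(untested):

	if (napi->gro_hash[hash].count)
		set_bit(hash, &napi->gro_bitmask);
	else
		clear_bit(hash, &napi->gro_bitmask);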

-- 
Stefano


Re: [PATCH net-next] net/tls: Removed redundant variable from 'struct tls_sw_context_rx'

2018-07-12 Thread Boris Pismenny

Hi Vakul,

On 7/12/2018 7:03 AM, Vakul Garg wrote:

The variable 'decrypted' in 'struct tls_sw_context_rx' is redundant and
is being set/unset without purpose. Simplified the code by removing it.



AFAIU, this variable has an important use here. It tracks whether the
current record has already been decrypted across invocations of the
recv/splice system calls. Otherwise, a record would be decrypted more
than once if it was not read in its entirety.
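
A rough sketch of the pattern (simplified from the tls_sw_recvmsg() code
quoted below; the zerocopy path and most error handling are omitted):

	/* Decrypt the current record only once.  A partial read leaves
	 * recv_pkt queued with 'decrypted' still set, so the next
	 * recvmsg()/splice() call skips straight to copying the rest of
	 * the plaintext instead of decrypting the record again.
	 */
	if (!ctx->decrypted) {
		err = decrypt_skb(sk, skb, NULL);
		if (err < 0) {
			tls_err_abort(sk, EBADMSG);
			goto recv_end;
		}
		ctx->decrypted = true;
	}
	/* ... copy up to 'len' bytes to the caller ... */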



Signed-off-by: Vakul Garg 
---
  include/net/tls.h |  1 -
  net/tls/tls_sw.c  | 87 ---
  2 files changed, 38 insertions(+), 50 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 70c273777fe9..528d0c2d6cc2 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -113,7 +113,6 @@ struct tls_sw_context_rx {
struct poll_table_struct *wait);
struct sk_buff *recv_pkt;
u8 control;
-   bool decrypted;
  
  	char rx_aad_ciphertext[TLS_AAD_SPACE_SIZE];

char rx_aad_plaintext[TLS_AAD_SPACE_SIZE];
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 0d670c8adf18..e5f2de2c3fd6 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -81,8 +81,6 @@ static int tls_do_decryption(struct sock *sk,
rxm->full_len -= tls_ctx->rx.overhead_size;
tls_advance_record_sn(sk, &tls_ctx->rx);
  
-	ctx->decrypted = true;

-
ctx->saved_data_ready(sk);
  
  out:

@@ -756,6 +754,9 @@ int tls_sw_recvmsg(struct sock *sk,
bool cmsg = false;
int target, err = 0;
long timeo;
+   int page_count;
+   int to_copy;
+
  
  	flags |= nonblock;
  
@@ -792,46 +793,38 @@ int tls_sw_recvmsg(struct sock *sk,

goto recv_end;
}
  
-		if (!ctx->decrypted) {

-   int page_count;
-   int to_copy;
-
-   page_count = iov_iter_npages(&msg->msg_iter,
-MAX_SKB_FRAGS);
-   to_copy = rxm->full_len - tls_ctx->rx.overhead_size;
-   if (to_copy <= len && page_count < MAX_SKB_FRAGS &&
-   likely(!(flags & MSG_PEEK)))  {
-   struct scatterlist sgin[MAX_SKB_FRAGS + 1];
-   int pages = 0;
-
-   zc = true;
-   sg_init_table(sgin, MAX_SKB_FRAGS + 1);
-   sg_set_buf(&sgin[0], ctx->rx_aad_plaintext,
-  TLS_AAD_SPACE_SIZE);
-
-   err = zerocopy_from_iter(sk, &msg->msg_iter,
-to_copy, &pages,
-&chunk, &sgin[1],
-MAX_SKB_FRAGS, false);
-   if (err < 0)
-   goto fallback_to_reg_recv;
-
-   err = decrypt_skb(sk, skb, sgin);
-   for (; pages > 0; pages--)
-   put_page(sg_page(&sgin[pages]));
-   if (err < 0) {
-   tls_err_abort(sk, EBADMSG);
-   goto recv_end;
-   }
-   } else {
+   page_count = iov_iter_npages(&msg->msg_iter, MAX_SKB_FRAGS);
+   to_copy = rxm->full_len - tls_ctx->rx.overhead_size;
+
+   if (to_copy <= len && page_count < MAX_SKB_FRAGS &&
+   likely(!(flags & MSG_PEEK)))  {
+   struct scatterlist sgin[MAX_SKB_FRAGS + 1];
+   int pages = 0;
+
+   zc = true;
+   sg_init_table(sgin, MAX_SKB_FRAGS + 1);
+   sg_set_buf(&sgin[0], ctx->rx_aad_plaintext,
+  TLS_AAD_SPACE_SIZE);
+   err = zerocopy_from_iter(sk, &msg->msg_iter, to_copy,
+&pages, &chunk, &sgin[1],
+MAX_SKB_FRAGS, false);
+   if (err < 0)
+   goto fallback_to_reg_recv;
+
+   err = decrypt_skb(sk, skb, sgin);
+   for (; pages > 0; pages--)
+   put_page(sg_page(&sgin[pages]));
+   if (err < 0) {
+   tls_err_abort(sk, EBADMSG);
+   goto recv_end;
+   }
+   } else {
  fallback_to_reg_recv:
-   err = decrypt_skb(sk, skb, NULL);
-   if (err < 0) {
-   tls_err_abort(sk, EBADMSG);
- 

RE: [PATCH net-next] net/tls: Removed redundant variable from 'struct tls_sw_context_rx'

2018-07-12 Thread Vakul Garg
Hi Boris

Thanks for explaining.
A few questions/observations:

1. Isn't 'ctx->decrypted = true' a redundant statement in tls_do_decryption()?
The same assignment is repeated in tls_recvmsg() after calling decrypt_skb().

2. Similarly, ctx->saved_data_ready(sk) seems unnecessary in
tls_do_decryption(), because tls_do_decryption() is already triggered from
tls_recvmsg(), i.e. from the user space app context.

3. In tls_queue(), I think strp->sk->sk_state_change() needs to be replaced 
with ctx->saved_data_ready().
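
For point 3, the suggested change would look roughly like this (sketch
only, assuming tls_queue() currently wakes the reader through the generic
sk_state_change() hook):

-	strp->sk->sk_state_change(strp->sk);
+	ctx->saved_data_ready(strp->sk);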

Regards

Vakul

> -Original Message-
> From: Boris Pismenny [mailto:bor...@mellanox.com]
> Sent: Thursday, July 12, 2018 4:11 PM
> To: Vakul Garg ; da...@davemloft.net;
> davejwat...@fb.com; netdev@vger.kernel.org
> Cc: avia...@mellanox.com
> Subject: Re: [PATCH net-next] net/tls: Removed redundant variable from
> 'struct tls_sw_context_rx'
> 
> Hi Vakul,
> 
> On 7/12/2018 7:03 AM, Vakul Garg wrote:
> > The variable 'decrypted' in 'struct tls_sw_context_rx' is redundant
> > and is being set/unset without purpose. Simplified the code by removing it.
> >
> 
> AFAIU, this variable has an important use here. It keeps the state whether
> the current record has been decrypted between invocations of the
> recv/splice system calls. Otherwise, some records would be decrypted more
> than once if the entire record was not read.




> 
> > Signed-off-by: Vakul Garg 
> > ---
> >   include/net/tls.h |  1 -
> >   net/tls/tls_sw.c  | 87 
> > 
> ---
> >   2 files changed, 38 insertions(+), 50 deletions(-)
> >
> > diff --git a/include/net/tls.h b/include/net/tls.h index
> > 70c273777fe9..528d0c2d6cc2 100644
> > --- a/include/net/tls.h
> > +++ b/include/net/tls.h
> > @@ -113,7 +113,6 @@ struct tls_sw_context_rx {
> > struct poll_table_struct *wait);
> > struct sk_buff *recv_pkt;
> > u8 control;
> > -   bool decrypted;
> >
> > char rx_aad_ciphertext[TLS_AAD_SPACE_SIZE];
> > char rx_aad_plaintext[TLS_AAD_SPACE_SIZE];
> > diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index
> > 0d670c8adf18..e5f2de2c3fd6 100644
> > --- a/net/tls/tls_sw.c
> > +++ b/net/tls/tls_sw.c
> > @@ -81,8 +81,6 @@ static int tls_do_decryption(struct sock *sk,
> > rxm->full_len -= tls_ctx->rx.overhead_size;
> > tls_advance_record_sn(sk, &tls_ctx->rx);
> >
> > -   ctx->decrypted = true;
> > -
> > ctx->saved_data_ready(sk);
> >
> >   out:
> > @@ -756,6 +754,9 @@ int tls_sw_recvmsg(struct sock *sk,
> > bool cmsg = false;
> > int target, err = 0;
> > long timeo;
> > +   int page_count;
> > +   int to_copy;
> > +
> >
> > flags |= nonblock;
> >
> > @@ -792,46 +793,38 @@ int tls_sw_recvmsg(struct sock *sk,
> > goto recv_end;
> > }
> >
> > -   if (!ctx->decrypted) {
> > -   int page_count;
> > -   int to_copy;
> > -
> > -   page_count = iov_iter_npages(&msg->msg_iter,
> > -MAX_SKB_FRAGS);
> > -   to_copy = rxm->full_len - tls_ctx->rx.overhead_size;
> > -   if (to_copy <= len && page_count < MAX_SKB_FRAGS
> &&
> > -   likely(!(flags & MSG_PEEK)))  {
> > -   struct scatterlist sgin[MAX_SKB_FRAGS + 1];
> > -   int pages = 0;
> > -
> > -   zc = true;
> > -   sg_init_table(sgin, MAX_SKB_FRAGS + 1);
> > -   sg_set_buf(&sgin[0], ctx->rx_aad_plaintext,
> > -  TLS_AAD_SPACE_SIZE);
> > -
> > -   err = zerocopy_from_iter(sk, &msg-
> >msg_iter,
> > -to_copy, &pages,
> > -&chunk, &sgin[1],
> > -MAX_SKB_FRAGS,
>   false);
> > -   if (err < 0)
> > -   goto fallback_to_reg_recv;
> > -
> > -   err = decrypt_skb(sk, skb, sgin);
> > -   for (; pages > 0; pages--)
> > -   put_page(sg_page(&sgin[pages]));
> > -   if (err < 0) {
> > -   tls_err_abort(sk, EBADMSG);
> > -   goto recv_end;
> > -   }
> > -   } else {
> > +   page_count = iov_iter_npages(&msg->msg_iter,
> MAX_SKB_FRAGS);
> > +   to_copy = rxm->full_len - tls_ctx->rx.overhead_size;
> > +
> > +   if (to_copy <= len && page_count < MAX_SKB_FRAGS &&
> > +   likely(!(flags & MSG_PEEK)))  {
> > +   struct scatterlist sgin[MAX_SKB_FRAGS + 1];
> > +   int pages = 0;
> > +
> > +   zc = true;
> > +   sg_init_table(sgin

[PATCH bpf-next 3/3] tools: bpf: build and install man page for eBPF helpers from bpftool/

2018-07-12 Thread Quentin Monnet
Provide a new Makefile.helpers in tools/bpf, in order to build and
install the man page for eBPF helpers. This Makefile is also included in
the one used to build bpftool documentation, so that it can be called
either on its own (cd tools/bpf && make -f Makefile.helpers) or from
bpftool directory (cd tools/bpf/bpftool && make doc, or
cd tools/bpf/bpftool/Documentation && make helpers).

Makefile.helpers is not added directly to bpftool to avoid changing its
Makefile too much (the helpers are not 100% directly related to bpftool).
But being able to build the page from the bpftool directory lets us
package the helpers man page with bpftool and install it along with the
bpftool documentation, so that the documentation for the helpers becomes
easily available to developers through the "man" program.

Cc: linux-...@vger.kernel.org
Suggested-by: Daniel Borkmann 
Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 tools/bpf/Makefile.helpers   | 59 
 tools/bpf/bpftool/Documentation/Makefile | 13 ---
 2 files changed, 67 insertions(+), 5 deletions(-)
 create mode 100644 tools/bpf/Makefile.helpers

diff --git a/tools/bpf/Makefile.helpers b/tools/bpf/Makefile.helpers
new file mode 100644
index ..c34fea77f39f
--- /dev/null
+++ b/tools/bpf/Makefile.helpers
@@ -0,0 +1,59 @@
+ifndef allow-override
+  include ../scripts/Makefile.include
+  include ../scripts/utilities.mak
+else
+  # Assume Makefile.helpers is being run from bpftool/Documentation
+  # subdirectory. Go up two more directories to fetch bpf.h header and
+  # associated script.
+  UP2DIR := ../../
+endif
+
+INSTALL ?= install
+RM ?= rm -f
+RMDIR ?= rmdir --ignore-fail-on-non-empty
+
+ifeq ($(V),1)
+  Q =
+else
+  Q = @
+endif
+
+prefix ?= /usr/local
+mandir ?= $(prefix)/man
+man7dir = $(mandir)/man7
+
+HELPERS_RST = bpf-helpers.rst
+MAN7_RST = $(HELPERS_RST)
+
+_DOC_MAN7 = $(patsubst %.rst,%.7,$(MAN7_RST))
+DOC_MAN7 = $(addprefix $(OUTPUT),$(_DOC_MAN7))
+
+helpers: man7
+man7: $(DOC_MAN7)
+
+RST2MAN_DEP := $(shell command -v rst2man 2>/dev/null)
+
+$(OUTPUT)$(HELPERS_RST): $(UP2DIR)../../include/uapi/linux/bpf.h
+   $(QUIET_GEN)$(UP2DIR)../../scripts/bpf_helpers_doc.py --filename $< > $@
+
+$(OUTPUT)%.7: $(OUTPUT)%.rst
+ifndef RST2MAN_DEP
+   $(error "rst2man not found, but required to generate man pages")
+endif
+   $(QUIET_GEN)rst2man $< > $@
+
+helpers-clean:
+   $(call QUIET_CLEAN, eBPF_helpers-manpage)
+   $(Q)$(RM) $(DOC_MAN7) $(OUTPUT)$(HELPERS_RST)
+
+helpers-install: helpers
+   $(call QUIET_INSTALL, eBPF_helpers-manpage)
+   $(Q)$(INSTALL) -d -m 755 $(DESTDIR)$(man7dir)
+   $(Q)$(INSTALL) -m 644 $(DOC_MAN7) $(DESTDIR)$(man7dir)
+
+helpers-uninstall:
+   $(call QUIET_UNINST, eBPF_helpers-manpage)
+   $(Q)$(RM) $(addprefix $(DESTDIR)$(man7dir)/,$(_DOC_MAN7))
+   $(Q)$(RMDIR) $(DESTDIR)$(man7dir)
+
+.PHONY: helpers helpers-clean helpers-install helpers-uninstall
diff --git a/tools/bpf/bpftool/Documentation/Makefile 
b/tools/bpf/bpftool/Documentation/Makefile
index a9d47c1558bb..f7663a3e60c9 100644
--- a/tools/bpf/bpftool/Documentation/Makefile
+++ b/tools/bpf/bpftool/Documentation/Makefile
@@ -15,12 +15,15 @@ prefix ?= /usr/local
 mandir ?= $(prefix)/man
 man8dir = $(mandir)/man8
 
-MAN8_RST = $(wildcard *.rst)
+# Load targets for building eBPF helpers man page.
+include ../../Makefile.helpers
+
+MAN8_RST = $(filter-out $(HELPERS_RST),$(wildcard *.rst))
 
 _DOC_MAN8 = $(patsubst %.rst,%.8,$(MAN8_RST))
 DOC_MAN8 = $(addprefix $(OUTPUT),$(_DOC_MAN8))
 
-man: man8
+man: man8 helpers
 man8: $(DOC_MAN8)
 
 RST2MAN_DEP := $(shell command -v rst2man 2>/dev/null)
@@ -31,16 +34,16 @@ ifndef RST2MAN_DEP
 endif
$(QUIET_GEN)rst2man $< > $@
 
-clean:
+clean: helpers-clean
$(call QUIET_CLEAN, Documentation)
$(Q)$(RM) $(DOC_MAN8)
 
-install: man
+install: man helpers-install
$(call QUIET_INSTALL, Documentation-man)
$(Q)$(INSTALL) -d -m 755 $(DESTDIR)$(man8dir)
$(Q)$(INSTALL) -m 644 $(DOC_MAN8) $(DESTDIR)$(man8dir)
 
-uninstall:
+uninstall: helpers-uninstall
$(call QUIET_UNINST, Documentation-man)
$(Q)$(RM) $(addprefix $(DESTDIR)$(man8dir)/,$(_DOC_MAN8))
$(Q)$(RMDIR) $(DESTDIR)$(man8dir)
-- 
2.14.1



[PATCH bpf-next 1/3] bpf: fix documentation for eBPF helpers

2018-07-12 Thread Quentin Monnet
Minor formatting edits to the eBPF helpers documentation, including
removal of blank lines, a fix to the item list for return values in
bpf_fib_lookup(), and a missing prefix on bpf_skb_load_bytes_relative().

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 include/uapi/linux/bpf.h | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b7db3261c62d..6bcb287a888d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1826,7 +1826,7 @@ union bpf_attr {
  * A non-negative value equal to or less than *size* on success,
  * or a negative error in case of failure.
  *
- * int skb_load_bytes_relative(const struct sk_buff *skb, u32 offset, void 
*to, u32 len, u32 start_header)
+ * int bpf_skb_load_bytes_relative(const struct sk_buff *skb, u32 offset, void 
*to, u32 len, u32 start_header)
  * Description
  * This helper is similar to **bpf_skb_load_bytes**\ () in that
  * it provides an easy way to load *len* bytes from *offset*
@@ -1877,7 +1877,7 @@ union bpf_attr {
  * * < 0 if any input argument is invalid
  * *   0 on success (packet is forwarded, nexthop neighbor exists)
  * * > 0 one of **BPF_FIB_LKUP_RET_** codes explaining why the
- * * packet is not forwarded or needs assist from full stack
+ *   packet is not forwarded or needs assist from full stack
  *
  * int bpf_sock_hash_update(struct bpf_sock_ops_kern *skops, struct bpf_map 
*map, void *key, u64 flags)
  * Description
@@ -2033,7 +2033,6 @@ union bpf_attr {
  * This helper is only available is the kernel was compiled with
  * the **CONFIG_BPF_LIRC_MODE2** configuration option set to
  * "**y**".
- *
  * Return
  * 0
  *
@@ -2053,7 +2052,6 @@ union bpf_attr {
  * This helper is only available is the kernel was compiled with
  * the **CONFIG_BPF_LIRC_MODE2** configuration option set to
  * "**y**".
- *
  * Return
  * 0
  *
-- 
2.14.1



[PATCH bpf-next 2/3] tools: bpf: synchronise BPF UAPI header with tools

2018-07-12 Thread Quentin Monnet
Update with latest changes from include/uapi/linux/bpf.h header.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 tools/include/uapi/linux/bpf.h | 32 
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 59b19b6a40d7..6bcb287a888d 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1826,7 +1826,7 @@ union bpf_attr {
  * A non-negative value equal to or less than *size* on success,
  * or a negative error in case of failure.
  *
- * int skb_load_bytes_relative(const struct sk_buff *skb, u32 offset, void 
*to, u32 len, u32 start_header)
+ * int bpf_skb_load_bytes_relative(const struct sk_buff *skb, u32 offset, void 
*to, u32 len, u32 start_header)
  * Description
  * This helper is similar to **bpf_skb_load_bytes**\ () in that
  * it provides an easy way to load *len* bytes from *offset*
@@ -1857,7 +1857,8 @@ union bpf_attr {
  * is resolved), the nexthop address is returned in ipv4_dst
  * or ipv6_dst based on family, smac is set to mac address of
  * egress device, dmac is set to nexthop mac address, rt_metric
- * is set to metric from route (IPv4/IPv6 only).
+ * is set to metric from route (IPv4/IPv6 only), and ifindex
+ * is set to the device index of the nexthop from the FIB lookup.
  *
  * *plen* argument is the size of the passed in struct.
  * *flags* argument can be a combination of one or more of the
@@ -1873,9 +1874,10 @@ union bpf_attr {
  * *ctx* is either **struct xdp_md** for XDP programs or
  * **struct sk_buff** tc cls_act programs.
  * Return
- * Egress device index on success, 0 if packet needs to continue
- * up the stack for further processing or a negative error in case
- * of failure.
+ * * < 0 if any input argument is invalid
+ * *   0 on success (packet is forwarded, nexthop neighbor exists)
+ * * > 0 one of **BPF_FIB_LKUP_RET_** codes explaining why the
+ *   packet is not forwarded or needs assist from full stack
  *
  * int bpf_sock_hash_update(struct bpf_sock_ops_kern *skops, struct bpf_map 
*map, void *key, u64 flags)
  * Description
@@ -2031,7 +2033,6 @@ union bpf_attr {
  * This helper is only available is the kernel was compiled with
  * the **CONFIG_BPF_LIRC_MODE2** configuration option set to
  * "**y**".
- *
  * Return
  * 0
  *
@@ -2051,7 +2052,6 @@ union bpf_attr {
  * This helper is only available is the kernel was compiled with
  * the **CONFIG_BPF_LIRC_MODE2** configuration option set to
  * "**y**".
- *
  * Return
  * 0
  *
@@ -2612,6 +2612,18 @@ struct bpf_raw_tracepoint_args {
 #define BPF_FIB_LOOKUP_DIRECT  BIT(0)
 #define BPF_FIB_LOOKUP_OUTPUT  BIT(1)
 
+enum {
+   BPF_FIB_LKUP_RET_SUCCESS,  /* lookup successful */
+   BPF_FIB_LKUP_RET_BLACKHOLE,/* dest is blackholed; can be dropped */
+   BPF_FIB_LKUP_RET_UNREACHABLE,  /* dest is unreachable; can be dropped */
+   BPF_FIB_LKUP_RET_PROHIBIT, /* dest not allowed; can be dropped */
+   BPF_FIB_LKUP_RET_NOT_FWDED,/* packet is not forwarded */
+   BPF_FIB_LKUP_RET_FWD_DISABLED, /* fwding is not enabled on ingress */
+   BPF_FIB_LKUP_RET_UNSUPP_LWT,   /* fwd requires encapsulation */
+   BPF_FIB_LKUP_RET_NO_NEIGH, /* no neighbor entry for nh */
+   BPF_FIB_LKUP_RET_FRAG_NEEDED,  /* fragmentation required to fwd */
+};
+
 struct bpf_fib_lookup {
/* input:  network family for lookup (AF_INET, AF_INET6)
 * output: network family of egress nexthop
@@ -2625,7 +2637,11 @@ struct bpf_fib_lookup {
 
/* total length of packet from network header - used for MTU check */
__u16   tot_len;
-   __u32   ifindex;  /* L3 device index for lookup */
+
+   /* input: L3 device index for lookup
+* output: device index from FIB lookup
+*/
+   __u32   ifindex;
 
union {
/* inputs to lookup */
-- 
2.14.1



[PATCH bpf-next 0/3] bpf: install eBPF helper man page along with bpftool doc

2018-07-12 Thread Quentin Monnet
The three patches in this series are related to the documentation for eBPF
helpers. The first patch brings minor formatting edits to the documentation
in include/uapi/linux/bpf.h, and the second one updates the related header
file under tools/.

The third patch adds a Makefile under tools/bpf for generating the
documentation (man pages) about eBPF helpers. The targets defined in this
file can also be called from the bpftool directory (please refer to
relevant commit logs for details).

Quentin Monnet (3):
  bpf: fix documentation for eBPF helpers
  tools: bpf: synchronise BPF UAPI header with tools
  tools: bpf: build and install man page for eBPF helpers from bpftool/

 include/uapi/linux/bpf.h |  6 ++--
 tools/bpf/Makefile.helpers   | 59 
 tools/bpf/bpftool/Documentation/Makefile | 13 ---
 tools/include/uapi/linux/bpf.h   | 32 -
 4 files changed, 93 insertions(+), 17 deletions(-)
 create mode 100644 tools/bpf/Makefile.helpers

-- 
2.14.1



[PATCH net-next v3 03/11] devlink: Add support for creating region snapshots

2018-07-12 Thread Alex Vesker
Each device address region can store multiple snapshots; each snapshot is
identified by a different numerical ID. This ID is used when deleting a
snapshot or when showing a snapshot of a specific address region. This
patch exposes a callback for adding a new snapshot to an address region.
The snapshot is freed using the destructor function when the region is
destroyed or when a snapshot delete command is received from the devlink
user tool.
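
A rough driver-side usage sketch (function names are the ones added in
this series; the buffer allocation and error handling around it are
assumptions):

	/* destructor handed to devlink; called when the snapshot is deleted */
	static void my_snapshot_free(const void *data)
	{
		kvfree(data);
	}

	...
	u32 id = devlink_region_shapshot_id_get(devlink);

	err = devlink_region_snapshot_create(region, data_len, data,
					     id, &my_snapshot_free);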

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
---
 include/net/devlink.h | 13 +++
 net/core/devlink.c| 95 +++
 2 files changed, 108 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index f27d859..905f0bb 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -401,6 +401,8 @@ enum devlink_param_generic_id {
 
 struct devlink_region;
 
+typedef void devlink_snapshot_data_dest_t(const void *data);
+
 struct devlink_ops {
int (*reload)(struct devlink *devlink, struct netlink_ext_ack *extack);
int (*port_type_set)(struct devlink_port *devlink_port,
@@ -553,6 +555,9 @@ struct devlink_region *devlink_region_create(struct devlink 
*devlink,
 u64 region_size);
 void devlink_region_destroy(struct devlink_region *region);
 u32 devlink_region_shapshot_id_get(struct devlink *devlink);
+int devlink_region_snapshot_create(struct devlink_region *region, u64 data_len,
+  u8 *data, u32 snapshot_id,
+  devlink_snapshot_data_dest_t 
*data_destructor);
 
 #else
 
@@ -800,6 +805,14 @@ static inline bool 
devlink_dpipe_table_counter_enabled(struct devlink *devlink,
return 0;
 }
 
+static inline int
+devlink_region_snapshot_create(struct devlink_region *region, u64 data_len,
+  u8 *data, u32 snapshot_id,
+  devlink_snapshot_data_dest_t *data_destructor)
+{
+   return 0;
+}
+
 #endif
 
 #endif /* _NET_DEVLINK_H_ */
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 6c92ddd..7d09fe6 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -336,6 +336,15 @@ struct devlink_region {
u64 size;
 };
 
+struct devlink_snapshot {
+   struct list_head list;
+   struct devlink_region *region;
+   devlink_snapshot_data_dest_t *data_destructor;
+   u64 data_len;
+   u8 *data;
+   u32 id;
+};
+
 static struct devlink_region *
 devlink_region_get_by_name(struct devlink *devlink, const char *region_name)
 {
@@ -348,6 +357,26 @@ struct devlink_region {
return NULL;
 }
 
+static struct devlink_snapshot *
+devlink_region_snapshot_get_by_id(struct devlink_region *region, u32 id)
+{
+   struct devlink_snapshot *snapshot;
+
+   list_for_each_entry(snapshot, ®ion->snapshot_list, list)
+   if (snapshot->id == id)
+   return snapshot;
+
+   return NULL;
+}
+
+static void devlink_region_snapshot_del(struct devlink_snapshot *snapshot)
+{
+   snapshot->region->cur_snapshots--;
+   list_del(&snapshot->list);
+   (*snapshot->data_destructor)(snapshot->data);
+   kfree(snapshot);
+}
+
 #define DEVLINK_NL_FLAG_NEED_DEVLINK   BIT(0)
 #define DEVLINK_NL_FLAG_NEED_PORT  BIT(1)
 #define DEVLINK_NL_FLAG_NEED_SBBIT(2)
@@ -4185,8 +4214,14 @@ struct devlink_region *devlink_region_create(struct 
devlink *devlink,
 void devlink_region_destroy(struct devlink_region *region)
 {
struct devlink *devlink = region->devlink;
+   struct devlink_snapshot *snapshot, *ts;
 
mutex_lock(&devlink->lock);
+
+   /* Free all snapshots of region */
+   list_for_each_entry_safe(snapshot, ts, ®ion->snapshot_list, list)
+   devlink_region_snapshot_del(snapshot);
+
list_del(®ion->list);
mutex_unlock(&devlink->lock);
kfree(region);
@@ -4214,6 +4249,66 @@ u32 devlink_region_shapshot_id_get(struct devlink 
*devlink)
 }
 EXPORT_SYMBOL_GPL(devlink_region_shapshot_id_get);
 
+/**
+ * devlink_region_snapshot_create - create a new snapshot
+ * This will add a new snapshot of a region. The snapshot
+ * will be stored on the region struct and can be accessed
+ * from devlink. This is useful for future analyses of snapshots.
+ * Multiple snapshots can be created on a region.
+ * The @snapshot_id should be obtained using the getter function.
+ *
+ * @devlink_region: devlink region of the snapshot
+ * @data_len: size of snapshot data
+ * @data: snapshot data
+ * @snapshot_id: snapshot id to be created
+ * @data_destructor: pointer to destructor function to free data
+ */
+int devlink_region_snapshot_create(struct devlink_region *region, u64 data_len,
+  u8 *data, u32 snapshot_id,
+  devlink_snapshot_data_dest_t 
*data_destructor)
+{
+   struct devlink *devlink = region->devlink;
+   struct devlink_s

[PATCH net-next v3 00/11] devlink: Add support for region access

2018-07-12 Thread Alex Vesker
This is a proposal which will allow access to driver defined address
regions using devlink. Each device can create its supported address
regions and register them. A device which exposes a region will allow
access to it using devlink.

The suggested implementation will allow exposing regions to the user,
reading and dumping snapshots taken from different regions. 
A snapshot represents a memory image of a region taken by the driver.

If a device collects a snapshot of an address region, it can later be
exposed using the devlink region read or dump commands.
This functionality allows future analyses to be performed on the
snapshots.

The major benefit of this support is not only to provide access to
internal address regions which were inaccessible to the user but also
to provide an additional way to debug complex error states using the
region snapshots.

Implemented commands:
$ devlink region help
$ devlink region show [ DEV/REGION ]
$ devlink region del DEV/REGION snapshot SNAPSHOT_ID
$ devlink region dump DEV/REGION [ snapshot SNAPSHOT_ID ]
$ devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ]
address ADDRESS length LENGTH

Show all of the exposed regions with region sizes:
$ devlink region show
pci/:00:05.0/cr-space: size 1048576 snapshot [1 2]
pci/:00:05.0/fw-health: size 64 snapshot [1 2]

Delete a snapshot using:
$ devlink region del pci/:00:05.0/cr-space snapshot 1

Dump a snapshot:
$ devlink region dump pci/:00:05.0/fw-health snapshot 1
 0014 95dc 0014 9514 0035 1670 0034 db30
0010    ff04 0029 8c00 0028 8cc8
0020 0016 0bb8 0016 1720   c00f 3ffc
0030 bada cce5 bada cce5 bada cce5 bada cce5

Read a specific part of a snapshot:
$ devlink region read pci/:00:05.0/fw-health snapshot 1 address 0 
length 16
 0014 95dc 0014 9514 0035 1670 0034 db30

For more information you can check devlink-region.8 man page

Future:
There is a plan to extend the support to include a write command
as well as performing read and dump live region

v1->v2:
-Add a parameter to enable devlink region snapshot
-Allocate snapshot memory using kvmalloc
-Introduce destructor function devlink_snapshot_data_dest_t to avoid
 double allocation

v2->v3:
-Fix incorrect comment in devlink.h for DEVLINK_ATTR_REGION_SIZE
 from u32 to u64

Alex Vesker (11):
  devlink: Add support for creating and destroying regions
  devlink: Add callback to query for snapshot id before snapshot create
  devlink: Add support for creating region snapshots
  devlink: Add support for region get command
  devlink: Extend the support querying for region snapshot IDs
  devlink: Add support for region snapshot delete command
  devlink: Add support for region snapshot read command
  net/mlx4_core: Add health buffer address capability
  net/mlx4_core: Add Crdump FW snapshot support
  devlink: Add generic parameters region_snapshot
  net/mlx4_core: Use devlink region_snapshot parameter

 drivers/net/ethernet/mellanox/mlx4/Makefile |   2 +-
 drivers/net/ethernet/mellanox/mlx4/catas.c  |   6 +-
 drivers/net/ethernet/mellanox/mlx4/crdump.c | 239 ++
 drivers/net/ethernet/mellanox/mlx4/fw.c |   5 +-
 drivers/net/ethernet/mellanox/mlx4/fw.h |   1 +
 drivers/net/ethernet/mellanox/mlx4/main.c   |  52 ++-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h   |   4 +
 include/linux/mlx4/device.h |   8 +
 include/net/devlink.h   |  47 ++
 include/uapi/linux/devlink.h|  18 +
 net/core/devlink.c  | 647 
 11 files changed, 1024 insertions(+), 5 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx4/crdump.c

-- 
1.8.3.1



[PATCH net-next v3 01/11] devlink: Add support for creating and destroying regions

2018-07-12 Thread Alex Vesker
This allows a device to register its supported address regions.
Each address region can be accessed directly, for example to read
the snapshots taken of this address space.
Drivers are not limited in the name selection for different regions.
Examples of region names are: pci cr-space, register-space.
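
A minimal driver-side sketch of the new API (the region name, snapshot
count and size are examples, and the error handling assumes the usual
ERR_PTR convention):

	struct devlink_region *region;

	region = devlink_region_create(devlink, "cr-space",
				       8 /* max snapshots */, cr_space_size);
	if (IS_ERR(region))
		return PTR_ERR(region);

	/* ... on teardown ... */
	devlink_region_destroy(region);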

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
---
 include/net/devlink.h | 22 ++
 net/core/devlink.c| 84 +++
 2 files changed, 106 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index f67c29c..e539765 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -28,6 +28,7 @@ struct devlink {
struct list_head dpipe_table_list;
struct list_head resource_list;
struct list_head param_list;
+   struct list_head region_list;
struct devlink_dpipe_headers *dpipe_headers;
const struct devlink_ops *ops;
struct device *dev;
@@ -397,6 +398,8 @@ enum devlink_param_generic_id {
.validate = _validate,  \
 }
 
+struct devlink_region;
+
 struct devlink_ops {
int (*reload)(struct devlink *devlink, struct netlink_ext_ack *extack);
int (*port_type_set)(struct devlink_port *devlink_port,
@@ -543,6 +546,11 @@ int devlink_param_driverinit_value_get(struct devlink 
*devlink, u32 param_id,
 int devlink_param_driverinit_value_set(struct devlink *devlink, u32 param_id,
   union devlink_param_value init_val);
 void devlink_param_value_changed(struct devlink *devlink, u32 param_id);
+struct devlink_region *devlink_region_create(struct devlink *devlink,
+const char *region_name,
+u32 region_max_snapshots,
+u64 region_size);
+void devlink_region_destroy(struct devlink_region *region);
 
 #else
 
@@ -770,6 +778,20 @@ static inline bool 
devlink_dpipe_table_counter_enabled(struct devlink *devlink,
 {
 }
 
+static inline struct devlink_region *
+devlink_region_create(struct devlink *devlink,
+ const char *region_name,
+ u32 region_max_snapshots,
+ u64 region_size)
+{
+   return NULL;
+}
+
+static inline void
+devlink_region_destroy(struct devlink_region *region)
+{
+}
+
 #endif
 
 #endif /* _NET_DEVLINK_H_ */
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 470f3db..cac8561 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -326,6 +326,28 @@ static int devlink_sb_pool_index_get_from_info(struct 
devlink_sb *devlink_sb,
  pool_type, p_tc_index);
 }
 
+struct devlink_region {
+   struct devlink *devlink;
+   struct list_head list;
+   const char *name;
+   struct list_head snapshot_list;
+   u32 max_snapshots;
+   u32 cur_snapshots;
+   u64 size;
+};
+
+static struct devlink_region *
+devlink_region_get_by_name(struct devlink *devlink, const char *region_name)
+{
+   struct devlink_region *region;
+
+   list_for_each_entry(region, &devlink->region_list, list)
+   if (!strcmp(region->name, region_name))
+   return region;
+
+   return NULL;
+}
+
 #define DEVLINK_NL_FLAG_NEED_DEVLINK   BIT(0)
 #define DEVLINK_NL_FLAG_NEED_PORT  BIT(1)
 #define DEVLINK_NL_FLAG_NEED_SBBIT(2)
@@ -3358,6 +3380,7 @@ struct devlink *devlink_alloc(const struct devlink_ops 
*ops, size_t priv_size)
INIT_LIST_HEAD_RCU(&devlink->dpipe_table_list);
INIT_LIST_HEAD(&devlink->resource_list);
INIT_LIST_HEAD(&devlink->param_list);
+   INIT_LIST_HEAD(&devlink->region_list);
mutex_init(&devlink->lock);
return devlink;
 }
@@ -4109,6 +4132,67 @@ void devlink_param_value_changed(struct devlink 
*devlink, u32 param_id)
 }
 EXPORT_SYMBOL_GPL(devlink_param_value_changed);
 
+/**
+ * devlink_region_create - create a new address region
+ *
+ * @devlink: devlink
+ * @region_name: region name
+ * @region_max_snapshots: Maximum supported number of snapshots for region
+ * @region_size: size of region
+ */
+struct devlink_region *devlink_region_create(struct devlink *devlink,
+const char *region_name,
+u32 region_max_snapshots,
+u64 region_size)
+{
+   struct devlink_region *region;
+   int err = 0;
+
+   mutex_lock(&devlink->lock);
+
+   if (devlink_region_get_by_name(devlink, region_name)) {
+   err = -EEXIST;
+   goto unlock;
+   }
+
+   region = kzalloc(sizeof(*region), GFP_KERNEL);
+   if (!region) {
+   err = -ENOMEM;
+   goto unlock;
+   }
+
+   region->devlink = devlink;
+   region->max_snapshots = re

[PATCH net-next v3 04/11] devlink: Add support for region get command

2018-07-12 Thread Alex Vesker
Add support for the DEVLINK_CMD_REGION_GET command, which is used to
query the supported DEV/REGION values of devlink devices.
Both doit and dumpit are supported.

Reply includes:
  BUS_NAME, DEVICE_NAME, REGION_NAME, REGION_SIZE

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
---
 include/uapi/linux/devlink.h |   6 +++
 net/core/devlink.c   | 114 +++
 2 files changed, 120 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 68641fb..28bfa8a 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -83,6 +83,9 @@ enum devlink_command {
DEVLINK_CMD_PARAM_NEW,
DEVLINK_CMD_PARAM_DEL,
 
+   DEVLINK_CMD_REGION_GET,
+   DEVLINK_CMD_REGION_SET,
+
/* add new commands above here */
__DEVLINK_CMD_MAX,
DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
@@ -262,6 +265,9 @@ enum devlink_attr {
DEVLINK_ATTR_PARAM_VALUE_DATA,  /* dynamic */
DEVLINK_ATTR_PARAM_VALUE_CMODE, /* u8 */
 
+   DEVLINK_ATTR_REGION_NAME,   /* string */
+   DEVLINK_ATTR_REGION_SIZE,   /* u64 */
+
/* add new attributes above here, update the policy in devlink.c */
 
__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 7d09fe6..221ddb6 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3149,6 +3149,111 @@ static void devlink_param_unregister_one(struct devlink 
*devlink,
kfree(param_item);
 }
 
+static int devlink_nl_region_fill(struct sk_buff *msg, struct devlink *devlink,
+ enum devlink_command cmd, u32 portid,
+ u32 seq, int flags,
+ struct devlink_region *region)
+{
+   void *hdr;
+   int err;
+
+   hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
+   if (!hdr)
+   return -EMSGSIZE;
+
+   err = devlink_nl_put_handle(msg, devlink);
+   if (err)
+   goto nla_put_failure;
+
+   err = nla_put_string(msg, DEVLINK_ATTR_REGION_NAME, region->name);
+   if (err)
+   goto nla_put_failure;
+
+   err = nla_put_u64_64bit(msg, DEVLINK_ATTR_REGION_SIZE,
+   region->size,
+   DEVLINK_ATTR_PAD);
+   if (err)
+   goto nla_put_failure;
+
+   genlmsg_end(msg, hdr);
+   return 0;
+
+nla_put_failure:
+   genlmsg_cancel(msg, hdr);
+   return err;
+}
+
+static int devlink_nl_cmd_region_get_doit(struct sk_buff *skb,
+ struct genl_info *info)
+{
+   struct devlink *devlink = info->user_ptr[0];
+   struct devlink_region *region;
+   const char *region_name;
+   struct sk_buff *msg;
+   int err;
+
+   if (!info->attrs[DEVLINK_ATTR_REGION_NAME])
+   return -EINVAL;
+
+   region_name = nla_data(info->attrs[DEVLINK_ATTR_REGION_NAME]);
+   region = devlink_region_get_by_name(devlink, region_name);
+   if (!region)
+   return -EINVAL;
+
+   msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+   if (!msg)
+   return -ENOMEM;
+
+   err = devlink_nl_region_fill(msg, devlink, DEVLINK_CMD_REGION_GET,
+info->snd_portid, info->snd_seq, 0,
+region);
+   if (err) {
+   nlmsg_free(msg);
+   return err;
+   }
+
+   return genlmsg_reply(msg, info);
+}
+
+static int devlink_nl_cmd_region_get_dumpit(struct sk_buff *msg,
+   struct netlink_callback *cb)
+{
+   struct devlink_region *region;
+   struct devlink *devlink;
+   int start = cb->args[0];
+   int idx = 0;
+   int err;
+
+   mutex_lock(&devlink_mutex);
+   list_for_each_entry(devlink, &devlink_list, list) {
+   if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+   continue;
+
+   mutex_lock(&devlink->lock);
+   list_for_each_entry(region, &devlink->region_list, list) {
+   if (idx < start) {
+   idx++;
+   continue;
+   }
+   err = devlink_nl_region_fill(msg, devlink,
+DEVLINK_CMD_REGION_GET,
+NETLINK_CB(cb->skb).portid,
+cb->nlh->nlmsg_seq,
+NLM_F_MULTI, region);
+   if (err) {
+   mutex_unlock(&devlink->lock);
+   goto out;
+   }
+   idx++;
+   }
+   mutex_unlock(&devlink->

[PATCH net-next v3 05/11] devlink: Extend the support querying for region snapshot IDs

2018-07-12 Thread Alex Vesker
Extend the DEVLINK_CMD_REGION_GET command to also return the IDs of the
snapshots currently present on the region.
Each reply includes a nested snapshots attribute that can contain
multiple snapshot attributes, each with an ID.
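
The resulting attribute layout in each region reply is:

	DEVLINK_ATTR_REGION_SNAPSHOTS (nested)
		DEVLINK_ATTR_REGION_SNAPSHOT (nested)
			DEVLINK_ATTR_REGION_SNAPSHOT_ID (u32)
		DEVLINK_ATTR_REGION_SNAPSHOT (nested)
			DEVLINK_ATTR_REGION_SNAPSHOT_ID (u32)
		...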

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
---
 include/uapi/linux/devlink.h |  3 +++
 net/core/devlink.c   | 53 
 2 files changed, 56 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 28bfa8a..abde4e3 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -267,6 +267,9 @@ enum devlink_attr {
 
DEVLINK_ATTR_REGION_NAME,   /* string */
DEVLINK_ATTR_REGION_SIZE,   /* u64 */
+   DEVLINK_ATTR_REGION_SNAPSHOTS,  /* nested */
+   DEVLINK_ATTR_REGION_SNAPSHOT,   /* nested */
+   DEVLINK_ATTR_REGION_SNAPSHOT_ID,/* u32 */
 
/* add new attributes above here, update the policy in devlink.c */
 
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 221ddb6..cb75e26 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3149,6 +3149,55 @@ static void devlink_param_unregister_one(struct devlink 
*devlink,
kfree(param_item);
 }
 
+static int devlink_nl_region_snapshot_id_put(struct sk_buff *msg,
+struct devlink *devlink,
+struct devlink_snapshot *snapshot)
+{
+   struct nlattr *snap_attr;
+   int err;
+
+   snap_attr = nla_nest_start(msg, DEVLINK_ATTR_REGION_SNAPSHOT);
+   if (!snap_attr)
+   return -EINVAL;
+
+   err = nla_put_u32(msg, DEVLINK_ATTR_REGION_SNAPSHOT_ID, snapshot->id);
+   if (err)
+   goto nla_put_failure;
+
+   nla_nest_end(msg, snap_attr);
+   return 0;
+
+nla_put_failure:
+   nla_nest_cancel(msg, snap_attr);
+   return err;
+}
+
+static int devlink_nl_region_snapshots_id_put(struct sk_buff *msg,
+ struct devlink *devlink,
+ struct devlink_region *region)
+{
+   struct devlink_snapshot *snapshot;
+   struct nlattr *snapshots_attr;
+   int err;
+
+   snapshots_attr = nla_nest_start(msg, DEVLINK_ATTR_REGION_SNAPSHOTS);
+   if (!snapshots_attr)
+   return -EINVAL;
+
+   list_for_each_entry(snapshot, ®ion->snapshot_list, list) {
+   err = devlink_nl_region_snapshot_id_put(msg, devlink, snapshot);
+   if (err)
+   goto nla_put_failure;
+   }
+
+   nla_nest_end(msg, snapshots_attr);
+   return 0;
+
+nla_put_failure:
+   nla_nest_cancel(msg, snapshots_attr);
+   return err;
+}
+
 static int devlink_nl_region_fill(struct sk_buff *msg, struct devlink *devlink,
  enum devlink_command cmd, u32 portid,
  u32 seq, int flags,
@@ -3175,6 +3224,10 @@ static int devlink_nl_region_fill(struct sk_buff *msg, 
struct devlink *devlink,
if (err)
goto nla_put_failure;
 
+   err = devlink_nl_region_snapshots_id_put(msg, devlink, region);
+   if (err)
+   goto nla_put_failure;
+
genlmsg_end(msg, hdr);
return 0;
 
-- 
1.8.3.1



[PATCH net-next v3 08/11] net/mlx4_core: Add health buffer address capability

2018-07-12 Thread Alex Vesker
The health buffer address is a 32-bit PCI address offset provided by
the FW. This offset is used for reading FW health debug data located
in the shared CR space. CR space is accessible to both the driver and
the FW and allows for different queries and configurations.
The health buffer is always 64B of readable data followed by a lock
which is used to block volatile CR space access.

Signed-off-by: Alex Vesker 
Signed-off-by: Tariq Toukan 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlx4/fw.c   | 5 -
 drivers/net/ethernet/mellanox/mlx4/fw.h   | 1 +
 drivers/net/ethernet/mellanox/mlx4/main.c | 1 +
 include/linux/mlx4/device.h   | 1 +
 4 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c 
b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 46dcbfb..babcfd9 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -825,7 +825,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct 
mlx4_dev_cap *dev_cap)
 #define QUERY_DEV_CAP_QP_RATE_LIMIT_NUM_OFFSET 0xcc
 #define QUERY_DEV_CAP_QP_RATE_LIMIT_MAX_OFFSET 0xd0
 #define QUERY_DEV_CAP_QP_RATE_LIMIT_MIN_OFFSET 0xd2
-
+#define QUERY_DEV_CAP_HEALTH_BUFFER_ADDRESS_OFFSET 0xe4
 
dev_cap->flags2 = 0;
mailbox = mlx4_alloc_cmd_mailbox(dev);
@@ -1082,6 +1082,9 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct 
mlx4_dev_cap *dev_cap)
dev_cap->rl_caps.min_unit = size >> 14;
}
 
+   MLX4_GET(dev_cap->health_buffer_addrs, outbox,
+QUERY_DEV_CAP_HEALTH_BUFFER_ADDRESS_OFFSET);
+
MLX4_GET(field32, outbox, QUERY_DEV_CAP_EXT_2_FLAGS_OFFSET);
if (field32 & (1 << 16))
dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_UPDATE_QP;
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.h 
b/drivers/net/ethernet/mellanox/mlx4/fw.h
index cd6399c..650ae08 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.h
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.h
@@ -128,6 +128,7 @@ struct mlx4_dev_cap {
u32 dmfs_high_rate_qpn_base;
u32 dmfs_high_rate_qpn_range;
struct mlx4_rate_limit_caps rl_caps;
+   u32 health_buffer_addrs;
struct mlx4_port_cap port_cap[MLX4_MAX_PORTS + 1];
bool wol_port[MLX4_MAX_PORTS + 1];
 };
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c 
b/drivers/net/ethernet/mellanox/mlx4/main.c
index c42eddf..806d441 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -523,6 +523,7 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct 
mlx4_dev_cap *dev_cap)
dev->caps.max_rss_tbl_sz = dev_cap->max_rss_tbl_sz;
dev->caps.wol_port[1]  = dev_cap->wol_port[1];
dev->caps.wol_port[2]  = dev_cap->wol_port[2];
+   dev->caps.health_buffer_addrs  = dev_cap->health_buffer_addrs;
 
/* Save uar page shift */
if (!mlx4_is_slave(dev)) {
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 122e7e9..e3bfe76 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -630,6 +630,7 @@ struct mlx4_caps {
u32 vf_caps;
boolwol_port[MLX4_MAX_PORTS + 1];
struct mlx4_rate_limit_caps rl_caps;
+   u32 health_buffer_addrs;
 };
 
 struct mlx4_buf_list {
-- 
1.8.3.1



[PATCH net-next v3 09/11] net/mlx4_core: Add Crdump FW snapshot support

2018-07-12 Thread Alex Vesker
Crdump allows the driver to create a snapshot of the FW PCI crspace
and health buffer during a critical FW issue.
In case of a FW command timeout, the FW getting stuck, or a non-zero
value in the catastrophic buffer, a snapshot will be taken.

The snapshot is exposed using devlink: the cr-space and fw-health
address regions are registered on init, and snapshots are attached
once a new snapshot is collected by the driver.

Signed-off-by: Alex Vesker 
Signed-off-by: Tariq Toukan 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlx4/Makefile |   2 +-
 drivers/net/ethernet/mellanox/mlx4/catas.c  |   6 +-
 drivers/net/ethernet/mellanox/mlx4/crdump.c | 231 
 drivers/net/ethernet/mellanox/mlx4/main.c   |  10 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h   |   4 +
 include/linux/mlx4/device.h |   6 +
 6 files changed, 255 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx4/crdump.c

diff --git a/drivers/net/ethernet/mellanox/mlx4/Makefile 
b/drivers/net/ethernet/mellanox/mlx4/Makefile
index 16b10d0..3f40077 100644
--- a/drivers/net/ethernet/mellanox/mlx4/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx4/Makefile
@@ -3,7 +3,7 @@ obj-$(CONFIG_MLX4_CORE) += mlx4_core.o
 
 mlx4_core-y := alloc.o catas.o cmd.o cq.o eq.o fw.o fw_qos.o icm.o intf.o \
main.o mcg.o mr.o pd.o port.o profile.o qp.o reset.o sense.o \
-   srq.o resource_tracker.o
+   srq.o resource_tracker.o crdump.o
 
 obj-$(CONFIG_MLX4_EN)   += mlx4_en.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/catas.c 
b/drivers/net/ethernet/mellanox/mlx4/catas.c
index 8afe4b5..c81d15b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/catas.c
+++ b/drivers/net/ethernet/mellanox/mlx4/catas.c
@@ -178,10 +178,12 @@ void mlx4_enter_error_state(struct mlx4_dev_persistent 
*persist)
 
dev = persist->dev;
mlx4_err(dev, "device is going to be reset\n");
-   if (mlx4_is_slave(dev))
+   if (mlx4_is_slave(dev)) {
err = mlx4_reset_slave(dev);
-   else
+   } else {
+   mlx4_crdump_collect(dev);
err = mlx4_reset_master(dev);
+   }
 
if (!err) {
mlx4_err(dev, "device was reset successfully\n");
diff --git a/drivers/net/ethernet/mellanox/mlx4/crdump.c 
b/drivers/net/ethernet/mellanox/mlx4/crdump.c
new file mode 100644
index 000..4d5524d
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx4/crdump.c
@@ -0,0 +1,231 @@
+/*
+ * Copyright (c) 2018, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "mlx4.h"
+
+#define BAD_ACCESS 0xBADACCE5
+#define HEALTH_BUFFER_SIZE 0x40
+#define CR_ENABLE_BIT  swab32(BIT(6))
+#define CR_ENABLE_BIT_OFFSET   0xF3F04
+#define MAX_NUM_OF_DUMPS_TO_STORE  (8)
+
+static const char *region_cr_space_str = "cr-space";
+static const char *region_fw_health_str = "fw-health";
+
+/* Set to true in case cr enable bit was set to true before crdump */
+static bool crdump_enbale_bit_set;
+
+static void crdump_enable_crspace_access(struct mlx4_dev *dev,
+u8 __iomem *cr_space)
+{
+   /* Get current enable bit value */
+   crdump_enbale_bit_set =
+   readl(cr_space + CR_ENABLE_BIT_OFFSET) & CR_ENABLE_BIT;
+
+   /* Enable FW CR filter (set bit6 to 0) */
+   if (crdump_enbale_bit_set)
+   writel(readl(cr_space + CR_ENABLE_BIT_OFFSET) & ~CR_ENABLE_BIT,
+  cr_space + CR_ENABLE_BIT_OFFSET);
+
+   /* Enab

[PATCH net-next v3 11/11] net/mlx4_core: Use devlink region_snapshot parameter

2018-07-12 Thread Alex Vesker
This parameter enables capturing a region snapshot of the crspace
during critical errors. It defaults to disabled and can be enabled
using devlink param commands, either at runtime or at driver init.
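(With matching iproute2 support, enabling it at runtime would look
roughly like "devlink dev param set pci/0000:00:08.0 name
region_snapshot_enable value true cmode runtime"; the PCI address is a
placeholder and the userspace side is not part of this series.)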

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
Reviewed-by: Moshe Shemesh 
---
 drivers/net/ethernet/mellanox/mlx4/crdump.c |  8 ++
 drivers/net/ethernet/mellanox/mlx4/main.c   | 41 +
 include/linux/mlx4/device.h |  1 +
 3 files changed, 50 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx4/crdump.c 
b/drivers/net/ethernet/mellanox/mlx4/crdump.c
index 4d5524d..88316c7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/crdump.c
+++ b/drivers/net/ethernet/mellanox/mlx4/crdump.c
@@ -158,6 +158,7 @@ static void mlx4_crdump_collect_fw_health(struct mlx4_dev 
*dev,
 int mlx4_crdump_collect(struct mlx4_dev *dev)
 {
struct devlink *devlink = priv_to_devlink(mlx4_priv(dev));
+   struct mlx4_fw_crdump *crdump = &dev->persist->crdump;
struct pci_dev *pdev = dev->persist->pdev;
unsigned long cr_res_size;
u8 __iomem *cr_space;
@@ -168,6 +169,11 @@ int mlx4_crdump_collect(struct mlx4_dev *dev)
return 0;
}
 
+   if (!crdump->snapshot_enable) {
+   mlx4_info(dev, "crdump: devlink snapshot disabled, skipping\n");
+   return 0;
+   }
+
cr_res_size = pci_resource_len(pdev, 0);
 
cr_space = ioremap(pci_resource_start(pdev, 0), cr_res_size);
@@ -197,6 +203,8 @@ int mlx4_crdump_init(struct mlx4_dev *dev)
struct mlx4_fw_crdump *crdump = &dev->persist->crdump;
struct pci_dev *pdev = dev->persist->pdev;
 
+   crdump->snapshot_enable = false;
+
/* Create cr-space region */
crdump->region_crspace =
devlink_region_create(devlink,
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c 
b/drivers/net/ethernet/mellanox/mlx4/main.c
index 46b0214..2d979a6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -191,6 +191,26 @@ static int mlx4_devlink_ierr_reset_set(struct devlink 
*devlink, u32 id,
return 0;
 }
 
+static int mlx4_devlink_crdump_snapshot_get(struct devlink *devlink, u32 id,
+   struct devlink_param_gset_ctx *ctx)
+{
+   struct mlx4_priv *priv = devlink_priv(devlink);
+   struct mlx4_dev *dev = &priv->dev;
+
+   ctx->val.vbool = dev->persist->crdump.snapshot_enable;
+   return 0;
+}
+
+static int mlx4_devlink_crdump_snapshot_set(struct devlink *devlink, u32 id,
+   struct devlink_param_gset_ctx *ctx)
+{
+   struct mlx4_priv *priv = devlink_priv(devlink);
+   struct mlx4_dev *dev = &priv->dev;
+
+   dev->persist->crdump.snapshot_enable = ctx->val.vbool;
+   return 0;
+}
+
 static int
 mlx4_devlink_max_macs_validate(struct devlink *devlink, u32 id,
   union devlink_param_value val,
@@ -224,6 +244,11 @@ enum mlx4_devlink_param_id {
DEVLINK_PARAM_GENERIC(MAX_MACS,
  BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
  NULL, NULL, mlx4_devlink_max_macs_validate),
+   DEVLINK_PARAM_GENERIC(REGION_SNAPSHOT,
+ BIT(DEVLINK_PARAM_CMODE_RUNTIME) |
+ BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
+ mlx4_devlink_crdump_snapshot_get,
+ mlx4_devlink_crdump_snapshot_set, NULL),
DEVLINK_PARAM_DRIVER(MLX4_DEVLINK_PARAM_ID_ENABLE_64B_CQE_EQE,
 "enable_64b_cqe_eqe", DEVLINK_PARAM_TYPE_BOOL,
 BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
@@ -270,6 +295,11 @@ static void mlx4_devlink_set_params_init_values(struct 
devlink *devlink)
mlx4_devlink_set_init_value(devlink,
MLX4_DEVLINK_PARAM_ID_ENABLE_4K_UAR,
value);
+
+   value.vbool = false;
+   mlx4_devlink_set_init_value(devlink,
+   DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT,
+   value);
 }
 
 static inline void mlx4_set_num_reserved_uars(struct mlx4_dev *dev,
@@ -3862,6 +3892,9 @@ static int mlx4_devlink_port_type_set(struct devlink_port 
*devlink_port,
 
 static void mlx4_devlink_param_load_driverinit_values(struct devlink *devlink)
 {
+   struct mlx4_priv *priv = devlink_priv(devlink);
+   struct mlx4_dev *dev = &priv->dev;
+   struct mlx4_fw_crdump *crdump = &dev->persist->crdump;
union devlink_param_value saved_value;
int err;
 
@@ -3889,6 +3922,14 @@ static void 
mlx4_devlink_param_load_driverinit_values(struct devlink *devlink)
 &saved_value);
if (!err)
enable_4k_uar = saved

[PATCH net-next v3 07/11] devlink: Add support for region snapshot read command

2018-07-12 Thread Alex Vesker
Add support for DEVLINK_CMD_REGION_READ, used for both reading
and dumping region data. Read allows reading from a region-specific
address for a given length. Dump allows reading the full region.
If only a snapshot ID is provided, a snapshot dump will be done.
If a snapshot ID, address and length are provided, a snapshot read
will be done.
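
For example, a request carrying only snapshot ID 1 dumps that entire
snapshot in chunks, while snapshot ID 1 plus address 0 and length 256
returns just that 256-byte window.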

This is used for snapshot access now and will be used in the same
way to access current data on the region.

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
---
 include/uapi/linux/devlink.h |   7 ++
 net/core/devlink.c   | 182 +++
 2 files changed, 189 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index d212e02..79407bb 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -87,6 +87,7 @@ enum devlink_command {
DEVLINK_CMD_REGION_SET,
DEVLINK_CMD_REGION_NEW,
DEVLINK_CMD_REGION_DEL,
+   DEVLINK_CMD_REGION_READ,
 
/* add new commands above here */
__DEVLINK_CMD_MAX,
@@ -273,6 +274,12 @@ enum devlink_attr {
DEVLINK_ATTR_REGION_SNAPSHOT,   /* nested */
DEVLINK_ATTR_REGION_SNAPSHOT_ID,/* u32 */
 
+   DEVLINK_ATTR_REGION_CHUNKS, /* nested */
+   DEVLINK_ATTR_REGION_CHUNK,  /* nested */
+   DEVLINK_ATTR_REGION_CHUNK_DATA, /* binary */
+   DEVLINK_ATTR_REGION_CHUNK_ADDR, /* u64 */
+   DEVLINK_ATTR_REGION_CHUNK_LEN,  /* u64 */
+
/* add new attributes above here, update the policy in devlink.c */
 
__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index fc08363..e5118db 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3388,6 +3388,181 @@ static int devlink_nl_cmd_region_del(struct sk_buff 
*skb,
return 0;
 }
 
+static int devlink_nl_cmd_region_read_chunk_fill(struct sk_buff *msg,
+struct devlink *devlink,
+u8 *chunk, u32 chunk_size,
+u64 addr)
+{
+   struct nlattr *chunk_attr;
+   int err;
+
+   chunk_attr = nla_nest_start(msg, DEVLINK_ATTR_REGION_CHUNK);
+   if (!chunk_attr)
+   return -EINVAL;
+
+   err = nla_put(msg, DEVLINK_ATTR_REGION_CHUNK_DATA, chunk_size, chunk);
+   if (err)
+   goto nla_put_failure;
+
+   err = nla_put_u64_64bit(msg, DEVLINK_ATTR_REGION_CHUNK_ADDR, addr,
+   DEVLINK_ATTR_PAD);
+   if (err)
+   goto nla_put_failure;
+
+   nla_nest_end(msg, chunk_attr);
+   return 0;
+
+nla_put_failure:
+   nla_nest_cancel(msg, chunk_attr);
+   return err;
+}
+
+#define DEVLINK_REGION_READ_CHUNK_SIZE 256
+
+static int devlink_nl_region_read_snapshot_fill(struct sk_buff *skb,
+   struct devlink *devlink,
+   struct devlink_region *region,
+   struct nlattr **attrs,
+   u64 start_offset,
+   u64 end_offset,
+   bool dump,
+   u64 *new_offset)
+{
+   struct devlink_snapshot *snapshot;
+   u64 curr_offset = start_offset;
+   u32 snapshot_id;
+   int err = 0;
+
+   *new_offset = start_offset;
+
+   snapshot_id = nla_get_u32(attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]);
+   snapshot = devlink_region_snapshot_get_by_id(region, snapshot_id);
+   if (!snapshot)
+   return -EINVAL;
+
+   if (end_offset > snapshot->data_len || dump)
+   end_offset = snapshot->data_len;
+
+   while (curr_offset < end_offset) {
+   u32 data_size;
+   u8 *data;
+
+   if (end_offset - curr_offset < DEVLINK_REGION_READ_CHUNK_SIZE)
+   data_size = end_offset - curr_offset;
+   else
+   data_size = DEVLINK_REGION_READ_CHUNK_SIZE;
+
+   data = &snapshot->data[curr_offset];
+   err = devlink_nl_cmd_region_read_chunk_fill(skb, devlink,
+   data, data_size,
+   curr_offset);
+   if (err)
+   break;
+
+   curr_offset += data_size;
+   }
+   *new_offset = curr_offset;
+
+   return err;
+}
+
+static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
+struct netlink_callback *cb)
+{
+   u64 ret_offset, start_offset, end_offset = 0;
+   struct nlattr *attrs[DEVLINK_ATTR_MAX + 1];
+   const struct genl_ops *ops = cb->data;
+   struct devlink_region *r

[PATCH net-next v3 02/11] devlink: Add callback to query for snapshot id before snapshot create

2018-07-12 Thread Alex Vesker
To keep the driver from picking snapshot IDs on its own, a new helper
is introduced that the driver calls to get a snapshot ID before
creating a new snapshot. This also allows giving the same ID to
multiple snapshots taken of different regions at the same time.
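
For illustration, a driver that snapshots two regions for the same
event grabs one ID and reuses it. The snapshot-create call below is the
one added later in this series; its exact signature is assumed here,
and the foo_* names are placeholders.

#include <linux/mm.h>
#include <net/devlink.h>

/* Attach the buffers collected for one error event to two regions under
 * a single snapshot ID so userspace can correlate them; error handling
 * is omitted for brevity.
 */
static void foo_attach_snapshots(struct devlink *devlink,
                                 struct devlink_region *cr_region,
                                 u8 *cr_data, u64 cr_len,
                                 struct devlink_region *health_region,
                                 u8 *health_data, u64 health_len)
{
        u32 id = devlink_region_shapshot_id_get(devlink);

        devlink_region_snapshot_create(cr_region, cr_len, cr_data,
                                       id, &kvfree);
        devlink_region_snapshot_create(health_region, health_len,
                                       health_data, id, &kvfree);
}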

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
---
 include/net/devlink.h |  8 
 net/core/devlink.c| 21 +
 2 files changed, 29 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index e539765..f27d859 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -29,6 +29,7 @@ struct devlink {
struct list_head resource_list;
struct list_head param_list;
struct list_head region_list;
+   u32 snapshot_id;
struct devlink_dpipe_headers *dpipe_headers;
const struct devlink_ops *ops;
struct device *dev;
@@ -551,6 +552,7 @@ struct devlink_region *devlink_region_create(struct devlink 
*devlink,
 u32 region_max_snapshots,
 u64 region_size);
 void devlink_region_destroy(struct devlink_region *region);
+u32 devlink_region_shapshot_id_get(struct devlink *devlink);
 
 #else
 
@@ -792,6 +794,12 @@ static inline bool 
devlink_dpipe_table_counter_enabled(struct devlink *devlink,
 {
 }
 
+static inline u32
+devlink_region_shapshot_id_get(struct devlink *devlink)
+{
+   return 0;
+}
+
 #endif
 
 #endif /* _NET_DEVLINK_H_ */
diff --git a/net/core/devlink.c b/net/core/devlink.c
index cac8561..6c92ddd 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4193,6 +4193,27 @@ void devlink_region_destroy(struct devlink_region 
*region)
 }
 EXPORT_SYMBOL_GPL(devlink_region_destroy);
 
+/**
+ * devlink_region_shapshot_id_get - get snapshot ID
+ *
+ * This callback should be called when adding a new snapshot,
+ * Driver should use the same id for multiple snapshots taken
+ * on multiple regions at the same time/by the same trigger.
+ *
+ * @devlink: devlink
+ */
+u32 devlink_region_shapshot_id_get(struct devlink *devlink)
+{
+   u32 id;
+
+   mutex_lock(&devlink->lock);
+   id = ++devlink->snapshot_id;
+   mutex_unlock(&devlink->lock);
+
+   return id;
+}
+EXPORT_SYMBOL_GPL(devlink_region_shapshot_id_get);
+
 static int __init devlink_module_init(void)
 {
return genl_register_family(&devlink_nl_family);
-- 
1.8.3.1



[PATCH net-next v3 10/11] devlink: Add generic parameters region_snapshot

2018-07-12 Thread Alex Vesker
region_snapshot - When set enables capturing region snapshots

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
Reviewed-by: Moshe Shemesh 
---
 include/net/devlink.h | 4 
 net/core/devlink.c| 5 +
 2 files changed, 9 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 905f0bb..b9b89d6 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -361,6 +361,7 @@ enum devlink_param_generic_id {
DEVLINK_PARAM_GENERIC_ID_INT_ERR_RESET,
DEVLINK_PARAM_GENERIC_ID_MAX_MACS,
DEVLINK_PARAM_GENERIC_ID_ENABLE_SRIOV,
+   DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT,
 
/* add new param generic ids above here*/
__DEVLINK_PARAM_GENERIC_ID_MAX,
@@ -376,6 +377,9 @@ enum devlink_param_generic_id {
 #define DEVLINK_PARAM_GENERIC_ENABLE_SRIOV_NAME "enable_sriov"
 #define DEVLINK_PARAM_GENERIC_ENABLE_SRIOV_TYPE DEVLINK_PARAM_TYPE_BOOL
 
+#define DEVLINK_PARAM_GENERIC_REGION_SNAPSHOT_NAME "region_snapshot_enable"
+#define DEVLINK_PARAM_GENERIC_REGION_SNAPSHOT_TYPE DEVLINK_PARAM_TYPE_BOOL
+
 #define DEVLINK_PARAM_GENERIC(_id, _cmodes, _get, _set, _validate) \
 {  \
.id = DEVLINK_PARAM_GENERIC_ID_##_id,   \
diff --git a/net/core/devlink.c b/net/core/devlink.c
index e5118db..65fc366 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -2671,6 +2671,11 @@ static int devlink_nl_cmd_reload(struct sk_buff *skb, 
struct genl_info *info)
.name = DEVLINK_PARAM_GENERIC_ENABLE_SRIOV_NAME,
.type = DEVLINK_PARAM_GENERIC_ENABLE_SRIOV_TYPE,
},
+   {
+   .id = DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT,
+   .name = DEVLINK_PARAM_GENERIC_REGION_SNAPSHOT_NAME,
+   .type = DEVLINK_PARAM_GENERIC_REGION_SNAPSHOT_TYPE,
+   },
 };
 
 static int devlink_param_generic_verify(const struct devlink_param *param)
-- 
1.8.3.1



[PATCH net-next v3 06/11] devlink: Add support for region snapshot delete command

2018-07-12 Thread Alex Vesker
Add support for DEVLINK_CMD_REGION_DEL, used for deleting a snapshot
from a region; the snapshot ID is required.
Also add notification support for NEW and DEL of snapshots.
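(With matching iproute2 support, deleting a snapshot would look roughly
like "devlink region del pci/0000:00:08.0/cr-space snapshot 1"; the
device and region names are placeholders and the userspace side is
outside this patch.)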

Signed-off-by: Alex Vesker 
Signed-off-by: Jiri Pirko 
---
 include/uapi/linux/devlink.h |  2 +
 net/core/devlink.c   | 93 
 2 files changed, 95 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index abde4e3..d212e02 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -85,6 +85,8 @@ enum devlink_command {
 
DEVLINK_CMD_REGION_GET,
DEVLINK_CMD_REGION_SET,
+   DEVLINK_CMD_REGION_NEW,
+   DEVLINK_CMD_REGION_DEL,
 
/* add new commands above here */
__DEVLINK_CMD_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index cb75e26..fc08363 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3236,6 +3236,58 @@ static int devlink_nl_region_fill(struct sk_buff *msg, 
struct devlink *devlink,
return err;
 }
 
+static void devlink_nl_region_notify(struct devlink_region *region,
+struct devlink_snapshot *snapshot,
+enum devlink_command cmd)
+{
+   struct devlink *devlink = region->devlink;
+   struct sk_buff *msg;
+   void *hdr;
+   int err;
+
+   WARN_ON(cmd != DEVLINK_CMD_REGION_NEW && cmd != DEVLINK_CMD_REGION_DEL);
+
+   msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+   if (!msg)
+   return;
+
+   hdr = genlmsg_put(msg, 0, 0, &devlink_nl_family, 0, cmd);
+   if (!hdr)
+   goto out_free_msg;
+
+   err = devlink_nl_put_handle(msg, devlink);
+   if (err)
+   goto out_cancel_msg;
+
+   err = nla_put_string(msg, DEVLINK_ATTR_REGION_NAME,
+region->name);
+   if (err)
+   goto out_cancel_msg;
+
+   if (snapshot) {
+   err = nla_put_u32(msg, DEVLINK_ATTR_REGION_SNAPSHOT_ID,
+ snapshot->id);
+   if (err)
+   goto out_cancel_msg;
+   } else {
+   err = nla_put_u64_64bit(msg, DEVLINK_ATTR_REGION_SIZE,
+   region->size, DEVLINK_ATTR_PAD);
+   if (err)
+   goto out_cancel_msg;
+   }
+   genlmsg_end(msg, hdr);
+
+   genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink),
+   msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL);
+
+   return;
+
+out_cancel_msg:
+   genlmsg_cancel(msg, hdr);
+out_free_msg:
+   nlmsg_free(msg);
+}
+
 static int devlink_nl_cmd_region_get_doit(struct sk_buff *skb,
  struct genl_info *info)
 {
@@ -3307,6 +3359,35 @@ static int devlink_nl_cmd_region_get_dumpit(struct 
sk_buff *msg,
return msg->len;
 }
 
+static int devlink_nl_cmd_region_del(struct sk_buff *skb,
+struct genl_info *info)
+{
+   struct devlink *devlink = info->user_ptr[0];
+   struct devlink_snapshot *snapshot;
+   struct devlink_region *region;
+   const char *region_name;
+   u32 snapshot_id;
+
+   if (!info->attrs[DEVLINK_ATTR_REGION_NAME] ||
+   !info->attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID])
+   return -EINVAL;
+
+   region_name = nla_data(info->attrs[DEVLINK_ATTR_REGION_NAME]);
+   snapshot_id = nla_get_u32(info->attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]);
+
+   region = devlink_region_get_by_name(devlink, region_name);
+   if (!region)
+   return -EINVAL;
+
+   snapshot = devlink_region_snapshot_get_by_id(region, snapshot_id);
+   if (!snapshot)
+   return -EINVAL;
+
+   devlink_nl_region_notify(region, snapshot, DEVLINK_CMD_REGION_DEL);
+   devlink_region_snapshot_del(snapshot);
+   return 0;
+}
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -3331,6 +3412,7 @@ static int devlink_nl_cmd_region_get_dumpit(struct 
sk_buff *msg,
[DEVLINK_ATTR_PARAM_TYPE] = { .type = NLA_U8 },
[DEVLINK_ATTR_PARAM_VALUE_CMODE] = { .type = NLA_U8 },
[DEVLINK_ATTR_REGION_NAME] = { .type = NLA_NUL_STRING },
+   [DEVLINK_ATTR_REGION_SNAPSHOT_ID] = { .type = NLA_U32 },
 };
 
 static const struct genl_ops devlink_nl_ops[] = {
@@ -3537,6 +3619,13 @@ static int devlink_nl_cmd_region_get_dumpit(struct 
sk_buff *msg,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
},
+   {
+   .cmd = DEVLINK_CMD_REGION_DEL,
+   .doit = devlink_nl_cmd_region_del,
+   .policy = devlink_nl_policy,
+   .flags = GENL_ADMIN_PERM,
+   .i

[PATCH net] qlogic: check kstrtoul() for errors

2018-07-12 Thread Dan Carpenter
We accidentally left out the error handling for kstrtoul().

Fixes: a520030e326a ("qlcnic: Implement flash sysfs callback for 83xx adapter")
Signed-off-by: Dan Carpenter 

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sysfs.c 
b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sysfs.c
index 891f03a7a33d..8d7b9bb910f2 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sysfs.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sysfs.c
@@ -1128,6 +1128,8 @@ static ssize_t 
qlcnic_83xx_sysfs_flash_write_handler(struct file *filp,
struct qlcnic_adapter *adapter = dev_get_drvdata(dev);
 
ret = kstrtoul(buf, 16, &data);
+   if (ret)
+   return ret;
 
switch (data) {
case QLC_83XX_FLASH_SECTOR_ERASE_CMD:


[PATCH net 0/2] fix DCTCP delayed ACK

2018-07-12 Thread Yuchung Cheng
This patch series addresses the issue that sometimes DCTCP
fails to acknowledge the latest sequence, resulting in a sender timeout
if the inflight is small.

Yuchung Cheng (2):
  tcp: fix dctcp delayed ACK schedule
  tcp: remove DELAYED ACK events in DCTCP

 include/net/tcp.h |  2 --
 net/ipv4/tcp_dctcp.c  | 31 ---
 net/ipv4/tcp_output.c |  4 
 3 files changed, 4 insertions(+), 33 deletions(-)

-- 
2.18.0.203.gfac676dfb9-goog



[PATCH net 1/2] tcp: fix dctcp delayed ACK schedule

2018-07-12 Thread Yuchung Cheng
Previously, when a data segment was sent, an ACK was piggybacked
on the data segment without generating a CA_EVENT_NON_DELAYED_ACK
event to notify congestion control modules. So the DCTCP
ca->delayed_ack_reserved flag could incorrectly stay set when
in fact there were no delayed ACKs being reserved. This could result
in sending a special ECN notification ACK that carries an older
ACK sequence, when in fact there was no need for such an ACK.
DCTCP keeps track of the delayed ACK status with its own separate
state ca->delayed_ack_reserved. Previously it could accidentally cancel
the delayed ACK without updating this field upon sending a special
ACK that carries an older ACK sequence. This inconsistency would
lead to the DCTCP receiver never acknowledging the latest data until
the sender times out and retries in some cases.

Packetdrill script (provided by Larry Brakmo)

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100 < [ect0] SEW 0:0(0) win 32792 
0.100 > SE. 0:0(0) ack 1 
0.110 < [ect0] . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4

0.200 < [ect0] . 1:1001(1000) ack 1 win 257
0.200 > [ect01] . 1:1(0) ack 1001

0.200 write(4, ..., 1) = 1
0.200 > [ect01] P. 1:2(1) ack 1001

0.200 < [ect0] . 1001:2001(1000) ack 2 win 257
0.200 write(4, ..., 1) = 1
0.200 > [ect01] P. 2:3(1) ack 2001

0.200 < [ect0] . 2001:3001(1000) ack 3 win 257
0.200 < [ect0] . 3001:4001(1000) ack 3 win 257
0.200 > [ect01] . 3:3(0) ack 4001

0.210 < [ce] P. 4001:4501(500) ack 3 win 257

+0.001 read(4, ..., 4500) = 4500
+0 write(4, ..., 1) = 1
+0 > [ect01] PE. 3:4(1) ack 4501

+0.010 < [ect0] W. 4501:5501(1000) ack 4 win 257
// Previously the ACK sequence below would be 4501, causing a long RTO
+0.040~+0.045 > [ect01] . 4:4(0) ack 5501   // delayed ack

+0.311 < [ect0] . 5501:6501(1000) ack 4 win 257  // More data
+0 > [ect01] . 4:4(0) ack 6501 // now acks everything

+0.500 < F. 9501:9501(0) ack 4 win 257

Reported-by: Larry Brakmo 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Eric Dumazet 
Acked-by: Neal Cardwell 
---
 net/ipv4/tcp_dctcp.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c
index 5f5e5936760e..89f88b0d8167 100644
--- a/net/ipv4/tcp_dctcp.c
+++ b/net/ipv4/tcp_dctcp.c
@@ -134,7 +134,8 @@ static void dctcp_ce_state_0_to_1(struct sock *sk)
/* State has changed from CE=0 to CE=1 and delayed
 * ACK has not sent yet.
 */
-   if (!ca->ce_state && ca->delayed_ack_reserved) {
+   if (!ca->ce_state &&
+   inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) {
u32 tmp_rcv_nxt;
 
/* Save current rcv_nxt. */
@@ -164,7 +165,8 @@ static void dctcp_ce_state_1_to_0(struct sock *sk)
/* State has changed from CE=1 to CE=0 and delayed
 * ACK has not sent yet.
 */
-   if (ca->ce_state && ca->delayed_ack_reserved) {
+   if (ca->ce_state &&
+   inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) {
u32 tmp_rcv_nxt;
 
/* Save current rcv_nxt. */
-- 
2.18.0.203.gfac676dfb9-goog



[PATCH net 2/2] tcp: remove DELAYED ACK events in DCTCP

2018-07-12 Thread Yuchung Cheng
After fixing the way DCTCP tracks delayed ACKs, the delayed-ACK
related callbacks are no longer needed.

Signed-off-by: Yuchung Cheng 
Signed-off-by: Eric Dumazet 
Acked-by: Neal Cardwell 
---
 include/net/tcp.h |  2 --
 net/ipv4/tcp_dctcp.c  | 25 -
 net/ipv4/tcp_output.c |  4 
 3 files changed, 31 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index af3ec72d5d41..3482d13d655b 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -912,8 +912,6 @@ enum tcp_ca_event {
CA_EVENT_LOSS,  /* loss timeout */
CA_EVENT_ECN_NO_CE, /* ECT set, but not CE marked */
CA_EVENT_ECN_IS_CE, /* received CE marked IP packet */
-   CA_EVENT_DELAYED_ACK,   /* Delayed ack is sent */
-   CA_EVENT_NON_DELAYED_ACK,
 };
 
 /* Information about inbound ACK, passed to cong_ops->in_ack_event() */
diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c
index 89f88b0d8167..5869f89ca656 100644
--- a/net/ipv4/tcp_dctcp.c
+++ b/net/ipv4/tcp_dctcp.c
@@ -55,7 +55,6 @@ struct dctcp {
u32 dctcp_alpha;
u32 next_seq;
u32 ce_state;
-   u32 delayed_ack_reserved;
u32 loss_cwnd;
 };
 
@@ -96,7 +95,6 @@ static void dctcp_init(struct sock *sk)
 
ca->dctcp_alpha = min(dctcp_alpha_on_init, DCTCP_MAX_ALPHA);
 
-   ca->delayed_ack_reserved = 0;
ca->loss_cwnd = 0;
ca->ce_state = 0;
 
@@ -250,25 +248,6 @@ static void dctcp_state(struct sock *sk, u8 new_state)
}
 }
 
-static void dctcp_update_ack_reserved(struct sock *sk, enum tcp_ca_event ev)
-{
-   struct dctcp *ca = inet_csk_ca(sk);
-
-   switch (ev) {
-   case CA_EVENT_DELAYED_ACK:
-   if (!ca->delayed_ack_reserved)
-   ca->delayed_ack_reserved = 1;
-   break;
-   case CA_EVENT_NON_DELAYED_ACK:
-   if (ca->delayed_ack_reserved)
-   ca->delayed_ack_reserved = 0;
-   break;
-   default:
-   /* Don't care for the rest. */
-   break;
-   }
-}
-
 static void dctcp_cwnd_event(struct sock *sk, enum tcp_ca_event ev)
 {
switch (ev) {
@@ -278,10 +257,6 @@ static void dctcp_cwnd_event(struct sock *sk, enum 
tcp_ca_event ev)
case CA_EVENT_ECN_NO_CE:
dctcp_ce_state_1_to_0(sk);
break;
-   case CA_EVENT_DELAYED_ACK:
-   case CA_EVENT_NON_DELAYED_ACK:
-   dctcp_update_ack_reserved(sk, ev);
-   break;
default:
/* Don't care for the rest. */
break;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 8e08b409c71e..00e5a300ddb9 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3523,8 +3523,6 @@ void tcp_send_delayed_ack(struct sock *sk)
int ato = icsk->icsk_ack.ato;
unsigned long timeout;
 
-   tcp_ca_event(sk, CA_EVENT_DELAYED_ACK);
-
if (ato > TCP_DELACK_MIN) {
const struct tcp_sock *tp = tcp_sk(sk);
int max_ato = HZ / 2;
@@ -3581,8 +3579,6 @@ void tcp_send_ack(struct sock *sk)
if (sk->sk_state == TCP_CLOSE)
return;
 
-   tcp_ca_event(sk, CA_EVENT_NON_DELAYED_ACK);
-
/* We are not putting this on the write queue, so
 * tcp_transmit_skb() will set the ownership to this
 * sock.
-- 
2.18.0.203.gfac676dfb9-goog



Re: [PATCH net 2/2] tcp: remove DELAYED ACK events in DCTCP

2018-07-12 Thread Lawrence Brakmo
LGTM. Thanks for the patch.

Acked-by: Lawrence Brakmo 

On 7/12/18, 9:05 AM, "Yuchung Cheng"  wrote:

After fixing the way DCTCP tracking delayed ACKs, the delayed-ACK
related callbacks are no longer needed

Signed-off-by: Yuchung Cheng 
Signed-off-by: Eric Dumazet 
Acked-by: Neal Cardwell 
---
 include/net/tcp.h |  2 --
 net/ipv4/tcp_dctcp.c  | 25 -
 net/ipv4/tcp_output.c |  4 
 3 files changed, 31 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index af3ec72d5d41..3482d13d655b 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -912,8 +912,6 @@ enum tcp_ca_event {
CA_EVENT_LOSS,  /* loss timeout */
CA_EVENT_ECN_NO_CE, /* ECT set, but not CE marked */
CA_EVENT_ECN_IS_CE, /* received CE marked IP packet */
-   CA_EVENT_DELAYED_ACK,   /* Delayed ack is sent */
-   CA_EVENT_NON_DELAYED_ACK,
 };
 
 /* Information about inbound ACK, passed to cong_ops->in_ack_event() */
diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c
index 89f88b0d8167..5869f89ca656 100644
--- a/net/ipv4/tcp_dctcp.c
+++ b/net/ipv4/tcp_dctcp.c
@@ -55,7 +55,6 @@ struct dctcp {
u32 dctcp_alpha;
u32 next_seq;
u32 ce_state;
-   u32 delayed_ack_reserved;
u32 loss_cwnd;
 };
 
@@ -96,7 +95,6 @@ static void dctcp_init(struct sock *sk)
 
ca->dctcp_alpha = min(dctcp_alpha_on_init, DCTCP_MAX_ALPHA);
 
-   ca->delayed_ack_reserved = 0;
ca->loss_cwnd = 0;
ca->ce_state = 0;
 
@@ -250,25 +248,6 @@ static void dctcp_state(struct sock *sk, u8 new_state)
}
 }
 
-static void dctcp_update_ack_reserved(struct sock *sk, enum tcp_ca_event 
ev)
-{
-   struct dctcp *ca = inet_csk_ca(sk);
-
-   switch (ev) {
-   case CA_EVENT_DELAYED_ACK:
-   if (!ca->delayed_ack_reserved)
-   ca->delayed_ack_reserved = 1;
-   break;
-   case CA_EVENT_NON_DELAYED_ACK:
-   if (ca->delayed_ack_reserved)
-   ca->delayed_ack_reserved = 0;
-   break;
-   default:
-   /* Don't care for the rest. */
-   break;
-   }
-}
-
 static void dctcp_cwnd_event(struct sock *sk, enum tcp_ca_event ev)
 {
switch (ev) {
@@ -278,10 +257,6 @@ static void dctcp_cwnd_event(struct sock *sk, enum 
tcp_ca_event ev)
case CA_EVENT_ECN_NO_CE:
dctcp_ce_state_1_to_0(sk);
break;
-   case CA_EVENT_DELAYED_ACK:
-   case CA_EVENT_NON_DELAYED_ACK:
-   dctcp_update_ack_reserved(sk, ev);
-   break;
default:
/* Don't care for the rest. */
break;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 8e08b409c71e..00e5a300ddb9 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3523,8 +3523,6 @@ void tcp_send_delayed_ack(struct sock *sk)
int ato = icsk->icsk_ack.ato;
unsigned long timeout;
 
-   tcp_ca_event(sk, CA_EVENT_DELAYED_ACK);
-
if (ato > TCP_DELACK_MIN) {
const struct tcp_sock *tp = tcp_sk(sk);
int max_ato = HZ / 2;
@@ -3581,8 +3579,6 @@ void tcp_send_ack(struct sock *sk)
if (sk->sk_state == TCP_CLOSE)
return;
 
-   tcp_ca_event(sk, CA_EVENT_NON_DELAYED_ACK);
-
/* We are not putting this on the write queue, so
 * tcp_transmit_skb() will set the ownership to this
 * sock.
-- 
2.18.0.203.gfac676dfb9-goog





Re: [PATCH net 1/2] tcp: fix dctcp delayed ACK schedule

2018-07-12 Thread Lawrence Brakmo
On 7/12/18, 9:05 AM, "netdev-ow...@vger.kernel.org on behalf of Yuchung Cheng" 
 wrote:

Previously, when a data segment was sent an ACK was piggybacked
on the data segment without generating a CA_EVENT_NON_DELAYED_ACK
event to notify congestion control modules. So the DCTCP
ca->delayed_ack_reserved flag could incorrectly stay set when
in fact there were no delayed ACKs being reserved. This could result
in sending a special ECN notification ACK that carries an older
ACK sequence, when in fact there was no need for such an ACK.
DCTCP keeps track of the delayed ACK status with its own separate
state ca->delayed_ack_reserved. Previously it may accidentally cancel
the delayed ACK without updating this field upon sending a special
ACK that carries a older ACK sequence. This inconsistency would
lead to DCTCP receiver never acknowledging the latest data until the
sender times out and retry in some cases.

Packetdrill script (provided by Larry Brakmo)

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100 < [ect0] SEW 0:0(0) win 32792 
0.100 > SE. 0:0(0) ack 1 
0.110 < [ect0] . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4

0.200 < [ect0] . 1:1001(1000) ack 1 win 257
0.200 > [ect01] . 1:1(0) ack 1001

0.200 write(4, ..., 1) = 1
0.200 > [ect01] P. 1:2(1) ack 1001

0.200 < [ect0] . 1001:2001(1000) ack 2 win 257
0.200 write(4, ..., 1) = 1
0.200 > [ect01] P. 2:3(1) ack 2001

0.200 < [ect0] . 2001:3001(1000) ack 3 win 257
0.200 < [ect0] . 3001:4001(1000) ack 3 win 257
0.200 > [ect01] . 3:3(0) ack 4001

0.210 < [ce] P. 4001:4501(500) ack 3 win 257

+0.001 read(4, ..., 4500) = 4500
+0 write(4, ..., 1) = 1
+0 > [ect01] PE. 3:4(1) ack 4501

+0.010 < [ect0] W. 4501:5501(1000) ack 4 win 257
// Previously the ACK sequence below would be 4501, causing a long RTO
+0.040~+0.045 > [ect01] . 4:4(0) ack 5501   // delayed ack

+0.311 < [ect0] . 5501:6501(1000) ack 4 win 257  // More data
+0 > [ect01] . 4:4(0) ack 6501 // now acks everything

+0.500 < F. 9501:9501(0) ack 4 win 257

Reported-by: Larry Brakmo 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Eric Dumazet 
Acked-by: Neal Cardwell 
---
 net/ipv4/tcp_dctcp.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c
index 5f5e5936760e..89f88b0d8167 100644
--- a/net/ipv4/tcp_dctcp.c
+++ b/net/ipv4/tcp_dctcp.c
@@ -134,7 +134,8 @@ static void dctcp_ce_state_0_to_1(struct sock *sk)
/* State has changed from CE=0 to CE=1 and delayed
 * ACK has not sent yet.
 */
-   if (!ca->ce_state && ca->delayed_ack_reserved) {
+   if (!ca->ce_state &&
+   inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) {
u32 tmp_rcv_nxt;
 
/* Save current rcv_nxt. */
@@ -164,7 +165,8 @@ static void dctcp_ce_state_1_to_0(struct sock *sk)
/* State has changed from CE=1 to CE=0 and delayed
 * ACK has not sent yet.
 */
-   if (ca->ce_state && ca->delayed_ack_reserved) {
+   if (ca->ce_state &&
+   inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) {
u32 tmp_rcv_nxt;
 
/* Save current rcv_nxt. */
-- 
2.18.0.203.gfac676dfb9-goog

LGTM. Thanks for the patch.

Acked-by: Lawrence Brakmo 




Re: [PATCH net-next] net/tls: Removed redundant variable from 'struct tls_sw_context_rx'

2018-07-12 Thread Dave Watson
On 07/12/18 11:14 AM, Vakul Garg wrote:
> Hi Boris
> 
> Thanks for explaining.
> Few questions/observations.
> 
> 1. Isn't ' ctx->decrypted = true' a redundant statement in 
> tls_do_decryption()?
> The same has been repeated in tls_recvmsg() after calling decrypt_skb()?
> 
> 2. Similarly, ctx->saved_data_ready(sk) seems not required in 
> tls_do_decryption().
> This is because tls_do_decryption() is already triggered from tls_recvmsg() 
> i.e. from user space app context.
> 
> 3. In tls_queue(), I think strp->sk->sk_state_change() needs to be replaced 
> with ctx->saved_data_ready().

Yes, I think these 3 can all be changed.  #2 would be required if
do_decryption were ever called outside of user context, but that's not
the case currently.


Re: [BUG] bonded interfaces drop bpdu (stp) frames

2018-07-12 Thread Jay Vosburgh
Mahesh Bandewar (महेश बंडेवार) wrote:

>On Wed, Jul 11, 2018 at 3:23 PM, Michal Soltys  wrote:
>>
>> Hi,
>>
>> As weird as that sounds, this is what I observed today after bumping
>> kernel version. I have a setup where 2 bonds are attached to linux
>> bridge and physically are connected to two switches doing MSTP (and
>> linux bridge is just passing them).
>>
>> Initially I suspected some changes related to bridge code - but quick
>> peek at the code showed nothing suspicious - and the part of it that
>> explicitly passes stp frames if stp is not enabled has seen little
>> changes (e.g. per-port group_fwd_mask added recently). Furthermore - if
>> regular non-bonded interfaces are attached everything works fine.
>>
>> Just to be sure I detached the bond (802.3ad mode) and checked it with
>> simple tcpdump (ether proto \\stp) - and indeed no hello packets were
>> there (with them being present just fine on active enslaved interface,
>> or on the bond device in earlier kernels).
>>
>> If time permits I'll bisect tommorow to pinpoint the commit, but from
>> quick todays test - 4.9.x is working fine, while 4.16.16 (tested on
>> debian) and 4.17.3 (tested on archlinux) are failing.
>>
>> Unless this is already a known issue (or you have any suggestions what
>> could be responsible).
>>
>I believe these are link-local-multicast messages and sometime back a
>change went into to not pass those frames to the bonding master. This
>could be the side effect of that.

Mahesh, I suspect you're thinking of:

commit b89f04c61efe3b7756434d693b9203cc0cce002e
Author: Chonggang Li 
Date:   Sun Apr 16 12:02:18 2017 -0700

bonding: deliver link-local packets with skb->dev set to link that packets 
arrived on

Michal, are you able to revert this patch and test?

-J

---
-Jay Vosburgh, jay.vosbu...@canonical.com


Re: [PATCH bpf-next 0/6] TCP-BPF callback for listening sockets

2018-07-12 Thread Lawrence Brakmo
LGTM. Thank you for adding the listen callback and cleaning up the test.

Acked-by: Lawrence Brakmo 


On 7/11/18, 8:34 PM, "Andrey Ignatov"  wrote:

This patchset adds TCP-BPF callback for listening sockets.

Patch 0001 provides more details and is the main patch in the set.

Patch 0006 adds selftest for the new callback.

Other patches are bug fixes and improvements in TCP-BPF selftest to make it
easier to extend in 0006.


Andrey Ignatov (6):
  bpf: Add BPF_SOCK_OPS_TCP_LISTEN_CB
  bpf: Sync bpf.h to tools/
  selftests/bpf: Fix const'ness in cgroup_helpers
  selftests/bpf: Switch test_tcpbpf_user to cgroup_helpers
  selftests/bpf: Better verification in test_tcpbpf
  selftests/bpf: Test case for BPF_SOCK_OPS_TCP_LISTEN_CB

 include/uapi/linux/bpf.h  |   3 +
 net/ipv4/af_inet.c|   1 +
 tools/include/uapi/linux/bpf.h|   3 +
 tools/testing/selftests/bpf/Makefile  |   1 +
 tools/testing/selftests/bpf/cgroup_helpers.c  |   6 +-
 tools/testing/selftests/bpf/cgroup_helpers.h  |   6 +-
 tools/testing/selftests/bpf/test_tcpbpf.h |   1 +
 .../testing/selftests/bpf/test_tcpbpf_kern.c  |  17 ++-
 .../testing/selftests/bpf/test_tcpbpf_user.c  | 119 +-
 9 files changed, 88 insertions(+), 69 deletions(-)

-- 
2.17.1





[PATCH net] tls: Stricter error checking in zerocopy sendmsg path

2018-07-12 Thread Dave Watson
In the zerocopy sendmsg() path, there are error checks to revert
the zerocopy if we get any error code.  syzkaller has discovered
that tls_push_record can return -ECONNRESET, which is fatal, and
happens after the point at which it is safe to revert the iter,
as we've already passed the memory to do_tcp_sendpages.

Previously this code could return -ENOMEM and we would want to
revert the iter, but AFAIK this no longer returns ENOMEM after
a447da7d004 ("tls: fix waitall behavior in tls_sw_recvmsg"),
so we fail for all error codes.

Reported-by: syzbot+c226690f7b3126c5e...@syzkaller.appspotmail.com
Reported-by: syzbot+709f2810a6a05f11d...@syzkaller.appspotmail.com
Signed-off-by: Dave Watson 
Fixes: 3c4d7559159b ("tls: kernel TLS support")
---
 net/tls/tls_sw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 7818011fd250..4618f1c31137 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -440,7 +440,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t size)
ret = tls_push_record(sk, msg->msg_flags, record_type);
if (!ret)
continue;
-   if (ret == -EAGAIN)
+   if (ret < 0)
goto send_end;
 
copied -= try_to_copy;
-- 
2.17.1



[PATCH bpf] bpf: don't leave partially mangled prog in jit_subprogs error path

2018-07-12 Thread Daniel Borkmann
syzkaller managed to trigger the following bug through fault injection:

  [...]
  [  141.043668] verifier bug. No program starts at insn 3
  [  141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
 get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
  [  141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
 fixup_call_args kernel/bpf/verifier.c:5587 [inline]
  [  141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
 bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
  [  141.047355] CPU: 3 PID: 4072 Comm: a.out Not tainted 4.18.0-rc4+ #51
  [  141.048446] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS 
1.10.2-1 04/01/2014
  [  141.049877] Call Trace:
  [  141.050324]  __dump_stack lib/dump_stack.c:77 [inline]
  [  141.050324]  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
  [  141.050950]  ? dump_stack_print_info.cold.2+0x52/0x52 lib/dump_stack.c:60
  [  141.051837]  panic+0x238/0x4e7 kernel/panic.c:184
  [  141.052386]  ? add_taint.cold.5+0x16/0x16 kernel/panic.c:385
  [  141.053101]  ? __warn.cold.8+0x148/0x1ba kernel/panic.c:537
  [  141.053814]  ? __warn.cold.8+0x117/0x1ba kernel/panic.c:530
  [  141.054506]  ? get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
  [  141.054506]  ? fixup_call_args kernel/bpf/verifier.c:5587 [inline]
  [  141.054506]  ? bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
  [  141.055163]  __warn.cold.8+0x163/0x1ba kernel/panic.c:538
  [  141.055820]  ? get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
  [  141.055820]  ? fixup_call_args kernel/bpf/verifier.c:5587 [inline]
  [  141.055820]  ? bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
  [...]

What happens in jit_subprogs() is that kcalloc() for the subprog func
buffer fails with NULL, at which point we bail out. The latter is a plain
return -ENOMEM, and this is definitely not okay since earlier in the
loop we are walking all subprogs and temporarily rewrite insn->off to
remember the subprog id as well as insn->imm to temporarily point the
call to __bpf_call_base + 1 for the initial JIT pass. Thus, bailing
out in such state and handing this over to the interpreter is troublesome
since later/subsequent e.g. find_subprog() lookups are based on wrong
insn->imm.

Therefore, once we hit this point, we need to jump to out_free path
where we undo all changes from earlier loop, so that interpreter can
work on unmodified insn->{off,imm}.

Another point is that should find_subprog() fail in jit_subprogs() due
to a verifier bug, then we also should not simply defer the program to
the interpreter since also here we did partial modifications. Instead
we should just bail out entirely and return an error to the user who is
trying to load the program.

Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs")
Reported-by: syzbot+7d427828b2ea6e592...@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann 
---
 kernel/bpf/verifier.c | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9e2bf83..6c5eb46 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5430,6 +5430,10 @@ static int jit_subprogs(struct bpf_verifier_env *env)
if (insn->code != (BPF_JMP | BPF_CALL) ||
insn->src_reg != BPF_PSEUDO_CALL)
continue;
+   /* Upon error here we cannot fall back to interpreter but
+* need a hard reject of the program. Thus -EFAULT is
+* propagated in any case.
+*/
subprog = find_subprog(env, i + insn->imm + 1);
if (subprog < 0) {
WARN_ONCE(1, "verifier bug. No program starts at insn 
%d\n",
@@ -5450,7 +5454,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 
func = kcalloc(env->subprog_cnt, sizeof(prog), GFP_KERNEL);
if (!func)
-   return -ENOMEM;
+   goto out_free;
 
for (i = 0; i < env->subprog_cnt; i++) {
subprog_start = subprog_end;
@@ -5515,7 +5519,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
tmp = bpf_int_jit_compile(func[i]);
if (tmp != func[i] || func[i]->bpf_func != old_bpf_func) {
verbose(env, "JIT doesn't support bpf-to-bpf calls\n");
-   err = -EFAULT;
+   err = -ENOTSUPP;
goto out_free;
}
cond_resched();
@@ -5548,10 +5552,13 @@ static int jit_subprogs(struct bpf_verifier_env *env)
prog->aux->func_cnt = env->subprog_cnt;
return 0;
 out_free:
-   for (i = 0; i < env->subprog_cnt; i++)
-   if (func[i])
-   bpf_jit_free(func[i]);
-   kfree(func);
+   if (func) {
+   for (i = 0; i < env->subprog_cnt; i++)
+  

[net 1/4] ixgbe: Be more careful when modifying MAC filters

2018-07-12 Thread Jeff Kirsher
From: Alexander Duyck 

This change makes it so that we are much more explicit about the ordering
of updates to the receive address register (RAR) table. Prior to this patch
I believe we may have been updating the table while entries were still
active, or possibly allowing for reordering of things since we weren't
explicitly flushing writes to either the lower or upper portion of the
register prior to accessing the other half.

Signed-off-by: Alexander Duyck 
Reviewed-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
index 3f5c350716bb..0bd1294ba517 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
@@ -1871,7 +1871,12 @@ s32 ixgbe_set_rar_generic(struct ixgbe_hw *hw, u32 
index, u8 *addr, u32 vmdq,
if (enable_addr != 0)
rar_high |= IXGBE_RAH_AV;
 
+   /* Record lower 32 bits of MAC address and then make
+* sure that write is flushed to hardware before writing
+* the upper 16 bits and setting the valid bit.
+*/
IXGBE_WRITE_REG(hw, IXGBE_RAL(index), rar_low);
+   IXGBE_WRITE_FLUSH(hw);
IXGBE_WRITE_REG(hw, IXGBE_RAH(index), rar_high);
 
return 0;
@@ -1903,8 +1908,13 @@ s32 ixgbe_clear_rar_generic(struct ixgbe_hw *hw, u32 
index)
rar_high = IXGBE_READ_REG(hw, IXGBE_RAH(index));
rar_high &= ~(0x | IXGBE_RAH_AV);
 
-   IXGBE_WRITE_REG(hw, IXGBE_RAL(index), 0);
+   /* Clear the address valid bit and upper 16 bits of the address
+* before clearing the lower bits. This way we aren't updating
+* a live filter.
+*/
IXGBE_WRITE_REG(hw, IXGBE_RAH(index), rar_high);
+   IXGBE_WRITE_FLUSH(hw);
+   IXGBE_WRITE_REG(hw, IXGBE_RAL(index), 0);
 
/* clear VMDq pool/queue selection for this RAR */
hw->mac.ops.clear_vmdq(hw, index, IXGBE_CLEAR_VMDQ_ALL);
-- 
2.17.1



[net 0/4][pull request] Intel Wired LAN Driver Updates 2018-07-12

2018-07-12 Thread Jeff Kirsher
This series contains updates to ixgbe and e100/e1000 kernel documentation.

Alex fixes ixgbe to ensure that we are more explicit about the ordering
of updates to the receive address register (RAR) table.

Dan Carpenter fixes an issue where we were reading one element beyond
the end of the array.

Mauro Carvalho Chehab fixes formatting issues in the e100.rst and
e1000.rst that were causing errors during 'make htmldocs'.

The following are changes since commit 20c4515a1af770f4fb0dc6b044ffc9a6031e5767:
  qed: fix spelling mistake "successffuly" -> "successfully"
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue 10GbE

Alexander Duyck (1):
  ixgbe: Be more careful when modifying MAC filters

Dan Carpenter (1):
  ixgbe: Off by one in ixgbe_ipsec_tx()

Mauro Carvalho Chehab (2):
  networking: e100.rst: Get rid of Sphinx warnings
  networking: e1000.rst: Get rid of Sphinx warnings

 Documentation/networking/e100.rst |  27 ++-
 Documentation/networking/e1000.rst| 187 +++---
 .../net/ethernet/intel/ixgbe/ixgbe_common.c   |  12 +-
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c|   2 +-
 4 files changed, 141 insertions(+), 87 deletions(-)

-- 
2.17.1



[net 3/4] networking: e100.rst: Get rid of Sphinx warnings

2018-07-12 Thread Jeff Kirsher
From: Mauro Carvalho Chehab 

Documentation/networking/e100.rst:57: WARNING: Literal block expected; none 
found.
Documentation/networking/e100.rst:68: WARNING: Literal block expected; none 
found.
Documentation/networking/e100.rst:75: WARNING: Literal block expected; none 
found.
Documentation/networking/e100.rst:84: WARNING: Literal block expected; none 
found.
Documentation/networking/e100.rst:93: WARNING: Inline emphasis start-string 
without end-string.

While here, fix some highlights.

Signed-off-by: Mauro Carvalho Chehab 
Signed-off-by: Jeff Kirsher 
---
 Documentation/networking/e100.rst | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/Documentation/networking/e100.rst 
b/Documentation/networking/e100.rst
index 9708f5fa76de..f8eba9c5 100644
--- a/Documentation/networking/e100.rst
+++ b/Documentation/networking/e100.rst
@@ -47,41 +47,45 @@ Driver Configuration Parameters
 The default value for each parameter is generally the recommended setting,
 unless otherwise noted.
 
-Rx Descriptors: Number of receive descriptors. A receive descriptor is a data
+Rx Descriptors:
+   Number of receive descriptors. A receive descriptor is a data
structure that describes a receive buffer and its attributes to the network
controller. The data in the descriptor is used by the controller to write
data from the controller to host memory. In the 3.x.x driver the valid range
for this parameter is 64-256. The default value is 256. This parameter can 
be
changed using the command::
 
-   ethtool -G eth? rx n
+ ethtool -G eth? rx n
 
Where n is the number of desired Rx descriptors.
 
-Tx Descriptors: Number of transmit descriptors. A transmit descriptor is a data
+Tx Descriptors:
+   Number of transmit descriptors. A transmit descriptor is a data
structure that describes a transmit buffer and its attributes to the network
controller. The data in the descriptor is used by the controller to read
data from the host memory to the controller. In the 3.x.x driver the valid
range for this parameter is 64-256. The default value is 128. This parameter
can be changed using the command::
 
-   ethtool -G eth? tx n
+ ethtool -G eth? tx n
 
Where n is the number of desired Tx descriptors.
 
-Speed/Duplex: The driver auto-negotiates the link speed and duplex settings by
+Speed/Duplex:
+   The driver auto-negotiates the link speed and duplex settings by
default. The ethtool utility can be used as follows to force speed/duplex.::
 
-   ethtool -s eth?  autoneg off speed {10|100} duplex {full|half}
+ ethtool -s eth?  autoneg off speed {10|100} duplex {full|half}
 
NOTE: setting the speed/duplex to incorrect values will cause the link to
fail.
 
-Event Log Message Level:  The driver uses the message level flag to log events
+Event Log Message Level:
+   The driver uses the message level flag to log events
to syslog. The message level can be set at driver load time. It can also be
set using the command::
 
-   ethtool -s eth? msglvl n
+ ethtool -s eth? msglvl n
 
 
 Additional Configurations
@@ -92,7 +96,7 @@ Configuring the Driver on Different Distributions
 
 Configuring a network driver to load properly when the system is started
 is distribution dependent.  Typically, the configuration process involves
-adding an alias line to /etc/modprobe.d/*.conf as well as editing other
+adding an alias line to `/etc/modprobe.d/*.conf` as well as editing other
 system startup scripts and/or configuration files.  Many popular Linux
 distributions ship with tools to make these changes for you.  To learn
 the proper way to configure a network device for your system, refer to
@@ -160,7 +164,10 @@ This results in unbalanced receive traffic.
 If you have multiple interfaces in a server, either turn on ARP
 filtering by
 
-(1) entering:: echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
+(1) entering::
+
+   echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
+
 (this only works if your kernel's version is higher than 2.4.5), or
 
 (2) installing the interfaces in separate broadcast domains (either
-- 
2.17.1



[net 2/4] ixgbe: Off by one in ixgbe_ipsec_tx()

2018-07-12 Thread Jeff Kirsher
From: Dan Carpenter 

The ipsec->tx_tbl[] has IXGBE_IPSEC_MAX_SA_COUNT elements so the > needs
to be changed to >= so we don't read one element beyond the end of the
array.

Fixes: 592594704761 ("ixgbe: process the Tx ipsec offload")
Signed-off-by: Dan Carpenter 
Acked-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index c116f459945d..da4322e4daed 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -839,7 +839,7 @@ int ixgbe_ipsec_tx(struct ixgbe_ring *tx_ring,
}
 
itd->sa_idx = xs->xso.offload_handle - IXGBE_IPSEC_BASE_TX_INDEX;
-   if (unlikely(itd->sa_idx > IXGBE_IPSEC_MAX_SA_COUNT)) {
+   if (unlikely(itd->sa_idx >= IXGBE_IPSEC_MAX_SA_COUNT)) {
netdev_err(tx_ring->netdev, "%s: bad sa_idx=%d handle=%lu\n",
   __func__, itd->sa_idx, xs->xso.offload_handle);
return 0;
-- 
2.17.1



[net 4/4] networking: e1000.rst: Get rid of Sphinx warnings

2018-07-12 Thread Jeff Kirsher
From: Mauro Carvalho Chehab 

Documentation/networking/e1000.rst:83: ERROR: Unexpected indentation.
Documentation/networking/e1000.rst:84: WARNING: Block quote ends without a 
blank line; unexpected unindent.
Documentation/networking/e1000.rst:173: WARNING: Definition list ends 
without a blank line; unexpected unindent.
Documentation/networking/e1000.rst:236: WARNING: Definition list ends 
without a blank line; unexpected unindent.

While here, fix highlights and mark a table as such.

Signed-off-by: Mauro Carvalho Chehab 
Signed-off-by: Jeff Kirsher 
---
 Documentation/networking/e1000.rst | 187 +
 1 file changed, 112 insertions(+), 75 deletions(-)

diff --git a/Documentation/networking/e1000.rst 
b/Documentation/networking/e1000.rst
index 144b87eef153..f10dd4086921 100644
--- a/Documentation/networking/e1000.rst
+++ b/Documentation/networking/e1000.rst
@@ -34,7 +34,8 @@ Command Line Parameters
 The default value for each parameter is generally the recommended setting,
 unless otherwise noted.
 
-NOTES:  For more information about the AutoNeg, Duplex, and Speed
+NOTES:
+   For more information about the AutoNeg, Duplex, and Speed
 parameters, see the "Speed and Duplex Configuration" section in
 this document.
 
@@ -45,22 +46,27 @@ NOTES:  For more information about the AutoNeg, Duplex, and 
Speed
 
 AutoNeg
 ---
+
 (Supported only on adapters with copper connections)
-Valid Range:   0x01-0x0F, 0x20-0x2F
-Default Value: 0x2F
+
+:Valid Range:   0x01-0x0F, 0x20-0x2F
+:Default Value: 0x2F
 
 This parameter is a bit-mask that specifies the speed and duplex settings
 advertised by the adapter.  When this parameter is used, the Speed and
 Duplex parameters must not be specified.
 
-NOTE:  Refer to the Speed and Duplex section of this readme for more
+NOTE:
+   Refer to the Speed and Duplex section of this readme for more
information on the AutoNeg parameter.
 
 Duplex
 --
+
 (Supported only on adapters with copper connections)
-Valid Range:   0-2 (0=auto-negotiate, 1=half, 2=full)
-Default Value: 0
+
+:Valid Range:   0-2 (0=auto-negotiate, 1=half, 2=full)
+:Default Value: 0
 
 This defines the direction in which data is allowed to flow.  Can be
 either one or two-directional.  If both Duplex and the link partner are
@@ -70,18 +76,22 @@ duplex.
 
 FlowControl
 ---
-Valid Range:   0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx)
-Default Value: Reads flow control settings from the EEPROM
+
+:Valid Range:   0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx)
+:Default Value: Reads flow control settings from the EEPROM
 
 This parameter controls the automatic generation(Tx) and response(Rx)
 to Ethernet PAUSE frames.
 
 InterruptThrottleRate
 -
+
 (not supported on Intel(R) 82542, 82543 or 82544-based adapters)
-Valid Range:   0,1,3,4,100-10 (0=off, 1=dynamic, 3=dynamic conservative,
- 4=simplified balancing)
-Default Value: 3
+
+:Valid Range:
+   0,1,3,4,100-10 (0=off, 1=dynamic, 3=dynamic conservative,
+   4=simplified balancing)
+:Default Value: 3
 
 The driver can limit the amount of interrupts per second that the adapter
 will generate for incoming packets. It does this by writing a value to the
@@ -135,13 +145,15 @@ Setting InterruptThrottleRate to 0 turns off any 
interrupt moderation
 and may improve small packet latency, but is generally not suitable
 for bulk throughput traffic.
 
-NOTE:  InterruptThrottleRate takes precedence over the TxAbsIntDelay and
+NOTE:
+   InterruptThrottleRate takes precedence over the TxAbsIntDelay and
RxAbsIntDelay parameters.  In other words, minimizing the receive
and/or transmit absolute delays does not force the controller to
generate more interrupts than what the Interrupt Throttle Rate
allows.
 
-CAUTION:  If you are using the Intel(R) PRO/1000 CT Network Connection
+CAUTION:
+  If you are using the Intel(R) PRO/1000 CT Network Connection
   (controller 82547), setting InterruptThrottleRate to a value
   greater than 75,000, may hang (stop transmitting) adapters
   under certain network conditions.  If this occurs a NETDEV
@@ -151,7 +163,8 @@ CAUTION:  If you are using the Intel(R) PRO/1000 CT Network 
Connection
   hang, ensure that InterruptThrottleRate is set no greater
   than 75,000 and is not set to 0.
 
-NOTE:  When e1000 is loaded with default settings and multiple adapters
+NOTE:
+   When e1000 is loaded with default settings and multiple adapters
are in use simultaneously, the CPU utilization may increase non-
linearly.  In order to limit the CPU utilization without impacting
the overall throughput, we recommend that you load the driver as
@@ -168,9 +181,11 @@ NOTE:  When e1000 is loaded with default settings and 
multiple adapters
 
 RxDescriptors
 -
-Valid Range:   48-256 for 82542 and 82543-based adapters

[PATCH iproute2-next v2] net:sched: add action inheritdsfield to skbedit

2018-07-12 Thread Qiaobin Fu
The new action inheritdsfield copies the field DS of
IPv4 and IPv6 packets into skb->priority. This enables
later classification of packets based on the DS field.

v2:
* Align the output syntax with the input syntax

* Fix the style issues

Original idea by Jamal Hadi Salim 

Signed-off-by: Qiaobin Fu 
Reviewed-by: Michel Machado 
Reviewed-by: Cong Wang 
---

Note that the motivation for this patch is found in the following discussion:
https://www.spinics.net/lists/netdev/msg501061.html
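
For illustration, a minimal usage sketch (the interface name and the
matchall classifier are just examples):

  # tc qdisc add dev eth0 clsact
  # tc filter add dev eth0 ingress matchall action skbedit inheritdsfield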
---
 tc/m_skbedit.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/tc/m_skbedit.c b/tc/m_skbedit.c
index 7391fc7f..59b2affc 100644
--- a/tc/m_skbedit.c
+++ b/tc/m_skbedit.c
@@ -30,16 +30,18 @@
 
 static void explain(void)
 {
-   fprintf(stderr, "Usage: ... skbedit <[QM] [PM] [MM] [PT]>\n"
+   fprintf(stderr, "Usage: ... skbedit <[QM] [PM] [MM] [PT] [IF]>\n"
"QM = queue_mapping QUEUE_MAPPING\n"
"PM = priority PRIORITY\n"
"MM = mark MARK\n"
"PT = ptype PACKETYPE\n"
+   "IF = inheritdsfield\n"
"PACKETYPE = is one of:\n"
"  host, otherhost, broadcast, multicast\n"
"QUEUE_MAPPING = device transmit queue to use\n"
"PRIORITY = classID to assign to priority field\n"
-   "MARK = firewall mark to set\n");
+   "MARK = firewall mark to set\n"
+   "note: inheritdsfield maps DS field to skb->priority\n");
 }
 
 static void
@@ -60,6 +62,7 @@ parse_skbedit(struct action_util *a, int *argc_p, char 
***argv_p, int tca_id,
unsigned int tmp;
__u16 queue_mapping, ptype;
__u32 flags = 0, priority, mark;
+   __u64 pure_flags = 0;
struct tc_skbedit sel = { 0 };
 
if (matches(*argv, "skbedit") != 0)
@@ -111,6 +114,9 @@ parse_skbedit(struct action_util *a, int *argc_p, char 
***argv_p, int tca_id,
}
flags |= SKBEDIT_F_PTYPE;
ok++;
+   } else if (matches(*argv, "inheritdsfield") == 0) {
+   pure_flags |= SKBEDIT_F_INHERITDSFIELD;
+   ok++;
} else if (matches(*argv, "help") == 0) {
usage();
} else {
@@ -156,6 +162,9 @@ parse_skbedit(struct action_util *a, int *argc_p, char 
***argv_p, int tca_id,
if (flags & SKBEDIT_F_PTYPE)
addattr_l(n, MAX_MSG, TCA_SKBEDIT_PTYPE,
  &ptype, sizeof(ptype));
+   if (pure_flags != 0)
+   addattr_l(n, MAX_MSG, TCA_SKBEDIT_FLAGS,
+   &pure_flags, sizeof(pure_flags));
addattr_nest_end(n, tail);
 
*argc_p = argc;
@@ -214,6 +223,13 @@ static int print_skbedit(struct action_util *au, FILE *f, 
struct rtattr *arg)
else
print_uint(PRINT_ANY, "ptype", " ptype %u", ptype);
}
+   if (tb[TCA_SKBEDIT_FLAGS] != NULL) {
+   __u64 *flags = RTA_DATA(tb[TCA_SKBEDIT_FLAGS]);
+
+   if (*flags & SKBEDIT_F_INHERITDSFIELD)
+   print_string(PRINT_ANY, "inheritdsfield", " %s",
+"inheritdsfield");
+   }
 
print_action_control(f, " ", p->action, "");
 
-- 
2.17.1



Re: [BUG] bonded interfaces drop bpdu (stp) frames

2018-07-12 Thread Michal Soltys

On 07/12/2018 04:51 PM, Jay Vosburgh wrote:

Mahesh Bandewar (महेश बंडेवार) wrote:


On Wed, Jul 11, 2018 at 3:23 PM, Michal Soltys  wrote:


Hi,

As weird as that sounds, this is what I observed today after bumping
kernel version. I have a setup where 2 bonds are attached to linux
bridge and physically are connected to two switches doing MSTP (and
linux bridge is just passing them).

Initially I suspected some changes related to bridge code - but quick
peek at the code showed nothing suspicious - and the part of it that
explicitly passes stp frames if stp is not enabled has seen little
changes (e.g. per-port group_fwd_mask added recently). Furthermore - if
regular non-bonded interfaces are attached everything works fine.

Just to be sure I detached the bond (802.3ad mode) and checked it with
simple tcpdump (ether proto \\stp) - and indeed no hello packets were
there (with them being present just fine on active enslaved interface,
or on the bond device in earlier kernels).

If time permits I'll bisect tomorrow to pinpoint the commit, but from
quick todays test - 4.9.x is working fine, while 4.16.16 (tested on
debian) and 4.17.3 (tested on archlinux) are failing.

Unless this is already a known issue (or you have any suggestions what
could be responsible).


I believe these are link-local-multicast messages and sometime back a
change went in to not pass those frames to the bonding master. This
could be the side effect of that.


Mahesh, I suspect you're thinking of:

commit b89f04c61efe3b7756434d693b9203cc0cce002e
Author: Chonggang Li 
Date:   Sun Apr 16 12:02:18 2017 -0700

 bonding: deliver link-local packets with skb->dev set to link that packets 
arrived on

Michal, are you able to revert this patch and test?

-J

---
-Jay Vosburgh, jay.vosbu...@canonical.com




Just tested - yes, reverting that patch solves the issues.


Re: [PATCH v3 net-next 10/19] tls: Fix zerocopy_from_iter iov handling

2018-07-12 Thread Dave Watson
On 07/11/18 10:54 PM, Boris Pismenny wrote:
> zerocopy_from_iter iterates over the message, but it doesn't revert the
> updates made by the iov iteration. This patch fixes it. Now, the iov can
> be used after calling zerocopy_from_iter.

This breaks tests (which I will send up as selftests shortly).  I
believe we are depending on zerocopy_from_iter to advance the iter,
and if zerocopy_from_iter returns a failure, then we revert it.  So
you can revert it here if you want, but you'd have to advance it if we
actually used it instead.
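
In other words, the expected calling pattern is roughly the following
(copy_helper(), msg, sk and len are placeholders here, not the actual
tls_sw code):

        size_t before = iov_iter_count(&msg->msg_iter);
        int err;

        /* the helper advances msg_iter as it consumes the iovec */
        err = copy_helper(sk, &msg->msg_iter, len);
        if (err < 0)
                /* rewind only what the failed attempt consumed */
                iov_iter_revert(&msg->msg_iter,
                                before - iov_iter_count(&msg->msg_iter));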


Re: [PATCH net-next] net: gro: properly remove skb from list

2018-07-12 Thread Tyler Hicks
On 2018-07-12 16:24:59, Prashant Bhole wrote:
> Following crash occurs in validate_xmit_skb_list() when same skb is
> iterated multiple times in the loop and consume_skb() is called.
> 
> The root cause is calling list_del_init(&skb->list) and not clearing
> skb->next in d4546c2509b1. list_del_init(&skb->list) sets skb->next
> to point to skb itself. skb->next needs to be cleared because other
> parts of network stack uses another kind of SKB lists.
> validate_xmit_skb_list() uses such list.
> 
> A similar type of bugfix was reported by Jesper Dangaard Brouer.
> https://patchwork.ozlabs.org/patch/942541/
> 
> This patch clears skb->next and changes list_del_init() to list_del()
> so that list->prev will maintain the list poison.
> 
> [  148.185511] 
> ==
> [  148.187865] BUG: KASAN: use-after-free in validate_xmit_skb_list+0x4b/0xa0
> [  148.190158] Read of size 8 at addr 8801e52eefc0 by task swapper/1/0
> [  148.192940]
> [  148.193642] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.18.0-rc3+ #25
> [  148.195423] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
> [  148.199129] Call Trace:
> [  148.200565]  
> [  148.201911]  dump_stack+0xc6/0x14c
> [  148.203572]  ? dump_stack_print_info.cold.1+0x2f/0x2f
> [  148.205083]  ? kmsg_dump_rewind_nolock+0x59/0x59
> [  148.206307]  ? validate_xmit_skb+0x2c6/0x560
> [  148.207432]  ? debug_show_held_locks+0x30/0x30
> [  148.208571]  ? validate_xmit_skb_list+0x4b/0xa0
> [  148.211144]  print_address_description+0x6c/0x23c
> [  148.212601]  ? validate_xmit_skb_list+0x4b/0xa0
> [  148.213782]  kasan_report.cold.6+0x241/0x2fd
> [  148.214958]  validate_xmit_skb_list+0x4b/0xa0
> [  148.216494]  sch_direct_xmit+0x1b0/0x680
> [  148.217601]  ? dev_watchdog+0x4e0/0x4e0
> [  148.218675]  ? do_raw_spin_trylock+0x10/0x120
> [  148.219818]  ? do_raw_spin_lock+0xe0/0xe0
> [  148.221032]  __dev_queue_xmit+0x1167/0x1810
> [  148.222155]  ? sched_clock+0x5/0x10
> [...]
> 
> [  148.474257] Allocated by task 0:
> [  148.475363]  kasan_kmalloc+0xbf/0xe0
> [  148.476503]  kmem_cache_alloc+0xb4/0x1b0
> [  148.477654]  __build_skb+0x91/0x250
> [  148.478677]  build_skb+0x67/0x180
> [  148.479657]  e1000_clean_rx_irq+0x542/0x8a0
> [  148.480757]  e1000_clean+0x652/0xd10
> [  148.481772]  net_rx_action+0x4ea/0xc20
> [  148.482808]  __do_softirq+0x1f9/0x574
> [  148.483831]
> [  148.484575] Freed by task 0:
> [  148.485504]  __kasan_slab_free+0x12e/0x180
> [  148.486589]  kmem_cache_free+0xb4/0x240
> [  148.487634]  kfree_skbmem+0xed/0x150
> [  148.488648]  consume_skb+0x146/0x250
> [  148.489665]  validate_xmit_skb+0x2b7/0x560
> [  148.490754]  validate_xmit_skb_list+0x70/0xa0
> [  148.491897]  sch_direct_xmit+0x1b0/0x680
> [  148.493949]  __dev_queue_xmit+0x1167/0x1810
> [  148.495103]  br_dev_queue_push_xmit+0xce/0x250
> [  148.496196]  br_forward_finish+0x276/0x280
> [  148.497234]  __br_forward+0x44f/0x520
> [  148.498260]  br_forward+0x19f/0x1b0
> [  148.499264]  br_handle_frame_finish+0x65e/0x980
> [  148.500398]  NF_HOOK.constprop.10+0x290/0x2a0
> [  148.501522]  br_handle_frame+0x417/0x640
> [  148.502582]  __netif_receive_skb_core+0xaac/0x18f0
> [  148.503753]  __netif_receive_skb_one_core+0x98/0x120
> [  148.504958]  netif_receive_skb_internal+0xe3/0x330
> [  148.506154]  napi_gro_complete+0x190/0x2a0
> [  148.507243]  dev_gro_receive+0x9f7/0x1100
> [  148.508316]  napi_gro_receive+0xcb/0x260
> [  148.509387]  e1000_clean_rx_irq+0x2fc/0x8a0
> [  148.510501]  e1000_clean+0x652/0xd10
> [  148.511523]  net_rx_action+0x4ea/0xc20
> [  148.512566]  __do_softirq+0x1f9/0x574
> [  148.513598]
> [  148.514346] The buggy address belongs to the object at 8801e52eefc0
> [  148.514346]  which belongs to the cache skbuff_head_cache of size 232
> [  148.517047] The buggy address is located 0 bytes inside of
> [  148.517047]  232-byte region [8801e52eefc0, 8801e52ef0a8)
> [  148.519549] The buggy address belongs to the page:
> [  148.520726] page:ea000794bb00 count:1 mapcount:0 
> mapping:880106f4dfc0 index:0x8801e52ee840 compound_mapcount: 0
> [  148.524325] flags: 0x17c0008100(slab|head)
> [  148.525481] raw: 0017c0008100 880106b938d0 880106b938d0 
> 880106f4dfc0
> [  148.527503] raw: 8801e52ee840 00190011 0001 
> 
> [  148.529547] page dumped because: kasan: bad access detected
> 
> Fixes: d4546c2509b1 ("net: Convert GRO SKB handling to list_head.")
> Signed-off-by: Prashant Bhole 
> Reported-by: Tyler Hicks 

Thanks for the fix! I have verified that it fixes the crash that I was
seeing.

  Tested-by: Tyler Hicks 

Your analysis of the problem makes sense. I feel like a helper function
to properly remove an skb from the list_head and NULL the next pointer, in
the GRO code, before passing it on to something else that uses the
hand-rolled linked list implementation, could be useful.
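
Something along these lines, for illustration (the helper name here is
made up):

static inline void skb_gro_list_del(struct sk_buff *skb)
{
        /* skb->list shares storage with skb->next/prev, so list_del()
         * leaves poison in both pointers; clear ->next so code walking
         * the classic ->next chain sees a terminated list, while ->prev
         * keeps the list poison for debugging.
         */
        list_del(&skb->list);
        skb->next = NULL;
}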

Re: [PATCH v3 net-next 00/19] TLS offload rx, netdev & mlx5

2018-07-12 Thread Dave Watson
On 07/11/18 10:54 PM, Boris Pismenny wrote:
> Hi,
> 
> The following series provides TLS RX inline crypto offload.

All the tls patches look good to me except #10

"tls: Fix zerocopy_from_iter iov handling"

which seems to break the non-device zerocopy flow. 

The integration is very clean, thanks!

> 
> v2->v3:
> - Fix typo
> - Adjust cover letter
> - Fix bug in zero copy flows
> - Use network byte order for the record number in resync
> - Adjust the sequence provided in resync
> 
> v1->v2:
> - Fix bisectability problems due to variable name changes
> - Fix potential uninitialized return value
> 


Re: [PATCH bpf-next 0/3] bpf: install eBPF helper man page along with bpftool doc

2018-07-12 Thread Daniel Borkmann
On 07/12/2018 01:52 PM, Quentin Monnet wrote:
> The three patches in this series are related to the documentation for eBPF
> helpers. The first patch brings minor formatting edits to the documentation
> in include/uapi/linux/bpf.h, and the second one updates the related header
> file under tools/.
> 
> The third patch adds a Makefile under tools/bpf for generating the
> documentation (man pages) about eBPF helpers. The targets defined in this
> file can also be called from the bpftool directory (please refer to
> relevant commit logs for details).
> 
> Quentin Monnet (3):
>   bpf: fix documentation for eBPF helpers
>   tools: bpf: synchronise BPF UAPI header with tools
>   tools: bpf: build and install man page for eBPF helpers from bpftool/

Looks good, applied to bpf-next, thanks Quentin!


Re: [net 0/4][pull request] Intel Wired LAN Driver Updates 2018-07-12

2018-07-12 Thread David Miller
From: Jeff Kirsher 
Date: Thu, 12 Jul 2018 08:27:52 -0700

> This series contains updates to ixgbe and e100/e1000 kernel documentation.
> 
> Alex fixes ixgbe to ensure that we are more explicit about the ordering
> of updates to the receive address register (RAR) table.
> 
> Dan Carpenter fixes an issue where we were reading one element beyond
> the end of the array.
> 
> Mauro Carvalho Chehab fixes formatting issues in the e100.rst and
> e1000.rst that were causing errors during 'make htmldocs'.
> 
> The following are changes since commit 
> 20c4515a1af770f4fb0dc6b044ffc9a6031e5767:
>   qed: fix spelling mistake "successffuly" -> "successfully"
> and are available in the git repository at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue 10GbE

Pulled, thanks Jeff.


[PATCH net-next] selftests: tls: add selftests for TLS sockets

2018-07-12 Thread Dave Watson
Add selftests for TLS sockets.  Tests various iov and message options,
poll blocking and nonblocking behavior, partial message sends / receives,
and control message data.  Tests should pass regardless of whether TLS
is enabled in the kernel or not, and print a warning message if not.

Signed-off-by: Dave Watson 
---
 tools/testing/selftests/net/Makefile |   2 +-
 tools/testing/selftests/net/tls.c| 692 +++
 2 files changed, 693 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/net/tls.c

diff --git a/tools/testing/selftests/net/Makefile 
b/tools/testing/selftests/net/Makefile
index 663e11e85727..9cca68e440a0 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -13,7 +13,7 @@ TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
 TEST_GEN_FILES += tcp_mmap tcp_inq psock_snd
 TEST_GEN_FILES += udpgso udpgso_bench_tx udpgso_bench_rx
 TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
-TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict
+TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls
 
 include ../lib.mk
 
diff --git a/tools/testing/selftests/net/tls.c 
b/tools/testing/selftests/net/tls.c
new file mode 100644
index ..b3ebf2646e52
--- /dev/null
+++ b/tools/testing/selftests/net/tls.c
@@ -0,0 +1,692 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include "../kselftest_harness.h"
+
+#define TLS_PAYLOAD_MAX_LEN 16384
+#define SOL_TLS 282
+
+FIXTURE(tls)
+{
+   int fd, cfd;
+   bool notls;
+};
+
+FIXTURE_SETUP(tls)
+{
+   struct tls12_crypto_info_aes_gcm_128 tls12;
+   struct sockaddr_in addr;
+   socklen_t len;
+   int sfd, ret;
+
+   self->notls = false;
+   len = sizeof(addr);
+
+   memset(&tls12, 0, sizeof(tls12));
+   tls12.info.version = TLS_1_2_VERSION;
+   tls12.info.cipher_type = TLS_CIPHER_AES_GCM_128;
+
+   addr.sin_family = AF_INET;
+   addr.sin_addr.s_addr = htonl(INADDR_ANY);
+   addr.sin_port = 0;
+
+   self->fd = socket(AF_INET, SOCK_STREAM, 0);
+   sfd = socket(AF_INET, SOCK_STREAM, 0);
+
+   ret = bind(sfd, &addr, sizeof(addr));
+   ASSERT_EQ(ret, 0);
+   ret = listen(sfd, 10);
+   ASSERT_EQ(ret, 0);
+
+   ret = getsockname(sfd, &addr, &len);
+   ASSERT_EQ(ret, 0);
+
+   ret = connect(self->fd, &addr, sizeof(addr));
+   ASSERT_EQ(ret, 0);
+
+   ret = setsockopt(self->fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls"));
+   if (ret != 0) {
+   self->notls = true;
+   printf("Failure setting TCP_ULP, testing without tls\n");
+   }
+
+   if (!self->notls) {
+   ret = setsockopt(self->fd, SOL_TLS, TLS_TX, &tls12,
+sizeof(tls12));
+   ASSERT_EQ(ret, 0);
+   }
+
+   self->cfd = accept(sfd, &addr, &len);
+   ASSERT_GE(self->cfd, 0);
+
+   if (!self->notls) {
+   ret = setsockopt(self->cfd, IPPROTO_TCP, TCP_ULP, "tls",
+sizeof("tls"));
+   ASSERT_EQ(ret, 0);
+
+   ret = setsockopt(self->cfd, SOL_TLS, TLS_RX, &tls12,
+sizeof(tls12));
+   ASSERT_EQ(ret, 0);
+   }
+
+   close(sfd);
+}
+
+FIXTURE_TEARDOWN(tls)
+{
+   close(self->fd);
+   close(self->cfd);
+}
+
+TEST_F(tls, sendfile)
+{
+   int filefd = open("/proc/self/exe", O_RDONLY);
+   struct stat st;
+
+   EXPECT_GE(filefd, 0);
+   fstat(filefd, &st);
+   EXPECT_GE(sendfile(self->fd, filefd, 0, st.st_size), 0);
+}
+
+TEST_F(tls, send_then_sendfile)
+{
+   int filefd = open("/proc/self/exe", O_RDONLY);
+   char const *test_str = "test_send";
+   int to_send = strlen(test_str) + 1;
+   char recv_buf[10];
+   struct stat st;
+   char *buf;
+
+   EXPECT_GE(filefd, 0);
+   fstat(filefd, &st);
+   buf = (char *)malloc(st.st_size);
+
+   EXPECT_EQ(send(self->fd, test_str, to_send, 0), to_send);
+   EXPECT_EQ(recv(self->cfd, recv_buf, to_send, 0), to_send);
+   EXPECT_EQ(memcmp(test_str, recv_buf, to_send), 0);
+
+   EXPECT_GE(sendfile(self->fd, filefd, 0, st.st_size), 0);
+   EXPECT_EQ(recv(self->cfd, buf, st.st_size, 0), st.st_size);
+}
+
+TEST_F(tls, recv_max)
+{
+   unsigned int send_len = TLS_PAYLOAD_MAX_LEN;
+   char recv_mem[TLS_PAYLOAD_MAX_LEN];
+   char buf[TLS_PAYLOAD_MAX_LEN];
+
+   EXPECT_GE(send(self->fd, buf, send_len, 0), 0);
+   EXPECT_NE(recv(self->cfd, recv_mem, send_len, 0), -1);
+   EXPECT_EQ(memcmp(buf, recv_mem, send_len), 0);
+}
+
+TEST_F(tls, recv_small)
+{
+   char const *test_str = "test_read";
+   int send_len = 10;
+   char buf[10];
+
+   send_len = strlen(test_str) + 1;

Re: [BUG] bonded interfaces drop bpdu (stp) frames

2018-07-12 Thread Jay Vosburgh
Michal Soltys  wrote:

>On 07/12/2018 04:51 PM, Jay Vosburgh wrote:
>> Mahesh Bandewar (महेश बंडेवार) wrote:
>>
>>> On Wed, Jul 11, 2018 at 3:23 PM, Michal Soltys  wrote:

 Hi,

 As weird as that sounds, this is what I observed today after bumping
 kernel version. I have a setup where 2 bonds are attached to linux
 bridge and physically are connected to two switches doing MSTP (and
 linux bridge is just passing them).

 Initially I suspected some changes related to bridge code - but quick
 peek at the code showed nothing suspicious - and the part of it that
 explicitly passes stp frames if stp is not enabled has seen little
 changes (e.g. per-port group_fwd_mask added recently). Furthermore - if
 regular non-bonded interfaces are attached everything works fine.

 Just to be sure I detached the bond (802.3ad mode) and checked it with
 simple tcpdump (ether proto \\stp) - and indeed no hello packets were
 there (with them being present just fine on active enslaved interface,
 or on the bond device in earlier kernels).

 If time permits I'll bisect tomorrow to pinpoint the commit, but from
 quick todays test - 4.9.x is working fine, while 4.16.16 (tested on
 debian) and 4.17.3 (tested on archlinux) are failing.

 Unless this is already a known issue (or you have any suggestions what
 could be responsible).

>>> I believe these are link-local-multicast messages and sometime back a
>>> change went in to not pass those frames to the bonding master. This
>>> could be the side effect of that.
>>
>>  Mahesh, I suspect you're thinking of:
>>
>> commit b89f04c61efe3b7756434d693b9203cc0cce002e
>> Author: Chonggang Li 
>> Date:   Sun Apr 16 12:02:18 2017 -0700
>>
>>  bonding: deliver link-local packets with skb->dev set to link that 
>> packets arrived on
>>
>>  Michal, are you able to revert this patch and test?
>>
>>  -J
>>
>> ---
>>  -Jay Vosburgh, jay.vosbu...@canonical.com
>>
>
>
>Just tested - yes, reverting that patch solves the issues.

Chonggang,

Reading the changelog in your commit referenced above, I'm not
entirely sure what actual problem it is fixing.  Could you elaborate?

As the patch appears to cause a regression, it needs to be
either fixed or reverted.

Mahesh, you signed-off on it as well, perhaps you also have some
context?

-J

---
-Jay Vosburgh, jay.vosbu...@canonical.com


Re: [BUG] mlx5 have problems with ipv4-ipv6 tunnels in linux 4.4

2018-07-12 Thread Or Gerlitz
On Tue, Jul 10, 2018 at 12:19 PM, Konstantin Khlebnikov
 wrote:
> On 10.07.2018 01:31, Saeed Mahameed wrote:
>>
>> On Tue, Jul 3, 2018 at 10:45 PM, Konstantin Khlebnikov
>>  wrote:
>>>
>>> I'm seeing problems with tunnelled traffic with Mellanox Technologies
>>> MT27710 Family [ConnectX-4 Lx] using vanilla driver from linux 4.4.y
>>>
>>> Packets with payload bigger than 116 bytes are not transmitted.
>>> Smaller packets and normal ipv6 work fine.
>>>
>>
>> Hi Konstantin,
>>
>> Is this true for all ipv6 traffic or just ipv4-ipv6 tunnels ?
>>
>> what is the skb_network_offset(skb) for such packet ?
>>
>>> In linux 4.9, 4.14 and out-of-tree driver everything seems fine for now.
>>> It's hard to guess or bisect commit: there are a lot of changes and
>>> something wrong with driver or swiotlb in 4.7..4.8.
>>> 4.6 is affected too - so this should be something between 4.6 and 4.9
>>>
>>> Probably this case was fixed indirectly by adding some kind of offload
>>> and
>>> non-offloaded path is still broken.
>>> Please give me a hint: which commit could it be.
>>>
>>
>> I suspect it works in a newer kernel since we introduced on 4.7/4.8:
>
>
> Yes, this works. Thank you.
>
> Problem was with VLAN rather than tunnel.
>
> This hunk from first patch is enough:
> -#define MLX5E_MIN_INLINE ETH_HLEN
> +#define MLX5E_MIN_INLINE (ETH_HLEN + VLAN_HLEN)


so... what should we do to fix 4.4-stable? just push the 1st patch there?

Saeed, 4.4 is LTS, let's fix it there.

Or.


> In my case full data path looks like
>
> ( tcp -> ipip6 -> veth ) -> netns-to-host -> ( veth -> vlan at mlx5 )
>
> Tunnelled traffic also goes to vlan, while most of other traffic goes
> through non-tagged interface and worked fine.
>
> max_inline is 226 so (226 - vlan - ethernet - ipv6 - ipv4 - tcp)
> leaves exactly 116 bytes for payload.
>
>
>>
>> commit e3a19b53cbb0e6738b7a547f262179065b72e3fa
>> Author: Matthew Finlay 
>> Date:   Thu Jun 30 17:34:47 2016 +0300
>>
>>  net/mlx5e: Copy all L2 headers into inline segment
>>
>>  ConnectX4-Lx uses an inline wqe mode that currently defaults to
>>  requiring the entire L2 header be included in the wqe.
>>  This patch fixes mlx5e_get_inline_hdr_size() to account for
>>  all L2 headers (VLAN, QinQ, etc) using skb_network_offset(skb).
>>
>>  Fixes: e586b3b0baee ("net/mlx5: Ethernet Datapath files")
>>  Signed-off-by: Matthew Finlay 
>>  Signed-off-by: Saeed Mahameed 
>>  Signed-off-by: David S. Miller 
>>
>>
>>
>> commit ae76715d153e33c249b6850361e4d8d775388b5a
>> Author: Hadar Hen Zion 
>> Date:   Sun Jul 24 16:12:39 2016 +0300
>>
>>  net/mlx5e: Check the minimum inline header mode before xmit
>>
>> and then some fixes on top of it, such as:
>>
>> commit f600c6088018d1dbc5777d18daa83660f7ea4a64
>> Author: Eran Ben Elisha 
>> Date:   Thu Jan 25 11:18:09 2018 +0200
>>
>>  net/mlx5e: Verify inline header size do not exceed SKB linear size
>>
>>
>> anyhow, can you try the above patches one by one  on 4.4.y and see if it
>> helps ?
>>
>>
>> Thanks,
>> Saeed
>>
>


Re: [PATCH net-next 2/2] net: phy: add phy_speed_down and phy_speed_up

2018-07-12 Thread Heiner Kallweit
On 11.07.2018 23:59, Heiner Kallweit wrote:
> On 11.07.2018 23:33, Florian Fainelli wrote:
>>
>>
>> On 07/11/2018 02:08 PM, Heiner Kallweit wrote:
>>> On 11.07.2018 22:55, Andrew Lunn wrote:
> +/**
> + * phy_speed_down - set speed to lowest speed supported by both link 
> partners
> + * @phydev: the phy_device struct
> + * @sync: perform action synchronously
> + *
> + * Description: Typically used to save energy when waiting for a WoL 
> packet
> + */
> +int phy_speed_down(struct phy_device *phydev, bool sync)

 This sync parameter needs some more thought. I'm not sure it is safe.

 How does a PHY trigger a WoL wake up? I guess some use the interrupt
 pin. How does a PHY indicate auto-neg has completed? It triggers an
 interrupt. So it seems like there is a danger here we suspend, and
 then wake up 2 seconds later when auto-neg has completed.

 I'm not sure we can safely suspend until auto-neg has completed.

> +/**
> + * phy_speed_up - (re)set advertised speeds to all supported speeds
> + * @phydev: the phy_device struct
> + * @sync: perform action synchronously
> + *
> + * Description: Used to revert the effect of phy_speed_down
> + */
> +int phy_speed_up(struct phy_device *phydev, bool sync)

 And here, i'm thinking the opposite. A MAC driver needs to be ready
 for the PHY state to change at any time. So why do we need to wait?
 Just let the normal mechanisms inform the MAC when the link is up.

>>> I see your points, thanks for the feedback. In my case WoL triggers
>>> a PCI PME and the code works as expected, but I agree this may be
>>> different in other setups (external PHY).
>>>
>>> The sync parameter was inspired by following comment from Florian:
>>> "One thing that bothers me a bit is that this should ideally be
>>> offered as both blocking and non-blocking options"
>>> So let's see which comments he may have before preparing a v2.
>>
>> What I had in mind is that you would be able to register a callback that
>> would tell you when auto-negotiation completes, and not register one if
>> you did not want to have that information.
>>
>> As Andrew points out though, with PHY using interrupts, this might be a
>> bit challenging to do because you will get an interrupt about "something
>> has changed" and you would have to run the callback from the PHY state
>> machine to determine this was indeed a result of triggering
>> auto-negotiation. Maybe polling for auto-negotiation like you do here is
>> good enough.
>>
> OK, then I would poll for autoneg finished in phy_speed_down and
> remove the polling option from phy_speed_up. I will do some tests
> with this before submitting a v2.
> 
Like r8169, the tg3 driver also doesn't wait for the speed-down renegotiation
to finish. Therefore, even though I share Andrew's concerns, there seem
to be chips where it's safe to not wait for the renegotiation to finish
(e.g. because device is in PCI D3 already and can't generate an interrupt).
Having said that I'd keep the sync parameter for phy_speed_down so that
the driver can decide.
For phy_speed_up I'll remove the sync parameter as discussed.
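
For the sync case the polling part could look roughly like this (helper
name and the timeout are just placeholders):

static int phy_poll_aneg_done_sketch(struct phy_device *phydev)
{
        int i, ret;

        /* wait up to ~2s for the link partner to complete autoneg */
        for (i = 0; i < 40; i++) {
                ret = phy_aneg_done(phydev);
                if (ret)
                        return ret < 0 ? ret : 0;
                msleep(50);
        }

        return -ETIMEDOUT;
}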


>> One nit, you might have to check for those functions that the PHY did
>> have auto-negotiation enabled and was not forced.
>>
> This I'm doing already, or do you mean something different?
> 



Re: [PATCH 00/14] ARM BPF jit compiler improvements

2018-07-12 Thread Daniel Borkmann
On 07/11/2018 11:30 AM, Russell King - ARM Linux wrote:
> Hi,
> 
> This series improves the ARM BPF JIT compiler by:
> - enumerating the stack layout rather than using constants that happen
>   to be multiples of four
> - rejig the BPF "register" accesses to use negative numbers instead of
>   positive, which could be confused with register numbers in the bpf2a32
>   array.
> - since we maintain the ARM FP register as a pointer to the top of our
>   scratch space (or, with frame pointers enabled, a valid ARM frame
>   pointer register), we can access our scratch space using FP, which is
>   constant across all BPF programs, including tail-called programs.
> - use immediate forms of ARM instructions where possible, rather than
>   first loading the immediate into an ARM register.
> - use load-with-shift instruction rather than separate shift instruction
>   followed by load
> - avoid reloading index and array in the tail-call code
> - use double-word load/store instructions where available
> 
> Version 2:
> - Fix ARMv5 test pointed out by Olof
> - Fix build error found by 0-day (adding an additional patch)
> 
>  arch/arm/net/bpf_jit_32.c | 982 
> --
>  arch/arm/net/bpf_jit_32.h |  42 +-
>  2 files changed, 543 insertions(+), 481 deletions(-)

Applied to bpf-next, thanks a lot Russell!


[PATCH v2 net-next 0/9] lan743x: Add features to lan743x driver

2018-07-12 Thread Bryan Whitehead
This patch series adds extra features to the lan743x driver.

Updates for V2:
Patch 3/9 - Used ARRAY_SIZE macro in lan743x_ethtool_get_ethtool_stats.
Patch 5/9 - Used MAX_EEPROM_SIZE in lan743x_ethtool_set_eeprom.
Patch 6/9 - Removed unnecessary read of PMT_CTL.
Used CRC algorithm from lib.
Removed PHY interrupt settings from lan743x_pm_suspend
Change "#if CONFIG_PM" to "#ifdef CONFIG_PM"

Bryan Whitehead (9):
  lan743x: Add support for ethtool get_drvinfo
  lan743x: Add support for ethtool link settings
  lan743x: Add support for ethtool statistics
  lan743x: Add support for ethtool message level
  lan743x: Add support for ethtool eeprom access
  lan743x: Add power management support
  lan743x: Add EEE support
  lan743x: Add RSS support
  lan743x: Add PTP support

 drivers/net/ethernet/microchip/Makefile  |2 +-
 drivers/net/ethernet/microchip/lan743x_ethtool.c |  729 +
 drivers/net/ethernet/microchip/lan743x_ethtool.h |   11 +
 drivers/net/ethernet/microchip/lan743x_main.c|  293 +-
 drivers/net/ethernet/microchip/lan743x_main.h|  229 -
 drivers/net/ethernet/microchip/lan743x_ptp.c | 1194 ++
 drivers/net/ethernet/microchip/lan743x_ptp.h |   78 ++
 7 files changed, 2528 insertions(+), 8 deletions(-)
 create mode 100644 drivers/net/ethernet/microchip/lan743x_ethtool.c
 create mode 100644 drivers/net/ethernet/microchip/lan743x_ethtool.h
 create mode 100644 drivers/net/ethernet/microchip/lan743x_ptp.c
 create mode 100644 drivers/net/ethernet/microchip/lan743x_ptp.h

-- 
2.7.4



[PATCH v2 net-next 2/9] lan743x: Add support for ethtool link settings

2018-07-12 Thread Bryan Whitehead
Use default link setting functions

Signed-off-by: Bryan Whitehead 
---
 drivers/net/ethernet/microchip/lan743x_ethtool.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/microchip/lan743x_ethtool.c 
b/drivers/net/ethernet/microchip/lan743x_ethtool.c
index 0e20758..5c4582c 100644
--- a/drivers/net/ethernet/microchip/lan743x_ethtool.c
+++ b/drivers/net/ethernet/microchip/lan743x_ethtool.c
@@ -5,6 +5,7 @@
 #include "lan743x_main.h"
 #include "lan743x_ethtool.h"
 #include 
+#include 
 
 static void lan743x_ethtool_get_drvinfo(struct net_device *netdev,
struct ethtool_drvinfo *info)
@@ -18,4 +19,8 @@ static void lan743x_ethtool_get_drvinfo(struct net_device 
*netdev,
 
 const struct ethtool_ops lan743x_ethtool_ops = {
.get_drvinfo = lan743x_ethtool_get_drvinfo,
+   .get_link = ethtool_op_get_link,
+
+   .get_link_ksettings = phy_ethtool_get_link_ksettings,
+   .set_link_ksettings = phy_ethtool_set_link_ksettings,
 };
-- 
2.7.4



[PATCH v2 net-next 3/9] lan743x: Add support for ethtool statistics

2018-07-12 Thread Bryan Whitehead
Implement ethtool statistics

Signed-off-by: Bryan Whitehead 
---
 drivers/net/ethernet/microchip/lan743x_ethtool.c | 180 +++
 drivers/net/ethernet/microchip/lan743x_main.c|   6 +-
 drivers/net/ethernet/microchip/lan743x_main.h|  31 
 3 files changed, 214 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/microchip/lan743x_ethtool.c 
b/drivers/net/ethernet/microchip/lan743x_ethtool.c
index 5c4582c..9ed9711 100644
--- a/drivers/net/ethernet/microchip/lan743x_ethtool.c
+++ b/drivers/net/ethernet/microchip/lan743x_ethtool.c
@@ -17,10 +17,190 @@ static void lan743x_ethtool_get_drvinfo(struct net_device 
*netdev,
pci_name(adapter->pdev), sizeof(info->bus_info));
 }
 
+static const char lan743x_set0_hw_cnt_strings[][ETH_GSTRING_LEN] = {
+   "RX FCS Errors",
+   "RX Alignment Errors",
+   "Rx Fragment Errors",
+   "RX Jabber Errors",
+   "RX Undersize Frame Errors",
+   "RX Oversize Frame Errors",
+   "RX Dropped Frames",
+   "RX Unicast Byte Count",
+   "RX Broadcast Byte Count",
+   "RX Multicast Byte Count",
+   "RX Unicast Frames",
+   "RX Broadcast Frames",
+   "RX Multicast Frames",
+   "RX Pause Frames",
+   "RX 64 Byte Frames",
+   "RX 65 - 127 Byte Frames",
+   "RX 128 - 255 Byte Frames",
+   "RX 256 - 511 Bytes Frames",
+   "RX 512 - 1023 Byte Frames",
+   "RX 1024 - 1518 Byte Frames",
+   "RX Greater 1518 Byte Frames",
+};
+
+static const char lan743x_set1_sw_cnt_strings[][ETH_GSTRING_LEN] = {
+   "RX Queue 0 Frames",
+   "RX Queue 1 Frames",
+   "RX Queue 2 Frames",
+   "RX Queue 3 Frames",
+};
+
+static const char lan743x_set2_hw_cnt_strings[][ETH_GSTRING_LEN] = {
+   "RX Total Frames",
+   "EEE RX LPI Transitions",
+   "EEE RX LPI Time",
+   "RX Counter Rollover Status",
+   "TX FCS Errors",
+   "TX Excess Deferral Errors",
+   "TX Carrier Errors",
+   "TX Bad Byte Count",
+   "TX Single Collisions",
+   "TX Multiple Collisions",
+   "TX Excessive Collision",
+   "TX Late Collisions",
+   "TX Unicast Byte Count",
+   "TX Broadcast Byte Count",
+   "TX Multicast Byte Count",
+   "TX Unicast Frames",
+   "TX Broadcast Frames",
+   "TX Multicast Frames",
+   "TX Pause Frames",
+   "TX 64 Byte Frames",
+   "TX 65 - 127 Byte Frames",
+   "TX 128 - 255 Byte Frames",
+   "TX 256 - 511 Bytes Frames",
+   "TX 512 - 1023 Byte Frames",
+   "TX 1024 - 1518 Byte Frames",
+   "TX Greater 1518 Byte Frames",
+   "TX Total Frames",
+   "EEE TX LPI Transitions",
+   "EEE TX LPI Time",
+   "TX Counter Rollover Status",
+};
+
+static const u32 lan743x_set0_hw_cnt_addr[] = {
+   STAT_RX_FCS_ERRORS,
+   STAT_RX_ALIGNMENT_ERRORS,
+   STAT_RX_FRAGMENT_ERRORS,
+   STAT_RX_JABBER_ERRORS,
+   STAT_RX_UNDERSIZE_FRAME_ERRORS,
+   STAT_RX_OVERSIZE_FRAME_ERRORS,
+   STAT_RX_DROPPED_FRAMES,
+   STAT_RX_UNICAST_BYTE_COUNT,
+   STAT_RX_BROADCAST_BYTE_COUNT,
+   STAT_RX_MULTICAST_BYTE_COUNT,
+   STAT_RX_UNICAST_FRAMES,
+   STAT_RX_BROADCAST_FRAMES,
+   STAT_RX_MULTICAST_FRAMES,
+   STAT_RX_PAUSE_FRAMES,
+   STAT_RX_64_BYTE_FRAMES,
+   STAT_RX_65_127_BYTE_FRAMES,
+   STAT_RX_128_255_BYTE_FRAMES,
+   STAT_RX_256_511_BYTES_FRAMES,
+   STAT_RX_512_1023_BYTE_FRAMES,
+   STAT_RX_1024_1518_BYTE_FRAMES,
+   STAT_RX_GREATER_1518_BYTE_FRAMES,
+};
+
+static const u32 lan743x_set2_hw_cnt_addr[] = {
+   STAT_RX_TOTAL_FRAMES,
+   STAT_EEE_RX_LPI_TRANSITIONS,
+   STAT_EEE_RX_LPI_TIME,
+   STAT_RX_COUNTER_ROLLOVER_STATUS,
+   STAT_TX_FCS_ERRORS,
+   STAT_TX_EXCESS_DEFERRAL_ERRORS,
+   STAT_TX_CARRIER_ERRORS,
+   STAT_TX_BAD_BYTE_COUNT,
+   STAT_TX_SINGLE_COLLISIONS,
+   STAT_TX_MULTIPLE_COLLISIONS,
+   STAT_TX_EXCESSIVE_COLLISION,
+   STAT_TX_LATE_COLLISIONS,
+   STAT_TX_UNICAST_BYTE_COUNT,
+   STAT_TX_BROADCAST_BYTE_COUNT,
+   STAT_TX_MULTICAST_BYTE_COUNT,
+   STAT_TX_UNICAST_FRAMES,
+   STAT_TX_BROADCAST_FRAMES,
+   STAT_TX_MULTICAST_FRAMES,
+   STAT_TX_PAUSE_FRAMES,
+   STAT_TX_64_BYTE_FRAMES,
+   STAT_TX_65_127_BYTE_FRAMES,
+   STAT_TX_128_255_BYTE_FRAMES,
+   STAT_TX_256_511_BYTES_FRAMES,
+   STAT_TX_512_1023_BYTE_FRAMES,
+   STAT_TX_1024_1518_BYTE_FRAMES,
+   STAT_TX_GREATER_1518_BYTE_FRAMES,
+   STAT_TX_TOTAL_FRAMES,
+   STAT_EEE_TX_LPI_TRANSITIONS,
+   STAT_EEE_TX_LPI_TIME,
+   STAT_TX_COUNTER_ROLLOVER_STATUS
+};
+
+static void lan743x_ethtool_get_strings(struct net_device *netdev,
+   u32 stringset, u8 *data)
+{
+   switch (stringset) {
+   case ETH_SS_STATS:
+   memcpy(data, lan743x_set0_hw_cnt_strings,
+  sizeof(lan743x_set0_hw_cnt_strings));
+   memcpy(&data[sizeof(lan743x_s

[PATCH v2 net-next 5/9] lan743x: Add support for ethtool eeprom access

2018-07-12 Thread Bryan Whitehead
Implement ethtool eeprom access
Also provides access to OTP (One Time Programming)

Signed-off-by: Bryan Whitehead 
---
 drivers/net/ethernet/microchip/lan743x_ethtool.c | 209 +++
 drivers/net/ethernet/microchip/lan743x_main.h|  33 
 2 files changed, 242 insertions(+)

diff --git a/drivers/net/ethernet/microchip/lan743x_ethtool.c 
b/drivers/net/ethernet/microchip/lan743x_ethtool.c
index bab1344..f9ad237 100644
--- a/drivers/net/ethernet/microchip/lan743x_ethtool.c
+++ b/drivers/net/ethernet/microchip/lan743x_ethtool.c
@@ -7,6 +7,178 @@
 #include 
 #include 
 
+/* eeprom */
+#define LAN743X_EEPROM_MAGIC   (0x74A5)
+#define LAN743X_OTP_MAGIC  (0x74F3)
+#define EEPROM_INDICATOR_1 (0xA5)
+#define EEPROM_INDICATOR_2 (0xAA)
+#define EEPROM_MAC_OFFSET  (0x01)
+#define MAX_EEPROM_SIZE512
+#define OTP_INDICATOR_1(0xF3)
+#define OTP_INDICATOR_2(0xF7)
+
+static int lan743x_otp_write(struct lan743x_adapter *adapter, u32 offset,
+u32 length, u8 *data)
+{
+   unsigned long timeout;
+   u32 buf;
+   int i;
+
+   buf = lan743x_csr_read(adapter, OTP_PWR_DN);
+
+   if (buf & OTP_PWR_DN_PWRDN_N_) {
+   /* clear it and wait to be cleared */
+   lan743x_csr_write(adapter, OTP_PWR_DN, 0);
+
+   timeout = jiffies + HZ;
+   do {
+   udelay(1);
+   buf = lan743x_csr_read(adapter, OTP_PWR_DN);
+   if (time_after(jiffies, timeout)) {
+   netif_warn(adapter, drv, adapter->netdev,
+  "timeout on OTP_PWR_DN 
completion\n");
+   return -EIO;
+   }
+   } while (buf & OTP_PWR_DN_PWRDN_N_);
+   }
+
+   /* set to BYTE program mode */
+   lan743x_csr_write(adapter, OTP_PRGM_MODE, OTP_PRGM_MODE_BYTE_);
+
+   for (i = 0; i < length; i++) {
+   lan743x_csr_write(adapter, OTP_ADDR1,
+ ((offset + i) >> 8) &
+ OTP_ADDR1_15_11_MASK_);
+   lan743x_csr_write(adapter, OTP_ADDR2,
+ ((offset + i) &
+ OTP_ADDR2_10_3_MASK_));
+   lan743x_csr_write(adapter, OTP_PRGM_DATA, data[i]);
+   lan743x_csr_write(adapter, OTP_TST_CMD, OTP_TST_CMD_PRGVRFY_);
+   lan743x_csr_write(adapter, OTP_CMD_GO, OTP_CMD_GO_GO_);
+
+   timeout = jiffies + HZ;
+   do {
+   udelay(1);
+   buf = lan743x_csr_read(adapter, OTP_STATUS);
+   if (time_after(jiffies, timeout)) {
+   netif_warn(adapter, drv, adapter->netdev,
+  "Timeout on OTP_STATUS 
completion\n");
+   return -EIO;
+   }
+   } while (buf & OTP_STATUS_BUSY_);
+   }
+
+   return 0;
+}
+
+static int lan743x_eeprom_wait(struct lan743x_adapter *adapter)
+{
+   unsigned long start_time = jiffies;
+   u32 val;
+
+   do {
+   val = lan743x_csr_read(adapter, E2P_CMD);
+
+   if (!(val & E2P_CMD_EPC_BUSY_) ||
+   (val & E2P_CMD_EPC_TIMEOUT_))
+   break;
+   usleep_range(40, 100);
+   } while (!time_after(jiffies, start_time + HZ));
+
+   if (val & (E2P_CMD_EPC_TIMEOUT_ | E2P_CMD_EPC_BUSY_)) {
+   netif_warn(adapter, drv, adapter->netdev,
+  "EEPROM read operation timeout\n");
+   return -EIO;
+   }
+
+   return 0;
+}
+
+static int lan743x_eeprom_confirm_not_busy(struct lan743x_adapter *adapter)
+{
+   unsigned long start_time = jiffies;
+   u32 val;
+
+   do {
+   val = lan743x_csr_read(adapter, E2P_CMD);
+
+   if (!(val & E2P_CMD_EPC_BUSY_))
+   return 0;
+
+   usleep_range(40, 100);
+   } while (!time_after(jiffies, start_time + HZ));
+
+   netif_warn(adapter, drv, adapter->netdev, "EEPROM is busy\n");
+   return -EIO;
+}
+
+static int lan743x_eeprom_read(struct lan743x_adapter *adapter,
+  u32 offset, u32 length, u8 *data)
+{
+   int retval;
+   u32 val;
+   int i;
+
+   retval = lan743x_eeprom_confirm_not_busy(adapter);
+   if (retval)
+   return retval;
+
+   for (i = 0; i < length; i++) {
+   val = E2P_CMD_EPC_BUSY_ | E2P_CMD_EPC_CMD_READ_;
+   val |= (offset & E2P_CMD_EPC_ADDR_MASK_);
+   lan743x_csr_write(adapter, E2P_CMD, val);
+
+   retval = lan743x_eeprom_wait(adapter);
+   if (retval < 0)
+  

[PATCH v2 net-next 9/9] lan743x: Add PTP support

2018-07-12 Thread Bryan Whitehead
PTP support includes:
Ingress and egress timestamping.
PTP clock support
Pulse per second output on GPIO

Signed-off-by: Bryan Whitehead 
---
 drivers/net/ethernet/microchip/Makefile  |2 +-
 drivers/net/ethernet/microchip/lan743x_ethtool.c |   28 +
 drivers/net/ethernet/microchip/lan743x_main.c|   81 +-
 drivers/net/ethernet/microchip/lan743x_main.h|   96 +-
 drivers/net/ethernet/microchip/lan743x_ptp.c | 1194 ++
 drivers/net/ethernet/microchip/lan743x_ptp.h |   78 ++
 6 files changed, 1474 insertions(+), 5 deletions(-)
 create mode 100644 drivers/net/ethernet/microchip/lan743x_ptp.c
 create mode 100644 drivers/net/ethernet/microchip/lan743x_ptp.h

diff --git a/drivers/net/ethernet/microchip/Makefile 
b/drivers/net/ethernet/microchip/Makefile
index 43f47cb..538926d 100644
--- a/drivers/net/ethernet/microchip/Makefile
+++ b/drivers/net/ethernet/microchip/Makefile
@@ -6,4 +6,4 @@ obj-$(CONFIG_ENC28J60) += enc28j60.o
 obj-$(CONFIG_ENCX24J600) += encx24j600.o encx24j600-regmap.o
 obj-$(CONFIG_LAN743X) += lan743x.o
 
-lan743x-objs := lan743x_main.o lan743x_ethtool.o
+lan743x-objs := lan743x_main.o lan743x_ethtool.o lan743x_ptp.o
diff --git a/drivers/net/ethernet/microchip/lan743x_ethtool.c 
b/drivers/net/ethernet/microchip/lan743x_ethtool.c
index 33d6c2d..8800716 100644
--- a/drivers/net/ethernet/microchip/lan743x_ethtool.c
+++ b/drivers/net/ethernet/microchip/lan743x_ethtool.c
@@ -4,6 +4,7 @@
 #include 
 #include "lan743x_main.h"
 #include "lan743x_ethtool.h"
+#include 
 #include 
 #include 
 
@@ -542,6 +543,32 @@ static int lan743x_ethtool_set_rxfh(struct net_device 
*netdev,
return 0;
 }
 
+static int lan743x_ethtool_get_ts_info(struct net_device *netdev,
+  struct ethtool_ts_info *ts_info)
+{
+   struct lan743x_adapter *adapter = netdev_priv(netdev);
+
+   ts_info->so_timestamping = SOF_TIMESTAMPING_TX_SOFTWARE |
+  SOF_TIMESTAMPING_RX_SOFTWARE |
+  SOF_TIMESTAMPING_SOFTWARE |
+  SOF_TIMESTAMPING_TX_HARDWARE |
+  SOF_TIMESTAMPING_RX_HARDWARE |
+  SOF_TIMESTAMPING_RAW_HARDWARE;
+#ifdef CONFIG_PTP_1588_CLOCK
+   if (adapter->ptp.ptp_clock)
+   ts_info->phc_index = ptp_clock_index(adapter->ptp.ptp_clock);
+   else
+   ts_info->phc_index = -1;
+#else
+   ts_info->phc_index = -1;
+#endif
+   ts_info->tx_types = BIT(HWTSTAMP_TX_OFF) |
+   BIT(HWTSTAMP_TX_ON);
+   ts_info->rx_filters = BIT(HWTSTAMP_FILTER_NONE) |
+ BIT(HWTSTAMP_FILTER_ALL);
+   return 0;
+}
+
 static int lan743x_ethtool_get_eee(struct net_device *netdev,
   struct ethtool_eee *eee)
 {
@@ -690,6 +717,7 @@ const struct ethtool_ops lan743x_ethtool_ops = {
.get_rxfh_indir_size = lan743x_ethtool_get_rxfh_indir_size,
.get_rxfh = lan743x_ethtool_get_rxfh,
.set_rxfh = lan743x_ethtool_set_rxfh,
+   .get_ts_info = lan743x_ethtool_get_ts_info,
.get_eee = lan743x_ethtool_get_eee,
.set_eee = lan743x_ethtool_set_eee,
.get_link_ksettings = phy_ethtool_get_link_ksettings,
diff --git a/drivers/net/ethernet/microchip/lan743x_main.c 
b/drivers/net/ethernet/microchip/lan743x_main.c
index 953b581..ca9ae49 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -267,6 +267,10 @@ static void lan743x_intr_shared_isr(void *context, u32 
int_sts, u32 flags)
lan743x_intr_software_isr(adapter);
int_sts &= ~INT_BIT_SW_GP_;
}
+   if (int_sts & INT_BIT_1588_) {
+   lan743x_ptp_isr(adapter);
+   int_sts &= ~INT_BIT_1588_;
+   }
}
if (int_sts)
lan743x_csr_write(adapter, INT_EN_CLR, int_sts);
@@ -976,6 +980,7 @@ static void lan743x_phy_link_status_change(struct 
net_device *netdev)
   ksettings.base.duplex,
   local_advertisement,
   remote_advertisement);
+   lan743x_ptp_update_latency(adapter, ksettings.base.speed);
}
 }
 
@@ -1256,11 +1261,29 @@ static void lan743x_tx_release_desc(struct lan743x_tx 
*tx,
buffer_info->dma_ptr = 0;
buffer_info->buffer_length = 0;
}
-   if (buffer_info->skb) {
+   if (!buffer_info->skb)
+   goto clear_active;
+
+   if (!(buffer_info->flags &
+   TX_BUFFER_INFO_FLAG_TIMESTAMP_REQUESTED)) {
dev_kfree_skb(buffer_info->skb);
-   buffer_info->skb = NULL;
+   goto clear_skb;
}
 
+   if (cleanup) {
+   lan

[PATCH v2 net-next 8/9] lan743x: Add RSS support

2018-07-12 Thread Bryan Whitehead
Implement RSS support

Signed-off-by: Bryan Whitehead 
---
 drivers/net/ethernet/microchip/lan743x_ethtool.c | 132 +++
 drivers/net/ethernet/microchip/lan743x_main.c|  20 
 drivers/net/ethernet/microchip/lan743x_main.h|  19 
 3 files changed, 171 insertions(+)

diff --git a/drivers/net/ethernet/microchip/lan743x_ethtool.c 
b/drivers/net/ethernet/microchip/lan743x_ethtool.c
index 3d95290..33d6c2d 100644
--- a/drivers/net/ethernet/microchip/lan743x_ethtool.c
+++ b/drivers/net/ethernet/microchip/lan743x_ethtool.c
@@ -415,6 +415,133 @@ static int lan743x_ethtool_get_sset_count(struct 
net_device *netdev, int sset)
}
 }
 
+static int lan743x_ethtool_get_rxnfc(struct net_device *netdev,
+struct ethtool_rxnfc *rxnfc,
+u32 *rule_locs)
+{
+   switch (rxnfc->cmd) {
+   case ETHTOOL_GRXFH:
+   rxnfc->data = 0;
+   switch (rxnfc->flow_type) {
+   case TCP_V4_FLOW:case UDP_V4_FLOW:
+   case TCP_V6_FLOW:case UDP_V6_FLOW:
+   rxnfc->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
+   /* fall through */
+   case IPV4_FLOW: case IPV6_FLOW:
+   rxnfc->data |= RXH_IP_SRC | RXH_IP_DST;
+   return 0;
+   }
+   break;
+   case ETHTOOL_GRXRINGS:
+   rxnfc->data = LAN743X_USED_RX_CHANNELS;
+   return 0;
+   }
+   return -EOPNOTSUPP;
+}
+
+static u32 lan743x_ethtool_get_rxfh_key_size(struct net_device *netdev)
+{
+   return 40;
+}
+
+static u32 lan743x_ethtool_get_rxfh_indir_size(struct net_device *netdev)
+{
+   return 128;
+}
+
+static int lan743x_ethtool_get_rxfh(struct net_device *netdev,
+   u32 *indir, u8 *key, u8 *hfunc)
+{
+   struct lan743x_adapter *adapter = netdev_priv(netdev);
+
+   if (indir) {
+   int dw_index;
+   int byte_index = 0;
+
+   for (dw_index = 0; dw_index < 32; dw_index++) {
+   u32 four_entries =
+   lan743x_csr_read(adapter, RFE_INDX(dw_index));
+
+   byte_index = dw_index << 2;
+   indir[byte_index + 0] =
+   ((four_entries >> 0) & 0x00FF);
+   indir[byte_index + 1] =
+   ((four_entries >> 8) & 0x00FF);
+   indir[byte_index + 2] =
+   ((four_entries >> 16) & 0x00FF);
+   indir[byte_index + 3] =
+   ((four_entries >> 24) & 0x00FF);
+   }
+   }
+   if (key) {
+   int dword_index;
+   int byte_index = 0;
+
+   for (dword_index = 0; dword_index < 10; dword_index++) {
+   u32 four_entries =
+   lan743x_csr_read(adapter,
+RFE_HASH_KEY(dword_index));
+
+   byte_index = dword_index << 2;
+   key[byte_index + 0] =
+   ((four_entries >> 0) & 0x00FF);
+   key[byte_index + 1] =
+   ((four_entries >> 8) & 0x00FF);
+   key[byte_index + 2] =
+   ((four_entries >> 16) & 0x00FF);
+   key[byte_index + 3] =
+   ((four_entries >> 24) & 0x00FF);
+   }
+   }
+   if (hfunc)
+   (*hfunc) = ETH_RSS_HASH_TOP;
+   return 0;
+}
+
+static int lan743x_ethtool_set_rxfh(struct net_device *netdev,
+   const u32 *indir, const u8 *key,
+   const u8 hfunc)
+{
+   struct lan743x_adapter *adapter = netdev_priv(netdev);
+
+   if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP)
+   return -EOPNOTSUPP;
+
+   if (indir) {
+   u32 indir_value = 0;
+   int dword_index = 0;
+   int byte_index = 0;
+
+   for (dword_index = 0; dword_index < 32; dword_index++) {
+   byte_index = dword_index << 2;
+   indir_value =
+   (((indir[byte_index + 0] & 0x00FF) << 0) |
+   ((indir[byte_index + 1] & 0x00FF) << 8) |
+   ((indir[byte_index + 2] & 0x00FF) << 16) |
+   ((indir[byte_index + 3] & 0x00FF) << 24));
+   lan743x_csr_write(adapter, RFE_INDX(dword_index),
+ indir_value);
+   }
+   }
+   if (key) {
+   int dword_index = 0;
+   int byte_index = 0;
+   u32 key_va

[PATCH v2 net-next 4/9] lan743x: Add support for ethtool message level

2018-07-12 Thread Bryan Whitehead
Implement ethtool message level

Signed-off-by: Bryan Whitehead 
---
 drivers/net/ethernet/microchip/lan743x_ethtool.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/net/ethernet/microchip/lan743x_ethtool.c 
b/drivers/net/ethernet/microchip/lan743x_ethtool.c
index 9ed9711..bab1344 100644
--- a/drivers/net/ethernet/microchip/lan743x_ethtool.c
+++ b/drivers/net/ethernet/microchip/lan743x_ethtool.c
@@ -17,6 +17,21 @@ static void lan743x_ethtool_get_drvinfo(struct net_device 
*netdev,
pci_name(adapter->pdev), sizeof(info->bus_info));
 }
 
+static u32 lan743x_ethtool_get_msglevel(struct net_device *netdev)
+{
+   struct lan743x_adapter *adapter = netdev_priv(netdev);
+
+   return adapter->msg_enable;
+}
+
+static void lan743x_ethtool_set_msglevel(struct net_device *netdev,
+u32 msglevel)
+{
+   struct lan743x_adapter *adapter = netdev_priv(netdev);
+
+   adapter->msg_enable = msglevel;
+}
+
 static const char lan743x_set0_hw_cnt_strings[][ETH_GSTRING_LEN] = {
"RX FCS Errors",
"RX Alignment Errors",
@@ -196,6 +211,8 @@ static int lan743x_ethtool_get_sset_count(struct net_device 
*netdev, int sset)
 
 const struct ethtool_ops lan743x_ethtool_ops = {
.get_drvinfo = lan743x_ethtool_get_drvinfo,
+   .get_msglevel = lan743x_ethtool_get_msglevel,
+   .set_msglevel = lan743x_ethtool_set_msglevel,
.get_link = ethtool_op_get_link,
 
.get_strings = lan743x_ethtool_get_strings,
-- 
2.7.4



[PATCH v2 net-next 7/9] lan743x: Add EEE support

2018-07-12 Thread Bryan Whitehead
Implement EEE support

Signed-off-by: Bryan Whitehead 
---
 drivers/net/ethernet/microchip/lan743x_ethtool.c | 89 
 drivers/net/ethernet/microchip/lan743x_main.h|  3 +
 2 files changed, 92 insertions(+)

diff --git a/drivers/net/ethernet/microchip/lan743x_ethtool.c 
b/drivers/net/ethernet/microchip/lan743x_ethtool.c
index f9d875d..3d95290 100644
--- a/drivers/net/ethernet/microchip/lan743x_ethtool.c
+++ b/drivers/net/ethernet/microchip/lan743x_ethtool.c
@@ -415,6 +415,93 @@ static int lan743x_ethtool_get_sset_count(struct 
net_device *netdev, int sset)
}
 }
 
+static int lan743x_ethtool_get_eee(struct net_device *netdev,
+  struct ethtool_eee *eee)
+{
+   struct lan743x_adapter *adapter = netdev_priv(netdev);
+   struct phy_device *phydev = netdev->phydev;
+   u32 buf;
+   int ret;
+
+   if (!phydev)
+   return -EIO;
+   if (!phydev->drv) {
+   netif_err(adapter, drv, adapter->netdev,
+ "Missing PHY Driver\n");
+   return -EIO;
+   }
+
+   ret = phy_ethtool_get_eee(phydev, eee);
+   if (ret < 0)
+   return ret;
+
+   buf = lan743x_csr_read(adapter, MAC_CR);
+   if (buf & MAC_CR_EEE_EN_) {
+   eee->eee_enabled = true;
+   eee->eee_active = !!(eee->advertised & eee->lp_advertised);
+   eee->tx_lpi_enabled = true;
+   /* EEE_TX_LPI_REQ_DLY & tx_lpi_timer are same uSec unit */
+   buf = lan743x_csr_read(adapter, MAC_EEE_TX_LPI_REQ_DLY_CNT);
+   eee->tx_lpi_timer = buf;
+   } else {
+   eee->eee_enabled = false;
+   eee->eee_active = false;
+   eee->tx_lpi_enabled = false;
+   eee->tx_lpi_timer = 0;
+   }
+
+   return 0;
+}
+
+static int lan743x_ethtool_set_eee(struct net_device *netdev,
+  struct ethtool_eee *eee)
+{
+   struct lan743x_adapter *adapter = netdev_priv(netdev);
+   struct phy_device *phydev = NULL;
+   u32 buf = 0;
+   int ret = 0;
+
+   if (!netdev)
+   return -EINVAL;
+   adapter = netdev_priv(netdev);
+   if (!adapter)
+   return -EINVAL;
+   phydev = netdev->phydev;
+   if (!phydev)
+   return -EIO;
+   if (!phydev->drv) {
+   netif_err(adapter, drv, adapter->netdev,
+ "Missing PHY Driver\n");
+   return -EIO;
+   }
+
+   if (eee->eee_enabled) {
+   ret = phy_init_eee(phydev, 0);
+   if (ret) {
+   netif_err(adapter, drv, adapter->netdev,
+ "EEE initialization failed\n");
+   return ret;
+   }
+
+   buf = lan743x_csr_read(adapter, MAC_CR);
+   buf |= MAC_CR_EEE_EN_;
+   lan743x_csr_write(adapter, MAC_CR, buf);
+
+   phy_ethtool_set_eee(phydev, eee);
+
+   buf = (u32)eee->tx_lpi_timer;
+   lan743x_csr_write(adapter, MAC_EEE_TX_LPI_REQ_DLY_CNT, buf);
+   netif_info(adapter, drv, adapter->netdev, "Enabled EEE\n");
+   } else {
+   buf = lan743x_csr_read(adapter, MAC_CR);
+   buf &= ~MAC_CR_EEE_EN_;
+   lan743x_csr_write(adapter, MAC_CR, buf);
+   netif_info(adapter, drv, adapter->netdev, "Disabled EEE\n");
+   }
+
+   return 0;
+}
+
 #ifdef CONFIG_PM
 static void lan743x_ethtool_get_wol(struct net_device *netdev,
struct ethtool_wolinfo *wol)
@@ -471,6 +558,8 @@ const struct ethtool_ops lan743x_ethtool_ops = {
.get_strings = lan743x_ethtool_get_strings,
.get_ethtool_stats = lan743x_ethtool_get_ethtool_stats,
.get_sset_count = lan743x_ethtool_get_sset_count,
+   .get_eee = lan743x_ethtool_get_eee,
+   .set_eee = lan743x_ethtool_set_eee,
.get_link_ksettings = phy_ethtool_get_link_ksettings,
.set_link_ksettings = phy_ethtool_set_link_ksettings,
 #ifdef CONFIG_PM
diff --git a/drivers/net/ethernet/microchip/lan743x_main.h 
b/drivers/net/ethernet/microchip/lan743x_main.h
index 72b9beb..93cb60a 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.h
+++ b/drivers/net/ethernet/microchip/lan743x_main.h
@@ -82,6 +82,7 @@
((value << 0) & FCT_FLOW_CTL_ON_THRESHOLD_)
 
 #define MAC_CR (0x100)
+#define MAC_CR_EEE_EN_ BIT(17)
 #define MAC_CR_ADD_BIT(12)
 #define MAC_CR_ASD_BIT(11)
 #define MAC_CR_CNTR_RST_   BIT(5)
@@ -117,6 +118,8 @@
 
 #define MAC_MII_DATA   (0x124)
 
+#define MAC_EEE_TX_LPI_REQ_DLY_CNT (0x130)
+
 #define MAC_WUCSR  (0x140)
 #define MAC_WUCSR_RFE_WAKE_EN_ BIT(14)
 #define MAC_WUCSR_PFDA_EN_ BIT(3)
-- 

[PATCH v2 net-next 1/9] lan743x: Add support for ethtool get_drvinfo

2018-07-12 Thread Bryan Whitehead
Implement ethtool get_drvinfo

Signed-off-by: Bryan Whitehead 
---
 drivers/net/ethernet/microchip/Makefile  |  2 +-
 drivers/net/ethernet/microchip/lan743x_ethtool.c | 21 +
 drivers/net/ethernet/microchip/lan743x_ethtool.h | 11 +++
 drivers/net/ethernet/microchip/lan743x_main.c|  2 ++
 4 files changed, 35 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/microchip/lan743x_ethtool.c
 create mode 100644 drivers/net/ethernet/microchip/lan743x_ethtool.h

diff --git a/drivers/net/ethernet/microchip/Makefile 
b/drivers/net/ethernet/microchip/Makefile
index 2e982cc..43f47cb 100644
--- a/drivers/net/ethernet/microchip/Makefile
+++ b/drivers/net/ethernet/microchip/Makefile
@@ -6,4 +6,4 @@ obj-$(CONFIG_ENC28J60) += enc28j60.o
 obj-$(CONFIG_ENCX24J600) += encx24j600.o encx24j600-regmap.o
 obj-$(CONFIG_LAN743X) += lan743x.o
 
-lan743x-objs := lan743x_main.o
+lan743x-objs := lan743x_main.o lan743x_ethtool.o
diff --git a/drivers/net/ethernet/microchip/lan743x_ethtool.c 
b/drivers/net/ethernet/microchip/lan743x_ethtool.c
new file mode 100644
index 000..0e20758
--- /dev/null
+++ b/drivers/net/ethernet/microchip/lan743x_ethtool.c
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/* Copyright (C) 2018 Microchip Technology Inc. */
+
+#include 
+#include "lan743x_main.h"
+#include "lan743x_ethtool.h"
+#include 
+
+static void lan743x_ethtool_get_drvinfo(struct net_device *netdev,
+   struct ethtool_drvinfo *info)
+{
+   struct lan743x_adapter *adapter = netdev_priv(netdev);
+
+   strlcpy(info->driver, DRIVER_NAME, sizeof(info->driver));
+   strlcpy(info->bus_info,
+   pci_name(adapter->pdev), sizeof(info->bus_info));
+}
+
+const struct ethtool_ops lan743x_ethtool_ops = {
+   .get_drvinfo = lan743x_ethtool_get_drvinfo,
+};
diff --git a/drivers/net/ethernet/microchip/lan743x_ethtool.h 
b/drivers/net/ethernet/microchip/lan743x_ethtool.h
new file mode 100644
index 000..d0d11a7
--- /dev/null
+++ b/drivers/net/ethernet/microchip/lan743x_ethtool.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/* Copyright (C) 2018 Microchip Technology Inc. */
+
+#ifndef _LAN743X_ETHTOOL_H
+#define _LAN743X_ETHTOOL_H
+
+#include "linux/ethtool.h"
+
+extern const struct ethtool_ops lan743x_ethtool_ops;
+
+#endif /* _LAN743X_ETHTOOL_H */
diff --git a/drivers/net/ethernet/microchip/lan743x_main.c 
b/drivers/net/ethernet/microchip/lan743x_main.c
index e1747a4..ade3b04 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include "lan743x_main.h"
+#include "lan743x_ethtool.h"
 
 static void lan743x_pci_cleanup(struct lan743x_adapter *adapter)
 {
@@ -2689,6 +2690,7 @@ static int lan743x_pcidev_probe(struct pci_dev *pdev,
goto cleanup_hardware;
 
adapter->netdev->netdev_ops = &lan743x_netdev_ops;
+   adapter->netdev->ethtool_ops = &lan743x_ethtool_ops;
adapter->netdev->features = NETIF_F_SG | NETIF_F_TSO | NETIF_F_HW_CSUM;
adapter->netdev->hw_features = adapter->netdev->features;
 
-- 
2.7.4



[PATCH v2 net-next 6/9] lan743x: Add power management support

2018-07-12 Thread Bryan Whitehead
Implement power management.
Supports suspend, resume, and Wake On LAN

Signed-off-by: Bryan Whitehead 
---
 drivers/net/ethernet/microchip/lan743x_ethtool.c |  48 ++
 drivers/net/ethernet/microchip/lan743x_main.c| 184 +++
 drivers/net/ethernet/microchip/lan743x_main.h|  47 ++
 3 files changed, 279 insertions(+)

diff --git a/drivers/net/ethernet/microchip/lan743x_ethtool.c 
b/drivers/net/ethernet/microchip/lan743x_ethtool.c
index f9ad237..f9d875d 100644
--- a/drivers/net/ethernet/microchip/lan743x_ethtool.c
+++ b/drivers/net/ethernet/microchip/lan743x_ethtool.c
@@ -415,6 +415,50 @@ static int lan743x_ethtool_get_sset_count(struct 
net_device *netdev, int sset)
}
 }
 
+#ifdef CONFIG_PM
+static void lan743x_ethtool_get_wol(struct net_device *netdev,
+   struct ethtool_wolinfo *wol)
+{
+   struct lan743x_adapter *adapter = netdev_priv(netdev);
+
+   wol->supported = WAKE_BCAST | WAKE_UCAST | WAKE_MCAST |
+   WAKE_MAGIC | WAKE_PHY | WAKE_ARP;
+
+   wol->wolopts = adapter->wolopts;
+}
+#endif /* CONFIG_PM */
+
+#ifdef CONFIG_PM
+static int lan743x_ethtool_set_wol(struct net_device *netdev,
+  struct ethtool_wolinfo *wol)
+{
+   struct lan743x_adapter *adapter = netdev_priv(netdev);
+
+   if (wol->wolopts & WAKE_MAGICSECURE)
+   return -EOPNOTSUPP;
+
+   adapter->wolopts = 0;
+   if (wol->wolopts & WAKE_UCAST)
+   adapter->wolopts |= WAKE_UCAST;
+   if (wol->wolopts & WAKE_MCAST)
+   adapter->wolopts |= WAKE_MCAST;
+   if (wol->wolopts & WAKE_BCAST)
+   adapter->wolopts |= WAKE_BCAST;
+   if (wol->wolopts & WAKE_MAGIC)
+   adapter->wolopts |= WAKE_MAGIC;
+   if (wol->wolopts & WAKE_PHY)
+   adapter->wolopts |= WAKE_PHY;
+   if (wol->wolopts & WAKE_ARP)
+   adapter->wolopts |= WAKE_ARP;
+
+   device_set_wakeup_enable(&adapter->pdev->dev, (bool)wol->wolopts);
+
+   phy_ethtool_set_wol(netdev->phydev, wol);
+
+   return 0;
+}
+#endif /* CONFIG_PM */
+
 const struct ethtool_ops lan743x_ethtool_ops = {
.get_drvinfo = lan743x_ethtool_get_drvinfo,
.get_msglevel = lan743x_ethtool_get_msglevel,
@@ -429,4 +473,8 @@ const struct ethtool_ops lan743x_ethtool_ops = {
.get_sset_count = lan743x_ethtool_get_sset_count,
.get_link_ksettings = phy_ethtool_get_link_ksettings,
.set_link_ksettings = phy_ethtool_set_link_ksettings,
+#ifdef CONFIG_PM
+   .get_wol = lan743x_ethtool_get_wol,
+   .set_wol = lan743x_ethtool_set_wol,
+#endif
 };
diff --git a/drivers/net/ethernet/microchip/lan743x_main.c 
b/drivers/net/ethernet/microchip/lan743x_main.c
index 1e2f8c6..8e9eff8 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "lan743x_main.h"
 #include "lan743x_ethtool.h"
 
@@ -2749,10 +2750,190 @@ static void lan743x_pcidev_shutdown(struct pci_dev 
*pdev)
lan743x_netdev_close(netdev);
rtnl_unlock();
 
+#ifdef CONFIG_PM
+   pci_save_state(pdev);
+#endif
+
/* clean up lan743x portion */
lan743x_hardware_cleanup(adapter);
 }
 
+#ifdef CONFIG_PM
+static u16 lan743x_pm_wakeframe_crc16(const u8 *buf, int len)
+{
+   return bitrev16(crc16(0x, buf, len));
+}
+#endif /* CONFIG_PM */
+
+#ifdef CONFIG_PM
+static void lan743x_pm_set_wol(struct lan743x_adapter *adapter)
+{
+   const u8 ipv4_multicast[3] = { 0x01, 0x00, 0x5E };
+   const u8 ipv6_multicast[3] = { 0x33, 0x33 };
+   const u8 arp_type[2] = { 0x08, 0x06 };
+   int mask_index;
+   u32 pmtctl;
+   u32 wucsr;
+   u32 macrx;
+   u16 crc;
+
+   for (mask_index = 0; mask_index < MAC_NUM_OF_WUF_CFG; mask_index++)
+   lan743x_csr_write(adapter, MAC_WUF_CFG(mask_index), 0);
+
+   /* clear wake settings */
+   pmtctl = lan743x_csr_read(adapter, PMT_CTL);
+   pmtctl |= PMT_CTL_WUPS_MASK_;
+   pmtctl &= ~(PMT_CTL_GPIO_WAKEUP_EN_ | PMT_CTL_EEE_WAKEUP_EN_ |
+   PMT_CTL_WOL_EN_ | PMT_CTL_MAC_D3_RX_CLK_OVR_ |
+   PMT_CTL_RX_FCT_RFE_D3_CLK_OVR_ | PMT_CTL_ETH_PHY_WAKE_EN_);
+
+   macrx = lan743x_csr_read(adapter, MAC_RX);
+
+   wucsr = 0;
+   mask_index = 0;
+
+   pmtctl |= PMT_CTL_ETH_PHY_D3_COLD_OVR_ | PMT_CTL_ETH_PHY_D3_OVR_;
+
+   if (adapter->wolopts & WAKE_PHY) {
+   pmtctl |= PMT_CTL_ETH_PHY_EDPD_PLL_CTL_;
+   pmtctl |= PMT_CTL_ETH_PHY_WAKE_EN_;
+   }
+   if (adapter->wolopts & WAKE_MAGIC) {
+   wucsr |= MAC_WUCSR_MPEN_;
+   macrx |= MAC_RX_RXEN_;
+   pmtctl |= PMT_CTL_WOL_EN_ | PMT_CTL_MAC_D3_RX_CLK_OVR_;
+   }
+   if (adapter->wolopts & WAKE_UCAST) {
+   wucsr |= MAC_WUCSR_RFE_WAKE_EN_ | MAC_WUCSR_PFDA_EN_;
+   m

Re: [PATCH net-next 2/2] net: phy: add phy_speed_down and phy_speed_up

2018-07-12 Thread Andrew Lunn
> Like r8169 also tg3 driver doesn't wait for the speed-down-renegotiation
> to finish. Therefore, even though I share Andrew's concerns, there seem
> to be chips where it's safe to not wait for the renegotiation to finish
> (e.g. because device is in PCI D3 already and can't generate an interrupt).
> Having said that I'd keep the sync parameter for phy_speed_down so that
> the driver can decide.

Hi Heiner

Please put a big fat comment about the dangers of sync=false in the
function header. We want people to know it is dangerous by default,
and that it should only be used in special conditions, when it is known to be
safe.
Andrew
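
For instance, something along these lines could work; the exact kernel-doc
wording is of course up to you, and the prototype below assumes the one
proposed in patch 2/2:

/**
 * phy_speed_down - set the PHY to the lowest speed needed for e.g. WoL
 * @phydev: the phy_device struct
 * @sync: wait for the speed-down renegotiation to finish
 *
 * WARNING: calling this with sync = false is dangerous. The function
 * returns while the PHY may still be renegotiating, so the link can be
 * down (or come up at an unexpected speed) behind the caller's back.
 * Pass sync = false only in special conditions where this is known to
 * be safe, e.g. when the device is already in PCI D3 and cannot react
 * to the intermediate link state anyway.
 */
int phy_speed_down(struct phy_device *phydev, bool sync);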


Re: [PATCH net-next 2/2] net: phy: add phy_speed_down and phy_speed_up

2018-07-12 Thread Heiner Kallweit
On 12.07.2018 21:09, Andrew Lunn wrote:
>> Like r8169 also tg3 driver doesn't wait for the speed-down-renegotiation
>> to finish. Therefore, even though I share Andrew's concerns, there seem
>> to be chips where it's safe to not wait for the renegotiation to finish
>> (e.g. because device is in PCI D3 already and can't generate an interrupt).
>> Having said that I'd keep the sync parameter for phy_speed_down so that
>> the driver can decide.
> 
> Hi Heiner
> 
> Please put a big fat comment about the dangers of sync=false in the
> function header. We want people to known it is dangerous by default,
> and should only be used in special conditions, when it is known to be
> safe.
>   Andrew
> 
OK ..

Heiner



Re: [PATCH v3 net-next 00/19] TLS offload rx, netdev & mlx5

2018-07-12 Thread Boris Pismenny

Hi Dave,

On 7/12/2018 12:54 PM, Dave Watson wrote:
> On 07/11/18 10:54 PM, Boris Pismenny wrote:
>> Hi,
>>
>> The following series provides TLS RX inline crypto offload.
>
> All the tls patches look good to me except #10
>
> "tls: Fix zerocopy_from_iter iov handling"
>
> which seems to break the non-device zerocopy flow.


Thanks for reviewing!

Sorry, it seems to break the zerocopy send flow, and I've tested only 
with the receive flow offload disabled.


I'll fix it in v4. I think that adding a flag to indicate whether a 
revert is needed should do the trick. In the receive flow the revert is 
needed to handle potential errors, while in the transmit flow it needs 
to be removed.




> The integration is very clean, thanks!



>> v2->v3:
>>  - Fix typo
>>  - Adjust cover letter
>>  - Fix bug in zero copy flows
>>  - Use network byte order for the record number in resync
>>  - Adjust the sequence provided in resync
>>
>> v1->v2:
>>  - Fix bisectability problems due to variable name changes
>>  - Fix potential uninitialized return value



Re: [PATCH net-next 2/2] net: phy: add phy_speed_down and phy_speed_up

2018-07-12 Thread Florian Fainelli



On 07/12/2018 12:10 PM, Heiner Kallweit wrote:
> On 12.07.2018 21:09, Andrew Lunn wrote:
>>> Like r8169 also tg3 driver doesn't wait for the speed-down-renegotiation
>>> to finish. Therefore, even though I share Andrew's concerns, there seem
>>> to be chips where it's safe to not wait for the renegotiation to finish
>>> (e.g. because device is in PCI D3 already and can't generate an interrupt).
>>> Having said that I'd keep the sync parameter for phy_speed_down so that
>>> the driver can decide.
>>
>> Hi Heiner
>>
>> Please put a big fat comment about the dangers of sync=false in the
>> function header. We want people to known it is dangerous by default,
>> and should only be used in special conditions, when it is known to be
>> safe.
>>  Andrew
>>
> OK ..

What part do you find dangerous? Magic Packets are typically UDP packets and
they are not routed (unless specifically taken care of), so there is already
some "lossy" behavior involved in waking up an Ethernet MAC; I don't think
it is too bad to retry several times until the link comes up.
-- 
Florian


[PATCH v4 net-next 07/19] tls: Split tls_sw_release_resources_rx

2018-07-12 Thread Boris Pismenny
This patch splits tls_sw_release_resources_rx into two functions: one
which releases all inner software TLS structures and another that also
frees the containing structure.

In TLS_DEVICE we will need to release the software structures without
freeing the containing structure, which contains other information.

Signed-off-by: Boris Pismenny 
---
 include/net/tls.h |  1 +
 net/tls/tls_sw.c  | 10 +-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 49b8922..7a485de 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -223,6 +223,7 @@ int tls_sw_sendpage(struct sock *sk, struct page *page,
 void tls_sw_close(struct sock *sk, long timeout);
 void tls_sw_free_resources_tx(struct sock *sk);
 void tls_sw_free_resources_rx(struct sock *sk);
+void tls_sw_release_resources_rx(struct sock *sk);
 int tls_sw_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
   int nonblock, int flags, int *addr_len);
 unsigned int tls_sw_poll(struct file *file, struct socket *sock,
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 99d0347..86e22bc 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1039,7 +1039,7 @@ void tls_sw_free_resources_tx(struct sock *sk)
kfree(ctx);
 }
 
-void tls_sw_free_resources_rx(struct sock *sk)
+void tls_sw_release_resources_rx(struct sock *sk)
 {
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
@@ -1058,6 +1058,14 @@ void tls_sw_free_resources_rx(struct sock *sk)
strp_done(&ctx->strp);
lock_sock(sk);
}
+}
+
+void tls_sw_free_resources_rx(struct sock *sk)
+{
+   struct tls_context *tls_ctx = tls_get_ctx(sk);
+   struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
+
+   tls_sw_release_resources_rx(sk);
 
kfree(ctx);
 }
-- 
1.8.3.1



[PATCH v4 net-next 02/19] net: Add TLS RX offload feature

2018-07-12 Thread Boris Pismenny
From: Ilya Lesokhin 

This patch adds a netdev feature to configure TLS RX inline crypto offload.

Signed-off-by: Ilya Lesokhin 
Signed-off-by: Boris Pismenny 
---
 include/linux/netdev_features.h | 2 ++
 net/core/ethtool.c  | 1 +
 2 files changed, 3 insertions(+)

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 623bb8c..2b2a6dc 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -79,6 +79,7 @@ enum {
NETIF_F_HW_ESP_TX_CSUM_BIT, /* ESP with TX checksum offload */
NETIF_F_RX_UDP_TUNNEL_PORT_BIT, /* Offload of RX port for UDP tunnels */
NETIF_F_HW_TLS_TX_BIT,  /* Hardware TLS TX offload */
+   NETIF_F_HW_TLS_RX_BIT,  /* Hardware TLS RX offload */
 
NETIF_F_GRO_HW_BIT, /* Hardware Generic receive offload */
NETIF_F_HW_TLS_RECORD_BIT,  /* Offload TLS record */
@@ -151,6 +152,7 @@ enum {
 #define NETIF_F_HW_TLS_RECORD  __NETIF_F(HW_TLS_RECORD)
 #define NETIF_F_GSO_UDP_L4 __NETIF_F(GSO_UDP_L4)
 #define NETIF_F_HW_TLS_TX  __NETIF_F(HW_TLS_TX)
+#define NETIF_F_HW_TLS_RX  __NETIF_F(HW_TLS_RX)
 
 #define for_each_netdev_feature(mask_addr, bit)\
for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT)
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index e677a20..c9993c6 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -111,6 +111,7 @@ int ethtool_op_get_ts_info(struct net_device *dev, struct 
ethtool_ts_info *info)
[NETIF_F_RX_UDP_TUNNEL_PORT_BIT] =   "rx-udp_tunnel-port-offload",
[NETIF_F_HW_TLS_RECORD_BIT] =   "tls-hw-record",
[NETIF_F_HW_TLS_TX_BIT] ="tls-hw-tx-offload",
+   [NETIF_F_HW_TLS_RX_BIT] ="tls-hw-rx-offload",
 };
 
 static const char
-- 
1.8.3.1



[PATCH v4 net-next 04/19] tcp: Don't coalesce decrypted and encrypted SKBs

2018-07-12 Thread Boris Pismenny
Prevent coalescing of decrypted and encrypted SKBs in GRO
and TCP layer.

Signed-off-by: Boris Pismenny 
Signed-off-by: Ilya Lesokhin 
---
 net/ipv4/tcp_input.c   | 12 
 net/ipv4/tcp_offload.c |  3 +++
 2 files changed, 15 insertions(+)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 814ea43..f89d86a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4343,6 +4343,11 @@ static bool tcp_try_coalesce(struct sock *sk,
if (TCP_SKB_CB(from)->seq != TCP_SKB_CB(to)->end_seq)
return false;
 
+#ifdef CONFIG_TLS_DEVICE
+   if (from->decrypted != to->decrypted)
+   return false;
+#endif
+
if (!skb_try_coalesce(to, from, fragstolen, &delta))
return false;
 
@@ -4872,6 +4877,9 @@ void tcp_rbtree_insert(struct rb_root *root, struct 
sk_buff *skb)
break;
 
memcpy(nskb->cb, skb->cb, sizeof(skb->cb));
+#ifdef CONFIG_TLS_DEVICE
+   nskb->decrypted = skb->decrypted;
+#endif
TCP_SKB_CB(nskb)->seq = TCP_SKB_CB(nskb)->end_seq = start;
if (list)
__skb_queue_before(list, skb, nskb);
@@ -4899,6 +4907,10 @@ void tcp_rbtree_insert(struct rb_root *root, struct 
sk_buff *skb)
skb == tail ||
(TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | 
TCPHDR_FIN)))
goto end;
+#ifdef CONFIG_TLS_DEVICE
+   if (skb->decrypted != nskb->decrypted)
+   goto end;
+#endif
}
}
}
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index f5aee64..870b0a3 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -262,6 +262,9 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, 
struct sk_buff *skb)
 
flush |= (len - 1) >= mss;
flush |= (ntohl(th2->seq) + skb_gro_len(p)) ^ ntohl(th->seq);
+#ifdef CONFIG_TLS_DEVICE
+   flush |= p->decrypted ^ skb->decrypted;
+#endif
 
if (flush || skb_gro_receive(p, skb)) {
mss = 1;
-- 
1.8.3.1



[PATCH v4 net-next 11/19] net/mlx5e: TLS, refactor variable names

2018-07-12 Thread Boris Pismenny
For symmetry, we rename mlx5e_tls_offload_context to
mlx5e_tls_offload_context_tx before we add mlx5e_tls_offload_context_rx.

Signed-off-by: Boris Pismenny 
Reviewed-by: Aviad Yehezkel 
Reviewed-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c  | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h  | 8 
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c | 6 +++---
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
index d167845..7fb9c75 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
@@ -123,7 +123,7 @@ static int mlx5e_tls_add(struct net_device *netdev, struct 
sock *sk,
goto free_flow;
 
if (direction == TLS_OFFLOAD_CTX_DIR_TX) {
-   struct mlx5e_tls_offload_context *tx_ctx =
+   struct mlx5e_tls_offload_context_tx *tx_ctx =
mlx5e_get_tls_tx_context(tls_ctx);
u32 swid;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
index b82f4de..e26222a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
@@ -49,19 +49,19 @@ struct mlx5e_tls {
struct mlx5e_tls_sw_stats sw_stats;
 };
 
-struct mlx5e_tls_offload_context {
+struct mlx5e_tls_offload_context_tx {
struct tls_offload_context_tx base;
u32 expected_seq;
__be32 swid;
 };
 
-static inline struct mlx5e_tls_offload_context *
+static inline struct mlx5e_tls_offload_context_tx *
 mlx5e_get_tls_tx_context(struct tls_context *tls_ctx)
 {
-   BUILD_BUG_ON(sizeof(struct mlx5e_tls_offload_context) >
+   BUILD_BUG_ON(sizeof(struct mlx5e_tls_offload_context_tx) >
 TLS_OFFLOAD_CONTEXT_SIZE_TX);
return container_of(tls_offload_ctx_tx(tls_ctx),
-   struct mlx5e_tls_offload_context,
+   struct mlx5e_tls_offload_context_tx,
base);
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
index 15aef71..c96196f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
@@ -73,7 +73,7 @@ static int mlx5e_tls_add_metadata(struct sk_buff *skb, __be32 
swid)
return 0;
 }
 
-static int mlx5e_tls_get_sync_data(struct mlx5e_tls_offload_context *context,
+static int mlx5e_tls_get_sync_data(struct mlx5e_tls_offload_context_tx 
*context,
   u32 tcp_seq, struct sync_info *info)
 {
int remaining, i = 0, ret = -EINVAL;
@@ -161,7 +161,7 @@ static void mlx5e_tls_complete_sync_skb(struct sk_buff *skb,
 }
 
 static struct sk_buff *
-mlx5e_tls_handle_ooo(struct mlx5e_tls_offload_context *context,
+mlx5e_tls_handle_ooo(struct mlx5e_tls_offload_context_tx *context,
 struct mlx5e_txqsq *sq, struct sk_buff *skb,
 struct mlx5e_tx_wqe **wqe,
 u16 *pi,
@@ -239,7 +239,7 @@ struct sk_buff *mlx5e_tls_handle_tx_skb(struct net_device 
*netdev,
u16 *pi)
 {
struct mlx5e_priv *priv = netdev_priv(netdev);
-   struct mlx5e_tls_offload_context *context;
+   struct mlx5e_tls_offload_context_tx *context;
struct tls_context *tls_ctx;
u32 expected_seq;
int datalen;
-- 
1.8.3.1



[PATCH v4 net-next 00/19] TLS offload rx, netdev & mlx5

2018-07-12 Thread Boris Pismenny
Hi,

The following series provides TLS RX inline crypto offload.

v3->v4:
- Remove the iov revert for zero copy send flow 

v2->v3:
- Fix typo
- Adjust cover letter
- Fix bug in zero copy flows
- Use network byte order for the record number in resync
- Adjust the sequence provided in resync

v1->v2:
- Fix bisectability problems due to variable name changes
- Fix potential uninitialized return value

This series completes the generic infrastructure to offload TLS crypto to
a network device. It enables the kernel TLS socket to skip decryption and
authentication operations for SKBs marked as decrypted on the receive
side of the data path. Leaving those computationally expensive operations
to the NIC.

This infrastructure doesn't require a TCP offload engine. Instead, the
NIC decrypts a packet's payload if the packet contains the expected TCP
sequence number. The TLS record authentication tag remains unmodified
regardless of decryption. If the packet is decrypted successfully and it
contains an authentication tag, then the authentication check has passed.
Otherwise, if the authentication fails, then the packet is provided
unmodified and the KTLS layer is responsible for handling it.
Out-Of-Order TCP packets are provided unmodified. As a result,
in the slow path some of the SKBs are decrypted while others remain as
ciphertext.

The GRO and TCP layers must not coalesce decrypted and non-decrypted SKBs. 
In the worst case a received TLS record consists of both plaintext
and ciphertext packets. These partially decrypted records must be
re-encrypted in software so that the whole record can be decrypted
and authenticated again.

The notable differences between SW KTLS and NIC offloaded TLS
implementations are as follows:
1. Partial decryption - Software must handle the case of a TLS record
that was only partially decrypted by HW. This can happen due to packet
reordering.
2. Resynchronization - tls_read_size calls the device driver to
resynchronize HW whenever it lost track of the TLS record framing in
the TCP stream.

The infrastructure should be extendable to support various NIC offload
implementations.  However it is currently written with the
implementation below in mind:
The NIC identifies packets that should be offloaded according to
the 5-tuple and the TCP sequence number. If these match and the
packet is decrypted and authenticated successfully, then a syndrome
is provided to software. Otherwise, the packet is unmodified.
Decrypted and non-decrypted packets aren't coalesced by the network stack,
and the KTLS layer decrypts and authenticates partially decrypted records.
The NIC provides an indication whenever a resync is required. The resync
operation is triggered by the KTLS layer while parsing TLS record headers.
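
A condensed sketch of that resync handshake, based on the hooks added in
this series (the two wrapper functions are illustrative, not functions
from the patches):

#include <linux/netdevice.h>
#include <net/tls.h>

/* Driver RX path (illustrative): the NIC signalled that it lost the TLS
 * record framing for this flow. Latch the TCP sequence number at which
 * it asks to resynchronize; tls_offload_rx_resync_request() stores it
 * atomically in the RX offload context.
 */
static void drv_note_resync_request(struct sock *sk, __be32 tcp_seq)
{
	tls_offload_rx_resync_request(sk, tcp_seq);
}

/* KTLS record parser (illustrative): once the start of the next record
 * and its record number are known, answer the pending request through
 * the new tls_dev_resync_rx NDO so the HW can re-arm its context.
 */
static void ktls_answer_resync(struct net_device *netdev, struct sock *sk,
			       u32 record_start_seq, u64 record_number)
{
	netdev->tlsdev_ops->tls_dev_resync_rx(netdev, sk,
					      record_start_seq, record_number);
}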

Finally, we measure the performance obtained by running single stream
iperf with two Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz machines connected
back-to-back with Innova TLS (40Gbps) NICs. We compare TCP (upper bound)
and KTLS-Offload running both in Tx and Rx. The results show that the
performance of offload is comparable to TCP.

                   | Bandwidth (Gbps) | CPU Tx (%) | CPU Rx (%)
TCP                | 28.8             | 5          | 12
KTLS-Offload-Tx-Rx | 28.6             | 7          | 14

Paper: https://netdevconf.org/2.2/papers/pismenny-tlscrypto-talk.pdf

Boris Pismenny (18):
  net: Add decrypted field to skb
  net: Add TLS rx resync NDO
  tcp: Don't coalesce decrypted and encrypted SKBs
  tls: Refactor tls_offload variable names
  tls: Split decrypt_skb to two functions
  tls: Split tls_sw_release_resources_rx
  tls: Fill software context without allocation
  tls: Add rx inline crypto offload
  tls: Fix zerocopy_from_iter iov handling
  net/mlx5e: TLS, refactor variable names
  net/mlx5: Accel, add TLS rx offload routines
  net/mlx5e: TLS, add innova rx support
  net/mlx5e: TLS, add Innova TLS rx data path
  net/mlx5e: TLS, add software statistics
  net/mlx5e: TLS, build TLS netdev from capabilities
  net/mlx5: Accel, add common metadata functions
  net/mlx5e: IPsec, fix byte count in CQE
  net/mlx5e: Kconfig, mutually exclude compilation of TLS and IPsec
accel

Ilya Lesokhin (1):
  net: Add TLS RX offload feature

 drivers/net/ethernet/mellanox/mlx5/core/Kconfig|   1 +
 .../net/ethernet/mellanox/mlx5/core/accel/accel.h  |  37 +++
 .../net/ethernet/mellanox/mlx5/core/accel/tls.c|  23 +-
 .../net/ethernet/mellanox/mlx5/core/accel/tls.h|  26 +-
 .../mellanox/mlx5/core/en_accel/ipsec_rxtx.c   |  20 +-
 .../mellanox/mlx5/core/en_accel/ipsec_rxtx.h   |   2 +-
 .../net/ethernet/mellanox/mlx5/core/en_accel/tls.c |  69 +++--
 .../net/ethernet/mellanox/mlx5/core/en_accel/tls.h |  33 ++-
 .../mellanox/mlx5/core/en_accel/tls_rxtx.c | 117 +++-
 .../mellanox/mlx5/core/en_accel/tls_rxtx.h |   3 +
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|   8 +-
 drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c | 113 ++--
 drivers/net/ethernet/mellano

[PATCH v4 net-next 09/19] tls: Add rx inline crypto offload

2018-07-12 Thread Boris Pismenny
This patch completes the generic infrastructure to offload TLS crypto to a
network device. It enables the kernel to skip decryption and
authentication of some skbs marked as decrypted by the NIC. In the fast
path, all packets received are decrypted by the NIC and the performance
is comparable to plain TCP.

This infrastructure doesn't require a TCP offload engine. Instead, the
NIC only decrypts packets that contain the expected TCP sequence number.
Out-Of-Order TCP packets are provided unmodified. As a result, in the
worst case a received TLS record consists of both plaintext and ciphertext
packets. These partially decrypted records must be re-encrypted in
software so that the whole record can be decrypted and authenticated again.

The notable differences between SW KTLS Rx and this offload are as
follows:
1. Partial decryption - Software must handle the case of a TLS record
that was only partially decrypted by HW. This can happen due to packet
reordering.
2. Resynchronization - tls_read_size calls the device driver to
resynchronize HW after HW lost track of TLS record framing in
the TCP stream.
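
In rough terms, the per-record decision described above looks like this
(an illustrative sketch, not the exact code added to tls_sw.c/tls_device.c):

/* Illustrative: handle one reassembled TLS record on the RX side. */
static int rx_handle_record(struct sock *sk, struct sk_buff *skb)
{
#ifdef CONFIG_TLS_DEVICE
	if (skb->decrypted)
		/* Fast path: the NIC already decrypted and authenticated
		 * this record, so software crypto is skipped entirely
		 * (record framing is still stripped as usual).
		 */
		return 0;
#endif
	/* Slow path: the record is fully or partially ciphertext (for
	 * example after TCP reordering); fall back to software, which
	 * decrypts and authenticates the record as a whole.
	 */
	return decrypt_skb(sk, skb, NULL);
}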

Signed-off-by: Boris Pismenny 
---
 include/net/tls.h |  63 +-
 net/tls/tls_device.c  | 278 ++
 net/tls/tls_device_fallback.c |   1 +
 net/tls/tls_main.c|  32 +++--
 net/tls/tls_sw.c  |  24 +++-
 5 files changed, 355 insertions(+), 43 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 7a485de..d8b3b65 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -83,6 +83,16 @@ struct tls_device {
void (*unhash)(struct tls_device *device, struct sock *sk);
 };
 
+enum {
+   TLS_BASE,
+   TLS_SW,
+#ifdef CONFIG_TLS_DEVICE
+   TLS_HW,
+#endif
+   TLS_HW_RECORD,
+   TLS_NUM_CONFIG,
+};
+
 struct tls_sw_context_tx {
struct crypto_aead *aead_send;
struct crypto_wait async_wait;
@@ -197,6 +207,7 @@ struct tls_context {
int (*push_pending_record)(struct sock *sk, int flags);
 
void (*sk_write_space)(struct sock *sk);
+   void (*sk_destruct)(struct sock *sk);
void (*sk_proto_close)(struct sock *sk, long timeout);
 
int  (*setsockopt)(struct sock *sk, int level,
@@ -209,13 +220,27 @@ struct tls_context {
void (*unhash)(struct sock *sk);
 };
 
+struct tls_offload_context_rx {
+   /* sw must be the first member of tls_offload_context_rx */
+   struct tls_sw_context_rx sw;
+   atomic64_t resync_req;
+   u8 driver_state[];
+   /* The TLS layer reserves room for driver specific state
+* Currently the belief is that there is not enough
+* driver specific state to justify another layer of indirection
+*/
+};
+
+#define TLS_OFFLOAD_CONTEXT_SIZE_RX\
+   (ALIGN(sizeof(struct tls_offload_context_rx), sizeof(void *)) + \
+TLS_DRIVER_STATE_SIZE)
+
 int wait_on_pending_writer(struct sock *sk, long *timeo);
 int tls_sk_query(struct sock *sk, int optname, char __user *optval,
int __user *optlen);
 int tls_sk_attach(struct sock *sk, int optname, char __user *optval,
  unsigned int optlen);
 
-
 int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx);
 int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
 int tls_sw_sendpage(struct sock *sk, struct page *page,
@@ -290,11 +315,19 @@ static inline bool tls_is_pending_open_record(struct 
tls_context *tls_ctx)
return tls_ctx->pending_open_record_frags;
 }
 
+struct sk_buff *
+tls_validate_xmit_skb(struct sock *sk, struct net_device *dev,
+ struct sk_buff *skb);
+
 static inline bool tls_is_sk_tx_device_offloaded(struct sock *sk)
 {
-   return sk_fullsock(sk) &&
-  /* matches smp_store_release in tls_set_device_offload */
-  smp_load_acquire(&sk->sk_destruct) == &tls_device_sk_destruct;
+#ifdef CONFIG_SOCK_VALIDATE_XMIT
+   return sk_fullsock(sk) &&
+  (smp_load_acquire(&sk->sk_validate_xmit_skb) ==
+  &tls_validate_xmit_skb);
+#else
+   return false;
+#endif
 }
 
 static inline void tls_err_abort(struct sock *sk, int err)
@@ -387,10 +420,27 @@ static inline struct tls_sw_context_tx *tls_sw_ctx_tx(
return (struct tls_offload_context_tx *)tls_ctx->priv_ctx_tx;
 }
 
+static inline struct tls_offload_context_rx *
+tls_offload_ctx_rx(const struct tls_context *tls_ctx)
+{
+   return (struct tls_offload_context_rx *)tls_ctx->priv_ctx_rx;
+}
+
+/* The TLS context is valid until sk_destruct is called */
+static inline void tls_offload_rx_resync_request(struct sock *sk, __be32 seq)
+{
+   struct tls_context *tls_ctx = tls_get_ctx(sk);
+   struct tls_offload_context_rx *rx_ctx = tls_offload_ctx_rx(tls_ctx);
+
+   atomic64_set(&rx_ctx->resync_req, ((((uint64_t)seq) << 32) | 1));
+}
+
+
 int tls_proccess_cmsg(struct sock *sk, struct msghdr *msg,
  unsigned char *record_type);
 void tls_r

[PATCH v4 net-next 18/19] net/mlx5e: IPsec, fix byte count in CQE

2018-07-12 Thread Boris Pismenny
This patch fixes the byte count indication in CQE for processed IPsec
packets that contain a metadata header.

Signed-off-by: Boris Pismenny 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c | 1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   | 2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
index fda7929..128a82b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
@@ -364,6 +364,7 @@ struct sk_buff *mlx5e_ipsec_handle_rx_skb(struct net_device 
*netdev,
}
 
remove_metadata_hdr(skb);
+   *cqe_bcnt -= MLX5E_METADATA_ETHER_LEN;
 
return skb;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h
index 2bfbbef..ca47c05 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h
@@ -41,7 +41,7 @@
 #include "en.h"
 
 struct sk_buff *mlx5e_ipsec_handle_rx_skb(struct net_device *netdev,
- struct sk_buff *skb);
+ struct sk_buff *skb, u32 *cqe_bcnt);
 void mlx5e_ipsec_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe);
 
 void mlx5e_ipsec_inverse_table_init(void);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 847e195..4a85b26 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1470,7 +1470,7 @@ void mlx5e_ipsec_handle_rx_cqe(struct mlx5e_rq *rq, 
struct mlx5_cqe64 *cqe)
mlx5e_free_rx_wqe(rq, wi);
goto wq_ll_pop;
}
-   skb = mlx5e_ipsec_handle_rx_skb(rq->netdev, skb);
+   skb = mlx5e_ipsec_handle_rx_skb(rq->netdev, skb, &cqe_bcnt);
if (unlikely(!skb)) {
mlx5e_free_rx_wqe(rq, wi);
goto wq_ll_pop;
-- 
1.8.3.1



[PATCH v4 net-next 05/19] tls: Refactor tls_offload variable names

2018-07-12 Thread Boris Pismenny
For symmetry, we rename tls_offload_context to
tls_offload_context_tx before we add tls_offload_context_rx.

Signed-off-by: Boris Pismenny 
---
 .../net/ethernet/mellanox/mlx5/core/en_accel/tls.h |  6 +++---
 include/net/tls.h  | 16 +++---
 net/tls/tls_device.c   | 25 +++---
 net/tls/tls_device_fallback.c  |  8 +++
 4 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
index b616217..b82f4de 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
@@ -50,7 +50,7 @@ struct mlx5e_tls {
 };
 
 struct mlx5e_tls_offload_context {
-   struct tls_offload_context base;
+   struct tls_offload_context_tx base;
u32 expected_seq;
__be32 swid;
 };
@@ -59,8 +59,8 @@ struct mlx5e_tls_offload_context {
 mlx5e_get_tls_tx_context(struct tls_context *tls_ctx)
 {
BUILD_BUG_ON(sizeof(struct mlx5e_tls_offload_context) >
-TLS_OFFLOAD_CONTEXT_SIZE);
-   return container_of(tls_offload_ctx(tls_ctx),
+TLS_OFFLOAD_CONTEXT_SIZE_TX);
+   return container_of(tls_offload_ctx_tx(tls_ctx),
struct mlx5e_tls_offload_context,
base);
 }
diff --git a/include/net/tls.h b/include/net/tls.h
index 70c2737..5dcd808 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -128,7 +128,7 @@ struct tls_record_info {
skb_frag_t frags[MAX_SKB_FRAGS];
 };
 
-struct tls_offload_context {
+struct tls_offload_context_tx {
struct crypto_aead *aead_send;
spinlock_t lock;/* protects records list */
struct list_head records_list;
@@ -147,8 +147,8 @@ struct tls_offload_context {
 #define TLS_DRIVER_STATE_SIZE (max_t(size_t, 8, sizeof(void *)))
 };
 
-#define TLS_OFFLOAD_CONTEXT_SIZE   
\
-   (ALIGN(sizeof(struct tls_offload_context), sizeof(void *)) +   \
+#define TLS_OFFLOAD_CONTEXT_SIZE_TX
\
+   (ALIGN(sizeof(struct tls_offload_context_tx), sizeof(void *)) +\
 TLS_DRIVER_STATE_SIZE)
 
 enum {
@@ -239,7 +239,7 @@ int tls_device_sendpage(struct sock *sk, struct page *page,
 void tls_device_init(void);
 void tls_device_cleanup(void);
 
-struct tls_record_info *tls_get_record(struct tls_offload_context *context,
+struct tls_record_info *tls_get_record(struct tls_offload_context_tx *context,
   u32 seq, u64 *p_record_sn);
 
 static inline bool tls_record_is_start_marker(struct tls_record_info *rec)
@@ -380,10 +380,10 @@ static inline struct tls_sw_context_tx *tls_sw_ctx_tx(
return (struct tls_sw_context_tx *)tls_ctx->priv_ctx_tx;
 }
 
-static inline struct tls_offload_context *tls_offload_ctx(
-   const struct tls_context *tls_ctx)
+static inline struct tls_offload_context_tx *
+tls_offload_ctx_tx(const struct tls_context *tls_ctx)
 {
-   return (struct tls_offload_context *)tls_ctx->priv_ctx_tx;
+   return (struct tls_offload_context_tx *)tls_ctx->priv_ctx_tx;
 }
 
 int tls_proccess_cmsg(struct sock *sk, struct msghdr *msg,
@@ -396,7 +396,7 @@ struct sk_buff *tls_validate_xmit_skb(struct sock *sk,
  struct sk_buff *skb);
 
 int tls_sw_fallback_init(struct sock *sk,
-struct tls_offload_context *offload_ctx,
+struct tls_offload_context_tx *offload_ctx,
 struct tls_crypto_info *crypto_info);
 
 #endif /* _TLS_OFFLOAD_H */
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index a7a8f8e..332a5d1 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -52,9 +52,8 @@
 
 static void tls_device_free_ctx(struct tls_context *ctx)
 {
-   struct tls_offload_context *offload_ctx = tls_offload_ctx(ctx);
+   kfree(tls_offload_ctx_tx(ctx));
 
-   kfree(offload_ctx);
kfree(ctx);
 }
 
@@ -125,7 +124,7 @@ static void destroy_record(struct tls_record_info *record)
kfree(record);
 }
 
-static void delete_all_records(struct tls_offload_context *offload_ctx)
+static void delete_all_records(struct tls_offload_context_tx *offload_ctx)
 {
struct tls_record_info *info, *temp;
 
@@ -141,14 +140,14 @@ static void tls_icsk_clean_acked(struct sock *sk, u32 
acked_seq)
 {
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_record_info *info, *temp;
-   struct tls_offload_context *ctx;
+   struct tls_offload_context_tx *ctx;
u64 deleted_records = 0;
unsigned long flags;
 
if (!tls_ctx)
return;
 
-   ctx = tls_offload_ctx(tls_ctx);
+   ctx = tls_offload_ctx_tx(tls_ctx);
 
spin_lock_irqsave(&ctx->lock, flags

[PATCH v4 net-next 13/19] net/mlx5e: TLS, add innova rx support

2018-07-12 Thread Boris Pismenny
Add the mlx5 implementation of the TLS Rx routines to add/del TLS
contexts, also add the tls_dev_resync_rx routine
to work with the TLS inline Rx crypto offload infrastructure.

Signed-off-by: Boris Pismenny 
Signed-off-by: Ilya Lesokhin 
---
 .../net/ethernet/mellanox/mlx5/core/en_accel/tls.c | 46 +++---
 .../net/ethernet/mellanox/mlx5/core/en_accel/tls.h | 15 +++
 2 files changed, 46 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
index 7fb9c75..68368c9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
@@ -110,9 +110,7 @@ static int mlx5e_tls_add(struct net_device *netdev, struct 
sock *sk,
u32 caps = mlx5_accel_tls_device_caps(mdev);
int ret = -ENOMEM;
void *flow;
-
-   if (direction != TLS_OFFLOAD_CTX_DIR_TX)
-   return -EINVAL;
+   u32 swid;
 
flow = kzalloc(MLX5_ST_SZ_BYTES(tls_flow), GFP_KERNEL);
if (!flow)
@@ -122,18 +120,23 @@ static int mlx5e_tls_add(struct net_device *netdev, 
struct sock *sk,
if (ret)
goto free_flow;
 
+   ret = mlx5_accel_tls_add_flow(mdev, flow, crypto_info,
+ start_offload_tcp_sn, &swid,
+ direction == TLS_OFFLOAD_CTX_DIR_TX);
+   if (ret < 0)
+   goto free_flow;
+
if (direction == TLS_OFFLOAD_CTX_DIR_TX) {
struct mlx5e_tls_offload_context_tx *tx_ctx =
mlx5e_get_tls_tx_context(tls_ctx);
-   u32 swid;
-
-   ret = mlx5_accel_tls_add_tx_flow(mdev, flow, crypto_info,
-start_offload_tcp_sn, &swid);
-   if (ret < 0)
-   goto free_flow;
 
tx_ctx->swid = htonl(swid);
tx_ctx->expected_seq = start_offload_tcp_sn;
+   } else {
+   struct mlx5e_tls_offload_context_rx *rx_ctx =
+   mlx5e_get_tls_rx_context(tls_ctx);
+
+   rx_ctx->handle = htonl(swid);
}
 
return 0;
@@ -147,19 +150,32 @@ static void mlx5e_tls_del(struct net_device *netdev,
  enum tls_offload_ctx_dir direction)
 {
struct mlx5e_priv *priv = netdev_priv(netdev);
+   unsigned int handle;
 
-   if (direction == TLS_OFFLOAD_CTX_DIR_TX) {
-   u32 swid = ntohl(mlx5e_get_tls_tx_context(tls_ctx)->swid);
+   handle = ntohl((direction == TLS_OFFLOAD_CTX_DIR_TX) ?
+  mlx5e_get_tls_tx_context(tls_ctx)->swid :
+  mlx5e_get_tls_rx_context(tls_ctx)->handle);
 
-   mlx5_accel_tls_del_tx_flow(priv->mdev, swid);
-   } else {
-   netdev_err(netdev, "unsupported direction %d\n", direction);
-   }
+   mlx5_accel_tls_del_flow(priv->mdev, handle,
+   direction == TLS_OFFLOAD_CTX_DIR_TX);
+}
+
+static void mlx5e_tls_resync_rx(struct net_device *netdev, struct sock *sk,
+   u32 seq, u64 rcd_sn)
+{
+   struct tls_context *tls_ctx = tls_get_ctx(sk);
+   struct mlx5e_priv *priv = netdev_priv(netdev);
+   struct mlx5e_tls_offload_context_rx *rx_ctx;
+
+   rx_ctx = mlx5e_get_tls_rx_context(tls_ctx);
+
+   mlx5_accel_tls_resync_rx(priv->mdev, rx_ctx->handle, seq, rcd_sn);
 }
 
 static const struct tlsdev_ops mlx5e_tls_ops = {
.tls_dev_add = mlx5e_tls_add,
.tls_dev_del = mlx5e_tls_del,
+   .tls_dev_resync_rx = mlx5e_tls_resync_rx,
 };
 
 void mlx5e_tls_build_netdev(struct mlx5e_priv *priv)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
index e26222a..2d40ede 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
@@ -65,6 +65,21 @@ struct mlx5e_tls_offload_context_tx {
base);
 }
 
+struct mlx5e_tls_offload_context_rx {
+   struct tls_offload_context_rx base;
+   __be32 handle;
+};
+
+static inline struct mlx5e_tls_offload_context_rx *
+mlx5e_get_tls_rx_context(struct tls_context *tls_ctx)
+{
+   BUILD_BUG_ON(sizeof(struct mlx5e_tls_offload_context_rx) >
+TLS_OFFLOAD_CONTEXT_SIZE_RX);
+   return container_of(tls_offload_ctx_rx(tls_ctx),
+   struct mlx5e_tls_offload_context_rx,
+   base);
+}
+
 void mlx5e_tls_build_netdev(struct mlx5e_priv *priv);
 int mlx5e_tls_init(struct mlx5e_priv *priv);
 void mlx5e_tls_cleanup(struct mlx5e_priv *priv);
-- 
1.8.3.1



[PATCH v4 net-next 01/19] net: Add decrypted field to skb

2018-07-12 Thread Boris Pismenny
The decrypted bit is propagated to cloned/copied skbs.
This will be used later by the inline crypto receive side offload
of tls.

Signed-off-by: Boris Pismenny 
Signed-off-by: Ilya Lesokhin 
---
 include/linux/skbuff.h | 7 ++-
 net/core/skbuff.c  | 6 ++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7601838..3ceb8dc 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -630,6 +630,7 @@ enum {
  * @hash: the packet hash
  * @queue_mapping: Queue mapping for multiqueue devices
  * @xmit_more: More SKBs are pending for this queue
+ * @decrypted: Decrypted SKB
  * @ndisc_nodetype: router type (from link layer)
  * @ooo_okay: allow the mapping of a socket to a queue to be changed
  * @l4_hash: indicate hash is a canonical 4-tuple hash over transport
@@ -736,7 +737,11 @@ struct sk_buff {
peeked:1,
head_frag:1,
xmit_more:1,
-   __unused:1; /* one bit hole */
+#ifdef CONFIG_TLS_DEVICE
+   decrypted:1;
+#else
+   __unused:1;
+#endif
 
/* fields enclosed in headers_start/headers_end are copied
 * using a single memcpy() in __copy_skb_header()
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c4e24ac..cfd6c6f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -805,6 +805,9 @@ static void __copy_skb_header(struct sk_buff *new, const 
struct sk_buff *old)
 * It is not yet because we do not want to have a 16 bit hole
 */
new->queue_mapping = old->queue_mapping;
+#ifdef CONFIG_TLS_DEVICE
+   new->decrypted = old->decrypted;
+#endif
 
memcpy(&new->headers_start, &old->headers_start,
   offsetof(struct sk_buff, headers_end) -
@@ -865,6 +868,9 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, 
struct sk_buff *skb)
C(head_frag);
C(data);
C(truesize);
+#ifdef CONFIG_TLS_DEVICE
+   C(decrypted);
+#endif
refcount_set(&n->users, 1);
 
atomic_inc(&(skb_shinfo(skb)->dataref));
-- 
1.8.3.1



[PATCH v4 net-next 10/19] tls: Fix zerocopy_from_iter iov handling

2018-07-12 Thread Boris Pismenny
zerocopy_from_iter iterates over the message, but it doesn't revert the
updates made by the iov iteration. This patch fixes it. Now, the iov can
be used after calling zerocopy_from_iter.

Fixes: 3c4d75591 ("tls: kernel TLS support")
Signed-off-by: Boris Pismenny 
---
 net/tls/tls_sw.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 2a6ba0f..ea78678 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -267,7 +267,7 @@ static int zerocopy_from_iter(struct sock *sk, struct 
iov_iter *from,
  int length, int *pages_used,
  unsigned int *size_used,
  struct scatterlist *to, int to_max_pages,
- bool charge)
+ bool charge, bool revert)
 {
struct page *pages[MAX_SKB_FRAGS];
 
@@ -318,6 +318,8 @@ static int zerocopy_from_iter(struct sock *sk, struct 
iov_iter *from,
 out:
*size_used = size;
*pages_used = num_elem;
+   if (revert)
+   iov_iter_revert(from, size);
 
return rc;
 }
@@ -419,7 +421,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t size)
&ctx->sg_plaintext_size,
ctx->sg_plaintext_data,
ARRAY_SIZE(ctx->sg_plaintext_data),
-   true);
+   true, false);
if (ret)
goto fallback_to_reg_send;
 
@@ -834,7 +836,7 @@ int tls_sw_recvmsg(struct sock *sk,
err = zerocopy_from_iter(sk, &msg->msg_iter,
 to_copy, &pages,
 &chunk, &sgin[1],
-MAX_SKB_FRAGS, false);
+MAX_SKB_FRAGS, false, 
true);
if (err < 0)
goto fallback_to_reg_recv;
 
-- 
1.8.3.1



[PATCH v4 net-next 17/19] net/mlx5: Accel, add common metadata functions

2018-07-12 Thread Boris Pismenny
This patch adds common functions to handle mellanox metadata headers.
These functions are used by IPsec and TLS to process FPGA metadata.

Signed-off-by: Boris Pismenny 
---
 .../net/ethernet/mellanox/mlx5/core/accel/accel.h  | 37 ++
 .../mellanox/mlx5/core/en_accel/ipsec_rxtx.c   | 19 +++
 .../mellanox/mlx5/core/en_accel/tls_rxtx.c | 18 +++
 3 files changed, 45 insertions(+), 29 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/accel/accel.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/accel/accel.h 
b/drivers/net/ethernet/mellanox/mlx5/core/accel/accel.h
new file mode 100644
index 000..c132604
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/accel/accel.h
@@ -0,0 +1,37 @@
+#ifndef __MLX5E_ACCEL_H__
+#define __MLX5E_ACCEL_H__
+
+#ifdef CONFIG_MLX5_ACCEL
+
+#include 
+#include 
+#include "en.h"
+
+static inline bool is_metadata_hdr_valid(struct sk_buff *skb)
+{
+   __be16 *ethtype;
+
+   if (unlikely(skb->len < ETH_HLEN + MLX5E_METADATA_ETHER_LEN))
+   return false;
+   ethtype = (__be16 *)(skb->data + ETH_ALEN * 2);
+   if (*ethtype != cpu_to_be16(MLX5E_METADATA_ETHER_TYPE))
+   return false;
+   return true;
+}
+
+static inline void remove_metadata_hdr(struct sk_buff *skb)
+{
+   struct ethhdr *old_eth;
+   struct ethhdr *new_eth;
+
+   /* Remove the metadata from the buffer */
+   old_eth = (struct ethhdr *)skb->data;
+   new_eth = (struct ethhdr *)(skb->data + MLX5E_METADATA_ETHER_LEN);
+   memmove(new_eth, old_eth, 2 * ETH_ALEN);
+   /* Ethertype is already in its new place */
+   skb_pull_inline(skb, MLX5E_METADATA_ETHER_LEN);
+}
+
+#endif /* CONFIG_MLX5_ACCEL */
+
+#endif /* __MLX5E_ACCEL_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
index c245d8e..fda7929 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
@@ -37,6 +37,7 @@
 
 #include "en_accel/ipsec_rxtx.h"
 #include "en_accel/ipsec.h"
+#include "accel/accel.h"
 #include "en.h"
 
 enum {
@@ -346,19 +347,12 @@ struct sk_buff *mlx5e_ipsec_handle_tx_skb(struct 
net_device *netdev,
 }
 
 struct sk_buff *mlx5e_ipsec_handle_rx_skb(struct net_device *netdev,
- struct sk_buff *skb)
+ struct sk_buff *skb, u32 *cqe_bcnt)
 {
struct mlx5e_ipsec_metadata *mdata;
-   struct ethhdr *old_eth;
-   struct ethhdr *new_eth;
struct xfrm_state *xs;
-   __be16 *ethtype;
 
-   /* Detect inline metadata */
-   if (skb->len < ETH_HLEN + MLX5E_METADATA_ETHER_LEN)
-   return skb;
-   ethtype = (__be16 *)(skb->data + ETH_ALEN * 2);
-   if (*ethtype != cpu_to_be16(MLX5E_METADATA_ETHER_TYPE))
+   if (!is_metadata_hdr_valid(skb))
return skb;
 
/* Use the metadata */
@@ -369,12 +363,7 @@ struct sk_buff *mlx5e_ipsec_handle_rx_skb(struct 
net_device *netdev,
return NULL;
}
 
-   /* Remove the metadata from the buffer */
-   old_eth = (struct ethhdr *)skb->data;
-   new_eth = (struct ethhdr *)(skb->data + MLX5E_METADATA_ETHER_LEN);
-   memmove(new_eth, old_eth, 2 * ETH_ALEN);
-   /* Ethertype is already in its new place */
-   skb_pull_inline(skb, MLX5E_METADATA_ETHER_LEN);
+   remove_metadata_hdr(skb);
 
return skb;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
index ecfc764..92d3745 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
@@ -33,6 +33,8 @@
 
 #include "en_accel/tls.h"
 #include "en_accel/tls_rxtx.h"
+#include "accel/accel.h"
+
 #include 
 #include 
 
@@ -350,16 +352,9 @@ void mlx5e_tls_handle_rx_skb(struct net_device *netdev, 
struct sk_buff *skb,
 u32 *cqe_bcnt)
 {
struct mlx5e_tls_metadata *mdata;
-   struct ethhdr *old_eth;
-   struct ethhdr *new_eth;
-   __be16 *ethtype;
struct mlx5e_priv *priv;
 
-   /* Detect inline metadata */
-   if (skb->len < ETH_HLEN + MLX5E_METADATA_ETHER_LEN)
-   return;
-   ethtype = (__be16 *)(skb->data + ETH_ALEN * 2);
-   if (*ethtype != cpu_to_be16(MLX5E_METADATA_ETHER_TYPE))
+   if (!is_metadata_hdr_valid(skb))
return;
 
/* Use the metadata */
@@ -383,11 +378,6 @@ void mlx5e_tls_handle_rx_skb(struct net_device *netdev, 
struct sk_buff *skb,
return;
}
 
-   /* Remove the metadata from the buffer */
-   old_eth = (struct ethhdr *)skb->data;
-   new_eth = (struct ethhdr *)(skb->data + MLX5E_METADATA_ETHER_LEN);
-   memmove(new_eth,

[PATCH v4 net-next 15/19] net/mlx5e: TLS, add software statistics

2018-07-12 Thread Boris Pismenny
This patch adds software statistics for TLS to count important
events.

Signed-off-by: Boris Pismenny 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c  |  3 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h  |  4 
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c | 11 ++-
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
index 68368c9..541e6f4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
@@ -169,7 +169,10 @@ static void mlx5e_tls_resync_rx(struct net_device *netdev, 
struct sock *sk,
 
rx_ctx = mlx5e_get_tls_rx_context(tls_ctx);
 
+   netdev_info(netdev, "resyncing seq %u rcd %llu\n", seq,
+   be64_to_cpu(rcd_sn));
mlx5_accel_tls_resync_rx(priv->mdev, rx_ctx->handle, seq, rcd_sn);
+   atomic64_inc(&priv->tls->sw_stats.rx_tls_resync_reply);
 }
 
 static const struct tlsdev_ops mlx5e_tls_ops = {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
index 2d40ede..3f5d721 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
@@ -43,6 +43,10 @@ struct mlx5e_tls_sw_stats {
atomic64_t tx_tls_drop_resync_alloc;
atomic64_t tx_tls_drop_no_sync_data;
atomic64_t tx_tls_drop_bypass_required;
+   atomic64_t rx_tls_drop_resync_request;
+   atomic64_t rx_tls_resync_request;
+   atomic64_t rx_tls_resync_reply;
+   atomic64_t rx_tls_auth_fail;
 };
 
 struct mlx5e_tls {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
index d460fda..ecfc764 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
@@ -330,8 +330,12 @@ static int tls_update_resync_sn(struct net_device *netdev,
netdev->ifindex, 0);
 #endif
}
-   if (!sk || sk->sk_state == TCP_TIME_WAIT)
+   if (!sk || sk->sk_state == TCP_TIME_WAIT) {
+   struct mlx5e_priv *priv = netdev_priv(netdev);
+
+   atomic64_inc(&priv->tls->sw_stats.rx_tls_drop_resync_request);
goto out;
+   }
 
skb->sk = sk;
skb->destructor = sock_edemux;
@@ -349,6 +353,7 @@ void mlx5e_tls_handle_rx_skb(struct net_device *netdev, 
struct sk_buff *skb,
struct ethhdr *old_eth;
struct ethhdr *new_eth;
__be16 *ethtype;
+   struct mlx5e_priv *priv;
 
/* Detect inline metadata */
if (skb->len < ETH_HLEN + MLX5E_METADATA_ETHER_LEN)
@@ -365,9 +370,13 @@ void mlx5e_tls_handle_rx_skb(struct net_device *netdev, 
struct sk_buff *skb,
break;
case SYNDROM_RESYNC_REQUEST:
tls_update_resync_sn(netdev, skb, mdata);
+   priv = netdev_priv(netdev);
+   atomic64_inc(&priv->tls->sw_stats.rx_tls_resync_request);
break;
case SYNDROM_AUTH_FAILED:
/* Authentication failure will be observed and verified by kTLS 
*/
+   priv = netdev_priv(netdev);
+   atomic64_inc(&priv->tls->sw_stats.rx_tls_auth_fail);
break;
default:
/* Bypass the metadata header to others */
-- 
1.8.3.1



[PATCH v4 net-next 08/19] tls: Fill software context without allocation

2018-07-12 Thread Boris Pismenny
This patch allows tls_set_sw_offload to fill the context in case it was
already allocated previously.

We will use it in TLS_DEVICE to fill the RX software context.
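
A sketch of how the TLS_DEVICE RX setup can use this (the wrapper name is
illustrative; tls_set_sw_offload() and the context fields are from this
series):

/* Illustrative: pre-populate the software RX context and let
 * tls_set_sw_offload() fill in the crypto state instead of allocating
 * a second context.
 */
static int rx_offload_init_sw_fallback(struct sock *sk,
				       struct tls_context *ctx,
				       struct tls_offload_context_rx *rx_ctx)
{
	/* 'sw' is the first member of tls_offload_context_rx, so the
	 * software code keeps working on its usual view of the context.
	 */
	ctx->priv_ctx_rx = rx_ctx;

	return tls_set_sw_offload(sk, ctx, 0 /* !tx, i.e. RX */);
}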

Signed-off-by: Boris Pismenny 
---
 net/tls/tls_sw.c | 34 ++
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 86e22bc..5073676 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1090,28 +1090,38 @@ int tls_set_sw_offload(struct sock *sk, struct 
tls_context *ctx, int tx)
}
 
if (tx) {
-   sw_ctx_tx = kzalloc(sizeof(*sw_ctx_tx), GFP_KERNEL);
-   if (!sw_ctx_tx) {
-   rc = -ENOMEM;
-   goto out;
+   if (!ctx->priv_ctx_tx) {
+   sw_ctx_tx = kzalloc(sizeof(*sw_ctx_tx), GFP_KERNEL);
+   if (!sw_ctx_tx) {
+   rc = -ENOMEM;
+   goto out;
+   }
+   ctx->priv_ctx_tx = sw_ctx_tx;
+   } else {
+   sw_ctx_tx =
+   (struct tls_sw_context_tx *)ctx->priv_ctx_tx;
}
-   crypto_init_wait(&sw_ctx_tx->async_wait);
-   ctx->priv_ctx_tx = sw_ctx_tx;
} else {
-   sw_ctx_rx = kzalloc(sizeof(*sw_ctx_rx), GFP_KERNEL);
-   if (!sw_ctx_rx) {
-   rc = -ENOMEM;
-   goto out;
+   if (!ctx->priv_ctx_rx) {
+   sw_ctx_rx = kzalloc(sizeof(*sw_ctx_rx), GFP_KERNEL);
+   if (!sw_ctx_rx) {
+   rc = -ENOMEM;
+   goto out;
+   }
+   ctx->priv_ctx_rx = sw_ctx_rx;
+   } else {
+   sw_ctx_rx =
+   (struct tls_sw_context_rx *)ctx->priv_ctx_rx;
}
-   crypto_init_wait(&sw_ctx_rx->async_wait);
-   ctx->priv_ctx_rx = sw_ctx_rx;
}
 
if (tx) {
+   crypto_init_wait(&sw_ctx_tx->async_wait);
crypto_info = &ctx->crypto_send;
cctx = &ctx->tx;
aead = &sw_ctx_tx->aead_send;
} else {
+   crypto_init_wait(&sw_ctx_rx->async_wait);
crypto_info = &ctx->crypto_recv;
cctx = &ctx->rx;
aead = &sw_ctx_rx->aead_recv;
-- 
1.8.3.1



[PATCH v4 net-next 06/19] tls: Split decrypt_skb to two functions

2018-07-12 Thread Boris Pismenny
Previously, decrypt_skb also updated the TLS context.
Now, decrypt_skb only decrypts the payload using the current context,
while decrypt_skb_update also updates the state.

Later, in the tls_device Rx flow, we will use decrypt_skb directly.
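
Roughly, the split gives callers this choice (an illustrative sketch;
decrypt_skb_update() stays private to tls_sw.c):

/* Illustrative: pick the right entry point after the split.
 * sgout == NULL means "decrypt in place".
 */
static int rx_decrypt(struct sock *sk, struct sk_buff *skb,
		      struct scatterlist *sgout, bool advance_state)
{
	if (advance_state)
		/* software receive path: decrypt and also advance
		 * rx.rec_seq / record offsets and wake the reader
		 */
		return decrypt_skb_update(sk, skb, sgout);

	/* device RX flow (later patches): pure decryption, the caller
	 * manages the record state itself
	 */
	return decrypt_skb(sk, skb, sgout);
}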

Signed-off-by: Boris Pismenny 
---
 include/net/tls.h |  2 ++
 net/tls/tls_sw.c  | 44 ++--
 2 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 5dcd808..49b8922 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -390,6 +390,8 @@ int tls_proccess_cmsg(struct sock *sk, struct msghdr *msg,
  unsigned char *record_type);
 void tls_register_device(struct tls_device *device);
 void tls_unregister_device(struct tls_device *device);
+int decrypt_skb(struct sock *sk, struct sk_buff *skb,
+   struct scatterlist *sgout);
 
 struct sk_buff *tls_validate_xmit_skb(struct sock *sk,
  struct net_device *dev,
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 3bd7c14..99d0347 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -53,7 +53,6 @@ static int tls_do_decryption(struct sock *sk,
 {
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
-   struct strp_msg *rxm = strp_msg(skb);
struct aead_request *aead_req;
 
int ret;
@@ -74,18 +73,6 @@ static int tls_do_decryption(struct sock *sk,
 
ret = crypto_wait_req(crypto_aead_decrypt(aead_req), &ctx->async_wait);
 
-   if (ret < 0)
-   goto out;
-
-   rxm->offset += tls_ctx->rx.prepend_size;
-   rxm->full_len -= tls_ctx->rx.overhead_size;
-   tls_advance_record_sn(sk, &tls_ctx->rx);
-
-   ctx->decrypted = true;
-
-   ctx->saved_data_ready(sk);
-
-out:
kfree(aead_req);
return ret;
 }
@@ -670,8 +657,29 @@ static struct sk_buff *tls_wait_data(struct sock *sk, int 
flags,
return skb;
 }
 
-static int decrypt_skb(struct sock *sk, struct sk_buff *skb,
-  struct scatterlist *sgout)
+static int decrypt_skb_update(struct sock *sk, struct sk_buff *skb,
+ struct scatterlist *sgout)
+{
+   struct tls_context *tls_ctx = tls_get_ctx(sk);
+   struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
+   struct strp_msg *rxm = strp_msg(skb);
+   int err = 0;
+
+   err = decrypt_skb(sk, skb, sgout);
+   if (err < 0)
+   return err;
+
+   rxm->offset += tls_ctx->rx.prepend_size;
+   rxm->full_len -= tls_ctx->rx.overhead_size;
+   tls_advance_record_sn(sk, &tls_ctx->rx);
+   ctx->decrypted = true;
+   ctx->saved_data_ready(sk);
+
+   return err;
+}
+
+int decrypt_skb(struct sock *sk, struct sk_buff *skb,
+   struct scatterlist *sgout)
 {
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
@@ -821,7 +829,7 @@ int tls_sw_recvmsg(struct sock *sk,
if (err < 0)
goto fallback_to_reg_recv;
 
-   err = decrypt_skb(sk, skb, sgin);
+   err = decrypt_skb_update(sk, skb, sgin);
for (; pages > 0; pages--)
put_page(sg_page(&sgin[pages]));
if (err < 0) {
@@ -830,7 +838,7 @@ int tls_sw_recvmsg(struct sock *sk,
}
} else {
 fallback_to_reg_recv:
-   err = decrypt_skb(sk, skb, NULL);
+   err = decrypt_skb_update(sk, skb, NULL);
if (err < 0) {
tls_err_abort(sk, EBADMSG);
goto recv_end;
@@ -901,7 +909,7 @@ ssize_t tls_sw_splice_read(struct socket *sock,  loff_t 
*ppos,
}
 
if (!ctx->decrypted) {
-   err = decrypt_skb(sk, skb, NULL);
+   err = decrypt_skb_update(sk, skb, NULL);
 
if (err < 0) {
tls_err_abort(sk, EBADMSG);
-- 
1.8.3.1



[PATCH v4 net-next 12/19] net/mlx5: Accel, add TLS rx offload routines

2018-07-12 Thread Boris Pismenny
In Innova TLS, TLS contexts are added or deleted
via a command message over the SBU connection.
The HW then sends a response message over the same connection.

Complete the implementation for Innova TLS (FPGA-based) hardware by
adding support for rx inline crypto offload.

Signed-off-by: Boris Pismenny 
Signed-off-by: Ilya Lesokhin 
---
 .../net/ethernet/mellanox/mlx5/core/accel/tls.c|  23 +++--
 .../net/ethernet/mellanox/mlx5/core/accel/tls.h|  26 +++--
 drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c | 113 -
 drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.h |  18 ++--
 include/linux/mlx5/mlx5_ifc_fpga.h |   1 +
 5 files changed, 135 insertions(+), 46 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.c 
b/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.c
index 77ac19f..da7bd26 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.c
@@ -37,17 +37,26 @@
 #include "mlx5_core.h"
 #include "fpga/tls.h"
 
-int mlx5_accel_tls_add_tx_flow(struct mlx5_core_dev *mdev, void *flow,
-  struct tls_crypto_info *crypto_info,
-  u32 start_offload_tcp_sn, u32 *p_swid)
+int mlx5_accel_tls_add_flow(struct mlx5_core_dev *mdev, void *flow,
+   struct tls_crypto_info *crypto_info,
+   u32 start_offload_tcp_sn, u32 *p_swid,
+   bool direction_sx)
 {
-   return mlx5_fpga_tls_add_tx_flow(mdev, flow, crypto_info,
-start_offload_tcp_sn, p_swid);
+   return mlx5_fpga_tls_add_flow(mdev, flow, crypto_info,
+ start_offload_tcp_sn, p_swid,
+ direction_sx);
 }
 
-void mlx5_accel_tls_del_tx_flow(struct mlx5_core_dev *mdev, u32 swid)
+void mlx5_accel_tls_del_flow(struct mlx5_core_dev *mdev, u32 swid,
+bool direction_sx)
 {
-   mlx5_fpga_tls_del_tx_flow(mdev, swid, GFP_KERNEL);
+   mlx5_fpga_tls_del_flow(mdev, swid, GFP_KERNEL, direction_sx);
+}
+
+int mlx5_accel_tls_resync_rx(struct mlx5_core_dev *mdev, u32 handle, u32 seq,
+u64 rcd_sn)
+{
+   return mlx5_fpga_tls_resync_rx(mdev, handle, seq, rcd_sn);
 }
 
 bool mlx5_accel_is_tls_device(struct mlx5_core_dev *mdev)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.h 
b/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.h
index 6f9c9f4..2228c10 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.h
@@ -60,10 +60,14 @@ struct mlx5_ifc_tls_flow_bits {
u8 reserved_at_2[0x1e];
 };
 
-int mlx5_accel_tls_add_tx_flow(struct mlx5_core_dev *mdev, void *flow,
-  struct tls_crypto_info *crypto_info,
-  u32 start_offload_tcp_sn, u32 *p_swid);
-void mlx5_accel_tls_del_tx_flow(struct mlx5_core_dev *mdev, u32 swid);
+int mlx5_accel_tls_add_flow(struct mlx5_core_dev *mdev, void *flow,
+   struct tls_crypto_info *crypto_info,
+   u32 start_offload_tcp_sn, u32 *p_swid,
+   bool direction_sx);
+void mlx5_accel_tls_del_flow(struct mlx5_core_dev *mdev, u32 swid,
+bool direction_sx);
+int mlx5_accel_tls_resync_rx(struct mlx5_core_dev *mdev, u32 handle, u32 seq,
+u64 rcd_sn);
 bool mlx5_accel_is_tls_device(struct mlx5_core_dev *mdev);
 u32 mlx5_accel_tls_device_caps(struct mlx5_core_dev *mdev);
 int mlx5_accel_tls_init(struct mlx5_core_dev *mdev);
@@ -71,11 +75,15 @@ int mlx5_accel_tls_add_tx_flow(struct mlx5_core_dev *mdev, 
void *flow,
 
 #else
 
-static inline int
-mlx5_accel_tls_add_tx_flow(struct mlx5_core_dev *mdev, void *flow,
-  struct tls_crypto_info *crypto_info,
-  u32 start_offload_tcp_sn, u32 *p_swid) { return 0; }
-static inline void mlx5_accel_tls_del_tx_flow(struct mlx5_core_dev *mdev, u32 
swid) { }
+static inline int
+mlx5_accel_tls_add_flow(struct mlx5_core_dev *mdev, void *flow,
+   struct tls_crypto_info *crypto_info,
+   u32 start_offload_tcp_sn, u32 *p_swid,
+   bool direction_sx) { return -ENOTSUPP; }
+static inline void mlx5_accel_tls_del_flow(struct mlx5_core_dev *mdev, u32 
swid,
+  bool direction_sx) { }
+static inline int mlx5_accel_tls_resync_rx(struct mlx5_core_dev *mdev, u32 
handle,
+  u32 seq, u64 rcd_sn) { return 0; }
 static inline bool mlx5_accel_is_tls_device(struct mlx5_core_dev *mdev) { 
return false; }
 static inline u32 mlx5_accel_tls_device_caps(struct mlx5_core_dev *mdev) { return 0; }
 static inline int mlx5_accel_tls_init(struct mlx5_core_dev *mdev) { return 0; }

[PATCH v4 net-next 16/19] net/mlx5e: TLS, build TLS netdev from capabilities

2018-07-12 Thread Boris Pismenny
This patch enables TLS Rx based on available HW capabilities.

Signed-off-by: Boris Pismenny 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
index 541e6f4..eddd7702 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
@@ -183,13 +183,27 @@ static void mlx5e_tls_resync_rx(struct net_device *netdev, struct sock *sk,
 
 void mlx5e_tls_build_netdev(struct mlx5e_priv *priv)
 {
+   u32 caps = mlx5_accel_tls_device_caps(priv->mdev);
struct net_device *netdev = priv->netdev;
 
if (!mlx5_accel_is_tls_device(priv->mdev))
return;
 
-   netdev->features |= NETIF_F_HW_TLS_TX;
-   netdev->hw_features |= NETIF_F_HW_TLS_TX;
+   if (caps & MLX5_ACCEL_TLS_TX) {
+   netdev->features  |= NETIF_F_HW_TLS_TX;
+   netdev->hw_features   |= NETIF_F_HW_TLS_TX;
+   }
+
+   if (caps & MLX5_ACCEL_TLS_RX) {
+   netdev->features  |= NETIF_F_HW_TLS_RX;
+   netdev->hw_features   |= NETIF_F_HW_TLS_RX;
+   }
+
+   if (!(caps & MLX5_ACCEL_TLS_LRO)) {
+   netdev->features  &= ~NETIF_F_LRO;
+   netdev->hw_features   &= ~NETIF_F_LRO;
+   }
+
netdev->tlsdev_ops = &mlx5e_tls_ops;
 }
 
-- 
1.8.3.1



[PATCH v4 net-next 03/19] net: Add TLS rx resync NDO

2018-07-12 Thread Boris Pismenny
Add new netdev tls op for resynchronizing HW tls context

Signed-off-by: Boris Pismenny 
---
 include/linux/netdevice.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b683971..0434df3 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -903,6 +903,8 @@ struct tlsdev_ops {
void (*tls_dev_del)(struct net_device *netdev,
struct tls_context *ctx,
enum tls_offload_ctx_dir direction);
+   void (*tls_dev_resync_rx)(struct net_device *netdev,
+ struct sock *sk, u32 seq, u64 rcd_sn);
 };
 #endif
 
-- 
1.8.3.1
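
For context, a minimal sketch of how the TLS core might invoke this new op once
a device resync is needed (illustrative only, not part of the patch; the
tls_ctx->netdev field and the surrounding checks are assumptions carried over
from the existing TX offload code):

/* Illustrative sketch only -- not the actual net/tls call site. */
static void tls_resync_rx_sketch(struct tls_context *tls_ctx, struct sock *sk,
				 u32 seq, u64 rcd_sn)
{
	struct net_device *netdev = tls_ctx->netdev;	/* assumed field */

	if (!netdev || !netdev->tlsdev_ops ||
	    !netdev->tlsdev_ops->tls_dev_resync_rx)
		return;

	/* hand the expected TCP sequence and record number to the device */
	netdev->tlsdev_ops->tls_dev_resync_rx(netdev, sk, seq, rcd_sn);
}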



[PATCH v4 net-next 19/19] net/mlx5e: Kconfig, mutually exclude compilation of TLS and IPsec accel

2018-07-12 Thread Boris Pismenny
We currently have no devices that support both TLS and IPsec using the
accel framework, and the current code does not support both IPsec and
TLS. This patch prevents such combinations.

Signed-off-by: Boris Pismenny 
---
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index 2545296..d3e8c70 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -93,6 +93,7 @@ config MLX5_EN_TLS
depends on TLS_DEVICE
depends on TLS=y || MLX5_CORE=m
depends on MLX5_ACCEL
+   depends on !MLX5_EN_IPSEC
default n
---help---
  Build support for TLS cryptography-offload accelaration in the NIC.
-- 
1.8.3.1



[PATCH v4 net-next 14/19] net/mlx5e: TLS, add Innova TLS rx data path

2018-07-12 Thread Boris Pismenny
Implement the TLS rx offload data path according to the
requirements of the TLS generic NIC offload infrastructure.

A special metadata ethertype is used to pass information to
the hardware.

When the hardware loses synchronization, a special resync request
metadata message is used to request a resync.

Signed-off-by: Boris Pismenny 
Signed-off-by: Ilya Lesokhin 
---
 .../mellanox/mlx5/core/en_accel/tls_rxtx.c | 112 -
 .../mellanox/mlx5/core/en_accel/tls_rxtx.h |   3 +
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|   6 ++
 3 files changed, 118 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
index c96196f..d460fda 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
@@ -33,6 +33,12 @@
 
 #include "en_accel/tls.h"
 #include "en_accel/tls_rxtx.h"
+#include 
+#include 
+
+#define SYNDROM_DECRYPTED  0x30
+#define SYNDROM_RESYNC_REQUEST 0x31
+#define SYNDROM_AUTH_FAILED 0x32
 
 #define SYNDROME_OFFLOAD_REQUIRED 32
 #define SYNDROME_SYNC 33
@@ -44,10 +50,26 @@ struct sync_info {
skb_frag_t frags[MAX_SKB_FRAGS];
 };
 
-struct mlx5e_tls_metadata {
+struct recv_metadata_content {
+   u8 syndrome;
+   u8 reserved;
+   __be32 sync_seq;
+} __packed;
+
+struct send_metadata_content {
/* One byte of syndrome followed by 3 bytes of swid */
__be32 syndrome_swid;
__be16 first_seq;
+} __packed;
+
+struct mlx5e_tls_metadata {
+   union {
+   /* from fpga to host */
+   struct recv_metadata_content recv;
+   /* from host to fpga */
+   struct send_metadata_content send;
+   unsigned char raw[6];
+   } __packed content;
/* packet type ID field */
__be16 ethertype;
 } __packed;
@@ -68,7 +90,8 @@ static int mlx5e_tls_add_metadata(struct sk_buff *skb, __be32 swid)
2 * ETH_ALEN);
 
eth->h_proto = cpu_to_be16(MLX5E_METADATA_ETHER_TYPE);
-   pet->syndrome_swid = htonl(SYNDROME_OFFLOAD_REQUIRED << 24) | swid;
+   pet->content.send.syndrome_swid =
+   htonl(SYNDROME_OFFLOAD_REQUIRED << 24) | swid;
 
return 0;
 }
@@ -149,7 +172,7 @@ static void mlx5e_tls_complete_sync_skb(struct sk_buff *skb,
 
pet = (struct mlx5e_tls_metadata *)(nskb->data + sizeof(struct ethhdr));
memcpy(pet, &syndrome, sizeof(syndrome));
-   pet->first_seq = htons(tcp_seq);
+   pet->content.send.first_seq = htons(tcp_seq);
 
/* MLX5 devices don't care about the checksum partial start, offset
 * and pseudo header
@@ -276,3 +299,86 @@ struct sk_buff *mlx5e_tls_handle_tx_skb(struct net_device *netdev,
 out:
return skb;
 }
+
+static int tls_update_resync_sn(struct net_device *netdev,
+   struct sk_buff *skb,
+   struct mlx5e_tls_metadata *mdata)
+{
+   struct sock *sk = NULL;
+   struct iphdr *iph;
+   struct tcphdr *th;
+   __be32 seq;
+
+   if (mdata->ethertype != htons(ETH_P_IP))
+   return -EINVAL;
+
+   iph = (struct iphdr *)(mdata + 1);
+
+   th = ((void *)iph) + iph->ihl * 4;
+
+   if (iph->version == 4) {
+   sk = inet_lookup_established(dev_net(netdev), &tcp_hashinfo,
+iph->saddr, th->source, iph->daddr,
+th->dest, netdev->ifindex);
+#if IS_ENABLED(CONFIG_IPV6)
+   } else {
+   struct ipv6hdr *ipv6h = (struct ipv6hdr *)iph;
+
+   sk = __inet6_lookup_established(dev_net(netdev), &tcp_hashinfo,
+   &ipv6h->saddr, th->source,
+   &ipv6h->daddr, th->dest,
+   netdev->ifindex, 0);
+#endif
+   }
+   if (!sk || sk->sk_state == TCP_TIME_WAIT)
+   goto out;
+
+   skb->sk = sk;
+   skb->destructor = sock_edemux;
+
+   memcpy(&seq, &mdata->content.recv.sync_seq, sizeof(seq));
+   tls_offload_rx_resync_request(sk, seq);
+out:
+   return 0;
+}
+
+void mlx5e_tls_handle_rx_skb(struct net_device *netdev, struct sk_buff *skb,
+u32 *cqe_bcnt)
+{
+   struct mlx5e_tls_metadata *mdata;
+   struct ethhdr *old_eth;
+   struct ethhdr *new_eth;
+   __be16 *ethtype;
+
+   /* Detect inline metadata */
+   if (skb->len < ETH_HLEN + MLX5E_METADATA_ETHER_LEN)
+   return;
+   ethtype = (__be16 *)(skb->data + ETH_ALEN * 2);
+   if (*ethtype != cpu_to_be16(MLX5E_METADATA_ETHER_TYPE))
+   return;
+
+   /* Use the metadata */
+   mdata = (struct mlx5e_tls_metadata *)(skb->data + ETH_HLEN);
+   switch (mdata->content.recv.syndrome) {
+   case SYND

Re: [PATCH v3 net-next 10/19] tls: Fix zerocopy_from_iter iov handling

2018-07-12 Thread Boris Pismenny




On 7/12/2018 12:46 PM, Dave Watson wrote:

On 07/11/18 10:54 PM, Boris Pismenny wrote:

zerocopy_from_iter iterates over the message, but it doesn't revert the
updates made by the iov iteration. This patch fixes it. Now, the iov can
be used after calling zerocopy_from_iter.


This breaks tests (which I will send up as selftests shortly).  I
believe we are depending on zerocopy_from_iter to advance the iter,
and if zerocopy_from_iter returns a failure, then we revert it.  So
you can revert it here if you want, but you'd have to advance it if we
actually used it instead.



Only on the send side do we depend on this semantic. On the receive
side, we need to revert it in case we go to the fallback flow.
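
For readers following the thread, a minimal sketch of the revert pattern being
discussed (illustrative only; fill_from_iter() is a hypothetical stand-in for
zerocopy_from_iter(), whose real signature lives in net/tls/tls_sw.c):

	/* Illustrative only: remember how much of the iterator gets consumed
	 * and revert it on failure, so the fallback copy path still sees the
	 * original iov.
	 */
	size_t before = iov_iter_count(&msg->msg_iter);
	int err = fill_from_iter(sk, &msg->msg_iter, length);	/* hypothetical */

	if (err < 0)
		iov_iter_revert(&msg->msg_iter,
				before - iov_iter_count(&msg->msg_iter));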


[PATCH net-next v2 0/2] net: phy: add functionality to speed down PHY when waiting for WoL packet

2018-07-12 Thread Heiner Kallweit
Some network drivers include functionality to speed down the PHY when
suspending, while just waiting for a WoL packet, because this saves energy.

This patch series is based on our recent discussion about factoring out
this functionality to phylib. The first user will be the r8169 driver.

v2:
- add warning comment to phy_speed_down regarding usage of sync = false
- remove sync parameter from phy_speed_up

Heiner Kallweit (2):
  net: phy: add helper phy_config_aneg
  net: phy: add phy_speed_down and phy_speed_up

 drivers/net/phy/phy.c | 91 +--
 include/linux/phy.h   |  2 +
 2 files changed, 89 insertions(+), 4 deletions(-)

-- 
2.18.0
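
As a usage illustration only (the actual r8169 conversion is not part of this
cover letter; my_priv and the suspend/resume callbacks below are hypothetical),
a MAC driver with WoL enabled might use the new helpers roughly like this:

static int my_mac_suspend(struct device *d)
{
	struct my_priv *priv = dev_get_drvdata(d);	/* hypothetical driver data */

	if (device_may_wakeup(d))
		/* drop to the lowest common speed and wait for renegotiation */
		phy_speed_down(priv->phydev, true);

	return 0;
}

static int my_mac_resume(struct device *d)
{
	struct my_priv *priv = dev_get_drvdata(d);

	if (device_may_wakeup(d))
		/* re-advertise all supported speeds again */
		phy_speed_up(priv->phydev);

	return 0;
}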



Re: [net v2] sch_fq_codel: zero q->flows_cnt when fq_codel_init fails

2018-07-12 Thread David Miller
From: Jacob Keller 
Date: Tue, 10 Jul 2018 14:22:27 -0700

> When fq_codel_init fails, qdisc_create_dflt will cleanup by using
> qdisc_destroy. This function calls the ->reset() op prior to calling the
> ->destroy() op.
> 
> Unfortunately, during the failure flow for sch_fq_codel, the ->flows
> parameter is not initialized, so the fq_codel_reset function will null
> pointer dereference.
 ...
> This is caused because flows_cnt is non-zero, but flows hasn't been
> initialized. fq_codel_init has left the private data in a partially
> initialized state.
> 
> To fix this, reset flows_cnt to 0 when we fail to initialize.
> Additionally, to make the state more consistent, also cleanup the flows
> pointer when the allocation of backlogs fails.
> 
> This fixes the NULL pointer dereference, since both the for-loop and
> memset in fq_codel_reset will be no-ops when flow_cnt is zero.
> 
> Signed-off-by: Jacob Keller 

Applied and queued up for -stable, thanks!
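
For readers without the patch at hand, a rough sketch of the fq_codel_init()
error handling described above (reconstructed from the commit message only;
the applied patch may differ in detail):

	q->flows = kvzalloc(q->flows_cnt * sizeof(struct fq_codel_flow),
			    GFP_KERNEL);
	if (!q->flows) {
		err = -ENOMEM;
		goto alloc_failure;
	}
	q->backlogs = kvzalloc(q->flows_cnt * sizeof(u32), GFP_KERNEL);
	if (!q->backlogs) {
		kvfree(q->flows);
		q->flows = NULL;
		err = -ENOMEM;
		goto alloc_failure;
	}
	/* ... rest of init ... */
	return 0;

alloc_failure:
	/* with flows_cnt zeroed, fq_codel_reset() becomes a no-op on the
	 * partially initialised qdisc
	 */
	q->flows_cnt = 0;
	return err;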


[PATCH net-next v2 2/2] net: phy: add phy_speed_down and phy_speed_up

2018-07-12 Thread Heiner Kallweit
Some network drivers include functionality to speed down the PHY when
suspending, while just waiting for a WoL packet, because this saves energy.
This functionality is quite generic, therefore let's factor it out to
phylib.

Signed-off-by: Heiner Kallweit 
---
v2:
- add comment to phy_speed_down regarding use of sync = false
- remove sync parameter from phy_speed_up
---
 drivers/net/phy/phy.c | 78 +++
 include/linux/phy.h   |  2 ++
 2 files changed, 80 insertions(+)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index c4aa360d..e61864ca 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -551,6 +551,84 @@ int phy_start_aneg(struct phy_device *phydev)
 }
 EXPORT_SYMBOL(phy_start_aneg);
 
+static int phy_poll_aneg_done(struct phy_device *phydev)
+{
+   unsigned int retries = 100;
+   int ret;
+
+   do {
+   msleep(100);
+   ret = phy_aneg_done(phydev);
+   } while (!ret && --retries);
+
+   if (!ret)
+   return -ETIMEDOUT;
+
+   return ret < 0 ? ret : 0;
+}
+
+/**
+ * phy_speed_down - set speed to lowest speed supported by both link partners
+ * @phydev: the phy_device struct
+ * @sync: perform action synchronously
+ *
+ * Description: Typically used to save energy when waiting for a WoL packet
+ *
+ * WARNING: Setting sync to false may cause the system being unable to suspend
+ * in case the PHY generates an interrupt when finishing the autonegotiation.
+ * This interrupt may wake up the system immediately after suspend.
+ * Therefore use sync = false only if you're sure it's safe with the respective
+ * network chip.
+ */
+int phy_speed_down(struct phy_device *phydev, bool sync)
+{
+   u32 adv = phydev->lp_advertising & phydev->supported;
+   u32 adv_old = phydev->advertising;
+   int ret;
+
+   if (phydev->autoneg != AUTONEG_ENABLE)
+   return 0;
+
+   if (adv & PHY_10BT_FEATURES)
+   phydev->advertising &= ~(PHY_100BT_FEATURES |
+PHY_1000BT_FEATURES);
+   else if (adv & PHY_100BT_FEATURES)
+   phydev->advertising &= ~PHY_1000BT_FEATURES;
+
+   if (phydev->advertising == adv_old)
+   return 0;
+
+   ret = phy_config_aneg(phydev);
+   if (ret)
+   return ret;
+
+   return sync ? phy_poll_aneg_done(phydev) : 0;
+}
+EXPORT_SYMBOL_GPL(phy_speed_down);
+
+/**
+ * phy_speed_up - (re)set advertised speeds to all supported speeds
+ * @phydev: the phy_device struct
+ *
+ * Description: Used to revert the effect of phy_speed_down
+ */
+int phy_speed_up(struct phy_device *phydev)
+{
+   u32 mask = PHY_10BT_FEATURES | PHY_100BT_FEATURES | PHY_1000BT_FEATURES;
+   u32 adv_old = phydev->advertising;
+
+   if (phydev->autoneg != AUTONEG_ENABLE)
+   return 0;
+
+   phydev->advertising = (adv_old & ~mask) | (phydev->supported & mask);
+
+   if (phydev->advertising == adv_old)
+   return 0;
+
+   return phy_config_aneg(phydev);
+}
+EXPORT_SYMBOL_GPL(phy_speed_up);
+
 /**
  * phy_start_machine - start PHY state machine tracking
  * @phydev: the phy_device struct
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 6cd09098..075c2f77 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -942,6 +942,8 @@ void phy_start(struct phy_device *phydev);
 void phy_stop(struct phy_device *phydev);
 int phy_start_aneg(struct phy_device *phydev);
 int phy_aneg_done(struct phy_device *phydev);
+int phy_speed_down(struct phy_device *phydev, bool sync);
+int phy_speed_up(struct phy_device *phydev);
 
 int phy_stop_interrupts(struct phy_device *phydev);
 int phy_restart_aneg(struct phy_device *phydev);
-- 
2.18.0




[PATCH net-next v2 1/2] net: phy: add helper phy_config_aneg

2018-07-12 Thread Heiner Kallweit
This functionality will also be needed in subsequent patches of this
series, therefore factor it out to a helper.

Signed-off-by: Heiner Kallweit 
Reviewed-by: Andrew Lunn 
Reviewed-by: Florian Fainelli 
---
 drivers/net/phy/phy.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 537297d2..c4aa360d 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -467,6 +467,14 @@ int phy_mii_ioctl(struct phy_device *phydev, struct ifreq *ifr, int cmd)
 }
 EXPORT_SYMBOL(phy_mii_ioctl);
 
+static int phy_config_aneg(struct phy_device *phydev)
+{
+   if (phydev->drv->config_aneg)
+   return phydev->drv->config_aneg(phydev);
+   else
+   return genphy_config_aneg(phydev);
+}
+
 /**
  * phy_start_aneg_priv - start auto-negotiation for this PHY device
  * @phydev: the phy_device struct
@@ -493,10 +501,7 @@ static int phy_start_aneg_priv(struct phy_device *phydev, bool sync)
/* Invalidate LP advertising flags */
phydev->lp_advertising = 0;
 
-   if (phydev->drv->config_aneg)
-   err = phydev->drv->config_aneg(phydev);
-   else
-   err = genphy_config_aneg(phydev);
+   err = phy_config_aneg(phydev);
if (err < 0)
goto out_unlock;
 
-- 
2.18.0




Re: [PATCH net-next] tc-testing: add geneve options in tunnel_key unit tests

2018-07-12 Thread David Miller
From: Jakub Kicinski 
Date: Tue, 10 Jul 2018 18:22:31 -0700

> From: Pieter Jansen van Vuuren 
> 
> Extend tc tunnel_key action unit tests with geneve options. Tests
> include testing single and multiple geneve options, as well as
> testing geneve options that are expected to fail.
> 
> Signed-off-by: Pieter Jansen van Vuuren 

Applied, thanks.


[PATCH bpf v2] bpf: don't leave partial mangled prog in jit_subprogs error path

2018-07-12 Thread Daniel Borkmann
syzkaller managed to trigger the following bug through fault injection:

  [...]
  [  141.043668] verifier bug. No program starts at insn 3
  [  141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
 get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
  [  141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
 fixup_call_args kernel/bpf/verifier.c:5587 [inline]
  [  141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
 bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
  [  141.047355] CPU: 3 PID: 4072 Comm: a.out Not tainted 4.18.0-rc4+ #51
  [  141.048446] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS 1.10.2-1 04/01/2014
  [  141.049877] Call Trace:
  [  141.050324]  __dump_stack lib/dump_stack.c:77 [inline]
  [  141.050324]  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
  [  141.050950]  ? dump_stack_print_info.cold.2+0x52/0x52 lib/dump_stack.c:60
  [  141.051837]  panic+0x238/0x4e7 kernel/panic.c:184
  [  141.052386]  ? add_taint.cold.5+0x16/0x16 kernel/panic.c:385
  [  141.053101]  ? __warn.cold.8+0x148/0x1ba kernel/panic.c:537
  [  141.053814]  ? __warn.cold.8+0x117/0x1ba kernel/panic.c:530
  [  141.054506]  ? get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
  [  141.054506]  ? fixup_call_args kernel/bpf/verifier.c:5587 [inline]
  [  141.054506]  ? bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
  [  141.055163]  __warn.cold.8+0x163/0x1ba kernel/panic.c:538
  [  141.055820]  ? get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
  [  141.055820]  ? fixup_call_args kernel/bpf/verifier.c:5587 [inline]
  [  141.055820]  ? bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
  [...]

What happens in jit_subprogs() is that the kcalloc() for the subprog func
buffer fails with NULL, and we then bail out. The latter is a plain
return -ENOMEM, which is definitely not okay since earlier in the
loop we walk all subprogs and temporarily rewrite insn->off to
remember the subprog id as well as insn->imm to temporarily point the
call to __bpf_call_base + 1 for the initial JIT pass. Thus, bailing
out in such a state and handing this over to the interpreter is troublesome
since later/subsequent find_subprog() lookups are based on the wrong
insn->imm.

Therefore, once we hit this point, we need to jump to the out_free path,
where we undo all changes from the earlier loop, so that the interpreter can
work on the unmodified insn->{off,imm}.

Another point is that, should find_subprog() fail in jit_subprogs() due
to a verifier bug, we also should not simply defer the program to
the interpreter, since here, too, we made partial modifications. Instead
we should just bail out entirely and return an error to the user who is
trying to load the program.

Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs")
Reported-by: syzbot+7d427828b2ea6e592...@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann 
---
 v1 -> v2:
   - used label instead of if condition, bit cleaner and shorter

 kernel/bpf/verifier.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9e2bf83..63aaac5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5430,6 +5430,10 @@ static int jit_subprogs(struct bpf_verifier_env *env)
if (insn->code != (BPF_JMP | BPF_CALL) ||
insn->src_reg != BPF_PSEUDO_CALL)
continue;
+   /* Upon error here we cannot fall back to interpreter but
+* need a hard reject of the program. Thus -EFAULT is
+* propagated in any case.
+*/
subprog = find_subprog(env, i + insn->imm + 1);
if (subprog < 0) {
WARN_ONCE(1, "verifier bug. No program starts at insn %d\n",
@@ -5450,7 +5454,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 
func = kcalloc(env->subprog_cnt, sizeof(prog), GFP_KERNEL);
if (!func)
-   return -ENOMEM;
+   goto out_undo_insn;
 
for (i = 0; i < env->subprog_cnt; i++) {
subprog_start = subprog_end;
@@ -5515,7 +5519,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
tmp = bpf_int_jit_compile(func[i]);
if (tmp != func[i] || func[i]->bpf_func != old_bpf_func) {
verbose(env, "JIT doesn't support bpf-to-bpf calls\n");
-   err = -EFAULT;
+   err = -ENOTSUPP;
goto out_free;
}
cond_resched();
@@ -5552,6 +5556,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
if (func[i])
bpf_jit_free(func[i]);
kfree(func);
+out_undo_insn:
/* cleanup main prog to be interpreted */
prog->jit_requested = 0;
for (i = 0, insn = prog->insns

[PATCH net-next] net: phy: realtek: add missing entry for RTL8211C to mdio_device_id table

2018-07-12 Thread Heiner Kallweit
Add missing entry for RTL8211C to mdio_device_id table.

Signed-off-by: Heiner Kallweit 
Fixes: cf87915cb9f8 ("net: phy: realtek: add support for RTL8211C")
---
 drivers/net/phy/realtek.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
index f8f12783..0610148c 100644
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -279,6 +279,7 @@ static struct mdio_device_id __maybe_unused realtek_tbl[] = {
{ 0x001cc816, 0x001f },
{ 0x001cc910, 0x001f },
{ 0x001cc912, 0x001f },
+   { 0x001cc913, 0x001f },
{ 0x001cc914, 0x001f },
{ 0x001cc915, 0x001f },
{ 0x001cc916, 0x001f },
-- 
2.18.0



Re: [PATCH net-next 2/2] net: phy: add phy_speed_down and phy_speed_up

2018-07-12 Thread Florian Fainelli



On 07/12/2018 12:25 PM, Florian Fainelli wrote:
> 
> 
> On 07/12/2018 12:10 PM, Heiner Kallweit wrote:
>> On 12.07.2018 21:09, Andrew Lunn wrote:
 Like r8169 also tg3 driver doesn't wait for the speed-down-renegotiation
 to finish. Therefore, even though I share Andrew's concerns, there seem
 to be chips where it's safe to not wait for the renegotiation to finish
 (e.g. because device is in PCI D3 already and can't generate an interrupt).
 Having said that I'd keep the sync parameter for phy_speed_down so that
 the driver can decide.
>>>
>>> Hi Heiner
>>>
>>> Please put a big fat comment about the dangers of sync=false in the
>>> function header. We want people to know it is dangerous by default,
>>> and should only be used in special conditions, when it is known to be
>>> safe.
>>> Andrew
>>>
>> OK ..
> 
> What part do you find dangerous? Magic Packets are UDP packets and they
> are not routed (unless specifically taken care of) so there is already
> some "lossy" behavior involved with waking-up an Ethernet MAC, I don't
> think that is too bad to retry several times until the link comes up.

I see the concern with the comment from v2, and indeed you could get an
interrupt signaling the PHY auto-negotiated the link before or at the
time we are suspending, potentially causing an early wake-up. Not that
this should be a problem though, since there is usually a point of no
return past which you can't do early wake-up anyway.
-- 
Florian


Re: [PATCH net-next 2/2] net: phy: add phy_speed_down and phy_speed_up

2018-07-12 Thread Heiner Kallweit
On 12.07.2018 21:53, Florian Fainelli wrote:
> 
> 
> On 07/12/2018 12:25 PM, Florian Fainelli wrote:
>>
>>
>> On 07/12/2018 12:10 PM, Heiner Kallweit wrote:
>>> On 12.07.2018 21:09, Andrew Lunn wrote:
> Like r8169 also tg3 driver doesn't wait for the speed-down-renegotiation
> to finish. Therefore, even though I share Andrew's concerns, there seem
> to be chips where it's safe to not wait for the renegotiation to finish
> (e.g. because device is in PCI D3 already and can't generate an 
> interrupt).
> Having said that I'd keep the sync parameter for phy_speed_down so that
> the driver can decide.

 Hi Heiner

 Please put a big fat comment about the dangers of sync=false in the
 function header. We want people to know it is dangerous by default,
 and should only be used in special conditions, when it is known to be
 safe.
Andrew

>>> OK ..
>>
>> What part do you find dangerous? Magic Packets are UDP packets and they
>> are not routed (unless specifically taken care of) so there is already
>> some "lossy" behavior involved with waking-up an Ethernet MAC, I don't
>> think that is too bad to retry several times until the link comes up.
> 
> I see the concern with the comment from v2, and indeed you could get an
> interrupt signaling the PHY auto-negotiated the link before or at the
> time we are suspending, potentially causing an early wake-up. Not that
> this should be a problem though, since there is usually a point of no
> return past which you can't do early wake-up anyway.
> 
I think we should leave the comment in for the moment so that people
think twice about the described scenario. If we find out that
the issue can't be triggered on any platform, then we can still remove
the comment.


Re: [net-next PATCH] net: ipv4: fix listify ip_rcv_finish in case of forwarding

2018-07-12 Thread Or Gerlitz
On Wed, Jul 11, 2018 at 11:06 PM, Jesper Dangaard Brouer
 wrote:

> Well, I would prefer you to implement those.  I just did a quick
> implementation (its trivially easy) so I have something to benchmark
> with.  The performance boost is quite impressive!

sounds good, but wait


> One reason I didn't "just" send a patch, is that Edward so-fare only
> implemented netif_receive_skb_list() and not napi_gro_receive_list().

sfc doesn't support gro?! doesn't make sense.. Edward?

> And your driver uses napi_gro_receive().  This sort-of disables GRO for
> your driver, which is not a choice I can make.  Interestingly I get
> around the same netperf TCP_STREAM performance.

Same TCP performance

with GRO and no rx-batching

or

without GRO and yes rx-batching

is by far not an intuitive result to me, unless both these techniques
mostly serve to eliminate lots of instruction cache misses and the
TCP stack is so well optimized that, once the code is in the cache,
going through it once with a 64K byte GRO-ed packet is like going
through it ~40 (64K/1500) times with non-GRO-ed packets.

What's the baseline (with GRO and no rx-batching) number on your setup?

> I assume we can get even better perf if we "listify" napi_gro_receive.

yeah, that would be very interesting to get there


  1   2   >