date:20180222

Re: ss issue on arm not showing UDP listening ports

2018-02-22 Thread Guillaume Nault

On Wed, Feb 21, 2018 at 07:59:24PM -0600, Jesse Cooper wrote:
> Thank you for the suggestions. This is on a raspberry pi 3 not sure if
> that fact matters. I will notify Raspbian of the issue.
> 
Does your kernel have CONFIG_INET_UDP_DIAG?

Re: [PATCH net-next v2 1/1] net: Allow a rule to track originating protocol

2018-02-22 Thread Ido Schimmel

Hi Donald,

On Tue, Feb 20, 2018 at 08:55:58AM -0500, Donald Sharp wrote:
> Allow a rule that is being added/deleted/modified or
> dumped to contain the originating protocol's id.
> 
> The protocol is handled just like a routes originating
> protocol is.  This is especially useful because there
> is starting to be a plethora of different user space
> programs adding rules.
> 
> Allow the vrf device to specify that the kernel is the originator
> of the rule created for this device.
> 
> Signed-off-by: Donald Sharp 
> ---
>  drivers/net/vrf.c  | 1 +
>  include/net/fib_rules.h| 3 ++-
>  include/uapi/linux/fib_rules.h | 2 +-
>  net/core/fib_rules.c   | 7 ++-
>  4 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
> index 139c61c8244a..ec6d2d623b60 100644
> --- a/drivers/net/vrf.c
> +++ b/drivers/net/vrf.c
> @@ -1175,6 +1175,7 @@ static int vrf_fib_rule(const struct net_device *dev, 
> __u8 family, bool add_it)
>   memset(frh, 0, sizeof(*frh));
>   frh->family = family;
>   frh->action = FR_ACT_TO_TBL;
> + frh->proto = RTPROT_KERNEL;
>  
>   if (nla_put_u8(skb, FRA_L3MDEV, 1))
>   goto nla_put_failure;
> diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
> index 648caf90ec07..b166ef07e6d4 100644
> --- a/include/net/fib_rules.h
> +++ b/include/net/fib_rules.h
> @@ -26,7 +26,8 @@ struct fib_rule {
>   u32 table;
>   u8  action;
>   u8  l3mdev;
> - /* 2 bytes hole, try to use */
> + u8  proto;
> + /* 1 byte hole, try to use */
>   u32 target;
>   __be64  tun_id;
>   struct fib_rule __rcu   *ctarget;
> diff --git a/include/uapi/linux/fib_rules.h b/include/uapi/linux/fib_rules.h
> index 2b642bf9b5a0..925539172d5b 100644
> --- a/include/uapi/linux/fib_rules.h
> +++ b/include/uapi/linux/fib_rules.h
> @@ -23,8 +23,8 @@ struct fib_rule_hdr {
>   __u8tos;
>  
>   __u8table;
> + __u8proto;
>   __u8res1;   /* reserved */
> - __u8res2;   /* reserved */
>   __u8action;
>  
>   __u32   flags;
> diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
> index 98e1066c3d55..c1d4ab5b2d9f 100644
> --- a/net/core/fib_rules.c
> +++ b/net/core/fib_rules.c
> @@ -51,6 +51,7 @@ int fib_default_rule_add(struct fib_rules_ops *ops,
>   r->pref = pref;
>   r->table = table;
>   r->flags = flags;
> + r->proto = RTPROT_KERNEL;
>   r->fr_net = ops->fro_net;
>   r->uid_range = fib_kuid_range_unset;
>  
> @@ -465,6 +466,7 @@ int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr 
> *nlh,
>   }
>   refcount_set(&rule->refcnt, 1);
>   rule->fr_net = net;
> + rule->proto = frh->proto;
>  
>   rule->pref = tb[FRA_PRIORITY] ? nla_get_u32(tb[FRA_PRIORITY])
> : fib_default_rule_pref(ops);
> @@ -664,6 +666,9 @@ int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr 
> *nlh,
>   }
>  
>   list_for_each_entry(rule, &ops->rules_list, list) {
> + if (frh->proto && (frh->proto != rule->proto))
> + continue;

This breaks my scripts:
# ip -4 rule show
0:  from all lookup local
32766:  from all lookup main
32767:  from all lookup default

# ip -4 rule del pref 0
RTNETLINK answers: No such file or directory

Using iproute 4.15 in Fedora 27:
# ip -V
ip utility, iproute2-ss180129

Problem is iproute sets protocol to RTPROT_BOOT while rules are
installed with RTPROT_KERNEL.

Maybe add FRA_PROTOCOL?

Thanks!

> +
>   if (frh->action && (frh->action != rule->action))
>   continue;
>  
> @@ -808,9 +813,9 @@ static int fib_nl_fill_rule(struct sk_buff *skb, struct 
> fib_rule *rule,
>   if (nla_put_u32(skb, FRA_SUPPRESS_PREFIXLEN, rule->suppress_prefixlen))
>   goto nla_put_failure;
>   frh->res1 = 0;
> - frh->res2 = 0;
>   frh->action = rule->action;
>   frh->flags = rule->flags;
> + frh->proto = rule->proto;
>  
>   if (rule->action == FR_ACT_GOTO &&
>   rcu_access_pointer(rule->ctarget) == NULL)
> -- 
> 2.14.3
>
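
For readers following the delete failure reported above, the mismatch can be reproduced in a small user-space sketch. The types `rule_req` and `rule_entry` and the helper `delrule_matches()` are hypothetical stand-ins, not kernel types, and the priority comparison is simplified. Only the `RTPROT_*` values (from include/uapi/linux/rtnetlink.h) and the proto check copied from the quoted hunk in fib_nl_delrule() come from real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in include/uapi/linux/rtnetlink.h. */
#define RTPROT_UNSPEC 0
#define RTPROT_KERNEL 2
#define RTPROT_BOOT   3

/* Hypothetical, reduced stand-in for the netlink delete request. */
struct rule_req {
	uint8_t  proto;
	uint32_t pref;
};

/* Hypothetical, reduced stand-in for an installed rule. */
struct rule_entry {
	uint8_t  proto;
	uint32_t pref;
};

/*
 * Mirrors the check added by the patch: a non-zero proto in the
 * request must equal the proto stored in the rule. The priority
 * comparison is a simplification of the real attribute handling.
 */
static bool delrule_matches(const struct rule_req *req,
			    const struct rule_entry *rule)
{
	if (req->proto && (req->proto != rule->proto))
		return false;
	if (req->pref != rule->pref)
		return false;
	return true;
}
```

In this model a request with proto RTPROT_BOOT never matches the default "local" rule installed with RTPROT_KERNEL, so nothing is found to delete, which is consistent with the "No such file or directory" answer shown in the report. A request that leaves proto at RTPROT_UNSPEC still matches.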

Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

2018-02-22 Thread Jiri Pirko

Wed, Feb 21, 2018 at 09:57:09PM CET, alexander.du...@gmail.com wrote:
>On Wed, Feb 21, 2018 at 11:38 AM, Jiri Pirko  wrote:
>> Wed, Feb 21, 2018 at 06:56:35PM CET, alexander.du...@gmail.com wrote:
>>>On Wed, Feb 21, 2018 at 8:58 AM, Jiri Pirko  wrote:
 Wed, Feb 21, 2018 at 05:49:49PM CET, alexander.du...@gmail.com wrote:
>On Wed, Feb 21, 2018 at 8:11 AM, Jiri Pirko  wrote:
>> Wed, Feb 21, 2018 at 04:56:48PM CET, alexander.du...@gmail.com wrote:
>>>On Wed, Feb 21, 2018 at 1:51 AM, Jiri Pirko  wrote:
 Tue, Feb 20, 2018 at 11:33:56PM CET, kubak...@wp.pl wrote:
>On Tue, 20 Feb 2018 21:14:10 +0100, Jiri Pirko wrote:
>> Yeah, I can see it now :( I guess that the ship has sailed and we are
>> stuck with this ugly thing forever...
>>
>> Could you at least make some common code that is shared in between
>> netvsc and virtio_net so this is handled in exactly the same way in
>> both?
>
>IMHO netvsc is a vendor specific driver which made a mistake on what
>behaviour it provides (or tried to align itself with Windows SR-IOV).
>Let's not make a far, far more commonly deployed and important driver
>(virtio) bug-compatible with netvsc.

 Yeah. netvsc solution is a dangerous precedent here and in my opinion
 it was a huge mistake to merge it. I personally would vote to unmerge it
 and make the solution based on team/bond.


>
>To Jiri's initial comments, I feel the same way, in fact I've talked to
>the NetworkManager guys to get auto-bonding based on MACs handled in
>user space.  I think it may very well get done in next versions of NM,
>but isn't done yet.  Stephen also raised the point that not everybody is
>using NM.

 Can be done in NM, networkd or other network management tools.
 Even easier to do this in teamd and let them all benefit.

 Actually, I took a stab to implement this in teamd. Took me like an hour
 and half.

 You can just run teamd with config option "kidnap" like this:
 # teamd/teamd -c '{"kidnap": true }'

 Whenever teamd sees another netdev to appear with the same mac as his,
 or whenever teamd sees another netdev to change mac to his,
 it enslaves it.

 Here's the patch (quick and dirty):

 Subject: [patch teamd] teamd: introduce kidnap feature

 Signed-off-by: Jiri Pirko 
>>>
>>>So this doesn't really address the original problem we were trying to
>>>solve. You asked earlier why the netdev name mattered and it mostly
>>>has to do with configuration. Specifically what our patch is
>>>attempting to resolve is the issue of how to allow a cloud provider to
>>>upgrade their customer to SR-IOV support and live migration without
>>>requiring them to reconfigure their guest. So the general idea with
>>>our patch is to take a VM that is running with virtio_net only and
>>>allow it to instead spawn a virtio_bypass master using the same netdev
>>>name as the original virtio, and then have the virtio_net and VF come
>>>up and be enslaved by the bypass interface. Doing it this way we can
>>>allow for multi-vendor SR-IOV live migration support using a guest
>>>that was originally configured for virtio only.
>>>
>>>The problem with your solution is we already have teaming and bonding
>>>as you said. There is already a write-up from Red Hat on how to do it
>>>(https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/virtual_machine_management_guide/sect-migrating_virtual_machines_between_hosts).
>>>That is all well and good as long as you are willing to keep around
>>>two VM images, one for virtio, and one for SR-IOV with live migration.
>>
>> You don't need 2 images. You need only one. The one with the team setup.
>> That's it. If another netdev with the same mac appears, teamd will
>> enslave it and run traffic on it. If not, ok, you'll go only through
>> virtio_net.
>
>Isn't that going to cause the routing table to get messed up when we
>rearrange the netdevs? We don't want to have a significant disruption
>in traffic when we are adding/removing the VF. It seems like we would
>need to invalidate any entries that were configured for the virtio_net
>and reestablish them on the new team interface. Part of the criteria
>we have been working with is that we should be able to transition from
>having a VF to not or vice versa without seeing any significant
>disruption in the traffic.

 What? You have routes on the team netdev. virtio_net and VF are only
 slaves. What are

Re: syzcaller patch postings...

2018-02-22 Thread Paolo Abeni

On Wed, 2018-02-21 at 16:47 -0500, David Miller wrote:
> I have to mention this now before it gets out of control.
> 
> I would like to ask that syzkaller stop posting the patch it is
> testing when it posts to netdev.

There is an open issue on this topic:

https://github.com/google/syzkaller/issues/526

The current behaviour is that syzbot replies to all get_maintainer.pl
recipients after testing a patch, regardless of the test submission
recipient list, the idea was instead to respect such list.

Cheers,

Paolo

Re: WARNING: refcount bug in sock_wfree

2018-02-22 Thread Xin Long

On Wed, Feb 21, 2018 at 9:59 PM, syzbot
 wrote:
> Hello,
>
> syzbot hit the following crash on upstream commit
> 79c0ef3e85c015b0921a8fd5dd539d1480e9cd6c (Mon Feb 19 19:58:19 2018 +)
> Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
>
> So far this crash happened 2 times on
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/master.
> syzkaller reproducer is attached.
> Raw console output is attached.
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+98a809ad0f54884bd...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
>
> IPv6: ADDRCONF(NETDEV_UP): bridge0: link is not ready
> IPv6: ADDRCONF(NETDEV_UP): bridge0: link is not ready
> audit: type=1400 audit(1519151768.973:11): avc:  denied  { sys_chroot } for
> pid=4169 comm="syz-executor5" capability=18
> scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023
> tcontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023 tclass=cap_userns
> permissive=1
> [ cut here ]
> refcount_t: underflow; use-after-free.
> WARNING: CPU: 0 PID: 5074 at lib/refcount.c:187
> refcount_sub_and_test+0x167/0x1b0 lib/refcount.c:187
> Kernel panic - not syncing: panic_on_warn set ...
It seems that we also need to process the chunks in
outqueue->control_chunk_list in sctp_for_each_tx_datachunk()
when peeling off.

>
> CPU: 0 PID: 5074 Comm: syz-executor5 Not tainted 4.16.0-rc2+ #322
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:53
>  panic+0x1e4/0x41c kernel/panic.c:183
>  __warn+0x1dc/0x200 kernel/panic.c:547
>  report_bug+0x211/0x2d0 lib/bug.c:184
>  fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
>  fixup_bug arch/x86/kernel/traps.c:247 [inline]
>  do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
>  invalid_op+0x58/0x80 arch/x86/entry/entry_64.S:957
> RIP: 0010:refcount_sub_and_test+0x167/0x1b0 lib/refcount.c:187
> RSP: 0018:8801cf2b6490 EFLAGS: 00010282
> RAX: dc08 RBX: 0401 RCX: 815abdbe
> RDX:  RSI: 110039e56c42 RDI: 110039e56c17
> RBP: 8801cf2b6520 R08: 110039e56bd9 R09: 
> R10:  R11:  R12: 110039e56c93
> R13: ff01 R14: 0500 R15: 8801ae3c69fc
>  sock_wfree+0xa6/0x140 net/core/sock.c:1819
>  sctp_wfree+0x2eb/0x670 net/sctp/socket.c:8065
>  skb_release_head_state+0x124/0x260 net/core/skbuff.c:612
>  skb_release_all+0x15/0x60 net/core/skbuff.c:625
>  __kfree_skb net/core/skbuff.c:641 [inline]
>  consume_skb+0x153/0x490 net/core/skbuff.c:701
>  sctp_chunk_destroy net/sctp/sm_make_chunk.c:1450 [inline]
>  sctp_chunk_put+0x29c/0x420 net/sctp/sm_make_chunk.c:1477
>  sctp_chunk_free+0x53/0x60 net/sctp/sm_make_chunk.c:1464
>  __sctp_outq_teardown+0x244/0x1230 net/sctp/outqueue.c:234
>  sctp_outq_free+0x15/0x20 net/sctp/outqueue.c:291
>  sctp_association_free+0x2d0/0x930 net/sctp/associola.c:356
>  sctp_cmd_delete_tcb net/sctp/sm_sideeffect.c:939 [inline]
>  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1343 [inline]
>  sctp_side_effects net/sctp/sm_sideeffect.c:1210 [inline]
>  sctp_do_sm+0x32e3/0x6ed0 net/sctp/sm_sideeffect.c:1181
>  sctp_primitive_ABORT+0xa0/0xd0 net/sctp/primitive.c:119
>  sctp_close+0x266/0x9a0 net/sctp/socket.c:1539
>  inet_release+0xed/0x1c0 net/ipv4/af_inet.c:427
>  sock_release+0x8d/0x1e0 net/socket.c:595
>  sock_close+0x16/0x20 net/socket.c:1149
>  __fput+0x327/0x7e0 fs/file_table.c:209
>  fput+0x15/0x20 fs/file_table.c:243
>  task_work_run+0x199/0x270 kernel/task_work.c:113
>  exit_task_work include/linux/task_work.h:22 [inline]
>  do_exit+0x9bb/0x1ad0 kernel/exit.c:865
>  do_group_exit+0x149/0x400 kernel/exit.c:968
>  get_signal+0x73a/0x16d0 kernel/signal.c:2469
>  do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
>  exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
>  prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
>  syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
>  do_syscall_64+0x6e5/0x940 arch/x86/entry/common.c:292
>  entry_SYSCALL_64_after_hwframe+0x42/0xb7
> RIP: 0033:0x453da9
> RSP: 002b:7f297546ace8 EFLAGS: 0246 ORIG_RAX: 00ca
> RAX: fe00 RBX: 0072bf80 RCX: 00453da9
> RDX:  RSI:  RDI: 0072bf80
> RBP: 0072bf80 R08:  R09: 0072bf58
> R10:  R11: 0246 R12: 
> R13: 00a3e8ef R14: 7f297546b9c0 R15: 0001
> Dumping ftrace buffer:
>

Re: [PATCH net-next 5/7] net/ipv6: Add support for path selection using hash of 5-tuple

2018-02-22 Thread Ido Schimmel

Hi David,

On Wed, Feb 21, 2018 at 10:49:52AM -0800, David Ahern wrote:
> Some operators prefer IPv6 path selection to use a standard 5-tuple
> hash rather than just an L3 hash with the flow label. To that end
> add support to IPv6 for multipath hash policy similar to bf4e0a3db97eb
> ("net: ipv4: add support for ECMP hash policy choice"). The default
> is still L3 which covers source and destination addresses along with
> flow label and IPv6 protocol.
> 
> Signed-off-by: David Ahern 
> ---
>  Documentation/networking/ip-sysctl.txt |  7 
>  include/net/ip6_route.h|  3 +-
>  include/net/netevent.h |  1 +
>  include/net/netns/ipv6.h   |  1 +
>  net/ipv6/icmp.c|  2 +-
>  net/ipv6/route.c   | 64 
> ++
>  net/ipv6/sysctl_net_ipv6.c | 26 ++
>  7 files changed, 87 insertions(+), 17 deletions(-)
> 
> diff --git a/Documentation/networking/ip-sysctl.txt 
> b/Documentation/networking/ip-sysctl.txt
> index a553d4e4a0fb..783675a730e5 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -1363,6 +1363,13 @@ flowlabel_reflect - BOOLEAN
>   FALSE: disabled
>   Default: FALSE
>  
> +fib_multipath_hash_policy - INTEGER
> + Controls which hash policy to use for multipath routes.
> + Default: 0 (Layer 3)
> + Possible values:
> + 0 - Layer 3 (source and destination addresses plus flow label)
> + 1 - Layer 4 (standard 5-tuple)
> +
>  anycast_src_echo_reply - BOOLEAN
>   Controls the use of anycast addresses as source addresses for ICMPv6
>   echo reply
> diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
> index 27d23a65f3cd..f13657a62e5c 100644
> --- a/include/net/ip6_route.h
> +++ b/include/net/ip6_route.h
> @@ -127,7 +127,8 @@ static inline int ip6_route_get_saddr(struct net *net, 
> struct rt6_info *rt,
>  
>  struct rt6_info *rt6_lookup(struct net *net, const struct in6_addr *daddr,
>   const struct in6_addr *saddr, int oif, int flags);
> -u32 rt6_multipath_hash(const struct flowi6 *fl6, const struct sk_buff *skb);
> +u32 rt6_multipath_hash(const struct net *net, const struct flowi6 *fl6,
> +const struct sk_buff *skb);
>  
>  struct dst_entry *icmp6_dst_alloc(struct net_device *dev, struct flowi6 
> *fl6);
>  
> diff --git a/include/net/netevent.h b/include/net/netevent.h
> index baee605a94ab..d9918261701c 100644
> --- a/include/net/netevent.h
> +++ b/include/net/netevent.h
> @@ -27,6 +27,7 @@ enum netevent_notif_type {
>   NETEVENT_REDIRECT, /* arg is struct netevent_redirect ptr */
>   NETEVENT_DELAY_PROBE_TIME_UPDATE, /* arg is struct neigh_parms ptr */
>   NETEVENT_IPV4_MPATH_HASH_UPDATE, /* arg is struct net ptr */
> + NETEVENT_IPV6_MPATH_HASH_UPDATE, /* arg is struct net ptr */
>  };
>  
>  int register_netevent_notifier(struct notifier_block *nb);
> diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
> index 987cc4569cb8..6b0de3b71bbf 100644
> --- a/include/net/netns/ipv6.h
> +++ b/include/net/netns/ipv6.h
> @@ -28,6 +28,7 @@ struct netns_sysctl_ipv6 {
>   int ip6_rt_gc_elasticity;
>   int ip6_rt_mtu_expires;
>   int ip6_rt_min_advmss;
> + int multipath_hash_policy;
>   int flowlabel_consistency;
>   int auto_flowlabels;
>   int icmpv6_time;
> diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
> index 4fa4f1b150a4..f53c14390d9f 100644
> --- a/net/ipv6/icmp.c
> +++ b/net/ipv6/icmp.c
> @@ -522,7 +522,7 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 
> code, __u32 info,
>   fl6.fl6_icmp_type = type;
>   fl6.fl6_icmp_code = code;
>   fl6.flowi6_uid = sock_net_uid(net, NULL);
> - fl6.mp_hash = rt6_multipath_hash(&fl6, skb);
> + fl6.mp_hash = rt6_multipath_hash(net, &fl6, skb);
>   security_skb_classify_flow(skb, flowi6_to_flowi(&fl6));
>  
>   sk = icmpv6_xmit_lock(net);
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index ab0de69ec67d..5c4481e4f3e1 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -450,7 +450,8 @@ static bool rt6_check_expired(const struct rt6_info *rt)
>   return false;
>  }
>  
> -static struct rt6_info *rt6_multipath_select(struct rt6_info *match,
> +static struct rt6_info *rt6_multipath_select(const struct net *net,
> +  struct rt6_info *match,
>struct flowi6 *fl6, int oif,
>int strict)
>  {
> @@ -460,7 +461,7 @@ static struct rt6_info *rt6_multipath_select(struct 
> rt6_info *match,
>* case it will always be non-zero. Otherwise now is the time to do it.
>*/
>   if (!fl6->mp_hash)
> - fl6->mp_hash = rt6_multipath_hash(fl6, NULL);
> + fl6->mp_hash = rt6_multipath_hash(net, fl6, NULL);

My test is failing and all

[PATCH net] net: aquantia: Fix error handling in aq_pci_probe()

2018-02-22 Thread Dan Carpenter

We should check "self->aq_hw" for allocation failure, and also we should
free it on the error paths.

Fixes: 23ee07ad3c2f ("net: aquantia: Cleanup pci functions module")
Signed-off-by: Dan Carpenter 

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c 
b/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
index 22889fc158f2..87c4308b52a7 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
@@ -226,6 +226,10 @@ static int aq_pci_probe(struct pci_dev *pdev,
goto err_ioremap;
 
self->aq_hw = kzalloc(sizeof(*self->aq_hw), GFP_KERNEL);
+   if (!self->aq_hw) {
+   err = -ENOMEM;
+   goto err_ioremap;
+   }
self->aq_hw->aq_nic_cfg = aq_nic_get_cfg(self);
 
for (bar = 0; bar < 4; ++bar) {
@@ -235,19 +239,19 @@ static int aq_pci_probe(struct pci_dev *pdev,
mmio_pa = pci_resource_start(pdev, bar);
if (mmio_pa == 0U) {
err = -EIO;
-   goto err_ioremap;
+   goto err_free_aq_hw;
}
 
reg_sz = pci_resource_len(pdev, bar);
if ((reg_sz <= 24 /*ATL_REGS_SIZE*/)) {
err = -EIO;
-   goto err_ioremap;
+   goto err_free_aq_hw;
}
 
self->aq_hw->mmio = ioremap_nocache(mmio_pa, reg_sz);
if (!self->aq_hw->mmio) {
err = -EIO;
-   goto err_ioremap;
+   goto err_free_aq_hw;
}
break;
}
@@ -255,7 +259,7 @@ static int aq_pci_probe(struct pci_dev *pdev,
 
if (bar == 4) {
err = -EIO;
-   goto err_ioremap;
+   goto err_free_aq_hw;
}
 
numvecs = min((u8)AQ_CFG_VECS_DEF,
@@ -290,6 +294,8 @@ static int aq_pci_probe(struct pci_dev *pdev,
aq_pci_free_irq_vectors(self);
 err_hwinit:
iounmap(self->aq_hw->mmio);
+err_free_aq_hw:
+   kfree(self->aq_hw);
 err_ioremap:
free_netdev(ndev);
 err_pci_func:
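
The error-handling pattern this patch applies (check the allocation before its first dereference, then unwind through labels in reverse order of acquisition) can be sketched in plain user-space C. Everything below is hypothetical illustration: `demo_nic`, `demo_hw`, `demo_alloc()` and `demo_probe()` are made-up names, and calloc()/free() stand in for kzalloc()/kfree().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver structures; not the aquantia types. */
struct demo_hw  { void *mmio; };
struct demo_nic { struct demo_hw *hw; };

static int live_allocs; /* outstanding allocations, for leak checking */

/* calloc() wrapper that can be told to fail, to exercise the error paths. */
static void *demo_alloc(size_t size, int force_fail)
{
	void *p = force_fail ? NULL : calloc(1, size);

	if (p)
		live_allocs++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		free(p);
		live_allocs--;
	}
}

/* fail_step: 0 = succeed, 1..3 = make that allocation fail. */
static int demo_probe(struct demo_nic **out, int fail_step)
{
	struct demo_nic *self;
	int err;

	self = demo_alloc(sizeof(*self), fail_step == 1);
	if (!self)
		return -ENOMEM;

	self->hw = demo_alloc(sizeof(*self->hw), fail_step == 2);
	if (!self->hw) {		/* check before the first dereference */
		err = -ENOMEM;
		goto err_free_nic;
	}

	self->hw->mmio = demo_alloc(64, fail_step == 3);
	if (!self->hw->mmio) {
		err = -EIO;
		goto err_free_hw;	/* hw exists now, so it must be freed */
	}

	*out = self;
	return 0;

err_free_hw:
	demo_free(self->hw);
err_free_nic:
	demo_free(self);
	return err;
}

/* Releases everything a successful demo_probe() handed out. */
static void demo_remove(struct demo_nic *self)
{
	demo_free(self->hw->mmio);
	demo_free(self->hw);
	demo_free(self);
}
```

Each label frees exactly what had been acquired before the jump, so a failure at any step leaves no allocation behind. That is the property the new err_free_aq_hw label is meant to restore in aq_pci_probe().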

Re: [PATCH net v2 2/2] tuntap: correctly add the missing xdp flush

2018-02-22 Thread Jason Wang




On 2018年02月22日 15:54, Sergei Shtylyov wrote:

Hello!

On 2/22/2018 9:24 AM, Jason Wang wrote:


Commit 762c330d670e ("tuntap: add missing xdp flush") tries to fix the
devmap stall caused by missed xdp flush by counting the pending xdp
redirected packets and flush when it exceeds NAPI_POLL_WEIGHT or
MSG_MORE is clear. This may lead BUG() since xdp_do_flush() was


   Lead to BUG().


called under process context with preemption enabled. Simply disable


   s/under/in the/?


preemption may silent the warning but be not enough since process may


   Silence.


move between different CPUS during a batch which cause xdp_do_flush()
misses some CPU where the process run previously. Consider the several
fallouts, that commit was reverted. To fix the issue correctly, we can
simply calling xdp_do_flush() immediately after xdp_do_redirect(),


   Call.


a side effect is that this removes any possibility of batching which
could be addressed in the future.

Reported-by: Christoffer Dall 
Fixes: 762c330d670e ("tuntap: add missing xdp flush")
Signed-off-by: Jason Wang 

[...]

MBR, Sergei


My bad, let me post v3.

Thanks

Re: syzcaller patch postings...

2018-02-22 Thread Dmitry Vyukov

On Thu, Feb 22, 2018 at 9:26 AM, Paolo Abeni  wrote:
> On Wed, 2018-02-21 at 16:47 -0500, David Miller wrote:
>> I have to mention this now before it gets out of control.
>>
>> I would like to ask that syzkaller stop posting the patch it is
>> testing when it posts to netdev.
>
> There is an open issue on this topic:
>
> https://github.com/google/syzkaller/issues/526
>
> The current behaviour is that syzbot replies to all get_maintainer.pl
> recipients after testing a patch, regardless of the test submission
> recipient list, the idea was instead to respect such list.

Hi David, Florian, Paolo,

Didn't realize it triggers patchwork. This wasn't intentional, sorry.

Do I understand it correctly that if syzbot replies to the CC list
that was in the testing request, it will resolve the problem? So if
netdev wasn't in CC, it will not be added to CC.

I will go and fix it now.

Re: [PATCH bpf] bpf, x64: implement retpoline for tail call

2018-02-22 Thread Daniel Borkmann

On 02/22/2018 04:53 AM, Eric Dumazet wrote:
> On Wed, 2018-02-21 at 19:43 -0800, Alexei Starovoitov wrote:
>> On Wed, Feb 21, 2018 at 07:04:02PM -0800, Eric Dumazet wrote:
>>> On Thu, 2018-02-22 at 01:05 +0100, Daniel Borkmann wrote:
>>>
>>> ...
>>>
>>>> +/* Instead of plain jmp %rax, we emit a retpoline to control
>>>> + * speculative execution for the indirect branch.
>>>> + */
>>>> +static void emit_retpoline_rax_trampoline(u8 **pprog)
>>>> +{
>>>> +  u8 *prog = *pprog;
>>>> +  int cnt = 0;
>>>> +
>>>> +  EMIT1_off32(0xE8, 7);    /* callq <set_up_target> */
>>>> +  /* capture_spec: */
>>>> +  EMIT2(0xF3, 0x90);       /* pause */
>>>> +  EMIT3(0x0F, 0xAE, 0xE8); /* lfence */
>>>> +  EMIT2(0xEB, 0xF9);       /* jmp <capture_spec> */
>>>> +  /* set_up_target: */
>>>> +  EMIT4(0x48, 0x89, 0x04, 0x24); /* mov %rax,(%rsp) */
>>>> +  EMIT1(0xC3);             /* retq */
>>>> +
>>>> +  BUILD_BUG_ON(cnt != RETPOLINE_SIZE);
>>>> +  *pprog = prog;
>>>
>>> You might define the actual code sequence (and length) in 
>>> arch/x86/include/asm/nospec-branch.h
>>>
>>> If we need to adjust code sequences for RETPOLINE, then we wont
>>> forget/miss that arch/x86/net/bpf_jit_comp.c had it hard-coded.
>>
>> like adding a comment to asm/nospec-branch.h that says
>> "dont forget to adjust bpf_jit_comp.c" ?
>> but clang/gcc generate slightly different sequences for
>> retpoline anyway, so even if '.macro RETPOLINE_JMP' in
>> nospec-branch.h changes it doesn't mean that x64 jit has to change.
>> So what kinda comment there would make sense?
> 
> I was thinking of something very explicit :
> 
> /* byte sequence for following assembly code used by eBPF
>call ...
>...
>retq
> */
> #define RETPOLINE_RAX_DIRECT_FOR_EBPF \
>    EMIT1_off32(0xE8, 7);    /* callq <set_up_target> */   \
>    /* capture_spec: */                                    \
>    EMIT2(0xF3, 0x90);       /* pause */                   \
>    EMIT3(0x0F, 0xAE, 0xE8); /* lfence */                  \
>    EMIT2(0xEB, 0xF9);       /* jmp <capture_spec> */      \
>    /* set_up_target: */                                   \
>    EMIT4(0x48, 0x89, 0x04, 0x24); /* mov %rax,(%rsp) */   \
>    EMIT1(0xC3);             /* retq */
> 
> Might be simply byte encoded, (array of 17 bytes)
> 
> Well, something like that anyway...

Okay, sounds fine. Will respin, thanks Eric!
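
As an illustration of the "array of 17 bytes" idea mentioned above, the sequence from the quoted patch can be written out as a plain byte table and its two branch displacements checked in user space. This is only a sketch: the array name `retpoline_rax` and the helpers `call_target()` and `jmp_target()` are hypothetical and are not what ended up in nospec-branch.h. It assumes that EMIT1_off32() emits the opcode byte followed by a little-endian 32-bit displacement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The byte sequence from the quoted patch, spelled out instruction by instruction. */
static const uint8_t retpoline_rax[] = {
	0xE8, 0x07, 0x00, 0x00, 0x00,	/* callq set_up_target (rel32 = 7) */
	/* capture_spec: */
	0xF3, 0x90,			/* pause */
	0x0F, 0xAE, 0xE8,		/* lfence */
	0xEB, 0xF9,			/* jmp capture_spec (rel8 = -7) */
	/* set_up_target: */
	0x48, 0x89, 0x04, 0x24,		/* mov %rax,(%rsp) */
	0xC3,				/* retq */
};

/* Offset reached by the 5-byte call at offset 0: end of call + rel32. */
static size_t call_target(void)
{
	uint32_t rel = (uint32_t)retpoline_rax[1] |
		       ((uint32_t)retpoline_rax[2] << 8) |
		       ((uint32_t)retpoline_rax[3] << 16) |
		       ((uint32_t)retpoline_rax[4] << 24);

	return 5 + (size_t)rel;
}

/* Offset reached by the 2-byte jmp at offset 10: end of jmp + signed rel8. */
static size_t jmp_target(void)
{
	int rel = retpoline_rax[11];

	if (rel >= 0x80)
		rel -= 0x100;	/* sign-extend the 8-bit displacement */
	return (size_t)(12 + rel);
}
```

The call lands on the mov at offset 12 and the jmp loops back to the pause at offset 5, so the two displacements agree with the set_up_target and capture_spec labels, and the total length is the 17 bytes Eric mentions.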

Re: [PATCH iproute2 0/7] Add support for devlink resource abstraction

2018-02-22 Thread Arkadi Sharshevsky



On 02/15/2018 05:41 AM, David Ahern wrote:
> On 2/14/18 1:55 AM, Arkadi Sharshevsky wrote:
>> Add support for devlink resource abstraction.
>>
>> Arkadi Sharshevsky (7):
>>   devlink: Change empty line indication with indentations
>>   devlink: mnlg: Add support for extended ack
>>   devlink: Add support for devlink resource abstraction
>>   devlink: Add support for hot reload
>>   devlink: Move dpipe context from heap to stack
>>   devlink: Add support for resource/dpipe relation
>>   devlink: Update man pages and add resource man
>>
>>  devlink/devlink.c   | 774 
>> 
>>  devlink/mnlg.c  |  53 ++-
>>  include/libnetlink.h|   1 +
>>  include/list.h  |   5 +
>>  lib/libnetlink.c|   4 +-
>>  man/man8/devlink-dev.8  |  15 +
>>  man/man8/devlink-resource.8 |  78 +
>>  man/man8/devlink.8  |   1 +
>>  8 files changed, 871 insertions(+), 60 deletions(-)
>>  create mode 100644 man/man8/devlink-resource.8
>>
> 
> Looks ok to me.
> 

Hi David, noticed it wasn't applied yet.

Re: Qualcomm rmnet driver and qmi_wwan

2018-02-22 Thread Daniele Palmas

Hi Subash,

2018-02-21 20:47 GMT+01:00 Subash Abhinov Kasiviswanathan
:
> On 2018-02-21 04:38, Daniele Palmas wrote:
>>
>> Hello,
>>
>> in rmnet kernel documentation I read:
>>
>> "This driver can be used to register onto any physical network device in
>> IP mode. Physical transports include USB, HSIC, PCIe and IP accelerator."
>>
>> Does this mean that it can be used in association with the qmi_wwan
>> driver?
>>
>> If yes, can someone give me an hint on the steps to follow?
>>
>> If not, does anyone know if it is possible to modify qmi_wwan in order
>> to take advantage of the features provided by the rmnet driver?
>>
>> In this case hint on the changes for modifying qmi_wwan are welcome.
>>
>> Thanks in advance,
>> Daniele
>
>
> Hi
>
> I havent used qmi_wwan so the following comment is based on code inspection.
> qmimux_register_device() is creating qmimux devices with usb net device as
> real_dev. The Multiplexing and aggregation header (qmimux_hdr) is stripped
> off
> in qmimux_rx_fixup() and the packet is passed on to stack.
>
> You could instead create rmnet devices with the usb netdevice as real dev.
> The packets from the usb net driver can be queued to network stack directly
> as rmnet driver will setup a RX handler. rmnet driver will process the
> packets
> further and then queue to network stack.
>

Thanks, I will try to do some testing and report my findings.

Regards,
Daniele

> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project

[PATCH net v3 1/2] Revert "tuntap: add missing xdp flush"

2018-02-22 Thread Jason Wang

This reverts commit 762c330d670e3d4b795cf7a8d761866fdd1eef49. The
reason is that we try to batch packets for devmap, which causes
xdp_do_flush() to be called in process context. Simply disabling
preemption may not work, since the process may move among processors,
which leads xdp_do_flush() to miss some flushes on some processors.

So simply revert the patch; a follow-up patch will add the xdp flush
correctly.

Reported-by: Christoffer Dall 
Fixes: 762c330d670e ("tuntap: add missing xdp flush")
Signed-off-by: Jason Wang 
---
 drivers/net/tun.c | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index b52258c..2823a4a 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -181,7 +181,6 @@ struct tun_file {
struct tun_struct *detached;
struct ptr_ring tx_ring;
struct xdp_rxq_info xdp_rxq;
-   int xdp_pending_pkts;
 };
 
 struct tun_flow_entry {
@@ -1662,7 +1661,6 @@ static struct sk_buff *tun_build_skb(struct tun_struct 
*tun,
case XDP_REDIRECT:
get_page(alloc_frag->page);
alloc_frag->offset += buflen;
-   ++tfile->xdp_pending_pkts;
err = xdp_do_redirect(tun->dev, &xdp, xdp_prog);
if (err)
goto err_redirect;
@@ -1984,11 +1982,6 @@ static ssize_t tun_chr_write_iter(struct kiocb *iocb, 
struct iov_iter *from)
result = tun_get_user(tun, tfile, NULL, from,
  file->f_flags & O_NONBLOCK, false);
 
-   if (tfile->xdp_pending_pkts) {
-   tfile->xdp_pending_pkts = 0;
-   xdp_do_flush_map();
-   }
-
tun_put(tun);
return result;
 }
@@ -2325,13 +2318,6 @@ static int tun_sendmsg(struct socket *sock, struct 
msghdr *m, size_t total_len)
ret = tun_get_user(tun, tfile, m->msg_control, &m->msg_iter,
   m->msg_flags & MSG_DONTWAIT,
   m->msg_flags & MSG_MORE);
-
-   if (tfile->xdp_pending_pkts >= NAPI_POLL_WEIGHT ||
-   !(m->msg_flags & MSG_MORE)) {
-   tfile->xdp_pending_pkts = 0;
-   xdp_do_flush_map();
-   }
-
tun_put(tun);
return ret;
 }
@@ -3163,7 +3149,6 @@ static int tun_chr_open(struct inode *inode, struct file 
* file)
sock_set_flag(&tfile->sk, SOCK_ZEROCOPY);
 
memset(&tfile->tx_ring, 0, sizeof(tfile->tx_ring));
-   tfile->xdp_pending_pkts = 0;
 
return 0;
 }
-- 
2.7.4

[PATCH net v3 2/2] tuntap: correctly add the missing xdp flush

2018-02-22 Thread Jason Wang

Commit 762c330d670e ("tuntap: add missing xdp flush") tries to fix the
devmap stall caused by a missed xdp flush by counting the pending xdp
redirected packets and flushing when the count exceeds NAPI_POLL_WEIGHT
or MSG_MORE is clear. This may lead to BUG() since xdp_do_flush() was
called in process context with preemption enabled. Simply disabling
preemption may silence the warning but is not enough, since the process
may move between different CPUs during a batch, which causes
xdp_do_flush() to miss some CPUs where the process ran previously.
Considering the fallout, that commit was reverted. To fix the issue
correctly, we can simply call xdp_do_flush() immediately after
xdp_do_redirect(). A side effect is that this removes any possibility
of batching, which could be addressed in the future.

Reported-by: Christoffer Dall 
Fixes: 762c330d670e ("tuntap: add missing xdp flush")
Signed-off-by: Jason Wang 
---
 drivers/net/tun.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 2823a4a..a363ea2 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1662,6 +1662,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct 
*tun,
get_page(alloc_frag->page);
alloc_frag->offset += buflen;
err = xdp_do_redirect(tun->dev, &xdp, xdp_prog);
+   xdp_do_flush_map();
if (err)
goto err_redirect;
rcu_read_unlock();
-- 
2.7.4
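
The reasoning in the commit message can be illustrated with a toy user-space model. This is not kernel code, and all names in it (`pending`, `model_redirect()`, `model_flush()`, `run_batched()`, `run_immediate()`) are hypothetical. It assumes only what the message states: redirected packets are queued per CPU, and a flush only drains the queue of the CPU it runs on.

```c
#include <assert.h>

#define MODEL_NR_CPUS 2

/* Hypothetical model: per-CPU count of redirected but unflushed packets. */
static int pending[MODEL_NR_CPUS];

static void model_reset(void)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		pending[cpu] = 0;
}

static void model_redirect(int cpu)
{
	pending[cpu]++;
}

/* A flush only drains the queue of the CPU it runs on. */
static void model_flush(int cpu)
{
	pending[cpu] = 0;
}

static int model_total_pending(void)
{
	int total = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		total += pending[cpu];
	return total;
}

/*
 * Reverted scheme: redirect a batch, flush once at the end on whatever
 * CPU the process is running on by then. cpus[i] is the CPU on which
 * packet i was handled; n must be at least 1.
 */
static int run_batched(const int *cpus, int n)
{
	model_reset();
	for (int i = 0; i < n; i++)
		model_redirect(cpus[i]);
	model_flush(cpus[n - 1]);
	return model_total_pending();
}

/* Scheme from this patch: flush right after every redirect. */
static int run_immediate(const int *cpus, int n)
{
	model_reset();
	for (int i = 0; i < n; i++) {
		model_redirect(cpus[i]);
		model_flush(cpus[i]);
	}
	return model_total_pending();
}
```

If the process handles two packets on CPU 0 and then migrates to CPU 1 for the third, the batched variant leaves two packets stuck on CPU 0, while the immediate variant leaves none, at the cost of giving up batching.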

Re: syzcaller patch postings...

2018-02-22 Thread Florian Westphal

Dmitry Vyukov  wrote:
> On Thu, Feb 22, 2018 at 9:26 AM, Paolo Abeni  wrote:
> > On Wed, 2018-02-21 at 16:47 -0500, David Miller wrote:
> >> I have to mention this now before it gets out of control.
> >>
> >> I would like to ask that syzkaller stop posting the patch it is
> >> testing when it posts to netdev.
> >
> > There is an open issue on this topic:
> >
> > https://github.com/google/syzkaller/issues/526
> >
> > The current behaviour is that syzbot replies to all get_maintainer.pl
> > recipients after testing a patch, regardless of the test submission
> > recipient list, the idea was instead to respect such list.
>
> 
> Hi David, Florian, Paolo,
> 
> Didn't realize it triggers patchwork. This wasn't intentional, sorry.
> 
> Do I understand it correctly that if syzbot replies to the CC list
> that was in the testing request, it will resolve the problem? So if
> netdev wasn't in CC, it will not be added to CC.

Yes, that's at least my expected/desired behaviour.
This way I can even CC some other person (maintainer, reporter etc.)
to have them informed about the test result too.

> I will go and fix it now.

Thank you Dmitry!

Re: syzcaller patch postings...

2018-02-22 Thread Dmitry Vyukov

On Thu, Feb 22, 2018 at 11:03 AM, Florian Westphal  wrote:
> Dmitry Vyukov  wrote:
>> On Thu, Feb 22, 2018 at 9:26 AM, Paolo Abeni  wrote:
>> > On Wed, 2018-02-21 at 16:47 -0500, David Miller wrote:
>> >> I have to mention this now before it gets out of control.
>> >>
>> >> I would like to ask that syzkaller stop posting the patch it is
>> >> testing when it posts to netdev.
>> >
>> > There is an open issue on this topic:
>> >
>> > https://github.com/google/syzkaller/issues/526
>> >
>> > The current behaviour is that syzbot replies to all get_maintainer.pl
>> > recipients after testing a patch, regardless of the test submission
>> > recipient list, the idea was instead to respect such list.
>>
>>
>> Hi David, Florian, Paolo,
>>
>> Didn't realize it triggers patchwork. This wasn't intentional, sorry.
>>
>> Do I understand it correctly that if syzbot replies to the CC list
>> that was in the testing request, it will resolve the problem? So if
>> netdev wasn't in CC, it will not be added to CC.
>
> Yes, that's at least my expected/desired behaviour.
> This way I can even CC some other person (maintainer, reporter etc.)
> to have them informed about the test result too.
>
>> I will go and fix it now.
>
> Thank you Dmitry!


This now should be fixed by
https://github.com/google/syzkaller/commit/7daaa06d53f0f496aa1a87656d16c81ebff37f73
I've also added a note to the doc referenced from bug report emails:
https://github.com/google/syzkaller/commit/7daaa06d53f0f496aa1a87656d16c81ebff37f73#diff-5b3b5ff5f03b01e1d31ec93aafd2f3d5

Re: syzcaller patch postings...

2018-02-22 Thread Daniel Axtens

Dmitry Vyukov  writes:

> On Thu, Feb 22, 2018 at 9:26 AM, Paolo Abeni  wrote:
>> On Wed, 2018-02-21 at 16:47 -0500, David Miller wrote:
>>> I have to mention this now before it gets out of control.
>>>
>>> I would like to ask that syzkaller stop posting the patch it is
>>> testing when it posts to netdev.
>>
>> There is an open issue on this topic:
>>
>> https://github.com/google/syzkaller/issues/526
>>
>> The current behaviour is that syzbot replies to all get_maintainer.pl
>> recipients after testing a patch, regardless of the test submission
>> recipient list, the idea was instead to respect such list.
>
>
> Hi David, Florian, Paolo,
>
> Didn't realize it triggers patchwork. This wasn't intentional, sorry.

A little-publicised and incorrectly-documented(!) feature of Patchwork
is that it supports some email headers. In particular, if you include an
"X-Patchwork-Hint: ignore" header, the mail will not be parsed by
Patchwork.

This will stop it being recorded as a patch. Unfortunately it will also
stop it being recorded as a comment - I don't know if that's an issue in
this case. Maybe we can set you up with Patchwork 2's new checks
infrastructure instead.
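
As a rough illustration of the header mentioned above, a sender that
controls its raw mail headers could append it as sketched below. The
function name and buffer handling are hypothetical; only the header text
"X-Patchwork-Hint: ignore" comes from this mail.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append the Patchwork opt-out header to an existing header block.
 * add_patchwork_ignore() is a hypothetical helper for illustration.
 * Returns the number of characters added, or -1 if buf is too small
 * (in which case buf is left unchanged).
 */
static int add_patchwork_ignore(char *buf, size_t len)
{
	size_t used;
	int n;

	assert(buf != NULL);
	used = strlen(buf);
	if (used >= len)
		return -1;

	n = snprintf(buf + used, len - used, "X-Patchwork-Hint: ignore\r\n");
	if (n < 0 || (size_t)n >= len - used) {
		buf[used] = '\0';	/* undo any partial write */
		return -1;
	}
	return n;
}
```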

>
> Do I understand it correctly that if syzbot replies to the CC list
> that was in the testing request, it will resolve the problem? So if
> netdev wasn't in CC, it will not be added to CC.
>
> I will go and fix it now.

Re: [PATCH net-next] RDS: deliver zerocopy completion notification with data as an optimization

2018-02-22 Thread Sowmini Varadhan

On (02/21/18 19:39), Willem de Bruijn wrote:
> >> By the way, the put_cmsg is unconditional even if the caller did
> >> not supply msg_control. So it is basically no longer safe to ever
> >> call read, recv or recvfrom on a socket if zerocopy notifications
> >> are outstanding.
> >
> > Wait, I thought put_cmsg already checks for these things.
> 
> It does, and sets MSG_CTRUNC to signal that it was unable to
> write all control data. But by then the notifications have already
> been dequeued.

Putting hyperbole about "no longer safe to ever call read etc" aside,

put_cmsg can also return EFAULT if uspace provides a bogus cmsghdr,
(i.e., copy_to_user fails). So the only thing you can do to really
protect against every possible thing is to requeue the notification 
if put_cmsg fails.
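
To make the requeue idea concrete, here is a minimal user-space sketch.
It is not RDS code: struct toy_queue, toy_drain() and the deliver
callback are hypothetical stand-ins (the callback plays the role of
put_cmsg()), and locking is ignored entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_QLEN 8	/* hypothetical queue capacity */

/* Pending completion notifications, oldest first. */
struct toy_queue {
	unsigned int cookies[TOY_QLEN];
	size_t count;
};

/* Stand-in for put_cmsg(): 0 on success, negative on failure. */
typedef int (*toy_deliver_fn)(unsigned int cookie, void *ctx);

static int toy_pop_front(struct toy_queue *q, unsigned int *cookie)
{
	if (q->count == 0)
		return -1;
	*cookie = q->cookies[0];
	q->count--;
	memmove(&q->cookies[0], &q->cookies[1],
		q->count * sizeof(q->cookies[0]));
	return 0;
}

static void toy_push_front(struct toy_queue *q, unsigned int cookie)
{
	assert(q->count < TOY_QLEN);
	memmove(&q->cookies[1], &q->cookies[0],
		q->count * sizeof(q->cookies[0]));
	q->cookies[0] = cookie;
	q->count++;
}

/*
 * Dequeue notifications and hand them to deliver(). If delivery fails,
 * put the notification back at the head so it is not lost, and stop.
 * Returns the number of notifications delivered.
 */
static size_t toy_drain(struct toy_queue *q, toy_deliver_fn deliver, void *ctx)
{
	size_t delivered = 0;
	unsigned int cookie;

	while (toy_pop_front(q, &cookie) == 0) {
		if (deliver(cookie, ctx) < 0) {
			toy_push_front(q, cookie);	/* requeue on failure */
			break;
		}
		delivered++;
	}
	return delivered;
}

/* Example callback: succeeds while *budget > 0, like a full control buffer. */
static int toy_deliver_limited(unsigned int cookie, void *ctx)
{
	int *budget = ctx;

	(void)cookie;
	if (*budget <= 0)
		return -1;
	(*budget)--;
	return 0;
}
```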

Re: syzcaller patch postings...

2018-02-22 Thread Dmitry Vyukov

On Thu, Feb 22, 2018 at 2:31 PM, Daniel Axtens  wrote:
> Dmitry Vyukov  writes:
>
>> On Thu, Feb 22, 2018 at 9:26 AM, Paolo Abeni  wrote:
>>> On Wed, 2018-02-21 at 16:47 -0500, David Miller wrote:
>>>> I have to mention this now before it gets out of control.
>>>>
>>>> I would like to ask that syzkaller stop posting the patch it is
>>>> testing when it posts to netdev.
>>>
>>> There is an open issue on this topic:
>>>
>>> https://github.com/google/syzkaller/issues/526
>>>
>>> The current behaviour is that syzbot replies to all get_maintainer.pl
>>> recipients after testing a patch, regardless of the test submission
>>> recipient list, the idea was instead to respect such list.
>>
>>
>> Hi David, Florian, Paolo,
>>
>> Didn't realize it triggers patchwork. This wasn't intentional, sorry.
>
> A little-publicised and incorrectly-documented(!) feature of Patchwork
> is that it supports some email headers. In particular, if you include an
> "X-Patchwork-Hint: ignore" header, the mail will not be parsed by
> Patchwork.
>
> This will stop it being recorded as a patch. Unfortunately it will also
> stop it being recorded as a comment - I don't know if that's an issue in
> this case. Maybe we can set you up with Patchwork 2's new checks
> infrastructure instead.

Nice. But unfortunately the mailing technology we currently use allows
only a very limited set of headers and no custom headers:
https://cloud.google.com/appengine/docs/standard/go/mail/mail-with-headers-attachments
So while possible, it would require very significant rework...

What is Patchwork 2's new checks infrastructure?
If this still remains a problem (hopefully not), then maybe it's
possible to blacklist the syzbot address from creating new patches. syzbot
can do a lot, but so far it does not also generate fixes for the bugs it
discovers :)

[PATCH iproute2-next v3 2/8] iplink: Correctly report error when network device isn't found

2018-02-22 Thread Serhey Popovych

Distinguish cases when the "dev" parameter isn't given from cases where no
network device corresponding to "dev" is found.

Do not check for index validity in xdp_parse(): the caller should take care
of this because it has more information (e.g. whether "dev" was not given
or was not found).

Signed-off-by: Serhey Popovych 
---
 ip/iplink.c |   16 +---
 ip/iplink_xdp.c |7 +--
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/ip/iplink.c b/ip/iplink.c
index 5471626..fc358fc 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -569,6 +569,14 @@ static int iplink_parse_vf(int vf, int *argcp, char ***argvp,
return 0;
 }
 
+static void has_dev(const char *dev, int dev_index)
+{
+   if (!dev)
+   missarg("dev");
+   if (!dev_index)
+   exit(nodev(dev));
+}
+
 int iplink_parse(int argc, char **argv, struct iplink_req *req,
 char **name, char **type, char **link, char **dev,
 int *group, int *index)
@@ -650,6 +658,9 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
bool drv = strcmp(*argv, "xdpdrv") == 0;
bool offload = strcmp(*argv, "xdpoffload") == 0;
 
+   if (offload)
+   has_dev(*dev, dev_index);
+
NEXT_ARG();
if (xdp_parse(&argc, &argv, req, dev_index,
  generic, drv, offload))
@@ -732,15 +743,14 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
} else if (strcmp(*argv, "vf") == 0) {
struct rtattr *vflist;
 
+   has_dev(*dev, dev_index);
+
NEXT_ARG();
if (get_integer(&vf,  *argv, 0))
invarg("Invalid \"vf\" value\n", *argv);
 
vflist = addattr_nest(&req->n, sizeof(*req),
  IFLA_VFINFO_LIST);
-   if (dev_index == 0)
-   missarg("dev");
-
len = iplink_parse_vf(vf, &argc, &argv, req, dev_index);
if (len < 0)
return -1;
diff --git a/ip/iplink_xdp.c b/ip/iplink_xdp.c
index 8382635..3df38b8 100644
--- a/ip/iplink_xdp.c
+++ b/ip/iplink_xdp.c
@@ -55,17 +55,12 @@ int xdp_parse(int *argc, char ***argv, struct iplink_req *req, __u32 ifindex,
.type = BPF_PROG_TYPE_XDP,
.argc = *argc,
.argv = *argv,
+   .ifindex = ifindex,
};
struct xdp_req xdp = {
.req = req,
};
 
-   if (offload) {
-   if (!ifindex)
-   incomplete_command();
-   cfg.ifindex = ifindex;
-   }
-
if (!force)
xdp.flags |= XDP_FLAGS_UPDATE_IF_NOEXIST;
if (generic)
-- 
1.7.10.4
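
The distinction this patch draws can be sketched in isolation. The
version below is a hypothetical variant that returns a code instead of
calling missarg() or exit(nodev()) as has_dev() does in the patch, so
that it can be exercised on its own; enum dev_check and check_dev() are
illustration names, not iproute2 API.

```c
#include <assert.h>
#include <stddef.h>

/* Possible outcomes of checking the "dev" argument. */
enum dev_check {
	DEV_OK,			/* "dev" given and resolved to an index */
	DEV_MISSING_ARG,	/* "dev" was not given at all */
	DEV_NOT_FOUND,		/* "dev" given, but no such device */
};

/*
 * dev is the name from the command line (NULL if absent), dev_index is
 * the result of the name lookup (0 if the lookup failed).
 */
static enum dev_check check_dev(const char *dev, int dev_index)
{
	assert(dev_index >= 0);
	if (!dev)
		return DEV_MISSING_ARG;
	if (!dev_index)
		return DEV_NOT_FOUND;
	return DEV_OK;
}
```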

[PATCH iproute2-next v3 8/8] iplink: Reduce number of arguments to iplink_parse()

2018-02-22 Thread Serhey Popovych

Introduce a new @struct iplink_parse_args data structure to consolidate
the arguments to iplink_parse(). This reduces the number of arguments
passed to it.

Pass this data structure to ->parse_opt() in iplink specific modules:
it may be used to get the network device name and other information.
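
The patch keeps @req as the last member so that everything before it can
be cleared with one memset() bounded by offsetof(). A minimal standalone
sketch of that layout trick follows; struct toy_req and
struct toy_parse_args are hypothetical stand-ins for the iproute2
structures, and the sketch assumes (as is true on common platforms) that
an all-zero-bits pointer is a null pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_req {	/* stand-in for struct iplink_req */
	int dummy;
};

struct toy_parse_args {
	const char *dev;
	const char *name;
	const char *type;

	/* Must stay last: set by the caller, preserved by the reset. */
	struct toy_req *req;
};

/* Clear every member that precedes @req, leaving @req untouched. */
static void toy_parse_reset(struct toy_parse_args *pa)
{
	assert(pa != NULL);
	memset(pa, 0, offsetof(struct toy_parse_args, req));
}
```

A member added after @req would silently escape the reset, which is
presumably why the comment in the patch insists that @req stay last.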

Signed-off-by: Serhey Popovych 
---
 ip/ip_common.h   |   16 +---
 ip/iplink.c  |   34 ++
 ip/iplink_bond.c |4 +++-
 ip/iplink_bond_slave.c   |4 +++-
 ip/iplink_bridge.c   |4 +++-
 ip/iplink_bridge_slave.c |4 +++-
 ip/iplink_can.c  |4 +++-
 ip/iplink_geneve.c   |4 +++-
 ip/iplink_hsr.c  |4 +++-
 ip/iplink_ipoib.c|4 +++-
 ip/iplink_ipvlan.c   |4 +++-
 ip/iplink_macvlan.c  |4 +++-
 ip/iplink_vlan.c |4 +++-
 ip/iplink_vrf.c  |5 -
 ip/iplink_vxcan.c|   14 ++
 ip/iplink_vxlan.c|4 +++-
 ip/ipmacsec.c|4 +++-
 ip/link_gre.c|6 --
 ip/link_gre6.c   |6 --
 ip/link_ip6tnl.c |6 --
 ip/link_iptnl.c  |6 --
 ip/link_veth.c   |   14 ++
 ip/link_vti.c|6 --
 ip/link_vti6.c   |6 --
 24 files changed, 114 insertions(+), 57 deletions(-)

diff --git a/ip/ip_common.h b/ip/ip_common.h
index f762821..aef70de 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -112,12 +112,23 @@ struct iplink_req {
charbuf[1024];
 };
 
+struct iplink_parse_args {
+   const char *dev;
+   const char *name;
+   const char *type;
+
+   /* This definitely must be the last one and initialized
+* by the caller of iplink_parse() that will initialize rest.
+*/
+   struct iplink_req *req;
+};
+
 struct link_util {
struct link_util*next;
const char  *id;
int maxattr;
int (*parse_opt)(struct link_util *, int, char **,
-struct nlmsghdr *);
+struct iplink_parse_args *);
void(*print_opt)(struct link_util *, FILE *,
 struct rtattr *[]);
void(*print_xstats)(struct link_util *, FILE *,
@@ -132,8 +143,7 @@ struct link_util {
 
 struct link_util *get_link_kind(const char *kind);
 
-int iplink_parse(int argc, char **argv, struct iplink_req *req,
-char **name, char **type, char **dev);
+int iplink_parse(int argc, char **argv, struct iplink_parse_args *pa);
 
 /* iplink_bridge.c */
 void br_dump_bridge_id(const struct ifla_bridge_id *id, char *buf, size_t len);
diff --git a/ip/iplink.c b/ip/iplink.c
index e53d890..837e2b0 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -10,6 +10,7 @@
  *
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -577,9 +578,12 @@ static void has_dev(const char *dev, int dev_index)
exit(nodev(dev));
 }
 
-int iplink_parse(int argc, char **argv, struct iplink_req *req,
-char **name, char **type, char **dev)
+int iplink_parse(int argc, char **argv, struct iplink_parse_args *pa)
 {
+   struct iplink_req *req = pa->req;
+   const char **dev = &pa->dev;
+   const char **name = &pa->name;
+   const char **type = &pa->type;
char *link = NULL;
int ret, len;
char abuf[32];
@@ -597,6 +601,8 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
 
ret = argc;
 
+   memset(pa, 0, offsetof(struct iplink_parse_args, req));
+
while (argc > 0) {
if (strcmp(*argv, "up") == 0) {
req->i.ifi_change |= IFF_UP;
@@ -1016,32 +1022,32 @@ out:
 
 static int iplink_modify(int cmd, unsigned int flags, int argc, char **argv)
 {
-   char *dev = NULL;
-   char *name = NULL;
-   char *type = NULL;
struct iplink_req req = {
.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
.n.nlmsg_flags = NLM_F_REQUEST | flags,
.n.nlmsg_type = cmd,
.i.ifi_family = preferred_family,
};
+   struct iplink_parse_args pa;
int ret;
 
-   ret = iplink_parse(argc, argv, &req, &name, &type, &dev);
+   pa.req = &req;
+
+   ret = iplink_parse(argc, argv, &pa);
if (ret < 0)
return ret;
 
-   if (type) {
+   if (pa.type) {
struct link_util *lu;
struct rtattr *linkinfo;
-   char *ulinep = strchr(type, '_');
+   char *ulinep = strchr(pa.type, '_');
int iflatype;
 
linkinfo = addattr_nest(&req.n, sizeof(req), IFLA_LINKINFO);
-   addattr_l(&req.n, sizeof(req), IFLA_INFO_KIND, type,
-strlen(type));
+   addattr_l(&req.n, sizeof(req),

[PATCH iproute2-next v3 7/8] iplink: Move data structures to block of their users

2018-02-22 Thread Serhey Popovych

This consolidates the data and the code using it in a single place and
prepares for the upcoming ->parse_opt() method change.

Signed-off-by: Serhey Popovych 
---
 ip/link_gre.c|   32 
 ip/link_gre6.c   |   32 
 ip/link_ip6tnl.c |   32 
 ip/link_iptnl.c  |   32 
 ip/link_vti.c|   32 
 ip/link_vti6.c   |   32 
 6 files changed, 96 insertions(+), 96 deletions(-)

diff --git a/ip/link_gre.c b/ip/link_gre.c
index bc1cee8..6654525 100644
--- a/ip/link_gre.c
+++ b/ip/link_gre.c
@@ -66,22 +66,6 @@ static void gre_print_help(struct link_util *lu, int argc, char **argv, FILE *f)
 static int gre_parse_opt(struct link_util *lu, int argc, char **argv,
 struct nlmsghdr *n)
 {
-   struct ifinfomsg *ifi = NLMSG_DATA(n);
-   struct {
-   struct nlmsghdr n;
-   struct ifinfomsg i;
-   } req = {
-   .n.nlmsg_len = NLMSG_LENGTH(sizeof(*ifi)),
-   .n.nlmsg_flags = NLM_F_REQUEST,
-   .n.nlmsg_type = RTM_GETLINK,
-   .i.ifi_family = preferred_family,
-   .i.ifi_index = ifi->ifi_index,
-   };
-   struct nlmsghdr *answer;
-   struct rtattr *tb[IFLA_MAX + 1];
-   struct rtattr *linkinfo[IFLA_INFO_MAX+1];
-   struct rtattr *greinfo[IFLA_GRE_MAX + 1];
-   int len;
__u16 iflags = 0;
__u16 oflags = 0;
__be32 ikey = 0;
@@ -107,7 +91,23 @@ static int gre_parse_opt(struct link_util *lu, int argc, char **argv,
inet_prefix_reset();
 
if (!(n->nlmsg_flags & NLM_F_CREATE)) {
+   struct ifinfomsg *ifi = NLMSG_DATA(n);
+   struct {
+   struct nlmsghdr n;
+   struct ifinfomsg i;
+   } req = {
+   .n.nlmsg_len = NLMSG_LENGTH(sizeof(*ifi)),
+   .n.nlmsg_flags = NLM_F_REQUEST,
+   .n.nlmsg_type = RTM_GETLINK,
+   .i.ifi_family = preferred_family,
+   .i.ifi_index = ifi->ifi_index,
+   };
+   struct nlmsghdr *answer;
+   struct rtattr *tb[IFLA_MAX + 1];
+   struct rtattr *linkinfo[IFLA_INFO_MAX+1];
+   struct rtattr *greinfo[IFLA_GRE_MAX + 1];
const struct rtattr *rta;
+   int len;
 
if (rtnl_talk(&rth, &req.n, &answer) < 0) {
 get_failed:
diff --git a/ip/link_gre6.c b/ip/link_gre6.c
index 83c61e2..a92854d 100644
--- a/ip/link_gre6.c
+++ b/ip/link_gre6.c
@@ -77,22 +77,6 @@ static void gre_print_help(struct link_util *lu, int argc, char **argv, FILE *f)
 static int gre_parse_opt(struct link_util *lu, int argc, char **argv,
 struct nlmsghdr *n)
 {
-   struct ifinfomsg *ifi = NLMSG_DATA(n);
-   struct {
-   struct nlmsghdr n;
-   struct ifinfomsg i;
-   } req = {
-   .n.nlmsg_len = NLMSG_LENGTH(sizeof(*ifi)),
-   .n.nlmsg_flags = NLM_F_REQUEST,
-   .n.nlmsg_type = RTM_GETLINK,
-   .i.ifi_family = preferred_family,
-   .i.ifi_index = ifi->ifi_index,
-   };
-   struct nlmsghdr *answer;
-   struct rtattr *tb[IFLA_MAX + 1];
-   struct rtattr *linkinfo[IFLA_INFO_MAX+1];
-   struct rtattr *greinfo[IFLA_GRE_MAX + 1];
-   int len;
__u16 iflags = 0;
__u16 oflags = 0;
__be32 ikey = 0;
@@ -118,7 +102,23 @@ static int gre_parse_opt(struct link_util *lu, int argc, char **argv,
inet_prefix_reset();
 
if (!(n->nlmsg_flags & NLM_F_CREATE)) {
+   struct ifinfomsg *ifi = NLMSG_DATA(n);
+   struct {
+   struct nlmsghdr n;
+   struct ifinfomsg i;
+   } req = {
+   .n.nlmsg_len = NLMSG_LENGTH(sizeof(*ifi)),
+   .n.nlmsg_flags = NLM_F_REQUEST,
+   .n.nlmsg_type = RTM_GETLINK,
+   .i.ifi_family = preferred_family,
+   .i.ifi_index = ifi->ifi_index,
+   };
+   struct nlmsghdr *answer;
+   struct rtattr *tb[IFLA_MAX + 1];
+   struct rtattr *linkinfo[IFLA_INFO_MAX+1];
+   struct rtattr *greinfo[IFLA_GRE_MAX + 1];
const struct rtattr *rta;
+   int len;
 
if (rtnl_talk(&rth, &req.n, &answer) < 0) {
 get_failed:
diff --git a/ip/link_ip6tnl.c b/ip/link_ip6tnl.c
index c7fef2e..edd7632 100644
--- a/ip/link_ip6tnl.c
+++ b/ip/link_ip6tnl.c
@@ -77,22 +77,6 @@ static void ip6tunnel_print_help(struct link_util *lu, int argc, char **argv,
 static int ip6tunnel_parse_opt(struct link_util *lu, int argc, char **argv,
   struct nlmsghdr *n)
 {
-   struct ifinfomsg

[PATCH iproute2-next v3 4/8] iplink: Follow documented behaviour when "index" is given

2018-02-22 Thread Serhey Popovych

Both ip-link(8) and the error message printed when the "index" parameter
is given for the set/delete case say that index can only be given during
network device creation.

Follow this documented behaviour and get rid of the ambiguous behaviour
when both "dev" and "index" are specified for the ip link delete scenario
(where "index" is actually ignored in favor of "dev").

Prohibit "index" when configuring/deleting a group of network devices.

Signed-off-by: Serhey Popovych 
---
 ip/iplink.c |   11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/ip/iplink.c b/ip/iplink.c
index 1359c0f..4e9f571 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -974,6 +974,12 @@ static int iplink_modify(int cmd, unsigned int flags, int argc, char **argv)
argc -= ret;
argv += ret;
 
+   if (!(flags & NLM_F_CREATE) && index) {
+   fprintf(stderr,
+   "index can be used only when creating devices.\n");
+   exit(-1);
+   }
+
if (group != -1) {
if (dev)
addattr_l(&req.n, sizeof(req), IFLA_GROUP,
@@ -1004,11 +1010,6 @@ static int iplink_modify(int cmd, unsigned int flags, int argc, char **argv)
"Not enough information: \"dev\" argument is required.\n");
exit(-1);
}
-   if (cmd == RTM_NEWLINK && index) {
-   fprintf(stderr,
-   "index can be used only when creating devices.\n");
-   exit(-1);
-   }
 
req.i.ifi_index = ll_name_to_index(dev);
if (!req.i.ifi_index)
-- 
1.7.10.4
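
The check this patch moves can be read as a small predicate on the
netlink flags and the parsed index. A standalone sketch, assuming a
hypothetical TOY_F_CREATE constant in place of NLM_F_CREATE so that it
builds without the netlink headers:

```c
#include <assert.h>

/* Hypothetical stand-in for NLM_F_CREATE from the netlink headers. */
#define TOY_F_CREATE 0x400u

/*
 * Return 0 if the request is acceptable, -1 if "index" was given on a
 * request that does not create a device (set/delete).
 * index is 0 when the "index" keyword was not given.
 */
static int toy_check_index(unsigned int flags, int index)
{
	assert(index >= 0);
	if (!(flags & TOY_F_CREATE) && index)
		return -1;
	return 0;
}
```

Because the test depends only on the flags, it also covers group
operations, which never reach the per-device branch where the old check
lived.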

[PATCH iproute2-next v3 0/8] iplink: Improve iplink_parse()

2018-02-22 Thread Serhey Popovych

This is the main routine to parse ip-link(8) configuration parameters.

The main reason to improve it is to pass the network device @name, @dev and
other parameters to the kind specific ->parse_opt() function so it can use
this information.

For example, later we will extend iplink_get() to parse netlink
attributes deeper and replace the open coded rtnl_talk() in the ip/tunnel
modules to simplify getting existing tunnel information.

Besides the main change there are a number of patches that prepare for it
and improve iplink_parse() in some way.

See the individual patch description messages for more information.

v3
  Move vxcan/veth ifinfomsg save/restore to a separate patch to
  make clear the change that performs most of the request buffer setups
  and checks in iplink_parse().

  Update commit message descriptions and extra new line from
  "utils: Introduce and use nodev() helper routine" patch.

v2
  Terminate via exit() when failing to parse command line arguments
  to help identify failing line in batch mode.

Thanks,
Serhii

Serhey Popovych (8):
  utils: Introduce and use nodev() helper routine
  iplink: Correctly report error when network device isn't found
  iplink: Use "dev" and "name" parameters interchangeable when possible
  iplink: Follow documented behaviour when "index" is given
  veth,vxcan: Save/reinitialize/restore whole @struct ifinfomsg
  iplink: Perform most of request buffer setups and checks in
iplink_parse()
  iplink: Move data structures to block of their users
  iplink: Reduce number of arguments to iplink_parse()

 bridge/fdb.c |   17 ++--
 bridge/link.c|8 +-
 bridge/mdb.c |   19 ++---
 bridge/vlan.c|7 +-
 include/utils.h  |1 +
 ip/ip6tunnel.c   |6 +-
 ip/ip_common.h   |   17 +++-
 ip/ipaddress.c   |7 +-
 ip/iplink.c  |  200 +++---
 ip/iplink_bond.c |8 +-
 ip/iplink_bond_slave.c   |4 +-
 ip/iplink_bridge.c   |   11 ++-
 ip/iplink_bridge_slave.c |4 +-
 ip/iplink_can.c  |4 +-
 ip/iplink_geneve.c   |4 +-
 ip/iplink_hsr.c  |4 +-
 ip/iplink_ipoib.c|4 +-
 ip/iplink_ipvlan.c   |4 +-
 ip/iplink_macvlan.c  |4 +-
 ip/iplink_vlan.c |4 +-
 ip/iplink_vrf.c  |5 +-
 ip/iplink_vxcan.c|   41 +++---
 ip/iplink_vxlan.c|   11 ++-
 ip/iplink_xdp.c  |7 +-
 ip/ipmacsec.c|4 +-
 ip/ipmroute.c|7 +-
 ip/ipneigh.c |   14 ++--
 ip/ipntable.c|6 +-
 ip/iproute.c |   36 +++--
 ip/iproute_lwtunnel.c|4 +-
 ip/iptunnel.c|6 +-
 ip/link_gre.c|   43 +-
 ip/link_gre6.c   |   43 +-
 ip/link_ip6tnl.c |   40 +-
 ip/link_iptnl.c  |   40 +-
 ip/link_veth.c   |   41 +++---
 ip/link_vti.c|   43 +-
 ip/link_vti6.c   |   43 +-
 lib/utils.c  |6 ++
 tc/m_mirred.c|6 +-
 tc/tc_class.c|   14 ++--
 tc/tc_filter.c   |   18 ++---
 tc/tc_qdisc.c|   12 +--
 43 files changed, 404 insertions(+), 423 deletions(-)

-- 
1.7.10.4

[PATCH iproute2-next v3 5/8] veth,vxcan: Save/reinitialize/restore whole @struct ifinfomsg

2018-02-22 Thread Serhey Popovych

Now in iplink_parse() we use the ->ifi_change and ->ifi_flags fields and
plan to use ->ifi_index with an upcoming change.

Saving, restoring and reinitializing individual fields is error prone:
using a new field in iplink_parse() without updating the callers in veth
and vxcan will overwrite the main device ifinfomsg data.

Since @struct ifinfomsg is small and its sizeof() is known at compile time,
the compiler may inline memcpy()/memset() as a few load/store instructions.

Signed-off-by: Serhey Popovych 
---
 ip/iplink_vxcan.c |   22 --
 ip/link_veth.c|   22 --
 2 files changed, 16 insertions(+), 28 deletions(-)

diff --git a/ip/iplink_vxcan.c b/ip/iplink_vxcan.c
index ebe9e56..d7e94a0 100644
--- a/ip/iplink_vxcan.c
+++ b/ip/iplink_vxcan.c
@@ -38,11 +38,10 @@ static int vxcan_parse_opt(struct link_util *lu, int argc, char **argv,
char *link = NULL;
char *type = NULL;
int index = 0;
+   int group;
int err;
struct rtattr *data;
-   int group;
-   struct ifinfomsg *ifm, *peer_ifm;
-   unsigned int ifi_flags, ifi_change;
+   struct ifinfomsg ifm_save, *ifm, *peer_ifm;
 
if (strcmp(argv[0], "peer") != 0) {
usage();
@@ -50,10 +49,8 @@ static int vxcan_parse_opt(struct link_util *lu, int argc, char **argv,
}
 
ifm = NLMSG_DATA(n);
-   ifi_flags = ifm->ifi_flags;
-   ifi_change = ifm->ifi_change;
-   ifm->ifi_flags = 0;
-   ifm->ifi_change = 0;
+   memcpy(&ifm_save, ifm, sizeof(*ifm));
+   memset(ifm, 0, sizeof(*ifm));
 
data = addattr_nest(n, 1024, VXCAN_INFO_PEER);
 
@@ -72,16 +69,13 @@ static int vxcan_parse_opt(struct link_util *lu, int argc, char **argv,
  IFLA_IFNAME, name, strlen(name) + 1);
}
 
-   peer_ifm = RTA_DATA(data);
-   peer_ifm->ifi_index = index;
-   peer_ifm->ifi_flags = ifm->ifi_flags;
-   peer_ifm->ifi_change = ifm->ifi_change;
-   ifm->ifi_flags = ifi_flags;
-   ifm->ifi_change = ifi_change;
-
if (group != -1)
addattr32(n, 1024, IFLA_GROUP, group);
 
+   peer_ifm = RTA_DATA(data);
+   memcpy(peer_ifm, ifm, sizeof(*ifm));
+   memcpy(ifm, &ifm_save, sizeof(*ifm));
+
addattr_nest_end(n, data);
return argc - 1 - err;
 }
diff --git a/ip/link_veth.c b/ip/link_veth.c
index a8e7cf7..b6d37b9 100644
--- a/ip/link_veth.c
+++ b/ip/link_veth.c
@@ -36,11 +36,10 @@ static int veth_parse_opt(struct link_util *lu, int argc, char **argv,
char *link = NULL;
char *type = NULL;
int index = 0;
+   int group;
int err;
struct rtattr *data;
-   int group;
-   struct ifinfomsg *ifm, *peer_ifm;
-   unsigned int ifi_flags, ifi_change;
+   struct ifinfomsg ifm_save, *ifm, *peer_ifm;
 
if (strcmp(argv[0], "peer") != 0) {
usage();
@@ -48,10 +47,8 @@ static int veth_parse_opt(struct link_util *lu, int argc, char **argv,
}
 
ifm = NLMSG_DATA(n);
-   ifi_flags = ifm->ifi_flags;
-   ifi_change = ifm->ifi_change;
-   ifm->ifi_flags = 0;
-   ifm->ifi_change = 0;
+   memcpy(&ifm_save, ifm, sizeof(*ifm));
+   memset(ifm, 0, sizeof(*ifm));
 
data = addattr_nest(n, 1024, VETH_INFO_PEER);
 
@@ -70,16 +67,13 @@ static int veth_parse_opt(struct link_util *lu, int argc, char **argv,
  IFLA_IFNAME, name, strlen(name) + 1);
}
 
-   peer_ifm = RTA_DATA(data);
-   peer_ifm->ifi_index = index;
-   peer_ifm->ifi_flags = ifm->ifi_flags;
-   peer_ifm->ifi_change = ifm->ifi_change;
-   ifm->ifi_flags = ifi_flags;
-   ifm->ifi_change = ifi_change;
-
if (group != -1)
addattr32(n, 1024, IFLA_GROUP, group);
 
+   peer_ifm = RTA_DATA(data);
+   memcpy(peer_ifm, ifm, sizeof(*ifm));
+   memcpy(ifm, &ifm_save, sizeof(*ifm));
+
addattr_nest_end(n, data);
return argc - 1 - err;
 }
-- 
1.7.10.4
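
The save/reinitialize/restore pattern from this patch can be shown on
its own. The sketch below is not iproute2 code: struct toy_ifinfo stands
in for struct ifinfomsg, and the parse callback stands in for
iplink_parse(); both are hypothetical.

```c
#include <assert.h>
#include <string.h>

struct toy_ifinfo {		/* stand-in for struct ifinfomsg */
	unsigned char family;
	int index;
	unsigned int flags;
	unsigned int change;
};

/* Stand-in for iplink_parse(): writes the peer's settings into *hdr. */
typedef void (*toy_parse_fn)(struct toy_ifinfo *hdr);

/*
 * Borrow the main header as scratch space for parsing the peer, then
 * hand the result to *peer and put the main header back, whole.
 */
static void toy_parse_peer(struct toy_ifinfo *hdr, struct toy_ifinfo *peer,
			   toy_parse_fn parse)
{
	struct toy_ifinfo save;

	assert(hdr != NULL && peer != NULL && parse != NULL);
	memcpy(&save, hdr, sizeof(*hdr));	/* save */
	memset(hdr, 0, sizeof(*hdr));		/* reinitialize */
	parse(hdr);
	memcpy(peer, hdr, sizeof(*hdr));	/* peer gets the parsed values */
	memcpy(hdr, &save, sizeof(*hdr));	/* restore */
}

/* Example parser touching several fields, including the index. */
static void toy_parser_example(struct toy_ifinfo *hdr)
{
	hdr->index = 7;
	hdr->flags |= 0x1;
	hdr->change |= 0x1;
}
```

Copying the whole structure means a parser that starts using another
field later needs no matching change in the callers.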

[PATCH iproute2-next v3 6/8] iplink: Perform most of request buffer setups and checks in iplink_parse()

2018-02-22 Thread Serhey Popovych

Let other users of iplink_parse() (e.g. link_veth.c) benefit from the
additional attribute checks and setups made in iplink_modify(). This
catches most of the weird combinations of parameters in peer device
configuration.

Drop @link, @group and @index from the iplink_parse() parameter list: they
are not needed outside.

While there, change return -1 to exit(-1) for group parsing errors: we
want to stop further command processing unless the -force option is given,
so that the failing line is easy to find.

Signed-off-by: Serhey Popovych 
---
 ip/ip_common.h|3 +-
 ip/iplink.c   |  118 +
 ip/iplink_vxcan.c |   13 +-
 ip/link_veth.c|   13 +-
 4 files changed, 59 insertions(+), 88 deletions(-)

diff --git a/ip/ip_common.h b/ip/ip_common.h
index e4e628b..f762821 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -133,8 +133,7 @@ struct link_util {
 struct link_util *get_link_kind(const char *kind);
 
 int iplink_parse(int argc, char **argv, struct iplink_req *req,
-char **name, char **type, char **link, char **dev,
-int *group, int *index);
+char **name, char **type, char **dev);
 
 /* iplink_bridge.c */
 void br_dump_bridge_id(const struct ifla_bridge_id *id, char *buf, size_t len);
diff --git a/ip/iplink.c b/ip/iplink.c
index 4e9f571..e53d890 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -578,9 +578,9 @@ static void has_dev(const char *dev, int dev_index)
 }
 
 int iplink_parse(int argc, char **argv, struct iplink_req *req,
-char **name, char **type, char **link, char **dev,
-int *group, int *index)
+char **name, char **type, char **dev)
 {
+   char *link = NULL;
int ret, len;
char abuf[32];
int qlen = -1;
@@ -591,9 +591,10 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
int numrxqueues = -1;
int dev_index = 0;
int link_netnsid = -1;
+   int index = 0;
+   int group = -1;
int addr_len = 0;
 
-   *group = -1;
ret = argc;
 
while (argc > 0) {
@@ -616,14 +617,14 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
}
} else if (strcmp(*argv, "index") == 0) {
NEXT_ARG();
-   if (*index)
+   if (index)
duparg("index", *argv);
-   *index = atoi(*argv);
-   if (*index <= 0)
+   index = atoi(*argv);
+   if (index <= 0)
invarg("Invalid \"index\" value", *argv);
} else if (matches(*argv, "link") == 0) {
NEXT_ARG();
-   *link = *argv;
+   link = *argv;
} else if (matches(*argv, "address") == 0) {
NEXT_ARG();
addr_len = ll_addr_a2n(abuf, sizeof(abuf), *argv);
@@ -816,10 +817,11 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
  *argv, len);
} else if (strcmp(*argv, "group") == 0) {
NEXT_ARG();
-   if (*group != -1)
+   if (group != -1)
duparg("group", *argv);
-   if (rtnl_group_a2n(group, *argv))
+   if (rtnl_group_a2n(&group, *argv))
invarg("Invalid \"group\" value\n", *argv);
+   addattr32(&req->n, sizeof(*req), IFLA_GROUP, group);
} else if (strcmp(*argv, "mode") == 0) {
int mode;
 
@@ -946,80 +948,47 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
}
}
 
-   return ret - argc;
-}
-
-static int iplink_modify(int cmd, unsigned int flags, int argc, char **argv)
-{
-   char *dev = NULL;
-   char *name = NULL;
-   char *link = NULL;
-   char *type = NULL;
-   int index = 0;
-   int group;
-   struct link_util *lu = NULL;
-   struct iplink_req req = {
-   .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
-   .n.nlmsg_flags = NLM_F_REQUEST | flags,
-   .n.nlmsg_type = cmd,
-   .i.ifi_family = preferred_family,
-   };
-   int ret;
-
-   ret = iplink_parse(argc, argv,
-  &req, &name, &type, &link, &dev, &group, &index);
-   if (ret < 0)
-   return ret;
-
-   argc -= ret;
-   argv += ret;
-
-   if (!(flags & NLM_F_CREATE) && index) {
+   if (!(req->n.nlmsg_flags & NLM_F_CREATE) && index) {
fprintf(stderr,
"index can be used only when creating devices.\n");
exit(-1);
}
 
if (group != -1) {
-   if (dev)
-   addattr_l(,

[PATCH iproute2-next v3 1/8] utils: Introduce and use nodev() helper routine

2018-02-22 Thread Serhey Popovych

There are a couple of places where we report an error when no network
device is found. In all of them we output a message in the same format to
stderr and either return -1 or 1 to the caller or exit with -1.

Introduce a new helper function nodev() that takes the name of the network
device that caused the error and returns -1 to its caller. Either call
exit() or return to the caller to preserve the behaviour before the change.

Use -nodev() in traffic control (tc) code to return 1.

Simplify the expression checking for an argument being 0/NULL in @if
statements.
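
As a rough standalone sketch of the helper described above (the real one
lives in lib/utils.c and is not shown in full here), the toy_ names
below are hypothetical; the message format is the one the removed lines
in this patch print.

```c
#include <assert.h>
#include <stdio.h>

/* Report a missing device on stderr and return -1 to the caller. */
static int toy_nodev(const char *dev)
{
	assert(dev != NULL);
	fprintf(stderr, "Cannot find device \"%s\"\n", dev);
	return -1;
}

/* ip/bridge style caller: propagate -1 when the lookup failed. */
static int toy_ip_lookup(const char *dev, int ifindex)
{
	if (!ifindex)
		return toy_nodev(dev);
	return 0;
}

/* tc style caller: negate the helper's result to keep returning 1. */
static int toy_tc_lookup(const char *dev, int ifindex)
{
	if (!ifindex)
		return -toy_nodev(dev);
	return 0;
}
```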

Signed-off-by: Serhey Popovych 
---
 bridge/fdb.c  |   17 ++---
 bridge/link.c |8 +++-
 bridge/mdb.c  |   19 ++-
 bridge/vlan.c |7 ++-
 include/utils.h   |1 +
 ip/ip6tunnel.c|6 ++
 ip/ipaddress.c|7 +++
 ip/iplink.c   |   13 -
 ip/iplink_bond.c  |4 ++--
 ip/iplink_bridge.c|7 ++-
 ip/iplink_vxlan.c |7 ++-
 ip/ipmroute.c |7 +++
 ip/ipneigh.c  |   14 +++---
 ip/ipntable.c |6 ++
 ip/iproute.c  |   36 
 ip/iproute_lwtunnel.c |4 ++--
 ip/iptunnel.c |6 ++
 ip/link_gre.c |7 ++-
 ip/link_gre6.c|7 ++-
 ip/link_ip6tnl.c  |4 ++--
 ip/link_iptnl.c   |4 ++--
 ip/link_vti.c |7 ++-
 ip/link_vti6.c|7 ++-
 lib/utils.c   |6 ++
 tc/m_mirred.c |6 ++
 tc/tc_class.c |   14 ++
 tc/tc_filter.c|   18 ++
 tc/tc_qdisc.c |   12 
 28 files changed, 97 insertions(+), 164 deletions(-)

diff --git a/bridge/fdb.c b/bridge/fdb.c
index b4f6e8b..205b4fa 100644
--- a/bridge/fdb.c
+++ b/bridge/fdb.c
@@ -311,11 +311,8 @@ static int fdb_show(int argc, char **argv)
/*we'll keep around filter_dev for older kernels */
if (filter_dev) {
filter_index = ll_name_to_index(filter_dev);
-   if (filter_index == 0) {
-   fprintf(stderr, "Cannot find device \"%s\"\n",
-   filter_dev);
-   return -1;
-   }
+   if (!filter_index)
+   return nodev(filter_dev);
req.ifm.ifi_index = filter_index;
}
 
@@ -391,8 +388,8 @@ static int fdb_modify(int cmd, int flags, int argc, char **argv)
} else if (strcmp(*argv, "via") == 0) {
NEXT_ARG();
via = ll_name_to_index(*argv);
-   if (via == 0)
-   invarg("invalid device\n", *argv);
+   if (!via)
+   exit(nodev(*argv));
} else if (strcmp(*argv, "self") == 0) {
req.ndm.ndm_flags |= NTF_SELF;
} else if (matches(*argv, "master") == 0) {
@@ -467,10 +464,8 @@ static int fdb_modify(int cmd, int flags, int argc, char **argv)
addattr32(&req.n, sizeof(req), NDA_IFINDEX, via);
 
req.ndm.ndm_ifindex = ll_name_to_index(d);
-   if (req.ndm.ndm_ifindex == 0) {
-   fprintf(stderr, "Cannot find device \"%s\"\n", d);
-   return -1;
-   }
+   if (!req.ndm.ndm_ifindex)
+   return nodev(d);
 
if (rtnl_talk(&rth, &req.n, NULL) < 0)
return -1;
diff --git a/bridge/link.c b/bridge/link.c
index 69c08ec..579d57e 100644
--- a/bridge/link.c
+++ b/bridge/link.c
@@ -485,11 +485,9 @@ static int brlink_show(int argc, char **argv)
}
 
if (filter_dev) {
-   if ((filter_index = ll_name_to_index(filter_dev)) == 0) {
-   fprintf(stderr, "Cannot find device \"%s\"\n",
-   filter_dev);
-   return -1;
-   }
+   filter_index = ll_name_to_index(filter_dev);
+   if (!filter_index)
+   return nodev(filter_dev);
}
 
if (show_details) {
diff --git a/bridge/mdb.c b/bridge/mdb.c
index 8c08baf..f38dc67 100644
--- a/bridge/mdb.c
+++ b/bridge/mdb.c
@@ -287,11 +287,8 @@ static int mdb_show(int argc, char **argv)
 
if (filter_dev) {
filter_index = ll_name_to_index(filter_dev);
-   if (filter_index == 0) {
-   fprintf(stderr, "Cannot find device \"%s\"\n",
-   filter_dev);
-   return -1;
-   }
+   if (!filter_index)
+   return nodev(filter_dev);
}
 
new_json_obj(json);
@@ -360,16 +357,12 @@ static int mdb_modify(int cmd, int flags, int argc, char **argv)
}
 
req.bpm.ifindex = ll_name_to_index(d);
-   if (req.bpm.ifindex == 0) {
-   fprintf(stderr,

[PATCH iproute2-next v3 3/8] iplink: Use "dev" and "name" parameters interchangeable when possible

2018-02-22 Thread Serhey Popovych

Both accept a network device name as an argument, but they have different
meanings:

  dev  - a device identified by its name,
  name - the name for a specific device.

The only case where they are treated separately is network device rename,
where both the ifindex and the new name must be specified. In the rest of
the cases we can assume that dev == name.

With this change we do the following:

  1) Kill ambiguity with both "dev" and "name" parameters given the same
 name:

   ip link {add|set} dev veth100a name veth100a ...

  2) Make sure we do not accept "name" more than once.

  3) For VF and XDP treat "name" as "dev". Fail if "dev" is given
     after VF and/or XDP parsing.

  4) Make veth and vxcan accept both "name" and "dev" as their peer
     parameters, effectively following general ip-link(8) utility
     behaviour on link create:

   ip link add {name|dev} veth1a type veth peer {name|dev} veth1b

Signed-off-by: Serhey Popovych 
---
 ip/iplink.c |   34 ++
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/ip/iplink.c b/ip/iplink.c
index fc358fc..1359c0f 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -605,9 +605,15 @@ int iplink_parse(int argc, char **argv, struct iplink_req 
*req,
req->i.ifi_flags &= ~IFF_UP;
} else if (strcmp(*argv, "name") == 0) {
NEXT_ARG();
+   if (*name)
+   duparg("name", *argv);
if (check_ifname(*argv))
invarg("\"name\" not a valid ifname", *argv);
*name = *argv;
+   if (!*dev) {
+   *dev = *name;
+   dev_index = ll_name_to_index(*dev);
+   }
} else if (strcmp(*argv, "index") == 0) {
NEXT_ARG();
if (*index)
@@ -665,6 +671,9 @@ int iplink_parse(int argc, char **argv, struct iplink_req 
*req,
if (xdp_parse(&argc, &argv, req, dev_index,
  generic, drv, offload))
exit(-1);
+
+   if (offload && *name == *dev)
+   *dev = NULL;
} else if (strcmp(*argv, "netns") == 0) {
NEXT_ARG();
if (netns != -1)
@@ -755,6 +764,9 @@ int iplink_parse(int argc, char **argv, struct iplink_req 
*req,
if (len < 0)
return -1;
addattr_nest_end(&req->n, vflist);
+
+   if (*name == *dev)
+   *dev = NULL;
} else if (matches(*argv, "master") == 0) {
int ifindex;
 
@@ -905,7 +917,7 @@ int iplink_parse(int argc, char **argv, struct iplink_req 
*req,
 
if (strcmp(*argv, "dev") == 0)
NEXT_ARG();
-   if (*dev)
+   if (*dev != *name)
duparg2("dev", *argv);
if (check_ifname(*argv))
invarg("\"dev\" not a valid ifname", *argv);
@@ -915,6 +927,14 @@ int iplink_parse(int argc, char **argv, struct iplink_req 
*req,
argc--; argv++;
}
 
+   /* Allow "ip link add dev" and "ip link add name" */
+   if (!*name)
+   *name = *dev;
+   else if (!*dev)
+   *dev = *name;
+   else if (!strcmp(*name, *dev))
+   *name = *dev;
+
if (dev_index && addr_len) {
int halen = nl_get_ll_addr_len(dev_index);
 
@@ -993,10 +1013,16 @@ static int iplink_modify(int cmd, unsigned int flags, 
int argc, char **argv)
req.i.ifi_index = ll_name_to_index(dev);
if (!req.i.ifi_index)
return nodev(dev);
+
+   /* Not renaming to the same name */
+   if (name == dev)
+   name = NULL;
} else {
-   /* Allow "ip link add dev" and "ip link add name" */
-   if (!name)
-   name = dev;
+   if (name != dev) {
+   fprintf(stderr,
+   "both \"name\" and \"dev\" cannot be used when creating devices.\n");
+   exit(-1);
+   }
 
if (link) {
int ifindex;
-- 
1.7.10.4
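
The reconciliation of "dev" and "name" that the patch above adds at the end
of iplink_parse() can be looked at in isolation. What follows is a minimal
userspace sketch under stated assumptions, not iproute2 code: the names
reconcile_dev_name() and is_rename() are hypothetical, and only the pointer
logic of the hunk is reproduced.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the hunk added at the end of
 * iplink_parse(): after parsing, "dev" and "name" end up as the same
 * pointer whenever they carry the same device name.
 */
static void reconcile_dev_name(const char **name, const char **dev)
{
	if (!*name)
		*name = *dev;		/* only "dev" was given (or neither) */
	else if (!*dev)
		*dev = *name;		/* only "name" was given */
	else if (!strcmp(*name, *dev))
		*name = *dev;		/* same name given twice: alias them */
}

/*
 * Hypothetical predicate: pointer inequality after reconciliation
 * means two different names were supplied, i.e. a rename request.
 */
static int is_rename(const char *name, const char *dev)
{
	return name != dev;
}
```

With this shape, the modify path can treat name == dev as "not renaming to
the same name", while the create path can reject name != dev, which is what
the iplink_modify() hunk appears to do.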

Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

2018-02-22 Thread Jiri Pirko

Thu, Feb 22, 2018 at 12:54:45PM CET, gerlitz...@gmail.com wrote:
>On Thu, Feb 22, 2018 at 10:11 AM, Jiri Pirko  wrote:
>> Wed, Feb 21, 2018 at 09:57:09PM CET, alexander.du...@gmail.com wrote:
>
>>>The signaling isn't too much of an issue since we can just tweak the
>>>link state of the VF or virtio manually to report the link up or down
>>>prior to the hot-plug. Now that we are on the same page with the team0
>
>> Oh, so you just do "ip link set vfrepresentor down" in the host.
>> That makes sense. I'm pretty sure that this is not implemented for all
>> drivers now.
>
>mlx5 supports that, on the representor close ndo we take the VF link
>operational v-link down
>
>We should probably also put into the picture some/more aspects
>from the host side of things. The provisioning of the v-switch now
>have to deal with two channels going into the VM, the PV (virtio)
>one and the PT (VF) one.
>
>This should probably boil down to apply teaming/bonding between
>the VF representor and a PV backend device, e.g TAP.

Yes, that is correct.

Re: [PATCH bpf] bpf: fix rcu lockdep warning for lpm_trie map_free callback

2018-02-22 Thread Eric Dumazet

On Wed, 2018-02-21 at 22:38 -0800, Yonghong Song wrote:
> Commit 9a3efb6b661f ("bpf: fix memory leak in lpm_trie map_free callback 
> function")
> fixed a memory leak and removed unnecessary locks in map_free callback 
> function.
> Unfortunately, it introduced a lockdep warning. When lockdep checking is 
> turned on,
> running tools/testing/selftests/bpf/test_lpm_map will have:
> 
>   [   98.294321] =
>   [   98.294807] WARNING: suspicious RCU usage
>   [   98.295359] 4.16.0-rc2+ #193 Not tainted
>   [   98.295907] -
>   [   98.296486] /home/yhs/work/bpf/kernel/bpf/lpm_trie.c:572 suspicious 
> rcu_dereference_check() usage!
>   [   98.297657]
>   [   98.297657] other info that might help us debug this:
>   [   98.297657]
>   [   98.298663]
>   [   98.298663] rcu_scheduler_active = 2, debug_locks = 1
>   [   98.299536] 2 locks held by kworker/2:1/54:
>   [   98.300152]  #0:  ((wq_completion)"events"){+.+.}, at: 
> [<196bc1f0>] process_one_work+0x157/0x5c0
>   [   98.301381]  #1:  ((work_completion)(>work)){+.+.}, at: 
> [<196bc1f0>] process_one_work+0x157/0x5c0
> 
> Since actual trie tree removal happens only after no other
> accesses to the tree are possible, this patch simply converted all
> rcu protected pointer access to normal access, which removed the
> above warning.
> 
> Fixes: 9a3efb6b661f ("bpf: fix memory leak in lpm_trie map_free callback 
> function")
> Reported-by: Eric Dumazet 
> Signed-off-by: Yonghong Song 
> ---
>  kernel/bpf/lpm_trie.c | 11 +--
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
> index a75e02c..0c15813 100644
> --- a/kernel/bpf/lpm_trie.c
> +++ b/kernel/bpf/lpm_trie.c
> @@ -552,7 +552,7 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
>  static void trie_free(struct bpf_map *map)
>  {
>   struct lpm_trie *trie = container_of(map, struct lpm_trie, map);
> - struct lpm_trie_node __rcu **slot;
> + struct lpm_trie_node **slot;
>   struct lpm_trie_node *node;
>  
>   /* Wait for outstanding programs to complete
> @@ -569,23 +569,22 @@ static void trie_free(struct bpf_map *map)
>   slot = &trie->root;
>  
>   for (;;) {
> - node = rcu_dereference_protected(*slot,
> - lockdep_is_held(&trie->lock));
> + node = *slot;

Hi Yonghong

It is not sparse compliant.

kernel/bpf/lpm_trie.c:573:30: warning: incorrect type in assignment (different address spaces)
kernel/bpf/lpm_trie.c:573:30:    expected struct lpm_trie_node *node
kernel/bpf/lpm_trie.c:573:30:    got struct lpm_trie_node [noderef] *


In my local tree, I simply did

node = rcu_dereference_protected(*slot, 1);

Since we are the last user of the whole tree after the prior synchronize_rcu();
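
To make the "last user of the whole tree" argument concrete, here is a
userspace sketch of a slot-based teardown walk, loosely modelled on the loop
visible in the quoted hunk. It is an illustration under assumptions, not the
kernel code: struct node, free_all() and new_node() are hypothetical, and
rcu_dereference_protected() is reduced to a stand-in macro that only yields
the pointer (in the kernel it also documents the access for sparse and
lockdep).

```c
#include <stdlib.h>

/* Userspace stand-in: yields the pointer and ignores the condition. */
#define rcu_dereference_protected(p, c) (p)

struct node {
	struct node *child[2];
};

/* Allocate a node with the given children; returns NULL on failure. */
static struct node *new_node(struct node *left, struct node *right)
{
	struct node *n = calloc(1, sizeof(*n));

	if (n) {
		n->child[0] = left;
		n->child[1] = right;
	}
	return n;
}

/*
 * Free every node reachable from *root and return how many were freed.
 * Each pass descends from the root to some leaf, frees it and clears
 * the slot that pointed to it, so no recursion or stack is needed.
 * This is only safe because nobody else can reach the tree any more.
 */
static unsigned int free_all(struct node **root)
{
	unsigned int freed = 0;

	for (;;) {
		struct node **slot = root;

		for (;;) {
			struct node *node = rcu_dereference_protected(*slot, 1);

			if (!node)
				return freed;	/* the root slot is empty */
			if (node->child[0]) {
				slot = &node->child[0];
				continue;
			}
			if (node->child[1]) {
				slot = &node->child[1];
				continue;
			}
			free(node);		/* leaf: release and unlink */
			*slot = NULL;
			freed++;
			break;
		}
	}
}
```

Passing a constant 1 as the condition, as suggested above, states that no
lock is needed at this point while keeping the accessor that sparse expects
for an __rcu pointer.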

[PATCH] dsa: ptp; mark dummy helpers as 'inline'

2018-02-22 Thread Arnd Bergmann

Declaring a static function in a header leads to a warning every
time that header gets included without the function being used:

In file included from drivers/net/dsa/mv88e6xxx/chip.c:42:
drivers/net/dsa/mv88e6xxx/ptp.h:92:13: error: 'mv88e6xxx_hwtstamp_work' defined 
but not used [-Werror=unused-function]
 static long mv88e6xxx_hwtstamp_work(struct ptp_clock_info *ptp)
In file included from drivers/net/dsa/mv88e6xxx/chip.c:38:
drivers/net/dsa/mv88e6xxx/global2.h:355:12: error: 'mv88e6xxx_g2_wait' defined 
but not used [-Werror=unused-function]
 static int mv88e6xxx_g2_wait(struct mv88e6xxx_chip *chip, int reg, u16 mask)
^
drivers/net/dsa/mv88e6xxx/global2.h:350:12: error: 'mv88e6xxx_g2_update' 
defined but not used [-Werror=unused-function]
 static int mv88e6xxx_g2_update(struct mv88e6xxx_chip *chip, int reg, u16 
update)
^~~
drivers/net/dsa/mv88e6xxx/global2.h:345:12: error: 'mv88e6xxx_g2_write' defined 
but not used [-Werror=unused-function]
 static int mv88e6xxx_g2_write(struct mv88e6xxx_chip *chip, int reg, u16 val)
^~
drivers/net/dsa/mv88e6xxx/global2.h:340:12: error: 'mv88e6xxx_g2_read' defined 
but not used [-Werror=unused-function]
 static int mv88e6xxx_g2_read(struct mv88e6xxx_chip *chip, int reg, u16 *val)

This marks all such functions in dsa inline to make sure we don't warn
about them.

Fixes: c6fe0ad2c349 ("net: dsa: mv88e6xxx: add rx/tx timestamping support")
Fixes: 0d632c3d6fe3 ("net: dsa: mv88e6xxx: add accessors for PTP/TAI registers")
Signed-off-by: Arnd Bergmann 
---
 drivers/net/dsa/mv88e6xxx/global2.h | 8 
 drivers/net/dsa/mv88e6xxx/ptp.h | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/global2.h 
b/drivers/net/dsa/mv88e6xxx/global2.h
index 25f92b3d7157..2a7c4f9b070c 100644
--- a/drivers/net/dsa/mv88e6xxx/global2.h
+++ b/drivers/net/dsa/mv88e6xxx/global2.h
@@ -337,22 +337,22 @@ static inline int mv88e6xxx_g2_require(struct 
mv88e6xxx_chip *chip)
return 0;
 }
 
-static int mv88e6xxx_g2_read(struct mv88e6xxx_chip *chip, int reg, u16 *val)
+static inline int mv88e6xxx_g2_read(struct mv88e6xxx_chip *chip, int reg, u16 
*val)
 {
return -EOPNOTSUPP;
 }
 
-static int mv88e6xxx_g2_write(struct mv88e6xxx_chip *chip, int reg, u16 val)
+static inline int mv88e6xxx_g2_write(struct mv88e6xxx_chip *chip, int reg, u16 
val)
 {
return -EOPNOTSUPP;
 }
 
-static int mv88e6xxx_g2_update(struct mv88e6xxx_chip *chip, int reg, u16 
update)
+static inline int mv88e6xxx_g2_update(struct mv88e6xxx_chip *chip, int reg, 
u16 update)
 {
return -EOPNOTSUPP;
 }
 
-static int mv88e6xxx_g2_wait(struct mv88e6xxx_chip *chip, int reg, u16 mask)
+static inline int mv88e6xxx_g2_wait(struct mv88e6xxx_chip *chip, int reg, u16 
mask)
 {
return -EOPNOTSUPP;
 }
diff --git a/drivers/net/dsa/mv88e6xxx/ptp.h b/drivers/net/dsa/mv88e6xxx/ptp.h
index 992818ade746..10f271ab650d 100644
--- a/drivers/net/dsa/mv88e6xxx/ptp.h
+++ b/drivers/net/dsa/mv88e6xxx/ptp.h
@@ -89,7 +89,7 @@ void mv88e6xxx_ptp_free(struct mv88e6xxx_chip *chip);
 
 #else /* !CONFIG_NET_DSA_MV88E6XXX_PTP */
 
-static long mv88e6xxx_hwtstamp_work(struct ptp_clock_info *ptp)
+static inline long mv88e6xxx_hwtstamp_work(struct ptp_clock_info *ptp)
 {
return -1;
 }
@@ -99,7 +99,7 @@ static inline int mv88e6xxx_ptp_setup(struct mv88e6xxx_chip 
*chip)
return 0;
 }
 
-static void mv88e6xxx_ptp_free(struct mv88e6xxx_chip *chip)
+static inline void mv88e6xxx_ptp_free(struct mv88e6xxx_chip *chip)
 {
 }
 
-- 
2.9.0
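
The effect of the change can be reproduced outside the kernel. The following
is a minimal sketch, assuming a header-style stub for a disabled config
option; struct chip, stub_g2_read() and stub_hwtstamp_work() are hypothetical
stand-ins for the mv88e6xxx helpers. GCC does not report -Wunused-function
for functions declared "static inline", which is why the stubs no longer
break a -Werror build when a file includes the header without calling them.

```c
#include <errno.h>
#include <stddef.h>

struct chip;	/* opaque stand-in for struct mv88e6xxx_chip */

/*
 * Dummy helper as it would appear in a header when the feature is
 * compiled out. A plain "static" definition here would trigger
 * -Wunused-function in every file that includes it without using it;
 * "static inline" lets the compiler drop it silently.
 */
static inline int stub_g2_read(struct chip *chip, int reg,
			       unsigned short *val)
{
	(void)chip;
	(void)reg;
	(void)val;
	return -EOPNOTSUPP;
}

/* Same pattern for a helper with a different return type. */
static inline long stub_hwtstamp_work(void *ptp)
{
	(void)ptp;
	return -1;
}
```

Callers still compile and simply see the error return, for example
stub_g2_read(NULL, 0, NULL) evaluates to -EOPNOTSUPP.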

Re: [RFC][PATCH bpf v2 1/2] bpf: allow 64-bit offsets for bpf function calls

2018-02-22 Thread Michael Holzheu

Am Thu, 22 Feb 2018 13:06:40 +0100
schrieb Michael Holzheu :

> Am Fri, 16 Feb 2018 21:20:09 +0530
> schrieb "Naveen N. Rao" :
> 
> > Daniel Borkmann wrote:
> > > On 02/15/2018 05:25 PM, Daniel Borkmann wrote:
> > >> On 02/13/2018 05:05 AM, Sandipan Das wrote:
> > >>> The imm field of a bpf_insn is a signed 32-bit integer. For
> > >>> JIT-ed bpf-to-bpf function calls, it stores the offset from
> > >>> __bpf_call_base to the start of the callee function.
> > >>>
> > >>> For some architectures, such as powerpc64, it was found that
> > >>> this offset may be as large as 64 bits because of which this
> > >>> cannot be accommodated in the imm field without truncation.
> > >>>
> > >>> To resolve this, we additionally make aux->func within each
> > >>> bpf_prog associated with the functions to point to the list
> > >>> of all function addresses determined by the verifier.
> > >>>
> > >>> We keep the value assigned to the off field of the bpf_insn
> > >>> as a way to index into aux->func and also set aux->func_cnt
> > >>> so that this can be used for performing basic upper bound
> > >>> checks for the off field.
> > >>>
> > >>> Signed-off-by: Sandipan Das 
> > >>> ---
> > >>> v2: Make aux->func point to the list of functions determined
> > >>> by the verifier rather than allocating a separate callee
> > >>> list for each function.
> > >> 
> > >> Approach looks good to me; do you know whether s390x JIT would
> > >> have similar requirement? I think one limitation that would still
> > >> need to be addressed later with such approach would be regarding the
> > >> xlated prog dump in bpftool, see 'BPF calls via JIT' in 7105e828c087
> > >> ("bpf: allow for correlation of maps and helpers in dump"). Any
> > >> ideas for this (potentially if we could use off + imm for calls,
> > >> we'd get to 48 bits, but that seems still not be enough as you say)?
> > 
> > All good points. I'm not really sure how s390x works, so I can't comment 
> > on that, but I'm copying Michael Holzheu for his consideration.
> > 
> > With the existing scheme, 48 bits won't be enough, so we rejected that 
> > approach. I can also see how this will be a problem with bpftool, but I 
> > haven't looked into it in detail. I wonder if we can annotate the output 
> > to indicate the function being referred to?
> > 
> > > 
> > > One other random thought, although I'm not sure how feasible this
> > > is for ppc64 JIT to realize ... but idea would be to have something
> > > like the below:
> > > 
> > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > > index 29ca920..daa7258 100644
> > > --- a/kernel/bpf/core.c
> > > +++ b/kernel/bpf/core.c
> > > @@ -512,6 +512,11 @@ int bpf_get_kallsym(unsigned int symnum, unsigned 
> > > long *value, char *type,
> > >   return ret;
> > >  }
> > > 
> > > +void * __weak bpf_jit_image_alloc(unsigned long size)
> > > +{
> > > + return module_alloc(size);
> > > +}
> > > +
> > >  struct bpf_binary_header *
> > >  bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
> > >unsigned int alignment,
> > > @@ -525,7 +530,7 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 
> > > **image_ptr,
> > >* random section of illegal instructions.
> > >*/
> > >   size = round_up(proglen + sizeof(*hdr) + 128, PAGE_SIZE);
> > > - hdr = module_alloc(size);
> > > + hdr = bpf_jit_image_alloc(size);
> > >   if (hdr == NULL)
> > >   return NULL;
> > > 
> > > And ppc64 JIT could override bpf_jit_image_alloc() in a similar way
> > > like some archs would override the module_alloc() helper through a
> > > custom implementation, usually via __vmalloc_node_range(), so we
> > > could perhaps fit the range for BPF JITed images in a way that they
> > > could use the 32bit imm in the end? There are not that many progs
> > > loaded typically, so the range could be a bit narrower in such case
> > > anyway. (Not sure if this would work out though, but I thought to
> > > bring it up.)
> > 
> > That'd be a good option to consider. I don't think we want to allocate 
> > anything from the linear memory range since users could load 
> > unprivileged BPF programs and consume a lot of memory that way. I doubt 
> > if we can map vmalloc'ed memory into the 0xc0 address range, but I'm not 
> > entirely sure.
> > 
> > Michael,
> > Is the above possible? The question is if we can have BPF programs be 
> > allocated within 4GB of __bpf_call_base (which is a kernel symbol), so 
> > that calls to those programs can be encoded in a 32-bit immediate field 
> > in a BPF instruction. As an extension, we may be able to extend it to 
> > 48-bits by combining with another BPF instruction field (offset). In 
> > either case, the vmalloc'ed address range won't work.
> > 
> > The alternative is to pass the full 64-bit address of the BPF program in 
> > an auxiliary field (as proposed in this patch set) but we need to fix it 
> > up for 'bpftool' as

Re: [PATCH] dsa: ptp; mark dummy helpers as 'inline'

2018-02-22 Thread Andrew Lunn

On Thu, Feb 22, 2018 at 12:44:40PM +0100, Arnd Bergmann wrote:
> Declaring a static function in a header leads to a warning every
> time that header gets included without the function being used:
> 
> In file included from drivers/net/dsa/mv88e6xxx/chip.c:42:
> drivers/net/dsa/mv88e6xxx/ptp.h:92:13: error: 'mv88e6xxx_hwtstamp_work' 
> defined but not used [-Werror=unused-function]
>  static long mv88e6xxx_hwtstamp_work(struct ptp_clock_info *ptp)
> In file included from drivers/net/dsa/mv88e6xxx/chip.c:38:
> drivers/net/dsa/mv88e6xxx/global2.h:355:12: error: 'mv88e6xxx_g2_wait' 
> defined but not used [-Werror=unused-function]
>  static int mv88e6xxx_g2_wait(struct mv88e6xxx_chip *chip, int reg, u16 mask)
> ^
> drivers/net/dsa/mv88e6xxx/global2.h:350:12: error: 'mv88e6xxx_g2_update' 
> defined but not used [-Werror=unused-function]
>  static int mv88e6xxx_g2_update(struct mv88e6xxx_chip *chip, int reg, u16 
> update)
> ^~~
> drivers/net/dsa/mv88e6xxx/global2.h:345:12: error: 'mv88e6xxx_g2_write' 
> defined but not used [-Werror=unused-function]
>  static int mv88e6xxx_g2_write(struct mv88e6xxx_chip *chip, int reg, u16 val)
> ^~
> drivers/net/dsa/mv88e6xxx/global2.h:340:12: error: 'mv88e6xxx_g2_read' 
> defined but not used [-Werror=unused-function]
>  static int mv88e6xxx_g2_read(struct mv88e6xxx_chip *chip, int reg, u16 *val)
> 
> This marks all such functions in dsa inline to make sure we don't warn
> about them.
> 
> Fixes: c6fe0ad2c349 ("net: dsa: mv88e6xxx: add rx/tx timestamping support")
> Fixes: 0d632c3d6fe3 ("net: dsa: mv88e6xxx: add accessors for PTP/TAI 
> registers")
> Signed-off-by: Arnd Bergmann 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH net-next 6/7] mlxsw: spectrum_router: Add support for ipv6 hash policy update

2018-02-22 Thread Ido Schimmel

On Wed, Feb 21, 2018 at 10:49:53AM -0800, David Ahern wrote:
> Similar to 28678f07f127d ("mlxsw: spectrum_router: Update multipath hash
> parameters upon netevents") for IPv4, make sure the kernel and asic are
> using the same hash algorithm for path selection.
> 
> Signed-off-by: David Ahern 

Tested-by: Ido Schimmel

Re: nft/bpf interpreters and spectre2. Was: [PATCH RFC 0/4] net: add bpfilter

2018-02-22 Thread Pablo Neira Ayuso

Hi Alexei,

On Wed, Feb 21, 2018 at 06:20:37PM -0800, Alexei Starovoitov wrote:
> On Wed, Feb 21, 2018 at 01:13:03PM +0100, Florian Westphal wrote:
> > 
> > Obvious candidates are: meta, numgen, limit, objref, quota, reject.
> > 
> > We should probably also consider removing
> > CONFIG_NFT_SET_RBTREE and CONFIG_NFT_SET_HASH and just always
> > build both too (at least rbtree since that offers interval).
> > 
> > For the indirect call issue we can use direct calls from eval loop for
> > some of the more frequently used ones, similar to what we do already
> > for nft_cmp_fast_expr. 
> 
> nft_cmp_fast_expr and other expressions mentioned above made me thinking...
> 
> do we have the same issue with nft interpreter as we had with bpf one?
> bpf interpreter was used as part of spectre2 attack to leak
> information via cache side channel and let VM read hypervisor memory.
> Due to that issue we removed bpf interpreter from the kernel code.
> That's what CONFIG_BPF_JIT_ALWAYS_ON for...
> but we still have nft interpreter in the kernel that can also
> execute arbitrary nft expressions.
> 
> Jann's exploit used the following bpf instructions:
> struct bpf_insn evil_bytecode_instrs[] = {
> // rax = target_byte_addr
> { .code = BPF_LD | BPF_IMM | BPF_DW, .dst_reg = 0, .imm = target_byte_addr }, 
> { .imm = target_byte_addr>>32 },

We don't place pointers in the nft VM registers; doing so is basically
illegal, since otherwise we would need a more sophisticated verifier.
I mention this because we have no way to point to an arbitrary
address such as 'target_byte_addr' above.

> // rdi = timing_leak_array
> { .code = BPF_LD | BPF_IMM | BPF_DW, .dst_reg = 1, .imm = 
> host_timing_leak_addr }, { .imm = host_timing_leak_addr>>32 },
> // rax = *(u8*)rax
> { .code = BPF_LDX | BPF_MEM | BPF_B, .dst_reg = 0, .src_reg = 0, .off = 0 },
> // rax = rax << ...
> { .code = BPF_ALU64 | BPF_LSH | BPF_K, .dst_reg = 0, .imm = 10 - bit_idx },
> // rax = rax & 0x400
> { .code = BPF_ALU64 | BPF_AND | BPF_K, .dst_reg = 0, .imm = 0x400 },
> // rax = rdi + rax
> { .code = BPF_ALU64 | BPF_ADD | BPF_X, .dst_reg = 0, .src_reg = 1 },
> // *(u8*) (rax + 0x800)
> { .code = BPF_LDX | BPF_MEM | BPF_B, .dst_reg = 0, .src_reg = 0, .off = 0x800 
> },
> 
> and a gadget to jump into __bpf_prog_run with insn pointing
> to memory controlled by the guest while accessible
> (at different virt address) by the hypervisor.
> 
> It seems possible to construct similar sequence of instructions
> out of nft expressions and use gadget that jumps into nft_do_chain().
> The attacker would need to discover more kernel addresses:
> nft_do_chain, nft_cmp_fast_ops, nft_payload_fast_ops, nft_bitwise_eval,
> nft_lookup_eval, and nft_bitmap_lookup
> to populate nft chains, rules and expressions in guest memory
> comparing to bpf interpreter attack.
> 
> Then in nft_do_chain(struct nft_pktinfo *pkt, void *priv)
> pkt needs to point to fake struct sk_buff in guest memory with
> skb->head == target_byte_addr

We don't have a way to make this point to a fake struct sk_buff.

> The first nft expression can be nft_payload_fast_eval().
> If it's properly constructed with
> (nft_payload->based == NFT_PAYLOAD_NETWORK_HEADER, offset == 0, len == 0, 
> dreg == 1)

We can reject len == 0. To be honest, this is not done right now, but
we can add a patch to validate it. Since this is a specialized
networking virtual machine, it retains semantics: fetching zero-length
data from an skbuff makes no sense, so we can return EINVAL via
netlink when a rule that tries to do this is added.
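
A minimal sketch of the kind of check meant here, assuming it would run when
the rule is added rather than at evaluation time. The function name
payload_len_validate() and the upper bound SKETCH_MAX_PAYLOAD_LEN are
hypothetical and not taken from nf_tables:

```c
#include <errno.h>

/* Assumed upper bound, for illustration only; not an nf_tables value. */
#define SKETCH_MAX_PAYLOAD_LEN 64u

/*
 * Hypothetical rule-time validation: reject a payload fetch of zero
 * bytes, since it has no meaning for a packet-inspection expression.
 */
static int payload_len_validate(unsigned int len)
{
	if (len == 0)
		return -EINVAL;
	if (len > SKETCH_MAX_PAYLOAD_LEN)
		return -ERANGE;
	return 0;
}
```
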

> it will do arbitrary load of
> *(u8 *)dest = *(u8 *)ptr;
> from target_byte_addr into register 1 of nft state machine
> (dest is u32 array of registers in the stack of nft_do_chain)
> Second nft expression can be nft_bitwise_eval() to mask particular
> bit in register 1.
> Then nft_cmp_eval() to check whether bit is one or zero and
> conditional NFT_BREAK out of first nft expression into second nft rule.
> The last conditional nft_immediate_eval() in the first rule will set
> register 1 to 0x400 * 8 while the first nft_bitwise_eval() in
> the second rule with do r1 &= 0x400 * 8.
> So at this point r1 will have either 0x400 * 8 or 0 depending
> on value of speculatively loaded bit.
> The last expression can be nft_lookup_eval() with 
> nft_lookup->set->ops->lookup == nft_bitmap_lookup
> which will do nft_bitmap->bitmap[idx] where idx = r1 / 8
> The memory used for this last nft_lookup/bitmap expression is
> both an instruction and timing_leak_array itself.
> If I'm not mistaken, this sequence of nft expression will
> speculatively execute very similar logic as in evil_bytecode_instrs[]

My impression is that several assumptions above are not correct.

> The amount of actual speculative native cpu load/stores/branches is
> probably more than executed by bpf interpreter for these evil bytecodes,
> but likely well within cpu speculation window of 100+ insns.
> 
> Obviously such exploit is harder to do than

Re: [RFC][PATCH bpf v2 1/2] bpf: allow 64-bit offsets for bpf function calls

2018-02-22 Thread Michael Holzheu

Am Fri, 16 Feb 2018 21:20:09 +0530
schrieb "Naveen N. Rao" :

> Daniel Borkmann wrote:
> > On 02/15/2018 05:25 PM, Daniel Borkmann wrote:
> >> On 02/13/2018 05:05 AM, Sandipan Das wrote:
> >>> The imm field of a bpf_insn is a signed 32-bit integer. For
> >>> JIT-ed bpf-to-bpf function calls, it stores the offset from
> >>> __bpf_call_base to the start of the callee function.
> >>>
> >>> For some architectures, such as powerpc64, it was found that
> >>> this offset may be as large as 64 bits because of which this
> >>> cannot be accommodated in the imm field without truncation.
> >>>
> >>> To resolve this, we additionally make aux->func within each
> >>> bpf_prog associated with the functions to point to the list
> >>> of all function addresses determined by the verifier.
> >>>
> >>> We keep the value assigned to the off field of the bpf_insn
> >>> as a way to index into aux->func and also set aux->func_cnt
> >>> so that this can be used for performing basic upper bound
> >>> checks for the off field.
> >>>
> >>> Signed-off-by: Sandipan Das 
> >>> ---
> >>> v2: Make aux->func point to the list of functions determined
> >>> by the verifier rather than allocating a separate callee
> >>> list for each function.
> >> 
> >> Approach looks good to me; do you know whether s390x JIT would
> >> have similar requirement? I think one limitation that would still
> >> need to be addressed later with such approach would be regarding the
> >> xlated prog dump in bpftool, see 'BPF calls via JIT' in 7105e828c087
> >> ("bpf: allow for correlation of maps and helpers in dump"). Any
> >> ideas for this (potentially if we could use off + imm for calls,
> >> we'd get to 48 bits, but that seems still not be enough as you say)?
> 
> All good points. I'm not really sure how s390x works, so I can't comment 
> on that, but I'm copying Michael Holzheu for his consideration.
> 
> With the existing scheme, 48 bits won't be enough, so we rejected that 
> approach. I can also see how this will be a problem with bpftool, but I 
> haven't looked into it in detail. I wonder if we can annotate the output 
> to indicate the function being referred to?
> 
> > 
> > One other random thought, although I'm not sure how feasible this
> > is for ppc64 JIT to realize ... but idea would be to have something
> > like the below:
> > 
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 29ca920..daa7258 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -512,6 +512,11 @@ int bpf_get_kallsym(unsigned int symnum, unsigned long 
> > *value, char *type,
> > return ret;
> >  }
> > 
> > +void * __weak bpf_jit_image_alloc(unsigned long size)
> > +{
> > +   return module_alloc(size);
> > +}
> > +
> >  struct bpf_binary_header *
> >  bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
> >  unsigned int alignment,
> > @@ -525,7 +530,7 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 
> > **image_ptr,
> >  * random section of illegal instructions.
> >  */
> > size = round_up(proglen + sizeof(*hdr) + 128, PAGE_SIZE);
> > -   hdr = module_alloc(size);
> > +   hdr = bpf_jit_image_alloc(size);
> > if (hdr == NULL)
> > return NULL;
> > 
> > And ppc64 JIT could override bpf_jit_image_alloc() in a similar way
> > like some archs would override the module_alloc() helper through a
> > custom implementation, usually via __vmalloc_node_range(), so we
> > could perhaps fit the range for BPF JITed images in a way that they
> > could use the 32bit imm in the end? There are not that many progs
> > loaded typically, so the range could be a bit narrower in such case
> > anyway. (Not sure if this would work out though, but I thought to
> > bring it up.)
> 
> That'd be a good option to consider. I don't think we want to allocate 
> anything from the linear memory range since users could load 
> unprivileged BPF programs and consume a lot of memory that way. I doubt 
> if we can map vmalloc'ed memory into the 0xc0 address range, but I'm not 
> entirely sure.
> 
> Michael,
> Is the above possible? The question is if we can have BPF programs be 
> allocated within 4GB of __bpf_call_base (which is a kernel symbol), so 
> that calls to those programs can be encoded in a 32-bit immediate field 
> in a BPF instruction. As an extension, we may be able to extend it to 
> 48-bits by combining with another BPF instruction field (offset). In 
> either case, the vmalloc'ed address range won't work.
> 
> The alternative is to pass the full 64-bit address of the BPF program in 
> an auxiliary field (as proposed in this patch set) but we need to fix it 
> up for 'bpftool' as well.

Hi Naveen,

Our s390 kernel maintainer Martin Schwidefsky took over
eBPF responsibility for s390 from me.

@Martin: Can you answer Naveen's question?

Michael
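
The range question raised in the quoted text can be stated as a small check.
This is a sketch under assumptions, not code from the patch set:
fits_signed() and call_offset() are hypothetical helpers, and the addresses
used in the usage note below are illustrative values only. It tests whether
the distance between a callee and a base symbol fits a signed 32-bit field
(imm alone) or a signed 48-bit field (off + imm combined).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if v is representable as a signed integer of the given width.
 * Valid for widths from 2 to 63 bits.
 */
static bool fits_signed(int64_t v, unsigned int bits)
{
	int64_t max = ((int64_t)1 << (bits - 1)) - 1;
	int64_t min = -max - 1;

	return v >= min && v <= max;
}

/*
 * Distance from the base symbol to the callee. The subtraction is done
 * on unsigned values so it wraps instead of overflowing; the result is
 * then reinterpreted as signed (two's complement assumed).
 */
static int64_t call_offset(uint64_t callee, uint64_t base)
{
	return (int64_t)(callee - base);
}
```

For example, with a base of 0xc000000000000000 a callee 1 MiB above it fits
in 32 bits, while a callee at 0xd000000000000000 is 2^60 bytes away and fits
neither 32 nor 48 bits, which matches the concern that extending the field
to 48 bits would still not be enough.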

[PATCH] selftest: fix kselftest-merge depend on 'RUNTIME_TESTING_MENU'

2018-02-22 Thread Zong Li

Since commit d3deafaa8b5c ("lib/: make RUNTIME_TESTS a menuconfig
to ease disabling it all"), 'make kselftest-merge' cannot merge the
kselftest config dependencies into the existing .config file.

These kselftest config dependencies also need
'CONFIG_RUNTIME_TESTING_MENU=y' to be enabled at the same time.

Signed-off-by: Zong Li 
Cc: Greentime Hu 
---
 tools/testing/selftests/bpf/config | 1 +
 tools/testing/selftests/firmware/config| 1 +
 tools/testing/selftests/kmod/config| 1 +
 tools/testing/selftests/lib/config | 1 +
 tools/testing/selftests/net/config | 1 +
 tools/testing/selftests/static_keys/config | 1 +
 tools/testing/selftests/sysctl/config  | 1 +
 tools/testing/selftests/user/config| 1 +
 8 files changed, 8 insertions(+)

diff --git a/tools/testing/selftests/bpf/config 
b/tools/testing/selftests/bpf/config
index 983dd25d49f4..d93b82144b19 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -1,3 +1,4 @@
+CONFIG_RUNTIME_TESTING_MENU=y
 CONFIG_BPF=y
 CONFIG_BPF_SYSCALL=y
 CONFIG_NET_CLS_BPF=m
diff --git a/tools/testing/selftests/firmware/config 
b/tools/testing/selftests/firmware/config
index c8137f70e291..01d7445ef007 100644
--- a/tools/testing/selftests/firmware/config
+++ b/tools/testing/selftests/firmware/config
@@ -1 +1,2 @@
+CONFIG_RUNTIME_TESTING_MENU=y
 CONFIG_TEST_FIRMWARE=y
diff --git a/tools/testing/selftests/kmod/config 
b/tools/testing/selftests/kmod/config
index 259f4fd6b5e2..37070985e428 100644
--- a/tools/testing/selftests/kmod/config
+++ b/tools/testing/selftests/kmod/config
@@ -1,3 +1,4 @@
+CONFIG_RUNTIME_TESTING_MENU=y
 CONFIG_TEST_KMOD=m
 CONFIG_TEST_LKM=m
 CONFIG_XFS_FS=m
diff --git a/tools/testing/selftests/lib/config 
b/tools/testing/selftests/lib/config
index 126933bcc950..d1fe14c2c8cb 100644
--- a/tools/testing/selftests/lib/config
+++ b/tools/testing/selftests/lib/config
@@ -1,3 +1,4 @@
+CONFIG_RUNTIME_TESTING_MENU=y
 CONFIG_TEST_PRINTF=m
 CONFIG_TEST_BITMAP=m
 CONFIG_PRIME_NUMBERS=m
diff --git a/tools/testing/selftests/net/config 
b/tools/testing/selftests/net/config
index 7177bea1fdfa..847a99873128 100644
--- a/tools/testing/selftests/net/config
+++ b/tools/testing/selftests/net/config
@@ -1,3 +1,4 @@
+CONFIG_RUNTIME_TESTING_MENU=y
 CONFIG_USER_NS=y
 CONFIG_BPF_SYSCALL=y
 CONFIG_TEST_BPF=m
diff --git a/tools/testing/selftests/static_keys/config 
b/tools/testing/selftests/static_keys/config
index d538fb774b96..732d17f6b9a1 100644
--- a/tools/testing/selftests/static_keys/config
+++ b/tools/testing/selftests/static_keys/config
@@ -1 +1,2 @@
+CONFIG_RUNTIME_TESTING_MENU=y
 CONFIG_TEST_STATIC_KEYS=m
diff --git a/tools/testing/selftests/sysctl/config 
b/tools/testing/selftests/sysctl/config
index 6ca14800d755..772ce8c3c0d9 100644
--- a/tools/testing/selftests/sysctl/config
+++ b/tools/testing/selftests/sysctl/config
@@ -1 +1,2 @@
+CONFIG_RUNTIME_TESTING_MENU=y
 CONFIG_TEST_SYSCTL=y
diff --git a/tools/testing/selftests/user/config 
b/tools/testing/selftests/user/config
index 784ed8416324..f9f491fa4ae8 100644
--- a/tools/testing/selftests/user/config
+++ b/tools/testing/selftests/user/config
@@ -1 +1,2 @@
+CONFIG_RUNTIME_TESTING_MENU=y
 CONFIG_TEST_USER_COPY=m
-- 
2.16.1

Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

2018-02-22 Thread Or Gerlitz

On Thu, Feb 22, 2018 at 10:11 AM, Jiri Pirko  wrote:
> Wed, Feb 21, 2018 at 09:57:09PM CET, alexander.du...@gmail.com wrote:

>>The signaling isn't too much of an issue since we can just tweak the
>>link state of the VF or virtio manually to report the link up or down
>>prior to the hot-plug. Now that we are on the same page with the team0

> Oh, so you just do "ip link set vfrepresentor down" in the host.
> That makes sense. I'm pretty sure that this is not implemented for all
> drivers now.

mlx5 supports that: on the representor close ndo we take the VF's
operational v-link down.

We should probably also put into the picture some/more aspects
from the host side of things. The provisioning of the v-switch now
has to deal with two channels going into the VM, the PV (virtio)
one and the PT (VF) one.

This should probably boil down to applying teaming/bonding between
the VF representor and a PV backend device, e.g. TAP.

[Crypto v7 10/12] chtls: Inline crypto request Tx/Rx

2018-02-22 Thread Atul Gupta

TLS handler for record transmit and receive.
Create Inline TLS work request and post to FW.

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chtls/chtls_io.c | 1867 +++
 1 file changed, 1867 insertions(+)
 create mode 100644 drivers/crypto/chelsio/chtls/chtls_io.c

diff --git a/drivers/crypto/chelsio/chtls/chtls_io.c 
b/drivers/crypto/chelsio/chtls/chtls_io.c
new file mode 100644
index 000..0c5d6c1
--- /dev/null
+++ b/drivers/crypto/chelsio/chtls/chtls_io.c
@@ -0,0 +1,1867 @@
+/*
+ * Copyright (c) 2017 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Written by: Atul Gupta (atul.gu...@chelsio.com)
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "chtls.h"
+#include "chtls_cm.h"
+
+static bool is_tls_hw(struct chtls_sock *csk)
+{
+   return csk->tlshws.ofld;
+}
+
+static bool is_tls_rx(struct chtls_sock *csk)
+{
+   return (csk->tlshws.rxkey >= 0);
+}
+
+static bool is_tls_tx(struct chtls_sock *csk)
+{
+   return (csk->tlshws.txkey >= 0);
+}
+
+static bool is_tls_skb(struct chtls_sock *csk, const struct sk_buff *skb)
+{
+   return (is_tls_hw(csk) && skb_ulp_tls_skb_flags(skb));
+}
+
+static int key_size(void *sk)
+{
+   return 16; /* Key on DDR */
+}
+
+#define ceil(x, y) \
+   ({ unsigned long __x = (x), __y = (y); (__x + __y - 1) / __y; })
+
+static int data_sgl_len(const struct sk_buff *skb)
+{
+   unsigned int cnt;
+
+   cnt = skb_shinfo(skb)->nr_frags;
+   return (sgl_len(cnt) * 8);
+}
+
+static int nos_ivs(struct sock *sk, unsigned int size)
+{
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+
+   return ceil(size, csk->tlshws.mfs);
+}
+
+#define TLS_WR_CPL_LEN \
+   (sizeof(struct fw_tlstx_data_wr) + \
+   sizeof(struct cpl_tx_tls_sfo))
+
+static int is_ivs_imm(struct sock *sk, const struct sk_buff *skb)
+{
+   int ivs_size = nos_ivs(sk, skb->len) * CIPHER_BLOCK_SIZE;
+   int hlen = TLS_WR_CPL_LEN + data_sgl_len(skb);
+
+   if ((hlen + key_size(sk) + ivs_size) <
+   MAX_IMM_OFLD_TX_DATA_WR_LEN) {
+   ULP_SKB_CB(skb)->ulp.tls.iv = 1;
+   return 1;
+   }
+   ULP_SKB_CB(skb)->ulp.tls.iv = 0;
+   return 0;
+}
+
+static int max_ivs_size(struct sock *sk, int size)
+{
+   return (nos_ivs(sk, size) * CIPHER_BLOCK_SIZE);
+}
+
+static int ivs_size(struct sock *sk, const struct sk_buff *skb)
+{
+   return (is_ivs_imm(sk, skb) ? (nos_ivs(sk, skb->len) *
+CIPHER_BLOCK_SIZE) : 0);
+}
+
+static int flowc_wr_credits(int nparams, int *flowclenp)
+{
+   int flowclen16, flowclen;
+
+   flowclen = offsetof(struct fw_flowc_wr, mnemval[nparams]);
+   flowclen16 = DIV_ROUND_UP(flowclen, 16);
+   flowclen = flowclen16 * 16;
+
+   if (flowclenp)
+   *flowclenp = flowclen;
+
+   return flowclen16;
+}
+
+static struct sk_buff *create_flowc_wr_skb(struct sock *sk,
+  struct fw_flowc_wr *flowc,
+  int flowclen)
+{
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+   struct sk_buff *skb;
+
+   skb = alloc_skb(flowclen, GFP_ATOMIC);
+   if (!skb)
+   return NULL;
+
+   memcpy(__skb_put(skb, flowclen), flowc, flowclen);
+   set_queue(skb, (csk->txq_idx << 1) | CPL_PRIORITY_DATA, sk);
+
+   return skb;
+}
+
+static int send_flowc_wr(struct sock *sk, struct fw_flowc_wr *flowc,
+int flowclen)
+{
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+   bool syn_sent = (sk->sk_state == TCP_SYN_SENT);
+   struct tcp_sock *tp = tcp_sk(sk);
+   int flowclen16 = flowclen / 16;
+   struct sk_buff *skb;
+
+   if (csk_flag(sk, CSK_TX_DATA_SENT)) {
+   skb = create_flowc_wr_skb(sk, flowc, flowclen);
+   if (!skb)
+   return -ENOMEM;
+
+   if (syn_sent)
+   __skb_queue_tail(&csk->ooo_queue, skb);
+   else
+   skb_entail(sk, skb,
+  ULPCB_FLAG_NO_HDR | ULPCB_FLAG_NO_APPEND);
+   return 0;
+   }
+
+   if (!syn_sent) {
+   int ret;
+
+   ret = cxgb4_immdata_send(csk->egress_dev,
+csk->txq_idx,
+flowc, flowclen);
+   if (!ret)
+   return flowclen16;
+   }
+   skb = create_flowc_wr_skb(sk, flowc, flowclen);
+   if (!skb)
+   return -ENOMEM;
+   send_or_defer(sk, tp, skb, 0);
+   return flowclen16;
+}
+
+static u8
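
The rounding helpers in the hunk above (the local ceil() macro used by
nos_ivs(), and the DIV_ROUND_UP() to 16-byte units in flowc_wr_credits())
can be tried out in userspace. The following is only a sketch of that
arithmetic: div_round_up(), records_for() and len16_credits() are names
made up for this illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's DIV_ROUND_UP(); assumes d > 0. */
unsigned long div_round_up(unsigned long n, unsigned long d)
{
	return (n + d - 1) / d;
}

/* Number of records (and so IVs) needed to carry `size` bytes when each
 * record holds at most `mfs` bytes. `mfs` plays the role that
 * csk->tlshws.mfs has in nos_ivs().
 */
unsigned long records_for(unsigned long size, unsigned long mfs)
{
	return div_round_up(size, mfs);
}

/* Round a length up to whole 16-byte units, the way flowc_wr_credits()
 * does. Returns the unit count and, when `padded` is non-NULL, stores the
 * padded byte length through it.
 */
unsigned long len16_credits(unsigned long len, unsigned long *padded)
{
	unsigned long len16 = div_round_up(len, 16);

	if (padded)
		*padded = len16 * 16;
	return len16;
}
```

For example, a 40-byte request costs 3 units and is padded to 48 bytes,
while a payload one byte larger than the maximum fragment size needs two
records.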

[Crypto v7 05/12] cxgb4: Inline TLS FW Interface

2018-02-22 Thread Atul Gupta

Key area size in hw-config file. CPL struct for TLS request
and response. Work request for Inline TLS.

Signed-off-by: Atul Gupta 
---
 drivers/net/ethernet/chelsio/cxgb4/t4_msg.h   | 121 ++-
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h  |   2 +
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h | 165 +-
 3 files changed, 283 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h 
b/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h
index d0db442..507cb5a 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h
@@ -81,6 +81,7 @@ enum {
CPL_RX_ISCSI_CMP  = 0x45,
CPL_TRACE_PKT_T5  = 0x48,
CPL_RX_ISCSI_DDP  = 0x49,
+   CPL_RX_TLS_CMP= 0x4E,
 
CPL_RDMA_READ_REQ = 0x60,
 
@@ -88,6 +89,7 @@ enum {
CPL_ACT_OPEN_REQ6 = 0x83,
 
CPL_TX_TLS_PDU =0x88,
+   CPL_TX_TLS_SFO= 0x89,
CPL_TX_SEC_PDU= 0x8A,
CPL_TX_TLS_ACK= 0x8B,
 
@@ -97,6 +99,7 @@ enum {
CPL_RX_MPS_PKT= 0xAF,
 
CPL_TRACE_PKT = 0xB0,
+   CPL_TLS_DATA  = 0xB1,
CPL_ISCSI_DATA= 0xB2,
 
CPL_FW4_MSG   = 0xC0,
@@ -151,6 +154,7 @@ enum {
ULP_MODE_RDMA  = 4,
ULP_MODE_TCPDDP= 5,
ULP_MODE_FCOE  = 6,
+   ULP_MODE_TLS   = 8,
 };
 
 enum {
@@ -1415,6 +1419,14 @@ struct cpl_tx_data {
 #define TX_FORCE_S 13
 #define TX_FORCE_V(x)  ((x) << TX_FORCE_S)
 
+#define TX_SHOVE_S    14
+#define TX_SHOVE_V(x) ((x) << TX_SHOVE_S)
+
+#define TX_ULP_MODE_S    10
+#define TX_ULP_MODE_M    0x7
+#define TX_ULP_MODE_V(x) ((x) << TX_ULP_MODE_S)
+#define TX_ULP_MODE_G(x) (((x) >> TX_ULP_MODE_S) & TX_ULP_MODE_M)
+
 #define T6_TX_FORCE_S  20
 #define T6_TX_FORCE_V(x)   ((x) << T6_TX_FORCE_S)
 #define T6_TX_FORCE_F  T6_TX_FORCE_V(1U)
@@ -1429,12 +1441,21 @@ enum {
ULP_TX_SC_NOOP = 0x80,
ULP_TX_SC_IMM  = 0x81,
ULP_TX_SC_DSGL = 0x82,
-   ULP_TX_SC_ISGL = 0x83
+   ULP_TX_SC_ISGL = 0x83,
+   ULP_TX_SC_MEMRD = 0x86
 };
 
 #define ULPTX_CMD_S    24
 #define ULPTX_CMD_V(x) ((x) << ULPTX_CMD_S)
 
+#define ULPTX_LEN16_S    0
+#define ULPTX_LEN16_M    0xFF
+#define ULPTX_LEN16_V(x) ((x) << ULPTX_LEN16_S)
+
+#define ULP_TX_SC_MORE_S 23
+#define ULP_TX_SC_MORE_V(x) ((x) << ULP_TX_SC_MORE_S)
+#define ULP_TX_SC_MORE_F  ULP_TX_SC_MORE_V(1U)
+
 struct ulptx_sge_pair {
__be32 len[2];
__be64 addr[2];
@@ -2112,4 +2133,102 @@ enum {
X_CPL_RX_MPS_PKT_TYPE_QFC   = 1 << 2,
X_CPL_RX_MPS_PKT_TYPE_PTP   = 1 << 3
 };
+
+struct cpl_tx_tls_sfo {
+   __be32 op_to_seg_len;
+   __be32 pld_len;
+   __be32 type_protover;
+   __be32 r1_lo;
+   __be32 seqno_numivs;
+   __be32 ivgen_hdrlen;
+   __be64 scmd1;
+};
+
+/* cpl_tx_tls_sfo macros */
+#define CPL_TX_TLS_SFO_OPCODE_S 24
+#define CPL_TX_TLS_SFO_OPCODE_V(x)  ((x) << CPL_TX_TLS_SFO_OPCODE_S)
+
+#define CPL_TX_TLS_SFO_DATA_TYPE_S  20
+#define CPL_TX_TLS_SFO_DATA_TYPE_V(x)   ((x) << CPL_TX_TLS_SFO_DATA_TYPE_S)
+
+#define CPL_TX_TLS_SFO_CPL_LEN_S    16
+#define CPL_TX_TLS_SFO_CPL_LEN_V(x) ((x) << CPL_TX_TLS_SFO_CPL_LEN_S)
+
+#define CPL_TX_TLS_SFO_SEG_LEN_S    0
+#define CPL_TX_TLS_SFO_SEG_LEN_M    0x
+#define CPL_TX_TLS_SFO_SEG_LEN_V(x) ((x) << CPL_TX_TLS_SFO_SEG_LEN_S)
+#define CPL_TX_TLS_SFO_SEG_LEN_G(x) \
+   (((x) >> CPL_TX_TLS_SFO_SEG_LEN_S) & CPL_TX_TLS_SFO_SEG_LEN_M)
+
+#define CPL_TX_TLS_SFO_TYPE_S   24
+#define CPL_TX_TLS_SFO_TYPE_M   0xff
+#define CPL_TX_TLS_SFO_TYPE_V(x)    ((x) << CPL_TX_TLS_SFO_TYPE_S)
+#define CPL_TX_TLS_SFO_TYPE_G(x)    \
+   (((x) >> CPL_TX_TLS_SFO_TYPE_S) & CPL_TX_TLS_SFO_TYPE_M)
+
+#define CPL_TX_TLS_SFO_PROTOVER_S   8
+#define CPL_TX_TLS_SFO_PROTOVER_M   0x
+#define CPL_TX_TLS_SFO_PROTOVER_V(x)    ((x) << CPL_TX_TLS_SFO_PROTOVER_S)
+#define CPL_TX_TLS_SFO_PROTOVER_G(x)    \
+   (((x) >> CPL_TX_TLS_SFO_PROTOVER_S) & CPL_TX_TLS_SFO_PROTOVER_M)
+
+struct cpl_tls_data {
+   struct rss_header rsshdr;
+   union opcode_tid ot;
+   __be32 length_pkd;
+   __be32 seq;
+   __be32 r1;
+};
+
+#define CPL_TLS_DATA_OPCODE_S   24
+#define CPL_TLS_DATA_OPCODE_M   0xff
+#define CPL_TLS_DATA_OPCODE_V(x)    ((x) << CPL_TLS_DATA_OPCODE_S)
+#define CPL_TLS_DATA_OPCODE_G(x)    \
+   (((x) >> CPL_TLS_DATA_OPCODE_S) & CPL_TLS_DATA_OPCODE_M)
+
+#define CPL_TLS_DATA_TID_S  0
+#define CPL_TLS_DATA_TID_M  0xff
+#define CPL_TLS_DATA_TID_V(x)   ((x) << CPL_TLS_DATA_TID_S)
+#define CPL_TLS_DATA_TID_G(x)   \
+   (((x) >> CPL_TLS_DATA_TID_S) & CPL_TLS_DATA_TID_M)
+
+#define CPL_TLS_DATA_LENGTH_S   0
+#define CPL_TLS_DATA_LENGTH_M   0x
+#define
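
The macros in this hunk follow the cxgb4 naming pattern of _S (shift),
_M (mask), _V (place a value into the field) and _G (get the field back
out). A small userspace sketch of how such a set is meant to be used is
given below. The DEMO_TYPE and DEMO_LEN fields and demo_pack() are invented
for the illustration and do not correspond to any real CPL field.

```c
#include <assert.h>
#include <stdint.h>

/* DEMO_TYPE is a made-up 8-bit field occupying bits 31:24. */
#define DEMO_TYPE_S    24
#define DEMO_TYPE_M    0xffU
#define DEMO_TYPE_V(x) ((uint32_t)(x) << DEMO_TYPE_S)
#define DEMO_TYPE_G(x) (((x) >> DEMO_TYPE_S) & DEMO_TYPE_M)

/* DEMO_LEN is a made-up 16-bit field occupying bits 15:0. */
#define DEMO_LEN_S     0
#define DEMO_LEN_M     0xffffU
#define DEMO_LEN_V(x)  ((uint32_t)(x) << DEMO_LEN_S)
#define DEMO_LEN_G(x)  (((x) >> DEMO_LEN_S) & DEMO_LEN_M)

/* Pack both fields into one host-order 32-bit word. Inputs are masked
 * first so an oversized value cannot spill into a neighbouring field.
 * A driver storing this in a __be32 member would presumably still need a
 * byte-order conversion, which is left out here.
 */
uint32_t demo_pack(uint32_t type, uint32_t len)
{
	return DEMO_TYPE_V(type & DEMO_TYPE_M) | DEMO_LEN_V(len & DEMO_LEN_M);
}
```

Packing type 0x17 and length 0x1234 gives 0x17001234, and the two _G()
macros recover the original values from that word.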

[RFC PATCH V2] virtio_pci: Add SR-IOV support

2018-02-22 Thread Mark Rustad

Hardware-realized virtio-pci devices can implement SR-IOV, so this
patch enables its use. The device in question is an upcoming Intel
NIC that implements both a virtio-net PF and virtio-net VFs. These
are hardware realizations of what has, up to now, been a software
interface.

The device in question has the following 4-part PCI IDs:

PF: vendor: 1af4 device: 1041 subvendor: 8086 subdevice: 15fe
VF: vendor: 1af4 device: 1041 subvendor: 8086 subdevice: 05fe

The patch needs no check for device ID, because the callback will
never be made for devices that do not assert the capability or
when run on a platform incapable of SR-IOV.

One reason for this patch is that the hardware requires the
vendor ID of a VF to be the same as the vendor ID of the PF that
created it. So it seemed logical to simply have a fully-functioning
virtio-net PF create the VFs. This patch makes that possible.

Signed-off-by: Mark Rustad 
Reviewed-by: Alexander Duyck 
---
Changes in V2:
- Simplified logic from previous version, removed added driver variable
- Disable SR-IOV on driver removal except when VFs are assigned
- Sent as RFC to virtio-dev, linux-pci, netdev, lkml and others
---
 drivers/virtio/virtio_pci_common.c |   47 
 1 file changed, 47 insertions(+)

diff --git a/drivers/virtio/virtio_pci_common.c 
b/drivers/virtio/virtio_pci_common.c
index 48d4d1cf1cb6..78b53ffc4cee 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -572,6 +572,47 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
return rc;
 }
 
+#ifdef CONFIG_PCI_IOV
+static int virtio_pci_sriov_disable(struct pci_dev *pci_dev)
+{
+   /* If vfs are assigned we cannot shut down SR-IOV without causing
+* issues, so just leave the hardware available.
+*/
+   if (pci_vfs_assigned(pci_dev)) {
+   dev_warn(&pci_dev->dev,
+"Unloading driver while VFs are assigned - VFs will not be deallocated\n");
+   return -EPERM;
+   }
+   pci_disable_sriov(pci_dev);
+   return 0;
+}
+
+static int virtio_pci_sriov_enable(struct pci_dev *pci_dev, int num_vfs)
+{
+   int rc = 0;
+
+   if (pci_num_vf(pci_dev))
+   return -EINVAL;
+
+   rc = pci_enable_sriov(pci_dev, num_vfs);
+   if (rc) {
+   dev_warn(&pci_dev->dev, "Failed to enable PCI sriov: %d\n", rc);
+   return rc;
+   }
+   dev_info(&pci_dev->dev, "SR-IOV enabled with %d VFs\n", num_vfs);
+   return num_vfs;
+}
+
+static int virtio_pci_sriov_configure(struct pci_dev *dev, int num_vfs)
+{
+   if (num_vfs)
+   return virtio_pci_sriov_enable(dev, num_vfs);
+   if (!pci_num_vf(dev))
+   return -EINVAL;
+   return virtio_pci_sriov_disable(dev);
+}
+#endif /* CONFIG_PCI_IOV */
+
 static void virtio_pci_remove(struct pci_dev *pci_dev)
 {
struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
@@ -584,6 +625,9 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
else
virtio_pci_modern_remove(vp_dev);
 
+#ifdef CONFIG_PCI_IOV
+   virtio_pci_sriov_disable(pci_dev);
+#endif
pci_disable_device(pci_dev);
put_device(dev);
 }
@@ -596,6 +640,9 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
 #ifdef CONFIG_PM_SLEEP
 .driver.pm  = &virtio_pci_pm_ops,
 #endif
+#ifdef CONFIG_PCI_IOV
+   .sriov_configure = virtio_pci_sriov_configure,
+#endif
 };
 
 module_pci_driver(virtio_pci_driver);

[Crypto v7 11/12] chtls: Register chtls Inline TLS with net tls

2018-02-22 Thread Atul Gupta

Register chtls as Inline TLS driver, chtls is ULD to cxgb4.
Setsockopt to program (tx/rx) keys on chip. Support AES GCM
of key size 128. Support both Inline Rx and Tx.

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chtls/chtls_main.c | 600 ++
 include/uapi/linux/tls.h  |   1 +
 2 files changed, 601 insertions(+)
 create mode 100644 drivers/crypto/chelsio/chtls/chtls_main.c

diff --git a/drivers/crypto/chelsio/chtls/chtls_main.c 
b/drivers/crypto/chelsio/chtls/chtls_main.c
new file mode 100644
index 000..657c515
--- /dev/null
+++ b/drivers/crypto/chelsio/chtls/chtls_main.c
@@ -0,0 +1,600 @@
+/*
+ * Copyright (c) 2017 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Written by: Atul Gupta (atul.gu...@chelsio.com)
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "chtls.h"
+#include "chtls_cm.h"
+
+#define DRV_NAME "chtls"
+
+/*
+ * chtls device management
+ * maintains a list of the chtls devices
+ */
+static LIST_HEAD(cdev_list);
+static DEFINE_MUTEX(cdev_mutex);
+static DEFINE_MUTEX(cdev_list_lock);
+
+static struct proto chtls_cpl_prot;
+static struct proto chtls_base_prot;
+static DEFINE_MUTEX(notify_mutex);
+static RAW_NOTIFIER_HEAD(listen_notify_list);
+struct request_sock_ops chtls_rsk_ops;
+static uint send_page_order = (14 - PAGE_SHIFT < 0) ? 0 : 14 - PAGE_SHIFT;
+
+static int register_listen_notifier(struct notifier_block *nb)
+{
+   int err;
+
+   mutex_lock(&notify_mutex);
+   err = raw_notifier_chain_register(&listen_notify_list, nb);
+   mutex_unlock(&notify_mutex);
+   return err;
+}
+
+static int unregister_listen_notifier(struct notifier_block *nb)
+{
+   int err;
+
+   mutex_lock(&notify_mutex);
+   err = raw_notifier_chain_unregister(&listen_notify_list, nb);
+   mutex_unlock(&notify_mutex);
+   return err;
+}
+
+static int listen_notify_handler(struct notifier_block *this,
+unsigned long event, void *data)
+{
+   struct sock *sk = data;
+   struct chtls_dev *cdev;
+   int ret =  NOTIFY_DONE;
+
+   switch (event) {
+   case CHTLS_LISTEN_START:
+   case CHTLS_LISTEN_STOP:
+   mutex_lock(&cdev_list_lock);
+   list_for_each_entry(cdev, &cdev_list, list) {
+   if (event == CHTLS_LISTEN_START)
+   ret = chtls_listen_start(cdev, sk);
+   else
+   chtls_listen_stop(cdev, sk);
+   }
+   mutex_unlock(&cdev_list_lock);
+   break;
+   }
+   return ret;
+}
+
+static struct notifier_block listen_notifier = {
+   .notifier_call = listen_notify_handler
+};
+
+static int listen_backlog_rcv(struct sock *sk, struct sk_buff *skb)
+{
+   if (likely(skb_transport_header(skb) != skb_network_header(skb)))
+   return tcp_v4_do_rcv(sk, skb);
+   BLOG_SKB_CB(skb)->backlog_rcv(sk, skb);
+   return 0;
+}
+
+static int chtls_start_listen(struct sock *sk)
+{
+   int err;
+
+   if (sk->sk_protocol != IPPROTO_TCP)
+   return -EPROTONOSUPPORT;
+
+   if (sk->sk_family == PF_INET &&
+   LOOPBACK(inet_sk(sk)->inet_rcv_saddr))
+   return -EADDRNOTAVAIL;
+
+   sk->sk_backlog_rcv = listen_backlog_rcv;
+   mutex_lock(&notify_mutex);
+   err = raw_notifier_call_chain(&listen_notify_list,
+ CHTLS_LISTEN_START, sk);
+   mutex_unlock(&notify_mutex);
+   return err;
+}
+
+static int chtls_hash(struct sock *sk)
+{
+   int err;
+
+   err = tcp_prot.hash(sk);
+   if (sk->sk_state == TCP_LISTEN)
+   err |= chtls_start_listen(sk);
+
+   if (err)
+   tcp_prot.unhash(sk);
+   return err;
+}
+
+static int chtls_stop_listen(struct sock *sk)
+{
+   if (sk->sk_protocol != IPPROTO_TCP)
+   return -EPROTONOSUPPORT;
+
+   mutex_lock(&notify_mutex);
+   raw_notifier_call_chain(&listen_notify_list,
+   CHTLS_LISTEN_STOP, sk);
+   mutex_unlock(&notify_mutex);
+   return 0;
+}
+
+static void chtls_unhash(struct sock *sk)
+{
+   if (sk->sk_state == TCP_LISTEN)
+   chtls_stop_listen(sk);
+   tcp_prot.unhash(sk);
+}
+
+static int chtls_netdev(struct tls_device *dev,
+   struct net_device *netdev)
+{
+   struct chtls_dev *cdev = to_chtls_dev(dev);
+   int i;
+
+   for (i = 0; i < cdev->lldi->nports; i++)
+   if (cdev->ports[i] == netdev)
+   return 1;
+
+   return 0;
+}
+
+static int chtls_inline_feature(struct tls_device *dev)
+{
+   struct chtls_dev *cdev = to_chtls_dev(dev);
+   struct net_device *netdev;
+   int i;
+
+

Re: ppp/pppoe, still panic 4.15.3 in ppp_push

2018-02-22 Thread Denys Fedoryshchenko


On 2018-02-22 20:30, Guillaume Nault wrote:

On Wed, Feb 21, 2018 at 12:04:30PM -0800, Cong Wang wrote:
On Thu, Feb 15, 2018 at 11:31 AM, Guillaume Nault wrote:

> On Thu, Feb 15, 2018 at 06:01:16PM +0200, Denys Fedoryshchenko wrote:
>> On 2018-02-15 17:55, Guillaume Nault wrote:
>> > On Thu, Feb 15, 2018 at 12:19:52PM +0200, Denys Fedoryshchenko wrote:
>> > > Here we go:
>> > >
>> > >   [24558.921549]
>> > > ==
>> > >   [24558.922167] BUG: KASAN: use-after-free in
>> > > ppp_ioctl+0xa6a/0x1522
>> > > [ppp_generic]
>> > >   [24558.922776] Write of size 8 at addr 8803d35bf3f8 by task
>> > > accel-pppd/12622
>> > >   [24558.923113]
>> > >   [24558.923451] CPU: 0 PID: 12622 Comm: accel-pppd Tainted: G
>> > > W
>> > > 4.15.3-build-0134 #1
>> > >   [24558.924058] Hardware name: HP ProLiant DL320e Gen8 v2,
>> > > BIOS P80
>> > > 04/02/2015
>> > >   [24558.924406] Call Trace:
>> > >   [24558.924753]  dump_stack+0x46/0x59
>> > >   [24558.925103]  print_address_description+0x6b/0x23b
>> > >   [24558.925451]  ? ppp_ioctl+0xa6a/0x1522 [ppp_generic]
>> > >   [24558.925797]  kasan_report+0x21b/0x241
>> > >   [24558.926136]  ppp_ioctl+0xa6a/0x1522 [ppp_generic]
>> > >   [24558.926479]  ? ppp_nl_newlink+0x1da/0x1da [ppp_generic]
>> > >   [24558.926829]  ? sock_sendmsg+0x89/0x99
>> > >   [24558.927176]  ? __vfs_write+0xd9/0x4ad
>> > >   [24558.927523]  ? kernel_read+0xed/0xed
>> > >   [24558.927872]  ? SyS_getpeername+0x18c/0x18c
>> > >   [24558.928213]  ? bit_waitqueue+0x2a/0x2a
>> > >   [24558.928561]  ? wake_atomic_t_function+0x115/0x115
>> > >   [24558.928898]  vfs_ioctl+0x6e/0x81
>> > >   [24558.929228]  do_vfs_ioctl+0xa00/0xb10
>> > >   [24558.929571]  ? sigprocmask+0x1a6/0x1d0
>> > >   [24558.929907]  ? sigsuspend+0x13e/0x13e
>> > >   [24558.930239]  ? ioctl_preallocate+0x14e/0x14e
>> > >   [24558.930568]  ? SyS_rt_sigprocmask+0xf1/0x142
>> > >   [24558.930904]  ? sigprocmask+0x1d0/0x1d0
>> > >   [24558.931252]  SyS_ioctl+0x39/0x55
>> > >   [24558.931595]  ? do_vfs_ioctl+0xb10/0xb10
>> > >   [24558.931942]  do_syscall_64+0x1b1/0x31f
>> > >   [24558.932288]  entry_SYSCALL_64_after_hwframe+0x21/0x86
>> > >   [24558.932627] RIP: 0033:0x7f302849d8a7
>> > >   [24558.932965] RSP: 002b:7f3029a52af8 EFLAGS: 0206
>> > > ORIG_RAX:
>> > > 0010
>> > >   [24558.933578] RAX: ffda RBX: 7f3027d861e3 RCX:
>> > > 7f302849d8a7
>> > >   [24558.933927] RDX: 7f3023f49468 RSI: 4004743a RDI:
>> > > 3a67
>> > >   [24558.934266] RBP: 7f3029a52b20 R08:  R09:
>> > > 55c8308d8e40
>> > >   [24558.934607] R10: 0008 R11: 0206 R12:
>> > > 7f3023f49358
>> > >   [24558.934947] R13: 7ffe86e5723f R14:  R15:
>> > > 7f3029a53700
>> > >   [24558.935288]
>> > >   [24558.935626] Allocated by task 12622:
>> > >   [24558.935972]  ppp_register_net_channel+0x5f/0x5c6
>> > > [ppp_generic]
>> > >   [24558.936306]  pppoe_connect+0xab7/0xc71 [pppoe]
>> > >   [24558.936640]  SyS_connect+0x14b/0x1b7
>> > >   [24558.936975]  do_syscall_64+0x1b1/0x31f
>> > >   [24558.937319]  entry_SYSCALL_64_after_hwframe+0x21/0x86
>> > >   [24558.937655]
>> > >   [24558.937993] Freed by task 12622:
>> > >   [24558.938321]  kfree+0xb0/0x11d
>> > >   [24558.938658]  ppp_release+0x111/0x120 [ppp_generic]
>> > >   [24558.938994]  __fput+0x2ba/0x51a
>> > >   [24558.939332]  task_work_run+0x11c/0x13d
>> > >   [24558.939676]  exit_to_usermode_loop+0x7c/0xaf
>> > >   [24558.940022]  do_syscall_64+0x2ea/0x31f
>> > >   [24558.940368]  entry_SYSCALL_64_after_hwframe+0x21/0x86
>> > >   [24558.947099]
>> >
>> > Your first guess was right. It looks like we have an issue with
>> > reference counting on the channels. Can you send me your ppp_generic.o?
>> http://nuclearcat.com/ppp_generic.o
>> Compiled with gcc version 6.4.0 (Gentoo 6.4.0-r1 p1.3)
>>
> From what I can see, ppp_release() and ioctl(PPPIOCCONNECT) are called
> concurrently on the same ppp_file. Even if this ppp_file was pointed at
> by two different file descriptors, I can't see how this could defeat
> the reference counting mechanism. I'm going to think more about it.

For me it looks like pch->clist is not removed from the list ppp->channels
when destroyed via ppp_release(). But I don't want to pretend I understand
ppp logic.


I've thought about that too, but couldn't find a scenario that could
trigger the bug.

To get ->private_data pointing to a struct channel pointer, a file needs
to ioctl(PPPIOCATTCHAN) first. For this call to succeed, the channel
must have been registered with ppp_register_net_channel(). Both
operations take a reference on the channel, which means that, before
adding pch->clist to a ppp->channels list (with ppp_connect_channel()),
the channel is already held by a /dev/ppp file and by the code that
registered the channel in the first place.

Therefore, closing the /dev/ppp

Re: [RFC PATCH V2] virtio_pci: Add SR-IOV support

2018-02-22 Thread Christoph Hellwig

Can we move this into common code as a generic_sriov_configure
helper?  Nothing is really virtio specific, and it seems like
some other drivers could also use it, e.g. ena or nvme.
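
The control flow such a helper would have to carry is small. Below is a
userspace model of the logic in the RFC's virtio_pci_sriov_configure(),
offered only as a sketch: struct fake_dev and model_sriov_configure() are
hypothetical stand-ins, and the two assignments marked in the comments
take the place of pci_enable_sriov() and pci_disable_sriov().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the bits of struct pci_dev that matter here. */
struct fake_dev {
	int num_vf;        /* VFs currently enabled */
	int vfs_assigned;  /* non-zero if any VF is assigned to a guest */
};

/* A non-zero request enables that many VFs, zero disables them.
 * Returns the number of VFs enabled, 0 on disable, or a negative errno.
 */
int model_sriov_configure(struct fake_dev *dev, int num_vfs)
{
	if (num_vfs) {
		if (dev->num_vf)
			return -EINVAL;   /* VFs are already enabled */
		dev->num_vf = num_vfs;    /* stands in for pci_enable_sriov() */
		return num_vfs;
	}
	if (!dev->num_vf)
		return -EINVAL;           /* nothing to disable */
	if (dev->vfs_assigned)
		return -EPERM;            /* leave assigned VFs alone */
	dev->num_vf = 0;                  /* stands in for pci_disable_sriov() */
	return 0;
}
```

Nothing in the model refers to virtio, which is the point being made
above.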

Re: ss issue on arm not showing UDP listening ports

2018-02-22 Thread jesse_cooper



Quoting Guillaume Nault :

> On Wed, Feb 21, 2018 at 07:59:24PM -0600, Jesse Cooper wrote:
>> Thank you for the suggestions. This is on a raspberry pi 3 not sure if
>> that fact matters. I will notify Raspbian of the issue.
>
> Does your kernel have CONFIG_INET_UDP_DIAG?

grep CONFIG_INET_UDP_DIAG kernel.config
# CONFIG_INET_UDP_DIAG is not set

sudo modprobe udp_diag
modprobe: FATAL: Module udp_diag not found in directory /lib/modules/4.9.59-v7+


Can this module be loaded after the fact? If so, what is the easiest
way to get the proper .ko file?

Re: [PATCH net v3 2/2] tuntap: correctly add the missing xdp flush

2018-02-22 Thread Jesper Dangaard Brouer

On Thu, 22 Feb 2018 17:36:46 +0800
Jason Wang  wrote:

> Commit 762c330d670e ("tuntap: add missing xdp flush") tries to fix the
> devmap stall caused by missed xdp flush by counting the pending xdp
> redirected packets and flush when it exceeds NAPI_POLL_WEIGHT or
> MSG_MORE is clear. This may lead to BUG() since xdp_do_flush() was
> called in the process context with preemption enabled. Simply
> disabling preemption may silence the warning but be not enough since
> process may move between different CPUS during a batch which cause
> xdp_do_flush() misses some CPU where the process run
> previously. Consider the fallouts, that commit was reverted. To fix
> the issue correctly, we can simply call xdp_do_flush() immediately
> after xdp_do_redirect(), a side effect is that this removes any
> possibility of batching which could be addressed in the future.
> 
> Reported-by: Christoffer Dall 
> Fixes: 762c330d670e ("tuntap: add missing xdp flush")
> Signed-off-by: Jason Wang 
> ---
>  drivers/net/tun.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 2823a4a..a363ea2 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1662,6 +1662,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct 
> *tun,
>   get_page(alloc_frag->page);
>   alloc_frag->offset += buflen;
> + err = xdp_do_redirect(tun->dev, &xdp, xdp_prog);
> + xdp_do_flush_map();
>   if (err)
>   goto err_redirect;
>   rcu_read_unlock();

As you have noticed, xdp_do_redirect() + xdp_do_flush_map() rely
heavily on being executed in softirq/napi_schedule context.
In particular, the map infra (devmap[1] + cpumap) depends on the enqueue
and flush operations happening on the same CPU (e.g. devmap stores which
devices need flushing in a this_cpu_ptr bitmap [1]).

What context is tun_build_skb() invoked under?

Even when you call xdp_do_redirect and xdp_do_flush_map right after
each other, are we sure we cannot be preempted here?


[1] https://github.com/torvalds/linux/blob/master/kernel/bpf/devmap.c#L209-L215
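
To make the same-CPU requirement concrete, here is a toy userspace model
of per-CPU "needs flush" state. It is only an illustration of the failure
mode and not the devmap code: MODEL_NR_CPUS, model_redirect(),
model_flush() and model_any_stalled() are made-up names, and the array
index stands in for whichever CPU the task happens to run on.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* One "needs flush" flag per CPU, standing in for the per-CPU bitmap. */
static bool pending[MODEL_NR_CPUS];

/* Enqueue on `cpu`: the packet is parked and only that CPU is marked. */
void model_redirect(int cpu)
{
	pending[cpu] = true;
}

/* Flush on `cpu`: only that CPU's flag is examined and cleared.
 * Returns true if something was actually flushed.
 */
bool model_flush(int cpu)
{
	bool had_work = pending[cpu];

	pending[cpu] = false;
	return had_work;
}

/* True if any CPU still holds an unflushed packet. */
bool model_any_stalled(void)
{
	for (int i = 0; i < MODEL_NR_CPUS; i++)
		if (pending[i])
			return true;
	return false;
}
```

If the task redirects on CPU 1, is migrated, and then flushes on CPU 2,
model_flush(2) finds nothing to do and the packet parked on CPU 1 stays
stalled.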
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

Re: [PATCH v2 iproute2-next 1/3] ip: Use the `struct fib_rule_hdr` for rules

2018-02-22 Thread David Ahern

On 2/21/18 7:12 PM, Donald Sharp wrote:
> @@ -577,21 +585,20 @@ static int iprule_modify(int cmd, int argc, char **argv)
>   __u32 tid = 0;
>   struct {
>   struct nlmsghdr n;
> - struct rtmsgr;
> + struct fib_rule_hdr frh;
>   charbuf[1024];
>   } req = {
>   .n.nlmsg_type = cmd,
> - .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg)),
> + .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct fib_rule_hdr)),
>   .n.nlmsg_flags = NLM_F_REQUEST,
> - .r.rtm_family = preferred_family,
> - .r.rtm_protocol = RTPROT_BOOT,
> - .r.rtm_scope = RT_SCOPE_UNIVERSE,
> - .r.rtm_type = RTN_UNSPEC,
> + .frh.family = preferred_family,
> + .frh.proto = RTPROT_BOOT,
> + .frh.action = RTN_UNSPEC,
>   };
>  
>   if (cmd == RTM_NEWRULE) {
>   req.n.nlmsg_flags |= NLM_F_CREATE|NLM_F_EXCL;
> - req.r.rtm_type = RTN_UNICAST;
> + req.frh.action = RTN_UNICAST;
>   }

The action should be FR_ACT_TO_TBL; RTN_UNICAST == 1 == FR_ACT_TO_TBL;
the latter is the proper enum for fib rules.
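
The equality can be checked at compile time against the UAPI headers. The
snippet below is a minimal sketch that assumes the Linux kernel headers
are installed; new_rule_action() is a hypothetical helper written for this
illustration and is not iproute2 code.

```c
#include <assert.h>
#include <linux/rtnetlink.h>
#include <linux/fib_rules.h>

/* Both constants come from the Linux UAPI headers. The casts only avoid
 * a compiler warning about comparing two different enum types.
 */
static_assert((int)RTN_UNICAST == (int)FR_ACT_TO_TBL,
	      "RTN_UNICAST and FR_ACT_TO_TBL are expected to share a value");

/* Returns the action a newly created rule should carry, spelled with the
 * fib-rule enum rather than the route-type enum.
 */
int new_rule_action(void)
{
	return FR_ACT_TO_TBL;
}
```

Using FR_ACT_TO_TBL keeps the behaviour the same while naming the value
from the enum that belongs to struct fib_rule_hdr.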

Re: [for-next 7/7] IB/mlx5: Implement fragmented completion queue (CQ)

2018-02-22 Thread Jason Gunthorpe

On Wed, Feb 21, 2018 at 12:13:54PM -0800, Saeed Mahameed wrote:
> From: Yonatan Cohen 
> 
> The current implementation of create CQ requires contiguous
> memory, such requirement is problematic once the memory is
> fragmented or the system is low in memory, it causes for
> failures in dma_zalloc_coherent().
> 
> This patch implements new scheme of fragmented CQ to overcome
> this issue by introducing new type: 'struct mlx5_frag_buf_ctrl'
> to allocate fragmented buffers, rather than contiguous ones.
> 
> Base the Completion Queues (CQs) on this new fragmented buffer.
> 
> It fixes following crashes:
> kworker/29:0: page allocation failure: order:6, mode:0x80d0
> CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0
> Workqueue: ib_cm cm_work_handler [ib_cm]
> Call Trace:
> [<>] dump_stack+0x19/0x1b
> [<>] warn_alloc_failed+0x110/0x180
> [<>] __alloc_pages_slowpath+0x6b7/0x725
> [<>] __alloc_pages_nodemask+0x405/0x420
> [<>] dma_generic_alloc_coherent+0x8f/0x140
> [<>] x86_swiotlb_alloc_coherent+0x21/0x50
> [<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core]
> [<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core]
> [<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core]
> [<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core]
> [<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib]
> [<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib]
> 
> Signed-off-by: Yonatan Cohen 
> Reviewed-by: Tariq Toukan 
> Signed-off-by: Leon Romanovsky 
> Signed-off-by: Saeed Mahameed 
>  drivers/infiniband/hw/mlx5/cq.c | 64 
> +++--
>  drivers/infiniband/hw/mlx5/mlx5_ib.h|  6 +--
>  drivers/net/ethernet/mellanox/mlx5/core/alloc.c | 37 +-
>  drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 11 +++--
>  drivers/net/ethernet/mellanox/mlx5/core/wq.c| 18 +++
>  drivers/net/ethernet/mellanox/mlx5/core/wq.h| 22 +++--
>  include/linux/mlx5/driver.h | 51 ++--
>  7 files changed, 124 insertions(+), 85 deletions(-)

For the drivers/infiniband stuff:

Acked-by: Jason Gunthorpe 

Thanks,
Jason

[Crypto v7 09/12] chtls: CPL handler definition

2018-02-22 Thread Atul Gupta

CPL handlers for TLS session, record transmit and receive.

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chtls/chtls_cm.c | 2041 +++
 net/ipv4/tcp_minisocks.c|1 +
 2 files changed, 2042 insertions(+)
 create mode 100644 drivers/crypto/chelsio/chtls/chtls_cm.c

diff --git a/drivers/crypto/chelsio/chtls/chtls_cm.c 
b/drivers/crypto/chelsio/chtls/chtls_cm.c
new file mode 100644
index 000..1c95e87
--- /dev/null
+++ b/drivers/crypto/chelsio/chtls/chtls_cm.c
@@ -0,0 +1,2041 @@
+/*
+ * Copyright (c) 2017 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Written by: Atul Gupta (atul.gu...@chelsio.com)
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "chtls.h"
+#include "chtls_cm.h"
+
+extern struct request_sock_ops chtls_rsk_ops;
+
+/*
+ * State transitions and actions for close.  Note that if we are in SYN_SENT
+ * we remain in that state as we cannot control a connection while it's in
+ * SYN_SENT; such connections are allowed to establish and are then aborted.
+ */
+static unsigned char new_state[16] = {
+   /* current state: new state:  action: */
+   /* (Invalid)   */ TCP_CLOSE,
+   /* TCP_ESTABLISHED */ TCP_FIN_WAIT1 | TCP_ACTION_FIN,
+   /* TCP_SYN_SENT*/ TCP_SYN_SENT,
+   /* TCP_SYN_RECV*/ TCP_FIN_WAIT1 | TCP_ACTION_FIN,
+   /* TCP_FIN_WAIT1   */ TCP_FIN_WAIT1,
+   /* TCP_FIN_WAIT2   */ TCP_FIN_WAIT2,
+   /* TCP_TIME_WAIT   */ TCP_CLOSE,
+   /* TCP_CLOSE   */ TCP_CLOSE,
+   /* TCP_CLOSE_WAIT  */ TCP_LAST_ACK | TCP_ACTION_FIN,
+   /* TCP_LAST_ACK*/ TCP_LAST_ACK,
+   /* TCP_LISTEN  */ TCP_CLOSE,
+   /* TCP_CLOSING */ TCP_CLOSING,
+};
+
+static struct chtls_sock *chtls_sock_create(struct chtls_dev *cdev)
+{
+   struct chtls_sock *csk = kzalloc(sizeof(*csk), GFP_ATOMIC);
+
+   if (!csk)
+   return NULL;
+
+   csk->txdata_skb_cache = alloc_skb(TXDATA_SKB_LEN, GFP_ATOMIC);
+   if (!csk->txdata_skb_cache) {
+   kfree(csk);
+   return NULL;
+   }
+
+   kref_init(&csk->kref);
+   csk->cdev = cdev;
+   skb_queue_head_init(&csk->txq);
+   csk->wr_skb_head = NULL;
+   csk->wr_skb_tail = NULL;
+   csk->mss = MAX_MSS;
+   csk->tlshws.ofld = 1;
+   csk->tlshws.txkey = -1;
+   csk->tlshws.rxkey = -1;
+   csk->tlshws.mfs = TLS_MFS;
+   skb_queue_head_init(&csk->tlshws.sk_recv_queue);
+   return csk;
+}
+
+static void chtls_sock_release(struct kref *ref)
+{
+   struct chtls_sock *csk =
+   container_of(ref, struct chtls_sock, kref);
+
+   kfree(csk);
+}
+
+static struct net_device *chtls_ipv4_netdev(struct chtls_dev *cdev,
+   struct sock *sk)
+{
+   struct net_device *ndev = cdev->ports[0];
+
+   if (likely(!inet_sk(sk)->inet_rcv_saddr))
+   return ndev;
+
+   ndev = ip_dev_find(&init_net, inet_sk(sk)->inet_rcv_saddr);
+   if (!ndev)
+   return NULL;
+
+   if (is_vlan_dev(ndev))
+   return vlan_dev_real_dev(ndev);
+   return ndev;
+}
+
+static void assign_rxopt(struct sock *sk, unsigned int opt)
+{
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+   struct tcp_sock *tp = tcp_sk(sk);
+   const struct chtls_dev *cdev;
+
+   cdev = csk->cdev;
+   tp->tcp_header_len   = sizeof(struct tcphdr);
+   tp->rx_opt.mss_clamp = cdev->mtus[TCPOPT_MSS_G(opt)] - 40;
+   tp->mss_cache= tp->rx_opt.mss_clamp;
+   tp->rx_opt.tstamp_ok = TCPOPT_TSTAMP_G(opt);
+   tp->rx_opt.snd_wscale= TCPOPT_SACK_G(opt);
+   tp->rx_opt.wscale_ok = TCPOPT_WSCALE_OK_G(opt);
+   SND_WSCALE(tp)   = TCPOPT_SND_WSCALE_G(opt);
+   if (!tp->rx_opt.wscale_ok)
+   tp->rx_opt.rcv_wscale = 0;
+   if (tp->rx_opt.tstamp_ok) {
+   tp->tcp_header_len += TCPOLEN_TSTAMP_ALIGNED;
+   tp->rx_opt.mss_clamp -= TCPOLEN_TSTAMP_ALIGNED;
+   } else if (csk->opt2 & TSTAMPS_EN_F) {
+   csk->opt2 &= ~TSTAMPS_EN_F;
+   csk->mtu_idx = TCPOPT_MSS_G(opt);
+   }
+}
+
+static void chtls_purge_rcv_queue(struct sock *sk)
+{
+   struct sk_buff *skb;
+
+   while ((skb = __skb_dequeue(&sk->sk_receive_queue)) != NULL) {
+   skb_dst_set(skb, (void *)NULL);
+   kfree_skb(skb);
+   }
+}
+
+static void chtls_purge_write_queue(struct sock *sk)
+{
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+   struct sk_buff *skb;
+
+   while ((skb = __skb_dequeue(&csk->txq))) {
+

[Crypto v7 07/12] chcr: Key Macro

2018-02-22 Thread Atul Gupta

Define macro for TLS Key context

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chcr_algo.h | 42 +
 drivers/crypto/chelsio/chcr_core.h | 55 +-
 2 files changed, 96 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/chelsio/chcr_algo.h 
b/drivers/crypto/chelsio/chcr_algo.h
index d1673a5..f263cd4 100644
--- a/drivers/crypto/chelsio/chcr_algo.h
+++ b/drivers/crypto/chelsio/chcr_algo.h
@@ -86,6 +86,39 @@
 KEY_CONTEXT_OPAD_PRESENT_M)
 #define KEY_CONTEXT_OPAD_PRESENT_F  KEY_CONTEXT_OPAD_PRESENT_V(1U)
 
+#define TLS_KEYCTX_RXFLIT_CNT_S 24
+#define TLS_KEYCTX_RXFLIT_CNT_V(x) ((x) << TLS_KEYCTX_RXFLIT_CNT_S)
+
+#define TLS_KEYCTX_RXPROT_VER_S 20
+#define TLS_KEYCTX_RXPROT_VER_M 0xf
+#define TLS_KEYCTX_RXPROT_VER_V(x) ((x) << TLS_KEYCTX_RXPROT_VER_S)
+
+#define TLS_KEYCTX_RXCIPH_MODE_S 16
+#define TLS_KEYCTX_RXCIPH_MODE_M 0xf
+#define TLS_KEYCTX_RXCIPH_MODE_V(x) ((x) << TLS_KEYCTX_RXCIPH_MODE_S)
+
+#define TLS_KEYCTX_RXAUTH_MODE_S 12
+#define TLS_KEYCTX_RXAUTH_MODE_M 0xf
+#define TLS_KEYCTX_RXAUTH_MODE_V(x) ((x) << TLS_KEYCTX_RXAUTH_MODE_S)
+
+#define TLS_KEYCTX_RXCIAU_CTRL_S 11
+#define TLS_KEYCTX_RXCIAU_CTRL_V(x) ((x) << TLS_KEYCTX_RXCIAU_CTRL_S)
+
+#define TLS_KEYCTX_RX_SEQCTR_S 9
+#define TLS_KEYCTX_RX_SEQCTR_M 0x3
+#define TLS_KEYCTX_RX_SEQCTR_V(x) ((x) << TLS_KEYCTX_RX_SEQCTR_S)
+
+#define TLS_KEYCTX_RX_VALID_S 8
+#define TLS_KEYCTX_RX_VALID_V(x) ((x) << TLS_KEYCTX_RX_VALID_S)
+
+#define TLS_KEYCTX_RXCK_SIZE_S 3
+#define TLS_KEYCTX_RXCK_SIZE_M 0x7
+#define TLS_KEYCTX_RXCK_SIZE_V(x) ((x) << TLS_KEYCTX_RXCK_SIZE_S)
+
+#define TLS_KEYCTX_RXMK_SIZE_S 0
+#define TLS_KEYCTX_RXMK_SIZE_M 0x7
+#define TLS_KEYCTX_RXMK_SIZE_V(x) ((x) << TLS_KEYCTX_RXMK_SIZE_S)
+
 #define CHCR_HASH_MAX_DIGEST_SIZE 64
 #define CHCR_MAX_SHA_DIGEST_SIZE 64
 
@@ -176,6 +209,15 @@
  KEY_CONTEXT_SALT_PRESENT_V(1) | \
  KEY_CONTEXT_CTX_LEN_V((ctx_len)))
 
+#define  FILL_KEY_CRX_HDR(ck_size, mk_size, d_ck, opad, ctx_len) \
+   htonl(TLS_KEYCTX_RXMK_SIZE_V(mk_size) | \
+ TLS_KEYCTX_RXCK_SIZE_V(ck_size) | \
+ TLS_KEYCTX_RX_VALID_V(1) | \
+ TLS_KEYCTX_RX_SEQCTR_V(3) | \
+ TLS_KEYCTX_RXAUTH_MODE_V(4) | \
+ TLS_KEYCTX_RXCIPH_MODE_V(2) | \
+ TLS_KEYCTX_RXFLIT_CNT_V((ctx_len)))
+
 #define FILL_WR_OP_CCTX_SIZE \
htonl( \
FW_CRYPTO_LOOKASIDE_WR_OPCODE_V( \
diff --git a/drivers/crypto/chelsio/chcr_core.h 
b/drivers/crypto/chelsio/chcr_core.h
index 3c29ee0..77056a9 100644
--- a/drivers/crypto/chelsio/chcr_core.h
+++ b/drivers/crypto/chelsio/chcr_core.h
@@ -65,10 +65,58 @@
 struct _key_ctx {
__be32 ctx_hdr;
u8 salt[MAX_SALT];
-   __be64 reserverd;
+   __be64 iv_to_auth;
unsigned char key[0];
 };
 
+#define KEYCTX_TX_WR_IV_S  55
+#define KEYCTX_TX_WR_IV_M  0x1ffULL
+#define KEYCTX_TX_WR_IV_V(x) ((x) << KEYCTX_TX_WR_IV_S)
+#define KEYCTX_TX_WR_IV_G(x) \
+   (((x) >> KEYCTX_TX_WR_IV_S) & KEYCTX_TX_WR_IV_M)
+
+#define KEYCTX_TX_WR_AAD_S 47
+#define KEYCTX_TX_WR_AAD_M 0xffULL
+#define KEYCTX_TX_WR_AAD_V(x) ((x) << KEYCTX_TX_WR_AAD_S)
+#define KEYCTX_TX_WR_AAD_G(x) (((x) >> KEYCTX_TX_WR_AAD_S) & \
+   KEYCTX_TX_WR_AAD_M)
+
+#define KEYCTX_TX_WR_AADST_S 39
+#define KEYCTX_TX_WR_AADST_M 0xffULL
+#define KEYCTX_TX_WR_AADST_V(x) ((x) << KEYCTX_TX_WR_AADST_S)
+#define KEYCTX_TX_WR_AADST_G(x) \
+   (((x) >> KEYCTX_TX_WR_AADST_S) & KEYCTX_TX_WR_AADST_M)
+
+#define KEYCTX_TX_WR_CIPHER_S 30
+#define KEYCTX_TX_WR_CIPHER_M 0x1ffULL
+#define KEYCTX_TX_WR_CIPHER_V(x) ((x) << KEYCTX_TX_WR_CIPHER_S)
+#define KEYCTX_TX_WR_CIPHER_G(x) \
+   (((x) >> KEYCTX_TX_WR_CIPHER_S) & KEYCTX_TX_WR_CIPHER_M)
+
+#define KEYCTX_TX_WR_CIPHERST_S 23
+#define KEYCTX_TX_WR_CIPHERST_M 0x7f
+#define KEYCTX_TX_WR_CIPHERST_V(x) ((x) << KEYCTX_TX_WR_CIPHERST_S)
+#define KEYCTX_TX_WR_CIPHERST_G(x) \
+   (((x) >> KEYCTX_TX_WR_CIPHERST_S) & KEYCTX_TX_WR_CIPHERST_M)
+
+#define KEYCTX_TX_WR_AUTH_S 14
+#define KEYCTX_TX_WR_AUTH_M 0x1ff
+#define KEYCTX_TX_WR_AUTH_V(x) ((x) << KEYCTX_TX_WR_AUTH_S)
+#define KEYCTX_TX_WR_AUTH_G(x) \
+   (((x) >> KEYCTX_TX_WR_AUTH_S) & KEYCTX_TX_WR_AUTH_M)
+
+#define KEYCTX_TX_WR_AUTHST_S 7
+#define KEYCTX_TX_WR_AUTHST_M 0x7f
+#define KEYCTX_TX_WR_AUTHST_V(x) ((x) << KEYCTX_TX_WR_AUTHST_S)
+#define KEYCTX_TX_WR_AUTHST_G(x) \
+   (((x) >> KEYCTX_TX_WR_AUTHST_S) & KEYCTX_TX_WR_AUTHST_M)
+
+#define KEYCTX_TX_WR_AUTHIN_S 0
+#define KEYCTX_TX_WR_AUTHIN_M 0x7f
+#define KEYCTX_TX_WR_AUTHIN_V(x) ((x) << KEYCTX_TX_WR_AUTHIN_S)
+#define KEYCTX_TX_WR_AUTHIN_G(x) \
+   (((x) >> KEYCTX_TX_WR_AUTHIN_S) & KEYCTX_TX_WR_AUTHIN_M)
+
 struct chcr_wr {
struct fw_crypto_lookaside_wr wreq;
struct ulp_txpkt ulptx;
@@ -90,6 +138,11 @@ struct uld_ctx {
struct chcr_dev
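
The _S/_M/_V/_G suffixes in this patch follow the usual shift/mask/value/get pattern for packing fields into one word. A minimal userspace sketch of how two of the 64-bit fields round-trip, assuming the shift and mask values shown in the diff: pack_iv_aad() is a hypothetical helper, not part of the patch, and the uint64_t cast inside the _V macros is added here so the shift is always done in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Shift and mask values copied from the diff; the uint64_t cast is an addition. */
#define KEYCTX_TX_WR_IV_S	55
#define KEYCTX_TX_WR_IV_M	0x1ffULL
#define KEYCTX_TX_WR_IV_V(x)	((uint64_t)(x) << KEYCTX_TX_WR_IV_S)
#define KEYCTX_TX_WR_IV_G(x)	(((x) >> KEYCTX_TX_WR_IV_S) & KEYCTX_TX_WR_IV_M)

#define KEYCTX_TX_WR_AAD_S	47
#define KEYCTX_TX_WR_AAD_M	0xffULL
#define KEYCTX_TX_WR_AAD_V(x)	((uint64_t)(x) << KEYCTX_TX_WR_AAD_S)
#define KEYCTX_TX_WR_AAD_G(x)	(((x) >> KEYCTX_TX_WR_AAD_S) & KEYCTX_TX_WR_AAD_M)

/* Hypothetical helper: pack two fields into one host-order 64-bit word. */
static uint64_t pack_iv_aad(unsigned int iv, unsigned int aad)
{
	/* Reject values that would spill into a neighbouring field. */
	assert(iv <= KEYCTX_TX_WR_IV_M);
	assert(aad <= KEYCTX_TX_WR_AAD_M);

	return KEYCTX_TX_WR_IV_V(iv) | KEYCTX_TX_WR_AAD_V(aad);
}
```

Since the struct member is declared __be64, a driver would presumably convert the packed word with cpu_to_be64() before storing it; that step is left out of this sketch.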

[Crypto v7 04/12] chtls: structure and macro definition

2018-02-22 Thread Atul Gupta

Inline TLS state, connection management. Supporting macros definition.

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chtls/chtls.h| 487 
 drivers/crypto/chelsio/chtls/chtls_cm.h | 202 +
 2 files changed, 689 insertions(+)
 create mode 100644 drivers/crypto/chelsio/chtls/chtls.h
 create mode 100644 drivers/crypto/chelsio/chtls/chtls_cm.h

diff --git a/drivers/crypto/chelsio/chtls/chtls.h 
b/drivers/crypto/chelsio/chtls/chtls.h
new file mode 100644
index 0000000..3ae7145
--- /dev/null
+++ b/drivers/crypto/chelsio/chtls/chtls.h
@@ -0,0 +1,487 @@
+/*
+ * Copyright (c) 2016 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __CHTLS_H__
+#define __CHTLS_H__
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "t4fw_api.h"
+#include "t4_msg.h"
+#include "cxgb4.h"
+#include "cxgb4_uld.h"
+#include "l2t.h"
+#include "chcr_algo.h"
+#include "chcr_core.h"
+#include "chcr_crypto.h"
+
+#define CIPHER_BLOCK_SIZE   16
+#define MAX_IVS_PAGE        256
+#define TLS_KEY_CONTEXT_SZ  64
+#define TLS_HEADER_LENGTH   5
+#define SCMD_CIPH_MODE_AES_GCM  2
+#define GCM_TAG_SIZE        16
+#define AEAD_EXPLICIT_DATA_SIZE 8
+/* Any MFS size should work and come from openssl */
+#define TLS_MFS             16384
+
+#define SOCK_INLINE (31)
+#define RSS_HDR sizeof(struct rss_header)
+
+enum {
+   CHTLS_KEY_CONTEXT_DSGL,
+   CHTLS_KEY_CONTEXT_IMM,
+   CHTLS_KEY_CONTEXT_DDR,
+};
+
+enum {
+   CHTLS_LISTEN_START,
+   CHTLS_LISTEN_STOP,
+};
+
+/* Flags for return value of CPL message handlers */
+enum {
+   CPL_RET_BUF_DONE = 1,   /* buffer processing done */
+   CPL_RET_BAD_MSG = 2,    /* bad CPL message */
+   CPL_RET_UNKNOWN_TID = 4 /* unexpected unknown TID */
+};
+
+#define TLS_RCV_ST_READ_HEADER  0xF0
+#define TLS_RCV_ST_READ_BODY    0xF1
+#define TLS_RCV_ST_READ_DONE    0xF2
+#define TLS_RCV_ST_READ_NB      0xF3
+
+#define RSPQ_HASH_BITS 5
+#define LISTEN_INFO_HASH_SIZE 32
+struct listen_info {
+   struct listen_info *next;  /* Link to next entry */
+   struct sock *sk;   /* The listening socket */
+   unsigned int stid; /* The server TID */
+};
+
+enum {
+   T4_LISTEN_START_PENDING,
+   T4_LISTEN_STARTED
+};
+
+enum csk_flags {
+   CSK_CALLBACKS_CHKD,      /* socket callbacks have been sanitized */
+   CSK_ABORT_REQ_RCVD,      /* received one ABORT_REQ_RSS message */
+   CSK_TX_MORE_DATA,        /* sending ULP data; don't set SHOVE bit */
+   CSK_TX_WAIT_IDLE,        /* suspend Tx until in-flight data is ACKed */
+   CSK_ABORT_SHUTDOWN,      /* shouldn't send more abort requests */
+   CSK_ABORT_RPL_PENDING,   /* expecting an abort reply */
+   CSK_CLOSE_CON_REQUESTED, /* we've sent a close_conn_req */
+   CSK_TX_DATA_SENT,        /* sent a TX_DATA WR on this connection */
+   CSK_TX_FAILOVER,         /* Tx traffic failing over */
+   CSK_UPDATE_RCV_WND,      /* Need to update rcv window */
+   CSK_RST_ABORTED,         /* outgoing RST was aborted */
+   CSK_TLS_HANDSHK,         /* TLS Handshake */
+};
+
+struct listen_ctx {
+   struct sock *lsk;
+   struct chtls_dev *cdev;
+   u32 state;
+};
+
+struct key_map {
+   unsigned long *addr;
+   unsigned int start;
+   unsigned int available;
+   unsigned int size;
+   spinlock_t lock; /* lock for key id request from map */
+} __packed;
+
+struct tls_scmd {
+   u32 seqno_numivs;
+   u32 ivgen_hdrlen;
+};
+
+struct chtls_dev {
+   struct tls_device tlsdev;
+   struct list_head list;
+   struct cxgb4_lld_info *lldi;
+   struct pci_dev *pdev;
+   struct listen_info *listen_hash_tab[LISTEN_INFO_HASH_SIZE];
+   spinlock_t listen_lock; /* lock for listen list */
+   struct net_device **ports;
+   struct tid_info *tids;
+   unsigned int pfvf;
+   const unsigned short *mtus;
+
+   spinlock_t aidr_lock ____cacheline_aligned_in_smp;
+   struct idr aidr; /* ATID id space */
+   struct idr hwtid_idr;
+   struct idr stid_idr;
+
+   spinlock_t idr_lock ____cacheline_aligned_in_smp;
+
+   struct net_device *egr_dev[NCHAN * 2];
+   struct sk_buff *rspq_skb_cache[1 << RSPQ_HASH_BITS];
+   struct sk_buff *askb;
+
+   struct sk_buff_head deferq;
+   struct work_struct deferq_task;
+
+   struct list_head list_node;
+   struct list_head rcu_node;
+   struct list_head na_node;
+   unsigned int send_page_order;
+   struct key_map kmap;
+};
+
+struct

[Crypto v7 03/12] tls: support for inline tls

2018-02-22 Thread Atul Gupta

Facility to register Inline TLS drivers with net/tls. Set up the
TLS_FULL_HW prot to listen on the offload device.

Cases handled:
1. An Inline TLS device exists: set up prot for TLS_FULL_HW.
2. At least one Inline TLS device exists: set TLS_FULL_HW. If a
non-inline-capable device establishes the connection, move to TLS_SW_TX.
3. The default mode, TLS_SW_TX, continues otherwise.
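
One reading of the three cases above, written as a small decision function. This is only a sketch of the commit message, not of the kernel code path: pick_tls_conf() and its parameters are hypothetical, and the enum order is assumed to match the tls_prots[] indices used by the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Names as used by the patch; the ordering here is an assumption. */
enum tls_conf { TLS_BASE_TX, TLS_SW_TX, TLS_FULL_HW };

/*
 * Hypothetical helper: which configuration a socket ends up in.
 * inline_dev_registered: at least one inline TLS driver is registered.
 * conn_on_inline_dev:    the connection was established through it.
 */
static enum tls_conf pick_tls_conf(bool inline_dev_registered,
				   bool conn_on_inline_dev)
{
	/* A connection cannot sit on an inline device if none is registered. */
	assert(inline_dev_registered || !conn_on_inline_dev);

	if (inline_dev_registered && conn_on_inline_dev)
		return TLS_FULL_HW;	/* cases 1 and 2: stay fully offloaded */

	return TLS_SW_TX;		/* case 2 fallback and case 3 default */
}
```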

Signed-off-by: Atul Gupta 
---
 net/tls/tls_main.c | 123 ++---
 1 file changed, 116 insertions(+), 7 deletions(-)

diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index b0d5fce..34f8781 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -45,13 +46,9 @@
 MODULE_DESCRIPTION("Transport Layer Security Support");
 MODULE_LICENSE("Dual BSD/GPL");
 
-enum {
-   TLS_BASE_TX,
-   TLS_SW_TX,
-   TLS_NUM_CONFIG,
-};
-
-static struct proto tls_prots[TLS_NUM_CONFIG];
+static LIST_HEAD(device_list);
+static DEFINE_MUTEX(device_mutex);
+struct proto tls_prots[TLS_NUM_CONFIG];
 
 static inline void update_sk_prot(struct sock *sk, struct tls_context *ctx)
 {
@@ -260,6 +257,37 @@ static void tls_sk_proto_close(struct sock *sk, long 
timeout)
sk_proto_close(sk, timeout);
 }
 
+static struct net_device *get_netdev(struct sock *sk)
+{
+   struct inet_sock *inet = inet_sk(sk);
+   struct net_device *netdev;
+
+   netdev = dev_get_by_index(sock_net(sk), inet->cork.fl.flowi_oif);
+   return netdev;
+}
+
+static int tls_offload_dev_absent(struct sock *sk)
+{
+   struct net_device *netdev;
+   struct tls_device *dev;
+   int rc = 0;
+
+   netdev = get_netdev(sk);
+   if (!netdev)
+   return -EINVAL;
+
+   mutex_lock(&device_mutex);
+   list_for_each_entry(dev, &device_list, dev_list) {
+   if (dev->netdev && dev->netdev(dev, netdev)) {
+   rc = -EEXIST;
+   break;
+   }
+   }
+   mutex_unlock(&device_mutex);
+   dev_put(netdev);
+   return rc;
+}
+
 static int do_tls_getsockopt_tx(struct sock *sk, char __user *optval,
int __user *optlen)
 {
@@ -403,6 +431,15 @@ static int do_tls_setsockopt_tx(struct sock *sk, char 
__user *optval,
goto err_crypto_info;
}
 
+   rc = tls_offload_dev_absent(sk);
+   if (rc == -EINVAL) {
+   goto out;
+   } else if (rc == -EEXIST) {
+   /* Retain HW unhash for cleanup and move to SW Tx */
+   sk->sk_prot[TLS_BASE_TX].unhash =
+   sk->sk_prot[TLS_FULL_HW].unhash;
+   }
+
/* currently SW is default, we will have ethtool in future */
rc = tls_set_sw_offload(sk, ctx);
tx_conf = TLS_SW_TX;
@@ -450,6 +487,54 @@ static int tls_setsockopt(struct sock *sk, int level, int 
optname,
return do_tls_setsockopt(sk, optname, optval, optlen);
 }
 
+static int tls_hw_prot(struct sock *sk)
+{
+   struct tls_context *ctx = tls_get_ctx(sk);
+   struct tls_device *dev;
+
+   mutex_lock(&device_mutex);
+   list_for_each_entry(dev, &device_list, dev_list) {
+   if (dev->feature && dev->feature(dev)) {
+   ctx->tx_conf = TLS_FULL_HW;
+   update_sk_prot(sk, ctx);
+   break;
+   }
+   }
+   mutex_unlock(&device_mutex);
+   return ctx->tx_conf;
+}
+
+static void tls_hw_unhash(struct sock *sk)
+{
+   struct tls_device *dev;
+
+   mutex_lock(&device_mutex);
+   list_for_each_entry(dev, &device_list, dev_list) {
+   if (dev->unhash)
+   dev->unhash(dev, sk);
+   }
+   mutex_unlock(&device_mutex);
+   sk->sk_prot->unhash(sk);
+}
+
+static int tls_hw_hash(struct sock *sk)
+{
+   struct tls_device *dev;
+   int err;
+
+   err = sk->sk_prot->hash(sk);
+   mutex_lock(&device_mutex);
+   list_for_each_entry(dev, &device_list, dev_list) {
+   if (dev->hash)
+   err |= dev->hash(dev, sk);
+   }
+   mutex_unlock(&device_mutex);
+
+   if (err)
+   tls_hw_unhash(sk);
+   return err;
+}
+
 static int tls_init(struct sock *sk)
 {
struct inet_connection_sock *icsk = inet_csk(sk);
@@ -477,6 +562,9 @@ static int tls_init(struct sock *sk)
ctx->sk_proto_close = sk->sk_prot->close;
 
ctx->tx_conf = TLS_BASE_TX;
+   if (tls_hw_prot(sk) == TLS_FULL_HW)
+   goto out;
+
update_sk_prot(sk, ctx);
 out:
return rc;
@@ -500,7 +588,27 @@ static void build_protos(struct proto *prot, struct proto 
*base)
prot[TLS_SW_TX] = prot[TLS_BASE_TX];
prot[TLS_SW_TX].sendmsg = tls_sw_sendmsg;
prot[TLS_SW_TX].sendpage= tls_sw_sendpage;
+
+   prot[TLS_FULL_HW] = prot[TLS_BASE_TX];
+   prot[TLS_FULL_HW].hash  = tls_hw_hash;
+   prot[TLS_FULL_HW].unhash= tls_hw_unhash;
+}
+
+void

Re: [PATCH bpf v2] bpf: fix rcu lockdep warning for lpm_trie map_free callback

2018-02-22 Thread Eric Dumazet

On Thu, 2018-02-22 at 10:10 -0800, Yonghong Song wrote:
> Commit 9a3efb6b661f ("bpf: fix memory leak in lpm_trie map_free callback 
> function")
> fixed a memory leak and removed unnecessary locks in map_free callback 
> function.
> Unfortunately, it introduced a lockdep warning. When lockdep checking is 
> turned on,
> running tools/testing/selftests/bpf/test_lpm_map will have:
> 

> Fixes: 9a3efb6b661f ("bpf: fix memory leak in lpm_trie map_free callback 
> function")
> Reported-by: Eric Dumazet 
> Suggested-by: Eric Dumazet 
> Signed-off-by: Yonghong Song 
> ---
>  kernel/bpf/lpm_trie.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> v1 -> v2:
>  . fix sparse warning which is introduced by v1, suggested by Eric.
> 
> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
> index a75e02c..b4b5b81 100644
> --- a/kernel/bpf/lpm_trie.c
> +++ b/kernel/bpf/lpm_trie.c
> @@ -569,8 +569,7 @@ static void trie_free(struct bpf_map *map)
>   slot = &trie->root;
>  
>   for (;;) {
> - node = rcu_dereference_protected(*slot,
> - lockdep_is_held(&trie->lock));
> + node = rcu_dereference_protected(*slot, 1);
>   if (!node)
>   goto out;

SGTM, thanks !

Reviewed-by: Eric Dumazet

Re: [PATCH bpf v2] bpf: fix rcu lockdep warning for lpm_trie map_free callback

2018-02-22 Thread David Miller

From: Yonghong Song 
Date: Thu, 22 Feb 2018 10:10:35 -0800

> Commit 9a3efb6b661f ("bpf: fix memory leak in lpm_trie map_free callback 
> function")
> fixed a memory leak and removed unnecessary locks in map_free callback 
> function.
> Unfortunately, it introduced a lockdep warning. When lockdep checking is 
> turned on,
> running tools/testing/selftests/bpf/test_lpm_map will have:
> 
>   [   98.294321] =
>   [   98.294807] WARNING: suspicious RCU usage
>   [   98.295359] 4.16.0-rc2+ #193 Not tainted
>   [   98.295907] -
>   [   98.296486] /home/yhs/work/bpf/kernel/bpf/lpm_trie.c:572 suspicious 
> rcu_dereference_check() usage!
>   [   98.297657]
>   [   98.297657] other info that might help us debug this:
>   [   98.297657]
>   [   98.298663]
>   [   98.298663] rcu_scheduler_active = 2, debug_locks = 1
>   [   98.299536] 2 locks held by kworker/2:1/54:
>   [   98.300152]  #0:  ((wq_completion)"events"){+.+.}, at: 
> [<196bc1f0>] process_one_work+0x157/0x5c0
>   [   98.301381]  #1:  ((work_completion)(>work)){+.+.}, at: 
> [<196bc1f0>] process_one_work+0x157/0x5c0
> 
> Since actual trie tree removal happens only after no other
> accesses to the tree are possible, replacing
>   rcu_dereference_protected(*slot, lockdep_is_held(&trie->lock))
> with
>   rcu_dereference_protected(*slot, 1)
> fixed the issue.
> 
> Fixes: 9a3efb6b661f ("bpf: fix memory leak in lpm_trie map_free callback 
> function")
> Reported-by: Eric Dumazet 
> Suggested-by: Eric Dumazet 
> Signed-off-by: Yonghong Song 

Acked-by: David S. Miller
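
The quoted change swaps the condition passed to rcu_dereference_protected() from "the lock is held" to a constant 1. A userspace mock of just that checking behaviour may make the effect easier to see. Everything below (deref_protected(), the structs, the counter) is hypothetical and only imitates the lockdep check; it is not the kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often the stated protection condition was false. */
static int suspicious_derefs;

/*
 * Mock of the checking behaviour only: yield the pointer, but record a
 * "suspicious usage" when the caller's condition does not hold.
 */
#define deref_protected(p, cond) \
	((void)((cond) ? 0 : ++suspicious_derefs), (p))

struct node { struct node *child; };
struct trie { struct node *root; int lock_held; };

/* Teardown walk that claims the lock is held, as before the fix. */
static struct node *first_node_old(struct trie *t)
{
	assert(t != NULL);
	return deref_protected(t->root, t->lock_held);
}

/* Teardown walk that states "no other access is possible", as after the fix. */
static struct node *first_node_new(struct trie *t)
{
	assert(t != NULL);
	return deref_protected(t->root, 1);
}
```

With lock_held left at 0, as in a free path that no longer takes the lock, only the old variant bumps the counter.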

Re: [PATCH] ipv6 sit: work around bogus gcc-8 -Wrestrict warning

2018-02-22 Thread Eric Dumazet

On Thu, 2018-02-22 at 16:55 +0100, Arnd Bergmann wrote:

... 
> 
> This code is old, so Cc stable to make sure that we don't get the warning
> for older kernels built with new gcc.
> 
> Cc: sta...@vger.kernel.org

This part makes little sense to me for two reasons.

1) David Miller handles stable submission himself
 ( Documentation/networking/netdev-FAQ.txt )

2) We are not supposed to make sure old kernels will compile with
future compilers.

That would need a lot of work and potential new bugs, not worth the
time.

Otherwise your patch looks fine really ;)

[RFC PATCH] lan743x: lan743x_csr_read() can be static

2018-02-22 Thread kbuild test robot


Fixes: 896121de80db ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Fengguang Wu 
---
 lan743x_main.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/microchip/lan743x_main.c 
b/drivers/net/ethernet/microchip/lan743x_main.c
index 3de39e1..dd5ed86 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -65,12 +65,12 @@ static int lan743x_pci_init(struct lan743x_adapter *adapter,
return ret;
 }
 
-u32 lan743x_csr_read(struct lan743x_adapter *adapter, int offset)
+static u32 lan743x_csr_read(struct lan743x_adapter *adapter, int offset)
 {
return ioread32(&adapter->csr.csr_address[offset]);
 }
 
-void lan743x_csr_write(struct lan743x_adapter *adapter, int offset, u32 data)
+static void lan743x_csr_write(struct lan743x_adapter *adapter, int offset, u32 
data)
 {
iowrite32(data, &adapter->csr.csr_address[offset]);
 }

Re: [PATCH v2 net-next 1/2] lan743x: Add main source files for new lan743x driver

2018-02-22 Thread kbuild test robot

Hi Bryan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Bryan-Whitehead/lan743x-Add-new-lan743x-driver/20180222-225510
reproduce:
# apt-get install sparse
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> drivers/net/ethernet/microchip/lan743x_main.c:68:5: sparse: symbol 
>> 'lan743x_csr_read' was not declared. Should it be static?
>> drivers/net/ethernet/microchip/lan743x_main.c:73:6: sparse: symbol 
>> 'lan743x_csr_write' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

Re: nft/bpf interpreters and spectre2. Was: [PATCH RFC 0/4] net: add bpfilter

2018-02-22 Thread Alexei Starovoitov

On Thu, Feb 22, 2018 at 12:39:15PM +0100, Pablo Neira Ayuso wrote:
> Hi Alexei,
> 
> On Wed, Feb 21, 2018 at 06:20:37PM -0800, Alexei Starovoitov wrote:
> > On Wed, Feb 21, 2018 at 01:13:03PM +0100, Florian Westphal wrote:
> > > 
> > > Obvious candidates are: meta, numgen, limit, objref, quota, reject.
> > > 
> > > We should probably also consider removing
> > > CONFIG_NFT_SET_RBTREE and CONFIG_NFT_SET_HASH and just always
> > > build both too (at least rbtree since that offers interval).
> > > 
> > > For the indirect call issue we can use direct calls from eval loop for
> > > some of the more frequently used ones, similar to what we do already
> > > for nft_cmp_fast_expr. 
> > 
> > nft_cmp_fast_expr and other expressions mentioned above got me thinking...
> > 
> > do we have the same issue with nft interpreter as we had with bpf one?
> > bpf interpreter was used as part of spectre2 attack to leak
> > information via cache side channel and let VM read hypervisor memory.
> > Due to that issue we removed bpf interpreter from the kernel code.
> > That's what CONFIG_BPF_JIT_ALWAYS_ON is for...
> > but we still have nft interpreter in the kernel that can also
> > execute arbitrary nft expressions.
> > 
> > Jann's exploit used the following bpf instructions:
> > struct bpf_insn evil_bytecode_instrs[] = {
> > // rax = target_byte_addr
> > { .code = BPF_LD | BPF_IMM | BPF_DW, .dst_reg = 0, .imm = target_byte_addr 
> > }, { .imm = target_byte_addr>>32 },
> 
> We don't place pointers in the nft VM registers, it's basically
> illegal to do so, otherwise we would need more sophisticated verifier.
> I'm telling this because we don't have a way to point to any arbitrary
> address as in 'target_byte_addr' above.

These evil_bytecode_instrs never saw the bpf verifier either.
That's the scary part of that PoC.
The only requirement for the PoC to work is to have an interpreter
in the executable part of the hypervisor code and to speculatively jump into it
with arguments pointing to memory controlled by the VM.
All static checks (done by the bpf verifier and by nft validation) are bypassed.
The only way to defend against such an exploit is either to remove the interpreter
from the kernel or to add _run-time_ checks and masks for every memory access
(similar to what is done for spectre1 mitigations).
In the case of bpf that's impractical.
In the case of nft I suspect so too. I don't yet see how nft can check
that the skb pointer passed as part of nft_pktinfo is not an actual skb.
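
For readers unfamiliar with the "run-time checks and masks" mentioned here, this is roughly what a spectre1-style index mask looks like. The sketch is loosely modelled on the kernel's array_index_nospec() idea but is plain userspace C: index_mask() and load_byte() are hypothetical names, and it assumes that index and size fit in a long and that >> on a negative long is an arithmetic shift (as with GCC and Clang).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * All-ones when index < size, zero otherwise, computed without a branch
 * so that a mispredicted bounds check cannot skip it.
 */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (sizeof(long) * CHAR_BIT - 1));
}

/* Bounds-checked load whose index is clamped even on a mispredicted path. */
static unsigned char load_byte(const unsigned char *buf, unsigned long size,
			       unsigned long index)
{
	assert(size > 0);

	if (index >= size)
		return 0;

	index &= index_mask(index, size);	/* out-of-range index becomes 0 */
	return buf[index];
}
```

The cost is one extra AND per access, which is the overhead the message considers impractical to add to every interpreter memory access.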

> > // rdi = timing_leak_array
> > { .code = BPF_LD | BPF_IMM | BPF_DW, .dst_reg = 1, .imm = 
> > host_timing_leak_addr }, { .imm = host_timing_leak_addr>>32 },
> > // rax = *(u8*)rax
> > { .code = BPF_LDX | BPF_MEM | BPF_B, .dst_reg = 0, .src_reg = 0, .off = 0 },
> > // rax = rax << ...
> > { .code = BPF_ALU64 | BPF_LSH | BPF_K, .dst_reg = 0, .imm = 10 - bit_idx },
> > // rax = rax & 0x400
> > { .code = BPF_ALU64 | BPF_AND | BPF_K, .dst_reg = 0, .imm = 0x400 },
> > // rax = rdi + rax
> > { .code = BPF_ALU64 | BPF_ADD | BPF_X, .dst_reg = 0, .src_reg = 1 },
> > // *(u8*) (rax + 0x800)
> > { .code = BPF_LDX | BPF_MEM | BPF_B, .dst_reg = 0, .src_reg = 0, .off = 
> > 0x800 },
> > 
> > and a gadget to jump into __bpf_prog_run with insn pointing
> > to memory controlled by the guest while accessible
> > (at different virt address) by the hypervisor.
> > 
> > It seems possible to construct similar sequence of instructions
> > out of nft expressions and use gadget that jumps into nft_do_chain().
> > The attacker would need to discover more kernel addresses:
> > nft_do_chain, nft_cmp_fast_ops, nft_payload_fast_ops, nft_bitwise_eval,
> > nft_lookup_eval, and nft_bitmap_lookup
> > to populate nft chains, rules and expressions in guest memory
> > comparing to bpf interpreter attack.
> > 
> > Then in nft_do_chain(struct nft_pktinfo *pkt, void *priv)
> > pkt needs to point to fake struct sk_buff in guest memory with
> > skb->head == target_byte_addr
> 
> We don't have a way to make this point to fake struct sk_buff.

Yet. It's possible, since the CPU is speculating and all such pointers
controlled by the VM can be arbitrary.

> > The first nft expression can be nft_payload_fast_eval().
> > If it's properly constructed with
> > (nft_payload->based == NFT_PAYLOAD_NETWORK_HEADER, offset == 0, len == 0, 
> > dreg == 1)
> 
> We can reject len == 0. To be honest, this is not done right now, but
> we can add a patch to validate this. Given that this is a specialized
> networking virtual machine, it retains semantics, so fetching zero-length
> data from an skbuff makes no sense; hence, we can return EINVAL
> via netlink when adding a rule that tries to do this.

Adding a static check won't help.

[PATCH] ipv6 sit: work around bogus gcc-8 -Wrestrict warning

2018-02-22 Thread Arnd Bergmann

gcc-8 has a new warning that detects overlapping input and output arguments
in memcpy(). It triggers for sit_init_net() calling ipip6_tunnel_clone_6rd(),
which is actually correct:

net/ipv6/sit.c: In function 'sit_init_net':
net/ipv6/sit.c:192:3: error: 'memcpy' source argument is the same as 
destination [-Werror=restrict]

The problem here is that the logic detecting the memcpy() arguments finds them
to be the same, but the conditional that tests for the input and output of
ipip6_tunnel_clone_6rd() to be identical is not a compile-time constant.

We know that netdev_priv(t->dev) is the same as t for a tunnel device,
and comparing "dev" directly here lets the compiler figure out as well
that 'dev == sitn->fb_tunnel_dev' when called from sit_init_net(), so
it no longer warns.

This code is old, so Cc stable to make sure that we don't get the warning
for older kernels built with new gcc.

Cc: sta...@vger.kernel.org
Cc: Martin Sebor 
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83456
Signed-off-by: Arnd Bergmann 
---
 net/ipv6/sit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 3873d3877135..3a1775a62973 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -182,7 +182,7 @@ static void ipip6_tunnel_clone_6rd(struct net_device *dev, 
struct sit_net *sitn)
 #ifdef CONFIG_IPV6_SIT_6RD
struct ip_tunnel *t = netdev_priv(dev);
 
-   if (t->dev == sitn->fb_tunnel_dev) {
+   if (dev == sitn->fb_tunnel_dev) {
ipv6_addr_set(&t->ip6rd.prefix, htonl(0x20020000), 0, 0, 0);
t->ip6rd.relay_prefix = 0;
t->ip6rd.prefixlen = 16;
-- 
2.9.0
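
The pattern under discussion, reduced to a standalone sketch: a clone function that either sets defaults on the fallback object or copies from it, with a pointer comparison guaranteeing that memcpy() never sees identical source and destination. The struct layout, the fallback object and clone_params() are hypothetical stand-ins, not the sit.c code.

```c
#include <assert.h>
#include <string.h>

struct params { int prefixlen; int relay_prefixlen; };
struct tunnel { struct params p; };

/* Hypothetical stand-in for the per-namespace fallback tunnel. */
static struct tunnel fallback = { { 0, 0 } };

/*
 * Initialise t: the fallback tunnel gets defaults, every other tunnel
 * copies the fallback's parameters. Comparing the pointers directly
 * keeps the self-copy case out of the memcpy() branch.
 */
static void clone_params(struct tunnel *t)
{
	assert(t != NULL);

	if (t == &fallback) {
		t->p.prefixlen = 16;
		t->p.relay_prefixlen = 0;
	} else {
		memcpy(&t->p, &fallback.p, sizeof(t->p));
	}
}
```

Whether a given compiler can prove the two branches exclusive depends on how directly the comparison is written, which is the point of the one-line change in the patch.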

[PATCH] bpf: add schedule points in percpu arrays management

2018-02-22 Thread Eric Dumazet

From: Eric Dumazet 

syzbot managed to trigger RCU detected stalls in
bpf_array_free_percpu()

It takes time to allocate a huge percpu map, but even more time to free
it.

Since we run in process context, use cond_resched() to yield cpu if
needed.

Fixes: a10423b87a7e ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map")
Signed-off-by: Eric Dumazet 
Reported-by: syzbot 
---
 kernel/bpf/arraymap.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 
a364c408f25a54a8175c92b6004a5e7e15f198cb..14750e7c5ee4872e4a7426e960bea7ae001e6623
 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -26,8 +26,10 @@ static void bpf_array_free_percpu(struct bpf_array *array)
 {
int i;
 
-   for (i = 0; i < array->map.max_entries; i++)
+   for (i = 0; i < array->map.max_entries; i++) {
free_percpu(array->pptrs[i]);
+   cond_resched();
+   }
 }
 
 static int bpf_array_alloc_percpu(struct bpf_array *array)
@@ -43,6 +45,7 @@ static int bpf_array_alloc_percpu(struct bpf_array *array)
return -ENOMEM;
}
array->pptrs[i] = ptr;
+   cond_resched();
}
 
return 0;
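
A userspace analogy of the same idea, for illustration only: a long free loop that offers the CPU to other runnable tasks on every iteration. sched_yield() stands in for cond_resched() here; unlike cond_resched() it yields unconditionally, and free_slots() is a hypothetical helper rather than anything from the patch.

```c
#include <assert.h>
#include <sched.h>
#include <stdlib.h>

/* Free n separately allocated slots, offering the CPU after each one. */
static void free_slots(void **slots, size_t n)
{
	size_t i;

	assert(slots != NULL || n == 0);

	for (i = 0; i < n; i++) {
		free(slots[i]);
		slots[i] = NULL;
		sched_yield();	/* userspace stand-in for cond_resched() */
	}
}
```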

Re: [PATCH] ipv6 sit: work around bogus gcc-8 -Wrestrict warning

2018-02-22 Thread Arnd Bergmann

On Thu, Feb 22, 2018 at 5:40 PM, Eric Dumazet  wrote:
> On Thu, 2018-02-22 at 16:55 +0100, Arnd Bergmann wrote:
>
> ...
>>
>> This code is old, so Cc stable to make sure that we don't get the warning
>> for older kernels built with new gcc.
>>
>> Cc: sta...@vger.kernel.org
>
>
> This part makes little sense to me for two reasons.
>
> 1) David Miller handles stable submission himself
>  ( Documentation/networking/netdev-FAQ.txt )

Right, sorry I keep forgetting this.

> 2) We are not supposed to make sure old kernels will compile with
> future compilers.
>
> That would need a lot of work and potential new bugs, not worth the
> time.

I did spend some time backporting the gcc-7 fixes to stable kernels
already.

The 4.4 and 4.9 releases did not build cleanly with gcc-7 originally
but now they do, which is useful since Greg actually uses that
compiler for test building them.

I expect to do the same for gcc-8. Most of the fixes are trivial
anyway, and some of them fix actual bugs that would otherwise
get missed.

  Arnd

[PATCH] Remove useless assignment in ip_do_fragment

2018-02-22 Thread C0deAi

Hi my name is Benjamin Bales.

I am the founder and creator of CodeAI,
the first non-human contributor to your software project. CodeAI finds
and fixes security defects for you. It fixed 327. It wants to merge a
fix for a useless assignment. To view all 327 fixed issues from the
run claim your free open source account at mycode.ai and the
Dockerfile used to build and run your project in CodeAI, here-
https://drive.google.com/drive/folders/1KB9WQQyWQgYccmiSjy2E1JWJ4vWuoLYd .
It is always free for open source projects.

If you have any questions about these results or have general
inquiries about CodeAI, please send an email to techsupp...@mycode.ai

Signed-off-by: Benjamin Bales 
---
 net/ipv4/ip_output.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index e8e675b..0e44434 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -640,7 +640,6 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct 
sk_buff *skb,
 
/* Everything is OK. Generate! */
 
-   err = 0;
offset = 0;
frag = skb_shinfo(skb)->frag_list;
skb_frag_list_init(skb);
-- 
2.7.4

[Crypto v7 00/12] Chelsio Inline TLS

2018-02-22 Thread Atul Gupta

Series for Chelsio Inline TLS driver (chtls.ko)

Use the tls ULP infrastructure to register chtls as an Inline TLS driver.
chtls uses TCP sockets to transmit and receive TLS records. TCP proto_ops is 
extended to offload TLS records.

T6 adapter provides the following features:
-TLS record offload, TLS header, encrypt, digest and transmit
-TLS record receive and decrypt
-TLS keys store
-TCP/IP engine
-TLS engine
-GCM crypto engine [support CBC also]

TLS provides security at the transport layer. It uses TCP to provide reliable 
end-to-end transport of application data. It relies on TCP for any 
retransmission. A TLS session comprises three parts:
a. TCP/IP connection
b. TLS handshake
c. Record layer processing

The TLS handshake state machine is executed on the host (refer to a standard
implementation, e.g. OpenSSL). Setsockopt [SOL_TCP, TCP_ULP] initializes the TCP
proto-ops for Chelsio inline TLS support:
setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));

Tx and Rx keys are decided during the handshake and programmed onto the chip
after CCS is exchanged:
struct tls12_crypto_info_aes_gcm_128 crypto_info
setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info))
Finish is the first encrypted/decrypted message tx/rx inline.

On the Tx path the TLS engine receives plain text from openssl, inserts the IV,
fetches the Tx key, creates cipher-text records and generates the MAC. The TLS
header is added to the cipher text, which is forwarded to the TCP/IP engine for
transport layer processing and transmission on the wire.
TX:
Application--openssl--chtls---TLS engine---encrypt/auth---TCP/IP engine---wire.

On the Rx side, received data is PDU aligned at record boundaries. TLS
processes only complete records. If the Rx key is programmed on CCS receive,
data is decrypted and plain text is posted to the host.
RX:
Wire--cipher-text--TCP/IP engine [PDU align]---TLS engine---decrypt/auth---plain-text--chtls--openssl--application

v7: func name change, use sk->sk_prot where required

v6: modify prot only for FULL_HW
   -corrected commit message for patch 11

v5: set TLS_FULL_HW for registered inline tls drivers
   -set TLS_FULL_HW prot for offload connection else move
to TLS_SW_TX
   -Case handled for interface with same IP [Dave Miller]
   -Removed Specific IP and INADDR_ANY handling [v4]

v4: removed chtls ULP type, retained tls ULP
   -registered chtls with net tls
   -defined struct tls_device to register the Inline drivers
   -ethtool interface tls-inline to enable Inline TLS for interface
   -prot update to support inline TLS

v3: fixed the kbuild test issues
   -made a few functions static
   -initialized a few variables

v2: fixed the following based on the review comments of Stephan Mueller,
Stefano Brivio and Hannes Frederic
-Added more details in cover letter
-Fixed indentation and formatting issues
-Using aes instead of aes-generic
-memset key info after programming the key on chip
-reordered the patch sequence
Atul Gupta (12):
  tls: tls_device struct to register TLS drivers
  ethtool: enable Inline TLS in HW
  tls: support for inline tls
  chtls: structure and macro definiton
  cxgb4: Inline TLS FW Interface
  cxgb4: LLD driver changes to enable TLS
  chcr: Key Macro
  chtls: Key program
  chtls: CPL handler definition
  chtls: Inline crypto request Tx/Rx
  chtls: Register chtls Inline TLS with net tls
  Makefile Kconfig

 drivers/crypto/chelsio/Kconfig  |   11 +
 drivers/crypto/chelsio/Makefile |1 +
 drivers/crypto/chelsio/chcr_algo.h  |   42 +
 drivers/crypto/chelsio/chcr_core.h  |   55 +-
 drivers/crypto/chelsio/chtls/Makefile   |4 +
 drivers/crypto/chelsio/chtls/chtls.h|  487 ++
 drivers/crypto/chelsio/chtls/chtls_cm.c | 2041 +++
 drivers/crypto/chelsio/chtls/chtls_cm.h |  202 +++
 drivers/crypto/chelsio/chtls/chtls_hw.c |  394 +
 drivers/crypto/chelsio/chtls/chtls_io.c | 1867 +
 drivers/crypto/chelsio/chtls/chtls_main.c   |  600 +++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |   32 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h  |7 +
 drivers/net/ethernet/chelsio/cxgb4/sge.c|   98 +-
 drivers/net/ethernet/chelsio/cxgb4/t4_msg.h |  121 +-
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h|2 +
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h   |  165 +-
 include/linux/netdev_features.h |2 +
 include/net/tls.h   |   23 +
 include/uapi/linux/tls.h|1 +
 net/core/ethtool.c  |1 +
 net/ipv4/tcp_minisocks.c|1 +
 net/tls/tls_main.c  |  123 +-
 23 files changed, 6256 insertions(+), 24 deletions(-)
 create mode 100644 drivers/crypto/chelsio/chtls/Makefile
 create mode 100644 drivers/crypto/chelsio/chtls/chtls.h
 create mode 100644

[Crypto v7 02/12] ethtool: enable Inline TLS in HW

2018-02-22 Thread Atul Gupta

Signed-off-by: Atul Gupta 
---
 include/linux/netdev_features.h | 2 ++
 net/core/ethtool.c  | 1 +
 2 files changed, 3 insertions(+)

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index db84c51..aacabe2 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -79,6 +79,7 @@ enum {
NETIF_F_RX_UDP_TUNNEL_PORT_BIT, /* Offload of RX port for UDP tunnels */
 
NETIF_F_GRO_HW_BIT, /* Hardware Generic receive offload */
+   NETIF_F_HW_TLS_INLINE_BIT,  /* Offload TLS record */
 
/*
 * Add your fresh new feature above and remember to update
@@ -145,6 +146,7 @@ enum {
 #define NETIF_F_HW_ESP __NETIF_F(HW_ESP)
 #define NETIF_F_HW_ESP_TX_CSUM __NETIF_F(HW_ESP_TX_CSUM)
 #define NETIF_F_RX_UDP_TUNNEL_PORT  __NETIF_F(RX_UDP_TUNNEL_PORT)
+#define NETIF_F_HW_TLS_INLINE  __NETIF_F(HW_TLS_INLINE)
 
 #define for_each_netdev_feature(mask_addr, bit)\
for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT)
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 494e6a5..ab16781 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -107,6 +107,7 @@ int ethtool_op_get_ts_info(struct net_device *dev, struct 
ethtool_ts_info *info)
[NETIF_F_HW_ESP_BIT] =   "esp-hw-offload",
[NETIF_F_HW_ESP_TX_CSUM_BIT] =   "esp-tx-csum-hw-offload",
[NETIF_F_RX_UDP_TUNNEL_PORT_BIT] =   "rx-udp_tunnel-port-offload",
+   [NETIF_F_HW_TLS_INLINE_BIT] =   "tls-inline",
 };
 
 static const char
-- 
1.8.3.1

Re: [PATCH bpf] bpf: fix rcu lockdep warning for lpm_trie map_free callback

2018-02-22 Thread Yonghong Song




On 2/22/18 5:37 AM, Eric Dumazet wrote:

On Wed, 2018-02-21 at 22:38 -0800, Yonghong Song wrote:

Commit 9a3efb6b661f ("bpf: fix memory leak in lpm_trie map_free callback 
function")
fixed a memory leak and removed unnecessary locks in map_free callback function.
Unfortunately, it introduced a lockdep warning. When lockdep checking is 
turned on,
running tools/testing/selftests/bpf/test_lpm_map will have:

   [   98.294321] =
   [   98.294807] WARNING: suspicious RCU usage
   [   98.295359] 4.16.0-rc2+ #193 Not tainted
   [   98.295907] -
   [   98.296486] /home/yhs/work/bpf/kernel/bpf/lpm_trie.c:572 suspicious 
rcu_dereference_check() usage!
   [   98.297657]
   [   98.297657] other info that might help us debug this:
   [   98.297657]
   [   98.298663]
   [   98.298663] rcu_scheduler_active = 2, debug_locks = 1
   [   98.299536] 2 locks held by kworker/2:1/54:
   [   98.300152]  #0:  ((wq_completion)"events"){+.+.}, at: 
[<196bc1f0>] process_one_work+0x157/0x5c0
   [   98.301381]  #1:  ((work_completion)(>work)){+.+.}, at: 
[<196bc1f0>] process_one_work+0x157/0x5c0

Since actual trie tree removal happens only after no other
accesses to the tree are possible, this patch simply converted all
rcu protected pointer access to normal access, which removed the
above warning.

Fixes: 9a3efb6b661f ("bpf: fix memory leak in lpm_trie map_free callback 
function")
Reported-by: Eric Dumazet 
Signed-off-by: Yonghong Song 
---
  kernel/bpf/lpm_trie.c | 11 +--
  1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index a75e02c..0c15813 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -552,7 +552,7 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
  static void trie_free(struct bpf_map *map)
  {
struct lpm_trie *trie = container_of(map, struct lpm_trie, map);
-   struct lpm_trie_node __rcu **slot;
+   struct lpm_trie_node **slot;
struct lpm_trie_node *node;
  
  	/* Wait for outstanding programs to complete

@@ -569,23 +569,22 @@ static void trie_free(struct bpf_map *map)
slot = >root;
  
  		for (;;) {

-   node = rcu_dereference_protected(*slot,
-   lockdep_is_held(>lock));
+   node = *slot;


Hi Yonghong

It is not sparse compliant.

kernel/bpf/lpm_trie.c:573:30: warning: incorrect type in assignment (different 
address spaces)
kernel/bpf/lpm_trie.c:573:30:expected struct lpm_trie_node *node
kernel/bpf/lpm_trie.c:573:30:got struct lpm_trie_node [noderef] 
*


In my local tree, I simply did

node = rcu_dereference_protected(*slot, 1);

Since we are the last user of the whole tree after the prior synchronize_rcu();


Eric,

Thanks for the fix suggestion. It does make sense.

I did run sparse before sending my patch. Unfortunately, my dev 
machine (sparse 0.5.0 + gcc 4.8.5) didn't issue a warning like the one above.

With the same kernel config and kernel tree, I just tried on another 
machine (a FC27 VM, sparse 0.5.1 + gcc 7.3.1). There I did see the warning, and
the fix suggested above makes it go away.
I need to figure out why sparse does not report it on my dev machine.

Will send a follow-up patch soon.

Thanks!

Yonghong

[Crypto v7 01/12] tls: tls_device struct to register TLS drivers

2018-02-22 Thread Atul Gupta

tls_device structure to register Inline TLS drivers
with net/tls

Signed-off-by: Atul Gupta 
---
 include/net/tls.h | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/include/net/tls.h b/include/net/tls.h
index 4913430..e315bf9 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -55,6 +55,27 @@
 #define TLS_RECORD_TYPE_DATA   0x17
 
 #define TLS_AAD_SPACE_SIZE 13
+#define TLS_DEVICE_NAME_MAX		32
+
+enum {
+   TLS_BASE_TX,
+   TLS_SW_TX,
+   TLS_FULL_HW, /* TLS record processed Inline */
+   TLS_NUM_CONFIG,
+};
+extern struct proto tls_prots[TLS_NUM_CONFIG];
+
+struct tls_device {
+   char name[TLS_DEVICE_NAME_MAX];
+   struct list_head dev_list;
+
+   /* netdev present in registered inline tls driver */
+   int (*netdev)(struct tls_device *device,
+ struct net_device *netdev);
+   int (*feature)(struct tls_device *device);
+   int (*hash)(struct tls_device *device, struct sock *sk);
+   void (*unhash)(struct tls_device *device, struct sock *sk);
+};
 
 struct tls_sw_context {
struct crypto_aead *aead_send;
@@ -256,5 +277,7 @@ static inline struct tls_offload_context *tls_offload_ctx(
 
 int tls_proccess_cmsg(struct sock *sk, struct msghdr *msg,
  unsigned char *record_type);
+void tls_register_device(struct tls_device *device);
+void tls_unregister_device(struct tls_device *device);
 
 #endif /* _TLS_OFFLOAD_H */
-- 
1.8.3.1

Re: ppp/pppoe, still panic 4.15.3 in ppp_push

2018-02-22 Thread Guillaume Nault

On Wed, Feb 21, 2018 at 12:04:30PM -0800, Cong Wang wrote:
> On Thu, Feb 15, 2018 at 11:31 AM, Guillaume Nault  
> wrote:
> > On Thu, Feb 15, 2018 at 06:01:16PM +0200, Denys Fedoryshchenko wrote:
> >> On 2018-02-15 17:55, Guillaume Nault wrote:
> >> > On Thu, Feb 15, 2018 at 12:19:52PM +0200, Denys Fedoryshchenko wrote:
> >> > > Here we go:
> >> > >
> >> > >   [24558.921549]
> >> > > ==
> >> > >   [24558.922167] BUG: KASAN: use-after-free in
> >> > > ppp_ioctl+0xa6a/0x1522
> >> > > [ppp_generic]
> >> > >   [24558.922776] Write of size 8 at addr 8803d35bf3f8 by task
> >> > > accel-pppd/12622
> >> > >   [24558.923113]
> >> > >   [24558.923451] CPU: 0 PID: 12622 Comm: accel-pppd Tainted: G
> >> > > W
> >> > > 4.15.3-build-0134 #1
> >> > >   [24558.924058] Hardware name: HP ProLiant DL320e Gen8 v2,
> >> > > BIOS P80
> >> > > 04/02/2015
> >> > >   [24558.924406] Call Trace:
> >> > >   [24558.924753]  dump_stack+0x46/0x59
> >> > >   [24558.925103]  print_address_description+0x6b/0x23b
> >> > >   [24558.925451]  ? ppp_ioctl+0xa6a/0x1522 [ppp_generic]
> >> > >   [24558.925797]  kasan_report+0x21b/0x241
> >> > >   [24558.926136]  ppp_ioctl+0xa6a/0x1522 [ppp_generic]
> >> > >   [24558.926479]  ? ppp_nl_newlink+0x1da/0x1da [ppp_generic]
> >> > >   [24558.926829]  ? sock_sendmsg+0x89/0x99
> >> > >   [24558.927176]  ? __vfs_write+0xd9/0x4ad
> >> > >   [24558.927523]  ? kernel_read+0xed/0xed
> >> > >   [24558.927872]  ? SyS_getpeername+0x18c/0x18c
> >> > >   [24558.928213]  ? bit_waitqueue+0x2a/0x2a
> >> > >   [24558.928561]  ? wake_atomic_t_function+0x115/0x115
> >> > >   [24558.928898]  vfs_ioctl+0x6e/0x81
> >> > >   [24558.929228]  do_vfs_ioctl+0xa00/0xb10
> >> > >   [24558.929571]  ? sigprocmask+0x1a6/0x1d0
> >> > >   [24558.929907]  ? sigsuspend+0x13e/0x13e
> >> > >   [24558.930239]  ? ioctl_preallocate+0x14e/0x14e
> >> > >   [24558.930568]  ? SyS_rt_sigprocmask+0xf1/0x142
> >> > >   [24558.930904]  ? sigprocmask+0x1d0/0x1d0
> >> > >   [24558.931252]  SyS_ioctl+0x39/0x55
> >> > >   [24558.931595]  ? do_vfs_ioctl+0xb10/0xb10
> >> > >   [24558.931942]  do_syscall_64+0x1b1/0x31f
> >> > >   [24558.932288]  entry_SYSCALL_64_after_hwframe+0x21/0x86
> >> > >   [24558.932627] RIP: 0033:0x7f302849d8a7
> >> > >   [24558.932965] RSP: 002b:7f3029a52af8 EFLAGS: 0206
> >> > > ORIG_RAX:
> >> > > 0010
> >> > >   [24558.933578] RAX: ffda RBX: 7f3027d861e3 RCX:
> >> > > 7f302849d8a7
> >> > >   [24558.933927] RDX: 7f3023f49468 RSI: 4004743a RDI:
> >> > > 3a67
> >> > >   [24558.934266] RBP: 7f3029a52b20 R08:  R09:
> >> > > 55c8308d8e40
> >> > >   [24558.934607] R10: 0008 R11: 0206 R12:
> >> > > 7f3023f49358
> >> > >   [24558.934947] R13: 7ffe86e5723f R14:  R15:
> >> > > 7f3029a53700
> >> > >   [24558.935288]
> >> > >   [24558.935626] Allocated by task 12622:
> >> > >   [24558.935972]  ppp_register_net_channel+0x5f/0x5c6
> >> > > [ppp_generic]
> >> > >   [24558.936306]  pppoe_connect+0xab7/0xc71 [pppoe]
> >> > >   [24558.936640]  SyS_connect+0x14b/0x1b7
> >> > >   [24558.936975]  do_syscall_64+0x1b1/0x31f
> >> > >   [24558.937319]  entry_SYSCALL_64_after_hwframe+0x21/0x86
> >> > >   [24558.937655]
> >> > >   [24558.937993] Freed by task 12622:
> >> > >   [24558.938321]  kfree+0xb0/0x11d
> >> > >   [24558.938658]  ppp_release+0x111/0x120 [ppp_generic]
> >> > >   [24558.938994]  __fput+0x2ba/0x51a
> >> > >   [24558.939332]  task_work_run+0x11c/0x13d
> >> > >   [24558.939676]  exit_to_usermode_loop+0x7c/0xaf
> >> > >   [24558.940022]  do_syscall_64+0x2ea/0x31f
> >> > >   [24558.940368]  entry_SYSCALL_64_after_hwframe+0x21/0x86
> >> > >   [24558.947099]
> >> >
> >> > Your first guess was right. It looks like we have an issue with
> >> > reference counting on the channels. Can you send me your ppp_generic.o?
> >> http://nuclearcat.com/ppp_generic.o
> >> Compiled with gcc version 6.4.0 (Gentoo 6.4.0-r1 p1.3)
> >>
> > From what I can see, ppp_release() and ioctl(PPPIOCCONNECT) are called
> > concurrently on the same ppp_file. Even if this ppp_file was pointed at
> > by two different file descriptors, I can't see how this could defeat
> > the reference counting mechanism. I'm going to think more about it.
> 
> For me it looks like pch->clist is not removed from the list ppp->channels
> when destroyed via ppp_release(). But I don't want to pretend I understand
> ppp logic.
> 
I've thought about that too, but couldn't find a scenario that could
trigger the bug.

To get ->private_data pointing to a struct channel pointer, a file needs
to ioctl(PPPIOCATTCHAN) first. For this call to succeed, the channel
must have been registered with ppp_register_net_channel(). Both
operations take a reference on the channel, which means that, before
adding pch->clist to a ppp->channels list (with ppp_connect_channel()),
the channel is

[PATCH bpf v2] bpf: fix rcu lockdep warning for lpm_trie map_free callback

2018-02-22 Thread Yonghong Song

Commit 9a3efb6b661f ("bpf: fix memory leak in lpm_trie map_free callback 
function")
fixed a memory leak and removed unnecessary locks in map_free callback function.
Unfortunately, it introduced a lockdep warning. When lockdep checking is 
turned on,
running tools/testing/selftests/bpf/test_lpm_map will have:

  [   98.294321] =
  [   98.294807] WARNING: suspicious RCU usage
  [   98.295359] 4.16.0-rc2+ #193 Not tainted
  [   98.295907] -
  [   98.296486] /home/yhs/work/bpf/kernel/bpf/lpm_trie.c:572 suspicious 
rcu_dereference_check() usage!
  [   98.297657]
  [   98.297657] other info that might help us debug this:
  [   98.297657]
  [   98.298663]
  [   98.298663] rcu_scheduler_active = 2, debug_locks = 1
  [   98.299536] 2 locks held by kworker/2:1/54:
  [   98.300152]  #0:  ((wq_completion)"events"){+.+.}, at: 
[<196bc1f0>] process_one_work+0x157/0x5c0
  [   98.301381]  #1:  ((work_completion)(>work)){+.+.}, at: 
[<196bc1f0>] process_one_work+0x157/0x5c0

Since actual trie tree removal happens only after no other
accesses to the tree are possible, replacing
  rcu_dereference_protected(*slot, lockdep_is_held(>lock))
with
  rcu_dereference_protected(*slot, 1)
fixed the issue.

Fixes: 9a3efb6b661f ("bpf: fix memory leak in lpm_trie map_free callback 
function")
Reported-by: Eric Dumazet 
Suggested-by: Eric Dumazet 
Signed-off-by: Yonghong Song 
---
 kernel/bpf/lpm_trie.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

v1 -> v2:
 . fix sparse warning which is introduced by v1, suggested by Eric.

diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index a75e02c..b4b5b81 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -569,8 +569,7 @@ static void trie_free(struct bpf_map *map)
slot = >root;
 
for (;;) {
-   node = rcu_dereference_protected(*slot,
-   lockdep_is_held(>lock));
+   node = rcu_dereference_protected(*slot, 1);
if (!node)
goto out;
 
-- 
2.9.5

Re: [PATCH net-next v2 1/1] net: Allow a rule to track originating protocol

2018-02-22 Thread David Ahern

On 2/22/18 1:23 AM, Ido Schimmel wrote:
>> diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
>> index 98e1066c3d55..c1d4ab5b2d9f 100644
>> --- a/net/core/fib_rules.c
>> +++ b/net/core/fib_rules.c
>> @@ -51,6 +51,7 @@ int fib_default_rule_add(struct fib_rules_ops *ops,
>>  r->pref = pref;
>>  r->table = table;
>>  r->flags = flags;
>> +r->proto = RTPROT_KERNEL;
>>  r->fr_net = ops->fro_net;
>>  r->uid_range = fib_kuid_range_unset;
>>  
>> @@ -465,6 +466,7 @@ int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr 
>> *nlh,
>>  }
>>  refcount_set(>refcnt, 1);
>>  rule->fr_net = net;
>> +rule->proto = frh->proto;
>>  
>>  rule->pref = tb[FRA_PRIORITY] ? nla_get_u32(tb[FRA_PRIORITY])
>>: fib_default_rule_pref(ops);
>> @@ -664,6 +666,9 @@ int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr 
>> *nlh,
>>  }
>>  
>>  list_for_each_entry(rule, >rules_list, list) {
>> +if (frh->proto && (frh->proto != rule->proto))
>> +continue;
> 
> This breaks my scripts:
> # ip -4 rule show
> 0:  from all lookup local
> 32766:  from all lookup main
> 32767:  from all lookup default
> 
> # ip -4 rule del pref 0
> RTNETLINK answers: No such file or directory
> 
> Using iproute 4.15 in Fedora 27:
> # ip -V
> ip utility, iproute2-ss180129
> 
> Problem is iproute sets protocol to RTPROT_BOOT while rules are
> installed with RTPROT_KERNEL.
> 
> Maybe add FRA_PROTOCOL?
> 
> Thanks!

ugh. Another iproute2 bug that the kernel has to deal with. iproute2 has
been using rtm for the ancillary header for rules when it should have
been fib_rule_hdr. That bug allowed someone to set the protocol field to
RTPROT_BOOT which was complete nonsense for rules until Donald's recent
patch.

That means all FIB rules need to default to RTPROT_BOOT. I hate to
inherit that for the l3mdev rule, but looking at the iproute2 code I
don't see any options.

Donald: send a patch that changes the protocol for kernel installed
rules to RTPROT_BOOT.
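
The mismatch discussed above can be reproduced in isolation with a small 
user-space model of the quoted check. This is only a sketch: struct demo_rule 
and demo_delrule_matches() are hypothetical stand-ins for the kernel's 
fib_rule handling, and only the RTPROT_* constants come from 
<linux/rtnetlink.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <linux/rtnetlink.h> /* RTPROT_UNSPEC, RTPROT_KERNEL, RTPROT_BOOT */

/* Hypothetical, heavily simplified stand-in for struct fib_rule. */
struct demo_rule {
	uint32_t pref;
	uint8_t proto;
};

/*
 * Mirrors the quoted delrule check: a non-zero protocol in the request
 * must equal the protocol stored on the rule, otherwise the rule is
 * skipped. The priority comparison stands in for the remaining matches.
 */
static bool demo_delrule_matches(const struct demo_rule *rule,
				 uint32_t req_pref, uint8_t req_proto)
{
	assert(rule);
	if (req_proto && req_proto != rule->proto)
		return false;
	return rule->pref == req_pref;
}
```

With a rule stored as RTPROT_KERNEL, a delete request carrying RTPROT_BOOT 
never matches, which is the "No such file or directory" case reported above. 
Storing kernel-installed rules as RTPROT_BOOT makes the same request match.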

[Crypto v7 08/12] chtls: Key program

2018-02-22 Thread Atul Gupta

Program the tx and rx key on chip.

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chtls/chtls_hw.c | 394 
 1 file changed, 394 insertions(+)
 create mode 100644 drivers/crypto/chelsio/chtls/chtls_hw.c

diff --git a/drivers/crypto/chelsio/chtls/chtls_hw.c 
b/drivers/crypto/chelsio/chtls/chtls_hw.c
new file mode 100644
index 000..c3e17159
--- /dev/null
+++ b/drivers/crypto/chelsio/chtls/chtls_hw.c
@@ -0,0 +1,394 @@
+/*
+ * Copyright (c) 2017 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Written by: Atul Gupta (atul.gu...@chelsio.com)
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "chtls.h"
+#include "chtls_cm.h"
+
+static void __set_tcb_field_direct(struct chtls_sock *csk,
+  struct cpl_set_tcb_field *req, u16 word,
+  u64 mask, u64 val, u8 cookie, int no_reply)
+{
+   struct ulptx_idata *sc;
+
+   INIT_TP_WR_CPL(req, CPL_SET_TCB_FIELD, csk->tid);
+   req->wr.wr_mid |= htonl(FW_WR_FLOWID_V(csk->tid));
+   req->reply_ctrl = htons(NO_REPLY_V(no_reply) |
+   QUEUENO_V(csk->rss_qid));
+   req->word_cookie = htons(TCB_WORD_V(word) | TCB_COOKIE_V(cookie));
+   req->mask = cpu_to_be64(mask);
+   req->val = cpu_to_be64(val);
+   sc = (struct ulptx_idata *)(req + 1);
+   sc->cmd_more = htonl(ULPTX_CMD_V(ULP_TX_SC_NOOP));
+   sc->len = htonl(0);
+}
+
+static void __set_tcb_field(struct sock *sk, struct sk_buff *skb, u16 word,
+   u64 mask, u64 val, u8 cookie, int no_reply)
+{
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+   struct cpl_set_tcb_field *req;
+   struct ulptx_idata *sc;
+   unsigned int wrlen = roundup(sizeof(*req) + sizeof(*sc), 16);
+
+   req = (struct cpl_set_tcb_field *)__skb_put(skb, wrlen);
+   __set_tcb_field_direct(csk, req, word, mask, val, cookie, no_reply);
+   set_wr_txq(skb, CPL_PRIORITY_CONTROL, csk->port_id);
+}
+
+static int chtls_set_tcb_field(struct sock *sk, u16 word, u64 mask, u64 val)
+{
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+   struct sk_buff *skb;
+   struct cpl_set_tcb_field *req;
+   struct ulptx_idata *sc;
+   unsigned int wrlen = roundup(sizeof(*req) + sizeof(*sc), 16);
+   unsigned int credits_needed = DIV_ROUND_UP(wrlen, 16);
+
+   skb = alloc_skb(wrlen, GFP_ATOMIC);
+   if (!skb)
+   return -ENOMEM;
+
+   __set_tcb_field(sk, skb, word, mask, val, 0, 1);
+   set_queue(skb, (csk->txq_idx << 1) | CPL_PRIORITY_DATA, sk);
+   csk->wr_credits -= credits_needed;
+   csk->wr_unacked += credits_needed;
+   enqueue_wr(csk, skb);
+   cxgb4_ofld_send(csk->egress_dev, skb);
+   return 0;
+}
+
+/*
+ * Set one of the t_flags bits in the TCB.
+ */
+int chtls_set_tcb_tflag(struct sock *sk, unsigned int bit_pos, int val)
+{
+   return chtls_set_tcb_field(sk, 1, 1ULL << bit_pos,
+   val << bit_pos);
+}
+
+static int chtls_set_tcb_keyid(struct sock *sk, int keyid)
+{
+   return chtls_set_tcb_field(sk, 31, 0xULL, keyid);
+}
+
+static int chtls_set_tcb_seqno(struct sock *sk)
+{
+   return chtls_set_tcb_field(sk, 28, ~0ULL, 0);
+}
+
+static int chtls_set_tcb_quiesce(struct sock *sk, int val)
+{
+   return chtls_set_tcb_field(sk, 1, (1ULL << TF_RX_QUIESCE_S),
+  TF_RX_QUIESCE_V(val));
+}
+
+static void *chtls_alloc_mem(unsigned long size)
+{
+   void *p = kmalloc(size, GFP_KERNEL);
+
+   if (!p)
+   p = vmalloc(size);
+   if (p)
+   memset(p, 0, size);
+   return p;
+}
+
+static void chtls_free_mem(void *addr)
+{
+   unsigned long p = (unsigned long)addr;
+
+   if (p >= VMALLOC_START && p < VMALLOC_END)
+   vfree(addr);
+   else
+   kfree(addr);
+}
+
+/* TLS Key bitmap processing */
+int chtls_init_kmap(struct chtls_dev *cdev, struct cxgb4_lld_info *lldi)
+{
+   unsigned int num_key_ctx, bsize;
+
+   num_key_ctx = (lldi->vr->key.size / TLS_KEY_CONTEXT_SZ);
+   bsize = BITS_TO_LONGS(num_key_ctx);
+
+   cdev->kmap.size = num_key_ctx;
+   cdev->kmap.available = bsize;
+   cdev->kmap.addr = chtls_alloc_mem(sizeof(*cdev->kmap.addr) *
+ bsize);
+   if (!cdev->kmap.addr)
+   return -1;
+
+   cdev->kmap.start = lldi->vr->key.start;
+   spin_lock_init(>kmap.lock);
+   return 0;
+}
+
+void chtls_free_kmap(struct chtls_dev *cdev)
+{
+   if (cdev->kmap.addr)
+   chtls_free_mem(cdev->kmap.addr);
+}
+
+static int

[Crypto v7 12/12] Makefile Kconfig

2018-02-22 Thread Atul Gupta

Entry for Inline TLS as another driver dependent on cxgb4 and chcr

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/Kconfig| 11 +++
 drivers/crypto/chelsio/Makefile   |  1 +
 drivers/crypto/chelsio/chtls/Makefile |  4 
 3 files changed, 16 insertions(+)
 create mode 100644 drivers/crypto/chelsio/chtls/Makefile

diff --git a/drivers/crypto/chelsio/Kconfig b/drivers/crypto/chelsio/Kconfig
index 5ae9f87..930d82d 100644
--- a/drivers/crypto/chelsio/Kconfig
+++ b/drivers/crypto/chelsio/Kconfig
@@ -29,3 +29,14 @@ config CHELSIO_IPSEC_INLINE
 default n
 ---help---
   Enable support for IPSec Tx Inline.
+
+config CRYPTO_DEV_CHELSIO_TLS
+tristate "Chelsio Crypto Inline TLS Driver"
+depends on CHELSIO_T4
+depends on TLS
+select CRYPTO_DEV_CHELSIO
+---help---
+  Support Chelsio Inline TLS with Chelsio crypto accelerator.
+
+  To compile this driver as a module, choose M here: the module
+  will be called chtls.
diff --git a/drivers/crypto/chelsio/Makefile b/drivers/crypto/chelsio/Makefile
index eaecaf1..639e571 100644
--- a/drivers/crypto/chelsio/Makefile
+++ b/drivers/crypto/chelsio/Makefile
@@ -3,3 +3,4 @@ ccflags-y := -Idrivers/net/ethernet/chelsio/cxgb4
 obj-$(CONFIG_CRYPTO_DEV_CHELSIO) += chcr.o
 chcr-objs :=  chcr_core.o chcr_algo.o
 chcr-$(CONFIG_CHELSIO_IPSEC_INLINE) += chcr_ipsec.o
+obj-$(CONFIG_CRYPTO_DEV_CHELSIO_TLS) += chtls/
diff --git a/drivers/crypto/chelsio/chtls/Makefile 
b/drivers/crypto/chelsio/chtls/Makefile
new file mode 100644
index 000..df13795
--- /dev/null
+++ b/drivers/crypto/chelsio/chtls/Makefile
@@ -0,0 +1,4 @@
+ccflags-y := -Idrivers/net/ethernet/chelsio/cxgb4 -Idrivers/crypto/chelsio/
+
+obj-$(CONFIG_CRYPTO_DEV_CHELSIO_TLS) += chtls.o
+chtls-objs := chtls_main.o chtls_cm.o chtls_io.o chtls_hw.o
-- 
1.8.3.1

[Crypto v7 06/12] cxgb4: LLD driver changes to enable TLS

2018-02-22 Thread Atul Gupta

Read FW capability. Read key area size. Dump the TLS record count.

Signed-off-by: Atul Gupta 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 32 +---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h  |  7 ++
 drivers/net/ethernet/chelsio/cxgb4/sge.c| 98 -
 3 files changed, 126 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 56bc626..ab5937e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -4284,18 +4284,32 @@ static int adap_init0(struct adapter *adap)
adap->num_ofld_uld += 2;
}
if (caps_cmd.cryptocaps) {
-   /* Should query params here...TODO */
-   params[0] = FW_PARAM_PFVF(NCRYPTO_LOOKASIDE);
-   ret = t4_query_params(adap, adap->mbox, adap->pf, 0, 2,
- params, val);
-   if (ret < 0) {
-   if (ret != -EINVAL)
+   if (ntohs(caps_cmd.cryptocaps) &
+   FW_CAPS_CONFIG_CRYPTO_LOOKASIDE) {
+   params[0] = FW_PARAM_PFVF(NCRYPTO_LOOKASIDE);
+   ret = t4_query_params(adap, adap->mbox, adap->pf, 0,
+ 2, params, val);
+   if (ret < 0) {
+   if (ret != -EINVAL)
+   goto bye;
+   } else {
+   adap->vres.ncrypto_fc = val[0];
+   }
+   adap->num_ofld_uld += 1;
+   }
+   if (ntohs(caps_cmd.cryptocaps) &
+   FW_CAPS_CONFIG_TLS_INLINE) {
+   params[0] = FW_PARAM_PFVF(TLS_START);
+   params[1] = FW_PARAM_PFVF(TLS_END);
+   ret = t4_query_params(adap, adap->mbox, adap->pf, 0,
+ 2, params, val);
+   if (ret < 0)
goto bye;
-   } else {
-   adap->vres.ncrypto_fc = val[0];
+   adap->vres.key.start = val[0];
+   adap->vres.key.size = val[1] - val[0] + 1;
+   adap->num_uld += 1;
}
adap->params.crypto = ntohs(caps_cmd.cryptocaps);
-   adap->num_uld += 1;
}
 #undef FW_PARAM_PFVF
 #undef FW_PARAM_DEV
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
index a14e8db..3d3ef3f 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
@@ -237,6 +237,7 @@ enum cxgb4_uld {
CXGB4_ULD_ISCSI,
CXGB4_ULD_ISCSIT,
CXGB4_ULD_CRYPTO,
+   CXGB4_ULD_TLS,
CXGB4_ULD_MAX
 };
 
@@ -287,6 +288,7 @@ struct cxgb4_virt_res {  /* virtualized 
HW resources */
struct cxgb4_range qp;
struct cxgb4_range cq;
struct cxgb4_range ocq;
+   struct cxgb4_range key;
unsigned int ncrypto_fc;
 };
 
@@ -298,6 +300,9 @@ struct chcr_stats_debug {
atomic_t error;
atomic_t fallback;
atomic_t ipsec_cnt;
+   atomic_t tls_pdu_tx;
+   atomic_t tls_pdu_rx;
+   atomic_t tls_key;
 };
 
 #define OCQ_WIN_OFFSET(pdev, vres) \
@@ -378,6 +383,8 @@ struct cxgb4_uld_info {
 int cxgb4_register_uld(enum cxgb4_uld type, const struct cxgb4_uld_info *p);
 int cxgb4_unregister_uld(enum cxgb4_uld type);
 int cxgb4_ofld_send(struct net_device *dev, struct sk_buff *skb);
+int cxgb4_immdata_send(struct net_device *dev, unsigned int idx,
+  const void *src, unsigned int len);
 int cxgb4_crypto_send(struct net_device *dev, struct sk_buff *skb);
 unsigned int cxgb4_dbfifo_count(const struct net_device *dev, int lpfifo);
 unsigned int cxgb4_port_chan(const struct net_device *dev);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c 
b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 6e310a0..32e3779 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -1740,9 +1740,9 @@ static void txq_stop_maperr(struct sge_uld_txq *q)
  * Stops an offload Tx queue that has become full and modifies the packet
  * being written to request a wakeup.
  */
-static void ofldtxq_stop(struct sge_uld_txq *q, struct sk_buff *skb)
+static void ofldtxq_stop(struct sge_uld_txq *q, void *src)
 {
-   struct fw_wr_hdr *wr = (struct fw_wr_hdr *)skb->data;
+   struct fw_wr_hdr *wr = (struct fw_wr_hdr *)src;
 
wr->lo |= htonl(FW_WR_EQUEQ_F | FW_WR_EQUIQ_F);
q->q.stops++;
@@ -2005,6 +2005,100 @@ int cxgb4_ofld_send(struct net_device *dev, struct 
sk_buff *skb)
 }
 EXPORT_SYMBOL(cxgb4_ofld_send);
 
+static void

Re: [PATCH net-next 2/5] net/smc: fix structure size

2018-02-22 Thread David Miller

From: Ursula Braun 
Date: Wed, 21 Feb 2018 12:32:32 +0100

> diff --git a/net/smc/smc_cdc.h b/net/smc/smc_cdc.h
> index ab240b37ad11..d2012fd22100 100644
> --- a/net/smc/smc_cdc.h
> +++ b/net/smc/smc_cdc.h
> @@ -48,7 +48,7 @@ struct smc_cdc_msg {
>   struct smc_cdc_producer_flags   prod_flags;
>   struct smc_cdc_conn_state_flags conn_state_flags;
>   u8  reserved[18];
> -} __aligned(8);
> +} __packed;  /* format defined in RFC7609 */

Hold on, __packed should only be used as the absolute last possible
option to fix structure layout problems.

Also, a sub-structure of smc_cdc_msg, union smc_cdc_cursor, is still
marked with __aligned(8).  That makes no sense at all.

Please fix this without using __packed, as __packed has a severe
detrimental effect on code generation for accessing such structure
on several cpu architectures.

Also, if these are legitimate bug fixes you should target them
at 'net' not 'net-next'.

Thank you.
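
As a rough illustration of the layout concern raised above, the following 
sketch compares the same hypothetical message struct with natural layout, 
with aligned(8), and with packed, using the GCC/Clang attribute syntax. The 
demo_* structs are made up for this example and are not the real 
smc_cdc_msg; the sizes in the assertions assume a common ABI where uint32_t 
has 4-byte alignment.

```c
#include <assert.h>
#include <stdint.h>

/* Natural layout: 26 bytes of fields, padded to a multiple of 4. */
struct demo_natural {
	uint16_t len;
	uint8_t type;
	uint8_t flags;
	uint32_t token;
	uint8_t reserved[18];
};

/* Same fields, but size and alignment rounded up to 8. */
struct demo_aligned {
	uint16_t len;
	uint8_t type;
	uint8_t flags;
	uint32_t token;
	uint8_t reserved[18];
} __attribute__((aligned(8)));

/* Same fields, no padding; alignment drops to 1. */
struct demo_packed {
	uint16_t len;
	uint8_t type;
	uint8_t flags;
	uint32_t token;
	uint8_t reserved[18];
} __attribute__((packed));

static_assert(sizeof(struct demo_natural) == 28, "tail padding to 4");
static_assert(sizeof(struct demo_aligned) == 32, "tail padding to 8");
static_assert(sizeof(struct demo_packed) == 26, "no padding");
static_assert(_Alignof(struct demo_packed) == 1, "packed lowers alignment");

/*
 * Because the packed struct may sit at any address, the compiler cannot
 * assume token is 4-byte aligned and may emit byte-wise loads on
 * architectures without efficient unaligned access.
 */
static uint32_t demo_read_token(const struct demo_packed *m)
{
	return m->token;
}
```

In this example every field already sits at its natural offset, so only the 
trailing padding differs between the three variants.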

[PATCH next-queue 1/3] ixgbe: check for 128-bit authentication

2018-02-22 Thread Shannon Nelson

Make sure the Security Association is using
a 128-bit authentication, since that's the only
size that the hardware offload supports.

Signed-off-by: Shannon Nelson 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 16 +++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.h |  1 +
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 93eacdd..8b7dbc8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -423,15 +423,21 @@ static int ixgbe_ipsec_parse_proto_keys(struct xfrm_state 
*xs,
const char aes_gcm_name[] = "rfc4106(gcm(aes))";
int key_len;
 
-   if (xs->aead) {
-   key_data = >aead->alg_key[0];
-   key_len = xs->aead->alg_key_len;
-   alg_name = xs->aead->alg_name;
-   } else {
+   if (!xs->aead) {
netdev_err(dev, "Unsupported IPsec algorithm\n");
return -EINVAL;
}
 
+   if (xs->aead->alg_icv_len != IXGBE_IPSEC_AUTH_BITS) {
+   netdev_err(dev, "IPsec offload requires %d bit 
authentication\n",
+  IXGBE_IPSEC_AUTH_BITS);
+   return -EINVAL;
+   }
+
+   key_data = >aead->alg_key[0];
+   key_len = xs->aead->alg_key_len;
+   alg_name = xs->aead->alg_name;
+
if (strcmp(alg_name, aes_gcm_name)) {
netdev_err(dev, "Unsupported IPsec algorithm - please use %s\n",
   aes_gcm_name);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.h 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.h
index da3ce78..87d2800 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.h
@@ -32,6 +32,7 @@
 #define IXGBE_IPSEC_MAX_RX_IP_COUNT	128
 #define IXGBE_IPSEC_BASE_RX_INDEX  0
 #define IXGBE_IPSEC_BASE_TX_INDEX  IXGBE_IPSEC_MAX_SA_COUNT
+#define IXGBE_IPSEC_AUTH_BITS  128
 
 #define IXGBE_RXTXIDX_IPS_EN   0x0001
 #define IXGBE_RXIDX_TBL_SHIFT  1
-- 
2.7.4

[PATCH next-queue 0/3] ixgbe: ipsec fixups

2018-02-22 Thread Shannon Nelson

These are a couple of updates for the ixgbe IPsec offload support.

Shannon Nelson (3):
  ixgbe: check for 128-bit authentication
  ixgbe: fix ipsec trailer length
  ixgbe: remove unneeded ipsec state free callback

 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 53 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.h |  1 +
 2 files changed, 35 insertions(+), 19 deletions(-)

-- 
2.7.4

pull-request: mac80211 2018-02-22

2018-02-22 Thread Johannes Berg

Hi Dave,

A bunch of fixes, including the nla_put_string() issue
just in from Kees. Otherwise nothing really super urgent
or interesting.

Please pull and let me know if there's any problem.

Thanks,
johannes



The following changes since commit ba804bb4b72e57374b5f567b783aa0298fba0ce6:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2018-01-26 
09:03:16 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211.git 
tags/mac80211-for-davem-2018-02-22

for you to fetch changes up to 657308f73e674e86b60509a430a46e569bf02846:

  regulatory: add NUL to request alpha2 (2018-02-22 20:57:48 +0100)


Various fixes across the tree, the shortlog basically says it all:

  cfg80211: fix cfg80211_beacon_dup
  -> old bug in this code

  cfg80211: clear wep keys after disconnection
  -> certain ways of disconnecting left the keys

  mac80211: round IEEE80211_TX_STATUS_HEADROOM up to multiple of 4
  -> alignment issues with using 14 bytes

  mac80211: Do not disconnect on invalid operating class
  -> if the AP has a bogus operating class, let it be

  mac80211: Fix sending ADDBA response for an ongoing session
  -> don't send the same frame twice

  cfg80211: use only 1Mbps for basic rates in mesh
  -> interop issue with old versions of our code

  mac80211_hwsim: don't use WQ_MEM_RECLAIM
  -> it causes splats because it flushes work on a non-reclaim WQ

  regulatory: add NUL to request alpha2
  -> nla_put_string() issue from Kees

  mac80211: mesh: fix wrong mesh TTL offset calculation
  -> protocol issue

  mac80211: fix a possible leak of station stats
  -> error path might leak memory

  mac80211: fix calling sleeping function in atomic context
  -> percpu allocations need to be made with gfp flags
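
As an illustration of the IEEE80211_TX_STATUS_HEADROOM entry above, rounding a 14-byte headroom up to a multiple of 4 can be sketched as below. This is a standalone userspace sketch, not the kernel change itself; the helper name round_up_pow2() is hypothetical and it assumes the alignment is a power of two.

```c
#include <assert.h>

/*
 * Round x up to the next multiple of a.
 * Assumption: a is a nonzero power of two (e.g. 4), so the mask trick works.
 */
unsigned int round_up_pow2(unsigned int x, unsigned int a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + (a - 1)) & ~(a - 1);
}
```

With an alignment of 4, a 14-byte value becomes 16, while values that are already aligned are left unchanged.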


Arnd Bergmann (1):
  cfg80211: fix cfg80211_beacon_dup

Avraham Stern (1):
  cfg80211: clear wep keys after disconnection

Felix Fietkau (1):
  mac80211: round IEEE80211_TX_STATUS_HEADROOM up to multiple of 4

Ilan Peer (2):
  mac80211: Do not disconnect on invalid operating class
  mac80211: Fix sending ADDBA response for an ongoing session

Johannes Berg (3):
  cfg80211: use only 1Mbps for basic rates in mesh
  mac80211_hwsim: don't use WQ_MEM_RECLAIM
  regulatory: add NUL to request alpha2

Peter Oh (1):
  mac80211: mesh: fix wrong mesh TTL offset calculation

Sara Sharon (2):
  mac80211: fix a possible leak of station stats
  mac80211: fix calling sleeping function in atomic context

 drivers/net/wireless/mac80211_hwsim.c |  2 +-
 include/net/mac80211.h|  2 +-
 include/net/regulatory.h  |  2 +-
 net/mac80211/agg-rx.c |  4 +---
 net/mac80211/cfg.c|  2 +-
 net/mac80211/ieee80211_i.h|  2 +-
 net/mac80211/mesh.c   | 17 ++---
 net/mac80211/spectmgmt.c  |  7 +++
 net/mac80211/sta_info.c   |  3 ++-
 net/wireless/mesh.c   | 25 ++---
 net/wireless/sme.c|  2 ++
 11 files changed, 41 insertions(+), 27 deletions(-)

Re: pull-request: mac80211-next 2018-02-22

2018-02-22 Thread David Miller

From: Johannes Berg 
Date: Thu, 22 Feb 2018 21:16:18 +0100

> Wireless is slow ... but we're preparing for HE (802.11ax),
> so I guess soon we'll have a big chunk of work coming :-)

I wondered where you guys have been hiding :-)

> Please pull and let me know if there's any problem.

Pulled, thank you!

[PATCH net-next] r8169: disable WOL per default

2018-02-22 Thread Heiner Kallweit

Currently, if BIOS enables WOL in the chip, settings are inconsistent
because the device isn't marked as wakeup-enabled (if not done
explicitly via userspace tools). This causes issues with suspend/
resume because mdio_bus_phy_may_suspend() checks whether device is
wakeup-enabled. In detail MDIO bus access in phy_suspend() can fail
because the MDIO bus is disabled.

In the history of the driver we find two competing approaches:
8f9d5138035d "r8169: remember WOL preferences on driver load" prefers
to preserve what the BIOS may have set, whilst bde135a672bf
"r8169: only enable PCI wakeups when WOL is active" disabled PCI
wakeup per default to work around a bug on one platform.

Seems like nobody complained after the latter patch about non-working
WOL, which makes me think that nobody uses WOL w/o configuring it
explicitly.

My opinion:
The vast majority of users don't use WOL even if the BIOS enables it in
the chip. And having WOL active keeps the PHY(s) from powering
down when idle.
If somebody needs WOL, they can enable it during boot, e.g. by
configuring systemd.link/WakeOnLan.

Therefore, to make WOL consistent again, disable it per default.

Signed-off-by: Heiner Kallweit 
---
 drivers/net/ethernet/realtek/r8169.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index c16b97a56..91a03d575 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -8512,11 +8512,12 @@ static int rtl_init_one(struct pci_dev *pdev, const 
struct pci_device_id *ent)
tp->txd_version = rtl_chip_infos[chipset].txd_version;
 
RTL_W8(Cfg9346, Cfg9346_Unlock);
-   RTL_W8(Config1, RTL_R8(Config1) | PMEnable);
-   RTL_W8(Config5, RTL_R8(Config5) & (BWF | MWF | UWF | LanWake | 
PMEStatus));
tp->features |= rtl_try_msi(tp, cfg);
RTL_W8(Cfg9346, Cfg9346_Lock);
 
+   /* override BIOS settings, use userspace tools to enable WOL */
+   __rtl8169_set_wol(tp, 0);
+
if (rtl_tbi_enabled(tp)) {
tp->set_speed = rtl8169_set_speed_tbi;
tp->get_link_ksettings = rtl8169_get_link_ksettings_tbi;
-- 
2.16.2

Re: pull-request: mac80211-next 2018-02-22

2018-02-22 Thread Johannes Berg

On Thu, 2018-02-22 at 15:19 -0500, David Miller wrote:
> From: Johannes Berg 
> Date: Thu, 22 Feb 2018 21:16:18 +0100
> 
> > Wireless is slow ... but we're preparing for HE (802.11ax),
> > so I guess soon we'll have a big chunk of work coming :-)
> 
> I wondered where you guys have been hiding :-)

Yeah, I don't like this development model much, but the spec isn't
finished yet and every time I look at an area I end up changing it
*again* which isn't fun to do upstream :-)

(Was just doing the HE sniffer stuff in radiotap these days ... uh,
yeah, I've more or less rewritten it twice already - hopefully no more)

johannes

Re: [PATCH bpf v2] bpf: fix rcu lockdep warning for lpm_trie map_free callback

2018-02-22 Thread Daniel Borkmann

On 02/22/2018 07:10 PM, Yonghong Song wrote:
> Commit 9a3efb6b661f ("bpf: fix memory leak in lpm_trie map_free callback 
> function")
> fixed a memory leak and removed unnecessary locks in map_free callback 
> function.
> Unfortunately, it introduced a lockdep warning. When lockdep checking is 
> turned on,
> running tools/testing/selftests/bpf/test_lpm_map will have:
> 
>   [   98.294321] =
>   [   98.294807] WARNING: suspicious RCU usage
>   [   98.295359] 4.16.0-rc2+ #193 Not tainted
>   [   98.295907] -
>   [   98.296486] /home/yhs/work/bpf/kernel/bpf/lpm_trie.c:572 suspicious 
> rcu_dereference_check() usage!
>   [   98.297657]
>   [   98.297657] other info that might help us debug this:
>   [   98.297657]
>   [   98.298663]
>   [   98.298663] rcu_scheduler_active = 2, debug_locks = 1
>   [   98.299536] 2 locks held by kworker/2:1/54:
>   [   98.300152]  #0:  ((wq_completion)"events"){+.+.}, at: 
> [<196bc1f0>] process_one_work+0x157/0x5c0
>   [   98.301381]  #1:  ((work_completion)(>work)){+.+.}, at: 
> [<196bc1f0>] process_one_work+0x157/0x5c0
> 
> Since actual trie tree removal happens only after no other
> accesses to the tree are possible, replacing
>   rcu_dereference_protected(*slot, lockdep_is_held(&trie->lock))
> with
>   rcu_dereference_protected(*slot, 1)
> fixed the issue.
> 
> Fixes: 9a3efb6b661f ("bpf: fix memory leak in lpm_trie map_free callback 
> function")
> Reported-by: Eric Dumazet 
> Suggested-by: Eric Dumazet 
> Signed-off-by: Yonghong Song 

Applied to bpf tree, thanks everyone!

Re: [PATCH v2 iproute2-next 2/3] ip: Display ip rule protocol used

2018-02-22 Thread David Ahern

On 2/21/18 7:12 PM, Donald Sharp wrote:
> diff --git a/ip/iprule.c b/ip/iprule.c
> index 00a6c26a..39008768 100644
> --- a/ip/iprule.c
> +++ b/ip/iprule.c
> @@ -47,6 +47,7 @@ static void usage(void)
>   "[ iif STRING ] [ oif STRING ] [ pref NUMBER ] [ 
> l3mdev ]\n"
>   "[ uidrange NUMBER-NUMBER ]\n"
>   "ACTION := [ table TABLE_ID ]\n"
> + "  [ protocol RPROTO ]\n"

Drop the 'R'; it makes this harder to read. Just 'PROTO' is fine.


>   "  [ nat ADDRESS ]\n"
>   "  [ realms [SRCREALM/]DSTREALM ]\n"
>   "  [ goto NUMBER ]\n"
> @@ -71,6 +72,8 @@ static struct
>   struct fib_rule_uid_range range;
>   inet_prefix src;
>   inet_prefix dst;
> + int protocol;
> + int protocolmask;
>  } filter;
>  
>  static inline int frh_get_table(struct fib_rule_hdr *frh, struct rtattr **tb)
> @@ -338,6 +341,10 @@ int print_rule(const struct sockaddr_nl *who, struct 
> nlmsghdr *n, void *arg)
>   rtnl_rtntype_n2a(frh->action,
>b1, sizeof(b1)));
>  
> + if (frh->proto != RTPROT_UNSPEC)
> + fprintf(fp, " proto %s ",
> + rtnl_rtprot_n2a(frh->proto, b1, sizeof(b1)));
> +
>   fprintf(fp, "\n");
>   fflush(fp);
>   return 0;
> @@ -391,6 +398,9 @@ static int flush_rule(const struct sockaddr_nl *who, 
> struct nlmsghdr *n,
>  
>   parse_rtattr(tb, FRA_MAX, RTM_RTA(frh), len);
>  
> + if ((filter.protocol^frh->proto))
> + return 0;
> +
>   if (tb[FRA_PRIORITY]) {
>   n->nlmsg_type = RTM_DELRULE;
>   n->nlmsg_flags = NLM_F_REQUEST;
> @@ -415,12 +425,6 @@ static int iprule_list_flush_or_save(int argc, char 
> **argv, int action)
>   if (af == AF_UNSPEC)
>   af = AF_INET;
>  
> - if (action != IPRULE_LIST && argc > 0) {
> - fprintf(stderr, "\"ip rule %s\" does not take any arguments.\n",
> - action == IPRULE_SAVE ? "save" : "flush");
> - return -1;
> - }
> -
>   switch (action) {
>   case IPRULE_SAVE:
>   if (save_rule_prep())
> @@ -508,7 +512,18 @@ static int iprule_list_flush_or_save(int argc, char 
> **argv, int action)
>   NEXT_ARG();
>   if (get_prefix(&filter.src, *argv, af))
>   invarg("from value is invalid\n", *argv);
> - } else {
> + } else if (matches(*argv, "protocol") == 0) {
> + __u32 prot;
> + NEXT_ARG();
> + filter.protocolmask = -1;
> + if (rtnl_rtprot_a2n(&prot, *argv)) {
> + if (strcmp(*argv, "all") != 0)
> + invarg("invalid \"protocol\"\n", *argv);
> + prot = 0;
> + filter.protocolmask = 0;
> + }
> + filter.protocol = prot;
> + } else{
>   if (matches(*argv, "dst") == 0 ||
>   matches(*argv, "to") == 0) {
>   NEXT_ARG();
> diff --git a/man/man8/ip-rule.8 b/man/man8/ip-rule.8
> index a5c47981..98b2573d 100644
> --- a/man/man8/ip-rule.8
> +++ b/man/man8/ip-rule.8
> @@ -50,6 +50,8 @@ ip-rule \- routing policy database management
>  .IR ACTION " := [ "
>  .B  table
>  .IR TABLE_ID " ] [ "
> +.B  protocol
> +.IR RPROTO " ] [ "

same here and others in this file
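
For readers following the filter logic in the quoted hunks: the parser stores both a protocol value and a protocolmask (0 for "all", -1 for a specific protocol). A minimal sketch of how such a value/mask pair could be applied when deciding whether to skip a rule is shown below. It is an illustration under stated assumptions, not part of the quoted patch: struct rule_filter and proto_filtered_out() are hypothetical names, and the masked comparison is an assumed reading of what the mask is for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the protocol fields of the "filter" struct. */
struct rule_filter {
    int protocol;      /* protocol value requested by the user */
    int protocolmask;  /* 0 = match everything, -1 = match exactly */
};

/*
 * Return nonzero when a rule with the given originating protocol should be
 * skipped. With a zero mask ("protocol all") nothing is ever skipped.
 */
int proto_filtered_out(const struct rule_filter *f, unsigned char rule_proto)
{
    assert(f != NULL);
    return ((f->protocol ^ rule_proto) & f->protocolmask) != 0;
}
```

With this form, "protocol all" (value 0, mask 0) keeps every rule, whereas an unmasked XOR against 0 would skip every rule whose protocol is non-zero.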

Re: [PATCH v2 net-next] net: dsa: mv88e6xxx: scratch registers and external MDIO pins

2018-02-22 Thread David Miller

From: Andrew Lunn 
Date: Thu, 22 Feb 2018 01:51:49 +0100

> MV88E6352 and later switches support GPIO control through the "Scratch
> & Misc" global2 register. Two of the pins controlled this way on the
> mv88e6390 family are the external MDIO pins. They can either be used
> as part of the MII interface for port 0, GPIOs, or MDIO. Add a
> function to configure them for MDIO, if possible, and call it when
> registering the external MDIO bus.
> 
> Suggested-by: Russell King 
> Signed-off-by: Andrew Lunn 
> ---
> v2: Make stub function static inline, as reported by 0-day.

Applied, thanks Andrew.

Re: nla_put_string() vs NLA_STRING

2018-02-22 Thread Johannes Berg

On Tue, 2018-02-20 at 22:00 -0800, Kees Cook wrote:

> It seems that in at least one case[1], nla_put_string() is being used
> on an NLA_STRING, which lacks a NULL terminator, which leads to
> silliness when nla_put_string() uses strlen() to figure out the size:

Fun! I'm not a big fan of the whole NLA_STRING thing with or without
NUL terminator anyway, it's a bit confusing at times :-)

> This is a problem at least here:
> 
> struct regulatory_request {
> ...
> char alpha2[2];
> ...
> 
> static const struct nla_policy nl80211_policy[NUM_NL80211_ATTR] = {
> ...
> [NL80211_ATTR_REG_ALPHA2] = { .type = NLA_STRING, .len = 2 },
> ...

Yeah, this is clearly stupid. We already fixed one of these, see commit
a5fe8e7695dc ("regulatory: add NUL to alpha2"). I'll fix up the second
one too.

> So, this specific problem needs fixing (in at least two places calling
> nla_put_string(msg, NL80211_ATTR_REG_ALPHA2, ...)). While I suspect
> it's only ever written an extra byte from the following variable in
> the structure which is an enum nl80211_dfs_regions, 

Only one, since the other has alpha2[3] already :-)

And in that case, yes, on little endian and only if the dfs region is
non-zero, though the dfs region was added later so dunno what else
there was - but certainly this struct would have always contained some
enum value that had zero-bytes.

> I worry there
> might be a lot more of these (though I'd hope unterminated strings are
> uncommon for internal representation).

Generally they are, I'd argue.

> And more generally, it seems
> like only the NLA _input_ functions actually check nla_policy details.
> It seems that the output functions should do the same too, yes?

It doesn't really work that way - there's no real guarantee that the
policy is symmetric on input/output.

johannes
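
To make the length problem discussed above concrete: strlen() on a char[2] field with no terminator reads past the field, while a bounded scan cannot. The sketch below is plain userspace C, not netlink code; bounded_strlen() is a hypothetical helper used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Length of a possibly unterminated string held in a buffer of cap bytes.
 * Never reads beyond buf[cap - 1], unlike strlen() on an unterminated field.
 */
size_t bounded_strlen(const char *buf, size_t cap)
{
    const char *nul;

    assert(buf != NULL);
    nul = memchr(buf, '\0', cap);
    return nul ? (size_t)(nul - buf) : cap;
}
```

For a two-byte alpha2 such as {'U', 'S'} the bounded scan reports 2 without touching the neighbouring field; once the field is widened to three bytes with a NUL, plain strlen() gives the same answer safely.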

Re: [PATCH net-next 0/3] nfp: build and FW initramfs updates

2018-02-22 Thread David Miller

From: Jakub Kicinski 
Date: Wed, 21 Feb 2018 19:50:04 -0800

> This set brings empty makefiles to allow building single object files
> (useful for build-testing), Kbuild does not cater to this use case
> too well.  There are two ethernet drivers right now which suffer
> from this (nfp, aquantia), both are fixed.
> 
> Dirk adds an uncommon FW image name to the list of firmware files
> module may request.

Series applied, thanks for following up on this Jakub.

Re: [PATCH v2 net-next 1/2] lan743x: Add main source files for new lan743x driver

2018-02-22 Thread Florian Fainelli

On 02/22/2018 01:31 PM, bryan.whiteh...@microchip.com wrote:
>>> +static void lan743x_intr_unregister_isr(struct lan743x_adapter *adapter,
>>> +   int vector_index)
>>> +{
>>> +   struct lan743x_vector *vector = &adapter->intr.vector_list
>>> +   [vector_index];
>>> +
>>> +   devm_free_irq(&adapter->pci.pdev->dev, vector->irq, vector);
>>
>> Hi Bryan
>>
>> The point of devm_ is that you don't need to free resources you have
>> allocated using devm_. The core will release them when the device is
>> removed.
> 
> Hi Andrew,
> 
> When I remove the call devm_free_irq, I get a segmentation fault on close
> in pci_disable_msix. Did I do something else wrong?
> 
> Also I'm allocating interrupt resources on interface up, and freeing resources
> on interface down. So if there is an up, down, up sequence then the driver
> will allocate resources twice. In order for devm to work properly, should I
> move all resource allocation into the probe function?

No, most network drivers request their interrupt line in the open
function and free it in the close function. Because you are balancing
each devm_request_irq() with a devm_free_irq(), just don't use devm_*
functions, use the normal request_irq() and free_irq() functions.
-- 
Florian

Re: [patch net-next] mlxsw: spectrum_switchdev: Allow port enslavement to a VLAN-unaware bridge

2018-02-22 Thread David Miller

From: David Ahern 
Date: Wed, 21 Feb 2018 11:16:35 -0700

> On 2/20/18 12:45 AM, Jiri Pirko wrote:
>> From: Ido Schimmel 
>> 
>> Up until now we only allowed VLAN devices to be put in a VLAN-unaware
>> bridge, but some users need the ability to enslave physical ports as
>> well.
>> 
>> This is achieved by mapping the port and VID 1 to the bridge's vFID,
>> instead of the port and the VID used by the VLAN device.
>> 
>> The above is valid because as long as the port is not enslaved to a
>> bridge, VID 1 is guaranteed to be configured as PVID and egress
>> untagged.
>> 
>> Signed-off-by: Ido Schimmel 
>> Signed-off-by: Jiri Pirko 
>> ---
>>  drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c | 12 +---
>>  1 file changed, 5 insertions(+), 7 deletions(-)
>> 
> 
> Maybe I am missing something in the setup, but I am not getting
> host-to-host connectivity. I booted a switch with this patch, configured
> a bridge:

I'm waiting for this discussion to be fully resolved before applying this
patch.  Just FYI...

Re: [patch net-next] mlxsw: spectrum_switchdev: Allow port enslavement to a VLAN-unaware bridge

2018-02-22 Thread David Ahern

On 2/22/18 11:58 AM, David Miller wrote:
> From: David Ahern 
> Date: Wed, 21 Feb 2018 11:16:35 -0700
> 
>> On 2/20/18 12:45 AM, Jiri Pirko wrote:
>>> From: Ido Schimmel 
>>>
>>> Up until now we only allowed VLAN devices to be put in a VLAN-unaware
>>> bridge, but some users need the ability to enslave physical ports as
>>> well.
>>>
>>> This is achieved by mapping the port and VID 1 to the bridge's vFID,
>>> instead of the port and the VID used by the VLAN device.
>>>
>>> The above is valid because as long as the port is not enslaved to a
>>> bridge, VID 1 is guaranteed to be configured as PVID and egress
>>> untagged.
>>>
>>> Signed-off-by: Ido Schimmel 
>>> Signed-off-by: Jiri Pirko 
>>> ---
>>>  drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c | 12 +---
>>>  1 file changed, 5 insertions(+), 7 deletions(-)
>>>
>>
>> Maybe I am missing something in the setup, but I am not getting
>> host-to-host connectivity. I booted a switch with this patch, configured
>> a bridge:
> 
> I'm waiting for this discussion to be fully resolved before applying this
> patch.  Just FYI...
> 

Ido:

IPv4 works at boot; IPv6 requires disabling mcast snooping. For these
VLAN-unaware bridges, can that be set automatically?

And then, what are the options for lldp?

Re: [PATCH net-next 0/7] net/ipv6: Add support for path selection using hash of 5-tuple

2018-02-22 Thread David Miller

From: David Ahern 
Date: Wed, 21 Feb 2018 10:49:47 -0800

> Patch 5 adds the L4 hash support.

Please address Ido's feedback about how the ports aren't actually being
taken into consideration because they aren't present in the flow
information being used.

Thanks.
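
The feedback above can be illustrated with a small standalone sketch: if the port fields are absent (zero) in the flow information, two flows that differ only in their ports hash identically and always take the same path. This is not the kernel's implementation; struct flow5, flow5_hash() and flow5_select_path() are hypothetical names, and FNV-1a is used here only as a simple stand-in hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 5-tuple: addresses, ports, and L4 protocol. */
struct flow5 {
    uint8_t  saddr[16];
    uint8_t  daddr[16];
    uint16_t sport;
    uint16_t dport;
    uint8_t  proto;
};

/* FNV-1a over a byte range, continuing from a running hash value. */
static uint32_t fnv1a(uint32_t h, const void *data, size_t len)
{
    const uint8_t *p = data;
    size_t i;

    for (i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Hash each field separately so struct padding never enters the hash. */
uint32_t flow5_hash(const struct flow5 *f)
{
    uint32_t h = 2166136261u;

    h = fnv1a(h, f->saddr, sizeof(f->saddr));
    h = fnv1a(h, f->daddr, sizeof(f->daddr));
    h = fnv1a(h, &f->sport, sizeof(f->sport));
    h = fnv1a(h, &f->dport, sizeof(f->dport));
    h = fnv1a(h, &f->proto, sizeof(f->proto));
    return h;
}

/* Pick one of n_paths next hops from the flow hash. */
unsigned int flow5_select_path(const struct flow5 *f, unsigned int n_paths)
{
    assert(n_paths > 0);
    return flow5_hash(f) % n_paths;
}
```

The hash is only as good as its inputs: when the lookup code never fills in sport and dport, the result degrades to a 3-tuple hash regardless of the hash function used.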

Re: [pull request][for-next 0/7] Mellanox, mlx5 shared code updates 2018-02-21

2018-02-22 Thread David Miller

From: Saeed Mahameed 
Date: Wed, 21 Feb 2018 12:13:47 -0800

> This series includes shared code updates for mlx5 core driver for both
> netdev and rdma subsystems.  This series should be pulled to both
> trees so we can continue netdev and rdma specific submissions separately.
> 
> For more information please see tag log below.
> 
> P.S. We expect two more shared code pull requests.
> 
> The series doesn't cause any conflict with the latest mlx5 net fixes
> series.
> 
> Please pull and let me know if there's any issue,

Looks good to me, pulled into net-next, thanks.

pull-request: mac80211-next 2018-02-22

2018-02-22 Thread Johannes Berg

Hi Dave,

Wireless is slow ... but we're preparing for HE (802.11ax),
so I guess soon we'll have a big chunk of work coming :-)

Please pull and let me know if there's any problem.

Thanks,
johannes



The following changes since commit 91e6dd8284256ef62b43b78da6e7684e4f06ac2f:

  ipmr: Fix ptrdiff_t print formatting (2018-01-30 09:20:25 -0500)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next.git 
tags/mac80211-next-for-davem-2018-02-22

for you to fetch changes up to 94ba92713f8329c96e0a8e2880b3c1a785d1c95c:

  mac80211: Call mgd_prep_tx before transmitting deauthentication (2018-02-22 
21:13:04 +0100)


Various updates across wireless.

One thing to note: I've included a new ethertype
that wireless uses (ETH_P_PREAUTH) in if_ether.h.


Ben Greear (1):
  mac80211: Add txq flags to debugfs

Benjamin Beichler (3):
  mac80211_hwsim: add permanent mac address option for new radios
  mac80211_hwsim: add nl_err_msg in hwsim_new_radio in netlink case
  mac80211_hwsim: add generation count for netlink dump operation

Colin Ian King (1):
  mac80211: remove redundant initialization to pointer 'hdr'

Denis Kenzior (1):
  uapi: Add 802.11 Preauthentication to if_ether

Ilan Peer (1):
  mac80211: Call mgd_prep_tx before transmitting deauthentication

Johannes Berg (2):
  nl80211: remove unnecessary genlmsg_cancel() calls
  mac80211: support reporting A-MPDU EOF bit value/known

Sara Sharon (1):
  mac80211: add get TID helper

Srinivas Dasari (4):
  cfg80211/nl80211: Optional authentication offload to userspace
  nl80211: Allow SAE Authentication for NL80211_CMD_CONNECT
  nl80211: Fix external_auth check for offloaded authentication
  ieee80211: Increase PMK maximum length to 64 bytes

Sunil Dutt (1):
  nl80211: Introduce scan flags to emphasize requested scan behavior

Venkateswara Naralasetty (2):
  cfg80211: send ack_signal to user in probe client response
  mac80211: Add tx ack signal support in sta info

tami...@codeaurora.org (2):
  cfg80211: Add support to notify station's opmode change to userspace
  mac80211: Add support to notify ht/vht opmode modification.

 drivers/net/wireless/ath/wil6210/cfg80211.c |   3 +-
 drivers/net/wireless/mac80211_hwsim.c   |  81 ---
 drivers/net/wireless/mac80211_hwsim.h   |   9 +-
 include/linux/ieee80211.h   |  14 +-
 include/net/cfg80211.h  | 104 +-
 include/net/ieee80211_radiotap.h|   2 +
 include/net/mac80211.h  |  19 +++
 include/uapi/linux/if_ether.h   |   1 +
 include/uapi/linux/nl80211.h|  90 +++-
 net/mac80211/debugfs.c  |   1 +
 net/mac80211/debugfs_sta.c  |  10 +-
 net/mac80211/iface.c|   3 +-
 net/mac80211/michael.c  |   2 +-
 net/mac80211/mlme.c |  18 ++-
 net/mac80211/rc80211_minstrel_ht.c  |   2 +-
 net/mac80211/rx.c   |  24 +++-
 net/mac80211/sta_info.c |   6 +
 net/mac80211/sta_info.h |   2 +
 net/mac80211/status.c   |  11 +-
 net/mac80211/tx.c   |  11 +-
 net/mac80211/vht.c  |   9 ++
 net/mac80211/wpa.c  |   8 +-
 net/wireless/nl80211.c  | 203 +++-
 net/wireless/rdev-ops.h |  15 ++
 net/wireless/trace.h|  23 
 25 files changed, 584 insertions(+), 87 deletions(-)

Re: [PATCH] dsa: ptp; mark dummy helpers as 'inline'

2018-02-22 Thread David Miller

From: Arnd Bergmann 
Date: Thu, 22 Feb 2018 12:44:40 +0100

> Declaring a static function in a header leads to a warning every
> time that header gets included without the function being used:
> 
> In file included from drivers/net/dsa/mv88e6xxx/chip.c:42:
> drivers/net/dsa/mv88e6xxx/ptp.h:92:13: error: 'mv88e6xxx_hwtstamp_work' 
> defined but not used [-Werror=unused-function]
>  static long mv88e6xxx_hwtstamp_work(struct ptp_clock_info *ptp)
> In file included from drivers/net/dsa/mv88e6xxx/chip.c:38:
> drivers/net/dsa/mv88e6xxx/global2.h:355:12: error: 'mv88e6xxx_g2_wait' 
> defined but not used [-Werror=unused-function]
>  static int mv88e6xxx_g2_wait(struct mv88e6xxx_chip *chip, int reg, u16 mask)
> ^
> drivers/net/dsa/mv88e6xxx/global2.h:350:12: error: 'mv88e6xxx_g2_update' 
> defined but not used [-Werror=unused-function]
>  static int mv88e6xxx_g2_update(struct mv88e6xxx_chip *chip, int reg, u16 
> update)
> ^~~
> drivers/net/dsa/mv88e6xxx/global2.h:345:12: error: 'mv88e6xxx_g2_write' 
> defined but not used [-Werror=unused-function]
>  static int mv88e6xxx_g2_write(struct mv88e6xxx_chip *chip, int reg, u16 val)
> ^~
> drivers/net/dsa/mv88e6xxx/global2.h:340:12: error: 'mv88e6xxx_g2_read' 
> defined but not used [-Werror=unused-function]
>  static int mv88e6xxx_g2_read(struct mv88e6xxx_chip *chip, int reg, u16 *val)
> 
> This marks all such functions in dsa inline to make sure we don't warn
> about them.
> 
> Fixes: c6fe0ad2c349 ("net: dsa: mv88e6xxx: add rx/tx timestamping support")
> Fixes: 0d632c3d6fe3 ("net: dsa: mv88e6xxx: add accessors for PTP/TAI 
> registers")
> Signed-off-by: Arnd Bergmann 

Applied, thanks Arnd.

RE: [PATCH v2 net-next 1/2] lan743x: Add main source files for new lan743x driver

2018-02-22 Thread Bryan.Whitehead

> > +static void lan743x_intr_unregister_isr(struct lan743x_adapter *adapter,
> > +   int vector_index)
> > +{
> > +   struct lan743x_vector *vector = &adapter->intr.vector_list
> > +   [vector_index];
> > +
> > +   devm_free_irq(&adapter->pci.pdev->dev, vector->irq, vector);
> 
> Hi Bryan
> 
> The point of devm_ is that you don't need to free resources you have
> allocated using devm_. The core will release them when the device is
> removed.

Hi Andrew,

When I remove the call devm_free_irq, I get a segmentation fault on close
in pci_disable_msix. Did I do something else wrong?

Also I'm allocating interrupt resources on interface up, and freeing resources
on interface down. So if there is an up, down, up sequence then the driver
will allocate resources twice. In order for devm to work properly, should I
move all resource allocation into the probe function?

Bryan

Re: [patch net-next] mlxsw: spectrum_switchdev: Allow port enslavement to a VLAN-unaware bridge

2018-02-22 Thread David Ahern

On 2/22/18 1:55 PM, Ido Schimmel wrote:
> On Thu, Feb 22, 2018 at 12:27:35PM -0700, David Ahern wrote:
>> Ido:
>>
>> IPv4 works at boot; IPv6 requires disabling mcast snooping. For these
>> VLAN-unaware bridges, can that be set automatically?
> 
> Can you please try the following patch?
> 
...
> 
> It should fix your problem.

it does.

> 
> The real problem that I can then address in net-next is the fact that
> the Linux bridge tries to be smart and only resorts to flooding
> unregistered multicast packets in case its querier is disabled and in
> case it didn't detect any other querier in the network. This isn't
> currently reflected to underlying drivers. Only mcast snooping on/off.
> 
> Anyway, it's not related to the patch in question. You'd get the same
> behavior with VLAN-aware bridges.
> 
>> And then, what are the options for lldp?
> 
> Didn't understand the question. Can you clarify?
> 

nm. mental lapse.

Re: [PATCH net-next 0/7] net/ipv6: Add support for path selection using hash of 5-tuple

2018-02-22 Thread David Miller

From: David Ahern 
Date: Thu, 22 Feb 2018 12:31:01 -0700

> On 2/22/18 12:27 PM, David Miller wrote:
>> From: David Ahern 
>> Date: Wed, 21 Feb 2018 10:49:47 -0800
>> 
>>> Patch 5 adds the L4 hash support.
>> 
>> Please address Ido's feedback about how the ports aren't actually being
>> taken into consideration because they aren't present in the flow
>> information being used.
> 
> It's the forwarding case; I need to add the skb to the route lookup
> functions. I'll send a v2 with that change in the next few days.

Ok, thank you.

Re: [PATCH] bpf: add schedule points in percpu arrays management

2018-02-22 Thread Daniel Borkmann

[ +Dennis for mm/pcpu ]

On 02/22/2018 05:33 PM, Eric Dumazet wrote:
> From: Eric Dumazet 
> 
> syzbot managed to trigger RCU detected stalls in
> bpf_array_free_percpu()
> 
> It takes time to allocate a huge percpu map, but even more time to free
> it.
> 
> Since we run in process context, use cond_resched() to yield cpu if
> needed.
> 
> Fixes: a10423b87a7e ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map")
> Signed-off-by: Eric Dumazet 
> Reported-by: syzbot 

Applied to bpf tree, thanks Eric!

> ---
>  kernel/bpf/arraymap.c |5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index a364c408f25a54a8175c92b6004a5e7e15f198cb..14750e7c5ee4872e4a7426e960bea7ae001e6623 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -26,8 +26,10 @@ static void bpf_array_free_percpu(struct bpf_array *array)
>  {
>   int i;
>  
> - for (i = 0; i < array->map.max_entries; i++)
> + for (i = 0; i < array->map.max_entries; i++) {
>   free_percpu(array->pptrs[i]);
> + cond_resched();
> + }
>  }
>  
>  static int bpf_array_alloc_percpu(struct bpf_array *array)
> @@ -43,6 +45,7 @@ static int bpf_array_alloc_percpu(struct bpf_array *array)
>   return -ENOMEM;
>   }
>   array->pptrs[i] = ptr;
> + cond_resched();
>   }
>  
>   return 0;
>

Re: [PATCH] selftest: fix kselftest-merge depend on 'RUNTIME_TESTING_MENU'

2018-02-22 Thread Luis R. Rodriguez

On Thu, Feb 22, 2018 at 07:53:07PM +0800, Zong Li wrote:
> Since the 'commit d3deafaa8b5c ("lib/: make RUNTIME_TESTS a menuconfig
> to ease disabling it all")', the make kselftest-merge cannot merge the
> config dependencies of kselftest to the existing .config file.
> 
> These config dependencies of kselftest need to enable the
> 'CONFIG_RUNTIME_TESTING_MENU=y' at the same time.
> 
> Signed-off-by: Zong Li 
> Cc: Greentime Hu 

Please add respective Fixes: tag with the sha1sum, and commit name.

  Luis

Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

2018-02-22 Thread Alexander Duyck

On Thu, Feb 22, 2018 at 12:11 AM, Jiri Pirko  wrote:
> Wed, Feb 21, 2018 at 09:57:09PM CET, alexander.du...@gmail.com wrote:
>>On Wed, Feb 21, 2018 at 11:38 AM, Jiri Pirko  wrote:
>>> Wed, Feb 21, 2018 at 06:56:35PM CET, alexander.du...@gmail.com wrote:
On Wed, Feb 21, 2018 at 8:58 AM, Jiri Pirko  wrote:
> Wed, Feb 21, 2018 at 05:49:49PM CET, alexander.du...@gmail.com wrote:
>>On Wed, Feb 21, 2018 at 8:11 AM, Jiri Pirko  wrote:
>>> Wed, Feb 21, 2018 at 04:56:48PM CET, alexander.du...@gmail.com wrote:
On Wed, Feb 21, 2018 at 1:51 AM, Jiri Pirko  wrote:
> Tue, Feb 20, 2018 at 11:33:56PM CET, kubak...@wp.pl wrote:
>>On Tue, 20 Feb 2018 21:14:10 +0100, Jiri Pirko wrote:
>>> Yeah, I can see it now :( I guess that the ship has sailed and we 
>>> are
>>> stuck with this ugly thing forever...
>>>
>>> Could you at least make some common code that is shared in between
>>> netvsc and virtio_net so this is handled in exactly the same way in both?
>>
>>IMHO netvsc is a vendor specific driver which made a mistake on what
>>behaviour it provides (or tried to align itself with Windows SR-IOV).
>>Let's not make a far, far more commonly deployed and important driver
>>(virtio) bug-compatible with netvsc.
>
> Yeah. netvsc solution is a dangerous precedent here and in my opinion
> it was a huge mistake to merge it. I personally would vote to unmerge it
> and make the solution based on team/bond.
>
>
>>
>>To Jiri's initial comments, I feel the same way, in fact I've talked 
>>to
>>the NetworkManager guys to get auto-bonding based on MACs handled in
>>user space.  I think it may very well get done in next versions of NM,
>>but isn't done yet.  Stephen also raised the point that not everybody 
>>is
>>using NM.
>
> Can be done in NM, networkd or other network management tools.
> Even easier to do this in teamd and let them all benefit.
>
> Actually, I took a stab to implement this in teamd. Took me like an 
> hour
> and half.
>
> You can just run teamd with config option "kidnap" like this:
> # teamd/teamd -c '{"kidnap": true }'
>
> Whenever teamd sees another netdev to appear with the same mac as his,
> or whenever teamd sees another netdev to change mac to his,
> it enslaves it.
>
> Here's the patch (quick and dirty):
>
> Subject: [patch teamd] teamd: introduce kidnap feature
>
> Signed-off-by: Jiri Pirko 

So this doesn't really address the original problem we were trying to
solve. You asked earlier why the netdev name mattered and it mostly
has to do with configuration. Specifically what our patch is
attempting to resolve is the issue of how to allow a cloud provider to
upgrade their customer to SR-IOV support and live migration without
requiring them to reconfigure their guest. So the general idea with
our patch is to take a VM that is running with virtio_net only and
allow it to instead spawn a virtio_bypass master using the same netdev
name as the original virtio, and then have the virtio_net and VF come
up and be enslaved by the bypass interface. Doing it this way we can
allow for multi-vendor SR-IOV live migration support using a guest
that was originally configured for virtio only.

The problem with your solution is we already have teaming and bonding
as you said. There is already a write-up from Red Hat on how to do it
(https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/virtual_machine_management_guide/sect-migrating_virtual_machines_between_hosts).
That is all well and good as long as you are willing to keep around
two VM images, one for virtio, and one for SR-IOV with live migration.
>>>
>>> You don't need 2 images. You need only one. The one with the team setup.
>>> That's it. If another netdev with the same mac appears, teamd will
>>> enslave it and run traffic on it. If not, ok, you'll go only through
>>> virtio_net.
>>
>>Isn't that going to cause the routing table to get messed up when we
>>rearrange the netdevs? We don't want to have a significant disruption
>>in traffic when we are adding/removing the VF. It seems like we would
>>need to invalidate any entries that were configured for the virtio_net
>>and reestablish them on the new team interface. Part of the criteria
>>we have been working with is that we should be able to transition from

Re: [PATCH v2 net-next 1/2] lan743x: Add main source files for new lan743x driver

2018-02-22 Thread Andrew Lunn

> Also I'm allocating interrupt resources on interface up, and freeing resources
> on interface down. So if there is an up, down, up sequence then the driver
> will allocate resources twice. In order for devm to work properly, should I
> move all resource allocation into the probe function?

Hi Bryan

It is better to fail early if the resource is not available, so yes, i
would register the interrupt handler in probe.

  Andrew
