date:20180926

[PATCH 1/1] Update maintainers for bnx2/bnx2x/qlge/qlcnic drivers.

2018-09-26 Thread Sudarsana Reddy Kalluru

Signed-off-by: Sudarsana Reddy Kalluru 
Signed-off-by: Ameen Rahman 
---
 MAINTAINERS | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 15565de..e941d1b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2956,7 +2956,6 @@ F:include/linux/bcm963xx_tag.h
 
 BROADCOM BNX2 GIGABIT ETHERNET DRIVER
 M: Rasesh Mody 
-M: Harish Patil 
 M: dept-gelinuxnic...@cavium.com
 L: netdev@vger.kernel.org
 S: Supported
@@ -2977,6 +2976,7 @@ F:drivers/scsi/bnx2i/
 
 BROADCOM BNX2X 10 GIGABIT ETHERNET DRIVER
 M: Ariel Elior 
+M: Sudarsana Kalluru 
 M: everest-linux...@cavium.com
 L: netdev@vger.kernel.org
 S: Supported
@@ -11982,7 +11982,7 @@ F:  Documentation/scsi/LICENSE.qla4xxx
 F: drivers/scsi/qla4xxx/
 
 QLOGIC QLCNIC (1/10)Gb ETHERNET DRIVER
-M: Harish Patil 
+M: Shahed Shaikh 
 M: Manish Chopra 
 M: dept-gelinuxnic...@cavium.com
 L: netdev@vger.kernel.org
@@ -11990,7 +11990,6 @@ S:  Supported
 F: drivers/net/ethernet/qlogic/qlcnic/
 
 QLOGIC QLGE 10Gb ETHERNET DRIVER
-M: Harish Patil 
 M: Manish Chopra 
 M: dept-gelinuxnic...@cavium.com
 L: netdev@vger.kernel.org
-- 
1.8.3.1

Re: [PATCH] net/ncsi: Add NCSI OEM command for FB Tiogapass

2018-09-26 Thread Samuel Mendoza-Jonas

On Thu, 2018-09-27 at 13:43 +1000, Samuel Mendoza-Jonas wrote:
> On Mon, 2018-09-24 at 17:08 -0700, Vijay Khemka wrote:
> > This patch adds OEM command to get mac address from NCSI device and and
> > configure the same to the network card.
> > 
> > ncsi_cmd_arg - Modified this structure to include bigger payload data.
> > ncsi_cmd_handler_oem: This function handles oem command request
> > ncsi_rsp_handler_oem: This function handles response for OEM command.
> > get_mac_address_oem_mlx: This function will send OEM command to get
> > mac address for Mellanox card
> > set_mac_affinity_mlx: This will send OEM command to set Mac affinity
> > for Mellanox card
> > 
> > Signed-off-by: Vijay Khemka 
> 
> Hi Vijay,
> 
> Having had a chance to take a closer look, there is probably room for
> both this patchset and Justin's potential changes to coexist; while
> Justin's is a more general solution for sending arbitrary commands, the
> approach of this patch is also useful for handling commands we want
> included in the configure process (such as get-mac-address).
> 
> Some comments below:

Whoops, forgot to re-add netdev.

> 
> > ---
> >  net/ncsi/Kconfig   |  3 ++
> >  net/ncsi/internal.h| 11 +--
> >  net/ncsi/ncsi-cmd.c| 24 +--
> >  net/ncsi/ncsi-manage.c | 68 ++
> >  net/ncsi/ncsi-pkt.h| 16 ++
> >  net/ncsi/ncsi-rsp.c| 33 +++-
> >  6 files changed, 149 insertions(+), 6 deletions(-)
> > 
> > diff --git a/net/ncsi/Kconfig b/net/ncsi/Kconfig
> > index 08a8a6031fd7..b8bf89fea7c8 100644
> > --- a/net/ncsi/Kconfig
> > +++ b/net/ncsi/Kconfig
> > @@ -10,3 +10,6 @@ config NET_NCSI
> >   support. Enable this only if your system connects to a network
> >   device via NCSI and the ethernet driver you're using supports
> >   the protocol explicitly.
> > +config NCSI_OEM_CMD_GET_MAC
> > +   bool "Get NCSI OEM MAC Address"
> > +   depends on NET_NCSI
> 
> For the moment this isn't too bad but I wonder if in the future it would
> be more flexible to have something like NCSI_OEM_CMD_MELLANOX etc, so we
> could selectively enable a class of OEM commands based on vendor rather
> than per-command.
> 
> > diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h
> > index 8055e3965cef..da17958e6a4b 100644
> > --- a/net/ncsi/internal.h
> > +++ b/net/ncsi/internal.h
> > @@ -68,6 +68,10 @@ enum {
> > NCSI_MODE_MAX
> >  };
> >  
> > +#define NCSI_OEM_MFR_MLX_ID 0x8119
> > +#define NCSI_OEM_MLX_CMD_GET_MAC0x1b00
> > +#define NCSI_OEM_MLX_CMD_SET_AFFINITY   0x010700
> 
> I gather this is part of the OEM command but it would be good to describe
> what these bits mean. Is this command documented anywhere by Mellanox?
> 
> > +
> >  struct ncsi_channel_version {
> > u32 version;/* Supported BCD encoded NCSI version */
> > u32 alpha2; /* Supported BCD encoded NCSI version */
> > @@ -236,6 +240,7 @@ enum {
> > ncsi_dev_state_probe_dp,
> > ncsi_dev_state_config_sp= 0x0301,
> > ncsi_dev_state_config_cis,
> > +   ncsi_dev_state_config_oem_gma,
> > ncsi_dev_state_config_clear_vids,
> > ncsi_dev_state_config_svf,
> > ncsi_dev_state_config_ev,
> > @@ -301,9 +306,9 @@ struct ncsi_cmd_arg {
> > unsigned short   payload; /* Command packet payload length */
> > unsigned int req_flags;   /* NCSI request properties   */
> > union {
> > -   unsigned char  bytes[16]; /* Command packet specific data  */
> > -   unsigned short words[8];
> > -   unsigned int   dwords[4];
> > +   unsigned char  bytes[64]; /* Command packet specific data  */
> > +   unsigned short words[32];
> > +   unsigned int   dwords[16];
> > };
> >  };
> >  
> > diff --git a/net/ncsi/ncsi-cmd.c b/net/ncsi/ncsi-cmd.c
> > index 7567ca63aae2..3205e22c1734 100644
> > --- a/net/ncsi/ncsi-cmd.c
> > +++ b/net/ncsi/ncsi-cmd.c
> > @@ -211,6 +211,25 @@ static int ncsi_cmd_handler_snfc(struct sk_buff *skb,
> > return 0;
> >  }
> >  
> > +static int ncsi_cmd_handler_oem(struct sk_buff *skb,
> > +   struct ncsi_cmd_arg *nca)
> > +{
> > +   struct ncsi_cmd_oem_pkt *cmd;
> > +   unsigned int len;
> > +
> > +   len = sizeof(struct ncsi_cmd_pkt_hdr) + 4;
> > +   if (nca->payload < 26)
> > +   len += 26;
> 
> This will have already happened in ncsi_alloc_command(), is this check
> needed?
> 
> > +   else
> > +   len += nca->payload;
> > +
> > +   cmd = skb_put_zero(skb, len);
> > +   memcpy(cmd->data, nca->bytes, nca->payload);
> > +   ncsi_cmd_build_header(>cmd.common, nca);
> > +
> > +   return 0;
> > +}
> > +
> >  static struct ncsi_cmd_handler {
> > unsigned char type;
> > int   payload;
> > @@ -244,7 +263,7 @@ static struct ncsi_cmd_handler {
> > { NCSI_PKT_CMD_GNS,0, ncsi_cmd_handler_default },
> > { NCSI_PKT_CMD_GNPTS,  0, ncsi_cmd_handler_default },
> >

Re: [PATCH net v2] bnxt_en: Fix TX timeout during netpoll.

2018-09-26 Thread Eric Dumazet

On Wed, Sep 26, 2018 at 8:49 PM Song Liu  wrote:
>

> We also need this patch from Eric:
>
> https://marc.info/?l=linux-netdev=153780304905946
>


I will submit this formally tomorrow, thanks.

Re: [PATCH net v2] bnxt_en: Fix TX timeout during netpoll.

2018-09-26 Thread Song Liu




> On Sep 26, 2018, at 8:33 PM, David Miller  wrote:
> 
> From: Michael Chan 
> Date: Wed, 26 Sep 2018 00:41:04 -0400
> 
>> The current netpoll implementation in the bnxt_en driver has problems
>> that may miss TX completion events.  bnxt_poll_work() in effect is
>> only handling at most 1 TX packet before exiting.  In addition,
>> there may be in flight TX completions that ->poll() may miss even
>> after we fix bnxt_poll_work() to handle all visible TX completions.
>> netpoll may not call ->poll() again and HW may not generate IRQ
>> because the driver does not ARM the IRQ when the budget (0 for netpoll)
>> is reached.
>> 
>> We fix it by handling all TX completions and to always ARM the IRQ
>> when we exit ->poll() with 0 budget.
>> 
>> Also, the logic to ACK the completion ring in case it is almost filled
>> with TX completions need to be adjusted to take care of the 0 budget
>> case, as discussed with Eric Dumazet 
>> 
>> Reported-by: Song Liu 
>> Signed-off-by: Michael Chan 
> 
> Applied and queued up for -stable, thanks Michael.

Hi David,

We also need this patch from Eric:

https://marc.info/?l=linux-netdev=153780304905946

Thanks,
Song

Re: [PATCH] net-tcp: /proc/sys/net/ipv4/tcp_probe_interval is a u32 not int

2018-09-26 Thread David Miller

From: "Maciej Żenczykowski" 
Date: Tue, 25 Sep 2018 21:59:28 -0700

> From: Maciej Żenczykowski 
> 
> (fix documentation and sysctl access to treat it as such)
> 
> Tested:
>   # zcat /proc/config.gz | egrep ^CONFIG_HZ
>   CONFIG_HZ_1000=y
>   CONFIG_HZ=1000
>   # echo $[(1<<32)/1000 + 1] | tee /proc/sys/net/ipv4/tcp_probe_interval
>   4294968
>   tee: /proc/sys/net/ipv4/tcp_probe_interval: Invalid argument
>   # echo $[(1<<32)/1000] | tee /proc/sys/net/ipv4/tcp_probe_interval
>   4294967
>   # echo 0 | tee /proc/sys/net/ipv4/tcp_probe_interval
>   # echo -1 | tee /proc/sys/net/ipv4/tcp_probe_interval
>   -1
>   tee: /proc/sys/net/ipv4/tcp_probe_interval: Invalid argument
> 
> Signed-off-by: Maciej Żenczykowski 

Applied.

Re: [PATCH net v2] bnxt_en: Fix TX timeout during netpoll.

2018-09-26 Thread David Miller

From: Michael Chan 
Date: Wed, 26 Sep 2018 00:41:04 -0400

> The current netpoll implementation in the bnxt_en driver has problems
> that may miss TX completion events.  bnxt_poll_work() in effect is
> only handling at most 1 TX packet before exiting.  In addition,
> there may be in flight TX completions that ->poll() may miss even
> after we fix bnxt_poll_work() to handle all visible TX completions.
> netpoll may not call ->poll() again and HW may not generate IRQ
> because the driver does not ARM the IRQ when the budget (0 for netpoll)
> is reached.
> 
> We fix it by handling all TX completions and to always ARM the IRQ
> when we exit ->poll() with 0 budget.
> 
> Also, the logic to ACK the completion ring in case it is almost filled
> with TX completions need to be adjusted to take care of the 0 budget
> case, as discussed with Eric Dumazet 
> 
> Reported-by: Song Liu 
> Signed-off-by: Michael Chan 

Applied and queued up for -stable, thanks Michael.

Re: [PATCH 1/2] net-ipv4: remove 2 always zero parameters from ipv4_update_pmtu()

2018-09-26 Thread David Miller

From: "Maciej Żenczykowski" 
Date: Tue, 25 Sep 2018 20:56:26 -0700

> From: Maciej Żenczykowski 
> 
> (the parameters in question are mark and flow_flags)
> 
> Reviewed-by: David Ahern 
> Signed-off-by: Maciej Żenczykowski 

Applied.

Re: [PATCH 2/2] net-ipv4: remove 2 always zero parameters from ipv4_redirect()

2018-09-26 Thread David Miller

From: "Maciej Żenczykowski" 
Date: Tue, 25 Sep 2018 20:56:27 -0700

> From: Maciej Żenczykowski 
> 
> (the parameters in question are mark and flow_flags)
> 
> Reviewed-by: David Ahern 
> Signed-off-by: Maciej Żenczykowski 

Applied.

Re: [PATCH net] vxlan: fill ttl inherit info

2018-09-26 Thread David Miller

From: Hangbin Liu 
Date: Wed, 26 Sep 2018 10:35:42 +0800

> When add vxlan ttl inherit support, I forgot to fill it when dump
> vlxan info. Fix it now.
> 
> Fixes: 72f6d71e491e6 ("vxlan: add ttl inherit support")
> Signed-off-by: Hangbin Liu 

Applied and queued up for -stable.

Re: [PATCH v2 net] net: phy: sfp: Fix unregistering of HWMON SFP device

2018-09-26 Thread David Miller

From: Andrew Lunn 
Date: Tue, 25 Sep 2018 01:50:00 +0200

> A HWMON device is only registered is the SFP module supports the
> diagnostic page and is complient to SFF8472. Don't unconditionally
> unregister the hwmon device when the SFP module is remove, otherwise
> we access data structures which don't exist.
> 
> Reported-by: Florian Fainelli 
> Fixes: 1323061a018a ("net: phy: sfp: Add HWMON support for module sensors")
> Signed-off-by: Andrew Lunn 

Applied.

Re: [PATCH net] bonding: avoid possible dead-lock

2018-09-26 Thread David Miller

From: Mahesh Bandewar 
Date: Mon, 24 Sep 2018 14:40:11 -0700

> From: Mahesh Bandewar 
> 
> Syzkaller reported this on a slightly older kernel but it's still
> applicable to the current kernel -
 ...
> Signed-off-by: Mahesh Bandewar 

Applied and queued up for -stable.

Re: [PATCH net] bonding: pass link-local packets to bonding master also.

2018-09-26 Thread David Miller

From: Mahesh Bandewar 
Date: Mon, 24 Sep 2018 14:39:42 -0700

> From: Mahesh Bandewar 
> 
> Commit b89f04c61efe ("bonding: deliver link-local packets with
> skb->dev set to link that packets arrived on") changed the behavior
> of how link-local-multicast packets are processed. The change in
> the behavior broke some legacy use cases where these packets are
> expected to arrive on bonding master device also.
> 
> This patch passes the packet to the stack with the link it arrived
> on as well as passes to the bonding-master device to preserve the
> legacy use case.
> 
> Fixes: b89f04c61efe ("bonding: deliver link-local packets with skb->dev set 
> to link that packets arrived on")
> Reported-by: Michal Soltys 
> Signed-off-by: Mahesh Bandewar 

Applied and queued up for -stable.

Re: [PATCH net-next] net: phy: improve handling delayed work

2018-09-26 Thread David Miller

From: Heiner Kallweit 
Date: Mon, 24 Sep 2018 23:36:50 +0200

> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index a1f8e4816..40a7dc101 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
 ...
> @@ -635,6 +635,13 @@ int phy_speed_up(struct phy_device *phydev)
>  }
>  EXPORT_SYMBOL_GPL(phy_speed_up);
>  
> +static inline void phy_queue_state_machine(struct phy_device *phydev,
> +unsigned int secs)
> +{

Please do not use the inline keyword in foo.c files, let the compiler
decide.

Re: [PATCH] net/ncsi: Add NCSI OEM command for FB Tiogapass

2018-09-26 Thread Joel Stanley

On Thu, 27 Sep 2018 at 00:39,  wrote:
>
> > >  As I understand Justin's version adds a generic handler, using the NCSI
> > >  Netlink interface to pass OEM commands and responses to and from
> > >  userspace, which does the actual packet handling.
> > Thanks for the direction Sam! Justin, if you don't mind, can you share the 
> > patches you have to add the support? This actually would solve a few other 
> > things we are trying to accomplish.
>
>
> Basically, I add a new flag to indicate the request is coming from the 
> Netlink interface to allow the command handler and response handler to react.
> #define NCSI_REQ_FLAG_NETLINK_DRIVEN2
>
> The work flow is as below.
>
> Request:
> User space application -> Netlink interface (msg) -> new Netlink handler - 
> ncsi_send_cmd_nl() - ncsi_xmit_cmd()
>
> Response:
> Response received - ncsi_rcv_rsp() -> internal response handler - 
> ncsi_rsp_handler_xxx() -> ncsi_send_netlink_rsp () -> Netlink interface (msg) 
> -> user space application
> Command timeout - ncsi_request_timeout() -> ncsi_send_netlink_timeout () -> 
> Netlink interface (msg with zero data length) -> user space application
>
> Error:
> Detected error -> ncsi_send_netlink_err () -> Netlink interface (err msg) -> 
> user space application
>
> I will clean up some code and send out the patch.

Thanks for the overview. We look forward to your patches; please
include the same cc list as this series.

I think it makes sense to have some OEM NCSI handing purely in the
kenrel. This would allow eg. the MAC address of an interface to be
correct at boot, without requiring userspace to come up first.

I have also heard stories of temperature sensors over NCSI. Those
would make sense to be hwmon drivers, which again would mean handling
them in the kernel.

Justin, Vijay, can you please list the known NCSI OEM
commands/extensions that we plan on implementing?

Cheers,

Joel

Re: bond: take rcu lock in bond_poll_controller

2018-09-26 Thread David Miller

From: Dave Jones 
Date: Mon, 24 Sep 2018 15:23:17 -0400

> Callers of bond_for_each_slave_rcu are expected to hold the rcu lock,
> otherwise a trace like below is shown
 ...
> Signed-off-by: Dave Jones 

Hey Dave, after some recent changes by Eric Dumazet this no longer
applies.

Please respin against 'net'.

Thanks.

Re: [PATCH net v2 0/2] net: phy: fix WoL handling when suspending the PHY

2018-09-26 Thread David Miller

From: Heiner Kallweit 
Date: Mon, 24 Sep 2018 21:58:04 +0200

> phy_suspend doesn't always recognize that WoL is enabled and therefore
> suspends the PHY when it should not. First idea to address the issue
> was to reuse checks used in mdio_bus_phy_may_suspend which check
> whether relevant devices are wakeup-enabled.
> Florian raised some concerns because drivers may enable wakeup even if
> WoL isn't enabled (e.g. certain USB network drivers).
> 
> The new approach focuses on reducing the risk to break existing stuff.
> We add a flag wol_enabled to struct net_device which is set in
> ethtool_set_wol(). Then this flag is checked in phy_suspend().
> This doesn't cover 100% of the cases yet (e.g. if WoL is enabled w/o
> explicit configuration), but it covers the most relevant cases with
> very little risk of regressions.
> 
> v2:
> - Fix a typo

Series applied, thanks.

Please deal with the extra 4 byte size of net_device in net-next.

Thanks.

Re: [PATCH net] net/ipv6: Remove extra call to ip6_convert_metrics for multipath case

2018-09-26 Thread David Miller

From: dsah...@kernel.org
Date: Wed, 26 Sep 2018 17:35:14 -0700

> From: David Ahern 
> 
> The change to move metrics from the dst to rt6_info moved the call
> to ip6_convert_metrics from ip6_route_add to ip6_route_info_create. In
> doing so it makes the call in ip6_route_info_append redundant and
> actually leaks the metrics installed as part of the ip6_route_info_create.
> Remove the now unnecessary call.
> 
> Fixes: d4ead6b34b67f ("net/ipv6: move metrics from dst to rt6_info")
> Signed-off-by: David Ahern 

Applied and queued up for -stable, thanks David.

Could your company tell us if you can supply us

2018-09-26 Thread DEBENHAMS PLC




Hi

Situated in (United Kingdom) DEBENHAMS RETAIL PLC is a retailer shop 
furnished by European products.We are looking to buy ( Fashion 
items,Perfumes, Lighters, Tiles Marbles, Wooden Floors, Smart-phones, 
Tablets,Laptops, TVs, Air conditioner etc...) and find new partnership 
with companies dealing with different products so can you please send us 
your Catalog or your website to learn more about your products or prices 
list by email and if we can make some order with you and start long-term
partnership.Could you also tell us if you can supply us, more 
information about the possibility to become one of your regular 
customers?


Our Payment Policy is payment by swift within 30 days net in Europe.

Waiting for your feedback.

Best regards.

Richard Jones
Trading Director, Global Sourcing
DEBENHAMS RETAIL PLC
Direct line : +44 7413339961
Fax : +44 02082534105
www.debenhams.com
www.debenhamsplus.com

Re: [PATCH bpf-next 0/5] Introduce libbpf_attach_type_by_name

2018-09-26 Thread Jakub Kicinski

On Wed, 26 Sep 2018 23:54:17 +, Andrey Ignatov wrote:
> Jakub Kicinski  [Wed, 2018-09-26 16:20 -0700]:
> > On Wed, 26 Sep 2018 15:24:52 -0700, Andrey Ignatov wrote:  
> > > This patch set introduces libbpf_attach_type_by_name function in libbpf to
> > > identify attach type by section name.
> > > 
> > > This is useful to avoid writing same logic over and over again in user
> > > space applications that leverage libbpf.
> > > 
> > > Patch 1 has more details on the new function and problem being solved.
> > > Patches 2 and 3 add support for new section names.
> > > Patch 4 uses new function in a selftest.
> > > Patch 5 adds selftest for libbpf_{prog,attach}_type_by_name.
> > > 
> > > As a side note there are a lot of inconsistencies now between names used 
> > > by
> > > libbpf and bpftool (e.g. cgroup/skb vs cgroup_skb, cgroup_device and 
> > > device
> > > vs cgroup/dev, sockops vs sock_ops, etc). This patch set does not address
> > > it but it tries not to make it harder to address it in the future.  
> > 
> > I was wondering a few times whether I should point it out to people
> > during review, but thought it would be nit picking.  Maybe we should be
> > more strict.
> > 
> > Your series LGTM!  
> 
> Thanks for review!
> 
> IMO having it consistent would be great, e.g. one writes a program with
> section name X and bpftool shows/accepts it in exactly same way in all
> its sub-commands (w/o maybe custom suffix added by program writer).
> 
> But I doubt that keeping a few places in sync manually will work long
> term since it's easy to miss such a thing.
> 
> What do you think of having one source of truth in libbpf so that a
> string for prog_type or attach_type is defined once and all other places
> (e.g. bpftool prog show, bpftool cgroup show) use only corresponding
> enum-s to get those strings, but don't introduce any new strings?
>
> Keeping already existing names in a backward compatible way is a pain
> though.

One source of truth would be nice IMO.  The backward compat ties our
hands a tiny bit, but we could do a fallback strategy, where bpftool
checks if it has a name defined for a type, and if it doesn't (for new
types) it will ask libbpf.

There is another place actually where names are hard coded - for program
load command we use the types as they appear in libbpf already.  And
that requires putting them in help, man page and bash completions, too.

> Another thing, I was wondering, is if there is a way to bypass strings
> completely (at least in libbpf, since bpftool still has to print
> human-readable names) and keep actual bpf_prog_type and bpf_attach_type
> as metadata for a program in ELF file. Maybe some compiler magic ..

Possibly.  I'm far from an ELF expert, but I think attaching arbitrary
metadata to sections may be harder, name is already there and is free
form.

Reminder,

2018-09-26 Thread Juliet Muhammad

Hello 

   Please i still await your response regarding my previous email.

Reminder,

2018-09-26 Thread Juliet Muhammad

Hello 

   Please i still await your response regarding my previous email.

Re: [PATCH bpf-next] flow_dissector: lookup netns by skb->sk if skb->dev is NULL

2018-09-26 Thread Eric Dumazet




On 09/25/2018 08:38 AM, Daniel Borkmann wrote:
> On 09/24/2018 10:49 PM, Willem de Bruijn wrote:
>> From: Willem de Bruijn 
>>
>> BPF flow dissectors are configured per network namespace.
>> __skb_flow_dissect looks up the netns through dev_net(skb->dev).
>>
>> In some dissector paths skb->dev is NULL, such as for Unix sockets.
>> In these cases fall back to looking up the netns by socket.
>>
>> Analyzing the codepaths leading to __skb_flow_dissect I did not find
>> a case where both skb->dev and skb->sk are NULL. Warn and fall back to
>> standard flow dissector if one is found.
>>
>> Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
>> Reported-by: Eric Dumazet 
>> Signed-off-by: Willem de Bruijn 
> 
> Applied to bpf-next, thanks Willem!
> 

Sadly lib/test_bpf should still cause crashes, because populate_skb()
populates an skb attached to a fake device, for which net pointer is NULL.

[PATCH net] net/ipv6: Remove extra call to ip6_convert_metrics for multipath case

2018-09-26 Thread dsahern

From: David Ahern 

The change to move metrics from the dst to rt6_info moved the call
to ip6_convert_metrics from ip6_route_add to ip6_route_info_create. In
doing so it makes the call in ip6_route_info_append redundant and
actually leaks the metrics installed as part of the ip6_route_info_create.
Remove the now unnecessary call.

Fixes: d4ead6b34b67f ("net/ipv6: move metrics from dst to rt6_info")
Signed-off-by: David Ahern 
---
 net/ipv6/route.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index d28f83e01593..570f158253b9 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -4293,11 +4293,6 @@ static int ip6_route_info_append(struct net *net,
if (!nh)
return -ENOMEM;
nh->fib6_info = rt;
-   err = ip6_convert_metrics(net, rt, r_cfg);
-   if (err) {
-   kfree(nh);
-   return err;
-   }
memcpy(>r_cfg, r_cfg, sizeof(*r_cfg));
list_add_tail(>next, rt6_nh_list);
 
-- 
2.11.0

Reminder,

2018-09-26 Thread Juliet Muhammad

Hello 

   Please i still await your response regarding my previous email.

Re: WARN_ON in TLP causing RT throttling

2018-09-26 Thread Eric Dumazet




On 09/26/2018 04:46 PM, stran...@codeaurora.org wrote:
> Hi Eric,
> 
> Someone recently reported a crash to us on the 4.14.62 kernel where excessive
> WARNING prints were spamming the logs and causing watchdog bites. The kernel
> does have the following commit by Soheil:
> bffd168c3fc5 "tcp: clear tp->packets_out when purging write queue"
> 
> Before this bug we see over 1 second of continuous WARN_ON prints from
> tcp_send_loss_probe() like so:
> 
> 7795.530450:   <2>  tcp_send_loss_probe+0x194/0x1b8
> 7795.534833:   <2>  tcp_write_timer_handler+0xf8/0x1c4
> 7795.539492:   <2>  tcp_write_timer+0x4c/0x74
> 7795.543348:   <2>  call_timer_fn+0xc0/0x1b4
> 7795.547113:   <2>  run_timer_softirq+0x248/0x81c
> 
> Specifically, the prints come from the following check:
> 
> /* Retransmit last segment. */
> if (WARN_ON(!skb))
>     goto rearm_timer;
> 
> Since skb is always NULL, we know there's nothing on the write queue or the
> retransmit queue, so we just keep resetting the timer, waiting for more data
> to be queued. However, we were able to determine that the TCP socket is in the
> TCP_FIN_WAIT1 state, so we will no longer be sending any data and these queues
> remain empty.
> 
> Would it be appropriate to stop resetting the TLP timer if we detect that the
> connection is starting to close and we have no more data to send the probe 
> with,
> or is there some way that this scenario should already be handled?
> 
> Unfortunately, we don't have a reproducer for this crash.
>

Something is fishy.

If there is no skb in the queues, then tp->packets_out should be 0,
therefore tcp_rearm_rto() should simply call inet_csk_clear_xmit_timer(sk, 
ICSK_TIME_RETRANS);

I have never seen this report before.

Re: [PATCH bpf-next 0/5] Introduce libbpf_attach_type_by_name

2018-09-26 Thread Andrey Ignatov

Jakub Kicinski  [Wed, 2018-09-26 16:20 -0700]:
> On Wed, 26 Sep 2018 15:24:52 -0700, Andrey Ignatov wrote:
> > This patch set introduces libbpf_attach_type_by_name function in libbpf to
> > identify attach type by section name.
> > 
> > This is useful to avoid writing same logic over and over again in user
> > space applications that leverage libbpf.
> > 
> > Patch 1 has more details on the new function and problem being solved.
> > Patches 2 and 3 add support for new section names.
> > Patch 4 uses new function in a selftest.
> > Patch 5 adds selftest for libbpf_{prog,attach}_type_by_name.
> > 
> > As a side note there are a lot of inconsistencies now between names used by
> > libbpf and bpftool (e.g. cgroup/skb vs cgroup_skb, cgroup_device and device
> > vs cgroup/dev, sockops vs sock_ops, etc). This patch set does not address
> > it but it tries not to make it harder to address it in the future.
> 
> I was wondering a few times whether I should point it out to people
> during review, but thought it would be nit picking.  Maybe we should be
> more strict.
> 
> Your series LGTM!

Thanks for review!

IMO having it consistent would be great, e.g. one writes a program with
section name X and bpftool shows/accepts it in exactly same way in all
its sub-commands (w/o maybe custom suffix added by program writer).

But I doubt that keeping a few places in sync manually will work long
term since it's easy to miss such a thing.

What do you think of having one source of truth in libbpf so that a
string for prog_type or attach_type is defined once and all other places
(e.g. bpftool prog show, bpftool cgroup show) use only corresponding
enum-s to get those strings, but don't introduce any new strings?

Keeping already existing names in a backward compatible way is a pain
though.

Another thing, I was wondering, is if there is a way to bypass strings
completely (at least in libbpf, since bpftool still has to print
human-readable names) and keep actual bpf_prog_type and bpf_attach_type
as metadata for a program in ELF file. Maybe some compiler magic ..

-- 
Andrey Ignatov

WARN_ON in TLP causing RT throttling

2018-09-26 Thread stranche


Hi Eric,

Someone recently reported a crash to us on the 4.14.62 kernel where 
excessive
WARNING prints were spamming the logs and causing watchdog bites. The 
kernel

does have the following commit by Soheil:
bffd168c3fc5 "tcp: clear tp->packets_out when purging write queue"

Before this bug we see over 1 second of continuous WARN_ON prints from
tcp_send_loss_probe() like so:

7795.530450:   <2>  tcp_send_loss_probe+0x194/0x1b8
7795.534833:   <2>  tcp_write_timer_handler+0xf8/0x1c4
7795.539492:   <2>  tcp_write_timer+0x4c/0x74
7795.543348:   <2>  call_timer_fn+0xc0/0x1b4
7795.547113:   <2>  run_timer_softirq+0x248/0x81c

Specifically, the prints come from the following check:

/* Retransmit last segment. */
if (WARN_ON(!skb))
goto rearm_timer;

Since skb is always NULL, we know there's nothing on the write queue or 
the
retransmit queue, so we just keep resetting the timer, waiting for more 
data
to be queued. However, we were able to determine that the TCP socket is 
in the
TCP_FIN_WAIT1 state, so we will no longer be sending any data and these 
queues

remain empty.

Would it be appropriate to stop resetting the TLP timer if we detect 
that the
connection is starting to close and we have no more data to send the 
probe with,

or is there some way that this scenario should already be handled?

Unfortunately, we don't have a reproducer for this crash.

Thanks,
Sean

Re: bpf: Massive skbuff_head_cache memory leak?

2018-09-26 Thread John Johansen

On 09/26/2018 02:22 PM, Daniel Borkmann wrote:
> On 09/26/2018 11:09 PM, Tetsuo Handa wrote:
>> Hello, Alexei and Daniel.
>>
>> Can you show us how to run testcases you are testing?
> 
> Sorry for the delay; currently quite backlogged but will definitely take a 
> look
> at these reports. Regarding your question: majority of test cases are in the
> kernel tree under selftests, see tools/testing/selftests/bpf/ .
> 

Its unlikely to be apparmor. I went through the reports and saw nothing that
would indicate apparmor involvement, but the primary reason is what is being 
tested
in upstream apparmor atm.

The current upstream code does nothing directly with skbuffs. Its
possible that the audit code paths (kernel audit does grab skbuffs)
could, but there are only a couple cases that would be triggered in
the current fuzzing so this seems to be an unlikely source for such a
large leak.

>> On 2018/09/22 22:25, Tetsuo Handa wrote:
>>> Hello.
>>>
>>> syzbot is reporting many lockup problems on bpf.git / bpf-next.git / 
>>> net.git / net-next.git trees.
>>>
>>>   INFO: rcu detected stall in br_multicast_port_group_expired (2)
>>>   
>>> https://syzkaller.appspot.com/bug?id=15c7ad8cf35a07059e8a697a22527e11d294bc94
>>>
>>>   INFO: rcu detected stall in tun_chr_close
>>>   
>>> https://syzkaller.appspot.com/bug?id=6c50618bde03e5a2eefdd0269cf9739c5ebb8270
>>>
>>>   INFO: rcu detected stall in discover_timer
>>>   
>>> https://syzkaller.appspot.com/bug?id=55da031ddb910e58ab9c6853a5784efd94f03b54
>>>
>>>   INFO: rcu detected stall in ret_from_fork (2)
>>>   
>>> https://syzkaller.appspot.com/bug?id=c83129a6683b44b39f5b8864a1325893c9218363
>>>
>>>   INFO: rcu detected stall in addrconf_rs_timer
>>>   
>>> https://syzkaller.appspot.com/bug?id=21c029af65f81488edbc07a10ed20792444711b6
>>>
>>>   INFO: rcu detected stall in kthread (2)
>>>   
>>> https://syzkaller.appspot.com/bug?id=6accd1ed11c31110fed1982f6ad38cc9676477d2
>>>
>>>   INFO: rcu detected stall in ext4_filemap_fault
>>>   
>>> https://syzkaller.appspot.com/bug?id=817e38d20e9ee53390ac361bf0fd2007eaf188af
>>>
>>>   INFO: rcu detected stall in run_timer_softirq (2)
>>>   
>>> https://syzkaller.appspot.com/bug?id=f5a230a3ff7822f8d39fddf8485931bd06ae47fe
>>>
>>>   INFO: rcu detected stall in bpf_prog_ADDR
>>>   
>>> https://syzkaller.appspot.com/bug?id=fb4911fd0e861171cc55124e209f810a0dd68744
>>>
>>>   INFO: rcu detected stall in __run_timers (2)
>>>   
>>> https://syzkaller.appspot.com/bug?id=65416569ddc8d2feb8f19066aa761f5a47f7451a
>>>
>>> The cause of lockup seems to be flood of printk() messages from memory 
>>> allocation
>>> failures, and one of out_of_memory() messages indicates that 
>>> skbuff_head_cache
>>> usage is huge enough to suspect in-kernel memory leaks.
>>>
>>>   [ 1554.547011] skbuff_head_cache1847887KB1847887KB
>>>
>>> Unfortunately, we cannot find from logs what syzbot is trying to do
>>> because constant printk() messages is flooding away syzkaller messages.
>>> Can you try running your testcases with kmemleak enabled?
>>>
>>
>> On 2018/09/27 2:35, Dmitry Vyukov wrote:
>>> I also started suspecting Apparmor. We switched to Apparmor on Aug 30:
>>> https://groups.google.com/d/msg/syzkaller-bugs/o73lO4KGh0w/j9pcH2tSBAAJ
>>> Now the instances that use SELinux and Smack explicitly contain that
>>> in the name, but the rest are Apparmor.
>>> Aug 30 roughly matches these assorted "task hung" reports. Perhaps
>>> some Apparmor hook leaks a reference to skbs?
>>
>> Maybe. They have CONFIG_DEFAULT_SECURITY="apparmor". But I'm wondering why
>> this problem is not occurring on linux-next.git when this problem is 
>> occurring
>> on bpf.git / bpf-next.git / net.git / net-next.git trees. Is syzbot running
>> different testcases depending on which git tree is targeted?
>>
> 

this is another reason that it is doubtful that its apparmor.

Re: [PATCH bpf-next 0/5] Introduce libbpf_attach_type_by_name

2018-09-26 Thread Jakub Kicinski

On Wed, 26 Sep 2018 15:24:52 -0700, Andrey Ignatov wrote:
> This patch set introduces libbpf_attach_type_by_name function in libbpf to
> identify attach type by section name.
> 
> This is useful to avoid writing same logic over and over again in user
> space applications that leverage libbpf.
> 
> Patch 1 has more details on the new function and problem being solved.
> Patches 2 and 3 add support for new section names.
> Patch 4 uses new function in a selftest.
> Patch 5 adds selftest for libbpf_{prog,attach}_type_by_name.
> 
> As a side note there are a lot of inconsistencies now between names used by
> libbpf and bpftool (e.g. cgroup/skb vs cgroup_skb, cgroup_device and device
> vs cgroup/dev, sockops vs sock_ops, etc). This patch set does not address
> it but it tries not to make it harder to address it in the future.

I was wondering a few times whether I should point it out to people
during review, but thought it would be nit picking.  Maybe we should be
more strict.

Your series LGTM!

[PATCH net-next] nfp: warn on experimental TLV types

2018-09-26 Thread Jakub Kicinski

Reserve two TLV types for feature development, and warn in the driver
if they ever leak into production.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.c | 7 +++
 drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h | 8 
 2 files changed, 15 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.c
index 1f9149bb2ae6..2190836eaa1d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.c
@@ -113,6 +113,13 @@ int nfp_net_tlv_caps_parse(struct device *dev, u8 __iomem 
*ctrl_mem,
caps->mbox_len = length;
}
break;
+   case NFP_NET_CFG_TLV_TYPE_EXPERIMENTAL0:
+   case NFP_NET_CFG_TLV_TYPE_EXPERIMENTAL1:
+   dev_warn(dev,
+"experimental TLV type:%u offset:%u len:%u\n",
+FIELD_GET(NFP_NET_CFG_TLV_HEADER_TYPE, hdr),
+offset, length);
+   break;
default:
if (!FIELD_GET(NFP_NET_CFG_TLV_HEADER_REQUIRED, hdr))
break;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h 
b/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h
index 44d3ea75d043..a51490747689 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h
@@ -489,12 +489,20 @@
  * %NFP_NET_CFG_TLV_TYPE_MBOX:
  * Variable, mailbox area.  Overwrites the default location which is
  * %NFP_NET_CFG_MBOX_BASE and length %NFP_NET_CFG_MBOX_VAL_MAX_SZ.
+ *
+ * %NFP_NET_CFG_TLV_TYPE_EXPERIMENTAL0:
+ * %NFP_NET_CFG_TLV_TYPE_EXPERIMENTAL1:
+ * Variable, experimental IDs.  IDs designated for internal development and
+ * experiments before a stable TLV ID has been allocated to a feature.  Should
+ * never be present in production firmware.
  */
 #define NFP_NET_CFG_TLV_TYPE_UNKNOWN   0
 #define NFP_NET_CFG_TLV_TYPE_RESERVED  1
 #define NFP_NET_CFG_TLV_TYPE_END   2
 #define NFP_NET_CFG_TLV_TYPE_ME_FREQ   3
 #define NFP_NET_CFG_TLV_TYPE_MBOX  4
+#define NFP_NET_CFG_TLV_TYPE_EXPERIMENTAL0 5
+#define NFP_NET_CFG_TLV_TYPE_EXPERIMENTAL1 6
 
 struct device;
 
-- 
2.17.1

Re: [PATCHv2 bpf-next 08/11] selftests/bpf: Add tests for reference tracking

2018-09-26 Thread Joe Stringer

On Mon, 24 Sep 2018 at 05:22, Daniel Borkmann  wrote:
>
> On 09/21/2018 07:10 PM, Joe Stringer wrote:
> > reference tracking: leak potential reference
> > reference tracking: leak potential reference on stack
> > reference tracking: leak potential reference on stack 2
> > reference tracking: zero potential reference
> > reference tracking: copy and zero potential references
> > reference tracking: release reference without check
> > reference tracking: release reference
> > reference tracking: release reference twice
> > reference tracking: release reference twice inside branch
> > reference tracking: alloc, check, free in one subbranch
> > reference tracking: alloc, check, free in both subbranches
> > reference tracking in call: free reference in subprog
> > reference tracking in call: free reference in subprog and outside
> > reference tracking in call: alloc & leak reference in subprog
> > reference tracking in call: alloc in subprog, release outside
> > reference tracking in call: sk_ptr leak into caller stack
> > reference tracking in call: sk_ptr spill into caller stack
> >
> > Signed-off-by: Joe Stringer 
> > Acked-by: Alexei Starovoitov 
> > ---
> >  tools/testing/selftests/bpf/test_verifier.c | 359 
> >  1 file changed, 359 insertions(+)
>
> I think this here needs to have some more test cases that we current do not 
> track but
> should in order to have better coverage. At minimum what comes to mind 
> additionally:
>
> - verifier interaction with LD_ABS, LD_IND
> - verifier interaction with tail calls (e.g. try to leak socket, 
> socket_or_null, etc,
>   but should also have a positive test where we drop ref before tail call to 
> show it
>   works in combination)
> - Try to mangle a socket and socket_or_null pointer with ALU ops and pass it 
> to helper
> - Try to access the socket data fields after we released its reference
> - Access socket member fields in general (I think not present right now)
> - Use direct packet access in combination with lookup helper (it's enabled
>   via pkt_access = true in the helper, so we should also test for it here to 
> make
>   sure future changes don't break it)

Great suggestions, I think that the LD_ABS/LD_IND ones are good
candidates for the assembly-level tests here. For the remainder, I
plan to add them to the C cases. That'll be easier to review and
easier to understand if they ever get broken.

Regarding the direct packet access with lookup, that's already in this
series in the C verifier tests patch.

[PATCH bpf-next 3/5] libbpf: Support sk_skb/stream_{parser,verdict} section names

2018-09-26 Thread Andrey Ignatov

Add section names for BPF_SK_SKB_STREAM_PARSER and
BPF_SK_SKB_STREAM_VERDICT attach types to be able to identify them in
libbpf_attach_type_by_name.

"stream_parser" and "stream_verdict" are used instead of simple "parser"
and "verdict" just to avoid possible confusion in a place where attach
type is used alone (e.g. in bpftool's show sub-commands) since there is
another attach point that can be named as "verdict": BPF_SK_MSG_VERDICT.

Signed-off-by: Andrey Ignatov 
---
 tools/lib/bpf/libbpf.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 51edf6cd390e..425d5ca45c97 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -2139,6 +2139,10 @@ static const struct {
BPF_CGROUP_DEVICE),
BPF_APROG_SEC("sockops",BPF_PROG_TYPE_SOCK_OPS,
BPF_CGROUP_SOCK_OPS),
+   BPF_APROG_SEC("sk_skb/stream_parser",   BPF_PROG_TYPE_SK_SKB,
+   BPF_SK_SKB_STREAM_PARSER),
+   BPF_APROG_SEC("sk_skb/stream_verdict",  BPF_PROG_TYPE_SK_SKB,
+   BPF_SK_SKB_STREAM_VERDICT),
BPF_APROG_COMPAT("sk_skb",  BPF_PROG_TYPE_SK_SKB),
BPF_APROG_SEC("sk_msg", BPF_PROG_TYPE_SK_MSG,
BPF_SK_MSG_VERDICT),
-- 
2.17.1

[PATCH bpf-next 5/5] selftests/bpf: Test libbpf_{prog,attach}_type_by_name

2018-09-26 Thread Andrey Ignatov

Add selftest for libbpf functions libbpf_prog_type_by_name and
libbpf_attach_type_by_name.

Example of output:
  % ./tools/testing/selftests/bpf/test_section_names
  Summary: 35 PASSED, 0 FAILED

Signed-off-by: Andrey Ignatov 
---
 tools/testing/selftests/bpf/Makefile  |   2 +-
 .../selftests/bpf/test_section_names.c| 208 ++
 2 files changed, 209 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/test_section_names.c

diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index fd3851d5c079..059d64a0f897 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -23,7 +23,7 @@ $(TEST_CUSTOM_PROGS): $(OUTPUT)/%: %.c
 TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map 
test_progs \
test_align test_verifier_log test_dev_cgroup test_tcpbpf_user \
test_sock test_btf test_sockmap test_lirc_mode2_user get_cgroup_id_user 
\
-   test_socket_cookie test_cgroup_storage test_select_reuseport
+   test_socket_cookie test_cgroup_storage test_select_reuseport 
test_section_names
 
 TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o 
test_obj_id.o \
test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o 
sockmap_parse_prog.o \
diff --git a/tools/testing/selftests/bpf/test_section_names.c 
b/tools/testing/selftests/bpf/test_section_names.c
new file mode 100644
index ..7c4f41572b1c
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_section_names.c
@@ -0,0 +1,208 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Facebook
+
+#include 
+#include 
+
+#include "bpf_util.h"
+
+struct sec_name_test {
+   const char sec_name[32];
+   struct {
+   int rc;
+   enum bpf_prog_type prog_type;
+   enum bpf_attach_type expected_attach_type;
+   } expected_load;
+   struct {
+   int rc;
+   enum bpf_attach_type attach_type;
+   } expected_attach;
+};
+
+static struct sec_name_test tests[] = {
+   {"InvAliD", {-EINVAL, 0, 0}, {-EINVAL, 0} },
+   {"cgroup", {-EINVAL, 0, 0}, {-EINVAL, 0} },
+   {"socket", {0, BPF_PROG_TYPE_SOCKET_FILTER, 0}, {-EINVAL, 0} },
+   {"kprobe/", {0, BPF_PROG_TYPE_KPROBE, 0}, {-EINVAL, 0} },
+   {"kretprobe/", {0, BPF_PROG_TYPE_KPROBE, 0}, {-EINVAL, 0} },
+   {"classifier", {0, BPF_PROG_TYPE_SCHED_CLS, 0}, {-EINVAL, 0} },
+   {"action", {0, BPF_PROG_TYPE_SCHED_ACT, 0}, {-EINVAL, 0} },
+   {"tracepoint/", {0, BPF_PROG_TYPE_TRACEPOINT, 0}, {-EINVAL, 0} },
+   {
+   "raw_tracepoint/",
+   {0, BPF_PROG_TYPE_RAW_TRACEPOINT, 0},
+   {-EINVAL, 0},
+   },
+   {"xdp", {0, BPF_PROG_TYPE_XDP, 0}, {-EINVAL, 0} },
+   {"perf_event", {0, BPF_PROG_TYPE_PERF_EVENT, 0}, {-EINVAL, 0} },
+   {"lwt_in", {0, BPF_PROG_TYPE_LWT_IN, 0}, {-EINVAL, 0} },
+   {"lwt_out", {0, BPF_PROG_TYPE_LWT_OUT, 0}, {-EINVAL, 0} },
+   {"lwt_xmit", {0, BPF_PROG_TYPE_LWT_XMIT, 0}, {-EINVAL, 0} },
+   {"lwt_seg6local", {0, BPF_PROG_TYPE_LWT_SEG6LOCAL, 0}, {-EINVAL, 0} },
+   {
+   "cgroup_skb/ingress",
+   {0, BPF_PROG_TYPE_CGROUP_SKB, 0},
+   {0, BPF_CGROUP_INET_INGRESS},
+   },
+   {
+   "cgroup_skb/egress",
+   {0, BPF_PROG_TYPE_CGROUP_SKB, 0},
+   {0, BPF_CGROUP_INET_EGRESS},
+   },
+   {"cgroup/skb", {0, BPF_PROG_TYPE_CGROUP_SKB, 0}, {-EINVAL, 0} },
+   {
+   "cgroup/sock",
+   {0, BPF_PROG_TYPE_CGROUP_SOCK, 0},
+   {0, BPF_CGROUP_INET_SOCK_CREATE},
+   },
+   {
+   "cgroup/post_bind4",
+   {0, BPF_PROG_TYPE_CGROUP_SOCK, BPF_CGROUP_INET4_POST_BIND},
+   {0, BPF_CGROUP_INET4_POST_BIND},
+   },
+   {
+   "cgroup/post_bind6",
+   {0, BPF_PROG_TYPE_CGROUP_SOCK, BPF_CGROUP_INET6_POST_BIND},
+   {0, BPF_CGROUP_INET6_POST_BIND},
+   },
+   {
+   "cgroup/dev",
+   {0, BPF_PROG_TYPE_CGROUP_DEVICE, 0},
+   {0, BPF_CGROUP_DEVICE},
+   },
+   {"sockops", {0, BPF_PROG_TYPE_SOCK_OPS, 0}, {0, BPF_CGROUP_SOCK_OPS} },
+   {
+   "sk_skb/stream_parser",
+   {0, BPF_PROG_TYPE_SK_SKB, 0},
+   {0, BPF_SK_SKB_STREAM_PARSER},
+   },
+   {
+   "sk_skb/stream_verdict",
+   {0, BPF_PROG_TYPE_SK_SKB, 0},
+   {0, BPF_SK_SKB_STREAM_VERDICT},
+   },
+   {"sk_skb", {0, BPF_PROG_TYPE_SK_SKB, 0}, {-EINVAL, 0} },
+   {"sk_msg", {0, BPF_PROG_TYPE_SK_MSG, 0}, {0, BPF_SK_MSG_VERDICT} },
+   {"lirc_mode2", {0, BPF_PROG_TYPE_LIRC_MODE2, 0}, {0, BPF_LIRC_MODE2} },
+   {
+   "flow_dissector",
+   {0, BPF_PROG_TYPE_FLOW_DISSECTOR, 0},
+   {0, BPF_FLOW_DISSECTOR},
+   },
+

[PATCH bpf-next 4/5] selftests/bpf: Use libbpf_attach_type_by_name in test_socket_cookie

2018-09-26 Thread Andrey Ignatov

Use newly introduced libbpf_attach_type_by_name in test_socket_cookie
selftest.

Signed-off-by: Andrey Ignatov 
---
 tools/testing/selftests/bpf/test_socket_cookie.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_socket_cookie.c 
b/tools/testing/selftests/bpf/test_socket_cookie.c
index 68e108e4687a..b6c2c605d8c0 100644
--- a/tools/testing/selftests/bpf/test_socket_cookie.c
+++ b/tools/testing/selftests/bpf/test_socket_cookie.c
@@ -158,11 +158,7 @@ static int run_test(int cgfd)
bpf_object__for_each_program(prog, pobj) {
prog_name = bpf_program__title(prog, /*needs_copy*/ false);
 
-   if (strcmp(prog_name, "cgroup/connect6") == 0) {
-   attach_type = BPF_CGROUP_INET6_CONNECT;
-   } else if (strcmp(prog_name, "sockops") == 0) {
-   attach_type = BPF_CGROUP_SOCK_OPS;
-   } else {
+   if (libbpf_attach_type_by_name(prog_name, _type)) {
log_err("Unexpected prog: %s", prog_name);
goto err;
}
-- 
2.17.1

[PATCH bpf-next 0/5] Introduce libbpf_attach_type_by_name

2018-09-26 Thread Andrey Ignatov

This patch set introduces libbpf_attach_type_by_name function in libbpf to
identify attach type by section name.

This is useful to avoid writing same logic over and over again in user
space applications that leverage libbpf.

Patch 1 has more details on the new function and problem being solved.
Patches 2 and 3 add support for new section names.
Patch 4 uses new function in a selftest.
Patch 5 adds selftest for libbpf_{prog,attach}_type_by_name.

As a side note there are a lot of inconsistencies now between names used by
libbpf and bpftool (e.g. cgroup/skb vs cgroup_skb, cgroup_device and device
vs cgroup/dev, sockops vs sock_ops, etc). This patch set does not address
it but it tries not to make it harder to address it in the future.


Andrey Ignatov (5):
  libbpf: Introduce libbpf_attach_type_by_name
  libbpf: Support cgroup_skb/{e,in}gress section names
  libbpf: Support sk_skb/stream_{parser,verdict} section names
  selftests/bpf: Use libbpf_attach_type_by_name in test_socket_cookie
  selftests/bpf: Test libbpf_{prog,attach}_type_by_name

 tools/lib/bpf/libbpf.c| 129 +++
 tools/lib/bpf/libbpf.h|   2 +
 tools/testing/selftests/bpf/Makefile  |   2 +-
 .../selftests/bpf/test_section_names.c| 208 ++
 .../selftests/bpf/test_socket_cookie.c|   6 +-
 5 files changed, 302 insertions(+), 45 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_section_names.c

-- 
2.17.1

[PATCH bpf-next 1/5] libbpf: Introduce libbpf_attach_type_by_name

2018-09-26 Thread Andrey Ignatov

There is a common use-case when ELF object contains multiple BPF
programs and every program has its own section name. If it's cgroup-bpf
then programs have to be 1) loaded and 2) attached to a cgroup.

It's convenient to have information necessary to load BPF program
together with program itself. This is where section name works fine in
conjunction with libbpf_prog_type_by_name that identifies prog_type and
expected_attach_type and these can be used with BPF_PROG_LOAD.

But there is currently no way to identify attach_type by section name
and it leads to messy code in user space that reinvents guessing logic
every time it has to identify attach type to use with BPF_PROG_ATTACH.

The patch introduces libbpf_attach_type_by_name that guesses attach type
by section name if a program can be attached.

The difference between expected_attach_type provided by
libbpf_prog_type_by_name and attach_type provided by
libbpf_attach_type_by_name is the former is used at BPF_PROG_LOAD time
and can be zero if a program of prog_type X has only one corresponding
attach type Y whether the latter provides specific attach type to use
with BPF_PROG_ATTACH.

No new section names were added to section_names array. Only existing
ones were reorganized and attach_type was added where appropriate.

Signed-off-by: Andrey Ignatov 
---
 tools/lib/bpf/libbpf.c | 121 -
 tools/lib/bpf/libbpf.h |   2 +
 2 files changed, 84 insertions(+), 39 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 4f8d43ae20d2..59e589a64d5c 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -2085,58 +2085,82 @@ void bpf_program__set_expected_attach_type(struct 
bpf_program *prog,
prog->expected_attach_type = type;
 }
 
-#define BPF_PROG_SEC_FULL(string, ptype, atype) \
-   { string, sizeof(string) - 1, ptype, atype }
+#define BPF_PROG_SEC_IMPL(string, ptype, eatype, atype) \
+   { string, sizeof(string) - 1, ptype, eatype, atype }
 
-#define BPF_PROG_SEC(string, ptype) BPF_PROG_SEC_FULL(string, ptype, 0)
+/* Programs that can NOT be attached. */
+#define BPF_PROG_SEC(string, ptype) BPF_PROG_SEC_IMPL(string, ptype, 0, 
-EINVAL)
 
-#define BPF_S_PROG_SEC(string, ptype) \
-   BPF_PROG_SEC_FULL(string, BPF_PROG_TYPE_CGROUP_SOCK, ptype)
+/* Programs that can be attached. */
+#define BPF_APROG_SEC(string, ptype, atype) \
+   BPF_PROG_SEC_IMPL(string, ptype, 0, atype)
 
-#define BPF_SA_PROG_SEC(string, ptype) \
-   BPF_PROG_SEC_FULL(string, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, ptype)
+/* Programs that must specify expected attach type at load time. */
+#define BPF_EAPROG_SEC(string, ptype, eatype) \
+   BPF_PROG_SEC_IMPL(string, ptype, eatype, eatype)
+
+/* Programs that can be attached but attach type can't be identified by section
+ * name. Kept for backward compatibility.
+ */
+#define BPF_APROG_COMPAT(string, ptype) BPF_PROG_SEC(string, ptype)
 
 static const struct {
const char *sec;
size_t len;
enum bpf_prog_type prog_type;
enum bpf_attach_type expected_attach_type;
+   enum bpf_attach_type attach_type;
 } section_names[] = {
-   BPF_PROG_SEC("socket",  BPF_PROG_TYPE_SOCKET_FILTER),
-   BPF_PROG_SEC("kprobe/", BPF_PROG_TYPE_KPROBE),
-   BPF_PROG_SEC("kretprobe/",  BPF_PROG_TYPE_KPROBE),
-   BPF_PROG_SEC("classifier",  BPF_PROG_TYPE_SCHED_CLS),
-   BPF_PROG_SEC("action",  BPF_PROG_TYPE_SCHED_ACT),
-   BPF_PROG_SEC("tracepoint/", BPF_PROG_TYPE_TRACEPOINT),
-   BPF_PROG_SEC("raw_tracepoint/", BPF_PROG_TYPE_RAW_TRACEPOINT),
-   BPF_PROG_SEC("xdp", BPF_PROG_TYPE_XDP),
-   BPF_PROG_SEC("perf_event",  BPF_PROG_TYPE_PERF_EVENT),
-   BPF_PROG_SEC("cgroup/skb",  BPF_PROG_TYPE_CGROUP_SKB),
-   BPF_PROG_SEC("cgroup/sock", BPF_PROG_TYPE_CGROUP_SOCK),
-   BPF_PROG_SEC("cgroup/dev",  BPF_PROG_TYPE_CGROUP_DEVICE),
-   BPF_PROG_SEC("lwt_in",  BPF_PROG_TYPE_LWT_IN),
-   BPF_PROG_SEC("lwt_out", BPF_PROG_TYPE_LWT_OUT),
-   BPF_PROG_SEC("lwt_xmit",BPF_PROG_TYPE_LWT_XMIT),
-   BPF_PROG_SEC("lwt_seg6local",   BPF_PROG_TYPE_LWT_SEG6LOCAL),
-   BPF_PROG_SEC("sockops", BPF_PROG_TYPE_SOCK_OPS),
-   BPF_PROG_SEC("sk_skb",  BPF_PROG_TYPE_SK_SKB),
-   BPF_PROG_SEC("sk_msg",  BPF_PROG_TYPE_SK_MSG),
-   BPF_PROG_SEC("lirc_mode2",  BPF_PROG_TYPE_LIRC_MODE2),
-   BPF_PROG_SEC("flow_dissector",  BPF_PROG_TYPE_FLOW_DISSECTOR),
-   BPF_SA_PROG_SEC("cgroup/bind4", BPF_CGROUP_INET4_BIND),
-   BPF_SA_PROG_SEC("cgroup/bind6", BPF_CGROUP_INET6_BIND),
-   BPF_SA_PROG_SEC("cgroup/connect4", BPF_CGROUP_INET4_CONNECT),
-   BPF_SA_PROG_SEC("cgroup/connect6", BPF_CGROUP_INET6_CONNECT),
-   BPF_SA_PROG_SEC("cgroup/sendmsg4", BPF_CGROUP_UDP4_SENDMSG),
-   BPF_SA_PROG_SEC("cgroup/sendmsg6", BPF_CGROUP_UDP6_SENDMSG),
-

[PATCH bpf-next 2/5] libbpf: Support cgroup_skb/{e,in}gress section names

2018-09-26 Thread Andrey Ignatov

Add section names for BPF_CGROUP_INET_INGRESS and BPF_CGROUP_INET_EGRESS
attach types to be able to identify them in libbpf_attach_type_by_name.

"cgroup_skb" is used instead of "cgroup/skb" mostly to easy possible
unifying of how libbpf and bpftool works with section names:
* bpftool uses "cgroup_skb" to in "prog list" sub-command;
* bpftool uses "ingress" and "egress" in "cgroup list" sub-command;
* having two parts instead of three in a string like "cgroup_skb/ingress"
  can be leveraged to split it to prog_type part and attach_type part,
  or vise versa: use two parts to make a section name.

Signed-off-by: Andrey Ignatov 
---
 tools/lib/bpf/libbpf.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 59e589a64d5c..51edf6cd390e 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -2124,6 +2124,10 @@ static const struct {
BPF_PROG_SEC("lwt_out", BPF_PROG_TYPE_LWT_OUT),
BPF_PROG_SEC("lwt_xmit",BPF_PROG_TYPE_LWT_XMIT),
BPF_PROG_SEC("lwt_seg6local",   BPF_PROG_TYPE_LWT_SEG6LOCAL),
+   BPF_APROG_SEC("cgroup_skb/ingress", BPF_PROG_TYPE_CGROUP_SKB,
+   BPF_CGROUP_INET_INGRESS),
+   BPF_APROG_SEC("cgroup_skb/egress",  BPF_PROG_TYPE_CGROUP_SKB,
+   BPF_CGROUP_INET_EGRESS),
BPF_APROG_COMPAT("cgroup/skb",  BPF_PROG_TYPE_CGROUP_SKB),
BPF_APROG_SEC("cgroup/sock",BPF_PROG_TYPE_CGROUP_SOCK,
BPF_CGROUP_INET_SOCK_CREATE),
-- 
2.17.1

Re: [RFC PATCH v2 bpf-next 0/2] verifier liveness simplification

2018-09-26 Thread Jiong Wang


On 22/08/2018 20:00, Edward Cree wrote:

The first patch is a simplification of register liveness tracking by using
 a separate parentage chain for each register and stack slot, thus avoiding
 the need for logic to handle callee-saved registers when applying read
 marks.  In the future this idea may be extended to form use-def chains.


Interesting.

This could potentially help efficient code-gen for 32-bit architectures and I
had been thinking some implementations along this line and would like to have
a discussion.

Program description
===
It will be good if we could know one instruction is safe to operate on low
32-bit sub-register only. If this is true:
  - 32-bit arches could save some code-gen.
  - 64-bit arches could utilize 32-bit sub-register instructions.

Algorithm
===
Based on the current verifier code-path-walker, it looks to me we could get
32-bit safety information using the following algorithm:

  1. assume all instructions operate on 32-bit sub-register initially.
  2. link each insn to insns which define its use.
  3. for a register use, if it is a 64-bit write back to memory, or if it is
 a comparison-then-jump based on 64-bit value, or if it is a right shift,
 then consider the use is a 64-bit use, then all its define insns and their
 parents define insns should be marked as need full 64-bit operation.
  4. currently, register read (REG_LIVE_READ) will be propagated to parent
 state when path prune happened, and REG_LIVE_WRITTEN would screen off such
 propagation. We need to propagate 64-bit mark in similar way, but
 REG_LIVE_WRITTEN shouldn't screen off backward 64-bit mark propagation if
 the written reg also used as source reg (this has raised REG_LIVE_READ
 propagation but it is not necessarily indicating 64-bit read) in the same
 instruction.

Implementation
===
And it seems there could be an implementation based on current liveness tracking
infrastructure without dramatic change.

  1. instruction level use->def chain

 - new active_defs array for each call frame to record which insn the active
   define of one register comes from. callee copies active_defs from caller
   for argument registers only, and copies active_def of R0 to caller when
   exit.

   struct bpf_func_state {
 ...
 s16 active_defs[__MAX_BPF_REG];

 - new use->def chains for each instruction. one eBPF insn could have two
   uses at maximum. also one new boolean "full_ref_def" added to keep the
   final 32-bit safety information. it will be true if this instruction
   needs to operate on full 64-bit.

   bpf_insn_aux_data {
   ...
 struct {
   s16 def[2];
   bool full_ref_def;
 };

  2. Inside mark_reg_read, split SRC_OP to SRC0_OP/SRC1_OP/SRC_OP_64/SRC1_OP_64
 to indicate one register read if from the first or second use slot of this
 instruction, and to indicate whether the read is on full 64-bit which will
 only be true if the read is for 64-bit write back to memory, or 64-bit
 comparison-then-jump, or bit right shift means 64-bit.

 Build use->def chain for any read, but do 64-bit backward propagation
 for 64-bit read only. The propagation is to set full_ref_def to true for
 def and parent defs of the 64-bit read.

   3. Need new bpf_reg_liveness enum, REG_LIVE_READ64 to indicate there is
  64-bit access to one register in the pruned path, and need
  REG_LIVE_WRITTEN64 to indicate a write that is REG_LIVE_WRITTEN but should
  not screen backward propagate 64-bit read info.

   #define REG_LIVE_NONE  0
   #define REG_LIVE_READ  BIT(0)
   #define REG_LIVE_READ64BIT(1)
   #define REG_LIVE_WRITTEN   BIT(2)
   #define REG_LIVE_WRITTEN64 BIT(3)

I might have missed something in above thoughts, would appreciate reviews and
suggestions. I will also send out some concrete patches later.

Regards,
Jiong

Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers

2018-09-26 Thread Andrew Lunn

> When you run ethtool -m on this driver, the kernel calls 
> mlx4_en_get_module_info
> to determine the length of the eeprom, and that value will be either 256 or 
> 512
> bytes.

So it sounds like QSFP modules using 8636 are not supported. You would
expect a size to be one of 256, 384, 512 or 640.

> Next it calls mlx4_en_get_module_eeprom, passing in that size 256 to actually
> read the eeprom data, which in turn calls mlx4_get_module_info to fetch the 
> data
> from hardware, again, passing in 256 as the size for the first call (theres a
> loop, but it will only get executed once in this scenario)
> 
> mlx4_get_module_info then issues the appropriate mailbox commands to dump the
> eeprom.  Here it starts to go sideways.  The mailbox buffer allocated for the
> return data is of type mlx4_mad_ifc, which has some front matter information 
> and
> a data buffer that is 192 bytes long!

Which suggests all SFP dumps are broken as well, not just QSFP.

Oh dear.

   Andrew

Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers

2018-09-26 Thread Andrew Lunn

On Wed, Sep 26, 2018 at 08:47:34PM +, Chris Preimesberger wrote:
> Hello Andrew,
> 
> Thank you for the quick response!!
> Apologies in advance for my use of outlook and top-posting, etc...
> 
> I've run the raw option and the hex option, and pasted the results below.
> Since the raw option printed strange characters on the CLI, I re-ran it,
> Sending the output to a file (raw.txt) and attached that file as well.
> 
> Pasted from Ubuntu CLI:
> 
> tech1@D7:~$ 
> tech1@D7:~$ 
> tech1@D7:~$ 
> tech1@D7:~$ 
> tech1@D7:~$ sudo ethtool -m enp1s0 raw on
> UU$��pA`?�@�G#
>  �v���@TRANSITION  ��TNQSFP100GCWDM4 
> 1AfX%F??�TN02000301  180919  
> h�I��_��'��Ri=`��Zntech1@D7:~$ 
> tech1@D7:~$ 
> tech1@D7:~$ 
> tech1@D7:~$ 
> tech1@D7:~$ sudo ethtool -m enp1s0 hex on
> OffsetValues
> ----
> 0x:   11 00 00 0f 00 00 00 00 00 55 55 00 00 00 00 00 
> 0x0010:   00 00 00 00 00 00 24 e2 00 00 81 68 00 00 00 00 
> 0x0020:   00 00 00 00 00 00 00 00 00 00 41 60 3f e0 40 e0 
> 0x0030:   47 00 1f 10 0e 1e 0b f7 12 76 00 00 00 00 00 00 
> 0x0040:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 0x0050:   00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 
> 0x0060:   00 00 00 00 00 00 00 00 00 00 1f 00 00 00 00 00 
> 0x0070:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 0x0080:   11 fc 07 80 00 00 00 00 00 00 00 03 ff 00 02 00 
> 0x0090:   00 00 00 40 54 52 41 4e 53 49 54 49 4f 4e 20 20 
> 0x00a0:   20 20 20 20 00 00 c0 f2 54 4e 51 53 46 50 31 30 
> 0x00b0:   30 47 43 57 44 4d 34 20 31 41 66 58 25 1c 46 3f 
> 0x00c0:   06 00 3f d6 54 4e 30 32 30 30 30 33 30 31 20 20 
> 0x00d0:   20 20 20 20 31 38 30 39 31 39 20 20 0c 00 68 f3 
> 0x00e0:   00 00 02 49 80 a0 5f 1f de c9 27 16 f8 ae 52 69 
> 0x00f0:   3d 02 60 00 00 00 00 00 00 00 00 00 83 f4 5a 6e 

Hi Chris

I've only recently got involved with SFP modules. ethtool says this is
a SFF-8636 SFP. So a QSFP. It has multiple pages, each 128 bytes in
length, which should be returned in a concatenated form. Here we see
256 bytes, meaning there are two pages. There can be up to 5 pages.

ethtool is looking for the temperature alarms at offset 0x200. So that
does not exist in this hex dump. But the raw dump you provided has
more bytes, 0x400 of them.

So i would say the first bug is that ethtool dumps different amounts
of data in hex than raw.

The fact you get different alarm thresholds on different runs suggests
to me we might only be getting two pages from the kernel?

Can you build ethtool from source and run it inside a debugger?
ethtool makes two IOCTL calls. The first is ETHTOOL_GMODULEINFO.
Could you print out the modinfo which is returned. It then does a
ETHTOOL_GMODULEEEPROM. Can you print out eeprom after the second
IOCTL.

Thanks
Andrew

Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers

2018-09-26 Thread Neil Horman

On Wed, Sep 26, 2018 at 07:29:23PM +, Chris Preimesberger wrote:
> Hello,
> 
> I'm re-sending in plain text per the auto-reply from a spam filter.  I have 
> attached some text files this time, which explain the situation below, in 
> case the below email's font & formatting is now too messed up for easy 
> comprehension.
> 
> Thank you and best regards.
> 
> 
> Chris Preimesberger | Test & Validation Engineer
> Transition Networks, Inc.
> 
> chr...@transition.com
> direct: +1.952.996.1509 | fax: +1.952.941.2322 | www.transition.com
> 
> 
> 
> 
This is just a drive by guess, but I think this is a driver issue.  

Issue 1 seems like a red herring, cat doesn't modify output, nor does ethtool
know if its output is going to a console or a pipe, its all the same.  And given
issue 2 (that the output of the thresholds, etc are spurriously changing and
wrong), suggests that they are spurriously changing and wrong regardless of what
cat does.

That said, I think issue two is a problem with the mlx4 driver.  Specifically
that the driver is copying garbage data.

The three ethtool functions at work here are:
mlx4_en_get_module_info
mlx4_en_get_module_eeprom
mlx4_get_module_info

When you run ethtool -m on this driver, the kernel calls mlx4_en_get_module_info
to determine the length of the eeprom, and that value will be either 256 or 512
bytes.  Lets assume that the value is 256 for the sake of argument

Next it calls mlx4_en_get_module_eeprom, passing in that size 256 to actually
read the eeprom data, which in turn calls mlx4_get_module_info to fetch the data
from hardware, again, passing in 256 as the size for the first call (theres a
loop, but it will only get executed once in this scenario)

mlx4_get_module_info then issues the appropriate mailbox commands to dump the
eeprom.  Here it starts to go sideways.  The mailbox buffer allocated for the
return data is of type mlx4_mad_ifc, which has some front matter information and
a data buffer that is 192 bytes long!

A little further down in the function, size gets restricted if the buffer
crosses a page boundary, but given that the size is 256 on the first call here,
and offset is zero on the first call, we're not crossing anything, so size
remains unchanged.

The output mailbox buffer outmad->data (a 192 byte array), then gets cast to a
sturt mlx4_cable_info structure, which has its own internal data buffer that is
only 48 bytes long.

The memcpy in this functionthen copies cable_info->data to the buffer that gets
returned to ethtool, but it copies size bytes (256), even though the source data
buffer is only 48 bytes long.  That 48 byte array is embedded in the larger 192
byte structure, so there won't be a panic on the overrun, but theres no telling
what garbage is in the buffer beyond those first 48 bytes.  Even if the
remaining 144 bytes have valid eeprom data, its less than the required 256
bytes.  The additional copy may cause a panic, but if the buffer commonly bumps
up against other allocated memory, that will go unnoticed.

after the memcpy, mlx4_get_module_info just returns the size of the passed in
buffer (256), and so the calling function thinks its work is done, and lets  the
kernel send back the buffer with garbage data to ethtool.

I think the mlx4 guys have some work to do here.

My $0.02
Neil

> 
> 
> From: Chris Preimesberger 
> Sent: Wednesday, September 26, 2018 2:14 PM
> To: 'linvi...@tuxdriver.com'; 'netdev@vger.kernel.org'
> Subject: bug: 'ethtool -m' reports spurious alarm & warning threshold values 
> for QSFP28 transceivers
> 
> Hello John, All,
> 
> 
> I think I may have found a bug or two in ethtool, with respect to its 
> reporting of a QSFP28 transceiver's diagnostic information.  Ethtool seems to 
> correctly report all diagnostic information about QSFP28 transceivers, except 
> for the transceiver's warning and alarm thresholds.  I'm not sure whether the 
> spurious warning and alarm values that get reported are the fault of ethtool 
> or my NIC/driver, and I have no other models of 100GbE NICs to test with.  
> I've contacted Mellanox support about this, and they point the finger at 
> ethtool.  Can these issues be investigated by ethtool developers?  Here is 
> some background information about the equipment and software used when I 
> observe these issues:
> 
> Equipment used:
> NIC: Mellanox ConnectX-4 100GbE, part number MCX415A-CCAT
> Transceiver: Any 40Gb or 100Gb QSFP28 transceiver installed in the NIC 
> (Intel, Mellanox, Transition Networks, etc..)
> 
> Software used:
> Ubuntu 18.04 with the distro's packaged NIC driver and ethtool v4.15
> also tested were ethtool v4.18 compiled from source and the current Mellanox 
> OFED driver.
> 
> All test scenarios produced the same bugs.
> 
> 
> Bug #1.  Ethtool's reporting of the installed transceiver's alarm and warning 
> thresholds will differ, depending on whether or not ethtool is piped to 
> another command.  Example commands are

Re: bpf: Massive skbuff_head_cache memory leak?

2018-09-26 Thread Daniel Borkmann

On 09/26/2018 11:09 PM, Tetsuo Handa wrote:
> Hello, Alexei and Daniel.
> 
> Can you show us how to run testcases you are testing?

Sorry for the delay; currently quite backlogged but will definitely take a look
at these reports. Regarding your question: majority of test cases are in the
kernel tree under selftests, see tools/testing/selftests/bpf/ .

> On 2018/09/22 22:25, Tetsuo Handa wrote:
>> Hello.
>>
>> syzbot is reporting many lockup problems on bpf.git / bpf-next.git / net.git 
>> / net-next.git trees.
>>
>>   INFO: rcu detected stall in br_multicast_port_group_expired (2)
>>   
>> https://syzkaller.appspot.com/bug?id=15c7ad8cf35a07059e8a697a22527e11d294bc94
>>
>>   INFO: rcu detected stall in tun_chr_close
>>   
>> https://syzkaller.appspot.com/bug?id=6c50618bde03e5a2eefdd0269cf9739c5ebb8270
>>
>>   INFO: rcu detected stall in discover_timer
>>   
>> https://syzkaller.appspot.com/bug?id=55da031ddb910e58ab9c6853a5784efd94f03b54
>>
>>   INFO: rcu detected stall in ret_from_fork (2)
>>   
>> https://syzkaller.appspot.com/bug?id=c83129a6683b44b39f5b8864a1325893c9218363
>>
>>   INFO: rcu detected stall in addrconf_rs_timer
>>   
>> https://syzkaller.appspot.com/bug?id=21c029af65f81488edbc07a10ed20792444711b6
>>
>>   INFO: rcu detected stall in kthread (2)
>>   
>> https://syzkaller.appspot.com/bug?id=6accd1ed11c31110fed1982f6ad38cc9676477d2
>>
>>   INFO: rcu detected stall in ext4_filemap_fault
>>   
>> https://syzkaller.appspot.com/bug?id=817e38d20e9ee53390ac361bf0fd2007eaf188af
>>
>>   INFO: rcu detected stall in run_timer_softirq (2)
>>   
>> https://syzkaller.appspot.com/bug?id=f5a230a3ff7822f8d39fddf8485931bd06ae47fe
>>
>>   INFO: rcu detected stall in bpf_prog_ADDR
>>   
>> https://syzkaller.appspot.com/bug?id=fb4911fd0e861171cc55124e209f810a0dd68744
>>
>>   INFO: rcu detected stall in __run_timers (2)
>>   
>> https://syzkaller.appspot.com/bug?id=65416569ddc8d2feb8f19066aa761f5a47f7451a
>>
>> The cause of lockup seems to be flood of printk() messages from memory 
>> allocation
>> failures, and one of out_of_memory() messages indicates that 
>> skbuff_head_cache
>> usage is huge enough to suspect in-kernel memory leaks.
>>
>>   [ 1554.547011] skbuff_head_cache1847887KB1847887KB
>>
>> Unfortunately, we cannot find from logs what syzbot is trying to do
>> because constant printk() messages is flooding away syzkaller messages.
>> Can you try running your testcases with kmemleak enabled?
>>
> 
> On 2018/09/27 2:35, Dmitry Vyukov wrote:
>> I also started suspecting Apparmor. We switched to Apparmor on Aug 30:
>> https://groups.google.com/d/msg/syzkaller-bugs/o73lO4KGh0w/j9pcH2tSBAAJ
>> Now the instances that use SELinux and Smack explicitly contain that
>> in the name, but the rest are Apparmor.
>> Aug 30 roughly matches these assorted "task hung" reports. Perhaps
>> some Apparmor hook leaks a reference to skbs?
> 
> Maybe. They have CONFIG_DEFAULT_SECURITY="apparmor". But I'm wondering why
> this problem is not occurring on linux-next.git when this problem is occurring
> on bpf.git / bpf-next.git / net.git / net-next.git trees. Is syzbot running
> different testcases depending on which git tree is targeted?
>

Re: [PATCH] Documentation: Add HOWTO Korean translation into BPF and XDP Reference Guide.

2018-09-26 Thread Daniel Borkmann

On 09/26/2018 09:44 PM, Jonathan Corbet wrote:
> On Wed, 26 Sep 2018 18:11:42 +0900
> Chang-an Song  wrote:
> 
>>>  - The original document has a copyright assertion but no associated
>>>license.  Do we know what the license is?  I assume it's something
>>>that is free and GPL-compatible, but that would be good to know for
>>>sure.  
>>
>> 3. I asked to main author Daniel that apache 2.0 license for this document.
> 
> That is a bit of a problem, since Apache v2 is not compatible with GPLv2.
> If the license of the document cannot be changed, I don't think we can
> accept it into the kernel tree.

Alternative option could also be to integrate it into Cilium's doc given the
original document is present there as well, so it could link to the Korean
version from there.

Re: bpf: Massive skbuff_head_cache memory leak?

2018-09-26 Thread Tetsuo Handa

Hello, Alexei and Daniel.

Can you show us how to run testcases you are testing?

On 2018/09/22 22:25, Tetsuo Handa wrote:
> Hello.
> 
> syzbot is reporting many lockup problems on bpf.git / bpf-next.git / net.git 
> / net-next.git trees.
> 
>   INFO: rcu detected stall in br_multicast_port_group_expired (2)
>   
> https://syzkaller.appspot.com/bug?id=15c7ad8cf35a07059e8a697a22527e11d294bc94
> 
>   INFO: rcu detected stall in tun_chr_close
>   
> https://syzkaller.appspot.com/bug?id=6c50618bde03e5a2eefdd0269cf9739c5ebb8270
> 
>   INFO: rcu detected stall in discover_timer
>   
> https://syzkaller.appspot.com/bug?id=55da031ddb910e58ab9c6853a5784efd94f03b54
> 
>   INFO: rcu detected stall in ret_from_fork (2)
>   
> https://syzkaller.appspot.com/bug?id=c83129a6683b44b39f5b8864a1325893c9218363
> 
>   INFO: rcu detected stall in addrconf_rs_timer
>   
> https://syzkaller.appspot.com/bug?id=21c029af65f81488edbc07a10ed20792444711b6
> 
>   INFO: rcu detected stall in kthread (2)
>   
> https://syzkaller.appspot.com/bug?id=6accd1ed11c31110fed1982f6ad38cc9676477d2
> 
>   INFO: rcu detected stall in ext4_filemap_fault
>   
> https://syzkaller.appspot.com/bug?id=817e38d20e9ee53390ac361bf0fd2007eaf188af
> 
>   INFO: rcu detected stall in run_timer_softirq (2)
>   
> https://syzkaller.appspot.com/bug?id=f5a230a3ff7822f8d39fddf8485931bd06ae47fe
> 
>   INFO: rcu detected stall in bpf_prog_ADDR
>   
> https://syzkaller.appspot.com/bug?id=fb4911fd0e861171cc55124e209f810a0dd68744
> 
>   INFO: rcu detected stall in __run_timers (2)
>   
> https://syzkaller.appspot.com/bug?id=65416569ddc8d2feb8f19066aa761f5a47f7451a
> 
> The cause of lockup seems to be flood of printk() messages from memory 
> allocation
> failures, and one of out_of_memory() messages indicates that skbuff_head_cache
> usage is huge enough to suspect in-kernel memory leaks.
> 
>   [ 1554.547011] skbuff_head_cache1847887KB1847887KB
> 
> Unfortunately, we cannot find from logs what syzbot is trying to do
> because constant printk() messages is flooding away syzkaller messages.
> Can you try running your testcases with kmemleak enabled?
> 

On 2018/09/27 2:35, Dmitry Vyukov wrote:
> I also started suspecting Apparmor. We switched to Apparmor on Aug 30:
> https://groups.google.com/d/msg/syzkaller-bugs/o73lO4KGh0w/j9pcH2tSBAAJ
> Now the instances that use SELinux and Smack explicitly contain that
> in the name, but the rest are Apparmor.
> Aug 30 roughly matches these assorted "task hung" reports. Perhaps
> some Apparmor hook leaks a reference to skbs?

Maybe. They have CONFIG_DEFAULT_SECURITY="apparmor". But I'm wondering why
this problem is not occurring on linux-next.git when this problem is occurring
on bpf.git / bpf-next.git / net.git / net-next.git trees. Is syzbot running
different testcases depending on which git tree is targeted?

Re: [PATCH] 9p: potential NULL dereference

2018-09-26 Thread Dominique Martinet

Dan Carpenter wrote on Wed, Sep 26, 2018:
> p9_tag_alloc() is supposed to return error pointers, but we accidentally
> return a NULL here.  It would cause a NULL dereference in the caller.
> 
> Fixes: 996d5b4db4b1 ("9p: Use a slab for allocating requests")
> Signed-off-by: Dan Carpenter 

Good catch, the culprit commit is only in -next so just adding this to
the queue right away.

Thanks!
-- 
Dominique

[PATCH net-next 1/2] tcp: set recv_skip_hint when tcp_inq is less than PAGE_SIZE

2018-09-26 Thread Soheil Hassas Yeganeh

From: Soheil Hassas Yeganeh 

When we have less than PAGE_SIZE of data on receive queue,
we set recv_skip_hint to 0. Instead, set it to the actual
number of bytes available.

Signed-off-by: Soheil Hassas Yeganeh 
Signed-off-by: Eric Dumazet 
---
 net/ipv4/tcp.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 69c236943f56..3e17501fc1a1 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1753,6 +1753,7 @@ static int tcp_zerocopy_receive(struct sock *sk,
struct vm_area_struct *vma;
struct sk_buff *skb = NULL;
struct tcp_sock *tp;
+   int inq;
int ret;
 
if (address & (PAGE_SIZE - 1) || address != zc->address)
@@ -1773,12 +1774,15 @@ static int tcp_zerocopy_receive(struct sock *sk,
 
tp = tcp_sk(sk);
seq = tp->copied_seq;
-   zc->length = min_t(u32, zc->length, tcp_inq(sk));
+   inq = tcp_inq(sk);
+   zc->length = min_t(u32, zc->length, inq);
zc->length &= ~(PAGE_SIZE - 1);
-
-   zap_page_range(vma, address, zc->length);
-
-   zc->recv_skip_hint = 0;
+   if (zc->length) {
+   zap_page_range(vma, address, zc->length);
+   zc->recv_skip_hint = 0;
+   } else {
+   zc->recv_skip_hint = inq;
+   }
ret = 0;
while (length + PAGE_SIZE <= zc->length) {
if (zc->recv_skip_hint < PAGE_SIZE) {
-- 
2.19.0.605.g01d371f741-goog

[PATCH net-next 2/2] tcp: adjust rcv zerocopy hints based on frag sizes

2018-09-26 Thread Soheil Hassas Yeganeh

From: Soheil Hassas Yeganeh 

When SKBs are coalesced, we can have SKBs with different
frag sizes. Some with PAGE_SIZE and some not with PAGE_SIZE.
Since recv_skip_hint is always set to the full SKB size,
it can overestimate the amount that should be read using
normal read for coalesced packets.

Change the recv_skip_hint so that it only includes the first
frags that are not of PAGE_SIZE.

Signed-off-by: Soheil Hassas Yeganeh 
Signed-off-by: Eric Dumazet 
---
 net/ipv4/tcp.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 3e17501fc1a1..cdbd423bdeb4 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1805,8 +1805,17 @@ static int tcp_zerocopy_receive(struct sock *sk,
frags++;
}
}
-   if (frags->size != PAGE_SIZE || frags->page_offset)
+   if (frags->size != PAGE_SIZE || frags->page_offset) {
+   int remaining = zc->recv_skip_hint;
+
+   while (remaining && (frags->size != PAGE_SIZE ||
+frags->page_offset)) {
+   remaining -= frags->size;
+   frags++;
+   }
+   zc->recv_skip_hint -= remaining;
break;
+   }
ret = vm_insert_page(vma, address + length,
 skb_frag_page(frags));
if (ret)
-- 
2.19.0.605.g01d371f741-goog

RE: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers

2018-09-26 Thread Chris Preimesberger

Hello Andrew,

Thank you for the quick response!!
Apologies in advance for my use of outlook and top-posting, etc...

I've run the raw option and the hex option, and pasted the results below.
Since the raw option printed strange characters on the CLI, I re-ran it,
Sending the output to a file (raw.txt) and attached that file as well.

Pasted from Ubuntu CLI:

tech1@D7:~$ 
tech1@D7:~$ 
tech1@D7:~$ 
tech1@D7:~$ 
tech1@D7:~$ sudo ethtool -m enp1s0 raw on
UU$��pA`?�@�G#
 �v���@TRANSITION  ��TNQSFP100GCWDM4 
1AfX%F??�TN02000301  180919  
h�I��_��'��Ri=`��Zntech1@D7:~$ 
tech1@D7:~$ 
tech1@D7:~$ 
tech1@D7:~$ 
tech1@D7:~$ sudo ethtool -m enp1s0 hex on
Offset  Values
--  --
0x: 11 00 00 0f 00 00 00 00 00 55 55 00 00 00 00 00 
0x0010: 00 00 00 00 00 00 24 e2 00 00 81 68 00 00 00 00 
0x0020: 00 00 00 00 00 00 00 00 00 00 41 60 3f e0 40 e0 
0x0030: 47 00 1f 10 0e 1e 0b f7 12 76 00 00 00 00 00 00 
0x0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0x0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 
0x0060: 00 00 00 00 00 00 00 00 00 00 1f 00 00 00 00 00 
0x0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0x0080: 11 fc 07 80 00 00 00 00 00 00 00 03 ff 00 02 00 
0x0090: 00 00 00 40 54 52 41 4e 53 49 54 49 4f 4e 20 20 
0x00a0: 20 20 20 20 00 00 c0 f2 54 4e 51 53 46 50 31 30 
0x00b0: 30 47 43 57 44 4d 34 20 31 41 66 58 25 1c 46 3f 
0x00c0: 06 00 3f d6 54 4e 30 32 30 30 30 33 30 31 20 20 
0x00d0: 20 20 20 20 31 38 30 39 31 39 20 20 0c 00 68 f3 
0x00e0: 00 00 02 49 80 a0 5f 1f de c9 27 16 f8 ae 52 69 
0x00f0: 3d 02 60 00 00 00 00 00 00 00 00 00 83 f4 5a 6e 
tech1@D7:~$ 
tech1@D7:~$ 






Chris Preimesberger | Test & Validation Engineer
Transition Networks, Inc.

chr...@transition.com
direct: +1.952.996.1509 | fax: +1.952.941.2322 | www.transition.com


-Original Message-
From: Andrew Lunn [mailto:and...@lunn.ch] 
Sent: Wednesday, September 26, 2018 2:45 PM
To: Chris Preimesberger
Cc: linvi...@tuxdriver.com; netdev@vger.kernel.org
Subject: Re: bug: 'ethtool -m' reports spurious alarm & warning threshold 
values for QSFP28 transceivers

On Wed, Sep 26, 2018 at 07:29:23PM +, Chris Preimesberger wrote:
> Hello,
> 
> I'm re-sending in plain text per the auto-reply from a spam filter.

Yep. no html obfustication accepted here. Please ASCII only please :-)

Please can you also wrap your lines at about 75 characters.

>  I have attached some text files this time, which explain the situation 
> below, in case the below email's font & formatting is now too messed up for 
> easy comprehension.

> Bug #1.  Ethtool's reporting of the installed transceiver's alarm and warning 
> thresholds will differ, depending on whether or not ethtool is piped to 
> another command.  Example commands are below, with their respective differing 
> output values highlighted:

Could you dump the raw values. That will make it easier for us to
reproduce this issue, assuming it is ethtool, and not the kernel
driver.

Thanks
Andrew

UU$òhA`?à@àG
÷{üÿ@TRANSITION
  ÀòTNQSFP100GCWDM4 1AfX%F??ÖTN02000301  180919  hóI 
_ÞÉ'ø®Ri=`ôZn

RE: [PATCH] net/ncsi: Add NCSI OEM command for FB Tiogapass

2018-09-26 Thread Justin.Lee1

> >  As I understand Justin's version adds a generic handler, using the NCSI
> >  Netlink interface to pass OEM commands and responses to and from
> >  userspace, which does the actual packet handling.
> Thanks for the direction Sam! Justin, if you don't mind, can you share the 
> patches you have to add the support? This actually would solve a few other 
> things we are trying to accomplish.


Basically, I add a new flag to indicate the request is coming from the Netlink 
interface to allow the command handler and response handler to react.
#define NCSI_REQ_FLAG_NETLINK_DRIVEN2

The work flow is as below. 

Request:
User space application -> Netlink interface (msg) -> new Netlink handler - 
ncsi_send_cmd_nl() - ncsi_xmit_cmd()

Response:
Response received - ncsi_rcv_rsp() -> internal response handler - 
ncsi_rsp_handler_xxx() -> ncsi_send_netlink_rsp () -> Netlink interface (msg) 
-> user space application
Command timeout - ncsi_request_timeout() -> ncsi_send_netlink_timeout () -> 
Netlink interface (msg with zero data length) -> user space application

Error:
Detected error -> ncsi_send_netlink_err () -> Netlink interface (err msg) -> 
user space application

I will clean up some code and send out the patch. 

Thanks,
Justin

[PATCH] netlink: fix typo in nla_parse_nested() comment

2018-09-26 Thread Johannes Berg

From: Johannes Berg 

Fix a simple typo: attribuets -> attributes

Signed-off-by: Johannes Berg 
---
 include/net/netlink.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index ddabc832febc..69866f5a97f2 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -153,7 +153,7 @@
  *   nla_find()find attribute in stream of 
attributes
  *   nla_find_nested() find attribute in nested attributes
  *   nla_parse()   parse and validate stream of attrs
- *   nla_parse_nested()parse nested attribuets
+ *   nla_parse_nested()parse nested attributes
  *   nla_for_each_attr()   loop over all attributes
  *   nla_for_each_nested() loop over the nested attributes
  *=
-- 
2.14.4

[PATCH] r8169: Disable clk during suspend / resume

2018-09-26 Thread Hans de Goede

Disable the clk during suspend to save power. Note that tp->clk may be
NULL, the clk core functions handle this without problems.

Reviewed-by: Andy Shevchenko 
Tested-by: Carlo Caione 
Signed-off-by: Hans de Goede 
---
 drivers/net/ethernet/realtek/r8169.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index 474229548731..c289350fb98f 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6852,8 +6852,10 @@ static int rtl8169_suspend(struct device *device)
 {
struct pci_dev *pdev = to_pci_dev(device);
struct net_device *dev = pci_get_drvdata(pdev);
+   struct rtl8169_private *tp = netdev_priv(dev);
 
rtl8169_net_suspend(dev);
+   clk_disable_unprepare(tp->clk);
 
return 0;
 }
@@ -6881,6 +6883,9 @@ static int rtl8169_resume(struct device *device)
 {
struct pci_dev *pdev = to_pci_dev(device);
struct net_device *dev = pci_get_drvdata(pdev);
+   struct rtl8169_private *tp = netdev_priv(dev);
+
+   clk_prepare_enable(tp->clk);
 
if (netif_running(dev))
__rtl8169_resume(dev);
-- 
2.19.0

[PATCH net 1/1] qlcnic: fix Tx descriptor corruption on 82xx devices

2018-09-26 Thread Shahed Shaikh

In regular NIC transmission flow, driver always configures MAC using
Tx queue zero descriptor as a part of MAC learning flow.
But with multi Tx queue supported NIC, regular transmission can occur on
any non-zero Tx queue and from that context it uses
Tx queue zero descriptor to configure MAC, at the same time TX queue
zero could be used by another CPU for regular transmission
which could lead to Tx queue zero descriptor corruption and cause FW
abort.

This patch fixes this in such a way that driver always configures
learned MAC address from the same Tx queue which is used for
regular transmission.

Fixes: 7e2cf4feba05 ("qlcnic: change driver hardware interface mechanism")
Signed-off-by: Shahed Shaikh 
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic.h |  8 +---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c |  3 ++-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h |  3 ++-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.h  |  3 ++-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c  | 12 ++--
 5 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h 
b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
index 81312924df14..0c443ea98479 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
@@ -1800,7 +1800,8 @@ struct qlcnic_hardware_ops {
int (*config_loopback) (struct qlcnic_adapter *, u8);
int (*clear_loopback) (struct qlcnic_adapter *, u8);
int (*config_promisc_mode) (struct qlcnic_adapter *, u32);
-   void (*change_l2_filter) (struct qlcnic_adapter *, u64 *, u16);
+   void (*change_l2_filter)(struct qlcnic_adapter *adapter, u64 *addr,
+u16 vlan, struct qlcnic_host_tx_ring *tx_ring);
int (*get_board_info) (struct qlcnic_adapter *);
void (*set_mac_filter_count) (struct qlcnic_adapter *);
void (*free_mac_list) (struct qlcnic_adapter *);
@@ -2064,9 +2065,10 @@ static inline int qlcnic_nic_set_promisc(struct 
qlcnic_adapter *adapter,
 }
 
 static inline void qlcnic_change_filter(struct qlcnic_adapter *adapter,
-   u64 *addr, u16 id)
+   u64 *addr, u16 vlan,
+   struct qlcnic_host_tx_ring *tx_ring)
 {
-   adapter->ahw->hw_ops->change_l2_filter(adapter, addr, id);
+   adapter->ahw->hw_ops->change_l2_filter(adapter, addr, vlan, tx_ring);
 }
 
 static inline int qlcnic_get_board_info(struct qlcnic_adapter *adapter)
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c 
b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
index 569d54ededec..a79d84f99102 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
@@ -2135,7 +2135,8 @@ int qlcnic_83xx_sre_macaddr_change(struct qlcnic_adapter 
*adapter, u8 *addr,
 }
 
 void qlcnic_83xx_change_l2_filter(struct qlcnic_adapter *adapter, u64 *addr,
- u16 vlan_id)
+ u16 vlan_id,
+ struct qlcnic_host_tx_ring *tx_ring)
 {
u8 mac[ETH_ALEN];
memcpy(, addr, ETH_ALEN);
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h 
b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h
index b75a81246856..73fe2f64491d 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h
@@ -550,7 +550,8 @@ int qlcnic_83xx_wrt_reg_indirect(struct qlcnic_adapter *, 
ulong, u32);
 int qlcnic_83xx_nic_set_promisc(struct qlcnic_adapter *, u32);
 int qlcnic_83xx_config_hw_lro(struct qlcnic_adapter *, int);
 int qlcnic_83xx_config_rss(struct qlcnic_adapter *, int);
-void qlcnic_83xx_change_l2_filter(struct qlcnic_adapter *, u64 *, u16);
+void qlcnic_83xx_change_l2_filter(struct qlcnic_adapter *adapter, u64 *addr,
+ u16 vlan, struct qlcnic_host_tx_ring *ring);
 int qlcnic_83xx_get_pci_info(struct qlcnic_adapter *, struct qlcnic_pci_info 
*);
 int qlcnic_83xx_set_nic_info(struct qlcnic_adapter *, struct qlcnic_info *);
 void qlcnic_83xx_initialize_nic(struct qlcnic_adapter *, int);
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.h 
b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.h
index 4bb33af8e2b3..56a3bd9e37dc 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.h
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.h
@@ -173,7 +173,8 @@ int qlcnic_82xx_napi_add(struct qlcnic_adapter *adapter,
 struct net_device *netdev);
 void qlcnic_82xx_get_beacon_state(struct qlcnic_adapter *);
 void qlcnic_82xx_change_filter(struct qlcnic_adapter *adapter,
-  u64 *uaddr, u16 vlan_id);
+  u64 *uaddr, u16 vlan_id,
+  struct qlcnic_host_tx_ring *tx_ring);
 int

Re: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers

2018-09-26 Thread Andrew Lunn

On Wed, Sep 26, 2018 at 07:29:23PM +, Chris Preimesberger wrote:
> Hello,
> 
> I'm re-sending in plain text per the auto-reply from a spam filter.

Yep. no html obfustication accepted here. Please ASCII only please :-)

Please can you also wrap your lines at about 75 characters.

>  I have attached some text files this time, which explain the situation 
> below, in case the below email's font & formatting is now too messed up for 
> easy comprehension.

> Bug #1.  Ethtool's reporting of the installed transceiver's alarm and warning 
> thresholds will differ, depending on whether or not ethtool is piped to 
> another command.  Example commands are below, with their respective differing 
> output values highlighted:

Could you dump the raw values. That will make it easier for us to
reproduce this issue, assuming it is ethtool, and not the kernel
driver.

Thanks
Andrew

Re: [PATCH] Documentation: Add HOWTO Korean translation into BPF and XDP Reference Guide.

2018-09-26 Thread Jonathan Corbet

On Wed, 26 Sep 2018 18:11:42 +0900
Chang-an Song  wrote:

> >  - The original document has a copyright assertion but no associated
> >license.  Do we know what the license is?  I assume it's something
> >that is free and GPL-compatible, but that would be good to know for
> >sure.  
> 
> 3. I asked to main author Daniel that apache 2.0 license for this document.

That is a bit of a problem, since Apache v2 is not compatible with GPLv2.
If the license of the document cannot be changed, I don't think we can
accept it into the kernel tree.

Thanks,

jon

RE: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers

2018-09-26 Thread Chris Preimesberger

Hello,

I'm re-sending in plain text per the auto-reply from a spam filter.  I have 
attached some text files this time, which explain the situation below, in case 
the below email's font & formatting is now too messed up for easy comprehension.

Thank you and best regards.


Chris Preimesberger | Test & Validation Engineer
Transition Networks, Inc.

chr...@transition.com
direct: +1.952.996.1509 | fax: +1.952.941.2322 | www.transition.com






From: Chris Preimesberger 
Sent: Wednesday, September 26, 2018 2:14 PM
To: 'linvi...@tuxdriver.com'; 'netdev@vger.kernel.org'
Subject: bug: 'ethtool -m' reports spurious alarm & warning threshold values 
for QSFP28 transceivers

Hello John, All,


I think I may have found a bug or two in ethtool, with respect to its reporting 
of a QSFP28 transceiver's diagnostic information.  Ethtool seems to correctly 
report all diagnostic information about QSFP28 transceivers, except for the 
transceiver's warning and alarm thresholds.  I'm not sure whether the spurious 
warning and alarm values that get reported are the fault of ethtool or my 
NIC/driver, and I have no other models of 100GbE NICs to test with.  I've 
contacted Mellanox support about this, and they point the finger at ethtool.  
Can these issues be investigated by ethtool developers?  Here is some 
background information about the equipment and software used when I observe 
these issues:

Equipment used:
NIC: Mellanox ConnectX-4 100GbE, part number MCX415A-CCAT
Transceiver: Any 40Gb or 100Gb QSFP28 transceiver installed in the NIC (Intel, 
Mellanox, Transition Networks, etc..)

Software used:
Ubuntu 18.04 with the distro's packaged NIC driver and ethtool v4.15
also tested were ethtool v4.18 compiled from source and the current Mellanox 
OFED driver.

All test scenarios produced the same bugs.


Bug #1.  Ethtool's reporting of the installed transceiver's alarm and warning 
thresholds will differ, depending on whether or not ethtool is piped to another 
command.  Example commands are below, with their respective differing output 
values highlighted:


tech1@D8:~$ sudo ethtool -m enp1s0
    Identifier    : 0x11 (QSFP28)
    Extended identifier   : 0xfc
    Extended identifier description   : 3.5W max. Power consumption
    Extended identifier description   : CDR present in TX, CDR 
present in RX
    Extended identifier description   : High Power Class (> 3.5 W) 
not enabled
    Connector     : 0x07 (LC)
    Transceiver codes : 0x80 0x00 0x00 0x00 0x00 
0x00 0x00 0x00
    Transceiver type  : 100G Ethernet: 100G CWDM4 
MSA with FEC
    Encoding  : 0x03 (NRZ)
    BR, Nominal   : 25500Mbps
    Rate identifier   : 0x00
    Length (SMF,km)   : 2km
    Length (OM3 50um) : 0m
    Length (OM2 50um) : 0m
    Length (OM1 62.5um)   : 0m
    Length (Copper or Active cable)   : 0m
    Transmitter technology    : 0x40 (1310 nm DFB)
    Laser wavelength  : 1310.000nm
    Laser wavelength tolerance    : 47.500nm
    Vendor name   : TRANSITION
    Vendor OUI    : 00:c0:f2
    Vendor PN : TNQSFP100GCWDM4
    Vendor rev    : 1A
    Vendor SN : TN02000302
    Date code : 180919
    Revision Compliance   : SFF-8636 Rev 2.5/2.6/2.7
    Module temperature    : 39.53 degrees C / 103.15 
degrees F
    Module voltage    : 3.3241 V
    Alarm/warning flags implemented   : Yes
    Laser tx bias current (Channel 1) : 34.432 mA
    Laser tx bias current (Channel 2) : 34.432 mA
    Laser tx bias current (Channel 3) : 33.408 mA
    Laser tx bias current (Channel 4) : 33.920 mA
    Transmit avg optical power (Channel 1)    : 0.9048 mW / -0.43 dBm
    Transmit avg optical power (Channel 2)    : 0.7832 mW / -1.06 dBm
    Transmit avg optical power (Channel 3)    : 0.8057 mW / -0.94 dBm
    Transmit avg optical power (Channel 4)    : 0.7014 mW / -1.54 dBm
    Rcvr signal avg optical power(Channel 1)  : 0.7378 mW / -1.32 dBm
    Rcvr signal avg optical power(Channel 2)  : 0.7553 mW / -1.22 dBm
    Rcvr signal avg optical power(Channel 3)  : 0.6529 mW / -1.85 dBm
    Rcvr signal avg optical power(Channel 4)  : 0.6847 mW / -1.64 dBm
    Laser bias current high alarm   (Chan 1)  : Off

Re: [PATCH RFC,net-next 08/10] flow_dissector: add wake-up-on-lan and queue to flow_action

2018-09-26 Thread Florian Fainelli

On 09/25/2018 12:19 PM, Pablo Neira Ayuso wrote:
> These actions need to be added to support bcm sf2 features available
> through the ethtool_rx_flow interface.
> 
> Signed-off-by: Pablo Neira Ayuso 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH RFC,net-next 00/10] add flow_rule infrastructure

2018-09-26 Thread Florian Fainelli

On 09/26/2018 08:51 AM, Jakub Kicinski wrote:
> On Tue, 25 Sep 2018 21:19:51 +0200, Pablo Neira Ayuso wrote:
>> This patchset is adding a new layer between drivers and the existing
>> software frontends, so it's a bit more code, but it is core
>> infrastructure common to everyone and this comes with benefits for
>> driver developers:
> 
> Thanks a lot for doing this!!

Yeah, this is looks great so far, thanks a lot for tackling this.
-- 
Florian

Re: [PATCH RFC,net-next 10/10] dsa: bcm_sf2: use flow_rule infrastructure

2018-09-26 Thread Florian Fainelli

Hi Pablo,

On 09/25/2018 12:20 PM, Pablo Neira Ayuso wrote:
> Update this driver to use the flow_rule infrastructure, hence the same
> code to populate hardware IR can be used from ethtool_rx_flow and the
> cls_flower interfaces.

Thanks for doing the conversion, I believe we could change things a
little bit such that there are fewer things to audit for correctness,
see below:

> 
> Signed-off-by: Pablo Neira Ayuso 
> ---
>  drivers/net/dsa/bcm_sf2_cfp.c | 311 
> ++
>  1 file changed, 166 insertions(+), 145 deletions(-)
> 
> diff --git a/drivers/net/dsa/bcm_sf2_cfp.c b/drivers/net/dsa/bcm_sf2_cfp.c
> index 47c5f272a084..9dace0e25a3a 100644
> --- a/drivers/net/dsa/bcm_sf2_cfp.c
> +++ b/drivers/net/dsa/bcm_sf2_cfp.c
> @@ -251,10 +251,12 @@ static int bcm_sf2_cfp_act_pol_set(struct bcm_sf2_priv 
> *priv,
>  }
>  
>  static void bcm_sf2_cfp_slice_ipv4(struct bcm_sf2_priv *priv,
> -struct ethtool_tcpip4_spec *v4_spec,
> +struct flow_rule *flow_rule,
>  unsigned int slice_num,
>  bool mask)
>  {
> + struct flow_match_ipv4_addrs ipv4;
> + struct flow_match_ports ports;
>   u32 reg, offset;
>  
>   /* C-Tag[31:24]
> @@ -268,41 +270,54 @@ static void bcm_sf2_cfp_slice_ipv4(struct bcm_sf2_priv 
> *priv,
>   offset = CORE_CFP_DATA_PORT(4);
>   core_writel(priv, reg, offset);
>  
> + flow_rule_match_ipv4_addrs(flow_rule, );
> + flow_rule_match_ports(flow_rule, );
> +
>   /* UDF_n_A7 [31:24]
>* UDF_n_A6 [23:8]
>* UDF_n_A5 [7:0]
>*/
> - reg = be16_to_cpu(v4_spec->pdst) >> 8;
> - if (mask)
> + if (mask) {
> + reg = be16_to_cpu(ports.mask->dst) >> 8;
>   offset = CORE_CFP_MASK_PORT(3);
> - else
> + } else {
> + reg = be16_to_cpu(ports.key->dst) >> 8;
>   offset = CORE_CFP_DATA_PORT(3);
> + }

For instance here, instead of having to assign "reg" differently, just
have an intermediate struct flow_match_ports variable that either points
to  or 

I quickly glanced through the changes, and I suspect that they are
correct, but there is unfortunately way too much code movement within
the functions which greatly hinders the ability to have complete
confidence in the changes :)

Can you find a way to only perform the ethtool_rx_flow_spec to flow_rule
conversion as close as possible from the point of use within the callee?

Thanks!
-- 
Florian

[PATCH] [PATCH net-next] openvswitch: Use correct reply values in datapath and vport ops

2018-09-26 Thread Yifeng Sun

This patch fixes the bug that all datapath and vport ops are returning
wrong values (OVS_FLOW_CMD_NEW or OVS_DP_CMD_NEW) in their replies.

Signed-off-by: Yifeng Sun 
---
 net/openvswitch/datapath.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 0f5ce77..6679e96 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1182,14 +1182,14 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct 
genl_info *info)
   ovs_header->dp_ifindex,
   reply, info->snd_portid,
   info->snd_seq, 0,
-  OVS_FLOW_CMD_NEW,
+  OVS_FLOW_CMD_SET,
   ufid_flags);
BUG_ON(error < 0);
}
} else {
/* Could not alloc without acts before locking. */
reply = ovs_flow_cmd_build_info(flow, ovs_header->dp_ifindex,
-   info, OVS_FLOW_CMD_NEW, false,
+   info, OVS_FLOW_CMD_SET, false,
ufid_flags);
 
if (IS_ERR(reply)) {
@@ -1265,7 +1265,7 @@ static int ovs_flow_cmd_get(struct sk_buff *skb, struct 
genl_info *info)
}
 
reply = ovs_flow_cmd_build_info(flow, ovs_header->dp_ifindex, info,
-   OVS_FLOW_CMD_NEW, true, ufid_flags);
+   OVS_FLOW_CMD_GET, true, ufid_flags);
if (IS_ERR(reply)) {
err = PTR_ERR(reply);
goto unlock;
@@ -1389,7 +1389,7 @@ static int ovs_flow_cmd_dump(struct sk_buff *skb, struct 
netlink_callback *cb)
if (ovs_flow_cmd_fill_info(flow, ovs_header->dp_ifindex, skb,
   NETLINK_CB(cb->skb).portid,
   cb->nlh->nlmsg_seq, NLM_F_MULTI,
-  OVS_FLOW_CMD_NEW, ufid_flags) < 0)
+  OVS_FLOW_CMD_GET, ufid_flags) < 0)
break;
 
cb->args[0] = bucket;
@@ -1730,7 +1730,7 @@ static int ovs_dp_cmd_set(struct sk_buff *skb, struct 
genl_info *info)
ovs_dp_change(dp, info->attrs);
 
err = ovs_dp_cmd_fill_info(dp, reply, info->snd_portid,
-  info->snd_seq, 0, OVS_DP_CMD_NEW);
+  info->snd_seq, 0, OVS_DP_CMD_SET);
BUG_ON(err < 0);
 
ovs_unlock();
@@ -1761,7 +1761,7 @@ static int ovs_dp_cmd_get(struct sk_buff *skb, struct 
genl_info *info)
goto err_unlock_free;
}
err = ovs_dp_cmd_fill_info(dp, reply, info->snd_portid,
-  info->snd_seq, 0, OVS_DP_CMD_NEW);
+  info->snd_seq, 0, OVS_DP_CMD_GET);
BUG_ON(err < 0);
ovs_unlock();
 
@@ -1785,7 +1785,7 @@ static int ovs_dp_cmd_dump(struct sk_buff *skb, struct 
netlink_callback *cb)
if (i >= skip &&
ovs_dp_cmd_fill_info(dp, skb, NETLINK_CB(cb->skb).portid,
 cb->nlh->nlmsg_seq, NLM_F_MULTI,
-OVS_DP_CMD_NEW) < 0)
+OVS_DP_CMD_GET) < 0)
break;
i++;
}
@@ -2101,7 +2101,7 @@ static int ovs_vport_cmd_set(struct sk_buff *skb, struct 
genl_info *info)
 
err = ovs_vport_cmd_fill_info(vport, reply, genl_info_net(info),
  info->snd_portid, info->snd_seq, 0,
- OVS_VPORT_CMD_NEW);
+ OVS_VPORT_CMD_SET);
BUG_ON(err < 0);
 
ovs_unlock();
@@ -2182,7 +2182,7 @@ static int ovs_vport_cmd_get(struct sk_buff *skb, struct 
genl_info *info)
goto exit_unlock_free;
err = ovs_vport_cmd_fill_info(vport, reply, genl_info_net(info),
  info->snd_portid, info->snd_seq, 0,
- OVS_VPORT_CMD_NEW);
+ OVS_VPORT_CMD_GET);
BUG_ON(err < 0);
rcu_read_unlock();
 
@@ -2218,7 +2218,7 @@ static int ovs_vport_cmd_dump(struct sk_buff *skb, struct 
netlink_callback *cb)
NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq,
NLM_F_MULTI,
-   OVS_VPORT_CMD_NEW) < 0)
+

Re: [PATCH net 0/2] s390/qeth: fixes 2019-09-26

2018-09-26 Thread David Miller

From: Julian Wiedmann 
Date: Wed, 26 Sep 2018 18:07:08 +0200

> please apply two qeth patches for -net. The first is a trivial cleanup
> required for patch #2 by Jean, which fixes a potential endless loop.

Series applied, thank you.

Re: [PATCH net-next] net: aquantia: Make function aq_fw1x_set_power() static

2018-09-26 Thread David Miller

From: Wei Yongjun 
Date: Wed, 26 Sep 2018 12:20:00 +

> Fixes the following sparse warning:
> 
> drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c:873:5: warning:
>  symbol 'aq_fw1x_set_power' was not declared. Should it be static?
> 
> Fixes: a0da96c08cfa ("net: aquantia: implement WOL support")
> Signed-off-by: Wei Yongjun 

Applied.

Re: [PATCH net-next] net/core: make function ___gnet_stats_copy_basic() static

2018-09-26 Thread David Miller

From: Wei Yongjun 
Date: Wed, 26 Sep 2018 12:09:45 +

> Fixes the following sparse warning:
> 
> net/core/gen_stats.c:166:1: warning:
>  symbol '___gnet_stats_copy_basic' was not declared. Should it be static?
> 
> Fixes: 5e111210a443 ("net/core: Add new basic hardware counter")
> Signed-off-by: Wei Yongjun 

Applied.

Re: [PATCH net-next] net/tls: Make function get_rec() static

2018-09-26 Thread David Miller

From: Wei Yongjun 
Date: Wed, 26 Sep 2018 12:10:48 +

> Fixes the following sparse warning:
> 
> net/tls/tls_sw.c:655:16: warning:
>  symbol 'get_rec' was not declared. Should it be static?
> 
> Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records 
> for performance")
> Signed-off-by: Wei Yongjun 

Applied.

Re: [PATCH net-next v3 1/2] netlink: ipv4 igmp join notifications

2018-09-26 Thread Roopa Prabhu

On Tue, Sep 25, 2018 at 2:34 AM, Patrick Ruddy
 wrote:
> On Wed, 2018-09-19 at 21:47 -0700, David Ahern wrote:
>> On 9/18/18 6:12 AM, Patrick Ruddy wrote:
>> >
>> > I've hit a small snag with adding the new groups. The number of defined
>> > groups currently sits at 31 so I can only add one before hitting the
>>
>> I believe you have no more available. RTNLGRP_* has been defined from 0
>> (RTNLGRP_NONE) to 31 (RTNLGRP_IPV6_MROUTE_R) which covers the u32 range.
>>
>> > limit defined by the 32 bit groups bitmask in socakddr_nl. I can use 1
>> > group for both v4 and v6 notifications which seems like the sensible
>> > options since the AF is carried separately, but it breaks the precedent
>> > where there are separate IPV4 and IPV6 groups for IFADDR.
>> >
>> > I have the combined group patches ready and can share them if that's
>> > the preference.
>> >
>> > Has there been any previous discussion about extending the number of
>> > availabel groups?
>> >
>>
>> I have not tried it, but from a prior code review I believe you have you
>> use setsockopt to add groups > 31.
>
> I can certainly join the new groups using setsockopt and
> NETLINK_ADD_MEMBERSHIP.
> I can't see any examples of extending the defined group list within the
> kernel so I assume I just add to the RTNLGRP enum list with a suitable
> comment to indicate that later groups must be joined with the mechanism
> above or am I missing some other way of dynamically adding groups?
>

With a quick look, there are other subsystem specific groups:
xfrm_nlgroups,  nfnetlink_groups ...which i see apps registering using
NETLINK_ADD_MEMBERSHIP.

seems like an overkill to add something like this for your case.

yet another option to consider:
use family: RTNL_FAMILY_IPMR/  RTNL_FAMILY_IP6MR  with RTM_GETADDR/DELADDR
and use the existing groups: RTNLGRP_IPV4_IFADDR /  RTNLGRP_IPV6_IFADDR

(pls check if this will break any existing users)

precedence is ipmr fib rules.

Re: [PATCH v3 0/5] netlink: nested policy validation

2018-09-26 Thread David Miller

From: Johannes Berg 
Date: Wed, 26 Sep 2018 11:15:29 +0200

> This adds nested policy validation, which lets you specify the
> nested attribute type, e.g. NLA_NESTED with sub-policy, or the
> new NLA_NESTED_ARRAY with sub-sub-policy.
> 
> 
> Changes in v2:
>  * move setting the bad attr pointer/message into validate_nla()
>  * remove the recursion patch since that's no longer needed
>  * simply skip the generic bad attr pointer/message setting in
>case of nested nla_validate() failing since that could fail
>only due to validate_nla() failing inside, which already sets
>the extack information
> 
> Changes in v3:
>  * fix NLA_REJECT to have an error message if none is in policy

Looks great Johannes, series applied.

Thanks!

Re: [PATCH] net/ncsi: Add NCSI OEM command for FB Tiogapass

2018-09-26 Thread Vijay Khemka

>Hi Vijay,

 >  Thanks for the patch; before I get too into a review though I'd like to
 >  loop in Justin (cc'd) who I know is also working on an OEM command patch.
 >  The changes here are very specific (eg. a command specific config option
 >  "CONFIG_NCSI_OEM_CMD_GET_MAC"), which is ok on a small scale but if we
 >  start to add an increasing amount of commands could get out of hand.
 >  As I understand Justin's version adds a generic handler, using the NCSI
 >  Netlink interface to pass OEM commands and responses to and from
 >  userspace, which does the actual packet handling.
 >  It would be good to compare these two approaches first before committing
 >  to any one path

Hi Sam,
My oem command handler is generic and can be used for any oem commands and oem 
response handler can be made more generic. We can certainly write a wrapper to 
support netlink oem command to receive form user space. There are Mellanox 
specific functions which sends Mellanox specific request. These needed as a 
part of initial configuration. We can remove Kconfig option with more generic 
approach.

-Vijay

Re: [PATCH net-next 00/15] s390/net: updates 2018-09-26

2018-09-26 Thread David Miller

From: Julian Wiedmann 
Date: Wed, 26 Sep 2018 18:29:01 +0200

> please apply one more series of cleanups and small improvements for qeth
> to net-next. Note that one patch needs to touch both af_iucv and qeth, in
> order to untangle their receive paths.

Series applied, thank you.

[PATCH net-next 10/15] s390/qeth: remove CARD_FROM_CDEV helper

2018-09-26 Thread Julian Wiedmann

The cdev-to-card translation walks through two layers of drvdata,
with no locking or refcounting (where eg. the ccwgroup core only
accesses a cdev's drvdata while holding the ccwlock).

This might be safe for now, but any careless usage of the helper has the
potential for subtle races and use-after-free's. Luckily there's only
one occurrence where we _really_ need it (in qeth_irq()), for any other
user we can just pass through an appropriate card pointer.

Signed-off-by: Julian Wiedmann 
---
 drivers/s390/net/qeth_core_main.c | 73 ---
 1 file changed, 29 insertions(+), 44 deletions(-)

diff --git a/drivers/s390/net/qeth_core_main.c 
b/drivers/s390/net/qeth_core_main.c
index 86b9cce1f483..caa5d109841c 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -746,18 +746,10 @@ static int qeth_check_idx_response(struct qeth_card *card,
return 0;
 }
 
-static struct qeth_card *CARD_FROM_CDEV(struct ccw_device *cdev)
-{
-   struct qeth_card *card = dev_get_drvdata(&((struct ccwgroup_device *)
-   dev_get_drvdata(>dev))->dev);
-   return card;
-}
-
 static struct qeth_cmd_buffer *__qeth_get_buffer(struct qeth_channel *channel)
 {
__u8 index;
 
-   QETH_CARD_TEXT(CARD_FROM_CDEV(channel->ccwdev), 6, "getbuff");
index = channel->io_buf_no;
do {
if (channel->iob[index].state == BUF_STATE_FREE) {
@@ -778,7 +770,6 @@ void qeth_release_buffer(struct qeth_channel *channel,
 {
unsigned long flags;
 
-   QETH_CARD_TEXT(CARD_FROM_CDEV(channel->ccwdev), 6, "relbuff");
spin_lock_irqsave(>iob_lock, flags);
iob->state = BUF_STATE_FREE;
iob->callback = qeth_send_control_data_cb;
@@ -980,16 +971,15 @@ void qeth_schedule_recovery(struct qeth_card *card)
 }
 EXPORT_SYMBOL_GPL(qeth_schedule_recovery);
 
-static int qeth_get_problem(struct ccw_device *cdev, struct irb *irb)
+static int qeth_get_problem(struct qeth_card *card, struct ccw_device *cdev,
+   struct irb *irb)
 {
int dstat, cstat;
char *sense;
-   struct qeth_card *card;
 
sense = (char *) irb->ecw;
cstat = irb->scsw.cmd.cstat;
dstat = irb->scsw.cmd.dstat;
-   card = CARD_FROM_CDEV(cdev);
 
if (cstat & (SCHN_STAT_CHN_CTRL_CHK | SCHN_STAT_INTF_CTRL_CHK |
 SCHN_STAT_CHN_DATA_CHK | SCHN_STAT_CHAIN_CHECK |
@@ -1029,14 +1019,11 @@ static int qeth_get_problem(struct ccw_device *cdev, 
struct irb *irb)
return 0;
 }
 
-static long __qeth_check_irb_error(struct ccw_device *cdev,
-   unsigned long intparm, struct irb *irb)
+static long qeth_check_irb_error(struct qeth_card *card,
+struct ccw_device *cdev, unsigned long intparm,
+struct irb *irb)
 {
-   struct qeth_card *card;
-
-   card = CARD_FROM_CDEV(cdev);
-
-   if (!card || !IS_ERR(irb))
+   if (!IS_ERR(irb))
return 0;
 
switch (PTR_ERR(irb)) {
@@ -1073,10 +1060,13 @@ static void qeth_irq(struct ccw_device *cdev, unsigned 
long intparm,
int rc;
int cstat, dstat;
struct qeth_cmd_buffer *iob = NULL;
+   struct ccwgroup_device *gdev;
struct qeth_channel *channel;
struct qeth_card *card;
 
-   card = CARD_FROM_CDEV(cdev);
+   /* while we hold the ccwdev lock, this stays valid: */
+   gdev = dev_get_drvdata(>dev);
+   card = dev_get_drvdata(>dev);
if (!card)
return;
 
@@ -1096,7 +1086,7 @@ static void qeth_irq(struct ccw_device *cdev, unsigned 
long intparm,
if (qeth_intparm_is_iob(intparm))
iob = (struct qeth_cmd_buffer *) __va((addr_t)intparm);
 
-   if (__qeth_check_irb_error(cdev, intparm, irb)) {
+   if (qeth_check_irb_error(card, cdev, intparm, irb)) {
/* IO was terminated, free its resources. */
if (iob)
qeth_release_buffer(iob->channel, iob);
@@ -1151,7 +1141,7 @@ static void qeth_irq(struct ccw_device *cdev, unsigned 
long intparm,
channel->state = CH_STATE_DOWN;
goto out;
}
-   rc = qeth_get_problem(cdev, irb);
+   rc = qeth_get_problem(card, cdev, irb);
if (rc) {
card->read_or_write_problem = 1;
qeth_clear_ipacmd_list(card);
@@ -1514,12 +1504,11 @@ static struct qeth_card *qeth_alloc_card(struct 
ccwgroup_device *gdev)
return NULL;
 }
 
-static int qeth_clear_channel(struct qeth_channel *channel)
+static int qeth_clear_channel(struct qeth_card *card,
+ struct qeth_channel *channel)
 {
-   struct qeth_card *card;
int rc;
 
-   card = CARD_FROM_CDEV(channel->ccwdev);
QETH_CARD_TEXT(card, 3, "clearch");

[PATCH net-next 14/15] s390/qeth: clean up drop conditions for received cmds

2018-09-26 Thread Julian Wiedmann

If qeth_check_ipa_data() consumed an event, there's no point in
processing it further. So drop it early, and make the surrounding code
a tiny bit more readable.

Signed-off-by: Julian Wiedmann 
---
 drivers/s390/net/qeth_core_main.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/s390/net/qeth_core_main.c 
b/drivers/s390/net/qeth_core_main.c
index 954dc6c688e8..9cbdc6760aba 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -826,16 +826,17 @@ static void qeth_send_control_data_cb(struct qeth_card 
*card,
if (IS_IPA(iob->data)) {
cmd = (struct qeth_ipa_cmd *) PDU_ENCAPSULATION(iob->data);
cmd = qeth_check_ipa_data(card, cmd);
-   }
-   if ((cmd == NULL) && (card->state != CARD_STATE_DOWN))
-   goto out;
-   /*in case of OSN : check if cmd is set */
-   if (card->info.type == QETH_CARD_TYPE_OSN &&
-   cmd &&
-   cmd->hdr.command != IPA_CMD_STARTLAN &&
-   card->osn_info.assist_cb != NULL) {
-   card->osn_info.assist_cb(card->dev, cmd);
-   goto out;
+   if (!cmd)
+   goto out;
+   if (IS_OSN(card) && card->osn_info.assist_cb &&
+   cmd->hdr.command != IPA_CMD_STARTLAN) {
+   card->osn_info.assist_cb(card->dev, cmd);
+   goto out;
+   }
+   } else {
+   /* non-IPA commands should only flow during initialization */
+   if (card->state != CARD_STATE_DOWN)
+   goto out;
}
 
spin_lock_irqsave(>lock, flags);
-- 
2.16.4

[PATCH net-next 11/15] s390/qeth: remove various redundant code

2018-09-26 Thread Julian Wiedmann

1. tracing iob->rc makes no sense when it hasn't been modified by the
   callback,
2. the qeth_dbf_list is declared with LIST_HEAD, which also initializes
   the list,
3. the ccwgroup core only calls the thaw/restore callbacks if the gdev
   is online, so we don't have to check for it again.

Signed-off-by: Julian Wiedmann 
---
 drivers/s390/net/qeth_core_main.c | 7 ---
 drivers/s390/net/qeth_l2_main.c   | 5 +
 drivers/s390/net/qeth_l3_main.c   | 5 +
 3 files changed, 2 insertions(+), 15 deletions(-)

diff --git a/drivers/s390/net/qeth_core_main.c 
b/drivers/s390/net/qeth_core_main.c
index caa5d109841c..5a2b90677235 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -2167,7 +2167,6 @@ static int qeth_cm_enable_cb(struct qeth_card *card, 
struct qeth_reply *reply,
memcpy(>token.cm_filter_r,
   QETH_CM_ENABLE_RESP_FILTER_TOKEN(iob->data),
   QETH_MPC_TOKEN_LENGTH);
-   QETH_DBF_TEXT_(SETUP, 2, "  rc%d", iob->rc);
return 0;
 }
 
@@ -2193,7 +2192,6 @@ static int qeth_cm_enable(struct qeth_card *card)
 static int qeth_cm_setup_cb(struct qeth_card *card, struct qeth_reply *reply,
unsigned long data)
 {
-
struct qeth_cmd_buffer *iob;
 
QETH_DBF_TEXT(SETUP, 2, "cmsetpcb");
@@ -2202,7 +2200,6 @@ static int qeth_cm_setup_cb(struct qeth_card *card, 
struct qeth_reply *reply,
memcpy(>token.cm_connection_r,
   QETH_CM_SETUP_RESP_DEST_ADDR(iob->data),
   QETH_MPC_TOKEN_LENGTH);
-   QETH_DBF_TEXT_(SETUP, 2, "  rc%d", iob->rc);
return 0;
 }
 
@@ -2224,7 +2221,6 @@ static int qeth_cm_setup(struct qeth_card *card)
rc = qeth_send_control_data(card, CM_SETUP_SIZE, iob,
qeth_cm_setup_cb, NULL);
return rc;
-
 }
 
 static int qeth_update_max_mtu(struct qeth_card *card, unsigned int max_mtu)
@@ -2284,7 +2280,6 @@ static int qeth_get_mtu_outof_framesize(int framesize)
 static int qeth_ulp_enable_cb(struct qeth_card *card, struct qeth_reply *reply,
unsigned long data)
 {
-
__u16 mtu, framesize;
__u16 len;
__u8 link_type;
@@ -2312,7 +2307,6 @@ static int qeth_ulp_enable_cb(struct qeth_card *card, 
struct qeth_reply *reply,
} else
card->info.link_type = 0;
QETH_DBF_TEXT_(SETUP, 2, "link%d", card->info.link_type);
-   QETH_DBF_TEXT_(SETUP, 2, "  rc%d", iob->rc);
return 0;
 }
 
@@ -6584,7 +6578,6 @@ static int __init qeth_core_init(void)
 
pr_info("loading core functions\n");
INIT_LIST_HEAD(_core_card_list.list);
-   INIT_LIST_HEAD(_dbf_list);
rwlock_init(_core_card_list.rwlock);
 
qeth_wq = create_singlethread_workqueue("qeth_wq");
diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c
index 02566af7e23d..ed475b4ddd3e 100644
--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -1146,9 +1146,6 @@ static int qeth_l2_pm_resume(struct ccwgroup_device *gdev)
struct qeth_card *card = dev_get_drvdata(>dev);
int rc = 0;
 
-   if (gdev->state == CCWGROUP_OFFLINE)
-   goto out;
-
if (card->state == CARD_STATE_RECOVER) {
rc = __qeth_l2_set_online(card->gdev, 1);
if (rc) {
@@ -1158,7 +1155,7 @@ static int qeth_l2_pm_resume(struct ccwgroup_device *gdev)
}
} else
rc = __qeth_l2_set_online(card->gdev, 0);
-out:
+
qeth_set_allowed_threads(card, 0x, 0);
netif_device_attach(card->dev);
if (rc)
diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c
index 7148ef71ac78..86c89338b1a2 100644
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -2714,9 +2714,6 @@ static int qeth_l3_pm_resume(struct ccwgroup_device *gdev)
struct qeth_card *card = dev_get_drvdata(>dev);
int rc = 0;
 
-   if (gdev->state == CCWGROUP_OFFLINE)
-   goto out;
-
if (card->state == CARD_STATE_RECOVER) {
rc = __qeth_l3_set_online(card->gdev, 1);
if (rc) {
@@ -2726,7 +2723,7 @@ static int qeth_l3_pm_resume(struct ccwgroup_device *gdev)
}
} else
rc = __qeth_l3_set_online(card->gdev, 0);
-out:
+
qeth_set_allowed_threads(card, 0x, 0);
netif_device_attach(card->dev);
if (rc)
-- 
2.16.4

[PATCH net-next 15/15] s390/qeth: remove duplicated carrier state tracking

2018-09-26 Thread Julian Wiedmann

The netdevice is always available, apply any carrier state changes to it
without caching them.
On a STARTLAN event (ie. carrier-up), defer updating the state to
qeth_core_hardsetup_card() in the subsequent recovery action.

Also remove the carrier-state checks from the xmit routines. Stopping
transmission on carrier-down is the responsibility of upper-level code
(eg see dev_direct_xmit()).

Signed-off-by: Julian Wiedmann 
---
 drivers/s390/net/qeth_core.h  |  1 -
 drivers/s390/net/qeth_core_main.c | 12 +---
 drivers/s390/net/qeth_core_sys.c  |  7 +++
 drivers/s390/net/qeth_l2_main.c   |  6 +-
 drivers/s390/net/qeth_l3_main.c   |  6 +-
 5 files changed, 10 insertions(+), 22 deletions(-)

diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h
index 64bcb1237db0..cd44ff2df6fe 100644
--- a/drivers/s390/net/qeth_core.h
+++ b/drivers/s390/net/qeth_core.h
@@ -763,7 +763,6 @@ struct qeth_switch_info {
 struct qeth_card {
struct list_head list;
enum qeth_card_states state;
-   int lan_online;
spinlock_t lock;
struct ccwgroup_device *gdev;
struct qeth_channel read;
diff --git a/drivers/s390/net/qeth_core_main.c 
b/drivers/s390/net/qeth_core_main.c
index 9cbdc6760aba..0078b5d217cc 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -652,16 +652,13 @@ static struct qeth_ipa_cmd *qeth_check_ipa_data(struct 
qeth_card *card,
 "The link for interface %s on CHPID 0x%X 
failed\n",
 QETH_CARD_IFNAME(card), card->info.chpid);
qeth_issue_ipa_msg(cmd, cmd->hdr.return_code, card);
+   netif_carrier_off(card->dev);
}
-   card->lan_online = 0;
-   netif_carrier_off(card->dev);
return NULL;
case IPA_CMD_STARTLAN:
dev_info(>gdev->dev,
 "The link for %s on CHPID 0x%X has been restored\n",
 QETH_CARD_IFNAME(card), card->info.chpid);
-   netif_carrier_on(card->dev);
-   card->lan_online = 1;
if (card->info.hwtrap)
card->info.hwtrap = 2;
qeth_schedule_recovery(card);
@@ -5133,13 +5130,14 @@ int qeth_core_hardsetup_card(struct qeth_card *card)
if (rc == IPA_RC_LAN_OFFLINE) {
dev_warn(>gdev->dev,
"The LAN is offline\n");
-   card->lan_online = 0;
+   netif_carrier_off(card->dev);
} else {
rc = -ENODEV;
goto out;
}
-   } else
-   card->lan_online = 1;
+   } else {
+   netif_carrier_on(card->dev);
+   }
 
card->options.ipa4.supported_funcs = 0;
card->options.ipa6.supported_funcs = 0;
diff --git a/drivers/s390/net/qeth_core_sys.c b/drivers/s390/net/qeth_core_sys.c
index 970f6c71a66e..30f61608fa22 100644
--- a/drivers/s390/net/qeth_core_sys.c
+++ b/drivers/s390/net/qeth_core_sys.c
@@ -31,10 +31,9 @@ static ssize_t qeth_dev_state_show(struct device *dev,
case CARD_STATE_SOFTSETUP:
return sprintf(buf, "SOFTSETUP\n");
case CARD_STATE_UP:
-   if (card->lan_online)
-   return sprintf(buf, "UP (LAN ONLINE)\n");
-   else
-   return sprintf(buf, "UP (LAN OFFLINE)\n");
+   return sprintf(buf, "UP (LAN %s)\n",
+  netif_carrier_ok(card->dev) ? "ONLINE" :
+"OFFLINE");
case CARD_STATE_RECOVER:
return sprintf(buf, "RECOVER\n");
default:
diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c
index ed475b4ddd3e..c810d53fff51 100644
--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -694,7 +694,7 @@ static netdev_tx_t qeth_l2_hard_start_xmit(struct sk_buff 
*skb,
int tx_bytes = skb->len;
int rc;
 
-   if ((card->state != CARD_STATE_UP) || !card->lan_online) {
+   if (card->state != CARD_STATE_UP) {
card->stats.tx_carrier_errors++;
goto tx_drop;
}
@@ -997,10 +997,6 @@ static int __qeth_l2_set_online(struct ccwgroup_device 
*gdev, int recovery_mode)
goto out_remove;
}
card->state = CARD_STATE_SOFTSETUP;
-   if (card->lan_online)
-   netif_carrier_on(card->dev);
-   else
-   netif_carrier_off(card->dev);
 
qeth_set_allowed_threads(card, 0x, 0);
 
diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c
index 86c89338b1a2..80893481bb85 100644
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -2233,7 +2233,7 @@ static netdev_tx_t

[PATCH net-next 13/15] s390/qeth: re-indent qeth_check_ipa_data()

2018-09-26 Thread Julian Wiedmann

Pull one level of checking up into qeth_send_control_data_cb(), and
clean up an else-after-return. No functional change.

Signed-off-by: Julian Wiedmann 
---
 drivers/s390/net/qeth_core_main.c | 129 +-
 1 file changed, 58 insertions(+), 71 deletions(-)

diff --git a/drivers/s390/net/qeth_core_main.c 
b/drivers/s390/net/qeth_core_main.c
index 880c15647442..954dc6c688e8 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -626,80 +626,64 @@ static void qeth_issue_ipa_msg(struct qeth_ipa_cmd *cmd, 
int rc,
 }
 
 static struct qeth_ipa_cmd *qeth_check_ipa_data(struct qeth_card *card,
-   struct qeth_cmd_buffer *iob)
+   struct qeth_ipa_cmd *cmd)
 {
-   struct qeth_ipa_cmd *cmd = NULL;
-
QETH_CARD_TEXT(card, 5, "chkipad");
-   if (IS_IPA(iob->data)) {
-   cmd = (struct qeth_ipa_cmd *) PDU_ENCAPSULATION(iob->data);
-   if (IS_IPA_REPLY(cmd)) {
-   if (cmd->hdr.command != IPA_CMD_SETCCID &&
-   cmd->hdr.command != IPA_CMD_DELCCID &&
-   cmd->hdr.command != IPA_CMD_MODCCID &&
-   cmd->hdr.command != IPA_CMD_SET_DIAG_ASS)
-   qeth_issue_ipa_msg(cmd,
-   cmd->hdr.return_code, card);
-   return cmd;
+
+   if (IS_IPA_REPLY(cmd)) {
+   if (cmd->hdr.command != IPA_CMD_SETCCID &&
+   cmd->hdr.command != IPA_CMD_DELCCID &&
+   cmd->hdr.command != IPA_CMD_MODCCID &&
+   cmd->hdr.command != IPA_CMD_SET_DIAG_ASS)
+   qeth_issue_ipa_msg(cmd, cmd->hdr.return_code, card);
+   return cmd;
+   }
+
+   /* handle unsolicited event: */
+   switch (cmd->hdr.command) {
+   case IPA_CMD_STOPLAN:
+   if (cmd->hdr.return_code == IPA_RC_VEPA_TO_VEB_TRANSITION) {
+   dev_err(>gdev->dev,
+   "Interface %s is down because the adjacent port 
is no longer in reflective relay mode\n",
+   QETH_CARD_IFNAME(card));
+   qeth_close_dev(card);
} else {
-   switch (cmd->hdr.command) {
-   case IPA_CMD_STOPLAN:
-   if (cmd->hdr.return_code ==
-   IPA_RC_VEPA_TO_VEB_TRANSITION) {
-   dev_err(>gdev->dev,
-  "Interface %s is down because the "
-  "adjacent port is no longer in "
-  "reflective relay mode\n",
-  QETH_CARD_IFNAME(card));
-   qeth_close_dev(card);
-   } else {
-   dev_warn(>gdev->dev,
-  "The link for interface %s on CHPID"
-  " 0x%X failed\n",
-  QETH_CARD_IFNAME(card),
-  card->info.chpid);
-   qeth_issue_ipa_msg(cmd,
-   cmd->hdr.return_code, card);
-   }
-   card->lan_online = 0;
-   netif_carrier_off(card->dev);
-   return NULL;
-   case IPA_CMD_STARTLAN:
-   dev_info(>gdev->dev,
-  "The link for %s on CHPID 0x%X has"
-  " been restored\n",
-  QETH_CARD_IFNAME(card),
-  card->info.chpid);
-   netif_carrier_on(card->dev);
-   card->lan_online = 1;
-   if (card->info.hwtrap)
-   card->info.hwtrap = 2;
-   qeth_schedule_recovery(card);
-   return NULL;
-   case IPA_CMD_SETBRIDGEPORT_IQD:
-   case IPA_CMD_SETBRIDGEPORT_OSA:
-   case IPA_CMD_ADDRESS_CHANGE_NOTIF:
-   if (card->discipline->control_event_handler
-   (card, cmd))
-   return cmd;
-   else
-   return NULL;
-   case IPA_CMD_MODCCID:
-   return cmd;
-   case

[PATCH net-next 00/15] s390/net: updates 2018-09-26

2018-09-26 Thread Julian Wiedmann

Hi Dave,

please apply one more series of cleanups and small improvements for qeth
to net-next. Note that one patch needs to touch both af_iucv and qeth, in
order to untangle their receive paths.

Thanks,
Julian


Julian Wiedmann (15):
  s390/qeth: convert layer attribute to enum
  s390/qeth: use DEFINE_MUTEX for qeth_mod_mutex
  s390/qeth: fix discipline unload after setup error
  s390/qeth: on gdev release, reset drvdata
  net/af_iucv: locate IUCV header via skb_network_header()
  s390/qeth: replace open-coded skb_queue_walk()
  s390/qeth: remove additional skb refcount
  s390/qeth: re-use qeth_notify_skbs()
  s390/qeth: pass card pointer in iob callback
  s390/qeth: remove CARD_FROM_CDEV helper
  s390/qeth: remove various redundant code
  s390/qeth: consume local address events
  s390/qeth: re-indent qeth_check_ipa_data()
  s390/qeth: clean up drop conditions for received cmds
  s390/qeth: remove duplicated carrier state tracking

 drivers/s390/net/qeth_core.h  |  21 ++-
 drivers/s390/net/qeth_core_main.c | 373 --
 drivers/s390/net/qeth_core_sys.c  |  15 +-
 drivers/s390/net/qeth_l2_main.c   |  12 +-
 drivers/s390/net/qeth_l3_main.c   |  24 +--
 include/net/iucv/af_iucv.h|   5 +
 net/iucv/af_iucv.c|  42 ++---
 7 files changed, 200 insertions(+), 292 deletions(-)

-- 
2.16.4

[PATCH net-next 12/15] s390/qeth: consume local address events

2018-09-26 Thread Julian Wiedmann

We have no code that is waiting for these events, so just drop them when
they arrive.

Signed-off-by: Julian Wiedmann 
---
 drivers/s390/net/qeth_core_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/net/qeth_core_main.c 
b/drivers/s390/net/qeth_core_main.c
index 5a2b90677235..880c15647442 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -688,10 +688,10 @@ static struct qeth_ipa_cmd *qeth_check_ipa_data(struct 
qeth_card *card,
return cmd;
case IPA_CMD_REGISTER_LOCAL_ADDR:
QETH_CARD_TEXT(card, 3, "irla");
-   break;
+   return NULL;
case IPA_CMD_UNREGISTER_LOCAL_ADDR:
QETH_CARD_TEXT(card, 3, "urla");
-   break;
+   return NULL;
default:
QETH_DBF_MESSAGE(2, "Received data is IPA "
   "but not a reply!\n");
-- 
2.16.4

[PATCH net-next 09/15] s390/qeth: pass card pointer in iob callback

2018-09-26 Thread Julian Wiedmann

This allows us to remove the CARD_FROM_CDEV calls in the iob callbacks.

Signed-off-by: Julian Wiedmann 
---
 drivers/s390/net/qeth_core.h  |  3 ++-
 drivers/s390/net/qeth_core_main.c | 53 ++-
 2 files changed, 32 insertions(+), 24 deletions(-)

diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h
index 630a01b3212c..64bcb1237db0 100644
--- a/drivers/s390/net/qeth_core.h
+++ b/drivers/s390/net/qeth_core.h
@@ -582,7 +582,8 @@ struct qeth_cmd_buffer {
struct qeth_channel *channel;
unsigned char *data;
int rc;
-   void (*callback) (struct qeth_channel *, struct qeth_cmd_buffer *);
+   void (*callback)(struct qeth_card *card, struct qeth_channel *channel,
+struct qeth_cmd_buffer *iob);
 };
 
 static inline struct qeth_ipa_cmd *__ipa_cmd(struct qeth_cmd_buffer *iob)
diff --git a/drivers/s390/net/qeth_core_main.c 
b/drivers/s390/net/qeth_core_main.c
index 1b853f900720..86b9cce1f483 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -63,8 +63,9 @@ static struct kmem_cache *qeth_qdio_outbuf_cache;
 static struct device *qeth_core_root_dev;
 static struct lock_class_key qdio_out_skb_queue_key;
 
-static void qeth_send_control_data_cb(struct qeth_channel *,
-   struct qeth_cmd_buffer *);
+static void qeth_send_control_data_cb(struct qeth_card *card,
+ struct qeth_channel *channel,
+ struct qeth_cmd_buffer *iob);
 static struct qeth_cmd_buffer *qeth_get_buffer(struct qeth_channel *);
 static void qeth_free_buffer_pool(struct qeth_card *);
 static int qeth_qdio_establish(struct qeth_card *);
@@ -787,6 +788,13 @@ void qeth_release_buffer(struct qeth_channel *channel,
 }
 EXPORT_SYMBOL_GPL(qeth_release_buffer);
 
+static void qeth_release_buffer_cb(struct qeth_card *card,
+  struct qeth_channel *channel,
+  struct qeth_cmd_buffer *iob)
+{
+   qeth_release_buffer(channel, iob);
+}
+
 static struct qeth_cmd_buffer *qeth_get_buffer(struct qeth_channel *channel)
 {
struct qeth_cmd_buffer *buffer = NULL;
@@ -817,17 +825,16 @@ void qeth_clear_cmd_buffers(struct qeth_channel *channel)
 }
 EXPORT_SYMBOL_GPL(qeth_clear_cmd_buffers);
 
-static void qeth_send_control_data_cb(struct qeth_channel *channel,
- struct qeth_cmd_buffer *iob)
+static void qeth_send_control_data_cb(struct qeth_card *card,
+ struct qeth_channel *channel,
+ struct qeth_cmd_buffer *iob)
 {
-   struct qeth_card *card;
struct qeth_reply *reply, *r;
struct qeth_ipa_cmd *cmd;
unsigned long flags;
int keep_reply;
int rc = 0;
 
-   card = CARD_FROM_CDEV(channel->ccwdev);
QETH_CARD_TEXT(card, 4, "sndctlcb");
rc = qeth_check_idx_response(card, iob->data);
switch (rc) {
@@ -1164,7 +1171,7 @@ static void qeth_irq(struct ccw_device *cdev, unsigned 
long intparm,
__qeth_issue_next_read(card);
 
if (iob && iob->callback)
-   iob->callback(iob->channel, iob);
+   iob->callback(card, iob->channel, iob);
 
 out:
wake_up(>wait_q);
@@ -1804,8 +1811,9 @@ static void qeth_init_func_level(struct qeth_card *card)
 }
 
 static int qeth_idx_activate_get_answer(struct qeth_channel *channel,
-   void (*idx_reply_cb)(struct qeth_channel *,
-   struct qeth_cmd_buffer *))
+   void (*reply_cb)(struct qeth_card *,
+struct qeth_channel *,
+struct qeth_cmd_buffer 
*))
 {
struct qeth_cmd_buffer *iob;
int rc;
@@ -1816,7 +1824,7 @@ static int qeth_idx_activate_get_answer(struct 
qeth_channel *channel,
iob = qeth_get_buffer(channel);
if (!iob)
return -ENOMEM;
-   iob->callback = idx_reply_cb;
+   iob->callback = reply_cb;
qeth_setup_ccw(channel->ccw, CCW_CMD_READ, QETH_BUFSIZE, iob->data);
 
wait_event(card->wait_q,
@@ -1847,8 +1855,9 @@ static int qeth_idx_activate_get_answer(struct 
qeth_channel *channel,
 }
 
 static int qeth_idx_activate_channel(struct qeth_channel *channel,
-   void (*idx_reply_cb)(struct qeth_channel *,
-   struct qeth_cmd_buffer *))
+void (*reply_cb)(struct qeth_card *,
+ struct qeth_channel *,
+ struct qeth_cmd_buffer *))
 {
struct qeth_card *card;
struct qeth_cmd_buffer *iob;
@@ -1864,7 +1873,7 @@ static int qeth_idx_activate_channel(struct qeth_channel 
*channel,
iob = qeth_get_buffer(channel);

[PATCH net-next 08/15] s390/qeth: re-use qeth_notify_skbs()

2018-09-26 Thread Julian Wiedmann

When not using the CQ, this allows us avoid the second skb queue walk
in qeth_release_skbs().

Signed-off-by: Julian Wiedmann 
---
 drivers/s390/net/qeth_core_main.c | 29 +
 1 file changed, 5 insertions(+), 24 deletions(-)

diff --git a/drivers/s390/net/qeth_core_main.c 
b/drivers/s390/net/qeth_core_main.c
index 012519ffd8de..1b853f900720 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -1180,38 +1180,19 @@ static void qeth_notify_skbs(struct qeth_qdio_out_q *q,
skb_queue_walk(>skb_list, skb) {
QETH_CARD_TEXT_(q->card, 5, "skbn%d", notification);
QETH_CARD_TEXT_(q->card, 5, "%lx", (long) skb);
-   if (be16_to_cpu(skb->protocol) == ETH_P_AF_IUCV) {
-   if (skb->sk) {
-   struct iucv_sock *iucv = iucv_sk(skb->sk);
-   iucv->sk_txnotify(skb, notification);
-   }
-   }
+   if (skb->protocol == htons(ETH_P_AF_IUCV) && skb->sk)
+   iucv_sk(skb->sk)->sk_txnotify(skb, notification);
}
 }
 
 static void qeth_release_skbs(struct qeth_qdio_out_buffer *buf)
 {
-   struct sk_buff *skb;
-   struct iucv_sock *iucv;
-   int notify_general_error = 0;
-
-   if (atomic_read(>state) == QETH_QDIO_BUF_PENDING)
-   notify_general_error = 1;
-
/* release may never happen from within CQ tasklet scope */
WARN_ON_ONCE(atomic_read(>state) == QETH_QDIO_BUF_IN_CQ);
 
-   skb_queue_walk(>skb_list, skb) {
-   QETH_CARD_TEXT(buf->q->card, 5, "skbr");
-   QETH_CARD_TEXT_(buf->q->card, 5, "%lx", (long) skb);
-   if (notify_general_error &&
-   be16_to_cpu(skb->protocol) == ETH_P_AF_IUCV) {
-   if (skb->sk) {
-   iucv = iucv_sk(skb->sk);
-   iucv->sk_txnotify(skb, TX_NOTIFY_GENERALERROR);
-   }
-   }
-   }
+   if (atomic_read(>state) == QETH_QDIO_BUF_PENDING)
+   qeth_notify_skbs(buf->q, buf, TX_NOTIFY_GENERALERROR);
+
__skb_queue_purge(>skb_list);
 }
 
-- 
2.16.4

[PATCH net-next 07/15] s390/qeth: remove additional skb refcount

2018-09-26 Thread Julian Wiedmann

This was presumably left over from back when qeth recursed into
dev_queue_xmit().

Signed-off-by: Julian Wiedmann 
---
 drivers/s390/net/qeth_core_main.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/s390/net/qeth_core_main.c 
b/drivers/s390/net/qeth_core_main.c
index c5c40c6d8b7d..012519ffd8de 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -1211,7 +1211,6 @@ static void qeth_release_skbs(struct qeth_qdio_out_buffer 
*buf)
iucv->sk_txnotify(skb, TX_NOTIFY_GENERALERROR);
}
}
-   refcount_dec(>users);
}
__skb_queue_purge(>skb_list);
 }
@@ -3988,7 +3987,6 @@ static int qeth_fill_buffer(struct qeth_qdio_out_q *queue,
bool is_first_elem = true;
int flush_cnt = 0;
 
-   refcount_inc(>users);
__skb_queue_tail(>skb_list, skb);
 
/* build dedicated header element */
-- 
2.16.4

[PATCH net-next 06/15] s390/qeth: replace open-coded skb_queue_walk()

2018-09-26 Thread Julian Wiedmann

To match the use of __skb_queue_purge(), also make the skb's enqueue in
qeth_fill_buffer() lockless.

Signed-off-by: Julian Wiedmann 
---
 drivers/s390/net/qeth_core_main.c | 19 ---
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/drivers/s390/net/qeth_core_main.c 
b/drivers/s390/net/qeth_core_main.c
index a6c632d36df8..c5c40c6d8b7d 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -1177,10 +1177,7 @@ static void qeth_notify_skbs(struct qeth_qdio_out_q *q,
 {
struct sk_buff *skb;
 
-   if (skb_queue_empty(>skb_list))
-   goto out;
-   skb = skb_peek(>skb_list);
-   while (skb) {
+   skb_queue_walk(>skb_list, skb) {
QETH_CARD_TEXT_(q->card, 5, "skbn%d", notification);
QETH_CARD_TEXT_(q->card, 5, "%lx", (long) skb);
if (be16_to_cpu(skb->protocol) == ETH_P_AF_IUCV) {
@@ -1189,13 +1186,7 @@ static void qeth_notify_skbs(struct qeth_qdio_out_q *q,
iucv->sk_txnotify(skb, notification);
}
}
-   if (skb_queue_is_last(>skb_list, skb))
-   skb = NULL;
-   else
-   skb = skb_queue_next(>skb_list, skb);
}
-out:
-   return;
 }
 
 static void qeth_release_skbs(struct qeth_qdio_out_buffer *buf)
@@ -1210,8 +1201,7 @@ static void qeth_release_skbs(struct qeth_qdio_out_buffer 
*buf)
/* release may never happen from within CQ tasklet scope */
WARN_ON_ONCE(atomic_read(>state) == QETH_QDIO_BUF_IN_CQ);
 
-   skb = skb_dequeue(>skb_list);
-   while (skb) {
+   skb_queue_walk(>skb_list, skb) {
QETH_CARD_TEXT(buf->q->card, 5, "skbr");
QETH_CARD_TEXT_(buf->q->card, 5, "%lx", (long) skb);
if (notify_general_error &&
@@ -1222,9 +1212,8 @@ static void qeth_release_skbs(struct qeth_qdio_out_buffer 
*buf)
}
}
refcount_dec(>users);
-   dev_kfree_skb_any(skb);
-   skb = skb_dequeue(>skb_list);
}
+   __skb_queue_purge(>skb_list);
 }
 
 static void qeth_clear_output_buffer(struct qeth_qdio_out_q *queue,
@@ -4000,7 +3989,7 @@ static int qeth_fill_buffer(struct qeth_qdio_out_q *queue,
int flush_cnt = 0;
 
refcount_inc(>users);
-   skb_queue_tail(>skb_list, skb);
+   __skb_queue_tail(>skb_list, skb);
 
/* build dedicated header element */
if (hd_len) {
-- 
2.16.4

[PATCH net-next 05/15] net/af_iucv: locate IUCV header via skb_network_header()

2018-09-26 Thread Julian Wiedmann

This patch attempts to untangle the TX and RX code in qeth from
af_iucv's respective HiperTransport path:
On the TX side, pointing skb_network_header() at the IUCV header
means that qeth_l3_fill_af_iucv_hdr() no longer needs a magical offset
to access the header.
On the RX side, qeth pulls the (fake) L2 header off the skb like any
normal ethernet driver would. This makes working with the IUCV header
in af_iucv easier, since we no longer have to assume a fixed skb layout.

While at it, replace the open-coded length checks in af_iucv's RX path
with pskb_may_pull().

Signed-off-by: Julian Wiedmann 
---
 drivers/s390/net/qeth_l3_main.c | 12 +---
 include/net/iucv/af_iucv.h  |  5 +
 net/iucv/af_iucv.c  | 42 +
 3 files changed, 23 insertions(+), 36 deletions(-)

diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c
index 2756795f7708..7148ef71ac78 100644
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -1348,6 +1348,7 @@ static void qeth_l3_rebuild_skb(struct qeth_card *card, 
struct sk_buff *skb,
 static int qeth_l3_process_inbound_buffer(struct qeth_card *card,
int budget, int *done)
 {
+   struct net_device *dev = card->dev;
int work_done = 0;
struct sk_buff *skb;
struct qeth_hdr *hdr;
@@ -1369,11 +1370,10 @@ static int qeth_l3_process_inbound_buffer(struct 
qeth_card *card,
magic = *(__u16 *)skb->data;
if ((card->info.type == QETH_CARD_TYPE_IQD) &&
(magic == ETH_P_AF_IUCV)) {
-   skb->protocol = cpu_to_be16(ETH_P_AF_IUCV);
len = skb->len;
-   card->dev->header_ops->create(skb, card->dev, 0,
-   card->dev->dev_addr, "FAKELL", len);
-   skb_reset_mac_header(skb);
+   dev_hard_header(skb, dev, ETH_P_AF_IUCV,
+   dev->dev_addr, "FAKELL", len);
+   skb->protocol = eth_type_trans(skb, dev);
netif_receive_skb(skb);
} else {
qeth_l3_rebuild_skb(card, skb, hdr);
@@ -2005,17 +2005,15 @@ static void qeth_l3_fill_af_iucv_hdr(struct qeth_hdr 
*hdr, struct sk_buff *skb,
 unsigned int data_len)
 {
char daddr[16];
-   struct af_iucv_trans_hdr *iucv_hdr;
 
hdr->hdr.l3.id = QETH_HEADER_TYPE_LAYER3;
hdr->hdr.l3.length = data_len;
hdr->hdr.l3.flags = QETH_HDR_IPV6 | QETH_CAST_UNICAST;
 
-   iucv_hdr = (struct af_iucv_trans_hdr *)(skb_mac_header(skb) + ETH_HLEN);
memset(daddr, 0, sizeof(daddr));
daddr[0] = 0xfe;
daddr[1] = 0x80;
-   memcpy([8], iucv_hdr->destUserID, 8);
+   memcpy([8], iucv_trans_hdr(skb)->destUserID, 8);
memcpy(hdr->hdr.l3.next_hop.ipv6_addr, daddr, 16);
 }
 
diff --git a/include/net/iucv/af_iucv.h b/include/net/iucv/af_iucv.h
index f4c21b5a1242..14a490246be9 100644
--- a/include/net/iucv/af_iucv.h
+++ b/include/net/iucv/af_iucv.h
@@ -80,6 +80,11 @@ struct af_iucv_trans_hdr {
u8 pad;  /* total 104 bytes */
 } __packed;
 
+static inline struct af_iucv_trans_hdr *iucv_trans_hdr(struct sk_buff *skb)
+{
+   return (struct af_iucv_trans_hdr *)skb_network_header(skb);
+}
+
 enum iucv_tx_notify {
/* transmission of skb is completed and was successful */
TX_NOTIFY_OK = 0,
diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index 5b68ee908107..45115c125569 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -320,13 +320,9 @@ static int afiucv_hs_send(struct iucv_message *imsg, 
struct sock *sock,
struct sk_buff *nskb;
int err, confirm_recv = 0;
 
-   memset(skb->head, 0, ETH_HLEN);
-   phs_hdr = skb_push(skb, sizeof(struct af_iucv_trans_hdr));
-   skb_reset_mac_header(skb);
+   phs_hdr = skb_push(skb, sizeof(*phs_hdr));
+   memset(phs_hdr, 0, sizeof(*phs_hdr));
skb_reset_network_header(skb);
-   skb_push(skb, ETH_HLEN);
-   skb_reset_mac_header(skb);
-   memset(phs_hdr, 0, sizeof(struct af_iucv_trans_hdr));
 
phs_hdr->magic = ETH_P_AF_IUCV;
phs_hdr->version = 1;
@@ -350,6 +346,9 @@ static int afiucv_hs_send(struct iucv_message *imsg, struct 
sock *sock,
if (imsg)
memcpy(_hdr->iucv_hdr, imsg, sizeof(struct iucv_message));
 
+   skb_push(skb, ETH_HLEN);
+   memset(skb->data, 0, ETH_HLEN);
+
skb->dev = iucv->hs_dev;
if (!skb->dev) {
err = -ENODEV;
@@ -1943,8 +1942,7 @@ static void iucv_callback_shutdown(struct iucv_path 
*path, u8 ipuser[16])
 /* HiperSockets transport callbacks /
 static

[PATCH net-next 02/15] s390/qeth: use DEFINE_MUTEX for qeth_mod_mutex

2018-09-26 Thread Julian Wiedmann

Consolidate declaration and initialization of a static variable.
While at it reduce its scope in qeth_core_load_discipline(), and simplify
the return logic accordingly.

Signed-off-by: Julian Wiedmann 
---
 drivers/s390/net/qeth_core_main.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/s390/net/qeth_core_main.c 
b/drivers/s390/net/qeth_core_main.c
index 4fd9bdc2d0ae..c3068f680f67 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -62,7 +62,6 @@ static struct kmem_cache *qeth_qdio_outbuf_cache;
 
 static struct device *qeth_core_root_dev;
 static struct lock_class_key qdio_out_skb_queue_key;
-static struct mutex qeth_mod_mutex;
 
 static void qeth_send_control_data_cb(struct qeth_channel *,
struct qeth_cmd_buffer *);
@@ -5589,11 +5588,11 @@ static int qeth_register_dbf_views(void)
return 0;
 }
 
+static DEFINE_MUTEX(qeth_mod_mutex);   /* for synchronized module loading */
+
 int qeth_core_load_discipline(struct qeth_card *card,
enum qeth_discipline_id discipline)
 {
-   int rc = 0;
-
mutex_lock(_mod_mutex);
switch (discipline) {
case QETH_DISCIPLINE_LAYER3:
@@ -5607,14 +5606,15 @@ int qeth_core_load_discipline(struct qeth_card *card,
default:
break;
}
+   mutex_unlock(_mod_mutex);
 
if (!card->discipline) {
dev_err(>gdev->dev, "There is no kernel module to "
"support discipline %d\n", discipline);
-   rc = -EINVAL;
+   return -EINVAL;
}
-   mutex_unlock(_mod_mutex);
-   return rc;
+
+   return 0;
 }
 
 void qeth_core_free_discipline(struct qeth_card *card)
@@ -6623,7 +6623,6 @@ static int __init qeth_core_init(void)
INIT_LIST_HEAD(_core_card_list.list);
INIT_LIST_HEAD(_dbf_list);
rwlock_init(_core_card_list.rwlock);
-   mutex_init(_mod_mutex);
 
qeth_wq = create_singlethread_workqueue("qeth_wq");
if (!qeth_wq) {
-- 
2.16.4

[PATCH net-next 03/15] s390/qeth: fix discipline unload after setup error

2018-09-26 Thread Julian Wiedmann

Device initialization code usually first loads a subdriver
(via qeth_core_load_discipline()), and then runs its setup() callback.
If this fails, it rolls back the load via qeth_core_free_discipline().

qeth_core_free_discipline() expects the options.layer attribute to be
initialized, but on error in setup() that's currently not the case.
Resulting in misbalanced symbol_put() calls.

Fix this by setting options.layer when loading the subdriver.

Signed-off-by: Julian Wiedmann 
---
 drivers/s390/net/qeth_core_main.c | 2 ++
 drivers/s390/net/qeth_core_sys.c  | 2 --
 drivers/s390/net/qeth_l2_main.c   | 1 -
 drivers/s390/net/qeth_l3_main.c   | 1 -
 4 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/s390/net/qeth_core_main.c 
b/drivers/s390/net/qeth_core_main.c
index c3068f680f67..1fed8f113f40 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -5614,6 +5614,7 @@ int qeth_core_load_discipline(struct qeth_card *card,
return -EINVAL;
}
 
+   card->options.layer = discipline;
return 0;
 }
 
@@ -5623,6 +5624,7 @@ void qeth_core_free_discipline(struct qeth_card *card)
symbol_put(qeth_l2_discipline);
else
symbol_put(qeth_l3_discipline);
+   card->options.layer = QETH_DISCIPLINE_UNDETERMINED;
card->discipline = NULL;
 }
 
diff --git a/drivers/s390/net/qeth_core_sys.c b/drivers/s390/net/qeth_core_sys.c
index fdb67af811f4..970f6c71a66e 100644
--- a/drivers/s390/net/qeth_core_sys.c
+++ b/drivers/s390/net/qeth_core_sys.c
@@ -432,8 +432,6 @@ static ssize_t qeth_dev_layer2_store(struct device *dev,
 
card->discipline->remove(card->gdev);
qeth_core_free_discipline(card);
-   card->options.layer = QETH_DISCIPLINE_UNDETERMINED;
-
free_netdev(card->dev);
card->dev = ndev;
}
diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c
index afa7a005b21e..02566af7e23d 100644
--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -806,7 +806,6 @@ static int qeth_l2_probe_device(struct ccwgroup_device 
*gdev)
}
INIT_LIST_HEAD(>vid_list);
hash_init(card->mac_htable);
-   card->options.layer = QETH_DISCIPLINE_LAYER2;
card->info.hwtrap = 0;
qeth_l2_vnicc_set_defaults(card);
return 0;
diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c
index 552bfad20f85..2756795f7708 100644
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -2489,7 +2489,6 @@ static int qeth_l3_probe_device(struct ccwgroup_device 
*gdev)
}
hash_init(card->ip_htable);
hash_init(card->ip_mc_htable);
-   card->options.layer = QETH_DISCIPLINE_LAYER3;
card->info.hwtrap = 0;
return 0;
 }
-- 
2.16.4

[PATCH net-next 04/15] s390/qeth: on gdev release, reset drvdata

2018-09-26 Thread Julian Wiedmann

qeth_core_probe_device() sets the gdev's drvdata, but doesn't reset it
on a subsequent error. Move the (re-)setting around a bit, so that it
happens symmetrically on allocating/freeing the qeth_card struct.

This is no actual problem, as the ccwgroup core will discard the gdev
on a probe error. But from qeth's perspective the gdev is an external
resource, so it's best to manage it cleanly.

Signed-off-by: Julian Wiedmann 
---
 drivers/s390/net/qeth_core_main.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/net/qeth_core_main.c 
b/drivers/s390/net/qeth_core_main.c
index 1fed8f113f40..a6c632d36df8 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -1513,6 +1513,7 @@ static struct qeth_card *qeth_alloc_card(struct 
ccwgroup_device *gdev)
QETH_DBF_HEX(SETUP, 2, , sizeof(void *));
 
card->gdev = gdev;
+   dev_set_drvdata(>dev, card);
CARD_RDEV(card) = gdev->cdev[0];
CARD_WDEV(card) = gdev->cdev[1];
CARD_DDEV(card) = gdev->cdev[2];
@@ -1531,6 +1532,7 @@ static struct qeth_card *qeth_alloc_card(struct 
ccwgroup_device *gdev)
 out_channel:
qeth_clean_channel(>read);
 out_ip:
+   dev_set_drvdata(>dev, NULL);
kfree(card);
 out:
return NULL;
@@ -5074,6 +5076,7 @@ static void qeth_core_free_card(struct qeth_card *card)
qeth_clean_channel(>data);
qeth_free_qdio_buffers(card);
unregister_service_level(>qeth_service_level);
+   dev_set_drvdata(>gdev->dev, NULL);
kfree(card);
 }
 
@@ -5788,7 +5791,6 @@ static int qeth_core_probe_device(struct ccwgroup_device 
*gdev)
goto err_card;
}
 
-   dev_set_drvdata(>dev, card);
qeth_setup_card(card);
qeth_update_from_chp_desc(card);
 
@@ -5851,7 +5853,6 @@ static void qeth_core_remove_device(struct 
ccwgroup_device *gdev)
write_unlock_irq(_core_card_list.rwlock);
free_netdev(card->dev);
qeth_core_free_card(card);
-   dev_set_drvdata(>dev, NULL);
put_device(>dev);
 }
 
-- 
2.16.4

[PATCH net-next 01/15] s390/qeth: convert layer attribute to enum

2018-09-26 Thread Julian Wiedmann

While the raw values are fixed due to their use in a sysfs attribute,
we can still use the proper QETH_DISCIPLINE_* enum within the driver.

Also move the initialization into qeth_set_initial_options(), along with
all other user-configurable fields.

Signed-off-by: Julian Wiedmann 
---
 drivers/s390/net/qeth_core.h  | 17 ++---
 drivers/s390/net/qeth_core_main.c | 22 +-
 drivers/s390/net/qeth_core_sys.c  |  8 
 drivers/s390/net/qeth_l2_main.c   |  2 +-
 drivers/s390/net/qeth_l3_main.c   |  2 +-
 5 files changed, 25 insertions(+), 26 deletions(-)

diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h
index 0dbe81f958f0..630a01b3212c 100644
--- a/drivers/s390/net/qeth_core.h
+++ b/drivers/s390/net/qeth_core.h
@@ -671,6 +671,12 @@ struct qeth_card_info {
__u32 hwtrap;
 };
 
+enum qeth_discipline_id {
+   QETH_DISCIPLINE_UNDETERMINED = -1,
+   QETH_DISCIPLINE_LAYER3 = 0,
+   QETH_DISCIPLINE_LAYER2 = 1,
+};
+
 struct qeth_card_options {
struct qeth_routing_info route4;
struct qeth_ipa_info ipa4;
@@ -680,7 +686,7 @@ struct qeth_card_options {
struct qeth_sbp_info sbp; /* SETBRIDGEPORT options */
struct qeth_vnicc_info vnicc; /* VNICC options */
int fake_broadcast;
-   int layer2;
+   enum qeth_discipline_id layer;
int performance_stats;
int rx_sg_cb;
enum qeth_ipa_isolation_modes isolation;
@@ -690,6 +696,9 @@ struct qeth_card_options {
char hsuid[9];
 };
 
+#defineIS_LAYER2(card) ((card)->options.layer == 
QETH_DISCIPLINE_LAYER2)
+#defineIS_LAYER3(card) ((card)->options.layer == 
QETH_DISCIPLINE_LAYER3)
+
 /*
  * thread bits for qeth_card thread masks
  */
@@ -702,12 +711,6 @@ struct qeth_osn_info {
int (*data_cb)(struct sk_buff *skb);
 };
 
-enum qeth_discipline_id {
-   QETH_DISCIPLINE_UNDETERMINED = -1,
-   QETH_DISCIPLINE_LAYER3 = 0,
-   QETH_DISCIPLINE_LAYER2 = 1,
-};
-
 struct qeth_discipline {
const struct device_type *devtype;
int (*process_rx_buffer)(struct qeth_card *card, int budget, int *done);
diff --git a/drivers/s390/net/qeth_core_main.c 
b/drivers/s390/net/qeth_core_main.c
index 89e09e7b8fff..4fd9bdc2d0ae 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -1429,6 +1429,7 @@ static void qeth_set_initial_options(struct qeth_card 
*card)
card->options.rx_sg_cb = QETH_RX_SG_CB;
card->options.isolation = ISOLATION_MODE_NONE;
card->options.cq = QETH_CQ_DISABLED;
+   card->options.layer = QETH_DISCIPLINE_UNDETERMINED;
 }
 
 static int qeth_do_start_thread(struct qeth_card *card, unsigned long thread)
@@ -1522,7 +1523,6 @@ static struct qeth_card *qeth_alloc_card(struct 
ccwgroup_device *gdev)
goto out_channel;
if (qeth_setup_channel(>data, false))
goto out_data;
-   card->options.layer2 = -1;
card->qeth_service_level.seq_print = qeth_core_sl_print;
register_service_level(>qeth_service_level);
return card;
@@ -2291,7 +2291,7 @@ static int qeth_update_max_mtu(struct qeth_card *card, 
unsigned int max_mtu)
if (dev->mtu)
new_mtu = dev->mtu;
/* default MTUs for first setup: */
-   else if (card->options.layer2)
+   else if (IS_LAYER2(card))
new_mtu = ETH_DATA_LEN;
else
new_mtu = ETH_DATA_LEN - 8; /* allow for LLC + SNAP */
@@ -2358,7 +2358,7 @@ static u8 qeth_mpc_select_prot_type(struct qeth_card 
*card)
 {
if (IS_OSN(card))
return QETH_PROT_OSN2;
-   return (card->options.layer2 == 1) ? QETH_PROT_LAYER2 : QETH_PROT_TCPIP;
+   return IS_LAYER2(card) ? QETH_PROT_LAYER2 : QETH_PROT_TCPIP;
 }
 
 static int qeth_ulp_enable(struct qeth_card *card)
@@ -2896,10 +2896,7 @@ static void qeth_fill_ipacmd_header(struct qeth_card 
*card,
/* cmd->hdr.seqno is set by qeth_send_control_data() */
cmd->hdr.adapter_type = qeth_get_ipa_adp_type(card->info.link_type);
cmd->hdr.rel_adapter_no = (u8) card->dev->dev_port;
-   if (card->options.layer2)
-   cmd->hdr.prim_version_no = 2;
-   else
-   cmd->hdr.prim_version_no = 1;
+   cmd->hdr.prim_version_no = IS_LAYER2(card) ? 2 : 1;
cmd->hdr.param_count = 1;
cmd->hdr.prot_version = prot;
 }
@@ -4278,8 +4275,7 @@ static int qeth_setadpparms_change_macaddr_cb(struct 
qeth_card *card,
if (qeth_setadpparms_inspect_rc(cmd))
return 0;
 
-   if (!card->options.layer2 ||
-   !(card->info.mac_bits & QETH_LAYER2_MAC_READ)) {
+   if (IS_LAYER3(card) || !(card->info.mac_bits & QETH_LAYER2_MAC_READ)) {
ether_addr_copy(card->dev->dev_addr,

cmd->data.setadapterparms.data.change_addr.addr);

Re: [PATCH net v2] bnxt_en: Fix TX timeout during netpoll.

2018-09-26 Thread Song Liu




> On Sep 25, 2018, at 9:41 PM, Michael Chan  wrote:
> 
> The current netpoll implementation in the bnxt_en driver has problems
> that may miss TX completion events.  bnxt_poll_work() in effect is
> only handling at most 1 TX packet before exiting.  In addition,
> there may be in flight TX completions that ->poll() may miss even
> after we fix bnxt_poll_work() to handle all visible TX completions.
> netpoll may not call ->poll() again and HW may not generate IRQ
> because the driver does not ARM the IRQ when the budget (0 for netpoll)
> is reached.
> 
> We fix it by handling all TX completions and to always ARM the IRQ
> when we exit ->poll() with 0 budget.
> 
> Also, the logic to ACK the completion ring in case it is almost filled
> with TX completions need to be adjusted to take care of the 0 budget
> case, as discussed with Eric Dumazet 
> 
> Reported-by: Song Liu 
> Signed-off-by: Michael Chan 

Reviewed-and-tested-by: Song Liu 


> ---
> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 13 ++---
> 1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
> b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 61957b0..0478e56 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -1884,8 +1884,11 @@ static int bnxt_poll_work(struct bnxt *bp, struct 
> bnxt_napi *bnapi, int budget)
>   if (TX_CMP_TYPE(txcmp) == CMP_TYPE_TX_L2_CMP) {
>   tx_pkts++;
>   /* return full budget so NAPI will complete. */
> - if (unlikely(tx_pkts > bp->tx_wake_thresh))
> + if (unlikely(tx_pkts > bp->tx_wake_thresh)) {
>   rx_pkts = budget;
> + raw_cons = NEXT_RAW_CMP(raw_cons);
> + break;
> + }
>   } else if ((TX_CMP_TYPE(txcmp) & 0x30) == 0x10) {
>   if (likely(budget))
>   rc = bnxt_rx_pkt(bp, bnapi, _cons, );
> @@ -1913,7 +1916,7 @@ static int bnxt_poll_work(struct bnxt *bp, struct 
> bnxt_napi *bnapi, int budget)
>   }
>   raw_cons = NEXT_RAW_CMP(raw_cons);
> 
> - if (rx_pkts == budget)
> + if (rx_pkts && rx_pkts == budget)
>   break;
>   }
> 
> @@ -2027,8 +2030,12 @@ static int bnxt_poll(struct napi_struct *napi, int 
> budget)
>   while (1) {
>   work_done += bnxt_poll_work(bp, bnapi, budget - work_done);
> 
> - if (work_done >= budget)
> + if (work_done >= budget) {
> + if (!budget)
> + BNXT_CP_DB_REARM(cpr->cp_doorbell,
> +  cpr->cp_raw_cons);
>   break;
> + }
> 
>   if (!bnxt_has_work(bp, cpr)) {
>   if (napi_complete_done(napi, work_done))
> -- 
> 2.5.1
>

[PATCH net 2/2] s390: qeth: Fix potential array overrun in cmd/rc lookup

2018-09-26 Thread Julian Wiedmann

From: Jean Delvare 

Functions qeth_get_ipa_msg and qeth_get_ipa_cmd_name are modifying
the last member of global arrays without any locking that I can see.
If two instances of either function are running at the same time,
it could cause a race ultimately leading to an array overrun (the
contents of the last entry of the array is the only guarantee that
the loop will ever stop).

Performing the lookups without modifying the arrays is admittedly
slower (two comparisons per iteration instead of one) but these
are operations which are rare (should only be needed in error
cases or when debugging, not during successful operation) and it
seems still less costly than introducing a mutex to protect the
arrays in question.

As a side bonus, it allows us to declare both arrays as const data.

Signed-off-by: Jean Delvare 
Cc: Julian Wiedmann 
Cc: Ursula Braun 
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Signed-off-by: Julian Wiedmann 
---
 drivers/s390/net/qeth_core_main.c |  2 +-
 drivers/s390/net/qeth_core_mpc.c  | 30 --
 drivers/s390/net/qeth_core_mpc.h  |  4 ++--
 3 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/drivers/s390/net/qeth_core_main.c 
b/drivers/s390/net/qeth_core_main.c
index de8282420f96..ffce6f39828a 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -610,7 +610,7 @@ static void qeth_put_reply(struct qeth_reply *reply)
 static void qeth_issue_ipa_msg(struct qeth_ipa_cmd *cmd, int rc,
struct qeth_card *card)
 {
-   char *ipa_name;
+   const char *ipa_name;
int com = cmd->hdr.command;
ipa_name = qeth_get_ipa_cmd_name(com);
if (rc)
diff --git a/drivers/s390/net/qeth_core_mpc.c b/drivers/s390/net/qeth_core_mpc.c
index e8263ded0af0..e891c0b52f4c 100644
--- a/drivers/s390/net/qeth_core_mpc.c
+++ b/drivers/s390/net/qeth_core_mpc.c
@@ -148,10 +148,10 @@ EXPORT_SYMBOL_GPL(IPA_PDU_HEADER);
 
 struct ipa_rc_msg {
enum qeth_ipa_return_codes rc;
-   char *msg;
+   const char *msg;
 };
 
-static struct ipa_rc_msg qeth_ipa_rc_msg[] = {
+static const struct ipa_rc_msg qeth_ipa_rc_msg[] = {
{IPA_RC_SUCCESS,"success"},
{IPA_RC_NOTSUPP,"Command not supported"},
{IPA_RC_IP_TABLE_FULL,  "Add Addr IP Table Full - ipv6"},
@@ -219,22 +219,23 @@ static struct ipa_rc_msg qeth_ipa_rc_msg[] = {
 
 
 
-char *qeth_get_ipa_msg(enum qeth_ipa_return_codes rc)
+const char *qeth_get_ipa_msg(enum qeth_ipa_return_codes rc)
 {
-   int x = 0;
-   qeth_ipa_rc_msg[ARRAY_SIZE(qeth_ipa_rc_msg) - 1].rc = rc;
-   while (qeth_ipa_rc_msg[x].rc != rc)
-   x++;
+   int x;
+
+   for (x = 0; x < ARRAY_SIZE(qeth_ipa_rc_msg) - 1; x++)
+   if (qeth_ipa_rc_msg[x].rc == rc)
+   return qeth_ipa_rc_msg[x].msg;
return qeth_ipa_rc_msg[x].msg;
 }
 
 
 struct ipa_cmd_names {
enum qeth_ipa_cmds cmd;
-   char *name;
+   const char *name;
 };
 
-static struct ipa_cmd_names qeth_ipa_cmd_names[] = {
+static const struct ipa_cmd_names qeth_ipa_cmd_names[] = {
{IPA_CMD_STARTLAN,  "startlan"},
{IPA_CMD_STOPLAN,   "stoplan"},
{IPA_CMD_SETVMAC,   "setvmac"},
@@ -266,11 +267,12 @@ static struct ipa_cmd_names qeth_ipa_cmd_names[] = {
{IPA_CMD_UNKNOWN,   "unknown"},
 };
 
-char *qeth_get_ipa_cmd_name(enum qeth_ipa_cmds cmd)
+const char *qeth_get_ipa_cmd_name(enum qeth_ipa_cmds cmd)
 {
-   int x = 0;
-   qeth_ipa_cmd_names[ARRAY_SIZE(qeth_ipa_cmd_names) - 1].cmd = cmd;
-   while (qeth_ipa_cmd_names[x].cmd != cmd)
-   x++;
+   int x;
+
+   for (x = 0; x < ARRAY_SIZE(qeth_ipa_cmd_names) - 1; x++)
+   if (qeth_ipa_cmd_names[x].cmd == cmd)
+   return qeth_ipa_cmd_names[x].name;
return qeth_ipa_cmd_names[x].name;
 }
diff --git a/drivers/s390/net/qeth_core_mpc.h b/drivers/s390/net/qeth_core_mpc.h
index aa8b9196b089..aa5de1fe01e1 100644
--- a/drivers/s390/net/qeth_core_mpc.h
+++ b/drivers/s390/net/qeth_core_mpc.h
@@ -797,8 +797,8 @@ enum qeth_ipa_arp_return_codes {
QETH_IPA_ARP_RC_Q_NO_DATA= 0x0008,
 };
 
-extern char *qeth_get_ipa_msg(enum qeth_ipa_return_codes rc);
-extern char *qeth_get_ipa_cmd_name(enum qeth_ipa_cmds cmd);
+extern const char *qeth_get_ipa_msg(enum qeth_ipa_return_codes rc);
+extern const char *qeth_get_ipa_cmd_name(enum qeth_ipa_cmds cmd);
 
 #define QETH_SETASS_BASE_LEN (sizeof(struct qeth_ipacmd_hdr) + \
   sizeof(struct qeth_ipacmd_setassparms_hdr))
-- 
2.16.4

[PATCH net 0/2] s390/qeth: fixes 2019-09-26

2018-09-26 Thread Julian Wiedmann

Hi Dave,

please apply two qeth patches for -net. The first is a trivial cleanup
required for patch #2 by Jean, which fixes a potential endless loop.

Thanks,
Julian


Jean Delvare (1):
  s390: qeth: Fix potential array overrun in cmd/rc lookup

zhong jiang (1):
  s390: qeth_core_mpc: Use ARRAY_SIZE instead of reimplementing its
function

 drivers/s390/net/qeth_core_main.c |  2 +-
 drivers/s390/net/qeth_core_mpc.c  | 33 -
 drivers/s390/net/qeth_core_mpc.h  |  4 ++--
 3 files changed, 19 insertions(+), 20 deletions(-)

-- 
2.16.4

[PATCH net 1/2] s390: qeth_core_mpc: Use ARRAY_SIZE instead of reimplementing its function

2018-09-26 Thread Julian Wiedmann

From: zhong jiang 

Use the common code ARRAY_SIZE macro instead of a private implementation.

Reviewed-by: Jean Delvare 
Signed-off-by: zhong jiang 
Signed-off-by: Martin Schwidefsky 
Signed-off-by: Julian Wiedmann 
---
 drivers/s390/net/qeth_core_mpc.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/s390/net/qeth_core_mpc.c b/drivers/s390/net/qeth_core_mpc.c
index 5bcb8dafc3ee..e8263ded0af0 100644
--- a/drivers/s390/net/qeth_core_mpc.c
+++ b/drivers/s390/net/qeth_core_mpc.c
@@ -222,8 +222,7 @@ static struct ipa_rc_msg qeth_ipa_rc_msg[] = {
 char *qeth_get_ipa_msg(enum qeth_ipa_return_codes rc)
 {
int x = 0;
-   qeth_ipa_rc_msg[sizeof(qeth_ipa_rc_msg) /
-   sizeof(struct ipa_rc_msg) - 1].rc = rc;
+   qeth_ipa_rc_msg[ARRAY_SIZE(qeth_ipa_rc_msg) - 1].rc = rc;
while (qeth_ipa_rc_msg[x].rc != rc)
x++;
return qeth_ipa_rc_msg[x].msg;
@@ -270,9 +269,7 @@ static struct ipa_cmd_names qeth_ipa_cmd_names[] = {
 char *qeth_get_ipa_cmd_name(enum qeth_ipa_cmds cmd)
 {
int x = 0;
-   qeth_ipa_cmd_names[
-   sizeof(qeth_ipa_cmd_names) /
-   sizeof(struct ipa_cmd_names)-1].cmd = cmd;
+   qeth_ipa_cmd_names[ARRAY_SIZE(qeth_ipa_cmd_names) - 1].cmd = cmd;
while (qeth_ipa_cmd_names[x].cmd != cmd)
x++;
return qeth_ipa_cmd_names[x].name;
-- 
2.16.4

Re: [PATCH RFC,net-next 00/10] add flow_rule infrastructure

2018-09-26 Thread Jakub Kicinski

On Tue, 25 Sep 2018 21:19:51 +0200, Pablo Neira Ayuso wrote:
> This patchset is adding a new layer between drivers and the existing
> software frontends, so it's a bit more code, but it is core
> infrastructure common to everyone and this comes with benefits for
> driver developers:

Thanks a lot for doing this!!

Re: [PATCH net-next v6 07/23] zinc: ChaCha20 ARM and ARM64 implementations

2018-09-26 Thread Ard Biesheuvel

On Wed, 26 Sep 2018 at 17:50, Jason A. Donenfeld  wrote:
>
> On Wed, Sep 26, 2018 at 5:45 PM Jason A. Donenfeld  wrote:
> > So what you have in mind is something like calling simd_relax() every
> > 4096 bytes or so?
>
> That was actually pretty easy, putting together both of your suggestions:
>
> static inline bool chacha20_arch(struct chacha20_ctx *state, u8 *dst,
>  u8 *src, size_t len,
>  simd_context_t *simd_context)
> {
> while (len > PAGE_SIZE) {
> chacha20_arch(state, dst, src, PAGE_SIZE, simd_context);
> len -= PAGE_SIZE;
> src += PAGE_SIZE;
> dst += PAGE_SIZE;
> simd_relax(simd_context);
> }
> if (IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && chacha20_use_neon &&
> len >= CHACHA20_BLOCK_SIZE * 3 && simd_use(simd_context))
> chacha20_neon(dst, src, len, state->key, state->counter);
> else
> chacha20_arm(dst, src, len, state->key, state->counter);
>
> state->counter[0] += (len + 63) / 64;
> return true;
> }

Nice one :-)

This works for me (but perhaps add a comment as well)

Re: [PATCH RFC,net-next 04/10] cls_flower: add translator to flow_action representation

2018-09-26 Thread Jakub Kicinski

On Tue, 25 Sep 2018 21:19:55 +0200, Pablo Neira Ayuso wrote:
> This implements TC action to flow_action translation from cls_flower.
> 
> Signed-off-by: Pablo Neira Ayuso 
> ---
>  net/sched/cls_flower.c | 124 
> -
>  1 file changed, 123 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
> index e1dd60a2ecb8..a96a80f01c6d 100644
> --- a/net/sched/cls_flower.c
> +++ b/net/sched/cls_flower.c
> @@ -28,6 +28,14 @@
>  
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
>  
>  struct fl_flow_key {
>   int indev_ifindex;
> @@ -101,6 +109,7 @@ struct cls_fl_filter {
>   u32 in_hw_count;
>   struct rcu_work rwork;
>   struct net_device *hw_dev;
> + struct flow_action action;
>  };
>  
>  static const struct rhashtable_params mask_ht_params = {
> @@ -294,6 +303,107 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, 
> struct cls_fl_filter *f,
>   tcf_block_offload_dec(block, >flags);
>  }
>  
> +static int fl_hw_setup_action(struct flow_action *flow_action,
> +   const struct tcf_exts *exts)

The function doesn't seem very flower-specific?

> +{
> + const struct tc_action *act;
> + int num_acts = 0, i, j, k;
> +
> + if (!exts)
> + return 0;
> +
> + tcf_exts_for_each_action(i, act, exts) {
> + if (is_tcf_pedit(act))
> + num_acts += tcf_pedit_nkeys(act);
> + else
> + num_acts++;
> + }
> +
> + if (!num_acts)
> + return 0;
> +
> + if (flow_action_init(flow_action, num_acts) < 0)
> + return -ENOMEM;
> +
> + j = 0;
> + tcf_exts_for_each_action(i, act, exts) {
> + struct flow_action_key *key;
> +
> + key = _action->keys[j];
> + if (is_tcf_gact_ok(act)) {
> + key->id = FLOW_ACTION_KEY_ACCEPT;
> + } else if (is_tcf_gact_shot(act)) {
> + key->id = FLOW_ACTION_KEY_DROP;
> + } else if (is_tcf_gact_trap(act)) {
> + key->id = FLOW_ACTION_KEY_TRAP;
> + } else if (is_tcf_gact_goto_chain(act)) {
> + key->id = FLOW_ACTION_KEY_GOTO;
> + key->chain_index = tcf_gact_goto_chain_index(act);
> + } else if (is_tcf_mirred_egress_redirect(act)) {
> + key->id = FLOW_ACTION_KEY_REDIRECT;
> + key->dev = tcf_mirred_dev(act);
> + } else if (is_tcf_mirred_egress_mirror(act)) {
> + key->id = FLOW_ACTION_KEY_MIRRED;
> + key->dev = tcf_mirred_dev(act);
> + } else if (is_tcf_vlan(act)) {
> + switch (tcf_vlan_action(act)) {
> + case TCA_VLAN_ACT_PUSH:
> + key->id = FLOW_ACTION_KEY_VLAN_PUSH;
> + key->vlan.vid = tcf_vlan_push_vid(act);
> + key->vlan.proto = tcf_vlan_push_proto(act);
> + key->vlan.prio = tcf_vlan_push_prio(act);
> + break;
> + case TCA_VLAN_ACT_POP:
> + key->id = FLOW_ACTION_KEY_VLAN_POP;
> + break;
> + case TCA_VLAN_ACT_MODIFY:
> + key->id = FLOW_ACTION_KEY_VLAN_MANGLE;
> + key->vlan.vid = tcf_vlan_push_vid(act);
> + key->vlan.proto = tcf_vlan_push_proto(act);
> + key->vlan.prio = tcf_vlan_push_prio(act);
> + break;
> + }
> + } else if (is_tcf_tunnel_set(act)) {
> + key->id = FLOW_ACTION_KEY_TUNNEL_ENCAP;
> + key->tunnel = tcf_tunnel_info(act);
> + } else if (is_tcf_tunnel_release(act)) {
> + key->id = FLOW_ACTION_KEY_TUNNEL_DECAP;
> + key->tunnel = tcf_tunnel_info(act);
> + } else if (is_tcf_pedit(act)) {
> + for (k = 0; k < tcf_pedit_nkeys(act); k++) {
> + switch (tcf_pedit_cmd(act, k)) {
> + case TCA_PEDIT_KEY_EX_CMD_SET:
> + key->id = FLOW_ACTION_KEY_MANGLE;
> + break;
> + case TCA_PEDIT_KEY_EX_CMD_ADD:
> + key->id = FLOW_ACTION_KEY_ADD;
> + break;
> + default:
> + WARN_ON_ONCE(1);
> + break;
> + }
> +
> + key->mangle.htype = tcf_pedit_htype(act, k);
> +

Re: Marvell phy errata origins?

2018-09-26 Thread Daniel Walker


On 09/25/2018 10:42 PM, Harini Katakam wrote:

Hi,
On Tue, Sep 25, 2018 at 11:00 PM Harini Katakam  wrote:


Hi Daniel,

On Tue, Sep 25, 2018 at 9:10 PM Andrew Lunn  wrote:



I hope this this thread isn't too old to bring back to life. So it seems
that Harini found that m88e did not need this errata, and Cisco
previously found that Harini's patch fixed m88e1112, we included it
internally for that reason

Now I'm getting reports that this errata fixes issues we're seeing on
m88e. We see an interrupt storm without the errata, despite the errata
not being defined in the datasheet.


Is everybody actually using interrupts? It could be in one system
phylib is polling.



Yes, we weren't using interrupts; we used phy poll.

As I recall, the register and page combination was reserved and
the access seemed to fail.
It will be useful if we can the errata description or version details.
I'll check if I can get any more information.


One of the PHY parts used was "88E-B2-bab1i000"


I doubt I can find this level of detail .. We have many of these 
machines in the field so they may have different part numbers.


I may have been given some incorrect details on the issue. I'm not 
currently sure this errata code is related. I'll let you know when I 
have more information.


Daniel

Re: netlink: 16 bytes leftover after parsing attributes in process `ip'.

2018-09-26 Thread Stephen Hemminger

On Wed, 26 Sep 2018 08:54:43 -0600
David Ahern  wrote:

> On 9/25/18 11:51 PM, Jiri Benc wrote:
> > On Tue, 25 Sep 2018 09:37:41 -0600, David Ahern wrote:  
> >> For ifaddrmsg ifa_flags aligns with ifi_type which is set by kernel side
> >> so this should be ok.  
> > 
> > Does the existing user space set ifi_type to anything? Does it zero out
> > the field?
> > 
> > Are we able to find a flag value that is not going to be set by unaware
> > user space? I.e., a bit that is unused by the current ARPHRD values on
> > both little and big endian? (ARPHRD_NONE might be a problem, though...)  
> 
> The goal is for userpsace to pass something to the kernel to
> definitively state which header is used.
> 

You can not safely assume anything about older code.
iproute2 is not the only thing using the API, others include Quagga, and other 
tools.

Sorry, this API is frozen.

Re: netlink: 16 bytes leftover after parsing attributes in process `ip'.

2018-09-26 Thread David Ahern

On 9/25/18 11:51 PM, Jiri Benc wrote:
> On Tue, 25 Sep 2018 09:37:41 -0600, David Ahern wrote:
>> For ifaddrmsg ifa_flags aligns with ifi_type which is set by kernel side
>> so this should be ok.
> 
> Does the existing user space set ifi_type to anything? Does it zero out
> the field?
> 
> Are we able to find a flag value that is not going to be set by unaware
> user space? I.e., a bit that is unused by the current ARPHRD values on
> both little and big endian? (ARPHRD_NONE might be a problem, though...)

The goal is for userpsace to pass something to the kernel to
definitively state which header is used.

ifaddrmsg (proper header and one Christian's patch wants to use) is 8
bytes; ifinfomsg (legacy header from broken userspace) is 16. If you can
not trust that ifi_type is currently 0 on a dump request then you can
not trust ifi_flags to be correct or ifi_change to be correct and so you
can not move past the header and parse attributes. If that is the case
we are done - Christian's patches should be reverted as you can never
trust what is beyond the family entry.

But I do not believe that to be the case because of the route dump
analogy. As I mentioned route dumps have the same problem: sometimes
ifinfomsg is passed and sometimes rtmsg. Yet the kernel always looks at
rtm_flags.

In terms of which field to use the most logical to me is to pass in a
flag. Current address dumps have no reason to pass in a flag so it is
not like the field can be misinterpreted. ifa_flags is a single byte so
are there really endian issues to worry about?

Re: [PATCH 3/5] ixgbe: add AF_XDP zero-copy Rx support

2018-09-26 Thread Björn Töpel

Den tis 25 sep. 2018 kl 16:57 skrev Jakub Kicinski
:
>
> On Mon, 24 Sep 2018 18:35:55 +0200, Björn Töpel wrote:
> > + if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
> > + return -EINVAL;
> > +
> > + if (adapter->flags & IXGBE_FLAG_DCB_ENABLED)
> > + return -EINVAL;
>
> Hm, should you add UMEM checks to all the places these may get
> enabled?  Like fabf1bce103a ("ixgbe: Prevent unsupported configurations
> with XDP") did?

Actually, I can remove these checks, since it's already checked by the
XDP path. AF_XDP ZC is enabled only when XDP is enabled. So there's no
need to have the XDP checks in the AF_XDP path.

Björn

Re: [PATCH net-next] net/core: make function ___gnet_stats_copy_basic() static

2018-09-26 Thread Eelco Chaudron




On 26 Sep 2018, at 14:09, Wei Yongjun wrote:

> Fixes the following sparse warning:
>
> net/core/gen_stats.c:166:1: warning:
>  symbol '___gnet_stats_copy_basic' was not declared. Should it be static?
>

Thanks for fixing my commit ;)

Acked-by: Eelco Chaudron 

> Fixes: 5e111210a443 ("net/core: Add new basic hardware counter")
> Signed-off-by: Wei Yongjun 
> ---
>  net/core/gen_stats.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
> index 65a2e82..9bf1b9a 100644
> --- a/net/core/gen_stats.c
> +++ b/net/core/gen_stats.c
> @@ -162,7 +162,7 @@
>  }
>  EXPORT_SYMBOL(__gnet_stats_copy_basic);
>
> -int
> +static int
>  ___gnet_stats_copy_basic(const seqcount_t *running,
>struct gnet_dump *d,
>struct gnet_stats_basic_cpu __percpu *cpu,

[PATCH net-next] net: aquantia: Make function aq_fw1x_set_power() static

2018-09-26 Thread Wei Yongjun

Fixes the following sparse warning:

drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c:873:5: warning:
 symbol 'aq_fw1x_set_power' was not declared. Should it be static?

Fixes: a0da96c08cfa ("net: aquantia: implement WOL support")
Signed-off-by: Wei Yongjun 
---
 drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c 
b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
index 0dd59b09..7def1cb 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
@@ -870,8 +870,8 @@ static int aq_fw1x_set_wol(struct aq_hw_s *self, bool 
wol_enabled, u8 *mac)
return err;
 }
 
-int aq_fw1x_set_power(struct aq_hw_s *self, unsigned int power_state,
- u8 *mac)
+static int aq_fw1x_set_power(struct aq_hw_s *self, unsigned int power_state,
+u8 *mac)
 {
struct hw_atl_utils_fw_rpc *prpc = NULL;
unsigned int rpc_size = 0U;

[PATCH net-next] net/tls: Make function get_rec() static

2018-09-26 Thread Wei Yongjun

Fixes the following sparse warning:

net/tls/tls_sw.c:655:16: warning:
 symbol 'get_rec' was not declared. Should it be static?

Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for 
performance")
Signed-off-by: Wei Yongjun 
---
 net/tls/tls_sw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index bcb24c4..207e3ca 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -652,7 +652,7 @@ static int memcopy_from_iter(struct sock *sk, struct 
iov_iter *from,
return rc;
 }
 
-struct tls_rec *get_rec(struct sock *sk)
+static struct tls_rec *get_rec(struct sock *sk)
 {
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);

[PATCH net-next] net/core: make function ___gnet_stats_copy_basic() static

2018-09-26 Thread Wei Yongjun

Fixes the following sparse warning:

net/core/gen_stats.c:166:1: warning:
 symbol '___gnet_stats_copy_basic' was not declared. Should it be static?

Fixes: 5e111210a443 ("net/core: Add new basic hardware counter")
Signed-off-by: Wei Yongjun 
---
 net/core/gen_stats.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 65a2e82..9bf1b9a 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -162,7 +162,7 @@
 }
 EXPORT_SYMBOL(__gnet_stats_copy_basic);
 
-int
+static int
 ___gnet_stats_copy_basic(const seqcount_t *running,
 struct gnet_dump *d,
 struct gnet_stats_basic_cpu __percpu *cpu,

[RFC PATCH iproute2-next V2] System specification exception API

2018-09-26 Thread Eran Ben Elisha

The exception spec is targeted for Real Time Alerting, in order to know when
something bad had happened to a PCI device
- Provide alert debug information
- Self healing
- If problem needs vendor support, provide a way to gather all needed debugging
  information.

The exception mechanism contains condition checkers which sense for 
malfunction. Upon a condition hit,
actions such as logs and correction can be taken.

The condition checkers are divided into the following groups
- Hardware - a checker which is triggered by the device due to
  malfunction.
- Software - a checker which is triggered by the software due to
  malfunction.
Both groups of condition checkers can be triggered due to error event or due to 
a periodic check.

Actions are the way to handle those events. Action can be in one of the
following groups:
- Dump -  SW trace, SW dump, HW trace, HW dump
- Reset - Surgical correction (e.g. modify Q, flush Q, reset of device, etc)
Actions can be performed by SW or HW.

User is allowed to enable or disable condition checkers and its action mapping.

This RFC man page patch describes the suggested API of devlink-exception in 
order
to control conditions and actions.

V2:
* Renaming terms:
health -> exception
sensor -> condition
* Remove reinit command and merge with action command.
* Consmetics in grammer.

Eran Ben Elisha (1):
  man: Add devlink exception man page

 man/man8/devlink-exception.8 | 158 +++
 1 file changed, 158 insertions(+)
 create mode 100644 man/man8/devlink-exception.8

-- 
1.8.3.1

[RFC PATCH iproute2-next V2] man: Add devlink exception man page

2018-09-26 Thread Eran Ben Elisha

Add devlink-exception man page. Devlink-exception tool will control device
exception attributes, conditions, actions and logging.

Signed-off-by: Eran Ben Elisha 

---
Copy paste man output to here for easier review process of the RFC.

DEVLINK-EXCEPTION(8)
Linux   
DEVLINK-EXCEPTION(8)

NAME
   devlink-exception - devlink exception configuration

SYNOPSIS
   devlink [ OPTIONS ] exception  { COMMAND | help }

   OPTIONS := { -V[ersion] | -n[no-nice-names] }

   devlink exception show [ DEV ] [ condition NAME ] [ action NAME ]

   devlink exception condition set DEV name NAME [ action NAME { active | 
inactive } ]

   devlink exception action set DEV name NAME period PERIOD count COUNT 
fail { ignore | down }

   devlink exception help

DESCRIPTION
   devlink-exception tool allows user to configure the way driver treats 
unexpected status. The tool allows configuration of the conditions that can 
trigger exception activity. Set for each condition the follow up opera‐
   tions, such as, reset and dump of info. In addition, set the exception 
activity termination action.

   devlink exception show - Display devlink exception conditions and actions 
attributes
   DEVSpecifies the devlink device to show.

   condition NAME
  Specifies the devlink condition to show.

   action NAME
  Specifies the devlink action to show.

   devlink exception condition set - sets devlink exception condition attributes
   DEVSpecifies the devlink device to set.

   name NAME
  Name of the condition to set.

   action NAME { active | inactive }
  Specify which actions to activate and which to deactivate 
once a condition was triggered. Actions can be dump, reset, etc.

   devlink exception action set - sets devlink action attributes.
   Once this command is launched, period and count measurement will be 
reset.

   DEVSpecifies the devlink device to set.

   name NAME
  Specifies the devlink action to set.

   period PERIOD
  The period on which we limit the amount of performed actions, 
measured in seconds.

   count COUNT
  The maximum number of actions performed in a limited time frame.

   fail   { ignore | down }
  Specify the behavior once count limit was reached.

  ignore - Skip triggering this action.

  down - Driver will remain in nonoperational state.

EXAMPLES
   devlink exception show
   Shows the exception state of all devlink devices on the system.

   devlink exception show pci/:01:00.0
   Shows the exception state of specified devlink device.

   devlink exception condition set pci/:01:00.0 name TX_COMP_ERROR 
action reset off action dump on
   Sets TX_COMP_ERROR condition parameters for a specific device.

   devlink exception action set pci/:01:00.0 name reset period 3600 
count 5 fail ignore
   Sets exception attributes for reset action. Period timer and counter 
are being reset.

SEE ALSO
   devlink(8), devlink-port(8), devlink-sb(8), devlink-monitor(8), 
devlink-dev(8),

AUTHOR
   Eran ben Elisha 

iproute2
 15 Aug 2018
DEVLINK-EXCEPTION(8)

---
 man/man8/devlink-exception.8 | 158 +++
 1 file changed, 158 insertions(+)
 create mode 100644 man/man8/devlink-exception.8

diff --git a/man/man8/devlink-exception.8 b/man/man8/devlink-exception.8
new file mode 100644
index ..03f24b32cc98
--- /dev/null
+++ b/man/man8/devlink-exception.8
@@ -0,0 +1,158 @@
+.TH DEVLINK\-EXCEPTION 8 "15 Aug 2018" "iproute2" "Linux"
+.SH NAME
+devlink-exception \- devlink exception configuration
+.SH SYNOPSIS
+.sp
+.ad l
+.in +8
+.ti -8
+.B devlink
+.RI "[ " OPTIONS " ]"
+.BR exception
+.RI  " { " COMMAND " | "
+.BR help " }"
+.sp
+
+.ti -8
+.IR OPTIONS " := { "
+\fB\-V\fR[\fIersion\fR] |
+\fB\-n\fR[\fIno-nice-names\fR] }
+
+.ti -8
+.B devlink exception show
+.RI "[ " DEV " ]"
+.RI "[ "
+.B condition
+.IR NAME
+.RI "]"
+.RI "[ "
+.B action
+.IR NAME
+.RI "]"
+
+.ti -8
+.B devlink exception condition set
+.IR DEV
+.B name
+.IR NAME
+.RI "[ "
+.BR action
+.IR NAME
+.R "{" active "|" inactive "}" ]
+
+.ti -8
+.B devlink exception action set
+.IR DEV
+.B name
+.IR NAME
+.BR period
+.IR PERIOD
+.BR count
+.IR COUNT
+.BR fail " { "
+.IR ignore
+.BR "| "
+.IR down
+.R "} "
+
+.ti -8
+.B devlink exception help
+
+.SH "DESCRIPTION"
+.B devlink-exception
+tool allows user to configure the

1 2 >

1 - 100 of 116 matches

Mail list logo