date:20170827

[PATCH] net: stmmac: constify clk_div_table

2017-08-27 Thread Arvind Yadav

clk_div_table is not supposed to change at runtime.
The meson8b_dwmac code works with a const clk_div_table,
so mark the non-const struct as const.

Signed-off-by: Arvind Yadav 
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
index 968..4404650b 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
@@ -89,7 +89,7 @@ static int meson8b_init_clk(struct meson8b_dwmac *dwmac)
char clk_name[32];
const char *clk_div_parents[1];
const char *mux_parent_names[MUX_CLK_NUM_PARENTS];
-   static struct clk_div_table clk_25m_div_table[] = {
+   static const struct clk_div_table clk_25m_div_table[] = {
{ .val = 0, .div = 5 },
{ .val = 1, .div = 10 },
{ /* sentinel */ },
-- 
1.9.1

Re: [PATCH net-next v7 05/10] landlock: Add LSM hooks related to filesystem

2017-08-27 Thread Alexei Starovoitov

On Sun, Aug 27, 2017 at 03:31:35PM +0200, Mickaël Salaün wrote:
> 
> > How can you add 3rd argument? All FS events would have to get it,
> > but in some LSM hooks such argument will be meaningless, whereas
> > in other places it will carry useful info that rule can operate on.
> > Would that mean that we'll have FS_3 event type and only few LSM
> > hooks will be converted to it. That works, but then we'll lose
> > compatibility with old rules written for FS event and that given hook.
> > Otherwise we'd need to have fancy logic to accept old FS event
> > into FS_3 LSM hook.
> 
> If we want to add a third argument to the FS event, then it will become
> accessible because its type will be different from NOT_INIT. This keeps
> compatibility with old rules because this new field was previously denied.
> 
> If we want to add a new argument but only for a subset of the hooks used
> by the FS event, then we need to create a new event, like FS_FCNTL. For
> example, we may want to add a FS_RENAME event to be able to tie the
> source file and the destination file of a rename call.

That's exactly my point. Adding another argument to the FS event for
a subset of hooks will require either a new FS_FOO event, where, to stay
backwards compatible, these hooks call _both_ FS and FS_FOO,
or some magic logic on the kernel side that allows old FS rules
to be attached to FS_FOO hooks.
Two calls don't scale, and if we do the 'magic logic', can we do it now
and avoid introducing events altogether?
For example, all landlock programs could be of a single landlock type and
declare which arg1, arg2, argN they expect. Then at attach
time the kernel only needs to verify that the hook arg types match
what the program requested.
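
As a rough illustration of the attach-time check described above (this is
not code from the patch series; every type and function name below is
hypothetical), a program could carry a list of the argument types it
expects and the kernel could compare it against what the hook supplies.
The sketch additionally assumes that a hook may supply extra trailing
arguments, which is one possible way to let old programs attach to
richer hooks:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical argument types; not taken from the Landlock patches. */
enum ll_arg_type {
	LL_ARG_NONE = 0,
	LL_ARG_FS_HANDLE,
	LL_ARG_FS_ACTION,
	LL_ARG_RENAME_DEST,
};

#define LL_MAX_ARGS 4

/* What a program declares it expects, or what a hook can supply. */
struct ll_signature {
	size_t nargs;
	enum ll_arg_type args[LL_MAX_ARGS];
};

/*
 * Attach-time check: every argument the program asks for must be
 * supplied by the hook at the same position with the same type.
 * Assumption of this sketch: extra trailing hook arguments are allowed
 * and simply ignored by the program.
 */
static bool ll_can_attach(const struct ll_signature *prog,
			  const struct ll_signature *hook)
{
	size_t i;

	if (prog->nargs > LL_MAX_ARGS || hook->nargs > LL_MAX_ARGS)
		return false;
	if (prog->nargs > hook->nargs)
		return false;
	for (i = 0; i < prog->nargs; i++)
		if (prog->args[i] != hook->args[i])
			return false;
	return true;
}
```

With this shape, a program written for two arguments would still attach
to a hook that later grows a third one, while a program that asks for the
third argument would be rejected on hooks that cannot provide it.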

> Anyway, I added the subtype/ABI version as a safeguard in case of
> unexpected future evolution.

I don't think that abi/version field adds anything in this context.
I still think it should simply be removed.

Re: [PATCH net-next v7 04/10] bpf: Define handle_fs and add a new helper bpf_handle_fs_get_mode()

2017-08-27 Thread James Morris

On Mon, 21 Aug 2017, Mickaël Salaün wrote:

> @@ -85,6 +90,8 @@ enum bpf_arg_type {
>  
>   ARG_PTR_TO_CTX, /* pointer to context */
>   ARG_ANYTHING,   /* any (initialized) argument is ok */
> +
> + ARG_CONST_PTR_TO_HANDLE_FS, /* pointer to an abstract FS struct */
>  };

Looks like a spurious empty line.

-- 
James Morris

Re: [kernel-hardening] Re: [PATCH net-next v7 02/10] bpf: Add eBPF program subtype and is_valid_subtype() verifier

2017-08-27 Thread James Morris

On Wed, 23 Aug 2017, Mickaël Salaün wrote:

> >> +  struct {
> >> +  __u32   abi; /* minimal ABI version, cf. user doc */
> > 
> > the concept of abi (version) sounds a bit weird to me.
> > Why bother with it at all?
> > Once the first set of patches lands the kernel as whole will have
> > landlock feature with a set of helpers, actions, event types.
> > Some future patches will extend the landlock feature step by step.
> > This abi concept assumes that anyone who adds new helper would need
> > to keep incrementing this 'abi'. What value does it give to user or
> > to kernel?
> > The users will already know that landlock is present in kernel 4.14
> > or whatever and the kernel 4.18 has more landlock features. Why bother
> > with extra abi number?
> 
> That's right for helpers and context fields, but we can't check the use
> of one field's content. The status field is intended to be a bitfield
> extendable in the future. For example, one use case is to set a flag to
> inform the eBPF program that it was already called with the same context
> and can skip most of its check (if not related to maps). Same goes for
> the FS action bitfield, one may want to add more of them. Another
> example may be the check for abilities. We may want to relax/remove the
> capability required to set one of them. With an ABI version, the user can
> easily check if the current kernel supports that.

Don't call it an ABI, perhaps minimum policy version (similar to 
what SELinux does).  Changes need to be made so that any existing 
userspace still works.



-- 
James Morris

Re: [PATCH net-next v7 02/10] bpf: Add eBPF program subtype and is_valid_subtype() verifier

2017-08-27 Thread James Morris

On Tue, 22 Aug 2017, Alexei Starovoitov wrote:

> more general question: what is the status of security/ bits?
> I'm assuming they still need to be reviewed and explicitly acked by
> James, right?

Yep, along with other core security developers where possible.


-- 
James Morris

Re: [kernel-hardening] [PATCH net-next v7 00/10] Landlock LSM: Toward unprivileged sandboxing

2017-08-27 Thread James Morris

On Mon, 21 Aug 2017, Mickaël Salaün wrote:

> ## Why a new LSM? Are SELinux, AppArmor, Smack and Tomoyo not good enough?
> 
> The current access control LSMs are fine for their purpose which is to give
> the *root* the ability to enforce a security policy for the *system*. What is
> missing is a way to enforce a security policy for any application by its
> developer and *unprivileged user* as seccomp can do for raw syscall filtering.
> 

You could mention here that the first case is Mandatory Access Control, 
in general terms.



-- 
James Morris

Re: Get ARP/ND tables from kernel

2017-08-27 Thread Waskiewicz Jr, Peter

On 8/27/17 9:25 PM, Bassam Alsanie wrote:
> Hello everyone,
> I am looking for a good way (stable and compatible with a large number of
> distros) to get the ARP/ND cache from the kernel to user space, for both
> IPv4 and IPv6.
> 
> It seems ioctl (SIOCGARP) can't do that; you can only get the MAC address
> for a provided IP address. ioctl can't give the full ARP/ND
> table.
> The other option is the Netlink interface. I tried it and I got the
> ARP/ND table :).
> The third option is using /proc/net/arp, which is restricted to IPv4.
> 
> There are command line utilities, which I am excluding in my case.
> 
> Is there another way to do it? What is the best way in my case?
> 
> Thank you all.

# strace arp -an
[...]
open("/proc/net/arp", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(4, "IP address   HW type Fla"..., 1024) = 310
[...]

# strace ip -6 neighbor show
[...]
socket(AF_NETLINK, SOCK_RAW|SOCK_CLOEXEC, NETLINK_ROUTE) = 3
setsockopt(3, SOL_SOCKET, SO_SNDBUF, [32768], 4) = 0
setsockopt(3, SOL_SOCKET, SO_RCVBUF, [1048576], 4) = 0
bind(3, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=}, 12) = 0
getsockname(3, {sa_family=AF_NETLINK, nl_pid=30292, nl_groups=}, 
[12]) = 0
sendto(3, {{len=40, type=RTM_GETLINK, flags=NLM_F_REQUEST|NLM_F_DUMP, 
seq=1503888680, pid=0}, 
"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\0\35\0\1\0\0\0"}, 40, 0, NULL, 0) = 40
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, 
nl_groups=}, msg_namelen=12, msg_iov=[{iov_base=[{{len=1268, 
type=RTM_NEWLINK, flags=NLM_F_MULTI, seq=1503888680, pid=30292}, 
"\0\0\4\3\1\0\0\0I\0\1\0\0\0\0\0\7\0\3\0lo\0\0\10\0\r\0\350\3\0\0"...}, 
{{len=1280, type=RTM_NEWLINK, flags=NLM_F_MULTI, seq=1503888680, 
pid=30292}, 
"\0\0\1\0\2\0\0\0C\20\1\0\0\0\0\0\t\0\3\0eno1\0\0\0\0\10\0\r\0"...}], 
iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 2548
[...]

If you don't want to use the existing tools, just look at how they get
this data: as the traces above show, arp reads /proc/net/arp for IPv4,
and ip uses netlink for IPv6.
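
For the IPv4 side, a minimal sketch of reading /proc/net/arp directly
could look like the following. The struct and function names
(arp_entry, parse_arp_line, dump_arp) are made up for this example, and
it assumes the usual six-column layout of that file (IP address, HW type,
Flags, HW address, Mask, Device) with a single header line:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One row of /proc/net/arp; field sizes are arbitrary for this sketch. */
struct arp_entry {
	char ip[64];
	unsigned int hw_type;
	unsigned int flags;
	char hw_addr[32];
	char mask[32];
	char device[32];
};

/* Parse one line; returns 0 on success, -1 for the header or a bad line. */
static int parse_arp_line(const char *line, struct arp_entry *e)
{
	int n = sscanf(line, "%63s 0x%x 0x%x %31s %31s %31s",
		       e->ip, &e->hw_type, &e->flags,
		       e->hw_addr, e->mask, e->device);

	return n == 6 ? 0 : -1;
}

/* Print every entry found in 'path'; returns the count, or -1 on error. */
static int dump_arp(const char *path)
{
	char line[256];
	struct arp_entry e;
	int count = 0;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (parse_arp_line(line, &e) == 0) {
			printf("%s -> %s on %s\n", e.ip, e.hw_addr, e.device);
			count++;
		}
	}
	fclose(f);
	return count;
}
```

Calling dump_arp("/proc/net/arp") would list the IPv4 neighbours. For
IPv6 there is no equivalent proc file, so a netlink dump as in the ip
trace above is still needed.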

Cheers,
-PJ

[PATCH net] ipv6: do not set sk_destruct in IPV6_ADDRFORM sockopt

2017-08-27 Thread Xin Long

ChunYu found a kernel warn_on during syzkaller fuzzing:

[40226.038539] WARNING: CPU: 5 PID: 23720 at net/ipv4/af_inet.c:152 inet_sock_destruct+0x78d/0x9a0
[40226.144849] Call Trace:
[40226.147590]  
[40226.149859]  dump_stack+0xe2/0x186
[40226.176546]  __warn+0x1a4/0x1e0
[40226.180066]  warn_slowpath_null+0x31/0x40
[40226.184555]  inet_sock_destruct+0x78d/0x9a0
[40226.246355]  __sk_destruct+0xfa/0x8c0
[40226.290612]  rcu_process_callbacks+0xaa0/0x18a0
[40226.336816]  __do_softirq+0x241/0x75e
[40226.367758]  irq_exit+0x1f6/0x220
[40226.371458]  smp_apic_timer_interrupt+0x7b/0xa0
[40226.376507]  apic_timer_interrupt+0x93/0xa0

The warn_on happened because sk->sk_rmem_alloc wasn't 0 in inet_sock_destruct.
Since commit f970bd9e3a06 ("udp: implement memory accounting helpers"),
udp uses udp_destruct_sock as sk_destruct, where it calls
udp_rmem_release to release all rmem.

But the IPV6_ADDRFORM sockopt sets sk_destruct to inet_sock_destruct after
changing the family to PF_INET. If rmem is not 0 at that time, nothing
releases it before inet_sock_destruct is called, and the warn_on
is triggered.

This patch fixes it by no longer setting sk_destruct in the IPV6_ADDRFORM
sockopt. IPV6_ADDRFORM only works for TCP and UDP: a TCP sock already has
its sk_destruct set to inet_sock_destruct, and a UDP sock has it set to
udp_destruct_sock, since creation.

Fixes: f970bd9e3a06 ("udp: implement memory accounting helpers")
Reported-by: ChunYu Wang 
Signed-off-by: Xin Long 
---
 net/ipv6/ipv6_sockglue.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 02d795f..a5e466d 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -242,7 +242,6 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
pktopt = xchg(&np->pktoptions, NULL);
kfree_skb(pktopt);
 
-   sk->sk_destruct = inet_sock_destruct;
/*
 * ... and add it to the refcnt debug socks count
 * in the new family. -acme
-- 
2.1.0

Re: [PATCH RFC WIP 0/5] IGMP snooping for local traffic

2017-08-27 Thread Florian Fainelli

Hi Andrew,

On 08/26/2017 01:56 PM, Andrew Lunn wrote:
> This is a WIP patchset i would like comments on from bridge,
> switchdev and hardware offload people.
> 
> The linux bridge supports IGMP snooping. It will listen to IGMP 
> reports on bridge ports and keep track of which groups have been 
> joined on an interface. It will then forward multicast based on this 
> group membership.
> 
> When the bridge adds or removes groups from an interface, it uses 
> switchdev to request the hardware add an mdb to a port, so the 
> hardware can perform the selective forwarding between ports.
> 
> What is not covered by the current bridge code, is IGMP joins/leaves 
> from the host on the brX interface. No such monitoring is performed.
> With a pure software bridge, it is not required. All multicast frames
> are passed to the brX interface, and the network stack filters them,
> as it does for any interface. However, when hardware offload is
> involved, things change. We should program the hardware to only send
> multicast packets to the host when the host has an interest in them.

OK, so if I understand this right, without a bridge, we have the
following happen today: with a DSA-enabled setup using any kind of
switch tagging protocol, if a host is interested in receiving particular
multicast traffic, we would receive IGMP joins/leaves through sw0p0, and
the stack should call ndo_set_rx_mode for sw0p0, which would be
dsa_slave_set_rx_mode() and which would synchronize the DSA master
network device with the slave network device, everything works fine
provided that the CPU port is configured to accept multicast traffic.

Note here that we don't really add a MDB entry for sw0p0 when that
happens, but it seems like we should for switches that lack IGMP
snooping and/or multicast filtering.

With the current bridge and DSA code, aren't we actually always going
to get the CPU port added with the multicast address, so that
no filtering occurs and snooping is pretty much useless?

> 
> Thus we need to perform IGMP snooping on the brX interface, just
> like any other interface of the bridge. However, currently the brX 
> interface is missing all the needed data structures to do this.
> There is no net_bridge_port structure for the brX interface. This
> structure is created when an interface is added to the bridge. But
> the brX interface is not a member of the bridge. So this patchset
> makes the brX interface a first class member of the bridge. When the
> brX interface is opened, the interface is added to the bridge. A 
> net_bridge_port is allocated for it, and IGMP snooping is performed
> as usual.

Wouldn't making brX part of the bridge also have a huge negative
performance impact on locally generated traffic? Even though we
do an early return in br_handle_frame(), this may become noticeable.

> 
> There are some complexities here. Some assumptions are broken, like 
> the master interface of a port interface is the bridge interface.
> The brX interface cannot be its own master. The use of 
> netdev_master_upper_dev_get() within the bridge code has been
> changed to reflect this. The bridge receive handler needs to not
> process frames for the brX interface, etc.
> 
> The interface downward to the hardware is also an issue. The code 
> presented here is a hack and needs to change. But that is secondary 
> and can be solved once it is agreed how the bridge needs to change
> to support this use case.
> 
> Comment welcome and wanted.

While I understand the reasons why you did it that way, I think this is
going to break a lot of code in bridge that does not expect brX to be a
bridge port member.

Maybe we can just generate switch MDB events targeting the bridge
network device and let switch drivers resolve that to whatever their
CPU/master port is?

It does sound like we are moving more and more to a model where brX
becomes one (if not the only one) net_device representor of what the
CPU/master port of a switch is (at least with DSA) which sort of makes
us go back to the multi-CPU port discussion we had a while ago.

Thanks!
-- 
Florian

Re: [PATCH net-next] bridge: fdb add and delete tracepoints

2017-08-27 Thread Florian Fainelli

On 08/27/2017 02:33 PM, Roopa Prabhu wrote:
> From: Roopa Prabhu 
> 
> Tracepoints to trace bridge forwarding database updates.

Thanks for adding this!

> 
> Signed-off-by: Roopa Prabhu 
> ---
>  include/trace/events/bridge.h | 98 
> +++
>  net/bridge/br_fdb.c   |  7 
>  net/core/net-traces.c |  6 +++
>  3 files changed, 111 insertions(+)
>  create mode 100644 include/trace/events/bridge.h
> 
> diff --git a/include/trace/events/bridge.h b/include/trace/events/bridge.h
> new file mode 100644
> index 000..e2d52cf
> --- /dev/null
> +++ b/include/trace/events/bridge.h
> @@ -0,0 +1,98 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM bridge
> +
> +#if !defined(_TRACE_BRIDGE_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_BRIDGE_H
> +
> +#include 
> +#include 
> +
> +#include "../../../net/bridge/br_private.h"
> +
> +TRACE_EVENT(br_fdb_add,
> +
> + TP_PROTO(struct ndmsg *ndm, struct net_device *dev,
> +  const unsigned char *addr, u16 vid, u16 nlh_flags),
> +
> + TP_ARGS(ndm, dev, addr, vid, nlh_flags),
> +
> + TP_STRUCT__entry(
> + __field(u8, ndm_flags)
> + __string(dev, dev->name)
> + __array(unsigned char, addr, 6)

Can you use ETH_ALEN instead of 6 here?

> + __field(u16, vid)
> + __field(u16, nlh_flags)
> + ),
> +
> + TP_fast_assign(
> + __assign_str(dev, dev->name);
> + memcpy(__entry->addr, addr, 6);

Likewise

> + __entry->vid = vid;
> + __entry->nlh_flags = nlh_flags;
> + __entry->ndm_flags = ndm->ndm_flags;
> + ),
> +
> + TP_printk("dev %s addr %02x:%02x:%02x:%02x:%02x:%02x vid %u nlh_flags 
> %x ndm_flags = %x",

I wonder if we could make %pM work for TP_printk() as this would
simplify the argument list a bit. Can you use %04x for vid, nlh_flags
and %02x for ndm_flags?

> +   __get_str(dev), __entry->addr[0], __entry->addr[1],
> +   __entry->addr[2], __entry->addr[3], __entry->addr[4],
> +   __entry->addr[5], __entry->vid,
> +   __entry->nlh_flags, __entry->ndm_flags)
> +);
> +
> +TRACE_EVENT(br_fdb_external_learn_add,
> +
> + TP_PROTO(struct net_bridge *br, struct net_bridge_port *p,
> +  const unsigned char *addr, u16 vid),
> +
> + TP_ARGS(br, p, addr, vid),
> +
> + TP_STRUCT__entry(
> + __string(br_dev, br->dev->name)
> + __string(dev, p->dev->name)
> + __array(unsigned char, addr, 6)
> + __field(u16, vid)
> + ),
> +
> + TP_fast_assign(
> + __assign_str(br_dev, br ? br->dev->name : "null");
> + __assign_str(dev, p ? p->dev->name : "null");
> + memcpy(__entry->addr, addr, 6);
> + __entry->vid = vid;
> + ),
> +
> + TP_printk("br_dev %s port %s addr %02x:%02x:%02x:%02x:%02x:%02x vid %u",
> +   __get_str(br_dev), __get_str(dev), __entry->addr[0],
> +   __entry->addr[1], __entry->addr[2], __entry->addr[3],
> +   __entry->addr[4], __entry->addr[5], __entry->vid)
> +);
> +
> +TRACE_EVENT(fdb_delete,
> +
> + TP_PROTO(struct net_bridge *br, struct net_bridge_fdb_entry *f),
> +
> + TP_ARGS(br, f),
> +
> + TP_STRUCT__entry(
> + __string(br_dev, br->dev->name)
> + __string(dev, f->dst ? f->dst->dev->name : "null")
> + __array(unsigned char, addr, 6)

Same here, using ETH_ALEN would be clearer.

> + __field(u16, vid)
> + ),
> +

Thanks!
-- 
Florian

linux-next: manual merge of the net-next tree with the rockchip tree

2017-08-27 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  arch/arm64/boot/dts/rockchip/rk3328-evb.dts

between commits:

  0e54e062692a ("arm64: dts: rockchip: add mmc nodes for rk3328 evaluation board")
  57fca160b2be ("arm64: dts: rockchip: add cpu regulator for rk3328 evaluation board")

from the rockchip tree and commit:

  4b05bc6157eb ("ARM64: dts: rockchip: Enable gmac2phy for rk3328-evb")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/arm64/boot/dts/rockchip/rk3328-evb.dts
index f82b2d0d9e86,b9f36dad17e6..
--- a/arch/arm64/boot/dts/rockchip/rk3328-evb.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3328-evb.dts
@@@ -51,217 -51,24 +51,234 @@@
stdout-path = "serial2:150n8";
};
  
 +  dc_12v: dc-12v {
 +  compatible = "regulator-fixed";
 +  regulator-name = "dc_12v";
 +  regulator-always-on;
 +  regulator-boot-on;
 +  regulator-min-microvolt = <1200>;
 +  regulator-max-microvolt = <1200>;
 +  };
 +
 +  sdio_pwrseq: sdio-pwrseq {
 +  compatible = "mmc-pwrseq-simple";
 +  pinctrl-names = "default";
 +  pinctrl-0 = <_enable_h>;
 +
 +  /*
 +   * On the module itself this is one of these (depending
 +   * on the actual card populated):
 +   * - SDIO_RESET_L_WL_REG_ON
 +   * - PDN (power down when low)
 +   */
 +  reset-gpios = < 18 GPIO_ACTIVE_LOW>;
 +  };
 +
+   vcc_phy: vcc-phy-regulator {
+   compatible = "regulator-fixed";
+   regulator-name = "vcc_phy";
+   regulator-always-on;
+   regulator-boot-on;
+   };
++
 +  vcc_sys: vcc-sys {
 +  compatible = "regulator-fixed";
 +  regulator-name = "vcc_sys";
 +  regulator-always-on;
 +  regulator-boot-on;
 +  regulator-min-microvolt = <500>;
 +  regulator-max-microvolt = <500>;
 +  vin-supply = <_12v>;
 +  };
 +
 +  vcc_sd: sdmmc-regulator {
 +  compatible = "regulator-fixed";
 +  gpio = < 30 GPIO_ACTIVE_LOW>;
 +  pinctrl-names = "default";
 +  pinctrl-0 = <_gpio>;
 +  regulator-name = "vcc_sd";
 +  regulator-min-microvolt = <330>;
 +  regulator-max-microvolt = <330>;
 +  vin-supply = <_io>;
 +  };
 +};
 +
 + {
 +  cpu-supply = <_arm>;
 +};
 +
 + {
 +  bus-width = <8>;
 +  cap-mmc-highspeed;
 +  non-removable;
 +  pinctrl-names = "default";
 +  pinctrl-0 = <_clk _cmd _bus8>;
 +  status = "okay";
  };
  
+  {
+   phy-supply = <_phy>;
+   clock_in_out = "output";
+   assigned-clocks = < SCLK_MAC2PHY_SRC>;
+   assigned-clock-rate = <5000>;
+   assigned-clocks = < SCLK_MAC2PHY>;
+   assigned-clock-parents = < SCLK_MAC2PHY_SRC>;
+   status = "okay";
+ };
+ 
 + {
 +  status = "okay";
 +
 +  rk805: rk805@18 {
 +  compatible = "rockchip,rk805";
 +  reg = <0x18>;
 +  interrupt-parent = <>;
 +  interrupts = <6 IRQ_TYPE_LEVEL_LOW>;
 +  #clock-cells = <1>;
 +  clock-output-names = "xin32k", "rk805-clkout2";
 +  gpio-controller;
 +  #gpio-cells = <2>;
 +  pinctrl-names = "default";
 +  pinctrl-0 = <_int_l>;
 +  rockchip,system-power-controller;
 +  wakeup-source;
 +
 +  vcc1-supply = <_sys>;
 +  vcc2-supply = <_sys>;
 +  vcc3-supply = <_sys>;
 +  vcc4-supply = <_sys>;
 +  vcc5-supply = <_io>;
 +  vcc6-supply = <_io>;
 +
 +  regulators {
 +  vdd_logic: DCDC_REG1 {
 +  regulator-name = "vdd_logic";
 +  regulator-min-microvolt = <712500>;
 +  regulator-max-microvolt = <145>;
 +  regulator-always-on;
 +  regulator-boot-on;
 +  regulator-state-mem {
 +  regulator-on-in-suspend;
 +  regulator-suspend-microvolt = <100>;
 +  };
 +  };
 +
 +  vdd_arm: DCDC_REG2 {
 +  regulator-name = "vdd_arm";
 +

Get ARP/ND tables from kernel

2017-08-27 Thread Bassam Alsanie

Hello everyone,
I am looking for a good way (stable and compatible with a large number of
distros) to get the ARP/ND cache from the kernel to user space, for both
IPv4 and IPv6.

It seems ioctl (SIOCGARP) can't do that; you can only get the MAC address
for a provided IP address. ioctl can't give the full ARP/ND
table.
The other option is the Netlink interface. I tried it and I got the
ARP/ND table :).
The third option is using /proc/net/arp, which is restricted to IPv4.

There are command line utilities, which I am excluding in my case.

Is there another way to do it? What is the best way in my case?

Thank you all.

Re: [PATCH net-next 3/4] net/core: Add violation counters to VF statisctics

2017-08-27 Thread Jakub Kicinski

On Sun, 27 Aug 2017 14:06:17 +0300, Saeed Mahameed wrote:
> From: Eugenia Emantayev 
> 
> Add receive and transmit violation counters to be
> displayed in iproute2 VF statistics.
> 
> Signed-off-by: Eugenia Emantayev 
> Signed-off-by: Saeed Mahameed 
> ---
>  include/linux/if_link.h  |  2 ++
>  include/uapi/linux/if_link.h |  2 ++
>  net/core/rtnetlink.c | 10 +-
>  3 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> index da70af27e42e..ebf3448acb5b 100644
> --- a/include/linux/if_link.h
> +++ b/include/linux/if_link.h
> @@ -12,6 +12,8 @@ struct ifla_vf_stats {
>   __u64 tx_bytes;
>   __u64 broadcast;
>   __u64 multicast;
> + __u64 rx_dropped;
> + __u64 tx_dropped;

I'm a little concerned that you call those violation counters in the
commit message.  Do you expect them to only be used if the VF traffic
indeed violates some admin-set rules?  I would imagine HW/FW may drop
frames in certain situations and naming the counters *_dropped suggests
it would be OK to increment them even if the drop reason was not any
sort of violation.  Would you mind clarifying?

Re: [PATCH net-next 1/4] net: Add SRIOV VGT+ support

2017-08-27 Thread Jakub Kicinski

On Sun, 27 Aug 2017 14:06:15 +0300, Saeed Mahameed wrote:
> From: Mohamad Haj Yahia 
> 
> VGT+ is a security feature that gives the administrator the ability of
> controlling the allowed vlan-ids list that can be transmitted/received
> from/to the VF.
> The allowed vlan-ids list is called "trunk".
> Admin can add/remove a range of allowed vlan-ids via iptool.
> Example:
> After this series of configuration :
> 1) ip link set eth3 vf 0 trunk add 10 100 (allow vlan-id 10-100, default tpid 
> 0x8100)
> 2) ip link set eth3 vf 0 trunk add 105 proto 802.1q (allow vlan-id 105 tpid 
> 0x8100)
> 3) ip link set eth3 vf 0 trunk add 105 proto 802.1ad (allow vlan-id 105 tpid 
> 0x88a8)
> 4) ip link set eth3 vf 0 trunk rem 90 (block vlan-id 90)
> 5) ip link set eth3 vf 0 trunk rem 50 60 (block vlan-ids 50-60)
> 
> The VF 0 can only communicate on vlan-ids: 10-49,61-89,91-100,105 with
> tpid 0x8100 and vlan-id 105 with tpid 0x88a8.
> 
> For this purpose we added the following netlink sr-iov commands:
> 
> 1) IFLA_VF_VLAN_RANGE: used to add/remove allowed vlan-ids range.
> We added the ifla_vf_vlan_range struct to specify the range we want to
> add/remove from the userspace.
> We added ndo_add_vf_vlan_trunk_range and ndo_del_vf_vlan_trunk_range
> netdev ops to add/remove allowed vlan-ids range in the netdev.
> 
> 2) IFLA_VF_VLAN_TRUNK: used to query the allowed vlan-ids trunk.
> We added trunk bitmap to the ifla_vf_info struct to get the current
> allowed vlan-ids trunk from the netdev.
> We added ifla_vf_vlan_trunk struct for sending the allowed vlan-ids
> trunk to the userspace.
> 
> Signed-off-by: Mohamad Haj Yahia 
> Signed-off-by: Eugenia Emantayev 
> Signed-off-by: Saeed Mahameed 

Interesting work, I have some minor questions if you don't mind :)

I was under the impression that "trunk" is a vendor-specific term; would it
make sense to drop it from the APIs?

> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index 8d062c58d5cb..3aa895c5fbc1 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -168,6 +168,8 @@ enum {
>  #ifndef __KERNEL__
>  #define IFLA_RTA(r)  ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct ifinfomsg))))
>  #define IFLA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct ifinfomsg))
> +#define BITS_PER_BYTE 8
> +#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
>  #endif
>  
>  enum {
> @@ -645,6 +647,8 @@ enum {
>   IFLA_VF_IB_NODE_GUID,   /* VF Infiniband node GUID */
>   IFLA_VF_IB_PORT_GUID,   /* VF Infiniband port GUID */
>   IFLA_VF_VLAN_LIST,  /* nested list of vlans, option for QinQ */
> + IFLA_VF_VLAN_RANGE, /* add/delete vlan range filtering */
> + IFLA_VF_VLAN_TRUNK, /* vlan trunk filtering */
>   __IFLA_VF_MAX,
>  };
>  
> @@ -669,6 +673,7 @@ enum {
>  
>  #define IFLA_VF_VLAN_INFO_MAX (__IFLA_VF_VLAN_INFO_MAX - 1)
>  #define MAX_VLAN_LIST_LEN 1
> +#define VF_VLAN_N_VID 4096
>  
>  struct ifla_vf_vlan_info {
>   __u32 vf;
> @@ -677,6 +682,21 @@ struct ifla_vf_vlan_info {
>   __be16 vlan_proto; /* VLAN protocol either 802.1Q or 802.1ad */
>  };
>  
> +struct ifla_vf_vlan_range {
> + __u32 vf;
> + __u32 start_vid;   /* 1 - 4095 */
> + __u32 end_vid; /* 1 - 4095 */
> + __u32 setting;
> + __be16 vlan_proto; /* VLAN protocol either 802.1Q or 802.1ad */
> +};
> +
> +#define VF_VLAN_BITMAP   DIV_ROUND_UP(VF_VLAN_N_VID, sizeof(__u64) * 
> BITS_PER_BYTE)
> +struct ifla_vf_vlan_trunk {
> + __u32 vf;
> + __u64 allowed_vlans_8021q_bm[VF_VLAN_BITMAP];
> + __u64 allowed_vlans_8021ad_bm[VF_VLAN_BITMAP];
> +};

Would you mind explaining why you chose to make the API asymmetrical
like that?  I mean the set operation is range-based, yet the get
returns a bitmask.  You seem to solely depend on the bitmasks in the
driver anyway...
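
To make the range-versus-bitmap relationship concrete, here is a small
userspace sketch (not code from the series; trunk_bm, trunk_update and
trunk_allows are hypothetical names) that applies range add/remove
operations to a bitmap sized the same way as VF_VLAN_BITMAP in the quoted
patch. It models only one protocol's bitmap:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VF_VLAN_N_VID 4096
#define BITS_PER_BYTE 8
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
/* 4096 VIDs / 64 bits per word = 64 words */
#define VF_VLAN_BITMAP DIV_ROUND_UP(VF_VLAN_N_VID, sizeof(uint64_t) * BITS_PER_BYTE)

/* Hypothetical per-protocol trunk bitmap: bit N set means VID N allowed. */
struct trunk_bm {
	uint64_t bits[VF_VLAN_BITMAP];
};

/* Allow or block the inclusive range [start, end]; valid VIDs are 1-4095. */
static int trunk_update(struct trunk_bm *bm, unsigned int start,
			unsigned int end, bool allow)
{
	unsigned int vid;

	if (start < 1 || end >= VF_VLAN_N_VID || start > end)
		return -1;
	for (vid = start; vid <= end; vid++) {
		uint64_t mask = (uint64_t)1 << (vid % 64);

		if (allow)
			bm->bits[vid / 64] |= mask;
		else
			bm->bits[vid / 64] &= ~mask;
	}
	return 0;
}

static bool trunk_allows(const struct trunk_bm *bm, unsigned int vid)
{
	if (vid >= VF_VLAN_N_VID)
		return false;
	return (bm->bits[vid / 64] >> (vid % 64)) & 1;
}
```

Replaying the 802.1Q part of the commit message example (add 10-100,
add 105, remove 90, remove 50-60) on a zeroed bitmap leaves VIDs 10-49,
61-89, 91-100 and 105 set, which matches the list given there.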

>  struct ifla_vf_tx_rate {
>   __u32 vf;
>   __u32 rate; /* Max TX bandwidth in Mbps, 0 disables throttling */
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index a78fd61da0ec..56909f11d88e 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -827,6 +827,7 @@ static inline int rtnl_vfinfo_size(const struct 
> net_device *dev,
>nla_total_size(MAX_VLAN_LIST_LEN *
>   sizeof(struct ifla_vf_vlan_info)) +
>nla_total_size(sizeof(struct ifla_vf_spoofchk)) +
> +  nla_total_size(sizeof(struct ifla_vf_vlan_trunk)) +
>nla_total_size(sizeof(struct ifla_vf_tx_rate)) +
>nla_total_size(sizeof(struct ifla_vf_rate)) +
>nla_total_size(sizeof(struct ifla_vf_link_state)) +
> @@ -1098,31 +1099,43 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct 
> sk_buff *skb,
>   struct ifla_vf_link_state vf_linkstate;
>   struct ifla_vf_vlan_info

[net-next 04/15] i40e: Use correct flag to enable egress traffic for unicast promisc

2017-08-27 Thread Jeff Kirsher

From: Akeem G Abodunrin 

Although we usually set true promiscuous mode for both multicast and
unicast at the same time, each can be set individually. Using the
allmulti flag, which applies only to all-multicast, might therefore
cause unwanted behavior when mirroring egress traffic for unicast
promiscuous mode in a VF.

Signed-off-by: Akeem G Abodunrin 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 057c77be96e4..27d87bef4ba3 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -1758,7 +1758,7 @@ static int i40e_vc_config_promiscuous_mode_msg(struct i40e_vf *vf,
}
} else {
aq_ret = i40e_aq_set_vsi_unicast_promiscuous(hw, vsi->seid,
-allmulti, NULL,
+alluni, NULL,
 true);
aq_err = pf->hw.aq.asq_last_status;
if (aq_ret) {
-- 
2.14.1

[net-next 02/15] i40e: Store the requested FEC information

2017-08-27 Thread Jeff Kirsher

From: Mariusz Stachura 

Store information about the FEC modes that were requested. It will be
used by the function that prints link status information, so there is no
need to call the admin queue there.

Signed-off-by: Mariusz Stachura 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_common.c | 4 
 drivers/net/ethernet/intel/i40e/i40e_type.h   | 1 +
 drivers/net/ethernet/intel/i40evf/i40e_type.h | 1 +
 3 files changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 8e082a946411..5c36a18a31be 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -2529,6 +2529,10 @@ i40e_status i40e_update_link_info(struct i40e_hw *hw)
if (status)
return status;
 
+   hw->phy.link_info.req_fec_info =
+   abilities.fec_cfg_curr_mod_ext_info &
+   (I40E_AQ_REQUEST_FEC_KR | I40E_AQ_REQUEST_FEC_RS);
+
memcpy(hw->phy.link_info.module_type, &abilities.module_type,
   sizeof(hw->phy.link_info.module_type));
}
diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h b/drivers/net/ethernet/intel/i40e/i40e_type.h
index 3a18ed13edc4..fd4bbdd88b57 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_type.h
@@ -185,6 +185,7 @@ struct i40e_link_status {
enum i40e_aq_link_speed link_speed;
u8 link_info;
u8 an_info;
+   u8 req_fec_info;
u8 fec_info;
u8 ext_info;
u8 loopback;
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_type.h b/drivers/net/ethernet/intel/i40evf/i40e_type.h
index bde7f24af1c6..2ea919d9cdcf 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_type.h
@@ -159,6 +159,7 @@ struct i40e_link_status {
enum i40e_aq_link_speed link_speed;
u8 link_info;
u8 an_info;
+   u8 req_fec_info;
u8 fec_info;
u8 ext_info;
u8 loopback;
-- 
2.14.1

[net-next 13/15] i40e: invert logic for checking incorrect cpu vs irq affinity

2017-08-27 Thread Jeff Kirsher

From: Jacob Keller 

In commit 96db776a3682 ("i40e/vf: fix interrupt affinity bug")
we added some code to force exit of polling in case we did
not have the correct CPU. This is important since it was possible for
the IRQ affinity to be changed while the CPU is pegged at 100%. This can
result in the polling routine being stuck on the wrong CPU until
traffic finally stops.

Unfortunately, the implementation, "if the CPU is correct, exit as
normal, otherwise, fall-through to the end-polling exit" is incredibly
confusing to reason about. In this case, the normal flow looks like the
exception, while the exception actually occurs far away from the if
statement and comment.

We recently discovered and fixed a bug in this code because we were
incorrectly initializing the affinity mask.

Re-write the code so that the exceptional case is handled at the check,
rather than having the logic be spread through the regular exit flow.
This does end up with minor code duplication, but the resulting code is
much easier to reason about.

The new logic is identical, but inverted. If we are running on a CPU not
in our affinity mask, we'll exit polling. However, the code flow is much
easier to understand.
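
As a rough illustration of the claim that the two shapes behave the same,
here is a small user-space model. It is only a sketch: the enum, the
function names and the two boolean inputs are hypothetical stand-ins, not
driver code, and the model ignores the MSI-X term that the old condition
also tested.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for how one pass through the poll routine ends. */
enum poll_exit {
	KEEP_POLLING,        /* return budget, stay in polling mode */
	STOP_AND_FORCE_IRQ,  /* complete NAPI, then force a SW interrupt */
	STOP_AND_ENABLE_ITR, /* complete NAPI, then re-enable the interrupt */
};

/* Old shape: the normal case sits inside the if, the exception falls through. */
static enum poll_exit old_flow(bool clean_complete, bool cpu_in_mask)
{
	if (!clean_complete) {
		if (cpu_in_mask)
			return KEEP_POLLING;
		/* wrong CPU: fall through to the shared exit path */
	}

	if (!clean_complete)
		return STOP_AND_FORCE_IRQ;
	return STOP_AND_ENABLE_ITR;
}

/* New shape: the exception is handled right where it is detected. */
static enum poll_exit new_flow(bool clean_complete, bool cpu_in_mask)
{
	if (!clean_complete) {
		if (!cpu_in_mask)
			return STOP_AND_FORCE_IRQ;
		return KEEP_POLLING;
	}

	return STOP_AND_ENABLE_ITR;
}

/* Walk all four input combinations and check that both shapes agree. */
static void assert_flows_agree(void)
{
	for (int c = 0; c < 2; c++)
		for (int m = 0; m < 2; m++)
			assert(old_flow(c, m) == new_flow(c, m));
}
```

In the new shape the wrong-CPU case is the only branch that ends in
STOP_AND_FORCE_IRQ, which is what makes it easier to spot when reading.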

Note that we don't actually have to check for MSI-X, because in the MSI
case we'll only have one q_vector, but its default affinity mask should
be correct as it includes all CPUs when it's initialized. Further, we
could at some point add code to setup the notifier for the non-MSI-X
case and enable this workaround for that case too, if desired, though
there isn't much gain since it's unlikely to be the common case.

Signed-off-by: Jacob Keller 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 31 +--
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 30 +-
 2 files changed, 30 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 5c1edcce9459..3999afea518b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2369,7 +2369,6 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
 
/* If work not completed, return budget and polling will return */
if (!clean_complete) {
-   const cpumask_t *aff_mask = &q_vector->affinity_mask;
int cpu_id = smp_processor_id();
 
/* It is possible that the interrupt affinity has changed but,
@@ -2379,15 +2378,22 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
 * continue to poll, otherwise we must stop polling so the
 * interrupt can move to the correct cpu.
 */
-   if (likely(cpumask_test_cpu(cpu_id, aff_mask) ||
-  !(vsi->back->flags & I40E_FLAG_MSIX_ENABLED))) {
+   if (!cpumask_test_cpu(cpu_id, &q_vector->affinity_mask)) {
+   /* Tell napi that we are done polling */
+   napi_complete_done(napi, work_done);
+
+   /* Force an interrupt */
+   i40e_force_wb(vsi, q_vector);
+
+   /* Return budget-1 so that polling stops */
+   return budget - 1;
+   }
 tx_only:
-   if (arm_wb) {
-   q_vector->tx.ring[0].tx_stats.tx_force_wb++;
-   i40e_enable_wb_on_itr(vsi, q_vector);
-   }
-   return budget;
+   if (arm_wb) {
+   q_vector->tx.ring[0].tx_stats.tx_force_wb++;
+   i40e_enable_wb_on_itr(vsi, q_vector);
}
+   return budget;
}
 
if (vsi->back->flags & I40E_TXR_FLAGS_WB_ON_ITR)
@@ -2396,14 +2402,7 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
/* Work is done so exit the polling mode and re-enable the interrupt */
napi_complete_done(napi, work_done);
 
-   /* If we're prematurely stopping polling to fix the interrupt
-* affinity we want to make sure polling starts back up so we
-* issue a call to i40e_force_wb which triggers a SW interrupt.
-*/
-   if (!clean_complete)
-   i40e_force_wb(vsi, q_vector);
-   else
-   i40e_update_enable_itr(vsi, q_vector);
+   i40e_update_enable_itr(vsi, q_vector);
 
return min(work_done, budget - 1);
 }
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index d91676ccf125..f15e341ada9e 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -1575,7 +1575,6 @@ int i40evf_napi_poll(struct napi_struct *napi,

[net-next 11/15] i40e: move enabling icr0 into i40e_update_enable_itr

2017-08-27 Thread Jeff Kirsher

From: Jacob Keller 

If we don't have MSI-X enabled, we handle interrupts on all icr0. This
is a special case, so let's move the conditional into
i40e_update_enable_itr() in order to make i40e_napi_poll() easier to
read.

Signed-off-by: Jacob Keller 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 8a969d8f0790..5c1edcce9459 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2243,6 +2243,12 @@ static inline void i40e_update_enable_itr(struct 
i40e_vsi *vsi,
int idx = q_vector->v_idx;
int rx_itr_setting, tx_itr_setting;
 
+   /* If we don't have MSIX, then we only need to re-enable icr0 */
+   if (!(vsi->back->flags & I40E_FLAG_MSIX_ENABLED)) {
+   i40e_irq_dynamic_enable_icr0(vsi->back, false);
+   return;
+   }
+
vector = (q_vector->v_idx + vsi->base_vector);
 
/* avoid dynamic calculation if in countdown mode OR if
@@ -2396,8 +2402,6 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
 */
if (!clean_complete)
i40e_force_wb(vsi, q_vector);
-   else if (!(vsi->back->flags & I40E_FLAG_MSIX_ENABLED))
-   i40e_irq_dynamic_enable_icr0(vsi->back, false);
else
i40e_update_enable_itr(vsi, q_vector);
 
-- 
2.14.1

[net-next 00/15][pull request] 40GbE Intel Wired LAN Driver Updates 2017-08-27

2017-08-27 Thread Jeff Kirsher

This series contains updates to i40e and i40evf only.

Sudheer updates code comments and the state variable so that
adminq_subtask will have accurate information whenever it gets scheduled.

Mariusz stores information about FEC modes, to be used for printing link
state information, so that we do not need to call admin queue when
reporting link status.  Adds VF support for controlling VLAN tag
stripping via ethtool.

Jake provides the majority of changes in this series, starting with
increasing the size of the prefix buffer so that it can hold enough
characters for every possible input, which prevents snprintf truncation.
Fixed other string truncation errors/warnings produced by GCC 7.x.
Removed an unnecessary workaround for resetting XPS.  Fixed an issue
where there is a mismatched affinity mask value, so initialize the value
to cpu_possible_mask and invert the logic for checking incorrect CPU vs
IRQ affinity so that the exceptional case is handled at the check.
Removed ULTRA latency mode due to several issues found and will be
looking at a better solution for small packet workloads.

Akeem fixes an issue where the incorrect flag was being used to set
promiscuous mode for unicast, which was enabling promiscuous mode only
for multicast instead of unicast.

Carolyn fixes an issue where an error return value is set, but this
value can be overwritten before we actually do exit the function.  So
remove the error code assignment and add code comments for better
understanding on why we do not need to set and return the error.

The following are changes since commit ec15ecdee5eb9e33a565e1e8eaef39fd4de565cb:
  net: mvpp2: fix the packet size configuration for 10G
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE

Akeem G Abodunrin (1):
  i40e: Use correct flag to enable egress traffic for unicast promisc

Carolyn Wyborny (1):
  i40e: Fix for unused value issue found by static analysis

Jacob Keller (9):
  i40e: prevent snprintf format specifier truncation
  i40evf: fix possible snprintf truncation of q_vector->name
  i40e: force VMDQ device name truncation
  i40e: remove workaround for resetting XPS
  i40e: move enabling icr0 into i40e_update_enable_itr
  i40e: initialize our affinity_mask based on cpu_possible_mask
  i40e: invert logic for checking incorrect cpu vs irq affinity
  i40e/i40evf: remove ULTRA latency mode
  i40e/i40evf: avoid dynamic ITR updates when polling or low packet rate

Mariusz Stachura (3):
  i40e: Store the requested FEC information
  i40e/i40evf: support for VF VLAN tag stripping control
  i40e: 25G FEC status improvements

Sudheer Mogilappagari (1):
  i40e: Update state variable for adminq subtask

 drivers/net/ethernet/intel/i40e/i40e_common.c  |  8 ++-
 drivers/net/ethernet/intel/i40e/i40e_main.c| 58 ++--
 drivers/net/ethernet/intel/i40e/i40e_nvm.c | 10 ++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c| 78 +++---
 drivers/net/ethernet/intel/i40e/i40e_txrx.h|  2 +-
 drivers/net/ethernet/intel/i40e/i40e_type.h|  1 +
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 62 -
 drivers/net/ethernet/intel/i40evf/i40e_common.c|  4 +-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c  | 69 +--
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h  |  2 +-
 drivers/net/ethernet/intel/i40evf/i40e_type.h  |  1 +
 drivers/net/ethernet/intel/i40evf/i40evf.h |  6 +-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c| 61 +
 .../net/ethernet/intel/i40evf/i40evf_virtchnl.c| 40 +++
 include/linux/avf/virtchnl.h   |  5 ++
 15 files changed, 285 insertions(+), 122 deletions(-)

-- 
2.14.1

[net-next 06/15] i40e: force VMDQ device name truncation

2017-08-27 Thread Jeff Kirsher

From: Jacob Keller 

In new versions of GCC since 7.x a new warning exists which warns when
a string is truncated before all of the format can be completed.

When we setup VMDQ netdev names we are copying a pre-existing interface
name which could be up to 15 characters in length. Since we also add
4 bytes, v, the literal %, the d and a \0 null, we would overrun the
available size unless snprintf truncated for us.

The snprintf call will of course truncate on the end, so let's instead
modify the code to force truncation of the copied netdev name by
4 characters, to create enough space for the 4 bytes we're adding.
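
The precision trick described above can be tried in user space. This is a
minimal sketch, assuming IFNAMSIZ is 16 as on Linux; the helper name
make_vmdq_name_template() is made up for the example and does not exist
in the driver.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define IFNAMSIZ 16 /* same value as the Linux definition */

/*
 * Write "<base>v%d" into dst. The "%.*s" precision caps how many bytes of
 * base are copied, so the 4 bytes for 'v', '%', 'd' and the NUL always fit.
 * Returns what snprintf returns: the length that was wanted, without NUL.
 */
static int make_vmdq_name_template(char dst[IFNAMSIZ], const char *base)
{
	assert(base != NULL);
	return snprintf(dst, IFNAMSIZ, "%.*sv%%d", IFNAMSIZ - 4, base);
}
```

With a 15 character base name the copy stops after 12 characters, so the
result is 15 characters plus the NUL and the "v%d" suffix survives intact.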

Signed-off-by: Jacob Keller 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index b0ccd3c2eec6..3a6a752c6c58 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9690,8 +9690,13 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
i40e_add_mac_filter(vsi, mac_addr);
spin_unlock_bh(&vsi->mac_filter_hash_lock);
} else {
-   /* relate the VSI_VMDQ name to the VSI_MAIN name */
-   snprintf(netdev->name, IFNAMSIZ, "%sv%%d",
+   /* Relate the VSI_VMDQ name to the VSI_MAIN name. Note that we
+* are still limited by IFNAMSIZ, but we're adding 'v%d\0' to
+* the end, which is 4 bytes long, so force truncation of the
+* original name by IFNAMSIZ - 4
+*/
+   snprintf(netdev->name, IFNAMSIZ, "%.*sv%%d",
+IFNAMSIZ - 4,
 pf->vsi[pf->lan_vsi]->netdev->name);
random_ether_addr(mac_addr);
 
-- 
2.14.1

[net-next 09/15] i40e: Fix for unused value issue found by static analysis

2017-08-27 Thread Jeff Kirsher

From: Carolyn Wyborny 

This patch fixes an issue where an error return value is
set, but without an immediate exit, the value can be overwritten
by the following code execution.  The condition at this point
is not fatal, so remove the error assignment and comment the
intent for future code maintainers.

Signed-off-by: Carolyn Wyborny 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 5a06cd23b9e6..0962b85ef6f3 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9884,13 +9884,15 @@ static int i40e_add_vsi(struct i40e_vsi *vsi)
 */
ret = i40e_vsi_config_tc(vsi, enabled_tc);
if (ret) {
+   /* Single TC condition is not fatal,
+* message and continue
+*/
dev_info(&pf->pdev->dev,
 "failed to configure TCs for main VSI 
tc_map 0x%08x, err %s aq_err %s\n",
 enabled_tc,
 i40e_stat_str(&pf->hw, ret),
 i40e_aq_str(&pf->hw,
pf->hw.aq.asq_last_status));
-   ret = -ENOENT;
}
}
break;
-- 
2.14.1

[net-next 05/15] i40evf: fix possible snprintf truncation of q_vector->name

2017-08-27 Thread Jeff Kirsher

From: Jacob Keller 

The q_vector names are based on the interface name with a driver prefix,
the type of q_vector setup, and the queue number. We previously set the
size of this variable to IFNAMSIZ + 9, which is incorrect, because we
actually include a minimum of 14 characters extra beyond the interface
name size.

New versions of GCC since 7 include a new warning that detects this
possible truncation and complains. We can fix this by increasing the
size in case our interface name is too large to avoid truncation. We
don't need to go beyond 14 because the compiler is smart enough to
realize our values can never exceed size of 1. We do go up to 15 here
because possible future changes may increase the number of queues beyond
one digit.
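
The byte counts above can be checked with snprintf's measuring mode (a
NULL buffer with size 0). This is a sketch only; q_vector_name_bytes() is
a hypothetical helper, and IFNAMSIZ is assumed to be 16 as on Linux.

```c
#include <assert.h>
#include <stdio.h>

#define IFNAMSIZ 16 /* same value as the Linux definition */

/* Bytes needed, including the NUL, for "i40evf-<basename>-TxRx-<idx>". */
static int q_vector_name_bytes(const char *basename, unsigned int idx)
{
	assert(basename != NULL);
	return snprintf(NULL, 0, "i40evf-%s-TxRx-%u", basename, idx) + 1;
}
```

For the longest possible interface name (15 characters) a one-digit index
needs IFNAMSIZ + 14 bytes and a two-digit index needs IFNAMSIZ + 15.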

While we are here, also change some variables to be unsigned (since they
are never negative) and stop using an extra unnecessary %s format
specifier.

Signed-off-by: Jacob Keller 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40evf/i40evf.h  |  2 +-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 21 +
 2 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf.h 
b/drivers/net/ethernet/intel/i40evf/i40evf.h
index d310544c6c6e..e5293d35fb6a 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf.h
+++ b/drivers/net/ethernet/intel/i40evf/i40evf.h
@@ -121,7 +121,7 @@ struct i40e_q_vector {
 #define ITR_COUNTDOWN_START 100
u8 itr_countdown;   /* when 0 or 1 update ITR */
int v_idx;  /* vector index in list */
-   char name[IFNAMSIZ + 9];
+   char name[IFNAMSIZ + 15];
bool arm_wb_state;
cpumask_t affinity_mask;
struct irq_affinity_notify affinity_notify;
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c 
b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 0d87191b6bac..258e8e27068b 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -543,9 +543,9 @@ static void i40evf_irq_affinity_release(struct kref *ref) {}
 static int
 i40evf_request_traffic_irqs(struct i40evf_adapter *adapter, char *basename)
 {
-   int vector, err, q_vectors;
-   int rx_int_idx = 0, tx_int_idx = 0;
-   int irq_num;
+   unsigned int vector, q_vectors;
+   unsigned int rx_int_idx = 0, tx_int_idx = 0;
+   int irq_num, err;
 
i40evf_irq_disable(adapter);
/* Decrement for Other and TCP Timer vectors */
@@ -556,18 +556,15 @@ i40evf_request_traffic_irqs(struct i40evf_adapter 
*adapter, char *basename)
irq_num = adapter->msix_entries[vector + NONQ_VECS].vector;
 
if (q_vector->tx.ring && q_vector->rx.ring) {
-   snprintf(q_vector->name, sizeof(q_vector->name) - 1,
-"i40evf-%s-%s-%d", basename,
-"TxRx", rx_int_idx++);
+   snprintf(q_vector->name, sizeof(q_vector->name),
+"i40evf-%s-TxRx-%d", basename, rx_int_idx++);
tx_int_idx++;
} else if (q_vector->rx.ring) {
-   snprintf(q_vector->name, sizeof(q_vector->name) - 1,
-"i40evf-%s-%s-%d", basename,
-"rx", rx_int_idx++);
+   snprintf(q_vector->name, sizeof(q_vector->name),
+"i40evf-%s-rx-%d", basename, rx_int_idx++);
} else if (q_vector->tx.ring) {
-   snprintf(q_vector->name, sizeof(q_vector->name) - 1,
-"i40evf-%s-%s-%d", basename,
-"tx", tx_int_idx++);
+   snprintf(q_vector->name, sizeof(q_vector->name),
+"i40evf-%s-tx-%d", basename, tx_int_idx++);
} else {
/* skip this unused q_vector */
continue;
-- 
2.14.1

[net-next 03/15] i40e: prevent snprintf format specifier truncation

2017-08-27 Thread Jeff Kirsher

From: Jacob Keller 

Increase the size of the prefix buffer so that it can hold enough
characters for every possible input. Although 20 is enough for all
expected inputs, it is possible for the values to be larger than
expected, resulting in a possibly truncated string. Additionally, let's
use sizeof(prefix) in order to ensure we use the correct size if we need
to change the array length in the future.

New versions of GCC starting at 7 now include warnings to prevent
truncation unless you handle the return code. At most 27 bytes can be
written here, so let's just increase the buffer size even if for all
expected hw->bus.* values we only needed 20.
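
To see where 20 and 27 come from, the prefix length can be measured in
user space. This is a sketch under assumptions: the bus fields are treated
as at most 16 bits wide (their types are not shown in this patch), the
driver name is passed through %s instead of being part of the format
string, and aq_prefix_bytes() is a made-up helper.

```c
#include <assert.h>
#include <stdio.h>

/* Bytes needed, including the NUL, for the admin queue debug prefix. */
static int aq_prefix_bytes(const char *drv, unsigned int bus,
			   unsigned int dev, unsigned int func)
{
	assert(drv != NULL);
	return snprintf(NULL, 0, "%s %02x:%02x.%x: \t0x",
			drv, bus, dev, func) + 1;
}
```

With 8-bit values the "i40e" prefix needs 19 bytes, which fits in the old
20 byte buffer. With 0xffff in all three fields the "i40evf" prefix needs
27 bytes, which is the worst case under the 16-bit assumption.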

Signed-off-by: Jacob Keller 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_common.c   | 4 ++--
 drivers/net/ethernet/intel/i40evf/i40e_common.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c 
b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 5c36a18a31be..111426ba5fbc 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -328,9 +328,9 @@ void i40e_debug_aq(struct i40e_hw *hw, enum i40e_debug_mask 
mask, void *desc,
len = buf_len;
/* write the full 16-byte chunks */
if (hw->debug_mask & mask) {
-   char prefix[20];
+   char prefix[27];
 
-   snprintf(prefix, 20,
+   snprintf(prefix, sizeof(prefix),
 "i40e %02x:%02x.%x: \t0x",
 hw->bus.bus_id,
 hw->bus.device,
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_common.c 
b/drivers/net/ethernet/intel/i40evf/i40e_common.c
index d69c2e44cd1a..8d3a2bfe186a 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_common.c
@@ -333,9 +333,9 @@ void i40evf_debug_aq(struct i40e_hw *hw, enum 
i40e_debug_mask mask, void *desc,
len = buf_len;
/* write the full 16-byte chunks */
if (hw->debug_mask & mask) {
-   char prefix[20];
+   char prefix[27];
 
-   snprintf(prefix, 20,
+   snprintf(prefix, sizeof(prefix),
 "i40evf %02x:%02x.%x: \t0x",
 hw->bus.bus_id,
 hw->bus.device,
-- 
2.14.1

[net-next 12/15] i40e: initialize our affinity_mask based on cpu_possible_mask

2017-08-27 Thread Jeff Kirsher

From: Jacob Keller 

On older kernels a call to irq_set_affinity_hint does not guarantee that
the IRQ affinity will be set. If nothing else on the system sets the IRQ
affinity this can result in a bug in the i40e_napi_poll() routine where
we notice that our interrupt fired on the "wrong" CPU according to our
internal affinity_mask variable.

This results in a bug where we continuously tell NAPI to stop polling to
move the interrupt to a new CPU, but the CPU never changes because our
affinity mask does not match the actual mask setup for the IRQ.

The root problem is a mismatched affinity mask value. So let's initialize
the value to cpu_possible_mask instead. This ensures that prior to the
first time we get an IRQ affinity notification we'll have the mask set
to include every possible CPU.

We use cpu_possible_mask instead of cpu_online_mask since the former is
almost certainly never going to change, while the latter might change
after we've made a copy.
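
The effect of the initial mask value can be shown with a tiny user-space
model. This is only a sketch: a 32-bit integer stands in for cpumask_t,
NR_CPUS_MODEL is an arbitrary CPU count, and all names here are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_MODEL 8 /* hypothetical CPU count for this model */

/* Bitmask stand-in for cpumask_t: bit n set means CPU n is allowed. */
typedef uint32_t cpu_mask_model;

/* Old initial value: only the CPU picked at allocation time. */
static cpu_mask_model mask_single_cpu(unsigned int cpu)
{
	assert(cpu < NR_CPUS_MODEL);
	return (cpu_mask_model)1u << cpu;
}

/* New initial value: every possible CPU. */
static cpu_mask_model mask_all_possible(void)
{
	return ((cpu_mask_model)1u << NR_CPUS_MODEL) - 1u;
}

/* Models the poll-time check: must we stop polling so the IRQ can move? */
static bool must_stop_polling(cpu_mask_model affinity, unsigned int running_cpu)
{
	return (affinity & mask_single_cpu(running_cpu)) == 0;
}
```

If the IRQ really fires on CPU 3 but the stored mask only names CPU 0, the
check asks to stop polling every time. Starting from the full mask, it
stays quiet until a notification installs a narrower mask.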

Signed-off-by: Jacob Keller 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 12 +++-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c |  7 +--
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 7366e7c7f399..6498da8806cb 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2881,7 +2881,7 @@ static void i40e_config_xps_tx_ring(struct i40e_ring 
*ring)
if ((vsi->tc_config.numtc <= 1) &&
!test_and_set_bit(__I40E_TX_XPS_INIT_DONE, &ring->state)) {
netif_set_xps_queue(ring->netdev,
-   &ring->q_vector->affinity_mask,
+   get_cpu_mask(ring->q_vector->v_idx),
ring->queue_index);
}
 
@@ -3506,8 +3506,10 @@ static int i40e_vsi_request_irq_msix(struct i40e_vsi 
*vsi, char *basename)
q_vector->affinity_notify.notify = i40e_irq_affinity_notify;
q_vector->affinity_notify.release = i40e_irq_affinity_release;
irq_set_affinity_notifier(irq_num, &q_vector->affinity_notify);
-   /* assign the mask for this irq */
-   irq_set_affinity_hint(irq_num, &q_vector->affinity_mask);
+   /* get_cpu_mask returns a static constant mask with
+* a permanent lifetime so it's ok to use here.
+*/
+   irq_set_affinity_hint(irq_num, get_cpu_mask(q_vector->v_idx));
}
 
vsi->irqs_ready = true;
@@ -4289,7 +4291,7 @@ static void i40e_vsi_free_irq(struct i40e_vsi *vsi)
 
/* clear the affinity notifier in the IRQ descriptor */
irq_set_affinity_notifier(irq_num, NULL);
-   /* clear the affinity_mask in the IRQ descriptor */
+   /* remove our suggested affinity mask for this IRQ */
irq_set_affinity_hint(irq_num, NULL);
synchronize_irq(irq_num);
free_irq(irq_num, vsi->q_vectors[i]);
@@ -8235,7 +8237,7 @@ static int i40e_vsi_alloc_q_vector(struct i40e_vsi *vsi, 
int v_idx, int cpu)
 
q_vector->vsi = vsi;
q_vector->v_idx = v_idx;
-   cpumask_set_cpu(cpu, &q_vector->affinity_mask);
+   cpumask_copy(&q_vector->affinity_mask, cpu_possible_mask);
 
if (vsi->netdev)
netif_napi_add(vsi->netdev, &q_vector->napi,
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c 
b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 9ee277e87f10..1825d956bb00 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -584,8 +584,10 @@ i40evf_request_traffic_irqs(struct i40evf_adapter 
*adapter, char *basename)
q_vector->affinity_notify.release =
   i40evf_irq_affinity_release;
irq_set_affinity_notifier(irq_num, &q_vector->affinity_notify);
-   /* assign the mask for this irq */
-   irq_set_affinity_hint(irq_num, &q_vector->affinity_mask);
+   /* get_cpu_mask returns a static constant mask with
+* a permanent lifetime so it's ok to use here.
+*/
+   irq_set_affinity_hint(irq_num, get_cpu_mask(q_vector->v_idx));
}
 
return 0;
@@ -1456,6 +1458,7 @@ static int i40evf_alloc_q_vectors(struct i40evf_adapter 
*adapter)
q_vector->adapter = adapter;
q_vector->vsi = &adapter->vsi;
q_vector->v_idx = q_idx;
+   cpumask_copy(&q_vector->affinity_mask, cpu_possible_mask);
netif_napi_add(adapter->netdev, &q_vector->napi,

[net-next 14/15] i40e/i40evf: remove ULTRA latency mode

2017-08-27 Thread Jeff Kirsher

From: Jacob Keller 

In commit c56625d59726 ("i40e/i40evf: change dynamic interrupt
thresholds") a new higher latency ITR setting called I40E_ULTRA_LATENCY
was added with a cryptic comment about how it was meant for adjusting Rx
more aggressively when streaming small packets.

This mode was attempting to calculate packets per second and then kick
in when we have a huge number of small packets.

Unfortunately, the ULTRA setting was kicking in for workloads it wasn't
intended for including single-thread UDP_STREAM workloads.

This wasn't caught for a variety of reasons. First, the ip_defrag
routines were improved somewhat which makes the UDP_STREAM test still
reasonable at 10GbE, even when dropped down to 8k interrupts a second.
Additionally, some other obvious workloads appear to work fine, such
as TCP_STREAM.

The number 40k doesn't make sense for a number of reasons. First, we
absolutely can do more than 40k packets per second. Second, we calculate
the value inline in an integer, which sometimes can overflow resulting
in using incorrect values.
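
The overflow is easy to reproduce in user space. This is a minimal sketch,
assuming the rate is computed as packets times one million divided by
elapsed microseconds and that the packet counter is a 32-bit unsigned
value; the function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Packets per second with the multiply done in 32 bits (wraps mod 2^32). */
static uint32_t pps_32bit(uint32_t packets, uint32_t usecs)
{
	assert(usecs != 0);
	return (packets * UINT32_C(1000000)) / usecs;
}

/* Same calculation with the multiply widened to 64 bits first. */
static uint64_t pps_64bit(uint32_t packets, uint32_t usecs)
{
	assert(usecs != 0);
	return ((uint64_t)packets * UINT32_C(1000000)) / usecs;
}
```

For 5000 packets in 1000 microseconds the true rate is 5,000,000 packets
per second, but the 32-bit version wraps and reports 705,032. Any count
above 4294 packets overflows the 32-bit multiply.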

If we fix this overflow it makes it even more likely that we'll enter
ULTRA mode which is the opposite of what we want.

The ULTRA mode was added originally as a way to reduce CPU utilization
during a small packet workload where we weren't keeping up anyways. It
should never have been kicking in during these other workloads.

Given the issues outlined above, let's remove the ULTRA latency mode. If
necessary, a better solution to the CPU utilization issue for small
packet workloads will be added in a future patch.

Signed-off-by: Jacob Keller 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 17 -
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   |  1 -
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 17 -
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h |  1 -
 4 files changed, 36 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 3999afea518b..f00f233092e9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -959,7 +959,6 @@ void i40e_force_wb(struct i40e_vsi *vsi, struct 
i40e_q_vector *q_vector)
 static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc)
 {
enum i40e_latency_range new_latency_range = rc->latency_range;
-   struct i40e_q_vector *qv = rc->ring->q_vector;
u32 new_itr = rc->itr;
int bytes_per_int;
int usecs;
@@ -971,7 +970,6 @@ static bool i40e_set_new_dynamic_itr(struct 
i40e_ring_container *rc)
 *   0-10MB/s   lowest (50000 ints/s)
 *  10-20MB/s   low    (20000 ints/s)
 *  20-1249MB/s bulk   (18000 ints/s)
-*  > 40000 Rx packets per second (8000 ints/s)
 *
 * The math works out because the divisor is in 10^(-6) which
 * turns the bytes/us input value into MB/s values, but
@@ -994,24 +992,12 @@ static bool i40e_set_new_dynamic_itr(struct 
i40e_ring_container *rc)
new_latency_range = I40E_LOWEST_LATENCY;
break;
case I40E_BULK_LATENCY:
-   case I40E_ULTRA_LATENCY:
default:
if (bytes_per_int <= 20)
new_latency_range = I40E_LOW_LATENCY;
break;
}
 
-   /* this is to adjust RX more aggressively when streaming small
-* packets.  The value of 40000 was picked as it is just beyond
-* what the hardware can receive per second if in low latency
-* mode.
-*/
-#define RX_ULTRA_PACKET_RATE 40000
-
-   if ((((rc->total_packets * 1000000) / usecs) > RX_ULTRA_PACKET_RATE) &&
-   (&qv->rx == rc))
-   new_latency_range = I40E_ULTRA_LATENCY;
-
rc->latency_range = new_latency_range;
 
switch (new_latency_range) {
@@ -1024,9 +1010,6 @@ static bool i40e_set_new_dynamic_itr(struct 
i40e_ring_container *rc)
case I40E_BULK_LATENCY:
new_itr = I40E_ITR_18K;
break;
-   case I40E_ULTRA_LATENCY:
-   new_itr = I40E_ITR_8K;
-   break;
default:
break;
}
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index f0a0eabc2666..e6456e8a899c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -454,7 +454,6 @@ enum i40e_latency_range {
I40E_LOWEST_LATENCY = 0,
I40E_LOW_LATENCY = 1,
I40E_BULK_LATENCY = 2,
-   I40E_ULTRA_LATENCY = 3,
 };
 
 struct i40e_ring_container {
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index f15e341ada9e..2f7d9f4a6746 100644
---

[net-next 07/15] i40e/i40evf: support for VF VLAN tag stripping control

2017-08-27 Thread Jeff Kirsher

From: Mariusz Stachura 

This patch gives VF capability to control VLAN tag stripping via
ethtool. As rx-vlan-offload was fixed before, now the VF is able to
change it using "ethtool --offload  rxvlan on/off" settings.

Signed-off-by: Mariusz Stachura 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 60 ++
 drivers/net/ethernet/intel/i40evf/i40evf.h |  4 ++
 drivers/net/ethernet/intel/i40evf/i40evf_main.c| 33 
 .../net/ethernet/intel/i40evf/i40evf_virtchnl.c| 40 +++
 include/linux/avf/virtchnl.h   |  5 ++
 5 files changed, 142 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c 
b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 27d87bef4ba3..4d1e670f490e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -2529,6 +2529,60 @@ static int i40e_vc_set_rss_hena(struct i40e_vf *vf, u8 
*msg, u16 msglen)
return i40e_vc_send_resp_to_vf(vf, VIRTCHNL_OP_SET_RSS_HENA, aq_ret);
 }
 
+/**
+ * i40e_vc_enable_vlan_stripping
+ * @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer
+ * @msglen: msg length
+ *
+ * Enable vlan header stripping for the VF
+ **/
+static int i40e_vc_enable_vlan_stripping(struct i40e_vf *vf, u8 *msg,
+u16 msglen)
+{
+   struct i40e_vsi *vsi = vf->pf->vsi[vf->lan_vsi_idx];
+   i40e_status aq_ret = 0;
+
+   if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states)) {
+   aq_ret = I40E_ERR_PARAM;
+   goto err;
+   }
+
+   i40e_vlan_stripping_enable(vsi);
+
+   /* send the response to the VF */
+err:
+   return i40e_vc_send_resp_to_vf(vf, VIRTCHNL_OP_ENABLE_VLAN_STRIPPING,
+  aq_ret);
+}
+
+/**
+ * i40e_vc_disable_vlan_stripping
+ * @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer
+ * @msglen: msg length
+ *
+ * Disable vlan header stripping for the VF
+ **/
+static int i40e_vc_disable_vlan_stripping(struct i40e_vf *vf, u8 *msg,
+ u16 msglen)
+{
+   struct i40e_vsi *vsi = vf->pf->vsi[vf->lan_vsi_idx];
+   i40e_status aq_ret = 0;
+
+   if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states)) {
+   aq_ret = I40E_ERR_PARAM;
+   goto err;
+   }
+
+   i40e_vlan_stripping_disable(vsi);
+
+   /* send the response to the VF */
+err:
+   return i40e_vc_send_resp_to_vf(vf, VIRTCHNL_OP_DISABLE_VLAN_STRIPPING,
+  aq_ret);
+}
+
 /**
  * i40e_vc_process_vf_msg
  * @pf: pointer to the PF structure
@@ -2648,6 +2702,12 @@ int i40e_vc_process_vf_msg(struct i40e_pf *pf, s16 
vf_id, u32 v_opcode,
case VIRTCHNL_OP_SET_RSS_HENA:
ret = i40e_vc_set_rss_hena(vf, msg, msglen);
break;
+   case VIRTCHNL_OP_ENABLE_VLAN_STRIPPING:
+   ret = i40e_vc_enable_vlan_stripping(vf, msg, msglen);
+   break;
+   case VIRTCHNL_OP_DISABLE_VLAN_STRIPPING:
+   ret = i40e_vc_disable_vlan_stripping(vf, msg, msglen);
+   break;
 
case VIRTCHNL_OP_UNKNOWN:
default:
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf.h 
b/drivers/net/ethernet/intel/i40evf/i40evf.h
index e5293d35fb6a..82f69031e5cd 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf.h
+++ b/drivers/net/ethernet/intel/i40evf/i40evf.h
@@ -261,6 +261,8 @@ struct i40evf_adapter {
 #define I40EVF_FLAG_AQ_RELEASE_PROMISC BIT(16)
 #define I40EVF_FLAG_AQ_REQUEST_ALLMULTIBIT(17)
 #define I40EVF_FLAG_AQ_RELEASE_ALLMULTIBIT(18)
+#define I40EVF_FLAG_AQ_ENABLE_VLAN_STRIPPING   BIT(19)
+#define I40EVF_FLAG_AQ_DISABLE_VLAN_STRIPPING  BIT(20)
 
/* OS defined structs */
struct net_device *netdev;
@@ -358,6 +360,8 @@ void i40evf_get_hena(struct i40evf_adapter *adapter);
 void i40evf_set_hena(struct i40evf_adapter *adapter);
 void i40evf_set_rss_key(struct i40evf_adapter *adapter);
 void i40evf_set_rss_lut(struct i40evf_adapter *adapter);
+void i40evf_enable_vlan_stripping(struct i40evf_adapter *adapter);
+void i40evf_disable_vlan_stripping(struct i40evf_adapter *adapter);
 void i40evf_virtchnl_completion(struct i40evf_adapter *adapter,
enum virtchnl_ops v_opcode,
i40e_status v_retval, u8 *msg, u16 msglen);
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c 
b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 258e8e27068b..9ee277e87f10 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -1676,6 +1676,16 @@ static void i40evf_watchdog_task(struct work_struct 
*work)
goto watchdog_done;
}
 
+

[net-next 08/15] i40e: 25G FEC status improvements

2017-08-27 Thread Jeff Kirsher

From: Mariusz Stachura 

This patch improves the system log message. The log message will
be expanded to include the FEC mode the FW requested before link
was established.

Signed-off-by: Mariusz Stachura 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 3a6a752c6c58..5a06cd23b9e6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5354,6 +5354,7 @@ void i40e_print_link_message(struct i40e_vsi *vsi, bool 
isup)
char *speed = "Unknown";
char *fc = "Unknown";
char *fec = "";
+   char *req_fec = "";
char *an = "";
 
new_speed = vsi->back->hw.phy.link_info.link_speed;
@@ -5415,6 +5416,7 @@ void i40e_print_link_message(struct i40e_vsi *vsi, bool 
isup)
}
 
if (vsi->back->hw.phy.link_info.link_speed == I40E_LINK_SPEED_25GB) {
+   req_fec = ", Requested FEC: None";
fec = ", FEC: None";
an = ", Autoneg: False";
 
@@ -5427,10 +5429,22 @@ void i40e_print_link_message(struct i40e_vsi *vsi, bool 
isup)
else if (vsi->back->hw.phy.link_info.fec_info &
 I40E_AQ_CONFIG_FEC_RS_ENA)
fec = ", FEC: CL108 RS-FEC";
+
+   /* 'CL108 RS-FEC' should be displayed when RS is requested, or
+* both RS and FC are requested
+*/
+   if (vsi->back->hw.phy.link_info.req_fec_info &
+   (I40E_AQ_REQUEST_FEC_KR | I40E_AQ_REQUEST_FEC_RS)) {
+   if (vsi->back->hw.phy.link_info.req_fec_info &
+   I40E_AQ_REQUEST_FEC_RS)
+   req_fec = ", Requested FEC: CL108 RS-FEC";
+   else
+   req_fec = ", Requested FEC: CL74 FC-FEC/BASE-R";
+   }
}
 
-   netdev_info(vsi->netdev, "NIC Link is Up, %sbps Full Duplex%s%s, Flow 
Control: %s\n",
-   speed, fec, an, fc);
+   netdev_info(vsi->netdev, "NIC Link is Up, %sbps Full Duplex%s%s%s, Flow 
Control: %s\n",
+   speed, req_fec, fec, an, fc);
 }
 
 /**
-- 
2.14.1

[net-next 15/15] i40e/i40evf: avoid dynamic ITR updates when polling or low packet rate

2017-08-27 Thread Jeff Kirsher

From: Jacob Keller 

The dynamic ITR algorithm depends on a calculation of usecs which
assumes that the interrupts have been firing constantly at the interrupt
throttle rate. This is not guaranteed because we could have a low packet
rate, or have been polling in software.

We'll estimate whether this is the case by using jiffies to determine if
we've been too long. If the time difference of jiffies is larger we are
guaranteed to have an incorrect calculation. If the time difference of
jiffies is smaller we might have been polling some but the difference
shouldn't affect the calculation too much.

This ensures that we don't get stuck in BULK latency during certain rare
situations where we receive bursts of packets that force us into NAPI
polling.
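
The staleness check described above can be sketched in user space as
follows. This is a minimal sketch, not the driver code: the function names
are hypothetical, the countdown value is passed in as a parameter rather
than taken from ITR_COUNTDOWN_START, and the elapsed time is supplied by
the caller where the driver derives it from jiffies.

```c
#include <stdbool.h>

/* Expected window in usecs if interrupts fired at the ITR rate.
 * The ITR value is in 2 usec units, hence the shift.
 */
unsigned int demo_itr_window_usecs(unsigned int itr,
				   unsigned int countdown_start)
{
	return (itr << 1) * countdown_start;
}

/* True when more time has passed than the window allows, in which case
 * the bytes-per-interrupt estimate cannot be trusted.
 */
bool demo_itr_estimate_is_stale(unsigned int itr,
				unsigned int countdown_start,
				unsigned int elapsed_usecs)
{
	return elapsed_usecs > demo_itr_window_usecs(itr, countdown_start);
}
```

When the estimate is stale, the patch falls back to I40E_LOW_LATENCY
instead of running the throttle-rate selection.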

Signed-off-by: Jacob Keller 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 22 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   |  1 +
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 22 +-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h |  1 +
 4 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index f00f233092e9..1519dfb851d0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -961,11 +961,25 @@ static bool i40e_set_new_dynamic_itr(struct 
i40e_ring_container *rc)
enum i40e_latency_range new_latency_range = rc->latency_range;
u32 new_itr = rc->itr;
int bytes_per_int;
-   int usecs;
+   unsigned int usecs, estimated_usecs;
 
if (rc->total_packets == 0 || !rc->itr)
return false;
 
+   usecs = (rc->itr << 1) * ITR_COUNTDOWN_START;
+   bytes_per_int = rc->total_bytes / usecs;
+
+   /* The calculations in this algorithm depend on interrupts actually
+* firing at the ITR rate. This may not happen if the packet rate is
+* really low, or if we've been napi polling. Check to make sure
+* that's not the case before we continue.
+*/
+   estimated_usecs = jiffies_to_usecs(jiffies - rc->last_itr_update);
+   if (estimated_usecs > usecs) {
+   new_latency_range = I40E_LOW_LATENCY;
+   goto reset_latency;
+   }
+
/* simple throttlerate management
 *   0-10MB/s   lowest (5 ints/s)
 *  10-20MB/s   low(2 ints/s)
@@ -977,9 +991,6 @@ static bool i40e_set_new_dynamic_itr(struct 
i40e_ring_container *rc)
 * are in 2 usec increments in the ITR registers, and make sure
 * to use the smoothed values that the countdown timer gives us.
 */
-   usecs = (rc->itr << 1) * ITR_COUNTDOWN_START;
-   bytes_per_int = rc->total_bytes / usecs;
-
switch (new_latency_range) {
case I40E_LOWEST_LATENCY:
if (bytes_per_int > 10)
@@ -998,6 +1009,7 @@ static bool i40e_set_new_dynamic_itr(struct 
i40e_ring_container *rc)
break;
}
 
+reset_latency:
rc->latency_range = new_latency_range;
 
switch (new_latency_range) {
@@ -1016,12 +1028,12 @@ static bool i40e_set_new_dynamic_itr(struct 
i40e_ring_container *rc)
 
rc->total_bytes = 0;
rc->total_packets = 0;
+   rc->last_itr_update = jiffies;
 
if (new_itr != rc->itr) {
rc->itr = new_itr;
return true;
}
-
return false;
 }
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index e6456e8a899c..2f848bc5e391 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -461,6 +461,7 @@ struct i40e_ring_container {
struct i40e_ring *ring;
unsigned int total_bytes;   /* total bytes processed this int */
unsigned int total_packets; /* total packets processed this int */
+   unsigned long last_itr_update;  /* jiffies of last ITR update */
u16 count;
enum i40e_latency_range latency_range;
u16 itr;
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 2f7d9f4a6746..c32c62462c84 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -359,11 +359,25 @@ static bool i40e_set_new_dynamic_itr(struct 
i40e_ring_container *rc)
enum i40e_latency_range new_latency_range = rc->latency_range;
u32 new_itr = rc->itr;
int bytes_per_int;
-   int usecs;
+   unsigned int usecs, estimated_usecs;
 
if (rc->total_packets == 0 || !rc->itr)
return false;
 
+   usecs = (rc->itr << 1) * ITR_COUNTDOWN_START;
+   bytes_per_int = rc->total_bytes /

[net-next 10/15] i40e: remove workaround for resetting XPS

2017-08-27 Thread Jeff Kirsher

From: Jacob Keller 

Since commit 3ffa037d7f78 ("i40e: Set XPS bit mask to zero in DCB mode")
we've tried to reset the XPS settings by building a custom
empty CPU mask.

This workaround is not necessary because we're not really removing the
XPS setting, but simply setting it so that no CPU is valid.

Second, we shorten the code further by using zalloc_cpumask_var instead
of a separate call to bitmap_zero().

Signed-off-by: Jacob Keller 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 17 +
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 0962b85ef6f3..7366e7c7f399 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2874,22 +2874,15 @@ static void i40e_vsi_free_rx_resources(struct i40e_vsi 
*vsi)
 static void i40e_config_xps_tx_ring(struct i40e_ring *ring)
 {
struct i40e_vsi *vsi = ring->vsi;
-   cpumask_var_t mask;
 
if (!ring->q_vector || !ring->netdev)
return;
 
-   /* Single TC mode enable XPS */
-   if (vsi->tc_config.numtc <= 1) {
-   if (!test_and_set_bit(__I40E_TX_XPS_INIT_DONE, >state))
-   netif_set_xps_queue(ring->netdev,
-   >q_vector->affinity_mask,
-   ring->queue_index);
-   } else if (alloc_cpumask_var(, GFP_KERNEL)) {
-   /* Disable XPS to allow selection based on TC */
-   bitmap_zero(cpumask_bits(mask), nr_cpumask_bits);
-   netif_set_xps_queue(ring->netdev, mask, ring->queue_index);
-   free_cpumask_var(mask);
+   if ((vsi->tc_config.numtc <= 1) &&
+   !test_and_set_bit(__I40E_TX_XPS_INIT_DONE, >state)) {
+   netif_set_xps_queue(ring->netdev,
+   >q_vector->affinity_mask,
+   ring->queue_index);
}
 
/* schedule our worker thread which will take care of
-- 
2.14.1

[net-next 01/15] i40e: Update state variable for adminq subtask

2017-08-27 Thread Jeff Kirsher

From: Sudheer Mogilappagari 

During NVM update, state machine gets into unrecoverable state because
i40e_clean_adminq_subtask can get scheduled after the admin queue
command but before other state variables are updated. This causes
incorrect input to i40e_nvmupd_check_wait_event and state transitions
don't happen.

This fix updates the state variables so that adminq_subtask will have
accurate information whenever it gets scheduled.

Signed-off-by: Sudheer Mogilappagari 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_nvm.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_nvm.c 
b/drivers/net/ethernet/intel/i40e/i40e_nvm.c
index 2cf7db2dc7cd..96afef98a08f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_nvm.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_nvm.c
@@ -755,7 +755,11 @@ i40e_status i40e_nvmupd_command(struct i40e_hw *hw,
 
/* Acquire lock to prevent race condition where adminq_task
 * can execute after i40e_nvmupd_nvm_read/write but before state
-* variables (nvm_wait_opcode, nvm_release_on_done) are updated
+* variables (nvm_wait_opcode, nvm_release_on_done) are updated.
+*
+* During NVMUpdate, it is observed that lock could be held for
+* ~5ms for most commands. However lock is held for ~60ms for
+* NVMUPD_CSUM_LCB command.
 */
mutex_lock(>aq.arq_mutex);
switch (hw->nvmupd_state) {
@@ -778,7 +782,8 @@ i40e_status i40e_nvmupd_command(struct i40e_hw *hw,
 */
if (cmd->offset == 0x) {
i40e_nvmupd_check_wait_event(hw, hw->nvm_wait_opcode);
-   return 0;
+   status = 0;
+   goto exit;
}
 
status = I40E_ERR_NOT_READY;
@@ -793,6 +798,7 @@ i40e_status i40e_nvmupd_command(struct i40e_hw *hw,
*perrno = -ESRCH;
break;
}
+exit:
mutex_unlock(>aq.arq_mutex);
return status;
 }
-- 
2.14.1

Re: [PATCH] ARM: dts: rk3228-evb: Fix the compiling error

2017-08-27 Thread Stephen Rothwell

Hi Dave,

On Sun, 27 Aug 2017 16:59:43 -0700 (PDT) David Miller  
wrote:
>
> Sorry, I wasn't aware that this should go via my tree, I'll take care of
> this soon.

Thanks.

-- 
Cheers,
Stephen Rothwell

Re: [PATCH] ARM: dts: rk3228-evb: Fix the compiling error

2017-08-27 Thread David Miller

From: Stephen Rothwell 
Date: Mon, 28 Aug 2017 08:32:54 +1000

> Hi Dave (Miller),
> 
> On Tue, 22 Aug 2017 21:52:51 +1000 Stephen Rothwell  
> wrote:
>>
>> Thanks.
>> 
>> On Tue, 22 Aug 2017 17:24:25 +0800 David Wu  wrote:
>> >
>> > This patch solves the following error:
>> > arch/arm/boot/dts/rk3228-evb.dtb: ERROR (phandle_references): Reference to 
>> > non-existent node or label "phy0"
>> > 
>> > Fixess db40f15b53e4 ("ARM: dts: rk3228-evb: Enable the integrated PHY for 
>> > gmac")
>> > Signed-off-by: David Wu   
>> 
>> Reported-by: Stephen Rothwell 
> 
> Ping?

Sorry, I wasn't aware that this should go via my tree, I'll take care of
this soon.

Re: [PATCH] connector: Delete an error message for a failed memory allocation in cn_queue_alloc_callback_entry()

2017-08-27 Thread Waskiewicz Jr, Peter

On 8/27/17 3:26 PM, SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Sun, 27 Aug 2017 21:18:37 +0200
> 
> Omit an extra message for a memory allocation failure in this function.
> 
> This issue was detected by using the Coccinelle software.

Did coccinelle trip on the message or the fact you weren't returning NULL?

> 
> Signed-off-by: Markus Elfring 
> ---
>   drivers/connector/cn_queue.c | 4 +---
>   1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/connector/cn_queue.c b/drivers/connector/cn_queue.c
> index 1f8bf054d11c..e4f31d679f02 100644
> --- a/drivers/connector/cn_queue.c
> +++ b/drivers/connector/cn_queue.c
> @@ -40,10 +40,8 @@ cn_queue_alloc_callback_entry(struct cn_queue_dev *dev, 
> const char *name,
>   struct cn_callback_entry *cbq;
>   
>   cbq = kzalloc(sizeof(*cbq), GFP_KERNEL);
> - if (!cbq) {
> - pr_err("Failed to create new callback queue.\n");
> + if (!cbq)
>   return NULL;
> - }

Why not:

if (!cbq) {
pr_err("Failed to create new callback queue.\n");
+   return NULL;
}

>   
>   atomic_set(>refcnt, 1);
>   
>

Re: [PATCH 0/4] irda: move it to drivers/staging so we can delete it

2017-08-27 Thread Joe Perches

On Sun, 2017-08-27 at 18:53 +0200, Greg Kroah-Hartman wrote:
> On Sun, Aug 27, 2017 at 09:19:19AM -0700, Joe Perches wrote:
> > On Sun, 2017-08-27 at 18:13 +0200, Greg Kroah-Hartman wrote:
> > > On Sun, Aug 27, 2017 at 08:35:43AM -0700, Joe Perches wrote:
> > > > On Sun, 2017-08-27 at 17:03 +0200, Greg Kroah-Hartman wrote:
> > > > > The IRDA code has long been obsolete and broken.  So, to keep people
> > > > > from trying to use it, and to prevent people from having to maintain 
> > > > > it,
> > > > > let's move it to drivers/staging/ so that we can delete it entirely 
> > > > > from
> > > > > the kernel in a few releases.
> > > > 
> > > > 
> > > > MAINTAINERS should be updated as well.
> > > > 
> > > > It'd probably be nice to try to get an email to
> > > > the irda mailing list too if it still works.
> > > 
> > > As get_maintainer.pl didn't show it, odds are it doesn't...
> > 
> > get_maintainer doesn't show it because it's subscriber-only.
> > If you want get_maintainer to show it, add -s
> > 
> > $ ./scripts/get_maintainer.pl  -s -f net/irda/
> > Samuel Ortiz  (maintainer:IRDA SUBSYSTEM)
> > "David S. Miller"  (maintainer:NETWORKING [GENERAL])
> > irda-us...@lists.sourceforge.net (subscriber list:IRDA SUBSYSTEM)
> > netdev@vger.kernel.org (open list:IRDA SUBSYSTEM)
> > linux-ker...@vger.kernel.org (open list)
> 
> Sorry, am not going to subscribe to a random list just to send patches
> that delete the subsystem :)

Then you do a disservice to those that actually might
be using that subsystem.

Re: [PATCH] igb: check memory allocation failure

2017-08-27 Thread Waskiewicz Jr, Peter

On 8/27/17 2:42 AM, Christophe JAILLET wrote:
> Check memory allocation failures and return -ENOMEM in such cases, as
> already done for other memory allocations in this function.
> 
> This avoids NULL pointers dereference.
> 
> Signed-off-by: Christophe JAILLET 
> ---
>   drivers/net/ethernet/intel/igb/igb_main.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
> b/drivers/net/ethernet/intel/igb/igb_main.c
> index fd4a46b03cc8..837d9b46a390 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -3162,6 +3162,8 @@ static int igb_sw_init(struct igb_adapter *adapter)
>   /* Setup and initialize a copy of the hw vlan table array */
>   adapter->shadow_vfta = kcalloc(E1000_VLAN_FILTER_TBL_SIZE, sizeof(u32),
>  GFP_ATOMIC);
> + if (!adapter->shadow_vfta)
> + return -ENOMEM;

Looks reasonable to me.

A larger issue though I see in this function is that if we return 
-ENOMEM here, and if we return -ENOMEM from igb_init_interrupt_scheme() 
below on failure, we leak adapter->mac_table (and adapter->shadow_vfta 
in the latter).  We should add a proper unwind to free up the memory on 
failure.
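
A generic sketch of the kind of unwind meant here, in plain user-space C.
The struct, the function name and the fail_step parameter are hypothetical
and exist only to make the error paths easy to exercise; this is not the
igb code.

```c
#include <errno.h>
#include <stdlib.h>

struct demo_adapter {
	unsigned int *mac_table;
	unsigned int *shadow_vfta;
};

/* fail_step simulates a failure: 0 = none, 1 = mac_table allocation,
 * 2 = shadow_vfta allocation, 3 = a later init step.
 */
int demo_sw_init(struct demo_adapter *a, int fail_step)
{
	int err = -ENOMEM;

	a->mac_table = fail_step == 1 ? NULL :
		       calloc(16, sizeof(*a->mac_table));
	if (!a->mac_table)
		return err;

	a->shadow_vfta = fail_step == 2 ? NULL :
			 calloc(128, sizeof(*a->shadow_vfta));
	if (!a->shadow_vfta)
		goto err_free_mac;

	if (fail_step == 3)
		goto err_free_vfta;

	return 0;

	/* Release in reverse order of allocation */
err_free_vfta:
	free(a->shadow_vfta);
	a->shadow_vfta = NULL;
err_free_mac:
	free(a->mac_table);
	a->mac_table = NULL;
	return err;
}
```

Each failure point jumps to the label that frees everything allocated
before it, so no path returns an error while still holding memory.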

-PJ

Re: [PATCH] ARM: dts: rk3228-evb: Fix the compiling error

2017-08-27 Thread Stephen Rothwell

Hi Dave (Miller),

On Tue, 22 Aug 2017 21:52:51 +1000 Stephen Rothwell  
wrote:
>
> Thanks.
> 
> On Tue, 22 Aug 2017 17:24:25 +0800 David Wu  wrote:
> >
> > This patch solves the following error:
> > arch/arm/boot/dts/rk3228-evb.dtb: ERROR (phandle_references): Reference to 
> > non-existent node or label "phy0"
> > 
> > Fixess db40f15b53e4 ("ARM: dts: rk3228-evb: Enable the integrated PHY for 
> > gmac")
> > Signed-off-by: David Wu   
> 
> Reported-by: Stephen Rothwell 

Ping?

-- 
Cheers,
Stephen Rothwell

RE: [PATCH] DSA support for Micrel KSZ8895

2017-08-27 Thread Woojung.Huh

Pavel,

Thanks for the update, and sorry about the email format (due to the web-access version).
I'll review it when I get back to the office later this week.

- Woojung

From: Pavel Machek [pa...@denx.de]
Sent: Sunday, August 27, 2017 8:36 AM
To: Woojung Huh - C21699; nathan.leigh.con...@gmail.com
Cc: vivien.dide...@savoirfairelinux.com; f.faine...@gmail.com; 
netdev@vger.kernel.org; linux-ker...@vger.kernel.org; tristram...@micrel.com; 
and...@lunn.ch; pa...@denx.de
Subject: [PATCH] DSA support for Micrel KSZ8895

Hi!

So I fought with the driver a bit more, and now I have something that
kind-of-works.

"great great hack" belows worries me.

Yeah, disabled code needs to be removed before merge.

No, tag_ksz part probably is not acceptable. Do you see solution
better than just copying it into tag_ksz1 file?

Any more comments, etc?

Help would be welcome.

[PATCH net-next] bridge: fdb add and delete tracepoints

2017-08-27 Thread Roopa Prabhu

From: Roopa Prabhu 

Tracepoints to trace bridge forwarding database updates.

Signed-off-by: Roopa Prabhu 
---
 include/trace/events/bridge.h | 98 +++
 net/bridge/br_fdb.c   |  7 
 net/core/net-traces.c |  6 +++
 3 files changed, 111 insertions(+)
 create mode 100644 include/trace/events/bridge.h

diff --git a/include/trace/events/bridge.h b/include/trace/events/bridge.h
new file mode 100644
index 000..e2d52cf
--- /dev/null
+++ b/include/trace/events/bridge.h
@@ -0,0 +1,98 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM bridge
+
+#if !defined(_TRACE_BRIDGE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_BRIDGE_H
+
+#include 
+#include 
+
+#include "../../../net/bridge/br_private.h"
+
+TRACE_EVENT(br_fdb_add,
+
+   TP_PROTO(struct ndmsg *ndm, struct net_device *dev,
+const unsigned char *addr, u16 vid, u16 nlh_flags),
+
+   TP_ARGS(ndm, dev, addr, vid, nlh_flags),
+
+   TP_STRUCT__entry(
+   __field(u8, ndm_flags)
+   __string(dev, dev->name)
+   __array(unsigned char, addr, 6)
+   __field(u16, vid)
+   __field(u16, nlh_flags)
+   ),
+
+   TP_fast_assign(
+   __assign_str(dev, dev->name);
+   memcpy(__entry->addr, addr, 6);
+   __entry->vid = vid;
+   __entry->nlh_flags = nlh_flags;
+   __entry->ndm_flags = ndm->ndm_flags;
+   ),
+
+   TP_printk("dev %s addr %02x:%02x:%02x:%02x:%02x:%02x vid %u nlh_flags 
%x ndm_flags = %x",
+ __get_str(dev), __entry->addr[0], __entry->addr[1],
+ __entry->addr[2], __entry->addr[3], __entry->addr[4],
+ __entry->addr[5], __entry->vid,
+ __entry->nlh_flags, __entry->ndm_flags)
+);
+
+TRACE_EVENT(br_fdb_external_learn_add,
+
+   TP_PROTO(struct net_bridge *br, struct net_bridge_port *p,
+const unsigned char *addr, u16 vid),
+
+   TP_ARGS(br, p, addr, vid),
+
+   TP_STRUCT__entry(
+   __string(br_dev, br->dev->name)
+   __string(dev, p->dev->name)
+   __array(unsigned char, addr, 6)
+   __field(u16, vid)
+   ),
+
+   TP_fast_assign(
+   __assign_str(br_dev, br ? br->dev->name : "null");
+   __assign_str(dev, p ? p->dev->name : "null");
+   memcpy(__entry->addr, addr, 6);
+   __entry->vid = vid;
+   ),
+
+   TP_printk("br_dev %s port %s addr %02x:%02x:%02x:%02x:%02x:%02x vid %u",
+ __get_str(br_dev), __get_str(dev), __entry->addr[0],
+ __entry->addr[1], __entry->addr[2], __entry->addr[3],
+ __entry->addr[4], __entry->addr[5], __entry->vid)
+);
+
+TRACE_EVENT(fdb_delete,
+
+   TP_PROTO(struct net_bridge *br, struct net_bridge_fdb_entry *f),
+
+   TP_ARGS(br, f),
+
+   TP_STRUCT__entry(
+   __string(br_dev, br->dev->name)
+   __string(dev, f->dst ? f->dst->dev->name : "null")
+   __array(unsigned char, addr, 6)
+   __field(u16, vid)
+   ),
+
+   TP_fast_assign(
+   __assign_str(br_dev, br ? br->dev->name : "null");
+   __assign_str(dev, f->dst ? f->dst->dev->name : "null");
+   memcpy(__entry->addr, f->addr.addr, 6);
+   __entry->vid = f->vlan_id;
+   ),
+
+   TP_printk("br_dev %s dev %s addr %02x:%02x:%02x:%02x:%02x:%02x vid %u",
+ __get_str(br_dev), __get_str(dev), __entry->addr[0],
+ __entry->addr[1], __entry->addr[2], __entry->addr[3],
+ __entry->addr[4], __entry->addr[5], __entry->vid)
+);
+
+#endif /* _TRACE_BRIDGE_H */
+
+/* This part must be outside protection */
+#include 
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index a79b648..be5e1da 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "br_private.h"
 
 static struct kmem_cache *br_fdb_cache __read_mostly;
@@ -171,6 +172,8 @@ static void fdb_del_hw_addr(struct net_bridge *br, const 
unsigned char *addr)
 
 static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f)
 {
+   trace_fdb_delete(br, f);
+
if (f->is_static)
fdb_del_hw_addr(br, f->addr.addr);
 
@@ -870,6 +873,8 @@ int br_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
struct net_bridge *br = NULL;
int err = 0;
 
+   trace_br_fdb_add(ndm, dev, addr, vid, nlh_flags);
+
if (!(ndm->ndm_state & (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE))) {
pr_info("bridge: RTM_NEWNEIGH with invalid state %#x\n", 
ndm->ndm_state);
return -EINVAL;
@@ -1066,6 +1071,8 @@ int br_fdb_external_learn_add(struct net_bridge *br, 
struct net_bridge_port *p,
bool modified = false;
int err

Re: [PATCH v2 0/2] enable hires timer to timeout datagram socket

2017-08-27 Thread Vallish Vaidyeshwara

On Tue, Aug 22, 2017 at 09:30:30PM -0700, David Miller wrote:
> From: Vallish Vaidyeshwara 
> Date: Wed, 23 Aug 2017 00:10:25 +
> 
> > I am submitting 2 patch series to enable hires timer to timeout
> > datagram sockets (AF_UNIX & AF_INET domain) and test code to test
> > timeout accuracy on these sockets.
> 
> This is not reasonable.
> 
> If you want high resolution events with real guarantees, please use
> the kernel interfaces which provide this as explained to you as
> feedback by other reviewers.
> 
> I'm not applying this, sorry.

Hello David,

I respect the decision not to upstream this patch series; however, I
wanted to provide additional details. The issue is not that the
application wants high resolution events with real guarantees, but that
there is a regression in system call behavior:

1) Change in system call behavior:
strace from 4.4 test run of waiting for 180 seconds on datagram socket:
10:25:48.239685 setsockopt(3, SOL_SOCKET, SO_RCVTIMEO, 
"\264\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
10:25:48.239755 recvmsg(3, 0x7ffd0a3beec0, 0) = -1 EAGAIN (Resource temporarily 
unavailable)
10:28:48.236989 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) 
= 0

strace from 4.9 test run of waiting for 180 seconds on datagram socket times 
out close to 195 seconds:
setsockopt(3, SOL_SOCKET, SO_RCVTIMEO, "\264\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 
16) = 0 <0.28>
recvmsg(3, 0x7ffd6a2c4380, 0)   = -1 EAGAIN (Resource temporarily 
unavailable) <194.852000>
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0 <0.18>

This change in system call behavior is causing our application to regress
on the 4.9 kernel. Some events must run when a timeout expires, and on the
4.9 kernel those timeouts now fire late: close to 195 seconds instead of
180 seconds in the test run shown above.

2) Comparison with MacOS:
I ran the same test on OS X El Capitan version 10.11.6 and the behavior is
consistent with the Linux 4.4 kernel. I have not tested the program on
other operating systems such as HP-UX, AIX or Solaris, but I would guess
that, where they implement SO_RCVTIMEO, the behavior does not differ from
the Linux 4.4 kernel.

3) Standards Specification:
The Open Group standard does not say how quickly SO_RCVTIMEO needs to respond
to timeouts. However, the standard for the select system call does mention
that timeouts need to respond quickly. It would be good to restore SO_RCVTIMEO
behavior to that of the 4.4 kernel and keep it consistent with the select timeout.

4) Changing application code:
Any change to application code to accommodate this change of behavior in system
call breaks application migration between 4.4 kernel and 4.9 kernel.
Moreover, making application code change is not feasible in all cases as in
the case where the source code is not available (third party vendor).
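
For reference, the kind of measurement described in point 1 can be
reproduced with a small program along these lines. This is a minimal
sketch, not the test code from the patch series: measure_rcvtimeo_ms() is
a hypothetical helper, it uses an unbound AF_UNIX datagram socket so that
no data can arrive, and it reports the elapsed time in milliseconds.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Block in recv() on an idle datagram socket with SO_RCVTIMEO set to
 * timeout_ms and return how long the call took, in milliseconds.
 * Returns -1 on a setup error or if recv() did not time out.
 */
long measure_rcvtimeo_ms(long timeout_ms)
{
	struct timeval tv = {
		.tv_sec = timeout_ms / 1000,
		.tv_usec = (timeout_ms % 1000) * 1000,
	};
	struct timespec start, end;
	char buf[16];
	long elapsed = -1;
	ssize_t n;
	int fd;

	fd = socket(AF_UNIX, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
		goto out;

	clock_gettime(CLOCK_MONOTONIC, &start);
	n = recv(fd, buf, sizeof(buf), 0);
	clock_gettime(CLOCK_MONOTONIC, &end);

	/* A timeout is reported as EAGAIN/EWOULDBLOCK */
	if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		elapsed = (end.tv_sec - start.tv_sec) * 1000L +
			  (end.tv_nsec - start.tv_nsec) / 1000000L;
out:
	close(fd);
	return elapsed;
}
```

Calling it with 180000 corresponds to the 180 second wait in the strace
output above; comparing the returned value across kernels shows how far
the wakeup drifts from the requested timeout.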

Thanks.
-Vallish

[PATCH] connector: Delete an error message for a failed memory allocation in cn_queue_alloc_callback_entry()

2017-08-27 Thread SF Markus Elfring

From: Markus Elfring 
Date: Sun, 27 Aug 2017 21:18:37 +0200

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 drivers/connector/cn_queue.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/connector/cn_queue.c b/drivers/connector/cn_queue.c
index 1f8bf054d11c..e4f31d679f02 100644
--- a/drivers/connector/cn_queue.c
+++ b/drivers/connector/cn_queue.c
@@ -40,10 +40,8 @@ cn_queue_alloc_callback_entry(struct cn_queue_dev *dev, 
const char *name,
struct cn_callback_entry *cbq;
 
cbq = kzalloc(sizeof(*cbq), GFP_KERNEL);
-   if (!cbq) {
-   pr_err("Failed to create new callback queue.\n");
+   if (!cbq)
return NULL;
-   }
 
atomic_set(>refcnt, 1);
 
-- 
2.14.1

Re: [PATCH V2 net-next] net-next/hinic: Fix MTU limitation

2017-08-27 Thread Andrew Lunn

On Mon, Aug 28, 2017 at 01:20:26AM +0800, Aviad Krawczyk wrote:
> Fix the hw MTU limitation by setting max_mtu
> 
> Signed-off-by: Aviad Krawczyk 
> Signed-off-by: Zhao Chen 

Reviewed-by: Andrew Lunn 

Andrew

[PATCH net-next] net-next/hinic: fix comparison of a uint16_t type with -1

2017-08-27 Thread Aviad Krawczyk

Remove the search for index of constant buffer size
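
For background, the comparison named in the subject can be illustrated in
isolation. The sketch below is plain user-space C with a hypothetical
helper; it assumes a platform where int is wider than 16 bits and is not
taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compare a 16-bit unsigned field against an int value.
 * idx is promoted to int before the comparison, so a stored 0xFFFF
 * becomes 65535 and can never be equal to -1.
 */
bool demo_u16_equals(uint16_t idx, int value)
{
	return idx == value;
}
```

Storing -1 into such a field and later testing it against -1 therefore
never matches, which is why a check of that form cannot detect the
sentinel entry.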

Signed-off-by: Aviad Krawczyk 
Signed-off-by: Zhao Chen 
---
 drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c | 37 +---
 drivers/net/ethernet/huawei/hinic/hinic_hw_qp.h  | 21 ++
 2 files changed, 22 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c 
b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
index 09dec6d..79b5674 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c
@@ -69,31 +69,6 @@ struct hinic_dev_cap {
u8  rsvd3[208];
 };
 
-struct rx_buf_sz {
-   int idx;
-   size_t  sz;
-};
-
-static struct rx_buf_sz rx_buf_sz_table[] = {
-   {0, 32},
-   {1, 64},
-   {2, 96},
-   {3, 128},
-   {4, 192},
-   {5, 256},
-   {6, 384},
-   {7, 512},
-   {8, 768},
-   {9, 1024},
-   {10, 1536},
-   {11, 2048},
-   {12, 3072},
-   {13, 4096},
-   {14, 8192},
-   {15, 16384},
-   {-1, -1},
-};
-
 /**
  * get_capability - convert device capabilities to NIC capabilities
  * @hwdev: the HW device to set and convert device capabilities for
@@ -330,7 +305,6 @@ static int set_hw_ioctxt(struct hinic_hwdev *hwdev, 
unsigned int rq_depth,
struct hinic_cmd_hw_ioctxt hw_ioctxt;
struct pci_dev *pdev = hwif->pdev;
struct hinic_pfhwdev *pfhwdev;
-   int i;
 
if (!HINIC_IS_PF(hwif) && !HINIC_IS_PPF(hwif)) {
dev_err(>dev, "Unsupported PCI Function type\n");
@@ -344,16 +318,7 @@ static int set_hw_ioctxt(struct hinic_hwdev *hwdev, 
unsigned int rq_depth,
 
hw_ioctxt.rq_depth  = ilog2(rq_depth);
 
-   for (i = 0; ; i++) {
-   if ((rx_buf_sz_table[i].sz == HINIC_RX_BUF_SZ) ||
-   (rx_buf_sz_table[i].sz == -1)) {
-   hw_ioctxt.rx_buf_sz_idx = rx_buf_sz_table[i].idx;
-   break;
-   }
-   }
-
-   if (hw_ioctxt.rx_buf_sz_idx == -1)
-   return -EINVAL;
+   hw_ioctxt.rx_buf_sz_idx = HINIC_RX_BUF_SZ_IDX;
 
hw_ioctxt.sq_depth  = ilog2(sq_depth);
 
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_qp.h 
b/drivers/net/ethernet/huawei/hinic/hinic_hw_qp.h
index e642a8a..df729a1 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_qp.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_qp.h
@@ -53,7 +53,9 @@
 #define HINIC_SQ_DEPTH  SZ_4K
 #define HINIC_RQ_DEPTH  SZ_4K
 
+/* In any change to HINIC_RX_BUF_SZ, HINIC_RX_BUF_SZ_IDX must be changed */
 #define HINIC_RX_BUF_SZ 2048
+#define HINIC_RX_BUF_SZ_IDXHINIC_RX_BUF_SZ_2048_IDX
 
 #define HINIC_MIN_TX_WQE_SIZE(wq)   \
ALIGN(HINIC_SQ_WQE_SIZE(1), (wq)->wqebb_size)
@@ -61,6 +63,25 @@
 #define HINIC_MIN_TX_NUM_WQEBBS(sq) \
(HINIC_MIN_TX_WQE_SIZE((sq)->wq) / (sq)->wq->wqebb_size)
 
+enum hinic_rx_buf_sz_idx {
+   HINIC_RX_BUF_SZ_32_IDX,
+   HINIC_RX_BUF_SZ_64_IDX,
+   HINIC_RX_BUF_SZ_96_IDX,
+   HINIC_RX_BUF_SZ_128_IDX,
+   HINIC_RX_BUF_SZ_192_IDX,
+   HINIC_RX_BUF_SZ_256_IDX,
+   HINIC_RX_BUF_SZ_384_IDX,
+   HINIC_RX_BUF_SZ_512_IDX,
+   HINIC_RX_BUF_SZ_768_IDX,
+   HINIC_RX_BUF_SZ_1024_IDX,
+   HINIC_RX_BUF_SZ_1536_IDX,
+   HINIC_RX_BUF_SZ_2048_IDX,
+   HINIC_RX_BUF_SZ_3072_IDX,
+   HINIC_RX_BUF_SZ_4096_IDX,
+   HINIC_RX_BUF_SZ_8192_IDX,
+   HINIC_RX_BUF_SZ_16384_IDX,
+};
+
 struct hinic_sq {
struct hinic_hwif   *hwif;
 
-- 
1.9.1

[PATCH V2 net-next] net-next/hinic: Fix MTU limitation

2017-08-27 Thread Aviad Krawczyk

Fix the hw MTU limitation by setting max_mtu

Signed-off-by: Aviad Krawczyk 
Signed-off-by: Zhao Chen 
---
 drivers/net/ethernet/huawei/hinic/hinic_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_main.c 
b/drivers/net/ethernet/huawei/hinic/hinic_main.c
index ae7ad48..eb53bd9 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_main.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_main.c
@@ -919,6 +919,7 @@ static int nic_dev_init(struct pci_dev *pdev)
 
netdev->netdev_ops = _netdev_ops;
netdev->ethtool_ops = _ethtool_ops;
+   netdev->max_mtu = ETH_MAX_MTU;
 
nic_dev = netdev_priv(netdev);
nic_dev->netdev = netdev;
-- 
1.9.1

Re: [PATCH] DSA support for Micrel KSZ8895

2017-08-27 Thread Florian Fainelli

On August 27, 2017 5:36:58 AM PDT, Pavel Machek  wrote:
>Hi!
>
>So I fought with the driver a bit more, and now I have something that
>kind-of-works.
>
>"great great hack" belows worries me.
>
>Yeah, disabled code needs to be removed before merge.
>
>No, tag_ksz part probably is not acceptable. Do you see solution
>better than just copying it into tag_ksz1 file?

You could have all Micrel tag implementations live under net/dsa/tag_ksz.c and 
have e.g: DSA_TAG_PROTO_KSZ for the current (newer) switches and 
DSA_TAG_PROTO_KSZ_LEGACY (or any other name) for the older switches and you 
would provide two sets of function pointers depending on which protocol is 
requested by the switch.

Considering the minor difference needed in tagging here, it might be acceptable 
to actually keep the current functions and just have the xmit() call check what 
get_tag_protocol returns and use word 1 or 0 based on that. Even though that's 
a fast path it shouldn't hurt performance too much. If it does, we can always 
copy the tagging protocol into dsa_slave_priv so you have a fast access to it.
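
A rough sketch of that second option, in user-space C. Everything here is
hypothetical: the enum, the function name and the assumption that both
variants use a two-byte tag are for illustration only, and the real xmit()
would obtain the protocol from the switch rather than from a parameter.

```c
#include <stdint.h>

/* Hypothetical protocol identifiers, standing in for the DSA ones */
enum demo_tag_proto {
	DEMO_TAG_PROTO_KSZ,		/* current (newer) switches */
	DEMO_TAG_PROTO_KSZ_LEGACY,	/* older switches */
};

/* Fill a two-byte tag: the destination port bit goes into word 1 for
 * the current format and into word 0 for the legacy one.
 */
void demo_ksz_fill_tag(enum demo_tag_proto proto, unsigned int port,
		       uint8_t tag[2])
{
	uint8_t port_bit = (uint8_t)(1u << port);

	tag[0] = 0;
	tag[1] = 0;
	if (proto == DEMO_TAG_PROTO_KSZ_LEGACY)
		tag[0] = port_bit;
	else
		tag[1] = port_bit;
}
```

The only per-packet cost is the one branch on the protocol, which is the
trade-off against keeping two separate sets of function pointers.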

>
>Any more comments, etc?

The MII emulation bits are interesting, was it not sufficient if you 
implemented phy_read and phy_write operations that perform the necessary 
internal PHY accesses or maybe you don't get access to standard MII registers? 
b53 does such a thing and we merely just need to do a simple shift to access 
the MII register number, thus avoiding the translation.

>
>Help would be welcome.

I concur with Andrew, try to get a patch series, even an RFC one together so we 
can review things individually. 

How functional is your driver so far? I'd say the basic stuff to get working: 
counters (debugging), link management (auto-negotiation, forced, etc.) and 
basic bridging: all ports separate by default and working port to port 
switching when brought together in a bridge. VLAN, FDB, MDB, other ethtool 
goodies can be added later on.

-- 
Florian

Re: [PATCH 0/4] irda: move it to drivers/staging so we can delete it

2017-08-27 Thread Greg Kroah-Hartman

On Sun, Aug 27, 2017 at 09:19:19AM -0700, Joe Perches wrote:
> On Sun, 2017-08-27 at 18:13 +0200, Greg Kroah-Hartman wrote:
> > On Sun, Aug 27, 2017 at 08:35:43AM -0700, Joe Perches wrote:
> > > On Sun, 2017-08-27 at 17:03 +0200, Greg Kroah-Hartman wrote:
> > > > The IRDA code has long been obsolete and broken.  So, to keep people
> > > > from trying to use it, and to prevent people from having to maintain it,
> > > > let's move it to drivers/staging/ so that we can delete it entirely from
> > > > the kernel in a few releases.
> > > 
> > > 
> > > MAINTAINERS should be updated as well.
> > > 
> > > It'd probably be nice to try to get an email to
> > > the irda mailing list too if it still works.
> > 
> > As get_maintainer.pl didn't show it, odds are it doesn't...
> 
> get_maintainer doesn't show it because it's subscriber-only.
> If you want get_maintainer to show it, add -s
> 
> $ ./scripts/get_maintainer.pl  -s -f net/irda/
> Samuel Ortiz  (maintainer:IRDA SUBSYSTEM)
> "David S. Miller"  (maintainer:NETWORKING [GENERAL])
> irda-us...@lists.sourceforge.net (subscriber list:IRDA SUBSYSTEM)
> netdev@vger.kernel.org (open list:IRDA SUBSYSTEM)
> linux-ker...@vger.kernel.org (open list)

Sorry, am not going to subscribe to a random list just to send patches
that delete the subsystem :)

netdev@ should be all that is needed here anyway...

thanks,

greg k-h

Re: [PATCH] DSA support for Micrel KSZ8895

2017-08-27 Thread Andrew Lunn

> No, tag_ksz part probably is not acceptable. Do you see solution
> better than just copying it into tag_ksz1 file?

How about something like this, which needs further work to actually
compile, but should give you the idea.

 Andrew

index 99e38af85fc5..843e77b7c270 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -49,8 +49,11 @@ const struct dsa_device_ops *dsa_device_ops[DSA_TAG_LAST] = {
 #ifdef CONFIG_NET_DSA_TAG_EDSA
    [DSA_TAG_PROTO_EDSA] = &edsa_netdev_ops,
 #endif
-#ifdef CONFIG_NET_DSA_TAG_KSZ
-   [DSA_TAG_PROTO_KSZ] = &ksz_netdev_ops,
+#ifdef CONFIG_NET_DSA_TAG_KSZ_8K
+   [DSA_TAG_PROTO_KSZ8K] = &ksz8k_netdev_ops,
+#endif
+#ifdef CONFIG_NET_DSA_TAG_KSZ_9K
+   [DSA_TAG_PROTO_KSZ9K] = &ksz9k_netdev_ops,
 #endif
 #ifdef CONFIG_NET_DSA_TAG_LAN9303
    [DSA_TAG_PROTO_LAN9303] = &lan9303_netdev_ops,
diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
index de66ca8e6201..398b833889f1 100644
--- a/net/dsa/tag_ksz.c
+++ b/net/dsa/tag_ksz.c
@@ -35,6 +35,9 @@
 static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct net_device *dev)
 {
struct dsa_slave_priv *p = netdev_priv(dev);
+   struct dsa_port *dp = p->dp;
+   struct dsa_switch *ds = dp->ds;
+   struct dsa_switch_tree *dst = ds->dst;
struct sk_buff *nskb;
int padlen;
u8 *tag;
@@ -69,8 +72,15 @@ static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct net_device *dev)
}
 
tag = skb_put(nskb, KSZ_INGRESS_TAG_LEN);
-   tag[0] = 0;
-   tag[1] = 1 << p->dp->index; /* destination port */
+   if (dst->tag_ops == &ksz8k_netdev_ops) {
+   tag[0] = 1 << p->dp->index; /* destination port */
+   tag[1] = 0;
+   }
+
+   if (dst->tag_ops == &ksz9k_netdev_ops) {
+   tag[0] = 0;
+   tag[1] = 1 << p->dp->index; /* destination port */
+   }
 
return nskb;
 }
@@ -98,7 +108,12 @@ static struct sk_buff *ksz_rcv(struct sk_buff *skb, struct net_device *dev,
return skb;
 }
 
-const struct dsa_device_ops ksz_netdev_ops = {
+const struct dsa_device_ops ksz8k_netdev_ops = {
+   .xmit   = ksz_xmit,
+   .rcv= ksz_rcv,
+};
+
+const struct dsa_device_ops ksz9k_netdev_ops = {
.xmit   = ksz_xmit,
.rcv= ksz_rcv,
 };
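
The difference between the two tag layouts in the patch above can be sketched in isolation. This is a standalone illustration, assuming a two-byte ingress tag as written by the patch; the enum and function names are hypothetical and do not exist in net/dsa.

```c
#include <stdint.h>

/* Hypothetical selector for the two tag layouts (not a kernel type). */
enum example_ksz_variant { EXAMPLE_KSZ8K, EXAMPLE_KSZ9K };

/*
 * Fill the two-byte ingress tag for the given destination port.
 * Byte order follows the patch above: the 8K variant carries the
 * port bitmap in tag[0], the 9K variant carries it in tag[1].
 */
static void example_ksz_fill_tag(enum example_ksz_variant variant,
				 unsigned int port, uint8_t tag[2])
{
	uint8_t port_bit = (uint8_t)(1u << port);

	if (variant == EXAMPLE_KSZ8K) {
		tag[0] = port_bit;
		tag[1] = 0;
	} else {
		tag[0] = 0;
		tag[1] = port_bit;
	}
}
```

Since only the byte order differs, one xmit routine can serve both device_ops structures, which is the point of keeping a single tag_ksz.c.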

Re: [PATCH] DSA support for Micrel KSZ8895

2017-08-27 Thread Andrew Lunn

> +/**
> + * sw_r_phy - read data from PHY register
> + * @sw:  The switch instance.
> + * @phy: PHY address to read.
> + * @reg: PHY register to read.
> + * @val: Buffer to store the read data.
> + *
> + * This routine reads data from the PHY register.
> + */
> +static void sw_r_phy(struct ksz_device *sw, u16 phy, u16 reg, u16 *val)
> +{
> + u8 ctrl;
> + u8 restart;
> + u8 link;
> + u8 speed;
> + u8 force;
> + u8 p = phy;
> + u16 data = 0;
> +
> + switch (reg) {
> + case PHY_REG_CTRL:
> + ksz_pread8(sw, p, P_LOCAL_CTRL, &ctrl);
> + ksz_pread8(sw, p, P_NEG_RESTART_CTRL, &restart);
> + ksz_pread8(sw, p, P_SPEED_STATUS, &speed);
> + ksz_pread8(sw, p, P_FORCE_CTRL, &force);
> + if (restart & PORT_PHY_LOOPBACK)
> + data |= PHY_LOOPBACK;
> + if (force & PORT_FORCE_100_MBIT)
> + data |= PHY_SPEED_100MBIT;
> + if (!(force & PORT_AUTO_NEG_DISABLE))
> + data |= PHY_AUTO_NEG_ENABLE;
> + if (restart & PORT_POWER_DOWN)
> + data |= PHY_POWER_DOWN;
> + if (restart & PORT_AUTO_NEG_RESTART)
> + data |= PHY_AUTO_NEG_RESTART;
> + if (force & PORT_FORCE_FULL_DUPLEX)
> + data |= PHY_FULL_DUPLEX;
> + if (speed & PORT_HP_MDIX)
> + data |= PHY_HP_MDIX;
> + if (restart & PORT_FORCE_MDIX)
> + data |= PHY_FORCE_MDIX;
> + if (restart & PORT_AUTO_MDIX_DISABLE)
> + data |= PHY_AUTO_MDIX_DISABLE;
> + if (restart & PORT_TX_DISABLE)
> + data |= PHY_TRANSMIT_DISABLE;
> + if (restart & PORT_LED_OFF)
> + data |= PHY_LED_DISABLE;
> + break;
> + case PHY_REG_STATUS:
> + ksz_pread8(sw, p, P_LINK_STATUS, &link);
> + ksz_pread8(sw, p, P_SPEED_STATUS, &speed);
> + data = PHY_100BTX_FD_CAPABLE |
> + PHY_100BTX_CAPABLE |
> + PHY_10BT_FD_CAPABLE |
> + PHY_10BT_CAPABLE |
> + PHY_AUTO_NEG_CAPABLE;
> + if (link & PORT_AUTO_NEG_COMPLETE)
> + data |= PHY_AUTO_NEG_ACKNOWLEDGE;
> + if (link & PORT_STAT_LINK_GOOD)
> + data |= PHY_LINK_STATUS;
> + break;
> + case PHY_REG_ID_1:
> + data = KSZ8895_ID_HI;
> + break;
> + case PHY_REG_ID_2:
> + data = KSZ8895_ID_LO;
> + break;

According to the datasheet, the PHY has the normal ID registers,
which have the value 0x0022, 0x1450. So it should be possible to have
a standard PHY driver in drivers/net/phy.

In fact, the IDs suggest it is a micrel phy, and 1430, 1435 are
already supported. So it could be you only need minor modifications to
the micrel.c.
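
To make the ID comparison concrete, here is a small sketch of how the two 16-bit ID registers combine into one 32-bit PHY ID and how that ID is matched under a mask. The function names are hypothetical, and the mask used in the example (ignoring the low four revision bits) is an assumption, not a value copied from micrel.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Combine the two 16-bit ID registers: ID1 is the high half. */
static uint32_t example_phy_id(uint16_t id1, uint16_t id2)
{
	return ((uint32_t)id1 << 16) | id2;
}

/* Match an ID against a driver entry, ignoring masked-out bits. */
static bool example_phy_id_match(uint32_t id, uint32_t drv_id, uint32_t mask)
{
	return (id & mask) == (drv_id & mask);
}
```

For the values quoted above, example_phy_id(0x0022, 0x1450) gives 0x00221450. With a mask of 0xfffffff0 that ID does not match an entry for 0x00221430, so a new table entry would be needed even if the rest of the driver code can be shared.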

Andrew

Re: [PATCH 0/4] irda: move it to drivers/staging so we can delete it

2017-08-27 Thread Joe Perches

On Sun, 2017-08-27 at 18:13 +0200, Greg Kroah-Hartman wrote:
> On Sun, Aug 27, 2017 at 08:35:43AM -0700, Joe Perches wrote:
> > On Sun, 2017-08-27 at 17:03 +0200, Greg Kroah-Hartman wrote:
> > > The IRDA code has long been obsolete and broken.  So, to keep people
> > > from trying to use it, and to prevent people from having to maintain it,
> > > let's move it to drivers/staging/ so that we can delete it entirely from
> > > the kernel in a few releases.
> > 
> > 
> > MAINTAINERS should be updated as well.
> > 
> > It'd probably be nice to try to get an email to
> > the irda mailing list too if it still works.
> 
> As get_maintainer.pl didn't show it, odds are it doesn't...

get_maintainer doesn't show it because it's subscriber-only.
If you want get_maintainer to show it, add -s

$ ./scripts/get_maintainer.pl  -s -f net/irda/
Samuel Ortiz  (maintainer:IRDA SUBSYSTEM)
"David S. Miller"  (maintainer:NETWORKING [GENERAL])
irda-us...@lists.sourceforge.net (subscriber list:IRDA SUBSYSTEM)
netdev@vger.kernel.org (open list:IRDA SUBSYSTEM)
linux-ker...@vger.kernel.org (open list)

Re: [PATCH 0/4] irda: move it to drivers/staging so we can delete it

2017-08-27 Thread Greg Kroah-Hartman

On Sun, Aug 27, 2017 at 08:35:43AM -0700, Joe Perches wrote:
> On Sun, 2017-08-27 at 17:03 +0200, Greg Kroah-Hartman wrote:
> > The IRDA code has long been obsolete and broken.  So, to keep people
> > from trying to use it, and to prevent people from having to maintain it,
> > let's move it to drivers/staging/ so that we can delete it entirely from
> > the kernel in a few releases.
> 
> MAINTAINERS should be updated as well.
> 
> It'd probably be nice to try to get an email to
> the irda mailing list too if it still works.

As get_maintainer.pl didn't show it, odds are it doesn't...

Re: [PATCH 0/4] irda: move it to drivers/staging so we can delete it

2017-08-27 Thread Joe Perches

On Sun, 2017-08-27 at 17:03 +0200, Greg Kroah-Hartman wrote:
> The IRDA code has long been obsolete and broken.  So, to keep people
> from trying to use it, and to prevent people from having to maintain it,
> let's move it to drivers/staging/ so that we can delete it entirely from
> the kernel in a few releases.

MAINTAINERS should be updated as well.

It'd probably be nice to try to get an email to
the irda mailing list too if it still works.

[PATCH 1/4] irda: move net/irda/ to drivers/staging/irda/net/

2017-08-27 Thread Greg Kroah-Hartman

It's time to get rid of IRDA.  It's long been broken, and no one seems
to use it anymore.  So move it to staging and after a while, we can
delete it from there.

To start, move the network irda core from net/irda to
drivers/staging/irda/net/

Signed-off-by: Greg Kroah-Hartman 
---
 drivers/staging/Kconfig | 2 ++
 drivers/staging/Makefile| 1 +
 {net/irda => drivers/staging/irda/net}/Kconfig  | 6 +++---
 {net/irda => drivers/staging/irda/net}/Makefile | 0
 {net/irda => drivers/staging/irda/net}/af_irda.c| 0
 {net/irda => drivers/staging/irda/net}/discovery.c  | 0
 {net/irda => drivers/staging/irda/net}/ircomm/Kconfig   | 0
 {net/irda => drivers/staging/irda/net}/ircomm/Makefile  | 0
 {net/irda => drivers/staging/irda/net}/ircomm/ircomm_core.c | 0
 {net/irda => drivers/staging/irda/net}/ircomm/ircomm_event.c| 0
 {net/irda => drivers/staging/irda/net}/ircomm/ircomm_lmp.c  | 0
 {net/irda => drivers/staging/irda/net}/ircomm/ircomm_param.c| 0
 {net/irda => drivers/staging/irda/net}/ircomm/ircomm_ttp.c  | 0
 {net/irda => drivers/staging/irda/net}/ircomm/ircomm_tty.c  | 0
 {net/irda => drivers/staging/irda/net}/ircomm/ircomm_tty_attach.c   | 0
 {net/irda => drivers/staging/irda/net}/ircomm/ircomm_tty_ioctl.c| 0
 {net/irda => drivers/staging/irda/net}/irda_device.c| 0
 {net/irda => drivers/staging/irda/net}/iriap.c  | 0
 {net/irda => drivers/staging/irda/net}/iriap_event.c| 0
 {net/irda => drivers/staging/irda/net}/irias_object.c   | 0
 {net/irda => drivers/staging/irda/net}/irlan/Kconfig| 0
 {net/irda => drivers/staging/irda/net}/irlan/Makefile   | 0
 {net/irda => drivers/staging/irda/net}/irlan/irlan_client.c | 0
 {net/irda => drivers/staging/irda/net}/irlan/irlan_client_event.c   | 0
 {net/irda => drivers/staging/irda/net}/irlan/irlan_common.c | 0
 {net/irda => drivers/staging/irda/net}/irlan/irlan_eth.c| 0
 {net/irda => drivers/staging/irda/net}/irlan/irlan_event.c  | 0
 {net/irda => drivers/staging/irda/net}/irlan/irlan_filter.c | 0
 {net/irda => drivers/staging/irda/net}/irlan/irlan_provider.c   | 0
 {net/irda => drivers/staging/irda/net}/irlan/irlan_provider_event.c | 0
 {net/irda => drivers/staging/irda/net}/irlap.c  | 0
 {net/irda => drivers/staging/irda/net}/irlap_event.c| 0
 {net/irda => drivers/staging/irda/net}/irlap_frame.c| 0
 {net/irda => drivers/staging/irda/net}/irlmp.c  | 0
 {net/irda => drivers/staging/irda/net}/irlmp_event.c| 0
 {net/irda => drivers/staging/irda/net}/irlmp_frame.c| 0
 {net/irda => drivers/staging/irda/net}/irmod.c  | 0
 {net/irda => drivers/staging/irda/net}/irnet/Kconfig| 0
 {net/irda => drivers/staging/irda/net}/irnet/Makefile   | 0
 {net/irda => drivers/staging/irda/net}/irnet/irnet.h| 0
 {net/irda => drivers/staging/irda/net}/irnet/irnet_irda.c   | 0
 {net/irda => drivers/staging/irda/net}/irnet/irnet_irda.h   | 0
 {net/irda => drivers/staging/irda/net}/irnet/irnet_ppp.c| 0
 {net/irda => drivers/staging/irda/net}/irnet/irnet_ppp.h| 0
 {net/irda => drivers/staging/irda/net}/irnetlink.c  | 0
 {net/irda => drivers/staging/irda/net}/irproc.c | 0
 {net/irda => drivers/staging/irda/net}/irqueue.c| 0
 {net/irda => drivers/staging/irda/net}/irsysctl.c   | 0
 {net/irda => drivers/staging/irda/net}/irttp.c  | 0
 {net/irda => drivers/staging/irda/net}/parameters.c | 0
 {net/irda => drivers/staging/irda/net}/qos.c| 0
 {net/irda => drivers/staging/irda/net}/timer.c  | 0
 {net/irda => drivers/staging/irda/net}/wrapper.c| 0
 net/Kconfig | 1 -
 net/Makefile| 1 -
 55 files changed, 6 insertions(+), 5 deletions(-)
 rename {net/irda => drivers/staging/irda/net}/Kconfig (95%)
 rename {net/irda => drivers/staging/irda/net}/Makefile (100%)
 rename {net/irda => drivers/staging/irda/net}/af_irda.c (100%)
 rename {net/irda => drivers/staging/irda/net}/discovery.c (100%)
 rename {net/irda => drivers/staging/irda/net}/ircomm/Kconfig (100%)
 rename {net/irda => drivers/staging/irda/net}/ircomm/Makefile (100%)
 rename {net/irda => drivers/staging/irda/net}/ircomm/ircomm_core.c (100%)
 rename {net/irda => drivers/staging/irda/net}/ircomm/ircomm_event.c (100%)
 rename {net/irda =>

[PATCH 0/4] irda: move it to drivers/staging so we can delete it

2017-08-27 Thread Greg Kroah-Hartman

The IRDA code has long been obsolete and broken.  So, to keep people
from trying to use it, and to prevent people from having to maintain it,
let's move it to drivers/staging/ so that we can delete it entirely from
the kernel in a few releases.


Greg Kroah-Hartman (4):
  irda: move net/irda/ to drivers/staging/irda/net/
  irda: move drivers/net/irda to drivers/staging/irda/drivers
  irda: move include/net/irda into staging subdirectory
  staging: irda: add a TODO file.

 drivers/net/Makefile  | 1 -
 drivers/staging/Kconfig   | 2 ++
 drivers/staging/Makefile  | 2 ++
 drivers/staging/irda/TODO | 4 
 drivers/{net/irda => staging/irda/drivers}/Kconfig| 0
 drivers/{net/irda => staging/irda/drivers}/Makefile   | 2 ++
 drivers/{net/irda => staging/irda/drivers}/act200l-sir.c  | 0
 drivers/{net/irda => staging/irda/drivers}/actisys-sir.c  | 0
 drivers/{net/irda => staging/irda/drivers}/ali-ircc.c | 0
 drivers/{net/irda => staging/irda/drivers}/ali-ircc.h | 0
 drivers/{net/irda => staging/irda/drivers}/au1k_ir.c  | 0
 drivers/{net/irda => staging/irda/drivers}/bfin_sir.c | 0
 drivers/{net/irda => staging/irda/drivers}/bfin_sir.h | 0
 drivers/{net/irda => staging/irda/drivers}/donauboe.c | 0
 drivers/{net/irda => staging/irda/drivers}/donauboe.h | 0
 drivers/{net/irda => staging/irda/drivers}/esi-sir.c  | 0
 drivers/{net/irda => staging/irda/drivers}/girbil-sir.c   | 0
 drivers/{net/irda => staging/irda/drivers}/irda-usb.c | 0
 drivers/{net/irda => staging/irda/drivers}/irda-usb.h | 0
 drivers/{net/irda => staging/irda/drivers}/irtty-sir.c| 0
 drivers/{net/irda => staging/irda/drivers}/irtty-sir.h| 0
 drivers/{net/irda => staging/irda/drivers}/kingsun-sir.c  | 0
 drivers/{net/irda => staging/irda/drivers}/ks959-sir.c| 0
 drivers/{net/irda => staging/irda/drivers}/ksdazzle-sir.c | 0
 drivers/{net/irda => staging/irda/drivers}/litelink-sir.c | 0
 drivers/{net/irda => staging/irda/drivers}/ma600-sir.c| 0
 drivers/{net/irda => staging/irda/drivers}/mcp2120-sir.c  | 0
 drivers/{net/irda => staging/irda/drivers}/mcs7780.c  | 0
 drivers/{net/irda => staging/irda/drivers}/mcs7780.h  | 0
 drivers/{net/irda => staging/irda/drivers}/nsc-ircc.c | 0
 drivers/{net/irda => staging/irda/drivers}/nsc-ircc.h | 0
 drivers/{net/irda => staging/irda/drivers}/old_belkin-sir.c   | 0
 drivers/{net/irda => staging/irda/drivers}/pxaficp_ir.c   | 0
 drivers/{net/irda => staging/irda/drivers}/sa1100_ir.c| 0
 drivers/{net/irda => staging/irda/drivers}/sh_sir.c   | 0
 drivers/{net/irda => staging/irda/drivers}/sir-dev.h  | 0
 drivers/{net/irda => staging/irda/drivers}/sir_dev.c  | 0
 drivers/{net/irda => staging/irda/drivers}/sir_dongle.c   | 0
 drivers/{net/irda => staging/irda/drivers}/smsc-ircc2.c   | 0
 drivers/{net/irda => staging/irda/drivers}/smsc-ircc2.h   | 0
 drivers/{net/irda => staging/irda/drivers}/smsc-sio.h | 0
 drivers/{net/irda => staging/irda/drivers}/stir4200.c | 0
 drivers/{net/irda => staging/irda/drivers}/tekram-sir.c   | 0
 drivers/{net/irda => staging/irda/drivers}/toim3232-sir.c | 0
 drivers/{net/irda => staging/irda/drivers}/via-ircc.c | 0
 drivers/{net/irda => staging/irda/drivers}/via-ircc.h | 0
 drivers/{net/irda => staging/irda/drivers}/vlsi_ir.c  | 0
 drivers/{net/irda => staging/irda/drivers}/vlsi_ir.h  | 0
 drivers/{net/irda => staging/irda/drivers}/w83977af.h | 0
 drivers/{net/irda => staging/irda/drivers}/w83977af_ir.c  | 0
 drivers/{net/irda => staging/irda/drivers}/w83977af_ir.h  | 0
 {include => drivers/staging/irda/include}/net/irda/af_irda.h  | 0
 {include => drivers/staging/irda/include}/net/irda/crc.h  | 0
 {include => drivers/staging/irda/include}/net/irda/discovery.h| 0
 {include => drivers/staging/irda/include}/net/irda/ircomm_core.h  | 0
 {include => drivers/staging/irda/include}/net/irda/ircomm_event.h | 0
 {include => drivers/staging/irda/include}/net/irda/ircomm_lmp.h   | 0
 {include => drivers/staging/irda/include}/net/irda/ircomm_param.h | 0
 {include => drivers/staging/irda/include}/net/irda/ircomm_ttp.h   | 0
 {include => drivers/staging/irda/include}/net/irda/ircomm_tty.h   | 0
 .../staging/irda/include}/net/irda/ircomm_tty_attach.h| 0
 {include => drivers/staging/irda/include}/net/irda/irda.h | 0
 {include => drivers/staging/irda/include}/net/irda/irda_device.h  | 0
 {include =>

[PATCH 4/4] staging: irda: add a TODO file.

2017-08-27 Thread Greg Kroah-Hartman

The irda code will be deleted in a future kernel release, so no need to
have anyone do any new work on it.

Signed-off-by: Greg Kroah-Hartman 
---
 drivers/staging/irda/TODO | 4 
 1 file changed, 4 insertions(+)
 create mode 100644 drivers/staging/irda/TODO

diff --git a/drivers/staging/irda/TODO b/drivers/staging/irda/TODO
new file mode 100644
index ..7d98a5cffaff
--- /dev/null
+++ b/drivers/staging/irda/TODO
@@ -0,0 +1,4 @@
+The irda code will be removed soon from the kernel tree as it is old and
+obsolete and broken.
+
+Don't worry about fixing up anything here, it's not needed.
-- 
2.14.1

[PATCH 2/4] irda: move drivers/net/irda to drivers/staging/irda/drivers

2017-08-27 Thread Greg Kroah-Hartman

Move the irda drivers from drivers/net/irda/ to
drivers/staging/irda/drivers as they will be deleted in a future kernel
release.

Signed-off-by: Greg Kroah-Hartman 
---
 drivers/net/Makefile| 1 -
 drivers/staging/Makefile| 1 +
 drivers/{net/irda => staging/irda/drivers}/Kconfig  | 0
 drivers/{net/irda => staging/irda/drivers}/Makefile | 0
 drivers/{net/irda => staging/irda/drivers}/act200l-sir.c| 0
 drivers/{net/irda => staging/irda/drivers}/actisys-sir.c| 0
 drivers/{net/irda => staging/irda/drivers}/ali-ircc.c   | 0
 drivers/{net/irda => staging/irda/drivers}/ali-ircc.h   | 0
 drivers/{net/irda => staging/irda/drivers}/au1k_ir.c| 0
 drivers/{net/irda => staging/irda/drivers}/bfin_sir.c   | 0
 drivers/{net/irda => staging/irda/drivers}/bfin_sir.h   | 0
 drivers/{net/irda => staging/irda/drivers}/donauboe.c   | 0
 drivers/{net/irda => staging/irda/drivers}/donauboe.h   | 0
 drivers/{net/irda => staging/irda/drivers}/esi-sir.c| 0
 drivers/{net/irda => staging/irda/drivers}/girbil-sir.c | 0
 drivers/{net/irda => staging/irda/drivers}/irda-usb.c   | 0
 drivers/{net/irda => staging/irda/drivers}/irda-usb.h   | 0
 drivers/{net/irda => staging/irda/drivers}/irtty-sir.c  | 0
 drivers/{net/irda => staging/irda/drivers}/irtty-sir.h  | 0
 drivers/{net/irda => staging/irda/drivers}/kingsun-sir.c| 0
 drivers/{net/irda => staging/irda/drivers}/ks959-sir.c  | 0
 drivers/{net/irda => staging/irda/drivers}/ksdazzle-sir.c   | 0
 drivers/{net/irda => staging/irda/drivers}/litelink-sir.c   | 0
 drivers/{net/irda => staging/irda/drivers}/ma600-sir.c  | 0
 drivers/{net/irda => staging/irda/drivers}/mcp2120-sir.c| 0
 drivers/{net/irda => staging/irda/drivers}/mcs7780.c| 0
 drivers/{net/irda => staging/irda/drivers}/mcs7780.h| 0
 drivers/{net/irda => staging/irda/drivers}/nsc-ircc.c   | 0
 drivers/{net/irda => staging/irda/drivers}/nsc-ircc.h   | 0
 drivers/{net/irda => staging/irda/drivers}/old_belkin-sir.c | 0
 drivers/{net/irda => staging/irda/drivers}/pxaficp_ir.c | 0
 drivers/{net/irda => staging/irda/drivers}/sa1100_ir.c  | 0
 drivers/{net/irda => staging/irda/drivers}/sh_sir.c | 0
 drivers/{net/irda => staging/irda/drivers}/sir-dev.h| 0
 drivers/{net/irda => staging/irda/drivers}/sir_dev.c| 0
 drivers/{net/irda => staging/irda/drivers}/sir_dongle.c | 0
 drivers/{net/irda => staging/irda/drivers}/smsc-ircc2.c | 0
 drivers/{net/irda => staging/irda/drivers}/smsc-ircc2.h | 0
 drivers/{net/irda => staging/irda/drivers}/smsc-sio.h   | 0
 drivers/{net/irda => staging/irda/drivers}/stir4200.c   | 0
 drivers/{net/irda => staging/irda/drivers}/tekram-sir.c | 0
 drivers/{net/irda => staging/irda/drivers}/toim3232-sir.c   | 0
 drivers/{net/irda => staging/irda/drivers}/via-ircc.c   | 0
 drivers/{net/irda => staging/irda/drivers}/via-ircc.h   | 0
 drivers/{net/irda => staging/irda/drivers}/vlsi_ir.c| 0
 drivers/{net/irda => staging/irda/drivers}/vlsi_ir.h| 0
 drivers/{net/irda => staging/irda/drivers}/w83977af.h   | 0
 drivers/{net/irda => staging/irda/drivers}/w83977af_ir.c| 0
 drivers/{net/irda => staging/irda/drivers}/w83977af_ir.h| 0
 drivers/staging/irda/net/Kconfig| 2 +-
 50 files changed, 2 insertions(+), 2 deletions(-)
 rename drivers/{net/irda => staging/irda/drivers}/Kconfig (100%)
 rename drivers/{net/irda => staging/irda/drivers}/Makefile (100%)
 rename drivers/{net/irda => staging/irda/drivers}/act200l-sir.c (100%)
 rename drivers/{net/irda => staging/irda/drivers}/actisys-sir.c (100%)
 rename drivers/{net/irda => staging/irda/drivers}/ali-ircc.c (100%)
 rename drivers/{net/irda => staging/irda/drivers}/ali-ircc.h (100%)
 rename drivers/{net/irda => staging/irda/drivers}/au1k_ir.c (100%)
 rename drivers/{net/irda => staging/irda/drivers}/bfin_sir.c (100%)
 rename drivers/{net/irda => staging/irda/drivers}/bfin_sir.h (100%)
 rename drivers/{net/irda => staging/irda/drivers}/donauboe.c (100%)
 rename drivers/{net/irda => staging/irda/drivers}/donauboe.h (100%)
 rename drivers/{net/irda => staging/irda/drivers}/esi-sir.c (100%)
 rename drivers/{net/irda => staging/irda/drivers}/girbil-sir.c (100%)
 rename drivers/{net/irda => staging/irda/drivers}/irda-usb.c (100%)
 rename drivers/{net/irda => staging/irda/drivers}/irda-usb.h (100%)
 rename drivers/{net/irda => staging/irda/drivers}/irtty-sir.c (100%)
 rename drivers/{net/irda => staging/irda/drivers}/irtty-sir.h (100%)
 rename drivers/{net/irda => staging/irda/drivers}/kingsun-sir.c (100%)
 rename drivers/{net/irda => staging/irda/drivers}/ks959-sir.c (100%)
 rename drivers/{net/irda => staging/irda/drivers}/ksdazzle-sir.c (100%)
 rename drivers/{net/irda => staging/irda/drivers}/litelink-sir.c (100%)
 rename drivers/{net/irda =>

[PATCH 3/4] irda: move include/net/irda into staging subdirectory

2017-08-27 Thread Greg Kroah-Hartman

And finally, move the irda include files into
drivers/staging/irda/include/net/irda.  Yes, it's a long path, but it
makes it easy for us to just add a Makefile directory path addition and
all of the net and drivers code "just works".

Signed-off-by: Greg Kroah-Hartman 
---
 drivers/staging/irda/drivers/Makefile  | 2 ++
 {include => drivers/staging/irda/include}/net/irda/af_irda.h   | 0
 {include => drivers/staging/irda/include}/net/irda/crc.h   | 0
 {include => drivers/staging/irda/include}/net/irda/discovery.h | 0
 {include => drivers/staging/irda/include}/net/irda/ircomm_core.h   | 0
 {include => drivers/staging/irda/include}/net/irda/ircomm_event.h  | 0
 {include => drivers/staging/irda/include}/net/irda/ircomm_lmp.h| 0
 {include => drivers/staging/irda/include}/net/irda/ircomm_param.h  | 0
 {include => drivers/staging/irda/include}/net/irda/ircomm_ttp.h| 0
 {include => drivers/staging/irda/include}/net/irda/ircomm_tty.h| 0
 {include => drivers/staging/irda/include}/net/irda/ircomm_tty_attach.h | 0
 {include => drivers/staging/irda/include}/net/irda/irda.h  | 0
 {include => drivers/staging/irda/include}/net/irda/irda_device.h   | 0
 {include => drivers/staging/irda/include}/net/irda/iriap.h | 0
 {include => drivers/staging/irda/include}/net/irda/iriap_event.h   | 0
 {include => drivers/staging/irda/include}/net/irda/irias_object.h  | 0
 {include => drivers/staging/irda/include}/net/irda/irlan_client.h  | 0
 {include => drivers/staging/irda/include}/net/irda/irlan_common.h  | 0
 {include => drivers/staging/irda/include}/net/irda/irlan_eth.h | 0
 {include => drivers/staging/irda/include}/net/irda/irlan_event.h   | 0
 {include => drivers/staging/irda/include}/net/irda/irlan_filter.h  | 0
 {include => drivers/staging/irda/include}/net/irda/irlan_provider.h| 0
 {include => drivers/staging/irda/include}/net/irda/irlap.h | 0
 {include => drivers/staging/irda/include}/net/irda/irlap_event.h   | 0
 {include => drivers/staging/irda/include}/net/irda/irlap_frame.h   | 0
 {include => drivers/staging/irda/include}/net/irda/irlmp.h | 0
 {include => drivers/staging/irda/include}/net/irda/irlmp_event.h   | 0
 {include => drivers/staging/irda/include}/net/irda/irlmp_frame.h   | 0
 {include => drivers/staging/irda/include}/net/irda/irmod.h | 0
 {include => drivers/staging/irda/include}/net/irda/irqueue.h   | 0
 {include => drivers/staging/irda/include}/net/irda/irttp.h | 0
 {include => drivers/staging/irda/include}/net/irda/parameters.h| 0
 {include => drivers/staging/irda/include}/net/irda/qos.h   | 0
 {include => drivers/staging/irda/include}/net/irda/timer.h | 0
 {include => drivers/staging/irda/include}/net/irda/wrapper.h   | 0
 drivers/staging/irda/net/Makefile  | 2 ++
 36 files changed, 4 insertions(+)
 rename {include => drivers/staging/irda/include}/net/irda/af_irda.h (100%)
 rename {include => drivers/staging/irda/include}/net/irda/crc.h (100%)
 rename {include => drivers/staging/irda/include}/net/irda/discovery.h (100%)
 rename {include => drivers/staging/irda/include}/net/irda/ircomm_core.h (100%)
 rename {include => drivers/staging/irda/include}/net/irda/ircomm_event.h (100%)
 rename {include => drivers/staging/irda/include}/net/irda/ircomm_lmp.h (100%)
 rename {include => drivers/staging/irda/include}/net/irda/ircomm_param.h (100%)
 rename {include => drivers/staging/irda/include}/net/irda/ircomm_ttp.h (100%)
 rename {include => drivers/staging/irda/include}/net/irda/ircomm_tty.h (100%)
 rename {include => drivers/staging/irda/include}/net/irda/ircomm_tty_attach.h (100%)
 rename {include => drivers/staging/irda/include}/net/irda/irda.h (100%)
 rename {include => drivers/staging/irda/include}/net/irda/irda_device.h (100%)
 rename {include => drivers/staging/irda/include}/net/irda/iriap.h (100%)
 rename {include => drivers/staging/irda/include}/net/irda/iriap_event.h (100%)
 rename {include => drivers/staging/irda/include}/net/irda/irias_object.h (100%)
 rename {include => drivers/staging/irda/include}/net/irda/irlan_client.h (100%)
 rename {include => drivers/staging/irda/include}/net/irda/irlan_common.h (100%)
 rename {include => drivers/staging/irda/include}/net/irda/irlan_eth.h (100%)
 rename {include => drivers/staging/irda/include}/net/irda/irlan_event.h (100%)
 rename {include => drivers/staging/irda/include}/net/irda/irlan_filter.h (100%)
 rename {include => drivers/staging/irda/include}/net/irda/irlan_provider.h (100%)
 rename {include => drivers/staging/irda/include}/net/irda/irlap.h (100%)
 rename {include => drivers/staging/irda/include}/net/irda/irlap_event.h (100%)
 rename {include => drivers/staging/irda/include}/net/irda/irlap_frame.h (100%)
 rename {include =>

Re: [PATCH v2 net-next 1/8] bpf: Add support for recursively running cgroup sock filters

2017-08-27 Thread David Ahern

On 8/25/17 8:49 PM, Alexei Starovoitov wrote:
> 
>> +if (prog && curr_recursive && !new_recursive)
>> +/* if a parent has recursive prog attached, only
>> + * allow recursive programs in descendent cgroup
>> + */
>> +return -EINVAL;
>> +
>>  old_prog = cgrp->bpf.prog[type];
> 
> ... I'm struggling to completely understand how it interacts
> with BPF_F_ALLOW_OVERRIDE.

The 2 flags are completely independent. The existing override logic is
unchanged. If a program can not be overridden, then the new recursive
flag is irrelevant.

> By default we shouldn't allow overriding, so if default prog attached
> to a root, what happens if we try to attach F_RECURSIVE to a descendent?
> If I'm reading the code correctly it will not succeed, which is good.
> Could you add such scenario as test to test_cgrp2_attach2.c ?

Patch 7 adds test cases to cover scenarios. I will add more tests per
comments below and rename to convey it tests the recursive flag.

> 
> Now say we attach overridable and !recursive to a root, another
> recursive prog will not be attached to a descendent, which is correct.

yes

> 
> But if we attach !overridable + recursive to a root we cannot attach
> anything to a descendent right? Then why allow such combination at all?

Sure, we can disallow that combination to prevent the inefficiency of
recursively running through cgroups to run the base program.

> So only overridable + recursive combination makes sense, right?
> 
> I think all these combinations must be documented and tests must be
> added. Sooner or later people will build security sensitive environment
> with it and we have to be meticulous now.

Intentions below. I'll add more test cases to verify intentions agree
with code.

> 
> Do you think it would make sense to split this patch out and
> push patches 2 and 3 with a few tests in parallel, while we review
> this change?

I thought about that but decided no. The 'ip vrf exec' use case would
break right out of the gate if the other settings were used.

> 
> Tejun needs to take a deep look into this patch as well.
> 

This is the intended behavior:

The override flag is independent of the recursive flag. If the override
flag does not allow an override, the attempt to add a new program fails.
The recursive flag brings an additional constraint: once a cgroup has a
program with the recursive flag set, the flag is inherited by all descendant
groups. Attempts to insert a program that changes that flag fail with EINVAL.

Start with the root group at $MNT. No program is attached. By default
override is allowed and recursive is not set.

1. Group $MNT/a is created.

i. Default settings from $MNT are inherited; 'a' has override enabled
and recursive disabled.

ii. Program is attached. Override flag is set, recursive flag is not set.

iii. Process in 'a' opens a socket, program attached to 'a' is run.

2. $MNT/a/b is created

i. 'b' inherits the program and settings of 'a' (override enabled,
recursive disabled).

ii. Process in 'b' opens a socket. Program inherited from 'a' is run.

iii. Non-interesting case for this patch set: attaching a non-recursive
program to 'b' overrides the inherited one. When a process opens a socket,
only the 'b' program is run.

iv. Program is attached to 'b', override flag set, recursive flag set.

v. Process in 'b' opens a socket. Program attached to 'b' is run and
then program from 'a' is run. Recursion stops here since 'a' does not
have the recursion flag set.

3. $MNT/a/b/c is created

i. 'c' inherits the settings of 'b' (override is allowed, recursive flag
is set)

ii. Process in 'c' opens a socket. No program from 'c' exists, so
nothing is run. Recursion flag is set, so program from 'b' is run, then
program from 'a' is run. Stop (recursive flag not set on 'a').

iii. Attaching a non-recursive program to 'c' fails because it inherited
the recursive flag from 'b' and that can not be reset by a descendant.

iv. Recursive program is attached to 'c'

v. Process in 'c' opens a socket. Program attached to 'c' is run, then
the program from 'b' and the program from 'a'. Stop.

etc.

To consider what happens on doubling back and changing programs in the
hierarchy, start with $MNT/a/b/c from 3 above (non-recursive on 'a',
recursive on 'b' and recursive on 'c') for each of the following cases:

1. Program attached to 'b' is detached, recursive flag is reset in the
request. Attempt fails with EINVAL because the recursion flag has to be set.

2. Program attached to 'b' is detached, recursive flag is set. Allowed.

Process in 'b' opens a socket. No program attached to 'b' so no program
is run. Recursive flag is set so program from 'a' is run. Stop.

We should allow the recursive flag to be reset if the parent is not
recursive, allowing an unwind of the settings applied. I'll add that change.
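
The run order described above can be modelled with a small userspace sketch. This is a toy model of the intended behavior only, under the assumption that a group without its own program defers to its ancestors; the struct and function names are hypothetical and this is not the kernel implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one cgroup in the hierarchy (hypothetical type). */
struct example_cgroup {
	struct example_cgroup *parent;
	int prog;        /* id of the attached program, 0 if none */
	bool recursive;  /* recursive flag for this group */
};

/*
 * Record, in order, the programs that would run for a process in @cg.
 * A group without its own program defers to its ancestors; a group
 * with a program runs it and keeps climbing only if it is recursive.
 * Returns the number of ids written to @out (at most @max).
 */
static size_t example_run_order(const struct example_cgroup *cg,
				int *out, size_t max)
{
	size_t n = 0;

	for (; cg && n < max; cg = cg->parent) {
		if (!cg->prog)
			continue;
		out[n++] = cg->prog;
		if (!cg->recursive)
			break;
	}
	return n;
}
```

With non-recursive 'a', recursive 'b' and a program-less 'c', a socket opened in 'c' yields the order b, a, which corresponds to case 3.ii. Attaching a recursive program to 'c' yields c, b, a (case 3.v).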

Re: [PATCH v2 net-next 1/8] bpf: Add support for recursively running cgroup sock filters

2017-08-27 Thread David Ahern

On 8/25/17 8:00 PM, Daniel Borkmann wrote:
> Can you elaborate on the semantical changes for the programs
> setting the new flag which are not using below cgroup_bpf_run_filter_sk()
> helper to walk back to root?

You mean other cgroup based programs -- BPF_CGROUP_* ? If so, any reason
not to allow the recursion model on those too?

Re: [PATCH] DSA support for Micrel KSZ8895

2017-08-27 Thread Andrew Lunn

On Sun, Aug 27, 2017 at 02:36:58PM +0200, Pavel Machek wrote:
> Hi!
> 
> So I fought with the driver a bit more, and now I have something that
> kind-of-works.

Thanks for keeping on working on this.

> "great great hack" belows worries me.
> 
> Yeah, disabled code needs to be removed before merge.
> 
> No, tag_ksz part probably is not acceptable. Do you see solution
> better than just copying it into tag_ksz1 file?
> 
> Any more comments, etc?

It would help with review if you split this up into multiple patches.
The change to the tagger should be one patch. The mdio emulation would
make a reasonable standalone patch etc.

I will do a more detailed review later.

  Andrew

Re: [PATCH net-next 4/8] net: ethernet: add the Alpine Ethernet driver

2017-08-27 Thread Chocron, Jonathan

This is a fixed version of my previous response (using proper indentation and 
leaving only the specific questions responded to).

> > +/* MDIO */
> > +#define AL_ETH_MDIO_C45_DEV_MASK 0x1f
> > +#define AL_ETH_MDIO_C45_DEV_SHIFT16
> > +#define AL_ETH_MDIO_C45_REG_MASK 0x
> > +
> > +static int al_mdio_read(struct mii_bus *bp, int mii_id, int reg)
> > +{
> > + struct al_eth_adapter *adapter = bp->priv;
> > + u16 value = 0;
> > + int rc;
> > + int timeout = MDIO_TIMEOUT_MSEC;
> > +
> > + while (timeout > 0) {
> > + if (reg & MII_ADDR_C45) {
> > + netdev_dbg(adapter->netdev, "[c45]: dev %x reg %x val 
> > %x\n",
> > +((reg & AL_ETH_MDIO_C45_DEV_MASK) >> 
> > AL_ETH_MDIO_C45_DEV_SHIFT),
> > +(reg & AL_ETH_MDIO_C45_REG_MASK), value);
> > + rc = al_eth_mdio_read(&adapter->hw_adapter, 
> > adapter->phy_addr,
> > + ((reg & AL_ETH_MDIO_C45_DEV_MASK) >> 
> > AL_ETH_MDIO_C45_DEV_SHIFT),
> > + (reg & AL_ETH_MDIO_C45_REG_MASK), &value);
> > + } else {
> > + rc = al_eth_mdio_read(&adapter->hw_adapter, 
> > adapter->phy_addr,
> > +   MDIO_DEVAD_NONE, reg, &value);
> > + }
> > +
> > + if (rc == 0)
> > + return value;
> > +
> > + netdev_dbg(adapter->netdev,
> > +"mdio read failed. try again in 10 msec\n");
> > +
> > + timeout -= 10;
> > + msleep(10);
> > + }
> 
> This is rather unusual, retrying MDIO operations. Are you working
> around a hardware bug? I suspect this also opens up race conditions,
> in particular with PHY interrupts, which can be clear on read.

The MDIO bus is shared between the ethernet units. There is a HW lock used to 
arbitrate between different interfaces trying to access the bus, 
therefore there is a retry loop. The reg isn't accessed before obtaining the 
lock, so there shouldn't be any clear on read issues.

> > +/* al_eth_mdiobus_setup - initialize mdiobus and register to kernel */
> > +static int al_eth_mdiobus_setup(struct al_eth_adapter *adapter)
> > +{
> > + struct phy_device *phydev;
> > + int i;
> > + int ret = 0;
> > +
> > + adapter->mdio_bus = mdiobus_alloc();
> > + if (!adapter->mdio_bus)
> > + return -ENOMEM;
> > +
> > + adapter->mdio_bus->name = "al mdio bus";
> > + snprintf(adapter->mdio_bus->id, MII_BUS_ID_SIZE, "%x",
> > +  (adapter->pdev->bus->number << 8) | adapter->pdev->devfn);
> > + adapter->mdio_bus->priv = adapter;
> > + adapter->mdio_bus->parent   = &adapter->pdev->dev;
> > + adapter->mdio_bus->read = &al_mdio_read;
> > + adapter->mdio_bus->write= &al_mdio_write;
> > + adapter->mdio_bus->phy_mask = ~BIT(adapter->phy_addr);
>
> Why do this?

Since the MDIO bus is shared, we want each interface to probe only for the PHY 
associated with it.
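
To illustrate the mask: in phylib, a set bit in phy_mask tells the MDIO
bus scan to skip that address, so ~BIT(addr) leaves exactly one address
to probe. A minimal user-space sketch, where BIT() is redefined locally
and phy_is_probed() is a hypothetical helper, not a kernel function:

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n) (UINT32_C(1) << (n))

/* Hypothetical helper: an address is probed only if its mask bit is clear. */
static int phy_is_probed(uint32_t phy_mask, unsigned int addr)
{
	assert(addr < 32);	/* an MDIO bus has 32 addresses */
	return !(phy_mask & BIT(addr));
}
```

With phy_mask = ~BIT(5), only address 5 is scanned; every other address
on the shared bus is left to the interface that owns it.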

> > + * acquire mdio interface ownership
> > + * when mdio interface shared between multiple eth controllers, this 
> > function waits until the ownership granted for this controller.
> > + * this function does nothing when the mdio interface is used only by this 
> > controller.
> > + *
> > + * @param adapter
> > + * @return 0 on success, -ETIMEDOUT  on timeout.
> > + */
> > +static int al_eth_mdio_lock(struct al_hw_eth_adapter *adapter)
> > +{
> > + int count = 0;
> > + u32 mdio_ctrl_1;
> > +
> > + if (!adapter->shared_mdio_if)
> > + return 0; /* nothing to do when interface is not shared */
> > +
> > + do {
> > + mdio_ctrl_1 = readl(&adapter->mac_regs_base->gen.mdio_ctrl_1);
> > + if (mdio_ctrl_1 & BIT(0)) {
> > + if (count > 0)
> > + netdev_dbg(adapter->netdev,
> > +"eth %s mdio interface still 
> > busy!\n",
> > +adapter->name);
> > + } else {
> > + return 0;
> > + }
> > + udelay(AL_ETH_MDIO_DELAY_PERIOD);
> > + } while (count++ < (AL_ETH_MDIO_DELAY_COUNT * 4));
>
> This needs explaining. How can a read alone perform a lock? How is
> this race free?

This is how the HW lock works: when the bit is 0, the lock is free. When a
read transaction arrives at the lock, it changes the value to 1 but returns 0
as the response, which gives ownership to the reader. When the owner is done,
it writes a 0, which frees the lock.
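
A user-space sketch of these read-to-acquire semantics may help. The
register is simulated with a plain variable; hw_lock_read(),
hw_lock_release(), mdio_lock_acquire() and the retry count are
hypothetical names chosen for illustration, not the driver's API. The
scheme relies on the hardware performing the read and the update as one
step; the single-threaded simulation below does not model that.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

static uint32_t lock_bit;	/* simulated lock register: 0 = free, 1 = owned */

/* Simulated register read: returns the old value and leaves the bit set. */
static uint32_t hw_lock_read(void)
{
	uint32_t old = lock_bit;

	lock_bit = 1;
	return old;
}

/* The owner frees the lock by writing 0. */
static void hw_lock_release(void)
{
	lock_bit = 0;
}

/* Poll until a read returns 0 (the lock is now ours) or retries run out. */
static int mdio_lock_acquire(unsigned int retries)
{
	assert(retries > 0);

	while (retries--) {
		if (hw_lock_read() == 0)
			return 0;
	}
	return -ETIMEDOUT;
}
```

A second acquire attempt without an intervening release keeps reading 1
and times out, which matches the "still busy" branch in the quoted loop.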

> > + if (adapter->mdio_type == AL_ETH_MDIO_TYPE_CLAUSE_22)
> > + rc = al_eth_mdio_10g_mac_type22(adapter, 1, phy_addr,
> > + reg, val);
> > + else
> > + rc = al_eth_mdio_10g_mac_type45(adapter, 1, phy_addr,
> > +

Re: [PATCH net-next v7 05/10] landlock: Add LSM hooks related to filesystem

2017-08-27 Thread Mickaël Salaün

On 26/08/2017 03:16, Alexei Starovoitov wrote:
> On Fri, Aug 25, 2017 at 10:16:39AM +0200, Mickaël Salaün wrote:

>>>
>>>> +/* a directory inode contains only one dentry */
>>>> +HOOK_NEW_FS(inode_create, 3,
>>>> +  struct inode *, dir,
>>>> +  struct dentry *, dentry,
>>>> +  umode_t, mode,
>>>> +  WRAP_ARG_INODE, dir,
>>>> +  WRAP_ARG_RAW, LANDLOCK_ACTION_FS_WRITE
>>>> +);
>>>
>>> more general question: why you're not wrapping all useful
>>> arguments? Like in the above dentry can be acted upon
>>> by the landlock rule and it's readily available...
>>
>> The context used for the FS event must have the exact same types for all
>> calls. This event is meant to be generic but we can add more specific
>> ones if needed, like I do with FS_IOCTL.
> 
> I see. So all FS events will have dentry as first argument
> regardless of how it is in LSM hook ?

All FS events will have a const struct bpf_handle_fs pointer as their first
argument, which wraps either a struct file, a struct dentry, a struct
path or a struct inode. Having only one type (struct bpf_handle_fs) is
needed for the eBPF type checker to verify whether a Landlock rule (tied to
an event) can access a context field and which operations are allowed
(with this pointer).

> I guess that will simplify the rules indeed.
> I suspect you're doing it to simplify the LSM->landlock shim layer as well, 
> right?

That's right. This ABI is independent of the LSM API and much
simpler to use.

> 
>> The idea is to enable people to write simple rules, while being able to
>> write fine grain rules for special cases (e.g. IOCTL) if needed.
>>
>>>
>>> The limitation of only 2 args looks odd.
>>> Is it a hard limitation ? how hard to extend?
>>
>> It's not a hard limit at all. Actually, the FS_FNCTL event should have
>> three arguments (I'll add them in the next series): FS handle, FCNTL
>> command and FCNTL argument. I made sure that it's really easy to add
>> more arguments to the context of an event.
> 
> The reason I'm asking, because I'm not completely convinced that
> adding another argument to existing event will be backwards compatible.
> It looks like you're expecting only two args for all FS events, right?

There are four events right now: FS, FS_IOCTL, FS_LOCK and FS_FCNTL. Each
of them is independent. Their context fields can differ in eBPF type
(e.g. scalar, file handle) and in number. Actually,
these four events have the same arg1 field (file handle) and the same
arg2 eBPF type (scalar), even if arg2 does not have the same semantics
(i.e. abstract FS action, IOCTL command…).

For example, if we want to extend the FS_FCNTL's context in the future,
we will just have to add an arg3. The check is performed in
landlock_is_valid_access() and landlock_decide(). If a field is not used
by an event, then this field will have a NOT_INIT type and accessing it
will be denied.
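
As a rough illustration of that check, here is a user-space sketch. The
enum values, struct event_ctx and field_access_ok() are hypothetical
stand-ins; they are not the actual Landlock or eBPF verifier
definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical field types; NOT_INIT marks a field the event does not use. */
enum arg_type { NOT_INIT = 0, SCALAR, FS_HANDLE };

#define MAX_ARGS 3

struct event_ctx {
	enum arg_type args[MAX_ARGS];
};

/* A rule may read a context field only if the event gives it a type. */
static int field_access_ok(const struct event_ctx *ev, size_t idx)
{
	assert(ev != NULL);
	return idx < MAX_ARGS && ev->args[idx] != NOT_INIT;
}
```

In this model, giving an event an arg3 later only turns a previously
denied access into an allowed one, so rules written before the
extension keep loading.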

> How can you add 3rd argument? All FS events would have to get it,
> but in some LSM hooks such argument will be meaningless, whereas
> in other places it will carry useful info that rule can operate on.
> Would that mean that we'll have FS_3 event type and only few LSM
> hooks will be converted to it. That works, but then we'll lose
> compatiblity with old rules written for FS event and that given hook.
> Otherwise we'd need to have fancy logic to accept old FS event
> into FS_3 LSM hook.

If we want to add a third argument to the FS event, then it will become
accessible because its type will be different from NOT_INIT. This keeps
compatibility with old rules, because access to this new field was
denied until then.

If we want to add a new argument but only for a subset of the hooks used
by the FS event, then we need to create a new event, like FS_FCNTL. For
example, we may want to add a FS_RENAME event to be able to tie the
source file and the destination file of a rename call.

Anyway, I added the subtype/ABI version as a safeguard in case of
unexpected future evolution.


[PATCH v4 4/7] dpaa_eth: enable Rx hashing control

2017-08-27 Thread Madalin Bucur

Allow ethtool control of the Rx flow hashing. RSS is enabled by
default; this allows turning it off by bypassing the FMan Keygen
block and sending all traffic to the default Rx frame queue.

Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c | 113 +
 1 file changed, 113 insertions(+)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
index aad825088..965f652 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
@@ -399,6 +399,117 @@ static void dpaa_get_strings(struct net_device *net_dev, 
u32 stringset,
memcpy(strings, dpaa_stats_global, size);
 }
 
+static int dpaa_get_hash_opts(struct net_device *dev,
+ struct ethtool_rxnfc *cmd)
+{
+   cmd->data = 0;
+
+   switch (cmd->flow_type) {
+   case TCP_V4_FLOW:
+   case TCP_V6_FLOW:
+   case UDP_V4_FLOW:
+   case UDP_V6_FLOW:
+   cmd->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
+   /* Fall through */
+   case IPV4_FLOW:
+   case IPV6_FLOW:
+   case SCTP_V4_FLOW:
+   case SCTP_V6_FLOW:
+   case AH_ESP_V4_FLOW:
+   case AH_ESP_V6_FLOW:
+   case AH_V4_FLOW:
+   case AH_V6_FLOW:
+   case ESP_V4_FLOW:
+   case ESP_V6_FLOW:
+   cmd->data |= RXH_IP_SRC | RXH_IP_DST;
+   break;
+   default:
+   cmd->data = 0;
+   break;
+   }
+
+   return 0;
+}
+
+static int dpaa_get_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd,
+ u32 *unused)
+{
+   int ret = -EOPNOTSUPP;
+
+   switch (cmd->cmd) {
+   case ETHTOOL_GRXFH:
+   ret = dpaa_get_hash_opts(dev, cmd);
+   break;
+   default:
+   break;
+   }
+
+   return ret;
+}
+
+static void dpaa_set_hash(struct net_device *net_dev, bool enable)
+{
+   struct mac_device *mac_dev;
+   struct fman_port *rxport;
+   struct dpaa_priv *priv;
+
+   priv = netdev_priv(net_dev);
+   mac_dev = priv->mac_dev;
+   rxport = mac_dev->port[0];
+
+   fman_port_use_kg_hash(rxport, enable);
+}
+
+static int dpaa_set_hash_opts(struct net_device *dev,
+ struct ethtool_rxnfc *nfc)
+{
+   int ret = -EINVAL;
+
+   /* we support hashing on IPv4/v6 src/dest IP and L4 src/dest port */
+   if (nfc->data &
+   ~(RXH_IP_SRC | RXH_IP_DST | RXH_L4_B_0_1 | RXH_L4_B_2_3))
+   return -EINVAL;
+
+   switch (nfc->flow_type) {
+   case TCP_V4_FLOW:
+   case TCP_V6_FLOW:
+   case UDP_V4_FLOW:
+   case UDP_V6_FLOW:
+   case IPV4_FLOW:
+   case IPV6_FLOW:
+   case SCTP_V4_FLOW:
+   case SCTP_V6_FLOW:
+   case AH_ESP_V4_FLOW:
+   case AH_ESP_V6_FLOW:
+   case AH_V4_FLOW:
+   case AH_V6_FLOW:
+   case ESP_V4_FLOW:
+   case ESP_V6_FLOW:
+   dpaa_set_hash(dev, !!nfc->data);
+   ret = 0;
+   break;
+   default:
+   break;
+   }
+
+   return ret;
+}
+
+static int dpaa_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd)
+{
+   int ret = -EOPNOTSUPP;
+
+   switch (cmd->cmd) {
+   case ETHTOOL_SRXFH:
+   ret = dpaa_set_hash_opts(dev, cmd);
+   break;
+   default:
+   break;
+   }
+
+   return ret;
+}
+
 const struct ethtool_ops dpaa_ethtool_ops = {
.get_drvinfo = dpaa_get_drvinfo,
.get_msglevel = dpaa_get_msglevel,
@@ -412,4 +523,6 @@ const struct ethtool_ops dpaa_ethtool_ops = {
.get_strings = dpaa_get_strings,
.get_link_ksettings = dpaa_get_link_ksettings,
.set_link_ksettings = dpaa_set_link_ksettings,
+   .get_rxnfc = dpaa_get_rxnfc,
+   .set_rxnfc = dpaa_set_rxnfc,
 };
-- 
2.1.0

[PATCH v4 0/7] Add RSS to DPAA 1.x Ethernet driver

2017-08-27 Thread Madalin Bucur

This patch set introduces Receive Side Scaling for the DPAA Ethernet
driver. Documentation is updated with details related to the new
feature and limitations that apply.
Also added a small fix.

v2: removed a C++ style comment
v3: move struct fman to header file to avoid exporting a function
v4: addressed compilation issues introduced in v3

Iordache Florinel-R70177 (1):
  fsl/fman: enable FMan Keygen

Madalin Bucur (6):
  fsl/fman: move struct fman to header file
  dpaa_eth: use multiple Rx frame queues
  dpaa_eth: enable Rx hashing control
  dpaa_eth: add NETIF_F_RXHASH
  Documentation: networking: add RSS information
  dpaa_eth: check allocation result

 Documentation/networking/dpaa.txt  |  68 +-
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c |  76 +-
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h |   2 +
 .../net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c   |   3 +
 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c | 118 
 drivers/net/ethernet/freescale/fman/Makefile   |   2 +-
 drivers/net/ethernet/freescale/fman/fman.c |  88 +--
 drivers/net/ethernet/freescale/fman/fman.h |  77 ++
 drivers/net/ethernet/freescale/fman/fman_keygen.c  | 783 +
 drivers/net/ethernet/freescale/fman/fman_keygen.h  |  46 ++
 drivers/net/ethernet/freescale/fman/fman_port.c|  59 +-
 drivers/net/ethernet/freescale/fman/fman_port.h|   7 +
 12 files changed, 1235 insertions(+), 94 deletions(-)
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_keygen.c
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_keygen.h

-- 
2.1.0

[PATCH v4 3/7] dpaa_eth: use multiple Rx frame queues

2017-08-27 Thread Madalin Bucur

Add a block of 128 Rx frame queues per port. The FMan hardware will
send traffic on one of these queues based on the FMan port Parse
Classify Distribute setup. The hash computed by the FMan Keygen
block will select the Rx FQ.

Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 50 +++---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h |  1 +
 .../net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c   |  3 ++
 3 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index c7fa285..6d89e74 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -158,7 +158,7 @@ MODULE_PARM_DESC(tx_timeout, "The Tx timeout in ms");
 #define DPAA_RX_PRIV_DATA_SIZE (u16)(DPAA_TX_PRIV_DATA_SIZE + \
dpaa_rx_extra_headroom)
 
-#define DPAA_ETH_RX_QUEUES 128
+#define DPAA_ETH_PCD_RXQ_NUM   128
 
 #define DPAA_ENQUEUE_RETRIES   10
 
@@ -169,6 +169,7 @@ struct fm_port_fqs {
struct dpaa_fq *tx_errq;
struct dpaa_fq *rx_defq;
struct dpaa_fq *rx_errq;
+   struct dpaa_fq *rx_pcdq;
 };
 
 /* All the dpa bps in use at any moment */
@@ -628,6 +629,7 @@ static inline void dpaa_assign_wq(struct dpaa_fq *fq, int 
idx)
fq->wq = 5;
break;
case FQ_TYPE_RX_DEFAULT:
+   case FQ_TYPE_RX_PCD:
fq->wq = 6;
break;
case FQ_TYPE_TX:
@@ -688,6 +690,7 @@ static int dpaa_alloc_all_fqs(struct device *dev, struct 
list_head *list,
  struct fm_port_fqs *port_fqs)
 {
struct dpaa_fq *dpaa_fq;
+   u32 fq_base, fq_base_aligned, i;
 
dpaa_fq = dpaa_fq_alloc(dev, 0, 1, list, FQ_TYPE_RX_ERROR);
if (!dpaa_fq)
@@ -701,6 +704,26 @@ static int dpaa_alloc_all_fqs(struct device *dev, struct 
list_head *list,
 
port_fqs->rx_defq = &dpaa_fq[0];
 
+   /* the PCD FQIDs range needs to be aligned for correct operation */
+   if (qman_alloc_fqid_range(&fq_base, 2 * DPAA_ETH_PCD_RXQ_NUM))
+   goto fq_alloc_failed;
+
+   fq_base_aligned = ALIGN(fq_base, DPAA_ETH_PCD_RXQ_NUM);
+
+   for (i = fq_base; i < fq_base_aligned; i++)
+   qman_release_fqid(i);
+
+   for (i = fq_base_aligned + DPAA_ETH_PCD_RXQ_NUM;
+i < (fq_base + 2 * DPAA_ETH_PCD_RXQ_NUM); i++)
+   qman_release_fqid(i);
+
+   dpaa_fq = dpaa_fq_alloc(dev, fq_base_aligned, DPAA_ETH_PCD_RXQ_NUM,
+   list, FQ_TYPE_RX_PCD);
+   if (!dpaa_fq)
+   goto fq_alloc_failed;
+
+   port_fqs->rx_pcdq = &dpaa_fq[0];
+
if (!dpaa_fq_alloc(dev, 0, DPAA_ETH_TXQ_NUM, list, FQ_TYPE_TX_CONF_MQ))
goto fq_alloc_failed;
 
@@ -870,13 +893,14 @@ static void dpaa_fq_setup(struct dpaa_priv *priv,
  const struct dpaa_fq_cbs *fq_cbs,
  struct fman_port *tx_port)
 {
-   int egress_cnt = 0, conf_cnt = 0, num_portals = 0, cpu;
+   int egress_cnt = 0, conf_cnt = 0, num_portals = 0, portal_cnt = 0, cpu;
const cpumask_t *affine_cpus = qman_affine_cpus();
-   u16 portals[NR_CPUS];
+   u16 channels[NR_CPUS];
struct dpaa_fq *fq;
 
for_each_cpu(cpu, affine_cpus)
-   portals[num_portals++] = qman_affine_channel(cpu);
+   channels[num_portals++] = qman_affine_channel(cpu);
+
if (num_portals == 0)
dev_err(priv->net_dev->dev.parent,
"No Qman software (affine) channels found");
@@ -890,6 +914,12 @@ static void dpaa_fq_setup(struct dpaa_priv *priv,
case FQ_TYPE_RX_ERROR:
dpaa_setup_ingress(priv, fq, &fq_cbs->rx_errq);
break;
+   case FQ_TYPE_RX_PCD:
+   if (!num_portals)
+   continue;
+   dpaa_setup_ingress(priv, fq, &fq_cbs->rx_defq);
+   fq->channel = channels[portal_cnt++ % num_portals];
+   break;
case FQ_TYPE_TX:
dpaa_setup_egress(priv, fq, tx_port,
  &fq_cbs->egress_ern);
@@ -1039,7 +1069,8 @@ static int dpaa_fq_init(struct dpaa_fq *dpaa_fq, bool 
td_enable)
/* Put all the ingress queues in our "ingress CGR". */
if (priv->use_ingress_cgr &&
(dpaa_fq->fq_type == FQ_TYPE_RX_DEFAULT ||
-dpaa_fq->fq_type == FQ_TYPE_RX_ERROR)) {
+dpaa_fq->fq_type == FQ_TYPE_RX_ERROR ||
+dpaa_fq->fq_type == FQ_TYPE_RX_PCD)) {
initfq.we_mask |= cpu_to_be16(QM_INITFQ_WE_CGID);
initfq.fqd.fq_ctrl |= cpu_to_be16(QM_FQCTRL_CGE);

[PATCH v4 2/7] fsl/fman: enable FMan Keygen

2017-08-27 Thread Madalin Bucur

From: Iordache Florinel-R70177 

Add support for the FMan Keygen with a hardcoded scheme to spread
incoming traffic on a FQ range based on source and destination IPs
and ports.

Signed-off-by: Iordache Florinel 
Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/fman/Makefile  |   2 +-
 drivers/net/ethernet/freescale/fman/fman.c|   8 +
 drivers/net/ethernet/freescale/fman/fman.h|   2 +
 drivers/net/ethernet/freescale/fman/fman_keygen.c | 783 ++
 drivers/net/ethernet/freescale/fman/fman_keygen.h |  46 ++
 drivers/net/ethernet/freescale/fman/fman_port.c   |  40 +-
 drivers/net/ethernet/freescale/fman/fman_port.h   |   5 +
 7 files changed, 884 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_keygen.c
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_keygen.h

diff --git a/drivers/net/ethernet/freescale/fman/Makefile 
b/drivers/net/ethernet/freescale/fman/Makefile
index 6049177..2c38119 100644
--- a/drivers/net/ethernet/freescale/fman/Makefile
+++ b/drivers/net/ethernet/freescale/fman/Makefile
@@ -4,6 +4,6 @@ obj-$(CONFIG_FSL_FMAN) += fsl_fman.o
 obj-$(CONFIG_FSL_FMAN) += fsl_fman_port.o
 obj-$(CONFIG_FSL_FMAN) += fsl_mac.o
 
-fsl_fman-objs  := fman_muram.o fman.o fman_sp.o
+fsl_fman-objs  := fman_muram.o fman.o fman_sp.o fman_keygen.o
 fsl_fman_port-objs := fman_port.o
 fsl_mac-objs:= mac.o fman_dtsec.o fman_memac.o fman_tgec.o
diff --git a/drivers/net/ethernet/freescale/fman/fman.c 
b/drivers/net/ethernet/freescale/fman/fman.c
index 8179cc1..f420dac 100644
--- a/drivers/net/ethernet/freescale/fman/fman.c
+++ b/drivers/net/ethernet/freescale/fman/fman.c
@@ -45,6 +45,7 @@
 
 #include "fman.h"
 #include "fman_muram.h"
+#include "fman_keygen.h"
 
 /* General defines */
 #define FMAN_LIODN_TBL 64  /* size of LIODN table */
@@ -56,6 +57,7 @@
 /* Modules registers offsets */
 #define BMI_OFFSET 0x0008
 #define QMI_OFFSET 0x00080400
+#define KG_OFFSET  0x000C1000
 #define DMA_OFFSET 0x000C2000
 #define FPM_OFFSET 0x000C3000
 #define IMEM_OFFSET0x000C4000
@@ -1737,6 +1739,7 @@ static int fman_config(struct fman *fman)
fman->qmi_regs = base_addr + QMI_OFFSET;
fman->dma_regs = base_addr + DMA_OFFSET;
fman->hwp_regs = base_addr + HWP_OFFSET;
+   fman->kg_regs = base_addr + KG_OFFSET;
fman->base_addr = base_addr;
 
spin_lock_init(&fman->spinlock);
@@ -2009,6 +2012,11 @@ static int fman_init(struct fman *fman)
/* Init HW Parser */
hwp_init(fman->hwp_regs);
 
+   /* Init KeyGen */
+   fman->keygen = keygen_init(fman->kg_regs);
+   if (!fman->keygen)
+   return -EINVAL;
+
err = enable(fman, cfg);
if (err != 0)
return err;
diff --git a/drivers/net/ethernet/freescale/fman/fman.h 
b/drivers/net/ethernet/freescale/fman/fman.h
index 1015dac..bfa02e0 100644
--- a/drivers/net/ethernet/freescale/fman/fman.h
+++ b/drivers/net/ethernet/freescale/fman/fman.h
@@ -328,6 +328,7 @@ struct fman {
struct fman_qmi_regs __iomem *qmi_regs;
struct fman_dma_regs __iomem *dma_regs;
struct fman_hwp_regs __iomem *hwp_regs;
+   struct fman_kg_regs __iomem *kg_regs;
fman_exceptions_cb *exception_cb;
fman_bus_error_cb *bus_error_cb;
/* Spinlock for FMan use */
@@ -336,6 +337,7 @@ struct fman {
 
struct fman_cfg *cfg;
struct muram_info *muram;
+   struct fman_keygen *keygen;
/* cam section in muram */
unsigned long cam_offset;
size_t cam_size;
diff --git a/drivers/net/ethernet/freescale/fman/fman_keygen.c 
b/drivers/net/ethernet/freescale/fman/fman_keygen.c
new file mode 100644
index 000..f54da3c
--- /dev/null
+++ b/drivers/net/ethernet/freescale/fman/fman_keygen.c
@@ -0,0 +1,783 @@
+/*
+ * Copyright 2017 NXP
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in the
+ *   documentation and/or other materials provided with the distribution.
+ * * Neither the name of NXP nor the
+ *   names of its contributors may be used to endorse or promote products
+ *   derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later

[PATCH v4 1/7] fsl/fman: move struct fman to header file

2017-08-27 Thread Madalin Bucur

Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/fman/fman.c  | 80 +
 drivers/net/ethernet/freescale/fman/fman.h  | 75 +++
 drivers/net/ethernet/freescale/fman/fman_port.c |  8 +--
 3 files changed, 82 insertions(+), 81 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fman/fman.c 
b/drivers/net/ethernet/freescale/fman/fman.c
index e714b8f..8179cc1 100644
--- a/drivers/net/ethernet/freescale/fman/fman.c
+++ b/drivers/net/ethernet/freescale/fman/fman.c
@@ -32,9 +32,6 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
-#include "fman.h"
-#include "fman_muram.h"
-
 #include 
 #include 
 #include 
@@ -46,6 +43,9 @@
 #include 
 #include 
 
+#include "fman.h"
+#include "fman_muram.h"
+
 /* General defines */
 #define FMAN_LIODN_TBL 64  /* size of LIODN table */
 #define MAX_NUM_OF_MACS10
@@ -564,80 +564,6 @@ struct fman_cfg {
u32 qmi_def_tnums_thresh;
 };
 
-/* Structure that holds information received from device tree */
-struct fman_dts_params {
-   void __iomem *base_addr;/* FMan virtual address */
-   struct resource *res;   /* FMan memory resource */
-   u8 id;  /* FMan ID */
-
-   int err_irq;/* FMan Error IRQ */
-
-   u16 clk_freq;   /* FMan clock freq (In Mhz) */
-
-   u32 qman_channel_base;  /* QMan channels base */
-   u32 num_of_qman_channels;   /* Number of QMan channels */
-
-   struct resource muram_res;  /* MURAM resource */
-};
-
-/** fman_exceptions_cb
- * fman- Pointer to FMan
- * exception   - The exception.
- *
- * Exceptions user callback routine, will be called upon an exception
- * passing the exception identification.
- *
- * Return: irq status
- */
-typedef irqreturn_t (fman_exceptions_cb)(struct fman *fman,
-enum fman_exceptions exception);
-
-/** fman_bus_error_cb
- * fman- Pointer to FMan
- * port_id - Port id
- * addr- Address that caused the error
- * tnum- Owner of error
- * liodn   - Logical IO device number
- *
- * Bus error user callback routine, will be called upon bus error,
- * passing parameters describing the errors and the owner.
- *
- * Return: IRQ status
- */
-typedef irqreturn_t (fman_bus_error_cb)(struct fman *fman, u8 port_id,
-   u64 addr, u8 tnum, u16 liodn);
-
-struct fman {
-   struct device *dev;
-   void __iomem *base_addr;
-   struct fman_intr_src intr_mng[FMAN_EV_CNT];
-
-   struct fman_fpm_regs __iomem *fpm_regs;
-   struct fman_bmi_regs __iomem *bmi_regs;
-   struct fman_qmi_regs __iomem *qmi_regs;
-   struct fman_dma_regs __iomem *dma_regs;
-   struct fman_hwp_regs __iomem *hwp_regs;
-   fman_exceptions_cb *exception_cb;
-   fman_bus_error_cb *bus_error_cb;
-   /* Spinlock for FMan use */
-   spinlock_t spinlock;
-   struct fman_state_struct *state;
-
-   struct fman_cfg *cfg;
-   struct muram_info *muram;
-   /* cam section in muram */
-   unsigned long cam_offset;
-   size_t cam_size;
-   /* Fifo in MURAM */
-   unsigned long fifo_offset;
-   size_t fifo_size;
-
-   u32 liodn_base[64];
-   u32 liodn_offset[64];
-
-   struct fman_dts_params dts_params;
-};
-
 static irqreturn_t fman_exceptions(struct fman *fman,
   enum fman_exceptions exception)
 {
diff --git a/drivers/net/ethernet/freescale/fman/fman.h 
b/drivers/net/ethernet/freescale/fman/fman.h
index f53e147..1015dac 100644
--- a/drivers/net/ethernet/freescale/fman/fman.h
+++ b/drivers/net/ethernet/freescale/fman/fman.h
@@ -34,6 +34,8 @@
 #define __FM_H
 
 #include 
+#include 
+#include 
 
 /* FM Frame descriptor macros  */
 /* Frame queue Context Override */
@@ -274,6 +276,79 @@ struct fman_intr_src {
void *src_handle;
 };
 
+/** fman_exceptions_cb
+ * fman - Pointer to FMan
+ * exception- The exception.
+ *
+ * Exceptions user callback routine, will be called upon an exception
+ * passing the exception identification.
+ *
+ * Return: irq status
+ */
+typedef irqreturn_t (fman_exceptions_cb)(struct fman *fman,
+enum fman_exceptions exception);
+/** fman_bus_error_cb
+ * fman - Pointer to FMan
+ * port_id  - Port id
+ * addr - Address that caused the error
+ * tnum - Owner of error
+ * liodn- Logical IO device number
+ *
+ * Bus error user callback routine, will be called upon bus error,
+ * passing parameters describing the errors and the owner.
+ *
+ * Return: IRQ status
+ */
+typedef irqreturn_t (fman_bus_error_cb)(struct fman *fman, u8 port_id,
+   u64 addr, u8

[PATCH v4 6/7] Documentation: networking: add RSS information

2017-08-27 Thread Madalin Bucur

Signed-off-by: Madalin Bucur 
---
 Documentation/networking/dpaa.txt | 68 ++-
 1 file changed, 67 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/dpaa.txt 
b/Documentation/networking/dpaa.txt
index 76e016d..f88194f 100644
--- a/Documentation/networking/dpaa.txt
+++ b/Documentation/networking/dpaa.txt
@@ -13,6 +13,7 @@ Contents
- Configuring DPAA Ethernet in your kernel
- DPAA Ethernet Frame Processing
- DPAA Ethernet Features
+   - DPAA IRQ Affinity and Receive Side Scaling
- Debugging
 
 DPAA Ethernet Overview
@@ -147,7 +148,10 @@ gradually.
 
 The driver has Rx and Tx checksum offloading for UDP and TCP. Currently the Rx
 checksum offload feature is enabled by default and cannot be controlled through
-ethtool.
+ethtool. Also, rx-flow-hash and rx-hashing were added. The addition of RSS
+provides a big performance boost for the forwarding scenarios, allowing
+different traffic flows received by one interface to be processed by different
+CPUs in parallel.
 
 The driver has support for multiple prioritized Tx traffic classes. Priorities
 range from 0 (lowest) to 3 (highest). These are mapped to HW workqueues with
@@ -166,6 +170,68 @@ classes as follows:
 tc qdisc add dev  root handle 1: \
 mqprio num_tc 4 map 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 hw 1
 
+DPAA IRQ Affinity and Receive Side Scaling
+==
+
+Traffic coming on the DPAA Rx queues or on the DPAA Tx confirmation
+queues is seen by the CPU as ingress traffic on a certain portal.
+The DPAA QMan portal interrupts are affined each to a certain CPU.
+The same portal interrupt services all the QMan portal consumers.
+
+By default the DPAA Ethernet driver enables RSS, making use of the
+DPAA FMan Parser and Keygen blocks to distribute traffic on 128
+hardware frame queues using a hash on IP v4/v6 source and destination
+and L4 source and destination ports, if present in the received frame.
+When RSS is disabled, all traffic received by a certain interface is
+received on the default Rx frame queue. The default DPAA Rx frame
+queues are configured to put the received traffic into a pool channel
+that allows any available CPU portal to dequeue the ingress traffic.
+The default frame queues have the HOLDACTIVE option set, ensuring that
+traffic bursts from a certain queue are serviced by the same CPU.
+This ensures a very low rate of frame reordering. A drawback of this
+is that only one CPU at a time can service the traffic received by a
+certain interface when RSS is not enabled.
+
+To implement RSS, the DPAA Ethernet driver allocates an extra set of
+128 Rx frame queues that are configured to dedicated channels, in a
+round-robin manner. The mapping of the frame queues to CPUs is now
+hardcoded, there is no indirection table to move traffic for a certain
+FQ (hash result) to another CPU. The ingress traffic arriving on one
+of these frame queues will arrive at the same portal and will always
+be processed by the same CPU. This ensures intra-flow order preservation
+and workload distribution for multiple traffic flows.
+
+RSS can be turned off for a certain interface using ethtool, i.e.
+
+   # ethtool -N fm1-mac9 rx-flow-hash tcp4 ""
+
+To turn it back on, one needs to set rx-flow-hash for tcp4/6 or udp4/6:
+
+   # ethtool -N fm1-mac9 rx-flow-hash udp4 sfdn
+
+There is no independent control for individual protocols, any command
+run for one of tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 is
+going to control the rx-flow-hashing for all protocols on that interface.
+
+Besides using the FMan Keygen computed hash for spreading traffic on the
+128 Rx FQs, the DPAA Ethernet driver also sets the skb hash value when
+the NETIF_F_RXHASH feature is on (active by default). This can be turned
+on or off through ethtool, i.e.:
+
+   # ethtool -K fm1-mac9 rx-hashing off
+   # ethtool -k fm1-mac9 | grep hash
+   receive-hashing: off
+   # ethtool -K fm1-mac9 rx-hashing on
+   Actual changes:
+   receive-hashing: on
+   # ethtool -k fm1-mac9 | grep hash
+   receive-hashing: on
+
+Please note that Rx hashing depends upon the rx-flow-hashing being on
+for that interface - turning off rx-flow-hashing will also disable the
+rx-hashing (without ethtool reporting it as off as that depends on the
+NETIF_F_RXHASH feature flag).
+
 Debugging
 =
 
-- 
2.1.0

[PATCH v4 5/7] dpaa_eth: add NETIF_F_RXHASH

2017-08-27 Thread Madalin Bucur

Set the skb hash when the FMan Keygen hash result is available.

Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 23 +++---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h |  1 +
 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c |  9 +++--
 drivers/net/ethernet/freescale/fman/fman_port.c| 11 +++
 drivers/net/ethernet/freescale/fman/fman_port.h|  2 ++
 5 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 6d89e74..73ca8d7 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -236,7 +236,7 @@ static int dpaa_netdev_init(struct net_device *net_dev,
net_dev->max_mtu = dpaa_get_max_mtu();
 
net_dev->hw_features |= (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
-NETIF_F_LLTX);
+NETIF_F_LLTX | NETIF_F_RXHASH);
 
net_dev->hw_features |= NETIF_F_SG | NETIF_F_HIGHDMA;
/* The kernels enables GSO automatically, if we declare NETIF_F_SG.
@@ -2237,12 +2237,13 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct 
qman_portal *portal,
dma_addr_t addr = qm_fd_addr(fd);
enum qm_fd_format fd_format;
struct net_device *net_dev;
-   u32 fd_status;
+   u32 fd_status, hash_offset;
struct dpaa_bp *dpaa_bp;
struct dpaa_priv *priv;
unsigned int skb_len;
struct sk_buff *skb;
int *count_ptr;
+   void *vaddr;
 
fd_status = be32_to_cpu(fd->status);
fd_format = qm_fd_get_format(fd);
@@ -2288,7 +2289,8 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct 
qman_portal *portal,
dma_unmap_single(dpaa_bp->dev, addr, dpaa_bp->size, DMA_FROM_DEVICE);
 
/* prefetch the first 64 bytes of the frame or the SGT start */
-   prefetch(phys_to_virt(addr) + qm_fd_get_offset(fd));
+   vaddr = phys_to_virt(addr);
+   prefetch(vaddr + qm_fd_get_offset(fd));
 
fd_format = qm_fd_get_format(fd);
/* The only FD types that we may receive are contig and S/G */
@@ -2309,6 +2311,18 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct 
qman_portal *portal,
 
skb->protocol = eth_type_trans(skb, net_dev);
 
+   if (net_dev->features & NETIF_F_RXHASH && priv->keygen_in_use &&
+   !fman_port_get_hash_result_offset(priv->mac_dev->port[RX],
+                                         &hash_offset)) {
+   enum pkt_hash_types type;
+
+   /* if L4 exists, it was used in the hash generation */
+   type = be32_to_cpu(fd->status) & FM_FD_STAT_L4CV ?
+   PKT_HASH_TYPE_L4 : PKT_HASH_TYPE_L3;
+   skb_set_hash(skb, be32_to_cpu(*(u32 *)(vaddr + hash_offset)),
+type);
+   }
+
skb_len = skb->len;
 
if (unlikely(netif_receive_skb(skb) == NET_RX_DROP))
@@ -2774,6 +2788,9 @@ static int dpaa_eth_probe(struct platform_device *pdev)
if (err)
goto init_ports_failed;
 
+   /* Rx traffic distribution based on keygen hashing defaults to on */
+   priv->keygen_in_use = true;
+
priv->percpu_priv = devm_alloc_percpu(dev, *priv->percpu_priv);
if (!priv->percpu_priv) {
dev_err(dev, "devm_alloc_percpu() failed\n");
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
index 496a12c..bd94220 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
@@ -159,6 +159,7 @@ struct dpaa_priv {
struct list_head dpaa_fq_list;
 
u8 num_tc;
+   bool keygen_in_use;
u32 msg_enable; /* net_device message level */
 
struct {
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
index 965f652..faea674 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
@@ -402,6 +402,8 @@ static void dpaa_get_strings(struct net_device *net_dev, 
u32 stringset,
 static int dpaa_get_hash_opts(struct net_device *dev,
  struct ethtool_rxnfc *cmd)
 {
+   struct dpaa_priv *priv = netdev_priv(dev);
+
cmd->data = 0;
 
switch (cmd->flow_type) {
@@ -409,7 +411,8 @@ static int dpaa_get_hash_opts(struct net_device *dev,
case TCP_V6_FLOW:
case UDP_V4_FLOW:
case UDP_V6_FLOW:
-   cmd->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
+   if (priv->keygen_in_use)
+   cmd->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
/* Fall through */
case IPV4_FLOW:
case IPV6_FLOW:
@@ -421,7 +424,8 @@ static int dpaa_get_hash_opts(struct net_device *dev,

[PATCH v4 7/7] dpaa_eth: check allocation result

2017-08-27 Thread Madalin Bucur

Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 73ca8d7..4225806 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2561,6 +2561,9 @@ static struct dpaa_bp *dpaa_bp_alloc(struct device *dev)
 
dpaa_bp->bpid = FSL_DPAA_BPID_INV;
dpaa_bp->percpu_count = devm_alloc_percpu(dev, *dpaa_bp->percpu_count);
+   if (!dpaa_bp->percpu_count)
+   return ERR_PTR(-ENOMEM);
+
dpaa_bp->config_count = FSL_DPAA_ETH_MAX_BUF_COUNT;
 
dpaa_bp->seed_cb = dpaa_bp_seed;
-- 
2.1.0

Re: Stable apply request [was: Bluetooth: bnep: fix possible might sleep error in bnep_session]

2017-08-27 Thread Greg KH

On Wed, Aug 23, 2017 at 08:14:15PM +0200, Marcel Holtmann wrote:
> Hi Jiri,
> 
> >>> It looks like bnep_session has same pattern as the issue reported in
> >>> old rfcomm:
> >>> 
> >>>   while (1) {
> >>>   set_current_state(TASK_INTERRUPTIBLE);
> >>>   if (condition)
> >>>   break;
> >>>   // may call might_sleep here
> >>>   schedule();
> >>>   }
> >>>   __set_current_state(TASK_RUNNING);
> >>> 
> >>> Which fixed at:
> >>>   dfb2fae Bluetooth: Fix nested sleeps
> >>> 
> >>> So let's fix it at the same way, also follow the suggestion of:
> >>> https://lwn.net/Articles/628628/
> > 
> > ...
> > 
> >> all 3 patches have been applied to bluetooth-next tree.
> > 
> > Hi,
> > 
> > given users are hitting it in at least 4.4 and 4.12, can we have all
> > three in all stables where this applies?
> > 
> > 5da8e47d849d Bluetooth: hidp: fix possible might sleep error in
> > hidp_session_thread
> > f06d977309d0 Bluetooth: cmtp: fix possible might sleep error in cmtp_session
> > 25717382c1dd Bluetooth: bnep: fix possible might sleep error in bnep_session
> > 
> > I am not sure: to stable directly or via net stable?
> 
> as Dave said, just email -stable directly and have Greg pick them up.

All now picked up :)

Re: [PATCH 0/4] net: stmmac: revert the EMAC bindings

2017-08-27 Thread Chen-Yu Tsai

On Sat, Aug 26, 2017 at 3:12 AM, Maxime Ripard
 wrote:
> Hi,
>
> The bindings of the stmmac glue for the new Allwinner EMAC controller
> are still controversial and being discussed, even though they've been
> merged in 4.13.
>
> In order not to introduce any binding we do not really want to commit
> to in a stable release, especially since that would mean we would have
> to support both the right and old bindings, let's revert them.
>
> We will reintroduce them in due time, once the discussion has settled
> down.
>
> The first three patches should go through the arm-soc tree, the last
> one through the net tree. All of them must be treated as fixes.
>
> Thanks!
> Maxime
>
> Maxime Ripard (4):
>   dt-bindings: net: Revert sun8i dwmac binding
>   arm64: dts: allwinner: Revert EMAC changes
>   arm: dts: sunxi: Revert EMAC changes
>   net: stmmac: sun8i: Remove the compatibles
>
>  .../devicetree/bindings/net/dwmac-sun8i.txt| 84 
> --
>  arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero.dts  |  9 ---
>  arch/arm/boot/dts/sun8i-h3-bananapi-m2-plus.dts| 19 -
>  arch/arm/boot/dts/sun8i-h3-beelink-x2.dts  |  8 ---

I think this particular change is in -next, not v4.13-rc.

Otherwise, whole series is

Acked-by: Chen-Yu Tsai 

>  arch/arm/boot/dts/sun8i-h3-nanopi-neo.dts  |  7 --
>  arch/arm/boot/dts/sun8i-h3-orangepi-2.dts  |  8 ---
>  arch/arm/boot/dts/sun8i-h3-orangepi-one.dts|  8 ---
>  arch/arm/boot/dts/sun8i-h3-orangepi-pc-plus.dts|  5 --
>  arch/arm/boot/dts/sun8i-h3-orangepi-pc.dts |  8 ---
>  arch/arm/boot/dts/sun8i-h3-orangepi-plus.dts   | 22 --
>  arch/arm/boot/dts/sun8i-h3-orangepi-plus2e.dts | 16 -
>  arch/arm/boot/dts/sunxi-h3-h5.dtsi | 26 ---
>  .../boot/dts/allwinner/sun50i-a64-bananapi-m64.dts | 17 -
>  .../boot/dts/allwinner/sun50i-a64-pine64-plus.dts  | 15 
>  .../arm64/boot/dts/allwinner/sun50i-a64-pine64.dts | 18 -
>  .../dts/allwinner/sun50i-a64-sopine-baseboard.dts  | 17 -
>  arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi  | 20 --
>  .../boot/dts/allwinner/sun50i-h5-nanopi-neo2.dts   | 17 -
>  .../boot/dts/allwinner/sun50i-h5-orangepi-pc2.dts  | 17 -
>  .../dts/allwinner/sun50i-h5-orangepi-prime.dts | 17 -
>  drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c  |  8 ---
>  21 files changed, 366 deletions(-)
>  delete mode 100644 Documentation/devicetree/bindings/net/dwmac-sun8i.txt
>
> --
> 2.13.5
>

[PATCH] DSA support for Micrel KSZ8895

2017-08-27 Thread Pavel Machek

Hi!

So I fought with the driver a bit more, and now I have something that
kind-of-works.

The "great great hack" below worries me.

Yeah, disabled code needs to be removed before merge.

No, the tag_ksz part is probably not acceptable. Do you see a better
solution than just copying it into a tag_ksz1 file?

Any more comments, etc?

Help would be welcome.

Best regards,
Pavel

Signed-off-by: Pavel Machek 

diff --git a/drivers/net/dsa/microchip/Kconfig 
b/drivers/net/dsa/microchip/Kconfig
index a8b8f59099ce..7b7d7ddb3488 100644
--- a/drivers/net/dsa/microchip/Kconfig
+++ b/drivers/net/dsa/microchip/Kconfig
@@ -1,12 +1,25 @@
 menuconfig MICROCHIP_KSZ
-   tristate "Microchip KSZ series switch support"
+   tristate "Microchip KSZ 9477 series switch support"
+   depends on NET_DSA
+   select NET_DSA_TAG_KSZ
+   help
+ This driver adds support for Microchip KSZ switch chips.
+
+menuconfig MICROCHIP_KSZ_8895
+   tristate "Microchip KSZ 8895 series switch support"
depends on NET_DSA
select NET_DSA_TAG_KSZ
help
  This driver adds support for Microchip KSZ switch chips.
 
 config MICROCHIP_KSZ_SPI_DRIVER
-   tristate "KSZ series SPI connected switch driver"
+   tristate "KSZ 9477 series SPI connected switch driver"
depends on MICROCHIP_KSZ && SPI
help
  Select to enable support for registering switches configured through 
SPI.
+
+config MICROCHIP_KSZ_8895_SPI_DRIVER
+   tristate "KSZ 8895 series SPI connected switch driver"
+   depends on MICROCHIP_KSZ_8895 && SPI
+   help
+ Select to enable support for registering switches configured through 
SPI.
diff --git a/drivers/net/dsa/microchip/Makefile 
b/drivers/net/dsa/microchip/Makefile
index ed335e29fae8..b6a17f79d2d9 100644
--- a/drivers/net/dsa/microchip/Makefile
+++ b/drivers/net/dsa/microchip/Makefile
@@ -1,2 +1,4 @@
 obj-$(CONFIG_MICROCHIP_KSZ)+= ksz_common.o
+obj-$(CONFIG_MICROCHIP_KSZ_8895)+= ksz_8895.o
 obj-$(CONFIG_MICROCHIP_KSZ_SPI_DRIVER) += ksz_spi.o
+obj-$(CONFIG_MICROCHIP_KSZ_8895_SPI_DRIVER)+= ksz_8895_spi.o
diff --git a/drivers/net/dsa/microchip/ksz_8895.c 
b/drivers/net/dsa/microchip/ksz_8895.c
new file mode 100644
index ..d546e08b1281
--- /dev/null
+++ b/drivers/net/dsa/microchip/ksz_8895.c
@@ -0,0 +1,721 @@
+/*
+ * Microchip switch driver main logic
+ *
+ * Copyright (C) 2017
+ * Copyright (C) 2017 Pavel Machek 
+ *
+ * Permission to use, copy, modify, and/or distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ksz_8895_reg.h"
+#include "ksz_priv.h"
+
+static const struct {
+   int index;
+   char string[ETH_GSTRING_LEN];
+} mib_names[TOTAL_SWITCH_COUNTER_NUM] = {
+   { 0x00, "???" },
+};
+
+static void ksz_cfg(struct ksz_device *dev, u32 addr, u8 bits, bool set)
+{
+   u8 data;
+
+   ksz_read8(dev, addr, &data);
+   if (set)
+   data |= bits;
+   else
+   data &= ~bits;
+   ksz_write8(dev, addr, data);
+}
+
+#if 0
+static void ksz_cfg32(struct ksz_device *dev, u32 addr, u32 bits, bool set)
+{
+   u32 data;
+
+   ksz_read32(dev, addr, &data);
+   if (set)
+   data |= bits;
+   else
+   data &= ~bits;
+   ksz_write32(dev, addr, data);
+}
+#endif
+
+static void ksz_port_cfg(struct ksz_device *dev, int port, int offset, u8 bits,
+bool set)
+{
+   u32 addr;
+   u8 data;
+
+   addr = PORT_CTRL_ADDR(port, offset);
+   ksz_read8(dev, addr, &data);
+
+   if (set)
+   data |= bits;
+   else
+   data &= ~bits;
+
+   ksz_write8(dev, addr, data);
+}
+
+#if 0
+static void ksz_port_cfg32(struct ksz_device *dev, int port, int offset,
+  u32 bits, bool set)
+{
+   u32 addr;
+   u32 data;
+
+   addr = PORT_CTRL_ADDR(port, offset);
+   ksz_read32(dev, addr, &data);
+
+   if (set)
+   data |= bits;
+   else
+   data &= ~bits;
+
+   ksz_write32(dev, addr, data);
+}
+#endif
+
+#define NOTIMPL() do { NOTIMPLV(); return -EJUKEBOX; }

[PATCH net-next 2/4] net/mlx5: Add SRIOV VGT+ support

2017-08-27 Thread Saeed Mahameed

From: Mohamad Haj Yahia 

Implement the VGT+ feature via ACL tables.
The ACL tables hold only the rules that are actually needed, i.e. the
intersection of the vlan-ids list requested by the VF and the allowed
vlan-ids list configured by the administrator.
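
The intersection described above can be sketched in plain userspace C as a word-wise AND of two VLAN bitmaps. This is only an illustration of the idea behind the bitmap_and() call in the patch; struct vlan_bitmap and the vlan_* helpers are hypothetical names, not kernel or mlx5 API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VLAN_N_VID 4096                      /* number of 802.1Q VLAN IDs */
#define WORD_BITS  64
#define VLAN_WORDS (VLAN_N_VID / WORD_BITS)  /* 64 words of 64 bits each */

/* Hypothetical userspace stand-in for a kernel VLAN bitmap. */
struct vlan_bitmap {
	uint64_t w[VLAN_WORDS];
};

/* Mark one VLAN ID as present in the bitmap. */
static void vlan_set(struct vlan_bitmap *b, unsigned int vid)
{
	assert(vid < VLAN_N_VID);
	b->w[vid / WORD_BITS] |= UINT64_C(1) << (vid % WORD_BITS);
}

/* Return true if the VLAN ID is present in the bitmap. */
static bool vlan_test(const struct vlan_bitmap *b, unsigned int vid)
{
	assert(vid < VLAN_N_VID);
	return (b->w[vid / WORD_BITS] >> (vid % WORD_BITS)) & 1;
}

/* dst = requested AND allowed: only VLANs present in both lists survive. */
static void vlan_intersect(struct vlan_bitmap *dst,
			   const struct vlan_bitmap *requested,
			   const struct vlan_bitmap *allowed)
{
	for (size_t i = 0; i < VLAN_WORDS; i++)
		dst->w[i] = requested->w[i] & allowed->w[i];
}
```

A VLAN that the VF requests but the administrator has not allowed (or the other way round) ends up cleared in the result, so no ACL rule would be installed for it.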

Signed-off-by: Mohamad Haj Yahia 
Signed-off-by: Eugenia Emantayev 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |  28 ++
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 496 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h |  31 +-
 drivers/net/ethernet/mellanox/mlx5/core/vport.c   |  19 +-
 include/linux/mlx5/vport.h|   6 +-
 5 files changed, 458 insertions(+), 122 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index fdc2b92f020b..1a2ebe0e79ae 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3388,6 +3388,32 @@ static int mlx5e_set_vf_vlan(struct net_device *dev, int 
vf, u16 vlan, u8 qos,
   vlan, qos);
 }
 
+static int mlx5e_add_vf_vlan_trunk_range(struct net_device *dev, int vf,
+u16 start_vid, u16 end_vid,
+__be16 vlan_proto) {
+   struct mlx5e_priv *priv = netdev_priv(dev);
+   struct mlx5_core_dev *mdev = priv->mdev;
+
+   if (vlan_proto != htons(ETH_P_8021Q))
+   return -EPROTONOSUPPORT;
+
+   return mlx5_eswitch_add_vport_trunk_range(mdev->priv.eswitch, vf + 1,
+ start_vid, end_vid);
+}
+
+static int mlx5e_del_vf_vlan_trunk_range(struct net_device *dev, int vf,
+u16 start_vid, u16 end_vid,
+__be16 vlan_proto) {
+   struct mlx5e_priv *priv = netdev_priv(dev);
+   struct mlx5_core_dev *mdev = priv->mdev;
+
+   if (vlan_proto != htons(ETH_P_8021Q))
+   return -EPROTONOSUPPORT;
+
+   return mlx5_eswitch_del_vport_trunk_range(mdev->priv.eswitch, vf + 1,
+ start_vid, end_vid);
+}
+
 static int mlx5e_set_vf_spoofchk(struct net_device *dev, int vf, bool setting)
 {
struct mlx5e_priv *priv = netdev_priv(dev);
@@ -3733,6 +3759,8 @@ static const struct net_device_ops mlx5e_netdev_ops = {
/* SRIOV E-Switch NDOs */
.ndo_set_vf_mac  = mlx5e_set_vf_mac,
.ndo_set_vf_vlan = mlx5e_set_vf_vlan,
+   .ndo_add_vf_vlan_trunk_range = mlx5e_add_vf_vlan_trunk_range,
+   .ndo_del_vf_vlan_trunk_range = mlx5e_del_vf_vlan_trunk_range,
.ndo_set_vf_spoofchk = mlx5e_set_vf_spoofchk,
.ndo_set_vf_trust= mlx5e_set_vf_trust,
.ndo_set_vf_rate = mlx5e_set_vf_rate,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 6b84c1113301..a8e8670c7c8d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -60,12 +60,14 @@ struct vport_addr {
 enum {
UC_ADDR_CHANGE = BIT(0),
MC_ADDR_CHANGE = BIT(1),
+   VLAN_CHANGE= BIT(2),
PROMISC_CHANGE = BIT(3),
 };
 
 /* Vport context events */
 #define SRIOV_VPORT_EVENTS (UC_ADDR_CHANGE | \
MC_ADDR_CHANGE | \
+   VLAN_CHANGE | \
PROMISC_CHANGE)
 
 static int arm_vport_context_events_cmd(struct mlx5_core_dev *dev, u16 vport,
@@ -681,6 +683,45 @@ static void esw_update_vport_addr_list(struct mlx5_eswitch 
*esw,
kfree(mac_list);
 }
 
+static void esw_update_acl_trunk_bitmap(struct mlx5_eswitch *esw, u32 
vport_num)
+{
+   struct mlx5_vport *vport = &esw->vports[vport_num];
+
+   bitmap_and(vport->acl_vlan_8021q_bitmap, vport->req_vlan_bitmap,
+  vport->info.vlan_trunk_8021q_bitmap, VLAN_N_VID);
+}
+
+static int esw_vport_egress_config(struct mlx5_eswitch *esw,
+  struct mlx5_vport *vport);
+static int esw_vport_ingress_config(struct mlx5_eswitch *esw,
+   struct mlx5_vport *vport);
+
+/* Sync vport vlan list from vport context */
+static void esw_update_vport_vlan_list(struct mlx5_eswitch *esw, u32 vport_num)
+{
+   struct mlx5_vport *vport = &esw->vports[vport_num];
+   DECLARE_BITMAP(prev_vlans_bitmap, VLAN_N_VID);
+   int err;
+
+   bitmap_copy(prev_vlans_bitmap, vport->req_vlan_bitmap, VLAN_N_VID);
+   bitmap_zero(vport->req_vlan_bitmap, VLAN_N_VID);
+
+   if (!vport->enabled)
+   return;
+
+   err = mlx5_query_nic_vport_vlans(esw->dev, vport_num, 
vport->req_vlan_bitmap);
+   if (err)
+   return;
+
+

[PATCH net-next 0/4] SRIOV VF VGT+ and violation counters support

2017-08-27 Thread Saeed Mahameed

Hi Dave

This series provides two security SRIOV related features (VGT+ and VF violation 
counters).

VGT+ is a security feature that gives the administrator the ability of 
controlling
the allowed VGT vlan IDs list that can be transmitted/received from/to the VF.
The allowed VGT vlan IDs list is called "trunk".

Admin can add/remove a range of allowed vlan-ids via iptool:
ip link set { DEVICE } [ vf NUM [ trunk { add | rem } START-VLAN-ID [ 
END-VLAN-ID ] [ proto VLAN-PROTO ] ] ]

Example:
After this series of configuration :
1) ip link set eth3 vf 0 trunk add 10 100 (allow vlan-id 10-100, default tpid 
0x8100)
2) ip link set eth3 vf 0 trunk add 105 proto 802.1q (allow vlan-id 105 tpid 
0x8100)
3) ip link set eth3 vf 0 trunk add 105 proto 802.1ad (allow vlan-id 105 tpid 
0x88a8)
4) ip link set eth3 vf 0 trunk rem 90 (block vlan-id 90)
5) ip link set eth3 vf 0 trunk rem 50 60 (block vlan-ids 50-60)

VF 0 can only communicate on vlan-ids: 10-49,61-89,91-100,105 with tpid 0x8100 
and vlan-id 105 with tpid 0x88a8.
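
The resulting allowed set can be checked with a small userspace model that replays the five commands on one bitmap per TPID. This is a sketch only: struct vf_trunk, trunk_update, trunk_allows and replay_example are hypothetical names and do not correspond to the kernel or iproute2 implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VLAN_N_VID 4096

/* Hypothetical model of one VF's trunk for a single TPID. */
struct vf_trunk {
	uint64_t bits[VLAN_N_VID / 64];
};

/* Allow (or block) every vlan-id in the inclusive range [start, end]. */
static void trunk_update(struct vf_trunk *t, unsigned int start,
			 unsigned int end, bool allow)
{
	assert(start <= end && end < VLAN_N_VID);
	for (unsigned int vid = start; vid <= end; vid++) {
		uint64_t mask = UINT64_C(1) << (vid % 64);

		if (allow)
			t->bits[vid / 64] |= mask;
		else
			t->bits[vid / 64] &= ~mask;
	}
}

/* Return true if the VF may use this vlan-id on this TPID. */
static bool trunk_allows(const struct vf_trunk *t, unsigned int vid)
{
	return vid < VLAN_N_VID && ((t->bits[vid / 64] >> (vid % 64)) & 1);
}

/* Replay the example: q models tpid 0x8100, ad models tpid 0x88a8. */
static void replay_example(struct vf_trunk *q, struct vf_trunk *ad)
{
	trunk_update(q, 10, 100, true);    /* 1) trunk add 10 100 */
	trunk_update(q, 105, 105, true);   /* 2) trunk add 105 proto 802.1q */
	trunk_update(ad, 105, 105, true);  /* 3) trunk add 105 proto 802.1ad */
	trunk_update(q, 90, 90, false);    /* 4) trunk rem 90 */
	trunk_update(q, 50, 60, false);    /* 5) trunk rem 50 60 */
}
```

Starting from two zeroed trunks, replay_example() leaves exactly 10-49, 61-89, 91-100 and 105 set for 0x8100, and only 105 set for 0x88a8.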

For this purpose following net_device callbacks were added:
int (*ndo_add_vf_vlan_trunk_range)(struct net_device *dev, int vf, u16 
start_vid, u16 end_vid, __be16 proto);
int (*ndo_del_vf_vlan_trunk_range)(struct net_device *dev, int vf, u16 
start_vid, u16 end_vid, __be16 proto);

This feature is implemented and demonstrated in mlx5 via ACL steering tables 
and vlan rules attached to the VF's
corresponding E-Switch vport.

In addition to VGT+ we introduce a new set of counters in the VF statistics
that count traffic violating the VF ACL rules (such as VGT+ violations). For
that we extend the current ifla_vf_stats to include rx_dropped/tx_dropped,
reported per VF.

Example:
> ip link set eth3 vf 0 trunk add 10 100
If VF 0 then transmits 2412 packets on a vlan id outside the [10,100] range,
they will be dropped and reported in the hypervisor via:
> ip -s link show dev enp5s0f0
  6: enp5s0f0:  mtu 1500 qdisc mq state UP 
mode DEFAULT group default qlen 1000
[...]
vf 0 MAC 00:00:ca:fe:ca:fe, vlan 5, spoof checking off, link-state 
auto, trust off, query_rss off
RX: bytes  packets  mcast   bcast   dropped
1666   29   14 32  0
TX: bytes  packets   dropped
2880   44   2412

Thanks,
Saeed.

Eugenia Emantayev (2):
  net/core: Add violation counters to VF statisctics
  net/mlx5e: E-switch, Add steering drop counters

Mohamad Haj Yahia (2):
  net: Add SRIOV VGT+ support
  net/mlx5: Add SRIOV VGT+ support

 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  28 +
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  | 589 +
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  31 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h  |   2 +
 .../net/ethernet/mellanox/mlx5/core/fs_counters.c  |   6 +
 drivers/net/ethernet/mellanox/mlx5/core/vport.c|  19 +-
 include/linux/if_link.h|   4 +
 include/linux/mlx5/vport.h |   6 +-
 include/linux/netdevice.h  |  12 +
 include/uapi/linux/if_link.h   |  22 +
 net/core/rtnetlink.c   | 119 +++--
 11 files changed, 681 insertions(+), 157 deletions(-)

-- 
2.13.0

[PATCH net-next 1/4] net: Add SRIOV VGT+ support

2017-08-27 Thread Saeed Mahameed

From: Mohamad Haj Yahia 

VGT+ is a security feature that gives the administrator the ability of
controlling the allowed vlan-ids list that can be transmitted/received
from/to the VF.
The allowed vlan-ids list is called "trunk".
Admin can add/remove a range of allowed vlan-ids via iptool.
Example:
After this series of configuration :
1) ip link set eth3 vf 0 trunk add 10 100 (allow vlan-id 10-100, default tpid 
0x8100)
2) ip link set eth3 vf 0 trunk add 105 proto 802.1q (allow vlan-id 105 tpid 
0x8100)
3) ip link set eth3 vf 0 trunk add 105 proto 802.1ad (allow vlan-id 105 tpid 
0x88a8)
4) ip link set eth3 vf 0 trunk rem 90 (block vlan-id 90)
5) ip link set eth3 vf 0 trunk rem 50 60 (block vlan-ids 50-60)

The VF 0 can only communicate on vlan-ids: 10-49,61-89,91-100,105 with
tpid 0x8100 and vlan-id 105 with tpid 0x88a8.

For this purpose we added the following netlink sr-iov commands:

1) IFLA_VF_VLAN_RANGE: used to add/remove allowed vlan-ids range.
We added the ifla_vf_vlan_range struct to specify the range we want to
add/remove from the userspace.
We added ndo_add_vf_vlan_trunk_range and ndo_del_vf_vlan_trunk_range
netdev ops to add/remove allowed vlan-ids range in the netdev.

2) IFLA_VF_VLAN_TRUNK: used to query the allowed vlan-ids trunk.
We added trunk bitmap to the ifla_vf_info struct to get the current
allowed vlan-ids trunk from the netdev.
We added ifla_vf_vlan_trunk struct for sending the allowed vlan-ids
trunk to the userspace.

Signed-off-by: Mohamad Haj Yahia 
Signed-off-by: Eugenia Emantayev 
Signed-off-by: Saeed Mahameed 
---
 include/linux/if_link.h  |   2 +
 include/linux/netdevice.h|  12 +
 include/uapi/linux/if_link.h |  20 
 net/core/rtnetlink.c | 109 +++
 4 files changed, 114 insertions(+), 29 deletions(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 0b17c585b5cd..da70af27e42e 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -25,6 +25,8 @@ struct ifla_vf_info {
__u32 max_tx_rate;
__u32 rss_query_en;
__u32 trusted;
+   __u64 trunk_8021q[VF_VLAN_BITMAP];
+   __u64 trunk_8021ad[VF_VLAN_BITMAP];
__be16 vlan_proto;
 };
 #endif /* _LINUX_IF_LINK_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c5475b37a631..10633cabc58f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -959,6 +959,10 @@ struct xfrmdev_ops {
  *  Hash Key. This is needed since on some devices VF share this 
information
  *  with PF and querying it may introduce a theoretical security risk.
  * int (*ndo_set_vf_rss_query_en)(struct net_device *dev, int vf, bool 
setting);
+ * int (*ndo_add_vf_vlan_trunk_range)(struct net_device *dev, int vf,
+ *   u16 start_vid, u16 end_vid, __be16 proto);
+ * int (*ndo_del_vf_vlan_trunk_range)(struct net_device *dev, int vf,
+ *   u16 start_vid, u16 end_vid, __be16 proto);
  * int (*ndo_get_vf_port)(struct net_device *dev, int vf, struct sk_buff *skb);
  * int (*ndo_setup_tc)(struct net_device *dev, enum tc_setup_type type,
  *void *type_data);
@@ -1208,6 +1212,14 @@ struct net_device_ops {
int (*ndo_set_vf_rss_query_en)(
   struct net_device *dev,
   int vf, bool setting);
+   int (*ndo_add_vf_vlan_trunk_range)(
+  struct net_device *dev,
+  int vf, u16 start_vid,
+  u16 end_vid, __be16 proto);
+   int (*ndo_del_vf_vlan_trunk_range)(
+  struct net_device *dev,
+  int vf, u16 start_vid,
+  u16 end_vid, __be16 proto);
int (*ndo_setup_tc)(struct net_device *dev,
enum tc_setup_type type,
void *type_data);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 8d062c58d5cb..3aa895c5fbc1 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -168,6 +168,8 @@ enum {
 #ifndef __KERNEL__
 #define IFLA_RTA(r)  ((struct rtattr*)(((char*)(r)) + 
NLMSG_ALIGN(sizeof(struct ifinfomsg
 #define IFLA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct ifinfomsg))
+#define BITS_PER_BYTE 8
+#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
 #endif
 
 enum {
@@ -645,6 +647,8 @@ enum {
IFLA_VF_IB_NODE_GUID,   /* VF Infiniband node GUID */
IFLA_VF_IB_PORT_GUID,   /* VF Infiniband port GUID */

[PATCH net-next 4/4] net/mlx5e: E-switch, Add steering drop counters

2017-08-27 Thread Saeed Mahameed

From: Eugenia Emantayev 

Add flow counters to count packets dropped due to drop rules
configured in eswitch egress and ingress ACLs.
These counters count VF violations and incoming traffic drops, and
are presented on the hypervisor via the standard 'ip -s link show' command.

Example: "ip -s link show dev enp5s0f0"

6: enp5s0f0:  mtu 1500 qdisc mq state UP mode 
DEFAULT group default qlen 1000
link/ether 24:8a:07:a5:28:f0 brd ff:ff:ff:ff:ff:ff
RX: bytes  packets  errors  dropped overrun mcast
0  00   0   0   2
TX: bytes  packets  errors  dropped carrier collsns
1406   17   0   0   0   0
vf 0 MAC 00:00:ca:fe:ca:fe, vlan 5, spoof checking off, link-state auto, 
trust off, query_rss off
RX: bytes  packets  mcast   bcast   dropped
1666   29   14 32  0
TX: bytes  packets   dropped
2880   44   2412

Signed-off-by: Eugenia Emantayev 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  | 97 --
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h  |  2 +
 .../net/ethernet/mellanox/mlx5/core/fs_counters.c  |  6 ++
 3 files changed, 98 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index a8e8670c7c8d..6c992e43e397 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -37,6 +37,7 @@
 #include 
 #include "mlx5_core.h"
 #include "eswitch.h"
+#include "fs_core.h"
 
 #define UPLINK_VPORT 0x
 
@@ -1007,8 +1008,14 @@ static void esw_vport_cleanup_egress_rules(struct 
mlx5_eswitch *esw,
kfree(trunk_vlan_rule);
}
 
-   if (!IS_ERR_OR_NULL(vport->egress.drop_rule))
+   if (!IS_ERR_OR_NULL(vport->egress.drop_rule)) {
+   struct mlx5_fc *drop_counter =
+   mlx5_flow_rule_counter(vport->egress.drop_rule);
+
mlx5_del_flow_rules(vport->egress.drop_rule);
+   if (drop_counter)
+   mlx5_fc_destroy(vport->dev, drop_counter);
+   }
 
if (!IS_ERR_OR_NULL(vport->egress.allow_untagged_rule))
mlx5_del_flow_rules(vport->egress.allow_untagged_rule);
@@ -1174,8 +1181,14 @@ static void esw_vport_cleanup_ingress_rules(struct 
mlx5_eswitch *esw,
 {
struct mlx5_acl_vlan *trunk_vlan_rule, *tmp;
 
-   if (!IS_ERR_OR_NULL(vport->ingress.drop_rule))
+   if (!IS_ERR_OR_NULL(vport->ingress.drop_rule)) {
+   struct mlx5_fc *drop_counter =
+   mlx5_flow_rule_counter(vport->ingress.drop_rule);
+
mlx5_del_flow_rules(vport->ingress.drop_rule);
+   if (drop_counter)
+   mlx5_fc_destroy(vport->dev, drop_counter);
+   }
 
list_for_each_entry_safe(trunk_vlan_rule, tmp,
 &vport->ingress.allowed_vlans_rules, list) {
@@ -1222,6 +1235,8 @@ static int esw_vport_ingress_config(struct mlx5_eswitch 
*esw,
bool need_vlan_filter = 
!!bitmap_weight(vport->info.vlan_trunk_8021q_bitmap,
VLAN_N_VID);
struct mlx5_acl_vlan *trunk_vlan_rule;
+   struct mlx5_flow_destination dest;
+   struct mlx5_fc *counter = NULL;
struct mlx5_flow_act flow_act = {0};
struct mlx5_flow_spec *spec;
bool need_acl_table = true;
@@ -1333,18 +1348,33 @@ static int esw_vport_ingress_config(struct mlx5_eswitch 
*esw,
}
 
 drop_rule:
+   /* Alloc ingress drop flow counter */
+   counter = mlx5_fc_create(esw->dev, false);
+   if (IS_ERR(counter)) {
+   esw_warn(esw->dev,
+"vport[%d] configure ingress drop rule counter 
failed\n",
+vport->vport);
+   counter = NULL;
+   } else {
+   dest.type = MLX5_FLOW_DESTINATION_TYPE_COUNTER;
+   dest.counter = counter;
+   }
+
+   /* Drop others rule (star rule) */
memset(spec, 0, sizeof(*spec));
flow_act.action = MLX5_FLOW_CONTEXT_ACTION_DROP;
+   if (counter)
+   flow_act.action |= MLX5_FLOW_CONTEXT_ACTION_COUNT;
vport->ingress.drop_rule =
-   mlx5_add_flow_rules(vport->ingress.acl, spec,
-   &flow_act, NULL, 0);
+   mlx5_add_flow_rules(vport->ingress.acl, spec, &flow_act, &dest, 1);
if (IS_ERR(vport->ingress.drop_rule)) {
err = PTR_ERR(vport->ingress.drop_rule);
esw_warn(esw->dev,
 "vport[%d] configure ingress drop rule, err(%d)\n",
 vport->vport, err);
vport->ingress.drop_rule = NULL;
-   goto out;
+   if (counter)
+

[PATCH net-next 3/4] net/core: Add violation counters to VF statisctics

2017-08-27 Thread Saeed Mahameed

From: Eugenia Emantayev 

Add receive and transmit violation counters to be
displayed in iproute2 VF statistics.
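
For a feel of how much message space the two new 64-bit attributes reserve in rtnl_vfinfo_size(), the sizing arithmetic can be modelled in userspace. The sketch below follows my reading of the netlink attribute layout (4-byte header, 4-byte alignment, optional padding attribute for 8-byte alignment); attr_size and total_size_64bit are hypothetical helpers, and the with_pad flag stands in for the kernel's compile-time choice, so treat the numbers as an assumption to verify against include/net/netlink.h.

```c
#include <assert.h>

#define NLA_ALIGNTO 4
#define NLA_ALIGN(len) (((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
#define NLA_HDRLEN 4 /* size of the attribute header, already aligned */

/* Attribute header plus payload, before alignment. */
static int attr_size(int payload)
{
	return NLA_HDRLEN + payload;
}

/* Space reserved for one 64-bit attribute.
 * with_pad: nonzero when an extra empty padding attribute may be needed
 * to keep the 64-bit payload 8-byte aligned.
 */
static int total_size_64bit(int payload, int with_pad)
{
	assert(payload >= 0);
	return NLA_ALIGN(attr_size(payload)) +
	       (with_pad ? NLA_ALIGN(attr_size(0)) : 0);
}
```

Under these assumptions each of IFLA_VF_STATS_RX_DROPPED and IFLA_VF_STATS_TX_DROPPED reserves 16 bytes with padding (12 without), so the two counters add at most 32 bytes per VF.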

Signed-off-by: Eugenia Emantayev 
Signed-off-by: Saeed Mahameed 
---
 include/linux/if_link.h  |  2 ++
 include/uapi/linux/if_link.h |  2 ++
 net/core/rtnetlink.c | 10 +-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index da70af27e42e..ebf3448acb5b 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -12,6 +12,8 @@ struct ifla_vf_stats {
__u64 tx_bytes;
__u64 broadcast;
__u64 multicast;
+   __u64 rx_dropped;
+   __u64 tx_dropped;
 };
 
 struct ifla_vf_info {
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 3aa895c5fbc1..68cd31b281a1 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -743,6 +743,8 @@ enum {
IFLA_VF_STATS_BROADCAST,
IFLA_VF_STATS_MULTICAST,
IFLA_VF_STATS_PAD,
+   IFLA_VF_STATS_RX_DROPPED,
+   IFLA_VF_STATS_TX_DROPPED,
__IFLA_VF_STATS_MAX,
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 56909f11d88e..1a653bb00d6e 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -845,6 +845,10 @@ static inline int rtnl_vfinfo_size(const struct net_device 
*dev,
 nla_total_size_64bit(sizeof(__u64)) +
 /* IFLA_VF_STATS_MULTICAST */
 nla_total_size_64bit(sizeof(__u64)) +
+/* IFLA_VF_STATS_RX_DROPPED */
+nla_total_size_64bit(sizeof(__u64)) +
+/* IFLA_VF_STATS_TX_DROPPED */
+nla_total_size_64bit(sizeof(__u64)) +
 nla_total_size(sizeof(struct ifla_vf_trust)));
return size;
} else
@@ -1214,7 +1218,11 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct 
sk_buff *skb,
nla_put_u64_64bit(skb, IFLA_VF_STATS_BROADCAST,
  vf_stats.broadcast, IFLA_VF_STATS_PAD) ||
nla_put_u64_64bit(skb, IFLA_VF_STATS_MULTICAST,
- vf_stats.multicast, IFLA_VF_STATS_PAD)) {
+ vf_stats.multicast, IFLA_VF_STATS_PAD) ||
+   nla_put_u64_64bit(skb, IFLA_VF_STATS_RX_DROPPED,
+ vf_stats.rx_dropped, IFLA_VF_STATS_PAD) ||
+   nla_put_u64_64bit(skb, IFLA_VF_STATS_TX_DROPPED,
+ vf_stats.tx_dropped, IFLA_VF_STATS_PAD)) {
nla_nest_cancel(skb, vfstats);
goto nla_put_vf_failure;
}
-- 
2.13.0

Re: [PATCH net] bridge: check for null fdb->dst before notifying switchdev drivers

2017-08-27 Thread Arkadi Sharshevsky



On 08/27/2017 07:13 AM, Roopa Prabhu wrote:
> From: Roopa Prabhu 
> 
> current switchdev drivers dont seem to support offloading fdb
> entries pointing to the bridge device which have fdb->dst
> not set to any port. This patch adds a NULL fdb->dst check in
> the switchdev notifier code.
> 
> This patch fixes the below NULL ptr dereference:
> $bridge fdb add 00:02:00:00:00:33 dev br0 self
> 
> [   69.953374] BUG: unable to handle kernel NULL pointer dereference at
> 0008
> [   69.954044] IP: br_switchdev_fdb_notify+0x29/0x80
> [   69.954044] PGD 66527067
> [   69.954044] P4D 66527067
> [   69.954044] PUD 7899c067
> [   69.954044] PMD 0
> [   69.954044]
> [   69.954044] Oops:  [#1] SMP
> [   69.954044] Modules linked in:
> [   69.954044] CPU: 1 PID: 3074 Comm: bridge Not tainted 4.13.0-rc6+ #1
> [   69.954044] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org
> 04/01/2014
> [   69.954044] task: 88007b827140 task.stack: c90001564000
> [   69.954044] RIP: 0010:br_switchdev_fdb_notify+0x29/0x80
> [   69.954044] RSP: 0018:c90001567918 EFLAGS: 00010246
> [   69.954044] RAX:  RBX: 8800795e0880 RCX:
> 00c0
> [   69.954044] RDX: c90001567920 RSI: 001c RDI:
> 8800795d0600
> [   69.954044] RBP: c90001567938 R08: 8800795d0600 R09:
> 
> [   69.954044] R10: c90001567a88 R11: 88007b849400 R12:
> 8800795e0880
> [   69.954044] R13: 8800795d0600 R14: 81ef8880 R15:
> 001c
> [   69.954044] FS:  7f93d3085700() GS:88007fd0()
> knlGS:
> [   69.954044] CS:  0010 DS:  ES:  CR0: 80050033
> [   69.954044] CR2: 0008 CR3: 66551000 CR4:
> 06e0
> [   69.954044] Call Trace:
> [   69.954044]  fdb_notify+0x3f/0xf0
> [   69.954044]  __br_fdb_add.isra.12+0x1a7/0x370
> [   69.954044]  br_fdb_add+0x178/0x280
> [   69.954044]  rtnl_fdb_add+0x10a/0x200
> [   69.954044]  rtnetlink_rcv_msg+0x1b4/0x240
> [   69.954044]  ? skb_free_head+0x21/0x40
> [   69.954044]  ? rtnl_calcit.isra.18+0xf0/0xf0
> [   69.954044]  netlink_rcv_skb+0xed/0x120
> [   69.954044]  rtnetlink_rcv+0x15/0x20
> [   69.954044]  netlink_unicast+0x180/0x200
> [   69.954044]  netlink_sendmsg+0x291/0x370
> [   69.954044]  ___sys_sendmsg+0x180/0x2e0
> [   69.954044]  ? filemap_map_pages+0x2db/0x370
> [   69.954044]  ? do_wp_page+0x11d/0x420
> [   69.954044]  ? __handle_mm_fault+0x794/0xd80
> [   69.954044]  ? vma_link+0xcb/0xd0
> [   69.954044]  __sys_sendmsg+0x4c/0x90
> [   69.954044]  SyS_sendmsg+0x12/0x20
> [   69.954044]  do_syscall_64+0x63/0xe0
> [   69.954044]  entry_SYSCALL64_slow_path+0x25/0x25
> [   69.954044] RIP: 0033:0x7f93d2bad690
> [   69.954044] RSP: 002b:7ffc7217a638 EFLAGS: 0246 ORIG_RAX:
> 002e
> [   69.954044] RAX: ffda RBX: 7ffc72182eac RCX:
> 7f93d2bad690
> [   69.954044] RDX:  RSI: 7ffc7217a670 RDI:
> 0003
> [   69.954044] RBP: 59a1f7f8 R08: 0006 R09:
> 000a
> [   69.954044] R10: 7ffc7217a400 R11: 0246 R12:
> 7ffc7217a670
> [   69.954044] R13: 7ffc72182a98 R14: 006114c0 R15:
> 7ffc72182aa0
> [   69.954044] Code: 1f 00 66 66 66 66 90 55 48 89 e5 48 83 ec 20 f6 47
> 20 04 74 0a 83 fe 1c 74 09 83 fe 1d 74 2c c9 66 90 c3 48 8b 47 10 48 8d
> 55 e8 <48> 8b 70 08 0f b7 47 1e 48 83 c7 18 48 89 7d f0 bf 03 00 00 00
> [   69.954044] RIP: br_switchdev_fdb_notify+0x29/0x80 RSP:
> c90001567918
> [   69.954044] CR2: 0008
> [   69.954044] ---[ end trace 03e9eec4a82c238b ]---
> 
> Fixes: 6b26b51b1d13 ("net: bridge: Add support for notifying devices about 
> FDB add/del")
> Signed-off-by: Roopa Prabhu 
> ---
>  net/bridge/br_switchdev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
> index 181a44d..f6b1c7d 100644
> --- a/net/bridge/br_switchdev.c
> +++ b/net/bridge/br_switchdev.c
> @@ -115,7 +115,7 @@ br_switchdev_fdb_call_notifiers(bool adding, const 
> unsigned char *mac,
>  void
>  br_switchdev_fdb_notify(const struct net_bridge_fdb_entry *fdb, int type)
>  {
> - if (!fdb->added_by_user)
> + if (!fdb->added_by_user || !fdb->dst)
>   return;
>  
>   switch (type) {
> 

Thanks, missed that.
Arkadi
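
A minimal sketch of the guard this patch adds, assuming simplified
stand-in types (struct port, struct fdb_entry and fdb_notify_target are
hypothetical and are not the kernel definitions). It only illustrates why
dst has to be checked before anything reached through it is read:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct port {
	int ifindex;
};

struct fdb_entry {
	struct port *dst;	/* NULL when the entry points at the bridge itself */
	bool added_by_user;
};

/* Returns the ifindex that would be notified, or -1 if the entry is skipped. */
static int fdb_notify_target(const struct fdb_entry *fdb)
{
	/* Test dst before dereferencing it, mirroring the patched condition. */
	if (!fdb->added_by_user || !fdb->dst)
		return -1;

	return fdb->dst->ifindex;
}
```

With this shape, an entry added with "dev br0 self" (dst == NULL) is
skipped instead of being dereferenced.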

Re: [Intel-wired-lan] [PATCH] e1000e: apply burst mode settings only on default

2017-08-27 Thread Neftin, Sasha


On 8/27/2017 11:32, Neftin, Sasha wrote:

On 8/27/2017 11:30, Neftin, Sasha wrote:

On 8/25/2017 18:06, Willem de Bruijn wrote:

From: Willem de Bruijn 

Devices that support FLAG2_DMA_BURST have different default values
for RDTR and RADV. Apply burst mode default settings only when no
explicit value was passed at module load.

The RDTR default is zero. If the module is loaded for low latency
operation with RxIntDelay=0, do not override this value with a burst
default of 32.

Move the decision to apply burst values earlier, where explicitly
initialized module variables can be distinguished from defaults.

Signed-off-by: Willem de Bruijn 
---
  drivers/net/ethernet/intel/e1000e/e1000.h  |  4 
  drivers/net/ethernet/intel/e1000e/netdev.c |  8 
  drivers/net/ethernet/intel/e1000e/param.c  | 16 +++-
  3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h 
b/drivers/net/ethernet/intel/e1000e/e1000.h

index 98e6abb1..2311b31bdcac 100644
--- a/drivers/net/ethernet/intel/e1000e/e1000.h
+++ b/drivers/net/ethernet/intel/e1000e/e1000.h
@@ -94,10 +94,6 @@ struct e1000_info;
   */
  #define E1000_CHECK_RESET_COUNT25
  -#define DEFAULT_RDTR0
-#define DEFAULT_RADV8
-#define BURST_RDTR0x20
-#define BURST_RADV0x20
  #define PCICFG_DESC_RING_STATUS0xe4
  #define FLUSH_DESC_REQUIRED0x100
  diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c

index 327dfe5bedc0..47b89aac7969 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3223,14 +3223,6 @@ static void e1000_configure_rx(struct 
e1000_adapter *adapter)

   */
  ew32(RXDCTL(0), E1000_RXDCTL_DMA_BURST_ENABLE);
  ew32(RXDCTL(1), E1000_RXDCTL_DMA_BURST_ENABLE);
-
-/* override the delay timers for enabling bursting, only if
- * the value was not set by the user via module options
- */
-if (adapter->rx_int_delay == DEFAULT_RDTR)
-adapter->rx_int_delay = BURST_RDTR;
-if (adapter->rx_abs_int_delay == DEFAULT_RADV)
-adapter->rx_abs_int_delay = BURST_RADV;
  }
/* set the Receive Delay Timer Register */
diff --git a/drivers/net/ethernet/intel/e1000e/param.c 
b/drivers/net/ethernet/intel/e1000e/param.c

index 6d8c39abee16..bb696c98f9b0 100644
--- a/drivers/net/ethernet/intel/e1000e/param.c
+++ b/drivers/net/ethernet/intel/e1000e/param.c
@@ -73,17 +73,25 @@ E1000_PARAM(TxAbsIntDelay, "Transmit Absolute 
Interrupt Delay");

  /* Receive Interrupt Delay in units of 1.024 microseconds
   * hardware will likely hang if you set this to anything but zero.
   *
+ * Burst variant is used as default if device has FLAG2_DMA_BURST.
+ *
   * Valid Range: 0-65535
   */
  E1000_PARAM(RxIntDelay, "Receive Interrupt Delay");
+#define DEFAULT_RDTR0
+#define BURST_RDTR0x20
  #define MAX_RXDELAY 0x
  #define MIN_RXDELAY 0
/* Receive Absolute Interrupt Delay in units of 1.024 microseconds
+ *
+ * Burst variant is used as default if device has FLAG2_DMA_BURST.
   *
   * Valid Range: 0-65535
   */
  E1000_PARAM(RxAbsIntDelay, "Receive Absolute Interrupt Delay");
+#define DEFAULT_RADV8
+#define BURST_RADV0x20
  #define MAX_RXABSDELAY 0x
  #define MIN_RXABSDELAY 0
  @@ -297,6 +305,9 @@ void e1000e_check_options(struct e1000_adapter 
*adapter)

   .max = MAX_RXDELAY } }
  };
  +if (adapter->flags2 & FLAG2_DMA_BURST)
+opt.def = BURST_RDTR;
+
  if (num_RxIntDelay > bd) {
  adapter->rx_int_delay = RxIntDelay[bd];
e1000_validate_option(&adapter->rx_int_delay, &opt,
@@ -307,7 +318,7 @@ void e1000e_check_options(struct e1000_adapter 
*adapter)

  }
  /* Receive Absolute Interrupt Delay */
  {
-static const struct e1000_option opt = {
+static struct e1000_option opt = {
  .type = range_option,
  .name = "Receive Absolute Interrupt Delay",
  .err  = "using default of "
@@ -317,6 +328,9 @@ void e1000e_check_options(struct e1000_adapter 
*adapter)

   .max = MAX_RXABSDELAY } }
  };
  +if (adapter->flags2 & FLAG2_DMA_BURST)
+opt.def = BURST_RADV;
+
  if (num_RxAbsIntDelay > bd) {
  adapter->rx_abs_int_delay = RxAbsIntDelay[bd];
e1000_validate_option(&adapter->rx_abs_int_delay, &opt,


This patch looks good to me, but I would like to hear a second opinion.

___
Intel-wired-lan mailing list
intel-wired-...@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan




Re: [Intel-wired-lan] [PATCH] e1000e: apply burst mode settings only on default

2017-08-27 Thread Neftin, Sasha


On 8/27/2017 11:30, Neftin, Sasha wrote:

On 8/25/2017 18:06, Willem de Bruijn wrote:

From: Willem de Bruijn 

Devices that support FLAG2_DMA_BURST have different default values
for RDTR and RADV. Apply burst mode default settings only when no
explicit value was passed at module load.

The RDTR default is zero. If the module is loaded for low latency
operation with RxIntDelay=0, do not override this value with a burst
default of 32.

Move the decision to apply burst values earlier, where explicitly
initialized module variables can be distinguished from defaults.

Signed-off-by: Willem de Bruijn 
---
  drivers/net/ethernet/intel/e1000e/e1000.h  |  4 
  drivers/net/ethernet/intel/e1000e/netdev.c |  8 
  drivers/net/ethernet/intel/e1000e/param.c  | 16 +++-
  3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h 
b/drivers/net/ethernet/intel/e1000e/e1000.h

index 98e6abb1..2311b31bdcac 100644
--- a/drivers/net/ethernet/intel/e1000e/e1000.h
+++ b/drivers/net/ethernet/intel/e1000e/e1000.h
@@ -94,10 +94,6 @@ struct e1000_info;
   */
  #define E1000_CHECK_RESET_COUNT25
  -#define DEFAULT_RDTR0
-#define DEFAULT_RADV8
-#define BURST_RDTR0x20
-#define BURST_RADV0x20
  #define PCICFG_DESC_RING_STATUS0xe4
  #define FLUSH_DESC_REQUIRED0x100
  diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c

index 327dfe5bedc0..47b89aac7969 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3223,14 +3223,6 @@ static void e1000_configure_rx(struct 
e1000_adapter *adapter)

   */
  ew32(RXDCTL(0), E1000_RXDCTL_DMA_BURST_ENABLE);
  ew32(RXDCTL(1), E1000_RXDCTL_DMA_BURST_ENABLE);
-
-/* override the delay timers for enabling bursting, only if
- * the value was not set by the user via module options
- */
-if (adapter->rx_int_delay == DEFAULT_RDTR)
-adapter->rx_int_delay = BURST_RDTR;
-if (adapter->rx_abs_int_delay == DEFAULT_RADV)
-adapter->rx_abs_int_delay = BURST_RADV;
  }
/* set the Receive Delay Timer Register */
diff --git a/drivers/net/ethernet/intel/e1000e/param.c 
b/drivers/net/ethernet/intel/e1000e/param.c

index 6d8c39abee16..bb696c98f9b0 100644
--- a/drivers/net/ethernet/intel/e1000e/param.c
+++ b/drivers/net/ethernet/intel/e1000e/param.c
@@ -73,17 +73,25 @@ E1000_PARAM(TxAbsIntDelay, "Transmit Absolute 
Interrupt Delay");

  /* Receive Interrupt Delay in units of 1.024 microseconds
   * hardware will likely hang if you set this to anything but zero.
   *
+ * Burst variant is used as default if device has FLAG2_DMA_BURST.
+ *
   * Valid Range: 0-65535
   */
  E1000_PARAM(RxIntDelay, "Receive Interrupt Delay");
+#define DEFAULT_RDTR0
+#define BURST_RDTR0x20
  #define MAX_RXDELAY 0x
  #define MIN_RXDELAY 0
/* Receive Absolute Interrupt Delay in units of 1.024 microseconds
+ *
+ * Burst variant is used as default if device has FLAG2_DMA_BURST.
   *
   * Valid Range: 0-65535
   */
  E1000_PARAM(RxAbsIntDelay, "Receive Absolute Interrupt Delay");
+#define DEFAULT_RADV8
+#define BURST_RADV0x20
  #define MAX_RXABSDELAY 0x
  #define MIN_RXABSDELAY 0
  @@ -297,6 +305,9 @@ void e1000e_check_options(struct e1000_adapter 
*adapter)

   .max = MAX_RXDELAY } }
  };
  +if (adapter->flags2 & FLAG2_DMA_BURST)
+opt.def = BURST_RDTR;
+
  if (num_RxIntDelay > bd) {
  adapter->rx_int_delay = RxIntDelay[bd];
e1000_validate_option(&adapter->rx_int_delay, &opt,
@@ -307,7 +318,7 @@ void e1000e_check_options(struct e1000_adapter 
*adapter)

  }
  /* Receive Absolute Interrupt Delay */
  {
-static const struct e1000_option opt = {
+static struct e1000_option opt = {
  .type = range_option,
  .name = "Receive Absolute Interrupt Delay",
  .err  = "using default of "
@@ -317,6 +328,9 @@ void e1000e_check_options(struct e1000_adapter 
*adapter)

   .max = MAX_RXABSDELAY } }
  };
  +if (adapter->flags2 & FLAG2_DMA_BURST)
+opt.def = BURST_RADV;
+
  if (num_RxAbsIntDelay > bd) {
  adapter->rx_abs_int_delay = RxAbsIntDelay[bd];
e1000_validate_option(&adapter->rx_abs_int_delay, &opt,


This patch looks good to me, but I would like to hear a second opinion.


Re: [patch net-next 11/12] mlxsw: spectrum_dpipe: Add support for IPv4 host table dump

2017-08-27 Thread Arkadi Sharshevsky

On 08/25/2017 10:51 PM, David Ahern wrote:
> On 8/25/17 2:26 AM, Arkadi Sharshevsky wrote:
>>
>>
>> On 08/24/2017 10:26 PM, David Ahern wrote:
>>> On 8/23/17 11:40 PM, Jiri Pirko wrote:
>>>> +static int
>>>> +mlxsw_sp_dpipe_table_host_entries_get(struct mlxsw_sp *mlxsw_sp,
>>>> +struct devlink_dpipe_entry *entry,
>>>> +bool counters_enabled,
>>>> +struct devlink_dpipe_dump_ctx *dump_ctx,
>>>> +int type)
>>>> +{
>>>> +  int rif_neigh_count = 0;
>>>> +  int rif_neigh_skip = 0;
>>>> +  int neigh_count = 0;
>>>> +  int rif_count;
>>>> +  int i, j;
>>>> +  int err;
>>>> +
>>>> +  rtnl_lock();
>>>
>>> Why does a h/w driver dumping its tables need the rtnl lock?
>>>
>>
>> This table represents the hw IPv4 arp table, and the
>> driver depends on rtnl to be held.
>>
> 
> Meaning mlxsw does not have its own locks protecting data structures --
> e.g., rif adds and deletes, so it is relying on rtnl?
> 
> Also, this dpipe capability seems to be just dumping data structures
> maintained by the driver. ie., you can compare the mlxsw view of
> networking state to IPv4 and IPv6 level tables. Any plans to offer a
> command that reads data from the h/w and passes that back to the user?
> i.e, a command to compare kernel tables to h/w state?
> 

So this infra should provide several things:

1) Reveal the interactions between various hardware tables
2) Counters for these tables
3) Debuggability

The first two can be achieved right now. Regarding debuggability, which
is a bit vague, the current assumption is that the driver's internal data
structures are synced with hardware (which is not always true), and may
not be synced with the kernel, so this can be achieved right now by
dumping the internal state of the driver. Furthermore, the counters are
dumped from the hardware and give the user an additional indication.

I completely agree that the hardware should be dumped in order to
validate that the internal data structures are really synced with HW.
This could be useful for observing data corruptions inside the ASIC and
various complex bugs.

In order to address that, I thought about adding a flag called
"validate_hw" so that during the dump the driver<-->hw state could be
validated.

What do you think about it?

Thanks,
Arkadi
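
To make the "validate_hw" idea concrete, here is a rough sketch of what
such a check could look like, assuming the driver can read back the
hardware host table into an array. struct host_entry, entry_equal and
count_unsynced are hypothetical names for illustration only; nothing
like this exists in mlxsw or devlink as far as this thread shows:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of one IPv4 host (neighbour) entry. */
struct host_entry {
	uint32_t rif;	/* router interface index */
	uint32_t ip;	/* IPv4 address */
	uint64_t mac;	/* destination MAC in the low 48 bits */
};

static bool entry_equal(const struct host_entry *a, const struct host_entry *b)
{
	return a->rif == b->rif && a->ip == b->ip && a->mac == b->mac;
}

/*
 * Count driver-cached entries that have no identical entry in the
 * table read back from hardware. A non-zero result would mean the
 * driver <--> hw state is not in sync.
 */
static size_t count_unsynced(const struct host_entry *drv, size_t n_drv,
			     const struct host_entry *hw, size_t n_hw)
{
	size_t unsynced = 0;

	for (size_t i = 0; i < n_drv; i++) {
		bool found = false;

		for (size_t j = 0; j < n_hw && !found; j++)
			found = entry_equal(&drv[i], &hw[j]);
		if (!found)
			unsynced++;
	}
	return unsynced;
}
```

The quadratic search is only for brevity; a real dump path would more
likely walk both tables in the same order.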

Re: [Intel-wired-lan] [PATCH] e1000e: apply burst mode settings only on default

2017-08-27 Thread Neftin, Sasha


On 8/25/2017 18:06, Willem de Bruijn wrote:

From: Willem de Bruijn 

Devices that support FLAG2_DMA_BURST have different default values
for RDTR and RADV. Apply burst mode default settings only when no
explicit value was passed at module load.

The RDTR default is zero. If the module is loaded for low latency
operation with RxIntDelay=0, do not override this value with a burst
default of 32.

Move the decision to apply burst values earlier, where explicitly
initialized module variables can be distinguished from defaults.

Signed-off-by: Willem de Bruijn 
---
  drivers/net/ethernet/intel/e1000e/e1000.h  |  4 
  drivers/net/ethernet/intel/e1000e/netdev.c |  8 
  drivers/net/ethernet/intel/e1000e/param.c  | 16 +++-
  3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h 
b/drivers/net/ethernet/intel/e1000e/e1000.h
index 98e6abb1..2311b31bdcac 100644
--- a/drivers/net/ethernet/intel/e1000e/e1000.h
+++ b/drivers/net/ethernet/intel/e1000e/e1000.h
@@ -94,10 +94,6 @@ struct e1000_info;
   */
  #define E1000_CHECK_RESET_COUNT   25
  
-#define DEFAULT_RDTR   0
-#define DEFAULT_RADV   8
-#define BURST_RDTR 0x20
-#define BURST_RADV 0x20
  #define PCICFG_DESC_RING_STATUS   0xe4
  #define FLUSH_DESC_REQUIRED   0x100
  
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c

index 327dfe5bedc0..47b89aac7969 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3223,14 +3223,6 @@ static void e1000_configure_rx(struct e1000_adapter 
*adapter)
 */
ew32(RXDCTL(0), E1000_RXDCTL_DMA_BURST_ENABLE);
ew32(RXDCTL(1), E1000_RXDCTL_DMA_BURST_ENABLE);
-
-   /* override the delay timers for enabling bursting, only if
-* the value was not set by the user via module options
-*/
-   if (adapter->rx_int_delay == DEFAULT_RDTR)
-   adapter->rx_int_delay = BURST_RDTR;
-   if (adapter->rx_abs_int_delay == DEFAULT_RADV)
-   adapter->rx_abs_int_delay = BURST_RADV;
}
  
  	/* set the Receive Delay Timer Register */

diff --git a/drivers/net/ethernet/intel/e1000e/param.c 
b/drivers/net/ethernet/intel/e1000e/param.c
index 6d8c39abee16..bb696c98f9b0 100644
--- a/drivers/net/ethernet/intel/e1000e/param.c
+++ b/drivers/net/ethernet/intel/e1000e/param.c
@@ -73,17 +73,25 @@ E1000_PARAM(TxAbsIntDelay, "Transmit Absolute Interrupt 
Delay");
  /* Receive Interrupt Delay in units of 1.024 microseconds
   * hardware will likely hang if you set this to anything but zero.
   *
+ * Burst variant is used as default if device has FLAG2_DMA_BURST.
+ *
   * Valid Range: 0-65535
   */
  E1000_PARAM(RxIntDelay, "Receive Interrupt Delay");
+#define DEFAULT_RDTR   0
+#define BURST_RDTR 0x20
  #define MAX_RXDELAY 0x
  #define MIN_RXDELAY 0
  
  /* Receive Absolute Interrupt Delay in units of 1.024 microseconds
+ *
+ * Burst variant is used as default if device has FLAG2_DMA_BURST.
   *
   * Valid Range: 0-65535
   */
  E1000_PARAM(RxAbsIntDelay, "Receive Absolute Interrupt Delay");
+#define DEFAULT_RADV   8
+#define BURST_RADV 0x20
  #define MAX_RXABSDELAY 0x
  #define MIN_RXABSDELAY 0
  
@@ -297,6 +305,9 @@ void e1000e_check_options(struct e1000_adapter *adapter)

 .max = MAX_RXDELAY } }
};
  
+   if (adapter->flags2 & FLAG2_DMA_BURST)
+   opt.def = BURST_RDTR;
+
if (num_RxIntDelay > bd) {
adapter->rx_int_delay = RxIntDelay[bd];
e1000_validate_option(&adapter->rx_int_delay, &opt,
@@ -307,7 +318,7 @@ void e1000e_check_options(struct e1000_adapter *adapter)
}
/* Receive Absolute Interrupt Delay */
{
-   static const struct e1000_option opt = {
+   static struct e1000_option opt = {
.type = range_option,
.name = "Receive Absolute Interrupt Delay",
.err  = "using default of "
@@ -317,6 +328,9 @@ void e1000e_check_options(struct e1000_adapter *adapter)
 .max = MAX_RXABSDELAY } }
};
  
+   if (adapter->flags2 & FLAG2_DMA_BURST)
+   opt.def = BURST_RADV;
+
if (num_RxAbsIntDelay > bd) {
adapter->rx_abs_int_delay = RxAbsIntDelay[bd];
e1000_validate_option(&adapter->rx_abs_int_delay, &opt,


This patch looks good to me, but I would like to hear a second opinion.
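
For a second reader, the decision the patch moves into param.c can be
sketched roughly as below. This is a simplified model, not the driver
code: pick_rx_int_delay is a hypothetical helper, the FLAG2_DMA_BURST bit
position is made up for the sketch, and has_user_value stands in for the
"num_RxIntDelay > bd" test. Range validation is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEFAULT_RDTR	0
#define BURST_RDTR	0x20
#define FLAG2_DMA_BURST	(1u << 0)	/* assumed bit position, sketch only */

/*
 * Choose the default first (burst or non-burst), then let an explicit
 * module parameter win over it, even when that parameter is 0.
 */
static uint32_t pick_rx_int_delay(uint32_t flags2, bool has_user_value,
				  uint32_t user_value)
{
	uint32_t def = DEFAULT_RDTR;

	if (flags2 & FLAG2_DMA_BURST)
		def = BURST_RDTR;

	return has_user_value ? user_value : def;
}
```

The point of the ordering is that RxIntDelay=0 can no longer be mistaken
for "not set", which is what the old comparison against DEFAULT_RDTR in
e1000_configure_rx() did.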

Re: [PATCH] pktgen: add a new sample script for 40G and above link testing

2017-08-27 Thread Tariq Toukan

On 25/08/2017 12:26 PM, Robert Hoo wrote:

(Sorry for yesterday's wrong sending, I finally fixed my MTA and git
send-email settings.)

It's hard to benchmark 40G+ network bandwidth using ordinary
tools like iperf and netperf (see reference 1).
Pktgen, the packet generator in kernel space, should be a candidate.
I then tried the pktgen multiqueue sample scripts, but still
could not reach line rate.

Try samples 03 and 04.

I then derived this NUMA-aware irq affinity sample script from the
multi-queue sample one, and successfully benchmarked a 40G link. I think it
can also be a useful reference for 100G, though I don't have a device to
test with yet.

This script simply does the following:
Detects which NUMA node $DEV belongs to.
Binds each thread (a processor from that NUMA node) to the irq affinity
of one $DEV queue, as a 1:1 mapping.
The number of '-t' threads given determines how many queues will be
utilized.

I agree this is an essential capability.
This was the main reason I added support for the -f argument.
Using it, I could choose cores of local NUMA, especially for single
thread, or when cores of the NUMA are sequential.

Tested with Intel XL710 NIC with Cisco 3172 switch.

Results are slightly better still if the irqbalance service is turned
off outside the script.

References:
https://people.netfilter.org/hawk/presentations/LCA2015/net_stack_challenges_100G_LCA2015.pdf
http://www.intel.cn/content/dam/www/public/us/en/documents/reference-guides/xl710-x710-performance-tuning-linux-guide.pdf

Signed-off-by: Robert Hoo
---

Regards,
Tariq Toukan
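
The 1:1 mapping described above can be sketched as follows. This is only
an illustration in C of the mapping logic, not the sample script itself:
parse_cpulist and cpu_for_queue are hypothetical helpers, and the input
string is assumed to be in the "0-3,8-11" form found in
/sys/devices/system/node/nodeN/cpulist.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Expand a cpulist string such as "0-3,8-11" into cpus[]; returns the count. */
static size_t parse_cpulist(const char *s, int *cpus, size_t max)
{
	size_t n = 0;

	while (*s && *s != '\n') {
		char *end;
		long lo = strtol(s, &end, 10);
		long hi = lo;

		if (end == s)
			break;		/* not a number: stop parsing */
		s = end;
		if (*s == '-') {
			hi = strtol(s + 1, &end, 10);
			if (end == s + 1)
				break;	/* dangling '-': stop parsing */
			s = end;
		}
		for (long c = lo; c <= hi && n < max; c++)
			cpus[n++] = (int)c;
		if (*s == ',')
			s++;
	}
	return n;
}

/* Queue i is served by the i-th CPU of the local node; -1 if out of CPUs. */
static int cpu_for_queue(const int *cpus, size_t n_cpus, size_t queue)
{
	return queue < n_cpus ? cpus[queue] : -1;
}
```

The CPU chosen for queue i would then be written to that queue's
/proc/irq/<irq>/smp_affinity_list, and pktgen thread i bound to the
same CPU.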

[PATCH] be2net: Fix some u16 fields appropriately

2017-08-27 Thread Haishuang Yan

In be_tx_compl_process, frag_index is declared as u32, so it's better to
declare last_index as u32 as well.

CC: Ajit Khaparde 
Fixes: b0fd2eb28bd4 ("be2net: Declare some u16 fields as u32 to improve
performance")
Signed-off-by: Haishuang Yan 
---
 drivers/net/ethernet/emulex/benet/be.h  | 2 +-
 drivers/net/ethernet/emulex/benet/be_main.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h 
b/drivers/net/ethernet/emulex/benet/be.h
index 674cf9d..2ba4d61 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -255,7 +255,7 @@ struct be_tx_stats {
 /* Structure to hold some data of interest obtained from a TX CQE */
 struct be_tx_compl_info {
u8 status;  /* Completion status */
-   u16 end_index;  /* Completed TXQ Index */
+   u32 end_index;  /* Completed TXQ Index */
 };
 
 struct be_tx_obj {
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 
b/drivers/net/ethernet/emulex/benet/be_main.c
index 319eee3..3645344 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -2606,7 +2606,7 @@ static struct be_tx_compl_info *be_tx_compl_get(struct 
be_tx_obj *txo)
 }
 
 static u16 be_tx_compl_process(struct be_adapter *adapter,
-  struct be_tx_obj *txo, u16 last_index)
+  struct be_tx_obj *txo, u32 last_index)
 {
struct sk_buff **sent_skbs = txo->sent_skb_list;
struct be_queue_info *txq = &txo->q;
-- 
1.8.3.1

[PATCH] igb: check memory allocation failure

2017-08-27 Thread Christophe JAILLET

Check memory allocation failures and return -ENOMEM in such cases, as
already done for other memory allocations in this function.

This avoids a NULL pointer dereference.

Signed-off-by: Christophe JAILLET 
---
 drivers/net/ethernet/intel/igb/igb_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index fd4a46b03cc8..837d9b46a390 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3162,6 +3162,8 @@ static int igb_sw_init(struct igb_adapter *adapter)
/* Setup and initialize a copy of the hw vlan table array */
adapter->shadow_vfta = kcalloc(E1000_VLAN_FILTER_TBL_SIZE, sizeof(u32),
   GFP_ATOMIC);
+   if (!adapter->shadow_vfta)
+   return -ENOMEM;
 
/* This call may decrease the number of queues */
if (igb_init_interrupt_scheme(adapter, true)) {
-- 
2.11.0
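
The same check-and-bail pattern in a user-space sketch, with calloc()
standing in for kcalloc(). struct adapter and adapter_sw_init are
hypothetical names for illustration, not the igb definitions:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the part of the adapter used here. */
struct adapter {
	uint32_t *shadow_vfta;	/* zero-initialised copy of the VLAN table */
};

/* Allocate the shadow table; report -ENOMEM instead of continuing with NULL. */
static int adapter_sw_init(struct adapter *a, size_t entries)
{
	a->shadow_vfta = calloc(entries, sizeof(uint32_t));
	if (!a->shadow_vfta)
		return -ENOMEM;

	return 0;
}
```

Returning early keeps later code from writing through a NULL
shadow_vfta.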
