[PATCH] net: stmmac: constify clk_div_table
clk_div_table structures are not supposed to change at runtime. The
meson8b_dwmac driver only works with a const clk_div_table, so mark the
non-const struct as const.

Signed-off-by: Arvind Yadav
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
index 968..4404650b 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
@@ -89,7 +89,7 @@ static int meson8b_init_clk(struct meson8b_dwmac *dwmac)
 	char clk_name[32];
 	const char *clk_div_parents[1];
 	const char *mux_parent_names[MUX_CLK_NUM_PARENTS];
-	static struct clk_div_table clk_25m_div_table[] = {
+	static const struct clk_div_table clk_25m_div_table[] = {
 		{ .val = 0, .div = 5 },
 		{ .val = 1, .div = 10 },
 		{ /* sentinel */ },
-- 
1.9.1
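For illustration only, here is a minimal userspace sketch of why a const divider table is sufficient: consumers only ever read it, walking entries until the zero-divider sentinel. The struct below is a stand-in with the same two fields as the kernel's struct clk_div_table, and div_for_val() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/* Userspace stand-in (assumption) for the kernel's struct clk_div_table. */
struct clk_div_table {
	unsigned int val;	/* value written to the divider register field */
	unsigned int div;	/* resulting divider; 0 marks the sentinel */
};

/* const: the table is placed in read-only data and is never modified. */
static const struct clk_div_table clk_25m_div_table[] = {
	{ .val = 0, .div = 5 },
	{ .val = 1, .div = 10 },
	{ /* sentinel */ },
};

/* Hypothetical helper: look up the divider for a register value.
 * Returns 0 if the value is not present in the table. */
static unsigned int div_for_val(const struct clk_div_table *table,
				unsigned int val)
{
	const struct clk_div_table *t;

	assert(table);
	for (t = table; t->div; t++)	/* stop at the zero-div sentinel */
		if (t->val == val)
			return t->div;
	return 0;
}
```

Because the lookup takes a pointer to const, passing a non-const table would also compile, but declaring the table itself const lets the compiler reject accidental writes.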
Re: [PATCH net-next v7 05/10] landlock: Add LSM hooks related to filesystem
On Sun, Aug 27, 2017 at 03:31:35PM +0200, Mickaël Salaün wrote:
>
> > How can you add 3rd argument? All FS events would have to get it,
> > but in some LSM hooks such argument will be meaningless, whereas
> > in other places it will carry useful info that rule can operate on.
> > Would that mean that we'll have FS_3 event type and only few LSM
> > hooks will be converted to it. That works, but then we'll lose
> > compatibility with old rules written for FS event and that given hook.
> > Otherwise we'd need to have fancy logic to accept old FS event
> > into FS_3 LSM hook.
>
> If we want to add a third argument to the FS event, then it will become
> accessible because its type will be different than NOT_INIT. This keeps
> the compatibility with old rules because this new field was then denied.
>
> If we want to add a new argument but only for a subset of the hooks used
> by the FS event, then we need to create a new event, like FS_FCNTL. For
> example, we may want to add a FS_RENAME event to be able to tie the
> source file and the destination file of a rename call.

That's exactly my point. Adding another argument to the FS event for a
subset of hooks will require either a new FS_FOO event, where to be
backwards compatible these hooks will call _both_ FS and FS_FOO, or some
magic logic on the kernel side that will allow old FS rules to be attached
to FS_FOO hooks?
Two calls don't scale, and if we do the 'magic logic', can we do it now
and avoid introducing events altogether?
Like all landlock programs can be landlock type and they would need
to declare what arg1, arg2, argN they expect.
Then at attach time the kernel only needs to verify that hook arg types
match what program requested.

> Anyway, I added the subtype/ABI version as a safeguard in case of
> unexpected future evolution.

I don't think that abi/version field adds anything in this context.
I still think it should simply be removed.
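To make the attach-time idea concrete, here is a rough userspace sketch. Everything in it is hypothetical: the enum values, the function name, and in particular the rule that a program may declare fewer arguments than the hook provides (so that old rules keep working on an extended hook) are assumptions for illustration, not anything from the patch series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical argument types a hook can provide or a program can expect. */
enum arg_type {
	ARG_NOT_INIT = 0,
	ARG_HANDLE_FS,
	ARG_FS_ACTION,
};

/*
 * Attach-time check (sketch): the program lists the argument types it
 * expects, the hook lists the types it provides. Assumed rule: every
 * declared program argument must match the hook argument at the same
 * position, and the program may ignore trailing hook arguments.
 */
static bool hook_accepts_prog(const enum arg_type *hook, size_t nhook,
			      const enum arg_type *prog, size_t nprog)
{
	size_t i;

	assert(hook && prog);
	if (nprog > nhook)
		return false;	/* program wants more than the hook has */
	for (i = 0; i < nprog; i++)
		if (prog[i] != hook[i])
			return false;
	return true;
}
```

Under this assumed rule, a program written against a two-argument hook would still attach after the hook gains a third argument, without a separate event type or an ABI number.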
Re: [PATCH net-next v7 04/10] bpf: Define handle_fs and add a new helper bpf_handle_fs_get_mode()
On Mon, 21 Aug 2017, Mickaël Salaün wrote:

> @@ -85,6 +90,8 @@ enum bpf_arg_type {
>
>  	ARG_PTR_TO_CTX,		/* pointer to context */
>  	ARG_ANYTHING,		/* any (initialized) argument is ok */
> +
> +	ARG_CONST_PTR_TO_HANDLE_FS,	/* pointer to an abstract FS struct */
>  };

Looks like a spurious empty line.

-- 
James Morris
Re: [kernel-hardening] Re: [PATCH net-next v7 02/10] bpf: Add eBPF program subtype and is_valid_subtype() verifier
On Wed, 23 Aug 2017, Mickaël Salaün wrote:

> >> +	struct {
> >> +		__u32		abi; /* minimal ABI version, cf. user doc */
> >
> > the concept of abi (version) sounds a bit weird to me.
> > Why bother with it at all?
> > Once the first set of patches lands the kernel as whole will have landlock
> > feature with a set of helpers, actions, event types.
> > Some future patches will extend the landlock feature step by step.
> > This abi concept assumes that anyone who adds new helper would need
> > to keep incrementing this 'abi'. What value does it give to user or to
> > kernel?
> > The users will already know that landlock is present in kernel 4.14 or
> > whatever and the kernel 4.18 has more landlock features. Why bother with
> > extra abi number?
>
> That's right for helpers and context fields, but we can't check the use
> of one field's content. The status field is intended to be a bitfield
> extendable in the future. For example, one use case is to set a flag to
> inform the eBPF program that it was already called with the same context
> and can skip most of its checks (if not related to maps). Same goes for
> the FS action bitfield, one may want to add more of them. Another
> example may be the check for abilities. We may want to relax/remove the
> capability required to set one of them. With an ABI version, the user can
> easily check if the current kernel supports that.

Don't call it an ABI, perhaps minimum policy version (similar to what
SELinux does).

Changes need to be made so that any existing userspace still works.

-- 
James Morris
Re: [PATCH net-next v7 02/10] bpf: Add eBPF program subtype and is_valid_subtype() verifier
On Tue, 22 Aug 2017, Alexei Starovoitov wrote:

> more general question: what is the status of security/ bits?
> I'm assuming they still need to be reviewed and explicitly acked by James,
> right?

Yep, along with other core security developers where possible.

-- 
James Morris
Re: [kernel-hardening] [PATCH net-next v7 00/10] Landlock LSM: Toward unprivileged sandboxing
On Mon, 21 Aug 2017, Mickaël Salaün wrote:

> ## Why a new LSM? Are SELinux, AppArmor, Smack and Tomoyo not good enough?
>
> The current access control LSMs are fine for their purpose which is to give
> the *root* the ability to enforce a security policy for the *system*. What is
> missing is a way to enforce a security policy for any application by its
> developer and *unprivileged user* as seccomp can do for raw syscall filtering.
>

You could mention here that the first case is Mandatory Access Control,
in general terms.

-- 
James Morris
Re: Get ARP/ND tables from kernel
On 8/27/17 9:25 PM, Bassam Alsanie wrote:
> Hello everyone,
> I'm looking for a good way (stable and compatible with a large number of
> distros) to get the arp/nd cache from kernel to user space, for both
> IP4 and IP6.
>
> It seems IOCTL (SIOCGARP) can't do that, you can only get the MAC address
> for a provided IP address. But IOCTL can't give the full arp/nd
> table.
> The other option is the Netlink interface. I tried it and I got the
> ARP/ND table :).
> The third option is using /proc/net/arp, which is restricted to IP4.
>
> There are command line utilities, which I am excluding in my case.
>
> Is there another way to do it? What is the best way in my case?
>
> Thank you all.

# strace arp -an
[...]
open("/proc/net/arp", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(4, "IP address HW type Fla"..., 1024) = 310
[...]

# strace ip -6 neighbor show
[...]
socket(AF_NETLINK, SOCK_RAW|SOCK_CLOEXEC, NETLINK_ROUTE) = 3
setsockopt(3, SOL_SOCKET, SO_SNDBUF, [32768], 4) = 0
setsockopt(3, SOL_SOCKET, SO_RCVBUF, [1048576], 4) = 0
bind(3, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=}, 12) = 0
getsockname(3, {sa_family=AF_NETLINK, nl_pid=30292, nl_groups=}, [12]) = 0
sendto(3, {{len=40, type=RTM_GETLINK, flags=NLM_F_REQUEST|NLM_F_DUMP, seq=1503888680, pid=0}, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\0\35\0\1\0\0\0"}, 40, 0, NULL, 0) = 40
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=}, msg_namelen=12, msg_iov=[{iov_base=[{{len=1268, type=RTM_NEWLINK, flags=NLM_F_MULTI, seq=1503888680, pid=30292}, "\0\0\4\3\1\0\0\0I\0\1\0\0\0\0\0\7\0\3\0lo\0\0\10\0\r\0\350\3\0\0"...}, {{len=1280, type=RTM_NEWLINK, flags=NLM_F_MULTI, seq=1503888680, pid=30292}, "\0\0\1\0\2\0\0\0C\20\1\0\0\0\0\0\t\0\3\0eno1\0\0\0\0\10\0\r\0"...}], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 2548
[...]

It seems pretty obvious: if you don't want to use the existing tools,
just look at how the existing tools get this data.

IPv4 uses /proc/net/arp, IPv6 uses netlink.

Cheers,
-PJ
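As a rough sketch of the /proc/net/arp route (IPv4 only), the parser below reads the whitespace-separated columns that the file prints after its header line: IP address, HW type, flags, HW address, mask, device. The struct and function names are made up for this example, and the column layout is assumed to match current kernels.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one row of /proc/net/arp. */
struct arp_entry {
	char ip[64];
	unsigned int hw_type;
	unsigned int flags;
	char mac[32];
	char mask[32];
	char dev[32];
};

/* Parse one data line. Returns 0 on success, -1 if the line does not
 * look like an entry (the header line is rejected this way). */
static int parse_arp_line(const char *line, struct arp_entry *e)
{
	int n;

	assert(line && e);
	memset(e, 0, sizeof(*e));
	n = sscanf(line, "%63s 0x%x 0x%x %31s %31s %31s",
		   e->ip, &e->hw_type, &e->flags, e->mac, e->mask, e->dev);
	return n == 6 ? 0 : -1;
}

/* Print every entry found in the file at path (normally /proc/net/arp).
 * Returns the number of entries, or -1 if the file cannot be opened. */
static int read_arp_table(const char *path)
{
	char line[256];
	struct arp_entry e;
	int count = 0;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (parse_arp_line(line, &e) == 0) {
			printf("%s %s %s\n", e.ip, e.mac, e.dev);
			count++;
		}
	}
	fclose(f);
	return count;
}
```

Calling read_arp_table("/proc/net/arp") would list the IPv4 neighbours. For IPv6 (or for both families through one interface), a netlink dump on a NETLINK_ROUTE socket is the way the ip tool does it, as the strace above shows.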
[PATCH net] ipv6: do not set sk_destruct in IPV6_ADDRFORM sockopt
ChunYu found a kernel warn_on during syzkaller fuzzing:

[40226.038539] WARNING: CPU: 5 PID: 23720 at net/ipv4/af_inet.c:152 inet_sock_destruct+0x78d/0x9a0
[40226.144849] Call Trace:
[40226.147590]
[40226.149859] dump_stack+0xe2/0x186
[40226.176546] __warn+0x1a4/0x1e0
[40226.180066] warn_slowpath_null+0x31/0x40
[40226.184555] inet_sock_destruct+0x78d/0x9a0
[40226.246355] __sk_destruct+0xfa/0x8c0
[40226.290612] rcu_process_callbacks+0xaa0/0x18a0
[40226.336816] __do_softirq+0x241/0x75e
[40226.367758] irq_exit+0x1f6/0x220
[40226.371458] smp_apic_timer_interrupt+0x7b/0xa0
[40226.376507] apic_timer_interrupt+0x93/0xa0

The warn_on happened because sk->sk_rmem_alloc wasn't 0 in
inet_sock_destruct. Since commit f970bd9e3a06 ("udp: implement memory
accounting helpers"), udp uses udp_destruct_sock as sk_destruct, where
it calls udp_rmem_release to release all rmem.

But the IPV6_ADDRFORM sockopt sets sk_destruct to inet_sock_destruct
after changing the family to PF_INET. If rmem is not 0 at that time, and
there is no place to release rmem before calling inet_sock_destruct, the
warn_on will be triggered.

This patch fixes it by no longer setting sk_destruct in the IPV6_ADDRFORM
sockopt. IPV6_ADDRFORM only works for tcp and udp: a TCP sock already has
its sk_destruct set to inet_sock_destruct, and a UDP sock has it set to
udp_destruct_sock, since they are created.

Fixes: f970bd9e3a06 ("udp: implement memory accounting helpers")
Reported-by: ChunYu Wang
Signed-off-by: Xin Long
---
 net/ipv6/ipv6_sockglue.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 02d795f..a5e466d 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -242,7 +242,6 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 			pktopt = xchg(&np->pktoptions, NULL);
 			kfree_skb(pktopt);
 
-			sk->sk_destruct = inet_sock_destruct;
 			/*
 			 * ... and add it to the refcnt debug socks count
 			 * in the new family. -acme
-- 
2.1.0
Re: [PATCH RFC WIP 0/5] IGMP snooping for local traffic
Hi Andrew,

On 08/26/2017 01:56 PM, Andrew Lunn wrote:
> This is a WIP patchset i would like comments on from bridge,
> switchdev and hardware offload people.
>
> The linux bridge supports IGMP snooping. It will listen to IGMP
> reports on bridge ports and keep track of which groups have been
> joined on an interface. It will then forward multicast based on this
> group membership.
>
> When the bridge adds or removes groups from an interface, it uses
> switchdev to request the hardware add an mdb to a port, so the
> hardware can perform the selective forwarding between ports.
>
> What is not covered by the current bridge code, is IGMP joins/leaves
> from the host on the brX interface. No such monitoring is performed.
> With a pure software bridge, it is not required. All multicast frames
> are passed to the brX interface, and the network stack filters them,
> as it does for any interface. However, when hardware offload is
> involved, things change. We should program the hardware to only send
> multicast packets to the host when the host has an interest in them.

OK, so if I understand this right, without a bridge, we have the
following happen today: with a DSA-enabled setup using any kind of
switch tagging protocol, if a host is interested in receiving particular
multicast traffic, we would receive IGMP joins/leaves through sw0p0, and
the stack should call ndo_set_rx_mode for sw0p0, which would be
dsa_slave_set_rx_mode() and which would synchronize the DSA master
network device with the slave network device. Everything works fine
provided that the CPU port is configured to accept multicast traffic.

Note here that we don't really add an MDB entry for sw0p0 when that
happens, but it seems like we should for switches that lack IGMP
snooping and/or multicast filtering.

With the current bridge and DSA code, are not we actually always going
to get the CPU port to be added with the multicast address and therefore
no filtering is occurring and snooping is pretty much useless?

>
> Thus we need to perform IGMP snooping on the brX interface, just
> like any other interface of the bridge. However, currently the brX
> interface is missing all the needed data structures to do this.
> There is no net_bridge_port structure for the brX interface. This
> structure is created when an interface is added to the bridge. But
> the brX interface is not a member of the bridge. So this patchset
> makes the brX interface a first class member of the bridge. When the
> brX interface is opened, the interface is added to the bridge. A
> net_bridge_port is allocated for it, and IGMP snooping is performed
> as usual.

Would not making brX be part of the bridge have a huge negative
performance impact on locally generated traffic either? Even though we
do an early return in br_handle_frame() this may become noticeable.

>
> There are some complexities here. Some assumptions are broken, like
> the master interface of a port interface is the bridge interface.
> The brX interface cannot be its own master. The use of
> netdev_master_upper_dev_get() within the bridge code has been
> changed to reflect this. The bridge receive handler needs to not
> process frames for the brX interface, etc.
>
> The interface downward to the hardware is also an issue. The code
> presented here is a hack and needs to change. But that is secondary
> and can be solved once it is agreed how the bridge needs to change
> to support this use case.
>
> Comment welcome and wanted.

While I understand the reasons why you did it that way, I think this is
going to break a lot of code in bridge that does not expect brX to be a
bridge port member. Maybe we can just generate switch MDB events
targeting the bridge network device and let switch drivers resolve that
to whatever their CPU/master port is?

It does sound like we are moving more and more to a model where brX
becomes one (if not the only one) net_device representor of what the
CPU/master port of a switch is (at least with DSA), which sort of makes
us go back to the multi-CPU port discussion we had a while ago.

Thanks!
-- 
Florian
Re: [PATCH net-next] bridge: fdb add and delete tracepoints
On 08/27/2017 02:33 PM, Roopa Prabhu wrote:
> From: Roopa Prabhu
>
> Tracepoints to trace bridge forwarding database updates.

Thanks for adding this!

>
> Signed-off-by: Roopa Prabhu
> ---
>  include/trace/events/bridge.h | 98 +++
>  net/bridge/br_fdb.c           |  7
>  net/core/net-traces.c         |  6 +++
>  3 files changed, 111 insertions(+)
>  create mode 100644 include/trace/events/bridge.h
>
> diff --git a/include/trace/events/bridge.h b/include/trace/events/bridge.h
> new file mode 100644
> index 000..e2d52cf
> --- /dev/null
> +++ b/include/trace/events/bridge.h
> @@ -0,0 +1,98 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM bridge
> +
> +#if !defined(_TRACE_BRIDGE_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_BRIDGE_H
> +
> +#include
> +#include
> +
> +#include "../../../net/bridge/br_private.h"
> +
> +TRACE_EVENT(br_fdb_add,
> +
> +	TP_PROTO(struct ndmsg *ndm, struct net_device *dev,
> +		 const unsigned char *addr, u16 vid, u16 nlh_flags),
> +
> +	TP_ARGS(ndm, dev, addr, vid, nlh_flags),
> +
> +	TP_STRUCT__entry(
> +		__field(u8, ndm_flags)
> +		__string(dev, dev->name)
> +		__array(unsigned char, addr, 6)

Can you use ETH_ALEN instead of 6 here?

> +		__field(u16, vid)
> +		__field(u16, nlh_flags)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(dev, dev->name);
> +		memcpy(__entry->addr, addr, 6);

Likewise

> +		__entry->vid = vid;
> +		__entry->nlh_flags = nlh_flags;
> +		__entry->ndm_flags = ndm->ndm_flags;
> +	),
> +
> +	TP_printk("dev %s addr %02x:%02x:%02x:%02x:%02x:%02x vid %u nlh_flags %x ndm_flags = %x",

I wonder if we could make %pM work for TP_printk() as this would
simplify the argument list a bit.

Can you use %04x for vid, nlh_flags and %02x for ndm_flags?

> +		  __get_str(dev), __entry->addr[0], __entry->addr[1],
> +		  __entry->addr[2], __entry->addr[3], __entry->addr[4],
> +		  __entry->addr[5], __entry->vid,
> +		  __entry->nlh_flags, __entry->ndm_flags)
> +);
> +
> +TRACE_EVENT(br_fdb_external_learn_add,
> +
> +	TP_PROTO(struct net_bridge *br, struct net_bridge_port *p,
> +		 const unsigned char *addr, u16 vid),
> +
> +	TP_ARGS(br, p, addr, vid),
> +
> +	TP_STRUCT__entry(
> +		__string(br_dev, br->dev->name)
> +		__string(dev, p->dev->name)
> +		__array(unsigned char, addr, 6)
> +		__field(u16, vid)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(br_dev, br ? br->dev->name : "null");
> +		__assign_str(dev, p ? p->dev->name : "null");
> +		memcpy(__entry->addr, addr, 6);
> +		__entry->vid = vid;
> +	),
> +
> +	TP_printk("br_dev %s port %s addr %02x:%02x:%02x:%02x:%02x:%02x vid %u",
> +		  __get_str(br_dev), __get_str(dev), __entry->addr[0],
> +		  __entry->addr[1], __entry->addr[2], __entry->addr[3],
> +		  __entry->addr[4], __entry->addr[5], __entry->vid)
> +);
> +
> +TRACE_EVENT(fdb_delete,
> +
> +	TP_PROTO(struct net_bridge *br, struct net_bridge_fdb_entry *f),
> +
> +	TP_ARGS(br, f),
> +
> +	TP_STRUCT__entry(
> +		__string(br_dev, br->dev->name)
> +		__string(dev, f->dst ? f->dst->dev->name : "null")
> +		__array(unsigned char, addr, 6)

Same here, using ETH_ALEN would be clearer.

> +		__field(u16, vid)
> +	),
> +

Thanks!
-- 
Florian
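As a small userspace illustration of the two review suggestions (ETH_ALEN instead of a literal 6, and fixed-width hex for vid), here is a sketch. The struct and function names are hypothetical and only mimic what a tracepoint entry stores; ETH_ALEN is defined locally here as an assumption so the example does not depend on kernel headers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#ifndef ETH_ALEN
#define ETH_ALEN 6	/* length of an Ethernet address in bytes */
#endif

/* Hypothetical stand-in for the fields a tracepoint entry would store. */
struct fdb_trace_entry {
	unsigned char addr[ETH_ALEN];
	unsigned short vid;
};

/* Mirrors the TP_fast_assign step, using ETH_ALEN rather than 6. */
static void fdb_entry_assign(struct fdb_trace_entry *e,
			     const unsigned char *addr, unsigned short vid)
{
	assert(e && addr);
	memcpy(e->addr, addr, ETH_ALEN);
	e->vid = vid;
}

/* Mirrors the TP_printk step, with %04x for vid as suggested. */
static int fdb_entry_format(const struct fdb_trace_entry *e,
			    char *buf, size_t len)
{
	assert(e && buf);
	return snprintf(buf, len,
			"addr %02x:%02x:%02x:%02x:%02x:%02x vid %04x",
			e->addr[0], e->addr[1], e->addr[2],
			e->addr[3], e->addr[4], e->addr[5], e->vid);
}
```

Whether %pM can be used inside TP_printk() is left open in the review; the explicit six-byte format above is what the patch currently prints.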
linux-next: manual merge of the net-next tree with the rockchip tree
Hi all, Today's linux-next merge of the net-next tree got a conflict in: arch/arm64/boot/dts/rockchip/rk3328-evb.dts between commit: 0e54e062692a ("arm64: dts: rockchip: add mmc nodes for rk3328 evaluation board") 57fca160b2be ("arm64: dts: rockchip: add cpu regulator for rk3328 evaluation board") from the rockchip tree and commit: 4b05bc6157eb ("ARM64: dts: rockchip: Enable gmac2phy for rk3328-evb") from the net-next tree. I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell diff --cc arch/arm64/boot/dts/rockchip/rk3328-evb.dts index f82b2d0d9e86,b9f36dad17e6.. --- a/arch/arm64/boot/dts/rockchip/rk3328-evb.dts +++ b/arch/arm64/boot/dts/rockchip/rk3328-evb.dts @@@ -51,217 -51,24 +51,234 @@@ stdout-path = "serial2:150n8"; }; + dc_12v: dc-12v { + compatible = "regulator-fixed"; + regulator-name = "dc_12v"; + regulator-always-on; + regulator-boot-on; + regulator-min-microvolt = <1200>; + regulator-max-microvolt = <1200>; + }; + + sdio_pwrseq: sdio-pwrseq { + compatible = "mmc-pwrseq-simple"; + pinctrl-names = "default"; + pinctrl-0 = <_enable_h>; + + /* + * On the module itself this is one of these (depending + * on the actual card populated): + * - SDIO_RESET_L_WL_REG_ON + * - PDN (power down when low) + */ + reset-gpios = < 18 GPIO_ACTIVE_LOW>; + }; + + vcc_phy: vcc-phy-regulator { + compatible = "regulator-fixed"; + regulator-name = "vcc_phy"; + regulator-always-on; + regulator-boot-on; + }; ++ + vcc_sys: vcc-sys { + compatible = "regulator-fixed"; + regulator-name = "vcc_sys"; + regulator-always-on; + regulator-boot-on; + regulator-min-microvolt = <500>; + regulator-max-microvolt = <500>; + vin-supply = <_12v>; + }; + + vcc_sd: 
sdmmc-regulator { + compatible = "regulator-fixed"; + gpio = < 30 GPIO_ACTIVE_LOW>; + pinctrl-names = "default"; + pinctrl-0 = <_gpio>; + regulator-name = "vcc_sd"; + regulator-min-microvolt = <330>; + regulator-max-microvolt = <330>; + vin-supply = <_io>; + }; +}; + + { + cpu-supply = <_arm>; +}; + + { + bus-width = <8>; + cap-mmc-highspeed; + non-removable; + pinctrl-names = "default"; + pinctrl-0 = <_clk _cmd _bus8>; + status = "okay"; }; + { + phy-supply = <_phy>; + clock_in_out = "output"; + assigned-clocks = < SCLK_MAC2PHY_SRC>; + assigned-clock-rate = <5000>; + assigned-clocks = < SCLK_MAC2PHY>; + assigned-clock-parents = < SCLK_MAC2PHY_SRC>; + status = "okay"; + }; + + { + status = "okay"; + + rk805: rk805@18 { + compatible = "rockchip,rk805"; + reg = <0x18>; + interrupt-parent = <>; + interrupts = <6 IRQ_TYPE_LEVEL_LOW>; + #clock-cells = <1>; + clock-output-names = "xin32k", "rk805-clkout2"; + gpio-controller; + #gpio-cells = <2>; + pinctrl-names = "default"; + pinctrl-0 = <_int_l>; + rockchip,system-power-controller; + wakeup-source; + + vcc1-supply = <_sys>; + vcc2-supply = <_sys>; + vcc3-supply = <_sys>; + vcc4-supply = <_sys>; + vcc5-supply = <_io>; + vcc6-supply = <_io>; + + regulators { + vdd_logic: DCDC_REG1 { + regulator-name = "vdd_logic"; + regulator-min-microvolt = <712500>; + regulator-max-microvolt = <145>; + regulator-always-on; + regulator-boot-on; + regulator-state-mem { + regulator-on-in-suspend; + regulator-suspend-microvolt = <100>; + }; + }; + + vdd_arm: DCDC_REG2 { + regulator-name = "vdd_arm"; +
Get ARP/ND tables from kernel
Hello everyone,
I'm looking for a good way (stable and compatible with a large number of
distros) to get the arp/nd cache from kernel to user space, for both
IP4 and IP6.

It seems IOCTL (SIOCGARP) can't do that, you can only get the MAC address
for a provided IP address. But IOCTL can't give the full arp/nd
table.
The other option is the Netlink interface. I tried it and I got the
ARP/ND table :).
The third option is using /proc/net/arp, which is restricted to IP4.

There are command line utilities, which I am excluding in my case.

Is there another way to do it? What is the best way in my case?

Thank you all.
Re: [PATCH net-next 3/4] net/core: Add violation counters to VF statisctics
On Sun, 27 Aug 2017 14:06:17 +0300, Saeed Mahameed wrote:
> From: Eugenia Emantayev
>
> Add receive and transmit violation counters to be
> displayed in iproute2 VF statistics.
>
> Signed-off-by: Eugenia Emantayev
> Signed-off-by: Saeed Mahameed
> ---
>  include/linux/if_link.h      |  2 ++
>  include/uapi/linux/if_link.h |  2 ++
>  net/core/rtnetlink.c         | 10 +-
>  3 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> index da70af27e42e..ebf3448acb5b 100644
> --- a/include/linux/if_link.h
> +++ b/include/linux/if_link.h
> @@ -12,6 +12,8 @@ struct ifla_vf_stats {
>  	__u64 tx_bytes;
>  	__u64 broadcast;
>  	__u64 multicast;
> +	__u64 rx_dropped;
> +	__u64 tx_dropped;

I'm a little concerned that you call those violation counters in the
commit message. Do you expect them to only be used if the VF traffic
indeed violates some admin-set rules? I would imagine HW/FW may drop
frames in certain situations, and naming the counters *_dropped suggests
it would be OK to increment them even if the drop reason was not any
sort of violation. Would you mind clarifying?
Re: [PATCH net-next 1/4] net: Add SRIOV VGT+ support
On Sun, 27 Aug 2017 14:06:15 +0300, Saeed Mahameed wrote:
> From: Mohamad Haj Yahia
>
> VGT+ is a security feature that gives the administrator the ability of
> controlling the allowed vlan-ids list that can be transmitted/received
> from/to the VF.
> The allowed vlan-ids list is called "trunk".
> Admin can add/remove a range of allowed vlan-ids via iptool.
> Example:
> After this series of configuration :
> 1) ip link set eth3 vf 0 trunk add 10 100 (allow vlan-id 10-100, default tpid 0x8100)
> 2) ip link set eth3 vf 0 trunk add 105 proto 802.1q (allow vlan-id 105 tpid 0x8100)
> 3) ip link set eth3 vf 0 trunk add 105 proto 802.1ad (allow vlan-id 105 tpid 0x88a8)
> 4) ip link set eth3 vf 0 trunk rem 90 (block vlan-id 90)
> 5) ip link set eth3 vf 0 trunk rem 50 60 (block vlan-ids 50-60)
>
> The VF 0 can only communicate on vlan-ids: 10-49,61-89,91-100,105 with
> tpid 0x8100 and vlan-id 105 with tpid 0x88a8.
>
> For this purpose we added the following netlink sr-iov commands:
>
> 1) IFLA_VF_VLAN_RANGE: used to add/remove allowed vlan-ids range.
> We added the ifla_vf_vlan_range struct to specify the range we want to
> add/remove from the userspace.
> We added ndo_add_vf_vlan_trunk_range and ndo_del_vf_vlan_trunk_range
> netdev ops to add/remove allowed vlan-ids range in the netdev.
>
> 2) IFLA_VF_VLAN_TRUNK: used to query the allowed vlan-ids trunk.
> We added trunk bitmap to the ifla_vf_info struct to get the current
> allowed vlan-ids trunk from the netdev.
> We added ifla_vf_vlan_trunk struct for sending the allowed vlan-ids
> trunk to the userspace.
>
> Signed-off-by: Mohamad Haj Yahia
> Signed-off-by: Eugenia Emantayev
> Signed-off-by: Saeed Mahameed

Interesting work, I have some minor questions if you don't mind :)

I was under impression that "trunk" is a vendor-specific term, would it
make sense to drop it from the APIs?

> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h > index 8d062c58d5cb..3aa895c5fbc1 100644 > --- a/include/uapi/linux/if_link.h > +++ b/include/uapi/linux/if_link.h > @@ -168,6 +168,8 @@ enum { > #ifndef __KERNEL__ > #define IFLA_RTA(r) ((struct rtattr*)(((char*)(r)) + > NLMSG_ALIGN(sizeof(struct ifinfomsg > #define IFLA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct ifinfomsg)) > +#define BITS_PER_BYTE 8 > +#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d)) > #endif > > enum { > @@ -645,6 +647,8 @@ enum { > IFLA_VF_IB_NODE_GUID, /* VF Infiniband node GUID */ > IFLA_VF_IB_PORT_GUID, /* VF Infiniband port GUID */ > IFLA_VF_VLAN_LIST, /* nested list of vlans, option for QinQ */ > + IFLA_VF_VLAN_RANGE, /* add/delete vlan range filtering */ > + IFLA_VF_VLAN_TRUNK, /* vlan trunk filtering */ > __IFLA_VF_MAX, > }; > > @@ -669,6 +673,7 @@ enum { > > #define IFLA_VF_VLAN_INFO_MAX (__IFLA_VF_VLAN_INFO_MAX - 1) > #define MAX_VLAN_LIST_LEN 1 > +#define VF_VLAN_N_VID 4096 > > struct ifla_vf_vlan_info { > __u32 vf; > @@ -677,6 +682,21 @@ struct ifla_vf_vlan_info { > __be16 vlan_proto; /* VLAN protocol either 802.1Q or 802.1ad */ > }; > > +struct ifla_vf_vlan_range { > + __u32 vf; > + __u32 start_vid; /* 1 - 4095 */ > + __u32 end_vid; /* 1 - 4095 */ > + __u32 setting; > + __be16 vlan_proto; /* VLAN protocol either 802.1Q or 802.1ad */ > +}; > + > +#define VF_VLAN_BITMAP DIV_ROUND_UP(VF_VLAN_N_VID, sizeof(__u64) * > BITS_PER_BYTE) > +struct ifla_vf_vlan_trunk { > + __u32 vf; > + __u64 allowed_vlans_8021q_bm[VF_VLAN_BITMAP]; > + __u64 allowed_vlans_8021ad_bm[VF_VLAN_BITMAP]; > +}; Would you mind explaining why you chose to make the API asymmetrical like that? I mean the set operation is range-based, yet the get returns a bitmask. You seem to solely depend on the bitmasks in the driver anyway... 
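To make the range-versus-bitmap point concrete, here is a userspace sketch that applies the range operations from the commit message example to a single bitmap (the 802.1Q one only) and reads back the result. The macros are copied from the quoted patch; the struct and helper names are invented for this example and are not part of the proposed API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BITS_PER_BYTE 8
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define VF_VLAN_N_VID 4096
/* 4096 VLAN ids / 64 bits per word = 64 words */
#define VF_VLAN_BITMAP DIV_ROUND_UP(VF_VLAN_N_VID, sizeof(uint64_t) * BITS_PER_BYTE)

/* Hypothetical holder for one protocol's allowed-VLAN bitmap. */
struct vlan_trunk {
	uint64_t bm[VF_VLAN_BITMAP];
};

/* Allow or block every vlan-id in [start, end], inclusive. */
static void trunk_set_range(struct vlan_trunk *t, unsigned int start,
			    unsigned int end, bool allow)
{
	unsigned int vid;

	assert(t && start <= end && end < VF_VLAN_N_VID);
	for (vid = start; vid <= end; vid++) {
		uint64_t bit = UINT64_C(1) << (vid % 64);

		if (allow)
			t->bm[vid / 64] |= bit;
		else
			t->bm[vid / 64] &= ~bit;
	}
}

/* Query a single vlan-id. */
static bool trunk_allowed(const struct vlan_trunk *t, unsigned int vid)
{
	return vid < VF_VLAN_N_VID &&
	       ((t->bm[vid / 64] >> (vid % 64)) & 1) != 0;
}
```

Applying "add 10 100", "add 105", "rem 90" and "rem 50 60" in that order leaves 10-49, 61-89, 91-100 and 105 set, matching the list in the commit message. The sketch also shows the asymmetry being asked about: set operations take ranges, while the stored and returned state is a bitmap.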
> struct ifla_vf_tx_rate { > __u32 vf; > __u32 rate; /* Max TX bandwidth in Mbps, 0 disables throttling */ > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c > index a78fd61da0ec..56909f11d88e 100644 > --- a/net/core/rtnetlink.c > +++ b/net/core/rtnetlink.c > @@ -827,6 +827,7 @@ static inline int rtnl_vfinfo_size(const struct > net_device *dev, >nla_total_size(MAX_VLAN_LIST_LEN * > sizeof(struct ifla_vf_vlan_info)) + >nla_total_size(sizeof(struct ifla_vf_spoofchk)) + > + nla_total_size(sizeof(struct ifla_vf_vlan_trunk)) + >nla_total_size(sizeof(struct ifla_vf_tx_rate)) + >nla_total_size(sizeof(struct ifla_vf_rate)) + >nla_total_size(sizeof(struct ifla_vf_link_state)) + > @@ -1098,31 +1099,43 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct > sk_buff *skb, > struct ifla_vf_link_state vf_linkstate; > struct ifla_vf_vlan_info
[net-next 04/15] i40e: Use correct flag to enable egress traffic for unicast promisc
From: Akeem G Abodunrin

We usually set true promiscuous mode for both multicast and unicast at
the same time. However, it is possible to set them individually, so
using the allmulti flag, which is only for all-multicast, might cause
unwanted behavior when mirroring egress traffic for unicast promiscuous
mode in a VF.

Signed-off-by: Akeem G Abodunrin
Tested-by: Andrew Bowers
Signed-off-by: Jeff Kirsher
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 057c77be96e4..27d87bef4ba3 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -1758,7 +1758,7 @@ static int i40e_vc_config_promiscuous_mode_msg(struct i40e_vf *vf,
 			}
 		}
 	} else {
 		aq_ret = i40e_aq_set_vsi_unicast_promiscuous(hw, vsi->seid,
-							     allmulti, NULL,
+							     alluni, NULL,
 							     true);
 		aq_err = pf->hw.aq.asq_last_status;
 		if (aq_ret) {
-- 
2.14.1
[net-next 02/15] i40e: Store the requested FEC information
From: Mariusz Stachura

Store information about the FEC modes that were requested. It will be
used by the function that prints link status information, so there is
no need to call the admin queue there.

Signed-off-by: Mariusz Stachura
Tested-by: Andrew Bowers
Signed-off-by: Jeff Kirsher
---
 drivers/net/ethernet/intel/i40e/i40e_common.c | 4 
 drivers/net/ethernet/intel/i40e/i40e_type.h   | 1 +
 drivers/net/ethernet/intel/i40evf/i40e_type.h | 1 +
 3 files changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 8e082a946411..5c36a18a31be 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -2529,6 +2529,10 @@ i40e_status i40e_update_link_info(struct i40e_hw *hw)
 		if (status)
 			return status;
 
+		hw->phy.link_info.req_fec_info =
+			abilities.fec_cfg_curr_mod_ext_info &
+			(I40E_AQ_REQUEST_FEC_KR | I40E_AQ_REQUEST_FEC_RS);
+
 		memcpy(hw->phy.link_info.module_type, &abilities.module_type,
 		       sizeof(hw->phy.link_info.module_type));
 	}
diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h b/drivers/net/ethernet/intel/i40e/i40e_type.h
index 3a18ed13edc4..fd4bbdd88b57 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_type.h
@@ -185,6 +185,7 @@ struct i40e_link_status {
 	enum i40e_aq_link_speed link_speed;
 	u8 link_info;
 	u8 an_info;
+	u8 req_fec_info;
 	u8 fec_info;
 	u8 ext_info;
 	u8 loopback;
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_type.h b/drivers/net/ethernet/intel/i40evf/i40e_type.h
index bde7f24af1c6..2ea919d9cdcf 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_type.h
@@ -159,6 +159,7 @@ struct i40e_link_status {
 	enum i40e_aq_link_speed link_speed;
 	u8 link_info;
 	u8 an_info;
+	u8 req_fec_info;
 	u8 fec_info;
 	u8 ext_info;
 	u8 loopback;
-- 
2.14.1
[net-next 13/15] i40e: invert logic for checking incorrect cpu vs irq affinity
From: Jacob Keller

In commit 96db776a3682 ("i40e/vf: fix interrupt affinity bug")
we added some code to force exit of polling in case we did not have the
correct CPU. This is important since it was possible for the IRQ affinity
to be changed while the CPU is pegged at 100%. This can result in the
polling routine being stuck on the wrong CPU until traffic finally stops.

Unfortunately, the implementation, "if the CPU is correct, exit as
normal, otherwise, fall-through to the end-polling exit" is incredibly
confusing to reason about. In this case, the normal flow looks like the
exception, while the exception actually occurs far away from the if
statement and comment.

We recently discovered and fixed a bug in this code because we were
incorrectly initializing the affinity mask.

Re-write the code so that the exceptional case is handled at the check,
rather than having the logic be spread through the regular exit flow.
This does end up with minor code duplication, but the resulting code is
much easier to reason about.

The new logic is identical, but inverted. If we are running on a CPU not
in our affinity mask, we'll exit polling. However, the code flow is much
easier to understand.

Note that we don't actually have to check for MSI-X, because in the MSI
case we'll only have one q_vector, but its default affinity mask should
be correct as it includes all CPUs when it's initialized. Further, we
could at some point add code to set up the notifier for the non-MSI-X
case and enable this workaround for that case too, if desired, though
there isn't much gain since it's unlikely to be the common case.

Signed-off-by: Jacob Keller
Tested-by: Andrew Bowers
Signed-off-by: Jeff Kirsher
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 31 +--
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 30 +-
 2 files changed, 30 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 5c1edcce9459..3999afea518b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2369,7 +2369,6 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
 
 	/* If work not completed, return budget and polling will return */
 	if (!clean_complete) {
-		const cpumask_t *aff_mask = &q_vector->affinity_mask;
 		int cpu_id = smp_processor_id();
 
 		/* It is possible that the interrupt affinity has changed but,
@@ -2379,15 +2378,22 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
 		 * continue to poll, otherwise we must stop polling so the
 		 * interrupt can move to the correct cpu.
 		 */
-		if (likely(cpumask_test_cpu(cpu_id, aff_mask) ||
-			   !(vsi->back->flags & I40E_FLAG_MSIX_ENABLED))) {
+		if (!cpumask_test_cpu(cpu_id, &q_vector->affinity_mask)) {
+			/* Tell napi that we are done polling */
+			napi_complete_done(napi, work_done);
+
+			/* Force an interrupt */
+			i40e_force_wb(vsi, q_vector);
+
+			/* Return budget-1 so that polling stops */
+			return budget - 1;
+		}
 tx_only:
-			if (arm_wb) {
-				q_vector->tx.ring[0].tx_stats.tx_force_wb++;
-				i40e_enable_wb_on_itr(vsi, q_vector);
-			}
-			return budget;
+		if (arm_wb) {
+			q_vector->tx.ring[0].tx_stats.tx_force_wb++;
+			i40e_enable_wb_on_itr(vsi, q_vector);
 		}
+		return budget;
 	}
 
 	if (vsi->back->flags & I40E_TXR_FLAGS_WB_ON_ITR)
@@ -2396,14 +2402,7 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
 	/* Work is done so exit the polling mode and re-enable the interrupt */
 	napi_complete_done(napi, work_done);
 
-	/* If we're prematurely stopping polling to fix the interrupt
-	 * affinity we want to make sure polling starts back up so we
-	 * issue a call to i40e_force_wb which triggers a SW interrupt.
-	 */
-	if (!clean_complete)
-		i40e_force_wb(vsi, q_vector);
-	else
-		i40e_update_enable_itr(vsi, q_vector);
+	i40e_update_enable_itr(vsi, q_vector);
 
 	return min(work_done, budget - 1);
 }
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index d91676ccf125..f15e341ada9e 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -1575,7 +1575,6 @@ int i40evf_napi_poll(struct napi_struct *napi,
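The inverted control flow described in this patch can be modelled in user space. The following is only a sketch, not the driver code: the 64-bit mask stands in for cpumask_t, and mask_has_cpu() and napi_incomplete_result() are hypothetical names chosen for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for cpumask_test_cpu(): bit N set means CPU N
 * is part of the affinity mask. Limited to 64 CPUs for simplicity.
 */
static bool mask_has_cpu(uint64_t affinity_mask, unsigned int cpu)
{
	return cpu < 64 && ((affinity_mask >> cpu) & 1u);
}

/* Models the "work not completed" branch of the poll routine after the
 * patch: the exceptional case (running on a CPU outside the mask) is
 * handled right at the check, and the normal case follows it.
 */
static int napi_incomplete_result(uint64_t affinity_mask, unsigned int cpu,
				  int budget, bool *force_irq)
{
	if (!mask_has_cpu(affinity_mask, cpu)) {
		/* wrong CPU: stop polling and ask for a new interrupt */
		*force_irq = true;
		return budget - 1;
	}

	/* right CPU: keep polling */
	*force_irq = false;
	return budget;
}
```

Returning budget - 1 is what tells the caller that polling has stopped, while returning the full budget keeps the poll loop going.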
[net-next 11/15] i40e: move enabling icr0 into i40e_update_enable_itr
From: Jacob Keller

If we don't have MSI-X enabled, we handle all interrupts on icr0. This is
a special case, so let's move the conditional into
i40e_update_enable_itr() in order to make i40e_napi_poll() easier to
read.

Signed-off-by: Jacob Keller
Tested-by: Andrew Bowers
Signed-off-by: Jeff Kirsher
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 8a969d8f0790..5c1edcce9459 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2243,6 +2243,12 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
 	int idx = q_vector->v_idx;
 	int rx_itr_setting, tx_itr_setting;
 
+	/* If we don't have MSIX, then we only need to re-enable icr0 */
+	if (!(vsi->back->flags & I40E_FLAG_MSIX_ENABLED)) {
+		i40e_irq_dynamic_enable_icr0(vsi->back, false);
+		return;
+	}
+
 	vector = (q_vector->v_idx + vsi->base_vector);
 
 	/* avoid dynamic calculation if in countdown mode OR if
@@ -2396,8 +2402,6 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
 	 */
 	if (!clean_complete)
 		i40e_force_wb(vsi, q_vector);
-	else if (!(vsi->back->flags & I40E_FLAG_MSIX_ENABLED))
-		i40e_irq_dynamic_enable_icr0(vsi->back, false);
 	else
 		i40e_update_enable_itr(vsi, q_vector);
 
-- 
2.14.1
[net-next 00/15][pull request] 40GbE Intel Wired LAN Driver Updates 2017-08-27
This series contains updates to i40e and i40evf only.

Sudheer updates code comments and a state variable so that
adminq_subtask will have accurate information whenever it gets
scheduled.

Mariusz stores information about FEC modes, to be used when printing
link status information, so that we do not need to call the admin queue
when reporting link status. He also adds VF support for controlling VLAN
tag stripping via ethtool.

Jake provides the majority of changes in this series, starting with
increasing the size of the prefix buffer so that it can hold enough
characters for every possible input, which prevents snprintf truncation.
He fixes other string truncation errors/warnings produced by GCC 7.x and
removes an unnecessary workaround for resetting XPS. He fixes an issue
caused by a mismatched affinity mask value by initializing the value to
cpu_possible_mask, and inverts the logic for checking incorrect CPU vs
IRQ affinity so that the exceptional case is handled at the check. He
also removes ULTRA latency mode due to several issues found, and will be
looking at a better solution for small packet workloads.

Akeem fixes an issue where the incorrect flag was being used to set
promiscuous mode for unicast, which was enabling promiscuous mode only
for multicast instead of unicast.

Carolyn fixes an issue where an error return value is set, but this
value can be overwritten before we actually exit the function. The fix
removes the error code assignment and adds code comments explaining why
we do not need to set and return the error.

The following are changes since commit ec15ecdee5eb9e33a565e1e8eaef39fd4de565cb:
  net: mvpp2: fix the packet size configuration for 10G
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE

Akeem G Abodunrin (1):
  i40e: Use correct flag to enable egress traffic for unicast promisc

Carolyn Wyborny (1):
  i40e: Fix for unused value issue found by static analysis

Jacob Keller (9):
  i40e: prevent snprintf format specifier truncation
  i40evf: fix possible snprintf truncation of q_vector->name
  i40e: force VMDQ device name truncation
  i40e: remove workaround for resetting XPS
  i40e: move enabling icr0 into i40e_update_enable_itr
  i40e: initialize our affinity_mask based on cpu_possible_mask
  i40e: invert logic for checking incorrect cpu vs irq affinity
  i40e/i40evf: remove ULTRA latency mode
  i40e/i40evf: avoid dynamic ITR updates when polling or low packet rate

Mariusz Stachura (3):
  i40e: Store the requested FEC information
  i40e/i40evf: support for VF VLAN tag stripping control
  i40e: 25G FEC status improvements

Sudheer Mogilappagari (1):
  i40e: Update state variable for adminq subtask

 drivers/net/ethernet/intel/i40e/i40e_common.c      | 8 ++-
 drivers/net/ethernet/intel/i40e/i40e_main.c        | 58 ++--
 drivers/net/ethernet/intel/i40e/i40e_nvm.c         | 10 ++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c        | 78 +++---
 drivers/net/ethernet/intel/i40e/i40e_txrx.h        | 2 +-
 drivers/net/ethernet/intel/i40e/i40e_type.h        | 1 +
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 62 -
 drivers/net/ethernet/intel/i40evf/i40e_common.c    | 4 +-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c      | 69 +--
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h      | 2 +-
 drivers/net/ethernet/intel/i40evf/i40e_type.h      | 1 +
 drivers/net/ethernet/intel/i40evf/i40evf.h         | 6 +-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c    | 61 +
 .../net/ethernet/intel/i40evf/i40evf_virtchnl.c    | 40 +++
 include/linux/avf/virtchnl.h                       | 5 ++
 15 files changed, 285 insertions(+), 122 deletions(-)

-- 
2.14.1
[net-next 06/15] i40e: force VMDQ device name truncation
From: Jacob Keller

Since GCC 7.x, a new warning fires when a string may be truncated before
the whole format has been written. When we set up VMDQ netdev names we
copy a pre-existing interface name, which can be up to 15 characters
long. Since we also add 4 bytes (the v, the literal %, the d and the \0
terminator), we would overrun the available size unless snprintf
truncated for us.

The snprintf call will of course truncate at the end, so let's instead
modify the code to force truncation of the copied netdev name by 4
characters, to create enough space for the 4 bytes we're adding.

Signed-off-by: Jacob Keller
Tested-by: Andrew Bowers
Signed-off-by: Jeff Kirsher
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index b0ccd3c2eec6..3a6a752c6c58 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9690,8 +9690,13 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
 		i40e_add_mac_filter(vsi, mac_addr);
 		spin_unlock_bh(&vsi->mac_filter_hash_lock);
 	} else {
-		/* relate the VSI_VMDQ name to the VSI_MAIN name */
-		snprintf(netdev->name, IFNAMSIZ, "%sv%%d",
+		/* Relate the VSI_VMDQ name to the VSI_MAIN name. Note that we
+		 * are still limited by IFNAMSIZ, but we're adding 'v%d\0' to
+		 * the end, which is 4 bytes long, so force truncation of the
+		 * original name by IFNAMSIZ - 4
+		 */
+		snprintf(netdev->name, IFNAMSIZ, "%.*sv%%d",
+			 IFNAMSIZ - 4,
 			 pf->vsi[pf->lan_vsi]->netdev->name);
 		random_ether_addr(mac_addr);
 
-- 
2.14.1
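The effect of the "%.*s" precision can be checked outside the kernel with the standard C snprintf. This is a sketch under assumptions: build_vmdq_template() is a hypothetical helper name, and IFNAMSIZ is defined locally with the same value the kernel uses (16).

```c
#include <stdio.h>

#define IFNAMSIZ 16	/* local copy of the kernel's interface name limit */

/* Build the "<parent>v%d" name template the way the patched code does:
 * the precision caps the copied parent name at IFNAMSIZ - 4 characters,
 * which leaves room for 'v', '%', 'd' and the terminating NUL.
 * Returns snprintf's result, i.e. the length that was needed.
 */
static int build_vmdq_template(char out[IFNAMSIZ], const char *parent)
{
	return snprintf(out, IFNAMSIZ, "%.*sv%%d", IFNAMSIZ - 4, parent);
}
```

With a 15-character parent name the copy is cut to 12 characters, so the "v%d" suffix always survives and the return value stays below IFNAMSIZ, which is what keeps the GCC truncation warning quiet.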
[net-next 09/15] i40e: Fix for unused value issue found by static analysis
From: Carolyn Wyborny

This patch fixes an issue where an error return value is set but,
without an immediate exit, the value can be overwritten by the code that
follows. The condition at this point is not fatal, so remove the error
assignment and document the intent for future code maintainers.

Signed-off-by: Carolyn Wyborny
Tested-by: Andrew Bowers
Signed-off-by: Jeff Kirsher
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 5a06cd23b9e6..0962b85ef6f3 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9884,13 +9884,15 @@ static int i40e_add_vsi(struct i40e_vsi *vsi)
 			 */
 			ret = i40e_vsi_config_tc(vsi, enabled_tc);
 			if (ret) {
+				/* Single TC condition is not fatal,
+				 * message and continue
+				 */
 				dev_info(&pf->pdev->dev,
 					 "failed to configure TCs for main VSI tc_map 0x%08x, err %s aq_err %s\n",
 					 enabled_tc,
 					 i40e_stat_str(&pf->hw, ret),
 					 i40e_aq_str(&pf->hw,
 						     pf->hw.aq.asq_last_status));
-				ret = -ENOENT;
 			}
 		}
 		break;
-- 
2.14.1
[net-next 05/15] i40evf: fix possible snprintf truncation of q_vector->name
From: Jacob Keller

The q_vector names are based on the interface name with a driver prefix,
the type of q_vector setup, and the queue number. We previously set the
size of this variable to IFNAMSIZ + 9, which is incorrect, because we
actually include a minimum of 14 characters extra beyond the interface
name size.

New versions of GCC since 7 include a new warning that detects this
possible truncation and complains. We can fix this by increasing the
size in case our interface name is too large to avoid truncation. We
don't need to go beyond 14 because the compiler is smart enough to
realize our values can never exceed size of 1. We do go up to 15 here
because possible future changes may increase the number of queues beyond
one digit.

While we are here, also change some variables to be unsigned (since they
are never negative) and stop using an extra unnecessary %s format
specifier.

Signed-off-by: Jacob Keller
Tested-by: Andrew Bowers
Signed-off-by: Jeff Kirsher
---
 drivers/net/ethernet/intel/i40evf/i40evf.h      | 2 +-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 21 +
 2 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf.h b/drivers/net/ethernet/intel/i40evf/i40evf.h
index d310544c6c6e..e5293d35fb6a 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf.h
+++ b/drivers/net/ethernet/intel/i40evf/i40evf.h
@@ -121,7 +121,7 @@ struct i40e_q_vector {
 #define ITR_COUNTDOWN_START 100
 	u8 itr_countdown;	/* when 0 or 1 update ITR */
 	int v_idx;	/* vector index in list */
-	char name[IFNAMSIZ + 9];
+	char name[IFNAMSIZ + 15];
 	bool arm_wb_state;
 	cpumask_t affinity_mask;
 	struct irq_affinity_notify affinity_notify;
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 0d87191b6bac..258e8e27068b 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -543,9 +543,9 @@ static void i40evf_irq_affinity_release(struct kref *ref) {}
 static int
 i40evf_request_traffic_irqs(struct i40evf_adapter *adapter, char *basename)
 {
-	int vector, err, q_vectors;
-	int rx_int_idx = 0, tx_int_idx = 0;
-	int irq_num;
+	unsigned int vector, q_vectors;
+	unsigned int rx_int_idx = 0, tx_int_idx = 0;
+	int irq_num, err;
 
 	i40evf_irq_disable(adapter);
 	/* Decrement for Other and TCP Timer vectors */
@@ -556,18 +556,15 @@ i40evf_request_traffic_irqs(struct i40evf_adapter *adapter, char *basename)
 		irq_num = adapter->msix_entries[vector + NONQ_VECS].vector;
 
 		if (q_vector->tx.ring && q_vector->rx.ring) {
-			snprintf(q_vector->name, sizeof(q_vector->name) - 1,
-				 "i40evf-%s-%s-%d", basename,
-				 "TxRx", rx_int_idx++);
+			snprintf(q_vector->name, sizeof(q_vector->name),
+				 "i40evf-%s-TxRx-%d", basename, rx_int_idx++);
 			tx_int_idx++;
 		} else if (q_vector->rx.ring) {
-			snprintf(q_vector->name, sizeof(q_vector->name) - 1,
-				 "i40evf-%s-%s-%d", basename,
-				 "rx", rx_int_idx++);
+			snprintf(q_vector->name, sizeof(q_vector->name),
+				 "i40evf-%s-rx-%d", basename, rx_int_idx++);
 		} else if (q_vector->tx.ring) {
-			snprintf(q_vector->name, sizeof(q_vector->name) - 1,
-				 "i40evf-%s-%s-%d", basename,
-				 "tx", tx_int_idx++);
+			snprintf(q_vector->name, sizeof(q_vector->name),
+				 "i40evf-%s-tx-%d", basename, tx_int_idx++);
 		} else {
 			/* skip this unused q_vector */
 			continue;
-- 
2.14.1
[net-next 03/15] i40e: prevent snprintf format specifier truncation
From: Jacob Keller

Increase the size of the prefix buffer so that it can hold enough
characters for every possible input. Although 20 is enough for all
expected inputs, it is possible for the values to be larger than
expected, resulting in a possibly truncated string. Additionally, let's
use sizeof(prefix) in order to ensure we use the correct size if we need
to change the array length in the future.

New versions of GCC starting at 7 now include warnings to prevent
truncation unless you handle the return code. At most 27 bytes can be
written here, so let's just increase the buffer size even if for all
expected hw->bus.* values we only needed 20.

Signed-off-by: Jacob Keller
Tested-by: Andrew Bowers
Signed-off-by: Jeff Kirsher
---
 drivers/net/ethernet/intel/i40e/i40e_common.c   | 4 ++--
 drivers/net/ethernet/intel/i40evf/i40e_common.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 5c36a18a31be..111426ba5fbc 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -328,9 +328,9 @@ void i40e_debug_aq(struct i40e_hw *hw, enum i40e_debug_mask mask, void *desc,
 			len = buf_len;
 		/* write the full 16-byte chunks */
 		if (hw->debug_mask & mask) {
-			char prefix[20];
+			char prefix[27];
 
-			snprintf(prefix, 20,
+			snprintf(prefix, sizeof(prefix),
 				 "i40e %02x:%02x.%x: \t0x",
 				 hw->bus.bus_id,
 				 hw->bus.device,
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_common.c b/drivers/net/ethernet/intel/i40evf/i40e_common.c
index d69c2e44cd1a..8d3a2bfe186a 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_common.c
@@ -333,9 +333,9 @@ void i40evf_debug_aq(struct i40e_hw *hw, enum i40e_debug_mask mask, void *desc,
 			len = buf_len;
 		/* write the full 16-byte chunks */
 		if (hw->debug_mask & mask) {
-			char prefix[20];
+			char prefix[27];
 
-			snprintf(prefix, 20,
+			snprintf(prefix, sizeof(prefix),
 				 "i40evf %02x:%02x.%x: \t0x",
 				 hw->bus.bus_id,
 				 hw->bus.device,
-- 
2.14.1
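The buffer requirement can be measured in user space by calling snprintf with a NULL buffer, which returns the length the output would need. This is a sketch under assumptions: the format string is taken from the i40evf hunk above, the three fields are assumed to be 16 bits wide (the field types are not shown in this patch), and prefix_bytes_needed() is a hypothetical helper name.

```c
#include <stdio.h>

/* Return the buffer size (including the terminating NUL) needed for the
 * debug prefix with the given bus values, or a negative value on error.
 * The 16-bit parameter types are an assumption made for this sketch.
 */
static int prefix_bytes_needed(unsigned short bus_id, unsigned short device,
			       unsigned short func)
{
	int len = snprintf(NULL, 0, "i40evf %02x:%02x.%x: \t0x",
			   (unsigned int)bus_id, (unsigned int)device,
			   (unsigned int)func);

	return len < 0 ? len : len + 1;
}
```

Under that assumption, typical small values fit in 20 bytes while the all-ones worst case needs 27, which lines up with the two sizes discussed in the commit message.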
[net-next 12/15] i40e: initialize our affinity_mask based on cpu_possible_mask
From: Jacob Keller

On older kernels a call to irq_set_affinity_hint does not guarantee that
the IRQ affinity will be set. If nothing else on the system sets the IRQ
affinity this can result in a bug in the i40e_napi_poll() routine where
we notice that our interrupt fired on the "wrong" CPU according to our
internal affinity_mask variable.

This results in a bug where we continuously tell NAPI to stop polling to
move the interrupt to a new CPU, but the CPU never changes because our
affinity mask does not match the actual mask set up for the IRQ.

The root problem is a mismatched affinity mask value. So let's
initialize the value to cpu_possible_mask instead. This ensures that
prior to the first time we get an IRQ affinity notification we'll have
the mask set to include every possible CPU.

We use cpu_possible_mask instead of cpu_online_mask since the former is
almost certainly never going to change, while the latter might change
after we've made a copy.

Signed-off-by: Jacob Keller
Tested-by: Andrew Bowers
Signed-off-by: Jeff Kirsher
---
 drivers/net/ethernet/intel/i40e/i40e_main.c     | 12 +++-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 7 +--
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 7366e7c7f399..6498da8806cb 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2881,7 +2881,7 @@ static void i40e_config_xps_tx_ring(struct i40e_ring *ring)
 	if ((vsi->tc_config.numtc <= 1) &&
 	    !test_and_set_bit(__I40E_TX_XPS_INIT_DONE, &ring->state)) {
 		netif_set_xps_queue(ring->netdev,
-				    &ring->q_vector->affinity_mask,
+				    get_cpu_mask(ring->q_vector->v_idx),
 				    ring->queue_index);
 	}
 
@@ -3506,8 +3506,10 @@ static int i40e_vsi_request_irq_msix(struct i40e_vsi *vsi, char *basename)
 		q_vector->affinity_notify.notify = i40e_irq_affinity_notify;
 		q_vector->affinity_notify.release = i40e_irq_affinity_release;
 		irq_set_affinity_notifier(irq_num, &q_vector->affinity_notify);
-		/* assign the mask for this irq */
-		irq_set_affinity_hint(irq_num, &q_vector->affinity_mask);
+		/* get_cpu_mask returns a static constant mask with
+		 * a permanent lifetime so it's ok to use here.
+		 */
+		irq_set_affinity_hint(irq_num, get_cpu_mask(q_vector->v_idx));
 	}
 
 	vsi->irqs_ready = true;
@@ -4289,7 +4291,7 @@ static void i40e_vsi_free_irq(struct i40e_vsi *vsi)
 
 			/* clear the affinity notifier in the IRQ descriptor */
 			irq_set_affinity_notifier(irq_num, NULL);
-			/* clear the affinity_mask in the IRQ descriptor */
+			/* remove our suggested affinity mask for this IRQ */
 			irq_set_affinity_hint(irq_num, NULL);
 			synchronize_irq(irq_num);
 			free_irq(irq_num, vsi->q_vectors[i]);
@@ -8235,7 +8237,7 @@ static int i40e_vsi_alloc_q_vector(struct i40e_vsi *vsi, int v_idx, int cpu)
 
 	q_vector->vsi = vsi;
 	q_vector->v_idx = v_idx;
-	cpumask_set_cpu(cpu, &q_vector->affinity_mask);
+	cpumask_copy(&q_vector->affinity_mask, cpu_possible_mask);
 
 	if (vsi->netdev)
 		netif_napi_add(vsi->netdev, &q_vector->napi,
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 9ee277e87f10..1825d956bb00 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -584,8 +584,10 @@ i40evf_request_traffic_irqs(struct i40evf_adapter *adapter, char *basename)
 		q_vector->affinity_notify.release =
 						   i40evf_irq_affinity_release;
 		irq_set_affinity_notifier(irq_num, &q_vector->affinity_notify);
-		/* assign the mask for this irq */
-		irq_set_affinity_hint(irq_num, &q_vector->affinity_mask);
+		/* get_cpu_mask returns a static constant mask with
+		 * a permanent lifetime so it's ok to use here.
+		 */
+		irq_set_affinity_hint(irq_num, get_cpu_mask(q_vector->v_idx));
 	}
 
 	return 0;
@@ -1456,6 +1458,7 @@ static int i40evf_alloc_q_vectors(struct i40evf_adapter *adapter)
 		q_vector->adapter = adapter;
 		q_vector->vsi = &adapter->vsi;
 		q_vector->v_idx = q_idx;
+		cpumask_copy(&q_vector->affinity_mask, cpu_possible_mask);
 		netif_napi_add(adapter->netdev, &q_vector->napi,
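Why the initial value of the mask matters can be shown with a small user-space model. This is only a sketch: 64-bit integers stand in for cpumask_t, irq_cpu is assumed to be below 64, and all three function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Old initialization: only the CPU passed at allocation time is set. */
static uint64_t mask_single_cpu(unsigned int cpu)
{
	return UINT64_C(1) << cpu;
}

/* New initialization: every possible CPU is set (models cpu_possible_mask). */
static uint64_t mask_all_possible(unsigned int nr_cpus)
{
	return nr_cpus >= 64 ? UINT64_MAX : (UINT64_C(1) << nr_cpus) - 1;
}

/* True when the poll routine would bail out because the interrupt fired
 * on a CPU that is not in the driver's copy of the mask.
 */
static bool would_force_exit(uint64_t driver_mask, unsigned int irq_cpu)
{
	return !((driver_mask >> irq_cpu) & 1);
}
```

If the affinity hint was never honoured and no notification arrives, the single-CPU mask keeps forcing an exit on every poll, while the all-CPUs mask never does.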
[net-next 14/15] i40e/i40evf: remove ULTRA latency mode
From: Jacob Keller

In commit c56625d59726 ("i40e/i40evf: change dynamic interrupt
thresholds") a new higher latency ITR setting called I40E_ULTRA_LATENCY
was added with a cryptic comment about how it was meant for adjusting Rx
more aggressively when streaming small packets.

This mode was attempting to calculate packets per second and then kick
in when we have a huge number of small packets.

Unfortunately, the ULTRA setting was kicking in for workloads it wasn't
intended for, including single-thread UDP_STREAM workloads.

This wasn't caught for a variety of reasons. First, the ip_defrag
routines were improved somewhat, which makes the UDP_STREAM test still
reasonable at 10GbE, even when dropped down to 8k interrupts a second.
Additionally, some other obvious workloads appear to work fine, such as
TCP_STREAM.

The number 40k doesn't make sense for a number of reasons. First, we
absolutely can do more than 40k packets per second. Second, we calculate
the value inline in an integer, which sometimes can overflow, resulting
in using incorrect values.

If we fix this overflow it makes it even more likely that we'll enter
ULTRA mode, which is the opposite of what we want.

The ULTRA mode was added originally as a way to reduce CPU utilization
during a small packet workload where we weren't keeping up anyway. It
should never have been kicking in during these other workloads.

Given the issues outlined above, let's remove the ULTRA latency mode. If
necessary, a better solution to the CPU utilization issue for small
packet workloads will be added in a future patch.

Signed-off-by: Jacob Keller
Tested-by: Andrew Bowers
Signed-off-by: Jeff Kirsher
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 17 -
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   | 1 -
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 17 -
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h | 1 -
 4 files changed, 36 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 3999afea518b..f00f233092e9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -959,7 +959,6 @@ void i40e_force_wb(struct i40e_vsi *vsi, struct i40e_q_vector *q_vector)
 static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc)
 {
 	enum i40e_latency_range new_latency_range = rc->latency_range;
-	struct i40e_q_vector *qv = rc->ring->q_vector;
 	u32 new_itr = rc->itr;
 	int bytes_per_int;
 	int usecs;
@@ -971,7 +970,6 @@ static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc)
 	 *   0-10MB/s   lowest (50000 ints/s)
 	 *  10-20MB/s   low    (20000 ints/s)
 	 *  20-1249MB/s bulk   (18000 ints/s)
-	 *  > 40000 Rx packets per second (8000 ints/s)
 	 *
 	 * The math works out because the divisor is in 10^(-6) which
 	 * turns the bytes/us input value into MB/s values, but
@@ -994,24 +992,12 @@ static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc)
 			new_latency_range = I40E_LOWEST_LATENCY;
 		break;
 	case I40E_BULK_LATENCY:
-	case I40E_ULTRA_LATENCY:
 	default:
 		if (bytes_per_int <= 20)
 			new_latency_range = I40E_LOW_LATENCY;
 		break;
 	}
 
-	/* this is to adjust RX more aggressively when streaming small
-	 * packets. The value of 40000 was picked as it is just beyond
-	 * what the hardware can receive per second if in low latency
-	 * mode.
-	 */
-#define RX_ULTRA_PACKET_RATE 40000
-
-	if ((((rc->total_packets * 1000000) / usecs) > RX_ULTRA_PACKET_RATE) &&
-	    (&qv->rx == rc))
-		new_latency_range = I40E_ULTRA_LATENCY;
-
 	rc->latency_range = new_latency_range;
 
 	switch (new_latency_range) {
@@ -1024,9 +1010,6 @@ static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc)
 	case I40E_BULK_LATENCY:
 		new_itr = I40E_ITR_18K;
 		break;
-	case I40E_ULTRA_LATENCY:
-		new_itr = I40E_ITR_8K;
-		break;
 	default:
 		break;
 	}
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index f0a0eabc2666..e6456e8a899c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -454,7 +454,6 @@ enum i40e_latency_range {
 	I40E_LOWEST_LATENCY = 0,
 	I40E_LOW_LATENCY = 1,
 	I40E_BULK_LATENCY = 2,
-	I40E_ULTRA_LATENCY = 3,
 };
 
 struct i40e_ring_container {
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index f15e341ada9e..2f7d9f4a6746 100644
---
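The overflow mentioned in the commit message can be reproduced in user space. This is a sketch under assumptions: it uses fixed-width unsigned types so the wraparound is well defined, it assumes a microseconds-to-seconds scale factor of 1000000, and both function names are hypothetical.

```c
#include <stdint.h>

/* Packets per second computed entirely in 32 bits: the multiplication
 * wraps once total_packets * 1000000 exceeds UINT32_MAX.
 */
static uint32_t pps_32bit(uint32_t total_packets, uint32_t usecs)
{
	return (total_packets * UINT32_C(1000000)) / usecs;
}

/* Same calculation widened to 64 bits before multiplying. */
static uint64_t pps_64bit(uint32_t total_packets, uint32_t usecs)
{
	return ((uint64_t)total_packets * UINT32_C(1000000)) / usecs;
}
```

For 5000 packets in 100000 usecs the wrapped result is 7050 while the true rate is 50000, so fixing the overflow would push more samples over a 40k threshold, which matches the concern raised in the commit message.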
[net-next 07/15] i40e/i40evf: support for VF VLAN tag stripping control
From: Mariusz Stachura

This patch gives VF capability to control VLAN tag stripping via
ethtool. As rx-vlan-offload was fixed before, now the VF is able to
change it using "ethtool --offload rxvlan on/off" settings.

Signed-off-by: Mariusz Stachura
Signed-off-by: Jeff Kirsher
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 60 ++
 drivers/net/ethernet/intel/i40evf/i40evf.h         | 4 ++
 drivers/net/ethernet/intel/i40evf/i40evf_main.c    | 33 
 .../net/ethernet/intel/i40evf/i40evf_virtchnl.c    | 40 +++
 include/linux/avf/virtchnl.h                       | 5 ++
 5 files changed, 142 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 27d87bef4ba3..4d1e670f490e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -2529,6 +2529,60 @@ static int i40e_vc_set_rss_hena(struct i40e_vf *vf, u8 *msg, u16 msglen)
 	return i40e_vc_send_resp_to_vf(vf, VIRTCHNL_OP_SET_RSS_HENA, aq_ret);
 }
 
+/**
+ * i40e_vc_enable_vlan_stripping
+ * @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer
+ * @msglen: msg length
+ *
+ * Enable vlan header stripping for the VF
+ **/
+static int i40e_vc_enable_vlan_stripping(struct i40e_vf *vf, u8 *msg,
+					 u16 msglen)
+{
+	struct i40e_vsi *vsi = vf->pf->vsi[vf->lan_vsi_idx];
+	i40e_status aq_ret = 0;
+
+	if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states)) {
+		aq_ret = I40E_ERR_PARAM;
+		goto err;
+	}
+
+	i40e_vlan_stripping_enable(vsi);
+
+	/* send the response to the VF */
+err:
+	return i40e_vc_send_resp_to_vf(vf, VIRTCHNL_OP_ENABLE_VLAN_STRIPPING,
+				       aq_ret);
+}
+
+/**
+ * i40e_vc_disable_vlan_stripping
+ * @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer
+ * @msglen: msg length
+ *
+ * Disable vlan header stripping for the VF
+ **/
+static int i40e_vc_disable_vlan_stripping(struct i40e_vf *vf, u8 *msg,
+					  u16 msglen)
+{
+	struct i40e_vsi *vsi = vf->pf->vsi[vf->lan_vsi_idx];
+	i40e_status aq_ret = 0;
+
+	if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states)) {
+		aq_ret = I40E_ERR_PARAM;
+		goto err;
+	}
+
+	i40e_vlan_stripping_disable(vsi);
+
+	/* send the response to the VF */
+err:
+	return i40e_vc_send_resp_to_vf(vf, VIRTCHNL_OP_DISABLE_VLAN_STRIPPING,
+				       aq_ret);
+}
+
 /**
  * i40e_vc_process_vf_msg
  * @pf: pointer to the PF structure
@@ -2648,6 +2702,12 @@ int i40e_vc_process_vf_msg(struct i40e_pf *pf, s16 vf_id, u32 v_opcode,
 	case VIRTCHNL_OP_SET_RSS_HENA:
 		ret = i40e_vc_set_rss_hena(vf, msg, msglen);
 		break;
+	case VIRTCHNL_OP_ENABLE_VLAN_STRIPPING:
+		ret = i40e_vc_enable_vlan_stripping(vf, msg, msglen);
+		break;
+	case VIRTCHNL_OP_DISABLE_VLAN_STRIPPING:
+		ret = i40e_vc_disable_vlan_stripping(vf, msg, msglen);
+		break;
 
 	case VIRTCHNL_OP_UNKNOWN:
 	default:
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf.h b/drivers/net/ethernet/intel/i40evf/i40evf.h
index e5293d35fb6a..82f69031e5cd 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf.h
+++ b/drivers/net/ethernet/intel/i40evf/i40evf.h
@@ -261,6 +261,8 @@ struct i40evf_adapter {
 #define I40EVF_FLAG_AQ_RELEASE_PROMISC		BIT(16)
 #define I40EVF_FLAG_AQ_REQUEST_ALLMULTI		BIT(17)
 #define I40EVF_FLAG_AQ_RELEASE_ALLMULTI		BIT(18)
+#define I40EVF_FLAG_AQ_ENABLE_VLAN_STRIPPING	BIT(19)
+#define I40EVF_FLAG_AQ_DISABLE_VLAN_STRIPPING	BIT(20)
 
 	/* OS defined structs */
 	struct net_device *netdev;
@@ -358,6 +360,8 @@ void i40evf_get_hena(struct i40evf_adapter *adapter);
 void i40evf_set_hena(struct i40evf_adapter *adapter);
 void i40evf_set_rss_key(struct i40evf_adapter *adapter);
 void i40evf_set_rss_lut(struct i40evf_adapter *adapter);
+void i40evf_enable_vlan_stripping(struct i40evf_adapter *adapter);
+void i40evf_disable_vlan_stripping(struct i40evf_adapter *adapter);
 void i40evf_virtchnl_completion(struct i40evf_adapter *adapter,
 				enum virtchnl_ops v_opcode,
 				i40e_status v_retval, u8 *msg, u16 msglen);
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 258e8e27068b..9ee277e87f10 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -1676,6 +1676,16 @@ static void i40evf_watchdog_task(struct work_struct *work)
 		goto watchdog_done;
 	}
 
+
[net-next 08/15] i40e: 25G FEC status improvements
From: Mariusz Stachura

This patch improves the system log message. The log message will be
expanded to include the FEC mode the FW requested before link was
established.

Signed-off-by: Mariusz Stachura
Tested-by: Andrew Bowers
Signed-off-by: Jeff Kirsher
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 3a6a752c6c58..5a06cd23b9e6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5354,6 +5354,7 @@ void i40e_print_link_message(struct i40e_vsi *vsi, bool isup)
 	char *speed = "Unknown";
 	char *fc = "Unknown";
 	char *fec = "";
+	char *req_fec = "";
 	char *an = "";
 
 	new_speed = vsi->back->hw.phy.link_info.link_speed;
@@ -5415,6 +5416,7 @@ void i40e_print_link_message(struct i40e_vsi *vsi, bool isup)
 	}
 
 	if (vsi->back->hw.phy.link_info.link_speed == I40E_LINK_SPEED_25GB) {
+		req_fec = ", Requested FEC: None";
 		fec = ", FEC: None";
 		an = ", Autoneg: False";
 
@@ -5427,10 +5429,22 @@ void i40e_print_link_message(struct i40e_vsi *vsi, bool isup)
 		else if (vsi->back->hw.phy.link_info.fec_info &
 			 I40E_AQ_CONFIG_FEC_RS_ENA)
 			fec = ", FEC: CL108 RS-FEC";
+
+		/* 'CL108 RS-FEC' should be displayed when RS is requested, or
+		 * both RS and FC are requested
+		 */
+		if (vsi->back->hw.phy.link_info.req_fec_info &
+		    (I40E_AQ_REQUEST_FEC_KR | I40E_AQ_REQUEST_FEC_RS)) {
+			if (vsi->back->hw.phy.link_info.req_fec_info &
+			    I40E_AQ_REQUEST_FEC_RS)
+				req_fec = ", Requested FEC: CL108 RS-FEC";
+			else
+				req_fec = ", Requested FEC: CL74 FC-FEC/BASE-R";
+		}
 	}
 
-	netdev_info(vsi->netdev, "NIC Link is Up, %sbps Full Duplex%s%s, Flow Control: %s\n",
-		    speed, fec, an, fc);
+	netdev_info(vsi->netdev, "NIC Link is Up, %sbps Full Duplex%s%s%s, Flow Control: %s\n",
+		    speed, req_fec, fec, an, fc);
 }
 
 /**
-- 
2.14.1
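The precedence rule in the new comment (RS wins whenever it is requested, alone or together with FC) can be expressed as a small standalone function. This is a sketch, not the driver code: REQ_FEC_KR and REQ_FEC_RS carry made-up bit values, since the real I40E_AQ_REQUEST_FEC_* definitions are not part of this patch, and requested_fec_text() is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical bit values, chosen only for this illustration. */
#define REQ_FEC_KR 0x04u
#define REQ_FEC_RS 0x08u

/* Pick the "Requested FEC" fragment of the link message. RS is checked
 * first, so a request for both RS and KR is reported as RS-FEC.
 */
static const char *requested_fec_text(uint8_t req_fec_info)
{
	if (req_fec_info & REQ_FEC_RS)
		return ", Requested FEC: CL108 RS-FEC";
	if (req_fec_info & REQ_FEC_KR)
		return ", Requested FEC: CL74 FC-FEC/BASE-R";
	return ", Requested FEC: None";
}
```

Checking RS before KR gives the same result as the nested if in the patch while keeping the default string in one place.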
[net-next 15/15] i40e/i40evf: avoid dynamic ITR updates when polling or low packet rate
From: Jacob KellerThe dynamic ITR algorithm depends on a calculation of usecs which assumes that the interrupts have been firing constantly at the interrupt throttle rate. This is not guaranteed because we could have a low packet rate, or have been polling in software. We'll estimate whether this is the case by using jiffies to determine if we've been too long. If the time difference of jiffies is larger we are guaranteed to have an incorrect calculation. If the time difference of jiffies is smaller we might have been polling some but the difference shouldn't affect the calculation too much. This ensures that we don't get stuck in BULK latency during certain rare situations where we receive bursts of packets that force us into NAPI polling. Signed-off-by: Jacob Keller Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher --- drivers/net/ethernet/intel/i40e/i40e_txrx.c | 22 +- drivers/net/ethernet/intel/i40e/i40e_txrx.h | 1 + drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 22 +- drivers/net/ethernet/intel/i40evf/i40e_txrx.h | 1 + 4 files changed, 36 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c index f00f233092e9..1519dfb851d0 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c @@ -961,11 +961,25 @@ static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc) enum i40e_latency_range new_latency_range = rc->latency_range; u32 new_itr = rc->itr; int bytes_per_int; - int usecs; + unsigned int usecs, estimated_usecs; if (rc->total_packets == 0 || !rc->itr) return false; + usecs = (rc->itr << 1) * ITR_COUNTDOWN_START; + bytes_per_int = rc->total_bytes / usecs; + + /* The calculations in this algorithm depend on interrupts actually +* firing at the ITR rate. This may not happen if the packet rate is +* really low, or if we've been napi polling. Check to make sure +* that's not the case before we continue. 
+*/ + estimated_usecs = jiffies_to_usecs(jiffies - rc->last_itr_update); + if (estimated_usecs > usecs) { + new_latency_range = I40E_LOW_LATENCY; + goto reset_latency; + } + /* simple throttlerate management * 0-10MB/s lowest (5 ints/s) * 10-20MB/s low(2 ints/s) @@ -977,9 +991,6 @@ static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc) * are in 2 usec increments in the ITR registers, and make sure * to use the smoothed values that the countdown timer gives us. */ - usecs = (rc->itr << 1) * ITR_COUNTDOWN_START; - bytes_per_int = rc->total_bytes / usecs; - switch (new_latency_range) { case I40E_LOWEST_LATENCY: if (bytes_per_int > 10) @@ -998,6 +1009,7 @@ static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc) break; } +reset_latency: rc->latency_range = new_latency_range; switch (new_latency_range) { @@ -1016,12 +1028,12 @@ static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc) rc->total_bytes = 0; rc->total_packets = 0; + rc->last_itr_update = jiffies; if (new_itr != rc->itr) { rc->itr = new_itr; return true; } - return false; } diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h index e6456e8a899c..2f848bc5e391 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h @@ -461,6 +461,7 @@ struct i40e_ring_container { struct i40e_ring *ring; unsigned int total_bytes; /* total bytes processed this int */ unsigned int total_packets; /* total packets processed this int */ + unsigned long last_itr_update; /* jiffies of last ITR update */ u16 count; enum i40e_latency_range latency_range; u16 itr; diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c index 2f7d9f4a6746..c32c62462c84 100644 --- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c @@ -359,11 +359,25 @@ static bool i40e_set_new_dynamic_itr(struct i40e_ring_container *rc) 
enum i40e_latency_range new_latency_range = rc->latency_range; u32 new_itr = rc->itr; int bytes_per_int; - int usecs; + unsigned int usecs, estimated_usecs; if (rc->total_packets == 0 || !rc->itr) return false; + usecs = (rc->itr << 1) * ITR_COUNTDOWN_START; + bytes_per_int = rc->total_bytes /
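The guard this patch adds can be modelled in plain C outside the kernel. The following is only a sketch of the idea, not the driver code: `struct ring_stats`, `expected_window_usecs()`, `itr_window_is_stale()`, the microsecond timestamps (the driver uses jiffies), and the `COUNTDOWN_START` value of 100 are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's ITR_COUNTDOWN_START. */
#define COUNTDOWN_START 100u

struct ring_stats {
	uint16_t itr;              /* ITR setting, in 2 usec register units */
	uint64_t last_update_usec; /* time of the previous ITR update */
};

/* Window length the byte-rate calculation assumes, in microseconds. */
static unsigned int expected_window_usecs(const struct ring_stats *rc)
{
	return ((unsigned int)rc->itr << 1) * COUNTDOWN_START;
}

/*
 * True when more wall-clock time has passed than the calculation assumes,
 * i.e. interrupts were not firing at the ITR rate (low packet rate or
 * polling). In that case the bytes-per-interrupt estimate is not trusted.
 */
static bool itr_window_is_stale(const struct ring_stats *rc, uint64_t now_usec)
{
	assert(now_usec >= rc->last_update_usec);
	return (now_usec - rc->last_update_usec) > expected_window_usecs(rc);
}
```

A caller in this model would fall back to the low-latency setting whenever `itr_window_is_stale()` returns true, which mirrors the `goto reset_latency` path in the diff above.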
[net-next 10/15] i40e: remove workaround for resetting XPS
From: Jacob Keller

Since commit 3ffa037d7f78 ("i40e: Set XPS bit mask to zero in DCB mode")
we've tried to reset the XPS settings by building a custom empty CPU mask.

This workaround is not necessary because we're not really removing the
XPS setting, but simply setting it so that no CPU is valid.

Second, we shorten the code further by using zalloc_cpumask_var instead
of a separate call to bitmap_zero().

Signed-off-by: Jacob Keller
Tested-by: Andrew Bowers
Signed-off-by: Jeff Kirsher
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 0962b85ef6f3..7366e7c7f399 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2874,22 +2874,15 @@ static void i40e_vsi_free_rx_resources(struct i40e_vsi *vsi)
 static void i40e_config_xps_tx_ring(struct i40e_ring *ring)
 {
 	struct i40e_vsi *vsi = ring->vsi;
-	cpumask_var_t mask;
 
 	if (!ring->q_vector || !ring->netdev)
 		return;
 
-	/* Single TC mode enable XPS */
-	if (vsi->tc_config.numtc <= 1) {
-		if (!test_and_set_bit(__I40E_TX_XPS_INIT_DONE, >state))
-			netif_set_xps_queue(ring->netdev,
-					    >q_vector->affinity_mask,
-					    ring->queue_index);
-	} else if (alloc_cpumask_var(, GFP_KERNEL)) {
-		/* Disable XPS to allow selection based on TC */
-		bitmap_zero(cpumask_bits(mask), nr_cpumask_bits);
-		netif_set_xps_queue(ring->netdev, mask, ring->queue_index);
-		free_cpumask_var(mask);
+	if ((vsi->tc_config.numtc <= 1) &&
+	    !test_and_set_bit(__I40E_TX_XPS_INIT_DONE, >state)) {
+		netif_set_xps_queue(ring->netdev,
+				    >q_vector->affinity_mask,
+				    ring->queue_index);
 	}
 
 	/* schedule our worker thread which will take care of
-- 
2.14.1
[net-next 01/15] i40e: Update state variable for adminq subtask
From: Sudheer Mogilappagari

During NVM update, state machine gets into unrecoverable state because
i40e_clean_adminq_subtask can get scheduled after the admin queue command
but before other state variables are updated. This causes incorrect input
to i40e_nvmupd_check_wait_event and state transitions don't happen.

This fix updates the state variables so that adminq_subtask will have
accurate information whenever it gets scheduled.

Signed-off-by: Sudheer Mogilappagari
Tested-by: Andrew Bowers
Signed-off-by: Jeff Kirsher
---
 drivers/net/ethernet/intel/i40e/i40e_nvm.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_nvm.c b/drivers/net/ethernet/intel/i40e/i40e_nvm.c
index 2cf7db2dc7cd..96afef98a08f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_nvm.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_nvm.c
@@ -755,7 +755,11 @@ i40e_status i40e_nvmupd_command(struct i40e_hw *hw,
 
 	/* Acquire lock to prevent race condition where adminq_task
 	 * can execute after i40e_nvmupd_nvm_read/write but before state
-	 * variables (nvm_wait_opcode, nvm_release_on_done) are updated
+	 * variables (nvm_wait_opcode, nvm_release_on_done) are updated.
+	 *
+	 * During NVMUpdate, it is observed that lock could be held for
+	 * ~5ms for most commands. However lock is held for ~60ms for
+	 * NVMUPD_CSUM_LCB command.
 	 */
 	mutex_lock(>aq.arq_mutex);
 	switch (hw->nvmupd_state) {
@@ -778,7 +782,8 @@ i40e_status i40e_nvmupd_command(struct i40e_hw *hw,
 		 */
 		if (cmd->offset == 0x) {
 			i40e_nvmupd_check_wait_event(hw, hw->nvm_wait_opcode);
-			return 0;
+			status = 0;
+			goto exit;
 		}
 
 		status = I40E_ERR_NOT_READY;
@@ -793,6 +798,7 @@ i40e_status i40e_nvmupd_command(struct i40e_hw *hw,
 		*perrno = -ESRCH;
 		break;
 	}
+exit:
 	mutex_unlock(>aq.arq_mutex);
 	return status;
 }
-- 
2.14.1
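Part of this fix replaces an early `return 0;` inside the locked region with `goto exit;`, so the mutex is always released. A minimal userspace sketch of that single-exit pattern follows. It uses a pthread mutex in place of the kernel mutex, and `handle_command()`, `state_lock`, `CMD_POLL`, and `CMD_OTHER` are hypothetical names that do not appear in the driver.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

enum { CMD_POLL = 1, CMD_OTHER = 2 };

/* Every path after the lock is taken leaves through the "exit" label. */
static int handle_command(int cmd)
{
	int status;

	pthread_mutex_lock(&state_lock);

	if (cmd == CMD_POLL) {
		/* A bare "return 0;" here would leave state_lock held. */
		status = 0;
		goto exit;
	}

	status = -1; /* placeholder for a "not ready" error code */

exit:
	pthread_mutex_unlock(&state_lock);
	return status;
}
```

With the early return, the next caller of `handle_command()` would block forever on `state_lock`; routing the early path through `exit` avoids that without duplicating the unlock call.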
Re: [PATCH] ARM: dts: rk3228-evb: Fix the compiling error
Hi Dave,

On Sun, 27 Aug 2017 16:59:43 -0700 (PDT) David Miller wrote:
>
> Sorry, I wasn't aware that this should go via my tree, I'll take care of
> this soon.

Thanks.

-- 
Cheers,
Stephen Rothwell
Re: [PATCH] ARM: dts: rk3228-evb: Fix the compiling error
From: Stephen Rothwell
Date: Mon, 28 Aug 2017 08:32:54 +1000

> Hi Dave (Miller),
>
> On Tue, 22 Aug 2017 21:52:51 +1000 Stephen Rothwell wrote:
>>
>> Thanks.
>>
>> On Tue, 22 Aug 2017 17:24:25 +0800 David Wu wrote:
>> >
>> > This patch solves the following error:
>> > arch/arm/boot/dts/rk3228-evb.dtb: ERROR (phandle_references): Reference to
>> > non-existent node or label "phy0"
>> >
>> > Fixess db40f15b53e4 ("ARM: dts: rk3228-evb: Enable the integrated PHY for
>> > gmac")
>> > Signed-off-by: David Wu
>>
>> Reported-by: Stephen Rothwell
>
> Ping?

Sorry, I wasn't aware that this should go via my tree, I'll take care of this soon.
Re: [PATCH] connector: Delete an error message for a failed memory allocation in cn_queue_alloc_callback_entry()
On 8/27/17 3:26 PM, SF Markus Elfring wrote:
> From: Markus Elfring
> Date: Sun, 27 Aug 2017 21:18:37 +0200
>
> Omit an extra message for a memory allocation failure in this function.
>
> This issue was detected by using the Coccinelle software.

Did coccinelle trip on the message or the fact you weren't returning NULL?

>
> Signed-off-by: Markus Elfring
> ---
>  drivers/connector/cn_queue.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/connector/cn_queue.c b/drivers/connector/cn_queue.c
> index 1f8bf054d11c..e4f31d679f02 100644
> --- a/drivers/connector/cn_queue.c
> +++ b/drivers/connector/cn_queue.c
> @@ -40,10 +40,8 @@ cn_queue_alloc_callback_entry(struct cn_queue_dev *dev,
> const char *name,
> 	struct cn_callback_entry *cbq;
>
> 	cbq = kzalloc(sizeof(*cbq), GFP_KERNEL);
> -	if (!cbq) {
> -		pr_err("Failed to create new callback queue.\n");
> +	if (!cbq)
> 		return NULL;
> -	}

Why not:

	if (!cbq) {
		pr_err("Failed to create new callback queue.\n");
+		return NULL;
	}

>
> 	atomic_set(>refcnt, 1);
>
>
Re: [PATCH 0/4] irda: move it to drivers/staging so we can delete it
On Sun, 2017-08-27 at 18:53 +0200, Greg Kroah-Hartman wrote: > On Sun, Aug 27, 2017 at 09:19:19AM -0700, Joe Perches wrote: > > On Sun, 2017-08-27 at 18:13 +0200, Greg Kroah-Hartman wrote: > > > On Sun, Aug 27, 2017 at 08:35:43AM -0700, Joe Perches wrote: > > > > On Sun, 2017-08-27 at 17:03 +0200, Greg Kroah-Hartman wrote: > > > > > The IRDA code has long been obsolete and broken. So, to keep people > > > > > from trying to use it, and to prevent people from having to maintain > > > > > it, > > > > > let's move it to drivers/staging/ so that we can delete it entirely > > > > > from > > > > > the kernel in a few releases. > > > > > > > > > > > > MAINTAINERS should be updated as well. > > > > > > > > It'd probably be nice to try to get an email to > > > > the irda mailing list too if it still works. > > > > > > As get_maintainer.pl didn't show it, odds are it doesn't... > > > > get_maintainer doesn't show it because it's subscriber-only. > > If you want get_maintainer to show it, add -s > > > > $ ./scripts/get_maintainer.pl -s -f net/irda/ > > Samuel Ortiz(maintainer:IRDA SUBSYSTEM) > > "David S. Miller" (maintainer:NETWORKING [GENERAL]) > > irda-us...@lists.sourceforge.net (subscriber list:IRDA SUBSYSTEM) > > netdev@vger.kernel.org (open list:IRDA SUBSYSTEM) > > linux-ker...@vger.kernel.org (open list) > > Sorry, am not going to subscribe to a random list just to send patches > that delete the subsystem :) Then you do a disservice to those that actually might be using that subsystem.
Re: [PATCH] igb: check memory allocation failure
On 8/27/17 2:42 AM, Christophe JAILLET wrote:
> Check memory allocation failures and return -ENOMEM in such cases, as
> already done for other memory allocations in this function.
>
> This avoids NULL pointers dereference.
>
> Signed-off-by: Christophe JAILLET
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> b/drivers/net/ethernet/intel/igb/igb_main.c
> index fd4a46b03cc8..837d9b46a390 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -3162,6 +3162,8 @@ static int igb_sw_init(struct igb_adapter *adapter)
> 	/* Setup and initialize a copy of the hw vlan table array */
> 	adapter->shadow_vfta = kcalloc(E1000_VLAN_FILTER_TBL_SIZE, sizeof(u32),
> 				       GFP_ATOMIC);
> +	if (!adapter->shadow_vfta)
> +		return -ENOMEM;

Looks reasonable to me.

A larger issue though I see in this function is that if we return
-ENOMEM here, and if we return -ENOMEM from igb_init_interrupt_scheme()
below on failure, we leak adapter->mac_table (and adapter->shadow_vfta
in the latter).

We should add a proper unwind to free up the memory on failure.

-PJ
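The unwind suggested in this reply could look roughly like the following userspace sketch. It is not the igb code: `struct adapter`, `adapter_sw_init()`, the table sizes, and the `fail_second` switch (used only to force the error path) are hypothetical, and `calloc`/`free` stand in for the kernel allocators.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct adapter {
	unsigned int *mac_table;
	unsigned int *shadow_vfta;
};

/*
 * Allocate two tables. If the second allocation fails, free the first
 * before returning so nothing is leaked. fail_second forces that path.
 */
static int adapter_sw_init(struct adapter *a, int fail_second)
{
	a->mac_table = calloc(16, sizeof(*a->mac_table));
	if (!a->mac_table)
		return -ENOMEM;

	a->shadow_vfta = fail_second ? NULL
				     : calloc(128, sizeof(*a->shadow_vfta));
	if (!a->shadow_vfta)
		goto err_free_mac;

	return 0;

err_free_mac:
	free(a->mac_table);
	a->mac_table = NULL;
	return -ENOMEM;
}
```

Each later failure point would get its own label above `err_free_mac`, so resources are released in the reverse order of acquisition.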
Re: [PATCH] ARM: dts: rk3228-evb: Fix the compiling error
Hi Dave (Miller),

On Tue, 22 Aug 2017 21:52:51 +1000 Stephen Rothwell wrote:
>
> Thanks.
>
> On Tue, 22 Aug 2017 17:24:25 +0800 David Wu wrote:
> >
> > This patch solves the following error:
> > arch/arm/boot/dts/rk3228-evb.dtb: ERROR (phandle_references): Reference to
> > non-existent node or label "phy0"
> >
> > Fixess db40f15b53e4 ("ARM: dts: rk3228-evb: Enable the integrated PHY for
> > gmac")
> > Signed-off-by: David Wu
>
> Reported-by: Stephen Rothwell

Ping?

-- 
Cheers,
Stephen Rothwell
RE: [PATCH] DSA support for Micrel KSZ8895
Pavel, Thanks for update and sorry about email format (due to web-access version) I'll do review when getting back to office later this week. - Woojung From: Pavel Machek [pa...@denx.de] Sent: Sunday, August 27, 2017 8:36 AM To: Woojung Huh - C21699; nathan.leigh.con...@gmail.com Cc: vivien.dide...@savoirfairelinux.com; f.faine...@gmail.com; netdev@vger.kernel.org; linux-ker...@vger.kernel.org; tristram...@micrel.com; and...@lunn.ch; pa...@denx.de Subject: [PATCH] DSA support for Micrel KSZ8895 Hi! So I fought with the driver a bit more, and now I have something that kind-of-works. "great great hack" belows worries me. Yeah, disabled code needs to be removed before merge. No, tag_ksz part probably is not acceptable. Do you see solution better than just copying it into tag_ksz1 file? Any more comments, etc? Help would be welcome.
[PATCH net-next] bridge: fdb add and delete tracepoints
From: Roopa PrabhuTracepoints to trace bridge forwarding database updates. Signed-off-by: Roopa Prabhu --- include/trace/events/bridge.h | 98 +++ net/bridge/br_fdb.c | 7 net/core/net-traces.c | 6 +++ 3 files changed, 111 insertions(+) create mode 100644 include/trace/events/bridge.h diff --git a/include/trace/events/bridge.h b/include/trace/events/bridge.h new file mode 100644 index 000..e2d52cf --- /dev/null +++ b/include/trace/events/bridge.h @@ -0,0 +1,98 @@ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM bridge + +#if !defined(_TRACE_BRIDGE_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_BRIDGE_H + +#include +#include + +#include "../../../net/bridge/br_private.h" + +TRACE_EVENT(br_fdb_add, + + TP_PROTO(struct ndmsg *ndm, struct net_device *dev, +const unsigned char *addr, u16 vid, u16 nlh_flags), + + TP_ARGS(ndm, dev, addr, vid, nlh_flags), + + TP_STRUCT__entry( + __field(u8, ndm_flags) + __string(dev, dev->name) + __array(unsigned char, addr, 6) + __field(u16, vid) + __field(u16, nlh_flags) + ), + + TP_fast_assign( + __assign_str(dev, dev->name); + memcpy(__entry->addr, addr, 6); + __entry->vid = vid; + __entry->nlh_flags = nlh_flags; + __entry->ndm_flags = ndm->ndm_flags; + ), + + TP_printk("dev %s addr %02x:%02x:%02x:%02x:%02x:%02x vid %u nlh_flags %x ndm_flags = %x", + __get_str(dev), __entry->addr[0], __entry->addr[1], + __entry->addr[2], __entry->addr[3], __entry->addr[4], + __entry->addr[5], __entry->vid, + __entry->nlh_flags, __entry->ndm_flags) +); + +TRACE_EVENT(br_fdb_external_learn_add, + + TP_PROTO(struct net_bridge *br, struct net_bridge_port *p, +const unsigned char *addr, u16 vid), + + TP_ARGS(br, p, addr, vid), + + TP_STRUCT__entry( + __string(br_dev, br->dev->name) + __string(dev, p->dev->name) + __array(unsigned char, addr, 6) + __field(u16, vid) + ), + + TP_fast_assign( + __assign_str(br_dev, br ? br->dev->name : "null"); + __assign_str(dev, p ? 
p->dev->name : "null"); + memcpy(__entry->addr, addr, 6); + __entry->vid = vid; + ), + + TP_printk("br_dev %s port %s addr %02x:%02x:%02x:%02x:%02x:%02x vid %u", + __get_str(br_dev), __get_str(dev), __entry->addr[0], + __entry->addr[1], __entry->addr[2], __entry->addr[3], + __entry->addr[4], __entry->addr[5], __entry->vid) +); + +TRACE_EVENT(fdb_delete, + + TP_PROTO(struct net_bridge *br, struct net_bridge_fdb_entry *f), + + TP_ARGS(br, f), + + TP_STRUCT__entry( + __string(br_dev, br->dev->name) + __string(dev, f->dst ? f->dst->dev->name : "null") + __array(unsigned char, addr, 6) + __field(u16, vid) + ), + + TP_fast_assign( + __assign_str(br_dev, br ? br->dev->name : "null"); + __assign_str(dev, f->dst ? f->dst->dev->name : "null"); + memcpy(__entry->addr, f->addr.addr, 6); + __entry->vid = f->vlan_id; + ), + + TP_printk("br_dev %s dev %s addr %02x:%02x:%02x:%02x:%02x:%02x vid %u", + __get_str(br_dev), __get_str(dev), __entry->addr[0], + __entry->addr[1], __entry->addr[2], __entry->addr[3], + __entry->addr[4], __entry->addr[5], __entry->vid) +); + +#endif /* _TRACE_BRIDGE_H */ + +/* This part must be outside protection */ +#include diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c index a79b648..be5e1da 100644 --- a/net/bridge/br_fdb.c +++ b/net/bridge/br_fdb.c @@ -25,6 +25,7 @@ #include #include #include +#include #include "br_private.h" static struct kmem_cache *br_fdb_cache __read_mostly; @@ -171,6 +172,8 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr) static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f) { + trace_fdb_delete(br, f); + if (f->is_static) fdb_del_hw_addr(br, f->addr.addr); @@ -870,6 +873,8 @@ int br_fdb_add(struct ndmsg *ndm, struct nlattr *tb[], struct net_bridge *br = NULL; int err = 0; + trace_br_fdb_add(ndm, dev, addr, vid, nlh_flags); + if (!(ndm->ndm_state & (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE))) { pr_info("bridge: RTM_NEWNEIGH with invalid state %#x\n", ndm->ndm_state); 
return -EINVAL; @@ -1066,6 +1071,8 @@ int br_fdb_external_learn_add(struct net_bridge *br, struct net_bridge_port *p, bool modified = false; int err
Re: [PATCH v2 0/2] enable hires timer to timeout datagram socket
On Tue, Aug 22, 2017 at 09:30:30PM -0700, David Miller wrote:
> From: Vallish Vaidyeshwara
> Date: Wed, 23 Aug 2017 00:10:25 +
>
> > I am submitting 2 patch series to enable hires timer to timeout
> > datagram sockets (AF_UNIX & AF_INET domain) and test code to test
> > timeout accuracy on these sockets.
>
> This is not reasonable.
>
> If you want high resolution events with real guarantees, please use
> the kernel interfaces which provide this as explained to you as
> feedback by other reviewers.
>
> I'm not applying this, sorry.

Hello David,

I respect the decision not to upstream this patch series, however I wanted
to provide additional details. Application wanting high resolution events
with real guarantees is not the case, but the case here is regression in
system call behavior:

1) Change in system call behavior:

strace from 4.4 test run of waiting for 180 seconds on datagram socket:

10:25:48.239685 setsockopt(3, SOL_SOCKET, SO_RCVTIMEO, "\264\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
10:25:48.239755 recvmsg(3, 0x7ffd0a3beec0, 0) = -1 EAGAIN (Resource temporarily unavailable)
10:28:48.236989 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0

strace from 4.9 test run of waiting for 180 seconds on datagram socket
times out close to 195 seconds:

setsockopt(3, SOL_SOCKET, SO_RCVTIMEO, "\264\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0 <0.28>
recvmsg(3, 0x7ffd6a2c4380, 0) = -1 EAGAIN (Resource temporarily unavailable) <194.852000>
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0 <0.18>

This is the change in behavior of system call that is causing our
application to regress on 4.9 kernel. There are events which need to be
run on timeouts and now response time for such timeouts on 4.9 kernel are
being triggered with extended delay of close to 195 seconds as in one of
the test runs shown above.

2) Comparison with MacOS:

I ran the same test on OS X El Capitan version 10.11.6 and the behavior is
consistent with Linux 4.4 Kernel behavior. I have not tested the program
on other flavors of OS like HPUX or AIX or Solaris, but I guess if these
OS implement SO_RCVTIMEO and tested, this behavior will not be different
than Linux 4.4 kernel.

3) Standards Specification:

Opengroups standard does not talk about how quick SO_RCVTIMEO need to
respond for timeouts. However, the standards for select system call do
mention that timeout need to respond quickly. It would be good to restore
SO_RCVTIMEO behavior to 4.4 kernel and have SO_RCVTIMEO be consistent with
select timeout.

4) Changing application code:

Any change to application code to accommodate this change of behavior in
system call breaks application migration between 4.4 kernel and 4.9
kernel. Moreover, making application code change is not feasible in all
cases as in the case where the source code is not available (third party
vendor).

Thanks.
-Vallish
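The measurement described in this message can be approximated with a small program. This is a sketch rather than the test code from the patch series: `recv_timeout_elapsed_ms()` is a hypothetical helper, it uses an unbound AF_UNIX datagram socket that never receives anything, and it assumes a POSIX system with `clock_gettime()`. In the strace output, the 16-byte option value starting with `\264` (decimal 180) is a `struct timeval` of 180 seconds.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/*
 * Set SO_RCVTIMEO on an idle datagram socket, block in recv(), and
 * return how long the call took in milliseconds, or -1 on error.
 */
static long recv_timeout_elapsed_ms(long timeout_ms)
{
	struct timeval tv = {
		.tv_sec = timeout_ms / 1000,
		.tv_usec = (timeout_ms % 1000) * 1000,
	};
	struct timespec start, end;
	char buf[16];
	ssize_t n;
	int timed_out;
	int fd = socket(AF_UNIX, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;
	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0) {
		close(fd);
		return -1;
	}

	clock_gettime(CLOCK_MONOTONIC, &start);
	n = recv(fd, buf, sizeof(buf), 0);
	/* Check errno before close() has a chance to change it. */
	timed_out = (n == -1) && (errno == EAGAIN || errno == EWOULDBLOCK);
	clock_gettime(CLOCK_MONOTONIC, &end);
	close(fd);

	if (!timed_out)
		return -1;
	return (end.tv_sec - start.tv_sec) * 1000 +
	       (end.tv_nsec - start.tv_nsec) / 1000000;
}
```

Calling the helper with 180000 and comparing the result against 180000 on two kernels would give the kind of comparison reported above; how much overshoot to expect is not something this sketch can predict.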
[PATCH] connector: Delete an error message for a failed memory allocation in cn_queue_alloc_callback_entry()
From: Markus Elfring
Date: Sun, 27 Aug 2017 21:18:37 +0200

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring
---
 drivers/connector/cn_queue.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/connector/cn_queue.c b/drivers/connector/cn_queue.c
index 1f8bf054d11c..e4f31d679f02 100644
--- a/drivers/connector/cn_queue.c
+++ b/drivers/connector/cn_queue.c
@@ -40,10 +40,8 @@ cn_queue_alloc_callback_entry(struct cn_queue_dev *dev, const char *name,
 	struct cn_callback_entry *cbq;
 
 	cbq = kzalloc(sizeof(*cbq), GFP_KERNEL);
-	if (!cbq) {
-		pr_err("Failed to create new callback queue.\n");
+	if (!cbq)
 		return NULL;
-	}
 
 	atomic_set(>refcnt, 1);
 
-- 
2.14.1
Re: [PATCH V2 net-next] net-next/hinic: Fix MTU limitation
On Mon, Aug 28, 2017 at 01:20:26AM +0800, Aviad Krawczyk wrote:
> Fix the hw MTU limitation by setting max_mtu
>
> Signed-off-by: Aviad Krawczyk
> Signed-off-by: Zhao Chen

Reviewed-by: Andrew Lunn

    Andrew
[PATCH net-next] net-next/hinic: fix comparison of a uint16_t type with -1
Remove the search for index of constant buffer size Signed-off-by: Aviad KrawczykSigned-off-by: Zhao Chen --- drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c | 37 +--- drivers/net/ethernet/huawei/hinic/hinic_hw_qp.h | 21 ++ 2 files changed, 22 insertions(+), 36 deletions(-) diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c index 09dec6d..79b5674 100644 --- a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c +++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.c @@ -69,31 +69,6 @@ struct hinic_dev_cap { u8 rsvd3[208]; }; -struct rx_buf_sz { - int idx; - size_t sz; -}; - -static struct rx_buf_sz rx_buf_sz_table[] = { - {0, 32}, - {1, 64}, - {2, 96}, - {3, 128}, - {4, 192}, - {5, 256}, - {6, 384}, - {7, 512}, - {8, 768}, - {9, 1024}, - {10, 1536}, - {11, 2048}, - {12, 3072}, - {13, 4096}, - {14, 8192}, - {15, 16384}, - {-1, -1}, -}; - /** * get_capability - convert device capabilities to NIC capabilities * @hwdev: the HW device to set and convert device capabilities for @@ -330,7 +305,6 @@ static int set_hw_ioctxt(struct hinic_hwdev *hwdev, unsigned int rq_depth, struct hinic_cmd_hw_ioctxt hw_ioctxt; struct pci_dev *pdev = hwif->pdev; struct hinic_pfhwdev *pfhwdev; - int i; if (!HINIC_IS_PF(hwif) && !HINIC_IS_PPF(hwif)) { dev_err(>dev, "Unsupported PCI Function type\n"); @@ -344,16 +318,7 @@ static int set_hw_ioctxt(struct hinic_hwdev *hwdev, unsigned int rq_depth, hw_ioctxt.rq_depth = ilog2(rq_depth); - for (i = 0; ; i++) { - if ((rx_buf_sz_table[i].sz == HINIC_RX_BUF_SZ) || - (rx_buf_sz_table[i].sz == -1)) { - hw_ioctxt.rx_buf_sz_idx = rx_buf_sz_table[i].idx; - break; - } - } - - if (hw_ioctxt.rx_buf_sz_idx == -1) - return -EINVAL; + hw_ioctxt.rx_buf_sz_idx = HINIC_RX_BUF_SZ_IDX; hw_ioctxt.sq_depth = ilog2(sq_depth); diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_qp.h b/drivers/net/ethernet/huawei/hinic/hinic_hw_qp.h index e642a8a..df729a1 100644 --- 
a/drivers/net/ethernet/huawei/hinic/hinic_hw_qp.h +++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_qp.h @@ -53,7 +53,9 @@ #define HINIC_SQ_DEPTH SZ_4K #define HINIC_RQ_DEPTH SZ_4K +/* In any change to HINIC_RX_BUF_SZ, HINIC_RX_BUF_SZ_IDX must be changed */ #define HINIC_RX_BUF_SZ 2048 +#define HINIC_RX_BUF_SZ_IDXHINIC_RX_BUF_SZ_2048_IDX #define HINIC_MIN_TX_WQE_SIZE(wq) \ ALIGN(HINIC_SQ_WQE_SIZE(1), (wq)->wqebb_size) @@ -61,6 +63,25 @@ #define HINIC_MIN_TX_NUM_WQEBBS(sq) \ (HINIC_MIN_TX_WQE_SIZE((sq)->wq) / (sq)->wq->wqebb_size) +enum hinic_rx_buf_sz_idx { + HINIC_RX_BUF_SZ_32_IDX, + HINIC_RX_BUF_SZ_64_IDX, + HINIC_RX_BUF_SZ_96_IDX, + HINIC_RX_BUF_SZ_128_IDX, + HINIC_RX_BUF_SZ_192_IDX, + HINIC_RX_BUF_SZ_256_IDX, + HINIC_RX_BUF_SZ_384_IDX, + HINIC_RX_BUF_SZ_512_IDX, + HINIC_RX_BUF_SZ_768_IDX, + HINIC_RX_BUF_SZ_1024_IDX, + HINIC_RX_BUF_SZ_1536_IDX, + HINIC_RX_BUF_SZ_2048_IDX, + HINIC_RX_BUF_SZ_3072_IDX, + HINIC_RX_BUF_SZ_4096_IDX, + HINIC_RX_BUF_SZ_8192_IDX, + HINIC_RX_BUF_SZ_16384_IDX, +}; + struct hinic_sq { struct hinic_hwif *hwif; -- 1.9.1
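The bug named in the subject line comes from C's integer promotions: a 16-bit unsigned value is promoted to `int` before comparison, so it can never equal -1. The sketch below illustrates this and one way to keep a sentinel representable. It assumes `int` is wider than 16 bits; `u16_equals()`, `rx_buf_sizes`, and `rx_buf_sz_to_idx()` are hypothetical names, with the size list copied from the table the patch removes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* idx is promoted to int (range 0..65535) before the comparison. */
static bool u16_equals(uint16_t idx, int value)
{
	return idx == value;
}

static const size_t rx_buf_sizes[] = {
	32, 64, 96, 128, 192, 256, 384, 512,
	768, 1024, 1536, 2048, 3072, 4096, 8192, 16384,
};

/* Returning int (not uint16_t) keeps the -1 "not found" value intact. */
static int rx_buf_sz_to_idx(size_t sz)
{
	size_t i;

	for (i = 0; i < sizeof(rx_buf_sizes) / sizeof(rx_buf_sizes[0]); i++)
		if (rx_buf_sizes[i] == sz)
			return (int)i;
	return -1;
}
```

The patch itself avoids the runtime lookup entirely by naming the index as an enum constant, which is simpler when the buffer size is fixed at compile time.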
[PATCH V2 net-next] net-next/hinic: Fix MTU limitation
Fix the hw MTU limitation by setting max_mtu

Signed-off-by: Aviad Krawczyk
Signed-off-by: Zhao Chen
---
 drivers/net/ethernet/huawei/hinic/hinic_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_main.c b/drivers/net/ethernet/huawei/hinic/hinic_main.c
index ae7ad48..eb53bd9 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_main.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_main.c
@@ -919,6 +919,7 @@ static int nic_dev_init(struct pci_dev *pdev)
 
 	netdev->netdev_ops = _netdev_ops;
 	netdev->ethtool_ops = _ethtool_ops;
+	netdev->max_mtu = ETH_MAX_MTU;
 
 	nic_dev = netdev_priv(netdev);
 	nic_dev->netdev = netdev;
-- 
1.9.1
Re: [PATCH] DSA support for Micrel KSZ8895
On August 27, 2017 5:36:58 AM PDT, Pavel Machek wrote:
>Hi!
>
>So I fought with the driver a bit more, and now I have something that
>kind-of-works.
>
>"great great hack" belows worries me.
>
>Yeah, disabled code needs to be removed before merge.
>
>No, tag_ksz part probably is not acceptable. Do you see solution
>better than just copying it into tag_ksz1 file?

You could have all Micrel tag implementations live under
net/dsa/tag_ksz.c and have e.g: DSA_TAG_PROTO_KSZ for the current (newer)
switches and DSA_TAG_PROTO_KSZ_LEGACY (or any other name) for the older
switches and you would provide two sets of function pointers depending on
which protocol is requested by the switch.

Considering the minor difference needed in tagging here, it might be
acceptable to actually keep the current functions and just have the
xmit() call check what get_tag_protocol returns and use word 1 or 0 based
on that. Even though that's a fast path it shouldn't hurt performance too
much. If it does, we can always copy the tagging protocol into
dsa_slave_priv so you have a fast access to it.

>
>Any more comments, etc?

The MII emulation bits are interesting, was it not sufficient if you
implemented phy_read and phy_write operations that perform the necessary
internal PHY accesses or maybe you don't get access to standard MII
registers? b53 does such a thing and we merely just need to do a simple
shift to access the MII register number, thus avoiding the translation.

>
>Help would be welcome.

I concur with Andrew, try to get a patch series, even an RFC one together
so we can review things individually. How functional is your driver so
far? I'd say the basic stuff to get working: counters (debugging), link
management (auto-negotiation, forced, etc.) and basic bridging: all ports
separate by default and working port to port switching when brought
together in a bridge. VLAN, FDB, MDB, other ethtool goodies can be added
later on.

-- 
Florian
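The second option in this reply, one xmit path that picks the tag byte from the protocol, might be sketched as below. This is a standalone illustration, not DSA code: `enum tag_proto`, `TAG_PROTO_KSZ`, `TAG_PROTO_KSZ_LEGACY`, and `ksz_fill_tag()` are hypothetical, and the assumption that the legacy format carries the port bit in byte 0 while the newer one uses byte 1 of a two-byte tag is taken from the discussion in this thread, not from a datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical protocol identifiers, named after the suggestion above. */
enum tag_proto { TAG_PROTO_KSZ, TAG_PROTO_KSZ_LEGACY };

#define KSZ_INGRESS_TAG_LEN 2

/* Write the destination-port bit into the byte the protocol expects. */
static void ksz_fill_tag(uint8_t tag[KSZ_INGRESS_TAG_LEN],
			 enum tag_proto proto, unsigned int port)
{
	/* Assumed layout: legacy switches use byte 0, newer ones byte 1. */
	unsigned int pos = (proto == TAG_PROTO_KSZ_LEGACY) ? 0 : 1;

	assert(port < 8);
	tag[0] = 0;
	tag[1] = 0;
	tag[pos] = (uint8_t)(1u << port);
}
```

The only per-packet cost is the comparison that selects `pos`, which matches the expectation stated above that the check should not hurt the fast path much.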
Re: [PATCH 0/4] irda: move it to drivers/staging so we can delete it
On Sun, Aug 27, 2017 at 09:19:19AM -0700, Joe Perches wrote: > On Sun, 2017-08-27 at 18:13 +0200, Greg Kroah-Hartman wrote: > > On Sun, Aug 27, 2017 at 08:35:43AM -0700, Joe Perches wrote: > > > On Sun, 2017-08-27 at 17:03 +0200, Greg Kroah-Hartman wrote: > > > > The IRDA code has long been obsolete and broken. So, to keep people > > > > from trying to use it, and to prevent people from having to maintain it, > > > > let's move it to drivers/staging/ so that we can delete it entirely from > > > > the kernel in a few releases. > > > > > > > > > MAINTAINERS should be updated as well. > > > > > > It'd probably be nice to try to get an email to > > > the irda mailing list too if it still works. > > > > As get_maintainer.pl didn't show it, odds are it doesn't... > > get_maintainer doesn't show it because it's subscriber-only. > If you want get_maintainer to show it, add -s > > $ ./scripts/get_maintainer.pl -s -f net/irda/ > Samuel Ortiz(maintainer:IRDA SUBSYSTEM) > "David S. Miller" (maintainer:NETWORKING [GENERAL]) > irda-us...@lists.sourceforge.net (subscriber list:IRDA SUBSYSTEM) > netdev@vger.kernel.org (open list:IRDA SUBSYSTEM) > linux-ker...@vger.kernel.org (open list) Sorry, am not going to subscribe to a random list just to send patches that delete the subsystem :) netdev@ should be all that is needed here anyway... thanks, greg k-h
Re: [PATCH] DSA support for Micrel KSZ8895
> No, tag_ksz part probably is not acceptable. Do you see solution > better than just copying it into tag_ksz1 file? How about something like this, which needs further work to actually compile, but should give you the idea. Andrew index 99e38af85fc5..843e77b7c270 100644 --- a/net/dsa/dsa.c +++ b/net/dsa/dsa.c @@ -49,8 +49,11 @@ const struct dsa_device_ops *dsa_device_ops[DSA_TAG_LAST] = { #ifdef CONFIG_NET_DSA_TAG_EDSA [DSA_TAG_PROTO_EDSA] = _netdev_ops, #endif -#ifdef CONFIG_NET_DSA_TAG_KSZ - [DSA_TAG_PROTO_KSZ] = _netdev_ops, +#ifdef CONFIG_NET_DSA_TAG_KSZ_8K + [DSA_TAG_PROTO_KSZ8K] = _netdev_ops, +#endif +#ifdef CONFIG_NET_DSA_TAG_KSZ_9K + [DSA_TAG_PROTO_KSZ9K] = _netdev_ops, #endif #ifdef CONFIG_NET_DSA_TAG_LAN9303 [DSA_TAG_PROTO_LAN9303] = _netdev_ops, diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c index de66ca8e6201..398b833889f1 100644 --- a/net/dsa/tag_ksz.c +++ b/net/dsa/tag_ksz.c @@ -35,6 +35,9 @@ static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct net_device *dev) { struct dsa_slave_priv *p = netdev_priv(dev); + struct dsa_port *dp = p->dp; + struct dsa_switch *ds = dp->ds; + struct dsa_switch_tree *dst = ds->dst; struct sk_buff *nskb; int padlen; u8 *tag; @@ -69,8 +72,14 @@ static struct sk_buff *ksz_xmit(struct sk_buff *skb, struct net_device *dev) } tag = skb_put(nskb, KSZ_INGRESS_TAG_LEN); - tag[0] = 0; - tag[1] = 1 << p->dp->index; /* destination port */ + if (dst->tag_ops == ksz8k_netdev_ops) { + tag[0] = 1 << p->dp->index; /* destination port */0; + tag[1] = 0; + } + + if (dst->tag_ops == ksz9k_netdev_ops) { + tag[0] = 0; + tag[1] = 1 << p->dp->index; /* destination port */ return nskb; } @@ -98,7 +107,12 @@ static struct sk_buff *ksz_rcv(struct sk_buff *skb, struct net_device *dev, return skb; } -const struct dsa_device_ops ksz_netdev_ops = { +const struct dsa_device_ops ksz8k_netdev_ops = { + .xmit = ksz_xmit, + .rcv= ksz_rcv, +}; + +const struct dsa_device_ops ksz9k_netdev_ops = { .xmit = ksz_xmit, .rcv= ksz_rcv, };
Re: [PATCH] DSA support for Micrel KSZ8895
> +/** > + * sw_r_phy - read data from PHY register > + * @sw: The switch instance. > + * @phy: PHY address to read. > + * @reg: PHY register to read. > + * @val: Buffer to store the read data. > + * > + * This routine reads data from the PHY register. > + */ > +static void sw_r_phy(struct ksz_device *sw, u16 phy, u16 reg, u16 *val) > +{ > + u8 ctrl; > + u8 restart; > + u8 link; > + u8 speed; > + u8 force; > + u8 p = phy; > + u16 data = 0; > + > + switch (reg) { > + case PHY_REG_CTRL: > + ksz_pread8(sw, p, P_LOCAL_CTRL, ); > + ksz_pread8(sw, p, P_NEG_RESTART_CTRL, ); > + ksz_pread8(sw, p, P_SPEED_STATUS, ); > + ksz_pread8(sw, p, P_FORCE_CTRL, ); > + if (restart & PORT_PHY_LOOPBACK) > + data |= PHY_LOOPBACK; > + if (force & PORT_FORCE_100_MBIT) > + data |= PHY_SPEED_100MBIT; > + if (!(force & PORT_AUTO_NEG_DISABLE)) > + data |= PHY_AUTO_NEG_ENABLE; > + if (restart & PORT_POWER_DOWN) > + data |= PHY_POWER_DOWN; > + if (restart & PORT_AUTO_NEG_RESTART) > + data |= PHY_AUTO_NEG_RESTART; > + if (force & PORT_FORCE_FULL_DUPLEX) > + data |= PHY_FULL_DUPLEX; > + if (speed & PORT_HP_MDIX) > + data |= PHY_HP_MDIX; > + if (restart & PORT_FORCE_MDIX) > + data |= PHY_FORCE_MDIX; > + if (restart & PORT_AUTO_MDIX_DISABLE) > + data |= PHY_AUTO_MDIX_DISABLE; > + if (restart & PORT_TX_DISABLE) > + data |= PHY_TRANSMIT_DISABLE; > + if (restart & PORT_LED_OFF) > + data |= PHY_LED_DISABLE; > + break; > + case PHY_REG_STATUS: > + ksz_pread8(sw, p, P_LINK_STATUS, ); > + ksz_pread8(sw, p, P_SPEED_STATUS, ); > + data = PHY_100BTX_FD_CAPABLE | > + PHY_100BTX_CAPABLE | > + PHY_10BT_FD_CAPABLE | > + PHY_10BT_CAPABLE | > + PHY_AUTO_NEG_CAPABLE; > + if (link & PORT_AUTO_NEG_COMPLETE) > + data |= PHY_AUTO_NEG_ACKNOWLEDGE; > + if (link & PORT_STAT_LINK_GOOD) > + data |= PHY_LINK_STATUS; > + break; > + case PHY_REG_ID_1: > + data = KSZ8895_ID_HI; > + break; > + case PHY_REG_ID_2: > + data = KSZ8895_ID_LO; > + break; According to the datasheet, the PHY has the normal ID registers, which have the 
value 0x0022, 0x1450. So it should be possible to have a standard PHY driver in drivers/net/phy. In fact, the IDs suggest it is a micrel phy, and 1430, 1435 are already supported. So it could be you only need minor modifications to the micrel.c. Andrew
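For context on how the two ID registers relate to a PHY driver match, here is a rough standalone sketch. The helper names are hypothetical, and `EXAMPLE_PHY_ID_MASK` (which ignores the low revision nibble) is an assumption chosen for illustration; the mask a real driver uses should be checked in its source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed mask: compare everything except the low revision nibble. */
#define EXAMPLE_PHY_ID_MASK 0x00fffff0u

/* Combine PHY ID register 1 (high half) and register 2 (low half). */
static uint32_t phy_id_from_regs(uint16_t id1, uint16_t id2)
{
	return ((uint32_t)id1 << 16) | id2;
}

/* A driver entry matches when the masked IDs are equal. */
static bool phy_id_matches(uint32_t id, uint32_t driver_id, uint32_t mask)
{
	return (id & mask) == (driver_id & mask);
}
```

Under this assumed mask, 0x0022/0x1450 combines to 0x00221450, which does not match an entry for 0x00221430, so supporting it would mean adding an entry rather than reusing an existing one unchanged.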
Re: [PATCH 0/4] irda: move it to drivers/staging so we can delete it
On Sun, 2017-08-27 at 18:13 +0200, Greg Kroah-Hartman wrote:
> On Sun, Aug 27, 2017 at 08:35:43AM -0700, Joe Perches wrote:
> > On Sun, 2017-08-27 at 17:03 +0200, Greg Kroah-Hartman wrote:
> > > The IRDA code has long been obsolete and broken. So, to keep people
> > > from trying to use it, and to prevent people from having to maintain it,
> > > let's move it to drivers/staging/ so that we can delete it entirely from
> > > the kernel in a few releases.
> >
> > MAINTAINERS should be updated as well.
> >
> > It'd probably be nice to try to get an email to
> > the irda mailing list too if it still works.
>
> As get_maintainer.pl didn't show it, odds are it doesn't...

get_maintainer doesn't show it because it's subscriber-only.
If you want get_maintainer to show it, add -s

$ ./scripts/get_maintainer.pl -s -f net/irda/
Samuel Ortiz (maintainer:IRDA SUBSYSTEM)
"David S. Miller" (maintainer:NETWORKING [GENERAL])
irda-us...@lists.sourceforge.net (subscriber list:IRDA SUBSYSTEM)
netdev@vger.kernel.org (open list:IRDA SUBSYSTEM)
linux-ker...@vger.kernel.org (open list)
Re: [PATCH 0/4] irda: move it to drivers/staging so we can delete it
On Sun, Aug 27, 2017 at 08:35:43AM -0700, Joe Perches wrote:
> On Sun, 2017-08-27 at 17:03 +0200, Greg Kroah-Hartman wrote:
> > The IRDA code has long been obsolete and broken. So, to keep people
> > from trying to use it, and to prevent people from having to maintain it,
> > let's move it to drivers/staging/ so that we can delete it entirely from
> > the kernel in a few releases.
>
> MAINTAINERS should be updated as well.
>
> It'd probably be nice to try to get an email to
> the irda mailing list too if it still works.

As get_maintainer.pl didn't show it, odds are it doesn't...
Re: [PATCH 0/4] irda: move it to drivers/staging so we can delete it
On Sun, 2017-08-27 at 17:03 +0200, Greg Kroah-Hartman wrote:
> The IRDA code has long been obsolete and broken. So, to keep people
> from trying to use it, and to prevent people from having to maintain it,
> let's move it to drivers/staging/ so that we can delete it entirely from
> the kernel in a few releases.

MAINTAINERS should be updated as well.

It'd probably be nice to try to get an email to
the irda mailing list too if it still works.
[PATCH 1/4] irda: move net/irda/ to drivers/staging/irda/net/
It's time to get rid of IRDA. It's long been broken, and no one seems to use it anymore. So move it to staging and after a while, we can delete it from there. To start, move the network irda core from net/irda to drivers/staging/irda/net/ Signed-off-by: Greg Kroah-Hartman--- drivers/staging/Kconfig | 2 ++ drivers/staging/Makefile| 1 + {net/irda => drivers/staging/irda/net}/Kconfig | 6 +++--- {net/irda => drivers/staging/irda/net}/Makefile | 0 {net/irda => drivers/staging/irda/net}/af_irda.c| 0 {net/irda => drivers/staging/irda/net}/discovery.c | 0 {net/irda => drivers/staging/irda/net}/ircomm/Kconfig | 0 {net/irda => drivers/staging/irda/net}/ircomm/Makefile | 0 {net/irda => drivers/staging/irda/net}/ircomm/ircomm_core.c | 0 {net/irda => drivers/staging/irda/net}/ircomm/ircomm_event.c| 0 {net/irda => drivers/staging/irda/net}/ircomm/ircomm_lmp.c | 0 {net/irda => drivers/staging/irda/net}/ircomm/ircomm_param.c| 0 {net/irda => drivers/staging/irda/net}/ircomm/ircomm_ttp.c | 0 {net/irda => drivers/staging/irda/net}/ircomm/ircomm_tty.c | 0 {net/irda => drivers/staging/irda/net}/ircomm/ircomm_tty_attach.c | 0 {net/irda => drivers/staging/irda/net}/ircomm/ircomm_tty_ioctl.c| 0 {net/irda => drivers/staging/irda/net}/irda_device.c| 0 {net/irda => drivers/staging/irda/net}/iriap.c | 0 {net/irda => drivers/staging/irda/net}/iriap_event.c| 0 {net/irda => drivers/staging/irda/net}/irias_object.c | 0 {net/irda => drivers/staging/irda/net}/irlan/Kconfig| 0 {net/irda => drivers/staging/irda/net}/irlan/Makefile | 0 {net/irda => drivers/staging/irda/net}/irlan/irlan_client.c | 0 {net/irda => drivers/staging/irda/net}/irlan/irlan_client_event.c | 0 {net/irda => drivers/staging/irda/net}/irlan/irlan_common.c | 0 {net/irda => drivers/staging/irda/net}/irlan/irlan_eth.c| 0 {net/irda => drivers/staging/irda/net}/irlan/irlan_event.c | 0 {net/irda => drivers/staging/irda/net}/irlan/irlan_filter.c | 0 {net/irda => drivers/staging/irda/net}/irlan/irlan_provider.c | 0 {net/irda => 
drivers/staging/irda/net}/irlan/irlan_provider_event.c | 0 {net/irda => drivers/staging/irda/net}/irlap.c | 0 {net/irda => drivers/staging/irda/net}/irlap_event.c| 0 {net/irda => drivers/staging/irda/net}/irlap_frame.c| 0 {net/irda => drivers/staging/irda/net}/irlmp.c | 0 {net/irda => drivers/staging/irda/net}/irlmp_event.c| 0 {net/irda => drivers/staging/irda/net}/irlmp_frame.c| 0 {net/irda => drivers/staging/irda/net}/irmod.c | 0 {net/irda => drivers/staging/irda/net}/irnet/Kconfig| 0 {net/irda => drivers/staging/irda/net}/irnet/Makefile | 0 {net/irda => drivers/staging/irda/net}/irnet/irnet.h| 0 {net/irda => drivers/staging/irda/net}/irnet/irnet_irda.c | 0 {net/irda => drivers/staging/irda/net}/irnet/irnet_irda.h | 0 {net/irda => drivers/staging/irda/net}/irnet/irnet_ppp.c| 0 {net/irda => drivers/staging/irda/net}/irnet/irnet_ppp.h| 0 {net/irda => drivers/staging/irda/net}/irnetlink.c | 0 {net/irda => drivers/staging/irda/net}/irproc.c | 0 {net/irda => drivers/staging/irda/net}/irqueue.c| 0 {net/irda => drivers/staging/irda/net}/irsysctl.c | 0 {net/irda => drivers/staging/irda/net}/irttp.c | 0 {net/irda => drivers/staging/irda/net}/parameters.c | 0 {net/irda => drivers/staging/irda/net}/qos.c| 0 {net/irda => drivers/staging/irda/net}/timer.c | 0 {net/irda => drivers/staging/irda/net}/wrapper.c| 0 net/Kconfig | 1 - net/Makefile| 1 - 55 files changed, 6 insertions(+), 5 deletions(-) rename {net/irda => drivers/staging/irda/net}/Kconfig (95%) rename {net/irda => drivers/staging/irda/net}/Makefile (100%) rename {net/irda => drivers/staging/irda/net}/af_irda.c (100%) rename {net/irda => drivers/staging/irda/net}/discovery.c (100%) rename {net/irda => drivers/staging/irda/net}/ircomm/Kconfig (100%) rename {net/irda => drivers/staging/irda/net}/ircomm/Makefile (100%) rename {net/irda => drivers/staging/irda/net}/ircomm/ircomm_core.c (100%) rename {net/irda => drivers/staging/irda/net}/ircomm/ircomm_event.c (100%) rename {net/irda =>
[PATCH 0/4] irda: move it to drivers/staging so we can delete it
The IRDA code has long been obsolete and broken. So, to keep people from trying to use it, and to prevent people from having to maintain it, let's move it to drivers/staging/ so that we can delete it entirely from the kernel in a few releases. Greg Kroah-Hartman (4): irda: move net/irda/ to drivers/staging/irda/net/ irda: move drivers/net/irda to drivers/staging/irda/drivers irda: move include/net/irda into staging subdirectory staging: irda: add a TODO file. drivers/net/Makefile | 1 - drivers/staging/Kconfig | 2 ++ drivers/staging/Makefile | 2 ++ drivers/staging/irda/TODO | 4 drivers/{net/irda => staging/irda/drivers}/Kconfig| 0 drivers/{net/irda => staging/irda/drivers}/Makefile | 2 ++ drivers/{net/irda => staging/irda/drivers}/act200l-sir.c | 0 drivers/{net/irda => staging/irda/drivers}/actisys-sir.c | 0 drivers/{net/irda => staging/irda/drivers}/ali-ircc.c | 0 drivers/{net/irda => staging/irda/drivers}/ali-ircc.h | 0 drivers/{net/irda => staging/irda/drivers}/au1k_ir.c | 0 drivers/{net/irda => staging/irda/drivers}/bfin_sir.c | 0 drivers/{net/irda => staging/irda/drivers}/bfin_sir.h | 0 drivers/{net/irda => staging/irda/drivers}/donauboe.c | 0 drivers/{net/irda => staging/irda/drivers}/donauboe.h | 0 drivers/{net/irda => staging/irda/drivers}/esi-sir.c | 0 drivers/{net/irda => staging/irda/drivers}/girbil-sir.c | 0 drivers/{net/irda => staging/irda/drivers}/irda-usb.c | 0 drivers/{net/irda => staging/irda/drivers}/irda-usb.h | 0 drivers/{net/irda => staging/irda/drivers}/irtty-sir.c| 0 drivers/{net/irda => staging/irda/drivers}/irtty-sir.h| 0 drivers/{net/irda => staging/irda/drivers}/kingsun-sir.c | 0 drivers/{net/irda => staging/irda/drivers}/ks959-sir.c| 0 drivers/{net/irda => staging/irda/drivers}/ksdazzle-sir.c | 0 drivers/{net/irda => staging/irda/drivers}/litelink-sir.c | 0 drivers/{net/irda => staging/irda/drivers}/ma600-sir.c| 0 drivers/{net/irda => staging/irda/drivers}/mcp2120-sir.c | 0 drivers/{net/irda => staging/irda/drivers}/mcs7780.c | 0 
drivers/{net/irda => staging/irda/drivers}/mcs7780.h | 0 drivers/{net/irda => staging/irda/drivers}/nsc-ircc.c | 0 drivers/{net/irda => staging/irda/drivers}/nsc-ircc.h | 0 drivers/{net/irda => staging/irda/drivers}/old_belkin-sir.c | 0 drivers/{net/irda => staging/irda/drivers}/pxaficp_ir.c | 0 drivers/{net/irda => staging/irda/drivers}/sa1100_ir.c| 0 drivers/{net/irda => staging/irda/drivers}/sh_sir.c | 0 drivers/{net/irda => staging/irda/drivers}/sir-dev.h | 0 drivers/{net/irda => staging/irda/drivers}/sir_dev.c | 0 drivers/{net/irda => staging/irda/drivers}/sir_dongle.c | 0 drivers/{net/irda => staging/irda/drivers}/smsc-ircc2.c | 0 drivers/{net/irda => staging/irda/drivers}/smsc-ircc2.h | 0 drivers/{net/irda => staging/irda/drivers}/smsc-sio.h | 0 drivers/{net/irda => staging/irda/drivers}/stir4200.c | 0 drivers/{net/irda => staging/irda/drivers}/tekram-sir.c | 0 drivers/{net/irda => staging/irda/drivers}/toim3232-sir.c | 0 drivers/{net/irda => staging/irda/drivers}/via-ircc.c | 0 drivers/{net/irda => staging/irda/drivers}/via-ircc.h | 0 drivers/{net/irda => staging/irda/drivers}/vlsi_ir.c | 0 drivers/{net/irda => staging/irda/drivers}/vlsi_ir.h | 0 drivers/{net/irda => staging/irda/drivers}/w83977af.h | 0 drivers/{net/irda => staging/irda/drivers}/w83977af_ir.c | 0 drivers/{net/irda => staging/irda/drivers}/w83977af_ir.h | 0 {include => drivers/staging/irda/include}/net/irda/af_irda.h | 0 {include => drivers/staging/irda/include}/net/irda/crc.h | 0 {include => drivers/staging/irda/include}/net/irda/discovery.h| 0 {include => drivers/staging/irda/include}/net/irda/ircomm_core.h | 0 {include => drivers/staging/irda/include}/net/irda/ircomm_event.h | 0 {include => drivers/staging/irda/include}/net/irda/ircomm_lmp.h | 0 {include => drivers/staging/irda/include}/net/irda/ircomm_param.h | 0 {include => drivers/staging/irda/include}/net/irda/ircomm_ttp.h | 0 {include => drivers/staging/irda/include}/net/irda/ircomm_tty.h | 0 
.../staging/irda/include}/net/irda/ircomm_tty_attach.h| 0 {include => drivers/staging/irda/include}/net/irda/irda.h | 0 {include => drivers/staging/irda/include}/net/irda/irda_device.h | 0 {include =>
[PATCH 4/4] staging: irda: add a TODO file.
The irda code will be deleted in a future kernel release, so no need to
have anyone do any new work on it.

Signed-off-by: Greg Kroah-Hartman
---
 drivers/staging/irda/TODO | 4
 1 file changed, 4 insertions(+)
 create mode 100644 drivers/staging/irda/TODO

diff --git a/drivers/staging/irda/TODO b/drivers/staging/irda/TODO
new file mode 100644
index ..7d98a5cffaff
--- /dev/null
+++ b/drivers/staging/irda/TODO
@@ -0,0 +1,4 @@
+The irda code will be removed soon from the kernel tree as it is old and
+obsolete and broken.
+
+Don't worry about fixing up anything here, it's not needed.
--
2.14.1
[PATCH 2/4] irda: move drivers/net/irda to drivers/staging/irda/drivers
Move the irda drivers from drivers/net/irda/ to drivers/staging/irda/drivers as they will be deleted in a future kernel release. Signed-off-by: Greg Kroah-Hartman--- drivers/net/Makefile| 1 - drivers/staging/Makefile| 1 + drivers/{net/irda => staging/irda/drivers}/Kconfig | 0 drivers/{net/irda => staging/irda/drivers}/Makefile | 0 drivers/{net/irda => staging/irda/drivers}/act200l-sir.c| 0 drivers/{net/irda => staging/irda/drivers}/actisys-sir.c| 0 drivers/{net/irda => staging/irda/drivers}/ali-ircc.c | 0 drivers/{net/irda => staging/irda/drivers}/ali-ircc.h | 0 drivers/{net/irda => staging/irda/drivers}/au1k_ir.c| 0 drivers/{net/irda => staging/irda/drivers}/bfin_sir.c | 0 drivers/{net/irda => staging/irda/drivers}/bfin_sir.h | 0 drivers/{net/irda => staging/irda/drivers}/donauboe.c | 0 drivers/{net/irda => staging/irda/drivers}/donauboe.h | 0 drivers/{net/irda => staging/irda/drivers}/esi-sir.c| 0 drivers/{net/irda => staging/irda/drivers}/girbil-sir.c | 0 drivers/{net/irda => staging/irda/drivers}/irda-usb.c | 0 drivers/{net/irda => staging/irda/drivers}/irda-usb.h | 0 drivers/{net/irda => staging/irda/drivers}/irtty-sir.c | 0 drivers/{net/irda => staging/irda/drivers}/irtty-sir.h | 0 drivers/{net/irda => staging/irda/drivers}/kingsun-sir.c| 0 drivers/{net/irda => staging/irda/drivers}/ks959-sir.c | 0 drivers/{net/irda => staging/irda/drivers}/ksdazzle-sir.c | 0 drivers/{net/irda => staging/irda/drivers}/litelink-sir.c | 0 drivers/{net/irda => staging/irda/drivers}/ma600-sir.c | 0 drivers/{net/irda => staging/irda/drivers}/mcp2120-sir.c| 0 drivers/{net/irda => staging/irda/drivers}/mcs7780.c| 0 drivers/{net/irda => staging/irda/drivers}/mcs7780.h| 0 drivers/{net/irda => staging/irda/drivers}/nsc-ircc.c | 0 drivers/{net/irda => staging/irda/drivers}/nsc-ircc.h | 0 drivers/{net/irda => staging/irda/drivers}/old_belkin-sir.c | 0 drivers/{net/irda => staging/irda/drivers}/pxaficp_ir.c | 0 drivers/{net/irda => staging/irda/drivers}/sa1100_ir.c | 0 drivers/{net/irda 
=> staging/irda/drivers}/sh_sir.c | 0 drivers/{net/irda => staging/irda/drivers}/sir-dev.h| 0 drivers/{net/irda => staging/irda/drivers}/sir_dev.c| 0 drivers/{net/irda => staging/irda/drivers}/sir_dongle.c | 0 drivers/{net/irda => staging/irda/drivers}/smsc-ircc2.c | 0 drivers/{net/irda => staging/irda/drivers}/smsc-ircc2.h | 0 drivers/{net/irda => staging/irda/drivers}/smsc-sio.h | 0 drivers/{net/irda => staging/irda/drivers}/stir4200.c | 0 drivers/{net/irda => staging/irda/drivers}/tekram-sir.c | 0 drivers/{net/irda => staging/irda/drivers}/toim3232-sir.c | 0 drivers/{net/irda => staging/irda/drivers}/via-ircc.c | 0 drivers/{net/irda => staging/irda/drivers}/via-ircc.h | 0 drivers/{net/irda => staging/irda/drivers}/vlsi_ir.c| 0 drivers/{net/irda => staging/irda/drivers}/vlsi_ir.h| 0 drivers/{net/irda => staging/irda/drivers}/w83977af.h | 0 drivers/{net/irda => staging/irda/drivers}/w83977af_ir.c| 0 drivers/{net/irda => staging/irda/drivers}/w83977af_ir.h| 0 drivers/staging/irda/net/Kconfig| 2 +- 50 files changed, 2 insertions(+), 2 deletions(-) rename drivers/{net/irda => staging/irda/drivers}/Kconfig (100%) rename drivers/{net/irda => staging/irda/drivers}/Makefile (100%) rename drivers/{net/irda => staging/irda/drivers}/act200l-sir.c (100%) rename drivers/{net/irda => staging/irda/drivers}/actisys-sir.c (100%) rename drivers/{net/irda => staging/irda/drivers}/ali-ircc.c (100%) rename drivers/{net/irda => staging/irda/drivers}/ali-ircc.h (100%) rename drivers/{net/irda => staging/irda/drivers}/au1k_ir.c (100%) rename drivers/{net/irda => staging/irda/drivers}/bfin_sir.c (100%) rename drivers/{net/irda => staging/irda/drivers}/bfin_sir.h (100%) rename drivers/{net/irda => staging/irda/drivers}/donauboe.c (100%) rename drivers/{net/irda => staging/irda/drivers}/donauboe.h (100%) rename drivers/{net/irda => staging/irda/drivers}/esi-sir.c (100%) rename drivers/{net/irda => staging/irda/drivers}/girbil-sir.c (100%) rename drivers/{net/irda => 
staging/irda/drivers}/irda-usb.c (100%) rename drivers/{net/irda => staging/irda/drivers}/irda-usb.h (100%) rename drivers/{net/irda => staging/irda/drivers}/irtty-sir.c (100%) rename drivers/{net/irda => staging/irda/drivers}/irtty-sir.h (100%) rename drivers/{net/irda => staging/irda/drivers}/kingsun-sir.c (100%) rename drivers/{net/irda => staging/irda/drivers}/ks959-sir.c (100%) rename drivers/{net/irda => staging/irda/drivers}/ksdazzle-sir.c (100%) rename drivers/{net/irda => staging/irda/drivers}/litelink-sir.c (100%) rename drivers/{net/irda =>
[PATCH 3/4] irda: move include/net/irda into staging subdirectory
And finally, move the irda include files into drivers/staging/irda/include/net/irda. Yes, it's a long path, but it makes it easy for us to just add a Makefile directory path addition and all of the net and drivers code "just works". Signed-off-by: Greg Kroah-Hartman--- drivers/staging/irda/drivers/Makefile | 2 ++ {include => drivers/staging/irda/include}/net/irda/af_irda.h | 0 {include => drivers/staging/irda/include}/net/irda/crc.h | 0 {include => drivers/staging/irda/include}/net/irda/discovery.h | 0 {include => drivers/staging/irda/include}/net/irda/ircomm_core.h | 0 {include => drivers/staging/irda/include}/net/irda/ircomm_event.h | 0 {include => drivers/staging/irda/include}/net/irda/ircomm_lmp.h| 0 {include => drivers/staging/irda/include}/net/irda/ircomm_param.h | 0 {include => drivers/staging/irda/include}/net/irda/ircomm_ttp.h| 0 {include => drivers/staging/irda/include}/net/irda/ircomm_tty.h| 0 {include => drivers/staging/irda/include}/net/irda/ircomm_tty_attach.h | 0 {include => drivers/staging/irda/include}/net/irda/irda.h | 0 {include => drivers/staging/irda/include}/net/irda/irda_device.h | 0 {include => drivers/staging/irda/include}/net/irda/iriap.h | 0 {include => drivers/staging/irda/include}/net/irda/iriap_event.h | 0 {include => drivers/staging/irda/include}/net/irda/irias_object.h | 0 {include => drivers/staging/irda/include}/net/irda/irlan_client.h | 0 {include => drivers/staging/irda/include}/net/irda/irlan_common.h | 0 {include => drivers/staging/irda/include}/net/irda/irlan_eth.h | 0 {include => drivers/staging/irda/include}/net/irda/irlan_event.h | 0 {include => drivers/staging/irda/include}/net/irda/irlan_filter.h | 0 {include => drivers/staging/irda/include}/net/irda/irlan_provider.h| 0 {include => drivers/staging/irda/include}/net/irda/irlap.h | 0 {include => drivers/staging/irda/include}/net/irda/irlap_event.h | 0 {include => drivers/staging/irda/include}/net/irda/irlap_frame.h | 0 {include => 
drivers/staging/irda/include}/net/irda/irlmp.h | 0 {include => drivers/staging/irda/include}/net/irda/irlmp_event.h | 0 {include => drivers/staging/irda/include}/net/irda/irlmp_frame.h | 0 {include => drivers/staging/irda/include}/net/irda/irmod.h | 0 {include => drivers/staging/irda/include}/net/irda/irqueue.h | 0 {include => drivers/staging/irda/include}/net/irda/irttp.h | 0 {include => drivers/staging/irda/include}/net/irda/parameters.h| 0 {include => drivers/staging/irda/include}/net/irda/qos.h | 0 {include => drivers/staging/irda/include}/net/irda/timer.h | 0 {include => drivers/staging/irda/include}/net/irda/wrapper.h | 0 drivers/staging/irda/net/Makefile | 2 ++ 36 files changed, 4 insertions(+) rename {include => drivers/staging/irda/include}/net/irda/af_irda.h (100%) rename {include => drivers/staging/irda/include}/net/irda/crc.h (100%) rename {include => drivers/staging/irda/include}/net/irda/discovery.h (100%) rename {include => drivers/staging/irda/include}/net/irda/ircomm_core.h (100%) rename {include => drivers/staging/irda/include}/net/irda/ircomm_event.h (100%) rename {include => drivers/staging/irda/include}/net/irda/ircomm_lmp.h (100%) rename {include => drivers/staging/irda/include}/net/irda/ircomm_param.h (100%) rename {include => drivers/staging/irda/include}/net/irda/ircomm_ttp.h (100%) rename {include => drivers/staging/irda/include}/net/irda/ircomm_tty.h (100%) rename {include => drivers/staging/irda/include}/net/irda/ircomm_tty_attach.h (100%) rename {include => drivers/staging/irda/include}/net/irda/irda.h (100%) rename {include => drivers/staging/irda/include}/net/irda/irda_device.h (100%) rename {include => drivers/staging/irda/include}/net/irda/iriap.h (100%) rename {include => drivers/staging/irda/include}/net/irda/iriap_event.h (100%) rename {include => drivers/staging/irda/include}/net/irda/irias_object.h (100%) rename {include => drivers/staging/irda/include}/net/irda/irlan_client.h (100%) rename {include => 
drivers/staging/irda/include}/net/irda/irlan_common.h (100%) rename {include => drivers/staging/irda/include}/net/irda/irlan_eth.h (100%) rename {include => drivers/staging/irda/include}/net/irda/irlan_event.h (100%) rename {include => drivers/staging/irda/include}/net/irda/irlan_filter.h (100%) rename {include => drivers/staging/irda/include}/net/irda/irlan_provider.h (100%) rename {include => drivers/staging/irda/include}/net/irda/irlap.h (100%) rename {include => drivers/staging/irda/include}/net/irda/irlap_event.h (100%) rename {include => drivers/staging/irda/include}/net/irda/irlap_frame.h (100%) rename {include =>
Re: [PATCH v2 net-next 1/8] bpf: Add support for recursively running cgroup sock filters
On 8/25/17 8:49 PM, Alexei Starovoitov wrote:
>
>> +	if (prog && curr_recursive && !new_recursive)
>> +		/* if a parent has recursive prog attached, only
>> +		 * allow recursive programs in descendent cgroup
>> +		 */
>> +		return -EINVAL;
>> +
>>  	old_prog = cgrp->bpf.prog[type];
>
> ... I'm struggling to completely understand how it interacts
> with BPF_F_ALLOW_OVERRIDE.

The 2 flags are completely independent. The existing override logic is
unchanged. If a program can not be overridden, then the new recursive
flag is irrelevant.

> By default we shouldn't allow overriding, so if default prog attached
> to a root, what happens if we try to attach F_RECURSIVE to a descendent?
> If I'm reading the code correctly it will not succeed, which is good.
> Could you add such scenario as test to test_cgrp2_attach2.c ?

Patch 7 adds test cases to cover scenarios. I will add more tests per
comments below and rename to convey it tests the recursive flag.

>
> Now say we attach overridable and !recursive to a root, another
> recursive prog will not be attached to a descedent, which is correct.

yes

>
> But if we attach !overridable + recursive to a root we cannot attach
> anything to a descendent right? Then why allow such combination at all?

Sure, we can disallow that combination to avoid the inefficiency of
recursively walking the cgroups just to run the base program.

> So only overridable + recursive combination makes sense, right?
>
> I think all these combinations must be documented and tests must be
> added. Sooner or later people will build security sensitive environment
> with it and we have to meticulous now.

Intentions below. I'll add more test cases to verify intentions agree
with code.

>
> Do you think it would make sense to split this patch out and
> push patches 2 and 3 with few tests in parallel, while we're review
> this change?

I thought about that but decided no. The 'ip vrf exec' use case would
break right out of the gate if the other settings were used.

>
> Tejun needs to take a deep look into this patch as well.
>

This is the intended behavior:

The override flag is independent of the recursive flag. If the override
flag does not allow an override, the attempt to add a new program fails.

The recursive flag brings an additional constraint: once a cgroup has a
program with the recursive flag set, it is inherited by all descendant
groups. Attempts to insert a program that changes that flag fail with
EINVAL.

Start with the root group at $MNT. No program is attached. By default
override is allowed and recursive is not set.

1. Group $MNT/a is created.
   i.   Default settings from $MNT are inherited; 'a' has override
        enabled and recursive disabled.
   ii.  Program is attached. Override flag is set, recursive flag is
        not set.
   iii. Process in 'a' opens a socket, program attached to 'a' is run.

2. $MNT/a/b is created
   i.   'b' inherits the program and settings of 'a' (override enabled,
        recursive disabled).
   ii.  Process in 'b' opens a socket. Program inherited from 'a' is
        run.
   iii. Non-interesting case for this patch set: attaching a
        non-recursive program to 'b' overrides the inherited one. When
        a process in 'b' opens a socket, only the 'b' program is run.
   iv.  Program is attached to 'b', override flag set, recursive flag
        set.
   v.   Process in 'b' opens a socket. Program attached to 'b' is run
        and then program from 'a' is run. Recursion stops here since
        'a' does not have the recursion flag set.

3. $MNT/a/b/c is created
   i.   'c' inherits the settings of 'b' (override is allowed,
        recursive flag is set)
   ii.  Process in 'c' opens a socket. No program from 'c' exists, so
        nothing is run. Recursion flag is set, so program from 'b' is
        run, then program from 'a' is run. Stop (recursive flag not set
        on 'a').
   iii. Attaching a non-recursive program to 'c' fails because it
        inherited the recursive flag from 'b' and that can not be reset
        by a descendant.
   iv.  Recursive program is attached to 'c'
   v.   Process in 'c' opens a socket. Program attached to 'c' is run,
        then the program from 'b' and the program from 'a'. Stop.

etc.

To consider what happens on doubling back and changing programs in the
hierarchy, start with $MNT/a/b/c from 3 above (non-recursive on 'a',
recursive on 'b' and recursive on 'c') for each of the following cases:

1. Program attached to 'b' is detached, recursive flag is reset in the
   request. Attempt fails with EINVAL because the recursion flag has to
   be set.

2. Program attached to 'b' is detached, recursive flag is set. Allowed.
   Process in 'b' opens a socket. No program attached to 'b' so no
   program is run. Recursive flag is set so program from 'a' is run.
   Stop.

We should allow the recursive flag to be reset if the parent is not
recursive, allowing an unwind of the settings applied. I'll add that
change.
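The walk-through above can be checked against a small user-space model. This is only a sketch under stated assumptions: the toy_* names are hypothetical, the override flag is left out, children are created after their parent's program is attached (as in the steps above), and none of this is the kernel implementation from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_RUN 8

/* Toy cgroup: one optional program plus the recursive setting. */
struct toy_cgroup {
	struct toy_cgroup *parent;
	int prog;	/* 0 means no program attached directly */
	int recursive;	/* inherited from the parent at creation */
};

/* Create a child; it starts with the parent's recursive setting. */
static void toy_create(struct toy_cgroup *cg, struct toy_cgroup *parent)
{
	cg->parent = parent;
	cg->prog = 0;
	cg->recursive = parent ? parent->recursive : 0;
}

/*
 * Attach (prog != 0) or detach (prog == 0). Once the recursive
 * setting is present on a group, a request that clears it is refused.
 */
static int toy_attach(struct toy_cgroup *cg, int prog, int recursive)
{
	if (cg->recursive && !recursive)
		return -EINVAL;
	cg->prog = prog;
	cg->recursive = recursive;
	return 0;
}

/*
 * Record, in order, the programs that run when a process in cg opens
 * a socket. Returns the number of entries written to out[].
 */
static int toy_run(const struct toy_cgroup *cg, int out[MAX_RUN])
{
	int n = 0;

	for (; cg && n < MAX_RUN; cg = cg->parent) {
		if (!cg->prog)
			continue;	/* nothing here, look at the parent */
		out[n++] = cg->prog;
		if (!cg->recursive)
			break;		/* non-recursive program ends the walk */
	}
	return n;
}
```

With 'a' holding program 1 (non-recursive), 'b' program 2 (recursive) and 'c' program 3 (recursive), the model runs 3, 2, 1 for a process in 'c', and refuses a non-recursive attach on 'c', matching steps 3.iii and 3.v.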
Re: [PATCH v2 net-next 1/8] bpf: Add support for recursively running cgroup sock filters
On 8/25/17 8:00 PM, Daniel Borkmann wrote:
> Can you elaborate on the semantical changes for the programs
> setting the new flag which are not using below cgroup_bpf_run_filter_sk()
> helper to walk back to root?

You mean other cgroup based programs -- BPF_CGROUP_* ? If so, any reason
not to allow the recursion model on those too?
Re: [PATCH] DSA support for Micrel KSZ8895
On Sun, Aug 27, 2017 at 02:36:58PM +0200, Pavel Machek wrote:
> Hi!
>
> So I fought with the driver a bit more, and now I have something that
> kind-of-works.

Thanks for keeping on working on this.

> "great great hack" belows worries me.
>
> Yeah, disabled code needs to be removed before merge.
>
> No, tag_ksz part probably is not acceptable. Do you see solution
> better than just copying it into tag_ksz1 file?
>
> Any more comments, etc?

It would help with review if you split this up into multiple
patches. The change to the tagger should be one patch. The mdio
emulation would make a reasonable standalone patch etc.

I will do a more detailed review later.

	Andrew
Re: [PATCH net-next 4/8] net: ethernet: add the Alpine Ethernet driver
This is a fixed version of my previous response (using proper indentation
and leaving only the specific questions responded to).

> > +/* MDIO */
> > +#define AL_ETH_MDIO_C45_DEV_MASK	0x1f
> > +#define AL_ETH_MDIO_C45_DEV_SHIFT	16
> > +#define AL_ETH_MDIO_C45_REG_MASK	0x
> > +
> > +static int al_mdio_read(struct mii_bus *bp, int mii_id, int reg)
> > +{
> > +	struct al_eth_adapter *adapter = bp->priv;
> > +	u16 value = 0;
> > +	int rc;
> > +	int timeout = MDIO_TIMEOUT_MSEC;
> > +
> > +	while (timeout > 0) {
> > +		if (reg & MII_ADDR_C45) {
> > +			netdev_dbg(adapter->netdev, "[c45]: dev %x reg %x val %x\n",
> > +				   ((reg & AL_ETH_MDIO_C45_DEV_MASK) >> AL_ETH_MDIO_C45_DEV_SHIFT),
> > +				   (reg & AL_ETH_MDIO_C45_REG_MASK), value);
> > +			rc = al_eth_mdio_read(&adapter->hw_adapter, adapter->phy_addr,
> > +				((reg & AL_ETH_MDIO_C45_DEV_MASK) >> AL_ETH_MDIO_C45_DEV_SHIFT),
> > +				(reg & AL_ETH_MDIO_C45_REG_MASK), &value);
> > +		} else {
> > +			rc = al_eth_mdio_read(&adapter->hw_adapter, adapter->phy_addr,
> > +					      MDIO_DEVAD_NONE, reg, &value);
> > +		}
> > +
> > +		if (rc == 0)
> > +			return value;
> > +
> > +		netdev_dbg(adapter->netdev,
> > +			   "mdio read failed. try again in 10 msec\n");
> > +
> > +		timeout -= 10;
> > +		msleep(10);
> > +	}
>
> This is rather unusual, retrying MDIO operations. Are you working
> around a hardware bug? I suspect this also opens up race conditions,
> in particular with PHY interrupts, which can be clear on read.

The MDIO bus is shared between the ethernet units. There is a HW lock
used to arbitrate between different interfaces trying to access the bus,
therefore there is a retry loop. The reg isn't accessed before obtaining
the lock, so there shouldn't be any clear-on-read issues.

> > +/* al_eth_mdiobus_setup - initialize mdiobus and register to kernel */
> > +static int al_eth_mdiobus_setup(struct al_eth_adapter *adapter)
> > +{
> > +	struct phy_device *phydev;
> > +	int i;
> > +	int ret = 0;
> > +
> > +	adapter->mdio_bus = mdiobus_alloc();
> > +	if (!adapter->mdio_bus)
> > +		return -ENOMEM;
> > +
> > +	adapter->mdio_bus->name = "al mdio bus";
> > +	snprintf(adapter->mdio_bus->id, MII_BUS_ID_SIZE, "%x",
> > +		 (adapter->pdev->bus->number << 8) | adapter->pdev->devfn);
> > +	adapter->mdio_bus->priv = adapter;
> > +	adapter->mdio_bus->parent = &adapter->pdev->dev;
> > +	adapter->mdio_bus->read = &al_mdio_read;
> > +	adapter->mdio_bus->write = &al_mdio_write;
> > +	adapter->mdio_bus->phy_mask = ~BIT(adapter->phy_addr);
>
> Why do this?

Since the MDIO bus is shared, we want each interface to probe only for
the PHY associated with it.

> > + * acquire mdio interface ownership
> > + * when mdio interface shared between multiple eth controllers, this function waits until the ownership granted for this controller.
> > + * this function does nothing when the mdio interface is used only by this controller.
> > + *
> > + * @param adapter
> > + * @return 0 on success, -ETIMEDOUT on timeout.
> > + */
> > +static int al_eth_mdio_lock(struct al_hw_eth_adapter *adapter)
> > +{
> > +	int count = 0;
> > +	u32 mdio_ctrl_1;
> > +
> > +	if (!adapter->shared_mdio_if)
> > +		return 0; /* nothing to do when interface is not shared */
> > +
> > +	do {
> > +		mdio_ctrl_1 = readl(&adapter->mac_regs_base->gen.mdio_ctrl_1);
> > +		if (mdio_ctrl_1 & BIT(0)) {
> > +			if (count > 0)
> > +				netdev_dbg(adapter->netdev,
> > +					   "eth %s mdio interface still busy!\n",
> > +					   adapter->name);
> > +		} else {
> > +			return 0;
> > +		}
> > +		udelay(AL_ETH_MDIO_DELAY_PERIOD);
> > +	} while (count++ < (AL_ETH_MDIO_DELAY_COUNT * 4));
>
> This needs explaining. How can a read alone perform a lock? How is
> this race free?

This is how this HW lock works: when the bit is 0 this means the lock is
free. When a read transaction arrives at the lock, it changes its value
to 1 but sends 0 as the response, basically taking ownership. When the
owner is done, it writes a 0 which essentially "frees" the lock.

> > +	if (adapter->mdio_type == AL_ETH_MDIO_TYPE_CLAUSE_22)
> > +		rc = al_eth_mdio_10g_mac_type22(adapter, 1, phy_addr,
> > +						reg, val);
> > +	else
> > +		rc = al_eth_mdio_10g_mac_type45(adapter, 1, phy_addr,
> > +
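The read-to-acquire lock described in the reply above can be modelled in plain C. This is a software stand-in only: the toy_* names are hypothetical, the "register" is an ordinary struct field, and the atomicity that the hardware provides on a read is simply assumed rather than demonstrated.

```c
#include <assert.h>
#include <stdint.h>

/* Software stand-in for the lock bit; not the real register. */
struct toy_hw_lock {
	uint32_t bit0;	/* 0 = free, 1 = owned */
};

/* Model of a register read: return the old value and set the bit. */
static uint32_t toy_lock_read(struct toy_hw_lock *l)
{
	uint32_t old = l->bit0;

	l->bit0 = 1;
	return old;
}

/* Model of the owner's release: writing 0 frees the lock. */
static void toy_lock_write0(struct toy_hw_lock *l)
{
	l->bit0 = 0;
}

/* Poll up to max_tries times; 0 on success, -1 if still busy. */
static int toy_mdio_lock(struct toy_hw_lock *l, int max_tries)
{
	int i;

	for (i = 0; i < max_tries; i++) {
		if ((toy_lock_read(l) & 1u) == 0)
			return 0;	/* read returned 0: we now own it */
		/* a real driver would delay here before retrying */
	}
	return -1;
}
```

In this model a second caller keeps reading 1 until the owner writes 0, which is the behaviour the polling loop in the quoted al_eth_mdio_lock() relies on.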
Re: [PATCH net-next v7 05/10] landlock: Add LSM hooks related to filesystem
On 26/08/2017 03:16, Alexei Starovoitov wrote: > On Fri, Aug 25, 2017 at 10:16:39AM +0200, Mickaël Salaün wrote: >>> +/* a directory inode contains only one dentry */ +HOOK_NEW_FS(inode_create, 3, + struct inode *, dir, + struct dentry *, dentry, + umode_t, mode, + WRAP_ARG_INODE, dir, + WRAP_ARG_RAW, LANDLOCK_ACTION_FS_WRITE +); >>> >>> more general question: why you're not wrapping all useful >>> arguments? Like in the above dentry can be acted upon >>> by the landlock rule and it's readily available... >> >> The context used for the FS event must have the exact same types for all >> calls. This event is meant to be generic but we can add more specific >> ones if needed, like I do with FS_IOCTL. > > I see. So all FS events will have dentry as first argument > regardless of how it is in LSM hook ? All FS events will have a const struct bpf_handle_fs pointer as first argument, which wrap either a struct file, a struct dentry, a struct path or a struct inode. Having only one type (struct bpf_handle_fs) is needed for the eBPF type checker to verify if a Landlock rule (tied to an event) can access a context field and which operation is allowed (with this pointer). > I guess that will simplify the rules indeed. > I suspect you're doing it to simplify the LSM->landlock shim layer as well, > right? That's right. This ABI is independent from the LSM API and much more simpler to use. > >> The idea is to enable people to write simple rules, while being able to >> write fine grain rules for special cases (e.g. IOCTL) if needed. >> >>> >>> The limitation of only 2 args looks odd. >>> Is it a hard limitation ? how hard to extend? >> >> It's not a hard limit at all. Actually, the FS_FNCTL event should have >> three arguments (I'll add them in the next series): FS handle, FCNTL >> command and FCNTL argument. I made sure that it's really easy to add >> more arguments to the context of an event. 
>
> The reason I'm asking, because I'm not completely convinced that
> adding another argument to existing event will be backwards compatible.
> It looks like you're expecting only two args for all FS events, right?

There are four events right now: FS, FS_IOCTL, FS_LOCK and FS_FCNTL.
Each of them is independent. Their context fields can differ in eBPF
type (e.g. scalar, file handle) and in number. Actually, these four
events have the same arg1 field (file handle) and the same arg2 eBPF
type (scalar), even if arg2 does not have the same semantics (i.e.
abstract FS action, IOCTL command…).

For example, if we want to extend the FS_FCNTL's context in the future,
we will just have to add an arg3. The check is performed in
landlock_is_valid_access() and landlock_decide(). If a field is not used
by an event, then this field will have a NOT_INIT type and accessing it
will be denied.

> How can you add 3rd argument? All FS events would have to get it,
> but in some LSM hooks such argument will be meaningless, whereas
> in other places it will carry useful info that rule can operate on.
> Would that mean that we'll have FS_3 event type and only few LSM
> hooks will be converted to it. That works, but then we'll lose
> compatiblity with old rules written for FS event and that given hook.
> Otherwise we'd need to have fancy logic to accept old FS event
> into FS_3 LSM hook.

If we want to add a third argument to the FS event, then it will become
accessible because its type will be different than NOT_INIT. This keeps
compatibility with old rules because access to this new field was
previously denied.

If we want to add a new argument but only for a subset of the hooks used
by the FS event, then we need to create a new event, like FS_FCNTL. For
example, we may want to add a FS_RENAME event to be able to tie the
source file and the destination file of a rename call.

Anyway, I added the subtype/ABI version as a safeguard in case of
unexpected future evolution.
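To make the NOT_INIT argument above concrete, here is a minimal userspace sketch of a per-event field table. It is not the Landlock implementation: every `demo_` name is hypothetical, only NOT_INIT is taken from the thread, and the type of the third FS_FCNTL field is assumed to be a scalar.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical field types; only the NOT_INIT idea comes from the thread. */
enum demo_arg_type {
	DEMO_NOT_INIT = 0,
	DEMO_FS_HANDLE,
	DEMO_SCALAR,
};

#define DEMO_MAX_ARGS 3

/* Per-event description of the context fields. */
struct demo_event {
	const char *name;
	enum demo_arg_type args[DEMO_MAX_ARGS];
};

/* FS event: file handle plus abstract FS action, no third field. */
static const struct demo_event demo_fs = {
	.name = "FS",
	.args = { DEMO_FS_HANDLE, DEMO_SCALAR, DEMO_NOT_INIT },
};

/* FS_FCNTL after the planned extension: handle, command, argument
 * (the argument type is an assumption made for this sketch). */
static const struct demo_event demo_fs_fcntl = {
	.name = "FS_FCNTL",
	.args = { DEMO_FS_HANDLE, DEMO_SCALAR, DEMO_SCALAR },
};

/* A rule may read field idx only if the event initialises it. */
static bool demo_is_valid_access(const struct demo_event *ev, size_t idx)
{
	if (idx >= DEMO_MAX_ARGS)
		return false;
	return ev->args[idx] != DEMO_NOT_INIT;
}
```

In this model, a rule written against the two-field FS event never touches index 2, so turning that slot from NOT_INIT into a real type later cannot change what such a rule is allowed to read.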
[PATCH v4 4/7] dpaa_eth: enable Rx hashing control
Allow ethtool control of the Rx flow hashing. By default RSS is enabled, this allows to turn it off by bypassing the FMan Keygen block and sending all traffic on the default Rx frame queue. Signed-off-by: Madalin Bucur--- drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c | 113 + 1 file changed, 113 insertions(+) diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c index aad825088..965f652 100644 --- a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c @@ -399,6 +399,117 @@ static void dpaa_get_strings(struct net_device *net_dev, u32 stringset, memcpy(strings, dpaa_stats_global, size); } +static int dpaa_get_hash_opts(struct net_device *dev, + struct ethtool_rxnfc *cmd) +{ + cmd->data = 0; + + switch (cmd->flow_type) { + case TCP_V4_FLOW: + case TCP_V6_FLOW: + case UDP_V4_FLOW: + case UDP_V6_FLOW: + cmd->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3; + /* Fall through */ + case IPV4_FLOW: + case IPV6_FLOW: + case SCTP_V4_FLOW: + case SCTP_V6_FLOW: + case AH_ESP_V4_FLOW: + case AH_ESP_V6_FLOW: + case AH_V4_FLOW: + case AH_V6_FLOW: + case ESP_V4_FLOW: + case ESP_V6_FLOW: + cmd->data |= RXH_IP_SRC | RXH_IP_DST; + break; + default: + cmd->data = 0; + break; + } + + return 0; +} + +static int dpaa_get_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd, + u32 *unused) +{ + int ret = -EOPNOTSUPP; + + switch (cmd->cmd) { + case ETHTOOL_GRXFH: + ret = dpaa_get_hash_opts(dev, cmd); + break; + default: + break; + } + + return ret; +} + +static void dpaa_set_hash(struct net_device *net_dev, bool enable) +{ + struct mac_device *mac_dev; + struct fman_port *rxport; + struct dpaa_priv *priv; + + priv = netdev_priv(net_dev); + mac_dev = priv->mac_dev; + rxport = mac_dev->port[0]; + + fman_port_use_kg_hash(rxport, enable); +} + +static int dpaa_set_hash_opts(struct net_device *dev, + struct ethtool_rxnfc *nfc) +{ + int ret = -EINVAL; + + /* we support hashing on 
IPv4/v6 src/dest IP and L4 src/dest port */ + if (nfc->data & + ~(RXH_IP_SRC | RXH_IP_DST | RXH_L4_B_0_1 | RXH_L4_B_2_3)) + return -EINVAL; + + switch (nfc->flow_type) { + case TCP_V4_FLOW: + case TCP_V6_FLOW: + case UDP_V4_FLOW: + case UDP_V6_FLOW: + case IPV4_FLOW: + case IPV6_FLOW: + case SCTP_V4_FLOW: + case SCTP_V6_FLOW: + case AH_ESP_V4_FLOW: + case AH_ESP_V6_FLOW: + case AH_V4_FLOW: + case AH_V6_FLOW: + case ESP_V4_FLOW: + case ESP_V6_FLOW: + dpaa_set_hash(dev, !!nfc->data); + ret = 0; + break; + default: + break; + } + + return ret; +} + +static int dpaa_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd) +{ + int ret = -EOPNOTSUPP; + + switch (cmd->cmd) { + case ETHTOOL_SRXFH: + ret = dpaa_set_hash_opts(dev, cmd); + break; + default: + break; + } + + return ret; +} + const struct ethtool_ops dpaa_ethtool_ops = { .get_drvinfo = dpaa_get_drvinfo, .get_msglevel = dpaa_get_msglevel, @@ -412,4 +523,6 @@ const struct ethtool_ops dpaa_ethtool_ops = { .get_strings = dpaa_get_strings, .get_link_ksettings = dpaa_get_link_ksettings, .set_link_ksettings = dpaa_set_link_ksettings, + .get_rxnfc = dpaa_get_rxnfc, + .set_rxnfc = dpaa_set_rxnfc, }; -- 2.1.0
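The field check in dpaa_set_hash_opts() above boils down to a mask test followed by an on/off decision. The sketch below reproduces that logic in plain C; the `DEMO_` flag values are stand-ins chosen for this example and are not the kernel's RXH_* values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag values for this sketch; the real RXH_* flags are
 * defined by the kernel's ethtool UAPI header. */
#define DEMO_RXH_IP_SRC   (UINT64_C(1) << 0)
#define DEMO_RXH_IP_DST   (UINT64_C(1) << 1)
#define DEMO_RXH_L4_B_0_1 (UINT64_C(1) << 2)
#define DEMO_RXH_L4_B_2_3 (UINT64_C(1) << 3)
#define DEMO_RXH_OTHER    (UINT64_C(1) << 4) /* a field we cannot hash on */

#define DEMO_RXH_SUPPORTED (DEMO_RXH_IP_SRC | DEMO_RXH_IP_DST | \
			    DEMO_RXH_L4_B_0_1 | DEMO_RXH_L4_B_2_3)

/* True when the request names only supported fields. */
static bool demo_hash_fields_valid(uint64_t data)
{
	return (data & ~DEMO_RXH_SUPPORTED) == 0;
}

/* Mirrors !!nfc->data: any selected field enables hashing,
 * an empty selection disables it. */
static bool demo_hash_enable(uint64_t data)
{
	return data != 0;
}
```

The flags are built from 64-bit constants on purpose: the complement then also covers the upper bits of a 64-bit data word, so an unknown high bit is rejected as well.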
[PATCH v4 0/7] Add RSS to DPAA 1.x Ethernet driver
This patch set introduces Receive Side Scaling for the DPAA Ethernet
driver. Documentation is updated with details related to the new
feature and the limitations that apply. A small fix is also included.

v2: removed a C++ style comment
v3: move struct fman to header file to avoid exporting a function
v4: addressed compilation issues introduced in v3

Iordache Florinel-R70177 (1):
  fsl/fman: enable FMan Keygen

Madalin Bucur (6):
  fsl/fman: move struct fman to header file
  dpaa_eth: use multiple Rx frame queues
  dpaa_eth: enable Rx hashing control
  dpaa_eth: add NETIF_F_RXHASH
  Documentation: networking: add RSS information
  dpaa_eth: check allocation result

 Documentation/networking/dpaa.txt | 68 +-
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 76 +-
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h | 2 +
 .../net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c | 3 +
 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c | 118
 drivers/net/ethernet/freescale/fman/Makefile | 2 +-
 drivers/net/ethernet/freescale/fman/fman.c | 88 +--
 drivers/net/ethernet/freescale/fman/fman.h | 77 ++
 drivers/net/ethernet/freescale/fman/fman_keygen.c | 783 +
 drivers/net/ethernet/freescale/fman/fman_keygen.h | 46 ++
 drivers/net/ethernet/freescale/fman/fman_port.c | 59 +-
 drivers/net/ethernet/freescale/fman/fman_port.h | 7 +
 12 files changed, 1235 insertions(+), 94 deletions(-)
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_keygen.c
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_keygen.h

--
2.1.0
[PATCH v4 3/7] dpaa_eth: use multiple Rx frame queues
Add a block of 128 Rx frame queues per port. The FMan hardware will send traffic on one of these queues based on the FMan port Parse Classify Distribute setup. The hash computed by the FMan Keygen block will select the Rx FQ. Signed-off-by: Madalin Bucur--- drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 50 +++--- drivers/net/ethernet/freescale/dpaa/dpaa_eth.h | 1 + .../net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c | 3 ++ 3 files changed, 47 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c index c7fa285..6d89e74 100644 --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c @@ -158,7 +158,7 @@ MODULE_PARM_DESC(tx_timeout, "The Tx timeout in ms"); #define DPAA_RX_PRIV_DATA_SIZE (u16)(DPAA_TX_PRIV_DATA_SIZE + \ dpaa_rx_extra_headroom) -#define DPAA_ETH_RX_QUEUES 128 +#define DPAA_ETH_PCD_RXQ_NUM 128 #define DPAA_ENQUEUE_RETRIES 10 @@ -169,6 +169,7 @@ struct fm_port_fqs { struct dpaa_fq *tx_errq; struct dpaa_fq *rx_defq; struct dpaa_fq *rx_errq; + struct dpaa_fq *rx_pcdq; }; /* All the dpa bps in use at any moment */ @@ -628,6 +629,7 @@ static inline void dpaa_assign_wq(struct dpaa_fq *fq, int idx) fq->wq = 5; break; case FQ_TYPE_RX_DEFAULT: + case FQ_TYPE_RX_PCD: fq->wq = 6; break; case FQ_TYPE_TX: @@ -688,6 +690,7 @@ static int dpaa_alloc_all_fqs(struct device *dev, struct list_head *list, struct fm_port_fqs *port_fqs) { struct dpaa_fq *dpaa_fq; + u32 fq_base, fq_base_aligned, i; dpaa_fq = dpaa_fq_alloc(dev, 0, 1, list, FQ_TYPE_RX_ERROR); if (!dpaa_fq) @@ -701,6 +704,26 @@ static int dpaa_alloc_all_fqs(struct device *dev, struct list_head *list, port_fqs->rx_defq = _fq[0]; + /* the PCD FQIDs range needs to be aligned for correct operation */ + if (qman_alloc_fqid_range(_base, 2 * DPAA_ETH_PCD_RXQ_NUM)) + goto fq_alloc_failed; + + fq_base_aligned = ALIGN(fq_base, DPAA_ETH_PCD_RXQ_NUM); + + for (i = fq_base; i < fq_base_aligned; 
i++) + qman_release_fqid(i); + + for (i = fq_base_aligned + DPAA_ETH_PCD_RXQ_NUM; +i < (fq_base + 2 * DPAA_ETH_PCD_RXQ_NUM); i++) + qman_release_fqid(i); + + dpaa_fq = dpaa_fq_alloc(dev, fq_base_aligned, DPAA_ETH_PCD_RXQ_NUM, + list, FQ_TYPE_RX_PCD); + if (!dpaa_fq) + goto fq_alloc_failed; + + port_fqs->rx_pcdq = _fq[0]; + if (!dpaa_fq_alloc(dev, 0, DPAA_ETH_TXQ_NUM, list, FQ_TYPE_TX_CONF_MQ)) goto fq_alloc_failed; @@ -870,13 +893,14 @@ static void dpaa_fq_setup(struct dpaa_priv *priv, const struct dpaa_fq_cbs *fq_cbs, struct fman_port *tx_port) { - int egress_cnt = 0, conf_cnt = 0, num_portals = 0, cpu; + int egress_cnt = 0, conf_cnt = 0, num_portals = 0, portal_cnt = 0, cpu; const cpumask_t *affine_cpus = qman_affine_cpus(); - u16 portals[NR_CPUS]; + u16 channels[NR_CPUS]; struct dpaa_fq *fq; for_each_cpu(cpu, affine_cpus) - portals[num_portals++] = qman_affine_channel(cpu); + channels[num_portals++] = qman_affine_channel(cpu); + if (num_portals == 0) dev_err(priv->net_dev->dev.parent, "No Qman software (affine) channels found"); @@ -890,6 +914,12 @@ static void dpaa_fq_setup(struct dpaa_priv *priv, case FQ_TYPE_RX_ERROR: dpaa_setup_ingress(priv, fq, _cbs->rx_errq); break; + case FQ_TYPE_RX_PCD: + if (!num_portals) + continue; + dpaa_setup_ingress(priv, fq, _cbs->rx_defq); + fq->channel = channels[portal_cnt++ % num_portals]; + break; case FQ_TYPE_TX: dpaa_setup_egress(priv, fq, tx_port, _cbs->egress_ern); @@ -1039,7 +1069,8 @@ static int dpaa_fq_init(struct dpaa_fq *dpaa_fq, bool td_enable) /* Put all the ingress queues in our "ingress CGR". */ if (priv->use_ingress_cgr && (dpaa_fq->fq_type == FQ_TYPE_RX_DEFAULT || -dpaa_fq->fq_type == FQ_TYPE_RX_ERROR)) { +dpaa_fq->fq_type == FQ_TYPE_RX_ERROR || +dpaa_fq->fq_type == FQ_TYPE_RX_PCD)) { initfq.we_mask |= cpu_to_be16(QM_INITFQ_WE_CGID); initfq.fqd.fq_ctrl |= cpu_to_be16(QM_FQCTRL_CGE);
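The FQID handling in dpaa_alloc_all_fqs() above over-allocates twice the needed range, keeps the aligned block of 128 inside it and releases the rest. The arithmetic can be sketched in userspace as follows; `demo_fqid_plan` and the helper names are hypothetical, and the sketch assumes the queue count is a power of two, as 128 is.

```c
#include <stdint.h>

#define DEMO_PCD_RXQ_NUM 128u /* must be a power of two for the mask below */

struct demo_fqid_plan {
	uint32_t aligned_base;  /* first FQID that is kept */
	uint32_t released_low;  /* IDs released below the aligned block */
	uint32_t released_high; /* IDs released above the aligned block */
};

/* Round x up to the next multiple of a (a is a power of two). */
static uint32_t demo_align_up(uint32_t x, uint32_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Given the base of a range of 2 * DEMO_PCD_RXQ_NUM allocated IDs,
 * work out which aligned block is kept and how many IDs go back. */
static struct demo_fqid_plan demo_plan_fqids(uint32_t fq_base)
{
	struct demo_fqid_plan plan;
	uint32_t end = fq_base + 2 * DEMO_PCD_RXQ_NUM;

	plan.aligned_base = demo_align_up(fq_base, DEMO_PCD_RXQ_NUM);
	plan.released_low = plan.aligned_base - fq_base;
	plan.released_high = end - (plan.aligned_base + DEMO_PCD_RXQ_NUM);
	return plan;
}
```

Whatever the base is, the two released counts add up to 128, so exactly half of the over-allocated range is handed back. For a base of 300 the kept block starts at 384, with 84 IDs released below it and 44 above.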
[PATCH v4 2/7] fsl/fman: enable FMan Keygen
From: Iordache Florinel-R70177Add support for the FMan Keygen with a hardcoded scheme to spread incoming traffic on a FQ range based on source and destination IPs and ports. Signed-off-by: Iordache Florinel Signed-off-by: Madalin Bucur --- drivers/net/ethernet/freescale/fman/Makefile | 2 +- drivers/net/ethernet/freescale/fman/fman.c| 8 + drivers/net/ethernet/freescale/fman/fman.h| 2 + drivers/net/ethernet/freescale/fman/fman_keygen.c | 783 ++ drivers/net/ethernet/freescale/fman/fman_keygen.h | 46 ++ drivers/net/ethernet/freescale/fman/fman_port.c | 40 +- drivers/net/ethernet/freescale/fman/fman_port.h | 5 + 7 files changed, 884 insertions(+), 2 deletions(-) create mode 100644 drivers/net/ethernet/freescale/fman/fman_keygen.c create mode 100644 drivers/net/ethernet/freescale/fman/fman_keygen.h diff --git a/drivers/net/ethernet/freescale/fman/Makefile b/drivers/net/ethernet/freescale/fman/Makefile index 6049177..2c38119 100644 --- a/drivers/net/ethernet/freescale/fman/Makefile +++ b/drivers/net/ethernet/freescale/fman/Makefile @@ -4,6 +4,6 @@ obj-$(CONFIG_FSL_FMAN) += fsl_fman.o obj-$(CONFIG_FSL_FMAN) += fsl_fman_port.o obj-$(CONFIG_FSL_FMAN) += fsl_mac.o -fsl_fman-objs := fman_muram.o fman.o fman_sp.o +fsl_fman-objs := fman_muram.o fman.o fman_sp.o fman_keygen.o fsl_fman_port-objs := fman_port.o fsl_mac-objs:= mac.o fman_dtsec.o fman_memac.o fman_tgec.o diff --git a/drivers/net/ethernet/freescale/fman/fman.c b/drivers/net/ethernet/freescale/fman/fman.c index 8179cc1..f420dac 100644 --- a/drivers/net/ethernet/freescale/fman/fman.c +++ b/drivers/net/ethernet/freescale/fman/fman.c @@ -45,6 +45,7 @@ #include "fman.h" #include "fman_muram.h" +#include "fman_keygen.h" /* General defines */ #define FMAN_LIODN_TBL 64 /* size of LIODN table */ @@ -56,6 +57,7 @@ /* Modules registers offsets */ #define BMI_OFFSET 0x0008 #define QMI_OFFSET 0x00080400 +#define KG_OFFSET 0x000C1000 #define DMA_OFFSET 0x000C2000 #define FPM_OFFSET 0x000C3000 #define IMEM_OFFSET0x000C4000 @@ 
-1737,6 +1739,7 @@ static int fman_config(struct fman *fman) fman->qmi_regs = base_addr + QMI_OFFSET; fman->dma_regs = base_addr + DMA_OFFSET; fman->hwp_regs = base_addr + HWP_OFFSET; + fman->kg_regs = base_addr + KG_OFFSET; fman->base_addr = base_addr; spin_lock_init(>spinlock); @@ -2009,6 +2012,11 @@ static int fman_init(struct fman *fman) /* Init HW Parser */ hwp_init(fman->hwp_regs); + /* Init KeyGen */ + fman->keygen = keygen_init(fman->kg_regs); + if (!fman->keygen) + return -EINVAL; + err = enable(fman, cfg); if (err != 0) return err; diff --git a/drivers/net/ethernet/freescale/fman/fman.h b/drivers/net/ethernet/freescale/fman/fman.h index 1015dac..bfa02e0 100644 --- a/drivers/net/ethernet/freescale/fman/fman.h +++ b/drivers/net/ethernet/freescale/fman/fman.h @@ -328,6 +328,7 @@ struct fman { struct fman_qmi_regs __iomem *qmi_regs; struct fman_dma_regs __iomem *dma_regs; struct fman_hwp_regs __iomem *hwp_regs; + struct fman_kg_regs __iomem *kg_regs; fman_exceptions_cb *exception_cb; fman_bus_error_cb *bus_error_cb; /* Spinlock for FMan use */ @@ -336,6 +337,7 @@ struct fman { struct fman_cfg *cfg; struct muram_info *muram; + struct fman_keygen *keygen; /* cam section in muram */ unsigned long cam_offset; size_t cam_size; diff --git a/drivers/net/ethernet/freescale/fman/fman_keygen.c b/drivers/net/ethernet/freescale/fman/fman_keygen.c new file mode 100644 index 000..f54da3c --- /dev/null +++ b/drivers/net/ethernet/freescale/fman/fman_keygen.c @@ -0,0 +1,783 @@ +/* + * Copyright 2017 NXP + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. 
+ * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of NXP nor the + * names of its contributors may be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * + * ALTERNATIVELY, this software may be distributed under the terms of the + * GNU General Public License ("GPL") as published by the Free Software + * Foundation, either version 2 of that License or (at your option) any + * later
[PATCH v4 1/7] fsl/fman: move struct fman to header file
Signed-off-by: Madalin Bucur--- drivers/net/ethernet/freescale/fman/fman.c | 80 + drivers/net/ethernet/freescale/fman/fman.h | 75 +++ drivers/net/ethernet/freescale/fman/fman_port.c | 8 +-- 3 files changed, 82 insertions(+), 81 deletions(-) diff --git a/drivers/net/ethernet/freescale/fman/fman.c b/drivers/net/ethernet/freescale/fman/fman.c index e714b8f..8179cc1 100644 --- a/drivers/net/ethernet/freescale/fman/fman.c +++ b/drivers/net/ethernet/freescale/fman/fman.c @@ -32,9 +32,6 @@ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt -#include "fman.h" -#include "fman_muram.h" - #include #include #include @@ -46,6 +43,9 @@ #include #include +#include "fman.h" +#include "fman_muram.h" + /* General defines */ #define FMAN_LIODN_TBL 64 /* size of LIODN table */ #define MAX_NUM_OF_MACS10 @@ -564,80 +564,6 @@ struct fman_cfg { u32 qmi_def_tnums_thresh; }; -/* Structure that holds information received from device tree */ -struct fman_dts_params { - void __iomem *base_addr;/* FMan virtual address */ - struct resource *res; /* FMan memory resource */ - u8 id; /* FMan ID */ - - int err_irq;/* FMan Error IRQ */ - - u16 clk_freq; /* FMan clock freq (In Mhz) */ - - u32 qman_channel_base; /* QMan channels base */ - u32 num_of_qman_channels; /* Number of QMan channels */ - - struct resource muram_res; /* MURAM resource */ -}; - -/** fman_exceptions_cb - * fman- Pointer to FMan - * exception - The exception. - * - * Exceptions user callback routine, will be called upon an exception - * passing the exception identification. - * - * Return: irq status - */ -typedef irqreturn_t (fman_exceptions_cb)(struct fman *fman, -enum fman_exceptions exception); - -/** fman_bus_error_cb - * fman- Pointer to FMan - * port_id - Port id - * addr- Address that caused the error - * tnum- Owner of error - * liodn - Logical IO device number - * - * Bus error user callback routine, will be called upon bus error, - * passing parameters describing the errors and the owner. 
- * - * Return: IRQ status - */ -typedef irqreturn_t (fman_bus_error_cb)(struct fman *fman, u8 port_id, - u64 addr, u8 tnum, u16 liodn); - -struct fman { - struct device *dev; - void __iomem *base_addr; - struct fman_intr_src intr_mng[FMAN_EV_CNT]; - - struct fman_fpm_regs __iomem *fpm_regs; - struct fman_bmi_regs __iomem *bmi_regs; - struct fman_qmi_regs __iomem *qmi_regs; - struct fman_dma_regs __iomem *dma_regs; - struct fman_hwp_regs __iomem *hwp_regs; - fman_exceptions_cb *exception_cb; - fman_bus_error_cb *bus_error_cb; - /* Spinlock for FMan use */ - spinlock_t spinlock; - struct fman_state_struct *state; - - struct fman_cfg *cfg; - struct muram_info *muram; - /* cam section in muram */ - unsigned long cam_offset; - size_t cam_size; - /* Fifo in MURAM */ - unsigned long fifo_offset; - size_t fifo_size; - - u32 liodn_base[64]; - u32 liodn_offset[64]; - - struct fman_dts_params dts_params; -}; - static irqreturn_t fman_exceptions(struct fman *fman, enum fman_exceptions exception) { diff --git a/drivers/net/ethernet/freescale/fman/fman.h b/drivers/net/ethernet/freescale/fman/fman.h index f53e147..1015dac 100644 --- a/drivers/net/ethernet/freescale/fman/fman.h +++ b/drivers/net/ethernet/freescale/fman/fman.h @@ -34,6 +34,8 @@ #define __FM_H #include +#include +#include /* FM Frame descriptor macros */ /* Frame queue Context Override */ @@ -274,6 +276,79 @@ struct fman_intr_src { void *src_handle; }; +/** fman_exceptions_cb + * fman - Pointer to FMan + * exception- The exception. + * + * Exceptions user callback routine, will be called upon an exception + * passing the exception identification. 
+ * + * Return: irq status + */ +typedef irqreturn_t (fman_exceptions_cb)(struct fman *fman, +enum fman_exceptions exception); +/** fman_bus_error_cb + * fman - Pointer to FMan + * port_id - Port id + * addr - Address that caused the error + * tnum - Owner of error + * liodn- Logical IO device number + * + * Bus error user callback routine, will be called upon bus error, + * passing parameters describing the errors and the owner. + * + * Return: IRQ status + */ +typedef irqreturn_t (fman_bus_error_cb)(struct fman *fman, u8 port_id, + u64 addr, u8
[PATCH v4 6/7] Documentation: networking: add RSS information
Signed-off-by: Madalin Bucur--- Documentation/networking/dpaa.txt | 68 ++- 1 file changed, 67 insertions(+), 1 deletion(-) diff --git a/Documentation/networking/dpaa.txt b/Documentation/networking/dpaa.txt index 76e016d..f88194f 100644 --- a/Documentation/networking/dpaa.txt +++ b/Documentation/networking/dpaa.txt @@ -13,6 +13,7 @@ Contents - Configuring DPAA Ethernet in your kernel - DPAA Ethernet Frame Processing - DPAA Ethernet Features + - DPAA IRQ Affinity and Receive Side Scaling - Debugging DPAA Ethernet Overview @@ -147,7 +148,10 @@ gradually. The driver has Rx and Tx checksum offloading for UDP and TCP. Currently the Rx checksum offload feature is enabled by default and cannot be controlled through -ethtool. +ethtool. Also, rx-flow-hash and rx-hashing was added. The addition of RSS +provides a big performance boost for the forwarding scenarios, allowing +different traffic flows received by one interface to be processed by different +CPUs in parallel. The driver has support for multiple prioritized Tx traffic classes. Priorities range from 0 (lowest) to 3 (highest). These are mapped to HW workqueues with @@ -166,6 +170,68 @@ classes as follows: tc qdisc add dev root handle 1: \ mqprio num_tc 4 map 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 hw 1 +DPAA IRQ Affinity and Receive Side Scaling +== + +Traffic coming on the DPAA Rx queues or on the DPAA Tx confirmation +queues is seen by the CPU as ingress traffic on a certain portal. +The DPAA QMan portal interrupts are affined each to a certain CPU. +The same portal interrupt services all the QMan portal consumers. + +By default the DPAA Ethernet driver enables RSS, making use of the +DPAA FMan Parser and Keygen blocks to distribute traffic on 128 +hardware frame queues using a hash on IP v4/v6 source and destination +and L4 source and destination ports, in present in the received frame. +When RSS is disabled, all traffic received by a certain interface is +received on the default Rx frame queue. 
The default DPAA Rx frame +queues are configured to put the received traffic into a pool channel +that allows any available CPU portal to dequeue the ingress traffic. +The default frame queues have the HOLDACTIVE option set, ensuring that +traffic bursts from a certain queue are serviced by the same CPU. +This ensures a very low rate of frame reordering. A drawback of this +is that only one CPU at a time can service the traffic received by a +certain interface when RSS is not enabled. + +To implement RSS, the DPAA Ethernet driver allocates an extra set of +128 Rx frame queues that are configured to dedicated channels, in a +round-robin manner. The mapping of the frame queues to CPUs is now +hardcoded, there is no indirection table to move traffic for a certain +FQ (hash result) to another CPU. The ingress traffic arriving on one +of these frame queues will arrive at the same portal and will always +be processed by the same CPU. This ensures intra-flow order preservation +and workload distribution for multiple traffic flows. + +RSS can be turned off for a certain interface using ethtool, i.e. + + # ethtool -N fm1-mac9 rx-flow-hash tcp4 "" + +To turn it back on, one needs to set rx-flow-hash for tcp4/6 or udp4/6: + + # ethtool -N fm1-mac9 rx-flow-hash udp4 sfdn + +There is no independent control for individual protocols, any command +run for one of tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 is +going to control the rx-flow-hashing for all protocols on that interface. + +Besides using the FMan Keygen computed hash for spreading traffic on the +128 Rx FQs, the DPAA Ethernet driver also sets the skb hash value when +the NETIF_F_RXHASH feature is on (active by default). 
This can be turned +on or off through ethtool, i.e.: + + # ethtool -K fm1-mac9 rx-hashing off + # ethtool -k fm1-mac9 | grep hash + receive-hashing: off + # ethtool -K fm1-mac9 rx-hashing on + Actual changes: + receive-hashing: on + # ethtool -k fm1-mac9 | grep hash + receive-hashing: on + +Please note that Rx hashing depends upon the rx-flow-hashing being on +for that interface - turning off rx-flow-hashing will also disable the +rx-hashing (without ethtool reporting it as off as that depends on the +NETIF_F_RXHASH feature flag). + Debugging = -- 2.1.0
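The round-robin mapping of the 128 Rx frame queues to dedicated channels, as described in the documentation text above, can be illustrated with a small userspace sketch. The function name and the channel numbers used with it are made up for illustration; the driver does the equivalent inside dpaa_fq_setup().

```c
#include <stddef.h>
#include <stdint.h>

/* Assign each of num_fqs frame queues to a channel in round-robin
 * order. Returns the number of queues assigned, which is 0 when no
 * channel is available. */
static size_t demo_assign_channels(const uint16_t *channels,
				   size_t num_channels,
				   uint16_t *fq_channel, size_t num_fqs)
{
	size_t i;

	if (num_channels == 0)
		return 0;

	for (i = 0; i < num_fqs; i++)
		fq_channel[i] = channels[i % num_channels];

	return num_fqs;
}
```

With four channels and 128 queues, each channel ends up serving 32 queues, and a given queue always lands on the same channel. That fixed mapping is what the text means by the FQ to CPU mapping being hardcoded, with no indirection table.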
[PATCH v4 5/7] dpaa_eth: add NETIF_F_RXHASH
Set the skb hash when then FMan Keygen hash result is available. Signed-off-by: Madalin Bucur--- drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 23 +++--- drivers/net/ethernet/freescale/dpaa/dpaa_eth.h | 1 + drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c | 9 +++-- drivers/net/ethernet/freescale/fman/fman_port.c| 11 +++ drivers/net/ethernet/freescale/fman/fman_port.h| 2 ++ 5 files changed, 41 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c index 6d89e74..73ca8d7 100644 --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c @@ -236,7 +236,7 @@ static int dpaa_netdev_init(struct net_device *net_dev, net_dev->max_mtu = dpaa_get_max_mtu(); net_dev->hw_features |= (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM | -NETIF_F_LLTX); +NETIF_F_LLTX | NETIF_F_RXHASH); net_dev->hw_features |= NETIF_F_SG | NETIF_F_HIGHDMA; /* The kernels enables GSO automatically, if we declare NETIF_F_SG. 
@@ -2237,12 +2237,13 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct qman_portal *portal, dma_addr_t addr = qm_fd_addr(fd); enum qm_fd_format fd_format; struct net_device *net_dev; - u32 fd_status; + u32 fd_status, hash_offset; struct dpaa_bp *dpaa_bp; struct dpaa_priv *priv; unsigned int skb_len; struct sk_buff *skb; int *count_ptr; + void *vaddr; fd_status = be32_to_cpu(fd->status); fd_format = qm_fd_get_format(fd); @@ -2288,7 +2289,8 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct qman_portal *portal, dma_unmap_single(dpaa_bp->dev, addr, dpaa_bp->size, DMA_FROM_DEVICE); /* prefetch the first 64 bytes of the frame or the SGT start */ - prefetch(phys_to_virt(addr) + qm_fd_get_offset(fd)); + vaddr = phys_to_virt(addr); + prefetch(vaddr + qm_fd_get_offset(fd)); fd_format = qm_fd_get_format(fd); /* The only FD types that we may receive are contig and S/G */ @@ -2309,6 +2311,18 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct qman_portal *portal, skb->protocol = eth_type_trans(skb, net_dev); + if (net_dev->features & NETIF_F_RXHASH && priv->keygen_in_use && + !fman_port_get_hash_result_offset(priv->mac_dev->port[RX], + _offset)) { + enum pkt_hash_types type; + + /* if L4 exists, it was used in the hash generation */ + type = be32_to_cpu(fd->status) & FM_FD_STAT_L4CV ? 
+ PKT_HASH_TYPE_L4 : PKT_HASH_TYPE_L3; + skb_set_hash(skb, be32_to_cpu(*(u32 *)(vaddr + hash_offset)), +type); + } + skb_len = skb->len; if (unlikely(netif_receive_skb(skb) == NET_RX_DROP)) @@ -2774,6 +2788,9 @@ static int dpaa_eth_probe(struct platform_device *pdev) if (err) goto init_ports_failed; + /* Rx traffic distribution based on keygen hashing defaults to on */ + priv->keygen_in_use = true; + priv->percpu_priv = devm_alloc_percpu(dev, *priv->percpu_priv); if (!priv->percpu_priv) { dev_err(dev, "devm_alloc_percpu() failed\n"); diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h index 496a12c..bd94220 100644 --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h @@ -159,6 +159,7 @@ struct dpaa_priv { struct list_head dpaa_fq_list; u8 num_tc; + bool keygen_in_use; u32 msg_enable; /* net_device message level */ struct { diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c index 965f652..faea674 100644 --- a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c @@ -402,6 +402,8 @@ static void dpaa_get_strings(struct net_device *net_dev, u32 stringset, static int dpaa_get_hash_opts(struct net_device *dev, struct ethtool_rxnfc *cmd) { + struct dpaa_priv *priv = netdev_priv(dev); + cmd->data = 0; switch (cmd->flow_type) { @@ -409,7 +411,8 @@ static int dpaa_get_hash_opts(struct net_device *dev, case TCP_V6_FLOW: case UDP_V4_FLOW: case UDP_V6_FLOW: - cmd->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3; + if (priv->keygen_in_use) + cmd->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3; /* Fall through */ case IPV4_FLOW: case IPV6_FLOW: @@ -421,7 +424,8 @@ static int dpaa_get_hash_opts(struct net_device *dev,
[PATCH v4 7/7] dpaa_eth: check allocation result
Signed-off-by: Madalin Bucur
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 73ca8d7..4225806 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2561,6 +2561,9 @@ static struct dpaa_bp *dpaa_bp_alloc(struct device *dev)
 	dpaa_bp->bpid = FSL_DPAA_BPID_INV;
 	dpaa_bp->percpu_count = devm_alloc_percpu(dev, *dpaa_bp->percpu_count);
+	if (!dpaa_bp->percpu_count)
+		return ERR_PTR(-ENOMEM);
+
 	dpaa_bp->config_count = FSL_DPAA_ETH_MAX_BUF_COUNT;

 	dpaa_bp->seed_cb = dpaa_bp_seed;
--
2.1.0
Re: Stable apply request [was: Bluetooth: bnep: fix possible might sleep error in bnep_session]
On Wed, Aug 23, 2017 at 08:14:15PM +0200, Marcel Holtmann wrote:
> Hi Jiri,
>
> >>> It looks like bnep_session has same pattern as the issue reported in
> >>> old rfcomm:
> >>>
> >>> while (1) {
> >>>         set_current_state(TASK_INTERRUPTIBLE);
> >>>         if (condition)
> >>>                 break;
> >>>         // may call might_sleep here
> >>>         schedule();
> >>> }
> >>> __set_current_state(TASK_RUNNING);
> >>>
> >>> Which fixed at:
> >>> dfb2fae Bluetooth: Fix nested sleeps
> >>>
> >>> So let's fix it at the same way, also follow the suggestion of:
> >>> https://lwn.net/Articles/628628/
> >
> > ...
> >
> >> all 3 patches have been applied to bluetooth-next tree.
> >
> > Hi,
> >
> > given users are hitting it in at least 4.4 and 4.12, can we have all
> > three in all stables where this applies?
> >
> > 5da8e47d849d Bluetooth: hidp: fix possible might sleep error in
> > hidp_session_thread
> > f06d977309d0 Bluetooth: cmtp: fix possible might sleep error in cmtp_session
> > 25717382c1dd Bluetooth: bnep: fix possible might sleep error in bnep_session
> >
> > I am not sure: to stable directly or via net stable?
>
> as Dave said, just email -stable directly and have Greg pick them up.

All now picked up :)
Re: [PATCH 0/4] net: stmmac: revert the EMAC bindings
On Sat, Aug 26, 2017 at 3:12 AM, Maxime Ripard wrote:
> Hi,
>
> The bindings of the stmmac glue for the new Allwinner EMAC controller
> are still controversial and being discussed, even though they've been
> merged in 4.13.
>
> In order not to introduce any binding we do not really want to commit
> to in a stable release, especially since that would mean we would have
> to support both the right and old bindings, let's revert them.
>
> We will reintroduce them in due time, once the discussion has settled
> down.
>
> The first three patches should go through the arm-soc tree, the last
> one through the net tree. All of them must be treated as fixes.
>
> Thanks!
> Maxime
>
> Maxime Ripard (4):
>   dt-bindings: net: Revert sun8i dwmac binding
>   arm64: dts: allwinner: Revert EMAC changes
>   arm: dts: sunxi: Revert EMAC changes
>   net: stmmac: sun8i: Remove the compatibles
>
>  .../devicetree/bindings/net/dwmac-sun8i.txt| 84 --
>  arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero.dts | 9 ---
>  arch/arm/boot/dts/sun8i-h3-bananapi-m2-plus.dts| 19 -
>  arch/arm/boot/dts/sun8i-h3-beelink-x2.dts | 8 ---

I think this particular change is in -next, not v4.13-rc.

Otherwise, whole series is

Acked-by: Chen-Yu Tsai

>  arch/arm/boot/dts/sun8i-h3-nanopi-neo.dts | 7 --
>  arch/arm/boot/dts/sun8i-h3-orangepi-2.dts | 8 ---
>  arch/arm/boot/dts/sun8i-h3-orangepi-one.dts| 8 ---
>  arch/arm/boot/dts/sun8i-h3-orangepi-pc-plus.dts| 5 --
>  arch/arm/boot/dts/sun8i-h3-orangepi-pc.dts | 8 ---
>  arch/arm/boot/dts/sun8i-h3-orangepi-plus.dts | 22 --
>  arch/arm/boot/dts/sun8i-h3-orangepi-plus2e.dts | 16 -
>  arch/arm/boot/dts/sunxi-h3-h5.dtsi | 26 ---
>  .../boot/dts/allwinner/sun50i-a64-bananapi-m64.dts | 17 -
>  .../boot/dts/allwinner/sun50i-a64-pine64-plus.dts | 15
>  .../arm64/boot/dts/allwinner/sun50i-a64-pine64.dts | 18 -
>  .../dts/allwinner/sun50i-a64-sopine-baseboard.dts | 17 -
>  arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi | 20 --
>  .../boot/dts/allwinner/sun50i-h5-nanopi-neo2.dts | 17 -
>  .../boot/dts/allwinner/sun50i-h5-orangepi-pc2.dts | 17 -
>  .../dts/allwinner/sun50i-h5-orangepi-prime.dts | 17 -
>  drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 8 ---
>  21 files changed, 366 deletions(-)
>  delete mode 100644 Documentation/devicetree/bindings/net/dwmac-sun8i.txt
>
> --
> 2.13.5
>
[PATCH] DSA support for Micrel KSZ8895
Hi!

So I fought with the driver a bit more, and now I have something that
kind-of-works.

The "great great hack" below worries me. Yeah, disabled code needs to be
removed before merge. No, the tag_ksz part probably is not acceptable. Do
you see a solution better than just copying it into a tag_ksz1 file?

Any more comments, etc? Help would be welcome.

Best regards,
								Pavel

Signed-off-by: Pavel Machek

diff --git a/drivers/net/dsa/microchip/Kconfig b/drivers/net/dsa/microchip/Kconfig
index a8b8f59099ce..7b7d7ddb3488 100644
--- a/drivers/net/dsa/microchip/Kconfig
+++ b/drivers/net/dsa/microchip/Kconfig
@@ -1,12 +1,25 @@
 menuconfig MICROCHIP_KSZ
-	tristate "Microchip KSZ series switch support"
+	tristate "Microchip KSZ 9477 series switch support"
+	depends on NET_DSA
+	select NET_DSA_TAG_KSZ
+	help
+	  This driver adds support for Microchip KSZ switch chips.
+
+menuconfig MICROCHIP_KSZ_8895
+	tristate "Microchip KSZ 8895 series switch support"
 	depends on NET_DSA
 	select NET_DSA_TAG_KSZ
 	help
 	  This driver adds support for Microchip KSZ switch chips.
 
 config MICROCHIP_KSZ_SPI_DRIVER
-	tristate "KSZ series SPI connected switch driver"
+	tristate "KSZ 9477 series SPI connected switch driver"
 	depends on MICROCHIP_KSZ && SPI
 	help
 	  Select to enable support for registering switches configured through SPI.
+
+config MICROCHIP_KSZ_8895_SPI_DRIVER
+	tristate "KSZ 8895 series SPI connected switch driver"
+	depends on MICROCHIP_KSZ_8895 && SPI
+	help
+	  Select to enable support for registering switches configured through SPI.
diff --git a/drivers/net/dsa/microchip/Makefile b/drivers/net/dsa/microchip/Makefile
index ed335e29fae8..b6a17f79d2d9 100644
--- a/drivers/net/dsa/microchip/Makefile
+++ b/drivers/net/dsa/microchip/Makefile
@@ -1,2 +1,4 @@
 obj-$(CONFIG_MICROCHIP_KSZ)	+= ksz_common.o
+obj-$(CONFIG_MICROCHIP_KSZ_8895)	+= ksz_8895.o
 obj-$(CONFIG_MICROCHIP_KSZ_SPI_DRIVER)	+= ksz_spi.o
+obj-$(CONFIG_MICROCHIP_KSZ_8895_SPI_DRIVER)	+= ksz_8895_spi.o
diff --git a/drivers/net/dsa/microchip/ksz_8895.c b/drivers/net/dsa/microchip/ksz_8895.c
new file mode 100644
index ..d546e08b1281
--- /dev/null
+++ b/drivers/net/dsa/microchip/ksz_8895.c
@@ -0,0 +1,721 @@
+/*
+ * Microchip switch driver main logic
+ *
+ * Copyright (C) 2017
+ * Copyright (C) 2017 Pavel Machek
+ *
+ * Permission to use, copy, modify, and/or distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "ksz_8895_reg.h"
+#include "ksz_priv.h"
+
+static const struct {
+	int index;
+	char string[ETH_GSTRING_LEN];
+} mib_names[TOTAL_SWITCH_COUNTER_NUM] = {
+	{ 0x00, "???" },
+};
+
+static void ksz_cfg(struct ksz_device *dev, u32 addr, u8 bits, bool set)
+{
+	u8 data;
+
+	ksz_read8(dev, addr, &data);
+	if (set)
+		data |= bits;
+	else
+		data &= ~bits;
+	ksz_write8(dev, addr, data);
+}
+
+#if 0
+static void ksz_cfg32(struct ksz_device *dev, u32 addr, u32 bits, bool set)
+{
+	u32 data;
+
+	ksz_read32(dev, addr, &data);
+	if (set)
+		data |= bits;
+	else
+		data &= ~bits;
+	ksz_write32(dev, addr, data);
+}
+#endif
+
+static void ksz_port_cfg(struct ksz_device *dev, int port, int offset, u8 bits,
+			 bool set)
+{
+	u32 addr;
+	u8 data;
+
+	addr = PORT_CTRL_ADDR(port, offset);
+	ksz_read8(dev, addr, &data);
+
+	if (set)
+		data |= bits;
+	else
+		data &= ~bits;
+
+	ksz_write8(dev, addr, data);
+}
+
+#if 0
+static void ksz_port_cfg32(struct ksz_device *dev, int port, int offset,
+			   u32 bits, bool set)
+{
+	u32 addr;
+	u32 data;
+
+	addr = PORT_CTRL_ADDR(port, offset);
+	ksz_read32(dev, addr, &data);
+
+	if (set)
+		data |= bits;
+	else
+		data &= ~bits;
+
+	ksz_write32(dev, addr, data);
+}
+#endif
+
+#define NOTIMPL() do { NOTIMPLV(); return -EJUKEBOX; }
[PATCH net-next 2/4] net/mlx5: Add SRIOV VGT+ support
From: Mohamad Haj YahiaImplementing the VGT+ feature via acl tables. The acl tables will hold the actual needed rules which is only the intersection of the requested vlan-ids list and the allowed vlan-ids list from the administrator. Signed-off-by: Mohamad Haj Yahia Signed-off-by: Eugenia Emantayev Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 28 ++ drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 496 +- drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 31 +- drivers/net/ethernet/mellanox/mlx5/core/vport.c | 19 +- include/linux/mlx5/vport.h| 6 +- 5 files changed, 458 insertions(+), 122 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index fdc2b92f020b..1a2ebe0e79ae 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -3388,6 +3388,32 @@ static int mlx5e_set_vf_vlan(struct net_device *dev, int vf, u16 vlan, u8 qos, vlan, qos); } +static int mlx5e_add_vf_vlan_trunk_range(struct net_device *dev, int vf, +u16 start_vid, u16 end_vid, +__be16 vlan_proto) { + struct mlx5e_priv *priv = netdev_priv(dev); + struct mlx5_core_dev *mdev = priv->mdev; + + if (vlan_proto != htons(ETH_P_8021Q)) + return -EPROTONOSUPPORT; + + return mlx5_eswitch_add_vport_trunk_range(mdev->priv.eswitch, vf + 1, + start_vid, end_vid); +} + +static int mlx5e_del_vf_vlan_trunk_range(struct net_device *dev, int vf, +u16 start_vid, u16 end_vid, +__be16 vlan_proto) { + struct mlx5e_priv *priv = netdev_priv(dev); + struct mlx5_core_dev *mdev = priv->mdev; + + if (vlan_proto != htons(ETH_P_8021Q)) + return -EPROTONOSUPPORT; + + return mlx5_eswitch_del_vport_trunk_range(mdev->priv.eswitch, vf + 1, + start_vid, end_vid); +} + static int mlx5e_set_vf_spoofchk(struct net_device *dev, int vf, bool setting) { struct mlx5e_priv *priv = netdev_priv(dev); @@ -3733,6 +3759,8 @@ static const struct net_device_ops 
mlx5e_netdev_ops = { /* SRIOV E-Switch NDOs */ .ndo_set_vf_mac = mlx5e_set_vf_mac, .ndo_set_vf_vlan = mlx5e_set_vf_vlan, + .ndo_add_vf_vlan_trunk_range = mlx5e_add_vf_vlan_trunk_range, + .ndo_del_vf_vlan_trunk_range = mlx5e_del_vf_vlan_trunk_range, .ndo_set_vf_spoofchk = mlx5e_set_vf_spoofchk, .ndo_set_vf_trust= mlx5e_set_vf_trust, .ndo_set_vf_rate = mlx5e_set_vf_rate, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c index 6b84c1113301..a8e8670c7c8d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c @@ -60,12 +60,14 @@ struct vport_addr { enum { UC_ADDR_CHANGE = BIT(0), MC_ADDR_CHANGE = BIT(1), + VLAN_CHANGE= BIT(2), PROMISC_CHANGE = BIT(3), }; /* Vport context events */ #define SRIOV_VPORT_EVENTS (UC_ADDR_CHANGE | \ MC_ADDR_CHANGE | \ + VLAN_CHANGE | \ PROMISC_CHANGE) static int arm_vport_context_events_cmd(struct mlx5_core_dev *dev, u16 vport, @@ -681,6 +683,45 @@ static void esw_update_vport_addr_list(struct mlx5_eswitch *esw, kfree(mac_list); } +static void esw_update_acl_trunk_bitmap(struct mlx5_eswitch *esw, u32 vport_num) +{ + struct mlx5_vport *vport = >vports[vport_num]; + + bitmap_and(vport->acl_vlan_8021q_bitmap, vport->req_vlan_bitmap, + vport->info.vlan_trunk_8021q_bitmap, VLAN_N_VID); +} + +static int esw_vport_egress_config(struct mlx5_eswitch *esw, + struct mlx5_vport *vport); +static int esw_vport_ingress_config(struct mlx5_eswitch *esw, + struct mlx5_vport *vport); + +/* Sync vport vlan list from vport context */ +static void esw_update_vport_vlan_list(struct mlx5_eswitch *esw, u32 vport_num) +{ + struct mlx5_vport *vport = >vports[vport_num]; + DECLARE_BITMAP(prev_vlans_bitmap, VLAN_N_VID); + int err; + + bitmap_copy(prev_vlans_bitmap, vport->req_vlan_bitmap, VLAN_N_VID); + bitmap_zero(vport->req_vlan_bitmap, VLAN_N_VID); + + if (!vport->enabled) + return; + + err = mlx5_query_nic_vport_vlans(esw->dev, vport_num, 
vport->req_vlan_bitmap); + if (err) + return; + +
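The commit message above says the ACL tables hold only the intersection of the vlan-ids the VF requested and the vlan-ids the administrator allowed. A minimal user-space sketch of that intersection follows. It is not the mlx5 code: the helper name vlan_bitmap_and and the plain uint64_t arrays are assumptions made for illustration, standing in for the kernel's bitmap_and() on VLAN_N_VID bits.

```c
#include <stddef.h>
#include <stdint.h>

#define VLAN_N_VID 4096                 /* number of 802.1Q vlan-ids */
#define VLAN_WORDS (VLAN_N_VID / 64)    /* 64-bit words needed for one bit per id */

/*
 * Hypothetical helper: acl = requested & allowed.
 * Returns how many vlan-ids remain set in acl, i.e. how many
 * "allow" rules an ACL table would need under this scheme.
 */
static unsigned int vlan_bitmap_and(uint64_t *acl,
				    const uint64_t *requested,
				    const uint64_t *allowed)
{
	unsigned int weight = 0;

	for (size_t i = 0; i < VLAN_WORDS; i++) {
		uint64_t w = requested[i] & allowed[i];

		acl[i] = w;
		/* Count set bits by clearing the lowest one each round. */
		for (; w; w &= w - 1)
			weight++;
	}
	return weight;
}
```

A vlan-id that the VF asks for but the administrator has not put in the trunk never reaches the result, so no rule is created for it.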
[PATCH net-next 0/4] SRIOV VF VGT+ and violation counters support
Hi Dave,

This series provides two security-related SRIOV features (VGT+ and VF
violation counters).

VGT+ is a security feature that gives the administrator the ability to
control the list of allowed VGT vlan IDs that can be transmitted/received
from/to the VF. The allowed VGT vlan IDs list is called "trunk".

Admin can add/remove a range of allowed vlan-ids via the ip tool:

ip link set { DEVICE } [ vf NUM [ trunk { add | rem } START-VLAN-ID [ END-VLAN-ID ] [ proto VLAN-PROTO ] ] ]

Example: after this series of configuration commands:
1) ip link set eth3 vf 0 trunk add 10 100 (allow vlan-id 10-100, default tpid 0x8100)
2) ip link set eth3 vf 0 trunk add 105 proto 802.1q (allow vlan-id 105 tpid 0x8100)
3) ip link set eth3 vf 0 trunk add 105 proto 802.1ad (allow vlan-id 105 tpid 0x88a8)
4) ip link set eth3 vf 0 trunk rem 90 (block vlan-id 90)
5) ip link set eth3 vf 0 trunk rem 50 60 (block vlan-ids 50-60)

VF 0 can only communicate on vlan-ids 10-49,61-89,91-100,105 with tpid
0x8100 and vlan-id 105 with tpid 0x88a8.

For this purpose the following net_device callbacks were added:

int (*ndo_add_vf_vlan_trunk_range)(struct net_device *dev, int vf,
				   u16 start_vid, u16 end_vid,
				   __be16 proto);
int (*ndo_del_vf_vlan_trunk_range)(struct net_device *dev, int vf,
				   u16 start_vid, u16 end_vid,
				   __be16 proto);

This feature is implemented and demonstrated in mlx5 via ACL steering
tables and vlan rules attached to the VF's corresponding E-Switch vport.

In addition to VGT+ we introduce a new set of counters in the VF
statistics, to count traffic violating VF ACL rules (such as VGT+
violations). For that we extend the current ifla_vf_stats to include
rx_dropped/tx_dropped, reported per VF.

Example:
> ip link set eth3 vf 0 trunk add 10 100

If VF 0 then transmits 2412 packets on a vlan-id outside the [10,100]
range, they will be dropped and reported in the hypervisor via:

> ip -s link show dev enp5s0f0
6: enp5s0f0: mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
[...]
    vf 0 MAC 00:00:ca:fe:ca:fe, vlan 5, spoof checking off, link-state auto, trust off, query_rss off
    RX: bytes  packets  mcast   bcast   dropped
    1666       29       14      32      0
    TX: bytes  packets  dropped
    2880       44       2412

Thanks,
Saeed.

Eugenia Emantayev (2):
  net/core: Add violation counters to VF statisctics
  net/mlx5e: E-switch, Add steering drop counters

Mohamad Haj Yahia (2):
  net: Add SRIOV VGT+ support
  net/mlx5: Add SRIOV VGT+ support

 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  28 +
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  | 589 +
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  31 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h  |   2 +
 .../net/ethernet/mellanox/mlx5/core/fs_counters.c  |   6 +
 drivers/net/ethernet/mellanox/mlx5/core/vport.c    |  19 +-
 include/linux/if_link.h                            |   4 +
 include/linux/mlx5/vport.h                         |   6 +-
 include/linux/netdevice.h                          |  12 +
 include/uapi/linux/if_link.h                       |  22 +
 net/core/rtnetlink.c                               | 119 +++--
 11 files changed, 681 insertions(+), 157 deletions(-)

-- 
2.13.0
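The trunk example in the cover letter (add 10-100, add 105, remove 90, remove 50-60) can be followed with a small bitmap model. The sketch below is only an illustration of the add/rem range semantics for a single tpid. The names vf_trunk, trunk_set_range and trunk_allows are made up for this example and are not part of the series or of any kernel API.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VLAN_N_VID  4096                /* number of 802.1Q vlan-ids */
#define TRUNK_WORDS (VLAN_N_VID / 64)

/* Hypothetical per-VF trunk for one tpid: one bit per allowed vlan-id. */
struct vf_trunk {
	uint64_t allowed[TRUNK_WORDS];
};

static void trunk_init(struct vf_trunk *t)
{
	memset(t, 0, sizeof(*t));
}

/*
 * Allow (add == true) or block (add == false) the inclusive range
 * [start, end], mirroring "trunk add START END" / "trunk rem START END".
 * Returns 0 on success, -1 on an invalid range.
 */
static int trunk_set_range(struct vf_trunk *t, uint16_t start, uint16_t end,
			   bool add)
{
	if (start > end || end >= VLAN_N_VID)
		return -1;

	for (unsigned int vid = start; vid <= end; vid++) {
		uint64_t mask = UINT64_C(1) << (vid % 64);

		if (add)
			t->allowed[vid / 64] |= mask;
		else
			t->allowed[vid / 64] &= ~mask;
	}
	return 0;
}

/* True if traffic tagged with vid may pass for this VF. */
static bool trunk_allows(const struct vf_trunk *t, uint16_t vid)
{
	if (vid >= VLAN_N_VID)
		return false;
	return (t->allowed[vid / 64] >> (vid % 64)) & 1;
}
```

Replaying steps 1, 2, 4 and 5 of the example on one trunk with these helpers leaves exactly 10-49, 61-89, 91-100 and 105 allowed, which matches the list given for tpid 0x8100. The 802.1ad entry from step 3 would live in a second, separate trunk.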
[PATCH net-next 1/4] net: Add SRIOV VGT+ support
From: Mohamad Haj Yahia

VGT+ is a security feature that gives the administrator the ability of
controlling the allowed vlan-ids list that can be transmitted/received
from/to the VF.
The allowed vlan-ids list is called "trunk".
Admin can add/remove a range of allowed vlan-ids via iptool.

Example: After this series of configuration :
1) ip link set eth3 vf 0 trunk add 10 100 (allow vlan-id 10-100, default tpid 0x8100)
2) ip link set eth3 vf 0 trunk add 105 proto 802.1q (allow vlan-id 105 tpid 0x8100)
3) ip link set eth3 vf 0 trunk add 105 proto 802.1ad (allow vlan-id 105 tpid 0x88a8)
4) ip link set eth3 vf 0 trunk rem 90 (block vlan-id 90)
5) ip link set eth3 vf 0 trunk rem 50 60 (block vlan-ids 50-60)

The VF 0 can only communicate on vlan-ids: 10-49,61-89,91-100,105 with
tpid 0x8100 and vlan-id 105 with tpid 0x88a8.

For this purpose we added the following netlink sr-iov commands:

1) IFLA_VF_VLAN_RANGE: used to add/remove allowed vlan-ids range.
We added the ifla_vf_vlan_range struct to specify the range we want to
add/remove from the userspace.
We added ndo_add_vf_vlan_trunk_range and ndo_del_vf_vlan_trunk_range
netdev ops to add/remove allowed vlan-ids range in the netdev.

2) IFLA_VF_VLAN_TRUNK: used to query the allowed vlan-ids trunk.
We added trunk bitmap to the ifla_vf_info struct to get the current
allowed vlan-ids trunk from the netdev.
We added ifla_vf_vlan_trunk struct for sending the allowed vlan-ids
trunk to the userspace.

Signed-off-by: Mohamad Haj Yahia Signed-off-by: Eugenia Emantayev Signed-off-by: Saeed Mahameed --- include/linux/if_link.h | 2 + include/linux/netdevice.h| 12 + include/uapi/linux/if_link.h | 20 net/core/rtnetlink.c | 109 +++ 4 files changed, 114 insertions(+), 29 deletions(-) diff --git a/include/linux/if_link.h b/include/linux/if_link.h index 0b17c585b5cd..da70af27e42e 100644 --- a/include/linux/if_link.h +++ b/include/linux/if_link.h @@ -25,6 +25,8 @@ struct ifla_vf_info { __u32 max_tx_rate; __u32 rss_query_en; __u32 trusted; + __u64 trunk_8021q[VF_VLAN_BITMAP]; + __u64 trunk_8021ad[VF_VLAN_BITMAP]; __be16 vlan_proto; }; #endif /* _LINUX_IF_LINK_H */ diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index c5475b37a631..10633cabc58f 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -959,6 +959,10 @@ struct xfrmdev_ops { * Hash Key. This is needed since on some devices VF share this information * with PF and querying it may introduce a theoretical security risk. 
* int (*ndo_set_vf_rss_query_en)(struct net_device *dev, int vf, bool setting); + * int (*ndo_add_vf_vlan_trunk_range)(struct net_device *dev, int vf, + * u16 start_vid, u16 end_vid, __be16 proto); + * int (*ndo_del_vf_vlan_trunk_range)(struct net_device *dev, int vf, + * u16 start_vid, u16 end_vid, __be16 proto); * int (*ndo_get_vf_port)(struct net_device *dev, int vf, struct sk_buff *skb); * int (*ndo_setup_tc)(struct net_device *dev, enum tc_setup_type type, *void *type_data); @@ -1208,6 +1212,14 @@ struct net_device_ops { int (*ndo_set_vf_rss_query_en)( struct net_device *dev, int vf, bool setting); + int (*ndo_add_vf_vlan_trunk_range)( + struct net_device *dev, + int vf, u16 start_vid, + u16 end_vid, __be16 proto); + int (*ndo_del_vf_vlan_trunk_range)( + struct net_device *dev, + int vf, u16 start_vid, + u16 end_vid, __be16 proto); int (*ndo_setup_tc)(struct net_device *dev, enum tc_setup_type type, void *type_data); diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 8d062c58d5cb..3aa895c5fbc1 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -168,6 +168,8 @@ enum { #ifndef __KERNEL__ #define IFLA_RTA(r) ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct ifinfomsg #define IFLA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct ifinfomsg)) +#define BITS_PER_BYTE 8 +#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d)) #endif enum { @@ -645,6 +647,8 @@ enum { IFLA_VF_IB_NODE_GUID, /* VF Infiniband node GUID */ IFLA_VF_IB_PORT_GUID, /* VF Infiniband port GUID */
[PATCH net-next 4/4] net/mlx5e: E-switch, Add steering drop counters
From: Eugenia EmantayevAdd flow counters to count packets dropped due to drop rules configured in eswitch egress and ingress ACLs. These counters will count VFs violations and incoming traffic drops. Will be presented on hypervisor via standard 'ip -s link show' command. Example: "ip -s link show dev enp5s0f0" 6: enp5s0f0: mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000 link/ether 24:8a:07:a5:28:f0 brd ff:ff:ff:ff:ff:ff RX: bytes packets errors dropped overrun mcast 0 00 0 0 2 TX: bytes packets errors dropped carrier collsns 1406 17 0 0 0 0 vf 0 MAC 00:00:ca:fe:ca:fe, vlan 5, spoof checking off, link-state auto, trust off, query_rss off RX: bytes packets mcast bcast dropped 1666 29 14 32 0 TX: bytes packets dropped 2880 44 2412 Signed-off-by: Eugenia Emantayev Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 97 -- drivers/net/ethernet/mellanox/mlx5/core/fs_core.h | 2 + .../net/ethernet/mellanox/mlx5/core/fs_counters.c | 6 ++ 3 files changed, 98 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c index a8e8670c7c8d..6c992e43e397 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c @@ -37,6 +37,7 @@ #include #include "mlx5_core.h" #include "eswitch.h" +#include "fs_core.h" #define UPLINK_VPORT 0x @@ -1007,8 +1008,14 @@ static void esw_vport_cleanup_egress_rules(struct mlx5_eswitch *esw, kfree(trunk_vlan_rule); } - if (!IS_ERR_OR_NULL(vport->egress.drop_rule)) + if (!IS_ERR_OR_NULL(vport->egress.drop_rule)) { + struct mlx5_fc *drop_counter = + mlx5_flow_rule_counter(vport->egress.drop_rule); + mlx5_del_flow_rules(vport->egress.drop_rule); + if (drop_counter) + mlx5_fc_destroy(vport->dev, drop_counter); + } if (!IS_ERR_OR_NULL(vport->egress.allow_untagged_rule)) mlx5_del_flow_rules(vport->egress.allow_untagged_rule); @@ -1174,8 +1181,14 @@ static void 
esw_vport_cleanup_ingress_rules(struct mlx5_eswitch *esw, { struct mlx5_acl_vlan *trunk_vlan_rule, *tmp; - if (!IS_ERR_OR_NULL(vport->ingress.drop_rule)) + if (!IS_ERR_OR_NULL(vport->ingress.drop_rule)) { + struct mlx5_fc *drop_counter = + mlx5_flow_rule_counter(vport->ingress.drop_rule); + mlx5_del_flow_rules(vport->ingress.drop_rule); + if (drop_counter) + mlx5_fc_destroy(vport->dev, drop_counter); + } list_for_each_entry_safe(trunk_vlan_rule, tmp, >ingress.allowed_vlans_rules, list) { @@ -1222,6 +1235,8 @@ static int esw_vport_ingress_config(struct mlx5_eswitch *esw, bool need_vlan_filter = !!bitmap_weight(vport->info.vlan_trunk_8021q_bitmap, VLAN_N_VID); struct mlx5_acl_vlan *trunk_vlan_rule; + struct mlx5_flow_destination dest; + struct mlx5_fc *counter = NULL; struct mlx5_flow_act flow_act = {0}; struct mlx5_flow_spec *spec; bool need_acl_table = true; @@ -1333,18 +1348,33 @@ static int esw_vport_ingress_config(struct mlx5_eswitch *esw, } drop_rule: + /* Alloc ingress drop flow counter */ + counter = mlx5_fc_create(esw->dev, false); + if (IS_ERR(counter)) { + esw_warn(esw->dev, +"vport[%d] configure ingress drop rule counter failed\n", +vport->vport); + counter = NULL; + } else { + dest.type = MLX5_FLOW_DESTINATION_TYPE_COUNTER; + dest.counter = counter; + } + + /* Drop others rule (star rule) */ memset(spec, 0, sizeof(*spec)); flow_act.action = MLX5_FLOW_CONTEXT_ACTION_DROP; + if (counter) + flow_act.action |= MLX5_FLOW_CONTEXT_ACTION_COUNT; vport->ingress.drop_rule = - mlx5_add_flow_rules(vport->ingress.acl, spec, - _act, NULL, 0); + mlx5_add_flow_rules(vport->ingress.acl, spec, _act, , 1); if (IS_ERR(vport->ingress.drop_rule)) { err = PTR_ERR(vport->ingress.drop_rule); esw_warn(esw->dev, "vport[%d] configure ingress drop rule, err(%d)\n", vport->vport, err); vport->ingress.drop_rule = NULL; - goto out; + if (counter) +
[PATCH net-next 3/4] net/core: Add violation counters to VF statisctics
From: Eugenia Emantayev

Add receive and transmit violation counters to be displayed in
iproute2 VF statistics.

Signed-off-by: Eugenia Emantayev
Signed-off-by: Saeed Mahameed
---
 include/linux/if_link.h      |  2 ++
 include/uapi/linux/if_link.h |  2 ++
 net/core/rtnetlink.c         | 10 +-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index da70af27e42e..ebf3448acb5b 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -12,6 +12,8 @@ struct ifla_vf_stats {
 	__u64 tx_bytes;
 	__u64 broadcast;
 	__u64 multicast;
+	__u64 rx_dropped;
+	__u64 tx_dropped;
 };
 
 struct ifla_vf_info {
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 3aa895c5fbc1..68cd31b281a1 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -743,6 +743,8 @@ enum {
 	IFLA_VF_STATS_BROADCAST,
 	IFLA_VF_STATS_MULTICAST,
 	IFLA_VF_STATS_PAD,
+	IFLA_VF_STATS_RX_DROPPED,
+	IFLA_VF_STATS_TX_DROPPED,
 	__IFLA_VF_STATS_MAX,
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 56909f11d88e..1a653bb00d6e 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -845,6 +845,10 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev,
 			 nla_total_size_64bit(sizeof(__u64)) +
 			 /* IFLA_VF_STATS_MULTICAST */
 			 nla_total_size_64bit(sizeof(__u64)) +
+			 /* IFLA_VF_STATS_RX_DROPPED */
+			 nla_total_size_64bit(sizeof(__u64)) +
+			 /* IFLA_VF_STATS_TX_DROPPED */
+			 nla_total_size_64bit(sizeof(__u64)) +
 			 nla_total_size(sizeof(struct ifla_vf_trust)));
 		return size;
 	} else
@@ -1214,7 +1218,11 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
 	    nla_put_u64_64bit(skb, IFLA_VF_STATS_BROADCAST,
 			      vf_stats.broadcast, IFLA_VF_STATS_PAD) ||
 	    nla_put_u64_64bit(skb, IFLA_VF_STATS_MULTICAST,
-			      vf_stats.multicast, IFLA_VF_STATS_PAD)) {
+			      vf_stats.multicast, IFLA_VF_STATS_PAD) ||
+	    nla_put_u64_64bit(skb, IFLA_VF_STATS_RX_DROPPED,
+			      vf_stats.rx_dropped, IFLA_VF_STATS_PAD) ||
+	    nla_put_u64_64bit(skb, IFLA_VF_STATS_TX_DROPPED,
+			      vf_stats.tx_dropped, IFLA_VF_STATS_PAD)) {
 		nla_nest_cancel(skb, vfstats);
 		goto nla_put_vf_failure;
 	}
-- 
2.13.0
Re: [PATCH net] bridge: check for null fdb->dst before notifying switchdev drivers
On 08/27/2017 07:13 AM, Roopa Prabhu wrote: > From: Roopa Prabhu> > current switchdev drivers dont seem to support offloading fdb > entries pointing to the bridge device which have fdb->dst > not set to any port. This patch adds a NULL fdb->dst check in > the switchdev notifier code. > > This patch fixes the below NULL ptr dereference: > $bridge fdb add 00:02:00:00:00:33 dev br0 self > > [ 69.953374] BUG: unable to handle kernel NULL pointer dereference at > 0008 > [ 69.954044] IP: br_switchdev_fdb_notify+0x29/0x80 > [ 69.954044] PGD 66527067 > [ 69.954044] P4D 66527067 > [ 69.954044] PUD 7899c067 > [ 69.954044] PMD 0 > [ 69.954044] > [ 69.954044] Oops: [#1] SMP > [ 69.954044] Modules linked in: > [ 69.954044] CPU: 1 PID: 3074 Comm: bridge Not tainted 4.13.0-rc6+ #1 > [ 69.954044] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org > 04/01/2014 > [ 69.954044] task: 88007b827140 task.stack: c90001564000 > [ 69.954044] RIP: 0010:br_switchdev_fdb_notify+0x29/0x80 > [ 69.954044] RSP: 0018:c90001567918 EFLAGS: 00010246 > [ 69.954044] RAX: RBX: 8800795e0880 RCX: > 00c0 > [ 69.954044] RDX: c90001567920 RSI: 001c RDI: > 8800795d0600 > [ 69.954044] RBP: c90001567938 R08: 8800795d0600 R09: > > [ 69.954044] R10: c90001567a88 R11: 88007b849400 R12: > 8800795e0880 > [ 69.954044] R13: 8800795d0600 R14: 81ef8880 R15: > 001c > [ 69.954044] FS: 7f93d3085700() GS:88007fd0() > knlGS: > [ 69.954044] CS: 0010 DS: ES: CR0: 80050033 > [ 69.954044] CR2: 0008 CR3: 66551000 CR4: > 06e0 > [ 69.954044] Call Trace: > [ 69.954044] fdb_notify+0x3f/0xf0 > [ 69.954044] __br_fdb_add.isra.12+0x1a7/0x370 > [ 69.954044] br_fdb_add+0x178/0x280 > [ 69.954044] rtnl_fdb_add+0x10a/0x200 > [ 69.954044] rtnetlink_rcv_msg+0x1b4/0x240 > [ 69.954044] ? skb_free_head+0x21/0x40 > [ 69.954044] ? 
rtnl_calcit.isra.18+0xf0/0xf0 > [ 69.954044] netlink_rcv_skb+0xed/0x120 > [ 69.954044] rtnetlink_rcv+0x15/0x20 > [ 69.954044] netlink_unicast+0x180/0x200 > [ 69.954044] netlink_sendmsg+0x291/0x370 > [ 69.954044] ___sys_sendmsg+0x180/0x2e0 > [ 69.954044] ? filemap_map_pages+0x2db/0x370 > [ 69.954044] ? do_wp_page+0x11d/0x420 > [ 69.954044] ? __handle_mm_fault+0x794/0xd80 > [ 69.954044] ? vma_link+0xcb/0xd0 > [ 69.954044] __sys_sendmsg+0x4c/0x90 > [ 69.954044] SyS_sendmsg+0x12/0x20 > [ 69.954044] do_syscall_64+0x63/0xe0 > [ 69.954044] entry_SYSCALL64_slow_path+0x25/0x25 > [ 69.954044] RIP: 0033:0x7f93d2bad690 > [ 69.954044] RSP: 002b:7ffc7217a638 EFLAGS: 0246 ORIG_RAX: > 002e > [ 69.954044] RAX: ffda RBX: 7ffc72182eac RCX: > 7f93d2bad690 > [ 69.954044] RDX: RSI: 7ffc7217a670 RDI: > 0003 > [ 69.954044] RBP: 59a1f7f8 R08: 0006 R09: > 000a > [ 69.954044] R10: 7ffc7217a400 R11: 0246 R12: > 7ffc7217a670 > [ 69.954044] R13: 7ffc72182a98 R14: 006114c0 R15: > 7ffc72182aa0 > [ 69.954044] Code: 1f 00 66 66 66 66 90 55 48 89 e5 48 83 ec 20 f6 47 > 20 04 74 0a 83 fe 1c 74 09 83 fe 1d 74 2c c9 66 90 c3 48 8b 47 10 48 8d > 55 e8 <48> 8b 70 08 0f b7 47 1e 48 83 c7 18 48 89 7d f0 bf 03 00 00 00 > [ 69.954044] RIP: br_switchdev_fdb_notify+0x29/0x80 RSP: > c90001567918 > [ 69.954044] CR2: 0008 > [ 69.954044] ---[ end trace 03e9eec4a82c238b ]--- > > Fixes: 6b26b51b1d13 ("net: bridge: Add support for notifying devices about > FDB add/del") > Signed-off-by: Roopa Prabhu > --- > net/bridge/br_switchdev.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c > index 181a44d..f6b1c7d 100644 > --- a/net/bridge/br_switchdev.c > +++ b/net/bridge/br_switchdev.c > @@ -115,7 +115,7 @@ br_switchdev_fdb_call_notifiers(bool adding, const > unsigned char *mac, > void > br_switchdev_fdb_notify(const struct net_bridge_fdb_entry *fdb, int type) > { > - if (!fdb->added_by_user) > + if (!fdb->added_by_user || !fdb->dst) > return; > > 
> 	switch (type) {
>

Thanks, missed that.

Arkadi
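For readers skimming the oops: the one-line fix amounts to refusing to notify when the entry has no destination port. A stand-alone sketch of that guard follows. The struct layouts and the helper name fdb_should_notify are simplified stand-ins invented for this example, not the bridge code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the bridge structures (assumptions). */
struct port {
	int ifindex;
};

struct fdb_entry {
	bool added_by_user;
	const struct port *dst;   /* NULL when the entry points at the bridge itself */
};

/*
 * Hypothetical predicate mirroring the patched early return:
 * only user-added entries that point at a real port are passed on,
 * so nothing downstream dereferences a NULL dst.
 */
static bool fdb_should_notify(const struct fdb_entry *fdb)
{
	return fdb->added_by_user && fdb->dst != NULL;
}
```

An entry created with "bridge fdb add ... dev br0 self" corresponds to added_by_user set and dst NULL, which this predicate rejects.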
Re: [Intel-wired-lan] [PATCH] e1000e: apply burst mode settings only on default
On 8/27/2017 11:32, Neftin, Sasha wrote: On 8/27/2017 11:30, Neftin, Sasha wrote: On 8/25/2017 18:06, Willem de Bruijn wrote: From: Willem de BruijnDevices that support FLAG2_DMA_BURST have different default values for RDTR and RADV. Apply burst mode default settings only when no explicit value was passed at module load. The RDTR default is zero. If the module is loaded for low latency operation with RxIntDelay=0, do not override this value with a burst default of 32. Move the decision to apply burst values earlier, where explicitly initialized module variables can be distinguished from defaults. Signed-off-by: Willem de Bruijn --- drivers/net/ethernet/intel/e1000e/e1000.h | 4 drivers/net/ethernet/intel/e1000e/netdev.c | 8 drivers/net/ethernet/intel/e1000e/param.c | 16 +++- 3 files changed, 15 insertions(+), 13 deletions(-) diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h b/drivers/net/ethernet/intel/e1000e/e1000.h index 98e6abb1..2311b31bdcac 100644 --- a/drivers/net/ethernet/intel/e1000e/e1000.h +++ b/drivers/net/ethernet/intel/e1000e/e1000.h @@ -94,10 +94,6 @@ struct e1000_info; */ #define E1000_CHECK_RESET_COUNT25 -#define DEFAULT_RDTR0 -#define DEFAULT_RADV8 -#define BURST_RDTR0x20 -#define BURST_RADV0x20 #define PCICFG_DESC_RING_STATUS0xe4 #define FLUSH_DESC_REQUIRED0x100 diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index 327dfe5bedc0..47b89aac7969 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -3223,14 +3223,6 @@ static void e1000_configure_rx(struct e1000_adapter *adapter) */ ew32(RXDCTL(0), E1000_RXDCTL_DMA_BURST_ENABLE); ew32(RXDCTL(1), E1000_RXDCTL_DMA_BURST_ENABLE); - -/* override the delay timers for enabling bursting, only if - * the value was not set by the user via module options - */ -if (adapter->rx_int_delay == DEFAULT_RDTR) -adapter->rx_int_delay = BURST_RDTR; -if (adapter->rx_abs_int_delay == DEFAULT_RADV) 
-adapter->rx_abs_int_delay = BURST_RADV; } /* set the Receive Delay Timer Register */ diff --git a/drivers/net/ethernet/intel/e1000e/param.c b/drivers/net/ethernet/intel/e1000e/param.c index 6d8c39abee16..bb696c98f9b0 100644 --- a/drivers/net/ethernet/intel/e1000e/param.c +++ b/drivers/net/ethernet/intel/e1000e/param.c @@ -73,17 +73,25 @@ E1000_PARAM(TxAbsIntDelay, "Transmit Absolute Interrupt Delay"); /* Receive Interrupt Delay in units of 1.024 microseconds * hardware will likely hang if you set this to anything but zero. * + * Burst variant is used as default if device has FLAG2_DMA_BURST. + * * Valid Range: 0-65535 */ E1000_PARAM(RxIntDelay, "Receive Interrupt Delay"); +#define DEFAULT_RDTR0 +#define BURST_RDTR0x20 #define MAX_RXDELAY 0x #define MIN_RXDELAY 0 /* Receive Absolute Interrupt Delay in units of 1.024 microseconds + * + * Burst variant is used as default if device has FLAG2_DMA_BURST. * * Valid Range: 0-65535 */ E1000_PARAM(RxAbsIntDelay, "Receive Absolute Interrupt Delay"); +#define DEFAULT_RADV8 +#define BURST_RADV0x20 #define MAX_RXABSDELAY 0x #define MIN_RXABSDELAY 0 @@ -297,6 +305,9 @@ void e1000e_check_options(struct e1000_adapter *adapter) .max = MAX_RXDELAY } } }; +if (adapter->flags2 & FLAG2_DMA_BURST) +opt.def = BURST_RDTR; + if (num_RxIntDelay > bd) { adapter->rx_int_delay = RxIntDelay[bd]; e1000_validate_option(>rx_int_delay, , @@ -307,7 +318,7 @@ void e1000e_check_options(struct e1000_adapter *adapter) } /* Receive Absolute Interrupt Delay */ { -static const struct e1000_option opt = { +static struct e1000_option opt = { .type = range_option, .name = "Receive Absolute Interrupt Delay", .err = "using default of " @@ -317,6 +328,9 @@ void e1000e_check_options(struct e1000_adapter *adapter) .max = MAX_RXABSDELAY } } }; +if (adapter->flags2 & FLAG2_DMA_BURST) +opt.def = BURST_RADV; + if (num_RxAbsIntDelay > bd) { adapter->rx_abs_int_delay = RxAbsIntDelay[bd]; e1000_validate_option(>rx_abs_int_delay, , This patch looks good for me, but I 
would like hear second opinion. ___ Intel-wired-lan mailing list intel-wired-...@osuosl.org https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
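The behaviour change in the patch can be shown in isolation. In the sketch below, legacy_rx_int_delay models the removed code in e1000_configure_rx(), which could not tell an explicit RxIntDelay=0 from the default, and pick_rx_int_delay models the new approach of choosing the default before the user value is applied. Both function names and the user_set flag are assumptions for illustration. The constants are the ones from the patch, and range validation is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEFAULT_RDTR 0
#define BURST_RDTR   0x20

/*
 * Model of the removed logic: any value equal to DEFAULT_RDTR is
 * treated as "not set" and replaced, even if the user asked for 0.
 */
static uint32_t legacy_rx_int_delay(bool has_dma_burst, uint32_t configured)
{
	if (has_dma_burst && configured == DEFAULT_RDTR)
		return BURST_RDTR;
	return configured;
}

/*
 * Model of the patched logic: the burst value only changes the
 * default; an explicit module parameter always wins.
 */
static uint32_t pick_rx_int_delay(bool has_dma_burst, bool user_set,
				  uint32_t user_val)
{
	uint32_t def = has_dma_burst ? BURST_RDTR : DEFAULT_RDTR;

	return user_set ? user_val : def;
}
```

With a burst-capable device and RxIntDelay=0 passed at module load, the legacy model yields 0x20 while the patched model keeps 0, which is the low latency case the commit message describes.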
Re: [Intel-wired-lan] [PATCH] e1000e: apply burst mode settings only on default
On 8/27/2017 11:30, Neftin, Sasha wrote:
> On 8/25/2017 18:06, Willem de Bruijn wrote:
>> [patch snipped]
>
> This patch looks good to me, but I would like to hear a second opinion.
Re: [patch net-next 11/12] mlxsw: spectrum_dpipe: Add support for IPv4 host table dump
On 08/25/2017 10:51 PM, David Ahern wrote:
> On 8/25/17 2:26 AM, Arkadi Sharshevsky wrote:
>>
>>
>> On 08/24/2017 10:26 PM, David Ahern wrote:
>>> On 8/23/17 11:40 PM, Jiri Pirko wrote:
>>>> +static int
>>>> +mlxsw_sp_dpipe_table_host_entries_get(struct mlxsw_sp *mlxsw_sp,
>>>> +				      struct devlink_dpipe_entry *entry,
>>>> +				      bool counters_enabled,
>>>> +				      struct devlink_dpipe_dump_ctx *dump_ctx,
>>>> +				      int type)
>>>> +{
>>>> +	int rif_neigh_count = 0;
>>>> +	int rif_neigh_skip = 0;
>>>> +	int neigh_count = 0;
>>>> +	int rif_count;
>>>> +	int i, j;
>>>> +	int err;
>>>> +
>>>> +	rtnl_lock();
>>>
>>> Why does a h/w driver dumping its tables need the rtnl lock?
>>>
>>
>> This table represents the hw IPv4 arp table, and the
>> driver depends on rtnl to be held.
>>
>
> Meaning mlxsw does not have its own locks protecting data structures --
> e.g., rif adds and deletes, so it is relying on rtnl?
>
> Also, this dpipe capability seems to be just dumping data structures
> maintained by the driver. ie., you can compare the mlxsw view of
> networking state to IPv4 and IPv6 level tables. Any plans to offer a
> command that reads data from the h/w and passes that back to the user?
> i.e, a command to compare kernel tables to h/w state?
>

So this infra should provide several things:
1) Reveal the interactions between various hardware tables
2) Counters for these tables
3) Debuggability

The first two can be achieved right now. Regarding debuggability, which
is a bit vague, the current assumption is that the driver's internal
data structures are synced with hardware (which is not always true), and
maybe are not synced with the kernel, so this can be achieved right now
by dumping the internal state of the driver. Furthermore, the counters
are dumped from the hardware and give the user an additional indication.

I completely agree that the hardware should be dumped in order to
validate that the internal data structures are really synced with HW.
This could be useful for observing data corruption inside the ASIC and
various complex bugs.

In order to address that, I thought about adding a flag called
"validate_hw" so that during the dump the driver<-->hw state could be
validated.

What do you think about it?

Thanks,
Arkadi
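The "validate_hw" idea in the reply above boils down to comparing the driver's cached view of a table with what is read back from the device. A minimal userspace sketch of that comparison follows, assuming both tables are available as plain arrays. The `host_entry` layout and the `count_mismatches` helper are hypothetical illustrations and are not part of mlxsw or devlink.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-table entry; this is not the mlxsw layout. */
struct host_entry {
	uint32_t rif;	/* router interface index */
	uint32_t dip;	/* IPv4 destination address */
};

/* Count the positions where the driver's cached table disagrees with
 * the table read back from hardware. Zero means the two views match.
 */
size_t count_mismatches(const struct host_entry *cached,
			const struct host_entry *hw, size_t n)
{
	size_t i, bad = 0;

	assert(n == 0 || (cached && hw));
	for (i = 0; i < n; i++) {
		if (cached[i].rif != hw[i].rif || cached[i].dip != hw[i].dip)
			bad++;
	}
	return bad;
}
```

A dump path could report the mismatch count next to each table, so a non-zero value points at a driver/hardware desync rather than at a kernel/driver one.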
Re: [Intel-wired-lan] [PATCH] e1000e: apply burst mode settings only on default
On 8/25/2017 18:06, Willem de Bruijn wrote:
> From: Willem de Bruijn
>
> Devices that support FLAG2_DMA_BURST have different default values
> for RDTR and RADV. Apply burst mode default settings only when no
> explicit value was passed at module load.
>
> The RDTR default is zero. If the module is loaded for low latency
> operation with RxIntDelay=0, do not override this value with a burst
> default of 32.
>
> Move the decision to apply burst values earlier, where explicitly
> initialized module variables can be distinguished from defaults.
>
> Signed-off-by: Willem de Bruijn
> ---
>  drivers/net/ethernet/intel/e1000e/e1000.h  |  4
>  drivers/net/ethernet/intel/e1000e/netdev.c |  8
>  drivers/net/ethernet/intel/e1000e/param.c  | 16 +++-
>  3 files changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h b/drivers/net/ethernet/intel/e1000e/e1000.h
> index 98e6abb1..2311b31bdcac 100644
> --- a/drivers/net/ethernet/intel/e1000e/e1000.h
> +++ b/drivers/net/ethernet/intel/e1000e/e1000.h
> @@ -94,10 +94,6 @@ struct e1000_info;
>   */
>  #define E1000_CHECK_RESET_COUNT 25
>
> -#define DEFAULT_RDTR 0
> -#define DEFAULT_RADV 8
> -#define BURST_RDTR 0x20
> -#define BURST_RADV 0x20
>  #define PCICFG_DESC_RING_STATUS 0xe4
>  #define FLUSH_DESC_REQUIRED 0x100
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 327dfe5bedc0..47b89aac7969 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -3223,14 +3223,6 @@ static void e1000_configure_rx(struct e1000_adapter *adapter)
>  		 */
>  		ew32(RXDCTL(0), E1000_RXDCTL_DMA_BURST_ENABLE);
>  		ew32(RXDCTL(1), E1000_RXDCTL_DMA_BURST_ENABLE);
> -
> -		/* override the delay timers for enabling bursting, only if
> -		 * the value was not set by the user via module options
> -		 */
> -		if (adapter->rx_int_delay == DEFAULT_RDTR)
> -			adapter->rx_int_delay = BURST_RDTR;
> -		if (adapter->rx_abs_int_delay == DEFAULT_RADV)
> -			adapter->rx_abs_int_delay = BURST_RADV;
>  	}
>
>  	/* set the Receive Delay Timer Register */
> diff --git a/drivers/net/ethernet/intel/e1000e/param.c b/drivers/net/ethernet/intel/e1000e/param.c
> index 6d8c39abee16..bb696c98f9b0 100644
> --- a/drivers/net/ethernet/intel/e1000e/param.c
> +++ b/drivers/net/ethernet/intel/e1000e/param.c
> @@ -73,17 +73,25 @@ E1000_PARAM(TxAbsIntDelay, "Transmit Absolute Interrupt Delay");
>  /* Receive Interrupt Delay in units of 1.024 microseconds
>   * hardware will likely hang if you set this to anything but zero.
>   *
> + * Burst variant is used as default if device has FLAG2_DMA_BURST.
> + *
>   * Valid Range: 0-65535
>   */
>  E1000_PARAM(RxIntDelay, "Receive Interrupt Delay");
> +#define DEFAULT_RDTR 0
> +#define BURST_RDTR 0x20
>  #define MAX_RXDELAY 0x
>  #define MIN_RXDELAY 0
>
>  /* Receive Absolute Interrupt Delay in units of 1.024 microseconds
> + *
> + * Burst variant is used as default if device has FLAG2_DMA_BURST.
>   *
>   * Valid Range: 0-65535
>   */
>  E1000_PARAM(RxAbsIntDelay, "Receive Absolute Interrupt Delay");
> +#define DEFAULT_RADV 8
> +#define BURST_RADV 0x20
>  #define MAX_RXABSDELAY 0x
>  #define MIN_RXABSDELAY 0
>
> @@ -297,6 +305,9 @@ void e1000e_check_options(struct e1000_adapter *adapter)
>  					 .max = MAX_RXDELAY } }
>  		};
>
> +		if (adapter->flags2 & FLAG2_DMA_BURST)
> +			opt.def = BURST_RDTR;
> +
>  		if (num_RxIntDelay > bd) {
>  			adapter->rx_int_delay = RxIntDelay[bd];
>  			e1000_validate_option(>rx_int_delay, ,
> @@ -307,7 +318,7 @@ void e1000e_check_options(struct e1000_adapter *adapter)
>  	}
>  	/* Receive Absolute Interrupt Delay */
>  	{
> -		static const struct e1000_option opt = {
> +		static struct e1000_option opt = {
>  			.type = range_option,
>  			.name = "Receive Absolute Interrupt Delay",
>  			.err  = "using default of "
> @@ -317,6 +328,9 @@ void e1000e_check_options(struct e1000_adapter *adapter)
>  					 .max = MAX_RXABSDELAY } }
>  		};
>
> +		if (adapter->flags2 & FLAG2_DMA_BURST)
> +			opt.def = BURST_RADV;
> +
>  		if (num_RxAbsIntDelay > bd) {
>  			adapter->rx_abs_int_delay = RxAbsIntDelay[bd];
>  			e1000_validate_option(>rx_abs_int_delay, ,

This patch looks good to me, but I would like to hear a second opinion.
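The behavioural change in the patch above can be illustrated with a small userspace sketch. It contrasts the removed late override (compare the final value against the default) with the new approach (pick the default first, then let an explicit value win). The constants mirror those in the patch; the function names and the `user_set` flag are assumptions made for illustration and do not exist in e1000e, which instead relies on `num_RxIntDelay` to know whether a parameter was passed.

```c
#include <assert.h>
#include <stdbool.h>

#define DEFAULT_RDTR	0
#define BURST_RDTR	0x20
#define MAX_RXDELAY	0xFFFF	/* "Valid Range: 0-65535" */

/* Old behaviour: decide late, by comparing against the default.
 * An explicit RxIntDelay=0 looks exactly like "not set", so it is
 * silently replaced by the burst value.
 */
unsigned int rdtr_late_override(bool user_set, unsigned int value,
				bool dma_burst)
{
	unsigned int rdtr = user_set ? value : DEFAULT_RDTR;

	assert(rdtr <= MAX_RXDELAY);
	if (dma_burst && rdtr == DEFAULT_RDTR)
		rdtr = BURST_RDTR;
	return rdtr;
}

/* Patched behaviour: choose the default up front, then let an
 * explicitly passed value win unconditionally.
 */
unsigned int rdtr_early_default(bool user_set, unsigned int value,
				bool dma_burst)
{
	unsigned int def = dma_burst ? BURST_RDTR : DEFAULT_RDTR;

	assert(value <= MAX_RXDELAY);
	return user_set ? value : def;
}
```

With `user_set = true`, `value = 0` and `dma_burst = true`, the first function yields `BURST_RDTR` while the second keeps the requested zero, which is the low-latency case the commit message describes.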
Re: [PATCH] pktgen: add a new sample script for 40G and above link testing
On 25/08/2017 12:26 PM, Robert Hoo wrote:
> (Sorry for yesterday's wrong sending, I finally fixed my MTA and git
> send-email settings.)
>
> It's hard to benchmark 40G+ network bandwidth using ordinary
> tools like iperf, netperf (see reference 1).
> Pktgen, the packet generator from kernel space, shall be a candidate.
> I then tried with pktgen multiqueue sample scripts, but still cannot
> reach line rate.

Try samples 03 and 04.

> I then derived this NUMA-aware irq affinity sample script from the
> multi-queue sample one, and successfully benchmarked a 40G link. I think
> this can also be useful for 100G reference, though I haven't got a
> device to test yet.
>
> This script simply does:
> Detect $DEV's NUMA node belonging.
> Bind each thread (processor from that NUMA node) with each $DEV queue's
> irq affinity, 1:1 mapping.
> How many '-t' threads input determines how many queues will be
> utilized.

I agree this is an essential capability. This was the main reason I
added support for the -f argument. Using it, I could choose cores of
the local NUMA node, especially for a single thread, or when the cores
of the NUMA node are sequential.

> Tested with Intel XL710 NIC with Cisco 3172 switch.
>
> It would be even slightly better if the irqbalance service is turned
> off outside.
>
> References:
> https://people.netfilter.org/hawk/presentations/LCA2015/net_stack_challenges_100G_LCA2015.pdf
> http://www.intel.cn/content/dam/www/public/us/en/documents/reference-guides/xl710-x710-performance-tuning-linux-guide.pdf
>
> Signed-off-by: Robert Hoo
> ---

Regards,
Tariq Toukan
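The 1:1 mapping described in the quoted text (one pktgen thread per device queue, each pinned to a CPU of the device's NUMA node) can be sketched as follows. This is only an illustration of the mapping logic, written in C rather than as the shell script under discussion. It assumes the node's CPUs are given as a sysfs-style list such as "0-3,8-11"; `parse_cpulist` and `cpu_for_queue` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Expand a sysfs-style CPU list such as "0-3,8-11" into cpus[].
 * Returns the number of CPUs written, at most max.
 */
size_t parse_cpulist(const char *s, int *cpus, size_t max)
{
	size_t n = 0;

	assert(s && cpus);
	while (*s && n < max) {
		char *end;
		long lo = strtol(s, &end, 10);
		long hi = lo;

		if (end == s)
			break;		/* no digits found: stop parsing */
		if (*end == '-') {
			const char *range_end = end + 1;

			hi = strtol(range_end, &end, 10);
			if (end == range_end)
				break;	/* malformed range such as "3-" */
		}
		for (long c = lo; c <= hi && n < max; c++)
			cpus[n++] = (int)c;
		if (*end != ',')
			break;		/* end of list (or trailing newline) */
		s = end + 1;
	}
	return n;
}

/* Queue i is served by the i-th CPU of the node; -1 when the node has
 * fewer CPUs than the requested number of threads.
 */
int cpu_for_queue(const int *cpus, size_t ncpus, size_t queue)
{
	return queue < ncpus ? cpus[queue] : -1;
}
```

The number of '-t' threads then bounds how many queue indices are looked up, which matches the statement that the thread count determines how many queues are used.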
[PATCH] be2net: Fix some u16 fields appropriately
In be_tx_compl_process(), frag_index is declared as u32, so it is better
to declare last_index as u32 as well.

CC: Ajit Khaparde
Fixes: b0fd2eb28bd4 ("be2net: Declare some u16 fields as u32 to improve performance")
Signed-off-by: Haishuang Yan
---
 drivers/net/ethernet/emulex/benet/be.h      | 2 +-
 drivers/net/ethernet/emulex/benet/be_main.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
index 674cf9d..2ba4d61 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -255,7 +255,7 @@ struct be_tx_stats {
 /* Structure to hold some data of interest obtained from a TX CQE */
 struct be_tx_compl_info {
 	u8 status;		/* Completion status */
-	u16 end_index;		/* Completed TXQ Index */
+	u32 end_index;		/* Completed TXQ Index */
 };

 struct be_tx_obj {
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 319eee3..3645344 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -2606,7 +2606,7 @@ static struct be_tx_compl_info *be_tx_compl_get(struct be_tx_obj *txo)
 }

 static u16 be_tx_compl_process(struct be_adapter *adapter,
-			       struct be_tx_obj *txo, u16 last_index)
+			       struct be_tx_obj *txo, u32 last_index)
 {
 	struct sk_buff **sent_skbs = txo->sent_skb_list;
 	struct be_queue_info *txq = >q;
--
1.8.3.1
[PATCH] igb: check memory allocation failure
Check for memory allocation failure and return -ENOMEM in that case, as
is already done for other memory allocations in this function.

This avoids a NULL pointer dereference.

Signed-off-by: Christophe JAILLET
---
 drivers/net/ethernet/intel/igb/igb_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index fd4a46b03cc8..837d9b46a390 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3162,6 +3162,8 @@ static int igb_sw_init(struct igb_adapter *adapter)
 	/* Setup and initialize a copy of the hw vlan table array */
 	adapter->shadow_vfta = kcalloc(E1000_VLAN_FILTER_TBL_SIZE, sizeof(u32),
 				       GFP_ATOMIC);
+	if (!adapter->shadow_vfta)
+		return -ENOMEM;

 	/* This call may decrease the number of queues */
 	if (igb_init_interrupt_scheme(adapter, true)) {
--
2.11.0
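The pattern the igb patch applies can be shown in a small userspace analogue, with `calloc` standing in for `kcalloc`. The `struct adapter` and `sw_init` names below are simplified stand-ins invented for this sketch, not the igb definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for the driver's adapter structure. */
struct adapter {
	uint32_t *shadow_vfta;	/* software copy of the VLAN filter table */
};

/* Allocate the zeroed shadow table. Return -ENOMEM instead of leaving
 * a NULL pointer behind for later code to dereference.
 */
int sw_init(struct adapter *a, size_t entries)
{
	assert(a);
	a->shadow_vfta = calloc(entries, sizeof(uint32_t));
	if (!a->shadow_vfta)
		return -ENOMEM;
	return 0;
}
```

Callers are expected to propagate the negative error code, in the same way the rest of igb_sw_init() returns -ENOMEM for its other allocations.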