Suggestions for new Ethernet driver?
I'm working with Aquantia to add a new 2.5/5 Gbps driver to the kernel. It looks like it's going to be one of the biggest drivers in drivers/net/ethernet. The team that developed the driver is new to kernel development processes, but is working to make it checkpatch-clean and is addressing sparse issues. Right now we're splitting the code into small chunks for review. The sequence of patches targets basic functionality first, then adds one feature at a time. Still, it's going to be a lot of work to review. Aquantia is committed to doing the work to get this into the mainline kernel, but it's clearly going to be a substantial effort not only for them but for reviewers. So, my question: beyond the basics already under way, what can we do to make this process easier for the networking community? I welcome any and all suggestions. Thanks! -- David VL
[PATCH net-next] ibmveth: calculate correct gso_size and set gso_type
We recently encountered a bug where a few customers using ibmveth on the same LPAR hit an issue where a TCP session hung when large receive was enabled. Closer analysis revealed that the session was stuck because one side was repeatedly advertising a zero window. We narrowed this down to the fact that the ibmveth driver did not set gso_size, which is translated by TCP into the MSS later up the stack. The MSS is used to calculate the TCP window size, and as it was abnormally large, a zero window was being calculated even though the socket's receive buffer was completely empty. We were able to reproduce this and worked with IBM to fix it. Thanks Tom and Marcelo for all your help and review on this. The patch fixes both our internal reproduction tests and our customers' tests. Signed-off-by: Jon Maxwell --- drivers/net/ethernet/ibm/ibmveth.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c index 29c05d0..3028c33 100644 --- a/drivers/net/ethernet/ibm/ibmveth.c +++ b/drivers/net/ethernet/ibm/ibmveth.c @@ -1182,6 +1182,8 @@ static int ibmveth_poll(struct napi_struct *napi, int budget) int frames_processed = 0; unsigned long lpar_rc; struct iphdr *iph; + bool large_packet = 0; + u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr); restart_poll: while (frames_processed < budget) { @@ -1236,10 +1238,27 @@ static int ibmveth_poll(struct napi_struct *napi, int budget) iph->check = 0; iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl); adapter->rx_large_packets++; + large_packet = 1; } } } + if (skb->len > netdev->mtu) { + iph = (struct iphdr *)skb->data; + if (be16_to_cpu(skb->protocol) == ETH_P_IP && iph->protocol == IPPROTO_TCP) { + hdr_len += sizeof(struct iphdr); + skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4; + skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len; + } else if (be16_to_cpu(skb->protocol) == ETH_P_IPV6 && + iph->protocol == IPPROTO_TCP) { + hdr_len += sizeof(struct ipv6hdr); + skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6; + skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len; + } + if (!large_packet) + adapter->rx_large_packets++; + } + napi_gro_receive(napi, skb); /* send it up */ netdev->stats.rx_packets++; -- 1.8.3.1
pull request (net-next): ipsec-next 2016-10-25
Just a leftover from the last development cycle. 1) Remove some unused code, from Florian Westphal. Please pull or let me know if there are problems. Thanks! The following changes since commit 31fbe81fe3426dfb7f8056a7f5106c6b1841a9aa: Merge branch 'qcom-emac-acpi' (2016-09-29 01:50:20 -0400) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next.git master for you to fetch changes up to 2258d927a691ddd2ab585adb17ea9f96e89d0638: xfrm: remove unused helper (2016-09-30 08:20:56 +0200) Florian Westphal (1): xfrm: remove unused helper net/xfrm/xfrm_state.c | 8 -------- 1 file changed, 8 deletions(-)
[PATCH] xfrm: remove unused helper
From: Florian Westphal

Not used anymore since 2009 (commit 9e0d57fd6dad37, "xfrm: SAD entries do not expire correctly after suspend-resume"). Signed-off-by: Florian Westphal Signed-off-by: Steffen Klassert --- net/xfrm/xfrm_state.c | 8 -------- 1 file changed, 8 deletions(-) diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index 419bf5d..45cb7c6 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -388,14 +388,6 @@ static void xfrm_state_gc_task(struct work_struct *work) xfrm_state_gc_destroy(x); } -static inline unsigned long make_jiffies(long secs) -{ - if (secs >= (MAX_SCHEDULE_TIMEOUT-1)/HZ) - return MAX_SCHEDULE_TIMEOUT-1; - else - return secs*HZ; -} - static enum hrtimer_restart xfrm_timer_handler(struct hrtimer *me) { struct tasklet_hrtimer *thr = container_of(me, struct tasklet_hrtimer, timer); -- 1.9.1
Re: [PATCH net-next] ethernet: fix min/max MTU typos
On Mon, Oct 24, 2016 at 02:42:26PM +0200, Stefan Richter wrote: > Fixes: d894be57ca92 ('ethernet: use net core MTU range checking in more > drivers') > CC: Jarod Wilson > CC: Thomas Falcon > Signed-off-by: Stefan Richter Wuf. Thank you, Stefan. Way too many bleeding-eyeball hours staring at all those changes. Acked-by: Jarod Wilson -- Jarod Wilson ja...@redhat.com
Re: [PATCH net-next 2/2 v2] firewire: net: set initial MTU = 1500 unconditionally, fix IPv6 on some CardBus cards
On Mon, Oct 24, 2016 at 02:26:13PM +0200, Stefan Richter wrote: > firewire-net, like the older eth1394 driver, reduced the initial MTU to > less than 1500 octets if the local link layer controller's asynchronous > packet reception limit was lower. > > This is bogus, since this reception limit does not have anything to do > with the transmission limit. Neither did this reduction affect the TX > path positively, nor could it prevent link fragmentation at the RX path. > > Many FireWire CardBus cards have a max_rec of 9, causing an initial MTU > of 1024 - 16 = 1008. RFC 2734 and RFC 3146 allow a minimum max_rec = 8, > which would result in an initial MTU of 512 - 16 = 496. On such cards, > IPv6 could only be employed if the MTU was manually increased to 1280 or > more, i.e. IPv6 would not work without intervention from userland. > > We now always initialize the MTU to 1500, which is the default according > to RFC 2734 and RFC 3146. > > On a VIA VT6316 based CardBus card which was affected by this, changing > the MTU from 1008 to 1500 also increases TX bandwidth by 6%. > RX remains unaffected. > > CC: netdev@vger.kernel.org > CC: linux1394-de...@lists.sourceforge.net > CC: Jarod Wilson > Signed-off-by: Stefan Richter > --- > v2: use ETH_DATA_LEN, add comment Acked-by: Jarod Wilson -- Jarod Wilson ja...@redhat.com
[RFC PATCH net-next] net: ethtool: add support for forward error correction modes
From: Vidya Sagar Ravipati

Forward Error Correction (FEC) modes, i.e. Base-R and Reed-Solomon, were introduced in the 25G/40G/100G standards to provide a good BER at high speeds. Various networking devices which support 25G/40G/100G provide the ability to manage supported FEC modes, and the lack of FEC encoding control and reporting today is a source of interoperability issues for many vendors. FEC capability, as well as a specific FEC mode, i.e. Base-R or RS, can be requested or advertised through bits D44:47 of the base link codeword. This patch set intends to provide an option under ethtool to manage and report FEC encoding settings for networking devices as per the IEEE 802.3bj, 802.3bm and 802.3by specs. The set-fec/show-fec option(s) are designed to control and report the FEC encoding on the link. SET FEC option: root@tor: ethtool --set-fec swp1 encoding [off | RS | BaseR | auto] autoneg [off | on] Encoding: Types of encoding Off: turning off any encoding RS: enforcing RS-FEC encoding on supported speeds BaseR: enforcing Base-R encoding on supported speeds Auto: default FEC settings for drivers, representing a request to the hardware to essentially go into a best-effort mode. Here are a few examples of what we would expect if encoding=auto: - if autoneg is on, we expect FEC to be negotiated on or off as long as the protocol supports it - if the hardware is capable of detecting the FEC encoding on its receiver, it will reconfigure its encoder to match - in the absence of the above, the configuration would be set to IEEE defaults. From our understanding, this is essentially what most hardware/driver combinations do today in the absence of a way for users to control the behavior.
SHOW FEC option: root@tor: ethtool --show-fec swp1 FEC parameters for swp1: Autonegotiate: off FEC encodings: RS ETHTOOL DEVNAME output modification: ethtool devname output: root@tor:~# ethtool swp1 Settings for swp1: root@hpe-7712-03:~# ethtool swp18 Settings for swp18: Supported ports: [ FIBRE ] Supported link modes: 40000baseCR4/Full 40000baseSR4/Full 40000baseLR4/Full 100000baseSR4/Full 100000baseCR4/Full 100000baseLR4_ER4/Full Supported pause frame use: No Supports auto-negotiation: Yes Supported FEC modes: [RS | BaseR | None | Not reported] Advertised link modes: Not reported Advertised pause frame use: No Advertised auto-negotiation: No Advertised FEC modes: [RS | BaseR | None | Not reported] One or more FEC modes Speed: 10Mb/s Duplex: Full Port: FIBRE PHYAD: 106 Transceiver: internal Auto-negotiation: off Link detected: yes This patch includes the following changes: a) A new ETHTOOL_GFECPARAM/ETHTOOL_SFECPARAM API, handled by the new get_fecparam/set_fecparam callbacks, provides support for configuration of forward error correction modes. b) Link mode bits for the FEC modes, i.e. None (no FEC), RS and BaseR/FC, are defined so that users can configure these FEC modes for the supported and advertising fields as part of link autonegotiation.
Signed-off-by: Vidya Sagar Ravipati --- include/linux/ethtool.h | 4 include/uapi/linux/ethtool.h | 53 ++-- net/core/ethtool.c | 34 3 files changed, 89 insertions(+), 2 deletions(-) diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h index 9ded8c6..79a0bab 100644 --- a/include/linux/ethtool.h +++ b/include/linux/ethtool.h @@ -372,5 +372,9 @@ struct ethtool_ops { struct ethtool_link_ksettings *); int (*set_link_ksettings)(struct net_device *, const struct ethtool_link_ksettings *); + int (*get_fecparam)(struct net_device *, + struct ethtool_fecparam *); + int (*set_fecparam)(struct net_device *, + struct ethtool_fecparam *); }; #endif /* _LINUX_ETHTOOL_H */ diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h index 099a420..28ea382 100644 --- a/include/uapi/linux/ethtool.h +++ b/include/uapi/linux/ethtool.h @@ -1224,6 +1224,51 @@ struct ethtool_per_queue_op { char data[]; }; +/** + * struct ethtool_fecparam - Ethernet forward error correction (FEC) parameters + * @cmd: Command number = %ETHTOOL_GFECPARAM or %ETHTOOL_SFECPARAM + * @autoneg: Flag to enable autonegotiation of FEC modes (RS, BaseR) + * (D44:47 of the base link codeword) + * @fec: Bitmask of supported FEC modes + * @rsvd: Reserved for future extensions, e.g. a FEC bypass feature. + * + * Drivers should reject a non-zero setting of @autoneg when + * autonegotiation is disabled (or not supported) for the
RE: [PATCH] net: fec: hard phy reset on open
From: Manfred Schlaegl Sent: Monday, October 24, 2016 10:43 PM > To: Andy Duan > Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org > Subject: Re: [PATCH] net: fec: hard phy reset on open > > On 2016-10-24 16:03, Andy Duan wrote: > > From: manfred.schla...@gmx.at Sent: > Monday, > > October 24, 2016 5:26 PM > >> To: Andy Duan > >> Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org > >> Subject: [PATCH] net: fec: hard phy reset on open > >> > >> We have seen some problems with auto negotiation on i.MX6 using > >> LAN8720, after interface down/up. > >> > >> In our configuration, the ptp clock is used externally as reference > >> clock for the phy. Some phys (e.g. LAN8720) need a stable clock > >> while and after hard reset. > >> Before this patch, the driver disabled the clock on close but did no > >> hard reset on open, after enabling the clocks again. > >> > >> A solution that prevents disabling the clocks on close was > >> considered, but discarded because of bad power saving behavior. > >> > >> This patch saves the reset dt properties on probe and does a reset on > >> every open after the clocks were enabled, to make sure the clock is > >> stable while and after hard reset. > >> > >> Tested on i.MX6 and i.MX28, both using LAN8720. > >> > >> Signed-off-by: Manfred Schlaegl > >> --- > > > This patch did hard reset to let phy stable. > > > Firstly, you should do reset before clock enable. > I have to disagree here. > The phy demands (datasheet + tests) a stable clock at the time of (hard- > )reset and after this. Therefore the clock has to be enabled before the hard > reset. > (This is exactly the reason for the patch.) > > Generally: the sense of a reset is to defer the start of a digital circuit > until the > environment (power, clocks, ...) has stabilized. > > Furthermore: before this patch the hard reset was done in fec_probe, and > here also after the clocks were enabled. > > What was your argument to do it the other way in this special case?
> I checked some different vendors' PHYs; they assert hard reset after the clock is stable. But I still can't be sure all PHYs behave this way. > > Secondly, we suggest doing the phy reset in the phy driver, not the MAC driver. > I was not sure if you meant a soft- or hard-reset here. > > In case you are talking about soft reset: > Yes, the phy drivers perform a soft reset. Sadly a soft reset is not > sufficient in > this case - the phy recovers from a lost clock only on a hard reset. (datasheet > + > tests) > > In case you're talking about hard reset: > Intuitively I would also think that the (hard-)reset should be handled by the > phy driver, but this is not reality in given implementations. > > Documentation/devicetree/bindings/net/fsl-fec.txt says > > - phy-reset-gpios : Should specify the gpio for phy reset > > It explicitly talks about phy-reset here. And the (hard-)reset was handled > by the fec driver also before this patch (once on probe). > I suggest doing the phy hard reset in the phy driver, like drivers/net/phy/spi_ks8995.c, and Uwe Kleine-König's patch "phy: add support for a reset-gpio specification" (I don't know why that patch is reverted now.) Regards, Andy > > > > Regards, > > Andy > > Thanks for your feedback!
> > Best regards, > Manfred > > > > > > >> drivers/net/ethernet/freescale/fec.h | 4 ++ > >> drivers/net/ethernet/freescale/fec_main.c | 77 > >> +--- > >> --- > >> 2 files changed, 47 insertions(+), 34 deletions(-) > >> > >> diff --git a/drivers/net/ethernet/freescale/fec.h > >> b/drivers/net/ethernet/freescale/fec.h > >> index c865135..379e619 100644 > >> --- a/drivers/net/ethernet/freescale/fec.h > >> +++ b/drivers/net/ethernet/freescale/fec.h > >> @@ -498,6 +498,10 @@ struct fec_enet_private { > >>struct clk *clk_enet_out; > >>struct clk *clk_ptp; > >> > >> + int phy_reset; > >> + bool phy_reset_active_high; > >> + int phy_reset_msec; > >> + > >>bool ptp_clk_on; > >>struct mutex ptp_clk_mutex; > >>unsigned int num_tx_queues; > >> diff --git a/drivers/net/ethernet/freescale/fec_main.c > >> b/drivers/net/ethernet/freescale/fec_main.c > >> index 48a033e..8cc1ec5 100644 > >> --- a/drivers/net/ethernet/freescale/fec_main.c > >> +++ b/drivers/net/ethernet/freescale/fec_main.c > >> @@ -2802,6 +2802,22 @@ static int fec_enet_alloc_buffers(struct > >> net_device *ndev) > >>return 0; > >> } > >> > >> +static void fec_reset_phy(struct fec_enet_private *fep) { > >> + if (!gpio_is_valid(fep->phy_reset)) > >> + return; > >> + > >> + gpio_set_value_cansleep(fep->phy_reset, !!fep- > >>> phy_reset_active_high); > >> + > >> + if (fep->phy_reset_msec > 20) > >> + msleep(fep->phy_reset_msec); > >> + else > >> + usleep_range(fep->phy_reset_msec * 1000, > >> + fep->phy_reset_msec * 1000 + 1000); > >> +
[PATCH v2] net: skip generating uevents for network namespaces that are exiting
No one can see these events: a network namespace cannot be destroyed while it still has sockets, so by the time it is cleaned up there are no listeners left. Unlike other devices, uevents for network devices are generated only inside their own network namespaces; they are filtered out in kobj_bcast_filter(). My experiments show that net namespaces are destroyed more than 30% faster with this optimization. Here is a perf output for destroying network namespaces without this patch. - 94.76% 0.02% kworker/u48:1 [kernel.kallsyms] [k] cleanup_net - 94.74% cleanup_net - 94.64% ops_exit_list.isra.4 - 41.61% default_device_exit_batch - 41.47% unregister_netdevice_many - rollback_registered_many - 40.36% netdev_unregister_kobject - 14.55% device_del + 13.71% kobject_uevent - 13.04% netdev_queue_update_kobjects + 12.96% kobject_put - 12.72% net_rx_queue_update_kobjects kobject_put - kobject_release + 12.69% kobject_uevent + 0.80% call_netdevice_notifiers_info + 19.57% nfsd_exit_net + 11.15% tcp_net_metrics_exit + 8.25% rpcsec_gss_exit_net It's critical to optimize the exit path for network namespaces, because they are destroyed under net_mutex and many namespaces can be destroyed in one iteration. v2: use dev_set_uevent_suppress() Cc: Cong Wang Cc: "David S. Miller" Cc: Eric W.
Biederman Signed-off-by: Andrei Vagin --- net/core/net-sysfs.c | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index 6e4f347..d4fe286 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -950,10 +950,13 @@ net_rx_queue_update_kobjects(struct net_device *dev, int old_num, int new_num) } while (--i >= new_num) { + struct kobject *kobj = &dev->_rx[i].kobj; + + if (!list_empty(&dev_net(dev)->exit_list)) + kobj->uevent_suppress = 1; if (dev->sysfs_rx_queue_group) - sysfs_remove_group(&dev->_rx[i].kobj, - dev->sysfs_rx_queue_group); - kobject_put(&dev->_rx[i].kobj); + sysfs_remove_group(kobj, dev->sysfs_rx_queue_group); + kobject_put(kobj); } return error; @@ -1340,6 +1343,8 @@ netdev_queue_update_kobjects(struct net_device *dev, int old_num, int new_num) while (--i >= new_num) { struct netdev_queue *queue = dev->_tx + i; + if (!list_empty(&dev_net(dev)->exit_list)) + queue->kobj.uevent_suppress = 1; #ifdef CONFIG_BQL sysfs_remove_group(&queue->kobj, &dql_group); #endif @@ -1525,6 +1530,9 @@ void netdev_unregister_kobject(struct net_device *ndev) { struct device *dev = &(ndev->dev); + if (!list_empty(&dev_net(ndev)->exit_list)) + dev_set_uevent_suppress(dev, 1); + kobject_get(&dev->kobj); remove_queue_kobjects(ndev); -- 2.7.4
[PATCH net-next] net: add an ioctl to get a socket network namespace
From: Andrey Vagin

Each socket operates in the network namespace where it was created, so if we want to dump and restore a socket, we have to know its network namespace. We have socket_diag to get information about sockets, but it doesn't report sockets which are not bound or connected. This patch introduces a new socket ioctl, called SIOCGSKNS, which is used to get a file descriptor for a socket's network namespace. A task must have CAP_NET_ADMIN in the target network namespace to use this ioctl. Cc: "David S. Miller" Cc: Eric W. Biederman Signed-off-by: Andrei Vagin --- fs/nsfs.c | 2 +- include/linux/proc_fs.h | 4 include/uapi/linux/sockios.h | 1 + net/socket.c | 13 + 4 files changed, 19 insertions(+), 1 deletion(-) diff --git a/fs/nsfs.c b/fs/nsfs.c index 8718af8..8c9fb29 100644 --- a/fs/nsfs.c +++ b/fs/nsfs.c @@ -118,7 +118,7 @@ void *ns_get_path(struct path *path, struct task_struct *task, return ret; } -static int open_related_ns(struct ns_common *ns, +int open_related_ns(struct ns_common *ns, struct ns_common *(*get_ns)(struct ns_common *ns)) { struct path path = {}; diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h index b97bf2e..368c7ad 100644 --- a/include/linux/proc_fs.h +++ b/include/linux/proc_fs.h @@ -82,4 +82,8 @@ static inline struct proc_dir_entry *proc_net_mkdir( return proc_mkdir_data(name, 0, parent, net); } +struct ns_common; +int open_related_ns(struct ns_common *ns, + struct ns_common *(*get_ns)(struct ns_common *ns)); + #endif /* _LINUX_PROC_FS_H */ diff --git a/include/uapi/linux/sockios.h b/include/uapi/linux/sockios.h index 8e7890b..83cc54c 100644 --- a/include/uapi/linux/sockios.h +++ b/include/uapi/linux/sockios.h @@ -84,6 +84,7 @@ #define SIOCWANDEV 0x894A /* get/set netdev parameters */ #define SIOCOUTQNSD 0x894B /* output queue size (not sent only) */ +#define SIOCGSKNS 0x894C /* get socket network namespace */ /* ARP cache control calls.
*/ /* 0x8950 - 0x8952 * obsolete calls, don't re-use */ diff --git a/net/socket.c b/net/socket.c index 5a9bf5e..970a7ea 100644 --- a/net/socket.c +++ b/net/socket.c @@ -877,6 +877,11 @@ static long sock_do_ioctl(struct net *net, struct socket *sock, * what to do with it - that's up to the protocol still. */ +static struct ns_common *get_net_ns(struct ns_common *ns) +{ + return &get_net(container_of(ns, struct net, ns))->ns; +} + static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg) { struct socket *sock; @@ -945,6 +950,13 @@ static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg) err = dlci_ioctl_hook(cmd, argp); mutex_unlock(&dlci_ioctl_mutex); break; + case SIOCGSKNS: + err = -EPERM; + if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) + break; + + err = open_related_ns(&net->ns, get_net_ns); + break; default: err = sock_do_ioctl(net, sock, cmd, arg); break; @@ -3093,6 +3105,7 @@ static int compat_sock_ioctl_trans(struct file *file, struct socket *sock, case SIOCSIFVLAN: case SIOCADDDLCI: case SIOCDELDLCI: + case SIOCGSKNS: return sock_ioctl(file, cmd, arg); case SIOCGIFFLAGS: -- 2.7.4
Re: question about function igmp_stop_timer() in net/ipv4/igmp.c
Hi Andrew, On 2016/10/24 23:32, Andrew Lunn wrote: > On Mon, Oct 24, 2016 at 07:50:12PM +0800, Dongpo Li wrote: >> Hello >> >> We encountered a multicast problem when two set-top boxes (STB) join the same >> multicast group and leave. >> The two boxes can join the same multicast group, >> but only one box sends the IGMP leave group message when leaving; >> the other box does not send the IGMP leave message. >> Our boxes use IGMP version 2. >> >> I added some debug info and found the whole procedure is like this: >> (1) Box A joins the multicast group 225.1.101.145 and sends the IGMPv2 >> membership report (join group). >> (2) Box B joins the same multicast group 225.1.101.145 and also sends the >> IGMPv2 membership report (join group). >> (3) Box A receives the IGMP membership report from Box B and the kernel calls >> igmp_heard_report(). >> This function will call igmp_stop_timer(im). >> In igmp_stop_timer(im), it tries to delete the IGMP timer and does >> the following: >> im->tm_running = 0; >> im->reporter = 0; >> (4) Box A leaves the multicast group 225.1.101.145 and the kernel calls >> ip_mc_leave_group -> ip_mc_dec_group -> igmp_group_dropped. >> But in igmp_group_dropped(), im->reporter is 0, so the >> kernel does not send the IGMP leave message. > > RFC 2236 says: > > 2. Introduction > >The Internet Group Management Protocol (IGMP) is used by IP hosts to >report their multicast group memberships to any immediately- >neighboring multicast routers. > > Are Box A or B multicast routers? Thank you for your comments. Both Box A and B are IP hosts, not multicast routers. And the RFC says IGMP is used by "IP hosts" to report their multicast group memberships. > > Andrew Regards, Dongpo
Re: [PATCH net-next RFC WIP] Patch for XDP support for virtio_net
On Sun, Oct 23, 2016 at 06:51:53PM -0700, Shrijeet Mukherjee wrote: > > The main goal of this patch was to start that discussion. My v2 patch > rejects the ndo op if neither of rx_mergeable or big_buffers are set. > Does that sound like a good tradeoff ? Don't know enough about who > turns these features off and why. > > I can say that virtualbox always has the device features enabled .. so > seems like a good tradeoff ? If virtio can be taught to work with XDP, that would be awesome. I've looked at it from an XDP program debugging point of view, but the amount of complexity related to mergeable/big/etc. was too much, so I went with e1k+xdp. Are you sure that if mergeable/big are disabled, the buf is contiguous? Also, my understanding is that the buf is not writable? I don't see how to do TX either... Maybe it's all solvable somehow. There was a discussion about converting the raw DMA buffer in mlx/intel drivers directly into vhost to avoid the skb. This would allow the host to send packets into VMs quickly. If we can then have fast virtio in the guest, even more interesting use cases can be solved.
[PATCH v2 net 1/1] net sched filters: fix notification of filter delete with proper handle
From: Jamal Hadi Salim

Daniel says: While trying out [1][2], I noticed that tc monitor doesn't show the correct handle on delete: $ tc monitor qdisc clsact ffff: dev eno1 parent ffff:fff1 filter dev eno1 ingress protocol all pref 49152 bpf handle 0x2a [...] deleted filter dev eno1 ingress protocol all pref 49152 bpf handle 0xf3be0c80 Some context to explain the above: the user identity of any tc filter is represented by a 32-bit identifier encoded in tcm->tcm_handle, e.g. 0x2a in the bpf filter above. A user wishing to delete, get or even modify a specific filter uses this handle to reference it. Every classifier is free to provide its own semantics for the 32-bit handle. Example: classifiers like u32 use schemes like 800:1:801 to describe the semantics of their filters represented as hash table, bucket and node ids etc. Classifiers also have an internal per-filter representation which is different from this externally visible identity. Most classifiers set this internal representation to a pointer address (which allows fast retrieval of said filters in their implementations). This internal representation is referenced with the "fh" variable in the kernel control code. When a user successfully deletes a specific filter, by specifying the correct tcm->tcm_handle, an event is generated to user space which indicates which specific filter was deleted. Before this patch, the "fh" value was sent to user space as the identity. As an example, what is shown in the sample bpf filter delete event above is 0xf3be0c80. This is in fact a 32-bit truncation of 0xffff8807f3be0c80, which happens to be the 64-bit memory address of the internal filter representation (the address of the corresponding filter's struct cls_bpf_prog). After this patch, the appropriate user-identifiable handle as encoded in the originating request tcm->tcm_handle is generated in the event.
One of the cardinal rules of netlink is that one should be able to take an event (such as the delete in this case), reflect it back to the kernel, and successfully delete the filter. This patch achieves that. Note, this issue has existed since the original TC action infrastructure code patch back in 2004 as found in: https://git.kernel.org/cgit/linux/kernel/git/history/history.git/commit/ [1] http://patchwork.ozlabs.org/patch/682828/ [2] http://patchwork.ozlabs.org/patch/682829/ Fixes: 4e54c4816bfe ("[NET]: Add tc extensions infrastructure.") Reported-by: Daniel Borkmann Acked-by: Cong Wang Signed-off-by: Jamal Hadi Salim --- net/sched/cls_api.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 2ee29a3..2b2a797 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -345,7 +345,8 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n) if (err == 0) { struct tcf_proto *next = rtnl_dereference(tp->next); - tfilter_notify(net, skb, n, tp, fh, + tfilter_notify(net, skb, n, tp, + t->tcm_handle, RTM_DELTFILTER, false); if (tcf_destroy(tp, false)) RCU_INIT_POINTER(*back, next); -- 1.9.1
[PATCH] net: bgmac: fix spelling mistake: "connecton" -> "connection"
From: Colin Ian King

Trivial fix to a spelling mistake in a dev_err message. Signed-off-by: Colin Ian King --- drivers/net/ethernet/broadcom/bgmac.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c index 856379c..31ca204 100644 --- a/drivers/net/ethernet/broadcom/bgmac.c +++ b/drivers/net/ethernet/broadcom/bgmac.c @@ -1449,7 +1449,7 @@ static int bgmac_phy_connect(struct bgmac *bgmac) phy_dev = phy_connect(bgmac->net_dev, bus_id, _adjust_link, PHY_INTERFACE_MODE_MII); if (IS_ERR(phy_dev)) { - dev_err(bgmac->dev, "PHY connecton failed\n"); + dev_err(bgmac->dev, "PHY connection failed\n"); return PTR_ERR(phy_dev); } -- 2.9.3
[PATCH v3 net] flow_dissector: fix vlan tag handling
gcc warns about an uninitialized pointer dereference in the vlan priority handling: net/core/flow_dissector.c: In function '__skb_flow_dissect': net/core/flow_dissector.c:281:61: error: 'vlan' may be used uninitialized in this function [-Werror=maybe-uninitialized] As pointed out by Jiri Pirko, the variable is never actually used without being initialized first, as the only way it ends up uninitialized is with skb_vlan_tag_present(skb)==true, and in that case it does not get accessed. However, the warning hints at some related issues that I'm addressing here: - the second check for the vlan tag is different from the first one, which tests the skb for being NULL first, causing both the warning and a possible NULL pointer dereference that was not entirely fixed. - The same patch that introduced the NULL pointer check dropped an earlier optimization that skipped the repeated check of the protocol type. - The local '_vlan' variable is referenced through the 'vlan' pointer, but the variable has gone out of scope by the time it is accessed, causing undefined behavior. Caching the result of the 'skb && skb_vlan_tag_present(skb)' check in a local variable allows the compiler to further optimize the later check. With those changes, the warning also disappears.
Fixes: 3805a938a6c2 ("flow_dissector: Check skb for VLAN only if skb specified.") Fixes: d5709f7ab776 ("flow_dissector: For stripped vlan, get vlan info from skb->vlan_tci") Signed-off-by: Arnd Bergmann --- v3: set 'proto' variable correct again mark it for net, rather than net-next, as both patches that introduced the bugs are in mainline or in net/master v2: fix multiple issues found in the initial review beyond the uninitialized access that turned out to be ok net/core/flow_dissector.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c index 44e6ba9d3a6b..ab193e5def07 100644 --- a/net/core/flow_dissector.c +++ b/net/core/flow_dissector.c @@ -246,13 +246,13 @@ bool __skb_flow_dissect(const struct sk_buff *skb, case htons(ETH_P_8021AD): case htons(ETH_P_8021Q): { const struct vlan_hdr *vlan; + struct vlan_hdr _vlan; + bool vlan_tag_present = skb && skb_vlan_tag_present(skb); - if (skb && skb_vlan_tag_present(skb)) + if (vlan_tag_present) proto = skb->protocol; - if (eth_type_vlan(proto)) { - struct vlan_hdr _vlan; - + if (!vlan_tag_present || eth_type_vlan(skb->protocol)) { vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan), data, hlen, &_vlan); if (!vlan) @@ -270,7 +270,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb, FLOW_DISSECTOR_KEY_VLAN, target_container); - if (skb_vlan_tag_present(skb)) { + if (vlan_tag_present) { key_vlan->vlan_id = skb_vlan_tag_get_id(skb); key_vlan->vlan_priority = (skb_vlan_tag_get_prio(skb) >> VLAN_PRIO_SHIFT); -- 2.9.0
Re: [PATCH] netfilter: ip_vs_sync: fix bogus maybe-uninitialized warning
On Monday, October 24, 2016 10:47:54 PM CEST Julian Anastasov wrote: > > diff --git a/net/netfilter/ipvs/ip_vs_sync.c > > b/net/netfilter/ipvs/ip_vs_sync.c > > index 1b07578bedf3..9350530c16c1 100644 > > --- a/net/netfilter/ipvs/ip_vs_sync.c > > +++ b/net/netfilter/ipvs/ip_vs_sync.c > > @@ -283,6 +283,7 @@ struct ip_vs_sync_buff { > > */ > > static void ntoh_seq(struct ip_vs_seq *no, struct ip_vs_seq *ho) > > { > > + memset(ho, 0, sizeof(*ho)); > > ho->init_seq = get_unaligned_be32(&no->init_seq); > > ho->delta = get_unaligned_be32(&no->delta); > > ho->previous_delta = get_unaligned_be32(&no->previous_delta); > > So, now there is a double write here? Correct. I would hope that a sane version of gcc would just not perform the first write. What happens instead is that the version that produces the warning here moves the initialization to the top of the calling function. > What about such constructs?: > > *ho = (struct ip_vs_seq) { > .init_seq = get_unaligned_be32(&no->init_seq), > ... > }; > > Any difference in the compiled code or warnings? Yes, it's one of many things I tried. What happens here is that the warning remains as long as all fields are initialized together, e.g. these two produce the same warning: a) ho->init_seq = get_unaligned_be32(&no->init_seq); ho->delta = get_unaligned_be32(&no->delta); ho->previous_delta = get_unaligned_be32(&no->previous_delta); b) *ho = (struct ip_vs_seq) { .init_seq = get_unaligned_be32(&no->init_seq), .delta = get_unaligned_be32(&no->delta), .previous_delta = get_unaligned_be32(&no->previous_delta), }; but this one does not: c) *ho = (struct ip_vs_seq) { .delta = get_unaligned_be32(&no->delta), .previous_delta = get_unaligned_be32(&no->previous_delta), }; ho->init_seq = get_unaligned_be32(&no->init_seq); I have absolutely no idea what is going on inside of gcc here. Arnd
Re: [PATCH] can: fix warning in bcm_connect/proc_register
On Mon, Oct 24, 2016 at 1:10 PM, Cong Wang wrote: > On Mon, Oct 24, 2016 at 12:11 PM, Oliver Hartkopp > wrote: >> if (proc_dir) { >> /* unique socket address as filename */ >> sprintf(bo->procname, "%lu", sock_i_ino(sk)); >> bo->bcm_proc_read = proc_create_data(bo->procname, 0644, >> proc_dir, >> &bcm_proc_fops, sk); >> + if (!bo->bcm_proc_read) { >> + ret = -ENOMEM; >> + goto fail; >> + } > > Well, I meant we need to call proc_create_data() once per socket, > so we need a check before proc_create_data() too. Hmm, bo->bound should guarantee it, so never mind, your patch looks fine.
Re: [PATCH] can: fix warning in bcm_connect/proc_register
On Mon, Oct 24, 2016 at 12:11 PM, Oliver Hartkopp wrote: > if (proc_dir) { > /* unique socket address as filename */ > sprintf(bo->procname, "%lu", sock_i_ino(sk)); > bo->bcm_proc_read = proc_create_data(bo->procname, 0644, > proc_dir, > &bcm_proc_fops, sk); > + if (!bo->bcm_proc_read) { > + ret = -ENOMEM; > + goto fail; > + } Well, I meant we need to call proc_create_data() once per socket, so we need a check before proc_create_data() too. Thanks.
Re: [PATCH] netfilter: ip_vs_sync: fix bogus maybe-uninitialized warning
Hello, On Mon, 24 Oct 2016, Arnd Bergmann wrote: > Building the ip_vs_sync code with CONFIG_OPTIMIZE_INLINING on x86 > confuses the compiler to the point where it produces a rather > dubious warning message: > > net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.init_seq’ may be used > uninitialized in this function [-Werror=maybe-uninitialized] > struct ip_vs_sync_conn_options opt; > ^~~ > net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.delta’ may be used > uninitialized in this function [-Werror=maybe-uninitialized] > net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.previous_delta’ may be > used uninitialized in this function [-Werror=maybe-uninitialized] > net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void *)&opt+12).init_seq’ > may be used uninitialized in this function [-Werror=maybe-uninitialized] > net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void *)&opt+12).delta’ > may be used uninitialized in this function [-Werror=maybe-uninitialized] > net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void > *)&opt+12).previous_delta’ may be used uninitialized in this function > [-Werror=maybe-uninitialized] > > The problem appears to be a combination of a number of factors, including > the __builtin_bswap32 compiler builtin being slightly odd, having a large > amount of code inlined into a single function, and the way that some > functions only get partially inlined here. > > I've spent way too much time trying to work out a way to improve the > code, but the best I've come up with is to add an explicit memset > right before the ip_vs_seq structure is first initialized here. When > the compiler works correctly, this has absolutely no effect, but in the > case that produces the warning, the warning disappears. > > In the process of analysing this warning, I also noticed that > we use memcpy to copy the larger ip_vs_sync_conn_options structure > over two members of the ip_vs_conn structure. 
This works because > the layout is identical, but seems error-prone, so I'm changing > this in the process to directly copy the two members. This change > seemed to have no effect on the object code or the warning, but > it deals with the same data, so I kept the two changes together. > > Signed-off-by: Arnd Bergmann OK, Acked-by: Julian Anastasov I guess, Simon will take the patch for ipvs-next. > --- > net/netfilter/ipvs/ip_vs_sync.c | 7 +-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c > index 1b07578bedf3..9350530c16c1 100644 > --- a/net/netfilter/ipvs/ip_vs_sync.c > +++ b/net/netfilter/ipvs/ip_vs_sync.c > @@ -283,6 +283,7 @@ struct ip_vs_sync_buff { > */ > static void ntoh_seq(struct ip_vs_seq *no, struct ip_vs_seq *ho) > { > + memset(ho, 0, sizeof(*ho)); > ho->init_seq = get_unaligned_be32(&no->init_seq); > ho->delta = get_unaligned_be32(&no->delta); > ho->previous_delta = get_unaligned_be32(&no->previous_delta); So, now there is a double write here? What about such constructs?: *ho = (struct ip_vs_seq) { .init_seq = get_unaligned_be32(&no->init_seq), ... }; Any difference in the compiled code or warnings? > @@ -917,8 +918,10 @@ static void ip_vs_proc_conn(struct netns_ipvs *ipvs, > struct ip_vs_conn_param *pa > kfree(param->pe_data); > } > > - if (opt) > - memcpy(&cp->in_seq, opt, sizeof(*opt)); > + if (opt) { > + cp->in_seq = opt->in_seq; > + cp->out_seq = opt->out_seq; This fix is fine. > + } > atomic_set(&cp->in_pkts, sysctl_sync_threshold(ipvs)); > cp->state = state; > cp->old_state = cp->state; > -- > 2.9.0 Regards -- Julian Anastasov
Re: net/sctp: slab-out-of-bounds in sctp_sf_ootb
Hi Andrey, On Mon, Oct 24, 2016 at 05:30:04PM +0200, Andrey Konovalov wrote: > The problem is that sctp_walk_errors walks the chunk before its length > is checked for overflow. Exactly. The check is done too late, for the 2nd and subsequent chunks only. Please try the following patch, thanks. Note: not even compile tested. ---8<--- diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c index 026e3bca4a94..8ec20a64a3f8 100644 --- a/net/sctp/sm_statefuns.c +++ b/net/sctp/sm_statefuns.c @@ -3422,6 +3422,12 @@ sctp_disposition_t sctp_sf_ootb(struct net *net, return sctp_sf_violation_chunklen(net, ep, asoc, type, arg, commands); + /* Report violation if chunk len overflows */ + ch_end = ((__u8 *)ch) + SCTP_PAD4(ntohs(ch->length)); + if (ch_end > skb_tail_pointer(skb)) + return sctp_sf_violation_chunklen(net, ep, asoc, type, arg, + commands); + /* Now that we know we at least have a chunk header, * do things that are type appropriate. */ @@ -3453,12 +3459,6 @@ sctp_disposition_t sctp_sf_ootb(struct net *net, } } - /* Report violation if chunk len overflows */ - ch_end = ((__u8 *)ch) + SCTP_PAD4(ntohs(ch->length)); - if (ch_end > skb_tail_pointer(skb)) - return sctp_sf_violation_chunklen(net, ep, asoc, type, arg, - commands); - ch = (sctp_chunkhdr_t *) ch_end; } while (ch_end < skb_tail_pointer(skb));
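The bug pattern here — dereferencing a TLV chunk before validating its declared length against the end of the buffer — generalizes beyond SCTP. A self-contained sketch of the corrected ordering (the `walk_chunks` helper and `chunk_hdr` layout are illustrative, not the kernel's `sctp_chunkhdr_t`):

```c
#include <stddef.h>
#include <stdint.h>

#define PAD4(x) (((x) + 3u) & ~3u)   /* mirrors SCTP_PAD4: round up to 4 */

struct chunk_hdr { uint8_t type, flags; uint16_t length_be; };

/* Walk a buffer of 4-byte-padded TLV chunks, validating each declared
 * length against the buffer tail BEFORE touching the chunk body -- the
 * ordering the patch above restores for the first chunk. Returns the
 * number of valid chunks, or -1 on a length that overflows the buffer. */
static int walk_chunks(const uint8_t *buf, size_t len)
{
    const uint8_t *p = buf, *tail = buf + len;
    int n = 0;

    while (p + sizeof(struct chunk_hdr) <= tail) {
        uint16_t clen = (uint16_t)((p[2] << 8) | p[3]);   /* big-endian */
        const uint8_t *ch_end;

        if (clen < sizeof(struct chunk_hdr))
            return -1;                  /* bogus, too-small length */
        ch_end = p + PAD4(clen);
        if (ch_end > tail)
            return -1;                  /* declared length overflows buffer */
        n++;                            /* only now is the body safe to read */
        p = ch_end;
    }
    return n;
}
```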
Re: [net-next PATCH RFC 19/26] arch/sparc: Add option to skip DMA sync as a part of map and unmap
On Mon, Oct 24, 2016 at 11:27 AM, David Miller wrote: > From: Alexander Duyck > Date: Mon, 24 Oct 2016 08:06:07 -0400 > >> This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to >> avoid invoking cache line invalidation if the driver will just handle it >> via a sync_for_cpu or sync_for_device call. >> >> Cc: "David S. Miller" >> Cc: sparcli...@vger.kernel.org >> Signed-off-by: Alexander Duyck > > This is fine for avoiding the flush for performance reasons, but the > chip isn't going to write anything back unless the device wrote into > the area. That is mostly what I am doing here. The original implementation was mostly for performance. I am trying to take the attribute that was already in place for ARM and apply it to all the other architectures. So what will be happening now is that we call the map function with this attribute set and then use the sync functions to map it to the device and then pull the mapping later. The idea is that if Jesper does his page pool stuff it would be calling the map/unmap functions and then the drivers would be doing the sync_for_cpu/sync_for_device. I want to make sure the map is cheap and we will have to call sync_for_cpu from the drivers anyway since there is no guarantee if we will have a new page or be reusing an existing one. - Alex
[PATCH] net: ipv6: Do not consider link state for nexthop validation
Similar to IPv4, do not consider link state when validating next hops. Currently, if the link is down default routes can fail to insert: $ ip -6 ro add vrf blue default via 2100:2::64 dev eth2 RTNETLINK answers: No route to host With this patch the command succeeds. Fixes: 8c14586fc320 ("net: ipv6: Use passed in table for nexthop lookups") Signed-off-by: David Ahern--- include/net/ip6_route.h | 1 + net/ipv6/route.c| 6 -- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h index e0cd318d5103..f83e78d071a3 100644 --- a/include/net/ip6_route.h +++ b/include/net/ip6_route.h @@ -32,6 +32,7 @@ struct route_info { #define RT6_LOOKUP_F_SRCPREF_TMP 0x0008 #define RT6_LOOKUP_F_SRCPREF_PUBLIC0x0010 #define RT6_LOOKUP_F_SRCPREF_COA 0x0020 +#define RT6_LOOKUP_F_IGNORE_LINKSTATE 0x0040 /* We do not (yet ?) support IPv6 jumbograms (RFC 2675) * Unlike IPv4, hdr->seg_len doesn't include the IPv6 header diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 3ac19eb81a86..947ed1ded026 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -658,7 +658,8 @@ static struct rt6_info *find_match(struct rt6_info *rt, int oif, int strict, struct net_device *dev = rt->dst.dev; if (dev && !netif_carrier_ok(dev) && - idev->cnf.ignore_routes_with_linkdown) + idev->cnf.ignore_routes_with_linkdown && + !(strict & RT6_LOOKUP_F_IGNORE_LINKSTATE)) goto out; if (rt6_check_expired(rt)) @@ -1052,6 +1053,7 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table, int strict = 0; strict |= flags & RT6_LOOKUP_F_IFACE; + strict |= flags & RT6_LOOKUP_F_IGNORE_LINKSTATE; if (net->ipv6.devconf_all->forwarding == 0) strict |= RT6_LOOKUP_F_REACHABLE; @@ -1791,7 +1793,7 @@ static struct rt6_info *ip6_nh_lookup_table(struct net *net, }; struct fib6_table *table; struct rt6_info *rt; - int flags = RT6_LOOKUP_F_IFACE; + int flags = RT6_LOOKUP_F_IFACE | RT6_LOOKUP_F_IGNORE_LINKSTATE; table = fib6_get_table(net, cfg->fc_table); if (!table) 
-- 2.1.4
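The condition being patched in `find_match()` reduces to a small predicate: a link-down route is skipped only when the sysctl asks for it and the caller did not pass the new flag. A condensed sketch (the `fake_rt` struct and `route_usable` name are illustrative stand-ins for the kernel structures):

```c
#include <stdbool.h>

#define RT6_LOOKUP_F_IFACE            0x0001
#define RT6_LOOKUP_F_IGNORE_LINKSTATE 0x0040   /* the flag this patch adds */

struct fake_rt {
    bool carrier_ok;                 /* netif_carrier_ok(rt->dst.dev) */
    bool ignore_linkdown_routes;     /* idev->cnf.ignore_routes_with_linkdown */
};

/* Condensed form of the patched find_match() test: nexthop validation
 * passes RT6_LOOKUP_F_IGNORE_LINKSTATE so a down link no longer makes
 * route insertion fail with "No route to host". */
static bool route_usable(const struct fake_rt *rt, int strict)
{
    if (!rt->carrier_ok &&
        rt->ignore_linkdown_routes &&
        !(strict & RT6_LOOKUP_F_IGNORE_LINKSTATE))
        return false;
    return true;
}
```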
Re: [net-next PATCH RFC 02/26] swiotlb: Add support for DMA_ATTR_SKIP_CPU_SYNC
On Mon, Oct 24, 2016 at 11:09 AM, Konrad Rzeszutek Wilkwrote: > On Mon, Oct 24, 2016 at 08:04:37AM -0400, Alexander Duyck wrote: >> As a first step to making DMA_ATTR_SKIP_CPU_SYNC apply to architectures >> beyond just ARM I need to make it so that the swiotlb will respect the >> flag. In order to do that I also need to update the swiotlb-xen since it >> heavily makes use of the functionality. >> >> Cc: Konrad Rzeszutek Wilk >> Signed-off-by: Alexander Duyck >> --- >> drivers/xen/swiotlb-xen.c | 40 ++ >> include/linux/swiotlb.h |6 -- >> lib/swiotlb.c | 48 >> +++-- >> 3 files changed, 56 insertions(+), 38 deletions(-) >> >> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c >> index 87e6035..cf047d8 100644 >> --- a/drivers/xen/swiotlb-xen.c >> +++ b/drivers/xen/swiotlb-xen.c >> @@ -405,7 +405,8 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, >> struct page *page, >>*/ >> trace_swiotlb_bounced(dev, dev_addr, size, swiotlb_force); >> >> - map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir); >> + map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir, >> + attrs); >> if (map == SWIOTLB_MAP_ERROR) >> return DMA_ERROR_CODE; >> >> @@ -416,11 +417,13 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, >> struct page *page, >> /* >>* Ensure that the address returned is DMA'ble >>*/ >> - if (!dma_capable(dev, dev_addr, size)) { >> - swiotlb_tbl_unmap_single(dev, map, size, dir); >> - dev_addr = 0; >> - } >> - return dev_addr; >> + if (dma_capable(dev, dev_addr, size)) >> + return dev_addr; >> + >> + swiotlb_tbl_unmap_single(dev, map, size, dir, >> + attrs | DMA_ATTR_SKIP_CPU_SYNC); >> + >> + return DMA_ERROR_CODE; > > Why? This change (re-ordering the code - and returning DMA_ERROR_CODE instead > of 0) does not have anything to do with the title. > > If you really feel strongly about it - then please send it as a seperate > patch. Okay I can do that. 
This was mostly just to clean up the formatting because I was over 80 characters when I added the attribute. Changing the return value to DMA_ERROR_CODE from 0 was based on the fact that earlier in the function that is the value you return if there is a mapping error. >> } >> EXPORT_SYMBOL_GPL(xen_swiotlb_map_page); >> >> @@ -444,7 +447,7 @@ static void xen_unmap_single(struct device *hwdev, >> dma_addr_t dev_addr, >> >> /* NOTE: We use dev_addr here, not paddr! */ >> if (is_xen_swiotlb_buffer(dev_addr)) { >> - swiotlb_tbl_unmap_single(hwdev, paddr, size, dir); >> + swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs); >> return; >> } >> >> @@ -557,16 +560,9 @@ void xen_swiotlb_unmap_page(struct device *hwdev, >> dma_addr_t dev_addr, >> >> start_dma_addr, >>sg_phys(sg), >>sg->length, >> - dir); >> - if (map == SWIOTLB_MAP_ERROR) { >> - dev_warn(hwdev, "swiotlb buffer is full\n"); >> - /* Don't panic here, we expect map_sg users >> -to do proper error handling. */ >> - xen_swiotlb_unmap_sg_attrs(hwdev, sgl, i, dir, >> -attrs); >> - sg_dma_len(sgl) = 0; >> - return 0; >> - } >> + dir, attrs); >> + if (map == SWIOTLB_MAP_ERROR) >> + goto map_error; >> xen_dma_map_page(hwdev, pfn_to_page(map >> PAGE_SHIFT), >> dev_addr, >> map & ~PAGE_MASK, >> @@ -589,6 +585,16 @@ void xen_swiotlb_unmap_page(struct device *hwdev, >> dma_addr_t dev_addr, >> sg_dma_len(sg) = sg->length; >> } >> return nelems; >> +map_error: >> + dev_warn(hwdev, "swiotlb buffer is full\n"); >> + /* >> + * Don't panic here, we expect map_sg users >> + * to do proper error handling. >> + */ >> + xen_swiotlb_unmap_sg_attrs(hwdev, sgl, i, dir, >> +attrs | DMA_ATTR_SKIP_CPU_SYNC); >> + sg_dma_len(sgl) = 0; >> + return 0; >> } > > This too. Why can't that be part of the existing code that was there?
[PATCH] can: fix warning in bcm_connect/proc_register
Andrey Konovalov reported an issue with proc_register in bcm.c. As suggested by Cong Wang this patch adds a lock_sock() protection and a check for unsuccessful proc_create_data() in bcm_connect(). Reference: http://marc.info/?l=linux-netdev=147732648731237 Reported-by: Andrey KonovalovSuggested-by: Cong Wang Signed-off-by: Oliver Hartkopp --- net/can/bcm.c | 32 +++- 1 file changed, 23 insertions(+), 9 deletions(-) diff --git a/net/can/bcm.c b/net/can/bcm.c index 8e999ff..8af9d25 100644 --- a/net/can/bcm.c +++ b/net/can/bcm.c @@ -1549,24 +1549,31 @@ static int bcm_connect(struct socket *sock, struct sockaddr *uaddr, int len, struct sockaddr_can *addr = (struct sockaddr_can *)uaddr; struct sock *sk = sock->sk; struct bcm_sock *bo = bcm_sk(sk); + int ret = 0; if (len < sizeof(*addr)) return -EINVAL; - if (bo->bound) - return -EISCONN; + lock_sock(sk); + + if (bo->bound) { + ret = -EISCONN; + goto fail; + } /* bind a device to this socket */ if (addr->can_ifindex) { struct net_device *dev; dev = dev_get_by_index(_net, addr->can_ifindex); - if (!dev) - return -ENODEV; - + if (!dev) { + ret = -ENODEV; + goto fail; + } if (dev->type != ARPHRD_CAN) { dev_put(dev); - return -ENODEV; + ret = -ENODEV; + goto fail; } bo->ifindex = dev->ifindex; @@ -1577,17 +1584,24 @@ static int bcm_connect(struct socket *sock, struct sockaddr *uaddr, int len, bo->ifindex = 0; } - bo->bound = 1; - if (proc_dir) { /* unique socket address as filename */ sprintf(bo->procname, "%lu", sock_i_ino(sk)); bo->bcm_proc_read = proc_create_data(bo->procname, 0644, proc_dir, _proc_fops, sk); + if (!bo->bcm_proc_read) { + ret = -ENOMEM; + goto fail; + } } - return 0; + bo->bound = 1; + +fail: + release_sock(sk); + + return ret; } static int bcm_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, -- 2.9.3
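The control-flow change in this patch — take the lock once, route every failure through a single `fail:` label that releases it, and set `bound` only after all fallible steps succeed — can be sketched in userspace. All names below (`fake_sock`, `connect_sketch`, the `lock_depth` counter standing in for `lock_sock()`/`release_sock()`) are illustrative:

```c
#include <errno.h>
#include <stdbool.h>

struct fake_sock {
    int  lock_depth;    /* stands in for lock_sock()/release_sock() */
    bool bound;
    bool have_proc;     /* stands in for bo->bcm_proc_read */
};

static void lock_sock(struct fake_sock *sk)    { sk->lock_depth++; }
static void release_sock(struct fake_sock *sk) { sk->lock_depth--; }

/* Mirror of the patched bcm_connect() flow: every early error jumps to
 * `fail`, so the lock is dropped on exactly one path, and `bound` is
 * set only after everything fallible has succeeded. */
static int connect_sketch(struct fake_sock *sk, bool proc_create_ok)
{
    int ret = 0;

    lock_sock(sk);

    if (sk->bound) {
        ret = -EISCONN;
        goto fail;
    }

    if (!proc_create_ok) {      /* proc_create_data() returned NULL */
        ret = -ENOMEM;
        goto fail;
    }
    sk->have_proc = true;

    sk->bound = true;           /* only after all fallible steps passed */
fail:
    release_sock(sk);
    return ret;
}
```

Holding the lock across both the `bound` check and the proc registration is what closes the race syzkaller hit, since two concurrent connects can no longer both pass the `bound` test.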
Re: [net-next PATCH RFC 05/26] arch/avr32: Add option to skip sync on DMA map
Around Mon 24 Oct 2016 08:04:53 -0400 or thereabout, Alexander Duyck wrote: > The use of DMA_ATTR_SKIP_CPU_SYNC was not consistent across all of the DMA > APIs in the arch/arm folder. This change is meant to correct that so that > we get consistent behavior. Looks good (-: > Cc: Haavard Skinnemoen> Cc: Hans-Christian Egtvedt > Signed-off-by: Alexander Duyck Acked-by: Hans-Christian Noren Egtvedt > --- > arch/avr32/mm/dma-coherent.c |7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/arch/avr32/mm/dma-coherent.c b/arch/avr32/mm/dma-coherent.c > index 58610d0..54534e5 100644 > --- a/arch/avr32/mm/dma-coherent.c > +++ b/arch/avr32/mm/dma-coherent.c > @@ -146,7 +146,8 @@ static dma_addr_t avr32_dma_map_page(struct device *dev, > struct page *page, > { > void *cpu_addr = page_address(page) + offset; > > - dma_cache_sync(dev, cpu_addr, size, direction); > + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) > + dma_cache_sync(dev, cpu_addr, size, direction); > return virt_to_bus(cpu_addr); > } > > @@ -162,6 +163,10 @@ static int avr32_dma_map_sg(struct device *dev, struct > scatterlist *sglist, > > sg->dma_address = page_to_bus(sg_page(sg)) + sg->offset; > virt = sg_virt(sg); > + > + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) > + continue; > + > dma_cache_sync(dev, virt, sg->length, direction); > } > -- mvh Hans-Christian Noren Egtvedt
Re: [net-next PATCH RFC 19/26] arch/sparc: Add option to skip DMA sync as a part of map and unmap
From: Alexander Duyck Date: Mon, 24 Oct 2016 08:06:07 -0400 > This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to > avoid invoking cache line invalidation if the driver will just handle it > via a sync_for_cpu or sync_for_device call. > > Cc: "David S. Miller" > Cc: sparcli...@vger.kernel.org > Signed-off-by: Alexander Duyck This is fine for avoiding the flush for performance reasons, but the chip isn't going to write anything back unless the device wrote into the area.
Re: net/can: warning in bcm_connect/proc_register
Hello Andrey, hello Cong, thanks for catching this issue. I added lock_sock() and a check for a failing proc_create_data() below. Can you please check if it solved the issue? I tested the patched version with the stress tool as advised by Andrey and did not see any problems in dmesg anymore. If ok I can provide a proper patch. Many thanks, Oliver diff --git a/net/can/bcm.c b/net/can/bcm.c index 8e999ff..8af9d25 100644 --- a/net/can/bcm.c +++ b/net/can/bcm.c @@ -1549,24 +1549,31 @@ static int bcm_connect(struct socket *sock, struct sockaddr *uaddr, int len, struct sockaddr_can *addr = (struct sockaddr_can *)uaddr; struct sock *sk = sock->sk; struct bcm_sock *bo = bcm_sk(sk); + int ret = 0; if (len < sizeof(*addr)) return -EINVAL; - if (bo->bound) - return -EISCONN; + lock_sock(sk); + + if (bo->bound) { + ret = -EISCONN; + goto fail; + } /* bind a device to this socket */ if (addr->can_ifindex) { struct net_device *dev; dev = dev_get_by_index(_net, addr->can_ifindex); - if (!dev) - return -ENODEV; - + if (!dev) { + ret = -ENODEV; + goto fail; + } if (dev->type != ARPHRD_CAN) { dev_put(dev); - return -ENODEV; + ret = -ENODEV; + goto fail; } bo->ifindex = dev->ifindex; @@ -1577,17 +1584,24 @@ static int bcm_connect(struct socket *sock, struct sockaddr *uaddr, int len, bo->ifindex = 0; } - bo->bound = 1; - if (proc_dir) { /* unique socket address as filename */ sprintf(bo->procname, "%lu", sock_i_ino(sk)); bo->bcm_proc_read = proc_create_data(bo->procname, 0644, proc_dir, _proc_fops, sk); + if (!bo->bcm_proc_read) { + ret = -ENOMEM; + goto fail; + } } - return 0; + bo->bound = 1; + +fail: + release_sock(sk); + + return ret; } static int bcm_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, On 10/24/2016 07:31 PM, Andrey Konovalov wrote: Hi Cong, I'm able to reproduce it by running https://gist.github.com/xairy/33f2eb6bf807b004e643bae36c3d02d7 in a tight parallel loop with stress (https://godoc.org/golang.org/x/tools/cmd/stress): $ gcc -lpthread tmp.c $ 
./stress ./a.out The C program was generated from the following syzkaller prog: mmap(&(0x7f00/0x991000)=nil, (0x991000), 0x3, 0x32, 0x, 0x0) socket(0x1d, 0x80002, 0x2) r0 = socket(0x1d, 0x80002, 0x2) connect$nfc_llcp(r0, &(0x7f00c000)={0x27, 0x1, 0x0, 0x5, 0x1, 0x1, "341b3a01b257849ca1d7d1ff9f999d8127b185f88d1d775d59c88a3aa6a8ddacdf2bdc324ea6578a21b85114610186c3817c34b05eaffd2c3f54f57fa81ba0", 0x1ff}, 0x60) connect$nfc_llcp(r0, &(0x7f991000-0x60)={0x27, 0x1, 0x1, 0x5, 0xfffd, 0x0, "341b3a01b257849ca1d7d1ff9f999d8127b185f88d1d775dbec88a3aa6a8ddacdf2bdc324ea6578a21b85114610186c3817c34b05eaffd2c3f54f57fa81ba0", 0x1ff}, 0x60) Unfortunately I wasn't able to create a simpler reproducer. Thanks! On Mon, Oct 24, 2016 at 6:58 PM, Cong Wangwrote: On Mon, Oct 24, 2016 at 9:21 AM, Andrey Konovalov wrote: Hi, I've got the following error report while running the syzkaller fuzzer: WARNING: CPU: 0 PID: 32451 at fs/proc/generic.c:345 proc_register+0x25e/0x300 proc_dir_entry 'can-bcm/249757' already registered Kernel panic - not syncing: panic_on_warn set ... Looks like we have two problems here: 1) A check for bo->bcm_proc_read != NULL seems missing 2) We need to lock the sock in bcm_connect(). I will work on a patch. Meanwhile, it would help a lot if you could provide a reproducer. Thanks!
[net-next PATCH RFC 19/26] arch/sparc: Add option to skip DMA sync as a part of map and unmap
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it via a sync_for_cpu or sync_for_device call. Cc: "David S. Miller"Cc: sparcli...@vger.kernel.org Signed-off-by: Alexander Duyck --- arch/sparc/kernel/iommu.c |4 ++-- arch/sparc/kernel/ioport.c |4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c index 5c615ab..8fda4e4 100644 --- a/arch/sparc/kernel/iommu.c +++ b/arch/sparc/kernel/iommu.c @@ -415,7 +415,7 @@ static void dma_4u_unmap_page(struct device *dev, dma_addr_t bus_addr, ctx = (iopte_val(*base) & IOPTE_CONTEXT) >> 47UL; /* Step 1: Kick data out of streaming buffers if necessary. */ - if (strbuf->strbuf_enabled) + if (strbuf->strbuf_enabled && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) strbuf_flush(strbuf, iommu, bus_addr, ctx, npages, direction); @@ -640,7 +640,7 @@ static void dma_4u_unmap_sg(struct device *dev, struct scatterlist *sglist, base = iommu->page_table + entry; dma_handle &= IO_PAGE_MASK; - if (strbuf->strbuf_enabled) + if (strbuf->strbuf_enabled && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) strbuf_flush(strbuf, iommu, dma_handle, ctx, npages, direction); diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c index 2344103..6ffaec4 100644 --- a/arch/sparc/kernel/ioport.c +++ b/arch/sparc/kernel/ioport.c @@ -527,7 +527,7 @@ static dma_addr_t pci32_map_page(struct device *dev, struct page *page, static void pci32_unmap_page(struct device *dev, dma_addr_t ba, size_t size, enum dma_data_direction dir, unsigned long attrs) { - if (dir != PCI_DMA_TODEVICE) + if (dir != PCI_DMA_TODEVICE && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) dma_make_coherent(ba, PAGE_ALIGN(size)); } @@ -572,7 +572,7 @@ static void pci32_unmap_sg(struct device *dev, struct scatterlist *sgl, struct scatterlist *sg; int n; - if (dir != PCI_DMA_TODEVICE) { + if (dir != PCI_DMA_TODEVICE && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) { 
for_each_sg(sgl, sg, nents, n) { dma_make_coherent(sg_phys(sg), PAGE_ALIGN(sg->length)); }
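The pattern this patch (and the tile and avr32 counterparts) applies is uniform: the expensive cache maintenance inside map/unmap is gated on the attribute, while the explicit `sync_for_cpu`/`sync_for_device` entry points stay unconditional. A mock that makes the behavior observable (the `mock_*` functions and the sync counter are illustrative; the attribute value matches the kernel's `(1UL << 5)` at the time, but treat it as an assumption):

```c
#include <stdint.h>

#define DMA_ATTR_SKIP_CPU_SYNC (1UL << 5)   /* assumed kernel value */

/* Count invocations of the (mock) cache maintenance routine so the
 * effect of the attribute can be asserted. */
static int cache_syncs;

static void cache_sync(void *addr, unsigned long size)
{
    (void)addr; (void)size;
    cache_syncs++;
}

/* Shape of the patched map path: the sync is skipped when the driver
 * promises to do it later via an explicit sync call. */
static uintptr_t mock_map_page(void *addr, unsigned long size,
                               unsigned long attrs)
{
    if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
        cache_sync(addr, size);
    return (uintptr_t)addr;     /* identity "bus address" for the mock */
}

/* The explicit sync entry point remains unconditional. */
static void mock_sync_for_device(void *addr, unsigned long size)
{
    cache_sync(addr, size);
}
```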
[net-next PATCH RFC 23/26] mm: Add support for releasing multiple instances of a page
This patch adds a function that allows us to batch free a page that has multiple references outstanding. Specifically this function can be used to drop a page being used in the page frag alloc cache. With this drivers can make use of functionality similar to the page frag alloc cache without having to do any workarounds for the fact that there is no function that frees multiple references. Cc: linux...@kvack.org Signed-off-by: Alexander Duyck--- include/linux/gfp.h |2 ++ mm/page_alloc.c | 14 ++ 2 files changed, 16 insertions(+) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index f8041f9de..4175dca 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -506,6 +506,8 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order, extern void free_hot_cold_page_list(struct list_head *list, bool cold); struct page_frag_cache; +extern void __page_frag_drain(struct page *page, unsigned int order, + unsigned int count); extern void *__alloc_page_frag(struct page_frag_cache *nc, unsigned int fragsz, gfp_t gfp_mask); extern void __free_page_frag(void *addr); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index ca423cc..253046a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3883,6 +3883,20 @@ static struct page *__page_frag_refill(struct page_frag_cache *nc, return page; } +void __page_frag_drain(struct page *page, unsigned int order, + unsigned int count) +{ + VM_BUG_ON_PAGE(page_ref_count(page) == 0, page); + + if (page_ref_sub_and_test(page, count)) { + if (order == 0) + free_hot_cold_page(page, false); + else + __free_pages_ok(page, order); + } +} +EXPORT_SYMBOL(__page_frag_drain); + void *__alloc_page_frag(struct page_frag_cache *nc, unsigned int fragsz, gfp_t gfp_mask) {
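The semantics of `__page_frag_drain()` — drop many references in one step and free only when the count reaches zero — are easy to model in userspace. A sketch with illustrative names (`frag_page`, `ref_sub_and_test`, `frag_drain` stand in for the page refcount machinery):

```c
#include <stdbool.h>

struct frag_page {
    int  refcount;
    bool freed;
};

/* Userspace analogue of page_ref_sub_and_test(): drop `count`
 * references at once and report whether the count hit zero. */
static bool ref_sub_and_test(struct frag_page *p, int count)
{
    p->refcount -= count;
    return p->refcount == 0;
}

/* Analogue of __page_frag_drain(): batch-release the references a
 * frag-alloc style cache accumulated, freeing only on the last one. */
static void frag_drain(struct frag_page *p, int count)
{
    if (ref_sub_and_test(p, count))
        p->freed = true;        /* stands in for free_hot_cold_page() */
}
```

This is the primitive a driver needs to retire a page-frag cache page in one call instead of calling a single-reference free in a loop.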
Re: [net-next PATCH RFC 01/26] swiotlb: Drop unused function swiotlb_map_sg
On Mon, Oct 24, 2016 at 08:04:31AM -0400, Alexander Duyck wrote: > There are no users for swiotlb_map_sg so we might as well just drop it. > > Cc: Konrad Rzeszutek WilkAcked-by: Konrad Rzeszutek Wilk Thought I swear I saw a familiar patch by Christopher Hellwig at some point.. but maybe that patchset had been dropped. > Signed-off-by: Alexander Duyck > --- > include/linux/swiotlb.h |4 > lib/swiotlb.c |8 > 2 files changed, 12 deletions(-) > > diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h > index 5f81f8a..e237b6f 100644 > --- a/include/linux/swiotlb.h > +++ b/include/linux/swiotlb.h > @@ -72,10 +72,6 @@ extern void swiotlb_unmap_page(struct device *hwdev, > dma_addr_t dev_addr, > size_t size, enum dma_data_direction dir, > unsigned long attrs); > > -extern int > -swiotlb_map_sg(struct device *hwdev, struct scatterlist *sg, int nents, > -enum dma_data_direction dir); > - > extern void > swiotlb_unmap_sg(struct device *hwdev, struct scatterlist *sg, int nents, >enum dma_data_direction dir); > diff --git a/lib/swiotlb.c b/lib/swiotlb.c > index 22e13a0..47aad37 100644 > --- a/lib/swiotlb.c > +++ b/lib/swiotlb.c > @@ -910,14 +910,6 @@ void swiotlb_unmap_page(struct device *hwdev, dma_addr_t > dev_addr, > } > EXPORT_SYMBOL(swiotlb_map_sg_attrs); > > -int > -swiotlb_map_sg(struct device *hwdev, struct scatterlist *sgl, int nelems, > -enum dma_data_direction dir) > -{ > - return swiotlb_map_sg_attrs(hwdev, sgl, nelems, dir, 0); > -} > -EXPORT_SYMBOL(swiotlb_map_sg); > - > /* > * Unmap a set of streaming mode DMA translations. Again, cpu read rules > * concerning calls here are the same as for swiotlb_unmap_page() above. >
[net-next PATCH RFC 26/26] igb: Revert "igb: Revert support for build_skb in igb"
This reverts commit f9d40f6a9921 ("igb: Revert support for build_skb in igb") and adds a few changes to update it to work with the latest version of igb. We are now able to revert the removal of this due to the fact that with the recent changes to the page count and the use of DMA_ATTR_SKIP_CPU_SYNC we can make the pages writable so we should not be invalidating the additional data added when we call build_skb. The biggest risk with this change is that we are now not able to support full jumbo frames when using build_skb. Instead we can only support up to 2K minus the skb overhead and padding offset. Signed-off-by: Alexander Duyck--- drivers/net/ethernet/intel/igb/igb.h | 29 ++ drivers/net/ethernet/intel/igb/igb_main.c | 130 ++--- 2 files changed, 142 insertions(+), 17 deletions(-) diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h index acbc3ab..c3420f3 100644 --- a/drivers/net/ethernet/intel/igb/igb.h +++ b/drivers/net/ethernet/intel/igb/igb.h @@ -145,6 +145,10 @@ struct vf_data_storage { #define IGB_RX_HDR_LEN IGB_RXBUFFER_256 #define IGB_RX_BUFSZ IGB_RXBUFFER_2048 +#define IGB_SKB_PAD(NET_SKB_PAD + NET_IP_ALIGN) +#define IGB_MAX_BUILD_SKB_SIZE \ + (SKB_WITH_OVERHEAD(IGB_RX_BUFSZ) - (IGB_SKB_PAD + IGB_TS_HDR_LEN)) + /* How many Rx Buffers do we bundle into one write to the hardware ? 
*/ #define IGB_RX_BUFFER_WRITE16 /* Must be power of 2 */ @@ -301,12 +305,29 @@ struct igb_q_vector { }; enum e1000_ring_flags_t { - IGB_RING_FLAG_RX_SCTP_CSUM, - IGB_RING_FLAG_RX_LB_VLAN_BSWAP, - IGB_RING_FLAG_TX_CTX_IDX, - IGB_RING_FLAG_TX_DETECT_HANG + IGB_RING_FLAG_RX_SCTP_CSUM = 0, +#if (NET_IP_ALIGN != 0) + IGB_RING_FLAG_RX_BUILD_SKB_ENABLED = 1, +#endif + IGB_RING_FLAG_RX_LB_VLAN_BSWAP = 2, + IGB_RING_FLAG_TX_CTX_IDX = 3, + IGB_RING_FLAG_TX_DETECT_HANG = 4, +#if (NET_IP_ALIGN == 0) +#if (L1_CACHE_SHIFT < 5) + IGB_RING_FLAG_RX_BUILD_SKB_ENABLED = 5, +#else + IGB_RING_FLAG_RX_BUILD_SKB_ENABLED = L1_CACHE_SHIFT, +#endif +#endif }; +#define ring_uses_build_skb(ring) \ + test_bit(IGB_RING_FLAG_RX_BUILD_SKB_ENABLED, &(ring)->flags) +#define set_ring_build_skb_enabled(ring) \ + set_bit(IGB_RING_FLAG_RX_BUILD_SKB_ENABLED, &(ring)->flags) +#define clear_ring_build_skb_enabled(ring) \ + clear_bit(IGB_RING_FLAG_RX_BUILD_SKB_ENABLED, &(ring)->flags) + #define IGB_TXD_DCMD (E1000_ADVTXD_DCMD_EOP | E1000_ADVTXD_DCMD_RS) #define IGB_RX_DESC(R, i) \ diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 83fdef6..7674a50 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -3761,6 +3761,16 @@ void igb_configure_rx_ring(struct igb_adapter *adapter, wr32(E1000_RXDCTL(reg_idx), rxdctl); } +static void igb_set_rx_buffer_len(struct igb_adapter *adapter, + struct igb_ring *rx_ring) +{ + /* set build_skb flag */ + if (adapter->max_frame_size <= IGB_MAX_BUILD_SKB_SIZE) + set_ring_build_skb_enabled(rx_ring); + else + clear_ring_build_skb_enabled(rx_ring); +} + /** * igb_configure_rx - Configure receive Unit after Reset * @adapter: board private structure @@ -3778,8 +3788,12 @@ static void igb_configure_rx(struct igb_adapter *adapter) /* Setup the HW Rx Head and Tail Descriptor Pointers and * the Base and Length of the Rx Descriptor Ring */ - for (i = 0; i < adapter->num_rx_queues; 
i++) - igb_configure_rx_ring(adapter, adapter->rx_ring[i]); + for (i = 0; i < adapter->num_rx_queues; i++) { + struct igb_ring *rx_ring = adapter->rx_ring[i]; + + igb_set_rx_buffer_len(adapter, rx_ring); + igb_configure_rx_ring(adapter, rx_ring); + } } /** @@ -4238,7 +4252,7 @@ static void igb_set_rx_mode(struct net_device *netdev) struct igb_adapter *adapter = netdev_priv(netdev); struct e1000_hw *hw = >hw; unsigned int vfn = adapter->vfs_allocated_count; - u32 rctl = 0, vmolr = 0; + u32 rctl = 0, vmolr = 0, rlpml = MAX_JUMBO_FRAME_SIZE; int count; /* Check for Promiscuous and All Multicast modes */ @@ -4310,12 +4324,18 @@ static void igb_set_rx_mode(struct net_device *netdev) vmolr |= rd32(E1000_VMOLR(vfn)) & ~(E1000_VMOLR_ROPE | E1000_VMOLR_MPME | E1000_VMOLR_ROMPE); - /* enable Rx jumbo frames, no need for restriction */ + /* enable Rx jumbo frames, restrict as needed to support build_skb */ vmolr &= ~E1000_VMOLR_RLPML_MASK; - vmolr |= MAX_JUMBO_FRAME_SIZE | E1000_VMOLR_LPE; + vmolr |= E1000_VMOLR_LPE; + vmolr |= (adapter->max_frame_size <= IGB_MAX_BUILD_SKB_SIZE) ? +
[net-next PATCH RFC 22/26] dma: Add calls for dma_map_page_attrs and dma_unmap_page_attrs
Add support for mapping and unmapping a page with attributes. The primary use for this is currently to allow for us to pass the DMA_ATTR_SKIP_CPU_SYNC attribute when mapping and unmapping a page. On some architectures such as ARM the synchronization has significant overhead and if we are already taking care of the sync_for_cpu and sync_for_device from the driver there isn't much need to handle this in the map/unmap calls as well. Signed-off-by: Alexander Duyck--- include/linux/dma-mapping.h | 20 +--- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index 08528af..10c5a17 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -243,29 +243,33 @@ static inline void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg ops->unmap_sg(dev, sg, nents, dir, attrs); } -static inline dma_addr_t dma_map_page(struct device *dev, struct page *page, - size_t offset, size_t size, - enum dma_data_direction dir) +static inline dma_addr_t dma_map_page_attrs(struct device *dev, + struct page *page, + size_t offset, size_t size, + enum dma_data_direction dir, + unsigned long attrs) { struct dma_map_ops *ops = get_dma_ops(dev); dma_addr_t addr; kmemcheck_mark_initialized(page_address(page) + offset, size); BUG_ON(!valid_dma_direction(dir)); - addr = ops->map_page(dev, page, offset, size, dir, 0); + addr = ops->map_page(dev, page, offset, size, dir, attrs); debug_dma_map_page(dev, page, offset, size, dir, addr, false); return addr; } -static inline void dma_unmap_page(struct device *dev, dma_addr_t addr, - size_t size, enum dma_data_direction dir) +static inline void dma_unmap_page_attrs(struct device *dev, + dma_addr_t addr, size_t size, + enum dma_data_direction dir, + unsigned long attrs) { struct dma_map_ops *ops = get_dma_ops(dev); BUG_ON(!valid_dma_direction(dir)); if (ops->unmap_page) - ops->unmap_page(dev, addr, size, dir, 0); + ops->unmap_page(dev, addr, size, dir, attrs); 
debug_dma_unmap_page(dev, addr, size, dir, false); } @@ -385,6 +389,8 @@ static inline void dma_sync_single_range_for_device(struct device *dev, #define dma_unmap_single(d, a, s, r) dma_unmap_single_attrs(d, a, s, r, 0) #define dma_map_sg(d, s, n, r) dma_map_sg_attrs(d, s, n, r, 0) #define dma_unmap_sg(d, s, n, r) dma_unmap_sg_attrs(d, s, n, r, 0) +#define dma_map_page(d, p, o, s, r) dma_map_page_attrs(d, p, o, s, r, 0) +#define dma_unmap_page(d, a, s, r) dma_unmap_page_attrs(d, a, s, r, 0) extern int dma_common_mmap(struct device *dev, struct vm_area_struct *vma, void *cpu_addr, dma_addr_t dma_addr, size_t size);
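The shape of this patch — the old entry points become thin forwarders over new `*_attrs()` variants with `attrs = 0` — can be sketched in userspace. All names below are illustrative mocks, not the real kernel API; only the forwarding pattern mirrors the patch.

```c
#include <assert.h>

/* Mock attribute bit standing in for DMA_ATTR_SKIP_CPU_SYNC. */
#define MOCK_ATTR_SKIP_CPU_SYNC (1UL << 5)

static unsigned long last_attrs_seen; /* records what the backend received */

/* New attrs-taking variant: the backend now sees the caller's attrs
 * instead of a hard-coded 0. */
static unsigned long mock_map_page_attrs(unsigned long paddr,
                                         unsigned long offset,
                                         unsigned long attrs)
{
    last_attrs_seen = attrs;
    return paddr + offset;     /* trivial "bus address" for the sketch */
}

/* Legacy entry point keeps its signature by forwarding attrs = 0,
 * just as the new dma_map_page() macro in the patch does. */
#define mock_map_page(paddr, offset) \
    mock_map_page_attrs((paddr), (offset), 0)
```

Existing callers compile unchanged, while drivers that want the new behavior opt in by calling the `_attrs` variant directly.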
[net-next PATCH RFC 20/26] arch/tile: Add option to skip DMA sync as a part of map and unmap
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it via a sync_for_cpu or sync_for_device call. Cc: Chris MetcalfSigned-off-by: Alexander Duyck --- arch/tile/kernel/pci-dma.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/tile/kernel/pci-dma.c b/arch/tile/kernel/pci-dma.c index 09bb774..24e0f8c 100644 --- a/arch/tile/kernel/pci-dma.c +++ b/arch/tile/kernel/pci-dma.c @@ -213,10 +213,12 @@ static int tile_dma_map_sg(struct device *dev, struct scatterlist *sglist, for_each_sg(sglist, sg, nents, i) { sg->dma_address = sg_phys(sg); - __dma_prep_pa_range(sg->dma_address, sg->length, direction); #ifdef CONFIG_NEED_SG_DMA_LENGTH sg->dma_length = sg->length; #endif + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + continue; + __dma_prep_pa_range(sg->dma_address, sg->length, direction); } return nents; @@ -232,6 +234,8 @@ static void tile_dma_unmap_sg(struct device *dev, struct scatterlist *sglist, BUG_ON(!valid_dma_direction(direction)); for_each_sg(sglist, sg, nents, i) { sg->dma_address = sg_phys(sg); + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + continue; __dma_complete_pa_range(sg->dma_address, sg->length, direction); } @@ -245,7 +249,8 @@ static dma_addr_t tile_dma_map_page(struct device *dev, struct page *page, BUG_ON(!valid_dma_direction(direction)); BUG_ON(offset + size > PAGE_SIZE); - __dma_prep_page(page, offset, size, direction); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + __dma_prep_page(page, offset, size, direction); return page_to_pa(page) + offset; } @@ -256,6 +261,9 @@ static void tile_dma_unmap_page(struct device *dev, dma_addr_t dma_address, { BUG_ON(!valid_dma_direction(direction)); + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + return; + __dma_complete_page(pfn_to_page(PFN_DOWN(dma_address)), dma_address & (PAGE_SIZE - 1), size, direction); }
[net-next PATCH RFC 24/26] igb: Update driver to make use of DMA_ATTR_SKIP_CPU_SYNC
The ARM architecture provides a mechanism for deferring cache line invalidation in the case of map/unmap. This patch makes use of this mechanism to avoid unnecessary synchronization. A secondary effect of this change is that the portion of the page that has been synchronized for use by the CPU should be writable and could be passed up the stack (at least on ARM). The last bit that occurred to me is that on architectures where the sync_for_cpu call invalidates cache lines we were prefetching and then invalidating the first 128 bytes of the packet. To avoid that I have moved the sync up to before we perform the prefetch and allocate the skbuff so that we can actually make use of it. Signed-off-by: Alexander Duyck--- drivers/net/ethernet/intel/igb/igb_main.c | 53 ++--- 1 file changed, 33 insertions(+), 20 deletions(-) diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 4feca69..c8c458c 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -3947,10 +3947,21 @@ static void igb_clean_rx_ring(struct igb_ring *rx_ring) if (!buffer_info->page) continue; - dma_unmap_page(rx_ring->dev, - buffer_info->dma, - PAGE_SIZE, - DMA_FROM_DEVICE); + /* Invalidate cache lines that may have been written to by +* device so that we avoid corrupting memory. 
+*/ + dma_sync_single_range_for_cpu(rx_ring->dev, + buffer_info->dma, + buffer_info->page_offset, + IGB_RX_BUFSZ, + DMA_FROM_DEVICE); + + /* free resources associated with mapping */ + dma_unmap_page_attrs(rx_ring->dev, +buffer_info->dma, +PAGE_SIZE, +DMA_FROM_DEVICE, +DMA_ATTR_SKIP_CPU_SYNC); __free_page(buffer_info->page); buffer_info->page = NULL; @@ -6808,12 +6819,6 @@ static void igb_reuse_rx_page(struct igb_ring *rx_ring, /* transfer page from old buffer to new buffer */ *new_buff = *old_buff; - - /* sync the buffer for use by the device */ - dma_sync_single_range_for_device(rx_ring->dev, old_buff->dma, -old_buff->page_offset, -IGB_RX_BUFSZ, -DMA_FROM_DEVICE); } static inline bool igb_page_is_reserved(struct page *page) @@ -6934,6 +6939,13 @@ static struct sk_buff *igb_fetch_rx_buffer(struct igb_ring *rx_ring, page = rx_buffer->page; prefetchw(page); + /* we are reusing so sync this buffer for CPU use */ + dma_sync_single_range_for_cpu(rx_ring->dev, + rx_buffer->dma, + rx_buffer->page_offset, + size, + DMA_FROM_DEVICE); + if (likely(!skb)) { void *page_addr = page_address(page) + rx_buffer->page_offset; @@ -6958,21 +6970,15 @@ static struct sk_buff *igb_fetch_rx_buffer(struct igb_ring *rx_ring, prefetchw(skb->data); } - /* we are reusing so sync this buffer for CPU use */ - dma_sync_single_range_for_cpu(rx_ring->dev, - rx_buffer->dma, - rx_buffer->page_offset, - size, - DMA_FROM_DEVICE); - /* pull page into skb */ if (igb_add_rx_frag(rx_ring, rx_buffer, size, rx_desc, skb)) { /* hand second half of page back to the ring */ igb_reuse_rx_page(rx_ring, rx_buffer); } else { /* we are not reusing the buffer so unmap it */ - dma_unmap_page(rx_ring->dev, rx_buffer->dma, - PAGE_SIZE, DMA_FROM_DEVICE); + dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma, +PAGE_SIZE, DMA_FROM_DEVICE, +DMA_ATTR_SKIP_CPU_SYNC); } /* clear contents of rx_buffer */ @@ -7230,7 +7236,8 @@ static bool igb_alloc_mapped_page(struct igb_ring *rx_ring, } /* map page for use */ - dma = 
dma_map_page(rx_ring->dev, page, 0, PAGE_SIZE, DMA_FROM_DEVICE); + dma = dma_map_page_attrs(rx_ring->dev, page, 0, PAGE_SIZE, +DMA_FROM_DEVICE,
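The lifecycle this igb patch moves to can be sketched abstractly: the page is mapped and unmapped with the CPU sync skipped, and the driver issues a narrow sync itself, per packet, before prefetching or building the skb. The names and counters below are mock stand-ins under that assumption, not driver code:

```c
#include <assert.h>

#define MOCK_ATTR_SKIP_CPU_SYNC (1UL << 5)

static int map_syncs;      /* syncs done implicitly by map/unmap */
static int explicit_syncs; /* syncs issued by the driver itself */

static void mock_map(unsigned long attrs)
{
    if (!(attrs & MOCK_ATTR_SKIP_CPU_SYNC))
        map_syncs++;       /* old behavior: whole page synced at map */
}

static void mock_sync_range_for_cpu(void)
{
    explicit_syncs++;      /* driver syncs just the received range */
}

static void mock_unmap(unsigned long attrs)
{
    if (!(attrs & MOCK_ATTR_SKIP_CPU_SYNC))
        map_syncs++;
}

/* One buffer's life: map once, sync per packet, unmap when done.
 * Returns the number of driver-issued syncs. */
static int rx_lifecycle(int packets)
{
    mock_map(MOCK_ATTR_SKIP_CPU_SYNC);       /* alloc-time mapping */
    for (int i = 0; i < packets; i++)
        mock_sync_range_for_cpu();           /* before prefetch/skb */
    mock_unmap(MOCK_ATTR_SKIP_CPU_SYNC);     /* page not reused */
    return explicit_syncs;
}
```

With the attribute set on both ends, the only cache maintenance left is the per-packet range sync the driver places exactly where the CPU is about to read.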
[net-next PATCH RFC 15/26] arch/openrisc: Add option to skip DMA sync as a part of mapping
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it via a sync_for_cpu or sync_for_device call. Cc: Jonas BonnSigned-off-by: Alexander Duyck --- arch/openrisc/kernel/dma.c |3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c index 140c991..906998b 100644 --- a/arch/openrisc/kernel/dma.c +++ b/arch/openrisc/kernel/dma.c @@ -141,6 +141,9 @@ unsigned long cl; dma_addr_t addr = page_to_phys(page) + offset; + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + return addr; + switch (dir) { case DMA_TO_DEVICE: /* Flush the dcache for the requested range */
[net-next PATCH RFC 09/26] arch/hexagon: Add option to skip DMA sync as a part of mapping
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it later via a sync_for_cpu or sync_for_device call. Cc: Richard KuoCc: linux-hexa...@vger.kernel.org Signed-off-by: Alexander Duyck --- arch/hexagon/kernel/dma.c |6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c index b901778..dbc4f10 100644 --- a/arch/hexagon/kernel/dma.c +++ b/arch/hexagon/kernel/dma.c @@ -119,6 +119,9 @@ static int hexagon_map_sg(struct device *hwdev, struct scatterlist *sg, s->dma_length = s->length; + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + continue; + flush_dcache_range(dma_addr_to_virt(s->dma_address), dma_addr_to_virt(s->dma_address + s->length)); } @@ -180,7 +183,8 @@ static dma_addr_t hexagon_map_page(struct device *dev, struct page *page, if (!check_addr("map_single", dev, bus, size)) return bad_dma_address; - dma_sync(dma_addr_to_virt(bus), size, dir); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + dma_sync(dma_addr_to_virt(bus), size, dir); return bus; }
[net-next PATCH RFC 08/26] arch/frv: Add option to skip sync on DMA map
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it later via a sync_for_cpu or sync_for_device call. Signed-off-by: Alexander Duyck--- arch/frv/mb93090-mb00/pci-dma-nommu.c | 16 +++- arch/frv/mb93090-mb00/pci-dma.c |7 ++- 2 files changed, 17 insertions(+), 6 deletions(-) diff --git a/arch/frv/mb93090-mb00/pci-dma-nommu.c b/arch/frv/mb93090-mb00/pci-dma-nommu.c index 90f2e4c..ff606d1 100644 --- a/arch/frv/mb93090-mb00/pci-dma-nommu.c +++ b/arch/frv/mb93090-mb00/pci-dma-nommu.c @@ -109,16 +109,19 @@ static int frv_dma_map_sg(struct device *dev, struct scatterlist *sglist, int nents, enum dma_data_direction direction, unsigned long attrs) { - int i; struct scatterlist *sg; + int i; + + WARN_ON(direction == DMA_NONE); + + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + return nents; for_each_sg(sglist, sg, nents, i) { frv_cache_wback_inv(sg_dma_address(sg), sg_dma_address(sg) + sg_dma_len(sg)); } - BUG_ON(direction == DMA_NONE); - return nents; } @@ -126,8 +129,11 @@ static dma_addr_t frv_dma_map_page(struct device *dev, struct page *page, unsigned long offset, size_t size, enum dma_data_direction direction, unsigned long attrs) { - BUG_ON(direction == DMA_NONE); - flush_dcache_page(page); + WARN_ON(direction == DMA_NONE); + + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + flush_dcache_page(page); + return (dma_addr_t) page_to_phys(page) + offset; } diff --git a/arch/frv/mb93090-mb00/pci-dma.c b/arch/frv/mb93090-mb00/pci-dma.c index f585745..ee5dadf 100644 --- a/arch/frv/mb93090-mb00/pci-dma.c +++ b/arch/frv/mb93090-mb00/pci-dma.c @@ -52,6 +52,9 @@ static int frv_dma_map_sg(struct device *dev, struct scatterlist *sglist, for_each_sg(sglist, sg, nents, i) { vaddr = kmap_atomic_primary(sg_page(sg)); + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + continue; + frv_dcache_writeback((unsigned long) vaddr, (unsigned long) vaddr + PAGE_SIZE); @@ -70,7 +73,9 @@ static dma_addr_t frv_dma_map_page(struct device *dev, struct page 
*page, unsigned long offset, size_t size, enum dma_data_direction direction, unsigned long attrs) { - flush_dcache_page(page); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + flush_dcache_page(page); + return (dma_addr_t) page_to_phys(page) + offset; }
[net-next PATCH RFC 21/26] arch/xtensa: Add option to skip DMA sync as a part of mapping
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it via a sync_for_cpu or sync_for_device call. Cc: Max FilippovSigned-off-by: Alexander Duyck --- arch/xtensa/kernel/pci-dma.c |7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c index 1e68806..6a16dec 100644 --- a/arch/xtensa/kernel/pci-dma.c +++ b/arch/xtensa/kernel/pci-dma.c @@ -189,7 +189,9 @@ static dma_addr_t xtensa_map_page(struct device *dev, struct page *page, { dma_addr_t dma_handle = page_to_phys(page) + offset; - xtensa_sync_single_for_device(dev, dma_handle, size, dir); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + xtensa_sync_single_for_device(dev, dma_handle, size, dir); + return dma_handle; } @@ -197,7 +199,8 @@ static void xtensa_unmap_page(struct device *dev, dma_addr_t dma_handle, size_t size, enum dma_data_direction dir, unsigned long attrs) { - xtensa_sync_single_for_cpu(dev, dma_handle, size, dir); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + xtensa_sync_single_for_cpu(dev, dma_handle, size, dir); } static int xtensa_map_sg(struct device *dev, struct scatterlist *sg,
[net-next PATCH RFC 25/26] igb: Update code to better handle incrementing page count
This patch updates the driver code so that we do bulk updates of the page reference count instead of just incrementing it by one reference at a time. The advantage to doing this is that we cut down on atomic operations and this in turn should give us a slight improvement in cycles per packet. In addition if we eventually move this over to using build_skb the gains will be more noticeable. Signed-off-by: Alexander Duyck--- drivers/net/ethernet/intel/igb/igb.h |7 ++- drivers/net/ethernet/intel/igb/igb_main.c | 24 +--- 2 files changed, 23 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h index d11093d..acbc3ab 100644 --- a/drivers/net/ethernet/intel/igb/igb.h +++ b/drivers/net/ethernet/intel/igb/igb.h @@ -210,7 +210,12 @@ struct igb_tx_buffer { struct igb_rx_buffer { dma_addr_t dma; struct page *page; - unsigned int page_offset; +#if (BITS_PER_LONG > 32) || (PAGE_SIZE >= 65536) + __u32 page_offset; +#else + __u16 page_offset; +#endif + __u16 pagecnt_bias; }; struct igb_tx_queue_stats { diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index c8c458c..83fdef6 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -3962,7 +3962,8 @@ static void igb_clean_rx_ring(struct igb_ring *rx_ring) PAGE_SIZE, DMA_FROM_DEVICE, DMA_ATTR_SKIP_CPU_SYNC); - __free_page(buffer_info->page); + __page_frag_drain(buffer_info->page, 0, + buffer_info->pagecnt_bias); buffer_info->page = NULL; } @@ -6830,13 +6831,15 @@ static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer, struct page *page, unsigned int truesize) { + unsigned int pagecnt_bias = rx_buffer->pagecnt_bias--; + /* avoid re-using remote pages */ if (unlikely(igb_page_is_reserved(page))) return false; #if (PAGE_SIZE < 8192) /* if we are only owner of page we can reuse it */ - if (unlikely(page_count(page) != 1)) + if (unlikely(page_ref_count(page) != 
pagecnt_bias)) return false; /* flip page offset to other buffer */ @@ -6849,10 +6852,14 @@ static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer, return false; #endif - /* Even if we own the page, we are not allowed to use atomic_set() -* This would break get_page_unless_zero() users. + /* If we have drained the page fragment pool we need to update +* the pagecnt_bias and page count so that we fully restock the +* number of references the driver holds. */ - page_ref_inc(page); + if (unlikely(!rx_buffer->pagecnt_bias)) { + page_ref_add(page, USHRT_MAX); + rx_buffer->pagecnt_bias = USHRT_MAX; + } return true; } @@ -6904,7 +6911,6 @@ static bool igb_add_rx_frag(struct igb_ring *rx_ring, return true; /* this page cannot be reused so discard it */ - __free_page(page); return false; } @@ -6975,10 +6981,13 @@ static struct sk_buff *igb_fetch_rx_buffer(struct igb_ring *rx_ring, /* hand second half of page back to the ring */ igb_reuse_rx_page(rx_ring, rx_buffer); } else { - /* we are not reusing the buffer so unmap it */ + /* We are not reusing the buffer so unmap it and free +* any references we are holding to it +*/ dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma, PAGE_SIZE, DMA_FROM_DEVICE, DMA_ATTR_SKIP_CPU_SYNC); + __page_frag_drain(page, 0, rx_buffer->pagecnt_bias); } /* clear contents of rx_buffer */ @@ -7252,6 +7261,7 @@ static bool igb_alloc_mapped_page(struct igb_ring *rx_ring, bi->dma = dma; bi->page = page; bi->page_offset = 0; + bi->pagecnt_bias = 1; return true; }
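The `pagecnt_bias` scheme in this patch replaces one atomic `page_ref_inc()` per reused frag with a bulk pre-charge of USHRT_MAX references, paid back by decrementing a plain local counter. A userspace sketch using ints in place of struct page refcounts (names and the stack-side `put` step are illustrative assumptions about the flow, not driver code):

```c
#include <assert.h>
#include <limits.h>

static int page_refs;   /* mock page_ref_count(page) */
static int bias;        /* mock rx_buffer->pagecnt_bias */

static void alloc_page_mock(void)
{
    page_refs = 1;      /* fresh page starts with one reference */
    bias = 1;           /* bi->pagecnt_bias = 1 at alloc */
}

/* Mirrors the reuse check: compare the real refcount against the
 * pre-decrement bias, and restock in bulk once the pool drains. */
static int can_reuse_mock(void)
{
    int pre = bias--;   /* one biased ref handed up with this frag */

    if (page_refs != pre)       /* someone else still holds the page */
        return 0;

    if (bias == 0) {            /* drained: restock USHRT_MAX at once */
        page_refs += USHRT_MAX;
        bias = USHRT_MAX;
    }
    return 1;
}

/* The stack eventually releases each frag it was handed. */
static void stack_put_mock(void) { page_refs--; }

/* Mirrors draining on teardown: drop all refs the driver holds. */
static void drain_mock(void) { page_refs -= bias; }
```

The invariant is that `page_refs == bias` exactly when the driver is the sole owner; the bulk restock just keeps both numbers large so the comparison keeps working without per-packet atomics.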
Re: [net-next PATCH RFC 02/26] swiotlb: Add support for DMA_ATTR_SKIP_CPU_SYNC
On Mon, Oct 24, 2016 at 08:04:37AM -0400, Alexander Duyck wrote: > As a first step to making DMA_ATTR_SKIP_CPU_SYNC apply to architectures > beyond just ARM I need to make it so that the swiotlb will respect the > flag. In order to do that I also need to update the swiotlb-xen since it > heavily makes use of the functionality. > > Cc: Konrad Rzeszutek Wilk> Signed-off-by: Alexander Duyck > --- > drivers/xen/swiotlb-xen.c | 40 ++ > include/linux/swiotlb.h |6 -- > lib/swiotlb.c | 48 > +++-- > 3 files changed, 56 insertions(+), 38 deletions(-) > > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c > index 87e6035..cf047d8 100644 > --- a/drivers/xen/swiotlb-xen.c > +++ b/drivers/xen/swiotlb-xen.c > @@ -405,7 +405,8 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, > struct page *page, >*/ > trace_swiotlb_bounced(dev, dev_addr, size, swiotlb_force); > > - map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir); > + map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir, > + attrs); > if (map == SWIOTLB_MAP_ERROR) > return DMA_ERROR_CODE; > > @@ -416,11 +417,13 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, > struct page *page, > /* >* Ensure that the address returned is DMA'ble >*/ > - if (!dma_capable(dev, dev_addr, size)) { > - swiotlb_tbl_unmap_single(dev, map, size, dir); > - dev_addr = 0; > - } > - return dev_addr; > + if (dma_capable(dev, dev_addr, size)) > + return dev_addr; > + > + swiotlb_tbl_unmap_single(dev, map, size, dir, > + attrs | DMA_ATTR_SKIP_CPU_SYNC); > + > + return DMA_ERROR_CODE; Why? This change (re-ordering the code - and returning DMA_ERROR_CODE instead of 0) does not have anything to do with the title. If you really feel strongly about it - then please send it as a seperate patch. > } > EXPORT_SYMBOL_GPL(xen_swiotlb_map_page); > > @@ -444,7 +447,7 @@ static void xen_unmap_single(struct device *hwdev, > dma_addr_t dev_addr, > > /* NOTE: We use dev_addr here, not paddr! 
*/ > if (is_xen_swiotlb_buffer(dev_addr)) { > - swiotlb_tbl_unmap_single(hwdev, paddr, size, dir); > + swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs); > return; > } > > @@ -557,16 +560,9 @@ void xen_swiotlb_unmap_page(struct device *hwdev, > dma_addr_t dev_addr, >start_dma_addr, >sg_phys(sg), >sg->length, > - dir); > - if (map == SWIOTLB_MAP_ERROR) { > - dev_warn(hwdev, "swiotlb buffer is full\n"); > - /* Don't panic here, we expect map_sg users > -to do proper error handling. */ > - xen_swiotlb_unmap_sg_attrs(hwdev, sgl, i, dir, > -attrs); > - sg_dma_len(sgl) = 0; > - return 0; > - } > + dir, attrs); > + if (map == SWIOTLB_MAP_ERROR) > + goto map_error; > xen_dma_map_page(hwdev, pfn_to_page(map >> PAGE_SHIFT), > dev_addr, > map & ~PAGE_MASK, > @@ -589,6 +585,16 @@ void xen_swiotlb_unmap_page(struct device *hwdev, > dma_addr_t dev_addr, > sg_dma_len(sg) = sg->length; > } > return nelems; > +map_error: > + dev_warn(hwdev, "swiotlb buffer is full\n"); > + /* > + * Don't panic here, we expect map_sg users > + * to do proper error handling. > + */ > + xen_swiotlb_unmap_sg_attrs(hwdev, sgl, i, dir, > +attrs | DMA_ATTR_SKIP_CPU_SYNC); > + sg_dma_len(sgl) = 0; > + return 0; > } This too. Why can't that be part of the existing code that was there?
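The restructure being questioned here is the classic goto-based unwind: the warning, the unmap of everything mapped so far, and the `return 0` move out of the loop body to one labeled block. A small runnable sketch of that shape (mock resources stand in for the swiotlb scatterlist entries; this illustrates the style under discussion, not the swiotlb code itself):

```c
#include <assert.h>

static int mapped[4];   /* which mock entries are currently mapped */
static int warnings;    /* stands in for dev_warn() */

static int mock_map_one(int i, int fail_at)
{
    if (i == fail_at)
        return -1;      /* SWIOTLB_MAP_ERROR stand-in */
    mapped[i] = 1;
    return 0;
}

static int mock_map_sg(int nents, int fail_at)
{
    int i;

    for (i = 0; i < nents; i++) {
        if (mock_map_one(i, fail_at) < 0)
            goto map_error;
    }
    return nents;

map_error:
    warnings++;         /* warn once, outside the loop body */
    while (i--)
        mapped[i] = 0;  /* unwind everything mapped so far */
    return 0;           /* callers are expected to handle 0 */
}
```

Whether this belongs in the same patch as the attrs plumbing is the reviewer's point; functionally the before and after forms are equivalent.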
[net-next PATCH RFC 00/26] Add support for DMA writable pages being writable by the network stack
The first 21 patches in the set add support for the DMA attribute DMA_ATTR_SKIP_CPU_SYNC on multiple platforms/architectures. This is needed so that we can flag the dma_map/unmap_page calls to not invalidate cache lines that do not currently belong to the device. Instead we have to take care of this in the driver via a call to sync_single_range_for_cpu prior to freeing the Rx page. Patch 22 adds support for dma_map_page_attrs and dma_unmap_page_attrs so that we can map and unmap a page using the DMA_ATTR_SKIP_CPU_SYNC attribute. Patch 23 adds support for freeing a page that has multiple references being held by a single caller. This way we can free page fragments that were allocated by a given driver. The last 3 patches use these updates in the igb driver to allow us to reimplement build_skb, which hands a writable page off to the stack. My hope is to get the series accepted into the net-next tree as I have a number of other Intel drivers I could then begin updating once these patches are accepted. Any feedback is welcome. Specifically, if there is something I overlooked design-wise or an architecture I missed, please let me know and I will add it to this patch set. If needed I can look into breaking this into a smaller set of patches, but this set is all that should be needed to then start looking at putting together a DMA page pool per device, which I know is something Jesper has been working on. 
--- Alexander Duyck (26): swiotlb: Drop unused function swiotlb_map_sg swiotlb: Add support for DMA_ATTR_SKIP_CPU_SYNC arch/arc: Add option to skip sync on DMA mapping arch/arm: Add option to skip sync on DMA map and unmap arch/avr32: Add option to skip sync on DMA map arch/blackfin: Add option to skip sync on DMA map arch/c6x: Add option to skip sync on DMA map and unmap arch/frv: Add option to skip sync on DMA map arch/hexagon: Add option to skip DMA sync as a part of mapping arch/m68k: Add option to skip DMA sync as a part of mapping arch/metag: Add option to skip DMA sync as a part of map and unmap arch/microblaze: Add option to skip DMA sync as a part of map and unmap arch/mips: Add option to skip DMA sync as a part of map and unmap arch/nios2: Add option to skip DMA sync as a part of map and unmap arch/openrisc: Add option to skip DMA sync as a part of mapping arch/parisc: Add option to skip DMA sync as a part of map and unmap arch/powerpc: Add option to skip DMA sync as a part of mapping arch/sh: Add option to skip DMA sync as a part of mapping arch/sparc: Add option to skip DMA sync as a part of map and unmap arch/tile: Add option to skip DMA sync as a part of map and unmap arch/xtensa: Add option to skip DMA sync as a part of mapping dma: Add calls for dma_map_page_attrs and dma_unmap_page_attrs mm: Add support for releasing multiple instances of a page igb: Update driver to make use of DMA_ATTR_SKIP_CPU_SYNC igb: Update code to better handle incrementing page count igb: Revert "igb: Revert support for build_skb in igb" arch/arc/mm/dma.c |3 arch/arm/common/dmabounce.c | 16 +- arch/avr32/mm/dma-coherent.c |7 + arch/blackfin/kernel/dma-mapping.c|7 + arch/c6x/kernel/dma.c | 16 ++ arch/frv/mb93090-mb00/pci-dma-nommu.c | 16 ++ arch/frv/mb93090-mb00/pci-dma.c |7 + arch/hexagon/kernel/dma.c |6 + arch/m68k/kernel/dma.c|8 + arch/metag/kernel/dma.c | 16 ++ arch/microblaze/kernel/dma.c | 10 + arch/mips/loongson64/common/dma-swiotlb.c |2 arch/mips/mm/dma-default.c|8 + 
arch/nios2/mm/dma-mapping.c | 14 ++ arch/openrisc/kernel/dma.c|3 arch/parisc/kernel/pci-dma.c | 20 ++- arch/powerpc/kernel/dma.c |9 + arch/sh/kernel/dma-nommu.c|7 + arch/sparc/kernel/iommu.c |4 - arch/sparc/kernel/ioport.c|4 - arch/tile/kernel/pci-dma.c| 12 +- arch/xtensa/kernel/pci-dma.c |7 + drivers/net/ethernet/intel/igb/igb.h | 36 - drivers/net/ethernet/intel/igb/igb_main.c | 207 +++-- drivers/xen/swiotlb-xen.c | 40 +++--- include/linux/dma-mapping.h | 20 ++- include/linux/gfp.h |2 include/linux/swiotlb.h | 10 + lib/swiotlb.c | 56 mm/page_alloc.c | 14 ++ 30 files changed, 435 insertions(+), 152 deletions(-) -- Signature
[net-next PATCH RFC 02/26] swiotlb: Add support for DMA_ATTR_SKIP_CPU_SYNC
As a first step to making DMA_ATTR_SKIP_CPU_SYNC apply to architectures beyond just ARM I need to make it so that the swiotlb will respect the flag. In order to do that I also need to update the swiotlb-xen since it heavily makes use of the functionality. Cc: Konrad Rzeszutek WilkSigned-off-by: Alexander Duyck --- drivers/xen/swiotlb-xen.c | 40 ++ include/linux/swiotlb.h |6 -- lib/swiotlb.c | 48 +++-- 3 files changed, 56 insertions(+), 38 deletions(-) diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index 87e6035..cf047d8 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -405,7 +405,8 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page, */ trace_swiotlb_bounced(dev, dev_addr, size, swiotlb_force); - map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir); + map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir, +attrs); if (map == SWIOTLB_MAP_ERROR) return DMA_ERROR_CODE; @@ -416,11 +417,13 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page, /* * Ensure that the address returned is DMA'ble */ - if (!dma_capable(dev, dev_addr, size)) { - swiotlb_tbl_unmap_single(dev, map, size, dir); - dev_addr = 0; - } - return dev_addr; + if (dma_capable(dev, dev_addr, size)) + return dev_addr; + + swiotlb_tbl_unmap_single(dev, map, size, dir, +attrs | DMA_ATTR_SKIP_CPU_SYNC); + + return DMA_ERROR_CODE; } EXPORT_SYMBOL_GPL(xen_swiotlb_map_page); @@ -444,7 +447,7 @@ static void xen_unmap_single(struct device *hwdev, dma_addr_t dev_addr, /* NOTE: We use dev_addr here, not paddr! 
*/ if (is_xen_swiotlb_buffer(dev_addr)) { - swiotlb_tbl_unmap_single(hwdev, paddr, size, dir); + swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs); return; } @@ -557,16 +560,9 @@ void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr, start_dma_addr, sg_phys(sg), sg->length, -dir); - if (map == SWIOTLB_MAP_ERROR) { - dev_warn(hwdev, "swiotlb buffer is full\n"); - /* Don't panic here, we expect map_sg users - to do proper error handling. */ - xen_swiotlb_unmap_sg_attrs(hwdev, sgl, i, dir, - attrs); - sg_dma_len(sgl) = 0; - return 0; - } +dir, attrs); + if (map == SWIOTLB_MAP_ERROR) + goto map_error; xen_dma_map_page(hwdev, pfn_to_page(map >> PAGE_SHIFT), dev_addr, map & ~PAGE_MASK, @@ -589,6 +585,16 @@ void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr, sg_dma_len(sg) = sg->length; } return nelems; +map_error: + dev_warn(hwdev, "swiotlb buffer is full\n"); + /* +* Don't panic here, we expect map_sg users +* to do proper error handling. +*/ + xen_swiotlb_unmap_sg_attrs(hwdev, sgl, i, dir, + attrs | DMA_ATTR_SKIP_CPU_SYNC); + sg_dma_len(sgl) = 0; + return 0; } EXPORT_SYMBOL_GPL(xen_swiotlb_map_sg_attrs); diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index e237b6f..4517be9 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -44,11 +44,13 @@ enum dma_sync_target { extern phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, dma_addr_t tbl_dma_addr, phys_addr_t phys, size_t size, - enum dma_data_direction dir); + enum dma_data_direction dir, + unsigned long attrs); extern void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr, -size_t size, enum dma_data_direction dir); +size_t size, enum
[net-next PATCH RFC 13/26] arch/mips: Add option to skip DMA sync as a part of map and unmap
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it via a sync_for_cpu or sync_for_device call. Cc: Ralf BaechleCc: Keguang Zhang Cc: linux-m...@linux-mips.org Signed-off-by: Alexander Duyck --- arch/mips/loongson64/common/dma-swiotlb.c |2 +- arch/mips/mm/dma-default.c|8 +--- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/arch/mips/loongson64/common/dma-swiotlb.c b/arch/mips/loongson64/common/dma-swiotlb.c index 1a80b6f..aab4fd6 100644 --- a/arch/mips/loongson64/common/dma-swiotlb.c +++ b/arch/mips/loongson64/common/dma-swiotlb.c @@ -61,7 +61,7 @@ static int loongson_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents, enum dma_data_direction dir, unsigned long attrs) { - int r = swiotlb_map_sg_attrs(dev, sg, nents, dir, 0); + int r = swiotlb_map_sg_attrs(dev, sg, nents, dir, attrs); mb(); return r; diff --git a/arch/mips/mm/dma-default.c b/arch/mips/mm/dma-default.c index b2eadd6..dd998d7 100644 --- a/arch/mips/mm/dma-default.c +++ b/arch/mips/mm/dma-default.c @@ -293,7 +293,7 @@ static inline void __dma_sync(struct page *page, static void mips_dma_unmap_page(struct device *dev, dma_addr_t dma_addr, size_t size, enum dma_data_direction direction, unsigned long attrs) { - if (cpu_needs_post_dma_flush(dev)) + if (cpu_needs_post_dma_flush(dev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) __dma_sync(dma_addr_to_page(dev, dma_addr), dma_addr & ~PAGE_MASK, size, direction); plat_post_dma_flush(dev); @@ -307,7 +307,8 @@ static int mips_dma_map_sg(struct device *dev, struct scatterlist *sglist, struct scatterlist *sg; for_each_sg(sglist, sg, nents, i) { - if (!plat_device_is_coherent(dev)) + if (!plat_device_is_coherent(dev) && + !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) __dma_sync(sg_page(sg), sg->offset, sg->length, direction); #ifdef CONFIG_NEED_SG_DMA_LENGTH @@ -324,7 +325,7 @@ static dma_addr_t mips_dma_map_page(struct device *dev, struct page *page, unsigned 
long offset, size_t size, enum dma_data_direction direction, unsigned long attrs) { - if (!plat_device_is_coherent(dev)) + if (!plat_device_is_coherent(dev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) __dma_sync(page, offset, size, direction); return plat_map_dma_mem_page(dev, page) + offset; @@ -339,6 +340,7 @@ static void mips_dma_unmap_sg(struct device *dev, struct scatterlist *sglist, for_each_sg(sglist, sg, nhwentries, i) { if (!plat_device_is_coherent(dev) && + !(attrs & DMA_ATTR_SKIP_CPU_SYNC) && direction != DMA_TO_DEVICE) __dma_sync(sg_page(sg), sg->offset, sg->length, direction);
[net-next PATCH RFC 01/26] swiotlb: Drop unused function swiotlb_map_sg
There are no users for swiotlb_map_sg so we might as well just drop it. Cc: Konrad Rzeszutek WilkSigned-off-by: Alexander Duyck --- include/linux/swiotlb.h |4 lib/swiotlb.c |8 2 files changed, 12 deletions(-) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 5f81f8a..e237b6f 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -72,10 +72,6 @@ extern void swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr, size_t size, enum dma_data_direction dir, unsigned long attrs); -extern int -swiotlb_map_sg(struct device *hwdev, struct scatterlist *sg, int nents, - enum dma_data_direction dir); - extern void swiotlb_unmap_sg(struct device *hwdev, struct scatterlist *sg, int nents, enum dma_data_direction dir); diff --git a/lib/swiotlb.c b/lib/swiotlb.c index 22e13a0..47aad37 100644 --- a/lib/swiotlb.c +++ b/lib/swiotlb.c @@ -910,14 +910,6 @@ void swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr, } EXPORT_SYMBOL(swiotlb_map_sg_attrs); -int -swiotlb_map_sg(struct device *hwdev, struct scatterlist *sgl, int nelems, - enum dma_data_direction dir) -{ - return swiotlb_map_sg_attrs(hwdev, sgl, nelems, dir, 0); -} -EXPORT_SYMBOL(swiotlb_map_sg); - /* * Unmap a set of streaming mode DMA translations. Again, cpu read rules * concerning calls here are the same as for swiotlb_unmap_page() above.
[net-next PATCH RFC 10/26] arch/m68k: Add option to skip DMA sync as a part of mapping
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it later via a sync_for_cpu or sync_for_device call. Cc: Geert UytterhoevenCc: linux-m...@lists.linux-m68k.org Signed-off-by: Alexander Duyck --- arch/m68k/kernel/dma.c |8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c index 8cf97cb..0707006 100644 --- a/arch/m68k/kernel/dma.c +++ b/arch/m68k/kernel/dma.c @@ -134,7 +134,9 @@ static dma_addr_t m68k_dma_map_page(struct device *dev, struct page *page, { dma_addr_t handle = page_to_phys(page) + offset; - dma_sync_single_for_device(dev, handle, size, dir); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + dma_sync_single_for_device(dev, handle, size, dir); + return handle; } @@ -146,6 +148,10 @@ static int m68k_dma_map_sg(struct device *dev, struct scatterlist *sglist, for_each_sg(sglist, sg, nents, i) { sg->dma_address = sg_phys(sg); + + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + continue; + dma_sync_single_for_device(dev, sg->dma_address, sg->length, dir); }
[net-next PATCH RFC 07/26] arch/c6x: Add option to skip sync on DMA map and unmap
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it later via a sync_for_cpu or sync_for_device call. Cc: Mark SalterCc: Aurelien Jacquiot Cc: linux-c6x-...@linux-c6x.org Signed-off-by: Alexander Duyck --- arch/c6x/kernel/dma.c | 16 +++- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/arch/c6x/kernel/dma.c b/arch/c6x/kernel/dma.c index db4a6a3..d28df74 100644 --- a/arch/c6x/kernel/dma.c +++ b/arch/c6x/kernel/dma.c @@ -42,14 +42,17 @@ static dma_addr_t c6x_dma_map_page(struct device *dev, struct page *page, { dma_addr_t handle = virt_to_phys(page_address(page) + offset); - c6x_dma_sync(handle, size, dir); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + c6x_dma_sync(handle, size, dir); + return handle; } static void c6x_dma_unmap_page(struct device *dev, dma_addr_t handle, size_t size, enum dma_data_direction dir, unsigned long attrs) { - c6x_dma_sync(handle, size, dir); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + c6x_dma_sync(handle, size, dir); } static int c6x_dma_map_sg(struct device *dev, struct scatterlist *sglist, @@ -60,7 +63,8 @@ static int c6x_dma_map_sg(struct device *dev, struct scatterlist *sglist, for_each_sg(sglist, sg, nents, i) { sg->dma_address = sg_phys(sg); - c6x_dma_sync(sg->dma_address, sg->length, dir); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + c6x_dma_sync(sg->dma_address, sg->length, dir); } return nents; @@ -72,8 +76,10 @@ static void c6x_dma_unmap_sg(struct device *dev, struct scatterlist *sglist, struct scatterlist *sg; int i; - for_each_sg(sglist, sg, nents, i) - c6x_dma_sync(sg_dma_address(sg), sg->length, dir); + for_each_sg(sglist, sg, nents, i) { + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + c6x_dma_sync(sg_dma_address(sg), sg->length, dir); + } }
[net-next PATCH RFC 06/26] arch/blackfin: Add option to skip sync on DMA map
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it later via a sync_for_cpu or sync_for_device call.

Cc: Steven Miao
Signed-off-by: Alexander Duyck
---
 arch/blackfin/kernel/dma-mapping.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/blackfin/kernel/dma-mapping.c b/arch/blackfin/kernel/dma-mapping.c
index 53fbbb6..ed9a6a8 100644
--- a/arch/blackfin/kernel/dma-mapping.c
+++ b/arch/blackfin/kernel/dma-mapping.c
@@ -133,6 +133,10 @@ static void bfin_dma_sync_sg_for_device(struct device *dev,
 
 	for_each_sg(sg_list, sg, nelems, i) {
 		sg->dma_address = (dma_addr_t) sg_virt(sg);
+
+		if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+			continue;
+
 		__dma_sync(sg_dma_address(sg), sg_dma_len(sg), direction);
 	}
 }
@@ -143,7 +147,8 @@ static dma_addr_t bfin_dma_map_page(struct device *dev, struct page *page,
 {
 	dma_addr_t handle = (dma_addr_t)(page_address(page) + offset);
 
-	_dma_sync(handle, size, dir);
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+		_dma_sync(handle, size, dir);
 	return handle;
 }
[net-next PATCH RFC 05/26] arch/avr32: Add option to skip sync on DMA map
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it later via a sync_for_cpu or sync_for_device call.

Cc: Haavard Skinnemoen
Cc: Hans-Christian Egtvedt
Signed-off-by: Alexander Duyck
---
 arch/avr32/mm/dma-coherent.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/avr32/mm/dma-coherent.c b/arch/avr32/mm/dma-coherent.c
index 58610d0..54534e5 100644
--- a/arch/avr32/mm/dma-coherent.c
+++ b/arch/avr32/mm/dma-coherent.c
@@ -146,7 +146,8 @@ static dma_addr_t avr32_dma_map_page(struct device *dev, struct page *page,
 {
 	void *cpu_addr = page_address(page) + offset;
 
-	dma_cache_sync(dev, cpu_addr, size, direction);
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+		dma_cache_sync(dev, cpu_addr, size, direction);
 	return virt_to_bus(cpu_addr);
 }
 
@@ -162,6 +163,10 @@ static int avr32_dma_map_sg(struct device *dev, struct scatterlist *sglist,
 		sg->dma_address = page_to_bus(sg_page(sg)) + sg->offset;
 		virt = sg_virt(sg);
+
+		if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+			continue;
+
 		dma_cache_sync(dev, virt, sg->length, direction);
 	}
[net-next PATCH RFC 03/26] arch/arc: Add option to skip sync on DMA mapping
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it later via a sync_for_cpu or sync_for_device call.

Cc: Vineet Gupta
Cc: linux-snps-...@lists.infradead.org
Signed-off-by: Alexander Duyck
---
 arch/arc/mm/dma.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
index 20afc65..d0c4b28 100644
--- a/arch/arc/mm/dma.c
+++ b/arch/arc/mm/dma.c
@@ -133,7 +133,8 @@ static dma_addr_t arc_dma_map_page(struct device *dev, struct page *page,
 		unsigned long attrs)
 {
 	phys_addr_t paddr = page_to_phys(page) + offset;
-	_dma_cache_sync(paddr, size, dir);
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+		_dma_cache_sync(paddr, size, dir);
 	return plat_phys_to_dma(dev, paddr);
 }
[net-next PATCH RFC 17/26] arch/powerpc: Add option to skip DMA sync as a part of mapping
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it via a sync_for_cpu or sync_for_device call. Cc: Benjamin HerrenschmidtCc: Paul Mackerras Cc: Michael Ellerman Cc: linuxppc-...@lists.ozlabs.org Signed-off-by: Alexander Duyck --- arch/powerpc/kernel/dma.c |9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c index e64a601..6877e3f 100644 --- a/arch/powerpc/kernel/dma.c +++ b/arch/powerpc/kernel/dma.c @@ -203,6 +203,10 @@ static int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, for_each_sg(sgl, sg, nents, i) { sg->dma_address = sg_phys(sg) + get_dma_offset(dev); sg->dma_length = sg->length; + + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + continue; + __dma_sync_page(sg_page(sg), sg->offset, sg->length, direction); } @@ -235,7 +239,10 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev, unsigned long attrs) { BUG_ON(dir == DMA_NONE); - __dma_sync_page(page, offset, size, dir); + + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + __dma_sync_page(page, offset, size, dir); + return page_to_phys(page) + offset + get_dma_offset(dev); }
[net-next PATCH RFC 11/26] arch/metag: Add option to skip DMA sync as a part of map and unmap
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it via a sync_for_cpu or sync_for_device call. Cc: James HoganCc: linux-me...@vger.kernel.org Signed-off-by: Alexander Duyck --- arch/metag/kernel/dma.c | 16 +--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/arch/metag/kernel/dma.c b/arch/metag/kernel/dma.c index 0db31e2..91968d9 100644 --- a/arch/metag/kernel/dma.c +++ b/arch/metag/kernel/dma.c @@ -484,8 +484,9 @@ static dma_addr_t metag_dma_map_page(struct device *dev, struct page *page, unsigned long offset, size_t size, enum dma_data_direction direction, unsigned long attrs) { - dma_sync_for_device((void *)(page_to_phys(page) + offset), size, - direction); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + dma_sync_for_device((void *)(page_to_phys(page) + offset), + size, direction); return page_to_phys(page) + offset; } @@ -493,7 +494,8 @@ static void metag_dma_unmap_page(struct device *dev, dma_addr_t dma_address, size_t size, enum dma_data_direction direction, unsigned long attrs) { - dma_sync_for_cpu(phys_to_virt(dma_address), size, direction); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + dma_sync_for_cpu(phys_to_virt(dma_address), size, direction); } static int metag_dma_map_sg(struct device *dev, struct scatterlist *sglist, @@ -507,6 +509,10 @@ static int metag_dma_map_sg(struct device *dev, struct scatterlist *sglist, BUG_ON(!sg_page(sg)); sg->dma_address = sg_phys(sg); + + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + continue; + dma_sync_for_device(sg_virt(sg), sg->length, direction); } @@ -525,6 +531,10 @@ static void metag_dma_unmap_sg(struct device *dev, struct scatterlist *sglist, BUG_ON(!sg_page(sg)); sg->dma_address = sg_phys(sg); + + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + continue; + dma_sync_for_cpu(sg_virt(sg), sg->length, direction); } }
[net-next PATCH RFC 12/26] arch/microblaze: Add option to skip DMA sync as a part of map and unmap
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it via a sync_for_cpu or sync_for_device call. Cc: Michal SimekSigned-off-by: Alexander Duyck --- arch/microblaze/kernel/dma.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c index ec04dc1..818daf2 100644 --- a/arch/microblaze/kernel/dma.c +++ b/arch/microblaze/kernel/dma.c @@ -61,6 +61,10 @@ static int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, /* FIXME this part of code is untested */ for_each_sg(sgl, sg, nents, i) { sg->dma_address = sg_phys(sg); + + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + continue; + __dma_sync(page_to_phys(sg_page(sg)) + sg->offset, sg->length, direction); } @@ -80,7 +84,8 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev, enum dma_data_direction direction, unsigned long attrs) { - __dma_sync(page_to_phys(page) + offset, size, direction); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + __dma_sync(page_to_phys(page) + offset, size, direction); return page_to_phys(page) + offset; } @@ -95,7 +100,8 @@ static inline void dma_direct_unmap_page(struct device *dev, * phys_to_virt is here because in __dma_sync_page is __virt_to_phys and * dma_address is physical address */ - __dma_sync(dma_address, size, direction); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + __dma_sync(dma_address, size, direction); } static inline void
[net-next PATCH RFC 16/26] arch/parisc: Add option to skip DMA sync as a part of map and unmap
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it via a sync_for_cpu or sync_for_device call. Cc: "James E.J. Bottomley"Cc: Helge Deller Cc: linux-par...@vger.kernel.org Signed-off-by: Alexander Duyck --- arch/parisc/kernel/pci-dma.c | 20 +++- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c index 02d9ed0..be55ede 100644 --- a/arch/parisc/kernel/pci-dma.c +++ b/arch/parisc/kernel/pci-dma.c @@ -459,7 +459,9 @@ static dma_addr_t pa11_dma_map_page(struct device *dev, struct page *page, void *addr = page_address(page) + offset; BUG_ON(direction == DMA_NONE); - flush_kernel_dcache_range((unsigned long) addr, size); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + flush_kernel_dcache_range((unsigned long) addr, size); + return virt_to_phys(addr); } @@ -469,8 +471,11 @@ static void pa11_dma_unmap_page(struct device *dev, dma_addr_t dma_handle, { BUG_ON(direction == DMA_NONE); + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + return; + if (direction == DMA_TO_DEVICE) - return; + return; /* * For PCI_DMA_FROMDEVICE this flush is not necessary for the @@ -479,7 +484,6 @@ static void pa11_dma_unmap_page(struct device *dev, dma_addr_t dma_handle, */ flush_kernel_dcache_range((unsigned long) phys_to_virt(dma_handle), size); - return; } static int pa11_dma_map_sg(struct device *dev, struct scatterlist *sglist, @@ -496,6 +500,10 @@ static int pa11_dma_map_sg(struct device *dev, struct scatterlist *sglist, sg_dma_address(sg) = (dma_addr_t) virt_to_phys(vaddr); sg_dma_len(sg) = sg->length; + + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + continue; + flush_kernel_dcache_range(vaddr, sg->length); } return nents; @@ -510,14 +518,16 @@ static void pa11_dma_unmap_sg(struct device *dev, struct scatterlist *sglist, BUG_ON(direction == DMA_NONE); + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + return; + if (direction == DMA_TO_DEVICE) - return; + return; 
/* once we do combining we'll need to use phys_to_virt(sg_dma_address(sglist)) */ for_each_sg(sglist, sg, nents, i) flush_kernel_vmap_range(sg_virt(sg), sg->length); - return; } static void pa11_dma_sync_single_for_cpu(struct device *dev,
[net-next PATCH RFC 14/26] arch/nios2: Add option to skip DMA sync as a part of map and unmap
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it via a sync_for_cpu or sync_for_device call. Cc: Ley Foon TanSigned-off-by: Alexander Duyck --- arch/nios2/mm/dma-mapping.c | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/arch/nios2/mm/dma-mapping.c b/arch/nios2/mm/dma-mapping.c index d800fad..b83e723 100644 --- a/arch/nios2/mm/dma-mapping.c +++ b/arch/nios2/mm/dma-mapping.c @@ -102,7 +102,9 @@ static int nios2_dma_map_sg(struct device *dev, struct scatterlist *sg, addr = sg_virt(sg); if (addr) { - __dma_sync_for_device(addr, sg->length, direction); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + __dma_sync_for_device(addr, sg->length, + direction); sg->dma_address = sg_phys(sg); } } @@ -117,7 +119,9 @@ static dma_addr_t nios2_dma_map_page(struct device *dev, struct page *page, { void *addr = page_address(page) + offset; - __dma_sync_for_device(addr, size, direction); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + __dma_sync_for_device(addr, size, direction); + return page_to_phys(page) + offset; } @@ -125,7 +129,8 @@ static void nios2_dma_unmap_page(struct device *dev, dma_addr_t dma_address, size_t size, enum dma_data_direction direction, unsigned long attrs) { - __dma_sync_for_cpu(phys_to_virt(dma_address), size, direction); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + __dma_sync_for_cpu(phys_to_virt(dma_address), size, direction); } static void nios2_dma_unmap_sg(struct device *dev, struct scatterlist *sg, @@ -138,6 +143,9 @@ static void nios2_dma_unmap_sg(struct device *dev, struct scatterlist *sg, if (direction == DMA_TO_DEVICE) return; + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + return; + for_each_sg(sg, sg, nhwentries, i) { addr = sg_virt(sg); if (addr)
[net-next PATCH RFC 04/26] arch/arm: Add option to skip sync on DMA map and unmap
The use of DMA_ATTR_SKIP_CPU_SYNC was not consistent across all of the DMA APIs in the arch/arm folder. This change is meant to correct that so that we get consistent behavior. Cc: Russell KingSigned-off-by: Alexander Duyck --- arch/arm/common/dmabounce.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c index 3012816..75055df 100644 --- a/arch/arm/common/dmabounce.c +++ b/arch/arm/common/dmabounce.c @@ -243,7 +243,8 @@ static int needs_bounce(struct device *dev, dma_addr_t dma_addr, size_t size) } static inline dma_addr_t map_single(struct device *dev, void *ptr, size_t size, - enum dma_data_direction dir) + enum dma_data_direction dir, + unsigned long attrs) { struct dmabounce_device_info *device_info = dev->archdata.dmabounce; struct safe_buffer *buf; @@ -262,7 +263,8 @@ static inline dma_addr_t map_single(struct device *dev, void *ptr, size_t size, __func__, buf->ptr, virt_to_dma(dev, buf->ptr), buf->safe, buf->safe_dma_addr); - if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL) { + if ((dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL) && + !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) { dev_dbg(dev, "%s: copy unsafe %p to safe %p, size %d\n", __func__, ptr, buf->safe, size); memcpy(buf->safe, ptr, size); @@ -272,7 +274,8 @@ static inline dma_addr_t map_single(struct device *dev, void *ptr, size_t size, } static inline void unmap_single(struct device *dev, struct safe_buffer *buf, - size_t size, enum dma_data_direction dir) + size_t size, enum dma_data_direction dir, + unsigned long attrs) { BUG_ON(buf->size != size); BUG_ON(buf->direction != dir); @@ -283,7 +286,8 @@ static inline void unmap_single(struct device *dev, struct safe_buffer *buf, DO_STATS(dev->archdata.dmabounce->bounce_count++); - if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL) { + if ((dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL) && + !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) { void *ptr = buf->ptr; dev_dbg(dev, 
"%s: copy back safe %p to unsafe %p size %d\n", @@ -334,7 +338,7 @@ static dma_addr_t dmabounce_map_page(struct device *dev, struct page *page, return DMA_ERROR_CODE; } - return map_single(dev, page_address(page) + offset, size, dir); + return map_single(dev, page_address(page) + offset, size, dir, attrs); } /* @@ -357,7 +361,7 @@ static void dmabounce_unmap_page(struct device *dev, dma_addr_t dma_addr, size_t return; } - unmap_single(dev, buf, size, dir); + unmap_single(dev, buf, size, dir, attrs); } static int __dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
[net-next PATCH RFC 18/26] arch/sh: Add option to skip DMA sync as a part of mapping
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it via a sync_for_cpu or sync_for_device call. Cc: Yoshinori SatoCc: Rich Felker Cc: linux...@vger.kernel.org Signed-off-by: Alexander Duyck --- arch/sh/kernel/dma-nommu.c |7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/sh/kernel/dma-nommu.c b/arch/sh/kernel/dma-nommu.c index eadb669..47fee3b 100644 --- a/arch/sh/kernel/dma-nommu.c +++ b/arch/sh/kernel/dma-nommu.c @@ -18,7 +18,9 @@ static dma_addr_t nommu_map_page(struct device *dev, struct page *page, dma_addr_t addr = page_to_phys(page) + offset; WARN_ON(size == 0); - dma_cache_sync(dev, page_address(page) + offset, size, dir); + + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + dma_cache_sync(dev, page_address(page) + offset, size, dir); return addr; } @@ -35,7 +37,8 @@ static int nommu_map_sg(struct device *dev, struct scatterlist *sg, for_each_sg(sg, s, nents, i) { BUG_ON(!sg_page(s)); - dma_cache_sync(dev, sg_virt(s), s->length, dir); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + dma_cache_sync(dev, sg_virt(s), s->length, dir); s->dma_address = sg_phys(s); s->dma_length = s->length;
[PATCH 2/3] mwifiex: Introduce mwifiex_probe_of() to parse common properties
Introduce function mwifiex_probe_of() to parse common properties. Since the interface drivers get to decide whether or not the device tree node was a valid one (depending on the compatible property), let the interface drivers pass a flag to indicate whether the device tree node was a valid one. The function mwifiex_probe_of() itself is currently only a placeholder, with the next patch adding content to it.

Signed-off-by: Rajat Jain
---
 drivers/net/wireless/marvell/mwifiex/main.c    | 15 ++++++++++++++-
 drivers/net/wireless/marvell/mwifiex/main.h    |  2 +-
 drivers/net/wireless/marvell/mwifiex/pcie.c    |  4 +++-
 drivers/net/wireless/marvell/mwifiex/sdio.c    |  4 +++-
 drivers/net/wireless/marvell/mwifiex/sta_cmd.c |  5 +----
 drivers/net/wireless/marvell/mwifiex/usb.c     |  2 +-
 6 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c
index dcceab2..b2f3d96 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.c
+++ b/drivers/net/wireless/marvell/mwifiex/main.c
@@ -1552,6 +1552,16 @@ void mwifiex_do_flr(struct mwifiex_adapter *adapter, bool prepare)
 }
 EXPORT_SYMBOL_GPL(mwifiex_do_flr);
 
+static void mwifiex_probe_of(struct mwifiex_adapter *adapter)
+{
+	struct device *dev = adapter->dev;
+
+	if (!dev->of_node)
+		return;
+
+	adapter->dt_node = dev->of_node;
+}
+
 /*
  * This function adds the card.
 *
@@ -1568,7 +1578,7 @@ EXPORT_SYMBOL_GPL(mwifiex_do_flr);
 int
 mwifiex_add_card(void *card, struct semaphore *sem,
 		 struct mwifiex_if_ops *if_ops, u8 iface_type,
-		 struct device *dev)
+		 struct device *dev, bool of_node_valid)
 {
 	struct mwifiex_adapter *adapter;
 
@@ -1581,6 +1591,9 @@ mwifiex_add_card(void *card, struct semaphore *sem,
 	}
 
 	adapter->dev = dev;
+	if (of_node_valid)
+		mwifiex_probe_of(adapter);
+
 	adapter->iface_type = iface_type;
 	adapter->card_sem = sem;
 
diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h
index 91218a1..83e0776 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.h
+++ b/drivers/net/wireless/marvell/mwifiex/main.h
@@ -1412,7 +1412,7 @@ static inline u8 mwifiex_is_tdls_link_setup(u8 status)
 int mwifiex_init_shutdown_fw(struct mwifiex_private *priv,
 			     u32 func_init_shutdown);
 int mwifiex_add_card(void *, struct semaphore *, struct mwifiex_if_ops *, u8,
-		     struct device *);
+		     struct device *, bool);
 int mwifiex_remove_card(struct mwifiex_adapter *, struct semaphore *);
 
 void mwifiex_get_version(struct mwifiex_adapter *adapter, char *version,
diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index 49b5835..ea423d5 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -194,6 +194,7 @@ static int mwifiex_pcie_probe(struct pci_dev *pdev,
 			      const struct pci_device_id *ent)
 {
 	struct pcie_service_card *card;
+	bool valid_of_node = false;
 	int ret;
 
 	pr_debug("info: vendor=0x%4.04X device=0x%4.04X rev=%d\n",
@@ -221,10 +222,11 @@ static int mwifiex_pcie_probe(struct pci_dev *pdev,
 		ret = mwifiex_pcie_probe_of(&pdev->dev);
 		if (ret)
 			goto err_free;
+		valid_of_node = true;
 	}
 
 	ret = mwifiex_add_card(card, &add_remove_card_sem, &pcie_ops,
-			       MWIFIEX_PCIE, &pdev->dev);
+			       MWIFIEX_PCIE, &pdev->dev, valid_of_node);
 	if (ret) {
 		pr_err("%s failed\n", __func__);
 		goto err_free;
diff --git a/drivers/net/wireless/marvell/mwifiex/sdio.c b/drivers/net/wireless/marvell/mwifiex/sdio.c
index c95f41f..558743a 100644
--- a/drivers/net/wireless/marvell/mwifiex/sdio.c
+++ b/drivers/net/wireless/marvell/mwifiex/sdio.c
@@ -156,6 +156,7 @@ mwifiex_sdio_probe(struct sdio_func *func, const struct sdio_device_id *id)
 {
 	int ret;
 	struct sdio_mmc_card *card = NULL;
+	bool valid_of_node = false;
 
 	pr_debug("info: vendor=0x%4.04X device=0x%4.04X class=%d function=%d\n",
 		 func->vendor, func->device, func->class, func->num);
@@ -203,10 +204,11 @@ mwifiex_sdio_probe(struct sdio_func *func, const struct sdio_device_id *id)
 			dev_err(&func->dev, "SDIO dt node parse failed\n");
 			goto err_disable;
 		}
+		valid_of_node = true;
 	}
 
 	ret = mwifiex_add_card(card, &add_remove_card_sem, &sdio_ops,
-			       MWIFIEX_SDIO, &func->dev);
+			       MWIFIEX_SDIO, &func->dev, valid_of_node);
 	if (ret) {
 		dev_err(&func->dev, "add card failed\n");
[PATCH 1/3] mwifiex: Allow mwifiex early access to device structure
Today all the interface drivers (usb/pcie/sdio) assign the adapter->dev in the register_dev() callback, although they have this piece of info well before hand. This patch makes the device structure available for mwifiex right at the beginning, so that it can be used for early initialization if needed. This is needed for subsequent patches in this patchset that intend to unify and consolidate some of the code that would otherwise have to be duplicated among the interface drivers (sdio, pcie, usb). Signed-off-by: Rajat Jain--- drivers/net/wireless/marvell/mwifiex/main.c | 4 +++- drivers/net/wireless/marvell/mwifiex/main.h | 3 ++- drivers/net/wireless/marvell/mwifiex/pcie.c | 4 +--- drivers/net/wireless/marvell/mwifiex/sdio.c | 5 + drivers/net/wireless/marvell/mwifiex/usb.c | 3 +-- 5 files changed, 8 insertions(+), 11 deletions(-) diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c index 2478ccd..dcceab2 100644 --- a/drivers/net/wireless/marvell/mwifiex/main.c +++ b/drivers/net/wireless/marvell/mwifiex/main.c @@ -1567,7 +1567,8 @@ EXPORT_SYMBOL_GPL(mwifiex_do_flr); */ int mwifiex_add_card(void *card, struct semaphore *sem, -struct mwifiex_if_ops *if_ops, u8 iface_type) +struct mwifiex_if_ops *if_ops, u8 iface_type, +struct device *dev) { struct mwifiex_adapter *adapter; @@ -1579,6 +1580,7 @@ mwifiex_add_card(void *card, struct semaphore *sem, goto err_init_sw; } + adapter->dev = dev; adapter->iface_type = iface_type; adapter->card_sem = sem; diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h index 26df28f..91218a1 100644 --- a/drivers/net/wireless/marvell/mwifiex/main.h +++ b/drivers/net/wireless/marvell/mwifiex/main.h @@ -1411,7 +1411,8 @@ static inline u8 mwifiex_is_tdls_link_setup(u8 status) int mwifiex_init_shutdown_fw(struct mwifiex_private *priv, u32 func_init_shutdown); -int mwifiex_add_card(void *, struct semaphore *, struct mwifiex_if_ops *, u8); +int 
mwifiex_add_card(void *, struct semaphore *, struct mwifiex_if_ops *, u8, +struct device *); int mwifiex_remove_card(struct mwifiex_adapter *, struct semaphore *); void mwifiex_get_version(struct mwifiex_adapter *adapter, char *version, diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c index f7c84d3..49b5835 100644 --- a/drivers/net/wireless/marvell/mwifiex/pcie.c +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c @@ -224,7 +224,7 @@ static int mwifiex_pcie_probe(struct pci_dev *pdev, } ret = mwifiex_add_card(card, _remove_card_sem, _ops, - MWIFIEX_PCIE); + MWIFIEX_PCIE, >dev); if (ret) { pr_err("%s failed\n", __func__); goto err_free; @@ -2990,11 +2990,9 @@ static void mwifiex_pcie_get_fw_name(struct mwifiex_adapter *adapter) static int mwifiex_register_dev(struct mwifiex_adapter *adapter) { struct pcie_service_card *card = adapter->card; - struct pci_dev *pdev = card->dev; /* save adapter pointer in card */ card->adapter = adapter; - adapter->dev = >dev; if (mwifiex_pcie_request_irq(adapter)) return -1; diff --git a/drivers/net/wireless/marvell/mwifiex/sdio.c b/drivers/net/wireless/marvell/mwifiex/sdio.c index 807af13..c95f41f 100644 --- a/drivers/net/wireless/marvell/mwifiex/sdio.c +++ b/drivers/net/wireless/marvell/mwifiex/sdio.c @@ -206,7 +206,7 @@ mwifiex_sdio_probe(struct sdio_func *func, const struct sdio_device_id *id) } ret = mwifiex_add_card(card, _remove_card_sem, _ops, - MWIFIEX_SDIO); + MWIFIEX_SDIO, >dev); if (ret) { dev_err(>dev, "add card failed\n"); goto err_disable; @@ -2106,9 +2106,6 @@ static int mwifiex_register_dev(struct mwifiex_adapter *adapter) return ret; } - - adapter->dev = >dev; - strcpy(adapter->fw_name, card->firmware); if (card->fw_dump_enh) { adapter->mem_type_mapping_tbl = generic_mem_type_map; diff --git a/drivers/net/wireless/marvell/mwifiex/usb.c b/drivers/net/wireless/marvell/mwifiex/usb.c index 73eb084..f847fff 100644 --- a/drivers/net/wireless/marvell/mwifiex/usb.c +++ 
b/drivers/net/wireless/marvell/mwifiex/usb.c @@ -476,7 +476,7 @@ static int mwifiex_usb_probe(struct usb_interface *intf, usb_set_intfdata(intf, card); ret = mwifiex_add_card(card, _remove_card_sem, _ops, - MWIFIEX_USB); + MWIFIEX_USB, >udev->dev); if (ret) { pr_err("%s: mwifiex_add_card failed: %d\n", __func__, ret);
[PATCH 3/3] mwifiex: Enable WoWLAN for both sdio and pcie
Commit ce4f6f0c353b ("mwifiex: add platform specific wakeup interrupt support") added WoWLAN feature only for sdio. This patch moves that code to the common module so that all the interface drivers can use it for free. It enables pcie and sdio for its use currently. Signed-off-by: Rajat Jain--- drivers/net/wireless/marvell/mwifiex/main.c | 41 drivers/net/wireless/marvell/mwifiex/main.h | 25 ++ drivers/net/wireless/marvell/mwifiex/pcie.c | 2 + drivers/net/wireless/marvell/mwifiex/sdio.c | 72 ++--- drivers/net/wireless/marvell/mwifiex/sdio.h | 8 5 files changed, 73 insertions(+), 75 deletions(-) diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c index b2f3d96..20c9b77 100644 --- a/drivers/net/wireless/marvell/mwifiex/main.c +++ b/drivers/net/wireless/marvell/mwifiex/main.c @@ -1552,14 +1552,55 @@ void mwifiex_do_flr(struct mwifiex_adapter *adapter, bool prepare) } EXPORT_SYMBOL_GPL(mwifiex_do_flr); +static irqreturn_t mwifiex_irq_wakeup_handler(int irq, void *priv) +{ + struct mwifiex_adapter *adapter = priv; + + if (adapter->irq_wakeup >= 0) { + dev_dbg(adapter->dev, "%s: wake by wifi", __func__); + adapter->wake_by_wifi = true; + disable_irq_nosync(irq); + } + + /* Notify PM core we are wakeup source */ + pm_wakeup_event(adapter->dev, 0); + + return IRQ_HANDLED; +} + static void mwifiex_probe_of(struct mwifiex_adapter *adapter) { + int ret; struct device *dev = adapter->dev; if (!dev->of_node) return; adapter->dt_node = dev->of_node; + adapter->irq_wakeup = irq_of_parse_and_map(adapter->dt_node, 0); + if (!adapter->irq_wakeup) { + dev_info(dev, "fail to parse irq_wakeup from device tree\n"); + return; + } + + ret = devm_request_irq(dev, adapter->irq_wakeup, + mwifiex_irq_wakeup_handler, IRQF_TRIGGER_LOW, + "wifi_wake", adapter); + if (ret) { + dev_err(dev, "Failed to request irq_wakeup %d (%d)\n", + adapter->irq_wakeup, ret); + goto err_exit; + } + + disable_irq(adapter->irq_wakeup); + if (device_init_wakeup(dev, 
true)) { + dev_err(dev, "fail to init wakeup for mwifiex\n"); + goto err_exit; + } + return; + +err_exit: + adapter->irq_wakeup = 0; } /* diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h index 83e0776..12def94 100644 --- a/drivers/net/wireless/marvell/mwifiex/main.h +++ b/drivers/net/wireless/marvell/mwifiex/main.h @@ -1010,6 +1010,10 @@ struct mwifiex_adapter { bool usb_mc_setup; struct cfg80211_wowlan_nd_info *nd_info; struct ieee80211_regdomain *regd; + + /* Wake-on-WLAN (WoWLAN) */ + int irq_wakeup; + bool wake_by_wifi; }; void mwifiex_process_tx_queue(struct mwifiex_adapter *adapter); @@ -1409,6 +1413,27 @@ static inline u8 mwifiex_is_tdls_link_setup(u8 status) return false; } +/* Disable platform specific wakeup interrupt */ +static inline void mwifiex_disable_wake(struct mwifiex_adapter *adapter) +{ + if (adapter->irq_wakeup >= 0) { + disable_irq_wake(adapter->irq_wakeup); + if (!adapter->wake_by_wifi) + disable_irq(adapter->irq_wakeup); + } +} + +/* Enable platform specific wakeup interrupt */ +static inline void mwifiex_enable_wake(struct mwifiex_adapter *adapter) +{ + /* Enable platform specific wakeup interrupt */ + if (adapter->irq_wakeup >= 0) { + adapter->wake_by_wifi = false; + enable_irq(adapter->irq_wakeup); + enable_irq_wake(adapter->irq_wakeup); + } +} + int mwifiex_init_shutdown_fw(struct mwifiex_private *priv, u32 func_init_shutdown); int mwifiex_add_card(void *, struct semaphore *, struct mwifiex_if_ops *, u8, diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c index ea423d5..af93661 100644 --- a/drivers/net/wireless/marvell/mwifiex/pcie.c +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c @@ -133,6 +133,7 @@ static int mwifiex_pcie_suspend(struct device *dev) adapter = card->adapter; + mwifiex_enable_wake(adapter); hs_actived = mwifiex_enable_hs(adapter); /* Indicate device suspended */ @@ -179,6 +180,7 @@ static int 
mwifiex_pcie_resume(struct device *dev) mwifiex_cancel_hs(mwifiex_get_priv(adapter, MWIFIEX_BSS_ROLE_STA), MWIFIEX_ASYNC_CMD); + mwifiex_disable_wake(adapter); return 0; } diff --git a/drivers/net/wireless/marvell/mwifiex/sdio.c
[PATCH 0/3] mwifiex: Make WoWLAN a common feature
I have a Marvell card on the PCIe bus that needs to support the WoWLAN (wake-on-wireless-LAN) feature. This is a feature offered by the "core" mwifiex card and is not specific to an interface (pcie/sdio/usb). Currently the code to parse the WoWLAN pin and activate it resides only in sdio.c [mostly commit ce4f6f0c353b ("mwifiex: add platform specific wakeup interrupt support")]. I started by copying all that code & data structures into pcie.c/pcie.h, but then realized that we should probably keep it common, since the feature is not interface specific. Further, I noticed that the interface drivers have no real interest in the device tree node, since there are no properties specific to the interfaces; the only properties currently needed are the common ones used by the mwifiex core. This patch set thus introduces mwifiex_probe_of() to parse the common properties, and then moves the WoWLAN specific code to the common module so that all the interfaces can use it. Essentially this is a single logical patch that has been split up into multiple patches only for simplicity and ease of code review.

This is currently rebased on top of Linus' tree with the following 2 patches applied:
https://patchwork.kernel.org/patch/9362275/
https://patchwork.kernel.org/patch/9390225/

Rajat Jain (3):
  mwifiex: Allow mwifiex early access to device structure
  mwifiex: Introduce mwifiex_probe_of() to parse common properties
  mwifiex: Enable WoWLAN for both sdio and pcie

 drivers/net/wireless/marvell/mwifiex/main.c    | 58 ++++++++++++++++-
 drivers/net/wireless/marvell/mwifiex/main.h    | 28 ++++++++-
 drivers/net/wireless/marvell/mwifiex/pcie.c    |  8 ++-
 drivers/net/wireless/marvell/mwifiex/sdio.c    | 79 ++++++++-------------
 drivers/net/wireless/marvell/mwifiex/sdio.h    |  8 ---
 drivers/net/wireless/marvell/mwifiex/sta_cmd.c |  5 +-
 drivers/net/wireless/marvell/mwifiex/usb.c     |  3 +-
 7 files changed, 99 insertions(+), 90 deletions(-)

-- 
2.8.0.rc3.226.g39d4020
Re: [PATCH] netns: revert "netns: avoid disabling irq for netns id"
On Sat, Oct 22, 2016 at 12:29 PM, Paul Moorewrote: > On Fri, Oct 21, 2016 at 11:38 PM, Cong Wang wrote: >> On Fri, Oct 21, 2016 at 6:49 PM, Paul Moore wrote: >>> Eventually we should be able to reintroduce this code once we have >>> rewritten the audit multicast code to queue messages much the same >>> way we do for unicast messages. A tracking issue for this can be >>> found below: >> >> NAK. >> >> 1) This will be forgotten by Paul. > > The way things are going right now, this argument is going to devolve > into a "yes he will"/"no I won't" so I'll just repeat that I've > created a tracking issue for this so I won't forget (and included a > link, repeated below, in the commit description) and I think I have a > reasonable history of following through on things. > > * https://github.com/linux-audit/audit-kernel/issues/23 I never doubt you will remember to do the audit part, what you will forget is the revert to your revert. We will see. Also, you make git log history much uglier. > >> 2) There is already a fix which is considered as a rework by Paul. > > Already discussed this in the other thread, I'm not going to go into > detail here, just a quick summary: the fix provided by Cong Wang > doubles the message queue's memory consumption and changes some > fundamentals in how multicast messages are handled. The memory > issues, while still an objectionable blocker, are easily resolved, but > moving the netlink multicast send is something I want to make sure is > tested/baked for a bit (it's 4.10 merge window material as far as I'm > concerned). Sounds like you don't have the capacity to get it reviewed and tested within 5 weeks (assuming -rc7 will be the final RC), as a maintainer. > > At this point I think it is worth mentioning that we are in this > position due to a lack of testing; if Cong Wang had tested his > original patch with SELinux we might not be dealing with this > regression now. A more measured approach seems very reasonable. 
> My SELinux is silently disabled because CONFIG_DEFAULT_SECURITY_SELINUX=y was missing in my kernel config. The change is a cross-subsystem one; I definitely can't guarantee I can cover all subsystems. This is exactly why we need -rc1...-rc7: the moment you close the door at -rc2 is the moment you lose the opportunity to get it tested more widely. I am sure you will revert the revert of the revert again for the next merge window if you continue to work in this style. >> 3) -rc2 is Paul's personal deadline, not ours. > > The current 4.9-rc kernels are broken and cause errors when SELinux is > enabled, while I understand SELinux is not a priority (or a secondary, > or tertiary, or N-ary concern) for many on the netdev list, it is > still an important part of the kernel and this regression needs to be > treated seriously and corrected soon. You've got it wrong: it is never because SELinux is not important; every part of the Linux kernel is important. You need to realize we as a whole community don't work in this way; -rc2 is NOT late for a bug fix of any part of the Linux kernel. If you can't review and test a 30-line patch in 5 weeks, it is very likely your problem. > > SELinux/audit has run into interaction issues with the network stack > before, and we've worked together to sort things out; I'm hopeful > cooler heads will prevail and we can do the same here. I am trying my best to help (by providing 3 possible patches), but you refused them because of _your_ -rc2 deadline. Let people judge who is the one who doesn't work together. I am tired of explaining why we have -rc7 to you.
[PATCH v2] net: ipv6: Fix processing of RAs in presence of VRF
rt6_add_route_info and rt6_add_dflt_router were updated to pull the FIB table from the device index, but the corresponding rt6_get_route_info and rt6_get_dflt_router functions were not, leading to the failure to process RAs: ICMPv6: RA: ndisc_router_discovery failed to add default route Fix the 'get' functions by using the table id associated with the device when applicable. Also, now that default routes can be added to tables other than the default table, rt6_purge_dflt_routers needs to be updated as well to look at all tables. To handle that efficiently, add a flag to the table denoting if it has a default route via RA. Fixes: ca254490c8dfd ("net: Add VRF support to IPv6 stack") Signed-off-by: David Ahern --- v2 - added Fixes to commit message include/net/ip6_fib.h | 2 ++ net/ipv6/route.c | 68 --- 2 files changed, 50 insertions(+), 20 deletions(-) diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h index fb961a576abe..a74e2aa40ef4 100644 --- a/include/net/ip6_fib.h +++ b/include/net/ip6_fib.h @@ -230,6 +230,8 @@ struct fib6_table { rwlock_t tb6_lock; struct fib6_node tb6_root; struct inet_peer_base tb6_peers; + unsigned int flags; +#define RT6_TABLE_HAS_DFLT_ROUTER BIT(0) }; #define RT6_TABLE_UNSPEC RT_TABLE_UNSPEC diff --git a/net/ipv6/route.c b/net/ipv6/route.c index bdbc38e8bf29..3ac19eb81a86 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -102,11 +102,13 @@ static int rt6_score_route(struct rt6_info *rt, int oif, int strict); #ifdef CONFIG_IPV6_ROUTE_INFO static struct rt6_info *rt6_add_route_info(struct net *net, const struct in6_addr *prefix, int prefixlen, - const struct in6_addr *gwaddr, int ifindex, + const struct in6_addr *gwaddr, + struct net_device *dev, unsigned int pref); static struct rt6_info *rt6_get_route_info(struct net *net, const struct in6_addr *prefix, int prefixlen, - const struct in6_addr *gwaddr, int ifindex); + const struct in6_addr *gwaddr, + struct net_device *dev); #endif struct uncached_list { @@ -803,7 +805,7 @@ int
rt6_route_rcv(struct net_device *dev, u8 *opt, int len, rt = rt6_get_dflt_router(gwaddr, dev); else rt = rt6_get_route_info(net, prefix, rinfo->prefix_len, - gwaddr, dev->ifindex); + gwaddr, dev); if (rt && !lifetime) { ip6_del_rt(rt); @@ -811,8 +813,8 @@ int rt6_route_rcv(struct net_device *dev, u8 *opt, int len, } if (!rt && lifetime) - rt = rt6_add_route_info(net, prefix, rinfo->prefix_len, gwaddr, dev->ifindex, - pref); + rt = rt6_add_route_info(net, prefix, rinfo->prefix_len, gwaddr, + dev, pref); else if (rt) rt->rt6i_flags = RTF_ROUTEINFO | (rt->rt6i_flags & ~RTF_PREF_MASK) | RTF_PREF(pref); @@ -2325,13 +2327,16 @@ static void ip6_rt_copy_init(struct rt6_info *rt, struct rt6_info *ort) #ifdef CONFIG_IPV6_ROUTE_INFO static struct rt6_info *rt6_get_route_info(struct net *net, const struct in6_addr *prefix, int prefixlen, - const struct in6_addr *gwaddr, int ifindex) + const struct in6_addr *gwaddr, + struct net_device *dev) { + u32 tb_id = l3mdev_fib_table(dev) ? : RT6_TABLE_INFO; + int ifindex = dev->ifindex; struct fib6_node *fn; struct rt6_info *rt = NULL; struct fib6_table *table; - table = fib6_get_table(net, RT6_TABLE_INFO); + table = fib6_get_table(net, tb_id); if (!table) return NULL; @@ -2357,12 +2362,13 @@ static struct rt6_info *rt6_get_route_info(struct net *net, static struct rt6_info *rt6_add_route_info(struct net *net, const struct in6_addr *prefix, int prefixlen, - const struct in6_addr *gwaddr, int ifindex, + const struct in6_addr *gwaddr, + struct net_device *dev, unsigned int pref) { struct fib6_config cfg = { .fc_metric = IP6_RT_PRIO_USER, - .fc_ifindex
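The key move in the fix is the table selection `l3mdev_fib_table(dev) ? : RT6_TABLE_INFO`: use the VRF's table when the device is enslaved to one, otherwise fall back to the default info table. The `?:` with an omitted middle operand is a GNU C extension that reuses the left operand when it is non-zero. A minimal standalone sketch of that selection logic, with a stub lookup and an illustrative table constant in place of the kernel's l3mdev helper:

```c
#include <assert.h>

#define RT6_TABLE_INFO 254  /* stand-in value, not the kernel's actual id */

/* Stub for l3mdev_fib_table(): returns the VRF table id, or 0 when
 * the device is not enslaved to an L3 master device. */
static unsigned int l3mdev_fib_table_stub(unsigned int vrf_table)
{
    return vrf_table;
}

/* Pick the VRF table when present, otherwise the default table.
 * The GNU "?:" elvis operator evaluates its left operand once and
 * reuses it as the result when it is non-zero. */
static unsigned int pick_table(unsigned int vrf_table)
{
    return l3mdev_fib_table_stub(vrf_table) ?: RT6_TABLE_INFO;
}
```

The same one-liner pattern appears in both the 'add' and 'get' paths of the patch, which is what keeps the two sides consistent.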
[PATCH] net: ipv6: Fix processing of RAs in presence of VRF
rt6_add_route_info and rt6_add_dflt_router were updated to pull the FIB table from the device index, but the corresponding rt6_get_route_info and rt6_get_dflt_router functions were not, leading to the failure to process RAs: ICMPv6: RA: ndisc_router_discovery failed to add default route Fix the 'get' functions by using the table id associated with the device when applicable. Also, now that default routes can be added to tables other than the default table, rt6_purge_dflt_routers needs to be updated as well to look at all tables. To handle that efficiently, add a flag to the table denoting if it has a default route via RA. Signed-off-by: David Ahern --- include/net/ip6_fib.h | 2 ++ net/ipv6/route.c | 68 --- 2 files changed, 50 insertions(+), 20 deletions(-) diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h index fb961a576abe..a74e2aa40ef4 100644 --- a/include/net/ip6_fib.h +++ b/include/net/ip6_fib.h @@ -230,6 +230,8 @@ struct fib6_table { rwlock_t tb6_lock; struct fib6_node tb6_root; struct inet_peer_base tb6_peers; + unsigned int flags; +#define RT6_TABLE_HAS_DFLT_ROUTER BIT(0) }; #define RT6_TABLE_UNSPEC RT_TABLE_UNSPEC diff --git a/net/ipv6/route.c b/net/ipv6/route.c index bdbc38e8bf29..3ac19eb81a86 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -102,11 +102,13 @@ static int rt6_score_route(struct rt6_info *rt, int oif, int strict); #ifdef CONFIG_IPV6_ROUTE_INFO static struct rt6_info *rt6_add_route_info(struct net *net, const struct in6_addr *prefix, int prefixlen, - const struct in6_addr *gwaddr, int ifindex, + const struct in6_addr *gwaddr, + struct net_device *dev, unsigned int pref); static struct rt6_info *rt6_get_route_info(struct net *net, const struct in6_addr *prefix, int prefixlen, - const struct in6_addr *gwaddr, int ifindex); + const struct in6_addr *gwaddr, + struct net_device *dev); #endif struct uncached_list { @@ -803,7 +805,7 @@ int rt6_route_rcv(struct net_device *dev, u8 *opt, int len, rt = rt6_get_dflt_router(gwaddr, dev);
else rt = rt6_get_route_info(net, prefix, rinfo->prefix_len, - gwaddr, dev->ifindex); + gwaddr, dev); if (rt && !lifetime) { ip6_del_rt(rt); @@ -811,8 +813,8 @@ int rt6_route_rcv(struct net_device *dev, u8 *opt, int len, } if (!rt && lifetime) - rt = rt6_add_route_info(net, prefix, rinfo->prefix_len, gwaddr, dev->ifindex, - pref); + rt = rt6_add_route_info(net, prefix, rinfo->prefix_len, gwaddr, + dev, pref); else if (rt) rt->rt6i_flags = RTF_ROUTEINFO | (rt->rt6i_flags & ~RTF_PREF_MASK) | RTF_PREF(pref); @@ -2325,13 +2327,16 @@ static void ip6_rt_copy_init(struct rt6_info *rt, struct rt6_info *ort) #ifdef CONFIG_IPV6_ROUTE_INFO static struct rt6_info *rt6_get_route_info(struct net *net, const struct in6_addr *prefix, int prefixlen, - const struct in6_addr *gwaddr, int ifindex) + const struct in6_addr *gwaddr, + struct net_device *dev) { + u32 tb_id = l3mdev_fib_table(dev) ? : RT6_TABLE_INFO; + int ifindex = dev->ifindex; struct fib6_node *fn; struct rt6_info *rt = NULL; struct fib6_table *table; - table = fib6_get_table(net, RT6_TABLE_INFO); + table = fib6_get_table(net, tb_id); if (!table) return NULL; @@ -2357,12 +2362,13 @@ static struct rt6_info *rt6_get_route_info(struct net *net, static struct rt6_info *rt6_add_route_info(struct net *net, const struct in6_addr *prefix, int prefixlen, - const struct in6_addr *gwaddr, int ifindex, + const struct in6_addr *gwaddr, + struct net_device *dev, unsigned int pref) { struct fib6_config cfg = { .fc_metric = IP6_RT_PRIO_USER, - .fc_ifindex = ifindex, + .fc_ifindex = dev->ifindex, .fc_dst_len =
Re: net/can: warning in bcm_connect/proc_register
Hi Cong, I'm able to reproduce it by running https://gist.github.com/xairy/33f2eb6bf807b004e643bae36c3d02d7 in a tight parallel loop with stress (https://godoc.org/golang.org/x/tools/cmd/stress): $ gcc -lpthread tmp.c $ ./stress ./a.out The C program was generated from the following syzkaller prog: mmap(&(0x7f00/0x991000)=nil, (0x991000), 0x3, 0x32, 0x, 0x0) socket(0x1d, 0x80002, 0x2) r0 = socket(0x1d, 0x80002, 0x2) connect$nfc_llcp(r0, &(0x7f00c000)={0x27, 0x1, 0x0, 0x5, 0x1, 0x1, "341b3a01b257849ca1d7d1ff9f999d8127b185f88d1d775d59c88a3aa6a8ddacdf2bdc324ea6578a21b85114610186c3817c34b05eaffd2c3f54f57fa81ba0", 0x1ff}, 0x60) connect$nfc_llcp(r0, &(0x7f991000-0x60)={0x27, 0x1, 0x1, 0x5, 0xfffd, 0x0, "341b3a01b257849ca1d7d1ff9f999d8127b185f88d1d775dbec88a3aa6a8ddacdf2bdc324ea6578a21b85114610186c3817c34b05eaffd2c3f54f57fa81ba0", 0x1ff}, 0x60) Unfortunately I wasn't able to create a simpler reproducer. Thanks! On Mon, Oct 24, 2016 at 6:58 PM, Cong Wang wrote: > On Mon, Oct 24, 2016 at 9:21 AM, Andrey Konovalov > wrote: >> Hi, >> >> I've got the following error report while running the syzkaller fuzzer: >> >> WARNING: CPU: 0 PID: 32451 at fs/proc/generic.c:345 proc_register+0x25e/0x300 >> proc_dir_entry 'can-bcm/249757' already registered >> Kernel panic - not syncing: panic_on_warn set ... > > Looks like we have two problems here: > > 1) A check for bo->bcm_proc_read != NULL seems missing > 2) We need to lock the sock in bcm_connect(). > > I will work on a patch. Meanwhile, it would help a lot if you could provide > a reproducer. > > Thanks!
Re: [PATCH] net: skip generating uevents for network namespaces that are exiting
On Sat, Oct 22, 2016 at 12:37 AM, Andrey Vagin wrote: > Hi Cong, > > On Thu, Oct 20, 2016 at 10:25 PM, Andrey Vagin wrote: >> On Thu, Oct 20, 2016 at 8:10 PM, Cong Wang wrote: >>> On Thu, Oct 20, 2016 at 7:46 PM, Andrei Vagin wrote: No one can see these events, because a network namespace cannot be destroyed if it has sockets. >>> >>> Are you sure? kobject_uevent_env() seems to be sending uevents to all >>> network namespaces. >> >> kobj_bcast_filter() checks that a kobject namespace is equal to a >> socket namespace. > > Today I've checked that it really works as I read from the source code. > I use this tool to read events: > https://gist.github.com/avagin/430ba431fc2972002df40ebe6a048b36 > > And I see that events from non-network devices are delivered to all sockets, > but events from network devices are delivered only to sockets from > the network namespace where the device is operated. I missed it, it makes sense now. Please consider adding a comment in the code or expanding your changelog for reference. Thanks!
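The behavior Andrey verified — kobj_bcast_filter() delivering a network device's uevent only to listeners in the same network namespace, while untagged kobjects broadcast everywhere — reduces to a per-socket predicate. A hedged standalone sketch of that rule; the struct names and fields are illustrative toys, not the kernel's:

```c
#include <assert.h>
#include <stddef.h>

/* Toy models of a kobject's namespace tag and a listener socket. */
struct toy_kobj { const void *net_ns; };   /* NULL: not a network device */
struct toy_sock { const void *net_ns; };

/* Mimics the kobj_bcast_filter() rule: events from kobjects without a
 * namespace tag are delivered to every listener; events from network
 * devices are delivered only when the listener's network namespace
 * matches the device's. Returns 1 to deliver, 0 to drop. */
static int bcast_filter_stub(const struct toy_kobj *k, const struct toy_sock *s)
{
    if (!k->net_ns)
        return 1;
    return k->net_ns == s->net_ns;
}
```

This is why the uevents generated for an exiting netns are invisible: the only sockets that would pass the filter live in the namespace that is being torn down.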
Re: [PATCH net 1/1] net sched filters: fix notification of filter delete with proper handle
On Sun, Oct 23, 2016 at 8:35 AM, Jamal Hadi Salim wrote: > From: Jamal Hadi Salim > > Signed-off-by: Jamal Hadi Salim We definitely need a serious changelog, even just a short one. ;) Other than this, Acked-by: Cong Wang We can address the "if (RTM_DELTFILTER != event)" in a separate patch if needed. Thanks.
Re: net/can: warning in bcm_connect/proc_register
On Mon, Oct 24, 2016 at 9:21 AM, Andrey Konovalov wrote: > Hi, > > I've got the following error report while running the syzkaller fuzzer: > > WARNING: CPU: 0 PID: 32451 at fs/proc/generic.c:345 proc_register+0x25e/0x300 > proc_dir_entry 'can-bcm/249757' already registered > Kernel panic - not syncing: panic_on_warn set ... Looks like we have two problems here: 1) A check for bo->bcm_proc_read != NULL seems missing 2) We need to lock the sock in bcm_connect(). I will work on a patch. Meanwhile, it would help a lot if you could provide a reproducer. Thanks!
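The race Cong describes — two concurrent connect() calls on the same socket both seeing "proc entry not yet registered" and both calling proc_create_data() — can be modeled outside the kernel. This toy sketch uses a pthread mutex where the real fix would use lock_sock(), and a flag standing in for bo->bcm_proc_read; all names are illustrative:

```c
#include <assert.h>
#include <pthread.h>

/* Guarding the check-and-register with a lock (analogous to
 * lock_sock() in bcm_connect()) ensures the registration runs
 * at most once, which is what the proc_register() warning
 * "already registered" says was being violated. */
static pthread_mutex_t reg_lock = PTHREAD_MUTEX_INITIALIZER;
static int proc_entry_registered;  /* stands in for bo->bcm_proc_read */
static int register_count;         /* counts proc_create_data() calls */

static void try_register(void)
{
    pthread_mutex_lock(&reg_lock);
    if (!proc_entry_registered) {   /* the missing NULL check */
        proc_entry_registered = 1;
        register_count++;           /* proc_create_data() would run here */
    }
    pthread_mutex_unlock(&reg_lock);
}

static void *connect_thread(void *arg)
{
    (void)arg;
    try_register();                 /* models one connect() call */
    return NULL;
}

/* Run two "connect" threads in parallel and report how many
 * registrations actually happened. */
static int run_race(void)
{
    pthread_t a, b;

    pthread_create(&a, NULL, connect_thread, NULL);
    pthread_create(&b, NULL, connect_thread, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return register_count;
}
```

Without the lock and the flag check, both threads could pass the test and register twice, which is exactly the duplicate proc_dir_entry the fuzzer triggered.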
Re: Redundancy support through HSR and PRP
On 10/20/2016 01:08 PM, Murali Karicheri wrote: > David, > > On 10/10/2016 02:34 PM, Murali Karicheri wrote: >> All, >> >> Wondering if there is a plan to add PRP driver support, like HSR, in Linux? AFAIK, >> PRP adds a trailer to the Ethernet frame and is used for redundancy management like >> HSR. >> So wondering why this is not supported. >> >> Thanks >> > I need to work on a PRP driver for Linux. So if there is already someone > working > on this, I would like to join and contribute. Either way please respond so > that > I can work to add this support. > > I am also working to add support for offloading HSR functions to hardware and > will > need to modify the hsr driver to support the same. So any suggestion as to > how this > can be done will be appreciated. > > Here is what I believe should happen to support this at a higher level: > > an HSR-capable NIC (with firmware support) may be able to > - duplicate packets at the egress, so only one copy needs to be forwarded to the NIC > - discard the duplicate at the ingress, so forward only one copy to the ethernet driver > - manage supervision of the network; keep the node list and their status > > It could be a subset of the above. So I am hoping this can be published by > the Ethernet > driver as a set of features. The hsr driver can then look at these features and > decide to offload and disable the same functionality at the hsr driver. Also the > node list/status > has to be polled from the underlying hardware. > > PRP is similar to HSR in many respects. Redundancy management uses a suffix > tag on the MAC > frame instead of the prefix used by HSR, so they are handled more transparently > by > switches or routers. Probably I need to > - rename net/hsr to net/hsr-prp > - restructure the current set of files to add PRP support > > Thanks > + Arvid Didn't copy the HSR owner in my original email. Copying now. -- Murali Karicheri Linux Kernel, Keystone
net/can: warning in bcm_connect/proc_register
Hi, I've got the following error report while running the syzkaller fuzzer: WARNING: CPU: 0 PID: 32451 at fs/proc/generic.c:345 proc_register+0x25e/0x300 proc_dir_entry 'can-bcm/249757' already registered Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 32451 Comm: syz-executor Not tainted 4.9.0-rc1+ #293 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 880037d8fae0 81b474f4 0003 dc00 840cbf00 880037d8fb04 880037d8fba8 8140c06a 41b58ab3 8479ab7d 8140beae 0032 Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [] dump_stack+0xb3/0x10f lib/dump_stack.c:51 [] panic+0x1bc/0x39d kernel/panic.c:179 [] __warn+0x1cc/0x1f0 kernel/panic.c:542 [] warn_slowpath_fmt+0xac/0xd0 kernel/panic.c:565 [] proc_register+0x25e/0x300 fs/proc/generic.c:344 [] proc_create_data+0x101/0x1a0 fs/proc/generic.c:507 [] bcm_connect+0x16e/0x380 net/can/bcm.c:1585 [] SYSC_connect+0x244/0x2f0 net/socket.c:1533 [] SyS_connect+0x24/0x30 net/socket.c:1514 [] entry_SYSCALL_64_fastpath+0x1f/0xc2 Dumping ftrace buffer: (ftrace buffer empty) Kernel Offset: disabled On commit 07d9a380680d1c0eb51ef87ff2eab5c994949e69 (Oct 23).
Re: [PATCH net-next] flow_dissector: fix vlan tag handling
On Monday, October 24, 2016 10:17:36 AM CEST Jiri Pirko wrote: > Sat, Oct 22, 2016 at 10:30:08PM CEST, a...@arndb.de wrote: > >diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c > >index 44e6ba9d3a6b..17be1b66cc41 100644 > >--- a/net/core/flow_dissector.c > >+++ b/net/core/flow_dissector.c > >@@ -246,13 +246,10 @@ bool __skb_flow_dissect(const struct sk_buff *skb, > > case htons(ETH_P_8021AD): > > case htons(ETH_P_8021Q): { > > const struct vlan_hdr *vlan; > >+struct vlan_hdr _vlan; > >+bool vlan_tag_present = (skb && skb_vlan_tag_present(skb)); > > Drop the unnecessary "()" ok > > > >-if (skb && skb_vlan_tag_present(skb)) > >-proto = skb->protocol; > > This does not look correct. I believe that you need to set proto for > further processing. > Ah, of course. I only looked at the usage in this 'case' statement, but the variable is also used after the 'goto again' and at the end of the function. Arnd
[PATCH] kalmia: avoid potential uninitialized variable use
The kalmia_send_init_packet() returns zero or a negative return code, but gcc has no way of knowing that there cannot be a positive return code, so it determines that copying the ethernet address at the end of kalmia_bind() will access uninitialized data: drivers/net/usb/kalmia.c: In function ‘kalmia_bind’: arch/x86/include/asm/string_32.h:78:22: error: ‘*((void *)_addr+4)’ may be used uninitialized in this function [-Werror=maybe-uninitialized] *((short *)to + 2) = *((short *)from + 2); ^ drivers/net/usb/kalmia.c:138:5: note: ‘*((void *)_addr+4)’ was declared here This warning is harmless, but for consistency, we should make the check for the return code match what the driver does everywhere else and just propagate it, which then gets rid of the warning. Signed-off-by: Arnd Bergmann --- drivers/net/usb/kalmia.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/usb/kalmia.c b/drivers/net/usb/kalmia.c index 5662babf0583..3e37724d30ae 100644 --- a/drivers/net/usb/kalmia.c +++ b/drivers/net/usb/kalmia.c @@ -151,7 +151,7 @@ kalmia_bind(struct usbnet *dev, struct usb_interface *intf) status = kalmia_init_and_get_ethernet_addr(dev, ethernet_addr); - if (status < 0) { + if (status) { usb_set_intfdata(intf, NULL); usb_driver_release_interface(driver_of(intf), intf); return status; -- 2.9.0
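The pattern Arnd describes — checking `if (status)` instead of `if (status < 0)` so every non-zero return bails out before the address buffer is read — can be sketched standalone. The stub names and the -5 error code below are illustrative, not the driver's:

```c
#include <assert.h>

/* Stub init helper in the usual kernel style: 0 on success, a
 * negative errno on failure. It happens never to return a positive
 * value, but a compiler cannot prove that across translation units,
 * which is what provoked the maybe-uninitialized warning. */
static int init_and_get_addr_stub(int fail, unsigned char *addr)
{
    if (fail)
        return -5; /* stand-in for -EIO */
    addr[0] = 0xaa; /* "fetched" ethernet address byte */
    return 0;
}

/* Checking "status" rather than "status < 0" makes it obvious that
 * addr is only read on the success path, and matches how the rest
 * of the driver propagates return codes. */
static int bind_stub(int fail, unsigned char *out)
{
    unsigned char addr[6];
    int status = init_and_get_addr_stub(fail, addr);

    if (status)
        return status; /* propagate any non-zero code */
    out[0] = addr[0];  /* safe: only reached when addr was filled */
    return 0;
}
```

The behavior is unchanged for the values the helper actually returns; the rewrite only widens the error check so the data-flow is visible to the compiler.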
[PATCH] cw1200: fix bogus maybe-uninitialized warning
On x86, the cw1200 driver produces a rather silly warning about the possible use of the 'ret' variable without an initialization presumably after being confused by the architecture specific definition of WARN_ON: drivers/net/wireless/st/cw1200/wsm.c: In function ‘wsm_handle_rx’: drivers/net/wireless/st/cw1200/wsm.c:1457:9: error: ‘ret’ may be used uninitialized in this function [-Werror=maybe-uninitialized] As the driver just checks the same variable twice here, we can simplify it by removing the second condition, which makes it more readable and avoids the warning. Signed-off-by: Arnd Bergmann--- drivers/net/wireless/st/cw1200/wsm.c | 15 +++ 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/drivers/net/wireless/st/cw1200/wsm.c b/drivers/net/wireless/st/cw1200/wsm.c index 680d60eabc75..094e6637ade2 100644 --- a/drivers/net/wireless/st/cw1200/wsm.c +++ b/drivers/net/wireless/st/cw1200/wsm.c @@ -385,14 +385,13 @@ static int wsm_multi_tx_confirm(struct cw1200_common *priv, if (WARN_ON(count <= 0)) return -EINVAL; - if (count > 1) { - /* We already released one buffer, now for the rest */ - ret = wsm_release_tx_buffer(priv, count - 1); - if (ret < 0) - return ret; - else if (ret > 0) - cw1200_bh_wakeup(priv); - } + /* We already released one buffer, now for the rest */ + ret = wsm_release_tx_buffer(priv, count - 1); + if (ret < 0) + return ret; + + if (ret > 0) + cw1200_bh_wakeup(priv); cw1200_debug_txed_multi(priv, count); for (i = 0; i < count; ++i) { -- 2.9.0
net/ipv4: warning in inet_sock_destruct
Hi, I've got the following error report while running the syzkaller fuzzer: [ cut here ] WARNING: CPU: 1 PID: 0 at net/ipv4/af_inet.c:153[] inet_sock_destruct+0x64d/0x810 net/ipv4/af_inet.c:153 Modules linked in: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.9.0-rc2+ #301 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 88006cd07d88 81b47264 84465d80 88006cd07dd0 8237 88006cd19100[ 60.531224] 0099 84465d80 0099 Call Trace: [ 60.531224] [] dump_stack+0xb3/0x10f [] __warn+0x1a7/0x1f0 kernel/panic.c:550 [] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585 [] inet_sock_destruct+0x64d/0x810 net/ipv4/af_inet.c:153 [] __sk_destruct+0x51/0x480 net/core/sock.c:1422 [< inline >] __rcu_reclaim kernel/rcu/rcu.h:118 [< inline >] rcu_do_batch kernel/rcu/tree.c:2776 [< inline >] invoke_rcu_callbacks kernel/rcu/tree.c:3040 [< inline >] __rcu_process_callbacks kernel/rcu/tree.c:3007 [] rcu_process_callbacks+0xa40/0x1190 kernel/rcu/tree.c:3024 [] __do_softirq+0x23f/0x8e5 kernel/softirq.c:284 [< inline >] invoke_softirq kernel/softirq.c:364 [] irq_exit+0x1a7/0x1e0 kernel/softirq.c:405 [< inline >] exiting_irq ./arch/x86/include/asm/apic.h:659 [] smp_apic_timer_interrupt+0x7b/0xa0 arch/x86/kernel/apic/apic.c:960 [] apic_timer_interrupt+0x8c/0xa0 [ 60.531224] [] ? native_safe_halt+0x6/0x10 [< inline >] arch_safe_halt ./arch/x86/include/asm/paravirt.h:103 [] default_idle+0x22/0x2d0 arch/x86/kernel/process.c:308 [] arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:299 [] default_idle_call+0x36/0x60 kernel/sched/idle.c:96 [< inline >] cpuidle_idle_call kernel/sched/idle.c:154 [< inline >] cpu_idle_loop kernel/sched/idle.c:247 [] cpu_startup_entry+0x244/0x300 kernel/sched/idle.c:302 [] start_secondary+0x250/0x2d0 arch/x86/kernel/smpboot.c:263 ---[ end trace 3cd7480984cd70d8 ]--- === [ INFO: suspicious RCU usage. ] 4.9.0-rc2+ #301 Tainted: GW --- net/core/sock.c:1425 suspicious rcu_dereference_check() usage! 
other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by swapper/1/0: #0: [ 60.560631] ( rcu_callback[ 60.560930] ){..} , at: [ 60.561271] [] rcu_process_callbacks+0x9eb/0x1190 stack backtrace: CPU: 1 PID: 0 Comm: swapper/1 Tainted: GW 4.9.0-rc2+ #301 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 88006cd07e20 81b47264 88006c18 0001 843fe660 88006cd07e50 81204a4f 880066438440 880066438000 8800664381b0 Call Trace: [ 60.563304] [] dump_stack+0xb3/0x10f [] lockdep_rcu_suspicious+0x13f/0x190 kernel/locking/lockdep.c:4445 [] __sk_destruct+0x3c0/0x480 net/core/sock.c:1424 [< inline >] __rcu_reclaim kernel/rcu/rcu.h:118 [< inline >] rcu_do_batch kernel/rcu/tree.c:2776 [< inline >] invoke_rcu_callbacks kernel/rcu/tree.c:3040 [< inline >] __rcu_process_callbacks kernel/rcu/tree.c:3007 [] rcu_process_callbacks+0xa40/0x1190 kernel/rcu/tree.c:3024 [] __do_softirq+0x23f/0x8e5 kernel/softirq.c:284 [< inline >] invoke_softirq kernel/softirq.c:364 [] irq_exit+0x1a7/0x1e0 kernel/softirq.c:405 [< inline >] exiting_irq ./arch/x86/include/asm/apic.h:659 [] smp_apic_timer_interrupt+0x7b/0xa0 arch/x86/kernel/apic/apic.c:960 [] apic_timer_interrupt+0x8c/0xa0 [ 60.563304] [] ? native_safe_halt+0x6/0x10 [< inline >] arch_safe_halt ./arch/x86/include/asm/paravirt.h:103 [] default_idle+0x22/0x2d0 arch/x86/kernel/process.c:308 [] arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:299 [] default_idle_call+0x36/0x60 kernel/sched/idle.c:96 [< inline >] cpuidle_idle_call kernel/sched/idle.c:154 [< inline >] cpu_idle_loop kernel/sched/idle.c:247 [] cpu_startup_entry+0x244/0x300 kernel/sched/idle.c:302 [] start_secondary+0x250/0x2d0 arch/x86/kernel/smpboot.c:263 On commit 07d9a380680d1c0eb51ef87ff2eab5c994949e69 (Oct 23).
[PATCH] wireless: fix bogus maybe-uninitialized warning
The hostap_80211_rx() function is supposed to set up the mac addresses for four possible cases, based on two bits of input data. For some reason, gcc decides that it's possible that none of these four cases apply and the addresses remain uninitialized: drivers/net/wireless/intersil/hostap/hostap_80211_rx.c: In function ‘hostap_80211_rx’: arch/x86/include/asm/string_32.h:77:14: warning: ‘src’ may be used uninitialized in this function [-Wmaybe-uninitialized] drivers/net/wireless/intel/ipw2x00/libipw_rx.c: In function ‘libipw_rx’: arch/x86/include/asm/string_32.h:77:14: error: ‘dst’ may be used uninitialized in this function [-Werror=maybe-uninitialized] arch/x86/include/asm/string_32.h:78:22: error: ‘*((void *)+4)’ may be used uninitialized in this function [-Werror=maybe-uninitialized] This warning is clearly nonsense, but changing the last case into 'default' makes it obvious to the compiler too, which avoids the warning and probably leads to better object code too. The same code is duplicated several times in the kernel, so this patch uses the same workaround for all copies. The exact configuration was hit only very rarely in randconfig builds and I only saw it in three drivers, but I assume that all of them are potentially affected, and it's better to keep the code consistent.
Signed-off-by: Arnd Bergmann--- drivers/net/wireless/ath/ath6kl/wmi.c | 8 drivers/net/wireless/intel/ipw2x00/libipw_rx.c | 2 +- drivers/net/wireless/intersil/hostap/hostap_80211_rx.c | 2 +- net/wireless/lib80211_crypt_tkip.c | 2 +- 4 files changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/net/wireless/ath/ath6kl/wmi.c b/drivers/net/wireless/ath/ath6kl/wmi.c index 3fd1cc98fd2f..84a6d12c3f8a 100644 --- a/drivers/net/wireless/ath/ath6kl/wmi.c +++ b/drivers/net/wireless/ath/ath6kl/wmi.c @@ -421,10 +421,6 @@ int ath6kl_wmi_dot11_hdr_remove(struct wmi *wmi, struct sk_buff *skb) switch ((le16_to_cpu(wh.frame_control)) & (IEEE80211_FCTL_FROMDS | IEEE80211_FCTL_TODS)) { - case 0: - memcpy(eth_hdr.h_dest, wh.addr1, ETH_ALEN); - memcpy(eth_hdr.h_source, wh.addr2, ETH_ALEN); - break; case IEEE80211_FCTL_TODS: memcpy(eth_hdr.h_dest, wh.addr3, ETH_ALEN); memcpy(eth_hdr.h_source, wh.addr2, ETH_ALEN); @@ -435,6 +431,10 @@ int ath6kl_wmi_dot11_hdr_remove(struct wmi *wmi, struct sk_buff *skb) break; case IEEE80211_FCTL_FROMDS | IEEE80211_FCTL_TODS: break; + default: + memcpy(eth_hdr.h_dest, wh.addr1, ETH_ALEN); + memcpy(eth_hdr.h_source, wh.addr2, ETH_ALEN); + break; } skb_pull(skb, sizeof(struct ath6kl_llc_snap_hdr)); diff --git a/drivers/net/wireless/intel/ipw2x00/libipw_rx.c b/drivers/net/wireless/intel/ipw2x00/libipw_rx.c index cef7f7d79cd9..1c1ec7bb9302 100644 --- a/drivers/net/wireless/intel/ipw2x00/libipw_rx.c +++ b/drivers/net/wireless/intel/ipw2x00/libipw_rx.c @@ -507,7 +507,7 @@ int libipw_rx(struct libipw_device *ieee, struct sk_buff *skb, memcpy(dst, hdr->addr3, ETH_ALEN); memcpy(src, hdr->addr4, ETH_ALEN); break; - case 0: + default: memcpy(dst, hdr->addr1, ETH_ALEN); memcpy(src, hdr->addr2, ETH_ALEN); break; diff --git a/drivers/net/wireless/intersil/hostap/hostap_80211_rx.c b/drivers/net/wireless/intersil/hostap/hostap_80211_rx.c index 599f30f22841..34dbddbf3f9b 100644 --- a/drivers/net/wireless/intersil/hostap/hostap_80211_rx.c +++ 
b/drivers/net/wireless/intersil/hostap/hostap_80211_rx.c @@ -855,7 +855,7 @@ void hostap_80211_rx(struct net_device *dev, struct sk_buff *skb, memcpy(dst, hdr->addr3, ETH_ALEN); memcpy(src, hdr->addr4, ETH_ALEN); break; - case 0: + default: memcpy(dst, hdr->addr1, ETH_ALEN); memcpy(src, hdr->addr2, ETH_ALEN); break; diff --git a/net/wireless/lib80211_crypt_tkip.c b/net/wireless/lib80211_crypt_tkip.c index 71447cf86306..ba0a1f398ce5 100644 --- a/net/wireless/lib80211_crypt_tkip.c +++ b/net/wireless/lib80211_crypt_tkip.c @@ -556,7 +556,7 @@ static void michael_mic_hdr(struct sk_buff *skb, u8 * hdr) memcpy(hdr, hdr11->addr3, ETH_ALEN);/* DA */ memcpy(hdr + ETH_ALEN, hdr11->addr4, ETH_ALEN); /* SA */ break; - case 0: + default: memcpy(hdr, hdr11->addr1, ETH_ALEN);/* DA */ memcpy(hdr + ETH_ALEN, hdr11->addr2, ETH_ALEN); /* SA */ break; -- 2.9.0
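The two-bit switch the changelog describes can be reduced to a standalone toy: the frame-control bits select one of four address layouts, and using `default:` for the last arm (instead of `case 0:`) is what convinces gcc that every path through the switch initializes the output. The flag values here are illustrative, not the real IEEE80211_FCTL_* bits:

```c
#include <assert.h>

#define FCTL_TODS   0x01  /* illustrative stand-in for IEEE80211_FCTL_TODS */
#define FCTL_FROMDS 0x02  /* illustrative stand-in for IEEE80211_FCTL_FROMDS */

/* With three explicit cases plus "default:", the compiler sees that
 * dst is assigned on every path, so -Wmaybe-uninitialized stays
 * quiet. With "case 0:" instead of "default:", gcc sometimes fails
 * to prove the four cases are exhaustive over the masked value. */
static char pick_dst(unsigned int fctl)
{
    char dst;

    switch (fctl & (FCTL_TODS | FCTL_FROMDS)) {
    case FCTL_TODS:
        dst = 'A';  /* would copy hdr->addr3 */
        break;
    case FCTL_FROMDS:
        dst = 'B';  /* would copy hdr->addr1 */
        break;
    case FCTL_TODS | FCTL_FROMDS:
        dst = 'C';  /* would copy hdr->addr3 (WDS) */
        break;
    default:        /* was "case 0:" -- semantically identical here */
        dst = 'D';  /* would copy hdr->addr1 */
        break;
    }
    return dst;
}
```

Because the masked value can only be 0..3, the `default:` arm catches exactly the old `case 0:` and nothing else, so the generated behavior is unchanged.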
Re: [PATCH] LSO feature added to Cadence GEM driver
On Mon, 2016-10-24 at 14:18 +0100, Rafal Ozieblo wrote: > New Cadence GEM hardware supports Large Segment Offload (LSO): > TCP segmentation offload (TSO) as well as UDP fragmentation > offload (UFO). Support for those features was added to the driver. > > Signed-off-by: Rafal Ozieblo... > > +static int macb_lso_check_compatibility(struct sk_buff *skb, unsigned int > hdrlen) > +{ > + unsigned int nr_frags, f; > + > + if (skb_shinfo(skb)->gso_size == 0) > + /* not LSO */ > + return -EPERM; > + > + /* there is only one buffer */ > + if (!skb_is_nonlinear(skb)) > + return 0; > + > + /* For LSO: > + * When software supplies two or more payload buffers all payload > buffers > + * apart from the last must be a multiple of 8 bytes in size. > + */ > + if (!IS_ALIGNED(skb_headlen(skb) - hdrlen, MACB_TX_LEN_ALIGN)) > + return -EPERM; > + > + nr_frags = skb_shinfo(skb)->nr_frags; > + /* No need to check last fragment */ > + nr_frags--; > + for (f = 0; f < nr_frags; f++) { > + const skb_frag_t *frag = &skb_shinfo(skb)->frags[f]; > + > + if (!IS_ALIGNED(skb_frag_size(frag), MACB_TX_LEN_ALIGN)) > + return -EPERM; > + } > + return 0; > +} > + Very strange hardware requirements ;( You should implement an .ndo_features_check method to perform the checks from the core networking stack, and not from your ndo_start_xmit(). This has the huge advantage of not falling back to skb_linearize(skb), which is very likely to fail with ~64 KB skbs anyway. (Your ndo_features_check() would request software GSO instead ...)
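The hardware rule the patch enforces — with two or more payload buffers, every buffer except the last must be a multiple of 8 bytes — can be checked in a few lines. This hedged sketch mirrors the loop in macb_lso_check_compatibility() over plain size values rather than skb fragments; the return codes and helper names are illustrative:

```c
#include <assert.h>

#define MACB_TX_LEN_ALIGN 8
/* Kernel-style alignment check for power-of-two alignments. */
#define IS_ALIGNED(x, a) (((x) & ((a) - 1)) == 0)

/* Returns 0 when the buffer layout satisfies the GEM LSO rule,
 * -1 otherwise. Only the last buffer is exempt from the 8-byte
 * alignment requirement, matching the loop that stops at
 * nr_frags - 1 in the patch. */
static int lso_check_frag_sizes(const unsigned int *sizes, int n)
{
    int i;

    for (i = 0; i < n - 1; i++)
        if (!IS_ALIGNED(sizes[i], MACB_TX_LEN_ALIGN))
            return -1;
    return 0;
}
```

Eric's point is where this predicate should live: expressed as an `.ndo_features_check` hook, an skb that fails it gets segmented in software by the core stack, instead of the driver attempting skb_linearize() on a ~64 KB skb inside ndo_start_xmit().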
[PATCH] staging: rtl8192x: fix bogus maybe-uninitialized warning
The rtllib_rx_extract_addr() is supposed to set up the mac addresses for four possible cases, based on two bits of input data. For some reason, gcc decides that it's possible that none of these four cases apply and the addresses remain uninitialized: drivers/staging/rtl8192e/rtllib_rx.c: In function ‘rtllib_rx_InfraAdhoc’: include/linux/etherdevice.h:316:61: error: ‘*((void *)+4)’ may be used uninitialized in this function [-Werror=maybe-uninitialized] drivers/staging/rtl8192e/rtllib_rx.c:1318:5: note: ‘*((void *)+4)’ was declared here In file included from /git/arm-soc/drivers/staging/rtl8192e/rtllib_rx.c:40:0: include/linux/etherdevice.h:316:36: error: ‘dst’ may be used uninitialized in this function [-Werror=maybe-uninitialized] drivers/staging/rtl8192e/rtllib_rx.c:1318:5: note: ‘dst’ was declared here This warning is clearly nonsense, but changing the last case into 'default' makes it obvious to the compiler too, which avoids the warning and probably leads to better object code too. As the same warning appears in other files that have the exact same code, I'm fixing it in both rtl8192e and rtl8192u, even though I did not observe it for the latter.
Signed-off-by: Arnd Bergmann--- drivers/staging/rtl8192e/rtllib_rx.c | 2 +- drivers/staging/rtl8192u/ieee80211/ieee80211_crypt_tkip.c | 2 +- drivers/staging/rtl8192u/ieee80211/ieee80211_rx.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/staging/rtl8192e/rtllib_rx.c b/drivers/staging/rtl8192e/rtllib_rx.c index c743182b933e..d6777ecda64d 100644 --- a/drivers/staging/rtl8192e/rtllib_rx.c +++ b/drivers/staging/rtl8192e/rtllib_rx.c @@ -986,7 +986,7 @@ static void rtllib_rx_extract_addr(struct rtllib_device *ieee, ether_addr_copy(src, hdr->addr4); ether_addr_copy(bssid, ieee->current_network.bssid); break; - case 0: + default: ether_addr_copy(dst, hdr->addr1); ether_addr_copy(src, hdr->addr2); ether_addr_copy(bssid, hdr->addr3); diff --git a/drivers/staging/rtl8192u/ieee80211/ieee80211_crypt_tkip.c b/drivers/staging/rtl8192u/ieee80211/ieee80211_crypt_tkip.c index 6fa96d57d316..e68850897adf 100644 --- a/drivers/staging/rtl8192u/ieee80211/ieee80211_crypt_tkip.c +++ b/drivers/staging/rtl8192u/ieee80211/ieee80211_crypt_tkip.c @@ -553,7 +553,7 @@ static void michael_mic_hdr(struct sk_buff *skb, u8 *hdr) memcpy(hdr, hdr11->addr3, ETH_ALEN); /* DA */ memcpy(hdr + ETH_ALEN, hdr11->addr4, ETH_ALEN); /* SA */ break; - case 0: + default: memcpy(hdr, hdr11->addr1, ETH_ALEN); /* DA */ memcpy(hdr + ETH_ALEN, hdr11->addr2, ETH_ALEN); /* SA */ break; diff --git a/drivers/staging/rtl8192u/ieee80211/ieee80211_rx.c b/drivers/staging/rtl8192u/ieee80211/ieee80211_rx.c index 89cbc077a48d..2e4d2d7bc2e7 100644 --- a/drivers/staging/rtl8192u/ieee80211/ieee80211_rx.c +++ b/drivers/staging/rtl8192u/ieee80211/ieee80211_rx.c @@ -1079,7 +1079,7 @@ int ieee80211_rx(struct ieee80211_device *ieee, struct sk_buff *skb, memcpy(src, hdr->addr4, ETH_ALEN); memcpy(bssid, ieee->current_network.bssid, ETH_ALEN); break; - case 0: + default: memcpy(dst, hdr->addr1, ETH_ALEN); memcpy(src, hdr->addr2, ETH_ALEN); memcpy(bssid, hdr->addr3, ETH_ALEN); -- 2.9.0
net/sctp: slab-out-of-bounds in sctp_sf_ootb
Hi, I've got the following error report while running the syzkaller fuzzer: == BUG: KASAN: slab-out-of-bounds in sctp_sf_ootb+0x634/0x6c0 at addr 88006bc1f210 Read of size 2 by task syz-executor/13493 CPU: 3 PID: 13493 Comm: syz-executor Not tainted 4.9.0-rc2+ #300 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 88003e256e40 81b47264 88003e80ccc0 88006bc1eed8 88006bc1f210 88006bc1eed0 88003e256e68 8150b34c 88003e256ef8 88003e80ccc0 8800ebc1f210 88003e256ee8 Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [] dump_stack+0xb3/0x10f lib/dump_stack.c:51 [] kasan_object_err+0x1c/0x70 mm/kasan/report.c:156 [< inline >] print_address_description mm/kasan/report.c:194 [] kasan_report_error+0x1f7/0x4d0 mm/kasan/report.c:283 [< inline >] kasan_report mm/kasan/report.c:303 [] __asan_report_load_n_noabort+0x3a/0x40 mm/kasan/report.c:334 [] sctp_sf_ootb+0x634/0x6c0 net/sctp/sm_statefuns.c:3448 [] sctp_do_sm+0x104/0x4ed0 net/sctp/sm_sideeffect.c:1108 [] sctp_endpoint_bh_rcv+0x32d/0x9c0 net/sctp/endpointola.c:452 [] sctp_inq_push+0x134/0x1a0 net/sctp/inqueue.c:95 [] sctp_rcv+0x1fa8/0x2d00 net/sctp/input.c:268 [] ip_local_deliver_finish+0x332/0xad0 net/ipv4/ip_input.c:216 [< inline >] NF_HOOK_THRESH include/linux/netfilter.h:232 [< inline >] NF_HOOK include/linux/netfilter.h:255 [] ip_local_deliver+0x1c2/0x4b0 net/ipv4/ip_input.c:257 [< inline >] dst_input include/net/dst.h:507 [] ip_rcv_finish+0x750/0x1c40 net/ipv4/ip_input.c:396 [< inline >] NF_HOOK_THRESH include/linux/netfilter.h:232 [< inline >] NF_HOOK include/linux/netfilter.h:255 [] ip_rcv+0x96f/0x12f0 net/ipv4/ip_input.c:487 [] __netif_receive_skb_core+0x1897/0x2a50 net/core/dev.c:4212 [] __netif_receive_skb+0x2a/0x170 net/core/dev.c:4250 [] netif_receive_skb_internal+0x1b3/0x390 net/core/dev.c:4278 [] netif_receive_skb+0x48/0x250 net/core/dev.c:4302 [] tun_get_user+0xbd5/0x28a0 drivers/net/tun.c:1308 [] tun_chr_write_iter+0xda/0x190 drivers/net/tun.c:1332 [< inline >] new_sync_write 
fs/read_write.c:499 [] __vfs_write+0x334/0x570 fs/read_write.c:512 [] vfs_write+0x17b/0x500 fs/read_write.c:560 [< inline >] SYSC_write fs/read_write.c:607 [] SyS_write+0xd4/0x1a0 fs/read_write.c:599 [] entry_SYSCALL_64_fastpath+0x1f/0xc2 Object at 88006bc1eed8, in cache kmalloc-512 size: 512 Allocated: PID = 9755 [ 182.804017] [] save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57 [ 182.804017] [] save_stack+0x46/0xd0 mm/kasan/kasan.c:495 [ 182.804017] [< inline >] set_track mm/kasan/kasan.c:507 [ 182.804017] [] kasan_kmalloc+0xab/0xe0 mm/kasan/kasan.c:598 [ 182.804017] [] kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:537 [ 182.804017] [< inline >] slab_post_alloc_hook mm/slab.h:417 [ 182.804017] [< inline >] slab_alloc_node mm/slub.c:2708 [ 182.804017] [] __kmalloc_node_track_caller+0xcb/0x390 mm/slub.c:4270 [ 182.804017] [] __kmalloc_reserve.isra.35+0x41/0xe0 net/core/skbuff.c:138 [ 182.804017] [] __alloc_skb+0xf0/0x600 net/core/skbuff.c:231 [ 182.804017] [< inline >] alloc_skb include/linux/skbuff.h:921 [ 182.804017] [] sock_wmalloc+0xa3/0xf0 net/core/sock.c:1757 [ 182.804017] [] __ip_append_data.isra.46+0x1e38/0x28c0 net/ipv4/ip_output.c:1010 [ 182.804017] [] ip_append_data.part.47+0xf1/0x170 net/ipv4/ip_output.c:1201 [ 182.804017] [< inline >] ip_append_data net/ipv4/ip_output.c:1605 [ 182.804017] [] ip_send_unicast_reply+0x831/0xe20 net/ipv4/ip_output.c:1605 [ 182.804017] [] tcp_v4_send_reset+0xb0a/0x1700 net/ipv4/tcp_ipv4.c:696 [ 182.804017] [] tcp_v4_rcv+0x198b/0x2e50 net/ipv4/tcp_ipv4.c:1719 [ 182.804017] [] ip_local_deliver_finish+0x332/0xad0 net/ipv4/ip_input.c:216 [ 182.804017] [< inline >] NF_HOOK_THRESH include/linux/netfilter.h:232 [ 182.804017] [< inline >] NF_HOOK include/linux/netfilter.h:255 [ 182.804017] [] ip_local_deliver+0x1c2/0x4b0 net/ipv4/ip_input.c:257 [ 182.804017] [< inline >] dst_input include/net/dst.h:507 [ 182.804017] [] ip_rcv_finish+0x750/0x1c40 net/ipv4/ip_input.c:396 [ 182.804017] [< inline >] NF_HOOK_THRESH 
include/linux/netfilter.h:232 [ 182.804017] [< inline >] NF_HOOK include/linux/netfilter.h:255 [ 182.804017] [] ip_rcv+0x96f/0x12f0 net/ipv4/ip_input.c:487 [ 182.804017] [] __netif_receive_skb_core+0x1897/0x2a50 net/core/dev.c:4212 [ 182.804017] [] __netif_receive_skb+0x2a/0x170 net/core/dev.c:4250 [ 182.804017] [] netif_receive_skb_internal+0x1b3/0x390 net/core/dev.c:4278 [ 182.804017] [] netif_receive_skb+0x48/0x250 net/core/dev.c:4302 [ 182.804017] [] tun_get_user+0xbd5/0x28a0 drivers/net/tun.c:1308 [ 182.804017] [] tun_chr_write_iter+0xda/0x190 drivers/net/tun.c:1332 [ 182.804017] [< inline >]
RE: [PATCH] net: fec: hard phy reset on open
From: manfred.schla...@gmx.at Sent: Monday, October 24, 2016 5:26 PM > To: Andy Duan > Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org > Subject: [PATCH] net: fec: hard phy reset on open > > We have seen some problems with auto negotiation on i.MX6 using LAN8720, > after interface down/up. > > In our configuration, the ptp clock is used externally as reference clock for > the phy. Some phys (e.g. LAN8720) need a stable clock while and after hard > reset. > Before this patch, the driver disabled the clock on close but did no hard > reset > on open, after enabling the clocks again. > > A solution that prevents disabling the clocks on close was considered, but > discarded because of bad power saving behavior. > > This patch saves the reset dt properties on probe and does a reset on every > open after the clocks were enabled, to make sure the clock is stable while and > after hard reset. > > Tested on i.MX6 and i.MX28, both using LAN8720. > > Signed-off-by: Manfred Schlaegl > --- This patch does a hard reset to get the phy stable. Firstly, you should do the reset before enabling the clock. Secondly, we suggest doing the phy reset in the phy driver, not in the MAC driver. 
Regards, Andy > drivers/net/ethernet/freescale/fec.h | 4 ++ > drivers/net/ethernet/freescale/fec_main.c | 77 +--- > --- > 2 files changed, 47 insertions(+), 34 deletions(-) > > diff --git a/drivers/net/ethernet/freescale/fec.h > b/drivers/net/ethernet/freescale/fec.h > index c865135..379e619 100644 > --- a/drivers/net/ethernet/freescale/fec.h > +++ b/drivers/net/ethernet/freescale/fec.h > @@ -498,6 +498,10 @@ struct fec_enet_private { > struct clk *clk_enet_out; > struct clk *clk_ptp; > > + int phy_reset; > + bool phy_reset_active_high; > + int phy_reset_msec; > + > bool ptp_clk_on; > struct mutex ptp_clk_mutex; > unsigned int num_tx_queues; > diff --git a/drivers/net/ethernet/freescale/fec_main.c > b/drivers/net/ethernet/freescale/fec_main.c > index 48a033e..8cc1ec5 100644 > --- a/drivers/net/ethernet/freescale/fec_main.c > +++ b/drivers/net/ethernet/freescale/fec_main.c > @@ -2802,6 +2802,22 @@ static int fec_enet_alloc_buffers(struct > net_device *ndev) > return 0; > } > > +static void fec_reset_phy(struct fec_enet_private *fep) { > + if (!gpio_is_valid(fep->phy_reset)) > + return; > + > + gpio_set_value_cansleep(fep->phy_reset, !!fep->phy_reset_active_high); > + > + if (fep->phy_reset_msec > 20) > + msleep(fep->phy_reset_msec); > + else > + usleep_range(fep->phy_reset_msec * 1000, > + fep->phy_reset_msec * 1000 + 1000); > + > + gpio_set_value_cansleep(fep->phy_reset, !fep->phy_reset_active_high); > +} > + > static int > fec_enet_open(struct net_device *ndev) > { > @@ -2817,6 +2833,8 @@ fec_enet_open(struct net_device *ndev) > if (ret) > goto clk_enable; > > + fec_reset_phy(fep); > + > /* I should reset the ring buffers here, but I don't yet know >* a simple way to do that. 
>*/ > @@ -3183,52 +3201,39 @@ static int fec_enet_init(struct net_device *ndev) > return 0; > } > > -#ifdef CONFIG_OF > -static void fec_reset_phy(struct platform_device *pdev) > +static int > +fec_get_reset_phy(struct platform_device *pdev, int *msec, int > *phy_reset, > + bool *active_high) > { > - int err, phy_reset; > - bool active_high = false; > - int msec = 1; > + int err; > struct device_node *np = pdev->dev.of_node; > > - if (!np) > - return; > + if (!np || !of_device_is_available(np)) > + return 0; > > - of_property_read_u32(np, "phy-reset-duration", &msec); > + of_property_read_u32(np, "phy-reset-duration", msec); > /* A sane reset duration should not be longer than 1s */ > - if (msec > 1000) > - msec = 1; > + if (*msec > 1000) > + *msec = 1; > > - phy_reset = of_get_named_gpio(np, "phy-reset-gpios", 0); > - if (!gpio_is_valid(phy_reset)) > - return; > + *phy_reset = of_get_named_gpio(np, "phy-reset-gpios", 0); > + if (!gpio_is_valid(*phy_reset)) > + return 0; > > - active_high = of_property_read_bool(np, "phy-reset-active-high"); > + *active_high = of_property_read_bool(np, "phy-reset-active-high"); > > - err = devm_gpio_request_one(&pdev->dev, phy_reset, > - active_high ? GPIOF_OUT_INIT_HIGH : > GPIOF_OUT_INIT_LOW, > - "phy-reset"); > + err = devm_gpio_request_one(&pdev->dev, *phy_reset, > + *active_high ? > + GPIOF_OUT_INIT_HIGH : > + GPIOF_OUT_INIT_LOW, > + "phy-reset"); > if (err) { > dev_err(&pdev->dev, "failed to get
[PATCH] netfilter: ip_vs_sync: fix bogus maybe-uninitialized warning
Building the ip_vs_sync code with CONFIG_OPTIMIZE_INLINING on x86 confuses the compiler to the point where it produces a rather dubious warning message: net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.init_seq’ may be used uninitialized in this function [-Werror=maybe-uninitialized] struct ip_vs_sync_conn_options opt; ^~~ net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized] net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.previous_delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized] net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void *)+12).init_seq’ may be used uninitialized in this function [-Werror=maybe-uninitialized] net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void *)+12).delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized] net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void *)+12).previous_delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized] The problem appears to be a combination of a number of factors, including the __builtin_bswap32 compiler builtin being slightly odd, having a large amount of code inlined into a single function, and the way that some functions only get partially inlined here. I've spent way too much time trying to work out a way to improve the code, but the best I've come up with is to add an explicit memset right before the ip_vs_seq structure is first initialized here. When the compiler works correctly, this has absolutely no effect, but in the case that produces the warning, the warning disappears. In the process of analysing this warning, I also noticed that we use memcpy to copy the larger ip_vs_sync_conn_options structure over two members of the ip_vs_conn structure. This works because the layout is identical, but seems error-prone, so I'm changing this in the process to directly copy the two members. 
This change seemed to have no effect on the object code or the warning, but it deals with the same data, so I kept the two changes together. Signed-off-by: Arnd Bergmann --- net/netfilter/ipvs/ip_vs_sync.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c index 1b07578bedf3..9350530c16c1 100644 --- a/net/netfilter/ipvs/ip_vs_sync.c +++ b/net/netfilter/ipvs/ip_vs_sync.c @@ -283,6 +283,7 @@ struct ip_vs_sync_buff { */ static void ntoh_seq(struct ip_vs_seq *no, struct ip_vs_seq *ho) { + memset(ho, 0, sizeof(*ho)); ho->init_seq = get_unaligned_be32(&no->init_seq); ho->delta = get_unaligned_be32(&no->delta); ho->previous_delta = get_unaligned_be32(&no->previous_delta); @@ -917,8 +918,10 @@ static void ip_vs_proc_conn(struct netns_ipvs *ipvs, struct ip_vs_conn_param *pa kfree(param->pe_data); } - if (opt) - memcpy(&cp->in_seq, opt, sizeof(*opt)); + if (opt) { + cp->in_seq = opt->in_seq; + cp->out_seq = opt->out_seq; + } atomic_set(&cp->in_pkts, sysctl_sync_threshold(ipvs)); cp->state = state; cp->old_state = cp->state; -- 2.9.0
Re: question about function igmp_stop_timer() in net/ipv4/igmp.c
On Mon, Oct 24, 2016 at 07:50:12PM +0800, Dongpo Li wrote: > Hello > > We encountered a multicast problem when two set-top boxes (STBs) join the same > multicast group and leave. > The two boxes can join the same multicast group, > but only one box can send the IGMP leave group message when leaving; > the other box does not send the IGMP leave message. > Our boxes use IGMP version 2. > > I added some debug info and found the whole procedure is like this: > (1) Box A joins the multicast group 225.1.101.145 and sends the IGMP v2 > membership report (join group). > (2) Box B joins the same multicast group 225.1.101.145 and also sends the IGMP > v2 membership report (join group). > (3) Box A receives the IGMP membership report from Box B and the kernel calls > igmp_heard_report(). > This function will call igmp_stop_timer(im). > In function igmp_stop_timer(im), it tries to delete the IGMP timer and does > the following: > im->tm_running = 0; > im->reporter = 0; > (4) Box A leaves the multicast group 225.1.101.145 and the kernel calls > ip_mc_leave_group -> ip_mc_dec_group -> igmp_group_dropped. > But in function igmp_group_dropped(), im->reporter is 0, so the > kernel does not send the IGMP leave message. RFC 2236 says: 2. Introduction The Internet Group Management Protocol (IGMP) is used by IP hosts to report their multicast group memberships to any immediately-neighboring multicast routers. Are Box A or B multicast routers? Andrew
[PATCH] ip6_tunnel: Clear IP6CB(skb)->frag_max_size in ip4ip6_tnl_xmit()
skb->cb may contain data from previous layers, as shown in 5146d1f1511 ("tunnel: Clear IPCB(skb)->opt before dst_link_failure called"). However, for ipip6 tunnels, clearing IPCB(skb)->opt alone is not enough, because skb->cb is later misinterpreted as IP6CB(skb)->frag_max_size. In the observed scenario, the garbage data made the max fragment size so small that packets sent through the tunnel were mistakenly fragmented. This patch clears IP6CB(skb)->frag_max_size for ipip6 tunnels. Signed-off-by: Eli Cooper --- net/ipv6/ip6_tunnel.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index 202d16a..4110562 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -1205,6 +1205,7 @@ ip4ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev) int err; memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt)); + IP6CB(skb)->frag_max_size = 0; tproto = ACCESS_ONCE(t->parms.proto); if (tproto != IPPROTO_IPIP && tproto != 0) -- 2.10.1
[PATCH v2 RESEND] xen-netback: prefer xenbus_scanf() over xenbus_gather()
For single items being collected this should be preferred as being more typesafe (the compiler can check that the format string and the to-be-written-to variable match) and more efficient (requiring one less parameter to be passed). Signed-off-by: Jan Beulich --- v2: Avoid commit message to continue from subject. --- drivers/net/xen-netback/xenbus.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) --- 4.9-rc2/drivers/net/xen-netback/xenbus.c +++ 4.9-rc2-xen-netback-prefer-xenbus_scanf/drivers/net/xen-netback/xenbus.c @@ -889,16 +889,16 @@ static int connect_ctrl_ring(struct back unsigned int evtchn; int err; - err = xenbus_gather(XBT_NIL, dev->otherend, - "ctrl-ring-ref", "%u", &val, NULL); - if (err) + err = xenbus_scanf(XBT_NIL, dev->otherend, + "ctrl-ring-ref", "%u", &val); + if (err <= 0) goto done; /* The frontend does not have a control ring */ ring_ref = val; - err = xenbus_gather(XBT_NIL, dev->otherend, - "event-channel-ctrl", "%u", &val, NULL); - if (err) { + err = xenbus_scanf(XBT_NIL, dev->otherend, + "event-channel-ctrl", "%u", &val); + if (err <= 0) { xenbus_dev_fatal(dev, err, "reading %s/event-channel-ctrl", dev->otherend); @@ -919,7 +919,7 @@ done: return 0; fail: - return err; + return err ?: -ENODATA; } static void connect(struct backend_info *be)
Re: [PATCH v2 net] macsec: Fix header length if SCI is added if explicitly disabled
2016-10-24, 15:44:26 +0200, Tobias Brunner wrote: > Even if sending SCIs is explicitly disabled, the code that creates the > Security Tag might still decide to add it (e.g. if multiple RX SCs are > defined on the MACsec interface). > But because the header length so far only depended on the configuration > option, the SCI overwrote the original frame's contents (EtherType and > e.g. the beginning of the IP header) and if encrypted did not visibly > end up in the packet, while the SC flag in the TCI field of the Security > Tag was still set, resulting in invalid MACsec frames. > > Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") > Signed-off-by: Tobias Brunner Acked-by: Sabrina Dubroca -- Sabrina
Re: [PATCH] net: fec: hard phy reset on open
On 2016-10-24 16:03, Andy Duan wrote: > From: manfred.schla...@gmx.at Sent: Monday, > October 24, 2016 5:26 PM >> To: Andy Duan >> Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org >> Subject: [PATCH] net: fec: hard phy reset on open >> >> We have seen some problems with auto negotiation on i.MX6 using LAN8720, >> after interface down/up. >> >> In our configuration, the ptp clock is used externally as reference clock for >> the phy. Some phys (e.g. LAN8720) need a stable clock while and after hard >> reset. >> Before this patch, the driver disabled the clock on close but did no hard >> reset >> on open, after enabling the clocks again. >> >> A solution that prevents disabling the clocks on close was considered, but >> discarded because of bad power saving behavior. >> >> This patch saves the reset dt properties on probe and does a reset on every >> open after the clocks were enabled, to make sure the clock is stable while and >> after hard reset. >> >> Tested on i.MX6 and i.MX28, both using LAN8720. >> >> Signed-off-by: Manfred Schlaegl >> --- > This patch does a hard reset to get the phy stable. > Firstly, you should do the reset before enabling the clock. I have to disagree here. The phy demands (datasheet + tests) a stable clock at the time of (hard-)reset and after it. Therefore the clock has to be enabled before the hard reset. (This is exactly the reason for the patch.) Generally: the sense of a reset is to defer the start of a digital circuit until the environment (power, clocks, ...) has stabilized. Furthermore: before this patch the hard reset was done in fec_probe, and there also after the clocks were enabled. What was your argument for doing it the other way in this special case? > Secondly, we suggest doing the phy reset in the phy driver, not in the MAC driver. I was not sure if you meant a soft or hard reset here. In case you are talking about soft reset: Yes, the phy drivers perform a soft reset. 
Sadly a soft reset is not sufficient in this case - The phy recovers only on a hard reset from lost clock. (datasheet + tests) In case you're talking about hard reset: Intuitively I would also think, that the (hard-)reset should be handled by the phy driver, but this is not reality in given implementations. Documentation/devicetree/bindings/net/fsl-fec.txt says - phy-reset-gpios : Should specify the gpio for phy reset It is explicitly talked about phy-reset here. And the (hard-)reset was handled by the fec driver also before this patch (once on probe). > > Regards, > Andy Thanks for your feedback! Best regards, Manfred > >> drivers/net/ethernet/freescale/fec.h | 4 ++ >> drivers/net/ethernet/freescale/fec_main.c | 77 +--- >> --- >> 2 files changed, 47 insertions(+), 34 deletions(-) >> >> diff --git a/drivers/net/ethernet/freescale/fec.h >> b/drivers/net/ethernet/freescale/fec.h >> index c865135..379e619 100644 >> --- a/drivers/net/ethernet/freescale/fec.h >> +++ b/drivers/net/ethernet/freescale/fec.h >> @@ -498,6 +498,10 @@ struct fec_enet_private { >> struct clk *clk_enet_out; >> struct clk *clk_ptp; >> >> +int phy_reset; >> +bool phy_reset_active_high; >> +int phy_reset_msec; >> + >> bool ptp_clk_on; >> struct mutex ptp_clk_mutex; >> unsigned int num_tx_queues; >> diff --git a/drivers/net/ethernet/freescale/fec_main.c >> b/drivers/net/ethernet/freescale/fec_main.c >> index 48a033e..8cc1ec5 100644 >> --- a/drivers/net/ethernet/freescale/fec_main.c >> +++ b/drivers/net/ethernet/freescale/fec_main.c >> @@ -2802,6 +2802,22 @@ static int fec_enet_alloc_buffers(struct >> net_device *ndev) >> return 0; >> } >> >> +static void fec_reset_phy(struct fec_enet_private *fep) { >> +if (!gpio_is_valid(fep->phy_reset)) >> +return; >> + >> +gpio_set_value_cansleep(fep->phy_reset, !!fep- >>> phy_reset_active_high); >> + >> +if (fep->phy_reset_msec > 20) >> +msleep(fep->phy_reset_msec); >> +else >> +usleep_range(fep->phy_reset_msec * 1000, >> + fep->phy_reset_msec * 1000 + 1000); >> 
+ >> +gpio_set_value_cansleep(fep->phy_reset, !fep- >>> phy_reset_active_high); >> +} >> + >> static int >> fec_enet_open(struct net_device *ndev) >> { >> @@ -2817,6 +2833,8 @@ fec_enet_open(struct net_device *ndev) >> if (ret) >> goto clk_enable; >> >> +fec_reset_phy(fep); >> + >> /* I should reset the ring buffers here, but I don't yet know >> * a simple way to do that. >> */ >> @@ -3183,52 +3201,39 @@ static int fec_enet_init(struct net_device *ndev) >> return 0; >> } >> >> -#ifdef CONFIG_OF >> -static void fec_reset_phy(struct platform_device *pdev) >> +static int >> +fec_get_reset_phy(struct platform_device *pdev, int *msec, int >> *phy_reset, >> + bool *active_high) >> { >> -int err, phy_reset; >> -bool active_high = false; >> -
Re: UDP does not autobind on recv
On 10/24/2016, 03:03 PM, Eric Dumazet wrote: > On Mon, 2016-10-24 at 14:54 +0200, Jiri Slaby wrote: >> Hello, >> >> as per man 7 udp: >> In order to receive packets, the socket can be bound to >> a local address first by using bind(2). Otherwise, >> the socket layer will automatically assign a free local >> port out of the range defined by /proc/sys/net/ipv4 >> /ip_local_port_range and bind the socket to INADDR_ANY. >> >> I did not know that bind is unneeded, so I tried that. But it does not >> work with this piece of code: >> int main() >> { >> char buf[128]; >> int fd = socket(AF_INET, SOCK_DGRAM, 0); >> recv(fd, buf, sizeof(buf), 0); >> } > > autobind makes little sense at recv() time really. > > How could an application expect to receive a frame on 'some socket' > without even knowing its port ? For example struct sockaddr_storage sa; socklen_t slen = sizeof(sa); recv(fd, buf, sizeof(buf), MSG_DONTWAIT); getsockname(fd, (struct sockaddr *)&sa, &slen); recv(fd, buf, sizeof(buf), 0); works. > How useful would that be exactly ? No need for finding a free port and checking, for example. > How does TCP behave ? TCP is a completely different story. bind is documented to be required there. (And listen and accept.) > I would say, fix the documentation if it is not correct. I don't have a problem with either. I have only found that the implementation differs from the documentation :). Is there some supervising documentation (like POSIX) to which we should conform? thanks, -- js suse labs
Fwd: net/ipx: null-ptr-deref in ipxrtr_route_packet
+a...@redhat.com Hi, I've got the following error report while running the syzkaller fuzzer: kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: [#1] SMP KASAN Modules linked in: CPU: 0 PID: 3953 Comm: syz-executor Not tainted 4.9.0-rc1+ #228 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: 88006aa2ac00 task.stack: 880068a9 RIP: 0010:[] [] ipxrtr_route_packet+0x4e4/0xbe0 net/ipx/ipx_route.c :213 RSP: 0018:880068a97b08 EFLAGS: 00010246 RAX: 88006b648500 RBX: 880068a97e40 RCX: dc00 RDX: 0003 RSI: RDI: 88006b648960 RBP: 880068a97bc8 R08: dc00 R09: 11000d4ddf97 R10: dc00 R11: R12: 88006b410300 R13: R14: 88006444b68e R15: 88006a6efc80 FS: 7f28cf665700() GS:88006cc0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 00451f80 CR3: 68a9a000 CR4: 06f0 Stack: 88006a6efd58 880068a97dc0 880068a97e44 11000d152f68 001a 88006b648500 41b58ab3 847fb90b 834ed410 82b7cfea 8800ff97 Call Trace: [] ipx_sendmsg+0x30e/0x550 net/ipx/af_ipx.c:1749 [< inline >] sock_sendmsg_nosec net/socket.c:606 [] sock_sendmsg+0xcc/0x110 net/socket.c:616 [] SYSC_sendto+0x211/0x340 net/socket.c:1641 [] SyS_sendto+0x40/0x50 net/socket.c:1609 [] entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:209 Code: 41 80 7c 0d 00 00 0f 85 82 06 00 00 48 8b 85 70 ff ff ff 49 b8 00 00 00 00 00 fc ff df 4c 8b a8 60 04 00 00 4c 89 ee 48 c1 ee 03 <46> 0f b6 0c 06 45 84 c9 74 0a 41 80 f9 03 0f 8e e5 05 00 00 49 RIP [] ipxrtr_route_packet+0x4e4/0xbe0 net/ipx/ipx_route.c:213 RSP ---[ end trace f5bc9a28de6b2776 ]--- == For some reason ipxs->intrfc ends up being NULL. The reproducer is attached, you need to run a few instances simultaneously. 
In case it's relevant, this is what I have in /etc/network/interfaces: auto eth1 iface eth1 inet static address 192.168.1.5 netmask 255.255.255.0 post-up arp -s 192.168.1.6 aa:aa:aa:aa:aa:aa iface eth1 ipx static frame EtherII netnum 0x42424242 On commit 1a1891d762d6e64daf07b5be4817e3fbb29e3c59 (Oct 18). // autogenerated by syzkaller (http://github.com/google/syzkaller) #ifndef __NR_sendto #define __NR_sendto 44 #endif #ifndef __NR_syz_fuse_mount #define __NR_syz_fuse_mount 104 #endif #ifndef __NR_syz_open_dev #define __NR_syz_open_dev 102 #endif #ifndef __NR_syz_test #define __NR_syz_test 101 #endif #ifndef __NR_mmap #define __NR_mmap 9 #endif #ifndef __NR_socket #define __NR_socket 41 #endif #ifndef __NR_bind #define __NR_bind 49 #endif #ifndef __NR_syz_fuseblk_mount #define __NR_syz_fuseblk_mount 105 #endif #ifndef __NR_syz_open_pts #define __NR_syz_open_pts 103 #endif #include #include #include #include #include #include #include #include #include #include #include #include #include #include __thread int skip_segv; __thread jmp_buf segv_env; static void segv_handler(int sig, siginfo_t* info, void* uctx) { if (__atomic_load_n(_segv, __ATOMIC_RELAXED)) _longjmp(segv_env, 1); exit(sig); } static void install_segv_handler() { struct sigaction sa; memset(, 0, sizeof(sa)); sa.sa_sigaction = segv_handler; sa.sa_flags = SA_NODEFER | SA_SIGINFO; sigaction(SIGSEGV, , NULL); sigaction(SIGBUS, , NULL); } #define NONFAILING(...)\ {\ __atomic_fetch_add(_segv, 1, __ATOMIC_SEQ_CST); \ if (_setjmp(segv_env) == 0) { \ __VA_ARGS__; \ } \ __atomic_fetch_sub(_segv, 1, __ATOMIC_SEQ_CST); \ } static uintptr_t syz_open_dev(uintptr_t a0, uintptr_t a1, uintptr_t a2) { if (a0 == 0xc || a0 == 0xb) { char buf[128]; sprintf(buf, "/dev/%s/%d:%d", a0 == 0xc ? 
"char" : "block", (uint8_t)a1, (uint8_t)a2); return open(buf, O_RDWR, 0); } else { char buf[1024]; char* hash; strncpy(buf, (char*)a0, sizeof(buf)); buf[sizeof(buf) - 1] = 0; while ((hash = strchr(buf, '#'))) { *hash = '0' + (char)(a1 % 10); a1 /= 10; } return open(buf, a2, 0); } } static uintptr_t syz_open_pts(uintptr_t a0, uintptr_t a1) { int ptyno = 0; if (ioctl(a0, TIOCGPTN, )) return -1; char buf[128]; sprintf(buf, "/dev/pts/%d", ptyno); return open(buf, a1, 0); } static uintptr_t syz_fuse_mount(uintptr_t a0, uintptr_t a1, uintptr_t a2, uintptr_t a3,
Re: [PATCH net] sctp: fix the panic caused by route update
On Mon, Oct 24, 2016 at 01:01:09AM +0800, Xin Long wrote: > Commit 7303a1475008 ("sctp: identify chunks that need to be fragmented > at IP level") made the chunk be fragmented at IP level in the next round > if its size exceeds the PMTU. > > But there still is another case: the PMTU can be updated if the transport's dst > expires and the transport's pmtu_pending is set in sctp_packet_transmit. If > the new PMTU is less than the chunk, the same issue as with that commit can > be triggered. > > So we should drop this packet and let it be retransmitted in another round, > where it would be fragmented at IP level. > > This patch fixes it by checking the chunk size after the PMTU may have been > updated and dropping the packet if its size exceeds the PMTU. > > Fixes: 90017accff61 ("sctp: Add GSO support") > Signed-off-by: Xin Long > --- > net/sctp/output.c | 8 +++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/net/sctp/output.c b/net/sctp/output.c > index 2a5c189..6cb0df8 100644 > --- a/net/sctp/output.c > +++ b/net/sctp/output.c > @@ -418,6 +418,7 @@ int sctp_packet_transmit(struct sctp_packet *packet, > gfp_t gfp) > __u8 has_data = 0; > int gso = 0; > int pktcount = 0; > + int auth_len = 0; > struct dst_entry *dst; > unsigned char *auth = NULL; /* pointer to auth in skb data */ > > @@ -510,7 +511,12 @@ int sctp_packet_transmit(struct sctp_packet *packet, > gfp_t gfp) > list_for_each_entry(chunk, &packet->chunk_list, list) { > int padded = SCTP_PAD4(chunk->skb->len); > > - if (pkt_size + padded > tp->pathmtu) > + if (chunk == packet->auth) > + auth_len = padded; > + else if (auth_len + padded + packet->overhead > > + tp->pathmtu) > + goto nomem; > + else if (pkt_size + padded > tp->pathmtu) > break; > pkt_size += padded; > } > -- > 2.1.0 > Acked-by: Neil Horman
[PATCH] LSO feature added to Cadence GEM driver
New Cadence GEM hardware supports Large Segment Offload (LSO): TCP segmentation offload (TSO) as well as UDP fragmentation offload (UFO). Support for those features was added to the driver. Signed-off-by: Rafal Ozieblo --- drivers/net/ethernet/cadence/macb.c | 141 +--- drivers/net/ethernet/cadence/macb.h | 14 2 files changed, 143 insertions(+), 12 deletions(-) diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c index b32444a..f659d57 100644 --- a/drivers/net/ethernet/cadence/macb.c +++ b/drivers/net/ethernet/cadence/macb.c @@ -32,7 +32,9 @@ #include #include #include - +#include +#include +#include #include "macb.h" #define MACB_RX_BUFFER_SIZE128 @@ -53,8 +55,10 @@ | MACB_BIT(TXERR)) #define MACB_TX_INT_FLAGS (MACB_TX_ERR_FLAGS | MACB_BIT(TCOMP)) -#define MACB_MAX_TX_LEN((unsigned int)((1 << MACB_TX_FRMLEN_SIZE) - 1)) -#define GEM_MAX_TX_LEN ((unsigned int)((1 << GEM_TX_FRMLEN_SIZE) - 1)) +/* Max length of transmit frame must be a multiple of 8 bytes */ +#define MACB_TX_LEN_ALIGN 8 +#define MACB_MAX_TX_LEN((unsigned int)((1 << MACB_TX_FRMLEN_SIZE) - 1) & ~((unsigned int)(MACB_TX_LEN_ALIGN - 1))) +#define GEM_MAX_TX_LEN ((unsigned int)((1 << GEM_TX_FRMLEN_SIZE) - 1) & ~((unsigned int)(MACB_TX_LEN_ALIGN - 1))) #define GEM_MTU_MIN_SIZE 68 @@ -1212,7 +1216,8 @@ static void macb_poll_controller(struct net_device *dev) static unsigned int macb_tx_map(struct macb *bp, struct macb_queue *queue, - struct sk_buff *skb) + struct sk_buff *skb, + unsigned int hdrlen) { dma_addr_t mapping; unsigned int len, entry, i, tx_head = queue->tx_head; @@ -1220,14 +1225,27 @@ static unsigned int macb_tx_map(struct macb *bp, struct macb_dma_desc *desc; unsigned int offset, size, count = 0; unsigned int f, nr_frags = skb_shinfo(skb)->nr_frags; - unsigned int eof = 1; - u32 ctrl; + unsigned int eof = 1, mss_mfs = 0; + u32 ctrl, lso_ctrl = 0, seq_ctrl = 0; + + /* LSO */ + if (skb_shinfo(skb)->gso_size != 0) { + if (IPPROTO_UDP == (((struct iphdr 
*)skb_network_header(skb))->protocol) + /* UDP - UFO */ + lso_ctrl = MACB_LSO_UFO_ENABLE; + else + /* TCP - TSO */ + lso_ctrl = MACB_LSO_TSO_ENABLE; + } /* First, map non-paged data */ len = skb_headlen(skb); + + /* first buffer length */ + size = hdrlen; + offset = 0; while (len) { - size = min(len, bp->max_tx_length); entry = macb_tx_ring_wrap(tx_head); tx_skb = &queue->tx_skb[entry]; @@ -1247,6 +1265,8 @@ static unsigned int macb_tx_map(struct macb *bp, offset += size; count++; tx_head++; + + size = min(len, bp->max_tx_length); } /* Then, map paged data from fragments */ @@ -1300,6 +1320,20 @@ static unsigned int macb_tx_map(struct macb *bp, desc = &queue->tx_ring[entry]; desc->ctrl = ctrl; + if (lso_ctrl) { + if (lso_ctrl == MACB_LSO_UFO_ENABLE) + /* include header and FCS in value given to h/w */ + mss_mfs = skb_shinfo(skb)->gso_size + + skb_transport_offset(skb) + 4; + else /* TSO */ { + mss_mfs = skb_shinfo(skb)->gso_size; + /* TCP Sequence Number Source Select +* can be set only for TSO +*/ + seq_ctrl = 0; + } + } + do { i--; entry = macb_tx_ring_wrap(i); @@ -1314,6 +1348,16 @@ static unsigned int macb_tx_map(struct macb *bp, if (unlikely(entry == (TX_RING_SIZE - 1))) ctrl |= MACB_BIT(TX_WRAP); + /* First descriptor is header descriptor */ + if (i == queue->tx_head) { + ctrl |= MACB_BF(TX_LSO, lso_ctrl); + ctrl |= MACB_BF(TX_TCP_SEQ_SRC, seq_ctrl); + } else + /* Only set MSS/MFS on payload descriptors +* (second or later descriptor) +*/ + ctrl |= MACB_BF(MSS_MFS, mss_mfs); + /* Set TX buffer descriptor */ macb_set_addr(desc, tx_skb->mapping); /* desc->addr must be visible to hardware before clearing @@ -1339,6 +1383,37 @@ static unsigned int macb_tx_map(struct macb *bp, return 0; } +static int
[PATCH v2 net] macsec: Fix header length if SCI is added if explicitly disabled
Even if sending SCIs is explicitly disabled, the code that creates the Security Tag might still decide to add it (e.g. if multiple RX SCs are defined on the MACsec interface). But because the header length so far only depended on the configuration option, the SCI overwrote the original frame's contents (EtherType and e.g. the beginning of the IP header) and, if encrypted, did not visibly end up in the packet, while the SC flag in the TCI field of the Security Tag was still set, resulting in invalid MACsec frames. Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Signed-off-by: Tobias Brunner --- drivers/net/macsec.c | 26 ++ 1 file changed, 18 insertions(+), 8 deletions(-) diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c index 3ea47f28e143..d2e61e002926 100644 --- a/drivers/net/macsec.c +++ b/drivers/net/macsec.c @@ -397,6 +397,14 @@ static struct macsec_cb *macsec_skb_cb(struct sk_buff *skb) #define DEFAULT_ENCRYPT false #define DEFAULT_ENCODING_SA 0 +static bool send_sci(const struct macsec_secy *secy) +{ + const struct macsec_tx_sc *tx_sc = &secy->tx_sc; + + return tx_sc->send_sci || + (secy->n_rx_sc > 1 && !tx_sc->end_station && !tx_sc->scb); +} + static sci_t make_sci(u8 *addr, __be16 port) { sci_t sci; @@ -437,15 +445,15 @@ static unsigned int macsec_extra_len(bool sci_present) /* Fill SecTAG according to IEEE 802.1AE-2006 10.5.3 */ static void macsec_fill_sectag(struct macsec_eth_header *h, - const struct macsec_secy *secy, u32 pn) + const struct macsec_secy *secy, u32 pn, + bool sci_present) { const struct macsec_tx_sc *tx_sc = &secy->tx_sc; - memset(&h->tci_an, 0, macsec_sectag_len(tx_sc->send_sci)); + memset(&h->tci_an, 0, macsec_sectag_len(sci_present)); h->eth.h_proto = htons(ETH_P_MACSEC); - if (tx_sc->send_sci || - (secy->n_rx_sc > 1 && !tx_sc->end_station && !tx_sc->scb)) { + if (sci_present) { h->tci_an |= MACSEC_TCI_SC; memcpy(&h->secure_channel_id, &secy->sci, sizeof(h->secure_channel_id)); @@ -650,6 +658,7 @@ static struct sk_buff *macsec_encrypt(struct
sk_buff *skb, struct macsec_tx_sc *tx_sc; struct macsec_tx_sa *tx_sa; struct macsec_dev *macsec = macsec_priv(dev); + bool sci_present; u32 pn; secy = &macsec->secy; @@ -687,7 +696,8 @@ static struct sk_buff *macsec_encrypt(struct sk_buff *skb, unprotected_len = skb->len; eth = eth_hdr(skb); - hh = (struct macsec_eth_header *)skb_push(skb, macsec_extra_len(tx_sc->send_sci)); + sci_present = send_sci(secy); + hh = (struct macsec_eth_header *)skb_push(skb, macsec_extra_len(sci_present)); memmove(hh, eth, 2 * ETH_ALEN); pn = tx_sa_update_pn(tx_sa, secy); @@ -696,7 +706,7 @@ static struct sk_buff *macsec_encrypt(struct sk_buff *skb, kfree_skb(skb); return ERR_PTR(-ENOLINK); } - macsec_fill_sectag(hh, secy, pn); + macsec_fill_sectag(hh, secy, pn, sci_present); macsec_set_shortlen(hh, unprotected_len - 2 * ETH_ALEN); skb_put(skb, secy->icv_len); @@ -726,10 +736,10 @@ static struct sk_buff *macsec_encrypt(struct sk_buff *skb, skb_to_sgvec(skb, sg, 0, skb->len); if (tx_sc->encrypt) { - int len = skb->len - macsec_hdr_len(tx_sc->send_sci) - + int len = skb->len - macsec_hdr_len(sci_present) - secy->icv_len; aead_request_set_crypt(req, sg, sg, len, iv); - aead_request_set_ad(req, macsec_hdr_len(tx_sc->send_sci)); + aead_request_set_ad(req, macsec_hdr_len(sci_present)); } else { aead_request_set_crypt(req, sg, sg, 0, iv); aead_request_set_ad(req, skb->len - secy->icv_len); -- 1.9.1
Re: net/dccp: warning in dccp_set_state
Hi Eric, I can confirm that with your patch the warning goes away. Tested-by: Andrey KonovalovOn Mon, Oct 24, 2016 at 2:52 PM, Eric Dumazet wrote: > On Mon, 2016-10-24 at 05:47 -0700, Eric Dumazet wrote: >> On Mon, 2016-10-24 at 14:23 +0200, Andrey Konovalov wrote: >> > Hi, >> > >> > I've got the following error report while running the syzkaller fuzzer: >> > >> > WARNING: CPU: 1 PID: 21072 at net/dccp/proto.c:83 >> > dccp_set_state+0x229/0x290 >> > Kernel panic - not syncing: panic_on_warn set ... >> > >> > CPU: 1 PID: 21072 Comm: syz-executor Not tainted 4.9.0-rc1+ #293 >> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs >> > 01/01/2011 >> > 88003d4c7738 81b474f4 0003 dc00 >> > 844f8b00 88003d4c7804 88003d4c7800 8140c06a >> > 41b58ab3 8479ab7d 8140beae 8140cd00 >> > Call Trace: >> > [< inline >] __dump_stack lib/dump_stack.c:15 >> > [] dump_stack+0xb3/0x10f lib/dump_stack.c:51 >> > [] panic+0x1bc/0x39d kernel/panic.c:179 >> > [] __warn+0x1cc/0x1f0 kernel/panic.c:542 >> > [] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585 >> > [] dccp_set_state+0x229/0x290 net/dccp/proto.c:83 >> > [] dccp_close+0x612/0xc10 net/dccp/proto.c:1016 >> > [] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415 >> > [] sock_release+0x8e/0x1d0 net/socket.c:570 >> > [] sock_close+0x16/0x20 net/socket.c:1017 >> > [] __fput+0x29d/0x720 fs/file_table.c:208 >> > [] fput+0x15/0x20 fs/file_table.c:244 >> > [] task_work_run+0xf8/0x170 kernel/task_work.c:116 >> > [< inline >] exit_task_work include/linux/task_work.h:21 >> > [] do_exit+0x883/0x2ac0 kernel/exit.c:828 >> > [] do_group_exit+0x10e/0x340 kernel/exit.c:931 >> > [] get_signal+0x634/0x15a0 kernel/signal.c:2307 >> > [] do_signal+0x8d/0x1a30 arch/x86/kernel/signal.c:807 >> > [] exit_to_usermode_loop+0xe5/0x130 >> > arch/x86/entry/common.c:156 >> > [< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190 >> > [] syscall_return_slowpath+0x1a8/0x1e0 >> > arch/x86/entry/common.c:259 >> > [] 
entry_SYSCALL_64_fastpath+0xc0/0xc2 >> > Dumping ftrace buffer: >> >(ftrace buffer empty) >> > Kernel Offset: disabled >> > >> > On commit 1a1891d762d6e64daf07b5be4817e3fbb29e3c59 (Oct 18). >> >> Not sure we we keep around DCCP. David could we kill it ? >> >> TCP seems to have an additional check, missing in DCCP. >> >> diff --git a/net/dccp/proto.c b/net/dccp/proto.c >> index 41e65804ddf5..9fe25bf63296 100644 >> --- a/net/dccp/proto.c >> +++ b/net/dccp/proto.c >> @@ -1009,6 +1009,10 @@ void dccp_close(struct sock *sk, long timeout) >> __kfree_skb(skb); >> } >> >> + /* If socket has been already reset kill it. */ >> + if (sk->sk_state == DCCP_CLOSED) >> + goto adjudge_to_death; >> + >> if (data_was_unread) { >> /* Unread data was tossed, send an appropriate Reset Code */ >> DCCP_WARN("ABORT with %u bytes unread\n", data_was_unread); >> > > The equivalent tcp fix was : > https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=565b7b2d2e632b5792879c0c9cccdd9eecd31195 > >
Re: [PATCH net] macsec: Fix header length if SCI is added if explicitly disabled
2016-10-24, 15:32:40 +0200, Tobias Brunner wrote: > > [snip] > >> @@ -440,12 +448,12 @@ static void macsec_fill_sectag(struct > >> macsec_eth_header *h, > >> const struct macsec_secy *secy, u32 pn) > >> { > >> const struct macsec_tx_sc *tx_sc = &secy->tx_sc; > >> + bool sci_present = send_sci(secy); > > > > You're already computing this in macsec_encrypt() just before calling > > macsec_fill_sectag(), so you could pass it as an argument instead of > > recomputing it. > > Right, I'll send a v2. Would you like me to inline the send_sci() > function, as it will only be called once afterwards? I think keeping the send_sci() function is okay, but if you prefer to inline it, I don't mind. -- Sabrina
Re: [PATCH net] macsec: Fix header length if SCI is added if explicitly disabled
> [snip] >> @@ -440,12 +448,12 @@ static void macsec_fill_sectag(struct >> macsec_eth_header *h, >> const struct macsec_secy *secy, u32 pn) >> { >> const struct macsec_tx_sc *tx_sc = &secy->tx_sc; >> +bool sci_present = send_sci(secy); > > You're already computing this in macsec_encrypt() just before calling > macsec_fill_sectag(), so you could pass it as an argument instead of > recomputing it. Right, I'll send a v2. Would you like me to inline the send_sci() function, as it will only be called once afterwards? Regards, Tobias
Re: UDP does not autobind on recv
On Mon, 2016-10-24 at 14:54 +0200, Jiri Slaby wrote: > Hello, > > as per man 7 udp: > In order to receive packets, the socket can be bound to > a local address first by using bind(2). Otherwise, > the socket layer will automatically assign a free local > port out of the range defined by /proc/sys/net/ipv4 > /ip_local_port_range and bind the socket to INADDR_ANY. > > I did not know that bind is unneeded, so I tried that. But it does not > work with this piece of code: > int main() > { > char buf[128]; > int fd = socket(AF_INET, SOCK_DGRAM, 0); > recv(fd, buf, sizeof(buf), 0); > } autobind makes little sense at recv() time really. How could an application expect to receive a frame on 'some socket' without even knowing its port? How useful would that be exactly? How does TCP behave? I would say, fix the documentation if it is not correct.
Re: [PATCH 3/5] genetlink: statically initialize families
On Mon, 2016-10-24 at 14:40 +0200, Johannes Berg wrote: > From: Johannes Berg> > Instead of providing macros/inline functions to initialize > the families, make all users initialize them statically and > get rid of the macros. > > This reduces the kernel code size by about 1.6k on x86-64 > (with allyesconfig). Actually, with the new system where it's not const, I could even split this up and submit per subsystem, i.e. the fourth patch doesn't depend on it. I thought it would, since I wanted to make it const, but since I failed it doesn't actually have that dependency. johannes
UDP does not autobind on recv
Hello, as per man 7 udp: In order to receive packets, the socket can be bound to a local address first by using bind(2). Otherwise, the socket layer will automatically assign a free local port out of the range defined by /proc/sys/net/ipv4 /ip_local_port_range and bind the socket to INADDR_ANY. I did not know that bind is unneeded, so I tried that. But it does not work with this piece of code: int main() { char buf[128]; int fd = socket(AF_INET, SOCK_DGRAM, 0); recv(fd, buf, sizeof(buf), 0); } The recv above never returns (even if I bomb all ports from the range). ss -ulpan is silent too. As a workaround, I can stick a dummy write/send before recv: write(fd, "", 0); And it starts working. ss suddenly displays a port which the program listens on. I think the UDP recv path should do inet_autobind as I have done in the attached patch. But my knowledge is very limited in that area, so I have no idea whether that is correct at all. thanks, -- js suse labs >From 57c320998feb2e1e705a4ab6d3bbcb74c6ae65f0 Mon Sep 17 00:00:00 2001 From: Jiri SlabyDate: Sat, 22 Oct 2016 12:10:53 +0200 Subject: [PATCH] net: autobind UDP on recv Signed-off-by: Jiri Slaby --- include/net/inet_common.h | 1 + net/ipv4/af_inet.c| 3 ++- net/ipv4/udp.c| 5 + net/ipv6/udp.c| 5 + 4 files changed, 13 insertions(+), 1 deletion(-) diff --git a/include/net/inet_common.h b/include/net/inet_common.h index 5d683428fced..ba224ed3dd36 100644 --- a/include/net/inet_common.h +++ b/include/net/inet_common.h @@ -27,6 +27,7 @@ ssize_t inet_sendpage(struct socket *sock, struct page *page, int offset, int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, int flags); int inet_shutdown(struct socket *sock, int how); +int inet_autobind(struct sock *sk); int inet_listen(struct socket *sock, int backlog); void inet_sock_destruct(struct sock *sk); int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len); diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 9648c97e541f..d23acb11cdb0 
100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -171,7 +171,7 @@ EXPORT_SYMBOL(inet_sock_destruct); * Automatically bind an unbound socket. */ -static int inet_autobind(struct sock *sk) +int inet_autobind(struct sock *sk) { struct inet_sock *inet; /* We may need to bind the socket. */ @@ -187,6 +187,7 @@ static int inet_autobind(struct sock *sk) release_sock(sk); return 0; } +EXPORT_SYMBOL_GPL(inet_autobind); /* * Move a socket into listening state. diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 82fb78265f4b..ceb07c83af17 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1360,6 +1360,11 @@ int udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int noblock, if (flags & MSG_ERRQUEUE) return ip_recv_error(sk, msg, len, addr_len); + /* We may need to bind the socket. */ + if (!inet_sk(sk)->inet_num && !sk->sk_prot->no_autobind && + inet_autobind(sk)) + return -EAGAIN; + try_again: peeking = off = sk_peek_offset(sk, flags); skb = __skb_recv_datagram(sk, flags | (noblock ? MSG_DONTWAIT : 0), diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 71963b23d5a5..1c3dafc3d91e 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -341,6 +341,11 @@ int udpv6_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, if (np->rxpmtu && np->rxopt.bits.rxpmtu) return ipv6_recv_rxpmtu(sk, msg, len, addr_len); + /* We may need to bind the socket. */ + if (!inet_sk(sk)->inet_num && !sk->sk_prot->no_autobind && + inet_autobind(sk)) + return -EAGAIN; + try_again: peeking = off = sk_peek_offset(sk, flags); skb = __skb_recv_datagram(sk, flags | (noblock ? MSG_DONTWAIT : 0), -- 2.10.1
Re: [PATCH net-next 0/9] alx: add multi queue support
I tested this patchset with my AR8161 ethernet card in different situations: - After two weeks of daily use I observed no regression with this patchset. - I manually tested the new error paths in the __alx_open function and in the other newly added device bringup functions. - iperf udp and tcp throughput are exactly the same with and without this patchset, regardless of the number of parallel streams. - netperf TCP_RR and UDP_RR tests show a slight performance increase of about 1-2% with this patchset. I don't own any of the other cards supported by the driver, so if someone is willing to test these patches on one of the other cards, this is highly appreciated. Benefits are the split between misc interrupts and the tx / rx interrupts with the new msi-x support and better multi core cpu utilization. Sorry for not providing this information in the patchset, I will add it in the next revision. -- Tobias On 21.10.16, Chris Snook wrote: > Can you please elaborate on the testing and benefits? > > - Chris > > On Fri, Oct 21, 2016 at 3:50 AM Tobias Regnery> wrote: > > > This patchset lays the groundwork for multi queue support in the alx driver > > and enables multi queue support for the tx path by default. The hardware > > supports up to 4 tx queues. > > > > The rx path is a little bit harder because apparently (based on the limited > > information from the downstream driver) the hardware supports up to 8 rss > > queues but only has one hardware descriptor ring on the rx side. So the rx > > path will be part of another patchset. > > > > This work is based on the downstream driver at github.com/qca/alx > > > > I had a hard time splitting these changes up into reasonable parts because > > this is my first bigger kernel patchset, so please be patient if this is > > not > > the right approach. 
> > > > Tobias Regnery (9): > > alx: refactor descriptor allocation > > alx: extend data structures for multi queue support > > alx: add ability to allocate and free alx_napi structures > > alx: switch to per queue data structures > > alx: prepare interrupt functions for multiple queues > > alx: prepare resource allocation for multi queue support > > alx: prepare tx path for multi queue support > > alx: enable msi-x interrupts by default > > alx: enable multiple tx queues > > > > drivers/net/ethernet/atheros/alx/alx.h | 36 ++- > > drivers/net/ethernet/atheros/alx/main.c | 554 > > ++-- > > 2 files changed, 420 insertions(+), 170 deletions(-) > > > > -- > > 2.7.4 > > > >
Re: net/dccp: warning in dccp_set_state
On Mon, 2016-10-24 at 05:47 -0700, Eric Dumazet wrote: > On Mon, 2016-10-24 at 14:23 +0200, Andrey Konovalov wrote: > > Hi, > > > > I've got the following error report while running the syzkaller fuzzer: > > > > WARNING: CPU: 1 PID: 21072 at net/dccp/proto.c:83 dccp_set_state+0x229/0x290 > > Kernel panic - not syncing: panic_on_warn set ... > > > > CPU: 1 PID: 21072 Comm: syz-executor Not tainted 4.9.0-rc1+ #293 > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > > 88003d4c7738 81b474f4 0003 dc00 > > 844f8b00 88003d4c7804 88003d4c7800 8140c06a > > 41b58ab3 8479ab7d 8140beae 8140cd00 > > Call Trace: > > [< inline >] __dump_stack lib/dump_stack.c:15 > > [] dump_stack+0xb3/0x10f lib/dump_stack.c:51 > > [] panic+0x1bc/0x39d kernel/panic.c:179 > > [] __warn+0x1cc/0x1f0 kernel/panic.c:542 > > [] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585 > > [] dccp_set_state+0x229/0x290 net/dccp/proto.c:83 > > [] dccp_close+0x612/0xc10 net/dccp/proto.c:1016 > > [] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415 > > [] sock_release+0x8e/0x1d0 net/socket.c:570 > > [] sock_close+0x16/0x20 net/socket.c:1017 > > [] __fput+0x29d/0x720 fs/file_table.c:208 > > [] fput+0x15/0x20 fs/file_table.c:244 > > [] task_work_run+0xf8/0x170 kernel/task_work.c:116 > > [< inline >] exit_task_work include/linux/task_work.h:21 > > [] do_exit+0x883/0x2ac0 kernel/exit.c:828 > > [] do_group_exit+0x10e/0x340 kernel/exit.c:931 > > [] get_signal+0x634/0x15a0 kernel/signal.c:2307 > > [] do_signal+0x8d/0x1a30 arch/x86/kernel/signal.c:807 > > [] exit_to_usermode_loop+0xe5/0x130 > > arch/x86/entry/common.c:156 > > [< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190 > > [] syscall_return_slowpath+0x1a8/0x1e0 > > arch/x86/entry/common.c:259 > > [] entry_SYSCALL_64_fastpath+0xc0/0xc2 > > Dumping ftrace buffer: > >(ftrace buffer empty) > > Kernel Offset: disabled > > > > On commit 1a1891d762d6e64daf07b5be4817e3fbb29e3c59 (Oct 18). > > Not sure we we keep around DCCP. 
David could we kill it ? > > TCP seems to have an additional check, missing in DCCP. > > diff --git a/net/dccp/proto.c b/net/dccp/proto.c > index 41e65804ddf5..9fe25bf63296 100644 > --- a/net/dccp/proto.c > +++ b/net/dccp/proto.c > @@ -1009,6 +1009,10 @@ void dccp_close(struct sock *sk, long timeout) > __kfree_skb(skb); > } > > + /* If socket has been already reset kill it. */ > + if (sk->sk_state == DCCP_CLOSED) > + goto adjudge_to_death; > + > if (data_was_unread) { > /* Unread data was tossed, send an appropriate Reset Code */ > DCCP_WARN("ABORT with %u bytes unread\n", data_was_unread); > The equivalent tcp fix was : https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=565b7b2d2e632b5792879c0c9cccdd9eecd31195
[PATCH v4] skbedit: allow the user to specify bitmask for mark
The user may want to use only some bits of the skb mark in his skbedit rules because the remaining part might be used by something else. Introduce the "mask" parameter to the skbedit actor in order to implement such functionality. When the mask is specified, only the bits selected by it are really changed by the actor, while the rest is left untouched. Signed-off-by: Antonio Quartulli Signed-off-by: Jamal Hadi Salim --- This patch has been sleeping for a while although it was basically ready for being merged. I hope it can still be merged. Checkpatch is now complaining about this: CHECK: Comparison to NULL could be written "tb[TCA_SKBEDIT_MASK]" #112: FILE: net/sched/act_skbedit.c:114: + if (tb[TCA_SKBEDIT_MASK] != NULL) { However the surrounding code does not use this codestyle. Please, let me know if I should rearrange this line. Thanks! Changes from v3: - rebase on top of net-next - fix syntax error in if statement Changes from v2: - remove useless comments - use nla_put_u32() and fix typo Changes from v1: - use '&=' in tcf_skbedit() to clean the mark - extend tcf_skbedit_dump() in order to send the mask as well include/net/tc_act/tc_skbedit.h | 1 + include/uapi/linux/tc_act/tc_skbedit.h | 2 ++ net/sched/act_skbedit.c | 21 ++--- 3 files changed, 21 insertions(+), 3 deletions(-) diff --git a/include/net/tc_act/tc_skbedit.h b/include/net/tc_act/tc_skbedit.h index 5767e9dbcf92..19cd3d345804 100644 --- a/include/net/tc_act/tc_skbedit.h +++ b/include/net/tc_act/tc_skbedit.h @@ -27,6 +27,7 @@ struct tcf_skbedit { u32 flags; u32 priority; u32 mark; + u32 mask; u16 queue_mapping; u16 ptype; }; diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h index a4d00c608d8f..2884425738ce 100644 --- a/include/uapi/linux/tc_act/tc_skbedit.h +++ b/include/uapi/linux/tc_act/tc_skbedit.h @@ -28,6 +28,7 @@ #define SKBEDIT_F_QUEUE_MAPPING 0x2 #define SKBEDIT_F_MARK 0x4 #define SKBEDIT_F_PTYPE 0x8 +#define SKBEDIT_F_MASK 0x10 struct
tc_skbedit { tc_gen; @@ -42,6 +43,7 @@ enum { TCA_SKBEDIT_MARK, TCA_SKBEDIT_PAD, TCA_SKBEDIT_PTYPE, + TCA_SKBEDIT_MASK, __TCA_SKBEDIT_MAX }; #define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1) diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c index a133dcb82132..024f3a3afeff 100644 --- a/net/sched/act_skbedit.c +++ b/net/sched/act_skbedit.c @@ -46,8 +46,10 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a, if (d->flags & SKBEDIT_F_QUEUE_MAPPING && skb->dev->real_num_tx_queues > d->queue_mapping) skb_set_queue_mapping(skb, d->queue_mapping); - if (d->flags & SKBEDIT_F_MARK) - skb->mark = d->mark; + if (d->flags & SKBEDIT_F_MARK) { + skb->mark &= ~d->mask; + skb->mark |= d->mark & d->mask; + } if (d->flags & SKBEDIT_F_PTYPE) skb->pkt_type = d->ptype; @@ -61,6 +63,7 @@ static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = { [TCA_SKBEDIT_QUEUE_MAPPING] = { .len = sizeof(u16) }, [TCA_SKBEDIT_MARK] = { .len = sizeof(u32) }, [TCA_SKBEDIT_PTYPE] = { .len = sizeof(u16) }, + [TCA_SKBEDIT_MASK] = { .len = sizeof(u32) }, }; static int tcf_skbedit_init(struct net *net, struct nlattr *nla, @@ -71,7 +74,7 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla, struct nlattr *tb[TCA_SKBEDIT_MAX + 1]; struct tc_skbedit *parm; struct tcf_skbedit *d; - u32 flags = 0, *priority = NULL, *mark = NULL; + u32 flags = 0, *priority = NULL, *mark = NULL, *mask = NULL; u16 *queue_mapping = NULL, *ptype = NULL; bool exists = false; int ret = 0, err; @@ -108,6 +111,11 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla, mark = nla_data(tb[TCA_SKBEDIT_MARK]); } + if (tb[TCA_SKBEDIT_MASK] != NULL) { + flags |= SKBEDIT_F_MASK; + mask = nla_data(tb[TCA_SKBEDIT_MASK]); + } + parm = nla_data(tb[TCA_SKBEDIT_PARMS]); exists = tcf_hash_check(tn, parm->index, a, bind); @@ -145,6 +153,10 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla, d->mark = *mark; if (flags & SKBEDIT_F_PTYPE) d->ptype = *ptype; + /* default 
behaviour is to use all the bits */ + d->mask = 0xffffffff; + if (flags & SKBEDIT_F_MASK) + d->mask = *mask; d->tcf_action = parm->action; @@ -182,6 +194,9 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct
Re: net/dccp: warning in dccp_set_state
On Mon, 2016-10-24 at 14:23 +0200, Andrey Konovalov wrote: > Hi, > > I've got the following error report while running the syzkaller fuzzer: > > WARNING: CPU: 1 PID: 21072 at net/dccp/proto.c:83 dccp_set_state+0x229/0x290 > Kernel panic - not syncing: panic_on_warn set ... > > CPU: 1 PID: 21072 Comm: syz-executor Not tainted 4.9.0-rc1+ #293 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > 88003d4c7738 81b474f4 0003 dc00 > 844f8b00 88003d4c7804 88003d4c7800 8140c06a > 41b58ab3 8479ab7d 8140beae 8140cd00 > Call Trace: > [< inline >] __dump_stack lib/dump_stack.c:15 > [] dump_stack+0xb3/0x10f lib/dump_stack.c:51 > [] panic+0x1bc/0x39d kernel/panic.c:179 > [] __warn+0x1cc/0x1f0 kernel/panic.c:542 > [] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585 > [] dccp_set_state+0x229/0x290 net/dccp/proto.c:83 > [] dccp_close+0x612/0xc10 net/dccp/proto.c:1016 > [] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415 > [] sock_release+0x8e/0x1d0 net/socket.c:570 > [] sock_close+0x16/0x20 net/socket.c:1017 > [] __fput+0x29d/0x720 fs/file_table.c:208 > [] fput+0x15/0x20 fs/file_table.c:244 > [] task_work_run+0xf8/0x170 kernel/task_work.c:116 > [< inline >] exit_task_work include/linux/task_work.h:21 > [] do_exit+0x883/0x2ac0 kernel/exit.c:828 > [] do_group_exit+0x10e/0x340 kernel/exit.c:931 > [] get_signal+0x634/0x15a0 kernel/signal.c:2307 > [] do_signal+0x8d/0x1a30 arch/x86/kernel/signal.c:807 > [] exit_to_usermode_loop+0xe5/0x130 > arch/x86/entry/common.c:156 > [< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190 > [] syscall_return_slowpath+0x1a8/0x1e0 > arch/x86/entry/common.c:259 > [] entry_SYSCALL_64_fastpath+0xc0/0xc2 > Dumping ftrace buffer: >(ftrace buffer empty) > Kernel Offset: disabled > > On commit 1a1891d762d6e64daf07b5be4817e3fbb29e3c59 (Oct 18). Not sure why we keep around DCCP. David, could we kill it? TCP seems to have an additional check, missing in DCCP.
diff --git a/net/dccp/proto.c b/net/dccp/proto.c index 41e65804ddf5..9fe25bf63296 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -1009,6 +1009,10 @@ void dccp_close(struct sock *sk, long timeout) __kfree_skb(skb); } + /* If socket has been already reset kill it. */ + if (sk->sk_state == DCCP_CLOSED) + goto adjudge_to_death; + if (data_was_unread) { /* Unread data was tossed, send an appropriate Reset Code */ DCCP_WARN("ABORT with %u bytes unread\n", data_was_unread);
[PATCH net-next] ethernet: fix min/max MTU typos
Fixes: d894be57ca92 ("ethernet: use net core MTU range checking in more drivers") CC: Jarod Wilson CC: Thomas Falcon Signed-off-by: Stefan Richter --- drivers/net/ethernet/broadcom/sb1250-mac.c | 2 +- drivers/net/ethernet/ibm/ibmveth.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/broadcom/sb1250-mac.c b/drivers/net/ethernet/broadcom/sb1250-mac.c index cb312e4c89f4..435a2e4739d1 100644 --- a/drivers/net/ethernet/broadcom/sb1250-mac.c +++ b/drivers/net/ethernet/broadcom/sb1250-mac.c @@ -2219,7 +2219,7 @@ static int sbmac_init(struct platform_device *pldev, long long base) dev->netdev_ops = &sbmac_netdev_ops; dev->watchdog_timeo = TX_TIMEOUT; - dev->max_mtu = 0; + dev->min_mtu = 0; dev->max_mtu = ENET_PACKET_SIZE; netif_napi_add(dev, &sc->napi, sbmac_poll, 16); diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c index 29c05d0d79a9..4a81c892fc31 100644 --- a/drivers/net/ethernet/ibm/ibmveth.c +++ b/drivers/net/ethernet/ibm/ibmveth.c @@ -1549,7 +1549,7 @@ static int ibmveth_probe(struct vio_dev *dev, const struct vio_device_id *id) } netdev->min_mtu = IBMVETH_MIN_MTU; - netdev->min_mtu = ETH_MAX_MTU; + netdev->max_mtu = ETH_MAX_MTU; memcpy(netdev->dev_addr, mac_addr_p, ETH_ALEN); -- Stefan Richter -==- =-=- ==--- http://arcgraph.de/sr/