Re: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualizaiton (Was: "Re: net_sched strange in 4.11")
On 11/05/17 03:43, Jason Wang wrote: > > > On 2017年05月10日 17:42, Anton Ivanov wrote: >> On 10/05/17 09:56, Jason Wang wrote: >>> >>> >>> On 2017年05月10日 13:28, Anton Ivanov wrote: On 10/05/17 03:18, Jason Wang wrote: > > On 2017年05月09日 23:11, Stefan Hajnoczi wrote: >> On Tue, May 09, 2017 at 08:46:46AM +0100, Anton Ivanov wrote: >>> I have figured it out. Two issues. >>> >>> 1) skb->xmit_more is hardly ever set under virtualization because >>> the qdisc >>> is usually bypassed because of TCQ_F_CAN_BYPASS. Once >>> TCQ_F_CAN_BYPASS is >>> set a virtual NIC driver is not likely see skb->xmit_more (this >>> answers my >>> "how does this work at all" question). >>> >>> 2) If that flag is turned off (I patched sched_generic to turn it >>> off in >>> pfifo_fast while testing), DQL keeps xmit_more from being set. If >>> the driver >>> is not DQL enabled xmit_more is never ever set. If the driver is >>> DQL >>> enabled >>> the queue is adjusted to ensure xmit_more stops happening within >>> 10-15 xmit >>> cycles. >>> >>> That is plain *wrong* for virtual NICs - virtio, emulated NICs, >>> etc. >>> There, >>> the BIG cost is telling the hypervisor that it needs to "kick" the >>> packets. >>> The cost of putting them into the vNIC buffers is negligible. >>> You want >>> xmit_more to happen - it makes between 50% and 300% (depending >>> on vNIC >>> design) difference. If there is no xmit_more the vNIC will >>> immediately >>> "kick" the hypervisor and try to signal that the packet needs >>> to move >>> straight away (as for example in virtio_net). > How do you measure the performance? TCP or just measure pps? In this particular case - tcp from guest. I have a couple of other benchmarks (forwarding, etc). >>> >>> One more question, is the number for virtio-net or other emulated vNIC? >> >> Other for now - you are cc-ed to keep you in the loop. >> >> Virtio is next on my list - I am revisiting the l2tpv3.c driver in >> QEMU and looking at how to preserve bulking by adding back sendmmsg >> (as well as a list of other features/transports). >> >> We had sendmmsg removed for the final inclusion in QEMU 2.1, it >> presently uses only recvmmsg so for the time being it does not care. >> That will most likely change once it starts using sendmmsg as well. > > An issue is that qemu net API does not support bulking, do you plan to > add it? Yes :) A. > > Thanks > -- Anton R. Ivanov Cambridgegreys Limited. Registered in England. Company Number 10273661
[PATCH net] net: sched: optimize class dumps
From: Eric DumazetIn commit 59cc1f61f09c ("net: sched: convert qdisc linked list to hashtable") we missed the opportunity to considerably speed up tc_dump_tclass_root() if a qdisc handle is provided by user. Instead of iterating all the qdiscs, use qdisc_match_from_root() to directly get the one we look for. Signed-off-by: Eric Dumazet Cc: Jiri Kosina Cc: Jamal Hadi Salim Cc: Cong Wang Cc: Jiri Pirko --- net/sched/sch_api.c |6 ++ 1 file changed, 6 insertions(+) diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index bbe57d57b67fd498692bd41db49147511f1bb091..e88342fde1bc409aed6a3c86e7a628030eaac66f 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -1831,6 +1831,12 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb, if (!qdisc_dev(root)) return 0; + if (tcm->tcm_parent) { + q = qdisc_match_from_root(root, TC_H_MAJ(tcm->tcm_parent)); + if (q && tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0) + return -1; + return 0; + } hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) { if (tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0) return -1;
RE: [PATCH] net: fec: select queue depending on VLAN priority
From: Stefan AgnerSent: Thursday, May 11, 2017 12:08 PM >To: Andy Duan >Cc: David Miller ; and...@lunn.ch; >feste...@gmail.com; netdev@vger.kernel.org; linux- >ker...@vger.kernel.org >Subject: Re: [PATCH] net: fec: select queue depending on VLAN priority > >On 2017-05-09 19:42, Andy Duan wrote: >> From: David Miller Sent: Tuesday, May 09, 2017 >> 9:39 PM >>>To: ste...@agner.ch >>>Cc: Andy Duan ; and...@lunn.ch; >>>feste...@gmail.com; netdev@vger.kernel.org; linux- >>>ker...@vger.kernel.org >>>Subject: Re: [PATCH] net: fec: select queue depending on VLAN priority >>> >>>From: Stefan Agner >>>Date: Mon, 8 May 2017 22:37:08 -0700 >>> Since the addition of the multi queue code with commit 59d0f7465644 ("net: fec: init multi queue date structure") the queue selection has been handelt by the default transmit queue selection implementation which tries to evenly distribute the traffic across all available queues. This selection presumes that the queues are using an equal priority, however, the queues 1 and 2 are actually of higher priority (the classification of the queues is enabled in >fec_enet_enable_ring). This can lead to net scheduler warnings and continuous TX ring dumps when exercising the system with iperf. Use only queue 0 for all common traffic (no VLAN and P802.1p priority 0 and 1) and route level 2-7 through queue 1 and 2. Signed-off-by: Fugang Duan Fixes: 59d0f7465644 ("net: fec: init multi queue date structure") >>> >>>If the queues are used for prioritization, and it does not have >>>multiple normal priority level queues, multiqueue is not what the >>>driver should have implemented. >> Firstly, HW multiple queues support: >> - Traffic-shaping bandwidth distribution supports credit-based and >> round-robin-based policies. Either policy can be combined with >> time-based shaping. >> - AVB (Audio Video Bridging, IEEE 802.1Qav) features: >> * Credit-based bandwidth distribution policy can be combined >with >> time-based shaping >> * AVB endpoint talker and listener support >> * Support for arbitration between different priority traffic >> (for >> example, AVB class A, AVB class B, and non-AVB) Round-robin-based >> policies: >> It has the same priority for three queues: In the round-robin QoS >> scheme, each queue is given an equal opportunity to transmit one >> frame. For example, if queue n has a frame to transmit, the queue >> transmits its frame. After queue n has transmitted its frame, or if >> queue n does not have a frame to transmit, queue n+1 is then allowed >> to transmit its frame, and so on. >> >> Credit-based policies: >> The AVB credit based shaper acts independently, per class, to control >> the bandwidth distribution between normal traffic and time-sensitive >> traffic with respect to the total link bandwidth available. >> Credit based shaper conbined with time-based shaping: >> - priority: ClassA queue > ClassB queue > best effort >> - ensure the queue bandwidth as user set based on time- >based shaping >> algorithms (transmitter transmit frame from three queue in turn based >> on time-based shaping algorithms) >> And in real AVB case, each streaming can be independent, and are >> fixed on related queue. Then driver level should implement >> .ndo_select_queue() to put the streaming into related queue. That is >> what the patch did. >> >> The current driver config the three queue to credit-based policies >> (AVB), the patch seems no problem for the implementation. Do you have >> any suggestion ? >> > >I tried using the round robin mode by adding this: > >+ /* Set Round-Robin policy */ >+ writel(1, fep->hwp + FEC_QOS_SCHEME); > >After a while I got the warning non the less: > >WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 >dev_watchdog+0x248/0x24c NETDEV WATCHDOG: eth0 (fec): transmit queue >2 timed out Modules linked in: >CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1-00058-g56d22eced8bc- >dirty #377 Hardware name: Freescale i.MX7 Dual (Device Tree) [] >(unwind_backtrace) from [] (show_stack+0x10/0x14) [] >(show_stack) from [] (dump_stack+0x78/0x8c) [] >(dump_stack) from [] (__warn+0xe8/0x100) [] >(__warn) from [] (warn_slowpath_fmt+0x38/0x48) [] >(warn_slowpath_fmt) from [] >(dev_watchdog+0x248/0x24c) >[] (dev_watchdog) from [] (call_timer_fn+0x28/0x98) >[] (call_timer_fn) from [] (expire_timers+0xa0/0xac) >[] (expire_timers) from [] >(run_timer_softirq+0x9c/0x194) >[] (run_timer_softirq) from [] >(__do_softirq+0x114/0x234) >[] (__do_softirq) from [] (irq_exit+0xcc/0x108) >[] (irq_exit) from [] >(__handle_domain_irq+0x80/0xec) >[] (__handle_domain_irq) from [] >(gic_handle_irq+0x48/0x8c) >[] (gic_handle_irq) from [] (__irq_svc+0x58/0x8c)
Re: [PATCH] net: fec: select queue depending on VLAN priority
On 2017-05-09 19:42, Andy Duan wrote: > From: David MillerSent: Tuesday, May 09, 2017 9:39 PM >>To: ste...@agner.ch >>Cc: Andy Duan ; and...@lunn.ch; >>feste...@gmail.com; netdev@vger.kernel.org; linux- >>ker...@vger.kernel.org >>Subject: Re: [PATCH] net: fec: select queue depending on VLAN priority >> >>From: Stefan Agner >>Date: Mon, 8 May 2017 22:37:08 -0700 >> >>> Since the addition of the multi queue code with commit 59d0f7465644 >>> ("net: fec: init multi queue date structure") the queue selection has >>> been handelt by the default transmit queue selection implementation >>> which tries to evenly distribute the traffic across all available >>> queues. This selection presumes that the queues are using an equal >>> priority, however, the queues 1 and 2 are actually of higher priority >>> (the classification of the queues is enabled in fec_enet_enable_ring). >>> >>> This can lead to net scheduler warnings and continuous TX ring dumps >>> when exercising the system with iperf. >>> >>> Use only queue 0 for all common traffic (no VLAN and P802.1p priority >>> 0 and 1) and route level 2-7 through queue 1 and 2. >>> >>> Signed-off-by: Fugang Duan >>> Fixes: 59d0f7465644 ("net: fec: init multi queue date structure") >> >>If the queues are used for prioritization, and it does not have multiple >>normal >>priority level queues, multiqueue is not what the driver should have >>implemented. > Firstly, HW multiple queues support: > - Traffic-shaping bandwidth distribution supports credit-based and > round-robin-based policies. Either policy can be combined with > time-based shaping. > - AVB (Audio Video Bridging, IEEE 802.1Qav) features: > * Credit-based bandwidth distribution policy can be combined > with > time-based shaping > * AVB endpoint talker and listener support > * Support for arbitration between different priority traffic > (for > example, AVB class A, AVB class B, and non-AVB) > Round-robin-based policies: > It has the same priority for three queues: In the round-robin QoS > scheme, each queue is given an equal opportunity to transmit one > frame. For example, if queue n has a frame to transmit, the queue > transmits its frame. After queue n has transmitted its frame, or if > queue n does not have a frame to transmit, queue n+1 is then allowed > to transmit its frame, and so on. > > Credit-based policies: > The AVB credit based shaper acts independently, per class, to control > the bandwidth distribution between normal traffic and time-sensitive > traffic with respect to the total link bandwidth available. > Credit based shaper conbined with time-based shaping: > - priority: ClassA queue > ClassB queue > best effort > - ensure the queue bandwidth as user set based on time-based > shaping > algorithms (transmitter transmit frame from three queue in turn based > on time-based shaping algorithms) > And in real AVB case, each streaming can be independent, and are > fixed on related queue. Then driver level should implement > .ndo_select_queue() to put the streaming into related queue. That is > what the patch did. > > The current driver config the three queue to credit-based policies > (AVB), the patch seems no problem for the implementation. Do you have > any suggestion ? > I tried using the round robin mode by adding this: + /* Set Round-Robin policy */ + writel(1, fep->hwp + FEC_QOS_SCHEME); After a while I got the warning non the less: WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x248/0x24c NETDEV WATCHDOG: eth0 (fec): transmit queue 2 timed out Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1-00058-g56d22eced8bc-dirty #377 Hardware name: Freescale i.MX7 Dual (Device Tree) [] (unwind_backtrace) from [] (show_stack+0x10/0x14) [] (show_stack) from [] (dump_stack+0x78/0x8c) [] (dump_stack) from [] (__warn+0xe8/0x100) [] (__warn) from [] (warn_slowpath_fmt+0x38/0x48) [] (warn_slowpath_fmt) from [] (dev_watchdog+0x248/0x24c) [] (dev_watchdog) from [] (call_timer_fn+0x28/0x98) [] (call_timer_fn) from [] (expire_timers+0xa0/0xac) [] (expire_timers) from [] (run_timer_softirq+0x9c/0x194) [] (run_timer_softirq) from [] (__do_softirq+0x114/0x234) [] (__do_softirq) from [] (irq_exit+0xcc/0x108) [] (irq_exit) from [] (__handle_domain_irq+0x80/0xec) [] (__handle_domain_irq) from [] (gic_handle_irq+0x48/0x8c) [] (gic_handle_irq) from [] (__irq_svc+0x58/0x8c) Exception stack(0xc1001f28 to 0xc1001f70) 1f20: 0001 c022fdc0 c100 c1003d80 1f40: c1003d34 c0e72ed0 c0bd9b04 c1001f80 c1001f78 1f60: c022048c c0220490 6013 [] (__irq_svc) from [] (arch_cpu_idle+0x38/0x3c) [] (arch_cpu_idle) from [] (do_idle+0x170/0x204) [] (do_idle) from [] (cpu_startup_entry+0x18/0x1c) []
Re: [PATCH] libertas: Avoid reading past end of buffer
Joe Percheswrites: > unrelated trivia: > > lbs_deb_enter is used incorrectly here at > function exit as both enter and leave calls. > > That type of copy/paste defect may be common. > > $ git grep -w lbs_deb_enter | wc -l > 148 > $ git grep -w lbs_deb_leave | wc -l > 71 > > One would expect these numbers to be the same. > > Another option would be to delete all these > calls as ftrace function tracing works well. Yeah, deleting all the enter/exit calls would be better. -- Kalle Valo
Re: [PATCH net-next V4 10/10] vhost_net: try batch dequing from skb array
On 2017年05月10日 20:34, Michael S. Tsirkin wrote: On Wed, May 10, 2017 at 11:36:22AM +0800, Jason Wang wrote: We used to dequeue one skb during recvmsg() from skb_array, this could be inefficient because of the bad cache utilization and spinlock touching for each packet. This patch tries to batch them by calling batch dequeuing helpers explicitly on the exported skb array and pass the skb back through msg_control for underlayer socket to finish the userspace copying. Batch dequeuing is also the requirement for more batching improvement on rx. Tests were done by pktgen on tap with XDP1 in guest on top of batch zeroing: rx batch | pps 2562.41Mpps (+6.16%) 1282.48Mpps (+8.80%) 64 2.38Mpps (+3.96%) <- Default 16 2.31Mpps (+1.76%) 4 2.31Mpps (+1.76%) 1 2.30Mpps (+1.32%) 0 2.27Mpps (+7.48%) Signed-off-by: Jason Wang--- drivers/vhost/net.c | 117 +--- 1 file changed, 111 insertions(+), 6 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 9b51989..fbaecf3 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -28,6 +28,8 @@ #include #include #include +#include +#include #include @@ -85,6 +87,13 @@ struct vhost_net_ubuf_ref { struct vhost_virtqueue *vq; }; +#define VHOST_RX_BATCH 64 +struct vhost_net_buf { + struct sk_buff *queue[VHOST_RX_BATCH]; + int tail; + int head; +}; + Do you strictly need to put this inline? This structure is quite big already. Do you see a measureabe difference if you make it struct sk_buff **queue; int tail; int head; ? I don't. Will also make it easier to play with the size in the future should someone want to see how does it work e.g. for different ring sizes. Ok, will do this in next version Thanks
Re: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualizaiton (Was: "Re: net_sched strange in 4.11")
On 2017年05月10日 17:42, Anton Ivanov wrote: On 10/05/17 09:56, Jason Wang wrote: On 2017年05月10日 13:28, Anton Ivanov wrote: On 10/05/17 03:18, Jason Wang wrote: On 2017年05月09日 23:11, Stefan Hajnoczi wrote: On Tue, May 09, 2017 at 08:46:46AM +0100, Anton Ivanov wrote: I have figured it out. Two issues. 1) skb->xmit_more is hardly ever set under virtualization because the qdisc is usually bypassed because of TCQ_F_CAN_BYPASS. Once TCQ_F_CAN_BYPASS is set a virtual NIC driver is not likely see skb->xmit_more (this answers my "how does this work at all" question). 2) If that flag is turned off (I patched sched_generic to turn it off in pfifo_fast while testing), DQL keeps xmit_more from being set. If the driver is not DQL enabled xmit_more is never ever set. If the driver is DQL enabled the queue is adjusted to ensure xmit_more stops happening within 10-15 xmit cycles. That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc. There, the BIG cost is telling the hypervisor that it needs to "kick" the packets. The cost of putting them into the vNIC buffers is negligible. You want xmit_more to happen - it makes between 50% and 300% (depending on vNIC design) difference. If there is no xmit_more the vNIC will immediately "kick" the hypervisor and try to signal that the packet needs to move straight away (as for example in virtio_net). How do you measure the performance? TCP or just measure pps? In this particular case - tcp from guest. I have a couple of other benchmarks (forwarding, etc). One more question, is the number for virtio-net or other emulated vNIC? Other for now - you are cc-ed to keep you in the loop. Virtio is next on my list - I am revisiting the l2tpv3.c driver in QEMU and looking at how to preserve bulking by adding back sendmmsg (as well as a list of other features/transports). We had sendmmsg removed for the final inclusion in QEMU 2.1, it presently uses only recvmmsg so for the time being it does not care. That will most likely change once it starts using sendmmsg as well. An issue is that qemu net API does not support bulking, do you plan to add it? Thanks
Re: openvswitch MTU patch needed in 4.10 stable
From: Stephen HemmingerDate: Tue, 9 May 2017 08:46:56 -0700 > Could you queue the patch to stable? It speeds things along if you actually specify the SHA1 ID of the specific commit being requested for a -stable backport. I did this work, but you could have taken a brief amount of time so that this would not have been necessary. Thanks.
Re: [PATCH net] tcp: avoid fragmenting peculiar skbs in SACK
On Wed, May 10, 2017 at 8:01 PM, Yuchung Chengwrote: > > This patch fixes a bug in splitting an SKB during SACK > processing. Specifically if an skb contains multiple > packets and is only partially sacked in the higher sequences, > tcp_match_sack_to_skb() splits the skb and marks the second fragment > as SACKed. > > The current code further attempts rounding up the first fragment > to MSS boundaries. But it misses a boundary condition when the > rounded-up fragment size (pkt_len) is exactly skb size. Spliting > such an skb is pointless and causses a kernel warning and aborts > the SACK processing. This patch universally checks such over-split > before calling tcp_fragment to prevent these unnecessary warnings. > > Fixes: adb92db857ee ("tcp: Make SACK code to split only at mss boundaries") > Signed-off-by: Yuchung Cheng > Signed-off-by: Eric Dumazet > Signed-off-by: Soheil Hassas Yeganeh Acked-by: Neal Cardwell neal
Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING
On 2017/5/9 8:48, Casey Leedom wrote: > > | From: Alexander Duyck> | Date: Saturday, May 6, 2017 11:07 AM > | > | | From: Ding Tianhong > | | Date: Fri, May 5, 2017 at 8:08 PM > | | > | | According the suggestion, I could only think of this code: > | | .. > | > | This is a bit simplistic but it is a start. > > Yes, something tells me that this is going to be more complicated than any > of us like ... > > | The other bit I was getting at is that we need to update the core PCIe > | code so that when we configure devices and the root complex reports no > | support for relaxed ordering it should be clearing the relaxed > | ordering bits in the PCIe configuration registers on the upstream > | facing devices. > > Of course, this can be written to by the Driver at any time ... and is in > the case of the cxgb4 Driver ... > > After a lot of rummaging around, it does look like KVM prohibits writes to > the PCIe Capability Device Control register in drivers/xen/xen-pciback/ > conf_space_capability.c and conf_space.c simply because writes aren't > allowed unless "permissive" is set. So it ~looks~ like a driver running in > a Virtual Machine can't turn Enable Relaxed Ordering back on ... > > | The last bit we need in all this is a way to allow for setups where > | peer-to-peer wants to perform relaxed ordering but for writes to the > | host we have to not use relaxed ordering. For that we need to enable a > | special case and that isn't handled right now in any of the solutions > | we have coded up so far. > > Yes, we do need this. > > > | From: Alexander Duyck > | Date: Saturday, May 8, 2017 08:22 AM > | > | The problem is we need to have something that can be communicated > | through a VM. Your change doesn't work in that regard. That was why I > | suggested just updating the code so that we when we initialized PCIe > | devices what we do is either set or clear the relaxed ordering bit in > | the PCIe device control register. That way when we direct assign an > | interface it could know just based on the bits int the PCIe > | configuration if it could use relaxed ordering or not. > | > | At that point the driver code itself becomes very simple since you > | could just enable the relaxed ordering by default in the igb/ixgbe > | driver and if the bit is set or cleared in the PCIe configuration then > | we are either sending with relaxed ordering requests or not and don't > | have to try and locate the root complex. > | > | So from the sound of it Casey has a special use case where he doesn't > | want to send relaxed ordering frames to the root complex, but instead > | would like to send them to another PCIe device. To do that he needs to > | have a way to enable the relaxed ordering bit in the PCIe > | configuration but then not send any to the root complex. Odds are that > | is something he might be able to just implement in the driver, but is > | something that may become a more general case in the future. I don't > | see our change here impacting it as long as we keep the solution > | generic and mostly confined to when we instantiate the devices as the > | driver could likely make the decision to change the behavior later. > > It's not just me. Intel has said that while RO directed at the Root > Complex Host Coherent Memory has a performance bug (not Data Corruption), > it's a performance win for Peer-to-Peer writes to MMIO Space. (I'll be very > interested in hearing what the bug is if we get that much detail. The very > same TLPs directed to the Root Complex Port without Relaxed Ordering set get > good performance. So this is essentially a bug in the hardware that was > ~trying~ to implement a performance win.) > > Meanwhile, I currently only know of a single PCIe End Point which causes > catastrophic results: the AMD A1100 ARM SoC ("SEATTLE"). And it's not even > clear that product is even alive anymore since I haven't been able to get > any responses from them for several months. > > What I'm saying is: let's try to architect a solution which doesn't throw > the baby out with the bath water ... > > I think that if a Device's Root Complex Port has problems with Relaxed > Ordering, it ~probably~ makes sense to turn off the PCIe Capability Device > Control[Enable Relaxed Ordering] when we assign a device to a Virtual > Machine since the Device Driver can no longer query the Relaxed Ordering > Support of the Root Complex Port. The only down side of this would be if we > assigned two Peers to a VM in an application which wanted to do Peer-to-Peer > transfers. But that seems like a hard application to support in any case > since the PCI Bus:Slot.Function IDs for assigned Devices within a VM don't > match the actual values. > > For Devices running in the base OS/Hypervisor, their Drivers can query the > Relaxed Ordering Support for the Root Complex Port or a Peer Device. So a > simple
[PATCH net] tcp: avoid fragmenting peculiar skbs in SACK
This patch fixes a bug in splitting an SKB during SACK processing. Specifically if an skb contains multiple packets and is only partially sacked in the higher sequences, tcp_match_sack_to_skb() splits the skb and marks the second fragment as SACKed. The current code further attempts rounding up the first fragment to MSS boundaries. But it misses a boundary condition when the rounded-up fragment size (pkt_len) is exactly skb size. Spliting such an skb is pointless and causses a kernel warning and aborts the SACK processing. This patch universally checks such over-split before calling tcp_fragment to prevent these unnecessary warnings. Fixes: adb92db857ee ("tcp: Make SACK code to split only at mss boundaries") Signed-off-by: Yuchung ChengSigned-off-by: Eric Dumazet Signed-off-by: Soheil Hassas Yeganeh --- net/ipv4/tcp_input.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 5a3ad09e2786..06e2dbc2b4a2 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1179,13 +1179,14 @@ static int tcp_match_skb_to_sack(struct sock *sk, struct sk_buff *skb, */ if (pkt_len > mss) { unsigned int new_len = (pkt_len / mss) * mss; - if (!in_sack && new_len < pkt_len) { + if (!in_sack && new_len < pkt_len) new_len += mss; - if (new_len >= skb->len) - return 0; - } pkt_len = new_len; } + + if (pkt_len >= skb->len && !in_sack) + return 0; + err = tcp_fragment(sk, skb, pkt_len, mss, GFP_ATOMIC); if (err < 0) return err; -- 2.13.0.rc2.291.g57267f2277-goog
[PATCH net] bpf, arm64: fix faulty emission of map access in tail calls
Shubham was recently asking on netdev why in arm64 JIT we don't multiply the index for accessing the tail call map by 8. That led me into testing out arm64 JIT wrt tail calls and it turned out I got a NULL pointer dereference on the tail call. The buggy access is at: prog = array->ptrs[index]; if (prog == NULL) goto out; [...] 0060: d2800e0a mov x10, #0x70 // #112 0064: f86a682a ldr x10, [x1,x10] 0068: f862694b ldr x11, [x10,x2] 006c: b4ab cbz x11, 0x0080 [...] The code triggering the crash is f862694b. x1 at the time contains the address of the bpf array, x10 offsetof(struct bpf_array, ptrs). Meaning, above we load the pointer to the program at map slot 0 into x10. x10 can then be NULL if the slot is not occupied, which we later on try to access with a user given offset in x2 that is the map index. Fix this by emitting the following instead: [...] 0060: d2800e0a mov x10, #0x70 // #112 0064: 8b0a002a add x10, x1, x10 0068: d37df04b lsl x11, x2, #3 006c: f86b694b ldr x11, [x10,x11] 0070: b4ab cbz x11, 0x0084 [...] This basically adds the offset to ptrs to the base address of the bpf array we got and we later on access the map with an index * 8 offset relative to that. The tail call map itself is basically one large area with meta data at the head followed by the array of prog pointers. This makes tail calls working again, tested on Cavium ThunderX ARMv8. Fixes: ddb55992b04d ("arm64: bpf: implement bpf_tail_call() helper") Reported-by: Shubham BansalSigned-off-by: Daniel Borkmann --- arch/arm64/net/bpf_jit_comp.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c index c6e5358..71f9305 100644 --- a/arch/arm64/net/bpf_jit_comp.c +++ b/arch/arm64/net/bpf_jit_comp.c @@ -253,8 +253,9 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx) */ off = offsetof(struct bpf_array, ptrs); emit_a64_mov_i64(tmp, off, ctx); - emit(A64_LDR64(tmp, r2, tmp), ctx); - emit(A64_LDR64(prg, tmp, r3), ctx); + emit(A64_ADD(1, tmp, r2, tmp), ctx); + emit(A64_LSL(1, prg, r3, 3), ctx); + emit(A64_LDR64(prg, tmp, prg), ctx); emit(A64_CBZ(1, prg, jmp_offset), ctx); /* goto *(prog->bpf_func + prologue_size); */ -- 1.9.3
Re: [PATCH] libertas: Avoid reading past end of buffer
On Wed, 2017-05-10 at 12:24 -0700, Kees Cook wrote: > Using memcpy() from a string that is shorter than the length copied means > the destination buffer is being filled with arbitrary data from the kernel > rodata segment. another bit of trivia: > diff --git a/drivers/net/wireless/marvell/libertas/mesh.c > b/drivers/net/wireless/marvell/libertas/mesh.c [] > @@ -1170,17 +1170,11 @@ int lbs_mesh_ethtool_get_sset_count(struct net_device > *dev, int sset) [] > + memcpy(s, *mesh_stat_strings, sizeof(mesh_stat_strings)); That * isn't necessary.
Re: [PATCH] libertas: Avoid reading past end of buffer
On Wed, 2017-05-10 at 12:24 -0700, Kees Cook wrote: > Using memcpy() from a string that is shorter than the length copied means [] > diff --git a/drivers/net/wireless/marvell/libertas/mesh.c > b/drivers/net/wireless/marvell/libertas/mesh.c [] > @@ -1170,17 +1170,11 @@ int lbs_mesh_ethtool_get_sset_count(struct net_device > *dev, int sset) > void lbs_mesh_ethtool_get_strings(struct net_device *dev, > uint32_t stringset, uint8_t *s) > { > - int i; > - > lbs_deb_enter(LBS_DEB_ETHTOOL); > > switch (stringset) { > case ETH_SS_STATS: > - for (i = 0; i < MESH_STATS_NUM; i++) { > - memcpy(s + i * ETH_GSTRING_LEN, > - mesh_stat_strings[i], > - ETH_GSTRING_LEN); > - } > + memcpy(s, *mesh_stat_strings, sizeof(mesh_stat_strings)); > break; > } > lbs_deb_enter(LBS_DEB_ETHTOOL); unrelated trivia: lbs_deb_enter is used incorrectly here at function exit as both enter and leave calls. That type of copy/paste defect may be common. $ git grep -w lbs_deb_enter | wc -l 148 $ git grep -w lbs_deb_leave | wc -l 71 One would expect these numbers to be the same. Another option would be to delete all these calls as ftrace function tracing works well.
Re: [PATCH net 2/2] xdp: disallow use of native and generic hook at once
On 05/11/2017 12:46 AM, Jakub Kicinski wrote: On Thu, 11 May 2017 00:24:56 +0200, Daniel Borkmann wrote: I understand the counter argument that from user space perspective it would make things slightly more complicated because there would be two conditions in which driver hook is used: 1) DRV_MODE set on dump; 2) flags attribute not present (old kernel). I'm concerned about number 2). We can't simply depend on SKB_MODE not being set because we may add more *_MODE flags in the future. So doing: if (flags & SKB_MODE) printf("skb-mode"); else printf("drv-mode"); is not correct. The flags attribute must not be present at all (think HW_MODE flag). But going further there can also be non-MODE flags, like, say.. NEVER_TX, and then there may be flags present in dump, and if SKB_MODE isn't be set, the mode could be some new MODE user space doesn't understand, or it could be DRV_MODE+a new non-MODE flag... no way to tell :S Yep, I see your point. Additionally, if we use XDP_FLAGS_* again for dumping we're wasting bit space for flags we would never dump back such as mentioned XDP_FLAGS_UPDATE_IF_NOEXIST (or any other future flags that are only relevant for loading, but never for dumping). Given dumping IFLA_XDP_FLAGS was added due to XDP_FLAGS_SKB_MODE, we can still change this, since it's not too late. How about the following proposal: IFLA_XDP_ATTACHED we have as-is (need to keep that anyway), if that is true, it means "something is attached at XDP layer". Then, we add a new attribute IFLA_XDP_MODE (enum as type), which can contain XDP_DRV_MODE (0), XDP_SKB_MODE (1). I don't think there's a strict requirement to really dump IFLA_XDP_FLAGS back, separating both attrs would avoid this hassle of what current or future load flag fits for dump as well and which not. Wdyt? I really like the idea of not reusing IFLA_XDP_FLAGS for dumps! New enum sounds good, but perhaps reusing IFLA_XDP_ATTACHED isn't 100% off-limits either? 4.11 would report (0) - driver supports XDP but nothing is attached, (1) - something attached at the driver, could we make (2) mean - something attached at generic XDP? Would that be considered breaking the ABI, are we bound to boolean semantics? I like the idea, it would also render IFLA_XDP_ATTACHED not useless or redundant then. Older binaries check !rta_getattr_u8(tb[IFLA_XDP_ATTACHED]) whether something is attached or not at XDP layer. So they won't know IFLA_XDP_FLAGS attr either to make a more fine-grained distinction about what mode. That seems actually cleaner to me, will give it a try. Thanks, Daniel
Re: [PATCH net 2/2] xdp: disallow use of native and generic hook at once
On Thu, 11 May 2017 00:24:56 +0200, Daniel Borkmann wrote: > > I understand the counter argument that from user space perspective it > > would make things slightly more complicated because there would be two > > conditions in which driver hook is used: > > 1) DRV_MODE set on dump; > > 2) flags attribute not present (old kernel). > > > > I'm concerned about number 2). We can't simply depend on SKB_MODE > > not being set because we may add more *_MODE flags in the future. So > > doing: > > > > if (flags & SKB_MODE) > > printf("skb-mode"); > > else > > printf("drv-mode"); > > > > is not correct. The flags attribute must not be present at all (think > > HW_MODE flag). But going further there can also be non-MODE flags, > > like, say.. NEVER_TX, and then there may be flags present in dump, > > and if SKB_MODE isn't be set, the mode could be some new MODE user space > > doesn't understand, or it could be DRV_MODE+a new non-MODE flag... no > > way to tell :S > > Yep, I see your point. Additionally, if we use XDP_FLAGS_* again for > dumping we're wasting bit space for flags we would never dump back > such as mentioned XDP_FLAGS_UPDATE_IF_NOEXIST (or any other future > flags that are only relevant for loading, but never for dumping). > Given dumping IFLA_XDP_FLAGS was added due to XDP_FLAGS_SKB_MODE, > we can still change this, since it's not too late. > > How about the following proposal: IFLA_XDP_ATTACHED we have as-is > (need to keep that anyway), if that is true, it means "something > is attached at XDP layer". Then, we add a new attribute IFLA_XDP_MODE > (enum as type), which can contain XDP_DRV_MODE (0), XDP_SKB_MODE (1). > I don't think there's a strict requirement to really dump IFLA_XDP_FLAGS > back, separating both attrs would avoid this hassle of what current > or future load flag fits for dump as well and which not. Wdyt? I really like the idea of not reusing IFLA_XDP_FLAGS for dumps! New enum sounds good, but perhaps reusing IFLA_XDP_ATTACHED isn't 100% off-limits either? 4.11 would report (0) - driver supports XDP but nothing is attached, (1) - something attached at the driver, could we make (2) mean - something attached at generic XDP? Would that be considered breaking the ABI, are we bound to boolean semantics?
Re: [PATCH net 2/2] xdp: disallow use of native and generic hook at once
On 05/10/2017 11:07 PM, Jakub Kicinski wrote: On Wed, 10 May 2017 11:36:22 +0200, Daniel Borkmann wrote: On 05/10/2017 05:18 AM, Jakub Kicinski wrote: On Wed, 10 May 2017 03:31:31 +0200, Daniel Borkmann wrote: [...] xdp = nla_nest_start(skb, IFLA_XDP); if (!xdp) return -EMSGSIZE; + if (rcu_access_pointer(dev->xdp_prog)) { xdp_flags = XDP_FLAGS_SKB_MODE; val = 1; - } else if (dev->netdev_ops->ndo_xdp) { - struct netdev_xdp xdp_op = {}; - - xdp_op.command = XDP_QUERY_PROG; - err = dev->netdev_ops->ndo_xdp(dev, _op); - if (err) - goto err_cancel; - val = xdp_op.prog_attached; + } else { + val = dev_xdp_attached(dev); } Would it make sense to set xdp_flags to XDP_FLAGS_DRV_MODE here to keep things symmetrical? I know you are just preserving existing behaviour but it may seem slightly arbitrary to a new comer to report one of the very similarly named flags in the dump but not the other. I thought about it, it's kind of redundant information since IFLA_XDP_ATTACHED attribute w/o IFLA_XDP_FLAGS attribute today says that it's native already. It might look strange if we add also XDP_FLAGS_DRV_MODE there, since it doesn't give anything new. I rather see it similar to XDP_FLAGS_UPDATE_IF_NOEXIST flag that is for updating fd only, but I don't really have a strong opinion on this though. I could add it to the respin if preferred. XDP_FLAGS_UPDATE_IF_NOEXIST is indeed the precedent which makes things a bit murky. There are no reasonably useful semantics for IF_NOEXIST on dump :( Note that meaning of SKB_MODE flag shifts slightly between set and dump IIUC. At set time it means: "force installation at the generic hook", at dump time it means: "installed at generic hook - regardless of whether the flag was set at installation time", So I would argue that DRV_MODE flag is closer to SKB_MODE not only by name but also by semantics, and it would be cool if we could keep the semantics close on dump as well as set. Right. I understand the counter argument that from user space perspective it would make things slightly more complicated because there would be two conditions in which driver hook is used: 1) DRV_MODE set on dump; 2) flags attribute not present (old kernel). I'm concerned about number 2). We can't simply depend on SKB_MODE not being set because we may add more *_MODE flags in the future. So doing: if (flags & SKB_MODE) printf("skb-mode"); else printf("drv-mode"); is not correct. The flags attribute must not be present at all (think HW_MODE flag). But going further there can also be non-MODE flags, like, say.. NEVER_TX, and then there may be flags present in dump, and if SKB_MODE isn't be set, the mode could be some new MODE user space doesn't understand, or it could be DRV_MODE+a new non-MODE flag... no way to tell :S Yep, I see your point. Additionally, if we use XDP_FLAGS_* again for dumping we're wasting bit space for flags we would never dump back such as mentioned XDP_FLAGS_UPDATE_IF_NOEXIST (or any other future flags that are only relevant for loading, but never for dumping). Given dumping IFLA_XDP_FLAGS was added due to XDP_FLAGS_SKB_MODE, we can still change this, since it's not too late. How about the following proposal: IFLA_XDP_ATTACHED we have as-is (need to keep that anyway), if that is true, it means "something is attached at XDP layer". Then, we add a new attribute IFLA_XDP_MODE (enum as type), which can contain XDP_DRV_MODE (0), XDP_SKB_MODE (1). I don't think there's a strict requirement to really dump IFLA_XDP_FLAGS back, separating both attrs would avoid this hassle of what current or future load flag fits for dump as well and which not. Wdyt? Thanks, Daniel
Re: Requirements for a shutdown function?
On 05/10/2017 03:11 PM, Timur Tabi wrote: > On 05/10/2017 04:47 PM, Florian Fainelli wrote: >> AFAIR kexec takes care of shutting down network devices explicitly >> (unless instructed otherwise with -x/--no-ifdown) so this may be where >> this is coming from. >> >> Reading through drivers/base/core.c it does not appear that ->remove() >> is called and then ->shutdown() gets called, only ->shutdown() gets >> called from device_shutdown() called from kernel/reboot.c. It seems to >> me like if you want to be on the safe side you would want to implement a >> shutdown function that is identical to what your remove function does. > > I finally found a testcase where the shutdown function is useful. If you do > a "reboot -f", it will call shutdown but not close. Correct yes. Sorry, I did not recall which one of kexec or reboot would call it, but both would actually now that I looked at what happens on one of my systems again. -- Florian
Re: Requirements for a shutdown function?
On 05/10/2017 04:47 PM, Florian Fainelli wrote: > AFAIR kexec takes care of shutting down network devices explicitly > (unless instructed otherwise with -x/--no-ifdown) so this may be where > this is coming from. > > Reading through drivers/base/core.c it does not appear that ->remove() > is called and then ->shutdown() gets called, only ->shutdown() gets > called from device_shutdown() called from kernel/reboot.c. It seems to > me like if you want to be on the safe side you would want to implement a > shutdown function that is identical to what your remove function does. I finally found a testcase where the shutdown function is useful. If you do a "reboot -f", it will call shutdown but not close. -- Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [PATCH v2 net-next 03/12] drivers: net: xgene: Use rgmii mdio mac access
On 05/10/2017 01:45 PM, Iyappan Subramanian wrote: > From: Quan Nguyen> > This patch switches to use rgmii mdio mac access routines if available, > as they share the same HW. > > Signed-off-by: Quan Nguyen > Signed-off-by: Iyappan Subramanian > --- > drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 14 ++ > 1 file changed, 14 insertions(+) > > diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c > b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c > index 2050c58..47c5b75 100644 > --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c > +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c > @@ -277,6 +277,13 @@ void xgene_enet_wr_mac(struct xgene_enet_pdata *pdata, > u32 wr_addr, u32 wr_data) > u8 wait = 10; > u32 done; > > + if (pdata->mdio_driver && ndev->phydev && > + pdata->phy_mode == PHY_INTERFACE_MODE_RGMII) { To be on the safe side you should check all 4 values of PHY_INTERFACE_MODE_RGMII, not just this one. -- Florian
Re: Requirements for a shutdown function?
On 05/10/2017 01:17 PM, Timur Tabi wrote: > On 05/09/2017 02:06 PM, Florian Fainelli wrote: >> On 05/09/2017 11:51 AM, Timur Tabi wrote: > >>> Is it possible that the network stack detects a kexec and automatically >>> stops all network devices? >> >> No. why would it? However the device driver model does call into your >> driver's remove function and that one does a right job already because >> it does an network device unregister, and so on. > > I ran some more tests. When I launch kexec, the driver's > net_device_ops.ndo_stop function is called, which already stops the interface. > > So it looks to me as if the network stack does close the interface during a > kexec. With the interface closed, there's no point in having a shutdown > function, is there? AFAIR kexec takes care of shutting down network devices explicitly (unless instructed otherwise with -x/--no-ifdown) so this may be where this is coming from. Reading through drivers/base/core.c it does not appear that ->remove() is called and then ->shutdown() gets called, only ->shutdown() gets called from device_shutdown() called from kernel/reboot.c. It seems to me like if you want to be on the safe side you would want to implement a shutdown function that is identical to what your remove function does. > >>> My in-house driver stops the RX and TX queues. I'm guessing that's good >>> enough, but I don't have a failing test case to prove it. >>> >> >> That's probably good enough, yes. > > Except that it turns out that the queues are already stopped by then because > the emac_close() function has already been called. -- Florian
[PATCH net-next] ipv6: Implement limits on hop by hop and destination options
RFC 2460 (IPv6) defines hop by hop options and destination options extension headers. Both of these carry a list of TLVs which is only limited by the maximum length of the extension header (2048 bytes). By the spec a host must process all the TLVs in these options, however these could be used as a fairly obvious denial of service attack. I think this could in fact be a significant DOS vector on the Internet, one mitigating factor might be that many FWs drop all packets with EH (and obviously this is only IPv6) so an Internet wide attack might not be so effective (yet!). By my calculation, the worse case packet with TLVs in a standard 1500 byte MTU packet that would be processed by the stack contains 1282 invidual TLVs (including pad TLVS) or 724 two byte TLVs. I wrote a quick test program that floods a whole bunch of these packets to a host and sure enough there is substantial time spent in ip6_parse_tlv. These packets contain nothing but unknown TLVS (that are ignored), TLV padding, and bogus UDP header with zero payload length. 25.38% [kernel][k] __fib6_clean_all 21.63% [kernel][k] ip6_parse_tlv 4.21% [kernel][k] __local_bh_enable_ip 2.18% [kernel][k] ip6_pol_route.isra.39 1.98% [kernel][k] fib6_walk_continue 1.88% [kernel][k] _raw_write_lock_bh 1.65% [kernel][k] dst_release This patches adds configurable limits to destination and hop by hop options. There are three limits that may be set: - Limit the number of non-padding TLVs that may be in an extension header - Limit the length of a hop by hop or destination options extension header - Disallow unknown options The limits are set in corresponding sysctls: ipv6.sysctl.max_dst_opts_cnt ipv6.sysctl.max_hbh_opts_cnt ipv6.sysctl.max_dst_opts_len ipv6.sysctl.max_hbh_opts_len If a max_*_opts_cnt is less than zero then unknown TLVs are disallowed. The number of known TLVs that are allowed is the absolute value of this number. If a limit is exceeded when processing an extension header the packet is dropped. Default values are set to 8 for options counts, and set to INT_MAX for maximum length. Note the choice to limit options to 8 is an arbitrary guess (roughly based on the fact that the stack supports three HBH options and just one destination option). Tested: I've only complied this code, working on getting a test environment set up which is why RFC. If anyone has resources and time to do some testing or development, let me know! Tested: Martin Lau was kind enough to test this on real HW. He verified that processing IPv6 TLVs CPU intensive and could very well be a DDOS, and he verified that the patch drops packets that exceed the limits to avoid the issue: I tested out 1 thread (i.e. one raw_udp process). I changed the net.ipv6.max_dst_(opts|hbh)_number between 8 to 2048. With sysctls setting to 2048, the softirq% is packed to 100%. With 8, the softirq% is almost unnoticable from mpstat. --- Documentation/networking/ip-sysctl.txt | 22 + include/net/ipv6.h | 33 + include/net/netns/ipv6.h | 4 net/ipv6/af_inet6.c| 4 net/ipv6/exthdrs.c | 44 ++ net/ipv6/sysctl_net_ipv6.c | 32 + 6 files changed, 134 insertions(+), 5 deletions(-) diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 974ab47..476a5c5 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -1379,6 +1379,28 @@ mld_qrv - INTEGER Default: 2 (as specified by RFC3810 9.1) Minimum: 1 (as specified by RFC6636 4.5) +max_dst_opts_cnt - INTEGER + Maximum number of non-padding TLVs allowed in a destination + options extension header. If this value is less than zero + then unknown options are disallowed and the number of known + TLVs allowed are the absolute value of this numer. + + Default: 8 + +max_hbh_opts_cnt - INTEGER + Maximum number of non-padding TLVs allowed in a hop by hop + options extension header. If this value is less than zero + then unknown options are disallowed and the number of known + TLVs allowed are the absolute value of this number. + +max dst_opts_len - INTEGER + Maximum length allowed for a destination options extension + header. + +max hbh_opts_len - INTEGER + Maximum length allowed for a hop by hop options extension + header. + IPv6 Fragmentation: ip6frag_high_thresh - INTEGER diff --git a/include/net/ipv6.h b/include/net/ipv6.h index dbf0abb..9f724ae 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -50,6 +50,39 @@ #define IPV6_DEFAULT_HOPLIMIT 64 #define IPV6_DEFAULT_MCASTHOPS 1 +/* Limits on hop
Re: Hello
-- Hello, I have something important to discuss with you as soon as you reply back. Regards. Miss.Rechel
Re: [PATCH net 2/2] xdp: disallow use of native and generic hook at once
On Wed, 10 May 2017 11:36:22 +0200, Daniel Borkmann wrote: > On 05/10/2017 05:18 AM, Jakub Kicinski wrote: > > On Wed, 10 May 2017 03:31:31 +0200, Daniel Borkmann wrote: > >> @@ -6851,6 +6851,32 @@ int dev_change_proto_down(struct net_device *dev, > >> bool proto_down) > >> } > >> EXPORT_SYMBOL(dev_change_proto_down); > >> > >> +bool __dev_xdp_attached(struct net_device *dev, xdp_op_t xdp_op) > > > > Out of curiosity - the leading underscores refer to caller having to > > hold rtnl? I assume they are not needed in the function below because > > it's static? > > I think I don't quite follow the last question, but it probably makes > sense to add an ASSERT_RTNL() into dev_xdp_attached() inline helper to > make it clearly visible to callers of this api. Sorry, I missed you have a dev_xdp_attached() defined in the header, hence the confusion. > >> +{ > >> + struct netdev_xdp xdp; > >> + > >> + memset(, 0, sizeof(xdp)); > >> + xdp.command = XDP_QUERY_PROG; > > > > Probably personal preference, but seems like designated struct > > initializer would do quite nicely here and save the memset :) > > I had that initially, but I recalled that gcc < 4.6 does not handle this > style for the initialization of anonymous struct/union properly (e.g., > we fixed that in iproute2 as well). Andrew Morton still uses gcc 4.4.4 > and occasionally sends kernel fixes, so we might end up like this anyway. Ah, good to know! > >> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c > >> index dda9f16..99320f0 100644 > >> --- a/net/core/rtnetlink.c > >> +++ b/net/core/rtnetlink.c > >> @@ -1251,24 +1251,20 @@ static int rtnl_xdp_fill(struct sk_buff *skb, > >> struct net_device *dev) > >> { > >>struct nlattr *xdp; > >>u32 xdp_flags = 0; > >> - u8 val = 0; > >>int err; > >> + u8 val; > >> > >>xdp = nla_nest_start(skb, IFLA_XDP); > >>if (!xdp) > >>return -EMSGSIZE; > >> + > >>if (rcu_access_pointer(dev->xdp_prog)) { > >>xdp_flags = XDP_FLAGS_SKB_MODE; > >>val = 1; > >> - } else if (dev->netdev_ops->ndo_xdp) { > >> - struct netdev_xdp xdp_op = {}; > >> - > >> - xdp_op.command = XDP_QUERY_PROG; > >> - err = dev->netdev_ops->ndo_xdp(dev, _op); > >> - if (err) > >> - goto err_cancel; > >> - val = xdp_op.prog_attached; > >> + } else { > >> + val = dev_xdp_attached(dev); > >>} > > > > Would it make sense to set xdp_flags to XDP_FLAGS_DRV_MODE here to keep > > things symmetrical? I know you are just preserving existing behaviour > > but it may seem slightly arbitrary to a new comer to report one of the > > very similarly named flags in the dump but not the other. > > I thought about it, it's kind of redundant information since > IFLA_XDP_ATTACHED attribute w/o IFLA_XDP_FLAGS attribute today > says that it's native already. It might look strange if we add > also XDP_FLAGS_DRV_MODE there, since it doesn't give anything > new. I rather see it similar to XDP_FLAGS_UPDATE_IF_NOEXIST flag > that is for updating fd only, but I don't really have a strong > opinion on this though. I could add it to the respin if preferred. XDP_FLAGS_UPDATE_IF_NOEXIST is indeed the precedent which makes things a bit murky. There are no reasonably useful semantics for IF_NOEXIST on dump :( Note that meaning of SKB_MODE flag shifts slightly between set and dump IIUC. At set time it means: "force installation at the generic hook", at dump time it means: "installed at generic hook - regardless of whether the flag was set at installation time", So I would argue that DRV_MODE flag is closer to SKB_MODE not only by name but also by semantics, and it would be cool if we could keep the semantics close on dump as well as set. I understand the counter argument that from user space perspective it would make things slightly more complicated because there would be two conditions in which driver hook is used: 1) DRV_MODE set on dump; 2) flags attribute not present (old kernel). I'm concerned about number 2). We can't simply depend on SKB_MODE not being set because we may add more *_MODE flags in the future. So doing: if (flags & SKB_MODE) printf("skb-mode"); else printf("drv-mode"); is not correct. The flags attribute must not be present at all (think HW_MODE flag). But going further there can also be non-MODE flags, like, say.. NEVER_TX, and then there may be flags present in dump, and if SKB_MODE isn't be set, the mode could be some new MODE user space doesn't understand, or it could be DRV_MODE+a new non-MODE flag... no way to tell :S
[PATCH v2 net-next 07/12] drivers: net: xgene: Extend ethtool statistics
From: Quan NguyenThis patch adds extended ethtool statistics support. Signed-off-by: Quan Nguyen Signed-off-by: Iyappan Subramanian --- .../net/ethernet/apm/xgene/xgene_enet_ethtool.c| 89 +- drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 29 +++ drivers/net/ethernet/apm/xgene/xgene_enet_hw.h | 51 + drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 8 ++ drivers/net/ethernet/apm/xgene/xgene_enet_main.h | 4 + drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.h | 1 + 6 files changed, 181 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c index 217cde8..a7eed3b 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c @@ -23,9 +23,17 @@ struct xgene_gstrings_stats { char name[ETH_GSTRING_LEN]; int offset; + u32 addr; + u32 mask; }; #define XGENE_STAT(m) { #m, offsetof(struct rtnl_link_stats64, m) } +#define XGENE_EXTD_STAT(s, a, m) \ + { \ + .name = #s, \ + .addr = a ## _ADDR, \ + .mask = m \ + } static const struct xgene_gstrings_stats gstrings_stats[] = { XGENE_STAT(rx_packets), @@ -40,7 +48,51 @@ struct xgene_gstrings_stats { XGENE_STAT(rx_fifo_errors) }; +static const struct xgene_gstrings_stats gstrings_extd_stats[] = { + XGENE_EXTD_STAT(tx_rx_64b_frame_cntr, TR64, 31), + XGENE_EXTD_STAT(tx_rx_127b_frame_cntr, TR127, 31), + XGENE_EXTD_STAT(tx_rx_255b_frame_cntr, TR255, 31), + XGENE_EXTD_STAT(tx_rx_511b_frame_cntr, TR511, 31), + XGENE_EXTD_STAT(tx_rx_1023b_frame_cntr, TR1K, 31), + XGENE_EXTD_STAT(tx_rx_1518b_frame_cntr, TRMAX, 31), + XGENE_EXTD_STAT(tx_rx_1522b_frame_cntr, TRMGV, 31), + XGENE_EXTD_STAT(rx_fcs_error_cntr, RFCS, 16), + XGENE_EXTD_STAT(rx_multicast_pkt_cntr, RMCA, 31), + XGENE_EXTD_STAT(rx_broadcast_pkt_cntr, RBCA, 31), + XGENE_EXTD_STAT(rx_ctrl_frame_pkt_cntr, RXCF, 16), + XGENE_EXTD_STAT(rx_pause_frame_pkt_cntr, RXPF, 16), + XGENE_EXTD_STAT(rx_unk_opcode_cntr, RXUO, 16), + XGENE_EXTD_STAT(rx_align_err_cntr, RALN, 16), + XGENE_EXTD_STAT(rx_frame_len_err_cntr, RFLR, 16), + XGENE_EXTD_STAT(rx_code_err_cntr, RCDE, 16), + XGENE_EXTD_STAT(rx_carrier_sense_err_cntr, RCSE, 16), + XGENE_EXTD_STAT(rx_undersize_pkt_cntr, RUND, 16), + XGENE_EXTD_STAT(rx_oversize_pkt_cntr, ROVR, 16), + XGENE_EXTD_STAT(rx_fragments_cntr, RFRG, 16), + XGENE_EXTD_STAT(rx_jabber_cntr, RJBR, 16), + XGENE_EXTD_STAT(rx_dropped_pkt_cntr, RDRP, 16), + XGENE_EXTD_STAT(tx_multicast_pkt_cntr, TMCA, 31), + XGENE_EXTD_STAT(tx_broadcast_pkt_cntr, TBCA, 31), + XGENE_EXTD_STAT(tx_pause_ctrl_frame_cntr, TXPF, 16), + XGENE_EXTD_STAT(tx_defer_pkt_cntr, TDFR, 31), + XGENE_EXTD_STAT(tx_excv_defer_pkt_cntr, TEDF, 31), + XGENE_EXTD_STAT(tx_single_col_pkt_cntr, TSCL, 31), + XGENE_EXTD_STAT(tx_multi_col_pkt_cntr, TMCL, 31), + XGENE_EXTD_STAT(tx_late_col_pkt_cntr, TLCL, 31), + XGENE_EXTD_STAT(tx_excv_col_pkt_cntr, TXCL, 31), + XGENE_EXTD_STAT(tx_total_col_cntr, TNCL, 31), + XGENE_EXTD_STAT(tx_pause_frames_hnrd_cntr, TPFH, 16), + XGENE_EXTD_STAT(tx_drop_frame_cntr, TDRP, 16), + XGENE_EXTD_STAT(tx_jabber_frame_cntr, TJBR, 12), + XGENE_EXTD_STAT(tx_fcs_error_cntr, TFCS, 12), + XGENE_EXTD_STAT(tx_ctrl_frame_cntr, TXCF, 12), + XGENE_EXTD_STAT(tx_oversize_frame_cntr, TOVR, 12), + XGENE_EXTD_STAT(tx_undersize_frame_cntr, TUND, 12), + XGENE_EXTD_STAT(tx_fragments_cntr, TFRG, 12) +}; + #define XGENE_STATS_LENARRAY_SIZE(gstrings_stats) +#define XGENE_EXTD_STATS_LEN ARRAY_SIZE(gstrings_extd_stats) static void xgene_get_drvinfo(struct net_device *ndev, struct ethtool_drvinfo *info) @@ -142,6 +194,11 @@ static void xgene_get_strings(struct net_device *ndev, u32 stringset, u8 *data) memcpy(p, gstrings_stats[i].name, ETH_GSTRING_LEN); p += ETH_GSTRING_LEN; } + + for (i = 0; i < XGENE_EXTD_STATS_LEN; i++) { + memcpy(p, gstrings_extd_stats[i].name, ETH_GSTRING_LEN); + p += ETH_GSTRING_LEN; + } } static int xgene_get_sset_count(struct net_device *ndev, int sset) @@ -149,19 +206,49 @@ static int xgene_get_sset_count(struct net_device *ndev, int sset) if (sset != ETH_SS_STATS) return -EINVAL; - return XGENE_STATS_LEN; + return XGENE_STATS_LEN + XGENE_EXTD_STATS_LEN; +} + +static void xgene_get_extd_stats(struct xgene_enet_pdata *pdata) +{ + u32 tmp; + int i; + + for (i
[PATCH v2 net-next 06/12] drivers: net: xgene: Remove unused macros
From: Quan NguyenThis patch cleans up unused macros to improve readability. Signed-off-by: Quan Nguyen Signed-off-by: Iyappan Subramanian --- drivers/net/ethernet/apm/xgene/xgene_enet_hw.h | 6 -- 1 file changed, 6 deletions(-) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h index 3ce349f..ed51005 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h @@ -217,12 +217,6 @@ enum xgene_enet_rm { #define FULL_DUPLEX2 BIT(0) #define PAD_CRCBIT(2) #define LENGTH_CHK BIT(4) -#define SCAN_AUTO_INCR BIT(5) -#define TBYT_ADDR 0x38 -#define TPKT_ADDR 0x39 -#define TDRP_ADDR 0x45 -#define TFCS_ADDR 0x47 -#define TUND_ADDR 0x4a #define TSO_IPPROTO_TCP1 -- 1.9.1
[PATCH v2 net-next 01/12] drivers: net: xgene: Protect indirect MAC access
This patch, - refactors mac read/write functions - adds lock to protect indirect mac access Signed-off-by: Iyappan SubramanianSigned-off-by: Quan Nguyen --- drivers/net/ethernet/apm/xgene/xgene_enet_hw.c| 119 +- drivers/net/ethernet/apm/xgene/xgene_enet_hw.h| 3 + drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 1 + drivers/net/ethernet/apm/xgene/xgene_enet_main.h | 1 + drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c | 71 - drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c | 44 ++-- 6 files changed, 62 insertions(+), 177 deletions(-) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c index 2a835e0..2050c58 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c @@ -270,42 +270,32 @@ static void xgene_enet_wr_mcx_csr(struct xgene_enet_pdata *pdata, iowrite32(val, addr); } -static bool xgene_enet_wr_indirect(void __iomem *addr, void __iomem *wr, - void __iomem *cmd, void __iomem *cmd_done, - u32 wr_addr, u32 wr_data) +void xgene_enet_wr_mac(struct xgene_enet_pdata *pdata, u32 wr_addr, u32 wr_data) { - u32 done; + void __iomem *addr, *wr, *cmd, *cmd_done; + struct net_device *ndev = pdata->ndev; u8 wait = 10; + u32 done; + addr = pdata->mcx_mac_addr + MAC_ADDR_REG_OFFSET; + wr = pdata->mcx_mac_addr + MAC_WRITE_REG_OFFSET; + cmd = pdata->mcx_mac_addr + MAC_COMMAND_REG_OFFSET; + cmd_done = pdata->mcx_mac_addr + MAC_COMMAND_DONE_REG_OFFSET; + + spin_lock(>mac_lock); iowrite32(wr_addr, addr); iowrite32(wr_data, wr); iowrite32(XGENE_ENET_WR_CMD, cmd); - /* wait for write command to complete */ while (!(done = ioread32(cmd_done)) && wait--) udelay(1); if (!done) - return false; + netdev_err(ndev, "mac write failed, addr: %04x data: %08x\n", + wr_addr, wr_data); iowrite32(0, cmd); - - return true; -} - -static void xgene_enet_wr_mcx_mac(struct xgene_enet_pdata *pdata, - u32 wr_addr, u32 wr_data) -{ - void __iomem *addr, *wr, *cmd, *cmd_done; - - addr = pdata->mcx_mac_addr + MAC_ADDR_REG_OFFSET; - wr = pdata->mcx_mac_addr + MAC_WRITE_REG_OFFSET; - cmd = pdata->mcx_mac_addr + MAC_COMMAND_REG_OFFSET; - cmd_done = pdata->mcx_mac_addr + MAC_COMMAND_DONE_REG_OFFSET; - - if (!xgene_enet_wr_indirect(addr, wr, cmd, cmd_done, wr_addr, wr_data)) - netdev_err(pdata->ndev, "MCX mac write failed, addr: %04x\n", - wr_addr); + spin_unlock(>mac_lock); } static void xgene_enet_rd_csr(struct xgene_enet_pdata *pdata, @@ -332,42 +322,33 @@ static void xgene_enet_rd_mcx_csr(struct xgene_enet_pdata *pdata, *val = ioread32(addr); } -static bool xgene_enet_rd_indirect(void __iomem *addr, void __iomem *rd, - void __iomem *cmd, void __iomem *cmd_done, - u32 rd_addr, u32 *rd_data) +u32 xgene_enet_rd_mac(struct xgene_enet_pdata *pdata, u32 rd_addr) { - u32 done; + void __iomem *addr, *rd, *cmd, *cmd_done; + u32 done, rd_data; u8 wait = 10; + addr = pdata->mcx_mac_addr + MAC_ADDR_REG_OFFSET; + rd = pdata->mcx_mac_addr + MAC_READ_REG_OFFSET; + cmd = pdata->mcx_mac_addr + MAC_COMMAND_REG_OFFSET; + cmd_done = pdata->mcx_mac_addr + MAC_COMMAND_DONE_REG_OFFSET; + + spin_lock(>mac_lock); iowrite32(rd_addr, addr); iowrite32(XGENE_ENET_RD_CMD, cmd); - /* wait for read command to complete */ while (!(done = ioread32(cmd_done)) && wait--) udelay(1); if (!done) - return false; + netdev_err(pdata->ndev, "mac read failed, addr: %04x\n", + rd_addr); - *rd_data = ioread32(rd); + rd_data = ioread32(rd); iowrite32(0, cmd); + spin_unlock(>mac_lock); - return true; -} - -static void xgene_enet_rd_mcx_mac(struct xgene_enet_pdata *pdata, - u32 rd_addr, u32 *rd_data) -{ - void __iomem *addr, *rd, *cmd, *cmd_done; - - addr = pdata->mcx_mac_addr + MAC_ADDR_REG_OFFSET; - rd = pdata->mcx_mac_addr + MAC_READ_REG_OFFSET; - cmd = pdata->mcx_mac_addr + MAC_COMMAND_REG_OFFSET; - cmd_done = pdata->mcx_mac_addr + MAC_COMMAND_DONE_REG_OFFSET; - - if (!xgene_enet_rd_indirect(addr, rd, cmd, cmd_done, rd_addr, rd_data)) - netdev_err(pdata->ndev, "MCX mac read failed, addr: %04x\n", - rd_addr); + return rd_data; } static void xgene_gmac_set_mac_addr(struct
[PATCH v2 net-next 00/12] drivers: net: xgene: Add ethtool stats and bug fixes
This patch set, - adds ethtool extended statistics support - addresses errata workarounds - fixes bugs related to statistics Signed-off-by: Iyappan SubramanianSigned-off-by: Quan Nguyen --- v2: Address review comments from v1 - Adds lock to protect mdio-xgene indirect MAC access - Refactors xgene-enet indirect MAC read/write functions - Uses mdio-xgene MAC access routines, if xgene-enet port use the same HW. v1: - Initial version --- Iyappan Subramanian (3): drivers: net: xgene: Protect indirect MAC access drivers: net: xgene: Add rx_overrun/tx_underrun statistics drivers: net: xgene: Fix redundant prefetch buffer cleanup Quan Nguyen (9): drivers: net: phy: xgene: Add lock to protect mac access drivers: net: xgene: Use rgmii mdio mac access drivers: net: xgene: Remove redundant local stats drivers: net: xgene: Refactor statistics error parsing code drivers: net: xgene: Remove unused macros drivers: net: xgene: Extend ethtool statistics drivers: net: xgene: Workaround for HW errata 10GE_4 drivers: net: xgene: Add frame recovered statistics counter for errata 10GE_8/ENET_11 drivers: net: xgene: Workaround for HW errata 10GE_10/ENET_15 .../net/ethernet/apm/xgene/xgene_enet_ethtool.c| 132 ++- drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 188 +++-- drivers/net/ethernet/apm/xgene/xgene_enet_hw.h | 70 +++- drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 59 +-- drivers/net/ethernet/apm/xgene/xgene_enet_main.h | 12 +- drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c | 110 +++- drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c | 77 +++-- drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.h | 5 + drivers/net/phy/mdio-xgene.c | 74 drivers/net/phy/mdio-xgene.h | 3 + 10 files changed, 428 insertions(+), 302 deletions(-) -- 1.9.1
[PATCH v2 net-next 12/12] drivers: net: xgene: Fix redundant prefetch buffer cleanup
Prefetch buffer cleanup code was called twice, causing EDAC to report errors during reboot. [ 1130.972475] xgene-edac 7880.edac: IOB bridge agent (BA) transaction error [ 1130.979584] xgene-edac 7880.edac: IOB BA write response error [ 1130.985648] xgene-edac 7880.edac: IOB BA write access at 0x00. () [ 1130.993612] xgene-edac 7880.edac: IOB BA requestor ID 0x2400 [ 1131.000242] xgene-edac 7880.edac: IOB bridge agent (BA) transaction error ... This patch fixes the errors by, - removing the redundant prefetch buffer cleanup from port_ops->shutdown() - moving port_ops->shutdown() after delete_rings() Signed-off-by: Iyappan Subramanian--- drivers/net/ethernet/apm/xgene/xgene_enet_hw.c| 21 - drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 2 +- drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c | 20 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c | 20 4 files changed, 1 insertion(+), 62 deletions(-) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c index 3235d0c..6ac27c7 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c @@ -763,27 +763,6 @@ static void xgene_enet_clear(struct xgene_enet_pdata *pdata, static void xgene_gport_shutdown(struct xgene_enet_pdata *pdata) { struct device *dev = >pdev->dev; - struct xgene_enet_desc_ring *ring; - u32 pb; - int i; - - pb = 0; - for (i = 0; i < pdata->rxq_cnt; i++) { - ring = pdata->rx_ring[i]->buf_pool; - pb |= BIT(xgene_enet_get_fpsel(ring->id)); - ring = pdata->rx_ring[i]->page_pool; - if (ring) - pb |= BIT(xgene_enet_get_fpsel(ring->id)); - - } - xgene_enet_wr_ring_if(pdata, ENET_CFGSSQMIFPRESET_ADDR, pb); - - pb = 0; - for (i = 0; i < pdata->txq_cnt; i++) { - ring = pdata->tx_ring[i]; - pb |= BIT(xgene_enet_ring_bufnum(ring->id)); - } - xgene_enet_wr_ring_if(pdata, ENET_CFGSSQMIWQRESET_ADDR, pb); if (dev->of_node) { if (!IS_ERR(pdata->clk)) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c index 01e389d..21cd4ef 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c @@ -2159,8 +2159,8 @@ static int xgene_enet_remove(struct platform_device *pdev) xgene_enet_mdio_remove(pdata); unregister_netdev(ndev); - pdata->port_ops->shutdown(pdata); xgene_enet_delete_desc_rings(pdata); + pdata->port_ops->shutdown(pdata); free_netdev(ndev); return 0; diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c b/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c index 31df8f3..b1a83fd 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c @@ -534,26 +534,6 @@ static void xgene_enet_clear(struct xgene_enet_pdata *pdata, static void xgene_enet_shutdown(struct xgene_enet_pdata *p) { struct device *dev = >pdev->dev; - struct xgene_enet_desc_ring *ring; - u32 pb; - int i; - - pb = 0; - for (i = 0; i < p->rxq_cnt; i++) { - ring = p->rx_ring[i]->buf_pool; - pb |= BIT(xgene_enet_get_fpsel(ring->id)); - ring = p->rx_ring[i]->page_pool; - if (ring) - pb |= BIT(xgene_enet_get_fpsel(ring->id)); - } - xgene_enet_wr_ring_if(p, ENET_CFGSSQMIFPRESET_ADDR, pb); - - pb = 0; - for (i = 0; i < p->txq_cnt; i++) { - ring = p->tx_ring[i]; - pb |= BIT(xgene_enet_ring_bufnum(ring->id)); - } - xgene_enet_wr_ring_if(p, ENET_CFGSSQMIWQRESET_ADDR, pb); if (dev->of_node) { if (!IS_ERR(p->clk)) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c b/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c index eab6f1c..b7d75d0 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c @@ -446,26 +446,6 @@ static void xgene_enet_xgcle_bypass(struct xgene_enet_pdata *pdata, static void xgene_enet_shutdown(struct xgene_enet_pdata *pdata) { struct device *dev = >pdev->dev; - struct xgene_enet_desc_ring *ring; - u32 pb; - int i; - - pb = 0; - for (i = 0; i < pdata->rxq_cnt; i++) { - ring = pdata->rx_ring[i]->buf_pool; - pb |= BIT(xgene_enet_get_fpsel(ring->id)); - ring = pdata->rx_ring[i]->page_pool; - if (ring) - pb |= BIT(xgene_enet_get_fpsel(ring->id)); - } - xgene_enet_wr_ring_if(pdata,
[PATCH v2 net-next 02/12] drivers: net: phy: xgene: Add lock to protect mac access
From: Quan NguyenThis patch, - refactors mac access routine - adds lock to protect mac indirect access Signed-off-by: Quan Nguyen Signed-off-by: Iyappan Subramanian --- drivers/net/phy/mdio-xgene.c | 74 ++-- drivers/net/phy/mdio-xgene.h | 3 ++ 2 files changed, 40 insertions(+), 37 deletions(-) diff --git a/drivers/net/phy/mdio-xgene.c b/drivers/net/phy/mdio-xgene.c index 3e2ac07..bfd3090 100644 --- a/drivers/net/phy/mdio-xgene.c +++ b/drivers/net/phy/mdio-xgene.c @@ -34,76 +34,73 @@ static bool xgene_mdio_status; -static u32 xgene_enet_rd_mac(void __iomem *base_addr, u32 rd_addr) +u32 xgene_mdio_rd_mac(struct xgene_mdio_pdata *pdata, u32 rd_addr) { void __iomem *addr, *rd, *cmd, *cmd_done; u32 done, rd_data = BUSY_MASK; u8 wait = 10; - addr = base_addr + MAC_ADDR_REG_OFFSET; - rd = base_addr + MAC_READ_REG_OFFSET; - cmd = base_addr + MAC_COMMAND_REG_OFFSET; - cmd_done = base_addr + MAC_COMMAND_DONE_REG_OFFSET; + addr = pdata->mac_csr_addr + MAC_ADDR_REG_OFFSET; + rd = pdata->mac_csr_addr + MAC_READ_REG_OFFSET; + cmd = pdata->mac_csr_addr + MAC_COMMAND_REG_OFFSET; + cmd_done = pdata->mac_csr_addr + MAC_COMMAND_DONE_REG_OFFSET; + spin_lock(>mac_lock); iowrite32(rd_addr, addr); iowrite32(XGENE_ENET_RD_CMD, cmd); - while (wait--) { - done = ioread32(cmd_done); - if (done) - break; + while (!(done = ioread32(cmd_done)) && wait--) udelay(1); - } - if (!done) - return rd_data; + if (done) + rd_data = ioread32(rd); - rd_data = ioread32(rd); iowrite32(0, cmd); + spin_unlock(>mac_lock); return rd_data; } +EXPORT_SYMBOL(xgene_mdio_rd_mac); -static void xgene_enet_wr_mac(void __iomem *base_addr, u32 wr_addr, u32 wr_data) +void xgene_mdio_wr_mac(struct xgene_mdio_pdata *pdata, u32 wr_addr, u32 data) { void __iomem *addr, *wr, *cmd, *cmd_done; u8 wait = 10; u32 done; - addr = base_addr + MAC_ADDR_REG_OFFSET; - wr = base_addr + MAC_WRITE_REG_OFFSET; - cmd = base_addr + MAC_COMMAND_REG_OFFSET; - cmd_done = base_addr + MAC_COMMAND_DONE_REG_OFFSET; + addr = pdata->mac_csr_addr + MAC_ADDR_REG_OFFSET; + wr = pdata->mac_csr_addr + MAC_WRITE_REG_OFFSET; + cmd = pdata->mac_csr_addr + MAC_COMMAND_REG_OFFSET; + cmd_done = pdata->mac_csr_addr + MAC_COMMAND_DONE_REG_OFFSET; + spin_lock(>mac_lock); iowrite32(wr_addr, addr); - iowrite32(wr_data, wr); + iowrite32(data, wr); iowrite32(XGENE_ENET_WR_CMD, cmd); - while (wait--) { - done = ioread32(cmd_done); - if (done) - break; + while (!(done = ioread32(cmd_done)) && wait--) udelay(1); - } if (!done) pr_err("MCX mac write failed, addr: 0x%04x\n", wr_addr); iowrite32(0, cmd); + spin_unlock(>mac_lock); } +EXPORT_SYMBOL(xgene_mdio_wr_mac); int xgene_mdio_rgmii_read(struct mii_bus *bus, int phy_id, int reg) { - void __iomem *addr = (void __iomem *)bus->priv; + struct xgene_mdio_pdata *pdata = (struct xgene_mdio_pdata *)bus->priv; u32 data, done; u8 wait = 10; data = SET_VAL(PHY_ADDR, phy_id) | SET_VAL(REG_ADDR, reg); - xgene_enet_wr_mac(addr, MII_MGMT_ADDRESS_ADDR, data); - xgene_enet_wr_mac(addr, MII_MGMT_COMMAND_ADDR, READ_CYCLE_MASK); + xgene_mdio_wr_mac(pdata, MII_MGMT_ADDRESS_ADDR, data); + xgene_mdio_wr_mac(pdata, MII_MGMT_COMMAND_ADDR, READ_CYCLE_MASK); do { usleep_range(5, 10); - done = xgene_enet_rd_mac(addr, MII_MGMT_INDICATORS_ADDR); + done = xgene_mdio_rd_mac(pdata, MII_MGMT_INDICATORS_ADDR); } while ((done & BUSY_MASK) && wait--); if (done & BUSY_MASK) { @@ -111,8 +108,8 @@ int xgene_mdio_rgmii_read(struct mii_bus *bus, int phy_id, int reg) return -EBUSY; } - data = xgene_enet_rd_mac(addr, MII_MGMT_STATUS_ADDR); - xgene_enet_wr_mac(addr, MII_MGMT_COMMAND_ADDR, 0); + data = xgene_mdio_rd_mac(pdata, MII_MGMT_STATUS_ADDR); + xgene_mdio_wr_mac(pdata, MII_MGMT_COMMAND_ADDR, 0); return data; } @@ -120,17 +117,17 @@ int xgene_mdio_rgmii_read(struct mii_bus *bus, int phy_id, int reg) int xgene_mdio_rgmii_write(struct mii_bus *bus, int phy_id, int reg, u16 data) { - void __iomem *addr = (void __iomem *)bus->priv; + struct xgene_mdio_pdata *pdata = (struct xgene_mdio_pdata *)bus->priv; u32 val, done; u8 wait = 10; val = SET_VAL(PHY_ADDR, phy_id) | SET_VAL(REG_ADDR, reg); - xgene_enet_wr_mac(addr, MII_MGMT_ADDRESS_ADDR, val); +
[PATCH v2 net-next 11/12] drivers: net: xgene: Workaround for HW errata 10GE_10/ENET_15
From: Quan NguyenThis patch adds workaround for HW errata 10GE_10 and ENET_15: "HW statistic counters value are duplicated". - RFCS duplicates RALN counter - RFLR duplicates RUND counter - TFCS duplicates TFRG counter - RALN should be intepreted as 0 in 10G mode Signed-off-by: Quan Nguyen Signed-off-by: Iyappan Subramanian --- .../net/ethernet/apm/xgene/xgene_enet_ethtool.c| 33 ++ drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 20 +++-- drivers/net/ethernet/apm/xgene/xgene_enet_main.h | 2 ++ 3 files changed, 46 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c index 5c2c84a..0fdec78 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c @@ -71,6 +71,7 @@ struct xgene_gstrings_stats { XGENE_EXTD_STAT(rx_oversize_pkt_cntr, ROVR, 16), XGENE_EXTD_STAT(rx_fragments_cntr, RFRG, 16), XGENE_EXTD_STAT(rx_jabber_cntr, RJBR, 16), + XGENE_EXTD_STAT(rx_jabber_recov_cntr, DUMP, 0), XGENE_EXTD_STAT(rx_dropped_pkt_cntr, RDRP, 16), XGENE_EXTD_STAT(rx_overrun_cntr, DUMP, 0), XGENE_EXTD_STAT(tx_multicast_pkt_cntr, TMCA, 31), @@ -96,9 +97,16 @@ struct xgene_gstrings_stats { #define XGENE_STATS_LENARRAY_SIZE(gstrings_stats) #define XGENE_EXTD_STATS_LEN ARRAY_SIZE(gstrings_extd_stats) +#define RFCS_IDX 7 +#define RALN_IDX 13 +#define RFLR_IDX 14 #define FALSE_RFLR_IDX 15 -#define RX_OVERRUN_IDX 23 -#define TX_UNDERRUN_IDX42 +#define RUND_IDX 18 +#define FALSE_RJBR_IDX 22 +#define RX_OVERRUN_IDX 24 +#define TFCS_IDX 38 +#define TFRG_IDX 42 +#define TX_UNDERRUN_IDX43 static void xgene_get_drvinfo(struct net_device *ndev, struct ethtool_drvinfo *info) @@ -218,14 +226,25 @@ static int xgene_get_sset_count(struct net_device *ndev, int sset) static void xgene_get_extd_stats(struct xgene_enet_pdata *pdata) { u32 rx_drop, tx_drop; - u32 tmp; + u32 mask, tmp; int i; for (i = 0; i < XGENE_EXTD_STATS_LEN; i++) { tmp = xgene_enet_rd_stat(pdata, gstrings_extd_stats[i].addr); - if (gstrings_extd_stats[i].mask) - pdata->extd_stats[i] += tmp & - GENMASK(gstrings_extd_stats[i].mask - 1, 0); + if (gstrings_extd_stats[i].mask) { + mask = GENMASK(gstrings_extd_stats[i].mask - 1, 0); + pdata->extd_stats[i] += (tmp & mask); + } + } + + if (pdata->phy_mode == PHY_INTERFACE_MODE_XGMII) { + /* Errata 10GE_10 - SW should intepret RALN as 0 */ + pdata->extd_stats[RALN_IDX] = 0; + } else { + /* Errata ENET_15 - Fixes RFCS, RFLR, TFCS counter */ + pdata->extd_stats[RFCS_IDX] -= pdata->extd_stats[RALN_IDX]; + pdata->extd_stats[RFLR_IDX] -= pdata->extd_stats[RUND_IDX]; + pdata->extd_stats[TFCS_IDX] -= pdata->extd_stats[TFRG_IDX]; } pdata->mac_ops->get_drop_cnt(pdata, _drop, _drop); @@ -234,6 +253,8 @@ static void xgene_get_extd_stats(struct xgene_enet_pdata *pdata) /* Errata 10GE_8 - Update Frame recovered from Errata 10GE_8/ENET_11 */ pdata->extd_stats[FALSE_RFLR_IDX] = pdata->false_rflr; + /* Errata ENET_15 - Jabber Frame recov'ed from Errata 10GE_10/ENET_15 */ + pdata->extd_stats[FALSE_RJBR_IDX] = pdata->vlan_rjbr; } int xgene_extd_stats_init(struct xgene_enet_pdata *pdata) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c index c2d38da..01e389d 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c @@ -656,6 +656,18 @@ static void xgene_enet_free_pagepool(struct xgene_enet_desc_ring *buf_pool, buf_pool->head = head; } +/* Errata 10GE_10 and ENET_15 - Fix duplicated HW statistic counters */ +static bool xgene_enet_errata_10GE_10(struct sk_buff *skb, u32 len, u8 status) +{ + if (status == INGRESS_CRC && + len >= (ETHER_STD_PACKET + 1) && + len <= (ETHER_STD_PACKET + 4) && + skb->protocol == htons(ETH_P_8021Q)) + return true; + + return false; +} + /* Errata 10GE_8 and ENET_11 - allow packet with length <=64B */ static bool xgene_enet_errata_10GE_8(struct sk_buff *skb, u32 len, u8 status) { @@ -706,14 +718,16 @@ static int xgene_enet_rx_frame(struct xgene_enet_desc_ring *rx_ring, status = (GET_VAL(ELERR, le64_to_cpu(raw_desc->m0)) << LERR_LEN) | GET_VAL(LERR,
[PATCH v2 net-next 08/12] drivers: net: xgene: Add rx_overrun/tx_underrun statistics
This patch adds rx_overrun and tx_underrun ethtool statistic counters. Signed-off-by: Quan NguyenSigned-off-by: Iyappan Subramanian --- drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c | 16 +--- drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 11 +++ drivers/net/ethernet/apm/xgene/xgene_enet_hw.h | 8 drivers/net/ethernet/apm/xgene/xgene_enet_main.h| 1 + drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c | 14 ++ drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c | 11 +++ drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.h | 4 7 files changed, 62 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c index a7eed3b..6b2a4b9 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c @@ -71,6 +71,7 @@ struct xgene_gstrings_stats { XGENE_EXTD_STAT(rx_fragments_cntr, RFRG, 16), XGENE_EXTD_STAT(rx_jabber_cntr, RJBR, 16), XGENE_EXTD_STAT(rx_dropped_pkt_cntr, RDRP, 16), + XGENE_EXTD_STAT(rx_overrun_cntr, DUMP, 0), XGENE_EXTD_STAT(tx_multicast_pkt_cntr, TMCA, 31), XGENE_EXTD_STAT(tx_broadcast_pkt_cntr, TBCA, 31), XGENE_EXTD_STAT(tx_pause_ctrl_frame_cntr, TXPF, 16), @@ -88,11 +89,14 @@ struct xgene_gstrings_stats { XGENE_EXTD_STAT(tx_ctrl_frame_cntr, TXCF, 12), XGENE_EXTD_STAT(tx_oversize_frame_cntr, TOVR, 12), XGENE_EXTD_STAT(tx_undersize_frame_cntr, TUND, 12), - XGENE_EXTD_STAT(tx_fragments_cntr, TFRG, 12) + XGENE_EXTD_STAT(tx_fragments_cntr, TFRG, 12), + XGENE_EXTD_STAT(tx_underrun_cntr, DUMP, 0) }; #define XGENE_STATS_LENARRAY_SIZE(gstrings_stats) #define XGENE_EXTD_STATS_LEN ARRAY_SIZE(gstrings_extd_stats) +#define RX_OVERRUN_IDX 22 +#define TX_UNDERRUN_IDX41 static void xgene_get_drvinfo(struct net_device *ndev, struct ethtool_drvinfo *info) @@ -211,14 +215,20 @@ static int xgene_get_sset_count(struct net_device *ndev, int sset) static void xgene_get_extd_stats(struct xgene_enet_pdata *pdata) { + u32 rx_drop, tx_drop; u32 tmp; int i; for (i = 0; i < XGENE_EXTD_STATS_LEN; i++) { tmp = xgene_enet_rd_stat(pdata, gstrings_extd_stats[i].addr); - pdata->extd_stats[i] += tmp & - GENMASK(gstrings_extd_stats[i].mask - 1, 0); + if (gstrings_extd_stats[i].mask) + pdata->extd_stats[i] += tmp & + GENMASK(gstrings_extd_stats[i].mask - 1, 0); } + + pdata->mac_ops->get_drop_cnt(pdata, _drop, _drop); + pdata->extd_stats[RX_OVERRUN_IDX] += rx_drop; + pdata->extd_stats[TX_UNDERRUN_IDX] += tx_drop; } int xgene_extd_stats_init(struct xgene_enet_pdata *pdata) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c index 9e89193..5278d62 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c @@ -618,6 +618,16 @@ static void xgene_gmac_init(struct xgene_enet_pdata *pdata) xgene_enet_wr_csr(pdata, CFG_BYPASS_ADDR, RESUME_TX); } +static void xgene_gmac_get_drop_cnt(struct xgene_enet_pdata *pdata, + u32 *rx, u32 *tx) +{ + u32 count; + + xgene_enet_rd_mcx_csr(pdata, ICM_ECM_DROP_COUNT_REG0_ADDR, ); + *rx = ICM_DROP_COUNT(count); + *tx = ECM_DROP_COUNT(count); +} + static void xgene_enet_config_ring_if_assoc(struct xgene_enet_pdata *pdata) { u32 val = 0x; @@ -1027,6 +1037,7 @@ void xgene_enet_mdio_remove(struct xgene_enet_pdata *pdata) .tx_enable = xgene_gmac_tx_enable, .rx_disable = xgene_gmac_rx_disable, .tx_disable = xgene_gmac_tx_disable, + .get_drop_cnt = xgene_gmac_get_drop_cnt, .set_speed = xgene_gmac_set_speed, .set_mac_addr = xgene_gmac_set_mac_addr, .set_framesize = xgene_enet_set_frame_size, diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h index f1a4cfa..5d3e18d 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h @@ -192,6 +192,10 @@ enum xgene_enet_rm { #define CFG_CLE_NXTFPSEL0(val) (((val) << 20) & GENMASK(23, 20)) #define ICM_CONFIG0_REG_0_ADDR 0x0400 #define ICM_CONFIG2_REG_0_ADDR 0x0410 +#define ECM_CONFIG0_REG_0_ADDR 0x0500 +#define ECM_CONFIG0_REG_1_ADDR 0x0504 +#define ICM_ECM_DROP_COUNT_REG0_ADDR 0x0508 +#define ICM_ECM_DROP_COUNT_REG1_ADDR 0x050c #define RX_DV_GATE_REG_0_ADDR 0x05fc #define TX_DV_GATE_EN0 BIT(2) #define RX_DV_GATE_EN0
[PATCH v2 net-next 03/12] drivers: net: xgene: Use rgmii mdio mac access
From: Quan NguyenThis patch switches to use rgmii mdio mac access routines if available, as they share the same HW. Signed-off-by: Quan Nguyen Signed-off-by: Iyappan Subramanian --- drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c index 2050c58..47c5b75 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c @@ -277,6 +277,13 @@ void xgene_enet_wr_mac(struct xgene_enet_pdata *pdata, u32 wr_addr, u32 wr_data) u8 wait = 10; u32 done; + if (pdata->mdio_driver && ndev->phydev && + pdata->phy_mode == PHY_INTERFACE_MODE_RGMII) { + struct mii_bus *bus = ndev->phydev->mdio.bus; + + return xgene_mdio_wr_mac(bus->priv, wr_addr, wr_data); + } + addr = pdata->mcx_mac_addr + MAC_ADDR_REG_OFFSET; wr = pdata->mcx_mac_addr + MAC_WRITE_REG_OFFSET; cmd = pdata->mcx_mac_addr + MAC_COMMAND_REG_OFFSET; @@ -328,6 +335,13 @@ u32 xgene_enet_rd_mac(struct xgene_enet_pdata *pdata, u32 rd_addr) u32 done, rd_data; u8 wait = 10; + if (pdata->mdio_driver && pdata->ndev->phydev && + pdata->phy_mode == PHY_INTERFACE_MODE_RGMII) { + struct mii_bus *bus = pdata->ndev->phydev->mdio.bus; + + return xgene_mdio_rd_mac(bus->priv, rd_addr); + } + addr = pdata->mcx_mac_addr + MAC_ADDR_REG_OFFSET; rd = pdata->mcx_mac_addr + MAC_READ_REG_OFFSET; cmd = pdata->mcx_mac_addr + MAC_COMMAND_REG_OFFSET; -- 1.9.1
[PATCH v2 net-next 10/12] drivers: net: xgene: Add frame recovered statistics counter for errata 10GE_8/ENET_11
From: Quan NguyenThis patch adds statistic counter for frames recovered from HW errata 10GE_8 and ENET_11: "HW reports Length error for valid 64 byte frames with len <46 bytes". Signed-off-by: Quan Nguyen Signed-off-by: Iyappan Subramanian --- drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c | 9 +++-- drivers/net/ethernet/apm/xgene/xgene_enet_main.c| 2 ++ drivers/net/ethernet/apm/xgene/xgene_enet_main.h| 1 + 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c index 6b2a4b9..5c2c84a 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c @@ -64,6 +64,7 @@ struct xgene_gstrings_stats { XGENE_EXTD_STAT(rx_unk_opcode_cntr, RXUO, 16), XGENE_EXTD_STAT(rx_align_err_cntr, RALN, 16), XGENE_EXTD_STAT(rx_frame_len_err_cntr, RFLR, 16), + XGENE_EXTD_STAT(rx_frame_len_err_recov_cntr, DUMP, 0), XGENE_EXTD_STAT(rx_code_err_cntr, RCDE, 16), XGENE_EXTD_STAT(rx_carrier_sense_err_cntr, RCSE, 16), XGENE_EXTD_STAT(rx_undersize_pkt_cntr, RUND, 16), @@ -95,8 +96,9 @@ struct xgene_gstrings_stats { #define XGENE_STATS_LENARRAY_SIZE(gstrings_stats) #define XGENE_EXTD_STATS_LEN ARRAY_SIZE(gstrings_extd_stats) -#define RX_OVERRUN_IDX 22 -#define TX_UNDERRUN_IDX41 +#define FALSE_RFLR_IDX 15 +#define RX_OVERRUN_IDX 23 +#define TX_UNDERRUN_IDX42 static void xgene_get_drvinfo(struct net_device *ndev, struct ethtool_drvinfo *info) @@ -229,6 +231,9 @@ static void xgene_get_extd_stats(struct xgene_enet_pdata *pdata) pdata->mac_ops->get_drop_cnt(pdata, _drop, _drop); pdata->extd_stats[RX_OVERRUN_IDX] += rx_drop; pdata->extd_stats[TX_UNDERRUN_IDX] += tx_drop; + + /* Errata 10GE_8 - Update Frame recovered from Errata 10GE_8/ENET_11 */ + pdata->extd_stats[FALSE_RFLR_IDX] = pdata->false_rflr; } int xgene_extd_stats_init(struct xgene_enet_pdata *pdata) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c index bd2486e..c2d38da 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c @@ -712,6 +712,8 @@ static int xgene_enet_rx_frame(struct xgene_enet_desc_ring *rx_ring, xgene_enet_parse_error(rx_ring, status); rx_ring->rx_dropped++; goto out; + } else { + pdata->false_rflr++; } } diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h index cabe54e..0f5f0b0 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h @@ -224,6 +224,7 @@ struct xgene_enet_pdata { enum xgene_enet_rm rm; struct xgene_enet_cle cle; u64 *extd_stats; + u64 false_rflr; spinlock_t stats_lock; /* statistics lock */ const struct xgene_mac_ops *mac_ops; spinlock_t mac_lock; /* mac lock */ -- 1.9.1
[PATCH v2 net-next 05/12] drivers: net: xgene: Refactor statistics error parsing code
From: Quan NguyenThis patch fixes the tx error counters and adds more rx error counters. Signed-off-by: Quan Nguyen Signed-off-by: Iyappan Subramanian --- drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 6 -- drivers/net/ethernet/apm/xgene/xgene_enet_hw.h | 2 -- drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 26 +++- drivers/net/ethernet/apm/xgene/xgene_enet_main.h | 2 ++ 4 files changed, 18 insertions(+), 18 deletions(-) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c index 47c5b75..02df577 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c @@ -205,30 +205,24 @@ static u32 xgene_enet_ring_len(struct xgene_enet_desc_ring *ring) } void xgene_enet_parse_error(struct xgene_enet_desc_ring *ring, - struct xgene_enet_pdata *pdata, enum xgene_enet_err_code status) { switch (status) { case INGRESS_CRC: ring->rx_crc_errors++; - ring->rx_dropped++; break; case INGRESS_CHECKSUM: case INGRESS_CHECKSUM_COMPUTE: ring->rx_errors++; - ring->rx_dropped++; break; case INGRESS_TRUNC_FRAME: ring->rx_frame_errors++; - ring->rx_dropped++; break; case INGRESS_PKT_LEN: ring->rx_length_errors++; - ring->rx_dropped++; break; case INGRESS_PKT_UNDER: ring->rx_frame_errors++; - ring->rx_dropped++; break; case INGRESS_FIFO_OVERRUN: ring->rx_fifo_errors++; diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h index 057f951..3ce349f 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h @@ -380,9 +380,7 @@ static inline u16 xgene_enet_get_numslots(u16 id, u32 size) } void xgene_enet_parse_error(struct xgene_enet_desc_ring *ring, - struct xgene_enet_pdata *pdata, enum xgene_enet_err_code status); - int xgene_enet_mdio_config(struct xgene_enet_pdata *pdata); void xgene_enet_mdio_remove(struct xgene_enet_pdata *pdata); bool xgene_ring_mgr_init(struct xgene_enet_pdata *p); diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c index e4f2ef2..3f24b83 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c @@ -246,9 +246,9 @@ static int xgene_enet_tx_completion(struct xgene_enet_desc_ring *cp_ring, skb_frag_t *frag; dma_addr_t *frag_dma_addr; u16 skb_index; - u8 status; - int i, ret = 0; u8 mss_index; + u8 status; + int i; skb_index = GET_VAL(USERINFO, le64_to_cpu(raw_desc->m0)); skb = cp_ring->cp_skb[skb_index]; @@ -275,19 +275,17 @@ static int xgene_enet_tx_completion(struct xgene_enet_desc_ring *cp_ring, /* Checking for error */ status = GET_VAL(LERR, le64_to_cpu(raw_desc->m0)); if (unlikely(status > 2)) { - xgene_enet_parse_error(cp_ring, netdev_priv(cp_ring->ndev), - status); - ret = -EIO; + cp_ring->tx_dropped++; + cp_ring->tx_errors++; } if (likely(skb)) { dev_kfree_skb_any(skb); } else { netdev_err(cp_ring->ndev, "completion skb is NULL\n"); - ret = -EIO; } - return ret; + return 0; } static int xgene_enet_setup_mss(struct net_device *ndev, u32 mss) @@ -711,7 +709,8 @@ static int xgene_enet_rx_frame(struct xgene_enet_desc_ring *rx_ring, if (!xgene_enet_errata_10GE_8(skb, datalen, status)) { dev_kfree_skb_any(skb); xgene_enet_free_pagepool(page_pool, raw_desc, exp_desc); - xgene_enet_parse_error(rx_ring, pdata, status); + xgene_enet_parse_error(rx_ring, status); + rx_ring->rx_dropped++; goto out; } } @@ -1477,6 +1476,8 @@ static void xgene_enet_get_stats64( if (ring) { stats->tx_packets += ring->tx_packets; stats->tx_bytes += ring->tx_bytes; + stats->tx_dropped += ring->tx_dropped; + stats->tx_errors += ring->tx_errors; } } @@ -1485,11 +1486,16 @@ static void xgene_enet_get_stats64( if (ring) { stats->rx_packets +=
[PATCH v2 net-next 04/12] drivers: net: xgene: Remove redundant local stats
From: Quan NguyenCommit 5944701df90d ("net: remove useless memset's in drivers get_stats64") makes the pdata->stats redundant. This patch removes pdata->stats and updates get_stats64() callback accordingly. Signed-off-by: Quan Nguyen Signed-off-by: Iyappan Subramanian --- drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c | 7 --- drivers/net/ethernet/apm/xgene/xgene_enet_main.c| 4 +--- drivers/net/ethernet/apm/xgene/xgene_enet_main.h| 1 - 3 files changed, 5 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c index 28fdedc..217cde8 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c @@ -25,7 +25,7 @@ struct xgene_gstrings_stats { int offset; }; -#define XGENE_STAT(m) { #m, offsetof(struct xgene_enet_pdata, stats.m) } +#define XGENE_STAT(m) { #m, offsetof(struct rtnl_link_stats64, m) } static const struct xgene_gstrings_stats gstrings_stats[] = { XGENE_STAT(rx_packets), @@ -156,11 +156,12 @@ static void xgene_get_ethtool_stats(struct net_device *ndev, struct ethtool_stats *dummy, u64 *data) { - void *pdata = netdev_priv(ndev); + struct rtnl_link_stats64 stats; int i; + dev_get_stats(ndev, ); for (i = 0; i < XGENE_STATS_LEN; i++) - *data++ = *(u64 *)(pdata + gstrings_stats[i].offset); + data[i] = *(u64 *)((char *) + gstrings_stats[i].offset); } static void xgene_get_pauseparam(struct net_device *ndev, diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c index 9a28ac3..e4f2ef2 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c @@ -1466,10 +1466,9 @@ static int xgene_enet_create_desc_rings(struct net_device *ndev) static void xgene_enet_get_stats64( struct net_device *ndev, - struct rtnl_link_stats64 *storage) + struct rtnl_link_stats64 *stats) { struct xgene_enet_pdata *pdata = netdev_priv(ndev); - struct rtnl_link_stats64 *stats = >stats; struct xgene_enet_desc_ring *ring; int i; @@ -1493,7 +1492,6 @@ static void xgene_enet_get_stats64( stats->rx_dropped += ring->rx_dropped; } } - memcpy(storage, stats, sizeof(struct rtnl_link_stats64)); } static int xgene_enet_set_mac_address(struct net_device *ndev, void *addr) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h index 827b33d..5e6fd71 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h @@ -219,7 +219,6 @@ struct xgene_enet_pdata { int phy_mode; enum xgene_enet_rm rm; struct xgene_enet_cle cle; - struct rtnl_link_stats64 stats; const struct xgene_mac_ops *mac_ops; spinlock_t mac_lock; /* mac lock */ const struct xgene_port_ops *port_ops; -- 1.9.1
[PATCH v2 net-next 09/12] drivers: net: xgene: Workaround for HW errata 10GE_4
From: Quan NguyenThis patch adds workaround for HW errata 10GE_4: "XGENET_ICM_ECM_DROP_COUNT_REG_0 reg not clear on read". Signed-off-by: Quan Nguyen Signed-off-by: Iyappan Subramanian --- drivers/net/ethernet/apm/xgene/xgene_enet_hw.c| 2 ++ drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c | 5 + drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c | 2 ++ 3 files changed, 9 insertions(+) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c index 5278d62..3235d0c 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c @@ -626,6 +626,8 @@ static void xgene_gmac_get_drop_cnt(struct xgene_enet_pdata *pdata, xgene_enet_rd_mcx_csr(pdata, ICM_ECM_DROP_COUNT_REG0_ADDR, ); *rx = ICM_DROP_COUNT(count); *tx = ECM_DROP_COUNT(count); + /* Errata: 10GE_4 - Fix ICM_ECM_DROP_COUNT not clear-on-read */ + xgene_enet_rd_mcx_csr(pdata, ECM_CONFIG0_REG_0_ADDR, ); } static void xgene_enet_config_ring_if_assoc(struct xgene_enet_pdata *pdata) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c b/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c index 2919d3e..31df8f3 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c @@ -106,6 +106,11 @@ static void xgene_sgmac_get_drop_cnt(struct xgene_enet_pdata *pdata, count = xgene_enet_rd_mcx_csr(pdata, addr); *rx = ICM_DROP_COUNT(count); *tx = ECM_DROP_COUNT(count); + /* Errata: 10GE_4 - ICM_ECM_DROP_COUNT not clear-on-read */ + addr = (pdata->enet_id != XGENE_ENET1) ? + XG_MCX_ECM_CONFIG0_REG_0_ADDR : + ECM_CONFIG0_REG_0_ADDR + pdata->port_id * OFFSET_4; + xgene_enet_rd_mcx_csr(pdata, addr); } static void xgene_enet_config_ring_if_assoc(struct xgene_enet_pdata *p) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c b/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c index 56dff38..eab6f1c 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c @@ -188,6 +188,8 @@ static void xgene_xgmac_get_drop_cnt(struct xgene_enet_pdata *pdata, xgene_enet_rd_axg_csr(pdata, XGENET_ICM_ECM_DROP_COUNT_REG0, ); *rx = ICM_DROP_COUNT(count); *tx = ECM_DROP_COUNT(count); + /* Errata: 10GE_4 - ICM_ECM_DROP_COUNT not clear-on-read */ + xgene_enet_rd_axg_csr(pdata, XGENET_ECM_CONFIG0_REG_0, ); } static void xgene_enet_config_ring_if_assoc(struct xgene_enet_pdata *pdata) -- 1.9.1
Re: Requirements for a shutdown function?
On 05/09/2017 02:06 PM, Florian Fainelli wrote: > On 05/09/2017 11:51 AM, Timur Tabi wrote: >> Is it possible that the network stack detects a kexec and automatically >> stops all network devices? > > No. why would it? However the device driver model does call into your > driver's remove function and that one does a right job already because > it does an network device unregister, and so on. I ran some more tests. When I launch kexec, the driver's net_device_ops.ndo_stop function is called, which already stops the interface. So it looks to me as if the network stack does close the interface during a kexec. With the interface closed, there's no point in having a shutdown function, is there? >> My in-house driver stops the RX and TX queues. I'm guessing that's good >> enough, but I don't have a failing test case to prove it. >> > > That's probably good enough, yes. Except that it turns out that the queues are already stopped by then because the emac_close() function has already been called. -- Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [Patch net] ipv4: restore rt->fi for reference counting
Hello, On Wed, 10 May 2017, Cong Wang wrote: > On Wed, May 10, 2017 at 12:38 AM, Julian Anastasovwrote: > > > > During NETDEV_UNREGISTER packets for dev should not > > be flying but packets for other devs can walk the nexthops > > for multipath routes. It is the rcu_barrier before > > NETDEV_UNREGISTER_FINAL that allows nh_dev to be set to NULL > > during this grace period but there are many places to fix that > > assume nh_dev!=NULL. > > Excellent point. Unfortunately NETDEV_UNREGISTER_FINAL is not > yet captured by the fib event notifier. > > I think readers are still safe to assume nh_dev!=NULL since we wait > for existing readers, readers coming later can't see it any more. Or > am I missing anything? It is not safe because other devs can deliver packets that still can see the multipath fib info in the FIB nodes. CPU 1 (unreg dev) CPU 2 (packet to another dev2) NETDEV_DOWN => set RTNH_F_DEAD Now traffic sent via downed dev should stop flush_all_backlogs synchronize_net Now received traffic for dev should stop on all CPUs fib_sync_down_dev nh_dev = NULL fib_table_lookup: boom in __in_dev_get_rcu(nh->nh_dev), all nh_dev dereferences must be checked. If checked, it should be safe to use this nh_dev due to the rcu_barrier in netdev_run_todo. But only the structure can be accessed, traffic should not go via unreged dev. fib_flush: unlink fi from fa_info in fib_table_flush Now other CPUs will not see this fi on lookups (after fib_table_flush) netdev_run_todo rcu_barrier Now other CPUs should not send packets via this fib info NETDEV_UNREGISTER_FINAL Notes: - when NETDEV_DOWN is delivered fib_sync_down_dev sets RTNH_F_DEAD but only for link routes, not for host routes - fib_table_lookup runs without any memory ordering barriers, when CPU 2 delivers packet from different dev it can hit a multipath route with nh_dev set to NULL. Not so often if RTNH_F_DEAD was set early and properly checked before dereferencing nh_dev. > > But why we leak routes? Is there some place that holds > > routes without listening for NETDEV_UNREGISTER? On fib_flush > > the infos are unlinked from trees, so after a grace period packets > > should not see/hold such infos. If we hold routes somewhere for > > long time, problem can happen also for routes with single nexthop. > > My theory is that dst which holds a refcnt to fib_info (of course, after > this patch) is moved in GC after NETDEV_UNREGISTER but still > referenced somewhere, it therefore holds these nh_dev's, which > in turn hold back to the netdevice which is being unregistered, thus > Eric saw these refcnt leak warnings. Oh, well, the sockets can hold cached dst. But if the promise is that rt->fi is used only as reference to metrics we have to find a way to drop the dev references at NETDEV_UNREGISTER time. If you set nh_dev to NULL then all lookups should check it for != NULL. The sockets will not walk NHs via rt->fi, i.e. the route lookups will get valid res.fi from trees, so it may work in this way. Regards -- Julian Anastasov
[PATCH] net-procfs: Adjust buffer output in dev_mc_seq_show()
From: Markus ElfringDate: Wed, 10 May 2017 21:17:43 +0200 * Use a special format string specification for the desired output of the array "addr" into a sequence. This issue was detected by using the Coccinelle software. * Delete the local variable "i" and a call of the function "seq_putc" which became unnecessary with this refactoring. Signed-off-by: Markus Elfring --- net/core/net-procfs.c | 8 +--- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c index 14d09345f00d..6aa427a2fb98 100644 --- a/net/core/net-procfs.c +++ b/net/core/net-procfs.c @@ -363,15 +363,9 @@ static int dev_mc_seq_show(struct seq_file *seq, void *v) netif_addr_lock_bh(dev); netdev_for_each_mc_addr(ha, dev) { - int i; - seq_printf(seq, "%-4d %-15s %-5d %-5d ", dev->ifindex, dev->name, ha->refcount, ha->global_use); - - for (i = 0; i < dev->addr_len; i++) - seq_printf(seq, "%02x", ha->addr[i]); - - seq_putc(seq, '\n'); + seq_printf(seq, "%*phN\n", dev->addr_len, ha->addr); } netif_addr_unlock_bh(dev); return 0; -- 2.12.3
[PATCH] libertas: Avoid reading past end of buffer
Using memcpy() from a string that is shorter than the length copied means the destination buffer is being filled with arbitrary data from the kernel rodata segment. Instead, redefine the stat strings to be ETH_GSTRING_LEN sizes, like other drivers. This lets us use a single memcpy that does not leak rodata contents. Additionally adjust indentation to keep checkpatch.pl happy. This was found with the future CONFIG_FORTIFY_SOURCE feature. Cc: Daniel MicaySigned-off-by: Kees Cook --- v2: use ETH_GSTRING_LEN; joe --- drivers/net/wireless/marvell/libertas/mesh.c | 26 ++ 1 file changed, 10 insertions(+), 16 deletions(-) diff --git a/drivers/net/wireless/marvell/libertas/mesh.c b/drivers/net/wireless/marvell/libertas/mesh.c index d0c881dd5846..6076c83ce5ab 100644 --- a/drivers/net/wireless/marvell/libertas/mesh.c +++ b/drivers/net/wireless/marvell/libertas/mesh.c @@ -1108,15 +1108,15 @@ void lbs_mesh_set_txpd(struct lbs_private *priv, * Ethtool related */ -static const char * const mesh_stat_strings[] = { - "drop_duplicate_bcast", - "drop_ttl_zero", - "drop_no_fwd_route", - "drop_no_buffers", - "fwded_unicast_cnt", - "fwded_bcast_cnt", - "drop_blind_table", - "tx_failed_cnt" +static const char mesh_stat_strings[MESH_STATS_NUM][ETH_GSTRING_LEN] = { + "drop_duplicate_bcast", + "drop_ttl_zero", + "drop_no_fwd_route", + "drop_no_buffers", + "fwded_unicast_cnt", + "fwded_bcast_cnt", + "drop_blind_table", + "tx_failed_cnt" }; void lbs_mesh_ethtool_get_stats(struct net_device *dev, @@ -1170,17 +1170,11 @@ int lbs_mesh_ethtool_get_sset_count(struct net_device *dev, int sset) void lbs_mesh_ethtool_get_strings(struct net_device *dev, uint32_t stringset, uint8_t *s) { - int i; - lbs_deb_enter(LBS_DEB_ETHTOOL); switch (stringset) { case ETH_SS_STATS: - for (i = 0; i < MESH_STATS_NUM; i++) { - memcpy(s + i * ETH_GSTRING_LEN, - mesh_stat_strings[i], - ETH_GSTRING_LEN); - } + memcpy(s, *mesh_stat_strings, sizeof(mesh_stat_strings)); break; } lbs_deb_enter(LBS_DEB_ETHTOOL); -- 2.7.4 -- Kees Cook Pixel Security
[PATCH 4/5] bpf: Add bpf_verify_program() to the library.
This allows a test case to load a BPF program and unconditionally acquire the verifier log. It also allows specification of the strict alignment flag. Signed-off-by: David S. Miller--- tools/lib/bpf/bpf.c | 22 ++ tools/lib/bpf/bpf.h | 4 2 files changed, 26 insertions(+) diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c index 4fe444b80..6e17898 100644 --- a/tools/lib/bpf/bpf.c +++ b/tools/lib/bpf/bpf.c @@ -117,6 +117,28 @@ int bpf_load_program(enum bpf_prog_type type, const struct bpf_insn *insns, return sys_bpf(BPF_PROG_LOAD, , sizeof(attr)); } +int bpf_verify_program(enum bpf_prog_type type, const struct bpf_insn *insns, + size_t insns_cnt, int strict_alignment, + const char *license, __u32 kern_version, + char *log_buf, size_t log_buf_sz) +{ + union bpf_attr attr; + + bzero(, sizeof(attr)); + attr.prog_type = type; + attr.insn_cnt = (__u32)insns_cnt; + attr.insns = ptr_to_u64(insns); + attr.license = ptr_to_u64(license); + attr.log_buf = ptr_to_u64(log_buf); + attr.log_size = log_buf_sz; + attr.log_level = 2; + log_buf[0] = 0; + attr.kern_version = kern_version; + attr.prog_flags = strict_alignment ? BPF_F_STRICT_ALIGNMENT : 0; + + return sys_bpf(BPF_PROG_LOAD, , sizeof(attr)); +} + int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags) { diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h index edb4dae..972bd83 100644 --- a/tools/lib/bpf/bpf.h +++ b/tools/lib/bpf/bpf.h @@ -35,6 +35,10 @@ int bpf_load_program(enum bpf_prog_type type, const struct bpf_insn *insns, size_t insns_cnt, const char *license, __u32 kern_version, char *log_buf, size_t log_buf_sz); +int bpf_verify_program(enum bpf_prog_type type, const struct bpf_insn *insns, + size_t insns_cnt, int strict_alignment, + const char *license, __u32 kern_version, + char *log_buf, size_t log_buf_sz); int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags); -- 2.1.2.532.g19b5d50
[PATCH 5/5] bpf: Add verifier test case for alignment.
Signed-off-by: David S. Miller--- tools/testing/selftests/bpf/Makefile | 3 +- tools/testing/selftests/bpf/test_align.c | 417 +++ 2 files changed, 419 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/bpf/test_align.c diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index 91edd05..f92f27d 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -11,7 +11,8 @@ endif CFLAGS += -Wall -O2 -I$(APIDIR) -I$(LIBDIR) -I$(GENDIR) $(GENFLAGS) -I../../../include LDLIBS += -lcap -lelf -TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs +TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \ + test_align TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o diff --git a/tools/testing/selftests/bpf/test_align.c b/tools/testing/selftests/bpf/test_align.c new file mode 100644 index 000..9dd9794 --- /dev/null +++ b/tools/testing/selftests/bpf/test_align.c @@ -0,0 +1,417 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include + +#include + +#include "../../../include/linux/filter.h" + +#ifndef ARRAY_SIZE +# define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) +#endif + +#define MAX_INSNS 512 +#define MAX_MATCHES16 + +struct bpf_align_test { + const char *descr; + struct bpf_insn insns[MAX_INSNS]; + enum { + UNDEF, + ACCEPT, + REJECT + } result; + enum bpf_prog_type prog_type; + const char *matches[MAX_MATCHES]; +}; + +static struct bpf_align_test tests[] = { + { + .descr = "mov", + .insns = { + BPF_MOV64_IMM(BPF_REG_3, 2), + BPF_MOV64_IMM(BPF_REG_3, 4), + BPF_MOV64_IMM(BPF_REG_3, 8), + BPF_MOV64_IMM(BPF_REG_3, 16), + BPF_MOV64_IMM(BPF_REG_3, 32), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .matches = { + "1: R1=ctx R3=imm2,min_value=2,max_value=2,min_align=2 R10=fp", + "2: R1=ctx R3=imm4,min_value=4,max_value=4,min_align=4 R10=fp", + "3: R1=ctx R3=imm8,min_value=8,max_value=8,min_align=8 R10=fp", + "4: R1=ctx R3=imm16,min_value=16,max_value=16,min_align=16 R10=fp", + "5: R1=ctx R3=imm32,min_value=32,max_value=32,min_align=32 R10=fp", + }, + }, + { + .descr = "shift", + .insns = { + BPF_MOV64_IMM(BPF_REG_3, 1), + BPF_ALU64_IMM(BPF_LSH, BPF_REG_3, 1), + BPF_ALU64_IMM(BPF_LSH, BPF_REG_3, 1), + BPF_ALU64_IMM(BPF_LSH, BPF_REG_3, 1), + BPF_ALU64_IMM(BPF_LSH, BPF_REG_3, 1), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_3, 4), + BPF_MOV64_IMM(BPF_REG_4, 32), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_4, 1), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_4, 1), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_4, 1), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_4, 1), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .matches = { + "1: R1=ctx R3=imm1,min_value=1,max_value=1,min_align=1 R10=fp", + "2: R1=ctx R3=imm2,min_value=2,max_value=2,min_align=2 R10=fp", + "3: R1=ctx R3=imm4,min_value=4,max_value=4,min_align=4 R10=fp", + "4: R1=ctx R3=imm8,min_value=8,max_value=8,min_align=8 R10=fp", + "5: R1=ctx R3=imm16,min_value=16,max_value=16,min_align=16 R10=fp", + "6: R1=ctx R3=imm1,min_value=1,max_value=1,min_align=1 R10=fp", + "7: R1=ctx R3=imm1,min_value=1,max_value=1,min_align=1 R4=imm32,min_value=32,max_value=32,min_align=32 R10=fp", + "8: R1=ctx R3=imm1,min_value=1,max_value=1,min_align=1 R4=imm16,min_value=16,max_value=16,min_align=16 R10=fp", + "9: R1=ctx R3=imm1,min_value=1,max_value=1,min_align=1 R4=imm8,min_value=8,max_value=8,min_align=8 R10=fp", + "10: R1=ctx R3=imm1,min_value=1,max_value=1,min_align=1 R4=imm4,min_value=4,max_value=4,min_align=4 R10=fp", + "11: R1=ctx R3=imm1,min_value=1,max_value=1,min_align=1 R4=imm2,min_value=2,max_value=2,min_align=2 R10=fp", +
[PATCH 3/5] bpf: Add strict alignment flag for BPF_PROG_LOAD.
Add a new field, "prog_flags", and an initial flag value BPF_F_STRCIT_ALIGNMENT. When set, the verifier will enforce strict pointer alignment regardless of the setting of CONFIG_EFFICIENT_UNALIGNED_ACCESS. The verifier, in this mode, will also use a fixed value of "2" in place of NET_IP_ALIGN. This facilitates test cases that will exercise and validate this part of the verifier even when run on architectures where alignment doesn't matter. Signed-off-by: David S. Miller--- include/uapi/linux/bpf.h | 8 kernel/bpf/syscall.c | 5 - kernel/bpf/verifier.c | 20 tools/build/feature/test-bpf.c | 1 + tools/include/uapi/linux/bpf.h | 11 +-- 5 files changed, 38 insertions(+), 7 deletions(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 945a1f5..94dfa9d 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -132,6 +132,13 @@ enum bpf_attach_type { */ #define BPF_F_ALLOW_OVERRIDE (1U << 0) +/* If BPF_F_STRICT_ALIGNMENT is used in BPF_PROG_LOAD command, the + * verifier will perform strict alignment checking as if the kernel + * has been built with CONFIG_EFFICIENT_UNALIGNED_ACCESS not set, + * and NET_IP_ALIGN defined to 2. + */ +#define BPF_F_STRICT_ALIGNMENT (1U << 0) + #define BPF_PSEUDO_MAP_FD 1 /* flags for BPF_MAP_UPDATE_ELEM command */ @@ -177,6 +184,7 @@ union bpf_attr { __u32 log_size; /* size of user buffer */ __aligned_u64 log_buf;/* user supplied buffer */ __u32 kern_version; /* checked when prog_type=kprobe */ + __u32 prog_flags; }; struct { /* anonymous struct used by BPF_OBJ_* commands */ diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index fd2411f..265a0d8 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -783,7 +783,7 @@ struct bpf_prog *bpf_prog_get_type(u32 ufd, enum bpf_prog_type type) EXPORT_SYMBOL_GPL(bpf_prog_get_type); /* last field in 'union bpf_attr' used by this command */ -#defineBPF_PROG_LOAD_LAST_FIELD kern_version +#defineBPF_PROG_LOAD_LAST_FIELD prog_flags static int bpf_prog_load(union bpf_attr *attr) { @@ -796,6 +796,9 @@ static int bpf_prog_load(union bpf_attr *attr) if (CHECK_ATTR(BPF_PROG_LOAD)) return -EINVAL; + if (attr->prog_flags & ~BPF_F_STRICT_ALIGNMENT) + return -EINVAL; + /* copy eBPF program license from user space */ if (strncpy_from_user(license, u64_to_user_ptr(attr->license), sizeof(license) - 1) < 0) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index e1effce..a406555 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -796,6 +796,7 @@ static bool is_pointer_value(struct bpf_verifier_env *env, int regno) static int check_pkt_ptr_alignment(const struct bpf_reg_state *reg, int off, int size, bool strict) { + int ip_align; int reg_off; /* Byte size accesses are always allowed. */ @@ -812,10 +813,14 @@ static int check_pkt_ptr_alignment(const struct bpf_reg_state *reg, reg_off += reg->aux_off; } - /* skb->data is NET_IP_ALIGN-ed */ - if ((NET_IP_ALIGN + reg_off + off) % size != 0) { + /* skb->data is NET_IP_ALIGN-ed, but for strict alignment checking +* we force this to 2 which is universally what architectures use +* when they don't set CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS. +*/ + ip_align = strict ? 2 : NET_IP_ALIGN; + if ((ip_align + reg_off + off) % size != 0) { verbose("misaligned packet access off %d+%d+%d size %d\n", - NET_IP_ALIGN, reg_off, off, size); + ip_align, reg_off, off, size); return -EACCES; } @@ -833,10 +838,12 @@ static int check_val_ptr_alignment(const struct bpf_reg_state *reg, return 0; } +static bool strict_alignment; + static int check_ptr_alignment(const struct bpf_reg_state *reg, int off, int size) { - bool strict = false; + bool strict = strict_alignment; if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)) strict = true; @@ -3574,6 +3581,10 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr) } else { log_level = 0; } + if (attr->prog_flags & BPF_F_STRICT_ALIGNMENT) + strict_alignment = true; + else + strict_alignment = false; ret = replace_map_fd_with_map_ptr(env); if (ret < 0) @@ -3679,6 +3690,7 @@ int bpf_analyzer(struct bpf_prog *prog, const struct bpf_ext_analyzer_ops *ops, mutex_lock(_verifier_lock); log_level = 0; + strict_alignment = false;
[PATCH 2/5] bpf: Do per-instruction state dumping in verifier when log_level > 1.
If log_level > 1, do a state dump every instruction and emit it in a more compact way (without a leading newline). This will facilitate more sophisticated test cases which inspect the verifier log for register state. Signed-off-by: David S. Miller--- kernel/bpf/verifier.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 56230ff..e1effce 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2932,8 +2932,12 @@ static int do_check(struct bpf_verifier_env *env) goto process_bpf_exit; } - if (log_level && do_print_state) { - verbose("\nfrom %d to %d:", prev_insn_idx, insn_idx); + if (log_level > 1 || (log_level && do_print_state)) { + if (log_level > 1) + verbose("%d:", insn_idx); + else + verbose("\nfrom %d to %d:", + prev_insn_idx, insn_idx); print_verifier_state(>cur_state); do_print_state = false; } -- 2.1.2.532.g19b5d50
[PATCH 1/5] bpf: Track alignment of register values in the verifier.
Currently if we add only constant values to pointers we can fully validate the alignment, and properly check if we need to reject the program on CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS architectures. However, once an unknown value is introduced we only allow byte sized memory accesses which is too restrictive. Add logic to track the known minimum alignment of register values, and propagate this state into registers containing pointers. The most common paradigm that makes use of this new logic is computing the transport header using the IP header length field. For example: struct ethhdr *ep = skb->data; struct iphdr *iph = (struct iphdr *) (ep + 1); struct tcphdr *th; ... n = iph->ihl; th = ((void *)iph + (n * 4)); port = th->dest; The existing code will reject the load of th->dport because it cannot validate that the alignment is at least 2 once "n * 4" is added the the packet pointer. In the new code, the register holding "n * 4" will have a reg->min_align value of 4, because any value multiplied by 4 will be at least 4 byte aligned. (actually, the eBPF code emitted by the compiler in this case is most likely to use a shift left by 2, but the end result is identical) At the critical addition: th = ((void *)iph + (n * 4)); The register holding 'th' will start with reg->off value of 14. The pointer addition will transform that reg into something that looks like: reg->aux_off = 14 reg->aux_off_align = 4 Next, the verifier will look at the th->dest load, and it will see a load offset of 2, and first check: if (reg->aux_off_align % size) which will pass because aux_off_align is 4. reg_off will be computed: reg_off = reg->off; ... reg_off += reg->aux_off; plus we have off==2, and it will thus check: if ((NET_IP_ALIGN + reg_off + off) % size != 0) which evaluates to: if ((NET_IP_ALIGN + 14 + 2) % size != 0) On strict alignment architectures, NET_IP_ALIGN is 2, thus: if ((2 + 14 + 2) % size != 0) which passes. These pointer transformations and checks work regardless of whether the constant offset or the variable with known alignment is added first to the pointer register. Signed-off-by: David S. Miller--- include/linux/bpf_verifier.h | 3 ++ kernel/bpf/verifier.c| 114 +++ 2 files changed, 98 insertions(+), 19 deletions(-) diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 5efb4db..7c6a519 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -40,6 +40,9 @@ struct bpf_reg_state { */ s64 min_value; u64 max_value; + u32 min_align; + u32 aux_off; + u32 aux_off_align; }; enum bpf_stack_slot_type { diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index c5b56c9..56230ff 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -241,6 +241,12 @@ static void print_verifier_state(struct bpf_verifier_state *state) if (reg->max_value != BPF_REGISTER_MAX_RANGE) verbose(",max_value=%llu", (unsigned long long)reg->max_value); + if (reg->min_align) + verbose(",min_align=%u", reg->min_align); + if (reg->aux_off) + verbose(",aux_off=%u", reg->aux_off); + if (reg->aux_off_align) + verbose(",aux_off_align=%u", reg->aux_off_align); } for (i = 0; i < MAX_BPF_STACK; i += BPF_REG_SIZE) { if (state->stack_slot_type[i] == STACK_SPILL) @@ -466,6 +472,9 @@ static void init_reg_state(struct bpf_reg_state *regs) regs[i].imm = 0; regs[i].min_value = BPF_REGISTER_MIN_RANGE; regs[i].max_value = BPF_REGISTER_MAX_RANGE; + regs[i].min_align = 0; + regs[i].aux_off = 0; + regs[i].aux_off_align = 0; } /* frame pointer */ @@ -494,11 +503,17 @@ static void reset_reg_range_values(struct bpf_reg_state *regs, u32 regno) regs[regno].max_value = BPF_REGISTER_MAX_RANGE; } +static void reset_reg_align(struct bpf_reg_state *regs, u32 regno) +{ + regs[regno].min_align = 0; +} + static void mark_reg_unknown_value_and_range(struct bpf_reg_state *regs, u32 regno) { mark_reg_unknown_value(regs, regno); reset_reg_range_values(regs, regno); + reset_reg_align(regs, regno); } enum reg_arg_type { @@ -779,17 +794,28 @@ static bool is_pointer_value(struct bpf_verifier_env *env, int regno) } static int check_pkt_ptr_alignment(const struct bpf_reg_state *reg, - int off, int size) + int off, int size, bool strict) { - if (reg->id && size != 1) { -
[PATCH 0/5] bpf: Add alignment tracker to verifier.
This is the whole series, more details in the individual commit messages. First we add the alignment tracking logic to the verifier. Next, we work on building up infrastructure to facilitate regression testing of this facility. Finally, we add the "test_align" test case. Signed-off-by: David S. Miller
Re: [PATCH] libertas: Avoid reading past end of buffer
On Tue, May 9, 2017 at 9:33 PM, Joe Percheswrote: > On Tue, 2017-05-09 at 16:23 -0700, Kees Cook wrote: >> Using memcpy() from a string that is shorter than the length copied means >> the destination buffer is being filled with arbitrary data from the kernel >> rodata segment. Instead, use strncpy() which will fill the trailing bytes >> with zeros. Additionally adjust indentation to keep checkpatch.pl happy. >> >> This was found with the future CONFIG_FORTIFY_SOURCE feature. > [] >> diff --git a/drivers/net/wireless/marvell/libertas/mesh.c >> b/drivers/net/wireless/marvell/libertas/mesh.c > [] >> @@ -1177,9 +1177,9 @@ void lbs_mesh_ethtool_get_strings(struct net_device >> *dev, >> switch (stringset) { >> case ETH_SS_STATS: >> for (i = 0; i < MESH_STATS_NUM; i++) { >> - memcpy(s + i * ETH_GSTRING_LEN, >> - mesh_stat_strings[i], >> - ETH_GSTRING_LEN); >> + strncpy(s + i * ETH_GSTRING_LEN, >> + mesh_stat_strings[i], >> + ETH_GSTRING_LEN); >> } > > The better solution is to declare > mesh_stat_strings in in the normal way > > --- > drivers/net/wireless/marvell/libertas/mesh.c | 18 +- > 1 file changed, 9 insertions(+), 9 deletions(-) > > diff --git a/drivers/net/wireless/marvell/libertas/mesh.c > b/drivers/net/wireless/marvell/libertas/mesh.c > index d0c881dd5846..a535e7f48d2d 100644 > --- a/drivers/net/wireless/marvell/libertas/mesh.c > +++ b/drivers/net/wireless/marvell/libertas/mesh.c > @@ -1108,15 +1108,15 @@ void lbs_mesh_set_txpd(struct lbs_private *priv, > * Ethtool related > */ > > -static const char * const mesh_stat_strings[] = { > - "drop_duplicate_bcast", > - "drop_ttl_zero", > - "drop_no_fwd_route", > - "drop_no_buffers", > - "fwded_unicast_cnt", > - "fwded_bcast_cnt", > - "drop_blind_table", > - "tx_failed_cnt" > +static const char mesh_stat_strings[][ETH_GSTRING_LEN] = { > + "drop_duplicate_bcast", > + "drop_ttl_zero", > + "drop_no_fwd_route", > + "drop_no_buffers", > + "fwded_unicast_cnt", > + "fwded_bcast_cnt", > + "drop_blind_table", > + "tx_failed_cnt", > }; > > void lbs_mesh_ethtool_get_stats(struct net_device *dev, Ah yeah! And that simplifies the memcpy too, since it can just do it in one go. New patch on the way... -Kees -- Kees Cook Pixel Security
Re: [PATCH] wcn36xx: Close SMD channel on device removal
On Wed 10 May 00:27 PDT 2017, Arend van Spriel wrote: > On 5/10/2017 1:03 AM, Bjorn Andersson wrote: > > On Mon 08 May 23:17 PDT 2017, Kalle Valo wrote: > > > > > Bjorn Anderssonwrites: > > > > > > > The SMD channel is not the primary WCNSS channel and must explicitly be > > > > closed as the device is removed, or the channel will already by open on > > > > a subsequent probe call in e.g. the case of reloading the kernel module. > > > > > > > > This issue was introduced because I simplified the underlying SMD > > > > implementation while the SMD adaptions of the driver sat on the mailing > > > > list, but missed to update these patches. The patch does however only > > > > apply back to the transition to rpmsg, hence the limited Fixes. > > > > > > > > Fixes: 5052de8deff5 ("soc: qcom: smd: Transition client drivers from > > > > smd to rpmsg") > > > > Reported-by: Eyal Ilsar > > > > Signed-off-by: Bjorn Andersson > > > > > > As this is a regression I'll queue this to 4.12. > > > > > > > Thanks. > > > > > But if this is an older bug (didn't quite understand your description > > > though) should there be a separate patch for stable releases? > > > > > > > AFAICT this never worked, as it seems I did the rework in SMD while we > > tried to figure out the dependency issues we had with moving to SMD. So > > v4.9 through v4.11 has SMD support - with this bug. > > > > How do I proceed, do you want me to write up a fix for stable@? Do I > > send that out as an ordinary patch? > > If the patch applies cleanly on branches linux-4.9.y through linux-4.11.y in > the stable repository you can go for '--- Option 1 ---' as described in > /Documentation/stable_kernel_rules.txt. > It does not, before v4.12 it's a completely different function to call to close the channel. But "Option 3" describes the situation, thanks for the reference. I'll try to find the time to verify the patch on v4.11 and send it to stable@. Regards, Bjorn
[PATCH] net: ethernet: ti: netcp_core: return error while dma channel open issue
Fix error path while dma open channel issue. Also, no need to check output on NULL if it's never returned. Signed-off-by: Ivan Khoronzhuk--- Based on net-next drivers/net/ethernet/ti/netcp_core.c | 6 -- drivers/soc/ti/knav_dma.c| 2 +- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/ti/netcp_core.c b/drivers/net/ethernet/ti/netcp_core.c index 729a7da..e6222e5 100644 --- a/drivers/net/ethernet/ti/netcp_core.c +++ b/drivers/net/ethernet/ti/netcp_core.c @@ -1353,9 +1353,10 @@ int netcp_txpipe_open(struct netcp_tx_pipe *tx_pipe) tx_pipe->dma_channel = knav_dma_open_channel(dev, tx_pipe->dma_chan_name, ); - if (IS_ERR_OR_NULL(tx_pipe->dma_channel)) { + if (IS_ERR(tx_pipe->dma_channel)) { dev_err(dev, "failed opening tx chan(%s)\n", tx_pipe->dma_chan_name); + ret = PTR_ERR(tx_pipe->dma_channel); goto err; } @@ -1673,9 +1674,10 @@ static int netcp_setup_navigator_resources(struct net_device *ndev) netcp->rx_channel = knav_dma_open_channel(netcp->netcp_device->device, netcp->dma_chan_name, ); - if (IS_ERR_OR_NULL(netcp->rx_channel)) { + if (IS_ERR(netcp->rx_channel)) { dev_err(netcp->ndev_dev, "failed opening rx chan(%s\n", netcp->dma_chan_name); + ret = PTR_ERR(netcp->rx_channel); goto fail; } diff --git a/drivers/soc/ti/knav_dma.c b/drivers/soc/ti/knav_dma.c index ecebe2e..026182d 100644 --- a/drivers/soc/ti/knav_dma.c +++ b/drivers/soc/ti/knav_dma.c @@ -413,7 +413,7 @@ static int of_channel_match_helper(struct device_node *np, const char *name, * @name: slave channel name * @config:dma configuration parameters * - * Returns pointer to appropriate DMA channel on success or NULL. + * Returns pointer to appropriate DMA channel on success or error. */ void *knav_dma_open_channel(struct device *dev, const char *name, struct knav_dma_cfg *config) -- 2.7.4
[PATCH net 2/4] s390/qeth: unbreak OSM and OSN support
commit b4d72c08b358 ("qeth: bridgeport support - basic control") broke the support for OSM and OSN devices as follows: As OSM and OSN are L2 only, qeth_core_probe_device() does an early setup by loading the l2 discipline and calling qeth_l2_probe_device(). In this context, adding the l2-specific bridgeport sysfs attributes via qeth_l2_create_device_attributes() hits a BUG_ON in fs/sysfs/group.c, since the basic sysfs infrastructure for the device hasn't been established yet. Note that OSN actually has its own unique sysfs attributes (qeth_osn_devtype), so the additional attributes shouldn't be created at all. For OSM, add a new qeth_l2_devtype that contains all the common and l2-specific sysfs attributes. When qeth_core_probe_device() does early setup for OSM or OSN, assign the corresponding devtype so that the ccwgroup probe code creates the full set of sysfs attributes. This allows us to skip qeth_l2_create_device_attributes() in case of an early setup. Any device that can't do early setup will initially have only the generic sysfs attributes, and when it's probed later qeth_l2_probe_device() adds the l2-specific attributes. If an early-setup device is removed (by calling ccwgroup_ungroup()), device_unregister() will - using the devtype - delete the l2-specific attributes before qeth_l2_remove_device() is called. So make sure to not remove them twice. What complicates the issue is that qeth_l2_probe_device() and qeth_l2_remove_device() is also called on a device when its layer2 attribute changes (ie. its layer mode is switched). For early-setup devices this wouldn't work properly - we wouldn't remove the l2-specific attributes when switching to L3. But switching the layer mode doesn't actually make any sense; we already decided that the device can only operate in L2! So just refuse to switch the layer mode on such devices. Note that OSN doesn't have a layer2 attribute, so we only need to special-case OSM. Based on an initial patch by Ursula Braun. Fixes: b4d72c08b358 ("qeth: bridgeport support - basic control") Signed-off-by: Julian Wiedmann--- drivers/s390/net/qeth_core.h | 4 drivers/s390/net/qeth_core_main.c | 17 + drivers/s390/net/qeth_core_sys.c | 22 ++ drivers/s390/net/qeth_l2.h| 2 ++ drivers/s390/net/qeth_l2_main.c | 17 + drivers/s390/net/qeth_l2_sys.c| 8 drivers/s390/net/qeth_l3_main.c | 1 + 7 files changed, 51 insertions(+), 20 deletions(-) diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h index f6aa21176d89..30bc6105aac3 100644 --- a/drivers/s390/net/qeth_core.h +++ b/drivers/s390/net/qeth_core.h @@ -701,6 +701,7 @@ enum qeth_discipline_id { }; struct qeth_discipline { + const struct device_type *devtype; void (*start_poll)(struct ccw_device *, int, unsigned long); qdio_handler_t *input_handler; qdio_handler_t *output_handler; @@ -875,6 +876,9 @@ extern struct qeth_discipline qeth_l2_discipline; extern struct qeth_discipline qeth_l3_discipline; extern const struct attribute_group *qeth_generic_attr_groups[]; extern const struct attribute_group *qeth_osn_attr_groups[]; +extern const struct attribute_group qeth_device_attr_group; +extern const struct attribute_group qeth_device_blkt_group; +extern const struct device_type qeth_generic_devtype; extern struct workqueue_struct *qeth_wq; int qeth_card_hw_is_reachable(struct qeth_card *); diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c index 5bfd7c15d6a9..fc6d85f2b38d 100644 --- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -5530,10 +5530,12 @@ void qeth_core_free_discipline(struct qeth_card *card) card->discipline = NULL; } -static const struct device_type qeth_generic_devtype = { +const struct device_type qeth_generic_devtype = { .name = "qeth_generic", .groups = qeth_generic_attr_groups, }; +EXPORT_SYMBOL_GPL(qeth_generic_devtype); + static const struct device_type qeth_osn_devtype = { .name = "qeth_osn", .groups = qeth_osn_attr_groups, @@ -5659,23 +5661,22 @@ static int qeth_core_probe_device(struct ccwgroup_device *gdev) goto err_card; } - if (card->info.type == QETH_CARD_TYPE_OSN) - gdev->dev.type = _osn_devtype; - else - gdev->dev.type = _generic_devtype; - switch (card->info.type) { case QETH_CARD_TYPE_OSN: case QETH_CARD_TYPE_OSM: rc = qeth_core_load_discipline(card, QETH_DISCIPLINE_LAYER2); if (rc) goto err_card; + + gdev->dev.type = (card->info.type != QETH_CARD_TYPE_OSN) + ? card->discipline->devtype + : _osn_devtype; rc = card->discipline->setup(card->gdev); if (rc)
[PATCH net 3/4] s390/qeth: avoid null pointer dereference on OSN
Access card->dev only after checking whether's its valid. Signed-off-by: Julian WiedmannReviewed-by: Ursula Braun --- drivers/s390/net/qeth_l2_main.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c index 04666fe231aa..bd2df62a5cdf 100644 --- a/drivers/s390/net/qeth_l2_main.c +++ b/drivers/s390/net/qeth_l2_main.c @@ -965,7 +965,6 @@ static int qeth_l2_setup_netdev(struct qeth_card *card) case QETH_CARD_TYPE_OSN: card->dev = alloc_netdev(0, "osn%d", NET_NAME_UNKNOWN, ether_setup); - card->dev->flags |= IFF_NOARP; break; default: card->dev = alloc_etherdev(0); @@ -980,9 +979,12 @@ static int qeth_l2_setup_netdev(struct qeth_card *card) card->dev->min_mtu = 64; card->dev->max_mtu = ETH_MAX_MTU; card->dev->netdev_ops = _l2_netdev_ops; - card->dev->ethtool_ops = - (card->info.type != QETH_CARD_TYPE_OSN) ? - _l2_ethtool_ops : _l2_osn_ops; + if (card->info.type == QETH_CARD_TYPE_OSN) { + card->dev->ethtool_ops = _l2_osn_ops; + card->dev->flags |= IFF_NOARP; + } else { + card->dev->ethtool_ops = _l2_ethtool_ops; + } card->dev->features |= NETIF_F_HW_VLAN_CTAG_FILTER; if (card->info.type == QETH_CARD_TYPE_OSD && !card->info.guestlan) { card->dev->hw_features = NETIF_F_SG; -- 2.10.2
[PATCH net 0/4] s390/net fixes
Hello Dave, some qeth fixes for -net, the OSM/OSN one being the most crucial. Please also queue these up for stable. Thanks, Julian Julian Wiedmann (2): s390/qeth: unbreak OSM and OSN support s390/qeth: avoid null pointer dereference on OSN Ursula Braun (2): s390/qeth: handle sysfs error during initialization s390/qeth: add missing hash table initializations drivers/s390/net/qeth_core.h | 4 drivers/s390/net/qeth_core_main.c | 21 - drivers/s390/net/qeth_core_sys.c | 24 drivers/s390/net/qeth_l2.h| 2 ++ drivers/s390/net/qeth_l2_main.c | 26 -- drivers/s390/net/qeth_l2_sys.c| 8 drivers/s390/net/qeth_l3_main.c | 8 +++- 7 files changed, 69 insertions(+), 24 deletions(-) -- 2.10.2
[PATCH net 1/4] s390/qeth: handle sysfs error during initialization
From: Ursula BraunWhen setting up the device from within the layer discipline's probe routine, creating the layer-specific sysfs attributes can fail. Report this error back to the caller, and handle it by releasing the layer discipline. Signed-off-by: Ursula Braun [jwi: updated commit msg, moved an OSN change to a subsequent patch] Signed-off-by: Julian Wiedmann --- drivers/s390/net/qeth_core_main.c | 4 +++- drivers/s390/net/qeth_core_sys.c | 2 ++ drivers/s390/net/qeth_l2_main.c | 5 - drivers/s390/net/qeth_l3_main.c | 5 - 4 files changed, 13 insertions(+), 3 deletions(-) diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c index 38114a8d56e0..5bfd7c15d6a9 100644 --- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -5731,8 +5731,10 @@ static int qeth_core_set_online(struct ccwgroup_device *gdev) if (rc) goto err; rc = card->discipline->setup(card->gdev); - if (rc) + if (rc) { + qeth_core_free_discipline(card); goto err; + } } rc = card->discipline->set_online(gdev); err: diff --git a/drivers/s390/net/qeth_core_sys.c b/drivers/s390/net/qeth_core_sys.c index 75b29fd2fcf4..412ff61891ac 100644 --- a/drivers/s390/net/qeth_core_sys.c +++ b/drivers/s390/net/qeth_core_sys.c @@ -426,6 +426,8 @@ static ssize_t qeth_dev_layer2_store(struct device *dev, goto out; rc = card->discipline->setup(card->gdev); + if (rc) + qeth_core_free_discipline(card); out: mutex_unlock(>discipline_mutex); return rc ? rc : count; diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c index 1b07f382d74c..2d2623182abf 100644 --- a/drivers/s390/net/qeth_l2_main.c +++ b/drivers/s390/net/qeth_l2_main.c @@ -883,8 +883,11 @@ static int qeth_l2_stop(struct net_device *dev) static int qeth_l2_probe_device(struct ccwgroup_device *gdev) { struct qeth_card *card = dev_get_drvdata(>dev); + int rc; - qeth_l2_create_device_attributes(>dev); + rc = qeth_l2_create_device_attributes(>dev); + if (rc) + return rc; INIT_LIST_HEAD(>vid_list); hash_init(card->mac_htable); card->options.layer2 = 1; diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c index 6e0354ef4b86..ae70daf33bd5 100644 --- a/drivers/s390/net/qeth_l3_main.c +++ b/drivers/s390/net/qeth_l3_main.c @@ -3039,8 +3039,11 @@ static int qeth_l3_setup_netdev(struct qeth_card *card) static int qeth_l3_probe_device(struct ccwgroup_device *gdev) { struct qeth_card *card = dev_get_drvdata(>dev); + int rc; - qeth_l3_create_device_attributes(>dev); + rc = qeth_l3_create_device_attributes(>dev); + if (rc) + return rc; card->options.layer2 = 0; card->info.hwtrap = 0; return 0; -- 2.10.2
[PATCH net 4/4] s390/qeth: add missing hash table initializations
From: Ursula Brauncommit 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") added new hash tables, but missed to initialize them. Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") Signed-off-by: Ursula Braun Reviewed-by: Julian Wiedmann Signed-off-by: Julian Wiedmann --- drivers/s390/net/qeth_l3_main.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c index 6c2146fc831a..d8df1e635163 100644 --- a/drivers/s390/net/qeth_l3_main.c +++ b/drivers/s390/net/qeth_l3_main.c @@ -3044,6 +3044,8 @@ static int qeth_l3_probe_device(struct ccwgroup_device *gdev) rc = qeth_l3_create_device_attributes(>dev); if (rc) return rc; + hash_init(card->ip_htable); + hash_init(card->ip_mc_htable); card->options.layer2 = 0; card->info.hwtrap = 0; return 0; -- 2.10.2
Re: [Patch net] ipv4: restore rt->fi for reference counting
On Wed, May 10, 2017 at 12:38 AM, Julian Anastasovwrote: > > Hello, > > On Tue, 9 May 2017, Cong Wang wrote: > >> > Also setting nexthop_nh->nh_dev to NULL looks quite dangerous >> > >> > We have plenty of sites doing : >> > >> > if (fi->fib_dev) >> > x = fi->fib_dev->field >> > >> > fib_route_seq_show() is one example. >> > >> >> All of them take RCU read lock, so, as I explained in the code comment, >> they all should be fine because of synchronize_net() on unregister path. >> Do you see anything otherwise? > > During NETDEV_UNREGISTER packets for dev should not > be flying but packets for other devs can walk the nexthops > for multipath routes. It is the rcu_barrier before > NETDEV_UNREGISTER_FINAL that allows nh_dev to be set to NULL > during this grace period but there are many places to fix that > assume nh_dev!=NULL. Excellent point. Unfortunately NETDEV_UNREGISTER_FINAL is not yet captured by the fib event notifier. I think readers are still safe to assume nh_dev!=NULL since we wait for existing readers, readers coming later can't see it any more. Or am I missing anything? > > But why we leak routes? Is there some place that holds > routes without listening for NETDEV_UNREGISTER? On fib_flush > the infos are unlinked from trees, so after a grace period packets > should not see/hold such infos. If we hold routes somewhere for > long time, problem can happen also for routes with single nexthop. My theory is that dst which holds a refcnt to fib_info (of course, after this patch) is moved in GC after NETDEV_UNREGISTER but still referenced somewhere, it therefore holds these nh_dev's, which in turn hold back to the netdevice which is being unregistered, thus Eric saw these refcnt leak warnings. I am not sure if sitting in GC for a long time is problematic or not, but from the code where we transfer dst->dev to loopback_dev, it seems this is expected otherwise no need to transfer? But I don't dig the history though. Thanks.
Re: [PATCH net 2/3] net/mlx4_en: Avoid adding steering rules with invalid ring
On Tue, May 9, 2017 at 2:45 PM, Tariq Toukanwrote: > From: Talat Batheesh > > Inserting steering rules with illegal ring is an invalid operation, block it. Hi Dave, I realized today that the patch introduced a regression, Tariq will see if to revert it as a whole or fix the regression. Just wanted to drop you a note and to make sure you don't further carry it to -stable, xxit happens :( Or.
Re: bpf pointer alignment validation
From: Daniel BorkmannDate: Wed, 10 May 2017 18:21:50 +0200 > On 05/10/2017 05:57 PM, David Miller wrote: >> From: Daniel Borkmann >> Date: Wed, 10 May 2017 17:51:50 +0200 >> >>> Would probably be good nevertheless to have this as a flag for >>> program loads, which gets then passed through to the verifier to >>> explicitly enable strict alignment checks. >>> >>> Might certainly aide developing & testing programs on archs with >>> efficient unaligned access and later actually running them on archs >>> that don't have it. (And at minimum, it also helps for checking >>> the test suite against the verifier.) >> >> Ok, I can implement this flag. >> >> The only question is where to put it? An unused bit in the program >> type? :-) > > See for example 7f677633379b ("bpf: introduce BPF_F_ALLOW_OVERRIDE > flag"). We can add a flags field to the prog loading part of union > bpf_attr; we would need to make sure to update > BPF_PROG_LOAD_LAST_FIELD > to the new member, and to reject unknown flags, of course. Then the > syscall will handle compat with older binaries just fine by design, > the main bpf syscall code and CHECK_ATTR() macros will ensure this > (backward compat, and to a limited degree also forward compat). Thanks to both of you for guidance on how to do this properly. Here is what I have right now, once things look good enough I'll split it all out into a proper patch series: 1) Add alignment tracking. 2) Add "log_level > 1" per-insn state dumping 3) Add BPF_F_STRICT_ALIGNMENT 4) Add bpf_verify_program() to library 5) Add test_align to selftests Subject: [PATCH] BPF alignment framework --- include/linux/bpf_verifier.h | 3 + include/uapi/linux/bpf.h | 8 + kernel/bpf/syscall.c | 5 +- kernel/bpf/verifier.c| 132 -- tools/include/uapi/linux/bpf.h | 11 +- tools/lib/bpf/bpf.c | 22 ++ tools/lib/bpf/bpf.h | 4 + tools/testing/selftests/bpf/Makefile | 4 +- tools/testing/selftests/bpf/test_align.c | 417 +++ 9 files changed, 581 insertions(+), 25 deletions(-) create mode 100644 tools/testing/selftests/bpf/test_align.c diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 5efb4db..7c6a519 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -40,6 +40,9 @@ struct bpf_reg_state { */ s64 min_value; u64 max_value; + u32 min_align; + u32 aux_off; + u32 aux_off_align; }; enum bpf_stack_slot_type { diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 945a1f5..94dfa9d 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -132,6 +132,13 @@ enum bpf_attach_type { */ #define BPF_F_ALLOW_OVERRIDE (1U << 0) +/* If BPF_F_STRICT_ALIGNMENT is used in BPF_PROG_LOAD command, the + * verifier will perform strict alignment checking as if the kernel + * has been built with CONFIG_EFFICIENT_UNALIGNED_ACCESS not set, + * and NET_IP_ALIGN defined to 2. + */ +#define BPF_F_STRICT_ALIGNMENT (1U << 0) + #define BPF_PSEUDO_MAP_FD 1 /* flags for BPF_MAP_UPDATE_ELEM command */ @@ -177,6 +184,7 @@ union bpf_attr { __u32 log_size; /* size of user buffer */ __aligned_u64 log_buf;/* user supplied buffer */ __u32 kern_version; /* checked when prog_type=kprobe */ + __u32 prog_flags; }; struct { /* anonymous struct used by BPF_OBJ_* commands */ diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 13642c7..027053a 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -784,7 +784,7 @@ struct bpf_prog *bpf_prog_get_type(u32 ufd, enum bpf_prog_type type) EXPORT_SYMBOL_GPL(bpf_prog_get_type); /* last field in 'union bpf_attr' used by this command */ -#defineBPF_PROG_LOAD_LAST_FIELD kern_version +#defineBPF_PROG_LOAD_LAST_FIELD prog_flags static int bpf_prog_load(union bpf_attr *attr) { @@ -797,6 +797,9 @@ static int bpf_prog_load(union bpf_attr *attr) if (CHECK_ATTR(BPF_PROG_LOAD)) return -EINVAL; + if (attr->prog_flags & ~BPF_F_STRICT_ALIGNMENT) + return -EINVAL; + /* copy eBPF program license from user space */ if (strncpy_from_user(license, u64_to_user_ptr(attr->license), sizeof(license) - 1) < 0) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index c2ff608..87ecb4d 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -241,6 +241,12 @@ static void print_verifier_state(struct bpf_verifier_state *state) if (reg->max_value != BPF_REGISTER_MAX_RANGE) verbose(",max_value=%llu", (unsigned long
Re: [Patch net] ipv4: restore rt->fi for reference counting
On Tue, May 9, 2017 at 4:51 PM, Eric Dumazetwrote: > On Tue, 2017-05-09 at 16:35 -0700, Cong Wang wrote: > >> This statement is only used to ensure we pass the "dead == fi->fib_nhs" >> check right below the inner loop, it is fine to keep it without break since >> fi is not changed in the inner loop. >> > > So the dead++ above wont end up with (dead > fi->fib_nhs) ? Good point, it could happen, we probably need another boolean to address this.
Re: [Patch net] ipv4: restore rt->fi for reference counting
On Tue, May 9, 2017 at 4:50 PM, Eric Dumazetwrote: > On Tue, 2017-05-09 at 16:35 -0700, Cong Wang wrote: > >> All of them take RCU read lock, so, as I explained in the code comment, >> they all should be fine because of synchronize_net() on unregister path. >> Do you see anything otherwise? > > They might take rcu lock, but compiler is still allowed to read > fi->fib_dev multiple times, and crashes might happen. > > You will need to audit all code and fix it, using proper > rcu_dereference() or similar code ensuring compiler wont do stupid > things. > Point taken. But without my patch, nh_dev is supposed to be protected by RCU too, it is freed in a rcu callback and dereferenced like: struct in_device *in_dev = __in_dev_get_rcu(nh->nh_dev);
Re: bpf pointer alignment validation
On 05/10/2017 05:57 PM, David Miller wrote: From: Daniel BorkmannDate: Wed, 10 May 2017 17:51:50 +0200 Would probably be good nevertheless to have this as a flag for program loads, which gets then passed through to the verifier to explicitly enable strict alignment checks. Might certainly aide developing & testing programs on archs with efficient unaligned access and later actually running them on archs that don't have it. (And at minimum, it also helps for checking the test suite against the verifier.) Ok, I can implement this flag. The only question is where to put it? An unused bit in the program type? :-) See for example 7f677633379b ("bpf: introduce BPF_F_ALLOW_OVERRIDE flag"). We can add a flags field to the prog loading part of union bpf_attr; we would need to make sure to update BPF_PROG_LOAD_LAST_FIELD to the new member, and to reject unknown flags, of course. Then the syscall will handle compat with older binaries just fine by design, the main bpf syscall code and CHECK_ATTR() macros will ensure this (backward compat, and to a limited degree also forward compat).
Re: bpf pointer alignment validation
On 5/10/17 8:57 AM, David Miller wrote: From: Daniel BorkmannDate: Wed, 10 May 2017 17:51:50 +0200 Would probably be good nevertheless to have this as a flag for program loads, which gets then passed through to the verifier to explicitly enable strict alignment checks. Might certainly aide developing & testing programs on archs with efficient unaligned access and later actually running them on archs that don't have it. (And at minimum, it also helps for checking the test suite against the verifier.) Ok, I can implement this flag. The only question is where to put it? An unused bit in the program type? :-) just add '__u32 prog_flags' to bpf_attr PROG_LOAD anon union. There was no need for such flags in the past, hence no flags field existed. Now it's time.
Re: [net-mellanox] question about potential null pointer dereference
Quoting Ido Schimmel: On Wed, May 10, 2017 at 10:36:59AM -0500, Gustavo A. R. Silva wrote: Hello everybody, While looking into Coverity ID 1350941 I ran into the following piece of code at drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c:1483: 1483static void mlxsw_sp_fdb_notify_mac_process(struct mlxsw_sp *mlxsw_sp, 1484char *sfn_pl, int rec_index, 1485bool adding) 1486{ 1487struct mlxsw_sp_port *mlxsw_sp_port; 1488char mac[ETH_ALEN]; 1489u8 local_port; 1490u16 vid, fid; 1491bool do_notification = true; 1492int err; 1493 1494mlxsw_reg_sfn_mac_unpack(sfn_pl, rec_index, mac, , _port); 1495mlxsw_sp_port = mlxsw_sp->ports[local_port]; 1496if (!mlxsw_sp_port) { 1497dev_err_ratelimited(mlxsw_sp->bus_info->dev, "Incorrect local port in FDB notification\n"); 1498goto just_remove; 1499} 1500 1501if (mlxsw_sp_fid_is_vfid(fid)) { 1502struct mlxsw_sp_port *mlxsw_sp_vport; 1503 1504mlxsw_sp_vport = mlxsw_sp_port_vport_find_by_fid(mlxsw_sp_port, 1505 fid); 1506if (!mlxsw_sp_vport) { 1507netdev_err(mlxsw_sp_port->dev, "Failed to find a matching vPort following FDB notification\n"); 1508goto just_remove; 1509} 1510vid = 0; 1511/* Override the physical port with the vPort. */ 1512mlxsw_sp_port = mlxsw_sp_vport; 1513} else { 1514vid = fid; 1515} 1516 1517do_fdb_op: 1518err = mlxsw_sp_port_fdb_uc_op(mlxsw_sp, local_port, mac, fid, 1519 adding, true); 1520if (err) { 1521if (net_ratelimit()) 1522netdev_err(mlxsw_sp_port->dev, "Failed to set FDB entry\n"); 1523return; 1524} 1525 1526if (!do_notification) 1527return; 1528mlxsw_sp_fdb_call_notifiers(mlxsw_sp_port->learning_sync, 1529adding, mac, vid, mlxsw_sp_port->dev); 1530return; 1531 1532just_remove: 1533adding = false; 1534do_notification = false; 1535goto do_fdb_op; 1536} The issue here is that line 1496 implies that mlxsw_sp_port might be NULL. If this is the case, the execution path jumps to line 1532 and then to line 1517. All this could end up dereferencing a NULL pointer at line 1522. Is there any chance for mlxsw_sp_port to be NULL at line 1496 and, at the same time, a NULL pointer dereference occurs at line 1522? I'm trying to figure out if this is a false positive or something that actually needs to be fixed. In theory, yes, it can happen, but it didn't happen yet. I recently patched that and now it's in Jiri's queue. I guess he'll send it tomorrow. https://github.com/jpirko/linux_mlxsw/commit/4160fb9ad6eeacba7736cfdbf7f52248432c2e89 Great, glad to see it is fixed now. Thanks for looking into this! Sure thing, thanks for clarifying. :) -- Gustavo A. R. Silva
Re: bpf pointer alignment validation
From: Daniel BorkmannDate: Wed, 10 May 2017 17:51:50 +0200 > Would probably be good nevertheless to have this as a flag for > program loads, which gets then passed through to the verifier to > explicitly enable strict alignment checks. > > Might certainly aide developing & testing programs on archs with > efficient unaligned access and later actually running them on archs > that don't have it. (And at minimum, it also helps for checking > the test suite against the verifier.) Ok, I can implement this flag. The only question is where to put it? An unused bit in the program type? :-)
Re: [net-mellanox] question about potential null pointer dereference
On Wed, May 10, 2017 at 10:36:59AM -0500, Gustavo A. R. Silva wrote: > > Hello everybody, > > While looking into Coverity ID 1350941 I ran into the following piece of > code at drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c:1483: > > 1483static void mlxsw_sp_fdb_notify_mac_process(struct mlxsw_sp *mlxsw_sp, > 1484char *sfn_pl, int rec_index, > 1485bool adding) > 1486{ > 1487struct mlxsw_sp_port *mlxsw_sp_port; > 1488char mac[ETH_ALEN]; > 1489u8 local_port; > 1490u16 vid, fid; > 1491bool do_notification = true; > 1492int err; > 1493 > 1494mlxsw_reg_sfn_mac_unpack(sfn_pl, rec_index, mac, , > _port); > 1495mlxsw_sp_port = mlxsw_sp->ports[local_port]; > 1496if (!mlxsw_sp_port) { > 1497dev_err_ratelimited(mlxsw_sp->bus_info->dev, "Incorrect > local port in FDB notification\n"); > 1498goto just_remove; > 1499} > 1500 > 1501if (mlxsw_sp_fid_is_vfid(fid)) { > 1502struct mlxsw_sp_port *mlxsw_sp_vport; > 1503 > 1504mlxsw_sp_vport = > mlxsw_sp_port_vport_find_by_fid(mlxsw_sp_port, > 1505 fid); > 1506if (!mlxsw_sp_vport) { > 1507netdev_err(mlxsw_sp_port->dev, "Failed to find a > matching vPort following FDB notification\n"); > 1508goto just_remove; > 1509} > 1510vid = 0; > 1511/* Override the physical port with the vPort. */ > 1512mlxsw_sp_port = mlxsw_sp_vport; > 1513} else { > 1514vid = fid; > 1515} > 1516 > 1517do_fdb_op: > 1518err = mlxsw_sp_port_fdb_uc_op(mlxsw_sp, local_port, mac, fid, > 1519 adding, true); > 1520if (err) { > 1521if (net_ratelimit()) > 1522netdev_err(mlxsw_sp_port->dev, "Failed to set > FDB entry\n"); > 1523return; > 1524} > 1525 > 1526if (!do_notification) > 1527return; > 1528mlxsw_sp_fdb_call_notifiers(mlxsw_sp_port->learning_sync, > 1529adding, mac, vid, mlxsw_sp_port->dev); > 1530return; > 1531 > 1532just_remove: > 1533adding = false; > 1534do_notification = false; > 1535goto do_fdb_op; > 1536} > > > The issue here is that line 1496 implies that mlxsw_sp_port might be NULL. > If this is the case, the execution path jumps to line 1532 and then to line > 1517. All this could end up dereferencing a NULL pointer at line 1522. > > Is there any chance for mlxsw_sp_port to be NULL at line 1496 and, at the > same time, a NULL pointer dereference occurs at line 1522? > > I'm trying to figure out if this is a false positive or something that > actually needs to be fixed. In theory, yes, it can happen, but it didn't happen yet. I recently patched that and now it's in Jiri's queue. I guess he'll send it tomorrow. https://github.com/jpirko/linux_mlxsw/commit/4160fb9ad6eeacba7736cfdbf7f52248432c2e89 Thanks for looking into this! > > I'd really appreciate any comment on this. > Thank you! > -- > Gustavo A. R. Silva
Re: bpf pointer alignment validation
On 05/10/2017 05:33 PM, David Miller wrote: From: Alexei StarovoitovDate: Tue, 9 May 2017 22:57:37 -0700 On Tue, May 09, 2017 at 02:32:34PM -0400, David Miller wrote: +static u32 calc_align(u32 imm) +{ + u32 align = 1; + + if (!imm) + return 1U << 31; + + while (!(imm & 1)) { + imm >>= 1; + align <<= 1; + } + return align; +} same question as in previous reply. Why not to use something like: static u32 calc_align(u32 n) { if (!n) return 1U << 31; return n - ((n - 1) & n); } Ok. I did a cursory search and we don't have a generic kernel helper for this kind of calculation. I was actually quite surprised, as we have one for everything else :-) this needs to be tweaked like if (log_level > 1) verbose("%d:", insn_idx); else verbose("\nfrom %d to %d:", prev_insn_idx, insn_idx); otherwise it prints prev_insn_idx which is meaningful only with processing exit and search pruning. Agreed. Nice to see all these comments. I wonder how we can make them automatic in the verifier. Like if verifier can somehow hint the user in such human friendly way about what is happening with the program. Today that's the #1 problem. Most folks complaining that verifier error messages are too hard to understand. We can put whatever text strings we want into that verifier buffer. It is as flexible as netlink extended acks. would it make sense to bpf_prog_test_run() it here as well? We could. On x86 not much value, but on sparc we can somehow look for traps? Is there some counter of unaligned traps that we can read and report as error to user space after prog_test_run ? Unfortunately, no. These tests we cannot really run, since they don't do any load/store. I mean more for some future tests. Or some sort of debug warn that there were traps while bpf prog was executed, so the user is alarmed and reports to us, since that would be a bug in verifier align logic? One thing I could definitely do is add logic to the unaligned trap handler to print something special in the logs if we find that we are executing BPF code. The basic structure of the log message can be codified in a generic bpf_xxx() helper, which architectures call with the PC and unaligned address as arguments. I was thinking more last night about strict alignment mode for the verifier, so that bugs can be spotted on all architectures. But it stupidly will not work. The problem is that x86 and powerpc define NET_IP_ALIGN as 0, so all bets are off. One thing we could do in "strict alignment" verifier mode is pretend that NET_IP_ALIGN is 2 in the alignment checks. Would probably be good nevertheless to have this as a flag for program loads, which gets then passed through to the verifier to explicitly enable strict alignment checks. Might certainly aide developing & testing programs on archs with efficient unaligned access and later actually running them on archs that don't have it. (And at minimum, it also helps for checking the test suite against the verifier.)
[net-mellanox] question about potential null pointer dereference
Hello everybody, While looking into Coverity ID 1350941 I ran into the following piece of code at drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c:1483: 1483static void mlxsw_sp_fdb_notify_mac_process(struct mlxsw_sp *mlxsw_sp, 1484char *sfn_pl, int rec_index, 1485bool adding) 1486{ 1487struct mlxsw_sp_port *mlxsw_sp_port; 1488char mac[ETH_ALEN]; 1489u8 local_port; 1490u16 vid, fid; 1491bool do_notification = true; 1492int err; 1493 1494mlxsw_reg_sfn_mac_unpack(sfn_pl, rec_index, mac, , _port); 1495mlxsw_sp_port = mlxsw_sp->ports[local_port]; 1496if (!mlxsw_sp_port) { 1497dev_err_ratelimited(mlxsw_sp->bus_info->dev, "Incorrect local port in FDB notification\n"); 1498goto just_remove; 1499} 1500 1501if (mlxsw_sp_fid_is_vfid(fid)) { 1502struct mlxsw_sp_port *mlxsw_sp_vport; 1503 1504mlxsw_sp_vport = mlxsw_sp_port_vport_find_by_fid(mlxsw_sp_port, 1505 fid); 1506if (!mlxsw_sp_vport) { 1507netdev_err(mlxsw_sp_port->dev, "Failed to find a matching vPort following FDB notification\n"); 1508goto just_remove; 1509} 1510vid = 0; 1511/* Override the physical port with the vPort. */ 1512mlxsw_sp_port = mlxsw_sp_vport; 1513} else { 1514vid = fid; 1515} 1516 1517do_fdb_op: 1518err = mlxsw_sp_port_fdb_uc_op(mlxsw_sp, local_port, mac, fid, 1519 adding, true); 1520if (err) { 1521if (net_ratelimit()) 1522netdev_err(mlxsw_sp_port->dev, "Failed to set FDB entry\n"); 1523return; 1524} 1525 1526if (!do_notification) 1527return; 1528mlxsw_sp_fdb_call_notifiers(mlxsw_sp_port->learning_sync, 1529adding, mac, vid, mlxsw_sp_port->dev); 1530return; 1531 1532just_remove: 1533adding = false; 1534do_notification = false; 1535goto do_fdb_op; 1536} The issue here is that line 1496 implies that mlxsw_sp_port might be NULL. If this is the case, the execution path jumps to line 1532 and then to line 1517. All this could end up dereferencing a NULL pointer at line 1522. Is there any chance for mlxsw_sp_port to be NULL at line 1496 and, at the same time, a NULL pointer dereference occurs at line 1522? I'm trying to figure out if this is a false positive or something that actually needs to be fixed. I'd really appreciate any comment on this. Thank you! -- Gustavo A. R. Silva
Re: bpf pointer alignment validation
From: Alexei StarovoitovDate: Tue, 9 May 2017 22:57:37 -0700 > On Tue, May 09, 2017 at 02:32:34PM -0400, David Miller wrote: >> >> +static u32 calc_align(u32 imm) >> +{ >> +u32 align = 1; >> + >> +if (!imm) >> +return 1U << 31; >> + >> +while (!(imm & 1)) { >> +imm >>= 1; >> +align <<= 1; >> +} >> +return align; >> +} > > same question as in previous reply. > Why not to use something like: > static u32 calc_align(u32 n) > { > if (!n) > return 1U << 31; > return n - ((n - 1) & n); > } Ok. I did a cursory search and we don't have a generic kernel helper for this kind of calculation. I was actually quite surprised, as we have one for everything else :-) > this needs to be tweaked like > if (log_level > 1) > verbose("%d:", insn_idx); > else > verbose("\nfrom %d to %d:", prev_insn_idx, insn_idx); > > otherwise it prints prev_insn_idx which is meaningful > only with processing exit and search pruning. Agreed. > Nice to see all these comments. > I wonder how we can make them automatic in the verifier. > Like if verifier can somehow hint the user in such human friendly way > about what is happening with the program. > Today that's the #1 problem. Most folks complaining > that verifier error messages are too hard to understand. We can put whatever text strings we want into that verifier buffer. It is as flexible as netlink extended acks. > would it make sense to bpf_prog_test_run() it here as well? We could. > On x86 not much value, but on sparc we can somehow look for traps? > Is there some counter of unaligned traps that we can read and report > as error to user space after prog_test_run ? Unfortunately, no. > These tests we cannot really run, since they don't do any load/store. > I mean more for some future tests. Or some sort of debug warn > that there were traps while bpf prog was executed, so the user > is alarmed and reports to us, since that would be a bug in verifier > align logic? One thing I could definitely do is add logic to the unaligned trap handler to print something special in the logs if we find that we are executing BPF code. The basic structure of the log message can be codified in a generic bpf_xxx() helper, which architectures call with the PC and unaligned address as arguments. I was thinking more last night about strict alignment mode for the verifier, so that bugs can be spotted on all architectures. But it stupidly will not work. The problem is that x86 and powerpc define NET_IP_ALIGN as 0, so all bets are off. One thing we could do in "strict alignment" verifier mode is pretend that NET_IP_ALIGN is 2 in the alignment checks.
[PATCH] mdio: mux: Correct mdio_mux_init error path issues
There is a potential unnecessary refcount decriment on error path of put_device(>mii_bus->dev), as it is possible to avoid the of_mdio_find_bus() call if mux_bus is specified by the calling function. The same put_device() is not called in the error path if the devm_kzalloc of pb fails. This caused the variable used in the put_device() to be changed, as the pb pointer was obviously not set up. There is an unnecessary of_node_get() on child_bus_node if the of_mdiobus_register() is successful, as the for_each_available_child_of_node() automatically increments this. Thus the refcount on this node will always be +1 more than it should be. There is no of_node_put() on child_bus_node if the of_mdiobus_register() call fails. Finally, it is lacking devm_kfree() of pb in the error path. While this might not be technically necessary, it was present in other parts of the function. So, I am adding it where necessary to make it uniform. Signed-off-by: Jon MasonFixes: f20e6657a875 ("mdio: mux: Enhanced MDIO mux framework for integrated multiplexers") Fixes: 0ca2997d1452 ("netdev/of/phy: Add MDIO bus multiplexer support.") --- drivers/net/phy/mdio-mux.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/net/phy/mdio-mux.c b/drivers/net/phy/mdio-mux.c index 963838d4fac1..6943c5ece44a 100644 --- a/drivers/net/phy/mdio-mux.c +++ b/drivers/net/phy/mdio-mux.c @@ -122,10 +122,9 @@ int mdio_mux_init(struct device *dev, pb = devm_kzalloc(dev, sizeof(*pb), GFP_KERNEL); if (pb == NULL) { ret_val = -ENOMEM; - goto err_parent_bus; + goto err_pb_kz; } - pb->switch_data = data; pb->switch_fn = switch_fn; pb->current_child = -1; @@ -154,6 +153,7 @@ int mdio_mux_init(struct device *dev, cb->mii_bus = mdiobus_alloc(); if (!cb->mii_bus) { ret_val = -ENOMEM; + devm_kfree(dev, cb); of_node_put(child_bus_node); break; } @@ -169,8 +169,8 @@ int mdio_mux_init(struct device *dev, if (r) { mdiobus_free(cb->mii_bus); devm_kfree(dev, cb); + of_node_put(child_bus_node); } else { - of_node_get(child_bus_node); cb->next = pb->children; pb->children = cb; } @@ -181,9 +181,11 @@ int mdio_mux_init(struct device *dev, return 0; } + devm_kfree(dev, pb); +err_pb_kz: /* balance the reference of_mdio_find_bus() took */ - put_device(>mii_bus->dev); - + if (!mux_bus) + put_device(_bus->dev); err_parent_bus: of_node_put(parent_bus_node); return ret_val; -- 2.7.4
Re: [PATCH net-next V4 00/10] vhost_net batch dequeuing
On Wed, May 10, 2017 at 11:36:12AM +0800, Jason Wang wrote: > This series tries to implement rx batching for vhost-net. This is done > by batching the dequeuing from skb_array which was exported by > underlayer socket and pass the sbk back through msg_control to finish > userspace copying. This is also the requirement for more batching > implemention on rx path. > > Tests shows at most 8.8% improvment bon rx pps on top of batch zeroing. > > Please review. > > Thanks > > Changes from V3: > - add batch zeroing patch to fix the build warnings FYI that one's going upstream now. > Changes from V2: > - rebase to net-next HEAD > - use unconsume helpers to put skb back on releasing > - introduce and use vhost_net internal buffer helpers > - renew performance numbers on top of batch zeroing > > Changes from V1: > - switch to use for() in __ptr_ring_consume_batched() > - rename peek_head_len_batched() to fetch_skbs() > - use skb_array_consume_batched() instead of > skb_array_consume_batched_bh() since no consumer run in bh > - drop the lockless peeking patch since skb_array could be resized, so > it's not safe to call lockless one > > Jason Wang (8): > skb_array: introduce skb_array_unconsume > ptr_ring: introduce batch dequeuing > skb_array: introduce batch dequeuing > tun: export skb_array > tap: export skb_array > tun: support receiving skb through msg_control > tap: support receiving skb from msg_control > vhost_net: try batch dequing from skb array > > Michael S. Tsirkin (2): > ptr_ring: batch ring zeroing > ptr_ring: add ptr_ring_unconsume > > drivers/net/tap.c | 25 ++- > drivers/net/tun.c | 31 ++-- > drivers/vhost/net.c | 117 +++-- > include/linux/if_tap.h| 5 ++ > include/linux/if_tun.h| 5 ++ > include/linux/ptr_ring.h | 183 > +++--- > include/linux/skb_array.h | 31 > 7 files changed, 370 insertions(+), 27 deletions(-) > > -- > 2.7.4
Re: [PATCH net-next V4 10/10] vhost_net: try batch dequing from skb array
On Wed, May 10, 2017 at 11:36:22AM +0800, Jason Wang wrote: > We used to dequeue one skb during recvmsg() from skb_array, this could > be inefficient because of the bad cache utilization and spinlock > touching for each packet. This patch tries to batch them by calling > batch dequeuing helpers explicitly on the exported skb array and pass > the skb back through msg_control for underlayer socket to finish the > userspace copying. > > Batch dequeuing is also the requirement for more batching improvement > on rx. > > Tests were done by pktgen on tap with XDP1 in guest on top of batch > zeroing: > > rx batch | pps > > 2562.41Mpps (+6.16%) > 1282.48Mpps (+8.80%) > 64 2.38Mpps (+3.96%) <- Default > 16 2.31Mpps (+1.76%) > 4 2.31Mpps (+1.76%) > 1 2.30Mpps (+1.32%) > 0 2.27Mpps (+7.48%) > > Signed-off-by: Jason Wang> --- > drivers/vhost/net.c | 117 > +--- > 1 file changed, 111 insertions(+), 6 deletions(-) > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c > index 9b51989..fbaecf3 100644 > --- a/drivers/vhost/net.c > +++ b/drivers/vhost/net.c > @@ -28,6 +28,8 @@ > #include > #include > #include > +#include > +#include > > #include > > @@ -85,6 +87,13 @@ struct vhost_net_ubuf_ref { > struct vhost_virtqueue *vq; > }; > > +#define VHOST_RX_BATCH 64 > +struct vhost_net_buf { > + struct sk_buff *queue[VHOST_RX_BATCH]; > + int tail; > + int head; > +}; > + Do you strictly need to put this inline? This structure is quite big already. Do you see a measureabe difference if you make it struct sk_buff **queue; int tail; int head; ? Will also make it easier to play with the size in the future should someone want to see how does it work e.g. for different ring sizes. > struct vhost_net_virtqueue { > struct vhost_virtqueue vq; > size_t vhost_hlen; > @@ -99,6 +108,8 @@ struct vhost_net_virtqueue { > /* Reference counting for outstanding ubufs. >* Protected by vq mutex. Writers must also take device mutex. */ > struct vhost_net_ubuf_ref *ubufs; > + struct skb_array *rx_array; > + struct vhost_net_buf rxq; > }; > > struct vhost_net { > @@ -117,6 +128,71 @@ struct vhost_net { > > static unsigned vhost_net_zcopy_mask __read_mostly; > > +static void *vhost_net_buf_get_ptr(struct vhost_net_buf *rxq) > +{ > + if (rxq->tail != rxq->head) > + return rxq->queue[rxq->head]; > + else > + return NULL; > +} > + > +static int vhost_net_buf_get_size(struct vhost_net_buf *rxq) > +{ > + return rxq->tail - rxq->head; > +} > + > +static int vhost_net_buf_is_empty(struct vhost_net_buf *rxq) > +{ > + return rxq->tail == rxq->head; > +} > + > +static void *vhost_net_buf_consume(struct vhost_net_buf *rxq) > +{ > + void *ret = vhost_net_buf_get_ptr(rxq); > + ++rxq->head; > + return ret; > +} > + > +static int vhost_net_buf_produce(struct vhost_net_virtqueue *nvq) > +{ > + struct vhost_net_buf *rxq = >rxq; > + > + rxq->head = 0; > + rxq->tail = skb_array_consume_batched(nvq->rx_array, rxq->queue, > + VHOST_RX_BATCH); > + return rxq->tail; > +} > + > +static void vhost_net_buf_unproduce(struct vhost_net_virtqueue *nvq) > +{ > + struct vhost_net_buf *rxq = >rxq; > + > + if (nvq->rx_array && !vhost_net_buf_is_empty(rxq)) { > + skb_array_unconsume(nvq->rx_array, rxq->queue + rxq->head, > + vhost_net_buf_get_size(rxq)); > + rxq->head = rxq->tail = 0; > + } > +} > + > +static int vhost_net_buf_peek(struct vhost_net_virtqueue *nvq) > +{ > + struct vhost_net_buf *rxq = >rxq; > + > + if (!vhost_net_buf_is_empty(rxq)) > + goto out; > + > + if (!vhost_net_buf_produce(nvq)) > + return 0; > + > +out: > + return __skb_array_len_with_tag(vhost_net_buf_get_ptr(rxq)); > +} > + > +static void vhost_net_buf_init(struct vhost_net_buf *rxq) > +{ > + rxq->head = rxq->tail = 0; > +} > + > static void vhost_net_enable_zcopy(int vq) > { > vhost_net_zcopy_mask |= 0x1 << vq; > @@ -201,6 +277,7 @@ static void vhost_net_vq_reset(struct vhost_net *n) > n->vqs[i].ubufs = NULL; > n->vqs[i].vhost_hlen = 0; > n->vqs[i].sock_hlen = 0; > + vhost_net_buf_init(>vqs[i].rxq); > } > > } > @@ -503,15 +580,14 @@ static void handle_tx(struct vhost_net *net) > mutex_unlock(>mutex); > } > > -static int peek_head_len(struct sock *sk) > +static int peek_head_len(struct vhost_net_virtqueue *rvq, struct sock *sk) > { > - struct socket *sock = sk->sk_socket; > struct sk_buff *head; > int len = 0; > unsigned long flags; > > - if (sock->ops->peek_len) > - return sock->ops->peek_len(sock); > + if
[PULL] virtio: fixes, cleanups, performance
The following changes since commit a351e9b9fc24e982ec2f0e76379a49826036da12: Linux 4.11 (2017-04-30 19:47:48 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus for you to fetch changes up to c8b0d7290657996a29f318b948d2397d30e70c36: s390/virtio: change maintainership (2017-05-09 16:43:25 +0300) virtio: fixes, cleanups, performance A bunch of changes to virtio, most affecting virtio net. ptr_ring batched zeroing - first of batching enhancements that seems ready. Signed-off-by: Michael S. TsirkinChristian Borntraeger (1): s390/virtio: change maintainership Colin Ian King (1): tools/virtio: fix spelling mistake: "wakeus" -> "wakeups" Cornelia Huck (1): virtio: virtio_driver doc Dan Carpenter (2): ringtest: fix an assert statement virtio_net: tidy a couple debug statements Michael S. Tsirkin (11): virtio: wrap find_vqs virtio: add context flag to find vqs virtio: allow extra context per descriptor virtio_net: allow specifying context for rx virtio_net: rework mergeable buffer handling virtio_net: reduce alignment for buffers virtio_net: fix support for small rings virtio_net: don't reset twice on XDP on/off ptr_ring: batch ring zeroing ringtest: support test specific parameters ptr_ring: support testing different batching sizes Sekhar Nori (1): tools/virtio: fix build breakage MAINTAINERS| 2 +- drivers/block/virtio_blk.c | 3 +- drivers/char/virtio_console.c | 6 +- drivers/crypto/virtio/virtio_crypto_core.c | 3 +- drivers/gpu/drm/virtio/virtgpu_kms.c | 3 +- drivers/misc/mic/vop/vop_main.c| 9 +- drivers/net/caif/caif_virtio.c | 3 +- drivers/net/virtio_net.c | 147 - drivers/remoteproc/remoteproc_virtio.c | 10 +- drivers/rpmsg/virtio_rpmsg_bus.c | 2 +- drivers/s390/virtio/kvm_virtio.c | 8 +- drivers/s390/virtio/virtio_ccw.c | 7 +- drivers/scsi/virtio_scsi.c | 3 +- drivers/virtio/virtio_balloon.c| 3 +- drivers/virtio/virtio_input.c | 3 +- drivers/virtio/virtio_mmio.c | 8 +- drivers/virtio/virtio_pci_common.c | 17 ++-- drivers/virtio/virtio_pci_common.h | 4 +- drivers/virtio/virtio_pci_legacy.c | 4 +- drivers/virtio/virtio_pci_modern.c | 12 ++- drivers/virtio/virtio_ring.c | 77 --- include/linux/ptr_ring.h | 63 +++-- include/linux/virtio.h | 13 +++ include/linux/virtio_config.h | 25 - include/linux/virtio_ring.h| 3 + net/vmw_vsock/virtio_transport.c | 6 +- tools/virtio/linux/virtio.h| 1 + tools/virtio/ringtest/main.c | 15 ++- tools/virtio/ringtest/main.h | 2 + tools/virtio/ringtest/ptr_ring.c | 3 + tools/virtio/virtio_test.c | 4 +- tools/virtio/vringh_test.c | 7 +- 32 files changed, 330 insertions(+), 146 deletions(-)
Re: [PATCH 1/3] ptr_ring: batch ring zeroing
On Wed, May 10, 2017 at 11:18:13AM +0200, Jesper Dangaard Brouer wrote: > On Tue, 9 May 2017 16:33:14 +0300 > "Michael S. Tsirkin"wrote: > > > On Sat, Apr 08, 2017 at 02:14:08PM +0200, Jesper Dangaard Brouer wrote: > > > On Fri, 7 Apr 2017 08:49:57 +0300 > > > "Michael S. Tsirkin" wrote: > > > > > > > A known weakness in ptr_ring design is that it does not handle well the > > > > situation when ring is almost full: as entries are consumed they are > > > > immediately used again by the producer, so consumer and producer are > > > > writing to a shared cache line. > > > > > > > > To fix this, add batching to consume calls: as entries are > > > > consumed do not write NULL into the ring until we get > > > > a multiple (in current implementation 2x) of cache lines > > > > away from the producer. At that point, write them all out. > > > > > > > > We do the write out in the reverse order to keep > > > > producer from sharing cache with consumer for as long > > > > as possible. > > > > > > > > Writeout also triggers when ring wraps around - there's > > > > no special reason to do this but it helps keep the code > > > > a bit simpler. > > > > > > > > What should we do if getting away from producer by 2 cache lines > > > > would mean we are keeping the ring moe than half empty? > > > > Maybe we should reduce the batching in this case, > > > > current patch simply reduces the batching. > > > > > > > > Notes: > > > > - it is no longer true that a call to consume guarantees > > > > that the following call to produce will succeed. > > > > No users seem to assume that. > > > > - batching can also in theory reduce the signalling rate: > > > > users that would previously send interrups to the producer > > > > to wake it up after consuming each entry would now only > > > > need to do this once in a batch. > > > > Doing this would be easy by returning a flag to the caller. > > > > No users seem to do signalling on consume yet so this was not > > > > implemented yet. > > > > > > > > Signed-off-by: Michael S. Tsirkin > > > > --- > > > > > > > > Jason, I am curious whether the following gives you some of > > > > the performance boost that you see with vhost batching > > > > patches. Is vhost batching on top still helpful? > > > > > > > > include/linux/ptr_ring.h | 63 > > > > +--- > > > > 1 file changed, 54 insertions(+), 9 deletions(-) > > > > > > > > diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h > > > > index 6c70444..6b2e0dd 100644 > > > > --- a/include/linux/ptr_ring.h > > > > +++ b/include/linux/ptr_ring.h > > > > @@ -34,11 +34,13 @@ > > > > struct ptr_ring { > > > > int producer cacheline_aligned_in_smp; > > > > spinlock_t producer_lock; > > > > - int consumer cacheline_aligned_in_smp; > > > > + int consumer_head cacheline_aligned_in_smp; /* next valid > > > > entry */ > > > > + int consumer_tail; /* next entry to invalidate */ > > > > spinlock_t consumer_lock; > > > > /* Shared consumer/producer data */ > > > > /* Read-only by both the producer and the consumer */ > > > > int size cacheline_aligned_in_smp; /* max entries in queue > > > > */ > > > > + int batch; /* number of entries to consume in a batch */ > > > > void **queue; > > > > }; > > > > > > > > @@ -170,7 +172,7 @@ static inline int ptr_ring_produce_bh(struct > > > > ptr_ring *r, void *ptr) > > > > static inline void *__ptr_ring_peek(struct ptr_ring *r) > > > > { > > > > if (likely(r->size)) > > > > - return r->queue[r->consumer]; > > > > + return r->queue[r->consumer_head]; > > > > return NULL; > > > > } > > > > > > > > @@ -231,9 +233,38 @@ static inline bool ptr_ring_empty_bh(struct > > > > ptr_ring *r) > > > > /* Must only be called after __ptr_ring_peek returned !NULL */ > > > > static inline void __ptr_ring_discard_one(struct ptr_ring *r) > > > > { > > > > - r->queue[r->consumer++] = NULL; > > > > - if (unlikely(r->consumer >= r->size)) > > > > - r->consumer = 0; > > > > + /* Fundamentally, what we want to do is update consumer > > > > +* index and zero out the entry so producer can reuse it. > > > > +* Doing it naively at each consume would be as simple as: > > > > +* r->queue[r->consumer++] = NULL; > > > > +* if (unlikely(r->consumer >= r->size)) > > > > +* r->consumer = 0; > > > > +* but that is suboptimal when the ring is full as producer is > > > > writing > > > > +* out new entries in the same cache line. Defer these updates > > > > until a > > > > +* batch of entries has been consumed. > > > > +*/ > > > > + int head = r->consumer_head++; > > > > + > > > > + /* Once we have processed enough entries
Re: [PATCH 1/2] net: Added mtu parameter to dev_forward_skb calls
Hi Fredrik, [auto build test ERROR on net/master] [also build test ERROR on v4.11 next-20170510] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Fredrik-Markstrom/net-Added-mtu-parameter-to-dev_forward_skb-calls/20170509-231142 config: x86_64-rhel (attached as .config) compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All errors (new ones prefixed by >>): net//bridge/br_forward.c: In function '__br_forward': >> net//bridge/br_forward.c:99:46: error: 'struct sk_buff' has no member named >> 'dev_mtu' if (!is_skb_forwardable(skb->dev, skb, skb->dev_mtu)) { ^~ vim +99 net//bridge/br_forward.c 93 } 94 br_hook = NF_BR_FORWARD; 95 skb_forward_csum(skb); 96 net = dev_net(indev); 97 } else { 98 if (unlikely(netpoll_tx_running(to->br->dev))) { > 99 if (!is_skb_forwardable(skb->dev, skb, skb->dev_mtu)) { 100 kfree_skb(skb); 101 } else { 102 skb_push(skb, ETH_HLEN); --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
RE: bpf pointer alignment validation
From: Alexei Starovoitov > Sent: 10 May 2017 06:58 > > +static u32 calc_align(u32 imm) > > +{ > > + u32 align = 1; > > + > > + if (!imm) > > + return 1U << 31; > > + > > + while (!(imm & 1)) { > > + imm >>= 1; > > + align <<= 1; > > + } > > + return align; > > +} > > same question as in previous reply. > Why not to use something like: > static u32 calc_align(u32 n) > { > if (!n) > return 1U << 31; > return n - ((n - 1) & n); > } That function needs a comment saying what it returns. Think I'd write it as: return n & ~(n & (n - 1)); (even though that might be one more instruction) David
Re: iproute2 ss outputs duplicate tcp sockets info on kernel 3.10.105
Hi, Cc'ing Cyrill who wrote the code in question. Maybe he has an idea what's going wrong here. Cheers, Phil On Mon, May 08, 2017 at 06:56:04PM -0700, Li Er wrote: > i'm using v4.11.0 release of iproute2 and kernel 3.10.105, simply > running > > $ ss > Netid State Recv-Q Send-Q Local Address:Port > Peer Address:Port > tcpCLOSE-WAIT 4340 10.0.0.1:47931 > 65.49.18.136:https > tcpCLOSE-WAIT 4320 10.0.0.1:47932 > 65.49.18.136:https > tcpCLOSE-WAIT 4340 10.0.0.1:47931 > 65.49.18.136:https > tcpCLOSE-WAIT 4320 10.0.0.1:47932 > 65.49.18.136:https > > as you can see, there's one duplicate entry of each tcp socket, > however, if i explicitly specify tcp socket by adding the -t > switch, > > $ ss -t > State Recv-Q Send-Q Local Address:Port Peer > Address:Port > CLOSE-WAIT 4340 10.0.0.1:47931 > 65.49.18.136:https > CLOSE-WAIT 4320 10.0.0.1:47932 > 65.49.18.136:https > > the duplication is gone. > > this problem also occurs on git master, but not on iproute2 > v4.3.0, so i did a git bisect and found out the commit which > caused this is 9f66764e308e9c645b3fb2d1cfbb7fb207eb5de1, and by > revert this commit on git master, i.e. removing > > err = rtnl_dump_done(rth, h); > if (err < 0) > return -1; > > these 3 lines of code of lib/libnetlink.c, the problem is gone. > > since i'm not familiar with the source code, i doubt this is the > right way to solve the problem, what's your suggestions? thanks.
Re: [PATCH] ip: mpls: fix printing of mpls labels
On Mon, May 08, 2017 at 11:04:13PM -0700, David Ahern wrote: > If the kernel returns more labels than iproute2 expects, none of > the labels are printed and (null) is shown instead: > $ ip -f mpls ro ls > 101 as to (null) via inet 172.16.2.2 dev virt12 > 201 as to 202/203 via inet6 2001:db8:2::2 dev virt12 > > Remove the use of MPLS_MAX_LABELS and rely on buffer length that is > passed to mpls_ntop. With this change ip can print the label stack > returned by the kernel up to 255 characters (limit is due to size of > buf passed in) which amounts to 31 labels with a separator. > > With this change the above is: > $ ip/ip -f mpls ro ls > 101 as to 102/103/104/105/106/107/108/109/110 via inet 172.16.2.2 dev > virt12 > > Signed-off-by: David AhernReviewed-by: Simon Horman
Re: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualizaiton (Was: "Re: net_sched strange in 4.11")
On 10/05/17 09:56, Jason Wang wrote: On 2017年05月10日 13:28, Anton Ivanov wrote: On 10/05/17 03:18, Jason Wang wrote: On 2017年05月09日 23:11, Stefan Hajnoczi wrote: On Tue, May 09, 2017 at 08:46:46AM +0100, Anton Ivanov wrote: I have figured it out. Two issues. 1) skb->xmit_more is hardly ever set under virtualization because the qdisc is usually bypassed because of TCQ_F_CAN_BYPASS. Once TCQ_F_CAN_BYPASS is set a virtual NIC driver is not likely see skb->xmit_more (this answers my "how does this work at all" question). 2) If that flag is turned off (I patched sched_generic to turn it off in pfifo_fast while testing), DQL keeps xmit_more from being set. If the driver is not DQL enabled xmit_more is never ever set. If the driver is DQL enabled the queue is adjusted to ensure xmit_more stops happening within 10-15 xmit cycles. That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc. There, the BIG cost is telling the hypervisor that it needs to "kick" the packets. The cost of putting them into the vNIC buffers is negligible. You want xmit_more to happen - it makes between 50% and 300% (depending on vNIC design) difference. If there is no xmit_more the vNIC will immediately "kick" the hypervisor and try to signal that the packet needs to move straight away (as for example in virtio_net). How do you measure the performance? TCP or just measure pps? In this particular case - tcp from guest. I have a couple of other benchmarks (forwarding, etc). One more question, is the number for virtio-net or other emulated vNIC? Other for now - you are cc-ed to keep you in the loop. Virtio is next on my list - I am revisiting the l2tpv3.c driver in QEMU and looking at how to preserve bulking by adding back sendmmsg (as well as a list of other features/transports). We had sendmmsg removed for the final inclusion in QEMU 2.1, it presently uses only recvmmsg so for the time being it does not care. That will most likely change once it starts using sendmmsg as well. In addition to that, the perceived line rate is proportional to this cost, so I am not sure that the current dql math holds. In fact, I think it does not - it is trying to adjust something which influences the perceived line rate. So - how do we turn BOTH bypass and DQL adjustment while under virtualization and set them to be "always qdisc" + "always xmit_more allowed" Virtio-net net does not support BQL. Before commit ea7735d97ba9 ("virtio-net: move free_old_xmit_skbs"), it's even impossible to support that since we don't have tx interrupt for each packet. I haven't measured the impact of xmit_more, maybe I was wrong but I think it may help in some cases since it may improve the batching on host more or less. If you do not support BQL, you might as well look the xmit_more part kick code path. Line 1127. bool kick = !skb->xmit_more; effectively means kick = true; It will never be triggered. You will be kicking each packet and per packet. Probably not, we have several ways to try to suppress this on the virtio layer, host can give hints to disable the kicks through: - explicitly set a flag - implicitly by not publishing a new event idx FYI, I can get 100-200 packets per vm exit when testing 64 byte TCP_STREAM using netperf. I am aware of that. If, however, the host is providing a hint we might as well use it. xmit_more is now set only out of BQL. If BQL is not enabled you never get it. Now, will the current dql code work correctly if you do not have a defined line rate and completion interrupts - no idea. Probably not. IMHO instead of trying to fix it there should be a way for a device or architecture to turn it off. In fact BQL is not the only user for xmit_more. Pktgen with burst is another. Test does not show obvious difference if I set burst from 0 to 64 since we already had other ways to avoid kicking host. That, as well as this not being wired to bulk transport. To be clear - I ran into this working on my own drivers for UML, you are cc-ed because you are likely to be one of the most affected. I'm still not quite sure the issue. Looks like virtio-net is ok since BQL is not supported and the impact of xmit_more could be ignored. Presently - yes. If you have bulk aware transports to wire into that is likely to make a difference. Thanks A. Thanks A. P.S. Cc-ing virtio maintainer CCing Michael Tsirkin and Jason Wang, who are the core virtio and virtio-net maintainers. (I maintain the vsock driver - it's unrelated to this discussion.) A. On 08/05/17 08:15, Anton Ivanov wrote: Hi all, I was revising some of my old work for UML to prepare it for submission and I noticed that skb->xmit_more does not seem to be set any more. I traced the issue as far as net/sched/sched_generic.c try_bulk_dequeue_skb() is never invoked (the drivers I am working on are dql enabled so that is not the problem). More interestingly, if I put a breakpoint and debug output into
Re: [PATCH net 2/2] xdp: disallow use of native and generic hook at once
On 05/10/2017 05:18 AM, Jakub Kicinski wrote: On Wed, 10 May 2017 03:31:31 +0200, Daniel Borkmann wrote: While working on the iproute2 generic XDP frontend, I noticed that as of right now it's possible to have native *and* generic XDP programs loaded both at the same time for the case when a driver supports native XDP. Nice improvement! A couple of absolute nitpicks below.. The intended model for generic XDP from b5cdae3291f7 ("net: Generic XDP") is, however, that only one out of the two can be present at once which is also indicated as such in the XPD netlink dump part. ^^^ XDP Good point. @@ -6851,6 +6851,32 @@ int dev_change_proto_down(struct net_device *dev, bool proto_down) } EXPORT_SYMBOL(dev_change_proto_down); +bool __dev_xdp_attached(struct net_device *dev, xdp_op_t xdp_op) Out of curiosity - the leading underscores refer to caller having to hold rtnl? I assume they are not needed in the function below because it's static? I think I don't quite follow the last question, but it probably makes sense to add an ASSERT_RTNL() into dev_xdp_attached() inline helper to make it clearly visible to callers of this api. +{ + struct netdev_xdp xdp; + + memset(, 0, sizeof(xdp)); + xdp.command = XDP_QUERY_PROG; Probably personal preference, but seems like designated struct initializer would do quite nicely here and save the memset :) I had that initially, but I recalled that gcc < 4.6 does not handle this style for the initialization of anonymous struct/union properly (e.g., we fixed that in iproute2 as well). Andrew Morton still uses gcc 4.4.4 and occasionally sends kernel fixes, so we might end up like this anyway. diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index dda9f16..99320f0 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1251,24 +1251,20 @@ static int rtnl_xdp_fill(struct sk_buff *skb, struct net_device *dev) { struct nlattr *xdp; u32 xdp_flags = 0; - u8 val = 0; int err; + u8 val; xdp = nla_nest_start(skb, IFLA_XDP); if (!xdp) return -EMSGSIZE; + if (rcu_access_pointer(dev->xdp_prog)) { xdp_flags = XDP_FLAGS_SKB_MODE; val = 1; - } else if (dev->netdev_ops->ndo_xdp) { - struct netdev_xdp xdp_op = {}; - - xdp_op.command = XDP_QUERY_PROG; - err = dev->netdev_ops->ndo_xdp(dev, _op); - if (err) - goto err_cancel; - val = xdp_op.prog_attached; + } else { + val = dev_xdp_attached(dev); } Would it make sense to set xdp_flags to XDP_FLAGS_DRV_MODE here to keep things symmetrical? I know you are just preserving existing behaviour but it may seem slightly arbitrary to a new comer to report one of the very similarly named flags in the dump but not the other. I thought about it, it's kind of redundant information since IFLA_XDP_ATTACHED attribute w/o IFLA_XDP_FLAGS attribute today says that it's native already. It might look strange if we add also XDP_FLAGS_DRV_MODE there, since it doesn't give anything new. I rather see it similar to XDP_FLAGS_UPDATE_IF_NOEXIST flag that is for updating fd only, but I don't really have a strong opinion on this though. I could add it to the respin if preferred.
Re: [PATCH 1/3] ptr_ring: batch ring zeroing
On Tue, 9 May 2017 16:33:14 +0300 "Michael S. Tsirkin"wrote: > On Sat, Apr 08, 2017 at 02:14:08PM +0200, Jesper Dangaard Brouer wrote: > > On Fri, 7 Apr 2017 08:49:57 +0300 > > "Michael S. Tsirkin" wrote: > > > > > A known weakness in ptr_ring design is that it does not handle well the > > > situation when ring is almost full: as entries are consumed they are > > > immediately used again by the producer, so consumer and producer are > > > writing to a shared cache line. > > > > > > To fix this, add batching to consume calls: as entries are > > > consumed do not write NULL into the ring until we get > > > a multiple (in current implementation 2x) of cache lines > > > away from the producer. At that point, write them all out. > > > > > > We do the write out in the reverse order to keep > > > producer from sharing cache with consumer for as long > > > as possible. > > > > > > Writeout also triggers when ring wraps around - there's > > > no special reason to do this but it helps keep the code > > > a bit simpler. > > > > > > What should we do if getting away from producer by 2 cache lines > > > would mean we are keeping the ring moe than half empty? > > > Maybe we should reduce the batching in this case, > > > current patch simply reduces the batching. > > > > > > Notes: > > > - it is no longer true that a call to consume guarantees > > > that the following call to produce will succeed. > > > No users seem to assume that. > > > - batching can also in theory reduce the signalling rate: > > > users that would previously send interrups to the producer > > > to wake it up after consuming each entry would now only > > > need to do this once in a batch. > > > Doing this would be easy by returning a flag to the caller. > > > No users seem to do signalling on consume yet so this was not > > > implemented yet. > > > > > > Signed-off-by: Michael S. Tsirkin > > > --- > > > > > > Jason, I am curious whether the following gives you some of > > > the performance boost that you see with vhost batching > > > patches. Is vhost batching on top still helpful? > > > > > > include/linux/ptr_ring.h | 63 > > > +--- > > > 1 file changed, 54 insertions(+), 9 deletions(-) > > > > > > diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h > > > index 6c70444..6b2e0dd 100644 > > > --- a/include/linux/ptr_ring.h > > > +++ b/include/linux/ptr_ring.h > > > @@ -34,11 +34,13 @@ > > > struct ptr_ring { > > > int producer cacheline_aligned_in_smp; > > > spinlock_t producer_lock; > > > - int consumer cacheline_aligned_in_smp; > > > + int consumer_head cacheline_aligned_in_smp; /* next valid entry */ > > > + int consumer_tail; /* next entry to invalidate */ > > > spinlock_t consumer_lock; > > > /* Shared consumer/producer data */ > > > /* Read-only by both the producer and the consumer */ > > > int size cacheline_aligned_in_smp; /* max entries in queue */ > > > + int batch; /* number of entries to consume in a batch */ > > > void **queue; > > > }; > > > > > > @@ -170,7 +172,7 @@ static inline int ptr_ring_produce_bh(struct ptr_ring > > > *r, void *ptr) > > > static inline void *__ptr_ring_peek(struct ptr_ring *r) > > > { > > > if (likely(r->size)) > > > - return r->queue[r->consumer]; > > > + return r->queue[r->consumer_head]; > > > return NULL; > > > } > > > > > > @@ -231,9 +233,38 @@ static inline bool ptr_ring_empty_bh(struct ptr_ring > > > *r) > > > /* Must only be called after __ptr_ring_peek returned !NULL */ > > > static inline void __ptr_ring_discard_one(struct ptr_ring *r) > > > { > > > - r->queue[r->consumer++] = NULL; > > > - if (unlikely(r->consumer >= r->size)) > > > - r->consumer = 0; > > > + /* Fundamentally, what we want to do is update consumer > > > + * index and zero out the entry so producer can reuse it. > > > + * Doing it naively at each consume would be as simple as: > > > + * r->queue[r->consumer++] = NULL; > > > + * if (unlikely(r->consumer >= r->size)) > > > + * r->consumer = 0; > > > + * but that is suboptimal when the ring is full as producer is writing > > > + * out new entries in the same cache line. Defer these updates until a > > > + * batch of entries has been consumed. > > > + */ > > > + int head = r->consumer_head++; > > > + > > > + /* Once we have processed enough entries invalidate them in > > > + * the ring all at once so producer can reuse their space in the ring. > > > + * We also do this when we reach end of the ring - not mandatory > > > + * but helps keep the implementation simple. > > > + */ > > > + if (unlikely(r->consumer_head - r->consumer_tail >= r->batch || > > > + r->consumer_head >= r->size)) { > > > + /* Zero out entries in the reverse order: this way we touch the > > > + * cache line that producer might currently be
[PATCH v5 11/17] net: qualcomm: make qca_7k_common a separate kernel module
In order to share common functions between QCA7000 SPI and UART protocol driver the qca_7k_common needs to be a separate kernel module. Signed-off-by: Stefan Wahren--- drivers/net/ethernet/qualcomm/Kconfig | 8 +++- drivers/net/ethernet/qualcomm/Makefile| 5 +++-- drivers/net/ethernet/qualcomm/qca_7k_common.c | 10 ++ 3 files changed, 20 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/qualcomm/Kconfig b/drivers/net/ethernet/qualcomm/Kconfig index d7720bf..b4c369d 100644 --- a/drivers/net/ethernet/qualcomm/Kconfig +++ b/drivers/net/ethernet/qualcomm/Kconfig @@ -16,7 +16,13 @@ config NET_VENDOR_QUALCOMM if NET_VENDOR_QUALCOMM config QCA7000 - tristate "Qualcomm Atheros QCA7000 support" + tristate + help + This enables support for the Qualcomm Atheros QCA7000. + +config QCA7000_SPI + tristate "Qualcomm Atheros QCA7000 SPI support" + select QCA7000 depends on SPI_MASTER && OF ---help--- This SPI protocol driver supports the Qualcomm Atheros QCA7000. diff --git a/drivers/net/ethernet/qualcomm/Makefile b/drivers/net/ethernet/qualcomm/Makefile index 5e17bf1..65556ca 100644 --- a/drivers/net/ethernet/qualcomm/Makefile +++ b/drivers/net/ethernet/qualcomm/Makefile @@ -2,7 +2,8 @@ # Makefile for the Qualcomm network device drivers. # -obj-$(CONFIG_QCA7000) += qcaspi.o -qcaspi-objs := qca_spi.o qca_7k_common.o qca_7k.o qca_debug.o +obj-$(CONFIG_QCA7000) += qca_7k_common.o +obj-$(CONFIG_QCA7000_SPI) += qcaspi.o +qcaspi-objs := qca_7k.o qca_debug.o qca_spi.o obj-y += emac/ diff --git a/drivers/net/ethernet/qualcomm/qca_7k_common.c b/drivers/net/ethernet/qualcomm/qca_7k_common.c index 0d3daa9..6b511f0 100644 --- a/drivers/net/ethernet/qualcomm/qca_7k_common.c +++ b/drivers/net/ethernet/qualcomm/qca_7k_common.c @@ -21,7 +21,9 @@ * by an atheros frame while transmitted over a serial channel; */ +#include #include +#include #include "qca_7k_common.h" @@ -46,6 +48,7 @@ qcafrm_create_header(u8 *buf, u16 length) return QCAFRM_HEADER_LEN; } +EXPORT_SYMBOL_GPL(qcafrm_create_header); u16 qcafrm_create_footer(u8 *buf) @@ -57,6 +60,7 @@ qcafrm_create_footer(u8 *buf) buf[1] = 0x55; return QCAFRM_FOOTER_LEN; } +EXPORT_SYMBOL_GPL(qcafrm_create_footer); /* Gather received bytes and try to extract a full ethernet frame by * following a simple state machine. @@ -154,3 +158,9 @@ qcafrm_fsm_decode(struct qcafrm_handle *handle, u8 *buf, u16 buf_len, u8 recv_by return ret; } +EXPORT_SYMBOL_GPL(qcafrm_fsm_decode); + +MODULE_DESCRIPTION("Qualcomm Atheros QCA7000 common"); +MODULE_AUTHOR("Qualcomm Atheros Communications"); +MODULE_AUTHOR("Stefan Wahren "); +MODULE_LICENSE("Dual BSD/GPL"); -- 2.1.4
[PATCH v5 04/17] net: qualcomm: use net_device_ops instead of direct call
There is no need to export qcaspi_netdev_open and qcaspi_netdev_close because they are also accessible via the net_device_ops. Signed-off-by: Stefan Wahren--- drivers/net/ethernet/qualcomm/qca_debug.c | 5 +++-- drivers/net/ethernet/qualcomm/qca_spi.c | 4 ++-- drivers/net/ethernet/qualcomm/qca_spi.h | 3 --- 3 files changed, 5 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/qualcomm/qca_debug.c b/drivers/net/ethernet/qualcomm/qca_debug.c index d145df9..92b6be9 100644 --- a/drivers/net/ethernet/qualcomm/qca_debug.c +++ b/drivers/net/ethernet/qualcomm/qca_debug.c @@ -275,6 +275,7 @@ qcaspi_get_ringparam(struct net_device *dev, struct ethtool_ringparam *ring) static int qcaspi_set_ringparam(struct net_device *dev, struct ethtool_ringparam *ring) { + const struct net_device_ops *ops = dev->netdev_ops; struct qcaspi *qca = netdev_priv(dev); if ((ring->rx_pending) || @@ -283,13 +284,13 @@ qcaspi_set_ringparam(struct net_device *dev, struct ethtool_ringparam *ring) return -EINVAL; if (netif_running(dev)) - qcaspi_netdev_close(dev); + ops->ndo_stop(dev); qca->txr.count = max_t(u32, ring->tx_pending, TX_RING_MIN_LEN); qca->txr.count = min_t(u16, qca->txr.count, TX_RING_MAX_LEN); if (netif_running(dev)) - qcaspi_netdev_open(dev); + ops->ndo_open(dev); return 0; } diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c b/drivers/net/ethernet/qualcomm/qca_spi.c index 8590109..5c79612 100644 --- a/drivers/net/ethernet/qualcomm/qca_spi.c +++ b/drivers/net/ethernet/qualcomm/qca_spi.c @@ -602,7 +602,7 @@ qcaspi_intr_handler(int irq, void *data) return IRQ_HANDLED; } -int +static int qcaspi_netdev_open(struct net_device *dev) { struct qcaspi *qca = netdev_priv(dev); @@ -639,7 +639,7 @@ qcaspi_netdev_open(struct net_device *dev) return 0; } -int +static int qcaspi_netdev_close(struct net_device *dev) { struct qcaspi *qca = netdev_priv(dev); diff --git a/drivers/net/ethernet/qualcomm/qca_spi.h b/drivers/net/ethernet/qualcomm/qca_spi.h index 6e31a0e..064853d 100644 --- a/drivers/net/ethernet/qualcomm/qca_spi.h +++ b/drivers/net/ethernet/qualcomm/qca_spi.h @@ -108,7 +108,4 @@ struct qcaspi { u16 burst_len; }; -int qcaspi_netdev_open(struct net_device *dev); -int qcaspi_netdev_close(struct net_device *dev); - #endif /* _QCA_SPI_H */ -- 2.1.4
[PATCH v5 07/17] net: qualcomm: move qcaspi_tx_cmd to qca_spi.c
The function qcaspi_tx_cmd() is only called from qca_spi.c. So we better move it there. Signed-off-by: Stefan Wahren--- drivers/net/ethernet/qualcomm/qca_7k.c | 24 drivers/net/ethernet/qualcomm/qca_7k.h | 1 - drivers/net/ethernet/qualcomm/qca_spi.c | 24 3 files changed, 24 insertions(+), 25 deletions(-) diff --git a/drivers/net/ethernet/qualcomm/qca_7k.c b/drivers/net/ethernet/qualcomm/qca_7k.c index 557d53c..aa90a1d 100644 --- a/drivers/net/ethernet/qualcomm/qca_7k.c +++ b/drivers/net/ethernet/qualcomm/qca_7k.c @@ -119,27 +119,3 @@ qcaspi_write_register(struct qcaspi *qca, u16 reg, u16 value) return ret; } - -int -qcaspi_tx_cmd(struct qcaspi *qca, u16 cmd) -{ - __be16 tx_data; - struct spi_message *msg = >spi_msg1; - struct spi_transfer *transfer = >spi_xfer1; - int ret; - - tx_data = cpu_to_be16(cmd); - transfer->len = sizeof(tx_data); - transfer->tx_buf = _data; - transfer->rx_buf = NULL; - - ret = spi_sync(qca->spi_dev, msg); - - if (!ret) - ret = msg->status; - - if (ret) - qcaspi_spi_error(qca); - - return ret; -} diff --git a/drivers/net/ethernet/qualcomm/qca_7k.h b/drivers/net/ethernet/qualcomm/qca_7k.h index 4047f0a..27124c2 100644 --- a/drivers/net/ethernet/qualcomm/qca_7k.h +++ b/drivers/net/ethernet/qualcomm/qca_7k.h @@ -67,6 +67,5 @@ void qcaspi_spi_error(struct qcaspi *qca); int qcaspi_read_register(struct qcaspi *qca, u16 reg, u16 *result); int qcaspi_write_register(struct qcaspi *qca, u16 reg, u16 value); -int qcaspi_tx_cmd(struct qcaspi *qca, u16 cmd); #endif /* _QCA_7K_H */ diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c b/drivers/net/ethernet/qualcomm/qca_spi.c index a239ac4..c727357 100644 --- a/drivers/net/ethernet/qualcomm/qca_spi.c +++ b/drivers/net/ethernet/qualcomm/qca_spi.c @@ -192,6 +192,30 @@ qcaspi_read_legacy(struct qcaspi *qca, u8 *dst, u32 len) } static int +qcaspi_tx_cmd(struct qcaspi *qca, u16 cmd) +{ + __be16 tx_data; + struct spi_message *msg = >spi_msg1; + struct spi_transfer *transfer = >spi_xfer1; + int ret; + + tx_data = cpu_to_be16(cmd); + transfer->len = sizeof(tx_data); + transfer->tx_buf = _data; + transfer->rx_buf = NULL; + + ret = spi_sync(qca->spi_dev, msg); + + if (!ret) + ret = msg->status; + + if (ret) + qcaspi_spi_error(qca); + + return ret; +} + +static int qcaspi_tx_frame(struct qcaspi *qca, struct sk_buff *skb) { u32 count; -- 2.1.4
[PATCH v5 00/17] net: qualcomm: add QCA7000 UART driver
The Qualcomm QCA7000 HomePlug GreenPHY supports two interfaces: UART and SPI. This patch series adds the missing support for UART. This driver based on the Qualcomm code [1], but contains some changes: * use random MAC address per default * use net_device_stats from device * share frame decoding between SPI and UART driver * improve error handling * reimplement tty_wakeup with work queue (based on slcan) * use new serial device bus instead of ldisc The patches 1 - 3 are just for clean up and are not related to the UART support. Patches 4 - 15 prepare the existing QCA7000 code for UART support. Patch 16 is a improvement for serial device bus. Patch 17 contains the new driver. Cherry picking of the dt-bindings and serdev patch is okay. The UART driver functionally (not compile) depends on this unmerged patch [2]. The code itself has been tested on a Freescale i.MX28 board and a Raspberry Pi Zero. Changes in v5: * rebase to current linux-next * fix alignment issues in rx path * fix buffer overrun with big ethernet frames * fix init of UART decoding fsm * add device UART settings to Kconfig help * add current-speed to slave-device binding * merge SPI and UART binding document * rename qca_common to qca_7k_common * drop patch for retrieving UART settings * several cleanups Changes in v4: * rebase to current linux-next * use parameter -M for git format-patch * change order of local variables where possible * implement basic serdev support (without hardware flow control) Changes in v3: * rebase to current net-next Changes in v2: * fix build issue by using netif_trans_update() and dev_trans_start() [1] - https://github.com/IoE/qca7000 [2] - http://marc.info/?l=linux-serial=149338017301309=2 Stefan Wahren (17): net: qualcomm: remove unnecessary includes net: qca_framing: use u16 for frame offset net: qca_7k: Use BIT macro net: qualcomm: use net_device_ops instead of direct call net: qualcomm: Improve readability of length defines net: qca_spi: remove QCASPI_MTU net: qualcomm: move qcaspi_tx_cmd to qca_spi.c net: qca_spi: Clarify MODULE_DESCRIPTION net: qualcomm: rename qca_framing.c to qca_7k_common.c net: qualcomm: prepare frame decoding for UART driver net: qualcomm: make qca_7k_common a separate kernel module dt-bindings: qca7000-spi: Rework binding dt-bindings: qca7000: rename binding dt-bindings: slave-device: add current-speed property dt-bindings: qca7000: append UART interface to binding tty: serdev-ttyport: return actual baudrate from ttyport_set_baudrate net: qualcomm: add QCA7000 UART driver .../devicetree/bindings/net/qca-qca7000-spi.txt| 47 --- .../devicetree/bindings/net/qca-qca7000.txt| 88 + .../devicetree/bindings/serial/slave-device.txt| 9 + drivers/net/ethernet/qualcomm/Kconfig | 24 +- drivers/net/ethernet/qualcomm/Makefile | 7 +- drivers/net/ethernet/qualcomm/qca_7k.c | 28 -- drivers/net/ethernet/qualcomm/qca_7k.h | 15 +- .../qualcomm/{qca_framing.c => qca_7k_common.c}| 26 +- .../qualcomm/{qca_framing.h => qca_7k_common.h}| 24 +- drivers/net/ethernet/qualcomm/qca_debug.c | 5 +- drivers/net/ethernet/qualcomm/qca_spi.c| 47 ++- drivers/net/ethernet/qualcomm/qca_spi.h| 5 +- drivers/net/ethernet/qualcomm/qca_uart.c | 423 + drivers/tty/serdev/serdev-ttyport.c| 2 +- 14 files changed, 630 insertions(+), 120 deletions(-) delete mode 100644 Documentation/devicetree/bindings/net/qca-qca7000-spi.txt create mode 100644 Documentation/devicetree/bindings/net/qca-qca7000.txt rename drivers/net/ethernet/qualcomm/{qca_framing.c => qca_7k_common.c} (85%) rename drivers/net/ethernet/qualcomm/{qca_framing.h => qca_7k_common.h} (86%) create mode 100644 drivers/net/ethernet/qualcomm/qca_uart.c -- 2.1.4
[PATCH v5 10/17] net: qualcomm: prepare frame decoding for UART driver
Unfortunately the frame format is not exactly identical between SPI and UART. In case of SPI there is an additional HW length at the beginning. So store the initial state to make the decoding state machine more flexible and easy to extend for UART support. Signed-off-by: Stefan Wahren--- drivers/net/ethernet/qualcomm/qca_7k_common.c | 12 ++-- drivers/net/ethernet/qualcomm/qca_7k_common.h | 8 ++-- drivers/net/ethernet/qualcomm/qca_spi.c | 2 +- 3 files changed, 13 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/qualcomm/qca_7k_common.c b/drivers/net/ethernet/qualcomm/qca_7k_common.c index 6d17fbd..0d3daa9 100644 --- a/drivers/net/ethernet/qualcomm/qca_7k_common.c +++ b/drivers/net/ethernet/qualcomm/qca_7k_common.c @@ -83,7 +83,7 @@ qcafrm_fsm_decode(struct qcafrm_handle *handle, u8 *buf, u16 buf_len, u8 recv_by if (recv_byte != 0x00) { /* first two bytes of length must be 0 */ - handle->state = QCAFRM_HW_LEN0; + handle->state = handle->init; } break; case QCAFRM_HW_LEN2: @@ -97,7 +97,7 @@ qcafrm_fsm_decode(struct qcafrm_handle *handle, u8 *buf, u16 buf_len, u8 recv_by case QCAFRM_WAIT_AA4: if (recv_byte != 0xAA) { ret = QCAFRM_NOHEAD; - handle->state = QCAFRM_HW_LEN0; + handle->state = handle->init; } else { handle->state--; } @@ -119,7 +119,7 @@ qcafrm_fsm_decode(struct qcafrm_handle *handle, u8 *buf, u16 buf_len, u8 recv_by len = handle->offset; if (len > buf_len || len < QCAFRM_MIN_LEN) { ret = QCAFRM_INVLEN; - handle->state = QCAFRM_HW_LEN0; + handle->state = handle->init; } else { handle->state = (enum qcafrm_state)(len + 1); /* Remaining number of bytes. */ @@ -135,7 +135,7 @@ qcafrm_fsm_decode(struct qcafrm_handle *handle, u8 *buf, u16 buf_len, u8 recv_by case QCAFRM_WAIT_551: if (recv_byte != 0x55) { ret = QCAFRM_NOTAIL; - handle->state = QCAFRM_HW_LEN0; + handle->state = handle->init; } else { handle->state = QCAFRM_WAIT_552; } @@ -143,11 +143,11 @@ qcafrm_fsm_decode(struct qcafrm_handle *handle, u8 *buf, u16 buf_len, u8 recv_by case QCAFRM_WAIT_552: if (recv_byte != 0x55) { ret = QCAFRM_NOTAIL; - handle->state = QCAFRM_HW_LEN0; + handle->state = handle->init; } else { ret = handle->offset; /* Frame is fully received. */ - handle->state = QCAFRM_HW_LEN0; + handle->state = handle->init; } break; } diff --git a/drivers/net/ethernet/qualcomm/qca_7k_common.h b/drivers/net/ethernet/qualcomm/qca_7k_common.h index 5df7c65..07bdd6c 100644 --- a/drivers/net/ethernet/qualcomm/qca_7k_common.h +++ b/drivers/net/ethernet/qualcomm/qca_7k_common.h @@ -61,6 +61,7 @@ #define QCAFRM_ERR_BASE -1000 enum qcafrm_state { + /* HW length is only available on SPI */ QCAFRM_HW_LEN0 = 0x8000, QCAFRM_HW_LEN1 = QCAFRM_HW_LEN0 - 1, QCAFRM_HW_LEN2 = QCAFRM_HW_LEN1 - 1, @@ -101,6 +102,8 @@ enum qcafrm_state { struct qcafrm_handle { /* Current decoding state */ enum qcafrm_state state; + /* Initial state depends on connection type */ + enum qcafrm_state init; /* Offset in buffer (borrowed for length too) */ u16 offset; @@ -113,9 +116,10 @@ u16 qcafrm_create_header(u8 *buf, u16 len); u16 qcafrm_create_footer(u8 *buf); -static inline void qcafrm_fsm_init(struct qcafrm_handle *handle) +static inline void qcafrm_fsm_init_spi(struct qcafrm_handle *handle) { - handle->state = QCAFRM_HW_LEN0; + handle->init = QCAFRM_HW_LEN0; + handle->state = handle->init; } /* Gather received bytes and try to extract a full Ethernet frame diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c b/drivers/net/ethernet/qualcomm/qca_spi.c index 2bc3ba4..b7073a9 100644 --- a/drivers/net/ethernet/qualcomm/qca_spi.c +++ b/drivers/net/ethernet/qualcomm/qca_spi.c @@ -637,7 +637,7 @@ qcaspi_netdev_open(struct net_device *dev) qca->intr_req = 1; qca->intr_svc = 0; qca->sync = QCASPI_SYNC_UNKNOWN; - qcafrm_fsm_init(>frm_handle); + qcafrm_fsm_init_spi(>frm_handle); qca->spi_thread = kthread_run((void *)qcaspi_spi_thread, qca, "%s", dev->name); -- 2.1.4
[PATCH v5 08/17] net: qca_spi: Clarify MODULE_DESCRIPTION
Since this driver is specific to the QCA7000, we should make the module description more precisely. Signed-off-by: Stefan Wahren--- drivers/net/ethernet/qualcomm/qca_spi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c b/drivers/net/ethernet/qualcomm/qca_spi.c index c727357..deec70f 100644 --- a/drivers/net/ethernet/qualcomm/qca_spi.c +++ b/drivers/net/ethernet/qualcomm/qca_spi.c @@ -997,7 +997,7 @@ static struct spi_driver qca_spi_driver = { }; module_spi_driver(qca_spi_driver); -MODULE_DESCRIPTION("Qualcomm Atheros SPI Driver"); +MODULE_DESCRIPTION("Qualcomm Atheros QCA7000 SPI Driver"); MODULE_AUTHOR("Qualcomm Atheros Communications"); MODULE_AUTHOR("Stefan Wahren "); MODULE_LICENSE("Dual BSD/GPL"); -- 2.1.4
[PATCH v5 13/17] dt-bindings: qca7000: rename binding
Before we can merge the QCA7000 UART binding the document needs to be renamed. Signed-off-by: Stefan Wahren--- .../devicetree/bindings/net/{qca-qca7000-spi.txt => qca-qca7000.txt} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename Documentation/devicetree/bindings/net/{qca-qca7000-spi.txt => qca-qca7000.txt} (100%) diff --git a/Documentation/devicetree/bindings/net/qca-qca7000-spi.txt b/Documentation/devicetree/bindings/net/qca-qca7000.txt similarity index 100% rename from Documentation/devicetree/bindings/net/qca-qca7000-spi.txt rename to Documentation/devicetree/bindings/net/qca-qca7000.txt -- 2.1.4
[PATCH v5 12/17] dt-bindings: qca7000-spi: Rework binding
In preparation for the QCA7000 UART binding rework the binding document. Signed-off-by: Stefan Wahren--- .../devicetree/bindings/net/qca-qca7000-spi.txt| 49 +- 1 file changed, 29 insertions(+), 20 deletions(-) diff --git a/Documentation/devicetree/bindings/net/qca-qca7000-spi.txt b/Documentation/devicetree/bindings/net/qca-qca7000-spi.txt index c74989c..a37f656 100644 --- a/Documentation/devicetree/bindings/net/qca-qca7000-spi.txt +++ b/Documentation/devicetree/bindings/net/qca-qca7000-spi.txt @@ -1,29 +1,38 @@ -* Qualcomm QCA7000 (Ethernet over SPI protocol) +* Qualcomm QCA7000 -Note: The QCA7000 is useable as a SPI device. In this case it must be defined -as a child of a SPI master in the device tree. +The QCA7000 is a serial-to-powerline bridge with a host interface which could +be configured either as SPI or UART slave. This configuration is done by +the QCA7000 firmware. + +(a) Ethernet over SPI + +In order to use the QCA7000 as SPI device it must be defined as a child of a +SPI master in the device tree. Required properties: -- compatible : Should be "qca,qca7000" -- reg : Should specify the SPI chip select -- interrupts : The first cell should specify the index of the source interrupt - and the second cell should specify the trigger type as rising edge -- spi-cpha : Must be set -- spi-cpol: Must be set +- compatible : Should be "qca,qca7000" +- reg : Should specify the SPI chip select +- interrupts : The first cell should specify the index of the source + interrupt and the second cell should specify the trigger + type as rising edge +- spi-cpha : Must be set +- spi-cpol : Must be set Optional properties: -- interrupt-parent : Specify the pHandle of the source interrupt +- interrupt-parent : Specify the pHandle of the source interrupt - spi-max-frequency : Maximum frequency of the SPI bus the chip can operate at. - Numbers smaller than 100 or greater than 1600 are invalid. Missing - the property will set the SPI frequency to 800 Hertz. -- local-mac-address: 6 bytes, MAC address -- qca,legacy-mode : Set the SPI data transfer of the QCA7000 to legacy mode. - In this mode the SPI master must toggle the chip select between each data - word. In burst mode these gaps aren't necessary, which is faster. - This setting depends on how the QCA7000 is setup via GPIO pin strapping. - If the property is missing the driver defaults to burst mode. - -Example: + Numbers smaller than 100 or greater than 1600 + are invalid. Missing the property will set the SPI + frequency to 800 Hertz. +- local-mac-address : see ./ethernet.txt +- qca,legacy-mode : Set the SPI data transfer of the QCA7000 to legacy mode. + In this mode the SPI master must toggle the chip select + between each data word. In burst mode these gaps aren't + necessary, which is faster. This setting depends on how + the QCA7000 is setup via GPIO pin strapping. If the + property is missing the driver defaults to burst mode. + +SPI Example: /* Freescale i.MX28 SPI master*/ ssp2: spi@80014000 { -- 2.1.4
[PATCH v5 17/17] net: qualcomm: add QCA7000 UART driver
This patch adds the Ethernet over UART driver for the Qualcomm QCA7000 HomePlug GreenPHY. Signed-off-by: Stefan Wahren--- drivers/net/ethernet/qualcomm/Kconfig | 16 + drivers/net/ethernet/qualcomm/Makefile| 2 + drivers/net/ethernet/qualcomm/qca_7k_common.h | 6 + drivers/net/ethernet/qualcomm/qca_uart.c | 423 ++ 4 files changed, 447 insertions(+) create mode 100644 drivers/net/ethernet/qualcomm/qca_uart.c diff --git a/drivers/net/ethernet/qualcomm/Kconfig b/drivers/net/ethernet/qualcomm/Kconfig index b4c369d..877675a 100644 --- a/drivers/net/ethernet/qualcomm/Kconfig +++ b/drivers/net/ethernet/qualcomm/Kconfig @@ -30,6 +30,22 @@ config QCA7000_SPI To compile this driver as a module, choose M here. The module will be called qcaspi. +config QCA7000_UART + tristate "Qualcomm Atheros QCA7000 UART support" + select QCA7000 + depends on SERIAL_DEV_BUS && OF + ---help--- + This UART protocol driver supports the Qualcomm Atheros QCA7000. + + Currently the driver assumes these device UART settings: + Data bits: 8 + Parity: None + Stop bits: 1 + Flow control: None + + To compile this driver as a module, choose M here. The module + will be called qcauart. + config QCOM_EMAC tristate "Qualcomm Technologies, Inc. EMAC Gigabit Ethernet support" depends on HAS_DMA && HAS_IOMEM diff --git a/drivers/net/ethernet/qualcomm/Makefile b/drivers/net/ethernet/qualcomm/Makefile index 65556ca..92fa7c4 100644 --- a/drivers/net/ethernet/qualcomm/Makefile +++ b/drivers/net/ethernet/qualcomm/Makefile @@ -5,5 +5,7 @@ obj-$(CONFIG_QCA7000) += qca_7k_common.o obj-$(CONFIG_QCA7000_SPI) += qcaspi.o qcaspi-objs := qca_7k.o qca_debug.o qca_spi.o +obj-$(CONFIG_QCA7000_UART) += qcauart.o +qcauart-objs := qca_uart.o obj-y += emac/ diff --git a/drivers/net/ethernet/qualcomm/qca_7k_common.h b/drivers/net/ethernet/qualcomm/qca_7k_common.h index 07bdd6c..928554f 100644 --- a/drivers/net/ethernet/qualcomm/qca_7k_common.h +++ b/drivers/net/ethernet/qualcomm/qca_7k_common.h @@ -122,6 +122,12 @@ static inline void qcafrm_fsm_init_spi(struct qcafrm_handle *handle) handle->state = handle->init; } +static inline void qcafrm_fsm_init_uart(struct qcafrm_handle *handle) +{ + handle->init = QCAFRM_WAIT_AA1; + handle->state = handle->init; +} + /* Gather received bytes and try to extract a full Ethernet frame * by following a simple state machine. * diff --git a/drivers/net/ethernet/qualcomm/qca_uart.c b/drivers/net/ethernet/qualcomm/qca_uart.c new file mode 100644 index 000..1f6514c --- /dev/null +++ b/drivers/net/ethernet/qualcomm/qca_uart.c @@ -0,0 +1,423 @@ +/* + * Copyright (c) 2011, 2012, Qualcomm Atheros Communications Inc. + * Copyright (c) 2017, I2SE GmbH + * + * Permission to use, copy, modify, and/or distribute this software + * for any purpose with or without fee is hereby granted, provided + * that the above copyright notice and this permission notice appear + * in all copies. + * + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL + * WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED + * WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL + * THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR + * CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM + * LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, + * NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN + * CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. + */ + +/* This module implements the Qualcomm Atheros UART protocol for + * kernel-based UART device; it is essentially an Ethernet-to-UART + * serial converter; + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "qca_7k_common.h" + +#define QCAUART_DRV_VERSION "0.1.0" +#define QCAUART_DRV_NAME "qcauart" +#define QCAUART_TX_TIMEOUT (1 * HZ) + +struct qcauart { + struct net_device *net_dev; + spinlock_t lock;/* transmit lock */ + struct work_struct tx_work; /* Flushes transmit buffer */ + + struct serdev_device *serdev; + struct qcafrm_handle frm_handle; + struct sk_buff *rx_skb; + + unsigned char *tx_head; /* pointer to next XMIT byte */ + int tx_left;/* bytes left in XMIT queue */ + unsigned char *tx_buffer; +}; + +static int +qca_tty_receive(struct serdev_device *serdev, const unsigned char *data, + size_t count) +{ + struct qcauart *qca = serdev_device_get_drvdata(serdev); + struct net_device *netdev = qca->net_dev; + struct net_device_stats *n_stats =
[PATCH v5 14/17] dt-bindings: slave-device: add current-speed property
This adds a new DT property to define the current baud rate of the slave device. Signed-off-by: Stefan Wahren--- Documentation/devicetree/bindings/serial/slave-device.txt | 9 + 1 file changed, 9 insertions(+) diff --git a/Documentation/devicetree/bindings/serial/slave-device.txt b/Documentation/devicetree/bindings/serial/slave-device.txt index f660379..40110e0 100644 --- a/Documentation/devicetree/bindings/serial/slave-device.txt +++ b/Documentation/devicetree/bindings/serial/slave-device.txt @@ -21,6 +21,15 @@ Optional Properties: can support. For example, a particular board has some signal quality issue or the host processor can't support higher baud rates. +- current-speed: The current baud rate the device operates at. This should + only be present in case a driver has no chance to know + the baud rate of the slave device. + Examples: + * device supports auto-baud + * the rate is setup by a bootloader and there is no + way to reset the device + * device baud rate is configured by its firmware but + there is no way to request the actual settings Example: -- 2.1.4
[PATCH v5 15/17] dt-bindings: qca7000: append UART interface to binding
This merges the serdev binding for the QCA7000 UART driver (Ethernet over UART) into the existing document. Signed-off-by: Stefan Wahren--- .../devicetree/bindings/net/qca-qca7000.txt| 32 ++ 1 file changed, 32 insertions(+) diff --git a/Documentation/devicetree/bindings/net/qca-qca7000.txt b/Documentation/devicetree/bindings/net/qca-qca7000.txt index a37f656..08364c3 100644 --- a/Documentation/devicetree/bindings/net/qca-qca7000.txt +++ b/Documentation/devicetree/bindings/net/qca-qca7000.txt @@ -54,3 +54,35 @@ ssp2: spi@80014000 { local-mac-address = [ A0 B0 C0 D0 E0 F0 ]; }; }; + +(b) Ethernet over UART + +In order to use the QCA7000 as UART slave it must be defined as a child of a +UART master in the device tree. It is possible to preconfigure the UART +settings of the QCA7000 firmware, but it's not possible to change them during +runtime. + +Required properties: +- compatible: Should be "qca,qca7000-uart" + +Optional properties: +- local-mac-address : see ./ethernet.txt +- current-speed : current baud rate of QCA7000 which defaults to 115200 + if absent, see also ../serial/slave-device.txt + +UART Example: + +/* Freescale i.MX28 UART */ +auart0: serial@8006a000 { + compatible = "fsl,imx28-auart", "fsl,imx23-auart"; + reg = <0x8006a000 0x2000>; + pinctrl-names = "default"; + pinctrl-0 = <_2pins_a>; + status = "okay"; + + qca7000: ethernet { + compatible = "qca,qca7000-uart"; + local-mac-address = [ A0 B0 C0 D0 E0 F0 ]; + current-speed = <38400>; + }; +}; -- 2.1.4
[PATCH v5 02/17] net: qca_framing: use u16 for frame offset
It doesn't make sense to use a signed variable for offset here, so fix it up. Signed-off-by: Stefan Wahren--- drivers/net/ethernet/qualcomm/qca_framing.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/qualcomm/qca_framing.h b/drivers/net/ethernet/qualcomm/qca_framing.h index d5e795d..8b385e6 100644 --- a/drivers/net/ethernet/qualcomm/qca_framing.h +++ b/drivers/net/ethernet/qualcomm/qca_framing.h @@ -103,7 +103,7 @@ struct qcafrm_handle { enum qcafrm_state state; /* Offset in buffer (borrowed for length too) */ - s16 offset; + u16 offset; /* Frame length as kept by this module */ u16 len; -- 2.1.4
[PATCH v5 01/17] net: qualcomm: remove unnecessary includes
Most of the includes in qca_7k.c are unnecessary so we better remove them. Signed-off-by: Stefan Wahren--- drivers/net/ethernet/qualcomm/qca_7k.c | 4 1 file changed, 4 deletions(-) diff --git a/drivers/net/ethernet/qualcomm/qca_7k.c b/drivers/net/ethernet/qualcomm/qca_7k.c index f0066fb..557d53c 100644 --- a/drivers/net/ethernet/qualcomm/qca_7k.c +++ b/drivers/net/ethernet/qualcomm/qca_7k.c @@ -23,11 +23,7 @@ * kernel-based SPI device. */ -#include -#include -#include #include -#include #include "qca_7k.h" -- 2.1.4
[PATCH v5 06/17] net: qca_spi: remove QCASPI_MTU
There is no need for an additional MTU define. Signed-off-by: Stefan Wahren--- drivers/net/ethernet/qualcomm/qca_spi.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c b/drivers/net/ethernet/qualcomm/qca_spi.c index a0dbb92..a239ac4 100644 --- a/drivers/net/ethernet/qualcomm/qca_spi.c +++ b/drivers/net/ethernet/qualcomm/qca_spi.c @@ -69,7 +69,6 @@ static int qcaspi_pluggable = QCASPI_PLUGGABLE_MIN; module_param(qcaspi_pluggable, int, 0); MODULE_PARM_DESC(qcaspi_pluggable, "Pluggable SPI connection (yes/no)."); -#define QCASPI_MTU QCAFRM_MAX_MTU #define QCASPI_TX_TIMEOUT (1 * HZ) #define QCASPI_QCA7K_REBOOT_TIME_MS 1000 @@ -745,7 +744,7 @@ qcaspi_netdev_init(struct net_device *dev) { struct qcaspi *qca = netdev_priv(dev); - dev->mtu = QCASPI_MTU; + dev->mtu = QCAFRM_MAX_MTU; dev->type = ARPHRD_ETHER; qca->clkspeed = qcaspi_clkspeed; qca->burst_len = qcaspi_burst_len; -- 2.1.4
Re: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualizaiton (Was: "Re: net_sched strange in 4.11")
On 2017年05月10日 13:28, Anton Ivanov wrote: On 10/05/17 03:18, Jason Wang wrote: On 2017年05月09日 23:11, Stefan Hajnoczi wrote: On Tue, May 09, 2017 at 08:46:46AM +0100, Anton Ivanov wrote: I have figured it out. Two issues. 1) skb->xmit_more is hardly ever set under virtualization because the qdisc is usually bypassed because of TCQ_F_CAN_BYPASS. Once TCQ_F_CAN_BYPASS is set a virtual NIC driver is not likely see skb->xmit_more (this answers my "how does this work at all" question). 2) If that flag is turned off (I patched sched_generic to turn it off in pfifo_fast while testing), DQL keeps xmit_more from being set. If the driver is not DQL enabled xmit_more is never ever set. If the driver is DQL enabled the queue is adjusted to ensure xmit_more stops happening within 10-15 xmit cycles. That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc. There, the BIG cost is telling the hypervisor that it needs to "kick" the packets. The cost of putting them into the vNIC buffers is negligible. You want xmit_more to happen - it makes between 50% and 300% (depending on vNIC design) difference. If there is no xmit_more the vNIC will immediately "kick" the hypervisor and try to signal that the packet needs to move straight away (as for example in virtio_net). How do you measure the performance? TCP or just measure pps? In this particular case - tcp from guest. I have a couple of other benchmarks (forwarding, etc). One more question, is the number for virtio-net or other emulated vNIC? In addition to that, the perceived line rate is proportional to this cost, so I am not sure that the current dql math holds. In fact, I think it does not - it is trying to adjust something which influences the perceived line rate. So - how do we turn BOTH bypass and DQL adjustment while under virtualization and set them to be "always qdisc" + "always xmit_more allowed" Virtio-net net does not support BQL. Before commit ea7735d97ba9 ("virtio-net: move free_old_xmit_skbs"), it's even impossible to support that since we don't have tx interrupt for each packet. I haven't measured the impact of xmit_more, maybe I was wrong but I think it may help in some cases since it may improve the batching on host more or less. If you do not support BQL, you might as well look the xmit_more part kick code path. Line 1127. bool kick = !skb->xmit_more; effectively means kick = true; It will never be triggered. You will be kicking each packet and per packet. Probably not, we have several ways to try to suppress this on the virtio layer, host can give hints to disable the kicks through: - explicitly set a flag - implicitly by not publishing a new event idx FYI, I can get 100-200 packets per vm exit when testing 64 byte TCP_STREAM using netperf. xmit_more is now set only out of BQL. If BQL is not enabled you never get it. Now, will the current dql code work correctly if you do not have a defined line rate and completion interrupts - no idea. Probably not. IMHO instead of trying to fix it there should be a way for a device or architecture to turn it off. In fact BQL is not the only user for xmit_more. Pktgen with burst is another. Test does not show obvious difference if I set burst from 0 to 64 since we already had other ways to avoid kicking host. To be clear - I ran into this working on my own drivers for UML, you are cc-ed because you are likely to be one of the most affected. I'm still not quite sure the issue. Looks like virtio-net is ok since BQL is not supported and the impact of xmit_more could be ignored. Thanks A. Thanks A. P.S. Cc-ing virtio maintainer CCing Michael Tsirkin and Jason Wang, who are the core virtio and virtio-net maintainers. (I maintain the vsock driver - it's unrelated to this discussion.) A. On 08/05/17 08:15, Anton Ivanov wrote: Hi all, I was revising some of my old work for UML to prepare it for submission and I noticed that skb->xmit_more does not seem to be set any more. I traced the issue as far as net/sched/sched_generic.c try_bulk_dequeue_skb() is never invoked (the drivers I am working on are dql enabled so that is not the problem). More interestingly, if I put a breakpoint and debug output into dequeue_skb() around line 147 - right before the bulk: tag that skb there is always NULL. ??? Similarly, debug in pfifo_fast_dequeue shows only NULLs being dequeued. Again - ??? First and foremost, I apologize for the silly question, but how can this work at all? I see the skbs showing up at the driver level, why are NULLs being returned at qdisc dequeue and where do the skbs at the driver level come from? Second, where should I look to fix it? A. -- Anton R. Ivanov Cambridge Greys Limited, England company No 10273661 http://www.cambridgegreys.com/
[PATCH v5 05/17] net: qualcomm: Improve readability of length defines
In order to avoid mixing things up, make the MTU and frame length defines easier to read. Signed-off-by: Stefan Wahren--- drivers/net/ethernet/qualcomm/qca_framing.c | 2 +- drivers/net/ethernet/qualcomm/qca_framing.h | 8 drivers/net/ethernet/qualcomm/qca_spi.c | 12 ++-- 3 files changed, 11 insertions(+), 11 deletions(-) diff --git a/drivers/net/ethernet/qualcomm/qca_framing.c b/drivers/net/ethernet/qualcomm/qca_framing.c index faa924c..2341f2b 100644 --- a/drivers/net/ethernet/qualcomm/qca_framing.c +++ b/drivers/net/ethernet/qualcomm/qca_framing.c @@ -117,7 +117,7 @@ qcafrm_fsm_decode(struct qcafrm_handle *handle, u8 *buf, u16 buf_len, u8 recv_by break; case QCAFRM_WAIT_RSVD_BYTE2: len = handle->offset; - if (len > buf_len || len < QCAFRM_ETHMINLEN) { + if (len > buf_len || len < QCAFRM_MIN_LEN) { ret = QCAFRM_INVLEN; handle->state = QCAFRM_HW_LEN0; } else { diff --git a/drivers/net/ethernet/qualcomm/qca_framing.h b/drivers/net/ethernet/qualcomm/qca_framing.h index 8b385e6..5df7c65 100644 --- a/drivers/net/ethernet/qualcomm/qca_framing.h +++ b/drivers/net/ethernet/qualcomm/qca_framing.h @@ -44,12 +44,12 @@ #define QCAFRM_INVFRAME (QCAFRM_ERR_BASE - 4) /* Min/Max Ethernet MTU: 46/1500 */ -#define QCAFRM_ETHMINMTU (ETH_ZLEN - ETH_HLEN) -#define QCAFRM_ETHMAXMTU ETH_DATA_LEN +#define QCAFRM_MIN_MTU (ETH_ZLEN - ETH_HLEN) +#define QCAFRM_MAX_MTU ETH_DATA_LEN /* Min/Max frame lengths */ -#define QCAFRM_ETHMINLEN (QCAFRM_ETHMINMTU + ETH_HLEN) -#define QCAFRM_ETHMAXLEN (QCAFRM_ETHMAXMTU + VLAN_ETH_HLEN) +#define QCAFRM_MIN_LEN (QCAFRM_MIN_MTU + ETH_HLEN) +#define QCAFRM_MAX_LEN (QCAFRM_MAX_MTU + VLAN_ETH_HLEN) /* QCA7K header len */ #define QCAFRM_HEADER_LEN 8 diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c b/drivers/net/ethernet/qualcomm/qca_spi.c index 5c79612..a0dbb92 100644 --- a/drivers/net/ethernet/qualcomm/qca_spi.c +++ b/drivers/net/ethernet/qualcomm/qca_spi.c @@ -69,7 +69,7 @@ static int qcaspi_pluggable = QCASPI_PLUGGABLE_MIN; module_param(qcaspi_pluggable, int, 0); MODULE_PARM_DESC(qcaspi_pluggable, "Pluggable SPI connection (yes/no)."); -#define QCASPI_MTU QCAFRM_ETHMAXMTU +#define QCASPI_MTU QCAFRM_MAX_MTU #define QCASPI_TX_TIMEOUT (1 * HZ) #define QCASPI_QCA7K_REBOOT_TIME_MS 1000 @@ -402,7 +402,7 @@ qcaspi_tx_ring_has_space(struct tx_ring *txr) if (txr->skb[txr->tail]) return 0; - return (txr->size + QCAFRM_ETHMAXLEN < QCASPI_HW_BUF_LEN) ? 1 : 0; + return (txr->size + QCAFRM_MAX_LEN < QCASPI_HW_BUF_LEN) ? 1 : 0; } /* Flush the tx ring. This function is only safe to @@ -666,8 +666,8 @@ qcaspi_netdev_xmit(struct sk_buff *skb, struct net_device *dev) struct sk_buff *tskb; u8 pad_len = 0; - if (skb->len < QCAFRM_ETHMINLEN) - pad_len = QCAFRM_ETHMINLEN - skb->len; + if (skb->len < QCAFRM_MIN_LEN) + pad_len = QCAFRM_MIN_LEN - skb->len; if (qca->txr.skb[qca->txr.tail]) { netdev_warn(qca->net_dev, "queue was unexpectedly full!\n"); @@ -804,8 +804,8 @@ qcaspi_netdev_setup(struct net_device *dev) dev->tx_queue_len = 100; /* MTU range: 46 - 1500 */ - dev->min_mtu = QCAFRM_ETHMINMTU; - dev->max_mtu = QCAFRM_ETHMAXMTU; + dev->min_mtu = QCAFRM_MIN_MTU; + dev->max_mtu = QCAFRM_MAX_MTU; qca = netdev_priv(dev); memset(qca, 0, sizeof(struct qcaspi)); -- 2.1.4
[PATCH v5 09/17] net: qualcomm: rename qca_framing.c to qca_7k_common.c
As preparation for the upcoming UART driver we need a module which contains common functions for both interfaces. The module qca_framing is a good candidate but renaming to qca_7k_common would make it clear. Signed-off-by: Stefan Wahren--- drivers/net/ethernet/qualcomm/Makefile | 2 +- drivers/net/ethernet/qualcomm/{qca_framing.c => qca_7k_common.c} | 2 +- drivers/net/ethernet/qualcomm/{qca_framing.h => qca_7k_common.h} | 0 drivers/net/ethernet/qualcomm/qca_spi.c | 2 +- drivers/net/ethernet/qualcomm/qca_spi.h | 2 +- 5 files changed, 4 insertions(+), 4 deletions(-) rename drivers/net/ethernet/qualcomm/{qca_framing.c => qca_7k_common.c} (99%) rename drivers/net/ethernet/qualcomm/{qca_framing.h => qca_7k_common.h} (100%) diff --git a/drivers/net/ethernet/qualcomm/Makefile b/drivers/net/ethernet/qualcomm/Makefile index aacb0a5..5e17bf1 100644 --- a/drivers/net/ethernet/qualcomm/Makefile +++ b/drivers/net/ethernet/qualcomm/Makefile @@ -3,6 +3,6 @@ # obj-$(CONFIG_QCA7000) += qcaspi.o -qcaspi-objs := qca_spi.o qca_framing.o qca_7k.o qca_debug.o +qcaspi-objs := qca_spi.o qca_7k_common.o qca_7k.o qca_debug.o obj-y += emac/ diff --git a/drivers/net/ethernet/qualcomm/qca_framing.c b/drivers/net/ethernet/qualcomm/qca_7k_common.c similarity index 99% rename from drivers/net/ethernet/qualcomm/qca_framing.c rename to drivers/net/ethernet/qualcomm/qca_7k_common.c index 2341f2b..6d17fbd 100644 --- a/drivers/net/ethernet/qualcomm/qca_framing.c +++ b/drivers/net/ethernet/qualcomm/qca_7k_common.c @@ -23,7 +23,7 @@ #include -#include "qca_framing.h" +#include "qca_7k_common.h" u16 qcafrm_create_header(u8 *buf, u16 length) diff --git a/drivers/net/ethernet/qualcomm/qca_framing.h b/drivers/net/ethernet/qualcomm/qca_7k_common.h similarity index 100% rename from drivers/net/ethernet/qualcomm/qca_framing.h rename to drivers/net/ethernet/qualcomm/qca_7k_common.h diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c b/drivers/net/ethernet/qualcomm/qca_spi.c index deec70f..2bc3ba4 100644 --- a/drivers/net/ethernet/qualcomm/qca_spi.c +++ b/drivers/net/ethernet/qualcomm/qca_spi.c @@ -43,8 +43,8 @@ #include #include "qca_7k.h" +#include "qca_7k_common.h" #include "qca_debug.h" -#include "qca_framing.h" #include "qca_spi.h" #define MAX_DMA_BURST_LEN 5000 diff --git a/drivers/net/ethernet/qualcomm/qca_spi.h b/drivers/net/ethernet/qualcomm/qca_spi.h index 064853d..fc4beb1 100644 --- a/drivers/net/ethernet/qualcomm/qca_spi.h +++ b/drivers/net/ethernet/qualcomm/qca_spi.h @@ -32,7 +32,7 @@ #include #include -#include "qca_framing.h" +#include "qca_7k_common.h" #define QCASPI_DRV_VERSION "0.2.7-i" #define QCASPI_DRV_NAME"qcaspi" -- 2.1.4
[PATCH v5 03/17] net: qca_7k: Use BIT macro
Use the BIT macro for the CONFIG and INT register values. Signed-off-by: Stefan Wahren--- drivers/net/ethernet/qualcomm/qca_7k.h | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/qualcomm/qca_7k.h b/drivers/net/ethernet/qualcomm/qca_7k.h index 1cad851..4047f0a 100644 --- a/drivers/net/ethernet/qualcomm/qca_7k.h +++ b/drivers/net/ethernet/qualcomm/qca_7k.h @@ -54,15 +54,15 @@ #define SPI_REG_ACTION_CTRL 0x1B00 /* SPI_CONFIG register definition; */ -#define QCASPI_SLAVE_RESET_BIT (1 << 6) +#define QCASPI_SLAVE_RESET_BIT BIT(6) /* INTR_CAUSE/ENABLE register definition. */ -#define SPI_INT_WRBUF_BELOW_WM (1 << 10) -#define SPI_INT_CPU_ON (1 << 6) -#define SPI_INT_ADDR_ERR (1 << 3) -#define SPI_INT_WRBUF_ERR (1 << 2) -#define SPI_INT_RDBUF_ERR (1 << 1) -#define SPI_INT_PKT_AVLBL (1 << 0) +#define SPI_INT_WRBUF_BELOW_WM BIT(10) +#define SPI_INT_CPU_ON BIT(6) +#define SPI_INT_ADDR_ERRBIT(3) +#define SPI_INT_WRBUF_ERR BIT(2) +#define SPI_INT_RDBUF_ERR BIT(1) +#define SPI_INT_PKT_AVLBL BIT(0) void qcaspi_spi_error(struct qcaspi *qca); int qcaspi_read_register(struct qcaspi *qca, u16 reg, u16 *result); -- 2.1.4
[PATCH v5 16/17] tty: serdev-ttyport: return actual baudrate from ttyport_set_baudrate
Instead of returning the requested baudrate, we better return the actual one because it isn't always the same. Signed-off-by: Stefan WahrenAcked-by: Rob Herring --- drivers/tty/serdev/serdev-ttyport.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/tty/serdev/serdev-ttyport.c b/drivers/tty/serdev/serdev-ttyport.c index 487c88f..2cfdf34 100644 --- a/drivers/tty/serdev/serdev-ttyport.c +++ b/drivers/tty/serdev/serdev-ttyport.c @@ -151,7 +151,7 @@ static unsigned int ttyport_set_baudrate(struct serdev_controller *ctrl, unsigne /* tty_set_termios() return not checked as it is always 0 */ tty_set_termios(tty, ); - return speed; + return ktermios.c_ospeed; } static void ttyport_set_flow_control(struct serdev_controller *ctrl, bool enable) -- 2.1.4
RE: [PATCH 1/4] hamradio: Combine two seq_printf() calls into one in yam_seq_show()
From: SF Markus Elfring > Sent: 09 May 2017 15:22 > A bit of data was put into a sequence by two separate function calls. > Print the same data by a single function call instead. > > This issue was detected by using the Coccinelle software. > > Signed-off-by: Markus Elfring> --- > drivers/net/hamradio/yam.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/net/hamradio/yam.c b/drivers/net/hamradio/yam.c > index b6891ada1d7b..542f1e511df1 100644 > --- a/drivers/net/hamradio/yam.c > +++ b/drivers/net/hamradio/yam.c > @@ -830,8 +830,7 @@ static int yam_seq_show(struct seq_file *seq, void *v) > seq_printf(seq, " RxFrames %lu\n", dev->stats.rx_packets); > seq_printf(seq, " TxInt%u\n", yp->nb_mdint); > seq_printf(seq, " RxInt%u\n", yp->nb_rxint); > - seq_printf(seq, " RxOver %lu\n", dev->stats.rx_fifo_errors); > - seq_printf(seq, "\n"); > + seq_printf(seq, " RxOver %lu\n\n", dev->stats.rx_fifo_errors); > return 0; The code was consistently (and probably deliberately) using one seq_printf() call for each line so that the source looks 'a bit like' the output. These changes are all stupid. David